The MMAct Challenge 2021 will be hosted at the CVPR'21 International Challenge on Activity Recognition (ActivityNet) Workshop. This challenge asks participants to propose cross-modal video action recognition/localization approaches that address the shortcomings of visual-only approaches, using the MMAct dataset.
After extracting the dataset, you will see the following structure.
Video data is stored under each session folder. Example of untrimmed videos under the untrimmed/video folder (a short enumeration sketch follows the tree):
trainval/
├── cam1
│   ├── subject1
│   │   ├── scene1
│   │   │   ├── session1
│   │   │   │   └── 20181016-133948.mp4
│   │   │   ├── session2
│   │   │   │   └── 20181016-134634.mp4
│   │   │   └── session3
│   │   │       └── 20181016-140459.mp4
...
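For orientation, the minimal sketch below (assuming the tree above sits at untrimmed/video/trainval after extraction) enumerates every untrimmed video per camera view, subject, scene, and session using only the standard library.

from pathlib import Path

# Hypothetical root; adjust to wherever untrimmed/video/trainval was extracted.
root = Path("untrimmed/video/trainval")

# Layout assumed from the tree above: camX/subjectY/sceneZ/sessionW/*.mp4
for video in sorted(root.glob("cam*/subject*/scene*/session*/*.mp4")):
    cam, subject, scene, session = video.relative_to(root).parts[:4]
    print(cam, subject, scene, session, video.name)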
Sensor data (.csv) is stored under each session folder according to sensor type: acc_phone_clip, gyro_clip, and orientation_clip hold the acceleration, gyroscope, and orientation data from a smartphone carried in the right pants pocket, and acc_watch_clip holds the acceleration data from a smartwatch worn on the right hand. Example of untrimmed sensor data under untrimmed/sensor. Note that sensor data has NO view definition.
sensor/
├── acc_phone_clip
│   ├── subject1
│   │   ├── scene1
│   │   │   ├── session1
│   │   │   ├── session2
│   │   │   ├── session3
...
For trimmed data, the annotation is the file name itself. For untrimmed data, the annotation is stored under the untrimmed/annotation folder, and the file name of each untrimmed video is its end timestamp. The train/val splits for each sub-task are stored in trimmed/splits and untrimmed/splits, respectively. Camera views with the same session index subjectXX/sceneYY/sessionZZ share the same annotation in annotation/trainval/subjectXX/sceneYY/sessionZZ/. The folder structure under untrimmed/annotation is:
trainval/
├── subject1
│   ├── scene1
│   │   ├── session1
│   │   │   └── subject1_scene1_session1.txt
│   │   ├── session2
│   │   │   └── subject1_scene1_session2.txt
│   │   └── session3
│   │       └── subject1_scene1_session3.txt
...
In an untrimmed video annotation file, each line follows the format [start timestamp]-[end timestamp]-[action_name], as in the examples below (a small parsing sketch follows them):
2018/10/16 13:33:45.170-2018/10/16 13:33:49.891-standing
2018/10/16 13:33:55.362-2018/10/16 13:34:00.323-crouching
2018/10/16 13:34:06.132-2018/10/16 13:34:14.522-walking
2018/10/16 13:34:19.402-2018/10/16 13:34:25.114-running
2018/10/16 13:34:33.226-2018/10/16 13:34:38.762-checking_time
2018/10/16 13:34:46.450-2018/10/16 13:34:51.698-waving_hand
2018/10/16 13:34:57.226-2018/10/16 13:35:04.075-using_phone
...
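A minimal parsing sketch for these annotation lines, assuming the path below and that the timestamps themselves never contain a '-' character, so each line splits into exactly three fields:

from datetime import datetime

# Hypothetical path, following the annotation tree shown above.
ann_path = "untrimmed/annotation/trainval/subject1/scene1/session1/subject1_scene1_session1.txt"

with open(ann_path) as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        # Format from the examples above: [start]-[end]-[action_name]
        start_str, end_str, action = line.split("-")
        start = datetime.strptime(start_str, "%Y/%m/%d %H:%M:%S.%f")
        end = datetime.strptime(end_str, "%Y/%m/%d %H:%M:%S.%f")
        print(action, (end - start).total_seconds())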
Human pose keypoint data is stored under the trimmed/pose folder. Keypoints are provided in JSON format, extracted with openpifpaf and manually checked. All coordinates are in pixel coordinates. Each keypoints entry is in COCO format, with a triple (x, y, c) (c for confidence) for every joint as listed under coco-person-keypoints. To get the frame index numbers stored in the JSON that correspond to the extracted keypoints, use the sample script below:
python utils/pose_frame_extraction.py
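As a reading aid, here is a minimal sketch for loading one keypoints JSON and regrouping the flat COCO-style list into (x, y, c) triples per joint; the file path, the top-level list structure, and the "keypoints" field name are assumptions to be checked against the provided files.

import json

# Hypothetical path to one pose file under trimmed/pose; adjust to your extraction.
with open("trimmed/pose/example.json") as f:
    entries = json.load(f)  # assumed: a list of per-person entries

for entry in entries:
    # Assumed field name: a flat list [x1, y1, c1, x2, y2, c2, ...] in pixel coordinates.
    flat = entry["keypoints"]
    joints = [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]
    for x, y, c in joints:
        print(x, y, c)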
As an entry point for sensor data processing, we provide a complete example that shows how to create time-series sliding-window data, how to merge timestamps across sensor types with different sampling rates when early fusion is needed, and how to use the pre-processed data for training/testing a time-series classifier. Once the MMAct trimmed cross-scene dataset and the MMAct untrimmed cross-session dataset are prepared, run the classifier (InceptionTime) training and testing example with:
python utils/time_series_classifiers.py
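For reference, the sketch below illustrates the general idea of cutting one sensor stream into fixed-length sliding windows; the column names, window length, and stride are placeholders, and utils/time_series_classifiers.py remains the authoritative example.

import numpy as np
import pandas as pd

def sliding_windows(csv_path, window=100, stride=50, columns=("x", "y", "z")):
    # Cut one sensor CSV into overlapping fixed-length windows.
    # window/stride are in samples; the column names are placeholders and
    # should be replaced by the actual header of the chosen sensor CSV.
    df = pd.read_csv(csv_path)
    values = df.loc[:, list(columns)].to_numpy()
    if len(values) < window:
        return np.empty((0, window, len(columns)))
    starts = range(0, len(values) - window + 1, stride)
    return np.stack([values[s:s + window] for s in starts])

# Hypothetical usage:
# windows = sliding_windows("sensor/acc_phone_clip/subject1/scene1/session1/example.csv")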
We provide ground-truth files in JSON format and an evaluation script for the validation set of both tasks under evaluation/. Files named evaluation/[file_name]_gt.json are the ground-truth files.
To evaluate Task1 Action Recognition on the validation set, run:
python evaluation/eval_mmact_trimmed.py --gt ground_truth_file --pred your_prediction_file
Example run for Task1 cross-view validation with the sample format file evaluation/trimmed_val_view_sample_format.json:
python evaluation/eval_mmact_trimmed.py --gt trimmed_val_view_gt.json --pred trimmed_val_view_sample_format.json
To evaluate Task2 Temporal Localization on the validation set, run:
python evaluation/eval_mmact_untrimmed.py --gt ground_truth_file --pred prediction_file
Example run for Task2 with the sample format from evaluation/untrimmed_val_sample_format.json:
python evaluation/eval_mmact_untrimmed.py --gt untrimmed_val_gt.json --pred untrimmed_val_sample_format.json
The Task1 Action Recognition leaderboard receives results on the trimmed action recognition test set for the competition. Users need to submit two results, one for the cross-view split and one for the cross-scene split.
Submission files for both splits use the same format, as follows (a small sketch for writing such a file comes after the example):
{
  "results": {
    "nljxzmeshydtlonl": [
      {
        "label": "walking",  # one prediction per video is required
        "score": 0.5
      }
    ],
    "hvuapypvzwsjutrf": [
      {
        "label": "talking",
        "score": 0.5
      }
    ],
    "hiukqqolgmtcnisi": [
      {
        "label": "throwing",
        "score": 0.5
      }
    ]
  }
}
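A minimal sketch for writing a Task1 submission file in this format; the video IDs, labels, scores, and output file name are placeholders to be replaced by your own predictions.

import json

# Placeholder predictions: {video_id: (label, score)} produced by your own model.
predictions = {
    "nljxzmeshydtlonl": ("walking", 0.5),
    "hvuapypvzwsjutrf": ("talking", 0.5),
}

submission = {
    "results": {
        vid: [{"label": label, "score": float(score)}]  # one prediction per video
        for vid, (label, score) in predictions.items()
    }
}

with open("trimmed_test_view_submission.json", "w") as f:  # hypothetical output name
    json.dump(submission, f, indent=2)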
The Task2 Temporal Localization leaderboard receives results on the untrimmed video temporal localization test set for the competition. The submission format example is shown below (a writing sketch follows it):
{
  "results": {
    "mynbiqpmzjplsgqe": [
      {
        "label": "standing",
        "score": 0.40685554496254395,
        "segment": [
          62.03,  # start seconds; 0.0 is the starting time of the given video
          66.32   # end seconds
        ]
      },
      {
        "label": "crouching",
        "score": 0.5805843080181547,
        "segment": [
          70.58,
          75.12
        ]
      }
    ]
  }
}
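Similarly, a minimal sketch for Task2, where each detection carries a [start, end] segment in seconds measured from the start of the video; the IDs, detections, and output file name are placeholders.

import json

# Placeholder detections: {video_id: [(label, score, start_sec, end_sec), ...]}.
detections = {
    "mynbiqpmzjplsgqe": [
        ("standing", 0.41, 62.03, 66.32),
        ("crouching", 0.58, 70.58, 75.12),
    ],
}

submission = {
    "results": {
        vid: [
            {"label": label, "score": float(score), "segment": [float(start), float(end)]}
            for label, score, start, end in dets
        ]
        for vid, dets in detections.items()
    }
}

with open("untrimmed_test_submission.json", "w") as f:  # hypothetical output name
    json.dump(submission, f, indent=2)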
Please cite the following paper if you use the code or dataset.
@InProceedings{Kong_2019_ICCV,
  author    = {Kong, Quan and Wu, Ziming and Deng, Ziwei and Klinkigt, Martin and Tong, Bin and Murakami, Tomokazu},
  title     = {MMAct: A Large-Scale Dataset for Cross Modal Human Action Understanding},
  booktitle = {The IEEE International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2019}
}