Assembly101 is a large-scale video dataset for action recognition and markerless motion capture of
hand-object interactions, captured in a multi-camera cage setting.
The multi-view recordings feature participants assembling 101 children's toys.
News
[Oct 2024] Labels for the test set have been released in the annotations repo. The CodaLab
challenge has ended. Please refer to paperswithcode
for the leaderboard and feel free to add your test.csv results there.
[Aug 2023] Mistake detection annotations are now available on github.
[April 28th 2023] Camera extrinsics can now be found in AssemblyPoses.zip (a small projection sketch follows the news items).
[Feb 20th 2023] Code and models for the Temporal Action Segmentation benchmark are now available on github.
[Jan 17th 2023] Code and models for the Action Anticipation benchmark are now available on github.
[Sept. 1st 2022] We are pleased to announce awards worth up to $2000 for the top three entries in our 3D Action Recognition Challenge leaderboard.
[May 20th 2022] Code and models for the Action Recognition benchmark are now available on github.
[May 17th 2022] Annotations for both fine-grained and coarse actions are now available on github.
[May 2nd 2022] Scripts to download the videos are now available on github.
[March 28th 2022] Dataset released on Google Drive.
[March 28th 2022] Paper released on arXiv.
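The camera extrinsics mentioned in the April 2023 update can be combined with camera intrinsics to map 3D hand-pose annotations into any of the camera views. The sketch below shows a generic pinhole projection; the calibration values and the way extrinsics are stored inside AssemblyPoses.zip are placeholders, not the dataset's documented layout.

```python
# Hedged sketch of a pinhole projection: map a 3D joint (world coordinates)
# into pixel coordinates of one camera view. The intrinsic values and the
# identity extrinsic below are made-up placeholders; consult AssemblyPoses.zip
# and the annotations repo for the actual calibration format.
import numpy as np

def project_point(point_world, extrinsic, intrinsic):
    """Project a 3D world point to 2D pixel coordinates (u, v)."""
    p_h = np.append(point_world, 1.0)        # homogeneous world point, shape (4,)
    p_cam = extrinsic @ p_h                  # world -> camera, 3x4 [R|t]
    uv_h = intrinsic @ p_cam                 # camera -> image plane, 3x3 K
    return uv_h[:2] / uv_h[2]                # perspective divide

# Example with placeholder calibration: identity rotation, zero translation.
extrinsic = np.hstack([np.eye(3), np.zeros((3, 1))])          # 3x4
intrinsic = np.array([[600.0,   0.0, 320.0],
                      [  0.0, 600.0, 240.0],
                      [  0.0,   0.0,   1.0]])                 # 3x3
joint = np.array([0.05, -0.02, 0.80])                         # metres, hypothetical
print(project_point(joint, extrinsic, intrinsic))             # (u, v) in pixels
```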
Abstract
Assembly101 is a new procedural activity dataset featuring 4321 videos of people assembling and
disassembling 101 "take-apart" toy vehicles. Participants work without fixed instructions, and the
sequences feature rich and natural variations in action ordering, mistakes, and corrections. Assembly101
is the first multi-view action dataset, with simultaneous static (8) and egocentric (4) recordings.
Sequences are annotated with more than 100K coarse and 1M fine-grained action segments, and 18M 3D hand
poses. We benchmark on three action understanding tasks: recognition, anticipation and temporal
segmentation. Additionally, we propose a novel task of detecting mistakes. The unique recording format
and rich set of annotations allow us to investigate generalization to new toys, cross-view transfer,
long-tailed distributions, and pose vs. appearance. We envision that Assembly101 will serve as a new
challenge to investigate various activity understanding problems.
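As a rough illustration of how the fine-grained segments described above might be consumed, here is a minimal sketch that loads an annotation table and summarises segments per video. The file name and column names are hypothetical; the actual schema is defined in the annotations repository on github.

```python
# Hedged sketch: summarise fine-grained action segments per video.
# "fine_grained_train.csv" and the column names are hypothetical placeholders;
# see the annotations repository for the real file layout.
import pandas as pd

anns = pd.read_csv("fine_grained_train.csv")

# Assume one row per action segment with columns:
# video_id, start_frame, end_frame, action_label  (all hypothetical)
anns["num_frames"] = anns["end_frame"] - anns["start_frame"]

per_video = (anns.groupby("video_id")
                 .agg(segments=("action_label", "size"),
                      labelled_frames=("num_frames", "sum"),
                      distinct_actions=("action_label", "nunique")))
print(per_video.head())
```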
Paper
Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural
Activities
Fadime Sener, Dibyadip Chatterjee, Daniel Shelepov, Kun He, Dipika Singhania, Robert Wang, Angela
Yao
License
Attribution: You must give appropriate credit, provide a link to the license, and
indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests
the licensor endorses you or your use.
NonCommercial: You may not use the material for commercial purposes.
Citation
@inproceedings{sener2022assembly101,
  title     = {Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities},
  author    = {F. Sener and D. Chatterjee and D. Shelepov and K. He and D. Singhania and R. Wang and A. Yao},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2022},
}
We thank Joey Litalien for providing
us with the framework for this website.