EPIC-KITCHENS VISOR
We are proud to announce EPIC-KITCHENS VISOR, a new dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video. VISOR annotates videos from EPIC-KITCHENS, which brings a new set of challenges not encountered in current video segmentation datasets. Specifically, we need to ensure both short- and long-term consistency of pixel-level annotations as objects undergo transformative interactions: e.g. an onion is peeled, diced and cooked, and we aim to obtain accurate pixel-level annotations of the peel, onion pieces, chopping board, knife, pan, as well as the acting hands. VISOR uses an annotation pipeline, AI-powered in parts, for scalability and quality, and introduces:
- Sparse Annotations: 271K masks covering 36 hours of untrimmed video
- Dense Annotations: 14.9M high-quality automatic interpolations
- Video Object Segmentation. Goal: track segments through video and occlusion
- Hand Object Segmentation. Goal: identify contact, with 67K in-hand object masks
- Where Did This Come From? Goal: name and point to where things came from, with 222 test cases
Explore EPIC-KITCHENS VISOR
To get a sense of the dataset, feel free to explore some of the annotations in VISOR below!
Interactive! Watch a Segment.
You can click through and see the annotations for a full sequence. We show the image on the left, the annotations on the right, and the legend for the annotations below.
Interactive! See Our Dense Annotations.
Part of VISOR is a collection of 14.9M new masks that are interpolated between our sparse annotations. Click on any of the images below to see clips of the new dense annotations.
Interactive! What are Hands Doing?
Mouse over an image and you can see what hands are up to in EPIC-KITCHENS: we'll show you a hand at your mouse cursor.
Download Data
VISOR is now available for download.
Annotations and sparse frames are available at the University of Bristol data repository, data.bris, at https://doi.org/10.5523/bris.2v6cgv1x04ol22qp9rm9x2j6a7
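If you prefer to script the download step, the snippet below is a minimal sketch that simply resolves the DOI to the data.bris record page; the individual archives must then be fetched from the file listing on that page (their exact names are not assumed here).

# Minimal sketch: resolve the VISOR DOI to the data.bris record page.
# Archive filenames are not hard-coded; pick them from the resolved page
# (or just use your browser instead).
import requests

DOI_URL = "https://doi.org/10.5523/bris.2v6cgv1x04ol22qp9rm9x2j6a7"

response = requests.get(DOI_URL, timeout=30)  # follows the DOI redirect chain
response.raise_for_status()
print("data.bris record page:", response.url)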
Code
We now make the following code repositories public; they replicate the VISOR paper's baselines and provide visualisation support for the annotations:
- VISOR-VIS: Code to visualise segmentations
- VISOR-FrameExtraction: Code to extract frames for dense annotations from the original video
- VISOR-VOS: Code to perform semi-supervised video object segmentation. Models and code replicate our first benchmark
- VISOR-HOS: Code to perform in-frame hand and active object segmentations. Models and code replicate our second baseline
- VISOR-WDTCF: Code to replicate our taster benchmark: Where did this come from?
The above repos contain everything you need to replicate our paper's results and visualise annotations. We are not releasing any further code or models.
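If you only want to peek at a single frame without cloning VISOR-VIS, the minimal sketch below rasterises the polygon segments of one sparsely annotated frame with OpenCV. The JSON field names it uses (video_annotations, image, annotations, segments) are assumptions about the released layout and may need adjusting; VISOR-VIS remains the authoritative loading and visualisation code.

# Minimal sketch: overlay the polygon masks of one sparsely annotated frame.
# Field names below are assumptions about the JSON layout; see VISOR-VIS
# for the authoritative loading/visualisation code.
import json
import numpy as np
import cv2

def overlay_frame(annotation_json, frame_dir, out_path, frame_idx=0, alpha=0.5):
    with open(annotation_json) as f:
        data = json.load(f)

    frame_ann = data["video_annotations"][frame_idx]          # assumed field
    image = cv2.imread(f"{frame_dir}/{frame_ann['image']['name']}")
    overlay = image.copy()

    rng = np.random.default_rng(0)                            # reproducible colours
    for entity in frame_ann["annotations"]:                   # assumed field
        colour = tuple(int(c) for c in rng.integers(0, 256, size=3))
        for polygon in entity["segments"]:                    # assumed: list of [x, y] points
            pts = np.asarray(polygon, dtype=np.int32).reshape(-1, 1, 2)
            cv2.fillPoly(overlay, [pts], colour)

    blended = cv2.addWeighted(overlay, alpha, image, 1 - alpha, 0)
    cv2.imwrite(out_path, blended)

# Placeholder paths: substitute a sparse-annotation JSON and its frame directory.
overlay_frame("P01_101.json", "frames/P01_101", "overlay.png")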
Paper and Citation
Read our NeurIPS 2022 paper EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations on arXiv and OpenReview.
When using these annotations, cite our EPIC-KITCHENS VISOR Benchmark paper:
@inproceedings{VISOR2022,
title={EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations},
author={Darkhalil, Ahmad and Shan, Dandan and Zhu, Bin and Ma, Jian and Kar, Amlan and Higgins, Richard and Fidler, Sanja and Fouhey, David and Damen, Dima},
booktitle = {Proceedings of the Neural Information Processing Systems (NeurIPS) Track on Datasets and Benchmarks},
year = {2022}
}
Also cite the EPIC-KITCHENS-100 paper where the videos originate:
@ARTICLE{Damen2022RESCALING,
title={Rescaling Egocentric Vision: Collection, Pipeline and Challenges for EPIC-KITCHENS-100},
author={Damen, Dima and Doughty, Hazel and Farinella, Giovanni Maria and Furnari, Antonino
and Ma, Jian and Kazakos, Evangelos and Moltisanti, Davide and Munro, Jonathan
and Perrett, Toby and Price, Will and Wray, Michael},
journal = {International Journal of Computer Vision (IJCV)},
year = {2022},
volume = {130},
pages = {33--55},
url = {https://doi.org/10.1007/s11263-021-01531-2}
}
Disclaimer
The underlying data that power VISOR, EPIC-KITCHENS-55 and EPIC-KITCHENS-100, were collected as a tool for research in computer vision. The dataset may have unintended biases (including those of a societal, gender or racial nature).
Copyright 
The VISOR dataset is copyrighted by us and published under the Creative Commons Attribution-NonCommercial 4.0 International License. This means that you must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use. You may not use the material for commercial purposes.
For commercial licenses of EPIC-KITCHENS and VISOR annotations, email us at uob-epic-kitchens@bristol.ac.uk
The Team
Ahmad Dar Khalil*
University of Bristol
Dandan Shan*
University of Michigan
Bin Zhu*
University of Bristol
Jian Ma*
University of Bristol
Amlan Kar
University of Toronto
Richard Higgins
University of Michigan
Sanja Fidler
University of Toronto
David Fouhey
University of Michigan
Dima Damen
University of Bristol
Research Funding
The work on VISOR was supported by the following:
- Segmentation annotations were funded by charitable unrestricted donations from Procter and Gamble and from DeepMind.
- Research at the University of Bristol is supported by the UKRI Engineering and Physical Sciences Research Council (EPSRC) Doctoral Training Program (DTP), EPSRC Fellowship UMPIRE (EP/T004991/1) and EPSRC Programme Grant Visual AI (EP/T028572/1).
- The project acknowledges the use of the EPSRC-funded Tier 2 facility JADE and the University of Bristol's Blue Crystal 4 facility.
- Research at the University of Michigan is based upon work supported by the National Science Foundation under Grant No. 2006619.
- Research at the University of Toronto is in part sponsored by NSERC. S.F. also acknowledges support through the Canada CIFAR AI Chair program.