Research Projects
Prime and Reach
|
M Hatano*, S Sinha*, J Chalk, W Li, H Saito, D Damen (2025). Prime and Reach: Synthesising Body Motion for Gaze-Primed Object Reach. ArXiv | Website | Dataset |
The N-Body Problem
|
Z Zhu, Y Huang, Y Sato, D Damen (2025). The N-Body Problem: Parallel Execution from Single-Person Egocentric Video ArXiv | Webpage |
PointSt3R
|
R Guerrier, A W Harley, D Damen (2026). PointSt3R: Point Tracking through 3D Grounded Correspondence. IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). ArXiv | Webpage | Code and Benchmark |
HD-EPIC
|
T Perrett, A Darkhalil, S Sinha, O Emara, S Pollard, K Parida, K Liu, P Gatti, S Bansal, K Flanagan, J Chalk, Z Zhu, R Guerrier, F Abdelazim, B Zhu, D Moltisanti, M Wray, H Doughty, D Damen (2025). HD-EPIC: A Highly-Detailed Egocentric Video Dataset. IEEE/CVF Computer Vision and Pattern Recognition (CVPR) ArXiv | Webpage | Dataset | Annotations | Explore Dataset | CVF |
Learning from Streaming Video with Orthogonal Gradients
|
T Han, D Gokay, J Heyward, C Zhang, D Zoran, V Patraucean, J Carreira, D Damen, A Zisserman (2025). Learning from Streaming Video with Orthogonal Gradients. IEEE/CVF Computer Vision and Pattern Recognition (CVPR) ArXiv | Webpage and Code | CVF |
EgoPoints
|
A Darkhalil, R Guerrier, A W Harley, D Damen (2025). EgoPoints: Advancing Point Tracking for Egocentric Videos. IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). ArXiv | Webpage | Code and Benchmark |
ShowHowTo
|
T Soucek, P Gatti, M Wray, I Laptev, D Damen, J Sivic (2025). ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions. IEEE/CVF Computer Vision and Pattern Recognition (CVPR). ArXiv | Website | Code and Dataset | CVF |
It's Just Another Day
|
T Perrett, T Han, D Damen, A Zisserman (2024). It's Just Another Day: Unique Video Captioning by Discriminative Prompting. ACCV (Best Paper Award) ArXiv | Website | Code and Benchmark |
AMEGO: Active Memory from long EGOcentric videos
|
G Goletto, T Nagarajan, G Averta, D Damen (2024). AMEGO: Active Memory from long EGOcentric videos. ECCV ArXiv | Website | AMB Benchmark | Code |
HOI-Ref: Hand-Object Interaction Referral
|
S Bansal, M Wray, D Damen (2024). HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision. ArXiv | Website | HOI-QA Dataset | Models and Code |
TIM: A Time Interval Machine
|
TIM: A Time Interval Machine for Audio-Visual Action Recognition. Jacob Chalk, Jaesung Huh, Evangelos Kazakos, Andrew Zisserman, Dima Damen (2024). IEEE/CVF Computer Vision and Pattern Recognition (CVPR). Webpage | Code and Models | ArXiv | CVF PDF |
Out of Sight, not Out of Mind
|
Spatial Cognition from Egocentric Video: Out of Sight, Not Out of Mind. Chiara Plizzari, Shubham Goel, Toby Perrett, Jacob Chalk, Angjoo Kanazawa, Dima Damen (2025). 3DV. Webpage | ArXiv | Video |
Every Shot Counts
|
Every Shot Counts: Using Exemplars for Repetition Counting in Videos. Saptarshi Sinha, Alexandros Stergiou, Dima Damen (2024). Asian Conference on Computer Vision (ACCV). Webpage | Code | ArXiv
We propose an exemplar-based approach that discovers visual correspondence of video exemplars across repetitions within target videos. Our proposed Every Shot Counts (ESCounts) model is an attention-based encoder-decoder that encodes videos of varying lengths alongside exemplars from the same and different videos. |
GenHowTo
|
GenHowTo: Learning to Generate Actions and State Transformations from Instructional Videos. Tomas Soucek, Dima Damen, Michael Wray, Ivan Laptev, Josef Sivic (2024). IEEE/CVF Computer Vision and Pattern Recognition (CVPR). Webpage | Code | ArXiv | CVF PDF |
Get A Grip
|
Get a Grip: Reconstructing Hand-Object Stable Grasps in Egocentric Videos. Zhifan Zhu and Dima Damen (2024). ArXiv. Webpage | EPIC-Grasps Dataset and Code | ArXiv (v2) |
Rank2Reward
|
Rank2Reward: Learning Shaped Reward Functions from Passive Video. Daniel Yang, Davin Tjia, Jacob Berg, Dima Damen, Pulkit Agrawal and Abhishek Gupta (2024). IEEE International Conference on Robotics and Automation (ICRA). Webpage | ArXiv |
Ego-Exo4D
|
Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives. K Grauman et al. (2024). IEEE/CVF Computer Vision and Pattern Recognition (CVPR). ArXiv, Webpage, PDF | CVF PDF
Journal Version: Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives. K Grauman et al. (2025). International Journal of Computer Vision. Open Access | PDF |
The Future of Egocentric Vision.
|
An Outlook into the Future of Egocentric Vision. C Plizzari*, G Goletto*, A Furnari*, S Bansal*, F Ragusa*, GM Farinella, D Damen, T Tommasi. (2024). International Journal of Computer Vision (IJCV). PDF | Open Review Preprint | ArXiv |
Learning Temporal Sentence Grounding From Narrated EgoVideos
|
Learning Temporal Sentence Grounding From Narrated EgoVideos. K Flanagan, D Damen, M Wray (2023). British Machine Vision Conference (BMVC). ArXiv Camera Ready | Project Webpage | Code and Models |
EPIC Fields: Marrying 3D Geometry and Video Understanding
|
EPIC Fields: Marrying 3D Geometry and Video Understanding. V Tschernezki*, A Darkhalil*, Z Zhu*, D Fouhey, I Laina, D Larlus, D Damen, A Vedaldi (2023). Neural Information Processing Systems (NeurIPS) Preprint, Webpage |
What can a cook in Italy teach a mechanic in India?
|
What can a cook in Italy teach a mechanic in India? Action Recognition Generalisation Over Scenarios and Locations. C Plizzari, T Perrett, B Caputo, D Damen. ICCV 2023 Preprint | Webpage | Dataset | Code |
Use Your Head: Improving Long-Tail Video Recognition
|
CVF PDF | CVF Supp | ArXiv | Webpage | Benchmarks, Code and Models
Use Your Head: Improving Long-Tail Video Recognition. T Perrett, S Sinha, T Burghardt, M Mirmehdi, D Damen. CVPR 2023. |
Temporal Progressive Attention for Early Action Prediction
|
CVF PDF | CVF Supp | ArXiv | Webpage | Code
The Wisdom of Crowds: Temporal Progressive Attention for Early Action Prediction. A Stergiou, D Damen. CVPR 2023. |
EPIC-SOUNDS: A Large-Scale Dataset of Actions that Sound
|
EPIC-SOUNDS: A Large-Scale Dataset of Actions that Sound. J Huh*, J Chalk*, E Kazakos, D Damen, A Zisserman. Journal Extended Version (2025). IEEE Transactions on Pattern Analysis and Machine Intelligence 47, pp. 9953-9965. Journal Version (DOI), ArXiv, Webpage
EPIC-SOUNDS: A Large-Scale Dataset of Actions that Sound. J Huh*, J Chalk*, E Kazakos, D Damen, A Zisserman. ICASSP 2023. ArXiv, Webpage |
Play It Back: Iterative Attention for Audio Recognition
|
Play It Back: Iterative Attention for Audio Recognition. A Stergiou, D Damen. ICASSP 2023. ArXiv, Webpage |
VISOR: Video Segmentations and Object Relations
|
Trailer | Reveal @EPIC2022 | Download
EPIC-KITCHENS VISOR Benchmark: VIdeo Segmentations and Object Relations. A Darkhalil, D Shan, B Zhu, J Ma, A Kar, R Higgins, S Fidler, D Fouhey, D Damen. NeurIPS 2022. PDF, Webpage |
ConTra: Context Transformer for Cross-Modal Retrieval
Video
ConTra: Context Transformer for Cross-Modal Retrieval. A Fragomeni, M Wray, D Damen. ACCV (2022) Oral. ArXiv | PDF Preprint | Project Webpage | Code
Egocentric Video-Language Pretraining
Egocentric Video-Language Pretraining. KQ Lin, AJ Wang, M Soldan, M Wray, R Yan, EZ Xu, D Gao, R Tu, W Zhao, W Kong, C Cai, H Wang, D Damen, B Ghanem, W Liu, MZ Shou. NeurIPS (2022). ArXiv | PDF Preprint | Project Webpage | Code
UnweaveNet: Unweaving Activity Stories
Video
UnweaveNet: Unweaving Activity Stories. W Price, C Vondrick, D Damen. CVPR (2022). ArXiv Paper | Project Webpage | Annotations
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Ego4D Project and Dataset | Reveal Session Video | Trailer Video
Around the World in 3,000 Hours of Egocentric Video. K Grauman (+83 Authors) et al. CVPR (2022). ArXiv
Temporal Context in Egocentric Video
Video
With a Little Help from my Temporal Context: Multimodal Egocentric Action Recognition. E Kazakos, J Huh, A Nagrani, A Zisserman, D Damen. BMVC (2021). ArXiv Paper | Project Webpage | Code, features and models
Rescaling Egocentric Vision
|
Trailer | Video Demonstration | Webinar | Download
Rescaling Egocentric Vision. D Damen, H Doughty, G Farinella, A Furnari, E Kazakos, J Ma, D Moltisanti, J Munro, T Perrett, W Price, M Wray. IJCV. IJCV paper, ArXiv, Webpage
The EPIC-KITCHENS Dataset: Collection, Challenges and Baselines. D Damen, H Doughty, GM Farinella, S Fidler, A Furnari, E Kazakos, D Moltisanti, J Munro, T Perrett, W Price, M Wray. IEEE Transactions on Pattern Analysis and Machine Intelligence 43(11) pp 4125-4141 (2021). IEEE, ArXiv Preprint |
Domain Adaptation in Video Retrieval
Domain Adaptation in Multi-View Embedding for Cross-Modal Video Retrieval. J Munro, M Wray, D Larlus, G Csurka, D Damen. ArXiv (2021). ArXiv Paper
Semantic Similarity in Video Retrieval
On Semantic Similarity in Video Retrieval. M Wray, H Doughty, D Damen. CVPR (2021). CVF PDF | ArXiv Preprint | Project Webpage | Video
Temporal-Relational CrossTransformers
Temporal-Relational CrossTransformers for Few-Shot Action Recognition. T Perrett, A Masullo, T Burghardt, M Mirmehdi, D Damen. CVPR (2021). CVF PDF | ArXiv Preprint | Code and Model | Project Webpage
Slow-Fast Auditory Streams
Slow-Fast Auditory Streams For Audio Recognition. E Kazakos, A Nagrani, A Zisserman, D Damen. ICASSP (2021). ArXiv Preprint | IEEE PDF | Code and Models | Project Webpage [Outstanding Paper]
Frame Attributions in Video Models
Interactive Dashboard | Teaser Video | Code
Play Fair: Frame Attributions in Video Models. W Price, D Damen. ACCV (2020). ArXiv Preprint | Project Details | CVF | CVF PDF
MetaLearning with Context-Agnostic Initialisation
MetaLearning with Context-Agnostic Initialisation. T Perrett, A Masullo, T Burghardt, M Mirmehdi, D Damen. ACCV (2020). ArXiv Preprint | CVF | CVF PDF | Project Details
Action Modifiers: Learning from Adverbs in Instructional Videos
Action Modifiers: Learning from Adverbs in Instructional Videos. H Doughty, I Laptev, W Mayol-Cuevas, D Damen. CVPR (2020). ArXiv Preprint, CVF PDF, Project Details
Multi-Modal Domain Adaptation for Fine-Grained Action Recognition
Video, Oral Presentation Video
Multi-modal Domain Adaptation for Fine-grained Action Recognition. J Munro, Dima Damen. CVPR (2020). ArXiv Preprint, CVF PDF, Project Details, Code
Retro-Actions
Retro-Actions: Learning 'Close' by Time-Reversing 'Open' Videos. W Price, Dima Damen. ICCV (2019). ArXiv Preprint, Project Details
Fine-Grained Action Retrieval
Fine-Grained Action Retrieval through Multiple Parts-of-Speech Embeddings. Michael Wray, Diane Larlus, Gabriela Csurka, Dima Damen. ICCV (2019). CVF PDF, ArXiv Preprint, Project Details
Audio-Visual Temporal Binding for Egocentric Action Recognition
EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition. Evangelos Kazakos, Arsha Nagrani, Andrew Zisserman, Dima Damen. ICCV (2019). Project Details, CVF PDF, Arxiv Preprint
Learning Visual Actions Using Multiple Verb-Only Labels
Learning Visual Actions Using Multiple Verb-Only Labels. M Wray, D Damen. BMVC (2019). ArXiv Preprint, Project Details
DDLSTM: Dual-Domain LSTM
DDLSTM: Dual-Domain LSTM for Cross-Dataset Action Recognition. T Perrett and D Damen. CVPR (2019). pdf preprint, Arxiv, Project Details
The Pros and Cons: Rank-Aware Attention Modules
The Pros and Cons: Rank-aware Temporal Attention for Skill Determination in Long Videos. H Doughty, W Mayol-Cuevas, D Damen. CVPR (2019). pdf preprint, Arxiv, Project Details
Action Recognition from Single Timestamps
Action Recognition from Single Timestamp Supervision in Untrimmed Videos. D Moltisanti, S Fidler and D Damen. CVPR (2019). pdf preprint, Project Details
Tent Assembly Egocentric Dataset
(2021) B Sullivan, C Ludwig, D Damen, W Mayol-Cuevas, I Gilchrist. Look-Ahead Fixations During Visuomotor Behavior: Evidence from Assembling a Camping Tent. Journal of Vision 21(3):13. PDF
EPIC-Tent: An Egocentric Video Dataset for Camping Tent Assembly. Y Jang, B Sullivan, C Ludwig, I.D. Gilchrist, D Damen and W Mayol-Cuevas. ICCV Workshops (2019). pdf, Project Details, Dataset, Annotations
Scaling Egocentric Vision: EPIC-KITCHENS 2018
|
Scaling Egocentric Vision: The EPIC-KITCHENS Dataset. D Damen, H Doughty, G Farinella, S Fidler, A Furnari, E Kazakos, D Moltisanti, J Munro, T Perrett, W Price, M Wray. ECCV (2018). Webpage | Dataset | arxiv
An Evaluation of Action Recognition Models on EPIC-Kitchens. W Price, D Damen. Arxiv (2019). Arxiv | Github | PDF
The EPIC-KITCHENS Dataset: Collection, Challenges and Baselines. D Damen, H Doughty, GM Farinella, S Fidler, A Furnari, E Kazakos, D Moltisanti, J Munro, T Perrett, W Price, M Wray. IEEE Transactions on Pattern Analysis and Machine Intelligence (2020). Arxiv Preprint |
Skill Determination in Video
Who's Better? Who's Best? Pairwise Deep Ranking for Skill Determination. H Doughty, D Damen, W Mayol-Cuevas. CVPR (2018). PDF | arxiv | Dataset
Action Completion: A Temporal Model for Moment Detection
Weakly-Supervised Completion Moment Detection using Temporal Attention. F Heidarivincheh, M Mirmehdi, D Damen. ICCV Workshop on Human Behaviour Understanding. Arxiv | CVF PDF, Oct 2019.
Action Completion: A Temporal Model for Moment Detection. F Heidarivincheh, M Mirmehdi, D Damen. British Machine Vision Conference (BMVC), Sep 2018. Arxiv PDF | Dataset
Beyond Action Recognition: Action Completion in RGB-D Data. F Heidarivincheh, M Mirmehdi, D Damen. British Machine Vision Conference (BMVC), Sep 2016. pdf | abstract | Dataset
Human Routine Modelling and Change Detection
Human Routine Change Detection using Bayesian Modelling. Y Xu, D Damen. ICPR (2018). PDF
Unsupervised Long-Term Routine Modelling using Dynamic Bayesian Networks. Y Xu, D Bull, D Damen. DICTA (2017). PDF
Trespassing the Boundaries of Object Interactions
Trespassing the Boundaries: Labeling Temporal Bounds for Object Interactions in Egocentric Video. D Moltisanti, M Wray, W Mayol-Cuevas, D Damen. International Conference on Computer Vision (ICCV), 2017. pdf (camera ready) | arxiv
Semantic Embedding for Egocentric Actions
SEMBED: Semantic Embedding of Egocentric Action Videos. M Wray, D Moltisanti, W Mayol-Cuevas, D Damen. Egocentric Interaction, Perception and Computing (EPIC), European Conference on Computer Vision Workshops (ECCVW), Oct 2016. pdf | Dataset
You-Do, I-Learn
Automated capture and delivery of assistive task guidance with an eyewear computer: The GlaciAR system. T Leelasawassuk, D Damen, W Mayol-Cuevas. Augmented Human, Mar 2017 pdf
You-Do, I-Learn: Discovering Task Relevant Objects and their Modes of Interaction from Multi-User Egocentric Video. D Damen, T Leelasawassuk, O Haines, A Calway, W Mayol-Cuevas. British Machine Vision Conference (BMVC), Sep 2014. PDF | Abstract | Dataset
Multi-user egocentric Online System for Unsupervised Assistance on Object Usage. D Damen, O Haines, T Leelasawassuk, A Calway, W Mayol-Cuevas. ECCV Workshop on Assistive Computer Vision and Robotics (ACVR), Sep 2014. PDF Preprint
Estimating Visual Attention from a Head Mounted IMU. T Leelasawassuk, D Damen, W Mayol-Cuevas. International Symposium on Wearable Computers (ISWC), Sep 2015. PDF
DS-KCF: Depth-Based Real-Time Single Object Tracker
Real-time RGB-D Tracking with Depth Scaling Kernelised Correlation Filters and Occlusion Handling. M Camplani, S Hannuna, M Mirmehdi, D Damen, L Tao, T Burghardt and A Paiement. British Machine Vision Conference (BMVC), Sep 2015. PDF.
Real-time Learning and Detection of 3D Texture-minimal Objects
Real-time Learning and Detection of 3D Texture-minimal Objects: A Scalable approach. D Damen, P Bunnun, A Calway, W Mayol-Cuevas. British Machine Vision Conference (BMVC), Sep 2012. PDF | Abstract | Code | Video | Dataset.
Efficient Texture-less Object Detection for Augmented Reality Guidance. T Hodan, D Damen, W Mayol-Cuevas, J Matas. IEEE Int. Symposium on Mixed and Augmented Reality (ISMAR) Workshop on Visual Recognition and Retrieval for Mixed and Augmented Reality, Sep 2015.
Egocentric Real-time Industrial Workflow
Cognitive Learning, Monitoring and Assistance of Industrial Workflows Using Egocentric Sensor Networks. G Bleser, D Damen, A Behera, et al. PLOS ONE, June 2015. PDF.
Egocentric Real-time Workspace Monitoring using an RGB-D Camera. D Damen, A Gee, W Mayol-Cuevas, A Calway. Intelligent Robotics and Systems (IROS), Oct 2012. PDF | Video.
Online Quality Assessment for Human Motion
Online quality assessment of human movement from skeleton data. A Paiement, L Tao, S Hannuna, M Camplani, D Damen and M Mirmehdi. British Machine Vision Conference (BMVC), Sep 2014. PDF | Dataset.
The Bicycle Problem
Explaining Activities as Consistent Groups of Events - A Bayesian Framework using Attribute Multiset Grammars. D Damen and D Hogg. International Journal of Computer Vision (IJCV), 2012. PDF.
Recognizing Linked Events: Searching the Space of Feasible Explanations. D Damen and D Hogg. Computer Vision and Pattern Recognition (CVPR), Miami, Florida, June 2009. PDF | Poster
Detecting Carried Objects from Walking Pedestrians
Detecting Carried Objects from Sequences of Walking Pedestrians. D Damen and D Hogg. Pattern Analysis and Machine Intelligence (PAMI), 2012. PDF.
Detecting Carried Objects in Short Video Sequences. D Damen and D Hogg. European Conference on Computer Vision (ECCV), Marseille, France, Oct 2008 PDF | Poster
Research Group Members
- Kevin Flanagan, PhD student 2021-
- Jacob Chalk, PhD student 2021-
- Ahmad Dar Khalil, PhD student 2021-
- Zhifan Zhu, PhD student 2021-
- Saptarshi Sinha, PhD student 2022-
- Rhodri Guerrier, PhD student 2023-
- Siddhant Bansal, PhD student, 2023-
- Prajwal Gatti, PhD student, 2024-
- Omar Emara, PhD student 2024 - (CDT in Interactive AI)
- Jiahe Zhao, PhD student 2025 -
Previous Students and Postdocs
- Kranti Kumar Parida, Postdoc, 2023-2025
- Adriano Fragomeni, PhD student 2020-2025. Currently Founding AI Engineer at Nexus Additive, London.
- Toby Perrett, Senior Postdoctoral Researcher (UMPIRE project), 2018-2025. Currently Senior Research Engineer at Autodesk.
- Daniel Whettam, PhD student 2020 - 2024 (CDT in Interactive AI). Currently Machine Learning Engineer at Beam
- Bin Zhu, Postdoctoral Researcher 2021-2023. Currently Assistant Professor at Singapore Management University
- Jian Ma, PhD student 2019 - 2023
- Alexandros Stergiou, Postdoctoral Researcher 2021-2022. Currently Assistant Prof at University of Twente
- Dena Bazazian, Postdoctoral Researcher, 2021-2022
- Michael Wray, Postdoc 2019-2022, previously PhD student 2015-2019. Currently lecturer at University of Bristol
- Evangelos Kazakos, PhD student 2017 - 2022 - currently postdoc at Czech Technical University in Prague w/ Josef Sivic and Cordelia Schmid
- Will Price, PhD student 2017 - 2021
- Jonathan Munro, PhD student 2017 - 2021
- Alessandro Masullo, postdoc (SPHERE project), 2017-2021 - currently Lecturer at University of Bristol
- Youngkyoon Jang, postdoc (GLANCE project), 2018-2021 - currently postdoc at Imperial College
- Hazel Doughty, PhD student 2016 - 2020 - currently Assistant Professor at University of Leiden
- Farnoosh Heidarivincheh, PhD student 2015-2020 - currently postdoc at University of Bristol
- Davide Moltisanti, PhD student 2015-2019 - currently Assistant Professor at University of Bath
- Miguel Fortiz, PhD student 2016 - 2019 (Co-supervisor: Walterio Mayol-Cuevas)
- Victor Ponce Lopez, postdoc (SPHERE project), 2017-2018
- Vahid Soleimani, PhD student 2014-2018 (Co-supervisor: Majid Mirmehdi)
- Yangdi Xu, PhD student 2013-2018
- Toby Perrett, postdoc (LOCATE project), 2017-2018
- Teesid Leelasawassuk, PhD student 2011-2016
- Massimo Camplani, postdoc (SPHERE project), 2013-2017
- Sion Hannuna, postdoc (SPHERE project), 2013-2017
- Lili Tao, postdoc (SPHERE project), 2013-2017
- Adeline Paiement, postdoc (SPHERE project), 2013-2016
Professor of Computer Vision,













