
Multitasking


We benchmark EgoM2P's multitasking abilities against SOTA specialist models on downstream tasks, including egocentric perception and synthesis. We also evaluate it on unseen datasets without any fine-tuning to show the strong generalization ability of the pretrained features.

Egocentric Camera Tracking

EgoExo4D

Compared to specialist SOTA models that require geometry-based test-time optimization, our feed-forward EgoM2P predicts camera trajectories with better translation and rotation accuracy, at an inference speed of 300+ FPS.

EgoM2P also predicts smooth and plausible camera trajectories, learning to capture the distinctive character of egocentric head motion, while baselines suffer from temporal jitter.

ADT (unseen)

EgoM2P predicts realistic egocentric camera trajectories even on the unseen ADT dataset.
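The page does not name its tracking metrics; trajectories are commonly scored with absolute trajectory error (ATE) for translation and mean geodesic error for rotation. Below is a minimal numpy sketch of both (the helpers ate_rmse and rot_error_deg are our own naming, not from the paper):

import numpy as np

def ate_rmse(pred_t, gt_t):
    """RMSE of camera positions after a rigid least-squares (Kabsch)
    alignment of the predicted trajectory to ground truth.
    pred_t, gt_t: (N, 3)."""
    mu_p, mu_g = pred_t.mean(0), gt_t.mean(0)
    P, G = pred_t - mu_p, gt_t - mu_g
    U, _, Vt = np.linalg.svd(G.T @ P)
    d = np.sign(np.linalg.det(U @ Vt))
    R = U @ np.diag([1.0, 1.0, d]) @ Vt   # rotation mapping pred -> gt
    aligned = P @ R.T + mu_g
    return np.sqrt(((aligned - gt_t) ** 2).sum(-1).mean())

def rot_error_deg(pred_R, gt_R):
    """Mean geodesic distance between predicted and ground-truth
    rotation matrices, in degrees. pred_R, gt_R: (N, 3, 3)."""
    rel = np.einsum('nij,nik->njk', gt_R, pred_R)   # gt^T @ pred per frame
    tr = np.clip((np.trace(rel, axis1=1, axis2=2) - 1) / 2, -1.0, 1.0)
    return np.degrees(np.arccos(tr)).mean()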


Gaze Estimation in Egocentric Videos

EgoExo4D

EgoM2P estimates gaze more accurately, better capturing the camera wearer's intentions.

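The gaze metric is likewise unstated; a common choice is the mean L2 distance between predicted and annotated 2D gaze points. A minimal sketch, assuming normalized image coordinates (the helper gaze_l2_error is hypothetical):

import numpy as np

def gaze_l2_error(pred_xy, gt_xy, valid=None):
    """Mean L2 distance between predicted and annotated 2D gaze
    points in normalized [0, 1] image coordinates.
    pred_xy, gt_xy: (T, 2); valid: optional (T,) bool mask for
    frames that carry a gaze annotation."""
    err = np.linalg.norm(pred_xy - gt_xy, axis=-1)
    return err[valid].mean() if valid is not None else err.mean()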

Egocentric Monocular Depth Estimation

H2O

Baselines require time-consuming sequence-level optimization. Our model achieves at least 30x faster inference while ensuring temporal consistency.

HOI4D (unseen)

EgoM2P generalizes strongly to unseen datasets.
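As a reference for how such comparisons are typically scored, here is a minimal sketch of the standard AbsRel and delta&lt;1.25 depth metrics with per-sequence median scale alignment (the exact protocol used for these benchmarks is an assumption on our part):

import numpy as np

def depth_metrics(pred, gt):
    """AbsRel and delta<1.25 for monocular depth, after per-sequence
    median scale alignment (feed-forward predictions are typically
    only defined up to scale). pred, gt: (T, H, W)."""
    m = gt > 0                                # valid-depth pixels only
    p, g = pred[m], gt[m]
    p = p * (np.median(g) / np.median(p))     # median scale alignment
    abs_rel = np.mean(np.abs(p - g) / g)
    delta1 = np.mean(np.maximum(p / g, g / p) < 1.25)
    return abs_rel, delta1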


Conditional Egocentric Video Synthesis

HoloAssist

EgoM2P generates depth-aligned egocentric RGB videos with fewer hallucinations and more realistic finger motions than the baselines.

Egocentric 4D Reconstruction

Given ground-truth camera intrinsics and an egocentric video, we compare EgoM2P with the SOTA baseline MegaSAM (CVPR 2025) for 4D reconstruction. Unlike MegaSAM, which relies on SOTA monocular depth estimators and expensive geometry optimization, EgoM2P efficiently reconstructs dynamic egocentric scenes. For a 2-second video at 8 FPS, EgoM2P completes the reconstruction in less than 1 second, whereas MegaSAM requires 71 seconds.

MegaSAM (71 s)
Ours (<1 s)
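Given per-frame predicted depth and camera poses plus the ground-truth intrinsics, the reconstruction step amounts to unprojecting each depth map into world space. A minimal numpy sketch (the helper unproject_to_world is our own illustration, not the released API):

import numpy as np

def unproject_to_world(depth, K, cam_to_world):
    """Lift one depth map into a world-space point cloud, given
    intrinsics K (3x3) and a camera-to-world pose (4x4).
    depth: (H, W). Looping over frames and concatenating gives
    the dynamic point cloud of the clip."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    rays = pix @ np.linalg.inv(K).T              # camera-space rays
    pts = rays * depth.reshape(-1, 1)            # scale rays by depth
    pts_h = np.concatenate([pts, np.ones((len(pts), 1))], axis=-1)
    return (pts_h @ cam_to_world.T)[:, :3]       # world coordinates

# A 2-second clip at 8 FPS is 16 frames:
# cloud = np.concatenate([unproject_to_world(d, K, T)
#                         for d, T in zip(depths, poses)])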

Video

Coming Soon...

Citation


EgoM2P: Egocentric Multimodal Multitask Pretraining
Gen Li, Yutong Chen, Yiqian Wu, Kaifeng Zhao, Marc Pollefeys, Siyu Tang


@article{li2025egom2p,
  title={EgoM2P: Egocentric Multimodal Multitask Pretraining},
  author={Li, Gen and Chen, Yutong and Wu, Yiqian and Zhao, Kaifeng and Pollefeys, Marc and Tang, Siyu},
  journal={arXiv preprint arXiv:2506.07886},
  year={2025}
}

Contact


For questions, please contact Gen Li:
gen.li@inf.ethz.ch

Copyright © VLG 2025

template from LEAP