This repository was archived by the owner on Oct 31, 2023. It is now read-only.
MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition
This is a PyTorch implementation of the MeMViT paper (CVPR 2022 oral):
@inproceedings{memvit2022,
  title={{MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition}},
  author={Wu, Chao-Yuan and Li, Yanghao and Mangalam, Karttikeya and Fan, Haoqi and Xiong, Bo and Malik, Jitendra and Feichtenhofer, Christoph},
  booktitle={CVPR},
  year={2022}
}
MeMViT builds on the MViT models:
@inproceedings{li2021improved,
  title={{MViTv2}: Improved multiscale vision transformers for classification and detection},
  author={Li, Yanghao and Wu, Chao-Yuan and Fan, Haoqi and Mangalam, Karttikeya and Xiong, Bo and Malik, Jitendra and Feichtenhofer, Christoph},
  booktitle={CVPR},
  year={2022}
}
@inproceedings{fan2021multiscale,
  title={Multiscale vision transformers},
  author={Fan, Haoqi and Xiong, Bo and Mangalam, Karttikeya and Li, Yanghao and Yan, Zhicheng and Malik, Jitendra and Feichtenhofer, Christoph},
  booktitle={ICCV},
  year={2021}
}