VMZ is no longer supported; this repository was archived by its owner on June 17, 2025 and is now read-only.
VMZ is a Caffe2 and PyTorch codebase for video modeling developed by the Computer Vision team at Facebook AI. The aim of this codebase is to help other researchers and industry practitioners:
- reproduce some of our research results, and
- leverage our strong pre-trained models (see the sketch after this list).
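Since the repository itself is frozen, one accessible substitute in plain PyTorch is torchvision's R(2+1)D implementation. Below is a minimal sketch, assuming torchvision >= 0.13; note that these are torchvision's Kinetics-400 weights, not this repository's IG-65M checkpoints:

```python
import torch
from torchvision.models.video import r2plus1d_18, R2Plus1D_18_Weights

# torchvision's R(2+1)D-18 with Kinetics-400 weights; these are torchvision's
# checkpoints, not VMZ's IG-65M-pre-trained models.
weights = R2Plus1D_18_Weights.KINETICS400_V1
model = r2plus1d_18(weights=weights).eval()

# Video clips are 5-D tensors: (batch, channels, frames, height, width).
clip = torch.randn(1, 3, 16, 112, 112)

with torch.no_grad():
    logits = model(clip)  # (1, 400) class scores over Kinetics-400
    top_class = weights.meta["categories"][logits.argmax().item()]
print(top_class)
```

For real videos, `weights.transforms()` provides the matching preprocessing (resize, crop, and normalization) expected by these weights.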
Currently, this codebase supports the following models:
- R(2+1)D models [1].
- CSN models [2] (note: the PyTorch implementation is buggy).
- R(2+1)D and CSN models pre-trained on large-scale (65 million!) weakly-supervised public Instagram videos (IG-65M) [3].
- Gradient-Blending for audio-visual modeling [4] (Caffe2 only; see the sketch after this list).
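For reference, the Gradient-Blending objective in [4] combines per-modality losses and a joint (fused) loss with scalar weights so that the stronger modality does not dominate training. The following is a minimal PyTorch sketch with hypothetical fixed weights `w_audio`, `w_visual`, and `w_joint`; the paper estimates these weights from each head's overfitting behavior, a step omitted here:

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

# Hypothetical blending weights; the paper derives them from each head's
# overfitting-to-generalization ratio. Fixed placeholders are used here.
w_audio, w_visual, w_joint = 0.3, 0.2, 0.5

def gradient_blended_loss(audio_logits, visual_logits, joint_logits, labels):
    """Weighted sum of per-head losses, in the spirit of Gradient-Blending [4]."""
    return (w_audio * criterion(audio_logits, labels)
            + w_visual * criterion(visual_logits, labels)
            + w_joint * criterion(joint_logits, labels))

# Usage with dummy logits from three classifier heads (10 classes, batch of 4).
labels = torch.randint(0, 10, (4,))
audio = torch.randn(4, 10, requires_grad=True)
visual = torch.randn(4, 10, requires_grad=True)
joint = torch.randn(4, 10, requires_grad=True)
loss = gradient_blended_loss(audio, visual, joint, labels)
loss.backward()
```

In a full model, the three logits would come from an audio-only head, a visual-only head, and a fused audio-visual head sharing the same backbone features.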
References
[1] D. Tran, H. Wang, L. Torresani, J. Ray, Y. LeCun, and M. Paluri. A Closer Look at Spatiotemporal Convolutions for Action Recognition. CVPR 2018.
[2] D. Tran, H. Wang, L. Torresani, and M. Feiszli. Video Classification with Channel-Separated Convolutional Networks. ICCV 2019.
[3] D. Ghadiyaram, M. Feiszli, D. Tran, X. Yan, H. Wang, and D. Mahajan. Large-scale Weakly-supervised Pre-training for Video Action Recognition. CVPR 2019.
[4] W. Wang, D. Tran, and M. Feiszli. What Makes Training Multi-Modal Classification Networks Hard? CVPR 2020.
Supporting Team
This codebase was supported by the Facebook AI Computer Vision team: @CHJoanna, @weiyaowang, @hengcv, @deeptigp, @dutran, and community researcher @bjuncek (Quansight, Oxford VGG).