Ishan Misra
Director, Research Scientist @ TBD Labs (Meta)
I work on computer vision and machine learning research, specifically in generative AI and representation learning. I am a Director, Research Scientist in the TBD Labs research division at Meta's SuperIntelligence group.
Previously I was at the GenAI group at Meta, where I led the research efforts on video generation models. I was the tech lead for Meta's Movie Gen project for foundation models in video generation, video editing, video personalization, and audio generation.
Prior to GenAI, I worked at FAIR in Meta on self-supervised learning in computer vision and multimodal learning.
I got my PhD at Carnegie Mellon University.
For my work in self-supervised learning, I was featured in the MIT Tech Review’s 35 innovators under 35 list (compiled globally across technological disciplines).
You can hear me on Lex Fridman’s podcast for an overview of my work.
I received CMU's Recent Alumni Achievement Award in 2024 and PAMI TC's Young Researcher Award (Honorable Mention) in 2025 for my research contributions to computer vision and machine learning.
News
2025 June
PAMI Young Researcher Award (Honorable Mention) for contributions to computer vision
2025 March
Featured in Rising Stars of AI Research by The Information.
2024 October
Research on the Movie Gen series of foundation media models announced (I was the Tech Lead for the full project). Covered in NY Times, Financial Times, Forbes.
2024 October
Giving four talks at ECCV 2024 Workshops and Tutorials on Generative Video Models
2024 September
2024 July
Mark Zuckerberg announces the release of Llama3 (with our efforts on video recognition).
2024 July
Talk at ELLIS Workshop on Open Problems in Computer Vision & Generative Modelling at Munich, Germany
2024 July
2024 March
2024 June
4 papers accepted at CVPR
2024 June
Emu Video now powers "animate" on meta.ai that converts images to videos!
2024 June
Llama3 is released!
2023 Nov
2023 May
2023 April
2022 April
Keynote talk at the Ghost Day ML Conference
2021 March
Blog post with Yann LeCun on self-supervised learning: the dark matter of intelligence
Publications
I mainly publish on video and image recognition, video and image generation, object detection/segmentation, multimodal learning, and self-supervised learning.
CAT: Content-Adaptive Image Tokenization
Junhong Shen,
Kushal Tirumala,
Michihiro Yasunaga,
Ishan Misra,
Luke Zettlemoyer,
Lili Yu*,
Chunting Zhou*
NeurIPS 2025
PDF
Generative AI
Foundation Models
Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning
Rohit Girdhar^* ,
Mannat Singh^* ,
Andrew Brown* ,
Quentin Duval* ,
Samaneh Azadi* ,
Sai Saketh Rambhatla,
Akbar Shah,
Xi Yin,
Devi Parikh,
Ishan Misra*
ECCV 2024
@inproceedings{emuvideo2023,
title={Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning},
author={Rohit Girdhar and Mannat Singh and Andrew Brown and Quentin Duval and Samaneh Azadi and Sai Saketh Rambhatla and Akbar Shah and Xi Yin and Devi Parikh and Ishan Misra},
booktitle={ECCV},
year={2024},
}
FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis
Feng Liang,
Bichen Wu,
Jialiang Wang,
Licheng Yu,
Kunpeng Li,
Yinan Zhao,
Ishan Misra,
Jia-Bin Huang,
Peizhao Zhang,
Peter Vajda,
Diana Marculescu
CVPR 2024
@inproceedings{liang2024flowvid,
title={FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis},
author={Feng Liang and Bichen Wu and Jialiang Wang and Licheng Yu and Kunpeng Li and Yinan Zhao and Ishan Misra and Jia-Bin Huang and Peizhao Zhang and Peter Vajda and Diana Marculescu},
booktitle={CVPR},
year={2024},
}
@inproceedings{wang2024instance,
title={InstanceDiffusion: Instance-level Control for Image Generation},
author={Xudong Wang and Trevor Darrell and Sai Saketh Rambhatla and Rohit Girdhar and Ishan Misra},
booktitle={CVPR},
year={2024},
}
@inproceedings{menon2024illustrated,
title={Generating Illustrated Instructions},
author={Sachit Menon and Ishan Misra and Rohit Girdhar},
booktitle={CVPR},
year={2024},
}
@inproceedings{wang2024vcutler,
title={VideoCutLER: Surprisingly Simple Unsupervised Video Instance Segmentation},
author={Xudong Wang and Ishan Misra and Ziyun Zheng and Rohit Girdhar and Trevor Darrell},
booktitle={CVPR},
year={2024},
}
The effectiveness of MAE pre-pretraining for billion-scale pretraining
Mannat Singh* ,
Quentin Duval* ,
Kalyan Vasudev Alwala* ,
Haoqi Fan,
Vaibhav Aggarwal,
Aaron Adcock,
Armand Joulin,
Piotr Dollár,
Christoph Feichtenhofer,
Ross Girshick,
Rohit Girdhar,
Ishan Misra
ICCV 2023
@inproceedings{singh2023effectiveness,
title={The effectiveness of MAE pre-pretraining for billion-scale pretraining},
author={Singh, Mannat and Duval, Quentin and Alwala, Kalyan Vasudev and Fan, Haoqi and Aggarwal, Vaibhav and Adcock, Aaron and Joulin, Armand and Doll{\'a}r, Piotr and Feichtenhofer, Christoph and Girshick, Ross and Girdhar, Rohit and Misra, Ishan},
booktitle={ICCV},
year={2023},
}
@inproceedings{rambhatla2023most,
title={MOST: Multiple Object localization with Self-supervised Transformers for object discovery},
author={Sai Saketh Rambhatla and Ishan Misra and Rama Chellappa and Abhinav Shrivastava},
booktitle={ICCV},
year={2023},
}
@inproceedings{fu2023mononerf,
title={MonoNeRF: Learning Generalizable NeRFs from Monocular Videos without Camera Poses},
author={Yang Fu and Ishan Misra and Xiaolong Wang},
booktitle={ICML},
year={2023},
}
ImageBind: One Embedding Space To Bind Them All
Rohit Girdhar* ,
Alaaeldin El-Nouby* ,
Zhuang Liu,
Mannat Singh,
Kalyan Vasudev Alwala,
Armand Joulin,
Ishan Misra*
CVPR 2023
@inproceedings{girdhar2023imagebind,
title={ImageBind: One Embedding Space To Bind Them All},
author={Girdhar, Rohit and El-Nouby, Alaaeldin and Liu, Zhuang and Singh, Mannat and Alwala, Kalyan Vasudev and Joulin, Armand and Misra, Ishan},
booktitle={CVPR},
year={2023},
}
@inproceedings{wang2023cut,
title={Cut and Learn for Unsupervised Object Detection and Instance Segmentation},
author={Wang, Xudong and Girdhar, Rohit and Yu, Stella X and Misra, Ishan},
booktitle={CVPR},
year={2023},
}
@inproceedings{zhao2022lavila,
title={Learning Video Representations from Large Language Models},
author={Zhao, Yue and Misra, Ishan and Kr{\"a}henb{\"u}hl, Philipp and Girdhar, Rohit},
booktitle={CVPR},
year={2023},
}
@inproceedings{assran2023hidden,
title={The Hidden Uniform Cluster Prior in Self-Supervised Learning},
author={Mahmoud Assran and Randall Balestriero and Quentin Duval and Florian Bordes and Ishan Misra and Piotr Bojanowski and Pascal Vincent and Michael Rabbat and Nicolas Ballas},
booktitle={ICLR},
year={2023},
}
@inproceedings{girdhar2022omnimae,
title={OmniMAE: Single Model Masked Pretraining on Images and Videos},
author={Girdhar, Rohit and El-Nouby, Alaaeldin and Singh, Mannat and Alwala, Kalyan Vasudev and Joulin, Armand and Misra, Ishan},
booktitle={CVPR},
year={2023},
}
@inproceedings{assran2022masked,
title={Masked Siamese Networks for Label-Efficient Learning},
author={Assran, Mahmoud and Caron, Mathilde and Misra, Ishan and Bojanowski, Piotr and Bordes, Florian and Vincent, Pascal and Joulin, Armand and Rabbat, Michael and Ballas, Nicolas},
booktitle={ECCV},
year={2022},
}
@inproceedings{zhou2021detecting,
title={Detecting Twenty-thousand Classes using Image-level Supervision},
author={Zhou, Xingyi and Girdhar, Rohit and Joulin, Armand and Kr{\"a}henb{\"u}hl, Philipp and Misra, Ishan},
booktitle={ECCV},
year={2022},
}
Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision
Priya Goyal,
Quentin Duval,
Isaac Seessel,
Mathilde Caron,
Ishan Misra,
Levent Sagun,
Armand Joulin,
Piotr Bojanowski
arXiv 2022
PDF
Self-supervised Learning
Image Recognition
Foundation Models
Omnivore: A Single Model for Many Visual Modalities
Rohit Girdhar* ,
Mannat Singh* ,
Nikhila Ravi* ,
Laurens van der Maaten,
Armand Joulin,
Ishan Misra*
CVPR 2022
@inproceedings{girdhar2022omnivore,
title={{Omnivore: A Single Model for Many Visual Modalities}},
author={Girdhar, Rohit and Singh, Mannat and Ravi, Nikhila and van der Maaten, Laurens and Joulin, Armand and Misra, Ishan},
booktitle={CVPR},
year={2022},
}
@inproceedings{cheng2021mask2former,
title={Masked-attention Mask Transformer for Universal Image Segmentation},
author={Bowen Cheng and Ishan Misra and Alexander G. Schwing and Alexander Kirillov and Rohit Girdhar},
booktitle={CVPR},
year={2022},
}
@inproceedings{misra2021-3detr,
title={{An End-to-End Transformer Model for 3D Object Detection}},
author={Misra, Ishan and Girdhar, Rohit and Joulin, Armand},
booktitle={{ICCV}},
year={2021},
}
@article{zhang_depth_contrast,
title={Self-Supervised Pretraining of 3D Features on any Point-Cloud},
author={Zhang, Zaiwei and Girdhar, Rohit and Joulin, Armand and Misra, Ishan},
journal={arXiv preprint arXiv:2101.02691},
year={2021},
}
@article{morgado2020avid,
title={Audio-Visual Instance Discrimination with Cross-Modal Agreement},
author={Pedro Morgado and Nuno Vasconcelos and Ishan Misra},
year={2020},
journal={arXiv preprint arXiv:2004.12943},
}
@inproceedings{morgado2021_robust_xid,
title={Robust Audio-Visual Instance Discrimination},
author={Pedro Morgado and Ishan Misra and Nuno Vasconcelos},
booktitle = {{CVPR}},
year={2021},
}
@inproceedings{zbontar_barlowtwins,
title={Barlow Twins: Self-Supervised Learning via Redundancy Reduction},
author={Jure Zbontar and Li Jing and Ishan Misra and Yann LeCun and Stephane Deny},
booktitle={ICML},
year={2021},
}
3D Spatial Recognition without Spatially Labeled 3D
Zhongzheng Ren,
Ishan Misra,
Alexander G. Schwing,
Rohit Girdhar
CVPR 2021
PDF
3D Recognition
Instance Recognition
@inproceedings{caron2020swav,
title={Unsupervised Learning of Visual Features by Contrasting Cluster Assignments},
author={Mathilde Caron and Ishan Misra and Julien Mairal and Priya Goyal and Piotr Bojanowski and Armand Joulin},
year={2020},
booktitle={NeurIPS},
}
@inproceedings{misra2020pirl,
title={Self-Supervised Learning of Pretext-Invariant Representations},
author={Misra, Ishan and van der Maaten, Laurens},
booktitle={CVPR},
year={2020},
}
@inproceedings{yan2020cluster,
title={{ClusterFit: Improving Generalization of Visual Representations}},
author={Xueting Yan and Ishan Misra and Abhinav Gupta and Deepti Ghadiyaram and Dhruv Mahajan},
booktitle={CVPR},
year={2020},
}
@inproceedings{jiang2020grid,
title={In Defense of Grid Features for Visual Question Answering},
author={Huaizu Jiang and Ishan Misra and Marcus Rohrbach and Erik Learned-Miller and Xinlei Chen},
booktitle={CVPR},
year={2020},
}
@inproceedings{kulkarni20193drel,
title={{3D-RelNet: Joint Object and Relational Network for 3D Prediction}},
author={Nilesh Kulkarni and Ishan Misra and Shubham Tulsiani and Abhinav Gupta},
booktitle={ICCV},
year={2019},
}
@inproceedings{goyal2019self,
title={{Scaling and Benchmarking Self-Supervised Visual Representation Learning}},
author={Priya Goyal and Dhruv Mahajan and Abhinav Gupta and Ishan Misra},
booktitle={ICCV},
year={2019},
}
@article{hexiang2018bison,
title={{Binary Image Selection (BISON): Interpretable Evaluation of Visual Grounding}},
author={Hu, Hexiang and Misra, Ishan and van der Maaten, Laurens},
journal={arXiv preprint arXiv:1901.06595},
year={2019},
}
@inproceedings{devries2019fairness,
title={{Does Object Recognition Work for Everyone?}},
author={Terrance DeVries and Ishan Misra and Changhan Wang and Laurens van der Maaten},
booktitle={CVPR 2019 Workshop on Computer Vision for Global Challenges},
year={2019},
}
@inproceedings {jiangmainstream18,
title = {Mainstream: Dynamic Stem-Sharing for Multi-Tenant Video Processing},
author = {Angela Jiang and Daniel L.-K. Wong and Christopher Canel and Ishan Misra and Michael Kaminsky and Michael Kozuch and Padmanabhan Pillai and David G. Andersen and Gregory Ganger},
booktitle = {{USENIX} Annual Technical Conference ({USENIX} {ATC} 18)},
year = {2018},
address = {Boston, MA},
url = {https://www.usenix.org/conference/atc18/presentation/jiang},
publisher = {{USENIX} Association},
}
@inproceedings{misra2017lba,
Author = {Ishan Misra and Ross Girshick and Rob Fergus and
Martial Hebert and Abhinav Gupta and Laurens van der Maaten},
Title = {{Learning by Asking Questions}},
Booktitle = {{CVPR}},
Year = {2018},
}
@inproceedings{debi2017cutpaste,
title={{Cut, Paste and Learn: Surprisingly Easy Synthesis for Instance Detection}},
author={Dwibedi, Debidatta and Misra, Ishan and Hebert, Martial},
booktitle={ICCV},
year={2017},
}
@inproceedings{misra2017composing,
title={{From Red Wine to Red Tomato: Composition with Context}},
author={Misra, Ishan and Gupta, Abhinav and Hebert, Martial},
booktitle={CVPR},
year={2017},
}
@inproceedings{misra2016unsupervised,
title={{Shuffle and Learn: Unsupervised Learning using Temporal Order Verification}},
author={Misra, Ishan and Zitnick, C. Lawrence and Hebert, Martial},
booktitle={ECCV},
year={2016},
}
@inproceedings{MisraNoisy16,
Author = {Ishan Misra and C. Lawrence Zitnick and Margaret Mitchell and Ross Girshick},
Booktitle = {CVPR},
Title = {{Seeing through the Human Reporting Bias: Visual Classifiers from Noisy Human-Centric Labels}},
Year = {2016},
}
@inproceedings{MisraCrossMTL16,
Author = {Ishan Misra and Abhinav Shrivastava and Abhinav Gupta and Martial Hebert},
Booktitle = {CVPR},
Title = {{Cross-stitch Networks for Multi-task Learning}},
Year = {2016},
}
@article{mostafazadeh2016generating,
title={Generating Natural Questions About an Image},
author={Mostafazadeh, Nasrin and Misra, Ishan and Devlin, Jacob and Mitchell, Margaret and He, Xiaodong and Vanderwende, Lucy},
journal={arXiv preprint arXiv:1603.06059},
year={2016},
}
@article{ferraro2016visual,
title={Visual storytelling},
author={Ferraro, Francis and Mostafazadeh, Nasrin and Misra, Ishan and Agrawal, Aishwarya and Devlin, Jacob and Girshick, Ross and He, Xiaodong and Kohli, Pushmeet and Batra, Dhruv and Zitnick, C Lawrence and Parikh, Devi and Vanderwende, Lucy and Galley, Michel and Mitchell, Margaret},
journal={arXiv preprint arXiv:1604.03968},
year={2016},
}
@inproceedings{MisraSSL15,
Author = {Ishan Misra and Abhinav Shrivastava and Martial Hebert},
Booktitle = {CVPR},
Title = {Watch and Learn: Semi-Supervised Learning of Object Detectors from Videos},
Year = {2015},
}
Applying artificial vision models to human scene understanding
Elissa Aminoff,
M. Toneva,
Abhinav Shrivastava,
Xinlei Chen,
Ishan Misra,
et al.
Journal of Frontiers in Computational Neuroscience 2015
@inproceedings{MisraExemplarSelection,
Author = {Ishan Misra and Abhinav Shrivastava and Martial Hebert},
Booktitle = {IEEE Winter Conference on Applications of Computer Vision (WACV)},
Title = {Data-driven Exemplar Model Selection},
Year = {2014},
}