Jehanzeb Mirza

Experience
- Postdoctoral Researcher - MIT (Cambridge, USA): Multi-modal learning with speech/audio, vision, and language (11.24 - present).
- Research Assistant - TU Graz (Graz, Austria): Self-supervised learning and vision-language understanding (01.21 - 10.24).
- Research Scientist Intern - Sony AI (Tokyo, Japan): Multimodal vision-language understanding (05.24 - 08.24).
- Intern - Intel (Karlsruhe, Germany): Evaluating the robustness of object detectors in degrading weather (03.19 - 08.20).
Hi, I am Jehanzeb Mirza. I am a Postdoctoral Researcher at MIT CSAIL, in the Spoken Language Systems Group, led by Dr. James Glass. I received my Ph.D. in Computer Science (Computer Vision) from TU Graz, Austria, where I was advised by Professor Horst Bischof, and Professor Serge Belongie served as an external referee.
I am particularly interested in self-supervised learning for uni-modal models and multi-modal learning for vision-language models, with a focus on improving fine-grained understanding.
I am actively looking for student collaborators in the area of multi-modal learning. Please do not hesitate to email me, even if you just want an opinion on your work! :)
Contact
- jmirza [at] mit.edu
- Office: 32-G442.
- MIT, Cambridge, USA.
Education
- Ph.D. in Computer Vision (2021 - 2024), TU Graz, Austria.
- M.S. in Electrical Engineering and Information Technology (ETIT) (2017 - 2020), KIT, Germany.
- B.S. in Electrical Engineering (2013 - 2017), NUST, Pakistan.
Recent News
10/25: Our recent ICCV work was covered by MIT News.
10/25: I have been recognized as NeurIPS 2025 Exceptional Reviewer.
09/25: 1 paper accepted at NeurIPS, 2025.
08/25: 1 paper accepted at TMLR, 2025.
07/25: 1 paper accepted at COLM, 2025.
06/25: 2 papers accepted at ICCV, 2025.
04/25: Our workshops "Long Multi-Scene Video Foundations" and "MMFM" got accepted at ICCV 2025.
03/25: Talk at EI Seminar, MIT-CSAIL.
02/25: 2 papers accepted at CVPR, 2025 (workshops).
01/25: 3 papers accepted at ICLR, 2025.
12/24: Our workshop "What's Next in Multi-Modal Foundation Models" got accepted at CVPR 2025.
11/24: I joined MIT CSAIL as a Postdoctoral Researcher.
11/24: 1 paper accepted at 3DV, 2025.
09/24: 1 paper accepted at NeurIPS, 2024.
07/24: 1 paper accepted at BMVC, 2024.
07/24: 2 papers accepted at ECCV, 2024.
04/24: I successfully defended my Ph.D. thesis.
12/23: Our workshop "What's Next in Multi-Modal Foundation Models" got accepted at CVPR 2024.
10/23: Invited talk at Cohere.
10/23: Invited talk at VIS Lab, University of Amsterdam.
9/23: 1 paper accepted at NeurIPS, 2023.
9/23: Invited talk at Center for Robotics, Paris Tech.
7/23: 1 paper accepted at ICCV, 2023.
4/23: I will be attending ICVSS 2023.
3/23: 2 papers accepted at CVPR, 2023.
2/23: Reviewing for CVPR, ICCV, and TPAMI.
3/22: 2 papers accepted at CVPR, 2022.
Selected Publications
Teaching VLMs to Localize Specific Objects from In-context Examples
Sivan Doveh,
Nimrod Shabtay,
Wei Lin,
Eli Schwartz,
Hilde Kuehne,
Raja Giryes,
Rogerio Feris,
Leonid Karlinsky,
James Glass,
Assaf Arbelle,
Shimon Ullman,
M. Jehanzeb Mirza
ICCV 2025
GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models
M. Jehanzeb Mirza,
Mengjie Zhao,
Zhuoyuan Mao,
Sivan Doveh,
Wei Lin,
Paul Gavrikov,
Michael Dorkenwald,
Shiqi Yang,
Saurav Jha,
Hiromi Wakaki,
Yuki Mitsufuji,
Horst Possegger,
Rogerio Feris,
Leonid Karlinsky,
James Glass
TMLR 2025
Are Vision Language Models Texture or Shape Biased and Can We Steer Them?
Paul Gavrikov,
Jovita Lukasik,
Steffen Jung,
Robert Geirhos,
Bianca Lamm,
M. Jehanzeb Mirza,
Margret Keuper,
Janis Keuper
ICLR 2025
Mining your Own Secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models
Saurav Jha,
Shiqi Yang,
Masato Ishii,
Mengjie Zhao,
Christian Simon,
M. Jehanzeb Mirza,
Dong Gong,
Lina Yao,
Shusuke Takahashi,
Yuki Mitsufuji
ICLR 2025
ConMe: Rethinking Evaluation of Compositional Reasoning for Modern VLMs
*Irene Huang,
*Wei Lin,
*M. Jehanzeb Mirza,
Jacob Hansen,
Sivan Doveh,
Victor Ion Butoi,
Roei Herzig,
Assaf Arbelle,
Hilde Kuehne,
Trevor Darrell,
Chuang Gan,
Aude Oliva,
Rogerio Feris,
Leonid Karlinsky (*Equal Contribution)
NeurIPS 2024
Meta-Prompting for Automating Zero-shot Visual Recognition with LLMs
M. Jehanzeb Mirza,
Leonid Karlinsky,
Wei Lin,
Sivan Doveh,
Jakub Micorek,
Mateusz Kozinski,
Hilde Kuehne,
Horst Possegger
ECCV 2024
Towards Multimodal In-Context Learning for Vision & Language Models
Sivan Doveh,
Shaked Perek,
M. Jehanzeb Mirza,
Amit Alfassy,
Assaf Arbelle,
Shimon Ullman,
Leonid Karlinsky
ECCVW 2024
LaFTer: Label-Free Tuning of Zero-shot Classifier using Language and Unlabeled Image Collections
M. Jehanzeb Mirza,
Leonid Karlinsky,
Wei Lin,
Mateusz Kozinski,
Horst Possegger,
Rogerio Feris,
Horst Bischof
NeurIPS 2023
MATE: Masked Autoencoders are Online 3D Test-Time Learners
*M. Jehanzeb Mirza,
*Inkyu Shin,
*Wei Lin,
Andreas Schriebl,
Kunyang Sun,
Jaesung Choe,
Mateusz Kozinski,
Horst Possegger,
In So Kweon,
Kuk-Jin Yoon,
Horst Bischof (*Equal Contribution)
ICCV 2023
ActMAD: Activation Matching to Align Distributions for Test-Time-Training
CVPR 2023
Video Test-Time Adaptation for Action Recognition
*Wei Lin,
*M. Jehanzeb Mirza,
Mateusz Kozinski,
Horst Possegger,
Hilde Kuehne,
Horst Bischof (*Equal Contribution)
CVPR 2023
The Norm Must Go On: Dynamic Unsupervised Domain Adaptation by Normalization
CVPR 2022