
About me
I’m a Senior Research Scientist at the Allen Institute for Artificial Intelligence (Ai2), working on general-purpose vision-language models and multimodal agents for web, code, and robotics.
I completed my PhD at UIUC with Prof. Derek Hoiem in 2020 and have continued that line of research at Ai2, pushing towards greater autonomy and agency in AI systems.
tanmayg at allenai dot org
Professional Journey
Senior Research Scientist @ Ai2
I currently lead multimodal agents research on the PRIOR team, with a focus on moving beyond AI that understands to AI that acts in digital and physical environments.
Research Scientist @ Ai2
I began my post-PhD career at Ai2 on the PRIOR team, working across robotics, multimodal reasoning, and vision-language models. Highlights include:
- CodeNav: a code-use agent that can read, write, and execute code to solve a task given a codebase. An extension of the tool-use paradigm and precursor to modern coding agents.
- Molmo and Pixmo: open weights and open data for training SOTA VLMs (CVPR 2025 Best Paper Honorable Mention)
- SPOC: a vision-language-inspired end-to-end mobile manipulation architecture and policy for real-world robots, trained entirely in simulation. (CVPR 2025)
- VisProg: a neuro-symbolic system that showcased the promise of tool-use for visual reasoning. (CVPR 2023 Best Paper)
- GPV-1 & GPV-2: general-purpose instruction-following vision-language models capable of captioning, VQA, detection, and classification with a unified transformer architecture. (CVPR 2022 Oral and ECCV 2022)
- GRIT: a benchmark for evaluating general-purpose vision systems on 7 diverse vision-language tasks across 3 dimensions: accuracy, robustness, and calibration. Used to evaluate VLMs such as Unified-IO-1 & Unified-IO-2.
Research Intern @ Nvidia
Collaborated with Arash Vahdat to develop a SOTA contrastive learning algorithm for weakly-supervised phrase grounding in images. (ECCV 2020 Spotlight)
Research Intern @ Ai2
Collaborated with Ani Kembhavi to develop one of the earliest deep learning systems for text-to-video generation. Demonstrated it by generating short clips from the animated series The Flintstones! (ECCV 2018)
PhD @ UIUC
Enjoyed working with my advisor Prof. Derek Hoiem and close collaborator Prof. Alex Schwing at UIUC. My work focused on joint representation learning for vision and language, including word embeddings from visual co-occurrences, multitask vision-language models with shared image and word representations, and human-object interaction detection models.
Research Intern @ Cornell
As an undergraduate research intern in Prof. Tsuhan Chen's lab at Cornell, I worked on point cloud registration techniques. This was the fork in the road that set me on the path to a PhD!
UG @ IIT Kanpur
While majoring in Electrical Engineering, I became interested in Computer Vision and Machine Learning early on and tailored my curriculum to include several Math and CS courses, such as Statistics, Machine Learning, Image Processing, Linear Algebra, Probability Theory, and Data Structures & Algorithms. I finished my Bachelor's degree with a thesis on "Face Detection and Tracking" under the supervision of Prof. Aditya Jagannatham.