Dhruv Batra
Co-founder and Chief Scientist, Yutori
Prev:
Senior Director
Fundamental AI Research (FAIR), Meta
Associate Professor
School of Interactive Computing, Georgia Tech
Email: username -at- domain.com
(where username = my first name, domain = this website's domain)
About me and my work
I am fascinated by the natural phenomenon of intelligence, and I work on understanding and advancing the limits of artificial intelligence (AI).
I am a co-founder and the Chief Scientist of Yutori. I have been a professor, led research teams in industry, and built open-source communities.
I was a Senior Director at Meta leading FAIR Embodied AI (AI for robotics and smart glasses). My teams:
- Developed the multimodal AI assistant that shipped in the Ray-Ban Meta SmartGlasses.
- Built Habitat, the fastest 3D simulator for training virtual robots to navigate, pick and place objects, operate around humans, and follow language instructions.
- Solved PointNav, the task of navigation to goal coordinates in unfamiliar environments without a map, both in simulation and with Boston Dynamics' Spot robot.
- Demonstrated a robotic assistant to CBS and at the White House Correspondents' Dinner.
- Built the world's first artificial superhuman fingertip.
I was a tenured Associate Professor in the School of Interactive Computing at Georgia Tech, where:
- I received the PECASE award, the highest honor bestowed by the U.S. government for early career scientists and engineers.
- I created Georgia Tech's Deep Learning class in 2017 and taught it until 2021.
- My PhD students won university-level dissertation awards in 4 out of the 8 years I spent at Georgia Tech.
- Three of those students (Aishwarya Agrawal, Abhishek Das, and Erik Wijmans) were recipients/honorable-mentions of the ACM SIGAI Doctoral Dissertation Awards.
My work has received best paper awards/nominations/honorable mentions in every area of AI:
- computer vision (Ego4D at CVPR 2022, AI Habitat at ICCV 2019),
- machine learning (Emergence of Maps at ICLR 2023),
- natural language processing (Lack of emergence of language at EMNLP 2017),
- robotics (Combining foundation models and mapping (System 1 + System 2) at ICRA 2024).
Here are some representative projects:
- Summarizing beliefs of AI agents via diverse plausible predictions/hypotheses:
Diverse Beam Search, Multiple Choice Learning, Tutorial on Diversity at CVPR '13 and CVPR '16
- Vision-and-language (or multimodal AI):
My colleagues and I developed the foundations of a new AI sub-field: new tasks, benchmarks, techniques, and models.
If the phrases Visual Question Answering (VQA), Text VQA, Visual Dialog, Audio-Video Dialog, image-text cross-modal attention, visuolinguistic pre-training, or training visual chatbots with RL sound familiar, you have heard of the titles of our papers.
- Embodied AI and robotics:
Habitat: A Platform for Embodied AI, Decentralized Distributed PPO, Embodied Question Answering, Sim2Real Predictivity, ASC: Adaptive Skill Coordination for Robotic Mobile Manipulation, LSC: Language-guided Skill Coordination for Open-Vocabulary Mobile Pick-and-Place
- Explainable, Unbiased, Trustworthy AI:
Grad-CAM (Visual Explanations), Human-vs-machine attention, Counterfactual Visual Explanations
- Platforms for reproducible AI research:
EvalAI, a platform for evaluating AI algorithms.
Essays
- Dec 2024: The term "LLM" is a misnomer
- Nov 2024: The repeated saturation of scaling
- Jan 2024: 6 Years of Questions