Viraj Prabhu
I am a research scientist on the multimodal AI team at Salesforce Research. I received my PhD in Computer Science from Georgia Tech in December 2023, where I was advised by Judy Hoffman and worked on making computer vision models generalize to new environments. I earned my Master's in CS (awarded the MS Research award) in May 2019, also at Georgia Tech, where I was advised by Devi Parikh and worked on developing visual conversational agents.
During grad school, I interned at NVIDIA (with Sanja Fidler), Salesforce (with Nikhil Naik), and Curai (with Anitha Kannan). Before that, I was a research assistant at Virginia Tech (with Dhruv Batra), a software engineer at Adobe, and a mentor for Google Summer of Code. I received my Bachelor's degree in Computer Science from BITS Pilani in 2015.
In my free time, I enjoy reading, running, soccer, and playing the guitar.
Research
WALT: Web Agents that Learn Tools
SCUBA: Salesforce Computer Use Benchmark
CoAct-1: Computer-using Agents with Coding as Actions
Trust but Verify: Programmatic VLM Evaluation in the Wild
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
We're Not Using Videos Effectively: An Updated Video Domain Adaptation Baseline
Translating Labels to Solve Annotation Mismatches Across Object Detection Datasets
AUGCAL: Sim-to-Real Adaptation by Improving Uncertainty Calibration on Augmented Synthetic Images
LANCE: Stress-testing Visual Models by Generating Language-guided Counterfactual Images
Bridging the Sim2Real gap with CARE: Supervised Detection Adaptation with Conditional Alignment and Reweighting
Battle of the Backbones: A Large-Scale Comparison of Pretrained Models across Vision Tasks
FACTS: First Amplify Correlations and Then Slice to Discover Bias
ICON2: Reliably Benchmarking Inequity in Detection by Identifying and Controlling for Confounders
Can domain adaptation make object recognition work for everyone?
Mitigating Bias in Visual Transformers via Targeted Alignment
Adapting Self-Supervised Vision Transformers by Probing Attention-Conditioned Masking Consistency
AUGCO: Augmentation Consistency-guided Self-training for Source-free Domain Adaptive Segmentation
UDIS: Unsupervised Discovery of Bias in Deep Visual Recognition Models
Selective Entropy Optimization via Committee Consistency for Unsupervised Domain Adaptation
Active Domain Adaptation via Clustering Uncertainty-weighted Embeddings
Open Set Medical Diagnosis
Few-shot Learning for Dermatological Disease Diagnosis
Do Explanations make VQA Models more Predictable to a Human?
The Promise of Premise: Harnessing Question Premises in Visual Question Answering
Evaluating Visual Conversational Agents via Cooperative Human-AI Games
Miscellaneous
Service & Recognition
- Reviewer: CVPR, NeurIPS, ICCV, ECCV, ICLR, TMLR, WACV, ACL (outstanding reviewer at NeurIPS 2021, CVPR 2021)
- Workshop organizer: EMACS (CVPR '25), L2ID (ECCV '22)
- Teaching Assistant: Computer Vision (Spring '21), Deep Learning (Fall '19), Machine Learning (Fall '17)
Talks & Media
- Invited talks: Towards Reliable Computer Vision, at Caltech, AWS, UC Berkeley, CMU (2023-2024)
- Speaker: Human-Centered AI Tutorial (CVPR 2022), Industry Research Panel (CVPR 2025)
- News coverage: CoAct-1 (VentureBeat), BLIP-3 (VentureBeat), PACMAC & LANCE (GT News)
Projects & Software
- Agentic AI for Flow: Salesforce blog
- Fabrik: Neural network IDE (code)
- KeyframeCut: Adobe video segmentation tool
- Visual Dialog RL: PyTorch implementation





