Vivek Ramanujan
I am currently a PhD student at the University of Washington working with Ali Farhadi and Ludwig Schmidt on problems related to robust machine learning. Previously, I was a predoctoral researcher in the PRIOR (vision) group at the Allen Institute for Artificial Intelligence (AI2), where I was advised by Mohammad Rastegari and Aniruddha Kembhavi.
Research
I'm broadly interested in computer vision, machine learning, and optimization. See my Google Scholar for a consistently up-to-date publication list.
* denotes equal contribution
When Worse is Better: Navigating the Compression-Generation
Tradeoff in Visual Tokenization
Vivek Ramanujan, Kushal Tirumala, Armen
Aghajanyan, Luke Zettlemoyer, Ali Farhadi
arXiv, 2024
We challenge the assumption that better image reconstruction leads
to better generation in two-stage image generation models. We
introduce Causally Regularized Tokenization (CRT), which optimizes
the compression-generation trade-off by incorporating stage 2
generation knowledge into stage 1 training. Despite worse
reconstruction, CRT achieves state-of-the-art ImageNet generation
(2.18 FID) with 2-3× improved compute efficiency, using fewer tokens
and parameters than previous methods.
From an Image to a Scene: Learning to Imagine the World from a
Million 360 Videos
Matthew Wallingford, Anand Bhattad, Aditya Kusupati,
Vivek Ramanujan, Matt Deitke, Sham
Kakade, Aniruddha Kembhavi, Roozbeh Mottaghi, Wei-Chiu Ma, Ali
Farhadi
In Proceedings at NeurIPS, 2024
We introduce 360-1M, a large-scale 360-degree video dataset, and
Odin, a diffusion-based model for novel view synthesis. By
leveraging the largest real-world, multi-view dataset to date, Odin
can generate novel views of real-world scenes and infer scene
geometry and layout, showing improved performance on standard view
synthesis and 3D reconstruction benchmarks.
The Unmet Promise of Synthetic Training Images: Using Retrieved
Real Images Performs Better
Scott Geng, Cheng-Yu Hsieh,
Vivek Ramanujan, Matthew Wallingford,
Chun-Liang Li, Pang Wei Koh, Ranjay Krishna
In Proceedings at NeurIPS, 2024
We investigate the effectiveness of synthetic images for training
vision models by comparing them against retrieved real images from
the generator's training data (LAION-2B). Our findings show that
while synthetic data can be beneficial, it is consistently matched
or outperformed by real images from a simple retrieval baseline,
partly due to generator artifacts and inaccurate visual details in
synthetic images.
On the Connection between Pre-training Data Diversity and
Fine-tuning Robustness
Vivek Ramanujan*, Thao Nguyen*, Sewoong
Oh, Ludwig Schmidt, Ali Farhadi
(Spotlight) In Proceedings at NeurIPS, 2023
We investigate how pre-training data properties affect the
robustness of fine-tuned models. Through extensive experiments
across natural and synthetic datasets, we find that data quantity is
the primary factor influencing downstream robustness, while other
factors like label space, semantics, and image diversity have
limited impact. We demonstrate this using the iWildCam-WILDS
distribution shift benchmark, showing that even significant changes
to pre-training class distribution don't affect robustness when
total data quantity is preserved.
DataComp: In Search of the Next Generation of Multimodal Datasets
Samir Yitzhak Gadre*, Gabriel Ilharco*, Alex Fang*, (many more
important authors), Vivek Ramanujan,
(many more important authors), Vaishaal Shankar, Ludwig Schmidt
In Proceedings at NeurIPS (Datasets and Benchmarks Track), 2023
We introduce DataComp, a benchmark for multimodal dataset creation
with a candidate pool of 12.8B image-text pairs. Our testbed enables
systematic evaluation of dataset design choices through standardized
CLIP training and evaluation on 38 downstream tasks. Our best
baseline, DataComp-1B, achieves 79.2% zero-shot ImageNet accuracy
with CLIP ViT-L/14, surpassing OpenAI's CLIP by 3.7 percentage points.
Neural Priming for Sample-Efficient Adaptation
Matthew Wallingford*, Vivek Ramanujan*,
Alex Fang, Aditya Kusupati, Roozbeh Mottaghi, Aniruddha Kembhavi,
Ludwig Schmidt, Ali Farhadi
In Proceedings at NeurIPS, 2023
We introduce Neural Priming, a technique that enables large
pretrained models to adapt to distribution shifts and downstream
tasks with minimal labeled data. By recalling and conditioning on
relevant pretraining data when presented with class names or
unlabeled samples, Neural Priming achieves significant improvements
across various benchmarks: 2.45% on ImageNet zero-shot, 3.81% on
transfer learning tasks, and 1.41% on ImageNetV2 using test-time
adaptation.
Neural Radiance Field Codebooks
Matthew Wallingford, Aditya Kusupati, Alex Fang,
Vivek Ramanujan, Aniruddha Kembhavi,
Roozbeh Mottaghi, Ali Farhadi
International Conference on Learning Representations, 2023
We introduce Neural Radiance Field Codebooks (NRC), a method for
learning object-centric representations through novel view
reconstruction. NRC learns to reconstruct scenes using a dictionary
of object codes decoded through a volumetric renderer, enabling
discovery of recurring visual and geometric patterns. We
demonstrate superior performance in object navigation, unsupervised
segmentation, and depth ordering tasks across both synthetic and
real scenes.
Matryoshka Representations for Adaptive Deployment
Aditya Kusupati, Gantavya Bhatt, Aniket Rege, Matthew Wallingford,
Aditya Sinha, Vivek Ramanujan, William
Howard-Snyder, Kaifeng Chen, Sham Kakade, Prateek Jain, Ali Farhadi
In Proceedings at NeurIPS, 2022
We introduce Matryoshka Representation Learning (MRL), a method for
learning flexible representations that can adapt to multiple
downstream tasks with varying computational resources. MRL encodes
information at different granularities, allowing a single embedding
to adapt to computational constraints without additional inference
cost. We demonstrate significant improvements in efficiency and
accuracy across various tasks and modalities, including up to 14×
smaller embedding sizes for ImageNet classification and retrieval.
LLC: Accurate, Multi-Purpose Learnt Low-Dimensional Binary Codes
Aditya Kusupati, Matthew Wallingford,
Vivek Ramanujan, Raghav Somani, Jae Sung
Park, Krishna Pillutla, Prateek Jain, Sham Kakade, Ali Farhadi
Advances in Neural Information Processing Systems (NeurIPS), 2021
We propose a novel method for learning low-dimensional binary codes
for instances and classes without requiring side-information. Our
method learns extremely low-dimensional binary codes (~20 bits for
ImageNet-1K) while maintaining near-optimal classification accuracy.
The codes capture intrinsic data features, enabling efficient image
retrieval and out-of-distribution detection tasks.
Forward Compatible Training for Representation Learning
Vivek Ramanujan, Pavan Kumar Anasosalu
Vasu, Ali Farhadi, Oncel Tuzel, Hadi Pouransari
In Proceedings at CVPR, 2022
In real-world visual retrieval systems, the embedding model is
consistently updated. This requires embeddings for all images in the
gallery to be recomputed for every new model, an expensive process
known as backfilling. We present a method for forward compatible
training (FCT) in which we prepare for the future version of a model
by saving cheap auxiliary information about the present training
task. We show empirically that this improves model
compatibility on common large-scale datasets (ImageNet, Places-365,
VGGFace2).
Effects of Parameter Norm Growth During Transformer Training:
Inductive Bias from Gradient Descent
Will Merrill, Vivek Ramanujan, Yoav
Goldberg, Roy Schwartz, Noah Smith
(Oral) In Proceedings at EMNLP, 2022
The capacity of neural networks like the widely adopted transformer
is known to be very high. Evidence is emerging that they learn
successfully due to inductive bias in the training routine,
typically a variant of gradient descent (GD). As the parameters grow
in magnitude, we prove that the network approximates a discretized
network with saturated activation functions. Such "saturated"
networks are known to have a reduced capacity compared to the full
network family that can be described in terms of formal languages
and automata. Our results suggest saturation is a new
characterization of an inductive bias implicit in GD of particular
interest for NLP. We leverage the emergent discrete structure in a
saturated transformer to analyze the role of different attention
heads, finding that some focus locally on a small number of
positions, while other heads compute global averages, allowing
counting.
Supermasks in Superposition
Mitchell Wortsman*, Vivek Ramanujan*,
Rosanne Liu, Aniruddha Kembhavi, Mohammad Rastegari, Jason Yosinski,
Ali Farhadi
In Proceedings at NeurIPS, 2020
We present an application of hidden networks for continual learning,
capable of learning thousands of tasks without catastrophic
forgetting. We solve tasks individually, each solution corresponding
to a subnetwork of a randomly initialized neural network. Using a
superposition of these subnetworks, we demonstrate the
viability of this model for task inference. Finally, we introduce a
coherent hierarchy for continual learning problems.
Soft Threshold Weight Reparameterization for Learnable Sparsity
Aditya Kusupati, Vivek Ramanujan, Raghav
Somani, Mitchell Wortsman, Prateek Jain, Sham Kakade, Ali Farhadi
International Conference on Machine Learning, 2020
We introduce a new strategy for pruning neural networks based on
the soft threshold reparametrization technique from signal
processing. The layerwise sparsity budgets allow for very sparse but
still highly performant trained models across a variety of
architectures and tasks.
What's Hidden in a Randomly Weighted Neural Network?
Vivek Ramanujan*, Mitchell Wortsman*,
Aniruddha Kembhavi, Ali Farhadi, Mohammad Rastegari
Computer Vision and Pattern Recognition, 2020
We demonstrate that you can find untrained subnetworks of common
overparametrized convolutional neural networks
at initialization that achieve performance similar to their
densely trained counterparts.
Improving Shape Deformation in Unsupervised Image-to-Image
Translation
Aaron Gokaslan, Vivek Ramanujan,
Kwang-In Kim, Daniel Ritchie, James Tompkin
European Conference on Computer Vision, 2018
We improve on CycleGAN by allowing for better shape deformation
between more disparate domains.
Service
Reviewer
CVPR 2025
Reviewer
ICLR 2024
Reviewer
NeurIPS 2024
Teaching Assistant
Computer Vision CS146 Spring 2018
Teaching Assistant
Machine Learning CS142 Spring 2018
Teaching Assistant
Applied Artificial Intelligence CS141 Spring 2017
Teaching Assistant
Deep Learning CS2951K, Fall 2016
Reaction Game
Link
Brown Noise
Link