Vatsal Sharan
Assistant Professor,
Thomas Lord Department of Computer Science,
University of Southern California
Email: vsharan at usc dot edu
Office: GCS 402P
About
I'm an assistant professor of computer science at USC, where I am part of the Theory Group, the Machine Learning Center, and the Center for AI in Society.
Previously, I was a postdoc at MIT hosted by Ankur Moitra. I obtained my Ph.D. from Stanford advised by Greg Valiant.
I work on the foundations of machine learning, and my interests lie mostly at the intersection of machine learning, theoretical computer science, and statistics. The goal of my research is to study and discover the underlying principles which govern learning, and to leverage this understanding to build practical machine learning systems which are more efficient, fair, and robust. A large part of my work examines questions which arise from modern applications and challenges of machine learning. If you're interested in learning more, some of my representative work is highlighted below.
My research is supported by the NSF CAREER award, Amazon Research Awards, the Google Research Scholar Award and the Okawa Research Award. This support is very gratefully acknowledged.
I'm also a part of the Learning Theory Alliance (LeT-All), a community building and mentorship initiative for the learning theory community.
Here is my CV, which has a more complete list of activities.
Some recent preprints:
- Limitations on Safe, Trusted, Artificial General Intelligence
- Latent Concept Disentanglement in Transformer-based Language Models
- FoNE: Precise Single-Token Number Embeddings via Fourier Features
- Textual Steering Vectors Can Improve Visual Understanding in Multimodal Large Language Models
Some Representative Publications | All Publications
Using algorithms to understand Transformers, and using Transformers to understand algorithms
Theory can provide a bird's eye view of the landscape of information and computation, and of how they interact in learning problems. Can we use some of this computational and information-theoretic understanding to understand Transformers? On the flip side, can we use Transformers to explore this landscape, and to understand and discover algorithms and data structures? Here is a talk which covers some of this work (slides); a toy illustration of one of these findings follows the list below.
- Discovering Data Structures: Nearest Neighbor Search and Beyond
- Transformers Learn Low Sensitivity Functions: Investigations and Implications
- Transformers Learn to Achieve Second-Order Convergence Rates for In-Context Linear Regression
- Pre-trained Large Language Models Use Fourier Features to Compute Addition
- Mitigating Simplicity Bias in Deep Learning for Improved OOD Generalization and Robustness
- One Network Fits All? Modular versus Monolithic Task Formulations in Neural Networks Learning
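To give a flavor of the Fourier-features finding, here is a minimal toy sketch (my own illustration, not the experimental setup of the papers above): if an integer n is represented by cos and sin of 2πn/T for a few periods T, then adding two numbers corresponds to a rotation (angle addition) in feature space, with no digit-by-digit carrying needed.

```python
import numpy as np

# Toy illustration: encode an integer n as [cos(2*pi*n/T), sin(2*pi*n/T)]
# across a few (hypothetical) periods T.
PERIODS = [2, 5, 10, 100]

def fourier_encode(n: int) -> np.ndarray:
    angles = np.array([2 * np.pi * n / T for T in PERIODS])
    return np.concatenate([np.cos(angles), np.sin(angles)])

def add_in_feature_space(enc_a: np.ndarray, enc_b: np.ndarray) -> np.ndarray:
    """Addition becomes rotation: apply the angle-sum identities to each
    (cos, sin) pair, so the result equals the encoding of a + b."""
    k = len(PERIODS)
    ca, sa = enc_a[:k], enc_a[k:]
    cb, sb = enc_b[:k], enc_b[k:]
    return np.concatenate([ca * cb - sa * sb, sa * cb + ca * sb])

a, b = 37, 48
assert np.allclose(fourier_encode(a + b),
                   add_in_feature_space(fourier_encode(a), fourier_encode(b)))
print("Encodings of 37 and 48 combine by rotation into the encoding of 85.")
```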
A multi-group perspective to go beyond loss minimization in ML
Minimizing some loss function on average across all datapoints is the dominant paradigm in ML, but applications of ML in societal systems often involve more complex considerations. Different individuals participating in the system may have their own loss functions, it may not be possible to make decisions for these individuals in isolation, and we may care about the model's behavior on various groups of individuals and not just on average across all of them. Some of my work here uses a multigroup perspective --- which examines the model's predictions on a large number of groups within the population --- to provide solutions to some of the above problems. Here is a talk which covers some of this work (slides); a small illustrative sketch of the multigroup perspective follows the list below.
- Improved Bounds for Swap Multicalibration and Swap Omniprediction
- When is Multicalibration Post-Processing Necessary?
- Stability and Multigroup Fairness in Ranking with Uncertain Predictions
- Omnipredictors
- Simultaneous Swap Regret Minimization via KL-Calibration
- Optimal Multiclass U-Calibration Error and Beyond
- Fairness in Matching under Uncertainty
- KL Divergence Estimation with Multi-group Attribution
- Multicalibrated Partitions for Importance Weights
Memory as a lens to understand efficient learning and optimization
Classical learning theory mainly focuses on the number of operations performed by an algorithm as the proxy for its running time. However, since growth in available processing power has outpaced growth in available memory by many orders of magnitude, memory rather than compute has become the primary bottleneck in many applications. Despite this, and even though memory has traditionally been one of the most fundamental computational resources in theoretical computer science, very little is known about the role of memory in solving learning tasks. My work investigates the role of memory in learning, and whether memory could be a useful discerning factor that provides a clean separation between 'efficient' and 'expensive' techniques. Here is a talk which covers some of this work (slides); a small illustrative sketch follows the list below.
- Efficient Convex Optimization Requires Superlinear Memory
- Big-Step-Little-Step: Efficient Gradient Methods for Objectives with Multiple Scales
- Memory-Sample Tradeoffs for Linear Regression with Small Error
- NeuroSketch: A Neural Network Method for Fast and Approximate Evaluation of Range Aggregate Queries
- Compressed Factorization: Fast and Accurate Low-Rank Factorization of Compressively-Sensed Data (ICML 2019)
- Efficient Anomaly Detection via Matrix Sketching
- Moment-Based Quantile Sketches for Efficient High Cardinality Aggregation Queries
- Prediction with a Short Memory
- Sketching Linear Classifiers over Data Streams
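To make the memory constraint concrete, here is a small hypothetical sketch in the spirit of sketching linear classifiers over data streams (though not the specific algorithm from the paper above): hash each high-dimensional example into a fixed-size sketch and run online logistic regression on the sketch, so the learner's state stays bounded no matter how large the feature space is.

```python
import numpy as np

class SketchedLogisticClassifier:
    """Online logistic regression whose memory is fixed at `sketch_dim` floats,
    regardless of the true feature dimension: each sparse example is hashed
    ("feature hashing") into the sketch before the gradient step.
    Illustrative only -- not the algorithm from any specific paper above."""

    def __init__(self, sketch_dim=1024, lr=0.1, seed=0):
        self.w = np.zeros(sketch_dim)
        self.lr = lr
        self.d = sketch_dim
        self.seed = seed

    def _hash(self, sparse_x):
        # sparse_x: dict {feature_index: value}; signed hashing keeps estimates unbiased.
        z = np.zeros(self.d)
        for idx, val in sparse_x.items():
            h = hash((self.seed, idx))
            z[h % self.d] += (1 if (h >> 1) % 2 == 0 else -1) * val
        return z

    def partial_fit(self, sparse_x, y):           # y in {0, 1}
        z = self._hash(sparse_x)
        p = 1.0 / (1.0 + np.exp(-self.w @ z))
        self.w -= self.lr * (p - y) * z           # logistic-loss gradient step

    def predict_proba(self, sparse_x):
        z = self._hash(sparse_x)
        return 1.0 / (1.0 + np.exp(-self.w @ z))

# A stream over a huge feature space, processed with ~1k floats of state.
clf = SketchedLogisticClassifier()
clf.partial_fit({10**9: 1.0, 42: 2.0}, y=1)
print(clf.predict_proba({10**9: 1.0, 42: 2.0}))
```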
Regularization and learning with limited (or synthetic) data
Modern ML techniques are data-hungry, but data is an expensive resource. Current ML applications have brought to light several interesting information-theoretic questions around learning with limited data. Some of my work provides a formulation for studying the basic statistical task of synthetically augmenting a given set of samples, and explores notions of regularization --- a classical technique for data-efficient learning whose role remains mysterious in deep learning. We are also developing principles to explain differences in the statistical behavior and inductive biases of common optimization algorithms. A toy illustration of the sample amplification question follows the list below.
- The Rich and the Simple: On the Implicit Bias of Adam and SGD
- Proper Learnability and the Role of Unlabeled Data
- Regularization and Optimal Multiclass Learning
- On the Statistical Complexity of Sample Amplification (Annals of Statistics, 2024)
- Transductive Sample Complexities Are Compact
- Open Problem: Can Local Regularization Learn All Multiclass Problems? (Open Problem @ COLT 2024)
- Sample Amplification: Increasing Dataset Size even when Learning is Impossible
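As a toy instance of the sample amplification question (my own illustration, not a method from the papers above): given n samples from an unknown Gaussian, output n + m samples that are hard to distinguish from n + m genuine i.i.d. draws. A naive baseline simply fits the Gaussian and draws the extra points from the fit; the papers above study when, and by how much, such amplification is information-theoretically possible.

```python
import numpy as np

def amplify_gaussian(samples: np.ndarray, m: int, rng=None) -> np.ndarray:
    """Toy 'sample amplifier' for a univariate Gaussian with unknown mean and
    variance: return the n original samples plus m extra points drawn from the
    fitted Gaussian. (A naive baseline for illustration only.)"""
    rng = rng or np.random.default_rng()
    mu, sigma = samples.mean(), samples.std(ddof=1)
    extra = rng.normal(mu, sigma, size=m)
    return np.concatenate([samples, extra])

rng = np.random.default_rng(0)
original = rng.normal(3.0, 2.0, size=100)      # n = 100 genuine samples
amplified = amplify_gaussian(original, m=20, rng=rng)
print(len(original), "->", len(amplified))
```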
Students
I am very lucky to advise a wonderful group of students, including the following Ph.D. students:
- Siddartha Devic (co-advised with Aleksandra Korolova)
- Bhavya Vasudeva
- Julian Asilis
- Deqing Fu (co-advised with Robin Jia)
- Devansh Gupta (co-advised with Meisam Razaviyayn)
- Spandan Senapati (co-advised with Haipeng Luo)
- Tianyi Zhou (co-advised with Robin Jia)
- Anish Jayant
- Woody Gan
- Kuan Liu
- Nathan Derhake
- Jung Whan Lee (graduated in 2025)
- Dutch Hansen (graduated in 2025, now Ph.D. student at the University of Washington)
- Natalie Abreu (graduated in 2023, now Ph.D. student at Harvard)
- Aditya Prasad (graduated in 2024, now Ph.D. student at the University of Chicago)
- Kameron Shahabi (graduated in 2024, now Ph.D. student at the University of Washington)
- Qilin Ye (joint with Robin Jia, graduated in 2024, now M.S. student at Duke)
- Devin Martin (SURE program intern in Summer'22)
- You Qi Huang (SURE program intern in Summer'23)
Teaching
- CSCI 699: Trustworthy Machine Learning, from an Optimization Lens (Fall 2025)
- CSCI 567: Machine Learning (Spring 2024)
- CSCI 699: Theory of Machine Learning (Fall 2023)
- CSCI 699: Seminar on Computational Perspectives on the Frontiers of ML (Spring 2023)
- CSCI 567: Machine Learning (Fall 2022)
- CSCI 699: Theory of Machine Learning (Fall 2021)