Hi! I am a second-year Ph.D. student in the Computer Science Department at the University of Chicago, advised by Prof. Tian Li. My broad research interests are in machine learning and optimization. I am excited about performant, efficient, and principled learning algorithms, including optimizers, objectives, and model architectures.
Previously, during my M.S. in Machine Learning at Carnegie Mellon University, I studied tradeoffs between the primary objective and other desirable properties of machine learning models, such as differential privacy and robustness. Before that, I received my B.S. in Data Science from Duke Kunshan University and Duke University.
I am always happy to exchange research perspectives and connect with people. If you are interested, please feel free to send me an email!
Classic zeroth-order optimization approaches typically optimize a smoothed version of the original function, i.e., the expected objective under randomly perturbed model parameters. This can be interpreted as encouraging the loss values in the perturbation set to be small on average. Popular sharpness-aware minimization (SAM) objectives, however, typically focus on the largest loss within the neighborhood to arrive at flat minima more effectively. In this work, we explicitly connect zeroth-order optimization (and its corresponding objectives) with SAM approaches through an exponential tilting objective that provides a smooth transition between the average- and max-loss formulations. We explore new zeroth-order algorithms to solve a soft SAM objective parameterized by a tilting parameter. We provide precise characterizations of the sharpness notions of the tilted SAM framework. Practically, our approach can be used as a gradient-free and memory-efficient alternative to SAM variants, and it achieves better generalization compared to vanilla zeroth-order baselines on a wide range of downstream tasks, including classification, multiple-choice QA, and language generation.
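The interpolation between the average-loss and max-loss formulations can be illustrated with a log-mean-exp aggregate over perturbation losses. This is a minimal numerical sketch of the tilting idea, not the paper's exact estimator; the function name and finite-sample form are my own assumptions.

```python
import numpy as np

def tilted_loss(losses, t):
    """Exponentially tilted aggregate of per-perturbation losses.

    As t -> 0 this recovers the mean (the smoothed zeroth-order
    objective); as t -> inf it approaches the max (a SAM-style
    worst-case loss). Illustrative sketch only.
    """
    losses = np.asarray(losses, dtype=float)
    if t == 0:
        return losses.mean()
    # log-mean-exp, computed stably by factoring out the max
    m = losses.max()
    return m + np.log(np.exp(t * (losses - m)).mean()) / t

# Three hypothetical losses at perturbed parameter points
losses = [0.2, 0.5, 1.0]
```

With a small tilt, `tilted_loss(losses, t)` stays near the mean (~0.567); with a large tilt it approaches the worst loss (1.0), so a single scalar `t` sweeps between the two regimes.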
Private Zeroth-Order Optimization with Public Data
One of the major bottlenecks to deploying popular first-order differentially private (DP) machine learning algorithms (e.g., DP-SGD) is their high computation and memory cost, despite the existence of optimized implementations. Zeroth-order methods hold promise for mitigating this overhead: they approximate gradients from function evaluations alone and are hence significantly easier to privatize. While recent works have explored zeroth-order approaches in both private and non-private settings, they still suffer from relatively low utility compared with DP-SGD and limited application domains. In this work, we propose to leverage public information to guide and improve the gradient approximation of private zeroth-order algorithms. We explore a suite of public-data-assisted zeroth-order optimizers (PAZO) with minimal overhead. We provide theoretical analyses of the PAZO framework under an assumption of similarity between public and private data. Empirically, we demonstrate that PAZO achieves stronger privacy/utility tradeoffs across vision and text tasks in both pre-training and fine-tuning regimes, outperforming the best first-order baselines (with public gradients) especially in highly private regimes, while also offering runtime speedups.
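The core primitive here, approximating a gradient from function evaluations only, can be sketched with a standard two-point estimator. Passing a direction derived from public data is one way public information could guide the estimate; this is an illustrative sketch under those assumptions, not the PAZO algorithm itself, and all names are hypothetical.

```python
import numpy as np

def zo_grad_estimate(loss_fn, w, direction=None, mu=1e-3, rng=None):
    """Two-point zeroth-order gradient estimate along one direction.

    If `direction` is None, a random Gaussian direction is sampled
    (the vanilla estimator); a caller could instead pass a direction
    computed from public data. Illustrative sketch only.
    """
    rng = rng or np.random.default_rng(0)
    if direction is None:
        direction = rng.standard_normal(w.shape)
    direction = direction / np.linalg.norm(direction)
    # Finite difference of two loss evaluations -- no backprop needed,
    # which is what makes the method easy to privatize and memory-light.
    g = (loss_fn(w + mu * direction) - loss_fn(w - mu * direction)) / (2 * mu)
    return g * direction
```

For a quadratic loss, probing along the true gradient direction recovers the gradient exactly, which hints at why a well-chosen (e.g., public-data-informed) direction can sharply reduce estimation error versus a random one.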
Unraveling the Complexities of Simplicity Bias: Mitigating and Amplifying Factors
Xuchen Gong* and Tianwen Fu*
NeurIPS Workshop on Mathematics of Modern Machine Learning, 2023
The success of neural networks depends on their generalization ability, yet Shah et al. conclude that an inherent bias towards simplistic features, a phenomenon called Simplicity Bias, hurts generalization by preferring simple but noisy features over complex yet predictive ones. We aim to understand the scenarios in which simplicity bias occurs more severely and the factors that help mitigate its effects. We show that many traditional insights, such as increasing the training set size or the number of informative feature dimensions, are not as effective as balancing the modes of the data distribution, distorting the simplistic features, or even searching for a good initialization. Our empirical results reveal intriguing factors behind simplicity bias, and we call for future investigations toward a more thorough understanding of simplicity bias and its interplay with related fields.