Hello, I'm Clayton.
Welcome to my personal webpage. I am a Research Scientist at Google based in San Francisco, working on LLM data and modeling. I completed a Ph.D. in Computer Science at Columbia University in 2024 under the supervision of Rocco Servedio and Daniel Hsu.
My Resume (short-form)
My CV (long-form)
About Me
I'm a San Francisco-based researcher working at the intersection of machine learning and theoretical computer science. My research is broadly motivated by a desire to improve the interpretability, transparency, and accountability of neural networks by understanding their mathematical properties. My Ph.D. research was funded by an NSF GRFP fellowship, which I received in March 2021.
Before starting my Ph.D. at Columbia, I studied applied math and computer science as an undergrad at Brown and worked as a data scientist at LinkedIn. I've completed internships at LinkedIn (data science), Lumi Labs (engineering at a 15-person startup), the Allen Institute for AI (climate modeling), Microsoft Research (transformer theory), and Google Research (transformers and graph reasoning).
My work
Most of my research consists of mathematical results about the capabilities and limitations of neural networks and other machine learning algorithms. In particular, I study:
- the fundamental limitations of architectures like random feature models, deep neural networks, and self-attention units;
- the generalization and optimization properties of machine learning models like maximum-margin linear classifiers, neural nets trained with gradient descent on low-dimensional data, randomly initialized recurrent neural networks, and minimum-norm interpolating neural networks;
- the empirical and theoretical abilities of transformer models trained to solve combinatorial planning tasks (work in progress from a Summer 2023 internship at Microsoft Research NYC).
I applied machine learning techniques to climate modeling research during an internship at the Allen Institute for AI (AI2) in Summer 2022; my contributions earned me recognition as an Outstanding Intern.
As an undergraduate, I completed a variety of research projects on mathematical modeling (as a winner of the COMAP Interdisciplinary Contest for Modeling), dynamical systems (with Bjorn Sandstede), molecular biology (with William Fairbrother), and machine learning theory (with Eli Upfal).
I primarily code in Python and am experienced with the core data science and deep learning packages (e.g., PyTorch, TensorFlow, scikit-learn, and pandas). I programmed in Java and Scala during my undergraduate years and while working at LinkedIn and Lumi Labs. The climate modeling repository I contributed to while interning at AI2 is publicly available, and my Microsoft Research code will be published once we upload our preprint.
Teaching experience
I TA'd five courses at Brown: accelerated intro to CS, discrete math, CS theory, intro to dynamical systems, and algorithms (for which I was Head TA).
I was a graduate TA for courses on computational learning theory with Rocco Servedio, natural and artificial neural networks with Christos Papadimitriou and John Morrison, and ML and climate with Alp Kucukelbir. Natural and artificial neural networks was a new course, for which I developed a lab component from scratch in collaboration with Sam Deng; all materials are available online.
As a graduate student, I coordinated undergraduate seminars on deep learning theory during Summer 2021, Fall 2021, and Spring 2023.
Departmental and academic service
I served as a reviewer for ICLR 2024, NeurIPS 2023, JMLR, SODA 2023, and STOC 2022.
I was a Ph.D. representative alongside the great Tim Randolph; together we brought computer science Ph.D. students' interests and concerns to CS faculty and administrators.
In the Columbia CS theory group, I coordinated the student retreat in Fall 2021 and Fall 2022, and I started (but no longer run) the CS Theory Student Seminar.
I led qSTEM, an organization for LGBTQ+ students in the Columbia School of Engineering and Applied Science.
Find me online:
[linkedin] [github] [arxiv] [google scholar] [dblp]
I am indebted to many fantastic mentors over the years: teachers, professors, advisors, coworkers, and friends. I am always happy to pay it forward and chat with anyone interested in learning more about Ph.D. programs in ML and/or theory, Columbia, and all things NYC or Bay Area.
My Papers
Neural Networks
[MSWE25] Alireza Mousavi-Hosseini, Clayton Sanford, Denny Wu, Murat A. Erdogdu. "When Do Transformers Outperform Feedforward and Recurrent Networks? A Statistical Perspective." Preprint. [arxiv]
[YSB+25] Gilad Yehudai, Clayton Sanford, Maya Bechler-Speicher, Orr Fischer, Ran Gilad-Bachrach, Amir Globerson. "Depth-Width tradeoffs in Algorithmic Reasoning of Graph Tasks with Transformers." Preprint. [arxiv]
[BPK+25] Ali Behrouz, Ali Parviz, Mahdi Karami, Clayton Sanford, Bryan Perozzi, Vahab Mirrokni. "Best of Both Worlds: Advantages of Hybrid Graph Sequence Models." ICML 2025. [arxiv]
[SHT24b] Clayton Sanford, Daniel Hsu, Matus Telgarsky. "One-layer transformers fail to solve the induction heads task." Note. [arxiv]
[SFH+24] Clayton Sanford, Bahare Fatemi, Ethan Hall, Anton Tsitsulin, Mehran Kazemi, Jonathan Halcrow, Bryan Perozzi, Vahab Mirrokni. "Understanding Transformer Reasoning Capabilities via Graph Algorithms." NeurIPS 2024. [arxiv] [Flatiron slides]
[Sanford24] Clayton Sanford. "Representational Capabilities of Feed-forward and Sequential Neural Architectures." Doctoral Thesis. 2024. [thesis] [defense slides]
[SHT24a] Clayton Sanford, Daniel Hsu, Matus Telgarsky. "Transformers, Parallel Computation, and Logarithmic Depth." ICML 2024. [arxiv] [Google slides] [Columbia slides]
[SHT23] Clayton Sanford, Daniel Hsu, Matus Telgarsky. "Representational Strengths and Limitations of Transformers." NeurIPS 2023. [paper] [arxiv] [UCSD seminar slides] [Google NYC algorithms seminar slides] [Columbia StatML workshop poster] [Columbia StatML workshop slides]
[AHS23] Navid Ardeshir*, Daniel Hsu*, Clayton Sanford*. "Intrinsic dimensionality and generalization properties of the R-norm inductive bias." COLT 2023. [paper] [arxiv] [blog post] [COLT poster]
[BBSS22] Alberto Bietti*, Joan Bruna*, Clayton Sanford*, Min Jae Song*. "Learning single-index models with shallow neural networks." NeurIPS 2022. [paper] [NeurIPS poster] [arxiv]
[CPSS22] Vaggos Chatziafratis*, Ioannis Panageas*, Clayton Sanford*, Stelios Stavroulakis*. "On Scrambling Phenomena for Randomly Initialized Recurrent Networks." NeurIPS 2022. [paper] [arxiv]
[HSSV22] Daniel Hsu*, Clayton Sanford*, Rocco Servedio*, Emmanouil-Vasileios Vlatakis-Gkaragkounis*. "Near-Optimal Statistical Query Lower Bounds for Agnostically Learning Intersections of Halfspaces with Gaussian Marginals." COLT 2022. [paper] [arxiv] [NYU seminar slides] [conference talk]
[SC22] Clayton Sanford, Vaggos Chatziafratis. "Expressivity of Neural Networks via Chaotic Itineraries beyond Sharkovsky's Theorem." AISTATS 2022. [paper] [arxiv] [conference talk]
[ASH21] Navid Ardeshir*, Clayton Sanford*, Daniel Hsu. "Support vector machines and linear regression coincide with very high-dimensional features." NeurIPS 2021. [paper] [arxiv] [blog post] [reviews] [conference talk] [Brown seminar slides] [UCSC seminar slides]
[HSSV21] Daniel Hsu*, Clayton Sanford*, Rocco Servedio*, Emmanouil-Vasileios Vlatakis-Gkaragkounis*. "On the Approximation Power of Two-Layer Networks of Random ReLUs." COLT 2021. [paper] [arxiv] [blog post] [conference talks] [Columbia DSI poster session] [MIT and BU seminar slides] [UW seminar slides]
ML + Climate
[SKW+23] Clayton Sanford, Anna Kwa, Oliver Watt-Meyer, Spencer Clark, Noah Brenowitz, Jeremy McGibbon, Christopher Bretherton. "Improving the reliability of ML-corrected climate models with novelty detection." Appearing in Journal of Advances in Modeling Earth Systems (JAMES). [journal submission] [NeurIPS workshop paper arxiv] [NeurIPS workshop slides] [AMS workshop abstract] [AMS workshop slides]
Undergraduate Research
[CRSSCS22] Tracy Chin*, Jacob Ruth*, Clayton Sanford*, Rebecca Santorella*, Paul Carter*, Bjorn Sandstede*. "Enabling equation-free modeling via diffusion maps." Journal of Dynamics and Differential Equations, 2022. [journal] [arxiv]
[S18] Clayton Sanford. "Applying Rademacher-Like Bounds to Combinatorial Samples and Function Selection." Honors Thesis, Brown Department of Computer Science, 2018. [thesis]
[CSF17] Kamil Cygan*, Clayton Sanford*, William Fairbrother. "Spliceman2 - A Computational Web Server That Predicts Sequence Variations in Pre-mRNA Splicing." Bioinformatics 33 (18), 2017. [paper]
[GSK16] Julia Gross*, Clayton Sanford*, Geoff Kocks*. "Projected Water Needs and Intervention Strategies in India." Undergraduate Mathematics and its Applications 37 (2), 2016. [paper] [article]
A few random things...
- I grew up in Santa Cruz, California.
- Technically, I lived in Aptos, which is why this page uses a font with the same name.
- I like running, backpacking, and all things nature.
- I was on Manhattan Community Board 9 from 2023 to 2025. I wrote about it in a series of blog posts.
- I am an Eagle Scout.
- I have strange completionist exploration goals:
- In 2021, I ran from the end of every subway line in NYC to Morningside Heights.
- In 2022 and 2023, I walked every block in Manhattan.
- If you have ideas for what the next goal should be, let me know!