About
I am interested in the science of deep learning. Recently, I have been especially excited about reasoning, multi-modal foundation models, and safe and scalable deep learning. During my time at Microsoft Research, I worked on a generative model for knowledge bases, a step toward knowledge-augmented LLMs with better interpretability and fewer hallucinations. At FAIR, I worked on new pre-training objectives that make LLMs more data-efficient (learn more with less) and improve their knowledge storage and planning capabilities.
Interests
- Science of Deep Learning
- (Mechanistic) Interpretability
- Reasoning in Foundation Models
- Safety and Robustness
Education
Interdisciplinary Ph.D. in Physics and Statistics, 2019 - Present
Massachusetts Institute of Technology
BSc in Physics and Mathematics, 2019
University of Rochester
Experience
Machine Learning Researcher Intern
NASA/SETI Frontier Development Lab
Featured Publications

The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More
Training transformers to predict “any-to-any” rather than just the next token resolves the reversal curse and can improve planning capabilities.
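
A minimal sketch of the idea, in my own illustrative PyTorch (not the paper's training recipe; `model`, `MASK_ID`, and the masking rate are assumptions): instead of always predicting left to right, mask a random fraction of positions and predict the masked tokens from whatever remains, so any token can be predicted from any context.

```python
# Illustrative contrast between next-token prediction and a factorization-agnostic
# "any-to-any" objective. Model and MASK_ID are placeholders, not the paper's code.
import torch
import torch.nn.functional as F

MASK_ID = 0  # assumed id of a dedicated [MASK] token

def next_token_loss(model, tokens):
    # Standard left-to-right factorization: predict token t from tokens < t.
    logits = model(tokens[:, :-1])                         # (B, T-1, V)
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           tokens[:, 1:].reshape(-1))

def any_to_any_loss(model, tokens):
    # Mask a uniformly random fraction of positions per sample and predict the
    # masked tokens from the unmasked ones, in no fixed order.
    B, T = tokens.shape
    mask_rate = torch.rand(B, 1, device=tokens.device)
    mask = torch.rand(B, T, device=tokens.device) < mask_rate
    corrupted = tokens.masked_fill(mask, MASK_ID)
    logits = model(corrupted)                              # (B, T, V)
    return F.cross_entropy(logits[mask], tokens[mask])
```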

DiSK: Diffusion Model for Structured Knowledge
DiSK is a generative framework for structured (dictionary-like) data that can handle various data types, from numbers to complex hierarchical types. This model excels in tasks like populating missing data and is especially proficient at predicting numerical values. Its potential extends to augmenting language models for better information retrieval and knowledge manipulation.

Robust and Provably Monotonic Networks
We develop a novel neural architecture with an exact bound on its Lipschitz constant. The model can be made monotonic in any subset of its features. This inductive bias is especially important for fairness and interpretability considerations.
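
A minimal sketch of the construction, under my reading of the approach (layer sizes, the normalization scheme, and class names are illustrative, not the paper's code): bound each linear layer so the unconstrained part of the network is at most λ-Lipschitz in the ∞-norm, then add λ times the sum of the chosen features. Any increase δ in a monotone feature raises the linear term by λδ while the constrained part can drop by at most λδ, so the output is non-decreasing in those features.

```python
# Illustrative sketch of a Lipschitz-bounded network made monotonic in a chosen
# subset of inputs. Simplified; not the published implementation.
import torch
import torch.nn as nn

class LipschitzLinear(nn.Linear):
    def __init__(self, in_f, out_f, lam=1.0):
        super().__init__(in_f, out_f)
        self.lam = lam

    def forward(self, x):
        # Rescale so the max absolute row sum (inf-operator norm) is <= lam.
        norm = self.weight.abs().sum(dim=1).max()
        W = self.weight * torch.clamp(self.lam / norm, max=1.0)
        return nn.functional.linear(x, W, self.bias)

class MonotonicNet(nn.Module):
    def __init__(self, in_f, monotone_idx, lam=1.0):
        super().__init__()
        # Split the Lipschitz budget across layers so the composition stays <= lam.
        per_layer = lam ** 0.5
        self.g = nn.Sequential(LipschitzLinear(in_f, 64, per_layer), nn.ReLU(),
                               LipschitzLinear(64, 1, per_layer))
        self.monotone_idx = monotone_idx
        self.lam = lam

    def forward(self, x):
        # g is lam-Lipschitz, so adding lam * sum(monotone features) makes the
        # output monotonically non-decreasing in those features.
        residual = self.lam * x[:, self.monotone_idx].sum(dim=1, keepdim=True)
        return self.g(x) + residual
```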

Towards Understanding Grokking: An Effective Theory of Representation Learning
This study investigates grokking, a generalization phenomenon first observed in transformer models trained on arithmetic data. Combining microscopic and macroscopic analyses, we identify four learning phases and a “Goldilocks zone” for optimal representation learning, highlighting the value of physics-inspired tools in understanding deep learning.
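
For context, a toy sketch of the kind of setup grokking is usually studied in (my own simplified stand-in with guessed hyperparameters, not the paper's configuration): train a small model on modular addition with only part of the data and strong weight decay, and watch validation accuracy jump long after training accuracy saturates.

```python
# Toy grokking-style setup (illustrative): learn (a + b) mod p from half of all
# pairs and track when validation accuracy catches up with training.
import torch
import torch.nn as nn

p = 97
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
labels = (pairs[:, 0] + pairs[:, 1]) % p
perm = torch.randperm(len(pairs))
train_idx, val_idx = perm[: len(pairs) // 2], perm[len(pairs) // 2 :]

def one_hot(ab):
    # Concatenate one-hot encodings of a and b.
    return torch.cat([nn.functional.one_hot(ab[:, 0], p),
                      nn.functional.one_hot(ab[:, 1], p)], dim=1).float()

model = nn.Sequential(nn.Linear(2 * p, 256), nn.ReLU(), nn.Linear(256, p))
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)

for step in range(20000):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model(one_hot(pairs[train_idx])),
                                       labels[train_idx])
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            val_acc = (model(one_hot(pairs[val_idx])).argmax(1)
                       == labels[val_idx]).float().mean().item()
        print(f"step {step}: train loss {loss.item():.3f}, val acc {val_acc:.3f}")
```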
Recent Publications
Controlling Classifier Bias with Moment Decomposition: A Method to Enhance Searches for Resonances
NuCLR: Nuclear Co-Learned Representations
Miscellaneous Projects

Monotonic Networks
A small package to make neural networks monotonic in any subset of their inputs (this works for individual neurons, too!).

MoDe: Controlling Classifier Bias
A regularization to make neural networks’ output independent from certain features.
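
A simplified sketch of the flavor of such a penalty (my own approximation of the idea, not the exact MoDe loss; the batch-rank mapping and polynomial order are assumptions): expand the classifier output as a function of the protected feature in Legendre polynomials and penalize all moments above order zero, which pushes the output to be flat in, and hence independent of, that feature.

```python
# Illustrative moment-based flatness penalty: decompose the dependence of the
# classifier scores on a protected feature into low-order Legendre moments and
# penalize them. Add this to the usual classification loss with some weight.
import torch

def flatness_penalty(scores, feature, max_order=3):
    # Map the protected feature to u in [-1, 1] via its rank within the batch.
    ranks = feature.argsort().argsort().float()
    u = 2.0 * ranks / (len(feature) - 1) - 1.0
    # First few Legendre polynomials P_1..P_3 of u.
    legendre = [u,
                0.5 * (3 * u**2 - 1),
                0.5 * (5 * u**3 - 3 * u)][:max_order]
    centered = scores - scores.mean()
    penalty = scores.new_zeros(())
    for k, P_k in enumerate(legendre, start=1):
        # Coefficient of P_k (up to normalization); zero iff no dependence at order k.
        coeff = (2 * k + 1) * (centered * P_k).mean()
        penalty = penalty + coeff**2
    return penalty

# total_loss = classification_loss + lambda_mode * flatness_penalty(scores, mass)
```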

Bell Inequality Experiment
An experiment to demonstrate the non-locality of quantum mechanics through the violation of Bell’s Inequality.
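
For context, the usual figure of merit is the CHSH combination of correlations measured at two analyzer settings per side: any local hidden-variable theory satisfies |S| ≤ 2, while quantum mechanics allows up to 2√2. A quick sketch of the quantum prediction for a maximally entangled pair (illustrative, using the spin-singlet convention E(a, b) = −cos(a − b); not the lab's analysis code):

```python
# CHSH check for the singlet state: with the optimal analyzer angles,
# |S| = 2*sqrt(2) ~ 2.83 > 2, violating Bell's inequality.
import math

def E(a, b):
    # Quantum correlation of the two outcomes at analyzer angles a and b.
    return -math.cos(a - b)

a, a_p = 0.0, math.pi / 2               # Alice's two settings
b, b_p = math.pi / 4, 3 * math.pi / 4   # Bob's two settings

S = E(a, b) - E(a, b_p) + E(a_p, b) + E(a_p, b_p)
print(abs(S))  # ~2.828, above the local hidden-variable bound of 2
```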
Recent Posts
The Physics of Deep Learning
Contact
- kitouni@mit.edu
- MIT, Cambridge, MA 02139