I am a final-year CS PhD student at EPFL working with Michael Kapralov. I am broadly interested in LLM inference and fine-tuning optimization, with recent works on fast attention, KV cache compression, and data selection for LLM distillation. In the past I have also worked on fast algorithms for large-scale, high-dimensional data analysis and numerical linear algebra.
In my current research, I am exploring:
- Data selection for distillation-based targeted instruction tuning of LLMs
- Efficient SIMD kernels to speed up graph-based retrieval algorithms via quantization
- Fast attention at prefill, KV cache compression, and long-context inference
Experience:
- Student Researcher, Google Research (October 2025-Present): Working on quantization in vector databases for graph-based retrieval algorithms, and on data selection for LLM fine-tuning.
- Applied Science Intern, Amazon Research (July 2024-January 2025): Deployed ML and optimization solutions to production for internal customers on the AWS cloud infrastructure stack.
- Research Assistant (January 2017-August 2018): Worked with Anirban Dasgupta and Dinesh Garg (IBM Research, Bengaluru) on randomized linear algebra.
- SURF Fellow, Caltech (May 2017-July 2017): Worked with Ashish Mahabal on deep learning for astronomy.
News
- October 2025: Started as a Student Researcher at Google Research.
- September 2025: Paper on BalanceKV, a novel KV cache compression method, accepted to NeurIPS 2025 as a Spotlight.
- January 2025: “Improved Algorithms for Kernel Matrix-Vector Multiplication” accepted to ICLR 2025 (first author).
- July 2024: “Improved Algorithms for Kernel Matrix-Vector Multiplication” won Best Paper at the ICML 2024 Workshop on Long Context Foundation Models.
- July 2024: Started as an Applied Science Intern at Amazon Research.
Publications
A full list is also available on Google Scholar.
Streaming Attention Approximation via Discrepancy Theory.
Ekaterina Kochetkova, Kshiteej Sheth, Insu Han, Amir Zandieh, Michael Kapralov.
NeurIPS 2025 (Spotlight).
[arXiv | Code]
Improved Algorithms for Kernel Matrix-Vector Multiplication.
(alphabetical) Piotr Indyk, Michael Kapralov, Kshiteej Sheth, Tal Wagner.
ICLR 2025 (Poster; I was the first author). Best Paper at the ICML 2024 Workshop on Long Context Foundation Models.
[OpenReview | Workshop]
Sublinear Time Low-Rank Approximation of Hankel Matrices.
Michael Kapralov, Cameron Musco, Kshiteej Sheth.
SODA 2026.
[arXiv]
Sublinear Time Low-Rank Approximation of Toeplitz Matrices.
Cameron Musco, Kshiteej Sheth.
SODA 2024.
[arXiv]
Toeplitz Low-Rank Approximation with Sublinear Query Complexity.
Michael Kapralov, Hannah Lawrence, Mikhail Makarov, Cameron Musco, Kshiteej Sheth.
SODA 2023.
[arXiv]
Towards Non-Uniform k-Center with Constant Types of Radii.
Xinrui Jia, Lars Rohwedder, Kshiteej Sheth, Ola Svensson.
SOSA 2022.
[arXiv]
Fair Colorful k-Center Clustering.
Xinrui Jia, Kshiteej Sheth, Ola Svensson.
Mathematical Programming, 2021. Preliminary version in IPCO, 2020.
[arXiv | Talk | Journal]
Improved linear embeddings via Lagrange duality.
Kshiteej Sheth, Dinesh Garg, Anirban Dasgupta.
Machine Learning (Springer), 2019.
[Paper]
Deep-learnt classification of light curves.
Ashish Mahabal, Kshiteej Sheth, Fabian Gieseke, Akshay Pai, S. George Djorgovski, Andrew J. Drake, Matthew J. Graham.
SSCI 2017.
[arXiv]
Service
- Conference reviewer: ICLR, NeurIPS, ICML.