Rajesh Jayaram
Email: rkjayaram (at) google (dot) com
I am a Research Scientist at Google NYC in the Algorithms and Optimization Group. I received my PhD in computer science from Carnegie Mellon University in the summer of 2021, where I was fortunate to be advised by David Woodruff. Prior to that, I received my bachelor's degree from Brown University in May 2017.
Research: I am interested primarily in sublinear algorithms and high-dimensional geometry, specifically sketching, streaming, and distributed algorithms for large-scale computational problems. More broadly, I am interested in dimensionality reduction methods: namely, to what extent can we compress the significant components of an enormous, noisy dataset? My work also spans property testing, machine learning, and optimization.
Google Scholar, DBLP.
Teaching:
I taught as an Adjunct Professor at NYU's Tandon School of Engineering.
- Spring 2022: NYU CS-GY 6763 - Algorithmic Machine Learning and Data Science
Workshops:
I co-organized the Workshop on Algorithms for Large Data (Online) (WALDO 2021), which took place from Monday, August 23 through Wednesday, August 25, 2021.
I co-organized the Workshop on Robust Streaming, Sketching, and Sampling in STOC 2021. A full recording of the workshop can be found here.
I co-organized an Industry Workshop at FOCS 2024, with the goal of bridging techniques between theory and practice.
Dissertation:
- Sketching and Sampling Algorithms for High-Dimensional Data. PhD Thesis, May 2021. Committee: David Woodruff (advisor), Anupam Gupta (CMU), Andrej Risteski (CMU), Alexandr Andoni (Columbia), Jelani Nelson (Berkeley)
Preprints:
- CRISP: Clustering Multi-Vector Representations for Denoising and Pruning
  With João Veneroso, Jinmeng Rao, Gustavo Hernández Ábrego, Majid Hadian, and Daniel Cer. [arXiv]
Publications:
- Hierarchical Retrieval: The Geometry and a Pretrain-Finetune Recipe
  With Chong You, Ananda Theertha Suresh, Robin Nittka, Felix Yu, and Sanjiv Kumar. NeurIPS 2025 [arXiv]
- Approximating High-Dimensional Earth Mover's Distance as Fast as Closest Pair
  With Lorenzo Beretta, Vincent Cohen-Addad, and Erik Waingarten. FOCS 2025 [arXiv]
- Metric Embeddings Beyond Bi-Lipschitz Distortion via Sherali-Adams
  With Ainesh Bakshi, Vincent Cohen-Addad, Samuel B. Hopkins, and Silvio Lattanzi. COLT 2025 [arXiv]
- Randomized Dimensionality Reduction for Euclidean Maximization and Diversity Measures
  With Jie Gao, Benedikt Kolbe, Shay Sapir, Chris Schwiegelshohn, Sandeep Silwal, and Erik Waingarten. ICML 2025 [arXiv]
- Unleashing Graph Partitioning for Large-Scale Nearest Neighbor Search
  With Laxman Dhulipala, Lars Gottesbüren, and Jakub Łącki. VLDB 2025 [arXiv]
- Near-Optimal Spectral Density Estimation via Explicit and Implicit Deflation
  With Rajarshi Bhattacharjee, Cameron Musco, Christopher Musco, and Archan Ray. SODA 2025 [arXiv]
- Massively Parallel Minimum Spanning Tree in General Metric Spaces
  With Amir Azarmehr, Soheil Behnezhad, Jakub Łącki, Vahab Mirrokni, and Peilin Zhong. SODA 2025 [arXiv]
- MUVERA: Multi-Vector Retrieval via Fixed Dimensional Encodings
  With Laxman Dhulipala, Majid Hadian, Jason Lee, and Vahab Mirrokni. NeurIPS 2024 [arXiv], [Google Research Blog], [Weaviate Podcast]
- Efficient Centroid-Linkage Clustering
  With MohammadHossein Bateni, Laxman Dhulipala, Willem Fletcher, Kishen N Gowda, D Ellis Hershkowitz, and Jakub Łącki. NeurIPS 2024 [arXiv]
- Metric Clustering and MST with Strong and Weak Distance Oracles
  With MohammadHossein Bateni, Prathamesh Dharangutte, and Chen Wang. COLT 2024 [arXiv]
- Parallel and Sequential Hardness of Hierarchical Graph Clustering
  With MohammadHossein Bateni, Laxman Dhulipala, Kishen Gowda, D Ellis Hershkowitz, and Jakub Łącki. ICALP 2024 [arXiv]
- Dynamic PageRank: Algorithms and Lower Bounds
  With Jakub Łącki, Slobodan Mitrović, Krzysztof Onak, and Piotr Sankowski. ICALP 2024 [arXiv]
- Data-Dependent LSH for the Earth Mover's Distance
  With Erik Waingarten and Tian Zhang. STOC 2024 [arXiv], [Talk]
- HyperAttention: Long-Context Attention in Near-Linear Time
  With Insu Han, Amin Karbasi, Vahab Mirrokni, David Woodruff, and Amir Zandieh. ICLR 2024 [arXiv]
- Massively Parallel Algorithms for High-Dimensional Euclidean Minimum Spanning Tree
  With Vahab Mirrokni, Shyam Narayanan, and Peilin Zhong. SODA 2024 [arXiv]
- Fully Dynamic Consistent k-Center Clustering
  With Christoph Grunau, Bernhard Haeupler, Jakub Łącki, and Václav Rozhoň. SODA 2024 [arXiv]
- Streaming Algorithms with Few State Changes
  With David Woodruff and Samson Zhou. PODS 2024 [arXiv]
- A Near-Linear Time Algorithm for the Chamfer Distance
  With Ainesh Bakshi, Piotr Indyk, Sandeep Silwal, and Erik Waingarten. NeurIPS 2023 [arXiv]
- Streaming Euclidean MST to a Constant Factor
  With Vincent Cohen-Addad, Xi Chen, Amit Levi, and Erik Waingarten. STOC 2023 [arXiv]
- Optimal Fully Dynamic k-Centers Clustering
  With MohammadHossein Bateni, Hossein Esfandiari, and Vahab Mirrokni. SODA 2023 [arXiv], [Merged Paper] with Hendrik Fichtenberger, Monika Henzinger, and Andreas Wiese
- Differentially Oblivious Relational Database Operators
  With Lianke Qin, Elaine Shi, Zhao Song, Danyang Zhuo, and Shumo Chu. VLDB 2023 [arXiv]
- Stars: Tera-Scale Graph Building for Clustering and Learning
  With CJ Carey, Jonathan Halcrow, Vahab Mirrokni, Warren Schudy, and Peilin Zhong. NeurIPS 2022 [arXiv]
- New Streaming Algorithms for High Dimensional EMD and MST
  With Xi Chen, Amit Levi, and Erik Waingarten. STOC 2022 [arXiv], [Talk]
- Truly Perfect Samplers for Data Streams and Sliding Windows
  With David Woodruff and Samson Zhou. PODS 2022 [arXiv]
- An Optimal Algorithm for Triangle Counting in a Stream
  With John Kallaugher. APPROX 2021 [arXiv]
- Learning and Testing Junta Distributions with Subcube Conditioning
  With Xi Chen, Amit Levi, and Erik Waingarten. COLT 2021 [arXiv]
- In-Database Regression in Input Sparsity Time
  With Alireza Samadian, David Woodruff, and Peng Ye. ICML 2021 [arXiv]
- When is Approximate Counting for Conjunctive Queries Tractable?
  With Marcelo Arenas, Luis Alberto Croquevielle, and Cristian Riveros. STOC 2021 [arXiv]
- Testing Positive Semi-Definiteness via Random Submatrices
  With Ainesh Bakshi and Nadiia Chepurko. FOCS 2020 [arXiv], [Extended Abstract], [Talk @ WOLA'20], [Longer Talk]
- A Framework for Adversarially Robust Streaming Algorithms
  With Omri Ben-Eliezer, David Woodruff, and Eylon Yogev. PODS 2020 and Journal of the ACM [arXiv]
  PODS Best Paper Award 2020; Invited to the Journal of the ACM; 2021 ACM SIGMOD Research Highlight Award; Invited to HALG 2021
- Span Recovery for Deep Neural Networks with Applications to Input Obfuscation
  With Qiuyi Zhang and David Woodruff. ICLR 2020 [arXiv], [Short Talk]
- Optimal Sketching for Kronecker Product Regression and Low Rank Approximation
  With Huaian Diao, Zhao Song, Wen Sun, and David Woodruff. NeurIPS 2019 [arXiv], [Poster]
- Towards Optimal Moment Estimation in Streaming and Distributed Models
  With David Woodruff. APPROX 2019 [arXiv]
- Learning Two Layer Rectified Neural Networks in Polynomial Time
  With Ainesh Bakshi and David Woodruff. COLT 2019 [arXiv], [Talk @ COLT]
- Efficient Logspace Classes for Enumeration, Counting, and Uniform Generation
  With Marcelo Arenas, Luis Alberto Croquevielle, and Cristian Riveros. PODS 2019 and Journal of the ACM [arXiv], [Talk], [SIGMOD Technical Perspective]
  PODS Best Paper Award 2019; Invited to the Journal of the ACM; 2021 ACM SIGMOD Research Highlight Award
- Weighted Reservoir Sampling from Distributed Streams
  With Gokarna Sharma, Srikanta Tirthapura, and David P. Woodruff. PODS 2019 [arXiv]
- Perfect L_p Sampling in a Data Stream
  With David Woodruff. FOCS 2018 and SIAM Journal on Computing [arXiv]
- Data Streams with Bounded Deletions
  With David Woodruff. PODS 2018 [arXiv]
- Approximating Language Edit Distance Beyond Fast Matrix Multiplication: Ultralinear Grammars Are Where Parsing Becomes Hard!
  With Barna Saha. ICALP 2017 [Full Version], [Conference Version]