I’m a Master’s student in Computer Science at Johns Hopkins University, working on efficient machine learning and large language model (LLM) inference optimization.
My current research focuses on making LLMs more efficient at test time through:
Inference-time early exiting
Test-time scaling strategies
KV-cache compression & quantization
Speculative decoding
Memory- and compute-efficient reasoning pipelines
I’m affiliated with the Center for Language and Speech Processing (CLSP) at JHU and collaborate with the InfiniAI Lab at CMU.
Previously, I held research and applied ML roles at Bosch and the University of Zurich, and I have prior experience in computer vision from earlier academic projects.
Interests:
LLM Inference Optimization · Efficient ML · Test-Time Scaling · Quantization · NLP Systems
Reduces LLM KV-cache memory usage by ~2x with minimal accuracy loss using selective quantization. Supports experimenting with full precision as well as different compression strategies.
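As a rough illustration of the idea (not the repo's actual implementation), selective quantization can keep the most recent tokens of the cache in full precision while storing older entries in int8. The function names, window size, and NumPy-based layout below are assumptions for the sketch:

```python
import numpy as np

def quantize_int8(x):
    # Per-channel symmetric int8 quantization over the sequence axis.
    scale = np.maximum(np.abs(x).max(axis=0, keepdims=True), 1e-8) / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

def compress_kv(cache, keep_recent=32):
    """Selectively compress a (seq_len, head_dim) cache slab:
    older tokens -> int8 (~4x smaller per element), the most recent
    `keep_recent` tokens stay in fp32."""
    split = max(cache.shape[0] - keep_recent, 0)
    old, recent = cache[:split], cache[split:]
    q, scale = quantize_int8(old)
    return q, scale, recent

def decompress_kv(q, scale, recent):
    # Reassemble the full-precision view of the cache for attention.
    return np.concatenate([dequantize(q, scale), recent], axis=0)
```

The recent-token window is kept exact because the newest entries tend to matter most for the next decoding step; the overall compression ratio depends on how large that full-precision window is relative to the sequence length.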
PyTorch implementation of “Rich Teacher Features for Efficient Single-Image Haze Removal”. This repository provides an efficient, lightweight approach for single-image dehazing, leveraging rich teacher features.