(Photo: at Lake Louise, Alberta, Canada)
Hanshi Sun   孙寒石
I am currently a Research Scientist at ByteDance on the Seed-Foundation-MLSys team.
At CMU, I worked with Prof. Beidi Chen in the InfiniAI Lab.
preminstrel [at] gmail [dot] com; hanshi.s [at] bytedance.com
News 📢
- [2025/09/18] Our R-KV has been accepted by NeurIPS 2025! See you in San Diego!
- [2025/05/01] Our ShadowKV has been accepted by ICML 2025 as a Spotlight! See you in Vancouver!
- [2025/03/03] Joined the ByteDance Seed-Foundation-MLSys team as a Research Scientist.
- [2024/10/01] Our Speculative Rejection has been accepted by NeurIPS 2024! See you in Vancouver!
- [2024/07/09] Our TriForce has been accepted by 🦙 COLM 2024! See you in Philadelphia!
- [2024/06/03] Started as an MLSys Research Intern on the Seed-Foundation team at ByteDance.
- [2023/11/06] Joined Prof. Beidi Chen's group at CMU.
- [2023/06/20] Graduated from Southeast University with a bachelor's degree.
- [2022/11/15] Started at Apple as a Software Engineer Intern on the iPad System team.
- [2022/07/06] Started as a Research Intern in Prof. Xingyu Li's group at the University of Alberta.
Publications
Selected papers are highlighted.
R-KV: Redundancy-aware KV Cache Compression for Reasoning Models
Zefan Cai, Wen Xiao, Hanshi Sun, Cheng Luo, Yikai Zhang, Ke Wan, Yucheng Li, Yeyang Zhou, Li-Wen Chang, Jiuxiang Gu, Zhen Dong, Anima Anandkumar, Abedelkadir Asi, and Junjie Hu
Conference on Neural Information Processing Systems (NeurIPS), 2025
arXiv / website / code
Shrink the cache, keep the brains.

HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading
Cheng Luo, Zefan Cai, Hanshi Sun, Jinqi Xiao, Bao Yuan, Wen Xiao, Junjie Hu, Jiawei Zhao, Beidi Chen, and Anima Anandkumar
ICML 2025 Workshop on Long-Context Foundation Models, 2025
Fine-grained, Head-wise Offloading Strategy

ShadowKV: KV Cache in Shadows for High-Throughput Long-Context LLM Inference
Hanshi Sun, Li-Wen Chang, Wenlei Bao, Size Zheng, Ningxin Zheng, Xin Liu, Harry Dong, Yuejie Chi, and Beidi Chen
International Conference on Machine Learning (ICML) Spotlight, 2025
arXiv / website / code
High-Throughput Long-Context LLM Inference System

Fast Best-of-N Decoding via Speculative Rejection
Hanshi Sun*, Momin Haider*, Ruiqi Zhang*, Huitao Yang, Jiahao Qiu, Ming Yin, Mengdi Wang, Peter Bartlett, and Andrea Zanette* (* for core authors)
Conference on Neural Information Processing Systems (NeurIPS), 2024
arXiv / website / code
Fast Inference-time Alignment Algorithm

TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
Hanshi Sun, Zhuoming Chen, Xinyu Yang, Yuandong Tian, and Beidi Chen
Conference on Language Modeling (COLM), 2024
arXiv / website / code
Training-free Lossless Long Sequence Generation Acceleration
BMAD: Benchmarks for Medical Anomaly Detection
Jinan Bao, Hanshi Sun, Hanqiu Deng, Zhaoxiang Zhang, and Xingyu Li
Computer Vision and Pattern Recognition (CVPR) Workshop, 2024
This benchmark encompasses six reorganized datasets from five medical domains (i.e., brain MRI, liver CT, retinal OCT, chest X-ray, and digital histopathology) and three key evaluation metrics, and includes a total of fourteen state-of-the-art AD algorithms.

Combating Medical Noisy Labels by Disentangled Distribution Learning and Consistency Regularization
Yi Zhou, Lei Huang, Tao Zhou, and Hanshi Sun
Future Generation Computer Systems (FGCS), 2023
Disentangled distribution learning reduces the effect of label uncertainty and ambiguity.

Arrhythmia Classifier Using Convolutional Neural Network with Adaptive Loss-aware Multi-bit Networks Quantization
Hanshi Sun, Ao Wang, Ninghao Pu, Zhiqing Li, Junguang Huang, Hao Liu, and Zhi Qi
ICAICE, 2021
website / paper / code
Presents a 1-D adaptive loss-aware quantization scheme, achieving a high compression rate that reduces memory consumption by 23.36×.
Services
- Reviewer: NeurIPS 2025, ICML 2025, ICLR 2025, COLM 2025, MLSys 2025, ACL ARR 2025, AAAI 2025
- Teaching Assistant: Introduction to Deep Learning (CMU), Introduction to Machine Learning (CMU)
© Hanshi Sun 2025
