Xiuyu Li
I am a Ph.D. candidate affiliated with Berkeley AI Research (BAIR) at UC Berkeley, advised by Prof. Kurt Keutzer. Previously, I received a B.A. in Computer Science and Math from Cornell University. During my undergrad years, I was fortunate to work with Prof. Zhiru Zhang, Prof. Vitaly Shmatikov, and Prof. Song Han.
Email: xiuyu [at] berkeley [dot] edu
Research
My current research focuses on enhancing the reasoning capabilities of large language models (LLMs) and developing scalable AI agents. This work builds on my broader expertise in making generative models more efficient, in both training and inference, across language and vision.
Efficient Generative Models (Quantization & Sparsity): SparseLoRA (ICML'25) speeds up LLM finetuning with contextual sparsity. Q-Diffusion (ICCV'23) and SVDQuant (ICLR'25) are pioneering works on diffusion model quantization. SVG (ICML'25) accelerates video generation by 2x via attention sparsity. SqueezeLLM (ICML'24) achieves near-lossless 3-bit quantization for LLMs.
Long-context LLMs/VLMs: STORM (ICCV'25 CLVL) and NVILA (CVPR'25) propose efficient VLM architectures for long video understanding. LLoCO (EMNLP'24) improves long-context LLMs via context compression and parameter-efficient finetuning.
ML Systems: LongVILA (ICLR'25) is a framework for distributed training of VLMs on hour-long videos. TorchSparse (MLSys'22, MICRO'23) is a high-performance CUDA library for sparse convolution.
Evaluation: RouterBench (ICML'24 Agentic Markets) is the first benchmark for LLM routing. ArtBench (arXiv'22) is a high-quality dataset for artwork generation. LINKX (NeurIPS'21) offers diverse large-scale non-homophilous graph datasets with a strong baseline.
Selected Publications
For the most up-to-date list of publications, please see Google Scholar. * indicates co-first author; † indicates project lead.
Learning Adaptive Parallel Reasoning with Language Models
Jiayi Pan*, Xiuyu Li*, Long Lian*, Charlie Snell, Yifei Zhou, Adam Yala, Trevor Darrell, Kurt Keutzer, Alane Suhr
COLM, 2025
[abs] [paper] [code]
Scaling inference-time computation has substantially improved the reasoning capabilities of language models. However, existing methods have significant limitations: serialized chain-of-thought approaches generate overly long outputs, leading to increased latency and exhausted context windows, while parallel methods such as self-consistency suffer from insufficient coordination, resulting in redundant computations and limited performance gains. To address these shortcomings, we propose Adaptive Parallel Reasoning (APR), a novel reasoning framework that enables language models to orchestrate both serialized and parallel computations end-to-end. APR generalizes existing reasoning methods by enabling adaptive multi-threaded inference using spawn() and join() operations. A key innovation is our end-to-end reinforcement learning strategy, optimizing both parent and child inference threads to enhance task success rate without requiring predefined reasoning structures. Experiments on the Countdown reasoning task demonstrate significant benefits of APR: (1) higher performance within the same context window (83.4% vs. 60.0% at 4k context); (2) superior scalability with increased computation (80.1% vs. 66.6% at 20k total tokens); (3) improved accuracy at equivalent latency (75.2% vs. 57.3% at approximately 5,000ms). APR represents a step towards enabling language models to autonomously optimize their reasoning processes through adaptive allocation of computation.
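To make the spawn()/join() control flow concrete, here is a minimal Python sketch of how a parent reasoning thread might fan work out to child threads and fold their condensed summaries back into its own context. The propose_subqueries and solve_subquery functions are hypothetical placeholders standing in for model calls; in APR itself, the policy learns when and what to spawn via end-to-end reinforcement learning.

```python
# Minimal sketch of the spawn()/join() control flow described above.
# `propose_subqueries` and `solve_subquery` are toy placeholders standing in
# for model calls; they are NOT the paper's implementation, which trains the
# parent/child policies end-to-end with RL.
from concurrent.futures import ThreadPoolExecutor


def propose_subqueries(problem: str, width: int) -> list[str]:
    # Placeholder: a trained parent thread decides *whether* and *what*
    # to spawn based on its own partial reasoning trace.
    return [f"{problem} [branch {i}]" for i in range(width)]


def solve_subquery(subquery: str) -> str:
    # Placeholder: each child thread explores one branch independently,
    # keeping its tokens out of the parent's context window.
    return f"summary({subquery})"


def adaptive_parallel_reasoning(problem: str, width: int = 4) -> str:
    # spawn(): launch child inference threads in parallel.
    subqueries = propose_subqueries(problem, width)
    with ThreadPoolExecutor(max_workers=width) as pool:
        summaries = list(pool.map(solve_subquery, subqueries))
    # join(): return only the children's condensed summaries to the parent,
    # which then continues serial reasoning over them.
    joined_context = problem + "\n" + "\n".join(summaries)
    return f"final_answer_from({joined_context})"


if __name__ == "__main__":
    print(adaptive_parallel_reasoning("countdown: reach 24 from [3, 5, 7, 9]"))
```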
SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity
Samir Khaki*, Xiuyu Li*†, Junxian Guo*, Ligeng Zhu, Konstantinos N. Plataniotis, Amir Yazdanbakhsh, Kurt Keutzer, Song Han, Zhijian Liu
ICML, 2025
[abs] [paper] [code] [website]
Fine-tuning LLMs is both computationally and memory-intensive. While parameter-efficient fine-tuning methods, such as DoRA, reduce the number of trainable parameters and lower memory usage, they do not decrease computational cost. In some cases, they even slow down fine-tuning. In this paper, we introduce SparseLoRA, a method that accelerates LLM fine-tuning through contextual sparsity. We propose a lightweight, training-free SVD sparsity estimator that dynamically selects a sparse subset of weights for loss and gradient computation. We also systematically analyze and address sensitivity across layers, tokens, and training steps. Our experimental results show that SparseLoRA reduces computational cost by up to 1.7x, with a measured speedup of up to 1.4x, while maintaining accuracy across various downstream tasks, including commonsense and arithmetic reasoning. We will release our code to encourage further research into fine-tuning methods that are both parameter- and computation-efficient.
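As a rough illustration of a training-free, SVD-based contextual-sparsity estimator, the NumPy sketch below uses a low-rank proxy of a weight matrix to guess which output channels matter for the current activation, then computes only those channels densely. The shapes, sparsity ratio, and scoring rule are illustrative assumptions, not the SparseLoRA recipe (which also decides where across layers, tokens, and training steps to apply sparsity, and how it interacts with the LoRA branch).

```python
# Rough NumPy sketch of using a low-rank (SVD) proxy of a weight matrix to
# cheaply estimate which output channels matter for the current input, then
# computing only those channels densely. Illustrative only; not SparseLoRA.
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank, keep = 512, 512, 16, 128

W = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)   # frozen base weight
x = rng.standard_normal(d_in)                            # current activation

# Offline: low-rank factors of W (computed once, reused for every input).
U, S, Vt = np.linalg.svd(W, full_matrices=False)
U_r, S_r, Vt_r = U[:, :rank], S[:rank], Vt[:rank]

# Online: cheap per-channel importance estimate, O(rank * d) instead of
# O(d_out * d_in).
scores = np.abs(U_r @ (S_r * (Vt_r @ x)))
active = np.argsort(scores)[-keep:]                      # top-k channels

# Compute only the selected rows of W densely.
y_sparse = W[active] @ x
y_dense = W @ x

# Sanity check: how well does the cheap estimator agree with the oracle?
true_top = np.argsort(np.abs(y_dense))[-keep:]
overlap = len(set(active) & set(true_top)) / keep
print(f"estimator/oracle top-{keep} channel overlap: {overlap:.2f}")
```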
S*: Test Time Scaling for Code Generation
Dacheng Li*, Shiyi Cao*, Chengkun Cao, Xiuyu Li, Shangyin Tan, Kurt Keutzer, Jiarong Xing, Joseph E. Gonzalez, Ion Stoica
EMNLP Findings, 2025
[abs] [paper] [code]
Increasing test-time compute for LLMs shows promise across domains but remains underexplored in code generation, despite extensive study in math. In this paper, we propose S*, the first hybrid test-time scaling framework that substantially improves the coverage and selection accuracy of generated code. S* extends the existing parallel scaling paradigm with sequential scaling to push performance boundaries. It further leverages a novel selection mechanism that adaptively generates distinguishing inputs for pairwise comparison, combined with execution-grounded information, to robustly identify correct solutions. We evaluate across 12 Large Language Models and Large Reasoning Models and show: (1) S* consistently improves performance across model families and sizes, enabling a 3B model to outperform GPT-4o-mini; (2) S* enables non-reasoning models to surpass reasoning models: GPT-4o-mini with S* outperforms o1-preview by 3.7% on LiveCodeBench; (3) S* further boosts state-of-the-art reasoning models: DeepSeek-R1-Distill-Qwen-32B with S* achieves 85.7% on LiveCodeBench, approaching o1 (high) at 88.5%.
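The toy sketch below illustrates the execution-grounded selection idea in isolation: when two candidate programs disagree, run both on a distinguishing input and keep the one consistent with the expected behavior. In S* itself, both the candidate programs and the distinguishing inputs are produced by the LLM; the hard-coded candidates and test below are purely for illustration.

```python
# Toy sketch of execution-grounded pairwise selection: run candidate programs
# on an input chosen to expose their disagreement and keep the one matching
# the expected behaviour. Candidates and the distinguishing input are
# hard-coded stand-ins for what the LLM would generate in S*.
CANDIDATES = [
    # Buggy for even-length lists: picks the upper middle element.
    "def solve(nums):\n    return sorted(nums)[len(nums) // 2]",
    # Correct median for both odd- and even-length lists.
    "def solve(nums):\n    s = sorted(nums)\n    n = len(nums)\n"
    "    return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2",
]


def run(candidate_src: str, test_input):
    namespace = {}
    exec(candidate_src, namespace)          # compile the candidate program
    return namespace["solve"](test_input)


def select(candidates, distinguishing_input, expected_output):
    # Execute every candidate on an input that exposes disagreement, then
    # keep the one grounded by the expected behaviour.
    best = None
    for src in candidates:
        if run(src, distinguishing_input) == expected_output:
            best = src
    return best


if __name__ == "__main__":
    # [1, 2, 3, 4] distinguishes the two candidates: the true median is 2.5.
    winner = select(CANDIDATES, [1, 2, 3, 4], 2.5)
    print("selected candidate:\n" + winner)
```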
Token-Efficient Long Video Understanding for Multimodal LLMs
Jindong Jiang*, Xiuyu Li*, Zhijian Liu, Muyang Li, Guo Chen, Zhiqi Li, De-An Huang, Guilin Liu, Zhiding Yu, Kurt Keutzer, Sungjin Ahn, Jan Kautz, Hongxu Yin, Yao Lu, Song Han, Wonmin Byeon
ICCV CLVL workshop, 2025
[abs] [paper] [website]
Recent advances in video-based multimodal large language models (Video-LLMs) have significantly improved video understanding by processing videos as sequences of image frames. However, many existing methods treat frames independently in the vision backbone, lacking explicit temporal modeling, which limits their ability to capture dynamic patterns and efficiently handle long videos. To address these limitations, we introduce STORM (Spatiotemporal TOken Reduction for Multimodal LLMs), a novel architecture incorporating a dedicated temporal encoder between the image encoder and the LLM. Our temporal encoder leverages the Mamba State Space Model to integrate temporal information into image tokens, generating enriched representations that preserve inter-frame dynamics across the entire video sequence. This enriched encoding not only enhances video reasoning capabilities but also enables effective token reduction strategies, including test-time sampling and training-based temporal and spatial pooling, substantially reducing computational demands on the LLM without sacrificing key temporal information. By integrating these techniques, our approach simultaneously reduces training and inference latency while improving performance, enabling efficient and robust video understanding over extended temporal contexts. Extensive evaluations show that STORM achieves state-of-the-art results across various long video understanding benchmarks (more than 5% improvement on MLVU and LongVideoBench) while reducing the computation costs by up to 8× and the decoding latency by 2.4-2.9× for a fixed number of input frames.
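The PyTorch snippet below sketches one token-reduction idea in isolation: once a temporal module has mixed information across frames, tokens from neighboring frames can be average-pooled before reaching the LLM. The tensor shapes and pooling factor are made up for illustration; STORM's actual pipeline combines a Mamba-based temporal encoder with several sampling and pooling strategies.

```python
# Minimal PyTorch sketch of temporal token reduction: after temporal mixing,
# group consecutive frames and average their tokens so the LLM sees far fewer
# video tokens. Shapes and the pooling factor are illustrative assumptions.
import torch

frames, tokens_per_frame, dim = 32, 196, 1024
pool_factor = 4  # keep 1 of every 4 frames' worth of tokens

video_tokens = torch.randn(frames, tokens_per_frame, dim)  # encoder output

# Temporal average pooling: each group of `pool_factor` frames collapses
# into one "super-frame" of tokens.
grouped = video_tokens.view(frames // pool_factor, pool_factor,
                            tokens_per_frame, dim)
pooled = grouped.mean(dim=1)

llm_input = pooled.flatten(0, 1)  # (frames/pool_factor * tokens, dim)
print("tokens before:", frames * tokens_per_frame,
      "after:", llm_input.shape[0])
```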
LLoCO: Learning Long Contexts Offline
Sijun Tan*, Xiuyu Li*, Shishir Patil, Ziyang Wu, Tianjun Zhang, Kurt Keutzer, Joseph E. Gonzalez, Raluca Ada Popa
EMNLP, 2024
[abs] [paper] [code]
Processing long contexts remains a challenge for large language models (LLMs) due to the quadratic computational and memory overhead of the self-attention mechanism and the substantial KV cache sizes during generation. We propose a novel approach to address this problem by learning contexts offline through context compression and in-domain parameter-efficient finetuning. Our method enables an LLM to create a concise representation of the original context and efficiently retrieve relevant information to answer questions accurately. We introduce LLoCO, a technique that combines context compression, retrieval, and parameter-efficient finetuning using LoRA. Our approach extends the effective context window of a 4k token LLaMA2-7B model to handle up to 128k tokens. We evaluate our approach on several long-context question-answering datasets, demonstrating that LLoCO significantly outperforms in-context learning while using 30x fewer tokens during inference. LLoCO achieves up to 7.62x speed-up and substantially reduces the cost of long document question answering, making it a promising solution for efficient long context processing. Our code is publicly available online.
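The sketch below shows how the offline and online stages fit together at a high level: compress long documents once, train a LoRA adapter so the model can read the compressed representations, then retrieve and condition on the compressed context at serving time. Every function body is a stub invented for illustration, not the LLoCO implementation.

```python
# High-level sketch of the offline-learn / online-serve split described above.
# All function bodies are stubs; they only show how context compression,
# retrieval, and a LoRA adapter fit together.


def compress(document: str) -> list[float]:
    # Offline: a context encoder turns a long document into a short sequence
    # of summary embeddings (faked here as a fixed-size vector).
    return [float(len(document) % 7)] * 8


def finetune_lora(compressed_corpus) -> dict:
    # Offline: a parameter-efficient (LoRA) adapter is trained so the LLM
    # learns to read the compressed representations for this domain.
    return {"lora_rank": 8, "num_docs": len(compressed_corpus)}


def retrieve(query: str, store: dict):
    # Online: pick the compressed representation of the relevant document
    # (a naive name match here; a real system would use an embedding
    # retriever over the compressed corpus).
    for name, compressed in store.items():
        if name in query:
            return name, compressed
    return next(iter(store.items()))


def answer(query: str, store: dict, adapter: dict) -> str:
    doc_id, compressed = retrieve(query, store)
    # Online: the LLM (with the LoRA adapter attached) conditions on the
    # short compressed context instead of the full long document.
    return f"answer to {query!r} using {doc_id} ({len(compressed)} summary tokens)"


corpus = {"contract.txt": "..." * 1000, "paper.txt": "..." * 2000}
store = {name: compress(text) for name, text in corpus.items()}
adapter = finetune_lora(store)
print(answer("what is the termination clause in contract.txt?", store, adapter))
```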
Q-Diffusion: Quantizing Diffusion Models
Xiuyu Li, Yijiang Liu, Long Lian, Huanrui Yang, Zhen Dong, Daniel Kang, Shanghang Zhang, Kurt Keutzer
ICCV, 2023
[abs] [paper] [code] [website] [talk]
Integration: NVIDIA TensorRT
Diffusion models have achieved great success in image synthesis through iterative noise estimation using deep neural networks. However, the slow inference, high memory consumption, and computation intensity of the noise estimation model hinder the efficient adoption of diffusion models. Although post-training quantization (PTQ) is considered a go-to compression method for other tasks, it does not work out-of-the-box on diffusion models. We propose a novel PTQ method specifically tailored towards the unique multi-timestep pipeline and model architecture of the diffusion models, which compresses the noise estimation network to accelerate the generation process. We identify the key difficulty of diffusion model quantization as the changing output distributions of noise estimation networks over multiple time steps and the bimodal activation distribution of the shortcut layers within the noise estimation network. We tackle these challenges with time step-aware calibration and shortcut-splitting quantization in this work. Experimental results show that our proposed method is able to quantize full-precision unconditional diffusion models into 4-bit while maintaining comparable performance (small FID change of at most 2.34 compared to >100 for traditional PTQ) in a training-free manner. Our approach can also be applied to text-guided image generation, where we can run stable diffusion in 4-bit weights with high generation quality for the first time.
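The NumPy sketch below reproduces the two phenomena on synthetic data: calibrating quantization ranges on a single denoising step versus across all steps, and quantizing a shortcut concatenation with one shared scale versus one scale per half. The distributions and bit-width are synthetic assumptions chosen to make the effect visible; this is not the released Q-Diffusion code.

```python
# NumPy illustration of (1) time-step-aware calibration and (2) shortcut
# "split" quantization, on synthetic activations. Not the Q-Diffusion code.
import numpy as np

rng = np.random.default_rng(0)


def quantize(x, n_bits=4, scale=None):
    # Symmetric uniform quantization with a per-tensor scale.
    if scale is None:
        scale = np.abs(x).max() / (2 ** (n_bits - 1) - 1)
    q = np.clip(np.round(x / scale), -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1)
    return q * scale


# (1) Activations whose spread shrinks as denoising progresses.
timesteps = np.linspace(1.0, 0.05, 50)
acts = [rng.standard_normal(1024) * t for t in timesteps]

scale_last_step = np.abs(acts[-1]).max() / 7          # calibrated on one step
scale_all_steps = np.abs(np.stack(acts)).max() / 7    # calibrated across steps
err_last = np.mean([(quantize(a, 4, scale_last_step) - a) ** 2 for a in acts])
err_all = np.mean([(quantize(a, 4, scale_all_steps) - a) ** 2 for a in acts])
print(f"MSE, single-step calibration: {err_last:.4f}  all-step: {err_all:.4f}")

# (2) A shortcut concatenation of two halves with very different ranges.
deep = rng.standard_normal(512) * 0.1
skip = rng.standard_normal(512) * 10.0
concat = np.concatenate([deep, skip])

joint = quantize(concat, 4)                                     # shared scale
split = np.concatenate([quantize(deep, 4), quantize(skip, 4)])  # per-half scales
zeros_joint = np.mean(joint[:512] == 0)
zeros_split = np.mean(split[:512] == 0)
print(f"deep-half values crushed to zero: joint {zeros_joint:.0%}, "
      f"split {zeros_split:.0%}")
```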
SqueezeLLM: Dense-and-Sparse Quantization
Sehoon Kim*, Coleman Hooper*, Amir Gholami*, Zhen Dong, Xiuyu Li, Sheng Shen, Michael W. Mahoney, Kurt Keutzer
ICML, 2024
[abs] [paper] [code]
Integration: Intel oneAPI
Generative Large Language Models (LLMs) have demonstrated remarkable results for a wide range of tasks. However, deploying these models for inference has been a significant challenge due to their unprecedented resource requirements. This has forced existing deployment frameworks to use multi-GPU inference pipelines, which are often complex and costly, or to use smaller and less performant models. In this work, we demonstrate that the main bottleneck for generative inference with LLMs is memory bandwidth, rather than compute, specifically for single batch inference. While quantization has emerged as a promising solution by representing model weights with reduced precision, previous efforts have often resulted in notable performance degradation. To address this, we introduce SqueezeLLM, a post-training quantization framework that not only enables lossless compression to ultra-low precisions of up to 3-bit, but also achieves higher quantization performance under the same memory constraint. Our framework incorporates two novel ideas: (i) sensitivity-based non-uniform quantization, which searches for the optimal bit precision assignment based on second-order information; and (ii) the Dense-and-Sparse decomposition that stores outliers and sensitive weight values in an efficient sparse format. When applied to the LLaMA models, our 3-bit quantization significantly reduces the perplexity gap from the FP16 baseline by up to 2.1x as compared to the state-of-the-art methods with the same memory requirement. Furthermore, when deployed on an A6000 GPU, our quantized models achieve up to 2.3x speedup compared to the baseline. Our code is open-sourced and available online.
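To illustrate the two ingredients on synthetic data, the sketch below builds a sensitivity-weighted k-means codebook (non-uniform 3-bit values) and pulls a small fraction of outliers into a separate full-precision sparse part. The random "sensitivities" stand in for the second-order (Fisher) information used by the real method; the weight vector and ratios are made up.

```python
# NumPy sketch of (i) sensitivity-weighted k-means quantization and
# (ii) dense-and-sparse outlier decomposition, on a synthetic weight vector.
# Random "sensitivities" replace the real method's second-order information.
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(4096) * 0.02
w[rng.choice(4096, 8, replace=False)] *= 50        # a few large outliers
sens = rng.random(4096) + 1e-3                     # stand-in sensitivities


def weighted_kmeans(x, weights, k=8, iters=25):
    # Lloyd's algorithm where each point is weighted by its sensitivity, so
    # centroids are pulled toward weights that matter more to the loss.
    centroids = np.quantile(x, np.linspace(0.05, 0.95, k))
    for _ in range(iters):
        assign = np.argmin(np.abs(x[:, None] - centroids[None, :]), axis=1)
        for j in range(k):
            m = assign == j
            if m.any():
                centroids[j] = np.average(x[m], weights=weights[m])
    return centroids, assign


# (ii) Dense-and-sparse decomposition: keep the top 0.5% outliers exact.
cut = np.quantile(np.abs(w), 0.995)
sparse_mask = np.abs(w) > cut
dense = np.where(sparse_mask, 0.0, w)

# (i) 3-bit (k = 8) sensitivity-weighted codebook for the dense part.
centroids, assign = weighted_kmeans(dense, sens, k=8)
w_hat = np.where(sparse_mask, w, centroids[assign])

print(f"outliers kept sparse: {sparse_mask.sum()} / {w.size}")
print(f"reconstruction MSE:   {np.mean((w - w_hat) ** 2):.2e}")
```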
TorchSparse: Efficient Point Cloud Inference Engine
Haotian Tang*, Zhijian Liu*, Xiuyu Li*, Yujun Lin, Song Han
MLSys, 2022
[abs] [paper] [code] [website]
Deep learning on point clouds has received increased attention thanks to its wide applications in AR/VR and autonomous driving. These applications require low latency and high accuracy to provide real-time user experience and ensure user safety. Unlike conventional dense workloads, the sparse and irregular nature of point clouds poses severe challenges to running sparse CNNs efficiently on the general-purpose hardware, and existing sparse acceleration techniques for 2D images do not translate to 3D point clouds. In this paper, we introduce TorchSparse, a high-performance point cloud inference engine that accelerates the sparse convolution computation on GPUs. TorchSparse directly optimizes the two bottlenecks of sparse convolution: data movement and irregular computation. It optimizes the data orchestration by quantization and fused locality-aware memory access, reducing the memory movement cost by 2.7×. It also adopts adaptive MM grouping to trade computation for better regularity, achieving 1.4-1.5× speedup for matrix multiplication. Evaluated on seven representative models across three benchmark datasets, TorchSparse achieves 1.6× and 1.5× measured end-to-end speedup over the state-of-the-art MinkowskiEngine and SpConv, respectively.
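For readers unfamiliar with sparse convolution, the pure-NumPy toy below spells out the gather-GEMM-scatter structure that engines like TorchSparse optimize: per kernel offset, gather the matching input/output sites, run one dense matrix multiplication, and scatter-add the result. The coordinate hashing and Python loops here are deliberately naive; TorchSparse's contribution is making these stages fast on GPUs (fused locality-aware data movement, adaptive MM grouping), which this sketch does not attempt.

```python
# Pure-NumPy toy of the gather - GEMM - scatter pattern behind sparse
# convolution. Illustrative only; nothing here is the TorchSparse kernel code.
import itertools
import numpy as np

rng = np.random.default_rng(0)
num_points, c_in, c_out = 1000, 16, 32

coords = rng.integers(0, 50, size=(num_points, 3))            # voxel coords
coords = np.unique(coords, axis=0)                            # de-duplicate
feats = rng.standard_normal((len(coords), c_in))
offsets = list(itertools.product((-1, 0, 1), repeat=3))       # 3x3x3 kernel
weights = rng.standard_normal((len(offsets), c_in, c_out)) * 0.1

coord_to_row = {tuple(c): i for i, c in enumerate(coords)}
out_feats = np.zeros((len(coords), c_out))

for k, offset in enumerate(offsets):
    # Gather: input rows whose coordinate + offset is also an active site.
    in_rows, out_rows = [], []
    for i, c in enumerate(coords):
        j = coord_to_row.get(tuple(c + np.array(offset)))
        if j is not None:
            in_rows.append(i)
            out_rows.append(j)
    if not in_rows:
        continue
    # GEMM: one dense matmul per kernel offset over the gathered rows.
    partial = feats[in_rows] @ weights[k]
    # Scatter: accumulate the partial sums back into the sparse output.
    np.add.at(out_feats, out_rows, partial)

print("active sites:", len(coords), "output feature shape:", out_feats.shape)
```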
Talks
NVIDIA
Bain Capital Ventures
Salesforce AI Research FutureForum
Projects
A Mamba-2.8B model finetuned with DPO. It is one of the most downloaded Mamba models on Hugging Face.