Shang Yang | MIT EECS
Ph.D. Student
MIT EECS
Cambridge, MA
shangy [at] mit [dot] edu
Shang Yang
I am a third-year Ph.D. student in the HAN Lab at MIT EECS, advised by Prof. Song Han. My long-term goal is to build efficient machine learning systems for applications at different scales, especially large language models (LLMs). Recently, I have been actively working on efficient inference systems for LLMs/VLMs.
News
- [2025/11] 🎉 TLT, our efficient RL framework for reasoning LLMs, has been accepted by ASPLOS 2026!
- [2025/05] 🔥 I presented QServe and LServe at MLSys 2025! [QServe Video] / [LServe Video]
- [2025/02] 🎉 Both QServe and LServe have been accepted by MLSys 2025!
- [2025/02] 🔥 We released LServe, substantially accelerating long-sequence LLM inference with Unified Sparse Attention.
- [2024/05] 🔥 We released QServe, an efficient large-scale LLM serving framework with W4A8KV4 quantization.
- [2024/05] 🎉 AWQ & TinyChat received the Best Paper Award at MLSys 2024!
- [2024/03] We released an updated version of TinyChat. Visual Language Models (e.g., VILA) are now supported! Play with our demo!
- [2024/02] 🔥 AWQ was accepted by MLSys 2024!
- [2023/10] 🔥 I presented TorchSparse++ at MICRO 2023! See the video and slides here!
Selected Publications
- [ASPLOS] TLT: Qinghao Hu*, Shang Yang*, Junxian Guo, Xiaozhe Yao, Yujun Lin, Yuxian Gu, Han Cai, Chuang Gan, Ana Klimovic, Song Han. The 31st International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2026.
- [MLSys] LServe: Shang Yang*, Junxian Guo*, Haotian Tang, Qinghao Hu, Guangxuan Xiao, Jiaming Tang, Yujun Lin, Zhijian Liu, Yao Lu, Song Han. The Eighth Annual Conference on Machine Learning and Systems (MLSys), 2025.
- [MLSys] QServe: Yujun Lin*, Haotian Tang*, Shang Yang*, Zhekai Zhang, Guangxuan Xiao, Chuang Gan, Song Han. The Eighth Annual Conference on Machine Learning and Systems (MLSys), 2025.
- [MLSys] AWQ: Ji Lin*, Jiaming Tang*, Haotian Tang†, Shang Yang†, Wei-Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, Song Han. The Seventh Annual Conference on Machine Learning and Systems (MLSys), 2024. Code. Best Paper Award.
- [MICRO] TorchSparse++: Haotian Tang*, Shang Yang*, Zhijian Liu, Ke Hong, Zhongming Yu, Xiuyu Li, Guohao Dai, Yu Wang, Song Han. 56th IEEE/ACM International Symposium on Microarchitecture (MICRO), 2023.
Blogs
- Explore the latest advancement in TinyChat: version 2.0, with significant improvements in prefilling speed for edge LLMs and VLMs. On top of the 3-4x decoding speedups achieved with AWQ quantization, TinyChat 2.0 now delivers state-of-the-art Time-To-First-Token, 1.5-1.7x faster than the legacy version of TinyChat.
- Explore the latest advancement in TinyChat and AWQ: the integration of Visual Language Models (VLMs) on the edge! Advances in VLMs allow LLMs to comprehend visual inputs, enabling image understanding tasks such as caption generation and question answering. With the latest release, TinyChat supports leading VLMs such as VILA, which can be easily quantized with AWQ, giving users a seamless experience for image understanding tasks.
- Running large language models (LLMs) on the edge is of great importance. In this blog, we introduce TinyChat, an efficient and lightweight system for LLM deployment on the edge. It runs Meta's latest LLaMA-2 model at 30 tokens/second on NVIDIA Jetson Orin and can easily support different models and hardware.
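For readers curious what the "W4" in weight-quantization schemes like AWQ's 4-bit weights or QServe's W4A8KV4 means mechanically, here is a minimal NumPy sketch of per-group asymmetric 4-bit weight quantization. This is illustrative only, not the actual AWQ or QServe implementation (which use scale-searching and fused GPU kernels); the group size and function names are my own.

```python
import numpy as np

def quantize_w4_per_group(w, group_size=128):
    """Per-group asymmetric 4-bit quantization of a weight matrix.

    Returns uint8 codes in [0, 15] plus a per-group scale and zero-point
    so the weights can be approximately reconstructed.
    """
    out_features, in_features = w.shape
    assert in_features % group_size == 0
    g = w.reshape(out_features, in_features // group_size, group_size)
    w_min = g.min(axis=-1, keepdims=True)
    w_max = g.max(axis=-1, keepdims=True)
    scale = (w_max - w_min) / 15.0              # 4 bits -> 16 levels
    scale = np.where(scale == 0, 1e-8, scale)   # guard constant groups
    zero = np.round(-w_min / scale)             # integer zero-point
    q = np.clip(np.round(g / scale) + zero, 0, 15).astype(np.uint8)
    return q, scale, zero

def dequantize(q, scale, zero):
    g = (q.astype(np.float32) - zero) * scale
    return g.reshape(g.shape[0], -1)

w = np.random.randn(8, 256).astype(np.float32)
q, s, z = quantize_w4_per_group(w)
w_hat = dequantize(q, s, z)
print(np.abs(w - w_hat).max())  # small error, bounded by the per-group scale
```

Storing 4-bit codes plus one scale/zero-point per group cuts weight memory roughly 4x versus FP16, which is the main lever these serving systems exploit; the real systems additionally keep activations and KV cache in low precision and dequantize inside fused matmul kernels.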
© Copyright 2024 Shang Yang. Powered by Jekyll and Minimal Light theme.