Publications - Yile (Michael) Gu

Yile (Michael) Gu
A CSE PhD student at the University of Washington
- Seattle, WA
Publications
Peer-reviewed Publications
- Mitigating Application Resource Overload with Targeted Task Cancellation
Yigong Hu, Zeyin Zhang, Yicheng Liu, Yile Gu, Shuangyu Lei, Baris Kasikci, Peng Huang.
(To Appear) SOSP 2025, Seoul, Republic of Korea, November 2025.
- Scalable and Accurate Application-level Crash-Consistency Testing via Representative Testing
Yile Gu*, Ian Neal*, Jiexiao Xu, Shaun Christopher Lee, Ayman Said, Musa Haydar, Jacob Van Geffen, Rohan Kadekodi, Andrew Quinn, Baris Kasikci.
(To Appear) OOPSLA 2025, Singapore, October 2025. https://arxiv.org/abs/2503.01390
- NanoFlow: Towards Optimal Large Language Model Serving Throughput
Kan Zhu, Yufei Gao, Yilong Zhao, Liangyu Zhao, Gefei Zuo, Yile Gu, Dedong Xie, Zihao Ye, Keisuke Kamahori, Chien-Yu Lin, Ziren Wang, Stephanie Wang, Arvind Krishnamurthy, Baris Kasikci.
OSDI 2025, Boston, MA, USA, July 2025. https://arxiv.org/abs/2408.12757
- Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
Keisuke Kamahori*, Tian Tang*, Yile Gu, Kan Zhu, Baris Kasikci.
ICLR 2025, Singapore, May 2025. https://arxiv.org/abs/2402.07033
- Perseus: Removing Energy Bloat from Large Model Training
Jae-Won Chung, Yile Gu, Insu Jang, Luoxi Meng, Nikhil Bansal, Mosharaf Chowdhury.
SOSP 2024, Austin, TX, USA, November 2024. https://arxiv.org/abs/2312.06902
Preprints
- ConsumerBench: Benchmarking Generative AI Applications on End-User Devices
Yile Gu*, Rohan Kadekodi*, Hoang Nguyen, Keisuke Kamahori, Yiyu Liu, Baris Kasikci.
https://arxiv.org/abs/2506.17538
- Argos: Agentic Time-Series Anomaly Detection with Autonomous Rule Generation via Large Language Models
Yile Gu, Yifan Xiong, Jonathan Mace, Yuting Jiang, Yigong Hu, Baris Kasikci, Peng Cheng.
https://arxiv.org/abs/2501.14170
- Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs
Kan Zhu*, Tian Tang*, Qinyu Xu*, Yile Gu, Zhichen Zeng, Rohan Kadekodi, Liangyu Zhao, Ang Li, Arvind Krishnamurthy, Baris Kasikci.
https://arxiv.org/abs/2502.12216
- Semantic Scheduling for LLM Inference
Wenyue Hua*, Dujian Ding*, Yile Gu, Yujie Ren, Kai Mei, Minghua Ma, William Yang Wang.
https://arxiv.org/abs/2506.12204
- TeleRAG: Efficient Retrieval-Augmented Generation Inference with Lookahead Retrieval
Chien-Yu Lin*, Keisuke Kamahori*, Yiyu Liu, Xiaoxiang Shi, Madhav Kashyap, Yile Gu, Rulin Shao, Zihao Ye, Kan Zhu, Stephanie Wang, Arvind Krishnamurthy, Rohan Kadekodi, Luis Ceze, Baris Kasikci.
https://arxiv.org/abs/2502.20969