Publications

Peer-reviewed Publications

  • Mitigating Application Resource Overload with Targeted Task Cancellation
    Yigong Hu, Zeyin Zhang, Yicheng Liu, Yile Gu, Shuangyu Lei, Baris Kasikci, Peng Huang.
    (To Appear) SOSP 2025, Seoul, Republic of Korea, November 2025.
  • Scalable and Accurate Application-level Crash-Consistency Testing via Representative Testing
    Yile Gu*, Ian Neal*, Jiexiao Xu, Shaun Christopher Lee, Ayman Said, Musa Haydar, Jacob Van Geffen, Rohan Kadekodi, Andrew Quinn, Baris Kasikci.
    (To Appear) OOPSLA 2025, Singapore, October 2025. https://arxiv.org/abs/2503.01390
  • NanoFlow: Towards Optimal Large Language Model Serving Throughput
    Kan Zhu, Yufei Gao, Yilong Zhao, Liangyu Zhao, Gefei Zuo, Yile Gu, Dedong Xie, Zihao Ye, Keisuke Kamahori, Chien-Yu Lin, Ziren Wang, Stephanie Wang, Arvind Krishnamurthy, Baris Kasikci.
    OSDI 2025, Boston, MA, USA, July 2025. https://arxiv.org/abs/2408.12757
  • Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
    Keisuke Kamahori*, Tian Tang*, Yile Gu, Kan Zhu, Baris Kasikci.
    ICLR 2025, Singapore, May 2025. https://arxiv.org/abs/2402.07033
  • Perseus: Removing Energy Bloat from Large Model Training
    Jae-Won Chung, Yile Gu, Insu Jang, Luoxi Meng, Nikhil Bansal, Mosharaf Chowdhury.
    SOSP 2024, Austin, TX, USA, November 2024. https://arxiv.org/abs/2312.06902

Preprints

  • ConsumerBench: Benchmarking Generative AI Applications on End-User Devices
    Yile Gu*, Rohan Kadekodi*, Hoang Nguyen, Keisuke Kamahori, Yiyu Liu, Baris Kasikci.
    https://arxiv.org/abs/2506.17538
  • Argos: Agentic Time-Series Anomaly Detection with Autonomous Rule Generation via Large Language Models
    Yile Gu, Yifan Xiong, Jonathan Mace, Yuting Jiang, Yigong Hu, Baris Kasikci, Peng Cheng.
    https://arxiv.org/abs/2501.14170
  • Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs
    Kan Zhu*, Tian Tang*, Qinyu Xu*, Yile Gu, Zhichen Zeng, Rohan Kadekodi, Liangyu Zhao, Ang Li, Arvind Krishnamurthy, Baris Kasikci.
    https://arxiv.org/abs/2502.12216
  • Semantic Scheduling for LLM Inference
    Wenyue Hua*, Dujian Ding*, Yile Gu, Yujie Ren, Kai Mei, Minghua Ma, William Yang Wang.
    https://arxiv.org/abs/2506.12204
  • TeleRAG: Efficient Retrieval-Augmented Generation Inference with Lookahead Retrieval
    Chien-Yu Lin*, Keisuke Kamahori*, Yiyu Liu, Xiaoxiang Shi, Madhav Kashyap, Yile Gu, Rulin Shao, Zihao Ye, Kan Zhu, Stephanie Wang, Arvind Krishnamurthy, Rohan Kadekodi, Luis Ceze, Baris Kasikci.
    https://arxiv.org/abs/2502.20969