Haoli Bai (柏昊立)
Researcher, Huawei Hong Kong Research Center
Hong Kong SAR, China
Email: haolibai [at] gmail.com
General
I am currently a researcher at the Language Model Lab, Huawei Hong Kong Research Center. I obtained my Ph.D. degree from The Chinese University of Hong Kong, supervised by Prof. Michael R. Lyu and Prof. Irwin King, and my B.Eng. degree from the Yingcai Honors College of the University of Electronic Science and Technology of China.
Our team works on large language models, with topics spanning pre-training, post-training, and agentic AI (e.g., deep research and coding agents). I am also an experienced researcher in LLM efficiency, e.g., compression and acceleration of LLMs.
[Hiring]🔥 We are constantly looking for full-time researchers and research interns with a solid algorithm or system background (base: Hong Kong or Shenzhen). Please reach out by email.
News
- [2025-11] We will present the tutorial "Efficient Inference for Large Language Models – Algorithm, Model, and System" at EMNLP 2025. Tutorial website.
- [2025-11] I will give a talk on "Quantization and Pruning of Large Language Models: Challenges, Techniques and Opportunities" at LMG 2025.
- [2025-9] Our paper "A Simple Linear Patch Revives Layer-Pruned Large Language Models" is accepted by NeurIPS 2025.
- [2025-7]🔥Our paper "Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models" is accepted by COLM 2025. It was on the top-10 trending list on alphaXiv. Code is available here.
- [2025-4] I will serve as the Area Chair for NeurIPS 2025.
Selected Research
(*: Equal contribution; #: Corresponding author; +: Project lead)
-
Xinrui Chen, Haoli Bai#+, Tao Yuan, Ruikang Liu, Kang Zhao, Xianzhi Yu, Lu Hou, Tian Guan, Yonghong He, Chun Yuan#
A Simple Linear Patch Revives Layer-Pruned Large Language Models
Proceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS), 2025.
-
Ruikang Liu*, Yuxuan Sun*, Manyi Zhang*, Haoli Bai#+, Xianzhi Yu, Tiezheng Yu, Chun Yuan, Lu Hou#
Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models
Conference on Language Modeling (COLM), 2025. [Code]
-
Yuxuan Sun*, Ruikang Liu*, Haoli Bai#+, Han Bao, Kang Zhao, Yuening Li, Jiaxin Hu, Xianzhi Yu, Lu Hou, Chun Yuan, Xin Jiang, Wulong Liu, Jun Yao
FlatQuant: Flatness Matters for LLM Quantization
International Conference on Machine Learning (ICML), 2025. [Code]
-
Zhiming Mao, Haoli Bai#+, Lu Hou, Lifeng Shang, Xin Jiang, Qun Liu, Kam-Fai Wong
Visually Guided Generative Text-Layout Pre-training for Document Intelligence
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2024. [Code]
-
Ruikang Liu, Haoli Bai+, Haokun Lin, Yuening Li, Han Gao, Zhengzhuo Xu, Lu Hou, Jun Yao, Chun Yuan
IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact
Findings of Annual Meeting of the Association for Computational Linguistics (ACL), 2024. [Code]
-
Haokun Lin, Haoli Bai+, Zhili Liu, Lu Hou, Muyi Sun, Linqi Song, Ying Wei, Zhenan Sun
MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024.
-
Yingtao Zhang, Haoli Bai+, Haokun Lin, Jialin Zhao, Lu Hou, Carlo Vittorio Cannistraci
Plug-and-Play: An Efficient Post-training Pruning Method for Large Language Models
The Twelfth International Conference on Learning Representations (ICLR), 2024. [Code]
-
Haoli Bai*, Zhiguang Liu*, Xiaojun Meng*, Wentao Li, Shuang Liu, Nian Xie, Rongfu Zheng, Liangwei Wang, Lu Hou, Jiansheng Wei, Xin Jiang, Qun Liu
Wukong-Reader: Multi-modal Pre-training for Fine-grained Visual Document Understanding
The 61st Annual Meeting of the Association for Computational Linguistics (ACL), 2023.
-
Chaofan Tao, Lu Hou+, Haoli Bai+, Jiansheng Wei, Xin Jiang, Qun Liu, Ping Luo, Ngai Wong
Structured Pruning for Efficient Generative Pre-trained Language Models
Findings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL), 2023.
-
Haoli Bai, Lu Hou, Lifeng Shang, Xin Jiang, Irwin King, Michael Lyu
Towards Efficient Post-training Quantization of Pre-trained Language Models
Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS), 2022.
-
Haoli Bai, Hongda Mao, Dinesh Nair
Dynamically Pruning SegFormer for Efficient Semantic Segmentation
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022.
-
Haoli Bai, Wei Zhang, Lu Hou, Lifeng Shang, Jing Jin, Xin Jiang, Qun Liu, Michael Lyu, Irwin King
BinaryBERT: Pushing the Limit of BERT Quantization
The 59th Annual Meeting of the Association for Computational Linguistics (ACL), 2021. Accepted with scores 5, 5, 4. [Code]
-
Haoli Bai*, Jiaxing Wang*, Jiaxiang Wu, Xupeng Shi, Junzhou Huang, Irwin King, Michael Lyu, Jian Cheng
Revisiting Parameter Sharing for Automatic Neural Channel Number Search
Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS), 2020. [Code]
-
Haoli Bai, Jiaxiang Wu, Irwin King, Michael Lyu
Few Shot Network Compression via Cross Distillation
Proceedings of the 34th AAAI Conference on Artificial Intelligence (AAAI), 2020. [Code] [Poster]
-
Haoli Bai, Zhuangbin Chen, Michael Lyu, Irwin King, Zenglin Xu
Neural Relational Topic Models for Scientific Article Analysis
Proceedings of the 27th International Conference on Information and Knowledge Management (CIKM), 2018. [Code]
-
Haoli Bai, Zenglin Xu, Bin Liu, Yingming Li
Hierarchical Probabilistic Matrix Factorization with Network Topology for Multi-relational Social Network
Proceedings of the 8th Asian Conference on Machine Learning (ACML), 2016. Best Student Paper Runner-up.
Invited Talks
- "Quantization and Pruning of Large Language Models: Challenges, Techinques and Oppertunities" at SLAI, 2025. [Slide].
- "Efficient Inference for Large Language Models – Algorithm, Model, and System" at EMNLP Tutorial, 2025. Tutorial website.
- "Quantization and Pruning of Large Language Models: Challenges, Techinques and Oppertunities" at LMG , 2025.
Projects
-
PocketFlow: An Automated Framework for Compressing and Accelerating DNNs
PocketFlow automatically searches for optimal model compression strategies such as network pruning, quantization, and knowledge distillation with little human effort, and also supports TFLite deployment on Android devices. It has collected 2,600+ stars and 480+ forks.
Services
- Area Chair: NeurIPS 2025
- Senior PC Member: IJCAI 2021
- PC Member: ICLR 22-25, ICML 21-25, NeurIPS 20-24, ACL ARR 25, COLM 25, ICCV 25, AAAI 19-21, IJCAI 20
- Journal Reviewer: T-PAMI, Neural Networks, etc.
Selected Awards
- Excellent Intern, Huawei Noah's Ark Lab, 2021
- AAAI Student Travel Grant, AAAI 2020
- ACM Student Travel Grant, CIKM 2018
- CUHK Postgraduate Student Scholarship, 2017-2021
- Best Student Paper Runner-up, ACML 2016
- National Scholarship, 2015
- Tang Lixin Scholarship, 2015
Working Experience
- Applied Scientist Intern at Amazon Devices, Summer 2021
- Research Intern at Huawei Noah's Ark Lab, Summer 2020
- Research Intern at Tencent AI Lab, Summer 2018
Teaching Assistant
- CSCI3100: Software Engineering, Spring 2020
- CSCI3100: Software Engineering, Spring 2019
- CSCI1540: Introduction to C++, Fall 2018
- CSCI3100: Software Engineering, Spring 2018



