Huan Wang
Senior Director at Salesforce Research
I am currently a Senior Director at Salesforce Research in Palo Alto, CA. I received my Ph.D. in Computer Science from Yale University in 2013, where I was advised by Prof. Daniel Spielman. I was also mentored by Prof. John Wright at Columbia University. Prior to Yale, I was a member of the Multimedia Lab at the Chinese University of Hong Kong, supervised by Prof. Xiaoou Tang, Prof. Shuicheng Yan, and Prof. Jianzhuang Liu.
Open Source Projects
A collection of research projects and tools I've contributed to, spanning Large Language Models, AI Agents, Multimodal AI, and more.
APIGen
Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets
APIGen-MT
Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay
AgentLite
A Lightweight Library for Building and Advancing Task-Oriented LLM Agent System
CRM Arena
Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments
Persona Bench
Evaluating AI Models on Understanding Personal Information through Accessing (Synthetic) Private User Data
LoCoBench
A Benchmark for Long-Context Large Language Models in Complex Software Engineering
Diversity Empowers Intelligence
Integrating Expertise of Software Engineering Agents
Retroformer
Retrospective Large Language Agents with Policy Gradient Optimization
DialogStudio
Towards Richest and Most Diverse Unified Dataset Collection and Instruction-Aware Models for Conversational AI
Converse
A Flexible Framework for Building and Deploying Task-Oriented Chatbots.
Publication Highlights
The full publication list can be found on Google Scholar.
Large Language Model (LLM)
APIGen-MT: Agentic PIpeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay, by Akshara Prabhakar, Zuxin Liu, Ming Zhu, Jianguo Zhang, Tulika Awalgaonkar, Shiyu Wang, Zhiwei Liu, Haolin Chen, Thai Hoang, Juan Carlos Niebles, Shelby Heinecke, Weiran Yao, Huan Wang*, Silvio Savarese*, Caiming Xiong*. NeurIPS Datasets and Benchmarks Track, 2025. [Data][Model], * co-corresponding authors.
APIGen: Automated PIpeline for Generating Verifiable and Diverse Function-Calling Datasets, by Zuxin Liu, Thai Hoang, Jianguo Zhang, Ming Zhu, Tian Lan, Shirley Kokane, Juntao Tan, Weiran Yao, Zhiwei Liu, Yihao Feng, Rithesh Murthy, Liangwei Yang, Silvio Savarese, Juan Carlos Niebles, Huan Wang, Shelby Heinecke, Caiming Xiong. NeurIPS, 2024. [Data][Model]
xLAM: A Family of Large Action Models to Empower AI Agent Systems, by Jianguo Zhang, Tian Lan, Ming Zhu, Zuxin Liu, Thai Hoang, Shirley Kokane, Weiran Yao, Juntao Tan, Akshara Prabhakar, Haolin Chen, Zhiwei Liu, Yihao Feng, Tulika Awalgaonkar, Rithesh Murthy, Eric Hu, Zeyuan Chen, Ran Xu, Juan Carlos Niebles, Shelby Heinecke, Huan Wang*, Silvio Savarese*, Caiming Xiong*. Arxiv, 2024. [Github Repo], * co-corresponding authors.
Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization, by Weiran Yao, Shelby Heinecke, Juan Carlos Niebles, Zhiwei Liu, Yihao Feng, Le Xue, Rithesh Murthy, Zeyuan Chen, Jianguo Zhang, Devansh Arpit, Ran Xu, Phil Mui, Huan Wang*, Caiming Xiong*, Silvio Savarese*. Arxiv, 2023. [Github Repo], * co-corresponding authors.
BOLAA: Benchmarking and Orchestrating LLM-augmented Autonomous Agents, by Zhiwei Liu, Weiran Yao, Jianguo Zhang, Le Xue, Shelby Heinecke, Rithesh Murthy, Yihao Feng, Zeyuan Chen, Juan Carlos Niebles, Devansh Arpit, Ran Xu, Phil Mui, Huan Wang*, Caiming Xiong*, Silvio Savarese*. Arxiv, 2023. [Github Repo], * co-corresponding authors.
REX: Rapid Exploration and eXploitation for AI Agents, by Rithesh Murthy, Shelby Heinecke, Juan Carlos Niebles, Zhiwei Liu, Le Xue, Weiran Yao, Yihao Feng, Zeyuan Chen, Akash Gokul, Devansh Arpit, Ran Xu, Phil Mui, Huan Wang*, Caiming Xiong*, Silvio Savarese*. Arxiv, 2023. * co-corresponding authors.
AgentOhana: Design Unified Data and Training Pipeline for Effective Agent Learning, by Jianguo Zhang, Tian Lan, Rithesh Murthy, Zhiwei Liu, Weiran Yao, Ming Zhu, Juntao Tan, Thai Hoang, Zuxin Liu, Liangwei Yang, Yihao Feng, Shirley Kokane, Tulika Awalgaonkar, Juan Carlos Niebles, Silvio Savarese, Shelby Heinecke, Huan Wang, Caiming Xiong. Arxiv, 2024. [Github Repo]
CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis, by Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong. Arxiv, 2022. [Github Repo]
AI Agent
AgentLite: A Lightweight Library for Building and Advancing Task-Oriented LLM Agent System, by Zhiwei Liu, Weiran Yao, Jianguo Zhang, Liangwei Yang, Zuxin Liu, Juntao Tan, Prafulla K. Choubey, Tian Lan, Jason Wu, Huan Wang, Shelby Heinecke, Caiming Xiong, Silvio Savarese. Arxiv, 2024. [Github Repo]
Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents, by Kexin Zhang, Weiran Yao, Zuxin Liu, Yihao Feng, Zhiwei Liu, Rithesh Murthy, Tian Lan, Lei Li, Renze Lou, Jiacheng Xu, Bo Pang, Yingbo Zhou, Shelby Heinecke, Silvio Savarese, Huan Wang, Caiming Xiong. Arxiv, 2024.
CRMArena: Understanding the Capacity of LLM Agents to Perform Professional CRM Tasks in Realistic Environments, by Kung-Hsiang Huang, Akshara Prabhakar, Sidharth Dhawan, Yixin Mao, Huan Wang, Silvio Savarese, Caiming Xiong, Philippe Laban, Chien-Sheng Wu. NAACL, 2025. [Github Repo]
MobileAIBench: Benchmarking LLMs and LMMs for On-Device Use Cases, by Rithesh Murthy, Liangwei Yang, Juntao Tan, Tulika Manoj Awalgaonkar, Yilun Zhou, Shelby Heinecke, Sachin Desai, Jason Wu, Ran Xu, Sarah Tan, Jianguo Zhang, Zhiwei Liu, Shirley Kokane, Zuxin Liu, Ming Zhu, Huan Wang, Caiming Xiong, Silvio Savarese. Arxiv, 2024.
MCPEval: Automatic MCP-based Deep Evaluation for AI Agent Models, by Zhiwei Liu, Jielin Qiu, Shiyu Wang, Jianguo Zhang, Zuxin Liu, Roshan Ram, Haolin Chen, Weiran Yao, Shelby Heinecke, Silvio Savarese, Huan Wang, Caiming Xiong. Arxiv, 2025. [Github Repo]
ToolScan: A Benchmark for Characterizing Errors in Tool-Use LLMs, by Shirley Kokane, Ming Zhu, Tulika Awalgaonkar, Jianguo Zhang, Thai Hoang, Akshara Prabhakar, Zuxin Liu, Tian Lan, Liangwei Yang, Juntao Tan, Rithesh Murthy, Weiran Yao, Zhiwei Liu, Juan Carlos Niebles, Huan Wang, Shelby Heinecke, Caiming Xiong, Silvio Savarese. Arxiv, 2024.
UserBench: An Interactive Gym Environment for User-Centric Agents, by Cheng Qian, Zuxin Liu, Akshara Prabhakar, Zhiwei Liu, Jianguo Zhang, Haolin Chen, Heng Ji, Weiran Yao, Shelby Heinecke, Silvio Savarese, Caiming Xiong, Huan Wang. Arxiv, 2025.
LoCoBench: A Benchmark for Long-Context Large Language Models in Complex Software Engineering, by Jielin Qiu, Zuxin Liu, Zhiwei Liu, Rithesh Murthy, Jianguo Zhang, Haolin Chen, Shiyu Wang, Ming Zhu, Liangwei Yang, Juntao Tan, Zhepeng Cen, Cheng Qian, Shelby Heinecke, Weiran Yao, Silvio Savarese, Caiming Xiong, Huan Wang. Arxiv, 2025. [Github Repo]
LAM SIMULATOR: Advancing Data Generation for Large Action Model Training via Online Exploration and Trajectory Feedback, by Thai Hoang, Kung-Hsiang Huang, Shirley Kokane, Jianguo Zhang, Zuxin Liu, Ming Zhu, Jake Grigsby, Tian Lan, Michael S Ryoo, Chien-Sheng Wu, Shelby Heinecke, Huan Wang, Silvio Savarese, Caiming Xiong, Juan Carlos Niebles. Arxiv, 2025. [Github Repo]
PersonaBench: Evaluating AI Models on Understanding Personal Information through Accessing (Synthetic) Private User Data, by Juntao Tan, Liangwei Yang, Zuxin Liu, Zhiwei Liu, Rithesh Murthy, Tulika Manoj Awalgaonkar, Jianguo Zhang, Weiran Yao, Ming Zhu, Shirley Kokane, Silvio Savarese, Huan Wang, Caiming Xiong, Shelby Heinecke. ACL Findings, 2025. [Github Repo]
LLM Reasoning
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding, by Haolin Chen, Yihao Feng, Zuxin Liu, Weiran Yao, Akshara Prabhakar, Shelby Heinecke, Ricky Ho, Phil Mui, Silvio Savarese, Caiming Xiong, Huan Wang. Arxiv, 2024.
PRACT: Optimizing Principled Reasoning and Acting of LLM Agent, by Zhiwei Liu, Weiran Yao, Jianguo Zhang, Rithesh Murthy, Liangwei Yang, Zuxin Liu, Tian Lan, Ming Zhu, Juntao Tan, Shirley Kokane, Thai Hoang, Juan Carlos Niebles, Shelby Heinecke, Huan Wang, Silvio Savarese, Caiming Xiong. SIG CoNLL, 2024.
LATTE: Learning to Think with Vision Specialists, by Zixian Ma, Jianguo Zhang, Zhiwei Liu, Jieyu Zhang, Juntao Tan, Manli Shu, Juan Carlos Niebles, Shelby Heinecke, Huan Wang, Caiming Xiong, Ranjay Krishna, Silvio Savarese. Arxiv, 2024.
Reinforcement Learning
On the Generalization Gap in Reparameterizable Reinforcement Learning, by Huan Wang, Stephan Zheng, Caiming Xiong, Richard Socher. ICML, 2019.
Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning, by Tengyang Xie, Nan Jiang, Huan Wang, Caiming Xiong, Yu Bai. NeurIPS, 2021.
Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games, by Yu Bai, Chi Jin, Huan Wang, and Caiming Xiong. NeurIPS, 2021.
WarpDrive: Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning on a GPU, by Tian Lan, Sunil Srinivasa, Huan Wang, Stephan Zheng. Arxiv, 2021. [Github Repo]
Uncertainty Estimation
Improved Online Conformal Prediction via Strongly Adaptive Online Learning, by Aadyot Bhatnagar, Huan Wang, Caiming Xiong, Yu Bai. ICML, 2023.
Understanding the Under-Coverage Bias in Uncertainty Estimation, by Yu Bai, Song Mei, Huan Wang, Caiming Xiong. NeurIPS, 2021.
Localized Calibration: Metrics and Recalibration, by Rachel Luo, Aadyot Bhatnagar, Huan Wang, Caiming Xiong, Silvio Savarese, Yu Bai, Shengjia Zhao, Stefano Ermon. Arxiv, 2021.
Natural Language Processing
Unsupervised Paraphrasing with Pretrained Language Models, by Tong Niu, Semih Yavuz, Yingbo Zhou, Nitish Shirish Keskar, Huan Wang and Caiming Xiong. EMNLP, 2021.
BatchMixup: Improving Training by Interpolating Hidden States of the Entire Mini-batch, by Wenpeng Yin, Huan Wang, Jin Qu, Caiming Xiong. ACL Findings, 2021.
Neural Network and Deep Learning
Evaluating State-of-the-Art Classification Models Against Bayes Optimality, by Ryan Theisen, Huan Wang, Lav R Varshney, Caiming Xiong, and Richard Socher. NeurIPS, 2021.
Catastrophic Fisher Explosion: Early Phase Fisher Matrix Impacts Generalization, by Stanislaw Jastrzebski, Devansh Arpit, Oliver Astrand, Giancarlo Kerg, Huan Wang, Caiming Xiong, Richard Socher, Kyunghyun Cho, Krzysztof Geras. ICML, 2021.
Sparse Representation and Dictionary Learning
Exact Recovery of Sparsely-Used Dictionaries, by Daniel Spielman, Huan Wang, and John Wright. Best paper award of the 25th Conference on Learning Theory (COLT), June 2012.
Music
A researcher who can't write code is not a good singer - Original compositions and musical works
My Compositions (POP and NEW AGE)
My Recordings / Covers
YouTube Channel
Subscribe to my YouTube channel for more music content:
JoyousPrince (@YouTube)
NetEase Music Channel (网易音乐人)
Follow my NetEase Music artist page to stream my compositions:
Huan Wang (@NetEase Music)