Yiheng Xu
Researcher, OpenAI
charlesyihengxu[at]gmail[dot]com [Google Scholar]
[GitHub]
[LinkedIn]
[Twitter]
From Digital Automation to Autonomous Agents
I develop scalable methods that advance AI from digital workflow automation toward fully autonomous agents. My research spans three interconnected directions:
- Digital Workflow Automation via Diverse Interface Understanding: Building models that can interpret diverse digital interfaces, from unstructured, visually rich documents (LayoutLM, DiT, DocBank) to structured web pages (MarkupLM).
- Scaling Machine-like Autonomous Coding Agents: Designing and training agents that operate with machine-native efficiency through command-line and API-based interfaces, exemplified by Lemur, Qwen3 Coder, and Qwen Code.
- Towards Human-like Autonomous Computer Use Agents: Pushing the frontier of agents that can interact in human-designed GUI environments, including Aguvis, AgentTrek, and Qwen2.5-VL.
Selected Publications
- Qwen3 Coder
Core Contributor (responsible for CLI and web agent capability and the Qwen-Code framework)
[Blog] [Code]
- Qwen2.5-VL Technical Report
Core Contributor (responsible for computer-use and mobile-use agent capability)
[Blog] [PDF] [Code]
- Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
Yiheng Xu*, Zekun Wang*, Junli Wang*, Dunjie Lu, Tianbao Xie, Amrita Saha, Doyen Sahoo, Tao Yu, Caiming Xiong
ICML 2025
[PDF] [Project]
- AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials
Yiheng Xu*, Dunjie Lu*, Zhennan Shen*, Junli Wang, Zekun Wang, Yuchen Mao, Caiming Xiong, Tao Yu
ICLR 2025 Spotlight (Top 5%)
[PDF] [Project]
- Lemur: Harmonizing Natural Language and Code for Language Agents
Yiheng Xu*, Hongjin Su*, Chen Xing*, Boyu Mi, Qian Liu, Weijia Shi, Binyuan Hui, Fan Zhou, Yitao Liu, Tianbao Xie, Zhoujun Cheng, Siheng Zhao, Lingpeng Kong, Bailin Wang, Caiming Xiong, Tao Yu
ICLR 2024 Spotlight (Top 5%)
[PDF] [Code] [Model] [Blog]
- DiT: Self-supervised Pre-training for Document Image Transformer
Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei
ACM Multimedia 2022
[PDF] [Code] [Demo]
- MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding
Junlong Li*, Yiheng Xu*, Lei Cui, Furu Wei
ACL 2022
[PDF] [Code] [Model] [Blog]
- XFUND: A Benchmark Dataset for Multilingual Visually Rich Form Understanding
Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei
ACL 2022 Findings
[PDF] [Data]
- LayoutReader: Pre-training of Text and Layout for Reading Order Detection
Zilong Wang, Yiheng Xu, Lei Cui, Jingbo Shang, Furu Wei
EMNLP 2021
[PDF] [Code] [Blog]
- LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
Yang Xu*, Yiheng Xu*, Tengchao Lv*, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou
ACL 2021
[PDF] [Code] [Model] [Blog]
- DocBank: A Benchmark Dataset for Document Layout Analysis
Minghao Li*, Yiheng Xu*, Lei Cui, Shaohan Huang, Furu Wei, Zhoujun Li, Ming Zhou
COLING 2020
[PDF] [Code] [Blog]
- LayoutLM: Pre-training of Text and Layout for Document Image Understanding
Yiheng Xu*, Minghao Li*, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou
KDD 2020
[PDF] [Code] [Model] [Blog] [PaperDigest Most Influential Papers] [ICBS 2024 Frontiers of Science Award]
Service
- Conference Reviewer: AAAI, ACL, COLING, EMNLP, ICLR, ICML, MM
- Journal Reviewer: IEEE Transactions on Multimedia, Neurocomputing, IJDAR