Xiujun Li
Multimodal LLMs, LLMs, NLP, Vision and Language
I am currently a Research Scientist at Apple. I received my PhD from UW CSE in 2024, advised by Yejin Choi. Before that, I spent five years at Microsoft Research. My research spans dialog systems, deep reinforcement learning, NLP, vision and language, and multimodal LLMs. Recently, I have also become interested in video generation.
Selected Papers (Google Scholar)
Multimodal LLMs
-
Xiujun Li*, Yujie Lu*, Zhe Gan, Jianfeng Gao, William Yang Wang, Yejin Choi
Text as Images: Can Multimodal Large Language Models Follow Printed Instructions in Pixels?, arXiv 2024
-
Zhangheng Li, Keen You, Haotian Zhang, Di Feng, Harsh Agrawal, Xiujun Li, Mohana Prasad Sathya Moorthy, Jeff Nichols, Yinfei Yang, Zhe Gan
Ferret-UI 2: Mastering Universal User Interface Understanding Across Platforms, ICLR 2025
-
Yingzi Ma, Jiongxiao Wang, Fei Wang, Siyuan Ma, Jiazhao Li, Xiujun Li, Furong Huang, Lichao Sun, Bo Li, Yejin Choi, Muhao Chen, Chaowei Xiao
Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset, ICLR 2025
-
Enrico Fini, Mustafa Shukor, Xiujun Li, Philipp Dufter, Michal Klein, David Haldimann, Sai Aitharaju, Victor Guilherme Turrisi da Costa, Louis Béthune, Zhe Gan, Alexander T Toshev, Marcin Eichner, Moin Nabi, Yinfei Yang, Joshua M. Susskind, Alaaeldin El-Nouby
Multimodal Autoregressive Pre-training of Large Vision Encoders, CVPR 2025
Vision and Language
-
Pengchuan Zhang*, Xiujun Li*, Xiaowei Hu, Jianwei Yang, Lei Zhang, Lijuan Wang, Yejin Choi, Jianfeng Gao
VinVL: Making Visual Representations Matter in Vision-Language Models, CVPR 2021
-
Xiujun Li, Xi Yin, Chunyuan Li, Xiaowei Hu, Pengchuan Zhang, Lei Zhang, Lijuan Wang, Houdong Hu, Li Dong, Furu Wei, Yejin Choi, Jianfeng Gao
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks, ECCV 2020
-
Xiujun Li, Chunyuan Li, Qiaolin Xia, Yonatan Bisk, Asli Celikyilmaz, Jianfeng Gao, Noah Smith, Yejin Choi
Robust navigation with language pretraining and stochastic sampling, EMNLP 2019
Dialog, RL
-
Xiujun Li, Yun-Nung Chen, Lihong Li, Jianfeng Gao, Asli Celikyilmaz
End-to-End Task-Completion Neural Dialogue Systems, IJCNLP 2017
-
Baolin Peng, Xiujun Li, Lihong Li, Jianfeng Gao, Asli Celikyilmaz, Sungjin Lee, Kam-Fai Wong
Composite Task-Completion Dialogue System via Hierarchical Deep Reinforcement Learning, EMNLP 2017
-
Baolin Peng, Xiujun Li, Jianfeng Gao, Jingjing Liu, Kam-Fai Wong, Shang-Yu Su
Deep Dyna-Q: Integrating Planning for Task-Completion Dialogue Policy Learning, ACL 2018