Kevin Qinghong Lin
Postdoctoral Researcher, Torr Vision Group
Biography
My research focuses on developing multimodal intelligent agents to assist humans. This spans abilities such as:
- See multimodally: video understanding (VideoMind, VideoLLM-online) from scalable human data (EgoVLP, UniVTG).
- Think like humans: adaptive reasoning via reinforcement learning (Think or Not) and symbolic coding (Code2Video, VCode).
- Act in environments: computer-use agents (ShowUI, GroundCUA) for human workflows (Paper2Poster, Paper2Video).
I am open to collaborations with academia, industry, and startups. Feel free to drop me an email.
I am passionate about open-source!
Selected Publications [Google Scholar]
† indicates equal contribution. ✉ indicates the corresponding author. Denotes a student I mentored.
Video Reality Test: Can AI-Generated ASMR Videos Fool VLMs and Humans?
Jiaqi Wang, Weijia Wu, Yi Zhan, Rui Zhao, Ming Hu, James Cheng, Wei Liu, Philip Torr, Kevin QH. Lin✉
Preprint, 2025

Computer-Use Agents as Judges for Generative User Interface
Kevin QH. Lin†, Siyuan Hu†, Linjie Li, Zhengyuan Yang, Lijuan Wang, Philip Torr, Mike Z. Shou

VCode: A Multimodal Coding Benchmark with SVG as Symbolic Visual Representation
Kevin QH. Lin†, Yuhao Zheng†, Hangyu Ran†, Dantong Zhu, Dongxing Mao, Linjie Li, Philip Torr, Alex JP. Wang
Preprint, 2025

Paper2Video: Automatic Video Generation from Scientific Papers
Zeyu Zhu†, Kevin QH. Lin†, Mike Z. Shou
Preprint, 2025

Code2Video: A Code-centric Paradigm for Educational Video Generation
Yanzhe Chen†, Kevin QH. Lin†, Mike Z. Shou
Preprint, 2025

Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers
Wei Pang†, Kevin QH. Lin†, Xiangru Jian†, Xi He, Philip Torr
NeurIPS D&B, 2025

Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models
Jiaqi Wang†, Kevin QH. Lin†, James Cheng, Mike Z. Shou
NeurIPS, 2025

VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning
Ye Liu†, Kevin QH. Lin†, Chang Wen Chen, Mike Z. Shou
Preprint, 2025

Grounding Computer Use Agents on Human Demonstrations
Aarash Feizi†, Shravan Nayak†, Xiangru Jian, Kevin QH. Lin, Kaixin Li, Rabiul Awal, Xing Han Lù, Johan Obando-Ceron, Juan A Rodriguez, Nicolas Chapados, David Vazquez, Adriana Romero-Soriano, Reihaneh Rabbany, Perouz Taslakian, Christopher Pal, Spandana Gella, Sai Rajeswar
Preprint, 2025

UI-Vision: A Desktop-centric GUI Benchmark for Visual Perception and Interaction
Shravan Nayak†, Xiangru Jian†, Kevin QH. Lin, Juan A Rodriguez, Montek Kalsi, Rabiul Awal, Nicolas Chapados, M Tamer Özsu, Aishwarya Agrawal, David Vazquez, Christopher Pal, Perouz Taslakian, Spandana Gella, Sai Rajeswar
ICML, 2025

Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
Jinheng Xie†, Weijia Mao†, Zechen Bai†, David JH. Zhang†, Weihao Wang, Kevin QH. Lin, Yuchao Gu, Zhijie Chen, Zhenheng Yang, Mike Z. Shou
ICLR, 2025

AssistGPT: Towards Multi-modal Agent for Human-Centric AI Assistant
Difei Gao, Siyuan Hu, Kevin QH. Lin, Mike Z. Shou
ACMM HCMA Workshop, 2024. Best Demo Paper

VideoLLM-online: Online Video Large Language Model for Streaming Video
Joya Chen, Zhaoyang Lv, Shiwei Wu, Kevin QH. Lin, Chenan Song, Difei Gao, Jia-Wei Liu, Ziteng Gao, Dongxing Mao, Mike Z. Shou
CVPR, 2024

ShowUI: One Vision-Language-Action Model for GUI Visual Agent
Kevin QH. Lin, Linjie Li, Difei Gao, Zhengyuan Yang, Shiwei Wu, Zechen Bai, Stan WX. Lei, Lijuan Wang, Mike Z. Shou
CVPR, 2025

VLog: Video-Language Models by Generative Retrieval of Narration Vocabulary
Kevin QH. Lin, Mike Z. Shou
CVPR, 2025

VideoGUI: A Benchmark for GUI Automation from Instructional Videos
Kevin QH. Lin, Linjie Li, Difei Gao, Qinchen Wu, Mingyi Yan, Zhengyuan Yang, Lijuan Wang, Mike Z. Shou
NeurIPS D&B, 2024. Spotlight

Learning Video Context as Interleaved Multimodal Sequences
Kevin QH. Lin, Pengchuan Zhang, Difei Gao, Xide Xia, Joya Chen, Ziteng Gao, Jinheng Xie, Xuhong Xiao, Mike Z. Shou

UniVTG: Towards Unified Video-Language Temporal Grounding
Kevin QH. Lin, Pengchuan Zhang, Joya Chen, Shraman Pramanick, Difei Gao, Alex JP. Wang, Rui Yan, Mike Z. Shou
ICCV, 2023

Egocentric Video-Language Pretraining
Kevin QH. Lin, Alex JP. Wang, M. Soldan, M. Wray, R. Yan, Eric ZC. Xu, D. Gao, R. Tu, W. Zhao, W. Kong, C. Cai, H. Wang, D. Damen, B. Ghanem, W. Liu, Mike Z. Shou
NeurIPS, 2022. Spotlight (1.7%)
Honors
- Tinker Research Grant, Thinking Machines Lab, 2025
- DAAD AINeT Fellowship, 2025
- CVPR Doctoral Consortium, 2025
- Outstanding Paper Award, NeurIPS Open-World Agents, 2024
- NeurIPS Top Reviewers, 2024
- Best Demo Paper Award, ACM Multimedia HCMA, 2024
- Egocentric Vision (EgoVis) Distinguished Paper Award, 2024
- CVPR Outstanding Reviewers (Top 2%), 2024
- PREMIA Best Student Paper Awards, Gold Award, 2023
- NeurIPS Scholar Award, 2022
- Tencent Rhino-Bird Research Scholarship, Second Prize, 2022
- 1st Place, Ego4D Object State Change Classification Challenge, CVPR 2022
- 1st Place, EPIC-Kitchens Multi-Instance Retrieval Challenge, CVPR 2022
- Show Lab Annual Award, 2022 & 2024
- China National Scholarship, 2018 & 2021
Service
- Area Chair: NeurIPS 2025.
- Workshop Organizer: Open Multimodal Gathering @ NUS; Multimodal Video Agent @ CVPR 2025.
- Conference Reviewer: CVPR (2024 Outstanding Reviewers), ICCV, ECCV, NeurIPS (2024 Top Reviewers), ICML, ICLR, etc.
- Journal Reviewer: TPAMI, IJCV, TMLR, TNNLS, TMM, etc.
- Co-organizer of The AI Talks.
