Yue Fan
Ph.D. candidate
University of California, Santa Cruz
yfan71 AT ucsc.edu
Github
Google Scholar
X
Linkedin
I am currently in my fifth year as a Ph.D. student in the Computer Science and Engineering (CSE) department at the University of California, Santa Cruz, advised by Professor Xin Eric Wang. I earned my Bachelor's degree in Automation from Shandong University, followed by a Master's degree in Robotics from Johns Hopkins University. My research interests predominantly lie in the fields of AI agents, reinforcement learning for reasoning, and post-training of multimodal LLMs.
I am on the job market, seeking full-time research opportunities starting in the first half of 2026. Please feel free to reach out if you have any opportunities. Thanks.
News
- [Sept 2025] Our GRIT paper is accepted by NeurIPS 2025.
- [Sept 2025] Our GUI-Bee paper is accepted by EMNLP 2025.
- [Sept 2025] I finished my full-time summer internship at Adobe Research.
- [May 2025] Our MMIR paper is accepted by ACL 2025 as a Findings paper.
- [Jan 2025] Our LLM-Coordination paper is accepted by NAACL 2025.
- [Sept 2024] Our Read Anywhere Pointed paper is accepted by EMNLP 2024.
- [June 2024] Our Muffin or Chihuahua paper is accepted by ACL 2024.
- [Apr 2024] I am glad to share that I have passed my Ph.D. qualifying exam and become a Ph.D. candidate.
- [Oct 2023] Our R2H paper is accepted by EMNLP 2023.
- [Sep 2023] The Athena team that I proudly led secured second place in the scientific innovation category of the Amazon Alexa Prize SocialBot Grand Challenge 5.
Publications (First/Co-first Authorship)
GRIT: Teaching MLLMs to Think with Images
NeurIPS 2025
Yue Fan, Xuehai He, Diji Yang, Kaizhi Zheng, Ching-Chen Kuo, Yuting Zheng, Sravana Jyothi Narayanaraju,
Xinze Guan, Xin Eric Wang
GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration
EMNLP 2025
Yue Fan, Handong Zhao, Ruiyi Zhang, Yu Shen, Xin Eric Wang, Gang Wu
Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding
EMNLP 2024
Yue Fan, Lei Ding, Ching-Chen Kuo, Shan Jiang, Yang Zhao, Xinze Guan, Jie Yang, Yi Zhang, Xin Eric Wang
Muffin or Chihuahua? Challenging Large Vision-Language Models with Multipanel VQA
ACL 2024
Yue Fan, Jing Gu, Kaiwen Zhou, Qianqi Yan, Shan Jiang, Ching-Chen Kuo, Yang Zhao, Xinze Guan, Xin Eric Wang
R2H: Building Multimodal Navigation Helpers that Respond to Help Requests
EMNLP 2023
Yue Fan, Jing Gu, Kaizhi Zheng, Xin Eric Wang
Athena 3.0: Personalized Multimodal ChatBot with Neuro-Symbolic Dialogue Generators
Alexa Prize SocialBot Grand Challenge 5
Yue Fan, Kevin K Bowden, Wen Cui, Winson Chen, Vrindavan Harrison, Angela Ramirez, Saaket Agashe, XG Liu, N Pullabhotla, NQJ Bheemanpally, S Garg, M Walker, XE Wang
Aerial Vision-and-Dialog Navigation
ACL 2023
Yue Fan, Winson Chen, Tongzhou Jiang, Chun Zhou, Yi Zhang, Xin Eric Wang
JARVIS: A Neuro-Symbolic Commonsense Reasoning Framework for Conversational Embodied Agents
Preprint 2022
Kaizhi Zheng*, Kaiwen Zhou*, Jing Gu*, Yue Fan*, Jialu Wang*, Zonglin Di, Xuehai He, Xin Eric Wang
Learn by Observation: Imitation Learning for Drone Patrolling from Videos of A Human Navigator
IROS 2020
Yue Fan, Shilei Chu, Wei Zhang, Ran Song, and Yibin Li
Earlier Projects
Unsupervised Adrenomyeloneuropathy Disease Data Analysis
- Applied feature selection to identify dominant factors in disease progression.
- Designed an extra-data-dimension heatmap toolkit for visualizing patient clusters.
- Used a Bayesian neural network to classify disease progressors with uncertainty estimates.
Heatmap Toolkit
Learn by Observation: Imitation Learning for Drone Patrol from Raw Videos of A Human Navigator (IROS 2020)
- Designed a data auto-labeling method using inter-frame geometric consistency.
- Proposed a DNN, UAVPatrolNet, for road detection.
- Built a dataset for autonomous drone navigation.
Project page
Object Detection in Aerial Image
- Contributed to the team effort by reproducing mature existing algorithms, e.g., RPN and Faster R-CNN.
- Conducted simulated experiments and tuned parameters to achieve the best training results, improving object detection performance on aerial images.
ECCV Workshop - Visdrone2018
Control and Monitoring System of DJI Drones through PC
- Designed a PC control interface with various functions such as vehicle detection.
- Developed a system to transmit data between the UAV and the PC using Qt and the DJI SDK.
- Applied the system to city traffic management, improving efficiency.
Github
Control of Carbon-free Car
- Developed a circuit board and selected suitable sensors by studying the control system of the carbon-free car.
- Designed and implemented the machine's control algorithm using object-oriented programming.
- Awarded the First Prize in Engineering Training Integration Ability Competition of Shandong Province.
Competitions
Amazon Alexa Prize competition: Socialbot Grand Challenge 5
- The challenge aims at advancing conversational AI. University teams are tasked with developing a "socialbot," an AI chatbot that can interact naturally and intelligently with humans on a variety of topics through Amazon's Alexa platform.
- I served as the team leader of our Athena3 team.
- Our Athena team secured second place in the scientific innovation category of the Alexa Prize SocialBot Grand Challenge 5.
Alexa Prize Socialbot Grand Challenge
Amazon Alexa Prize competition: Simbot Challenge
- The challenge focuses on advancing the development of next-generation virtual assistants that will assist humans in completing real-world tasks by continuously learning and gaining the ability to perform commonsense reasoning.
- Our SlugJARVIS team won third place in the SimBot Challenge.
- Our SlugJARVIS team won the Public Benchmark Challenge.
Alexa Prize SimBot Challenge
Public Benchmark Challenge
Athena3 Team
Students from the ERIC Lab and the Natural Language and Dialogue Systems Lab are making their fifth appearance in the competition. The goal of the team is to leverage advanced algorithms and AI models to build a smart chatbot.

Location: Santa Cruz, California
Faculty advisor: Xin Wang
Team lead: Yue Fan
SlugJARVIS Team
UC Santa Cruz is one of America's Public Ivy universities and a member of the prestigious Association of American Universities (AAU). The ERIC Lab is led by Prof. Xin Eric Wang and stands for Embodiment, Reasoning, Intelligence, and language Communication. The ERIC Lab's research topics include natural language processing, computer vision, and machine learning, with an emphasis on building embodied AI agents that can communicate with humans in natural language to perform real-world multimodal tasks.

Location: Santa Cruz, California
Faculty advisor: Xin Wang
Building Multimodal Web AI Agent
- We first built the MultipanelVQA benchmark to challenge large vision-language models on their ability to understand multipanel images, such as web screenshots and posters.
- We are now developing specialized AI agents that interact with all kinds of UIs, including web and mobile.
Respond to Help Requests (R2H) Project (EMNLP 2023)
- We establish the R2H benchmark, featuring tasks that assess an agent's capabilities based on guiding users or another agent in unknown areas through dialogues.
- We propose two multimodal navigation-helper agents: a fine-tuned SeeRee model for multimodal response generation and a large language model employed in a zero-shot manner, analyzed via benchmarking and human evaluations.
Project page
Aerial Vision-and-Dialog Navigation (AVDN) Project (ACL 2023)
- Our AVDN project aims at building drones that understand and follow natural language commands, facilitating hands-free control and accessibility.
- We built the AVDN dataset of over 3k recorded dialogs and navigation trajectories, along with a drone simulator featuring a photorealistic environment.
- We successfully hosted the public AVDN Challenge at the ICCV 2023 CLVL workshop.
Project page
AVDN Challenge
Academic Service and Teaching
Conference Reviewer
- ACL, EMNLP, NAACL, NeurIPS, COLM, ECCV, ICRA, IROS
Workshop Organization
- SpLU-RoboNLP 2023: The Third Combined Workshop on Spatial Language Understanding and Grounded Communication for Robotics at EMNLP 2023
Teaching Experience
- Course Assistant, University of California Santa Cruz Science Internship Program (SIP)
An open-ended STEM/STEAM research program exclusively for high school students
- Teaching Assistant, CSE 140 Machine Learning
University of California, Santa Cruz
- Course Assistant, EN 601.475 Machine Learning & EN 601.783 Vision as Bayesian Inference
Johns Hopkins University
Copyright © 2025 by YueFan. All rights reserved.
All designs are the property of the owner.