Recent Publications (2024 & 2025)
[Recent Technical Report]
[World Model] Yume: An Interactive World Generation Model
[Image Generation] IA-T2I: Internet-Augmented Text-to-Image Generation
[Image Generation] SridBench: Benchmark of Scientific Research Illustration Drawing of Image Generation Model
[MLLM] ARMOR: Empowering Multimodal Understanding Model with Interleaved Multimodal Generation Capability
[Conference Papers]
[NeurIPS 2025 Spotlight] Think or not think: A study of explicit thinking in rule-based visual reinforcement fine-tuning
WAIC Young Outstanding Paper Award, 2022
World's TOP 2% Scientists (published by Stanford University), 2020 & 2021 & 2022 & 2023
JSPS Research Fellowships for Young Scientists, 2020
Tencent Rhino-Bird Elite Training Program, 2020
MSRA Fellowship Nomination Award, 2019
Emotion Recognition in the Wild: Engagement Prediction (ICMI 2019 Grand Challenge), 3rd place
Emotion Recognition in the Wild: Group-based Cohesion Prediction (ICMI 2019 Grand Challenge), 2nd place
Disguised Faces in the Wild Challenge (in conjunction with CVPR 2018), 1st place
Emotion Recognition in the Wild: Group-level emotion recognition (ICMI 2018 Grand Challenge), 2nd place
Emotion Recognition in the Wild: Group-level emotion recognition (ICMI 2017 Grand Challenge), 1st place
ChaLearn Looking at People Challenge: Accessories Classification (in conjunction with CVPR 2016), 1st place
ChaLearn Looking at People Challenge: Smile and Gender Classification (in conjunction with CVPR 2016), 1st place
Outstanding Undergraduate Thesis, 2016
Area Chair of ICLR
Senior program committee member of IJCAI and AAAI
Reviewer/Program committee of NeurIPS, ICML, ICLR, AAAI, ICCV, ECCV, CVPR, BMVC, WACV and ACCV
Reviewer of TPAMI, TIP, TCSVT, TNNLS, TMM, TIFS, Neurocomputing, Pattern Recognition, and SPL
[NeurIPS 2025] Sekai: A Video Dataset towards World Exploration
[NeurIPS 2025] Neural-Driven Image Editing
[NeurIPS 2025] REPA Works Until It Doesn’t: Early-Stopped, Holistic Alignment Supercharges Diffusion Training
[ACL Findings 2025] MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification
[ICML 2025] Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation
[CVPR 2025 Oral] OpenING: A Comprehensive Benchmark for Judging Open-ended Interleaved Image-Text Generation
[ICLR 2025 Oral] Dynamic Multimodal Evaluation with Flexible Complexity by Vision-Language Bootstrapping
[NeurIPS 2024 Spotlight] ConvBench: A Multi-Turn Conversation Evaluation Benchmark with Hierarchical Capability for Large Vision-Language Models
[NeurIPS 2024] SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up-to-Date Internet Knowledge
[NeurIPS 2024] Rethinking Human Evaluation Protocol for Text-to-Video Models: Enhancing Reliability, Reproducibility, and Practicality
[NeurIPS 2024] Needle In A Multimodal Haystack
[ICML 2024] Towards Implicit Prompt For Text-To-Image Models
[NAACL Findings 2024] T3M: Text Guided 3D Human Motion Synthesis from Speech
[ACL Findings 2024] ChartAssistant: A Universal Chart Multimodal Language Model via Chart-to-Table Pre-training and Multitask Instruction Tuning
[Journal Papers]
[TBigData 2024] Tiny LVLM-eHub: Early Multimodal Experiments with Bard
[Pattern Recognition 2024] FMGNet: An efficient feature-multiplex group network for real-time vision task
[Tutorial]
[CVPR 2025 Tutorial] From Multimodal LLM to Human-level AI: Evaluations and Benchmarks
Education
Selected Awards and Competitions
Academic Service
Work Experience
Researcher
Shanghai AI Lab
OpenGVLab
Shanghai, China
May. 2022 - Present
Intern
MSRA
Visual Computing Group
Beijing, China
Jan. 2018 - Jul. 2018
Intern
Tencent
AI Lab & AI Advertisement Department
Shenzhen, China
Jul. 2017 - Aug. 2017
Sep. 2020 - Feb. 2021