My research interests are in multimodal learning and data-centric AI. Currently, I focus on efficient vision-language pre-training at very large scale (on the order of 1,000 GPUs and 10 billion samples) and on multimodal large language models.
I graduated from Ningxia Yucai High School.
I obtained my bachelor’s and master’s degrees from Sun Yat-Sen University (SYSU).
I completed my PhD at the National University of Singapore (NUS) in three years, supervised by Prof. Mike Zheng Shou. I am now a tenure-track faculty member at Central South University, with sufficient computational resources to support cutting-edge research.
News
[Sep 2025] Three papers accepted at NeurIPS 2025 (one spotlight).
Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models. Alex Jinpeng Wang, Linjie Li, Zhengyuan Yang, Lijuan Wang, Min Li. arXiv, 2025. [Paper] [Website]
TextAtlas5M: A Large-scale Dataset for Dense Text Image Generation. Alex Jinpeng Wang, Dongxing Mao, Jiawei Zhang, Weiming Han, Zhuobai Dong, Linjie Li, Yiqi Lin, Zhengyuan Yang, Libo Qin, Fuwei Zhang, Lijuan Wang, Min Li. arXiv, 2025. [Paper] [GitHub] [Website]
  2023.1.1 - 2024.12.31
Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning. Alex Jinpeng Wang, Linjie Li, Yiqi Lin, Min Li, Lijuan Wang, Mike Zheng Shou. In NeurIPS, 2024. [Paper] [GitHub] [Website]
COSMO: Contrastive Streamlined Multimodal Model With Interleaved Pre-Training. Alex Jinpeng Wang, Linjie Li, Kevin Qinghong Lin, Jianfeng Wang, Kevin Lin, Zhengyuan Yang, Lijuan Wang, Mike Zheng Shou. arXiv, 2024. [Paper] [GitHub] [COSMO Website] [COSMOE Website]
Parrot Captions Teach CLIP to Spot Text. Yiqi Lin, Conghui He, Alex Jinpeng Wang (equal contribution), Bin Wang, Weijia Li, Mike Zheng Shou. In ECCV, 2024 (Oral). [Paper] [GitHub]
Too Large; Data Reduction for Vision-Language Pre-Training. Alex Jinpeng Wang, Kevin Qinghong Lin, David Junhao Zhang, Stan Weixian Lei, Mike Zheng Shou. In ICCV, 2023. [Paper] [GitHub]
  2022.1.1 - 2022.12.31
Position-guided Text Prompt for Vision Language Pre-training. Alex Jinpeng Wang, Pan Zhou, Mike Zheng Shou, Shuicheng Yan. In CVPR, 2023. [Paper] [GitHub] [Bibtex]
All in One: Exploring Unified Video-Language Pre-training. Alex Jinpeng Wang, Yixiao Ge, Rui Yan, Yuying Ge, Xudong Lin, Guanyu Cai, Jianping Wu, Ying Shan, Xiaohu Qie, Mike Zheng Shou. In CVPR, 2023. [Paper] [GitHub] [Bibtex]
Object-aware Video-language Pre-training for Retrieval. Alex Jinpeng Wang, Yixiao Ge, Guanyu Cai, Rui Yan, Xudong Lin, Ying Shan, Xiaohu Qie, Mike Zheng Shou. In CVPR, 2022. [Paper] [Webpage] [GitHub] [Bibtex]
Suppressing Static Visual Cues in Probability via Normalizing Flows for Self-Supervised Video Representation Learning. Manlin Zhang, Jinpeng Wang (equal contribution), Andy J. Ma. In AAAI, 2022 (Oral). [Paper] [Bibtex] [GitHub]
  2021.1.1 - 2021.12.31
Learning Spatio-temporal Representation by Channel Aliasing Video Perception. Yiqi Lin, Jinpeng Wang (equal contribution), Manlin Zhang, Andy J. Ma. In ACM MM, 2021. [Paper] [GitHub] [Bibtex]
Multi-level Temporal Dilated Dense Prediction for Action Recognition. Jinpeng Wang, Yiqi Lin, Manlin Zhang, Yuan Gao, Andy J. Ma. In TMM, 2021. [Paper] [Bibtex]
Removing the Background by Adding the Background: Towards Background Robust Self-supervised Video Representation Learning. Jinpeng Wang, Yuting Gao, Ke Li, Yiqi Lin, Andy J. Ma, Hao Cheng, Pai Peng, Rongrong Ji, Xing Sun. In CVPR, 2021. [Paper] [Webpage] [Video] [GitHub] [Bibtex]
Enhancing Unsupervised Video Representation Learning by Decoupling the Scene and the Motion. Jinpeng Wang, Yuting Gao, Ke Li, Jianguo Hu, Xinyang Jiang, Xiaowei Guo, Rongrong Ji, Xing Sun. In AAAI, 2021. [Paper] [Webpage] [Video] [GitHub] [Bibtex]
  2020 & before
Revisiting Hard Example for Action Recognition. Jinpeng Wang, Jianguo Hu, Shiren Li, Zhihao Yuan. In TCSVT, 2020. [Paper] [GitHub] [Bibtex]
Rethinking Temporal-Related Sample for Human Action Recognition. Jinpeng Wang, Shiren Li, Zhikui Duan, Zhihao Yuan. In ICASSP, 2020. [Paper] [Bibtex]
Awards
Egocentric Vision (EgoVis) 2022/2023 Distinguished Paper Awards.
2022 Showlab Annual Award ($1,000).
Champion of the CVPR'22 EPIC-Kitchens challenge (2022).
First place in the Ego4D challenge (2022).
Awardee of the AI Singapore PhD Fellowship Programme ($240,000).
2021 Excellent Graduation Thesis (1/224).
Reviewer recognitions: CVPR 2021, ICCV 2021, AAAI 2021, TCSVT.
The First Prize Scholarship (2020).
Second Prize, College Students' Innovation and Entrepreneurship Competition (2018).
Excellent undergraduate thesis (2017).
Collaborators
I have had the pleasure of working with some wonderful collaborators.
@Microsoft Azure AI
Linjie Li, Research Scientist
Zhengyuan Yang, Research Scientist
Lijuan Wang, Principal Research Manager
@Sea AI Lab
Pan Zhou, @SAIL
Shuicheng Yan, Professor at @NUS
@Tencent PCG ARC Lab
@Tencent Youtu Lab
Xin Sun, @HKU
Rongrong Ji, Professor at @Xiamen University
@Sun Yat-sen University
Andy Jinhua Ma, Associate Professor
Talks
Time: 2021.2.12; Title: CVPR21 BE Demo; Source: YouTube
Time: 2021.3.29; Title: VALSE STUDENT WEBINAR; Source: Bilibili (China)