Wei Chen
PhD Student
Department of Computer Science and Engineering
The Hong Kong University of Science and Technology
Email: csewei.chen AT connect DOT ust DOT hk
Biography
I am a PhD student in the Department of Computer Science and Engineering at The Hong Kong University of Science and Technology, where I am advised by Prof. Long Chen in the LONG Group.
Prior to this, I obtained my bachelor's degree from the School of Computer Science at Wuhan University, under the supervision of Prof. Yu Wu.
Research Interests
My research interests lie in multi-modal content understanding and generation, including:
- Vision-language alignment in Multi-modal Large Language Models (MLLM)
- Reinforcement Learning from Human Feedback (RLHF) in MLLM
- Unified multi-modal understanding and generation with mutual enhancement
🔍 I am currently seeking Summer 2026 internships related to Multi-modal Large Language Models (MLLM). Please feel free to contact me if you have any opportunities!
Publications
Relation-R1: Cognitive Chain-of-Thought Guided Reinforcement Learning for Unified Relational Comprehension
Lin Li*, Wei Chen*, Jiahui Li, Kwang-Ting Cheng, Long Chen
AAAI Conference on Artificial Intelligence (AAAI), 2026
[paper]
[code]
Thyme: Think Beyond Images
Yi-Fan Zhang, Xingyu Lu, Shukang Yin, Chaoyou Fu, Wei Chen, Xiao Hu, Bin Wen, et al.
Decoupling Contrastive Decoding: Robust Hallucination Mitigation in Multimodal Large Language Models
Wei Chen, Xin Yan, Bin Wen, Fan Yang, Tingting Gao, Di Zhang, Long Chen
Neural Information Processing Systems (NeurIPS), 2025
[paper]
[code]
CoMM: A Coherent Interleaved Image-Text Dataset for Multimodal Understanding and Generation
Wei Chen*, Lin Li*, Yongqi Yang*, Bin Wen, Fan Yang, Tingting Gao, Yu Wu, Long Chen
Computer Vision and Pattern Recognition (CVPR), 2025 (Highlight)
[paper]
[code]
An Efficient and Effective Transformer Decoder-Based Framework for Multi-Task Visual Grounding
Wei Chen, Long Chen, Yu Wu
European Conference on Computer Vision (ECCV), 2024
[paper]
[code]
Internship Experience
Kuaishou Keye-VL Team, Beijing, China (Mar. 2024 – Present)
- Research Intern, Remote Collaboration: Sep. 2024 – Apr. 2025
- Research Intern, Multimedia Understanding Group: Mar. 2024 – Aug. 2024