Welcome to Xiaokang Chen's Homepage

Xiaokang Chen (陈小康)

I am currently a technical staff member at DeepSeek AI, leading the multimodal group. This group, responsible for multimodal pre-training and post-training, focuses on advancing the multimodal capabilities of DeepSeek's large language models (LLMs). I am driven by the mission to expand the frontiers of machine intelligence and weave it into the fabric of everyday life, ultimately augmenting human potential.

I obtained my Ph.D. degree from Peking University (PKU) in 2024, supervised by Professor Gang Zeng. Before that, I received my Bachelor’s degree from Peking University in July 2019.

pkucxk(at)pku.edu.cn Google Scholar GitHub Twitter LinkedIn

Education

Peking University
Ph.D. Student
Sep. 2019 - Jul. 2024

Peking University
B.S. in Computer Science
Sep. 2015 - Jul. 2019

Academic Service

Journal reviewer: IJCV, TPAMI, TIP, TCSVT, Neurocomputing, CVIU.
Conference reviewer: CVPR, ECCV, ICCV, NeurIPS, ICML, AAAI.

Honors & Awards

WAIC Yunfan Award (云帆奖)
2025

Stanford World's Top 2% Scientists
2025

Outstanding Graduate, Peking University
2024

National Scholarship (Ministry of Education, PRC)
2021, 2022, 2023

Merit Student, PKU
2020, 2021, 2022, 2023

Top 10 Outstanding Researcher (学术十杰), PKU
2021

Huawei Scholarship
2021

Award for Academic Innovation, PKU
2021

Schlumberger Scholarship
2020

Award for Excellent Research, PKU
2017, 2018, 2019

Experience

DeepSeek
AGI Researcher
Apr. 2024 - present

Shanghai Artificial Intelligence Laboratory
Research Intern, advised by Dr. Wenhai Wang and Dr. Jifeng Dai.
Dec. 2022 - Nov. 2023

Baidu Research
Research Intern, advised by Dr. Jingdong Wang.
Dec. 2021 - Dec. 2022

Microsoft Research Asia (MSRA)
Research Intern, advised by Dr. Jingdong Wang.
Jun. 2020 - Dec. 2021

SenseTime Research
Research Intern, advised by Dr. Kwan-Yee Lin and Dr. Wayne (Wenyan) Wu.
Apr. 2019 - May 2020

News

2025

- Jul 27: I received the 2025 WAIC Yunfan Award.
- Jan 28: Released Janus-Pro for unified multimodal understanding and generation.

2024

- Dec 13: Released DeepSeek-VL2, a series of Mixture-of-Experts vision-language models.
- May 20: I successfully defended my Ph.D. thesis!

Selected Projects and Papers

Janus-Series: Unified Multimodal Understanding and Generation Models

DeepSeek

Project lead and core contributor.

Abstract

The Janus-Series pioneers the use of separate visual encoders for unified multimodal understanding and generation, alleviating the conflict that arises in prior work from using a single encoder for both tasks. The series comprises three models: Janus, an autoregressive unified model published at CVPR 2025; JanusFlow, a flow-matching-based unified model also published at CVPR 2025; and Janus-Pro, a version of Janus scaled up in both data and model size. Janus-Pro achieves state-of-the-art performance among open-source models in both multimodal understanding and generation. Notably, on the GenEval image generation benchmark, Janus-Pro scores 80, outperforming both DALL-E 3 and Stable Diffusion 3.

[Paper: Janus-Pro] [Paper: Janus (CVPR 2025)] [Paper: JanusFlow (CVPR 2025)]

[🔥 Code (17k stars)] [Huggingface Model] [Online Demo]

[🔥 Twitter] [机器之心] [量子位] [新智元]

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

DeepSeek

Project co-lead and core contributor.

Abstract

DeepSeek-VL2 is a large multimodal foundation model based on the Mixture-of-Experts (MoE) architecture. It possesses a wide range of multimodal understanding capabilities, including image description, landmark recognition, chart understanding, OCR, meme understanding, multi-image understanding, object localization, and reasoning. Thanks to its MoE architecture, the model achieves better overall performance than Qwen2-VL-7B and InternVL2-8B while using only 4.1B active parameters. In terms of visual perception (specifically image description and vision perception), it surpasses Qwen2-VL-72B.

[Paper] [Code] [官方介绍]

CAE: Context Autoencoder for Self-Supervised Representation Learning

Xiaokang Chen, Mingyu Ding, Xiaodi Wang, Ying Xin, Shentong Mo, Yunhao Wang, Shumin Han, Ping Luo, Gang Zeng, Jingdong Wang

International Journal of Computer Vision (IJCV) 2023

Abstract

Core Contributions: (1) Proposed to perform the prediction of masked image patches in the latent space. (2) Decoupled the functionalities of the encoder and decoder during the pre-training stage: the encoder is solely responsible for representation learning, while the decoder is only for completing the pre-training task. The method achieved state-of-the-art results on various ViT models (small, base, large, huge). Specifically, the ViT-H based model reached 64.5% mAP on the COCO test set, which ranked first on the leaderboard at the time of submission. The core idea of this work is similar to I-JEPA, a slightly later work from the same period by Turing Award winner Yann LeCun, as both perform prediction in the latent space. CAE has been successfully applied in Baidu's large models for industrial vision, OCR text recognition, and human body analysis.

[Paper] [Code] [Code2] [中文解读]

VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

Wenhai Wang*, Zhe Chen*, Xiaokang Chen*, Jiannan Wu*, Xizhou Zhu, Gang Zeng, Ping Luo, Tong Lu, Jie Zhou, Yu Qiao, Jifeng Dai (* equal contribution)

Neural Information Processing Systems (NeurIPS) 2023

Abstract

We propose a vision-centric task framework based on large language models (LLMs). By treating images as a form of language and aligning vision tasks with language tasks—which can be flexibly defined and managed through linguistic instructions—this framework provides a unified perspective for both vision and language tasks. VisionLLM enables task customization at various levels via language instructions, ranging from fine-grained object-level to coarse-grained task-level customization. It achieves over 60% mAP on COCO, comparable to specialized detection models.

[Paper] [Code] [Demo]

Conditional DETR for Fast Training Convergence

Xiaokang Chen*, Depu Meng*, Zejia Fan, Gang Zeng, Houqiang Li, Yuhui Yuan, Lei Sun, Jingdong Wang (* equal contribution)

International Conference on Computer Vision (ICCV) 2021

Abstract

We solve the slow convergence of Detection Transformer (DETR) with our Conditional Spatial Query method. DETR converges slowly because it struggles to find key extremity regions of an object (e.g., an elephant's feet, back, or trunk), which are vital for accurate localization and recognition. Our method explicitly finds these extremity regions in space, constrains the search area, and speeds up DETR's convergence by 6-10x. This was one of the first works to address DETR's slow training, inspiring many later algorithms like DAB-DETR and DINO.

[Paper] [Code] [中文解读]

CPS: Semi-Supervised Semantic Segmentation with Cross Pseudo Supervision

Xiaokang Chen, Yuhui Yuan, Gang Zeng, Jingdong Wang

IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2021

Abstract

This work proposes a simple and efficient semi-supervised semantic segmentation algorithm that enforces consistency between a dual-branch network using online-generated pseudo-labels. This approach achieves excellent semi-supervised performance without the need for threshold-based filtering. It significantly outperforms other contemporary semi-supervised segmentation algorithms on the PASCAL VOC 2012 and Cityscapes datasets, including Google's PseudoSeg (ICLR 2021). The method has become a key baseline in the field of semi-supervised segmentation. The paper has garnered over 1,000 citations and was featured on the [list of highly-cited AI papers in 2021].

[Paper] [Code] [Poster] [Slides] [Video Talk] [中文解读]
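
The cross pseudo supervision idea in the abstract above can be sketched in a few lines. This is a minimal pure-Python illustration under stated assumptions, not the released implementation: the function names (`softmax`, `argmax`, `cross_entropy`, `cross_pseudo_supervision_loss`) are hypothetical, each "pixel" is a plain list of class logits, and only the unlabeled-data term is shown. Each branch's hard prediction serves as the pseudo-label for the other branch, with no confidence threshold applied.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (the hard pseudo-label)."""
    return max(range(len(values)), key=lambda i: values[i])


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Cross-entropy of one pixel's logits against a class index."""
    return -math.log(softmax(logits)[target])


def cross_pseudo_supervision_loss(
    logits_a: Sequence[Sequence[float]],
    logits_b: Sequence[Sequence[float]],
) -> float:
    """Average cross-supervision loss over pixels for two branches.

    Branch A is trained on branch B's pseudo-labels and vice versa.
    In a real training loop the pseudo-labels would be treated as
    constants (no gradient flows through them); that detail is not
    modeled in this plain-Python sketch.
    """
    if len(logits_a) != len(logits_b) or not logits_a:
        raise ValueError("both branches must cover the same non-empty pixel set")
    total = 0.0
    for pix_a, pix_b in zip(logits_a, logits_b):
        pseudo_a = argmax(pix_a)  # label produced by branch A
        pseudo_b = argmax(pix_b)  # label produced by branch B
        # No threshold-based filtering: every pixel contributes.
        total += cross_entropy(pix_a, pseudo_b) + cross_entropy(pix_b, pseudo_a)
    return total / len(logits_a)


# Two branches that agree yield a small loss; disagreement is penalized.
agree = cross_pseudo_supervision_loss([[10.0, 0.0]], [[10.0, 0.0]])
disagree = cross_pseudo_supervision_loss([[10.0, 0.0]], [[0.0, 10.0]])
```

In this toy setting the loss is near zero when the two branches predict the same class with high confidence and grows when they disagree, which is the consistency pressure the abstract describes.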

All publications

Last updated: Dec 2025

Thanks to Shitong for the template.
