Selected Works ([Full List])
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
Wenhai Wang*, Jifeng Dai*, Zhe Chen*†, Zhenhang Huang*, Zhiqi Li*†, Xizhou Zhu*, Xiaowei Hu, Tong Lu, Lewei Lu, Hongsheng Li, Xiaogang Wang, Yu Qiao#
CVPR, 2023 (Highlight Paper (2.5%))
[Paper] [Code]
[BibTex]
A strong large-scale CNN-based foundation model.
BEVFormer: Learning Bird’s-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers
Zhiqi Li*†, Wenhai Wang*, Hongyang Li*, Enze Xie, Chonghao Sima, Tong Lu, Yu Qiao, Jifeng Dai#
ECCV, 2022
[Paper] [Code]
[BibTex]
[ECCV 2022' Top-10 Influential Papers]
[100 Most Cited AI Papers in 2022]
A versatile camera-only framework for autonomous driving perception, e.g., 3D object detection and semantic map segmentation.
PVT v2: Improved Baselines with Pyramid Vision Transformer
Wenhai Wang#, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao
CVMJ, 2021 (ESI Highly Cited Paper (1%), ESI Hot Paper (0.1%))
[Paper] [Code]
[Chinese Explanation]
[Report]
[Talk]
[BibTex]
[CNKI's Academic Essentials]
[CVMJ 2022 Honorable Mention Award]
A better PVT.
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan#, Kaitao Song, Ding Liang, Tong Lu#, Ping Luo, Ling Shao
ICCV, 2021 (Oral Presentation (3.4%))
[Paper] [Code]
[Chinese Translation]
[Chinese Explanation]
[Report]
[Talk]
[BibTex]
[ICCV21' Top-10 Influential Papers]
A pure Transformer backbone for dense prediction, such as object detection and semantic segmentation.
PolarMask++: Enhanced Polar Representation for Single-Shot Instance Segmentation and Beyond
Enze Xie*, Wenhai Wang*, Mingyu Ding, Ruimao Zhang, Ping Luo#
TPAMI, 2021
[Paper] [Code]
[BibTex]
[CVPR 2020 Top-10 Influential Papers]
We extend PolarMask (CVPR 2020 Oral Presentation (5.7%)) to several instance-level detection tasks.
Honors and Awards
- 2023/12: 2023 CSIG Excellent Doctoral Dissertation Award, Honorable Mention
- 2023/08: 2023 World Artificial Intelligence Conference Youth Outstanding Paper Award
- 2023/06: CVPR 2023 Best Paper Award
- 2023/04: CVMJ 2022 Best Paper Honorable Mention Award
- 2022/06: Waymo 2022 3D Camera-Only Detection Task, 1st Place (15,000 USD Bonus)
- 2022/06: Outstanding Doctoral Thesis Award, Nanjing University
- 2022/04: Most Popular Speakers in TechBeat 2022
- 2021/04: Outstanding Graduate, Nanjing University
- 2020/12: National Artificial Intelligence Challenge (NAIC) 2020, Remote Sensing Semantic Segmentation Task, 1st Place (1,000,000 RMB Bonus)
- 2019/12: China National Scholarship
- 2019/09: ICDAR2019 Robust Reading Challenge on Arbitrary-Shaped Text, Task1, 1st Place
- 2019/09: ICDAR2019 Robust Reading Challenge on Large-scale Street View Text with Partial Labeling, Task1, 2nd Place
- 2018/12: AI Challenger 2018 Autonomous Driving Perception Task, 2nd Place (40,000 RMB Bonus)
- 2015/11: ACM-ICPC Asia Regional Contest, Silver Medal
Invited Talk
- 2023/10: Preliminary Study on "Large-Scale Visual Foundation Model + LLM in Open-World Application", PRCV Talk
- 2023/08: Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions, WAIC Youth Outstanding Paper Award Talk
- 2023/07-08: Preliminary Study on "Large-Scale Visual Foundation Model + LLM", Zhidx/Huawei Noah's Ark Lab/Tencent Youtu Lab/Fudan University Talk
- 2023/06: Study and Application of Large-scale Foundation Models in Open World Tasks, VALSE Talk
- 2023/05: InternImage: A Large-Scale Generic Vision Model, SenseTime Talk
- 2022/11-12: Study and Application of Multi-Task Generic Perception Model, AITIME (2:31:30)/Tsinghua University Talk
- 2022/07: Transformer-based Vision Perception, ChinaMM Talk
- 2021/07: Application of Transformer in Detection and Segmentation Tasks, TechBeat Talk
Academic Services
Workshop (Co-)Organizer
- Vision and Language Collision: Synergy between Language Model and Vision Ecology (视言碰撞:语言模型与视觉生态协同) at PRCV 2023
- Challenges and Opportunities of Large Models for CV/PR (大模型对CV/PR的挑战与机会) at VALSE 2023
- Visual Intelligence
- International Joint Conference on Artificial Intelligence (IJCAI), 2021
- IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
- International Journal of Computer Vision (IJCV)
- IEEE Transactions on Image Processing (TIP)
- IEEE Transactions on Multimedia (TMM)
- IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)
- Computational Visual Media Journal (CVMJ)
- Pattern Recognition (PR)
- IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020, 2021, 2022, 2023
- Neural Information Processing Systems (NeurIPS), 2020, 2021, 2023
- International Conference on Machine Learning (ICML), 2021, 2022
- International Conference on Learning Representations (ICLR), 2021
- IEEE International Conference on Computer Vision (ICCV), 2021
- European Conference on Computer Vision (ECCV), 2022
- AAAI Conference on Artificial Intelligence (AAAI), 2022
- International Joint Conference on Artificial Intelligence (IJCAI), 2022
- IEEE Winter Conference on Applications of Computer Vision (WACV), 2021
- Asian Conference on Computer Vision (ACCV), 2020