Limin Wang (王利民)
Multimedia Computing Group
Department of Computer Science and Technology
Nanjing University
Office: CS Building 506
Email: lmwang.nju [at] gmail.com
About Me (CV)
I am a Professor in the Department of Computer Science and Technology at Nanjing University, and I am also affiliated with the State Key Laboratory for Novel Software Technology.
Previously, I received the B.S. degree from Nanjing University in 2011, and the Ph.D. degree from The Chinese University of Hong Kong under the supervision of Prof. Xiaoou Tang in 2015. From 2015 to 2018, I was a Post-Doctoral Researcher with Prof. Luc Van Gool in the Computer Vision Laboratory (CVL) at ETH Zurich.
News
- 2025-12-15: We open-source several interesting projects: SAM2++, Sora2-mini, SteadyDancer.
- 2025-12-01: We release InternVideo-Next, a new video foundation model for world modeling.
- 2025-09-18: Six papers are accepted by NeurIPS 2025.
- 2025-07-30: Two papers are accepted by T-PAMI.
- 2025-06-27: Five papers are accepted by ICCV 2025.
- 2025-05-01: Two papers are accepted by ICML 2025.
- 2025-04-19: Prof. Limin Wang is invited to be an Associate Editor of T-PAMI.
- 2025-03-28: JointFormer is accepted by T-PAMI.
- 2025-02-27: Six papers are accepted by CVPR 2025.
- 2025-01-23: Six papers are accepted by ICLR 2025.
- 2024-12-12: The extension of PDPP is accepted by T-PAMI.
- 2024-09-26: Four papers are accepted by NeurIPS 2024.
- 2024-07-01: InternVideo2 is accepted by ECCV 2024.
- 2024-07-01: Six papers are accepted by ECCV 2024 on video foundation model & image generation etc.
- 2024-06-01: Our MixFormer is selected as the Featured Article of T-PAMI.
- 2024-04-12: The VLG work is accepted by IJCV.
- 2024-04-07: The extension of STMixer is accepted by T-PAMI.
- 2024-02-27: Ten papers are accepted by CVPR 2024 on video foundation model & video generation & benchmarks etc.
- 2024-01-15: Two papers (InternVid & SparseFormer) are accepted by ICLR 2024.
- 2024-01-01: The extension of MixFormer is accepted by T-PAMI.
- 2023-11-30: Our LogN is accepted by IJCV.
- 2023-11-01: Our CamLiFlow is accepted by T-PAMI.
- 2023-11-01: Our RefineTAD receives the Best Paper Honorable Mention Award of ACM MM 2023.
- 2023-10-25: Our Dynamic MDETR is accepted by T-PAMI.
- 2023-09-22: Our MixFormer V2 is accepted by NeurIPS 2023.
- 2023-09-02: One paper on crowded pose estimation is accepted by IJCV.
- 2023-07-21: Our survey paper on 3D human mesh recovery is accepted by T-PAMI.
- 2023-07-14: Our UMT Foundation Model is accepted by ICCV 2023.
- 2023-07-14: Our SportsMOT dataset is accepted by ICCV 2023.
- 2023-07-14: Ten papers are accepted by ICCV 2023 (topics: video foundation models, action detection and anticipation, multi-object tracking, (3D) object detection, and new datasets).
- 2023-07-13: We release the InternVid dataset for multi-modal video understanding and generation.
- 2023-06-25: We release the Grasp Anything project for embodied AI by leveraging vision foundation model.
- 2023-06-15: Prof. Limin Wang is invited to be an Editorial Board Member of IJCV.
- 2023-06-10: I am invited to give an ARP talk at VALSE 2023 (slide).
- 2023-05-25: We propose the MixFormer V2, a real-time object tracker. We have released the source code.
- 2023-05-19: Temporal Perceiver is accepted by T-PAMI. We have released the source code.
- 2023-05-10: We present the VideoChat system, by combining video foundation model and LLM.
- 2023-03-18: We propose the VideoMAE V2, training the first billion-level video transformer (source code).
- 2023-03-01: Five papers on video understanding and point cloud analysis are accepted by CVPR 2023.
- 2023-02-01: One paper is accepted by ICLR 2023 and one by AAAI 2023.
- 2022-11-01: The FineAction dataset is accepted by TIP.
- 2022-10-09: The extension of LIP is accepted by IJCV.
- 2022-09-15: VideoMAE and PointTAD are accepted by NeurIPS 2022.
- 2022-09-15: We present the BasicTAD, an end-to-end TAD baseline method. We have released the source code.
- 2022-08-10: One paper is accepted by ECCV 2022 and one paper (CDG) is accepted by IJCV.
- 2022-05-01: We are organizing the second DeeperAction Challenge at ECCV 2022, introducing five new benchmarks on temporal action localization, multi-actor tracking, spatiotemporal action detection, part-level action parsing, and fine-grained video anomaly recognition.
- 2022-03-23: We present the VideoMAE, a self-supervised video transformer obtaining SOTA performance on the benchmarks of Kinetics, Something-Something, and AVA. We have released the source code and pre-trained models.
- 2022-03-02: We present the MixFormer, a compact and efficient object tracker, obtaining SOTA performance on several benchmarks. We have released the source code.
- 2022-03-02: We present the AdaMixer, a fast-converging query-based object detector, obtaining competitive performance on the MS COCO benchmark. We have released the source code.
- 2022-03-02: Seven papers on object detection, object tracking, action recognition etc. are accepted by CVPR 2022.
- 2021-07-25: Eight papers on video understanding are accepted by ICCV 2021: new dataset (MultiSports), backbone (TAM), sampling method (MGSampler), detection frameworks (RTD and TRACE). For more details, please refer to our papers.
- 2021-07-15: We release the MultiSports dataset for spatiotemporal action detection.
- 2021-07-15: Our team secures the first place at ACM MM Pre-training for Video Understanding Challenge for Track 2.
- 2021-06-15: Our team secures the first place at CVPR Kinetics Challenge for Self-Supervised Task.
- 2021-06-15: Our team secures the first place at CVPR PIC Challenge for Human-Centric Spatio-Temporal Video Grounding Task.
- 2021-06-01: We are organizing the DeeperAction Challenge at ICCV 2021, introducing three new benchmarks on temporal action localization, spatiotemporal action detection, and part-level action parsing.
- 2021-04-20: The extension of TRecgNet is accepted by IJCV.
- 2021-04-07: We propose a target transformer for accurate anchor-free tracking, termed as TREG (code).
- 2021-04-07: We present a transformer decoder for direct action proposal generation, termed as RTD-Net (code).
- 2021-03-01: Two papers on action recognition and point cloud segmentation are accepted by CVPR 2021.
- 2020-12-30: We propose a new video architecture using temporal difference, termed as TDN, and release the code.
- 2020-07-03: Three papers on action detection and segmentation are accepted by ECCV 2020.
- 2020-06-28: Our proposed DSN, a dynamic version of TSN for efficient action recognition, is accepted by TIP.
- 2020-05-14: We propose a temporal adaptive module for video recognition, termed as TAM, and release the code.
- 2020-04-16: The code of our published papers will be made available at Github: MCG-NJU.
- 2020-04-16: We propose a fully convolutional online tracking framework, termed as FCOT, and release the code.
- 2020-03-10: Our proposed temporal module TEA is accepted by CVPR 2020.
- 2020-01-20: We propose an efficient video representation learning framework, termed as CPD, and release the code.
- 2020-01-15: We present an anchor-free action tubelet detector, termed as MOC-Detector and release the code.
- 2019-12-20: Our proposed V4D, a principled video-level representation learning framework, is accepted by ICLR 2020.
- 2019-11-21: Our proposed TEINet, an efficient video architecture for video recognition, is accepted by AAAI 2020.
- 2019-07-23: Our proposed LIP, a general alternative to average or max pooling, is accepted by ICCV 2019.
- 2019-03-15: Two papers are accepted by CVPR 2019: one for group activity recognition and one for RGB-D transfer learning.
- 2018-08-19: One paper is accepted by ECCV 2018 and one (TSN) by T-PAMI.
- 2018-04-01: I join Nanjing University as a faculty member in the Department of Computer Science and Technology.
- 2017-11-28: We release a recent work on video architecture design for spatiotemporal feature learning. [ arXiv ] [ Code ]
- 2017-09-08: We have released the TSN models learned on the Kinetics dataset. These models transfer well to existing datasets for action recognition and detection [ Link ].
- 2017-09-01: One paper is accepted by ICCV 2017 and one (OS2E-CNN) by IJCV.
- 2017-07-18: I am invited to give a talk at the Workshop on Frontiers of Video Technology-2017 [ Slide ].
- 2017-03-28: I am co-organizing the CVPR 2017 workshop and challenge on Visual Understanding by Learning from Web Data. For more details, please see the workshop page and challenge page.
- 2017-02-28: Two papers are accepted by CVPR 2017.
- 2016-12-20: We release the code and models for the SR-CNN paper [ Code ].
- 2016-10-05: We release the code and models for the Places2 scene recognition challenge [ arXiv ] [ Code ].
- 2016-08-03: Code and models of Temporal Segment Networks are released [ arXiv ] [ Code ].
- 2016-07-15: One paper is accepted by ECCV 2016 and one by BMVC 2016.
- 2016-06-16: Our team secures the 1st place for untrimmed video classification at the ActivityNet Challenge 2016 [ Result ]. Our solution is based on our works of Temporal Segment Networks (TSN) and Trajectory-Pooled Deep-Convolutional Descriptors (TDD).
- 2016-03-01: Two papers are accepted by CVPR 2016.
- 2015-12-10: Our SIAT_MMLAB team secures the 2nd place for scene recognition at ILSVRC 2015 [ Result ].
- 2015-09-30: We rank 3rd for cultural event recognition on the ChaLearn Looking at People challenge at ICCV 2015.
- 2015-08-07: We release the Places205-VGGNet models [ Link ].
- 2015-07-22: Code of Trajectory-Pooled Deep-Convolutional Descriptors (TDD) is released [ Link ].
- 2015-07-15: Very deep two-stream ConvNets are proposed for action recognition [ Link ].
- 2015-03-15: We are the 1st winner of both tracks, action recognition and cultural event recognition, on the ChaLearn Looking at People Challenge at CVPR 2015.
- 2015-03-03: One paper is accepted by CVPR 2015, details coming soon.
- 2014-09-05: We rank 4th for action recognition and 2nd for action detection, on THUMOS'14 Challenge at ECCV 2014.
- 2014-06-16: Two papers are accepted by ECCV 2014.
- 2014-06-10: We are the 1st winner of both track 1 and track 2, and rank 4th for track 3, on the ChaLearn Looking at People Challenge at ECCV 2014.
- 2014-05-20: A comprehensive study paper on action recognition is released [ Link ].
- 2014-05-16: New homepage on Github launched!
Selected Publications [ Full List ] [ Google Scholar ] [ Github: MCG-NJU ]
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding
Y. Wang, K. Li, X. Li, J. Yu, Y. He, G. Chen, B. Pei, R. Zheng, J. Xu, Z. Wang, Y. Shi, T. Jiang, S. Li, H. Zhang, Y. Huang, Y. Qiao, Y. Wang, L. Wang
in European Conference on Computer Vision (ECCV), 2024.
[ Paper ] [ Code ]
State-of-the-art performance on more than 60 video understanding tasks.
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition
L. Wang, Y. Xiong, Z. Wang, Y. Qiao, D. Lin, X. Tang, and L. Van Gool
in European Conference on Computer Vision (ECCV), 2016.
[ Paper ] [ BibTex ] [ Poster ] [ Code ] [ Journal Version]
Proposing a segmental architecture and obtaining state-of-the-art performance on UCF101 and HMDB51.
Action Recognition with Trajectory-Pooled Deep-Convolutional Descriptors
L. Wang, Y. Qiao, and X. Tang
in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
[ Paper ] [ BibTex ] [ Extended Abstract ] [ Poster ] [ Project Page ] [ Code ]
State-of-the-art performance: HMDB51: 65.9%, UCF101: 91.5%.
Contests
- ActivityNet Large Scale Activity Recognition Challenge, 2016: Untrimmed Video Classification, Rank: 1/24.
- ImageNet Large Scale Visual Recognition Challenge, 2015: Scene Recognition, Rank: 2/25.
- ChaLearn Looking at People Challenge, 2015, Rank: 1/6.
- THUMOS Action Recognition Challenge, 2015, Rank: 5/11.
- ChaLearn Looking at People Challenge, 2014, Rank: 1/6, 4/17.
- THUMOS Action Recognition Challenge, 2014, Rank: 4/14, 2/3.
- ChaLearn Multi-Modal Gesture Recognition Challenge, 2013, Rank: 4/54.
- THUMOS Action Recognition Challenge, 2013, Rank: 4/16.
Academic Service
Journal Reviewer
IEEE Transactions on Pattern Analysis and Machine Intelligence
IEEE Transactions on Image Processing
IEEE Transactions on Multimedia
IEEE Transactions on Circuits and Systems for Video Technology
Pattern Recognition
Pattern Recognition Letters
Image and Vision Computing
Computer Vision and Image Understanding
Conference Reviewer
IEEE Conference on Computer Vision and Pattern Recognition, 2017
IEEE International Conference on Automatic Face and Gesture Recognition, 2017
European Conference on Computer Vision, 2016
Asian Conference on Computer Vision, 2016
International Conference on Pattern Recognition, 2016
Friends
Wen Li (ETH), Jie Song (ETH), Sheng Guo (Malong), Weilin Huang (Malong), Bowen Zhang (USC), Zhe Wang (UCI), Wei Li (Google), Yuanjun Xiong (Amazon), Xiaojiang Peng (SIAT), Zhuowei Cai (Google), Xingxing Wang (NTU)
Last Updated on 15th June, 2023
Published with GitHub Pages