About Me
My name is Min-Hung (Steve) Chen (陳敏弘 in Chinese). I am a Senior Research Scientist at NVIDIA Research Taiwan, working on Vision+X Multimodal AI. I received my Ph.D. degree from Georgia Tech, advised by Prof. Ghassan AlRegib and in collaboration with Prof. Zsolt Kira. Before joining NVIDIA, I worked on Biometric Research for Cognitive Services as a Research Engineer II at Microsoft Azure AI, and before that on Edge-AI Research as a Senior AI Engineer at MediaTek.
My research interest is mainly Multimodal AI, including Vision-Language, 4D (video+depth) Understanding, Efficient Deep Learning, VLA, and Transformers. I am also interested in Learning without Full Supervision, including domain adaptation, transfer learning, continual learning, X-supervised learning, etc.
[Recruiting] NVIDIA Taiwan is hiring Research Scientists (full-time & internship). I am also open to research collaboration. Please drop me an email if you are interested.
[Note] The Projects, Talks, and Publications Sections are out of date. Please mainly check the News Section.
Interests
- Transfer Learning
- Unsupervised Learning
- Video Understanding
- Vision Transformer
- Computer Vision
- Deep Learning
- Machine Learning
Education
PhD in Electrical and Computer Engineering, 2020
Georgia Institute of Technology
MSc in Integrated Circuits and Systems, 2012
National Taiwan University
BSc in Electrical Engineering, 2010
National Taiwan University
News
- Dec. 2025: I will be serving as an Area Chair for ICML 2026.
- Nov. 2025: Our "VADER" paper is accepted to WACV 2026!!
- Sep. 2025: Our papers "ThinkAct" and "BlurDM" are accepted to NeurIPS 2025!! Our "TC-LoRA" work is accepted to the SPACE in Vision, Language, and Embodied AI (SpaVLE) workshop @ NeurIPS 2025.
- Aug. 2025: I will be serving as an Area Chair for ICLR 2026.
- Aug. 2025: Our "MovieCORE" ( Code ) is accepted to EMNLP 2025 as Oral!!
- Jun. 2025: Our papers "HERMES" ( Code ) and "LongSplat" ( Code ) are accepted to ICCV 2025!!
- May 2025: I am serving as an organizer for The Workshop on Ego-Exo Sensing for Smart Mobility (X-Sense) @ ICCV 2025.
- Apr. 2025: I will be serving as a workshop reviewer for Tiny Titans: The next wave of On-Device Learning for Foundational Models (TTODLer-FM) @ ICML 2025 and Representation Learning with Very Limited Resources: When Data, Modalities, Labels, and Computing Resources are Scarce (LIMIT) @ ICCV 2025.
- Apr. 2025: Our "V2V-LLM" work is accepted to CVPR 2025 Workshops (Best Paper in T4V and Oral in DriveX)!!
- Mar. 2025: I am serving as an organizer for The Workshop on Transformers for Vision (T4V) @ CVPR 2025.
- Mar. 2025: I am selected as an outstanding reviewer for the SCOPE workshop @ ICLR 2025.
- Feb. 2025: Our papers "Omni-RGPT" and "AuraFusion360" ( Code ) are accepted to CVPR 2025!!
- Feb. 2025: I will be serving as a workshop reviewer for Emergent Visual Abilities and Limits of Foundation Models (EVAL-FoMo) @ CVPR 2025.
- Jan. 2025: Our papers "SANER" and "Hymba" ( Code & Hugging Face & NV Blog ) are accepted to ICLR 2025!!
- Jan. 2025: I am serving as a journal reviewer for IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).
- Jan. 2025: I will be serving as a workshop reviewer for Scalable Optimization for Efficient and Adaptive Foundation Models (SCOPE) @ ICLR 2025.
- Oct. 2024: Four papers, "SemPLeS", "ST-CLIP", "CorrFill" and "ORFormer", are accepted to WACV 2025!!
- Sep. 2024: Our paper "DRAIL" ( Code ) is accepted to NeurIPS 2024 and ICLR 2024 Workshop (Gen4AIDM)!!
- Jun. 2024: Our "GroPrompt" paper is accepted to CVPR 2024 Workshop (CVinW)!!
- May 2024: Our paper "DoRA" ( Code & NV Blog ) is accepted to ICML 2024 as Oral (acceptance rate: 1.5%)!!!
- Apr. 2024: I will be serving as a workshop reviewer for Transformers for Vision (T4V) Workshop @ CVPR 2024.
- Feb. 2024: Two papers, "CoDe" ( Code ) and "PartDistill" ( Code ), are accepted to CVPR 2024!!
- Feb. 2024: The paper list for Vision Transformer/Attention has obtained 4000+ stars!!
- Sep. 2023: Our paper "TS-LSTM and temporal-inception" ( Blog & arXiv & Code ) received the 2023 EURASIP Best Paper Award for Image Communication Journal!!
- Jul. 2023: Two papers, "CEVR" ( Code ) and "MIT" ( Code ), are accepted to ICCV 2023!! See you in Paris!
- Jun. 2023: Our "QuAVF" work is selected as the 1st place winner of the CVPR 2023 Ego4D Challenge in the Audio-Visual Social Understanding: Talking to me track!!
- Apr. 2023: I will be serving as a workshop reviewer for Transformers for Vision (T4V) Workshop @ CVPR 2023.
- Apr. 2023: Our "GAIN" paper is accepted to CVPR 2023 Workshop (Biometrics) with the Best Paper Award!!
- Nov. 2022: I joined the Taipei Team at NVIDIA Research as a Senior Research Scientist, working on Vision+X Multi-Modal AI.
- Oct. 2022: Our "HIT" paper is accepted to WACV 2023!!
- Oct. 2022: Our "ROGUE" paper is accepted to BMVC 2022!!
- Jul. 2022: I am selected as an outstanding reviewer for ICML 2022.
- Apr. 2022: I released a comprehensive paper list for Vision Transformer/Attention to facilitate related research.
- Jan. 2022: I joined the Face Science Team at Microsoft Azure AI as a Research Engineer II, working on Cutting-edge AI Research for Cognitive Services.
- Sep. 2021: I am selected as an outstanding reviewer for ICCV 2021.
- May 2021: I am selected as an outstanding reviewer for CVPR 2021.
- Jan. 2021: I am co-organizing the Learned Smartphone ISP Challenge in the Mobile AI (MAI) Workshop at CVPR 2021 with ETHZ! Please check the Project page for more details.
- Oct. 2020: I joined the AI team at MediaTek Taiwan as a Senior AI Engineer, working on Deep Learning Research for Edge-AI.
- Aug. 2020: I officially obtained my Ph.D. degree from Georgia Tech!!! (Feel free to check my Ph.D. Dissertation for more details)
Work Experience
Senior Research Scientist
NVIDIA Research
Research Engineer II
Microsoft
Deploy research approaches to next-generation cloud service solutions
Senior AI Engineer
MediaTek Inc.
Coordinate academic-industry collaboration for EcoSystem (e.g. co-host CVPR'21 workshop)
Research Intern
Baidu USA
Deep Learning Engineer Intern
Aipoly
Ph.D. Research
Georgia Institute of Technology
Human action understanding
Robust machine learning for autonomous vehicles
Research Assistant
Academia Sinica
Projects

Ultimate Awesome Transformer Attention
An ultimately comprehensive paper list of Vision Transformer and Attention, including papers, codes, and related websites.

Vision-based Autonomous Retail Store
Deep Learning and Computer Vision system for real-time autonomous retail stores using only RGB cameras.

Deep Learning for Smartphone ISP
The Learned Smartphone ISP Challenge for the CVPR 2021 MAI Workshop.

Action Segmentation with Temporal Domain Adaptation
Cross-domain action segmentation by aligning temporal feature spaces.

Activity Recognition with RNN and Temporal-ConvNet
Two methods (TS-LSTM and Temporal-Inception) to exploit spatiotemporal dynamics for activity recognition.

Temporal Attentive Alignment for Video Domain Adaptation
Cross-domain action recognition with new datasets and novel video-based DA approaches.

Traffic Sign Detection under Challenging Conditions
A large-scale traffic sign detection dataset with various challenging conditions.
Professional Activities
Area Chairs
- International Conference on Learning Representations (ICLR)
- International Conference on Machine Learning (ICML)
Organizers
- The Workshop on Ego-Exo Sensing for Smart Mobility (X-Sense) at IEEE/CVF ICCV 2025.
- The 4th Workshop on Transformers for Vision (T4V) at IEEE/CVF CVPR 2025.
- Learned Smartphone ISP Challenge at IEEE CVPR MAI Workshop 2021.
- Visual Attention Estimation Challenge at IEEE AIVR 2021.
Professional Talks
- Dec. 2025: Invited talk at NTH, Taiwan (Topic: Multimodal Efficient AI Research at NVIDIA Taiwan).
- May 2025: Invited talk at NTHU, Taiwan (Topic: Multimodal AI Research at NVIDIA Taiwan).
- May 2024: Invited talk at NTHU, Taiwan (Topic: Multimodal AI Research at NVIDIA Taiwan).
- May 2023: Invited talk at NYCU, Taiwan (Topic: My Research Journey: TW x US x Academics x Industry).
- Jun. 2021: Invited talk at CVPR MAI Workshop 2021 (Topic: Learned Smartphone ISP Challenge: Results and Top Solutions).
- May 2021: Invited talk at Academia Sinica, Taiwan (Topic: Bridging Distributional Discrepancy with Temporal Dynamics for Video Understanding).
- Jan. 2021: Invited talk at NYCU, Taiwan (Topic: My Research Journey for Video Understanding).
- Publication talks at CVPR 2020 and ICCV 2019.
Conference Reviewers
- IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), including Workshop (CVPRW)
- International Conference on Learning Representations (ICLR), including Workshop (ICLRW)
- Advances in Neural Information Processing Systems (NeurIPS)
- IEEE/CVF International Conference on Computer Vision (ICCV), including Workshop (ICCVW)
- International Conference on Machine Learning (ICML), including Workshop (ICMLW)
- European Conference on Computer Vision (ECCV), including Workshop (ECCVW)
- Association for the Advancement of Artificial Intelligence (AAAI)
- IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
- British Machine Vision Conference (BMVC)
- IEEE International Conference on Image Processing (ICIP)
- Asian Conference on Computer Vision (ACCV)
- IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
- IAPR International Conference on Pattern Recognition (ICPR)
- IAPR International Conference on Image Analysis and Processing (ICIAP)
- IEEE International Workshop on Multimedia Signal Processing (MMSP)
- European Signal Processing Conference (EUSIPCO)
Journal Reviewers
- IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
- Elsevier Pattern Recognition (PR)
- Springer International Journal of Computer Vision (IJCV)
- IEEE Transactions on Intelligent Transportation Systems (TITS)
- IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)
- IEEE Access
Recent & Upcoming Talks
Learned Smartphone ISP Challenge
Bridging Distributional Discrepancy with Temporal Dynamics for Video Understanding
My Research Journey for Video Understanding
Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation
Temporal Attentive Alignment for Large-Scale Video Domain Adaptation
Featured Publications

Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation
[CVPR 2020] Cross-domain action segmentation by aligning feature spaces across multiple temporal scales with self-supervised learning to reduce spatio-temporal variability.

Temporal Attentive Alignment for Large-Scale Video Domain Adaptation
[ICCV 2019 (Oral)] Cross-domain action recognition with new datasets and novel attention-based DA approaches.
Selected Publications
Learned Smartphone ISP on Mobile NPUs With Deep Learning, Mobile AI 2021 Challenge: Report
Network Space Search for Pareto-Efficient Spaces
Bridging Distributional Discrepancy with Temporal Dynamics for Video Understanding
Action Segmentation with Joint Self-Supervised Temporal Domain Adaptation
Interpretable Self-Attention Temporal Reasoning for Driving Behavior Understanding
Action Segmentation with Mixed Temporal Domain Adaptation
Temporal Attentive Alignment for Large-Scale Video Domain Adaptation
Traffic Sign Detection Under Challenging Conditions: A Deeper Look into Performance Variations and Spectral Characteristics
TS-LSTM and temporal-inception: Exploiting spatiotemporal dynamics for activity recognition
Depth and Skeleton Associated Action Recognition without Online Accessible RGB-D Cameras
Honors & Awards
- Outstanding Reviewer for ICLRW (Spring 2025)
- 2023 EURASIP Best Paper Award for Image Communication Journal (Fall 2023)
- Outstanding Reviewer for ICML (Summer 2022)
- Outstanding Reviewer for ICCV (Fall 2021)
- Outstanding Reviewer for CVPR (Summer 2021)
- Student Travel Grant Award for ICCV (Fall 2019)
- Ministry of Education Technologies Incubation Scholarship, Taiwan (Fall 2014 - Spring 2017)
- Otto F. and Jenny H. Krauss Fellowship, Georgia Institute of Technology (Fall 2014 - Spring 2015)
Teaching Experience
Graduate Teaching Assistant
Georgia Institute of Technology
- Deep Learning (Spring 2019 by Prof. Zsolt Kira)
- Computer Vision (Fall 2018 by Prof. James Hays)
- Signals and Systems (Spring 2015 by Prof. Jennifer E Michaels)
- Fundamentals of Digital Signal Processing (Fall 2014 by Prof. Mark A Clements)
National Taiwan University
- Statistical Image Processing (Spring 2012)
- Computer Programming (Fall 2011)