I am a Senior Research Scientist at Google DeepMind (MTV, CA).
My recent research interests are in improving the capabilities of Large Multimodal Models (e.g., Gemini) and understanding the interaction of vision and language.
I obtained my Ph.D. and M.S. at KAIST, advised by Professor In So Kweon. I have been fortunate to collaborate with Adobe Research (2019), Google Brain (2020), and Google Research (2021). I am a recipient of the Microsoft Research Asia Fellowship, the Qualcomm Innovation Fellowship, and the Global Ph.D. Fellowship from NRF Korea.
Contact
-
mcahny01 [at] gmail.com
mcahny [at] google.com
-
Googleplex, 1600 Amphitheatre Pkwy, Mountain View, CA 94043
Education
-
PhD in EE, KAIST, 2022
on "Learning Dense Pixel Features for Video Processing and Understanding"
-
MS in EE, KAIST, 2018
on "Reducing Human Supervision in Supervised Learning"
-
BS in EE, KAIST, 2016
-
Exchange Student Program, 2014
KTH Royal Institute of Technology in Stockholm, Sweden
Academic Activities
- Area Chair: NeurIPS 2025, 2024, 2023; ICML 2025; CVPR 2026, 2025, 2024, 2023
- Action Editor of Transactions on Machine Learning Research (TMLR)
- Outstanding Reviewer in CVPR 2021, ECCV 2020
- Reviewer at CVPR, NeurIPS, ICLR, ICCV, ECCV, ICML, AAAI, EG, TPAMI, TNNLS, TIP
Research Experiences
- Google DeepMind (previously Google Brain), MTV, CA, Jul 2022 - Present
Senior Research Scientist, Research Scientist
- Google Research, LA, CA (virtual), May 2021 - Jan 2022
Research Intern, worked with Liang-Chieh Chen and Jun Xie
- Google Brain, MTV, CA (virtual), Jun 2020 - Nov 2020
Research Intern, worked with Weicheng Kuo, Tsung-Yi Lin, and Anelia Angelova
- Adobe Research, San Jose, CA, Jun 2019 - Sep 2019
Research Intern, worked with Joon-Young Lee
- KAIST, Daejeon, Korea, Mar 2016 - Feb 2022
Research Assistant, Robotics and Computer Vision Lab.
Publications
-
EmbeddingGemma: Powerful and Lightweight Text Representations
Gemini Embedding Team, Google
2025
[ paper / huggingface / Google blogpost ]
-
Context-Adaptive Multi-Prompt Embedding with Large Language Models for Vision-Language Alignment
Dahun Kim, Anelia Angelova
COLM 2025
[ paper ]
-
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
Gemini Team, Google
2025
[ paper / Google blogpost ]
-
Time-Scaling State-Space Models for Dense Video Captioning
AJ Piergiovanni, Ganesh Mallya, Dahun Kim, Anelia Angelova
BMVC 2025
[ paper ]
-
Zero-Shot Multi-Spectral Learning: Reimagining a Generalist Multimodal Gemini 2.5 Model for Remote Sensing Applications
Ganesh Mallya, Yotam Gigi, Dahun Kim, Maxim Neumann, Genady Beryozkin, Tomer Shekel, Anelia Angelova
AGU 2025 Oral presentation
[ paper / Google blogpost / Colab tutorial ]
-
What's in a Video: Factorized Autoregressive Decoding for Online Dense Video Captioning
AJ Piergiovanni, Dahun Kim, Michael S Ryoo, Isaac Noble, Anelia Angelova
Preprint 2025
[ paper ]
-
Learning Visual Grounding from Generative Vision and Language Model
Shijie Wang, Dahun Kim, Ali Taalimi, Chen Sun, Weicheng Kuo
WACV 2025
[ paper ]
-
Region-centric Image-Language Pretraining for Open-Vocabulary Detection
Dahun Kim, Anelia Angelova, Weicheng Kuo
ECCV 2024
[ paper / code / Google Cloud Vertex AI ]
-
Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation
Minsu Kim, Jeongsoo Choi, Dahun Kim, Yong Man Ro
TASLP 2024 (IEEE/ACM Transactions on Audio, Speech and Language Processing)
[ paper ]
-
OmniBind: Teach to Build Unequal-Scale Modality Interaction for Omni-Bind of All
Yuanhuiyi Lyu, Xu Zheng, Dahun Kim, Lin Wang
Preprint 2024
[ paper ]
-
Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities
AJ Piergiovanni*, Isaac Noble*, Dahun Kim, Michael S. Ryoo, Victor Gomes, Anelia Angelova
CVPR 2024
Featured at Google AI blogpost
[ paper / Google blogpost ]
-
Contrastive Feature Masking Open-Vocabulary Vision Transformer
Dahun Kim, Anelia Angelova, Weicheng Kuo
ICCV 2023
[ paper ]
-
Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers
Dahun Kim, Anelia Angelova, Weicheng Kuo
CVPR 2023 Highlight presentation - top 2.5% of submissions
Featured at Google AI blogpost
[ paper / code / Google blogpost ]
-
MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks
Weicheng Kuo*, AJ Piergiovanni*, Dahun Kim†, Xiyang Luo†, Ben Caine, Wei Li, Abhijit Ogale,
Luowei Zhou, Andrew Dai, Zhifeng Chen, Claire Cui, Anelia Angelova (*, † equal contribution)
TMLR 2023
Featured at Google AI blogpost
[ paper / Google blogpost ]
-
RECLIP: Resource-Efficient Clip by Training with Small Images
Runze Li, Dahun Kim, Bir Bhanu, Weicheng Kuo
TMLR 2023
[ paper ]
-
Uni-DVPS: Unified Model for Depth-Aware Video Panoptic Segmentation
Ji-Yeon Kim, Hyun-Bin Oh, Dahun Kim, Tae-Hyun Oh
RAL-IROS 2024 Oral presentation
Short version at CVPRW 2023 'Vision-Centric Autonomous Driving' Workshop
[ paper ]
-
Video-kMaX: A Simple Unified Approach for Online and Near-Online Video Panoptic Segmentation
Inkyu Shin, Dahun Kim, Qihang Yu, Jun Xie, Hong-Seok Kim, Bradley Green, In So Kweon, Kuk-Jin Yoon, Liang-Chieh Chen
WACV 2024 Oral presentation
Short version at 'Transformers for Vision' workshop @ CVPR 2023
[ paper / video demo ]
-
Dense Pixel-level Interpretation of Dynamic Scenes with Video Panoptic Segmentation
Dahun Kim, Sanghyun Woo, Joon-Young Lee, In So Kweon
TIP 2022
Short version at What is Motion For (WIMF) workshop @ ECCV 2022
[ paper ]
-
TubeFormer-DeepLab: Video Mask Transformer
Dahun Kim, Jun Xie, Huiyu Wang, Siyuan Qiao, Qihang Yu, Hong-Seok Kim, Hartwig Adam, In So Kweon, Liang-Chieh Chen
CVPR 2022
Ranked #1 on SemKITTI-DVPS, #3 on KITTI-STEP, and #4 on VSPW 2021
Short version at 'Transformers for Vision' workshop @ CVPR 2022
[ paper ]
-
CMT-DeepLab: Dynamic Clustering Mask Transformers for Panoptic Segmentation
Qihang Yu, Huiyu Wang, Dahun Kim, Siyuan Qiao, Maxwell Collins, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen
CVPR 2022 Oral presentation
[ paper ]
-
Tailor Me: An Editing Network for Fashion Attribute Shape Manipulation
Youngjoong Kwon, Stefano Petrangeli, Dahun Kim, Haoliang Wang, Vishy Swaminathan, Henry Fuchs
WACV 2022
[ paper ]
-
Global Context and Geometric Priors for Effective Non-Local Self-Attention
Sanghyun Woo, Dahun Kim, Joon-Young Lee, In So Kweon
BMVC 2021
Received Bronze Prize, 27th HumanTech Paper Award, Samsung Electronics Co., Ltd
[ paper ]
-
DeepLab2: A TensorFlow Library for Deep Labeling
Mark Weber, Huiyu Wang, Siyuan Qiao, Jun Xie, Maxwell D. Collins, Yukun Zhu, Liangzhe Yuan,
Dahun Kim, Qihang Yu, Daniel Cremers, Laura Leal-Taixe, Alan L. Yuille, Florian Schroff, Hartwig Adam, Liang-Chieh Chen
Technical report 2021, Internal code contribution
-
Learning to Associate Every Segment for Video Panoptic Segmentation
Sanghyun Woo, Dahun Kim, Joon-Young Lee, In So Kweon
CVPR 2021
[ paper ]
-
The Devil is in the Boundary: Exploiting Boundary Representation for Basis-based Instance Segmentation
Myungchul Kim, Sanghyun Woo, Dahun Kim, In So Kweon
WACV 2021
[ paper ]
-
Align-and-Attend Network for Globally and Locally Coherent Video Inpainting
Sanghyun Woo, Dahun Kim, KwanYoung Park, Joon-Young Lee, In So Kweon
BMVC 2020 (Acceptance: 195/670 ≈ 29.1%)
[ paper ]
-
Hide-and-Tell: Learning to Bridge Photo Streams for Visual Storytelling
Yunjae Jung, Dahun Kim, Sanghyun Woo, Kyungsu Kim, Sungjin Kim, In So Kweon
AAAI 2020 (Acceptance: 1591/7737 ≈ 20.6%)
[ paper ]
-
Self-Supervised Video Representation Learning with Space-Time Cubic Puzzles
Dahun Kim, Donghyeon Cho, In So Kweon
AAAI 2019 Oral presentation (Acceptance: 459/7095 ≈ 6.5%)
[ paper ]
-
Discriminative Feature Learning for Unsupervised Video Summarization
Yunjae Jung, Donghyeon Cho, Dahun Kim, Sanghyun Woo, In So Kweon
AAAI 2019 Oral presentation (Acceptance: 459/7095 ≈ 6.5%)
Received Honorable Mention, 25th HumanTech Paper Award, Samsung Electronics Co., Ltd
Patented
[ paper ]
-
Video Retargeting: Trade-off between Content Preservation and Spatio-temporal Consistency
Donghyeon Cho, Yunjae Jung, Francois Rameau, Dahun Kim, Sanghyun Woo and In So Kweon
MM 2019 (Acceptance: 252/936 ≈ 26.9%)
[ paper ]
-
Preserving Semantic and Temporal Consistency for Unpaired Video-to-Video Translation
Kwanyong Park, Sanghyun Woo, Dahun Kim, Donghyeon Cho, In So Kweon
MM 2019 (Acceptance: 252/936 ≈ 26.9%)
[ paper ]
-
LinkNet: Relational Embedding for Scene Graph
Sanghyun Woo*, Dahun Kim*, Donghyeon Cho, In So Kweon (* equal contribution)
NeurIPS 2018 (Acceptance: 1011/4856 ≈ 20.8%)
[ paper ]
-
Learning Image Representations by Completing Damaged Jigsaw Puzzles
Dahun Kim, Donghyeon Cho, Donggeun Yoo, In So Kweon
WACV 2018
[ paper ]
-
Two Phase Learning for Weakly Supervised Object Localization
Dahun Kim, Donghyeon Cho, Donggeun Yoo, In So Kweon
ICCV 2017 (Acceptance: 621/2143 ≈ 28.9%)
[ paper ]
-
Neural Human Performer: Learning Generalizable Radiance Fields for Human Performance Rendering
Youngjoong Kwon, Dahun Kim, Duygu Ceylan, Henry Fuchs
NeurIPS 2021 Spotlight presentation (Acceptance: < 3.0%)
Received Bronze Prize, 28th HumanTech Paper Award, Samsung Electronics Co., Ltd
Research Topics
- Multimodal AI
- Perception: Object and Video Understanding
- 3D Representation: Avatar Modeling
Interns whom I had the pleasure to work with
- Shijie Wang in Winter 2023, Ph.D. student at Brown University.
hosted with Weicheng Kuo
- Runze Li in Summer 2022. Finished Ph.D. at UC Riverside. Now at Google.
hosted with Weicheng Kuo
- Inkyu Shin in Summer 2022. Finished Ph.D. at KAIST. Now at TikTok Research.
hosted with Liang-Chieh Chen and Jun Xie
Awards & Honors
- NSF travel award for Doctoral Consortium, CVPR 2022
- Best Ph.D. Thesis Award, EE, KAIST, 2022
- Bronze Prize, 28th HumanTech Paper Award, Samsung Electronics Co., Ltd. 2022 ($5,000)
- Qualcomm Innovation Award ($4,000), 2021
- Outstanding Reviewer Award, IEEE Conference on Computer Vision and Pattern Recognition, 2021
- Bronze Prize, 27th HumanTech Paper Award, Samsung Electronics Co., Ltd. 2021 ($5,000)
- Outstanding Reviewer Award, European Conference on Computer Vision, 2020
- KAIST-Samsung Industry-University Cooperation Best Paper Award ($3,000), 2020
- Microsoft Research Asia (MSRA) Ph.D. Fellowship 2019 Winner ($10,000)
- Global Ph.D. Fellowship, National Research Foundation of Korea ($60,000 + 3-year full scholarship)
- 1st Place Award in ChaLearn LAP 2018 Inpainting Challenge Track2 - Video Decaptioning (ECCV 2018 challenge)
- Honorable Mention, 25th HumanTech Paper Award, Samsung Electronics Co., Ltd. 2019 ($2,000)
- International Computer Vision Summer School (ICVSS) 2018, Sicily, Italy
US Patents
- Video Panoptic Segmentation (issued, 11,640,714)
- Panoptic Segmentation (issued, 11,256,960)
- Electronic device for key frame analysis and control method thereof (issued, 12,175,369)
- Methods and apparatus localizing object(s) in vision data (pending, 18/289,725)
- Electronic Device and Control Method of Same (pending, 17/554,142)
- Method and Device for Hierarchical Learning of Neural Network Based on Weakly Supervised Learning (pending, 16/758,089)