I am a Senior Research Scientist at Google DeepMind (MTV, CA).
My recent research interests are in improving the capabilities of Large Multimodal Models (e.g., Gemini) and understanding the interaction of vision and language.
I obtained my Ph.D. and M.S. at KAIST, advised by Professor In So Kweon. I have been fortunate to collaborate with Adobe Research (2019), Google Brain (2020), and Google Research (2021). I am a recipient of the Microsoft Research Asia Fellowship, the Qualcomm Innovation Fellowship, and the Global Ph.D. Fellowship from NRF Korea.
Contact
-
mcahny01 [at] gmail.com
mcahny [at] google.com
-
Googleplex, 1600 Amphitheatre Pkwy, Mountain View, CA 94043
Education
-
PhD in EE, KAIST, 2022
on "Learning Dense Pixel Features for Video Processing and Understanding"
-
MS in EE, KAIST, 2018
on "Reducing Human Supervision in Supervised Learning"
-
BS in EE, KAIST, 2016
-
Exchange Student Program, 2014
KTH Royal Institute of Technology in Stockholm, Sweden
Academic Activities
- Area Chair: NeurIPS 2025, 2024, 2023; ICML 2025; CVPR 2026, 2025, 2024, 2023
- Action Editor of Transactions on Machine Learning Research (TMLR)
- Outstanding Reviewer in CVPR 2021, ECCV 2020
- Reviewer at CVPR, NeurIPS, ICLR, ICCV, ECCV, ICML, AAAI, EG, TPAMI, TNNLS, TIP
Research Experiences
- Google DeepMind (previously Google Brain), MTV, CA, Jul 2022 - Present
Senior Research Scientist, Research Scientist
- Google Research, LA, CA (virtual), May 2021 - Jan 2022
Research Intern, worked with Liang-Chieh Chen and Jun Xie
- Google Brain, MTV, CA (virtual), Jun 2020 - Nov 2020
Research Intern, worked with Weicheng Kuo, Tsung-Yi Lin, and Anelia Angelova
- Adobe Research, San Jose, CA, Jun 2019 - Sep 2019
Research Intern, worked with Joon-Young Lee
- KAIST, Daejeon, Korea, Mar 2016 - Feb 2022
Research Assistant, Robotics and Computer Vision Lab.
Publications
-
EmbeddingGemma: Powerful and Lightweight Text Representations
Gemini Embedding Team, Google
2025
[ paper / huggingface / Google blogpost ]
-
Context-Adaptive Multi-Prompt Embedding with Large Language Models for Vision-Language Alignment
Dahun Kim, Anelia Angelova
COLM 2025
[ paper ]
-
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities
Gemini Team, Google
2025
[ paper / Google blogpost ]
-
Time-Scaling State-Space Models for Dense Video Captioning
AJ Piergiovanni, Ganesh Mallya, Dahun Kim, Anelia Angelova
BMVC 2025
[ paper ]
-
Zero-Shot Multi-Spectral Learning: Reimagining a Generalist Multimodal Gemini 2.5 Model for Remote Sensing Applications
Ganesh Mallya, Yotam Gigi, Dahun Kim, Maxim Neumann, Genady Beryozkin, Tomer Shekel, Anelia Angelova
AGU 2025 Oral presentation
[ paper / Google blogpost / Colab tutorial ]
-
What's in a Video: Factorized Autoregressive Decoding for Online Dense Video Captioning
AJ Piergiovanni, Dahun Kim, Michael S Ryoo, Isaac Noble, Anelia Angelova
Preprint 2025
[ paper ]
-
Learning Visual Grounding from Generative Vision and Language Model
Shijie Wang, Dahun Kim, Ali Taalimi, Chen Sun, Weicheng Kuo
WACV 2025
[ paper ]
-
Region-centric Image-Language Pretraining for Open-Vocabulary Detection
Dahun Kim, Anelia Angelova, Weicheng Kuo
ECCV 2024
[ paper / code / Google Cloud Vertex AI ]
-
Textless Unit-to-Unit training for Many-to-Many Multilingual Speech-to-Speech Translation
Minsu Kim, Jeongsoo Choi, Dahun Kim, Yong Man Ro
TASLP 2024 (IEEE/ACM Transactions on Audio, Speech and Language Processing)
[ paper ]
-
OmniBind: Teach to Build Unequal-Scale Modality Interaction for Omni-Bind of All
Yuanhuiyi Lyu, Xu Zheng, Dahun Kim, Lin Wang
Preprint 2024
[ paper ]
-
Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities
AJ Piergiovanni*, Isaac Noble*, Dahun Kim, Michael S. Ryoo, Victor Gomes, Anelia Angelova
CVPR 2024
Featured at Google AI blogpost
[ paper / Google blogpost ]
-
Contrastive Feature Masking Open-Vocabulary Vision Transformer
Dahun Kim, Anelia Angelova, Weicheng Kuo
ICCV 2023
[ paper ]
-
Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers
Dahun Kim, Anelia Angelova, Weicheng Kuo
CVPR 2023 Highlight presentation - top 2.5% of submissions
Featured at Google AI blogpost
[ paper / code / Google blogpost ]
-
MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks
Weicheng Kuo*, AJ Piergiovanni*, Dahun Kim†, Xiyang Luo†, Ben Caine, Wei Li, Abhijit Ogale,
Luowei Zhou, Andrew Dai, Zhifeng Chen, Claire Cui, Anelia Angelova (*, † equal contribution)
TMLR 2023
Featured at Google AI blogpost
[ paper / Google blogpost ]
-
RECLIP: Resource-Efficient Clip by Training with Small Images
Runze Li, Dahun Kim, Bir Bhanu, Weicheng Kuo
TMLR 2023
[ paper ]
-
Uni-DVPS: Unified Model for Depth-Aware Video Panoptic Segmentation
Ji-Yeon Kim, Hyun-Bin Oh, Dahun Kim, Tae-Hyun Oh
RAL-IROS 2024 Oral presentation
Short version at CVPRW 2023 'Vision-Centric Autonomous Driving' Workshop
[ paper ]
-
Video-kMaX: A Simple Unified Approach for Online and Near-Online Video Panoptic Segmentation
Inkyu Shin, Dahun Kim, Qihang Yu, Jun Xie, Hong-Seok Kim, Bradley Green, In So Kweon, Kuk-Jin Yoon, Liang-Chieh Chen
WACV 2024 Oral presentation
Short version at 'Transformers for Vision' workshop @ CVPR 2023
[ paper / video demo ]
-
Dense Pixel-level Interpretation of Dynamic Scenes with Video Panoptic Segmentation
Dahun Kim, Sanghyun Woo, Joon-Young Lee, In So Kweon
TIP 2022
Short version at What is Motion For (WIMF) workshop @ ECCV 2022
[ paper ]
-
TubeFormer-DeepLab: Video Mask Transformer
Dahun Kim, Jun Xie, Huiyu Wang, Siyuan Qiao, Qihang Yu, Hong-Seok Kim, Hartwig Adam, In So Kweon, Liang-Chieh Chen
CVPR 2022
Ranked #1 on SemKITTI-DVPS, #3 on KITTI-STEP, and #4 on VSPW 2021
Short version at 'Transformers for Vision' workshop @ CVPR 2022
[ paper ]
-
CMT-DeepLab: Dynamic Clustering Mask Transformers for Panoptic Segmentation
Qihang Yu, Huiyu Wang, Dahun Kim, Siyuan Qiao, Maxwell Collins, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen
CVPR 2022 Oral presentation
[ paper ]
-
Tailor Me: An Editing Network for Fashion Attribute Shape Manipulation
Youngjoong Kwon, Stefano Petrangeli, Dahun Kim, Haoliang Wang, Vishy Swaminathan, Henry Fuchs
WACV 2022
[ paper ]
-
Global Context and Geometric Priors for Effective Non-Local Self-Attention
Sanghyun Woo, Dahun Kim, Joon-Young Lee, In So Kweon
BMVC 2021
Received Bronze Prize, 27th HumanTech Paper Award, Samsung Electronics Co., Ltd
[ paper ]
-
DeepLab2: A TensorFlow Library for Deep Labeling
Mark Weber, Huiyu Wang, Siyuan Qiao, Jun Xie, Maxwell D. Collins, Yukun Zhu, Liangzhe Yuan,
Dahun Kim, Qihang Yu, Daniel Cremers, Laura Leal-Taixe, Alan L. Yuille, Florian Schroff, Hartwig Adam, Liang-Chieh Chen
Technical report 2021, Internal code contribution
-
Learning to Associate Every Segment for Video Panoptic Segmentation
Sanghyun Woo, Dahun Kim, Joon-Young Lee, In So Kweon
CVPR 2021
[ paper ]
-
The Devil is in the Boundary: Exploiting Boundary Representation for Basis-based Instance Segmentation
Myungchul Kim, Sanghyun Woo, Dahun Kim, In So Kweon
WACV 2021
[ paper ]
-
Align-and-Attend Network for Globally and Locally Coherent Video Inpainting
Sanghyun Woo, Dahun Kim, KwanYoung Park, Joon-Young Lee, In So Kweon
BMVC 2020 (Acceptance: 195/670 ≈ 29.1%)
[ paper ]
-
Hide-and-Tell: Learning to Bridge Photo Streams for Visual Storytelling
Yunjae Jung, Dahun Kim, Sanghyun Woo, Kyungsu Kim, Sungjin Kim, In So Kweon
AAAI 2020 (Acceptance: 1591/7737 ≈ 20.6%)
[ paper ]
-
Self-Supervised Video Representation Learning with Space-Time Cubic Puzzles
Dahun Kim, Donghyeon Cho, In So Kweon
AAAI 2019 Oral presentation (Acceptance: 459/7095 ≈ 6.5%)
[ paper ]
-
Discriminative Feature Learning for Unsupervised Video Summarization
Yunjae Jung, Donghyeon Cho, Dahun Kim, Sanghyun Woo, In So Kweon
AAAI 2019 Oral presentation (Acceptance: 459/7095 ≈ 6.5%)
Received Honorable Mention, 25th HumanTech Paper Award, Samsung Electronics Co., Ltd
Patented
[ paper ]
-
Video Retargeting: Trade-off between Content Preservation and Spatio-temporal Consistency
Donghyeon Cho, Yunjae Jung, Francois Rameau, Dahun Kim, Sanghyun Woo and In So Kweon
MM 2019 (Acceptance: 252/936 ≈ 26.9%)
[ paper ]
-
Preserving Semantic and Temporal Consistency for Unpaired Video-to-Video Translation
Kwanyong Park, Sanghyun Woo, Dahun Kim, Donghyeon Cho, In So Kweon
MM 2019 (Acceptance: 252/936 ≈ 26.9%)
[ paper ]
-
LinkNet: Relational Embedding for Scene Graph
Sanghyun Woo*, Dahun Kim*, Donghyeon Cho, In So Kweon (* equal contribution)
NeurIPS 2018 (Acceptance: 1011/4856 ≈ 20.8%)
[ paper ]
-
Learning Image Representations by Completing Damaged Jigsaw Puzzles
Dahun Kim, Donghyeon Cho, Donggeun Yoo, In So Kweon
WACV 2018
[ paper ]
-
Two Phase Learning for Weakly Supervised Object Localization
Dahun Kim, Donghyeon Cho, Donggeun Yoo, In So Kweon
ICCV 2017 (Acceptance: 621/2143 ≈ 28.9%)
[ paper ]
-
Neural Human Performer: Learning Generalizable Radiance Fields for Human Performance Rendering
Youngjoong Kwon, Dahun Kim, Duygu Ceylan, Henry Fuchs
NeurIPS 2021 Spotlight presentation (Acceptance: < 3.0%)
Received Bronze Prize, 28th HumanTech Paper Award, Samsung Electronics Co., Ltd
Research Topics
- Multimodal AI
- Perception: Object and Video Understanding
- 3D Representation: Avatar Modeling
Interns whom I had the pleasure to work with
- Shijie Wang in Winter 2023, Ph.D. student at Brown University.
hosted with Weicheng Kuo
- Runze Li in Summer 2022. Finished Ph.D. at UC Riverside. Now at Google.
hosted with Weicheng Kuo
- Inkyu Shin in Summer 2022. Finished Ph.D. at KAIST. Now at TikTok Research.
hosted with Liang-Chieh Chen and Jun Xie
Awards & Honors
- NSF travel award for Doctoral Consortium, CVPR 2022
- Best Ph.D. Thesis Award, EE, KAIST, 2022
- Bronze Prize, 28th HumanTech Paper Award, Samsung Electronics Co., Ltd. 2022 ($5,000)
- Qualcomm Innovation Award ($4,000), 2021
- Outstanding Reviewer Award, IEEE Conference on Computer Vision and Pattern Recognition, 2021
- Bronze Prize, 27th HumanTech Paper Award, Samsung Electronics Co., Ltd. 2021 ($5,000)
- Outstanding Reviewer Award, European Conference on Computer Vision, 2020
- KAIST-Samsung Industry-University Cooperation Best Paper Award ($3,000), 2020
- Microsoft Research Asia (MSRA) Ph.D. Fellowship 2019 Winner ($10,000)
- Global Ph.D. Fellowship, National Research Foundation of Korea ($60,000 + 3-year full scholarship)
- 1st Place Award in ChaLearn LAP 2018 Inpainting Challenge Track2 - Video Decaptioning (ECCV 2018 challenge)
- Honorable Mention, 25th HumanTech Paper Award, Samsung Electronics Co., Ltd. 2019 ($2,000)
- International Computer Vision Summer School (ICVSS) 2018, Sicily, Italy
US Patents
- Video Panoptic Segmentation (issued, 11,640,714)
- Panoptic Segmentation (issued, 11,256,960)
- Electronic device for key frame analysis and control method thereof (issued, 12,175,369)
- Methods and apparatus localizing object(s) in vision data (pending, 18/289,725)
- Electronic Device and Control Method of Same (pending, 17/554,142)
- Method and Device for Hierarchical Learning of Neural Network Based on Weakly Supervised Learning (pending, 16/758,089)