-
"LD-LAudio-V1: Video-to-Long-Form-Audio Generation Extension with Dual Lightweight Adapters"
Poster #22
Authors: Haomin Zhang, Kristin Qi, Shuxin Yang, Zihao Chen, Chaofan Ding, and Xinhan Di
-
"Do State-of-the-art Audio-visual VLMs Understand Audio-video Temporal Misalignment"
Poster #23
Authors: Motonobu Kimura, Ren Ohkubo, Yue Qiu, and Yutaka Satoh
-
"Seeing What You Say: Expressive Image Generation from Speech"
Poster #24
Authors: Jiyoung Lee, Song Park, Sanghyuk Chun, and Soo-Whan Chung
-
"KeyVID: Keyframe-Aware Video Diffusion for Audio-Synchronized Visual Animation"
Poster #25
Authors: Xingrui Wang, Jiang Liu, Ze Wang, Xiaodong Yu, Jialian Wu, Ximeng Sun, Yusheng Su, Alan Yuille, Zicheng Liu, and Emad Barsoum
-
"Not Like Transformers: Drop the Beat Representation for Dance Generation with Mamba-Based Diffusion Model"
Poster #26
Authors: Sangjune Park, Inhyeok Choi, Donghyeon Soon, Youngwoo Jeon, and Kyungdon Joo
-
"High-Fidelity Talking Portrait Synthesis with Personalized 3D Generative Prior"
Poster #27
Authors: Jaehoon Ko, Kyusun Cho, JoungBin Lee, Heeji Yoon, and Seungryong Kim
-
"Dance Video Generation using Music-to-Pose Encoder Trained on Synthetic Dataset Generation Pipeline leveraging Latent Diffusion Framework"
Poster #28
Author: Nokap Tony Park
-
"Differentiable Room Acoustic Rendering with Multi-View Vision Priors"
Poster #29
Authors: Derong Jin and Ruohan Gao
-
"SpecMaskFoley: Efficient Yet Effective Synchronized Video-to-audio Synthesis via Pretraining and ControlNet"
Poster #30
Authors: Zhi Zhong, Akira Takahashi, Shuyang Cui, Keisuke Toyama, Shusuke Takahashi, and Yuki Mitsufuji
-
"JWB-DH-V1: Benchmark for Joint Whole-Body Talking Avatar and Speech Generation Version I"
Poster #31
Authors: Xinhan Di and Kristin Qi
-
"TITAN-Guide: Taming Inference-Time Alignment for Guided Text-to-Video Diffusion Models"
Poster #32
Authors: Christian Simon, Masato Ishii, Akio Hayakawa, Zhi Zhong, Shusuke Takahashi, Takashi Shibuya, and Yuki Mitsufuji
ICCV 2025
-
"TARO: Timestep-Adaptive Representation Alignment with Onset-Aware Conditioning for Synchronized Video-to-Audio Synthesis"
Poster #33
Authors: Tri Ton, Ji Woo Hong, and Chang D. Yoo
ICCV 2025
-
"How Would It Sound? Material-Controlled Multimodal Acoustic Profile Generation for Indoor Scenes"
Poster #34
Authors: Mahnoor Fatima Saad and Ziad Al-Halah
ICCV 2025