3D Scene Understanding at CVPR 2025

5th 3D Scene Understanding for Vision, Graphics, and Robotics

CVPR 2025 Workshop, Nashville TN, June 11, 2025 (Morning)

Watch the video recordings from virtual CVPR or YouTube

Overview

Advances in AI technology have spurred calls for next-generation AI, e.g., Embodied AI and General AI, which enables systems to physically interact with their environments and carry out comprehensive tasks in a human-like manner. Toward this goal, researchers from diverse fields, e.g., computer vision, computer graphics, and robotics, have worked separately and made progress across various topics, including 3D representations (e.g., NeRF, Gaussian Splatting), foundation models (e.g., SAM(2), Stable (Video) Diffusion), datasets (e.g., Objaverse (XL), Open X-Embodiment), and end-to-end vision-language-action (VLA) models (e.g., RT-X).

However, new fundamental questions arise about how to sustain a more comprehensive understanding of the environment, unite these efforts, and facilitate the future development of next-generation AI. For example, what is the role of traditional scene parsing/detection/localization in today’s development? How can scene understanding techniques be leveraged to improve physical interaction? Could pure end-to-end models and scaled-up datasets suffice, or are intermediate representations, even symbolic ones, more suitable for certain tasks?

This year’s focus will be on exploring the fundamental aspects that enhance interaction between agents and 3D scenes in the new era of AI, promoting directions and ideas that could emerge within the next two to five years.

Invited Speakers


Deva Ramanan (CMU)	Angel X. Chang (SFU)	Carl Vondrick (Columbia)	Daniel Cremers (TUM)

Iro Armeni (Stanford)	Kiana Ehsani (Vercept)	Guanya Shi (CMU)

Schedule

08:15 am - 08:30 am Opening Remarks and Introduction
08:30 am - 09:00 am Invited Talk: Guanya Shi (CMU) [video]
09:00 am - 09:30 am Invited Talk: Deva Ramanan (CMU)
09:30 am - 10:00 am Invited Talk: Angel X. Chang (SFU) [video]
10:00 am - 10:30 am Invited Talk: Carl Vondrick (Columbia) [video]
10:30 am - 11:00 am Invited Talk: Daniel Cremers (TUM) [video]
11:00 am - 11:15 am Coffee Break
11:15 am - 11:45 am Invited Talk: Iro Armeni (Stanford) [video]
11:45 am - 12:15 pm Invited Talk: Kiana Ehsani (Vercept)

Organizers


Yixin Chen (BIGAI)	Baoxiong Jia (BIGAI)	Yao Feng (Stanford)	Songyou Peng (DeepMind)

Chuhang Zou (Reality Lab)	Sai Kumar Dwivedi (MPI)	Yixin Zhu (PKU)	Siyuan Huang (BIGAI)

Challenge Organizers


Baoxiong Jia (BIGAI)	Xiongkun Linghu (BIGAI)	Tai Wang (Shanghai AI Lab)	Jingli Lin (SJTU)	Xiaojian Ma (BIGAI)

Senior Organizers


Marc Pollefeys (ETH Zurich)	Derek Hoiem (UIUC)	Song-Chun Zhu (BIGAI, PKU, THU)
