VDG: Vision-Only Dynamic Gaussian for Driving Simulation
Hao Li*,
Jingfeng Li*,
Dingwen Zhang,
Chenming Wu,
Jieqi Shi,
Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang, Junwei Han
Brain and Artificial Intelligence Lab, Northwestern Polytechnical University
Baidu VIS
Aerial Robotics Group, HKUST
Abstract
Dynamic Gaussian splatting has enabled impressive advances in scene reconstruction and novel view synthesis.
Existing methods, however, rely heavily on camera poses and Gaussian initialization pre-computed by Structure-from-Motion (SfM) algorithms or obtained from expensive sensors.
This paper addresses this issue for the first time by integrating self-supervised visual odometry (VO) into our pose-free dynamic Gaussian method (VDG), providing pose and depth initialization as well as static-dynamic decomposition.
Moreover, VDG works with RGB image input only, and reconstructs dynamic scenes faster and at larger scale than existing pose-free dynamic view-synthesis methods.
We demonstrate the robustness of our approach via extensive quantitative and qualitative experiments.
Our results show favorable performance over state-of-the-art dynamic view synthesis methods.
The proposed VDG. (a) VDG Initialization: off-the-shelf VO networks \(\mathcal{P}(\cdot)\), \(\mathcal{M}(\cdot)\), and \(\mathcal{D}(\cdot)\) estimate the global poses \(T_t\), motion masks \(M_t\), and depth maps \(D_t\) (see Sec. IV-A.1). Given the poses \(T_t\) and corresponding depth maps \(D_t\), we project the depth maps into 3D space to initialize the Gaussian points \(G^k_t = \{\tilde{\mu}^k_t, \Sigma^k, \widetilde{\alpha}^k_t, S^k\}\). Note that the velocity \(v\) of each Gaussian is initialized to 0 (see Sec. IV-A.2). (b) VDG Training Procedure: given the initialized Gaussians \(G^k_t\), we train VDG with RGB and depth supervision (see Sec. IV-A.3). Moreover, we apply motion-mask supervision to decompose the scene into static and dynamic parts (Sec. IV-B). Finally, we adopt a training strategy that refines the VO-given poses \(T_t\) (Sec. IV-C).
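The initialization step described above back-projects each depth map \(D_t\) into 3D space using the VO pose \(T_t\) to seed the Gaussian means. A minimal sketch of that back-projection, assuming a standard pinhole intrinsic matrix `K` and a camera-to-world pose (neither is specified in the caption, so both names are illustrative):

```python
import numpy as np

def init_gaussian_means(depth, K, T_wc):
    """Back-project a depth map D_t to world-space points that seed the
    Gaussian means (a hypothetical sketch, not the authors' exact code).

    depth : (H, W) per-pixel depths D_t from the depth network
    K     : (3, 3) pinhole camera intrinsics (assumed known)
    T_wc  : (4, 4) camera-to-world pose T_t from the VO network
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    # Homogeneous pixel coordinates, one row per pixel.
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    rays = pix @ np.linalg.inv(K).T           # camera-frame ray directions
    pts_cam = rays * depth.reshape(-1, 1)     # scale each ray by its depth
    pts_hom = np.concatenate([pts_cam, np.ones((H * W, 1))], axis=1)
    return (pts_hom @ T_wc.T)[:, :3]          # transform into the world frame
```

Per the caption, the remaining Gaussian attributes (covariance, opacity, color) are initialized alongside these means, and each Gaussian's velocity starts at zero.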
Quantitative performance of novel view synthesis on the Waymo Open Dataset and KITTI benchmark. '-' indicates that SplaTAM cannot render the original-resolution image on a single NVIDIA V100 GPU.
Pose accuracy on the Waymo Open and KITTI datasets. Note that $RPE_r$ is reported in degrees, ATE is in the ground-truth scale, and $RPE_t$ is scaled by 100.