PAD3R: Pose-Aware Dynamic 3D Reconstruction from Casual Videos
SIGGRAPH Asia 2025
Ting-Hsuan Liao*1
Haowen Liu*1
Yiran Xu1
Songwei Ge1
Gengshan Yang✢
Jia-Bin Huang✢1
1University of Maryland, College Park
✢Joint advising.
Input casual video
Dynamic 3D object reconstruction
Our method reconstructs dynamic 3D objects from a single casual monocular video, coupling object deformation with camera motion.
This webpage showcases qualitative results and comparisons of our method on in-the-wild videos and the Artemis dataset.
We also present visualizations from our composite demo and ablation study, and highlight representative failure cases.
Please refer to our main paper for more details on the results.
To explore the content, scroll down or use the navigation buttons below.
Comparisons on Artemis
We present qualitative results for four sequences from the Artemis dataset with different view coverage.
[Video gallery: Panda, Wolf, Cat, and Duck sequences; baseline vs. ours, each shown as input, reference view, and 360° view.]
Method
Our method consists of two main stages.
In the first stage, we select a frame from the video sequence as the canonical frame (or keyframe) and use an image-to-3D model to obtain a static 3D Gaussian representation.
We then render the 3D Gaussians from a set of randomly sampled camera poses to fine-tune a lightweight image-to-pose estimator, PoseNet, built on a DINOv2 backbone.
In the second stage, we use this pose estimator to initialize the camera pose of every input video frame and optimize a deformable 3D object model.
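The sketch below (not the authors' released code) illustrates the PoseNet fine-tuning idea in PyTorch: render the canonical 3D Gaussians from randomly sampled poses, then train a small regression head on a frozen DINOv2 backbone to map each render back to its pose. The helpers render_gaussians and sample_pose, the 6D rotation parameterization, and all hyperparameters are assumptions for illustration.

import torch
import torch.nn as nn

class PoseNet(nn.Module):
    # Regress a camera pose (6D rotation + 3D translation) from one image.
    def __init__(self):
        super().__init__()
        # Frozen DINOv2 ViT-S/14 backbone; its global feature is 384-dim.
        self.backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
        for p in self.backbone.parameters():
            p.requires_grad = False
        self.head = nn.Sequential(nn.Linear(384, 256), nn.ReLU(), nn.Linear(256, 9))

    def forward(self, images):          # images: (B, 3, H, W), H and W divisible by 14
        feats = self.backbone(images)   # (B, 384) global (CLS) features
        out = self.head(feats)
        return out[:, :6], out[:, 6:]   # 6D rotation, 3D translation

def finetune_posenet(posenet, render_gaussians, sample_pose, steps=2000):
    # Self-supervised fine-tuning on renders of the canonical 3D Gaussians.
    # render_gaussians(poses) -> images and sample_pose() -> (B, 9) poses are
    # hypothetical stand-ins for the Gaussian renderer and the pose sampler.
    opt = torch.optim.Adam(posenet.head.parameters(), lr=1e-4)
    for _ in range(steps):
        poses = sample_pose()                 # ground-truth poses for this batch
        images = render_gaussians(poses)      # renders of the static Gaussians
        rot6d, trans = posenet(images)
        loss = nn.functional.mse_loss(torch.cat([rot6d, trans], dim=-1), poses)
        opt.zero_grad()
        loss.backward()
        opt.step()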
Robustness Across View Coverage
To further assess PAD3R’s capacity to capture camera movements, we analyze how input view coverage affects reconstruction quality.
We evaluate on five input sequences, each providing a different extent of viewpoint variation: 0° (single-view), 40°, 90°, 140°, and 180°.
PAD3R maintains consistently high reconstruction quality across varying view coverage angles.
In contrast, due to its static camera assumption, DreamMesh4D exhibits a steady decline in performance as the range of viewpoints expands.
Conversely, BANMo shows improved results with broader view coverage, but performs poorly under single-view or narrow-view settings.
Composite Demo
Our model estimates object-centric camera poses (i.e., poses relative to the object).
By combining these with off-the-shelf estimates of scene-centric camera poses and a simple background reconstruction,
we can re-project the dynamic 3D object back into the full 3D scene.
This enables rendering with large-scale camera trajectories. Below, we showcase a demo where the camera smoothly navigates through the reconstructed 3D world.
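As a minimal sketch of this compositing step (assuming standard 4x4 homogeneous transforms, not the authors' exact conventions): PAD3R yields a per-frame camera-from-object transform, an off-the-shelf SfM tool yields world-from-camera, and chaining the two places the deformed object in the scene.

import numpy as np

def world_from_object(T_world_from_cam, T_cam_from_obj):
    # Compose per-frame transforms: world <- camera <- object.
    return T_world_from_cam @ T_cam_from_obj

def to_world(points_obj, T_world_from_obj):
    # Transform the deformed object's points (N, 3) into world coordinates
    # before rendering them together with the reconstructed background.
    pts_h = np.concatenate([points_obj, np.ones((len(points_obj), 1))], axis=1)
    return (pts_h @ T_world_from_obj.T)[:, :3]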
Ablation Study
We present qualitative ablation results on three Artemis sequences, gradually introducing key components of our method.
PoseNet initialization improves camera pose estimation and leads to better novel view consistency.
Multi-block tracking supervision helps capture fine-grained motion, particularly around articulated limbs.
Incorporating bi-directional multi-block tracking (full model) further improves reconstruction quality, producing more consistent object dynamics and camera motion.
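To make the tracking terms concrete, here is a loose sketch of what bi-directional multi-block tracking supervision could look like (our guess at the structure, not the paper's exact loss): split the video into overlapping blocks, run an off-the-shelf 2D point tracker forward and backward within each block, and penalize disagreement between the model's projected trajectories and the tracks. track_points and project_model_points are hypothetical helpers.

def tracking_loss(num_frames, track_points, project_model_points,
                  block=16, stride=8):
    # Hypothetical bi-directional multi-block tracking supervision.
    loss = 0.0
    for start in range(0, num_frames - block + 1, stride):
        idx = list(range(start, start + block))
        for order in (idx, idx[::-1]):             # forward and backward passes
            tracks = track_points(order)           # (T, N, 2) tracker output
            proj = project_model_points(order)     # (T, N, 2) model projections
            loss = loss + ((tracks - proj) ** 2).mean()
    return loss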
[Ablation gallery: for each sequence, the input followed by cam, cam+P_init, cam+P_init+track, and the full method, each shown from a reference view and a 360° view.]
Limitations
Inaccurate initial geometry leads to pose errors and degraded reconstruction.
[Failure-case gallery: static model and input, shown from a reference view and a 360° view.]
References
[1] Yifei Zeng, Yanqin Jiang, Siyu Zhu, Yuanxun Lu, Youtian Lin, Hao Zhu, Weiming Hu, Xun Cao and Yao Yao. STAG4D: Spatial-Temporal Anchored Generative 4D Gaussians. In ECCV, 2024.
[2] Jiawei Ren, Kevin Xie, Ashkan Mirzaei, Hanxue Liang, Xiaohui Zeng, Karsten Kreis, Ziwei Liu, Antonio Torralba, Sanja Fidler, Seung Wook Kim and Huan Ling. L4GM: Large 4D Gaussian Reconstruction Model. In NeurIPS, 2024.
[3] Zhiqi Li, Yiming Chen and Peidong Liu. DreamMesh4D: Video-to-4D Generation with Sparse-Controlled Gaussian-Mesh Hybrid Representation. In NeurIPS, 2024.
[4] Gengshan Yang, Minh Vo, Natalia Neverova, Deva Ramanan, Andrea Vedaldi and Hanbyul Joo. BANMo: Building Animatable 3D Neural Models from Many Casual Videos. In CVPR, 2022.
BibTeX
@article{pad3r,
  author  = {Liao, Ting-Hsuan and Liu, Haowen and Xu, Yiran and Ge, Songwei and Yang, Gengshan and Huang, Jia-Bin},
  title   = {PAD3R: Pose-Aware Dynamic 3D Reconstruction from Casual Videos},
  journal = {SIGGRAPH Asia},
  year    = {2025},
}