Consistent4D: Consistent 360° Dynamic Object Generation from Monocular Video
1 CASIA,
2 Nanjing University,
3 Fudan University
Teaser (left to right): (a) In-the-wild Video, (b) Foreground, (c) Consistent4D, (d) Consistent4D Depth.
Consistent4D generates a 360° dynamic object from an uncalibrated monocular video captured by hand-held devices such as smartphones, or from segments of animated films.
Abstract
In this paper, we present Consistent4D, a novel approach for generating 4D dynamic objects from uncalibrated monocular videos.
Uniquely, we cast 360-degree dynamic object reconstruction as a 4D generation problem, eliminating the need for tedious multi-view data collection and camera calibration.
This is achieved by leveraging an object-level 3D-aware image diffusion model as the primary supervision signal for training a Dynamic Neural Radiance Field (DyNeRF).
Specifically, we propose a Cascade DyNeRF to facilitate stable convergence and temporal continuity under a supervision signal that is discrete along the time axis.
To achieve spatial and temporal consistency, we further introduce an Interpolation-driven Consistency Loss, which minimizes the discrepancy between frames rendered by DyNeRF and frames interpolated by a pretrained video interpolation model.
Extensive experiments show that Consistent4D performs competitively against prior-art alternatives, opening up new possibilities for 4D dynamic object generation from monocular videos, while also demonstrating advantages for conventional text-to-3D generation tasks.
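As a rough illustration of how such a loss could be computed, the PyTorch-style sketch below penalizes the gap between the frame DyNeRF renders at an intermediate timestamp and the frame a frozen, pretrained video interpolation model predicts from the two neighboring renderings. The interfaces here (dynerf.render, video_interpolator) are hypothetical placeholders under assumed conventions, not the authors' released code.

import torch
import torch.nn.functional as F

def interpolation_consistency_loss(dynerf, video_interpolator, camera, t0, t1):
    # Hypothetical sketch of an interpolation-driven consistency loss:
    # render two frames from the same camera at timestamps t0 and t1,
    # let a frozen pretrained interpolation model predict the in-between
    # frame, and penalize its discrepancy with the frame DyNeRF renders
    # at the midpoint timestamp.
    t_mid = 0.5 * (t0 + t1)
    frame_0 = dynerf.render(camera, t0)    # (B, 3, H, W), differentiable
    frame_1 = dynerf.render(camera, t1)
    frame_mid = dynerf.render(camera, t_mid)
    with torch.no_grad():
        # Pseudo ground truth; no gradients flow through the frozen model.
        frame_interp = video_interpolator(frame_0, frame_1)
    return F.l1_loss(frame_mid, frame_interp)

During training, this term would presumably be added to the diffusion-based supervision with a weighting coefficient.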
Pipeline
The framework consists of the optimization of a Cascade DyNeRF and the training of a post-processing video enhancer.
The Cascade DyNeRF, which adopts a residual learning strategy, is supervised by an SDS loss from an image-to-image diffusion model.
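A minimal sketch of the residual-learning idea, assuming a coarse-to-fine stack of sub-fields (the class layout below is an illustrative assumption, not the released implementation): the first level produces a base prediction, and each later level only adds a correction on top of it, which lets the coarse levels converge stably before finer detail is introduced.

import torch.nn as nn

class CascadeDyNeRF(nn.Module):
    # Illustrative cascade of dynamic radiance fields with residual learning.
    def __init__(self, levels):
        super().__init__()
        self.levels = nn.ModuleList(levels)  # coarse-to-fine sub-fields

    def forward(self, xyz, t, view_dir):
        # xyz: (N, 3) sample positions, t: (N, 1) timestamps,
        # view_dir: (N, 3) viewing directions.
        out = self.levels[0](xyz, t, view_dir)     # base (coarse) prediction
        for level in self.levels[1:]:
            out = out + level(xyz, t, view_dir)    # residual refinement
        return out  # e.g. (N, 4): density + RGB fed to volume rendering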
In particular, a novel Interpolation-driven Consistency Loss is proposed to compensate for the spatiotemporal inconsistency introduced by the image diffusion model.
For post-processing, we train a lightweight cross-frame video enhancer with a GAN to further improve the quality of the video rendered from DyNeRF.
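The generator of such an enhancer might look like the sketch below: a small convolutional network that refines the current rendered frame conditioned on a neighboring frame, so corrections remain temporally coherent. The architecture and channel sizes are assumptions for illustration, and the adversarial (discriminator) side of the training is omitted.

import torch
import torch.nn as nn

class CrossFrameEnhancer(nn.Module):
    # Hypothetical lightweight generator for a cross-frame video enhancer.
    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 3, 3, padding=1),
        )

    def forward(self, frame, neighbor):
        # Condition the refinement of the current frame on a neighboring
        # frame and predict a residual correction.
        x = torch.cat([frame, neighbor], dim=1)  # (B, 6, H, W)
        return frame + self.net(x)

Trained with a standard GAN objective plus a reconstruction term, the discriminator would push the refined frames toward the appearance statistics of the input video.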
Gallery
Each gallery entry pairs the input video with renderings from two novel views (Novel View 0, Novel View 1) at two timestamps (Timestamp 0, Timestamp 1).
Please refer to the paper and the video for more visualization results.
BibTeX
@inproceedings{jiang2024consistentd,
  title={Consistent4D: Consistent 360{\textdegree} Dynamic Object Generation from Monocular Video},
  author={Yanqin Jiang and Li Zhang and Jin Gao and Weiming Hu and Yao Yao},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=sPUrdFGepF}
}