Abstract

Single-image 3D reconstruction remains a fundamental challenge in computer vision due to inherent geometric ambiguities and limited viewpoint information. Recent advances in Latent Video Diffusion Models (LVDMs) offer promising 3D priors learned from large-scale video data. However, leveraging these priors effectively faces three key challenges: (1) degradation in quality across large camera motions, (2) difficulties in achieving precise camera control, and (3) geometric distortions inherent to the diffusion process that damage 3D consistency. We address these challenges by proposing LiftImage3D, a framework that effectively releases LVDMs' generative priors while ensuring 3D consistency. Specifically, we design an articulated trajectory strategy to generate video frames, which decomposes video sequences with large camera motions into ones with controllable small motions. Then we use robust neural matching models, i.e. MASt3R, to calibrate the camera poses of generated frames and produce corresponding point clouds. Finally, we propose a distortion-aware 3D Gaussian splatting representation, which can learn independent distortions between frames and output undistorted canonical Gaussians. Extensive experiments demonstrate that LiftImage3D achieves state-of-the-art performance on two challenging datasets, i.e. LLFF and DL3DV, and generalizes well to diverse in-the-wild images, from cartoon illustrations to complex real-world scenes.
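As a rough illustration of the articulated trajectory idea, the sketch below splits one large camera rotation into chained clips that each cover only a small motion, with every clip starting where the previous one ended. The function name `articulate_trajectory`, its parameters, and the restriction to a single yaw angle are assumptions made for this example; they are not taken from the LiftImage3D code.

```python
import math


def articulate_trajectory(total_deg, max_clip_deg, frames_per_clip):
    """Split one large yaw rotation into chained small-motion clips.

    Hypothetical helper for illustration only. Returns a list of clips;
    each clip is a list of absolute yaw angles in degrees. Every clip
    after the first starts at the final angle of the previous clip, which
    mimics continuing generation from an already generated frame.
    """
    if max_clip_deg <= 0:
        raise ValueError("max_clip_deg must be positive")
    if frames_per_clip < 2:
        raise ValueError("frames_per_clip must be at least 2")

    # Number of clips needed so that no clip exceeds the per-clip budget.
    n_clips = max(1, math.ceil(abs(total_deg) / max_clip_deg))
    clip_deg = total_deg / n_clips
    step = clip_deg / (frames_per_clip - 1)

    clips = []
    for i in range(n_clips):
        start = i * clip_deg
        clips.append([start + k * step for k in range(frames_per_clip)])
    return clips


# A 60 degree sweep with at most 20 degrees per clip yields three clips.
clips = articulate_trajectory(total_deg=60.0, max_clip_deg=20.0, frames_per_clip=5)
for clip in clips:
    print(clip)
```

In this simplified form the clips share their boundary angles, so the frames of one clip can serve as the starting view for the next. A real trajectory would use full camera poses rather than a single angle.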

Interactive Viewer

Click on the images below to render 3D scenes in real time in your browser, powered by Brush.
Note that the quality may be reduced. A local viewer will also be provided. The interactive viewer currently supports only Chrome 130+.

Framework

The overall pipeline of LiftImage3D. First, we extend the LVDM to generate diverse video clips from a single image using an articulated camera trajectory strategy. Then all generated frames are matched using the robust neural matching module and registered into a point cloud. After that, we initialize Gaussians from the registered point clouds and construct a distortion field that models the independent distortion of each video frame on top of the canonical 3DGS.
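To make the relationship between the canonical Gaussians and per-frame distortion more concrete, here is a minimal sketch under strong simplifying assumptions. It stores a fixed offset table per frame, whereas the method described above learns its distortion field; the class `DistortionField`, its fields, and the hard-coded offsets are hypothetical and only show how frame-specific centers can be derived without altering the canonical ones.

```python
from dataclasses import dataclass, field


@dataclass
class DistortionField:
    """Hypothetical per-frame offsets on top of shared canonical centers."""

    # Canonical (undistorted) Gaussian centers as (x, y, z) tuples.
    canonical: list
    # Maps frame_id -> list of (dx, dy, dz) offsets, one per canonical center.
    offsets: dict = field(default_factory=dict)

    def centers_for_frame(self, frame_id):
        """Return the distorted centers used to render a given frame."""
        deltas = self.offsets.get(frame_id)
        if deltas is None:
            # Frames without an entry fall back to the canonical centers.
            return list(self.canonical)
        return [
            tuple(c + d for c, d in zip(point, delta))
            for point, delta in zip(self.canonical, deltas)
        ]


# Illustrative values only; in practice such offsets would be learned.
dist_field = DistortionField(
    canonical=[(0.0, 0.0, 1.0), (1.0, 0.0, 2.0)],
    offsets={1: [(0.5, 0.0, 0.0), (0.0, -0.5, 0.0)]},
)
frame0 = dist_field.centers_for_frame(0)
frame1 = dist_field.centers_for_frame(1)
print(frame0)
print(frame1)
```

Each frame is explained through its own offsets while the canonical centers stay unchanged, which is the property that lets the canonical set be exported as the undistorted result.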

Single Image to 3D Scene That Can Be Dragged Freely

Citation

@misc{chen2024liftimage3d,
  title={LiftImage3D: Lifting Any Single Image to 3D Gaussians with Video Generation Priors},
  author={Yabo Chen and Chen Yang and Jiemin Fang and Xiaopeng Zhang and Lingxi Xie and Wei Shen and Wenrui Dai and Hongkai Xiong and Qi Tian},
  year={2024},
  eprint={2412.09597},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

The website template was borrowed from Nerfies and CAT3D. The interactive viewer is powered by Brush. Thanks for their awesome work! Thanks to Sikuang Li and Guanjun Wu for their contributions to this project page.
