Abstract
This paper presents MVDiffusion++, a neural architecture for 3D object reconstruction that synthesizes dense, high-resolution views of an object given one or a few input images without camera poses.
MVDiffusion++ achieves superior flexibility and scalability with two surprisingly simple ideas: 1) a ``pose-free architecture,'' in which standard self-attention among 2D latent features learns 3D consistency across an arbitrary number of conditional and generation views without explicitly using camera pose information; and 2) a ``view dropout strategy'' that discards a substantial number of output views during training, reducing the training-time memory footprint and enabling dense, high-resolution view synthesis at test time (see the sketch below).
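To make the two ideas concrete, here is a minimal PyTorch sketch, not the paper's implementation: the names `PoseFreeViewAttention` and `view_dropout`, and all shapes, are our illustrative assumptions. It shows standard self-attention applied over the concatenated tokens of all views with no camera-pose embedding, and a random subset of generation views being kept at each training step.

```python
# Illustrative sketch only; module names and tensor shapes are assumptions.
import torch
import torch.nn as nn


class PoseFreeViewAttention(nn.Module):
    """Standard self-attention over the tokens of all views jointly."""

    def __init__(self, dim: int = 320, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, views: torch.Tensor) -> torch.Tensor:
        # views: (batch, num_views, tokens_per_view, dim), mixing
        # conditional and generation views; no pose embedding is added.
        b, v, t, d = views.shape
        tokens = views.reshape(b, v * t, d)          # one joint token sequence
        out, _ = self.attn(tokens, tokens, tokens)   # plain self-attention
        return out.reshape(b, v, t, d)


def view_dropout(gen_views: torch.Tensor, keep: int) -> torch.Tensor:
    # Keep a random subset of the generation views for this training step;
    # the full dense view set is only materialized at test time.
    idx = torch.randperm(gen_views.shape[1])[:keep]
    return gen_views[:, idx]


# Usage: 2 conditional views plus 24 target views, training on only 4 targets.
cond = torch.randn(1, 2, 64, 320)
gen = view_dropout(torch.randn(1, 24, 64, 320), keep=4)
fused = PoseFreeViewAttention()(torch.cat([cond, gen], dim=1))
print(fused.shape)  # torch.Size([1, 6, 64, 320])
```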
We use the Objaverse dataset for training and the Google Scanned Objects dataset for evaluation with standard novel view synthesis and 3D reconstruction metrics, where MVDiffusion++ significantly outperforms the current state of the art.
arts. We also demonstrate a text-to-3D application example by combining MVDiffusion++ with a text-to-image
generative model.