⚡️Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass
Jianing Yang, Alexander Sax, Kevin J. Liang, Mikael Henaff, Hao Tang, Ang Cao, Joyce Chai, Franziska Meier, Matt Feiszli
CVPR 2025
Abstract
Multi-view 3D reconstruction remains a core challenge in computer vision, particularly in applications requiring accurate and scalable representations across diverse perspectives. Current leading methods such as DUSt3R employ a fundamentally pairwise approach, processing images in pairs and necessitating costly global alignment procedures to reconstruct from multiple views. In this work, we propose Fast 3D Reconstruction (Fast3R), a novel multi-view generalization to DUSt3R that achieves efficient and scalable 3D reconstruction by processing many views in parallel. Fast3R's Transformer-based architecture forwards N images in a single forward pass, bypassing the need for iterative alignment. Through extensive experiments on camera pose estimation and 3D reconstruction, Fast3R demonstrates state-of-the-art performance, with significant improvements in inference speed and reduced error accumulation. These results establish Fast3R as a robust alternative for multi-view applications, offering enhanced scalability without compromising reconstruction accuracy.
⚡️Fast3R Demo
Our interactive Gradio demo allows you to upload images or videos and visualize the 3D reconstruction at lightning ⚡️ speed.
Upload a video or images, visualize the 3D reconstruction, play it back frame by frame, explore confidence maps, and render a GIF. And don't forget to give us feedback! 🤗
Results Showcase
Explore our 3D reconstruction results across a variety of scenes.
3D Reconstruction ❤️ LLM Scalability
Fast3R departs from the long-standing two-view architecture design used by most existing 3D reconstruction methods and instead processes all views together. As a result, the traditional time- and memory-consuming view selection and global alignment stages are eliminated; everything becomes end-to-end learnable in a single unified images-to-3D model, resulting in dramatic speed and memory improvements.
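To make the "all views in one forward pass" idea concrete, here is a minimal PyTorch sketch: patch tokens from every view are concatenated and fused by a single Transformer with all-to-all attention, then decoded into per-view pointmaps. The class, dimensions, and head below are illustrative placeholders, not Fast3R's actual architecture or code.

```python
import torch
import torch.nn as nn

class MultiViewFusion(nn.Module):
    """Toy model: tokens from all N views attend to each other in one pass,
    so there is no pairwise matching and no global alignment stage."""

    def __init__(self, dim: int = 256, depth: int = 4, heads: int = 8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.fusion = nn.TransformerEncoder(layer, num_layers=depth)
        self.pointmap_head = nn.Linear(dim, 3)  # per-patch XYZ in a shared world frame

    def forward(self, view_tokens: torch.Tensor) -> torch.Tensor:
        # view_tokens: (batch, num_views, patches_per_view, dim)
        b, n, p, d = view_tokens.shape
        fused = self.fusion(view_tokens.reshape(b, n * p, d))  # all-to-all attention across views
        return self.pointmap_head(fused).reshape(b, n, p, 3)   # one pointmap per view

# 32 views with 196 patch tokens each, processed in a single forward pass.
model = MultiViewFusion()
points = model(torch.randn(1, 32, 196, 256))
print(points.shape)  # torch.Size([1, 32, 196, 3])
```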
At its core, Fast3R uses a large Transformer to fuse information across views and leverages a series of LLM training and inference techniques to enable efficient and scalable processing:
- FlashAttention 2.0 for memory-efficient attention computation
- DeepSpeed ZeRO-2 for distributed training optimization
- Positional Embedding Interpolation to "train short, test long" (see the sketch after this list)
- Tensor Parallelism for accelerated inference across multiple GPUs
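The "train short, test long" bullet can be illustrated with a generic sketch: a learned table of per-view index embeddings is stretched by 1D linear interpolation so that a model trained on a small number of views can index many more views at inference time. The function name and shapes below are assumptions for illustration and do not come from the Fast3R codebase.

```python
import torch
import torch.nn.functional as F

def interpolate_view_embeddings(table: torch.Tensor, num_views: int) -> torch.Tensor:
    """Resize a (N_train, dim) table of per-view index embeddings to (num_views, dim)
    via 1D linear interpolation: train on few views, test on many."""
    x = table.t().unsqueeze(0)                                    # (1, dim, N_train)
    x = F.interpolate(x, size=num_views, mode="linear", align_corners=False)
    return x.squeeze(0).t()                                       # (num_views, dim)

# e.g. a table learned for 24 views, reused for a 1000-view forward pass
train_table = torch.randn(24, 768)
test_table = interpolate_view_embeddings(train_table, 1000)
print(test_table.shape)  # torch.Size([1000, 768])
```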
Speed & Memory
Comparison of computational efficiency between Fast3R and DUSt3R on a single A100 GPU. Each view has a 512×384 resolution.
| # Views | Fast3R Time (s) | Fast3R Peak GPU Mem (GiB) | DUSt3R Time (s) | DUSt3R Peak GPU Mem (GiB) |
|---|---|---|---|---|
| 2 | 0.065 | 3.84 | 0.092 | 3.52 |
| 8 | 0.122 | 6.33 | 8.386 | 24.59 |
| 32 | 0.509 | 13.25 | 129.0 | 67.61 |
| 48 | 0.84 | 20.8 | OOM | OOM |
| 320 | 15.938 | 41.90 | OOM | OOM |
| 800 | 89.569 | 55.97 | OOM | OOM |
| 1000 | 137.62 | 63.01 | OOM | OOM |
| 1500 | 308.85 | 78.59 | OOM | OOM |
Note: "OOM" indicates Out of Memory. For DUSt3R, at 48 views the N² pairwise reconstructions consume all VRAM during global alignment.
Scalability
Fast3R's performance improves as model and data size grow, pointing to an exciting future for large-scale 3D reconstruction.
Model Scaling
Data Scaling
BibTeX
@InProceedings{Yang_2025_Fast3R,
  title     = {Fast3R: Towards 3D Reconstruction of 1000+ Images in One Forward Pass},
  author    = {Jianing Yang and Alexander Sax and Kevin J. Liang and Mikael Henaff and Hao Tang and Ang Cao and Joyce Chai and Franziska Meier and Matt Feiszli},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2025},
}