Reconstructing People, Places, and Cameras
TL;DR: We propose Humans and Structure from Motion (HSfM), an approach that integrates Human Mesh Recovery and Structure from Motion to jointly estimate 3D human pose and shape, scene point maps, and camera poses in a metric world coordinate frame.
Our approach places people in the scene and improves camera pose estimation and scene reconstruction. Here we show a top view of a gym environment before HSfM optimization (the DUSt3R output) and after optimization with the HSfM loss.
Humans and Structure from Motion captures people interacting with their environment as well as the spatial relationships between individuals. Here we show a reconstruction of three people building Lego together.
Overview
Humans and Structure from Motion (HSfM) is a novel method that jointly reconstructs 3D humans, the scene, and cameras from a sparse set of uncalibrated images. To achieve this, HSfM combines Human Mesh Recovery (HMR) methods for local human pose estimation with Structure from Motion (SfM) techniques for scene and camera reconstruction and for localizing people. Specifically, our approach combines camera and scene reconstruction from data-driven SfM methods such as DUSt3R with the bundle adjustment step from traditional SfM, applied here to 2D body keypoints, where a human body model provides 3D human meshes and constrains human size.
Step-by-Step Method
Step 1: Input Processing and Feature Extraction
From the sparse input images, we extract 2D human joints using ViTPose and estimate 3D joint positions and body shape using HMR 2.0. For scene and camera reconstruction, we use DUSt3R, a state-of-the-art data-driven SfM method. We assume known re-identification of people across camera views.
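A minimal sketch of this preprocessing step; the data layout and helper functions are illustrative assumptions (the paper does not prescribe these names), with placeholders standing in for the actual ViTPose, HMR 2.0, and DUSt3R calls:

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class PersonObservation:
    """Per-person, per-view inputs consumed by the HSfM optimization."""
    person_id: int          # assumed known re-identification across views
    joints_2d: np.ndarray   # (J, 2) 2D joints in pixels (from a ViTPose-style detector)
    joints_3d: np.ndarray   # (J, 3) 3D joints in the local body frame (from HMR 2.0)
    betas: np.ndarray       # (10,) SMPL-style body shape parameters


def lift_people(image):
    # Placeholder: a real implementation runs a 2D pose detector and an HMR model.
    return [PersonObservation(0, np.zeros((17, 2)), np.zeros((17, 3)), np.zeros(10))]


def scene_init(images):
    # Placeholder: a real implementation runs DUSt3R pointmap prediction + alignment.
    pointmaps = np.zeros((len(images), 480, 640, 3))   # per-pixel 3D points
    cams_init = [np.eye(4) for _ in images]            # initial world-from-camera poses
    return pointmaps, cams_init


def preprocess(images):
    """Step 1: gather human observations and an initial scene/camera estimate."""
    observations = [lift_people(im) for im in images]  # one list of people per view
    pointmaps, cams_init = scene_init(images)
    return observations, pointmaps, cams_init
```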
Step 2: Resolving Scale Ambiguity
SfM reconstructions are only defined up to an unknown global scale. We resolve this ambiguity by estimating camera parameters from human body size and orientation; after this alignment, people are roughly placed in a world of consistent metric scale.
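One illustrative way to make this concrete (a heuristic sketch, not necessarily the paper's exact estimator): compare metric bone lengths from the body model against the same bones measured in the up-to-scale DUSt3R reconstruction, and take a robust ratio:

```python
import numpy as np

# Joint-index pairs forming limbs, assuming a COCO-style 17-joint layout.
BONES = [(5, 7), (7, 9), (6, 8), (8, 10), (11, 13), (13, 15), (12, 14), (14, 16)]


def estimate_scene_scale(joints_metric, joints_scene):
    """Estimate one global scale that brings the scene into metric units.

    joints_metric: (J, 3) joints from the body model, in meters.
    joints_scene:  (J, 3) the same joints located in the un-scaled scene,
                   e.g. by reading DUSt3R pointmap values at the 2D keypoints.
    """
    ratios = []
    for i, j in BONES:
        metric_len = np.linalg.norm(joints_metric[i] - joints_metric[j])
        scene_len = np.linalg.norm(joints_scene[i] - joints_scene[j])
        if scene_len > 1e-6:
            ratios.append(metric_len / scene_len)
    return float(np.median(ratios))  # median is robust to noisy keypoints


# Scaling the cameras and point maps by this factor gives people a plausible
# metric size before the joint optimization starts.
```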
Step 3: Joint Human and Scene Reconstruction
We adapt the global alignment loss from DUSt3R to jointly estimate the humans and the scene. The HSfM loss consists of three terms: a keypoint reprojection term that performs bundle adjustment on 2D body joints, a body shape regularizer, and DUSt3R's global alignment loss.
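A compact PyTorch-style sketch of this three-term objective for a single person in a single view; the term weights, the form of the shape prior, and the `ga_loss` input (standing in for DUSt3R's global alignment loss, computed elsewhere) are illustrative assumptions:

```python
import torch


def project(points_3d, K, R, t):
    """Pinhole projection of world-frame points into one camera."""
    cam = points_3d @ R.T + t            # world frame -> camera frame
    uv = cam[:, :2] / cam[:, 2:3]        # perspective divide
    return uv @ K[:2, :2].T + K[:2, 2]   # apply intrinsics


def hsfm_loss(joints_world, joints_2d, conf, K, R, t, betas, betas_init, ga_loss,
              w_kp=1.0, w_shape=0.001, w_ga=1.0):
    """Three terms: keypoint reprojection, shape regularizer, global alignment.

    joints_world: (J, 3) optimized world-frame joints of one person.
    joints_2d, conf: (J, 2) detected keypoints and (J,) detector confidences.
    betas, betas_init: optimized and initially estimated body shape parameters.
    ga_loss: scalar DUSt3R-style global alignment loss, computed elsewhere.
    """
    # Bundle-adjustment-style reprojection error on the 2D body joints.
    reproj = (project(joints_world, K, R, t) - joints_2d).norm(dim=-1)
    l_kp = (conf * reproj).mean()
    # Keep the body shape close to the network's initial estimate.
    l_shape = ((betas - betas_init) ** 2).mean()
    return w_kp * l_kp + w_shape * l_shape + w_ga * ga_loss
```

In the full method this objective is summed over all people and views and minimized jointly over the human, scene, and camera parameters.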
Evaluation
This joint reasoning not only enables accurate human placement in the scene; notably, it also improves the camera poses and the scene reconstruction itself. Evaluations on public benchmarks show significant improvements. Here we show camera rotation accuracy (RRA) and scaled camera translation accuracy (s-CCA), in percent, at a threshold of 10 degrees / meters on the EgoHumans benchmark.
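For reference, the rotation metric can be sketched as follows; this uses a common definition of relative rotation accuracy, and details such as pair selection may differ from the benchmark's exact protocol:

```python
import numpy as np


def rra(R_pred, R_gt, thresh_deg=10.0):
    """Relative Rotation Accuracy: the fraction of camera pairs whose relative
    rotation error falls below the threshold (here 10 degrees)."""
    hits, total = 0, 0
    for i in range(len(R_pred)):
        for j in range(i + 1, len(R_pred)):
            rel_pred = R_pred[i].T @ R_pred[j]   # predicted relative rotation
            rel_gt = R_gt[i].T @ R_gt[j]         # ground-truth relative rotation
            # Geodesic angle between the two relative rotations.
            cos = (np.trace(rel_pred.T @ rel_gt) - 1.0) / 2.0
            err = np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
            hits += int(err < thresh_deg)
            total += 1
    return hits / total
```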
Method Video
Acknowledgements
The interactive results are powered by Viser. This project is supported in part by DARPA No. HR001123C0021, IARPA DOI/IBC No. 140D0423C0035, NSF CNS-2235013, ONR MURI N00014-21-1-2801, the Bakar Fellows Program, and BAIR sponsors. The views and conclusions contained herein are those of the authors and do not represent the official policies or endorsements of these institutions. We also thank Chung Min Kim for her critical reviews of this paper and Junyi Zhang for his valuable insights on the method.
Citation
@article{mueller2024hsfm,
  title={Reconstructing People, Places, and Cameras},
  author={Lea M\"uller and Hongsuk Choi and Anthony Zhang and Brent Yi and Jitendra Malik and Angjoo Kanazawa},
  journal={arXiv preprint arXiv:2412.17806},
  year={2024},
}