Pre-train, Self-train, Distill: A simple recipe for Supersizing 3D Reconstruction
¹Facebook AI Research   ²Carnegie Mellon University
We present an approach to learn a single self-supervised reconstruction model across diverse object categories. Given an input image depicting any (segmented) object from over 150 categories, this unified reconstruction model can infer its 3D shape.
Our work learns a unified model for single-view 3D reconstruction of objects from hundreds of semantic categories. As a scalable alternative to direct 3D supervision, we rely on segmented image collections to learn 3D reconstruction for generic categories. Unlike prior works that use similar supervision but learn independent category-specific models from scratch, our approach of learning a unified model simplifies training while also allowing the model to benefit from structure shared across categories. Using image collections from standard recognition datasets, we show that our approach enables learning 3D inference for over 150 object categories. We evaluate on two datasets and show, both qualitatively and quantitatively, that our unified reconstruction approach improves over prior category-specific baselines. Our final 3D reconstruction model is also capable of zero-shot inference on images from unseen object categories, and we empirically show that increasing the number of training categories improves reconstruction quality.
Approach Overview

Approach Overview. We first pre-train a reconstruction model using multi-view renderings of synthetic data. We then self-train category-specific models on diverse image collections in the wild, using only foreground mask annotations. Finally, we distill the models learned in the earlier stages into a single unified reconstruction model.
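The three-stage recipe above can be sketched in toy form. This is a hypothetical illustration only: the function names (`pretrain`, `self_train`, `distill`) and the scalar "shape prior" stand in for the actual reconstruction networks, renderers, and losses, which are not shown on this page.

```python
# Toy sketch of the pre-train / self-train / distill recipe.
# All names and the "shape prior" representation are illustrative
# assumptions, not the authors' released code.

def pretrain(synthetic_multiview_data):
    """Stage 1: fit a reconstruction model on synthetic multi-view renderings."""
    # Toy "model": the mean shape code observed during pre-training.
    codes = [shape for _, shape in synthetic_multiview_data]
    mean = [sum(c[i] for c in codes) / len(codes) for i in range(len(codes[0]))]
    return {"shape_prior": mean}

def self_train(pretrained, category_masks):
    """Stage 2: adapt a category-specific model using only mask supervision."""
    # Toy update: blend the prior with a per-category mask statistic.
    prior = pretrained["shape_prior"]
    stat = sum(category_masks) / len(category_masks)
    return {"shape_prior": [0.5 * p + 0.5 * stat for p in prior]}

def distill(category_models):
    """Stage 3: merge per-category teachers into one unified student."""
    priors = [m["shape_prior"] for m in category_models]
    return {"shape_prior": [sum(p[i] for p in priors) / len(priors)
                            for i in range(len(priors[0]))]}
```

In the actual method each stage trains a neural network (and distillation matches the student's predictions to the per-category teachers on their respective categories); the sketch only shows how the three stages compose.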
Results