Abstract
We introduce ViSER, a method for recovering articulated 3D shapes and dense 3D trajectories from monocular videos.
Previous work on high-quality reconstruction of dynamic 3D shapes typically relies on multiple camera views,
strong category-specific priors, or 2D keypoint supervision.
We show that none of these are required if one can reliably estimate long-range 2D point correspondences, making
use of only 2D object masks and two-frame optical flow as inputs.
ViSER infers correspondences by matching 2D pixels to a canonical, deformable 3D mesh via video-specific
surface embeddings that capture the pixel appearance of each surface point.
These embeddings behave as a continuous set of keypoint descriptors defined over the mesh surface, which can be
used to establish dense long-range correspondences across pixels.
The surface embeddings are implemented via coordinate-based MLPs that are fit to each video via contrastive
reconstruction losses.
Experimental results show that ViSER compares favorably against prior work on challenging videos of humans with
loose clothing and unusual poses, as well as animal videos from DAVIS and YTVOS.
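To make the matching idea above concrete, the following is a minimal, illustrative sketch (not the authors' code): a coordinate-based MLP maps canonical mesh surface points to unit-norm embeddings, sampled pixel embeddings are softly assigned to surface points via a softmax over the surface, and a contrastive (InfoNCE-style) loss encourages each pixel to match its corresponding surface point. The names, dimensions, and exact loss form are assumptions for illustration only.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SurfaceEmbeddingMLP(nn.Module):
    """Coordinate-based MLP: canonical 3D surface point -> D-dim embedding."""
    def __init__(self, embed_dim=16, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, embed_dim),
        )

    def forward(self, xyz):                          # xyz: (V, 3) mesh vertices
        return F.normalize(self.net(xyz), dim=-1)    # unit-norm embeddings (V, D)

def pixel_to_surface_matching(pixel_feat, surf_embed, temperature=0.1):
    """Soft assignment of each pixel embedding to canonical surface points.

    pixel_feat: (N, D) embeddings of sampled foreground pixels
    surf_embed: (V, D) embeddings of canonical surface points
    returns:    (N, V) matching probabilities (softmax over the surface)
    """
    sim = pixel_feat @ surf_embed.t() / temperature
    return F.softmax(sim, dim=-1)

def contrastive_matching_loss(pixel_feat, surf_embed, target_vertex_idx, temperature=0.1):
    """InfoNCE-style loss: each pixel should match its (pseudo) target surface
    point more strongly than any other surface point."""
    sim = pixel_feat @ surf_embed.t() / temperature  # (N, V) logits
    return F.cross_entropy(sim, target_vertex_idx)

if __name__ == "__main__":
    V, N, D = 1000, 256, 16                          # toy sizes, for illustration
    mlp = SurfaceEmbeddingMLP(embed_dim=D)
    verts = torch.rand(V, 3) * 2 - 1                 # stand-in canonical mesh vertices
    pix = F.normalize(torch.randn(N, D), dim=-1)     # stand-in pixel embeddings
    target = torch.randint(0, V, (N,))               # stand-in correspondences
    loss = contrastive_matching_loss(pix, mlp(verts), target)
    loss.backward()
    print("loss:", loss.item())

In the actual method, the pixel embeddings and correspondence supervision come from the reconstruction pipeline itself (object masks and two-frame optical flow), rather than from the random stand-ins used here.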
Bibtex
@inproceedings{yang2021viser,
  title     = {ViSER: Video-Specific Surface Embeddings for Articulated 3D Shape Reconstruction},
  author    = {Yang, Gengshan and Sun, Deqing and Jampani, Varun and Vlasic, Daniel and Cole, Forrester and Liu, Ce and Ramanan, Deva},
  booktitle = {NeurIPS},
  year      = {2021}
}
Acknowledgments
This work was supported by Google Cloud Platform (GCP) awards received from Google and the CMU Argo AI Center for
Autonomous Vehicle Research. We thank William T. Freeman and many others from CMU and Google for providing
valuable feedback.