Can Generative Video Models Help Pose Estimation?
Yes! We find that off-the-shelf generative video models can hallucinate plausible intermediate frames that provide useful context for pose estimators (e.g. DUSt3R), especially for images with little to no overlap.
Example Results
Method Overview
Given two images, our goal is to recover their relative camera pose. We introduce InterPose, which leverages off-the-shelf video models to generate intermediate frames between the two images. Using these generated frames alongside the original image pair as input to a camera pose estimator provides additional context that can improve pose estimation compared to using only the two input images. A key challenge is that the generated videos may contain visual artifacts or implausible motion. Thus, we generate multiple videos, score each with a self-consistency metric, and select the best video sample.
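To make the pipeline concrete, here is a minimal Python sketch. The names video_model, pose_estimator, interpolate, estimate_relative_pose, and score_self_consistency are hypothetical placeholders (a video model, a pose estimator such as DUSt3R, and our scoring step); this is a sketch of the idea, not the actual InterPose API.

def interpose(img_a, img_b, video_model, pose_estimator, num_samples=4):
    """Estimate the relative pose of (img_a, img_b) with generated context."""
    candidates = []
    for _ in range(num_samples):
        # Hallucinate intermediate frames between the two input images.
        frames = video_model.interpolate(img_a, img_b)
        # Score how 3D-consistent the generated video is (see below).
        candidates.append((score_self_consistency(frames, pose_estimator), frames))

    # Keep the most self-consistent video sample.
    _, best_frames = max(candidates, key=lambda c: c[0])

    # Estimate pose for the original pair, using the generated frames as context.
    return pose_estimator.estimate_relative_pose([img_a, *best_frames, img_b])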
Video Generation for Pose Estimation: Benefits and Challenges
We propose to use a video model to hallucinate intermediate frames between two input images, effectively creating a dense visual transition, which significantly simplifies the problem of pose estimation. However, a key challenge persists: generated videos may contain visual inconsistencies, like morphing or shot cuts, which can degrade pose estimation performance. We show some examples of common failure modes of video models.
Selecting Visually Consistent Videos with a Self-consistency Score
One approach is to sample multiple such video interpolations, in the hope that at least one depicts a plausible, 3D-consistent interpretation of the scene. But how do we tell which video sample is a good one? Consider a low-quality video with rapid shot cuts or inconsistent geometry, like Video 0. Selecting different subsets of frames from that video would likely produce dramatically different pose estimates. We operationalize this intuition by measuring a video's self-consistency.
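Below is a minimal sketch of one way to implement such a score, fleshing out the score_self_consistency placeholder from the earlier sketch; the paper's exact formulation may differ. We run the pose estimator on several random subsets of frames (always keeping the two endpoints) and score the video by how much the resulting pose estimates for the input pair agree. Here, estimate_relative_pose is a hypothetical wrapper returning a rotation matrix and translation for the endpoint pair.

import itertools
import numpy as np

def rotation_geodesic_deg(R1, R2):
    # Angular distance between two rotation matrices, in degrees.
    cos = (np.trace(R1.T @ R2) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def score_self_consistency(frames, pose_estimator, num_subsets=5, subset_size=4):
    # Higher score = pose estimates agree across different frame subsets.
    first, last, inner = frames[0], frames[-1], frames[1:-1]
    rotations = []
    for _ in range(num_subsets):
        # Randomly pick intermediate frames; keep the endpoints fixed so
        # every subset yields a pose estimate for the same image pair.
        idx = np.random.choice(len(inner), min(subset_size, len(inner)), replace=False)
        subset = [first] + [inner[i] for i in sorted(idx)] + [last]
        R, _t = pose_estimator.estimate_relative_pose(subset)
        rotations.append(R)

    # A consistent video should yield nearly identical poses for every subset;
    # penalize pairwise disagreement between the rotation estimates.
    disagreement = np.mean([rotation_geodesic_deg(Ra, Rb)
                            for Ra, Rb in itertools.combinations(rotations, 2)])
    return -float(disagreement)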
Experimental Results
We demonstrate that our approach generalizes across three state-of-the-art video models and show consistent improvements over the state-of-the-art DUSt3R on four diverse datasets encompassing indoor, outdoor, and object-centric scenes.
Quantitative Results
Our approach, which integrates generative video models with DUSt3R, consistently outperforms using only the original image pair as input (DUSt3R - Pair) across all datasets, yielding lower rotation and translation errors.
This also applies to MASt3R. While MASt3R excels with overlapping pairs via feature matching, it struggles with non-overlapping ones due to unreliable correspondences. Our approach maintains robustness, outperforming MASt3R on outward-facing datasets and matching its performance on center-facing datasets. For detailed MASt3R results, please refer to the supplementary material.
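For reference, the rotation and translation errors reported above follow standard relative-pose conventions, which we assume here: rotation error is the geodesic angle between predicted and ground-truth rotations, and translation error is typically the angle between translation directions, since scale is unobservable from two views. A sketch:

import numpy as np

def rotation_error_deg(R_pred, R_gt):
    # Geodesic distance on SO(3) between predicted and ground-truth rotations.
    cos = (np.trace(R_pred.T @ R_gt) - 1.0) / 2.0
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def translation_angle_deg(t_pred, t_gt):
    # Angle between translation directions; two-view scale is ambiguous.
    cos = t_pred @ t_gt / (np.linalg.norm(t_pred) * np.linalg.norm(t_gt))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))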
Qualitative Results
Check out the webpages for interactive DUSt3R point clouds and more examples on all four datasets.
Acknowledgements
We would like to thank Keunhong Park, Matthew Levine, and Aleksander Hołyński for their feedback and suggestions.
BibTeX
@inproceedings{InterPose,
  title     = {Can Generative Video Models Help Pose Estimation?},
  author    = {Cai, Ruojin and Zhang, Jason Y. and Henzler, Philipp and Li, Zhengqi and Snavely, Noah and Martin-Brualla, Ricardo},
  booktitle = {CVPR},
  year      = {2025},
}