The 8-Point Algorithm as an Inductive Bias for Relative Pose Prediction by ViTs

Figure 1. We propose three small modifications to a ViT via the Essential Matrix Module, enabling computations similar to the Eight-Point algorithm. The resulting mix of visual and positional features is a good inductive bias for pose estimation.
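For reference, here is a NumPy sketch of the classical linear Eight-Point Algorithm, whose computations the Essential Matrix Module is meant to resemble. This is the textbook algorithm, not the paper's network. It assumes calibrated (normalized) image coordinates and noise-free correspondences, and it omits Hartley normalization and outlier rejection. The helper names `skew` and `eight_point_essential` are our own.

```python
import numpy as np


def skew(v: np.ndarray) -> np.ndarray:
    """Cross-product matrix [v]_x, so that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])


def eight_point_essential(x1: np.ndarray, x2: np.ndarray) -> np.ndarray:
    """Linear Eight-Point estimate of E with x2_h^T E x1_h = 0.

    x1, x2: (N, 2) arrays of corresponding normalized (calibrated) image
    coordinates, N >= 8. No Hartley normalization or outlier handling.
    """
    n = x1.shape[0]
    if x2.shape != x1.shape or n < 8:
        raise ValueError("need matching (N, 2) arrays with N >= 8")
    h1 = np.hstack([x1, np.ones((n, 1))])  # homogeneous points, image 1
    h2 = np.hstack([x2, np.ones((n, 1))])  # homogeneous points, image 2
    # One linear equation per correspondence: sum_ij h2[i] * h1[j] * E[i, j] = 0,
    # so row n of A is the flattened outer product of h2[n] and h1[n].
    A = np.einsum("ni,nj->nij", h2, h1).reshape(n, 9)
    # Least-squares null vector: right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    E = vt[-1].reshape(3, 3)
    # Project onto essential matrices: two equal singular values, one zero.
    u, _, vt = np.linalg.svd(E)
    return u @ np.diag([1.0, 1.0, 0.0]) @ vt


# Synthetic check: camera 2 sees X2 = R @ X1 + t, so the true E is [t]_x R.
rng = np.random.default_rng(0)
theta = 0.1  # small rotation about the y-axis (radians)
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([1.0, 0.0, 0.0])  # unit-length baseline along x
X1 = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(20, 3))
X2 = X1 @ R.T + t
x1 = X1[:, :2] / X1[:, 2:]  # pinhole projection with identity intrinsics
x2 = X2[:, :2] / X2[:, 2:]

E_est = eight_point_essential(x1, x2)
E_true = skew(t) @ R
# Epipolar residuals x2_h^T E x1_h for every correspondence (should be ~0).
residuals = np.einsum("ni,ij,nj->n",
                      np.hstack([x2, np.ones((20, 1))]), E_est,
                      np.hstack([x1, np.ones((20, 1))]))
```

Because the epipolar constraint is homogeneous in E, E_est matches [t]_x R only up to sign and overall scale (the unit baseline above fixes the scale artificially). Translation recovered this way is therefore a direction only, which is why estimating relative pose "including scale" requires information beyond the Eight-Point constraint.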
Abstract
We present a simple baseline for directly estimating the relative pose (rotation and translation, including scale) between two images. Deep methods have recently shown strong progress but often require complex or multi-stage architectures. We show that a handful of modifications can be applied to a Vision Transformer (ViT) to bring its computations close to the Eight-Point Algorithm. This inductive bias enables a simple method to be competitive in multiple settings, often substantially improving over the state of the art with strong performance gains in limited data regimes.

Approach
Figure 2. Essential Matrix Module. We make three small changes to standard ViT Cross-Attention: (1) appending positional encodings to Values, (2) applying a dual softmax on Affinities, and (3) applying bilinear attention.

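To make the Essential Matrix Module's three changes to cross-attention concrete (positional encodings appended to values, dual softmax on affinities, bilinear attention), here is a minimal NumPy sketch of a single attention head. It is our own illustration under stated assumptions, not the authors' implementation. Learned query/key/value projections are dropped, positional encodings are taken to be raw homogeneous token coordinates, and "bilinear attention" is read as the outer product of the attended image-2 position with the query token's own image-1 position. The function and parameter names (`essential_matrix_module_sketch`, `temperature`, `homogeneous`) are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dual_softmax(affinity: np.ndarray) -> np.ndarray:
    """(2) Dual softmax: row-wise softmax times column-wise softmax.

    Entry (i, j) stays large only if j is i's best match among image-2 tokens
    and i is j's best match among image-1 tokens (a soft mutual nearest neighbour).
    """
    return softmax(affinity, axis=1) * softmax(affinity, axis=0)


def homogeneous(p: np.ndarray) -> np.ndarray:
    """Append a column of ones: (K, 2) coordinates -> (K, 3)."""
    return np.hstack([p, np.ones((p.shape[0], 1))])


def essential_matrix_module_sketch(f1, f2, p1, p2, temperature=0.05):
    """Hypothetical single-head cross-attention with the three changes.

    f1: (N, D), f2: (M, D) token features for images 1 and 2.
    p1: (N, 2), p2: (M, 2) token positions in normalized image coordinates.
    Learned Q/K/V projections are omitted (identity) to keep the sketch short.
    Returns the attention matrix and an (N, 9) array of bilinear outputs.
    """
    affinity = (f1 @ f2.T) / temperature       # (N, M) query-key affinities
    attn = dual_softmax(affinity)              # (2) dual softmax on affinities
    # (1) Values = visual features with positional encodings appended
    #     (here simply the homogeneous coordinates of each image-2 token).
    values = np.hstack([f2, homogeneous(p2)])  # (M, D + 3)
    attended = attn @ values                   # (N, D + 3)
    matched_pos = attended[:, -3:]             # soft position of i's match in image 2
    # (3) Bilinear output (assumed form): outer product of the attended
    #     image-2 position with the query token's own image-1 position.
    bilinear = np.einsum("ni,nj->nij", matched_pos, homogeneous(p1))
    return attn, bilinear.reshape(f1.shape[0], 9)


# Idealized demo: one-hot "descriptors" make every match unambiguous.
rng = np.random.default_rng(0)
n = 8
p1 = rng.uniform(-0.5, 0.5, size=(n, 2))
q = rng.uniform(-0.5, 0.5, size=(n, 2))  # q[i] is the true match of p1[i]
perm = rng.permutation(n)                # image-2 tokens arrive shuffled
f1 = np.eye(n)
f2 = f1[perm]
p2 = q[perm]

attn, rows = essential_matrix_module_sketch(f1, f2, p1, p2)
# Flattened outer products of the true coordinate pairs, for comparison.
expected = np.einsum("ni,nj->nij", homogeneous(q), homogeneous(p1)).reshape(n, 9)
```

With unambiguous descriptors, the dual softmax is (nearly) a permutation matrix, and each bilinear output row reduces to the flattened outer product of a matched coordinate pair. That is one row of the linear system the Eight-Point Algorithm solves before its SVD step. With real, ambiguous features the rows become soft, attention-weighted mixtures instead.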
Paper and Supplemental Material
Rockwell, Johnson and Fouhey.
The 8-Point Algorithm as an Inductive Bias for Relative Pose Prediction by ViTs.
In 3DV 2022. (Hosted on arXiv)
Acknowledgements
Thanks to Linyi Jin, Ruojin Cai and Zach Teed for help replicating and building upon their works. Thanks to Mohamed El Banani, Karan Desai and Nilesh Kulkarni for their many helpful suggestions. Thanks to Laura Fink and UM DCO for their tireless support with computing! The webpage template originally came from some colorful folks.
