Probing the 3D Awareness of Visual Foundation Models
This repository contains a re-implementation of the code for the paper Probing the 3D Awareness of
Visual Foundation Models (CVPR 2024), which presents an analysis of the 3D awareness of visual
foundation models.
If you find this code useful, please consider citing:
@inProceedings{elbanani2024probing,
title={{Probing the 3D Awareness of Visual Foundation Models}},
author={
El Banani, Mohamed and Raj, Amit and Maninis, Kevis-Kokitsi and
Kar, Abhishek and Li, Yuanzhen and Rubinstein, Michael and Sun, Deqing and
Guibas, Leonidas and Johnson, Justin and Jampani, Varun
},
booktitle={CVPR},
year={2024},
}
Environment Setup
We recommend using Anaconda or Miniconda. To set up the environment, follow the instructions below.
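As a minimal sketch of what such a setup typically looks like (the environment name probe3d, the Python version, and the package list below are assumptions rather than the repository's exact instructions):

# Hypothetical conda setup; env name, Python version, and packages are assumptions.
conda create -n probe3d python=3.9 -y
conda activate probe3d
# Core dependencies used by the training/evaluation scripts (PyTorch, hydra configs).
pip install torch torchvision hydra-core omegaconf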
Finally, please follow the dataset download and preprocessing instructions here.
Evaluation Experiments
We provide code to train the depth and surface normal probes and to evaluate correspondence. All
experiments use hydra configs, which can be found here. Below are example commands for running the
evaluations with the DINO ViT-B/16 backbone.
# Training single-view probes
python train_depth.py backbone=dino_b16 +backbone.return_multilayer=True
python train_snorm.py backbone=dino_b16 +backbone.return_multilayer=True

# Evaluating multiview correspondence
python evaluate_navi_correspondence.py +backbone=dino_b16
python evaluate_scannet_correspondence.py +backbone=dino_b16
python evaluate_spair_correspondence.py +backbone=dino_b16
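Because the experiments are driven by hydra configs, backbones and options can be swapped from the command line. As an illustrative sketch, hydra's --multirun flag can sweep a probe over several backbones in one command (any backbone config name other than dino_b16 below is an assumption about how the configs are named):

# Hypothetical sweep over backbone configs using hydra's multirun;
# config names other than dino_b16 are assumptions.
python train_depth.py --multirun backbone=dino_b16,dino_s16 +backbone.return_multilayer=True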
Performance Correlation
Coming soon.
Acknowledgments
We thank Prafull Sharma, Shivam Duggal, Karan Desai, Junhwa Hur, and Charles Herrmann for many helpful discussions.
We also thank Alyosha Efros, David Fouhey, Stella Yu, and Andrew Owens for their feedback.
We would also like to acknowledge the following repositories and users for releasing very valuable
code and datasets:
GeoNet for releasing the extracted surface normals for full NYU.