Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning
TL;DR
- A new single-view scene reconstruction method that infers faithful scene and object geometry from partial visual observations.
- A VL modulation module that enriches per-point features with fine-grained semantics from visual and text features.
- A VL spatial attention mechanism that aggregates point representations across the scene, yielding accurate predictions that are aware of the neighboring 3D semantic context.
Overview
Given an input image \(\mathbf{I}_{0}\), we use two image encoders to obtain the features \(F_{\text{app}}\) and \(F_{\text{vis}}\), and fuse these into a feature map \(F_{\text{fused}}\). We further extract category-level text features and a segmentation map \(S\). For a given 3D point set \(\mathbf{X}\), we query the extracted features by projecting the points onto the image plane, yielding point-wise visual and text features. Next, the VL modulation layers endow each point representation with fine-grained semantic information. Finally, the VL spatial attention aggregates these point representations across the 3D scene, yielding density predictions that are aware of the 3D semantic context.
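To make this pipeline concrete, here is a minimal PyTorch sketch of the per-point computation: \(\mathbf{X}\) is projected onto the image plane to sample \(F_{\text{fused}}\), a FiLM-style layer stands in for the VL modulation, and plain multi-head self-attention stands in for the VL spatial attention. All module names, feature dimensions, and toy tensors are illustrative assumptions, not the paper's exact implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

def query_point_features(feat_map, points, K):
    """Project 3D points into the image and bilinearly sample a feature map.
    feat_map: (1, C, H, W) fused visual features F_fused.
    points:   (N, 3) 3D points X in camera coordinates (z > 0).
    K:        (3, 3) camera intrinsics.
    Returns:  (N, C) point-wise features (zeros for points projecting off-image).
    """
    uvw = points @ K.T                                  # pinhole projection
    uv = uvw[:, :2] / uvw[:, 2:].clamp(min=1e-6)        # pixel coordinates
    H, W = feat_map.shape[-2:]
    # Normalize pixel coordinates to [-1, 1] as expected by grid_sample.
    grid = torch.stack([uv[:, 0] / (W - 1), uv[:, 1] / (H - 1)], dim=-1) * 2 - 1
    sampled = F.grid_sample(feat_map, grid.view(1, -1, 1, 2), align_corners=True)
    return sampled.view(feat_map.shape[1], -1).T        # (N, C)

class VLModulation(nn.Module):
    """FiLM-style conditioning: semantic features predict a per-channel scale
    and shift that modulate each point feature (an assumed realization of the
    VL modulation)."""
    def __init__(self, dim_point, dim_sem):
        super().__init__()
        self.to_scale_shift = nn.Linear(dim_sem, 2 * dim_point)

    def forward(self, point_feat, sem_feat):
        scale, shift = self.to_scale_shift(sem_feat).chunk(2, dim=-1)
        return point_feat * (1 + scale) + shift

class VLSpatialAttention(nn.Module):
    """Self-attention over the point set so each density prediction can
    attend to its 3D semantic neighborhood."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.to_density = nn.Linear(dim, 1)

    def forward(self, point_feat):                      # (N, dim)
        x = point_feat.unsqueeze(0)                     # (1, N, dim)
        x, _ = self.attn(x, x, x)
        return self.to_density(x).squeeze(0)            # (N, 1) density logits

# Toy usage with random tensors standing in for the encoder outputs.
N, C, D = 1024, 64, 32
feat_map = torch.randn(1, C, 48, 160)                   # F_fused (hypothetical size)
K = torch.tensor([[100.0, 0.0, 80.0], [0.0, 100.0, 24.0], [0.0, 0.0, 1.0]])
points = torch.rand(N, 3) * torch.tensor([4.0, 2.0, 10.0]) - torch.tensor([2.0, 1.0, -0.5])
text_feat = torch.randn(N, D)                           # per-point text features via S

pt = query_point_features(feat_map, points, K)          # point-wise visual features
pt = VLModulation(C, D)(pt, text_feat)                  # inject fine-grained semantics
sigma = VLSpatialAttention(C)(pt)                       # context-aware density

Note that full self-attention over all \(N\) points scales quadratically; the actual method may restrict attention to local 3D neighborhoods, which this sketch does not attempt to reproduce.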
Visual Comparisons
[Interactive comparison sliders: Ours vs. BTS [Wimbauer, 2023] · Ours vs. PixelNeRF [Yu, 2021] · Ours vs. MonoDepth2 [Godard, 2019]]
Scene Reconstruction
Whereas previous methods often produce corrupted and trailing shapes, our method recovers faithful scene geometry, especially in occluded areas.
Object Reconstruction
Our method produces more faithful object geometries across various semantic categories.
BibTeX
@inproceedings{li2024know,
  title={Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning},
  author={Li, Rui and Fischer, Tobias and Segu, Mattia and Pollefeys, Marc and Van Gool, Luc and Tombari, Federico},
  booktitle={CVPR},
  year={2024}
}