Attention Maps

We condition the VOT on the noiseless action taken by the agent. Inspecting the attention maps, we find that different actions prime the VOT to attend to meaningful regions in the image. For instance, turning left leads the model to focus on regions present at both time steps (see below). This makes intuitive sense, as a 30° turn strongly displaces visual features or even pushes them out of the agent's field of view. A similar behavior emerges for moving forward, which leads the model to attend to the center regions, e.g., the walls and the end of a hallway (see below).
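
To illustrate how such attention maps can be inspected, here is a minimal sketch of blending a patch-level attention map over an observation. It is not the visualization code behind this page: the function names, the nearest-neighbour upsampling, and the grayscale nested-list image format are all assumptions chosen to keep the example dependency-free.

```python
def upsample_nearest(attn, height, width):
    """Nearest-neighbour upsample a patch-level map to height x width."""
    rows, cols = len(attn), len(attn[0])
    return [
        [attn[y * rows // height][x * cols // width] for x in range(width)]
        for y in range(height)
    ]


def normalize(attn):
    """Rescale values to [0, 1]; a constant map becomes all zeros."""
    flat = [v for row in attn for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        return [[0.0 for _ in row] for row in attn]
    return [[(v - lo) / span for v in row] for row in attn]


def overlay(image, attn, alpha=0.5):
    """Blend a grayscale image (values in [0, 1]) with an attention map.

    alpha=0 shows only the image, alpha=1 shows only the attention map.
    """
    height, width = len(image), len(image[0])
    heat = upsample_nearest(normalize(attn), height, width)
    return [
        [(1 - alpha) * image[y][x] + alpha * heat[y][x] for x in range(width)]
        for y in range(height)
    ]


# Hypothetical 2x2 attention map over a uniform 4x4 observation.
attention = [[0.1, 0.4], [0.2, 0.3]]
observation = [[0.5] * 4 for _ in range(4)]
blended = overlay(observation, attention, alpha=0.5)
```

Here `alpha` plays the role of an opacity control: at 0 only the observation is visible, and at 1 only the normalized attention map.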

Habitat Challenge

We submit our VOT (RGB-D) to the Habitat Challenge 2021 benchmark (test-std split). Using the same navigation policy as the Rank 2 entry, we achieve the highest SSPL while training on only 5% of the data. (Leaderboard)

Rank  Participant team             S   SPL  SSPL
1     MultiModalVO (VOT) (ours)    93  74   77
2     VO for Realistic PointGoal   94  74   76
3     inspir.ai robotics           91  70   71
4     VO2021                       78  59   69
5     Differentiable SLAM-net      65  47   60
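
For readers unfamiliar with the column headers: S is success, SPL is Success weighted by Path Length, and SSPL is Soft-SPL, which replaces the binary success term with the agent's progress towards the goal. The sketch below computes per-episode values under these commonly used definitions. It is not the Habitat evaluation code; the function and argument names are illustrative, and it assumes a strictly positive shortest-path distance. The table values appear to be episode averages expressed as percentages.

```python
def spl(success, shortest_dist, path_len):
    """Success weighted by Path Length for a single episode."""
    if not success:
        return 0.0
    return shortest_dist / max(path_len, shortest_dist)


def soft_spl(shortest_dist, final_dist, path_len):
    """Soft-SPL: binary success is replaced by progress towards the goal.

    shortest_dist: geodesic distance from start to goal (assumed > 0)
    final_dist:    geodesic distance to the goal when the episode ends
    path_len:      length of the path the agent actually travelled
    """
    progress = max(0.0, 1.0 - final_dist / shortest_dist)
    return progress * shortest_dist / max(path_len, shortest_dist)


# Hypothetical episode: goal 10 m away, agent walks 12.5 m, stops 2 m short.
episode_spl = spl(False, 10.0, 12.5)
episode_sspl = soft_spl(10.0, 2.0, 12.5)
```

In this hypothetical episode the agent stops outside the success radius, so SPL is zero, while SSPL still credits the 80% progress made along a path that is 25% longer than the shortest one.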

BibTeX

@inproceedings{memmel2023modality,
    title={Modality-invariant Visual Odometry for Embodied Vision},
    author={Memmel, Marius and Bachmann, Roman and Zamir, Amir},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
    pages={21549--21559},
    year={2023}
}