M3
3D-Spatial Multimodal Memory
ICLR 2025
Xueyan Zou, Yuchen Song, Ri-Zhao Qiu, Xuanbin Peng,
Jianglong Ye, Sifei Liu, Xiaolong Wang
UC San Diego, NVIDIA
M3 is a framework for rendering static 3D scenes with RGB and foundation model embeddings, enabling rich spatial and semantic understanding.
Precise Feature Reconstruction: M3 uses Gaussian Memory Attention to reconstruct spatial memory directly in the foundation model's embedding space, avoiding distillation and preserving the source model's embedding space.
Efficient Feature Representation: M3 reduces embedding dimensions from 64 to 16–32 per Gaussian primitive, achieving equal or better performance with at least 50% fewer dimensions for improved efficiency.
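The two points above can be illustrated together: a low-dimensional feature map rendered from the Gaussian primitives attends over a bank of principal scene components (embeddings kept in the foundation model's original space) to reconstruct full-dimensional features. This is a minimal NumPy sketch, not the released implementation; the function name, the fixed random projection standing in for learned weights, and the temperature value are all illustrative assumptions.

```python
import numpy as np

def gaussian_memory_attention(rendered, psc, temperature=0.1):
    """Expand low-dimensional rendered features back into the
    foundation model's embedding space via attention over
    principal scene components.

    rendered: (H, W, d) low-dim features rendered from Gaussian primitives
    psc:      (N, D) principal scene components (foundation-model embeddings)
    Returns:  (H, W, D) features in the original embedding space.
    """
    H, W, d = rendered.shape
    N, D = psc.shape
    # Hypothetical projection to query space; a fixed random matrix
    # stands in for trained weights in this sketch.
    rng = np.random.default_rng(0)
    W_q = rng.standard_normal((d, D)) / np.sqrt(d)
    queries = rendered.reshape(-1, d) @ W_q        # (H*W, D)
    logits = queries @ psc.T / temperature         # (H*W, N)
    logits -= logits.max(axis=1, keepdims=True)    # numerically stable softmax
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)
    out = attn @ psc                               # convex combination of PSCs
    return out.reshape(H, W, D)
```

Because each output pixel is a convex combination of the stored components, the reconstruction stays inside the source model's embedding space, with only a 16–32-dim feature carried per primitive.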
The best part? Fewer parameters, original embedding space!
Technical Summary Video
Interactive Demo
We propose a new visualization tool that supports streaming 3D scene reconstruction for RGB and foundation model embeddings, with a GPU as the backend.
Real Robot Deployment
We deploy M3 on tabletop manipulation tasks and show demo videos; note that M3 is currently used only for localization and mapping.
Memory to Rendering
In the visualization below, we show the raw feature manifold (blue points) and the memory extracted by M3 (red points). With the proposed M3 method, we apply Gaussian memory attention over the principal scene components to produce the rendered high-resolution feature map (third row).
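One way to picture how a compact memory (red points) can be drawn from the raw feature manifold (blue points) is greedy farthest-point selection: repeatedly pick the embedding farthest from everything already kept. This is a hedged illustration of the idea, not the paper's actual extraction procedure; the function name and selection strategy are assumptions for this sketch.

```python
import numpy as np

def extract_scene_components(features, k):
    """Select k representative embeddings (the "memory") from a raw
    feature manifold via greedy farthest-point sampling.

    features: (n, D) raw per-point embeddings
    Returns:  (k, D) selected representative embeddings.
    """
    chosen = [0]  # seed with an arbitrary first point
    dists = np.linalg.norm(features - features[0], axis=1)
    for _ in range(k - 1):
        idx = int(dists.argmax())          # farthest from current memory
        chosen.append(idx)
        # Track each point's distance to its nearest selected component.
        dists = np.minimum(dists, np.linalg.norm(features - features[idx], axis=1))
    return features[chosen]
```

The selected rows then serve as the component bank that the rendered low-dimensional features attend over.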