DynaMem
Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation
Peiqi Liu, Zhanqiu Guo, Mohit Warke, Soumith Chintala, Chris Paxton, Nur Muhammad "Mahi" Shafiullah†, Lerrel Pinto†
†: Equal advising
Significant progress has been made in open-vocabulary mobile manipulation, where the goal is for a robot to perform tasks in any environment given a natural language description. However, most current systems assume a static environment, which limits their applicability in real-world scenarios where environments frequently change due to human intervention or the robot's own actions. In this work, we present DynaMem, a new approach to open-world mobile manipulation that uses a dynamic spatio-semantic memory to represent a robot's environment. DynaMem constructs a 3D data structure to maintain a dynamic memory of point clouds, and answers open-vocabulary object localization queries using multimodal LLMs or open-vocabulary features generated by state-of-the-art vision-language models. Powered by DynaMem, our robots can explore novel environments, search for objects not found in memory, and continuously update the memory as objects move, appear, or disappear in the scene. We run extensive experiments on the Stretch SE3 robots in three real and nine offline scenes, and achieve an average pick-and-drop success rate of 70% on non-stationary objects, which is more than a 2x improvement over state-of-the-art static systems.
Videos
DynaMem in action
Here are sample trials from three lab environments and two home environments.
Method
Illustration of DynaMem
We maintain a feature point cloud as the robot's memory. When the robot receives a new RGB-D observation of the environment, it adds newly observed points to the memory and removes points that no longer exist.
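A minimal Python sketch of this update rule is shown below, assuming a voxelized feature point cloud; the class and parameter names (`DynamicPointCloudMemory`, `voxel_size`, `margin`, and so on) are illustrative stand-ins, not taken from the released code. Points are added per voxel with the most recent feature, and a voxel is removed when it projects inside the current camera frustum but lies clearly in front of the observed depth, meaning the ray now passes through that voxel unobstructed.

```python
import numpy as np

class DynamicPointCloudMemory:
    """Illustrative sketch of a voxelized dynamic feature point cloud."""

    def __init__(self, voxel_size=0.05):
        self.voxel_size = voxel_size
        # voxel index (3-tuple) -> {"feature": ..., "last_image_id": ...}
        self.voxels = {}

    def _voxelize(self, points):
        return np.floor(points / self.voxel_size).astype(int)

    def add_observation(self, points_world, features, image_id):
        """Insert newly observed points, keeping the latest feature per voxel."""
        for idx, feat in zip(map(tuple, self._voxelize(points_world)), features):
            self.voxels[idx] = {"feature": feat, "last_image_id": image_id}

    def remove_stale(self, depth, intrinsics, world_to_cam, margin=0.1):
        """Drop voxels that project into the current frustum but lie clearly
        in front of the observed depth, i.e. space now seen to be empty."""
        fx, fy, cx, cy = intrinsics
        h, w = depth.shape
        for idx in list(self.voxels):
            center = (np.array(idx) + 0.5) * self.voxel_size
            p_cam = world_to_cam[:3, :3] @ center + world_to_cam[:3, 3]
            z = p_cam[2]
            if z <= 0:
                continue  # behind the camera
            u = int(fx * p_cam[0] / z + cx)
            v = int(fy * p_cam[1] / z + cy)
            if 0 <= u < w and 0 <= v < h and z < depth[v, u] - margin:
                del self.voxels[idx]  # the ray passed through this voxel
```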
To ground the object of interest described by a text query, the robot locates the memory point most similar to the query, along with the last image in which that point was observed. If the text is grounded in that image, or the point's feature has high similarity with the text, the point is taken as the location of the object of interest.
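This query step can be sketched as a cosine-similarity search over the stored features, followed by verification against the image where the best point was last seen. The snippet continues the memory sketch above; `sim_threshold` and `mllm_grounds` (a multimodal-LLM grounding check) are hypothetical stand-ins for the paper's two verification options, not its exact interface or values.

```python
import numpy as np

def localize(memory, text_query, text_feature, images,
             sim_threshold=0.28, mllm_grounds=None):
    """Find the memory voxel best matching the text, then verify it against
    the last image in which that voxel was observed. `memory` is the
    DynamicPointCloudMemory sketched above."""
    best_idx, best_sim = None, -1.0
    for idx, entry in memory.voxels.items():
        f = np.asarray(entry["feature"], dtype=float)
        sim = float(f @ text_feature /
                    (np.linalg.norm(f) * np.linalg.norm(text_feature) + 1e-8))
        if sim > best_sim:
            best_idx, best_sim = idx, sim
    if best_idx is None:
        return None  # memory is empty
    last_image = images[memory.voxels[best_idx]["last_image_id"]]
    grounded = best_sim > sim_threshold or (
        mllm_grounds is not None and mllm_grounds(last_image, text_query))
    if grounded:
        # Voxel center in world coordinates: the object's estimated location.
        return (np.array(best_idx) + 0.5) * memory.voxel_size
    return None  # not grounded; fall back to exploration
```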
If the text is grounded in the environment, the robot navigates to the target object; otherwise, the memory is projected into a 2D value map and the robot explores the environment guided by that map.
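A rough sketch of the value-map projection, under the same assumptions as above: each 2D cell takes the maximum text similarity of the voxels above it, and cells that were never observed receive an exploration bonus. The grid parameters, `origin` (world xy of the map corner), and the weight `w_explore` are illustrative, not values from the paper.

```python
import numpy as np

def build_value_map(memory, text_feature, origin, grid_res=0.1,
                    grid_shape=(200, 200), explored=None, w_explore=0.5):
    """Project the 3D memory into a 2D value map and pick a navigation goal."""
    semantic = np.zeros(grid_shape)
    occupied = np.zeros(grid_shape, dtype=bool)
    for idx, entry in memory.voxels.items():
        center = (np.array(idx) + 0.5) * memory.voxel_size
        gx = int((center[0] - origin[0]) / grid_res)
        gy = int((center[1] - origin[1]) / grid_res)
        if not (0 <= gx < grid_shape[0] and 0 <= gy < grid_shape[1]):
            continue  # outside the map extent
        f = np.asarray(entry["feature"], dtype=float)
        sim = float(f @ text_feature /
                    (np.linalg.norm(f) * np.linalg.norm(text_feature) + 1e-8))
        semantic[gx, gy] = max(semantic[gx, gy], sim)
        occupied[gx, gy] = True
    if explored is None:
        explored = occupied  # crude stand-in for the truly observed region
    value = semantic + w_explore * (~explored)  # similarity + exploration bonus
    goal = np.unravel_index(np.argmax(value), value.shape)
    return value, goal  # navigate toward the highest-value cell
```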
Evaluation
Performance of DynaMem
We evaluate DynaMem in 3 different environments, with 10 queries in each. As baselines, we select OK-Robot (with a prescanned static robot memory) and Gemini (used following the pipeline proposed in OpenEQA).
We find that both the VLM-feature and multimodal LLM (mLLM) variants of DynaMem achieve a total success rate of 70%. This is a significant improvement over the OK-Robot system, which has a total success rate of 30%. Notably, DynaMem is particularly adept at handling dynamic objects in the environment: only 6.7% of trials failed because our system could not navigate to such dynamic objects in the scene.
Paper
DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation
@article{liu2024dynamem,
title={DynaMem: Online Dynamic Spatio-Semantic Memory for Open World Mobile Manipulation},
author={Liu, Peiqi and Guo, Zhanqiu and Warke, Mohit and Chintala, Soumith and Shafiullah, Nur Muhammad Mahi and Pinto, Lerrel},
journal={arXiv preprint arXiv:2411.04999},
year={2024}
}