Learning Generalizable Feature Fields for Mobile Manipulation
Ri-Zhao Qiu*1,
Yafei Hu*1, 2,
Yuchen Song*1,
Ge Yang3, 4,
Yang Fu1,
Jianglong Ye1,
Jiteng Mu1,
Ruihan Yang1,
Nikolay Atanasov1,
Sebastian Scherer2,
Xiaolong Wang1
1UC San Diego,
2CMU,
3MIT,
4IAIFI
*Indicates equal contribution
TLDR: Real-time Generalizable Feature Fields enable Mobile Manipulation
Abstract
An open problem in mobile manipulation is how to
represent objects and scenes in a unified manner so that robots
can use it both for navigating in the environment and manipulating
objects. The latter requires capturing intricate geometry
while understanding fine-grained semantics, whereas the former
involves capturing the complexity inherent to an expansive
physical scale. In this work, we present GeFF (Generalizable
Feature Fields), a scene-level generalizable neural feature field
that acts as a unified representation for both navigation and
manipulation that performs in real-time. To do so, we treat
generative novel view synthesis as a pre-training task, and then
align the resulting rich scene priors with natural language via
CLIP feature distillation. We demonstrate the effectiveness of this
approach by deploying GeFF on a quadruped robot equipped
with a manipulator. We evaluate GeFF’s ability to generalize to
open-set objects as well as its running time when performing
open-vocabulary mobile manipulation in dynamic scenes.
Pilot Study: Generalizable NeRFs as a Pre-training Proxy
Even without explicit semantic supervision, generalizable NeRFs implicitly acquire geometric and semantic priors (grouping similar structures), which we further enhance in GeFF. Feature visualizations are produced by applying PCA to rendered features on ScanNet.
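The PCA visualization mentioned above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `pca_to_rgb` and the random input are assumptions, and the per-channel min-max rescaling is one common choice among several.

```python
import numpy as np

def pca_to_rgb(features: np.ndarray) -> np.ndarray:
    """Project an (H, W, C) feature map onto its top 3 principal components
    and rescale each component to [0, 1] so it can be shown as an RGB image.
    Assumes C >= 3 and H * W >= 3."""
    h, w, c = features.shape
    flat = features.reshape(-1, c).astype(np.float64)
    flat = flat - flat.mean(axis=0, keepdims=True)    # center each channel
    # SVD of the centered data: the rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    proj = flat @ vt[:3].T                             # shape (H*W, 3)
    lo, hi = proj.min(axis=0), proj.max(axis=0)
    rgb = (proj - lo) / np.maximum(hi - lo, 1e-12)     # per-channel min-max
    return rgb.reshape(h, w, 3)

# Random features stand in for a rendered feature map (hypothetical data).
rng = np.random.default_rng(0)
demo = pca_to_rgb(rng.normal(size=(8, 8, 16)))
```

Pixels with similar features map to similar colors, which is why grouped structures show up as coherent color regions in such visualizations.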
Method Overview
Pre-trained as a generalizable NeRF encoder, GeFF provides unified scene representations from an onboard RGB-D stream, offering both real-time geometry and language-grounded semantics. Unlike LERF, GeFF runs in real time without costly per-scene optimization.
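To illustrate what language-grounded features make possible, the sketch below scores per-point features against a text embedding with cosine similarity. It assumes the point features already live in the same embedding space as the text (as CLIP feature distillation is meant to achieve); `query_points`, the threshold value, and the toy vectors are hypothetical, and a real system would obtain `text_embed` from a CLIP text encoder.

```python
import numpy as np

def query_points(point_feats, text_embed, threshold=0.5):
    """Score each point against a text query.

    point_feats: (N, D) per-point features, assumed aligned with the text space.
    text_embed:  (D,) embedding of the query text.
    Returns (cosine similarities, boolean mask of points above threshold).
    """
    p = point_feats / np.linalg.norm(point_feats, axis=1, keepdims=True)
    t = text_embed / np.linalg.norm(text_embed)
    sim = p @ t
    return sim, sim > threshold

# Toy 3-D vectors stand in for real feature and text embeddings.
feats = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0],
                  [0.9, 0.1, 0.0]])
text = np.array([1.0, 0.0, 0.0])
sim, mask = query_points(feats, text)
```

Because the query is just a vector, new object names need no retraining, only a new text embedding.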
Open-vocabulary Mobile Manipulation
Building upon 2D vision foundation models, GeFF generalizes to diverse real-world indoor and outdoor scenes and objects.
*Feature field visualizations reflect the real FPS on the mobile robot.
Collecting an empty blue bottle coffee cup and tossing it in a recycling bin.
Picking up a bottle in the woods.
Cleaning up food wraps from an outdoor patio.
Placing used packaging in a trash bin in an office.
Picking up a bottle of car glass cleaner from the trunk.
Recycling a plastic bottle in a university lounge.
Case study: capabilities of GeFF
We study what GeFF representations can do on various applications, including classical problems such as
dynamic obstacle avoidance and narrow passage traversal, as well as more challenging tasks such as
open-vocabulary semantic-aware planning on our quadruped robot.
Dynamic Obstacle Avoidance
The robot avoids a person who walks into the path with feature fields updated in real time.
Narrow Passage Traversal
The robot goes through a narrow doorway with geometric representations from GeFF.
Geometry-only Path Planning
The robot takes the shortest path and steps over the lawn to reach the target object.
Semantic-aware Path Planning
GeFF assigns higher affordances to the lawn and keeps the robot on the walkway.
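The contrast between geometry-only and semantic-aware planning can be sketched on a small grid. This is a toy illustration under stated assumptions, not the planner used on the robot: the affordance is read here as an extra traversal cost on lawn cells, and the grid, the penalty of 10, and the use of Dijkstra's algorithm are all illustrative choices.

```python
import heapq

def plan(cost, start, goal):
    """Dijkstra over a 4-connected grid; entering cell (r, c) costs cost[r][c].
    Returns the list of cells from start to goal (goal must be reachable)."""
    rows, cols = len(cost), len(cost[0])
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if d > dist[cell]:
            continue  # stale queue entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost[nr][nc]
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = cell
                    heapq.heappush(heap, (nd, (nr, nc)))
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]

LAWN = {(1, 1)}  # hypothetical semantic label: the centre cell is lawn
geometric = [[1.0] * 3 for _ in range(3)]
semantic = [[1.0 + (10.0 if (r, c) in LAWN else 0.0) for c in range(3)]
            for r in range(3)]

geo_path = plan(geometric, (1, 0), (1, 2))  # cuts straight across the lawn
sem_path = plan(semantic, (1, 0), (1, 2))   # detours around the lawn
```

With uniform costs the planner crosses the lawn cell, while the added semantic cost makes the longer walkway route cheaper overall.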
Part-level Manipulation
With multiple views, GeFF can target part-level representation conditioned on object-level representation, thus enhancing manipulation ability.
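One simple way to picture conditioning a part query on an object query is to restrict the part search to points that already match the object. The sketch below assumes per-point similarity scores for both queries are available; `select_part`, the threshold, and the toy scores are hypothetical and are not taken from the authors' implementation.

```python
import numpy as np

def select_part(obj_sim, part_sim, obj_threshold=0.5):
    """Return the index of the best part match among points that pass the
    object query, or None if no point passes."""
    obj_sim = np.asarray(obj_sim, dtype=float)
    part_sim = np.asarray(part_sim, dtype=float)
    on_object = obj_sim > obj_threshold
    if not on_object.any():
        return None
    # Points off the object are excluded regardless of their part score.
    masked = np.where(on_object, part_sim, -np.inf)
    return int(np.argmax(masked))

obj_sim = [0.9, 0.8, 0.1, 0.7]     # similarity to the object query
part_sim = [0.2, 0.6, 0.95, 0.3]   # similarity to the part query
best = select_part(obj_sim, part_sim)
```

In this toy data the point with the highest part score lies off the object, so the object-level mask steers the selection to a point on the object instead.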
BibTeX
@article{qiu-hu-song-2024-geff,
title={Learning Generalizable Feature Fields for Mobile Manipulation},
author={Ri-Zhao Qiu and Yafei Hu and Yuchen Song and Ge Yang and Yang Fu and Jianglong Ye and Jiteng Mu and Ruihan Yang and Nikolay Atanasov and Sebastian Scherer and Xiaolong Wang},
journal={arXiv preprint arXiv:2403.07563},
year={2024}
}