COME-robot

Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V

Peiyuan Zhi¹,*, Zhiyuan Zhang¹,²,*, Yu Zhao¹, Muzhi Han³, Zeyu Zhang¹, Zhitian Li¹, Ziyuan Jiao¹, Baoxiong Jia¹,†, Siyuan Huang¹,†

¹State Key Laboratory of General Artificial Intelligence, Beijing Institute for General Artificial Intelligence (BIGAI), ²Department of Automation, Tsinghua University, ³University of California, Los Angeles
*Indicates equal contribution
ICRA 2025


Abstract

Autonomous robot navigation and manipulation in open environments require reasoning and replanning with closed-loop feedback. In this work, we present COME-robot, the first closed-loop robotic system utilizing the GPT-4V vision-language foundation model for open-ended reasoning and adaptive planning in real-world scenarios. COME-robot incorporates two key innovative modules: (i) a multi-level open-vocabulary perception and situated reasoning module that enables effective exploration of the 3D environment and target object identification using commonsense knowledge and situated information, and (ii) an iterative closed-loop feedback and restoration mechanism that verifies task feasibility, monitors execution success, and traces failure causes across different modules for robust failure recovery. Through comprehensive experiments involving 8 challenging real-world mobile and tabletop manipulation tasks, COME-robot demonstrates a significant improvement in task success rate (~35%) compared to state-of-the-art methods. We further conduct comprehensive analyses to elucidate how COME-robot's design facilitates failure recovery, free-form instruction following, and long-horizon task planning.

Approach

A brief overview of COME-robot's workflow. Given a task instruction, COME-robot employs GPT-4V for reasoning and generates a code-based plan. Through feedback obtained from the robot's execution and interaction with the environment, it iteratively updates the subsequent plan or recovers from failures, ultimately accomplishing the given task.
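The plan, execute, and replan cycle described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: `run_closed_loop`, `Feedback`, `toy_planner`, and `toy_executor` are hypothetical names, and the toy planner is only a stand-in for the GPT-4V call. The sketch shows one idea only, namely that a failed step is recorded in the history and the planner is queried again with that history.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Feedback:
    """Outcome of executing one plan step (hypothetical structure)."""
    success: bool
    message: str


def run_closed_loop(
    instruction: str,
    plan_fn: Callable[[str, List[str]], List[str]],
    execute_fn: Callable[[str], Feedback],
    max_rounds: int = 5,
) -> Tuple[bool, List[str]]:
    """Plan, execute step by step, and replan whenever a step fails."""
    history: List[str] = []
    for _ in range(max_rounds):
        # The planner sees the instruction plus all feedback gathered so far.
        plan = plan_fn(instruction, history)
        for step in plan:
            fb = execute_fn(step)
            status = "ok" if fb.success else "failed"
            history.append(f"{step}: {status} ({fb.message})")
            if not fb.success:
                break  # abandon the rest of this plan and replan
        else:
            return True, history  # every step in the plan succeeded
    return False, history  # gave up after max_rounds attempts


def toy_planner(instruction: str, history: List[str]) -> List[str]:
    # Stand-in for the vision-language model: change strategy after a failure.
    if any("failed" in entry for entry in history):
        return ["regrasp cup", "place cup on table"]
    return ["grasp cup", "place cup on table"]


def toy_executor(step: str) -> Feedback:
    # Assumed behavior for the demo: the first grasp strategy always fails.
    if step == "grasp cup":
        return Feedback(False, "object slipped")
    return Feedback(True, "done")


done, log = run_closed_loop("put the cup on the table", toy_planner, toy_executor)
```

In this toy run the first grasp fails, the failure is appended to `log`, and the second planning round returns a different plan that completes. A real system would replace the string history with images and structured execution results.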

COME-robot's planner has two key designs: Open-Vocabulary Perception and Reasoning, and Closed-Loop Feedback and Restoration. The former helps the robot ground open-ended instructions in real environments, and the latter helps ensure that the task is completed. Actions to be executed, as reasoned by GPT-4V, are highlighted in blue; identified failures are highlighted in red; and analyses made after observation or verification are highlighted in green.

Results

Legged manipulation

Mobile manipulation

Tabletop manipulation

Cases of recovery from failures

System Prompts

BibTeX


@misc{zhi2025closedloopopenvocabularymobilemanipulation,
  title={Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V},
  author={Peiyuan Zhi and Zhiyuan Zhang and Yu Zhao and Muzhi Han and Zeyu Zhang and Zhitian Li and Ziyuan Jiao and Baoxiong Jia and Siyuan Huang},
  year={2025},
  eprint={2404.10220},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2404.10220},
}

This page was built using the Academic Project Page Template which was adopted from the Nerfies project page.
This website is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
