Robot Utility Models
General Policies for Zero-Shot Deployment in New Environments
90% success rate in novel environments with 0 additional data or training.
Haritheja Etukuru*, Norihito Naka, Zijin Hu, Seungjae Lee, Julian Mehu, Aaron Edsinger, Chris Paxton, Soumith Chintala, Lerrel Pinto, Nur Muhammad “Mahi” Shafiullah*
Corresponding author: mahi at cs dot nyu dot edu, (*) denotes equal contribution
Robot models, particularly those trained with large amounts of data, have recently shown a plethora of real-world manipulation and navigation capabilities. Several independent efforts have shown that, given sufficient training data in an environment, robot policies can generalize to demonstrated variations in that environment. However, needing to finetune robot models for every new environment stands in stark contrast to models in language or vision, which can be deployed zero-shot on open-world problems. In this work, we present Robot Utility Models (RUMs), a framework for training and deploying zero-shot robot policies that generalize directly to new environments without any finetuning. To create RUMs efficiently, we develop new tools to quickly collect data for mobile manipulation tasks, integrate such data into a policy with multi-modal imitation learning, and deploy policies on-device on Hello Robot Stretch, a cheap commodity robot, with an external mLLM verifier for retrying. We train five such utility models for opening cabinet doors, opening drawers, picking up napkins, picking up paper bags, and reorienting fallen objects. Our system achieves, on average, a 90% success rate in unseen, novel environments interacting with unseen objects. Moreover, the utility models succeed on different robot and camera setups with no further data, training, or finetuning. Chief among our lessons are the importance of training data over the training algorithm and policy class, guidance on data scaling, the necessity of demonstrations that are both diverse and high-quality, and a recipe for robot introspection and retrying that improves performance in individual environments.
Videos
RUMs in action
Our RUMs attempted 5 tasks, each in 5+ environments, on a Hello Robot Stretch. They also attempted a few tasks on an xArm. See sample rollouts below:
RUMs Automatically Retrying Upon Failure
We feed a summary of the robot's observations into a multimodal LLM (mLLM), which determines whether the task at hand has succeeded. If the mLLM judges that the task has failed, the robot automatically resets to a new initial state and retries.
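The verify-and-retry loop above can be sketched as follows. This is a minimal illustration, not the released implementation: `run_policy`, `verify`, and `reset` are hypothetical stand-ins for the real policy rollout, the mLLM success check on the observation summary, and the robot reset routine.

```python
def retry_until_success(run_policy, verify, reset, max_attempts=3):
    """Run the policy; ask an external verifier whether the task
    succeeded; on failure, reset to a new initial state and retry.

    run_policy() -> observations from one rollout
    verify(obs) -> True if the verifier (e.g. an mLLM) judges success
    reset()     -> move the robot to a fresh initial state
    """
    for attempt in range(1, max_attempts + 1):
        observations = run_policy()
        if verify(observations):
            # Verifier judged the task complete; report which attempt.
            return True, attempt
        reset()  # failed: reset before the next attempt
    return False, max_attempts
```

In practice the verifier call would send the observation summary (e.g. keyframes from the rollout) to the mLLM and parse a success/failure answer; the loop structure stays the same.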
Hardware
The Stick V2
We've redesigned the Stick! Stick V2 addresses limitations of the original design and improves the user experience: it is more ergonomic, more capable, and sturdier than before.
Robot Gripper/iPhone Mount
We've made it possible to mount the Stick gripper on your own robot arm with a 3D-printed mount and a Dynamixel set, giving the arm an identical point of view and enabling seamless zero-shot transfer of policies to new robots.
Dataset
We release the training dataset for our Robot Utility Models, covering 5 tasks with, on average, ~1,000 training demonstrations per task across 36 environments. The dataset contains RGB videos at 30 fps, as well as full action annotations: the 6D pose of the gripper and the gripper's opening angle, normalized to (0, 1).
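To make the annotation format concrete, here is a small sketch of what one action frame might look like. The field names, the raw-angle range, and the `ActionFrame` container are assumptions for illustration only, not the released schema; only the content (a 6D gripper pose plus an opening angle normalized to (0, 1)) comes from the dataset description above.

```python
from dataclasses import dataclass

@dataclass
class ActionFrame:
    """Hypothetical per-frame action annotation."""
    position: tuple     # (x, y, z) gripper translation
    orientation: tuple  # (roll, pitch, yaw) gripper rotation
    gripper: float      # opening angle normalized to (0, 1)

def normalize_gripper(raw_angle, min_angle, max_angle):
    """Map a raw gripper angle into the (0, 1) range used by the dataset."""
    return (raw_angle - min_angle) / (max_angle - min_angle)
```

A fully closed gripper maps to 0 and a fully open one to 1, so the normalized value is directly usable as a policy action target regardless of the physical gripper's angle range.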
Paper
Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments
@article{etukuru2024robot,
title={Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments},
author={Haritheja Etukuru and Norihito Naka and Zijin Hu and Seungjae Lee and Julian Mehu and Aaron Edsinger and Chris Paxton and Soumith Chintala and Lerrel Pinto and Nur Muhammad Mahi Shafiullah},
journal={arXiv preprint arXiv:2409.05865},
year={2024}
}