Learning to Design and Use Tools for Robotic Manipulation
Stanford University
Framework
Solving a task using learned designer and controller policies. During the design phase, the designer policy outputs the parameters for a tool that will help solve the given task. In the control phase, the controller policy outputs motor commands given the tool structure, task specification, and environment observation.
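To make the two-phase rollout concrete, here is a minimal sketch of a design phase followed by a control phase. The class names (DesignerPolicy, ControllerPolicy) and the environment interface (env.reset, env.step) are illustrative placeholders, not the paper's actual API.

```python
# Sketch of the design-then-control rollout; all names are hypothetical.
import numpy as np


class DesignerPolicy:
    def sample_tool_params(self, task_spec: np.ndarray) -> np.ndarray:
        # Placeholder: a real policy would map the task specification
        # (e.g. a goal pose) to tool parameters such as link lengths/angles.
        return np.zeros(6)


class ControllerPolicy:
    def act(self, tool_params, task_spec, obs) -> np.ndarray:
        # Placeholder: a real policy conditions on the designed tool,
        # the task specification, and the current observation.
        return np.zeros(3)


def rollout(env, designer: DesignerPolicy, controller: ControllerPolicy, task_spec):
    # Design phase: choose a tool once, before any interaction.
    tool_params = designer.sample_tool_params(task_spec)
    obs = env.reset(tool_params)  # environment instantiates the designed tool

    # Control phase: issue motor commands with the designed tool.
    total_reward, done = 0.0, False
    while not done:
        action = controller.act(tool_params, task_spec, obs)
        obs, reward, done, _ = env.step(action)
        total_reward += reward
    return total_reward
```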
Sample Efficiency
Tasks (figure panels): Push, Catch Balls, Scoop, Fetch Cube, Lift Cup, 3D Scoop.
Learning curves for our framework, prior methods, and baselines. Across all tasks, our framework achieves improved performance and sample efficiency. Shaded areas indicate standard error across 6 random seeds for all methods, except on the 3D Scoop task, where we use 3 seeds due to computational constraints.
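For reference, shaded bands of this kind can be computed from per-seed return curves as mean ± standard error. The sketch below assumes the returns are stored in a NumPy array of shape (num_seeds, num_eval_points); it is not the paper's plotting code.

```python
# Mean and standard error across seeds at each evaluation point.
import numpy as np


def mean_and_standard_error(returns: np.ndarray):
    mean = returns.mean(axis=0)
    # Standard error = sample std (ddof=1) / sqrt(number of seeds).
    sem = returns.std(axis=0, ddof=1) / np.sqrt(returns.shape[0])
    return mean, sem


# Example with synthetic data: 6 seeds, 100 evaluation points.
rng = np.random.default_rng(0)
returns = rng.normal(loc=np.linspace(0.0, 1.0, 100), scale=0.1, size=(6, 100))
mean, sem = mean_and_standard_error(returns)
lower, upper = mean - sem, mean + sem  # bounds of the shaded band
```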
Design-Control Tradeoff
Figure panels: control cost / design cost ratio as a function of α, with example tools generated at α = 0.0, 0.3, 0.7, and 1.0.
Qualitative examples of tools generated by setting our tradeoff parameter α to different values. As α increases, the designer policy creates tools with shorter links at the left and right sides to decrease material usage. At low α values, the designer produces larger tools, which spares the control policy from having to move the tool far.
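One plausible reading of the tradeoff parameter is as a convex weighting between the design (material) cost and the control (motion) cost; the paper's exact objective may differ, so the sketch below is an illustrative assumption only.

```python
# Assumed form of the design-control tradeoff: alpha weights material usage
# against control effort. This is an illustration, not the paper's objective.
def combined_cost(design_cost: float, control_cost: float, alpha: float) -> float:
    """Weighted cost the agent is assumed to minimize.

    alpha = 1.0 -> only material usage matters (small tools, more motion).
    alpha = 0.0 -> only control effort matters (large tools, little motion).
    """
    return alpha * design_cost + (1.0 - alpha) * control_cost


# Example: a large tool costs more material but needs less motion.
large_tool = combined_cost(design_cost=2.0, control_cost=0.5, alpha=0.3)  # 0.95
small_tool = combined_cost(design_cost=0.5, control_cost=2.0, alpha=0.3)  # 1.55
# With low alpha the large tool is preferred; raising alpha flips the choice.
```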
Generalization
(a) Initialization ranges and zero-shot performance when cutting out 60% of the area of the entire possible training region.
(b) Returns for policies trained with varying relative cutout region area.
(c) Fine-tuning performance compared to learning from scratch across 4 target goals.
Interpolation results on the pushing task. In (a), we plot the success (light blue) and failure (dark blue) goal regions. Areas within the dotted yellow borders denote unseen cutout regions (interpolation). The region within the teal border (but outside the cutout regions) is the training region, and the area outside the teal border is unseen during training (extrapolation). (b) and (c) show return curves averaged over 3 runs; shaded regions denote standard error. In (b), we observe that policies trained with small cutouts perform nearly as well as a policy trained on all goals. In (c), we show that even for goal poses far from the initial training region, our policies learn to solve the task within a handful of gradient steps and are much more effective than learning from scratch.
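As an illustration of the fine-tuning comparison in (c), the sketch below contrasts adapting a pretrained goal-conditioned policy with training the same architecture from scratch. Here make_policy, pretrained_weights, and policy_gradient_step are hypothetical stand-ins, not the paper's code; the point is only that fine-tuning starts from the weights learned on the original goal region.

```python
# Sketch: fine-tune a pretrained policy on a new goal vs. train from scratch.
import copy

import torch


def adapt_to_new_goal(make_policy, pretrained_weights, new_goal_batch,
                      policy_gradient_step, num_steps=10, lr=3e-4):
    # Fine-tuning: initialize from the pretrained goal-conditioned policy.
    finetuned = make_policy()
    finetuned.load_state_dict(copy.deepcopy(pretrained_weights))

    # Baseline: same architecture, random initialization.
    scratch = make_policy()

    opt_ft = torch.optim.Adam(finetuned.parameters(), lr=lr)
    opt_sc = torch.optim.Adam(scratch.parameters(), lr=lr)

    for _ in range(num_steps):  # "a handful of gradient steps"
        policy_gradient_step(finetuned, opt_ft, new_goal_batch)
        policy_gradient_step(scratch, opt_sc, new_goal_batch)
    return finetuned, scratch
```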
Goal-Conditioned Design and Control
