RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning
[Videos: OpenVLA + RLDG, OpenVLA, Octo + RLDG, and Octo evaluated on unseen connectors (Type-C, DisplayPort, XLR) and on an unseen object and background]
Department of EECS, University of California, Berkeley
RLDG is a framework for distilling specialist RL policies into a generalist robot policy. Generalists trained this way outperform the same models fine-tuned conventionally on human demonstrations, and generalize better than the RL policies they are distilled from. It works in three steps:
- Train specialist policies on narrowly scoped tasks using online reinforcement learning. This could mean training a separate policy for each connector type in the insertion task, or training only on the "bottleneck" portion of a long-horizon task while leaving the rest to human demonstrations.
- Generate a dataset of expert trajectories by rolling out the specialist policies. The dataset can contain episodes for multiple task variants generated by different RL policies, and may also include expert human demonstrations for the "easy" portions of a long-horizon task.
- Fine-tune any generalist robot policy on this high-quality dataset and see improved performance!
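The three steps above can be sketched end-to-end as a toy script. Everything here is a stand-in, not the actual RLDG implementation: the "specialists" are fixed linear maps in place of RL-trained policies, and a least-squares behavior-cloning fit plays the role of fine-tuning a generalist like OpenVLA or Octo.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Step 1: specialist policies, one per narrow task variant.
# (Stand-ins: each "specialist" is a fixed linear map, where RLDG
# would instead train a policy with online RL.)
def make_specialist(task_id):
    W = np.eye(2) * (task_id + 1)  # hypothetical per-variant behavior
    return lambda obs: W @ obs

specialists = {tid: make_specialist(tid) for tid in range(3)}

# --- Step 2: roll out each specialist to build a mixed expert dataset.
def rollout(policy, n_steps=50):
    observations = rng.normal(size=(n_steps, 2))
    actions = np.stack([policy(o) for o in observations])
    return observations, actions

obs_chunks, act_chunks = [], []
for tid, policy in specialists.items():
    o, a = rollout(policy)
    obs_chunks.append(o)
    act_chunks.append(a)
obs = np.concatenate(obs_chunks)   # episodes from all task variants
act = np.concatenate(act_chunks)

# --- Step 3: "fine-tune" a generalist on the distilled dataset.
# (Stand-in: a single least-squares behavior-cloning fit in place of
# fine-tuning a large generalist policy.)
W_gen, *_ = np.linalg.lstsq(obs, act, rcond=None)
print(f"dataset size: {len(obs)} transitions, generalist weights: {W_gen.shape}")
```

The key idea the sketch preserves is that the generalist never interacts with the environment during RL: it only sees the high-quality trajectories that the specialists produced.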