HIL-SERL: Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning
Department of EECS, University of California, Berkeley
Policy Autonomous Rollouts
Policy Robustness
Method Overview
HIL-SERL is a system for training state-of-the-art manipulation policies using reinforcement learning.
- We first tele-operate the robot to collect positive and negative samples and train a binary reward classifier (a rough classifier sketch follows this list).
- We then collect a small set of demonstrations, which is added to the demo buffer at the start of RL training.
- During online training, we use the binary classifier as a sparse reward signal and provide human interventions. Initially, we intervene more frequently to demonstrate ways of solving the task from various states and to prevent the robot from performing undesirable behavior. We gradually reduce the amount of intervention as the policy reaches higher success rates and faster cycle times (a sketch of this loop is also given after the list).
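The sketch below illustrates the reward-classifier step. It is a minimal reading of the text, not the project's released code: the use of PyTorch, the network shape, the 64x64 image size, and the full-batch training loop are all assumptions made for the example.

# Minimal sketch of a binary success/reward classifier (assumed architecture,
# not the project's actual model).
import torch
import torch.nn as nn

class RewardClassifier(nn.Module):
    """Small CNN mapping an observation image to a logit for P(task succeeded)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 1),
        )

    def forward(self, obs):
        return self.net(obs).squeeze(-1)

def train_classifier(model, images, labels, epochs=50, lr=1e-3):
    """images: (N, 3, H, W) float tensor; labels: (N,) float tensor of 0/1."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        opt.step()
    return model

if __name__ == "__main__":
    # Placeholder random data stands in for the teleoperated positive/negative samples.
    images = torch.rand(64, 3, 64, 64)
    labels = (torch.rand(64) > 0.5).float()
    classifier = train_classifier(RewardClassifier(), images, labels)
    # At RL time the thresholded prediction serves as the sparse 0/1 reward.
    reward = (torch.sigmoid(classifier(images[:1])) > 0.5).float()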
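The loop below sketches our reading of the online phase: the classifier provides a sparse 0/1 reward, operator interventions override the policy's action and are stored alongside the demonstrations, and each update draws a mixed batch from demonstration and online data. The env, policy, classifier, and get_human_action interfaces, as well as the 50/50 sampling ratio, are assumptions for illustration, not the released implementation.

# Minimal sketch of the human-in-the-loop online training loop (assumed
# interfaces, not the project's actual code).
import random

def sample_batch(demo_buffer, online_buffer, batch_size=256):
    # 50/50 mix of demonstration/intervention data and online data (assumed ratio).
    half = batch_size // 2
    batch = random.choices(demo_buffer, k=half) if demo_buffer else []
    batch += random.choices(online_buffer, k=batch_size - len(batch)) if online_buffer else []
    return batch

def hil_online_training(env, policy, classifier, demo_buffer, get_human_action,
                        max_steps=100_000):
    online_buffer = []
    obs = env.reset()
    for _ in range(max_steps):
        action = policy.sample(obs)
        human_action = get_human_action()      # None unless the operator intervenes
        if human_action is not None:
            action = human_action              # the human overrides the policy's action
        next_obs, done = env.step(action)
        reward = float(classifier.predict_success(next_obs))  # sparse 0/1 reward
        transition = (obs, action, reward, next_obs, done)
        online_buffer.append(transition)
        if human_action is not None:
            demo_buffer.append(transition)     # interventions are kept with the demos
        policy.update(sample_batch(demo_buffer, online_buffer))
        obs = env.reset() if done else next_obs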