SAQ: Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning

Jianlan Luo

Perry Dong

Jeffrey Wu

Aviral Kumar

Xinyang Geng

Sergey Levine

Conference on Robot Learning (CoRL) 2023

Atlanta, GA

Code

Paper

Abstract

The offline reinforcement learning (RL) paradigm provides a general recipe to convert static behavior datasets of experience into policies that can perform better than the behavior policy that collected the data. While policy constraints, conservatism, and other methods for mitigating distributional shifts w.r.t. behavior datasets have made offline reinforcement learning more effective, the continuous action setting often necessitates various approximations for applying these techniques. Many of these challenges are greatly alleviated in discrete action settings, where offline RL constraints and regularizers can often be computed more precisely or even exactly. In this paper, we propose an adaptive scheme for action quantization. We use a VQ-VAE to learn state-conditioned action quantization, avoiding the exponential blowup that comes with naïve discretization of the action space. We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme. We further validate our approach on a set of challenging long-horizon complex robotic manipulation tasks in the robomimic environment where our discretized offline RL algorithms are able to improve upon their continuous counterparts by 2-3x in performance.
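To make the "exponential blowup" of naïve discretization concrete, here is a small arithmetic sketch. The bin count, action dimensions, and codebook size are hypothetical values chosen for illustration and are not taken from the paper.

```python
def naive_action_count(bins_per_dim: int, action_dim: int) -> int:
    """Number of discrete actions when every action dimension is binned independently."""
    return bins_per_dim ** action_dim


# Hypothetical settings: 10 bins per dimension, and a learned codebook of 64 codes.
BINS_PER_DIM = 10
CODEBOOK_SIZE = 64

for action_dim in (2, 7, 14):
    naive = naive_action_count(BINS_PER_DIM, action_dim)
    print(f"action_dim={action_dim}: naive grid={naive} actions, codebook={CODEBOOK_SIZE} codes")
```

The grid grows as a power of the action dimension, whereas a learned codebook keeps the number of discrete actions fixed regardless of dimension.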

Summary

Train a conditional VQ-VAE to learn a latent representation of the actions conditioned on the state
Perform offline RL with the VQ-VAE discrete codes as actions
During inference, select the best discrete action using the policy and transform it into a continuous action with the trained decoder
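The three steps above can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the random linear maps stand in for trained networks, the dimensions are hypothetical, the greedy argmax over Q-values is an assumed stand-in for "the policy", and the `tanh` output range is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
STATE_DIM, ACTION_DIM, LATENT_DIM, NUM_CODES = 4, 3, 2, 8

# Random stand-ins for parameters that would normally be learned.
codebook = rng.normal(size=(NUM_CODES, LATENT_DIM))             # VQ-VAE code vectors
W_enc = rng.normal(size=(STATE_DIM + ACTION_DIM, LATENT_DIM))   # encoder stand-in
W_dec = rng.normal(size=(STATE_DIM + LATENT_DIM, ACTION_DIM))   # decoder stand-in
W_q = rng.normal(size=(STATE_DIM, NUM_CODES))                   # Q-function stand-in


def encode_to_code(state: np.ndarray, action: np.ndarray) -> int:
    """Step 1: map a (state, action) pair to the index of the nearest code vector."""
    z = np.concatenate([state, action]) @ W_enc
    dists = np.sum((codebook - z) ** 2, axis=1)
    return int(np.argmin(dists))


def decode(state: np.ndarray, code: int) -> np.ndarray:
    """Step 3: map a state and a discrete code back to a continuous action."""
    return np.tanh(np.concatenate([state, codebook[code]]) @ W_dec)


def q_values(state: np.ndarray) -> np.ndarray:
    """Step 2 output: one value per discrete code (offline RL would train this)."""
    return state @ W_q


def act(state: np.ndarray) -> tuple[int, np.ndarray]:
    """Inference: pick the highest-valued code, then decode it."""
    code = int(np.argmax(q_values(state)))
    return code, decode(state, code)


state = np.ones(STATE_DIM)
code, action = act(state)
print(code, action.shape)
```

In this sketch the offline RL algorithm only ever sees integer codes in `range(NUM_CODES)`; continuous actions appear only when relabeling the dataset with `encode_to_code` and when executing with `decode`.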

Method

Results
