Framework of MetaDiffuser
Overview of MetaDiffuser. During the meta-training phase, a task-oriented context encoder is trained jointly with a conditioned dynamics model and reward model in a self-supervised manner to infer the current task from recent historical transitions. The multi-task trajectories are then labeled with the trained context encoder, and the inferred contexts are injected into the conditioned diffusion model to estimate the multi-modal distribution mixed from the different training tasks. During the meta-testing phase, the context encoder captures task information from warm-start data provided by the test task. The conditioned diffusion model can then steer the denoising process to generate the desired trajectories for the test task with the inferred context. Additionally, the pretrained dynamics model and reward model can serve as classifiers whose gradients guide the conditional generation in a classifier-guided fashion.
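As a concrete illustration of this joint training, the following is a minimal PyTorch-style sketch of a self-supervised objective in which the context is useful only insofar as it helps the conditioned models predict next states and rewards. All module names, shapes, and the mean-pooling over the transition window are illustrative assumptions, not the authors' released implementation.

import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """Infers a task context vector z from a window of recent transitions."""
    def __init__(self, obs_dim, act_dim, context_dim, hidden=128):
        super().__init__()
        in_dim = 2 * obs_dim + act_dim + 1  # concatenated (s, a, r, s')
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, context_dim),
        )

    def forward(self, transitions):
        # transitions: (batch, window, in_dim); pool over the window
        return self.net(transitions).mean(dim=1)

def self_supervised_loss(encoder, dyn_model, rew_model, batch):
    """Joint objective: the encoder is trained together with the
    context-conditioned dynamics and reward models, so z must capture
    whatever distinguishes the current task."""
    s, a, r, s_next, history = batch
    z = encoder(history)
    pred_next = dyn_model(s, a, z)   # conditioned dynamics prediction
    pred_rew = rew_model(s, a, z)    # conditioned reward prediction
    return ((pred_next - s_next) ** 2).mean() + ((pred_rew - r) ** 2).mean()

Here dyn_model and rew_model are assumed to be any context-conditioned networks mapping (s, a, z) to a next-state and a scalar reward, respectively.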
Dual-Guide in Trajectory Generation
Previous work (Janner et al., 2022a) trains an extra reward predictor to evaluate the cumulative return of generated trajectories and uses the gradient of the return as guidance in the sampling process of the diffusion model, encouraging the generated trajectories to achieve high return. However, during meta-testing on unseen tasks, the conditionally generated trajectories may not always obey the dynamics constraints, owing to the aggressive guidance toward high return, which makes it difficult for the planner to follow the expected trajectories while interacting with the environment. We therefore propose a dual-guide that enhances the dynamics consistency of generated trajectories while simultaneously encouraging high return.
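Below is a hedged sketch of how such a dual guide could be computed at each denoising step: the pretrained reward model supplies a return gradient, while the pretrained dynamics model supplies a consistency penalty between consecutive generated states. The obs_dim attribute, the broadcasting of z, and the guidance weights alpha_r and alpha_d are assumptions for illustration, not the paper's exact formulation.

import torch

def dual_guide_gradient(traj, z, reward_model, dynamics_model,
                        alpha_r=1.0, alpha_d=1.0):
    """traj: (horizon, obs_dim + act_dim) current denoised trajectory estimate.
    Returns a gradient that pushes toward high predicted return while
    penalizing dynamics inconsistency between consecutive states."""
    traj = traj.detach().requires_grad_(True)
    obs_dim = dynamics_model.obs_dim                 # assumed attribute
    s, a = traj[:, :obs_dim], traj[:, obs_dim:]

    ret = reward_model(s, a, z).sum()                # predicted return guide
    pred_next = dynamics_model(s[:-1], a[:-1], z)    # predicted next states
    dyn_err = ((pred_next - s[1:]) ** 2).sum()       # consistency penalty

    objective = alpha_r * ret - alpha_d * dyn_err
    return torch.autograd.grad(objective, traj)[0]

# Inside the reverse diffusion loop (sketch), the guide perturbs the
# denoising mean, in the usual classifier-guided fashion:
#   x = mu_theta(x_t, t, z) + sigma_t * dual_guide_gradient(x_t, z, ...)

Scaling the dynamics term with alpha_d trades off return-seeking against physical plausibility; setting alpha_d = 0 recovers the pure return guidance of prior work.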
Figure: Generated vs. real trajectories on Walker-Params, Hopper-Params, and Cheetah-Vel, each shown without and with the dual-guide.