RLZero: Direct Policy Inference from Language/Video Without In-Domain Supervision

Harshit Sikchi^θ*, Siddhant Agarwal^θ*, Pranaya Jajoo^†*, Samyak Parajuli^θ*, Caleb Chuck^θ*, Max Rudolph^θ*,
Peter Stone^θν+, Amy Zhang^θφ+, Scott Niekum^ψ+

^θ UT Austin, ^† University of Alberta, ^ψ UMass Amherst,
^ν Sony AI, ^φ Meta AI
^* Equal contribution, ^+ Equal advising

Paper | Code (Coming Soon)

Note: This blog aims to give an informal understanding of the ideas in our work. For a more formal treatment, see our paper.

Talk and Overview

RLZero

Figure 1: Overview of the RLZero approach

Reinforcement learning lacks an interpretable interface to the agent: specifying a task requires designing a reward function, which even experienced researchers struggle to do. We propose RLZero as a way to build a small, language-promptable generalist RL agent. RLZero offers two advances over prior methods:
a) Zero-shot: At test time, no further training or environment interaction is required to produce a policy from a task description.
b) Unsupervised: No task labels are used to map language to skills; the approach is completely unsupervised.

How it works

Step 1: Imagine

Given a task description in natural language, RLZero uses a video generation model to produce an imagined video of the task being performed.

Figure 2: Generated video clip for Walker environment using the prompt 'do lunges'

At this stage, the agent can instead be prompted with a real video, possibly showing a different embodiment (cross-embodiment imitation), rather than one generated by a video model.

Figure 3: At this stage, RLZero can use either a video scraped from YouTube or an AI-generated one.

Step 2: Project

The imagined video may differ from the agent's environment in visual domain or in dynamics. To bridge this gap, each frame of the imagined video is projected onto the closest real observation that the agent encountered in its past environment interactions.

Figure 4: SigLIP is used for image retrieval, finding the closest frame in the agent's prior interaction dataset.
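The projection step can be pictured as nearest-neighbour retrieval over image embeddings. The sketch below is minimal and rests on three assumptions. First, the SigLIP image embeddings for the imagined frames and for the agent's past observations are assumed to be precomputed. Second, cosine similarity is assumed to be the retrieval score. Third, the function name `project_frames` and the random toy data are hypothetical and do not come from RLZero's code.

```python
import numpy as np

def project_frames(imagined_emb: np.ndarray, dataset_emb: np.ndarray) -> np.ndarray:
    """For each imagined frame, return the index of the most similar real observation.

    imagined_emb: (T, D) embeddings of the T imagined frames (assumed precomputed, e.g. by SigLIP).
    dataset_emb:  (N, D) embeddings of N observations from the agent's past interactions.
    """
    # L2-normalise rows so that a dot product equals cosine similarity.
    q = imagined_emb / np.linalg.norm(imagined_emb, axis=1, keepdims=True)
    k = dataset_emb / np.linalg.norm(dataset_emb, axis=1, keepdims=True)
    sims = q @ k.T               # (T, N) cosine similarities
    return sims.argmax(axis=1)   # nearest real observation per imagined frame

# Toy usage: random vectors stand in for SigLIP embeddings (hypothetical data).
rng = np.random.default_rng(0)
dataset = rng.normal(size=(1000, 16))
# "Imagined" frames: slightly perturbed copies of dataset entries 3, 42 and 7.
imagined = dataset[[3, 42, 7]] + 0.01 * rng.normal(size=(3, 16))
idx = project_frames(imagined, dataset)
print(idx)
```

The returned indices select stored observations, and those observations stand in for the imagined frames from this point on.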

Step 3: Zero-shot Imitation with Behavior Foundation Models (BFM)

Figure 5: Skills learned by a BFM, represented as points on a hypersphere

RLZero uses the agent's past interaction data to learn a wide range of skills, which has become possible thanks to recent advances in zero-shot reinforcement learning [1, 2, 3]. The resulting model is referred to as a Behavior Foundation Model (BFM). Given the real observations retrieved in Step 2, a BFM lets us compute in closed form the policy that solves the observation-only imitation problem.
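Here is a minimal sketch of the closed-form step. It assumes a forward-backward (FB) style BFM, one common family of zero-shot RL models, with three components:

- a backward map B(s);
- a forward map F(s, a, z);
- a greedy policy π_z(s) = argmax_a F(s, a, z)ᵀz.

One simple closed-form choice for observation-only imitation is to average B over the retrieved observations. The result is then projected onto a skill hypersphere of radius √d. This formula is an illustrative assumption, not necessarily the exact one used in the paper. The names `infer_skill` and `act`, along with the random linear stand-ins for B and F, are hypothetical.

```python
import numpy as np

def infer_skill(backward_emb: np.ndarray) -> np.ndarray:
    """Closed-form skill from the backward embeddings B(s_t) of retrieved observations.

    backward_emb: (T, d) array, one row per retrieved observation.
    Returns z of shape (d,) on the sphere of radius sqrt(d) (assumed skill space).
    """
    d = backward_emb.shape[1]
    z = backward_emb.mean(axis=0)               # average over the demonstration
    return np.sqrt(d) * z / np.linalg.norm(z)   # project onto the skill hypersphere

def act(forward_fn, state, actions, z):
    """Greedy policy pi_z(s) = argmax_a F(s, a, z)^T z over a finite action set."""
    q_values = [forward_fn(state, a, z) @ z for a in actions]
    return actions[int(np.argmax(q_values))]

# Toy usage: random linear maps stand in for a trained BFM (hypothetical).
rng = np.random.default_rng(0)
d, state_dim, n_actions = 8, 4, 3
W_b = rng.normal(size=(state_dim, d))         # toy backward map: B(s) = s @ W_b
W_f = rng.normal(size=(n_actions, d))         # toy forward map: F(s, a, z) = W_f[a]
retrieved = rng.normal(size=(20, state_dim))  # stand-in for observations retrieved in Step 2
z = infer_skill(retrieved @ W_b)
a_star = act(lambda s, a, _z: W_f[a], retrieved[0], list(range(n_actions)), z)
print(np.linalg.norm(z), a_star)
```

Under this assumed formula, imitating a new demonstration takes no gradient steps, only an average and a normalisation. That is consistent with the zero-shot property described above.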