ReQueST — (Re)ward (Que)ry (S)ynthesis via (T)rajectory Optimization
ReQueST is a reward modeling algorithm that asks the user for feedback on hypothetical trajectories synthesized using a pretrained model of the environment dynamics, instead of real trajectories generated by rolling out a partially-trained agent in the environment. Compared to previous approaches, this enables training more robust reward models that work off-policy, learning about unsafe states without visiting them, and better query efficiency through the use of active learning.
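To make the query-synthesis idea concrete, here is a minimal, self-contained Python sketch of the loop. It is not the rqst API: the toy dynamics, feature map, random-shooting optimizer, and ensemble-disagreement acquisition below are all illustrative stand-ins for the learned models used in the paper. The structural point is that every query shown to the user comes from the dynamics model, so no real environment rollouts are needed.

```python
import numpy as np

rng = np.random.default_rng(0)

def step(state, action):
    # Stand-in for a pretrained dynamics model: damped 1-D dynamics.
    return 0.9 * state + 0.1 * action

def rollout(actions, state=0.0):
    # Roll a candidate action sequence through the dynamics model.
    states = [state]
    for a in actions:
        states.append(step(states[-1], a))
    return np.asarray(states)

def features(traj):
    # Toy trajectory features scored by the linear reward models.
    return np.array([traj.mean(), np.abs(traj).max()])

def synthesize_query(ensemble, horizon=10, n_candidates=256):
    # Trajectory optimization by random shooting: return the candidate
    # whose predicted reward the ensemble disagrees on most (a simple
    # active-learning acquisition).
    candidates = [rollout(rng.normal(size=horizon)) for _ in range(n_candidates)]
    scores = [np.var(ensemble @ features(t)) for t in candidates]
    return candidates[int(np.argmax(scores))]

def user_label(traj):
    # Stand-in for the human: prefers trajectories that stay near zero.
    return -np.abs(traj).max()

# The ReQueST loop: synthesize a hypothetical query with the dynamics
# model, get the user's label, refit an ensemble of linear reward models.
ensemble = rng.normal(size=(8, 2))  # 8 reward models over 2 features
data = []
for _ in range(20):
    traj = synthesize_query(ensemble)
    data.append((features(traj), user_label(traj)))
    X = np.array([x for x, _ in data])
    y = np.array([r for _, r in data])
    for i in range(len(ensemble)):
        # Bootstrap resampling keeps the ensemble members diverse.
        idx = rng.integers(len(data), size=len(data))
        ensemble[i] = np.linalg.lstsq(X[idx], y[idx], rcond=None)[0]

print("ensemble-mean reward weights:", ensemble.mean(axis=0))
```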
This codebase implements ReQueST in three domains.

Setup
1. Set wm_dir, mnist_dir, and home_dir in ReQueST/utils.py.
2. Install the rqst package with python setup.py install.
3. Download data.zip, then unzip it into ReQueST/data.
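For reference, step 1 might look like the following in ReQueST/utils.py. Only the three variable names come from the instructions above; the values and the layout under ReQueST/data are assumptions that depend on your machine.

```python
import os

# Hypothetical values; adjust to wherever you cloned the repo and
# unzipped data.zip. Only the variable names are taken from the README.
home_dir = os.path.expanduser('~/ReQueST')
wm_dir = os.path.join(home_dir, 'data', 'wm')        # pretrained world model
mnist_dir = os.path.join(home_dir, 'data', 'mnist')  # MNIST data
```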
Jupyter notebooks in ReQueST/notebooks provide an entry point to the codebase, where you can play around with the environments, visualize synthesized queries, and reproduce the figures from the paper.
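One way to launch them, assuming Jupyter is installed; any of the standard launch methods works equally well:

```python
import subprocess

# Start a notebook server rooted at the notebooks directory.
subprocess.run(['jupyter', 'notebook', 'ReQueST/notebooks'], check=True)
```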
Citation
If you find this software useful in your work, we kindly request that you cite the following
paper:
@article{ReQueST,
title={Learning Human Objectives by Evaluating Hypothetical Behavior},
author={Reddy, Siddharth and Dragan, Anca D. and Levine, Sergey and Legg, Shane and Leike, Jan},
journal={arXiv preprint arXiv:1912.05652},
year={2019}
}
Disclaimer
This is not an officially supported Google product.