New environments
To run on a different environment, you can modify the provided template. You will also need to provide the termination function for the environment in mbpo/static. If you name the file the lowercase version of the environment name, it will be found automatically. See hopper.py for an example.
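A termination function maps a batch of transitions to boolean done flags. The sketch below follows the general shape of hopper.py; the observation indices and thresholds are illustrative Hopper-style values, so consult the actual file for the exact logic:

```python
import numpy as np

class StaticFns:

    @staticmethod
    def termination_fn(obs, act, next_obs):
        # All inputs are batched: shape (batch_size, dim).
        assert len(obs.shape) == len(act.shape) == len(next_obs.shape) == 2

        # Illustrative Hopper-style health check: terminate when the state
        # is non-finite, the torso falls too low, or it tips too far.
        height = next_obs[:, 0]
        angle = next_obs[:, 1]
        not_done = (
            np.isfinite(next_obs).all(axis=-1)
            * (height > 0.7)
            * (np.abs(angle) < 0.2)
        )
        done = ~not_done
        return done[:, None]  # shape (batch_size, 1)
```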
Logging
This codebase contains viskit as a submodule. You can view saved runs with:
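For example, assuming the default log directory was used (the path and port below are illustrative; point viskit at wherever your runs were saved):

```
viskit ~/ray_mbpo --port 6008
```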
Hyperparameters
The rollout length schedule is defined by a length-4 list in a config file. The format is [start_epoch, end_epoch, start_length, end_length], so the following:
'rollout_schedule': [20, 100, 1, 5]
corresponds to a model rollout length linearly increasing from 1 to 5 over epochs 20 to 100.
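In other words, the schedule is a clamped linear interpolation over epochs. The helper below is a minimal sketch of that rule (an illustrative function, not the repo's actual implementation):

```python
def rollout_length(epoch, rollout_schedule):
    # rollout_schedule = [start_epoch, end_epoch, start_length, end_length]
    start_epoch, end_epoch, start_len, end_len = rollout_schedule
    # Fraction of the way through the schedule window, clamped to [0, 1].
    frac = (epoch - start_epoch) / (end_epoch - start_epoch)
    frac = min(max(frac, 0.0), 1.0)
    return int(start_len + frac * (end_len - start_len))

# With 'rollout_schedule': [20, 100, 1, 5]:
assert rollout_length(10, [20, 100, 1, 5]) == 1   # before the window
assert rollout_length(60, [20, 100, 1, 5]) == 3   # halfway through
assert rollout_length(100, [20, 100, 1, 5]) == 5  # at the end
```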
To speed up training in terms of wall-clock time (possibly at the cost of sample efficiency), you can set a timeout for model training (max_model_t, in seconds) or train the model less frequently (every model_train_freq steps).
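For instance, both knobs might be set together in a config (the parameter names come from the text above; the values are made-up examples, not recommendations):

```python
params = {
    'max_model_t': 300,        # stop each model-training round after 300 seconds
    'model_train_freq': 1000,  # retrain the model every 1000 environment steps
}
```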
Comparing to MBPO
If you would like to compare to MBPO but do not have the resources to re-run all experiments, the learning curves from Figure 2 of the paper (plus results on the Humanoid environment) are available in this shared folder. See plot.py for an example of how to read the pickle files containing the results.
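If you only want to inspect the raw data, the pickles can also be loaded directly. The snippet below is a generic sketch (the file name is a placeholder); plot.py remains the authoritative reference for the result format:

```python
import pickle

# Placeholder file name: substitute one of the pickles from the shared folder.
with open('results.pkl', 'rb') as f:
    results = pickle.load(f)

# Inspect the structure before plotting.
print(type(results))
```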
Reference
@inproceedings{janner2019mbpo,
  author    = {Michael Janner and Justin Fu and Marvin Zhang and Sergey Levine},
  title     = {When to Trust Your Model: Model-Based Policy Optimization},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2019}
}