Towards Robust Offline RL under Diverse Data Corruption
This repo contains the official implementation of the Robust IQL (RIQL) algorithm from the ICLR 2024 spotlight paper (⭐ top 5%), "Towards Robust Offline Reinforcement Learning under Diverse Data Corruption". The code is built on the open-sourced CORL library.
Note
We fixed a small bug and set iql_deterministic=True as the default hyperparameter in our experiments; the deterministic policy is more stable and generally performs better. See Appendix E.4 of our paper for a discussion.
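If you want to toggle this setting explicitly, it can be overridden on the command line. A minimal sketch, assuming a CORL-style CLI where config fields map directly to flags (the flag syntax is an assumption):

```bash
# Run RIQL with the deterministic IQL policy (the default after the fix).
python RIQL.py --iql_deterministic True

# Fall back to the stochastic policy, e.g. for comparison.
python RIQL.py --iql_deterministic False
```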
The adversarial attacks on observations, actions, and next-observations perform a gradient-based attack and save the corrupted data to disk; the saved data are then loaded for subsequent training.
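For illustration, a corrupted-data run might look like the sketch below. The flag names (--corruption_mode, --corruption_obs) and the environment name are assumptions for this example and may differ from the actual argument names in the scripts:

```bash
# Hypothetical flags: perform a gradient-based adversarial attack on observations.
# The first run computes and saves the corrupted dataset; subsequent runs with
# the same settings load the saved data for training instead of re-attacking.
python RIQL.py --env walker2d-medium-replay-v2 \
    --corruption_mode adversarial \
    --corruption_obs 1.0
```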
Clean Data
To run the algorithm on a clean dataset, run the following command without specifying any corruption-related parameters.
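A minimal example, assuming the CORL-style --env and --seed flags (the environment name and seed below are placeholders):

```bash
# Train RIQL on an uncorrupted D4RL dataset.
python RIQL.py --env walker2d-medium-replay-v2 --seed 0
```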
You can replace RIQL.py with other baseline scripts, such as IQL.py, CQL.py, EDAC.py, and MSG.py, to run IQL, CQL, EDAC, and MSG.
Citation
If you find our work helpful for your research, please cite:
@inproceedings{yang2023towards,
title={Towards Robust Offline Reinforcement Learning under Diverse Data Corruption},
author={Yang, Rui and Zhong, Han and Xu, Jiawei and Zhang, Amy and Zhang, Chongjie and Han, Lei and Zhang, Tong},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024},
url={https://openreview.net/forum?id=5hAMmCU0bK}
}