# Exploration-Exploitation in Reinforcement Learning
This library contains several algorithms based on the Optimism in the Face of Uncertainty (OFU) principle, for both MDPs and SMDPs. In particular, we have implemented:

- UCRL [1]
- SMDP-UCRL and Free-Parameter SMDP-UCRL [2]
- SCAL [3]

All the implementations support both Hoeffding's and Bernstein's confidence intervals.
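
To give a rough sense of the difference between the two types of bounds (this is an illustrative sketch with standard textbook constants, not the exact bounds or API used in this library), the snippet below compares the width of a Hoeffding confidence interval with an empirical Bernstein interval for a single estimated transition probability. The Bernstein bound is tighter when the empirical variance is small, e.g. for near-deterministic transitions.

```python
import numpy as np

# Illustrative only: widths of confidence intervals on an estimated
# transition probability p_hat built from n samples, at confidence 1 - delta.
# The constants follow standard textbook statements, not this library's code.

def hoeffding_width(n, delta):
    # Hoeffding: distribution-free, scales as sqrt(log(1/delta) / n).
    return np.sqrt(np.log(2.0 / delta) / (2.0 * max(n, 1)))

def bernstein_width(p_hat, n, delta):
    # Empirical Bernstein: uses the estimated variance p_hat * (1 - p_hat),
    # plus a lower-order 1/n term, so it beats Hoeffding when the variance
    # is small (near-deterministic transitions) and n is not too small.
    n = max(n, 1)
    log_term = np.log(3.0 / delta)
    return np.sqrt(2.0 * p_hat * (1.0 - p_hat) * log_term / n) + 3.0 * log_term / n

# Near-deterministic transition estimated from 1000 samples:
print(hoeffding_width(n=1000, delta=0.05))              # ~0.043
print(bernstein_width(p_hat=0.99, n=1000, delta=0.05))  # ~0.021
```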
Note that this is a research project and, as such, unstable. Please write to us if you find anything incorrect or strange.
References:
[1] Jaksch, Ortner, and Auer. Near-optimal regret bounds for reinforcement learning. Journal of Machine Learning Research, 11:1563–1600, 2010.
[2] Fruit, Pirotta, Lazaric, and Brunskill. Regret minimization in MDPs with options without prior knowledge. NIPS 2017.
[3] Fruit, Pirotta, Lazaric, and Ortner. Efficient bias-span-constrained exploration-exploitation in reinforcement learning. ICML 2018.
You can perform a minimal install of the library with:
```
git clone https://github.com/RonanFR/UCRL
cd UCRL
pip install -e .
make
```
On macOS, we suggest using Anaconda and GCC.
## Testing
We use pytest for testing. You can run the tests with:

```
pytest
```
## How to reproduce experiments
To reproduce the results in [3], follow the instructions below.
For SCAL, run the following command, varying the span constraint (5 and 10) and the seed (114364114, 679848179, 375341576, 340061651, 311346802). Results are averaged over 15 runs; you can change the number of repetitions with the `-r` parameter.