Reading
Papers and Books
- Sutton, Barto, Reinforcement Learning: An Introduction. (classic textbook)
- White, Real applications of Markov decision processes
- Kober, Bagnell, Peters, Reinforcement learning in robotics: a survey, 2013
Policy gradient
- Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, 1992
- Sutton et al. Policy gradient methods for reinforcement learning with function approximation, 2000
- Kakade, A natural policy gradient, 2001**
- Kakade, Langford, Approximately optimal approximate reinforcement learning, 2002
- Schulman et al. Trust region policy optimization, 2015**
- Schulman et al. High-dimensional continuous control using generalized advantage estimation, 2016
- Rajeswaran et al. Towards generalization and simplicity in continuous control, 2017**
- Schulman et al. Proximal Policy Optimization Algorithms, 2017
- Mnih et al. Asynchronous Methods for Deep Reinforcement Learning, 2016
- Toussaint, Gradient descent lecture notes, 2012**