# Reinforcement Learning: An Introduction
[![Build Status](https://travis-ci.org/ShangtongZhang/reinforcement-learning-an-introduction.svg?branch=master)](https://travis-ci.org/ShangtongZhang/reinforcement-learning-an-introduction)
Python code for Sutton & Barto's book [*Reinforcement Learning: An Introduction (2nd Edition)*](http://incompleteideas.net/book/the-book-2nd.html)
> If you have any confusion about the code or want to report a bug, please open an issue instead of emailing me directly.
# Contents
### Chapter 1
1. Tic-Tac-Toe
### Chapter 2
1. [Figure 2.1: An exemplary bandit problem from the 10-armed testbed](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_2_1.png)
2. [Figure 2.2: Average performance of epsilon-greedy action-value methods on the 10-armed testbed](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_2_2.png)
3. [Figure 2.3: Optimistic initial action-value estimates](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_2_3.png)
4. [Figure 2.4: Average performance of UCB action selection on the 10-armed testbed](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_2_4.png)
5. [Figure 2.5: Average performance of the gradient bandit algorithm](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_2_5.png)
6. [Figure 2.6: A parameter study of the various bandit algorithms](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_2_6.png)
### Chapter 3
1. [Figure 3.2: Grid example with random policy](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_3_2.png)
2. [Figure 3.5: Optimal solutions to the gridworld example](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_3_5.png)
### Chapter 4
1. [Figure 4.1: Convergence of iterative policy evaluation on a small gridworld](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_4_1.png)
2. [Figure 4.2: Jack’s car rental problem](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_4_2.png)
3. [Figure 4.3: The solution to the gambler’s problem](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_4_3.png)
### Chapter 5
1. [Figure 5.1: Approximate state-value functions for the blackjack policy](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_5_1.png)
2. [Figure 5.2: The optimal policy and state-value function for blackjack found by Monte Carlo ES](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_5_2.png)
3. [Figure 5.3: Weighted importance sampling](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_5_3.png)
4. [Figure 5.4: Ordinary importance sampling with surprisingly unstable estimates](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_5_4.png)
### Chapter 6
1. [Example 6.2: Random walk](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/example_6_2.png)
2. [Figure 6.2: Batch updating](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_6_2.png)
3. [Figure 6.3: Sarsa applied to windy grid world](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_6_3.png)
4. [Figure 6.4: The cliff-walking task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_6_4.png)
5. [Figure 6.6: Interim and asymptotic performance of TD control methods](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_6_6.png)
6. [Figure 6.7: Comparison of Q-learning and Double Q-learning](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_6_7.png)
### Chapter 7
1. [Figure 7.2: Performance of n-step TD methods on 19-state random walk](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_7_2.png)
### Chapter 8
1. [Figure 8.2: Average learning curves for Dyna-Q agents varying in their number of planning steps](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_8_2.png)
2. [Figure 8.4: Average performance of Dyna agents on a blocking task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_8_4.png)
3. [Figure 8.5: Average performance of Dyna agents on a shortcut task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_8_5.png)
4. [Example 8.4: Prioritized sweeping significantly shortens learning time on the Dyna maze task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/example_8_4.png)
5. [Figure 8.7: Comparison of efficiency of expected and sample updates](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_8_7.png)
6. [Figure 8.8: Relative efficiency of different update distributions](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_8_8.png)
### Chapter 9
1. [Figure 9.1: Gradient Monte Carlo algorithm on the 1000-state random walk task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_9_1.png)
2. [Figure 9.2: Semi-gradient n-steps TD algorithm on the 1000-state random walk task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_9_2.png)
3. [Figure 9.5: Fourier basis vs polynomials on the 1000-state random walk task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_9_5.png)
4. [Figure 9.8: Example of feature width’s effect on initial generalization and asymptotic accuracy](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_9_8.png)
5. [Figure 9.10: Single tiling and multiple tilings on the 1000-state random walk task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_9_10.png)
### Chapter 10
1. [Figure 10.1: The cost-to-go function for Mountain Car task in one run](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_10_1.png)
2. [Figure 10.2: Learning curves for semi-gradient Sarsa on Mountain Car task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_10_2.png)
3. [Figure 10.3: One-step vs multi-step performance of semi-gradient Sarsa on the Mountain Car task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_10_3.png)
4. [Figure 10.4: Effect of the alpha and n on early performance of n-step semi-gradient Sarsa](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_10_4.png)
5. [Figure 10.5: Differential semi-gradient Sarsa on the access-control queuing task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_10_5.png)
### Chapter 11
1. [Figure 11.2: Baird's Countere
没有合适的资源?快使用搜索试试~ 我知道了~
Reinforcement Learning An Introduction_Sutton-增强学习导论(文档+代码)
共80个文件
png:49个
py:24个
pdf:2个
3星 · 超过75%的资源 需积分: 10 51 下载量 87 浏览量
2018-09-29
12:38:52
上传
评论 3
收藏 16.82MB RAR 举报
温馨提示
压缩包中包括Reinforcement Learning An Introduction英文与中文文档,还包括涉及的课程代码。
资源详情
资源评论
资源推荐
收起资源包目录
Reinforcement Learning-增强学习(文档+代码).rar (80个子文件)
Reinforcement Learning-增强学习(文档+代码)
reinforcement-learning-an-introduction-master
chapter03
grid_world.py 4KB
images
figure_3_2.png 17KB
figure_9_8.png 453KB
figure_12_3.png 71KB
figure_5_1.png 161KB
figure_5_2.png 168KB
figure_5_4.png 53KB
figure_9_10.png 38KB
figure_11_7.png 31KB
figure_4_2.png 170KB
figure_9_1.png 71KB
figure_6_2.png 25KB
figure_6_6.png 53KB
figure_7_2.png 71KB
figure_9_5.png 83KB
figure_6_4.png 64KB
figure_10_4.png 51KB
figure_11_2.png 106KB
figure_8_8.png 176KB
figure_2_4.png 38KB
figure_12_8.png 72KB
figure_12_11.png 43KB
figure_2_5.png 49KB
figure_8_4.png 28KB
example_13_1.png 35KB
figure_10_5.png 101KB
figure_12_6.png 64KB
figure_12_10.png 48KB
figure_4_1.png 12KB
figure_10_3.png 36KB
example_8_4.png 31KB
figure_2_6.png 39KB
figure_8_2.png 31KB
figure_9_2.png 154KB
figure_10_2.png 48KB
figure_13_2.png 42KB
figure_11_6.png 107KB
figure_2_3.png 35KB
example_6_2.png 233KB
figure_8_5.png 29KB
figure_5_3.png 29KB
figure_8_7.png 38KB
figure_2_1.png 25KB
figure_6_7.png 38KB
figure_2_2.png 153KB
figure_13_1.png 29KB
figure_10_1.png 988KB
figure_4_3.png 53KB
figure_3_5.png 18KB
figure_6_3.png 21KB
chapter05
blackjack.py 13KB
infinite_variance.py 2KB
requirements.txt 29B
chapter11
counterexample.py 12KB
chapter12
mountain_car.py 12KB
random_walk.py 9KB
chapter01
tic_tac_toe.py 11KB
chapter09
square_wave.py 4KB
random_walk.py 15KB
.travis.yml 148B
chapter04
gamblers_problem.py 2KB
car_rental.py 7KB
grid_world.py 3KB
LICENSE 11KB
chapter13
short_corridor.py 8KB
chapter06
cliff_walking.py 9KB
maximization_bias.py 4KB
windy_grid_world.py 4KB
random_walk.py 6KB
README.md 10KB
chapter10
access_control.py 9KB
mountain_car.py 13KB
chapter02
ten_armed_testbed.py 9KB
chapter07
random_walk.py 4KB
.gitignore 40B
chapter08
maze.py 23KB
trajectory_sampling.py 5KB
expectation_vs_sample.py 2KB
Reinforcement Learning:An Introduction.pdf 12.03MB
增强学习导论中文版(RLAI)2~10.pdf 2.98MB
共 80 条
- 1
大星派
- 粉丝: 16
- 资源: 2
上传资源 快速赚钱
- 我的内容管理 展开
- 我的资源 快来上传第一个资源
- 我的收益 登录查看自己的收益
- 我的积分 登录查看自己的积分
- 我的C币 登录后查看C币余额
- 我的收藏
- 我的下载
- 下载帮助
最新资源
- apk.tw_LineLite_v8a_v.2.17.1_sign.apk
- Elasticsearch实战:构建高效搜索系统的秘诀.zip
- HTML+CSS+JS网页设计:从入门到精通.zip
- 数据库课程设计:从理论到实践的全面指南.zip
- Python闭包:深入理解与应用场景解析.zip
- Win64OpenSSL-3-3-0.exe
- 课高分程设计-基于C++实现的民航飞行与地图简易管理系统-南京航空航天大学
- 航天器遥测数据故障检测系统python源码+文档说明+数据库(课程设计)
- 北京航空航天大学操作系统课设+ppt+实验报告
- 基于Vue+Echarts实现风力发电机中传感器的数据展示监控可视化系统+源代码+文档说明(高分课程设计)
资源上传下载、课程学习等过程中有任何疑问或建议,欢迎提出宝贵意见哦~我们会及时处理!
点击此处反馈
安全验证
文档复制为VIP权益,开通VIP直接复制
信息提交成功
评论2