ReinforcementLearningAnIntroduction_Sutton-Introduction to Reinforcement Learning (documents + code)

80 files in total

png: 49 files

py: 24 files

pdf: 2 files

Artificial Intelligence

Deep Learning

Reinforcement Learning

Reinforcemen

Sutton

Uploaded 2018-09-29 12:38:52 · 16.82MB RAR

Reinforcement Learning-增强学习(文档+代码).rar (80 files)

Reinforcement Learning-增强学习(文档+代码)

reinforcement-learning-an-introduction-master

chapter03

grid_world.py 4KB

images

figure_3_2.png 17KB

figure_9_8.png 453KB

figure_12_3.png 71KB

figure_5_1.png 161KB

figure_5_2.png 168KB

figure_5_4.png 53KB

figure_9_10.png 38KB

figure_11_7.png 31KB

figure_4_2.png 170KB

figure_9_1.png 71KB

figure_6_2.png 25KB

figure_6_6.png 53KB

figure_7_2.png 71KB

figure_9_5.png 83KB

figure_6_4.png 64KB

figure_10_4.png 51KB

figure_11_2.png 106KB

figure_8_8.png 176KB

figure_2_4.png 38KB

figure_12_8.png 72KB

figure_12_11.png 43KB

figure_2_5.png 49KB

figure_8_4.png 28KB

example_13_1.png 35KB

figure_10_5.png 101KB

figure_12_6.png 64KB

figure_12_10.png 48KB

figure_4_1.png 12KB

figure_10_3.png 36KB

example_8_4.png 31KB

figure_2_6.png 39KB

figure_8_2.png 31KB

figure_9_2.png 154KB

figure_10_2.png 48KB

figure_13_2.png 42KB

figure_11_6.png 107KB

figure_2_3.png 35KB

example_6_2.png 233KB

figure_8_5.png 29KB

figure_5_3.png 29KB

figure_8_7.png 38KB

figure_2_1.png 25KB

figure_6_7.png 38KB

figure_2_2.png 153KB

figure_13_1.png 29KB

figure_10_1.png 988KB

figure_4_3.png 53KB

figure_3_5.png 18KB

figure_6_3.png 21KB

chapter05

blackjack.py 13KB

infinite_variance.py 2KB

requirements.txt 29B

chapter11

counterexample.py 12KB

chapter12

mountain_car.py 12KB

random_walk.py 9KB

chapter01

tic_tac_toe.py 11KB

chapter09

square_wave.py 4KB

random_walk.py 15KB

.travis.yml 148B

chapter04

gamblers_problem.py 2KB

car_rental.py 7KB

grid_world.py 3KB

LICENSE 11KB

chapter13

short_corridor.py 8KB

chapter06

cliff_walking.py 9KB

maximization_bias.py 4KB

windy_grid_world.py 4KB

random_walk.py 6KB

README.md 10KB

chapter10

access_control.py 9KB

mountain_car.py 13KB

chapter02

ten_armed_testbed.py 9KB

chapter07

random_walk.py 4KB

.gitignore 40B

chapter08

maze.py 23KB

trajectory_sampling.py 5KB

expectation_vs_sample.py 2KB

Reinforcement Learning：An Introduction.pdf 12.03MB

增强学习导论中文版（RLAI）2~10.pdf 2.98MB

# Reinforcement Learning: An Introduction

[![Build Status](https://travis-ci.org/ShangtongZhang/reinforcement-learning-an-introduction.svg?branch=master)](https://travis-ci.org/ShangtongZhang/reinforcement-learning-an-introduction)

Python code for Sutton & Barto's book [*Reinforcement Learning: An Introduction (2nd Edition)*](http://incompleteideas.net/book/the-book-2nd.html)

> If you have any confusion about the code or want to report a bug, please open an issue instead of emailing me directly.

# Contents

### Chapter 1
1. Tic-Tac-Toe

### Chapter 2
1. [Figure 2.1: An exemplary bandit problem from the 10-armed testbed](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_2_1.png)
2. [Figure 2.2: Average performance of epsilon-greedy action-value methods on the 10-armed testbed](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_2_2.png)
3. [Figure 2.3: Optimistic initial action-value estimates](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_2_3.png)
4. [Figure 2.4: Average performance of UCB action selection on the 10-armed testbed](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_2_4.png)
5. [Figure 2.5: Average performance of the gradient bandit algorithm](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_2_5.png)
6. [Figure 2.6: A parameter study of the various bandit algorithms](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_2_6.png)

### Chapter 3
1. [Figure 3.2: Grid example with random policy](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_3_2.png)
2. [Figure 3.5: Optimal solutions to the gridworld example](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_3_5.png)

### Chapter 4
1. [Figure 4.1: Convergence of iterative policy evaluation on a small gridworld](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_4_1.png)
2. [Figure 4.2: Jack’s car rental problem](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_4_2.png)
3. [Figure 4.3: The solution to the gambler’s problem](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_4_3.png)

### Chapter 5
1. [Figure 5.1: Approximate state-value functions for the blackjack policy](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_5_1.png)
2. [Figure 5.2: The optimal policy and state-value function for blackjack found by Monte Carlo ES](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_5_2.png)
3. [Figure 5.3: Weighted importance sampling](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_5_3.png)
4. [Figure 5.4: Ordinary importance sampling with surprisingly unstable estimates](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_5_4.png)

### Chapter 6
1. [Example 6.2: Random walk](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/example_6_2.png)
2. [Figure 6.2: Batch updating](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_6_2.png)
3. [Figure 6.3: Sarsa applied to windy grid world](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_6_3.png)
4. [Figure 6.4: The cliff-walking task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_6_4.png)
5. [Figure 6.6: Interim and asymptotic performance of TD control methods](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_6_6.png)
6. [Figure 6.7: Comparison of Q-learning and Double Q-learning](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_6_7.png)

### Chapter 7
1. [Figure 7.2: Performance of n-step TD methods on 19-state random walk](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_7_2.png)

### Chapter 8
1. [Figure 8.2: Average learning curves for Dyna-Q agents varying in their number of planning steps](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_8_2.png)
2. [Figure 8.4: Average performance of Dyna agents on a blocking task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_8_4.png)
3. [Figure 8.5: Average performance of Dyna agents on a shortcut task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_8_5.png)
4. [Example 8.4: Prioritized sweeping significantly shortens learning time on the Dyna maze task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/example_8_4.png)
5. [Figure 8.7: Comparison of efficiency of expected and sample updates](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_8_7.png)
6. [Figure 8.8: Relative efficiency of different update distributions](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_8_8.png)

### Chapter 9
1. [Figure 9.1: Gradient Monte Carlo algorithm on the 1000-state random walk task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_9_1.png)
2. [Figure 9.2: Semi-gradient n-steps TD algorithm on the 1000-state random walk task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_9_2.png)
3. [Figure 9.5: Fourier basis vs polynomials on the 1000-state random walk task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_9_5.png)
4. [Figure 9.8: Example of feature width’s effect on initial generalization and asymptotic accuracy](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_9_8.png)
5. [Figure 9.10: Single tiling and multiple tilings on the 1000-state random walk task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_9_10.png)

### Chapter 10
1. [Figure 10.1: The cost-to-go function for Mountain Car task in one run](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_10_1.png)
2. [Figure 10.2: Learning curves for semi-gradient Sarsa on Mountain Car task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_10_2.png)
3. [Figure 10.3: One-step vs multi-step performance of semi-gradient Sarsa on the Mountain Car task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_10_3.png)
4. [Figure 10.4: Effect of the alpha and n on early performance of n-step semi-gradient Sarsa](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_10_4.png)
5. [Figure 10.5: Differential semi-gradient Sarsa on the access-control queuing task](https://raw.githubusercontent.com/ShangtongZhang/reinforcement-learning-an-introduction/master/images/figure_10_5.png)

### Chapter 11
1. [Figure 11.2: Baird's Countere