reinforcementlearning：anintroduction2018最新版bookpdf格式资源-CSDN文库

共1个文件

pdf：1个

强化学习

5星 · 超过95%的资源需积分: 47 123 浏览量 2018-04-23 15:03:53 上传评论 4 收藏 16.16MB ZIP 举报

资源推荐

资源详情

资源评论

收起资源包目录

book-Reinforcement Learning An Introduction2018最新版.zip （1个子文件）

book-Reinforcement Learning An Introduction2018最新版.pdf 18.44MB

Reinforcement Learning:

An Introduction

Second edition

****Complete draft****

March 25, 2018

Richard S. Sutton and Andrew G. Barto

 2014, 2015, 2016, 2017, 2018

The text is now complete and formatted. Some ﬁgures are still being re-made. Please

send any errors to rich@richsutton.com. We are also interested in correcting any im-

portant omissions in the “Bibliographical and Historical Remarks” at the end of each

chapter.

A Bradford Book

The MIT Press

Cambridge, Massachusetts

London, England

In memory of A. Harry Klopf

Contents

Preface to the Second Edition ix

Preface to the First Edition xv

Summary of Notation xvii

1 Introduction 1

1.1 Reinforcement Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.3 Elements of Reinforcement Learning . . . . . . . . . . . . . . . . . . . . . 6

1.4 Limitations and Scope . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.5 An Extended Example: Tic-Tac-Toe . . . . . . . . . . . . . . . . . . . . . 8

1.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

1.7 Early History of Reinforcement Learning . . . . . . . . . . . . . . . . . . . 13

I Tabular Solution Methods 23

2 Multi-armed Bandits 25

2.1 A k-armed Bandit Problem . . . . . . . . . . . . . . . . . . . . . . . . . . 25

2.2 Action-value Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

2.3 The 10-armed Testbed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

2.4 Incremental Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . 30

2.5 Tracking a Nonstationary Problem . . . . . . . . . . . . . . . . . . . . . . 32

2.6 Optimistic Initial Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

2.7 Upper-Conﬁdence-Bound Action Selection . . . . . . . . . . . . . . . . . . 35

2.8 Gradient Bandit Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . 37

2.9 Associative Search (Contextual Bandits) . . . . . . . . . . . . . . . . . . . 41

2.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

iii

iv Contents

3 Finite Markov Decision Processes 47

3.1 The Agent–Environment Interface . . . . . . . . . . . . . . . . . . . . . . 47

3.2 Goals and Rewards . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

3.3 Returns and Episodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

3.4 Uniﬁed Notation for Episodic and Continuing Tasks . . . . . . . . . . . . 57

3.5 Policies and Value Functions . . . . . . . . . . . . . . . . . . . . . . . . . 58

3.6 Optimal Policies and Optimal Value Functions . . . . . . . . . . . . . . . 62

3.7 Optimality and Approximation . . . . . . . . . . . . . . . . . . . . . . . . 67

3.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

4 Dynamic Programming 73

4.1 Policy Evaluation (Prediction) . . . . . . . . . . . . . . . . . . . . . . . . 74

4.2 Policy Improvement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76

4.3 Policy Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

4.4 Value Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

4.5 Asynchronous Dynamic Programming . . . . . . . . . . . . . . . . . . . . 85

4.6 Generalized Policy Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . 86

4.7 Eﬃciency of Dynamic Programming . . . . . . . . . . . . . . . . . . . . . 87

4.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

5 Monte Carlo Methods 91

5.1 Monte Carlo Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92

5.2 Monte Carlo Estimation of Action Values . . . . . . . . . . . . . . . . . . 96

5.3 Monte Carlo Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

5.4 Monte Carlo Control without Exploring Starts . . . . . . . . . . . . . . . 100

5.5 Oﬀ-policy Prediction via Importance Sampling . . . . . . . . . . . . . . . 103

5.6 Incremental Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . 108

5.7 Oﬀ-policy Monte Carlo Control . . . . . . . . . . . . . . . . . . . . . . . . 110

5.8 *Discounting-aware Importance Sampling . . . . . . . . . . . . . . . . . . 112

5.9 *Per-decision Importance Sampling . . . . . . . . . . . . . . . . . . . . . . 113

5.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

6 Temporal-Diﬀerence Learning 119

6.1 TD Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

6.2 Advantages of TD Prediction Methods . . . . . . . . . . . . . . . . . . . . 124

6.3 Optimality of TD(0) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126

6.4 Sarsa: On-policy TD Control . . . . . . . . . . . . . . . . . . . . . . . . . 129

6.5 Q-learning: Oﬀ-policy TD Control . . . . . . . . . . . . . . . . . . . . . . 131

6.6 Expected Sarsa . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

6.7 Maximization Bias and Double Learning . . . . . . . . . . . . . . . . . . . 134

6.8 Games, Afterstates, and Other Special Cases . . . . . . . . . . . . . . . . 136

6.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137

Contents v

7 n-step Bootstrapping 141

7.1 n-step TD Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142

7.2 n-step Sarsa . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

7.3 n-step Oﬀ-policy Learning by Importance Sampling . . . . . . . . . . . . 148

7.4 *Per-decision Oﬀ-policy Methods with Control

Variates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150

7.5 Oﬀ-policy Learning Without Importance Sampling:

The n-step Tree Backup Algorithm . . . . . . . . . . . . . . . . . . . . . . 152

7.6 *A Unifying Algorithm: n-step Q(σ) . . . . . . . . . . . . . . . . . . . . . 154

7.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157

8 Planning and Learning with Tabular Methods 159

8.1 Models and Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159

8.2 Dyna: Integrated Planning, Acting, and Learning . . . . . . . . . . . . . . 161

8.3 When the Model Is Wrong . . . . . . . . . . . . . . . . . . . . . . . . . . . 166

8.4 Prioritized Sweeping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168

8.5 Expected vs. Sample Updates . . . . . . . . . . . . . . . . . . . . . . . . . 172

8.6 Trajectory Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174

8.7 Real-time Dynamic Programming . . . . . . . . . . . . . . . . . . . . . . . 177

8.8 Planning at Decision Time . . . . . . . . . . . . . . . . . . . . . . . . . . . 180

8.9 Heuristic Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181

8.10 Rollout Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183

8.11 Monte Carlo Tree Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185

8.12 Summary of the Chapter . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188

8.13 Summary of Part I: Dimensions . . . . . . . . . . . . . . . . . . . . . . . . 189

II Approximate Solution Methods 195

9 On-policy Prediction with Approximation 197

9.1 Value-function Approximation . . . . . . . . . . . . . . . . . . . . . . . . . 198

9.2 The Prediction Objective (VE) . . . . . . . . . . . . . . . . . . . . . . . . 199

9.3 Stochastic-gradient and Semi-gradient Methods . . . . . . . . . . . . . . . 200

9.4 Linear Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204

9.5 Feature Construction for Linear Methods . . . . . . . . . . . . . . . . . . 210

9.5.1 Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 210

9.5.2 Fourier Basis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211

9.5.3 Coarse Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215

9.5.4 Tile Coding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 217

9.5.5 Radial Basis Functions . . . . . . . . . . . . . . . . . . . . . . . . . 221

9.6 Selecting Step-Size Parameters Manually . . . . . . . . . . . . . . . . . . . 222

9.7 Nonlinear Function Approximation: Artiﬁcial Neural Networks . . . . . . 223

评论收藏

内容反馈

erzhongdetaozi

2020-06-04

很专业的书，可以好好学一学强化学习了

fe_ng_ji

粉丝: 3
资源: 6

reinforcement learning ：an introduction 2018最新版book pdf格式

最新资源

reinforcement learning ：an introduction 2018最新版book pdf格式

Introduction to Reinforcement Learning

Reinforcement Learning：An Introduction.pdf

Reinforcement Learning An Introduction(2nd)2018.pdf

An_introduction_to_Reinforcement_Learning.pdf

Reinforcement Learning-An introduction

Reinforcement learning an introduction中文pdf

reinforcement-learning-an-introduction(2018年3月最新版550页)+配套源码+配套David Silver RL课件

Reinforcement Learning_An Introduction多版本合集

强化学习导论中文版 增强学习导论中文版 Reinforcement learning an introduction 中文版.

Reinforcement Learning: An Introduction 2018年 第二版和之前2015中文翻译版

Reinforcement Learning - An Introduction

Reinforcement Learning An Introduction

Reinforcement Learning_ An Introduction

An Introduction to Deep Reinforcement Learning.pdf

Reinforcement Learning: An Introduction

Reinforcement Learning：An Introduction PDF文档+源代码

Reinforcement Learning: An Introduction second edition

reinforcement-learning-an-introduction-PDF和源码（2017版）

增强学习导论（Reinforcement Learning An Introduction_Sutton）

An introduction of RL.pdf

Reinforcement Learning: An Introduction（最新版书籍+代码，February 28, 2018）

RL-AnIntroduction-sutton强化学习-2018七月最新版

An Introduction to Reinforcement Learning

Reinforcement Learning An Introduction.pdf

Reinforcement Learning An Introduction 2nd solutions - 副本.pdf

Reinforcement Learning 2nd(Richard_S._Sutton).pdf

最新资源

强化学习导论中文版增强学习导论中文版 Reinforcement learning an introduction 中文版.

Reinforcement Learning: An Introduction 2018年第二版和之前2015中文翻译版