Reinforcement Learning Slides (International) - CSDN Library

12 files in total (PDF: 12)

Tags: reinforcement learning, machine learning, PPT

Uploaded 2018-01-26 12:23:28; ZIP archive, 19.24 MB


RL.zip (12 files)

Lecture2.pdf 816KB

Lecture3.pdf 805KB

Lecture1-Introduction.pdf 2.86MB

lecture10_games.pdf 2.96MB

lecture7_pg.pdf 1.79MB

Algorithms for RL.pdf 1.59MB

lecture9.pdf 1.28MB

Lecture5-Model-Free-Control.pdf 1.43MB

Lecture6_FA.pdf 1.9MB

Lecture4-MC-TD.pdf 1.39MB

lecture8_dyna.pdf 2.08MB

An Introduction to RL - SuttonBook.pdf 3.03MB

Reinforcement Learning:

An Introduction

Second edition, in progress

Richard S. Sutton and Andrew G. Barto

 2012

A Bradford Book

The MIT Press

Cambridge, Massachusetts

London, England

In memory of A. Harry Klopf

Contents

Preface
Series Foreword
Summary of Notation

I The Problem

1 Introduction
1.1 Reinforcement Learning
1.2 Examples
1.3 Elements of Reinforcement Learning
1.4 An Extended Example: Tic-Tac-Toe
1.5 Summary
1.6 History of Reinforcement Learning
1.7 Bibliographical Remarks

2 Bandit Problems
2.1 An n-Armed Bandit Problem
2.2 Action-Value Methods
2.3 Softmax Action Selection
2.4 Incremental Implementation
2.5 Tracking a Nonstationary Problem
2.6 Optimistic Initial Values
2.7 Associative Search (Contextual Bandits)
2.8 Conclusions
2.9 Bibliographical and Historical Remarks

3 The Reinforcement Learning Problem
3.1 The Agent–Environment Interface
3.2 Goals and Rewards
3.3 Returns
3.4 Unified Notation for Episodic and Continuing Tasks
∗3.5 The Markov Property
3.6 Markov Decision Processes
3.7 Value Functions
3.8 Optimal Value Functions
3.9 Optimality and Approximation
3.10 Summary
3.11 Bibliographical and Historical Remarks

II Tabular Action-Value Methods

4 Dynamic Programming
4.1 Policy Evaluation
4.2 Policy Improvement
4.3 Policy Iteration
4.4 Value Iteration
4.5 Asynchronous Dynamic Programming
4.6 Generalized Policy Iteration
4.7 Efficiency of Dynamic Programming
4.8 Summary
4.9 Bibliographical and Historical Remarks

5 Monte Carlo Methods
5.1 Monte Carlo Policy Evaluation
5.2 Monte Carlo Estimation of Action Values
5.3 Monte Carlo Control
5.4 On-Policy Monte Carlo Control
∗5.5 Evaluating One Policy While Following Another (Off-policy Policy Evaluation)
5.6 Off-Policy Monte Carlo Control
5.7 Incremental Implementation
5.8 Summary
5.9 Bibliographical and Historical Remarks

6 Temporal-Difference Learning
6.1 TD Prediction
6.2 Advantages of TD Prediction Methods
6.3 Optimality of TD(0)
6.4 Sarsa: On-Policy TD Control
6.5 Q-Learning: Off-Policy TD Control
6.6 Games, Afterstates, and Other Special Cases
6.7 Summary
6.8 Bibliographical and Historical Remarks

7 Eligibility Traces
7.1 n-Step TD Prediction
7.2 The Forward View of TD(λ)
7.3 The Backward View of TD(λ)
7.4 Equivalence of Forward and Backward Views
7.5 Sarsa(λ)
7.6 Q(λ)
7.7 Replacing Traces
7.8 Implementation Issues
∗7.9 Variable λ
7.10 Conclusions


Uploader: 1907530058

