Reinforcement Learning:
An Introduction
Richard S. Sutton and Andrew G. Barto
A Bradford Book
The MIT Press
Cambridge, Massachusetts
London, England
In memory of A. Harry Klopf
● Contents
❍ Preface
❍ Series Foreword
❍ Summary of Notation
● I. The Problem
❍ 1. Introduction
■ 1.1 Reinforcement Learning
■ 1.2 Examples
■ 1.3 Elements of Reinforcement Learning
■ 1.4 An Extended Example: Tic-Tac-Toe
■ 1.5 Summary
■ 1.6 History of Reinforcement Learning
■ 1.7 Bibliographical Remarks
❍ 2. Evaluative Feedback
■ 2.1 An n-Armed Bandit Problem
■ 2.2 Action-Value Methods
■ 2.3 Softmax Action Selection
■ 2.4 Evaluation Versus Instruction
■ 2.5 Incremental Implementation
■ 2.6 Tracking a Nonstationary Problem
■ 2.7 Optimistic Initial Values
■ 2.8 Reinforcement Comparison
■ 2.9 Pursuit Methods
■ 2.10 Associative Search
■ 2.11 Conclusions
■ 2.12 Bibliographical and Historical Remarks
❍ 3. The Reinforcement Learning Problem
■ 3.1 The Agent-Environment Interface
■ 3.2 Goals and Rewards
■ 3.3 Returns
■ 3.4 Unified Notation for Episodic and Continuing Tasks
■ 3.5 The Markov Property
■ 3.6 Markov Decision Processes
■ 3.7 Value Functions
■ 3.8 Optimal Value Functions
■ 3.9 Optimality and Approximation
■ 3.10 Summary
■ 3.11 Bibliographical and Historical Remarks
● II. Elementary Solution Methods
❍ 4. Dynamic Programming
■ 4.1 Policy Evaluation
■ 4.2 Policy Improvement
■ 4.3 Policy Iteration
■ 4.4 Value Iteration
■ 4.5 Asynchronous Dynamic Programming
■ 4.6 Generalized Policy Iteration
■ 4.7 Efficiency of Dynamic Programming
■ 4.8 Summary
■ 4.9 Bibliographical and Historical Remarks
❍ 5. Monte Carlo Methods
■ 5.1 Monte Carlo Policy Evaluation
■ 5.2 Monte Carlo Estimation of Action Values
■ 5.3 Monte Carlo Control
■ 5.4 On-Policy Monte Carlo Control
■ 5.5 Evaluating One Policy While Following Another
■ 5.6 Off-Policy Monte Carlo Control
■ 5.7 Incremental Implementation
■ 5.8 Summary
■ 5.9 Bibliographical and Historical Remarks
❍ 6. Temporal-Difference Learning
■ 6.1 TD Prediction
■ 6.2 Advantages of TD Prediction Methods
■ 6.3 Optimality of TD(0)
■ 6.4 Sarsa: On-Policy TD Control
■ 6.5 Q-Learning: Off-Policy TD Control
■ 6.6 Actor-Critic Methods
■ 6.7 R-Learning for Undiscounted Continuing Tasks
■ 6.8 Games, Afterstates, and Other Special Cases
■ 6.9 Summary
■ 6.10 Bibliographical and Historical Remarks
● III. A Unified View
❍ 7. Eligibility Traces
■ 7.1 n-Step TD Prediction
■ 7.2 The Forward View of TD(λ)
■ 7.3 The Backward View of TD(λ)
■ 7.4 Equivalence of Forward and Backward Views
■ 7.5 Sarsa(λ)
■ 7.6 Q(λ)
■ 7.7 Eligibility Traces for Actor-Critic Methods
■ 7.8 Replacing Traces
■ 7.9 Implementation Issues
■ 7.10 Variable λ
■ 7.11 Conclusions
■ 7.12 Bibliographical and Historical Remarks
❍ 8. Generalization and Function Approximation
■ 8.1 Value Prediction with Function Approximation
■ 8.2 Gradient-Descent Methods
■ 8.3 Linear Methods
■ 8.3.1 Coarse Coding
■ 8.3.2 Tile Coding
■ 8.3.3 Radial Basis Functions
■ 8.3.4 Kanerva Coding
■ 8.4 Control with Function Approximation
■ 8.5 Off-Policy Bootstrapping
■ 8.6 Should We Bootstrap?
■ 8.7 Summary
■ 8.8 Bibliographical and Historical Remarks
❍ 9. Planning and Learning
■ 9.1 Models and Planning
■ 9.2 Integrating Planning, Acting, and Learning
■ 9.3 When the Model Is Wrong
■ 9.4 Prioritized Sweeping
■ 9.5 Full vs. Sample Backups
■ 9.6 Trajectory Sampling
■ 9.7 Heuristic Search
■ 9.8 Summary
■ 9.9 Bibliographical and Historical Remarks
❍ 10. Dimensions of Reinforcement Learning
■ 10.1 The Unified View
■ 10.2 Other Frontier Dimensions
❍ 11. Case Studies
■ 11.1 TD-Gammon
■ 11.2 Samuel's Checkers Player
■ 11.3 The Acrobot
■ 11.4 Elevator Dispatching
■ 11.5 Dynamic Channel Allocation
■ 11.6 Job-Shop Scheduling
● Bibliography
● Index