Reinforcement Learning:
An Introduction
Richard S. Sutton and Andrew G. Barto
A Bradford Book
The MIT Press
Cambridge, Massachusetts
London, England
In memory of A. Harry Klopf
● Contents
❍ Preface
❍ Series Foreword
❍ Summary of Notation
● I. The Problem
❍ 1. Introduction
■ 1.1 Reinforcement Learning
■ 1.2 Examples
■ 1.3 Elements of Reinforcement Learning
■ 1.4 An Extended Example: Tic-Tac-Toe
■ 1.5 Summary
■ 1.6 History of Reinforcement Learning
■ 1.7 Bibliographical Remarks
❍ 2. Evaluative Feedback
■ 2.1 An n-Armed Bandit Problem
■ 2.2 Action-Value Methods
■ 2.3 Softmax Action Selection
■ 2.4 Evaluation Versus Instruction
■ 2.5 Incremental Implementation
■ 2.6 Tracking a Nonstationary Problem
■ 2.7 Optimistic Initial Values
■ 2.8 Reinforcement Comparison
■ 2.9 Pursuit Methods
■ 2.10 Associative Search
■ 2.11 Conclusions
■ 2.12 Bibliographical and Historical Remarks
❍ 3. The Reinforcement Learning Problem
■ 3.1 The Agent-Environment Interface
■ 3.2 Goals and Rewards
■ 3.3 Returns
■ 3.4 Unified Notation for Episodic and Continuing Tasks
■ 3.5 The Markov Property
■ 3.6 Markov Decision Processes
■ 3.7 Value Functions
■ 3.8 Optimal Value Functions
■ 3.9 Optimality and Approximation
■ 3.10 Summary
■ 3.11 Bibliographical and Historical Remarks
● II. Elementary Solution Methods
❍ 4. Dynamic Programming
■ 4.1 Policy Evaluation
■ 4.2 Policy Improvement
■ 4.3 Policy Iteration
■ 4.4 Value Iteration
■ 4.5 Asynchronous Dynamic Programming
■ 4.6 Generalized Policy Iteration
■ 4.7 Efficiency of Dynamic Programming
■ 4.8 Summary
■ 4.9 Bibliographical and Historical Remarks
❍ 5. Monte Carlo Methods
■ 5.1 Monte Carlo Policy Evaluation
■ 5.2 Monte Carlo Estimation of Action Values
■ 5.3 Monte Carlo Control
■ 5.4 On-Policy Monte Carlo Control
■ 5.5 Evaluating One Policy While Following Another
■ 5.6 Off-Policy Monte Carlo Control
■ 5.7 Incremental Implementation
■ 5.8 Summary
■ 5.9 Bibliographical and Historical Remarks
❍ 6. Temporal-Difference Learning
■ 6.1 TD Prediction
■ 6.2 Advantages of TD Prediction Methods
■ 6.3 Optimality of TD(0)
■ 6.4 Sarsa: On-Policy TD Control
■ 6.5 Q-Learning: Off-Policy TD Control
■ 6.6 Actor-Critic Methods
■ 6.7 R-Learning for Undiscounted Continuing Tasks
■ 6.8 Games, Afterstates, and Other Special Cases
■ 6.9 Summary
■ 6.10 Bibliographical and Historical Remarks
● III. A Unified View
❍ 7. Eligibility Traces
■ 7.1 n-Step TD Prediction
■ 7.2 The Forward View of TD(λ)
■ 7.3 The Backward View of TD(λ)
■ 7.4 Equivalence of Forward and Backward Views
■ 7.5 Sarsa(λ)
■ 7.6 Q(λ)
■ 7.7 Eligibility Traces for Actor-Critic Methods
■ 7.8 Replacing Traces
■ 7.9 Implementation Issues
■ 7.10 Variable λ
■ 7.11 Conclusions
■ 7.12 Bibliographical and Historical Remarks
❍ 8. Generalization and Function Approximation
■ 8.1 Value Prediction with Function Approximation
■ 8.2 Gradient-Descent Methods
■ 8.3 Linear Methods
■ 8.3.1 Coarse Coding
■ 8.3.2 Tile Coding
■ 8.3.3 Radial Basis Functions
■ 8.3.4 Kanerva Coding
■ 8.4 Control with Function Approximation
■ 8.5 Off-Policy Bootstrapping
■ 8.6 Should We Bootstrap?
■ 8.7 Summary
■ 8.8 Bibliographical and Historical Remarks
❍ 9. Planning and Learning
■ 9.1 Models and Planning
■ 9.2 Integrating Planning, Acting, and Learning
■ 9.3 When the Model Is Wrong
■ 9.4 Prioritized Sweeping
■ 9.5 Full vs. Sample Backups
■ 9.6 Trajectory Sampling
■ 9.7 Heuristic Search
■ 9.8 Summary
■ 9.9 Bibliographical and Historical Remarks
❍ 10. Dimensions of Reinforcement Learning
■ 10.1 The Unified View
■ 10.2 Other Frontier Dimensions
❍ 11. Case Studies
■ 11.1 TD-Gammon
■ 11.2 Samuel's Checkers Player
■ 11.3 The Acrobot
■ 11.4 Elevator Dispatching
■ 11.5 Dynamic Channel Allocation
■ 11.6 Job-Shop Scheduling
● Bibliography
● Index