RLCOURSECOMPLETE.pdf (PDF, 25.45 MB, 412 pages; uploaded 2023-08-29 09:57:10)
A Course in
Reinforcement Learning
by
Dimitri P. Bertsekas
Arizona State University
WWW site for book information and orders
http://www.athenasc.com
Athena Scientific, Belmont, Massachusetts
Athena Scientific
Post Office Box 805
Nashua, NH 03060
U.S.A.
Email: info@athenasc.com
WWW: http://www.athenasc.com
© 2023 Dimitri P. Bertsekas
All rights reserved. No part of this book may be reproduced in any form
by any electronic or mechanical means (including photocopying, recording,
or information storage and retrieval) without permission in writing from
the publisher.
Publisher’s Cataloging-in-Publication Data
Bertsekas, Dimitri P.
Class Notes for a Course on Reinforcement Learning
Includes Bibliography and Index
1. Mathematical Optimization. 2. Dynamic Programming. I. Title.
QA402.5 .B465 2020 519.703 00-91281
ISBN-10: 1-886529-??-?, ISBN-13: 978-1-886529-??-?
CONTENTS
1. Exact and Approximate Dynamic Programming
1.1. AlphaZero, Off-Line Training, and On-Line Play . . . . . . p. 4
1.2. Deterministic Dynamic Programming . . . . . . . . . . . p. 9
1.2.1. Finite Horizon Problem Formulation . . . . . . . . . p. 9
1.2.2. The Dynamic Programming Algorithm . . . . . . . . p. 13
1.2.3. Approximation in Value Space and Rollout . . . . . . p. 21
1.3. Stochastic Dynamic Programming . . . . . . . . . . . . . p. 27
1.3.1. Finite Horizon Problems . . . . . . . . . . . . . . p. 27
1.3.2. Approximation in Value Space for Stochastic DP . . . p. 32
1.3.3. Approximation in Policy Space . . . . . . . . . . . p. 36
1.4. Infinite Horizon Problems - An Overview . . . . . . . . . p. 39
1.4.1. Infinite Horizon Methodology . . . . . . . . . . . . p. 42
1.4.2. Approximation in Value Space - Infinite Horizon . . . p. 45
1.4.3. Understanding Approximation in Value Space . . . . . p. 51
1.5. Infinite Horizon Linear Quadratic Problems . . . . . . . . p. 53
1.5.1. Visualizing Approximation in Value Space - Newton’s Method . . . p. 59
1.5.2. Local and Global Error Bounds for Approximation in Value Space . . . p. 66
1.5.3. Rollout and Policy Iteration . . . . . . . . . . . . p. 68
1.6. Examples, Reformulations, and Simplifications . . . . . . . p. 71
1.6.1. A Few Words About Modeling . . . . . . . . . . . p. 72
1.6.2. Problems with a Termination State . . . . . . . . . p. 75
1.6.3. State Augmentation, Time Delays, Forecasts, and Uncontrollable State Components . . . p. 79
1.6.4. Partial State Information and Belief States . . . . . . p. 86
1.6.5. Multiagent Problems and Multiagent Rollout . . . . . p. 90
1.6.6. Problems with Unknown Parameters - Adaptive Control . . . p. 95
1.6.7. Model Predictive Control . . . . . . . . . . . . . p. 106
1.7. Reinforcement Learning and Decision/Control . . . . . . p. 116
1.7.1. Terminology . . . . . . . . . . . . . . . . . . p. 117
1.7.2. Notation . . . . . . . . . . . . . . . . . . . . p. 119
1.7.3. A Few Words about Machine Learning and Mathematical Optimization . . . p. 120
1.8. Notes, Sources, and Exercises . . . . . . . . . . . . . . p. 124
2. Approximation in Value Space - Rollout Algorithms
2.1. Deterministic Discrete Spaces Finite Horizon Problems . . p. 148
2.2. Approximation in Value Space . . . . . . . . . . . . . p. 157
2.3. Rollout Algorithms for Discrete Optimization . . . . . . . p. 158
2.3.1. Cost Improvement with Rollout - Sequential Consistency, Sequential Improvement . . . p. 163
2.3.2. The Fortified Rollout Algorithm . . . . . . . . . . . p. 170
2.3.3. Using Multiple Base Heuristics - Parallel Rollout . . . p. 173
2.3.4. Simplified Rollout Algorithms . . . . . . . . . . . . p. 174
2.3.5. Truncated Rollout with Terminal Cost Approximation . p. 175
2.3.6. Model-Free Rollout . . . . . . . . . . . . . . . . p. 176
2.4. Rollout and Approximation in Value Space with Multistep Lookahead . . . p. 180
2.4.1. Iterative Deepening Using Forward Dynamic Programming . . . p. 186
2.4.2. Incremental Multistep Rollout . . . . . . . . . . . . p. 188
2.5. Constrained Forms of Rollout Algorithms . . . . . . . . p. 190
2.5.1. Constrained Rollout for Discrete Optimization and Integer Programming . . . p. 202
2.6. Small Stage Costs and Long Horizon - Continuous-Time Rollout . . . p. 206
2.7. Stochastic Rollout and Monte Carlo Tree Search . . . . . p. 214
2.7.1. Simplified Rollout and Policy Iteration . . . . . . . . p. 218
2.7.2. Certainty Equivalence Approximations . . . . . . . . p. 219
2.7.3. Simulation-Based Implementation of the Rollout Algorithm . . . p. 220
2.7.4. Variance Reduction in Rollout - Comparing Advantages . . . p. 223
2.7.5. Monte Carlo Tree Search . . . . . . . . . . . . . . p. 226
2.7.6. Randomized Policy Improvement by Monte Carlo Tree Search . . . p. 229
2.8. Rollout for Infinite-Spaces Problems - Optimization Heuristics . . . p. 230
2.8.1. Rollout for Infinite-Spaces Deterministic Problems . . . p. 230
2.8.2. Rollout Based on Stochastic Programming . . . . . . p. 234
2.8.3. Stochastic Programming with Certainty Equivalence . . p. 237
2.9. Multiagent Rollout . . . . . . . . . . . . . . . . . . p. 238
2.9.1. Asynchronous and Autonomous Multiagent Rollout . . p. 249
2.10. Rollout for Bayesian Optimization and Sequential Estimation . . . p. 253
2.11. Adaptive Control by Rollout with a POMDP Formulation . . . p. 264
2.12. Rollout for Minimax Control . . . . . . . . . . . . . p. 272
2.13. Notes, Sources, and Exercises . . . . . . . . . . . . . p. 280