RLCOURSECOMPLETE.pdf resource - CSDN Library

Uploaded 2023-08-29 09:57:10 · 25.45 MB · PDF


CONTENTS

1. Exact and Approximate Dynamic Programming

1.1. AlphaZero, Off-Line Training, and On-Line Play . . . . . . p. 4

1.2. Deterministic Dynamic Programming . . . . . . . . . . . p. 9

1.2.1. Finite Horizon Problem Formulation . . . . . . . . . p. 9

1.2.2. The Dynamic Programming Algorithm . . . . . . . . p. 13

1.2.3. Approximation in Value Space and Rollout . . . . . . p. 21

1.3. Stochastic Dynamic Programming . . . . . . . . . . . . . p. 27

1.3.1. Finite Horizon Problems . . . . . . . . . . . . . . p. 27

1.3.2. Approximation in Value Space for Stochastic DP . . . p. 32

1.3.3. Approximation in Policy Space . . . . . . . . . . . p. 36

1.4. Infinite Horizon Problems - An Overview . . . . . . . . . p. 39

1.4.1. Infinite Horizon Methodology . . . . . . . . . . . . p. 42

1.4.2. Approximation in Value Space - Infinite Horizon . . . p. 45

1.4.3. Understanding Approximation in Value Space . . . . . p. 51

1.5. Infinite Horizon Linear Quadratic Problems . . . . . . . . p. 53

1.5.1. Visualizing Approximation in Value Space - Newton’s Method . . . p. 59

1.5.2. Local and Global Error Bounds for Approximation in Value Space . . . p. 66

1.5.3. Rollout and Policy Iteration . . . . . . . . . . . . p. 68

1.6. Examples, Reformulations, and Simplifications . . . . . . . p. 71

1.6.1. A Few Words About Modeling . . . . . . . . . . . p. 72

1.6.2. Problems with a Termination State . . . . . . . . . p. 75

1.6.3. State Augmentation, Time Delays, Forecasts, and Uncontrollable State Components . . . p. 79

1.6.4. Partial State Information and Belief States . . . . . . p. 86

1.6.5. Multiagent Problems and Multiagent Rollout . . . . . p. 90

1.6.6. Problems with Unknown Parameters - Adaptive Control . . . p. 95

1.6.7. Model Predictive Control . . . . . . . . . . . . . p. 106

1.7. Reinforcement Learning and Decision/Control . . . . . . p. 116

1.7.1. Terminology . . . . . . . . . . . . . . . . . . p. 117

1.7.2. Notation . . . . . . . . . . . . . . . . . . . . p. 119

1.7.3. A Few Words about Machine Learning and Mathematical Optimization . . . p. 120

1.8. Notes, Sources, and Exercises . . . . . . . . . . . . . . p. 124

2. Approximation in Value Space - Rollout Algorithms

2.1. Deterministic Discrete Spaces Finite Horizon Problems . . p. 148

2.2. Approximation in Value Space . . . . . . . . . . . . . p. 157

2.3. Rollout Algorithms for Discrete Optimization . . . . . . . p. 158

2.3.1. Cost Improvement with Rollout - Sequential Consistency, Sequential Improvement . . . p. 163

2.3.2. The Fortified Rollout Algorithm . . . . . . . . . . . p. 170

2.3.3. Using Multiple Base Heuristics - Parallel Rollout . . . p. 173

2.3.4. Simplified Rollout Algorithms . . . . . . . . . . . . p. 174

2.3.5. Truncated Rollout with Terminal Cost Approximation . p. 175

2.3.6. Model-Free Rollout . . . . . . . . . . . . . . . . p. 176

2.4. Rollout and Approximation in Value Space with Multistep Lookahead . . . p. 180

2.4.1. Iterative Deepening Using Forward Dynamic Programming . . . p. 186

2.4.2. Incremental Multistep Rollout . . . . . . . . . . . . p. 188

2.5. Constrained Forms of Rollout Algorithms . . . . . . . . p. 190

2.5.1. Constrained Rollout for Discrete Optimization and Integer Programming . . . p. 202

2.6. Small Stage Costs and Long Horizon - Continuous-Time Rollout . . . p. 206

2.7. Stochastic Rollout and Monte Carlo Tree Search . . . . . p. 214

2.7.1. Simplified Rollout and Policy Iteration . . . . . . . . p. 218

2.7.2. Certainty Equivalence Approximations . . . . . . . . p. 219

2.7.3. Simulation-Based Implementation of the Rollout Algorithm . . . p. 220

2.7.4. Variance Reduction in Rollout - Comparing Advantages . . . p. 223

2.7.5. Monte Carlo Tree Search . . . . . . . . . . . . . . p. 226

2.7.6. Randomized Policy Improvement by Monte Carlo Tree Search . . . p. 229

2.8. Rollout for Infinite-Spaces Problems - Optimization Heuristics . . . p. 230

2.8.1. Rollout for Infinite-Spaces Deterministic Problems . . . p. 230

2.8.2. Rollout Based on Stochastic Programming . . . . . . p. 234

2.8.3. Stochastic Programming with Certainty Equivalence . . p. 237

2.9. Multiagent Rollout . . . . . . . . . . . . . . . . . . p. 238

2.9.1. Asynchronous and Autonomous Multiagent Rollout . . p. 249

2.10. Rollout for Bayesian Optimization and Sequential Estimation . . . p. 253

2.11. Adaptive Control by Rollout with a POMDP Formulation . . . p. 264

2.12. Rollout for Minimax Control . . . . . . . . . . . . . p. 272

2.13. Notes, Sources, and Exercises . . . . . . . . . . . . . p. 280


