Lucian Buşoniu, Robert Babuška, Bart De Schutter, and Damien Ernst

Reinforcement learning and dynamic programming using function approximators

Preface

Control systems are making a tremendous impact on our society. Though invisible to most users, they are essential for the operation of nearly all devices – from basic home appliances to aircraft and nuclear power plants. Apart from technical systems, the principles of control are routinely applied and exploited in a variety of disciplines such as economics, medicine, social sciences, and artificial intelligence.

A common denominator in the diverse applications of control is the need to influence or modify the behavior of dynamic systems to attain prespecified goals. One approach to achieve this is to assign a numerical performance index to each state trajectory of the system. The control problem is then solved by searching for a control policy that drives the system along trajectories corresponding to the best value of the performance index. This approach essentially reduces the problem of finding good control policies to the search for solutions of a mathematical optimization problem.
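
As an illustration of this reduction (a minimal sketch that is not taken from the book), assume a hypothetical three-cell system, a discounted sum of rewards as the performance index, and a brute-force search over all deterministic policies. The names `step`, `performance_index`, and `best_policy`, the reward definition, the discount factor 0.9, and the horizon of 50 steps are all illustrative assumptions.

```python
from itertools import product

STATES = [0, 1, 2]   # hypothetical system: three cells on a line
ACTIONS = [-1, +1]   # move left or move right
GAMMA = 0.9          # assumed discount factor
HORIZON = 50         # assumed (truncated) trajectory length


def step(state, action):
    """Assumed dynamics: move along the line, clamped to the cells 0..2.

    The reward is 1 whenever the next state is cell 2, and 0 otherwise.
    """
    next_state = min(max(state + action, 0), 2)
    reward = 1.0 if next_state == 2 else 0.0
    return next_state, reward


def performance_index(policy, start_state):
    """Discounted return of the trajectory that `policy` generates."""
    state, total = start_state, 0.0
    for k in range(HORIZON):
        state, reward = step(state, policy[state])
        total += (GAMMA ** k) * reward
    return total


def best_policy():
    """Enumerate every deterministic policy and keep the best one.

    The score of a policy is its performance index summed over all
    possible start states.
    """
    best, best_value = None, float("-inf")
    for actions in product(ACTIONS, repeat=len(STATES)):
        policy = dict(zip(STATES, actions))
        value = sum(performance_index(policy, s) for s in STATES)
        if value > best_value:
            best, best_value = policy, value
    return best, best_value
```

In this toy system, `best_policy()` selects the policy that moves right in every cell. Exhaustive enumeration is only feasible because the example is tiny; it is meant to show the optimization view of control, not a practical algorithm.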

Early work in the field of optimal control dates back to the 1940s with the pioneering research of Pontryagin and Bellman. Dynamic programming (DP), introduced by Bellman, is still among the state-of-the-art tools commonly used to solve optimal control problems when a system model is available. The alternative idea of finding a solution in the absence of a model was explored as early as the 1960s. In the 1980s, a revival of interest in this model-free paradigm led to the development of the field of reinforcement learning (RL). The central theme in RL research is the design of algorithms that learn control policies solely from the knowledge of transition samples or trajectories, which are collected beforehand or by online interaction with the system. Most approaches developed to tackle the RL problem are closely related to DP algorithms.
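
To make the idea of learning from transition samples concrete, the following sketch applies a tabular Q-learning update to a batch of samples collected beforehand. It is an illustrative assumption rather than an algorithm quoted from the book: the three-cell system, the helper names (`step`, `collect_samples`, `q_learning`, `greedy_policy`), the learning rate, and the discount factor are all hypothetical. The learner itself never calls `step`; it only sees the tuples (state, action, reward, next state).

```python
STATES = [0, 1, 2]   # hypothetical system: three cells on a line
ACTIONS = [-1, +1]   # move left or move right
GAMMA = 0.9          # assumed discount factor


def step(state, action):
    """Assumed dynamics, used only to generate samples."""
    next_state = min(max(state + action, 0), 2)
    reward = 1.0 if next_state == 2 else 0.0
    return next_state, reward


def collect_samples():
    """One transition sample (s, a, r, s') per state-action pair."""
    samples = []
    for s in STATES:
        for a in ACTIONS:
            next_s, r = step(s, a)
            samples.append((s, a, r, next_s))
    return samples


def q_learning(samples, alpha=0.5, sweeps=500):
    """Repeatedly apply the Q-learning update to the stored samples."""
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(sweeps):
        for s, a, r, next_s in samples:
            # Temporal-difference target built from the sample alone.
            target = r + GAMMA * max(q[(next_s, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
    return q


def greedy_policy(q):
    """Pick, in every state, the action with the largest Q-value."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}
```

The update inside the loop is a sampled version of the Bellman backup used by DP value iteration, which is one way to see the close relation between RL and DP mentioned above.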

A core obstacle in DP and RL is that solutions cannot be represented exactly for problems with large discrete state-action spaces or continuous spaces. Instead, compact representations relying on function approximators must be used. This challenge was already recognized while the first DP techniques were being developed. However, it has only been in recent years – and largely in correlation with the advance of RL – that approximation-based methods have grown in diversity, maturity, and efficiency, enabling RL and DP to scale up to realistic problems.
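
A compact representation can be as simple as state aggregation, sketched below under stated assumptions: a value function over the continuous interval [0, 1] is stored as one parameter per bin instead of one value per state. The function names, the number of bins, and the quadratic target function in the usage example are hypothetical and chosen only for illustration; they are not taken from the book.

```python
def bin_index(x, n_bins):
    """Map a state x in [0, 1] to the index of the bin that contains it."""
    return min(int(x * n_bins), n_bins - 1)


def fit_aggregation(samples, n_bins):
    """Fit one parameter per bin: the mean of the sampled values in it.

    `samples` is a list of (state, value) pairs. Empty bins default to 0.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, v in samples:
        i = bin_index(x, n_bins)
        sums[i] += v
        counts[i] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


def approx_value(theta, x):
    """Evaluate the piecewise-constant approximation at state x."""
    return theta[bin_index(x, len(theta))]


# Usage: approximate the assumed target v(x) = x**2 with 10 parameters.
samples = [(i / 999, (i / 999) ** 2) for i in range(1000)]
theta = fit_aggregation(samples, n_bins=10)
max_error = max(abs(approx_value(theta, x) - v) for x, v in samples)
```

Ten numbers stand in for a function defined on infinitely many states, at the cost of an approximation error that shrinks as the number of bins grows. This trade-off between compactness and accuracy is the reason approximate solutions need their own analysis.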

This book provides an accessible in-depth treatment of reinforcement learning and dynamic programming methods using function approximators. We start with a concise introduction to classical DP and RL, in order to build the foundation for the remainder of the book. Next, we present an extensive review of state-of-the-art approaches to DP and RL with approximation. Theoretical guarantees are provided on the solutions obtained, and numerical examples and comparisons are used to illustrate the properties of the individual methods. The remaining three chapters are dedicated to a detailed presentation of representative algorithms from the three major classes of techniques: value iteration, policy iteration, and policy search. The properties and the performance of these algorithms are highlighted in simulation and experimental studies on a range of control applications.

We believe that this balanced combination of practical algorithms, theoretical analysis, and comprehensive examples makes our book suitable not only for researchers, teachers, and graduate students in the fields of optimal and adaptive control, machine learning and artificial intelligence, but also for practitioners seeking novel strategies for solving challenging real-life control problems.

This book can be read in several ways. Readers unfamiliar with the field are advised to start with Chapter 1 for a gentle introduction, and continue with Chapter 2 (which discusses classical DP and RL) and Chapter 3 (which considers approximation-based methods). Those who are familiar with the basic concepts of RL and DP may consult the list of notations given at the end of the book, and then start directly with Chapter 3. This first part of the book is sufficient to get an overview of the field. Thereafter, readers can pick any combination of Chapters 4 to 6, depending on their interests: approximate value iteration (Chapter 4), approximate policy iteration and online learning (Chapter 5), or approximate policy search (Chapter 6).

Supplementary information relevant to this book, including a complete archive of the computer code used in the experimental studies, is available at the Web site:

http://www.dcsc.tudelft.nl/rlbook/

Comments, suggestions, or questions concerning the book or the Web site are welcome. Interested readers are encouraged to get in touch with the authors using the contact information on the Web site.

The authors have been inspired over the years by many scientists who undoubtedly left their mark on this book; in particular by Louis Wehenkel, Pierre Geurts, Guy-Bart Stan, Rémi Munos, Martin Riedmiller, and Michail Lagoudakis. Pierre Geurts also provided the computer program for building ensembles of regression trees, used in several examples in the book. This work would not have been possible without our colleagues, students, and the excellent professional environments at the Delft Center for Systems and Control of the Delft University of Technology, the Netherlands, the Montefiore Institute of the University of Liège, Belgium, and at Supélec Rennes, France. Among our colleagues in Delft, Justin Rice deserves special mention for carefully proofreading the manuscript. To all these people we extend our sincere thanks.

We thank Sam Ge for giving us the opportunity to publish our book with Taylor & Francis CRC Press, and the editorial and production team at Taylor & Francis for their valuable help. We gratefully acknowledge the financial support of the BSIK-ICIS project “Interactive Collaborative Information Systems” (grant no. BSIK03024) and the Dutch funding organizations NWO and STW. Damien Ernst is a Research Associate of the FRS-FNRS, the financial support of which he acknowledges. We appreciate the kind permission offered by the IEEE to reproduce material from our previous works over which they hold copyright.
