dedicated to a detailed presentation of representative algorithms from the three ma-
jor classes of techniques: value iteration, policy iteration, and policy search. The
properties and the performance of these algorithms are highlighted in simulation and
experimental studies on a range of control applications.
We believe that this balanced combination of practical algorithms, theoretical
analysis, and comprehensive examples makes our book suitable not only for re-
searchers, teachers, and graduate students in the fields of optimal and adaptive con-
trol, machine learning and artificial intelligence, but also for practitioners seeking
novel strategies for solving challenging real-life control problems.
This book can be read in several ways. Readers unfamiliar with the field are
advised to start with Chapter 1 for a gentle introduction, and continue with Chap-
ter 2 (which discusses classical DP and RL) and Chapter 3 (which considers
approximation-based methods). Those who are familiar with the basic concepts of
RL and DP may consult the list of notations given at the end of the book, and then
start directly with Chapter 3. This first part of the book is sufficient to get an overview
of the field. Thereafter, readers can pick any combination of Chapters 4 to 6,
depending on their interests: approximate value iteration (Chapter 4), approximate policy
iteration and online learning (Chapter 5), or approximate policy search (Chapter 6).
Supplementary information relevant to this book, including a complete archive
of the computer code used in the experimental studies, is available at the Web site:
http://www.dcsc.tudelft.nl/rlbook/
Comments, suggestions, or questions concerning the book or the Web site are wel-
come. Interested readers are encouraged to get in touch with the authors using the
contact information on the Web site.
The authors have been inspired over the years by many scientists who undoubt-
edly left their mark on this book; in particular by Louis Wehenkel, Pierre Geurts,
Guy-Bart Stan, Rémi Munos, Martin Riedmiller, and Michail Lagoudakis. Pierre
Geurts also provided the computer program for building ensembles of regression
trees, used in several examples in the book. This work would not have been pos-
sible without our colleagues, students, and the excellent professional environments
at the Delft Center for Systems and Control of the Delft University of Technology,
the Netherlands, the Montefiore Institute of the University of Liège, Belgium, and at
Supélec Rennes, France. Among our colleagues in Delft, Justin Rice deserves special
mention for carefully proofreading the manuscript. To all these people we extend our
sincere thanks.
We thank Sam Ge for giving us the opportunity to publish our book with Taylor
& Francis CRC Press, and the editorial and production team at Taylor & Francis for
their valuable help. We gratefully acknowledge the financial support of the BSIK-
ICIS project “Interactive Collaborative Information Systems” (grant no. BSIK03024)
and the Dutch funding organizations NWO and STW. Damien Ernst is a Research
Associate of the FRS-FNRS, the financial support of which he acknowledges. We
appreciate the kind permission offered by the IEEE to reproduce material from our
previous works over which they hold copyright.