DEEP REINFORCEMENT LEARNING: AN OVERVIEW
Yuxi Li (yuxili@gmail.com)
ABSTRACT
We give an overview of recent exciting achievements of deep reinforcement learn-
ing (RL). We discuss six core elements, six important mechanisms, and twelve
applications. We start with the background of machine learning, deep learning, and
reinforcement learning. Next we discuss core RL elements, including value func-
tion, in particular, Deep Q-Network (DQN), policy, reward, model, planning, and
exploration. After that, we discuss important mechanisms for RL, including at-
tention and memory, unsupervised learning, transfer learning, multi-agent RL, hi-
erarchical RL, and learning to learn. Then we discuss various applications of RL,
including games, in particular, AlphaGo, robotics, natural language processing,
including dialogue systems, machine translation, and text generation, computer
vision, neural architecture design, business management, finance, healthcare, In-
dustry 4.0, smart grid, intelligent transportation systems, and computer systems.
We mention topics not reviewed yet, and list a collection of RL resources. After
presenting a brief summary, we close with discussions.
This is the first overview of deep reinforcement learning publicly available online, and it is
comprehensive. Comments and criticisms are welcome.
arXiv:1701.07274v5 [cs.LG] 15 Sep 2017
CONTENTS

1 Introduction
2 Background
   2.1 Machine Learning
   2.2 Deep Learning
   2.3 Reinforcement Learning
      2.3.1 Problem Setup
      2.3.2 Value Function
      2.3.3 Temporal Difference Learning
      2.3.4 Multi-step Bootstrapping
      2.3.5 Function Approximation
      2.3.6 Policy Optimization
      2.3.7 Deep Reinforcement Learning
      2.3.8 RL Parlance
      2.3.9 Brief Summary
3 Core Elements
   3.1 Value Function
      3.1.1 Deep Q-Network (DQN)
      3.1.2 Double DQN
      3.1.3 Prioritized Experience Replay
      3.1.4 Dueling Architecture
      3.1.5 More DQN Extensions
   3.2 Policy
      3.2.1 Actor-Critic
      3.2.2 Policy Gradient
      3.2.3 Combining Policy Gradient with Off-Policy RL
   3.3 Reward
   3.4 Model
   3.5 Planning
   3.6 Exploration
4 Important Mechanisms
   4.1 Attention and Memory
   4.2 Unsupervised Learning
      4.2.1 Horde
      4.2.2 Unsupervised Auxiliary Learning
      4.2.3 Generative Adversarial Networks
   4.3 Transfer Learning
   4.4 Multi-Agent Reinforcement Learning
   4.5 Hierarchical Reinforcement Learning
   4.6 Learning to Learn
5 Applications
   5.1 Games
      5.1.1 Perfect Information Board Games
      5.1.2 Imperfect Information Board Games
      5.1.3 Video Games
   5.2 Robotics
      5.2.1 Guided Policy Search
      5.2.2 Learn to Navigate
   5.3 Natural Language Processing
      5.3.1 Dialogue Systems
      5.3.2 Machine Translation
      5.3.3 Text Generation
   5.4 Computer Vision
   5.5 Neural Architecture Design
   5.6 Business Management
   5.7 Finance
   5.8 Healthcare
   5.9 Industry 4.0
   5.10 Smart Grid
   5.11 Intelligent Transportation Systems
   5.12 Computer Systems
6 More Topics
7 Resources
   7.1 Books
   7.2 More Books
   7.3 Surveys and Reports
   7.4 Courses
   7.5 Tutorials
   7.6 Conferences, Journals and Workshops
   7.7 Blogs
   7.8 Testbeds
   7.9 Algorithm Implementations
1 INTRODUCTION
Reinforcement learning (RL) is about an agent interacting with the environment, learning an optimal
policy, by trial and error, for sequential decision making problems in a wide range of fields in both
natural and social sciences, and engineering (Sutton and Barto, 1998; 2017; Bertsekas and Tsitsiklis,
1996; Bertsekas, 2012; Szepesvári, 2010; Powell, 2011).
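To make this problem setup concrete, the following minimal sketch in Python shows the canonical
agent-environment interaction loop. The env and agent objects, and their reset/step/act/learn
methods, are hypothetical stand-ins in the spirit of Gym-style interfaces, not code from this
overview.

def run_episode(env, agent, max_steps=1000):
    """Run one episode of agent-environment interaction (illustrative sketch)."""
    state = env.reset()  # initial observation from the environment
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(state)  # the policy maps a state to an action
        next_state, reward, done = env.step(action)  # environment transition
        # trial-and-error update of the agent's policy/value estimates
        agent.learn(state, action, reward, next_state, done)
        total_reward += reward
        state = next_state
        if done:  # episode terminated
            break
    return total_reward

Over many such episodes, the agent refines its policy toward maximizing cumulative reward, which
is the essence of the sequential decision making problem above.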
The integration of reinforcement learning and neural networks has a long history (Sutton and Barto,
2017; Bertsekas and Tsitsiklis, 1996; Schmidhuber, 2015). With recent exciting achievements of
deep learning (LeCun et al., 2015; Goodfellow et al., 2016), benefiting from big data, powerful
computation, new algorithmic techniques, mature software packages and architectures, and strong
financial support, we have been witnessing the renaissance of reinforcement learning (Krakovsky,
2016), especially, the combination of deep neural networks and reinforcement learning, i.e., deep
reinforcement learning (deep RL).
Deep learning, or deep neural networks, has been prevalent in reinforcement learning in the last
several years, in games, robotics, natural language processing, etc. We have been witnessing break-
throughs, like deep Q-network (Mnih et al., 2015) and AlphaGo (Silver et al., 2016a); and novel ar-
chitectures and applications, like differentiable neural computer (Graves et al., 2016), asynchronous
methods (Mnih et al., 2016), dueling network architectures (Wang et al., 2016b), value iteration
networks (Tamar et al., 2016), unsupervised reinforcement and auxiliary learning (Jaderberg et al.,
2017; Mirowski et al., 2017), neural architecture design (Zoph and Le, 2017), dual learning for
machine translation (He et al., 2016a), spoken dialogue systems (Su et al., 2016b), information
extraction (Narasimhan et al., 2016), guided policy search (Levine et al., 2016a), and generative
adversarial imitation learning (Ho and Ermon, 2016), etc.
Why has deep learning been helping reinforcement learning make so many and such enormous achieve-
ments? Representation learning with deep learning enables automatic feature engineering and end-
to-end learning through gradient descent, so that reliance on domain knowledge is significantly
reduced or even removed. Feature engineering used to be done manually and is usually time-
consuming, over-specified, and incomplete. Deep, distributed representations exploit the hierar-
chical composition of factors in data to combat the exponential challenges of the curse of dimen-
sionality. Generality, expressiveness and flexibility of deep neural networks make some tasks easier
or possible, e.g., in the breakthroughs and novel architectures and applications discussed above.
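As a concrete instance of such end-to-end learning through gradient descent, recall the loss
minimized by deep Q-network (DQN) (Mnih et al., 2015), where a deep network Q(s, a; θ) is trained
directly from raw observations with no hand-crafted features; here θ⁻ denotes the periodically
updated target network weights and D the experience replay buffer, notation recalled for
illustration:
\[
L(\theta) = \mathbb{E}_{(s,a,r,s') \sim \mathcal{D}} \left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta^-) - Q(s, a; \theta) \right)^2 \right],
\]
with the weights updated by stochastic gradient descent, \( \theta \leftarrow \theta - \alpha \nabla_\theta L(\theta) \).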
Deep learning and reinforcement learning, selected as one of the MIT Technology Review 10
Breakthrough Technologies in 2013 and 2017 respectively, will play crucial roles in achieving
artificial general intelligence. David Silver, the main contributor to AlphaGo (Silver et al., 2016a),
even proposed a formula: artificial intelligence = reinforcement learning + deep learning (Silver,
2016). We would go one step further with a slogan: deep reinforcement learning is artificial intelligence.
The outline of this overview is as follows. First we discuss the background of machine learning, deep learn-
ing and reinforcement learning in Section 2. Next we discuss core RL elements, including value
function in Section 3.1, policy in Section 3.2, reward in Section 3.3, model in Section 3.4, plan-
ning in Section 3.5, and exploration in Section 3.6. Then we discuss important mechanisms for
RL, including attention and memory in Section 4.1, unsupervised learning in Section 4.2, transfer
learning in Section 4.3, multi-agent RL in Section 4.4, hierarchical RL in Section 4.5, and learning
to learn in Section 4.6. After that, we discuss various RL applications, including games in Sec-
tion 5.1, robotics in Section 5.2, natural language processing in Section 5.3, computer vision in
Section 5.4, neural architecture design in Section 5.5, business management in Section 5.6, finance
in Section 5.7, healthcare in Section 5.8, Industry 4.0 in Section 5.9, smart grid in Section 5.10, in-
telligent transportation systems in Section 5.11, and computer systems in Section 5.12. We present
a list of topics not reviewed yet in Section 6, give a brief summary in Section 8, and close with
discussions in Section 9.
In Section 7, we list a collection of RL resources including books, surveys, reports, online courses,
tutorials, conferences, journals and workshops, blogs, and open sources. If one had to pick a single RL
resource, it would be Sutton and Barto’s RL book (Sutton and Barto, 2017), whose 2nd edition is in preparation. It
covers RL fundamentals and reflects new progress, e.g., in deep Q-network, AlphaGo, policy gra-
dient methods, as well as in psychology and neuroscience. Deng and Dong (2014) and Goodfellow
et al. (2016) are recent deep learning books. Bishop (2011), Hastie et al. (2009), and Murphy (2012)