DEEP REINFORCEMENT LEARNING: AN OVERVIEW
Yuxi Li (yuxili@gmail.com)
ABSTRACT
We give an overview of recent exciting achievements of deep reinforcement learn-
ing (RL). We discuss six core elements, six important mechanisms, and twelve
applications. We start with the background of machine learning, deep learning, and
reinforcement learning. Next we discuss core RL elements, including value func-
tion, in particular, Deep Q-Network (DQN), policy, reward, model, planning, and
exploration. After that, we discuss important mechanisms for RL, including at-
tention and memory, unsupervised learning, transfer learning, multi-agent RL, hi-
erarchical RL, and learning to learn. Then we discuss various applications of RL,
including games, in particular, AlphaGo, robotics, natural language processing,
including dialogue systems, machine translation, and text generation, computer
vision, neural architecture design, business management, finance, healthcare, In-
dustry 4.0, smart grid, intelligent transportation systems, and computer systems.
We mention topics not reviewed yet, and list a collection of RL resources. After
presenting a brief summary, we close with discussions.
This is the first overview of deep reinforcement learning publicly available online, and it is
comprehensive. Comments and criticisms are welcome.
arXiv:1701.07274v5 [cs.LG] 15 Sep 2017
CONTENTS

1 Introduction
2 Background
   2.1 Machine Learning
   2.2 Deep Learning
   2.3 Reinforcement Learning
      2.3.1 Problem Setup
      2.3.2 Value Function
      2.3.3 Temporal Difference Learning
      2.3.4 Multi-step Bootstrapping
      2.3.5 Function Approximation
      2.3.6 Policy Optimization
      2.3.7 Deep Reinforcement Learning
      2.3.8 RL Parlance
      2.3.9 Brief Summary
3 Core Elements
   3.1 Value Function
      3.1.1 Deep Q-Network (DQN)
      3.1.2 Double DQN
      3.1.3 Prioritized Experience Replay
      3.1.4 Dueling Architecture
      3.1.5 More DQN Extensions
   3.2 Policy
      3.2.1 Actor-Critic
      3.2.2 Policy Gradient
      3.2.3 Combining Policy Gradient with Off-Policy RL
   3.3 Reward
   3.4 Model
   3.5 Planning
   3.6 Exploration
4 Important Mechanisms
   4.1 Attention and Memory
   4.2 Unsupervised Learning
      4.2.1 Horde
      4.2.2 Unsupervised Auxiliary Learning
      4.2.3 Generative Adversarial Networks
   4.3 Transfer Learning
   4.4 Multi-Agent Reinforcement Learning
   4.5 Hierarchical Reinforcement Learning
   4.6 Learning to Learn
5 Applications
   5.1 Games
      5.1.1 Perfect Information Board Games
      5.1.2 Imperfect Information Board Games
      5.1.3 Video Games
   5.2 Robotics
      5.2.1 Guided Policy Search
      5.2.2 Learn to Navigate
   5.3 Natural Language Processing
      5.3.1 Dialogue Systems
      5.3.2 Machine Translation
      5.3.3 Text Generation
   5.4 Computer Vision
   5.5 Neural Architecture Design
   5.6 Business Management
   5.7 Finance
   5.8 Healthcare
   5.9 Industry 4.0
   5.10 Smart Grid
   5.11 Intelligent Transportation Systems
   5.12 Computer Systems
6 More Topics
7 Resources
   7.1 Books
   7.2 More Books
   7.3 Surveys and Reports
   7.4 Courses
   7.5 Tutorials
   7.6 Conferences, Journals and Workshops
   7.7 Blogs
   7.8 Testbeds
   7.9 Algorithm Implementations
1 INTRODUCTION
Reinforcement learning (RL) is about an agent interacting with the environment, learning an optimal
policy, by trial and error, for sequential decision making problems in a wide range of fields in both
natural and social sciences, and engineering (Sutton and Barto, 1998; 2017; Bertsekas and Tsitsiklis,
1996; Bertsekas, 2012; Szepesvári, 2010; Powell, 2011).
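To make this problem setup concrete, the following minimal sketch in Python shows the canonical
agent-environment interaction loop. The env and agent objects, and their reset/step/act/learn
methods, are hypothetical stand-ins in the spirit of Gym-style interfaces, not code from this
overview.

def run_episode(env, agent, max_steps=1000):
    """Run one episode of agent-environment interaction (illustrative sketch)."""
    state = env.reset()  # initial observation from the environment
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(state)  # the policy maps a state to an action
        next_state, reward, done = env.step(action)  # environment transition
        # trial-and-error update of the agent's policy/value estimates
        agent.learn(state, action, reward, next_state, done)
        total_reward += reward
        state = next_state
        if done:  # episode terminated
            break
    return total_reward

Over many such episodes, the agent refines its policy toward maximizing cumulative reward, which
is the essence of the sequential decision making problem above.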
The integration of reinforcement learning and neural networks has a long history (Sutton and Barto,
2017; Bertsekas and Tsitsiklis, 1996; Schmidhuber, 2015). With recent exciting achievements of
deep learning (LeCun et al., 2015; Goodfellow et al., 2016), benefiting from big data, powerful
computation, new algorithmic techniques, mature software packages and architectures, and strong
financial support, we have been witnessing the renaissance of reinforcement learning (Krakovsky,
2016), especially, the combination of deep neural networks and reinforcement learning, i.e., deep
reinforcement learning (deep RL).
Deep learning, or deep neural networks, has been prevalent in reinforcement learning in the last
several years, in games, robotics, natural language processing, etc. We have been witnessing break-
throughs, like deep Q-network (Mnih et al., 2015) and AlphaGo (Silver et al., 2016a); and novel ar-
chitectures and applications, like differentiable neural computer (Graves et al., 2016), asynchronous
methods (Mnih et al., 2016), dueling network architectures (Wang et al., 2016b), value iteration
networks (Tamar et al., 2016), unsupervised reinforcement and auxiliary learning (Jaderberg et al.,
2017; Mirowski et al., 2017), neural architecture design (Zoph and Le, 2017), dual learning for
machine translation (He et al., 2016a), spoken dialogue systems (Su et al., 2016b), information
extraction (Narasimhan et al., 2016), guided policy search (Levine et al., 2016a), and generative
adversarial imitation learning (Ho and Ermon, 2016), etc.
Why has deep learning been helping reinforcement learning make so many and such enormous achieve-
ments? Representation learning with deep learning enables automatic feature engineering and end-
to-end learning through gradient descent, so that reliance on domain knowledge is significantly
reduced or even removed. Feature engineering used to be done manually and is usually time-
consuming, over-specified, and incomplete. Deep, distributed representations exploit the hierar-
chical composition of factors in data to combat the exponential challenges of the curse of dimen-
sionality. Generality, expressiveness and flexibility of deep neural networks make some tasks easier
or possible, e.g., in the breakthroughs and novel architectures and applications discussed above.
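As a concrete instance of such end-to-end learning through gradient descent, recall the loss
minimized by deep Q-network (DQN) (Mnih et al., 2015), where a deep network Q(s, a; θ) is trained
directly from raw observations with no hand-crafted features; here θ⁻ denotes the periodically
updated target network weights and D the experience replay buffer, notation recalled for
illustration:
\[
L(\theta) = \mathbb{E}_{(s,a,r,s') \sim \mathcal{D}} \left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta^-) - Q(s, a; \theta) \right)^2 \right],
\]
with the weights updated by stochastic gradient descent, \( \theta \leftarrow \theta - \alpha \nabla_\theta L(\theta) \).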
Deep learning and reinforcement learning, selected as one of the MIT Technology Review 10
Breakthrough Technologies in 2013 and 2017 respectively, will play crucial roles in achieving
artificial general intelligence. David Silver, the main contributor to AlphaGo (Silver et al., 2016a),
even proposed a formula: artificial intelligence = reinforcement learning + deep learning (Silver,
2016). We would go one step further with a slogan: deep reinforcement learning is artificial intelligence.
The outline of this overview is as follows. First we discuss the background of machine learning, deep learn-
ing and reinforcement learning in Section 2. Next we discuss core RL elements, including value
function in Section 3.1, policy in Section 3.2, reward in Section 3.3, model in Section 3.4, plan-
ning in Section 3.5, and exploration in Section 3.6. Then we discuss important mechanisms for
RL, including attention and memory in Section 4.1, unsupervised learning in Section 4.2, transfer
learning in Section 4.3, multi-agent RL in Section 4.4, hierarchical RL in Section 4.5, and learning
to learn in Section 4.6. After that, we discuss various RL applications, including games in Sec-
tion 5.1, robotics in Section 5.2, natural language processing in Section 5.3, computer vision in
Section 5.4, neural architecture design in Section 5.5, business management in Section 5.6, finance
in Section 5.7, healthcare in Section 5.8, Industry 4.0 in Section 5.9, smart grid in Section 5.10, in-
telligent transportation systems in Section 5.11, and computer systems in Section 5.12. We present
a list of topics not reviewed yet in Section 6, give a brief summary in Section 8, and close with
discussions in Section 9.
In Section 7, we list a collection of RL resources including books, surveys, reports, online courses,
tutorials, conferences, journals and workshops, blogs, and open sources. If one had to pick a single RL
resource, it would be Sutton and Barto’s RL book (Sutton and Barto, 2017), whose 2nd edition is in preparation. It
covers RL fundamentals and reflects new progress, e.g., in deep Q-network, AlphaGo, policy gra-
dient methods, as well as in psychology and neuroscience. Deng and Dong (2014) and Goodfellow
et al. (2016) are recent deep learning books. Bishop (2011), Hastie et al. (2009), and Murphy (2012)