# Comprehensive Reinforcement Learning Tutorial
![GitHub last commit (branch)](https://img.shields.io/github/last-commit/tensorlayer/tensorlayer/master.svg)
[![Supported TF Version](https://img.shields.io/badge/TensorFlow-2.0.0%2B-brightgreen.svg)](https://github.com/tensorflow/tensorflow/releases)
[![Documentation Status](https://readthedocs.org/projects/tensorlayer/badge/)](https://tensorlayer.readthedocs.io/)
[![Build Status](https://travis-ci.org/tensorlayer/tensorlayer.svg?branch=master)](https://travis-ci.org/tensorlayer/tensorlayer)
[![Downloads](http://pepy.tech/badge/tensorlayer)](http://pepy.tech/project/tensorlayer)
<br/>
<a href="https://deepreinforcementlearningbook.org" target="_blank">
<div align="center">
<img src="http://deep-reinforcement-learning-book.github.io/assets/images/cover_v1.png" width="22%"/>
</div>
<!-- <div align="center"><caption>Slack Invitation Link</caption></div> -->
</a>
<br/>
<!--
<br/>
<a href="https://github.com/tensorlayer/tensorlayer-chinese/blob/master/docs/images/RL_Group_QR.jpeg" target="\_blank">
<div align="center">
<img src="https://github.com/tensorlayer/tensorlayer-chinese/blob/master/docs/images/RL_Group_QR.jpeg" width="20%"/>
</div>
<div align="center"><caption>WeChat QR Code</caption></div>
</a>
<br/>
-->
This repository contains implementations of the most popular reinforcement learning algorithms, powered by [TensorFlow 2.0](https://www.tensorflow.org/alpha/guide/effective_tf2) and TensorLayer 2.0. We aim to make the reinforcement learning tutorial simple, transparent and straightforward, as this not only benefits new learners of reinforcement learning, but also provides a convenient way for senior researchers to test new ideas quickly.
A corresponding [Springer textbook](https://deepreinforcementlearningbook.org) is also available; you can get the free PDF if your institution has a Springer license. We have also released [RLzoo](https://github.com/tensorlayer/RLzoo) for simple usage.
<br/>
<a href="https://join.slack.com/t/tensorlayer/shared_invite/enQtMjUyMjczMzU2Njg4LWI0MWU0MDFkOWY2YjQ4YjVhMzI5M2VlZmE4YTNhNGY1NjZhMzUwMmQ2MTc0YWRjMjQzMjdjMTg2MWQ2ZWJhYzc" target="_blank">
<div align="center">
<img src="../../img/join_slack.png" width="20%"/>
</div>
<!-- <div align="center"><caption>Slack Invitation Link</caption></div> -->
</a>
<br/>
## Prerequisites:
* python >= 3.5
* tensorflow >= 2.0.0 or tensorflow-gpu >= 2.0.0a0
* tensorlayer >= 2.0.1
* tensorflow-probability
* **Note**: if you encounter the error `AttributeError: module 'tensorflow' has no attribute 'contrib'` when running the code after installing tensorflow-probability, try:
`pip install --upgrade tf-nightly-2.0-preview tfp-nightly`
## Quick Start
```bash
conda create --name tl python=3.6.4
conda activate tl
pip install tensorflow-gpu==2.0.0-rc1  # if no GPU, use: pip install tensorflow==2.0.0
pip install tensorlayer
pip install tensorflow-probability==0.9.0
pip install gym
pip install gym[atari]  # for other environments, use: pip install gym[all]
python tutorial_DDPG.py --train
```
## Status: Beta
We are currently open to suggestions and pull requests to make this TensorLayer 2.0 reinforcement learning tutorial a better code repository for both new learners and senior researchers. Some of the algorithms mentioned in this document may not yet be available, as we are still implementing more RL algorithms and optimizing their performance. The algorithms listed above will be released over the coming weeks, and the repository will continue to add more advanced RL algorithms in the future.
## To Use:
For each tutorial, open a terminal and run:
`python ***.py --train` for training and `python ***.py --test` for testing.
The tutorial algorithms follow the same basic structure, as shown in file: [`./tutorial_format.py`](https://github.com/tensorlayer/tensorlayer/blob/reinforcement-learning/examples/reinforcement_learning/tutorial_format.py)
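Each tutorial switches between training and testing via these flags. A minimal sketch of how such a `--train`/`--test` switch can be wired with `argparse` (the flag names come from the commands above; the exact wiring inside the tutorials may differ):

```python
import argparse

# Minimal sketch of a --train / --test mode switch (flag names taken from the
# commands above; this wiring is an assumption, not the tutorials' exact code).
def build_parser():
    parser = argparse.ArgumentParser()
    parser.add_argument('--train', dest='train', action='store_true', default=True)
    parser.add_argument('--test', dest='train', action='store_false')
    return parser

# In a real script you would call parse_args() on sys.argv; shown here with
# explicit argument lists for clarity.
assert build_parser().parse_args(['--train']).train is True
assert build_parser().parse_args(['--test']).train is False
```

Both flags write to the same `train` attribute, so the script only needs a single `if args.train:` branch to choose between the training loop and evaluation.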
The pretrained models and learning curves for each algorithm are stored [here](https://github.com/tensorlayer/pretrained-models). You can download the models and load the weights in the policies for tests.
## Table of Contents:
| Algorithms | Action Space | Tutorial Env | Papers |
| --------------- | ------------ | -------------- | -------|
|**value-based**||||
| Q-learning | Discrete | FrozenLake | [Technical note: Q-learning. Watkins et al. 1992](http://www.gatsby.ucl.ac.uk/~dayan/papers/cjch.pdf)|
| Deep Q-Network (DQN)| Discrete | FrozenLake | [Human-level control through deep reinforcement learning, Mnih et al. 2015.](https://www.nature.com/articles/nature14236/) |
| Prioritized Experience Replay | Discrete | Pong, CartPole | [Prioritized experience replay. Schaul et al. 2015.](https://arxiv.org/abs/1511.05952) |
|Dueling DQN|Discrete | Pong, CartPole |[Dueling network architectures for deep reinforcement learning. Wang et al. 2015.](https://arxiv.org/abs/1511.06581)|
|Double DQN| Discrete | Pong, CartPole |[Deep reinforcement learning with double q-learning. van Hasselt et al. 2016.](https://arxiv.org/abs/1509.06461)|
|Noisy DQN|Discrete | Pong, CartPole |[Noisy networks for exploration. Fortunato et al. 2017.](https://arxiv.org/pdf/1706.10295.pdf)|
| Distributional DQN (C51)| Discrete | Pong, CartPole | [A distributional perspective on reinforcement learning. Bellemare et al. 2017.](https://arxiv.org/pdf/1707.06887.pdf) |
|**policy-based**||||
|REINFORCE (PG) |Discrete/Continuous|CartPole | [Reinforcement learning: An introduction. Sutton et al. 2011.](https://www.cambridge.org/core/journals/robotica/article/robot-learning-edited-by-jonathan-h-connell-and-sridhar-mahadevan-kluwer-boston-19931997-xii240-pp-isbn-0792393651-hardback-21800-guilders-12000-8995/737FD21CA908246DF17779E9C20B6DF6)|
| Trust Region Policy Optimization (TRPO)| Discrete/Continuous | Pendulum | [Trust region policy optimization. Schulman et al. 2015.](https://arxiv.org/pdf/1502.05477.pdf) |
| Proximal Policy Optimization (PPO) |Discrete/Continuous |Pendulum| [Proximal policy optimization algorithms. Schulman et al. 2017.](https://arxiv.org/abs/1707.06347) |
|Distributed Proximal Policy Optimization (DPPO)|Discrete/Continuous |Pendulum|[Emergence of locomotion behaviours in rich environments. Heess et al. 2017.](https://arxiv.org/abs/1707.02286)|
|**actor-critic**||||
|Actor-Critic (AC)|Discrete/Continuous|CartPole| [Actor-critic algorithms. Konda et al. 2000.](https://papers.nips.cc/paper/1786-actor-critic-algorithms.pdf)|
| Asynchronous Advantage Actor-Critic (A3C)| Discrete/Continuous | BipedalWalker| [Asynchronous methods for deep reinforcement learning. Mnih et al. 2016.](https://arxiv.org/pdf/1602.01783.pdf) |
| Deep Deterministic Policy Gradient (DDPG)|Discrete/Continuous |Pendulum| [Continuous control with deep reinforcement learning. Lillicrap et al. 2016.](https://arxiv.org/pdf/1509.02971.pdf) |
|TD3|Discrete/Continuous |Pendulum|[Addressing function approximation error in actor-critic methods. Fujimoto et al. 2018.](https://arxiv.org/pdf/1802.09477.pdf)|
|Soft Actor-Critic (SAC)|Discrete/Continuous |Pendulum|[Soft actor-critic algorithms and applications. Haarnoja et al. 2018.](https://arxiv.org/abs/1812.05905)|
## Examples of RL Algorithms:
* **Q-learning**
<u>Code</u>: `./tutorial_Qlearning.py`
<u>Paper</u>: [Technical Note Q-Learning](http://www.gatsby.ucl.ac.uk/~dayan/papers/cjch.pdf)
<u>Description</u>:
```
Q-learning is a tabular (non-deep-learning) method based on TD learning, with off-policy updates and epsilon-greedy exploration.
Central formula:
Q(S, A) <- Q(S, A) + alpha * (R + gamma * max_a Q(S', a) - Q(S, A))
See David Silver's RL course, Lecture 5 (Q-Learning), for more details.
```
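The update rule above can be exercised end-to-end with a tiny tabular sketch. The two-state environment and hyperparameters below are illustrative only (they are not one of the tutorial environments):

```python
import random
from collections import defaultdict

# Tabular Q-learning on a toy two-state chain (illustrative environment,
# not one of the tutorial envs).
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1   # learning rate, discount, exploration rate
N_ACTIONS = 2
Q = defaultdict(float)                   # Q[(state, action)] -> value, zero-initialized

def epsilon_greedy(state):
    # Explore with probability EPSILON, otherwise act greedily w.r.t. Q.
    if random.random() < EPSILON:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[(state, a)])

def step(state, action):
    # Toy dynamics: only action 1 in state 0 is rewarded.
    if state == 0 and action == 1:
        return 1, 1.0                    # (next_state, reward)
    return 0, 0.0

random.seed(0)
state = 0
for _ in range(500):
    action = epsilon_greedy(state)
    next_state, reward = step(state, action)
    # Central formula: Q(S,A) <- Q(S,A) + alpha * (R + gamma * max_a Q(S',a) - Q(S,A))
    best_next = max(Q[(next_state, a)] for a in range(N_ACTIONS))
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
    state = next_state
```

After training, the learned table prefers the rewarded action in state 0 (`Q[(0, 1)] > Q[(0, 0)]`), which is exactly what the max-based off-policy target drives it toward.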
* **Deep Q-Network (DQN)**
<u>Code:</u> `./tutorial_DQN.py`
<u>Paper</u>: [Human-level control through deep reinforcement learning](https://www.nature.com/articles/nature14236/)