# Comprehensive Reinforcement Learning Tutorial
![GitHub last commit (branch)](https://img.shields.io/github/last-commit/tensorlayer/tensorlayer/master.svg)
[![Supported TF Version](https://img.shields.io/badge/TensorFlow-2.0.0%2B-brightgreen.svg)](https://github.com/tensorflow/tensorflow/releases)
[![Documentation Status](https://readthedocs.org/projects/tensorlayer/badge/)](https://tensorlayer.readthedocs.io/)
[![Build Status](https://travis-ci.org/tensorlayer/tensorlayer.svg?branch=master)](https://travis-ci.org/tensorlayer/tensorlayer)
[![Downloads](http://pepy.tech/badge/tensorlayer)](http://pepy.tech/project/tensorlayer)
<br/>
<a href="https://deepreinforcementlearningbook.org" target="_blank">
<div align="center">
<img src="http://deep-reinforcement-learning-book.github.io/assets/images/cover_v1.png" width="22%"/>
</div>
<!-- <div align="center"><caption>Slack Invitation Link</caption></div> -->
</a>
<br/>
<!--
<br/>
<a href="https://github.com/tensorlayer/tensorlayer-chinese/blob/master/docs/images/RL_Group_QR.jpeg" target="\_blank">
<div align="center">
<img src="https://github.com/tensorlayer/tensorlayer-chinese/blob/master/docs/images/RL_Group_QR.jpeg" width="20%"/>
</div>
<div align="center"><caption>WeChat QR Code</caption></div>
</a>
<br/>
-->
This repository contains implementations of the most popular reinforcement learning algorithms, powered by [TensorFlow 2.0](https://www.tensorflow.org/alpha/guide/effective_tf2) and TensorLayer 2.0. We aim to make these reinforcement learning tutorials simple, transparent and straightforward, as this not only benefits new learners of reinforcement learning, but also makes it convenient for senior researchers to test their new ideas quickly.
A corresponding [Springer textbook](https://deepreinforcementlearningbook.org) is also available; you can get the PDF for free if your institution has a Springer license. We have also released [RLzoo](https://github.com/tensorlayer/RLzoo) for simple usage.
<br/>
<a href="https://join.slack.com/t/tensorlayer/shared_invite/enQtMjUyMjczMzU2Njg4LWI0MWU0MDFkOWY2YjQ4YjVhMzI5M2VlZmE4YTNhNGY1NjZhMzUwMmQ2MTc0YWRjMjQzMjdjMTg2MWQ2ZWJhYzc" target="_blank">
<div align="center">
<img src="../../img/join_slack.png" width="20%"/>
</div>
<!-- <div align="center"><caption>Slack Invitation Link</caption></div> -->
</a>
<br/>
## Prerequisites:
* python >= 3.5
* tensorflow >= 2.0.0 or tensorflow-gpu >= 2.0.0a0
* tensorlayer >= 2.0.1
* tensorflow-probability
*Note:* if you meet the error `AttributeError: module 'tensorflow' has no attribute 'contrib'` when running the code after installing tensorflow-probability, try:
`pip install --upgrade tf-nightly-2.0-preview tfp-nightly`
## Quick Start
```bash
conda create --name tl python=3.6.4
conda activate tl
pip install tensorflow-gpu==2.0.0-rc1 # if no GPU, use pip install tensorflow==2.0.0
pip install tensorlayer
pip install tensorflow-probability==0.9.0
pip install gym
pip install gym[atari] # for others, use pip install gym[all]
python tutorial_DDPG.py --train
```
## Status: Beta
We are currently open to any suggestions or pull requests that help make this TensorLayer 2.0 reinforcement learning tutorial a better code repository for both new learners and senior researchers. Some of the algorithms mentioned in this document may not be available yet, as we are still implementing more RL algorithms and optimizing their performance. The remaining algorithms will be released over the coming weeks, and the repository will keep adding more advanced RL algorithms in the future.
## To Use:
For each tutorial, open a terminal and run:
`python ***.py --train` for training and `python ***.py --test` for testing.
The tutorial algorithms follow the same basic structure, as shown in file: [`./tutorial_format.py`](https://github.com/tensorlayer/tensorlayer/blob/reinforcement-learning/examples/reinforcement_learning/tutorial_format.py)
The pretrained models and learning curves for each algorithm are stored [here](https://github.com/tensorlayer/pretrained-models). You can download the models and load the weights into the policies for testing.
## Table of Contents:
| Algorithms | Action Space | Tutorial Env | Papers |
| --------------- | ------------ | -------------- | -------|
|**value-based**||||
| Q-learning | Discrete | FrozenLake | [Technical note: Q-learning. Watkins et al. 1992](http://www.gatsby.ucl.ac.uk/~dayan/papers/cjch.pdf)|
| Deep Q-Network (DQN)| Discrete | FrozenLake | [Human-level control through deep reinforcement learning, Mnih et al. 2015.](https://www.nature.com/articles/nature14236/) |
| Prioritized Experience Replay | Discrete | Pong, CartPole | [Prioritized experience replay. Schaul et al. 2015.](https://arxiv.org/abs/1511.05952) |
|Dueling DQN|Discrete | Pong, CartPole |[Dueling network architectures for deep reinforcement learning. Wang et al. 2015.](https://arxiv.org/abs/1511.06581)|
|Double DQN| Discrete | Pong, CartPole |[Deep reinforcement learning with double q-learning. Van Hasselt et al. 2016.](https://arxiv.org/abs/1509.06461)|
|Noisy DQN|Discrete | Pong, CartPole |[Noisy networks for exploration. Fortunato et al. 2017.](https://arxiv.org/pdf/1706.10295.pdf)|
| Distributional DQN (C51)| Discrete | Pong, CartPole | [A distributional perspective on reinforcement learning. Bellemare et al. 2017.](https://arxiv.org/pdf/1707.06887.pdf) |
|**policy-based**||||
|REINFORCE (PG) |Discrete/Continuous|CartPole | [Reinforcement learning: An introduction. Sutton and Barto. 2011.](https://www.cambridge.org/core/journals/robotica/article/robot-learning-edited-by-jonathan-h-connell-and-sridhar-mahadevan-kluwer-boston-19931997-xii240-pp-isbn-0792393651-hardback-21800-guilders-12000-8995/737FD21CA908246DF17779E9C20B6DF6)|
| Trust Region Policy Optimization (TRPO)| Discrete/Continuous | Pendulum | [Trust region policy optimization. Schulman et al. 2015.](https://arxiv.org/pdf/1502.05477.pdf) |
| Proximal Policy Optimization (PPO) |Discrete/Continuous |Pendulum| [Proximal policy optimization algorithms. Schulman et al. 2017.](https://arxiv.org/abs/1707.06347) |
|Distributed Proximal Policy Optimization (DPPO)|Discrete/Continuous |Pendulum|[Emergence of locomotion behaviours in rich environments. Heess et al. 2017.](https://arxiv.org/abs/1707.02286)|
|**actor-critic**||||
|Actor-Critic (AC)|Discrete/Continuous|CartPole| [Actor-critic algorithms. Konda et al. 2000.](https://papers.nips.cc/paper/1786-actor-critic-algorithms.pdf)|
| Asynchronous Advantage Actor-Critic (A3C)| Discrete/Continuous | BipedalWalker| [Asynchronous methods for deep reinforcement learning. Mnih et al. 2016.](https://arxiv.org/pdf/1602.01783.pdf) |
| Deep Deterministic Policy Gradient (DDPG)|Discrete/Continuous |Pendulum| [Continuous control with deep reinforcement learning. Lillicrap et al. 2016.](https://arxiv.org/pdf/1509.02971.pdf) |
|Twin Delayed DDPG (TD3)|Discrete/Continuous |Pendulum|[Addressing function approximation error in actor-critic methods. Fujimoto et al. 2018.](https://arxiv.org/pdf/1802.09477.pdf)|
|Soft Actor-Critic (SAC)|Discrete/Continuous |Pendulum|[Soft actor-critic algorithms and applications. Haarnoja et al. 2018.](https://arxiv.org/abs/1812.05905)|
## Examples of RL Algorithms:
* **Q-learning**
Code: `./tutorial_Qlearning.py`
<u>Paper</u>: [Technical Note Q-Learning](http://www.gatsby.ucl.ac.uk/~dayan/papers/cjch.pdf)
<u>Description</u>:
```
Q-learning is a classic non-deep-learning method combining TD learning, off-policy updates and epsilon-greedy exploration.
Central formula:
Q(S, A) <- Q(S, A) + alpha * (R + gamma * max_a' Q(S', a') - Q(S, A))
See David Silver's RL tutorial, Lecture 5 (Q-Learning), for more details.
```
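The update rule can be sketched in a few lines of plain Python. The toy chain MDP below is hypothetical (for illustration only) and is not the FrozenLake environment used by `tutorial_Qlearning.py`:

```python
import random

# Hypothetical toy chain MDP: states 0..3, actions 0 = left, 1 = right;
# reaching state 3 yields reward 1 and ends the episode.
N_STATES, ACTIONS = 4, (0, 1)
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.3

def step(s, a):
    s2 = max(s - 1, 0) if a == 0 else min(s + 1, N_STATES - 1)
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else 0.0), done

random.seed(0)
Q = [[0.0, 0.0] for _ in range(N_STATES)]  # tabular Q(S, A)
for _ in range(1000):  # episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        a = random.choice(ACTIONS) if random.random() < EPSILON else max(ACTIONS, key=lambda x: Q[s][x])
        s2, r, done = step(s, a)
        # Q-learning update: off-policy max over next-state actions
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2

# Greedy policy per non-terminal state (1 = move right toward the goal)
print([max(ACTIONS, key=lambda x: Q[s][x]) for s in range(N_STATES - 1)])
```

Since the terminal row of `Q` is never updated, `max(Q[s2])` is zero at episode end, which gives the correct target for terminal transitions.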
* **Deep Q-Network (DQN)**
<u>Code:</u> `./tutorial_DQN.py`
<u>Paper</u>: [Human-level control through deep reinforcement learning](https://www.nature.com/articles/nature14236/)
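<u>Description</u>: a short summary in the same spirit as the Q-learning note above (our paraphrase of the paper, not the tutorial's own text):
```
DQN approximates the Q-table with a neural network Q(S, A; theta), trained
off-policy from a replay buffer with epsilon-greedy exploration.
Central formula (squared TD error with a periodically-synced target network theta-):
L(theta) = (R + gamma * max_a' Q(S', a'; theta-) - Q(S, A; theta))^2
See the Nature paper above for more details.
```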