# Comprehensive Reinforcement Learning Tutorial
![GitHub last commit (branch)](https://img.shields.io/github/last-commit/tensorlayer/tensorlayer/master.svg)
[![Supported TF Version](https://img.shields.io/badge/TensorFlow-2.0.0%2B-brightgreen.svg)](https://github.com/tensorflow/tensorflow/releases)
[![Documentation Status](https://readthedocs.org/projects/tensorlayer/badge/)](https://tensorlayer.readthedocs.io/)
[![Build Status](https://travis-ci.org/tensorlayer/tensorlayer.svg?branch=master)](https://travis-ci.org/tensorlayer/tensorlayer)
[![Downloads](http://pepy.tech/badge/tensorlayer)](http://pepy.tech/project/tensorlayer)
<br/>
<a href="https://deepreinforcementlearningbook.org" target="_blank">
<div align="center">
<img src="http://deep-reinforcement-learning-book.github.io/assets/images/cover_v1.png" width="22%"/>
</div>
<!-- <div align="center"><caption>Slack Invitation Link</caption></div> -->
</a>
<br/>
<!--
<br/>
<a href="https://github.com/tensorlayer/tensorlayer-chinese/blob/master/docs/images/RL_Group_QR.jpeg" target="\_blank">
<div align="center">
<img src="https://github.com/tensorlayer/tensorlayer-chinese/blob/master/docs/images/RL_Group_QR.jpeg" width="20%"/>
</div>
<div align="center"><caption>WeChat QR Code</caption></div>
</a>
<br/>
-->
This repository contains implementations of the most popular reinforcement learning algorithms, powered by [TensorFlow 2.0](https://www.tensorflow.org/alpha/guide/effective_tf2) and TensorLayer 2.0. We aim to make these reinforcement learning tutorials simple, transparent and straightforward, as this not only benefits new learners of reinforcement learning, but also makes it convenient for senior researchers to test their new ideas quickly.
A corresponding [Springer textbook](https://deepreinforcementlearningbook.org) is also available; you can get the PDF for free if your institution has a Springer license. We have also released [RLzoo](https://github.com/tensorlayer/RLzoo) for simple usage.
<br/>
<a href="https://join.slack.com/t/tensorlayer/shared_invite/enQtMjUyMjczMzU2Njg4LWI0MWU0MDFkOWY2YjQ4YjVhMzI5M2VlZmE4YTNhNGY1NjZhMzUwMmQ2MTc0YWRjMjQzMjdjMTg2MWQ2ZWJhYzc" target="_blank">
<div align="center">
<img src="../../img/join_slack.png" width="20%"/>
</div>
<!-- <div align="center"><caption>Slack Invitation Link</caption></div> -->
</a>
<br/>
## Prerequisites:
* python >= 3.5
* tensorflow >= 2.0.0 or tensorflow-gpu >= 2.0.0a0
* tensorlayer >= 2.0.1
* tensorflow-probability
*Note:* if you meet the error `AttributeError: module 'tensorflow' has no attribute 'contrib'` when running the code after installing tensorflow-probability, try:
`pip install --upgrade tf-nightly-2.0-preview tfp-nightly`
## Quick Start
```bash
conda create --name tl python=3.6.4
conda activate tl
pip install tensorflow-gpu==2.0.0-rc1 # if no GPU, use pip install tensorflow==2.0.0
pip install tensorlayer
pip install tensorflow-probability==0.9.0
pip install gym
pip install gym[atari] # for others, use pip install gym[all]
python tutorial_DDPG.py --train
```
## Status: Beta
We are currently open to any suggestions or pull requests that help make this TensorLayer 2.0 reinforcement learning tutorial a better code repository for both new learners and senior researchers. Some of the algorithms mentioned in this document may not be available yet, as we are still implementing more RL algorithms and optimizing their performance. The remaining algorithms will be released over the coming weeks, and the repository will keep adding more advanced RL algorithms in the future.
## To Use:
For each tutorial, open a terminal and run:
`python ***.py --train` for training and `python ***.py --test` for testing.
The tutorial algorithms follow the same basic structure, as shown in file: [`./tutorial_format.py`](https://github.com/tensorlayer/tensorlayer/blob/reinforcement-learning/examples/reinforcement_learning/tutorial_format.py)
The pretrained models and learning curves for each algorithm are stored [here](https://github.com/tensorlayer/pretrained-models). You can download the models and load the weights into the policies for testing.
## Table of Contents:
| Algorithms | Action Space | Tutorial Env | Papers |
| --------------- | ------------ | -------------- | -------|
|**value-based**||||
| Q-learning | Discrete | FrozenLake | [Technical note: Q-learning. Watkins et al. 1992](http://www.gatsby.ucl.ac.uk/~dayan/papers/cjch.pdf)|
| Deep Q-Network (DQN)| Discrete | FrozenLake | [Human-level control through deep reinforcement learning, Mnih et al. 2015.](https://www.nature.com/articles/nature14236/) |
| Prioritized Experience Replay | Discrete | Pong, CartPole | [Prioritized experience replay. Schaul et al. 2015.](https://arxiv.org/abs/1511.05952) |
|Dueling DQN|Discrete | Pong, CartPole |[Dueling network architectures for deep reinforcement learning. Wang et al. 2015.](https://arxiv.org/abs/1511.06581)|
|Double DQN| Discrete | Pong, CartPole |[Deep reinforcement learning with double q-learning. Van Hasselt et al. 2016.](https://arxiv.org/abs/1509.06461)|
|Noisy DQN|Discrete | Pong, CartPole |[Noisy networks for exploration. Fortunato et al. 2017.](https://arxiv.org/pdf/1706.10295.pdf)|
| Distributional DQN (C51)| Discrete | Pong, CartPole | [A distributional perspective on reinforcement learning. Bellemare et al. 2017.](https://arxiv.org/pdf/1707.06887.pdf) |
|**policy-based**||||
|REINFORCE (PG) |Discrete/Continuous|CartPole | [Reinforcement learning: An introduction. Sutton and Barto. 2011.](https://www.cambridge.org/core/journals/robotica/article/robot-learning-edited-by-jonathan-h-connell-and-sridhar-mahadevan-kluwer-boston-19931997-xii240-pp-isbn-0792393651-hardback-21800-guilders-12000-8995/737FD21CA908246DF17779E9C20B6DF6)|
| Trust Region Policy Optimization (TRPO)| Discrete/Continuous | Pendulum | [Trust region policy optimization. Schulman et al. 2015.](https://arxiv.org/pdf/1502.05477.pdf) |
| Proximal Policy Optimization (PPO) |Discrete/Continuous |Pendulum| [Proximal policy optimization algorithms. Schulman et al. 2017.](https://arxiv.org/abs/1707.06347) |
|Distributed Proximal Policy Optimization (DPPO)|Discrete/Continuous |Pendulum|[Emergence of locomotion behaviours in rich environments. Heess et al. 2017.](https://arxiv.org/abs/1707.02286)|
|**actor-critic**||||
|Actor-Critic (AC)|Discrete/Continuous|CartPole| [Actor-critic algorithms. Konda et al. 2000.](https://papers.nips.cc/paper/1786-actor-critic-algorithms.pdf)|
| Asynchronous Advantage Actor-Critic (A3C)| Discrete/Continuous | BipedalWalker| [Asynchronous methods for deep reinforcement learning. Mnih et al. 2016.](https://arxiv.org/pdf/1602.01783.pdf) |
| Deep Deterministic Policy Gradient (DDPG)|Discrete/Continuous |Pendulum| [Continuous control with deep reinforcement learning. Lillicrap et al. 2016.](https://arxiv.org/pdf/1509.02971.pdf) |
|Twin Delayed DDPG (TD3)|Discrete/Continuous |Pendulum|[Addressing function approximation error in actor-critic methods. Fujimoto et al. 2018.](https://arxiv.org/pdf/1802.09477.pdf)|
|Soft Actor-Critic (SAC)|Discrete/Continuous |Pendulum|[Soft actor-critic algorithms and applications. Haarnoja et al. 2018.](https://arxiv.org/abs/1812.05905)|
## Examples of RL Algorithms:
* **Q-learning**
Code: `./tutorial_Qlearning.py`
<u>Paper</u>: [Technical Note Q-Learning](http://www.gatsby.ucl.ac.uk/~dayan/papers/cjch.pdf)
<u>Description</u>:
```
Q-learning is a classic non-deep-learning method combining TD learning, off-policy updates and epsilon-greedy exploration.
Central formula:
Q(S, A) <- Q(S, A) + alpha * (R + gamma * max_a' Q(S', a') - Q(S, A))
See David Silver's RL tutorial, Lecture 5 (Q-Learning), for more details.
```
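The update rule can be sketched in a few lines of plain Python. The toy chain MDP below is hypothetical (for illustration only) and is not the FrozenLake environment used by `tutorial_Qlearning.py`:

```python
import random

# Hypothetical toy chain MDP: states 0..3, actions 0 = left, 1 = right;
# reaching state 3 yields reward 1 and ends the episode.
N_STATES, ACTIONS = 4, (0, 1)
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.3

def step(s, a):
    s2 = max(s - 1, 0) if a == 0 else min(s + 1, N_STATES - 1)
    done = s2 == N_STATES - 1
    return s2, (1.0 if done else 0.0), done

random.seed(0)
Q = [[0.0, 0.0] for _ in range(N_STATES)]  # tabular Q(S, A)
for _ in range(1000):  # episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        a = random.choice(ACTIONS) if random.random() < EPSILON else max(ACTIONS, key=lambda x: Q[s][x])
        s2, r, done = step(s, a)
        # Q-learning update: off-policy max over next-state actions
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2

# Greedy policy per non-terminal state (1 = move right toward the goal)
print([max(ACTIONS, key=lambda x: Q[s][x]) for s in range(N_STATES - 1)])
```

Since the terminal row of `Q` is never updated, `max(Q[s2])` is zero at episode end, which gives the correct target for terminal transitions.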
* **Deep Q-Network (DQN)**
<u>Code:</u> `./tutorial_DQN.py`
<u>Paper</u>: [Human-level control through deep reinforcement learning](https://www.nature.com/articles/nature14236/)
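<u>Description</u>: a short summary in the same spirit as the Q-learning note above (our paraphrase of the paper, not the tutorial's own text):
```
DQN approximates the Q-table with a neural network Q(S, A; theta), trained
off-policy from a replay buffer with epsilon-greedy exploration.
Central formula (squared TD error with a periodically-synced target network theta-):
L(theta) = (R + gamma * max_a' Q(S', a'; theta-) - Q(S, A; theta))^2
See the Nature paper above for more details.
```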