Python: A Collection of Deep Reinforcement Learning Implementations in PyTorch

64 files in total (py: 48, md: 6, png: 6)


Uploaded 2019-08-11 04:45:34, 3.79MB ZIP


Python-深度强化学习PyTorch实现集锦.zip (64 files)

reinforcement-learning-algorithms-master

.gitignore 1KB

README.md 5KB

05_ppo

ppo_agent.py 11KB

README.md 751B

models.py 4KB

train.py 757B

utils.py 1KB

arguments.py 2KB

demo.py 3KB

01_dqn_algos

README.md 434B

models.py 3KB

train.py 589B

utils.py 2KB

dqn_agent.py 6KB

arguments.py 2KB

demo.py 1KB

rl_utils

mpi_utils

utils.py 1KB

normalizer.py 3KB

__init__.py 0B

logger

plot.py 4KB

logger.py 14KB

__init__.py 0B

bench.py 6KB

running_filter

__init__.py 0B

running_filter.py 2KB

seeds

seeds.py 407B

__init__.py 0B

env_wrapper

multi_envs_wrapper.py 4KB

create_env.py 2KB

frame_stack.py 1KB

atari_wrapper.py 10KB

__init__.py 6KB

experience_replay

experience_replay.py 1KB

04_trpo

trpo_agent.py 9KB

README.md 258B

models.py 1KB

train.py 461B

utils.py 2KB

arguments.py 1KB

demo.py 1KB

figures

04_trpo.png 142KB

logo.png 13KB

breakout.gif 452KB

01_dqn.png 233KB

hopper.gif 1.79MB

03_a2c.png 165KB

05_ppo.png 130KB

02_ddpg.png 136KB

bipedal.gif 815KB

setup.py 275B

02_ddpg

README.md 351B

models.py 950B

train.py 717B

ddpg_agent.py 9KB

utils.py 686B

arguments.py 2KB

demo.py 1KB

03_a2c

README.md 266B

models.py 2KB

train.py 612B

utils.py 749B

a2c_agent.py 6KB

arguments.py 2KB

demo.py 1KB

# Deep Reinforcement Learning Algorithms

![logo](figures/logo.png)

![MIT License](https://img.shields.io/badge/license-MIT-blue.svg)

This repository implements the classic deep reinforcement learning algorithms using **PyTorch**. Its aim is to provide clear code for people learning deep reinforcement learning. More algorithms will be added in the future, and the existing code will continue to be maintained.

## Current Implementations

- [x] Deep Q-Learning Network (DQN)
    - [x] Basic DQN
    - [x] Double Q network
    - [x] Dueling Network Architecture
- [x] Deep Deterministic Policy Gradient (DDPG)
- [x] Advantage Actor-Critic (A2C)
- [x] Trust Region Policy Optimization (TRPO)
- [x] Proximal Policy Optimization (PPO)
- [ ] Actor Critic using Kronecker-Factored Trust Region (ACKTR)
- [ ] Soft Actor Critic (SAC)

## Update Info

:triangular_flag_on_post: **2018-10-17** - In this update, most of the algorithms have been improved and **more experiments with plots have been added** (except for DDPG). **PPO** now supports **atari-games** and **mujoco-env**. **TRPO** is much more stable and gives better results!

:triangular_flag_on_post: **2019-07-15** - In this update, installing OpenAI Baselines is no longer needed. I have integrated the useful functions into the **rl_utils** module. DDPG has also been re-implemented and now comes with more results. The README file has been modified, and the code structure has been adjusted slightly.

:triangular_flag_on_post: **2019-07-26** - In this update, the revised repository is made public. To keep the repository lightweight, I **rebuilt** it and deleted the previous version, but I will keep a backup on Google Drive.

## TODO List

- [ ] add prioritized experience replay.
- [x] in the future, we will not use OpenAI Baselines' pre-processing functions.
- [x] improve the **DDPG** - I have already implemented a PyTorch Hindsight Experience Replay (HER) with DDPG; you can check it [here](https://github.com/TianhongDai/hindsight-experience-replay).
- [ ] update pre-trained models on Google Drive (will update soon!).

## Requirements

- pytorch=1.0.1
- gym=0.12.5
- mpi4py
- mujoco-py

## Installation

1. Install our `rl_utils` module:
```bash
pip install -e .
```
2. Install mujoco: please follow the instructions on the [official website](https://github.com/openai/mujoco-py).
3. Install Box2d:
```bash
sudo apt-get install swig  # or, on macOS: brew install swig
pip install gym[box2d]
pip install box2d box2d-kengz
```

## Instruction

1. Train the agent (details can be found in each folder):
```
cd target_algo_folder/
python train.py --<arguments you need>
```
2. Play the demo:
```
cd target_algo_folder/
python demo.py --<arguments you need>
```

## Code Structures

1. **rl algorithms**:
    - `arguments.py`: contains the parameters used in training.
    - `<rl-name>_agent.py`: contains the core of the reinforcement learning algorithm.
    - `models.py`: the network structure for the policy and value function.
    - `utils.py`: some useful functions, such as **select actions**.
    - `train.py`: the script to train the agent.
    - `demo.py`: visualizes the trained agent.
2. **rl_utils** module:
    - `env_wrapper/`: contains the pre-processing functions for the Atari games and the wrappers used to create environments.
    - `experience_replay/`: contains the experience replay for the off-policy RL algorithms.
    - `logger/`: contains functions to record log information during training.
    - `mpi_utils/`: contains the tools for MPI training.
    - `running_filter/`: contains the running mean filter functions that normalize observations in the MuJoCo environments.
    - `seeds/`: contains a function to set the random seeds for training, for reproducibility.

## Example Results

### 1. DQN algorithms

![dqn_performance](figures/01_dqn.png)

### 2. DDPG

![ddpg](figures/02_ddpg.png)

### 3. A2C

![a2c](figures/03_a2c.png)

### 4. TRPO

![trpo](figures/04_trpo.png)

### 5. PPO

![ppo](figures/05_ppo.png)

## Demos

Atari Env (BreakoutNoFrameskip-v4)| Box2d Env (BipedalWalker-v2)| Mujoco Env (Hopper-v2)
-----------------------|-----------------------|-----------------------
![](figures/breakout.gif)| ![](figures/bipedal.gif)| ![](figures/hopper.gif)

## Acknowledgement

- [Ilya Kostrikov's GitHub](https://github.com/ikostrikov)
- [Openai Baselines](https://github.com/openai/baselines)
- [Kai's suggestions to simplify MPI functions](https://github.com/Kaixhin)

## Related Papers

[1] [A Brief Survey of Deep Reinforcement Learning](https://arxiv.org/abs/1708.05866)
[2] [The Beta Policy for Continuous Control Reinforcement Learning](https://www.ri.cmu.edu/wp-content/uploads/2017/06/thesis-Chou.pdf)
[3] [Playing Atari with Deep Reinforcement Learning](https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf)
[4] [Deep Reinforcement Learning with Double Q-learning](https://arxiv.org/abs/1509.06461)
[5] [Dueling Network Architectures for Deep Reinforcement Learning](https://arxiv.org/abs/1511.06581)
[6] [Continuous control with deep reinforcement learning](https://arxiv.org/abs/1509.02971)
[7] [Continuous Deep Q-Learning with Model-based Acceleration](https://arxiv.org/abs/1603.00748)
[8] [Asynchronous Methods for Deep Reinforcement Learning](https://arxiv.org/abs/1602.01783)
[9] [Trust Region Policy Optimization](https://arxiv.org/abs/1502.05477)
[10] [Proximal Policy Optimization Algorithms](https://arxiv.org/abs/1707.06347)
[11] [Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation](https://arxiv.org/abs/1708.05144)
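
The Code Structures section says `experience_replay/` holds the experience replay used by the off-policy algorithms (DQN, DDPG). The repository's actual class is not shown here, so the following is only a minimal sketch of the general technique: a fixed-capacity buffer with uniform sampling. The name `ReplayBuffer` and its method signatures are assumptions for illustration, not the repository's API.

```python
import random
from collections import deque


class ReplayBuffer:
    """Hypothetical fixed-capacity FIFO buffer of transitions (not the repo's class)."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest transition once the buffer is full.
        self.storage = deque(maxlen=capacity)

    def __len__(self):
        return len(self.storage)

    def add(self, obs, action, reward, next_obs, done):
        # Store one transition as a plain tuple.
        self.storage.append((obs, action, reward, next_obs, done))

    def sample(self, batch_size, rng=random):
        # Uniformly sample distinct transitions, then regroup them field by field.
        batch = rng.sample(list(self.storage), batch_size)
        obs, actions, rewards, next_obs, dones = zip(*batch)
        return obs, actions, rewards, next_obs, dones


# Usage: with capacity 3, only the three most recent of five transitions survive.
buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.add(obs=t, action=0, reward=1.0, next_obs=t + 1, done=False)
```

Sampling uniformly from old and new transitions breaks the temporal correlation between consecutive steps, which is the usual motivation for replay in off-policy training. In practice the sampled fields would be stacked into tensors before being passed to the network.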

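
The Code Structures section also mentions that `running_filter/` provides running mean filter functions to normalize observations in the MuJoCo environments. As a rough illustration of that idea, and assuming a scalar observation stream for brevity, the sketch below tracks a running mean and standard deviation with Welford's algorithm and uses them to standardize and clip new values. The class name `RunningMeanStd` and the `clip` and `eps` parameters are hypothetical, chosen for this example only.

```python
import math


class RunningMeanStd:
    """Hypothetical running mean/std tracker using Welford's algorithm."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        # Incrementally fold one new sample into the statistics.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        # Population standard deviation; reported as 0.0 with fewer than two samples.
        if self.count < 2:
            return 0.0
        return math.sqrt(self.m2 / self.count)

    def normalize(self, x, clip=5.0, eps=1e-8):
        # Standardize with the current statistics, then clip to [-clip, clip].
        z = (x - self.mean) / (self.std + eps)
        return max(-clip, min(clip, z))


# Usage: feed a short stream, then normalize new observations.
stat = RunningMeanStd()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stat.update(value)
```

For real MuJoCo observations the same update would be applied element-wise to a vector, typically with NumPy arrays in place of floats.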