# Deep Reinforcement Learning Algorithms
![logo](figures/logo.png)
![MIT License](https://img.shields.io/badge/license-MIT-blue.svg)
This repository implements classic deep reinforcement learning algorithms in **PyTorch**. Its aim is to provide clear code for people learning deep reinforcement learning. More algorithms will be added in the future, and the existing code will continue to be maintained.
## Current Implementations
- [x] Deep Q-Learning Network (DQN)
  - [x] Basic DQN
  - [x] Double Q network
  - [x] Dueling Network Architecture
- [x] Deep Deterministic Policy Gradient (DDPG)
- [x] Advantage Actor-Critic (A2C)
- [x] Trust Region Policy Optimization (TRPO)
- [x] Proximal Policy Optimization (PPO)
- [ ] Actor Critic using Kronecker-Factored Trust Region (ACKTR)
- [ ] Soft Actor Critic (SAC)
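
For orientation, the Basic DQN and Double Q network entries above differ mainly in how the bootstrap target is formed. The following is a minimal, framework-free sketch of that difference, not the code used in this repository; the function names and the toy Q-values are made up for illustration.

```python
def dqn_target(reward, done, next_q_target, gamma=0.99):
    """Basic DQN: the target network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, done, next_q_online, next_q_target, gamma=0.99):
    """Double DQN: the online network selects the action, the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# Toy Q-values for the next state (illustrative numbers only).
q_online = [1.0, 2.0, 0.5]
q_target = [3.0, 1.5, 0.2]
y_dqn = dqn_target(1.0, False, q_target, gamma=0.9)
y_double = double_dqn_target(1.0, False, q_online, q_target, gamma=0.9)
```

In this toy case the Double DQN target is lower, because the action preferred by the online network is not the one the target network values most; this decoupling is what reduces overestimation.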
## Update Info
:triangular_flag_on_post: **2018-10-17** - In this update, most of the algorithms have been improved and **more experiments with plots have been added** (except for DDPG). **PPO** now supports **Atari games** and **MuJoCo environments**. **TRPO** is much more stable and achieves better results!
:triangular_flag_on_post: **2019-07-15** - In this update, installing OpenAI Baselines is no longer needed. I have integrated the useful functions into the **rl_utils** module. DDPG has also been re-implemented and now has more results. The README has been updated, and the code structure has been adjusted slightly.
:triangular_flag_on_post: **2019-07-26** - In this update, the revised repository is made public. To keep the repository small, I **rebuilt** it and deleted the previous version, but I will keep a backup on Google Drive.
## TODO List
- [ ] Add prioritized experience replay.
- [x] Stop using OpenAI Baselines' pre-processing functions.
- [x] Improve **DDPG** - I have already implemented Hindsight Experience Replay (HER) with DDPG in PyTorch; you can check it out [here](https://github.com/TianhongDai/hindsight-experience-replay).
- [ ] Upload pre-trained models to Google Drive (coming soon!).
## Requirements
- pytorch=1.0.1
- gym=0.12.5
- mpi4py
- mujoco-py
## Installation
1. Install our `rl_utils` module:
```bash
pip install -e .
```
2. Install MuJoCo: please follow the instructions in the [official mujoco-py repository](https://github.com/openai/mujoco-py).
3. Install Box2D:
```bash
# Install swig first: apt-get on Debian/Ubuntu, or Homebrew on macOS
sudo apt-get install swig   # Debian/Ubuntu
# brew install swig         # macOS
pip install gym[box2d]
pip install box2d box2d-kengz
```
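
After installation, a quick check that the packages can be located may save a failed training run later. The snippet below is only a sketch: the mapping from package names to import names (`torch`, `gym`, `mpi4py`, `mujoco_py`, `rl_utils`) is an assumption based on the packages named in this README, and the helper `check_requirements` is hypothetical, not part of the repository.

```python
import importlib.util

# Assumed import names for the packages this README asks you to install.
REQUIRED_MODULES = {
    "pytorch": "torch",
    "gym": "gym",
    "mpi4py": "mpi4py",
    "mujoco-py": "mujoco_py",
    "rl_utils": "rl_utils",
}


def check_requirements(modules=REQUIRED_MODULES):
    """Return {package: bool}, True when the module can be located without importing it."""
    return {
        package: importlib.util.find_spec(module) is not None
        for package, module in modules.items()
    }


if __name__ == "__main__":
    for package, found in check_requirements().items():
        print(f"{package:10s} {'found' if found else 'MISSING'}")
```

This only confirms that each module is discoverable; it does not verify versions or that MuJoCo itself is licensed and working.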
## Instructions
1. Train the agent (details can be found in each folder):
```
cd target_algo_folder/
python train.py --<arguments you need>
```
2. Play the demo:
```
cd target_algo_folder/
python demo.py --<arguments you need>
```
## Code Structures
1. **RL algorithms**:
   - `arguments.py`: contains the parameters used for training.
   - `<rl-name>_agent.py`: contains the core of the reinforcement learning algorithm.
   - `models.py`: defines the network structures for the policy and value function.
   - `utils.py`: helper functions, such as **action selection**.
   - `train.py`: the script that trains the agent.
   - `demo.py`: visualizes the trained agent.
2. **rl_utils** module:
   - `env_wrapper/`: contains the pre-processing functions for Atari games and the wrappers used to create environments.
   - `experience_replay/`: contains the experience replay buffer for the off-policy RL algorithms.
   - `logger/`: contains functions that record log information during training.
   - `mpi_utils/`: contains the tools for MPI training.
   - `running_filter/`: contains the running mean filter used to normalize observations in the MuJoCo environments.
   - `seeds/`: contains a function that sets the random seeds so that training is reproducible.
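
To make two of the utilities above more concrete, here is a minimal standard-library sketch of a uniform experience replay buffer and a running mean filter. It is not the code in `rl_utils`: the class names `ReplayBuffer` and `RunningMeanStd` are hypothetical, and the filter handles a scalar stream for brevity, whereas real observations are vectors.

```python
import math
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size FIFO buffer of transitions, sampled uniformly at random."""

    def __init__(self, capacity):
        self.storage = deque(maxlen=capacity)  # oldest transitions are dropped first

    def add(self, obs, action, reward, next_obs, done):
        self.storage.append((obs, action, reward, next_obs, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement, regrouped as one tuple per field.
        batch = random.sample(list(self.storage), batch_size)
        return tuple(zip(*batch))

    def __len__(self):
        return len(self.storage)


class RunningMeanStd:
    """Tracks the running mean and variance of a scalar stream (Welford's algorithm)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        # Sample standard deviation; defined as 1.0 until two values have been seen.
        if self.count < 2:
            return 1.0
        return math.sqrt(self.m2 / (self.count - 1))

    def normalize(self, x, clip=5.0, eps=1e-8):
        # Standardize and clip, so outliers cannot dominate the network input.
        z = (x - self.mean) / (self.std + eps)
        return max(-clip, min(clip, z))


# Usage: keep only the 3 most recent transitions, then draw a batch of 2.
buffer = ReplayBuffer(capacity=3)
for t in range(5):
    buffer.add(t, 0, 1.0, t + 1, False)
obs, actions, rewards, next_obs, dones = buffer.sample(2)

rms = RunningMeanStd()
for value in [1.0, 2.0, 3.0, 4.0]:
    rms.update(value)
```

The incremental update avoids storing past observations, which matters when statistics are accumulated over millions of environment steps.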
## Example Results
### 1. DQN algorithms
![dqn_performance](figures/01_dqn.png)
### 2. DDPG
![ddpg](figures/02_ddpg.png)
### 3. A2C
![a2c](figures/03_a2c.png)
### 4. TRPO
![trpo](figures/04_trpo.png)
### 5. PPO
![ppo](figures/05_ppo.png)
## Demos
Atari Env (BreakoutNoFrameskip-v4)| Box2d Env (BipedalWalker-v2)| Mujoco Env (Hopper-v2)
-----------------------|-----------------------|-----------------------
![](figures/breakout.gif)| ![](figures/bipedal.gif)| ![](figures/hopper.gif)
## Acknowledgement
- [Ilya Kostrikov's GitHub](https://github.com/ikostrikov)
- [OpenAI Baselines](https://github.com/openai/baselines)
- [Kai's suggestions to simplify MPI functions](https://github.com/Kaixhin)
## Related Papers
[1] [A Brief Survey of Deep Reinforcement Learning](https://arxiv.org/abs/1708.05866)
[2] [The Beta Policy for Continuous Control Reinforcement Learning](https://www.ri.cmu.edu/wp-content/uploads/2017/06/thesis-Chou.pdf)
[3] [Playing Atari with Deep Reinforcement Learning](https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf)
[4] [Deep Reinforcement Learning with Double Q-learning](https://arxiv.org/abs/1509.06461)
[5] [Dueling Network Architectures for Deep Reinforcement Learning](https://arxiv.org/abs/1511.06581)
[6] [Continuous control with deep reinforcement learning](https://arxiv.org/abs/1509.02971)
[7] [Continuous Deep Q-Learning with Model-based Acceleration](https://arxiv.org/abs/1603.00748)
[8] [Asynchronous Methods for Deep Reinforcement Learning](https://arxiv.org/abs/1602.01783)
[9] [Trust Region Policy Optimization](https://arxiv.org/abs/1502.05477)
[10] [Proximal Policy Optimization Algorithms](https://arxiv.org/abs/1707.06347)
[11] [Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation](https://arxiv.org/abs/1708.05144)