DeepReinforcementLearning: Deep RL implementations. DQN, SAC, DDPG, TD3, PPO, and VPG implemented in PyTorch. Tested environments: LunarLander-v2 and Pendulum-v0


DeepReinforcementLearning-main.zip (42 files)

DeepReinforcementLearning-main

td3.py 4KB

.ipynb_checkpoints

test_and_intial_Experimentation-checkpoint.ipynb 72B

Policy Gradient Methods-checkpoint.ipynb 13KB

RLUtils

__init__.py 21B

utils.py 3KB

__pycache__

utils.cpython-37.pyc 4KB

__init__.cpython-37.pyc 179B

SoftActorCritic.py 3KB

Policy Gradient Methods.ipynb 13KB

Readme.md 2KB

.idea

.gitignore 47B

misc.xml 292B

vcs.xml 180B

inspectionProfiles

Project_Default.xml 659B

profiles_settings.xml 174B

modules.xml 294B

ReinforcementLearning.iml 317B

ppo_clip.py 4KB

ddpg.py 9KB

agents

__init__.py 57B

agent.py 125B

__pycache__

__init__.cpython-37.pyc 225B

agent.cpython-37.pyc 560B

ActorCriticAgents

__init__.py 63B

PPO_clip_agent.py 11KB

td3_agent.py 7KB

soft_Actor_critic_Agent.py 7KB

__pycache__

soft_Actor_critic_Agent.cpython-37.pyc 6KB

td3_agent.cpython-37.pyc 6KB

PPO_clip_agent.cpython-37.pyc 8KB

__init__.cpython-37.pyc 235B

MLPAgent.py 0B

figures

PPO_MountainCarContinuous-v0_rewards.png 22KB

DQN_Lunar_lander_losses.png 38KB

VPG_LunarLander-v2_rewards.png 38KB

SAC_Pendulum-v0_rewards.png 51KB

DQN_Lunar_lander_rewards.png 48KB

TD3_Pendulum_rewards.png 62KB

DDPG_Pendulum-v0_rewards.png 43KB

PPO_Pendulum-v0_rewards.png 57KB

vanilla_policy_gradient.py 8KB

DQN.py 19KB

### Deep RL algorithms implemented using PyTorch

#### Algo list:
1. [DQN](https://github.com/akashe/DeepReinforcementLearning/blob/main/DQN.py)
2. [Vanilla Policy Gradient](https://github.com/akashe/DeepReinforcementLearning/blob/main/vanilla_policy_gradient.py)
3. [Deep Deterministic Policy Gradient](https://github.com/akashe/DeepReinforcementLearning/blob/main/ddpg.py)
4. [Twin Delayed Deep Deterministic Policy Gradient](https://github.com/akashe/DeepReinforcementLearning/blob/main/td3.py)
5. [Soft Actor Critic](https://github.com/akashe/DeepReinforcementLearning/blob/main/SoftActorCritic.py)
6. [Proximal Policy Optimization - CLIP](https://github.com/akashe/DeepReinforcementLearning/blob/main/ppo_clip.py)

###### Article taking a deeper look into [policy gradients](https://akashe.io/blog/2020/10/14/policy-gradient-methods/)

#### Experimental Results:
| Algorithm | Discrete Env: LunarLander-v2 | Continuous Env: Pendulum-v0 |
| :---: | :---: | :---: |
| DQN | ![LunarLander-DQN](https://raw.githubusercontent.com/akashe/DeepReinforcementLearning/main/figures/DQN_Lunar_lander_rewards.png) | - |
| VPG | ![LunarLander-VPG](https://raw.githubusercontent.com/akashe/DeepReinforcementLearning/main/figures/VPG_LunarLander-v2_rewards.png) | - |
| DDPG | - | ![Pendulum-DDPG](https://raw.githubusercontent.com/akashe/DeepReinforcementLearning/main/figures/DDPG_Pendulum-v0_rewards.png) |
| TD3 | - | ![Pendulum-TD3](https://raw.githubusercontent.com/akashe/DeepReinforcementLearning/main/figures/TD3_Pendulum_rewards.png) |
| SAC | - | ![Pendulum-SAC](https://raw.githubusercontent.com/akashe/DeepReinforcementLearning/main/figures/SAC_Pendulum-v0_rewards.png) |
| PPO | - | ![Pendulum-PPO](https://raw.githubusercontent.com/akashe/DeepReinforcementLearning/main/figures/PPO_Pendulum-v0_rewards.png) |

#### Usage:
Just run the file for the algorithm you want directly. There is no common structure between the algorithms, as I implemented them as I learnt them. Different algorithms are inspired by different sources.
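Since each algorithm lives in its own top-level script, running one amounts to invoking that file with Python from the repository root. The helper below is only a minimal sketch of that idea, not part of the repository: the short names (`dqn`, `vpg`, ...) and the functions `build_command` and `run_algorithm` are hypothetical, and only the script file names come from the algo list. It also assumes the scripts take no required command-line arguments.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# Script file names as they appear in the algo list.
# The short keys are hypothetical aliases chosen for this sketch.
ALGORITHM_SCRIPTS = {
    "dqn": "DQN.py",
    "vpg": "vanilla_policy_gradient.py",
    "ddpg": "ddpg.py",
    "td3": "td3.py",
    "sac": "SoftActorCritic.py",
    "ppo": "ppo_clip.py",
}


def build_command(algorithm: str) -> List[str]:
    """Return the command that runs one algorithm script with the current interpreter."""
    key = algorithm.lower()
    if key not in ALGORITHM_SCRIPTS:
        raise ValueError(
            f"unknown algorithm {algorithm!r}; choose from {sorted(ALGORITHM_SCRIPTS)}"
        )
    return [sys.executable, ALGORITHM_SCRIPTS[key]]


def run_algorithm(algorithm: str, repo_dir: str = ".") -> int:
    """Run the script from repo_dir (assumed to be the repository root) and return its exit code."""
    command = build_command(algorithm)
    script = Path(repo_dir) / command[1]
    if not script.is_file():
        raise FileNotFoundError(f"{script} not found; is repo_dir the repository root?")
    # Assumption: the script needs no extra arguments and runs its own training loop.
    return subprocess.run(command, cwd=repo_dir).returncode
```

For example, `run_algorithm("td3", repo_dir="DeepReinforcementLearning-main")` would be equivalent to running `python td3.py` inside the extracted folder.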
#### Resources:
1. [RL course by David Silver](https://www.youtube.com/watch?v=KHZVXao4qXs&list=PLqYmG7hTraZDM-OYHWgPebj2MfCFzFObQ&index=7)
2. [Lecture slides for the above course](https://www.davidsilver.uk/teaching/)
3. [Spinning Up by OpenAI](https://spinningup.openai.com)
4. [More exhaustive RL guide by Denny Britz](https://github.com/dennybritz/reinforcement-learning)

#### Future projects:
1. If time is available, I will add a simple program for an elevator using RL.
2. Better graphs