# Vanilla DQN, Double DQN, and Dueling DQN in PyTorch
## Description
This repo is a [PyTorch](https://www.pytorch.org/) implementation of Vanilla DQN, Double DQN, and Dueling DQN based on the following papers.
- [Human-level control through deep reinforcement learning](http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html)
- [Deep Reinforcement Learning with Double Q-learning](https://arxiv.org/abs/1509.06461)
- [Dueling Network Architectures for Deep Reinforcement Learning](https://arxiv.org/abs/1511.06581)
Starter code is taken from [Berkeley CS 294 Assignment 3](https://github.com/berkeleydeeprlcourse/homework/tree/master/hw3) and modified for PyTorch with some guidance from [here](https://github.com/transedward/pytorch-dqn). TensorBoard logging has also been added (thanks to [this tutorial](https://github.com/yunjey/pytorch-tutorial/blob/master/tutorials/04-utils/tensorboard)) for visualization during training, in addition to what the Gym Monitor already provides.
## Background
Deep Q-networks use neural networks as function approximators for the action-value function, Q. The architecture used here takes frames from the Atari simulator as input (i.e., the state) and passes them through two convolutional layers and two fully connected layers before outputting a Q value for each action.
<p align="center">
<img src="assets/nature_dqn_model.png" height="300px">
</p>
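Below is a minimal PyTorch sketch of such a network, assuming the standard Atari preprocessing of four stacked 84x84 grayscale frames; the layer sizes are illustrative and may differ from the repo's `model.py`.

```python
import torch.nn as nn

class DQNSketch(nn.Module):
    """Sketch of the Q-network described above: two convolutional layers
    followed by two fully connected layers, one Q value per action.
    Layer sizes are illustrative, not necessarily the repo's model.py."""

    def __init__(self, in_channels=4, num_actions=6):
        super(DQNSketch, self).__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=8, stride=4),  # 84x84 -> 20x20
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2),           # 20x20 -> 9x9
            nn.ReLU(),
        )
        self.fc = nn.Sequential(
            nn.Linear(32 * 9 * 9, 256),
            nn.ReLU(),
            nn.Linear(256, num_actions),  # one Q value per action
        )

    def forward(self, x):
        x = x.float() / 255.0                   # raw frames assumed to be uint8 in [0, 255]
        x = self.conv(x)
        return self.fc(x.view(x.size(0), -1))   # flatten conv features before the FC layers
```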
[Human-level control through deep reinforcement learning](http://www.nature.com/nature/journal/v518/n7540/full/nature14236.html) introduced an experience replay buffer that stores past observations and uses them as training input to reduce correlations between data samples. The authors also use a separate target network, consisting of the weights from a past time step, to calculate the target Q value. These weights are periodically updated to match the latest weights of the main Q network, which reduces the correlation between the target and current Q values. The Q target is calculated as shown below.
<p align="center">
<img src="assets/nature_dqn_target.png" height="100px">
</p>
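As a sketch, this target can be computed from a batch sampled out of the replay buffer roughly as follows; the tensor names and shapes (`reward` and `done` as 1-D float tensors) are assumptions for illustration, not the repo's exact `learn.py` code.

```python
import torch

def dqn_target(reward, next_obs, done, target_net, gamma=0.99):
    """Vanilla DQN target: y = r + gamma * max_a' Q_target(s', a'),
    with the bootstrap term zeroed out for terminal transitions."""
    with torch.no_grad():
        next_q = target_net(next_obs).max(dim=1)[0]    # max over actions from the frozen target net
        return reward + gamma * next_q * (1.0 - done)  # no bootstrap at terminal states
```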
Noting that vanilla DQN can overestimate action values, [Deep Reinforcement Learning with Double Q-learning](https://arxiv.org/abs/1509.06461) proposes an alternative Q target in which the action is chosen by taking the argmax of the current Q network evaluated on the next observations. These actions, together with the next observations, are then passed through the frozen target network to produce the Q values used at each update. The new Q target is shown below.
<p align="center">
<img src="assets/double_q_target.png" height="70px">
</p>
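A sketch of the Double DQN target under the same assumptions (tensor names and shapes are illustrative): the current (online) network selects the action, while the frozen target network evaluates it.

```python
import torch

def double_dqn_target(reward, next_obs, done, online_net, target_net, gamma=0.99):
    """Double DQN target: y = r + gamma * Q_target(s', argmax_a Q_online(s', a))."""
    with torch.no_grad():
        best_actions = online_net(next_obs).argmax(dim=1, keepdim=True)   # action selection by online net
        next_q = target_net(next_obs).gather(1, best_actions).squeeze(1)  # evaluation by target net
        return reward + gamma * next_q * (1.0 - done)                     # no bootstrap at terminal states
```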
Finally, [Dueling Network Architectures for Deep Reinforcement Learning](https://arxiv.org/abs/1511.06581) proposes a different architecture for approximating the Q function. After the last convolutional layer, the output is split into two streams that separately estimate the state value and the advantage of each action in that state. These two estimates are then combined to produce a Q value via the equation below. The dueling architecture is also shown below, in contrast to the traditional deep Q-network.
<p align="center">
<img src="assets/dueling_q_target.png" height="150px">
<img src="assets/dueling_q_arch.png" height="300px">
</p>
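A rough sketch of the dueling head is shown below, with illustrative feature and hidden sizes; the value and advantage streams are recombined by subtracting the mean advantage, as in the equation above.

```python
import torch.nn as nn

class DuelingHead(nn.Module):
    """Sketch of the dueling head: shared conv features feed a state-value
    stream V(s) and an advantage stream A(s, a), recombined as
    Q = V + (A - mean_a A). Sizes are illustrative."""

    def __init__(self, feature_dim=32 * 9 * 9, hidden=256, num_actions=6):
        super(DuelingHead, self).__init__()
        self.value = nn.Sequential(
            nn.Linear(feature_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.advantage = nn.Sequential(
            nn.Linear(feature_dim, hidden), nn.ReLU(), nn.Linear(hidden, num_actions))

    def forward(self, features):
        v = self.value(features)                    # (batch, 1)
        a = self.advantage(features)                # (batch, num_actions)
        return v + a - a.mean(dim=1, keepdim=True)  # subtract mean advantage for identifiability
```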
## Dependencies
- Python 2.7
- [PyTorch 0.2.0](http://pytorch.org/)
- [NumPy](http://www.numpy.org/)
- [OpenAI Gym](https://github.com/openai/gym)
- [OpenCV 3.3.0](https://pypi.python.org/pypi/opencv-python)
- [Tensorboard](https://github.com/tensorflow/tensorboard)
## Usage
- Execute the following command to train a model with vanilla DQN:
```
$ python main.py train --task-id $TASK_ID
```
From the Atari40M spec, here are the different environments you can use:
* `0`: BeamRider
* `1`: Breakout
* `2`: Enduro
* `3`: Pong
* `4`: Qbert
* `5`: Seaquest
* `6`: SpaceInvaders
Here are some options that you can use (an example command is shown after this list):
* `--gpu`: id of the GPU you want to use (if not specified, will train on CPU)
* `--double-dqn`: 1 to train with double DQN, 0 for vanilla DQN
* `--dueling-dqn`: 1 to train with dueling DQN, 0 for vanilla DQN
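For example, to train a Double DQN agent on Pong using GPU 0:
```
$ python main.py train --task-id 3 --gpu 0 --double-dqn 1
```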
## Results
### SpaceInvaders
Sample gameplay
<p align="center">
<img src="assets/spaceinvaders.gif" height="400">
</p>
### Pong
Sample gameplay
<p align="center">
<img src="assets/pong.gif" height="400">
</p>
### Breakout
Sample gameplay
<p align="center">
<img src="assets/breakout.gif" height="400">
</p>