# Deep reinforcement learning algorithms in Atari Pong
## Summary
The goal of this application is to find out how accurate and effective Deep Q-Learning (DQN) can be on the Atari 2600 game of Pong in the OpenAI Gym environment. On top of basic DQN, additional improvements to the same algorithm were tested, including Multi-step DQN, Double DQN and Dueling DQN. The results, shown in the graphs below, indicate that basic DQN achieves human-like performance after only ~110 played games and high accuracy after 300 games. The improved versions of DQN considered in this project showed some improvement in both efficiency and accuracy.
![Pong Gif](images/000.gif)
![Pong Gif](images/216.gif)
*Basic DQN: Episode 1 vs Episode 216*
## Environment
The Atari 2600 emulator is provided by OpenAI Gym, in which you can test your reinforcement learning algorithms on 59 different games. Deep reinforcement learning is used because the input is an RGB image of the current frame (210x160x3). Since the full RGB image is too computationally expensive, it is converted to grayscale. The image is then downsampled and cropped to the playable area, whose size is 84x84x1. https://gym.openai.com/envs/Pong-v0/
![](images/rgb_image.png)
*Grayscale, downsampled and cropped*
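The repository implements this preprocessing with Gym wrappers in `atari_wrappers.py`; below is a minimal sketch of the idea using OpenCV. The crop offsets are illustrative, following the standard DeepMind pipeline, not necessarily the exact values used here.

```python
import cv2
import numpy as np

def preprocess_frame(frame: np.ndarray) -> np.ndarray:
    """Turn a 210x160x3 RGB Atari frame into an 84x84x1 grayscale input."""
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)       # -> 210x160
    resized = cv2.resize(gray, (84, 110),                # dsize is (width, height)
                         interpolation=cv2.INTER_AREA)   # -> 110x84
    cropped = resized[18:102, :]                         # drop scoreboard/border -> 84x84
    return cropped[:, :, None].astype(np.uint8)          # -> 84x84x1
```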
---
In Pong, each game is played until one side reaches 21 points. A point is scored when the other side fails to return the ball. In terms of reward, the agent receives -1 when it misses the ball, +1 when the opponent misses the ball, and 0 in every other case. After one side collects 21 points, the agent's total reward for the episode is calculated. Therefore the minimum total reward is -21, human-like performance is above 0, and +21 is the best possible outcome.
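As a sketch of how these rewards appear through the Gym API (using a random policy purely for illustration; the repo's runs use `PongNoFrameskip-v4` with the classic `obs, reward, done, info` step interface):

```python
import gym

env = gym.make("PongNoFrameskip-v4")
obs = env.reset()
total_reward, done = 0.0, False
while not done:
    action = env.action_space.sample()           # random policy, illustration only
    obs, reward, done, info = env.step(action)   # reward is -1, 0 or +1
    total_reward += reward                       # episode total ends up in [-21, +21]
print(f"Total reward: {total_reward}")
env.close()
```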
## DQN
For the DQN implementation and the choice of hyperparameters, I mostly followed [Mnih et al.](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf). I extended the basic DQN with some variations, namely **Double Q-learning**, **Dueling networks** and **Multi-step learning**. You can find them summarized by [Hessel et al.](https://arxiv.org/pdf/1710.02298.pdf).
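To illustrate how these variations change the temporal-difference target, here is a sketch (the function name, tensor names and batch layout are assumptions for illustration, not the repo's exact code). Double Q-learning decouples action selection from evaluation, and multi-step learning discounts the bootstrap by gamma^n over an n-step reward sum:

```python
import torch
import torch.nn.functional as F

def dqn_loss(batch, online_net, target_net, gamma=0.99, n_steps=1, double=False):
    # batch: (states, actions, n_step_rewards, next_states, dones) as tensors
    states, actions, rewards, next_states, dones = batch
    with torch.no_grad():
        if double:
            # Double DQN: online net picks the action, target net evaluates it.
            best = online_net(next_states).argmax(dim=1, keepdim=True)
            next_q = target_net(next_states).gather(1, best).squeeze(1)
        else:
            # Basic DQN: target net both picks and evaluates the action.
            next_q = target_net(next_states).max(dim=1).values
        # Multi-step: rewards already sum n discounted steps, so bootstrap with gamma**n.
        targets = rewards + (gamma ** n_steps) * next_q * (1.0 - dones)
    q = online_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    return F.mse_loss(q, targets)
```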
For more detail about each improved version of DQN, you can check out these papers (a minimal dueling-head sketch follows the list):
* Multi-step DQN - [The "Bible" of Reinforcement Learning: Chapter 7 - Sutton & Barto](https://www.amazon.com/Reinforcement-Learning-Introduction-Adaptive-Computation/dp/0262039249/ref=as_li_ss_tl?keywords=reinforcement+learning&qid=1567849400&s=gateway&sr=8-1&linkCode=sl1&tag=andreaaffilia-20&linkId=e05d8ab8146051d903abb166926f6bce&language=en_US&tag=andreaaffilia-20)
* [Double DQN](https://arxiv.org/pdf/1509.06461.pdf)
* [Dueling DQN](http://proceedings.mlr.press/v48/wangf16.pdf)
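To make the dueling idea concrete, here is a minimal sketch of a dueling head (layer sizes are illustrative; the repo's own network lives in `neural_nets.py`). The Q-value is decomposed into a state value and a mean-centered advantage:

```python
import torch.nn as nn

class DuelingHead(nn.Module):
    """Q(s,a) = V(s) + A(s,a) - mean_a A(s,a), as in Wang et al. (2016)."""
    def __init__(self, in_features: int, n_actions: int):
        super().__init__()
        self.value = nn.Sequential(
            nn.Linear(in_features, 512), nn.ReLU(), nn.Linear(512, 1))
        self.advantage = nn.Sequential(
            nn.Linear(in_features, 512), nn.ReLU(), nn.Linear(512, n_actions))

    def forward(self, x):
        v, a = self.value(x), self.advantage(x)
        # Subtracting the mean advantage keeps V and A identifiable.
        return v + a - a.mean(dim=1, keepdim=True)
```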
## Results
Efficiency and accuracy are the two main factors in evaluating the results. Efficiency measures how quickly the agent reaches a human-like level, while accuracy represents how close the agent gets to the maximum total reward of +21. The graphs show the mean total reward (over the last 40 games) after each game. The agent was trained for up to 500 games with each variation of the algorithm.
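The smoothing used for plotting can be reproduced with a simple running mean (a sketch, assuming the per-episode totals are stored in a plain list):

```python
import numpy as np

def mean_over_last(episode_rewards, window=40):
    """Mean total reward over the last `window` games, one value per game."""
    return [float(np.mean(episode_rewards[max(0, i + 1 - window):i + 1]))
            for i in range(len(episode_rewards))]
```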
### Optimizers
Adam and RMSProp were the two optimizers tested in this project. A graph comparing the two can be seen below, followed by a minimal configuration sketch after the legend. RMSProp clearly outperformed Adam in these tests, although more runs are needed for better averages before a clear verdict can be given. Other optimizers, such as SGD or Adamax, could be tested in the future.
![](images/graph_optim.png)
- ![#ff7043](https://via.placeholder.com/15/ff7043/000000?text=+) `Basic DQN Adam`
- ![#bbbbbb](https://via.placeholder.com/15/bbbbbb/000000?text=+) `Basic DQN RMSProp`
- ![#0077bb](https://via.placeholder.com/15/0077bb/000000?text=+) `2-step DQN Adam`
- ![#009988](https://via.placeholder.com/15/009988/000000?text=+) `2-step DQN RMSProp`
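Swapping optimizers is a one-line change in PyTorch; a hypothetical configuration sketch (the learning rate and the stand-in network are assumptions, not necessarily the values used in these runs):

```python
import torch.nn as nn
import torch.optim as optim

net = nn.Linear(4, 2)  # stand-in for the DQN network
# Hypothetical learning rate; swap one line to switch between the tested optimizers.
optimizer = optim.RMSprop(net.parameters(), lr=1e-4)
# optimizer = optim.Adam(net.parameters(), lr=1e-4)
```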
### Algorithms
A few selected variations of the implemented algorithms are shown below. Although 2-step DQN and Double DQN appear to outperform Dueling DQN in efficiency, keep in mind that these results need to be averaged over many runs, as both Double DQN and 2-step DQN showed high variance (both better and worse than Dueling DQN). As for accuracy, Dueling DQN combined with the other DQN variations showed the best results. For more information about viewing all of the data, check out the next section.
![](images/graph_total.png)
- ![#ff7043](https://via.placeholder.com/15/ff7043/000000?text=+) `Basic DQN Adam`
- ![#ee3377](https://via.placeholder.com/15/ee3377/000000?text=+) `2-step Dueling DQN RMSProp`
- ![#009988](https://via.placeholder.com/15/009988/000000?text=+) `2-step Dueling Double DQN RMSProp`
- ![#0077bb](https://via.placeholder.com/15/0077bb/000000?text=+) `2-step Double DQN RMSProp`
- ![#bbbbbb](https://via.placeholder.com/15/bbbbbb/000000?text=+) `2-step DQN RMSProp`
---
* Mean total reward over last 10 games
  * Best efficiency recorded: **2-step DQN RMSProp - after 79 games**
  * Best accuracy recorded: **2-step Dueling Double DQN RMSProp - 20.30 score (after 444 games)**
* Mean total reward over last 40 games
  * Best efficiency recorded: **2-step DQN RMSProp - after 93 games**
  * Best accuracy recorded: **2-step Dueling Double DQN RMSProp - 19.48 score (after 473 games)**
## Rest of data and TensorBoard
The rest of the training data can be found at [/content/runs](https://github.com/leonjovanovic/deep-reinforcement-learning-atari-pong/tree/main/content/runs). If you wish to inspect it and compare runs, I recommend using TensorBoard. After installing it, simply point it at the directory where the data is stored:
```bash
LOG_DIR="full/path/to/data"
tensorboard --logdir="$LOG_DIR" --host=127.0.0.1
```
and open http://localhost:6006 in your browser.
For information about installation and further questions, visit the [TensorBoard GitHub](https://github.com/tensorflow/tensorboard/blob/master/README.md).
## Future improvements
For further improvements in efficiency and accuracy, a couple of things can be done:
* A slower epsilon decay, a bigger replay memory and a longer training time may produce better results (see the sketch after this list)
* Implement [Prioritized Experience Replay](https://arxiv.org/pdf/1511.05952.pdf)
* Implement [Noisy Networks for Exploration](https://arxiv.org/pdf/1706.10295.pdf)
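For instance, a slower epsilon decay is a one-parameter change in a linear schedule (a sketch with assumed values, not the project's actual hyperparameters):

```python
def epsilon_by_frame(frame_idx, eps_start=1.0, eps_final=0.02, decay_frames=100_000):
    # Larger decay_frames -> slower decay -> longer exploration phase.
    fraction = min(frame_idx / decay_frames, 1.0)
    return eps_start + fraction * (eps_final - eps_start)
```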