# Deep reinforcement learning algorithms in Atari Pong
## Summary
The goal of this application is to find out how accurate and effective Deep Q-Learning (DQN) can be on the Atari 2600 game of Pong in the OpenAI Gym environment. On top of basic DQN, additional improvements to the same algorithm were tested, including Multi-step DQN, Double DQN and Dueling DQN. The results, shown in the graphs below, indicate that basic DQN achieves human-like performance after only ~110 played games and high accuracy after 300 games. The improved versions of DQN considered in this project showed some improvement in both efficiency and accuracy.
![Pong Gif](images/000.gif)
![Pong Gif](images/216.gif)
*Basic DQN: Episode 1 vs Episode 216*
## Environment
The Atari 2600 emulator is provided by OpenAI Gym, in which you can test your reinforcement learning algorithms on 59 different games. Deep reinforcement learning is used because the input is an RGB image of the current frame (210x160x3). Since the full RGB image is too computationally expensive, it is converted to grayscale. The image is then downsampled and cropped to the playable area, whose size is 84x84x1. https://gym.openai.com/envs/Pong-v0/
![](images/rgb_image.png)
*Grayscale, downsampled and cropped*
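The repository implements this preprocessing with Gym wrappers in `atari_wrappers.py`; below is a minimal sketch of the idea using OpenCV. The crop offsets are illustrative, following the standard DeepMind pipeline, not necessarily the exact values used here.

```python
import cv2
import numpy as np

def preprocess_frame(frame: np.ndarray) -> np.ndarray:
    """Turn a 210x160x3 RGB Atari frame into an 84x84x1 grayscale input."""
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)       # -> 210x160
    resized = cv2.resize(gray, (84, 110),                # dsize is (width, height)
                         interpolation=cv2.INTER_AREA)   # -> 110x84
    cropped = resized[18:102, :]                         # drop scoreboard/border -> 84x84
    return cropped[:, :, None].astype(np.uint8)          # -> 84x84x1
```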
---
In Pong, each game is played until one side reaches 21 points. A point is scored when the other side fails to return the ball. In terms of reward, the agent receives -1 when it misses the ball, +1 when the opponent misses the ball, and 0 in every other case. After one side collects 21 points, the agent's total reward for the episode is calculated. Therefore the minimum total reward is -21, human-like performance is above 0, and +21 is the best possible outcome.
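As a sketch of how these rewards appear through the Gym API (using a random policy purely for illustration; the repo's runs use `PongNoFrameskip-v4` with the classic `obs, reward, done, info` step interface):

```python
import gym

env = gym.make("PongNoFrameskip-v4")
obs = env.reset()
total_reward, done = 0.0, False
while not done:
    action = env.action_space.sample()           # random policy, illustration only
    obs, reward, done, info = env.step(action)   # reward is -1, 0 or +1
    total_reward += reward                       # episode total ends up in [-21, +21]
print(f"Total reward: {total_reward}")
env.close()
```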
## DQN
For the DQN implementation and the choice of hyperparameters, I mostly followed [Mnih et al.](https://storage.googleapis.com/deepmind-media/dqn/DQNNaturePaper.pdf). I extended the basic DQN with some variations, namely **Double Q-learning**, **Dueling networks** and **Multi-step learning**. You can find them summarized by [Hessel et al.](https://arxiv.org/pdf/1710.02298.pdf).
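To illustrate how these variations change the temporal-difference target, here is a sketch (the function name, tensor names and batch layout are assumptions for illustration, not the repo's exact code). Double Q-learning decouples action selection from evaluation, and multi-step learning discounts the bootstrap by gamma^n over an n-step reward sum:

```python
import torch
import torch.nn.functional as F

def dqn_loss(batch, online_net, target_net, gamma=0.99, n_steps=1, double=False):
    # batch: (states, actions, n_step_rewards, next_states, dones) as tensors
    states, actions, rewards, next_states, dones = batch
    with torch.no_grad():
        if double:
            # Double DQN: online net picks the action, target net evaluates it.
            best = online_net(next_states).argmax(dim=1, keepdim=True)
            next_q = target_net(next_states).gather(1, best).squeeze(1)
        else:
            # Basic DQN: target net both picks and evaluates the action.
            next_q = target_net(next_states).max(dim=1).values
        # Multi-step: rewards already sum n discounted steps, so bootstrap with gamma**n.
        targets = rewards + (gamma ** n_steps) * next_q * (1.0 - dones)
    q = online_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    return F.mse_loss(q, targets)
```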
For more detail about each improved version of DQN, you can check out these papers (a minimal dueling-head sketch follows the list):
* Multi-step DQN - [The "Bible" of Reinforcement Learning: Chapter 7 - Sutton & Barto](https://www.amazon.com/Reinforcement-Learning-Introduction-Adaptive-Computation/dp/0262039249/ref=as_li_ss_tl?keywords=reinforcement+learning&qid=1567849400&s=gateway&sr=8-1&linkCode=sl1&tag=andreaaffilia-20&linkId=e05d8ab8146051d903abb166926f6bce&language=en_US&tag=andreaaffilia-20)
* [Double DQN](https://arxiv.org/pdf/1509.06461.pdf)
* [Dueling DQN](http://proceedings.mlr.press/v48/wangf16.pdf)
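To make the dueling idea concrete, here is a minimal sketch of a dueling head (layer sizes are illustrative; the repo's own network lives in `neural_nets.py`). The Q-value is decomposed into a state value and a mean-centered advantage:

```python
import torch.nn as nn

class DuelingHead(nn.Module):
    """Q(s,a) = V(s) + A(s,a) - mean_a A(s,a), as in Wang et al. (2016)."""
    def __init__(self, in_features: int, n_actions: int):
        super().__init__()
        self.value = nn.Sequential(
            nn.Linear(in_features, 512), nn.ReLU(), nn.Linear(512, 1))
        self.advantage = nn.Sequential(
            nn.Linear(in_features, 512), nn.ReLU(), nn.Linear(512, n_actions))

    def forward(self, x):
        v, a = self.value(x), self.advantage(x)
        # Subtracting the mean advantage keeps V and A identifiable.
        return v + a - a.mean(dim=1, keepdim=True)
```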
## Results
Efficiency and accuracy are the two main factors in evaluating the results. Efficiency measures how quickly the agent reaches a human-like level, while accuracy represents how close the agent gets to the maximum total reward of +21. The graphs show the mean total reward (over the last 40 games) after each game. The agent was trained for up to 500 games with each variation of the algorithm.
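The smoothing used for plotting can be reproduced with a simple running mean (a sketch, assuming the per-episode totals are stored in a plain list):

```python
import numpy as np

def mean_over_last(episode_rewards, window=40):
    """Mean total reward over the last `window` games, one value per game."""
    return [float(np.mean(episode_rewards[max(0, i + 1 - window):i + 1]))
            for i in range(len(episode_rewards))]
```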
### Optimizers
Adam and RMSProp were the two optimizers tested in this project. A graph comparing the two can be seen below, followed by a minimal configuration sketch after the legend. RMSProp clearly outperformed Adam in these tests, although more runs are needed for better averages before a clear verdict can be given. Other optimizers, such as SGD or Adamax, could be tested in the future.
![](images/graph_optim.png)
- ![#ff7043](https://via.placeholder.com/15/ff7043/000000?text=+) `Basic DQN Adam`
- ![#bbbbbb](https://via.placeholder.com/15/bbbbbb/000000?text=+) `Basic DQN RMSProp`
- ![#0077bb](https://via.placeholder.com/15/0077bb/000000?text=+) `2-step DQN Adam`
- ![#009988](https://via.placeholder.com/15/009988/000000?text=+) `2-step DQN RMSProp`
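Swapping optimizers is a one-line change in PyTorch; a hypothetical configuration sketch (the learning rate and the stand-in network are assumptions, not necessarily the values used in these runs):

```python
import torch.nn as nn
import torch.optim as optim

net = nn.Linear(4, 2)  # stand-in for the DQN network
# Hypothetical learning rate; swap one line to switch between the tested optimizers.
optimizer = optim.RMSprop(net.parameters(), lr=1e-4)
# optimizer = optim.Adam(net.parameters(), lr=1e-4)
```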
### Algorithms
A few selected variations of the implemented algorithms are shown below. Although 2-step DQN and Double DQN appear to outperform Dueling DQN in efficiency, keep in mind that these results need to be averaged over many runs, as both Double DQN and 2-step DQN showed high variance (both better and worse than Dueling DQN). As for accuracy, Dueling DQN combined with the other DQN variations showed the best results. For more information about viewing all of the data, check out the next section.
![](images/graph_total.png)
- ![#ff7043](https://via.placeholder.com/15/ff7043/000000?text=+) `Basic DQN Adam`
- ![#ee3377](https://via.placeholder.com/15/ee3377/000000?text=+) `2-step Dueling DQN RMSProp`
- ![#009988](https://via.placeholder.com/15/009988/000000?text=+) `2-step Dueling Double DQN RMSProp`
- ![#0077bb](https://via.placeholder.com/15/0077bb/000000?text=+) `2-step Double DQN RMSProp`
- ![#bbbbbb](https://via.placeholder.com/15/bbbbbb/000000?text=+) `2-step DQN RMSProp`
---
* Mean total reward over last 10 games
  * Best efficiency recorded: **2-step DQN RMSProp - after 79 games**
  * Best accuracy recorded: **2-step Dueling Double DQN RMSProp - 20.30 score (after 444 games)**
* Mean total reward over last 40 games
  * Best efficiency recorded: **2-step DQN RMSProp - after 93 games**
  * Best accuracy recorded: **2-step Dueling Double DQN RMSProp - 19.48 score (after 473 games)**
## Rest of data and TensorBoard
The rest of the training data can be found at [/content/runs](https://github.com/leonjovanovic/deep-reinforcement-learning-atari-pong/tree/main/content/runs). If you wish to inspect it and compare runs, I recommend using TensorBoard. After installing it, simply point it at the directory where the data is stored:
```bash
LOG_DIR="full/path/to/data"
tensorboard --logdir="$LOG_DIR" --host=127.0.0.1
```
and open http://localhost:6006 in your browser.
For information about installation and further questions, visit the [TensorBoard GitHub](https://github.com/tensorflow/tensorboard/blob/master/README.md).
## Future improvements
For further improvements in efficiency and accuracy, a couple of things can be done:
* A slower epsilon decay, a bigger replay memory and a longer training time may produce better results (see the sketch after this list)
* Implement [Prioritized Experience Replay](https://arxiv.org/pdf/1511.05952.pdf)
* Implement [Noisy Networks for Exploration](https://arxiv.org/pdf/1706.10295.pdf)
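For instance, a slower epsilon decay is a one-parameter change in a linear schedule (a sketch with assumed values, not the project's actual hyperparameters):

```python
def epsilon_by_frame(frame_idx, eps_start=1.0, eps_final=0.02, decay_frames=100_000):
    # Larger decay_frames -> slower decay -> longer exploration phase.
    fraction = min(frame_idx / decay_frames, 1.0)
    return eps_start + fraction * (eps_final - eps_start)
```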