# [PYTORCH] Proximal Policy Optimization (PPO) for playing Super Mario Bros
## Introduction
Here is my Python source code for training an agent to play Super Mario Bros using the Proximal Policy Optimization (PPO) algorithm, introduced in the paper **Proximal Policy Optimization Algorithms** ([paper](https://arxiv.org/abs/1707.06347)).
In terms of performance, my PPO-trained agent completes 29/32 levels, which is much better than I expected at the beginning.
For your information, PPO was proposed by OpenAI and used for training OpenAI Five, the first AI to beat world champions in an esports game. Specifically, in August 2018 OpenAI Five dispatched a team of casters and ex-pros with MMR rankings in the 99.95th percentile of Dota 2 players.
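At its core, PPO optimizes a clipped surrogate objective that keeps each policy update close to the previous policy. Here is a minimal, illustrative PyTorch sketch of that loss (the function and argument names are my own, not necessarily those used in this repository):

```python
import torch

def ppo_clip_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Probability ratio r_t = pi_theta(a|s) / pi_theta_old(a|s), in log space
    ratio = torch.exp(log_probs - old_log_probs)
    # Clipped surrogate objective from the PPO paper (Eq. 7)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # The paper maximizes the objective; we return its negation to minimize
    return -torch.min(unclipped, clipped).mean()
```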
<p align="left">
<img src="demo/video-1-1.gif" width="200">
<img src="demo/video-1-2.gif" width="200">
<img src="demo/video-1-3.gif" width="200">
<img src="demo/video-1-4.gif" width="200"><br/>
<img src="demo/video-2-1.gif" width="200">
<img src="demo/video-2-2.gif" width="200">
<img src="demo/video-2-3.gif" width="200">
<img src="demo/video-2-4.gif" width="200"><br/>
<img src="demo/video-3-1.gif" width="200">
<img src="demo/video-3-2.gif" width="200">
<img src="demo/video-3-3.gif" width="200">
<img src="demo/video-3-4.gif" width="200"><br/>
<img src="demo/video-4-1.gif" width="200">
<img src="demo/video-4-2.gif" width="200">
<img src="demo/video-4-3.gif" width="200"><br/>
<img src="demo/video-5-1.gif" width="200">
<img src="demo/video-5-2.gif" width="200">
<img src="demo/video-5-3.gif" width="200">
<img src="demo/video-5-4.gif" width="200"><br/>
<img src="demo/video-6-1.gif" width="200">
<img src="demo/video-6-2.gif" width="200">
<img src="demo/video-6-3.gif" width="200">
<img src="demo/video-6-4.gif" width="200"><br/>
<img src="demo/video-7-1.gif" width="200">
<img src="demo/video-7-2.gif" width="200">
<img src="demo/video-7-3.gif" width="200"><br/>
<img src="demo/video-8-1.gif" width="200">
<img src="demo/video-8-2.gif" width="200">
<img src="demo/video-8-3.gif" width="200"><br/>
<i>Sample results</i>
</p>
## Motivation
It has been a while since I released my A3C implementation ([A3C code](https://github.com/uvipen/Super-mario-bros-A3C-pytorch)) for training an agent to play Super Mario Bros. Although the trained agent could complete levels quite fast and quite well (at least faster and better than I played :sweat_smile:), it still did not totally satisfy me. The main reason is that an agent trained with A3C could only complete 9/32 levels, no matter how much I fine-tuned and tested. That motivated me to look for a new approach.
Before settling on PPO as my next complete implementation, I had partially implemented a couple of other algorithms, including A2C and Rainbow. The former did not show a big jump in performance, while the latter is better suited to environments with more randomness, like Pong or Space Invaders.
## How to use my code
With my code, you can:
* **Train your model** by running `python train.py`. For example: `python train.py --world 5 --stage 2 --lr 1e-4`
* **Test your trained model** by running `python test.py`. For example: `python test.py --world 5 --stage 2`
**Note**: If the agent gets stuck at a level, try training again with a different **learning rate**. You can conquer 29/32 levels, as I did, by changing only the **learning rate**. Normally I set the **learning rate** to **1e-3**, **1e-4**, or **1e-5**. However, there are some difficult levels, including level **1-3**, which I finally trained successfully with a **learning rate** of **7e-5**, after failing 70 times.
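If you want to automate that retry process, here is a minimal sketch of a learning-rate sweep (the `--world`, `--stage`, and `--lr` flags are the ones shown above; the sweep script itself is hypothetical and not part of this repository):

```python
import subprocess

# Hypothetical sweep: retry a hard level (e.g. 1-3) with the learning rates
# mentioned above, one full training run per rate.
for lr in ("1e-3", "1e-4", "1e-5", "7e-5"):
    subprocess.run(["python", "train.py", "--world", "1", "--stage", "3", "--lr", lr])
```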
## Docker
For convenience, I provide a Dockerfile which can be used for both the training and test phases.
Assume that the docker image's name is **ppo**, that you only want to use the first GPU, and that you have already cloned this repository and cd-ed into it.
Build:
`sudo docker build --network=host -t ppo .`
Run:
`docker run --runtime=nvidia -it --rm --volume="$PWD"/../Super-mario-bros-PPO-pytorch:/Super-mario-bros-PPO-pytorch --gpus device=0 ppo`
Then, inside the docker container, you can simply run the **train.py** or **test.py** scripts as mentioned above.
**Note**: There is a rendering bug when using docker. Therefore, when you train or test using docker, please comment out the `env.render()` line in **src/process.py** (for training) or in **test.py** (for testing). You will no longer see the visualization window pop up, but this is not a big problem: training will still run, and testing will still produce an output mp4 file for visualization.
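Instead of deleting the call outright, one illustrative option is to guard it behind an environment variable, so the same scripts work both locally and inside docker. This `maybe_render` helper is hypothetical, not part of the original scripts:

```python
import os

def maybe_render(env):
    # Hypothetical guard: only render when RENDER=1 (the default) is set,
    # so running with RENDER=0 inside docker skips the broken rendering path.
    if os.environ.get("RENDER", "1") == "1":
        env.render()
```

You would then replace `env.render()` with `maybe_render(env)` in **src/process.py** and **test.py**, and set `RENDER=0` in the container.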
## Why are 3 levels still missing?
In levels 4-4, 7-4, and 8-4, the map consists of puzzles where the player must choose the correct path in order to move forward. If you choose a wrong path, you have to go back through paths you have already visited. That's why my agent currently cannot complete these 3 levels.