# [PYTORCH] Proximal Policy Optimization (PPO) for playing Super Mario Bros
## Introduction
Here is my Python source code for training an agent to play Super Mario Bros using the Proximal Policy Optimization (PPO) algorithm, introduced in the paper **Proximal Policy Optimization Algorithms** ([paper](https://arxiv.org/abs/1707.06347)).
In terms of performance, my PPO-trained agent completes 29/32 levels, which is much better than I expected at the beginning.
For your information, PPO was proposed by OpenAI and used for training OpenAI Five, the first AI to beat world champions in an esports game. Specifically, in August 2018 OpenAI Five dispatched a team of casters and ex-pros with MMR rankings in the 99.95th percentile of Dota 2 players.
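At its core, PPO optimizes a clipped surrogate objective that keeps each policy update close to the previous policy. Here is a minimal, illustrative PyTorch sketch of that loss (the function and argument names are my own, not necessarily those used in this repository):

```python
import torch

def ppo_clip_loss(log_probs, old_log_probs, advantages, clip_eps=0.2):
    # Probability ratio r_t = pi_theta(a|s) / pi_theta_old(a|s), in log space
    ratio = torch.exp(log_probs - old_log_probs)
    # Clipped surrogate objective from the PPO paper (Eq. 7)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # The paper maximizes the objective; we return its negation to minimize
    return -torch.min(unclipped, clipped).mean()
```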
<p align="left">
<img src="demo/video-1-1.gif" width="200">
<img src="demo/video-1-2.gif" width="200">
<img src="demo/video-1-3.gif" width="200">
<img src="demo/video-1-4.gif" width="200"><br/>
<img src="demo/video-2-1.gif" width="200">
<img src="demo/video-2-2.gif" width="200">
<img src="demo/video-2-3.gif" width="200">
<img src="demo/video-2-4.gif" width="200"><br/>
<img src="demo/video-3-1.gif" width="200">
<img src="demo/video-3-2.gif" width="200">
<img src="demo/video-3-3.gif" width="200">
<img src="demo/video-3-4.gif" width="200"><br/>
<img src="demo/video-4-1.gif" width="200">
<img src="demo/video-4-2.gif" width="200">
<img src="demo/video-4-3.gif" width="200"><br/>
<img src="demo/video-5-1.gif" width="200">
<img src="demo/video-5-2.gif" width="200">
<img src="demo/video-5-3.gif" width="200">
<img src="demo/video-5-4.gif" width="200"><br/>
<img src="demo/video-6-1.gif" width="200">
<img src="demo/video-6-2.gif" width="200">
<img src="demo/video-6-3.gif" width="200">
<img src="demo/video-6-4.gif" width="200"><br/>
<img src="demo/video-7-1.gif" width="200">
<img src="demo/video-7-2.gif" width="200">
<img src="demo/video-7-3.gif" width="200"><br/>
<img src="demo/video-8-1.gif" width="200">
<img src="demo/video-8-2.gif" width="200">
<img src="demo/video-8-3.gif" width="200"><br/>
<i>Sample results</i>
</p>
## Motivation
It has been a while since I released my A3C implementation ([A3C code](https://github.com/uvipen/Super-mario-bros-A3C-pytorch)) for training an agent to play Super Mario Bros. Although the trained agent could complete levels quite fast and quite well (at least faster and better than I played :sweat_smile:), it still did not totally satisfy me. The main reason is that an agent trained with A3C could only complete 9/32 levels, no matter how much I fine-tuned and tested. That motivated me to look for a new approach.
Before settling on PPO as my next complete implementation, I had partially implemented a couple of other algorithms, including A2C and Rainbow. The former did not show a big jump in performance, while the latter is better suited to environments with more randomness, like Pong or Space Invaders.
## How to use my code
With my code, you can:
* **Train your model** by running `python train.py`. For example: `python train.py --world 5 --stage 2 --lr 1e-4`
* **Test your trained model** by running `python test.py`. For example: `python test.py --world 5 --stage 2`
**Note**: If the agent gets stuck at a level, try training again with a different **learning rate**. You can conquer 29/32 levels, as I did, by changing only the **learning rate**. Normally I set the **learning rate** to **1e-3**, **1e-4**, or **1e-5**. However, there are some difficult levels, including level **1-3**, which I finally trained successfully with a **learning rate** of **7e-5**, after failing 70 times.
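If you want to automate that retry process, here is a minimal sketch of a learning-rate sweep (the `--world`, `--stage`, and `--lr` flags are the ones shown above; the sweep script itself is hypothetical and not part of this repository):

```python
import subprocess

# Hypothetical sweep: retry a hard level (e.g. 1-3) with the learning rates
# mentioned above, one full training run per rate.
for lr in ("1e-3", "1e-4", "1e-5", "7e-5"):
    subprocess.run(["python", "train.py", "--world", "1", "--stage", "3", "--lr", lr])
```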
## Docker
For convenience, I provide a Dockerfile which can be used for both the training and test phases.
Assume that the docker image's name is **ppo**, that you only want to use the first GPU, and that you have already cloned this repository and cd-ed into it.
Build:
`sudo docker build --network=host -t ppo .`
Run:
`docker run --runtime=nvidia -it --rm --volume="$PWD"/../Super-mario-bros-PPO-pytorch:/Super-mario-bros-PPO-pytorch --gpus device=0 ppo`
Then, inside the docker container, you can simply run the **train.py** or **test.py** scripts as mentioned above.
**Note**: There is a rendering bug when using docker. Therefore, when you train or test using docker, please comment out the `env.render()` line in **src/process.py** (for training) or in **test.py** (for testing). You will no longer see the visualization window pop up, but this is not a big problem: training will still run, and testing will still produce an output mp4 file for visualization.
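Instead of deleting the call outright, one illustrative option is to guard it behind an environment variable, so the same scripts work both locally and inside docker. This `maybe_render` helper is hypothetical, not part of the original scripts:

```python
import os

def maybe_render(env):
    # Hypothetical guard: only render when RENDER=1 (the default) is set,
    # so running with RENDER=0 inside docker skips the broken rendering path.
    if os.environ.get("RENDER", "1") == "1":
        env.render()
```

You would then replace `env.render()` with `maybe_render(env)` in **src/process.py** and **test.py**, and set `RENDER=0` in the container.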
## Why are 3 levels still missing?
In levels 4-4, 7-4, and 8-4, the map consists of puzzles where the player must choose the correct path in order to move forward. If you choose a wrong path, you have to go back through paths you have already visited. That's why my agent currently cannot complete these 3 levels.