# MAPPO
## New update: we now support SMAC v2!
Chao Yu*, Akash Velu*, Eugene Vinitsky, Jiaxuan Gao, Yu Wang, Alexandre Bayen, and Yi Wu.
This repository implements MAPPO, a multi-agent variant of PPO. The implementation in this repository is used in the paper "The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games" (https://arxiv.org/abs/2103.01955). This repository is heavily based on https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail. We have also made our off-policy repository public; feel free to try it: [off-policy link](https://github.com/marlbenchmark/off-policy)
<font color="red"> All hyperparameters and training curves are reported in appendix, we would strongly suggest to double check the important factors before runing the code, such as the rollout threads, episode length, ppo epoch, mini-batches, clip term and so on. <font color='red'>Besides, we have updated the newest results on google football testbed and suggestions about the episode length and parameter-sharing in appendix, welcome to check that. </font>
<font color="red"> We have recently noticed that a lot of papers do not reproduce the mappo results correctly, probably due to the rough hyper-parameters description. We have updated training scripts for each map or scenario in /train/train_xxx_scripts/*.sh. Feel free to try that.</font>
## Environments supported:
- [StarCraftII (SMAC)](https://github.com/oxwhirl/smac)
- [Hanabi](https://github.com/deepmind/hanabi-learning-environment)
- [Multiagent Particle-World Environments (MPEs)](https://github.com/openai/multiagent-particle-envs)
- [Google Research Football (GRF)](https://github.com/google-research/football)
- [StarCraftII (SMAC) v2](https://github.com/oxwhirl/smacv2)
## 1. Usage
**WARNING: by default, all experiments assume a shared policy among the agents, i.e., there is one neural network shared by all agents.**
All core code is located within the onpolicy folder.
* The algorithms/ subfolder contains algorithm-specific code for MAPPO.
* The envs/ subfolder contains environment wrapper implementations for the MPEs, SMAC, and Hanabi.
* Code to perform training rollouts and policy updates is contained within the runner/ folder; there is a runner for each environment.
* Executable scripts for training with default hyperparameters can be found in the scripts/ folder. The files are named train_algo_environment.sh. Within each file, the map name (in the case of SMAC and the MPEs) can be altered.
* Python training scripts for each environment can be found in the scripts/train/ folder.
* The config.py file contains the relevant hyperparameter and environment settings. Most hyperparameters default to the values used in the paper; however, please refer to the appendix for the full list of hyperparameters used. A hedged command-line sketch follows this list.
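As a concrete example, the factors called out in the warnings above can be overridden on the command line. This is only a sketch: the flag names are taken from config.py and the provided scripts, and the values here are illustrative rather than the tuned settings from the appendix.

```bash
# Run from onpolicy/scripts. Flag names come from config.py; the values
# below are illustrative only -- use the per-map scripts for tuned settings.
python train/train_smac.py --env_name StarCraft2 --map_name 3s5z \
    --algorithm_name rmappo --experiment_name check \
    --n_rollout_threads 8 --episode_length 400 \
    --ppo_epoch 15 --num_mini_batch 1 --clip_param 0.2
```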
## 2. Installation
Here we give an example installation for CUDA 10.1. For a non-GPU setup or other CUDA versions, please refer to the [PyTorch website](https://pytorch.org/get-started/locally/). Note that this repository does not depend on a specific CUDA version; feel free to use whichever CUDA version suits your machine.
```bash
# create conda environment
conda create -n marl python==3.6.1
conda activate marl
pip install torch==1.5.1+cu101 torchvision==0.6.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html
```
```bash
# install on-policy package
cd on-policy
pip install -e .
```
Although we provide requirement.txt, it may contain redundant packages. We recommend installing the remaining dependencies by running the code and adding whichever package is reported as missing.
### 2.1 StarCraftII [4.10](http://blzdistsc2-a.akamaihd.net/Linux/SC2.4.10.zip)
```bash
unzip SC2.4.10.zip
# password is iagreetotheeula
echo "export SC2PATH=~/StarCraftII/" >> ~/.bashrc
```
* Download the SMAC maps and move them to `~/StarCraftII/Maps/` (a hedged download sketch follows this list).
* To use a stable ID, copy `stableid.json` from https://github.com/Blizzard/s2client-proto.git to `~/StarCraftII/`.
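A hedged sketch of the map download; the release URL below is the one the SMAC README pointed to at the time of writing, so verify it is still current:

```bash
# Fetch the SMAC maps and place them under $SC2PATH/Maps/
# (URL taken from the oxwhirl/smac README; check it is still current).
wget https://github.com/oxwhirl/smac/releases/download/v0.1-beta1/SMAC_Maps.zip
mkdir -p ~/StarCraftII/Maps
unzip SMAC_Maps.zip -d ~/StarCraftII/Maps
```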
For SMAC v2, please refer to https://github.com/oxwhirl/smacv2.git. Make sure you have the `32x32_flat.SC2Map` map file in your `SMAC_Maps` folder.
### 2.2 Hanabi
The Hanabi environment code is adapted from the open-source Hanabi Learning Environment, with slight modifications to fit the algorithms used here. To install it, execute the following:
```bash
pip install cffi
cd envs/hanabi
mkdir build && cd build   # note: && (not &), so cd runs after mkdir succeeds
cmake ..
make -j
```
All Hanabi [models](https://drive.google.com/drive/folders/1RIcP_rG9NY9UzaWfFsIncDcjASk5h4Nx?usp=sharing) are available here.
### 2.3 MPE
```bash
# install this package first
pip install seaborn
```
There are three cooperative scenarios in the MPEs (a hedged launch example follows this list):
* simple_spread
* simple_speaker_listener, which is the 'Comm' scenario in the paper
* simple_reference
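Any of these can be selected through the scenario flag. A hedged example launching the paper's 'Comm' scenario (flag names are taken from the provided scripts; the agent count is an assumption, so verify it against scripts/train_mpe.sh):

```bash
# Hypothetical launch of the 'Comm' scenario from onpolicy/scripts;
# verify flag names and agent count against train_mpe.sh.
python train/train_mpe.py --env_name MPE --algorithm_name rmappo \
    --experiment_name comm --scenario_name simple_speaker_listener \
    --num_agents 2
```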
### 2.4 GRF
Please see the [football](https://github.com/google-research/football/blob/master/README.md) repository to install the football environment.
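For a quick start, the football README also documents a pip package; a minimal install line (the system dependencies described in their README are still required):

```bash
# Quick install per the google-research/football README; the system
# packages it lists (SDL etc.) must be installed first.
pip install gfootball
```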
## 3. Train
Here we use train_mpe.sh as an example:
```bash
cd onpolicy/scripts
chmod +x ./train_mpe.sh
./train_mpe.sh
```
Local results are stored in the scripts/results subfolder. Note that we use Weights & Biases as the default visualization platform; to use Weights & Biases, please register and log in to the platform first. More instructions for using Weights & Biases can be found in the official [documentation](https://docs.wandb.ai/). Passing `--use_wandb` on the command line or in the .sh file switches logging to TensorBoard instead of Weights & Biases.
We additionally provide `./eval_hanabi_forward.sh` for evaluating the Hanabi score over 100k trials.
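The evaluation script needs pretrained weights, e.g. the Hanabi models linked in Section 2.2. A hedged launch sketch; the `--model_dir` flag name comes from config.py, so confirm how eval_hanabi_forward.sh wires it up:

```bash
# Hypothetical evaluation launch; edit the script so that --model_dir
# (flag name from config.py) points at your downloaded weights.
cd onpolicy/scripts
chmod +x ./eval_hanabi_forward.sh
./eval_hanabi_forward.sh
```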
## 4. Publication
If you find this repository useful, please cite our [paper](https://arxiv.org/abs/2103.01955):
```bibtex
@inproceedings{
yu2022the,
title={The Surprising Effectiveness of {PPO} in Cooperative Multi-Agent Games},
author={Chao Yu and Akash Velu and Eugene Vinitsky and Jiaxuan Gao and Yu Wang and Alexandre Bayen and Yi Wu},
booktitle={Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
year={2022}
}
```