# MAPPO
## New update: we now support SMAC v2!
Chao Yu*, Akash Velu*, Eugene Vinitsky, Jiaxuan Gao, Yu Wang, Alexandre Bayen, and Yi Wu.
This repository implements MAPPO, a multi-agent variant of PPO. The implementation in this repository is used in the paper "The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games" (https://arxiv.org/abs/2103.01955). This repository is heavily based on https://github.com/ikostrikov/pytorch-a2c-ppo-acktr-gail. We have also made our off-policy repository public; feel free to try it: [off-policy link](https://github.com/marlbenchmark/off-policy)
<font color="red"> All hyperparameters and training curves are reported in appendix, we would strongly suggest to double check the important factors before runing the code, such as the rollout threads, episode length, ppo epoch, mini-batches, clip term and so on. <font color='red'>Besides, we have updated the newest results on google football testbed and suggestions about the episode length and parameter-sharing in appendix, welcome to check that. </font>
<font color="red"> We have recently noticed that a lot of papers do not reproduce the mappo results correctly, probably due to the rough hyper-parameters description. We have updated training scripts for each map or scenario in /train/train_xxx_scripts/*.sh. Feel free to try that.</font>
## Environments supported:
- [StarCraftII (SMAC)](https://github.com/oxwhirl/smac)
- [Hanabi](https://github.com/deepmind/hanabi-learning-environment)
- [Multiagent Particle-World Environments (MPEs)](https://github.com/openai/multiagent-particle-envs)
- [Google Research Football (GRF)](https://github.com/google-research/football)
- [StarCraftII (SMAC) v2](https://github.com/oxwhirl/smacv2)
## 1. Usage
**WARNING: by default, all experiments assume a shared policy among the agents, i.e., there is one neural network shared by all agents.**
All core code is located within the onpolicy folder.
* The algorithms/ subfolder contains algorithm-specific code for MAPPO.
* The envs/ subfolder contains environment wrapper implementations for the MPEs, SMAC, and Hanabi.
* Code to perform training rollouts and policy updates is contained within the runner/ folder; there is a runner for each environment.
* Executable scripts for training with default hyperparameters can be found in the scripts/ folder. The files are named train_algo_environment.sh. Within each file, the map name (in the case of SMAC and the MPEs) can be altered.
* Python training scripts for each environment can be found in the scripts/train/ folder.
* The config.py file contains the relevant hyperparameter and environment settings. Most hyperparameters default to the values used in the paper; however, please refer to the appendix for the full list of hyperparameters used. A hedged command-line sketch follows this list.
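As a concrete example, the factors called out in the warnings above can be overridden on the command line. This is only a sketch: the flag names are taken from config.py and the provided scripts, and the values here are illustrative rather than the tuned settings from the appendix.

```bash
# Run from onpolicy/scripts. Flag names come from config.py; the values
# below are illustrative only -- use the per-map scripts for tuned settings.
python train/train_smac.py --env_name StarCraft2 --map_name 3s5z \
    --algorithm_name rmappo --experiment_name check \
    --n_rollout_threads 8 --episode_length 400 \
    --ppo_epoch 15 --num_mini_batch 1 --clip_param 0.2
```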
## 2. Installation
Here we give an example installation for CUDA 10.1. For a non-GPU setup or other CUDA versions, please refer to the [PyTorch website](https://pytorch.org/get-started/locally/). Note that this repository does not depend on a specific CUDA version; feel free to use whichever CUDA version suits your machine.
```bash
# create conda environment
conda create -n marl python==3.6.1
conda activate marl
pip install torch==1.5.1+cu101 torchvision==0.6.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html
```
```bash
# install on-policy package
cd on-policy
pip install -e .
```
Although we provide requirement.txt, it may contain redundant packages. We recommend installing the remaining dependencies by running the code and adding whichever package is reported as missing.
### 2.1 StarCraftII [4.10](http://blzdistsc2-a.akamaihd.net/Linux/SC2.4.10.zip)
```bash
unzip SC2.4.10.zip
# password is iagreetotheeula
echo "export SC2PATH=~/StarCraftII/" >> ~/.bashrc
```
* Download the SMAC maps and move them to `~/StarCraftII/Maps/` (a hedged download sketch follows this list).
* To use a stable ID, copy `stableid.json` from https://github.com/Blizzard/s2client-proto.git to `~/StarCraftII/`.
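A hedged sketch of the map download; the release URL below is the one the SMAC README pointed to at the time of writing, so verify it is still current:

```bash
# Fetch the SMAC maps and place them under $SC2PATH/Maps/
# (URL taken from the oxwhirl/smac README; check it is still current).
wget https://github.com/oxwhirl/smac/releases/download/v0.1-beta1/SMAC_Maps.zip
mkdir -p ~/StarCraftII/Maps
unzip SMAC_Maps.zip -d ~/StarCraftII/Maps
```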
For SMAC v2, please refer to https://github.com/oxwhirl/smacv2.git. Make sure you have the `32x32_flat.SC2Map` map file in your `SMAC_Maps` folder.
### 2.2 Hanabi
The Hanabi environment code is adapted from the open-source Hanabi Learning Environment, with slight modifications to fit the algorithms used here. To install it, execute the following:
```bash
pip install cffi
cd envs/hanabi
mkdir build && cd build   # note: && (not &), so cd runs after mkdir succeeds
cmake ..
make -j
```
All Hanabi [models](https://drive.google.com/drive/folders/1RIcP_rG9NY9UzaWfFsIncDcjASk5h4Nx?usp=sharing) are available here.
### 2.3 MPE
```bash
# install this package first
pip install seaborn
```
There are three cooperative scenarios in the MPEs (a hedged launch example follows this list):
* simple_spread
* simple_speaker_listener, which is the 'Comm' scenario in the paper
* simple_reference
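Any of these can be selected through the scenario flag. A hedged example launching the paper's 'Comm' scenario (flag names are taken from the provided scripts; the agent count is an assumption, so verify it against scripts/train_mpe.sh):

```bash
# Hypothetical launch of the 'Comm' scenario from onpolicy/scripts;
# verify flag names and agent count against train_mpe.sh.
python train/train_mpe.py --env_name MPE --algorithm_name rmappo \
    --experiment_name comm --scenario_name simple_speaker_listener \
    --num_agents 2
```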
### 2.4 GRF
Please see the [football](https://github.com/google-research/football/blob/master/README.md) repository to install the football environment.
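For a quick start, the football README also documents a pip package; a minimal install line (the system dependencies described in their README are still required):

```bash
# Quick install per the google-research/football README; the system
# packages it lists (SDL etc.) must be installed first.
pip install gfootball
```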
## 3. Train
Here we use train_mpe.sh as an example:
```bash
cd onpolicy/scripts
chmod +x ./train_mpe.sh
./train_mpe.sh
```
Local results are stored in the scripts/results subfolder. Note that we use Weights & Biases as the default visualization platform; to use Weights & Biases, please register and log in to the platform first. More instructions for using Weights & Biases can be found in the official [documentation](https://docs.wandb.ai/). Passing `--use_wandb` on the command line or in the .sh file switches logging to TensorBoard instead of Weights & Biases.
We additionally provide `./eval_hanabi_forward.sh` for evaluating the Hanabi score over 100k trials.
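The evaluation script needs pretrained weights, e.g. the Hanabi models linked in Section 2.2. A hedged launch sketch; the `--model_dir` flag name comes from config.py, so confirm how eval_hanabi_forward.sh wires it up:

```bash
# Hypothetical evaluation launch; edit the script so that --model_dir
# (flag name from config.py) points at your downloaded weights.
cd onpolicy/scripts
chmod +x ./eval_hanabi_forward.sh
./eval_hanabi_forward.sh
```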
## 4. Publication
If you find this repository useful, please cite our [paper](https://arxiv.org/abs/2103.01955):
```bibtex
@inproceedings{
yu2022the,
title={The Surprising Effectiveness of {PPO} in Cooperative Multi-Agent Games},
author={Chao Yu and Akash Velu and Eugene Vinitsky and Jiaxuan Gao and Yu Wang and Alexandre Bayen and Yi Wu},
booktitle={Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track},
year={2022}
}
```