# Deep RL for traffic signal control
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
This repo implements state-of-the-art multi-agent (decentralized) deep RL algorithms for large-scale traffic signal control in SUMO-simulated environments.
Available cooperation levels:
* Centralized: a single global agent controls all intersections using global observations and a global reward.
* Decentralized: multiple local agents control their own intersections independently, sharing information with their neighbors.
Available NN layers:
Fully-connected, LSTM.
Available algorithms:
IQL, IA2C, IA2C with stabilization (called MA2C in this paper). For more advanced algorithms, please check [deeprl_network](https://github.com/cts198859/deeprl_network).
Available environments:
* A 6-intersection benchmark traffic network. [Ye, Bao-Lin, et al. "A hierarchical model predictive control approach for signal splits optimization in large-scale urban road networks." IEEE Transactions on Intelligent Transportation Systems 17.8 (2016): 2182-2192.](https://ieeexplore.ieee.org/abstract/document/7406703/)
* A 5X5 traffic grid. [Chu, Tianshu, Shuhui Qu, and Jie Wang. "Large-scale traffic grid signal control with regional reinforcement learning." American Control Conference (ACC), 2016. IEEE, 2016.](https://ieeexplore.ieee.org/abstract/document/7525014/)
* A modified Monaco traffic network with 30 signalized intersections. [L. Codeca, J. Härri, "Monaco SUMO Traffic (MoST) Scenario: A 3D Mobility Scenario for Cooperative ITS" SUMO 2018, SUMO User Conference, Simulating Autonomous and Intermodal Transport Systems May 14-16, 2018, Berlin, Germany.](http://www.eurecom.fr/en/publication/5527/download/comsys-publi-5527.pdf) ([code](https://github.com/lcodeca/MoSTScenario))
## Requirements
* Python3==3.5
* [Tensorflow](http://www.tensorflow.org/install)==1.12.0
* [SUMO](http://sumo.dlr.de/wiki/Installing)>=1.1.0
Required packages can be installed by running `setup_mac.sh` or `setup_ubuntu.sh`.
Attention: the code on the master branch requires SUMO version >= 1.1.0. Please switch to branch [sumo-0.32.0](https://github.com/cts198859/deeprl_signal_control/tree/sumo-0.32.0) if you are using an older SUMO version.
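The version requirement above can be checked programmatically. Below is a minimal sketch (the helper names are ours, not part of this repo) that compares a version string, e.g. as reported by `sumo --version`, against the master-branch minimum:

```python
def parse_version(text):
    """Parse a dotted version string such as '1.1.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))

def master_branch_supported(sumo_version, minimum="1.1.0"):
    """True if the given SUMO version works with the master branch,
    which requires SUMO >= 1.1.0; older versions need the sumo-0.32.0 branch."""
    return parse_version(sumo_version) >= parse_version(minimum)
```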
## Usages
First define all hyperparameters in a config file under `[config_dir]`, and create the base directory of experiments `[base_dir]`. Before training, run `build_file.py` under `[environment_dir]/data/` to generate the SUMO network files for the `small_grid` and `large_grid` environments.
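The experiment-directory convention above can be prepared in a few lines; this is a convenience sketch (the helper name is ours, and `main.py` itself only takes `--base-dir` and `--config-dir` directly):

```python
import os
import shutil

def init_experiment(base_dir, agent, config_path):
    """Create [base_dir]/[agent] and copy the chosen config file into it,
    mirroring the per-agent directory layout used with main.py's --base-dir."""
    exp_dir = os.path.join(base_dir, agent)
    os.makedirs(exp_dir, exist_ok=True)
    shutil.copy(config_path, exp_dir)
    return exp_dir
```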
1. To train a new agent, run
~~~
python3 main.py --base-dir [base_dir]/[agent] train --config-dir [config_dir] --test-mode no_test
~~~
`[agent]` is one of `{ia2c, ma2c, iqll, iqld}`. `no_test` is recommended, since in-training tests significantly slow down training.
2. To access tensorboard during training, run
~~~
tensorboard --logdir=[base_dir]/log
~~~
3. To evaluate and compare trained agents, run
~~~
python3 main.py --base-dir [base_dir] evaluate --agents [agents] --evaluation-seeds [seeds]
~~~
Evaluation data will be output to `[base_dir]/eva_data`; make sure the evaluation seeds differ from those used in training. Under the default evaluation setting, the inference policy of A2C is stochastic whereas that of Q-learning is greedy (deterministic). To explicitly specify the inference policy type, pass the argument `--evaluation-policy-type [default/stochastic/deterministic]`. Please note that running a deterministic inference policy for A2C may degrade performance, since it violates the on-policy assumption of the training.
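The distinction between the two inference modes can be illustrated with a small sketch (this is our illustration, not code from the repo): a stochastic policy samples an action from the policy's probability vector, while a deterministic one takes the argmax.

```python
import numpy as np

def select_action(probs, policy_type="stochastic", rng=None):
    """Return an action index given a probability vector over actions.
    'deterministic' takes the argmax (greedy, as Q-learning does by default);
    'stochastic' samples from the distribution (as A2C does by default)."""
    if policy_type == "deterministic":
        return int(np.argmax(probs))
    rng = np.random.default_rng() if rng is None else rng
    return int(rng.choice(len(probs), p=probs))
```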
4. To visualize the agent behavior, run
~~~
python3 main.py --base-dir [base_dir] evaluate --agents [agent] --evaluation-seeds [seed] --demo
~~~
It is recommended to use only one agent and one evaluation seed for the demo run. This will launch the SUMO GUI, and `./large_grid/data/view.xml` can be applied to visualize queue length and intersection delay as edge color and thickness. Below are a few example screenshots.
| t=1500s | t=2500s | t=3500s |
|:-------------------:|:--------------------:|:--------------------:|
| ![](./figs/1500.png) | ![](./figs/2500.png) | ![](./figs/3500.png) |
## Reproducibility
Due to the SUMO version change and a few corresponding code modifications (e.g. `tau="0.5"` had to be removed from `vType` to prevent excessive vehicle collisions in simulation), it is difficult to reproduce the paper results, which are based on SUMO 0.32.0. We have therefore re-run the experiments using SUMO 1.1.0 and provide the following training plots as a reference. The conclusions remain the same: MA2C ~ IQL-LR > IA2C in the large grid, and MA2C > IA2C > IQL-LR in the Monaco network. Note that, rather than reproducing exactly the same numbers, an evaluation is always valid as long as the comparison is fair, i.e. the environment config and seed are fixed across agents.
| large grid | Monaco net |
|:-------------------------------:|:------------------------------:|
| ![](./figs/large_grid_train.png) | ![](./figs/real_net_train.png) |
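Training curves like the ones above are typically produced by smoothing the raw per-episode rewards logged during training. A minimal moving-average sketch (the function name and window size are our assumptions, not part of the repo's plotting code):

```python
import numpy as np

def smooth_rewards(rewards, window=100):
    """Moving-average smoothing of a raw episode-reward series,
    as commonly applied before plotting RL training curves."""
    rewards = np.asarray(rewards, dtype=float)
    if rewards.size < window:
        return rewards  # too short to smooth; return unchanged
    kernel = np.ones(window) / window
    return np.convolve(rewards, kernel, mode="valid")
```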
## Citation
If you find this useful in your research, please cite our paper "Multi-Agent Deep Reinforcement Learning for Large-Scale Traffic Signal Control" ([early access version](https://ieeexplore.ieee.org/document/8667868), [preprint version](https://arxiv.org/pdf/1903.04527.pdf)):
~~~
@article{chu2019multi,
title={Multi-Agent Deep Reinforcement Learning for Large-Scale Traffic Signal Control},
author={Chu, Tianshu and Wang, Jie and Codec{\`a}, Lara and Li, Zhaojian},
journal={IEEE Transactions on Intelligent Transportation Systems},
year={2019},
publisher={IEEE}
}
~~~