# Tianshou's Mujoco Benchmark
We benchmarked Tianshou algorithm implementations in 9 out of 13 environments from the MuJoCo Gym task suite<sup>[[1]](#footnote1)</sup>.
For each supported algorithm and each supported MuJoCo environment, we provide:
- Default hyperparameters used for benchmark and scripts to reproduce the benchmark;
- A comparison of performance (or code-level details) with other open-source implementations and classic papers;
- Graphs and raw data that can be used for research purposes<sup>[[2]](#footnote2)</sup>;
- Log details obtained during training<sup>[[2]](#footnote2)</sup>;
- Pretrained agents<sup>[[2]](#footnote2)</sup>;
- Some hints on how to tune the algorithm.
Supported algorithms are listed below:
- [Deep Deterministic Policy Gradient (DDPG)](https://arxiv.org/pdf/1509.02971.pdf), [commit id](https://github.com/thu-ml/tianshou/tree/e605bdea942b408126ef4fbc740359773259c9ec)
- [Twin Delayed DDPG (TD3)](https://arxiv.org/pdf/1802.09477.pdf), [commit id](https://github.com/thu-ml/tianshou/tree/e605bdea942b408126ef4fbc740359773259c9ec)
- [Soft Actor-Critic (SAC)](https://arxiv.org/pdf/1812.05905.pdf), [commit id](https://github.com/thu-ml/tianshou/tree/e605bdea942b408126ef4fbc740359773259c9ec)
- [REINFORCE algorithm](https://papers.nips.cc/paper/1999/file/464d828b85b0bed98e80ade0a5c43b0f-Paper.pdf), [commit id](https://github.com/thu-ml/tianshou/tree/e27b5a26f330de446fe15388bf81c3777f024fb9)
- [Natural Policy Gradient](https://proceedings.neurips.cc/paper/2001/file/4b86abe48d358ecf194c56c69108433e-Paper.pdf), [commit id](https://github.com/thu-ml/tianshou/tree/844d7703c313009c4c364edb4018c91de93439ca)
- [Advantage Actor-Critic (A2C)](https://openai.com/blog/baselines-acktr-a2c/), [commit id](https://github.com/thu-ml/tianshou/tree/1730a9008ad6bb67cac3b21347bed33b532b17bc)
- [Proximal Policy Optimization (PPO)](https://arxiv.org/pdf/1707.06347.pdf), [commit id](https://github.com/thu-ml/tianshou/tree/6426a39796db052bafb7cabe85c764db20a722b0)
- [Trust Region Policy Optimization (TRPO)](https://arxiv.org/pdf/1502.05477.pdf), [commit id](https://github.com/thu-ml/tianshou/tree/5057b5c89e6168220272c9c28a15b758a72efc32)
- [Hindsight Experience Replay (HER)](https://arxiv.org/abs/1707.01495)
## EnvPool
We highly recommend using EnvPool to run the following experiments. To install it on a Linux machine, run:
```bash
pip install envpool
```
After that, `make_mujoco_env` will automatically switch to EnvPool's MuJoCo env. EnvPool's implementation is much faster than the Python vectorized env implementation (about 2\~3x faster in pure execution speed, and 1.5x faster for the overall RL training pipeline on average), and its behavior is consistent with Gym's MuJoCo env.
For more information, please refer to EnvPool's [GitHub](https://github.com/sail-sg/envpool/) and [Docs](https://envpool.readthedocs.io/en/latest/api/mujoco.html).
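The automatic backend switch described above can be sketched as a simple import probe. This is a minimal illustration of the dispatch idea, not Tianshou's actual `make_mujoco_env` code; the function name `select_env_backend` is hypothetical.

```python
# Hypothetical sketch: prefer EnvPool when it is installed, otherwise
# fall back to the plain Gym-based vectorized env.
def select_env_backend() -> str:
    """Return 'envpool' if the envpool package is importable, else 'gym'."""
    try:
        import envpool  # noqa: F401
        return "envpool"
    except ImportError:
        return "gym"

backend = select_env_backend()
print(f"Using env backend: {backend}")
```

On a machine without EnvPool installed this falls back to `"gym"`, which mirrors why installing EnvPool is all that is needed to switch.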
## Usage
Run
```bash
$ python mujoco_sac.py --task Ant-v3
```
Logs are saved in `./log/` and can be monitored with TensorBoard:
```bash
$ tensorboard --logdir log
```
You can also reproduce the benchmark (e.g. SAC in Ant-v3) with the example script we provide under `examples/mujoco/`:
```bash
$ ./run_experiments.sh Ant-v3 sac
```
This will start 10 experiments with different random seeds.
Once all the experiments have finished, we can convert the tfevent files into csv files and then plot the results:
```bash
# generate csv
$ ./tools.py --root-dir ./results/Ant-v3/sac
# generate figures
$ ./plotter.py --root-dir ./results/Ant-v3 --shaded-std --legend-pattern "\\w+"
# generate numerical result (support multiple groups: `--root-dir ./` instead of single dir)
$ ./analysis.py --root-dir ./results --norm
```
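The numerical results above are reported as mean ± standard deviation over the 10 seeds. The aggregation step can be sketched with only the standard library; the returns below are made-up placeholder numbers, and this is not the repo's `analysis.py`.

```python
# Minimal sketch of aggregating per-seed final returns into "mean ± std",
# as used in the benchmark tables. The values here are hypothetical.
import statistics

# Hypothetical final test returns from 10 seeds of one algorithm/task pair.
returns = [5832.0, 5610.5, 5975.3, 5721.8, 5890.1,
           5544.2, 5802.7, 5918.6, 5687.4, 5759.9]

mean = statistics.mean(returns)
std = statistics.stdev(returns)  # sample standard deviation across seeds
print(f"{mean:.1f} ± {std:.1f}")
```

The real scripts read the csv files produced by `tools.py` instead of a hard-coded list, but the final reduction is the same mean/std over seeds.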
## Example benchmark
<img src="./benchmark/Ant-v3/offpolicy.png" width="500" height="450">
Other graphs can be found under `examples/mujoco/benchmark/`.
For pretrained agents, detailed graphs (single agent, single game) and log details, please refer to [https://cloud.tsinghua.edu.cn/d/f45fcfc5016043bc8fbc/](https://cloud.tsinghua.edu.cn/d/f45fcfc5016043bc8fbc/).
## Off-policy algorithms
### Notes
1. For the off-policy algorithms (DDPG, TD3, SAC), the shared hyperparameters are almost identical, and unless otherwise stated they are consistent with those used in SpinningUp's benchmark implementations (e.g., we use batch size 256 in DDPG/TD3/SAC while SpinningUp uses 100). Minor differences also lie in `start-timesteps`, the data-collection loop (`step_per_collect`), and the method used to bootstrap steps truncated by the time limit and episodes still being collected (which contributes to the performance improvement).
2. Compared with both the classic literature and open-source implementations (e.g., SpinningUp)<sup>[[1]](#footnote1)</sup><sup>[[2]](#footnote2)</sup>, Tianshou's implementations of DDPG, TD3, and SAC are roughly at parity with or better than the best reported results for these algorithms, so you can confidently use Tianshou's benchmark for research purposes.
3. We did not compare the off-policy algorithms to the OpenAI Baselines [benchmark](https://github.com/openai/baselines/blob/master/benchmarks_mujoco1M.htm), because Baselines currently provides no benchmark for off-policy algorithms. However, the [SpinningUp docs](https://spinningup.openai.com/en/latest/spinningup/bench.html) state that "SpinningUp implementations of DDPG, TD3, and SAC are roughly at-parity with the best-reported results for these algorithms", so we think the lack of a comparison with OpenAI Baselines is acceptable.
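The time-limit bootstrapping mentioned in note 1 can be illustrated with a one-step TD target. This is a generic sketch of the common practice, not Tianshou's exact code: when an episode ends only because of the `TimeLimit` wrapper (truncation), the target still bootstraps from the next state's value; only a true environment termination zeroes it out.

```python
# Sketch of time-limit-aware TD targets. `terminated` is True only for a
# genuine terminal state; a time-limit truncation keeps terminated=False,
# so the target still bootstraps from next_value.
def td_target(reward: float, next_value: float,
              terminated: bool, gamma: float = 0.99) -> float:
    return reward + gamma * (0.0 if terminated else next_value)

# Truncated by the time limit: bootstrap continues.
print(td_target(1.0, 10.0, terminated=False))
# True termination: no bootstrap.
print(td_target(1.0, 10.0, terminated=True))
```

Treating truncation as termination would wrongly teach the critic that states near the time limit have no future return, which is one of the "minor differences" that affect benchmark scores.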
### DDPG
| Environment | Tianshou (1M) | [Spinning Up (PyTorch)](https://spinningup.openai.com/en/latest/spinningup/bench.html) | [TD3 paper (DDPG)](https://arxiv.org/abs/1802.09477) | [TD3 paper (OurDDPG)](https://arxiv.org/abs/1802.09477) |
| :--------------------: | :---------------: | :----------------------------------------------------------: | :--------------------------------------------------: | :-----------------------------------------------------: |
| Ant | 990.4±4.3 | ~840 | **1005.3** | 888.8 |
| HalfCheetah | **11718.7±465.6** | ~11000 | 3305.6 | 8577.3 |
| Hopper | **2197.0±971.6** | ~1800 | **2020.5** | 1860.0 |
| Walker2d | 1400.6±905.0 | ~1950 | 1843.6 | **3098.1** |
| Swimmer | **144.1±6.5** | ~137 | N | N |
| Humanoid | **177.3±77.6** | N | N | N |
| Reacher | **-3.3±0.3** | N | -6.51 | -4.01 |
| InvertedPendulum | **1000.0±0.0** | N | **1000.0** | **1000.0** |
| InvertedDoublePendulum | 8364.3±2778.9 | N | **9355.5** | 8370.0 |