# Tianshou's Mujoco Benchmark
We benchmarked Tianshou algorithm implementations in 9 out of 13 environments from the MuJoCo Gym task suite<sup>[[1]](#footnote1)</sup>.
For each supported algorithm and each supported MuJoCo environment, we provide:
- Default hyperparameters used for benchmark and scripts to reproduce the benchmark;
- A comparison of performance (or code-level details) with other open-source implementations and classic papers;
- Graphs and raw data that can be used for research purposes<sup>[[2]](#footnote2)</sup>;
- Log details obtained during training<sup>[[2]](#footnote2)</sup>;
- Pretrained agents<sup>[[2]](#footnote2)</sup>;
- Some hints on how to tune the algorithm.
Supported algorithms are listed below:
- [Deep Deterministic Policy Gradient (DDPG)](https://arxiv.org/pdf/1509.02971.pdf), [commit id](https://github.com/thu-ml/tianshou/tree/e605bdea942b408126ef4fbc740359773259c9ec)
- [Twin Delayed DDPG (TD3)](https://arxiv.org/pdf/1802.09477.pdf), [commit id](https://github.com/thu-ml/tianshou/tree/e605bdea942b408126ef4fbc740359773259c9ec)
- [Soft Actor-Critic (SAC)](https://arxiv.org/pdf/1812.05905.pdf), [commit id](https://github.com/thu-ml/tianshou/tree/e605bdea942b408126ef4fbc740359773259c9ec)
- [REINFORCE algorithm](https://papers.nips.cc/paper/1999/file/464d828b85b0bed98e80ade0a5c43b0f-Paper.pdf), [commit id](https://github.com/thu-ml/tianshou/tree/e27b5a26f330de446fe15388bf81c3777f024fb9)
- [Natural Policy Gradient](https://proceedings.neurips.cc/paper/2001/file/4b86abe48d358ecf194c56c69108433e-Paper.pdf), [commit id](https://github.com/thu-ml/tianshou/tree/844d7703c313009c4c364edb4018c91de93439ca)
- [Advantage Actor-Critic (A2C)](https://openai.com/blog/baselines-acktr-a2c/), [commit id](https://github.com/thu-ml/tianshou/tree/1730a9008ad6bb67cac3b21347bed33b532b17bc)
- [Proximal Policy Optimization (PPO)](https://arxiv.org/pdf/1707.06347.pdf), [commit id](https://github.com/thu-ml/tianshou/tree/6426a39796db052bafb7cabe85c764db20a722b0)
- [Trust Region Policy Optimization (TRPO)](https://arxiv.org/pdf/1502.05477.pdf), [commit id](https://github.com/thu-ml/tianshou/tree/5057b5c89e6168220272c9c28a15b758a72efc32)
- [Hindsight Experience Replay (HER)](https://arxiv.org/abs/1707.01495)
## EnvPool
We highly recommend using EnvPool to run the following experiments. To install it on a Linux machine, run:
```bash
pip install envpool
```
After that, `make_mujoco_env` will automatically switch to EnvPool's MuJoCo env. EnvPool's implementation is much faster than the Python vectorized env implementation (about 2\~3x faster in pure execution speed, and 1.5x faster for the overall RL training pipeline on average), and its behavior is consistent with Gym's MuJoCo env.
For more information, please refer to EnvPool's [GitHub](https://github.com/sail-sg/envpool/) and [Docs](https://envpool.readthedocs.io/en/latest/api/mujoco.html).
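The automatic backend switch described above can be sketched as a simple import probe. This is a minimal illustration of the dispatch idea, not Tianshou's actual `make_mujoco_env` code; the function name `select_env_backend` is hypothetical.

```python
# Hypothetical sketch: prefer EnvPool when it is installed, otherwise
# fall back to the plain Gym-based vectorized env.
def select_env_backend() -> str:
    """Return 'envpool' if the envpool package is importable, else 'gym'."""
    try:
        import envpool  # noqa: F401
        return "envpool"
    except ImportError:
        return "gym"

backend = select_env_backend()
print(f"Using env backend: {backend}")
```

On a machine without EnvPool installed this falls back to `"gym"`, which mirrors why installing EnvPool is all that is needed to switch.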
## Usage
Run
```bash
$ python mujoco_sac.py --task Ant-v3
```
Logs are saved in `./log/` and can be monitored with TensorBoard:
```bash
$ tensorboard --logdir log
```
You can also reproduce the benchmark (e.g. SAC in Ant-v3) with the example script we provide under `examples/mujoco/`:
```bash
$ ./run_experiments.sh Ant-v3 sac
```
This will start 10 experiments with different random seeds.
Once all the experiments have finished, we can convert the tfevent files into csv files and then plot the results:
```bash
# generate csv
$ ./tools.py --root-dir ./results/Ant-v3/sac
# generate figures
$ ./plotter.py --root-dir ./results/Ant-v3 --shaded-std --legend-pattern "\\w+"
# generate numerical result (support multiple groups: `--root-dir ./` instead of single dir)
$ ./analysis.py --root-dir ./results --norm
```
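The numerical results above are reported as mean ± standard deviation over the 10 seeds. The aggregation step can be sketched with only the standard library; the returns below are made-up placeholder numbers, and this is not the repo's `analysis.py`.

```python
# Minimal sketch of aggregating per-seed final returns into "mean ± std",
# as used in the benchmark tables. The values here are hypothetical.
import statistics

# Hypothetical final test returns from 10 seeds of one algorithm/task pair.
returns = [5832.0, 5610.5, 5975.3, 5721.8, 5890.1,
           5544.2, 5802.7, 5918.6, 5687.4, 5759.9]

mean = statistics.mean(returns)
std = statistics.stdev(returns)  # sample standard deviation across seeds
print(f"{mean:.1f} ± {std:.1f}")
```

The real scripts read the csv files produced by `tools.py` instead of a hard-coded list, but the final reduction is the same mean/std over seeds.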
## Example benchmark
<img src="./benchmark/Ant-v3/offpolicy.png" width="500" height="450">
Other graphs can be found under `examples/mujoco/benchmark/`.
For pretrained agents, detailed graphs (single agent, single game) and log details, please refer to [https://cloud.tsinghua.edu.cn/d/f45fcfc5016043bc8fbc/](https://cloud.tsinghua.edu.cn/d/f45fcfc5016043bc8fbc/).
## Off-policy algorithms
### Notes
1. For the off-policy algorithms (DDPG, TD3, SAC), the shared hyperparameters are almost identical, and unless otherwise stated they are consistent with those used in SpinningUp's benchmark implementations (e.g., we use batch size 256 in DDPG/TD3/SAC while SpinningUp uses 100). Minor differences also lie in `start-timesteps`, the data-collection loop (`step_per_collect`), and the method used to bootstrap steps truncated by the time limit and episodes still being collected (which contributes to the performance improvement).
2. Compared with both the classic literature and open-source implementations (e.g., SpinningUp)<sup>[[1]](#footnote1)</sup><sup>[[2]](#footnote2)</sup>, Tianshou's implementations of DDPG, TD3, and SAC are roughly at parity with or better than the best reported results for these algorithms, so you can confidently use Tianshou's benchmark for research purposes.
3. We did not compare the off-policy algorithms to the OpenAI Baselines [benchmark](https://github.com/openai/baselines/blob/master/benchmarks_mujoco1M.htm), because Baselines currently provides no benchmark for off-policy algorithms. However, the [SpinningUp docs](https://spinningup.openai.com/en/latest/spinningup/bench.html) state that "SpinningUp implementations of DDPG, TD3, and SAC are roughly at-parity with the best-reported results for these algorithms", so we think the lack of a comparison with OpenAI Baselines is acceptable.
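The time-limit bootstrapping mentioned in note 1 can be illustrated with a one-step TD target. This is a generic sketch of the common practice, not Tianshou's exact code: when an episode ends only because of the `TimeLimit` wrapper (truncation), the target still bootstraps from the next state's value; only a true environment termination zeroes it out.

```python
# Sketch of time-limit-aware TD targets. `terminated` is True only for a
# genuine terminal state; a time-limit truncation keeps terminated=False,
# so the target still bootstraps from next_value.
def td_target(reward: float, next_value: float,
              terminated: bool, gamma: float = 0.99) -> float:
    return reward + gamma * (0.0 if terminated else next_value)

# Truncated by the time limit: bootstrap continues.
print(td_target(1.0, 10.0, terminated=False))
# True termination: no bootstrap.
print(td_target(1.0, 10.0, terminated=True))
```

Treating truncation as termination would wrongly teach the critic that states near the time limit have no future return, which is one of the "minor differences" that affect benchmark scores.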
### DDPG
| Environment | Tianshou (1M) | [Spinning Up (PyTorch)](https://spinningup.openai.com/en/latest/spinningup/bench.html) | [TD3 paper (DDPG)](https://arxiv.org/abs/1802.09477) | [TD3 paper (OurDDPG)](https://arxiv.org/abs/1802.09477) |
| :--------------------: | :---------------: | :----------------------------------------------------------: | :--------------------------------------------------: | :-----------------------------------------------------: |
| Ant | 990.4±4.3 | ~840 | **1005.3** | 888.8 |
| HalfCheetah | **11718.7±465.6** | ~11000 | 3305.6 | 8577.3 |
| Hopper | **2197.0±971.6** | ~1800 | **2020.5** | 1860.0 |
| Walker2d | 1400.6±905.0 | ~1950 | 1843.6 | **3098.1** |
| Swimmer | **144.1±6.5** | ~137 | N | N |
| Humanoid | **177.3±77.6** | N | N | N |
| Reacher | **-3.3±0.3** | N | -6.51 | -4.01 |
| InvertedPendulum | **1000.0±0.0** | N | **1000.0** | **1000.0** |
| InvertedDoublePendulum | 8364.3±2778.9 | N | **9355.5** | 8370.0 |