# The Winning Solution for the NeurIPS 2018: AI for Prosthetics Challenge
<p align="center">
<img src="image/competition.png" alt="PARL" width="800"/>
</p>
This folder contains the winning solution of our team `Firework` in the NeurIPS 2018: AI for Prosthetics Challenge. It consists of three parts. The first part is our final submitted model, a sensible controller that can follow a random target velocity. The second part performs curriculum learning to learn a natural and efficient gait at low walking speeds. The last part trains the final agent in the random-velocity environment used for the Round 2 evaluation.
For more technical details about our solution, we provide:
1. [[Link]](https://youtu.be/RT4JdMsZaTE) An interesting video demonstrating the training process visually.
2. [[Link]](https://docs.google.com/presentation/d/1n9nTfn3EAuw2Z7JichqMMHB1VzNKMgExLJHtS4VwMJg/edit?usp=sharing) A presentation briefly introducing our solution at the NeurIPS 2018 Competition Workshop.
3. [[Link]](https://drive.google.com/file/d/1W-FmbJu4_8KmwMIzH0GwaFKZ0z1jg_u0/view?usp=sharing) A poster briefly introducing our solution at the NeurIPS 2018 Competition Workshop.
4. (coming soon) A full academic paper detailing our solution, including the entire training pipeline, related work, and experiments that analyze the importance of each key ingredient.
**Note**: Reproducibility is a long-standing issue in the reinforcement learning field. We have tried to make our code reproducible by testing each training sub-task three times, but some factors may still prevent you from achieving the same performance. One such factor is deciding when a model has converged during curriculum learning: visually choosing a sensible, natural gait is crucial for subsequent training, but what counts as a good gait varies from person to person.
<p align="center">
<img src="image/demo.gif" alt="PARL" width="500"/>
</p>
## Dependencies
- python3.6
- [parl==1.0](https://github.com/PaddlePaddle/PARL)
- [paddlepaddle==1.5.1](https://github.com/PaddlePaddle/Paddle)
- [osim-rl](https://github.com/stanfordnmbl/osim-rl)
- [grpcio==1.12.1](https://grpc.io/docs/quickstart/python.html)
- tqdm
- tensorflow (To use tensorboard)
## Part 1: Final submitted model
### Result
For the final submission, we tested our model on 500 CPUs, running 10 episodes per CPU with different random seeds.
| Avg. reward (all episodes) | Avg. reward (complete episodes) | Falldown rate | Evaluated episodes |
|----------------------------|---------------------------------|---------------|--------------------|
| 9968.5404 | 9980.3952 | 0.0026 | 5000 |
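The metrics in the table can be aggregated from per-episode results as in the following sketch. The `aggregate` helper and the toy numbers are ours for illustration only; the real run used 5000 episodes.

```python
# Aggregate per-episode evaluation results into the metrics in the table:
# each result is a (total_reward, fell_down) pair from one episode.
def aggregate(results):
    rewards = [r for r, _ in results]
    complete = [r for r, fell in results if not fell]
    falls = sum(1 for _, fell in results if fell)
    return {
        "avg_reward_all": sum(rewards) / len(rewards),
        "avg_reward_complete": sum(complete) / len(complete),
        "falldown_rate": falls / len(results),
        "episodes": len(results),
    }

# Toy numbers for illustration only.
toy = [(9980.0, False), (9985.0, False), (7000.0, True), (9975.0, False)]
stats = aggregate(toy)
```

An episode that ends in a fall drags down the all-episode average, which is why the complete-episode average in the table is slightly higher.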
### Test
- How to Run
1. Enter the sub-folder `final_submit`
2. Download the model file from online storage service, [Baidu Pan](https://pan.baidu.com/s/1NN1auY2eDblGzUiqR8Bfqw) or [Google Drive](https://drive.google.com/open?id=1DQHrwtXzgFbl9dE7jGOe9ZbY0G9-qfq3)
3. Unpack the file by using:
`tar zxvf saved_model.tar.gz`
4. Launch the test script:
`python test.py`
## Part 2: Curriculum learning
<p align="center">
<img src="image/curriculum-learning.png" alt="PARL" width="500"/>
</p>
#### 1. Target: Run as fast as possible
<p align="center">
<img src="image/fastest.png" alt="PARL" width="800"/>
</p>
```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 1
# client (Suggest: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type RunFastest
```
#### 2. Target: Run at 3.0 m/s
```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 1 --warm_start_batchs 1000 \
--restore_model_path [RunFastest model]
# client (Suggest: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type FixedTargetSpeed --target_v 3.0 \
--act_penalty_lowerbound 1.5
```
#### 3. Target: Walk at 2.0 m/s
```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 1 --warm_start_batchs 1000 \
--restore_model_path [FixedTargetSpeed 3.0m/s model]
# client (Suggest: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type FixedTargetSpeed --target_v 2.0 \
--act_penalty_lowerbound 0.75
```
#### 4. Target: Walk slowly at 1.25 m/s
<p align="center">
<img src="image/last course.png" alt="PARL" width="800"/>
</p>
```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 1 --warm_start_batchs 1000 \
--restore_model_path [FixedTargetSpeed 2.0m/s model]
# client (Suggest: 200+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type FixedTargetSpeed --target_v 1.25 \
--act_penalty_lowerbound 0.6
```
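The `--reward_type FixedTargetSpeed` and `--act_penalty_lowerbound` flags suggest a reward of roughly the following shape. The actual shaping lives in `env_wrapper.py`; the snippet below is only an illustrative sketch, and the function name and coefficients are our assumptions.

```python
import numpy as np

def fixed_target_speed_reward(forward_vel, action, target_v,
                              act_penalty_lowerbound,
                              vel_coeff=1.0, act_coeff=1.0):
    # Penalize squared deviation from the target forward velocity.
    vel_penalty = vel_coeff * (forward_vel - target_v) ** 2
    # Penalize muscle activations, but never below a floor: clipping the
    # penalty from below means small effort differences near a good gait
    # are not rewarded, discouraging degenerate overly-stiff behaviours.
    act_penalty = act_coeff * max(float(np.square(action).sum()),
                                  act_penalty_lowerbound)
    return -vel_penalty - act_penalty
```

Note how the lower bound decreases across the curriculum stages above (1.5 at 3.0 m/s, 0.75 at 2.0 m/s, 0.6 at 1.25 m/s): slower gaits need less muscle effort, so the floor is lowered accordingly.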
## Part 3: Training in the random-velocity environment for the Round 2 evaluation
As mentioned above, the model selected for fine-tuning influences later training. If you cannot obtain the expected performance from the previous steps, we provide a pre-trained model that walks naturally at 1.25 m/s. ([Baidu Pan](https://pan.baidu.com/s/1PVDgIe3NuLB-4qI5iSxtKA) or [Google Drive](https://drive.google.com/open?id=1jWzs3wvq7_ierIwGZXc-M92bv1X5eqs7))
```bash
# server
python simulator_server.py --port [PORT] --ensemble_num 12 --warm_start_batchs 1000 \
--restore_model_path [FixedTargetSpeed 1.25m/s model] --restore_from_one_head
# client (Suggest: 100+ clients)
python simulator_client.py --port [PORT] --ip [SERVER_IP] --reward_type Round2 --act_penalty_lowerbound 0.75 \
--act_penalty_coeff 7.0 --vel_penalty_coeff 20.0 --discrete_data --stage 3
```
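With `--ensemble_num 12` the server trains an ensemble of heads. One simple way to combine head proposals at test time is to average their actions; the exact combination rule is implemented in `test.py`, so the sketch below is only an assumed illustration.

```python
import numpy as np

def ensemble_action(obs, heads):
    # Each head is a policy mapping an observation to an action vector;
    # averaging the proposals is one simple way to combine an ensemble.
    return np.stack([head(obs) for head in heads]).mean(axis=0)

# Two toy heads standing in for trained policies.
heads = [lambda obs: np.array([1.0, 0.0]),
         lambda obs: np.array([0.0, 1.0])]
action = ensemble_action(None, heads)  # -> array([0.5, 0.5])
```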
### Test trained model
```bash
python test.py --restore_model_path [MODEL_PATH] --ensemble_num [ENSEMBLE_NUM]
```
### Other implementation details
<p align="center">
<img src="image/velocity_distribution.png" alt="PARL" width="800"/>
</p>
If you follow the above steps correctly, you can get an agent that scores around 9960, slightly below our final submitted model. The gap comes from the lack of the multi-stage training paradigm. As shown in the figure above, the distribution of possible target velocities keeps changing throughout an episode, which degrades the performance of a single model: it is hard to fit one model to several different data distributions. We therefore trained four models, each aiming to perform well under a different velocity distribution. These four models are trained successively; that is, we first train a model that specializes in the start stage (the first 60 frames), then fix it for those frames and train another model for the remaining 940 frames. We do not provide this part of the code, since it would reduce the readability of the code base. Feel free to open an issue if you have any problems :)
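Although the multi-stage code is not released, the dispatch it describes can be sketched as follows. The 60-frame start boundary comes from the paragraph above; the later boundaries and the model callables are hypothetical.

```python
class StagedPolicy:
    """Dispatch to a stage-specific model by frame index.

    A dedicated model handles the first 60 frames (the start stage);
    the later stage boundaries used here are illustrative only.
    """

    def __init__(self, models, boundaries=(60, 400, 700)):
        # One more model than boundaries: the last model covers the tail.
        assert len(models) == len(boundaries) + 1
        self.models = models
        self.boundaries = boundaries

    def act(self, obs, frame_idx):
        for stage, bound in enumerate(self.boundaries):
            if frame_idx < bound:
                return self.models[stage](obs)
        return self.models[-1](obs)

# Four toy "models" that just report which stage handled the call.
policy = StagedPolicy([lambda obs, i=i: i for i in range(4)])
```

Training proceeds in the same order: once the start-stage model is fixed, only the later models see gradient updates for the remaining frames.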
## Acknowledgments
We would like to thank Zhihua Wu, Jingzhou He, Kai Zeng for providing stable computation resources and other colleagues on the Online Learning team for insightful discussions. We are grateful to Tingru Hong, Wenxia Zheng and others for creating a vivid and popular demonstration video.