Structural Implementation of Key RL Algorithms (Python): ZIP Download - CSDN Library

165 files in total

py: 111

yaml: 29

sh: 5

24 views · Uploaded 2023-04-30 10:23:26 · 1.52MB ZIP

RL关键算法的结构实现_Python_下载.zip (165 files)

.all-contributorsrc 3KB

.isort.cfg 177B

CODEOWNERS 299B

Dockerfile 911B

.flake8 246B

.gitignore 164B

MANIFEST.in 42B

mypy.ini 77B

Jenkinsfile 326B

Makefile 758B

README.md 23KB

README.md 2KB

LICENSE.md 1KB

reacher_demo.pkl 1.2MB

lunarlander_continuous_demo.pkl 977KB

lunarlander_discrete_demo.pkl 645KB

losses.py 15KB

replay_buffer.py 14KB

dqn_agent.py 14KB

dqn_agent.py 12KB

agent.py 11KB

atari_wrappers.py 11KB

learner.py 10KB

agent.py 10KB

losses.py 9KB

learner.py 9KB

multiprocessing_env.py 9KB

agent.py 9KB

distributed_logger.py 9KB

learner.py 8KB

agent.py 8KB

wrapper.py 8KB

agent.py 8KB

learner.py 8KB

agent.py 8KB

networks.py 8KB

brain.py 8KB

heads.py 7KB

sac_agent.py 7KB

architecture.py 7KB

dqn_agent.py 7KB

sac_agent.py 7KB

ddpg_agent.py 7KB

learner.py 7KB

ddpg_agent.py 7KB

learner.py 6KB

agent.py 6KB

learner.py 6KB

helper_functions.py 6KB

worker.py 6KB

learner.py 6KB

resnet.py 6KB

utils.py 6KB

agent.py 5KB

distributed_worker.py 5KB

learner.py 5KB

her.py 5KB

distributed_worker.py 5KB

learner.py 5KB

grad_cam.py 5KB

sac_learner.py 5KB

sac_learner.py 4KB

run_pong_no_frameskip_v4.py 4KB

ddpg_learner.py 4KB

segment_tree.py 4KB

linear.py 4KB

ddpg_learner.py 4KB

distillation_buffer.py 4KB

dqn_learner.py 4KB

registry.py 4KB

buffer.py 4KB

run_reacher_v2.py 4KB

run_lunarlander_continuous_v2.py 3KB

run_lunarlander_v2.py 3KB

saliency_map.py 3KB

her.py 3KB

test_cnn_cfg.py 3KB

test_config_registry.py 3KB

test_prioritized_buffer.py 3KB

test_run_distillation_agent.py 3KB

networks.py 3KB

__init__.py 2KB

config.py 2KB

test_run_agent.py 2KB

test_distillation_buffer.py 2KB

cnn.py 2KB

registry.py 2KB

distributed_logger.py 2KB

noise.py 2KB

test_helper_funcion.py 2KB

test_run_apex.py 2KB

gail_buffer.py 2KB

utils.py 1KB

test_uniform_buffer.py 1KB

utils.py 1KB

buffer.py 1KB

setup.py 1KB

normalizers.py 1KB


<img src="https://user-images.githubusercontent.com/17582508/52845370-4a930200-314a-11e9-9889-e00007043872.jpg" align="center">

[![Language grade: Python](https://img.shields.io/lgtm/grade/python/g/medipixel/rl_algorithms.svg?logo=lgtm&logoWidth=18)](https://lgtm.com/projects/g/medipixel/rl_algorithms/context:python)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![All Contributors](https://img.shields.io/badge/all_contributors-10-orange.svg?style=flat-square)](#contributors-)

## Contents

* [Welcome!](https://github.com/medipixel/rl_algorithms#welcome)
* [Contributors](https://github.com/medipixel/rl_algorithms#contributors)
* [Algorithms](https://github.com/medipixel/rl_algorithms#algorithms)
* [Performance](https://github.com/medipixel/rl_algorithms#performance)
* [Getting Started](https://github.com/medipixel/rl_algorithms#getting-started)
* [Class Diagram](https://github.com/medipixel/rl_algorithms#class-diagram)
* [References](https://github.com/medipixel/rl_algorithms#references)

## Welcome!

This repository contains Reinforcement Learning algorithms which are being used for research activities at Medipixel. The source code will be frequently updated. We warmly welcome external contributors! :)

|<img src="https://user-images.githubusercontent.com/17582508/52840582-18c76e80-313d-11e9-9752-3d6138f39a15.gif" width="260" height="180"/>|<img src="https://media.giphy.com/media/ZxLNajigOcLyeUnOwg/giphy.gif" width="160" height="180"/>|<img src="https://media.giphy.com/media/1mikGEln2lArKMQ6Pt/giphy.gif" width="260" height="180"/>|
|:---:|:---:|:---:|
|BC agent on LunarLanderContinuous-v2|RainbowIQN agent on PongNoFrameskip-v4|SAC agent on Reacher-v2|

## Contributors

Thanks go to these wonderful people ([emoji key](https://allcontributors.org/docs/en/emoji-key)):

<table>
  <tr>
    <td align="center"><a href="https://github.com/Curt-Park"><img src="https://avatars3.githubusercontent.com/u/14961526?v=4?s=100" width="100px;" alt=""/> Jinwoo Park (Curt)</a> <a href="https://github.com/medipixel/rl_algorithms/commits?author=Curt-Park" title="Code">💻</a></td>
    <td align="center"><a href="https://github.com/MrSyee"><img src="https://avatars3.githubusercontent.com/u/17582508?v=4?s=100" width="100px;" alt=""/> Kyunghwan Kim</a> <a href="https://github.com/medipixel/rl_algorithms/commits?author=MrSyee" title="Code">💻</a></td>
    <td align="center"><a href="https://github.com/darthegg"><img src="https://avatars3.githubusercontent.com/u/16010242?v=4?s=100" width="100px;" alt=""/> darthegg</a> <a href="https://github.com/medipixel/rl_algorithms/commits?author=darthegg" title="Code">💻</a></td>
    <td align="center"><a href="https://github.com/mclearning2"><img src="https://avatars3.githubusercontent.com/u/43226417?v=4?s=100" width="100px;" alt=""/> Mincheol Kim</a> <a href="https://github.com/medipixel/rl_algorithms/commits?author=mclearning2" title="Code">💻</a></td>
    <td align="center"><a href="https://github.com/minseop4898"><img src="https://avatars1.githubusercontent.com/u/34338299?v=4?s=100" width="100px;" alt=""/> 김민섭</a> <a href="https://github.com/medipixel/rl_algorithms/commits?author=minseop4898" title="Code">💻</a></td>
    <td align="center"><a href="https://github.com/jinPrelude"><img src="https://avatars1.githubusercontent.com/u/16518993?v=4?s=100" width="100px;" alt=""/> Leejin Jung</a> <a href="https://github.com/medipixel/rl_algorithms/commits?author=jinPrelude" title="Code">💻</a></td>
    <td align="center"><a href="https://github.com/cyoon1729"><img src="https://avatars2.githubusercontent.com/u/33583101?v=4?s=100" width="100px;" alt=""/> Chris Yoon</a> <a href="https://github.com/medipixel/rl_algorithms/commits?author=cyoon1729" title="Code">💻</a></td>
  </tr>
  <tr>
    <td align="center"><a href="https://jiseonghan.github.io/"><img src="https://avatars2.githubusercontent.com/u/48741026?v=4?s=100" width="100px;" alt=""/> Jiseong Han</a> <a href="https://github.com/medipixel/rl_algorithms/commits?author=jiseongHAN" title="Code">💻</a></td>
    <td align="center"><a href="https://github.com/sehyun-hwang"><img src="https://avatars3.githubusercontent.com/u/23437715?v=4?s=100" width="100px;" alt=""/> Sehyun Hwang</a> <a href="#maintenance-sehyun-hwang" title="Maintenance">🚧</a></td>
    <td align="center"><a href="https://github.com/isk03276"><img src="https://avatars.githubusercontent.com/u/23740495?v=4?s=100" width="100px;" alt=""/> eunjin</a> <a href="https://github.com/medipixel/rl_algorithms/commits?author=isk03276" title="Code">💻</a></td>
  </tr>
</table>

This project follows the [all-contributors](https://github.com/all-contributors/all-contributors) specification.

## Algorithms

0. [Advantage Actor-Critic (A2C)](https://github.com/medipixel/rl_algorithms/tree/master/rl_algorithms/a2c)
1. [Deep Deterministic Policy Gradient (DDPG)](https://github.com/medipixel/rl_algorithms/tree/master/rl_algorithms/ddpg)
2. [Proximal Policy Optimization Algorithms (PPO)](https://github.com/medipixel/rl_algorithms/tree/master/rl_algorithms/ppo)
3. [Twin Delayed Deep Deterministic Policy Gradient Algorithm (TD3)](https://github.com/medipixel/rl_algorithms/tree/master/rl_algorithms/td3)
4. [Soft Actor Critic Algorithm (SAC)](https://github.com/medipixel/rl_algorithms/tree/master/rl_algorithms/sac)
5. [Behaviour Cloning (BC with DDPG, SAC)](https://github.com/medipixel/rl_algorithms/tree/master/rl_algorithms/bc)
6. [From Demonstrations (DDPGfD, SACfD, DQfD)](https://github.com/medipixel/rl_algorithms/tree/master/rl_algorithms/fd)
7. [Rainbow DQN](https://github.com/medipixel/rl_algorithms/tree/master/rl_algorithms/dqn)
8. [Rainbow IQN (without DuelingNet)](https://github.com/medipixel/rl_algorithms/tree/master/rl_algorithms/dqn)
   - DuelingNet [degrades performance](https://github.com/medipixel/rl_algorithms/pull/137)
9. Rainbow IQN (with [ResNet](https://github.com/medipixel/rl_algorithms/blob/master/rl_algorithms/common/networks/backbones/resnet.py))
10. [Recurrent Replay DQN (R2D1)](https://github.com/medipixel/rl_algorithms/tree/master/rl_algorithms/recurrent)
11. [Distributed Prioritized Experience Replay (Ape-X)](https://github.com/medipixel/rl_algorithms/tree/master/rl_algorithms/common/apex)
12. [Policy Distillation](https://github.com/medipixel/rl_algorithms/tree/master/rl_algorithms/distillation)
13. [Generative Adversarial Imitation Learning (GAIL)](https://github.com/medipixel/rl_algorithms/tree/master/rl_algorithms/gail)
14. [Sample Efficient Actor-Critic with Experience Replay (ACER)](https://github.com/medipixel/rl_algorithms/tree/master/rl_algorithms/acer)

## Performance

We have tested each algorithm on some of the following environments.

- [PongNoFrameskip-v4](https://github.com/medipixel/rl_algorithms/tree/master/configs/pong_no_frameskip_v4)
- [LunarLanderContinuous-v2](https://github.com/medipixel/rl_algorithms/tree/master/configs/lunarlander_continuous_v2)
- [LunarLander-v2](https://github.com/medipixel/rl_algorithms/tree/master/configs/lunarlander_v2)
- [Reacher-v2](https://github.com/medipixel/rl_algorithms/tree/master/configs/reacher-v2)

Please note that this won't be fr
