# simple_rl
A simple framework for experimenting with Reinforcement Learning in Python.
There are loads of other great libraries out there for RL. The aim of this one is twofold:
1. Simplicity.
2. Reproducibility of results.
A brief tutorial for a slightly earlier version is available [here](http://cs.brown.edu/~dabel/blog/posts/simple_rl.html). As of version 0.77, the library should work with both Python 2 and Python 3. Please let me know if you find that is not the case!
simple_rl requires [numpy](http://www.numpy.org/) and [matplotlib](http://matplotlib.org/). Some MDPs have visuals, too, which require [pygame](http://www.pygame.org/news). The library also includes support for hooking into any of the [OpenAI Gym environments](https://gym.openai.com/envs). It comes with a basic test script, contained in the _tests_ directory; I suggest running it and making sure all tests pass when you install the library.
[Documentation available here](https://david-abel.github.io/simple_rl/docs/index.html)
## Installation
The easiest way to install is with [pip](https://pypi.python.org/pypi/pip). Just run:
```
pip install simple_rl
```
Alternatively, you can download simple_rl [here](https://github.com/david-abel/simple_rl/tarball/v0.811).
## Citation
If you use simple_rl in your research, please cite the [workshop paper](https://david-abel.github.io/papers/simple_rl.pdf) as follows:
```
@inproceedings{abel2019simple_rl,
  title={simple_rl: Reproducible Reinforcement Learning in Python},
  author={David Abel},
  booktitle={ICLR Workshop on Reproducibility in Machine Learning},
  year={2019}
}
```
## New Feature: Easy Reproduction of Results
I just added a new feature I'm quite excited about: *easy reproduction of results*. Every experiment run now outputs a file "full_experiment.txt" in the _results/exp_name/_ directory. The new function _reproduce_from_exp_file(file_name)_, when pointed at an experiment directory, will reassemble and rerun the entire experiment based on this file. The goal is to encourage simple tracking of experiments and enable quick reproduction of results. So far it only works with MDPs -- not yet with OOMDPs, POMDPs, or MarkovGames (I'd be delighted if someone wants to make it work, though!).
See the second example below for a quick sense of how to use this feature.
## Example
Some examples showcasing basic functionality are included in the [examples](https://github.com/david-abel/simple_rl/tree/master/examples) directory.
To run a simple experiment, import the _run_agents_on_mdp(agent_list, mdp)_ function from _simple_rl.run_experiments_ and call it with a list of agents for a given MDP. For example:
```python
# Imports
from simple_rl.run_experiments import run_agents_on_mdp
from simple_rl.tasks import GridWorldMDP
from simple_rl.agents import QLearningAgent

# Run Experiment
mdp = GridWorldMDP()
agent = QLearningAgent(actions=mdp.get_actions())
run_agents_on_mdp([agent], mdp)
```
Running the above code will run _Q_-learning on a simple GridWorld. When it finishes, it stores the results in _cur_dir/results/*_ and generates and opens the following plot:
<img src="https://david-abel.github.io/blog/posts/images/simple_grid.jpg" width="480" align="center">
For a slightly more complicated example, take a look at the code in _simple_example.py_. Here we run three agents on the grid world from the Russell-Norvig AI textbook:
```python
from simple_rl.agents import QLearningAgent, RandomAgent, RMaxAgent
from simple_rl.tasks import GridWorldMDP
from simple_rl.run_experiments import run_agents_on_mdp

# Setup MDP.
mdp = GridWorldMDP(width=4, height=3, init_loc=(1, 1), goal_locs=[(4, 3)], lava_locs=[(4, 2)], gamma=0.95, walls=[(2, 2)], slip_prob=0.05)

# Setup Agents.
ql_agent = QLearningAgent(actions=mdp.get_actions())
rmax_agent = RMaxAgent(actions=mdp.get_actions())
rand_agent = RandomAgent(actions=mdp.get_actions())

# Run experiment and make plot.
run_agents_on_mdp([ql_agent, rmax_agent, rand_agent], mdp, instances=5, episodes=50, steps=10)
```
The above code will generate the following plot:
<img src="https://david-abel.github.io/blog/posts/images/rn_grid.jpg" width="480" align="center">
To showcase the new reproducibility feature, suppose we now wanted to reproduce the above experiment. We just do the following:
```python
from simple_rl.run_experiments import reproduce_from_exp_file

reproduce_from_exp_file("gridworld_h-3_w-4")
```
This will rerun the entire experiment, based on a file created and populated behind the scenes, and should produce the following plot:
<img src="https://david-abel.github.io/blog/posts/images/rn_grid_reproduce.jpg" width="480" align="center">
Easy! This is a new feature, so there may be bugs -- just let me know as things come up. It's only supposed to work for MDPs, not POMDPs/OOMDPs/MarkovGameMDPs (so far). Take a look at [_reproduce_example.py_](https://github.com/david-abel/simple_rl/blob/master/examples/reproduce_example.py) for a bit more detail.
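The same _run_agents_on_mdp_ pipeline also works with the OpenAI Gym environments mentioned above. Here is a minimal sketch; it assumes the _gym_ package is installed and that the _GymMDP_ task takes an _env_name_, so check the examples directory for the exact interface in your version:

```python
# A minimal sketch of hooking into an OpenAI Gym environment.
# Assumes the gym package is installed; GymMDP wraps a Gym environment as a simple_rl task.
from simple_rl.agents import RandomAgent
from simple_rl.tasks import GymMDP
from simple_rl.run_experiments import run_agents_on_mdp

gym_mdp = GymMDP(env_name="CartPole-v0", render=False)
rand_agent = RandomAgent(actions=gym_mdp.get_actions())
run_agents_on_mdp([rand_agent], gym_mdp, instances=1, episodes=5, steps=200)
```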
## Overview
* (_agents_): Code for some basic agents (a random actor, _Q_-learning, [[R-Max]](http://www.jmlr.org/papers/volume3/brafman02a/brafman02a.pdf), _Q_-learning with a Linear Approximator, and so on).
* (_experiments_): Code for an Experiment class to track parameters and reproduce results.
* (_mdp_): Code for a basic MDP and MDPState class, and an MDPDistribution class (for lifelong learning). Also contains OO-MDP implementation [[Diuk et al. 2008]](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.149.7056&rep=rep1&type=pdf).
* (_planning_): Implementations of planning algorithms, including ValueIteration and MCTS [[Coulom 2006]](https://hal.archives-ouvertes.fr/file/index/docid/116992/filename/CG2006.pdf), the latter still in development. A short planning sketch follows this list.
* (_tasks_): Implementations for a few standard MDPs (grid world, N-chain, Taxi [[Dietterich 2000]](http://www.scs.cmu.edu/afs/cs/project/jair/pub/volume13/dietterich00a.pdf), and the [OpenAI Gym](https://gym.openai.com/envs)).
* (_utils_): Code for charting and other utilities.
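As a quick taste of the _planning_ module, here is a minimal sketch that solves a small grid world with ValueIteration. The _run_vi_ and _plan_ calls are assumptions based on the current ValueIteration class; double-check the signatures against the docs if they have changed:

```python
# A minimal planning sketch: solve a small grid world with ValueIteration.
# run_vi() and plan() are assumed to match the current ValueIteration interface.
from simple_rl.planning import ValueIteration
from simple_rl.tasks import GridWorldMDP

mdp = GridWorldMDP(width=4, height=3, init_loc=(1, 1), goal_locs=[(4, 3)])
vi = ValueIteration(mdp)
vi.run_vi()                                            # run value iteration to (near) convergence
action_seq, state_seq = vi.plan(mdp.get_init_state())  # greedy rollout from the initial state
print(action_seq)
```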
## Contributing
If you'd like to contribute: that's great! Take a look at some of the needed improvements below: I'd love for folks to work on those items. Please see the [contribution guidelines](https://github.com/david-abel/simple_rl/blob/master/CONTRIBUTING.md). Email me with any questions.
## Making a New MDP
Make an MDP subclass (a minimal sketch follows this list), which needs:
* A static variable, _ACTIONS_, which is a list of strings denoting each action.
* Implement a reward function and a transition function and pass them to the MDP constructor (along with _ACTIONS_).
* I also suggest overwriting the "\_\_str\_\_" method of the class, and adding a "\_\_init\_\_.py" file to the directory.
* Create a State subclass for your MDP (if necessary). I suggest overwriting the "\_\_hash\_\_", "\_\_eq\_\_", and "\_\_str\_\_" for the class to play along well with the agents.
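Here is a minimal sketch following the steps above. It assumes the MDP base class constructor takes _(actions, transition_func, reward_func, init_state, gamma)_ and that reward functions take _(state, action)_, as the built-in tasks do at the time of writing; adjust if the signatures in your version differ:

```python
# A minimal custom MDP: a two-state "coin" chain with flip/stay actions.
# Constructor and function signatures are assumed to match the built-in tasks.
from simple_rl.mdp.MDPClass import MDP
from simple_rl.mdp.StateClass import State


class CoinState(State):
    ''' State wrapping a single integer, with __hash__/__eq__/__str__ overridden. '''

    def __init__(self, value):
        State.__init__(self, data=[value])
        self.value = value

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        return isinstance(other, CoinState) and self.value == other.value

    def __str__(self):
        return "s" + str(self.value)


class CoinMDP(MDP):
    ''' Two states; "flip" toggles between them, "stay" does nothing. '''

    ACTIONS = ["flip", "stay"]

    def __init__(self, gamma=0.99):
        MDP.__init__(self, CoinMDP.ACTIONS, self._transition_func,
                     self._reward_func, init_state=CoinState(0), gamma=gamma)

    def _reward_func(self, state, action):
        # Reward 1 for flipping away from state 0, 0 otherwise.
        return 1.0 if (state.value == 0 and action == "flip") else 0.0

    def _transition_func(self, state, action):
        return CoinState(1 - state.value) if action == "flip" else state

    def __str__(self):
        return "coin_mdp"
```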
## Making a New Agent
Make an Agent subclass (a minimal sketch follows this list), which requires:
* A method, _act(self, state, reward)_, that returns an action.
* A method, _reset()_, that puts the agent back to its _tabula rasa_ state.
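For instance, here is a minimal sketch that satisfies both requirements. It assumes the Agent base class constructor takes _(name, actions)_, as the built-in agents do; adjust if your version differs:

```python
# A minimal custom agent: ignores the reward and always takes the first action.
# The Agent base class constructor is assumed to take (name, actions).
from simple_rl.agents.AgentClass import Agent


class FirstActionAgent(Agent):

    def __init__(self, actions, name="first-action"):
        Agent.__init__(self, name=name, actions=actions)

    def act(self, state, reward):
        # A learning agent would update its estimates from (state, reward) here.
        return self.actions[0]

    def reset(self):
        # Nothing is learned, so there is nothing to forget beyond the base class state.
        Agent.reset(self)
```

An instance can then be passed to _run_agents_on_mdp_ alongside the built-in agents, e.g. _run_agents_on_mdp([FirstActionAgent(mdp.get_actions())], mdp)_.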
## In Development
I'm hoping to add the following features:
* __Planning__: Finish MCTS [[Coulom 2006]](https://hal.inria.fr/file/index/docid/116992/filename/CG2006.pdf), implement RTDP [[Barto et al. 1995]](https://pdfs.semanticscholar.org/2838/e01572bf53805c502ec31e3e00a8e1e0afcf.pdf)
* __Deep RL__: Write a DQN [[Mnih et al. 2015]](http://www.davidqiu.com:8888/research/nature14236.pdf) in PyTorch, possibly others (some kind of policy gradient).
* __Efficiency__: Convert most defaultdict/dict uses to numpy.
* __Reproducibility__: The new reproduce feature is limited in scope -- I'd love for someone to extend it to work with OO-MDPs, Planning, MarkovGames, POMDPs, and beyond.
* __Docs__: Tutorial and documentation.
* __Visuals__: Unify MDP visualization.
* __Misc__: Additional testing.
Cheers,
-Dave