![img](https://img.shields.io/gitlab/pipeline/ymd_h/cpprb.svg)
![img](https://img.shields.io/pypi/v/cpprb.svg)
![img](https://img.shields.io/pypi/l/cpprb.svg)
![img](https://img.shields.io/pypi/status/cpprb.svg)
[![img](https://gitlab.com/ymd_h/cpprb/badges/master/coverage.svg)](https://ymd_h.gitlab.io/cpprb/coverage/)
<div class="html" id="org9e465a3">
<p>
<img src="https://pepy.tech/badge/cpprb"><img src="https://pepy.tech/badge/cpprb/month"><img src="https://pepy.tech/badge/cpprb/week">
</p>
</div>
![img](./site/static/images/favicon.png)
# Overview
cpprb is a Python ([CPython](https://github.com/python/cpython/tree/master/Python)) module providing replay buffer classes for
reinforcement learning.
Its main target users are researchers and library developers.
You can build your own reinforcement learning algorithms together with
your favorite deep learning library (e.g. [TensorFlow](https://www.tensorflow.org/), [PyTorch](https://pytorch.org/)).
cpprb focuses on speed, flexibility, and memory efficiency.
By utilizing [Cython](https://cython.org/), complicated calculations (e.g. segment tree for
prioritized experience replay) are offloaded onto C++.
(The name cpprb comes from "C++ Replay Buffer".)
The API was initially modeled on the [OpenAI Baselines](https://github.com/openai/baselines)
implementation, but the current version of cpprb is much more
flexible. Any number of values of any [NumPy](https://numpy.org/)-compatible type can
be stored (as long as memory capacity is sufficient). For example, you
can also store the next action and the next next observation.
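
The overview mentions a segment tree for prioritized experience replay. As a rough illustration of that data structure, here is a minimal pure-Python sum tree. It is a sketch only, not cpprb's C++ implementation, and the class `SumTree` and its method names are hypothetical. Each leaf holds one priority and each internal node holds the sum of its children, so updating a priority and finding the index that covers a given prefix sum both take O(log n) steps.

```python
import random


class SumTree:
    """Hypothetical sum segment tree (illustration only, not cpprb's code)."""

    def __init__(self, capacity):
        # Assumption of this sketch: capacity is a power of two
        assert capacity > 0 and capacity & (capacity - 1) == 0
        self.capacity = capacity
        # Node 1 is the root; leaves live at [capacity, 2 * capacity)
        self.tree = [0.0] * (2 * capacity)

    def update(self, index, priority):
        """Set the priority of one leaf and refresh the sums above it."""
        pos = index + self.capacity
        self.tree[pos] = priority
        pos //= 2
        while pos >= 1:
            self.tree[pos] = self.tree[2 * pos] + self.tree[2 * pos + 1]
            pos //= 2

    def total(self):
        """Sum of all priorities (stored at the root)."""
        return self.tree[1]

    def find_prefix_sum_index(self, value):
        """Return the smallest index whose cumulative priority exceeds value."""
        pos = 1
        while pos < self.capacity:
            left = 2 * pos
            if value < self.tree[left]:
                pos = left
            else:
                value -= self.tree[left]
                pos = left + 1
        return pos - self.capacity

    def sample(self):
        """Draw one index with probability proportional to its priority."""
        return self.find_prefix_sum_index(random.random() * self.total())


tree = SumTree(4)
for i, priority in enumerate([1.0, 2.0, 3.0, 4.0]):
    tree.update(i, priority)

index = tree.sample()  # index 3 is drawn most often, index 0 least often
```

In prioritized experience replay the priorities typically come from TD errors, and this kind of tree avoids rescanning all priorities on every sample.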
# Installation
cpprb requires the following software before installation.
- C++17 compiler (for installation from source)
  - [GCC](https://gcc.gnu.org/) (maybe 7.2 and newer)
  - [Visual Studio](https://visualstudio.microsoft.com/) (2017 Enterprise is fine)
- Python 3
- pip
Currently, [clang](https://clang.llvm.org/), which is the default Xcode C/C++ compiler on Apple macOS,
cannot compile cpprb.
If you are a macOS user, you need to install GCC and set the environment
variables `CC` and `CXX` to `g++`, or just use a virtual environment
(e.g. [Docker](https://www.docker.com/)). Step-by-step installation is described [here](https://ymd_h.gitlab.io/cpprb/installation/install_on_macos/).
Additionally, users have shared helpful installation feedback for [macOS](https://github.com/keiohta/tf2rl/issues/75) and [Ubuntu](https://gitlab.com/ymd_h/cpprb/issues/73).
(Thanks!)
## Install from [PyPI](https://pypi.org/) (Recommended)
The following command installs cpprb together with its dependencies.

    pip install cpprb

Depending on your environment, you might need `sudo` or the `--user` flag
for installation.
On supported platforms (Linux x86-64 and Windows amd64), binary
packages hosted on PyPI can be used, so you don't need a C++
compiler. On other platforms, such as macOS, and 32-bit or
ARM-based Linux and Windows, you cannot install from binary
and need to compile cpprb yourself. Please be patient; we plan to
support more platforms in the future.
If you have any trouble installing from binary, you can fall back to
source installation by passing the `--no-binary` option (which takes the
package name as its argument) to the above pip
command. (To avoid a NumPy source installation, it is better to
install NumPy beforehand.)

    pip install numpy
    pip install --no-binary cpprb cpprb

## Install from source code
First, download the source code manually or clone the repository:

    git clone https://gitlab.com/ymd_h/cpprb.git

Then you can install it in the same way:

    cd cpprb
    pip install .

This installation converts extended Python (.pyx) to
C++ (.cpp) along the way, so it takes longer than installation
from PyPI.
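
After either installation route, a quick check from Python can confirm that the package is importable. The following is a minimal sketch using only the standard library; the helper name `cpprb_status` is hypothetical and not part of cpprb.

```python
import importlib.util
from importlib import metadata


def cpprb_status():
    """Report whether cpprb can be found by the current interpreter."""
    # find_spec returns None when the top-level module is not installed
    if importlib.util.find_spec("cpprb") is None:
        return "cpprb is not installed"
    try:
        version = metadata.version("cpprb")
    except metadata.PackageNotFoundError:
        # Importable, but without installed distribution metadata
        version = "(unknown version)"
    return f"cpprb {version} is installed"


status = cpprb_status()
print(status)
```

If the message reports that cpprb is missing, check that `pip` and `python` refer to the same environment.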
# Usage
## Basic Usage
Basic usage consists of the following steps:
1. Create a replay buffer (`ReplayBuffer.__init__`)
2. Add transitions (`ReplayBuffer.add`)
   1. Reset at episode end (`ReplayBuffer.on_episode_end`)
3. Sample transitions (`ReplayBuffer.sample`)
## Example Code
Here is a simple example that stores standard environment values (i.e. `obs`,
`act`, `rew`, `next_obs`, and `done`).

    import numpy as np
    from cpprb import ReplayBuffer

    buffer_size = 256
    obs_shape = 3
    act_dim = 1

    # Declare the name and shape of every value stored per transition
    rb = ReplayBuffer(buffer_size,
                      env_dict={"obs": {"shape": obs_shape},
                                "act": {"shape": act_dim},
                                "rew": {},
                                "next_obs": {"shape": obs_shape},
                                "done": {}})

    # Dummy transition values
    obs = np.ones(shape=(obs_shape))
    act = np.ones(shape=(act_dim))
    rew = 0
    next_obs = np.ones(shape=(obs_shape))
    done = 0

    for i in range(500):
        rb.add(obs=obs, act=act, rew=rew, next_obs=next_obs, done=done)

        if done:
            # Together with resetting environment, call ReplayBuffer.on_episode_end()
            rb.on_episode_end()

    batch_size = 32
    sample = rb.sample(batch_size)
    # sample is a dictionary whose keys are 'obs', 'act', 'rew', 'next_obs', and 'done'

## Construction Parameters
(See also [API reference](https://ymd_h.gitlab.io/cpprb/api/api/cpprb.ReplayBuffer.html))
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="org-left" />
<col class="org-left" />
<col class="org-left" />
<col class="org-left" />
</colgroup>
<thead>
<tr>
<th scope="col" class="org-left">Name</th>
<th scope="col" class="org-left">Type</th>
<th scope="col" class="org-left">Optional</th>
<th scope="col" class="org-left">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td class="org-left"><code>size</code></td>
<td class="org-left"><code>int</code></td>
<td class="org-left">No</td>
<td class="org-left">Buffer size</td>
</tr>
<tr>
<td class="org-left"><code>env_dict</code></td>
<td class="org-left"><code>dict</code></td>
<td class="org-left">Yes (but the buffer is unusable without it)</td>
<td class="org-left">Environment definition (See <a href="https://ymd_h.gitlab.io/cpprb/features/flexible_environment/">here</a>)</td>
</tr>
<tr>
<td class="org-left"><code>next_of</code></td>
<td class="org-left"><code>str</code> or array-like of <code>str</code></td>
<td class="org-left">Yes</td>
<td class="org-left">Memory compression (See <a href="https://ymd_h.gitlab.io/cpprb/features/memory_compression/">here</a>)</td>
</tr>
<tr>
<td class="org-left"><code>stack_compress</code></td>
<td class="org-left"><code>str</code> or array-like of <code>str</code></td>
<td class="org-left">Yes</td>
<td class="org-left">Memory compression (See <a href="https://ymd_h.gitlab.io/cpprb/features/memory_compression/">here</a>)</td>
</tr>
<tr>
<td class="org-left"><code>default_dtype</code></td>
<td class="org-left"><code>numpy.dtype</code></td>
<td class="org-left">Yes</td>
<td class="org-left">Fall back data type</td>
</tr>
<tr>
<td class="org-left"><code>Nstep</code></td>
<td class="org-left"><code>dict</code></td>
<td class="org-left">Yes</td>
<td class="org-left">Nstep configuration (See <a href="https://ymd_h.gitlab.io/cpprb/features/nstep/">here</a>)</td>
</tr>
<tr>
<td class="org-left"><code>mmap_prefix</code></td>
<td class="org-left"><code>str</code></td>
<td class="org-left">Yes</td>
<td class="org-left">mmap file prefix (See <a href="https://ymd_h.gitlab.io/cpprb/features/mmap/">here</a>)</td>
</tr>
</tbody>
</table>
## Notes
Flexible environment values are defined by `env_dict` at buffer
creation. The details are described in the [documentation](https://ymd_h.gitlab.io/cpprb/features/flexible_environment/).
Since stored values have flexible names, you have to pass them to
`ReplayBuffer.add` as keyword arguments.
# Features
cpprb provides buffer classes for building the following algorithms.
<table border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<colgroup>
<col class="org-left" />
<col class="org-left" />
<col class="org-left" />
</colgr