# Factor Expr [![status][ci_badge]][ci_page] [![pypi][pypi_badge]][pypi_page]
[ci_badge]: https://github.com/dovahcrow/factor-expr/workflows/ci/badge.svg
[ci_page]: https://github.com/dovahcrow/factor-expr/actions
[pypi_badge]: https://img.shields.io/pypi/v/factor-expr?color=green&style=flat-square
[pypi_page]: https://pypi.org/project/factor-expr/
<center>
<table>
<tr>
<th>Factor Expression</th>
<th></th>
<th>Historical Data</th>
<th></th>
<th>Factor Values</th>
</tr>
<tr>
<td>(TSLogReturn 30 :close)</td>
<td>+</td>
<td>2019-12-27~2020-01-14.pq</td>
<td>=</td>
<td>[0.01, 0.035, ...]</td>
</tr>
</table>
</center>
----------
Extremely fast factor expression & computation library for quantitative trading in Python.
On a server with an E7-4830 CPU (16 cores, 2000MHz),
computing 48 factors over a dataset with 24.5M rows x 683 columns (12GB) takes 150s.
Join [\[Discussions\]](https://github.com/dovahcrow/factor-expr/discussions) for Q&A and feature proposals!
## Features
* Express factors in [S-Expression](https://en.wikipedia.org/wiki/S-expression).
* Compute factors in parallel over multiple factors and multiple datasets.
## Usage
There are three steps to use this library.
1. Save the datasets to files. Currently, only the [Parquet](https://parquet.apache.org/) format is supported.
2. Define factors using [S-Expression](https://en.wikipedia.org/wiki/S-expression).
3. Run `replay` to compute the factors on the dataset.
### 1. Prepare the dataset
A dataset is a table with float64 columns and arbitrary column names.
Each row in the dataset represents a tick, e.g. for a daily dataset, each row is one day.
For example, here is an OHLC candle dataset representing 2 ticks:
```python
import pandas as pd

# Two ticks of OHLC candles; every column is float64
df = pd.DataFrame({
    "open": [3.1, 5.8],
    "high": [8.8, 7.7],
    "low": [1.1, 2.1],
    "close": [4.4, 3.4],
})
```
You can use the following code to save the DataFrame to a Parquet file (pandas needs a Parquet engine such as `pyarrow` or `fastparquet` installed):
```python
df.to_parquet("data.pq")
```
### 2. Define your factors
`Factor Expr` uses the S-Expression to describe a factor.
For example, on a daily OHLC dataset, the 30-day log return of the column `close` is expressed as:
```python
from factor_expr import Factor
Factor("(TSLogReturn 30 :close)")
```
Note that in `Factor Expr`, columns are referred to with the `:column-name` syntax.
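To make the syntax concrete, here is a minimal tokenizer and parser for factor strings like the one above. It is an illustrative sketch, not `Factor Expr`'s own parser, and the names `tokenize`, `parse` and `columns` are hypothetical. It shows that an expression is an operator followed by its arguments, where each argument is a constant, a `:column` reference, or a nested expression.

```python
from typing import List, Union

# A parsed node is a number, a name (operator or ":column"), or a nested list
Node = Union[float, str, List["Node"]]


def tokenize(src: str) -> List[str]:
    # Pad parentheses with spaces so str.split() separates every token
    return src.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens: List[str]) -> Node:
    # Recursive-descent parse of one expression, consuming tokens from the front
    if not tokens:
        raise ValueError("unexpected end of input")
    tok = tokens.pop(0)
    if tok == "(":
        node: List[Node] = []
        while tokens and tokens[0] != ")":
            node.append(parse(tokens))
        if not tokens:
            raise ValueError("missing ')'")
        tokens.pop(0)  # drop the closing ")"
        return node
    if tok == ")":
        raise ValueError("unexpected ')'")
    try:
        return float(tok)  # numeric constant such as 30 or 0.5
    except ValueError:
        return tok  # operator name or :column reference


def columns(node: Node) -> List[str]:
    # Collect every :column reference, i.e. what the factor reads from the dataset
    if isinstance(node, list):
        return [c for child in node for c in columns(child)]
    if isinstance(node, str) and node.startswith(":"):
        return [node[1:]]
    return []


expr = parse(tokenize("(TSLogReturn 30 :close)"))
print(expr)           # ['TSLogReturn', 30.0, ':close']
print(columns(expr))  # ['close']
```

Nested expressions parse the same way, e.g. `(+ :close (Neg :open))` becomes `['+', ':close', ['Neg', ':open']]`.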
### 3. Compute the factors on the prepared dataset
After steps 1 and 2, you can compute the factors with the `replay` function:
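```python
from factor_expr import Factor, replay

# `replay` is asynchronous. Top-level `await` works in Jupyter/IPython;
# in a plain script, call it from an async function run via asyncio.run().
result = await replay(
    ["data.pq"],                          # dataset files
    [Factor("(TSLogReturn 30 :close)")],  # factors to compute
)
```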
The first parameter of `replay` is a list of dataset files and the second is a list of factors, so you can compute multiple factors on multiple datasets in a single call.
Don't worry about performance! `Factor Expr` lets you parallelize the computation over the factors as well as the datasets by setting `n_factor_jobs` and `n_data_jobs` in `replay`.
The returned result is a pandas DataFrame with the factors as the column names. To index it by a column such as `time`, see [I Want to Have a Time Index for the Result](#i-want-to-have-a-time-index-for-the-result).
When multiple datasets are passed in, the results are concatenated in the same order as the datasets. This is useful if your dataset is split across files, e.g. one file per year.
For example, the code above gives you a DataFrame that looks similar to this:
| index | (TSLogReturn 30 :close) |
| ----- | ----------------------- |
| 0 | 0.23 |
| ... | ... |
Check out the [docstring](#replay) of `replay` for more information!
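The scheduling described above (every factor on every dataset, results stitched back in dataset order) can be pictured with the following plain-Python sketch. It is not how `Factor Expr` is implemented: `replay_sketch`, the dict-of-lists `Dataset` and the single `n_jobs` knob (standing in for `n_factor_jobs` and `n_data_jobs`) are all hypothetical, and because of the GIL a Python thread pool only illustrates the ordering, not the real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from typing import Callable, Dict, List

Dataset = Dict[str, List[float]]            # column name -> one value per tick
FactorFn = Callable[[Dataset], List[float]]  # hypothetical: one output per tick


def replay_sketch(datasets: List[Dataset], factors: Dict[str, FactorFn],
                  n_jobs: int = 4) -> Dict[str, List[float]]:
    """Compute every factor on every dataset concurrently, then concatenate
    the per-dataset outputs in the order the datasets were given."""
    names = list(factors)
    pairs = list(product(range(len(datasets)), names))
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # map() yields results in submission order, whatever order tasks finish in
        outputs = list(pool.map(lambda p: factors[p[1]](datasets[p[0]]), pairs))
    results = dict(zip(pairs, outputs))
    # One column per factor: dataset 0's rows first, then dataset 1's, ...
    return {name: [v for d in range(len(datasets)) for v in results[(d, name)]]
            for name in names}


years = [{"close": [1.0, 2.0]}, {"close": [3.0, 4.0, 5.0]}]  # e.g. one file per year
out = replay_sketch(years, {"double": lambda ds: [2 * x for x in ds["close"]]})
print(out)  # {'double': [2.0, 4.0, 6.0, 8.0, 10.0]}
```

Collecting results by `(dataset, factor)` key and only concatenating at the end is what keeps the output order independent of which job finishes first.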
## Installation
```bash
pip install factor-expr
```
## Supported Functions
Notations:
* `<const>` means a constant, e.g. `3`.
* `<expr>` means either a constant or an S-Expression or a column name, e.g. `3` or `(+ :close 3)` or `:open`.
Here's the full list of supported functions. If you don't find the one you need,
consider asking on [Discussions](https://github.com/dovahcrow/factor-expr/discussions) or creating a PR!
### Arithmetics
* Addition: `(+ <expr> <expr>)`
* Subtraction: `(- <expr> <expr>)`
* Multiplication: `(* <expr> <expr>)`
* Division: `(/ <expr> <expr>)`
* Power: `(^ <const> <expr>)` - compute `<expr> ^ <const>`
* Negation: `(Neg <expr>)`
* Signed Power: `(SPow <const> <expr>)` - compute `sign(<expr>) * abs(<expr>) ^ <const>`
* Natural Logarithm of the Absolute Value: `(LogAbs <expr>)`
* Sign: `(Sign <expr>)`
* Absolute Value: `(Abs <expr>)`
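The less obvious arithmetic semantics are the argument order of `^` (the constant comes first but is the exponent) and how `SPow` preserves the sign of its input. The scalar sketch below spells out the formulas listed above; the function names are hypothetical and treating `SPow` of 0 as 0 is an assumption based on `sign(0) = 0`.

```python
import math


def power(c: float, x: float) -> float:
    # (^ <const> <expr>): the constant comes first but is the exponent
    return x ** c


def sign(x: float) -> float:
    # (Sign <expr>): -1.0, 0.0 or 1.0
    return float((x > 0) - (x < 0))


def spow(c: float, x: float) -> float:
    # (SPow <const> <expr>): sign(x) * |x| ** c keeps the sign of x
    if x == 0:
        return 0.0  # sign(0) is 0; also avoids 0.0 ** negative raising in Python
    return sign(x) * abs(x) ** c


def log_abs(x: float) -> float:
    # (LogAbs <expr>): natural log of |x| (undefined at 0)
    return math.log(abs(x))


print(power(2, 3.0))    # 9.0
print(spow(0.5, -4.0))  # -2.0, where a plain square root of -4.0 would not be real
```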
### Logics
Any `<expr>` greater than 0 is treated as `true`.
* If: `(If <expr> <expr> <expr>)` - if the first `<expr>` is greater than 0, return the second `<expr>`; otherwise return the third `<expr>`
* And: `(And <expr> <expr>)`
* Or: `(Or <expr> <expr>)`
* Less Than: `(< <expr> <expr>)`
* Less Than or Equal: `(<= <expr> <expr>)`
* Greater Than: `(> <expr> <expr>)`
* Greater Than or Equal: `(>= <expr> <expr>)`
* Equal: `(== <expr> <expr>)`
* Not: `(! <expr>)`
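The truthiness rule above means that 0 and negative values are both `false`. The scalar sketch below shows how `If` follows from that rule. The function names are hypothetical, and the README does not say what comparisons and `And`/`Or`/`!` return, so encoding `true`/`false` as `1.0`/`0.0` here is an assumption.

```python
def truthy(x: float) -> bool:
    # The rule above: only values strictly greater than 0 are true
    return x > 0


def if_(cond: float, a: float, b: float) -> float:
    # (If cond a b)
    return a if truthy(cond) else b


# ASSUMPTION: comparisons and boolean operators return 1.0 for true, 0.0 for false
def gt(a: float, b: float) -> float:
    return float(a > b)


def and_(a: float, b: float) -> float:
    return float(truthy(a) and truthy(b))


def or_(a: float, b: float) -> float:
    return float(truthy(a) or truthy(b))


def not_(a: float) -> float:
    return float(not truthy(a))


# (If (> :close :open) 1 -1) on the first row of the OHLC example (close 4.4, open 3.1)
print(if_(gt(4.4, 3.1), 1.0, -1.0))  # 1.0
print(not_(0.0))  # 1.0, because 0 is not greater than 0
```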
### Window Functions
All window functions take the window size as the first argument. The computation is done over the look-back window of `<const>` ticks.
* Sum of the window elements: `(TSSum <const> <expr>)`
* Mean of the window elements: `(TSMean <const> <expr>)`
* Min of the window elements: `(TSMin <const> <expr>)`
* Max of the window elements: `(TSMax <const> <expr>)`
* The index of the min of the window elements: `(TSArgMin <const> <expr>)`
* The index of the max of the window elements: `(TSArgMax <const> <expr>)`
* Stdev of the window elements: `(TSStd <const> <expr>)`
* Skew of the window elements: `(TSSkew <const> <expr>)`
* The rank (ascending) of the current element in the window: `(TSRank <const> <expr>)`
* The value `<const>` ticks back: `(Delay <const> <expr>)`
* The log return from the value `<const>` ticks back to the current value: `(TSLogReturn <const> <expr>)`
* Rolling correlation between two series: `(TSCorrelation <const> <expr> <expr>)`
* Rolling quantile of a series: `(TSQuantile <const> <const> <expr>)`, e.g. `(TSQuantile 100 0.5 <expr>)` computes the median of a window sized 100.
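A few of these window functions are easy to write out as a reference, which helps when checking a factor's output by hand. The sketch below processes one series tick by tick and writes `NaN` until the window is full. It is not the library's implementation: the names are hypothetical, and defining `TSLogReturn n x` as `ln(x_t / x_{t-n})` with an `n`-tick warm-up is an assumption.

```python
import math
from typing import List

NAN = float("nan")  # placeholder written during the warm-up period


def ts_sum(n: int, xs: List[float]) -> List[float]:
    # (TSSum n x): sum of the last n values, updated in O(1) per tick
    out: List[float] = []
    acc = 0.0
    for i, x in enumerate(xs):
        acc += x
        if i >= n:
            acc -= xs[i - n]  # the oldest value leaves the window
        out.append(acc if i >= n - 1 else NAN)  # first value at the n-th tick
    return out


def ts_mean(n: int, xs: List[float]) -> List[float]:
    # (TSMean n x): window sum divided by the window size (NaN stays NaN)
    return [s / n for s in ts_sum(n, xs)]


def delay(n: int, xs: List[float]) -> List[float]:
    # (Delay n x): the value n ticks back
    return [xs[i - n] if i >= n else NAN for i in range(len(xs))]


def ts_log_return(n: int, xs: List[float]) -> List[float]:
    # (TSLogReturn n x), ASSUMED to be ln(x_t / x_{t-n})
    return [math.log(x / d) if not math.isnan(d) else NAN
            for x, d in zip(xs, delay(n, xs))]


print(ts_sum(3, [1.0, 2.0, 3.0, 4.0, 5.0]))  # [nan, nan, 6.0, 9.0, 12.0]
```

The running sum in `ts_sum` keeps each tick O(1) regardless of the window size; over very long series a production implementation would also need to guard against accumulated floating-point error.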
#### Warm-up Period for Window Functions
Factors containing window functions require a warm-up period. For example,
`(TSSum 10 :close)` will not generate data until the 10th tick is replayed.
During the warm-up period, `replay` writes `NaN` into the result until the factor starts to produce data.
This ensures that the length of the factor output is the same as the length of the input dataset. You can use the `trim`
parameter to let `replay` trim off the warm-up period before it returns.
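One way to picture trimming is to cut every factor column at the row where the slowest factor has warmed up, so the columns stay aligned. The sketch below does that on plain lists. `trim_warmup` and `leading_nans` are hypothetical, and both "cut at the longest warm-up" and "ignore columns that are `NaN` everywhere" are assumptions about what `trim` does, not documented behavior.

```python
import math
from typing import Dict, List


def leading_nans(col: List[float]) -> int:
    # Length of the warm-up period of one factor column
    k = 0
    while k < len(col) and math.isnan(col[k]):
        k += 1
    return k


def trim_warmup(columns: Dict[str, List[float]]) -> Dict[str, List[float]]:
    # Start at the first row where every factor that ever produces data has warmed up;
    # columns that are NaN everywhere (e.g. failed factors) are ignored here
    start = max((leading_nans(c) for c in columns.values()
                 if leading_nans(c) < len(c)), default=0)
    # Slice every column at the same row so they stay aligned
    return {name: col[start:] for name, col in columns.items()}


nan = float("nan")
print(trim_warmup({"f1": [nan, 1.0, 2.0], "f2": [nan, nan, 3.0]}))
# {'f1': [2.0], 'f2': [3.0]}
```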
## Factors Failed to Compute
`Factor Expr` guarantees that no `inf`, `-inf` or `NaN` will appear in the result, except during the warm-up period. However, a factor can still fail due to numerical issues. For example, `(Pow 3 (Pow 3 (Pow 3 :volume)))` might overflow to `inf`, and a further operation such as `inf / inf` then produces `NaN`. `Factor Expr` detects these situations and marks such factors as failed. Failed factors are still returned in the replay result, but their columns contain only `NaN`. You can easily remove them from the result with `pd.DataFrame.dropna(axis=1, how="all")`.
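The failure rule can be modeled on plain lists: once a factor has warmed up, a single non-finite value turns its whole column into `NaN`, and all-`NaN` columns are then dropped. This is a sketch of the behavior described above, not the library's detection code; `mark_failed` and `drop_all_nan` are hypothetical names, and checking only the values after the leading warm-up `NaN`s is an assumption.

```python
import math
from typing import Dict, List


def mark_failed(col: List[float]) -> List[float]:
    # Skip the leading NaNs of the warm-up period
    k = 0
    while k < len(col) and math.isnan(col[k]):
        k += 1
    # Any inf, -inf or NaN after warm-up fails the whole factor: all NaN
    if any(not math.isfinite(v) for v in col[k:]):
        return [float("nan")] * len(col)
    return col


def drop_all_nan(columns: Dict[str, List[float]]) -> Dict[str, List[float]]:
    # Plain-Python counterpart of pd.DataFrame.dropna(axis=1, how="all")
    return {name: col for name, col in columns.items()
            if not all(math.isnan(v) for v in col)}


overflowed = 1e308 * 10  # float multiplication overflows to inf
result = {
    "ok": mark_failed([float("nan"), 1.0, 2.0]),
    "bad": mark_failed([float("nan"), 2.0, overflowed]),
}
print(list(drop_all_nan(result)))  # ['ok']
```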
## I Want to Have a Time Index for the Result
The `replay` function optionally accepts an `index_col` parameter.
If you want to set a column from the dataset as the index of the returned result, you can do the following:
```python
from datetime import datetime

import pandas as pd
from factor_expr import Factor, replay

pd.DataFrame({
    "time": [datetime(2021, 4, 23), datetime(2021, 4, 24)],
    "open": [3.1, 5.8],
    "high": [8.8, 7.7],
    "low": [1.1, 2.1],
    "close": [4.4, 3.4],
}).to_parquet("data.pq")

# Use the (non-float64) `time` column as the index of the result
result = await replay(
    ["data.pq"],
    [Factor("(TSLogReturn 30 :close)")],
    index_col="time",
)
```
Note that accessing the `time` column from factor expressions will cause an error,
because factor expressions can only read `float64` columns.
## API
`Factor Expr` has two components: the `Factor` class and the `replay` function.
### Factor
The f