# RipTable
![](docs/riptable_logo.PNG)
All-in-one, high-performance 64-bit Python analytics engine for numpy arrays with multithreaded support.

Supports Python 3.6, 3.7, and 3.8 on 64-bit Linux, Windows, and Mac OS.

Enhances or replaces numpy and pandas, and includes the high-speed, cross-platform SDS file format.
RipTable can often crunch numbers at 1.5x to 10x the speed of numpy or pandas.

Maximum speed is achieved through:

- **[vector intrinsics](https://software.intel.com/sites/landingpage/IntrinsicsGuide/)**: hand-rolled loops using 256-bit [AVX2](https://en.wikipedia.org/wiki/Advanced_Vector_Extensions#CPUs_with_AVX2), with [AVX-512](https://en.wikipedia.org/wiki/AVX-512) support coming
- **[parallel computing](https://www.drdobbs.com/go-parallel/article/print?articleId=212903586)**: for large arrays, multiple threads are deployed
- **[recycling](https://en.wikipedia.org/wiki/Garbage_collection_(computer_science))**: built-in array garbage collection
- **[hashing](https://en.wikipedia.org/wiki/Hash_function)** and **parallel sorts** for core algorithms

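The **recycling** idea can be pictured as a buffer pool: instead of handing a freed array's memory back to the allocator, keep it keyed by size and return it on the next request of the same size. The sketch below is a hypothetical pure-Python toy (the `BufferPool` class and its `acquire`/`release` methods are invented for illustration), not riptable's actual recycler.

```python
from collections import defaultdict


class BufferPool:
    """Toy recycler (hypothetical): reuse freed buffers of the same size instead of reallocating."""

    def __init__(self):
        self._free = defaultdict(list)  # size in bytes -> list of idle buffers
        self.allocations = 0            # number of fresh allocations performed

    def acquire(self, nbytes):
        """Return a zeroed buffer of nbytes, reusing an idle one when possible."""
        idle = self._free[nbytes]
        if idle:
            buf = idle.pop()
            buf[:] = bytes(nbytes)      # clear stale contents before reuse
            return buf
        self.allocations += 1
        return bytearray(nbytes)

    def release(self, buf):
        """Park a buffer for later reuse rather than letting it be freed."""
        self._free[len(buf)].append(buf)


pool = BufferPool()
a = pool.acquire(1024)
pool.release(a)
b = pool.acquire(1024)           # same size: the released buffer is handed back
print(b is a, pool.allocations)  # True 1
```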
Install
-------
```
pip install riptable
```
Documentation: [readthedocs](https://riptable.readthedocs.io/en/latest/py-modindex.html)

Basic Concepts and Classes
--------------------------
**[FastArray](https://riptable.readthedocs.io/en/latest/riptable.html#riptable.rt_fastarray.FastArray)**: a subclass of the numpy array with built-in multithreaded number crunching. Because it is a subclass, any scikit routine that expects a numpy array also accepts a FastArray, and `isinstance(fastarray, np.ndarray)` returns `True`.

**[Dataset](https://riptable.readthedocs.io/en/latest/riptable.html#module-riptable.rt_dataset)**: replaces the pandas DataFrame class and holds numpy arrays of equal row length (including arrays with more than one dimension).

**Struct**: replaces the pandas Series class. A **Struct** is a grab-bag collection class, and **Dataset** is a subclass of it.

**[Categorical](https://riptable.readthedocs.io/en/latest/riptable.html#module-riptable.rt_categorical)**: replaces both pandas groupby and the pandas Categorical class. RipTable **Categoricals** are multikey, filterable, stackable, and archivable, and they can chain computations such as `apply_reduce` loops. They can do everything groupby can do, and more.

**Date/Time Classes**: `DateTimeNano`, `Date`, `TimeSpan`, and `DateSpan` are designed more like Java, C++, or C# date/time classes. They replace most numpy and pandas datetime classes.

**Accum2/AccumTable**: for cross-tabulation.

**[SDS](https://riptable.readthedocs.io/en/latest/riptable.html#riptable.rt_sds.save_sds)**: a new file format that can stack multiple datasets across multiple files, with [zstd](https://github.com/facebook/zstd) compression, multithreading, and no extra memory copies. SDS also supports loading datasets from, and writing them to, shared memory.

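One way to picture a multikey, filterable Categorical is as an integer bin code per row plus a list of unique keys, built in a single hashing pass; a groupby reduction is then one pass over the bin codes, and in this model further reductions can reuse the same bins without rehashing. The sketch below is a conceptual pure-Python model, not riptable's implementation: the helpers `factorize` and `group_sum` are hypothetical, and reserving bin 0 for filtered-out rows is an assumption loosely modeled on riptable's 1-based Categorical bins.

```python
def factorize(*key_columns, mask=None):
    """Hash each row's key tuple to a 1-based bin; bin 0 marks filtered-out rows."""
    bins, uniques, lookup = [], [], {}
    for row, key in enumerate(zip(*key_columns)):
        if mask is not None and not mask[row]:
            bins.append(0)                      # filtered rows land in bin 0
            continue
        code = lookup.get(key)
        if code is None:                        # first time this key is seen
            uniques.append(key)
            code = lookup[key] = len(uniques)   # 1-based bin number
        bins.append(code)
    return bins, uniques


def group_sum(bins, uniques, values):
    """Reduce values per bin in one pass; the bin-0 (filtered) total is dropped."""
    totals = [0] * (len(uniques) + 1)
    for b, v in zip(bins, values):
        totals[b] += v
    return dict(zip(uniques, totals[1:]))


symbol = ['AAPL', 'MSFT', 'AAPL', 'IBM', 'MSFT']
side = ['buy', 'buy', 'buy', 'sell', 'sell']
qty = [10, 5, 7, 3, 2]
bins, uniques = factorize(symbol, side)   # two key columns -> multikey bins
print(bins)                               # [1, 2, 1, 3, 4]
print(group_sum(bins, uniques, qty))
```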
Getting Started
----------------
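```python
import riptable as rt

# Build a Dataset from two equal-length columns (one million rows each)
ds = rt.Dataset({'intarray': rt.arange(1_000_000), 'floatarray': rt.arange(1_000_000.0)})
ds                  # display the Dataset (in a REPL or notebook)
ds.intarray.sum()   # sum of the intarray column
```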
Numpy Users
------------
A FastArray is a numpy array, and the two can be flipped back and forth without copying any data (only the view changes).
```python
import riptable as rt
import numpy as np

a = rt.arange(100)              # a FastArray
numpyarray = a._np              # plain numpy view of the same data
fastarray = rt.FA(numpyarray)   # back to a FastArray
```
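Or change the view directly. Note that a FastArray is itself a numpy array:
```python
numpyarray.view(rt.FastArray)       # numpy array -> FastArray, no copy
fastarray.view(np.ndarray)          # FastArray -> numpy array, no copy
isinstance(fastarray, np.ndarray)   # True: FastArray subclasses np.ndarray
```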
Pandas Users
------------
Simply pass a pandas DataFrame to a riptable Dataset and it is converted automatically.
```python
import riptable as rt
import numpy as np
import pandas as pd

df = pd.DataFrame({'intarray': np.arange(1_000_000), 'floatarray': np.arange(1_000_000.0)})
ds = rt.Dataset(df)   # the DataFrame is converted automatically
```
How can I contribute?
---------------------
RipTable has been open-sourced because it needs more users and
contributions to take it to the next level. The RipTable team is confident
that the engine is the next-generation building block for Python data
analytics. We welcome help with everything from bug reports and
documentation to improved and new functionality. Please consider opening
a GitHub pull request or emailing the team.

See the [contributing guide](docs/CONTRIBUTING.md) for more information.

How can I trust RipTable calculations?
--------------------------------------
RipTable has been in development for 3 years and has been tested by dozens of quants at a large financial firm. It has a full [test suite](riptable/tests). However, just like any project, we still discover bugs and room for improvement. Please report them using GitHub issues.

How can RipTable perform the same calculations faster?
------------------------------------------------------
RipTable was written from day one to handle large data and multithreading, using the riptide_cpp layer for basic arithmetic functions and algorithms. Many core algorithms have been painstakingly rewritten for multithreading.

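As a rough illustration of how a sort can be restructured for multiple threads, the sketch below sorts chunks concurrently and then performs a single k-way merge of the sorted runs. It is a hypothetical pure-Python outline (`parallel_sort` and its `workers` parameter are invented here), not riptide_cpp's algorithm. In CPython, `sorted` on Python objects holds the GIL, so this shows the structure rather than a real speedup; that requires native code that releases the GIL.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def parallel_sort(values, workers=4):
    """Sort chunks on a thread pool, then k-way merge the sorted runs."""
    if not values:
        return []
    chunk = -(-len(values) // workers)          # ceiling division
    runs = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_runs = list(pool.map(sorted, runs))
    return list(heapq.merge(*sorted_runs))      # one merge pass over all runs


data = [9, 3, 7, 1, 8, 2, 6, 4, 5, 0]
print(parallel_sort(data, workers=3))   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```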
Why doesn't numpy or pandas just pick up the same code?
-------------------------------------------------------
numpy does not have a multithreaded layer (we are in discussions with the numpy team to add one), nor is it designed to use C++ templates or hashing algorithms. pandas does not have a C++ layer (it uses Cython instead) and is a victim of its own success, which makes early design mistakes (such as the block manager and the lack of powerful Categoricals) difficult to change.

Small, Medium, and Large array performance
------------------------------------------
RipTable is designed for arrays of *all* sizes. For small arrays (length < 100), low processing overhead is important: RipTable's **FastArray** is written in hand-coded C and processes simple arithmetic functions faster than numpy arrays do. For medium arrays (length < 100,000), RipTable uses vector intrinsic loops. For large arrays (length >= 100,000), RipTable dynamically scales out threading, waking threads efficiently with a [futex](https://man7.org/linux/man-pages/man7/futex.7.html).
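The size tiers above can be sketched as a dispatcher that reduces small and medium inputs on the calling thread and splits large inputs into chunks reduced on a thread pool. The 100,000-element cutoff comes from the text; the function name `tiered_sum`, the chunking scheme, and the use of Python's `ThreadPoolExecutor` are assumptions for illustration only. RipTable does this in C++ and wakes its worker threads with a futex, and its vector intrinsic loops have no pure-Python equivalent, so this sketch does not separate the small and medium tiers.

```python
from concurrent.futures import ThreadPoolExecutor

PARALLEL_THRESHOLD = 100_000   # from the text: length >= 100,000 counts as large


def tiered_sum(values, workers=4):
    """Hypothetical dispatcher: one thread below the threshold, a thread pool above it."""
    n = len(values)
    if n < PARALLEL_THRESHOLD:
        return sum(values)                       # low overhead: no threads involved
    chunk = -(-n // workers)                     # ceiling division
    pieces = [values[i:i + chunk] for i in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum, pieces))        # combine the per-chunk partial sums


print(tiered_sum(list(range(10))))        # 45
print(tiered_sum(list(range(200_000))))   # 19999900000
```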