# pandas-multiprocess [![Build Status](https://travis-ci.org/xieqihui/pandas-multiprocess.svg?branch=master)](https://travis-ci.org/xieqihui/pandas-multiprocess)
A Python package for processing a pandas DataFrame row by row using multiple processes.
## Install
```
pip install pandas-multiprocess
```
## Example
### Import the package
```python
from pandas_multiprocess import multi_process
```
### Define a function that processes each row of a pandas DataFrame
The function must take a `pandas.Series` as its first positional argument and return
either a `pandas.Series` or a list of `pandas.Series`.
Besides the positional argument `data_row`, the function may define additional
arguments, whose values are passed through `multi_process()`. Here `**args`
stands for those additional arguments.
```python
def func(data_row, **args):
    # data_row (pd.Series): a row of a pandas DataFrame
    # args: a dict of additional arguments
    data_row['sum'] = data_row['col_1'] + data_row['col_2']
    return data_row
```
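As a further illustration, here is a minimal sketch of a row function that takes one additional keyword argument. The function name `add_scaled_sum` and the parameter `scale` are hypothetical and not part of the package; only the "Series in, Series out" contract comes from the description above.

```python
import pandas as pd


def add_scaled_sum(data_row, scale=1.0):
    # data_row (pd.Series): one row of the DataFrame
    # scale (float): hypothetical extra argument, for illustration only
    out = data_row.copy()  # work on a copy so the caller's row is untouched
    out['sum'] = (out['col_1'] + out['col_2']) * scale
    return out


# Calling the function on a single hand-built row is a quick sanity check
# before handing it to multi_process().
row = pd.Series({'col_1': 1.0, 'col_2': 2.0})
checked = add_scaled_sum(row, scale=10.0)
```

Testing the function on one row first tends to surface column-name mistakes early, before any worker processes are started.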
### Initiate a DataFrame
```python
import pandas as pd
import numpy as np
df_len = 1000
df = pd.DataFrame({'col_1': np.random.normal(size=df_len),
                   'col_2': np.random.normal(size=df_len)})
```
### Process it using multiprocess
```python
# The `args` will be passed to the additional arguments of `func()`
args = {}
result = multi_process(func=func,
                       data=df,
                       num_process=8,
                       **args)
```
### The above is equivalent to the following, but faster for expensive row operations
```python
result = df.apply(func, axis=1, **args)
```
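To make the single-process baseline concrete, the sketch below runs `DataFrame.apply` on a tiny DataFrame with one extra keyword argument. The `offset` argument and the sample values are assumptions chosen for illustration; they do not come from the package or its example script.

```python
import pandas as pd


def func(data_row, **args):
    # 'offset' is a hypothetical extra argument; it defaults to 0.0 if absent
    data_row = data_row.copy()
    data_row['sum'] = data_row['col_1'] + data_row['col_2'] + args.get('offset', 0.0)
    return data_row


df = pd.DataFrame({'col_1': [1.0, 2.0, 3.0],
                   'col_2': [10.0, 20.0, 30.0]})

# Extra keyword arguments given to apply() are forwarded to func for every row.
args = {'offset': 0.5}
baseline = df.apply(func, axis=1, **args)
```

Because `func` returns a `pandas.Series` for each row, `baseline` is a DataFrame with the original columns plus `sum`.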
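The output of [example](examples/example.py) demonstrates the efficiency of
`pandas-multiprocess` when each row of a DataFrame requires a computationally
expensive operation. In the run below, 8 processes finish in about 2.2 seconds
and 16 processes in about 1.4 seconds, compared with about 11.2 seconds for
`apply()`.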
```
Running examples...
100%|████| 100/100 [00:01<00:00, 68.65it/s]8 processes run time 2.189883 seconds.
100%|████| 100/100 [00:00<00:00, 140.90it/s]16 processes run time 1.440812 seconds.
Pandas apply() run time 11.165841 seconds.
```