# File Cache
- Caches the DataFrame result of a function, including functions that return multiple DataFrames, which can greatly reduce the time spent on feature engineering.
- It also supports logging a function's run time and parameters.
## Installation
pip install file_cache
## Sample case
```python
import time

import numpy as np
import pandas as pd
from file_cache.cache import file_cache


@file_cache()
def test_cache_normal(name):
    time.sleep(3)  # simulate an expensive computation
    return pd.DataFrame(data=np.arange(0, 10).reshape(2, 5))


normal_df = test_cache_normal('Felix')
normal_df.head()
```
<div>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>0</th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>0</td>
<td>1</td>
<td>2</td>
<td>3</td>
<td>4</td>
</tr>
<tr>
<th>1</th>
<td>5</td>
<td>6</td>
<td>7</td>
<td>8</td>
<td>9</td>
</tr>
</tbody>
</table>
</div>
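
To illustrate the general idea behind a decorator like this, here is a minimal sketch of a file-backed cache that uses only the standard library. It is not the `file_cache` implementation: the name `simple_file_cache`, the `cache_dir` parameter, and the use of `pickle` for storage are all assumptions made for illustration.

```python
import functools
import hashlib
import os
import pickle
import tempfile


def simple_file_cache(cache_dir=None):
    """Hypothetical decorator: store a function's return value on disk."""
    cache_dir = cache_dir or os.path.join(tempfile.gettempdir(), "simple_file_cache")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            os.makedirs(cache_dir, exist_ok=True)
            # Derive a file name from the function identity and its arguments.
            key = repr((func.__module__, func.__qualname__, args, sorted(kwargs.items())))
            digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
            path = os.path.join(cache_dir, f"{func.__name__}_{digest}.pkl")

            if os.path.exists(path):
                # Cache hit: load the stored result instead of recomputing it.
                with open(path, "rb") as fh:
                    return pickle.load(fh)

            # Cache miss: run the function and persist whatever it returns.
            result = func(*args, **kwargs)
            with open(path, "wb") as fh:
                pickle.dump(result, fh)
            return result

        return wrapper

    return decorator
```

Because the whole return value is pickled, this sketch handles a single DataFrame, a Series, or a tuple of them in the same way. It relies on `repr` of the arguments being stable, which holds for simple values such as strings and numbers.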
## Return multiple DataFrames as a tuple
A function that returns several DataFrames as a tuple can be cached as well.
```python
import time
from functools import lru_cache


@lru_cache()  # in-memory cache layered on top of the file cache
@file_cache()
def test_cache_tuple(name):
    time.sleep(3)
    df0 = pd.DataFrame(data=np.arange(5, 15).reshape(2, 5))
    df1 = pd.DataFrame(data=np.arange(20, 30).reshape(2, 5))
    return df0, df1


df0, df1 = test_cache_tuple('Felix2')
print(df0, '\n')
print(df1)
```
        0   1   2   3   4
    0   5   6   7   8   9
    1  10  11  12  13  14

        0   1   2   3   4
    0  20  21  22  23  24
    1  25  26  27  28  29
## Inputs that cannot be cached
If an argument is a DataFrame or otherwise cannot be hashed, the cache is skipped and the function runs directly.
```python
@file_cache()
def test_cache_ignore(name):
    df0 = pd.DataFrame(data=np.arange(5, 15).reshape(2, 5))
    return df0


# A DataFrame argument is not hashable, so this call bypasses the cache
df = pd.DataFrame(data=np.arange(5, 15).reshape(2, 5))
ignore = test_cache_ignore(df)
```
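
The bypass rule described above can be sketched as follows. This is an illustrative in-memory version, not the library's code: `is_hashable` and `cache_if_hashable` are hypothetical names, and a plain `dict` stands in for the file store. A pandas DataFrame raises `TypeError` when passed to `hash()`, just as a list or dict does, so it falls into the bypass branch.

```python
import functools


def is_hashable(value):
    """Return True if `value` can be hashed, False otherwise."""
    try:
        hash(value)
    except TypeError:
        return False
    return True


def cache_if_hashable(func):
    """Hypothetical decorator: memoise calls, but bypass unhashable arguments."""
    store = {}

    @functools.wraps(func)
    def wrapper(*args):
        if not all(is_hashable(arg) for arg in args):
            # Unhashable input (list, dict, DataFrame, ...): run the function directly.
            return func(*args)
        if args not in store:
            store[args] = func(*args)
        return store[args]

    return wrapper
```

With this approach a call that cannot be keyed never fails; it only loses the benefit of caching.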
## Log function run time and parameters
```python
from file_cache.utils.util_log import *
@timed()
def log_time(arg):
    return f'{arg} msg'


print(log_time("hello"))
```
    2018-12-26 11:08:52,662 util_log.py[61] DEBUG Start the program at:LALI2-M-G0MD, 127.0.0.1, with:Load module
    2018-12-26 11:08:52,665 util_log.py[41] INFO log_time begin with(1 paras) :['hello'], []
    2018-12-26 11:08:52,667 util_log.py[49] INFO log_time cost: 0.00 sec:(1 paras)(['hello'], []), return:hello msg, end
    hello msg
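
For readers curious how such a timing decorator can be built, here is a minimal sketch using the standard `logging` and `time` modules. It is not the `timed` implementation from `file_cache`: the name `simple_timed`, the logger name, and the message format are assumptions chosen to resemble the output above.

```python
import functools
import logging
import time

logger = logging.getLogger("timing_sketch")


def simple_timed(func):
    """Hypothetical decorator: log the arguments and elapsed time of each call."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.info("%s begin with: %r, %r", func.__name__, args, kwargs)
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        logger.info("%s cost: %.2f sec, return: %r", func.__name__, elapsed, result)
        return result

    return wrapper
```

Call `logging.basicConfig(level=logging.INFO)` once at program start to see the messages on the console.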
## Series are supported as well as DataFrames
```python
from file_cache.cache import file_cache


@file_cache()
def get_train_data():
    from sklearn import datasets
    import pandas as pd

    # Note: load_boston was removed in scikit-learn 1.2; this example needs an older release
    data = datasets.load_boston()
    df = pd.DataFrame(data.data, columns=data.feature_names)
    df['target'] = data.target
    return df, df['target']


# First call computes the result; the second call is expected to come from the cache
df, series = get_train_data()
print(type(df), type(series))

df, series = get_train_data()
print(type(df), type(series))
```
    <class 'pandas.core.frame.DataFrame'> <class 'pandas.core.series.Series'>
    <class 'pandas.core.frame.DataFrame'> <class 'pandas.core.series.Series'>