# pandas-appender
[![Build Status](https://travis-ci.org/wumpus/pandas-appender.svg?branch=master)](https://travis-ci.org/wumpus/pandas-appender) [![Coverage Status](https://coveralls.io/repos/github/wumpus/pandas-appender/badge.svg?branch=master)](https://coveralls.io/github/wumpus/pandas-appender?branch=master) [![Apache License 2.0](https://img.shields.io/github/license/wumpus/pandas-appender.svg)](LICENSE)
Have you ever wanted to append a bunch of rows to a Pandas DataFrame? Turns out that
it's extremely inefficient to do so for a large dataframe, you're supposed to make
multiple dataframes and pd.concat them instead.
So... helper function? Pandas doesn't seem to have one. Roll your own?
OK then. Here's that helper function. It can append around 1 million
very small rows per cpu-second, and has modest additional memory
usage.
## Install
`pip install pandas-appender`
## Usage
```
from pandas_appender import DF_Appender
dfa = DF_appender(ignore_index=True) # note that ignore_index moves to the init
for i in range(1_000_000):
dfa = dfa.append({'i': i})
df = dfa.finalize()
```
## Type hints and category detection
Using narrower types and categories can often dramatically reduce the size of a
DataFrame. There are two ways to do this in pandas-appender. One is to
append to an existing dataframe:
```
dfa = DF_appender(df, ignore_index=True)
```
and the second is to pass in a `dtypes=` argument:
```
dfa = DF_appender(ignore_index=True, dtypes=another_dataframe.dtypes)
```
pandas-appender also offers a way to infer which columns would be smaller
if they were categories. This code will either analyze an existing dataframe
that you're appending to:
```
dfa = DF_appender(df, ignore_index=True, infer_categories=True)
```
or it will analyze the first chunk of appended lines:
```
dfa = DF_appender(ignore_index=True, infer_categories=True)
```
These inferred categories will override existing types or a `dtypes=` argument.
## Incompatibilities with pandas.DataFrame.append()
### pandas.DataFame.append is idempotent, DF_Appender is not
* Pandas: `df_new = df.append() # df is not changed`
* DF_Appender: `dfa_new = dfa.append # modifies dfa, and dfa_new == dfa`
### pandas.DataFrame.append will promote types, while DF_Appender is strict
* Pandas: append `0.1` to previously integer column, it will be promoted to float
* DF_Appender: when initialized with `dtypes=` or an existing DataFrame, appending
`0.1` to an integer column causes `0.1` to be cast to an integer, i.e. `0`.
## TODO
add a `predicted_rows=` kwarg, and scale memory usage (chunksize, middles) to keep
performance up for larger row counts, by using more memory.
程序员Chino的日记
- 粉丝: 3718
- 资源: 5万+
最新资源
- 2024年江西省职业院校技能大赛:GZ015-机器人系统集成应用技术(学生赛)赛项(高职组)样题
- 适用Centos7/8/9的libpcap、libpcap-devel离线安装包rpm
- MPC模型预测控制,风电调频,风储调频 在风储调频基础上加了MPC控制,复现的EI文献 MPC控制预测频率变化,进而改变风电出力 实时改变风电出力调频 创新就是, 仿真对比了实际仿真和在MP
- 圆管自动上料机sw16可编辑全套技术资料100%好用.zip
- NSFileNotFoundError如何解决.md
- TabError.md
- GlobalMixinError解决办法.md
- SystemError.md
- MalformedURLException(解决方案).md
- SystemExit.md
- FilterError解决办法.md
- UnknownHostException(解决方案).md
- NSFileAlreadyExistsError如何解决.md
- ConnectException(解决方案).md
- NSOutOfMemoryError如何解决.md
- DependencyInjectionError解决办法.md
资源上传下载、课程学习等过程中有任何疑问或建议,欢迎提出宝贵意见哦~我们会及时处理!
点击此处反馈