Pandas Type Checks
==================
[![Build Status](https://dev.azure.com/martin-zuber/pandas-type-checks/_apis/build/status/mzuber.pandas-type-checks?branchName=main)](https://dev.azure.com/martin-zuber/pandas-type-checks/_build/latest?definitionId=1&branchName=main)
[![Quality Gate Status](https://sonarcloud.io/api/project_badges/measure?project=mzuber_pandas-type-checks&metric=alert_status)](https://sonarcloud.io/summary/new_code?id=mzuber_pandas-type-checks)
[![Coverage](https://sonarcloud.io/api/project_badges/measure?project=mzuber_pandas-type-checks&metric=coverage)](https://sonarcloud.io/summary/new_code?id=mzuber_pandas-type-checks)
[![PyPI Version](https://img.shields.io/pypi/v/pandas-type-checks)](https://pypi.org/project/pandas-type-checks/)
[![PyPI Wheel](https://img.shields.io/pypi/wheel/pandas-type-checks)](https://pypi.org/project/pandas-type-checks/)

A Python library providing means for structural type checking of Pandas data frames and series:
- A decorator `pandas_type_check` for specifying and checking the structure of Pandas `DataFrame` and `Series`
arguments and return values of a function.
- Support for "non-strict" type checking. In this mode, data frames can contain columns which are not part of the type
specification against which they are checked. Non-strict type checking in that sense allows a form of structural
subtyping for data frames.
- Configuration options to raise exceptions for type errors or alternatively log them.
- Configuration option to globally enable/disable the type checks. This allows users to enable the type checking
functionality only in certain environments, e.g. testing environments.

This library focuses on providing utilities to check the structure (i.e. columns and their types) of Pandas data frames
and series arguments and return values of functions. For checking individual data frame and series values, including
formulating more sophisticated constraints on column values, [Pandera](https://github.com/unionai-oss/pandera) is a
great alternative.

Installation
------------
Packages for all released versions are available at the
[Python Package Index (PyPI)](https://pypi.org/project/pandas-type-checks) and can be installed with `pip`:
```
pip install pandas-type-checks
```
The library can also be installed with support for additional functionality:
```
pip install pandas-type-checks[pandera] # Support for Pandera data frame and series schemas
```
Usage Example
-------------
The function `filter_rows_and_remove_column` is annotated with type check hints for the Pandas `DataFrame` and `Series`
arguments and return value of the function:
```python
import pandas as pd
import numpy as np
import pandas_type_checks as pd_types


@pd_types.pandas_type_check(
    # Expected column types of the data frame argument 'data'
    pd_types.DataFrameArgument('data', {
        'A': np.dtype('float64'),
        'B': np.dtype('int64'),
        'C': np.dtype('bool')
    }),
    # Expected type of the series argument 'filter_values'
    pd_types.SeriesArgument('filter_values', 'int64'),
    # Expected column types of the returned data frame
    pd_types.DataFrameReturnValue({
        'B': np.dtype('int64'),
        'C': np.dtype('bool')
    })
)
def filter_rows_and_remove_column(data: pd.DataFrame, filter_values: pd.Series) -> pd.DataFrame:
    # Keep the rows whose 'B' value occurs in filter_values, then drop column 'A'
    return data[data['B'].isin(filter_values.values)].drop('A', axis=1)
```
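To give an idea of how a decorator like `pandas_type_check` can inspect the arguments of a call, here is a heavily simplified sketch. The decorator `check_series_dtypes` and the function `count_matches` are hypothetical, only check the dtypes of `Series` arguments, and are not how `pandas_type_checks` is implemented. The sketch binds the call's arguments to parameter names with `inspect.signature` and raises a single `TypeError` listing every mismatch:

```python
import functools
import inspect

import pandas as pd


def check_series_dtypes(**expected_dtypes):
    """Hypothetical decorator: check the dtypes of named Series arguments."""
    def decorator(func):
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Map positional and keyword arguments to their parameter names
            bound = signature.bind(*args, **kwargs)
            errors = []
            for name, expected in expected_dtypes.items():
                value = bound.arguments.get(name)
                if isinstance(value, pd.Series) and value.dtype != expected:
                    errors.append(
                        f"Type error in argument '{name}': expected Series of type "
                        f"'{expected}' but found type '{value.dtype}'"
                    )
            if errors:
                # Report all mismatches at once instead of stopping at the first one
                raise TypeError("\n".join(errors))
            return func(*args, **kwargs)

        return wrapper

    return decorator


@check_series_dtypes(filter_values='int64')
def count_matches(values: pd.Series, filter_values: pd.Series) -> int:
    # Count how many entries of `values` occur in `filter_values`
    return int(values.isin(filter_values.values).sum())
```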
Calling `filter_rows_and_remove_column` with a `filter_values` series of the wrong type results in a `TypeError`
exception with a detailed type error message:
```python
test_data = pd.DataFrame({
    'A': pd.Series(1, index=list(range(4)), dtype='float64'),
    'B': np.array([1, 2, 3, 4], dtype='int64'),
    'C': np.array([True] * 4, dtype='bool')
})
test_filter_values_with_wrong_type = pd.Series([3, 4], dtype='int32')
filter_rows_and_remove_column(test_data, test_filter_values_with_wrong_type)
```
```
TypeError: Pandas type error in function 'filter_rows_and_remove_column'
Type error in argument 'filter_values':
Expected Series of type 'int64' but found type 'int32'
```
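When the caller controls the input, one way to satisfy the `'int64'` specification is to cast the series explicitly with `Series.astype` before calling the type-checked function. A minimal sketch using plain pandas (the decorated function is not needed to show the dtype change). Casting `int32` to `int64` is lossless, whereas casts in the other direction or from floats to integers can silently change values, so such a cast should be a deliberate choice:

```python
import pandas as pd

filter_values = pd.Series([3, 4], dtype='int32')
print(filter_values.dtype)  # int32

# Cast explicitly so the series matches the 'int64' specification
filter_values_int64 = filter_values.astype('int64')
print(filter_values_int64.dtype)  # int64
```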
Applying the function `filter_rows_and_remove_column` to a data frame with a wrong column type and a missing column
will result in a `TypeError` exception with a detailed type error message:
```python
test_data_with_wrong_type_and_missing_column = pd.DataFrame({
    'A': pd.Series(1, index=list(range(4)), dtype='float64'),
    'B': np.array([1, 2, 3, 4], dtype='int32')
})
test_filter_values = pd.Series([3, 4], dtype='int64')
filter_rows_and_remove_column(test_data_with_wrong_type_and_missing_column, test_filter_values)
```
```
TypeError: Pandas type error in function 'filter_rows_and_remove_column'
Type error in argument 'data':
Expected type 'int64' for column 'B' but found type 'int32'
Missing column in DataFrame: 'C'
Type error in return value:
Expected type 'int64' for column 'B' but found type 'int32'
Missing column in DataFrame: 'C'
```
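The structural checks behind these messages can be approximated in a few lines of plain pandas. The helper `data_frame_type_errors` below is a hypothetical sketch that mimics the message style above; the `strict` branch and its "Unexpected column" message are invented for illustration and are not taken from the library, whose implementation may differ:

```python
import numpy as np
import pandas as pd


def data_frame_type_errors(df: pd.DataFrame, spec: dict, strict: bool = False) -> list:
    """Hypothetical helper: collect structural type errors of a data frame.

    `spec` maps column names to expected dtypes.
    """
    errors = []
    for column, expected in spec.items():
        if column not in df.columns:
            errors.append(f"Missing column in DataFrame: '{column}'")
        elif df[column].dtype != expected:
            errors.append(
                f"Expected type '{expected}' for column '{column}' "
                f"but found type '{df[column].dtype}'"
            )
    if strict:
        # Assumed strict-mode behavior: columns outside the spec are errors too
        errors.extend(
            f"Unexpected column in DataFrame: '{column}'"
            for column in df.columns
            if column not in spec
        )
    return errors


spec = {'A': np.dtype('float64'), 'B': np.dtype('int64'), 'C': np.dtype('bool')}
data = pd.DataFrame({
    'A': np.array([1.0, 1.0, 1.0, 1.0], dtype='float64'),
    'B': np.array([1, 2, 3, 4], dtype='int32'),
})
for message in data_frame_type_errors(data, spec):
    print(message)
# Expected type 'int64' for column 'B' but found type 'int32'
# Missing column in DataFrame: 'C'
```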
Configuration
-------------
The global configuration object `pandas_type_checks.config` can be used to configure the behavior of the library:
- `config.enable_type_checks` (`bool`): Flag for enabling/disabling type checks for specified arguments and return
values. This flag can be used to globally enable or disable the type checker in certain environments.
Default: `True`
- `config.strict_type_checks` (`bool`): Flag for strict type check mode. If strict type checking is enabled, data frames
cannot contain columns which are not part of the type specification against which they are checked. Non-strict type
checking in that sense allows a form of structural subtyping for data frames.
Default: `False`
- `config.log_type_errors` (`bool`): Flag indicating that type errors for Pandas data frames or series should be
logged instead of raising a `TypeError` exception. Type errors will be logged with log level `ERROR`.
Default: `False`
- `config.logger` (`logging.Logger`): Logger to be used for logging type errors when the `log_type_errors` flag is enabled.
When no logger is specified via the configuration, a built-in default logger is used.
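
With the library itself, these flags are set as attributes of the global `pandas_type_checks.config` object, for example `pd_types.config.log_type_errors = True`. The raise-versus-log behavior they describe can be sketched with the standard `logging` module. `TypeCheckConfig` and `report_type_errors` are hypothetical stand-ins, not part of the library, and the sketch simplifies `enable_type_checks` to skipping the report rather than skipping the checks:

```python
import logging
from dataclasses import dataclass, field


@dataclass
class TypeCheckConfig:
    """Hypothetical stand-in for the configuration flags described above."""
    enable_type_checks: bool = True
    log_type_errors: bool = False
    logger: logging.Logger = field(
        default_factory=lambda: logging.getLogger("type_check_sketch")
    )


def report_type_errors(errors: list, config: TypeCheckConfig) -> None:
    """Raise a TypeError or log at level ERROR, depending on the configuration."""
    # Simplification: a real checker would not run the checks at all
    # when type checks are disabled, instead of only skipping the report.
    if not config.enable_type_checks or not errors:
        return
    message = "\n".join(errors)
    if config.log_type_errors:
        config.logger.error(message)
    else:
        raise TypeError(message)


# Default configuration: type errors raise an exception
try:
    report_type_errors(["Missing column in DataFrame: 'C'"], TypeCheckConfig())
except TypeError as error:
    print(f"raised: {error}")
```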

Pandera Support
---------------
This library can be installed with additional support for [Pandera](https://github.com/unionai-oss/pandera):
```
pip install pandas-type-checks[pandera]
```
In this case, Pandera [DataFrameSchema](https://pandera.readthedocs.io/en/stable/reference/generated/pandera.schemas.DataFrameSchema.html)
and [SeriesSchema](https://pandera.readthedocs.io/en/stable/reference/generated/pandera.schemas.SeriesSchema.html)
can be used as type specifications for data frame and series arguments and return values.
```python
import pandas as pd
import pandera as pa
import numpy as np
import pandas_type_checks as pd_types


@pd_types.pandas_type_check(
    pd_types.DataFrameArgument('data', pa.DataFrameSchema({
        'A': pa.Column(np.dtype('float64'), checks=pa.Check.le(10.0)),
        'B': pa.Column(np.dtype('int64'), checks=pa.Check.lt(2)),
        'C': pa.Column(np.dtype('bool'))
    })),
    pd_types.SeriesArgument('filter_values', 'int64'),
    pd_types.DataFrameReturnValue({
        'B': np.dtype('int64'),
        'C': np.dtype('bool')
    })
)
def filter_rows_and_remove_column(data: pd.DataFrame, filter_values: pd.Series) -> pd.DataFrame:
    return data[data['B'].isin(filter_values.values)].drop('A', axis=1)
```