# pandas-cat
<img alt="PyPI - License" src="https://img.shields.io/pypi/l/pandas-cat">
<img alt="PyPI - Python Version" src="https://img.shields.io/pypi/pyversions/pandas-cat">
<img alt="PyPI - Wheel" src="https://img.shields.io/pypi/wheel/pandas-cat">
<img alt="PyPI - Status" src="https://img.shields.io/pypi/status/pandas-cat">
<img alt="PyPI - Downloads" src="https://img.shields.io/pypi/dm/pandas-cat">
## pandas-cat is a categorical profiling library for pandas.
pandas-cat is an abbreviation of PANDAS-CATegorical profiling. The package profiles categorical attributes and can optionally adjust the dataset, e.g. by estimating whether a variable is numeric and ordering its categories by the numbers they contain.
## pandas-cat in more detail
The package creates an HTML profile of a categorical dataset. It supports both ordinal (ordered) and nominal categories. It also handles a common problem with categorical data, especially ordered data: categories are often de facto numbers, or numbers with some extra text, and should be treated as ordered.
For example, in the *Accidents* dataset, the categories of the attribute *Hit Objects in* can appear as:
- *unordered*: 0.0 10.0 7.0 11.0 4.0 2.0 8.0 1.0 9.0 6.0 5.0 12.0 nan
- *ordered*: 0.0 1.0 10.0 11.0 12.0 2.0 4.0 5.0 6.0 7.0 8.0 9.0 nan
- *as the analyst wishes (what the package does)*: 0.0 1.0 2.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 12.0 nan
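The difference between the second and third ordering can be reproduced with plain Python. The following is a minimal sketch, not the package's internal code: `numeric_key` is a hypothetical helper introduced here for illustration, and the labels are the ones listed above, treated as strings.

```python
import math

# Category labels of "Hit Objects in" as they appear in the data (as strings).
labels = ["0.0", "10.0", "7.0", "11.0", "4.0", "2.0", "8.0",
          "1.0", "9.0", "6.0", "5.0", "12.0", "nan"]


def numeric_key(label):
    """Hypothetical sort key: order by numeric value, missing values last."""
    value = float(label)
    if math.isnan(value):
        return (1, 0.0)
    return (0, value)


# Plain string sorting puts "10.0" before "2.0".
lexicographic = sorted(labels)

# Sorting by the parsed number gives the order an analyst expects.
numeric = sorted(labels, key=numeric_key)

print(lexicographic)
print(numeric)
```

An ordered list like `numeric` could then be passed to `pandas.CategoricalDtype(categories=..., ordered=True)` to make the order explicit in a DataFrame.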
Typical issues (categories that are not plain numbers):
- categories are intervals (like 75-100, 101-200)
- categories carry additional information (e.g. Over 75, 60+, <18, Under 16)
- an n/a category is coded explicitly and gets sorted among the other categories
Therefore this library provides profiling as well as semi-automatic data preparation.
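One way to order labels like those listed above is to sort by the first number each label contains. The sketch below only illustrates the idea under that assumption; it is not how `prepare` is implemented, and `label_sort_key` and the sample labels are hypothetical.

```python
import re

_NUMBER = re.compile(r"\d+(?:\.\d+)?")
_BELOW = re.compile(r"^\s*(?:<|under\b|below\b)", re.IGNORECASE)
_ABOVE = re.compile(r"^\s*(?:>|over\b|above\b)|\+\s*$", re.IGNORECASE)


def label_sort_key(label):
    """Hypothetical key: first number in the label, labels without a number last."""
    match = _NUMBER.search(label)
    if match is None:
        # e.g. "n/a": no number at all, so it goes to the end
        return (1, 0.0, 0)
    number = float(match.group())
    if _BELOW.search(label):
        bias = -1  # "Under 16" sorts before "16-74"
    elif _ABOVE.search(label):
        bias = 1   # "Over 75" or "60+" sorts after a plain label with the same number
    else:
        bias = 0
    return (0, number, bias)


# Made-up labels that combine the patterns from the list above.
labels = ["75-100", "n/a", "Over 200", "101-200", "Under 16", "16-74"]
ordered = sorted(labels, key=label_sort_key)
print(ordered)
```

The tie-breaking `bias` matters only when two labels share the same first number; real data may need more rules than this sketch covers.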
Currently, there are two methods in place:
- `profile` -- profiles a dataset, categories and their correlations
- `prepare` -- prepares a dataset: tries to interpret category labels (if they contain numbers) and sorts the categories accordingly
## Installation
You can install the package using
`pip install pandas-cat`
## Usage
Using the package is simple. The sample code below uses the [Accidents](https://petrmasa.com/pandas-cat/data/accidents.zip) dataset, which is based on a [Kaggle dataset](https://www.kaggle.com/code/ambaniverma/uk-traffic-accidents):
```
import pandas as pd
from pandas_cat import pandas_cat
# Read the dataset. You can also download it and point the path to a local file.
df = pd.read_csv('https://petrmasa.com/pandas-cat/data/accidents.zip', encoding='cp1250', sep='\t')
# Use only selected columns.
df = df[['Driver_Age_Band', 'Driver_IMD', 'Sex', 'Journey']]
# The longer demo report uses this set of columns instead of the first one.
# df = df[['Driver_Age_Band','Driver_IMD','Sex','Journey','Hit_Objects_in','Hit_Objects_off','Casualties','Severity','Area','Vehicle_Age','Road_Type','Speed_limit','Light','Vehicle_Location','Vehicle_Type']]
# To profile the dataset, use the following call.
pandas_cat.profile(df=df, dataset_name="Accidents", opts={"auto_prepare": True})
# To only adjust the dataset, use the following call.
df = pandas_cat.prepare(df)
```
## Data and sample reports
Sample reports are available: [basic](https://petrmasa.com/pandas-cat/sample/report1.html) and [longer](https://petrmasa.com/pandas-cat/sample/report2.html). Both were generated with the code above.
The code downloads the dataset from the web each time it runs. If you prefer, you can download the sample dataset [here](https://petrmasa.com/pandas-cat/data/accidents.zip) and store it locally.
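A minimal sketch of keeping a local copy, using only the standard library. The helper `local_copy` and the default file name `accidents.zip` are assumptions made for this example and are not part of pandas-cat.

```python
import os
import urllib.request

DATA_URL = "https://petrmasa.com/pandas-cat/data/accidents.zip"


def local_copy(url=DATA_URL, path="accidents.zip"):
    """Hypothetical helper: return a local path, downloading only if the file is missing."""
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    return path
```

The returned path can be passed to `pd.read_csv` in place of the URL, with the same `encoding` and `sep` arguments as in the usage example.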