# Pandas Profiling
![Pandas Profiling Logo Header](https://pandas-profiling.github.io/pandas-profiling/docs/assets/logo_header.png)
[![Build Status](https://travis-ci.com/pandas-profiling/pandas-profiling.svg?branch=master)](https://travis-ci.com/pandas-profiling/pandas-profiling)
[![Code Coverage](https://codecov.io/gh/pandas-profiling/pandas-profiling/branch/master/graph/badge.svg?token=gMptB4YUnF)](https://codecov.io/gh/pandas-profiling/pandas-profiling)
[![Release Version](https://img.shields.io/github/release/pandas-profiling/pandas-profiling.svg)](https://github.com/pandas-profiling/pandas-profiling/releases)
[![Python Version](https://img.shields.io/pypi/pyversions/pandas-profiling)](https://pypi.org/project/pandas-profiling/)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/python/black)
Generates profile reports from a pandas `DataFrame`.
The pandas `df.describe()` function is great but a little basic for serious exploratory data analysis.
`pandas_profiling` extends the pandas DataFrame with `df.profile_report()` for quick data analysis.
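As a rough illustration of the typical workflow, the sketch below wraps report generation in a small helper. The helper name `build_report`, the report title and the output path are made up for this example. `ProfileReport` and `to_file` are used as they exist in the 2.x releases of `pandas-profiling`, and the import sits inside the function so the snippet can be loaded even where the package is not installed.

```python
import pandas as pd


def build_report(df: pd.DataFrame, output_file: str = "report.html") -> None:
    """Write an HTML profile report for `df` (hypothetical helper)."""
    # Imported lazily: requires `pip install pandas-profiling` (2.x series assumed).
    from pandas_profiling import ProfileReport

    # `minimal=True` switches off the more expensive computations,
    # which keeps this example quick on larger frames.
    profile = ProfileReport(df, title="Example Profiling Report", minimal=True)
    profile.to_file(output_file=output_file)


if __name__ == "__main__":
    example = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
    build_report(example)
```

Calling `example.profile_report()` after importing `pandas_profiling` should produce the same kind of report object.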
For each column the following statistics - if relevant for the column type - are presented in an interactive HTML report:
* **Type inference**: detect the [types](#types) of columns in a dataframe.
* **Essentials**: type, unique values, missing values
* **Quantile statistics** like minimum value, Q1, median, Q3, maximum, range, interquartile range
* **Descriptive statistics** like mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, skewness
* **Most frequent values**
* **Histogram**
* **Correlations** highlighting of highly correlated variables, Spearman, Pearson and Kendall matrices
* **Missing values** matrix, count, heatmap and dendrogram of missing values
* **Text analysis** learn about categories (Uppercase, Space), scripts (Latin, Cyrillic) and blocks (ASCII) of text data.
* **File and Image analysis** extract file sizes, creation dates and dimensions and scan for truncated images or those containing EXIF information.
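To make a few of the statistics listed above concrete, here is a minimal sketch that computes some of them by hand with plain pandas. It is not how `pandas-profiling` computes them internally; the function name `basic_profile` and the sample data are assumptions for illustration only.

```python
import pandas as pd


def basic_profile(series: pd.Series) -> dict:
    """Compute a handful of per-column statistics for a numeric Series."""
    # Quantiles skip missing values by default.
    q1, median, q3 = series.quantile([0.25, 0.5, 0.75])
    mean = series.mean()
    std = series.std()  # sample standard deviation (ddof=1)
    return {
        # Essentials
        "n_unique": int(series.nunique()),
        "n_missing": int(series.isna().sum()),
        # Quantile statistics
        "minimum": series.min(),
        "q1": q1,
        "median": median,
        "q3": q3,
        "maximum": series.max(),
        "range": series.max() - series.min(),
        "iqr": q3 - q1,
        # Descriptive statistics
        "mean": mean,
        "mode": series.mode().iloc[0],
        "std": std,
        "mad": (series - median).abs().median(),  # median absolute deviation
        "cv": std / mean,  # coefficient of variation
    }


values = pd.Series([2.0, 4.0, 4.0, 6.0, 8.0, None])
stats = basic_profile(values)
for name, value in stats.items():
    print(f"{name}: {value}")
```

The generated report covers many more statistics than this and adapts them to each column's inferred type.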
## Announcements
### Version v2.8.0 released
News for users working with image datasets: ``pandas-profiling`` now has built-in support for Files and Images.
Moreover, the text analysis features have also been reworked, providing more informative statistics.
For a better feel, have a look at the [examples](https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/examples.html#showcasing-specific-features) section in the docs or read the changelog for a complete view of the changes.
### Version v2.7.0 released
#### Performance
Several performance regressions were recently reported when comparing 1.4.1 to 2.6.0.
To that end, we benchmarked the code and found several minor features introducing disproportionate computational complexity.
Version 2.7.0 optimizes these, giving significant performance improvements!
Moreover, the default configuration is tweaked towards the needs of the average user.
#### Phased builds and lazy loading
A report is built in phases, which allows for new exciting features such as caching, only re-rendering partial reports and lazily computing the report.
Moreover, the progress bar provides more information on the building phase and step.
#### Documentation
This version introduces [more elaborate documentation](https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/index.html) powered by Sphinx. The previously used pdoc3 was adequate initially, but lacks functionality and extensibility. Several recurring topics are now documented: the configuration parameters, and pages on big datasets, sensitive data, integrations and resources.
#### Support `pandas-profiling`
The development of ``pandas-profiling`` relies completely on contributions.
If you find value in the package, we welcome you to support the project through [GitHub Sponsors](https://github.com/sponsors/sbrugman)!
It's extra exciting that GitHub **matches your contribution** for the first year.
Find more information here:
- [Changelog v2.7.0](https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/changelog.html#changelog-v2-7-0)
- [Changelog v2.8.0](https://pandas-profiling.github.io/pandas-profiling/docs/master/rtd/pages/changelog.html#changelog-v2-8-0)
- [Sponsor the project on GitHub](https://github.com/sponsors/sbrugman)
*May 7, 2020*
---
_Contents:_ **[Examples](#examples)** |
**[Installation](#installation)** | **[Documentation](#documentation)** |
**[Large datasets](#large-datasets)** | **[Command line usage](#command-line-usage)** |
**[Advanced usage](#advanced-usage)** |
**[Types](#types)** | **[How to contribute](#contributing)** |
**[Editor Integration](#editor-integration)** | **[Dependencies](#dependencies)**
---
## Examples
The following examples can give you an impression of what the package can do:
* [Census Income](https://pandas-profiling.github.io/pandas-profiling/examples/master/census/census_report.html) (US Adult Census data relating income)
* [NASA Meteorites](https://pandas-profiling.github.io/pandas-profiling/examples/master/meteorites/meteorites_report.html) (comprehensive set of meteorite landings) [![Open In Colab](https://camo.githubusercontent.com/52feade06f2fecbf006889a904d221e6a730c194/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667)](https://colab.research.google.com/github/pandas-profiling/pandas-profiling/blob/master/examples/meteorites/meteorites.ipynb) [![Binder](https://camo.githubusercontent.com/483bae47a175c24dfbfc57390edd8b6982ac5fb3/68747470733a2f2f6d7962696e6465722e6f72672f62616467655f6c6f676f2e737667)](https://mybinder.org/v2/gh/pandas-profiling/pandas-profiling/master?filepath=examples%2Fmeteorites%2Fmeteorites.ipynb)
* [Titanic](https://pandas-profiling.github.io/pandas-profiling/examples/master/titanic/titanic_report.html) (the "Wonderwall" of datasets) [![Open In Colab](https://camo.githubusercontent.com/52feade06f2fecbf006889a904d221e6a730c194/68747470733a2f2f636f6c61622e72657365617263682e676f6f676c652e636f6d2f6173736574732f636f6c61622d62616467652e737667)](https://colab.research.google.com/github/pandas-profiling/pandas-profiling/blob/master/examples/titanic/titanic.ipynb) [![Binder](https://camo.githubusercontent.com/483bae47a175c24dfbfc57390edd8b6982ac5fb3/68747470733a2f2f6d7962696e6465722e6f72672f62616467655f6c6f676f2e737667)](https://mybinder.org/v2/gh/pandas-profiling/pandas-profiling/master?filepath=examples%2Ftitanic%2Ftitanic.ipynb)
* [NZA](https://pandas-profiling.github.io/pandas-profiling/examples/master/nza/nza_report.html) (open data from the Dutch Healthcare Authority)
* [Stata Auto](https://pandas-profiling.github.io/pandas-profiling/examples/master/stata_auto/stata_auto_report.html) (1978 Automobile data)
* [Vektis](https://pandas-profiling.github.io/pandas-profiling/examples/master/vektis/vektis_report.html) (Vektis Dutch Healthcare data)
* [Colors](https://pandas-profiling.github.io/pandas-profiling/examples/master/colors/colors_report.html) (a simple colors dataset)
Specific features:
* [Russian Vocabulary](https://pandas-profiling.github.io/pandas-profiling/examples/master/features/russian_vocabulary.html) (demonstrates text analysis)
* [Cats and Dogs](https://pandas-profiling.github.io/pandas-profiling/examples/master/features/cats-and-dogs.html) (demonstrates image analysis from the file system)
* [Celebrity Faces](https://pandas-profiling.github.io/pandas-profiling/examples/master/features/celebrity-faces.html) (demonstrates image analysis with EXIF information)
* [Website Inaccessibility](https://pandas-profiling.github.io/pandas-profiling/examples/master/features/website_inaccessibility_report.html) (demonstrates URL analysis)
* [Orange prices](https://pandas-profiling.github.io/pandas-profiling/examples/master/themes/united