pandas-profiling-3.6.3.tar.gz resource - CSDN Library

Uploaded 2024-03-07 12:45:55, 257KB GZ

216 files in total

py: 144

html: 43

txt: 8


pandas-profiling-3.6.3.tar.gz (216 files)

make.bat 978B

setup.cfg 38B

simplex.bootstrap.min.css 125KB

flatly.bootstrap.min.css 124KB

cosmo.bootstrap.min.css 123KB

united.bootstrap.min.css 120KB

bootstrap.min.css 118KB

bootstrap-theme.min.css 23KB

style.css 6KB

style.html 3KB

alerts.html 2KB

toggle_button.html 2KB

frequency_table.html 2KB

variable_info.html 2KB

select.html 2KB

table.html 1KB

frequency_table_small.html 1KB

navigation.html 1KB

tabs.html 1KB

report.html 962B

javascript.html 896B

batch_grid.html 769B

grid.html 712B

sections.html 533B

alert_high_correlation.html 424B

collapse.html 371B

diagram.html 353B

variable.html 239B

named_list.html 213B

dropdown.html 206B

sample.html 205B

footer.html 201B

alert_truncated.html 185B

alert_infinite.html 183B

alert_type_date.html 180B

alert_missing.html 180B

alert_zeros.html 167B

list.html 165B

alert_high_cardinality.html 154B

alert_unsupported.html 152B

alert_skewed.html 151B

alert_imbalance.html 148B

alert_duplicates.html 138B

alert_constant.html 129B

alert_uniform.html 106B

alert_constant_length.html 101B

alert_non_stationary.html 99B

alert_unique.html 99B

correlation_table.html 97B

alert_seasonal.html 93B

duplicate.html 80B

alert_empty.html 17B

MANIFEST.in 702B

jquery-1.12.4.min.js 95KB

bootstrap.min.js 36KB

script.js 941B

LICENSE 1KB

Makefile 759B

README.md 17KB

CONTRIBUTING.md 6KB

PKG-INFO 20KB

plot.py 28KB

profile_report.py 17KB

render_categorical.py 17KB

report.py 14KB

config.py 11KB

alerts.py 11KB

compare_reports.py 10KB

render_real.py 9KB

describe_categorical_pandas.py 9KB

render_timeseries.py 9KB

formatters.py 9KB

overview.py 9KB

dataframe.py 8KB

typeset.py 8KB

render_image.py 7KB

correlations_pandas.py 7KB

describe.py 6KB

describe_image_pandas.py 6KB

describe_numeric_pandas.py 5KB

describe_timeseries_pandas.py 5KB

summary_algorithms.py 5KB

serialize_report.py 5KB

expectations_report.py 4KB

render_path.py 4KB

render_count.py 4KB

render_boolean.py 4KB

correlations.py 4KB

render_url.py 4KB

correlations.py 4KB

flavours.py 4KB

frequency_table_utils.py 4KB

container.py 4KB

missing.py 3KB

render_date.py 3KB

missing.py 3KB

summary_pandas.py 3KB

expectation_algorithms.py 3KB

utils.py 3KB

216 entries in total

# `pandas-profiling`

![Pandas Profiling Logo Header](https://pandas-profiling.ydata.ai/docs/assets/logo_header.png)

[![Build Status](https://github.com/ydataai/pandas-profiling/actions/workflows/tests.yml/badge.svg?branch=master)](https://github.com/ydataai/pandas-profiling/actions/workflows/tests.yml)
[![PyPI download month](https://img.shields.io/pypi/dm/pandas-profiling.svg)](https://pypi.python.org/pypi/pandas-profiling/)
[![](https://pepy.tech/badge/pandas-profiling)](https://pypi.org/project/pandas-profiling/)
[![Code Coverage](https://codecov.io/gh/ydataai/pandas-profiling/branch/master/graph/badge.svg?token=gMptB4YUnF)](https://codecov.io/gh/ydataai/pandas-profiling)
[![Release Version](https://img.shields.io/github/release/ydataai/pandas-profiling.svg)](https://github.com/ydataai/pandas-profiling/releases)
[![Python Version](https://img.shields.io/pypi/pyversions/pandas-profiling)](https://pypi.org/project/pandas-profiling/)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/python/black)

<p align="center">
  <a href="https://pandas-profiling.ydata.ai/docs/master/">Documentation</a>
  |
  <a href="https://discord.com/invite/mw7xjJ7b7s">Discord</a>
  |
  <a href="https://stackoverflow.com/questions/tagged/pandas-profiling">Stack Overflow</a>
  |
  <a href="https://pandas-profiling.ydata.ai/docs/master/pages/reference/changelog.html#changelog">Latest changelog</a>
</p>

<p align="center">
  Do you like this project? Show us your love and <a href="https://engage.ydata.ai">give feedback!</a>
</p>

`pandas-profiling`'s primary goal is to provide a one-line Exploratory Data Analysis (EDA) experience in a consistent and fast solution. Like the handy pandas `df.describe()` function, pandas-profiling delivers an extended analysis of a DataFrame, while allowing the analysis to be exported in different formats such as **html** and **json**.
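To make the comparison with `df.describe()` concrete, here is a rough sketch, using only pandas, of a few per-column statistics that a profiling report adds on top of `describe()` (missing values, distinct counts, zeros) and of dumping them as JSON. The helper `mini_profile` and its output layout are hypothetical; they illustrate the idea and do not reflect pandas-profiling's internal implementation or its JSON schema.

```python
import json

import numpy as np
import pandas as pd


def mini_profile(df: pd.DataFrame) -> dict:
    """Hypothetical helper: a tiny per-column summary in the spirit of a profiling report.

    Illustration only; this is not how pandas-profiling computes its report.
    """
    variables = {}
    for col in df.columns:
        s = df[col]
        info = {
            "dtype": str(s.dtype),
            "n_missing": int(s.isna().sum()),     # not reported directly by describe()
            "p_missing": float(s.isna().mean()),  # fraction of missing values
            "n_distinct": int(s.nunique(dropna=True)),
        }
        if pd.api.types.is_numeric_dtype(s) and not pd.api.types.is_bool_dtype(s):
            # Reuse describe() for count/mean/std/min/quartiles/max ...
            info.update({stat: float(value) for stat, value in s.describe().items()})
            # ... and add a zero count, similar in spirit to the report's "zeros" alert.
            info["n_zeros"] = int((s == 0).sum())
        variables[str(col)] = info
    return {"n_rows": len(df), "n_columns": df.shape[1], "variables": variables}


# Toy data with one missing value per column.
df = pd.DataFrame({"a": [1.0, 0.0, np.nan, 4.0], "b": ["x", "y", "x", None]})
report = mini_profile(df)
print(json.dumps(report, indent=2))  # JSON export, loosely analogous to the json output
```

The real report goes much further (type inference, correlations, alerts, plots); the point here is only that describe()-style statistics are one slice of what a profile contains.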
The package outputs a simple and digested analysis of a dataset, including **time-series** and **text**.

### Key features

- **Type inference**: automatic detection of columns' data types (*Categorical*, *Numerical*, *Date*, etc.)
- **Warnings**: a summary of the problems/challenges in the data that you might need to work on (*missing data*, *inaccuracies*, *skewness*, etc.)
- **Univariate analysis**: including descriptive statistics (mean, median, mode, etc.) and informative visualizations such as distribution histograms
- **Multivariate analysis**: including correlations, a detailed analysis of missing data, duplicate rows, and visual support for pairwise interaction between variables
- **Time-Series**: including different statistical information relative to time-dependent data, such as auto-correlation and seasonality, along with ACF and PACF plots
- **Text analysis**: most common categories (uppercase, lowercase, separator), scripts (Latin, Cyrillic) and blocks (ASCII, Cyrillic)
- **File and Image analysis**: file sizes, creation dates, dimensions, indication of truncated images and existence of EXIF metadata
- **Compare datasets**: one-line solution to enable a fast and complete report on the comparison of datasets
- **Flexible output formats**: all analyses can be exported to an HTML report that can be easily shared with different parties, as JSON for easy integration in automated systems, and as a widget in a Jupyter Notebook

The report contains three additional sections:

- **Overview**: mostly global details about the dataset (number of records, number of variables, overall missingness and duplicates, memory footprint)
- **Alerts**: a comprehensive and automatic list of potential data quality issues (high correlation, skewness, uniformity, zeros, missing values, constant values, among others)
- **Reproduction**: technical details about the analysis (time, version and configuration)

> ### Latest features
> - Looking for how you can do an EDA for Time-Series?
>   Check [this blog post](https://towardsdatascience.com/how-to-do-an-eda-for-time-series-cbb92b3b1913).
> - Want to compare two datasets and get a report? Check [this blog post](https://medium.com/towards-artificial-intelligence/how-to-compare-2-dataset-with-pandas-profiling-2ae3a9d7695e).

## Use cases

Pandas-profiling can be used to deliver a variety of different use cases. The documentation includes guides, tips and tricks for tackling them:

| Use case | Description |
|----------|-------------|
| [Comparing datasets](https://pandas-profiling.ydata.ai/docs/master/pages/use_cases/comparing_datasets.html) | Comparing multiple versions of the same dataset |
| [Profiling a Time-Series dataset](https://pandas-profiling.ydata.ai/docs/master/pages/use_cases/time_series_datasets.html) | Generating a report for a time-series dataset with a single line of code |
| [Profiling large datasets](https://pandas-profiling.ydata.ai/docs/master/pages/use_cases/big_data.html) | Tips on how to prepare data and configure `pandas-profiling` for working with large datasets |
| [Handling sensitive data](https://pandas-profiling.ydata.ai/docs/master/pages/use_cases/sensitive_data.html) | Generating reports which are mindful about sensitive data in the input dataset |
| [Dataset metadata and data dictionaries](https://pandas-profiling.ydata.ai/docs/master/pages/use_cases/metadata.html) | Complementing the report with dataset details and column-specific data dictionaries |
| [Customizing the report's appearance](https://pandas-profiling.ydata.ai/docs/master/pages/use_cases/custom_report_appearance.html) | Changing the appearance of the report's page and of the contained visualizations |

> Looking for a Spark backend to profile large datasets? It's [work in progress](https://github.com/ydataai/pandas-profiling/projects/3).

## Quickstart

Start by loading your pandas `DataFrame` as you normally would, e.g.
by using:

```python
import numpy as np
import pandas as pd
from pandas_profiling import ProfileReport

# 100 rows of random values in five columns
df = pd.DataFrame(np.random.rand(100, 5), columns=["a", "b", "c", "d", "e"])
```

To generate the standard profiling report, merely run:

```python
profile = ProfileReport(df, title="Pandas Profiling Report")
```

### Using inside Jupyter Notebooks

There are two interfaces to consume the report inside a Jupyter notebook: through widgets and through an embedded HTML report.

<img alt="Notebook Widgets" src="https://pandas-profiling.ydata.ai/docs/master/assets/widgets.gif" width="800" />

The above is achieved by simply displaying the report as a set of widgets. In a Jupyter Notebook, run:

```python
profile.to_widgets()
```

The HTML report can be directly embedded in a cell in a similar fashion:

```python
profile.to_notebook_iframe()
```

<img alt="HTML" src="https://pandas-profiling.ydata.ai/docs/master/assets/iframe.gif" width="800" />

### Exporting the report to a file

To generate an HTML report file, save the `ProfileReport` to an object and use the `to_file()` function:

```python
profile.to_file("your_report.html")
```

Alternatively, the report's data can be obtained as a JSON file:

```python
# As a JSON string
json_data = profile.to_json()

# As a file
profile.to_file("your_report.json")
```

### Using in the command line

For standard formatted CSV files (which can be read directly by pandas without additional settings), the `pandas_profiling` executable can be used in the command line. The example below generates a report named _Example Profiling Report_, using a configuration file called `default.yaml`, in the file `report.html` by processing a `data.csv` dataset.

```sh
pandas_profiling --title "Exampl
```
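The command-line tool assumes a CSV that pandas can read with default settings. One way to check a file before handing it to the CLI is sketched below. The helper `readable_with_defaults` and its column-count threshold are assumptions for illustration, not part of pandas-profiling: a default `pd.read_csv` call does not fail on a semicolon-delimited file, it silently reads it as a single column, so the sketch treats "fewer than two columns" as a sign of a non-standard format.

```python
import tempfile
from pathlib import Path

import pandas as pd


def readable_with_defaults(path: str, min_columns: int = 2) -> bool:
    """Hypothetical pre-flight check before calling the pandas_profiling CLI.

    Returns True when pd.read_csv(path), with no extra arguments, succeeds and
    yields at least `min_columns` columns. The threshold is an assumed heuristic:
    a file using another delimiter (e.g. ';') is read as one column instead of
    raising an error.
    """
    try:
        df = pd.read_csv(path)
    except (pd.errors.ParserError, pd.errors.EmptyDataError, UnicodeDecodeError):
        return False
    return df.shape[1] >= min_columns


# Usage on two throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    comma = Path(tmp) / "data.csv"
    comma.write_text("a,b\n1,2\n3,4\n")
    semicolon = Path(tmp) / "semicolon.csv"
    semicolon.write_text("a;b\n1;2\n3;4\n")
    results = {p.name: readable_with_defaults(str(p)) for p in (comma, semicolon)}

print(results)  # {'data.csv': True, 'semicolon.csv': False}
```

A genuinely single-column CSV would also be flagged by the default threshold; pass `min_columns=1` in that case. Files that fail the check are better loaded in Python with the appropriate `read_csv` arguments and profiled through `ProfileReport` instead.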
