pandas-summary-0.1.0.tar.gz资源-CSDN文库

需积分: 5 12 浏览量 2024-03-08 15:37:52 上传评论收藏 3KB GZ 举报

共12个文件

txt：4个

py：2个

pkg-info：2个

资源推荐

资源详情

资源评论

收起资源包目录

pandas-summary-0.1.0.tar.gz （12个子文件）

pandas-summary-0.1.0

setup.py 1KB

LICENSE 1KB

PKG-INFO 531B

pandas_summary

__init__.py 49B

MANIFEST.in 34B

setup.cfg 38B

README.md 4KB

pandas_summary.egg-info

SOURCES.txt 254B

top_level.txt 15B

PKG-INFO 531B

requires.txt 22B

dependency_links.txt 1B

# pandas_summary An extension to [pandas](http://pandas.pydata.org/) dataframes describe function. The module contains `DataFrameSummary` object that extend `describe()` with: * **properties** * dfs.columns_stats: counts, uniques, missing, missing_perc, and type per column * dsf.columns_types: a count of the types of columns * dfs[column]: more in depth summary of the column * **function** * summary(): extends the `describe()` function with the values with `columns_stats` # Installation The module can be easily installed with pip: ```conslole > pip install pandas-summary ``` This module depends on `numpy` and `pandas`. Optionally you can get also some nice visualisations if you have `matplotlib` installed. # Tests To run the tests, execute the command `python setup.py test` # Usage The module contains one class: ## DataFrameSummary The `DataFrameSummary` expect a pandas `DataFrame` to summarise. ```python from pandas_summary import DataFrameSummary dfs = DataFrameSummary(df) ``` getting the columns types ```python dfs.columns_types numeric 9 bool 3 categorical 2 unique 1 date 1 constant 1 dtype: int64 ``` getting the columns stats ```python dfs.columns_stats A B C D E counts 5802 5794 5781 5781 4617 uniques 5802 3 5771 128 121 missing 0 8 21 21 1185 missing_perc 0% 0.14% 0.36% 0.36% 20.42% types unique categorical numeric numeric numeric ``` getting a single column summary, e.g. numerical column ```python # we can also access the column using numbers A[1] dfs['A'] std 0.2827146 max 1.072792 min 0 variance 0.07992753 mean 0.5548516 5% 0.1603367 25% 0.3199776 50% 0.4968588 75% 0.8274732 95% 1.011255 iqr 0.5074956 kurtosis -1.208469 skewness 0.2679559 sum 3207.597 mad 0.2459508 cv 0.5095319 zeros_num 11 zeros_perc 0,1% deviating_of_mean 21 deviating_of_mean_perc 0.36% deviating_of_median 21 deviating_of_median_perc 0.36% top_correlations {u'D': 0.702240243124, u'E': -0.663} counts 5781 uniques 5771 missing 21 missing_perc 0.36% types numeric Name: A, dtype: object ``` # Future development Summary analysis between columns, i.e. `dfs[[1, 2]]`

评论收藏

内容反馈