pandas-0.23.2.tar.gz resource - CSDN Wenku

Uploaded: 2024-02-14 20:15:35; size: 9.49 MB; format: GZ

904 files in total, including:
py: 618
txt: 51
rst: 48

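Pandas 0.23.2: A Core Library for Python Data Analysis

Pandas is an open-source data processing and analysis library for Python. It provides efficient, flexible, easy-to-use data structures that simplify data cleaning, transformation, integration, and analysis. Pandas 0.23.2 is a point release that builds on the previous version with a number of fixes and improvements. The key topics are:

1. **DataFrame and Series**: DataFrame is the core pandas data structure. It resembles a two-dimensional table and can be thought of as a column-oriented table. Series is a one-dimensional structure, similar to an array with an index. Both support a large set of built-in operations such as filtering, sorting, and aggregation.
2. **Data import and export**: pandas reads data from CSV files, Excel files, SQL databases, and other formats, and can write processed data back out to a variety of formats for sharing and storage.
3. **Missing data handling**: pandas can identify, fill, or drop missing values.
4. **Time series analysis**: pandas has built-in support for time series data, including date range generation, time interval operations, and timestamp handling, which suits time series work in fields such as finance and meteorology.
5. **Merging and combining data**: the `merge` and `concat` functions join and concatenate separate data sets.
6. **Data cleaning**: pandas provides functions for removing duplicates, replacing outliers, and converting data types, which simplifies preprocessing.
7. **Statistics and visualization**: pandas includes basic statistical methods such as descriptive statistics, grouped computation, and frequency counts. Combined with visualization libraries such as Matplotlib or Seaborn, it can quickly produce charts that show the distribution and trends of the data.
8. **Performance**: the 0.23.x series includes speed and memory improvements; see the official release notes for the specific changes in 0.23.2.
9. **API improvements**: this release may include API adjustments and enhancements intended to improve usability and code readability; refer to the official documentation for the specific changes.
10. **Bug fixes**: each new release fixes known issues to improve stability and reliability.

Pandas 0.23.2 is an important tool for Python data analysis, for beginners and experienced data professionals alike. Knowing its core features makes data exploration, model building, and business analysis more efficient.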
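The structures, missing-data handling, and merging described above can be sketched briefly. This is only an illustration: the `prices` and `orders` tables and their values are invented for the example and do not come from the archive.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; none of these names come from the pandas archive.
prices = pd.Series([1.5, np.nan, 3.0], index=["a", "b", "c"], name="price")
orders = pd.DataFrame({"item": ["a", "b", "c"], "qty": [2, 5, 1]})

# Missing-data handling: identify, fill, or drop NaN values.
n_missing = int(prices.isnull().sum())
filled = prices.fillna(0.0)
dropped = prices.dropna()

# Merging: turn the Series into a two-column table, then join on the item key.
price_table = prices.rename_axis("item").reset_index()
merged = orders.merge(price_table, on="item", how="left")

# Concatenation: stack two frames with the same columns row-wise.
stacked = pd.concat([orders, orders], ignore_index=True)
```

A left merge keeps every row of `orders`, so an item with no known price still appears, with `NaN` in the `price` column.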


pandas-0.23.2.tar.gz (904 files)

join.c 7.89MB

interval.c 6.74MB

algos.c 5.76MB

sparse.c 3.65MB

groupby.c 2.73MB

hashtable.c 2.26MB

lib.c 2.14MB

parsers.c 2.08MB

index.c 1.65MB

offsets.c 1.45MB

timedeltas.c 1.31MB

timestamps.c 1.3MB

period.c 1.28MB

internals.c 1.07MB

conversion.c 1.04MB

strptime.c 1009KB

sas.c 1007KB

writers.c 1000KB

resolution.c 997KB

parsing.c 980KB

tslib.c 950KB

reduction.c 945KB

fields.c 761KB

nattype.c 758KB

frequencies.c 594KB

timezones.c 546KB

reshape.c 488KB

ops.c 483KB

missing.c 468KB

skiplist.c 388KB

hashing.c 380KB

ccalendar.c 346KB

np_datetime.c 327KB

testing.c 253KB

properties.c 162KB

indexing.c 102KB

objToJSON.c 74KB

tokenizer.c 60KB

ultrajsondec.c 29KB

ultrajsonenc.c 28KB

np_datetime.c 27KB

np_datetime_strings.c 23KB

JSONtoObj.c 19KB

period_helper.c 18KB

move.c 8KB

io.c 6KB

ujson.c 5KB

setup.cfg 878B

cmath 345B

theme.conf 101B

window.cpp 2.41MB

_unpacker.cpp 337KB

_packer.cpp 324KB

nature.css_t 6KB

iris.data 4KB

feather-0_3_1.feather 672B

fx_prices 16KB

pack_template.h 20KB

khash.h 19KB

unpack_template.h 15KB

ultrajson.h 10KB

ms_inttypes.h 8KB

unpack.h 8KB

ms_stdint.h 8KB

skiplist.h 7KB

tokenizer.h 7KB

parse_helper.h 7KB

sysdep.h 6KB

period_helper.h 4KB

pack.h 3KB

np_datetime.h 3KB

np_datetime_strings.h 3KB

py_defines.h 3KB

unpack_define.h 2KB

version.h 2KB

khash_python.h 2KB

numpy_helper.h 2KB

compat_helper.h 1KB

io.h 1KB

helper.h 629B

stdint.h 160B

portable.h 142B

hashtable_class_helper.pxi.in 27KB

groupby_helper.pxi.in 25KB

algos_common_helper.pxi.in 16KB

intervaltree.pxi.in 14KB

join_func_helper.pxi.in 13KB

algos_rank_helper.pxi.in 12KB

join_helper.pxi.in 11KB

sparse_op_helper.pxi.in 10KB

algos_take_helper.pxi.in 10KB

hashtable_func_helper.pxi.in 9KB

index_class_helper.pxi.in 2KB

reshape_helper.pxi.in 2KB

MANIFEST.in 755B

style.ipynb 36KB

items.jsonl 33B

LICENSE 2KB

README.md 10KB

RELEASE.md 238B


Data Wrangling with pandas Cheat Sheet
http://pandas.pydata.org

Syntax – Creating DataFrames

Tidy Data – A foundation for wrangling in pandas

In a tidy data set, each variable is saved in its own column and each observation is saved in its own row.

Tidy data complements pandas's vectorized operations. pandas will automatically preserve observations as you manipulate variables. No other format works as intuitively with pandas.
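A small sketch of what "preserving observations" means in practice. The table below is invented for illustration; only the general column-wise behavior is the point.

```python
import pandas as pd

# Hypothetical tidy table: one row per observation, one column per variable.
tidy = pd.DataFrame({
    "species": ["setosa", "setosa", "virginica"],
    "length": [5.1, 4.9, 6.3],
    "width": [3.5, 3.0, 3.3],
})

# A vectorized operation works on whole columns; each result stays aligned
# with the row (observation) it was computed from.
tidy["area"] = tidy["length"] * tidy["width"]

# Row filtering keeps the other variables of each observation together.
wide_rows = tidy[tidy["width"] > 3.2]
```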

Reshaping Data – Change the layout of a data set


pd.melt(df)

Gather columns into rows.

df.pivot(columns='var', values='val')

Spread rows into columns.

pd.concat([df1,df2])

Append rows of DataFrames

pd.concat([df1,df2], axis=1)

Append columns of DataFrames

df.sort_values('mpg')

Order rows by values of a column (low to high).

df.sort_values('mpg',ascending=False)

Order rows by values of a column (high to low).

df.rename(columns = {'y':'year'})

Rename the columns of a DataFrame

df.sort_index()

Sort the index of a DataFrame

df.reset_index()

Reset index of DataFrame to row numbers, moving index to columns.

df.drop(columns=['Length','Height'])

Drop columns from DataFrame
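The reshaping calls above can be tried on a tiny frame. This is a sketch with invented column names (`id`, `x`, `y`); the `id_vars`, `var_name`, and `value_name` arguments are added here so that the melt and pivot steps round-trip cleanly.

```python
import pandas as pd

wide = pd.DataFrame({"id": [1, 2], "x": [10, 20], "y": [30, 40]})

# Gather the x and y columns into rows: one (id, var, val) row per cell.
long_df = pd.melt(wide, id_vars="id", var_name="var", value_name="val")

# Spread the rows back into columns, one column per distinct 'var'.
back = long_df.pivot(index="id", columns="var", values="val")

# Sort, rename, and drop, as in the entries above.
ordered = wide.sort_values("x", ascending=False)
renamed = wide.rename(columns={"y": "year"})
trimmed = wide.drop(columns=["x", "y"])
```

None of these calls modify `wide` in place; each returns a new DataFrame.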

Subset Observations (Rows)
Subset Variables (Columns)

df = pd.DataFrame(
    {"a": [4, 5, 6],
     "b": [7, 8, 9],
     "c": [10, 11, 12]},
    index=[1, 2, 3])

Specify values for each column.

df = pd.DataFrame(
    [[4, 7, 10],
     [5, 8, 11],
     [6, 9, 12]],
    index=[1, 2, 3],
    columns=['a', 'b', 'c'])

Specify values for each row.

df = pd.DataFrame(
    {"a": [4, 5, 6],
     "b": [7, 8, 9],
     "c": [10, 11, 12]},
    index=pd.MultiIndex.from_tuples(
        [('d', 1), ('d', 2), ('e', 2)],
        names=['n', 'v']))

Create DataFrame with a MultiIndex
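A quick check that the column-wise and row-wise constructions above describe the same table, plus two ways to look up rows in the MultiIndex version. The variable names (`by_column`, `by_row`, `multi`) are chosen for this sketch only.

```python
import pandas as pd

by_column = pd.DataFrame(
    {"a": [4, 5, 6], "b": [7, 8, 9], "c": [10, 11, 12]},
    index=[1, 2, 3])
by_row = pd.DataFrame(
    [[4, 7, 10], [5, 8, 11], [6, 9, 12]],
    index=[1, 2, 3], columns=["a", "b", "c"])

# Both constructions yield the same labels, values, and dtypes.
same = by_column.equals(by_row)

multi = pd.DataFrame(
    {"a": [4, 5, 6], "b": [7, 8, 9], "c": [10, 11, 12]},
    index=pd.MultiIndex.from_tuples(
        [("d", 1), ("d", 2), ("e", 2)], names=["n", "v"]))

# Select every row whose outer level 'n' equals 'd'.
d_rows = multi.loc["d"]
# Select one row by its full (n, v) key; the result is a Series.
e2 = multi.loc[("e", 2)]
```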

Method Chaining

Most pandas methods return a DataFrame so that

another pandas method can be applied to the

result. This improves readability of code.

df = (pd.melt(df)
        .rename(columns={
            'variable': 'var',
            'value': 'val'})
        .query('val >= 200')
     )
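The chain above can be run end to end on a small frame. The input values here are invented so that the `val >= 200` filter keeps some rows and drops others.

```python
import pandas as pd

raw = pd.DataFrame({"x": [100, 250], "y": [300, 150]})

# Each step returns a new DataFrame, so the next method applies to its result.
result = (
    pd.melt(raw)
    .rename(columns={"variable": "var", "value": "val"})
    .query("val >= 200")
)
```

Wrapping the chain in parentheses lets each method sit on its own line without backslash continuations.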

df[df.Length > 7]

Extract rows that meet logical criteria.

df.drop_duplicates()

Remove duplicate rows (only considers columns).

df.head(n)

Select first n rows.

df.tail(n)

Select last n rows.
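A minimal sketch of these row-subsetting calls. The `Length` and `Height` values are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({"Length": [5, 8, 8, 9], "Height": [1, 2, 2, 3]})

long_rows = df[df.Length > 7]        # boolean mask keeps rows 1, 2, 3
unique_rows = df.drop_duplicates()   # row 2 repeats row 1, so it is dropped
first_two = df.head(2)
last_one = df.tail(1)
```

Note that `drop_duplicates` compares column values only; the index labels are ignored, and the first occurrence of each duplicate is kept.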

Logic in Python (and pandas)

<                         Less than
!=                        Not equal to
>                         Greater than
df.column.isin(values)    Group membership
==                        Equals
pd.isnull(obj)            Is NaN
<=                        Less than or equals
pd.notnull(obj)           Is not NaN
>=                        Greater than or equals
&, |, ~, ^, df.any(), df.all()    Logical and, or, not, xor, any, all
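The operators in this table combine element-wise on Series. A short sketch, using an invented frame with one missing value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 5, np.nan, 12], "kind": ["x", "y", "x", "z"]})

# Parenthesize each comparison: & and | bind more tightly than > and !=.
both = df[(df["a"] > 2) & (df["kind"] != "z")]
either = df[(df["a"] < 2) | df["kind"].isin(["z"])]
not_missing = df[~pd.isnull(df["a"])]

# any() and all() reduce a boolean Series to a single value.
any_missing = bool(df["a"].isnull().any())
all_known_kind = bool(df["kind"].notnull().all())
```

Comparisons against `NaN` evaluate to `False`, so the row with the missing `a` value is excluded from both `both` and `either`.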

This cheat sheet was inspired by the RStudio Data Wrangling Cheatsheet (https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf). Written by Irv Lustig, Princeton Consultants. http://pandas.pydata.org/

df[['width','length','species']]

Select multiple columns with specific names.

df['width'] or df.width

Select single column with specific name.

df.filter(regex='regex')

Select columns whose name matches regular expression regex.

df.loc[:,'x2':'x4']

Select all columns between x2 and x4 (inclusive).

df.iloc[:,[1,2,5]]

Select columns in positions 1, 2 and 5 (first column is 0).

df.loc[df['a'] > 10, ['a','c']]

Select rows meeting logical condition, and only the specific columns.
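The column-selection forms above, applied to one invented frame so the results can be compared side by side:

```python
import pandas as pd

df = pd.DataFrame(
    [[11, 1, 2, 3, 4, 5], [9, 6, 7, 8, 9, 10]],
    columns=["a", "x1", "x2", "x3", "x4", "c"])

by_name = df[["a", "c"]]               # list of names gives a DataFrame
single = df["a"]                       # one name gives a Series (same as df.a)
x_cols = df.filter(regex="^x")         # names starting with 'x'
label_slice = df.loc[:, "x2":"x4"]     # label slices include both endpoints
by_position = df.iloc[:, [1, 2, 5]]    # positions count from 0
rows_and_cols = df.loc[df["a"] > 10, ["a", "c"]]
```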

regex (Regular Expressions) Examples

'\.'                 Matches strings containing a period '.'
'Length$'            Matches strings ending with the word 'Length'
'^Sepal'             Matches strings beginning with the word 'Sepal'
'^x[1-5]$'           Matches 'x' followed by a single digit 1, 2, 3, 4 or 5
'^(?!Species$).*'    Matches strings except the string 'Species'
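Each pattern above can be checked with `df.filter(regex=...)`, which keeps the columns whose names match. The column names below are invented in the style of the iris data set, and `matching` is a helper defined only for this sketch.

```python
import pandas as pd

cols = ["Sepal.Length", "Sepal.Width", "Petal.Length", "Species", "x1", "x7"]
df = pd.DataFrame([[0] * len(cols)], columns=cols)

def matching(pattern):
    """Return the column names that df.filter keeps for a regex pattern."""
    return list(df.filter(regex=pattern).columns)

with_period = matching(r"\.")
ends_length = matching(r"Length$")
starts_sepal = matching(r"^Sepal")
x_one_to_five = matching(r"^x[1-5]$")
not_species = matching(r"^(?!Species$).*")
```

Raw strings (`r"..."`) keep the backslash in `\.` from being treated as a Python string escape.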

df.sample(frac=0.5)

Randomly select fraction of rows.

df.sample(n=10)

Randomly select n rows.

df.iloc[10:20]

Select rows by position.

df.nlargest(n, 'value')

Select and order top n entries.

df.nsmallest(n, 'value')

Select and order bottom n entries.
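A sketch of the sampling and ranking calls on an invented 30-row frame. The `random_state` argument is an addition for this example so that the random samples are reproducible; the cheat sheet entries omit it.

```python
import pandas as pd

df = pd.DataFrame({"value": range(1, 31)})   # values 1..30, index 0..29

half = df.sample(frac=0.5, random_state=0)   # 15 randomly chosen rows
ten = df.sample(n=10, random_state=0)        # 10 randomly chosen rows
window = df.iloc[10:20]                      # positions 10 through 19
top3 = df.nlargest(3, "value")               # sorted high to low
bottom3 = df.nsmallest(3, "value")           # sorted low to high
```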
