pandas Complete Command Reference_Common pandas Commands Resource - CSDN Library

Uploaded 2017-11-21 18:21:24, 87KB PDF

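Based on the file information provided, the key points of the "pandas complete command reference" are summarized below.

I. Basic imports and importing data

1. Import the pandas and numpy libraries, conventionally under the aliases pd and np, for use in later operations.

```python
import pandas as pd
import numpy as np
```

2. Use pandas to read data in various formats, such as CSV, delimited text, Excel, SQL, JSON, and HTML pages.

```python
pd.read_csv(filename)                  # read a CSV file
pd.read_table(filename)                # read a delimited text file, such as a tab-separated TSV
pd.read_excel(filename)                # read an Excel file
pd.read_sql(query, connection_object)  # read from a SQL table or database
pd.read_json(json_string)              # read a JSON-formatted string, URL, or file
pd.read_html(url)                      # parse an HTML URL, string, or file and extract its tables
pd.read_clipboard()                    # pass the clipboard contents to read_table()
```

3. Create test objects, such as a DataFrame of random numbers or a Series built from a list.

```python
pd.DataFrame(np.random.rand(20, 5))  # DataFrame of random floats with 5 columns and 20 rows
pd.Series(my_list)                   # Series built from the iterable my_list
```

II. Exporting data

1. Export a DataFrame to different formats, such as CSV, Excel, SQL, JSON, and HTML.

```python
df.to_csv(filename)                       # write the DataFrame to a CSV file
df.to_excel(filename)                     # write the DataFrame to an Excel file
df.to_sql(table_name, connection_object)  # write the DataFrame to a SQL table
df.to_json(filename)                      # write the DataFrame to a JSON file
df.to_html(filename)                      # save the DataFrame as an HTML table
df.to_clipboard()                         # write the DataFrame to the clipboard
```

III. Viewing and inspecting data

1. View the first n rows, the last n rows, and the shape (number of rows and columns) of a DataFrame.

```python
df.head(n)  # first n rows
df.tail(n)  # last n rows
df.shape    # (rows, columns); shape is an attribute, not a method
```

2. View the index, data types, and memory information of a DataFrame, plus summary statistics for its numerical columns.

```python
df.info()      # index, data types, and memory information
df.describe()  # summary statistics for numerical columns
```

3. For a Series, view the unique values and their counts.

```python
s.value_counts(dropna=False)  # unique values and counts, including nulls
```

4. Apply the unique-value count to every column to see the distribution of values in each.

```python
df.apply(pd.Series.value_counts)  # unique values and counts for all columns
```

IV. Selecting data

1. Select columns by label.

```python
df[col]           # column labelled col, returned as a Series
df[[col1, col2]]  # col1 and col2, returned as a new DataFrame
```

2. Select by position or by index label, including rows, columns, and single elements.

```python
s.iloc[0]      # selection by position
s.loc[0]       # selection by index label
df.iloc[0, :]  # first row
df.iloc[0, 0]  # first element of the first row
```

V. Data cleaning

1. Change the column names of a DataFrame.

```python
df.columns = ['a', 'b', 'c']  # rename the columns to a, b, c
```

2. Check for and handle missing values, for example by testing for nulls or dropping rows that contain them.

```python
pd.isnull(df)   # check for null values, returns a Boolean mask
pd.notnull(df)  # check for non-null values, the opposite of pd.isnull()
df.dropna()     # drop all rows that contain null values
```

This summary covers the main pandas commands, from importing, viewing, and selecting data through to data cleaning. It does not list every detail; pandas also provides many more advanced features and configuration options.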


IMPORTING DATA

pd.read_csv(filename) - From a CSV file

pd.read_table(filename) - From a delimited text file (like TSV)

pd.read_excel(filename) - From an Excel file

pd.read_sql(query, connection_object) - Read from a SQL table/database

pd.read_json(json_string) - Read from a JSON formatted string, URL or file.

pd.read_html(url) - Parses an html URL, string or file and extracts tables to a list of dataframes

pd.read_clipboard() - Takes the contents of your clipboard and passes it to read_table()

pd.DataFrame(dict) - From a dict, keys for column names, values for data as lists
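
As a rough illustration of the readers listed above, the sketch below parses the same small table three ways. The CSV text and the column names `name` and `score` are made up for the example, and `io.StringIO` stands in for a real file path.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "name,score\nalice,0.9\nbob,0.4\n"

# read_csv accepts a path or any file-like object.
df_csv = pd.read_csv(io.StringIO(csv_text))

# read_table uses the same parser with a tab as the default separator.
tsv_text = csv_text.replace(",", "\t")
df_tsv = pd.read_table(io.StringIO(tsv_text))

# A dict maps column names to lists of values.
df_dict = pd.DataFrame({"name": ["alice", "bob"], "score": [0.9, 0.4]})

print(df_csv)
```

All three frames should hold the same two rows; with a real file, the `io.StringIO(...)` argument would simply be replaced by the filename.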

EXPORTING DATA

df.to_csv(filename) - Write to a CSV file

df.to_excel(filename) - Write to an Excel file

df.to_sql(table_name, connection_object) - Write to a SQL table

df.to_json(filename) - Write to a file in JSON format

df.to_html(filename) - Save as an HTML table

df.to_clipboard() - Write to the clipboard
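
A minimal round-trip sketch for the writers above, assuming a small made-up frame with columns `a` and `b`. It relies on `to_csv` and `to_json` returning a string when no path is given, which avoids touching the filesystem.

```python
import io
import json

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3.5, 4.5]})

# With no path argument, to_csv returns the CSV text as a string.
csv_text = df.to_csv(index=False)

# Reading the text back should reproduce the original frame.
round_trip = pd.read_csv(io.StringIO(csv_text))

# to_json with no path likewise returns a JSON string.
json_text = df.to_json(orient="records")
records = json.loads(json_text)

print(csv_text)
```

Passing `index=False` keeps the row index out of the CSV, so the frame read back has the same columns as the original.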

CREATE TEST OBJECTS

Useful for testing

pd.DataFrame(np.random.rand(20,5)) - 5 columns and 20 rows of random floats

pd.Series(my_list) - Create a series from an iterable my_list

df.index = pd.date_range('1900/1/30', periods=df.shape[0]) - Add a date index
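
The three test-object commands can be combined as in the sketch below. The seed and the contents of `my_list` are arbitrary choices for the example, added only so the output is reproducible.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # arbitrary seed, only for reproducibility

# 20 rows and 5 columns of random floats in [0, 1).
df = pd.DataFrame(np.random.rand(20, 5))

# A Series built from an ordinary Python list.
my_list = [10, 20, 30]
s = pd.Series(my_list)

# Replace the default integer index with one date per row.
df.index = pd.date_range("1900/1/30", periods=df.shape[0])

print(df.head(3))
```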

VIEWING/INSPECTING DATA

df.head(n) - First n rows of the DataFrame

df.tail(n) - Last n rows of the DataFrame

df.shape - Number of rows and columns (an attribute, so no parentheses)

df.info() - Index, Datatype and Memory information

df.describe() - Summary statistics for numerical columns

s.value_counts(dropna=False) - View unique values and counts

df.apply(pd.Series.value_counts) - Unique values and counts for all columns
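
A short sketch of the inspection commands on a made-up frame with one numeric column `x` (containing a null) and one text column `y`. Note that `shape` is read as an attribute; calling it as `df.shape()` raises a `TypeError`.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 2, None], "y": ["a", "b", "b", "b"]})

# shape is a (rows, columns) tuple, accessed without parentheses.
n_rows, n_cols = df.shape

first_two = df.head(2)
last_one = df.tail(1)

# dropna=False keeps the null as its own entry in the counts.
counts = df["x"].value_counts(dropna=False)

# describe() summarises only the numeric columns by default.
summary = df.describe()

print(counts)
```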

SELECTION

df[col] - Return column with label col as Series

df[[col1, col2]] - Return Columns as a new DataFrame

s.iloc[0] - selection by position

s.loc[0] - selection by index

df.iloc[0,:] - first row

df.iloc[0,0] - first element of first column
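
The difference between `iloc` and `loc` only shows up when the index labels do not match the positions, so the sketch below uses a Series with a deliberately shuffled index. The data values are made up for the example.

```python
import pandas as pd

# Index labels are intentionally out of order.
s = pd.Series([10, 20, 30], index=[2, 0, 1])

by_position = s.iloc[0]  # first element in storage order
by_label = s.loc[0]      # element whose index label is 0

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

col = df["a"]              # single label returns a Series
sub = df[["a", "b"]]       # list of labels returns a DataFrame
first_row = df.iloc[0, :]  # first row, as a Series
first_cell = df.iloc[0, 0] # scalar at row 0, column 0

print(by_position, by_label)
```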

DATA CLEANING

df.columns = ['a','b','c'] - Rename columns

pd.isnull(s) - Checks for null values, returns a Boolean array

pd.notnull(s) - Opposite of pd.isnull(s)

df.dropna() - Drop all rows that contain null values

df.dropna(axis=1) - Drop all columns that contain null values

df.dropna(axis=1,thresh=n) - Drop all columns that have fewer than n non-null values

df.fillna(x) - Replace all null values with x

s.fillna(s.mean()) - Replace all null values with the mean (mean can be replaced with almost any function from the statistics section)

s.astype(float) - Convert the datatype of the series to float

s.replace(1,'one') - Replace all values equal to 1 with 'one'

s.replace([1,3],['one','three']) - Replace all 1 with 'one' and 3 with 'three'

df.rename(columns=lambda x: x + 1) - mass renaming of columns

df.rename(columns={'old_name': 'new_name'}) - selective renaming

df.set_index('column_one') - change the index

df.rename(index=lambda x: x + 1) - mass renaming of index
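
The sketch below runs the null-handling commands on a small made-up frame so the effect of `axis` and `thresh` is visible. The column names and values are assumptions chosen so that each column has a different number of nulls.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],     # one null
    "b": [np.nan, np.nan, 6.0],  # two nulls
    "c": [7.0, 8.0, 9.0],        # no nulls
})

null_mask = df.isnull()                    # Boolean frame, True where null
rows_kept = df.dropna()                    # rows with no nulls at all
cols_kept = df.dropna(axis=1)              # columns with no nulls at all
cols_thresh = df.dropna(axis=1, thresh=2)  # columns with at least 2 non-null values

# Fill the nulls in one column with that column's mean.
filled = df["a"].fillna(df["a"].mean())

# Rename a single column and leave the others alone.
renamed = df.rename(columns={"a": "alpha"})

print(cols_thresh)
```

With `axis=1`, `thresh` counts non-null values per column, so column `b` (one non-null value) is dropped while `a` and `c` survive.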

FILTER, SORT, & GROUPBY

df[df[col] > 0.5] - Rows where the col column is greater than 0.5

df[(df[col] > 0.5) & (df[col] < 0.7)] - Rows where 0.7 > col > 0.5

df.sort_values(col1) - Sort values by col1 in ascending order

df.sort_values(col2,ascending=False) - Sort values by col2 in descending order

df.sort_values([col1,col2],ascending=[True,False]) - Sort values by col1 in ascending order then col2 in descending order

df.groupby(col) - Return a groupby object for values from one column

df.groupby([col1,col2]) - Return a groupby object for values from multiple columns

df.groupby(col1)[col2].mean() - Return the mean of the values in col2, grouped by the values in col1 (mean can be replaced with almost any function from the statistics section)

df.pivot_table(index=col1,values=[col2,col3],aggfunc='mean') - Create a pivot table that groups by col1 and calculates the mean of col2 and col3

df.groupby(col1).agg(np.mean) - find the average across all columns for every unique col1 group

df.apply(np.mean) - apply a function across each column

df.apply(np.max, axis=1) - apply a function across each row
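
A compact sketch of filtering, sorting, grouping, and pivoting on one made-up frame. The column names `team`, `score`, and `games` are assumptions for the example, and the row-wise `apply` uses a lambda instead of `np.max` purely for clarity.

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["x", "y", "x", "y"],
    "score": [0.2, 0.6, 0.8, 0.65],
    "games": [1, 2, 3, 4],
})

# Each condition needs its own parentheses when combined with &.
mid = df[(df["score"] > 0.5) & (df["score"] < 0.7)]

# Sort by team ascending, then by score descending within each team.
ranked = df.sort_values(["team", "score"], ascending=[True, False])

# Mean score per team.
means = df.groupby("team")["score"].mean()

# Pivot table with the per-team mean of two columns.
pivot = df.pivot_table(index="team", values=["score", "games"], aggfunc="mean")

# Apply a function across each row of the numeric columns.
row_max = df[["score", "games"]].apply(lambda row: row.max(), axis=1)

print(pivot)
```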

JOIN/COMBINE

df1.append(df2) - Add the rows in df2 to the end of df1 (columns should be identical); removed in pandas 2.0, where pd.concat([df1, df2]) does the same

pd.concat([df1, df2],axis=1) - Add the columns in df2 to the end of df1 (rows should be identical)

df1.join(df2,on=col1,how='inner') - SQL-style join the columns in df1 with the columns on df2 where the rows for col1 have identical values. how can be one of 'left', 'right', 'outer', 'inner'
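
The combining commands can be sketched as below, using two made-up frames that share a `key` column. Because `DataFrame.append` was removed in pandas 2.0, rows are stacked with `pd.concat` here. `join` with `on=` matches the left column against the right frame's index, so `df2` is re-indexed by `key` first; `merge` is shown as an alternative that matches column to column.

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["a", "b"], "left": [1, 2]})
df2 = pd.DataFrame({"key": ["b", "c"], "right": [3, 4]})

# Stack rows; ignore_index renumbers the result from 0.
stacked = pd.concat([df1, df1], ignore_index=True)

# Place columns side by side, aligned on the row index.
side_by_side = pd.concat([df1, df2[["right"]]], axis=1)

# join matches df1["key"] against the index of the other frame.
joined = df1.join(df2.set_index("key"), on="key", how="inner")

# merge matches the key column in both frames directly.
merged = df1.merge(df2, on="key", how="inner")

print(joined)
```

Only the key `b` appears in both frames, so both inner joins return a single row.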

STATISTICS

These can all be applied to a series as well.

df.describe() - Summary statistics for numerical columns

df.mean() - Return the mean of all columns

df.corr() - finds the correlation between columns in a DataFrame.

df.count() - counts the number of non-null values in each DataFrame column.

df.max() - finds the highest value in each column.

df.min() - finds the lowest value in each column.

df.median() - finds the median of each column.

df.std() - finds the standard deviation of each column.
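
As a small worked example of the statistics methods, the sketch below uses a made-up frame in which column `b` is exactly twice column `a`, so the correlation between them should be 1. Note that `std()` computes the sample standard deviation (dividing by n - 1) by default.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2, 4, 6, 8]})

means = df.mean()      # per-column mean
medians = df.median()  # per-column median
stds = df.std()        # per-column sample standard deviation (ddof=1)
counts = df.count()    # per-column count of non-null values
corr = df.corr()       # pairwise correlation matrix

# The same methods work on a single Series.
a_max = df["a"].max()
a_min = df["a"].min()

print(corr)
```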

Data Science Cheat Sheet

Pandas

KEY

We’ll use shorthand in this cheat sheet

df - A pandas DataFrame object

s - A pandas Series object

IMPORTS

Import these to start

import pandas as pd

import numpy as np

LEARN DATA SCIENCE ONLINE

Start Learning For Free - www.dataquest.io



Uploader: bluse20
