Web Scraping + Data Analysis Hands-on Projects | Data Mining Analysis Project Resources - CSDN Library

22 files in total
ipynb: 8
py: 4
pptx: 3

Tags: web scraping, data analysis

Points required: 1 · 122 views · uploaded 2024-07-19 16:55:36 · 2.02MB ZIP


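爬虫+数据分析实战项目.zip (22 files; "web scraping + data analysis hands-on projects")
爬虫+数据分析实战项目/
    微信好友那些事/ (all about your WeChat friends)
        聊天机器人和性别预测.ipynb 14KB (chatbot and gender prediction)
        微信好友分析.ipynb 116KB (WeChat friends analysis)
        无敌Scikit_Learn小抄.pdf 126KB (ultimate Scikit-Learn cheat sheet)
        玩转itchat 微信好友那些事.pptx 358KB (mastering itchat: all about your WeChat friends)
        itchat.pkl 346KB
        .ipynb_checkpoints/
            微信好友分析-checkpoint.ipynb 72B
            聊天机器人和性别预测-checkpoint.ipynb 14KB
        friend.csv 24KB
    猫眼电影爬虫及分析/ (Maoyan movies scraper and analysis)
        maoyan.py 2KB
        猫眼爬虫及数据分析.pptx 360KB (Maoyan scraper and data analysis)
        maoyan.csv 11KB
        猫眼电影数据分析.ipynb 143KB (Maoyan movie data analysis)
        test.py 121B
        .ipynb_checkpoints/
            猫眼电影数据分析-checkpoint.ipynb 72B
    简书交友图片爬虫及颜值打分/ (Jianshu dating-post photo scraper and facial attractiveness scoring)
        test1.jpg 153KB
        1.jpg 501KB
        颜值打分.ipynb 5KB (attractiveness scoring)
        简书交友图片爬虫及颜值打分.pptx 356KB
        jianshu.py 1KB
        test.py 371B
        test2.jpg 131KB
        .ipynb_checkpoints/
            颜值打分-checkpoint.ipynb 5KB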

Python For Data Science Cheat Sheet: Scikit-Learn
Learn Python for data science Interactively at www.DataCamp.com

Scikit-learn is an open source Python library that implements a range of machine learning, preprocessing, cross-validation and visualization algorithms using a unified interface.

Loading The Data
Also see NumPy & Pandas

Your feature data needs to be numeric and stored as NumPy arrays or SciPy sparse matrices. Other types that are convertible to numeric arrays, such as pandas DataFrames, are also acceptable.

>>> import numpy as np
>>> X = np.random.random((10,5))
>>> y = np.array(['M','M','F','F','M','F','M','M','F','F'])  # one label per row of X
>>> X[X < 0.7] = 0
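The data requirements above can be checked in a few lines. The following is a minimal sketch, assuming NumPy, SciPy and scikit-learn are installed; the seeded generator and the choice of `KNeighborsClassifier(n_neighbors=3)` are illustrative assumptions, not part of the cheat sheet. It confirms that features and labels have matching row counts and shows that the same `fit`/`predict` calls accept both a dense NumPy array and a SciPy sparse matrix.

```python
import numpy as np
from scipy import sparse
from sklearn.neighbors import KNeighborsClassifier

# Same shape of data as "Loading The Data"; seeding (an assumption) makes reruns match.
rng = np.random.default_rng(0)
X = rng.random((10, 5))                  # 10 samples, 5 numeric features
y = np.array(['M', 'M', 'F', 'F', 'M', 'F', 'M', 'M', 'F', 'F'])  # one label per sample
X[X < 0.7] = 0                           # zero out small values, so the matrix is mostly zeros

# Every sample needs exactly one label, so the row counts must agree.
assert X.shape[0] == len(y)

# The same values stored as a SciPy CSR sparse matrix.
X_sparse = sparse.csr_matrix(X)

# Identical fit/predict calls on both input types (n_neighbors=3 is an illustrative choice).
dense_model = KNeighborsClassifier(n_neighbors=3).fit(X, y)
sparse_model = KNeighborsClassifier(n_neighbors=3).fit(X_sparse, y)
dense_pred = dense_model.predict(X)
sparse_pred = sparse_model.predict(X_sparse)

print(dense_model.classes_)   # string labels become the (sorted) class names
print(dense_pred)
print(sparse_pred)
```

Only the features have to be numeric: classifiers accept string labels such as 'M' and 'F' directly and report them back through `classes_`.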

A Basic Example

>>> from sklearn import neighbors, datasets, preprocessing
>>> from sklearn.model_selection import train_test_split
>>> from sklearn.metrics import accuracy_score
>>> iris = datasets.load_iris()
>>> X, y = iris.data[:, :2], iris.target
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=33)
>>> scaler = preprocessing.StandardScaler().fit(X_train)
>>> X_train = scaler.transform(X_train)
>>> X_test = scaler.transform(X_test)
>>> knn = neighbors.KNeighborsClassifier(n_neighbors=5)
>>> knn.fit(X_train, y_train)
>>> y_pred = knn.predict(X_test)
>>> accuracy_score(y_test, y_pred)

Training And Test Data

>>> from sklearn.model_selection import train_test_split
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

Preprocessing The Data

Standardization
>>> from sklearn.preprocessing import StandardScaler
>>> scaler = StandardScaler().fit(X_train)
>>> standardized_X = scaler.transform(X_train)
>>> standardized_X_test = scaler.transform(X_test)

Normalization
>>> from sklearn.preprocessing import Normalizer
>>> scaler = Normalizer().fit(X_train)
>>> normalized_X = scaler.transform(X_train)
>>> normalized_X_test = scaler.transform(X_test)

Binarization
>>> from sklearn.preprocessing import Binarizer
>>> binarizer = Binarizer(threshold=0.0).fit(X)
>>> binary_X = binarizer.transform(X)

Encoding Categorical Features
>>> from sklearn.preprocessing import LabelEncoder
>>> enc = LabelEncoder()
>>> y = enc.fit_transform(y)

Imputing Missing Values
>>> from sklearn.impute import SimpleImputer
>>> imp = SimpleImputer(missing_values=0, strategy='mean')  # imputes each column (feature) separately
>>> imp.fit_transform(X_train)

Generating Polynomial Features
>>> from sklearn.preprocessing import PolynomialFeatures
>>> poly = PolynomialFeatures(5)
>>> poly.fit_transform(X)

Create Your Model

Supervised Learning Estimators

Linear Regression
>>> from sklearn.linear_model import LinearRegression
>>> lr = LinearRegression()  # normalize= was removed in scikit-learn 1.2; scale inputs with StandardScaler instead

Support Vector Machines (SVM)
>>> from sklearn.svm import SVC
>>> svc = SVC(kernel='linear')

Naive Bayes
>>> from sklearn.naive_bayes import GaussianNB
>>> gnb = GaussianNB()

KNN
>>> from sklearn import neighbors
>>> knn = neighbors.KNeighborsClassifier(n_neighbors=5)

Unsupervised Learning Estimators

Principal Component Analysis (PCA)
>>> from sklearn.decomposition import PCA
>>> pca = PCA(n_components=0.95)

K Means
>>> from sklearn.cluster import KMeans
>>> k_means = KMeans(n_clusters=3, random_state=0)

Model Fitting

Supervised learning
>>> lr.fit(X, y)  # Fit the model to the data
>>> knn.fit(X_train, y_train)
>>> svc.fit(X_train, y_train)

Unsupervised Learning
>>> k_means.fit(X_train)  # Fit the model to the data
>>> pca_model = pca.fit_transform(X_train)  # Fit to data, then transform it

Prediction

Supervised Estimators
>>> y_pred = svc.predict(np.random.random((2,5)))  # Predict labels
>>> y_pred = lr.predict(X_test)  # Predict labels
>>> y_pred = knn.predict_proba(X_test)  # Estimate probability of a label

Unsupervised Estimators
>>> y_pred = k_means.predict(X_test)  # Predict labels in clustering algos

Evaluate Your Model’s Performance

Classification Metrics

Accuracy Score
>>> knn.score(X_test, y_test)  # Estimator score method
>>> from sklearn.metrics import accuracy_score  # Metric scoring functions
>>> accuracy_score(y_test, y_pred)

Classification Report
>>> from sklearn.metrics import classification_report  # Precision, recall, f1-score and support
>>> print(classification_report(y_test, y_pred))

Confusion Matrix
>>> from sklearn.metrics import confusion_matrix
>>> print(confusion_matrix(y_test, y_pred))

Regression Metrics

Mean Absolute Error
>>> from sklearn.metrics import mean_absolute_error
>>> y_true = [3, -0.5, 2]
>>> mean_absolute_error(y_true, y_pred)  # y_pred: predictions for the same 3 samples

Mean Squared Error
>>> from sklearn.metrics import mean_squared_error
>>> mean_squared_error(y_test, y_pred)

R² Score
>>> from sklearn.metrics import r2_score
>>> r2_score(y_true, y_pred)

Clustering Metrics

Adjusted Rand Index
>>> from sklearn.metrics import adjusted_rand_score
>>> adjusted_rand_score(y_true, y_pred)

Homogeneity
>>> from sklearn.metrics import homogeneity_score
>>> homogeneity_score(y_true, y_pred)

V-measure
>>> from sklearn.metrics import v_measure_score
>>> v_measure_score(y_true, y_pred)

Cross-Validation
>>> from sklearn.model_selection import cross_val_score
>>> print(cross_val_score(knn, X_train, y_train, cv=4))
>>> print(cross_val_score(lr, X, y, cv=2))

Tune Your Model

Grid Search
>>> from sklearn.model_selection import GridSearchCV
>>> params = {"n_neighbors": np.arange(1,3),
...           "metric": ["euclidean", "cityblock"]}
>>> grid = GridSearchCV(estimator=knn,
...                     param_grid=params)
>>> grid.fit(X_train, y_train)
>>> print(grid.best_score_)
>>> print(grid.best_estimator_.n_neighbors)

Randomized Parameter Optimization
>>> from sklearn.model_selection import RandomizedSearchCV
>>> params = {"n_neighbors": range(1,5),
...           "weights": ["uniform", "distance"]}
>>> rsearch = RandomizedSearchCV(estimator=knn,
...                              param_distributions=params,
...                              cv=4,
...                              n_iter=8,
...                              random_state=5)
>>> rsearch.fit(X_train, y_train)
>>> print(rsearch.best_score_)
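The cheat sheet scales the data and tunes the KNN model as separate steps. A common alternative, shown here as a sketch rather than the cheat sheet's own method, is to wrap the scaler and the classifier in a scikit-learn `Pipeline`, so that `GridSearchCV` refits the scaler inside every cross-validation fold instead of letting it see the validation data. The data and split follow A Basic Example; the parameter grid and `cv=4` are illustrative assumptions.

```python
import numpy as np
from sklearn import datasets
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Same data and split as "A Basic Example": the first two iris features.
iris = datasets.load_iris()
X, y = iris.data[:, :2], iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=33)

# The scaler is a pipeline step, so during cross-validation it is fitted
# only on the training folds and never sees the validation fold.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Step parameters are addressed as "<step name>__<parameter name>".
# The candidate values are an illustrative assumption.
params = {
    "knn__n_neighbors": [1, 3, 5, 7],
    "knn__weights": ["uniform", "distance"],
}
grid = GridSearchCV(pipe, param_grid=params, cv=4)
grid.fit(X_train, y_train)

# With the default refit=True, grid.predict uses the best pipeline,
# refitted on the whole training set.
y_pred = grid.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=np.unique(y))

print("best parameters:", grid.best_params_)
print(f"mean CV accuracy: {grid.best_score_:.3f}")
print(f"test accuracy:    {test_accuracy:.3f}")
print(cm)
```

Because the pipeline behaves as a single estimator, it can be passed to `cross_val_score(pipe, X_train, y_train, cv=4)` in the same way as `knn`.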


Uploader: 才华横溢caozy (2766 followers, 163 resources)
