LinearRegressionjupyter.zip (hearingvfw, machine learning)

36 files in total

ipynb: 14

py: 7

pyc: 6

Machine learning

Linear regression

181 views · uploaded 2022-07-14 19:52:45 · 8.71MB ZIP

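Linear regression is a basic but important predictive model in statistics and machine learning. It models a linear relationship between a dependent variable (the target) and one or more independent variables (the features). Judging by its name, the "Linear Regression jupyter.zip" archive contains Jupyter notebooks that perform linear regression analysis in Python. The file list below shows that, besides notebooks such as LinearRegression.ipynb and Gradient Descent.ipynb, it includes hand-written NumPy modules (LinearRegression.py, SimpleLinearRegression.py, metrics.py) and a scikit-learn (sklearn) notebook, scikitLearnKnn.ipynb.

To start with the basic idea: linear regression looks for the straight line (simple linear regression) or hyperplane (multiple linear regression) that best fits the data points. That line or hyperplane is chosen by minimizing the residual sum of squares, i.e. the sum of the squared differences between the predicted and the actual values. In machine learning, linear regression is typically used for prediction tasks whose output is a continuous number.

Python's scikit-learn library is one of the go-to tools in data science and machine learning. It offers a concise, efficient interface to many algorithms, including linear regression, which is built and trained with the `LinearRegression` class. The usual steps are:

1. Import the required libraries: `numpy` for numerical computation, `pandas` for handling the dataset, `matplotlib` for visualization and, most importantly, `sklearn.linear_model.LinearRegression` for the model itself.
2. Preprocess the data: load it into a DataFrame, check for missing values, scale the features if needed, and separate out the target variable.
3. Create the model: instantiate a `LinearRegression` object.
4. Train the model: pass the training data to the `fit()` method so the model learns the relationship in the data.
5. Predict: call `predict()` on new data to obtain predictions.
6. Evaluate: measure the model's performance with metrics such as the mean squared error (MSE) and the coefficient of determination R².
7. Visualize: plot learning curves, residual plots, and scatter plots of actual versus predicted values to understand how the model behaves.

In a Jupyter notebook this process is usually accompanied by explanations of the code and by charts, which makes it easy to follow and learn from. Working through the project shows how to do linear regression in practice with Python and sklearn, and how to evaluate and improve the model.

The "hearingvfw" tag may indicate that the project focuses on hearing- or sound-related data, in which case linear regression might be used to predict some acoustic property or the result of a hearing test. The "machine learning" tag emphasizes that this is an automated process of learning from data.

Overall, the archive offers a hands-on linear regression example covering the full workflow from data processing to model training, which makes it a useful resource for beginners and for data scientists who want to sharpen their skills. Studying and practicing with it will help you master the core concepts of linear regression and how to apply them in Python.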
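The seven steps above can be sketched with scikit-learn as follows. This is a minimal illustration, not the notebook's own code: the data is synthetic (generated with NumPy from an assumed relationship y = 4 + 3·x1 - 2·x2 plus noise), the pandas loading and matplotlib plotting steps are left out to keep it short, and `train_test_split` is used to hold out a test set for evaluation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data (assumption for illustration): y = 4 + 3*x1 - 2*x2 + Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, size=(200, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + 4 + rng.normal(scale=0.5, size=200)

# Hold out 25% of the samples as a test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Create the model and fit it to the training data
model = LinearRegression()
model.fit(X_train, y_train)

# Predict on unseen data and evaluate with MSE and R^2
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("coef:", model.coef_, "intercept:", model.intercept_)
print(f"MSE: {mse:.3f}  R^2: {r2:.3f}")
```

Because the synthetic data comes from known coefficients, `model.coef_` and `model.intercept_` should land close to 3, -2 and 4, and R² close to 1; on real data these numbers depend entirely on the dataset.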

Linear Regression jupyter.zip (36 files)

Linear Regression jupyter

Untitled.ipynb 9KB

knn.ipynb 555B

img

MSE.png 6KB

gongshi.png 63KB

MAE.png 6KB

RMSE.png 10KB

simple linear regression.ipynb 28KB

preprocessing.py 1KB

metrics.py 1KB

__pycache__

SimpleLinearRegression.cpython-36.pyc 2KB

LinearRegression.cpython-36.pyc 4KB

metrics.cpython-36.pyc 1KB

KNN.cpython-36.pyc 3KB

Select_model.cpython-36.pyc 845B

__init__.cpython-36.pyc 144B

SimpleLinearRegression.py 2KB

LinearRegression.py 5KB

__init__.py 0B

.idea

misc.xml 190B

modules.xml 256B

ml.iml 493B

workspace.xml 19KB

KNN.py 3KB

Select_model.py 758B

Gradient Descent.ipynb 32KB

scikitLearnKnn.ipynb 8KB

MSE-vs-MAE.ipynb 57KB

benchmarks.mat 8.5MB

LinearRegression.ipynb 14KB

.ipynb_checkpoints

MSE-vs-MAE-checkpoint.ipynb 57KB

scikitLearnKnn-checkpoint.ipynb 7KB

LinearRegression-checkpoint.ipynb 14KB

simple linear regression-checkpoint.ipynb 29KB

Untitled-checkpoint.ipynb 4KB

knn-checkpoint.ipynb 72B

Gradient Descent-checkpoint.ipynb 72B
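The img folder in the listing above holds MSE.png, RMSE.png and MAE.png, and the archive includes an MSE-vs-MAE.ipynb notebook, but their contents are not shown on this page, nor is metrics.py. As a point of reference, here is a NumPy sketch of the standard definitions of these error metrics plus R². The function names are hypothetical and are not claimed to match the archive's metrics.py; the sample values are made up for illustration.

```python
import numpy as np


def _as_arrays(y_true, y_pred):
    # Convert inputs to float arrays and check that their shapes match
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    assert y_true.shape == y_pred.shape, "y_true and y_pred must have the same shape"
    return y_true, y_pred


def mean_squared_error(y_true, y_pred):
    """MSE: mean of the squared residuals."""
    y_true, y_pred = _as_arrays(y_true, y_pred)
    return np.mean((y_true - y_pred) ** 2)


def root_mean_squared_error(y_true, y_pred):
    """RMSE: square root of the MSE, expressed in the same units as y."""
    return np.sqrt(mean_squared_error(y_true, y_pred))


def mean_absolute_error(y_true, y_pred):
    """MAE: mean of the absolute residuals; less sensitive to large errors than MSE."""
    y_true, y_pred = _as_arrays(y_true, y_pred)
    return np.mean(np.abs(y_true - y_pred))


def r2_score(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot (1.0 means a perfect fit)."""
    y_true, y_pred = _as_arrays(y_true, y_pred)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1 - ss_res / ss_tot


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(mean_squared_error(y_true, y_pred))       # 0.375
print(root_mean_squared_error(y_true, y_pred))  # ≈ 0.612
print(mean_absolute_error(y_true, y_pred))      # 0.5
print(r2_score(y_true, y_pred))                 # ≈ 0.949
```

In this example the single residual of 1.0 accounts for 1.0 of the 1.5 total squared error (two thirds of the MSE) but only 1.0 of the 2.0 total absolute error (half of the MAE), which shows how MSE weights large errors more heavily than MAE.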

# coding=utf-8
import numpy as np
from .metrics import r2_score


class LinearRegression:

    def __init__(self):
        """Initialize the Linear Regression model."""
        # coef_: θ1..θn, intercept_: θ0, _theta: the full θ vector
        self.coef_ = None
        self.intercept_ = None
        self._theta = None

    def fit_normal(self, X_train, y_train):
        """Train the Linear Regression model on X_train, y_train using the normal equation."""
        # The number of samples must equal the number of labels
        assert X_train.shape[0] == y_train.shape[0], \
            "the size of X_train must be equal to the size of y_train"

        # Build X_b: the first column is all ones (for θ0), the rest are the features of X
        X_b = np.hstack([np.ones((len(X_train), 1)), X_train])
        # θ = (X_b^T X_b)^-1 X_b^T y; np.linalg.inv computes the matrix inverse
        self._theta = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y_train)

        self.intercept_ = self._theta[0]
        self.coef_ = self._theta[1:]
        return self

    def fit_gd(self, X_train, y_train, eta=0.01, n_iters=1e4):
        """Train the Linear Regression model on X_train, y_train using batch gradient descent."""
        assert X_train.shape[0] == y_train.shape[0], \
            "the size of X_train must be equal to the size of y_train"

        def J(theta, X_b, y):
            # Loss function: mean squared error; return inf if the computation fails
            try:
                return np.sum((y - X_b.dot(theta)) ** 2) / len(y)
            except Exception:
                return float('inf')

        def dJ(theta, X_b, y):
            # Loop version of the gradient, kept for reference:
            # res = np.empty(len(theta))
            # res[0] = np.sum(X_b.dot(theta) - y)
            # for i in range(1, len(theta)):
            #     res[i] = (X_b.dot(theta) - y).dot(X_b[:, i])
            # return res * 2 / len(X_b)
            # Vectorized gradient: 2/m * X_b^T (X_b θ - y)
            return X_b.T.dot(X_b.dot(theta) - y) * 2. / len(X_b)

        def gradient_descent(X_b, y, initial_theta, eta, n_iters=1e4, epsilon=1e-8):
            theta = initial_theta
            cur_iter = 0
            while cur_iter < n_iters:
                gradient = dJ(theta, X_b, y)
                last_theta = theta
                theta = theta - eta * gradient
                # Stop once the loss changes by less than epsilon
                if abs(J(theta, X_b, y) - J(last_theta, X_b, y)) < epsilon:
                    break
                cur_iter += 1
            return theta

        X_b = np.hstack([np.ones((len(X_train), 1)), X_train])
        initial_theta = np.zeros(X_b.shape[1])
        self._theta = gradient_descent(X_b, y_train, initial_theta, eta, n_iters)

        self.intercept_ = self._theta[0]
        self.coef_ = self._theta[1:]
        return self

    def fit_sgd(self, X_train, y_train, n_iters=5, t0=5, t1=50):
        """Train the Linear Regression model on X_train, y_train using stochastic gradient descent."""
        assert X_train.shape[0] == y_train.shape[0], \
            "the size of X_train must be equal to the size of y_train"
        # n_iters is the number of passes (epochs) over the whole training set
        assert n_iters >= 1

        def dJ_sgd(theta, X_b_i, y_i):
            # Gradient of the squared error for a single sample
            return X_b_i * (X_b_i.dot(theta) - y_i) * 2.

        def sgd(X_b, y, initial_theta, n_iters, t0=5, t1=50):

            def learning_rate(t):
                # Decaying learning rate: t0 / (t + t1)
                return t0 / (t + t1)

            theta = initial_theta
            m = len(X_b)
            for cur_iter in range(n_iters):
                # Shuffle the samples at the start of every epoch
                indexes = np.random.permutation(m)
                X_b_new = X_b[indexes]
                y_new = y[indexes]
                for i in range(m):
                    gradient = dJ_sgd(theta, X_b_new[i], y_new[i])
                    theta = theta - learning_rate(cur_iter * m + i) * gradient
            return theta

        X_b = np.hstack([np.ones((len(X_train), 1)), X_train])
        initial_theta = np.random.randn(X_b.shape[1])
        self._theta = sgd(X_b, y_train, initial_theta, n_iters, t0, t1)

        self.intercept_ = self._theta[0]
        self.coef_ = self._theta[1:]
        return self

    def predict(self, X_predict):
        """Given the dataset X_predict, return the vector of predictions for X_predict."""
        assert self.intercept_ is not None and self.coef_ is not None, \
            "must fit before predict!"
        # The number of features must equal the number of coefficients θ1..θn
        assert X_predict.shape[1] == len(self.coef_), \
            "the feature number of X_predict must be equal to X_train"

        X_b = np.hstack([np.ones((len(X_predict), 1)), X_predict])
        return X_b.dot(self._theta)

    def score(self, X_test, y_test):
        """Return the R² score of the current model on the test set X_test, y_test."""
        y_predict = self.predict(X_test)
        return r2_score(y_test, y_predict)

    def __repr__(self):
        return "LinearRegression()"
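The class above pulls in `r2_score` through a relative import (`from .metrics import r2_score`), so it only runs inside its package. As a self-contained sketch, the three training strategies it implements (normal equation, batch gradient descent, stochastic gradient descent with a t0 / (t + t1) learning rate) can be re-expressed as plain functions and checked against each other on synthetic data. The function names, the synthetic dataset, and the use of `np.linalg.solve` and a seeded `np.random.default_rng` in place of `np.linalg.inv` and the global `np.random` are assumptions of this illustration, not the archive's code.

```python
import numpy as np


def add_bias(X):
    """Prepend a column of ones so that theta[0] plays the role of the intercept."""
    return np.hstack([np.ones((len(X), 1)), X])


def fit_normal(X, y):
    """Normal equation: solve (X_b^T X_b) theta = X_b^T y."""
    X_b = add_bias(X)
    # Same theta as inv(X_b^T X_b) X_b^T y, without forming the inverse explicitly
    return np.linalg.solve(X_b.T @ X_b, X_b.T @ y)


def fit_gd(X, y, eta=0.01, n_iters=10_000, epsilon=1e-8):
    """Batch gradient descent on the MSE loss; stops when the loss changes by < epsilon."""
    X_b = add_bias(X)
    theta = np.zeros(X_b.shape[1])

    def J(t):
        return np.mean((y - X_b @ t) ** 2)

    for _ in range(n_iters):
        gradient = X_b.T @ (X_b @ theta - y) * 2.0 / len(X_b)
        last_theta = theta
        theta = theta - eta * gradient
        if abs(J(theta) - J(last_theta)) < epsilon:
            break
    return theta


def fit_sgd(X, y, n_iters=5, t0=5, t1=50, seed=0):
    """SGD, one sample per step; n_iters is the number of passes over the data."""
    rng = np.random.default_rng(seed)
    X_b = add_bias(X)
    m = len(X_b)
    theta = rng.standard_normal(X_b.shape[1])
    for epoch in range(n_iters):
        # Visit the samples in a new random order every epoch
        for i, j in enumerate(rng.permutation(m)):
            gradient = X_b[j] * (X_b[j] @ theta - y[j]) * 2.0
            theta = theta - t0 / (epoch * m + i + t1) * gradient
    return theta


# Synthetic data (assumption for illustration): y = 4 + 3*x1 - 2*x2 + small noise
rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(500, 2))
y = 4 + 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=500)

theta_normal = fit_normal(X, y)
theta_gd = fit_gd(X, y)
theta_sgd = fit_sgd(X, y)
print("normal equation:", theta_normal)
print("batch GD:       ", theta_gd)
print("SGD:            ", theta_sgd)
```

With features of similar scale, as here, the three estimates should agree closely. Gradient descent and SGD trade the exact solve for iterative updates, which pays off when X_b^T X_b is large; with unscaled features, `fit_gd` typically needs a smaller `eta` to converge, which is why the preprocessing step often includes feature scaling.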