Stanford Machine Learning Lecture Notes (Complete), Stanford-Machine-Leaning resource - CSDN Library

Points required: 11 · 104 views · Uploaded 2017-05-18 16:43:17 · 2.19 MB PDF

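The file is an overview of the Stanford machine learning course lecture notes. It covers the basic concepts of supervised learning and gives an introduction to linear regression. These points are summarized below.

The notes open with examples of supervised learning problems, using a dataset of house living areas and prices to illustrate the setting: we try to predict the price of a house from its living area. This introduces a basic concept of supervised learning, the training set, which consists of a series of training examples. Each training example is an input-output pair: the input is what we observe, and the output is the variable we want to predict.

To describe the supervised learning problem formally, the notes define the input variables, also called input features, written $x^{(i)}$, and the output variable or target variable, written $y^{(i)}$. A pair $(x^{(i)}, y^{(i)})$ is called a training example, and the whole dataset is called the training set, written $\{(x^{(i)}, y^{(i)});\ i = 1, \ldots, m\}$. The superscript $(i)$ is only an index and has nothing to do with exponentiation.

The goal of supervised learning is to use the training set to learn a function $h$ that predicts the output $y$ for a given input $x$, that is, to find a function $h : X \to Y$. For historical reasons this function is called a hypothesis. In the housing example, we try to learn a function that predicts the price from the living area. When the target variable is continuous, as with house prices, the problem is called a regression problem. When the target variable can take only a small number of discrete values, for example predicting from the living area whether a dwelling is a house or an apartment, it is called a classification problem.

The notes then turn to linear regression. To make the housing example more interesting, they introduce a richer dataset that records the number of bedrooms in addition to the living area. Here the input feature $x$ is a two-dimensional vector in $\mathbb{R}^2$; for example, $x_1^{(i)}$ is the living area of the $i$-th house. Linear regression looks for a function $h$ that describes a linear relationship between the input features and the target variable. It is usually written $h_\theta(x) = \theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n$, where $n$ is the number of input features. The learning algorithm has to choose values of the parameters $\theta$ that best capture the relationship between $x$ and $y$ in the training set.

The notes may also describe how to learn the best $\theta$ by minimizing a cost function. A common choice is the mean squared error (MSE), the average of the squared differences between predicted and actual values. Minimizing this cost function with an optimization algorithm such as gradient descent yields parameters $\theta$ for which the learned hypothesis $h_\theta(x)$ fits the data as closely as possible.

In summary, the core concepts are:

- The definition of the supervised learning problem, including input features, target variables, and the training set.
- The motivation for and definition of the learned function $h$, and the different forms it takes in regression and classification problems.
- Linear regression as the most basic regression technique: it learns a linear relationship from the training data and selects the best-fitting model by a criterion such as least mean squared error.
- The introduction of a cost function and the choice of an optimization algorithm, which together drive the adjustment of the model parameters during training.

This material covers the basic concepts of supervised learning and linear regression, and may also touch on parameter estimation, model evaluation, and optimization strategies, laying a foundation for regression analysis and model building in machine learning.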
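As a minimal sketch of the linear hypothesis and the mean-squared-error cost described in this summary: the function names `hypothesis` and `mean_squared_error`, the bedroom counts, and the parameter values are all assumptions made for illustration, not taken from the notes.

```python
import numpy as np

def hypothesis(theta, x):
    """Linear hypothesis h_theta(x) = theta_0 + theta_1*x_1 + ... + theta_n*x_n."""
    return theta[0] + np.dot(theta[1:], x)

def mean_squared_error(theta, X, y):
    """Average squared gap between predictions and targets over the training set."""
    predictions = np.array([hypothesis(theta, x) for x in X])
    return np.mean((predictions - y) ** 2)

# Illustrative inputs: each row is (living area, number of bedrooms).
# The bedroom counts and the parameter vector below are made up for this sketch.
X = np.array([[2104.0, 3.0], [1600.0, 3.0], [2400.0, 3.0]])
y = np.array([400.0, 330.0, 369.0])   # prices in $1000s
theta = np.array([50.0, 0.1, 20.0])   # hypothetical theta_0, theta_1, theta_2

print(hypothesis(theta, X[0]))          # prediction for the first house
print(mean_squared_error(theta, X, y))  # cost of this particular theta
```

A learning algorithm would search for the `theta` that makes the second number as small as possible.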


CS229 Lecture notes

Andrew Ng

Supervised learning

Let's start by talking about a few examples of supervised learning problems.

Suppose we have a dataset giving the living areas and prices of 47 houses

from Portland, Oregon:

Living area (feet²)    Price (1000$s)
2104                   400
1600                   330
2400                   369
1416                   232
3000                   540

We can plot this data:

[Figure: scatter plot titled "housing prices"; x-axis: square feet, y-axis: price (in $1000)]

Given data like this, how can we learn to predict the prices of other houses

in Portland, as a function of the size of their living areas?

For a single training example, this gives the update rule:

$$\theta_j := \theta_j + \alpha \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}.$$

The rule is called the LMS update rule (LMS stands for “least mean squares”), and is also known as the Widrow-Hoff learning rule. This rule has several properties that seem natural and intuitive. For instance, the magnitude of the update is proportional to the error term $(y^{(i)} - h_\theta(x^{(i)}))$; thus, for instance, if we are encountering a training example on which our prediction nearly matches the actual value of $y^{(i)}$, then we find that there is little need to change the parameters; in contrast, a larger change to the parameters will be made if our prediction $h_\theta(x^{(i)})$ has a large error (i.e., if it is very far from $y^{(i)}$).
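The single-example update above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than code from the notes: the function name `lms_update`, the learning rate, and the target values are hypothetical, and the input vector is assumed to carry an intercept entry $x_0 = 1$.

```python
import numpy as np

def lms_update(theta, x, y, alpha):
    """One LMS (Widrow-Hoff) step on a single training example (x, y).

    x is assumed to include the intercept term x_0 = 1 as its first entry.
    """
    error = y - np.dot(theta, x)       # y^(i) - h_theta(x^(i))
    return theta + alpha * error * x   # theta_j := theta_j + alpha * error * x_j

theta = np.zeros(2)
x = np.array([1.0, 2.104])   # [x_0, living area in 1000s of square feet]
alpha = 0.1                  # hypothetical learning rate

# With theta = 0 the prediction is 0, so the error equals the target value.
step_far = lms_update(theta, x, 400.0, alpha) - theta    # large error
step_near = lms_update(theta, x, 4.0, alpha) - theta     # small error

# The error is 100 times larger in the first case, and so is the step.
print(np.linalg.norm(step_far) / np.linalg.norm(step_near))
```

The printed ratio reflects the property described above: the size of the parameter change scales with the prediction error.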

We’d derived the LMS rule for when there was only a single training

example. There are two ways to modify this method for a training set of

more than one example. The first is to replace it with the following algorithm:

Repeat until convergence {

    $\theta_j := \theta_j + \alpha \sum_{i=1}^{m} \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)}$    (for every j).

}
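A minimal sketch of this batch update, run on the five (living area, price) rows listed earlier. Several choices here are assumptions of the sketch and not part of the notes: the function name `batch_gradient_descent`, the learning rate, the fixed iteration count standing in for "repeat until convergence", and the standardization of the feature, which is added only so that a simple fixed learning rate behaves well.

```python
import numpy as np

def batch_gradient_descent(X, y, alpha, num_iters):
    """Batch gradient descent for least-squares linear regression.

    X has shape (m, n+1) with a leading column of ones (x_0 = 1).
    """
    theta = np.zeros(X.shape[1])
    for _ in range(num_iters):
        errors = y - X @ theta                   # y^(i) - h_theta(x^(i)) for every i
        theta = theta + alpha * (X.T @ errors)   # sum over i; every theta_j updated together
    return theta

# The five (living area, price) pairs from the table.
area = np.array([2104.0, 1600.0, 2400.0, 1416.0, 3000.0])
price = np.array([400.0, 330.0, 369.0, 232.0, 540.0])

# Assumption of this sketch: standardize the feature before fitting.
area_scaled = (area - area.mean()) / area.std()
X = np.column_stack([np.ones_like(area_scaled), area_scaled])

theta = batch_gradient_descent(X, price, alpha=0.1, num_iters=200)
print(theta)   # [intercept, slope with respect to the standardized area]
```

Every iteration touches all m examples before the parameters change, which is what distinguishes the batch variant from the single-example rule.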

The reader can easily verify that the quantity in the summation in the update

rule above is just $-\partial J(\theta)/\partial\theta_j$ (for the original definition of J). So, this is

simply gradient descent on the original cost function J. This method looks

at every example in the entire training set on every step, and is called batch

gradient descent. Note that, while gradient descent can be susceptible

to local minima in general, the optimization problem we have posed here

for linear regression has only one global, and no other local, optima; thus

gradient descent always converges (assuming the learning rate α is not too

large) to the global minimum. Indeed, J is a convex quadratic function.
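The convexity claim can be checked with a short derivation. It assumes the least-squares cost $J(\theta) = \frac{1}{2}\sum_{i=1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2$, whose definition is not shown in this excerpt, and introduces the matrix $X$ (rows $(x^{(i)})^T$) and the vector $\vec{y}$ of targets as notation for this sketch.

```latex
\begin{align*}
J(\theta) &= \frac{1}{2}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2
           = \frac{1}{2}\,\lVert X\theta - \vec{y}\,\rVert^2, \\
\nabla_\theta J(\theta) &= X^T (X\theta - \vec{y}), \\
\nabla_\theta^2 J(\theta) &= X^T X, \\
v^T X^T X\, v &= \lVert Xv \rVert^2 \;\ge\; 0 \quad \text{for every } v .
\end{align*}
```

The Hessian $X^T X$ is positive semidefinite and does not depend on $\theta$, so J is a convex quadratic and any point where the gradient vanishes is a global minimum. The minimizer is unique when $X$ has full column rank.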

Here is an example of gradient descent as it is run to minimize a quadratic

function.

We use the notation “a := b” to denote an operation (in a computer program) in

which we set the value of a variable a to be equal to the value of b. In other words, this

operation overwrites a with the value of b. In contrast, we will write “a = b” when we are

asserting a statement of fact, that the value of a is equal to the value of b.


Uploaded by: icesongqiang
