Stanford machine learning lecture notes, an excellent introductory course on machine learning; course videos: http://open.163.com/special/opencourse/machinelearning.html
CS229 Lecture notes
Andrew Ng
Supervised learning
Let's start by talking about a few examples of supervised learning problems.
Suppose we have a dataset giving the living areas and prices of 47 houses
from Portland, Oregon:
Living area (feet²)   Price (1000$s)
2104                  400
1600                  330
2400                  369
1416                  232
3000                  540
...                   ...
We can plot this data:
[Figure: "housing prices" scatter plot of the data; x-axis: square feet, y-axis: price (in $1000).]
Given data like this, how can we learn to predict the prices of other houses
in Portland, as a function of the size of their living areas?
CS229 Winter 2003
To establish notation for future use, we'll use $x^{(i)}$ to denote the "input" variables (living area in this example), also called input features, and $y^{(i)}$ to denote the "output" or target variable that we are trying to predict (price). A pair $(x^{(i)}, y^{(i)})$ is called a training example, and the dataset that we'll be using to learn (a list of m training examples $\{(x^{(i)}, y^{(i)});\ i = 1, \ldots, m\}$) is called a training set. Note that the superscript "(i)" in the notation is simply an index into the training set, and has nothing to do with exponentiation. We will also use $X$ to denote the space of input values, and $Y$ the space of output values. In this example, $X = Y = \mathbb{R}$.
To describe the supervised learning problem slightly more formally, our goal is, given a training set, to learn a function $h : X \to Y$ so that $h(x)$ is a "good" predictor for the corresponding value of y. For historical reasons, this function h is called a hypothesis. Seen pictorially, the process is therefore like this:
[Diagram: the training set is fed into the learning algorithm, which outputs a hypothesis h; x (the living area of a house) is fed into h, which outputs the predicted y (the predicted price of the house).]
When the target variable that we’re trying to predict is continuous, such
as in our housing example, we call the learning problem a regression problem. When y can take on only a small number of discrete values (such as
if, given the living area, we wanted to predict if a dwelling is a house or an
apartment, say), we call it a classification problem.
Part I
Linear Regression
To make our housing example more interesting, let's consider a slightly richer
dataset in which we also know the number of bedrooms in each house:
Living area (feet²)   #bedrooms   Price (1000$s)
2104                  3           400
1600                  3           330
2400                  3           369
1416                  2           232
3000                  4           540
...                   ...         ...
Here, the x's are two-dimensional vectors in $\mathbb{R}^2$. For instance, $x_1^{(i)}$ is the living area of the i-th house in the training set, and $x_2^{(i)}$ is its number of bedrooms. (In general, when designing a learning problem, it will be up to you to decide what features to choose, so if you are out in Portland gathering housing data, you might also decide to include other features such as whether each house has a fireplace, the number of bathrooms, and so on. We'll say more about feature selection later, but for now let's take the features as given.)
To perform supervised learning, we must decide how we're going to represent functions/hypotheses h in a computer. As an initial choice, let's say we decide to approximate y as a linear function of x:

$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$$
Here, the $\theta_i$'s are the parameters (also called weights) parameterizing the space of linear functions mapping from $X$ to $Y$. When there is no risk of confusion, we will drop the θ subscript in $h_\theta(x)$, and write it more simply as h(x). To simplify our notation, we also introduce the convention of letting $x_0 = 1$ (this is the intercept term), so that

$$h(x) = \sum_{i=0}^{n} \theta_i x_i = \theta^T x,$$

where on the right-hand side above we are viewing θ and x both as vectors, and here n is the number of input variables (not counting $x_0$).
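As a minimal sketch of this hypothesis in code (pure Python; the θ values below are hypothetical, not fitted to the data):

```python
def h(theta, x):
    """Linear hypothesis h_theta(x) = theta^T x, where x already
    includes the intercept entry x_0 = 1."""
    return sum(t * xi for t, xi in zip(theta, x))

# Hypothetical parameters [theta_0, theta_1, theta_2], for illustration only:
theta = [50.0, 0.1, 20.0]
x = [1.0, 2104.0, 3.0]  # [x_0 = 1, living area, #bedrooms]
print(h(theta, x))      # predicted price in $1000s (about 320.4)
```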
Now, given a training set, how do we pick, or learn, the parameters θ?
One reasonable method seems to be to make h(x) close to y, at least for
the training examples we have. To formalize this, we will define a function that measures, for each value of the θ's, how close the $h(x^{(i)})$'s are to the corresponding $y^{(i)}$'s. We define the cost function:

$$J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 .$$

If you've seen linear regression before, you may recognize this as the familiar least-squares cost function that gives rise to the ordinary least squares regression model. Whether or not you have seen it previously, let's keep going, and we'll eventually show this to be a special case of a much broader family of algorithms.
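A minimal sketch of this cost function in pure Python, using the five training examples from the bedrooms table and an all-zero θ (a hypothetical starting point):

```python
def h(theta, x):
    """Linear hypothesis: theta^T x, with x[0] = 1 as the intercept term."""
    return sum(t * xi for t, xi in zip(theta, x))

def J(theta, xs, ys):
    """Least-squares cost: (1/2) * sum of squared prediction errors."""
    return 0.5 * sum((h(theta, x) - y) ** 2 for x, y in zip(xs, ys))

# Training set: x = [x_0 = 1, living area, #bedrooms], y = price in $1000s
xs = [[1, 2104, 3], [1, 1600, 3], [1, 2400, 3], [1, 1416, 2], [1, 3000, 4]]
ys = [400, 330, 369, 232, 540]

print(J([0.0, 0.0, 0.0], xs, ys))  # with theta = 0, this is (1/2) * sum(y^2) = 375242.5
```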
1 LMS algorithm
We want to choose θ so as to minimize J(θ). To do so, let's use a search algorithm that starts with some "initial guess" for θ, and that repeatedly changes θ to make J(θ) smaller, until hopefully we converge to a value of θ that minimizes J(θ). Specifically, let's consider the gradient descent algorithm, which starts with some initial θ, and repeatedly performs the update:

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta).$$

(This update is simultaneously performed for all values of j = 0, ..., n.) Here, α is called the learning rate. This is a very natural algorithm that repeatedly takes a step in the direction of steepest decrease of J.
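The update rule above can be sketched on a simple one-dimensional example (pure Python; the function, starting point, and learning rate are hypothetical choices for illustration):

```python
def gradient_descent(dJ, theta, alpha, num_steps):
    """Repeatedly step against the gradient: theta := theta - alpha * dJ(theta)."""
    for _ in range(num_steps):
        theta = theta - alpha * dJ(theta)
    return theta

# Minimize J(theta) = (theta - 3)^2, whose derivative is dJ(theta) = 2 * (theta - 3).
theta = gradient_descent(lambda t: 2 * (t - 3), theta=0.0, alpha=0.1, num_steps=50)
print(theta)  # approaches the minimizer theta = 3
```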
In order to implement this algorithm, we have to work out what is the partial derivative term on the right hand side. Let's first work it out for the case where we have only one training example (x, y), so that we can neglect the sum in the definition of J. We have:

$$\begin{aligned}
\frac{\partial}{\partial \theta_j} J(\theta) &= \frac{\partial}{\partial \theta_j} \frac{1}{2} \left( h_\theta(x) - y \right)^2 \\
&= 2 \cdot \frac{1}{2} \left( h_\theta(x) - y \right) \cdot \frac{\partial}{\partial \theta_j} \left( h_\theta(x) - y \right) \\
&= \left( h_\theta(x) - y \right) \cdot \frac{\partial}{\partial \theta_j} \left( \sum_{i=0}^{n} \theta_i x_i - y \right) \\
&= \left( h_\theta(x) - y \right) x_j
\end{aligned}$$
For a single training example, this gives the update rule:¹

$$\theta_j := \theta_j + \alpha \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)} .$$
This rule is called the LMS update rule (LMS stands for "least mean squares"), and is also known as the Widrow-Hoff learning rule. This rule has several properties that seem natural and intuitive. For instance, the magnitude of the update is proportional to the error term $(y^{(i)} - h_\theta(x^{(i)}))$; thus, for instance, if we are encountering a training example on which our prediction nearly matches the actual value of $y^{(i)}$, then we find that there is little need to change the parameters; in contrast, a larger change to the parameters will be made if our prediction $h_\theta(x^{(i)})$ has a large error (i.e., if it is very far from $y^{(i)}$).
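The single-example update can be sketched as follows (pure Python; the θ values, training example, and learning rate are hypothetical):

```python
def h(theta, x):
    """Linear hypothesis: theta^T x, with x[0] = 1 as the intercept term."""
    return sum(t * xi for t, xi in zip(theta, x))

def lms_update(theta, x, y, alpha):
    """One LMS (Widrow-Hoff) step on a single training example (x, y).

    Each theta_j moves by alpha * (y - h_theta(x)) * x_j, so the step
    size is proportional to the prediction error."""
    error = y - h(theta, x)
    return [t + alpha * error * xj for t, xj in zip(theta, x)]

theta = [0.0, 0.0]  # hypothetical initial guess
theta = lms_update(theta, [1.0, 2.0], 5.0, alpha=0.1)
print(theta)  # error = 5, so theta becomes [0.5, 1.0]
```

A second call with the same example would take a smaller step, since the error has shrunk.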
We derived the LMS rule for when there was only a single training example. There are two ways to modify this method for a training set of more than one example. The first is to replace it with the following algorithm:
Repeat until convergence {

$$\theta_j := \theta_j + \alpha \sum_{i=1}^{m} \left( y^{(i)} - h_\theta(x^{(i)}) \right) x_j^{(i)} \quad \text{(for every } j\text{)}.$$

}
The reader can easily verify that the quantity in the summation in the update rule above is just $\partial J(\theta)/\partial \theta_j$ (for the original definition of J). So, this is simply gradient descent on the original cost function J. This method looks at every example in the entire training set on every step, and is called batch gradient descent. Note that, while gradient descent can be susceptible to local minima in general, the optimization problem we have posed here for linear regression has only one global optimum, and no other local optima; thus gradient descent always converges (assuming the learning rate α is not too large) to the global minimum. Indeed, J is a convex quadratic function.
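A minimal sketch of batch gradient descent on the data from the first table (pure Python; the learning rate, number of steps, and the rescaling of living area into units of 1000 ft² are hypothetical choices, made so that a fixed α converges):

```python
def h(theta, x):
    """Linear hypothesis: theta^T x, with x[0] = 1 as the intercept term."""
    return sum(t * xi for t, xi in zip(theta, x))

def batch_gradient_descent(theta, xs, ys, alpha, num_steps):
    """Each step updates every theta_j simultaneously by
    alpha * sum_i (y_i - h_theta(x_i)) * x_ij, i.e. gradient descent on J."""
    n = len(theta)
    for _ in range(num_steps):
        errors = [y - h(theta, x) for x, y in zip(xs, ys)]
        theta = [theta[j] + alpha * sum(e * x[j] for e, x in zip(errors, xs))
                 for j in range(n)]
    return theta

# Housing data from the first table; living area rescaled to units of
# 1000 ft^2 (a hypothetical scaling choice) so a fixed alpha converges.
xs = [[1, 2.104], [1, 1.600], [1, 2.400], [1, 1.416], [1, 3.000]]
ys = [400, 330, 369, 232, 540]

theta = batch_gradient_descent([0.0, 0.0], xs, ys, alpha=0.01, num_steps=10000)
print(theta)                  # settles near the least-squares fit
print(h(theta, [1, 2.104]))   # predicted price for a 2104 ft^2 house, in $1000s
```

With these settings the parameters settle near the least-squares fit for this tiny dataset; in practice one would monitor J(θ) across steps to decide when to stop.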
Here is an example of gradient descent as it is run to minimize a quadratic
function.
1
We use the notation “a := b” to denote an operation (in a computer program) in
which we set the value of a variable a to be equal to the value of b. In other words, this
operation overwrites a with the value of b. In contrast, we will write “a = b” when we are
asserting a statement of fact, that the value of a is equal to the value of b.