CS229 Lecture notes
Andrew Ng
Supervised learning
Let's start by talking about a few examples of supervised learning problems.
Suppose we have a dataset giving the living areas and prices of 47 houses
from Portland, Oregon:
Living area (feet²)   Price (1000$s)
2104                  400
1600                  330
2400                  369
1416                  232
3000                  540
 ...                  ...
We can plot this data:
[Figure: plot of housing prices; x-axis: square feet (500 to 5000), y-axis: price in $1000s (0 to 1000).]
Given data like this, how can we learn to predict the prices of other houses
in Portland, as a function of the size of their living areas?
To establish notation for future use, we’ll use x^(i) to denote the “input”
variables (living area in this example), also called input features, and y^(i)
to denote the “output” or target variable that we are trying to predict
(price). A pair (x^(i), y^(i)) is called a training example, and the dataset
that we’ll be using to learn—a list of m training examples {(x^(i), y^(i)); i =
1, ..., m}—is called a training set. Note that the superscript “(i)” in the
notation is simply an index into the training set, and has nothing to do with
exponentiation. We will also use X to denote the space of input values, and Y
the space of output values. In this example, X = Y = R.
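To make the superscript notation concrete, here is a minimal Python sketch (illustrative only, not part of the original notes) that stores the first few rows of the table above as a list of (x^(i), y^(i)) training examples:

    # Training set from the table above: living area (x, in square feet)
    # and price (y, in $1000s) for five of the 47 Portland houses.
    training_set = [
        (2104, 400),
        (1600, 330),
        (2400, 369),
        (1416, 232),
        (3000, 540),
    ]

    m = len(training_set)            # number of training examples
    x_1, y_1 = training_set[0]       # the first training example (x^(1), y^(1))
    print(m, x_1, y_1)

Here the superscript index simply becomes a list index into training_set.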
To describe the supervised learning problem slightly more formally, our
goal is, given a training set, to learn a function h : X → Y so that h(x) is a
“good” predictor for the corresponding value of y. For historical reasons, this
function h is called a hypothesis. Seen pictorially, the process is therefore
like this:
[Diagram: Training set → Learning algorithm → hypothesis h; x (living area of house) → h → predicted y (predicted price of house).]
When the target variable that we’re trying to predict is continuous, such
as in our housing example, we call the learning problem a regression problem.
When y can take on only a small number of discrete values (such as
if, given the living area, we wanted to predict if a dwelling is a house or an
apartment, say), we call it a classification problem.
Part I
Linear Regression
To make our housing example more interesting, let's consider a slightly richer
dataset in which we also know the number of bedrooms in each house:
Living area (feet²)   #bedrooms   Price (1000$s)
2104                  3           400
1600                  3           330
2400                  3           369
1416                  2           232
3000                  4           540
 ...                  ...         ...
Here, the x’s are two-dimensional vectors in R². For instance, x_1^(i) is the
living area of the i-th house in the training set, and x_2^(i) is its number of
bedrooms. (In general, when designing a learning problem, it will be up to
you to decide what features to choose, so if you are out in Portland gathering
housing data, you might also decide to include other features such as whether
each house has a fireplace, the number of bathrooms, and so on. We’ll say
more about feature selection later, but for now let's take the features as given.)
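As a concrete illustration (a sketch, not from the notes), the two-feature inputs from the table above could be stored as follows, with x_1^(i) the living area and x_2^(i) the number of bedrooms:

    # Each input x^(i) is now a two-dimensional vector [living area, #bedrooms];
    # y^(i) is still the price in $1000s.
    X = [[2104, 3], [1600, 3], [2400, 3], [1416, 2], [3000, 4]]
    y = [400, 330, 369, 232, 540]

    # x_1^(1) is the living area of the first house, x_2^(1) its bedroom count.
    print(X[0][0], X[0][1])   # 2104 3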
To perform supervised learning, we must decide how we're going to represent
functions/hypotheses h in a computer. As an initial choice, let's say
we decide to approximate y as a linear function of x:

    h_θ(x) = θ_0 + θ_1 x_1 + θ_2 x_2
Here, the θ_i's are the parameters (also called weights) parameterizing the
space of linear functions mapping from X to Y. When there is no risk of
confusion, we will drop the θ subscript in h_θ(x), and write it more simply as
h(x). To simplify our notation, we also introduce the convention of letting
x_0 = 1 (this is the intercept term), so that

    h(x) = Σ_{i=0}^{n} θ_i x_i = θ^T x,

where on the right-hand side above we are viewing θ and x both as vectors,
and here n is the number of input variables (not counting x_0).
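A minimal Python sketch of this hypothesis (an assumed implementation, not code from the notes), using the x_0 = 1 convention for the intercept term:

    def h(theta, x):
        """Linear hypothesis h_theta(x) = sum_{i=0}^{n} theta_i * x_i = theta^T x."""
        x_full = [1.0] + list(x)          # prepend x_0 = 1 (intercept term)
        return sum(t * xi for t, xi in zip(theta, x_full))

    # Illustrative parameter values only: theta = [theta_0, theta_1, theta_2].
    theta = [50.0, 0.1, 20.0]
    print(h(theta, [2104, 3]))            # predicted price (in $1000s) for one house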
Now, given a training set, how do we pick, or learn, the parameters θ?
One reasonable method seems to be to make h(x) close to y, at least for
the training examples we have. To formalize this, we will define a function
that measures, for each value of the θ's, how close the h(x^(i))'s are to the
corresponding y^(i)'s. We define the cost function:

    J(θ) = (1/2) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))².
If you've seen linear regression before, you may recognize this as the familiar
least-squares cost function that gives rise to the ordinary least squares
regression model. Whether or not you have seen it previously, let's keep
going, and we'll eventually show this to be a special case of a much broader
family of algorithms.
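A small self-contained sketch of this cost function (illustrative only, using plain Python lists rather than any particular library):

    def hypothesis(theta, x):
        x_full = [1.0] + list(x)                      # x_0 = 1 intercept term
        return sum(t * xi for t, xi in zip(theta, x_full))

    def cost(theta, X, y):
        """Least-squares cost J(theta) = (1/2) * sum_i (h_theta(x^(i)) - y^(i))^2."""
        return 0.5 * sum((hypothesis(theta, x_i) - y_i) ** 2
                         for x_i, y_i in zip(X, y))

    # The five houses from the table above (prices in $1000s).
    X = [[2104, 3], [1600, 3], [2400, 3], [1416, 2], [3000, 4]]
    y = [400, 330, 369, 232, 540]
    print(cost([0.0, 0.0, 0.0], X, y))                # cost of the all-zero theta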
1 LMS algorithm
We want to choose θ so as to minimize J(θ). To do so, let's use a search
algorithm that starts with some “initial guess” for θ, and that repeatedly
changes θ to make J(θ) smaller, until hopefully we converge to a value of
θ that minimizes J(θ). Specifically, let's consider the gradient descent
algorithm, which starts with some initial θ, and repeatedly performs the
update:

    θ_j := θ_j − α (∂/∂θ_j) J(θ).

(This update is simultaneously performed for all values of j = 0, ..., n.)
Here, α is called the learning rate. This is a very natural algorithm that
repeatedly takes a step in the direction of steepest decrease of J.
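Before deriving the partial derivative analytically, here is a rough sketch of one such update (not code from the notes), with the partial derivatives approximated numerically by finite differences:

    def gradient_descent_step(J, theta, alpha, eps=1e-6):
        """One update theta_j := theta_j - alpha * dJ/dtheta_j for all j at once."""
        grad = []
        for j in range(len(theta)):
            theta_plus, theta_minus = theta.copy(), theta.copy()
            theta_plus[j] += eps
            theta_minus[j] -= eps
            grad.append((J(theta_plus) - J(theta_minus)) / (2 * eps))
        # Simultaneous update for all j = 0, ..., n.
        return [t - alpha * g for t, g in zip(theta, grad)]

Repeatedly calling gradient_descent_step moves θ in the direction that decreases J fastest.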
In order to implement this algorithm, we have to work out what the
partial derivative term on the right-hand side is. Let's first work it out for the
case where we have only one training example (x, y), so that we can neglect
the sum in the definition of J. We have:

    ∂/∂θ_j J(θ) = ∂/∂θ_j (1/2)(h_θ(x) − y)²
                = 2 · (1/2) · (h_θ(x) − y) · ∂/∂θ_j (h_θ(x) − y)
                = (h_θ(x) − y) · ∂/∂θ_j (Σ_{i=0}^{n} θ_i x_i − y)
                = (h_θ(x) − y) x_j
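Plugging this derivative into the gradient descent update gives, for a single training example, the rule θ_j := θ_j − α (h_θ(x) − y) x_j. A minimal sketch (an assumed plain-Python implementation, not the notes' code):

    def single_example_update(theta, x, y, alpha):
        """One update of every theta_j using a single training example (x, y)."""
        x_full = [1.0] + list(x)                                   # x_0 = 1 intercept term
        error = sum(t * xi for t, xi in zip(theta, x_full)) - y    # h_theta(x) - y
        return [t - alpha * error * xj for t, xj in zip(theta, x_full)]

    # Example: one update from the first house in the table, with a small learning rate.
    theta = single_example_update([0.0, 0.0, 0.0], [2104, 3], 400, alpha=1e-7)
    print(theta)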