CS229 Lecture notes

Andrew Ng

Part IV

Generative Learning algorithms

So far, we’ve mainly been talking about learning algorithms that model p(y|x; θ), the conditional distribution of y given x. For instance, logistic regression modeled p(y|x; θ) as $h_\theta(x) = g(\theta^T x)$, where g is the sigmoid function. In these notes, we’ll talk about a different type of learning algorithm.

Consider a classification problem in which we want to learn to distinguish between elephants (y = 1) and dogs (y = 0), based on some features of an animal. Given a training set, an algorithm like logistic regression or the perceptron algorithm (basically) tries to find a straight line (that is, a decision boundary) that separates the elephants and dogs. Then, to classify a new animal as either an elephant or a dog, it checks on which side of the decision boundary it falls, and makes its prediction accordingly.

Here’s a different approach. First, looking at elephants, we can build a model of what elephants look like. Then, looking at dogs, we can build a separate model of what dogs look like. Finally, to classify a new animal, we can match the new animal against the elephant model, and match it against the dog model, to see whether the new animal looks more like the elephants or more like the dogs we had seen in the training set.

Algorithms that try to learn p(y|x) directly (such as logistic regression), or algorithms that try to learn mappings directly from the space of inputs X to the labels {0, 1} (such as the perceptron algorithm), are called discriminative learning algorithms. Here, we’ll talk about algorithms that instead try to model p(x|y) (and p(y)). These algorithms are called generative learning algorithms. For instance, if y indicates whether an example is a dog (0) or an elephant (1), then p(x|y = 0) models the distribution of dogs’ features, and p(x|y = 1) models the distribution of elephants’ features.

After modeling p(y) (called the class priors) and p(x|y), our algorithm can then use Bayes rule to derive the posterior distribution on y given x:

$$p(y|x) = \frac{p(x|y)\,p(y)}{p(x)}.$$

Here, the denominator is given by p(x) = p(x|y = 1)p(y = 1) + p(x|y = 0)p(y = 0) (you should be able to verify that this is true from the standard properties of probabilities), and thus can also be expressed in terms of the quantities p(x|y) and p(y) that we’ve learned. Actually, if we were calculating p(y|x) in order to make a prediction, then we don’t actually need to calculate the denominator, since p(x) does not depend on y:

$$\arg\max_y p(y|x) = \arg\max_y \frac{p(x|y)\,p(y)}{p(x)} = \arg\max_y p(x|y)\,p(y).$$
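As a small illustration of the Bayes rule computation above, here is a minimal Python sketch. The function name `posterior` and all the numbers are made up for illustration (they are not from the notes); the point is only that the prediction obtained from the full posterior agrees with the one obtained from p(x|y)p(y) alone.

```python
import numpy as np

def posterior(likelihoods, priors):
    """Bayes rule: p(y|x) = p(x|y) p(y) / p(x), for every class y."""
    joint = np.asarray(likelihoods, dtype=float) * np.asarray(priors, dtype=float)
    evidence = joint.sum()  # p(x) = sum over y of p(x|y) p(y)
    return joint / evidence

# Hypothetical values for one animal x: index 0 = dog, index 1 = elephant.
likelihoods = np.array([0.02, 0.30])  # assumed p(x|y=0), p(x|y=1)
priors = np.array([0.90, 0.10])       # assumed p(y=0), p(y=1)

post = posterior(likelihoods, priors)
pred_from_posterior = int(np.argmax(post))
pred_from_joint = int(np.argmax(likelihoods * priors))  # denominator skipped
```

Dividing by p(x) rescales every class score by the same positive constant, so both predictions pick the same class.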

1 Gaussian discriminant analysis

The first generative learning algorithm that we’ll look at is Gaussian discriminant analysis (GDA). In this model, we’ll assume that p(x|y) is distributed according to a multivariate normal distribution. Let’s talk briefly about the properties of multivariate normal distributions before moving on to the GDA model itself.

1.1 The multivariate normal distribution

The multivariate normal distribution in n dimensions, also called the multivariate Gaussian distribution, is parameterized by a mean vector $\mu \in \mathbb{R}^n$ and a covariance matrix $\Sigma \in \mathbb{R}^{n \times n}$, where $\Sigma \geq 0$ is symmetric and positive semi-definite. Also written “$\mathcal{N}(\mu, \Sigma)$”, its density is given by:

$$p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu)\right).$$

In the equation above, “|Σ|” denotes the determinant of the matrix Σ.
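The density can be evaluated directly from its definition. The following is a minimal NumPy sketch, assuming Σ is invertible; the helper name `mvn_density` is an illustrative choice, not something defined in the notes.

```python
import numpy as np

def mvn_density(x, mu, sigma):
    """Density of N(mu, sigma) at x; assumes sigma is invertible."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    n = mu.shape[0]
    diff = x - mu
    # Solve sigma @ z = diff rather than forming the explicit inverse.
    quad = diff @ np.linalg.solve(sigma, diff)
    norm_const = (2.0 * np.pi) ** (n / 2.0) * np.sqrt(np.linalg.det(sigma))
    return np.exp(-0.5 * quad) / norm_const

# Peak of the 2-D standard normal, which equals 1 / (2 * pi).
peak = mvn_density([0.0, 0.0], [0.0, 0.0], np.eye(2))
```

For larger n, working with the log-density is usually more stable numerically, but the direct form keeps the correspondence with the formula obvious.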

For a random variable X distributed $\mathcal{N}(\mu, \Sigma)$, the mean is (unsurprisingly) given by µ:

$$\mathrm{E}[X] = \int_x x\, p(x; \mu, \Sigma)\, dx = \mu.$$

The covariance of a vector-valued random variable Z is defined as $\mathrm{Cov}(Z) = \mathrm{E}[(Z - \mathrm{E}[Z])(Z - \mathrm{E}[Z])^T]$. This generalizes the notion of the variance of a real-valued random variable. The covariance can also be defined as $\mathrm{Cov}(Z) = \mathrm{E}[ZZ^T] - (\mathrm{E}[Z])(\mathrm{E}[Z])^T$. (You should be able to prove to yourself that these two definitions are equivalent.) If $X \sim \mathcal{N}(\mu, \Sigma)$, then

$$\mathrm{Cov}(X) = \Sigma.$$
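The two covariance definitions, and the claim Cov(X) = Σ, can be checked numerically. This is only a sanity check on samples, not a proof; the mean, covariance, seed and sample size below are arbitrary values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed for repeatability
mu = np.array([1.0, -2.0])              # illustrative mean (assumed)
sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])          # illustrative covariance (assumed)

# Each row of Z is one draw from N(mu, sigma).
Z = rng.multivariate_normal(mu, sigma, size=200_000)
mean = Z.mean(axis=0)

# Definition 1: E[(Z - E[Z])(Z - E[Z])^T], with sample averages for E[.]
centered = Z - mean
cov_def1 = centered.T @ centered / len(Z)

# Definition 2: E[Z Z^T] - E[Z] E[Z]^T
cov_def2 = Z.T @ Z / len(Z) - np.outer(mean, mean)
```

The two estimates agree up to floating-point error, and both approach Σ as the number of samples grows.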

Here are some examples of what the density of a Gaussian distribution looks like:

[Figure: three surface plots of two-dimensional Gaussian densities.]

The leftmost figure shows a Gaussian with mean zero (that is, the 2×1 zero-vector) and covariance matrix Σ = I (the 2×2 identity matrix). A Gaussian with zero mean and identity covariance is also called the standard normal distribution. The middle figure shows the density of a Gaussian with zero mean and Σ = 0.6I; and the rightmost figure shows one with Σ = 2I. We see that as Σ becomes larger, the Gaussian becomes more “spread-out,” and as it becomes smaller, the distribution becomes more “compressed.”

Let’s look at some more examples.

[Figure: three surface plots of two-dimensional Gaussian densities.]

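The figures above show Gaussians with mean 0, and with covariance matrices respectively

$$\Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}; \quad \Sigma = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}; \quad \Sigma = \begin{bmatrix} 1 & 0.8 \\ 0.8 & 1 \end{bmatrix}.$$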

The leftmost figure shows the familiar standard normal distribution, and we see that as we increase the off-diagonal entry in Σ, the density becomes more “compressed” towards the 45° line (given by $x_1 = x_2$). We can see this more clearly when we look at the contours of the same three densities:

[Figure: contour plots of the same three densities; axes range from −3 to 3.]

Here’s one last set of examples generated by varying Σ:

[Figure: three contour plots of two-dimensional Gaussian densities; axes range from −3 to 3.]

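The plots above used, respectively,

$$\Sigma = \begin{bmatrix} 1 & -0.5 \\ -0.5 & 1 \end{bmatrix}; \quad \Sigma = \begin{bmatrix} 1 & -0.8 \\ -0.8 & 1 \end{bmatrix}; \quad \Sigma = \begin{bmatrix} 3 & 0.8 \\ 0.8 & 1 \end{bmatrix}.$$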

From the leftmost and middle figures, we see that by decreasing the off-diagonal elements of the covariance matrix, the density now becomes “compressed” again, but in the opposite direction. Lastly, as we vary the parameters, more generally the contours will form ellipses (the rightmost figure showing an example).
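One way to see the direction of this “compression” without plotting is to look at the eigenvectors of Σ: the contour ellipses of a Gaussian are aligned with them, and the longest axis follows the eigenvector with the largest eigenvalue. The sketch below is illustrative only; `principal_axis` is a made-up helper name, and the two matrices are the ±0.8 examples from the text.

```python
import numpy as np

def principal_axis(sigma):
    """Return (largest eigenvalue, its unit eigenvector) of a symmetric matrix."""
    eigvals, eigvecs = np.linalg.eigh(sigma)  # eigenvalues in ascending order
    return eigvals[-1], eigvecs[:, -1]

pos = np.array([[1.0, 0.8], [0.8, 1.0]])
neg = np.array([[1.0, -0.8], [-0.8, 1.0]])

lam_pos, axis_pos = principal_axis(pos)  # axis along (1, 1), up to sign
lam_neg, axis_neg = principal_axis(neg)  # axis along (1, -1), up to sign
```

Both matrices have the same eigenvalues (1.8 and 0.2), so the ellipses have the same shape; only the orientation flips from the 45° line to the line perpendicular to it.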

As our last set of examples, fixing Σ = I, by varying µ, we can also move the mean of the density around.

[Figure: three surface plots of two-dimensional Gaussian densities.]

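The figures above were generated using Σ = I, and respectively

$$\mu = \begin{bmatrix} 1 \\ 0 \end{bmatrix}; \quad \mu = \begin{bmatrix} -0.5 \\ 0 \end{bmatrix}; \quad \mu = \begin{bmatrix} -1 \\ -1.5 \end{bmatrix}.$$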

1.2 The Gaussian Discriminant Analysis model

When we have a classification problem in which the input features x are continuous-valued random variables, we can then use the Gaussian Discriminant Analysis (GDA) model, which models p(x|y) using a multivariate normal distribution. The model is:

$$\begin{aligned}
y &\sim \mathrm{Bernoulli}(\phi) \\
x \mid y = 0 &\sim \mathcal{N}(\mu_0, \Sigma) \\
x \mid y = 1 &\sim \mathcal{N}(\mu_1, \Sigma)
\end{aligned}$$

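Writing out the distributions, this is:

$$\begin{aligned}
p(y) &= \phi^y (1 - \phi)^{1-y} \\
p(x \mid y = 0) &= \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (x - \mu_0)^T \Sigma^{-1} (x - \mu_0)\right) \\
p(x \mid y = 1) &= \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (x - \mu_1)^T \Sigma^{-1} (x - \mu_1)\right)
\end{aligned}$$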

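Here, the parameters of our model are φ, Σ, $\mu_0$ and $\mu_1$. (Note that while there’re two different mean vectors $\mu_0$ and $\mu_1$, this model is usually applied using only one covariance matrix Σ.) The log-likelihood of the data, over m training examples, is given by

$$\begin{aligned}
\ell(\phi, \mu_0, \mu_1, \Sigma) &= \log \prod_{i=1}^{m} p(x^{(i)}, y^{(i)}; \phi, \mu_0, \mu_1, \Sigma) \\
&= \log \prod_{i=1}^{m} p(x^{(i)} \mid y^{(i)}; \mu_0, \mu_1, \Sigma)\, p(y^{(i)}; \phi).
\end{aligned}$$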
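To make the log-likelihood concrete, here is a minimal NumPy sketch that evaluates it for given parameters, using the fact that the log of a product is a sum of logs. The function name `gda_log_likelihood`, the argument layout (one example per row of `X`, labels in {0, 1}) and the toy data are assumptions made for illustration; they are not part of the notes.

```python
import numpy as np

def gda_log_likelihood(X, y, phi, mu0, mu1, sigma):
    """Sum over i of log p(x_i | y_i; mu0, mu1, sigma) + log p(y_i; phi)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mu0 = np.asarray(mu0, dtype=float)
    mu1 = np.asarray(mu1, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    n = X.shape[1]
    # Pick the class mean for every example; rows line up with rows of X.
    means = np.where(y[:, None] == 1, mu1, mu0)
    diff = X - means
    # Quadratic form (x - mu)^T sigma^{-1} (x - mu) for each row.
    quad = np.einsum("ij,ij->i", diff, np.linalg.solve(sigma, diff.T).T)
    _, logdet = np.linalg.slogdet(sigma)  # log |sigma|
    log_px_given_y = -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)
    log_py = y * np.log(phi) + (1 - y) * np.log(1.0 - phi)
    return float(np.sum(log_px_given_y + log_py))

# Toy data: each example sits exactly on its own class mean.
X = np.array([[0.0, 0.0], [1.0, 1.0]])
y = np.array([0, 1])
ll = gda_log_likelihood(X, y, phi=0.5, mu0=np.zeros(2),
                        mu1=np.ones(2), sigma=np.eye(2))
```

In this toy case every quadratic term is zero, so the result reduces to 2 · (log 0.5 − log 2π), which gives a quick check on the implementation.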
