CS229 Lecture notes
Andrew Ng
Part V
Support Vector Machines
This set of notes presents the Support Vector Machine (SVM) learning algorithm. SVMs are among the best (and many believe are indeed the best) "off-the-shelf" supervised learning algorithms. To tell the SVM story, we'll need to first talk about margins and the idea of separating data with a large "gap." Next, we'll talk about the optimal margin classifier, which will lead us into a digression on Lagrange duality. We'll also see kernels, which give a way to apply SVMs efficiently in very high dimensional (such as infinite-dimensional) feature spaces, and finally, we'll close off the story with the SMO algorithm, which gives an efficient implementation of SVMs.
1 Margins: Intuition
We'll start our story on SVMs by talking about margins. This section will give the intuitions about margins and about the "confidence" of our predictions; these ideas will be made formal in Section 3.
Consider logistic regression, where the probability p(y = 1|x; θ) is modeled by h_θ(x) = g(θ^T x). We would then predict "1" on an input x if and only if h_θ(x) ≥ 0.5, or equivalently, if and only if θ^T x ≥ 0. Consider a positive training example (y = 1). The larger θ^T x is, the larger also is h_θ(x) = p(y = 1|x; θ), and thus also the higher our degree of "confidence" that the label is 1. Thus, informally we can think of our prediction as being a very confident one that y = 1 if θ^T x ≫ 0. Similarly, we think of logistic regression as making a very confident prediction of y = 0, if θ^T x ≪ 0. Given a training set, again informally it seems that we'd have found a good fit to the training data if we can find θ so that θ^T x^(i) ≫ 0 whenever y^(i) = 1, and θ^T x^(i) ≪ 0 whenever y^(i) = 0, since this would reflect a very confident (and correct) set of classifications for all the training examples. This seems to be a nice goal to aim for, and we'll soon formalize this idea using the notion of functional margins.
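To make this "confidence" picture concrete, here is a minimal NumPy sketch (not from the notes; θ and the inputs are made-up values) that evaluates h_θ(x) = g(θ^T x) with g the logistic function, and shows that a score θ^T x far from 0 gives a probability far from 0.5:

    import numpy as np

    def sigmoid(z):
        # Logistic function g(z) = 1 / (1 + e^{-z}).
        return 1.0 / (1.0 + np.exp(-z))

    # Hypothetical parameter vector and inputs, chosen only for illustration.
    theta = np.array([2.0, -1.0])
    x_near = np.array([0.3, 0.5])   # theta^T x close to 0 -> low confidence
    x_far  = np.array([4.0, -3.0])  # theta^T x >> 0       -> high confidence

    for x in (x_near, x_far):
        score = theta @ x                 # theta^T x
        prob = sigmoid(score)             # h_theta(x) = p(y = 1 | x; theta)
        label = 1 if score >= 0 else 0    # predict 1 iff theta^T x >= 0
        print(f"theta^T x = {score:+.2f}, p(y=1|x) = {prob:.3f}, predict {label}")

Running this prints a probability of about 0.52 for the near-boundary point and about 0.99998 for the far one, which is exactly the informal notion of confidence described above.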
For a different type of intuition, consider the following figure, in which x's represent positive training examples, o's denote negative training examples, a decision boundary (this is the line given by the equation θ^T x = 0, and is also called the separating hyperplane) is also shown, and three points have also been labeled A, B and C.
[Figure: positive (x) and negative (o) training examples separated by the decision boundary θ^T x = 0, with three labeled points A, B, and C.]
Notice that the point A is very far from the decision boundary. If we are asked to make a prediction for the value of y at A, it seems we should be quite confident that y = 1 there. Conversely, the point C is very close to the decision boundary, and while it's on the side of the decision boundary on which we would predict y = 1, it seems likely that just a small change to the decision boundary could easily have caused our prediction to be y = 0. Hence, we're much more confident about our prediction at A than at C. The point B lies in-between these two cases, and more broadly, we see that if a point is far from the separating hyperplane, then we may be significantly more confident in our predictions. Again, informally we think it'd be nice if, given a training set, we manage to find a decision boundary that allows us to make all correct and confident (meaning far from the decision boundary) predictions on the training examples. We'll formalize this later using the notion of geometric margins.
2 Notation
To make our discussion of SVMs easier, we'll first need to introduce a new notation for talking about classification. We will be considering a linear classifier for a binary classification problem with labels y and features x. From now on, we'll use y ∈ {−1, 1} (instead of {0, 1}) to denote the class labels. Also, rather than parameterizing our linear classifier with the vector θ, we will use parameters w, b, and write our classifier as

    h_{w,b}(x) = g(w^T x + b).

Here, g(z) = 1 if z ≥ 0, and g(z) = −1 otherwise. This "w, b" notation allows us to explicitly treat the intercept term b separately from the other parameters. (We also drop the convention we had previously of letting x_0 = 1 be an extra coordinate in the input feature vector.) Thus, b takes the role of what was previously θ_0, and w takes the role of [θ_1 . . . θ_n]^T.

Note also that, from our definition of g above, our classifier will directly predict either 1 or −1 (cf. the perceptron algorithm), without first going through the intermediate step of estimating the probability of y being 1 (which was what logistic regression did).
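As a small illustrative sketch (again not part of the notes; w, b, and x are invented), this classifier can be written directly as:

    import numpy as np

    def g(z):
        # Threshold function: returns 1 if z >= 0, and -1 otherwise.
        return 1 if z >= 0 else -1

    def h(w, b, x):
        # Linear classifier h_{w,b}(x) = g(w^T x + b); predicts a label in {-1, 1}.
        return g(w @ x + b)

    # Hypothetical parameters and a test point, for illustration only.
    w = np.array([1.0, 2.0])
    b = -1.0
    x = np.array([0.5, 1.0])
    print(h(w, b, x))   # prints 1, since w^T x + b = 0.5 + 2.0 - 1.0 = 1.5 >= 0

Note that, unlike the logistic regression sketch above, the output is a hard label in {−1, 1} with no intermediate probability.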
3 Functional and geometric margins
Let's formalize the notions of the functional and geometric margins. Given a training example (x^(i), y^(i)), we define the functional margin of (w, b) with respect to the training example as

    γ̂^(i) = y^(i) (w^T x^(i) + b).

Note that if y^(i) = 1, then for the functional margin to be large (i.e., for our prediction to be confident and correct), we need w^T x^(i) + b to be a large positive number. Conversely, if y^(i) = −1, then for the functional margin to be large, we need w^T x^(i) + b to be a large negative number. Moreover, if y^(i) (w^T x^(i) + b) > 0, then our prediction on this example is correct. (Check this yourself.) Hence, a large functional margin represents a confident and correct prediction.
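A direct translation of this definition into code might look like the following sketch (the parameters and examples are invented for illustration); note that the margin comes out positive exactly when the prediction is correct:

    import numpy as np

    def functional_margin(w, b, x_i, y_i):
        # Functional margin of (w, b) w.r.t. one example: y^(i) (w^T x^(i) + b).
        return y_i * (w @ x_i + b)

    # Hypothetical parameters and two training examples.
    w, b = np.array([1.0, -1.0]), 0.5
    x_pos, y_pos = np.array([3.0, 1.0]), 1    # w^T x + b = 2.5 > 0: correct, margin 2.5
    x_neg, y_neg = np.array([0.0, 2.0]), -1   # w^T x + b = -1.5 < 0: correct, margin 1.5

    for x_i, y_i in [(x_pos, y_pos), (x_neg, y_neg)]:
        m = functional_margin(w, b, x_i, y_i)
        print(f"functional margin = {m:+.2f}, prediction correct: {m > 0}")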
For a linear classifier with the choice of g given above (taking values in {−1, 1}), there's one property of the functional margin that makes it not a very good measure of confidence, however. Given our choice of g, we note that if we replace w with 2w and b with 2b, then since g(w^T x + b) = g(2w^T x + 2b), this would not change h_{w,b}(x) at all. I.e., g, and hence also h_{w,b}(x), depends only on the sign, but not on the magnitude, of w^T x + b. However, replacing (w, b) with (2w, 2b) also results in multiplying our functional margin by a factor of 2. Thus, it seems that by exploiting our freedom to scale w and b, we can make the functional margin arbitrarily large without really changing anything meaningful. Intuitively, it might therefore make sense to impose some sort of normalization condition such as that ||w||_2 = 1; i.e., we might replace (w, b) with (w/||w||_2, b/||w||_2), and instead consider the functional margin of (w/||w||_2, b/||w||_2). We'll come back to this later.
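The scaling issue is easy to check numerically. In the sketch below (with invented numbers), scaling (w, b) leaves the predicted sign unchanged and scales the functional margin by the same factor, while the margin of the normalized parameters (w/||w||_2, b/||w||_2) stays fixed:

    import numpy as np

    w, b = np.array([1.0, -1.0]), 0.5          # hypothetical parameters
    x, y = np.array([3.0, 1.0]), 1             # hypothetical training example

    def margin(w_, b_):
        return y * (w_ @ x + b_)

    for scale in (1, 2, 10):
        ws, bs = scale * w, scale * b
        norm = np.linalg.norm(ws)
        print(f"scale {scale:2d}: sign = {np.sign(ws @ x + bs):+.0f}, "
              f"functional margin = {margin(ws, bs):6.2f}, "
              f"normalized margin = {margin(ws / norm, bs / norm):.4f}")

The functional margin grows from 2.5 to 25.0 as the scale increases, but the normalized margin stays at about 1.77 throughout, which is the motivation for the normalization condition above.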
Given a training set S = {(x^(i), y^(i)); i = 1, . . . , m}, we also define the functional margin of (w, b) with respect to S to be the smallest of the functional margins of the individual training examples. Denoted by γ̂, this can therefore be written:

    γ̂ = min_{i=1,...,m} γ̂^(i).
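In code, the functional margin with respect to a training set is just the minimum of the per-example margins; here is a short sketch with an invented data set:

    import numpy as np

    def functional_margin_of_set(w, b, X, y):
        # gamma_hat = min_i y^(i) (w^T x^(i) + b) over all training examples.
        return np.min(y * (X @ w + b))

    # Hypothetical training set: rows of X are x^(i), entries of y are y^(i) in {-1, 1}.
    X = np.array([[3.0, 1.0],
                  [0.0, 2.0],
                  [2.0, 2.0]])
    y = np.array([1, -1, 1])
    w, b = np.array([1.0, -1.0]), 0.5
    print(functional_margin_of_set(w, b, X, y))   # 0.5, the smallest per-example margin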
Next, let's talk about geometric margins. Consider the picture below:

[Figure: the decision boundary corresponding to (w, b), with the vector w drawn orthogonal to it; point A is a training example x^(i), and the segment AB of length γ^(i) is its distance to the boundary.]
The decision boundary corresponding to (w, b) is shown, along with the vector w. Note that w is orthogonal (at 90°) to the separating hyperplane. (You should convince yourself that this must be the case.) Consider the point at A, which represents the input x^(i) of some training example with label y^(i) = 1. Its distance to the decision boundary, γ^(i), is given by the line segment AB.

How can we find the value of γ^(i)? Well, w/||w|| is a unit-length vector pointing in the same direction as w. Since A represents x^(i), we therefore find that the point B is given by x^(i) − γ^(i) · w/||w||.
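Although the derivation continues beyond this excerpt, the geometric fact it is heading toward is standard: since w/||w|| is the unit normal to the hyperplane, the distance from x^(i) to w^T x + b = 0 is (w^T x^(i) + b)/||w|| for a point on the positive side. A small sketch (with invented numbers) checks this by stepping from A back to the boundary along w/||w||:

    import numpy as np

    w, b = np.array([1.0, -1.0]), 0.5           # hypothetical separating hyperplane
    x_A = np.array([3.0, 1.0])                  # hypothetical positive example at point A

    u = w / np.linalg.norm(w)                   # unit vector in the direction of w
    gamma = (w @ x_A + b) / np.linalg.norm(w)   # distance from A to the hyperplane
    x_B = x_A - gamma * u                       # point B: x^(i) - gamma^(i) * w/||w||

    print(gamma)                # geometric distance gamma^(i)
    print(w @ x_B + b)          # ~0: B lies on the hyperplane w^T x + b = 0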