Learning from Data (second half)
The second half of the machine learning textbook by Professor Yaser S. Abu-Mostafa. Not a scanned photocopy; very clear. It is the designated textbook for Caltech's "Learning from Data" course and for Professor Hsuan-Tien Lin's "Machine Learning Foundations" course at National Taiwan University. It mainly covers the theoretical side of machine learning, and is a relatively accessible introduction to learning theory.
e-Chapter 6
Similarity-Based Methods
“It’s a manohorse”, exclaimed the confident little 5-year-old boy. We call it the Centaur out of habit, but who can fault the kid’s intuition? The 5-year-old has never seen this thing before now, yet he came up with a reasonable classification for the beast. He is using the simplest method of learning that we know of – similarity – and yet it’s effective: the child searches through his history for similar objects (in this case a man and a horse) and builds a classification based on these similar objects.

The method is simple and intuitive, yet when we get into the details, several issues need to be addressed in order to arrive at a technique that is quantitative and fit for a computer. The goal of this chapter is to build exactly such a quantitative framework for similarity-based learning.
6.1 Similarity
The “manohorse” is interesting because it requires a deep understanding of similarity: first, to say that the Centaur is similar to both man and horse; and, second, to decide that there is enough similarity to both objects so that neither can be excluded, warranting a new class. A good measure of similarity allows us to not only classify objects using similar objects, but also detect the arrival of a new class of objects (novelty detection).

A simple classification rule is to give a new input the class of the most similar input in your data. This is the “nearest neighbor” rule. To implement the nearest neighbor rule, we need to first quantify the similarity between two objects. There are different ways to measure similarity, or equivalently dissimilarity. Consider the following example with 3 digits (two 9s and a 6).
The two 9s should be regarded as very similar. Yet, if we naively measure similarity by the number of black pixels in common, the two 9s have only two in common. On the other hand, the 6 has many more black pixels in common with either 9, even though the 6 should be regarded as dissimilar to both 9s. Before measuring the similarity, one should preprocess the inputs, for example by centering, axis-aligning and normalizing the size in the case of an image. One can go further and extract the relevant features of the data, for example size (number of black pixels) and symmetry, as was done in Chapter 3. These practical considerations regarding the nature of the learning task, though important, are not our primary focus here. We will assume that through domain expertise or otherwise, features have been constructed to identify the important dimensions, and if two inputs differ in these dimensions, then the inputs are likely to be dissimilar. Given this assumption, there are well-established ways to measure similarity (or dissimilarity) in different contexts.
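As a toy illustration (ours, not the book's) of the naive pixel-overlap measure and of the kind of features just mentioned, the sketch below counts the black pixels two binary images have in common, and also maps an image to two simple features, intensity and left-right symmetry, loosely in the spirit of Chapter 3; the function names and exact feature definitions are our own assumptions.

```python
import numpy as np

def pixels_in_common(img1, img2):
    """Naive similarity: number of positions where both binary images are black (1)."""
    return int(np.sum((img1 == 1) & (img2 == 1)))

def intensity_symmetry(img):
    """Map a binary image to (intensity, symmetry).

    Intensity: fraction of black pixels.  Symmetry: negative mean absolute
    difference between the image and its left-right mirror (0 = perfectly symmetric).
    """
    return img.mean(), -np.abs(img - np.fliplr(img)).mean()

# A made-up 4x4 "image"; a real digit would first be centered, axis-aligned
# and size-normalized before either measure is applied.
img = np.array([[0, 1, 1, 0],
                [1, 0, 0, 1],
                [1, 0, 0, 1],
                [0, 1, 1, 0]])
print(pixels_in_common(img, np.roll(img, 1, axis=1)))  # a 1-pixel shift already halves the overlap (8 -> 4)
print(intensity_symmetry(img))                          # (0.5, -0.0) for this symmetric pattern
```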
6.1.1 Similarity Measures
For inputs $x, x'$ which are vectors in $\mathbb{R}^d$, we can measure dissimilarity using the standard Euclidean distance,
$$d(x, x') = \|x - x'\|.$$
The smaller the distance, the more similar are the objects corresponding to inputs $x$ and $x'$. For Boolean features, the Euclidean distance is the square root of the well-known Hamming distance. The Euclidean distance is a special case of a more general distance measure which can be defined for an arbitrary positive semi-definite matrix $Q$:
$$d(x, x') = \sqrt{(x - x')^{\mathrm{t}} Q\, (x - x')}.$$
A useful special case, known as the Mahalanobis distance, is to set $Q = \Sigma^{-1}$, where $\Sigma$ is the covariance matrix¹ of the data. The Mahalanobis distance metric depends on the data set. The main advantage of the Mahalanobis distance over the standard Euclidean distance is that it takes into account correlations among the data dimensions and scale. A similar effect can be accomplished by first using input preprocessing to standardize the data (see Chapter 9) and then using the standard Euclidean distance. Another useful measure, especially for Boolean vectors, is the cosine similarity,
$$\mathrm{CosSim}(x, x') = \frac{x \cdot x'}{\|x\|\,\|x'\|}.$$
The cosine similarity is the cosine of the angle between the two vectors, $\mathrm{CosSim} \in [-1, 1]$, and larger values indicate greater similarity. When the objects represent sets, then the set similarity or Jaccard coefficient is often used. For example, consider two movies which have been watched by two different sets of users $S_1, S_2$. We may measure how similar these movies are by how similar the two sets $S_1$ and $S_2$ are:
$$J(S_1, S_2) = \frac{|S_1 \cap S_2|}{|S_1 \cup S_2|};$$
$1 - J(S_1, S_2)$ can be used as a measure of distance which conveniently has the properties that a metric formally satisfies, such as the triangle inequality. We focus on the Euclidean distance, which is also a metric; many of the algorithms, however, apply to arbitrary similarity measures.

¹ $\Sigma = \frac{1}{N}\sum_{i=1}^{N} x_i x_i^{\mathrm{t}} - \bar{x}\,\bar{x}^{\mathrm{t}}$, where $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$.
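To make these measures concrete, here is a small sketch (ours, not the text's) computing each of them with NumPy, following the square-root form of the quadratic-form distance given above; all function names are our own.

```python
import numpy as np

def euclidean(x, xp):
    """Standard Euclidean distance ||x - x'||."""
    return np.linalg.norm(x - xp)

def mahalanobis(x, xp, X):
    """Mahalanobis distance using the covariance of the data X (one point per row)."""
    xbar = X.mean(axis=0)
    Sigma = (X.T @ X) / len(X) - np.outer(xbar, xbar)   # (1/N) sum x_i x_i^t - xbar xbar^t
    d = x - xp
    return np.sqrt(d @ np.linalg.inv(Sigma) @ d)

def cosine_similarity(x, xp):
    """Cosine of the angle between x and x'; lies in [-1, 1]."""
    return (x @ xp) / (np.linalg.norm(x) * np.linalg.norm(xp))

def jaccard(S1, S2):
    """Set similarity |S1 ∩ S2| / |S1 ∪ S2|; 1 - J(S1, S2) behaves like a distance."""
    S1, S2 = set(S1), set(S2)
    return len(S1 & S2) / len(S1 | S2)

# Toy usage.
x, xp = np.array([1.0, 2.0]), np.array([2.0, 0.0])
X = np.random.randn(100, 2)            # stand-in data set for the covariance estimate
print(euclidean(x, xp), mahalanobis(x, xp, X), cosine_similarity(x, xp))
print(jaccard({1, 2, 3}, {2, 3, 4}))   # 0.5
```

Note that the Mahalanobis distance changes if the data set changes, exactly as the text says, because the covariance estimate $\Sigma$ changes with it.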
Exercise 6.1
(a) Give two vectors with very high cosine similarity but very low Euclidean
distance similarity. Similarly, give two vectors with very low cosine
similarity but very high Euclidean distance similarity.
(b) If we shift the origin of the coordinate system, which of the two
measures of similarity will change? How will this affect your choice
of features?
6.2 Nearest Neighbor
Simple rules survive; and, the nearest neighbor technique is perhaps the simplest of all. We will summarize the entire algorithm in a short paragraph. But, before we forge ahead, let’s not forget the two basic competing principles laid out in Part I: any learning technique should be expressive enough that we can fit the data and obtain low $E_{\text{in}}$; however, it should be reliable enough that a low $E_{\text{in}}$ implies a low $E_{\text{out}}$.
The nearest neighbor rule is embarrassingly simple. There is no training phase (or no “learning” so to speak). The entire algorithm is specified by how one computes the final hypothesis $g(x)$ on a test input $x$. Recall that the data set is $\mathcal{D} = (x_1, y_1), \ldots, (x_N, y_N)$, where $y_n = \pm 1$. To classify the test point $x$, find the nearest point to $x$ in the data set (the nearest neighbor), and use the classification of this nearest neighbor.

Formally speaking, reorder the data according to distance from $x$ (breaking ties using the data point’s index for simplicity). We write $(x_{[n]}(x), y_{[n]}(x))$ for the $n$th such reordered data point with respect to $x$. We will drop the dependence on $x$ and simply write $(x_{[n]}, y_{[n]})$ when the context is clear. So,
$$d(x, x_{[1]}) \le d(x, x_{[2]}) \le \cdots \le d(x, x_{[N]}).$$
The final hypothesis is
$$g(x) = y_{[1]}(x)$$
($x$ is classified by just looking at the class of the nearest data point to $x$). This simple nearest neighbor rule admits a nice geometric interpretation, shown for two dimensions in Figure 6.1.
Figure 6.1: Nearest neighbor Voronoi tessellation.
The shading illustrates the final hypothesis $g(x)$. Each data point $x_n$ ‘owns’ a region of the space defined by the set of points closer to $x_n$ than to any other data point. These regions are convex polytopes (convex regions whose faces are hyperplanes), some of which are unbounded; in two dimensions, we get convex polygons. The resulting set of regions defined by such a set of points is called a Voronoi (or Dirichlet) tessellation of the space. The final hypothesis $g$ is a Voronoi tessellation with each region inheriting the class of the data point that owns the region. Figure 6.2(a) shows the nearest neighbor classifier for a sample of 500 data points from the digits data described in Chapter 3 (blue circles are the digit 1 and all other digits are the red ×’s).
The clear advantage of the nearest neighbor rule is that it is embarrassingly simple and intuitive, easy to implement, and there is no training. It is expressive, as it achieves zero in-sample error (as can immediately be deduced from Figure 6.1), and as we will soon see, it is reliable. A practically important cosmetic upside, when (for example) presenting to a client, is that the classification of a test object is easy to ‘explain’: just present the similar object on which the classification is based. The main disadvantage is the computational overhead.
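A brute-force sketch of the rule $g(x) = y_{[1]}(x)$ (our own illustration, not code from the book): all the work happens at prediction time, which is exactly the computational overhead mentioned above.

```python
import numpy as np

def nearest_neighbor_classify(x, X, y):
    """Return y_[1](x): the label of the training point nearest to x.

    X holds one training input per row, y the corresponding +/-1 labels.
    Ties in distance are broken by the data point's index, since np.argmin
    returns the first minimizer.
    """
    dists = np.linalg.norm(X - x, axis=1)   # d(x, x_n) for every n
    return y[np.argmin(dists)]

# Toy usage: two well-separated clusters labeled +1 and -1.
X = np.array([[0.0, 0.0], [0.2, 0.1], [3.0, 3.0], [3.1, 2.9]])
y = np.array([+1, +1, -1, -1])
print(nearest_neighbor_classify(np.array([0.1, 0.2]), X, y))   # +1
print(nearest_neighbor_classify(np.array([2.8, 3.2]), X, y))   # -1
```

There is no training step; each prediction is a single pass over the $N$ stored points, costing $O(Nd)$ per test input.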
VC dimension. The nearest neighbor rule fits within our standard supervised learning framework with a very large hypothesis set. Any set of $n$ labeled points induces a Voronoi tessellation with each Voronoi region assigned to a class; thus any set of $n$ labeled points defines a hypothesis. Let $\mathcal{H}_n$ be the hypothesis set containing all hypotheses which result from some labeled Voronoi tessellation on $n$ points. Let $\mathcal{H} = \bigcup_{n=1}^{\infty} \mathcal{H}_n$ be the union of all these hypothesis sets; this is the hypothesis set for the nearest neighbor rule. The learning algorithm picks the particular hypothesis from $\mathcal{H}$ which corresponds to the realized labeled data set; this hypothesis is in $\mathcal{H}_N \subset \mathcal{H}$. Since the training error is zero no matter what the size of the data set, the nearest neighbor rule is non-falsifiable² and the VC dimension of this model is infinite. So, from the VC worst-case analysis, this spells doom. A finite VC dimension would have been great, and it would have given us one form of reliability, namely that $E_{\text{out}}$ is close to $E_{\text{in}}$ and so minimizing $E_{\text{in}}$ works. The nearest neighbor method is reliable in another sense, and we are going to need some new tools if we are to argue the case.
6.2.1 Nearest Neighbor is 2-Optimal
Using a probabilistic argument, we will show that the nearest neighbor rule has an out-of-sample error that is at most twice the minimum possible out-of-sample error. The success of the nearest neighbor algorithm relies on the nearby point $x_{[1]}(x)$ having the same classification as the test point $x$. This means two things: there is a ‘nearby’ point in the data set; and, the target function is reasonably smooth so that the classification of this nearest neighbor is indicative of the classification of the test point.
As in logistic regression, we model the target as noisy and define
$$\pi(x) = P[y = +1 \mid x].$$
A data pair $(x, y)$ is obtained by first generating $x$ from the input probability distribution $P(x)$, and then $y$ from the conditional distribution $\pi(x)$. We can relate $\pi(x)$ to a deterministic target function $f(x)$ by observing that if $\pi(x) \ge \frac{1}{2}$, then the optimal prediction is $f(x) = +1$;³ and, if $\pi(x) < \frac{1}{2}$, then the optimal prediction is $f(x) = -1$. Let $\eta(x) = \min\{\pi(x), 1 - \pi(x)\}$.

² Recall that a hypothesis set is non-falsifiable if it can fit any data set.
³ When $\pi(x) = \frac{1}{2}$, we break the tie in favor of $+1$.
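As a short side calculation (not part of the excerpt) that explains the role of $\eta(x)$: with $f$ defined as above, the probability of error at a point $x$ is
$$P[f(x) \ne y \mid x] = \min\{\pi(x), 1 - \pi(x)\} = \eta(x),$$
since $f$ picks the more likely label and errs only when the less likely one occurs. No prediction rule can do better at $x$, so $\eta(x)$ is the smallest achievable error at $x$ and is the natural benchmark against which the nearest neighbor rule will be measured.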