Support Vector Machines for
Histogram-Based Image Classification
Olivier Chapelle, Patrick Haffner, and Vladimir N. Vapnik
Abstract— Traditional classification approaches generalize
poorly on image classification tasks, because of the high
dimensionality of the feature space. This paper shows that
support vector machines (SVM's) can generalize well on difficult
image classification problems where the only features are
high dimensional histograms. Heavy-tailed RBF kernels of
the form $K(\mathbf{x}, \mathbf{y}) = e^{-\rho \sum_i |x_i^a - y_i^a|^b}$ with
$a \le 1$ and $b \le 2$
are evaluated on the classification of images extracted from
the Corel stock photo collection and shown to far outperform
traditional polynomial or Gaussian radial basis function (RBF)
kernels. Moreover, we observed that a simple remapping of the
input $x_i \rightarrow x_i^a$ improves the performance of linear SVM's to
such an extent that it makes them, for this problem, a valid
alternative to RBF kernels.
Index Terms— Corel, image classification, image histogram,
radial basis functions, support vector machines.
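To make the kernel family above concrete, the following sketch (our own illustration; the function names, default parameter values, and the use of NumPy are assumptions, not part of the paper) evaluates the heavy-tailed RBF kernel and the input remapping on histogram-like vectors:

    import numpy as np

    def heavy_tailed_rbf_kernel(x, y, a=0.5, b=1.0, rho=1.0):
        # K(x, y) = exp(-rho * sum_i |x_i^a - y_i^a|^b), with a <= 1 and b <= 2.
        # a = 1, b = 2 recovers the Gaussian RBF; a = b = 1 gives the Laplacian kernel.
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        return np.exp(-rho * np.sum(np.abs(x**a - y**a) ** b))

    def remap_input(x, a=0.5):
        # Remapping x_i -> x_i^a, applied to the input of a plain linear SVM.
        return np.asarray(x, dtype=float) ** a

    # Example on two (nonnegative) histogram-like vectors
    h1 = np.array([0.2, 0.0, 0.5, 0.3])
    h2 = np.array([0.1, 0.1, 0.6, 0.2])
    print(heavy_tailed_rbf_kernel(h1, h2))

Note that the power $x_i^a$ is only well defined for nonnegative inputs, which is the case for histograms.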
I. INTRODUCTION
LARGE collections of images are becoming available to
the public, from photo collections to Web pages or even
video databases. To index or retrieve them is a challenge which
is the focus of many research projects (for instance IBM’s
QBIC [1]). A large part of this research work is devoted to
finding suitable representations for the images, and retrieval
generally involves comparisons of images. In this paper, we
choose to use color histograms as an image representation
because of the reasonable performance that can be obtained
in spite of their extreme simplicity [2]. Using this histogram
representation, our initial goal is to perform generic object
classification with a “winner takes all” approach: find the one
category of object that is the most likely to be present in a
given image.
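For concreteness, a color histogram of this kind can be computed along the following lines (a minimal sketch under our own assumptions; the number of bins and the RGB color space are illustrative choices, not the configuration used in this paper):

    import numpy as np

    def color_histogram(image, bins_per_channel=8):
        # image: H x W x 3 RGB array with integer values in [0, 255].
        # Returns a normalized histogram of length bins_per_channel**3.
        image = np.asarray(image)
        quantized = (image.astype(int) * bins_per_channel) // 256   # 0 .. bins-1 per channel
        idx = (quantized[..., 0] * bins_per_channel + quantized[..., 1]) \
              * bins_per_channel + quantized[..., 2]                # joint bin index
        hist = np.bincount(idx.ravel(), minlength=bins_per_channel**3)
        return hist / hist.sum()

Each image is thus summarized by a single fixed-length vector, regardless of its size, which is what makes the representation so simple to use with a generic classifier.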
From classification trees to neural networks, there are many
possible choices for what classifier to use. The support vector
machine (SVM) approach is considered a good candidate
because of its high generalization performance without the
need to add a priori knowledge, even when the dimension of
the input space is very high.
Intuitively, given a set of points which belong to either
one of two classes, a linear SVM finds the hyperplane leaving
the largest possible fraction of points of the same class on the
same side, while maximizing the distance of either class from
the hyperplane. According to [3], this hyperplane minimizes
the risk of misclassifying examples of the test set.
Manuscript received January 21, 1999; revised April 30, 1999.
The authors are with the Speech and Image Processing Services Research
Laboratory, AT&T Labs-Research, Red Bank, NJ 07701 USA.
Publisher Item Identifier S 1045-9227(99)07269-0.
This paper follows an experimental approach, and its
organization unfolds as increasingly better results are obtained
through modifications of the SVM architecture. Section II
provides a brief introduction to SVM’s. Section III describes
the image recognition problem on Corel photo images. Section
IV compares SVM and KNN-based recognition techniques
which are inspired by previous work. From these results,
Section V explores novel techniques, based on either selecting
the SVM kernel or remapping the input, that provide high image
recognition performance with low computational requirements.
II. SUPPORT VECTOR MACHINES
A. Optimal Separating Hyperplanes
We give in this section a very brief introduction to SVM’s.
Let $\{(\mathbf{x}_i, y_i)\}_{1 \le i \le N}$ be a set of training examples, each example
$\mathbf{x}_i \in \mathbb{R}^d$, $d$ being the dimension of the input space, belongs
to a class labeled by $y_i \in \{-1, 1\}$. The aim is to define a
hyperplane which divides the set of examples such that all
the points with the same label are on the same side of the
hyperplane. This amounts to finding $\mathbf{w}$ and $b$ so that

$$y_i(\mathbf{w} \cdot \mathbf{x}_i + b) > 0, \qquad i = 1, \ldots, N \qquad (1)$$
If there exists a hyperplane satisfying (1), the set is said
to be linearly separable. In this case, it is always possible to
rescale $\mathbf{w}$ and $b$ so that

$$\min_{1 \le i \le N} y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1$$

i.e., so that the distance from the closest point to the
hyperplane is $1/\|\mathbf{w}\|$. Then, (1) becomes

$$y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 \qquad (2)$$
Among the separating hyperplanes, the one for which the
distance to the closest point is maximal is called the optimal
separating hyperplane (OSH). Since the distance to the closest
point is $1/\|\mathbf{w}\|$, finding the OSH amounts to minimizing
$\|\mathbf{w}\|^2$ under constraints (2).
The quantity $2/\|\mathbf{w}\|$ is called the margin, and thus the OSH
is the separating hyperplane which maximizes the margin. The
margin can be seen as a measure of the generalization ability:
the larger the margin, the better the generalization is expected
to be [4], [5].
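The following is a small numerical illustration of the OSH (our own sketch, not part of the paper; it assumes scikit-learn and approximates the hard-margin case with a very large C):

    import numpy as np
    from sklearn.svm import SVC

    # Toy linearly separable data: class +1 around (2, 2), class -1 around (-2, -2)
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(2.0, 0.5, (20, 2)), rng.normal(-2.0, 0.5, (20, 2))])
    y = np.array([1] * 20 + [-1] * 20)

    # A very large C approximates the hard-margin optimal separating hyperplane
    clf = SVC(kernel="linear", C=1e6).fit(X, y)
    w, b = clf.coef_[0], clf.intercept_[0]
    print("margin 2/||w|| =", 2.0 / np.linalg.norm(w))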
Since $\|\mathbf{w}\|^2$ is convex, minimizing it under linear constraints
(2) can be achieved with Lagrange multipliers. If we denote