Support Vector Machines for
Histogram-Based Image Classification
Olivier Chapelle, Patrick Haffner, and Vladimir N. Vapnik
Abstract— Traditional classification approaches generalize
poorly on image classification tasks, because of the high
dimensionality of the feature space. This paper shows that
support vector machines (SVM's) can generalize well on difficult
image classification problems where the only features are
high dimensional histograms. Heavy-tailed RBF kernels of
the form $K(\mathbf{x}, \mathbf{y}) = e^{-\rho \sum_i |x_i^a - y_i^a|^b}$ with
$a \le 1$ and $b \le 2$
are evaluated on the classification of images extracted from
the Corel stock photo collection and shown to far outperform
traditional polynomial or Gaussian radial basis function (RBF)
kernels. Moreover, we observed that a simple remapping of the
input $x_i \rightarrow x_i^a$ improves the performance of linear SVM's to
such an extent that it makes them, for this problem, a valid
alternative to RBF kernels.
Index Terms— Corel, image classification, image histogram,
radial basis functions, support vector machines.
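To make the kernel family above concrete, the following sketch (our own illustration; the function names, default parameter values, and the use of NumPy are assumptions, not part of the paper) evaluates the heavy-tailed RBF kernel and the input remapping on histogram-like vectors:

    import numpy as np

    def heavy_tailed_rbf_kernel(x, y, a=0.5, b=1.0, rho=1.0):
        # K(x, y) = exp(-rho * sum_i |x_i^a - y_i^a|^b), with a <= 1 and b <= 2.
        # a = 1, b = 2 recovers the Gaussian RBF; a = b = 1 gives the Laplacian kernel.
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        return np.exp(-rho * np.sum(np.abs(x**a - y**a) ** b))

    def remap_input(x, a=0.5):
        # Remapping x_i -> x_i^a, applied to the input of a plain linear SVM.
        return np.asarray(x, dtype=float) ** a

    # Example on two (nonnegative) histogram-like vectors
    h1 = np.array([0.2, 0.0, 0.5, 0.3])
    h2 = np.array([0.1, 0.1, 0.6, 0.2])
    print(heavy_tailed_rbf_kernel(h1, h2))

Note that the power $x_i^a$ is only well defined for nonnegative inputs, which is the case for histograms.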
I. INTRODUCTION
LARGE collections of images are becoming available to
the public, from photo collections to Web pages or even
video databases. To index or retrieve them is a challenge which
is the focus of many research projects (for instance IBM’s
QBIC [1]). A large part of this research work is devoted to
finding suitable representations for the images, and retrieval
generally involves comparisons of images. In this paper, we
choose to use color histograms as an image representation
because of the reasonable performance that can be obtained
in spite of their extreme simplicity [2]. Using this histogram
representation, our initial goal is to perform generic object
classification with a “winner takes all” approach: find the one
category of object that is the most likely to be present in a
given image.
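For concreteness, a color histogram of this kind can be computed along the following lines (a minimal sketch under our own assumptions; the number of bins and the RGB color space are illustrative choices, not the configuration used in this paper):

    import numpy as np

    def color_histogram(image, bins_per_channel=8):
        # image: H x W x 3 RGB array with integer values in [0, 255].
        # Returns a normalized histogram of length bins_per_channel**3.
        image = np.asarray(image)
        quantized = (image.astype(int) * bins_per_channel) // 256   # 0 .. bins-1 per channel
        idx = (quantized[..., 0] * bins_per_channel + quantized[..., 1]) \
              * bins_per_channel + quantized[..., 2]                # joint bin index
        hist = np.bincount(idx.ravel(), minlength=bins_per_channel**3)
        return hist / hist.sum()

Each image is thus summarized by a single fixed-length vector, regardless of its size, which is what makes the representation so simple to use with a generic classifier.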
From classification trees to neural networks, there are many
possible choices for what classifier to use. The support vector
machine (SVM) approach is considered a good candidate
because of its high generalization performance without the
need to add a priori knowledge, even when the dimension of
the input space is very high.
Intuitively, given a set of points which belong to either
one of two classes, a linear SVM finds the hyperplane leaving
the largest possible fraction of points of the same class on the
same side, while maximizing the distance of either class from
the hyperplane. According to [3], this hyperplane minimizes
the risk of misclassifying examples of the test set.
Manuscript received January 21, 1999; revised April 30, 1999.
The authors are with the Speech and Image Processing Services Research
Laboratory, AT&T Labs-Research, Red Bank, NJ 07701 USA.
Publisher Item Identifier S 1045-9227(99)07269-0.
This paper follows an experimental approach, and its
organization unfolds as increasingly better results are obtained
through modifications of the SVM architecture. Section II
provides a brief introduction to SVM’s. Section III describes
the image recognition problem on Corel photo images. Section
IV compares SVM and KNN-based recognition techniques
which are inspired by previous work. From these results,
Section V explores novel techniques, based on either selecting
the SVM kernel or remapping the input, that provide high image
recognition performance with low computational requirements.
II. SUPPORT VECTOR MACHINES
A. Optimal Separating Hyperplanes
We give in this section a very brief introduction to SVM’s.
Let $\{(\mathbf{x}_i, y_i)\}_{1 \le i \le N}$ be a set of training examples, each example
$\mathbf{x}_i \in \mathbb{R}^d$, $d$ being the dimension of the input space, belongs
to a class labeled by $y_i \in \{-1, 1\}$. The aim is to define a
hyperplane which divides the set of examples such that all
the points with the same label are on the same side of the
hyperplane. This amounts to finding $\mathbf{w}$ and $b$ so that

$$y_i(\mathbf{w} \cdot \mathbf{x}_i + b) > 0, \qquad i = 1, \ldots, N \qquad (1)$$
If there exists a hyperplane satisfying (1), the set is said
to be linearly separable. In this case, it is always possible to
rescale $\mathbf{w}$ and $b$ so that

$$\min_{1 \le i \le N} y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1$$

i.e., so that the distance from the closest point to the
hyperplane is $1/\|\mathbf{w}\|$. Then, (1) becomes

$$y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 \qquad (2)$$
Among the separating hyperplanes, the one for which the
distance to the closest point is maximal is called the optimal
separating hyperplane (OSH). Since the distance to the closest
point is $1/\|\mathbf{w}\|$, finding the OSH amounts to minimizing
$\|\mathbf{w}\|^2$ under constraints (2).
The quantity $2/\|\mathbf{w}\|$ is called the margin, and thus the OSH
is the separating hyperplane which maximizes the margin. The
margin can be seen as a measure of the generalization ability:
the larger the margin, the better the generalization is expected
to be [4], [5].
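The following is a small numerical illustration of the OSH (our own sketch, not part of the paper; it assumes scikit-learn and approximates the hard-margin case with a very large C):

    import numpy as np
    from sklearn.svm import SVC

    # Toy linearly separable data: class +1 around (2, 2), class -1 around (-2, -2)
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(2.0, 0.5, (20, 2)), rng.normal(-2.0, 0.5, (20, 2))])
    y = np.array([1] * 20 + [-1] * 20)

    # A very large C approximates the hard-margin optimal separating hyperplane
    clf = SVC(kernel="linear", C=1e6).fit(X, y)
    w, b = clf.coef_[0], clf.intercept_[0]
    print("margin 2/||w|| =", 2.0 / np.linalg.norm(w))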
Since $\|\mathbf{w}\|^2$ is convex, minimizing it under linear constraints
(2) can be achieved with Lagrange multipliers. If we denote