Neural Networks 22 (2009) 49–57
A signal theory approach to support vector classification: The sinc kernel

James D.B. Nelson, Robert I. Damper∗, Steve R. Gunn, Baofeng Guo

Information: Signals, Images, Systems Research Group, School of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, UK

This research was supported by the Data and Information Fusion (DIF) Defence Technology Centre, United Kingdom, under DTC Project 8.2, funded by the UK Ministry of Defence and managed by General Dynamics Limited and QinetiQ.

∗ Corresponding author. Tel.: +44 0 23 8059 4577.
E-mail addresses: jn@ecs.soton.ac.uk (J.D.B. Nelson), rid@ecs.soton.ac.uk (R.I. Damper), srg@ecs.soton.ac.uk (S.R. Gunn), bg@ecs.soton.ac.uk (B. Guo).
Article info
Article history:
Received 11 May 2006
Accepted 15 September 2008
Keywords:
Hyperspectral imaging
Parameter estimation
Regularisation
Reproducing kernel Hilbert spaces
Sequency analysis
Signal theory
Sinc kernel
Support vector machines
Abstract
Fourier-based regularisation is considered for the support vector machine classification problem over
absolutely integrable loss functions. By invoking the modest assumption that the decision function
belongs to a Paley–Wiener space, it is shown that the classification problem can be developed in the
context of signal theory. Furthermore, by employing the Paley–Wiener reproducing kernel, namely the
sinc function, it is shown that a principled and finite kernel hyper-parameter search space can be
discerned, a priori. Subsequent simulations performed on a commonly available hyperspectral image data
set reveal that the approach yields results that surpass state-of-the-art benchmarks.
© 2009 Published by Elsevier Ltd
1. Introduction
An often-cited property of the support vector machine (SVM)
learning method is the existence of a unique solution. Another
very desirable attribute, namely flexibility, is readily realised by
the introduction of non-linear kernel methods. But herein lies a
conflict. Although flexibility admits richness, it also introduces
parameters, and thereby precludes uniqueness. Whether the
parameter takes the form of a scaling vector, a scaling number, or
the kernel itself, the fact remains that in the context of non-linear
support vector machines there are uncountably many solutions.
Unfortunately, the only way to determine the best solution is to
build uncountably many kernels. This is, of course, intractable.
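For orientation, the short sketch below (an illustration of ours, not drawn from the paper) shows the usual ad hoc work-around: the continuous hyper-parameter space of an RBF kernel is discretised on an arbitrary grid and searched by cross-validation. The grid values and the toy data are placeholders; the approach developed in this paper aims to replace such an unprincipled discretisation with a finite candidate set derived a priori.

# Ad hoc work-around for the continuum of kernel hyper-parameters:
# discretise the RBF width gamma on an arbitrary grid and cross-validate.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Toy data standing in for a real classification problem.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Arbitrary discretisation of the continuous hyper-parameter space.
param_grid = {
    "C": [0.1, 1.0, 10.0],            # soft-margin trade-off
    "gamma": np.logspace(-3, 2, 6),   # ad hoc grid over (0, infinity)
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))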
However, when framed in the context of reproducing kernel
Hilbert spaces, it has been shown by Girosi (1998) that the
choice of kernel and parameters controls the nature and degree of
regularisation that is imposed on the solution. A related issue is
that the so-called curse of dimensionality often turns out not to
have the detrimental effect that is predicted. Some recent machine
learning research has focused on finding cogent explanations for
this phenomenon. Belkin and Niyogi (2004) argue that a possible
reason is that the data lie on a sub-manifold, embedded in the
input space. Indeed, data with a large number of variables may
lie entirely in a much smaller-dimensional manifold. Knowledge
pertaining to the structure of the manifold can be used to
guide the choice of parameters, and thus the nature and degree
of regularisation. Such realisations lead to a more considered
approach: namely, to ascertain, a priori, properties of the space
wherein the data lie. Although there may still exist infinitely many
solutions, the range of an empirical search could then at least be
focused upon subsets of parameters rather than all possible choices
of parameters.
We propose a principled way of reducing the infinite parameter
search space to a finite one that can be searched exhaustively. Our approach
is motivated by sampling theory, where the main goal is to
establish equivalence relations between data sequence spaces
and kernel function spaces. To this end, we employ perhaps the
most elementary function space from sampling theory, namely
the simply connected and zero-centred Paley–Wiener reproducing
kernel Hilbert space, more commonly referred to by engineers as
baseband-limited signals. For a given class of data, we show how to
estimate, a priori, a suitable kernel and parameter subspace. Smale
and Zhou (2004) have also studied the application of sampling
theory and reproducing kernel Hilbert spaces to learning theory.
They consider the least squares loss regression problem and
construct probability estimates for the sampling error. The work
reported here adds to the small body of literature on this
under-explored topic.
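To fix ideas, the sketch below illustrates one possible realisation of the construction just outlined: a separable sinc kernel used as a custom SVM kernel, with the band limit b drawn from a small finite candidate set. The product form of the kernel, the candidate values, and the toy data are assumptions made purely for illustration; the paper derives the candidate set a priori from the data rather than positing it.

# Minimal sketch, assuming a product-form sinc kernel with a single band
# limit b. The candidate values of b below are placeholders, whereas the
# paper derives a finite candidate set a priori from the data.
import numpy as np
from sklearn.svm import SVC

def sinc_kernel(X, Y, b=1.0):
    """Gram matrix of k(x, y) = prod_j sinc(b * (x_j - y_j)).

    np.sinc is the normalised sinc, sin(pi t) / (pi t), which (up to
    scaling conventions) is the reproducing kernel of the Paley-Wiener
    space of baseband-limited signals.
    """
    diff = X[:, None, :] - Y[None, :, :]        # (n, m, d) pairwise differences
    return np.prod(np.sinc(b * diff), axis=-1)  # (n, m) Gram matrix

# Finite candidate set for the band limit (placeholder values here).
candidate_b = [0.5, 1.0, 2.0, 4.0]

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 5))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)

for b in candidate_b:
    clf = SVC(kernel=lambda A, B, b=b: sinc_kernel(A, B, b=b))
    clf.fit(X_train, y_train)
    print(f"b = {b}: training accuracy = {clf.score(X_train, y_train):.3f}")

Because SVC accepts a callable kernel, each candidate band limit yields one fully specified SVM, so the resulting search is finite and exhaustive rather than a search over a continuum.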
The remainder of this paper is structured as follows. In
Section 2, the data class under consideration and its corresponding
reproducing kernel Hilbert space are constructed. Accordingly,
some necessary signal theory concepts are introduced and
discussed in Section 3, and exploited in Section 4. Finally, in
Section 5, we report the best results to date on a popular hyperspectral image data set.