in a high-dimensional feature space rather than in the original feature space. It is worth pointing out that the KSRC presented in [10] is slightly different from that in [12–14], since it integrates the KSRC described in [12–14] with dimensionality reduction in the kernel feature space.
In the machine learning area, the kernel trick was originally used to develop support vector machines (SVM) [16], kernel Fisher discriminant analysis (KFDA) [17], and kernel principal component analysis (KPCA) [18], since it provides an easy way to generalize a linear algorithm into a nonlinear one. These kernel-based learning methods employ the kernel trick to map the original data into a high-dimensional feature space (also called the kernel feature space) through a nonlinear mapping, and then perform linear processing in this high-dimensional feature space in terms of inner products. The inner products in the high-dimensional feature space are computed by a kernel function associated with the implicit mapping.
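To make the kernel trick concrete, the following minimal Python sketch (an illustrative addition, not part of the original text) checks that the inner product under an explicit degree-2 polynomial feature map coincides with the corresponding polynomial kernel value, so the high-dimensional mapping never has to be computed explicitly.

```python
import numpy as np

def phi(x):
    # Explicit degree-2 polynomial feature map for 2-D input (illustrative only):
    # phi([x1, x2]) = [x1^2, x2^2, sqrt(2)*x1*x2]
    x1, x2 = x
    return np.array([x1 ** 2, x2 ** 2, np.sqrt(2) * x1 * x2])

xi = np.array([1.0, 2.0])
xj = np.array([3.0, -1.0])

# Inner product in the feature space vs. the kernel value k(xi, xj) = (xi . xj)^2
print(phi(xi) @ phi(xj))   # 1.0
print((xi @ xj) ** 2)      # 1.0 -- identical, without ever forming phi explicitly
```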
Similarly, KSRC maps the data into a high-dimensional feature
space by using some nonlinear mapping associated with a kernel
function and then implements the SRC algorithm in this high-
dimensional feature space. As a kernel method, KSRC is capable of
capturing the nonlinear relationships of data, and thus performs
better than SRC for classification.
Although KSRC, which integrates the kernel method and the SRC method, has been successfully applied to image classification and face recognition, it is not able to capture the locality structure of data. In the pattern recognition area, data locality is an important issue in problems such as the K-nearest-neighbor (KNN) classifier [19], data clustering [20], dimensionality reduction [21,22], and image classification [23]. Note that recently reported work [22,23] has shown that, in the problem of sparse coding, data locality is more essential than sparsity, since integrating data locality with the original sparse coding methods yields more effective sparse coding coefficients. Motivated by
the advantage of data locality, in this paper we integrate KSRC
with data locality in the kernel feature space rather than in
the original feature space, and develop an extension of KSRC,
called locality-sensitive kernel sparse representation classifica-
tion (LS-KSRC).
The remainder of this paper is organized as follows: Section 2
reviews kernel sparse representation classification (KSRC) in brief.
Section 3 provides the proposed LS-KSRC method in detail. Exper-
imental results and analysis are given in Section 4. Section 5 offers
the concluding remarks and discussions.
2. Review of kernel sparse representation classification
2.1. Sparse representation classification
In this section, we briefly introduce the principle of SRC [7] based on the CS theory. Essentially, SRC is based on the linearity assumption that the whole set of training samples forms a dictionary; the recognition problem is then cast as one of discriminatively finding a sparse representation of the test image as a linear combination of training images by solving an $\ell_1$-norm optimization problem.
Given a set of training samples with dimensionality $d$, $\{(\mathbf{x}_i, y_i) \mid \mathbf{x}_i \in \mathbb{R}^d,\ y_i \in \{1, 2, \ldots, c\},\ i = 1, 2, \ldots, N\}$, where $y_i$ is the class label of the input data $\mathbf{x}_i$ and $c$ is the number of classes, the goal of SRC is to use the given $c$-class training samples to correctly predict the class label $y$ of a test sample $\mathbf{x}$.
Now let the $j$th class training samples form the columns of a matrix $\mathbf{X}_j = [\mathbf{x}_{j,1}, \mathbf{x}_{j,2}, \ldots, \mathbf{x}_{j,n_j}] \in \mathbb{R}^{d \times n_j}$, $j = 1, 2, \ldots, c$, where $\mathbf{x}_{j,i}$ is the $i$th training sample of the $j$th class, and $n_j$ denotes the number of training samples in the $j$th class. Then, for all training samples, a new sample matrix $\mathbf{X}$ can be expressed as
$$\mathbf{X} = [\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_c] \qquad (1)$$
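As a minimal illustration (the dimensions and random data below are hypothetical, not taken from the paper), the per-class matrices can be stacked column-wise to build the dictionary of Eq. (1):

```python
import numpy as np

# Hypothetical toy setup: d = 4 features, c = 3 classes with n_j = 5, 6, 4 samples each.
d, sizes = 4, [5, 6, 4]
rng = np.random.default_rng(0)
X_list = [rng.standard_normal((d, n_j)) for n_j in sizes]  # each X_j is d x n_j
X = np.hstack(X_list)                                      # X = [X_1, X_2, X_3], Eq. (1)
print(X.shape)                                             # (4, 15), i.e., d x sum(n_j)
```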
In SRC, a test sample $\mathbf{x}$ can be represented linearly by all the training samples as
$$\mathbf{x} = \mathbf{X}\boldsymbol{\alpha} + \mathbf{e} \qquad (2)$$
where $\boldsymbol{\alpha}$ is the coefficient vector and $\mathbf{e}$ is the approximation error.
The linearity assumption in SRC implies that the entries of the coefficient vector $\boldsymbol{\alpha}$ are expected to be zero except for some of those associated with the correct class of the test sample. To obtain the coefficient vector $\boldsymbol{\alpha}$, the following $\ell_1$-norm minimization problem should be solved:
$$\min_{\boldsymbol{\alpha}} \|\boldsymbol{\alpha}\|_1, \quad \text{subject to}\ \|\mathbf{x} - \mathbf{X}\boldsymbol{\alpha}\|_2^2 \leq \varepsilon \qquad (3)$$
This is a convex optimization problem and can be solved by quadratic programming. So far, a variety of efficient algorithms have been proposed to solve the $\ell_1$-norm minimization of Eq. (3), such as l1-magic [24], l1-ls [25], the spectral projected gradient method (SPGL1) [26], and NESTA (a shorthand for Nesterov's algorithm) [27].
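As an illustrative sketch only (using the general-purpose convex optimization package cvxpy rather than any of the specialized solvers cited above), the constrained problem in Eq. (3) can be posed and solved directly as follows; the function name `solve_l1` and the default tolerance are assumptions.

```python
import cvxpy as cp
import numpy as np

def solve_l1(X, x, eps=1e-3):
    # Solve  min ||alpha||_1  s.t.  ||x - X alpha||_2^2 <= eps   (Eq. (3))
    n = X.shape[1]
    alpha = cp.Variable(n)
    objective = cp.Minimize(cp.norm(alpha, 1))             # l1-norm objective
    constraints = [cp.sum_squares(x - X @ alpha) <= eps]   # reconstruction constraint
    cp.Problem(objective, constraints).solve()
    return alpha.value                                     # numpy array of coefficients
```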
Once the coefficient vector $\boldsymbol{\alpha}$ is found, the test sample $\mathbf{x}$ can be classified in terms of the reconstruction errors (residuals) between $\mathbf{x}$ and its approximations. The $j$th approximation of the test sample $\mathbf{x}$ is obtained by using only the coefficients belonging to the $j$th class. The test sample $\mathbf{x}$ is then assigned to the class with the minimum residual. The detailed classification procedure of SRC [7] is summarized in Algorithm 1:
Algorithm 1. Sparse representation classification (SRC)
(1) Input: the matrix of all training samples $\mathbf{X}$, and a test sample $\mathbf{x}$.
(2) Solve the $\ell_1$-norm minimization problem in Eq. (3).
(3) Compute the residuals using the samples associated with the $j$th class: $r_j(\mathbf{x}) = \|\mathbf{x} - \mathbf{X}\boldsymbol{\alpha}_j\|_2^2$.
(4) Output: the class label $y$ of the given test sample $\mathbf{x}$: $y = \arg\min_{j=1,2,\ldots,c} r_j(\mathbf{x})$.
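The following Python sketch puts Algorithm 1 together end to end; the function name `src_classify`, the list-of-class-matrices interface, and the reuse of the `solve_l1` helper sketched above are illustrative choices rather than the paper's implementation.

```python
import numpy as np

def src_classify(X_list, x, solve_l1, eps=1e-3):
    # X_list: per-class training matrices X_j (d x n_j), columns are samples
    # x: test sample of length d; solve_l1: an l1-norm solver for Eq. (3)
    X = np.hstack(X_list)              # dictionary X = [X_1, ..., X_c], Eq. (1)
    alpha = solve_l1(X, x, eps)        # step (2): sparse coefficients

    residuals, start = [], 0
    for X_j in X_list:                 # step (3): class-wise residuals r_j(x)
        n_j = X_j.shape[1]
        alpha_j = np.zeros_like(alpha)
        alpha_j[start:start + n_j] = alpha[start:start + n_j]
        residuals.append(np.sum((x - X @ alpha_j) ** 2))
        start += n_j
    return int(np.argmin(residuals))   # step (4): index of the class with minimum residual
```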
2.2. Kernel sparse representation classification
By integrating the kernel trick and the SRC method, kernel sparse representation classification (KSRC) [10–14] has been developed as a nonlinear extension of SRC. Essentially, KSRC aims to
seek sparse representation coefficients in the kernel feature space
rather than in the original feature space.
Assume that there exists a nonlinear kernel mapping $\phi$ for each input data point $\mathbf{x}$:
$$\phi: \mathbb{R}^d \rightarrow \mathcal{F}, \quad \mathbf{x} \mapsto \phi(\mathbf{x}) \qquad (4)$$
With the nonlinear mapping $\phi$, we can map the input data point $\mathbf{x}_i \in \mathbb{R}^d$ into some potentially high-dimensional feature space $\mathcal{F}$. In this kernel feature space $\mathcal{F}$, an inner product $\langle \cdot, \cdot \rangle$ can be defined for a properly chosen $\phi$, which gives rise to a so-called reproducing kernel Hilbert space (RKHS). In the RKHS, a kernel function $k(\mathbf{x}_i, \mathbf{x}_j)$ is defined as
$$k(\mathbf{x}_i, \mathbf{x}_j) = \langle \phi(\mathbf{x}_i), \phi(\mathbf{x}_j) \rangle = \phi(\mathbf{x}_i)^{\mathrm{T}} \phi(\mathbf{x}_j) \qquad (5)$$
where $k$ is known as a kernel. There are three commonly used kernel functions, i.e., the linear kernel, the polynomial kernel, and the Gaussian kernel.
The linear kernel function is defined as
$$k(\mathbf{x}_i, \mathbf{x}_j) = \mathbf{x}_i^{\mathrm{T}} \mathbf{x}_j \qquad (6)$$
The polynomial kernel function is defined as
$$k(\mathbf{x}_i, \mathbf{x}_j) = (\mathbf{x}_i^{\mathrm{T}} \mathbf{x}_j)^{r} \qquad (7)$$
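For illustration, the kernels above and the corresponding kernel (Gram) matrix can be implemented as below. The Gaussian kernel's exact parameterization is not shown in this excerpt, so the bandwidth form used here is a common convention rather than the paper's definition, and the `gram_matrix` helper is an added convenience.

```python
import numpy as np

def linear_kernel(xi, xj):
    return xi @ xj                      # Eq. (6)

def polynomial_kernel(xi, xj, r=2):
    return (xi @ xj) ** r               # Eq. (7)

def gaussian_kernel(xi, xj, sigma=1.0):
    # Common convention: k(xi, xj) = exp(-||xi - xj||^2 / (2 * sigma^2))
    return np.exp(-np.sum((xi - xj) ** 2) / (2.0 * sigma ** 2))

def gram_matrix(X, kernel):
    # K[i, j] = k(x_i, x_j) over the columns of the sample matrix X (d x N)
    n = X.shape[1]
    return np.array([[kernel(X[:, i], X[:, j]) for j in range(n)] for i in range(n)])
```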