[Flowchart: Training images → Feature extraction → Non-negative sparse coding, multi-scale max pooling → Sparse and low-rank matrix decomposition → Locality-constrained linear coding (LLC) → Linear SVM classifier]
Figure 1. The flowchart of the proposed method.
structed by the training images of the same class. However, images are often contaminated with noise, and a single image may contain multiple objects with different poses and occlusions, so using the training images directly as the bases is sometimes not discriminative enough to boost the final classification performance. Moreover, images of the same class often share many similarities and correlate with each other, and hence exhibit a degenerate structure [6]. This semantic information can help make correct classifications if computed properly.
In this paper, we propose a new image classification framework that leverages non-negative sparse coding together with low-rank and sparse matrix decomposition (LR-Sc+SPM). Figure 1 shows the flowchart of the proposed method. Our framework makes two contributions. First, we extend the recent work on image classification [4] and propose to use non-negative sparse coding along with max pooling to reduce the information loss during the encoding process for image representation.
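The non-negative sparse coding and max pooling steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: the solver (projected ISTA), the regularization weight `lam`, and the iteration count are all assumptions chosen for clarity.

```python
import numpy as np

def nonneg_sparse_code(X, B, lam=0.1, n_iter=200):
    """Encode descriptors X (d x n) over codebook B (d x k) with
    non-negative sparse coding, i.e. approximately solve
        min_U 0.5*||X - B U||_F^2 + lam*sum(U)   s.t. U >= 0
    via projected ISTA (an illustrative choice of solver)."""
    # step size from the Lipschitz constant of the gradient
    step = 1.0 / (np.linalg.norm(B, 2) ** 2)
    U = np.zeros((B.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = B.T @ (B @ U - X)
        # gradient step, then non-negative soft thresholding
        U = np.maximum(U - step * (grad + lam), 0.0)
    return U

def max_pool(U):
    """Max pooling: one k-dim vector per region, taking the max of
    each code dimension over all descriptors in the region."""
    return U.max(axis=1)
```

Because the codes are constrained to be non-negative, max pooling selects the strongest response of each codeword rather than mixing responses of opposite sign.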
The second is our main contribution. We propose a new image classification method based on the low-rank and sparse matrix decomposition technique. Our work is motivated by two observations: (i) images of the same class often correlate with each other, so if we stack the BoW representations of images within the same class into a matrix, this matrix will ideally be low-rank; (ii) an image contains only a limited number of objects and a limited number of noise types, so the noise in the stacked BoW matrix is sparse. This low-rank and sparsity structure can be exploited to obtain a better image representation than directly using the BoW representations of the training images. Specifically, to obtain more discriminative sparse coding bases from the BoW representations of images, we leverage the low-rank and sparse matrix decomposition technique to decompose the BoW representations of images within the same class into a low-rank matrix and a sparse error matrix. We then use these bases to encode the BoW representation of each image with sparsity and locality constraints. The resulting coding parameters are used to represent the images, and a linear SVM classifier is then utilized to predict their category labels. Experimental results on four public datasets demonstrate the effectiveness of the proposed method.
The rest of the paper is organized as follows. Section 2 introduces related work. Section 3 presents the proposed non-negative sparse coding spatial pyramid matching method (Sc+SPM). Section 4 presents the proposed image classification method based on low-rank and sparse matrix decomposition. Experimental results are given in Section 5. Finally, we conclude in Section 6.
2. Related Work
The use of the bag-of-visual words (BoW) model [1] has
been proven very useful for image classification. Over the
past few years, much work has been done to improve the performance of the BoW model. Some tried to learn discriminative visual codebooks for image classification [12,
13]. Co-occurrence information of visual words was also
modeled in a generative framework [14, 15]. Others tried
to learn discriminative classifiers by considering the spatial
information and correlations among visual words [2-4, 7,
10-11]. To overcome the loss of spatial information in the
BoW model, motivated by Grauman and Darrell’s [3] pyra-
mid matching in feature space, Lazebnik et al. [2] proposed
the spatial pyramid matching (SPM). Since its introduction,
SPM has been widely used and proven very effective.
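The SPM idea can be sketched in a few lines: pool the codes of local descriptors within each cell of increasingly fine grids and concatenate the results. The sketch below is a generic illustration assuming max pooling over a 1x1, 2x2 and 4x4 pyramid; the grid levels and pooling operator are assumptions, not a transcription of [2].

```python
import numpy as np

def spm_pool(codes, xy, levels=(1, 2, 4)):
    """Spatial pyramid max pooling.
    codes : (k, n) codes for n local descriptors
    xy    : (n, 2) descriptor locations, normalized to [0, 1]
    Returns the concatenation of per-cell pooled vectors over
    1x1, 2x2 and 4x4 grids -> k * (1 + 4 + 16) dimensions."""
    k, n = codes.shape
    feats = []
    for g in levels:
        # assign each descriptor to one cell of the g x g grid
        ix = np.minimum((xy[:, 0] * g).astype(int), g - 1)
        iy = np.minimum((xy[:, 1] * g).astype(int), g - 1)
        cell = ix * g + iy
        for c in range(g * g):
            mask = cell == c
            feats.append(codes[:, mask].max(axis=1) if mask.any()
                         else np.zeros(k))  # empty cell -> zeros
    return np.concatenate(feats)
```

The concatenated vector preserves coarse spatial layout that a single global histogram discards, which is what gives SPM its advantage over the plain BoW model.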
Recently, Yang et al. [4] proposed an extension of the
SPM approach by leveraging sparse coding and achieved
the state-of-the-art performance for image classification
when only one type of local feature (SIFT) is used. This
method can automatically learn the optimal codebook and
search for the optimal coding weights for each local feature.
After this, max pooling along with SPM is used to get the
feature representation of images. Inspired by this, Wang et
al. [16] proposed to use locality to constrain the sparse cod-
ing process which can be computed faster and yields better
performance. [11, 17] also tried to jointly learn the optimal
codebooks and classifiers. However, sparse coding [18] places no constraint on the sign of the coding parameters, so negative parameters are sometimes produced in order to satisfy the sparse coding constraints. For some applications [19], non-negative sparse coding [20] is required.
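The locality-constrained coding of Wang et al. [16] mentioned above admits a closed-form per-descriptor solution in its approximate k-NN form, which is why it is faster than iterative sparse coding. A minimal sketch under that reading; the neighborhood size `knn` and regularizer `beta` are illustrative values, not the authors' settings:

```python
import numpy as np

def llc_encode(x, B, knn=5, beta=1e-4):
    """Approximate LLC: reconstruct descriptor x (d,) from its knn
    nearest codewords in codebook B (d x k) by a sum-to-one
    constrained least squares. Returns a (k,) code with at most
    knn nonzero entries."""
    d, k = B.shape
    # select the knn codewords closest to x (locality constraint)
    dist = np.linalg.norm(B - x[:, None], axis=0)
    idx = np.argsort(dist)[:knn]
    # shifted local bases and regularized covariance
    Z = B[:, idx] - x[:, None]                      # d x knn
    C = Z.T @ Z
    C += beta * np.trace(C) * np.eye(knn)
    # closed-form solution, then normalize to sum to one
    w = np.linalg.solve(C, np.ones(knn))
    w /= w.sum()
    code = np.zeros(k)
    code[idx] = w
    return code
```

Each descriptor is thus expressed only over nearby codewords, which yields sparse codes without solving an l1-regularized problem per descriptor.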
Sparse coding has been used not only for local features but also holistically on the entire image. Wright et al. [6] cast face recognition as finding a sparse representation of the test image over the training set as bases, achieving impressive results. Bradley and Bagnell [9] trained a compact codebook using sparse coding. Yuan and Yan [7] performed visual classification with multi-task joint sparse representation by fusing different types of features. Liu et al. [19] tried to learn sparse and non-negative representations of im-