Supervised sparse patch coding towards misalignment-robust face recognition
Congyan Lang a,⇑, Songhe Feng a, Bin Chen b, Xiaotong Yuan b
a Department of Computer Science and Engineering, Beijing Jiaotong University, Beijing, China
b Department of Electrical and Computer Engineering, National University of Singapore, Singapore
article info
Article history:
Available online 19 June 2012
Keywords:
Face recognition
Spatial misalignment
Image occlusions
Sparse coding
Misalignment robust
Supervised sparse coding
Dual sparsity
Collective sparse reconstructions
abstract
We address the challenging problem of face recognition in scenarios where both training and test
data are possibly contaminated with spatial misalignments. A supervised sparse coding framework is
developed in this paper towards a practical solution to misalignment-robust face recognition. Each gallery
face image is represented as a set of patches, in both original and misaligned positions and scales, and
each given probe face image is then uniformly divided into a set of local patches. We propose to sparsely
reconstruct each probe image patch from the patches of all gallery images, and at the same time the
reconstructions for all patches of the probe image are regularized by one term towards enforcing sparsity
on the subjects of those selected patches. The reconstruction coefficients derived by ℓ1-norm minimization
are then utilized to fuse the subject information of the patches for identifying the probe face. Such a
supervised sparse coding framework provides a unique solution to face recognition with all of the
following four characteristics (we emphasize “all” because some conventional face recognition algorithms
possess only some of them): (1) the solution is model-free, without the model learning
process, (2) the solution is robust to spatial misalignments, (3) the solution is robust to image occlusions,
and (4) the solution is effective even when there exist spatial misalignments for gallery images. Extensive
face recognition experiments on three benchmark face datasets demonstrate the advantages of the proposed
framework over holistic sparse coding and conventional subspace learning based algorithms in
terms of robustness to spatial misalignments and image occlusions.
© 2012 Elsevier Inc. All rights reserved.
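The patch-level reconstruction and subject fusion described in the abstract can be illustrated with a simplified sketch. It is not the paper's full formulation: it solves an independent ℓ1-regularized least-squares problem for each probe patch using ISTA (a generic solver swapped in here for illustration) and omits the additional term that enforces sparsity over subjects across all probe patches. The function names, the value of `lam`, and the fusion rule (summing absolute coefficients per subject) are assumptions.

```python
import math

def soft_threshold(x, t):
    """Soft-thresholding operator, the proximal map of t * |x|."""
    return math.copysign(max(abs(x) - t, 0.0), x)

def ista_lasso(D, y, lam=0.05, n_iter=500):
    """Approximately minimize 0.5 * ||y - D c||^2 + lam * ||c||_1 with ISTA.

    D is a list of atoms (one flattened gallery patch per atom);
    y is a flattened probe patch of the same length.
    """
    n_atoms, dim = len(D), len(y)
    # The squared Frobenius norm upper-bounds the gradient's Lipschitz constant.
    L = sum(v * v for atom in D for v in atom) or 1.0
    c = [0.0] * n_atoms
    for _ in range(n_iter):
        # Residual r = D c - y.
        r = [sum(c[j] * D[j][i] for j in range(n_atoms)) - y[i] for i in range(dim)]
        # Gradient step followed by shrinkage.
        c = [soft_threshold(c[j] - sum(D[j][i] * r[i] for i in range(dim)) / L, lam / L)
             for j in range(n_atoms)]
    return c

def identify(probe_patches, gallery_patches, gallery_subjects, lam=0.05):
    """Sum absolute coefficients per subject over all probe patches (assumed fusion rule)."""
    scores = {}
    for y in probe_patches:
        coeffs = ista_lasso(gallery_patches, y, lam)
        for coeff, subject in zip(coeffs, gallery_subjects):
            scores[subject] = scores.get(subject, 0.0) + abs(coeff)
    return max(scores, key=scores.get), scores

# Toy usage: three gallery patches from two hypothetical subjects.
gallery = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
subjects = ["A", "A", "B"]
probe = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.0]]
best, scores = identify(probe, gallery, subjects)
```

In this toy setting the probe patches are reconstructed almost entirely from the patches of subject "A", so the fused score favors that subject. A practical gallery would also contain patches extracted at shifted positions and scales, as the abstract describes.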
1. Introduction
Face recognition has been motivated by both its scientific value
and its potential applications in computer vision
and machine learning. The problem has been extensively studied,
and much progress has been achieved during the past decades.
As a standard preprocessing step, face alignment
and cropping are generally applied in automatic face recognition
systems, and face images are typically aligned according to the
positions of the eyes [10–12]. The main purpose of face
alignment is to build semantic correspondences between the
pixels of different images, so that classification can proceed by matching
pixels with identical semantic meaning.
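As a minimal sketch of eye-based alignment, the following computes the similarity transform (rotation, uniform scale, translation) that maps two detected eye centres onto fixed target positions. The target coordinates and function names are hypothetical choices for illustration; the cited systems may use different conventions.

```python
import math

def eye_alignment_transform(left_eye, right_eye,
                            target_left=(30.0, 40.0), target_right=(70.0, 40.0)):
    """Return (a, b, tx, ty) of the similarity transform
        x' = a*x - b*y + tx,   y' = b*x + a*y + ty
    that maps the detected eye centres onto the (assumed) target positions."""
    dx, dy = right_eye[0] - left_eye[0], right_eye[1] - left_eye[1]
    tdx, tdy = target_right[0] - target_left[0], target_right[1] - target_left[1]
    scale = math.hypot(tdx, tdy) / math.hypot(dx, dy)
    angle = math.atan2(tdy, tdx) - math.atan2(dy, dx)
    a, b = scale * math.cos(angle), scale * math.sin(angle)
    # Choose the translation so that the left eye lands exactly on its target.
    tx = target_left[0] - (a * left_eye[0] - b * left_eye[1])
    ty = target_left[1] - (b * left_eye[0] + a * left_eye[1])
    return a, b, tx, ty

def apply_transform(params, point):
    """Apply the similarity transform to a single (x, y) point."""
    a, b, tx, ty = params
    x, y = point
    return a * x - b * y + tx, b * x + a * y + ty

# Usage with made-up eye detections in image coordinates.
params = eye_alignment_transform((100.0, 120.0), (160.0, 150.0))
aligned_right_eye = apply_transform(params, (160.0, 150.0))
```

Any error in the detected eye positions propagates directly into the transform, which is one way the misalignments discussed in this paper arise.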
Unfortunately, the images may not be accurately aligned, and
the pixels for the same facial landmarks may not be strictly
matched. Practical systems, or even manual face cropping, may
introduce considerable image misalignments, including translation,
scaling, and rotation. As a result, two pixels at the same position
in different images may no longer share the same semantics.
This discrepancy may adversely affect image similarity
measurement, and consequently degrade face recognition
performance. Recognizing faces under spatial misalignments,
where the margins between subjects tend to be more ambiguous,
is thus a challenging problem.
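The effect of a misalignment on a pixel-wise similarity measure can be seen in a toy example. The sketch below shifts a synthetic image by two pixels and measures the Euclidean distance to the original; the image, the shift, and the helper names are illustrative assumptions, not data from the paper.

```python
import math

def translate(image, dx, dy, fill=0.0):
    """Shift a 2-D image (list of rows) by (dx, dy) pixels, padding with `fill`."""
    h, w = len(image), len(image[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx, sy = x - dx, y - dy  # source pixel that lands at (x, y)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = image[sy][sx]
    return out

def euclidean_distance(img_a, img_b):
    """Pixel-wise Euclidean distance between two images of equal size."""
    return math.sqrt(sum((p - q) ** 2
                         for row_a, row_b in zip(img_a, img_b)
                         for p, q in zip(row_a, row_b)))

# Toy 8x8 "face": a bright 4x4 square on a dark background.
face = [[1.0 if 2 <= x < 6 and 2 <= y < 6 else 0.0 for x in range(8)]
        for y in range(8)]
shifted = translate(face, 2, 0)

same = euclidean_distance(face, face)
misaligned = euclidean_distance(face, shifted)
```

The content of `shifted` is identical to `face` up to a horizontal translation, yet the distance between them is non-zero because pixels at the same position no longer correspond to the same part of the pattern.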
In the literature, there exist some attempts to analyze and
tackle this type of problem. Shan et al. [13] showed that the effect
of spatial misalignments can be alleviated to some extent by adding
virtual gallery samples with artificial spatial misalignments.
Yang et al. [20] proposed a solution to improve algorithmic robustness
to image misalignments with ubiquitously supervised subspace
learning. Xu et al. [19] proposed a solution based on the
so-called Spatially constrained Earth Mover’s Distance (SEMD),
which is more robust against spatial misalignments than traditional
distance measures (e.g., Euclidean distance). Recently, Wang
et al. [16] provided a novel and efficient algorithm for face recognition
under scenarios with spatial misalignments by solving a
constrained ℓ1-norm optimization problem, which minimizes the
error between the misalignment-amended image and the image
reconstructed from the given subspace along with its principal
complementary subspace. However, the spatial misalignment
problem is still far from being solved, since: (1) most of these
methods focus on the global features of face images, yet typically,
1047-3203/$ - see front matter © 2012 Elsevier Inc. All rights reserved.
http://dx.doi.org/10.1016/j.jvcir.2012.06.002
⇑ Corresponding author.
E-mail addresses: cylang@bjtu.edu.cn (C. Lang), shfeng@bjtu.edu.cn (S. Feng),
g0800415@nus.edu.sg (B. Chen), eleyuanx@nus.edu.sg (X. Yuan).
J. Vis. Commun. Image R. 24 (2013) 103–110
journal homepage: www.elsevier.com/locate/jvci