Joint Regularized Nearest Points for Image Set based Face Recognition
Meng Yang¹, Weiyang Liu², and Linlin Shen¹
¹College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
²School of Electronic & Computer Engineering, Peking University
Abstract—Face recognition based on image sets has attracted
much attention due to its promising performance in overcoming
various variations. Recently, (collaborative) regularized nearest
points ((C)RNP) has achieved state-of-the-art performance by
measuring the between-set distance as the distance between the
nearest points generated in each image set. However, in RNP the
nearest point of the query set changes when computing its distance
to the nearest points of different gallery image sets, so a wrong
gallery image set may also have a small between-set distance;
CRNP uses collaborative representation to overcome this issue,
but it does not explicitly minimize the between-set distance. To
solve these issues and fully exploit the advantages of nearest
point based approaches, this paper proposes a novel joint
regularized nearest points (JRNP) method for face recognition
based on image sets. In JRNP, the nearest point in the query set
stays the same when computing its distance to the image sets of
different classes; at the same time, JRNP explicitly minimizes
the between-set distance of facial images. An efficient algorithm
is proposed to solve this problem, and classification is then
based on the joint distance between the regularized nearest
points in image sets. Extensive experiments were conducted on
benchmark databases (e.g., Honda/UCSD, CMU Mobo, and YouTube
databases). The experimental results clearly show that JRNP
achieves leading performance in face recognition based on image sets.
I. INTRODUCTION
Recognizing the objects of interest (e.g., human faces) is
one of the most important and fundamental problems in the
communities of computer vision and pattern recognition.
Although face recognition (FR) has been extensively studied
in the past several decades, traditional face recognition
usually assumes that there is only a single query face image
from which a human identity is recognized. Even when there are
multiple images per subject in the gallery set, it is still a big
challenge to correctly recognize a person’s identity from
only a single query face image captured in less-controlled or
uncontrolled environments, due to the variations (e.g.,
lighting, expression, pose, and disguise changes) present in
facial appearance.
With the wide installation of video cameras and the
development of large-capacity storage media, it has become
very convenient to collect multiple images of a known subject
from video sequences or photo albums and to store these
images as the gallery and query image sets. Multiple face
images in the query and gallery sets for each subject
incorporate more within-class appearance variations and
provide richer information for face recognition. Compared to
traditional face recognition with a single query face image,
face recognition based on image sets can achieve more
satisfactory performance in practical applications and is a
more promising framework for face recognition.
Face recognition based on image sets has been attracting
much attention from researchers over the past decades. The
image sets can be either consecutive video sequences
with temporal information or unordered photo album images
collected from the web at different times. Compared to
video-based face recognition [1][20-23][38-39], face recognition
based on general image sets, in which temporal
information is not available, has wider applications. In this
paper we mainly focus on the face recognition problem based
on general image sets. Numerous approaches have been
proposed for this kind of image-set based recognition problem.
One major category of face recognition methods based on image
sets is the parametric model based approaches [39][24-25]. These
approaches first represent each image set by a parametric
distribution whose parameters are estimated from the data
itself, and then calculate the between-set distance by
measuring the similarity between the two distributions (e.g.,
in terms of Kullback-Leibler divergence [37]). However, the
parametric methods need to solve a difficult parameter
estimation problem and require strong statistical correlations
between the gallery and query sets, which may not exist in
practice. To overcome the shortcomings of parametric model
based approaches, recently
Lu et al. [36] directly extracted the multiple order statistics
features from the image set and developed a multi-kernel
metric learning method to combine different order information.
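
To make the parametric idea concrete, the following minimal Python sketch fits a distribution to each image set and compares sets by Kullback-Leibler divergence. It is an illustration under stated assumptions, not the method of [39][24-25]: each set is modeled as a Gaussian with diagonal covariance (an assumption made here only to keep the closed form simple), images are assumed to be given as plain feature vectors, and the function names fit_diag_gaussian, kl_diag_gaussians, and classify_by_kl are hypothetical.

```python
import math


def fit_diag_gaussian(image_set, eps=1e-6):
    """Estimate the per-dimension mean and variance of a set of feature vectors.

    Assumption: a diagonal-covariance Gaussian; eps keeps variances positive.
    """
    n = len(image_set)
    dim = len(image_set[0])
    mean = [sum(x[d] for x in image_set) / n for d in range(dim)]
    var = [
        sum((x[d] - mean[d]) ** 2 for x in image_set) / n + eps
        for d in range(dim)
    ]
    return mean, var


def kl_diag_gaussians(p, q):
    """Closed-form KL(p || q) for two diagonal-covariance Gaussians."""
    mean_p, var_p = p
    mean_q, var_q = q
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mean_p, var_p, mean_q, var_q)
    )


def classify_by_kl(query_set, gallery_sets):
    """Assign the query set to the gallery class with the smallest divergence."""
    query_model = fit_diag_gaussian(query_set)
    distances = {
        label: kl_diag_gaussians(query_model, fit_diag_gaussian(samples))
        for label, samples in gallery_sets.items()
    }
    return min(distances, key=distances.get), distances
```

The sketch also shows why such methods are sensitive in practice: the divergence is only meaningful when the fitted parameters are reliable, which is hard to guarantee when a set contains few images relative to the feature dimension.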
In order to avoid the drawbacks of model-based methods,
non-parametric, model-free approaches were proposed, which
represent an image set as a convex/affine
subspace [3][19][26-28], a mixture of subspaces [29-31], or
nonlinear manifolds [4][17][32-33]. In nonlinear-manifold
methods, the manifold of an image set is usually represented
as a combination of local linear subspaces [4][33]. In
model-free face recognition based on image sets, the key
problem is how to measure the between-set distance. A popular
way is to define the between-set distance as the distance
between two “exemplars” (e.g., the mean of samples) chosen
from the two image sets. For instance, Cevikalp et al. [3]
characterized each image set by an affine/convex hull spanned
by its samples, and selected the two closest points (one in
each hull) as the “exemplars”. Another way of
measuring the between-set distance in non-parametric
approaches is to compare the structures of the non-parametric
models. For instance, canonical correlation analysis (CCA) [9],
which analyzes the principal angles and canonical correlations
between linear subspaces, is widely used in the works of
978-1-4799-6026-2/15/$31.00 ©2015 IEEE