166 IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 17, NO. 1, JANUARY 2006
Ensemble-Based Discriminant Learning
With Boosting for Face Recognition
Juwei Lu, Member, IEEE, K. N. Plataniotis, Senior Member, IEEE, A. N. Venetsanopoulos, Fellow, IEEE, and
Stan Z. Li
Abstract—In this paper, we propose a novel ensemble-based
approach to boost performance of traditional Linear Discriminant
Analysis (LDA)-based methods used in face recognition. The
ensemble-based approach builds on the recently emerged technique
known as “boosting.” However, it is generally believed that
boosting-like learning rules are not suited to a strong and stable
learner such as LDA. To overcome this limitation, a novel weakness
analysis theory is developed here. The theory attempts to boost a
strong learner by increasing the diversity between the classifiers
created by the learner, at the expense of decreasing their margins,
so as to achieve a tradeoff suggested by recent boosting studies
for a low generalization error. In addition, a novel distribution
accounting for the pairwise class discriminant information is
introduced for effective interaction between the booster and the
LDA-based learner. The integration of all these methodologies
proposed here leads to the novel ensemble-based discriminant
learning approach, capable of taking advantage of both the
boosting and LDA techniques. Promising experimental results ob-
tained on various difficult face recognition scenarios demonstrate
the effectiveness of the proposed approach. We believe that this
work is especially beneficial in extending the boosting framework
to accommodate general (strong/weak) learners.
Index Terms—Boosting, face recognition (FR), linear discrimi-
nant analysis, machine learning, mixture of linear models, small-
sample-size (SSS) problem, strong learner.
I. INTRODUCTION
A. Face Recognition
FACE RECOGNITION (FR) has a wide range of applications, such as
face-based video indexing and browsing
engines, biometric identity authentication, human-computer
interaction, and multimedia monitoring/surveillance. Within
the past two decades, numerous FR algorithms have been
proposed, and detailed surveys of the developments in the
area have appeared in the literature [1]–[6]. Among various
FR methodologies used, the most popular are the so-called
appearance-based approaches, which include the three most
well-known FR methods, namely Eigenfaces [7], Fisherfaces
[8], and Bayes Matching [9]. With focus on low-dimensional
statistical feature extraction, the appearance-based approaches
Manuscript received March 23, 2004; revised December 24, 2004.
This work was supported in part by the Bell University Laboratories at the
University of Toronto.
J. Lu, K. N. Plataniotis, and A. N. Venetsanopoulos are with The Edward S.
Rogers Sr. Department of Electrical and Computer Engineering, University of
Toronto, ON M5S 3G4 Canada (e-mail: kostas@dsp.toronto.edu).
Stan Z. Li is with the Center for Biometrics and Security Research, Institute
of Automation, Chinese Academy of Sciences, Beijing 100080, P.R. China.
Digital Object Identifier 10.1109/TNN.2005.860853
generally operate directly on appearance images of face objects
and process them as two-dimensional (2-D) holistic patterns
to avoid the difficulties associated with three-dimensional (3-D)
modeling and with shape or landmark detection [5]. Of the
appearance-based FR methods, those based on linear discriminant
analysis (LDA) have shown promising results, as demonstrated
in [8], [10]–[15]. However, statistical learning methods
such as the LDA-based ones often suffer from the so-called
“small-sample-size” (SSS) problem [16], encountered in
high-dimensional pattern recognition tasks where the number
of training samples available for each subject is smaller than the
dimensionality of the samples. For example, in the experiments
reported here only […] training samples per subject
are available while the dimensionality of the sample space
is up to […]. In addition, the performance of linear
appearance-based methods including LDA often deteriorates
rapidly when face patterns are subject to large variations in
viewpoints, illumination or facial expression. These variations
result in a highly nonconvex and complex distribution of face
images [17]. Thus, the limited success of these methods can
be attributed to their linear nature.
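The SSS problem can be illustrated numerically. The following is a minimal sketch, not the procedure used in this paper: it relies on the standard result that the within-class scatter matrix used by LDA has rank at most N - C for N training samples drawn from C classes, so the matrix is singular whenever N - C is smaller than the dimensionality. The sizes, the random data, and the helper name within_class_scatter are all assumptions chosen for illustration.

```python
import numpy as np

def within_class_scatter(X, y):
    """Sum over classes of the scatter of samples around their class mean."""
    dim = X.shape[1]
    S_w = np.zeros((dim, dim))
    for c in np.unique(y):
        Xc = X[y == c]
        centered = Xc - Xc.mean(axis=0)   # remove the class mean
        S_w += centered.T @ centered
    return S_w

# Hypothetical sizes: few samples per class, many dimensions.
n_classes, per_class, dim = 4, 3, 50
n_samples = n_classes * per_class

rng = np.random.default_rng(0)
X = rng.normal(size=(n_samples, dim))          # synthetic stand-in for face vectors
y = np.repeat(np.arange(n_classes), per_class)

S_w = within_class_scatter(X, y)
rank = np.linalg.matrix_rank(S_w)

# The rank cannot exceed n_samples - n_classes, so S_w is not invertible here.
print(f"dim = {dim}, rank(S_w) = {rank}, bound = {n_samples - n_classes}")
```

Because the scatter matrix cannot be inverted in this regime, a direct solution of the classical LDA criterion is unavailable, which is why LDA-based FR methods need some form of dimensionality reduction or regularization.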
In general, a nonconvex distribution can be handled either
by globally nonlinear models or by a mixture of locally linear
models (or ensemble-based methods as they are known in the
machine learning literature [18]). Globally nonlinear methods
are not without problems. Approaches such as those based on
kernel machines [19]–[26] require the optimization of many design
parameters, tend to overfit easily due to the increased algorithmic
complexity, and are computationally expensive compared to their
linear counterparts. The last point is particularly important for
tasks such as face recognition, which are performed in a
high-dimensional input space. On the other hand,
ensemble-based approaches embody the principle of “divide
and conquer,” by which a complex recognition task is decomposed
into a set of simpler ones, each of which involves a locally
linear pattern distribution that can be handled by a relatively
simple linear solution. As such, the ensemble-based methods are
simpler, easier to implement, and more cost effective than the
nonlinear ones. However, most existing ensemble-based FR methods
are built on traditional cluster analysis [27]–[30]. A disadvantage
of this for classification tasks is that the criteria these
clustering techniques use to divide and combine submodels are not
directly related to the classification error rate (CER) of the
resulting classifier, especially the true CER (often referred to
as the generalization error rate).
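The “divide and conquer” principle described above can be seen in a toy setting. The sketch below is only an illustration under stated assumptions: the one-dimensional data set is hypothetical, the partition into two regions is picked by hand (it is neither a clustering result nor the boosting scheme proposed in this paper), and accuracy is measured on the training points themselves. It compares the best single threshold rule with two threshold rules, one per region.

```python
from itertools import product

# Hypothetical 1-D toy data: class 1 lies between two groups of class 0,
# so the class-0 region is not convex.
data = [(-4.0, 0), (-3.5, 0), (-3.0, 0),
        (-1.0, 1), (-0.5, 1), (0.0, 1), (0.5, 1), (1.0, 1),
        (3.0, 0), (3.5, 0), (4.0, 0)]

def best_linear_correct(points):
    """Largest number of points that a single threshold rule,
    'predict 1 iff sign * (x - t) > 0', classifies correctly."""
    xs = sorted(x for x, _ in points)
    # Thresholds outside the data range and between neighboring points.
    candidates = [xs[0] - 1.0, xs[-1] + 1.0]
    candidates += [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    best = 0
    for t, sign in product(candidates, (1, -1)):
        correct = sum(int(sign * (x - t) > 0) == y for x, y in points)
        best = max(best, correct)
    return best

# One global linear rule for the whole input space.
global_acc = best_linear_correct(data) / len(data)

# Hand-picked partition (an assumption): one local linear rule per region.
regions = [[p for p in data if p[0] < 0.0],
           [p for p in data if p[0] >= 0.0]]
local_acc = sum(best_linear_correct(r) for r in regions) / len(data)

print(f"single linear rule: {global_acc:.3f}")
print(f"two local rules:    {local_acc:.3f}")
```

On this data the best single rule classifies 8 of the 11 points correctly, while the two local rules classify all of them. How the regions are chosen is the crux: here the split is fixed by hand, whereas a clustering-based division is not guided by the classification error.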
1045-9227/$20.00 © 2006 IEEE