JOURNAL OF LATEX CLASS FILES, VOL. 4, NO. 5, APRIL 2015
Adaptive Cascade Regression Model for Robust Face Alignment
Qingshan Liu, Senior Member, IEEE, Jiankang Deng, Jing Yang, Guangcan Liu, Member, IEEE,
and Dacheng Tao, Fellow, IEEE
Abstract—Cascade regression is a popular face alignment approach, and it has achieved good performance on in-the-wild databases. However, it depends heavily on local features to estimate reliable landmark locations and therefore suffers from corrupted images, such as images with occlusion, which is common in real-world face images. In this paper, we present a new adaptive cascade regression model for robust face alignment. In each iteration, the shape-indexed appearance is introduced to estimate the occlusion level of each landmark, and each landmark is then weighted according to its estimated occlusion level. The occlusion levels of the landmarks also act as adaptive weights on the shape-indexed features to reduce the noise in those features. At the same time, an exemplar-based shape prior is designed to suppress the influence of local image corruption. Extensive experiments are conducted on challenging benchmarks, and the experimental results demonstrate that the proposed method achieves better results than state-of-the-art methods for facial landmark localization and occlusion detection.
Keywords—Robust Face Alignment, Cascade Regression Model,
Shape-Indexed Appearance, Adaptive Shape Prior
I. INTRODUCTION
Face alignment has been an active research topic over the last two decades [1] because of its significance for many face-oriented applications, such as face recognition [2], [3], [4], [5], expression analysis [6], [7], face animation [8], face synthesis [9], and 3D face modeling [10], [11]. A large number of facial landmark localization methods have been proposed [12], and the most popular solution is to treat the ensemble of facial landmarks as a whole shape and learn a general face shape model from labeled training images [13]. With respect to this shape model, previous works can be categorized into explicit shape model-based methods and implicit shape model-based methods.
Manuscript received May 6, 2016; revised August 22, 2016 and November
1, 2016; accepted November 26, 2016. This work was supported in part by
the National Natural Science Foundation of China under Grant 61532009
and Grant 61272223, in part by the Natural Science Foundation of Jiangsu
Province under Grant BK2012045, and in part by the Australian Research Council
under Project DP-140102164 and Project FT-130101457. The associate editor
coordinating the review of this manuscript and approving it for publication
was Prof. Yonggang Shi.
Q. Liu, J. Deng, J. Yang, and G. Liu are with the B-DAT Laboratory, the
Department of Information and Control, Nanjing University of Information
and Technology, Nanjing 210014, China.
D. Tao is with the Center for Quantum Computation and Intelligent Systems,
Faculty of Engineering and Information Technology, University of Technology,
Sydney, N.S.W. 2007, Australia.
Most early works on this topic address the face alignment problem by employing explicit shape constraints, learning a parametric shape model from the labeled training data. Representative works are the Active Shape Model (ASM) [14] and the Active Appearance Model (AAM) [15], [16], [17], in which the variation in face shape is modeled by Principal Component Analysis (PCA) [14], [15]. Other methods include Markov Random Field (MRF)-based modeling [18], [19], graph-based modeling [20], [21], and exemplar-based modeling [22], [21]. In the context of medical image analysis, Zhang [23] developed an Adaptive Shape Composition (ASC) method that models shapes and implicitly incorporates the shape prior constraint through a sparse representation over a shape dictionary. ASC is able to handle non-Gaussian errors, model multi-modal shape distributions, and recover local details, and the resulting problem is solved efficiently by an EM-type framework combined with an efficient convex optimization algorithm. Inspired by ASC, Liu [24] proposed a dual sparse constrained cascade regression model (DSC-CR) for robust facial landmark localization. During regressor training, a sparse constraint is imposed through Lasso [25], which selects robust features and compresses the size of the model. A second sparse shape constraint is imposed between the regressors to suppress the ambiguity in the local features. Because of the limited capacity of explicit shape models, these methods tend to underperform on faces with extreme variations in pose and expression [26].
In recent years, implicit shape constraints have attracted much attention. Their main objective is to learn shape regression functions that directly map the face image to the landmark coordinates without a parametric shape model. These methods have achieved good performance on standard benchmarks [27], [28] because they can integrate contextual information and flexibly model the relationships between landmark points. There are two popular ways to learn such a regression function: deep network learning [28], [29] and the cascade regression model. Our work focuses on the cascade regression model, which learns a series of face shape regressors and combines them additively to approximate the complex nonlinear mapping from the initial shape to the true shape [27]. However, the cascade regression model is sensitive to large occlusions, because an occlusion affects not only the location updates around the occluded regions but also the location updates in non-occluded regions during the shape regressor iterations [30].
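The additive scheme described above can be written as S_t = S_{t-1} + R_t(phi(I, S_{t-1})), where phi extracts shape-indexed features from image I at the current shape estimate. The following is a toy sketch with linear stages fitted by ridge regression, in the spirit of supervised-descent-style cascades; it is not the regressor of [27] or of this paper. The main assumption is that the hypothetical function shape_indexed_features replaces real image descriptors with a fixed nonlinear function of the unknown displacement S_true - S, so the cascade can be run on synthetic data. All names and constants are illustrative.

```python
import numpy as np


def shape_indexed_features(S, S_true, W, rng, noise=0.01):
    """Hypothetical stand-in for phi(I, S): features read at the current shape S.

    A real system samples local descriptors from image I around each landmark of S.
    Here the features are tanh of a fixed linear map of the displacement S_true - S,
    plus a little noise.
    """
    return np.tanh((S_true - S) @ W.T) + noise * rng.normal(size=(S.shape[0], W.shape[0]))


def with_bias(X):
    return np.hstack([X, np.ones((X.shape[0], 1))])


def train_cascade(S_true, S0, W, rng, n_stages=4, ridge=1e-3):
    """Fit linear stages R_t so that S_t = S_{t-1} + [phi_t, 1] @ R_t."""
    S = S0.copy()
    stages = []
    errors = [np.mean(np.sum((S_true - S) ** 2, axis=1))]
    for _ in range(n_stages):
        X = with_bias(shape_indexed_features(S, S_true, W, rng))
        Y = S_true - S                                   # target: remaining shape increment
        R = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ Y)  # ridge fit
        S = S + X @ R                                    # additive update
        stages.append(R)
        errors.append(np.mean(np.sum((S_true - S) ** 2, axis=1)))
    return stages, errors


def apply_cascade(stages, S_true, S0, W, rng):
    """Run learned stages from S0 (S_true only drives the toy feature function)."""
    S = S0.copy()
    for R in stages:
        S = S + with_bias(shape_indexed_features(S, S_true, W, rng)) @ R
    return S


rng = np.random.default_rng(0)
D, M, N = 10, 20, 2000                 # 5 landmarks (x, y), feature size, training faces
W = rng.normal(scale=1.0 / np.sqrt(D), size=(M, D))
S_true = rng.normal(size=(N, D))
S0 = S_true + 0.5 * rng.normal(size=(N, D))   # perturbed initial shapes
stages, errors = train_cascade(S_true, S0, W, rng)
print([round(float(e), 4) for e in errors])   # mean squared error after each stage
```

Because each stage minimizes a ridge-penalized least-squares objective whose value at R_t = 0 equals the current error, the training error cannot increase from one stage to the next. Each R_t is also a dense matrix, so a corrupted feature block (for example, from an occluded landmark) feeds into the update of every landmark coordinate, which is the coupling between occluded and non-occluded regions that the paragraph above points to.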
In this paper, we present a new adaptive cascade regression