work by introducing a new cost function to realize lip contour
extraction in color images, while Tian et al. [14] utilized a
symmetrical DT to model the lip shape and formulated the color
distribution inside the closed mouth region as a Gaussian mixture
to regularize the DT. In general, the tracking performance of such
methods degrades if the lip shape is evidently irregular or the
mouth opens widely. The region-based ACM algorithm, which
minimizes a regional energy function, generally outperforms the
edge-based ACM on lip images with weak or missing edges. For
instance, Chiou et al. [15]
modified the original ACM by adding eight radial vectors within
the lip region to regularize the active contours as they are driven
toward the lip boundary. Wakasugi et al. [16] applied the
separability of regional color intensity distributions with ACM to
achieve lip contour extraction. Nevertheless, it has been found that
these methods often suffer from the complex components in the
oral cavity and are highly dependent on the parameter
initialization. The ASM
approach adopts a set of landmark points to describe the lip
shape, and these points are controlled within a few modes
derived from a training data set. For example, Luettin et al. [17]
applied a set of manually labeled points with ASM to train the
possible lip shapes. Sum et al. [18] presented an optimization
procedure from a point-based model using ASM for extracting the
lip contours. Nguyen et al. [19] integrated multi-features of lip
regions with ASM to learn lip shapes. The AAM algorithm
proposed by Matthews et al. [2] is an extension of the ASM algorithm
that incorporates eigenanalysis in the gray-level case. However, both
the ASM and the AAM require a laboriously established training
data set with careful manual calibration, as well as a training
process to determine the lip shapes. Moreover, these methods may
not provide a good match to lip shapes that are quite distinct
from the training data. They are therefore unsuitable for
robust lip tracking applications from a practical viewpoint.
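As a rough illustration of how an ASM constrains a landmark-based lip shape, the sketch below synthesizes a shape as the mean shape plus a weighted sum of a few modes, with each weight clamped to plus or minus three standard deviations of its mode. The clamp range follows the usual convention in the ASM literature; the function name `asm_shape`, the toy mean shape and the single "mouth opening" mode are hypothetical and are not taken from [17–19].

```python
import math

def asm_shape(mean_shape, modes, eigenvalues, weights, k=3.0):
    """Return mean_shape + sum_i b_i * modes[i], with each weight b_i
    clamped to [-k*sqrt(lambda_i), +k*sqrt(lambda_i)].

    mean_shape and every mode are flat lists [x0, y0, x1, y1, ...].
    """
    shape = list(mean_shape)
    for mode, lam, b in zip(modes, eigenvalues, weights):
        limit = k * math.sqrt(lam)
        b = max(-limit, min(limit, b))  # keep the shape within the learnt range
        for j, component in enumerate(mode):
            shape[j] += b * component
    return shape

# Toy data (assumed values): four landmarks and one vertical opening mode.
mean_shape = [-2.0, 0.0, 0.0, 1.0, 2.0, 0.0, 0.0, -1.0]
open_mode = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, -1.0]
shape = asm_shape(mean_shape, [open_mode], [0.25], [10.0])
```

Here the requested weight of 10 is clamped to 1.5, so a mouth opening far beyond what the (assumed) training data support cannot be represented, which is the limitation noted above for shapes distinct from the training set.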
In recent years, lip image analysis in color space, e.g., CIELAB,
CIELUV and HSV, has received much attention as the color can
provide additional significant information that is not available in
gray-level cases. Wang et al. [20] generated a probability map of the
lip region in color space via a fuzzy clustering method incorporating
a shape function (FCMS) and developed an iterative point-driven
optimization scheme to fit the lip boundary based on the
pre-generated probability map. Subsequently, Leung et al. [21] further
extended the above work with an elliptic shape function to
segment the lip region in color space. Similar and related works
can be found in [22,23]. These methods can significantly simplify
the detection and localization of the lip regions. Nevertheless, as
the color distributions of skin, tongue and lip may overlap and
vary among speakers, such methods can be inaccurate and unstable
for lip segmentation or lip contour extraction, particularly when
the mouth opens widely. Moreover, these methods often suffer from
the appearance of the tongue or black hole, as shown in Fig. 1,
although multiple pre-processing procedures can reduce the effect
of the teeth.
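To make the overlap problem concrete, the following sketch computes standard fuzzy c-means memberships for a single scalar color feature. It is plain FCM only: the shape function used by FCMS [20] is not modeled, and the function name `fcm_memberships`, the feature scale and the two cluster centres are assumptions chosen for illustration.

```python
def fcm_memberships(value, centers, m=2.0):
    """Standard fuzzy c-means membership of a scalar feature in each cluster.

    u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1)), with d_j = |value - c_j|.
    """
    dists = [abs(value - c) for c in centers]
    if 0.0 in dists:  # the sample coincides with one or more centres
        hits = dists.count(0.0)
        return [1.0 / hits if d == 0.0 else 0.0 for d in dists]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** exponent for dk in dists) for dj in dists]

centers = [0.8, 0.2]  # assumed lip and skin centres on a hypothetical feature
clear = fcm_memberships(0.7, centers)      # close to the lip centre
ambiguous = fcm_memberships(0.5, centers)  # halfway between both centres
```

A pixel near the assumed lip centre receives a lip membership of roughly 0.96, whereas a pixel halfway between the centres receives about 0.5 for both classes. Pixels of the second kind become common when the skin, tongue and lip distributions overlap, which is why a purely color-driven probability map can be unstable.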
More recently, Eveno et al. [24] attempted to combine the
merits of the above-stated approaches and proposed a jumping
snake with a parametric model composed of four cubic curves to
achieve lip tracking. This method is effective in most cases, but it
is highly dependent on pre- and post-processing techniques and an
adjustment process to make the model match the lip shape
appropriately. Differing from the above region-based approaches,
Jian et al. [25] proposed a modified attractor-guided particle
filtering framework to track the lip contours. Unfortunately, this
method requires a set of representative lip contours to be segmented
manually in advance as shape priors. Furthermore, Ong et al.
[26] proposed a learnt data-driven approach via linear predictors
to track the lip movements, but it needs a data set composed
of different types of lip shapes in advance. In addition, this method,
as well as the one in [25], involves complicated iterative
learning to match the lip shape, which is computationally
time-consuming.
Thus far, almost all the region-based approaches rely on
global statistical characteristics. Consequently, their
performance may deteriorate upon the appearance of teeth, tongue or
black hole. Very recently, it has been found that, when the object
in an image has heterogeneous statistics or complex components,
the localized active contour model (LACM) [27], which utilizes
local statistical characteristics, can generally achieve a better
segmentation result, as shown in Figs. 2 and 3(d). Nevertheless,
this model depends heavily on the appropriate selection of its
related parameters. Improper parameters, e.g., an
ulterior evolving curve with a small local radius or a proper evolving
curve with a large local radius, can lead to erroneous extractions,
as shown in Fig. 3(c). In addition, Ref. [27] does not consider
prior knowledge about color information, which provides
additional information to improve the extraction performance,
especially when the images are shadowed, shaded or highlighted
[28,29].
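The difference between global and local region statistics can be sketched on a toy example. The code below compares the interior and exterior mean intensities computed over the whole image with those computed inside a disc of a given radius around one contour point. It only illustrates the idea of local statistics; it is not the energy or the curve evolution of the LACM [27], and the function name `region_means` and the one-row toy image are assumptions.

```python
def region_means(image, mask, center=None, radius=None):
    """Mean intensity inside and outside `mask`.

    If `center` (row, col) and `radius` are given, only pixels within that
    disc are used (local statistics); otherwise every pixel is used (global).
    """
    sums = {True: 0.0, False: 0.0}
    counts = {True: 0, False: 0}
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            if center is not None:
                dr, dc = r - center[0], c - center[1]
                if dr * dr + dc * dc > radius * radius:
                    continue  # pixel lies outside the local disc
            inside = bool(mask[r][c])
            sums[inside] += value
            counts[inside] += 1
    mean_in = sums[True] / counts[True] if counts[True] else None
    mean_out = sums[False] / counts[False] if counts[False] else None
    return mean_in, mean_out

# Toy one-row image: object (5) on a dark background (1) with bright clutter (9).
image = [[1, 1, 5, 5, 1, 1, 9, 9]]
mask = [[0, 0, 1, 1, 0, 0, 0, 0]]
global_means = region_means(image, mask)
local_means = region_means(image, mask, center=(0, 3), radius=2)
wide_means = region_means(image, mask, center=(0, 3), radius=5)
```

In this toy case the global exterior mean is pulled toward the bright clutter, while the local exterior mean at radius 2 reflects only the background next to the boundary. With radius 5 the disc covers the whole row and the local means coincide with the global ones, which mirrors the sensitivity to the local radius discussed above.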
In this paper, we present a local region based approach to lip
tracking with two phases: (i) lip contour extraction for the first
lip frame, followed by (ii) lip tracking in the subsequent lip
frames. Initially, we introduce a new kind of active contour
model, namely the localized color active contour model (LCACM),
which assumes that the foreground and background regions around
the object are locally different in color space. In the first phase,
we find a combined semi-ellipse around the first lip image as the
initial evolving curve and compute the localized energies for curve
evolution such that the lip image is separated into lip and
non-lip regions. Then, we utilize a 16-point deformable model [20]
with a geometric constraint to achieve lip contour extraction. In the
second phase, we present a dynamic selection of the radius of
local regions associated with the extracted lip contour of the
previous frame to realize lip tracking. The proposed approach is
adaptive to lip movement, and robust against the appearance of
[Fig. 1 labels: upper lip, lower lip, tongue, teeth, black hole, skin]
Fig. 1. A lip region incorporating the appearance of teeth, tongue and black hole in
the oral cavity.
[Fig. 2 labels: interesting object, evolving curve, local interior, local exterior]
Fig. 2. Graphical representation of the active contour model: (a) evolving curve
with diverging directions along the arrow; (b) the description of local interior and
local exterior region.
Y.-m. Cheung et al. / Pattern Recognition 45 (2012) 3336–3347 3337