2 Face Recognition
Face recognition is an easy task for humans. Experiments in [6] have shown that even one- to three-day-old babies are able to distinguish between known faces. So how hard could it be for a computer?
It turns out we know little about human face recognition to date. Are inner features (eyes, nose, mouth) or outer features (head shape, hairline) used for successful face recognition? How do we analyze an image, and how does the brain encode it? David Hubel and Torsten Wiesel showed that our brain has specialized nerve cells responding to specific local features of a scene, such as lines, edges,
angles or movement. Since we don’t see the world as scattered pieces, our visual cortex must somehow
combine the different sources of information into useful patterns. Automatic face recognition is all
about extracting those meaningful features from an image, putting them into a useful representation
and performing some kind of classification on them.
Face recognition based on the geometric features of a face is probably the most intuitive approach to
face recognition. One of the first automated face recognition systems was described in [9]: marker
points (position of eyes, ears, nose, ...) were used to build a feature vector (distance between the
points, angle between them, ...). The recognition was performed by calculating the Euclidean distance between the feature vectors of a probe and a reference image. Such a method is robust against changes in illumination by its nature, but it has a huge drawback: the accurate registration of the marker points is complicated, even with state-of-the-art algorithms. Some of the latest work on geometric face recognition was carried out in [4]. A 22-dimensional feature vector was used, and experiments on large datasets have shown that geometrical features alone don’t carry enough information for face
recognition.
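To make this concrete, here is a minimal sketch of the nearest-neighbour comparison in Python/NumPy. The marker points and the resulting feature values are made up purely for illustration; only the Euclidean-distance comparison itself reflects the method described above.

```python
import numpy as np

# Hypothetical geometric feature vectors built from marker points
# (e.g. distances between the eyes, eye-to-nose distance, an angle).
# The numbers are invented for illustration only.
reference_gallery = {
    "subject_a": np.array([105.0, 62.0, 48.0, 0.87]),
    "subject_b": np.array([ 98.0, 70.0, 41.0, 0.79]),
}
probe = np.array([101.0, 60.0, 50.0, 0.91])

def euclidean_distance(p, q):
    """Euclidean distance between two feature vectors."""
    return np.sqrt(np.sum((np.asarray(p) - np.asarray(q)) ** 2))

# Nearest-neighbour classification: the probe is assigned the label of
# the reference vector with the smallest distance.
best_match = min(reference_gallery,
                 key=lambda name: euclidean_distance(probe, reference_gallery[name]))
print(best_match)
```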
The Eigenfaces method described in [13] took a holistic approach to face recognition: A facial image is
a point from a high-dimensional image space and a lower-dimensional representation is found, where
classification becomes easy. The lower-dimensional subspace is found with Principal Component
Analysis, which identifies the axes with maximum variance. While this kind of transformation is
optimal from a reconstruction standpoint, it doesn’t take any class labels into account. Imagine a situation where the variance is generated by an external source, such as lighting. The axes with maximum variance then do not necessarily contain any discriminative information at all, and classification becomes impossible. So a class-specific projection with Linear Discriminant Analysis was applied to face recognition in [3]. The basic idea is to minimize the variance within a class, while maximizing the
variance between the classes at the same time (Figure 1).
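The following is a minimal sketch of the PCA step, assuming the training images have already been flattened into the rows of a data matrix; the toy data is invented for illustration. A Fisherfaces-style projection would be derived analogously, but from the within-class and between-class scatter matrices instead of the total variance.

```python
import numpy as np

def pca(X, num_components):
    """Project the rows of X (one flattened image per row) onto the
    num_components directions of maximum variance."""
    mu = X.mean(axis=0)
    Xc = X - mu                                   # center the data
    # The rows of Vt are the principal axes, ordered by decreasing variance.
    # The economy-size SVD keeps this tractable even when an image has far
    # more pixels than there are training samples.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:num_components].T                     # (pixels x num_components)
    return Xc @ W, W, mu

# Toy data: four "images" with three pixels each.
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.1],
              [3.0, 6.1, 9.0],
              [4.0, 8.0, 12.2]])
projections, W, mu = pca(X, num_components=2)
print(projections.shape)  # (4, 2)
```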
Recently, various methods for local feature extraction have emerged. To avoid the high dimensionality of the input data, only local regions of an image are described; the extracted features are (hopefully) more robust against partial occlusion, illumination and small sample sizes. Algorithms used for local feature extraction are Gabor Wavelets ([14]), the Discrete Cosine Transform ([5]) and Local Binary Patterns ([1, 11, 12]). It’s still an open research question how best to preserve spatial information when applying local feature extraction, because spatial information is potentially useful.
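As an example of such a local descriptor, here is a minimal sketch of the basic 3 × 3 Local Binary Patterns operator in NumPy. The neighbour ordering is just a convention chosen for illustration; practical LBP variants differ in radius, number of sampling points and the handling of uniform patterns.

```python
import numpy as np

def lbp(image):
    """Basic 3x3 Local Binary Patterns: each pixel is replaced by an 8-bit
    code recording whether its neighbours are brighter or darker than it."""
    img = np.asarray(image, dtype=np.int32)
    center = img[1:-1, 1:-1]
    code = np.zeros_like(center, dtype=np.uint8)
    # The eight neighbours, enumerated clockwise starting at the top-left.
    neighbours = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                  (1, 1), (1, 0), (1, -1), (0, -1)]
    h, w = img.shape
    for bit, (dy, dx) in enumerate(neighbours):
        shifted = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        code |= (shifted >= center).astype(np.uint8) << bit
    return code

# Usage on a small random "image":
codes = lbp(np.random.randint(0, 256, size=(8, 8)))
print(codes.shape)  # (6, 6)
```

In a full pipeline the LBP image is typically divided into a grid of cells and a histogram of codes is computed per cell, which is one way the spatial information mentioned above can be partially preserved.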
2.1 Face Database
I don’t want to do a toy example here. We are doing face recognition, so you’ll need some face images! You can either create your own database or start with one of the available databases; face-rec.org/databases gives an up-to-date overview. Three interesting databases are (parts of the descriptions are quoted from face-rec.org):
AT&T Facedatabase The AT&T Facedatabase, sometimes also known as ORL Database of Faces,
contains ten different images of each of 40 distinct subjects. For some subjects, the images were
taken at different times, varying the lighting, facial expressions (open / closed eyes, smiling /
not smiling) and facial details (glasses / no glasses). All the images were taken against a dark
homogeneous background with the subjects in an upright, frontal position (with tolerance for
some side movement).
Yale Facedatabase A The AT&T Facedatabase is good for initial tests, but it’s a fairly easy
database. The Eigenfaces method already achieves a 97% recognition rate on it, so you won’t see any
improvements with other algorithms. The Yale Facedatabase A is a more appropriate dataset
for initial experiments, because the recognition problem is harder. The database consists of
15 people (14 male, 1 female), each with 11 grayscale images sized 320 × 243 pixels. There are