Aggregate Channel Features for Multi-view Face Detection
Bin Yang, Junjie Yan, Zhen Lei, Stan Z. Li∗
Center for Biometrics and Security Research & National Laboratory of Pattern Recognition
Institute of Automation, Chinese Academy of Sciences, China
yb.derek@gmail.com {jjyan,zlei,szli}@nlpr.ia.ac.cn
Abstract
Face detection has drawn much attention in recent
decades since the seminal work by Viola and Jones. While
many subsequent works have improved on it with more powerful
learning algorithms, the feature representation used
for face detection still can’t meet the demand for effectively
and efficiently handling faces with large appearance variance
in the wild. To address this bottleneck, we bring the
concept of channel features to the face detection domain.
Channel features extend the image channel to diverse types such as
gradient magnitude and oriented gradient histograms, and therefore
encode rich information in a simple form. We adopt
a novel variant called aggregate channel features, make
a full exploration of feature design, and discover a multi-scale
version of the features with better performance. To deal
with the varying poses of faces in the wild, we propose a multi-view
detection approach featuring score re-ranking and detection
adjustment. Following the learning pipeline of the Viola-Jones
framework, the multi-view face detector using aggregate
channel features shows competitive performance against
state-of-the-art algorithms on the AFW and FDDB test sets,
while running at 42 FPS on VGA images.
1. Introduction
Human face detection has long been one of the most
fundamental problems in computer vision and human-computer
interaction. In the past decade, the most influential
work has arguably been the face detection framework proposed
by Viola and Jones [22]. The Viola-Jones (abbreviated as
VJ below) framework uses rectangular Haar-like features
and learns the hypothesis using the AdaBoost algorithm. Combined
with the attentional cascade structure, the VJ detector
achieved real-time face detection at the time. Despite the
great success of the VJ detector, the performance is still far
from satisfactory due to the large appearance variance of
faces in unconstrained settings.
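To make the rectangular Haar-like features mentioned above concrete, the following is a minimal sketch of a two-rectangle feature evaluated with a summed-area table (integral image). The function names and the toy image are hypothetical and chosen for illustration only; the sketch does not reproduce the VJ training procedure or cascade.

```python
def integral_image(img):
    """Return an (H+1) x (W+1) summed-area table with a zero first row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels inside a rectangle, using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def haar_two_rect_horizontal(ii, top, left, height, width):
    """Left-half sum minus right-half sum (width is assumed to be even)."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum


# Toy 4x4 image (illustrative values): dark left half, bright right half.
image = [[1, 1, 9, 9] for _ in range(4)]
ii = integral_image(image)
feature = haar_two_rect_horizontal(ii, 0, 0, 4, 4)
```

Because each rectangle sum costs four lookups regardless of its size, a feature of this kind can be evaluated in constant time at any position and scale once the table is built.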
∗ Corresponding author.
Figure 1. An intuitive visualization of our multi-view face detector
using aggregate channel features. Areas with warmer colors
indicate where the detector pays more attention.
To handle faces in the wild, many successors of the
VJ framework have emerged. These methods mainly obtain their
performance gains in two ways: more complicated features
[17, 19, 26] and/or more powerful learning algorithms
[14, 1, 25]. As the combination of boosting and cascade
has been proven to be quite effective in face detection,
the bottleneck lies in the feature representation, since the
complicated features adopted in the above literature bring
limited performance gains at a large computational
cost.
Lately, in another domain, pedestrian detection, a family
of channel features has achieved record performance [6,
5]. Channel features compute registered maps of the original
images, such as gradients and histograms of oriented gradients,
and then extract features on these extended channels.
The classifier learning process follows the VJ framework
pipeline. In this paper, we adopt a variant of channel
features called aggregate channel features [5], which
are extracted directly as pixel values on subsampled channels.
Channel extension offers rich representation capacity,
while the simple feature form guarantees fast computation.
With these two advantages, the aggregate channel features
break through the bottleneck in the VJ framework and have the
potential to make great advances in face detection.
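The idea of extending an image into channels and reading features off subsampled channels can be illustrated with a minimal sketch. It assumes a grayscale input and uses only a gradient magnitude channel and a few orientation channels; the function names, the number of orientation bins, the shrink factor, and the choice of block summation for subsampling are illustrative assumptions, not the configuration used in this paper.

```python
import math


def gradient_channels(gray, num_bins=4):
    """Compute a gradient magnitude channel and num_bins orientation channels.

    Each orientation channel holds the gradient magnitude of the pixels whose
    unsigned gradient orientation falls into that bin, and zero elsewhere.
    """
    h, w = len(gray), len(gray[0])
    magnitude = [[0.0] * w for _ in range(h)]
    oriented = [[[0.0] * w for _ in range(h)] for _ in range(num_bins)]
    for y in range(h):
        for x in range(w):
            # Pixel differences with neighbor indices clamped at the borders.
            dx = gray[y][min(x + 1, w - 1)] - gray[y][max(x - 1, 0)]
            dy = gray[min(y + 1, h - 1)][x] - gray[max(y - 1, 0)][x]
            mag = math.hypot(dx, dy)
            magnitude[y][x] = mag
            angle = math.atan2(dy, dx) % math.pi  # unsigned orientation
            b = min(int(angle / math.pi * num_bins), num_bins - 1)
            oriented[b][y][x] = mag
    return magnitude, oriented


def aggregate(channel, shrink=2):
    """Subsample a channel by summing non-overlapping shrink x shrink blocks."""
    h, w = len(channel) // shrink, len(channel[0]) // shrink
    return [[sum(channel[y * shrink + i][x * shrink + j]
                 for i in range(shrink) for j in range(shrink))
             for x in range(w)] for y in range(h)]


def aggregate_channel_features(gray, num_bins=4, shrink=2):
    """Concatenate the pixels of every subsampled channel into one vector."""
    magnitude, oriented = gradient_channels(gray, num_bins)
    channels = [gray, magnitude] + oriented
    features = []
    for channel in channels:
        for row in aggregate(channel, shrink):
            features.extend(row)
    return features


# Toy 4x4 window (illustrative values) containing a vertical edge.
window = [[0, 0, 10, 10] for _ in range(4)]
features = aggregate_channel_features(window)
```

In this sketch every feature is a single value looked up in a subsampled channel, so a boosted classifier in the VJ style only needs to compare individual entries of `features` against thresholds.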