
Aggregate Channel Features for Multi-view Face Detection

Bin Yang, Junjie Yan, Zhen Lei, Stan Z. Li∗

Center for Biometrics and Security Research & National Laboratory of Pattern Recognition

Institute of Automation, Chinese Academy of Sciences, China

yb.derek@gmail.com {jjyan,zlei,szli}@nlpr.ia.ac.cn

Abstract

Face detection has drawn much attention in recent decades since the seminal work by Viola and Jones. While many subsequent works have improved on it with more powerful learning algorithms, the feature representation used for face detection still cannot meet the demand for effectively and efficiently handling faces with large appearance variance in the wild. To address this bottleneck, we bring the concept of channel features to the face detection domain. Channel features extend the image channel to diverse types, such as gradient magnitude and oriented gradient histograms, and therefore encode rich information in a simple form. We adopt a novel variant called aggregate channel features, make a full exploration of feature design, and discover a multi-scale version of the features with better performance. To deal with poses of faces in the wild, we propose a multi-view detection approach featuring score re-ranking and detection adjustment. Following the learning pipeline of the Viola-Jones framework, the multi-view face detector using aggregate channel features shows competitive performance against state-of-the-art algorithms on the AFW and FDDB test sets, while running at 42 FPS on VGA images.

1. Introduction

Human face detection has long been one of the most fundamental problems in computer vision and human-computer interaction. In the past decade, the most influential work has been the face detection framework proposed by Viola and Jones [22]. The Viola-Jones (abbreviated as VJ below) framework uses rectangular Haar-like features and learns the hypothesis using the Adaboost algorithm. Combined with the attentional cascade structure, the VJ detector achieved real-time face detection at that time. Despite the great success of the VJ detector, its performance is still far from satisfactory due to the large appearance variance of faces in unconstrained settings.

∗ Corresponding author.

Figure 1. An intuitive visualization of our multi-view face detector using aggregate channel features. Areas with warmer colors indicate where the detector pays more attention.

To handle faces in the wild, many successors of the VJ framework have emerged. These methods mainly obtain their performance gains in two ways: more complicated features [17, 19, 26] and/or more powerful learning algorithms [14, 1, 25]. As the combination of boosting and cascade has proven quite effective in face detection, the bottleneck lies in the feature representation, since the complicated features adopted in the above literature bring limited performance gains at a large computational cost.

Recently, in the related domain of pedestrian detection, a family of channel features has achieved record performance [6, 5]. Channel features compute registered maps of the original image, such as gradients and histograms of oriented gradients, and then extract features on these extended channels. The classifier learning process follows the VJ framework pipeline. In this paper, we adopt a variant of channel features called aggregate channel features [5], which are extracted directly as pixel values on subsampled channels. Channel extension offers rich representation capacity, while the simple feature form guarantees fast computation. With these two advantages, aggregate channel features break through the bottleneck in the VJ framework and have the potential to bring great advances in face detection.

As we concentrate on the feature representation rather than the learning algorithm in this paper, we not only adopt aggregate channel features for face detection but also try to explore the full potential of this novel representation. To do so, we thoroughly investigate the specific feature parameters, including channel types, feature pool size, subsampling method, feature scale and so on, which gives insight into the feature design and hopefully provides helpful guidelines for practitioners. Through this exploration, we find that: 1) multi-scaling the feature representation further enriches the representation capacity, since the original aggregate channel features have a uniform feature scale; 2) different combinations of channel types impact the performance greatly, and for face detection the color channels in LUV space, plus the gradient magnitude channel and gradient histogram channels in RGB space, show the best result; 3) multi-view detection proves to be a good match for aggregate channel features, as the representation naturally encodes the facial structure (Figure 1).

Although multi-view detection can effectively deal with diverse poses, additional issues arise, such as how to merge the detections output by separately trained subview detectors and how to deal with the offsets in location and scale between output detections and ground truth. We solve these problems with carefully designed post-processing, including score re-ranking, detection merging and bounding box adjustment.

The detailed experimental exploration of aggregate channel features, along with our improvements to multi-view detection, leads to a large performance gain in face detection in the wild. On two challenging face databases, AFW and FDDB, the proposed multi-view face detector shows competitive performance against state-of-the-art detectors in both detection accuracy and speed.

The remainder of this paper is organized as follows. Section 2 revisits related work in face detection. Section 3 describes how we build the face detector using aggregate channel features. Section 4 addresses problems concerning multi-view face detection. Experimental results on AFW and FDDB are shown in Section 5, and we conclude the paper in Section 6.

2. Related work

Face detection has drawn much attention since the early days of computer vision. Although many solutions had been put forward, it was not until Viola and Jones [22] proposed their milestone work that face detection saw substantial progress. The VJ face detector has three distinguishing aspects: fast feature computation via the integral image representation, classifier learning using Adaboost, and the attentional cascade structure. One main drawback of the VJ framework is that the features have limited representation capacity, and the feature pool must be quite large to compensate for that. Typically, in a 24 × 24 detection window, the number of Haar-like features is about 160,000 [22]. To address the problem, efforts have been made in two directions. Some focus on more complicated features such as HoG [26] and SURF [13]. Others aim to speed up feature selection in a heuristic way [18, 2]. However, the problem has not been fully solved. In this paper, we mainly focus on the feature representation and explore it in depth, which is complementary to existing work on the learning algorithm and classifier structure in the VJ framework.
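The size of the Haar-like feature pool quoted above can be checked by direct counting. The sketch below is only an illustration and assumes the five basic upright shapes commonly associated with the VJ detector (two-rectangle, three-rectangle and four-rectangle features); this chunk of the paper gives only the rounded total, so the shape set and the function name `count_haar_features` are assumptions.

```python
def count_haar_features(window_w, window_h, base_shapes):
    """Count upright Haar-like features of the given base shapes in a window.

    Each base shape (w, h) is scaled by integer factors in x and y and placed
    at every position where it still fits inside the window.
    """
    total = 0
    for base_w, base_h in base_shapes:
        for w in range(base_w, window_w + 1, base_w):
            for h in range(base_h, window_h + 1, base_h):
                total += (window_w - w + 1) * (window_h - h + 1)
    return total


# Assumed feature set: two-rectangle (horizontal, vertical),
# three-rectangle (horizontal, vertical) and four-rectangle shapes.
BASE_SHAPES = [(2, 1), (1, 2), (3, 1), (1, 3), (2, 2)]

print(count_haar_features(24, 24, BASE_SHAPES))  # 162336
```

Under these assumptions the count is 162,336, consistent with the figure of about 160,000 for a 24 × 24 window.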

Recently, channel features have been proposed and have shown record performance in pedestrian detection [6, 5]. Because the channels are extended to diverse types such as gradients and local histograms, the features show richer representation capacity for classification. However, the features are extracted as rectangular sums at various locations and scales, which we believe leads to a redundant feature pool. During the preparation of this paper, Mathias et al. [16] independently discovered the effectiveness of integral channel features in the face detection domain. In this paper, we adopt a novel variant of channel features called aggregate channel features, which extracts features directly as pixel values in extended channels without computing rectangular sums at various locations and scales. The feature has powerful representation capacity, and the feature pool size is only several thousand. Through careful design in Section 3 and the implementation of multi-view detection in Section 4, the detector based on aggregate channel features achieves state-of-the-art performance on challenging databases.

3. Proposed face detector

In this section, we make a full exploration of aggregate channel features in the context of face detection. We first give a brief introduction to the feature itself, including its computation, properties and advantages over the traditional Haar-like features used in the VJ framework. The detailed experimental investigation is then described in two parts, feature design and training design. Before that, we lay out some guidelines on how the investigation is conducted. Each design part is divided into several separate experiments and ends with a summary explaining the specific parameters used in our proposed face detector. Note that each experiment varies only one parameter while the others remain constant. Through these experiments, the proposed face detector based on aggregate channel features is built step by step. Issues concerning the implementation of multi-view face detection, which further improves the performance, are discussed in the next section.

3.1. Feature description

Channel extension: The basic building block of aggregate channel features is the channel. Channels have been used ever since digital images were invented. The most common type is the color channels of an image, with gray-scale and RGB being typical ones. Besides color channels, many other channel types have been devised to encode different kinds of information for more difficult problems. Generally, a channel can be defined as a registered map of the original image, whose pixels are computed from corresponding patches of original pixels [6]. Different channels can be computed with linear or non-linear transformations of the original image. To allow for sliding window detection, the transformations are constrained to be translationally invariant.

Figure 2. Workflow of the proposed face detector.

Feature computation: Given the definition of channels, the computation of aggregate channel features is quite simple. As shown in Figure 2, given a color image, all defined channels are computed and subsampled by a preset factor. The aggregate pixels in all subsampled channels are then vectorized into a pixel look-up table. Note that an optional smoothing step can be applied to each channel with a binomial filter, both before computation and after subsampling.
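The subsample-and-vectorize step can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the channels are already computed as 2-D lists, that subsampling sums each non-overlapping block (the paper examines the subsampling method separately), and it omits the optional binomial smoothing. The function names and the parameter `shrink` are hypothetical.

```python
def aggregate_channel(channel, shrink):
    """Sum each non-overlapping shrink x shrink block of a 2-D channel."""
    rows = len(channel) // shrink
    cols = len(channel[0]) // shrink
    return [
        [
            sum(
                channel[r * shrink + dr][c * shrink + dc]
                for dr in range(shrink)
                for dc in range(shrink)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def aggregate_channel_features(channels, shrink):
    """Subsample every channel and flatten the results into one vector."""
    features = []
    for channel in channels:
        for row in aggregate_channel(channel, shrink):
            features.extend(row)
    return features


# Toy example: two 4x4 channels, subsampled by a factor of 2.
ch0 = [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
ch1 = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
features = aggregate_channel_features([ch0, ch1], shrink=2)
print(features)  # [4, 8, 12, 16, 2, 2, 2, 2]
```

Each entry of the resulting vector is one candidate feature, so the pool size is simply the number of channels times the number of subsampled pixels per channel.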

Classifier learning: The learning process is quite simple. Two changes are made compared with the VJ framework. First, the weak classifier is changed from a decision stump to a depth-2 decision tree. The more complex weak classifier is better able to find discriminative intra- and inter-channel correlations for classification [15]. Second, a soft-cascade structure [1] is used. Unlike the attentional cascade in the VJ framework, which has several cascade stages, a single-stage classifier is trained on the whole training data, and a threshold is then set after each weak classifier picked by Adaboost. These two changes lead to more efficient training and detection.
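To make the two changes concrete, the following sketch evaluates a soft cascade of depth-2 trees on one feature vector. It covers test-time evaluation only, not training, and the tree encoding, the toy trees and the rejection thresholds are all hypothetical values chosen for illustration.

```python
def eval_depth2_tree(tree, features):
    """Evaluate a depth-2 decision tree: one root split, then one child split."""
    root_index, root_threshold, children = tree
    child = children[0] if features[root_index] < root_threshold else children[1]
    child_index, child_threshold, left_score, right_score = child
    return left_score if features[child_index] < child_threshold else right_score


def soft_cascade_score(trees, rejection_thresholds, features):
    """Accumulate weak-classifier scores and reject as soon as the running
    sum drops below the threshold attached to the current weak classifier.

    Returns (accepted, score, number_of_trees_evaluated).
    """
    score = 0.0
    pairs = zip(trees, rejection_thresholds)
    for evaluated, (tree, threshold) in enumerate(pairs, 1):
        score += eval_depth2_tree(tree, features)
        if score < threshold:
            return False, score, evaluated
    return True, score, len(trees)


# Hypothetical cascade: (root feature, root threshold, (left child, right child)),
# where each child is (feature, threshold, score if below, score otherwise).
TREES = [
    (0, 5.0, ((1, 2.0, -1.0, -0.5), (1, 2.0, 0.5, 1.0))),
    (2, 1.0, ((0, 3.0, -1.0, 0.0), (0, 3.0, 0.0, 1.0))),
]
THRESHOLDS = [-0.75, -0.5]

print(soft_cascade_score(TREES, THRESHOLDS, [6.0, 3.0, 2.0]))  # (True, 2.0, 2)
print(soft_cascade_score(TREES, THRESHOLDS, [1.0, 0.0, 0.0]))  # (False, -1.0, 1)
```

The second window is rejected after a single tree, which illustrates why per-classifier thresholds speed up detection: most background windows never reach the later weak classifiers.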

Overall superiority: Compared with the traditional Haar-like features used in the VJ framework, aggregate channel features have the following differences and advantages: 1) The image channels are extended to more types in order to encode diverse information such as color, gradients and local histograms, and therefore possess richer representation capacity. 2) Features are extracted directly as pixel values on downsampled channels rather than as rectangular sums at various locations and scales computed using integral images, leading to faster feature computation and a smaller feature pool for boosting. The cascade structure accelerates detection further. 3) Because the feature layout is structurally consistent with the overall image, the boosted classifier naturally encodes structured pattern information from large training data (see Figure 1 for an illustration), which gives more accurate localization of faces in the image.

3.2. Investigation guidelines

All investigations are trained on the AFLW face database [10] and tested on the Annotated Faces in the Wild (AFW) test set. To be specific, a total of 36,112 positive samples and 108,336 negative samples are selected from AFLW and kept constant in all investigations. The test set contains 205 natural images with faces that vary widely in pose, appearance and illumination.

To alleviate the ground-truth offset caused by different annotation styles (Figure 4) in the training and testing sets and to make the evaluation more comparable, a lower Jaccard index threshold of 0.3 is adopted in comparative evaluation. In practice, the lower threshold does not cause erroneous detections to be mistakenly counted as correct. Note that in the final evaluation of the proposed face detector (Section 5), the AFW test set, together with another face benchmark, the FDDB database, is used as the testbed, and the evaluation metric follows each database's protocol.
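The matching criterion can be illustrated with a short sketch. It assumes axis-aligned boxes given as (x, y, width, height) and treats a Jaccard index equal to the threshold as a match; both conventions, and the function names, are assumptions made for the example.

```python
def jaccard_index(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


def is_match(detection, ground_truth, threshold=0.3):
    """Count a detection as correct when its overlap reaches the threshold."""
    return jaccard_index(detection, ground_truth) >= threshold


detection = (10, 10, 100, 100)
ground_truth = (30, 40, 100, 100)  # shifted annotation of the same face

print(round(jaccard_index(detection, ground_truth), 3))  # 0.389
print(is_match(detection, ground_truth, threshold=0.3))  # True
print(is_match(detection, ground_truth, threshold=0.5))  # False
```

In this toy case a detection that is offset from the annotation by a fraction of the face size passes at 0.3 but fails at 0.5, which is the kind of annotation-style mismatch the lower threshold is meant to tolerate.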

3.3. Feature design

To fully exploit the power of aggregate channel features in the face detection domain, we investigate the design of the feature in depth, focusing on channel types, window size, subsampling method and feature scale. Results of the comparative experiments are shown in Figure 6.

Channel types: Three types of channels are used: color channels (gray-scale, RGB, HSV and LUV), gradient magnitude, and gradient histograms. The computation of the latter two channel types can be seen as a generalized version of HoG features. Specifically, the gradient magnitude is the largest response over all three color channels, and the oriented gradient histograms follow the idea of HoG in that: 1) the rectangular cell size in HoG equals the subsampling factor in aggregate channel features; 2) each orientation bin results in one feature channel (6 orientation bins are used in this paper). Figure 6 (a)~(c) show how much each of these three types alone contributes to the performance of face detection. It can be seen that the gradient histograms contribute most to the performance among all three channel

AFLW: http://testsetlrs.icg.tugraz.at/research/aflw/

AFW: http://www.ics.uci.edu/~xzhu/face/

The Jaccard index is defined as the size of the intersection divided by the size of the union of the sample sets.
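The gradient channels described under "Channel types" in Section 3.3 can be sketched as follows. This is a simplified illustration under several assumptions that the text does not specify: unnormalized central differences clamped at the border, unsigned orientation in [0, π), and hard assignment of each pixel's magnitude to a single bin. The function name `gradient_channels` is hypothetical.

```python
import math


def gradient_channels(image, num_bins=6):
    """Compute a gradient-magnitude channel and num_bins orientation channels.

    image is a list of color planes, each a 2-D list of floats of equal size.
    At each pixel the color plane with the strongest gradient is used.
    """
    height, width = len(image[0]), len(image[0][0])
    magnitude = [[0.0] * width for _ in range(height)]
    histograms = [[[0.0] * width for _ in range(height)] for _ in range(num_bins)]

    for y in range(height):
        for x in range(width):
            best_mag, best_dx, best_dy = 0.0, 0.0, 0.0
            for plane in image:
                # Unnormalized central differences, clamped at the border.
                dx = plane[y][min(x + 1, width - 1)] - plane[y][max(x - 1, 0)]
                dy = plane[min(y + 1, height - 1)][x] - plane[max(y - 1, 0)][x]
                mag = math.hypot(dx, dy)
                if mag > best_mag:
                    best_mag, best_dx, best_dy = mag, dx, dy
            magnitude[y][x] = best_mag
            if best_mag > 0.0:
                # Unsigned orientation in [0, pi), hard-assigned to one bin.
                angle = math.atan2(best_dy, best_dx) % math.pi
                bin_index = min(int(angle / math.pi * num_bins), num_bins - 1)
                histograms[bin_index][y][x] = best_mag
    return magnitude, histograms


# Toy image: three 4x4 planes, only the second contains a vertical edge.
flat = [[0.0] * 4 for _ in range(4)]
edge = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
magnitude, histograms = gradient_channels([flat, edge, flat])
print(magnitude[1][1], histograms[0][1][1])  # 1.0 1.0
```

Summing each of the six orientation channels over blocks whose side equals the subsampling factor then yields one orientation histogram per cell, which is the sense in which the HoG cell size equals the subsampling factor.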
