The Elements of End-to-end Deep Face Recognition: A Survey of Recent Advances

HANG DU∗, Shanghai University, China

HAILIN SHI∗, JD AI Research, China

DAN ZENG†, Shanghai University, China

TAO MEI, JD AI Research, China

Face recognition is one of the most fundamental and long-standing topics in the computer vision community. With the recent developments of deep convolutional neural networks and large-scale datasets, deep face recognition has made remarkable progress and has been widely used in real-world applications. Given a natural image or video frame as input, an end-to-end deep face recognition system outputs the face feature for recognition. To achieve this, the whole system is generally built with three key elements: face detection, face preprocessing, and face representation. Face detection locates faces in the image or frame. Then, face preprocessing calibrates the faces to a canonical view and crops them to a normalized pixel size. Finally, in the face representation stage, discriminative features are extracted from the preprocessed faces for recognition. All three elements are fulfilled by deep convolutional neural networks. In this paper, we present a comprehensive survey of the recent advances in every element of end-to-end deep face recognition, since thriving deep learning techniques have greatly improved their capability. To start with, we give an overview of end-to-end deep face recognition, which, as mentioned above, includes face detection, face preprocessing, and face representation. Then, we review the deep learning based advances of each element in turn, covering aspects such as up-to-date algorithm designs, evaluation metrics, datasets, performance comparison, existing challenges, and promising directions for future research. We hope this survey offers helpful insights for better understanding the big picture of end-to-end face recognition and for deeper, more systematic exploration.

Additional Key Words and Phrases: Deep convolutional neural network, face recognition, face detection, face preprocessing, face representation.

1 INTRODUCTION

Face recognition is an extensively studied topic in computer vision. Among the existing technologies of human biometrics, face recognition is the most widely used in real-world applications, such as authentication and surveillance systems. According to the modality of data, face recognition can be divided into 2D image based methods and 3D scan based methods, which are quite different in development and application. Moreover, with the great advance of deep convolutional neural networks (DCNNs), deep learning based methods have achieved significant performance improvements on various computer vision tasks, including face recognition. In this survey, we focus on 2D image based end-to-end deep face recognition, which takes natural images or video frames as input and extracts the deep features of each face as output. We provide a comprehensive review of the recent advances in the elements of end-to-end deep face recognition. Specifically, an end-to-end deep face recognition system is composed of three key elements: face detection, face preprocessing, and face representation. In the following, we give a brief introduction to each element.

Face detection is the first step of end-to-end face recognition. It aims to locate the face regions in natural images or video frames. Before the deep learning era, one of the pioneering works for face detection was the Viola-Jones [232] face detector, which utilizes AdaBoost classifiers with Haar features to build a cascaded structure. Later on, subsequent approaches explored effective hand-crafted features [165, 172] and various classifiers [127, 155] to improve detection performance. Besides, some methods [281] employ Deformable Part Models (DPM) for face detection. One can refer to [304] for a thorough survey of traditional face detection methods. Recently, with the great progress of DCNNs, deep learning based face detection has been extensively studied. By learning from large-scale data with DCNNs, face detectors have become more robust to various conditions, such as large facial poses and occlusions.

Fig. 1. The number of publications on the elements of end-to-end deep face recognition from 2013 to July 2020.

∗ Equal contribution. This work was performed at JD AI Research.

† Corresponding author.

Authors’ addresses: Hang Du, duhang@shu.edu.cn, Shanghai University, Shanghai, China; Hailin Shi, shihialin@jd.com, JD AI Research, Beijing, China; Dan Zeng, Shanghai University, Shanghai, China, dzeng@shu.edu.cn; Tao Mei, tmei@jd.com, JD AI Research, Beijing, China.

arXiv:2009.13290v1 [cs.CV] 28 Sep 2020

Next, face preprocessing refers to calibrating the natural face to a canonical view and cropping it to a normalized pixel size, in order to facilitate the subsequent computation of the face representation. It is an essential intermediate procedure for a face recognition system. In this survey, we introduce two major practices for face preprocessing, i.e., face alignment and face frontalization. Generally, face alignment utilizes spatial transformations to warp faces to a canonical location with reference to facial landmarks. So, facial landmark localization is necessary for face alignment. Most traditional works on facial landmark localization focused on either generative methods [ ] or discriminative methods [158, 354], and there are several exhaustive surveys about them [100, 249, 371]. Instead of utilizing facial landmarks, some approaches directly generate an aligned face from the input one. In addition, face frontalization studies how to synthesize frontal faces from non-frontal inputs, which is commonly used to handle large-pose face recognition.

In the face representation stage, discriminative features are extracted from the preprocessed face images for recognition. This is the final and core step of face recognition. In early studies, many approaches calculated the face representation by projecting face images into a low-dimensional subspace, such as Eigenfaces [229] and Fisherfaces [ ]. Later on, methods based on handcrafted local descriptors [137] prevailed in face representation. For a detailed review of these traditional methods, one can refer to [233, 312]. Recently, face representation has benefited from the development of DCNNs and has witnessed great improvements toward high-performance face recognition.

This survey focuses on reviewing and analyzing the recent advances in each element of end-to-end deep face recognition. An important fact is that the performance of face recognition depends on the contribution of all the elements (i.e., face detection, preprocessing, and representation). In other words, inferiority in any one of the elements becomes the bottleneck, like the shortest stave of a cask, and harms the final performance. In order to establish a high-performance end-to-end face recognition system, it is essential to discuss every element of the holistic framework and their mutual effects on each other. A number of face recognition surveys have been published in the past twenty years. The main differences between our survey and the existing ones are summarized in Table 1.

Table 1. Representative surveys of face recognition

Title | Year | Description

Face Recognition: A Literature Survey [233] | 2003 | Traditional image- and video-based methods in face recognition. Not covering deep face recognition.

Face Recognition from a Single Image per Person: A Survey [312] | 2006 | The methods to address the single sample problem in face recognition, not covering deep face recognition.

A survey of approaches and challenges in 3D and multi-modal 3D+2D face recognition [15] | 2006 | A survey of 3D and multi-modal face recognition, not covering deep face recognition.

Illumination Invariant Face Recognition: A Survey [369] | 2007 | Focus on illumination-invariant face recognition task, not covering deep face recognition.

A Survey of Face Recognition Techniques [6] | 2009 | Traditional face recognition methods on different modal face data, not covering deep face recognition.

A Comprehensive Survey on Pose-Invariant Face Recognition [48] | 2016 | Focus on pose-invariant face recognition task.

A survey of local feature methods for 3D face recognition [206] | 2017 | A review of feature extraction based methods for 3D face recognition.

Deep Learning for Understanding Faces [181] | 2018 | Provide a brief overview of the end-to-end deep face recognition, not covering the recent works.

Deep Face Recognition: A Survey [246] | 2018 | Focus on the deep face representation learning.

Past, Present, and Future of Face Recognition: A Review [2] | 2020 | A review of 2D and 3D face recognition, not covering end-to-end deep face recognition.

Specifically, certain surveys [233, 312] on face recognition do not cover deep learning based methods, since they were published before the deep learning era; besides, some surveys focus on 3D face recognition [206] and specific tasks [369]. Instead, we focus on 2D face recognition, which is the most needed in practical applications. Ranjan et al. [181] provided a brief overview of the three elements, but they did not cover the recent techniques that have rapidly evolved in the past few years. As shown in Fig. 1, the number of published works has been increasing dramatically during these years. Wang et al. [246] presented a systematic review of deep face recognition, in which they mainly focused on deep face representation learning, and their categorization of training losses is sub-optimal. For instance, they sorted the supervised learning of deep face representation into Euclidean-distance-based loss, angular/cosine-margin-based loss, and softmax loss and its variations; however, almost all the angular/cosine-margin-based losses are implemented as variations of softmax loss rather than as an individual set. In contrast, we suggest a more reasonable categorization of the training supervision with three subsets, i.e., classification, feature embedding, and hybrid methods (in Section 5.2). More recently, Insaf et al. [ ] provided a review of 2D and 3D face recognition from the traditional to the deep learning era, but its scope was still limited to face representation. In summary, face recognition techniques need to be systematically reviewed with a wide scope covering all the elements of the end-to-end pipeline, yet few existing surveys have fulfilled this job.
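To illustrate why cosine-margin losses can be read as variations of softmax loss, the following is a minimal NumPy sketch of one common form, an additive cosine margin applied to the target-class logit before an ordinary softmax cross-entropy. It is not taken from the survey: the function name `margin_softmax_loss` and the default `scale` and `margin` values are assumptions chosen for illustration. Setting `margin=0.0` recovers a plain softmax loss on normalized features.

```python
import numpy as np

def margin_softmax_loss(features, weights, labels, scale=30.0, margin=0.35):
    """Additive cosine-margin softmax loss, averaged over the batch.

    features: (N, D) embeddings, weights: (C, D) class vectors,
    labels: (N,) integer class ids.
    scale and margin are illustrative defaults (assumptions).
    """
    # L2-normalize both sides so that the logits are cosines of angles.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = f @ w.T  # shape (N, C)

    # Subtract the margin from the target-class cosine only, then rescale.
    rows = np.arange(len(labels))
    logits = cos.copy()
    logits[rows, labels] -= margin
    logits *= scale

    # From here on this is ordinary softmax cross-entropy (stabilized).
    logits -= logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_prob[rows, labels].mean())

# Toy data: 4 samples, 8-dimensional features, 3 identities.
rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))
class_weights = rng.normal(size=(3, 8))
ids = np.array([0, 2, 1, 0])

plain = margin_softmax_loss(feats, class_weights, ids, margin=0.0)
with_margin = margin_softmax_loss(feats, class_weights, ids, margin=0.35)
```

The only difference between the two calls is the margin subtracted from the target logit, which makes the training objective stricter without changing the softmax machinery around it.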

Therefore, we systematically review the deep learning based approaches for each element of end-to-end face recognition. The review of each element covers many aspects: algorithm designs, evaluation metrics, datasets, performance comparisons, remaining challenges, and promising directions for future research. We hope this survey offers helpful insights for better understanding the big picture of end-to-end face recognition and for deeper, more systematic exploration.

Specifically, the main contributions can be summarized as follows:

• We provide a comprehensive survey of the recent advances in the elements of end-to-end deep face recognition, including face detection, face preprocessing, and face representation.

• We discuss the three elements from many aspects: algorithm designs, evaluation metrics, datasets, performance comparison, etc.

• We further collect the existing challenges and promising directions for each element to facilitate future research, and also discuss the future trends from the view of the holistic framework.

Fig. 2. The standard pipeline of an end-to-end deep face recognition system. First, the face detection stage aims to localize the face region in the input image. Then, face preprocessing normalizes the detected face to a canonical view. Finally, face representation is devoted to extracting discriminative features for face recognition.

2 OVERVIEW

A typical end-to-end deep face recognition system includes three basic elements: face detection, face preprocessing, and face representation, as shown in Fig. 2. First, face detection localizes the face region in the input image. Then, face preprocessing normalizes the detected face into a canonical layout. Finally, face representation is devoted to extracting discriminative features from the preprocessed face. The features of two faces are used to calculate the similarity between them, in order to decide whether the faces belong to the same identity.
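The final decision step described above can be sketched as follows. This is a minimal illustration, assuming the features are plain vectors compared by cosine similarity; the function names and the `threshold` value are hypothetical, and in practice a threshold would be tuned on validation data.

```python
import numpy as np

def cosine_similarity(feat_a, feat_b):
    """Cosine of the angle between two feature vectors."""
    a = np.asarray(feat_a, dtype=float)
    b = np.asarray(feat_b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def same_identity(feat_a, feat_b, threshold=0.5):
    """Declare a match when the similarity reaches the threshold.

    The default threshold is an illustrative assumption, not a
    value taken from the survey.
    """
    return cosine_similarity(feat_a, feat_b) >= threshold

# Toy 4-dimensional "features"; real systems use learned embeddings.
probe = [0.9, 0.1, 0.0, 0.4]
gallery_same = [0.8, 0.2, 0.1, 0.5]
gallery_other = [-0.3, 0.9, 0.2, -0.1]

match = same_identity(probe, gallery_same)
non_match = same_identity(probe, gallery_other)
```

Cosine similarity is only one possible choice here; a Euclidean distance on normalized features would give an equivalent ranking.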

We structure the body sections (Sections 3, 4, and 5) with respect to the three elements, each of which is a research topic that covers abundant literature in computer vision. We give a brief overview of the three elements in this section and dive into each of them in the following body sections.

2.1 Face Detection

Face detection is the first procedure of the face recognition system. Given an input image, face detection aims to find all the faces in the image and give the coordinates of each bounding box with a confidence score. The major challenges of face detection include varying resolution, scale, pose, illumination, occlusion, etc. The traditional methods focus on designing hand-crafted features that distinguish facial regions from background regions. With the development of deep learning, deep features have been extensively used in face detection. In Section 3, we provide a categorization of the deep learning based face detection methods from multiple dimensions, which includes multi-stage, single-stage, anchor-based, anchor-free, multi-task learning, CPU real-time, and problem-oriented methods. Generally, multi-stage and single-stage methods are distinguished by whether the detector first generates candidate boxes that one or more subsequent stages then refine for accurate predictions. Most anchor-based methods preset a number of anchors on the feature maps and then perform classification and regression on these anchors. The anchors play a crucial role in this routine. Recently, another routine, i.e., the anchor-free design, has attracted growing attention in object detection due to its flexibility and efficiency. So, we also discuss the anchor-free methods and compare them with the anchor-based ones. In addition, as face detection is the first step in face recognition systems, the computational efficiency of the face detector is important in real-world applications. Although detectors can achieve good performance with DCNNs, it is impractical to deploy heavy-weight networks, especially on non-GPU devices. Thus, we introduce the CPU real-time methods for practical applications. Certainly, we should not ignore another set of problem-oriented methods for face detection, since they have explicit motivation to tackle specific challenges. From the above-mentioned perspectives, we provide an in-depth discussion of the existing deep face detection methods in Section 3. It is worth noting that there are overlapping techniques between the categories, because, as explained above, the categorization is built up from multiple perspectives. This multi-perspective view helps us better understand the deep learning based methods for face detection.

Fig. 3. Visualization of facial landmarks of different versions. The 4-point and 5-point landmarks are often used for face alignment.
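As a rough illustration of the anchor-based routine mentioned above, the sketch below tiles square anchors over a feature map and measures their overlap with a face box. It is a simplified assumption-laden example rather than any specific detector: the function names, the stride, and the anchor sizes are all hypothetical, and real detectors add classification and box-regression heads on top of such anchors.

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride, sizes):
    """Tile square anchors over a feature map.

    Returns an array of shape (feat_h * feat_w * len(sizes), 4)
    with boxes as (x1, y1, x2, y2) in input-image pixels.
    """
    # Center of every feature-map cell, mapped back to image coordinates.
    cx = (np.arange(feat_w) + 0.5) * stride
    cy = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(cx, cy)  # each of shape (feat_h, feat_w)
    centers = np.stack([cx.ravel(), cy.ravel()], axis=1)  # (H*W, 2)

    # One square anchor per size at each center, via broadcasting.
    half = np.asarray(sizes, dtype=float) / 2.0  # (S,)
    top_left = centers[:, None, :] - half[None, :, None]
    bottom_right = centers[:, None, :] + half[None, :, None]
    return np.concatenate([top_left, bottom_right], axis=2).reshape(-1, 4)

def iou(box, boxes):
    """Intersection over union between one box and an (N, 4) array."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

# A 2x2 feature map with stride 16 and two anchor sizes per cell.
anchors = generate_anchors(2, 2, stride=16, sizes=(16, 32))
face_box = np.array([0.0, 0.0, 16.0, 16.0])  # hypothetical ground truth
overlaps = iou(face_box, anchors)
```

In a typical anchor-based training setup, anchors whose overlap with a ground-truth face exceeds some threshold are treated as positives; the exact matching rule varies between detectors.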

2.2 Face Preprocessing

In the second stage, face preprocessing aims to calibrate the detected face to a canonical view (i.e., face alignment or frontalization), which is an essential procedure for improving the end-to-end performance of face recognition. Since the human face has a regular structure, in which the facial parts (eyes, nose, mouth, etc.) have a constant arrangement, aligning the face greatly benefits the subsequent feature computation for face recognition. Commonly, face alignment utilizes spatial transformation techniques to calibrate faces to a normalized layout. For most existing face alignment methods, the facial landmarks, or so-called facial keypoints (as shown in Fig. 3), are indispensable, because they serve as the reference for the similarity transformation or affine transformation. So, facial landmark localization is a prerequisite for face alignment. The DCNN-based facial landmark localization methods can be divided into three subcategories: coordinate regression based approaches, heatmap regression based approaches, and 3D model fitting based approaches. The coordinate regression based approaches take the landmark coordinates as the regression target and aim to learn the nonlinear mapping from the input face image to the landmark coordinates. The heatmap regression based methods output a likelihood response map for each landmark. The 3D model fitting based methods predict a 3D face shape from a 2D image and then project it onto the image plane to obtain 2D landmarks. Without relying on facial landmarks, several methods directly output an aligned face from the input by learning the transformation parameters. In addition, face frontalization techniques can also be applied in face preprocessing to tackle large pose variations by synthesizing identity-preserving frontal faces from non-frontal views. Both face alignment and face frontalization are common practices for calibrating an unconstrained face to a canonical view and facilitating the subsequent face representation. We review this set of methods in Section 4.
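The landmark-based alignment described above can be sketched as a least-squares similarity transform that maps detected landmarks onto a reference template. The code below uses the classical closed-form solution (the Umeyama method) in NumPy; the function name and the 5-point `template` coordinates are illustrative assumptions, not values from the survey.

```python
import numpy as np

def estimate_similarity_transform(src, dst):
    """Least-squares similarity transform (scale, rotation, translation)
    mapping src points onto dst points. Returns a 2x3 matrix M such
    that dst is approximately src @ M[:, :2].T + M[:, 2].
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    n = src.shape[0]
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d

    # Cross-covariance between the centered point sets.
    cov = dst_c.T @ src_c / n
    U, S, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so the result is a proper rotation.
    d = np.ones(2)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        d[-1] = -1.0
    R = U @ np.diag(d) @ Vt

    var_s = (src_c ** 2).sum() / n
    scale = (S * d).sum() / var_s
    t = mu_d - scale * R @ mu_s
    return np.hstack([scale * R, t[:, None]])

# Hypothetical 5-point template (eyes, nose tip, mouth corners)
# for a normalized crop; the coordinates are illustrative only.
template = np.array([[38, 52], [74, 52], [56, 72], [42, 92], [70, 92]],
                    dtype=float)

# Simulate detected landmarks: the template scaled, rotated, and shifted.
theta = np.deg2rad(30.0)
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta), np.cos(theta)]])
detected = 2.0 * template @ R_true.T + np.array([5.0, -3.0])

M = estimate_similarity_transform(detected, template)
aligned = detected @ M[:, :2].T + M[:, 2]
```

The resulting 2x3 matrix could then be handed to an image-warping routine, for example OpenCV's `cv2.warpAffine`, to produce the aligned face crop.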

2.3 Face Representation

As the key step of face recognition systems, face representation is devoted to learning a deep face model and using it to extract features from preprocessed faces for recognition. The features are used to calculate the similarity of the matched faces. In Section 5, we provide a review of deep learning
