The Elements of End-to-end Deep Face Recognition: A Survey of Recent Advances
HANG DU∗, Shanghai University, China
HAILIN SHI∗, JD AI Research, China
DAN ZENG†, Shanghai University, China
TAO MEI, JD AI Research, China
Face recognition is one of the most fundamental and long-standing topics in the computer vision community.
With the recent developments of deep convolutional neural networks and large-scale datasets, deep face
recognition has made remarkable progress and has been widely used in real-world applications. Given a
natural image or video frame as input, an end-to-end deep face recognition system outputs the face feature for
recognition. To achieve this, the whole system is generally built with three key elements: face detection, face
preprocessing, and face representation. Face detection locates faces in the image or frame. Then, face
preprocessing calibrates the faces to a canonical view and crops them to a normalized pixel size.
Finally, in the face representation stage, discriminative features are extracted from the preprocessed
faces for recognition. All three elements are fulfilled by deep convolutional neural networks. In this
paper, we present a comprehensive survey of the recent advances in every element of end-to-end
deep face recognition, since thriving deep learning techniques have greatly improved their capability.
To start with, we give an overview of end-to-end deep face recognition, which, as mentioned
above, includes face detection, face preprocessing, and face representation. Then, we review the deep learning
based advances of each element in turn, covering many aspects such as up-to-date algorithm designs,
evaluation metrics, datasets, performance comparison, existing challenges, and promising directions for future
research. We hope this survey helps readers better understand the big picture of end-to-end face
recognition and explore it more deeply in a systematic way.
Additional Key Words and Phrases: Deep convolutional neural network, face recognition, face detection, face
preprocessing, face representation.
1 INTRODUCTION
Face recognition is an extensively studied topic in computer vision. Among the existing technologies
of human biometrics, face recognition is the most widely used in real-world applications, such
as authentication and surveillance systems. According to the modality of data, face recognition
can be divided into 2D image based methods and 3D scan based methods, which are quite different
in development and application. Moreover, with the great advance of deep convolutional neural
networks (DCNNs), deep learning based methods have achieved significant performance improvements
on various computer vision tasks, including face recognition. In this survey, we focus on 2D
image based end-to-end deep face recognition, which takes natural images or video frames as
input and extracts the deep features of each face as output. We provide a comprehensive review of
the recent advances in the elements of end-to-end deep face recognition. Specifically, an end-to-end
deep face recognition system is composed of three key elements: face detection, face preprocessing,
and face representation. In the following, we give a brief introduction to each element.
Face detection is the first step of end-to-end face recognition. It aims to locate the face
regions in natural images or video frames. Before the deep learning era, one of the pioneering
works for face detection was the Viola-Jones [232] face detector, which utilizes AdaBoost classifiers
with Haar features to build a cascaded structure. Later on, subsequent approaches explored
effective hand-crafted features [7, 165, 172] and various classifiers [16, 127, 155] to improve the
detection performance. Besides, some methods [59, 281] employ Deformable Part Models (DPM) for
face detection. One can refer to [304] for a thorough survey of traditional face detection methods.
Recently, with the great progress of DCNNs, deep learning based face detection has been extensively
studied. By learning from large-scale data with DCNNs, face detectors have become more robust to various
conditions, such as large facial poses and occlusions.
∗ Equal contribution. This work was performed at JD AI Research.
† Corresponding author.
Authors’ addresses: Hang Du, duhang@shu.edu.cn, Shanghai University, Shanghai, China; Hailin Shi, shihialin@jd.com, JD
AI Research, Beijing, China; Dan Zeng, Shanghai University, Shanghai, China, dzeng@shu.edu.cn; Tao Mei, tmei@jd.com,
JD AI Research, Beijing, China.
arXiv:2009.13290v1 [cs.CV] 28 Sep 2020
Fig. 1. The number of publications on the elements of end-to-end deep face recognition from 2013 to July 2020.
Next, face preprocessing refers to calibrating the natural face to a canonical view and cropping it to a
normalized pixel size, in order to facilitate the subsequent computation of the face representation. It
is an essential intermediate procedure for a face recognition system. In this survey, we introduce two
major practices for face preprocessing, i.e., face alignment and face frontalization. Generally, face
alignment utilizes spatial transformations to warp faces to a canonical location with reference to
facial landmarks, so facial landmark localization is necessary for face alignment. Most traditional
works on facial landmark localization focused on either generative methods [36, 37] or discriminative
methods [158, 354], and there are several exhaustive surveys about them [100, 249, 371]. Instead of
utilizing facial landmarks, some approaches directly generate an aligned face from the input one. In
addition, face frontalization aims to synthesize frontal faces from non-frontal inputs, which is
commonly used to handle large-pose face recognition.
In the face representation stage, discriminative features are extracted from the preprocessed
face images for recognition. This is the final and core step of face recognition. In early studies,
many approaches calculated the face representation by projecting face images into a low-dimensional
subspace, such as Eigenfaces [229] and Fisherfaces [12]. Later on, methods based on handcrafted local
descriptors [3, 137] prevailed in face representation. For a detailed review of these
traditional methods, one can refer to [6, 233, 312]. Recently, face representation has benefited from the
development of DCNNs and witnessed great improvements toward high-performance face recognition.
This survey focuses on reviewing and analyzing the recent advances in each element of end-to-end
deep face recognition. An important fact is that the performance of face recognition depends
on the contribution of all the elements (i.e., face detection, preprocessing and representation). In
other words, inferiority in any one of the elements makes it the shortest stave of the barrel, i.e., the
bottleneck, and harms the final performance. In order to establish a high-performance end-to-end face
recognition system, it is essential to discuss every element of the holistic framework and their mutual
effect on each other. A number of face recognition surveys have been published in the past twenty
years. The main differences between our survey and existing ones are summarized in Table 1.
Table 1. Representative surveys of face recognition
| Title | Year | Description |
|---|---|---|
| Face Recognition: A Literature Survey [233] | 2003 | Traditional image- and video-based methods in face recognition. Not covering deep face recognition. |
| Face Recognition from a Single Image per Person: A Survey [312] | 2006 | The methods to address the single sample problem in face recognition, not covering deep face recognition. |
| A survey of approaches and challenges in 3D and multi-modal 3D+2D face recognition [15] | 2006 | A survey of 3D and multi-modal face recognition, not covering deep face recognition. |
| Illumination Invariant Face Recognition: A Survey [369] | 2007 | Focus on illumination-invariant face recognition task, not covering deep face recognition. |
| A Survey of Face Recognition Techniques [6] | 2009 | Traditional face recognition methods on different modal face data, not covering deep face recognition. |
| A Comprehensive Survey on Pose-Invariant Face Recognition [48] | 2016 | Focus on pose-invariant face recognition task. |
| A survey of local feature methods for 3D face recognition [206] | 2017 | A review of feature extraction based methods for 3D face recognition. |
| Deep Learning for Understanding Faces [181] | 2018 | Provide a brief overview of the end-to-end deep face recognition, not covering the recent works. |
| Deep Face Recognition: A Survey [246] | 2018 | Focus on the deep face representation learning. |
| Past, Present, and Future of Face Recognition: A Review [2] | 2020 | A review of 2D and 3D face recognition, not covering end-to-end deep face recognition. |
Specifically, certain surveys [6, 233, 312] on face recognition do not cover deep
learning based methods since they were published before the deep learning era; besides, some
surveys focus on 3D face recognition [15, 206] or specific tasks [48, 369]. Instead, we focus on
2D face recognition, which is the most needed in practical applications. Ranjan et al. [181]
provided a brief overview of the three elements, but they did not cover the recent techniques that
rapidly evolved in the past few years. As shown in Fig. 1, the number of published works has been
increasing dramatically during these years. Wang et al. [246] presented a systematic review of
deep face recognition, in which they mainly focused on deep face representation learning, and the
categorization of training loss is sub-optimal. For instance, they sorted the supervised learning
of deep face representation into Euclidean-distance based loss, angular/cosine-margin-based loss,
and softmax loss and its variations; however, almost all the angular/cosine-margin-based losses are
implemented as variations of softmax loss rather than as an individual set. In contrast, we suggest a
more reasonable categorization of the training supervision with three subsets, i.e., the classification,
feature embedding and hybrid methods (in Section 5.2). More recently, Insaf et al. [2] provided a
review of 2D and 3D face recognition from the traditional to the deep-learning era, while the scope
was still limited to face representation. In summary, face recognition techniques need to be
systematically reviewed with a wide scope covering all the elements of the end-to-end pipeline,
while few of the existing surveys have fulfilled this job.
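To make the point about margin-based losses concrete, the relation can be written out as follows. This is a sketch using the widely used additive-margin formulations, not notation taken from this section: the symbols $x$ (feature), $W_j$ and $b_j$ (classifier weight and bias of class $j$), $y$ (ground-truth class), $C$ (number of classes), $s$ (scale), $m$ (margin) and $\theta_j$ (angle between $x$ and $W_j$) are all assumptions introduced for illustration.

```latex
% Standard softmax (cross-entropy) loss for one sample with label y
\mathcal{L}_{\text{softmax}}
  = -\log \frac{e^{W_y^{\top} x + b_y}}{\sum_{j=1}^{C} e^{W_j^{\top} x + b_j}}

% With b_j = 0, \|W_j\| = 1 and \|x\| rescaled to s, each logit becomes s\cos\theta_j.
% A margin-based loss only changes the target-class logit through \psi:
\mathcal{L}_{\text{margin}}
  = -\log \frac{e^{s\,\psi(\theta_y)}}
               {e^{s\,\psi(\theta_y)} + \sum_{j \neq y} e^{s\cos\theta_j}}

% Two common choices of \psi
\psi(\theta) = \cos\theta - m      % additive cosine margin
\psi(\theta) = \cos(\theta + m)    % additive angular margin
```

Setting $m = 0$ in either choice recovers the normalized softmax loss, which is why these losses read naturally as softmax variations.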
Therefore, we systematically review the deep learning based approaches of each element in
end-to-end face recognition. The review of each element covers many aspects:
algorithm designs, evaluation metrics, datasets, performance comparisons, remaining challenges,
and promising directions for future research. We hope this survey helps readers better
understand the big picture of end-to-end face recognition and explore it more deeply
in a systematic way.
Specifically, the main contributions can be summarized as follows:
• We provide a comprehensive survey of the recent advances of the elements in end-to-end
deep face recognition, including face detection, face preprocessing, and face representation.
• We discuss the three elements from many aspects: algorithm designs, evaluation metrics,
datasets, performance comparison, etc.
• We further collect the existing challenges and promising directions for each element to
facilitate future research, and also discuss the future trends from the view of the holistic
framework.
Fig. 2. The standard pipeline of an end-to-end deep face recognition system. First, the face detection stage aims
to localize the face region in the input image. Then, face preprocessing normalizes the
detected face to a canonical view. Finally, face representation is devoted to extracting discriminative features
for face recognition.
2 OVERVIEW
A typical end-to-end deep face recognition system includes three basic elements: face detection,
face preprocessing, and face representation, as shown in Fig. 2. First, face detection localizes the
face region in the input image. Then, face preprocessing normalizes the detected
face into a canonical layout. Finally, face representation extracts discriminative features
from the preprocessed face. The features of two faces are used to calculate the similarity between them,
in order to decide whether the faces belong to the same identity.
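The final decision step can be sketched in a few lines. The text does not say which similarity measure is used, so this example assumes cosine similarity, a common choice for deep face features; the function names, the toy 4-dimensional features and the threshold value of 0.5 are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("feature vectors must be non-zero")
    return dot / (norm_a * norm_b)

def same_identity(feat_a, feat_b, threshold=0.5):
    """Decide whether two faces match.

    The threshold is a hypothetical value; in practice it would be
    tuned on a validation set for a target error rate.
    """
    return cosine_similarity(feat_a, feat_b) >= threshold

# Toy 4-D features standing in for real face features (assumption).
probe = [0.2, 0.9, -0.1, 0.4]
gallery_match = [0.25, 0.8, -0.05, 0.5]
gallery_other = [-0.7, 0.1, 0.6, -0.2]

print(same_identity(probe, gallery_match))  # True
print(same_identity(probe, gallery_other))  # False
```

Cosine similarity ignores feature magnitude, so only the direction of the feature vector affects the decision.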
We structure the body sections (Sections 3, 4, and 5) with respect to the three elements, each of which
is a research topic that covers abundant literature in computer vision. We give a brief overview of the
three elements in this section, and dive into each of them in the following body sections.
2.1 Face Detection
Face detection is the first procedure of the face recognition system. Given an input image,
face detection aims to find all the faces in the image and give the coordinates of each bounding box
with a confidence score. The major challenges of face detection include varying resolution, scale,
pose, illumination, occlusion, etc. The traditional methods focus on designing hand-crafted features
that distinguish facial and background regions. With the development of deep learning, deep
features have been extensively used in face detection. In Section 3, we provide a categorization of the
deep learning based face detection methods from multiple dimensions, which includes multi-stage,
single-stage, anchor-based, anchor-free, multi-task learning, CPU real-time and problem-oriented
methods. Generally, the criterion for distinguishing multi-stage from single-stage methods is
whether the face detector first generates candidate boxes that one or more following stages
then refine for accurate predictions. Most anchor-based methods preset a number
of anchors on the feature maps and then perform classification and regression on these anchors. The
anchors play a crucial role in this routine. Recently, another routine, i.e., the anchor-free design,
has attracted growing attention in object detection due to its flexibility and efficiency. So, we also discuss
the anchor-free methods and compare them with the anchor-based ones. In addition, as face
detection is a preceding step in face recognition systems, the computational efficiency of the face detector
is important in real-world applications. Although detectors can achieve good performance with
DCNNs, it is impractical to deploy heavy-weight networks, especially on non-GPU devices.
Thus, we introduce the CPU real-time methods for practical applications. Certainly, we should
not ignore another set of problem-oriented methods for face detection, since they have explicit
motivation to tackle specific challenges. From the above-mentioned perspectives, we provide an
in-depth discussion of the existing deep face detection methods in Section 3. It is worth noting
that there are overlapping techniques between the categories because, as explained above, the
categorization is built up from multiple perspectives. It will help us to better understand the deep
learning based methods for face detection.
Fig. 3. Visualization of facial landmarks of different versions. The 4-point and 5-point landmarks are often
used for face alignment.
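The idea of presetting anchors on a feature map can be illustrated with a small sketch. It assumes one anchor per feature-map cell, size and aspect ratio, centered on the cell and expressed in input-image pixels; the function name, the stride and the anchor sizes are hypothetical and not taken from any specific detector.

```python
def generate_anchors(feature_h, feature_w, stride, sizes, ratios=(1.0,)):
    """Return anchors as (x1, y1, x2, y2) boxes in input-image pixels.

    One anchor is created per (cell, size, ratio), centered on each
    feature-map cell. ratio is height / width; the default of 1.0
    gives square anchors (an assumption made for this sketch).
    """
    anchors = []
    for row in range(feature_h):
        for col in range(feature_w):
            # Center of this cell mapped back to the input image.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for size in sizes:
                for ratio in ratios:
                    w = size / ratio ** 0.5
                    h = size * ratio ** 0.5
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors

# Hypothetical 2x3 feature map with stride 16 and two anchor sizes.
anchors = generate_anchors(feature_h=2, feature_w=3, stride=16, sizes=(16, 32))
print(len(anchors))  # 12 (2 * 3 cells * 2 sizes)
print(anchors[0])    # (0.0, 0.0, 16.0, 16.0)
```

An anchor-based detector would then classify each of these boxes as face or background and regress offsets that refine its coordinates, whereas an anchor-free design predicts boxes without this preset grid.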
2.2 Face Preprocessing
In the second stage, face preprocessing aims to calibrate the detected face to a canonical view
(i.e., face alignment or frontalization), which is an essential procedure for improving the end-to-end
performance of face recognition. Since the human face has a regular structure
in which the facial parts (eyes, nose, mouth, etc.) have a constant arrangement, the alignment of
the face is of great benefit to the subsequent feature computation for face recognition. Commonly,
face alignment utilizes spatial transformation techniques to calibrate faces to a normalized layout.
For most existing methods of face alignment, the facial landmarks, or so-called facial keypoints
(as shown in Fig. 3), are indispensable, because they serve as the reference for the similarity
transformation or affine transformation. So, facial landmark localization is a prerequisite for
face alignment. The DCNN-based facial landmark localization methods can be divided into three
subcategories: coordinate regression based approaches, heatmap regression based approaches and
3D model fitting based approaches. The coordinate regression based approaches take the landmark
coordinates as the target of the regression objective, and aim to learn the nonlinear mapping
from the input face image to the landmark coordinates. The heatmap regression based
methods output a likelihood response map for each landmark. Moreover,
the 3D model fitting based methods predict a 3D face shape from a 2D image, and then project it
onto the image plane to obtain 2D landmarks. Without relying on facial landmarks, several
methods can directly output an aligned face from the input by learning the transformation parameters.
In addition, face frontalization techniques can also be applied in face preprocessing to tackle large
pose variations by synthesizing identity-preserving frontal faces from non-frontal views. Both face
alignment and face frontalization are common practices for calibrating an unconstrained face
to a canonical view and facilitating the subsequent face representation. We will review this set of
methods in Section 4.
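As a rough illustration of landmark-based alignment, the sketch below fits a 2D similarity transformation (rotation, uniform scale and translation) that maps detected landmarks onto a reference template in the least-squares sense. It treats points as complex numbers, which gives a closed-form solution; this formulation, the function names and the 5-point template coordinates are assumptions made for the example, not a method described in the text. Warping the image pixels with the fitted transform is left out.

```python
def fit_similarity(src_pts, dst_pts):
    """Least-squares 2D similarity transform mapping src_pts onto dst_pts.

    Points are treated as complex numbers z = x + iy, so the transform
    is w = a * z + b, where complex a encodes scale and rotation and
    complex b encodes translation.
    """
    if len(src_pts) != len(dst_pts) or len(src_pts) < 2:
        raise ValueError("need at least two matching point pairs")
    src = [complex(x, y) for x, y in src_pts]
    dst = [complex(x, y) for x, y in dst_pts]
    src_mean = sum(src) / len(src)
    dst_mean = sum(dst) / len(dst)
    num = sum((w - dst_mean) * (z - src_mean).conjugate()
              for z, w in zip(src, dst))
    den = sum(abs(z - src_mean) ** 2 for z in src)
    if den == 0:
        raise ValueError("source points must not all coincide")
    a = num / den
    b = dst_mean - a * src_mean
    return a, b

def apply_similarity(a, b, pts):
    """Apply w = a * z + b to a list of (x, y) points."""
    out = [a * complex(x, y) + b for x, y in pts]
    return [(w.real, w.imag) for w in out]

# Hypothetical 5-point template (eye centers, nose tip, mouth corners).
template = [(38.0, 52.0), (74.0, 52.0), (56.0, 72.0), (42.0, 92.0), (70.0, 92.0)]
# Simulated detection: the template scaled by 2, rotated and shifted.
detected = apply_similarity(complex(1.6, 1.2), complex(100.0, 50.0), template)

a, b = fit_similarity(detected, template)
aligned = apply_similarity(a, b, detected)
print(abs(a))      # recovered scale, approximately 0.5
print(aligned[0])  # approximately (38.0, 52.0)
```

An affine transformation would add shear and non-uniform scale (six parameters instead of four), at the cost of distorting the face shape more freely.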
2.3 Face Representation
As the key step of face recognition systems, face representation is devoted to learning a deep face model
and using it to extract features from preprocessed faces for recognition. The features are used to
calculate the similarity of the matched faces. In Section 5, we provide a review of deep learning