Occlusion Aware Facial Expression Recognition
Using CNN With Attention Mechanism
Yong Li, Student Member, IEEE, Jiabei Zeng, Member, IEEE, Shiguang Shan, Member, IEEE, and Xilin Chen, Fellow, IEEE
Abstract— Facial expression recognition in the wild is challenging due to various unconstrained conditions. Although existing facial expression classifiers have been almost perfect on analyzing constrained frontal faces, they fail to perform well on partially occluded faces that are common in the wild. In this paper, we propose a convolutional neural network (CNN) with attention mechanism (ACNN) that can perceive the occlusion regions of the face and focus on the most discriminative un-occluded regions. ACNN is an end-to-end learning framework. It combines multiple representations from facial regions of interest (ROIs). Each representation is weighed via a proposed gate unit that computes an adaptive weight from the region itself according to its unobstructedness and importance. Considering different ROIs, we introduce two versions of ACNN: patch-based ACNN (pACNN) and global-local-based ACNN (gACNN). pACNN only pays attention to local facial patches. gACNN integrates local representations at patch level with the global representation at image level. The proposed ACNNs are evaluated on both real and synthetic occlusions, including a self-collected facial expression dataset with real-world occlusions, the two largest in-the-wild facial expression datasets (RAF-DB and AffectNet), and their modifications with synthesized facial occlusions. Experimental results show that ACNNs improve the recognition accuracy on both non-occluded and occluded faces. Visualization results demonstrate that, compared with a CNN without Gate Units, ACNNs are capable of shifting the attention from the occluded patches to other related but unobstructed ones. ACNNs also outperform other state-of-the-art methods on several widely used in-the-lab facial expression datasets under the cross-dataset evaluation protocol.
Index Terms— Facial expression recognition, occlusion, CNN
with attention mechanism, gate unit.
Manuscript received May 28, 2018; revised September 27, 2018 and November 11, 2018; accepted December 5, 2018. Date of publication December 14, 2018; date of current version February 13, 2019. This work was partially supported by the National Key R&D Program of China (grant No. 2017YFB1002802), the Natural Science Foundation of China (grants 61702481 and 61702486), and the External Cooperation Program of CAS (grant GJHZ1843). The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Xiaochun Cao. (Corresponding author: Jiabei Zeng.)
Y. Li and X. Chen are with the Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China, and also with the University of Chinese Academy of Sciences, Beijing 100049, China (e-mail: yong.li@vipl.ict.ac.cn; xlchen@ict.ac.cn).
J. Zeng is with the Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China (e-mail: jiabei.zeng@vipl.ict.ac.cn).
S. Shan is with the Key Laboratory of Intelligent Information Processing, Center for Excellence in Brain Science and Intelligence Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China, and also with the University of Chinese Academy of Sciences, Beijing 100049, China (e-mail: sgshan@ict.ac.cn).
Digital Object Identifier 10.1109/TIP.2018.2886767
I. INTRODUCTION
Facial expression recognition (FER) has received significant interest from computer scientists and psychologists
over recent decades, as it holds promise for an abundance of applications, such as human-computer interaction, affect analysis, and mental health assessment. Although many facial expression recognition systems have been proposed and implemented, the majority of them are built on images captured in controlled environments, such as CK+ [1], MMI [2], Oulu-CASIA [3], and other lab-collected datasets. The controlled faces are frontal and without any occlusion. FER systems that perform perfectly on the lab-collected datasets are likely to perform poorly when recognizing human expressions under natural and uncontrolled conditions. To fill the gap between the FER accuracy on controlled and uncontrolled faces, researchers have made efforts to collect large-scale facial expression datasets in the wild [4], [5]. Despite the usage of data from the wild, facial expression recognition is still challenging due to the existence of partially occluded faces. It is non-trivial to address the occlusion issue because occlusions vary in both the occluders and their positions. Occlusions may be caused by hair, glasses, scarves, breathing masks, hands, arms, food, and other objects that can be placed in front of the face in daily life. These objects may block the eyes, mouth, part of the cheek, or any other part of the face. The variability of occlusions cannot be fully covered by limited amounts of data and will inevitably degrade recognition accuracy.
To address the issue of occlusion, we propose a Convolutional Neural Network with attention mechanism (ACNN), mimicking the way that humans recognize facial expressions. Intuitively, humans recognize facial expressions based on certain patches of the face. When some regions of the face are blocked (e.g., the lower left cheek), humans may judge the expression according to the symmetric part of the face (e.g., the lower right cheek), or other highly related facial regions (e.g., regions around the eyes or mouth). Inspired by this intuition, ACNN automatically perceives the blocked facial patches and pays attention mainly to the unblocked and informative patches. Fig. 1 illustrates the main idea of the proposed method. Each Gate Unit in ACNN learns an adaptive weight according to the region's unobstructed-ness or importance. As can be seen in Fig. 1, the last three visualized patches are blocked by the baby's hand and thus have low unobstructed-ness (α_p). Then, the weighed representations are concatenated and used in the classification part. Thus, ACNN is able to focus on distinctive as well as unobstructed regions of the facial image.
Fig. 1. Illustration of the proposed ACNN for occlusion-aware facial expression recognition. ACNN can be categorized into two versions: pACNN and gACNN. During Part 3, pACNN extracts 24 regions of interest from the intermediate feature maps. Then, as can be seen in the red rectangle, for each patch region, a specific Patch-Gated Unit (PG-Unit) is learnt to weigh the local representations according to the region's "unobstructed-ness" (to what extent the patch is occluded). Then, the weighed representations are concatenated and passed to the classification part. gACNN integrates weighed local representations with the global representation (purple rectangle). The global representation is encoded and weighed via a Global-Gated Unit (GG-Unit).
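To make the gating idea described above concrete, the following is a minimal PyTorch-style sketch rather than the paper's exact architecture: a small encoder (layer sizes and feature dimensions here are illustrative assumptions) maps a patch feature to a local representation, a sigmoid branch estimates a scalar unobstructed-ness weight from that representation, and the patch representation is scaled by the weight before all patches are concatenated for the classification part.

import torch
import torch.nn as nn

class PatchGateUnit(nn.Module):
    """Sketch of a patch-level gate: weighs a patch representation by a
    learned scalar in [0, 1] computed from the patch itself."""
    def __init__(self, in_dim=512, hidden_dim=128):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        # Attention branch: one scalar weight per patch, squashed by a sigmoid.
        self.attention = nn.Sequential(nn.Linear(hidden_dim, 1), nn.Sigmoid())

    def forward(self, patch_feat):            # patch_feat: (batch, in_dim)
        local = self.encode(patch_feat)        # local representation
        alpha = self.attention(local)          # (batch, 1); low if occluded
        return alpha * local                   # weighed representation

# Toy usage: 24 patch feature vectors, one gate per patch, concatenated.
patch_feats = [torch.randn(4, 512) for _ in range(24)]
gates = nn.ModuleList(PatchGateUnit() for _ in range(24))
fused = torch.cat([g(f) for g, f in zip(gates, patch_feats)], dim=1)  # (4, 24*128)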
Considering different facial regions of interest, we propose
two versions of ACNN: (1) pACNN crops patches of interest
from the last convolution feature maps according to the
positions of the related facial landmarks. Then for each patch,
a Patch-Gated Unit (PG-Unit) is learned to weigh the patch’s
local representation by its unobstructed-ness that is computed
from the patch itself. (2) gACNN integrates local and global
representations concurrently. Besides local weighed features,
a Global-Gated Unit (GG-Unit) is adopted in gACNN to learn
and weigh the global representation.
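As a hedged illustration of how the gACNN fusion might be wired (the GlobalGateUnit name, layer sizes, feature dimensions, and the 7-way classifier are assumptions for this sketch, not the paper's exact design), the global representation can be passed through an analogous sigmoid gate and concatenated with the weighed local features before classification.

import torch
import torch.nn as nn

class GlobalGateUnit(nn.Module):
    """Sketch of a global gate: encodes the image-level feature and weighs it
    by a learned scalar, mirroring the patch-level gate."""
    def __init__(self, in_dim=2048, hidden_dim=512):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.attention = nn.Sequential(nn.Linear(hidden_dim, 1), nn.Sigmoid())

    def forward(self, global_feat):            # global_feat: (batch, in_dim)
        g = self.encode(global_feat)
        return self.attention(g) * g            # weighed global representation

# Toy usage: concatenate weighed local features (from the patch gates) with
# the weighed global feature, then classify into expression categories.
local_fused = torch.randn(4, 24 * 128)          # placeholder for concatenated patch features
global_feat = torch.randn(4, 2048)              # placeholder image-level CNN feature
joint = torch.cat([local_fused, GlobalGateUnit()(global_feat)], dim=1)
logits = nn.Linear(joint.size(1), 7)(joint)     # 7 expression classes (assumed)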
A preliminary version of this work appeared as [6]. In this
paper, we provide technical details of facial region decompo-
sition, present extended results with more comparisons and on
more datasets, and release a facial expression dataset in the
presence of real occlusions. The contributions of this work are
summarized as follows:
1) We propose a convolutional neural network with atten-
tion mechanism (ACNN) to recognize facial expressions
from partially occluded faces. ACNN can automatically
perceive the occluded regions of the face and focus on
the most informative and un-blocked regions.
2) Visualized results show that the Gate Unit (the crucial part
of ACNN) is effective in perceiving the occluded facial
patches. For pACNN, PG-Unit is capable of learning a
low weight for a blocked region and a high weight for
an unblocked and informative one. With the integration
of PG-Unit and GG-Unit, gACNN gains further improve-
ment on FER performance under occlusions.
3) Experimental results demonstrate the advantages of the
proposed ACNNs over other state-of-the-art methods on
two large in-the-wild facial expression datasets and sev-
eral popular in-the-lab datasets, under the settings with
either partially occluded or non-occluded faces.
4) We collected and labelled a facial expression dataset in
the presence of real occlusions (FED-RO). To the best of
our knowledge, it is the first facial expression dataset in the presence of real occlusions.
II. RELATED WORK
We review the previous work considering two aspects that
are related to ours, i.e., the similar tasks (facial analysis with
occluded faces) and related techniques (attention mechanism).
A. Methods Towards Facial Occlusions
Occlusion is one of the inherent challenges of real-world facial expression recognition and other facial analysis tasks, e.g., face recognition, age estimation, and gender classification.
address facial occlusions can be classified into two categories:
holistic-based or part-based methods.
Holistic-based approaches treat the face as a whole and
do not explicitly divide the face into sub-regions. To address