Occlusion Aware Facial Expression Recognition
Using CNN With Attention Mechanism
Yong Li, Student Member, IEEE, Jiabei Zeng, Member, IEEE, Shiguang Shan, Member, IEEE, and Xilin Chen, Fellow, IEEE
Abstract— Facial expression recognition in the wild is challenging due to various unconstrained conditions. Although existing facial expression classifiers have been almost perfect on analyzing constrained frontal faces, they fail to perform well on partially occluded faces that are common in the wild. In this paper, we propose a convolutional neural network (CNN) with attention mechanism (ACNN) that can perceive the occlusion regions of the face and focus on the most discriminative un-occluded regions. ACNN is an end-to-end learning framework. It combines multiple representations from facial regions of interest (ROIs). Each representation is weighed via a proposed gate unit that computes an adaptive weight from the region itself according to its unobstructedness and importance. Considering different ROIs, we introduce two versions of ACNN: patch-based ACNN (pACNN) and global-local-based ACNN (gACNN). pACNN only pays attention to local facial patches. gACNN integrates local representations at patch level with the global representation at image level. The proposed ACNNs are evaluated on both real and synthetic occlusions, including a self-collected facial expression dataset with real-world occlusions, the two largest in-the-wild facial expression datasets (RAF-DB and AffectNet), and their modifications with synthesized facial occlusions. Experimental results show that ACNNs improve the recognition accuracy on both non-occluded and occluded faces. Visualization results demonstrate that, compared with a CNN without Gate Units, ACNNs are capable of shifting the attention from the occluded patches to other related but unobstructed ones. ACNNs also outperform other state-of-the-art methods on several widely used in-the-lab facial expression datasets under the cross-dataset evaluation protocol.
Index Terms— Facial expression recognition, occlusion, CNN
with attention mechanism, gate unit.
Manuscript received May 28, 2018; revised September 27, 2018 and November 11, 2018; accepted December 5, 2018. Date of publication December 14, 2018; date of current version February 13, 2019. This work was partially supported by the National Key R&D Program of China (grant No. 2017YFB1002802), the Natural Science Foundation of China (grants 61702481 and 61702486), and the External Cooperation Program of CAS (grant GJHZ1843). The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Xiaochun Cao. (Corresponding author: Jiabei Zeng.)
Y. Li and X. Chen are with the Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China, and also with the University of Chinese Academy of Sciences, Beijing 100049, China (e-mail: yong.li@vipl.ict.ac.cn; xlchen@ict.ac.cn).
J. Zeng is with the Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China (e-mail: jiabei.zeng@vipl.ict.ac.cn).
S. Shan is with the Key Laboratory of Intelligent Information Processing, Center for Excellence in Brain Science and Intelligence Technology, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China, and also with the University of Chinese Academy of Sciences, Beijing 100049, China (e-mail: sgshan@ict.ac.cn).
Digital Object Identifier 10.1109/TIP.2018.2886767
I. INTRODUCTION
Facial expression recognition (FER) has received significant interest from computer scientists and psychologists
over recent decades, as it holds promise for an abundance of applications, such as human-computer interaction, affect analysis, and mental health assessment. Although many facial expression recognition systems have been proposed and implemented, the majority of them are built on images captured in controlled environments, such as CK+ [1], MMI [2], Oulu-CASIA [3], and other lab-collected datasets. The controlled faces are frontal and without any occlusion. FER systems that perform perfectly on the lab-collected datasets are likely to perform poorly when recognizing human expressions under natural and uncontrolled conditions. To fill the gap between the FER accuracy on controlled and uncontrolled faces, researchers have made efforts to collect large-scale facial expression datasets in the wild [4], [5]. Despite the usage of data from the wild, facial expression recognition is still challenging due to the existence of partially occluded faces. It is non-trivial to address the occlusion issue because occlusions vary in both the occluders and their positions. Occlusions may be caused by hair, glasses, scarves, breathing masks, hands, arms, food, and other objects that can be placed in front of the face in daily life. These objects may block the eyes, mouth, part of the cheek, or any other part of the face. The variability of occlusions cannot be fully covered by limited amounts of data and will inevitably degrade recognition accuracy.
To address the issue of occlusion, we propose a Convolutional Neural Network with attention mechanism (ACNN), mimicking the way that humans recognize facial expressions. Intuitively, humans recognize facial expressions based on certain patches of the face. When some regions of the face are blocked (e.g., the lower left cheek), humans may judge the expression according to the symmetric part of the face (e.g., the lower right cheek), or other highly related facial regions (e.g., regions around the eyes or mouth). Inspired by this intuition, ACNN automatically perceives the blocked facial patches and pays attention mainly to the unblocked and informative patches. Fig. 1 illustrates the main idea of the proposed method. Each Gate Unit in ACNN learns an adaptive weight according to the region's unobstructed-ness or importance. As can be seen in Fig. 1, the last three visualized patches are blocked by the baby's hand and thus have low unobstructed-ness (α_p). Then, the weighed representations are concatenated and used in the classification part. Thus, ACNN is able to focus on distinctive as well as unobstructed regions of the facial image.
Fig. 1. Illustration of the proposed ACNN for occlusion-aware facial expression recognition. ACNN can be categorized into two versions: pACNN and gACNN. During Part 3, pACNN extracts 24 regions of interest from the intermediate feature maps. Then, as can be seen in the red rectangle, for each patch region, a specific Patch-Gated Unit (PG-Unit) is learnt to weigh the local representations according to the region's "unobstructed-ness" (to what extent the patch is occluded). Then, the weighed representations are concatenated and passed to the classification part. gACNN integrates weighed local representations with the global representation (purple rectangle). The global representation is encoded and weighed via a Global-Gated Unit (GG-Unit).
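To make the gating idea described above concrete, the following is a minimal PyTorch-style sketch rather than the paper's exact architecture: a small encoder (layer sizes and feature dimensions here are illustrative assumptions) maps a patch feature to a local representation, a sigmoid branch estimates a scalar unobstructed-ness weight from that representation, and the patch representation is scaled by the weight before all patches are concatenated for the classification part.

import torch
import torch.nn as nn

class PatchGateUnit(nn.Module):
    """Sketch of a patch-level gate: weighs a patch representation by a
    learned scalar in [0, 1] computed from the patch itself."""
    def __init__(self, in_dim=512, hidden_dim=128):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        # Attention branch: one scalar weight per patch, squashed by a sigmoid.
        self.attention = nn.Sequential(nn.Linear(hidden_dim, 1), nn.Sigmoid())

    def forward(self, patch_feat):            # patch_feat: (batch, in_dim)
        local = self.encode(patch_feat)        # local representation
        alpha = self.attention(local)          # (batch, 1); low if occluded
        return alpha * local                   # weighed representation

# Toy usage: 24 patch feature vectors, one gate per patch, concatenated.
patch_feats = [torch.randn(4, 512) for _ in range(24)]
gates = nn.ModuleList(PatchGateUnit() for _ in range(24))
fused = torch.cat([g(f) for g, f in zip(gates, patch_feats)], dim=1)  # (4, 24*128)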
Considering different facial regions of interest, we propose
two versions of ACNN: (1) pACNN crops patches of interest
from the last convolution feature maps according to the
positions of the related facial landmarks. Then for each patch,
a Patch-Gated Unit (PG-Unit) is learned to weigh the patch’s
local representation by its unobstructed-ness that is computed
from the patch itself. (2) gACNN integrates local and global
representations concurrently. Besides local weighed features,
a Global-Gated Unit (GG-Unit) is adopted in gACNN to learn
and weigh the global representation.
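As a hedged illustration of how the gACNN fusion might be wired (the GlobalGateUnit name, layer sizes, feature dimensions, and the 7-way classifier are assumptions for this sketch, not the paper's exact design), the global representation can be passed through an analogous sigmoid gate and concatenated with the weighed local features before classification.

import torch
import torch.nn as nn

class GlobalGateUnit(nn.Module):
    """Sketch of a global gate: encodes the image-level feature and weighs it
    by a learned scalar, mirroring the patch-level gate."""
    def __init__(self, in_dim=2048, hidden_dim=512):
        super().__init__()
        self.encode = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
        self.attention = nn.Sequential(nn.Linear(hidden_dim, 1), nn.Sigmoid())

    def forward(self, global_feat):            # global_feat: (batch, in_dim)
        g = self.encode(global_feat)
        return self.attention(g) * g            # weighed global representation

# Toy usage: concatenate weighed local features (from the patch gates) with
# the weighed global feature, then classify into expression categories.
local_fused = torch.randn(4, 24 * 128)          # placeholder for concatenated patch features
global_feat = torch.randn(4, 2048)              # placeholder image-level CNN feature
joint = torch.cat([local_fused, GlobalGateUnit()(global_feat)], dim=1)
logits = nn.Linear(joint.size(1), 7)(joint)     # 7 expression classes (assumed)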
A preliminary version of this work appeared as [6]. In this
paper, we provide technical details of facial region decompo-
sition, present extended results with more comparisons and on
more datasets, and release a facial expression dataset in the
presence of real occlusions. The contributions of this work are
summarized as follows:
1) We propose a convolutional neural network with atten-
tion mechanism (ACNN) to recognize facial expressions
from partially occluded faces. ACNN can automatically
perceive the occluded regions of the face and focus on
the most informative and un-blocked regions.
2) Visualized results show that the Gate Unit (the crucial part
of ACNN) is effective in perceiving the occluded facial
patches. For pACNN, PG-Unit is capable of learning a
low weight for a blocked region and a high weight for
an unblocked and informative one. With the integration
of PG-Unit and GG-Unit, gACNN gains further improve-
ment on FER performance under occlusions.
3) Experimental results demonstrate the advantages of the
proposed ACNNs over other state-of-the-art methods on
two large in-the-wild facial expression datasets and sev-
eral popular in-the-lab datasets, under the settings with
either partially occluded or non-occluded faces.
4) We collected and labelled a facial expression dataset in
the presence of real occlusions (FED-RO). To the best of
our knowledge, it is the first facial expression dataset in the presence of real occlusions.
II. RELATED WORK
We review the previous work considering two aspects that
are related to ours, i.e., the similar tasks (facial analysis with
occluded faces) and related techniques (attention mechanism).
A. Methods Towards Facial Occlusions
Occlusion is one of the inherent challenges of real-world facial expression recognition and other facial analysis tasks, e.g., face recognition, age estimation, and gender classification.
address facial occlusions can be classified into two categories:
holistic-based or part-based methods.
Holistic-based approaches treat the face as a whole and
do not explicitly divide the face into sub-regions. To address