Suppressing Uncertainties for Large-Scale Facial Expression Recognition
Kai Wang∗1,2, Xiaojiang Peng∗1, Jianfei Yang3, Shijian Lu3, and Yu Qiao†1
1 ShenZhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences
2 University of Chinese Academy of Sciences, China
3 Nanyang Technological University, Singapore
Abstract
Annotating a high-quality large-scale facial expression
dataset is extremely difficult due to the uncertainties caused
by ambiguous facial expressions, low-quality facial images,
and the subjectiveness of annotators. These uncertainties
pose a key challenge for large-scale Facial Expression
Recognition (FER) in the deep learning era. To address this
problem, this paper proposes a simple yet efficient Self-Cure
Network (SCN), which suppresses the uncertainties efficiently
and prevents deep networks from over-fitting to uncertain
facial images. Specifically, SCN suppresses the uncertainty
from two different aspects: 1) a self-attention mechanism over
each mini-batch that weights each training sample with a
ranking regularization, and 2) a careful relabeling mechanism
that modifies the labels of the samples in the lowest-ranked
group. Experiments on synthetic FER datasets and our collected
WebEmotion dataset validate the effectiveness of our method.
Results on public benchmarks demonstrate that our SCN
outperforms current state-of-the-art methods with 88.14% on
RAF-DB, 60.23% on AffectNet, and 89.35% on FERPlus. The code
will be available at
https://github.com/kaiwang960112/Self-Cure-Network.
1. Introduction
Facial expression is one of the most natural, powerful,
and universal signals for human beings to convey their
emotional states and intentions [7, 38]. Automatically
recognizing facial expressions is also important for helping
computers understand human behavior and interact with people.
In the past decades, researchers have made significant progress
on facial expression recognition (FER) with algorithms and
large-scale datasets, where datasets can be collected in the
laboratory or in the wild, such as CK+ [29], MMI [39],
Oulu-CASIA [47], SFEW/AFEW [10], FERPlus [4], AffectNet
[32], EmotioNet [11], RAF-DB [22], etc.
∗ Equally-contributed first authors (kai.wang, xj.peng@siat.ac.cn)
† Corresponding author (yu.qiao@siat.ac.cn)
Figure 1: Illustration of uncertainties on real-world facial
images from RAF-DB. The samples on the right are extremely
difficult for machines and even for humans, and are better
suppressed in training.
However, large-scale FER datasets collected from the
Internet are extremely difficult to annotate with high
quality due to the uncertainties caused by the subjectiveness
of annotators as well as ambiguous in-the-wild facial
images. As illustrated in Figure 1, the uncertainties increase
from high-quality and evident facial expressions to
low-quality and micro expressions. These uncertainties usually
lead to inconsistent and incorrect labels, which are
hindering the progress of large-scale Facial Expression
Recognition (FER), especially data-driven, deep-learning-based
FER. Generally, training with uncertainties in FER may
lead to the following problems. First, it may result in
over-fitting on the uncertain samples, which may be
mislabeled. Second, it hinders the model from learning
useful facial expression features. Third, a high ratio
of incorrect labels can even prevent the model from
converging in the early stage of optimization.
To address these issues, we propose a simple yet efficient
method, termed Self-Cure Network (SCN), to suppress
the uncertainties for large-scale facial expression
recognition. The SCN consists of three crucial modules: self-
arXiv:2002.10392v2 [cs.CV] 6 Mar 2020