Fake News Detection via NLP is Vulnerable to Adversarial Attacks
Zhixuan Zhou¹,², Huankang Guan¹, Meghana Moorthy Bhat² and Justin Hsu²
¹Hongyi Honor College, Wuhan University, Wuhan, China
²Department of Computer Science, University of Wisconsin-Madison, Madison, USA
{kyriezoe, hkguan}@whu.edu.cn, {mbhat2, justhsu}@cs.wisc.edu
Keywords: Fake News Detection, NLP, Attack, Fact Checking, Outsourced Knowledge Graph
Abstract: News plays a significant role in shaping people's beliefs and opinions. Fake news has long been a problem, but it was not widely exposed to the public until the past election cycle for the 45th President of the United States. While quite a few detection methods have been proposed to combat fake news since 2015, they focus mainly on linguistic aspects of an article without any fact checking. In this paper, we argue that these models have the potential to misclassify fact-tampering fake news as well as under-written real news. Through experiments on Fakebox, a state-of-the-art fake news detector, we show that fact-tampering attacks can be effective. To address these weaknesses, we argue that fact checking should be adopted in conjunction with linguistic characteristics analysis, so as to truly separate fake news from real news. A crowdsourced knowledge graph is proposed as a straw man solution for collecting timely facts about news events.
1 INTRODUCTION
Fake news is an increasingly common feature of to-
day’s political landscape. To help address this issue,
researchers and media experts have proposed fake
news detectors adopting natural language processing
(NLP) to analyze word patterns and statistical corre-
lations of news articles. While these detectors achieve
impressive accuracy on existing examples of manip-
ulated news, the analysis is typically quite shallow—
roughly, models check whether news articles conform
to standard norms and styles used by professional
journalists. This leads to two drawbacks.
First, these models can detect fake news only
when it is under-written, for instance when the
content is totally unrelated to the headline (so-called
"clickbait") or when the article includes words con-
sidered biased or inflammatory. While this
criterion suffices to detect many existing examples of
fake news, more sophisticated rumor disseminators
can craft subtler attacks, for instance taking a
well-written real news article and tampering with the ar-
ticle in a targeted way. By preserving the original
subject matter and relating the content tightly to the
headline without using biased phrases, an adversar-
ial article can easily evade detection. To demon-
strate this kind of attack, we evaluate a state-of-the-art
model called Fakebox. We introduce three classes of
attacks: fact distortion, subject-object exchange and
cause confounding. We generate adversarial versions
of real news from a dataset by McIntire (2018), and
show that Fakebox achieves low accuracy when clas-
sifying these examples.
At the same time, the requirements imposed by current
detectors are often too strict. Real news which is
under-written or discusses certain political and re-
ligious topics is likely to be mistakenly rejected, re-
gardless of its accuracy. This is a particularly seri-
ous problem for open platforms, such as Twitter in
the United States and TouTiao in China, where much
of the news is contributed by users with diverse back-
grounds. To prevent frustrating false positives, plat-
forms still rely heavily on manual work to
separate fake news from real news. We provide ex-
perimental evidence of Fakebox's potential to mis-
classify real news.
Taken together, our experiments highlight vulner-
able aspects of fake news detection methods based
purely on NLP. Without deeper semantic knowledge,
such detectors are easily fooled by fact-tampering at-
tacks and can suffer from a high rate of false pos-
itives, mistakenly rejecting real but under-written
news that is not composed in a journalistic style.
To address these problems, we argue that some form
of fact-based knowledge must be adopted alongside
NLP-based models. What this knowledge is remains
to be seen, but we consider a straw man solution: a
crowdsourced knowledge graph that aggregates infor-