Do explanations make VQA models more predictable to a human?
Arjun Chandrasekaran∗,1, Viraj Prabhu∗,1, Deshraj Yadav∗,1, Prithvijit Chattopadhyay∗,1, Devi Parikh1,2
1Georgia Institute of Technology  2Facebook AI Research
{carjun, virajp, deshraj, prithvijit3, parikh}@gatech.edu
∗Denotes equal contribution.
Abstract
A rich line of research attempts to make deep neural networks more transparent by generating human-interpretable ‘explanations’ of their decision process, especially for interactive tasks like Visual Question Answering (VQA). In this work, we analyze if existing explanations indeed make a VQA model – its responses as well as failures – more predictable to a human. Surprisingly, we find that they do not. On the other hand, we find that human-in-the-loop approaches that treat the model as a black-box do.
1 Introduction
As technology progresses, we are increasingly collaborating with AI agents in interactive scenarios where humans and AI work together as a team, e.g., in AI-assisted diagnosis, autonomous driving, etc. Thus far, AI research has typically focused only on the AI in such an interaction – for it to be more accurate, be more human-like, understand our intentions, beliefs, contexts, and mental states.
In this work, we argue that for human-AI interactions to be more effective, humans must also understand the AI’s beliefs, knowledge, and quirks.
Many recent works generate human-interpretable ‘explanations’ regarding a model’s decisions. These are usually evaluated offline based on whether human judges found them to be ‘good’ or to improve trust in the model. However, their contribution in an interactive setting remains unclear. In this work, we evaluate the role of explanations towards making a model predictable to a human.
Figure 1: We evaluate the extent to which explanation modalities (right) and familiarization with a VQA model help humans predict its behavior – its responses, successes, and failures (left).

We consider an AI trained to perform the multi-modal task of Visual Question Answering (VQA) (Malinowski and Fritz, 2014; Antol et al., 2015), i.e., answering free-form natural language questions about images. VQA is applicable to scenarios where humans actively elicit information from visual data, and naturally lends itself to human-AI interactions. We consider two tasks that demonstrate the degree to which a human understands their AI teammate (whom we call Vicki) – Failure Prediction (FP) and Knowledge Prediction (KP). In FP, we ask subjects on Amazon Mechanical Turk to predict whether Vicki will correctly answer a given question about an image. In KP, subjects predict Vicki’s exact response.
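To make the two tasks concrete, the following is a minimal sketch of how human predictions in FP and KP could be scored; the data layout, field names, and the exact-match criterion for KP are illustrative assumptions, not the paper's released evaluation code.

# Illustrative sketch (not the authors' code): scoring human predictions for
# Failure Prediction (FP) and Knowledge Prediction (KP). Assumed data layout:
# each trial holds the VQA model's ("Vicki's") answer, the ground-truth answer,
# and the human subject's predictions.
from dataclasses import dataclass
from typing import List

@dataclass
class Trial:
    vicki_answer: str    # answer produced by the VQA model
    gt_answer: str       # ground-truth answer for the question
    fp_prediction: bool  # human's guess: "will Vicki answer correctly?"
    kp_prediction: str   # human's guess of Vicki's exact response

def fp_accuracy(trials: List[Trial]) -> float:
    # Fraction of trials where the human correctly predicted success/failure.
    hits = sum(t.fp_prediction == (t.vicki_answer == t.gt_answer) for t in trials)
    return hits / len(trials)

def kp_accuracy(trials: List[Trial]) -> float:
    # Fraction of trials where the human predicted Vicki's exact response
    # (simple normalized exact match, an assumption for illustration).
    hits = sum(t.kp_prediction.strip().lower() == t.vicki_answer.strip().lower()
               for t in trials)
    return hits / len(trials)

if __name__ == "__main__":
    trials = [
        Trial("red", "red", fp_prediction=True, kp_prediction="red"),
        Trial("2", "3", fp_prediction=False, kp_prediction="2"),
    ]
    print(f"FP accuracy: {fp_accuracy(trials):.2f}")  # 1.00 on this toy data
    print(f"KP accuracy: {kp_accuracy(trials):.2f}")  # 1.00 on this toy data

A chance baseline for FP is the accuracy of always guessing the model's more frequent outcome (success or failure), against which the better-than-chance finding below would be measured.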
We aid humans in forming a mental model of Vicki by (1) familiarizing them with its behavior in a ‘training’ phase and (2) exposing them to its internal states via various explanation modalities. We then measure their FP and KP performance.
Our key findings are that (1) humans are indeed capable of predicting successes, failures, and outputs of the VQA model better than chance, (2) explicitly training humans to familiarize themselves with the model improves their performance, and (3) existing explanation modalities do not enhance human performance.
2 Related Work
Explanations in deep neural networks. Several works generate explanations based on inter-