
KBQA: Learning Question Answering over QA Corpora
and Knowledge Bases
Wanyun Cui§, Yanghua Xiao§, Haixun Wang‡, Yangqiu Song¶, Seung-won Hwang♮, Wei Wang§
§Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University
‡Facebook ¶HKUST ♮Yonsei University
wanyuncui1@gmail.com, shawyh@fudan.edu.cn, haixun@gmail.com, yqsong@cse.ust.hk,
seungwonh@yonsei.ac.kr, weiwang1@fudan.edu.cn
ABSTRACT
Question answering (QA) has become a popular way for humans to access billion-scale knowledge bases. Unlike web search, QA over a knowledge base returns accurate and concise results, provided that natural language questions can be understood and mapped precisely to structured queries over the knowledge base. The challenge, however, is that a human can ask one question in many different ways. Previous approaches have natural limits due to their representations: rule-based approaches only understand a small set of “canned” questions, while keyword-based or synonym-based approaches cannot fully understand the questions. In this paper, we design a new kind of question representation: templates, over a billion-scale knowledge base and a million-scale QA corpus. For example, for questions about a city’s population, we learn templates such as What’s the population of $city? and How many people are there in $city?. We learned 27 million templates for 2782 intents. Based on these templates, our QA system KBQA effectively supports binary factoid questions, as well as complex questions that are composed of a series of binary factoid questions. Furthermore, we expand predicates in the RDF knowledge base, which boosts the coverage of the knowledge base by 57 times. Our QA system beats all other state-of-the-art works in both effectiveness and efficiency over QALD benchmarks.
1. INTRODUCTION
Question Answering (QA) has drawn a lot of research interest. A QA system is designed to answer a particular type of questions [5]. One of the most important types of questions is the factoid question (FQ), which asks about objective facts of an entity. A particular type of FQ, known as the binary factoid question (BFQ) [1], asks about a property of an entity. For example, how many people are there in Honolulu? If we can answer BFQs, then we will be able to answer other types of questions, such as 1) ranking questions: which city has the 3rd largest population?; 2) comparison questions: which city has more people, Honolulu or New Jersey?; 3) listing questions: list
cities ordered by population etc. In addition to BFQ
and its variants, we can answer a complex factoid question such
as when was Barack Obama’s wife born? This can
be answered by combining the answers of two BFQs: who’s
Barack Obama’s wife? (Michelle Obama) and when was
Michelle Obama born? (1964). We define a complex fac-
toid question as a question that can be decomposed into a series
of BFQs. In this paper, we focus on BFQs and complex factoid
questions.
QA over a knowledge base has a long history. In recent years, large-scale knowledge bases have become available, including Google’s Knowledge Graph, Freebase [3], YAGO2 [16], etc., which greatly increases the importance and the commercial value of a QA system. Most such knowledge bases adopt RDF as their data format, and they contain millions or billions of SPO triples (S, P, and O denote subject, predicate, and object respectively).
Figure 1: A toy RDF knowledge base (here, “dob” and “pob” stand for “date of birth” and “place of birth” respectively). Note that the “spouse of” intent is represented by multiple edges: name - marriage - person - name.
1.1 Challenges
Given a question against a knowledge base, we face two challenges: in which representation should we understand the questions (representation design), and how should we map the representations to structured queries against the knowledge base (semantic matching)?
• Representation Design: Questions describe thousands of intents, and one intent has thousands of question templates. For example, both (a) and (b) in Table 1 ask about the population of Honolulu, although they are expressed in quite different ways. The QA system needs different representations for different questions. Such representations must be able to (1) identify questions with the same semantics, and (2) distinguish different question intents. In the QA corpora we use, we find 27M question templates over 2782 question intents, so it is a big challenge to design representations that handle this (see the sketch below).