A Relational Tucker Decomposition for Multi-Relational Link Prediction
Yanjie Wang, Samuel Broscheit, Rainer Gemulla
University of Mannheim, Germany
Correspondence to: <ywang,rgemulla@uni-mannheim.de>, <broscheit@informatik.uni-mannheim.de>
Abstract
We propose the Relational Tucker3 (RT) decom-
position for multi-relational link prediction in
knowledge graphs. We show that many existing
knowledge graph embedding models are special
cases of the RT decomposition with certain pre-
defined sparsity patterns in its components. In
contrast to these prior models, RT decouples the
sizes of entity and relation embeddings, allows
parameter sharing across relations, and does not
make use of a predefined sparsity pattern. We use
the RT decomposition as a tool to explore whether
it is possible and beneficial to automatically learn
sparsity patterns, and whether dense models can
outperform sparse models (using the same number
of parameters). Our experiments indicate that, depending on
the dataset, both questions can be answered affirmatively.
1. Introduction
Knowledge graphs (KG) (Lehmann et al., 2015; Rebele
et al., 2016) represent facts as subject-relation-object triples,
e.g., (London, capital of, UK). KG embedding (KGE) mod-
els embed each entity and each relation of a given KG into
a latent semantic space such that important structure of the
KG is retained. A large number of KGE models have been
proposed in the literature; applications include question an-
swering (Abujabal et al., 2018b;a), semantic search (Bast
et al., 2016), and recommendation (Zhang et al., 2016; Wang
et al., 2018a).
Many of the available KGE models can be expressed as
bilinear models, on which we focus throughout. Examples
include RESCAL (Nickel et al., 2011), DistMult (Yang et al.,
2015), ComplEx (Trouillon et al., 2016), Analogy (Liu et al.,
2017a), and CP (Lacroix et al., 2018). KGE models assign
a “score” to each subject-relation-object triple; high-scoring
triples are considered more likely to be true. In bilinear
models, the score is computed using a relation-specific linear
combination of the pairwise interactions of the embeddings
of the subject and the object. The models differ in the kind
of interactions that are considered: RESCAL is dense in that
it considers all pairwise interactions, whereas all of the other
aforementioned models are sparse in that they consider
only a small, hard-coded subset of interactions (and learn
weights only for this subset). As a consequence, these
latter models have fewer parameters. They empirically show
state-of-the-art performance (Liu et al., 2017b; Trouillon
et al., 2016; Lacroix et al., 2018) for multi-relational link
prediction tasks.
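To fix ideas, consider a triple with subject embedding $e_i \in \mathbb{R}^d$, object embedding $e_j \in \mathbb{R}^d$, and a relation-specific mixing matrix $R_k \in \mathbb{R}^{d \times d}$; this notation is illustrative, and the formal definitions follow in later sections. The bilinear score then takes the form
\[
s(i,k,j) \;=\; e_i^\top R_k\, e_j \;=\; \sum_{a=1}^{d} \sum_{b=1}^{d} [R_k]_{ab}\,[e_i]_a\,[e_j]_b .
\]
RESCAL learns a full $R_k$ with all $d^2$ interaction weights, whereas DistMult, for example, restricts $R_k$ to be diagonal so that only the $d$ interactions $[e_i]_a [e_j]_a$ carry learned weights; ComplEx, Analogy, and CP correspond to other hard-coded patterns.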
In this paper, we propose the Relational Tucker3 (RT) de-
composition, which tailors the standard Tucker3 decom-
position (Tucker, 1966) to the relational domain. The RT
decomposition is inspired by RESCAL, which specialized
the Tucker2 decomposition in a similar way. We use the RT
decomposition as a tool to explore (1) whether we can
automatically learn which interactions should be considered
instead of using hard-coded sparsity patterns, (2) whether
and when this is beneficial, and finally (3) whether sparsity
is indeed necessary to learn good representations.
In a nutshell, RT decomposes the KG into an entity embed-
ding matrix, a relation embedding matrix, and a core tensor.
We show that all existing bilinear models are special cases
of RT under different viewpoints: the fixed core tensor view
and the constrained core tensor view. In both cases, the
differences between different bilinear models are reflected
in different (fixed a priori) sparsity patterns of the associated
core tensor. In contrast to bilinear models, RT offers a natu-
ral way to decouple entity and relation embedding sizes and
allows parameter sharing across relations. These properties
allow us to learn state-of-the-art dense representations for
KGs. Moreover, to study the questions raised above, we
propose and explore a sparse RT decomposition, in which
the core tensor is encouraged to be sparse, but without using
a predefined sparsity pattern.
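As a rough sketch of this construction (assuming $N$ entities, $K$ relations, entity embedding size $d_e$, relation embedding size $d_r$, and a core tensor $\mathcal{G} \in \mathbb{R}^{d_e \times d_e \times d_r}$; the precise formulation is given later), the mixing matrix of relation $k$ is assembled from the shared core slices weighted by the relation embedding $r_k$:
\[
M_k \;=\; \sum_{c=1}^{d_r} [r_k]_c\, \mathcal{G}_{:,:,c},
\qquad
s(i,k,j) \;=\; e_i^\top M_k\, e_j .
\]
Because $d_e$ and $d_r$ appear in different modes of the core tensor, entity and relation embedding sizes can be chosen independently, and all relations share the same core slices; fixing or constraining $\mathcal{G}$ recovers the bilinear models discussed above.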
We conducted an experimental study on common bench-
mark datasets to gain insight into the dense and sparse RT
decompositions and to compare them with state-of-the-art
models. Our results indicate that dense RT models can
outperform state-of-the-art sparse models (when using the
same number of parameters), and that it is possible and
sometimes beneficial to learn sparsity patterns via a sparse