A Relational Tucker Decomposition for Multi-Relational Link Prediction
Yanjie Wang, Samuel Broscheit, Rainer Gemulla
University of Mannheim, Germany
Correspondence to: <ywang,rgemulla@uni-mannheim.de>, <broscheit@informatik.uni-mannheim.de>
Abstract
We propose the Relational Tucker3 (RT) decom-
position for multi-relational link prediction in
knowledge graphs. We show that many existing
knowledge graph embedding models are special
cases of the RT decomposition with certain pre-
defined sparsity patterns in its components. In
contrast to these prior models, RT decouples the
sizes of entity and relation embeddings, allows
parameter sharing across relations, and does not
make use of a predefined sparsity pattern. We use
the RT decomposition as a tool to explore whether
it is possible and beneficial to automatically learn
sparsity patterns, and whether dense models can
outperform sparse models (using the same number
of parameters). Our experiments indicate that, depending on
the dataset, both questions can be answered affirmatively.
1. Introduction
Knowledge graphs (KG) (Lehmann et al., 2015; Rebele
et al., 2016) represent facts as subject-relation-object triples,
e.g., (London, capital of, UK). KG embedding (KGE) mod-
els embed each entity and each relation of a given KG into
a latent semantic space such that important structure of the
KG is retained. A large number of KGE models have been
proposed in the literature; applications include question an-
swering (Abujabal et al., 2018b;a), semantic search (Bast
et al., 2016), and recommendation (Zhang et al., 2016; Wang
et al., 2018a).
Many of the available KGE models can be expressed as
bilinear models, on which we focus throughout. Examples
include RESCAL (Nickel et al., 2011), DistMult (Yang et al.,
2015), ComplEx (Trouillon et al., 2016), Analogy (Liu et al.,
2017a), and CP (Lacroix et al., 2018). KGE models assign
a “score” to each subject-relation-object triple; high-scoring
triples are considered more likely to be true. In bilinear
models, the score is computed using a relation-specific linear
combination of the pairwise interactions of the embeddings
of the subject and the object. The models differ in the kind
of interactions that are considered: RESCAL is dense in that
it considers all pairwise interactions, whereas all of the other
aforementioned models are sparse in that they consider
only a small, hard-coded subset of interactions (and learn
weights only for this subset). As a consequence, these
latter models have fewer parameters. They empirically show
state-of-the-art performance (Liu et al., 2017b; Trouillon
et al., 2016; Lacroix et al., 2018) for multi-relational link
prediction tasks.
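To fix ideas, consider a triple with subject embedding $e_i \in \mathbb{R}^d$, object embedding $e_j \in \mathbb{R}^d$, and a relation-specific mixing matrix $R_k \in \mathbb{R}^{d \times d}$; this notation is illustrative, and the formal definitions follow in later sections. The bilinear score then takes the form
\[
s(i,k,j) \;=\; e_i^\top R_k\, e_j \;=\; \sum_{a=1}^{d} \sum_{b=1}^{d} [R_k]_{ab}\,[e_i]_a\,[e_j]_b .
\]
RESCAL learns a full $R_k$ with all $d^2$ interaction weights, whereas DistMult, for example, restricts $R_k$ to be diagonal so that only the $d$ interactions $[e_i]_a [e_j]_a$ carry learned weights; ComplEx, Analogy, and CP correspond to other hard-coded patterns.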
In this paper, we propose the Relational Tucker3 (RT) de-
composition, which tailors the standard Tucker3 decom-
position (Tucker, 1966) to the relational domain. The RT
decomposition is inspired by RESCAL, which specialized
the Tucker2 decomposition in a similar way. We use the RT
decomposition as a tool to explore (1) whether we can
automatically learn which interactions should be considered
instead of using hard-coded sparsity patterns, (2) whether
and when this is beneficial, and finally (3) whether sparsity
is indeed necessary to learn good representations.
In a nutshell, RT decomposes the KG into an entity embed-
ding matrix, a relation embedding matrix, and a core tensor.
We show that all existing bilinear models are special cases
of RT under different viewpoints: the fixed core tensor view
and the constrained core tensor view. In both cases, the
differences between different bilinear models are reflected
in different (fixed a priori) sparsity patterns of the associated
core tensor. In contrast to bilinear models, RT offers a natu-
ral way to decouple entity and relation embedding sizes and
allows parameter sharing across relations. These properties
allow us to learn state-of-the-art dense representations for
KGs. Moreover, to study the questions raised above, we
propose and explore a sparse RT decomposition, in which
the core tensor is encouraged to be sparse, but without using
a predefined sparsity pattern.
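As a rough sketch of this construction (assuming $N$ entities, $K$ relations, entity embedding size $d_e$, relation embedding size $d_r$, and a core tensor $\mathcal{G} \in \mathbb{R}^{d_e \times d_e \times d_r}$; the precise formulation is given later), the mixing matrix of relation $k$ is assembled from the shared core slices weighted by the relation embedding $r_k$:
\[
M_k \;=\; \sum_{c=1}^{d_r} [r_k]_c\, \mathcal{G}_{:,:,c},
\qquad
s(i,k,j) \;=\; e_i^\top M_k\, e_j .
\]
Because $d_e$ and $d_r$ appear in different modes of the core tensor, entity and relation embedding sizes can be chosen independently, and all relations share the same core slices; fixing or constraining $\mathcal{G}$ recovers the bilinear models discussed above.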
We conducted an experimental study on common bench-
mark datasets to gain insight into the dense and sparse RT
decompositions and to compare them with state-of-the-art
models. Our results indicate that dense RT models can
outperform state-of-the-art sparse models (when using the
same number of parameters), and that it is possible and
sometimes beneficial to learn sparsity patterns via a sparse