Breaking the Limits of Message Passing Graph Neural Networks
Muhammet Balcilar 1,2 · Pierre Héroux 1 · Benoit Gaüzère 3 · Pascal Vasseur 1,4 · Sébastien Adam 1 · Paul Honeine 1

1 LITIS Lab, University of Rouen Normandy, France. 2 InterDigital, France. 3 LITIS Lab, INSA Rouen Normandy, France. 4 MIS Lab, Université de Picardie Jules Verne, France. Correspondence to: Muhammet Balcilar <muhammetbalcilar@gmail.com>.

Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021. Copyright 2021 by the author(s).
Abstract
Since Message Passing (Graph) Neural Networks (MPNNs) have linear complexity with respect to the number of nodes when applied to sparse graphs, they have been widely implemented and still raise a lot of interest even though their theoretical expressive power is limited to the first-order Weisfeiler-Lehman test (1-WL). In this paper, we show that if the graph convolution supports are designed in the spectral domain by a non-linear custom function of eigenvalues and masked with an arbitrarily large receptive field, the resulting MPNN is theoretically more powerful than the 1-WL test and experimentally as powerful as existing 3-WL models, while remaining spatially localized. Moreover, by designing custom filter functions, outputs can have various frequency components that allow the convolution process to learn different relationships between a given input graph signal and its associated properties. So far, the best 3-WL equivalent graph neural networks have a computational complexity in $O(n^3)$ with memory usage in $O(n^2)$, consider non-local update mechanisms, and do not provide the spectral richness of the output profile. The proposed method overcomes all these aforementioned problems and reaches state-of-the-art results in many downstream tasks.
1. Introduction
In the past few years, finding the best inductive bias for
relational data represented as graphs has gained a lot of
interest in the machine learning community. Node-based
message passing mechanisms relying on the graph structure
have given rise to the first generation of Graph Neural Net-
works (GNNs) called Message Passing Neural Networks
(MPNNs) (Gilmer et al., 2017). These algorithms spread each node's features to the neighboring nodes using trainable weights. These weights can be shared with respect
to the distance between nodes (Chebnet GNN) (Defferrard
et al., 2016), to the connected nodes' features (GAT, for graph attention network) (Veličković et al., 2018) and/or to edge
features (Bresson & Laurent, 2018). When considering
sparse graphs, the memory and computational complexities of such approaches are linear with respect to the number of
nodes. As a consequence, these algorithms are feasible for
large sparse graphs and thus have been applied with success
on many downstream tasks (Dwivedi et al., 2020).
Despite these successes and these interesting computational properties, it has been shown that MPNNs are not powerful enough (Xu et al., 2019). Given two non-isomorphic graphs that are not distinguishable by the first-order Weisfeiler-Lehman test (known as the 1-WL test), even the most powerful existing MPNNs embed them to the same point. Thus, from the point of view of theoretical expressive power, these algorithms are not more powerful than the 1-WL test. Beyond the graph isomorphism issue, it has also been shown that many other combinatorial problems on graphs cannot be solved by MPNNs (Sato et al., 2019).
In (Maron et al., 2019b; Keriven & Peyré, 2019), it has been proven that higher-order relations are required in order to reach universal approximation. In this context, some powerful models that are equivalent to the 3-WL test were proposed. For instance, (Maron et al., 2019a) proposed the PPGN model (Provably Powerful Graph Network), which mimics the second-order Folklore WL test (2-FWL), equivalent to the 3-WL test. In (Morris et al., 2019), message passing between first-, second- and third-order node tuples is applied hierarchically, thus reaching 3-WL expressive power. However, using such relations makes both the memory usage and the computational complexity grow exponentially. Thus, universal approximation models are not feasible in practice.
In order to increase the theoretical expressive power of MPNNs while keeping the linear complexity mentioned above, some researchers proposed to partly randomize node features (Abboud et al., 2020; Sato et al., 2020) or to add unique labels (Murphy et al., 2019) in order to be able to distinguish two non-isomorphic graphs that are not distinguished by the 1-WL test. These solutions need massive amounts of training samples and suffer from slow convergence.
(Bouritsas et al., 2020; Dasoulas et al., 2020) proposed to use a preprocessing step to extract features that cannot be extracted by MPNNs, thus improving the expressive power of their GNNs. However, these handcrafted features require domain expertise and a feature selection process among an infinite number of possibilities.
All these studies target theoretically more powerful models, closer to universal approximation. However, this does not always induce a better generalization ability. Since most realistic problems come with many node/edge features (which can be either continuous or discrete), there is almost no pair of graphs in practice that is not distinguishable by the 1-WL test. In addition, theoretically more powerful methods use non-local updates, breaking one of the most important inductive biases in Euclidean learning, the locality principle (Battaglia et al., 2018). This may explain why theoretically powerful methods cannot outperform MPNNs on many downstream tasks, as reported in (Dwivedi et al., 2020). On the other hand, 1-WL equivalent GNNs are clearly not expressive enough, since they are not able to count some simple structural features such as cycles or triangles (Arvind et al., 2020; Chen et al., 2020; Bouritsas et al., 2020; Vignac et al., 2020), which are informative for some social or chemical graphs. Finally, another important aspect, raised by a recent paper (Balcilar et al., 2021), concerns the spectral ability of GNN models: it is shown that a vast majority of MPNNs actually work as low-pass filters, thus reducing their expressive power.
In this paper, we propose to design graph convolutions in the spectral domain with custom non-linear functions of eigenvalues, masking the convolution support to a desired receptive field length. In this way, we obtain (i) a spatially local update process, (ii) linear memory and computational complexities (except for the eigendecomposition in the preprocessing step), (iii) sufficient spectral ability, and (iv) a model that is theoretically more powerful than the 1-WL test and experimentally as powerful as PPGN. Experiments show that the proposed model can distinguish pairs of graphs that cannot be distinguished by 1-WL equivalent MPNNs. It is also able to count some substructures that 1-WL equivalent MPNNs cannot. Its spectral ability enables it to produce various kinds of spectral components in the output, while the vast majority of GNNs, including higher-order WL equivalent models, do not. Finally, thanks to sparse matrix multiplication, it has linear time complexity, except for the eigendecomposition in the preprocessing step.
The paper is structured as follows. In Section 2, we set the notations and the general framework used in the following. Section 3 is dedicated to the characterization of the WL test, which is the backbone of our theoretical analysis. It is followed in Section 4 by our findings on the expressive power of MPNNs, and in Section 5 by our solutions to improve it. The experimental results and the conclusion are the last two sections of this paper.
2. Generalization of Spectral and Spatial MPNN
Let $G$ be a graph with $n$ nodes and an arbitrary number of edges. Connectivity is given by the adjacency matrix $A \in \{0,1\}^{n \times n}$ and features are defined on nodes by $X \in \mathbb{R}^{n \times f_0}$, with $f_0$ the length of the feature vectors. For any matrix $X$, we use $X_i$, $X_{:j}$ and $X_{i,j}$ to refer to its $i$-th column vector, $j$-th row vector and $(i,j)$-th entry, respectively. A graph Laplacian is given by $L = D - A$ (or $L = I - D^{-1/2} A D^{-1/2}$), where $D \in \mathbb{R}^{n \times n}$ is the diagonal degree matrix and $I$ is the identity. Through an eigendecomposition, $L$ can be written as $L = U \mathrm{diag}(\lambda) U^\top$, where each column of $U \in \mathbb{R}^{n \times n}$ is an eigenvector of $L$, $\lambda \in \mathbb{R}^n$ gathers the eigenvalues of $L$, and $\mathrm{diag}(\cdot)$ creates a diagonal matrix whose diagonal elements are taken from the given vector. We use superscripts to refer to vectors or matrices evolving through iterations or layers. For instance, $H^{(l)} \in \mathbb{R}^{n \times f_l}$ refers to the node representation at layer $l$, whose feature dimension is $f_l$.
GNN models rely on a set of layers where each layer takes the node representation of the previous layer $H^{(l-1)}$ as input and produces a new representation $H^{(l)}$, with $H^{(0)} = X$. According to the domain considered to design the layer computations, GNNs are generally classified as either spectral or spatial (Wu et al., 2019; Chami et al., 2020).
Spectral GNNs rely on spectral graph theory (Chung, 1997). In this framework, signals on graphs are filtered using the eigendecomposition of the graph Laplacian (Shuman et al., 2013). By transposing the convolution theorem to graphs, spectral filtering in the frequency domain can be defined by $x_{\mathrm{flt}} = U \mathrm{diag}(\Omega(\lambda)) U^\top x$, where $\Omega(\cdot)$ is the desired filter function, which needs to be learnt by backpropagation. On the other hand, spatial GNNs, such as GCN (graph convolutional network) (Kipf & Welling, 2017) and GraphSage (Hamilton et al., 2017), consider two operators: one that aggregates the messages of connected nodes and one that updates the concerned node representation.
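To make the spectral filtering operation concrete, here is a minimal numpy sketch of $x_{\mathrm{flt}} = U \mathrm{diag}(\Omega(\lambda)) U^\top x$; the 4-cycle graph, the random signal and the exponential low-pass filter are illustrative assumptions, not choices made in the paper.

```python
import numpy as np

# Toy graph: a 4-cycle, given by its adjacency matrix.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
L = np.diag(A.sum(axis=1)) - A        # unnormalized Laplacian L = D - A

lam, U = np.linalg.eigh(L)            # eigendecomposition L = U diag(lam) U^T

def omega(lam):                       # an illustrative low-pass filter
    return np.exp(-2.0 * lam)

x = np.random.randn(4)                # a signal on the nodes
x_flt = U @ np.diag(omega(lam)) @ U.T @ x   # filtering in the frequency domain
```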
In a recent paper (Balcilar et al., 2021), it was explicitly shown that both spatial and spectral GNNs are MPNNs, taking the general form

$$H^{(l+1)} = \sigma\Big( \sum_s C^{(s)} H^{(l)} W^{(l,s)} \Big), \qquad (1)$$

where $C^{(s)} \in \mathbb{R}^{n \times n}$ is the $s$-th convolution support that defines how the node features are propagated to the neighboring nodes, and $W^{(l,s)} \in \mathbb{R}^{f_l \times f_{l+1}}$ is the trainable matrix for the $l$-th layer and $s$-th support. Within this generalization, GNNs differ from each other by the design of the convolution supports $C^{(s)}$. If the supports are designed in the
spectral domain by $\Phi_s(\lambda)$, the convolution support needs to be written as $C^{(s)} = U \mathrm{diag}(\Phi_s(\lambda)) U^\top$.
One can see that as long as the $C^{(s)}$ matrices are sparse (the number of edges is bounded by some constant multiple of the number of nodes), the MPNN in Eq. (1) has linear memory and computational complexities with respect to the number of nodes. Indeed, the number of valid entries of $C^{(s)}$ that need to be kept is linear in the number of nodes, and thanks to sparse matrix multiplication, $C^{(s)} H^{(l)}$ takes linear time with respect to the number of edges, and thus to the number of nodes as well.
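As a sketch of Eq. (1) and of this complexity argument, the layer below applies a list of sparse supports with scipy; the particular supports ($C^{(1)} = I$, $C^{(2)} = A$), the dimensions and the random data are illustrative assumptions.

```python
import numpy as np
import scipy.sparse as sp

def mpnn_layer(supports, H, weights, sigma=np.tanh):
    """Eq. (1): H^{(l+1)} = sigma(sum_s C^{(s)} H^{(l)} W^{(l,s)})."""
    Z = sum(C @ H @ W for C, W in zip(supports, weights))
    return sigma(Z)

n, f0, f1 = 100, 8, 16
A = sp.random(n, n, density=0.05, format="csr")
A = ((A + A.T) > 0).astype(float)              # symmetric 0/1 adjacency
supports = [sp.identity(n, format="csr"), A]   # C^{(1)} = I, C^{(2)} = A
H = np.random.randn(n, f0)
weights = [np.random.randn(f0, f1) for _ in supports]
H_next = mpnn_layer(supports, H, weights)      # sparse C @ H: O(#edges * f0)
```

Since only the nonzero entries of each $C^{(s)}$ participate in the product, memory and time stay linear in the number of edges.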
3. Characterization of Weisfeiler-Lehman
The universality of a GNN is based on its ability to embed two non-isomorphic graphs to distinct points in the target feature space. A model that can distinguish all pairs of non-isomorphic graphs is a universal approximator. Since the graph isomorphism problem is NP-intermediate (Takapoui & Boyd, 2016), the Weisfeiler-Lehman test (abbreviated WL-test), which provides a necessary but not sufficient condition for graph isomorphism, is frequently used for characterizing GNN expressive power. The classical vertex coloring WL test can be extended by taking higher-order node tuples into account within the iterative process. These extensions are denoted as $k$-WL tests, where $k$ equals the order of the tuple. These tests are described in Appendix A.
It is shown in (Arvind et al., 2020) that for $k \geq 2$, $(k+1)$-WL $>$ $k$-WL, i.e., a higher order of tuple leads to a better ability to distinguish two non-isomorphic graphs. For $k = 1$, this statement is not true, and 2-WL is not more powerful than 1-WL (Maron et al., 2019a). To clarify this point, the Folklore WL (FWL) test has been defined such that 1-WL $=$ 1-FWL, but for $k \geq 2$, we have $(k+1)$-WL $\equiv$ $k$-FWL (Maron et al., 2019a).
In the literature, some confusion occurs between the two versions. Some papers use the WL test order (Morris et al., 2019; Maron et al., 2019a), while others use the FWL order under the name of WL, as in (Abboud et al., 2020; Arvind et al., 2020; Takapoui & Boyd, 2016). In this paper, we explicitly mention both the WL and FWL equivalents.
In order to better understand the capability of WL tests, some papers attempt to characterize these tests using a first-order logic (Immerman & Lander, 1990; Barceló et al., 2019). Consider two unlabeled and undirected graphs represented by their adjacency matrices $A_G$ and $A_H$. These two graphs are said to be $k$-WL (or $k$-FWL) equivalent, denoted $A_G \equiv_{k\text{-WL}} A_H$, if they are indistinguishable by the $k$-WL (or $k$-FWL) test.
Recently, (Brijder et al., 2019; Geerts, 2020) proposed a new matrix language called MATLANG. This language includes different operations on matrices and makes explicit connections between specific dictionaries of operations and the 1-WL and 3-WL tests. The expressive power varies with the operations included in each dictionary.
Definition 1. $ML(L)$ is a matrix language with an allowed operation set $L = \{op_1, \ldots, op_n\}$, where $op_i \in \{\cdot, +, ^\top, \mathrm{diag}, \mathrm{tr}, \mathbb{1}, \odot, \times, f\}$. The possible operations are matrix multiplication and addition, matrix transpose, vector diagonalization, matrix trace computation, the column vector full of ones, element-wise matrix multiplication, matrix/scalar multiplication, and element-wise custom functions operating on scalars or vectors.
Definition 2. $e(X) \in \mathbb{R}$ is a sentence in $ML(L)$ if it consists of any possible consecutive operations in $L$, operating on a given matrix $X$ and resulting in a scalar value.
As an example, $e(X) = \mathbb{1}^\top X^2 \mathbb{1}$ is a sentence of $ML(L)$ with $L = \{\cdot, ^\top, \mathbb{1}\}$, computing the sum of all entries of $X^2$ for a square matrix $X$. In the following, we are interested in the languages $L_1$, $L_2$ and $L_3$ that have been used for characterizing the WL-test in (Geerts, 2020). These results are given next.
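For instance, the sentence above can be evaluated directly in numpy; the particular matrix is an arbitrary stand-in:

```python
import numpy as np

X = np.array([[0., 1., 1.],
              [1., 0., 0.],
              [1., 0., 0.]])
one = np.ones((X.shape[0], 1))        # the '1' operation: column of ones

e = (one.T @ X @ X @ one).item()      # e(X) = 1^T X^2 1, a scalar sentence
assert np.isclose(e, (X @ X).sum())   # equals the sum of all entries of X^2
```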
Remark 1. Two adjacency matrices are indistinguishable by the 1-WL test if and only if $e(A_G) = e(A_H)$ for all $e \in ML(L_1)$ with $L_1 = \{\cdot, ^\top, \mathbb{1}, \mathrm{diag}\}$. Hence, all possible sentences in $ML(L_1)$ are the same for 1-WL equivalent adjacency matrices. Thus, $A_G \equiv_{1\text{-WL}} A_H \Leftrightarrow A_G \equiv_{ML(L_1)} A_H$. (See Theorem 7.1 in (Geerts, 2020).)
Remark 2. $ML(L_2)$ with $L_2 = \{\cdot, ^\top, \mathbb{1}, \mathrm{diag}, \mathrm{tr}\}$ is strictly more powerful than $ML(L_1)$, i.e., than the 1-WL test, but less powerful than the 3-WL test. (See Theorem 7.2 and Example 7.3 in (Geerts, 2020).)
Remark 3. Two adjacency matrices are indistinguishable by the 3-WL test if and only if they are indistinguishable by any sentence in $ML(L_3)$ with $L_3 = \{\cdot, ^\top, \mathbb{1}, \mathrm{diag}, \mathrm{tr}, \odot\}$. Thus, $A_G \equiv_{3\text{-WL}} A_H \Leftrightarrow A_G \equiv_{ML(L_3)} A_H$. (See Theorem 9.2 in (Geerts, 2020).)
Remark 4. Enriching the operation set to $L^+ = L \cup \{+, \times, f\}$, where $L \in \{L_1, L_2, L_3\}$, does not improve the expressive power of the language. Thus, $A_G \equiv_{ML(L)} A_H \Leftrightarrow A_G \equiv_{ML(L^+)} A_H$. (See Proposition 7.5 in (Geerts, 2020).)
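These remarks suggest a simple necessary-condition check in code: evaluate a few fixed sentences from each dictionary on two adjacency matrices and compare the results. The sketch below, our own illustration rather than anything from (Geerts, 2020), uses a small, non-exhaustive sample of sentences; agreement on them proves nothing, but any single mismatch certifies non-equivalence at the corresponding level.

```python
import numpy as np

def sample_sentences(A, kmax=6):
    """Evaluate a few sentences of ML(L1), ML(L2) and ML(L3) on A."""
    n = A.shape[0]
    one = np.ones((n, 1))
    s1, s2, s3 = [], [], []
    Ak = np.eye(n)
    for _ in range(kmax):
        Ak = Ak @ A
        s1.append((one.T @ Ak @ one).item())        # uses only {., ^T, 1}
        s2.append(np.trace(Ak))                     # tr: available from L2 on
        s3.append((one.T @ (A * Ak) @ one).item())  # element-wise mult: L3
    return np.array(s1), np.array(s2), np.array(s3)

def first_distinguishing_level(A_G, A_H):
    for level, (sG, sH) in enumerate(zip(sample_sentences(A_G),
                                         sample_sentences(A_H)), start=1):
        if not np.allclose(sG, sH):
            return level        # some sentence in ML(L_level) separates them
    return None                 # indistinguishable by this finite sample
```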
4. How Powerful are MPNNs?
This section presents some results about the theoretical expressive power of state-of-the-art MPNNs. These results are derived using the MATLANG language (Geerts, 2020), and more precisely the remarks of the preceding section. Proofs of the theorems are given in Appendix B.

Theorem 1. MPNNs such as GCN, GAT, GraphSage and GIN (defined in Appendix H) cannot go further than operations in $L_1^+$. Thus, they are not more powerful than the 1-WL test.
This result has already been given in (Xu et al., 2019), which proposed GIN (Graph Isomorphism Network) and
showed that it is the unique MPNN that is provably exactly as powerful as the 1-WL test, while the rest of the MPNNs are known to be less powerful than the 1-WL test.
Chebnet is also commonly held to be not more powerful than the 1-WL test. However, the next theorem shows that this only holds if the maximum eigenvalues are the same for both graphs. For a pair of graphs whose maximum eigenvalues are not equal, Chebnet is strictly more powerful than the 1-WL test.
Theorem 2.
Chebnet is more powerful than the 1-WL test
if the Laplacian maximum eigenvalues of the non-regular
graphs to be compared are not the same. Otherwise Chebnet
is not more powerful than 1-WL.
Figure 1. Decalin ($G$) and Bicyclopentyl ($H$) graphs are $ML(L_1)$ and also 1-WL equivalent, but Chebnet can distinguish them.
Figure 1 shows two graphs that are 1-WL equivalent and
are generally used to show how MPNNs fail. However,
their normalized Laplacian’s maximum eigenvalues are not
the same. Thus, Chebnet can project these two graphs to
different points in feature space. Details can be found in
Appendix C.
As stated in the introduction, comparison with the WL-test
is not the only way to characterize the expressive power
of GNNs. Powerful GNNs are also expected to be able to
count relevant substructures in a given graph for specific
problems. The following theorems describe the matrix lan-
guage required to be able to count the graphlets illustrated
in Figure 2, which are called 3-star, triangle, tailed triangle
and 4-cycle.
Figure 2.
Sample of patterns: 3-star, triangle, tailed triangle and
4-cycle graphlets used in our analysis.
Theorem 3. 3-star graphlets can be counted by sentences in $L_1^+$.

Theorem 4. Triangle and 4-cycle graphlets can be counted by sentences in $L_2^+$.

Theorem 5. Tailed triangle graphlets can be counted by sentences in $L_3^+$.
These theorems show that 1-WL equivalent MPNNs can
only count 3-star patterns, while 3-WL equivalent MPNNs
can count all graphlets shown in Figure 2.
(Dehmamy et al., 2019) showed that an MPNN is not able to learn node degrees if it does not have an appropriate convolution support (e.g. $A$). Therefore, to achieve a fair comparison, we assume that node degrees are included as a node feature. Note, however, that the number of 3-star graphlets centered on a node can be directly derived from its degree (see Appendix B.3). Therefore, any graph-agnostic MLP can count the number of 3-star graphlets given the node degrees.
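For reference, all four counts admit standard closed forms in terms of degrees, traces and diagonal extraction, using exactly the kinds of operations named in Theorems 3-5. The sketch below uses textbook identities; it is our addition, not a construction from the paper.

```python
import numpy as np
from math import comb

def graphlet_counts(A):
    """Closed-form subgraph counts from an adjacency matrix A."""
    d = A.sum(axis=1)                       # node degrees
    A3 = A @ A @ A
    tri_per_node = np.diag(A3) / 2          # triangles through each node
    return {
        "3-star":   sum(comb(int(k), 3) for k in d),      # sum_i C(d_i, 3)
        "triangle": np.trace(A3) / 6,                     # tr(A^3) / 6
        "tailed triangle": float(tri_per_node @ (d - 2)),
        "4-cycle":  (np.trace(A3 @ A) - 2 * (d ** 2).sum() + d.sum()) / 8,
    }
```

For example, on the complete graph $K_4$ this returns 4 triangles, 12 tailed triangles and 3 4-cycles, as expected.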
5. MPNN Beyond 1-WL
In this section, we present two new MPNN models. The first one, called GNNML1, is shown to be as powerful as the 1-WL test. The second one, called GNNML3, exploits the theoretical results of (Geerts, 2020) to break the limits of 1-WL and reach 3-WL equivalence experimentally. GNNML1 relies on the node update scheme given by:
$$H^{(l+1)} = \sigma\Big( H^{(l)} W^{(l,1)} + A H^{(l)} W^{(l,2)} + \big(H^{(l)} W^{(l,3)}\big) \odot \big(H^{(l)} W^{(l,4)}\big) \Big) \qquad (2)$$
where the $W^{(l,s)}$ are trainable parameters. Using this model, the new representation of a node consists of a sum of three terms: (i) a linear transformation of the previous-layer representation of the node, (ii) a linear transformation of the sum of the previous-layer representations of its connected nodes, and (iii) the element-wise multiplication of two different linear transformations of the previous-layer representation of the node.
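A minimal numpy sketch of the GNNML1 update of Eq. (2) follows; the ReLU non-linearity, the layer widths and the random initialization are illustrative assumptions.

```python
import numpy as np

def gnnml1_layer(A, H, W1, W2, W3, W4):
    """Eq. (2): node term + neighborhood sum + element-wise product term."""
    Z = H @ W1 + A @ H @ W2 + (H @ W3) * (H @ W4)
    return np.maximum(Z, 0)                    # sigma = ReLU (assumption)

n, f_in, f_out = 6, 4, 8
A = (np.random.rand(n, n) < 0.4).astype(float)
A = np.triu(A, 1); A = A + A.T                 # random undirected adjacency
H = np.random.randn(n, f_in)
W = [np.random.randn(f_in, f_out) for _ in range(4)]
H_next = gnnml1_layer(A, H, *W)
```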
The expressive power of GNNML1 is given by the following theorem. Its proof is given in Appendix B:

Theorem 6. GNNML1 can produce every possible sentence in $ML(L_1)$ for an undirected graph adjacency $A$ with monochromatic edges and nodes. Thus, GNNML1 is exactly as powerful as the 1-WL test.
Hence, this model has the same ability as the 1-WL test to distinguish two non-isomorphic graphs, i.e., the same as GIN. This is explained by the third term in the sum of Eq. (2), since it can produce feature-wise multiplications at each layer. Since the node representation is richer, we also expect it to be more powerful for counting substructures. This assumption is validated by the experiments in Section 6.
To reach models more powerful than 1-WL, the theoretical results (see Remarks 1, 2 and 3 in Section 3) show that a model able to produce outputs beyond the $L_1^+$ language is needed. More precisely, according to Remarks 2 and 3, the trace ($\mathrm{tr}$) and element-wise multiplication ($\odot$) operations are required to go further than 1-WL.
In order to illustrate the impact of the trace operation, one can use the 1-WL equivalent Decalin and Bicyclopentyl graphs in Figure 1. It is easy to show that $\mathrm{tr}(A_G^5) = 0$ but $\mathrm{tr}(A_H^5) = 20$, $\mathrm{tr}(A^5)$ giving the number of 5-length closed walks.
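Both claims about this pair can be verified numerically. The sketch below encodes Decalin (two 6-cycles sharing an edge) and Bicyclopentyl (two 5-cycles joined by a bridge) as edge lists, our own rendering of Figure 1, and evaluates $\mathrm{tr}(A^5)$ together with the maximum eigenvalue of the normalized Laplacian invoked in the Chebnet argument.

```python
import numpy as np

def adjacency(n, edges):
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return A

# Decalin: two 6-cycles sharing the edge (0, 1).
A_G = adjacency(10, [(0, 1), (0, 2), (2, 3), (3, 4), (4, 5), (5, 1),
                     (0, 6), (6, 7), (7, 8), (8, 9), (9, 1)])
# Bicyclopentyl: two 5-cycles joined by the bridge edge (0, 5).
A_H = adjacency(10, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0),
                     (5, 6), (6, 7), (7, 8), (8, 9), (9, 5), (0, 5)])

for name, A in [("Decalin", A_G), ("Bicyclopentyl", A_H)]:
    tr5 = np.trace(np.linalg.matrix_power(A, 5))       # closed 5-walks
    d = A.sum(axis=1)
    L_norm = np.eye(10) - A / np.sqrt(np.outer(d, d))  # normalized Laplacian
    print(name, "tr(A^5) =", tr5,
          "lambda_max =", round(np.linalg.eigvalsh(L_norm).max(), 4))
# Decalin is bipartite, so tr(A^5) = 0; Bicyclopentyl's two 5-cycles
# contribute tr(A^5) = 20, and the two lambda_max values differ.
```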