On Graph Classification Networks, Datasets and Baselines

Enxhell Luzhnica*¹, Ben Day*¹, Pietro Liò¹

*Equal contribution. ¹Department of Computer Science & Technology, University of Cambridge, Cambridge, United Kingdom. Submitted to the ICML 2019 Workshop on Learning and Reasoning with Graph-Structured Data. Copyright 2019 by the author(s).
Abstract

Graph classification receives a great deal of attention from the non-Euclidean machine learning community. Recent advances in graph coarsening have enabled the training of deeper networks and produced new state-of-the-art results in many benchmark tasks. We examine how these architectures train and find that performance is highly sensitive to initialisation and depends strongly on jumping-knowledge structures. We then show that, despite the great complexity of these models, competitive performance is achieved by the simplest of models – structure-blind MLP, single-layer GCN and fixed-weight GCN – and propose these be included as baselines in future.
1. Introduction

Deep learning has produced remarkable results across the full breadth of machine learning research. For the most part this has been achieved through the reapplication of the two main architectures, the CNN and RNN, adapted to two Euclidean cases – omnidirectional (image-like) and unidirectional (series) – respectively. As such there is great interest in extending the general techniques to non-Euclidean cases, and to graph-structured data problems in particular.

These efforts are mostly inspired by the CNN, attempting to find suitable analogs to its core components, the convolutional and pooling operators. Early work set out to develop convolution-like graph operators. The focus has now turned to developing pooling operations, often referred to as coarsening in the context of graphs. Besides static methods (Luzhnica et al., 2019), differentiable pooling frameworks have been developed. DiffPool achieved state-of-the-art (SoTA) performance across many benchmark tasks (Ying et al., 2018); however, it requires a dense representation that is quadratic in memory. The Graph U-Net introduces a sparse method based on pruning nodes (top-k) (Gao & Ji, 2019).
Cangea et al. (2018) apply the method to graph classification by incorporating top-k pools in a GCN model, achieving performance competitive with the SoTA with scalable memory requirements.
In this work we show that, under standard initialisation (Glorot & Bengio, 2010; He et al., 2015), using the GCN and top-k operator together results in vanishing gradients beyond the first layers. In addition, we show that it is possible to attain good performance on smaller benchmark tasks simply using a global-pool (a simple mean or sum over the features of all nodes) followed by an MLP. Furthermore, to achieve results on a par with Graph U-Net in all benchmarks, a single-layer GCN with a jumping-knowledge (JK) connection (Xu et al., 2018) from the input graph, followed by an MLP, is sufficient, whether the weights of the GCN are trained or not.
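To make the simplest of these baselines concrete, below is a minimal PyTorch sketch of a structure-blind global-pool-plus-MLP classifier. It assumes dense per-graph tensors; the class name, layer sizes and choice of mean pooling are our illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GlobalPoolMLP(nn.Module):
    """Structure-blind baseline: global mean pool over nodes, then an MLP."""

    def __init__(self, in_features: int, hidden: int, num_classes: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_features, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, X: torch.Tensor) -> torch.Tensor:
        # X: (N, F) node-feature matrix for a single graph.
        # The adjacency matrix is never used; the model is blind to structure.
        g = X.mean(dim=0)   # global mean pool, giving an (F,) graph descriptor
        return self.mlp(g)  # graph-level class logits
```

A natural reading of the single-layer GCN variant prepends one graph convolution and, via the JK connection, concatenates the raw input features with the convolved features before pooling; the fixed-weight variant simply freezes the convolution's weights at initialisation.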
Considering the implications of these results, we primarily argue for the importance of including strong, simple baselines in evaluation. We also define an initialisation scheme that remedies the vanishing gradient issue by design, though we find that this does not consistently improve performance.
Motivation
This work was motivated by studies of network activations and gradient flow in deeper GNNs with JK structures and top-k pooling. We found that, at initialisation, activations rapidly vanish deeper into the network, and that throughout training the gradients flowed mostly into earlier layers. These findings prompt two questions: firstly, are deeper networks only trainable thanks to JK structures bypassing later layers? And secondly, how important are the later layers to performance anyway?
2. Preliminaries

We use the standard notation: a graph G of N nodes with F features per node is represented by the pair (A, X), with adjacency matrix A ∈ R^{N×N} and node feature matrix X ∈ R^{N×F}.
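As a concrete example of this representation, a triangle graph with N = 3 nodes and F = 1 feature per node could be encoded as follows (the feature values are arbitrary, purely for illustration):

```python
import torch

# Triangle graph: every node connected to the other two (N = 3).
A = torch.tensor([[0., 1., 1.],
                  [1., 0., 1.],
                  [1., 1., 0.]])   # adjacency matrix A, shape (N, N)

# One scalar feature per node (F = 1), arbitrary illustrative values.
X = torch.tensor([[0.5],
                  [1.0],
                  [2.0]])          # node feature matrix X, shape (N, F)
```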
Graph Convolution
ReLU activations and the improved GCN (Gao & Ji, 2019) are used throughout. This differs from the standard GCN in that Â = A + 2I is used, i.e. self-loops have a weight of 2.
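To make the layer explicit, here is a minimal dense PyTorch sketch of this improved GCN. We assume the standard symmetric normalisation D̂^{-1/2} Â D̂^{-1/2} around the weighted self-loops; the class name and structure are ours, not the authors' code.

```python
import torch
import torch.nn as nn

class ImprovedGCNLayer(nn.Module):
    """GCN layer with self-loops of weight 2, i.e. A_hat = A + 2I."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)

    def forward(self, A: torch.Tensor, X: torch.Tensor) -> torch.Tensor:
        # A_hat = A + 2I: add self-loops with weight 2 (Gao & Ji, 2019).
        A_hat = A + 2 * torch.eye(A.size(0), device=A.device)
        # Symmetric normalisation: D_hat^{-1/2} A_hat D_hat^{-1/2}.
        d_inv_sqrt = A_hat.sum(dim=1).pow(-0.5)
        A_norm = d_inv_sqrt.unsqueeze(1) * A_hat * d_inv_sqrt.unsqueeze(0)
        # Propagate, transform, and apply the ReLU used throughout the paper.
        return torch.relu(self.linear(A_norm @ X))
```

Composing one such layer with the global-pool MLP sketched in Section 1 gives, in essence, the single-layer GCN baseline described there.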