Pre-training Graph Neural Networks
Weihua Hu¹*, Bowen Liu²*, Joseph Gomes³, Marinka Zitnik¹,
Percy Liang¹, Vijay S. Pande³, Jure Leskovec¹
¹Department of Computer Science, ²Department of Chemistry, ³Department of Bioengineering
Stanford University
{weihuahu,liubowen,joegomes,pande}@stanford.edu, {marinka,pliang,jure}@cs.stanford.edu
Abstract
Many applications of machine learning in science and medicine, including molecu-
lar property and protein function prediction, can be cast as problems of predicting
some properties of graphs, where having good graph representations is critical.
However, two key challenges in these domains are (1) extreme scarcity of labeled
data due to expensive lab experiments, and (2) needing to extrapolate to test graphs
that are structurally different from those seen during training. In this paper, we
explore pre-training to address both of these challenges. In particular, working with
Graph Neural Networks (GNNs) for representation learning of graphs, we wish to
obtain node representations that (1) capture similarity of nodes’ network neighbor-
hood structure, (2) can be composed to give accurate graph-level representations,
and (3) capture domain-knowledge. To achieve these goals, we propose a series of
methods to pre-train GNNs at both the node-level and the graph-level, using both
unlabeled data and labeled data from related auxiliary supervised tasks. We perform
extensive evaluation on two applications, molecular property and protein function
prediction. We observe that performing only graph-level supervised pre-training
often leads to marginal performance gains or can even worsen the performance com-
pared to non-pre-trained models. On the other hand, effectively combining both
node- and graph-level pre-training techniques significantly improves generalization
to out-of-distribution graphs, consistently outperforming non-pre-trained GNNs
across 8 datasets in molecular property prediction (resp. 40 tasks in protein function
prediction), with an average ROC-AUC improvement of 7.2% (resp. 11.7%).
1 Introduction
Many problems in scientific domains, such as chemistry and biology, can be cast as the prediction of
some property of a graph. For example, in chemistry, predicting chemical properties such as toxicity
of molecules is important to accelerate drug discovery, where molecules are naturally represented
by molecular graphs [9, 23, 13, 21, 52, 56]. In biology, identifying the functionality of proteins is important to find proteins that associate with a certain disease, where proteins are represented by local protein-protein interaction (PPI) graphs [57, 55]. Supervised learning of graphs, especially with Graph Neural Networks (GNNs) [26, 15, 59, 53], has shown promising results in these domains [64, 56, 13, 60].
Despite the promise, there remain two key challenges in applying GNNs to these scientific domains:
(1) the extreme scarcity of labeled data, and (2) out-of-distribution prediction, where the graphs in the
training set can have very different structural properties from those in the test set. First, task-specific
data labeling is a costly and time-consuming procedure typically performed in wet lab environments.
Consequently, conventional GNNs can easily overfit to the small training datasets. Second, many
∗ The first two authors made equal contributions.
Preprint. Under review.
arXiv:1905.12265v1 [cs.LG] 29 May 2019
[Figure 1 schematic: panel (a) arranges the pre-training methods by level, with Masking (domain knowledge) and structure Context Prediction at the node level, supervised and self-supervised graph classification at the graph level, and node embeddings pooled into graph embeddings; panels (b.i)-(b.iii) show embeddings under node-level pre-training only, graph-level pre-training only, and node-level + graph-level pre-training.]
Figure 1: (a) Categorization of the pre-training methods for GNNs. Crucially, our methods, i.e., Context Prediction, Masking, and graph-level supervised pre-training, cover both node-level and graph-level pre-training. (b) Node and graph embeddings obtained by different pre-training strategies. (b.i) When only node-level pre-training is used, nodes of different shapes (semantically different nodes) can be well separated; however, the node embeddings are not composable, and thus the resulting graph embeddings (denoted by their classes, + and −) that are created by pooling embeddings of individual nodes are not separable. (b.ii) With graph-level pre-training only, graph embeddings are well separated; however, the embeddings of individual nodes do not necessarily capture their domain-specific semantics. (b.iii) High-quality node embeddings are such that nodes of different types are well separated, while at the same time the embedding space is also composable. This allows for accurate and robust representations of entire graphs and enables robust transfer of pre-trained models to a variety of downstream tasks.
scientific applications naturally involve out-of-distribution prediction. For example, one may want to
predict chemical properties of newly-synthesized molecules which are often structurally different
from the training molecules, or one may want to predict functionality of proteins from a new species
that has different PPI network structure than previously studied species. Unfortunately, deep learning
models are known to be extremely poor at out-of-distribution prediction [20, 16].
One promising approach to address the above two challenges is to pre-train GNNs using large
amounts of easily-accessible unlabeled data as well as relatively easily-accessible labeled data that
comes from related auxiliary tasks. For example, to perform a variety of downstream molecular
property prediction tasks (e.g., predicting toxicity or enzyme binding), one could use large amounts of
easily-accessible molecule data to pre-train a model to capture chemistry domain knowledge, such as
valency and chemical properties of different functional groups. Afterwards, very little hard-to-obtain
labeled data would be needed to specialize the pre-trained model to the given downstream prediction
task. Beyond its benefit of increasing data efficiency, pre-training could also improve predictive
performance in out-of-distribution samples [16]. Therefore, pre-training could provide an attractive
solution to the above two challenges. However, currently there exists no systematic investigation of
potential strategies for pre-training GNNs and their effectiveness. In fact, as we see in our experiments,
naïve pre-training of GNNs often gives only a marginal increase in generalization performance on
downstream tasks, and sometimes even worsen the performance compared to non-pre-trained models.
In this paper, we examine effective pre-training approaches to graph representation learning using
GNNs. Our key observation is that GNNs obtain a representation of an entire graph by combining
the following two steps [13]: (1) recursively aggregating neighboring information to obtain low-
dimensional node embeddings that capture neighborhood structure, and (2) pooling/composing node
embeddings to obtain a representation of the entire graph. Based on this observation, our goals for
pre-training GNNs are to produce node embeddings that:
1. capture structural similarity of nodes’ network neighborhoods.
2. are composable so that node embeddings can be pooled into an accurate graph-level representation.
3. capture domain-knowledge at the level of individual nodes and entire graphs.
Our approach to achieve these goals, which we briefly summarize below, is categorized in Figure 1 (a).
Importantly, we aim to pre-train GNNs both at the level of individual nodes as well as entire graphs,
which provides composability of embeddings as it builds a bridge between local node embeddings and
the global graph embeddings, as illustrated in Figure 1 (b.iii). This is in contrast to naïve approaches
to pre-train GNNs, i.e., either only applying an (off-the-shelf) unsupervised node representation learning technique, as illustrated in Figure 1 (b.i), or only performing supervised pre-training to predict auxiliary properties of entire graphs, as illustrated in Figure 1 (b.ii).
Context Prediction.
Most of the existing off-the-shelf unsupervised node representation learning methods are designed for node classification [14, 37, 49, 15, 25, 51] and enforce nearby nodes to have similar embeddings. This is not suited for representation learning of an entire graph, where capturing the structural similarity of local neighborhoods is more important [61, 42, 57]. To learn node embeddings that capture local graph structure, we introduce Context Prediction, which is a novel self-supervised node-level pre-training method that applies the distributional hypothesis [44, 31] to the graph domain. In particular, we use node embeddings to predict surrounding graph structure, so nodes that have similar surrounding graph structure will be mapped into similar representations.
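As a concrete illustration (a minimal sketch of the idea, not the authors' exact objective or code), the snippet below trains node embeddings to agree with an embedding of each node's surrounding subgraph through a binary positive/negative-pair loss; the gnn and context_encoder modules and all batch fields are hypothetical placeholders.

import torch
import torch.nn.functional as F

def context_prediction_loss(gnn, context_encoder, batch):
    # Embed every node with the main GNN.
    h = gnn(batch.x, batch.edge_index)                        # [num_nodes, dim]
    # Embed the context subgraph extracted around each node.
    c = context_encoder(batch.ctx_x, batch.ctx_edge_index)    # [num_nodes, dim]
    # Positive pairs: a node with its own context; negatives: a shuffled context.
    pos_logit = (h * c).sum(dim=-1)
    neg_logit = (h * c[torch.randperm(c.size(0))]).sum(dim=-1)
    logits = torch.cat([pos_logit, neg_logit])
    labels = torch.cat([torch.ones_like(pos_logit), torch.zeros_like(neg_logit)])
    # Nodes whose surrounding structure looks alike are pushed toward similar embeddings.
    return F.binary_cross_entropy_with_logits(logits, labels)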
Masking.
To learn node embeddings that capture domain knowledge, we propose a novel self-
supervised node-level pre-training method called Masking. In Masking, we randomly mask input
node/edge attributes and let GNNs predict the masked attributes from the surrounding structure. For
example, in the chemistry application, we can use node embeddings to predict atom types of masked
atoms, as illustrated in Figure 2 (b). This forces the model to capture chemistry domain knowledge,
such as valency and the electronic or steric properties of functional groups [30].
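A minimal sketch of this idea in PyTorch is shown below; the 15% mask rate, the mask token id, and the assumption that the first attribute column holds the atom type are illustrative choices, not the paper's settings.

import torch
import torch.nn.functional as F

def masking_loss(gnn, prediction_head, x, edge_index, mask_rate=0.15, mask_token=0):
    # Randomly choose which nodes to mask.
    masked = torch.rand(x.size(0)) < mask_rate
    target = x[masked, 0].clone().long()         # e.g., the atom-type attribute
    # Corrupt the input by overwriting the masked attribute with a special token.
    x_corrupt = x.clone()
    x_corrupt[masked, 0] = mask_token
    # The GNN must infer the masked attribute from the surrounding structure.
    h = gnn(x_corrupt, edge_index)
    logits = prediction_head(h[masked])           # linear layer over attribute classes
    return F.cross_entropy(logits, target)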
Graph-level Prediction.
To learn composable node embeddings that are useful for downstream
tasks, we can either perform (1) supervised graph-level pre-training on domain-specific auxiliary
tasks, or (2) self-supervised pre-training to predict structural properties of the graphs. Here, to directly
encode domain knowledge into graph embeddings, we take the first approach and combine our novel
Context Prediction and Masking methods with graph-level supervised pre-training. This ensures that
individual node embeddings are easily composed to obtain domain-specific representations of an
entire graph, as illustrated in Figure 1 (b.iii).
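To show how the pieces fit together, here is a rough sketch of the combined recipe: node-level self-supervised pre-training first (using, e.g., the masking_loss sketch above), followed by graph-level supervised pre-training on auxiliary labels, where node embeddings are pooled into a graph embedding. The data loaders, heads, batch fields, and the use of mean pooling are assumptions for illustration only, not the paper's exact training loop.

import torch
import torch.nn.functional as F

def pretrain(gnn, mask_head, aux_head, optimizer, unlabeled_loader, auxiliary_loader):
    # Stage 1: node-level self-supervised pre-training (e.g., attribute Masking).
    for batch in unlabeled_loader:
        loss = masking_loss(gnn, mask_head, batch.x, batch.edge_index)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # Stage 2: graph-level supervised pre-training on auxiliary tasks.
    for batch in auxiliary_loader:
        h = gnn(batch.x, batch.edge_index)                    # node embeddings
        # Compose node embeddings into one graph embedding per graph (mean pooling).
        num_graphs = int(batch.graph_id.max()) + 1
        h_sum = torch.zeros(num_graphs, h.size(1)).index_add_(0, batch.graph_id, h)
        h_graph = h_sum / torch.bincount(batch.graph_id).clamp(min=1).unsqueeze(1)
        loss = F.binary_cross_entropy_with_logits(aux_head(h_graph), batch.aux_y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

After these two stages, the pre-trained GNN would be fine-tuned on the small labeled downstream dataset in the usual supervised way.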
We extensively evaluate the above pre-training methods and their combinations (one from node-level
and another from graph-level) on two scientific applications of graph classification: molecular property
prediction in chemistry, and protein function prediction in biology. First, on many downstream tasks,
performing only graph-level supervised pre-training gives marginal performance gains or sometimes even worsens the generalization performance compared to a non-pre-trained model. This phenomenon is referred to as negative transfer [35, 43], which poses a significant problem when deploying pre-trained models to real-world applications, and has been previously observed for multi-task learning of molecular property prediction tasks [39, 40, 54, 22]. When our node-level self-supervised pre-
]. When our node-level self-supervised pre-
training is combined with the graph-level supervised pre-training, negative transfer is completely
avoided across all the 8 downstream datasets of molecular prediction and all the 40 downstream
tasks of protein function prediction; thus, robustly transferable pre-trained models are achieved.
Furthermore, on these downstream tasks, GNNs pre-trained with such combined strategies achieve
significantly better out-of-distribution generalization performance than GNNs pre-trained with a single
type of (or no) pre-training method. Specifically, on molecular property (resp. protein function)
prediction tasks, our combined pre-training methods give 7.2% (resp. 11.7%) higher average ROC-
AUC scores compared to the non-pre-trained GNNs, 4.1% (resp. 6.1%) higher average ROC-AUC
scores compared to GNNs pre-trained only with graph-level supervised auxiliary tasks, and 3.1%
(resp. 9.8%) higher average ROC-AUC scores compared to GNNs pre-trained only with node-level
self-supervised tasks.
2 Preliminaries on Graph Neural Networks
We begin by formalizing the task of supervised learning of graphs, and review the basic components
of GNNs [13]. Then, we review existing methods for unsupervised representation learning on graphs.
Supervised learning of graphs.
Let $G = (V, E)$ denote a graph with node feature vectors $X_v$ for $v \in V$ and edge feature vectors $e_{uv}$ for $(u, v) \in E$. Given a set of graphs $\{G_1, \ldots, G_N\}$ and their labels $\{y_1, \ldots, y_N\}$, the task of graph supervised learning is to learn a representation vector $h_G$ that helps predict the label of an entire graph, $y_G = g(h_G)$.
Graph Neural Networks (GNNs).
GNNs use the graph structure as well as node features and edge features to learn a representation vector of a node, $h_v$, and of the entire graph, $h_G$. Modern GNNs follow a neighborhood aggregation strategy, where we iteratively update the representation of a node by aggregating representations of its neighboring nodes and edges [13]. After $k$ iterations of aggregation, a node's representation captures the structural information within its $k$-hop network neighborhood.
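To make the neighborhood-aggregation scheme concrete, the sketch below implements one generic message-passing layer in PyTorch. It is an illustrative layer under the assumption that node states and edge features share the same dimension; it is not the specific GNN architecture used in the paper.

import torch
import torch.nn as nn

class NeighborhoodAggregationLayer(nn.Module):
    # One aggregation iteration: each node collects messages from its neighbors
    # (and the connecting edge features), then updates its own representation.
    def __init__(self, dim):
        super().__init__()
        self.message = nn.Linear(2 * dim, dim)   # input: [neighbor state, edge feature]
        self.update = nn.Linear(2 * dim, dim)    # input: [own state, aggregated messages]

    def forward(self, h, edge_index, edge_attr):
        src, dst = edge_index                     # edges run from src to dst
        m = torch.relu(self.message(torch.cat([h[src], edge_attr], dim=-1)))
        agg = torch.zeros_like(h).index_add_(0, dst, m)   # sum messages per node
        return torch.relu(self.update(torch.cat([h, agg], dim=-1)))

Stacking $k$ such layers gives each node a view of its $k$-hop neighborhood, matching the description above.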