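Summary: This paper proposes a new geometry-aware interactive graph neural network (GIANT) for protein-ligand binding affinity prediction. GIANT consists of two parts: a 3D geometric graph learning network (3DG-NET) and a pairwise interactive learning network (PI-NET). 3DG-NET updates embeddings through iterative node-edge interaction while preserving the 3D spatial distance, polar angle, and dihedral angle information among atoms. PI-NET integrates element type-level and molecule-level interaction information through global interaction learning to further improve model performance. Experimental results show that GIANT outperforms existing methods on two benchmark datasets. Intended audience: researchers in computational chemistry, bioinformatics, and deep learning, and scientists working on drug discovery. Use cases and goals: structure-based drug discovery tasks, where it improves the accuracy of protein-ligand binding affinity prediction and accelerates drug screening. Additional notes: the study emphasizes the importance of 3D geometric information for binding affinity prediction and proposes a new way to model these complex interactions. Detailed experiments support the effectiveness and generalization ability of GIANT.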
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 36, NO. 5, MAY 2024 1991
GIANT: Protein-Ligand Binding Affinity
Prediction via Geometry-Aware Interactive
Graph Neural Network
Shuangli Li, Jingbo Zhou, Tong Xu, Liang Huang, Fan Wang, Haoyi Xiong, Senior Member, IEEE,
Weili Huang, Dejing Dou, Senior Member, IEEE, and Hui Xiong, Fellow, IEEE
Abstract—Drug discovery often relies on the successful prediction of protein-ligand binding affinity. Recent advances have shown great promise in applying graph neural networks (GNNs) for better affinity prediction by learning the representations of protein-ligand complexes. However, existing solutions usually treat protein-ligand complexes as topological graph data, thus the 3D geometry-based biomolecular structural information is not fully utilized. The essential intermolecular interactions with long-range dependencies, including type-wise interactions and molecule-wise interactions, are also neglected in GNN models. To this end, we propose a geometry-aware interactive graph neural network (GIANT) which consists of two components: a 3D geometric graph learning network (3DG-NET) and a pairwise interactive learning network (PI-NET). Specifically, 3DG-NET iteratively performs the node-edge interaction process to update embeddings of nodes and edges in a unified framework while preserving the 3D geometric factors among atoms, including spatial distance, polar angle and dihedral angle information in 3D space. Moreover, PI-NET is adopted to incorporate both element type-level and molecule-level interactions. In particular, interactive edges are gathered with a subsequent reconstruction loss to reflect the global type-level interactions. Meanwhile, a pairwise attentive pooling scheme is designed to identify the critical interactive atoms for complex representation learning from a semantic view. An exhaustive experimental study on two benchmarks verifies the superiority of GIANT.
Manuscript received 24 March 2022; revised 26 July 2023; accepted 26
August 2023. Date of publication 13 September 2023; date of current version 5
April 2024. This work was supported in part by OPPO Research Fund and in part
by the National Natural Science Foundation of China under Grant 61960206008.
Recommended for acceptance by Y. Tong. (Corresponding authors: Jingbo
Zhou; Dejing Dou; Hui Xiong.)
Shuangli Li is with the Anhui Province Key Lab of Big Data Analysis
and Application, School of Computer Science and Technology, University of
Science and Technology of China, Hefei, Anhui 230026, China, and also with
the Business Intelligence Lab, Baidu Research, Beijing 100085, China (e-mail:
lsl1997@mail.ustc.edu.cn).
Jingbo Zhou is with the Business Intelligence Lab, Baidu Research, Beijing
100085, China (e-mail: zhoujingbo@outlook.com).
Tong Xu is with the Anhui Province Key Lab of Big Data Analysis and
Application, School of Computer Science, University of Science and Technology
of China, Hefei, Anhui 230026, China (e-mail: tongxu@ustc.edu.cn).
Liang Huang is with the Oregon State University, Corvallis, OR 97331 USA
(e-mail: lianghuang@baidu.com).
Fan Wang and Haoyi Xiong are with the Baidu Inc., Beijing 100085, China
(e-mail: wangfan04@baidu.com; xionghaoyi@baidu.com).
Weili Huang is with the HWL Consulting LLC, LU7 9PU Bedfordshire, U.K.
(e-mail: lwlily99@gmail.com).
Dejing Dou is with the BCG X, Boston, MA 02210 USA (e-mail: doudejing@baidu.com).
Hui Xiong is with the Artificial Intelligence Thrust, Hong Kong University
of Science and Technology (Guangzhou), Guangzhou 529200, China (e-mail:
xionghui@ust.hk).
Digital Object Identifier 10.1109/TKDE.2023.3314502
Index Terms—Binding affinity prediction, graph neural
network, geometry modeling, drug discovery, compound-protein
interaction.
I. INTRODUCTION
The prediction of protein-ligand binding affinity has been
widely considered as one of the most important tasks in
computational drug discovery [1]. Here ligands are usually drug
candidates including small molecules and biologics which can
interact with proteins as agonists or inhibitors in the biological
processes to cure diseases. Given a protein, we are interested in
understanding how well a drug molecule (called a ligand) can
interact with this protein. The strength of interaction between
them can be quantified as a numerical score (called the binding
affinity), which potentially determines whether a ligand can have
an effective influence on the protein (for example, to inactivate a
protein to cure a disease). Therefore, the calculation of binding
affinity is of great significance, and our target is to estimate
this valuable interaction score. Although it can be measured by
experimental methods, those biological tests are laborious and
time-consuming. Thus, data-driven computational approaches
have become increasingly necessary and achieved remarkable
success in various drug discovery applications, including pro-
tein interaction mining [2], molecule generation [3], and drug
reactions prediction [4], which highlight the efficacy of such
methods in tackling complex problems for drug-based data min-
ing and knowledge discovery. With similar data-driven learning
models, binding affinities can be predicted in the early stage of
drug discovery. Instead of applying costly biological methods
directly to screen numerous candidate molecules, the prediction
of binding affinity can help to rank drug candidates and prioritize
the appropriate ones for subsequent testing to accelerate the
process of drug screening [5].
With the development of structural biology and protein structure prediction [6], especially the recent AlphaFold II model [7], a growing amount of three-dimensional (3D) protein structure data is available, which enables a new paradigm for structure-based drug discovery [8], [9], [10]. It has been demonstrated that 3D structural information can effectively contribute to drug design [11].
Indeed, since there are already many accurate and robust algo-
rithms to find poses of protein-ligand complexes (e.g., binding
site prediction methods and docking methods), it is significant to
1041-4347 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: Central South University. Downloaded on October 19,2024 at 08:15:05 UTC from IEEE Xplore. Restrictions apply.
Fig. 1. Brief summary for protein-ligand binding affinity prediction.
(1) Top left: An example of protein-ligand complex (Structure ID in Protein
Data Bank (PDB): 5HMI). (2) Top right: Various complex representations. (3)
Bottom left: Traditional Methods. (4) Bottom right: Machine learning and deep
learning methods.
focus on the much harder task of binding affinity prediction [12].
To learn useful 3D structure from a protein-ligand complex, as
illustrated in Fig. 1, many efforts have been devoted to esti-
mating more accurate binding affinity for effective drug design.
Docking methods [13], [14], [15] play an important role in
predicting how a specific ligand binds to the target protein with
affordable computational costs. While the docking process can
identify the binding pose of the protein-ligand complex with
relatively high accuracy, its prediction of binding affinity is
inaccurate and unreliable due to poor scoring functions [12],
[16], which limits the applicability of docking methods in drug
discovery. Compared to docking calculations, traditional ma-
chine learning methods [12], [17] have improved the perfor-
mance by learning the extracted features from protein-ligand
complexes. However, these approaches with limited generaliz-
ability require expert knowledge and heavily rely on feature
engineering.
Recently, deep learning for binding affinity prediction has
become an emerging research area, which represents the com-
plex as sequence data [18], [19], 3D grid-like data [20] or graph
data [21] to employ various neural networks. One of the key
challenges of deep learning in structural biology is how to model
the 3D spatial structure for better performance. To this end,
most of the existing works [20], [22], [23] attempt to apply
3D convolutional neural networks (3D CNNs) by treating the
complex as a 3D-grid representation. However, the cost of these
models is huge, especially when considering long-range struc-
tural interactions. In addition, both the absence of topological
information and the sensitivity to rotation in the complex have
a negative effect on the prediction results.
Despite the powerful ability of graph neural networks (GNNs)
to learn graph representations [24], there are only a few stud-
ies [21], [25] using GNNs to predict the protein-ligand binding
affinity. By contrast, many researchers have developed GNN
models in other fields of drug discovery [26], [27], such as
predicting molecular property [28], [29], [30], [31], biological
network linking [32], and chemical reactions [33]. Nevertheless,
these domain-specific models tend to lose their effectiveness
when modeling the larger biomolecules, e.g., protein-ligand
complexes. In general, most of the existing GNNs in drug
design aim to learn the spatial structure by incorporating the
distance information, which is insufficient to model the 3D
geometric structure of the complex. Moreover, the fundamental
pairwise interactive information between proteins and ligands,
which is valuable for predicting the binding affinity [34], cannot
be handled under the current GNN framework.
This paper is an extension of our preliminary work [35]. To overcome the above limitations, in this paper, we propose a novel Geometry-aware Interactive Graph Neural Network (GIANT) to learn the constructed complex interaction graph for protein-ligand binding affinity prediction. GIANT is equipped with two components to correspondingly address the challenges, namely the 3D geometric graph learning network (3DG-NET) for modeling the geometric structure in 3D space and the pairwise interaction learning network (PI-NET) for leveraging both element type-level and molecule-level interactions with long-range intermolecular dependencies.
As the first part of GIANT, the key idea of 3DG-NET is illustrated in Fig. 2: it constructs a spherical space for each central target and applies the node-edge interactive scheme iteratively. 3DG-NET preserves the spatial distance, polar angle and dihedral angle information of neighbors when performing the aggregation process, and thus can effectively learn the 3D structure of protein-ligand complexes.
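To make the three geometric factors concrete, the following is a minimal sketch of how a distance, a polar angle, and a dihedral angle could be computed from raw atomic coordinates. It uses the standard textbook definitions, not the exact construction of Fig. 2 (which is not specified in this text); the function names and the choice of a reference neighbor `ref` are assumptions made for illustration only.

```python
import math

def sub(p, q):
    """Vector from q to p."""
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def norm(u):
    return math.sqrt(dot(u, u))

def angle_between(u, v):
    """Unsigned angle in radians; undefined if either vector has zero length."""
    c = dot(u, v) / (norm(u) * norm(v))
    return math.acos(max(-1.0, min(1.0, c)))  # clamp against rounding error

def distance(a, b):
    """Euclidean distance between two atoms."""
    return norm(sub(a, b))

def polar_angle(center, ref, nbr):
    """Angle at `center` between the directions to `ref` and to `nbr`."""
    return angle_between(sub(ref, center), sub(nbr, center))

def dihedral_angle(center, ref, nbr1, nbr2):
    """Unsigned angle between the planes (center, ref, nbr1) and
    (center, ref, nbr2), which share the axis center -> ref.
    Assumes neither neighbor is collinear with that axis."""
    axis = sub(ref, center)
    n1 = cross(axis, sub(nbr1, center))  # normal of the first plane
    n2 = cross(axis, sub(nbr2, center))  # normal of the second plane
    return angle_between(n1, n2)
```

All three quantities depend only on relative positions, so they stay the same when the whole complex is rotated or translated, which is the property that grid-based representations lack.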
PI-NET is designed as the second part of GIANT to incorporate global intermolecular interactions. On the one hand, given the large size of the protein, it is redundant to include the complete protein structure in the graph, so we construct the spatial-based interaction graph from the central key structure of the complex. Without the complete structure, however, the type-level long-range interactive information between the protein and the ligand (including distant solvation effects [36] and electrostatic interactions [34]) cannot be captured through such a graph. To deal with this issue, we employ an atomic type-aware pooling process on edges by introducing an auxiliary learning task that reconstructs the interactive matrix for type-level interaction injection. On the other hand, several important atoms of the complex can affect pairwise interactions and contribute to the binding affinity. Therefore, we finally utilize the molecule-level attentive pooling network to extract the informative biological semantics.
By combining 3DG-NET and PI-NET, which address the problem from two perspectives, our proposed GIANT can enhance the representation learning for complexes by involving both 3D geometric structures and global interactions. To summarize, the main contributions of this paper are as follows:
• To the best of our knowledge, we are among the first to develop graph neural networks from the perspective of comprehensive biochemical representation learning in 3D space for structure-based binding affinity prediction.
• We propose a novel geometry-aware interactive graph neural network (GIANT), which can capture not only 3D geometric information through distance-aware graph attention and angle-oriented graph convolution with a triangular fusion scheme in 3DG-NET, but also global long-range interactions through pairwise interaction learning network (PI-NET) in a semi-supervised manner.
• We conduct extensive experiments using two benchmark datasets to evaluate the performance of the proposed model, which demonstrates the superiority of GIANT compared with state-of-the-art baselines.
Fig. 2. Illustration of complex geometric division with three angle domains (in different colors) in 3D space, where $\theta_i$ represents polar angle and $\varphi_{i,j}$ denotes dihedral angle between two adjacent planes.
Compared with our previous conference paper SIGN [35], the major improvements include: 1) For geometry modeling, we present a new paradigm that sufficiently models the 3D view of protein-ligand complexes and replace the angle-oriented graph attention with a triple-wise dihedral graph aggregation process (TAGG) to enhance the structure learning. We build 3DG-NET for integrated complex modeling with complementary dihedral angle information. In this way, the proposed GIANT can learn the comprehensive 3D geometry instead of the partial geometry in SIGN. 2) For interaction modeling, we devise a novel pairwise interaction learning network, PI-NET, to further facilitate the complex representation learning by adding the molecule-level interaction component, which can capture the interactive correlations of both biological element types and high-level molecules. 3) We significantly extend our experimental evaluation by comparing with our preliminary work [35] and showing additional quantitative results for model effectiveness and parameter analysis. 4) We also provide a case study to analyze the interpretability of our model in understanding the protein-ligand interaction patterns.
II. RELATED WORK
In this section, we first review the related literature on predicting protein-ligand binding affinity and then detail recent advances in graph neural networks for drug discovery.
Protein-Ligand Binding Affinity Prediction: As a crucial stage
in drug discovery, predicting protein-ligand binding affinity has
been intensively studied for a long time [37], [38], which is of
great importance for efficient and accurate drug screening. The
earlier empirical-based methods [14], [39], [40] design docking
and scoring functions specially to make predictions, while expert
domain knowledge is required to encode internal biochemical
interactions. Later on, statistical and machine learning-based
methods [41] are developed to predict binding affinity based
on data-driven learning, which attempt to extract protein-ligand
features and use classic models for regression, such as random
forest [12] and SVM [17]. These approaches depend
on the quality of hand-crafted features and lack generality
on larger datasets. Recently, several deep learning-based
models [18], [19] utilize 1D convolutions and pooling to capture
potential patterns from raw sequence information of both ligands
and proteins. However, only using separate character represen-
tations fails to achieve desirable performance.
Recently, AlphaFold II [7] has made a remarkable achievement in the field of protein structure prediction; it adopts a Transformer-based framework designed for predicting a protein's 3D structure given its amino acid sequence. With the increasing availability of 3D-structure protein-ligand data [42], structure-based approaches have become another active research area, focusing on learning from 3D-structure protein-ligand complexes to predict binding affinity. The problem addressed by AlphaFold II and the binding affinity prediction problem are complementary, and both hold great importance for biological data mining and drug knowledge discovery. Some
recent works [22], [23] represent the protein-ligand complex
as 3D grid-like data and use 3D convolutions (3D-CNNs) to
take advantage of spatially-local correlations. Though these
approaches can learn spatial information, one limitation is that
positions of proteins and ligands in different complexes are
changeable, such as different angle rotations, which means the
spatial structure of 3D grid-like modeling is inevitably incom-
plete. More recently, OnionNet [43] employs CNN models to
learn the complex representation from the extracted element-
specific interaction features between a protein and its ligand.
However, all the above models neglect the critical topological
structure information of complexes. In the work [25], a
protein-ligand complex is represented as a weighted graph with
distance information. Then graph attention networks are applied
to predicting the interactions. Nevertheless, only distance infor-
mation between atoms is not adequate to model 3D-structure
interactions. In this paper, we also focus on the structure-based
prediction of protein-ligand binding affinity with incorporating
abundant spatial information.
Graph Neural Networks for Drug Discovery: Inspired by the
great advantage of graph neural networks (GNNs) in modeling
graph data, more attention has been devoted to applying them
in computational drug discovery [26], such as the prediction of
molecular property [44] and protein interface [45]. Treating the
molecule as a graph, GNNs can learn the graph-level represen-
tation for drug or protein by aggregating structural information.
GraphDTA [21] adopts GNN models [46], [47], [48] to learn
the drug representation and combines it with the protein representation
from 1D convolutions to predict binding affinity. In attributed
TABLE I
MATHEMATICAL NOTATIONS
Fig. 3. Illustrative example of converting the protein-ligand complex into a
complex interaction graph.
molecular graphs, the edges between atoms contain valuable
information, such as distance or bond order. To leverage rich
attributes in the molecule, edge-oriented message passing neural
networks [28], [49], [50] are proposed to update both node and
edge embeddings. Meanwhile, there are also some efforts to
model the 3D structure of molecules by improving GNNs with
spatial information, such as distance [25], [29], angle [30], [51],
and 3D coordinate [52]. However, these models fail to consider
the spatial interactions between proteins and ligands. In addition,
the function of learning angle information in [30] is designed for
density functional theory, which is only beneficial for predicting
molecular properties rather than protein-ligand binding affinity.
Moreover, recently there are long-range interaction learning
GNN models, while they are designed for specific applications
(such as user-item interaction in social recommendation [53],
[54]) and only focus on node-wise interactions [55]. To overcome these limitations, we propose a multi-level interaction-aware GNN framework that integrates both distance and angle factors harmoniously.
III. PRELIMINARIES
In this section, we introduce some definitions used in our
model and formulate the structure-based prediction problem
for protein-ligand binding affinity. The frequently used key
notations in this paper are summarized in Table I.
Definition 1. Complex Interaction Graph: Given a protein-ligand complex as shown in Fig. 3(a), we define the atom node sets of protein and ligand as $V^P = \{a^P_1, \ldots, a^P_m\}$ and $V^L = \{a^L_1, \ldots, a^L_n\}$ with the position matrices $M^P \in \mathbb{R}^{m \times 3}$ and $M^L \in \mathbb{R}^{n \times 3}$ for 3D atomic coordinates, respectively. Then we define the complex interaction graph as a directed graph $G_I = \langle V, E \rangle$, where the vertex set $V$ is a subset of the atom node sets of protein and ligand, i.e., $V \subseteq V^P \cup V^L$, and the unweighted edge set $E = f_e(V^P, V^L, M^P, M^L)$ is constructed based on the spatial positions of atoms in the complex. Specifically, in addition to $V^L$, the protein's atoms from $V^P$ that are close to the ligand are selected and added into $V$. We then update the complex edge set $E$ by adding the edges of atom pairs whose distances are shorter than the cutoff threshold $\theta_d$. The distance between atom nodes is calculated using the Euclidean distance, which is a widely employed distance metric that measures the straight-line spatial distance between two points in three-dimensional space. By applying the Euclidean distance calculation, denoted as $d_{ij} = \sqrt{(M^L_i - M^P_j)^2}$, we can precisely quantify the spatial separation between atom nodes. Formally, the edge set is represented as $E = \{(a_i, a_j) \mid a_i, a_j \in V, \ \text{s.t.} \ d_{ij} \le r_\theta\}$. The detailed construction process is described in Algorithm 1.
Algorithm 1: Graph Construction Process.
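The construction in Definition 1 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' Algorithm 1: the function name `build_complex_graph` and the parameter `pocket_cutoff` (how close a protein atom must be to the ligand to be kept) are hypothetical, since the text only says that protein atoms "close to the ligand" are selected. `edge_cutoff` plays the role of the distance cutoff threshold, and edges are added between all retained atom pairs, following the formal edge set over $V$.

```python
import math

def build_complex_graph(ligand_xyz, protein_xyz, pocket_cutoff, edge_cutoff):
    """Return (nodes, edges) for a complex interaction graph.

    nodes: list of ("L", i) or ("P", j) tags referring to the input atoms.
    edges: list of directed (source, target) pairs indexing into nodes.
    """
    # All ligand atoms are kept.
    nodes = [("L", i) for i in range(len(ligand_xyz))]
    coords = list(ligand_xyz)

    # Protein atoms are kept only if they lie near at least one ligand atom
    # (pocket_cutoff is an assumed parameter, not taken from the paper).
    for j, p in enumerate(protein_xyz):
        if any(math.dist(p, l) <= pocket_cutoff for l in ligand_xyz):
            nodes.append(("P", j))
            coords.append(p)

    # Add a directed edge for every ordered pair within the distance cutoff.
    edges = []
    for s in range(len(nodes)):
        for t in range(len(nodes)):
            if s != t and math.dist(coords[s], coords[t]) <= edge_cutoff:
                edges.append((s, t))
    return nodes, edges
```

The double loop is quadratic in the number of retained atoms, which is acceptable for a sketch; a spatial index such as a k-d tree would be the usual choice for large pockets.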
Definition 2. Edge-Oriented Neighbors: Given an atom node $a_i$ or a directed edge $e_{ij}$ (i.e., $a_i \to a_j$) in the complex interaction graph $G_I$, the edge-oriented neighbors $N_e$ of $a_i$ or $e_{ij}$ are defined as the sets of directed edges $\{e_{ki}, \ldots, e_{li}\}$ which point to the target atom $a_i$ or the target edge $e_{ij}$.
Taking Fig. 3(b) as an example, the edges $e_{21}$ and $e_{41}$ are connected to the edge $e_{13}$ via the common node $a_1$, so the edge-oriented neighbors of $e_{13}$ are denoted as $N_e(e_{13}) = \{e_{21}, e_{41}\}$. Similarly, the edges $e_{13}$, $e_{53}$ and $e_{63}$ point to the atom node $a_3$, resulting in the neighbors set $N_e(a_3) = \{e_{13}, e_{53}, e_{63}\}$.
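Definition 2 can be expressed as two small lookups over a set of directed edges. The sketch below is illustrative only: the function names are hypothetical, and excluding the reverse edge $e_{ji}$ from the neighbors of $e_{ij}$ is an assumption that is consistent with the Fig. 3(b) example but is not stated in the definition itself.

```python
def neighbors_of_node(edges, i):
    """Edge-oriented neighbors of atom a_i: all directed edges pointing to i."""
    return {(k, t) for (k, t) in edges if t == i}

def neighbors_of_edge(edges, edge):
    """Edge-oriented neighbors of edge e_ij: directed edges pointing to its
    source node i. The reverse edge e_ji is skipped (an assumption)."""
    i, j = edge
    return {(k, t) for (k, t) in edges if t == i and k != j}

# Edges named in the Fig. 3(b) example, written as (source, target) pairs.
example_edges = {(2, 1), (4, 1), (1, 3), (5, 3), (6, 3)}
```

With these edges, `neighbors_of_edge(example_edges, (1, 3))` gives the edges $e_{21}$ and $e_{41}$, and `neighbors_of_node(example_edges, 3)` gives $e_{13}$, $e_{53}$ and $e_{63}$, matching the sets stated in the text.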
Problem 1: Structure-Based Protein-Ligand Binding Affinity Prediction: Given a protein-ligand complex with 3D structure, i.e., the complex interaction graph $G_I$ and the 3D position matrix $M$, our goal is to learn a regression model $f(G_I, M)$ to precisely predict the numerical binding affinity score, which represents the