IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. 36, NO. 5, MAY 2024 1991

GIANT: Protein-Ligand Binding Affinity Prediction via Geometry-Aware Interactive Graph Neural Network

Shuangli Li, Jingbo Zhou, Tong Xu, Liang Huang, Fan Wang, Haoyi Xiong, Senior Member, IEEE, Weili Huang, Dejing Dou, Senior Member, IEEE, and Hui Xiong, Fellow, IEEE

Abstract: Drug discovery often relies on the successful prediction of protein-ligand binding affinity. Recent advances have shown great promise in applying graph neural networks (GNNs) for better affinity prediction by learning the representations of protein-ligand complexes. However, existing solutions usually treat protein-ligand complexes as topological graph data, so the 3D geometry-based biomolecular structural information is not fully utilized. The essential intermolecular interactions with long-range dependencies, including type-wise interactions and molecule-wise interactions, are also neglected in GNN models. To this end, we propose a geometry-aware interactive graph neural network (GIANT), which consists of two components: a 3D geometric graph learning network (3DG-NET) and a pairwise interactive learning network (PI-NET). Specifically, 3DG-NET iteratively performs the node-edge interaction process to update embeddings of nodes and edges in a unified framework while preserving the 3D geometric factors among atoms, including spatial distance, polar angle and dihedral angle information in 3D space. Moreover, PI-NET is adopted to incorporate both element type-level and molecule-level interactions. In particular, interactive edges are gathered with a subsequent reconstruction loss to reflect the global type-level interactions. Meanwhile, a pairwise attentive pooling scheme is designed to identify the critical interactive atoms for complex representation learning from a semantic view. An exhaustive experimental study on two benchmarks verifies the superiority of GIANT.

Manuscript received 24 March 2022; revised 26 July 2023; accepted 26 August 2023. Date of publication 13 September 2023; date of current version 5 April 2024. This work was supported in part by OPPO Research Fund and in part by the National Natural Science Foundation of China under Grant 61960206008. Recommended for acceptance by Y. Tong. (Corresponding authors: Jingbo Zhou; Dejing Dou; Hui Xiong.)

Shuangli Li is with the Anhui Province Key Lab of Big Data Analysis and Application, School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230026, China, and also with the Business Intelligence Lab, Baidu Research, Beijing 100085, China (e-mail: lsl1997@mail.ustc.edu.cn).

Jingbo Zhou is with the Business Intelligence Lab, Baidu Research, Beijing 100085, China (e-mail: zhoujingbo@outlook.com).

Tong Xu is with the Anhui Province Key Lab of Big Data Analysis and Application, School of Computer Science, University of Science and Technology of China, Hefei, Anhui 230026, China (e-mail: tongxu@ustc.edu.cn).

Liang Huang is with the Oregon State University, Corvallis, OR 97331 USA (e-mail: lianghuang@baidu.com).

Fan Wang and Haoyi Xiong are with the Baidu Inc., Beijing 100085, China (e-mail: wangfan04@baidu.com; xionghaoyi@baidu.com).

Weili Huang is with the HWL Consulting LLC, LU7 9PU Bedfordshire, U.K. (e-mail: lwlily99@gmail.com).

Dejing Dou is with the BCG X, Boston, MA 02210 USA (e-mail: doudejing@baidu.com).

Hui Xiong is with the Artificial Intelligence Thrust, Hong Kong University of Science and Technology (Guangzhou), Guangzhou 529200, China (e-mail: xionghui@ust.hk).

Digital Object Identifier 10.1109/TKDE.2023.3314502

Index Terms: Binding affinity prediction, graph neural network, geometry modeling, drug discovery, compound-protein interaction.

I. INTRODUCTION

The prediction of protein-ligand binding affinity has been widely considered one of the most important tasks in computational drug discovery [1]. Here ligands are usually drug candidates, including small molecules and biologics, which can interact with proteins as agonists or inhibitors in biological processes to cure diseases. Given a protein, we are interested in understanding how well a drug molecule (called a ligand) can interact with this protein. The strength of interaction between them can be quantified as a numerical score (called the binding affinity), which potentially determines whether a ligand can have an effective influence on the protein (for example, to inactivate a protein to cure a disease). Therefore, the calculation of binding affinity is of great significance, and our target is to estimate this valuable interaction score. Although it can be measured by experimental methods, those biological tests are laborious and time-consuming. Thus, data-driven computational approaches have become increasingly necessary and have achieved remarkable success in various drug discovery applications, including protein interaction mining [2], molecule generation [3], and drug reaction prediction [4], which highlights the efficacy of such methods in tackling complex problems for drug-based data mining and knowledge discovery. With similar data-driven learning models, binding affinities can be predicted in the early stage of drug discovery. Instead of applying costly biological methods directly to screen numerous candidate molecules, the prediction of binding affinity can help to rank drug candidates and prioritize the appropriate ones for subsequent testing to accelerate the process of drug screening [5].

With the development of structural biology and protein structure prediction [6], especially the recent AlphaFold II model [7], there is a growing amount of three-dimensional (3D) protein structure data, which enables a new paradigm for structure-based drug discovery [8], [9], [10]. It has been demonstrated that 3D structural information can effectively contribute to drug design [11]. Indeed, since there are already many accurate and robust algorithms to find poses of protein-ligand complexes (e.g., binding site prediction methods and docking methods), it is significant to focus on the much harder task of binding affinity prediction [12]. To learn useful 3D structure from a protein-ligand complex, as illustrated in Fig. 1, many efforts have been devoted to estimating more accurate binding affinity for effective drug design. Docking methods [13], [14], [15] play an important role in predicting how a specific ligand binds to the target protein at affordable computational cost. While the docking process can identify the binding pose of the protein-ligand complex with relatively high accuracy, its prediction of binding affinity is inaccurate and unreliable due to poor scoring functions [12], [16], which limits the applicability of docking methods in drug discovery. Compared to docking calculations, traditional machine learning methods [12], [17] have improved the performance by learning the extracted features from protein-ligand complexes. However, these approaches have limited generalizability, require expert knowledge, and rely heavily on feature engineering.

Fig. 1. Brief summary of protein-ligand binding affinity prediction. (1) Top left: An example of a protein-ligand complex (Structure ID in Protein Data Bank (PDB): 5HMI). (2) Top right: Various complex representations. (3) Bottom left: Traditional methods. (4) Bottom right: Machine learning and deep learning methods.

Recently, deep learning for binding affinity prediction has become an emerging research area, which represents the complex as sequence data [18], [19], 3D grid-like data [20] or graph data [21] to employ various neural networks. One of the key challenges of deep learning in structural biology is how to model the 3D spatial structure for better performance. To this end, most of the existing works [20], [22], [23] attempt to apply 3D convolutional neural networks (3D CNNs) by treating the complex as a 3D-grid representation. However, the computational cost of these models is huge, especially when considering long-range structural interactions. In addition, both the absence of topological information and the sensitivity to rotation of the complex have a negative effect on the prediction results.
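The rotation sensitivity of grid representations can be illustrated with a small sketch. Assuming NumPy and four made-up atom coordinates (not taken from any real complex), it rotates the atoms about the z-axis and compares two representations: the pairwise distance matrix, which is unchanged, and a coarse occupancy grid, which changes. The helper names `rotation_z`, `pairwise_distances` and `occupancy_grid`, the box size and the bin count are illustrative choices, not part of the cited methods.

```python
import numpy as np

def rotation_z(angle_rad):
    """Rotation matrix about the z-axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def pairwise_distances(coords):
    """All-pairs Euclidean distances for an (n, 3) coordinate array."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def occupancy_grid(coords, box=4.0, bins=4):
    """Binary voxel grid: 1 where at least one atom falls into the cell."""
    edges = [np.linspace(-box, box, bins + 1)] * 3
    hist, _ = np.histogramdd(coords, bins=edges)
    return (hist > 0).astype(int)

# Toy coordinates in angstroms (hypothetical values for illustration only).
atoms = np.array([
    [1.5, 0.2, 0.0],
    [-0.7, 1.1, 0.4],
    [0.3, -1.8, 1.2],
    [2.4, 2.1, -0.9],
])
# Rotate every atom by 60 degrees about the z-axis.
rotated = atoms @ rotation_z(np.pi / 3).T

# Interatomic distances are preserved by the rotation ...
distances_match = np.allclose(pairwise_distances(atoms), pairwise_distances(rotated))
# ... while the voxel occupancy pattern is not.
grids_match = np.array_equal(occupancy_grid(atoms), occupancy_grid(rotated))
```

A model fed with the grid sees two different inputs for the same complex, whereas a model built on interatomic distances sees the same input in both orientations.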

Despite the powerful ability of graph neural networks (GNNs) to learn graph representations [24], there are only a few studies [21], [25] using GNNs to predict protein-ligand binding affinity. By contrast, many researchers have developed GNN models in other fields of drug discovery [26], [27], such as predicting molecular properties [28], [29], [30], [31], biological network linking [32], and chemical reactions [33]. Nevertheless, these domain-specific models tend to lose their effectiveness when modeling larger biomolecules, e.g., protein-ligand complexes. In general, most of the existing GNNs in drug design aim to learn the spatial structure by incorporating distance information, which is insufficient to model the 3D geometric structure of the complex. Moreover, the fundamental pairwise interactive information between proteins and ligands, which is valuable for predicting the binding affinity [34], cannot be handled under the current GNN framework.

This paper is an extension of our preliminary work [35]. To overcome the above limitations, we propose a novel Geometry-aware Interactive Graph Neural Network (GIANT) to learn the constructed complex interaction graph for protein-ligand binding affinity prediction. GIANT is equipped with two components that address the respective challenges, namely the 3D geometric graph learning network (3DG-NET) for modeling the geometric structure in 3D space and the pairwise interaction learning network (PI-NET) for leveraging both element type-level and molecule-level interactions with long-range intermolecular dependencies.

As the first part of GIANT, the key idea of 3DG-NET is illustrated in Fig. 2: it constructs the spherical space for each central target and applies the node-edge interactive scheme iteratively. 3DG-NET preserves the spatial distance, polar angle and dihedral angle information of neighbors when performing the aggregation process, and thus it can effectively learn the 3D structure of protein-ligand complexes.
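As a rough illustration of the three geometric factors named here, the sketch below computes a distance, an angle between two directions at a shared atom, and a dihedral angle between two planes using standard textbook formulas. The exact conventions used inside 3DG-NET are not specified in this part of the paper, so the function names, the argument order and the sign convention of the dihedral are assumptions.

```python
import numpy as np

def distance(p, q):
    """Euclidean distance between two 3D points."""
    return float(np.linalg.norm(q - p))

def angle_between(center, p, q):
    """Angle in radians at `center` between the directions to p and q."""
    u, v = p - center, q - center
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip guards against values slightly outside [-1, 1] from rounding.
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def dihedral(p0, p1, p2, p3):
    """Signed angle in radians between planes (p0, p1, p2) and (p1, p2, p3).

    The sign depends on the chosen orientation convention (an assumption here).
    """
    b0, b1, b2 = p1 - p0, p2 - p1, p3 - p2
    n1, n2 = np.cross(b0, b1), np.cross(b1, b2)   # normals of the two planes
    m1 = np.cross(n1, b1 / np.linalg.norm(b1))    # completes a frame with n1
    return float(np.arctan2(np.dot(m1, n2), np.dot(n1, n2)))

# Hypothetical points chosen so that the results are easy to check by hand.
origin = np.array([0.0, 0.0, 0.0])
x_axis = np.array([1.0, 0.0, 0.0])
y_axis = np.array([0.0, 2.0, 0.0])
right_angle = angle_between(origin, x_axis, y_axis)

top = np.array([0.0, 0.0, 1.0])
perpendicular = dihedral(x_axis, origin, top, np.array([0.0, 1.0, 1.0]))
coplanar = dihedral(x_axis, origin, top, np.array([1.0, 0.0, 1.0]))
```

Distances alone cannot distinguish two neighborhoods that differ only in how the neighbors are arranged around the central atom; the two angles supply that missing information.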

PI-NET is designed as the second part of GIANT to incorporate global intermolecular interactions. On the one hand, in view of the large size of the protein, it is redundant to include the complete protein structure in the graph, so we construct the spatial-based interaction graph from the central key structure of the complex. In this way, however, the type-level long-range interactive information (including distant solvation effects [36] and electrostatic interactions [34]) between the protein and the ligand cannot be captured, because the graph lacks the complete structure. To deal with this issue, we employ an atomic type-aware pooling process on edges by introducing an auxiliary learning task to reconstruct the interactive matrix for type-level interaction injection. On the other hand, several important atoms of the complex can affect pairwise interactions and contribute to the binding affinity. Therefore, we finally utilize the molecule-level attentive pooling network to extract the informative biological semantics.

By means of 3DG-NET and PI-NET from two perspectives, our proposed GIANT can enhance the representation learning for complexes by involving both 3D geometric structures and global interactions. To summarize, the main contributions of this paper are as follows:

• To the best of our knowledge, we are among the first to develop graph neural networks from the perspective of comprehensive biochemical representation learning in 3D space for structure-based binding affinity prediction.

• We propose a novel geometry-aware interactive graph neural network (GIANT), which can capture not only 3D geometric information through distance-aware graph attention and angle-oriented graph convolution with a triangular fusion scheme in 3DG-NET, but also global long-range interactions through the pairwise interaction learning network (PI-NET) in a semi-supervised manner.

• We conduct extensive experiments using two benchmark datasets to evaluate the performance of the proposed model, which demonstrates the superiority of GIANT compared with state-of-the-art baselines.

Fig. 2. Illustration of complex geometric division with three angle domains (in different colors) in 3D space, where $\theta$ represents the polar angle and $\varphi_{i,j}$ denotes the dihedral angle between two adjacent planes.

Compared with our previous conference paper SIGN [35], the major improvements include: 1) For geometry modeling, we present a new and sufficient paradigm to model the 3D view of protein-ligand complexes and replace the angle-oriented graph attention with a triple-wise dihedral graph aggregation process (TAGG) to enhance the structure learning. We complete 3DG-NET for integrated complex modeling with complementary dihedral angle information. In this way, the proposed GIANT can learn the comprehensive 3D geometry instead of the partial geometry in SIGN. 2) For interaction modeling, we devise a novel pairwise interaction learning network, PI-NET, to further facilitate the complex representation learning by adding the molecule-level interaction component, which can capture the interactive correlations of both biological element types and high-level molecules. 3) We significantly extend our experimental evaluation by comparing with our preliminary work [35] and showing additional quantitative results for model effectiveness and parameter analysis. 4) We also provide a case study to analyze the interpretability of our model in understanding protein-ligand interaction patterns.

II. RELATED WORK

In this section, we first review the related literature on predicting protein-ligand binding affinity and then detail recent advances in graph neural networks for drug discovery.

Protein-Ligand Binding Affinity Prediction: As a crucial stage in drug discovery, predicting protein-ligand binding affinity has been intensively studied for a long time [37], [38], and it is of great importance for efficient and accurate drug screening. The earlier empirical methods [14], [39], [40] design docking and scoring functions specifically to make predictions, but expert domain knowledge is required to encode internal biochemical interactions. Later on, statistical and machine learning-based methods [41] were developed to predict binding affinity based on data-driven learning; they attempt to extract protein-ligand features and use classic models for regression, such as random forest [12] and SVM [17]. These approaches depend on the quality of hand-crafted features and lack generality on larger datasets. Recently, several deep learning-based models [18], [19] utilize 1D convolutions and pooling to capture potential patterns from the raw sequence information of both ligands and proteins. However, using only separate character representations fails to achieve desirable performance.

Recently, AlphaFold II [7] has made a remarkable achievement in the field of protein structure prediction; it adopts a Transformer-based framework designed for predicting a protein's 3D structure given its amino acid sequence. With the increasing availability of 3D-structure protein-ligand data [42], another active research area studies structure-based approaches, which focus on learning from 3D-structure protein-ligand complexes to predict binding affinity. The problem addressed by AlphaFold II and the binding affinity prediction problem are complementary, and both hold great importance for biological data mining and drug knowledge discovery. Some recent works [22], [23] represent the protein-ligand complex as 3D grid-like data and use 3D convolutions (3D-CNNs) to take advantage of spatially-local correlations. Though these approaches can learn spatial information, one limitation is that the positions of proteins and ligands vary across complexes, for example under different rotations, which means the spatial structure captured by 3D grid-like modeling is inevitably incomplete. More recently, OnionNet [43] employs CNN models to learn the complex representation from the extracted element-specific interaction features between a protein and its ligand. However, all the above models neglect the critical topological structure information of complexes. In [25], a protein-ligand complex is represented as a weighted graph with distance information, and graph attention networks are then applied to predict the interactions. Nevertheless, distance information between atoms alone is not adequate to model 3D-structure interactions. In this paper, we also focus on the structure-based prediction of protein-ligand binding affinity, incorporating abundant spatial information.

Graph Neural Networks for Drug Discovery: Inspired by the great advantage of graph neural networks (GNNs) in modeling graph data, more attention has been devoted to applying them in computational drug discovery [26], such as the prediction of molecular properties [44] and protein interfaces [45]. Treating the molecule as a graph, GNNs can learn the graph-level representation of a drug or protein by aggregating structural information. GraphDTA [21] adopts GNN models [46], [47], [48] to learn the drug representation and combines it with the protein representation from 1D convolutions to predict binding affinity. In attributed molecular graphs, the edges between atoms contain valuable information, such as distance or bond order. To leverage rich attributes in the molecule, edge-oriented message passing neural networks [28], [49], [50] are proposed to update both node and edge embeddings. Meanwhile, there are also some efforts to model the 3D structure of a molecule by improving GNNs with spatial information, such as distance [25], [29], angle [30], [51], and 3D coordinates [52]. However, these models fail to consider the spatial interactions between proteins and ligands. In addition, the function for learning angle information in [30] is designed for density functional theory, which is only beneficial for predicting molecular properties rather than protein-ligand binding affinity. Moreover, there are recent GNN models that learn long-range interactions, but they are designed for specific applications (such as user-item interaction in social recommendation [53], [54]) and only focus on node-wise interactions [55]. To overcome these limitations, we propose a multi-level interaction-aware GNN framework that integrates both distance and angle factors harmoniously.

TABLE I. MATHEMATICAL NOTATIONS

Fig. 3. Illustrative example of converting the protein-ligand complex into a complex interaction graph.

III. PRELIMINARIES

In this section, we introduce some definitions used in our model and formulate the structure-based prediction problem for protein-ligand binding affinity. The frequently used key notations in this paper are summarized in Table I.

Definition 1. Complex Interaction Graph: Given a protein-ligand complex as shown in Fig. 3(a), we define the atom node sets of the protein and the ligand as $V_P = \{a_1, \ldots, a_m\}$ and $V_L = \{a_1, \ldots, a_n\}$, with the position matrices $M_P \in \mathbb{R}^{m \times 3}$ and $M_L \in \mathbb{R}^{n \times 3}$ for the 3D atomic coordinates, respectively. Then we define the complex interaction graph as a directed graph $G = \langle V, E \rangle$, where the vertex set $V$ is a subset of the atom node sets of the protein and the ligand, i.e., $V \subseteq V_P \cup V_L$, and the unweighted edge set $E$ is constructed by a function $f$ of the spatial positions of atoms in the complex. Specifically, in addition to $V_L$, the protein atoms from $V_P$ that are close to the ligand are selected and added to $V$. We then update the complex edge set $E$ by adding the edges of atom pairs whose distances are no greater than the cutoff threshold $\theta_d$. The distance between atom nodes is the Euclidean distance, i.e., the straight-line distance between two points in three-dimensional space, denoted as $d_{ij} = \sqrt{(M_i - M_j)^2}$, where $M_i$ is the coordinate of atom $a_i$. Formally, the edge set is represented as $E = \{(a_i, a_j) \mid a_i, a_j \in V,\ \text{s.t.}\ d_{ij} \le \theta_d\}$. The detailed construction process is described in Algorithm 1.

Algorithm 1: Graph Construction Process.
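The construction in Definition 1 can be sketched in a few lines of NumPy. This is an illustrative reading of the definition, not the authors' Algorithm 1: the function name `build_interaction_graph`, the `pocket_cutoff` parameter (a hypothetical radius standing in for "close to the ligand", which the definition does not quantify) and the index layout (ligand atoms first) are all assumptions.

```python
import numpy as np

def build_interaction_graph(ligand_xyz, protein_xyz, pocket_cutoff, edge_cutoff):
    """Build a distance-based interaction graph from (n, 3) and (m, 3) arrays.

    pocket_cutoff is a hypothetical radius for "protein atoms close to the
    ligand"; edge_cutoff plays the role of the edge distance threshold.
    Returns (coords, is_ligand, edges) with ligand atoms indexed first.
    """
    # Distance from every protein atom to every ligand atom, shape (m, n).
    cross = np.linalg.norm(protein_xyz[:, None, :] - ligand_xyz[None, :, :], axis=-1)
    # Keep a protein atom if its nearest ligand atom is within pocket_cutoff.
    near_ligand = cross.min(axis=1) <= pocket_cutoff

    coords = np.vstack([ligand_xyz, protein_xyz[near_ligand]])
    is_ligand = np.arange(len(coords)) < len(ligand_xyz)

    # All-pairs distances among the selected atoms.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Directed edge (i, j) for every ordered pair within edge_cutoff, no self-loops.
    mask = (dist <= edge_cutoff) & ~np.eye(len(coords), dtype=bool)
    src, dst = np.nonzero(mask)
    return coords, is_ligand, list(zip(src.tolist(), dst.tolist()))

# Toy coordinates in angstroms (made up for illustration).
ligand = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]])
protein = np.array([[0.0, 2.0, 0.0], [10.0, 10.0, 10.0], [3.0, 0.0, 0.0]])
coords, is_ligand, edges = build_interaction_graph(ligand, protein, 3.0, 2.0)
```

In this toy input the distant protein atom is dropped from the vertex set, and each atom pair within the edge cutoff contributes two directed edges, one per direction. The dense all-pairs distance matrix is fine for a small pocket; a spatial index would be a natural substitute for larger inputs.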

Definition 2. Edge-Oriented Neighbors: Given an atom node $a_i$ or a directed edge $e_{ij}$ (i.e., $a_i \to a_j$) in the complex interaction graph $G$, the edge-oriented neighbors $\mathcal{N}$ of $a_i$ or $e_{ij}$ are defined as the sets of directed edges that point to the target atom $a_i$ or the target edge $e_{ij}$.

Taking Fig. 3(b) as an example, two edges are connected to a target edge via their common node, so the edge-oriented neighbor set $\mathcal{N}(\cdot)$ of that target edge consists of these two edges. Similarly, three edges point to a single atom node, and these three edges form the neighbor set $\mathcal{N}(\cdot)$ of that atom.
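One way to read the definition of edge-oriented neighbors in code is sketched below. It assumes that an edge points to a target edge when it ends at the target edge's source atom, and it excludes the reverse of the target edge; both choices, together with the function name `edge_oriented_neighbors` and the index-pair encoding of edges, are assumptions made for illustration rather than details confirmed by the definition.

```python
from collections import defaultdict

def edge_oriented_neighbors(edges):
    """Map each atom and each directed edge to the edges that point to it.

    edges: iterable of (i, j) index pairs, each meaning a directed edge a_i -> a_j.
    Returns (atom_neighbors, edge_neighbors) as two dictionaries.
    """
    edges = list(edges)
    incoming = defaultdict(list)              # atom j -> edges (k, j) ending at j
    for i, j in edges:
        incoming[j].append((i, j))

    # Neighbors of an atom: every edge that ends at this atom.
    atom_neighbors = {j: list(es) for j, es in incoming.items()}

    # Neighbors of edge (i, j): edges (k, i) ending at its source atom i.
    # Assumption: the reverse edge (j, i) is left out.
    edge_neighbors = {
        (i, j): [(k, t) for (k, t) in incoming.get(i, []) if k != j]
        for i, j in edges
    }
    return atom_neighbors, edge_neighbors

# Hypothetical toy graph: atoms 2, 3 and 1 point to atom 0; atoms 0 and 4 point to atom 1.
toy_edges = [(2, 0), (3, 0), (0, 1), (1, 0), (4, 1)]
atom_nb, edge_nb = edge_oriented_neighbors(toy_edges)
```

Here the edge from atom 0 to atom 1 has two edge-oriented neighbors (the edges arriving at atom 0 from atoms 2 and 3), while atom 0 itself has three incoming edges in its neighbor set.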

Problem 1: Structure-Based Protein-Ligand Binding Affinity Prediction: Given a protein-ligand complex with 3D structure, i.e., the complex interaction graph $G$ and the 3D position matrix $M$, our goal is to learn a regression model $f(G, M)$ to precisely predict the numerical binding affinity score, which represents the
