Accurate structure prediction of
biomolecular interactions with AlphaFold 3
Josh Abramson, Jonas Adler, Jack Dunger, Richard Evans, Tim Green, Alexander Pritzel,
Olaf Ronneberger, Lindsay Willmore, Andrew J. Ballard, Joshua Bambrick,
Sebastian W. Bodenstein, David A. Evans, Chia-Chun Hung, Michael O'Neill, David Reiman,
Kathryn Tunyasuvunakool, Zachary Wu, Akvilė Žemgulytė, Eirini Arvaniti, Charles Beattie,
Ottavia Bertolli, Alex Bridgland, Alexey Cherepanov, Miles Congreve,
Alexander I. Cowen-Rivers, Andrew Cowie, Michael Figurnov, Fabian B. Fuchs,
Hannah Gladman, Rishub Jain, Yousuf A. Khan, Caroline M. R. Low, Kuba Perlin,
Anna Potapenko, Pascal Savy, Sukhdeep Singh, Adrian Stecula, Ashok Thillaisundaram,
Catherine Tong, Sergei Yakneen, Ellen D. Zhong, Michal Zielinski, Augustin Žídek,
Victor Bapst, Pushmeet Kohli, Max Jaderberg, Demis Hassabis & John M. Jumper
This is a PDF file of a peer-reviewed paper that has been accepted for publication.
Although unedited, the content has been subjected to preliminary formatting. Nature
is providing this early version of the typeset paper as a service to our authors and
readers. The text and figures will undergo copyediting and a proof review before the
paper is published in its final form. Please note that during the production process
errors may be discovered which could affect the content, and all legal disclaimers
apply.
Received: 19 December 2023
Accepted: 29 April 2024
Accelerated Article Preview
Published online xx xx xxxx
Cite this article as: Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature https://doi.org/10.1038/s41586-024-07487-w (2024)
https://doi.org/10.1038/s41586-024-07487-w
Nature | www.nature.com
Accelerated Article Preview
Accurate structure prediction of biomolecular interactions with AlphaFold 3

Josh Abramson*¹, Jonas Adler*¹, Jack Dunger*¹, Richard Evans*¹, Tim Green*¹, Alexander Pritzel*¹, Olaf Ronneberger*¹, Lindsay Willmore*¹, Andrew J Ballard¹, Joshua Bambrick², Sebastian W Bodenstein¹, David A Evans¹, Chia-Chun Hung², Michael O'Neill¹, David Reiman¹, Kathryn Tunyasuvunakool¹, Zachary Wu¹, Akvilė Žemgulytė¹, Eirini Arvaniti³, Charles Beattie³, Ottavia Bertolli³, Alex Bridgland³, Alexey Cherepanov⁴, Miles Congreve⁴, Alexander I Cowen-Rivers³, Andrew Cowie³, Michael Figurnov³, Fabian B Fuchs³, Hannah Gladman³, Rishub Jain³, Yousuf A Khan³, Caroline M R Low⁴, Kuba Perlin³, Anna Potapenko³, Pascal Savy⁴, Sukhdeep Singh³, Adrian Stecula⁴, Ashok Thillaisundaram³, Catherine Tong⁴, Sergei Yakneen⁴, Ellen D Zhong³, Michal Zielinski³, Augustin Žídek³, Victor Bapst†¹, Pushmeet Kohli†¹, Max Jaderberg†², Demis Hassabis†¹,², John M Jumper†¹

* Contributed equally
¹ Core Contributor, Google DeepMind, London, UK
² Core Contributor, Isomorphic Labs, London, UK
³ Google DeepMind, London, UK
⁴ Isomorphic Labs, London, UK
† Jointly supervised

Corresponding author emails:
J.J. - jumper@google.com; D.H. - dhcontact@google.com; M.J. - jaderberg@isomorphiclabs.com
The introduction of AlphaFold 2^1 has spurred a revolution in modelling the structure of proteins and their interactions, enabling a huge range of applications in protein modelling and design^2–6. In this paper, we describe our AlphaFold 3 model with a substantially updated diffusion-based architecture, which is capable of joint structure prediction of complexes including proteins, nucleic acids, small molecules, ions, and modified residues. The new AlphaFold model demonstrates significantly improved accuracy over many previous specialised tools: far greater accuracy on protein-ligand interactions than state-of-the-art docking tools, much higher accuracy on protein-nucleic acid interactions than nucleic-acid-specific predictors, and significantly higher antibody-antigen prediction accuracy than AlphaFold-Multimer v2.3^7,8. Together these results show that high-accuracy modelling across biomolecular space is possible within a single unified deep learning framework.
Main Text

Introduction
Accurate models of biological complexes are critical to our understanding of cellular functions and for the rational design of therapeutics^2–4,9. Enormous progress has been achieved in protein structure prediction with the development of AlphaFold^1, and the field has grown tremendously with a number of later methods that build on the ideas and techniques of AlphaFold 2^10–12. Almost immediately after AlphaFold became available, it was shown that simple input modifications would enable surprisingly accurate protein interaction predictions^13–15 and that training AlphaFold 2 specifically for protein interaction prediction yielded a highly accurate system^7.
These successes led to the question of whether it is possible to accurately predict the structure of complexes containing a much wider range of biomolecules, including ligands, ions, nucleic acids, and modified residues, within a deep learning framework. A wide range of predictors for various specific interaction types have been developed^16–28, as well as one generalist method developed concurrently with the present work^29, but the accuracy of such deep learning attempts has been mixed and often below that of physics-inspired methods^30,31. Almost all these methods are also highly specialised to particular interaction types and cannot predict the structure of general biomolecular complexes containing many types of entities.
Here, we present AlphaFold 3 (AF3), a model that is capable of high-accuracy prediction of complexes containing nearly all molecular types present in the Protein Data Bank^32 (PDB) (Fig. 1a,b). In all but one category it achieves significantly higher performance than strong methods that specialise in just the given task (Fig. 1c, Extended Data Table 1), including higher accuracy at protein structure and the structure of protein-protein interactions.
This is achieved by a substantial evolution of the AlphaFold 2 architecture and training procedure (Fig. 1d), both to accommodate more general chemical structures and to improve the data efficiency of learning. The system reduces the amount of multiple sequence alignment (MSA) processing by replacing the AlphaFold 2 Evoformer with the simpler Pairformer Module (Fig. 2a). Furthermore, it directly predicts the raw atom coordinates with a Diffusion Module, replacing the AlphaFold 2 Structure Module that operated on amino-acid-specific frames and side chain torsion angles (Fig. 2b). The multiscale nature of the diffusion process (low noise levels induce the network to improve local structure) also allows us to eliminate stereochemical losses and most special handling of bonding patterns in the network, easily accommodating arbitrary chemical components.
Network architecture and training

The overall structure of AF3 (Fig. 1d, Supplementary Methods 3) echoes that of AlphaFold 2, with a large trunk evolving a pairwise representation of the chemical complex followed by a Structure Module that uses the pairwise representation to generate explicit atomic positions, but there are large differences in each major component. These modifications were driven both by the need to accommodate a wide range of chemical entities without excessive special-casing and by observations of AlphaFold 2 performance with different modifications. Within the trunk, MSA processing is substantially de-emphasized, with a much smaller and simpler MSA embedding block (Supplementary Methods 3.3). Compared to the original Evoformer from AlphaFold 2, the number of blocks is reduced to four, the processing of the MSA representation uses an inexpensive pair-weighted averaging, and only the pair representation is used for later processing steps. The "Pairformer" (Fig. 2a, Supplementary Methods 3.6) replaces the "Evoformer" of AlphaFold 2 as the dominant processing block. It operates only on the pair representation and the single representation; the MSA representation is not retained and all information passes via the pair representation. The pair processing and the number of blocks (48) are largely unchanged from AlphaFold 2. The resulting pair and single representations, together with the input representation, are passed to the new Diffusion Module (Fig. 2b) that replaces the Structure Module of AlphaFold 2.
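As an illustration of the pair-weighted averaging idea described above, the following minimal numpy sketch shows one plausible reading: mixing weights over residue positions are derived from the pair representation and used to average value-projected MSA features. All function names, tensor shapes, and the projection scheme are assumptions for illustration, not the model's actual implementation (see Supplementary Methods 3.3 for that).

```python
import numpy as np

def pair_weighted_average(msa, pair, w_proj, v_proj):
    """Sketch: aggregate MSA features across residues with weights
    derived from the pair representation (hypothetical shapes).

    msa:    (n_seq, n_res, c_m) MSA representation
    pair:   (n_res, n_res, c_z) pair representation
    w_proj: (c_z,)              projects pair features to a scalar logit
    v_proj: (c_m, c_m)          value projection of the MSA features
    """
    # One logit per residue pair, normalised over the second index.
    logits = pair @ w_proj                                   # (n_res, n_res)
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)

    values = msa @ v_proj                                    # (n_seq, n_res, c_m)
    # Residue i aggregates values from all residues j, weighted by weights[i, j].
    return np.einsum('ij,sjc->sic', weights, values)
```

This is cheaper than full attention over the MSA because the weights depend only on the pair representation, not on per-sequence queries and keys.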
The Diffusion Module (Fig. 2b, Supplementary Methods 3.7) operates directly on raw atom coordinates, and on a coarse abstract token representation, without rotational frames or any equivariant processing. We had observed in AlphaFold 2 that removing most of the complexity of the Structure Module had only a modest effect on prediction accuracy, and maintaining the backbone frame and side chain torsion representation added quite a bit of complexity for general molecular graphs. Similarly, AlphaFold 2 required carefully tuned stereochemical violation penalties during training to enforce chemical plausibility of the resulting structures. We use a relatively standard diffusion approach^34 in which the diffusion model is trained to receive "noised" atomic coordinates then predict the true coordinates. This task requires the network to learn protein structure at a variety of length scales, where the denoising task at small noise emphasises understanding very local stereochemistry and the denoising task at high noise emphasises the large-scale structure of the system. At inference time, random noise is sampled and then recurrently denoised to produce a final structure. Importantly, this is a generative training procedure which produces a distribution of answers. This means that, for each answer, the local structure will be sharply defined (e.g. side chain bond geometry) even when the network is uncertain about the positions. For this reason, we are able to avoid both torsion-based parametrizations of the residues and violation losses on the structure, while handling the full complexity of general ligands. Similarly to some recent work^35, we find that no invariance or equivariance with respect to global rotations and translations of the molecule is required in the architecture, and so we omit them to simplify the machine learning architecture.
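The noise-then-denoise training task and the recurrent denoising at inference can be sketched as follows. This is a generic diffusion skeleton under assumed conventions (a stand-in identity denoiser, a simple geometric-style noise schedule), not the paper's actual parameterisation, schedule, or loss weighting (those are in Supplementary Methods 3.7).

```python
import numpy as np

rng = np.random.default_rng(0)

def denoiser(noisy_coords, noise_level):
    # Stand-in for the trained network, which would predict the clean
    # coordinates from noised ones; here it simply returns its input.
    return noisy_coords

def diffusion_train_step(true_coords, sigma):
    """One denoising training example: add Gaussian noise at level sigma,
    ask the network for the clean coordinates, score with a squared error."""
    noised = true_coords + sigma * rng.standard_normal(true_coords.shape)
    pred = denoiser(noised, sigma)
    return np.mean((pred - true_coords) ** 2)

def diffusion_sample(n_atoms, sigmas):
    """Inference: start from pure noise and recurrently denoise through a
    decreasing noise schedule, keeping sigma_lo worth of noise each step."""
    x = sigmas[0] * rng.standard_normal((n_atoms, 3))
    for s_hi, s_lo in zip(sigmas[:-1], sigmas[1:]):
        x_hat = denoiser(x, s_hi)
        x = x_hat + (s_lo / s_hi) * (x - x_hat)
    return x
```

Training touches only a single noise level per example, which is why confidence estimation needs the separate rollout procedure described later in the text.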
The use of a generative diffusion approach comes with some technical challenges that we needed to address. The biggest issue is that generative models are prone to hallucination^36, where the model may invent plausible-looking structure even in unstructured regions. To counteract this effect, we use a novel cross-distillation method in which we enrich the training data with AlphaFold-Multimer v2.3^7,8 predicted structures. In these structures, unstructured regions are typically represented by long extended loops instead of compact structures, and training on them "teaches" AlphaFold 3 to mimic this behaviour. This cross-distillation greatly reduced the hallucination behaviour of AF3 (see Extended Data Fig. 1 for disorder prediction results on the CAID 2^37 benchmark set).
We also developed confidence measures that predict the atom-level and pairwise errors in our final structures. In AlphaFold 2, this was done by directly regressing the error in the output of the Structure Module during training. This procedure is not applicable to diffusion training, however, since only a single step of the diffusion is trained instead of a full structure generation (Fig. 2c). To remedy this, we developed a diffusion "rollout" procedure for the full structure generation during training (using a larger step size than normal; see Fig. 2c, "mini-rollout"). This predicted structure is then used to permute the symmetric ground-truth chains and ligands, and to compute the performance metrics used to train the confidence head. The confidence head uses the pairwise representation to predict the LDDT (pLDDT) and a predicted aligned error (PAE) matrix as in AlphaFold 2, as well as a predicted distance error (PDE) matrix, which is the error in the distance matrix of the predicted structure as compared to the true structure (see Supplementary Methods 4.3 for details).
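The regression targets for these confidence heads can be illustrated concretely. Below is a simplified sketch of the pairwise distance error underlying the PDE, and a global, single-point-per-residue variant of LDDT; the real metrics bin the errors, select specific atoms, and average per residue, so treat this only as a statement of the quantities involved.

```python
import numpy as np

def distance_matrix(x):
    """All-pairs Euclidean distances for points x of shape (n, 3)."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

def pde_target(pred, true):
    """Pairwise distance error |d_pred(i,j) - d_true(i,j)|: the quantity the
    PDE head is trained to predict (the paper's head predicts binned values)."""
    return np.abs(distance_matrix(pred) - distance_matrix(true))

def lddt(pred, true, cutoff=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified global LDDT: fraction of true-structure neighbour pairs whose
    distance is preserved within each threshold, averaged over thresholds."""
    d_true = distance_matrix(true)
    d_pred = distance_matrix(pred)
    # Score pairs within `cutoff` in the true structure, excluding i == j.
    mask = (d_true < cutoff) & ~np.eye(len(true), dtype=bool)
    dd = np.abs(d_true - d_pred)[mask]
    return float(np.mean([(dd < t).mean() for t in thresholds]))
```

Both quantities are invariant to global rotation and translation of the prediction, which is what makes them suitable targets for a model without equivariant processing.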
Fig. 2d shows that during initial training the model quickly learns to predict the local structures (all intra-chain metrics go up quickly and reach 97% of the maximum performance within the first 20k training steps), while the model needs considerably longer to learn the global constellation (the interface metrics go up slowly, and the protein-protein interface LDDT passes the 97% bar only after 60k steps). During AF3 development we observed that some model capabilities topped out relatively early and started to decline (most likely due to overfitting to the limited number of training samples for this capability), while other capabilities were still undertrained. We addressed this by increasing or decreasing the sampling probability for the corresponding training sets (Supplementary Methods 2.5.1) and by early stopping using a weighted average of all the above metrics and some additional metrics to select the best model checkpoint (Supplementary Table 7). The fine-tuning stages with larger crop sizes improve the model on all metrics, with an especially high uplift on protein-protein interfaces (Extended Data Fig. 2).
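Mechanically, both interventions above reduce to weighted sampling and a scalar selection criterion. The sketch below uses invented dataset names and weights purely for illustration; the actual mixture probabilities and metric weights are given in Supplementary Methods 2.5.1 and Supplementary Table 7 and are not reproduced here.

```python
import random

# Hypothetical mixture weights over training sets (names and values invented).
DATASET_WEIGHTS = {"pdb": 0.6, "distillation": 0.3, "disorder_cross_distillation": 0.1}

def sample_dataset(rng):
    """Draw the training set for the next example according to the mixture."""
    names = list(DATASET_WEIGHTS)
    return rng.choices(names, weights=[DATASET_WEIGHTS[n] for n in names], k=1)[0]

def checkpoint_score(metrics, metric_weights):
    """Weighted average of validation metrics used for early stopping and
    best-checkpoint selection (weights here are illustrative only)."""
    total = sum(metric_weights.values())
    return sum(metrics[name] * w for name, w in metric_weights.items()) / total
```

Raising a dataset's weight counteracts undertraining of the corresponding capability; lowering it slows the overfitting described above.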
Accuracy across complex types

AF3 can predict structures from input polymer sequences, residue modifications, and ligand SMILES. In Fig. 3 we show a selection of examples highlighting the ability of the model to generalise to a number of biologically important and therapeutically relevant modalities. In selecting these examples, we considered novelty in terms of the similarity of individual chains and interfaces to the training set (additional information in Supplementary Methods 8.1).
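To make the input modalities named above concrete, here is a hypothetical job specification. Every field name is invented for illustration and is not the released AF3 input format; only the kinds of content (polymer sequences, a ligand as a SMILES string, ions, a modified residue by CCD code) come from the text.

```python
# Hypothetical input specification (field names invented, NOT the real format).
job = {
    "protein_chains": [{"id": "A", "sequence": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"}],
    "rna_chains": [{"id": "B", "sequence": "GGGCUAUU"}],
    "ligands": [{"id": "L", "smiles": "CC(=O)Oc1ccccc1C(=O)O"}],  # aspirin as SMILES
    "ions": ["MG"],
    "modified_residues": [{"chain": "A", "position": 3, "ccd_code": "TPO"}],  # phosphothreonine
}
```

A single specification of this shape covers all entity types the model handles jointly, which is the point of the unified framework.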