Accurate structure prediction of
biomolecular interactions with AlphaFold 3
Josh Abramson, Jonas Adler, Jack Dunger, Richard Evans, Tim Green, Alexander Pritzel,
Olaf Ronneberger, Lindsay Willmore, Andrew J. Ballard, Joshua Bambrick,
Sebastian W. Bodenstein, David A. Evans, Chia-Chun Hung, Michael O'Neill, David Reiman,
Kathryn Tunyasuvunakool, Zachary Wu, Akvilė Žemgulytė, Eirini Arvaniti, Charles Beattie,
Ottavia Bertolli, Alex Bridgland, Alexey Cherepanov, Miles Congreve,
Alexander I. Cowen-Rivers, Andrew Cowie, Michael Figurnov, Fabian B. Fuchs,
Hannah Gladman, Rishub Jain, Yousuf A. Khan, Caroline M. R. Low, Kuba Perlin,
Anna Potapenko, Pascal Savy, Sukhdeep Singh, Adrian Stecula, Ashok Thillaisundaram,
Catherine Tong, Sergei Yakneen, Ellen D. Zhong, Michal Zielinski, Augustin Žídek,
Victor Bapst, Pushmeet Kohli, Max Jaderberg, Demis Hassabis & John M. Jumper
This is a PDF file of a peer-reviewed paper that has been accepted for publication.
Although unedited, the content has been subjected to preliminary formatting. Nature
is providing this early version of the typeset paper as a service to our authors and
readers. The text and figures will undergo copyediting and a proof review before the
paper is published in its final form. Please note that during the production process
errors may be discovered which could affect the content, and all legal disclaimers
apply.
Received: 19 December 2023
Accepted: 29 April 2024
Accelerated Article Preview
Published online xx xx xxxx
Cite this article as: Abramson, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature https://doi.org/10.1038/s41586-024-07487-w (2024)
https://doi.org/10.1038/s41586-024-07487-w
Nature | www.nature.com
Accelerated Article Preview
Accurate structure prediction of biomolecular interactions with AlphaFold 3

Josh Abramson*¹, Jonas Adler*¹, Jack Dunger*¹, Richard Evans*¹, Tim Green*¹, Alexander Pritzel*¹, Olaf Ronneberger*¹, Lindsay Willmore*¹, Andrew J Ballard¹, Joshua Bambrick², Sebastian W Bodenstein¹, David A Evans¹, Chia-Chun Hung², Michael O'Neill¹, David Reiman¹, Kathryn Tunyasuvunakool¹, Zachary Wu¹, Akvilė Žemgulytė¹, Eirini Arvaniti³, Charles Beattie³, Ottavia Bertolli³, Alex Bridgland³, Alexey Cherepanov⁴, Miles Congreve⁴, Alexander I Cowen-Rivers³, Andrew Cowie³, Michael Figurnov³, Fabian B Fuchs³, Hannah Gladman³, Rishub Jain³, Yousuf A Khan³, Caroline M R Low⁴, Kuba Perlin³, Anna Potapenko³, Pascal Savy⁴, Sukhdeep Singh³, Adrian Stecula⁴, Ashok Thillaisundaram³, Catherine Tong⁴, Sergei Yakneen⁴, Ellen D Zhong³, Michal Zielinski³, Augustin Žídek³, Victor Bapst†¹, Pushmeet Kohli†¹, Max Jaderberg†², Demis Hassabis†¹,², John M Jumper†¹

* Contributed equally
¹ Core Contributor, Google DeepMind, London, UK
² Core Contributor, Isomorphic Labs, London, UK
³ Google DeepMind, London, UK
⁴ Isomorphic Labs, London, UK
† Jointly supervised

Corresponding author emails:
J.J. - jumper@google.com; D.H. - dhcontact@google.com; M.J. - jaderberg@isomorphiclabs.com
The introduction of AlphaFold 2^1 has spurred a revolution in modelling the structure of proteins and their interactions, enabling a huge range of applications in protein modelling and design^2–6. In this paper, we describe our AlphaFold 3 model with a substantially updated diffusion-based architecture, which is capable of joint structure prediction of complexes including proteins, nucleic acids, small molecules, ions, and modified residues. The new AlphaFold model demonstrates significantly improved accuracy over many previous specialised tools: far greater accuracy on protein-ligand interactions than state-of-the-art docking tools, much higher accuracy on protein-nucleic acid interactions than nucleic-acid-specific predictors, and significantly higher antibody-antigen prediction accuracy than AlphaFold-Multimer v2.3^7,8. Together these results show that high-accuracy modelling across biomolecular space is possible within a single unified deep learning framework.
Main Text

Introduction
Accurate models of biological complexes are critical to our understanding of cellular functions and for the rational design of therapeutics^2–4,9. Enormous progress has been achieved in protein structure prediction with the development of AlphaFold^1, and the field has grown tremendously with a number of later methods that build on the ideas and techniques of AlphaFold 2^10–12. Almost immediately after AlphaFold became available, it was shown that simple input modifications would enable surprisingly accurate protein interaction predictions^13–15 and that training AlphaFold 2 specifically for protein interaction prediction yielded a highly accurate system^7.
These successes led to the question of whether it is possible to accurately predict the structure of complexes containing a much wider range of biomolecules, including ligands, ions, nucleic acids, and modified residues, within a deep learning framework. A wide range of predictors for various specific interaction types have been developed^16–28, as well as one generalist method developed concurrently with the present work^29, but the accuracy of such deep learning attempts has been mixed and often below that of physics-inspired methods^30,31. Almost all these methods are also highly specialised to particular interaction types and cannot predict the structure of general biomolecular complexes containing many types of entities.
Here, we present AlphaFold 3 (AF3), a model that is capable of high-accuracy prediction of complexes containing nearly all molecular types present in the Protein Data Bank^32 (PDB) (Fig. 1a,b). In all but one category it achieves significantly higher performance than strong methods that specialise in just the given task (Fig. 1c, Extended Data Table 1), including higher accuracy at protein structure and the structure of protein-protein interactions.
This is achieved by a substantial evolution of the AlphaFold 2 architecture and training procedure (Fig. 1d), both to accommodate more general chemical structures and to improve the data efficiency of learning. The system reduces the amount of multiple sequence alignment (MSA) processing by replacing the AlphaFold 2 Evoformer with the simpler Pairformer Module (Fig. 2a). Furthermore, it directly predicts the raw atom coordinates with a Diffusion Module, replacing the AlphaFold 2 Structure Module that operated on amino-acid-specific frames and side chain torsion angles (Fig. 2b). The multiscale nature of the diffusion process (low noise levels induce the network to improve local structure) also allows us to eliminate stereochemical losses and most special handling of bonding patterns in the network, easily accommodating arbitrary chemical components.
Network architecture and training

The overall structure of AF3 (Fig. 1d, Supplementary Methods 3) echoes that of AlphaFold 2, with a large trunk evolving a pairwise representation of the chemical complex followed by a Structure Module that uses the pairwise representation to generate explicit atomic positions, but there are large differences in each major component. These modifications were driven both by the need to accommodate a wide range of chemical entities without excessive special-casing and by observations of AlphaFold 2 performance with different modifications. Within the trunk, MSA processing is substantially de-emphasized, with a much smaller and simpler MSA embedding block (Supplementary Methods 3.3). Compared to the original Evoformer from AlphaFold 2, the number of blocks is reduced to four, the processing of the MSA representation uses an inexpensive pair-weighted averaging, and only the pair representation is used for later processing steps. The "Pairformer" (Fig. 2a, Supplementary Methods 3.6) replaces the "Evoformer" of AlphaFold 2 as the dominant processing block. It operates only on the pair representation and the single representation; the MSA representation is not retained and all information passes via the pair representation. The pair processing and the number of blocks (48) are largely unchanged from AlphaFold 2. The resulting pair and single representations, together with the input representation, are passed to the new Diffusion Module (Fig. 2b) that replaces the Structure Module of AlphaFold 2.
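As an illustration of the pair-weighted averaging idea described above, the following minimal numpy sketch shows one plausible reading: mixing weights over residue positions are derived from the pair representation and used to average value-projected MSA features. All function names, tensor shapes, and the projection scheme are assumptions for illustration, not the model's actual implementation (see Supplementary Methods 3.3 for that).

```python
import numpy as np

def pair_weighted_average(msa, pair, w_proj, v_proj):
    """Sketch: aggregate MSA features across residues with weights
    derived from the pair representation (hypothetical shapes).

    msa:    (n_seq, n_res, c_m) MSA representation
    pair:   (n_res, n_res, c_z) pair representation
    w_proj: (c_z,)              projects pair features to a scalar logit
    v_proj: (c_m, c_m)          value projection of the MSA features
    """
    # One logit per residue pair, normalised over the second index.
    logits = pair @ w_proj                                   # (n_res, n_res)
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)

    values = msa @ v_proj                                    # (n_seq, n_res, c_m)
    # Residue i aggregates values from all residues j, weighted by weights[i, j].
    return np.einsum('ij,sjc->sic', weights, values)
```

This is cheaper than full attention over the MSA because the weights depend only on the pair representation, not on per-sequence queries and keys.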
The Diffusion Module (Fig. 2b, Supplementary Methods 3.7) operates directly on raw atom coordinates, and on a coarse abstract token representation, without rotational frames or any equivariant processing. We had observed in AlphaFold 2 that removing most of the complexity of the Structure Module had only a modest effect on prediction accuracy, and maintaining the backbone frame and side chain torsion representation added quite a bit of complexity for general molecular graphs. Similarly, AlphaFold 2 required carefully tuned stereochemical violation penalties during training to enforce chemical plausibility of the resulting structures. We use a relatively standard diffusion approach^34 in which the diffusion model is trained to receive "noised" atomic coordinates then predict the true coordinates. This task requires the network to learn protein structure at a variety of length scales, where the denoising task at small noise emphasises understanding very local stereochemistry and the denoising task at high noise emphasises the large-scale structure of the system. At inference time, random noise is sampled and then recurrently denoised to produce a final structure. Importantly, this is a generative training procedure which produces a distribution of answers. This means that, for each answer, the local structure will be sharply defined (e.g. side chain bond geometry) even when the network is uncertain about the positions. For this reason, we are able to avoid both torsion-based parametrizations of the residues and violation losses on the structure, while handling the full complexity of general ligands. Similarly to some recent work^35, we find that no invariance or equivariance with respect to global rotations and translations of the molecule is required in the architecture, and so we omit them to simplify the machine learning architecture.
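The noise-then-denoise training task and the recurrent denoising at inference can be sketched as follows. This is a generic diffusion skeleton under assumed conventions (a stand-in identity denoiser, a simple geometric-style noise schedule), not the paper's actual parameterisation, schedule, or loss weighting (those are in Supplementary Methods 3.7).

```python
import numpy as np

rng = np.random.default_rng(0)

def denoiser(noisy_coords, noise_level):
    # Stand-in for the trained network, which would predict the clean
    # coordinates from noised ones; here it simply returns its input.
    return noisy_coords

def diffusion_train_step(true_coords, sigma):
    """One denoising training example: add Gaussian noise at level sigma,
    ask the network for the clean coordinates, score with a squared error."""
    noised = true_coords + sigma * rng.standard_normal(true_coords.shape)
    pred = denoiser(noised, sigma)
    return np.mean((pred - true_coords) ** 2)

def diffusion_sample(n_atoms, sigmas):
    """Inference: start from pure noise and recurrently denoise through a
    decreasing noise schedule, keeping sigma_lo worth of noise each step."""
    x = sigmas[0] * rng.standard_normal((n_atoms, 3))
    for s_hi, s_lo in zip(sigmas[:-1], sigmas[1:]):
        x_hat = denoiser(x, s_hi)
        x = x_hat + (s_lo / s_hi) * (x - x_hat)
    return x
```

Training touches only a single noise level per example, which is why confidence estimation needs the separate rollout procedure described later in the text.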
The use of a generative diffusion approach comes with some technical challenges that we needed to address. The biggest issue is that generative models are prone to hallucination^36, where the model may invent plausible-looking structure even in unstructured regions. To counteract this effect, we use a novel cross-distillation method in which we enrich the training data with AlphaFold-Multimer v2.3^7,8 predicted structures. In these structures, unstructured regions are typically represented by long extended loops instead of compact structures, and training on them "teaches" AlphaFold 3 to mimic this behaviour. This cross-distillation greatly reduced the hallucination behaviour of AF3 (see Extended Data Fig. 1 for disorder prediction results on the CAID 2^37 benchmark set).
We also developed confidence measures that predict the atom-level and pairwise errors in our final structures. In AlphaFold 2, this was done by directly regressing the error in the output of the Structure Module during training. This procedure is not applicable to diffusion training, however, since only a single step of the diffusion is trained instead of a full structure generation (Fig. 2c). To remedy this, we developed a diffusion "rollout" procedure for the full structure generation during training (using a larger step size than normal; see Fig. 2c, "mini-rollout"). This predicted structure is then used to permute the symmetric ground-truth chains and ligands, and to compute the performance metrics used to train the confidence head. The confidence head uses the pairwise representation to predict the LDDT (pLDDT) and a predicted aligned error (PAE) matrix as in AlphaFold 2, as well as a predicted distance error (PDE) matrix, which is the error in the distance matrix of the predicted structure as compared to the true structure (see Supplementary Methods 4.3 for details).
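The regression targets for these confidence heads can be illustrated concretely. Below is a simplified sketch of the pairwise distance error underlying the PDE, and a global, single-point-per-residue variant of LDDT; the real metrics bin the errors, select specific atoms, and average per residue, so treat this only as a statement of the quantities involved.

```python
import numpy as np

def distance_matrix(x):
    """All-pairs Euclidean distances for points x of shape (n, 3)."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

def pde_target(pred, true):
    """Pairwise distance error |d_pred(i,j) - d_true(i,j)|: the quantity the
    PDE head is trained to predict (the paper's head predicts binned values)."""
    return np.abs(distance_matrix(pred) - distance_matrix(true))

def lddt(pred, true, cutoff=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified global LDDT: fraction of true-structure neighbour pairs whose
    distance is preserved within each threshold, averaged over thresholds."""
    d_true = distance_matrix(true)
    d_pred = distance_matrix(pred)
    # Score pairs within `cutoff` in the true structure, excluding i == j.
    mask = (d_true < cutoff) & ~np.eye(len(true), dtype=bool)
    dd = np.abs(d_true - d_pred)[mask]
    return float(np.mean([(dd < t).mean() for t in thresholds]))
```

Both quantities are invariant to global rotation and translation of the prediction, which is what makes them suitable targets for a model without equivariant processing.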
Fig. 2d shows that during initial training the model quickly learns to predict the local structures (all intra-chain metrics go up quickly and reach 97% of the maximum performance within the first 20k training steps), while the model needs considerably longer to learn the global constellation (the interface metrics go up slowly, and the protein-protein interface LDDT passes the 97% bar only after 60k steps). During AF3 development we observed that some model capabilities topped out relatively early and started to decline (most likely due to overfitting to the limited number of training samples for this capability), while other capabilities were still undertrained. We addressed this by increasing or decreasing the sampling probability for the corresponding training sets (Supplementary Methods 2.5.1) and by early stopping using a weighted average of all the above metrics and some additional metrics to select the best model checkpoint (Supplementary Table 7). The fine-tuning stages with larger crop sizes improve the model on all metrics, with an especially high uplift on protein-protein interfaces (Extended Data Fig. 2).
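Mechanically, both interventions above reduce to weighted sampling and a scalar selection criterion. The sketch below uses invented dataset names and weights purely for illustration; the actual mixture probabilities and metric weights are given in Supplementary Methods 2.5.1 and Supplementary Table 7 and are not reproduced here.

```python
import random

# Hypothetical mixture weights over training sets (names and values invented).
DATASET_WEIGHTS = {"pdb": 0.6, "distillation": 0.3, "disorder_cross_distillation": 0.1}

def sample_dataset(rng):
    """Draw the training set for the next example according to the mixture."""
    names = list(DATASET_WEIGHTS)
    return rng.choices(names, weights=[DATASET_WEIGHTS[n] for n in names], k=1)[0]

def checkpoint_score(metrics, metric_weights):
    """Weighted average of validation metrics used for early stopping and
    best-checkpoint selection (weights here are illustrative only)."""
    total = sum(metric_weights.values())
    return sum(metrics[name] * w for name, w in metric_weights.items()) / total
```

Raising a dataset's weight counteracts undertraining of the corresponding capability; lowering it slows the overfitting described above.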
Accuracy across complex types

AF3 can predict structures from input polymer sequences, residue modifications, and ligand SMILES. In Fig. 3 we show a selection of examples highlighting the ability of the model to generalise to a number of biologically important and therapeutically relevant modalities. In selecting these examples, we considered novelty in terms of the similarity of individual chains and interfaces to the training set (additional information in Supplementary Methods 8.1).
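To make the input modalities named above concrete, here is a hypothetical job specification. Every field name is invented for illustration and is not the released AF3 input format; only the kinds of content (polymer sequences, a ligand as a SMILES string, ions, a modified residue by CCD code) come from the text.

```python
# Hypothetical input specification (field names invented, NOT the real format).
job = {
    "protein_chains": [{"id": "A", "sequence": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"}],
    "rna_chains": [{"id": "B", "sequence": "GGGCUAUU"}],
    "ligands": [{"id": "L", "smiles": "CC(=O)Oc1ccccc1C(=O)O"}],  # aspirin as SMILES
    "ions": ["MG"],
    "modified_residues": [{"chain": "A", "position": 3, "ccd_code": "TPO"}],  # phosphothreonine
}
```

A single specification of this shape covers all entity types the model handles jointly, which is the point of the unified framework.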