<img src="./enformer.png" width="450px"></img>
## Enformer - Pytorch
Implementation of <a href="https://deepmind.com/blog/article/enformer">Enformer</a>, DeepMind's attention network for predicting gene expression, in PyTorch. This repository also contains the means to fine-tune pretrained models for your downstream tasks. The original TensorFlow Sonnet code can be found <a href="https://github.com/deepmind/deepmind-research/tree/master/enformer">here</a>.
## Install
```bash
$ pip install enformer-pytorch
```
## Usage
```python
import torch
from enformer_pytorch import Enformer
model = Enformer(
    dim = 1536,
    depth = 11,
    heads = 8,
    output_heads = dict(human = 5313, mouse = 1643),
    target_length = 896,
)
seq = torch.randint(0, 5, (1, 196_608)) # for ACGTN, in that order (-1 for padding)
output = model(seq)
output['human'] # (1, 896, 5313)
output['mouse'] # (1, 896, 1643)
```
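If your sequences come as strings rather than index tensors, the encoding is just the ACGTN ordering noted in the comment above. A minimal sketch (the `encode_dna` helper and mapping below are illustrative, not part of the library):
```python
import torch

# hypothetical helper - maps a DNA string to the 0-4 (ACGTN) indices the model expects
NUCLEOTIDE_TO_INDEX = {'A': 0, 'C': 1, 'G': 2, 'T': 3, 'N': 4}

def encode_dna(sequence):
    return torch.tensor([NUCLEOTIDE_TO_INDEX[base] for base in sequence.upper()], dtype = torch.long)

seq = encode_dna('ACGT' * 49_152).unsqueeze(0) # (1, 196608)
output = model(seq) # same usage as above
```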
You can also directly pass in the sequence as one-hot encodings, which must be float values
```python
import torch
from enformer_pytorch import Enformer, seq_indices_to_one_hot
model = Enformer(
    dim = 1536,
    depth = 11,
    heads = 8,
    output_heads = dict(human = 5313, mouse = 1643),
    target_length = 896,
)
seq = torch.randint(0, 5, (1, 196_608))
one_hot = seq_indices_to_one_hot(seq)
output = model(one_hot)
output['human'] # (1, 896, 5313)
output['mouse'] # (1, 896, 1643)
```
Finally, one can fetch the embeddings, for fine-tuning and otherwise, by setting the `return_embeddings` flag to `True` on the forward pass
```python
import torch
from enformer_pytorch import Enformer, seq_indices_to_one_hot
model = Enformer(
    dim = 1536,
    depth = 11,
    heads = 8,
    output_heads = dict(human = 5313, mouse = 1643),
    target_length = 896,
)
seq = torch.randint(0, 5, (1, 196_608))
one_hot = seq_indices_to_one_hot(seq)
output, embeddings = model(one_hot, return_embeddings = True)
embeddings # (1, 896, 3072)
```
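The returned embeddings can feed any downstream module. Below is a minimal, illustrative sketch that projects them to a handful of hypothetical new tracks with a plain linear head (the head and the track count of 10 are assumptions for illustration; see the fine-tuning wrappers below for the built-in route):
```python
from torch import nn

# hypothetical linear head over the 3072-dim embeddings, predicting 10 new tracks
downstream_head = nn.Linear(3072, 10)

pred = downstream_head(embeddings) # (1, 896, 10)
```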
For training, you can directly pass the head and target in to get the Poisson loss
```python
import torch
from enformer_pytorch import Enformer, seq_indices_to_one_hot
model = Enformer(
    dim = 1536,
    depth = 11,
    heads = 8,
    output_heads = dict(human = 5313, mouse = 1643),
    target_length = 200,
).cuda()
seq = torch.randint(0, 5, (196_608 // 2,)).cuda()
target = torch.randn(200, 5313).cuda()
loss = model(
    seq,
    head = 'human',
    target = target
)
loss.backward()
# after much training
corr_coef = model(
    seq,
    head = 'human',
    target = target,
    return_corr_coef = True
)
corr_coef # Pearson R, used as a metric in the paper
```
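To wire the loss above into an actual training loop, standard PyTorch applies. A minimal sketch, assuming a hypothetical iterable `train_pairs` of `(seq, target)` tensors shaped as above (the optimizer and learning rate are illustrative, not values from the paper):
```python
from torch.optim import Adam

optim = Adam(model.parameters(), lr = 1e-4) # illustrative optimizer and learning rate

for seq, target in train_pairs: # hypothetical iterable of (seq, target) pairs
    loss = model(seq.cuda(), head = 'human', target = target.cuda())
    loss.backward()
    optim.step()
    optim.zero_grad()
```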
## Pretrained Model
Warning: the pretrained models so far have not hit the mark of what was presented in the paper. If you would like to help out, please join <a href="https://discord.com/invite/s7WyNU24aM">this Discord</a>. Replication efforts are ongoing.
To use a pretrained model (which may not yet match the quality of the one in the paper), first install `gdown`
```bash
$ pip install gdown
```
Then
```python
from enformer_pytorch import load_pretrained_model
model = load_pretrained_model('preview')
# do your fine-tuning
```
You can also override the `target_length` parameter when loading, if you are working with shorter sequence lengths
```python
from enformer_pytorch import load_pretrained_model
model = load_pretrained_model('preview', target_length = 128, dropout_rate = 0.1)
# do your fine-tuning
```
You can also define the model externally, then load the pretrained weights into it by passing it to `load_pretrained_model`
```python
from enformer_pytorch import Enformer, load_pretrained_model
enformer = Enformer(dim = 1536, depth = 11, target_length = 128, dropout_rate = 0.1)
load_pretrained_model('preview', model = enformer)
# use enformer
```
To save memory when fine-tuning a large Enformer model, turn on checkpointing
```python
from enformer_pytorch import Enformer, load_pretrained_model
enformer = load_pretrained_model('preview', use_checkpointing = True)
# finetune enformer on a limited budget
```
## Fine-tuning
This repository also allows for easy fine-tuning of Enformer.
Fine-tuning on new tracks
```python
import torch
from enformer_pytorch import Enformer
from enformer_pytorch.finetune import HeadAdapterWrapper
enformer = Enformer(
    dim = 1536,
    depth = 1,
    heads = 8,
    target_length = 200,
)
model = HeadAdapterWrapper(
    enformer = enformer,
    num_tracks = 128
).cuda()
seq = torch.randint(0, 5, (1, 196_608 // 2,)).cuda()
target = torch.randn(1, 200, 128).cuda() # 128 tracks
loss = model(seq, target = target)
loss.backward()
```
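If you only want to train the newly added head and keep the Enformer trunk frozen, ordinary PyTorch parameter freezing on the `enformer` instance above works (this is standard PyTorch, not a feature of the wrapper):
```python
# freeze the trunk so gradients only update the head adapter
for param in enformer.parameters():
    param.requires_grad = False

loss = model(seq, target = target)
loss.backward()
```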
Fine-tuning on contextual data (cell type, transcription factor, etc.)
```python
import torch
from enformer_pytorch import Enformer
from enformer_pytorch.finetune import ContextAdapterWrapper
enformer = Enformer(
    dim = 1536,
    depth = 1,
    heads = 8,
    target_length = 200,
)
model = ContextAdapterWrapper(
    enformer = enformer,
    context_dim = 1024
).cuda()
seq = torch.randint(0, 5, (1, 196_608 // 2,)).cuda()
target = torch.randn(1, 200, 4).cuda() # 4 tracks
context = torch.randn(4, 1024).cuda() # 4 contexts for the different 'tracks'
loss = model(
    seq,
    context = context,
    target = target
)
loss.backward()
```
Finally, there is also a way to use attention aggregation from a set of context embeddings (or a single context embedding). Simply use the `ContextAttentionAdapterWrapper`
```python
import torch
from enformer_pytorch import Enformer
from enformer_pytorch.finetune import ContextAttentionAdapterWrapper
enformer = Enformer(
    dim = 1536,
    depth = 1,
    heads = 8,
    target_length = 200,
)
model = ContextAttentionAdapterWrapper(
    enformer = enformer,
    context_dim = 1024,
    heads = 8,      # number of heads in the cross attention
    dim_head = 64   # dimension per head
).cuda()
seq = torch.randint(0, 5, (1, 196_608 // 2,)).cuda()
target = torch.randn(1, 200, 4).cuda() # 4 tracks
context = torch.randn(4, 16, 1024).cuda() # 4 contexts for the different 'tracks', each with 16 tokens
context_mask = torch.ones(4, 16).bool().cuda() # optional context mask, in example, include all context tokens
loss = model(
    seq,
    context = context,
    context_mask = context_mask,
    target = target
)
loss.backward()
```
## Data
You can use the `GenomeIntervalDataset` to easily fetch sequences of any length from a `.bed` file, with a greater context length dynamically computed if specified
```python
import torch
import polars as pl
from enformer_pytorch import Enformer, GenomeIntervalDataset
filter_train = lambda df: df.filter(pl.col('column_4') == 'train')
ds = GenomeIntervalDataset(
    bed_file = './sequences.bed',     # bed file - columns 0, 1, 2 must be <chromosome>, <start position>, <end position>
    fasta_file = './hg38.ml.fa',      # path to fasta file
    filter_df_fn = filter_train,      # filter dataframe function
    return_seq_indices = True,        # whether to return nucleotide indices (ACGTN) or one-hot encodings
    shift_augs = (-2, 2),             # random shift augmentations from -2 to +2 basepairs
    context_length = 196_608,
    # this can be longer than the interval designated in the .bed file,
    # in which case it will take care of lengthening the interval on either side
    # as well as proper padding if at the end of the chromosomes
    chr_bed_to_fasta_map = {
        'chr1': 'chromosome1',        # if the chromosome name in the .bed file differs from the key name in the fasta file, you can rename it on the fly
        'chr2': 'chromosome2',
        'chr3': 'chromosome3',
        # etc etc
    }
)
model = Enformer(
    dim = 1536,
    depth = 11,
    heads = 8,
    output_heads = dict(human = 5313, mouse = 1643),
    target_length = 896,
)
seq = ds[0] # (196608,)
pred = model(seq, head = 'human') # (896, 5313)
```
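Assuming `GenomeIntervalDataset` behaves like a standard map-style PyTorch dataset (as the indexing above suggests), it can be wrapped in a `torch.utils.data.DataLoader` for batched training. A minimal sketch using the `ds` and `model` defined above (the batch size is illustrative):
```python
from torch.utils.data import DataLoader

loader = DataLoader(ds, batch_size = 2, shuffle = True) # illustrative batch size

for seq in loader:
    pred = model(seq, head = 'human') # (2, 896, 5313)
```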
## App