## Linear Attention Transformer
<img src="./linear-attention.png" width="700px" />
[![PyPI version](https://badge.fury.io/py/linear-attention-transformer.svg)](https://badge.fury.io/py/linear-attention-transformer)
A fully featured Transformer that mixes (QKᵀ)V local attention with Q(KᵀV) global attention (scales linearly with respect to sequence length) for efficient long-range language modeling.
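The gist of the global attention is just the associativity of matrix multiplication: instead of materializing the full `n × n` map in (QKᵀ)V, keys are first aggregated against the values so that Q(KᵀV) only ever forms `d × d` matrices. Below is a minimal non-causal sketch in plain PyTorch, using the softmax feature maps of the efficient-attention formulation for illustration (the library's actual causal handling, blindspot, and local attention are more involved):

```python
import torch

def linear_attention(q, k, v):
    # q, k, v: (batch, heads, seq_len, dim_head)
    q = q.softmax(dim = -1)                               # feature map over the head dimension
    k = k.softmax(dim = -2)                               # feature map over the sequence dimension
    context = torch.einsum('bhnd,bhne->bhde', k, v)       # KᵀV: a (dim_head x dim_head) summary, linear in seq_len
    return torch.einsum('bhnd,bhde->bhne', q, context)    # Q(KᵀV)

q, k, v = (torch.randn(1, 8, 8192, 64) for _ in range(3))
linear_attention(q, k, v).shape  # torch.Size([1, 8, 8192, 64])
```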
## Install
```bash
$ pip install linear-attention-transformer
```
## Usage
Language model
```python
import torch
from linear_attention_transformer import LinearAttentionTransformerLM
model = LinearAttentionTransformerLM(
    num_tokens = 20000,
    dim = 512,
    heads = 8,
    depth = 1,
    max_seq_len = 8192,
    causal = True,                  # auto-regressive or not
    ff_dropout = 0.1,               # dropout for feedforward
    attn_layer_dropout = 0.1,       # dropout right after self-attention layer
    attn_dropout = 0.1,             # dropout post-attention
    emb_dim = 128,                  # embedding factorization, to save on memory
    dim_head = 128,                 # fixes the dimension of each head, making it independent of the embedding dimension and the number of heads
    blindspot_size = 64,            # in the causal case, the q(kv) attention cannot see the most recent 64 tokens, in exchange for roughly an order of magnitude in memory savings. should be paired with local attention whose window size is at least this value. setting this to 1 restores full q(kv) attention over the past
    n_local_attn_heads = 4,         # number of local attention heads for (qk)v attention. can also be a tuple specifying the number of local attention heads at each depth
    local_attn_window_size = 128,   # receptive field of the local attention
    reversible = True,              # use reversible nets, from the Reformer paper
    ff_chunks = 2,                  # feedforward chunking, from the Reformer paper
    ff_glu = True,                  # use the GLU variant for the feedforward
    attend_axially = False          # folds the sequence by the local attention window size and runs an extra strided attention pass, followed by a feedforward, using the cheap q(kv) attention
).cuda()
x = torch.randint(0, 20000, (1, 8192)).cuda()
model(x) # (1, 8192, 20000)
```
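The language model outputs per-token logits over the vocabulary, so training is plain next-token cross entropy. Here is a minimal sketch of a single training step (the optimizer, learning rate, and shifting convention are illustrative choices, not part of the library):

```python
import torch
import torch.nn.functional as F

optim = torch.optim.Adam(model.parameters(), lr = 1e-4)

x = torch.randint(0, 20000, (1, 8192)).cuda()
logits = model(x[:, :-1])                    # predict token t + 1 from tokens up to t
loss = F.cross_entropy(
    logits.transpose(1, 2),                  # (batch, num_tokens, seq_len - 1)
    x[:, 1:]                                 # shifted targets
)
loss.backward()
optim.step()
optim.zero_grad()
```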
Transformer
```python
import torch
from linear_attention_transformer import LinearAttentionTransformer
model = LinearAttentionTransformer(
    dim = 512,
    heads = 8,
    depth = 1,
    max_seq_len = 8192,
    n_local_attn_heads = 4
).cuda()
x = torch.randn(1, 8192, 512).cuda()
model(x) # (1, 8192, 512)
```
Encoder / decoder
```python
import torch
from linear_attention_transformer import LinearAttentionTransformerLM
enc = LinearAttentionTransformerLM(
    num_tokens = 20000,
    dim = 512,
    heads = 8,
    depth = 6,
    max_seq_len = 4096,
    reversible = True,
    n_local_attn_heads = 4,
    return_embeddings = True
).cuda()
dec = LinearAttentionTransformerLM(
    num_tokens = 20000,
    dim = 512,
    heads = 8,
    depth = 6,
    causal = True,
    max_seq_len = 4096,
    reversible = True,
    receives_context = True,
    n_local_attn_heads = 4
).cuda()
src = torch.randint(0, 20000, (1, 4096)).cuda()
src_mask = torch.ones_like(src).bool().cuda()
tgt = torch.randint(0, 20000, (1, 4096)).cuda()
tgt_mask = torch.ones_like(tgt).bool().cuda()
context = enc(src, input_mask = src_mask)
logits = dec(tgt, context = context, input_mask = tgt_mask, context_mask = src_mask)
```
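At inference time, decoding is the usual autoregressive loop: encode the source once, then repeatedly feed the growing target prefix to the decoder and append the argmax of the last position's logits. A minimal greedy-decoding sketch reusing `enc`, `dec`, `src`, and `src_mask` from above (the start-token id `0` and the fixed number of steps are placeholders):

```python
context = enc(src, input_mask = src_mask)

out = torch.zeros((1, 1), dtype = torch.long).cuda()  # stand-in start token

for _ in range(100):
    logits = dec(out, context = context, context_mask = src_mask)
    next_token = logits[:, -1].argmax(dim = -1, keepdim = True)
    out = torch.cat((out, next_token), dim = 1)
```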
## Linformer
Linformer is another attention variant with linear complexity, championed by Facebook AI. It only works with non-autoregressive models of a fixed sequence length. If your problem satisfies those constraints, you may want to try it out.
```python
from linear_attention_transformer import LinearAttentionTransformerLM, LinformerSettings
settings = LinformerSettings(k = 256)
enc = LinearAttentionTransformerLM(
    num_tokens = 20000,
    dim = 512,
    heads = 8,
    depth = 6,
    max_seq_len = 4096,
    linformer_settings = settings
).cuda()
```
You can also use Linformer for the contextual attention layer, provided the contextual keys have a fixed sequence length.
```python
from linear_attention_transformer import LinearAttentionTransformerLM, LinformerContextSettings
settings = LinformerContextSettings(
    seq_len = 2048,
    k = 256
)
dec = LinearAttentionTransformerLM(
    num_tokens = 20000,
    dim = 512,
    heads = 8,
    depth = 6,
    max_seq_len = 4096,
    causal = True,
    context_linformer_settings = settings,
    receives_context = True
).cuda()
```
## Images
This repository also contains a concise implementation of this efficient attention for images.
```python
import torch
from linear_attention_transformer.images import ImageLinearAttention
attn = ImageLinearAttention(
    chan = 32,
    heads = 8,
    key_dim = 64  # can be decreased to 32 for more memory savings
)
img = torch.randn(1, 32, 256, 256)
attn(img) # (1, 32, 256, 256)
```
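The module maps `(batch, chan, height, width)` to the same shape, so it can slot into a convolutional network as a residual block. A small sketch of that usage (the surrounding layers are arbitrary):

```python
import torch
from torch import nn
from linear_attention_transformer.images import ImageLinearAttention

class ResidualImageAttention(nn.Module):
    def __init__(self, chan):
        super().__init__()
        self.attn = ImageLinearAttention(chan = chan, heads = 8, key_dim = 64)

    def forward(self, x):
        return x + self.attn(x)  # residual connection around the attention

net = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding = 1),
    ResidualImageAttention(32),
    nn.Conv2d(32, 32, 3, padding = 1)
)

net(torch.randn(1, 3, 64, 64)).shape  # torch.Size([1, 32, 64, 64])
```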
## Citations
```bibtex
@inproceedings{katharopoulos-et-al-2020,
author = {Katharopoulos, A. and Vyas, A. and Pappas, N. and Fleuret, F.},
title = {Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention},
booktitle = {Proceedings of the International Conference on Machine Learning (ICML)},
year = {2020},
url = {https://arxiv.org/abs/2006.16236}
}
```
```bibtex
@article{shen2019efficient,
author = {Zhuoran Shen and
Mingyuan Zhang and
Haiyu Zhao and
Shuai Yi and
Hongsheng Li},
title = {Efficient Attention: Attention with Linear Complexities},
journal = {CoRR},
volume = {abs/1812.01243},
year = {2018},
url = {http://arxiv.org/abs/1812.01243}
}
```
```bibtex
@inproceedings{kitaev2020reformer,
title = {Reformer: The Efficient Transformer},
author = {Nikita Kitaev and Lukasz Kaiser and Anselm Levskaya},
booktitle = {International Conference on Learning Representations},
year = {2020},
url = {https://openreview.net/forum?id=rkgNKkHtvB}
}
```
```bibtex
@misc{shazeer2020glu,
title = {GLU Variants Improve Transformer},
author = {Noam Shazeer},
year = {2020},
url = {https://arxiv.org/abs/2002.05202}
}
```
```bibtex
@misc{wang2020linformer,
title = {Linformer: Self-Attention with Linear Complexity},
author = {Sinong Wang and Belinda Z. Li and Madian Khabsa and Han Fang and Hao Ma},
year = {2020},
eprint = {2006.04768}
}
```
```bibtex
@misc{bhojanapalli2020lowrank,
title = {Low-Rank Bottleneck in Multi-head Attention Models},
author = {Srinadh Bhojanapalli and Chulhee Yun and Ankit Singh Rawat and Sashank J. Reddi and Sanjiv Kumar},
year = {2020},
eprint = {2002.07028}
}
```