A model pruning toolkit for pre-trained language models. It applies lightweight, fast structured pruning to reduce model size and speed up inference.

92 files in total

py: 53

md: 9

tsv: 6

Language models

Artificial intelligence

136 views · uploaded 2023-12-23 15:25:17 · 10.8MB ZIP

TextPruner-main.zip (92 files)

TextPruner-main

setup.py 3KB

.github

stale.yml 784B

src

textpruner

utils.py 8KB

__init__.py 235B

pruners

utils.py 6KB

__init__.py 141B

transformer_pruner.py 28KB

pipeline_pruner.py 8KB

vocabulary_pruner.py 5KB

tokenizer_utils

utils.py 1KB

__init__.py 361B

roberta_gpt2_tokenizer.py 2KB

sp_tokenizer.py 3KB

xlmr_sp_tokenizer.py 3KB

t5_sp_tokenizer.py 2KB

subword_tokenizer.py 1KB

xlm_tokenizer.py 2KB

mt5_sp_tokenizer.py 3KB

configurations.py 6KB

model_utils

utils.py 3KB

__init__.py 480B

xlm.py 2KB

xlm_roberta.py 983B

albert.py 951B

mt5.py 3KB

roberta.py 973B

model_structure.py 6KB

electra.py 973B

bert.py 961B

t5.py 3KB

bart.py 3KB

commands

utils.py 3KB

__init__.py 0B

functions.py 3KB

textpruner_cli.py 3KB

model_map.py 2KB

LICENSE 11KB

examples

pipeline_pruning

pipeline_pruning.py 2KB

pipeline_pruning.sh 350B

measure_performance.py 1KB

README.md 1000B

transformer_pruning_xnli

transformer_pruning_selfsupervised.py 2KB

measure_performance.py 1KB

README.md 861B

vocabulary_pruning_xnli

vocabulary_pruning.py 2KB

measure_performance.py 1KB

README.md 902B

transformer_pruning

transformer_pruning.sh 313B

transformer_pruning_with_masks.py 983B

transformer_pruning.py 2KB

measure_performance.py 1KB

README.md 1KB

datasets

xnli

en.tsv 17.8MB

README.md 268B

pawsx

dev-en.tsv 449KB

test-zh.tsv 432KB

translate-train

en.tsv 10.94MB

test-en.tsv 452KB

dev-zh.tsv 431KB

classification_utils

__init__.py 0B

dataloader_script.py 10KB

dataloader_script_xnli.py 10KB

predict_function.py 3KB

my_dataset.py 9KB

models

xlmr_xnli

README.md 76B

xlmr_pawsx

README.md 76B

vocabulary_pruning

vocabulary_pruning.sh 293B

vocabulary_pruning.py 2KB

MaskedLM_vocabulary_pruning.py 1KB

measure_performance.py 1KB

README.md 1KB

configurations

gc.json 96B

tc-masks.json 78B

vc.json 92B

tc-iterative.json 255B

docs

make.bat 804B

Makefile 638B

source

index.rst 2KB

conf.py 2KB

_static

css

custom.css 136B

APIs

Utils.rst 141B

Configurations.rst 457B

Pruners.rst 401B

requirements.txt 46B

CODE_OF_CONDUCT.md 5KB

pics

banner.png 89KB

hfl_qrcode.jpg 26KB

nav_banner.png 89KB

PruningModes.png 48KB

.gitignore 0B

MANIFEST.in 16B

.gitignore 86B

# Pruning the Classification model

These scripts perform vocabulary pruning on the classification model (`XLMRobertaForSequenceClassification`) and evaluate the performance.

We use a subset of the XNLI English training set as the vocabulary file.

Download the fine-tuned model or train your own model on the PAWS-X dataset, and save the files to `../models/xlmr_pawsx`.

Download links:
* [Google Drive](https://drive.google.com/drive/folders/1TXuIvcYJ0aje7WC-LyrxstzeJn4_383r?usp=sharing)
* [Hugging Face Models](https://huggingface.co/ziqingyang/XLMRobertaBaseForPAWSX-en/tree/main)

* Pruning with the textpruner-CLI tool:
```bash
bash vocabulary_pruning.sh
```

* Pruning with the python script:
```bash
VOCABULARY_FILE=../datasets/xnli/en.tsv
MODEL_PATH=../models/xlmr_pawsx
python vocabulary_pruning.py $MODEL_PATH $VOCABULARY_FILE
```

* Evaluate the model:

Set `$PRUNED_MODEL_PATH` to the directory where the pruned model is stored.
```bash
python measure_performance.py $PRUNED_MODEL_PATH
```

# Pruning the Pre-Trained models for MLM

This script prunes the pre-trained models for MLM with a vocabulary limited to the SST-2 training set.

Set `$MODEL_PATH` to the directory where the pre-trained model (BERT, RoBERTa, etc.) is stored.
```bash
python MaskedLM_vocabulary_pruning.py $MODEL_PATH
```
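
To illustrate the idea behind vocabulary pruning, here is a minimal, library-free sketch. It is not TextPruner's implementation: the function name `prune_vocabulary`, the toy vocabulary, and the list-of-lists embedding table are all assumptions made for this example. The sketch keeps only the tokens that occur in a corpus (plus special tokens), drops the embedding rows of all other tokens, and records how old token ids map to new ones.

```python
def prune_vocabulary(vocab, embeddings, corpus_tokens,
                     special_tokens=("<s>", "<pad>", "</s>", "<unk>")):
    """Toy vocabulary pruning (hypothetical helper, not a TextPruner API).

    vocab:         dict mapping token -> row index in `embeddings`
    embeddings:    list of embedding rows, one per token id
    corpus_tokens: iterable of tokens observed in the vocabulary file
    """
    # Tokens to keep: everything seen in the corpus, plus special tokens.
    seen = set(corpus_tokens) | set(special_tokens)

    # Walk the vocabulary in original id order so kept tokens stay sorted.
    kept = [(tok, old_id)
            for tok, old_id in sorted(vocab.items(), key=lambda kv: kv[1])
            if tok in seen]

    # Assign new contiguous ids and copy only the surviving embedding rows.
    new_vocab = {tok: new_id for new_id, (tok, _) in enumerate(kept)}
    new_embeddings = [embeddings[old_id] for _, old_id in kept]
    id_map = {old_id: new_vocab[tok] for tok, old_id in kept}
    return new_vocab, new_embeddings, id_map


# Toy multilingual vocabulary; only the English tokens occur in the corpus.
vocab = {"<s>": 0, "<pad>": 1, "</s>": 2, "<unk>": 3,
         "bonjour": 4, "hello": 5, "monde": 6, "world": 7}
embeddings = [[float(i), float(i) + 0.5] for i in range(len(vocab))]
corpus_tokens = "hello world hello".split()

new_vocab, new_embeddings, id_map = prune_vocabulary(
    vocab, embeddings, corpus_tokens)
print(len(vocab), "->", len(new_vocab), "tokens")
```

In a real model, the same row selection would also have to be applied to any output layer tied to the embedding matrix, and the tokenizer files would need to be rewritten to match the new ids.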