# Pruning the Classification model
These scripts perform vocabulary pruning on the classification model (`XLMRobertaForSequenceClassification`) and evaluate the performance.
We use a subset of the XNLI English training set as the vocabulary file.
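To make the idea concrete, here is a minimal, library-free sketch of what vocabulary pruning does conceptually: tokens that never occur in the vocabulary file are dropped, the remaining tokens are renumbered, and the embedding matrix keeps only the matching rows. The function name `prune_vocabulary`, the toy vocabulary, and the special-token list are illustrative assumptions, not TextPruner's actual implementation.

```python
from typing import Dict, List, Sequence, Tuple


def prune_vocabulary(
    vocab: Dict[str, int],
    embeddings: Sequence[Sequence[float]],
    corpus_tokens: Sequence[Sequence[str]],
    special_tokens: Sequence[str] = ("<s>", "</s>", "<pad>", "<unk>"),
) -> Tuple[Dict[str, int], List[List[float]]]:
    """Keep only tokens seen in the corpus, plus the special tokens."""
    seen = {tok for sent in corpus_tokens for tok in sent if tok in vocab}
    seen.update(tok for tok in special_tokens if tok in vocab)
    # Preserve the original relative order of the kept tokens.
    kept = sorted(seen, key=lambda tok: vocab[tok])
    # Renumber the kept tokens contiguously from 0.
    new_vocab = {tok: new_id for new_id, tok in enumerate(kept)}
    # Keep only the embedding rows that belong to kept tokens.
    new_embeddings = [list(embeddings[vocab[tok]]) for tok in kept]
    return new_vocab, new_embeddings


# Toy multilingual vocabulary with one 2-dimensional embedding row per token.
vocab = {"<s>": 0, "</s>": 1, "<pad>": 2, "<unk>": 3,
         "hello": 4, "bonjour": 5, "world": 6, "monde": 7}
embeddings = [[float(i), float(i) + 0.5] for i in range(len(vocab))]

# Pretend this is the tokenized English vocabulary file.
corpus_tokens = [["hello", "world"], ["hello"]]

new_vocab, new_embeddings = prune_vocabulary(vocab, embeddings, corpus_tokens)
print(new_vocab)
print(len(embeddings), "->", len(new_embeddings))
```

For a multilingual model such as XLM-R, most of the vocabulary belongs to languages that an English-only task never touches, which is why restricting the vocabulary to an English corpus can remove a large share of the embedding parameters.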
Download the fine-tuned model, or train your own model on the PAWS-X dataset, and save the files to `../models/xlmr_pawsx`.
Download links:
* [Google Drive](https://drive.google.com/drive/folders/1TXuIvcYJ0aje7WC-LyrxstzeJn4_383r?usp=sharing)
* [Hugging Face Models](https://huggingface.co/ziqingyang/XLMRobertaBaseForPAWSX-en/tree/main)
* Pruning with the `textpruner-cli` tool:
```bash
bash vocabulary_pruning.sh
```
* Pruning with the Python script:
```bash
VOCABULARY_FILE=../datasets/xnli/en.tsv
MODEL_PATH=../models/xlmr_pawsx
python vocabulary_pruning.py $MODEL_PATH $VOCABULARY_FILE
```
* Evaluate the model:
Set `$PRUNED_MODEL_PATH` to the directory where the pruned model is stored.
```bash
python measure_performance.py $PRUNED_MODEL_PATH
```
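The Python-script route above can be sketched roughly as follows. This is not the content of `vocabulary_pruning.py`; it is a minimal illustration that assumes the `VocabularyPruner(model, tokenizer)` constructor and the `prune(dataiter=...)` method described in TextPruner's documentation, and that each non-empty line of the vocabulary file is treated as one text. The helper names `read_texts` and `prune_with_textpruner` are hypothetical, and the exact behavior should be checked against the installed TextPruner version.

```python
from typing import List


def read_texts(path: str) -> List[str]:
    """Read the vocabulary file, returning one text per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def prune_with_textpruner(model_path: str, vocabulary_file: str) -> None:
    """Load the fine-tuned classifier and prune its vocabulary (sketch)."""
    # Imported lazily so that read_texts works without these packages.
    from transformers import (
        XLMRobertaForSequenceClassification,
        XLMRobertaTokenizer,
    )
    from textpruner import VocabularyPruner  # assumed API, see lead-in

    model = XLMRobertaForSequenceClassification.from_pretrained(model_path)
    tokenizer = XLMRobertaTokenizer.from_pretrained(model_path)
    texts = read_texts(vocabulary_file)

    # Tokens that never appear in `texts` are removed from the model
    # and the tokenizer.
    pruner = VocabularyPruner(model, tokenizer)
    pruner.prune(dataiter=texts)
```

With the paths used above, the call would be `prune_with_textpruner("../models/xlmr_pawsx", "../datasets/xnli/en.tsv")`. The location of the pruned model is determined by TextPruner's configuration and is what `$PRUNED_MODEL_PATH` should point to.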
# Pruning the Pre-Trained models for MLM
This script prunes a pre-trained masked language model (MLM), restricting its vocabulary to the tokens that appear in the SST-2 training set.
Set `$MODEL_PATH` to the directory where the pre-trained model (BERT, RoBERTa, etc.) is stored.
```bash
python MaskedLM_vocabulary_pruning.py $MODEL_PATH
```
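A masked language model has a vocabulary-sized output layer in addition to the input embeddings, so pruning the vocabulary has to trim both with the same set of kept token ids. The sketch below shows this for a toy output layer in plain Python. The names `mlm_logits` and `prune_mlm_head` and all numbers are illustrative assumptions and do not reflect how `MaskedLM_vocabulary_pruning.py` is written.

```python
from typing import List, Sequence, Tuple


def mlm_logits(
    hidden: Sequence[float],
    decoder_weight: Sequence[Sequence[float]],
    decoder_bias: Sequence[float],
) -> List[float]:
    """Compute one logit per vocabulary entry: weight_row . hidden + bias."""
    return [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(decoder_weight, decoder_bias)
    ]


def prune_mlm_head(
    decoder_weight: Sequence[Sequence[float]],
    decoder_bias: Sequence[float],
    kept_ids: Sequence[int],
) -> Tuple[List[List[float]], List[float]]:
    """Keep only the output rows and bias entries of the kept token ids."""
    new_weight = [list(decoder_weight[i]) for i in kept_ids]
    new_bias = [decoder_bias[i] for i in kept_ids]
    return new_weight, new_bias


# Toy output layer: 5 vocabulary entries, hidden size 2.
decoder_weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0], [0.5, 0.5]]
decoder_bias = [0.0, 0.1, 0.2, 0.3, 0.4]
kept_ids = [0, 2, 4]  # ids of tokens that occur in the vocabulary file
hidden = [3.0, 2.0]   # hidden state at a masked position

full = mlm_logits(hidden, decoder_weight, decoder_bias)
pruned_weight, pruned_bias = prune_mlm_head(decoder_weight, decoder_bias, kept_ids)
pruned = mlm_logits(hidden, pruned_weight, pruned_bias)

# The logits of kept tokens are unchanged by pruning.
print(pruned == [full[i] for i in kept_ids])
```

The logits of the kept tokens are identical before and after pruning. The softmax probabilities can still differ, because normalization now runs over fewer tokens.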