Wuhan University Undergraduate Graduation Project: Chinese Semantic Role Labeling Based on Self-Attention (.zip resource, CSDN Library)

72 files in total

py: 41

result: 10

txt: 7


83 views, uploaded 2023-10-05 01:22:42, 2.83MB ZIP


Contents of the .zip archive (72 files):

Graduation Design

visual.txt 408B

validation.sh 792B

attention.png 56KB

tagger

__init__.py 0B

optimizers

__init__.py 461B

optimizers.py 16KB

clipping.py 2KB

schedules.py 9KB

data

__init__.py 147B

embedding.py 2KB

dataset.py 6KB

vocab.py 3KB

modules

__init__.py 491B

embedding.py 1KB

losses.py 2KB

module.py 577B

batch_norm.py 3KB

recurrent.py 11KB

affine.py 2KB

layer_norm.py 2KB

attention.py 3KB

feed_forward.py 2KB

utils

__init__.py 242B

checkpoint.py 2KB

hparams.py 4KB

validationThread.py 5KB

misc.py 309B

summary.py 2KB

scope.py 2KB

validation.py 5KB

bin

trainer.py 15KB

predictor.py 5KB

models

__init__.py 479B

deepatt.py 7KB

lstmatt.py 7KB

scripts

__init__.py 0B

visualization.py 3KB

convert_to_conll.py 3KB

build_vocab.py 3KB

preprocess

process_conll2012.py 8KB

subword.py 3KB

shuffle.py 2KB

processor.py 9KB

decoder.sh 561B

data

test

conll2012.devel.props.gold.txt 145B

conll2012.train.txt 924B

label.txt 67B

conll2012.devel.txt 250B

vocab.txt 407B

special.py 3KB

decode.txt 179KB

.idea

vcs.xml 180B

Tagger.iml 326B

misc.xml 310B

inspectionProfiles

profiles_settings.xml 174B

modules.xml 264B

.gitignore 47B

make_conll2012_data.sh 1KB

test.sh 749B

.gitignore 4KB

run.sh 2KB

results

conll05

ensemble

conll05.dev.result 1.86MB

conll05.wsj.result 2.99MB

conll05.brown.result 388KB

single

conll05.dev.result 1.86MB

conll05.wsj.result 2.99MB

conll05.brown.result 388KB

conll12

ensemble

conll12.test.result 11.75MB

conll12.dev.result 18.04MB

single

conll12.test.result 11.75MB

conll12.devel.result 18.04MB

README.md 2KB

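# Chinese Semantic Role Labeling Based on Self-Attention

The model is based on [Deep Semantic Role Labeling with Self-Attention](https://github.com/XMUNLP/Tagger).

# Data Preprocessing

## Obtaining the data

Obtain the OntoNotes 5.0 data from LDC: https://catalog.ldc.upenn.edu/LDC2013T19

## Converting the data to CoNLL format

Follow this tutorial to convert the data to CoNLL format: http://conll.cemantix.org/2012/data.html

## Data processing script

Edit the variables in the make_conll2012_data.sh script.

```shell script
# Paths to the training, development, and test sets
TRAIN=".../conll-2012/v4/data/train/data/chinese/annotations"
DEV=".../conll-2012/v4/data/development/data/chinese/annotations"
TEST=".../conll-2012/v9/data/test/data/chinese/annotations"
```

Then run the script:

```shell script
make_conll2012_data.sh
```

After it runs, .txt data files are generated under the data/srl directory, along with an exclude folder (which separately holds the special labels specified in the script).

The processed data has the following format:

```text
2 My cats love hats . ||| B-A0 I-A0 B-V B-A1 O
```

## Building the vocabulary

```shell script
# limit sets the vocabulary size; lower converts tokens to lowercase
python tagger/scripts/build_vocab.py --limit 20000 --lower data/srl/conll2012.train.txt data/srl
```

# Running

## Editing the scripts

Edit the variables in the run.sh and validation.sh scripts:

```shell script
TAGGERPATH=<root directory of this project>
```

Adjust the `parameters` argument as needed.

## Run

```shell script
./run.sh
```

## Validation

```shell script
./validation.sh
```

# Results

## Attention visualization

Copy the data to be visualized into visual.txt, then run:

```shell script
python tagger/scripts/visualization.py train visual.txt --embedding EMBEDDING
```

## Using pretrained vectors

Note: if the embedding file begins with a line giving the vocabulary size, that line must be deleted.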
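To make the processed data format from the README more concrete, here is a minimal Python sketch that splits one such line into its parts. The names `SrlExample` and `parse_srl_line` are hypothetical and do not come from the project's code. Reading the leading number as the zero-based position of the predicate is an assumption, although it agrees with the sample line, where token 2 (`love`) carries the `B-V` label.

```python
from typing import List, NamedTuple


class SrlExample(NamedTuple):
    """One processed line (hypothetical container, not part of the repo)."""
    predicate_index: int   # assumed: zero-based position of the predicate
    tokens: List[str]
    labels: List[str]


def parse_srl_line(line: str) -> SrlExample:
    """Split 'IDX tok1 tok2 ... ||| lab1 lab2 ...' into its three parts."""
    left, sep, right = line.strip().partition("|||")
    if not sep:
        raise ValueError("line has no '|||' separator")

    fields = left.split()
    if not fields:
        raise ValueError("line has no predicate index or tokens")

    predicate_index = int(fields[0])
    tokens = fields[1:]
    labels = right.split()

    # Each token should carry exactly one label.
    if len(tokens) != len(labels):
        raise ValueError("token count and label count differ")
    if not 0 <= predicate_index < len(tokens):
        raise ValueError("predicate index is out of range")

    return SrlExample(predicate_index, tokens, labels)


# The sample line shown in the README.
example = parse_srl_line("2 My cats love hats . ||| B-A0 I-A0 B-V B-A1 O")
for token, label in zip(example.tokens, example.labels):
    print(f"{token}\t{label}")
```

A check like this can be run over the generated .txt files before training to catch lines whose token and label counts disagree.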
