Chinese_Coreference_Resolution: Chinese coreference resolution, PyTorch implementation

189 files in total

response: 72

key: 28

pyc: 28

Python

Uploaded 2021-04-02 20:36:44 · 8.73MB ZIP

Chinese_Coreference_Resolution: Chinese coreference resolution, PyTorch implementation (189 files)

scorer.bat 2KB

experiments.conf 1KB

bert_config.json 520B

train.chinese.128.jsonlines 29.05MB

dev.chinese.128.jsonlines 4.48MB

test.chinese.128.jsonlines 3.49MB

TC-C.key 2KB

TC-B.key 2KB

TC-D.key 790B

TC-E.key 790B

TC-D.key 790B

TC-E.key 790B

TC-L.key 780B

TC-K.key 780B

TC-L.key 780B

TC-F.key 774B

TC-H.key 774B

TC-G.key 774B

TC-I.key 774B

TC-G.key 774B

TC-H.key 774B

TC-J.key 772B

TC-A.key 363B

TC-M.key 363B

TC-N.key 363B

TC-M.key 363B

TC-A.key 363B

TC-N.key 363B

README.md 2KB

README.Munkres 5KB

test.pl 2KB

scorer.pl 1KB

Dumper.pm 40KB

CorScorer.pm 39KB

Combinatorics.pm 27KB

Cwd.pm 21KB

CorefMetricTestConfig.pm 14KB

Munkres.pm 13KB

CorefMetricTest.pm 4KB

modeling_change.py 64KB

modeling.py 61KB

modeling_transfo_xl.py 58KB

modeling_openai.py 37KB

coreference.py 35KB

modeling_gpt2.py 31KB

tokenization_transfo_xl.py 22KB

tokenization.py 19KB

demo.py 16KB

modeling_transfo_xl_utilities.py 16KB

tokenization_process_data.py 14KB

tokenization_openai.py 14KB

tokenization_gpt2.py 13KB

optimization.py 13KB

file_utils.py 9KB

convert_transfo_xl_checkpoint_to_pytorch.py 5KB

optimization_openai.py 5KB

metrics.py 4KB

__main__.py 4KB

conll.py 4KB

convert_openai_checkpoint_to_pytorch.py 3KB

convert_gpt2_checkpoint_to_pytorch.py 3KB

convert_tf_checkpoint_to_pytorch.py 3KB

__init__.py 1KB

utils.py 1KB

modeling_change.cpython-35.pyc 57KB

modeling_change.cpython-36.pyc 54KB

modeling.cpython-35.pyc 54KB

modeling.cpython-36.pyc 51KB

modeling_transfo_xl.cpython-35.pyc 45KB

modeling_transfo_xl.cpython-36.pyc 41KB

modeling_openai.cpython-35.pyc 34KB

modeling_openai.cpython-36.pyc 32KB

modeling_gpt2.cpython-35.pyc 29KB

modeling_gpt2.cpython-36.pyc 28KB

tokenization_transfo_xl.cpython-35.pyc 19KB

tokenization_transfo_xl.cpython-36.pyc 17KB

tokenization.cpython-35.pyc 15KB

tokenization.cpython-36.pyc 14KB

tokenization_gpt2.cpython-35.pyc 12KB

optimization.cpython-35.pyc 11KB

tokenization_process_data.cpython-36.pyc 11KB

tokenization_gpt2.cpython-36.pyc 11KB

optimization.cpython-36.pyc 11KB

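The listing ships preprocessed splits (`train.chinese.128.jsonlines`, `dev.chinese.128.jsonlines`, `test.chinese.128.jsonlines`), but their schema is not documented on this page. The sketch below assumes, hypothetically, the JSON Lines layout of the reference mandarjoshi90/coref repository: one JSON document per line, with `sentences` (lists of subword tokens) and `clusters` (lists of `[start, end]` mention spans). The helper name `summarize_docs` is made up for illustration.

```python
import json
from typing import Iterable


def summarize_docs(lines: Iterable[str]) -> dict:
    """Count documents, tokens, clusters and mentions in a .jsonlines file.

    Assumption (not confirmed by this project): each non-empty line is one
    JSON document with "sentences" (list of token lists) and "clusters"
    (list of clusters, each a list of [start, end] mention spans).
    """
    stats = {"docs": 0, "tokens": 0, "clusters": 0, "mentions": 0}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        doc = json.loads(line)
        stats["docs"] += 1
        # Missing keys count as zero instead of raising.
        stats["tokens"] += sum(len(sent) for sent in doc.get("sentences", []))
        clusters = doc.get("clusters", [])
        stats["clusters"] += len(clusters)
        stats["mentions"] += sum(len(cluster) for cluster in clusters)
    return stats
```

For example, `summarize_docs(open(path, encoding="utf-8"))` on one of the files gives a quick sanity check; if the real key names differ, the affected counts simply stay at zero rather than crashing.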

# SpanBERT for Chinese Coreference Resolution (PyTorch)

- Reference paper: [SpanBERT: Improving Pre-training by Representing and Predicting Spans](https://arxiv.org/abs/1907.10529)
- Reference open-source code (English-only, TensorFlow): [https://github.com/mandarjoshi90/coref](https://github.com/mandarjoshi90/coref)
- Pre-trained model downloads:
  - Chinese pre-trained `RoBERTa` model (https://github.com/brightmart/roberta_zh)
  - Chinese pre-trained `BERT-wwm` model (https://github.com/brightmart/roberta_zh)
  - Chinese pre-trained `Bert` model (https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese.tar.gz)

#### 1. Code structure

- `conll.py`: script for evaluation on the validation (dev) set
- `coreference.py`: the coreference resolution model
- `demo.py`: test script for the coreference resolution project
- `metrics.py`: computes the evaluation metrics on the validation set
- `utils.py`: data conversion and file reading/writing
- `experiments.conf`: parameter configuration needed to run the code
- `requirements.txt`: required environment and dependencies
- `bert/`: BERT model scripts
  - `modeling.py`
  - `optimization.py`
  - `tokenization.py`
- `conll-2012/`: official evaluation files
  - `scorer/`
    - `reference-coreference-scorers/`
    - `v8.01/`
- `data/`: training, validation and prediction files, plus the final prediction results
  - `dev/`
  - `test/`
  - `train/`
- `pretrain_model/`: the pre-trained model (model weights, configuration file and vocabulary)
  - `bert_config.json`
  - `pytorch_model.bin`
  - `vocab.txt`

#### 2. Runtime environment

- Python 3.5 or later is required; see `requirements.txt` for the full environment.
- On a single `TITAN Xp` with `[ffnn_size=2000, num_epochs=30]`, a run takes about 7 hours.

#### 3. How to run

- Set the relevant parameters in `experiments.conf`, then run `python demo.py` from the command line. By default the third GPU (index 2) is used.
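The README expects `pretrain_model/` to contain `bert_config.json`, `pytorch_model.bin` and `vocab.txt`, so a small pre-flight check before `python demo.py` can catch an incomplete download. This is a minimal sketch, not part of the project; the helper name `missing_pretrain_files` and running it from the repository root are assumptions.

```python
from pathlib import Path
from typing import List

# The three files the README lists under pretrain_model/.
REQUIRED_FILES = ("bert_config.json", "pytorch_model.bin", "vocab.txt")


def missing_pretrain_files(model_dir: str = "pretrain_model") -> List[str]:
    """Return the required files absent from model_dir (hypothetical helper)."""
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

`missing_pretrain_files()` returns an empty list when everything is in place, and otherwise names exactly which files still need to be downloaded.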
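`metrics.py` and the official scorer under `conll-2012/` compare predicted clusters with gold clusters. To illustrate what such a scorer computes, here is a minimal, self-contained sketch of the MUC link-based metric (Vilain et al., 1995), one of the three metrics, together with B³ and CEAFe, whose F1 scores are averaged into the CoNLL-2012 score. It is not taken from this repository's `metrics.py`; clusters are assumed to be lists of hashable mentions such as `(start, end)` spans, and the function names are hypothetical.

```python
from typing import Hashable, List, Sequence, Tuple

# A cluster is a collection of mentions, e.g. (start, end) token spans.
Cluster = Sequence[Hashable]


def _muc_terms(gold: List[Cluster], pred: List[Cluster]) -> Tuple[int, int]:
    """Numerator and denominator of MUC recall of pred w.r.t. gold.

    Swapping the arguments gives the terms for precision.
    """
    # Map each predicted mention to the index of its predicted cluster.
    owner = {}
    for idx, cluster in enumerate(pred):
        for mention in cluster:
            owner[mention] = idx
    num = den = 0
    for cluster in gold:
        # Partition the gold cluster by predicted cluster; a mention missing
        # from every predicted cluster forms its own singleton part.
        parts = {owner.get(m, ("unmatched", m)) for m in cluster}
        num += len(cluster) - len(parts)
        den += len(cluster) - 1
    return num, den


def muc(gold: List[Cluster], pred: List[Cluster]) -> Tuple[float, float, float]:
    """Return (precision, recall, f1) of the MUC metric."""
    r_num, r_den = _muc_terms(gold, pred)
    p_num, p_den = _muc_terms(pred, gold)
    recall = r_num / r_den if r_den else 0.0
    precision = p_num / p_den if p_den else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

In the classic case where the gold cluster {A, B, C, D} is predicted as {A, B} and {C, D}, precision is 1.0 and recall is 2/3, because one of the three gold links is missing. MUC counts links only, which is why the CoNLL score also averages in the mention-based B³ and entity-based CEAFe.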