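Source code for a Chinese electronic medical record (EMR) medical named entity recognition project based on BERT and knowledge graphs (.zip)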

98 files in total

py: 57

tsv: 16

sh: 7


Tags: BERT, natural language processing, knowledge graph

6.11 MB ZIP, uploaded 2023-10-05 14:36:44


基于BERT和知识图谱的中文电子病例医学命名实体识别.zip (98 files)

基于BERT和知识图谱的中文电子病例医学命名实体识别

run_bertless_ner_ccks2019.py 51KB

shellbywxx

train_ner_ccks_bertless_model.sh 590B

run_cls_predict.sh 472B

train_ner_demo.sh 485B

run_bertlesssingle_ner_predict.sh 515B

run_single_ner_predict.sh 527B

run_ner_ensemble_predict.sh 873B

train_ner_ccks_bertbase.sh 595B

run_kbert_ner_predict.py 52KB

run_kbert_cls.py 24KB

uer

__init__.py 0B

layers

__init__.py 0B

multi_headed_attn.py 2KB

embeddings.py 2KB

position_ffn.py 532B

layer_norm.py 491B

transformer.py 1KB

encoders

__init__.py 0B

mixed_encoder.py 4KB

bert_encoder.py 2KB

birnn_encoder.py 2KB

attn_encoder.py 1KB

gpt_encoder.py 1KB

cnn_encoder.py 3KB

rnn_encoder.py 3KB

subencoders

__init__.py 0B

cnn_subencoder.py 914B

rnn_subencoder.py 992B

avg_subencoder.py 425B

trainer.py 20KB

utils

__init__.py 0B

seed.py 412B

optimizers.py 6KB

tokenizer.py 11KB

vocab.py 5KB

misc.py 279B

subword.py 655B

data.py 44KB

act_fun.py 122B

constants.py 278B

config.py 596B

targets

__init__.py 0B

bert_target.py 3KB

bilm_target.py 3KB

cls_target.py 1KB

s2s_target.py 2KB

mlm_target.py 2KB

lm_target.py 2KB

nsp_target.py 1KB

model_builder.py 2KB

model_saver.py 284B

models

__init__.py 0B

model.py 1KB

bert_model.py 1KB

run_kbert_ner.py 17KB

run_kbert_ner_ensemble.py 67KB

brain

__init__.py 59B

knowgraph.py 16KB

kgs

Medical.spo 461KB

Medical-clean.spo 452KB

Medical-plus.spo 714KB

Medical-plus_with_check_entity.spo 837KB

medicalrepeatentities.txt 13KB

HowNet.spo 1.31MB

notebypaulwen.md 1KB

.gitignore 14B

config.py 897B

run_cls_predict_bywxx.py 16KB

datasets

medical_ner

train.tsv 1.68MB

dev.tsv 238KB

test.tsv 195KB

chnsenticorp

train.tsv 2.88MB

dev.tsv 365KB

test.tsv 361KB

ccks2019-datasets

train.tsv 2.45MB

ordered

train.tsv 2.28MB

dev.tsv 613KB

dev.tsv 435KB

test_test.tsv 3KB

ori

train.tsv 2.28MB

test_ori.txt 750KB

dev.tsv 613KB

train_ori.txt 2.39MB

test.tsv 916KB

transformdata.py 3KB

train_dev.tsv 2.87MB

test.tsv 914KB

.gitignore 40B

requirements.txt 114B

models

google_vocab.txt 107KB

.gitignore 6B

google_config.json 114B

.gitignore 1KB

run_kbertless_ner_predict.py 39KB

outputs

README.md 62B

README.md 6KB

networkbywxx

BILSTM_CRF_demo.py 10KB

run_kbert_ner_ccks2019.py 63KB
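The archive includes several knowledge-graph files under `brain/kgs/`: `Medical.spo`, `Medical-clean.spo`, `Medical-plus.spo`, `Medical-plus_with_check_entity.spo` and `HowNet.spo`. The following is a minimal sketch of turning such a file into a subject-keyed lookup table. It assumes that each `.spo` line holds one tab-separated subject/predicate/object triple, so check this against the files or `brain/knowgraph.py`. The sketch is not the repository's `knowgraph.py`, and the names `load_spo`, `attach_knowledge` and `max_entities` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical helpers: a simplified stand-in for a KG lookup, not brain/knowgraph.py.
Table = Dict[str, List[Tuple[str, str]]]


def load_spo(lines: Iterable[str]) -> Table:
    """Build subject -> [(predicate, object), ...] from lines of the form
    'subject<TAB>predicate<TAB>object' (assumed .spo layout)."""
    table: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for line in lines:
        parts = line.rstrip("\r\n").split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed lines instead of crashing
        subj, pred, obj = parts
        table[subj].append((pred, obj))
    return dict(table)


def attach_knowledge(
    tokens: List[str], table: Table, max_entities: int = 2
) -> List[Tuple[str, List[Tuple[str, str]]]]:
    """Pair each token with at most `max_entities` triples whose subject equals it."""
    return [(tok, table.get(tok, [])[:max_entities]) for tok in tokens]
```

To load one of the shipped graphs, pass an open text handle for `brain/kgs/Medical.spo`, opened with `encoding="utf-8"`, to `load_spo`. Capping the matches per token is one simple way to limit how much injected text each sentence receives.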

# K-BERT ![](https://img.shields.io/badge/license-MIT-000000.svg)

Source code and datasets for ["K-BERT: Enabling Language Representation with Knowledge Graph"](https://aaai.org/Papers/AAAI/2020GB/AAAI-LiuW.5594.pdf), implemented on top of the [UER](https://github.com/dbiir/UER-py) framework.

## Requirements

Software:

```
Python 3
PyTorch >= 1.0
argparse == 1.1
```

## Prepare

* Download ``google_model.bin`` from [here](https://share.weiyun.com/5GuzfVX) and save it to the ``models/`` directory.
* Download ``CnDbpedia.spo`` from [here](https://share.weiyun.com/5BvtHyO) and save it to the ``brain/kgs/`` directory.
* Optional: download the evaluation datasets from [here](https://share.weiyun.com/5Id9PVZ), unzip them, and place them in the ``datasets/`` directory.

The directory tree of K-BERT:

```
K-BERT
├── brain
│   ├── config.py
│   ├── __init__.py
│   ├── kgs
│   │   ├── CnDbpedia.spo
│   │   ├── HowNet.spo
│   │   └── Medical.spo
│   └── knowgraph.py
├── datasets
│   ├── book_review
│   │   ├── dev.tsv
│   │   ├── test.tsv
│   │   └── train.tsv
│   ├── chnsenticorp
│   │   ├── dev.tsv
│   │   ├── test.tsv
│   │   └── train.tsv
│   ...
├── models
│   ├── google_config.json
│   ├── google_model.bin
│   └── google_vocab.txt
├── outputs
├── uer
├── README.md
├── requirements.txt
├── run_kbert_cls.py
└── run_kbert_ner.py
```

## K-BERT for text classification

### Classification example

Run an example on the Book review dataset with CnDbpedia:

```sh
CUDA_VISIBLE_DEVICES='0' nohup python3 -u run_kbert_cls.py \
    --pretrained_model_path ./models/google_model.bin \
    --config_path ./models/google_config.json \
    --vocab_path ./models/google_vocab.txt \
    --train_path ./datasets/book_review/train.tsv \
    --dev_path ./datasets/book_review/dev.tsv \
    --test_path ./datasets/book_review/test.tsv \
    --epochs_num 5 --batch_size 32 --kg_name CnDbpedia \
    --output_model_path ./outputs/kbert_bookreview_CnDbpedia.bin \
    > ./outputs/kbert_bookreview_CnDbpedia.log &
```

Results:

```
Best accuracy in dev : 88.80%
Best accuracy in test: 87.69%
```

Options of ``run_kbert_cls.py``:

```
usage: [--pretrained_model_path] - Path to the pre-trained model parameters.
       [--config_path] - Path to the model configuration file.
       [--vocab_path] - Path to the vocabulary file.
       --train_path - Path to the training dataset.
       --dev_path - Path to the validation dataset.
       --test_path - Path to the test dataset.
       [--epochs_num] - The number of training epochs.
       [--batch_size] - Batch size of the training process.
       [--kg_name] - The name of the knowledge graph: "HowNet", "CnDbpedia" or "Medical".
       [--output_model_path] - Path to the output model.
```

### Classification benchmarks

Accuracy (dev/test, %) on different datasets:

| Dataset | HowNet | CnDbpedia |
| :----- | :----: | :----: |
| Book review | 88.75/87.75 | 88.80/87.69 |
| ChnSentiCorp | 95.00/95.50 | 94.42/95.25 |
| Shopping | 97.01/96.92 | 96.94/96.73 |
| Weibo | 98.22/98.33 | 98.29/98.33 |
| LCQMC | 88.97/87.14 | 88.91/87.20 |
| XNLI | 77.11/77.07 | 76.99/77.43 |

## K-BERT for named entity recognition (NER)

### NER example

Run an example on the msra_ner dataset with CnDbpedia:

```sh
CUDA_VISIBLE_DEVICES='0' nohup python3 -u run_kbert_ner.py \
    --pretrained_model_path ./models/google_model.bin \
    --config_path ./models/google_config.json \
    --vocab_path ./models/google_vocab.txt \
    --train_path ./datasets/msra_ner/train.tsv \
    --dev_path ./datasets/msra_ner/dev.tsv \
    --test_path ./datasets/msra_ner/test.tsv \
    --epochs_num 5 --batch_size 16 --kg_name CnDbpedia \
    --output_model_path ./outputs/kbert_msraner_CnDbpedia.bin \
    > ./outputs/kbert_msraner_CnDbpedia.log &
```

Results:

```
The best in dev : precision=0.957, recall=0.962, f1=0.960
The best in test: precision=0.953, recall=0.959, f1=0.956
```

Options of ``run_kbert_ner.py``:

```
usage: [--pretrained_model_path] - Path to the pre-trained model parameters.
       [--config_path] - Path to the model configuration file.
       [--vocab_path] - Path to the vocabulary file.
       --train_path - Path to the training dataset.
       --dev_path - Path to the validation dataset.
       --test_path - Path to the test dataset.
       [--epochs_num] - The number of training epochs.
       [--batch_size] - Batch size of the training process.
       [--kg_name] - The name of the knowledge graph.
       [--output_model_path] - Path to the output model.
```

## K-BERT for domain-specific tasks

Experimental results on domain-specific tasks (Precision/Recall/F1, as fractions):

| KG | Finance_QA | Law_QA | Finance_NER | Medicine_NER |
| :----- | :----: | :----: | :----: | :----: |
| HowNet | 0.805/0.888/0.845 | 0.842/0.903/0.871 | 0.860/0.888/0.874 | 0.935/0.939/0.937 |
| CN-DBpedia | 0.814/0.881/0.846 | 0.814/0.942/0.874 | 0.860/0.887/0.873 | 0.935/0.937/0.936 |
| MedicalKG | -- | -- | -- | 0.944/0.943/0.944 |

## Acknowledgement

This work is a joint study supported by Peking University and Tencent Inc. If you use this code, please cite this paper:

```
@inproceedings{weijie2019kbert,
  title={{K-BERT}: Enabling Language Representation with Knowledge Graph},
  author={Weijie Liu and Peng Zhou and Zhe Zhao and Zhiruo Wang and Qi Ju and Haotang Deng and Ping Wang},
  booktitle={Proceedings of AAAI 2020},
  year={2020}
}
```
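The NER results above report precision (P), recall (R) and F1. F1 is the harmonic mean 2PR/(P+R). With the dev figures P = 0.957 and R = 0.962 it comes to about 0.959, which agrees with the reported 0.960 once you allow for the rounding of P and R. NER scores are commonly computed at the entity level. The following minimal sketch assumes BIO tags and counts a prediction as correct only when its (start, end, type) span exactly matches a gold span. It is not necessarily how `run_kbert_ner.py` counts entities, and tag names such as `B-DIS` are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, entity_type)


def bio_spans(tags: List[str]) -> Set[Span]:
    """Extract entity spans from one BIO tag sequence (assumed tagging scheme)."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last open span
        # A span ends at "O", at a new "B-", or at an "I-" of a different type.
        if tag == "O" or tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != etype):
            if start is not None:
                spans.add((start, i, etype))
                start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
        # An "I-" tag of the current type extends the span; a stray "I-" is ignored.
    return spans


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def prf1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1 over aligned sentence tag sequences."""
    correct = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gs, ps = bio_spans(g), bio_spans(p)
        correct += len(gs & ps)  # exact span-and-type matches
        n_gold += len(gs)
        n_pred += len(ps)
    precision = correct / n_pred if n_pred else 0.0
    recall = correct / n_gold if n_gold else 0.0
    return precision, recall, f1_score(precision, recall)
```

Exact-match counting is strict. A prediction with the correct type but a boundary off by one character counts as both a false positive and a false negative.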
