# K-BERT
![](https://img.shields.io/badge/license-MIT-000000.svg)
Source code and datasets for ["K-BERT: Enabling Language Representation with Knowledge Graph"](https://aaai.org/Papers/AAAI/2020GB/AAAI-LiuW.5594.pdf), implemented on top of the [UER](https://github.com/dbiir/UER-py) framework.
## Requirements
Software:
```
Python3
PyTorch >= 1.0
argparse == 1.1
```
## Prepare
* Download ``google_model.bin`` from [here](https://share.weiyun.com/5GuzfVX) and save it to the ``models/`` directory.
* Download ``CnDbpedia.spo`` from [here](https://share.weiyun.com/5BvtHyO) and save it to the ``brain/kgs/`` directory.
* Optional: download the evaluation datasets from [here](https://share.weiyun.com/5Id9PVZ), unzip them, and place them in the ``datasets/`` directory.
The directory tree of K-BERT:
```
K-BERT
├── brain
│ ├── config.py
│ ├── __init__.py
│ ├── kgs
│ │ ├── CnDbpedia.spo
│ │ ├── HowNet.spo
│ │ └── Medical.spo
│ └── knowgraph.py
├── datasets
│ ├── book_review
│ │ ├── dev.tsv
│ │ ├── test.tsv
│ │ └── train.tsv
│ ├── chnsenticorp
│ │ ├── dev.tsv
│ │ ├── test.tsv
│ │ └── train.tsv
│ ...
│
├── models
│ ├── google_config.json
│ ├── google_model.bin
│ └── google_vocab.txt
├── outputs
├── uer
├── README.md
├── requirements.txt
├── run_kbert_cls.py
└── run_kbert_ner.py
```
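Before training, it can help to confirm that the downloaded files landed where the tree above expects them. The following is a minimal sketch, not part of the repository: the helper name `missing_files` and the constant `REQUIRED_FILES` are hypothetical, and the list covers only the model files and the CnDbpedia knowledge graph named in the Prepare steps.

```python
from pathlib import Path

# Files that the Prepare steps and the directory tree place under the
# repository root. This list is an assumption drawn from the README only.
REQUIRED_FILES = (
    "models/google_model.bin",
    "models/google_config.json",
    "models/google_vocab.txt",
    "brain/kgs/CnDbpedia.spo",
)


def missing_files(root="."):
    """Return the required paths (relative to root) that do not exist."""
    root = Path(root)
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All required files are in place.")
```

Run it from the repository root; an empty result means the layout matches the tree for these four files.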
## K-BERT for text classification
### Classification example
Run an example on the Book review dataset with CnDbpedia:
```sh
CUDA_VISIBLE_DEVICES='0' nohup python3 -u run_kbert_cls.py \
--pretrained_model_path ./models/google_model.bin \
--config_path ./models/google_config.json \
--vocab_path ./models/google_vocab.txt \
--train_path ./datasets/book_review/train.tsv \
--dev_path ./datasets/book_review/dev.tsv \
--test_path ./datasets/book_review/test.tsv \
--epochs_num 5 --batch_size 32 --kg_name CnDbpedia \
--output_model_path ./outputs/kbert_bookreview_CnDbpedia.bin \
> ./outputs/kbert_bookreview_CnDbpedia.log &
```
Results:
```
Best accuracy in dev : 88.80%
Best accuracy in test: 87.69%
```
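Because the example command runs in the background and redirects its output to a log file, the final scores have to be read from that log. The sketch below extracts them, assuming the log contains lines in exactly the format shown in the results above; the function name `best_accuracies` is hypothetical and not part of the repository.

```python
import re

# Matches lines such as "Best accuracy in dev : 88.80%".
# The line format is assumed from the results shown in this README.
ACCURACY_LINE = re.compile(r"Best accuracy in (dev|test)\s*:\s*([0-9.]+)%")


def best_accuracies(log_text):
    """Map each split ("dev"/"test") to its last reported accuracy in percent."""
    return {split: float(value) for split, value in ACCURACY_LINE.findall(log_text)}


if __name__ == "__main__":
    sample = "Best accuracy in dev : 88.80%\nBest accuracy in test: 87.69%\n"
    print(best_accuracies(sample))
```

To use it on a real run, pass the contents of ``./outputs/kbert_bookreview_CnDbpedia.log`` as `log_text`.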
Options of ``run_kbert_cls.py``:
```
usage: [--pretrained_model_path] - Path to the pre-trained model parameters.
       [--config_path] - Path to the model configuration file.
       [--vocab_path] - Path to the vocabulary file.
        --train_path - Path to the training dataset.
        --dev_path - Path to the validation dataset.
        --test_path - Path to the test dataset.
       [--epochs_num] - The number of training epochs.
       [--batch_size] - Batch size of the training process.
       [--kg_name] - The name of the knowledge graph: "HowNet", "CnDbpedia" or "Medical".
       [--output_model_path] - Path to the output model.
```
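In the listing above, bracketed options are optional and the three dataset paths are required. The sketch below mirrors that with `argparse`. It is an illustration and not the script's actual source: the function name `build_parser` is hypothetical, the default values are borrowed from the example command, and restricting `--kg_name` to three choices is an assumption.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(description="Sketch of the run_kbert_cls.py options")

    # Optional arguments. Defaults are assumptions taken from the example command.
    parser.add_argument("--pretrained_model_path", default="./models/google_model.bin")
    parser.add_argument("--config_path", default="./models/google_config.json")
    parser.add_argument("--vocab_path", default="./models/google_vocab.txt")

    # Required arguments (not bracketed in the usage text).
    parser.add_argument("--train_path", required=True)
    parser.add_argument("--dev_path", required=True)
    parser.add_argument("--test_path", required=True)

    # Training options. Limiting --kg_name to these names is an assumption.
    parser.add_argument("--epochs_num", type=int, default=5)
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--kg_name", choices=["HowNet", "CnDbpedia", "Medical"],
                        default="CnDbpedia")
    parser.add_argument("--output_model_path",
                        default="./outputs/kbert_bookreview_CnDbpedia.bin")
    return parser


if __name__ == "__main__":
    # Parse a fixed argument list instead of sys.argv so the sketch runs as-is.
    args = build_parser().parse_args([
        "--train_path", "./datasets/book_review/train.tsv",
        "--dev_path", "./datasets/book_review/dev.tsv",
        "--test_path", "./datasets/book_review/test.tsv",
    ])
    print(args.kg_name, args.epochs_num, args.batch_size)
```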
### Classification benchmarks
Accuracy (dev/test, %) on different datasets:
| Dataset | HowNet | CnDbpedia |
| :----- | :----: | :----: |
| Book review | 88.75/87.75 | 88.80/87.69 |
| ChnSentiCorp | 95.00/95.50 | 94.42/95.25 |
| Shopping | 97.01/96.92 | 96.94/96.73 |
| Weibo | 98.22/98.33 | 98.29/98.33 |
| LCQMC | 88.97/87.14 | 88.91/87.20 |
| XNLI | 77.11/77.07 | 76.99/77.43 |
## K-BERT for named entity recognition (NER)
### NER example
Run an example on the msra_ner dataset with CnDbpedia:
```sh
CUDA_VISIBLE_DEVICES='0' nohup python3 -u run_kbert_ner.py \
--pretrained_model_path ./models/google_model.bin \
--config_path ./models/google_config.json \
--vocab_path ./models/google_vocab.txt \
--train_path ./datasets/msra_ner/train.tsv \
--dev_path ./datasets/msra_ner/dev.tsv \
--test_path ./datasets/msra_ner/test.tsv \
--epochs_num 5 --batch_size 16 --kg_name CnDbpedia \
--output_model_path ./outputs/kbert_msraner_CnDbpedia.bin \
> ./outputs/kbert_msraner_CnDbpedia.log &
```
Results:
```
The best in dev : precision=0.957, recall=0.962, f1=0.960
The best in test: precision=0.953, recall=0.959, f1=0.956
```
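The F1 values above are consistent with the usual harmonic mean of precision and recall. The sketch below recomputes them; that the script uses this exact formula is an assumption, and the function name `f1_score` is hypothetical. Because the reported precision and recall are rounded to three decimals, a recomputed F1 can differ from the reported one in the last digit.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # (precision, recall, reported F1) copied from the results above.
    reported = {"dev": (0.957, 0.962, 0.960), "test": (0.953, 0.959, 0.956)}
    for split, (p, r, f1) in reported.items():
        print(f"{split}: recomputed f1={f1_score(p, r):.4f}, reported f1={f1:.3f}")
```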
Options of ``run_kbert_ner.py``:
```
usage: [--pretrained_model_path] - Path to the pre-trained model parameters.
       [--config_path] - Path to the model configuration file.
       [--vocab_path] - Path to the vocabulary file.
        --train_path - Path to the training dataset.
        --dev_path - Path to the validation dataset.
        --test_path - Path to the test dataset.
       [--epochs_num] - The number of training epochs.
       [--batch_size] - Batch size of the training process.
       [--kg_name] - The name of the knowledge graph.
       [--output_model_path] - Path to the output model.
```
## K-BERT for domain-specific tasks
Experimental results on domain-specific tasks (Precision/Recall/F1):
| KG | Finance_QA | Law_QA | Finance_NER | Medicine_NER |
| :----- | :----: | :----: | :----: | :----: |
| HowNet | 0.805/0.888/0.845 | 0.842/0.903/0.871 | 0.860/0.888/0.874 | 0.935/0.939/0.937 |
| CN-DBpedia | 0.814/0.881/0.846 | 0.814/0.942/0.874 | 0.860/0.887/0.873 | 0.935/0.937/0.936 |
| MedicalKG | -- | -- | -- | 0.944/0.943/0.944 |
## Acknowledgement
This work is a joint study with the support of Peking University and Tencent Inc.
If you use this code, please cite this paper:
```
@inproceedings{weijie2019kbert,
title={{K-BERT}: Enabling Language Representation with Knowledge Graph},
author={Weijie Liu and Peng Zhou and Zhe Zhao and Zhiruo Wang and Qi Ju and Haotang Deng and Ping Wang},
booktitle={Proceedings of AAAI 2020},
year={2020}
}
```