面向知识图谱的问答系统项目.zip (Knowledge-Graph-Oriented Question Answering System Project): knowledge graph QA system resource, CSDN Library

116 files in total

json: 42

py: 25

pyc: 24

Knowledge graph

Machine learning

Graduation project

Source code

125 views, uploaded 2023-11-14 00:11:07, 10.77MB ZIP

This knowledge graph project can be used for graduation projects, course projects, and hands-on project practice. It provides design materials and source code.

面向知识图谱的问答系统项目.zip (116 files)

CMeIE_train.json 11.03MB

CMeEE_train.json 9.96MB

KUAKE-QTR_train.json 3.55MB

CMeEE_dev.json 3.29MB

CHIP-CTC_train.json 3.13MB

CHIP-STS_train.json 2.85MB

CMeIE_dev.json 2.69MB

KUAKE-QQR_train.json 1.89MB

CHIP-STS_test.json 1.67MB

CHIP-CTC_test.json 1.07MB

CHIP-CTC_dev.json 1.04MB

CMeIE_test.json 1.02MB

KUAKE-QIC_train.json 758KB

CHIP-STS_dev.json 729KB

CHIP-CDN_train.json 691KB

KUAKE-QTR_test.json 646KB

CHIP-CDN_test.json 561KB

CMeEE_test.json 498KB

KUAKE-QTR_dev.json 436KB

CHIP-CDN_dev.json 228KB

KUAKE-QIC_dev.json 226KB

KUAKE-QQR_dev.json 206KB

KUAKE-QQR_test.json 178KB

KUAKE-QIC_dev2test.json 173KB

KUAKE-QIC_test.json 141KB

53_schemas.json 4KB

example_gold.json 3KB

example_pred.json 2KB

example_gold.json 2KB

example_pred.json 1KB

example_pred.json 568B

example_golden.json 568B

example_gold.json 529B

example_pred.json 529B

example_gold.json 494B

example_pred.json 473B

example_gold.json 445B

example_pred.json 420B

example_gold.json 420B

example_pred.json 384B

example_pred.json 368B

example_gold.json 368B

LICENSE 11KB

README.md 32KB

README_ZH.md 31KB

train.py 95KB

modeling.py 70KB

data_process.py 38KB

tokenization.py 18KB

dataset.py 16KB

optimization.py 13KB

run_ie.py 12KB

run_cdn.py 11KB

data.py 10KB

file_utils.py 9KB

run_classifier.py 9KB

model.py 6KB

utils.py 5KB

cblue_commit.py 5KB

run.py 2KB

ngram_utils.py 2KB

tf2torch.py 2KB

cblue_metrics.py 1KB

__init__.py 721B

__init__.py 710B

__init__.py 700B

__init__.py 519B

__init__.py 324B

__init__.py 0B

modeling.cpython-36.pyc 54KB

train.cpython-36.pyc 50KB

data_process.cpython-36.pyc 31KB

data_process.cpython-37.pyc 30KB

tokenization.cpython-36.pyc 13KB

dataset.cpython-36.pyc 11KB

dataset.cpython-37.pyc 11KB

optimization.cpython-36.pyc 11KB

file_utils.cpython-36.pyc 7KB

data.cpython-36.pyc 6KB

utils.cpython-36.pyc 6KB

utils.cpython-37.pyc 6KB

cblue_commit.cpython-36.pyc 4KB

model.cpython-36.pyc 4KB

cblue_metrics.cpython-36.pyc 2KB

ngram_utils.cpython-36.pyc 1KB

__init__.cpython-36.pyc 847B

__init__.cpython-37.pyc 767B

__init__.cpython-36.pyc 743B

__init__.cpython-36.pyc 737B

__init__.cpython-36.pyc 553B

__init__.cpython-36.pyc 435B

__init__.cpython-37.pyc 132B

__init__.cpython-36.pyc 102B

run_cdn.sh 2KB

run_qic2.sh 1KB

run_qicCase.sh 1KB

run_qic.sh 1KB

run_ctc.sh 1KB

run_ee.sh 1KB

[**English**](https://github.com/CBLUEbenchmark/CBLUE) | [**中文说明**](https://github.com/CBLUEbenchmark/CBLUE/blob/main/README_ZH.md)

<p align="center"><img src="resources/img/LOGO.png" height="23%" width="23%" /></p>

# CBLUE

[![License](https://img.shields.io/github/license/CBLUEbenchmark/CBLUE?style=flat-square)](https://github.com/CBLUEbenchmark/CBLUE/blob/master/LICENSE) [![GitHub stars](https://img.shields.io/github/stars/CBLUEbenchmark/CBLUE?style=flat-square)](https://github.com/CBLUEbenchmark/CBLUE/stargazers) ![](https://img.shields.io/badge/PRs-Welcome-red)

AI (Artificial Intelligence) plays an indispensable role in the biomedical field, helping improve medical technology. To further accelerate AI research in the biomedical field, we present **Chinese Biomedical Language Understanding Evaluation** (CBLUE), which includes datasets collected from real-world biomedical scenarios, baseline models, and an online platform for model evaluation, comparison, and analysis.

## CBLUE Benchmark

We evaluate 11 current Chinese pre-trained models on the eight biomedical language understanding tasks and report the baselines for these tasks.

| Model | CMedEE | CMedIE | CDN | CTC | STS | QIC | QTR | QQR | Avg. |
| ------------------------------------------------------------ | :------: | :----: | :------: | :------: | :------: | :------: | :------: | :------: | :--: |
| [BERT-base](https://github.com/ymcui/Chinese-BERT-wwm) | 62.1 | 54.0 | 55.4 | 69.2 | 83.0 | 84.3 | 60.0 | **84.7** | 69.0 |
| [BERT-wwm-ext-base](https://github.com/ymcui/Chinese-BERT-wwm) | 61.7 | 54.0 | 55.4 | 70.1 | 83.9 | 84.5 | 60.9 | 84.4 | 69.4 |
| [ALBERT-tiny](https://github.com/brightmart/albert_zh) | 50.5 | 35.9 | 50.2 | 61.0 | 79.7 | 75.8 | 55.5 | 79.8 | 61.1 |
| [ALBERT-xxlarge](https://huggingface.co/voidful/albert_chinese_xxlarge) | 61.8 | 47.6 | 37.5 | 66.9 | 84.8 | 84.8 | 62.2 | 83.1 | 66.1 |
| [RoBERTa-large](https://github.com/brightmart/roberta_zh) | 62.1 | 54.4 | 56.5 | **70.9** | 84.7 | 84.2 | 60.9 | 82.9 | 69.6 |
| [RoBERTa-wwm-ext-base](https://github.com/ymcui/Chinese-BERT-wwm) | 62.4 | 53.7 | 56.4 | 69.4 | 83.7 | **85.5** | 60.3 | 82.7 | 69.3 |
| [RoBERTa-wwm-ext-large](https://github.com/ymcui/Chinese-BERT-wwm) | 61.8 | 55.9 | 55.7 | 69.0 | 85.2 | 85.3 | 62.8 | 84.4 | 70.0 |
| [PCL-MedBERT](https://code.ihub.org.cn/projects/1775) | 60.6 | 49.1 | 55.8 | 67.8 | 83.8 | 84.3 | 59.3 | 82.5 | 67.9 |
| [ZEN](https://github.com/sinovation/ZEN) | 61.0 | 50.1 | 57.8 | 68.6 | 83.5 | 83.2 | 60.3 | 83.0 | 68.4 |
| [MacBERT-base](https://huggingface.co/hfl/chinese-macbert-base) | 60.7 | 53.2 | 57.7 | 67.7 | 84.4 | 84.9 | 59.7 | 84.0 | 69.0 |
| [MacBERT-large](https://huggingface.co/hfl/chinese-macbert-large) | **62.4** | 51.6 | **59.3** | 68.6 | **85.6** | 82.7 | **62.9** | 83.5 | 69.6 |
| Human | 67.0 | 66.0 | 65.0 | 78.0 | 93.0 | 88.0 | 71.0 | 89.0 | 77.1 |

## Baseline of tasks

We present the baseline models on the biomedical tasks and release the corresponding code for a quick start.

### Requirements

python3 / pytorch 1.7 / transformers 4.5.1 / jieba / gensim / sklearn

### Data preparation

[Download dataset](https://tianchi.aliyun.com/dataset/dataDetail?dataId=95414)

The whole zip package includes the datasets of 8 biomedical NLU tasks (more detail in the following section). Every task includes the following files:

```text
├── {Task}
|  └── {Task}_train.json
|  └── {Task}_test.json
|  └── {Task}_dev.json
|  └── example_gold.json
|  └── example_pred.json
|  └── README.md
```

**Notice: a few tasks have additional files, e.g. the CHIP-CTC task includes a 'category.xlsx' file.**

You can download the Chinese pre-trained models you need (download URLs are provided above). With [Huggingface-Transformers](https://huggingface.co/), the models above can be easily accessed and loaded.

The reference directory:

```text
├── CBLUE
|  └── baselines
|     └── run_classifier.py
|     └── ...
|  └── examples
|     └── run_qqr.sh
|     └── ...
|  └── cblue
|  └── CBLUEDatasets
|     └── KUAKE-QQR
|     └── ...
|  └── data
|     └── output
|     └── model_data
|        └── bert-base
|        └── ...
|     └── result_output
|        └── KUAKE-QQR_test.json
|        └── ...
```

### Running examples

The shell files for training and evaluation of every task are provided in `examples/` and can be run directly.

Alternatively, you can use the scripts in `baselines/` and write your own shell files as needed:

- `baselines/run_classifier.py`: supports the `{sts, qqr, qtr, qic, ctc, ee}` tasks;
- `baselines/run_cdn.py`: supports the `{cdn}` task;
- `baselines/run_ie.py`: supports the `{ie}` task.
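
The task-to-script list above can be sketched as a small lookup, which may help when writing your own launcher. This is a minimal illustration and not part of the CBLUE code: `BASELINE_SCRIPTS` and `script_for_task` are hypothetical names, while the script paths and task names are taken from the list above.

```python
# Hypothetical helper (not part of CBLUE): maps a task name to the baseline
# script that the list above says supports it.
BASELINE_SCRIPTS = {
    "baselines/run_classifier.py": {"sts", "qqr", "qtr", "qic", "ctc", "ee"},
    "baselines/run_cdn.py": {"cdn"},
    "baselines/run_ie.py": {"ie"},
}


def script_for_task(task: str) -> str:
    """Return the baseline script path for a task name (case-insensitive)."""
    key = task.lower()
    for script, tasks in BASELINE_SCRIPTS.items():
        if key in tasks:
            return script
    raise ValueError(f"unknown task: {task!r}")


if __name__ == "__main__":
    for name in ("qqr", "cdn", "ie"):
        print(name, "->", script_for_task(name))
```

Raising `ValueError` for an unlisted task is a design choice of this sketch, so that a mistyped task name fails early instead of launching the wrong script.
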

#### Training models

Run the shell file with `bash examples/run_{task}.sh`. The contents of the shell file are as follows:

```shell
DATA_DIR="CBLUEDatasets"

TASK_NAME="qqr"
MODEL_TYPE="bert"
MODEL_DIR="data/model_data"
MODEL_NAME="chinese-bert-wwm"
OUTPUT_DIR="data/output"
RESULT_OUTPUT_DIR="data/result_output"

MAX_LENGTH=128

python baselines/run_classifier.py \
    --data_dir=${DATA_DIR} \
    --model_type=${MODEL_TYPE} \
    --model_dir=${MODEL_DIR} \
    --model_name=${MODEL_NAME} \
    --task_name=${TASK_NAME} \
    --output_dir=${OUTPUT_DIR} \
    --result_output_dir=${RESULT_OUTPUT_DIR} \
    --do_train \
    --max_length=${MAX_LENGTH} \
    --train_batch_size=16 \
    --eval_batch_size=16 \
    --learning_rate=3e-5 \
    --epochs=3 \
    --warmup_proportion=0.1 \
    --earlystop_patience=3 \
    --logging_steps=250 \
    --save_steps=250 \
    --seed=2021
```

**Notice: the best checkpoint is saved in** `OUTPUT_DIR/MODEL_NAME/`.

- `MODEL_TYPE`: supports the `{bert, roberta, albert, zen}` model types;
- `MODEL_NAME`: supports the `{bert-base, bert-wwm-ext, albert-tiny, albert-xxlarge, zen, pcl-medbert, roberta-large, roberta-wwm-ext-base, roberta-wwm-ext-large, macbert-base, macbert-large}` Chinese pre-trained models.

The `MODEL_TYPE`-`MODEL_NAME` mappings are listed below.

| MODEL_TYPE | MODEL_NAME |
| :--------: | :----------------------------------------------------------- |
| `bert` | `bert-base`, `bert-wwm-ext`, `pcl-medbert`, `macbert-base`, `macbert-large` |
| `roberta` | `roberta-large`, `roberta-wwm-ext-base`, `roberta-wwm-ext-large` |
| `albert` | `albert-tiny`, `albert-xxlarge` |
| `zen` | `zen` |

#### Inference & generation of results

Run the shell file with `bash examples/run_{task}.sh predict`. The contents of the shell file are as follows:

```shell
DATA_DIR="CBLUEDatasets"

TASK_NAME="qqr"
MODEL_TYPE="bert"
MODEL_DIR="data/model_data"
MODEL_NAME="chinese-bert-wwm"
OUTPUT_DIR="data/output"
RESULT_OUTPUT_DIR="data/result_output"

MAX_LENGTH=128

python baselines/run_classifier.py \
    --data_dir=${DATA_DIR} \
    --model_type=${MODEL_TYPE} \
    --model_name=${MODEL_NAME} \
    --model_dir=${MODEL_DIR} \
    --task_name=${TASK_NAME} \
    --output_dir=${OUTPUT_DIR} \
    --result_output_dir=${RESULT_OUTPUT_DIR} \
    --do_predict \
    --max_length=${MAX_LENGTH} \
    --eval_batch_size=16 \
    --seed=2021
```

**Notice: the prediction result** `{TASK_NAME}_test.json` **will be generated in** `RESULT_OUTPUT_DIR`.

### Submit results

Compre
