[**English**](https://github.com/CBLUEbenchmark/CBLUE) | [**中文说明**](https://github.com/CBLUEbenchmark/CBLUE/blob/main/README_ZH.md)
<p align="center"><img src="resources/img/LOGO.png" height="23%" width="23%" /></p>
# CBLUE
[![License](https://img.shields.io/github/license/CBLUEbenchmark/CBLUE?style=flat-square)](https://github.com/CBLUEbenchmark/CBLUE/blob/master/LICENSE)
[![GitHub stars](https://img.shields.io/github/stars/CBLUEbenchmark/CBLUE?style=flat-square)](https://github.com/CBLUEbenchmark/CBLUE/stargazers)
![](https://img.shields.io/badge/PRs-Welcome-red)
AI (Artificial Intelligence) plays an indispensable role in the biomedical field, helping improve medical technology. To further accelerate AI research in the biomedical field, we present the **Chinese Biomedical Language Understanding Evaluation** (CBLUE) benchmark, which includes datasets collected from real-world biomedical scenarios, baseline models, and an online platform for model evaluation, comparison, and analysis.
## CBLUE Benchmark
We evaluate 11 current Chinese pre-trained models on the eight biomedical language understanding tasks and report baseline results for each task.
| Model | CMedEE | CMedIE | CDN | CTC | STS | QIC | QTR | QQR | Avg. |
| ------------------------------------------------------------ | :------: | :----: | :------: | :------: | :------: | :------: | :------: | :------: | :--: |
| [BERT-base](https://github.com/ymcui/Chinese-BERT-wwm) | 62.1 | 54.0 | 55.4 | 69.2 | 83.0 | 84.3 | 60.0 | **84.7** | 69.0 |
| [BERT-wwm-ext-base](https://github.com/ymcui/Chinese-BERT-wwm) | 61.7 | 54.0 | 55.4 | 70.1 | 83.9 | 84.5 | 60.9 | 84.4 | 69.4 |
| [ALBERT-tiny](https://github.com/brightmart/albert_zh) | 50.5 | 35.9 | 50.2 | 61.0 | 79.7 | 75.8 | 55.5 | 79.8 | 61.1 |
| [ALBERT-xxlarge](https://huggingface.co/voidful/albert_chinese_xxlarge) | 61.8 | 47.6 | 37.5 | 66.9 | 84.8 | 84.8 | 62.2 | 83.1 | 66.1 |
| [RoBERTa-large](https://github.com/brightmart/roberta_zh) | 62.1 | 54.4 | 56.5 | **70.9** | 84.7 | 84.2 | 60.9 | 82.9 | 69.6 |
| [RoBERTa-wwm-ext-base](https://github.com/ymcui/Chinese-BERT-wwm) | 62.4 | 53.7 | 56.4 | 69.4 | 83.7 | **85.5** | 60.3 | 82.7 | 69.3 |
| [RoBERTa-wwm-ext-large](https://github.com/ymcui/Chinese-BERT-wwm) | 61.8 | 55.9 | 55.7 | 69.0 | 85.2 | 85.3 | 62.8 | 84.4 | 70.0 |
| [PCL-MedBERT](https://code.ihub.org.cn/projects/1775) | 60.6 | 49.1 | 55.8 | 67.8 | 83.8 | 84.3 | 59.3 | 82.5 | 67.9 |
| [ZEN](https://github.com/sinovation/ZEN) | 61.0 | 50.1 | 57.8 | 68.6 | 83.5 | 83.2 | 60.3 | 83.0 | 68.4 |
| [MacBERT-base](https://huggingface.co/hfl/chinese-macbert-base) | 60.7 | 53.2 | 57.7 | 67.7 | 84.4 | 84.9 | 59.7 | 84.0 | 69.0 |
| [MacBERT-large](https://huggingface.co/hfl/chinese-macbert-large) | **62.4** | 51.6 | **59.3** | 68.6 | **85.6** | 82.7 | **62.9** | 83.5 | 69.6 |
| Human | 67.0 | 66.0 | 65.0 | 78.0 | 93.0 | 88.0 | 71.0 | 89.0 | 77.1 |
## Baseline of tasks
We provide baseline models for the biomedical tasks and release the corresponding code for a quick start.
### Requirements
python3 / pytorch 1.7 / transformers 4.5.1 / jieba / gensim / scikit-learn (`sklearn`)
### Data preparation
[Download dataset](https://tianchi.aliyun.com/dataset/dataDetail?dataId=95414)
The whole zip package includes the datasets of 8 biomedical NLU tasks (more detail in the following section). Every task includes the following files:
```text
├── {Task}
| └── {Task}_train.json
| └── {Task}_test.json
| └── {Task}_dev.json
| └── example_gold.json
| └── example_pred.json
| └── README.md
```
**Notice: a few tasks have additional files, e.g. the CHIP-CTC task includes a `category.xlsx` file.**
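As a quick sanity check after unzipping, the layout above can be verified with a few lines of Python. This is a minimal sketch, not part of the released code: the helper name `missing_task_files` is hypothetical, and it only checks the three split files named in the tree.

```python
from pathlib import Path

# Split names taken from the file layout above.
EXPECTED_SPLITS = ("train", "dev", "test")


def missing_task_files(data_dir, task):
    """Return the names of the expected split files that are absent for one task.

    `missing_task_files` is a hypothetical helper, not part of the CBLUE code.
    """
    task_dir = Path(data_dir) / task
    expected = [task_dir / f"{task}_{split}.json" for split in EXPECTED_SPLITS]
    return [path.name for path in expected if not path.is_file()]


if __name__ == "__main__":
    # An empty list means all three split files were found.
    print(missing_task_files("CBLUEDatasets", "KUAKE-QQR"))
```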
You can download the Chinese pre-trained models you need (download URLs are given in the table above). The models can be easily loaded with [Huggingface-Transformers](https://huggingface.co/).
The reference directory:
```text
├── CBLUE
|  └── baselines
|     └── run_classifier.py
|     └── ...
|  └── examples
|     └── run_qqr.sh
|     └── ...
|  └── cblue
|  └── CBLUEDatasets
|     └── KUAKE-QQR
|     └── ...
|  └── data
|     └── output
|     └── model_data
|        └── bert-base
|        └── ...
|     └── result_output
|        └── KUAKE-QQR_test.json
|        └── ...
```
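Once a checkpoint has been placed under `data/model_data`, loading it might look like the sketch below. It assumes a BERT-style checkpoint in Huggingface format (config, vocabulary, and weights in one folder); `load_local_model` is a hypothetical helper, and other model types such as ALBERT or ZEN would need their own classes.

```python
from pathlib import Path


def load_local_model(model_dir, model_name):
    """Load a tokenizer and encoder from MODEL_DIR/MODEL_NAME.

    Hypothetical helper; assumes a BERT-style checkpoint in Huggingface format.
    """
    path = Path(model_dir) / model_name
    if not path.is_dir():
        raise FileNotFoundError(f"no model directory at {path}")

    # Imported here so the path check above works without transformers installed.
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained(str(path))
    model = BertModel.from_pretrained(str(path))
    return tokenizer, model
```

For example, `load_local_model("data/model_data", "bert-base")` would read the `bert-base` folder shown in the tree.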
### Running examples
Shell scripts for training and evaluating every task are provided in `examples/` and can be run directly.
You can also use the scripts in `baselines/` and write your own shell files as needed:
- `baselines/run_classifier.py`: supports the `{sts, qqr, qtr, qic, ctc, ee}` tasks;
- `baselines/run_cdn.py`: supports the `{cdn}` task;
- `baselines/run_ie.py`: supports the `{ie}` task.
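The task-to-script assignment in the list above can be captured in a small lookup, which is handy when writing a wrapper that launches several tasks. This is a sketch only; `script_for_task` is a hypothetical name and not part of the repository.

```python
# Tasks handled by run_classifier.py, as listed above.
CLASSIFIER_TASKS = {"sts", "qqr", "qtr", "qic", "ctc", "ee"}


def script_for_task(task):
    """Return the baseline script that handles the given task name.

    Hypothetical helper; the mapping mirrors the list in this README.
    """
    task = task.lower()
    if task in CLASSIFIER_TASKS:
        return "baselines/run_classifier.py"
    if task == "cdn":
        return "baselines/run_cdn.py"
    if task == "ie":
        return "baselines/run_ie.py"
    raise ValueError(f"unknown task: {task!r}")
```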
#### Training models
Run the shell file with `bash examples/run_{task}.sh`. Its contents are as follows:
```shell
DATA_DIR="CBLUEDatasets"
TASK_NAME="qqr"
MODEL_TYPE="bert"
MODEL_DIR="data/model_data"
MODEL_NAME="chinese-bert-wwm"
OUTPUT_DIR="data/output"
RESULT_OUTPUT_DIR="data/result_output"
MAX_LENGTH=128
python baselines/run_classifier.py \
--data_dir=${DATA_DIR} \
--model_type=${MODEL_TYPE} \
--model_dir=${MODEL_DIR} \
--model_name=${MODEL_NAME} \
--task_name=${TASK_NAME} \
--output_dir=${OUTPUT_DIR} \
--result_output_dir=${RESULT_OUTPUT_DIR} \
--do_train \
--max_length=${MAX_LENGTH} \
--train_batch_size=16 \
--eval_batch_size=16 \
--learning_rate=3e-5 \
--epochs=3 \
--warmup_proportion=0.1 \
--earlystop_patience=3 \
--logging_steps=250 \
--save_steps=250 \
--seed=2021
```
**Notice: the best checkpoint is saved in** `OUTPUT_DIR/MODEL_NAME/`.
- `MODEL_TYPE`: supports the `{bert, roberta, albert, zen}` model types;
- `MODEL_NAME`: supports the `{bert-base, bert-wwm-ext, albert-tiny, albert-xxlarge, zen, pcl-medbert, roberta-large, roberta-wwm-ext-base, roberta-wwm-ext-large, macbert-base, macbert-large}` Chinese pre-trained models.
The `MODEL_TYPE`-`MODEL_NAME` mappings are listed below.
| MODEL_TYPE | MODEL_NAME |
| :--------: | :----------------------------------------------------------- |
| `bert` | `bert-base`, `bert-wwm-ext`, `pcl-medbert`, `macbert-base`, `macbert-large` |
| `roberta` | `roberta-large`, `roberta-wwm-ext-base`, `roberta-wwm-ext-large` |
| `albert` | `albert-tiny`, `albert-xxlarge` |
| `zen` | `zen` |
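The mapping table can also be written as a dictionary to catch a mismatched `MODEL_TYPE`/`MODEL_NAME` pair before launching a run. The sketch below copies the table as-is; the names `MODEL_TYPE_TO_NAMES` and `model_type_for` are illustrative and do not come from the baseline code.

```python
# Copied from the MODEL_TYPE / MODEL_NAME table above.
MODEL_TYPE_TO_NAMES = {
    "bert": ["bert-base", "bert-wwm-ext", "pcl-medbert", "macbert-base", "macbert-large"],
    "roberta": ["roberta-large", "roberta-wwm-ext-base", "roberta-wwm-ext-large"],
    "albert": ["albert-tiny", "albert-xxlarge"],
    "zen": ["zen"],
}

# Reverse lookup: model name -> model type.
NAME_TO_TYPE = {
    name: model_type
    for model_type, names in MODEL_TYPE_TO_NAMES.items()
    for name in names
}


def model_type_for(model_name):
    """Return the MODEL_TYPE listed for a MODEL_NAME (hypothetical helper)."""
    try:
        return NAME_TO_TYPE[model_name]
    except KeyError:
        raise ValueError(f"unsupported MODEL_NAME: {model_name!r}") from None
```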
#### Inference & generation of results
Run the shell file with `bash examples/run_{task}.sh predict`. Its contents are as follows:
```shell
DATA_DIR="CBLUEDatasets"
TASK_NAME="qqr"
MODEL_TYPE="bert"
MODEL_DIR="data/model_data"
MODEL_NAME="chinese-bert-wwm"
OUTPUT_DIR="data/output"
RESULT_OUTPUT_DIR="data/result_output"
MAX_LENGTH=128
python baselines/run_classifier.py \
--data_dir=${DATA_DIR} \
--model_type=${MODEL_TYPE} \
--model_name=${MODEL_NAME} \
--model_dir=${MODEL_DIR} \
--task_name=${TASK_NAME} \
--output_dir=${OUTPUT_DIR} \
--result_output_dir=${RESULT_OUTPUT_DIR} \
--do_predict \
--max_length=${MAX_LENGTH} \
--eval_batch_size=16 \
--seed=2021
```
**Notice: the prediction result** `{TASK_NAME}_test.json` **will be generated in** `RESULT_OUTPUT_DIR`.
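The training and prediction invocations differ only in `--do_train` versus `--do_predict` and in the training-only hyperparameters. A minimal sketch of building either command from Python is shown below; `build_command` is a hypothetical helper, and all flags and default values are taken from the two shell snippets above.

```python
def build_command(mode, task_name="qqr", model_type="bert", model_name="chinese-bert-wwm"):
    """Build the argument list for run_classifier.py in 'train' or 'predict' mode.

    Hypothetical helper; flags and defaults mirror the shell scripts above.
    """
    if mode not in {"train", "predict"}:
        raise ValueError(f"mode must be 'train' or 'predict', got {mode!r}")

    args = [
        "python", "baselines/run_classifier.py",
        "--data_dir=CBLUEDatasets",
        f"--model_type={model_type}",
        "--model_dir=data/model_data",
        f"--model_name={model_name}",
        f"--task_name={task_name}",
        "--output_dir=data/output",
        "--result_output_dir=data/result_output",
        f"--do_{mode}",
        "--max_length=128",
        "--eval_batch_size=16",
        "--seed=2021",
    ]
    if mode == "train":
        # Hyperparameters that appear only in the training script.
        args += [
            "--train_batch_size=16",
            "--learning_rate=3e-5",
            "--epochs=3",
            "--warmup_proportion=0.1",
            "--earlystop_patience=3",
            "--logging_steps=250",
            "--save_steps=250",
        ]
    return args
```

The returned list can be passed to `subprocess.run(..., check=True)` from the repository root.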
### Submit results
Compre