## Baseline System for CMRC 2019
We provide a simple BERT-based baseline system (PyTorch version) for participants. </br>
### Note
- We assume that you are already familiar with [PyTorch BERT](https://github.com/huggingface/pytorch-pretrained-BERT).
- We are NOT responsible for helping participants build their baseline systems.
- The baseline code is only intended to help participants better understand the basic routine of this task.
## Content
| Section | Description |
|---|---|
| [Requirements](#Requirements) | Describe dependency requirements |
| [Preparations](#Preparation) | Describe preparation steps before running |
| [Training](#Training) | Training command line |
| [Testing](#Testing) | Testing command line |
| [Baseline Results](#Baseline-Results) | Baseline Results |
| [Acknowledgement](#Acknowledgement) | - |
## Requirements
Our code is adapted from [PyTorch BERT](https://github.com/huggingface/pytorch-pretrained-BERT). </br>
If you are familiar with it, nothing here will be new. </br>
Specifically, we use `pytorch 1.0.0` for the baseline system.
## Preparation
### Step 1: Clone the repository
```
git clone https://github.com/ymcui/cmrc2019.git
```
### Step 2: Prepare Chinese BERT weights (PyTorch version)
You need the pre-trained Chinese BERT weights to initialize the model. </br>
Please refer to the official guidelines: https://github.com/huggingface/pytorch-pretrained-BERT#Command-line-interface
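Before training, it can help to confirm that the converted weights are where the training command expects them. The following is a minimal sketch, not part of the baseline code: the helper name `missing_weight_files` is hypothetical, and the three file names are taken from the training command in this README.

```python
from pathlib import Path

# File names as they appear in the training command of this README.
EXPECTED_FILES = ("vocab.txt", "bert_config.json", "pytorch_model.bin")


def missing_weight_files(weights_dir="bert_weights_chinese"):
    """Return the expected weight files that are absent from weights_dir.

    Hypothetical helper for illustration; an empty list means all three
    files were found.
    """
    root = Path(weights_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

Calling `missing_weight_files()` from the repository root returns an empty list when everything is in place, and otherwise lists the files that still need to be downloaded or converted.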
## Training
We assume that all files are placed in the correct paths. </br>
The pre-trained Chinese BERT weights (PyTorch version) should be placed in the `bert_weights_chinese` folder.
```
python run_baseline.py \
--bert_model bert-base-chinese \
--vocab_file ./bert_weights_chinese/vocab.txt \
--bert_config_file ./bert_weights_chinese/bert_config.json \
--init_checkpoint ./bert_weights_chinese/pytorch_model.bin \
--do_train \
--do_predict \
--train_file cmrc2019_train.json \
--predict_file cmrc2019_trial.json \
--train_batch_size 24 \
--learning_rate 2e-5 \
--num_train_epochs 3.0 \
--max_seq_length 512 \
--output_dir ./output_model
```
We used one NVIDIA V100 (32GB) for training, which took roughly 4 to 5 hours.
## Testing
Once you have successfully trained your model, you can use the following command to test it (here, on the trial data).
```
python run_baseline.py \
--bert_model bert-base-chinese \
--vocab_file ./bert_weights_chinese/vocab.txt \
--bert_config_file ./bert_weights_chinese/bert_config.json \
--do_predict \
--predict_file cmrc2019_trial.json \
--max_seq_length 512 \
--output_dir ./output_model
```
After running the testing script, a `predictions.json` file will be generated in the `--output_dir` folder.
Then use the official evaluation script `cmrc2019_evaluate.py` (in the `eval` directory of this GitHub repository) to get the final results.
```
python cmrc2019_evaluate.py cmrc2019_trial.json predictions.json
```
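If the evaluation script fails, one thing to rule out is a missing or malformed prediction file. The sketch below is an illustration only: `count_predictions` is a hypothetical helper, and it assumes (without confirmation from the baseline code) that `predictions.json` holds a single top-level JSON object or array.

```python
import json
from pathlib import Path


def count_predictions(output_dir="./output_model", filename="predictions.json"):
    """Load the prediction file and return how many top-level entries it has.

    Assumption: the file is valid JSON whose top level is an object or array.
    """
    path = Path(output_dir) / filename
    if not path.is_file():
        raise FileNotFoundError(f"No prediction file found at {path}")
    with path.open(encoding="utf-8") as f:
        predictions = json.load(f)
    return len(predictions)
```

A count of zero, or a `json.JSONDecodeError`, suggests the prediction run did not finish cleanly.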
The results will be displayed as follows:
```
FILE: predictions.json
QAC: 64.6276595745
PAC: 10.7913669065
TOTAL: 1504
SKIP: 0
```
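To compare runs or log scores automatically, the report above can be turned into a dictionary. This is a minimal sketch that assumes the output keeps the `KEY: value` layout shown above; `parse_eval_report` is a hypothetical helper and is not part of `cmrc2019_evaluate.py`.

```python
def parse_eval_report(text):
    """Parse 'KEY: value' lines from the evaluation output into a dict.

    QAC and PAC are stored as floats, TOTAL and SKIP as ints, and any
    other key (such as FILE) as a plain string.
    """
    report = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip lines that do not contain a colon
        key, value = key.strip(), value.strip()
        if key in ("QAC", "PAC"):
            report[key] = float(value)
        elif key in ("TOTAL", "SKIP"):
            report[key] = int(value)
        else:
            report[key] = value
    return report
```

The text to parse can be captured by redirecting the evaluation script's output to a file, or by running it through `subprocess.run(..., capture_output=True, text=True)`.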
## Baseline Results
We provide a BERT-based baseline system for participants (will be available shortly).
Results on other sets will be announced later.
Note: Due to non-determinism on GPU, your results may differ slightly.
| Data | QAC | PAC |
| :------ | :-----: | :-----: |
| Trial data | 64.627% | 10.791% |
| Development data | - | - |
| Qualifying data | - | - |
| Test data | - | - |
## Acknowledgement
Our code is adapted from [PyTorch BERT](https://github.com/huggingface/pytorch-pretrained-BERT).