CMRC 2019 Chinese Reading Comprehension Competition Solution (CMRC2019中文阅读理解比赛方案.zip), CSDN Library resource

22 files in total

py: 9

pyc: 5

sh: 2

Uploaded 2023-09-30 14:47:15, 9.48MB, ZIP

CMRC2019中文阅读理解比赛方案.zip (22 files)

CMRC2019-master

baseline

run_cmrc2019_baseline.py 54KB

requirements.txt 181B

pytorch_pretrained_bert

__init__.py 478B

convert_tf_2_torch.sh 456B

file_utils.py 8KB

modeling.py 54KB

optimization.py 7KB

__main__.py 908B

tokenization.py 13KB

__pycache__

tokenization.cpython-36.pyc 9KB

__init__.cpython-36.pyc 710B

modeling.cpython-36.pyc 48KB

optimization.cpython-36.pyc 4KB

file_utils.cpython-36.pyc 7KB

convert_tf_checkpoint_to_pytorch.py 4KB

run.sh 614B

README.md 3KB

data

cmrc2019_train.json 23.26MB

cmrc2019_trial.json 354KB

sample_submission

trial_submission.zip 2KB

eval

cmrc2019_evaluate.py 2KB

README.md 1KB

## Baseline System for CMRC 2019

We provide a simple BERT-based baseline system (PyTorch version) for participants.

### Note

- We assume that you are familiar with [PyTorch BERT](https://github.com/huggingface/pytorch-pretrained-BERT).
- We are NOT responsible for helping participants build their baseline systems.
- The baseline code is only intended to help participants better understand the basic routine of this task.

## Content

| Section | Description |
|---|---|
| [Requirements](#Requirements) | Dependency requirements |
| [Preparation](#Preparation) | Preparation steps before running |
| [Training](#Training) | Training command line |
| [Testing](#Testing) | Testing command line |
| [Baseline Results](#Baseline-Results) | Baseline results |
| [Acknowledgement](#Acknowledgement) | - |

## Requirements

Our code is adapted from [PyTorch BERT](https://github.com/huggingface/pytorch-pretrained-BERT). If you are familiar with that, there is nothing special here. Specifically, we use `pytorch 1.0.0` for the baseline system.

## Preparation

### Step 1: Clone the repository

```
git clone https://github.com/ymcui/cmrc2019.git
```

### Step 2: Prepare Chinese BERT weights (PyTorch version)

You need pre-trained Chinese BERT weights for initialization. Please refer to the official guidelines at: https://github.com/huggingface/pytorch-pretrained-BERT#Command-line-interface

## Training

We assume that all the files are placed in the correct paths. Pre-trained Chinese BERT weights (PyTorch version) should be placed in the `bert_weights_chinese` folder.
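Before launching a multi-hour run, it can help to confirm that the weights folder is complete. The following is a minimal sketch, not part of the baseline: the helper name `missing_weight_files` is hypothetical, and the three file names are taken from the `--vocab_file`, `--bert_config_file` and `--init_checkpoint` arguments of the training command in this section.

```python
from pathlib import Path

# File names assumed from the training command's --vocab_file,
# --bert_config_file and --init_checkpoint arguments.
REQUIRED_FILES = ("vocab.txt", "bert_config.json", "pytorch_model.bin")


def missing_weight_files(weights_dir):
    """Return the required file names that are absent from weights_dir."""
    root = Path(weights_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # "bert_weights_chinese" is the folder name used in this README.
    missing = missing_weight_files("bert_weights_chinese")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected weight files are present.")
```

An empty list means all three files were found; anything else names what still needs to be converted or copied into the folder.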
Run training with the following command:

```
python run_baseline.py \
  --bert_model bert-base-chinese \
  --vocab_file ./bert_weights_chinese/vocab.txt \
  --bert_config_file ./bert_weights_chinese/bert_config.json \
  --init_checkpoint ./bert_weights_chinese/pytorch_model.bin \
  --do_train \
  --do_predict \
  --train_file cmrc2019_train.json \
  --predict_file cmrc2019_trial.json \
  --train_batch_size 24 \
  --learning_rate 2e-5 \
  --num_train_epochs 3.0 \
  --max_seq_length 512 \
  --output_dir ./output_model
```

Note: the file listing for this archive shows the baseline script as `run_cmrc2019_baseline.py`; use that name if `run_baseline.py` is not present.

Training on one NVIDIA V100 (32GB) takes roughly 4 to 5 hours.

## Testing

Once you have successfully trained your model, you can use the following command to test it on the test sets.

```
python run_baseline.py \
  --bert_model bert-base-chinese \
  --vocab_file ./bert_weights_chinese/vocab.txt \
  --bert_config_file ./bert_weights_chinese/bert_config.json \
  --do_predict \
  --predict_file cmrc2019_trial.json \
  --max_seq_length 512 \
  --output_dir ./output_model
```

After the testing script finishes, a `predictions.json` file is generated under the `--output_dir` folder. You can then use the official evaluation script `cmrc2019_evaluate.py` (in the `eval` directory of the GitHub repository) to get the final results.

```
python cmrc2019_evaluate.py cmrc2019_trial.json predictions.json
```

The results are shown in the following form:

```
FILE: predictions.json
QAC: 64.6276595745
PAC: 10.7913669065
TOTAL: 1504
SKIP: 0
```

## Baseline Results

We provide a BERT-based baseline system for participants (will be available shortly). Results on other sets will be announced later.

Note: Due to non-determinism on GPU, your results may differ slightly.

| Data | QAC | PAC |
| :------ | :-----: | :-----: |
| Trial data | 64.627% | 10.791% |
| Development data | - | - |
| Qualifying data | - | - |
| Test data | - | - |

## Acknowledgement

Our code is adapted from [PyTorch BERT](https://github.com/huggingface/pytorch-pretrained-BERT).
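The QAC and PAC figures reported in this README are understood here as question-level accuracy (share of blanks answered correctly) and passage-level accuracy (share of passages with every blank correct). The sketch below illustrates how such scores could be computed. It is not the official `cmrc2019_evaluate.py`: the function name `qac_pac` and the input layout (a dict mapping a passage ID to a list of answers, one per blank) are assumptions made for illustration, and a passage missing from the predictions is simply scored as wrong rather than reported separately.

```python
def qac_pac(gold, pred):
    """Return (QAC, PAC) as percentages.

    gold, pred: dicts mapping a passage ID to a list of answers, one per blank.
    This layout is a hypothetical one, not the official file format.
    """
    total_blanks = correct_blanks = 0
    total_passages = correct_passages = 0
    for pid, gold_answers in gold.items():
        pred_answers = pred.get(pid, [])
        # Count blanks whose predicted answer matches the gold answer.
        hits = sum(
            1
            for i, g in enumerate(gold_answers)
            if i < len(pred_answers) and pred_answers[i] == g
        )
        total_blanks += len(gold_answers)
        correct_blanks += hits
        total_passages += 1
        # A passage counts as correct only if every blank is correct.
        correct_passages += int(hits == len(gold_answers))
    qac = 100.0 * correct_blanks / total_blanks if total_blanks else 0.0
    pac = 100.0 * correct_passages / total_passages if total_passages else 0.0
    return qac, pac


# Toy data: two passages, five blanks, one wrong answer in passage "p2".
gold = {"p1": [0, 2, 1], "p2": [1, 0]}
pred = {"p1": [0, 2, 1], "p2": [1, 3]}
qac, pac = qac_pac(gold, pred)
print(f"QAC: {qac:.2f}  PAC: {pac:.2f}")  # QAC: 80.00  PAC: 50.00
```

In the toy data, four of five blanks are correct (QAC 80%) but only one of two passages is fully correct (PAC 50%), which shows why PAC can sit far below QAC, as in the trial-data row of the results table.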
