# Notice
This repository is no longer maintained.
If any issues, check by yourself.
# CasRel Model Pytorch reimplement 3
The code is the PyTorch reimplement of the paper "A Novel Cascade Binary Tagging Framework for Relational Triple Extraction" ACL2020.
The [official code](https://github.com/weizhepei/CasRel) was written in keras.
I have encountered a lot of troubles with the keras version, so I decided to rewrite the code in PyTorch.
# Introduction
I followed the previous work of [longlongman](https://github.com/longlongman/CasRel-pytorch-reimplement)
and [JuliaSun623](https://github.com/JuliaSun623/CasRel_fastNLP).
So I have to express sincere thanks to them.
I made some changes in order to better apply to the Chinese Dataset.
The changes I have made are listed:
- I changed the tokenizer from HBTokenizer to BertTokenizer, so Chinese sentences are tokenized by single character.
(Note that you don't need to worry about keras)
- I substituted the original pretrained model with 'bert-base-chinese'.
- I used fastNLP to build the datasets.
- I changed the encoding and decoding methods in order to fit the Chinese Dataset.
- I reconstruct the structure for readability.
# Requirements
- torch==1.8.0+cu111
- transformers==4.3.3
- fastNLP==0.6.0
- tqdm==4.59.0
- numpy==1.20.1
# Dataset
I preprocessed the open-source dataset from Baidu. I did some cleaning, so the data given have 18 relation types.
Some noisy data are eliminated.
The data are in form of json. Take one as an example:
```json
{
"text": "陶喆的一首《好好说再见》推荐给大家,希望你们能够喜欢",
"spo_list": [
{
"predicate": "歌手",
"object_type": "人物",
"subject_type": "歌曲",
"object": "陶喆",
"subject": "好好说再见"
}
]
}
```
In fact the field object_type and subject_type are not used.
If you have your own data, you can organize your data in the same format.
# Usage
```
python run.py
```
I have already set the default value of the model, but you can still set your own configuration in model/config.py
# Results
The best F1 score on test data is 0.78 with a precision of 0.80 and recall of 0.76.
It is to my expectation although it may not reach its utmost.
I have also trained the [SpERT](https://github.com/lavis-nlp/spert) model,
and CasRel turns out to perform better.
More experiments need to be carried out since there are slight differences in both criterion and datasets.
# Experiences
- Learning rate 1e-5 seems a good choice. If you change the learning rate, the model will be dramatically affected.
- It shows little improvement when I substitute BERT with RoBERTa.
- It is crucial to shuffle the datasets in order to avoid overfitting.
没有合适的资源?快使用搜索试试~ 我知道了~
Reimplement CasRel model in PyTorch.使用PyTorch对吉林大学CasRel模型.zip
共11个文件
py:6个
json:4个
md:1个
需积分: 1 2 下载量 131 浏览量
2024-03-06
22:58:17
上传
评论 1
收藏 6.8MB ZIP 举报
温馨提示
Reimplement CasRel model in PyTorch.使用PyTorch对吉林大学CasRel模型.zip
资源推荐
资源详情
资源评论
收起资源包目录
Reimplement CasRel model in PyTorch.使用PyTorch对吉林大学CasRel模型.zip (11个子文件)
CasRelPyTorch-master
Run.py 3KB
data
baidu
test.json 5.37MB
train.json 20.12MB
rel.json 364B
dev.json 4.03MB
model
evaluate.py 5KB
casRel.py 2KB
data.py 5KB
callback.py 2KB
config.py 1KB
README.md 3KB
共 11 条
- 1
资源评论
日刷百题
- 粉丝: 5354
- 资源: 951
上传资源 快速赚钱
- 我的内容管理 展开
- 我的资源 快来上传第一个资源
- 我的收益 登录查看自己的收益
- 我的积分 登录查看自己的积分
- 我的C币 登录后查看C币余额
- 我的收藏
- 我的下载
- 下载帮助
安全验证
文档复制为VIP权益,开通VIP直接复制
信息提交成功