## A simple BiLSTM-CRF model for Chinese Named Entity Recognition
This repository contains the code for building a very simple __character-based BiLSTM-CRF sequence labelling model__ for the Chinese Named Entity Recognition task. Its goal is to recognize three types of named entities: PERSON, LOCATION and ORGANIZATION.
This code works with __Python 3 & TensorFlow 1.2__. The repository [https://github.com/guillaumegenthial/sequence_tagging](https://github.com/guillaumegenthial/sequence_tagging) was a great help in writing it.
### model
This model is similar to the models proposed in papers [1] and [2]. Its structure is shown in the following illustration:
![Network](./pics/pic1.png)
For a Chinese sentence, each character is assigned one tag from the set {O, B-PER, I-PER, B-LOC, I-LOC, B-ORG, I-ORG}.
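To make the tag set concrete, here is a minimal sketch of how a tagged character sequence maps back to entities. The function name `extract_entities` and the example sentence are illustrative assumptions and are not taken from this repository's code.

```python
def extract_entities(chars, tags):
    """Collect (entity_text, entity_type) pairs from a BIO tag sequence."""
    entities = []
    current_chars, current_type = [], None
    for ch, tag in zip(chars, tags):
        if tag.startswith("B-"):
            # A B- tag always opens a new entity; close any open one first.
            if current_chars:
                entities.append(("".join(current_chars), current_type))
            current_chars, current_type = [ch], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag extends the open entity of the same type.
            current_chars.append(ch)
        else:
            # "O", or an I- tag that does not continue the open entity.
            if current_chars:
                entities.append(("".join(current_chars), current_type))
            current_chars, current_type = [], None
    if current_chars:
        entities.append(("".join(current_chars), current_type))
    return entities


# Hypothetical example sentence: one tag per character.
sentence = list("小明在北京工作")
tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "O", "O"]
entities = extract_entities(sentence, tags)
print(entities)
```

In this sketch an `I-` tag that does not follow a matching `B-` or `I-` tag is simply dropped; other conventions (for example, treating it as the start of a new entity) are equally possible.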
The first layer, the __look-up layer__, transforms each character's representation from a one-hot vector into a *character embedding*. In this code the embedding matrix is initialized randomly, which is admittedly very simple. Linguistic knowledge could be added later. For example, the sentence could be tokenized and pre-trained word-level embeddings used, so that every character in a token is initialized with that token's word embedding. Alternatively, the character embedding could be built by combining low-level features (see section 4.1 of paper [2] and section 3.3 of paper [3] for more details).
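The look-up step can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the toy vocabulary, the embedding size, and the initialization range are hypothetical, and the repository itself builds this layer in TensorFlow. The sketch shows that multiplying a one-hot vector by the embedding matrix is the same as selecting one row of it.

```python
import random

random.seed(0)

# Hypothetical toy vocabulary; the real one is built from the training data.
vocab = {"<PAD>": 0, "<UNK>": 1, "北": 2, "京": 3}
embedding_dim = 4  # assumed size, for illustration only

# Randomly initialized embedding matrix: one row per vocabulary entry.
embeddings = [
    [random.uniform(-0.25, 0.25) for _ in range(embedding_dim)]
    for _ in vocab
]


def one_hot(index, size):
    """Return a one-hot vector of the given size."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def vector_times_matrix(vector, matrix):
    """Multiply a row vector by a matrix (list of rows)."""
    return [
        sum(vector[r] * matrix[r][c] for r in range(len(matrix)))
        for c in range(len(matrix[0]))
    ]


def lookup(char):
    """Return the embedding row for a character, falling back to <UNK>."""
    return embeddings[vocab.get(char, vocab["<UNK>"])]


by_product = vector_times_matrix(one_hot(vocab["京"], len(vocab)), embeddings)
by_lookup = lookup("京")
print(by_product == by_lookup)
```

Row selection avoids the full matrix product, which is why embedding layers are implemented as look-ups in practice.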
The second layer, __BiLSTM layer__, can efficiently use *both past and future* input information and extract features automatically.
The third layer, the __CRF layer__, assigns a tag to each character in a sentence. If a Softmax layer were used for labelling, the model might produce invalid tag sequences, because Softmax labels each position independently. We know that 'I-LOC' cannot follow 'B-PER', but Softmax does not. In contrast, the CRF layer can use *sentence-level tag information* and model the transitions between each pair of tags.
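The difference between per-position labelling and sequence-level decoding can be illustrated with a small Viterbi decoder. This is a sketch under loud assumptions: the transition scores below are hand-written to forbid invalid BIO transitions, whereas a trained CRF layer learns its transition scores from data, and the emission scores are made up for the example. None of these names come from the repository's code.

```python
TAGS = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC", "B-ORG", "I-ORG"]
NEG = -1e4  # large penalty standing in for a "forbidden" transition


def is_allowed(prev_tag, tag):
    """An I-X tag may only follow B-X or I-X; everything else is allowed."""
    if tag.startswith("I-"):
        return prev_tag in ("B-" + tag[2:], "I-" + tag[2:])
    return True


# Hand-written transition scores (assumption: a real CRF learns these).
transitions = [[0.0 if is_allowed(p, t) else NEG for t in TAGS] for p in TAGS]


def row(scores):
    """Build one emission row from a {tag: score} dict; missing tags get 0."""
    return [scores.get(tag, 0.0) for tag in TAGS]


def greedy_decode(emissions):
    """Pick the best tag at each position independently (Softmax-style)."""
    return [TAGS[max(range(len(TAGS)), key=lambda j: e[j])] for e in emissions]


def viterbi_decode(emissions, transitions):
    """Find the tag sequence with the highest total emission + transition score."""
    n = len(TAGS)
    # A sentence may not start with an I- tag.
    scores = [emissions[0][j] + (NEG if TAGS[j].startswith("I-") else 0.0)
              for j in range(n)]
    backpointers = []
    for emission in emissions[1:]:
        new_scores, pointers = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: scores[i] + transitions[i][j])
            new_scores.append(scores[best_i] + transitions[best_i][j] + emission[j])
            pointers.append(best_i)
        scores = new_scores
        backpointers.append(pointers)
    # Trace the best path backwards from the best final tag.
    best = max(range(n), key=lambda j: scores[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return [TAGS[i] for i in path]


# Made-up emission scores for a two-character input.
emissions = [
    row({"B-PER": 2.0, "B-LOC": 1.5}),
    row({"I-LOC": 2.0, "I-PER": 1.8}),
]
greedy_path = greedy_decode(emissions)
viterbi_path = viterbi_decode(emissions, transitions)
print(greedy_path)
print(viterbi_path)
```

With these scores the independent decoder picks 'B-PER' followed by 'I-LOC', while the sequence-level decoder settles on 'B-PER' followed by 'I-PER', because the forbidden transition outweighs the slightly higher emission score.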
### dataset
| | #sentence | #PER | #LOC | #ORG |
| :----: | :---: | :---: | :---: | :---: |
| train | 46364 | 17615 | 36517 | 20571 |
| test | 4365 | 1973 | 2877 | 1331 |
It looks like a portion of the [MSRA corpus](http://sighan.cs.uchicago.edu/bakeoff2006/).
### train
`python main.py --mode=train`
### test
`python main.py --mode=test --demo_model=1521112368`
Set the parameter `--demo_model` to the model you want to test. `1521112368` is the model I trained.
An official evaluation tool: [here (click 'Instructions')](http://sighan.cs.uchicago.edu/bakeoff2006/)
My test performance:
| P | R | F | F (PER)| F (LOC)| F (ORG)|
| :---: | :---: | :---: | :---: | :---: | :---: |
| 0.8945 | 0.8752 | 0.8847 | 0.8688 | 0.9118 | 0.8515 |
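As a quick sanity check on the table, the overall F value can be recomputed from P and R. This sketch assumes F is the standard balanced F1 score (the harmonic mean of precision and recall); the evaluation tool's own implementation is not reproduced here, and `f1_score` is a hypothetical helper name.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (balanced F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall precision and recall from the table above.
overall_f1 = f1_score(0.8945, 0.8752)
print(round(overall_f1, 4))  # 0.8847
```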
### demo
`python main.py --mode=demo --demo_model=1521112368`
You can input one Chinese sentence and the model will return the recognition result:
![demo_pic](./pics/pic2.png)
### references
\[1\] [Bidirectional LSTM-CRF Models for Sequence Tagging](https://arxiv.org/pdf/1508.01991v1.pdf)
\[2\] [Neural Architectures for Named Entity Recognition](http://aclweb.org/anthology/N16-1030)
\[3\] [Character-Based LSTM-CRF with Radical-Level Features for Chinese Named Entity Recognition](http://www.nlpr.ia.ac.cn/cip/ZhangPublications/dong-nlpcc-2016.pdf)
\[4\] [https://github.com/guillaumegenthial/sequence_tagging](https://github.com/guillaumegenthial/sequence_tagging)