YEDDA: A Lightweight Collaborative Text Span Annotation Tool
======
About:
====
YEDDA (previously SUTDAnnotator) is developed for annotating chunks, entities, and events in text (almost all languages, including English and Chinese), symbols, and even emoji. It supports shortcut annotation, which makes annotating text by hand highly efficient: the user only needs to select a text span and press a shortcut key, and the span is annotated automatically. It also supports a command annotation mode, which annotates multiple entities in a batch, and it can export annotated text in sequence format. In addition, intelligent recommendation and administrator analysis are included in the updated version. It is compatible with all mainstream operating systems, including Windows, Linux, and macOS.
For more details, please refer to [our paper (ACL2018:demo)](https://arxiv.org/pdf/1711.03759.pdf).
This GUI annotation tool is developed with the tkinter package in Python.
System requirement: Python 2.7
Author: [Jie Yang](https://jiesutd.github.io), PhD candidate at SUTD.
Interface:
====
It provides both an annotator interface for efficient annotation and an admin interface for result analysis.
* Annotator Interface:
![alt text](https://github.com/jiesutd/SUTDAnnotator/blob/master/EnglishInterface.png "English Interface demo")
![alt text](https://github.com/jiesutd/SUTDAnnotator/blob/master/ChineseInterface.png "Chinese Interface demo")
* Administrator Interface:
![alt text](https://github.com/jiesutd/SUTDAnnotator/blob/master/AdminInterface.png "Administrator Interface demo")
Use as an annotator?
====
* Start the interface: run `python YEDDA_Annotator.py`
* Configure your shortcut map on the right side of the annotation interface. You can leave the remaining labels empty if you need fewer shortcuts. For example: `a: Action; b: Loc; c: Cont`
* Click the `ReMap` button to store the map setting.
* Click the `Open` button and select your input file. (If possible, give your file a name ending in `.txt` or `.ann`.)
This tool supports two ways of annotation (annotated text format: `[@the text span#Location*]`):
* Shortcut Key Annotation: select the text and press the corresponding shortcut (e.g. `c` for label `Cont`).
* Command Line Annotation: type the code in the command entry (at the bottom of the annotation interface). For example, typing `2c3b1a` followed by `<Enter>` annotates the next `2` characters as type `c: Cont`, the following `3` characters as type `b: Loc`, and then the following `1` character as `a: Action`.
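The command format described above can be illustrated with a small standalone sketch. This is not YEDDA's own code: the function names `parse_command` and `apply_command` are hypothetical, and the sketch assumes that counts always refer to single characters starting at a given offset.

```python
import re


def parse_command(command):
    """Split a command such as '2c3b1a' into (length, shortcut) pairs."""
    pairs = re.findall(r"(\d+)([a-zA-Z])", command)
    # Reject commands that contain anything besides <count><shortcut> groups.
    if "".join(count + key for count, key in pairs) != command:
        raise ValueError("malformed command: %r" % command)
    return [(int(count), key) for count, key in pairs]


def apply_command(text, start, command, shortcut_map):
    """Wrap consecutive character runs after `start` in [@span#Label*] markup.

    Assumption: each count is a number of characters, as in the example above.
    """
    pieces = [text[:start]]
    pos = start
    for length, key in parse_command(command):
        span = text[pos:pos + length]
        pieces.append("[@%s#%s*]" % (span, shortcut_map[key]))
        pos += length
    pieces.append(text[pos:])
    return "".join(pieces)


shortcuts = {"a": "Action", "b": "Loc", "c": "Cont"}
print(apply_command("abcdefgh", 0, "2c3b1a", shortcuts))
```

With the shortcut map from the configuration example, the command `2c3b1a` consumes six characters in total and leaves the rest of the line unannotated.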
Intelligent recommendation:
* Intelligent recommendation is enabled or disabled with the buttons `RMOn` and `RMOff`, respectively.
* If the recommendation model is enabled, the system recommends entities based on the text already annotated. A recommended span is formatted as `[$the text span#Location*]` and shown in green. (Note the difference between annotated and recommended spans: the former starts with `[@`, while the latter starts with `[$`.)
The annotated results are saved synchronously. The annotated file is located in the same directory as the original file, with the name ***"original name + .ann"***.
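For downstream processing of an `.ann` file, the two span formats can be read back with a regular expression. The following is a minimal sketch under stated assumptions, not the parser YEDDA itself uses: `SPAN_RE`, `extract_spans`, and `strip_markup` are hypothetical names, and nested spans are not handled.

```python
import re

# Matches [@text#Label*] (annotated) and [$text#Label*] (recommended).
# Assumption: labels contain no '#', '*', '[' or ']' characters.
SPAN_RE = re.compile(r"\[([@$])(.*?)#([^#\[\]*]+)\*\]")


def extract_spans(annotated):
    """Return (kind, text, label) tuples for every span in the string."""
    spans = []
    for match in SPAN_RE.finditer(annotated):
        kind = "annotated" if match.group(1) == "@" else "recommended"
        spans.append((kind, match.group(2), match.group(3)))
    return spans


def strip_markup(annotated):
    """Recover the plain text by replacing each span with its content."""
    return SPAN_RE.sub(r"\2", annotated)


line = "I live in [@Singapore#Location*] near [$SUTD#Organization*]."
for span in extract_spans(line):
    print(span)
print(strip_markup(line))
```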
Use as an administrator?
====
YEDDA provides a simple interface for the administrator to evaluate and analyze annotation quality among multiple annotators. After collecting multiple annotated `*.ann` files from multiple annotators (annotated on the same plain text), YEDDA offers two toolkits to monitor annotation quality: multi-annotator analysis and pairwise annotator comparison.
* Start the interface: run `python YEDDA_Admin.py`
* Multi-Annotator Analysis: press the `Multi-Annotator Analysis` button and select multiple annotated `*.ann` files; it will produce an F-measure matrix among all annotators. The result matrix is shown below:
![alt text](https://github.com/jiesutd/SUTDAnnotator/blob/master/resultMatrix.png "Result Matrix")
* Pairwise Annotators Comparison: press the `Pairwise Comparison` button and select two annotated `*.ann` files; it will generate a detailed comparison report (in `.tex` format, which can be compiled into a `.pdf` file). A demo PDF is shown below:
![alt text](https://github.com/jiesutd/SUTDAnnotator/blob/master/detailReport.png "Detail Report")
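To make the idea of an F-measure matrix concrete, here is a rough sketch of how agreement between annotators could be computed. It is an illustration only: it assumes each annotator's output has already been reduced to a set of `(start, end, label)` tuples and that only exact matches count, which may differ from what `YEDDA_Admin.py` actually does. The names `f_measure` and `agreement_matrix` are hypothetical.

```python
def f_measure(reference, candidate):
    """F1 between two sets of (start, end, label) spans, exact match only."""
    reference, candidate = set(reference), set(candidate)
    if not reference or not candidate:
        return 0.0
    overlap = len(reference & candidate)
    precision = overlap / float(len(candidate))
    recall = overlap / float(len(reference))
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def agreement_matrix(annotators):
    """Map each pair of annotator names to their F-measure."""
    names = sorted(annotators)
    return {a: {b: f_measure(annotators[a], annotators[b]) for b in names}
            for a in names}


spans = {
    "UserA": {(0, 2, "Loc"), (5, 8, "Org")},
    "UserB": {(0, 2, "Loc"), (5, 8, "Per")},
}
for name, row in sorted(agreement_matrix(spans).items()):
    print(name, row)
```

Each annotator agrees perfectly with themselves, so the diagonal of the matrix is 1.0 whenever the annotator produced at least one span.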
Important features:
=====
1. Typing `ctrl + z` undoes the most recent modification.
2. Put the cursor within an entity span and press a shortcut key (e.g. `x`) to update the label of that entity to the one bound to `x`. (Press `q` to remove the label.)
3. Select annotated text, such as `[@美国#Location*]`, then press `q`; the annotated text is restored to its unannotated form (i.e. "美国").
4. Change a label directly: select the entity content or put the cursor inside the entity span (such as `[@美国#Location*]`), then press `x`; the annotated text changes to the new label mapped to shortcut `x` (e.g. `[@美国#Organization*]`).
5. Confirm or remove a recommended entity: put the cursor inside the entity span and press `y` (yes) or `q` (quit).
6. In the command entry, pressing `Enter` without any command moves the cursor in the text to the head of the next line. (You can monitor this through "Cursor".)
7. The "Cursor" shows the current cursor position in the text widget, with `row` and `col` representing the row and column numbers, respectively.
8. The `Export` button exports the ***".ann"*** file to a file with the same name and the ***".anns"*** extension in the same directory. The exported file lists the content in sequence format. In the source code, the flag `self.seged` controls the export behaviour. If your sentences consist of words separated by spaces (such as segmented Chinese, or English), set it to `True`; otherwise set it to `False` (for sentences that consist of characters without spaces, such as unsegmented Chinese text). Another flag, `self.tagScheme`, controls the export format: the exported ***".anns"*** file uses the `BMES` format if this flag is set to `"BMES"`; otherwise it is formatted as `BIO`.
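The effect of the two export flags can be sketched as follows. This is an illustrative reimplementation, not the export code in `YEDDA_Annotator.py`: the functions `tag_span` and `export_sequence` are hypothetical, the input is assumed to be pre-split into `(text, label)` chunks with `None` for unlabeled text, and the exact layout of the real `.anns` file is not reproduced.

```python
def tag_span(length, label, scheme="BMES"):
    """Return the tag sequence for one labeled span of `length` tokens."""
    if scheme == "BMES":
        if length == 1:
            return ["S-" + label]
        return ["B-" + label] + ["M-" + label] * (length - 2) + ["E-" + label]
    # Any other scheme value falls back to BIO, mirroring the description above.
    return ["B-" + label] + ["I-" + label] * (length - 1)


def export_sequence(chunks, seged=True, scheme="BMES"):
    """Turn (text, label_or_None) chunks into (token, tag) rows.

    seged=True  -> tokens are whitespace-separated words.
    seged=False -> tokens are single non-space characters.
    """
    rows = []
    for text, label in chunks:
        if seged:
            tokens = text.split()
        else:
            tokens = [ch for ch in text if not ch.isspace()]
        if label is None:
            tags = ["O"] * len(tokens)
        else:
            tags = tag_span(len(tokens), label, scheme)
        rows.extend(zip(tokens, tags))
    return rows


chunks = [("I live in", None), ("New York", "Location")]
for token, tag in export_sequence(chunks, seged=True, scheme="BIO"):
    print(token, tag)
```

For unsegmented text such as `[@美国#Location*]`, calling `export_sequence([("美国", "Location")], seged=False)` tags each character separately.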
Cite:
========
If you use YEDDA for research, please cite this report as follows:

    @article{yang2017yedda,
      title={YEDDA: A Lightweight Collaborative Text Span Annotation Tool},
      author={Yang, Jie and Zhang, Yue and Li, Linwei and Li, Xingxuan},
      booktitle={Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL): Demonstration},
      year={2018}
    }

Updating...
====
* 2018-May-07, Repository is renamed as YEDDA now!
* 2018-May-01, Our paper has been accepted as a demonstration at ACL 2018.
* 2017-Sep-27, (YEDDA V 1.0): project was officially named as YEDDA ! See our paper [here](https://arxiv.org/pdf/1711.03759.pdf).
* 2017-June-24, (V 0.6): support nested coloring; add event annotation beta version [Event_beta.py](Event_beta.py)
* 2017-May-31, (V 0.6): optimize for Windows OS.
* 2017-Apr-26, (V 0.5.3): fix bug with line merge when change entity type.
* 2017-Apr-20, (V 0.5.2): fix bugs with `newline` problem on MacOS/Linux/Windows. (`\r` `\n` `\r\n`)
* 2017-Apr-20, (V 0.5.1): change entity label more directly; optimize cursor figure.
* 2017-Apr-19, (V 0.5): update entity represent as `[@Entity#Type*]`; support change label directly; fix some bugs.
* 2017-Apr-15, (V 0.4): update example and readme.
* 2017-Apr-13, (V 0.4): modify color; support setting color single line or whole file (may be slow in large file) (`self.colorAllChunk`).
* 2017-Apr-12, (V 0.4): support BMES/BIO export (`self.tagScheme`); support segmented sentence export(`self.seged`); can save previous shortcut setting.
* 2016-Mar-01, (V 0.3): fix export bug (bug: set space when sentence didn't include any effective label).
* 2016-Jan-11, (V 0.2): add sequence format export function.
* 2016-Jan-09, (V 0.1): init version.