## NLP-dataset (General)
* [Hugging Face Datasets](https://huggingface.co/datasets)
* [Awesome-Chinese-NLP, Chinese](https://github.com/crownpku/Awesome-Chinese-NLP)
* [CLUEDatasetSearch, Chinese](https://github.com/CLUEbenchmark/CLUEDatasetSearch)
* [funNLP, Chinese](https://github.com/fighting41love/funNLP)
* [ChineseNLPCorpus1, Chinese](https://github.com/InsaneLife/ChineseNLPCorpus)
* [ChineseNLPCorpus2, Chinese](https://github.com/SophonPlus/ChineseNlpCorpus)
* [CLUE, Chinese](https://www.cluebenchmarks.com/introduce.html)
* [Chinese NLP data by ShannonAI, Chinese](https://github.com/ShannonAI/glyce/blob/master/docs/dataset_download.md)
* [nlp-datasets, Multilingual](https://github.com/niderhoff/nlp-datasets)
* [awesome-nlp, Multilingual](https://github.com/keon/awesome-nlp#datasets)
## Word Segmentation (Chinese)
* [SIGHAN2005](http://sighan.cs.uchicago.edu/bakeoff2005/)
* [multi-criteria-cws](https://github.com/hankcs/multi-criteria-cws)
* [Chinese NLP data by ShannonAI, Chinese](https://github.com/ShannonAI/glyce/blob/master/docs/dataset_download.md)
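Segmentation corpora such as the SIGHAN 2005 bakeoff data are distributed as plain text with words separated by whitespace, while many sequence-labeling models expect one tag per character. A minimal sketch of that conversion, assuming the common BMES scheme (Begin, Middle, End, Single); the function name `words_to_bmes` is hypothetical and not part of any linked repository:

```python
def words_to_bmes(segmented_line):
    """Convert one whitespace-segmented sentence into (character, tag) pairs.

    Assumes words are separated by whitespace, as in SIGHAN-style
    training files. Tags follow the BMES scheme.
    """
    pairs = []
    for word in segmented_line.split():
        if len(word) == 1:
            # A one-character word is tagged as Single.
            pairs.append((word, "S"))
        else:
            pairs.append((word[0], "B"))
            # Every character strictly inside the word is Middle.
            pairs.extend((ch, "M") for ch in word[1:-1])
            pairs.append((word[-1], "E"))
    return pairs


if __name__ == "__main__":
    for ch, tag in words_to_bmes("我 喜欢 自然语言"):
        print(ch, tag)
```

Other tagging schemes (for example BIO or BIES) can be produced the same way by changing the tag letters.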
## NER dataset (English)
* [Various NER datasets](https://github.com/juand-r/entity-recognition-datasets)
* [CoNLL-2003, Official](https://www.clips.uantwerpen.be/conll2003/ner/), [CoNLL-2003, other link](https://github.com/synalp/NER/tree/master/corpus/CoNLL-2003)
* [WNUT-2016, Twitter](https://github.com/aritter/twitter_nlp/tree/master/data/annotated/wnut16)
* [OntoNotes-5.0, broadcast news, broadcast conversation, weblogs, magazine genre](https://github.com/yuchenlin/OntoNotes-5.0-NER-BIO)
* [Wikigold](https://github.com/juand-r/entity-recognition-datasets/tree/master/data/wikigold)
* [Twitter](https://github.com/aritter/twitter_nlp/blob/master/data/annotated/ner.txt)
* [kaggle](https://www.kaggle.com/abhinavwalia95/entity-annotated-corpus/data)
* [MUC6](https://catalog.ldc.upenn.edu/LDC2003T13)
* [MUC7](https://catalog.ldc.upenn.edu/LDC2001T02)
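Several of the NER corpora above (CoNLL-2003 in particular) use a column-based text format: one token per line, whitespace-separated columns, and a blank line between sentences. A minimal reader sketch, assuming the token is in the first column and the entity tag in the last; the function name `read_conll` and the sample lines are made up for illustration, and individual datasets may differ in column layout or tagging scheme:

```python
def read_conll(lines):
    """Parse CoNLL-style lines into a list of (tokens, tags) sentences.

    Assumptions: first column is the token, last column is the NER tag,
    blank lines separate sentences, and "-DOCSTART-" lines mark
    document boundaries (as in CoNLL-2003) and carry no tokens.
    """
    sentences, tokens, tags = [], [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("-DOCSTART-"):
            # Sentence or document boundary: flush the current sentence.
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        columns = line.split()
        tokens.append(columns[0])
        tags.append(columns[-1])
    if tokens:
        # Flush the last sentence when the input lacks a trailing blank line.
        sentences.append((tokens, tags))
    return sentences


if __name__ == "__main__":
    sample = [
        "-DOCSTART- -X- -X- O",
        "",
        "EU NNP B-NP B-ORG",
        "rejects VBZ B-VP O",
        "",
        "Peter NNP B-NP B-PER",
    ]
    for sentence_tokens, sentence_tags in read_conll(sample):
        print(list(zip(sentence_tokens, sentence_tags)))
```

For a file on disk, pass an open file object, for example `read_conll(open(path, encoding="utf-8"))`, where `path` points to a file you have downloaded.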
## NER dataset (Chinese)
- [MSRA, OntoNotes 4.0, Resume, Weibo](https://drive.google.com/file/d/1mDKkc2-8e4wXAuAnGiZMHI59UgVbl1q4/view)
- [CLUENER](https://storage.googleapis.com/cluebenchmark/tasks/cluener_public.zip)
- [RenMinRiBao](https://github.com/quincyliang/nlp-dataset/tree/master/ner-data/renMinRiBao)
- [MSRA](https://github.com/quincyliang/nlp-dataset/tree/master/ner-data/MSRA)
- [Boson](https://github.com/quincyliang/nlp-dataset/tree/master/ner-data/boson)
- [Weibo](https://github.com/quincyliang/nlp-dataset/tree/master/ner-data/weibo)
- [Others](https://github.com/OYE93/Chinese-NLP-Corpus/tree/master/NER)
## Machine Translation (Chinese-English)
- [WMT 2020](http://statmt.org/wmt20/translation-task.html)
- [AI Challenger](https://challenger.ai/) (described as the largest spoken-language English-Chinese parallel dataset for English-to-Chinese translation)
- [UM-Corpus: A Large English-Chinese Parallel Corpus](http://nlp2ct.cis.umac.mo/um-corpus/)
- [OpenSubtitles2016](http://opus.nlpl.eu/OpenSubtitles2016.php)
- [MultiUN](http://opus.nlpl.eu/MultiUN.php)