Ubuntu Dialogue Corpus V1

5 files total

txt: 5

NLP

Uploaded 2017-08-03 17:17:34, 190MB RAR

Note

Each line has the format: label \t conversation utterances (split by \t) \t response
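The line format above can be read with a few lines of Python. This is only a minimal sketch under stated assumptions: the names `Sample` and `parse_line` are hypothetical (they do not ship with the dataset), the label is assumed to be an integer, and every field between the label and the last field is treated as a context utterance.

```python
from typing import List, NamedTuple


class Sample(NamedTuple):
    """One line of the data file (hypothetical container, not part of the dataset)."""
    label: int            # assumed to be an integer label
    context: List[str]    # conversation utterances, in order
    response: str         # candidate response (last field)


def parse_line(line: str) -> Sample:
    """Split one tab-separated line into label, context utterances, and response."""
    fields = line.rstrip("\r\n").split("\t")
    # A valid line needs at least: label, one utterance, response.
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 tab-separated fields, got {len(fields)}")
    return Sample(label=int(fields[0]), context=fields[1:-1], response=fields[-1])
```

For example, `parse_line("1\thi\thow are you\tfine\n")` yields a sample with label 1, two context utterances, and the response "fine".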


ubuntu_data.rar (5 files)

ubuntu_data

test.txt 338.15MB

train.txt 676.29MB

valid.txt 337.03MB

ReadMe.txt 704B

vocab.txt 1.97MB


This .zip file includes the datasets (training/testing/validation) used in the experiments of the paper: Incorporating Loose-Structured Knowledge into LSTM with Recall-Gate for Conversation Modeling. The datasets are extracted from the corpus: http://cs.mcgill.ca/~jpineau/datasets/ubuntu-corpus-1.0/

Negative sampling is conducted to produce a balanced training set and 1:9 validation/testing sets, following the paper of Lowe et al. (2015).

The details of the datasets are given below:

1. train.txt: 1 million training samples (pos:neg=1:1)
2. valid.txt: 50,000 samples for validation (pos:neg=1:9)
3. test.txt: 50,000 samples for testing (pos:neg=1:9)
4. vocab.txt: Vocabulary of the datasets.
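The positive/negative ratios stated above could be sanity-checked by counting the label field of each line. The following is a rough sketch, not part of the dataset: the function name `label_counts` is hypothetical, and it assumes the label is the first tab-separated field of every line. Labels are counted as raw strings, so no assumption is made about which value marks a positive sample.

```python
from collections import Counter
from typing import Iterable


def label_counts(lines: Iterable[str]) -> Counter:
    """Count how often each label (first tab-separated field) occurs."""
    counts: Counter = Counter()
    for line in lines:
        # Only the first field is needed, so split once.
        label = line.split("\t", 1)[0].strip()
        if label:  # skip blank lines
            counts[label] += 1
    return counts
```

A file can be passed directly, since file objects iterate line by line, for example `label_counts(open("train.txt", encoding="utf-8"))`. The UTF-8 encoding here is an assumption; the ReadMe does not state one.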
