AG's News Topic Classification Dataset
Version 3, Updated 09/09/2015
ORIGIN
AG is a collection of more than 1 million news articles. News articles have been gathered from more than 2000 news sources by ComeToMyHead in more than 1 year of activity. ComeToMyHead is an academic news search engine which has been running since July, 2004. The dataset is provided by the academic comunity for research purposes in data mining (clustering, classification, etc), information retrieval (ranking, search, etc), xml, data compression, data streaming, and any other non-commercial activity. For more information, please refer to the link http://www.di.unipi.it/~gulli/AG_corpus_of_news_articles.html .
The AG's news topic classification dataset is constructed by Xiang Zhang (xiang.zhang@nyu.edu) from the dataset above. It is used as a text classification benchmark in the following paper: Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015).
DESCRIPTION
The AG's news topic classification dataset is constructed by choosing 4 largest classes from the original corpus. Each class contains 30,000 training samples and 1,900 testing samples. The total number of training samples is 120,000 and testing 7,600.
The file classes.txt contains a list of classes corresponding to each label.
The files train.csv and test.csv contain all the training samples as comma-sparated values. There are 3 columns in them, corresponding to class index (1 to 4), title and description. The title and description are escaped using double quotes ("), and any internal double quote is escaped by 2 double quotes (""). New lines are escaped by a backslash followed with an "n" character, that is "\n".
没有合适的资源?快使用搜索试试~ 我知道了~
ag-news新闻数据
共4个文件
txt:2个
csv:2个
需积分: 0 0 下载量 147 浏览量
2022-12-07
15:18:03
上传
评论
收藏 10.82MB ZIP 举报
温馨提示
数据格式,每一条数据有三列,第一列为标签,第二列为title,第三列为content:
资源推荐
资源详情
资源评论
收起资源包目录
ag_news.zip (4个子文件)
ag_news_csv
classes.txt 31B
train.csv 28.11MB
test.csv 1.77MB
readme.txt 2KB
共 4 条
- 1
资源评论
Mark_Aussie
- 粉丝: 124
- 资源: 1
上传资源 快速赚钱
- 我的内容管理 展开
- 我的资源 快来上传第一个资源
- 我的收益 登录查看自己的收益
- 我的积分 登录查看自己的积分
- 我的C币 登录后查看C币余额
- 我的收藏
- 我的下载
- 下载帮助
安全验证
文档复制为VIP权益,开通VIP直接复制
信息提交成功