http://www.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html
Naive Bayes algorithm for learning to classify text
Companion to Chapter 6 of Machine Learning textbook.
Naive Bayes classifiers are among the most successful known algorithms for learning to classify text documents. This page provides an implementation of the Naive Bayes learning algorithm similar to that described in Table 6.2 of the textbook. It also provides a dataset containing 20,000 newsgroup messages drawn from the 20 newsgroups described in Table 6.3. As mentioned in the textbook, the dataset contains 1000 documents from each of the 20 newsgroups.
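As a rough illustration of the kind of learner discussed here, the following is a minimal Python sketch of a multinomial Naive Bayes text classifier. It is not the Rainbow/Libbow code and is not claimed to match Table 6.2 line for line. It assumes add-one smoothed word estimates of the form (n_k + 1) / (n + |Vocabulary|), whitespace tokenization, and a tiny invented toy corpus; the function names train_naive_bayes and classify are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """Train on a list of (list_of_words, label) pairs.

    Returns (priors, word_probs, vocabulary), where word_probs[label][word]
    is the add-one smoothed estimate (n_k + 1) / (n + |Vocabulary|).
    """
    vocabulary = {w for words, _ in examples for w in words}
    docs_per_label = Counter(label for _, label in examples)
    word_counts = defaultdict(Counter)
    for words, label in examples:
        word_counts[label].update(words)

    priors, word_probs = {}, {}
    for label, n_docs in docs_per_label.items():
        priors[label] = n_docs / len(examples)
        n = sum(word_counts[label].values())  # total word positions for label
        denom = n + len(vocabulary)
        word_probs[label] = {
            w: (word_counts[label][w] + 1) / denom for w in vocabulary
        }
    return priors, word_probs, vocabulary


def classify(words, priors, word_probs, vocabulary):
    """Return the label maximizing log P(label) + sum of log P(word | label)."""
    best_label, best_score = None, -math.inf
    for label, prior in priors.items():
        score = math.log(prior)
        for w in words:
            if w in vocabulary:  # words never seen in training are skipped
                score += math.log(word_probs[label][w])
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy corpus, for illustration only (not taken from the dataset).
toy = [
    ("the rocket reached orbit".split(), "sci.space"),
    ("nasa launched a new rocket".split(), "sci.space"),
    ("the goalie stopped the puck".split(), "rec.sport.hockey"),
    ("the team won the hockey game".split(), "rec.sport.hockey"),
]
model = train_naive_bayes(toy)
print(classify("rocket launched into orbit".split(), *model))
```

Summing log probabilities rather than multiplying raw probabilities avoids numeric underflow on long documents.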
Note on downloading
This code and data are supported only under the Unix and Linux operating systems. (If you would like to volunteer support for Windows, please contact me.) To reconstruct the original files from a downloaded file such as xxx.tar.gz, type the following two commands at the Unix prompt:
gunzip xxx.tar.gz
tar -xf xxx.tar
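For readers who prefer to script the unpacking step, the same result can be sketched with Python's standard tarfile module. This is an illustrative alternative, not part of the original instructions; the helper name extract_archive is hypothetical. Unlike gunzip, it leaves the downloaded .tar.gz file in place.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir="."):
    """Decompress and unpack a gzipped tar archive in one step.

    Returns the sorted names of the entries now present in dest_dir.
    Only use this on archives from a source you trust, since extractall
    writes whatever paths the archive contains.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, mode="r:gz") as archive:
        archive.extractall(dest)
    return sorted(p.name for p in dest.iterdir())


# Example call, using the placeholder file name from the text above:
# extract_archive("xxx.tar.gz")
```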
Code
This code is based on the Rainbow/Libbow software package developed by Andrew McCallum. It includes efficient C code for indexing text documents along with code implementing the Naive Bayes learning algorithm. Libbow also provides implementations of two additional text learning algorithms: TFIDF and prTFIDF. This code may be used either as a building block for creating other programs or as a stand-alone learning/classification system.
Note: this code is a minor variant of the code described in Table 6.2 of Machine Learning.
Most recent Libbow source code and documentation
Old Libbow source code and documentation (tarred and gzipped)
Newsgroup Data
The tarred and gzipped data directory (easiest for downloading).
A tarred and gzipped subset of the Newsgroup data which contains 100 randomly selected messages from each newsgroup. This is a useful dataset for learning to use Rainbow.
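Once an archive is unpacked, the messages can be read into labeled examples for experimentation outside Rainbow. The sketch below assumes (this is not stated on this page, so check it against the unpacked files) that the data directory holds one subdirectory per newsgroup, with one message per file. The function name load_newsgroup_dir is hypothetical, and the lowercase whitespace tokenization is a simplification, not Rainbow's lexer.

```python
from pathlib import Path


def load_newsgroup_dir(root):
    """Return a list of (words, label) pairs, one per message file.

    ASSUMPTION: root contains one subdirectory per newsgroup and each
    file inside a subdirectory is a single message.
    """
    examples = []
    for group_dir in sorted(Path(root).iterdir()):
        if not group_dir.is_dir():
            continue
        for msg_path in sorted(group_dir.iterdir()):
            if not msg_path.is_file():
                continue
            # latin-1 maps every byte to a character, so decoding cannot fail
            text = msg_path.read_text(encoding="latin-1")
            examples.append((text.lower().split(), group_dir.name))
    return examples


# Example call, assuming the small subset was unpacked into mini_newsgroups/:
# examples = load_newsgroup_dir("mini_newsgroups")
```

Each pair holds the token list for one message and the name of the directory it came from, which serves as the class label.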
On-Line Documentation
Rainbow Documentation
Visitors from outside CMU are invited to use this material free of charge for any educational purpose, provided attribution is given in any lectures or publications that make use of this material.
This page organized by Jason Rennie.
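Original download source: http://www.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html. On Fedora 12, once the compilation errors were resolved, the software installed successfully, runs normally, and can perform simple text classification. In addition to the source code and datasets, this resource includes a short installation document and a document on fixing a bug.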
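Archive contents: 文本分类软件rainbow源代码及其安装和使用方法.rar (Rainbow text classification software source code, with installation and usage instructions; 7 files)
bow-20020213.tar.gz 522KB
mini_newsgroups.tar.gz 1.77MB
安装与使用rainbow文档.doc (Rainbow installation and usage document) 29KB
网页信息.txt (web page information) 2KB
install rainbow problem.doc 27KB
20_newsgroups.tar.gz 16.53MB
bow-latest.tar.gz 237KB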
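Resource comments
- li12304212, 2012-12-03: Just the material I needed; useful as a reference. Thanks.
- ilovesweet, 2012-12-24: It runs under Linux. I could not get it installed, and I never managed it later either.
- calmo, 2012-04-02: Extracts bags of words well. The only shortcoming: is there a Windows version?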