***************************
STC: Sparse Topical Coding
***************************
Jun Zhu
junzhu [at] cs.cmu.edu
(C) Copyright 2011, Jun Zhu (junzhu [at] cs [dot] cmu [dot] edu)
This file is part of STC.
STC is free software; you can redistribute it and/or modify it under
the terms of the GNU General Public License as published by the Free
Software Foundation; either version 2 of the License, or (at your
option) any later version.
STC is distributed in the hope that it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307
USA
------------------------------------------------------------------------
This is a C implementation of Sparse Topical Coding (STC), a model
of discrete data which is fully described in Zhu et al. (2011)
(http://www.cs.cmu.edu/~junzhu/stc/stc.pdf).
------------------------------------------------------------------------
TABLE OF CONTENTS
A. COMPILING
B. SETTINGS FILE
C. DATA FILE FORMAT
D. ESTIMATION AND INFERENCE
E. QUESTIONS, COMMENTS, PROBLEMS, AND UPDATE ANNOUNCEMENTS
------------------------------------------------------------------------
A. COMPILING
Use Visual Studio 2008 to open "MedSTC.sln" and compile.
------------------------------------------------------------------------
B. SETTINGS FILE
See settings.txt for a sample. The values in it are placeholders and
should be tuned for your data.
The settings file has the following form:
supervised [0 or 1]
primal svm [0 or 1, default 1]
var max iter [integer e.g., 20]
var convergence [float e.g., 1e-4]
em max iter [integer e.g., 100]
em convergence [float e.g., 1e-4]
model C [positive float e.g., 0.5]
delta ell [positive float e.g., 3600]
lambda [positive float e.g., 0.1]
rho [positive float e.g., 0.01]
svm_alg_type [0 or 2]
train_file: [string e.g., ..\train.dat]
test_file: [string e.g., ..\test.dat]
class_num: [20 for 20 Newsgroup]
res_file: [string e.g., overall-res_supervised.txt]
where the settings are
[supervised]
If the value is "1", the model is a supervised MedSTC; if "0", the model
is the unsupervised STC.
[primal svm]
Only takes effect when "supervised" is set to 1. If the value is "1", use the
loss-augmented prediction (i.e., sub-gradient) to update document codes;
otherwise, use the gradient with Lagrangian multipliers to update document codes.
[var max iter]
The maximum number of iterations of coordinate descent for a single document.
[var convergence]
The convergence criterion for coordinate descent. Stop if
(objective_old - objective) / abs(objective_old) is less than this value (or
after the maximum number of iterations). Note that "objective" is the objective
value for a single document.
[em max iter]
The maximum number of outer iterations, each consisting of hierarchical sparse
coding, dictionary learning, and SVM training (for supervised MedSTC).
[em convergence]
The convergence criterion for the outer iterations counted by [em max iter].
Stop if (objective_old - objective) / abs(objective_old) is less than this
value (or after the maximum number of iterations). Note that "objective" is
the objective value for the whole corpus.
[delta ell]
The parameter of the SVM cost function, i.e., the 0/(delta ell) loss.
[model C], [lambda], [rho]
These are the regularization constants.
[train_file]
The file name of training data.
[test_file]
The file name of testing data.
[class_num]
The number of class labels for the classification problem.
[res_file]
The name of the file for saving prediction results.
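
The stopping rule shared by [var convergence] and [em convergence], and the
0/(delta ell) cost, can be sketched in a few lines of Python. This is only an
illustration of the rules as described above, not the code in MedSTC.cpp; the
function names and the "step" callable are hypothetical.

```python
def has_converged(objective_old, objective, tol):
    """Relative-decrease test: (objective_old - objective) / |objective_old| < tol.

    Assumes objective_old is non-zero. An increase in the objective gives a
    negative ratio and therefore also passes this test.
    """
    return (objective_old - objective) / abs(objective_old) < tol


def zero_delta_loss(true_label, predicted_label, delta_ell):
    """0/(delta ell) cost: 0 for a correct prediction, delta_ell otherwise."""
    return 0.0 if true_label == predicted_label else float(delta_ell)


def run_iterations(step, objective_start, max_iter, tol):
    """Call `step` until the relative decrease drops below tol or max_iter is hit.

    `step` is a hypothetical callable that performs one update and returns
    the new objective value. Returns (iterations_used, last_objective).
    """
    objective_old = objective_start
    for iteration in range(1, max_iter + 1):
        objective = step()
        if has_converged(objective_old, objective, tol):
            return iteration, objective
        objective_old = objective
    return max_iter, objective_old
```

With "var max iter" and "var convergence" the objective is that of a single
document; with "em max iter" and "em convergence" it is that of the whole corpus.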
------------------------------------------------------------------------
C. DATA FILE FORMAT
Under STC, the words of each document are assumed exchangeable. Thus, each document
is succinctly represented as a sparse vector of word counts. The data is a file
where each line is of the form:
[M] [label] [term_1]:[count] [term_2]:[count] ... [term_M]:[count]
where [M] is the number of unique terms in the document; [label] is the true label
of the document; the [count] associated with each term is how many times that term
appeared in the document. Note that each [term_i] is an integer that indexes the
term; it is not a string.
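
As an illustration of this line format, the following Python sketch reads and
writes one document line. It is a helper for preparing data, not part of STC.
It assumes that fields are separated by whitespace and that the label and the
counts are integers; the function names and the sample line are made up for
the example.

```python
def parse_document(line):
    """Parse '[M] [label] [term]:[count] ...' into (label, {term: count})."""
    fields = line.split()
    num_terms, label = int(fields[0]), int(fields[1])
    counts = {}
    for pair in fields[2:]:
        term, count = pair.split(":")
        counts[int(term)] = int(count)
    # [M] must equal the number of unique terms listed on the line.
    if len(counts) != num_terms:
        raise ValueError(
            f"expected {num_terms} unique terms, found {len(counts)}"
        )
    return label, counts


def format_document(label, counts):
    """Write (label, {term: count}) back out as one data-file line."""
    pairs = " ".join(f"{term}:{count}" for term, count in sorted(counts.items()))
    return f"{len(counts)} {label} {pairs}"


label, counts = parse_document("3 7 0:2 15:1 42:4")
print(label, counts)
print(format_document(label, counts))
```

Checking [M] against the number of parsed pairs catches truncated or
hand-edited lines before they reach training.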
------------------------------------------------------------------------
D. ESTIMATION AND INFERENCE
For simplicity, a command is provided for doing both estimation and inference.
Usage is:
MedSTC [K] [C] [lambda] [rho] [setting] [fold] [delta ell]
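where [C], [lambda], [rho], and [delta ell] are the parameters of the same names
described in Section B; [setting] is the name of the settings file
(e.g., "settings_20ng.txt"); and [fold] is an integer used only for naming the
directory in which models are saved.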
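
The argument order can be made explicit with a small Python sketch that
assembles the command line. The helper name is hypothetical, and the numeric
values are illustrative: [C], [lambda], [rho], and [delta ell] reuse the
example values from Section B, while K = 70 is an arbitrary choice. This README
does not define [K]; it is presumably the number of topics.

```python
def build_medstc_command(k, c, lam, rho, setting, fold, delta_ell, exe="MedSTC"):
    """Return the argument list: MedSTC [K] [C] [lambda] [rho] [setting] [fold] [delta ell]."""
    return [exe, str(k), str(c), str(lam), str(rho), setting, str(fold), str(delta_ell)]


cmd = build_medstc_command(70, 0.5, 0.1, 0.01, "settings_20ng.txt", 1, 3600)
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the
directory that contains the compiled executable and the settings file.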
------------------------------------------------------------------------
E. QUESTIONS, COMMENTS, PROBLEMS, AND UPDATE ANNOUNCEMENTS
Questions, comments, and problems should be addressed to:
junzhu [at] cs [dot] cmu [dot] edu.
Update announcements will be posted at: http://cs.cmu.edu/~junzhu/stc.htm