function [predict_labels evd_time kmeans_time total_time] = nystrom(data, num_samples, sigma, num_clusters)
%NYSTROM Spectral clusterina using the Nystrom method.
%
% Input : data : N-by-D data matrix, where N is the number of data,
% D is the number of dimensions
% num_samples : number of random samples
% sigma : sigma value used in computing similarity
% num_clusters : number of clusters
%
% Output : predict_labels : N-by-1 vector containing predicted labels
% evd_time : running time for eigendecomposition
% kmeans_time : running time for k-means
% total_time : total running time
%
% Authors: Wen-Yen Chen, Yangqiu Song, Hongjie Bai, Chih-Jen Lin, 2008.
%
% Contact: Wen-Yen Chen (wychen@cs.ucsb.edu)
% Computer Science
% University of California at Santa Barbara
% Santa Barbara, CA 93106
%
% Randomly select samples
%
disp('Randomly selecting samples...');
tic;
num_rows = size(data, 1);
permed_index = randperm(num_rows);
sample_data = data(permed_index(1:num_samples), :);
other_data = data(permed_index(num_samples+1:num_rows), :);
clear data;
toc;
%
% Calculate the euclidean distance between samples themselves
%
disp('Calculating distance among samples...');
A = euclidean(sample_data', sample_data');
A = single(A);
toc;
%
% Calculate the euclidean distance between samples and other points
%
disp('Calculating distance between samples and other points...');
B = euclidean(sample_data', other_data');
B = single(B);
clear sample_data other_data;
toc;
%
% Convert distance matrix to similarity matrix: S = exp^(-(dist^2 / 2*sigma^2))
%
disp('Converting distance matrix to similarity matrix...');
A = single(exp(-(A.*A) ./ (2*sigma*sigma)));
B = single(exp(-(B.*B) ./ (2*sigma*sigma)));
toc;
%
% Normalize A and B using row sums of W, where W = [A B; B' B'*A^-1*B].
% Let d1 = [A B]*1, d2 = [B' B'*A^-1*B]*1, dhat = sqrt(1./[d1; d2]).
%
disp('Normalizing A and B for Laplacian...');
B_T = B';
d1 = sum(A, 2) + sum(B, 2);
d2 = sum(B_T, 2) + B_T*(pinv(A)*sum(B, 2));
dhat = sqrt(1./[d1; d2]);
A = A .* (dhat(1:num_samples)*dhat(1:num_samples)');
m = num_rows - num_samples;
B1 = dhat(1:num_samples)*dhat(num_samples+(1:m))';
B = B .* B1;
clear W d1 d2 B1 dhat;
time1 = toc;
%
% Do orthogalization and eigendecomposition
% Reference: PAMI'04 paper 'Spectral grouping using Nystrom method'
%
disp('Orthogalizing and eigendecomposition...');
Asi = sqrtm(pinv(A));
B_T = B';
BBT = B*B_T;
W = single(zeros(size(A,1)+size(B_T,1), size(A,2)));
W(1:size(A,1), :) = A;
W(size(A,1)+1:size(W,1), :) = B_T;
clear B B_T;
% Calculate R = A + A^-1/2*B*B'*A^-1/2
R = A + Asi*BBT*Asi;
R = (R + R')/2; % Make sure R is symmetric, sometimes R can be non-symmetric because of numerical inaccuracy
[U L] = eig(R);
[val ind] = sort(diag(L), 'descend');
U = U(:, ind); % in decreasing order
L = L(ind, ind); % in decreasing order
clear A R BBT;
W = W*Asi;
V = W*U(:,1:num_clusters)*pinv(sqrt(L(1:num_clusters,1:num_clusters)));
clear W Asi L U;
time2 = toc;
%
% Do k-means
%
disp('Performing kmeans...');
% Normalize each row to be of unit length
sq_sum = sqrt(sum(V.*V, 2)) + 1e-20;
U = V ./ repmat(sq_sum, 1, num_clusters);
clear sq_sum V;
predict_labels = k_means(U, [], num_clusters);
% Restore predict_labels in original order
predict_labels(permed_index) = predict_labels;
clear permed_index;
total_time = toc;
%
% Calculate and show time statistics
%
evd_time = time2 - time1
kmeans_time = total_time - time2
total_time
disp('Finished!');
%--------------------------------------------------------------------------
function d = euclidean(a, b)
%EUCLIDEAN Compute the Euclidean distance matrix between two matrices.
% This function is designed for processing very large data using divide-
% and-conquer technique.
%
% Input : a : D-by-M data matrix, where D is the number of dimensions,
% M is the number of data
% b : D-by-N data matrix, where D is the number of dimensions,
% N is the number of data
% Output: d : M-by-N matrix of the Euclidean distance between a and b.
%
% Calculate a^2, b^2, here we assume b is larger than a
%
aa = single(full(sum(a.*a, 1)));
bb = single(full(sum(b.*b, 1)));
%
% Do a*b in several steps instead of once because of memory limitation
%
two_ab = single(zeros(size(aa, 2), size(bb, 2)));
% Select at most 10000 instances of b for a*b per iteration
num_iter = ceil(size(bb, 2)/10000);
for i = 1:num_iter
start_index = 1 + (i-1)*10000;
end_index = min(i*10000, size(bb, 2));
abtmp = single(full(a'*b(:, start_index:end_index)));
two_ab(:, start_index:end_index) = 2*abtmp;
end % Now we have entire ab
clear a b abtmp;
d = bb(ones(size(aa, 2), 1), :);
d = d - two_ab; % Now we have d = b^2 - 2ab
clear two_ab;
ff = aa';
ff = ff(:, ones(size(bb, 2), 1));
d = d + ff; % Now we have d = a^2 + b^2 -2ab
clear aa bb ff;
d = sqrt(d);
没有合适的资源?快使用搜索试试~ 我知道了~
spectralclustering
共18个文件
mat:10个
m:7个
readme:1个
5星 · 超过95%的资源 需积分: 15 26 下载量 160 浏览量
2008-10-07
15:20:46
上传
评论
收藏 15.07MB ZIP 举报
温馨提示
spectral clustering 算法工具包
资源推荐
资源详情
资源评论
收起资源包目录
spectralclustering-1.0.zip (18个子文件)
spectralclustering-1.0
k_means.m 4KB
script_nystrom.m 2KB
nmi.m 2KB
script_sc.m 2KB
sc.m 3KB
gen_nn_distance.m 4KB
README 5KB
nystrom.m 5KB
data
corel_50_NN_sym_distance.mat 1.24MB
corel_200_NN_sym_distance.mat 5.07MB
corel_5_NN_sym_distance.mat 127KB
corel_20_NN_sym_distance.mat 504KB
corel_10_NN_sym_distance.mat 253KB
corel_feature.mat 1.23MB
corel_label.mat 236B
corel_15_NN_sym_distance.mat 377KB
corel_100_NN_sym_distance.mat 2.51MB
corel_150_NN_sym_distance.mat 3.8MB
共 18 条
- 1
资源评论
- 攻城狮AndLi2015-05-11资源很好 相似函数不是gaussian 但是大致的思路是一样的 解释很详细 虽然版本不是最新的
- Seven_July_1232011-11-25虽然版本不是最新的,但是大致的思路是一样的,不错,相似函数不是gaussian
- ghost8112142015-02-26资源很好,解释很详细
redrobot
- 粉丝: 0
- 资源: 3
上传资源 快速赚钱
- 我的内容管理 展开
- 我的资源 快来上传第一个资源
- 我的收益 登录查看自己的收益
- 我的积分 登录查看自己的积分
- 我的C币 登录后查看C币余额
- 我的收藏
- 我的下载
- 下载帮助
安全验证
文档复制为VIP权益,开通VIP直接复制
信息提交成功