
A Max-Flow-Based Similarity Measure for Spectral Clustering

Jiangzhong Cao, Pei Chen, Yun Zheng, and Qingyun Dai

ETRI Journal, Volume 35, Number 2, April 2013

In most spectral clustering approaches, the Gaussian kernel-based similarity measure is used to construct the affinity matrix. However, such a similarity measure does not work well on a dataset with a nonlinear and elongated structure. In this paper, we present a new similarity measure to deal with the nonlinearity issue. The maximum flow between data points is computed as the new similarity, which can satisfy the requirements of a similarity measure for clustering. Additionally, the new similarity captures both the global and the local relations between data. We apply it to spectral clustering and compare the proposed similarity measure with other state-of-the-art methods on both synthetic and real-world data. The experimental results show the superiority of the new similarity: 1) the max-flow-based similarity measure can significantly improve the performance of spectral clustering; 2) it is robust and insensitive to its parameters.

Keywords: Spectral clustering, maximum flow, affinity graph, similarity measure.

Manuscript received July 31, 2012; revised Oct. 7, 2012; accepted Oct. 22, 2012.

This work was supported by the National Natural Science Foundation of China through the program 61173083, by the Ministry of Science and Technology, China, through the 973 Program 2011CB302200, and by the Economic & Information Commission of Guangdong Province through the Program GDIID2008IS007.

Jiangzhong Cao (phone: +86 135 6008 2826, cjz510@gdut.edu.cn) is with the School of Information Science and Technology, Sun Yat-sen University, Guangzhou, China, and also with the School of Information Engineering, Guangdong University of Technology, Guangzhou, China.

Pei Chen (chenpei@mail.sysu.edu.cn) and Yun Zheng (zhengyun84@gmail.com) are with the School of Information Science and Technology, Sun Yat-sen University, Guangzhou, China.

Qingyun Dai (daiqy@gdut.edu.cn) is with the School of Information Engineering, Guangdong University of Technology, Guangzhou, China.

http://dx.doi.org/10.4218/etrij.13.0112.0520

I. Introduction

Spectral clustering has attracted a significant amount of attention [1]-[4] due to its impressive performance on some challenging clustering datasets, with successful applications in computer vision [5], [6], VLSI design [7], and speech processing [8], [9]. It has been shown that the affinity matrix is crucial to the performance of spectral clustering [10]-[16].

Most spectral clustering methods adopt the Gaussian kernel function as the similarity measure for constructing the affinity matrix [5], [11]-[13], differing only in their parameters. In [11], a fixed scaling parameter controls how fast the similarity falls off with the distance between points. In [12], a self-tuning parameter was used to adapt to multiscale datasets. In [13], the Gaussian kernel function was scaled according to the local density between data points, so that the similarity between two points is higher if there are more common points in their neighborhoods.

Though the Gaussian kernel-based similarity measure can describe local consistency, it does not work well on a dataset with a nonlinear, elongated structure. Consider the example in Fig. 1(a), which contains three spiral clusters. The gray level of each line indicates the similarity between the points it connects: the darker the line, the larger the similarity. One can easily find cases in Fig. 1(a) in which the similarity between points on the same manifold is smaller than that between points on different manifolds. This phenomenon results in unsatisfactory performance for spectral clustering algorithms.

To overcome this difficulty, Fischer and Buhmann [17], [18] proposed a path-based similarity measure based on a connectedness criterion, which considers two objects similar if there exists a mediating intra-cluster path without any large-cost edge. Though the path-based measure can partly overcome the difficulty with nonlinearity, it is sensitive to noise. In [14], a robust path-based similarity measure based on an M-estimator was proposed to improve the robustness of path-based spectral clustering. It was reported that the robust path-based measure performs well on some datasets; however, the measure tends to treat the data points around the clusters as noise, as shown in [14].
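The connectedness criterion of [17], [18] can be illustrated with the minimax path dissimilarity. For each pair of points, it is the smallest achievable value of the largest edge cost along any path joining them. The sketch below assumes Euclidean edge costs on a complete graph and computes all pairs with a Floyd-Warshall variant that uses (min, max) instead of (min, +). The function name `minimax_path_distance` is hypothetical, and the cost definition in the original papers may differ.

```python
import numpy as np

def minimax_path_distance(X):
    """Hypothetical helper: path-based (minimax) dissimilarity.

    Assumption: edge costs are Euclidean distances on a complete graph.
    D[i, j] is the smallest possible largest edge cost over all paths i -> j.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances serve as the initial edge costs.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    n = len(X)
    # Floyd-Warshall with (min, max): allow paths through intermediate vertex k.
    for k in range(n):
        via_k = np.maximum(D[:, k][:, None], D[k, :][None, :])
        D = np.minimum(D, via_k)
    return D

# Points on a line: 0, 1, 2, 3 form a chain; 10 is far away.
D = minimax_path_distance([[0.0], [1.0], [2.0], [3.0], [10.0]])
print(D[0, 3])  # chain of unit steps, so the largest edge on the best path is 1
```

Because a single chain of small edges is enough to make two points similar, one noise point bridging two clusters can collapse their dissimilarity. This is the noise sensitivity noted above.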

In this paper, we propose a max-flow-based similarity measure for constructing the affinity matrix. It is motivated by the fact that data points in the same cluster are more strongly connected than data points in different clusters, as shown in Fig. 1(a). The maximum flow between data points is computed as the new similarity, for which a weighted graph is constructed using the k-nearest neighbor (k-NN) technique, the ε-neighborhood technique, or a combination of both. In contrast to the path-based similarity measure, the maximum flow takes into account all paths between two points, not just the shortest one. Thus, the proposed measure reflects the global similarity between two points through all paths: the maximum flow (similarity) is larger when there are more paths or shorter paths connecting the two points. The commute time distance (resistance distance) and its variants, based on the random walk (electrical network), have been proposed to carry out a similar idea and have been widely used [19]-[21]. However, we will show that the max-flow-based similarity measure can improve the performance of the spectral clustering algorithm on most datasets in our experiments.
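A minimal sketch of the idea follows. It assumes that the edge capacities are Gaussian similarities as in (1), restricted to a symmetrized k-NN graph as in (2), and it computes each pairwise maximum flow with the Edmonds-Karp algorithm. The function names `max_flow`, `knn_gaussian_graph`, and `max_flow_similarity` are hypothetical. The paper's exact graph construction and any normalization of the flow values are given in section III and may differ from this sketch.

```python
import numpy as np
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp maximum flow from s to t on a dense capacity matrix."""
    n = cap.shape[0]
    residual = cap.astype(float).copy()
    flow = 0.0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:
            u = queue.popleft()
            for v in np.nonzero(residual[u] > 1e-12)[0]:
                if parent[v] == -1:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            return flow  # no augmenting path left
        # Bottleneck residual capacity along the path.
        bottleneck, v = np.inf, t
        while v != s:
            bottleneck = min(bottleneck, residual[parent[v], v])
            v = parent[v]
        # Push flow: decrease forward residuals, increase reverse residuals.
        v = t
        while v != s:
            u = parent[v]
            residual[u, v] -= bottleneck
            residual[v, u] += bottleneck
            v = u
        flow += bottleneck

def knn_gaussian_graph(X, k, sigma):
    """Assumed capacities: Gaussian similarity on a symmetrized k-NN graph."""
    X = np.asarray(X, dtype=float)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    S = np.exp(-sq / (2.0 * sigma ** 2))
    W = np.zeros_like(S)
    for i in range(len(X)):
        order = np.argsort(sq[i])
        nn = order[order != i][:k]  # k nearest neighbors, excluding x_i itself
        W[i, nn] = S[i, nn]
    return np.maximum(W, W.T)  # undirected graph: keep an edge if either side picks it

def max_flow_similarity(X, k=3, sigma=1.0):
    """Pairwise max-flow values used as a similarity matrix (zero diagonal)."""
    W = knn_gaussian_graph(X, k, sigma)
    n = W.shape[0]
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            M[i, j] = M[j, i] = max_flow(W, i, j)
    return M
```

A symmetric capacity matrix treats every undirected edge as two opposite arcs, which is the standard reduction for undirected max flow. Computing all n(n-1)/2 flows this way is only practical for small n. For undirected graphs, a Gomory-Hu tree yields all pairwise max-flow values from n-1 max-flow computations.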

The rest of this paper is organized as follows. The background of spectral clustering is reviewed in section II. In section III, we propose a max-flow-based similarity measure and describe in detail how it is used to construct the affinity matrix. Experimental results on several datasets are presented in section IV, and concluding remarks are given in section V.

II. Background on Spectral Clustering

1. Ng-Jordan-Weiss Algorithm

Most spectral clustering algorithms follow the spirit of the Ng-Jordan-Weiss (NJW) algorithm [11]. For completeness, the NJW algorithm is briefly reviewed here.

Given a dataset $S = \{x_1, \dots, x_n\}$ with $x_i \in \Re^{d}$, the NJW algorithm is implemented as follows. 1) Construct an affinity matrix $A \in \Re^{n \times n}$ by the Gaussian kernel function in (1):

$$A_{ij} = \begin{cases} \exp\left(-\dfrac{\|x_i - x_j\|^2}{2\sigma^2}\right), & \text{for } i \neq j, \\ 0, & \text{for } i = j, \end{cases} \qquad (1)$$

where $\sigma$ is a scale parameter that controls how fast the similarity changes with the distance between the data points $x_i$ and $x_j$. 2) Compute the normalized affinity matrix $L = D^{-1/2} A D^{-1/2}$, where $D$ is the diagonal matrix with $D_{ii} = \sum_j A_{ij}$. 3) Compute the $K$ eigenvectors $v_1, \dots, v_K$ of $L$ associated with the $K$ largest eigenvalues, and form the matrix $X = [v_1\ v_2 \cdots v_K] \in \Re^{n \times K}$. 4) Renormalize each row of $X$ to form a new matrix $Y \in \Re^{n \times K}$ with $Y_{ij} = X_{ij} / \left(\sum_j X_{ij}^2\right)^{1/2}$, so that each row of $Y$ has unit length. 5) Treat each row of $Y$ as a point in $\Re^{K}$, and partition the $n$ points (rows) into $K$ clusters via a general clustering algorithm, such as the k-means algorithm. 6) Assign the original point $x_i$ to cluster $j$ if and only if row $i$ of the matrix $Y$ is assigned to cluster $j$.
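Steps 1) to 6) can be sketched compactly in NumPy. In this sketch, step 5) uses plain Lloyd k-means iterations with a deterministic farthest-point initialization to keep the example self-contained. That choice is an assumption, not part of [11]; in practice, a library k-means with several restarts is the usual choice. The names `njw_spectral_clustering` and `_kmeans` are hypothetical.

```python
import numpy as np

def _kmeans(Y, K, n_iter=100):
    """Plain Lloyd k-means with farthest-point initialization (assumed choice)."""
    centers = [Y[0]]
    for _ in range(1, K):
        dist = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    labels = np.zeros(len(Y), dtype=int)
    for _ in range(n_iter):
        labels = np.argmin(((Y[:, None, :] - centers[None, :, :]) ** 2).sum(-1), axis=1)
        new = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(K)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def njw_spectral_clustering(X, K, sigma):
    X = np.asarray(X, dtype=float)
    # Step 1: Gaussian affinity (1) with a zero diagonal.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    # Step 2: L = D^{-1/2} A D^{-1/2}; assumes no row of A sums to zero.
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # Step 3: eigh returns eigenvalues in ascending order, so take the last K columns.
    _, vecs = np.linalg.eigh(L)
    Xk = vecs[:, -K:]
    # Step 4: renormalize each row to unit length.
    Y = Xk / np.linalg.norm(Xk, axis=1, keepdims=True)
    # Steps 5 and 6: cluster the rows; point x_i takes the label of row i.
    return _kmeans(Y, K)

labels = njw_spectral_clustering([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]],
                                 K=2, sigma=1.0)
```

The row normalization in step 4) is what makes the rows of well-separated clusters collapse onto nearly orthogonal unit vectors, which k-means then separates easily.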

2. Similarity Graph

A weighted graph $G = (V, E)$ is a convenient tool for describing the similarity between data points, where $V$ is the dataset $\{x_1, \dots, x_n\}$ and the weight $w_{ij}$ of the edge between $x_i$ and $x_j$ is their similarity. The adjacency matrix of the graph is the affinity matrix described in subsection II.1. The Gaussian kernel similarity in (1) results in a fully connected graph. Other approaches to constructing the graph include the k-NN and ε-neighborhood techniques. In the k-NN graph, a vertex is connected only to its k NNs; that is, the weight is computed as

$$w_{ij} = \begin{cases} s_{ij}, & \text{if } x_j \text{ is one of the } k \text{ NNs of } x_i, \\ 0, & \text{otherwise}, \end{cases} \qquad (2)$$

where $s_{ij}$ is the similarity between $x_i$ and $x_j$. In the ε-neighborhood graph, the points whose pairwise distances are smaller than ε (or, in terms of similarity, larger than a threshold) are connected, defined as

$$w_{ij} = \begin{cases} s_{ij}, & \text{if } \|x_i - x_j\| \le \varepsilon, \\ 0, & \text{otherwise}. \end{cases} \qquad (3)$$

The k-NN or ε-neighborhood technique produces a sparse graph, which can help our method reduce computation and improve performance. Generally, the k-NN-based graph is recommended as the first choice [15].
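Equations (2) and (3) translate directly into sparse weight matrices. The sketch below assumes that $s_{ij}$ is the Gaussian similarity of (1). Because (2) is not symmetric in general, it offers an optional symmetrization that keeps an edge if either endpoint selects the other. This is one common convention, and it is an assumption here. The helper names are hypothetical.

```python
import numpy as np

def gaussian_similarity(X, sigma):
    """Return (S, dist): Gaussian similarities s_ij as in (1) and Euclidean distances."""
    X = np.asarray(X, dtype=float)
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    S = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(S, 0.0)
    return S, np.sqrt(sq)

def knn_graph(X, k, sigma, symmetrize=True):
    """Weights per (2): keep s_ij only if x_j is among the k NNs of x_i."""
    S, dist = gaussian_similarity(X, sigma)
    W = np.zeros_like(S)
    for i in range(S.shape[0]):
        order = np.argsort(dist[i])
        nn = order[order != i][:k]  # exclude x_i itself
        W[i, nn] = S[i, nn]
    # Assumed symmetrization: an edge survives if either endpoint selects the other.
    return np.maximum(W, W.T) if symmetrize else W

def eps_graph(X, eps, sigma):
    """Weights per (3): keep s_ij only if ||x_i - x_j|| <= eps."""
    S, dist = gaussian_similarity(X, sigma)
    W = np.where(dist <= eps, S, 0.0)
    np.fill_diagonal(W, 0.0)
    return W

X = [[0.0], [1.0], [3.0], [10.0]]
W_knn = knn_graph(X, k=1, sigma=1.0)
W_eps = eps_graph(X, eps=2.0, sigma=1.0)
```

With k = 1, the isolated point at 10 is still attached to its nearest neighbor. With ε = 2, it stays disconnected. This illustrates why the k-NN graph is often the safer default.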

III. Max-Flow-Based Similarity Measure

The Gaussian kernel function is widely used as the similarity measure for its ability to reflect the homogeneity of compact clusters. However, it fails on a dataset with an elongated structure, as shown in Fig. 1(a). Two points lying on the same manifold should be homogeneous; that is, their similarity should be high even if their Euclidean distance is large. This observation motivates us to seek a new similarity measure.

It was observed in [22] that the density of points in each cluster is considerably higher than that within the area separating the clusters. Figure 1(b) shows the sparse graph,
