没有合适的资源?快使用搜索试试~ 我知道了~
基于最大流的光谱聚类相似性度量
0 下载量 195 浏览量
2021-03-16
08:40:21
上传
评论
收藏 1.18MB PDF 举报
温馨提示
基于最大流的光谱聚类相似性度量
资源推荐
资源详情
资源评论
ETRI Journal, Volume 35, Number 2, April 2013 © 2013 Jiangzhong Cao
et al.
311
In most spectral clustering approaches, the Gaussian
kernel-based similarity measure is used to construct the
affinity matrix. However, such a similarity measure does
not work well on a dataset with a nonlinear and elongated
structure. In this paper, we present a new similarity
measure to deal with the nonlinearity issue. The
maximum flow between data points is computed as the
new similarity, which can satisfy the requirement for
similarity in the clustering method. Additionally, the new
similarity carries the global and local relations between
data. We apply it to spectral clustering and compare the
proposed similarity measure with other state-of-the-art
methods on both synthetic and real-world data. The
experiment results show the superiority of the new
similarity: 1) The max-flow-based similarity measure can
significantly improve the performance of spectral
clustering; 2) It is robust and not sensitive to the
parameters.
Keywords: Spectral clustering, maximum flow, affinity
graph, similarity measure.
Manuscript received July 31, 2012; revised Oct. 7, 2012; accepted Oct. 22, 2012.
This work was supported by the National Natural Science Foundation of China through the
program 61173083, by the Ministry of Science and Technology, China, through the 973
Program 2011CB302200 and by the Economic & Information Commission of Guangdong
province through the Program GDIID2008IS007.
Jiangzhong Cao (phone: +86 135 6008 2826, cjz510@gdut.edu.cn) is with the School of
Information Science and Technology, Sun Yat-sen University, Guangzhou, China, and also
with the School of Information Engineering, Guangdong University of Technology,
Guangzhou, China.
Pei Chen (chenpei@mail.sysu.edu.cn) and Yun Zheng (zhengyun84@gmail.com) are with
the School of Information Science and Technology, Sun Yat-sen University, Guangzhou, China.
Qingyun Dai (daiqy@gdut.edu.cn) is with the School of Information Engineering,
Guangdong University of Technology, Guangzhou, China.
http://dx.doi.org/10.4218/etrij.13.0112.0520
I. Introduction
Spectral clustering has attracted a significant amount of
attention [1]-[4] due to its impressive performance on some
challenging clustering datasets, with successful applications in
computer vision [5], [6], VLSI design [7], and speech
processing [8], [9]. It has been shown that the affinity matrix is
crucial to the performance of spectral clustering [10]-[16].
Most spectral clustering methods adopted the Gaussian kernel
function as a similarity measure to construct the affinity matrix
[5], [11]-[13], where only the parameters are different. In [11],
a fixed scaling parameter controls how fast the similarity falls
off with the distance between points. In [12], a self-tuning
parameter was used to adapt to the multiscale dataset. In [13],
the Gaussian kernel function was scaled according to the local
density between data points so that the similarity between two
points is higher if there are more common points in their
ε
-
neighborhood.
Though the Gaussian kernel-based similarity measure can
describe the information of the local consistency, it does not
work well on a dataset with a nonlinear elongated structure.
See the example in Fig. 1(a), which reflects three spiral clusters.
The grayscale of lines indicates the similarity between the
points. The darker the line is, the larger the similarity is. One
can easily find cases in Fig. 1(a) wherein the similarity between
the points in the same manifold is smaller than that for a
different manifold. This phenomenon results in unsatisfactory
performance for spectral clustering algorithms.
To overcome the difficulty mentioned above, Fischer and
Buhmann [17], [18] proposed a path-based similarity measure
based on a connectedness criterion, which considers two
objects as similar if there exists a mediating intra cluster path
without any large-cost edge. Though the path-based measure
A Max-Flow-Based Similarity Measure
for Spectral Clustering
Jiangzhong Cao, Pei Chen, Yun Zheng, and Qingyun Dai
312
Jiangzhong Cao et al.
ETRI Journal, Volume 35, Number 2, April 2013
can partly overcome the difficulty with nonlinearity, it is
sensitive to noise. In [14], a robust path-based similarity
measure based on M-estimator was proposed to improve the
robustness of the path-based spectral clustering. It was reported
that the robust path-based measure performs well on some
datasets; however, the measure favors taking the data points
around the clusters as noise, as shown in [14].
In this paper, we propose a max-flow-based similarity
measure for constructing the affinity matrix, originating from
the fact that data points in the same cluster are more connected
than data points in different clusters, as shown in Fig. 1(a). The
maximum flow between data points is computed as the new
similarity, in which a weighted graph is constructed by using
the technique of
k
-nearest neighbor (
k
-NN),
ε
-neighborhood, or
a combination of both. As opposed to the path-based similarity
measure, the maximum flow takes all paths between two
points into account, not just the shortest one. Thus, the
proposed measure reflects the global similarity between two
points through all paths: the maximum flow (similarity) is
larger when there are more paths or shorter paths connecting
the two points. The commute time distance (resistance
distance) and its variants based on the random walk (electronic
network) have been proposed to carry out a similar idea and
have been widely used [19]-[21]; however, we will show that
the max-flow-based similarity measure can improve the
performance of the spectral clustering algorithm on most
datasets in our experiments.
The rest of this paper is organized as follows. The
background of spectral clustering is reviewed in section II. In
section III, we propose a max-flow-based similarity measure
and apply it to construct the affinity matrix in detail.
Experiment results on some datasets are presented in section IV,
and some concluding remarks are given in section V.
II. Background on Spectral Clustering
1. Ng-Jordan-Weiss Algorithm
Most spectral clustering algorithms follow the spirit of the
Ng-Jordan-Weiss (NJW) algorithm [11]. For completeness, the
NJW algorithm is briefly reviewed here.
Given a dataset
{
}
1
,,
n
Sx x= "
in
,
l
ℜ
the NJW
algorithm is implemented as follows. 1) Construct an affinity
matrix
A
by the Gaussian kernel function in (1):
2
2
|| ||
exp( ) for ,
0 for ,
ij
ij
xx
ij
A
ij
δ
⎧
−−
⎪
≠
=
⎨
⎪
=
⎩
(1)
where
δ
is a scale parameter to control how fast the similarity
changes with the distance between the data points
x
i
and
x
j
. 2)
Compute the normalized affinity matrix
L
=
D
-1/2
A D
-1/2
, where
D
is the diagonal matrix with
1
.
n
ii ij
j=
=
∑
DA
3) Compute
the
K
eigenvectors of
L
,
v
1
,
v
2
,…,
v
k
, which are associated with
the
K
largest eigenvalues, and form the matrix
][
21 K
vvvX "=
. 4) Renormalize each row to form a new
matrix
Kn
Y
×
ℜ∈
with
21/2
(),
ij ij ij
j
=
∑
YX X
so that each
row of
Y
has a unit length. 5) Treat each row of
Y
as a point in
K
ℜ
and partition the
n
points (
n
rows) into
K
clusters via a
general cluster algorithm, such as k-means algorithm. 6)
Assign the original point
x
i
to the cluster
c
if and only if the
corresponding row
i
of the matrix
Y
is assigned to the cluster
c
.
2. Similarity Graph
A weighted graph
G
=(
V
,
E
) is a convenient tool for
describing the similarity between data points, where
V
is the
dataset {
x
1
,
x
2
,…,
x
n
} and the weight for the edge between
x
i
and
x
j
is the similarity. The adjacency matrix of the graph is
the affinity matrix described in subsection II.1. The Gaussian
kernel similarity in (1) results in a fully connected graph.
There are other approaches to construct the graph, including
the
k
-NN and
ε
-neighborhood. In the
k
-NN graph, one vertex is
only connected to its
k
-NNs, that is, the weight is computed as
if is one of the NNs of ,
0 otherwise,
ij j i
ij
s
xk-x
w
⎧
⎪
=
⎨
⎪
⎩
(2)
where
s
ij
is the similarity between
x
i
and
x
j
.
In the
ε
-neighborhood graph, the points whose pairwise
distances (similarity) are smaller (larger) than
ε
are connected,
defined as
if || || ,
0 otherwise.
ij i j
ij
sxx
w
ε
−≤
⎧
=
⎨
⎩
(3)
The
k
-NN or
ε
-neighborhood technique produces a sparse
graph, which can help our method to reduce the computation
and improve the performance. Generally, the
k
-NN-based
graph is recommended as the first choice [15].
III. Max-Flow-Based Similarity Measure
Gaussian kernel function is widely used as the similarity
measure for its ability to reflect the homogeneity of
compactness. However, it fails on a dataset with an elongated
structure, as shown Fig. 1(a). The two points existing on the
same manifold should be homogeneous, that is, their similarity
is high even if with a large Euclidean distance. Such a fact
motivates us to seek a new similarity measure.
It was observed in [22] that the density of points in each
cluster is considerably higher than that within the area
separating the clusters. Figure 1(b) shows the sparse graph,
剩余9页未读,继续阅读
资源评论
weixin_38631225
- 粉丝: 5
- 资源: 908
上传资源 快速赚钱
- 我的内容管理 展开
- 我的资源 快来上传第一个资源
- 我的收益 登录查看自己的收益
- 我的积分 登录查看自己的积分
- 我的C币 登录后查看C币余额
- 我的收藏
- 我的下载
- 下载帮助
最新资源
- 直流微电网设计(MATLAB SIMULINK源码) 本项目试图研究由风能、光伏电源和电池三种能源组成的混合系统 三个能源中的每一个都可以向负载提供源源不断的电源 讨论了直流微电网中利用太阳能和风
- 基于多时间尺度的冷热电联供综合能源系统优化调度模型 摘要:代码主要做的是冷热电联供综合能源微网的多时间尺度优化问题,其中,日前计划中通过多场景描述可再生能源的不确定性,侧重于一个运行优化周期内 综合能
- 多目标人工秃鹫优化算法(MATLAB源码分享,智能优化算法) 提出了一种多目标版本的人工秃鹫优化算法(AVOA),用于多目标优化问题 AVOA的灵感来源于非洲秃鹫的生活方式 档案、网格和领导者选择
- 最新微信拼车打车程序,完整无错直接运营版,对接微信支付
- 基于copula的风光联合场景生成代码 该代码考虑风电和光伏出力的空间相关性生成联合场景,用于风光不确定性分析,为配置规划调度提供基础,地理位置相近的风电机组和光伏机组具有极大的相关性
- C++、MFC制作的图像处理工具,包括图像灰度、采样、量化、灰度直方图、灰度线性变化、灰度非线性变化、阈值化、均衡化处理等 -2025
- MATLAB代码:分布式最优潮流 关键词:网络划分;分布式光伏;集群电压控制;分布式优化;有功缩减 参考文档:《含分布式光伏的配电网集群划分和集群电压协调控制》 仿真平台:MATLAB 主要内容:本文
- 3D视觉上传一个报告类资源
- 基于opencv和MFC的图像处理软件,图像的灰度化、二值化、滤波、边缘检测、直方图,视频的边缘检测和跟踪-2025
- Python-tslearn专用于时间序列数据的机器学习Python工具包
- 具备VSG功能的逆变器仿真模型,同步发电机,构网型逆变器,基于MATLAB Simulink建模仿真 具备一次调频,惯性阻尼,一次调压 可以运行于离网模式和并网模式 仿真模型使用MATLAB 2
- Video-2024-09-19晚上-教学案例课(1).wmv
- 基于人工神经网络的系统辩识(MATLAB源码分享) 该示例文件显示了使用高斯白噪声下2DOF系统的人工神经网络(ANN)进行系统辩识 神经网络由输入层,输出层,隐藏层组成: -输入层:2个节点使用当
- 毕业设计:交通信息网上查询系统的设计与实现(源代码+论文+开题报告)
- 基于C++ MFC实现的五子棋游戏,包括棋盘棋子绘制、输赢判定、新游戏、悔棋和修改棋盘背景样式等功能 .zip
- 大一的C++课程项目,使用MFC框架搭建外卖平台,实现买、卖、计算路径送货、签收等核心功能-2025
资源上传下载、课程学习等过程中有任何疑问或建议,欢迎提出宝贵意见哦~我们会及时处理!
点击此处反馈
安全验证
文档复制为VIP权益,开通VIP直接复制
信息提交成功