A Fast Two-Level Approximate Euclidean Minimum Spanning Tree Algorithm for High-Dimensional Data

Xia Li Wang (1, corresponding author), Xiaochun Wang, and Xiaqiong Li

(1) School of Information Engineering, Changan University, Xi’an 710061, China
xlwang@chd.edu.cn

School of Software Engineering, Xi’an Jiaotong University, Xi’an 710049, China
xiaocchunwang@mail.xjtu.edu.cn, xiaqiongli@stu.xjtu.edu.cn

Abstract. Euclidean minimum spanning tree algorithms typically run with quadratic computational complexity, which is not practical for large-scale high-dimensional datasets. In this paper, we propose a new two-level approximate Euclidean minimum spanning tree algorithm for high-dimensional data. In the first level, we perform outlier detection on a given data set to identify a small number of boundary points and run the standard Prim’s algorithm on the reduced dataset. In the second level, we conduct a k-nearest neighbors search to complete the construction of an approximate Euclidean minimum spanning tree. Experimental results on sample data sets demonstrate the efficiency of the proposed method while maintaining high approximation precision.

Keywords: Euclidean minimum spanning tree · Minimum spanning tree · Approximate minimum spanning tree · Nearest neighbor search

1 Introduction

Finding a minimum spanning tree (MST) of a given connected graph is a fundamental problem with diverse application domains, and many efficient MST algorithms have been developed. In today’s MST tasks, a set of $N$ $d$-dimensional data points is usually given and the problem is commonly solved in the Euclidean setting, giving rise to the so-called Euclidean minimum spanning tree (EMST) problem. In this case, the complete graph has $V = N$ vertices and $E = N(N-1)/2$ edges, and standard EMST algorithms, such as Kruskal’s [1] and Prim’s [2], have a time complexity of roughly $O(dN^2)$. For large-scale high-dimensional datasets, this quadratic cost makes standard EMST algorithms impractical. Fortunately, in many practical applications, an exact EMST can generally be replaced by an approximate one without degrading the quality of the final application.
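To make the quadratic cost concrete, the following is a minimal sketch of the dense form of Prim's algorithm on the complete Euclidean graph. It is an illustration only, not code from this paper; the function name `prim_emst` and the sample points are assumptions. Each of the N iterations scans the remaining points and evaluates one O(d) distance per point, which gives the O(dN^2) behaviour without ever storing the N(N-1)/2 edges explicitly.

```python
import math


def prim_emst(points):
    """Return (edges, total_weight) of an exact EMST via dense Prim's algorithm."""
    n = len(points)
    if n == 0:
        return [], 0.0
    in_tree = [False] * n
    best_dist = [math.inf] * n  # shortest known distance from each vertex to the tree
    best_from = [-1] * n        # tree vertex that realizes that distance
    best_dist[0] = 0.0          # start the tree from vertex 0 (arbitrary choice)
    edges, total = [], 0.0
    for _ in range(n):
        # O(N) scan for the closest vertex that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]),
                key=best_dist.__getitem__)
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u))
            total += best_dist[u]
        # Update distances to the remaining vertices: one O(d) distance each.
        for v in range(n):
            if not in_tree[v]:
                dist = math.dist(points[u], points[v])
                if dist < best_dist[v]:
                    best_dist[v] = dist
                    best_from[v] = u
    return edges, total


points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
num_complete_edges = len(points) * (len(points) - 1) // 2  # E = N(N-1)/2
edges, total = prim_emst(points)
print(num_complete_edges, edges, total)
```

For these four points the complete graph has six edges, of which the tree keeps N - 1 = 3.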

Being a compact data representation of a given data set, EMST has been extensively used in image segmentation [3, 4], cluster analysis [5–7], classification [8], and manifold learning [9]. In particular, we are interested in EMST construction for cases

P. Perner (Ed.): MLDM 2018, LNAI 10935, pp. 273–287, 2018.

https://doi.org/10.1007/978-3-319-96133-0_21

The rest of this paper is organized as follows. Section 2 gives a review of some related work. Section 3 presents the proposed data-dependent EMST method. Section 4 shows the experimental comparisons. In Sect. 5, we give the conclusion and future work.

2 Related Work

Work related to the method presented in this paper falls into two main categories: EMST algorithms and density-based outlier detection methods.

2.1 MST Algorithms

For a given connected and weighted graph $G = (E, V)$, Borůvka’s algorithm begins with each vertex of the graph being a tree, and in each iteration it selects, for every tree, the shortest edge connecting it to another tree and combines them. This process continues until all the trees are combined into one tree [12]. Proposed independently by Jarník [13], Prim [2] and Dijkstra [14] in 1930, 1957 and 1959, respectively, the famous Prim’s algorithm first arbitrarily selects a vertex as a tree, and then repeatedly adds the shortest edge that connects a new vertex to the tree, until all the vertices are included. Proposed in 1956, Kruskal’s algorithm starts by sorting all the edges by weight in non-decreasing order, treats each vertex as a tree, and iteratively combines the trees by adding edges in the sorted order, excluding those that would create a cycle, until all the trees are combined into one tree [1]. The time complexity of these classic MST algorithms is $O(E \log V)$.
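The descriptions of Kruskal's and Borůvka's algorithms above can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the cited works: the helper names (`find`, `kruskal`, `boruvka`), the union-find representation, and the small example graph are all hypothetical. Edges are `(weight, u, v)` tuples, and ties in Borůvka's algorithm are broken by comparing whole tuples so that the choice of cheapest edge is consistent across components.

```python
def find(parent, x):
    """Return the root of x's component, shortening paths along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def kruskal(n, edges):
    """Scan edges in non-decreasing weight order, skipping cycle-forming ones."""
    parent = list(range(n))
    total, chosen = 0, []
    for w, u, v in sorted(edges):
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:  # endpoints lie in different trees, so no cycle is created
            parent[ru] = rv
            total += w
            chosen.append((w, u, v))
    return total, chosen


def boruvka(n, edges):
    """Each round, every tree picks its cheapest outgoing edge, then trees merge."""
    parent = list(range(n))
    total, chosen, components = 0, [], n
    while components > 1:
        cheapest = {}  # component root -> cheapest edge leaving that component
        for edge in edges:
            _, u, v = edge
            ru, rv = find(parent, u), find(parent, v)
            if ru == rv:
                continue
            for root in (ru, rv):
                if root not in cheapest or edge < cheapest[root]:
                    cheapest[root] = edge
        if not cheapest:
            break  # no outgoing edges left: the graph is disconnected
        for w, u, v in cheapest.values():
            ru, rv = find(parent, u), find(parent, v)
            if ru != rv:  # the same edge may have been picked by both endpoints
                parent[ru] = rv
                total += w
                chosen.append((w, u, v))
                components -= 1
    return total, chosen


graph = [(1, 0, 1), (3, 0, 2), (2, 1, 2), (4, 1, 3),
         (5, 2, 3), (6, 3, 4), (7, 2, 4)]
kruskal_total, kruskal_edges = kruskal(5, graph)
boruvka_total, boruvka_edges = boruvka(5, graph)
print(kruskal_total, boruvka_total)
```

On this example graph both procedures select the same four edges, as expected when all edge weights are distinct and the MST is therefore unique.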

To construct an MST in the Euclidean setting, the standard Prim’s algorithm requires quadratic running time. To be more efficient, in 1978 Bentley and Friedman [15] proposed using a kd-tree in Prim’s algorithm to speed up the search for the next edge to add to the tree, which achieves an $O(N \log N)$ running time for most data distributions. In 1985, Preparata and Shamos [16] gave a lower bound of $\Omega(N \log N)$ for the EMST problem, which remains the tightest known lower bound. In 1993, Callahan and Kosaraju proposed the Well-Separated Pair Decomposition (WSPD) [17], which forms the basis of most recent EMST algorithms. The WSPD partitions the data points into a set of pairs of tree nodes such that the nodes in any pair are farther apart than the diameter of either node. It can be shown that the WSPD has $O(N)$ pairs of nodes, and that the MST is a subset of the edges formed between the closest pair of points in each pair of nodes. In 2000, Narasimhan and Zachariasen applied the WSPD to compute neighbors of components for Borůvka’s algorithm to find edges of the MST [18]. However, the constant in the $O(N)$ size of the WSPD grows exponentially with the data dimension and is often very large in practice. In 2010, March et al. presented a new dual-tree algorithm for efficiently computing the EMST [19], referred to in the following as the FEMST algorithm. It is superficially similar to the method in [18], except that the WSPD is replaced by a new dual-tree data structure. They used adaptive algorithm analysis to prove the tightest (and possibly optimal) runtime bound for the EMST problem to date. Their experiments demonstrated the scalability of the method on astronomical data sets.
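The separation condition quoted above for the WSPD can be illustrated with a brute-force check. This sketch only tests the condition as worded in the text (the gap between two point sets exceeds the diameter of either set) and returns the closest cross pair that would serve as a candidate MST edge. It does not build the decomposition itself, and the function names and sample point sets are assumptions made for illustration.

```python
import math
from itertools import combinations, product


def diameter(points):
    """Largest pairwise distance within a point set (0.0 for a single point)."""
    return max((math.dist(p, q) for p, q in combinations(points, 2)),
               default=0.0)


def is_well_separated(a, b):
    """True if the sets are farther apart than the diameter of either set."""
    gap = min(math.dist(p, q) for p, q in product(a, b))
    return gap > max(diameter(a), diameter(b))


def closest_pair_between(a, b):
    """Closest cross pair (p in a, q in b): the candidate MST edge for the pair."""
    return min(product(a, b), key=lambda pq: math.dist(*pq))


node_a = [(0.0, 0.0), (1.0, 0.0)]
node_b = [(10.0, 0.0), (10.0, 1.0)]
node_c = [(2.0, 0.0), (3.0, 0.0)]

print(is_well_separated(node_a, node_b))     # gap 9 vs diameter 1
print(is_well_separated(node_a, node_c))     # gap 1 vs diameter 1
print(closest_pair_between(node_a, node_b))
```

The brute-force distance computations here are quadratic in the set sizes and serve only to make the definition concrete.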
