Density-Sensitive Spectral Clustering Based on Natural Neighbor

Density-Sensitive Spectral Clustering Based on Natural Neighbor. Lei Dajiang, Wang Mingda. The key to a spectral clustering algorithm lies in how the proximity between data points is measured; this paper adaptively selects the number of neighbors through a fast natural nearest-neighbor method to describe the proximity relationships between data points.
Definition (Neighbor's Upper Bound). A natural neighborhood can be described by its neighbor's upper bound, defined as $Sup_k$:

$$Sup_k = \max_r \{\, r \mid (\forall p)(\exists q)\,(p \neq q) \wedge (p \in NN_r(q)) \,\} \qquad (1)$$

where $r$ is the number of neighbors, which changes dynamically during the search, and $NN_r(q)$ denotes the neighbors of $q$ after $r$ neighbors have been searched.

Definition (Stable Searching State). The natural neighborhood is modeled as a graph $G=(V,E)$, where $V$ is the vertex set and $E$ is the edge set. For $v_i, v_j \in V$ with $v_i \neq v_j$, the saturation state of the graph can be described as

$$(\forall v_i)(\exists v_j)\,\bigl(v_j \in D_r(v_i)\bigr) \wedge \bigl(|D_r(v_i)| \geq 1\bigr) \qquad (2)$$

where $D_r(v_i)$ denotes the set of vertices adjacent to $v_i$. When $v_i$ is the most isolated vertex,

$$|D_r(v_i)| \geq 1 \qquad (3)$$

Formula (3) states that even the most isolated vertex has a nearest neighbor; the graph being in this saturated state is the condition for stopping the natural neighbor search.

In density-sensitive spectral clustering algorithms, the correlation between data points has mostly been based on the traditional Euclidean distance. Researchers then began to describe the similarity between data points with Gaussian kernel functions. Although a Gaussian kernel describes the relationship between data points better, the correct choice of its kernel parameter is key to the success of the algorithm, and a description based on a Gaussian kernel is not well suited to multi-scale problems [9]. In 2007, Wang designed a density-sensitive similarity measure, a data-dependent measure of similarity, and introduced it into spectral clustering to propose a density-sensitive spectral clustering method [11]. This method enlarges the distance between data points lying in different high-density areas while reducing the distance between data points in the same high-density area. Based on the density-sensitive distance, Lu proposed a density-sensitive hierarchical clustering algorithm [10], and Sun proposed a semi-supervised clustering algorithm with extended constraints [12].

In the density-sensitive spectral clustering algorithm, a segment length with adjustable density is defined, this adjustable length is used to define a new distance between data points, and a similarity matrix $W$ is then built from the new distance.

Definition (Density-Adjustable Segment Length). Given two points $x$ and $y$, let $dist(x,y)$ be the Euclidean distance between them and let $\rho$ be a scaling factor with $\rho > 1$. The density-adjustable segment length is defined as

$$L(x,y) = \rho^{\,dist(x,y)} - 1 \qquad (4)$$

The Euclidean length of the segment between two data points is thus converted into a density-adjustable length, which stretches the Euclidean distance between the data points.

Definition (Density-Sensitive Distance). A density-sensitive distance measure can be computed from the density-adjustable segment lengths between data points. The data points $V = \{x_i\}_{i=1}^n$ are taken as the vertices of a graph $G=(V,E)$. The density-sensitive distance between two data points is the sum of the density-adjustable segment lengths along the shortest path between them:

$$D(x_i, x_j) = \min_{p \in P_{ij}} \sum_{k=1}^{|p|-1} L(p_k, p_{k+1}) \qquad (5)$$

where $P_{ij}$ is the set of all paths connecting $x_i$ and $x_j$.

This new density-sensitive distance is used as the distance between data points to establish a similarity matrix $W$. Applying $W$ to classical spectral clustering yields the density-sensitive spectral clustering algorithm, in which the distance measure follows the shortest path on the same manifold (a sketch of this distance computation is given below).
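As a concrete illustration of the two definitions above, the following is a minimal sketch (not the authors' code) of the density-adjustable segment length (4) and the shortest-path density-sensitive distance (5). It assumes a fully connected graph over the data and uses SciPy's shortest-path routine; the function name and the toy points are our own.

```python
import numpy as np
from scipy.spatial.distance import squareform, pdist
from scipy.sparse.csgraph import shortest_path

def density_sensitive_distance(X, rho=2.0):
    """Density-sensitive distance matrix following formulas (4) and (5).

    X   : (n, d) array of data points
    rho : scaling factor, must be > 1
    """
    # Pairwise Euclidean distances dist(x, y)
    euclid = squareform(pdist(X))
    # Density-adjustable segment length, formula (4): L(x, y) = rho^dist(x, y) - 1
    L = np.power(rho, euclid) - 1.0
    # Formula (5): shortest-path length when the edge weights are L (Dijkstra)
    D = shortest_path(L, method="D", directed=False)
    return D

# Toy usage on a few 2-D points
if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [0.0, 3.0]])
    print(density_sensitive_distance(X, rho=2.0))
```

Because the edge weights grow exponentially with the Euclidean distance, the shortest path tends to hop through chains of nearby points on the same manifold rather than jump across low-density gaps.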
Therefore, the density-sensitive spectral clustering algorithm can be applied to a variety of manifolds and better satisfies global consistency.

According to this construction, the key of density-sensitive spectral clustering lies in the definition of a new distance measure and a new measure of segment length. By its definition, the density-adjustable segment length transforms the traditional Euclidean distance through an exponential function, so that the distance between data points grows exponentially as the Euclidean distance grows. The distance between data points is then given by the density-sensitive distance. If two data points belong to the same class, all data points on the shortest path between them belong to the same class, i.e. the global consistency principle of clustering is satisfied. When the relative positions of the data points do not change, the division produced by the clustering algorithm should not change either, i.e. the mutual relationship between the data points stays unchanged. However, the density-sensitive distance metric may change as the distances between data points increase. Fig. 1 shows that when the distances between data points change linearly, the shortest path between A and B can change even though the relative positions of the data points do not.

Fig. 1 Two paths between A and B

In Fig. 1, the density-sensitive length of the upper path between A and B is

$$T_1 = D_{AB} = L(A, m_1) + L(m_1, m_2) + L(m_2, B) = \rho^{d_1} + \rho^{d_2} + \rho^{d_3} - 3 \qquad (6)$$

and that of the other path is

$$T_2 = D_{AB} = L(A, n_1) + L(n_1, n_2) + L(n_2, B) = \rho^{d_4} + \rho^{d_5} + \rho^{d_6} - 3 \qquad (7)$$

Assume that $T_1 < T_2$ before a linear transformation. There is no guarantee that $T_1$ remains less than $T_2$ when $\{d_1, d_2, d_3, d_4, d_5, d_6\}$ are linearly expanded: the shortest path between the data points may change, and the division of the data points changes with it. Moreover, according to the definition of the density-adjustable segment length, the scaling factor $\rho$ has to be chosen from empirical knowledge. As Fig. 3 shows, the choice of the scaling factor also affects the clustering result: different scaling factors on the classic three-circle artificial data set produce different clusterings.

Fig. 2 The Original Data Set

Fig. 3 The Effect of Scaling Factor on Data Division

From the above analysis, the density-sensitive spectral clustering algorithm has the following problems:

1. The value of the scaling factor affects the performance of the algorithm.

2. The density-adjustable segment length destroys the correlation between data points when their relative positions change linearly, which affects the accuracy of the density-sensitive spectral clustering algorithm when density-adjustable segment lengths are used (a numeric sketch of this effect follows below).
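To make problem 2 concrete, here is a minimal numeric check of the two paths of Fig. 1; the scaling factor ρ = 2 and the edge lengths below are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rho = 2.0  # assumed scaling factor, rho > 1

def path_length(edges, scale=1.0):
    """Density-sensitive length of a path, as in formulas (4)-(7):
    sum of (rho^(scale * d) - 1) over the path's Euclidean edge lengths d."""
    edges = np.asarray(edges)
    return np.sum(rho ** (scale * edges) - 1.0)

# Illustrative edge lengths for the two paths of Fig. 1 (assumed values)
T1_edges = [2.0, 2.0, 2.0]   # balanced path A -> m1 -> m2 -> B
T2_edges = [3.0, 0.5, 0.5]   # unbalanced path A -> n1 -> n2 -> B

print(path_length(T1_edges), path_length(T2_edges))        # 9.0  > 7.83 : T2 is shorter
print(path_length(T1_edges, 2), path_length(T2_edges, 2))  # 45.0 < 65.0 : T1 becomes shorter
```

Both path lengths grow under the expansion, but the exponential stretching penalizes the single long edge far more, so the identity of the shortest path, and with it the division of the data points, can change even though their relative positions do not.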
In order to solve these problems, we must consider the relationship between the clustering result and the distances between data points. According to the literature [11], if two points belong to the same cluster division, they can be connected by a path that passes through the same sparse area; if two points do not belong to the same cluster division, the paths connecting them pass through different sparse areas. Therefore, keeping the shortest path between data points unchanged as the distances expand is the key to solving this problem, and the scaling factor and the distances between data points are the quantities that must be handled.

From the idea of the natural neighbor method, when the most isolated point obtains its nearest neighbor, the graph is exactly in the saturated state. Based on this theory, a fast search method is proposed to bring the neighborhoods of the data to the saturated state quickly. The fast search method based on the natural neighbor method (FNN method) directly obtains the furthest neighbors of the data set and then looks up their nearest neighbors to find the critical situation of the nearest-neighbor search, i.e. to obtain $Sup_k$. Once we know the ranks of all the furthest neighbors in the nearest-neighbor lists, we can compute the maximum rank from the neighbor lists. In other words, when every one of the most remote data points can be found among some point's neighbors, the neighborhoods are saturated, and this is the criterion for stopping the neighbor search.

Definition (Criterion of Stopping Searching Neighbor). The fast search method needs to return $Sup_k$. The nearest neighbor $x_j$ of the furthest-neighbor node of a data point $x_i$ is obtained by $Farnn(x_i)$, and $findRank(x_j, x_i)$ then computes the rank of $x_j$ in the neighbor list of $x_i$:

$$Sup_k = \max\{\, r_i \mid r_i = findRank(x_j, x_i) \wedge x_j \in Farnn(x_i) \,\} \qquad (8)$$

The FNN method computes the natural neighbor eigenvalue $\lambda$ (i.e. $Sup_k$) of the data set. Its time complexity is $O(\lambda \cdot n)$, and in practice $\lambda$ is typically more than 20 but less than 30 even for high-dimensional data, so the FNN method is faster than the natural neighbor method.

In Algorithm 1, the function findFN(X) computes the furthest neighbors of the data set, findKNN(x_i, 1) computes the nearest neighbor of a furthest neighbor, and findRank(x_j, x_i) computes the rank of x_j in the neighbor list of x_i.

Algorithm 1: FNN method
Input: data set X;  Output: NaN_lambda
1: r = 1, FN_NN = ∅, Rank = ∅, NaN_lambda = ∅
2: for all x_k ∈ X
3:   for all x_i ∈ findFN(X)
4:     FN_NN = findKNN(x_i, 1)
5:     for all x_j ∈ FN_NN
6:       Rank = Rank ∪ findRank(x_j, x_i)
7:     end for
8:   end for
9: Sup_k = Max(Rank)
10: NaN_lambda = NaN_lambda ∪ [x_k, knn(x_k)]
11: Return: NaN_lambda

Algorithm 1 first finds the furthest neighbors of the data set. Second, step 4 obtains the nearest neighbor of each furthest neighbor. Third, steps 6-8 obtain the rank of each furthest neighbor with respect to its nearest neighbor. Finally, the maximum of these ranks and NaN_lambda are what we need. From the neighbor relationships of each data point stored in NaN_lambda we obtain the adjacency matrix A, which describes the nearest-neighbor relationships: if $x_i$ is a natural neighbor of $x_j$, $A_{ij}$ is set to 1; otherwise $A_{ij}$ is set to 0 (a minimal illustrative sketch of this neighborhood construction is given below).
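The following is a minimal sketch of such a neighborhood construction. It implements the basic natural-neighbor idea (increase r until even the most isolated point has been reached, then connect mutual neighbors) rather than the accelerated FNN procedure of Algorithm 1, and the function and parameter names (natural_neighbor_adjacency, max_r) are our own.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def natural_neighbor_adjacency(X, max_r=30):
    """Illustrative natural-neighbor search (basic scheme, not the FNN speed-up).

    r is increased until every point appears in at least one other point's
    r-nearest-neighbor list (the saturated state of formulas (2)-(3)); the function
    returns the mutual-neighbor adjacency matrix A and the neighbor count r.
    """
    n = X.shape[0]
    k = min(max_r + 1, n)                      # +1 because each query returns the point itself
    _, idx = NearestNeighbors(n_neighbors=k).fit(X).kneighbors(X)

    r = 1
    for r in range(1, k):
        # how many times each point occurs as one of someone else's first r neighbors
        reverse_counts = np.bincount(idx[:, 1:r + 1].ravel(), minlength=n)
        if np.all(reverse_counts >= 1):        # even the most isolated point has been reached
            break

    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in idx[i, 1:r + 1]:
            if i in idx[j, 1:r + 1]:           # mutual neighbors within r: natural neighbors
                A[i, j] = A[j, i] = 1
    return A, r
```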
The change of the distances between data points can be identified from the sparseness of the neighborhoods. It is common to measure the degree of sparseness around a data point by the distances between that point and its k nearest neighbors. We therefore describe the sparseness of a data point through the distances between the point and its local information points.

Definition (Scaling Factor Based on Local Information). In a data set, the local information of data point $i$ is the distance from data point $i$ to its local information points: in formula (9), $S_i$ denotes the data point, $S_{no}$ denotes the local information point corresponding to data point $i$, and $d(S_i, S_{no})$ denotes the Euclidean distance between data point $i$ and its local information point. Based on this concept of local information, and following the study of scaling operators in density-sensitive spectral clustering, we design a scaling factor based on local information (formula (10)); this scaling factor has to satisfy the usual properties of a scaling factor.

Definition (Density-Adjustable Segment Length Based on Local Information). With the new scaling factor, written $\rho_{xy}$ below, a new segment length can be defined, where $dist(x,y)$ is the Euclidean distance between the data points $x$ and $y$:

$$L(x,y) = \rho_{xy}^{\,dist(x,y)} - 1 \qquad (11)$$

The newly designed density-adjustable segment length based on local information satisfies the requirement that the density-adjustable segment lengths and the shortest path between data points do not change when the distances between data points grow linearly. Adjusting the segment length with local information therefore eliminates the influence of an expansion of the Euclidean distances.

Definition (Similarity Measure Based on Local Information). Based on the density-adjustable segment length from local information, we can further define the distance measure between data points. From the definition of the density-sensitive distance, this distance describes the similarity between data points, so we can compute a similarity matrix. To keep the matrix consistent, the diagonal of the similarity matrix W constructed from the density-sensitive similarity measure is set to 0, i.e. $W_{ii} = 0$, and

$$W_{ij} = 1 \Big/ \Big( \min_{p \in P_{ij}} \sum_{k=1}^{|p|-1} \bigl(\rho_{p_k p_{k+1}}^{\,dist(p_k, p_{k+1})} - 1\bigr) + 1 \Big) \qquad (12)$$

Fig. 4 shows the accuracy of the algorithm when the data values are expanded by different multiples. By magnifying the three-circle data by different factors and applying the new adjustable segment length in the spectral clustering algorithm, the Euclidean distances between data points are expanded on purpose.

Fig. 4 The Accuracy of the Algorithm

It can be seen from Fig. 4 that after the three-circle data are expanded by several multiples, the algorithm still divides the three-circle data correctly. The density-sensitive spectral clustering algorithm based on local information adapts well to changes of the distances between data points, so we can design a density-sensitive spectral clustering method based on local information.

In Algorithm 2, Algorithm 1 is first used to construct the neighborhoods between data points, stored in the adjacency matrix A. Second, a similarity matrix W is constructed from the density-sensitive similarity measure according to the adjacency matrix A and the formulas above. In steps 4-5 we compute the normalized Laplacian matrix. We then compute the eigenvectors corresponding to the first k largest eigenvalues, store them in V, and normalize the row vectors of V, $V_{ij} = V_{ij} / \bigl(\sum_j V_{ij}^2\bigr)^{1/2}$. Last, we use the k-means method to obtain the clustering results (a runnable sketch of this pipeline is given after the pseudocode).

Algorithm 2: Density-sensitive spectral clustering based on natural neighbor
1: Input: X = {x_i}_{i=1}^n, k;  Output: Y
2: A = FNN(X), A ∈ R^{n×n}
3: W_ij = similarMatrix(X) if A_ij = 1, W ∈ R^{n×n}
4: D_ii = Σ_{j=1}^n W_ij
5: P = D^{-1/2} W D^{-1/2}
6: V = eigs(P, k), V = [v_1, v_2, ..., v_k] ∈ R^{n×k}
7: V = normalize(V)
8: Y = Kmeans(V, k)
9: Return: Y
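A compact sketch of the whole pipeline of Algorithm 2 follows. It reuses the natural_neighbor_adjacency helper sketched above; because formulas (9)-(11) are only partially legible in this copy, the local scale used below (each point's mean distance to its natural neighbors) and the base value 2 are our assumptions, not the paper's exact definitions.

```python
import numpy as np
from scipy.spatial.distance import squareform, pdist
from scipy.sparse.csgraph import shortest_path
from sklearn.cluster import KMeans

def nds_ssc(X, k, eps=1e-12):
    """Sketch of Algorithm 2: density-sensitive spectral clustering with natural neighbors."""
    # Step 2: natural-neighbor adjacency A (stands in for the FNN step / Algorithm 1)
    A, _ = natural_neighbor_adjacency(X)

    # Local information of each point: mean Euclidean distance to its natural neighbors
    # (an assumption standing in for formulas (9)-(10))
    euclid = squareform(pdist(X))
    local = np.array([euclid[i, A[i] == 1].mean() if A[i].any()
                      else euclid[i, euclid[i] > 0].mean()
                      for i in range(len(X))])

    # Density-adjustable length in the spirit of formula (11): rho_xy^dist - 1 with an
    # assumed pairwise local scale rho_xy = 2 ** (1 / sqrt(local_i * local_j))
    scale = np.sqrt(np.outer(local, local)) + eps
    L = np.power(2.0, euclid / scale) - 1.0

    # Density-sensitive distance (5) and similarity (12); diagonal of W set to 0
    D = shortest_path(L, method="D", directed=False)
    W = 1.0 / (D + 1.0)
    np.fill_diagonal(W, 0.0)

    # Steps 4-7: normalized affinity D^{-1/2} W D^{-1/2}, top-k eigenvectors,
    # row normalization, then k-means (steps 8-9)
    d = W.sum(axis=1)
    P = W / (np.sqrt(np.outer(d, d)) + eps)
    _, vecs = np.linalg.eigh(P)
    V = vecs[:, -k:]                                   # eigenvectors of the k largest eigenvalues
    V = V / (np.linalg.norm(V, axis=1, keepdims=True) + eps)
    return KMeans(n_clusters=k, n_init=10).fit_predict(V)
```

The dense eigendecomposition above is only meant for small illustrative data sets; for larger data a sparse eigensolver restricted to the natural-neighbor graph would be the natural choice.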
We test the clustering effect of our method separately on artificial data sets and on UCI data sets, and compare it with the original k-means algorithm and the density-sensitive spectral clustering algorithm. On the UCI data sets we use the ratio of correctly classified samples to evaluate the accuracy of the algorithms.

Some "challenging" clustering problems are given in the literature [6]; we select a small part of those data. It can be seen from Fig. 5 that the k-means algorithm cannot divide the data sets correctly, and the density-sensitive spectral clustering (DS-SSC) algorithm divides only the second group correctly when ρ = 1.5 and fails to divide the third group correctly when ρ = 2, whereas density-sensitive spectral clustering based on natural neighbor (NDS-SSC) divides all the data sets correctly. The density-sensitive spectral clustering algorithm therefore shows its shortcomings when only the scaling factor is considered; the natural neighbor method can describe neighborhoods self-adaptively and provide local information for the density-sensitive spectral clustering algorithm.

Fig. 5 Effect Comparison of Clustering: (a) original data, (b) k-means, (c) density-sensitive spectral clustering (ρ = 1.5), (d) density-sensitive spectral clustering (ρ = 2), (e) density-sensitive spectral clustering based on natural neighbor

To further test the performance of the density-sensitive spectral clustering algorithm based on natural neighbor, we design experiments on a series of data sets from the international UCI repository (the Iris, Ionosphere, Glass and Wine data sets, among others). Tab. 1 lists the properties of each data set.

Tab. 1 The Properties of Each UCI Data Set

During the experiments, the scaling factor parameter of the density-sensitive spectral clustering algorithm was chosen as 1.5, and every algorithm was repeated 50 times. The results involve a cluster re-numbering problem: the labels assigned during clustering may not agree with the original numbering of the sample classes. To solve this problem we establish a mapping matrix; by selecting the mapping with the maximum number of matched cluster members we obtain the total number of correctly classified samples and hence the correct rate (a sketch of such a mapping is given after Tab. 2).

Table 2 shows that the density-sensitive spectral clustering method based on natural neighbor performs well on the Ionosphere, Iris, Seed and Sonar data sets. The k-means algorithm has a slight advantage over our method on the Wine data set. Among the four algorithms, the density-sensitive spectral clustering algorithm is weaker than the others, which shows that an inappropriate value of the scaling parameter affects the clustering effect. Our method, however, overcomes the selection problem of the scaling parameter and performs better on these data sets.

Tab. 2 Comparison of Correct Rate
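Computing the correct rate in Tab. 2 requires the label mapping described above. The sketch below builds the mapping (confusion) matrix and picks the maximum-overlap assignment with the Hungarian algorithm; this is one common way to realize such a mapping and is not necessarily the authors' exact procedure.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def correct_rate(y_true, y_pred):
    """Accuracy after optimally mapping predicted cluster ids onto true class labels.

    Builds the mapping matrix counting how often predicted cluster j coincides with
    true class i, then picks the assignment maximizing the matched sample count.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes = np.unique(y_true)
    clusters = np.unique(y_pred)
    M = np.zeros((len(classes), len(clusters)), dtype=int)
    for i, c in enumerate(classes):
        for j, g in enumerate(clusters):
            M[i, j] = np.sum((y_true == c) & (y_pred == g))
    row, col = linear_sum_assignment(-M)       # maximize the total number of matches
    return M[row, col].sum() / len(y_true)
```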
This paper analyzes the problem of parameter selection in density-sensitive spectral clustering. The parameter selection based on the natural neighbor method adopts an adaptive scheme to solve the aforementioned problem, and we propose a density-sensitive spectral clustering algorithm based on natural neighbor. Our algorithm achieves better performance on the "challenging" data sets. The experiments show that our proposed method can overcome the problem that the exponential transformation of distances may harm the clustering performance when the distances between data points are relatively large. Furthermore, how to describe the neighbor relationships between data points accurately is still an open question; future work may design the neighbor graph via convex optimization.

This paper is supported by the following foundations or programs: the Natural Science Foundation of Chongqing of China (cstc2014jcyjA40049) and the Doctoral Scientific Research Foundation of Chongqing University of Posts and Telecommunications (A2013-01).

[1] J. Buhmann. Data clustering and learning. The Handbook of Brain Theory and Neural Networks, 1995: 1059-1070.
[2] J. MacQueen. Some methods for classification and analysis of multivariate observations. Berkeley Symposium on Mathematical Statistics and Probability, 1967: 281-297.
[3] S. L. Lauritzen. The EM algorithm for graphical association models with missing data. Computational Statistics & Data Analysis, 1995, vol. 19: 191-201.
[4] J. Shi, J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2000, vol. 22: 888-905.
[5] R. Kannan, S. Vempala, A. Vetta. On clusterings: good, bad and spectral. Symposium on Foundations of Computer Science, 2000: 367.
[6] A. Y. Ng, M. I. Jordan, Y. Weiss. On spectral clustering: analysis and an algorithm. International Conference on Neural Information Processing Systems: Natural and Synthetic, 2001: 849-856.
[7] M. Meila, J. Shi. Learning segmentation by random walks. Advances in Neural Information Processing Systems, 2001: 873-879.
[8] L. Zelnik-Manor. Self-tuning spectral clustering. Advances in Neural Information Processing Systems, 2004, vol. 17: 1601-1608.
[9] Q. S. Zhu, J. Feng, J. L. Huang. Natural neighbor: a self-adaptive neighborhood method without parameter K. Pattern Recognition Letters, 2016, vol. 80: 30-36.
[10] P. L. Lu, Z. Wang. Density-sensitive hierarchical clustering algorithm. Computer Engineering and Applications, 2014.
[11] L. Wang, L. F. Bo, L. C. Jiao. Density-sensitive spectral clustering. Acta Electronica Sinica, 2007, vol. 35: 1577-1581.
[12] Sun, Y. G. Yi, Y. H. Lv, H. T. Cai, J. Z. Wang. A semi-supervised sparsity discriminant analysis algorithm for feature extraction. Advanced Materials Research, 2012, vol. 546-547: 670-674.
[13] B. Schölkopf, J. Weston, O. Bousquet, D. Zhou. Learning with local and global consistency. 2004.
