下载  >  行业  >  教育  > Detection of community structures in networks via global optimization

Detection of community structures in networks via global optimization 评分

社会网络中的社区发现已应用到许多领域,已成为学术界研究的热点问题,本文是社区发现的好论文,有机会一定要看看,呵呵!
A. Medus et al./ Physica A 358(2005)593-604 95 Given the network n we will define a partition P as a given grouping of the nodes in subsets p; (I<i<g), while keeping the structure of the adjacency matrix unaltered Following I we will quantify the degree of communality of a given partition P in the following way Given a partition P comprising g subsets, a matrix e(of dimension g *g) is defined such that the corresponding component eii is the fraction of edges in the origina network that connect nodes in subset i with nodes in subset In I the modularity o is defined as e i Q stands for the difference between the relative quantity of links within subsets and the expected relative number of links that would result from a random placement of links when no attention is given to the community structure of the network under consideration 9 If the network under consideration has no community structure, Q equals 0. On the other hand, if the network under consideration does have a community structure the closer the chosen partition is to the actual community structure of the network the larger the modularity Q will be In this way, the search of community structures in networks is reduced to finding the partition p which maximizes the modularity Q 3. Community recognition algorithms In this section we review the algorithm presented in I based on edge removal Hereafter referred as edge removal(ER), and describe our approach based on simulated annealing(hereafter referred aS Sa) 3.1. Community recognition via edge removal In a recent work, Newman and Girvan 3] have proposed to study the structure of the network by analyzing the effect of the removal of links with highest betweenness The betweenness bi of a given link li is ∑am∑ Lkm path with paths the sum over all the path joining the nn nodes ano is the degeneracy of the path between nodes n and o, and 2ienath is the sum over all the links lkm that form he path under consideration. In this way the link with highest betweenness is the one that appears most often when we study all the components of all the minimum paths between all the pairs of nodes According to this prescription (i) One calculates the betweenness of all the links in the network. (ii) The one with the highest betweenness is removed A. Medus et al./ Physica A 358(2005)593-604 .. The process is continued until a disjoint cluster is obtained. Afterwards, it is pplied to each of the resulting subgraphs Special care is to be taken when the highest betweenness is degenerate. Because it is not possible to foresee which will be the optimum cut, we should select at random the link to be removed In this way, partitions with 2, 3,., N subsets can be obtained. The best one, according to the discussion in the previous section is the one that maximizes the magnitude Q 3. 2. Simulated annealing analysis In this section we present a methodology to study the community structure in networks based on the search for that partition that maximizes the value of o. this is accomplished by resorting to a SA [10] calculation in the space of the partitions of the network under analysis. Sa is a generalization of the well known Metropolis Monte Carlo (MMC) procedure. MMC consists in the realization of a Markov Chain in the space of the configurations of the system according to certain transition probabilities chosen in such a way that the asymptotic frequency of each state satisfies the Boltzmann distribution exp(-BEiz with B=(1/kr) where T is the temperature of the system, Ei the energy of state i and z the canonical partition function. The transition probability qi reads qi=min(l, exp(B(Ej-EiD In Sa(see [10] for details) the same procedure is employed but instead of using the temperature of the system we use a pseudo temperature, T, which controls the behavior of the transition probability and instead of the energy the observ able that we want to maximize. The pseudo temperature t is monotonously lowered until an extremum of the relevant observable is attained, in our case the Markov Chain is performed in the space of the partitions of the network under consideration. The transition probabilities read q l i=min(l,exp(B(2-2))with B=1/t and Ek has been replaced by Qk, the modularity of partition k. moreover, because we are looking for the maximum of the modularity(2,-2) stands for (O initial O The Markov Chain is implemented in the following way We start out from an arbitrary initial partition plo of the nn nodes in which the initial number of subsets is taken at random between mo=2 and mo =(nn-1)and then the nodes that compose the network are randomly assigned to any to the corresponding pi subsets. In order to allow the procedure to increase the number of subsets in the partition of the system, the algorithm is implemented in such a way that at every step there is an empty subset present. Starting from this configurations we randomly choose a node ni which belongs to the subset Pk of the partition plo We then choose at random another subset p, of the partition pho. If p, is not empty we check if there is a link between the chosen node n; and any of the nodes belonging to Pr. If no such a link exists, the possibility of transferring ni from pk to pi is discarded and the selection procedure is repeated. If, on the other hand, there exists A. Medus et al./ Physica A 358(2005)593-604 597 at least one link lim between the node ni and node nm in pi, or p, is empty, we perform the metropolis acceptance analysis i We calculate the value of Q for the partition p) (i) The transference of n; from subset pk to subset pi is proposed, thus giving a new partition p (iii) The value of (for (p)' is calculated (iv) The new partition is accepted with a probability g=min (l, exp(bao) and the transference of node ni is performed accordingly. If p was empty the total number of subsets , mk, is increased by 1 (v) Steps(1i)-(iii) are repeated with and exponentially increasing value of B until no new configurations are accepted within a fixed number of steps. The pseudo inverse temperature B is changed according to the rule B=aB, with a>1 (vi) Whenever a partition with a higher Q is visited during the development of the Markov Chain, it is recorded. If the asymptotic partition is worse(lower @)than the recorded one we consider this one as the result of the calculation This steps are repeated until the rate of acceptance of particle transfer drops to 0. In order to improve the performance of this algorithm we have implemented steps of the Markov Chain in which multiple transferences of nodes between partitions are performed before the acceptance criterion is checked over the resulting partition This procedure is used in order to avoid trapping configurations(those in which the removal of a single node is highly improbable but the configuration as a whole is not a maximum of 2) It should be noted at this time that the resulting methodology is the same as the one developed by one of us for the analysis of fragmentation of highly excited liquid drops [8] and for the case of phase transitions in small constrained systems [6](see these references for further details, for a comparison between different fragment recognition algorithms see [lID 4. Case study In order to check the properties of the two approaches above mentioned, we have found it helpful to analyze the following simple undirected graph Fig. 1A. The advantage of dealing with such a small and simple graph is that the calculations can be performed by hand and the properties of the recognition algorithms can be easily understood In Fig. I we show the comparison between the results obtained with the above mentioned algorithms(see figure captions for details) We first analyze what happens when we apply the er approach (1) We search for the links with highest betweenness, in this case there is degeneration and links 110.11, /10.12, 712.13,111.13, stand on an equal footing. We then have to choose one at random(as proposed by b3] and this edge is removed. In our example we choose /12.13 obtaining the graph displayed in Fig. I B 598 A. Medus et al./ Physica A 358(2005)593-604 Fig. 1. Development of community structures in terms of the ER and SA analysis. Full arrows denote steps in the er approach. Dotted arrows denote results from Sa methodology. Starting from network A by applying er methodology we first get to network B and, after the second removal of a link, to network C On the other hand, starting from the same initial network the Sa will give network D if we impose the constraint that the final configuration should display two communes. If we do not impose any constraint the result according to Sa will be network e. It is important to notice that network e is unreachable from network C. This is the main drawback of the ER approach (2)We repeat step(1)and we find that the edge with highest betweenness is 110.11 It should be noticed that as a consequence of removing this link the graph breaks up in two pieces Fig. IC. The value of Q is in this case 0=0.409 A. Medus et al./ Physica A 358(2005)593-604 599 As we continue in this way we will obtain that the next breaking of the network takes place when removing link /10.12. By removing this link we obtain 3 clusters with a modularity value of @=0.405. Notice that the removal of l1L13 is equivalent to removing /10.12, giving a different graph with the same value of o We now apply the sa approach. (i)If no restriction on the number of partitions is imposed, we obtain the result displayed in Fig. 1E. In this case the original network is broken into 3 subsets with a modularity value of @=0.446, (ii) if, on the other hand, we restrict the number of partitions to two, we obtain the result displayed in Fig. ID, which is the same graph as the one obtained using ER for two subsets(of course the equivalent configuration resulting from the removal of /10.12 and 111. 13can be obtained as well) It is relevant to notice that the best result according to Sa cannot be reached using ER, because in order to get the graph displayed in Fig. lE, from the previous step in the calculation(Fig. 1C), the link 110. 11 must be reconstructed, but this step does not exists in the er methodology From this analysis it is clear that the sa algorithm is able to find a better (as measured by the quantity @)solution to the communality analysis than the er criterion The reason why the er approach fails to reach the best result is because this methodology is local and irreversible. a decision is made at a given stage of the analysis based only on the betweenness of the links with total disregard to the possible value of o at the end of the calculation. Once a link is removed it is not rebuild into the system at any other stage of the calculation. On the other hand when we analyze the sequence of results obtained with Sa when we impose the condition of having 2, 3, 4, .. partitions, we see that in going from two partitions to three partitions the link 110.11 appears again. This is no problem in Sa because we are working with different groupings of the nodes and all the information about the links is conserved at all times It is interesting to note that in a recent paper [12] it has been proposed that all links that share the highest value of betweenness are to be removed. If such a recipe is applied for our test case, there would be no route to a three communities solution. In fact the first structure that appears gives four communities in which two nodes (11, 12)are isolated 5. Examples Once we have gained insight into the properties of the different approaches analyzed in this paper, we have found it appropriate to reanalyze some examples present in the literature and compare the results already published with the ones obtained using our methodology. We will analyze the Zachary's Karate Club network [3] and the relationship network of the characters of the novel Les Miserables by Victor Hugo, as compiled by Knuth [13 and analyzed 600 A. Medus et al./ Physica A 358(2005)593-604 5.1. Zachary's karate club network The main reason for the election of this network is that it is a classic social network and it has been analyzed by means of er algorithms in some previous works [3]. This network was constructed by Wayne Zachary [14], who dedicated two years to the observation of social interactions between the members of a karate club The data collected by Zachary made it possible to build the corresponding adjacency matrix that characterize relations between the members of the club In Figs. 2 and 3 we show the best partition obtained, in terms of the modularity 2, for the Zachary's non-weighted network by means of the sa and er algorithms respectively [15]. For the case of the Sa algorithm, we achieve the largest modularity (2=0.42) for a structure of four communities(Fig. 2) On the other hand the best community structure recognized with the er approach [3] corresponds to five communities with a modularity value of Q=0.37(see Fig 3) In the actual case analyzed by zachary a dispute arose between the clubs director and the karate teacher, and as a result the club split in two smaller clubs, one centered around the director and the other around the karate teacher. We performed the analysis of Q for this two-community split using both algorithms We started the 24 25 17 13 18 Fig. 2. Community structures for the Zachary network according to Sa approach. In this figure, squares and circles denote the members of the two subsets according to observations by Zachary. broken lines denote the partitions obtained according to SA approach A. Medus et al./ Physica A 358(2005)593-604 601 10 24 Y6 16 20 21 Fig 3. Community structures for the Zachary network according to er approach calculation using the network structure previous to the incident. The results are shown in Figs. 2 and 3 through circles and squares. Only the node 10 is misclassified by Sa approach in comparison with the actual two-communities division observed by Zachary. However, when we run our algorithm for this problem, taking into account that the network can be transformed into a weighted network, with the weight of the links given by the"affinity degree"among the members, (see [14]we obtain the actual two-community split shown in Fig. 4. Here we want to emphasize the fact that our Sa algorithm can by applied either to weighted or unweighted networks without modification. This is so, because all the information about weights is contained in the adjacency matrix M and using this information, changes in the value of o due to nodes regrouping, can be straightforwardly calculated(Fig 4) 5.2. Les Miserables network In the following example we analyze the network of interactions between major characters in the novel Les Miserables, by Victor Hugo, using the list of character appearances by scene as compiled by Knuth [13. For this case, a link between two characters(nodes)represents the simultaneous appearance of both characters in one or more chapters Fig. 5 shows the community structure achieved by our Sa algorithm. The best community structure has a modularity of Q=0.546 and corresponds to 5 602 A. Medus et al./ Physica A 358(2005)593-604 26 25 30 16 29 23 17 20 15 19 14 12 22 18 Fig. 4. Actual community structures as recorded by Zachary. Once again squares and circles denote the members of each subset communities, two of which are centered on the protagonists Jean Valjean and his persecutor, the police officer Javert; as can be seen in the figure. The other communities are centered on Marius, Fantine and bishop myriel (here we want to note that the original data collected by Knuth, which we use in our calculations, has some mistakes like the inclusion of jondrette as a individual character while Jondrette is only a pseudonymous of Thenardier) In the course of the analysis we find 3 isolated nodes, this mean that they have not links with other nodes, and for this reason they have been excluded from this figure When we run the er algorithm for the same network, we obtain a structure with 11 communities and Q=0.538, smaller than the obtained with our algorithm 6. Conclusions In this paper we have presented an analysis of communality structures in graphs based on a process of global optimization on the cost function @. As stated in Section 1(see 3, 4, 9, 12 the higher the value of o the better the communality

...展开详情
所需积分/C币:8 上传时间:2013-04-05 资源大小:543KB
举报 举报 收藏 收藏
分享 分享
Detection of community structures in networks via global optimization

社会网络中的社区发现已应用到许多领域,已成为学术界研究的热点问题,本文是社区发现的好论文,有机会一定要看看,呵呵!

立即下载
Community Detection and Mining in Social Media

It is a pleasure to acknowledge many colleagues who made substantial contributions in various ways to this time-consuming book project. The members of the Social Computing Group, Data Mining and Machine Learning Lab at Arizona State University made this project enjoyable. They include Ali Abbasi, Ge

立即下载
Handbook of Big Data Technologies

Title: Handbook of Big Data Technologies Length: 895 pages Edition: 1st ed. 2017 Language: English Publisher: Springer Publication Date: 2017-03-26 ISBN-10: 3319493396 ISBN-13: 9783319493398 Table of Contents Part I Fundamentals of Big Data Processing Big Data Storage and Data Models 1 Storage Mode

立即下载
Stanford 大学--Analysis of Networks课程13-19-PPT.rar

Stanford 大学--Analysis of Networks课程13-19章 Handouts Info Sheet Lecture 01 - 09/25 Course Introduction and Structure of Graphs Lecture 02 - 09/27 Measuring Networks, and Random Graph Model Lecture 03 - 10/02 Link Analysis: PageRank Lecture 04 - 10/04 Network Construction, Inference, a

立即下载
Stanford 大学--Analysis of Networks课程7-12-PPT.rar

Stanford 大学--Analysis of Networks课程的PPT资源第7-12章。 Handouts Info Sheet Lecture 01 - 09/25 Course Introduction and Structure of Graphs Lecture 02 - 09/27 Measuring Networks, and Random Graph Model Lecture 03 - 10/02 Link Analysis: PageRank Lecture 04 - 10/04 Network Construction, Infer

立即下载
big data analysis deep learning

一个会议的论文集。英文版。 Big Data Analysis and Deep Learning Applications: Proceedings of the First International Conference on Big Data Analysis and Deep Learning (Advances in Intelligent Systems and Computing) This book presents a compilation of selected papers from the first International Conference on Big

立即下载
Mobile Robot Navigation in Unknown Dynamic Environment

In view of the navigation problem of mobile robot in unknown dynamic environment, this paper proposes a new method of robot rolling navigation using improved ant colony system algorithm. The method first utilize the fuzzy logistic description to establish the fuzzy environment model of robot local a

立即下载
wurfl-2.3.1

包含目前最新的移动终端信息 Wireless Universal Resource FiLe is a community effort focused on mobile device detection. WURFL is a set of proprietary application programming interfaces (APIs) and an XML configuration file which contains information about device capabilities and features for a variety of mobile dev

立即下载
Structure from Motion (SFM) Photogrammetry

ABSTRACT: Topographic data measurement is a fundamental aspect of many geomorphic research applications, particularly those including landform monitoring and investigation of changes in topography. However, most surveying techniques require relatively expensive technologies or specialized user super

立即下载
Practical Graph Mining with R(CRC,2013)

Discover Novel and Insightful Knowledge from Data Represented as a Graph Practical Graph Mining with R presents a “do-it-yourself” approach to extracting interesting patterns from graph data. It covers many basic and advanced techniques for the identification of anomalous or frequently recurring pat

立即下载
Deep Learning for Medical Image Analysis

Foreword Computational Medical Image Analysis has become a prominent field of research at the intersection of Informatics, Computational Sciences, and Medicine, supported by a vibrant community of researchers working in academics, industry, and clinical centers. During the past few years, Machine Le

立即下载
Big Data Analytics in Future Power Systems

Big Data Analytics in Future Power Systems by:Ahmed F. Zobaa and Trevor J. Bihl ISBN-10 书号: 1138095885 ISBN-13 书号: 9781138095885 Edition 版本: 1 出版日期: 2018-09-24 pages 页数: (188) $99.95 Power systems are increasingly collecting large amounts of data due to the expansion of the Internet of Things into

立即下载
Mastering OpenCV 3 2nd Edition B01N7G0BKE

Mastering OpenCV3, Second Edition contains seven chapters, where each chapter is a tutorial for an entire project from start to finish, based on OpenCV's C++ interface, including the full source code. The author of each chapter was chosen for their well-regarded online contributions to the OpenCV co

立即下载
Mastering OpenDV 3 Second Edition

Mastering OpenCV3, Second Edition contains seven chapters, where each chapter is a tutorial for an entire project from start to finish, based on OpenCV's C++ interface, including the full source code. The author of each chapter was chosen for their well-regarded online contributions to the OpenCV c

立即下载
Mastering OpenCV 3 - Second Edition.pdf

Preface Mastering OpenCV3, Second Edition contains seven chapters, where each chapter is a tutorial for an entire project from start to finish, based on OpenCV's C++ interface, including the full source code. The author of each chapter was chosen for their well-regarded online contributions to the O

立即下载
SIGMOD 2009 全部论文(2。后12篇)

14 Continuous obstructed nearest neighbor queries in spatial databases Yunjun Gao, Baihua Zheng Jun. 2009 Proceedings of the 35th SIGMOD international conference on Management of data In this paper, we study a novel form of continuous nearest neighbor queries in the presence of obsta

立即下载
用于人体动作识别的PKU-MMD大范围数据集

Despite the fact that many 3D human activity benchmarks being proposed, most existing action datasets focus on the action recognition tasks for the segmented videos. There is a lack of standard large-scale benchmarks, especially for current popular data-hungry deep learning based methods. In this pa

立即下载
Analyzing_Social_Networks

Designed to walk beginners through core aspects of collecting, visualizing, analyzing, and interpreting social network data, this book will get you up-to-speed on the theory and skills you need to conduct social network analysis. Using simple language and equations, the authors provide expert, clear

立即下载
Remote sensing image processing

2012出版的,Earth observation is the field of science concerned with the problem of monitoring and modeling the processes on the Earth surface and their interaction with the atmosphere.The Earth is continuously monitored with advanced optical and radar sensors.The images are analyzed and processed to de

立即下载
How to Cheat at Configuring Open Source Security Tools

Product Description The Perfect Reference for the Multitasked SysAdmin This is the perfect guide if network security tools is not your specialty. It is the perfect introduction to managing an infrastructure with freely available, and powerful, Open Source tools. Learn how to test and audit your syst

立即下载