Detection of community structures in networks via global optimization 评分

社会网络中的社区发现已应用到许多领域，已成为学术界研究的热点问题，本文是社区发现的好论文，有机会一定要看看，呵呵！
A. Medus et al./ Physica A 358(2005)593604 95 Given the network n we will define a partition P as a given grouping of the nodes in subsets p; (I<i<g), while keeping the structure of the adjacency matrix unaltered Following I we will quantify the degree of communality of a given partition P in the following way Given a partition P comprising g subsets, a matrix e(of dimension g *g) is defined such that the corresponding component eii is the fraction of edges in the origina network that connect nodes in subset i with nodes in subset In I the modularity o is defined as e i Q stands for the difference between the relative quantity of links within subsets and the expected relative number of links that would result from a random placement of links when no attention is given to the community structure of the network under consideration 9 If the network under consideration has no community structure, Q equals 0. On the other hand, if the network under consideration does have a community structure the closer the chosen partition is to the actual community structure of the network the larger the modularity Q will be In this way, the search of community structures in networks is reduced to finding the partition p which maximizes the modularity Q 3. Community recognition algorithms In this section we review the algorithm presented in I based on edge removal Hereafter referred as edge removal(ER), and describe our approach based on simulated annealing(hereafter referred aS Sa) 3.1. Community recognition via edge removal In a recent work, Newman and Girvan 3] have proposed to study the structure of the network by analyzing the effect of the removal of links with highest betweenness The betweenness bi of a given link li is ∑am∑ Lkm path with paths the sum over all the path joining the nn nodes ano is the degeneracy of the path between nodes n and o, and 2ienath is the sum over all the links lkm that form he path under consideration. In this way the link with highest betweenness is the one that appears most often when we study all the components of all the minimum paths between all the pairs of nodes According to this prescription (i) One calculates the betweenness of all the links in the network. (ii) The one with the highest betweenness is removed A. Medus et al./ Physica A 358(2005)593604 .. The process is continued until a disjoint cluster is obtained. Afterwards, it is pplied to each of the resulting subgraphs Special care is to be taken when the highest betweenness is degenerate. Because it is not possible to foresee which will be the optimum cut, we should select at random the link to be removed In this way, partitions with 2, 3,., N subsets can be obtained. The best one, according to the discussion in the previous section is the one that maximizes the magnitude Q 3. 2. Simulated annealing analysis In this section we present a methodology to study the community structure in networks based on the search for that partition that maximizes the value of o. this is accomplished by resorting to a SA [10] calculation in the space of the partitions of the network under analysis. Sa is a generalization of the well known Metropolis Monte Carlo (MMC) procedure. MMC consists in the realization of a Markov Chain in the space of the configurations of the system according to certain transition probabilities chosen in such a way that the asymptotic frequency of each state satisfies the Boltzmann distribution exp(BEiz with B=(1/kr) where T is the temperature of the system, Ei the energy of state i and z the canonical partition function. The transition probability qi reads qi=min(l, exp(B(EjEiD In Sa(see [10] for details) the same procedure is employed but instead of using the temperature of the system we use a pseudo temperature, T, which controls the behavior of the transition probability and instead of the energy the observ able that we want to maximize. The pseudo temperature t is monotonously lowered until an extremum of the relevant observable is attained, in our case the Markov Chain is performed in the space of the partitions of the network under consideration. The transition probabilities read q l i=min(l,exp(B(22))with B=1/t and Ek has been replaced by Qk, the modularity of partition k. moreover, because we are looking for the maximum of the modularity(2,2) stands for (O initial O The Markov Chain is implemented in the following way We start out from an arbitrary initial partition plo of the nn nodes in which the initial number of subsets is taken at random between mo=2 and mo =(nn1)and then the nodes that compose the network are randomly assigned to any to the corresponding pi subsets. In order to allow the procedure to increase the number of subsets in the partition of the system, the algorithm is implemented in such a way that at every step there is an empty subset present. Starting from this configurations we randomly choose a node ni which belongs to the subset Pk of the partition plo We then choose at random another subset p, of the partition pho. If p, is not empty we check if there is a link between the chosen node n; and any of the nodes belonging to Pr. If no such a link exists, the possibility of transferring ni from pk to pi is discarded and the selection procedure is repeated. If, on the other hand, there exists A. Medus et al./ Physica A 358(2005)593604 597 at least one link lim between the node ni and node nm in pi, or p, is empty, we perform the metropolis acceptance analysis i We calculate the value of Q for the partition p) (i) The transference of n; from subset pk to subset pi is proposed, thus giving a new partition p (iii) The value of (for (p)' is calculated (iv) The new partition is accepted with a probability g=min (l, exp(bao) and the transference of node ni is performed accordingly. If p was empty the total number of subsets , mk, is increased by 1 (v) Steps(1i)(iii) are repeated with and exponentially increasing value of B until no new configurations are accepted within a fixed number of steps. The pseudo inverse temperature B is changed according to the rule B=aB, with a>1 (vi) Whenever a partition with a higher Q is visited during the development of the Markov Chain, it is recorded. If the asymptotic partition is worse(lower @)than the recorded one we consider this one as the result of the calculation This steps are repeated until the rate of acceptance of particle transfer drops to 0. In order to improve the performance of this algorithm we have implemented steps of the Markov Chain in which multiple transferences of nodes between partitions are performed before the acceptance criterion is checked over the resulting partition This procedure is used in order to avoid trapping configurations(those in which the removal of a single node is highly improbable but the configuration as a whole is not a maximum of 2) It should be noted at this time that the resulting methodology is the same as the one developed by one of us for the analysis of fragmentation of highly excited liquid drops [8] and for the case of phase transitions in small constrained systems [6](see these references for further details, for a comparison between different fragment recognition algorithms see [lID 4. Case study In order to check the properties of the two approaches above mentioned, we have found it helpful to analyze the following simple undirected graph Fig. 1A. The advantage of dealing with such a small and simple graph is that the calculations can be performed by hand and the properties of the recognition algorithms can be easily understood In Fig. I we show the comparison between the results obtained with the above mentioned algorithms(see figure captions for details) We first analyze what happens when we apply the er approach (1) We search for the links with highest betweenness, in this case there is degeneration and links 110.11, /10.12, 712.13,111.13, stand on an equal footing. We then have to choose one at random(as proposed by b3] and this edge is removed. In our example we choose /12.13 obtaining the graph displayed in Fig. I B 598 A. Medus et al./ Physica A 358(2005)593604 Fig. 1. Development of community structures in terms of the ER and SA analysis. Full arrows denote steps in the er approach. Dotted arrows denote results from Sa methodology. Starting from network A by applying er methodology we first get to network B and, after the second removal of a link, to network C On the other hand, starting from the same initial network the Sa will give network D if we impose the constraint that the final configuration should display two communes. If we do not impose any constraint the result according to Sa will be network e. It is important to notice that network e is unreachable from network C. This is the main drawback of the ER approach (2)We repeat step(1)and we find that the edge with highest betweenness is 110.11 It should be noticed that as a consequence of removing this link the graph breaks up in two pieces Fig. IC. The value of Q is in this case 0=0.409 A. Medus et al./ Physica A 358(2005)593604 599 As we continue in this way we will obtain that the next breaking of the network takes place when removing link /10.12. By removing this link we obtain 3 clusters with a modularity value of @=0.405. Notice that the removal of l1L13 is equivalent to removing /10.12, giving a different graph with the same value of o We now apply the sa approach. (i)If no restriction on the number of partitions is imposed, we obtain the result displayed in Fig. 1E. In this case the original network is broken into 3 subsets with a modularity value of @=0.446, (ii) if, on the other hand, we restrict the number of partitions to two, we obtain the result displayed in Fig. ID, which is the same graph as the one obtained using ER for two subsets(of course the equivalent configuration resulting from the removal of /10.12 and 111. 13can be obtained as well) It is relevant to notice that the best result according to Sa cannot be reached using ER, because in order to get the graph displayed in Fig. lE, from the previous step in the calculation(Fig. 1C), the link 110. 11 must be reconstructed, but this step does not exists in the er methodology From this analysis it is clear that the sa algorithm is able to find a better (as measured by the quantity @)solution to the communality analysis than the er criterion The reason why the er approach fails to reach the best result is because this methodology is local and irreversible. a decision is made at a given stage of the analysis based only on the betweenness of the links with total disregard to the possible value of o at the end of the calculation. Once a link is removed it is not rebuild into the system at any other stage of the calculation. On the other hand when we analyze the sequence of results obtained with Sa when we impose the condition of having 2, 3, 4, .. partitions, we see that in going from two partitions to three partitions the link 110.11 appears again. This is no problem in Sa because we are working with different groupings of the nodes and all the information about the links is conserved at all times It is interesting to note that in a recent paper [12] it has been proposed that all links that share the highest value of betweenness are to be removed. If such a recipe is applied for our test case, there would be no route to a three communities solution. In fact the first structure that appears gives four communities in which two nodes (11, 12)are isolated 5. Examples Once we have gained insight into the properties of the different approaches analyzed in this paper, we have found it appropriate to reanalyze some examples present in the literature and compare the results already published with the ones obtained using our methodology. We will analyze the Zachary's Karate Club network [3] and the relationship network of the characters of the novel Les Miserables by Victor Hugo, as compiled by Knuth [13 and analyzed 600 A. Medus et al./ Physica A 358(2005)593604 5.1. Zachary's karate club network The main reason for the election of this network is that it is a classic social network and it has been analyzed by means of er algorithms in some previous works [3]. This network was constructed by Wayne Zachary [14], who dedicated two years to the observation of social interactions between the members of a karate club The data collected by Zachary made it possible to build the corresponding adjacency matrix that characterize relations between the members of the club In Figs. 2 and 3 we show the best partition obtained, in terms of the modularity 2, for the Zachary's nonweighted network by means of the sa and er algorithms respectively [15]. For the case of the Sa algorithm, we achieve the largest modularity (2=0.42) for a structure of four communities(Fig. 2) On the other hand the best community structure recognized with the er approach [3] corresponds to five communities with a modularity value of Q=0.37(see Fig 3) In the actual case analyzed by zachary a dispute arose between the clubs director and the karate teacher, and as a result the club split in two smaller clubs, one centered around the director and the other around the karate teacher. We performed the analysis of Q for this twocommunity split using both algorithms We started the 24 25 17 13 18 Fig. 2. Community structures for the Zachary network according to Sa approach. In this figure, squares and circles denote the members of the two subsets according to observations by Zachary. broken lines denote the partitions obtained according to SA approach A. Medus et al./ Physica A 358(2005)593604 601 10 24 Y6 16 20 21 Fig 3. Community structures for the Zachary network according to er approach calculation using the network structure previous to the incident. The results are shown in Figs. 2 and 3 through circles and squares. Only the node 10 is misclassified by Sa approach in comparison with the actual twocommunities division observed by Zachary. However, when we run our algorithm for this problem, taking into account that the network can be transformed into a weighted network, with the weight of the links given by the"affinity degree"among the members, (see [14]we obtain the actual twocommunity split shown in Fig. 4. Here we want to emphasize the fact that our Sa algorithm can by applied either to weighted or unweighted networks without modification. This is so, because all the information about weights is contained in the adjacency matrix M and using this information, changes in the value of o due to nodes regrouping, can be straightforwardly calculated(Fig 4) 5.2. Les Miserables network In the following example we analyze the network of interactions between major characters in the novel Les Miserables, by Victor Hugo, using the list of character appearances by scene as compiled by Knuth [13. For this case, a link between two characters(nodes)represents the simultaneous appearance of both characters in one or more chapters Fig. 5 shows the community structure achieved by our Sa algorithm. The best community structure has a modularity of Q=0.546 and corresponds to 5 602 A. Medus et al./ Physica A 358(2005)593604 26 25 30 16 29 23 17 20 15 19 14 12 22 18 Fig. 4. Actual community structures as recorded by Zachary. Once again squares and circles denote the members of each subset communities, two of which are centered on the protagonists Jean Valjean and his persecutor, the police officer Javert; as can be seen in the figure. The other communities are centered on Marius, Fantine and bishop myriel (here we want to note that the original data collected by Knuth, which we use in our calculations, has some mistakes like the inclusion of jondrette as a individual character while Jondrette is only a pseudonymous of Thenardier) In the course of the analysis we find 3 isolated nodes, this mean that they have not links with other nodes, and for this reason they have been excluded from this figure When we run the er algorithm for the same network, we obtain a structure with 11 communities and Q=0.538, smaller than the obtained with our algorithm 6. Conclusions In this paper we have presented an analysis of communality structures in graphs based on a process of global optimization on the cost function @. As stated in Section 1(see 3, 4, 9, 12 the higher the value of o the better the communality所需积分/C币：8 上传时间：20130405 资源大小：543KB

Detection of community structures in networks via global optimization
社会网络中的社区发现已应用到许多领域，已成为学术界研究的热点问题，本文是社区发现的好论文，有机会一定要看看，呵呵！
立即下载