k-core decomposition

The problem of core decomposition is: given a graph G, find the k-core of G for k = 0, 1, ..., kmax, where kmax is the maximum core number of any vertex in G. Equivalently, the problem is to find the k-class of G for k = 0, 1, ..., kmax. In this paper, we propose an external-memory algorithm that finds the k-classes of G when main memory is not large enough to hold the whole graph G. From the k-classes, we can easily obtain any k-core as the induced subgraph of G by the vertex set ∪_{k≤i≤kmax} Ψi.

The following example illustrates the concept of core decomposition.

Example 1: Figure 1 shows a graph G that contains 14 vertices, a, ..., n. The 0-class, Ψ0, of G is an empty set since there is no isolated vertex in G. The 1-class Ψ1 = {a, b, c, j}, the 2-class Ψ2 = {d, e, f, g, m}, and the 3-class Ψ3 = {h, i, k, l, n}. In this example, we have kmax = 3. The 0-core is the induced subgraph of G by the vertex set ∪_{0≤i≤3} Ψi. Since Ψ0 = ∅ in this example, the 0-core is the same as the 1-core, which is simply the entire graph. The 2-core is the induced subgraph by (Ψ2 ∪ Ψ3) and the 3-core is the induced subgraph by Ψ3. One can easily verify that in the k-core, every vertex is connected to at least k other vertices, where 0 ≤ k ≤ 3.

Fig. 1. A Graph and Its k-classes

III. IN-MEMORY CORE DECOMPOSITION

In this section, we describe the existing in-memory algorithm for core decomposition and explain why it is not suitable to be extended to an external-memory algorithm. We also discuss the advantage of the top-down approach over the bottom-up approach for core decomposition in large graphs.

Algorithm 1 describes the skeleton of the existing in-memory algorithm for core decomposition [20].

Algorithm 1 BottomUp
Input: G = (V, E)
Output: The k-class, Ψk, of G for all k
1. order the vertices in G in ascending order of their degree;
2. while (G is not empty)
3.   let d be the minimum vertex degree in G;
4.   Ψd ← ∅;
5.   while (there exists a vertex v with degree of at most d)
6.     Ψd ← Ψd ∪ {v};
7.     remove v and all edges incident to v from G;
8.   re-order the remaining vertices in ascending order of their degree;
9. output all Ψk;

The following example illustrates the bottom-up computation.

Example 2: Suppose that we compute the k-classes of the graph in Figure 1 by Algorithm 1. Line 1 sorts the vertices as follows: {a(1), c(1), j(1), f(2), m(2), b(3), d(3), g(3), e(4), k(4), n(4), h(5), l(5), i(6)}, where the number in parentheses indicates the degree of the vertex in the current G. Then, in the first iteration (Lines 3-8), which computes the 1-class, the vertices a, c and j are first removed from G together with their incident edges, since their degree is 1. After that, the vertices are re-ordered as {b(1), f(2), m(2), d(3), g(3), e(4), k(4), n(4), h(5), l(5), i(5)}. Then vertex b is also removed, because after removing a and c, the degree of b becomes 1. At this point, the ordered vertex list becomes {f(2), m(2), d(2), g(3), e(4), k(4), n(4), h(5), l(5), i(5)}. The first iteration completes and obtains Ψ1 = {a, c, j, b}. In the second iteration, which computes the 2-class, the vertices removed are (in this order): f, m, d, g, and e. In the third iteration, which computes the 3-class, the vertices removed are (in this order): n, k, h, l and i.
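The peeling process of Algorithm 1 can be sketched in a few lines of code. The function name and the tiny test graph below (a triangle {1, 2, 3} with a pendant vertex 4) are illustrative only, not from the paper, and the sketch omits the bin-sort that gives the in-memory algorithm its O(m + n) bound:

```python
from collections import defaultdict

def bottom_up_classes(adj):
    """Skeleton of Algorithm 1 (BottomUp): repeatedly peel minimum-degree
    vertices; returns {k: k-class}, the vertices with core number k."""
    deg = {v: len(ns) for v, ns in adj.items()}
    alive = set(adj)
    classes = defaultdict(set)
    while alive:
        d = min(deg[v] for v in alive)            # Line 3: minimum degree in G
        stack = [v for v in alive if deg[v] <= d]
        while stack:                              # Lines 5-7: cascade at level d
            v = stack.pop()
            if v not in alive:
                continue
            alive.remove(v)
            classes[d].add(v)
            for u in adj[v]:
                if u in alive:
                    deg[u] -= 1
                    if deg[u] <= d:
                        stack.append(u)
    return dict(classes)
```

On the triangle-plus-pendant graph this returns {1: {4}, 2: {1, 2, 3}}: vertex 4 is peeled at level 1, and the triangle survives until level 2.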
Algorithm 1 is a bottom-up approach: it starts the computation of the k-classes from smaller values of k and moves to larger values of k. The algorithm first sorts the vertices in ascending order of their degree (among the vertices with the same degree, the ordering can be arbitrary). Then, it starts to compute the d-class, Ψd, where d is the minimum vertex degree in G. It removes from G all vertices with degree of d, together with all the edges incident to them, and puts these vertices into Ψd. After removing these vertices and their edges, the degree of some vertices that were previously connected to the removed vertices decreases. If any vertices remaining in G now have degree of d or less, they cannot be in a k-class where k > d and thus must be in Ψd. Therefore, they are added to Ψd and removed from G. This process continues until all vertices remaining in G have degree greater than d. Then, the algorithm moves on to the next iteration to compute the next k-class, where k > d. The algorithm terminates when all vertices and their edges are removed from G.

When main memory is sufficient to hold the entire input graph, the most costly step of the in-memory algorithm is sorting the vertices according to their degree at each iteration (Line 8). Batagelj and Zaversnik [20] use bin-sort to order the vertices to achieve O(m + n) time complexity.

When the graph is too large to be placed in main memory, the bottom-up approach fails because it requires memory space of Ω(m + n). The bottom-up approach is not suitable for designing an efficient external-memory algorithm, since it starts core decomposition over the entire graph to compute the k-class with the smallest k. Although sorting can be done in O((n/B) log_{M/B}(n/B)) I/Os (assuming that we store separately the vertices and their degree on consecutive disk blocks), it is non-trivial to address the problem of random accesses to vertices and edges on disk during the computation.

In this paper, we propose a top-down approach, which first computes the k-classes for larger k and then moves down to smaller values of k. The top-down approach is more suitable for designing an external-memory algorithm. Another advantage of the top-down approach over the bottom-up approach is that most applications and algorithm designs using k-cores only need the k-cores with larger values of k, and sometimes only the kmax-core (e.g., for defining the nucleus of a communication network [4], for predicting protein functions [8], for approximating the densest subgraph [16], etc.). The top-down approach can obtain the k-cores with larger k without computing those with smaller k. This can result in huge savings of computational resources for handling massive networks, because the smaller the value of k, the larger is the graph from which the k-core is computed.

IV. OVERALL FRAMEWORK

In this section, we first give the overall framework of our external-memory algorithm, called EMcore, for core decomposition, and then discuss the details of each part in the next section.

1) Graph partitioning: The first step of EMcore is to partition the vertex set V of G, so that core decomposition by EMcore in later steps can be processed by loading only the corresponding subgraphs of the relevant subsets of vertices (instead of the whole graph) in main memory. We describe an efficient partitioning strategy that uses only one scan of the graph.

2) Estimation of the upper bound on the core number: We develop heuristics to estimate the upper bound on the core number of the vertices in each subset in the partition. The upper bound is refined progressively at each step of core decomposition.

3) Recursive top-down core decomposition based on the upper-bound core number: The main step of EMcore is a recursive procedure of core decomposition based on the upper bound on the core number of the vertices. EMcore recursively computes the k-classes on a set of vertices whose core number falls within a range such that the corresponding relevant subgraph can be kept in main memory. Our algorithm is top-down, meaning that we compute the k-classes with ranges of larger core numbers before those of smaller core numbers. At the end of each recursive step, we remove from G the k-classes already computed, as well as their incident edges, to reduce the search space and the disk I/O cost for the next recursion of core decomposition.

V. EXTERNAL-MEMORY CORE DECOMPOSITION

In this section, we discuss in detail the three parts in the overall framework, followed by a detailed analysis of the algorithm.

Algorithm 2 PartitionGraph
Input: G = (V, E), memory size M, disk block size B
Output: A partition of disjoint vertex sets, 𝒰 = {U1, ..., Ul}
1. create an empty partition 𝒰;
2. 𝒰mem ← ∅;  // the part of 𝒰 that is currently in memory
3. for each vertex v ∈ V do
4.   if (v is not connected to a vertex in any vertex set in 𝒰mem)
5.     create a new set U = {v} and add U to 𝒰mem;
6.   else
7.     find the vertex set U ∈ 𝒰mem to which v has the most connections;
8.     add v to U;
9.     if (Σ_{u∈U} d(u) ≥ B)  // memory used for U reaches B
10.      WriteBlock(U);
11.  if (Σ_{U∈𝒰mem} Σ_{u∈U} d(u) ≥ M)
12.    let Umax be the vertex set in 𝒰mem s.t. ∀U′ ∈ 𝒰mem, Σ_{u∈Umax} d(u) ≥ Σ_{u∈U′} d(u);
13.    WriteBlock(Umax);

Procedure 3 WriteBlock(U)
1. let H be the subgraph of G that consists of all edges connected to the vertices in U;
2. estimate ψ̄(v) for each vertex v in H;
3. write H to the disk and remove U from 𝒰mem;
A. Graph Partitioning

The purpose of graph partitioning is to decompose the original large graph into a set of smaller subgraphs, so that core decomposition can be processed by loading only those relevant subgraphs into main memory. There are many existing graph partitioning algorithms [22], [23], but they are in-memory algorithms. For core decomposition in massive networks that cannot fit into main memory, we need an efficient algorithm that partitions a large graph with limited memory consumption. To this end, we devise a simple graph partitioning algorithm that requires only one scan of the original graph and has linear CPU time complexity.

The outline of the partitioning algorithm is given in Algorithm 2. The algorithm reads the input graph G from external memory once and partitions V into a set of disjoint vertex sets {U1, ..., Ul}, where ∪_{1≤i≤l} Ui = V and Ui ∩ Uj = ∅ for all i ≠ j.

The algorithm processes each vertex v as follows. There are two cases. The first case (Lines 4-5 of Algorithm 2) is that v is not connected to any vertex in any vertex set in 𝒰mem (the part of 𝒰 that is currently in main memory). In this case, we simply create a new vertex set U with the single vertex v and add U to 𝒰mem.

The second case (Lines 6-10 of Algorithm 2) is that v is connected to some vertex(es) in some existing vertex set(s) in 𝒰mem. The algorithm finds the vertex set U ∈ 𝒰mem with which v has the most connections; that is, ∀U′ ∈ 𝒰mem, |nb(v) ∩ U| ≥ |nb(v) ∩ U′|, where nb(v) denotes the set of neighbors of v. This vertex set U can be found in O(d(v)) expected time using a hash table. Note that we do not keep all vertices of G in the hash table, but only those in a vertex set in 𝒰mem. After finding U, we add v to U. If the memory used for U is as large as the block size B, we write U to disk by Procedure 3.

After processing each vertex v, we also check (Lines 11-13 of Algorithm 2) whether the memory used by the current partition reaches the available memory space assigned. When this is the case, we reclaim the memory by writing the vertex set currently with the largest memory usage to disk.
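The one-scan greedy strategy of Algorithm 2 can be sketched as follows. Here `adj_stream`, `block_budget` and `mem_budget` are hypothetical stand-ins for the graph scan, the block size B and the memory size M, and the flush step is a simplified stand-in for WriteBlock (Procedure 3) that does not build the subgraph H or estimate ψ̄:

```python
def partition_vertices(adj_stream, block_budget, mem_budget):
    """One-scan greedy partitioning in the spirit of Algorithm 2 (a sketch,
    not the paper's implementation). adj_stream yields (v, neighbors)."""
    sets, load = {}, {}          # set id -> members / accumulated degree
    where = {}                   # vertex -> id of the in-memory set holding it
    flushed, next_id = [], 0

    def flush(sid):              # simplified stand-in for WriteBlock
        flushed.append(sets.pop(sid))
        del load[sid]
        for u in flushed[-1]:
            del where[u]

    for v, nbrs in adj_stream:   # one scan of the graph
        counts = {}
        for u in nbrs:           # count connections to each in-memory set
            if u in where:
                counts[where[u]] = counts.get(where[u], 0) + 1
        if counts:               # Line 7: the set with the most connections
            sid = max(counts, key=counts.get)
        else:                    # Line 5: no in-memory neighbor, new set
            sid = next_id; next_id += 1
            sets[sid], load[sid] = set(), 0
        sets[sid].add(v); where[v] = sid; load[sid] += len(nbrs)
        if load[sid] >= block_budget:            # Lines 9-10: block is full
            flush(sid)
        if sum(load.values()) >= mem_budget:     # Lines 11-13: evict largest
            flush(max(load, key=load.get))
    flushed.extend(sets.values())                # remaining in-memory sets
    return flushed
```

Counting connections via the `where` dictionary mirrors the hash-table lookup that finds U in O(d(v)) expected time: only vertices still held in memory are ever probed.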
The writing of a vertex set U to the disk, as shown in Procedure 3, proceeds as follows. When we read the vertices in U from the disk (during the scanning of G), we read their edges as well. Let H be the corresponding subgraph; that is, H = (VH, EH), where VH = U ∪ {v : u ∈ U, (u, v) ∈ E} and EH = {(u, v) : u ∈ U, (u, v) ∈ E}. We write H to disk and, at the same time, release the memory occupied by H (and also U). We delay the discussion of the estimation of the upper-bound core number, ψ̄(v), of each v in H to Section V-B.

Before we move on to present our main algorithm for core decomposition, we ask whether we can devise a divide-and-conquer solution that computes core decomposition on each subgraph obtained by the partition, and then merges the results to find the k-cores in the original graph. We note that the divide-and-conquer method does not work, because at each conquer phase this approach still needs to perform search in the merged problem space to determine the core number of each vertex, and hence it demands as much memory as does an in-memory algorithm.

B. Upper Bound on Core Number

Our top-down core decomposition algorithm requires the selection of a relevant set of vertices based on the upper bound of their core number. In this subsection, we develop heuristics to estimate this upper bound.

We use ψ̄(v) to denote the upper bound on ψ(v), the core number of a vertex v. The following lemma gives a coarse estimation of ψ̄(v).

LEMMA 1: ψ̄(v) = d(v).
Proof: It is trivial to show that ψ̄(v) = d(v) is an upper bound, because ψ(v) ≤ d(v).

Lemma 1 serves as an initialization of ψ̄(v) for each v. After the initialization, we can further refine ψ̄(v). The basic idea is to refine ψ̄(v) based on the ψ̄ values of v's neighbors. Intuitively, v's neighbors that have ψ̄ values lower than ψ̄(v) definitely have a true core number lower than ψ̄(v). These neighbors cannot contribute to the degree of v in the ψ̄(v)-core, and thus if v has many such neighbors, ψ̄(v) can be further tightened.

We first define a notation, Ψmax(X), for a non-empty vertex set X to denote the maximum ψ̄ value among all the vertices in X. That is, Ψmax(X) = max{ψ̄(u) : u ∈ X}, where X ≠ ∅. The following lemma describes how to refine ψ̄(v).

LEMMA 2: Given a vertex v ∈ V, where the current ψ̄(v) > 0, let Z contain all neighbors of v with ψ̄ values lower than ψ̄(v), that is, Z = {u : u ∈ nb(v), ψ̄(u) < ψ̄(v)}. If (d(v) − |Z|) < ψ̄(v), then ψ̄(v) can be refined as max{d(v) − |Z|, Ψmax(Z)}.

Proof: Based on the definition of Ψmax(Z), we know that all vertices in Z can only exist in the k-cores for k ≤ Ψmax(Z), which means that the edges connecting v and its neighbors in Z cannot exist in the (Ψmax(Z)+1)-core. Therefore, if v exists in the (Ψmax(Z)+1)-core, the degree of v is at most (d(v) − |Z|). If (d(v) − |Z|) > Ψmax(Z), we have ψ̄(v) ≤ (d(v) − |Z|), because it is possible that in the (d(v) − |Z|)-core, v has a degree of (d(v) − |Z|). Otherwise, we have (d(v) − |Z|) ≤ Ψmax(Z), which means that v cannot exist in the (Ψmax(Z)+1)-core, and thus we have ψ̄(v) ≤ Ψmax(Z). Therefore, ψ̄(v) ≤ max{d(v) − |Z|, Ψmax(Z)}. Finally, for the current value of ψ̄(v), we have (d(v) − |Z|) < ψ̄(v) (given) and Ψmax(Z) < ψ̄(v) (by the definition of Z). Therefore, max{d(v) − |Z|, Ψmax(Z)} is a tighter value of ψ̄(v), which completes the proof.

Lemma 2 can be applied iteratively to replace the current value of ψ̄(v) whenever a tighter value is possible. However, simply applying Lemma 2 might not be good enough, since it is likely that Ψmax(Z) ≥ (d(v) − |Z|), especially if Z is not small. As a result, we often end up choosing Ψmax(Z) to refine ψ̄(v), which weakens the effectiveness of Lemma 2 in obtaining a good estimation of ψ̄(v). To get a better estimation of ψ̄(v), we apply the refinement based on the following corollary of Lemma 2.

COROLLARY 1: Given a vertex v ∈ V, where the current ψ̄(v) > 0, let Z = {u : u ∈ nb(v), ψ̄(u) < ψ̄(v)}. For any Z′ ⊆ Z, Z′ ≠ ∅, if (d(v) − |Z′|) < ψ̄(v), then ψ̄(v) can be refined as max{d(v) − |Z′|, Ψmax(Z′)}.
Proof: Similar to the proof of Lemma 2.

Based on Corollary 1, we can select the subset Z′ of Z that attains the tightest estimation of ψ̄(v), rather than restricting ourselves to the whole set Z for the estimation.

Procedure 4 describes our algorithm to estimate ψ̄(v). Lines 1-2 first initialize ψ̄(v) to be d(v) by Lemma 1. Then, a refinement is applied to obtain a tighter bound on the core number of each vertex in V (Lines 3-7), as explained in Corollary 1.

Procedure 4 EstimateUB
1. for each v ∈ V do
2.   initialize ψ̄(v) as d(v);
3. for each v ∈ V do
4.   let Z = {u : u ∈ nb(v), ψ̄(u) < ψ̄(v)};
5.   if (d(v) − |Z| < ψ̄(v))
6.     let f(X) = max{d(v) − |X|, Ψmax(X)};
7.     ψ̄(v) ← min{f(Z′) : Z′ ⊆ Z, Z′ ≠ ∅};
8. repeat Steps 3-7;

Note that Line 5 of Procedure 4 uses the condition "d(v) − |Z| < ψ̄(v)", while Corollary 1 uses "d(v) − |Z′| < ψ̄(v)". An incorrect value of ψ̄(v) might be assigned if Lines 6-7 of Procedure 4 chose some Z′ with (d(v) − |Z′|) ≥ ψ̄(v). However, such a Z′ cannot be selected to refine ψ̄(v), since Line 7 selects the subset of Z with the lowest f value, and Z itself, with (d(v) − |Z|) < ψ̄(v), is obviously a better refinement. Therefore, Lines 5-7 of Procedure 4 refine ψ̄(v) as stated in Corollary 1, and the correctness of the refinement of ψ̄(v) is ensured.

In each refinement pass described in Lines 3-7, we always refine the vertices with smaller ψ̄ values before those with higher ones. This strategy allows the refinement to tighten the bounds more quickly, since only the neighbors with low ψ̄ values are useful in tightening the bounds (by Corollary 1). The smaller ψ̄ values refined first are thus particularly effective in tightening the ψ̄ values of other vertices in the remaining refinement process.

To tighten the value of ψ̄(v), we do not need to compute f(Z′) for all subsets of Z in Line 7 of Procedure 4. The definition of f(Z′) in Line 6 of Procedure 4 actually implies a dilemma: the larger the size of Z′, the smaller is the value of (d(v) − |Z′|), but the larger may be the value of Ψmax(Z′), while a smaller Z′ implies the opposite. Moreover, for subsets of the same size, clearly those vertices in Z that have a smaller ψ̄ value help more in obtaining a smaller Ψmax(Z′) than the other vertices. Therefore, we can sort the vertices in Z in ascending order of their ψ̄ values. Then, we select Z′ of size from 1 to |Z|, where the size-i Z′ simply contains the first i vertices in the ordered Z, until we find that f(Z′) reaches its minimum. In this way, we process at most |Z| subsets of Z instead of 2^|Z| subsets.

Example 3: Assume that the vertex set of the graph in Figure 1 is partitioned into three sets: U1 = {a, b, c, d, e, f}, U2 = {g, h, i, j}, and U3 = {k, l, m, n}. Figure 2 shows the three corresponding subgraphs. The first number next to each vertex v is the initial ψ̄(v) = d(v), while x→y indicates that the ψ̄ value is tightened from x to y by Procedure 4.

Fig. 2. Vertex Set Partition and Their Subgraphs: (a) U1 = {a, b, c, d, e, f}, (b) U2 = {g, h, i, j}, (c) U3 = {k, l, m, n}

Let us consider vertex b in Figure 2(a). We have Z = {a, c}, since ψ̄(a) = ψ̄(c) = 1 < ψ̄(b) = 3 at this moment. It is easy to see that min{f(Z′) : Z′ ⊆ Z, Z′ ≠ ∅} = 1 in this case, and thus we obtain ψ̄(b) = 1. Similarly, for vertex d, we compute Z = {b}, since now ψ̄(b) = 1 < ψ̄(d) = 3. Thus, we obtain ψ̄(d) = 2. Note that we cannot use g to compute ψ̄(d), since we do not have the degree or ψ̄ information of g at the current stage; that information becomes available at a later stage of core decomposition.
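The refinement of Lemma 1, Corollary 1 and Procedure 4, together with the sorted-prefix trick just described (at most |Z| candidate subsets instead of 2^|Z|), can be sketched as follows. This is a simplified in-memory rendering, not the paper's implementation; `rounds` plays the role of the iteration limit placed on Line 8:

```python
def estimate_upper_bounds(adj, rounds=2):
    """Heuristic upper bounds on core numbers (in the spirit of Lemma 1,
    Corollary 1 and Procedure 4): start from psi(v) = d(v), then shrink
    psi(v) using prefixes of v's lower-psi neighbours sorted ascending."""
    psi = {v: len(ns) for v, ns in adj.items()}    # Lemma 1: psi = degree
    for _ in range(rounds):                        # Line 8: repeat refinement
        for v in sorted(psi, key=psi.get):         # smaller psi values first
            d = len(adj[v])
            Z = sorted((u for u in adj[v] if psi[u] < psi[v]), key=psi.get)
            best = psi[v]
            for i, u in enumerate(Z, start=1):
                # prefix of size i: f = max(d - i, Psi_max(prefix)),
                # and Psi_max of the sorted prefix is just psi of its last vertex
                best = min(best, max(d - i, psi[u]))
            psi[v] = best
    return psi
```

Candidates violating the Corollary 1 condition (d(v) − |Z′| ≥ ψ̄(v)) yield f ≥ ψ̄(v) and are discarded by the `min`, so the returned values remain valid upper bounds. On a 3-vertex path, the middle vertex's bound drops from its degree 2 to the true core number 1.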
Moreover, for subsets 2)to partition G. Then, it invokes Procedure 6, EmCoreStep to compute the kclasses with k in the range [kl, kul,where of the same size, clearly those vertices in Z that have a smaller kl s hiu, recursively from higher value ranges to lower value a, help more in obtaining a smaller y,mar (z)than other ones vertices. Therefore, we can sort the vertices in Z in ascending Zl, where the sizei Z/simply contains the first i vertices in range [kil, hiu as follows. We first detine ar etermine each order of their y values. Then, we select Z of size from 1 to In order to bound the memory usage, we determine each the ordered Z, until we find that f(z)reaches its minimum V, kl < w(u)s hia: that is, yk is the set of vertices whose In this way, we process at most Z number of subsets of z y values fall within the range kl, k At each recursive step, instead of 2 Z subsets we construct a subgraph that is relevant for computing the We apply Procedure 4 to estimate l(u) for each v in H in real core number of the vertices in y k in main memory. Let Line 2 of Procedure 3. Since H is only a small subgraph of G, b=r. Then, b is the maximum number of blocks that can be the refinement process(lines 37 of Procedure 4)converges held in main memory. We use b and a given hu to determine quickly. To handle the worstcase scenario, in our algorithm, kl as follows we set a limit on the number of iterations for the refinement Let K be a set of k values such that the vertices in y k process. We bound the cost of the refinement by the cost of are scattered in at most b vertex sets in l. That is a disk i/o. That is, we terminate the refinement process once the accumulated time for refinement reaches the time for a disk I/O. Therefore. the total cost of our algorithm is always K={k:1<k≤k,{U:U∈2,U∩亚k“≠0≤ bounded by the number of disk 1/Os We then define In Procedure 3, the subgraph h is associated with a vertex set U in partition l. 
However, there may be some vertices in min{:k∈K},ifK≠0, I that are not inu but are connected to some vertices in l k (1) otherwise Since we may not have the degree of these vertices to initialize their v values, we consider their a/ values to be oo when we We use an adjacency list to store nb(u)of a vertex v In each subgraph, we refine v(v)for v E U. The following example illustrates how Thus an edge between two vertices in the same U: is counted twice and the we refine y by Procedure 4. three subgraphs have 14, 15, and 15 edges, respectively 56 Algorithm 5 EMcore Procedure 7 Computecore(gw, kl, ku) Input: G=(V, E), memory size M, disk block size B 1. initialize (u) to be1 for each vE w: Output The kclass. yk, of g for all k k←k 3. while (h hu) 1. invoke Partitiongraph 4. while (vE W,(d(u, Gw)+ deposit(u, hv))<k) compute h by setting hu oo and b emove v and all edges incident to v from gw 3. invoke Em Core Step(kl, kiu) 6. update w(u) to be k for each remaining vertex v in Gw; 7 k←k+1; 8. for each k, where k<k< ku, do ∈W,()=k} Procedure 6 Em Core Step(hl, hin) l0. output业k 1.W<亚 2. let H be the set of subgraphs of the vertex sets in chat conlain any vertex in w 3. read H into main memory to construct shown in Procedure 7, to compute the keclasses y k from Gw the induced subgraph Gw by n where ki <ksku. However, an immediate question raised 4. invoke Compute Core(g w, key, hu) here is: since Gw is the induced subgraph, is the core number 5. refine v(u) for any vertex v currently in main memory; 6. deposit any vertex connected to a vertex in业k(kt≤k≤ka) computed from Gw correct? We will prove the correctness of 7. remove all vertices in亚k(kt≤k≤hu) and their edges our algorithm in Theorem 1. 
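Equation (1) can also be evaluated directly when the ψ̄ values and the partition fit in memory. The following sketch is a hypothetical helper (the paper instead queries a heap of per-set maxima built during partitioning); it scans k downward from ku and stops as soon as the touched vertex sets exceed b:

```python
def choose_kl(ku, b, psi_bar, partition):
    """Range selection in the spirit of Equation (1): the smallest kl such
    that vertices with kl <= psi_bar(v) <= ku touch at most b vertex sets
    of the partition; falls back to ku when even that range is too large."""
    kl = ku
    for k in range(ku, 0, -1):
        touched = sum(1 for U in partition
                      if any(k <= psi_bar[v] <= ku for v in U))
        if touched <= b:
            kl = k          # range [k, ku] still fits within b blocks
        else:
            break           # touched sets only grow as k decreases
    return kl
```

The early exit is valid because lowering k can only enlarge Ψ̄[k,ku], so the number of touched sets is non-increasing in k.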
One key concept used to ensure the correctness of our algorithm is the deposit assigned to each vertex (Line 6 of Procedure 6). We define the deposit of a vertex as follows.

Definition 2 (DEPOSIT): Given a vertex v ∈ V and an integer ku, where ku ≥ ψ̄(v), the deposit of v with respect to ku is defined as deposit(v, ku) = |{u : u ∈ nb(v), ψ(u) > ku}|.

The concept of deposit is used to ensure that the number of edge connections to a vertex v in the k-core, where kl ≤ k ≤ ku, can be correctly counted even if the vertices and edges that belong to the k′-cores are removed, where k′ > ku. After we compute the k-classes, where k ≥ kl, we deposit a "coin" to each neighbor of a vertex in these k-classes if the neighbor's core number has not been determined (i.e., the neighbor's ψ̄ is less than kl and its core number will be computed in a later recursive step). This "coin" accounts for an edge connection to this neighbor vertex in the computation of its core number.

Using the deposits, we can safely remove all the vertices whose core number is computed in the current recursive step, together with their edges (Line 7), without worrying about the correctness of the core decomposition in the later recursive steps. In this way, we can save considerable disk I/O costs for reading and writing these large-core-number vertices and their edges again and again while computing the k-classes with a smaller core number.

Intuitively, kl = min{k : k ∈ K} means that we load as many parts of the graph as possible into main memory (bounded by the memory size M) for core decomposition at each recursive step. However, in the rare case when K = ∅ (i.e., the vertices in Ψ̄[ku,ku] are in more than b vertex sets in 𝒰), we simply assign kl = ku and load b vertex sets in {U : U ∈ 𝒰, U ∩ Ψ̄[kl,ku] ≠ ∅} at a time into main memory to construct the subgraph for core decomposition.

To obtain kl, we keep a heap in main memory that maintains the highest ψ̄ value in each vertex set U ∈ 𝒰. The highest ψ̄ among the vertices in each U ∈ 𝒰 is inserted into the heap before U is written to disk during graph partitioning. Given b and ku, we can easily determine the corresponding value of kl by querying the heap.

We now discuss how EMcoreStep computes the k-classes, where kl ≤ k ≤ ku, as outlined in Procedure 6. Let W = Ψ̄[kl,ku]. In other words, W is the set of candidate vertices whose exact core number is to be computed at this recursive step. The first step is to construct the induced subgraph GW of G by W. We construct GW from the set of subgraphs of the vertex sets in 𝒰 that contain a vertex in W, since all vertices in W and their interconnection edges are contained in these subgraphs.

Procedure 7 computes the correct core number of the vertices in the induced subgraph GW as follows. The algorithm iteratively deletes the lowest-degree vertex and its edges, as in an in-memory algorithm. However, in selecting the lowest-degree vertex, the algorithm considers the sum of the local degree in GW and the deposit of the vertex. We remark that the initial value of the deposit of all vertices with respect to ku = ∞ is set to 0. We prove the correctness of Procedure 7 in Lemma 3.
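A minimal in-memory sketch of Procedure 7: the peeling condition compares k against the local degree plus the deposit, so edges to already-removed higher-core vertices still count. A finite ku is assumed here (the first recursive step of the paper uses ku = ∞ with all deposits 0), and ψ(v) = −1 marks vertices whose core number falls below kl and is settled in a later recursive step:

```python
def compute_core(gw, deposit, kl, ku):
    """Sketch of Procedure 7: peel GW round by round, ranking a vertex by
    its local degree PLUS its deposit (edges to removed vertices whose
    core number exceeds ku). Missing deposits default to 0."""
    deg = {v: len(ns) for v, ns in gw.items()}
    alive = set(gw)
    psi = {v: -1 for v in gw}                 # -1: core number below kl
    for k in range(kl, ku + 1):
        removed = [v for v in alive if deg[v] + deposit.get(v, 0) < k]
        while removed:                        # Lines 4-5: cascade removals
            v = removed.pop()
            if v not in alive:
                continue
            alive.remove(v)
            for u in gw[v]:
                if u in alive:
                    deg[u] -= 1
                    if deg[u] + deposit.get(u, 0) < k:
                        removed.append(u)
        for v in alive:
            psi[v] = k                        # Line 6: survived round k
    return psi
```

For example, a single vertex with no local edges but deposit 3 survives every round up to k = 3 and is assigned core number 3, exactly as if its three removed higher-core neighbors were still present.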
During the construction of GW, we can discard those vertices that are not in W, together with their edges. After GW is constructed, we invoke ComputeCore (Line 4 of Procedure 6) to compute the k-classes. After we compute Ψk, where kl ≤ k ≤ ku, we return to Lines 5-6 of Procedure 6: we need to refine the upper bound on the core number of the remaining vertices, as well as update their deposits.

We first refine ψ̄(v) for any vertex v that is currently in main memory but does not have its core number computed. The refinement of ψ̄(v) involves two steps. (1) We first set ψ̄(v) to be (kl − 1) if currently ψ̄(v) ≥ kl. This is because if ψ̄(v) ≥ kl, then v must have been processed in the current recursive step. However, since v's core number is not determined, v is not in any Ψk, where kl ≤ k ≤ ku, which implies that ψ(v) ≤ (kl − 1). (2) We then refine all ψ̄ values by Procedure 4, since we now have the exact core number of more vertices, which can be used to tighten the ψ̄ values of the remaining vertices.

After the refinement, we have ψ̄(v) < kl ≤ ψ̄(u) for every v whose core number has not been determined and every u in Ψk, where k ≥ kl. Then, Line 6 updates the deposit of a vertex v with respect to k′u = (kl − 1). The deposit(v, kl − 1) can be calculated as (deposit(v, ku) + |{u : u ∈ nb(v), u ∈ ∪_{kl≤k≤ku} Ψk}|), where the first part is the number of edge connections to v that were removed at previous recursive steps, while the second part is the number of edge connections to v that are to be removed at the current recursive step. Therefore, deposit(v, kl − 1) ensures the correct computation of ψ(v) at later recursive steps.

After the update of the deposits, we can safely remove all vertices in Ψk, where kl ≤ k ≤ ku, and their edges from the subgraphs in H (Line 7 of Procedure 6). Then, we merge any of these subgraphs if their combined size is less than B, so as to further reduce the disk I/O cost. Furthermore, we keep those subgraphs that will be used in the next recursive step in main memory and only write the rest to disk, or keep all the subgraphs in main memory if they are now small enough to be all kept in main memory (Line 10 of Procedure 6).

The recursion then goes on to process the next range of k (Lines 8-11 of Procedure 6), until kl = 2. Note that the upper bound of the range for the next recursive step now changes to (kl − 1). This is because, after the refinement in Line 5 of the current recursive step, we have ψ̄(v) ≤ (kl − 1) for any vertex v whose core number has not been determined. When kl = 2, the remaining vertices must all be in the 1-class, Ψ1 (Lines 12-13).

There is still a small problem: are there any vertices in the 0-class? The case of k = 0, i.e., the 0-class, means that no vertex in this class is connected to any other vertex. Thus, the 0-class contains only the isolated vertices in the graph, which can be obtained trivially by one scan of the graph. To do this, we add a check in Algorithm 2: as we scan the input graph, if a vertex has a degree of 0, we output it as a 0-class vertex.

The following example explains how the top-down core decomposition is processed.

Example 4: Let us consider the three subgraphs given in Figure 2 and suppose b = 2. Then, we obtain kl = 3 and W = {g, h, i, k, l, n}. We load the subgraphs in Figures 2(b) and 2(c) into main memory to construct the induced subgraph GW by W, which is shown in Figure 3(a), where the number next to each vertex v is the current ψ̄(v). It is then easy to obtain the 3-class, {h, i, k, l, n}, by Procedure 7.

Fig. 3. Induced Subgraph GW: (a) W = {g, h, i, k, l, n}, (b) W = {d, e, f, g, m}

After computing the 3-class, we can refine ψ̄(g) from 3 to 2. We also update the deposit of any vertex that is connected to a vertex in the 3-class, for k′u = (kl − 1) = 2, as shown in Table II.

TABLE II. deposit(v, 2) for each remaining vertex v connected to the 3-class

Now we can remove all the edges connected to any vertex in {h, i, k, l, n}. After removing these edges, we only have 1 edge, (d, g), and 3 isolated vertices, e, f and m, in Figure 2(b), and only 2 isolated vertices, g and m, in Figure 2(c). Next, we compute for k′u = 2, and we load the subgraph in Figure 2(a) (and also that in Figure 2(c), which is still in main memory) to construct the induced subgraph by W = {d, e, f, g, m}, as shown in Figure 3(b). By using the deposits, it is easy to obtain the 2-class, {d, e, f, g, m}, by Procedure 7. After computing the 2-class, all the remaining vertices, {a, b, c, j}, belong to the 1-class.

D. Analysis of Algorithm

In this subsection, we provide the theoretical analysis regarding the correctness as well as the complexity of our algorithm EMcore.

LEMMA 3: Given W = Ψ̄[kl,ku] and GW, for any v ∈ W, if ψ(v) ≥ kl, then ComputeCore (i.e., Procedure 7) correctly computes ψ(v).
Proof: If ψ(v) ≥ kl, then to compute ψ(v) we only need the kl-core of G, denoted by C_kl. In GW, the largest core number that a vertex can have is ku. Thus, in computing ψ(v) using GW instead of C_kl, we are missing the edge set {(u, v) : v ∈ W, ψ(u) > ku, (u, v) ∈ E}. Since our algorithm follows a top-down approach, for every v ∈ W, if there exists u ∈ nb(v) with ψ̄(u) > ku, then ψ(u) must have already been computed in some previous recursive step and deposited to deposit(v, ku). Thus, we have (d(v, GW) + deposit(v, ku)) = d(v, C_kl) for all v ∈ W. Since ComputeCore removes a vertex v only if (d(v, GW) + deposit(v, ku)) < k, or equivalently, d(v, C_kl) < k, it correctly computes ψ(v) for all v ∈ W with ψ(v) ≥ kl.

Using Lemma 3, we prove the correctness of our main algorithm, EMcore.

THEOREM 1: Given a graph G, EMcore (i.e., Algorithm 5) correctly computes the k-class of G for all k, where 0 ≤ k ≤ kmax and kmax is the maximum core number of any vertex in G.

Proof: Procedure 6 recursively computes the k-classes with k ∈ [kl, ku] each time, from ku = ∞ down to kl = 2. Let [kl, ku] and [k′l, k′u] be the ranges of k of two successive recursive steps; Procedure 6 assigns k′u = kl − 1. Since Lemma 3 proves the correctness of each recursion, the recursive procedure correctly computes all the k-classes for k ≥ 2. After Procedure 6 computes the 2-class, the remaining vertices must belong to the 1-class, while the 0-class is already outputted by a simple check on the degree of each vertex being read as we scan G in Algorithm 2, by the fact that a vertex v is in the 0-class iff d(v) = 0.

We now analyze the complexity of EMcore. For external-memory algorithms, the complexity analysis is mainly based on the I/O model, which focuses on the importance of disk I/Os over CPU computation. We remark that our algorithm is not CPU intensive: it takes O(kmax(m + n)) CPU time, since graph partitioning takes linear time and each recursion of k-class computation takes at most O(m + n) time. Note that the refinement of the ψ̄ values follows a strategy similar to that of the in-memory core decomposition algorithm, and thus its time complexity is also bounded by that of the in-memory algorithm, i.e., O(m + n) time. The amortized CPU time should be close to O(m + n), since each recursion operates only on an induced subgraph relevant for the range [kl, ku], which is verified by our experimental results as compared with the in-memory algorithm. Thus, disk I/O time dominates the overall complexity.

The worst-case complexity is given in terms of kmax. In the following, we give an estimation of the value of kmax. We focus our analysis on real-world networks whose degree distribution follows a power law. We note that a similar analysis can also be applied to graphs with other degree distributions. Extensive studies [24], [25], [26], [27], [28], [29], [30] have shown that graphs with power-law degree distributions are prevalent in real-world applications, such as the WWW, many social networks, neural networks, and genome and protein networks. More importantly, most real-world networks that are too large to be kept in main memory follow a power-law degree distribution [27], [30].

Faloutsos et al. [26] give the following power-law degree distribution for real-world networks:

    d(v) = (1 / n^R) · r(v)^R,    (2)

where r(v) is the degree rank of a vertex v, i.e., v is the (r(v))-th highest-degree vertex in G, and R < 0 is a constant called the rank exponent.

Given a graph G with the above vertex degree distribution, we now try to obtain the maximum core number, kmax, that G can have. kmax is essentially the maximum value of k such that the top (k + 1) highest-degree vertices in G are pairwise connected (i.e., each of degree at least k).
Therefore, by substituting time complexity is also bounded by that of the inmemory r(u) by(kaz +1)in Equation(2)and having d(a) of at least algorithm, i. e, O(m +n) time. The amortized CPu time kmar, we have should be close to o(m+n)since each recursion operates on an induced subgraph relevant for the range kl, ku only, which is verified by our experimental results as compared with mR(naz +1)≥kma (3) the inmemory algorithm. Thus, disk yo time dominates the overall complexity An approximation for the solution of the inequality stated in Equation(3)by putting(Fima +1)akmar gives the following THEOREM 2: Given a graph G, EMcore computes the k upper bound on h classes of G using O(kman (mm tn))1/Os and ((+m)disk block space in the worst case hmac Proof: The graph partitioning process scans the graph only once and the total size of the subgraphs written to The value of r measured in [26] for three snapshots of the disk is the same as the size of the original graph. Thus. internet graph is between 0 8 and.7. For a graph of 1 AlgorithIn 2 uses O(t I/Os. Each subsequent recursive million vertices. we have kmgr 296 when R=07 Note tep reads/writes only those subgraphs relevant for computing that this is the worst case. Some existing studies on large the kclasses that fall into the range k e ka. Thus, the scale networks [61[3], [5] show that kimas: is less than 50 number of disk I/Os at each recursive step is bounded by approximately. Our experimental results also show that hima O(Er).Thus, the total number of disk I/Os is O(kma (mtn)) is small and the maximum is 372 among the datasets used in the worst case, since there are at most kima.z recursive steps (see Table III). 
But suppose that imaa is close to its upper The algorithm uses O(r disk blocks since the total size bound given by Equation (4)in the worst case, we have the of the subgraphs written to disk is O(m +n) following analysis on our algorith Let s be the subgraph of G such that one or both end We further remark that the actual 1/O cost of the algorithm vertices of every edge in S are in the kmazclass; that is, S can be significantly smaller for the following reasons: (1)each (Vs, Es), where Vs= Wkmar t o: uE Vkma,(u,U)EEI recursive step reads only the relevant subgraphs from the disk, and Es=(u, u):(uU)CE, u C Hkma. By Equation(2), which takes O(b)=0(F)instead of 0(23)disk I/Os; (2) we have the following lower bound for S we cache some subgraphs in main memory for reuse in the next recursive step; and (3)the size of each subgraph reduces kmas+1 after each recursive step In our experiments we show that the nar(nna z actual i/o is close to m+n instead of kimar(m+n) S2∑() The equality holds when there are only(kmar +1) vertices all experiments on a machine with an Intel Xeon 2.67GHz in the kimarcore. The first component in the righthand side of CPU and 6GB RAM, running CentOS 5. 4(Linux) Equation(5)is the sum of degrees of the(kmax +1) vertices Since they are pairwise connected in this case, their interedges Datasets. We use the following four datasets in our exper are double counted and hence deducting the number of these iments: roadnetworkCalifornia(RNCA), LiveJournal(LD) edges from the degree sum gives the lower bound of S Billion Triple Challenge(btc), and Web. The RNCA dataset By Equation(2), we also obtain the size of the entire graph a road network of California, where vertices represent interse G, which is half of the degree sum of all vertices in V. Since tions and endpoints, and edges represent the roads connectin g all edges are counted twice in this case, the degree sum needs these intersections or road endpoints. 
The L) dataset is the free to be halved onlinecommunitycalledLivejournal(www.livejournal.com) where vertices are members and edges represent friend ship between members. The RNCA and l datasets can be R (6)downloaded from the Stanford Network Analysis Platform (http:/snap.stanfordedu/datal/index.html).Thebtcda a semantic graph converted from the billion Triple challenge Inourtopdownapproachaftercomputingthekmazcore,2009Rdfdataset(http://vmlion25.deri.ie/,whereeachnod we can remove S from G. According to Equation (5)and represents an object such as a person, a document, and an Equation(6), for a network with R=0.7 and 1 million event, and each edge represents the relationship between two vertices,S is at least 12% of the entire network, which is a nodes such as "hasauthor""linksto'" and " hastitle"The huge reduction on the search space and disk t/o cost Web graph is obtained from the YAHOO webspam dataset hThevalueofkmazisingeneralsmallforlargescalereal(http://barcelona.researchyahoonet/webspam),wherevertices world networks, in which case emcore has fewer number of are pages and edges are hyperlinks. We give the number of recursive steps. On the other hand, if kaz is larger, we have vertices and edges, as well as the storage size in the disk, of greater reduction on the search space and 1O cost. Thus, in the datasets in Table IIl both cases, the analysis shows the effectiveness of our top down approach for core decomposition TABLE III DATASETS (M=10, G=10) E. Algorithm Optimization RICA ⅠJ BTC Web There is one optimization that may considerably improve 2.0M 4.8 1485M529M the performance of the algorithm. Before we write a subgraph m=E55M690M673.3M1.65G h to the disk in Procedure 3. we can further trim h to reduce Disk size(byte)67.4M 809 1M 8.7G 17.2G 372 both the search space and the disk io cost by precomputing a partial lclass. Such lclass vertices in H can be found by iteratively deleting vertices with degree of I and the edges incident to them. 
Note that the lclass so computed may Experiment Settings. We test our algorithm for two scenar not be complete, since some interedges may exist between los: (1) when main memory can hold the entire input graph, different subgraphs in the partition and the computation here and (2) when main memory cannot hold the entire input graph only considers each local subgraph H We use Case(1)to demonstrate that our externalmemory Since vertices with a smaller core number, together with algorithm does not process a large graph at the expense of their edges, cannot be in kcores with a greater core number, significantly increased CPu time complexity. We use Case we can safely remove all these vertices and their edges from (2)to verify that our algorithm computes core decomposition H. If there are many such vertices and edges in H, we can efficiently when the existing inmemory algorithm becomes continue to keep H in main memory and put into H more infeasible vertices and edges from G. We write h to the disk until its size reaches b A. When Main memory is Sufficient We remark that if after removing the partial lclass vertices We first assess the performance of our algorithm and their edges, the graph becomes small enough to be main memory is sufficient to hold the entire graph, by mainly kept in main memory, then our algorithm simply performs comparing the running time with the inmemory algorithm core decomposition in main memory, as does an inmemory We use two medium size networks. RNCa and L/ algorithm Table iv reports the partitioning time, the recursive core decomposition time(and the time to compute the k.rclass VI EXPERIMENTAL EVALUATION y r), the total running time, and the memory consumption of In this section, we evaluate the performance of our al EMcore and IMcore. All time taken is the elapsed wallclock gorithm EMcore, comparing it with the stateoftheart in time. 
The total running time of EMcore is comparable to that memory core decomposition algorithm by Batagelj and Za of IMcore For EMcore, the partitioning time occupies a large versnik [201, denoted as IMcore in our experiments. We portion of the total running time. Thus, the actual time taken implement our algorithm in C++(as does IMcore). We ran by the recursive core decomposition procedure(Procedure 6) 60
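The k_max estimate and the search-space reduction can be checked numerically. The following sketch is ours, not part of the paper; the function names are invented for illustration, and it simply evaluates the upper bound of Equation (4) and the edge-fraction lower bound implied by Equations (5) and (6) under the power-law degree model of Equation (2):

```python
def kmax_upper_bound(n, R):
    """Upper bound on the maximum core number k_max (Equation (4)):
    substituting the rank-exponent power law d(v) = n^(-R) * r(v)^R
    into the pairwise-connectivity condition and approximating
    k_max + 1 by k_max gives k_max <= n^(-R / (1 - R)), with R < 0."""
    return n ** (-R / (1 - R))


def kmax_class_edge_fraction(n, R):
    """Lower bound on |E_S| / |E|: the fraction of edges removable after
    the k_max-class is computed, following Equations (5) and (6)."""
    k = round(kmax_upper_bound(n, R))
    deg = lambda r: n ** (-R) * r ** R                  # Equation (2)
    top = sum(deg(r) for r in range(1, k + 2))          # degree sum of top k_max+1 vertices
    inter = k * (k + 1) / 2                             # C(k_max+1, 2) double-counted inter-edges
    total = 0.5 * sum(deg(r) for r in range(1, n + 1))  # Equation (6): |E|
    return (top - inter) / total


if __name__ == "__main__":
    print(round(kmax_upper_bound(10**6, -0.7)))   # prints 296
    print(kmax_class_edge_fraction(10**6, -0.7))  # roughly 0.12
```

For n = 10^6 and R = -0.7 this reproduces the worst-case bound k_max <= 296 and an |E_S|/|E| fraction of at least about 12%, consistent with the figures quoted in the analysis.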