On community structure in complex networks: challenges and opportunities
Hocine Cherifi, Gergely Palla, Boleslaw K. Szymanski, Xiaoyan Lu
Received: November 5, 2019
Hocine Cherifi
LIB EA 7534, University of Burgundy, Esplanade Erasme, Dijon, France
E-mail: hocine.cherifi@u-bourgogne.fr

Gergely Palla
MTA-ELTE Statistical and Biological Physics Research Group, Pázmány P. stny. 1/A, Budapest, H-1117, Hungary
E-mail: [email protected]

Boleslaw K. Szymanski
Department of Computer Science & Network Science and Technology Center, Rensselaer Polytechnic Institute, 110 8th Street, Troy, NY 12180, USA
E-mail: [email protected]

Xiaoyan Lu
Department of Computer Science & Network Science and Technology Center, Rensselaer Polytechnic Institute, 110 8th Street, Troy, NY 12180, USA
E-mail: [email protected]

arXiv [physics.soc-ph]

Abstract
Community structure is one of the most relevant features encountered in numerous real-world applications of networked systems. Despite the tremendous effort of a large interdisciplinary community of scientists working on this subject over the past few decades to characterize, model, and analyze communities, more investigations are needed in order to better understand the impact of community structure and its dynamics on networked systems. Here, we first focus on generative models of communities in complex networks and their role in developing a strong foundation for community detection algorithms. We discuss modularity and the use of modularity maximization as the basis for community detection. Then, we follow with an overview of the Stochastic Block Model and its different variants as well as inference of community structures from such models. Next, we focus on time evolving networks, where existing nodes and links can disappear, and in parallel new nodes and links may be introduced. The extraction of communities under such circumstances poses an interesting and non-trivial problem that has gained considerable interest over the last decade. We briefly discuss considerable advances made in this field recently. Finally, we focus on immunization strategies essential for targeting the influential spreaders of epidemics in modular networks. Their main goal is to select and immunize a small proportion of individuals from the whole network to control the diffusion process. Various strategies have emerged over the years suggesting different ways to immunize nodes in networks with overlapping and non-overlapping community structure. We first discuss stochastic strategies that require little or no information about the network topology at the expense of their performance. Then, we introduce deterministic strategies that have proven to be very efficient in controlling the epidemic outbreaks, but require complete knowledge of the network.
Keywords community detection · stochastic block model · time evolving networks · immunization · centrality · epidemic spreading

1 Introduction

Complex systems are found to be naturally partitioned into multiple modules or communities. In the network representation, these modules are usually described as groups of densely connected nodes with sparse connections to the nodes of other groups. When a node can belong to only a single community, the community structure is said to be non-overlapping, while in overlapping communities a node can belong to multiple communities. In this position paper, in three subsequent sections, we discuss three fundamental questions tied to the community structure of networks: generative models, communities in time evolving networks, and immunization techniques in networks with modular structure.

In the next section, we review the work on generative models for communities in complex networks and their role in developing a strong foundation for community detection algorithms. We start with modularity, which is an elegant and general metric for community quality, and which has also been used as the basis for community detection algorithms by modularity maximization [1–4]. This method was recently proven [5] to be equivalent to maximum likelihood methods for the planted partition model. More generally, recovering a stochastic block model means finding the latent partition of the network nodes into communities that are equal to, or correlate with, the ground-truth communities used to generate the given network.

The stochastic block model also serves as an important tool for the evaluation of community detection results, including the diagnosis of the resolution limit on community sizes and determining the number of communities in a network. We review several widely used random graph models and introduce the definitions of the stochastic block model and its variants. We also describe some recent results in this area.
The first one, presented in [6], establishes sufficient and necessary conditions for modularity maximization to suffer from resolution limit effects and proposes a new algorithm designed to avoid those conditions. Another one, presented in [7], uses one parameter to indicate whether an assortative or a disassortative structure is sought by the inference algorithm. This approach enables the algorithm to avoid being trapped at inferior locally optimal partitions, improving both the computation time and the quality of the recovered community structure.

Section 3 focuses on the time evolution of complex systems, the study of which has been enabled by the rapid increase in the amount of publicly available data, including time stamped and/or time dependent data. The network representation of such systems naturally corresponds to time evolving networks, where existing nodes and links can disappear, and in parallel new nodes and links may be introduced. The extraction of communities under such circumstances poses an interesting and non-trivial problem that has gained considerable interest over the last decade. Over time, communities might grow or shrink in size, may split into smaller communities or merge together forming larger ones; entirely new communities may also emerge, and old ones can disappear. Keeping track of a rapidly changing community embedded in a noisy network can be challenging, especially when the time resolution of the available data is low. Nevertheless, considerable advances have been made in this field over the years, which we shall briefly discuss.

Section 4 focuses on immunization strategies designed for modular networks. It is motivated by the importance of preventing epidemics, whose outbreaks, such as those of infectious diseases, represent a serious threat to human lives and could have a dramatic impact on society [8,9].
Immunization through vaccination makes it possible to protect individuals and prevent the propagation of contamination to their neighbors. As mass vaccination is not possible when the supply of vaccine doses is limited, designing an efficient immunization strategy is a crucial issue. Immunization strategies are the essential techniques to target the influential spreaders in networks. Their main goal is to select and immunize a small proportion of individuals from the whole network to control the spread of epidemics. To do so, they rely on various properties of the network topology. For example, the network degree distribution has been extensively studied. Indeed, as many real-world contact networks exhibit a power-law degree distribution, preferentially targeting high degree nodes appears to be an effective strategy. Community structure is also a well-known property of social networks. Recent studies have shown that it affects the dynamics of epidemics, and that it needs to be considered to design tailored epidemic control strategies [10–14]. This section presents an overview of recent and influential works on this issue.
The Erdős–Rényi (ER) random graph [15] is perhaps one of the earliest random graph models. It has two closely related definitions. Given a set of $n$ nodes and $m$ edges, one variant of the ER model randomly connects $m$ pairs of different nodes. This process defines an ensemble of graphs with exactly $n$ nodes and $m$ edges, each of which is generated with equal probability.

The other variant of the ER model [16] specifies the probability of forming an edge between every pair of different nodes. According to this definition, each pair of nodes is connected with a probability $p$ independently at random. By the law of large numbers, as the number of nodes in such a random graph tends to infinity, the number of generated edges approaches $\binom{n}{2} p$. The likelihood of generating a network $G$ of $n$ nodes and $m$ edges is
\[ P[G] = p^{m} (1 - p)^{\binom{n}{2} - m} \tag{1} \]
Since every edge is generated randomly with the same probability $p$, the degree of any particular node in the ER model follows the Binomial distribution.

Similar to the ER model, the configuration model [17] assumes that the edges are placed randomly between the nodes. The randomization conducted by the configuration model always preserves the pre-defined degree of each node, which can be represented as the number of half-links, or stubs, attached to it. The network generation process keeps randomly pairing stubs to create edges until no stub remains. Hence, the configuration model produces an ensemble of graphs with the same degree sequence. The number of edges between nodes $i$ and $j$, averaged over all the graphs generated in this way, is for large $m$ equal to $\frac{k_i k_j}{2m}$, where $k_l$ is the degree of node $l$ and the number of edges is $m = \frac{1}{2} \sum_l k_l$. The configuration model is considered a benchmark (null model) in the calculation of modularity [5], a commonly used quality metric for network partitions.
Given a partition of network nodes into communities, modularity compares the number of edges observed in each community with the corresponding expected number in the graphs generated by the configuration model with the same degree sequence, which is given as
\[ Q = \frac{1}{2m} \sum_r \sum_{\{i,j\} \in r} \left( A_{ij} - \frac{k_i k_j}{2m} \right) = \sum_r \left[ \frac{m_r}{m} - \left( \frac{\kappa_r}{2m} \right)^{2} \right], \tag{2} \]
where $\{i, j\} \in r$ denotes every pair of nodes inside community $r$, $m_r$ is the number of edges with both endpoints inside community $r$, and $\kappa_r$ is the sum of the degrees of the nodes in community $r$.

It is worth noting that the networks generated by the configuration model do not exclude self-loops, each of which connects a node to itself, or multi-links, which are multiple edges between the same pair of nodes. However, when the number of nodes approaches infinity, the density of self-loops and multi-links in the network generated by the configuration model tends to zero.

Unlike the pre-defined node degrees in the configuration model, the expected node degrees in the ER model are all the same, which is rarely observed in real graphs. Thus, the graphs produced by the configuration model are more realistic than the ER graphs thanks to the node degree variations.
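The per-community form of Eq. 2 can be evaluated in a single pass over the edge list. The following is a minimal sketch under stated assumptions: the function name `modularity`, the edge-list input and the dictionary of community labels are illustrative choices rather than part of the original text, and the graph is assumed to be undirected, unweighted and free of self-loops.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition, using the right-hand side of Eq. 2.

    edges: list of (u, v) pairs of an undirected graph (no self-loops assumed).
    community: dict mapping every node to its community label.
    """
    m = len(edges)
    m_r = defaultdict(int)      # edges with both endpoints inside community r
    kappa_r = defaultdict(int)  # sum of the degrees of the nodes in community r
    for u, v in edges:
        kappa_r[community[u]] += 1
        kappa_r[community[v]] += 1
        if community[u] == community[v]:
            m_r[community[u]] += 1
    return sum(m_r[r] / m - (kappa_r[r] / (2 * m)) ** 2 for r in kappa_r)


# Hypothetical toy graph: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_triangles = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_block = {node: "a" for node in range(6)}
q_split = modularity(edges, two_triangles)
q_trivial = modularity(edges, one_block)
```

For this toy graph each triangle has $m_r = 3$ and $\kappa_r = 7$ with $m = 7$, so the split into two triangles gives $Q = 2(3/7 - 1/4) = 5/14$, while placing all nodes in one community gives $Q = 0$.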
Standard SBM.
The standard stochastic block model [18] is a generative model of the graph in which nodes are organized as blocks and edges are placed between nodes independently at random. In the standard stochastic block model, each node $i$ in the network is associated with a block assignment $g_i$. The number of edges between nodes $i$ and $j$ is independently distributed. It follows a Bernoulli distribution with mean $\omega_{g_i, g_j}$, a parameter which depends only on the block assignments of the two endpoints. Thus, the standard stochastic block model is parameterized by a matrix $\Omega = \{\omega_{rs}\}$ whose component $\omega_{rs}$ denotes the probability of forming an edge between a node in block $r$ and another node in block $s$. Given the block assignment $\{g_i\}$ and the edge probability matrix $\Omega$, the likelihood of generating an undirected unweighted network $G$ is
\[ P[G \mid \{g_i\}, \Omega] = \prod_{i<j} \omega_{g_i, g_j}^{A_{ij}} \left( 1 - \omega_{g_i, g_j} \right)^{1 - A_{ij}}, \tag{3} \]
where $A_{ij} \in \{0, 1\}$ indicates whether nodes $i$ and $j$ are connected.

Since the standard stochastic block model considers nodes in the same block statistically indistinguishable in terms of the probability of forming edges, the degree heterogeneity is ignored. However, real-world networks typically display broad degree distributions. The lack of degree heterogeneity makes the standard stochastic block model unsuitable for applications to many realistic networks. Therefore, the degree-corrected stochastic block model [19] incorporates the degree heterogeneity, assuming that the number of edges between any pair of nodes $i$ and $j$ follows the Poisson distribution with mean $\omega_{g_i, g_j} \theta_i \theta_j$, where $\theta_l$ is a model parameter associated with each node $l$.
In an unweighted undirected multi-graph, after ignoring all terms independent of the model parameters, the log-likelihood simplifies to
\[ \log P[G \mid \{g_i\}, \{\theta_i\}, \Omega] = \frac{1}{2} \sum_{ij} \left[ A_{ij} \log\left( \omega_{g_i, g_j} \theta_i \theta_j \right) - \omega_{g_i, g_j} \theta_i \theta_j \right] \tag{4} \]
where $A_{ij}$ is the number of edges between different nodes $i$ and $j$ for $i \neq j$; for the simplicity of the expression, the model defines $A_{ii} = 2k$ for any node $i$ with $k$ self-loop edges. Given a partition of the network, i.e., the block assignments $\{g_i\}$, the maximum likelihood estimates of $\theta_i$ and $\omega_{rs}$ are
\[ \hat{\theta}_i = \frac{k_i}{\kappa_{g_i}}, \qquad \hat{\omega}_{rs} = m_{rs}, \tag{5} \]
where $\kappa_r = \sum_{i \in r} k_i$ is the sum of the degrees of all nodes in a block $r$, and $m_{rs}$ is the total number of edges between blocks $r$ and $s$, or twice the number of edges in $r$ if $r = s$. Plugging in the maximum likelihood estimates above and skipping the irrelevant terms, the log-likelihood of the degree-corrected stochastic block model can be simplified as
\[ \log P[G \mid \{g_i\}] = \sum_{rs} m_{rs} \log \frac{m_{rs}}{\kappa_r \kappa_s}. \tag{6} \]
It is worth mentioning that the degree-corrected stochastic block model assumes that the number of edges between any two nodes follows the Poisson distribution. In the standard stochastic block model, where the number of edges is drawn from the Bernoulli distribution, it is rare that the edge probability is close to 1, because most real networks are sparse. A Bernoulli random variable with a small mean is well approximated by a Poisson random variable [20], which makes the Poisson distribution a good replacement here for the number of edges between two nodes.

The standard planted partition model [21, 22] is a special case of the standard stochastic block model. The network generated by the planted partition model includes an edge between any two nodes in the same block with a probability $p$ and an edge between any two nodes across different blocks with a probability $q$.
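The planted partition model just defined is straightforward to sample from. The following is a minimal sketch, assuming a simple undirected graph stored as an edge list; the function name `sample_planted_partition` and its parameters (`block_sizes`, `seed`) are hypothetical names introduced only for illustration.

```python
import random


def sample_planted_partition(block_sizes, p, q, seed=None):
    """Sample a simple undirected graph from the planted partition model.

    block_sizes: number of nodes in each block.
    p: edge probability for two nodes in the same block.
    q: edge probability for two nodes in different blocks.
    Returns the block label of every node and the list of edges (i, j), i < j.
    """
    rng = random.Random(seed)
    blocks = [r for r, size in enumerate(block_sizes) for _ in range(size)]
    n = len(blocks)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            prob = p if blocks[i] == blocks[j] else q
            # Each pair is connected independently with its own probability.
            if rng.random() < prob:
                edges.append((i, j))
    return blocks, edges


# Degenerate check: p = 1 and q = 0 yields disconnected cliques.
blocks, edges = sample_planted_partition([3, 2], p=1.0, q=0.0, seed=0)
```

The double loop visits all $\binom{n}{2}$ node pairs, so this sketch is only meant for small graphs; the degenerate setting $p = 1$, $q = 0$ produces one clique per block, which is a convenient sanity check.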
When $p > q$, the network generated by the planted partition model has an assortative structure; otherwise, when $p < q$, the model generates networks with disassortative structure, which corresponds to bipartite networks [23] when only two blocks exist.

Similar to the degree-correction of the standard stochastic block model, the standard planted partition model can be extended to its degree-corrected version. In the degree-corrected planted partition model [5], the number of edges between any two nodes $i$ and $j$ follows the Poisson distribution with mean $\omega_{g_i, g_j} \frac{k_i k_j}{2m}$, where $\omega_{g_i, g_j} = \omega_{\mathrm{in}}$ if $g_i = g_j$ and $\omega_{g_i, g_j} = \omega_{\mathrm{out}}$ otherwise. Given the block assignments $\{g_i\}$ and the parameters $\omega_{\mathrm{in}}$ and $\omega_{\mathrm{out}}$, the log-likelihood of generating a particular graph is
\[ \log P[G \mid \{g_i\}, \{\omega_{\mathrm{in}}, \omega_{\mathrm{out}}\}] = \frac{1}{2} \sum_{ij} \left[ A_{ij} \log\left( \omega_{g_i, g_j} \frac{k_i k_j}{2m} \right) - \omega_{g_i, g_j} \frac{k_i k_j}{2m} \right] \tag{7} \]
which, after a small amount of manipulation, can be re-written as
\[ \log P[G \mid \{g_i\}, \{\omega_{\mathrm{in}}, \omega_{\mathrm{out}}\}] = \frac{B}{2m} \sum_r \sum_{\{i,j\} \in r} \left( A_{ij} - \gamma \frac{k_i k_j}{2m} \right) + \mathrm{const.} \tag{8} \]
where $\{i, j\} \in r$ denotes every pair of nodes in block $r$, and the terms $B = m \log \frac{\omega_{\mathrm{in}}}{\omega_{\mathrm{out}}}$ and $\gamma = \frac{\omega_{\mathrm{in}} - \omega_{\mathrm{out}}}{\log \omega_{\mathrm{in}} - \log \omega_{\mathrm{out}}}$ are independent of the block assignments $\{g_i\}$. Comparing Eq. 8 with the definition of generalized modularity of Reichardt and Bornholdt [24], maximizing the log-likelihood of the degree-corrected planted partition model is equivalent to maximizing the generalized modularity with a specific resolution parameter $\gamma$. This equivalence result shows that maximizing generalized modularity tends to find communities of similar statistical properties. In realistic networks where edges are heterogeneously distributed within different communities, however, there may not be a single resolution parameter $\gamma$ sufficient to avoid the resolution limit anomaly [6,25].
As a result, small well-formed communities are likely to be merged into inappropriately large groups, while large well-formed communities are split across smaller ones.

2.2 Model Inference

Despite being widely used for community detection, modularity maximization is provably NP-hard [26], which implies that any computationally efficient algorithm based on this approach may fail to find the optimal partition on some inputs. It also suffers from the resolution limit anomaly, in which well-formed dense communities get merged into a large cluster, or a loose community is inappropriately split into multiple smaller clusters, to increase the modularity. An alternative approach for community detection is statistical inference, which fits a generative model to the observed network data. Such an approach assumes that the observed network is produced by a random graph model with a pre-defined partition of the network as the model parameter. In general, the statistical inference aims at recovering the partition which maximizes the likelihood of the random graph model generating the observed network data. In this section, we introduce the inference methods for the generative graph models, which usually require selecting the number of blocks, and discuss their connection to the traditional modularity optimization and the resolution limit anomaly in Section 2.2.4.

Selecting the number of communities

The stochastic block model and its variants do not specify the number of communities in the network. In general, the likelihood of these models increases as the number of communities grows. Thus, maximizing the likelihood of the model produces the trivial result in which every node forms a community of its own. Therefore, one needs to specify the number of communities for these random graph models. One approach is to find the number of communities by statistical inference [27, 28].
Alternatively, according to Occam's razor, the model inference process should take into account the complexity of the model, which can be measured by the model description length [29]. Other work [30] uses Bayesian model selection to determine the number of communities in a network. Ref. [31] provides a detailed discussion of commonly used approaches to select the number of communities for random graph models.

The simplest Markov Chain Monte Carlo (MCMC) approach is to propose moving each node from its original block into one of the $B$ blocks chosen uniformly at random. This easily satisfies the requirements of ergodicity and detailed balance, because any block assignment can be reached from the current block assignment in a finite number of steps and the chain is aperiodic. However, considering the size of the partition space, $O(B^N)$ for a network with $N$ nodes and $B$ blocks, the naive MCMC approach is not practical. Therefore, [32] proposes an optimized MCMC algorithm with a greedy heuristic to infer the block assignment. Initially, every node in the network is assigned to one random block independently. Then, one attempts to move a node from block $r$ to $s$ with a probability conditioned on the block assignment $t$ of one of its neighbors,
\[ p(r \to s \mid t) = \frac{m_{ts} + \epsilon}{\sum_s m_{ts} + \epsilon B}. \]
In the above, $\epsilon > 0$ is a free parameter. When $\epsilon$ tends to $\infty$, the proposal function reduces to the naive scheme which assigns a random block to the current node. However, such a naive scheme is inefficient. Indeed, the probability of the current node being assigned to the correct block is very low; thus, such an assignment does not increase the log-likelihood in most cases. Consequently, the assignments are rejected very frequently, wasting computational resources. By applying a relatively small $\epsilon$, the assignment selected by the function proposed above is more likely to get accepted.
The intuition behind this function is that, given that there are many edges across blocks $s$ and $t$, a node with many neighbors in block $t$ is likely to be assigned to block $s$. Thus, a move proposed by this function is more likely to be accepted, avoiding the computational cost wasted by many rejected assignments.

To ensure detailed balance, each proposed move is accepted with a probability $a$ in the Metropolis-Hastings fashion [33] given by
\[ a = \min\left\{ 1, \; \exp(\Delta L) \, \frac{\sum_t n_t \, p(s \to r \mid t)}{\sum_t n_t \, p(r \to s \mid t)} \right\}, \tag{9} \]
where $\Delta L$ is the change of log-likelihood after the move and the node of the proposed move has $n_t$ neighbors in block $t$.

In [7], the authors observe that the current versions of the stochastic block model randomly search through the large space of potential solutions containing both assortative and disassortative structures. Consequently, inference algorithms using these models are often trapped in a solution unsuitable for the user, and it takes them a long time to escape. To address this issue, the authors of [7] apply a simple constraint on the internal degree ratio of nodes in the objective function. This approach is independent of the inference algorithm. The resulting algorithm reliably finds assortative or disassortative structure as directed by the value of a single parameter. The paper validates the model experimentally by testing its performance on several real and synthetic networks. The experiments show that the inference of the constrained model quickly converges to the desired assortative or disassortative structure. In contrast, the inference of the unconstrained degree-corrected stochastic block model often gets trapped at inferior locally optimal partitions.
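The proposal function $p(r \to s \mid t)$ and the acceptance probability of Eq. 9 described above can be sketched as follows. This is an illustrative sketch, not the implementation of [32]: the function names, the nested-list representation of the block edge counts `m`, and the optional `m_after` argument (the edge counts used for the reverse proposal, defaulting to the current counts as a simplifying assumption) are all hypothetical. The computation is done in log space to avoid overflow of $\exp(\Delta L)$.

```python
import math


def proposal_prob(m, t, s, eps):
    """p(r -> s | t) = (m[t][s] + eps) / (sum over s' of m[t][s'] + eps * B)."""
    num_blocks = len(m)
    return (m[t][s] + eps) / (sum(m[t]) + eps * num_blocks)


def acceptance_prob(delta_L, n, m, r, s, eps, m_after=None):
    """Metropolis-Hastings acceptance probability of Eq. 9 for a move r -> s.

    delta_L: change of the log-likelihood caused by the move.
    n: n[t] is the number of neighbors of the moved node in block t
       (the node is assumed to have at least one neighbor).
    m: block edge-count matrix used for the forward proposal.
    m_after: matrix used for the reverse proposal (assumption: defaults to m).
    """
    if m_after is None:
        m_after = m
    forward = sum(n_t * proposal_prob(m, t, s, eps) for t, n_t in enumerate(n))
    backward = sum(n_t * proposal_prob(m_after, t, r, eps) for t, n_t in enumerate(n))
    log_a = delta_L + math.log(backward) - math.log(forward)
    return 1.0 if log_a >= 0 else math.exp(log_a)


# Hypothetical two-block example.
m = [[10, 2], [2, 8]]
p_01 = proposal_prob(m, t=0, s=1, eps=1.0)  # (2 + 1) / (12 + 2)
a = acceptance_prob(-1.0, n=[1, 1], m=[[5, 5], [5, 5]], r=0, s=1, eps=1.0)
```

In the symmetric second example the forward and reverse proposal terms cancel, so the acceptance probability reduces to $\exp(\Delta L)$; for very large `eps` the proposal approaches the uniform value $1/B$, matching the naive scheme.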
Ref. [5] proposes an iterative algorithm to find the optimal values of $\Omega$ and $g$ that maximize the log-likelihood of the degree-corrected planted partition model. The author of [5] shows that finding the maximum likelihood estimate of the block assignments $g = \{g_i\}$ is equivalent to maximizing the generalized modularity
\[ Q(\gamma) = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \gamma \frac{k_i k_j}{2m} \right) \delta_{g_i, g_j} \tag{10} \]
which is given as a function of $\gamma$, a positive parameter known as the resolution parameter. The algorithm repeats the following two steps until convergence:
– Given the values of $\Omega = \{\omega_{\mathrm{in}}, \omega_{\mathrm{out}}\}$, find the optimal block assignment $g$ maximizing the log-likelihood of the degree-corrected planted partition model defined in Eq. 7. This is equivalent to maximizing the generalized modularity $Q(\gamma)$ with $\gamma = \frac{\omega_{\mathrm{in}} - \omega_{\mathrm{out}}}{\log \omega_{\mathrm{in}} - \log \omega_{\mathrm{out}}}$,
\[ g_{\mathrm{new}} = \arg\max_g \log P(A \mid \Omega, g) = \arg\max_g Q(\gamma) \]
– After updating $g$, find $\Omega = \{\omega_{\mathrm{in}}, \omega_{\mathrm{out}}\}$ under the current block assignment $g$ by maximum likelihood estimation,
\[ \Omega_{\mathrm{new}} = \arg\max_\Omega \log P(A \mid \Omega, g) \]

The maximization of the generalized modularity is equivalent to the maximum-likelihood estimation (MLE) of the degree-corrected planted partition model on the same graph [5]. Hence, the partition of the network which most likely generates the observed network also maximizes the generalized modularity with a particular resolution parameter. However, in the planted partition model, all communities have similar statistical properties, which is unusual in practical applications.

In [6], the authors answer the important question about the performance of the generalized modularity on networks generated by the stochastic block model, which can generate more realistic networks with heterogeneous communities. First, these authors establish asymptotic theoretical upper and lower bounds on the resolution parameter of generalized modularity, bridging the gap between the literature on the resolution limits of modularity-based community detection [25] and the random graph models.
They also show that communities with different densities can still be detected by maximizing the generalized modularity when the resolution parameter is within the established range. Otherwise, when this parameter is larger than the upper bound established in [6], some well-formed communities are likely to be spread among multiple clusters. In the opposite case, when the resolution parameter is lower than the lower bound presented in [6], some communities are inappropriately merged into one large component. The conclusion is that the resolution limits of generalized modularity arise when a network contains a subgraph whose lower bound is higher than the upper bound of another subgraph, because in such a case any resolution parameter will be either above the upper bound of the latter subgraph, or below the lower bound of the former subgraph, or both.

To address the above mentioned problem, the authors of [6] introduce a progressive agglomerative heuristic algorithm that systematically increases the resolution parameter. The algorithm recursively splits the resulting clusters of the previous level to detect smaller communities. As the recursion proceeds, the algorithm gradually increases the resolution parameter for high-resolution community detection in local subgraphs of the network. The algorithm proceeds until the final partition is no longer statistically significant. This approach avoids getting trapped by the resolution limit and does not require repeated re-computation of the resolution parameter [5], which can be computationally prohibitive for large networks.

As mentioned in the Introduction, one of the challenging problems related to communities is the partitioning of time evolving networks. Here we briefly overview the most widely used methodologies and important advances related to this area.
A survey providing a more in-depth description of the various approaches, with formal definitions, algorithms, etc., was recently published by Rossetti and Cazabet in Ref. [34].

3.1 Snapshot based approaches

Probably the simplest approach is to define snapshots, corresponding to static graphs representing the state of the evolving network at a given time point, and to apply a static community finding method to the subsequent snapshots [35–40]. The communities found in the neighboring time steps then have to be matched with each other. One of the basic ideas is to use the Jaccard index for measuring the relative overlap between the communities, and match the pairs in decreasing order of the Jaccard index [37]. Naturally, the Jaccard index can be replaced by any other similarity measure, such as the normalized mutual information [41, 42], the adjusted mutual information [43], or any advanced information based similarity in general.

The advantage of this approach is that it is conceptually simple, and one can use basically any community finding method on the static snapshots. The drawback is that the matching part can become technically complicated under certain circumstances. First of all, if there are $O(N_c)$ communities found in a given snapshot, in principle we need to evaluate the chosen similarity function $O(N_c^2)$ times for every pair of subsequent snapshots. Moreover, for similarity measures based solely on memberships (without taking into account, e.g., the link structure of the communities) it is not uncommon for a community $C_i(t)$ at time step $t$ to have two or even more corresponding communities $C_j(t+1)$ at time step $t+1$ with equal similarity to $C_i(t)$, simply because the membership counts can take only integer values.
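The Jaccard-based matching of communities between two subsequent snapshots can be sketched as below. This is a minimal illustration under assumptions: the function names, the representation of communities as Python sets, the one-to-one greedy matching and the `min_similarity` threshold are choices made for this sketch, not a reproduction of the procedure in [37].

```python
def jaccard(a, b):
    """Jaccard index |a & b| / |a | b| of two node sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_communities(prev, curr, min_similarity=0.0):
    """Greedily match communities of two snapshots by decreasing Jaccard index.

    prev, curr: lists of node sets found at time steps t and t + 1.
    Pairs with similarity <= min_similarity are never matched.
    Returns a list of (index in prev, index in curr, similarity).
    """
    # All N_c x N_c pairs are scored; ties are broken by index, i.e. arbitrarily.
    scored = sorted(
        ((jaccard(c, d), i, j) for i, c in enumerate(prev) for j, d in enumerate(curr)),
        key=lambda item: (-item[0], item[1], item[2]),
    )
    used_prev, used_curr, matches = set(), set(), []
    for sim, i, j in scored:
        if sim <= min_similarity:
            break
        if i in used_prev or j in used_curr:
            continue
        matches.append((i, j, sim))
        used_prev.add(i)
        used_curr.add(j)
    return matches


# Hypothetical snapshots.
prev = [{1, 2, 3, 4}, {5, 6}]
curr = [{5, 6, 7}, {1, 2, 3}]
matches = match_communities(prev, curr)

# A community splitting into two equal halves gives two equally similar candidates.
tie = match_communities([{1, 2, 3, 4}], [{1, 2}, {3, 4}])
```

In the second call both candidates have a Jaccard index of 0.5, and the sketch simply keeps the one with the lower index; this arbitrary choice is exactly the ambiguity of membership-only similarity measures.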
Thus, when choosing the most similar community from the next time step as the image of $C_i(t)$ at $t+1$, we might run into the problem of having multiple equally similar candidates. Another problem is that a large community at $t$ can have a non-zero similarity with many different communities at $t+1$, and thus, if we follow the merging and splitting processes between the communities without any restriction on the minimal similarity, the lineage of the evolving community structure can become extremely intricate. Of course, using a minimum similarity threshold can make the picture clearer, however, at the cost of introducing an extra parameter to the method. Last but not least, in case we are using a static community finding method that allows overlaps between the communities, finding the best match between the subsequent time steps can become even more tricky [37]. For the above reasons, the introduction of more specialized community finding methods targeted at time dependent networks was very well motivated.

3.2 Evolutionary algorithms

The key idea behind these approaches is to provide a unified framework in which the inference of communities at a given time step $t$ can take into account information about the network structure at other time steps as well. One of the first methods pointing in this direction was suggested in [44], where the goal was to optimize both for 'point wise' precise communities reflecting the modular structure of the network at any given time point $t$, and for keeping the change in the community structure between two subsequent time steps as low as possible. This was achieved in a rather general framework, where a cost function is introduced composed of two parts, the first related to the accuracy of the communities located at the different time steps, and the second corresponding to the 'historical cost', depending on the similarity of the partitions at subsequent time steps.
The second term also involves a user defined parameter (a simple multiplicative factor) with which we can balance the trade-off between lowering the point-wise accuracy and gaining smoothness of evolution in time. In [44] the method is used with hierarchical clustering and $k$-means clustering, together with historical costs tailored to the nature of the applied clustering.

In principle, the above framework can be used with any static community finding algorithm combined with a suitable similarity measure between communities. For example, in [45] spectral clustering techniques are used to uncover the communities, whereas in Ref. [46], the community finding is based on optimizing the Kullback–Leibler divergence between the actual network structure and the one predicted based on community memberships. The advantage of this latter approach is that the historical costs can also be formulated as the Kullback–Leibler divergence between the consecutive community partitions, providing a unified formulation for both types of costs, and also allowing for a probabilistic interpretation of the optimization problem [46]. Further methods similar in nature were proposed in Refs. [47–54].

Another quite general framework for evolutionary community finding was proposed in [55], based on the concept of multislice networks. In such systems, the network structure can be organized into layers, where the layers represent different types of connections between the same nodes, such as social media connections, e-mail connections and personal acquaintances between the same people. Any community finding approach that is suitable for detecting communities in multiple layers simultaneously can also be applied to evolutionary community finding if we represent the time evolving network as a multislice network, where the different layers correspond to the subsequent time steps during the time evolution.
The solution offered in [55] is based on modularity; however, as mentioned above, the generality of the framework allows any further multislice methods as well.

A further general problem class into which the challenge of evolutionary clustering fits naturally is given by consensus clustering [56]. The basic idea of consensus clustering is to apply multiple different clustering methods to the same network, and then bring the found (presumably different) partitions to consensus, resulting in stable, relevant communities even for stochastic community finding methods. However, this approach is also very suitable for evolutionary clustering when the setup is modified as follows. First, based on the time evolving network data, following the well-known concept of sliding time windows, a number of time frames are defined, where each frame corresponds to the aggregation of a certain number of consecutive time steps in the original data, and the neighboring time frames show a significant overlap with each other to ensure stability and a smooth time evolution of the communities. Next, a static community finding algorithm is applied to the subsequent time frames, and then the obtained results are brought to consensus, again, over sliding windows of a fixed length [56].

Generative models such as the stochastic block model can also provide very interesting solutions for evolutionary clustering. In Ref. [57] the concept of the dynamic stochastic block model is introduced, where in addition to the usual group membership probabilities and membership dependent linking probabilities, further probabilistic transition matrices are considered for describing the evolution of node memberships between the subsequent time steps.
A more general formulation of the model is given in [58] with the help of a layered stochastic block model, where the layers can naturally correspond to time steps in the case of a dynamic network; however, the approach can handle general multilayer networks as well. Important results on the detectability thresholds for the dynamic stochastic block model are presented in [59] based on the cavity method, while in [60] the concept of higher order Markov chains (and thus the possibility of memory effects) is successfully incorporated into the framework of dynamic stochastic block models. A common feature of the above methods is that the results are obtained via Bayesian inference, which in practice is usually implemented with the help of Markov chain Monte Carlo algorithms [57, 58, 60].

Stochastic block models can also be successful in the analysis of systems where the network structure itself has to be generated from time dependent (and possibly noisy) signals. In [61], an end-to-end community detection algorithm is proposed that avoids the extraction of a sequence of point estimates for the links and infers the stochastic blocks directly from the raw data. In parallel, the stochastic block model framework can also be used for a joint reconstruction of the network structure and the communities from time varying functional data [62], where synergistic effects were reported: the inferred blocks improved the reconstruction accuracy of the links, which in turn improved the accuracy of the inferred communities.

3.3 Incremental clustering, online community finding and predicting community evolution

In the previously mentioned methods, we assumed 'complete knowledge' about the time evolution of the system, at least on the level of the input data. Thus, when inferring the communities at a given time step, information about the network structure coming from later time steps was also available and could be made use of.
A somewhat more restrictive setup is where at a given time point only the data corresponding to previous time steps can be used. Such a scenario arises when small but fast changes occur in a large network, and our aim is to always give the currently best partitioning of the network into communities, which, however, is also likely to be quite similar to the partitioning in the previous time steps. The concept of incremental clustering fits this setup in a natural manner [63]: instead of running the community finding method of our choice 'from scratch' on the current snapshot of the studied network, we consider the changes in the network structure and update the communities from the previous time step. A method following this approach was proposed in [64] based on spectral clustering, while in [65, 66] modularity optimization techniques were used for a similar purpose. Further static community finding methods, such as the label propagation approach, can also be adapted to this framework, as shown in [67], and the problem of overlapping communities can also be handled [68]. Additional incremental clustering techniques can be found in Refs. [69–76].

An idea closely related to incremental clustering is given by the concept of online clustering in dynamical networks [77]. This framework considers large networks updated in a stream fashion, where changes in the communities are detected online, separated from offline community detection and exploratory querying. A somewhat different strategy for online community finding is proposed in [78] based on expectation-maximization and the stochastic block model, and further methods are proposed in Refs. [79, 80].

A problem closely related to the 'instantaneous' community detection methods described above is the challenge of predicting the future changes in communities for time evolving systems.
The first results in this direction were related to predicting whether a community will grow and/or survive, or will instead disappear [81, 82]. In [83] the predicted life span and the connection between the life span and structural properties of the communities were also studied. Besides the 'ultimate fate' and life span, predicting the occurrence of change events for communities is also a relevant problem, for which machine learning techniques are a natural choice. The basic idea is to build classifiers that can predict certain types of events based on various community features [84–86]. A detailed study of the problem, together with thorough testing of methods on multiple real datasets, is presented in [87].

Various strategies have emerged over the years suggesting different ways to immunize nodes [88]. Yet the search for more effective strategies must continue, since any improvement can play a major role in saving human lives and resources. Immunizing nodes at random is the simplest approach. This strategy has proven to be impractical, since it requires a large proportion of nodes to be immunized to mitigate the epidemic spreading. To solve this problem, researchers try to come up with the best possible way to immunize a small number of key nodes using various topological features of networks. Up to now, these immunization strategies fall into two categories: stochastic and deterministic. In stochastic strategies, targeted nodes are identified by collecting information locally from randomly selected nodes in the network. They are totally agnostic about the full network structure. The most popular strategy in this category is the so-called Acquaintance immunization. It vaccinates nodes that are picked several times among the neighbors of randomly selected nodes. High-degree nodes therefore have a high chance of being selected by the Acquaintance strategy.
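As an illustration, the Acquaintance immunization described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a reference implementation: the adjacency-dictionary representation, the function name `acquaintance_immunization` and the parameters `n_target`, `n_picks` and `seed` are all introduced here for illustration.

```python
import random
from collections import Counter


def acquaintance_immunization(adj, n_target, n_picks=1, seed=None):
    """Pick random nodes, then random neighbors; immunize neighbors picked n_picks times.

    adj: dict mapping each node to a list of its neighbors (undirected graph).
    n_target, n_picks, seed: hypothetical parameters of this sketch.
    """
    rng = random.Random(seed)
    # Only nodes with at least one neighbor can name an acquaintance.
    sources = [v for v, nbrs in adj.items() if nbrs]
    # Nodes that can ever be named as an acquaintance.
    reachable = {u for v in sources for u in adj[v]}
    n_target = min(n_target, len(reachable))

    picks = Counter()
    immunized = set()
    while len(immunized) < n_target:
        v = rng.choice(sources)   # random node; no global topology is needed
        u = rng.choice(adj[v])    # one of its acquaintances, chosen at random
        picks[u] += 1
        if picks[u] >= n_picks:
            immunized.add(u)
    return immunized


# Toy example: a star with hub 0 and five leaves.
star = {0: [1, 2, 3, 4, 5], 1: [0], 2: [0], 3: [0], 4: [0], 5: [0]}
print(sorted(acquaintance_immunization(star, n_target=2, n_picks=2, seed=42)))
```

In the star, five of the six possible starting nodes name the hub as their acquaintance, which illustrates why high-degree nodes tend to be selected without the degree ever being measured.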
Deterministic strategies, on the other hand, assume knowledge of the whole network. These strategies determine the order in which the nodes of a network should be immunized to mitigate the epidemic spreading. They rank all nodes according to a given centrality measure, and nodes are targeted by rank, from high to low. Deterministic strategies have proven to be very efficient in controlling epidemic outbreaks. Their only drawback is that they require the global topology of the network, which makes them impractical in large-scale networks. Stochastic strategies, in contrast, require only little information about the network, at the expense of a lower performance compared to deterministic immunization. The standard centrality measures designed for complex networks with non-modular structure highlight different characteristics of the nodes, depending on their objective criteria. The Degree-based strategy targets highly connected nodes (hubs). The immunization of hubs results in a large reduction in network density, which reduces the epidemic diffusion. It is a very efficient strategy in scale-free networks due to the power-law degree distribution. The Closeness-based immunization strategy selects the nodes with the smallest average propagation length in the network as the most influential spreaders. Targeting these nodes may increase the average path length in the network and hence decrease the epidemic propagation. Further, the Betweenness-based strategy immunizes the nodes with the maximum fraction of shortest paths passing through them. These nodes may have a considerable influence in networks in terms of controlling the information flow. Therefore, immunizing these nodes can stop the diffusion between many vertices due to their bridging role in the largest number of paths.
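The ranking idea behind these centrality-based strategies can be illustrated with a short Python sketch covering the Degree and Closeness cases. It is a minimal sketch under stated assumptions: the graph is an adjacency dictionary, closeness is computed by breadth-first search over the nodes reachable from each source, and the function names and the parameter `k` are introduced here for illustration. Betweenness is omitted to keep the example short.

```python
from collections import deque


def degree_centrality(adj):
    """Degree-based score: number of neighbors of each node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def closeness_centrality(adj):
    """Closeness-based score: inverse of the average BFS distance to reachable nodes."""
    scores = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


def immunize_top(adj, scores, k):
    """Remove the k highest-ranked nodes; return (targets, remaining graph)."""
    ranked = sorted(adj, key=lambda v: scores[v], reverse=True)
    targets = set(ranked[:k])
    remaining = {v: [u for u in nbrs if u not in targets]
                 for v, nbrs in adj.items() if v not in targets}
    return targets, remaining


# Toy example: the path 0 - 1 - 2 - 3 - 4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
targets, remaining = immunize_top(path, closeness_centrality(path), k=1)
print(targets)    # {2}: the middle node has the smallest average distance
print(remaining)  # the path splits into the two pieces 0 - 1 and 3 - 4
```

Immunizing the single most central node already disconnects the toy path, which is the effect on path lengths that the Closeness-based strategy aims for.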
Despite the efficiency of these popular immunization strategies (Degree, Closeness and Betweenness-based strategies) in targeting influential nodes, they exhibit some limitations when applied to networks with community structure. According to recent research, community structure strongly affects the epidemic spreading process. Thus, the design of immunization strategies needs to take the community structure into consideration. Stochastic as well as deterministic strategies using information about the community structure have been proposed. They can be classified into two groups according to the community structure model they use. The first group of strategies uses the features of non-overlapping community structure. The second group is based on the properties of overlapping community structure. The most widely known stochastic strategies, together with deterministic strategies that take advantage of the community structure, are recalled in this section.

Fig. 1 Stochastic immunization methods: CBF, DCBF and BHD for non-overlapping communities, and RWOS for overlapping communities.

Stochastic immunization strategies focus on using information at the node level. They identify target nodes without knowledge of the full network structure. That makes them computationally more efficient and more practical in large networks compared to the deterministic strategies. Roughly speaking, these strategies target either the nodes linking the communities (bridges), the highly connected nodes (hubs) or the overlapping nodes, using little or no information about the network topology.

Some researchers assume that bridges are the most influential spreaders. These nodes can propagate the epidemic to the entire network because of their connectivity with various modules. They thus have a global influence on the whole network, and their immunization can prevent effective diffusion to the different parts of the network.
The Community Bridge Finder CBF [11] is an immunization strategy that targets the bridge nodes. It is based on a random-walk algorithm. The community hubs are also believed to have a strong local influence in their communities. Based on this assumption, the Degree Community Bridge Finder DCBF [89] and the Bridge-Hub Detector BHD [12] are two immunization strategies that target highly connected bridge nodes for immunization. The selected bridge nodes in this case also play the role of hubs. The former strategy is a variation of CBF, while the latter is based on expanding friendship circles during a random walk. Other researchers highlight the importance of overlapping nodes for the epidemic spreading dynamics. The Random-Walk Overlap Selection RWOS strategy [90] is proposed to select the overlapping nodes by means of a random walk. These key nodes can play a major role in epidemic diffusion due to their membership in multiple communities. In the following, we present a brief overview of the three stochastic strategies based on non-overlapping community structure and the one tailored for networks with overlapping community structure (refer to Figure 1).

Community Bridge Finder (CBF)

Immunizing highly connected individuals is not always enough to protect networks from large-scale epidemics. Indeed, targeting individuals bridging communities is sometimes more effective than simply immunizing nodes with high degrees. The goal of the CBF strategy [11] is to identify nodes acting as bridges between communities. This strategy is based on random walks. It works as follows:
Step 1. Select a random node $v_{i=0}$.
Step 2. Follow a random walk with the condition that a node has not been visited by the random path before.
Step 3. At each node $v_{i \geq 2}$, check if it is connected to more than one visited node.
If there is just one connection, $v_{i-1}$ is considered as a potential bridge.
Step 4. Select two random neighboring nodes of $v_i$ other than $v_{i-1}$. If both nodes have no connections to the previously visited nodes, the node $v_{i-1}$ is marked as a bridge and it is immunized. Otherwise, the random walk resumes from $v_{i-1}$.

This strategy has been compared to the Acquaintance strategy, defined as follows. At each step, a node is picked at random and one of its acquaintances is randomly selected; nodes that are picked as acquaintances $n$ times are immunized. Extensive experiments were conducted on synthetic and real-world networks using the SIR epidemic model. Results show that CBF mostly outperforms the Acquaintance strategy. Its best performance is obtained in networks with strong community structure (few inter-community links).

Degree Community Bridge Finder (DCBF)

DCBF [89] is a variant of the CBF strategy. The goal of this strategy is to target bridges with a large number of connections. It follows the same steps as the CBF algorithm. The difference is that nodes are not chosen at random among all the possible nodes during the random walk, but according to their degree, from high to low. Two additional checks are also implemented in DCBF to decrease the computation time of the algorithm. First, the number of nodes visited in a running path is kept at ten. Second, the number of visits by all random paths is recorded for each node, and a node is immunized when its number of visits $k$ reaches a certain number ($k = 2$). DCBF has been tested on synthetic networks with various modularity values. SIR epidemic model simulations demonstrate that DCBF performs better than the CBF algorithm in controlling outbreaks. Its performance improves in networks with strong community structure (when the modularity is very high Q > . 
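The four CBF steps listed earlier can be sketched in Python as follows. This is a hedged reading of those steps rather than the authors' implementation. Several details are assumptions made here: the adjacency-dictionary representation, the names `cbf_walk` and `cbf_immunize`, the parameter `max_walks`, the choice to mark a rejected node as visited so that it is not retried, and the rule that a node with no neighbor to probe is rejected.

```python
import random


def cbf_walk(adj, rng):
    """One self-avoiding random walk; returns the node flagged as a bridge, or None."""
    current = rng.choice(list(adj))          # Step 1: random starting node v_0
    path_len = 1                             # number of nodes on the path so far
    visited = {current}
    while True:
        fresh = [u for u in adj[current] if u not in visited]
        if not fresh:
            return None                      # dead end: the walk cannot continue
        candidate = rng.choice(fresh)        # Step 2: next node v_i, never visited before
        if path_len >= 2:                    # Step 3 applies from v_2 onwards
            back_links = sum(1 for u in adj[candidate] if u in visited)
            if back_links == 1:              # the only link back is the one to v_{i-1}
                others = [u for u in adj[candidate] if u != current]
                probes = rng.sample(others, min(2, len(others)))
                # Step 4: the probed neighbors must not connect back to visited nodes.
                if probes and all(visited.isdisjoint(adj[p]) for p in probes):
                    return current           # v_{i-1} is flagged as a bridge
                visited.add(candidate)       # assumption: rejected nodes are not retried
                continue                     # the walk resumes from v_{i-1}
        visited.add(candidate)
        current = candidate
        path_len += 1


def cbf_immunize(adj, n_target, seed=None, max_walks=1000):
    """Repeat walks until n_target bridges are found or max_walks is exhausted."""
    rng = random.Random(seed)
    immunized = set()
    for _ in range(max_walks):
        if len(immunized) >= n_target:
            break
        bridge = cbf_walk(adj, rng)
        if bridge is not None:
            immunized.add(bridge)
    return immunized


# Toy example: two 4-cliques joined by the single link 3 - 4.
two_cliques = {
    0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],
    4: [3, 5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6],
}
print(cbf_immunize(two_cliques, n_target=2, seed=0))
```

In this toy network only the two end points of the inter-community link, nodes 3 and 4, can ever be flagged, because every other step of the walk stays inside a clique where the candidate links back to at least two visited nodes.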
Bridge-Hub Detector (BHD)

Communities are characterized by the heterogeneity in the connections among nodes bridging various communities. Based on this idea, BHD [12] aims to identify bridge hub nodes as targets for immunization. It is based on expanding friendship circles of visited nodes and works as follows:
Step 1. Select a random node $v_{i=0}$.
Step 2. Follow a random walk with the condition that a node has not been visited by the random path before.
Step 3. Let $v_{i \geq 2}$ be the node visited after $i$ steps, and $f_i$ be the set of its neighbors. The node $v_i$ is considered as an immunization target if there is at least one node in $f_i$ that does not belong to the set $F_{i-1}$ and is not linked to any node in $F_{i-1}$, where $F_{i-1} = f_0 \cup f_1 \cup f_2 \cup \dots \cup f_{i-1}$. Otherwise, the random walk moves on from $v_i$, and the friendship circle is updated to $F_i = F_{i-1} \cup f_i$.
Step 4. Among the nodes in $f_i$ that do not belong to $F_{i-1}$ and cannot be connected back to it, one node $v_H$ is randomly picked for immunization.

At the end of this procedure, a pair of nodes, a bridge and a bridge hub, is selected for immunization via the friendship circles of randomly visited nodes. BHD was tested on simulated and empirical data constructed from the Facebook networks of five US universities using the SIR model. It results in a smaller epidemic size compared to the Acquaintance and CBF strategies. In terms of computational time, Acquaintance is the fastest algorithm, followed by CBF and BHD.

Random-Walk Overlap Selection (RWOS)

Overlapping nodes do not necessarily have high centrality measures, yet they can have a major effect in spreading epidemics from one community to another. Indeed, these nodes have access to multiple communities in the network. The RWOS strategy [90] is designed to target the overlapping nodes for immunization by means of a random walk. It can be specified as follows:
Step 1. Define the list of overlapping nodes.
Step 2. 
Select a node of the network at random and run a random walk.
Step 3. Each visited node is nominated as a target for immunization if it belongs to the set of overlapping nodes. This process continues until the desired immunization coverage is reached.

This strategy targets highly connected overlapping nodes for immunization. It is based on the idea that the probability of visiting any node in a random path is proportional to the node degree. RWOS has been investigated on synthetic and real-world networks, on which the standard SIR epidemic model was run. Results show that RWOS outperforms the CBF and BHD strategies in terms of the epidemic size. It sometimes performs even better than the membership strategy (where nodes are immunized according to the number of communities they belong to). Moreover, its performance improves in networks with strong community structure and higher membership values. Note that it uses more information about the community structure, since one needs to know the overlapping nodes.

The stochastic strategies have been investigated on both simulated networks [91, 92] with different community structure and real-world networks. Overall, results show that stochastic strategies based on the community structure are more efficient than the standard stochastic strategies. BHD and DCBF are generally more efficient than the CBF strategy, and BHD displays the best performance among these strategies. Moreover, the difference between their performances increases when the modularity is high, that is, when the communities are well separated from each other. In that case, outbreaks stay restricted to local communities. Consequently, immunizing bridges is not an effective way to control the spreading of epidemics. That explains the poor performance of CBF in networks with strong community structure.
DCBF may at least identify relatively highly connected bridge nodes, which can cause extensive spreading of epidemics. In addition, BHD is capable of identifying bridge nodes with a high number of inter-community links. Therefore, the effectiveness of BHD can be attributed to its better identification of the influential spreaders compared to CBF and DCBF. None of these three strategies takes into account the overlaps between communities. On the other hand, the RWOS strategy, which immunizes overlapping nodes, results in a smaller epidemic size compared to the other stochastic strategies in all the networks. Furthermore, its performance improves as the membership degree of overlapping nodes increases. Thus, overlapping nodes play a major role in spreading infection from one community to another even if they are not necessarily highly connected.

4.2 Deterministic strategies

Deterministic strategies target nodes by ranking them according to a given centrality measure. The centrality of a node reflects its ability to propagate the disease. The procedure of deterministic strategies can be specified as follows:
Step 1. Select a given centrality measure.
Step 2. Compute the centrality for each node of the network.
Step 3. Rank nodes in decreasing order, from the most to the least central node.
Step 4. Target a proportion of nodes with high ranks for immunization.

These strategies require knowledge of the whole network because all the nodes are involved in the process. We now give an overview of some recent deterministic strategies designed for modular networks. They are classified into different categories according to their immunization goals (refer to Figure 2).

4.3 Non-overlapping community structure

A plethora of deterministic immunization strategies have been developed to identify vital nodes in networks with community structure.
They can be classified into three categories (global, local, global and local) in networks with non-overlapping structure.

Fig. 2 Deterministic immunization methods for overlapping and non-overlapping communities (global strategies, local strategies, global and local strategies). Methods shown: GLR, BVA, Community Inbetweenness, Mod strategy, Comm, CbC, CbM, K-shell with community, Global strategy, NNC, CHB, WCHB, Modular centrality, Membership, Overlap Neighborhood, Community centrality, OC, IM-PLA, IVD, Bridgeness, Super-node, DCL.

The first type of strategies highlights nodes with outer connections towards foreign communities. They target bridge nodes, which can have a significant global influence on other nodes of the network. The second category tends to identify nodes with the highest local influence in their own communities. Some strategies target hubs for immunization because of their strong influence on the nodes in their neighborhoods, while others immunize nodes located in the core of the community. The strategies belonging to the third category immunize both types of nodes. They select nodes having both local and global influence in the network.

Bridges can be viewed as individuals that connect different subgroups of nodes in networks. They can let epidemic outbreaks move from one module to another through their inter-community connections. Therefore, they have a major global influence in the entire network. A series of strategies has been proposed to select these critical nodes for immunization. The Module-based strategy (Mod strategy) [93] is proposed to highlight the bridge nodes between communities. It is based on an approximate calculation of the eigenvector centrality of the coarse-grained network (also called the meta-graph).
In this network the communities are represented simply by nodes, and the links are weighted by the number of links between the two communities. It can be specified as follows.

Module-based strategy (Mod strategy): The Mod strategy was proposed by Masuda et al. [93]. Given the community structure of the original network, this strategy is applied on the coarse-grained network, where each community is represented by a single node and edges are weighted by the number of links shared by two neighboring communities. It targets nodes maximizing the following measure:

\begin{equation}
\mathrm{Mod}_k = 2\tilde{u}_K \sum_{I \neq K} d_{kI}\, \tilde{u}_I \tag{11}
\end{equation}

where $\tilde{u}_K$ is the eigenvector component corresponding to the $K$th community, and $d_{kI}$ is the number of inter-community links that exist between node $k$ and the $I$th community. The first factor of this measure (i.e., $2\tilde{u}_K$) quantifies the importance of the community that the node $k$ belongs to, whereas the second quantity (i.e., $\sum_{I \neq K} d_{kI} \tilde{u}_I$) measures its connectivity to other important communities. After immunizing all the bridge nodes, the remaining nodes are ranked according to their degree. This method preferentially targets globally important nodes having important inter-community links rather than community hubs that are locally important. The effectiveness of the Mod strategy is tested by applying it to synthetic and real-world networks of various kinds. Results show that it is in most cases more efficient than the Degree, Betweenness and Ress strategies (the latter being an eigenvector-based strategy [13]) in networks with modular structure.

Different from the above method, Mantzaris [94] proposed the Boundary Vicinity Algorithm BVA.

Boundary Vicinity Algorithm (BVA): This strategy ranks nodes according to their vicinity to the bridge nodes (boundary nodes) of each community. It is defined as follows:
Step 1. Define the set of communities of the network.
Step 2. Extract the set of bridges which connect communities.
Step 3. 
Run a number of random walkers of a chosen fixed number of steps from each bridge node. Then, the number of visits to each node is counted.

This measure quantifies the ability of a given node to propagate epidemics across bridges towards different communities. Using the SI epidemic model, the authors show that the BVA strategy outperforms the Betweenness-based strategy in terms of the epidemic size.

Yoshida et al. proposed the Inverse Vector Density (IVD) [95]. It is another immunization strategy, which does not require the community labels of nodes. Instead, it constructs a vector representation of nodes based on the modularity quality measure. IVD immunizes nodes with a small number of nearby node vectors, which are identified as bridges. This strategy performs better than the Betweenness-based strategy in terms of the Largest Connected Component (LCC). The Bridgeness strategy is proposed by Jensen et al. [96]. It is based on the Betweenness centrality while considering only shortest paths between nodes belonging to different communities. This strategy highlights nodes that connect different regions of a network. Using both synthetic and real-world networks, the Bridgeness strategy is shown to be globally more effective than the Betweenness-based strategy at identifying bridge nodes. Different from the above methods, the Number of Neighboring Communities (NNC) strategy [97] selects nodes which are connected to the largest number of foreign communities, regardless of the number of their inter-community links. It ranks nodes according to the number of neighboring communities that they can reach through at least one link. Indeed, nodes with a high number of neighboring communities are able to disseminate information across the entire network. Experimental results show that the Number of Neighboring Communities strategy outperforms the Degree and the Betweenness-based strategies in terms of the epidemic size.
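The Number of Neighboring Communities score is simple enough to state in code. The following is a minimal sketch, assuming an adjacency dictionary and a known non-overlapping community label per node; the function names and the toy network are introduced here for illustration. A second function counts inter-community links, to show how the two quantities differ.

```python
def neighboring_communities(adj, community):
    """NNC score: number of distinct foreign communities among a node's neighbors."""
    return {
        v: len({community[u] for u in nbrs if community[u] != community[v]})
        for v, nbrs in adj.items()
    }


def inter_community_links(adj, community):
    """For contrast: number of links leaving the node's own community."""
    return {
        v: sum(1 for u in nbrs if community[u] != community[v])
        for v, nbrs in adj.items()
    }


# Toy network with three communities a, b and c.
adj = {0: [1, 2, 3], 1: [0, 2, 4], 2: [0, 1, 3], 3: [0, 2], 4: [1]}
community = {0: "a", 1: "a", 2: "b", 3: "b", 4: "c"}

nnc = neighboring_communities(adj, community)
links = inter_community_links(adj, community)
print(nnc)    # {0: 1, 1: 2, 2: 1, 3: 1, 4: 1}
print(links)  # {0: 2, 1: 2, 2: 2, 3: 1, 4: 1}
```

Nodes 0 and 1 both have two inter-community links, but node 1 reaches two different foreign communities while node 0 reaches only one, so NNC ranks node 1 first.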
The NNC strategy also performs better than some community-based strategies such as the Community Inbetweenness and CbM strategies (see their definition in section 4.3.3). This is particularly true in networks with a community structure of medium strength (i.e., when the proportion of intra-community links is of the same order as the proportion of inter-community links). M. Kitromilidis et al. [98] propose to redefine the traditional centrality measures to characterize the influence of Western artists. This global strategy is based on computing the standard centrality measures by considering only the inter-community links of the networks. Their idea is based on the fact that influential artists usually have connections beyond their artistic movement. The Global Betweenness and Closeness strategies are compared to their classical versions. They were tested on a painter collaboration network. Experimental results show that the Global strategies highlight some influential nodes that might otherwise have been missed, as they do not necessarily rank high under the strategies based on the standard measures.

Hubs are the high-degree nodes, whose number of connections greatly exceeds the average. They are a consequence of the scale-free degree distribution observed in real-world networks. In modular networks, such nodes can be found in all the communities. They thus have a strong local influence on the nodes of their own communities. Newman proposed the Community centrality [10] to identify nodes that play a central role inside communities in terms of the number of connections. These nodes are responsible for the maximum information flow inside their communities. He et al. proposed the Super node strategy [99], which immunizes nodes with the highest number of intra-community links (or with the highest k-core index) belonging to various communities.
Both strategies are described as follows:

Community centrality (CC): Newman proposed a slightly different formulation of the modularity. The Community centrality [10] is derived from the eigenvectors of the modularity matrix. The modularity matrix is divided into two projections. The first dimension represents the eigenvectors of the modularity matrix with positive eigenvalues, while the second dimension represents those with negative eigenvalues. Thus, the modularity can be written in terms of these vectors as follows:

\begin{equation}
Q = \sum_{k=1}^{c} |X_k|^2 - \sum_{k=1}^{c} |Y_k|^2 \tag{12}
\end{equation}

where $c$ is the number of communities, and $X$ and $Y$ are the community eigenvectors in both dimensions. The $i$th node in the community $k$ is represented by two vectors $x_i$ and $y_i$ (the $i$th rows of $X_k$ and $Y_k$, respectively).

The magnitude of a node vector $|x_i|$ specifies how central the node $i$ is in its community in terms of the number of connections. Thus, the node $i$ has a large positive contribution to the modularity when this measure is large. On the other hand, a higher value of $|y_i|$ means that the node $i$ has many connections to nodes from foreign communities. Therefore, the Community centrality is defined to be equal to the vector magnitude $|x_i|$. It measures the strength with which a given node $i$ is assigned to its community. This measure has been tested on a co-authorship network between scientists. Results show that it is not well correlated with the degree centrality: some nodes with a high Community centrality have a relatively low degree, but they have more connections with nodes of their own communities. Thus, nodes with a high Community centrality value play a central role in the spreading process in their local neighborhood.

Super node strategy: This strategy starts by ranking communities in decreasing order according to their size. After that, the node with the largest inner degree is selected from the largest community.
Then, the node with the highest inner degree in the second largest community which does not have any connections with the previous communities is selected as the second spreader. Note that there is only one previous community for the second spreader. After visiting all the communities of the network, this process is restarted until the desired number of immunized nodes is reached. The goal of this method is to select multiple spreaders from different communities in a balanced way. SIR simulations are performed on both synthetic and real-world networks. Experimental results show that the Super node strategy results in a smaller epidemic size compared to the Degree-based strategy. Additionally, the Super node strategy also proved efficient when the k-shell decomposition method is used to find the influential spreaders in each community.

The immunization strategies in this category tend to target nodes that have both local and global influence. They combine the various aspects of the previous strategies to select the most influential nodes in the network. These nodes are supposed to be the main spreaders in their communities, which can also disseminate the epidemics towards other modules of the network. Community Inbetweenness [100] together with the CbC strategy [101] select the hub-bridge nodes for immunization. They can be defined as follows:

Community Inbetweenness strategy: The classical betweenness requires solving the all-pairs shortest path problem, which makes it infeasible in large networks. The Community Inbetweenness strategy [100] is proposed to solve this problem. It is based on an entropy-based measure which approximates the betweenness centrality. It ranks nodes based solely on community information. This strategy evaluates node importance according to the proportion of its surrounding links in addition to the external links connecting it with foreign communities.
The Community Inbetweenness centrality $C_{CI}$ is defined as follows:

\begin{equation}
C_{CI}(i) = k_i \sum_{c \in C} p_{i \to c} \log\left(\frac{1}{p_{i \to c}}\right) \tag{13}
\end{equation}

where $k_i$ is the degree of node $i$, $p_{i \to c}$ is the proportion of links connecting node $i$ to the community $c \in C$, and $C$ is the set of non-overlapping communities. Community Inbetweenness tends to select nodes with high connectivity and with more links to different communities. It is based on the idea that nodes with a high betweenness measure are usually located between densely connected modules. These nodes are also targeted by the standard betweenness centrality. Simulation results on real-world networks show that this strategy is more efficient than the betweenness-based strategy in terms of computational performance. Both strategies are also tested with the SIR model in [14] to compare their epidemic size. Results show that Community Inbetweenness performs almost as well as the betweenness in networks with strong community structure. It is, however, more efficient in networks with loose community structure.

Community-based Centrality (CbC): This strategy selects nodes for immunization according to the characteristics of their links and the size of their communities. It targets nodes that have a big impact in their communities and that can spread epidemics to nodes from other communities. It is based on a measure that evaluates the importance of node $i$ via the following formula:

\begin{equation}
CbC_i = \sum_{c=1}^{m} d_{ic} \frac{S_c}{N} \tag{14}
\end{equation}

where $d_{ic}$ is the number of links between node $i$ and other nodes in community $c$, $m$ is the number of communities in the network, $S_c$ is the number of nodes in community $c$, and $N$ is the size of the network. Simulation results using the SIR model show that CbC outperforms some traditional measures such as Degree and K-shell.
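Both measures use only a node's own links and the community labels, which a short Python sketch makes concrete. It is a sketch under stated assumptions: the Community Inbetweenness is read in its entropy form, with each community contributing p log(1/p); N is taken to be the number of nodes; and the adjacency-dictionary representation, function names and toy network are introduced here for illustration.

```python
import math
from collections import Counter


def community_inbetweenness(adj, community):
    """C_CI(i) = k_i * sum_c p_ic * log(1 / p_ic), assuming the entropy form."""
    scores = {}
    for v, nbrs in adj.items():
        k = len(nbrs)
        # Number of links from v into each community (its own included).
        counts = Counter(community[u] for u in nbrs)
        entropy = sum((n / k) * math.log(k / n) for n in counts.values())
        scores[v] = k * entropy
    return scores


def community_based_centrality(adj, community):
    """CbC_i = sum_c d_ic * S_c / N, with N assumed to be the number of nodes."""
    size = Counter(community.values())   # S_c: number of nodes in community c
    n_nodes = len(adj)
    return {
        v: sum(size[community[u]] for u in nbrs) / n_nodes
        for v, nbrs in adj.items()
    }


# Toy network with three communities a, b and c.
adj = {0: [1, 2, 3], 1: [0, 2, 4], 2: [0, 1, 3], 3: [0, 2], 4: [1]}
community = {0: "a", 1: "a", 2: "b", 3: "b", 4: "c"}

print(community_inbetweenness(adj, community))
print(community_based_centrality(adj, community))
```

Node 1, whose three links go to three different communities, gets the largest Community Inbetweenness (3 log 3), while node 4, with a single link, scores zero. CbC instead favors nodes 0 and 2, whose links all point into the two largest communities.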
Moreover, CbC can also better reflect node importance as compared to Closeness, Betweenness and Eigenvector centralities, with much lower computational complexity.

The Comm strategy was proposed by Gupta et al. [102] [103]. The aim of this strategy is to target nodes that are at the same time hubs in their communities and bridges towards other communities. It ranks nodes according to a degree-based measure. This measure is a weighted combination of the number of intra-community links and the square of the number of inter-community links, which accounts for the importance of bridge nodes. Results on synthetic and real-world networks show that the Comm strategy is more effective than, or at least works as well as, the Module-based immunization strategy and the Degree and Betweenness based strategies.

Community-based Mediator (CbM) [104] is another strategy that takes into account the internal and external density of each node. They represent the proportion of the intra-community and the inter-community links of a given node, respectively. This strategy is based on the entropy computed from both densities. It uses this information to select individuals that can propagate the epidemic in their own community (internal density) and in other communities (external density). Experimental results demonstrate that nodes with a high CbM value have a greater effect on epidemic spreading in networks than nodes having a high CbC, Betweenness, Degree, PageRank or Eigenvector value.

Luo et al. [105] also proposed the k-shell with community strategy, designed for networks exhibiting a community structure. It is based on the idea that the location of a node has a big impact on the spreading process. It is a variation of the k-shell decomposition strategy, in which the decomposition method is applied to the intra-community and the inter-community links separately. The goal is to select for immunization hubs and bridges that are located in the core of the network.
Results of SIR simulations performed on a Facebook network show that it outperforms the traditional k-shell decomposition and the Betweenness and Degree based strategies.

Salavati et al. [106] proposed an improved version of the Closeness-based strategy designed for modular networks. It also decreases the high computational complexity of the standard closeness method. The so-called Gateway Local Rank strategy (GLR) starts by ignoring the connections between communities. Then, in each community, one critical node is extracted using the betweenness centrality. After that, the node with the highest number of inter-community links is also extracted from each community. In the last step, nodes are ranked based on the sum of their shortest paths to the extracted core and bridge nodes, instead of computing their shortest paths to all the nodes of the network. Experiments on synthetic and real-world networks using the SIR diffusion model demonstrate the effectiveness of the GLR strategy in comparison with the Closeness, Degree, Betweenness and k-shell based strategies.

Berahmand et al. [107] proposed the Degree, Clustering coefficient and Location strategy (DCL). It immunizes the best spreaders based on a combination of the degree and the inverse clustering coefficient of a given node. These two measures are also combined with the degree of its neighbors and the common links between the node and its neighbors to define the location of a node (whether it is in the core or the periphery of the community). This strategy allows identifying low-degree bridges and some critical hub nodes. Comparisons based on the SIR and the SI models reveal that the proposed method outperforms well-known strategies such as the Degree, Betweenness, Eigenvector, PageRank and k-shell based strategies.

The Community Hub-Bridge strategy [97] is based on a linear measure. It is a weighted combination of the number of intra-community links and the number of inter-community links.
The first term of this measure is weighted by the size of the community. The aim is to prioritize the immunization of hubs located in large communities because of their large influence. The second term of the expression is weighted by the number of neighboring communities, in order to target in priority bridges having connections with multiple communities. According to SIR simulations performed on synthetic and real-world networks, this strategy is more efficient than the Number of Neighboring Communities, Community Inbetweenness, CbM and Comm strategies. It is particularly suited for networks with strong community structure (having a small proportion of inter-community connections).

The Weighted Community Hub-Bridge strategy [97] is a variant of the previous strategy. It is based on a linear measure that is also weighted by the density of the inter-community links. It is weighted such that, in networks with strong community structure, more importance is granted to bridges, while in networks with loose community structure more importance is given to the local community hubs. Experimental results show that it outperforms the previous strategy, notably in networks with loose community structure.

The above-mentioned immunization strategies are based on measures that quantify either the global influence of nodes by selecting bridge nodes, or the local influence of nodes by targeting community hub nodes. Other centrality measures highlight nodes having both local and global influence for immunization. The modular centrality considers two types of influence for a node in a modular network: a local influence on the nodes belonging to its own community through the intra-community links, and a global influence on the nodes of the other communities through the inter-community links.
Therefore, in this approach, centrality measures are not represented by a simple scalar value but rather by a two-dimensional vector, the so-called Modular centrality [108]. Its first component measures the local influence of the node, while the second component measures its global influence. The Modular centrality is computed in two stages. The global component of the vector is computed on the global network obtained by removing all the intra-community links from the original network. Remaining isolated nodes are also removed. The local component is computed on the local graph obtained by removing all the inter-community links from the original network. The Modular centrality is computed according to the following algorithm:

Step 1. Choose a standard centrality measure $\beta$.
Step 2. Remove all the inter-community edges from the original network $G$ to obtain the set of communities $C$ forming the local network $G_l$.
Step 3. Compute the local measure $\beta_L$ for each node in its own community.
Step 4. Remove all the intra-community edges from the original network to reveal the set of connected components $S$ formed by the inter-community links.
Step 5. Form the global network $G_g$ based on the union of all the connected components. Isolated nodes are removed from this network and their global centrality value is set to 0.
Step 6. Compute the global measure $\beta_G$ of the nodes linking the communities based on each component of the global network.
Step 7. Add $\beta_L$ and $\beta_G$ to the Modular centrality vector $B_M$.

This approach makes it possible to extend all the standard centrality measures designed for non-modular networks to networks with non-overlapping community structure. A series of experiments have been performed on both real-world and synthetic networks using the SIR model in order to investigate the efficiency of the Modular centrality.
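The steps above can be sketched in Python for the simplest choice of $\beta$, the degree centrality. This is only an illustration under stated assumptions: the network is an undirected adjacency dictionary without self-loops, the partition is a node-to-community map, and the names `split_network`, `degree` and `modular_centrality` are hypothetical helpers rather than the implementation used in [108].

```python
# Toy network (hypothetical): two triangles joined by the bridge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
community = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

def split_network(adj, community):
    """Steps 2, 4 and 5: build the local network G_l and the global network G_g."""
    local = {u: [v for v in nbrs if community[v] == community[u]]
             for u, nbrs in adj.items()}
    glob = {u: [v for v in nbrs if community[v] != community[u]]
            for u, nbrs in adj.items()}
    # Nodes left isolated once intra-community links are removed are dropped.
    glob = {u: nbrs for u, nbrs in glob.items() if nbrs}
    return local, glob

def degree(graph, node):
    """Step 1 (assumed choice): beta is the degree centrality."""
    return len(graph[node])

def modular_centrality(adj, community, beta=degree):
    """Steps 3, 6 and 7: map each node to the vector (beta_L, beta_G)."""
    local, glob = split_network(adj, community)
    # Nodes absent from the global network get a global value of 0.
    return {u: (beta(local, u), beta(glob, u) if u in glob else 0) for u in adj}

vectors = modular_centrality(adj, community)
```

In this toy example the bridge endpoints 2 and 3 are the only nodes with a nonzero global component, while every node has a local component of 2. Any other centrality can be passed as `beta`, provided it accepts a graph and a node.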
Results show that the Local measure is more efficient in networks with strong community structure, while the Global measure performs better in networks with a weak community structure. Furthermore, the measure that combines both components outperforms the local, the global and the classical measure. Recently, this work has been extended to networks with overlapping community structure [109].

Compared with stochastic immunization strategies, deterministic strategies (e.g., Comm, CbM, CBH, WCBH and NNC) yield a smaller epidemic size than the CBF and BHD methods in all the networks. Indeed, stochastic strategies only use information about the current node, while deterministic strategies require access to the whole network structure. That explains why the performance of stochastic strategies is usually far from that of the deterministic ones. To compare the performance of deterministic strategies, we consider two extreme cases: networks with well-defined community structure and networks with weak community structure. In the first case, the communities are well separated from each other. Hence, there are few inter-community connections between the different modules of the network. The local strategies have proven to be more efficient than the global strategies in such networks. The Super node strategy outperforms some global strategies such as the global betweenness method. Indeed, there is a high chance that the epidemic stays confined inside the communities because of the small number of inter-community links. Therefore, immunizing hub nodes or community core nodes appears to be the most efficient way to stop the epidemic diffusion in networks with strong community structure. In networks with medium or unclear community structure, there is a large number of inter-community connections. The epidemic in this case can move easily from one community to another. Thus, bridge nodes may play a major role in the diffusion process.
That explains the efficiency of the global strategies as compared to the local ones in these networks. The Number of Neighboring Communities (NNC), for instance, is more efficient than the local degree and the super node strategies. The combination-based strategies, on the other hand, target both types of nodes. They are overall more efficient than both local and global strategies in networks with different structures. Some strategies such as CbM, CBH and WCBH outperform the super node, the local and the global betweenness and degree-based strategies. Furthermore, WCBH has proven to be more efficient than some other combination-based strategies (e.g., Comm, CbM and CbC). This strategy uses different levels of information about the topological properties of the community structure, such as the size of communities, the number of neighboring communities of each node and the proportion of inter-community links of each community. Thus, it uses more information about the community structure than the other strategies. This suggests that the performance of the immunization strategies increases when more information about the community structure is used.

These assumptions led to the introduction of the Modular centrality, which is a bi-dimensional vector measuring both the local and the global influence of each node in the network. This approach, investigated for some classical centrality measures (Degree, Betweenness, Closeness and Eigenvector), shows that the Local measure is more efficient in networks with strong community structure, while the Global measure performs better in networks with loose community structure. Moreover, ranking strategies combining both components of the Modular centrality perform better than those using only one component.
Furthermore, even better results were obtained by using more information related to the topological properties of the community structure. These experimental results for the Modular centrality confirm the ones obtained with the alternative deterministic strategies.

4.4 Overlapping community structure

Communities can often overlap in real-world networks. In this case, nodes can belong to more than one community at once. Identifying such overlapping nodes is crucial for controlling the epidemic spreading. These nodes can extend the epidemic diffusion across all communities to which they belong. Some strategies select these nodes for immunization. Hebert et al. [110] proposed a straightforward strategy which directly counts the membership number of each node in the network. Chakraborty et al. [111] analyze how immunization based on the membership number of overlapping nodes affects the size of the largest connected component. OverlapNeighborhood (ON) [112] is another strategy, which targets the neighbors of the overlapping nodes for immunization. It is based on the idea that overlapping nodes are connected to many hub nodes located in the different communities to which they belong. The Membership and OverlapNeighborhood strategies are defined as follows:

Membership strategy: This strategy [110] is applied to networks with overlapping modular structure. It is based on a measure that simply counts the number of communities to which a node belongs. If the membership of a node $i$ is greater than 1, this node belongs to an overlapping region of the network. Experimental results using the SIR model have shown that this strategy outperforms degree, coreness and betweenness-based strategies in networks with denser communities and at higher infection rates.

OverlapNeighborhood strategy (ON): This method [112] selects the immediate neighbors of overlapping nodes as the top influential spreaders.
Its main objective is to select the most highly connected nodes using a limited amount of information at the community level. Indeed, there is a high probability that nodes with a very high number of connections are neighbors of overlapping nodes, since the latter are part of more than one community. This is also due to the power-law degree distribution in real-world networks. The simulation results revealed that this method outperforms the CBF, BHD and RWOS methods. It performs as well as or better than Degree and Betweenness centrality based methods while using less information about the overall network.

The Overlapping constraint coefficient (OC) [113] is an immunization strategy that highlights the influential nodes based on the product of two measures. The first measure represents the membership of a given node, which quantifies its propagation capacity: the more communities a node belongs to, the more communities the node can influence. The second measure represents the network constraint coefficient of the node, which quantifies its propagation speed in the communities. SIR simulations demonstrate that the Overlapping constraint coefficient strategy outperforms the Degree, Betweenness, Closeness and k-shell based strategies.

The Influence Maximization based on Label Propagation Algorithm (IM-LPA) [114] is another strategy designed for networks with overlapping communities. It is based on an improved version of the Label propagation algorithm [115]. It operates in two phases: the seeding phase and the label propagation phase. At the beginning of the seeding phase, the set of seed nodes is empty and all the nodes of the network are considered as candidate nodes. After that, the node with the highest degree is added to the seed set and all its neighbors are removed from the candidate node set. This process is repeated until the candidate node set becomes empty. This phase guarantees that the selected seed nodes are independent from each other.
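The seeding phase described above can be sketched as follows. This is a hedged illustration, not the code of [114]: it assumes an undirected adjacency dictionary with comparable node identifiers, uses the degree in the full network, removes the chosen seed itself from the candidate set, and breaks ties by the smaller node identifier. The function name `select_seeds` is hypothetical.

```python
# Toy network (hypothetical): two triangles joined by the bridge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}

def select_seeds(adj):
    """Greedy seeding: repeatedly pick the highest-degree remaining candidate."""
    candidates = set(adj)
    seeds = []
    while candidates:
        # Highest degree first; ties go to the smaller node id (an assumption).
        best = min(candidates, key=lambda u: (-len(adj[u]), u))
        seeds.append(best)
        # The seed and all its neighbors leave the candidate set.
        candidates -= {best, *adj[best]}
    return seeds

seeds = select_seeds(adj)
```

Because every neighbor of a chosen seed is discarded before the next pick, no two seeds can be adjacent, which is the independence property mentioned in the text.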
In the label propagation phase, each seed node is associated with a unique label. Then, the labels expand from the seed nodes until they cover all the other nodes of the network. Nodes may have several labels. Thus, they can belong to several communities. At the end of this process, the centrality of each node can be measured by the number of nodes associated with its label. Nodes with the highest measure can propagate the epidemic to a large set of nodes of their communities. Simulations with the Independent Cascade (IC) diffusion model were performed on both synthetic and real-world networks. Results demonstrate the efficiency of the IM-LPA strategy in identifying the influential spreaders as compared to the Degree, Betweenness, Closeness, K-shell and PageRank-based strategies.

In complex networks, community structures are widely observed. Although this property is well recognized, it is very often ignored when developing new techniques in the field. In this paper, we consider three hot topics linked to the community structure of complex networks. The first one focuses on the fundamental issue of community detection in static networks. The second one discusses the same issue for temporal networks. Finally, the third one examines immunization strategies designed for modular networks.

After the introduction, the second section focuses on static networks, in which detecting communities can be viewed as partitioning the network into clusters in which the nodes are more densely connected to each other than to the nodes in the rest of the network. In this section, we look at community detection based on this fundamental assumption about community structure.

In summary, the current state of the art in this area is as follows. One systematic approach to community detection is to select a metric of community quality and maximize it.
Several of such metrics [4, 24, 116–119] are variants or improvements based on the modularity metric of community structure, which measures the difference between the observed fraction of edges within a community and the fraction expected in a random graph with the same number of nodes and the same degree sequence. That gave rise to modularity maximization [1] as one of the state-of-the-art methods for community detection. However, it suffers from the so-called resolution limit problem [25, 120], a tendency of standard modularity to increase when some small well-formed communities are combined into inappropriately large clusters, while some large well-formed communities are split into smaller ones. Some of the above-mentioned variants of the modularity function have been proposed either to resolve this problem [116, 121] or to enable detection of communities at different scales [117–119]. A popular choice for the latter is the generalized modularity of Reichardt and Bornholdt [24], which scales the discovered community sizes according to a simple resolution parameter. This parameter is not fixed in the definition of the generalized modularity. Hence, many approaches [122–124] try different values of the resolution parameter to find proper community structures in real networks. When the resolution parameter is set to one, the generalized modularity reduces to the traditional modularity.
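To make the role of the resolution parameter in the generalized modularity concrete, the sketch below evaluates it in the commonly written form $Q(\gamma) = \sum_c \left[ e_c/m - \gamma \, (d_c/2m)^2 \right]$, where $e_c$ is the number of edges inside community $c$, $d_c$ is the total degree of its nodes, and $m$ is the number of edges. The symbols $e_c$ and $d_c$, the function name and the toy graph are assumptions introduced here for illustration, and the sketch only covers unweighted, undirected graphs without self-loops.

```python
from collections import defaultdict

# Toy network (hypothetical): two triangles joined by the bridge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
community = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}

def generalized_modularity(edges, community, gamma=1.0):
    """Q(gamma) = sum_c [ e_c / m - gamma * (d_c / (2 m))**2 ]."""
    m = len(edges)
    internal = defaultdict(int)    # e_c: edges with both endpoints in c
    degree_sum = defaultdict(int)  # d_c: sum of degrees of the nodes in c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - gamma * (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)

q_standard = generalized_modularity(edges, community)           # gamma = 1
q_fine = generalized_modularity(edges, community, gamma=2.0)    # larger gamma
```

With `gamma=1.0` the function returns the traditional modularity. Increasing `gamma` strengthens the penalty term, which is why larger values tend to favor smaller communities.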
Another drawback of this approach is that the stochastic block model requires the selection of the number of communities, because selecting a large number of blocks always leads to a high likelihood of generating the observed network. Therefore, recent works [27, 28, 30] adopt Bayesian model selection to find the appropriate number of communities in a network. According to Occam's razor, this approach also minimizes the description length (MDL) of the block model [29, 30], so that the community detection algorithm finds the most suitable number of communities.

An extension of this model [19] introduces the so-called degree-corrected stochastic block model, in which the node degrees are also used as parameters, making the expected node degree in the model equal to the observed node degree. Since the nodes in the same community tend to have broad degree distributions, this simple yet effective extension improves the performance of the models for statistical inference of community structure in real-world networks. The degree-corrected planted partition model is a special case of the degree-corrected stochastic block model.

Recently, Newman [5] proved a partial equivalence of the two approaches by showing that modularity maximization is equivalent to the maximum-likelihood estimation (MLE) of the degree-corrected planted partition model on the same graph. Lu and Szymanski [6] established asymptotic theoretical upper and lower bounds on the resolution parameter of generalized modularity. When the upper bound is larger than the lower one, there is a resolution parameter that avoids the modularity resolution problem in the corresponding network. The open question now is how to proceed if the upper bound is smaller than the lower one.

An alternative approach to metric maximization is statistical inference, which fits a generative model to the observed network data.
Such an approach assumes that the observed network is produced by a random graph model with a pre-defined partition of the network as the model parameter. In general, statistical inference aims at recovering the partition that maximizes the likelihood of the random graph model generating the observed network data. One widely used generative model for community structure is the stochastic block model [19], where nodes are organized in blocks and edges are placed between the nodes independently at random, with a probability depending on the block assignments of the endpoints. The weakness of this approach is that the model considers nodes in the same block statistically indistinguishable from each other, so the most likely block assignment often groups nodes of similar degrees in a block, resulting in lower-degree and higher-degree blocks rather than the traditional community structures. Moreover, the inference is much more complicated than maximizing generalized modularity. One of the reasons is that the current versions of the stochastic block model search through the large space of potential solutions containing both assortative and disassortative structures [125]. Consequently, inference algorithms using these models are often trapped in a solution unsuitable for the user, and it takes them a long time to escape. To address this issue, the authors of [7] apply a simple constraint on the nodes' internal degree ratio in the objective function.

Despite the significant progress made towards community detection using fundamental properties of the communities, provably optimal algorithms are still beyond our reach for the approaches based on modularity maximization. The current open question is how to proceed if, for the network in question, no single resolution parameter exists that will allow modularity maximization to avoid anomalies.
At least there is now a simple test, introduced in [6], that allows for detecting such cases and proposes a method for finding a solution free of such anomalies.

In the third section of the paper, we briefly overview the most popular approaches and recent advances in the field of evolving community detection. Nowadays, time-stamped or time-dependent data on networked systems are becoming widely available, hence the scientific interest in the study of time evolving networks is increasing. Locating communities in time-dependent networks is a non-trivial and challenging problem, for which an impressive number of different solutions have been proposed.

A relatively straightforward idea is to represent the time evolving network as a sequence of static snapshots, and apply one of the well-known static community detection algorithms to the series of static graphs, as was done in Refs. [35–40]. Naturally, the obtained communities have to be matched at subsequent time steps in order to obtain time evolving clusters. The advantage of this approach is that basically any static community finding method can be used. The drawback is that the matching part can become complicated and the threads of the evolving communities may turn needlessly intricate.

In contrast to snapshot-based methods, the concept of evolutionary algorithms treats the inference of the time-dependent communities in a unified framework. Indeed, in this case, the structure of a community at a given time step $t$ can be influenced by information coming from other time steps as well [44–54]. A popular approach along this line is to formulate, as a single optimization problem, the aim for a smooth evolution over time together with the goal of obtaining precise communities reflecting the true modular structure of the network at any time point.
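As an illustration of such an optimization problem, one common way to write the objective at time step $t$ is a convex combination of a snapshot term and a temporal term. The symbols $CS$, $CT$, $\mathcal{P}_t$, $G_t$ and $\alpha$ below are introduced here for illustration only, and the individual methods cited above differ in how they define each term.

```latex
\mathrm{cost}_t \;=\; \alpha \, CS\!\left(\mathcal{P}_t, G_t\right)
\;+\; (1-\alpha)\, CT\!\left(\mathcal{P}_t, \mathcal{P}_{t-1}\right),
\qquad 0 \le \alpha \le 1
```

Here $CS$ is assumed to measure how poorly the partition $\mathcal{P}_t$ fits the current snapshot $G_t$, and $CT$ how much it deviates from the previous partition $\mathcal{P}_{t-1}$. Setting $\alpha = 1$ recovers independent snapshot clustering, while smaller values of $\alpha$ enforce a smoother evolution.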
Further methods following a similar track are based on multislice networks [55], consensus clustering [56], or generative models such as the stochastic block model [57–62].

An idea closely related to the above is incremental clustering [63–76], where only time steps in the past are taken into account when extracting the communities at a given date $t$. Although this is a somewhat more restrictive setup than evolutionary clustering, the advantage of this approach is that it enables, in principle, the online clustering of networks [77–80]. Besides online community detection, the concept of forecasting future events and changes in time-dependent communities is also gaining considerable interest [81–87].

Partly due to the large number of different methods, providing a well-controlled benchmark system on which the proposed algorithms can be tested and compared has become a very important challenge as well. This problem is also relevant from other perspectives, such as measuring the quality of the obtained evolving communities. A number of important first steps have already been made in this direction, such as the introduction of the time-dependent version of the static Girvan-Newman benchmark [126] in Ref. [46], the dynamic modification of the static LFR benchmark [91] in Ref. [38], and the proposition of a benchmark based on a time evolving stochastic block model in Ref. [127]. Furthermore, the problem can also be brought into a more general context with the concept of multilayer community benchmarks [128], while tailor-made benchmarks specific to a given problem or method can also be well motivated [129].

Nevertheless, how to measure and compare the performance of evolutionary community finding algorithms is a highly non-trivial question, on which further advances can be expected in the future. What makes the problem especially difficult is the rather diverse nature of both the time evolving networks and the applied methods.
There are systems where we find quite large variations in the network structure across subsequent time steps, whereas other networks show a gradual, significantly smoother evolution in time. As for the proposed algorithms, some methods concentrate more on the accuracy of the obtained communities, whereas others focus instead on the smoothness and coherence of the evolution. Consequently, defining, e.g., a quality function analogous to modularity is far from trivial, and bringing the field to a common ground in terms of benchmarks and comparison is an interesting and important challenge for the future.

In the fourth section, we look at how the community structure affects the diffusion process of epidemics, and how to use information about the community structure in order to design effective immunization strategies to control epidemics in modular networks. We can distinguish two main approaches to this issue. The first is the stochastic approach, which is beneficial when little is known about the full network structure or when the networks are too large to compute features for each node. However, in general, the second approach, which uses deterministic strategies, outperforms the stochastic strategies. Overall, the works presented above demonstrate that it is important to consider the community structure of real-world networks to develop more suitable immunization strategies. Some stochastic strategies are designed to target the nodes linking the communities (bridges), because such nodes connect to many parts of the network. Others concentrate on the highly connected nodes (hubs). A third type of strategy targets both bridges and hubs. Overall, their effectiveness depends on the strength of the community structure. Indeed, the best strategies are the ones that give more importance to the hubs when there is a small proportion of links between the communities.
But when the proportion of inter-community links increases, it is better to immunize the bridge nodes first. So, there is a need for new stochastic strategies that can adapt to both situations and can be tuned according to the strength of the community structure. In fact, the performance of stochastic strategies increases when additional knowledge about the community structure of the network is utilized by the algorithm. Overall, deterministic strategies are more sophisticated than stochastic strategies, since they can easily exploit knowledge about the network topology. We classified them into three categories. Local strategies concentrate on the information inside the communities, while global strategies use the information between the communities. Finally, combined global and local strategies exploit both types of knowledge. We observe the same behavior as the one observed with stochastic strategies. Indeed, local strategies outperform the global strategies in networks with well-defined community structure, while global strategies are more effective in networks with loose community structure. Strategies exploiting both aspects generally perform better. Indeed, they incorporate in their definition additional information about the community structure as compared to local or global strategies. Finally, we believe that the modular centrality framework is very promising. It gives a clear idea of how to use both local and global knowledge of the community structure. Additionally, as there is no constraint on the centrality used or on the way to combine both dimensions, there is room for improvement.

In networks with overlapping communities, immunization strategies also take into account the overlapping nodes, which belong to multiple communities. These strategies show the importance of these nodes and their role in the spread of infections. The OC strategy has proven to be the most effective deterministic strategy based on overlapping nodes.
Indeed, this strategy considers additional information about the community structure as compared to the Membership, OverlapNeighborhood and IM-LPA strategies. It is a combination-based method. It targets nodes that have access to multiple communities and a high propagation speed in these communities. The stochastic strategy RWOS compares well with its alternatives. However, we cannot call it a purely stochastic strategy, because the overlapping nodes need to be known or estimated.

All of these works give us a sense of direction for designing new immunization strategies tailored to the network topology. The community structure cannot be ignored, and much more knowledge about the formation of the communities and their main features [130] needs to be uncovered and integrated into the immunization strategies in order to better identify the influential nodes. One of the main challenges is to initiate research on semi-stochastic strategies such as RWOS. Indeed, stochastic strategies are the most suitable ones when the network is partially unknown, or too large for its community structure to be uncovered. However, adding information about the community structure makes them more effective. That is why the main avenue for improvement lies between the effectiveness of the deterministic strategies and the computational efficiency of the stochastic strategies.

Declarations

Availability of data and material
All data used in this article are publicly available at the websites cited in the references. No program source code is described in the paper.

Competing interests
The authors declare that they have no competing interests.

Funding
GP was partially supported by the European Union's Horizon 2020 Research and Innovation Programme under Grant Agreement No. 740688 and by the National Research, Development and Innovation Office under Grant No.
K128780. BKS was partially supported by the Army Research Laboratory under Cooperative Agreement No. W911NF-09-2-0053 (the Network Science CTA), and the Office of Naval Research (ONR) Grant No. N00014-15-1-2640.

Acknowledgment
The authors wish to acknowledge partial support from the European Union's Horizon 2020 Research and Innovation Programme under Grant Agreement No. 740688, from the Hungarian National Research, Development and Innovation Office under Grant No. K128780, from the U.S. Army Research Laboratory under Cooperative Agreement No. W911NF-09-2-0053 (the Network Science CTA), and from the U.S. Office of Naval Research (ONR) Grant No. N00014-15-1-2640.

Authors’ contributions
All authors conceived of the ideas of the study. BKS and XL prepared and wrote section 2, titled “The random graph models for community detection”. GP prepared and wrote section 3, titled “Time evolving communities”. HC prepared and wrote section 4, titled “Immunization strategies”. All authors prepared and wrote section 1, titled “Introduction”, and section 5, titled “Summary and Conclusions”. All authors read, edited and approved the final manuscript.

Authors’ information
Hocine Cherifi is a Professor of Computer Science at the University of Burgundy, Dijon, France.
Gergely Palla is a Senior Research Associate in the Statistical and Biological Physics Research Group of the Hungarian Academy of Sciences at the Eötvös University, Budapest, Hungary.
Boleslaw K. Szymanski is the Director of the Network Science and Technology Center, the Claire and Roland Schmitt Distinguished Professor of Computer Science, and a Professor of Physics at the Rensselaer Polytechnic Institute.
Xiaoyan Lu is a fourth-year graduate student at the Network Science and Technology Center and the Department of Computer Science at the Rensselaer Polytechnic Institute.

References
1. M.E. Newman, Modularity and community structure in networks, Proc. Nat. Acad. Sci. (23), 8577 (2006)
2. A. Clauset, M. Newman, C.
Moore, Finding community structure in very large networks, Phys. Rev. E , 066111 (2004)
3. V.D. Blondel, J.L. Guillaume, R. Lambiotte, E. Lefebvre, Fast unfolding of communities in large networks, Journal of Statistical Mechanics: Theory and Experiment (10), P10008 (2008). DOI 10.1088/1742-5468/2008/10/p10008
4. M. Chen, K. Kuzmin, B. Szymanski, Community detection via maximization of modularity and its variants, IEEE Trans. Computational Social Systems (1), 46 (2014)
5. M. Newman, Equivalence between modularity optimization and maximum likelihood methods for community detection, Phys. Rev. E (5), 052315 (2016)
6. X. Lu, B. Szymanski, Asymptotic resolution bounds of generalized modularity and statistically significant community detection, Information Sciences (2020, to appear, available arXiv:1902.04243)
7. X. Lu, B. Szymanski, Adaptive modularity maximization via edge weighting scheme, Scientific Reports , 13247 (2019)
8. Z. Wang, C.T. Bauch, S. Bhattacharyya, A. d’Onofrio, P. Manfredi, M. Perc, N. Perra, M. Salathé, D. Zhao, Statistical physics of vaccination, Physics Reports , 1 (2016)
9. D. Helbing, D. Brockmann, T. Chadefaux, K. Donnay, U. Blanke, O. Woolley-Meza, M. Moussaid, J. Anders, J. Krause, S. Schutte, M. Perc, Saving human lives: What complexity science and information systems can contribute, Journal of Statistical Physics (3), 735 (2015)
10. M.E. Newman, Finding community structure in networks using the eigenvectors of matrices, Physical Review E (3), 036104 (2006)
11. M. Salathé, J.H. Jones, Dynamics and control of diseases in networks with community structure, PLoS computational biology (4), e1000736 (2010)
12. K. Gong, M. Tang, P.M. Hui, H.F. Zhang, D. Younghae, Y.C. Lai, An efficient immunization strategy for community networks, PloS one (12), e83489 (2013)
13. J.G. Restrepo, E. Ott, B.R. Hunt, Weighted percolation on directed networks, Physical Review Letters (5), 058701 (2008)
14. Z. Ghalmane, M. El Hassouni, H. Cherifi, in (IEEE, 2018), pp.
1–5
15. P. Erdos, A. Renyi, On random graphs I, Publ. Math. Debrecen , 290 (1959)
16. E.N. Gilbert, Random graphs, The Annals of Mathematical Statistics (4), 1141 (1959)
17. M. Molloy, B. Reed, A critical point for random graphs with a given degree sequence, Random Structures & Algorithms (2-3), 161 (1995)
18. P. Holland, K.B. Laskey, S. Leinhardt, Stochastic blockmodels: First steps, Social networks (2), 109 (1983)
19. B. Karrer, M. Newman, Stochastic blockmodels and community structure in networks, Phys. Rev. E (1), 016107 (2011)
20. P.O. Perry, P.J. Wolfe, Null models for network data, arXiv preprint arXiv:1201.5871 (2012)
21. F. McSherry, in Proceedings 2001 IEEE International Conference on Cluster Computing (IEEE, 2001), pp. 529–537
22. A. Condon, R. Karp, Algorithms for graph partitioning on the planted partition model, Random Structures & Algorithms (2), 116 (2001)
23. A. Asratian, T. Denley, R. Häggkvist, Bipartite graphs and their applications, vol. 131 (Cambridge university press, 1998)
24. J. Reichardt, S. Bornholdt, Statistical mechanics of community detection, Phys. Rev. E (1), 016110 (2006)
25. S. Fortunato, M. Barthelemy, Resolution limit in community detection, Proc. Nat. Acad. Sci. (1), 36 (2007)
26. U. Brandes, D. Delling, M. Gaertler, R. Gorke, M. Hoefer, Maximizing modularity is hard (2016, arXiv:0608255)
27. M. Riolo, G. Cantwell, G. Reinert, M. Newman, Efficient method for estimating the number of communities in a network, Physical Review E (3), 032310 (2017)
28. M.E. Newman, G. Reinert, Estimating the number of communities in a network, Phys. Rev. Lett. (7), 078301 (2016)
29. T.P. Peixoto, Entropy of stochastic blockmodel ensembles, Physical Review E (5), 056122 (2012)
30. T.P. Peixoto, Bayesian stochastic blockmodeling, arXiv preprint arXiv:1705.10225 (2017)
31. A. Ghasemian, H. Hosseinmardi, A.
Clauset, Evaluating overfit and underfit in models of network community structure, IEEE Transactions on Knowledge and Data Engineering (early access 2019)
32. T. Peixoto, Efficient monte carlo and greedy heuristic for the inference of stochastic block models, Phys. Rev. E , 012804 (2014)
33. N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller, E. Teller, Equation of state calculations by fast computing machines, J. Chem. Phys. , 1087 (1953)
34. G. Rossetti, R. Cazabet, Community discovery in dynamic networks: A survey, ACM Comput. Surv. (2), 35:1 (2018). DOI 10.1145/3172867
35. J. Hopcroft, O. Khan, B. K, B. Selman, Tracking evolving communities in large linked networks, Proc Natl Acad Sci USA (suppl 1), 5249 (2004)
36. S. Asur, S. Parthasarathy, D. Ucar, in Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM, New York, NY, USA, 2007), KDD ’07, pp. 913–921. DOI 10.1145/1281192.1281290
37. G. Palla, A.L. Barabási, T. Vicsek, Quantifying social group evolution, Nature , 664 (2007)
38. D. Greene, D. Doyle, P. Cunningham, Tracking the evolution of communities in dynamic social networks, 2010 International Conference on Advances in Social Networks Analysis and Mining pp. 176–183 (2010)
39. M. Rosvall, C.T. Bergstrom, Mapping change in large networks, PLOS ONE (1), 1 (2010). DOI 10.1371/journal.pone.0008694
40. P. Bródka, S. Saganowski, P. Kazienko, Ged: the method for group evolution discovery in social networks, Social Network Analysis and Mining (1), 1 (2013). DOI 10.1007/s13278-012-0058-8
41. L. Danon, A. Díaz-Guilera, J. Duch, A. Arenas, Comparing community structure identification, J. Stat. Mech. (2005)
42. A. Lancichinetti, S. Fortunato, J. Kertész, Detecting the overlapping and hierarchical community structure in complex networks, New J. Phys. , 033015 (2009)
43. A. Amelio, C.
Pizzuti, Correction for closeness: Adjusting normalized mutual information measure for clustering comparison, Computational Intelligence (3), 579 (2017). DOI 10.1111/coin.12100
44. D. Chakrabarti, R. Kumar, A. Tomkins, in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM, New York, NY, USA, 2006), KDD ’06, pp. 554–560. DOI 10.1145/1150402.1150467
45. Y. Chi, X. Song, D. Zhou, K. Hino, B.L. Tseng, in Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM, New York, NY, USA, 2007), KDD ’07, pp. 153–162. DOI 10.1145/1281192.1281212
46. Y.R. Lin, Y. Chi, S. Zhu, H. Sundaram, B.L. Tseng, in Proceedings of the 17th International Conference on World Wide Web (ACM, New York, NY, USA, 2008), WWW ’08, pp. 685–694. DOI 10.1145/1367497.1367590
47. D. Zhou, I. Councill, H. Zha, C.L. Giles, in ICDM07 (2007), pp. 745–750
48. L. Tang, H. Liu, J. Zhang, Z. Nazeri, in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM, New York, NY, USA, 2008), KDD ’08, pp. 677–685. DOI 10.1145/1401890.1401972
49. F. Folino, C. Pizzuti, in Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation (ACM, New York, NY, USA, 2010), GECCO ’10, pp. 535–536. DOI 10.1145/1830483.1830580
50. Y. Sun, J. Tang, J. Han, M. Gupta, B. Zhao, in Proceedings of the Eighth Workshop on Mining and Learning with Graphs (ACM, New York, NY, USA, 2010), MLG ’10, pp. 137–146. DOI 10.1145/1830252.1830270
51. M.G. Gong, L.J. Zhang, J.J. Ma, L.C. Jiao, Community detection in dynamic social networks based on multiobjective immune algorithm, Journal of Computer Science and Technology (3), 455 (2012). DOI 10.1007/s11390-012-1235-y
52. V. Kawadia, S. Sreenivasan, Sequential detection of temporal communities by estrangement confinement, Scientific Reports , 794 (2012)
53. H. Crane, W. Dempsey, Community detection for interaction networks, CoRR abs/1509.09254 (2015).
URL http://arxiv.org/abs/1509.09254
54. R. Görke, P. Maillard, A. Schumm, C. Staudt, D. Wagner, Dynamic graph clustering combining modularity and smoothness, J. Exp. Algorithmics , 1.5:1.1 (2013). DOI 10.1145/2444016.2444021
55. P.J. Mucha, T. Richardson, K. Macon, M.A. Porter, J.P. Onnela, Community structure in time-dependent, multiscale, and multiplex networks, Science (5980), 876 (2010). DOI 10.1126/science.1184819
56. A. Lancichinetti, S. Fortunato, Consensus clustering in complex networks, Scientific Reports , 336 (2012)
57. T. Yang, Y. Chi, S. Zhu, Y. Gong, R. Jin, A Bayesian Approach Toward Finding Communities and Their Evolutions in Dynamic Social Networks (SIAM, 2009), pp. 990–1001. DOI 10.1137/1.9781611972795.85
58. T.P. Peixoto, Inferring the mesoscale structure of layered, edge-valued, and time-varying networks, Phys. Rev. E , 042807 (2015). DOI 10.1103/PhysRevE.92.042807
59. A. Ghasemian, P. Zhang, A. Clauset, C. Moore, L. Peel, Detectability thresholds and optimal algorithms for community structure in dynamic networks, Phys. Rev. X , 031005 (2016). DOI 10.1103/PhysRevX.6.031005
60. T.P. Peixoto, M. Rosvall, Modelling sequences and temporal networks with dynamic community structures, Nature Communications , 582 (2017)
61. T. Hoffmann, L. Peel, R. Lambiotte, N.S. Jones, Community detection in networks with unobserved edges (2018). ArXiv:1808.06079
62. T.P. Peixoto, Network reconstruction and community detection from dynamics (2019). ArXiv:1903.10833
63. T. Aynaud, E. Fleury, J. Guillaume, Q. Wang, Communities in Evolving Networks: Definitions, Detection and Analysis Techniques (Springer, New York, 2013), vol. 2, pp. 159–200
64. H. Ning, W. Xu, Y. Chi, Y. Gong, T.S. Huang, Incremental spectral clustering by efficiently updating the eigen-system, Pattern Recogn. (1), 113 (2010). DOI 10.1016/j.patcog.2009.06.001
65. S. Bansal, S. Bhowmick, P.
Paymal, in Communications in Computer and Information Science, vol. 116 CCIS (2011), pp. 196–207
66. R. Görke, P. Maillard, C. Staudt, D. Wagner, in Experimental Algorithms. SEA 2010, Lecture Notes in Computer Science, vol. 6049, ed. by P. Festa (Springer, Berlin, Heidelberg, 2010), pp. 436–448
67. J. Xie, M. Chen, B.K. Szymanski, in Proceedings of the Workshop on Dynamic Networks Management and Mining (ACM, New York, NY, USA, 2013), DyNetMM ’13, pp. 25–32. DOI 10.1145/2489247.2489249
68. R. Cazabet, F. Amblard, C. Hanachi, in Proceedings of the 2010 IEEE Second International Conference on Social Computing (IEEE Computer Society, Washington, DC, USA, 2010), SOCIALCOM ’10, pp. 309–314. DOI 10.1109/SocialCom.2010.51
69. D. Duan, Y. Li, R. Li, Z. Lu, Incremental k-clique clustering in dynamic social networks, Artificial Intelligence Review (2), 129 (2012). DOI 10.1007/s10462-011-9250-x
70. T. Falkowski, A. Barth, M. Spiliopoulou, in AMCIS (2008)
71. N.P. Nguyen, T.N. Dinh, S. Tokala, M.T. Thai, in MobiCom (2011)
72. R. Cazabet, F. Amblard, in Proceedings of the 2011 IEEE/WIC/ACM International Conferences on Web Intelligence and Intelligent Agent Technology - Volume 02 (IEEE Computer Society, Washington, DC, USA, 2011), WI-IAT ’11, pp. 402–408. DOI 10.1109/WI-IAT.2011.50
73. R. Görke, T. Hartmann, D. Wagner, Dynamic graph clustering using minimum-cut trees, Journal of Graph Algorithms and Applications , 411 (2012)
74. H.S. Ma, J.W. Huang, in Proceedings of the 7th Workshop on Social Network Mining and Analysis (ACM, New York, NY, USA, 2013), SNAKDD ’13, pp. 6:1–6:8. DOI 10.1145/2501025.2501026
75. P. Lee, L.V.S. Lakshmanan, E.E. Milios, Incremental cluster evolution tracking from highly dynamic network data, 2014 IEEE 30th International Conference on Data Engineering pp. 3–14 (2014)
76. A. Zakrzewska, D.A.
Bader, in Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015 (ACM, New York, NY, USA, 2015), ASONAM ’15, pp. 559–564. DOI 10.1145/2808797.2809375
77. C.C. Aggarwal, P.S. Yu, Online Analysis of Community Evolution in Data Streams (SIAM, 2005), pp. 56–67. DOI 10.1137/1.9781611972757.6
78. H. Zanghi, C. Ambroise, V. Miele, Fast online graph clustering via Erdős–Rényi mixture, Pattern Recognition (12), 3592 (2008). DOI 10.1016/j.patcog.2008.06.019
79. G. Rossetti, L. Pappalardo, D. Pedreschi, F. Giannotti, Tiles: an online algorithm for community discovery in dynamic social networks, Machine Learning (8), 1213 (2017). DOI 10.1007/s10994-016-5582-8
80. B. Tan, F. Zhi, Q. Qu, S. Liu, in Web-Age Information Management: 15th International Conference, WAIM 2014 (2014), pp. 633–644
81. S. Kairam, D. Wang, J. Leskovec, in Proceedings of the fifth ACM International Conference on Web Search and Data Mining (WSDM12) (2012), pp. 673–682
82. A. Patil, J. Liu, J. Gao, in Proceedings of the 22nd International Conference on World Wide Web (WWW13) (2013), pp. 1021–1030
83. M. Goldberg, M. Magdon-Ismail, S. Nambirajan, J. Thompson, in Proceedings of Privacy, Security, Risk and Trust (PASSAT) and 2011 IEEE Third International Conference on Social Computing (SocialCom) (2011), pp. 780–783
84. P. Bródka, P. Kazienko, B. Kołoszczyk, Predicting Group Evolution in the Social Network (Springer, Berlin/Heidelberg, Germany, 2012), pp. 54–67
85. B. Gliwa, P. Bródka, A. Zygmunt, S. Saganowski, P. Kazienko, J. Koźlak, in Proceedings of 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) (2013), pp. 1291–1298
86. M. Takaffoli, R. Rabbany, O. Zaiane, in Proceedings of 2013 12th International Conference on Machine Learning and Applications (ICMLA) (2013), pp. 191–196
87. S. Saganowski, B. Gliwa, P. Bródka, A. Zygmunt, P. Kazienko, J.
Koźlak, Predicting community evolution in social networks, Entropy (5), 3053 (2015). URL
88. L. Lü, D. Chen, X.L. Ren, Q.M. Zhang, Y.C. Zhang, T. Zhou, Vital nodes identification in complex networks, Physics Reports , 1 (2016)
89. K. Gong, in International Conference on Logistics Engineering, Management and Computer Science (LEMCS 2014) (Atlantis Press, 2014)
90. F. Taghavian, M. Salehi, M. Teimouri, A local immunization strategy for networks with overlapping community structure, Physica A: Statistical Mechanics and its Applications , 148 (2017)
91. A. Lancichinetti, S. Fortunato, F. Radicchi, Benchmark graphs for testing community detection algorithms, Physical Review E (4), 046 (2008)
92. G.K. Orman, V. Labatut, H. Cherifi, Towards realistic artificial benchmark for community detection algorithms evaluation, International Journal of Web Based Communities (3), 349 (2013)
93. N. Masuda, Immunization of networks with community structure, New Journal of Physics (12), 123018 (2009)
94. A.V. Mantzaris, Uncovering nodes that spread information between communities in social networks, EPJ Data Science (1), 26 (2014)
95. T. Yoshida, Y. Yamada, A community structure-based approach for network immunization, Computational Intelligence (1), 77 (2017)
96. P. Jensen, M. Morini, M. Karsai, T. Venturini, A. Vespignani, M. Jacomy, J.P. Cointet, P. Mercklé, E. Fleury, Detecting global bridges in networks, Journal of Complex Networks (3), 319 (2015)
97. Z. Ghalmane, M.E. Hassouni, H. Cherifi, Immunization of networks with non-overlapping community structure, arXiv preprint arXiv:1806.05637 (2018)
98. M. Kitromilidis, T.S. Evans, Community detection with metadata in a network of biographies of western art painters, arXiv preprint arXiv:1802.07985 (2018)
99. J.L. He, Y. Fu, D.B. Chen, A novel top-k strategy for influence maximization in complex networks with community structure, PloS one (12), e0145283 (2015)
100. S.Y. Chan, I.X. Leung, P.
Liò, in Proceedings of the 1st ACM international workshop on Complex networks meet information & knowledge management (ACM, 2009), pp. 31–38
101. Z. Zhao, X. Wang, W. Zhang, Z. Zhu, A community-based approach to identifying influential spreaders, Entropy (4), 2228 (2015)
102. N. Gupta, A. Singh, H. Cherifi, Centrality measures for networks with community structure, Physica A: Statistical Mechanics and its Applications , 46 (2016)
103. N. Gupta, A. Singh, H. Cherifi, in Communication Systems and Networks (COMSNETS), 2015 7th International Conference on (IEEE, 2015), pp. 1–6
104. M.M. Tulu, R. Hou, T. Younas, Identifying influential nodes based on community structure to speed up the dissemination of information in complex network, IEEE ACCESS , 7390 (2018)
105. S.L. Luo, K. Gong, L. Kang, Identifying influential spreaders of epidemics on community networks, CoRR abs/1601.07700 (2016)
106. C. Salavati, A. Abdollahpouri, Z. Manbari, Ranking nodes in complex networks based on local structure and improving closeness centrality, Neurocomputing , 36 (2019)
107. K. Berahmand, A. Bouyer, N. Samadi, in Computing (2018), pp. 1–23
108. Z. Ghalmane, M. El Hassouni, C. Cherifi, H. Cherifi, Centrality in modular networks, EPJ Data Science (1), 15 (2019)
109. Z. Ghalmane, C. Cherifi, H. Cherifi, M. El Hassouni, Centrality in complex networks with overlapping community structure, Scientific Reports (1) (2019)
110. L. Hébert-Dufresne, A. Allard, J.G. Young, L.J. Dubé, Global efficiency of local immunization on complex networks, Scientific Reports , 2171 (2013)
111. D. Chakraborty, A. Singh, H. Cherifi, in International Conference on Computational Social Networks (Springer, 2016), pp. 62–73
112. M. Kumar, A. Singh, H. Cherifi, in Companion of the The Web Conference 2018 on The Web Conference 2018 (International World Wide Web Conferences Steering Committee, 2018), pp. 1269–1275
113. H. Wei, Z. Pan, G. Hu, L. Zhang, H. Yang, X. Li, X.
Zhou, Identifying influential nodes based on network representation learning in complex networks, PloS one (7), e0200091 (2018)
114. Y. Zhao, S. Li, F. Jin, Identification of influential nodes in social networks with community structure based on label propagation, Neurocomputing , 34 (2016)
115. U.N. Raghavan, R. Albert, S. Kumara, Near linear time algorithm to detect community structures in large-scale networks, Physical Review E (3), 036106 (2007)
116. X. Lu, K. Kuzmin, M. Chen, B. Szymanski, Adaptive modularity maximization via edge weighting scheme, Information Sciences , 55 (2018)
117. A. Lewis, N. Jones, M. Porter, D. Deane, The function of communities in protein interaction networks at multiple scales, BMC Sys. Biol. (1), 100 (2010)
118. H. Simon, in Facets Sys. Sci. (Springer, 1991), pp. 457–476
119. M. Porter, J.P. Onnela, P. Mucha, Communities in networks, Notices AMS (9), 1082 (2009)
120. A. Lancichinetti, S. Fortunato, Limits of modularity maximization in community detection, Phys. Rev. E (6), 066122 (2011)
121. M. Chen, K. Kuzmin, B. Szymanski, in Proceedings of the IEEE/ACM ASONAM, 4th Social Network Analysis and Applications (SNAA) Workshop (IEEE, 2014), pp. 856–863
122. M. Porter, M. McDonald, S. Williams, N. Johnson, N. Jones, Dynamic communities in multichannel data: An application to the foreign exchange market during the 2007–2008 credit crisis, Chaos: Interdisciplinary J. Nonlinear Sci. (3), 033119 (2009)
123. P. Mucha, T. Richardson, K. Macon, M. Porter, J.P. Onnela, Community structure in time-dependent, multiscale, and multiplex networks, Science (5980), 876–878 (2010)
124. V. Traag, G. Krings, P. Van Dooren, Significant scales in community structure, Sci. Rep. , 2930 (2013)
125. L. Peel, D.B. Larremore, A. Clauset, The ground truth about metadata and community detection in networks, Science Advances (5) (2017). DOI 10.1126/sciadv.1602548
126. M. Girvan, M.E.J. Newman, Community structure in social and biological networks, Proc. Nat. Acad. Sci. (12), 7821 (2002).
DOI 10.1073/pnas.122653799
127. C. Granell, R.K. Darst, A. Arenas, S. Fortunato, S. Gómez, Benchmark model to assess community structure in evolving networks, Phys. Rev. E , 012805 (2015). DOI 10.1103/PhysRevE.92.012805
128. M. Bazzi, L.G.S. Jeub, A. Arenas, S.D. Howison, M.A. Porter, Generative benchmark models for mesoscale structures in multilayer networks, CoRR abs/1608.06196 (2016). URL http://arxiv.org/abs/1608.06196
129. (6), 893 (2017). DOI 10.1093/comnet/cnx016
130. G.K. Orman, V. Labatut, H. Cherifi, An empirical study of the relation between community structure and transitivity, CoRR abs/1207.3234 (2012). URL