Hierarchical benchmark graphs for testing community detection algorithms

Abstract

Hierarchical organization is an important, prevalent characteristic of complex systems; in order to understand their organization, the study of the underlying (generally complex) networks that describe the interactions between their constituents plays a central role. Numerous previous works have shown that many real-world networks in social, biological and technical systems present hierarchical organization, often in the form of a hierarchy of community structures. Many artificial benchmark graphs have been proposed in order to test different community detection methods, but no benchmark has been developed to thoroughly test the detection of hierarchical community structures. In this study, we fill this gap by extending the Lancichinetti-Fortunato-Radicchi (LFR) ensemble of benchmark graphs, adopting the rule of constructing hierarchical networks proposed by Ravasz and Barabási. We employ this benchmark to test three of the most popular community detection algorithms, and quantify their accuracy using the traditional Mutual Information and the recently introduced Hierarchical Mutual Information. The results indicate that the Ravasz-Barabási-Lancichinetti-Fortunato-Radicchi (RB-LFR) benchmark generates a complex hierarchical structure constituting a challenging benchmark for the considered community detection methods.



Zhao Yang, Juan I. Perotti, and Claudio J. Tessone

URPP Social Networks, University of Zürich, Andreasstrasse 15, CH-8050 Zürich, Switzerland
IMT School for Advanced Studies Lucca, Piazza San Francesco 19, I-55100 Lucca, Italy
(Dated: August 24, 2017)

I. INTRODUCTION

Hierarchical organization [1–3] is a typical trait of complex systems, appearing in many biological, social (corporations, education systems, governments, and organized religions) or technological (the internet and other infrastructures) arrangements whose different scales are apparent. The interactions between the constituents of those systems are appropriately described as networks of interconnected modules nested hierarchically [4, 5]. Typical hierarchical networks include food webs, protein interaction networks, metabolic networks, gene regulatory networks, social networks, etc. [6]. While interactions ultimately occur between the basic or microscopic constituents of the systems, effective coarse-grained elements and interactions between them emerge at the different levels of organization, and these should be characterized and understood at their own scale. Because of this, finding the appropriate hierarchical and modular structure of complex networks is of great interest for the understanding of complex systems [6, 7].

Community detection helps to unveil the non-trivial organization of complex systems at the mesoscopic scale [8–10]. Many algorithms have been developed to identify the community structure in networks [11–18]. Some of them are also able to reveal the hierarchical community structure within. Without the intention of being exhaustive, the most widely used are:

Infomap [15], which uses the probability flow of random walks on the network under consideration as a proxy for information diffusion in the real system; it then proceeds by decomposing the network into modules by compressing a specific description of the probability flow.

Louvain [16], which employs a computationally efficient greedy algorithm for the optimization of Newman's modularity [19].

Spinglass [17], which uncovers the community structure of networks by minimizing the energy of a Hamiltonian whose spin states represent the community indices.

OSLOM [18], which detects clusters by using the local optimization of a fitness function expressing the statistical significance of a community with respect to random fluctuations.

And the hierarchical stochastic block model [20], which seeks to fit a hierarchy of stochastic block models to the different levels of organization of networks.

Comparing the accuracy of different community detection algorithms is a non-trivial problem. Commonly, two separate, intricate tools are required for the task [9]. The first one consists of benchmark graphs. These can be either real networks with known community structure (i.e. ground truth) or ensembles of artificial graphs with built-in community structure [8, 11, 21–26]. The second tool required is a measure quantifying the similarity between different allocations of nodes into communities for the same network. This enables the comparison between the known community structure and the one identified by the algorithms under study. Recently, to cover the need for the second tool, a similarity measure for the comparison of hierarchical community structures has been introduced: the so-called Hierarchical Mutual Information (HMI) [27], which is a generalization of the Mutual Information (MI), a standard measure for the comparison of non-hierarchical community structures [28]. As we show in this paper, the HMI can be further combined with the more traditional approach, where a level-by-level comparison of the hierarchies is performed with the standard MI [18, 29].

The development of a benchmark graph model mimicking the hierarchical community structure of real complex networks, i.e. covering the need for the first tool mentioned above, is the central topic of the present paper. Namely, in this work we introduce the Ravasz-Barabási LFR benchmark (RB-LFR). Broadly speaking, in its simplest incarnation, the RB-LFR is obtained by combining the complex community structure of the standard LFR benchmark [8] with the celebrated Ravasz-Barabási mechanism of constructing hierarchies [30]. While we develop the benchmark as a stylized representation of real-world networks, given that data about the properties of hierarchical organization with multiple levels are scarce, we reproduce properties of well-established artificial models that are also inspired by real data. We argue that a proper understanding of hierarchical organization in the real world will only be possible after solid hierarchical community detection methods have been developed and have passed the tests posed by artificial benchmarks. As we show in this paper, the RB-LFR benchmark poses challenging detection problems for the most popular hierarchical community detection methods, and it allows us to show that the HMI is a superior tool for the comparison of hierarchical community structures as compared to the traditional MI.

The outline of the paper is the following. In Section II, the construction of the benchmark is presented. In Section III, three community detection algorithms are tested on the RB-LFR benchmark graphs with different setups: in Subsection III A, the benchmark graphs have two levels, while in Subsection III B, the benchmarks have three levels. Finally, the discussions and conclusions are summarized in Section IV.

II. THE RB-LFR HIERARCHICAL BENCHMARK

In this section, we provide a detailed description of the construction of the networks in the ensemble defined by the RB-LFR benchmark. By performing a topological analysis, we also show that the resulting networks exhibit both power-law degree and power-law community-size distributions.

FIG. 1. (a) An example of the LFR benchmark taken as the original building block of the benchmark. (b) Four replicated LFR benchmarks are generated and connected to the original, or seed, LFR benchmark, community by community. (c) A schematic diagram of the connections between the seed community and the replicated ones; the red node is the hub, i.e. the node with the largest degree in the community. Only the links between the black nodes and the hub are shown; the other links are not visible. (d) A realization of a three-level RB-LFR benchmark; links of the other communities are not visible.

Before we go into the construction details, let us first motivate the convenience of the RB-LFR benchmark as compared to other existing alternatives. Different hierarchical network structures have already been proposed in the literature: for instance, the Sierpinski gasket [31], the hierarchical planted partition model [29], the hierarchical stochastic block model and its variants [6, 20, 32], the Ravasz-Barabási model [30], and a hierarchically nested version of the LFR benchmark [18]. While some of these network structures have already been employed in the problem of community detection, they display certain limitations when considered as benchmark graphs. For example, the Sierpinski gasket and the standard Ravasz-Barabási model have an excessively regular structure, while real networks have more complex hierarchical community structures (see, for example, the political blog network of Adamic and Glance and the IMDB film-actor network [20, 33]). The hierarchical planted partition model contains disorder, but it has exceedingly simple communities and connection structures, which fail to reflect the properties of the communities found in real networks, where widely varying community sizes and node degrees are found. The hierarchical stochastic block model admits generalized communities of different sizes and approximately arbitrary degree distributions, improving over the hierarchical planted partition model. In practice, however, to the best of our knowledge, it has never been used to construct benchmark graphs with hierarchical structures and power-law distributions of community sizes. The most promising alternative is the hierarchical version of the LFR benchmark, since it presents complex and realistic degree and community structures, as the standard LFR does, together with a hierarchical community structure. However, although the general idea is given, a precise definition of the hierarchical LFR is still missing, and its properties have not been systematically tested. Only realizations with two levels have been considered (the so-called fine or micro-community level and the coarse or macro-community level) and, according to the given specifications, it is not clear how the macro-communities should be obtained by merging micro-communities, something that is required to generate networks with more than two levels. In other words, a guiding principle or mechanism is required to combine LFR networks into hierarchies with an arbitrary number of levels. The straightforward way is to appropriately extend the definition of the hierarchical LFR, recursively building LFR networks within the modules of other LFR networks. However, this approach presents two important disadvantages, affecting the computational cost required to generate and analyze the networks. Firstly, the number of nodes in the network quickly grows with the number $L$ of levels as $N_L \sim C^L N$, where $N$ is the number of nodes and $C$ the number of communities in a non-hierarchical LFR playing the role of a seed network.
Secondly, in order to be conceptually consistent, the algorithm devised to generate the non-hierarchical LFR networks would have to be appropriately modified to preserve the power-law community-size and degree distributions across every level of the resulting hierarchy.

Since we want to develop a computationally accessible benchmark combining well-studied ideas, we propose a different approach. Namely, we introduce the Ravasz-Barabási LFR benchmark (RB-LFR), an extension of the LFR benchmark [8] obtained by combining it with a construction procedure inspired by the work of Ravasz and Barabási [30]. Compared to previous alternatives, the RB-LFR benchmark has complex and realistic degree and community-size distributions, as the LFR benchmark does; its hierarchy can have an arbitrary number of levels; and the RB procedure can be generalized even further in a straightforward manner. In the standard RB method, the hubs of different network motifs are connected to the nodes of the corresponding replicas [30] but, in a more general setup, these restrictions can be relaxed by allowing alternative inter-replica connections that combine different ways, or modes, of doing so [34]. In the present work, in order to simplify the analysis, we restrict ourselves to the original RB procedure, leaving the study of alternative generalizations of the RB-LFR benchmark for future work.

Our starting point is a standard non-hierarchical LFR benchmark network (Fig. 1a), which we consider as the seed network motif for an adapted Ravasz-Barabási procedure for constructing hierarchical networks. The parameters used to generate this LFR benchmark network are indicated in Table I. The number of nodes in the seed network is N = 1000. Each node is given a degree taken from a power-law distribution with exponent γ = −2. We have fixed the average degree to ⟨k⟩ = 20 and the maximum degree to $k_{\max} = 0.\ldots N$.
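Each seed degree is thus drawn from a discrete power law truncated to [k_min, k_max]. The plain-Python sketch below illustrates only this sampling step and is not the LFR reference generator. We assume, following the usual description of the LFR construction, that k_min is not a free parameter but is tuned so that the expected degree is as close as possible to ⟨k⟩; the value `k_max = 100` is purely illustrative, and all function names are hypothetical. The weight helper could equally be reused for the power-law community sizes (exponent β = −1) described next.

```python
import random


def power_law_weights(exponent, k_min, k_max):
    """Unnormalised weights P(k) ∝ k**exponent for every integer k in [k_min, k_max]."""
    return [k ** exponent for k in range(k_min, k_max + 1)]


def expected_value(exponent, k_min, k_max):
    """Mean of the discrete power law truncated to [k_min, k_max]."""
    weights = power_law_weights(exponent, k_min, k_max)
    return sum(k * w for k, w in zip(range(k_min, k_max + 1), weights)) / sum(weights)


def sample_degrees(n, exponent, mean_degree, k_max, rng):
    """Draw n degrees from a truncated power law whose mean is close to mean_degree.

    Assumption (not taken from the paper): k_min is chosen by a simple scan so that
    the expected degree is as close as possible to mean_degree.
    """
    k_min = min(range(1, k_max + 1),
                key=lambda k: abs(expected_value(exponent, k, k_max) - mean_degree))
    support = range(k_min, k_max + 1)
    degrees = rng.choices(support, weights=power_law_weights(exponent, k_min, k_max), k=n)
    return degrees, k_min


# Table I values: N = 1000, gamma = -2, <k> = 20; k_max = 100 is an illustrative choice.
degrees, k_min = sample_degrees(1000, -2, 20, 100, random.Random(0))
print(k_min, min(degrees), max(degrees), sum(degrees) / len(degrees))
```

The scan over k_min is quadratic in k_max, which is negligible at this scale; a bisection would do the same job for much larger k_max, since the truncated mean grows monotonically with k_min.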
Community sizes are taken from a power-law distribution with exponent β = −1, with lower and upper bounds ⟨k⟩ and $0.\ldots N$, respectively. The mixing parameter µ, which represents the fraction of links connecting nodes to nodes outside their own community, is defined as

$$\mu = \frac{\sum_i k_i^{\mathrm{ext}}}{\sum_i k_i^{\mathrm{tot}}},$$

where $k_i^{\mathrm{ext}}$ stands for the external degree of node $i$ and $k_i^{\mathrm{tot}}$ is the total degree of $i$. In this study, the values of µ are taken from an arithmetic sequence from 0.01 to 0.89 with step 0.04.

Next, following the RB construction procedure, we generate R replicas of the seed LFR network. In this context, this means that we generate R replicas of each seed community and connect each seed community to its corresponding replica communities (Fig. 1b) [8, 30]. We call the node with the largest degree in a community the community hub. The connections between the seed and the replica communities are always between the hub of the seed community and the nearest neighbors of the replicated hub (Fig. 1c). This replication and connection procedure can be repeated up to the desired number of levels. Each replication increases the number of nodes of the benchmark graph by a factor R + 1, so the number of nodes of an RB-LFR network with $L$ levels scales as $N_L \sim (R+1)^L N$, a number that can be considerably smaller than the analogous one for the hierarchical LFR since, in practice, R + 1 can be chosen to be significantly smaller than C. In Fig. 1d we show a three-level RB-LFR benchmark graph.

TABLE I. Parameters defining the ensemble of seed LFR benchmark graphs. To deal with possible discrepancies in the network properties, we have generated 10 independent networks for every set of parameters.

Parameter                                    Value
Number of nodes, N                           1000
Average degree, ⟨k⟩                          20
Maximum degree, k_max                        0.…N
Maximum community size                       0.…N
Minimum community size                       ⟨k⟩
Degree distribution exponent, γ              -2
Community size distribution exponent, β      -1
Mixing parameter, µ                          [0.01, 0.05, ..., 0.89]
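The replication step described above, together with the mixing parameter µ, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' generator: graphs are dictionaries of adjacency sets, node `(r, v)` denotes the copy of seed node `v` in replica `r` (with `r = 0` the seed itself), ties between candidate hubs are broken by iteration order, and we read "nearest neighbors of the replicated hub" as its neighbors inside the replicated community. Only a single replication (a two-level hierarchy) is shown, and all function names are hypothetical.

```python
from collections import defaultdict


def add_edge(adj, u, v):
    """Insert the undirected edge u-v into an adjacency-set dictionary."""
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)


def mixing_parameter(adj, membership):
    """mu = sum_i k_i^ext / sum_i k_i^tot, where membership[v] is the community of v."""
    k_ext = sum(1 for v, nbrs in adj.items() for u in nbrs if membership[u] != membership[v])
    k_tot = sum(len(nbrs) for nbrs in adj.values())
    return k_ext / k_tot


def rb_replicate(adj, membership, R):
    """One RB-style replication step (hypothetical helper, two-level hierarchy only).

    Assumption: the seed hub of community c is linked to the neighbours of each
    replicated hub that lie inside the replicated copy of c.
    """
    # Copy the seed graph R times; the R + 1 copies are not yet linked to each other.
    rb_adj = {(r, v): {(r, u) for u in nbrs}
              for r in range(R + 1) for v, nbrs in adj.items()}

    communities = defaultdict(list)
    for v, c in membership.items():
        communities[c].append(v)

    for c, nodes in communities.items():
        # The community hub is its largest-degree node (ties: first one encountered).
        hub = max(nodes, key=lambda v: len(adj[v]))
        hub_nbrs = [u for u in adj[hub] if membership[u] == c]
        for r in range(1, R + 1):
            for u in hub_nbrs:
                add_edge(rb_adj, (0, hub), (r, u))

    # The two candidate ground truths, as (level-1 label, level-2 label) per node.
    seed_replica = {(r, v): (membership[v], (membership[v], r)) for (r, v) in rb_adj}
    replica_seed = {(r, v): (r, (r, membership[v])) for (r, v) in rb_adj}
    return rb_adj, seed_replica, replica_seed


# Toy seed: two 4-cliques, "A" (nodes 0-3) and "B" (nodes 4-7), joined by edge 3-4.
seed = {}
for a, b in [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
             (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7), (3, 4)]:
    add_edge(seed, a, b)
membership = {v: "A" if v < 4 else "B" for v in seed}

rb_adj, seed_replica, replica_seed = rb_replicate(seed, membership, R=2)
finest = {node: labels[1] for node, labels in seed_replica.items()}
print(len(rb_adj))                          # 24 nodes, i.e. (R + 1) * N
print(mixing_parameter(seed, membership))   # 2 / 26: only edge 3-4 is external
print(mixing_parameter(rb_adj, finest))     # 30 / 102 once inter-replica links count
```

In the toy run the node count grows by the factor R + 1 = 3, and the two returned labelings share the same bottom-level partition while nesting it differently, which is exactly the distinction between the two ground truths discussed in what follows.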
Importantly, assuming that each node in the network chooses to join the community to which the maximum number of its neighbors belong [14], introducing the inter-community connections does not cause vanishing, merging, or generation of communities. For instance, in the most stylized case, the hub node has the same number of links to the seed community and to the replica communities. As we will show later, by introducing a non-zero probability of removing connections between the seed communities and the replicas, we can guarantee that the hubs will always belong to the seed communities. Hence, a power-law community structure is preserved at the bottom level (or top level, depending on the benchmark parameters) of the hierarchy, while a uniform community structure is generated at the other levels.

In Fig. 2, the degree distributions of two- and three-level networks generated by the RB-LFR benchmark are plotted, always starting from a seed LFR graph with the same set of parameters. We have fitted the degree distributions and reported the exponent of the fitted power-law distribution. As can be seen, the added inter-community connections produce only minor changes to the exponent of the degree distribution. In other words, an RB-LFR benchmark network approximately preserves the degree distribution of the seed LFR.

FIG. 2. Log-log plot of the degree distribution for different RB-LFR benchmark graph samples with two levels (red triangles) and three levels (green pluses), constructed from a seed LFR (black crosses) with N = 50000 nodes and mixing parameter µ = 0.…. The legend reports the fitted exponent γ of each curve.
[Figure 2 axes: degree, log k; degree distribution, log P(k)]

Depending on the value of the mixing parameter µ of the seed LFR benchmark, the process described above can generate hierarchical graphs with two different well-defined ground truths. Taking the two-level RB-LFR benchmark graphs as an example, when the mixing parameter of the seed LFR benchmark is small, its community structure and that of its replicas are well defined. On the first level, the RB-LFR benchmark displays as many communities as the seed LFR has, i.e. C communities. Each community in this first level contains one community of the seed LFR together with all its replicas. At the second level, each community of the first one contains R + 1 sub-communities (Figs. 3a & c), one for each replica plus the seed one, summing a total of C × (R + 1) sub-communities in the complete network. Notice that this occurs because there are no connections between each of the seed communities and the replicas of other seed communities. This sort of inter-replica connection could be added and studied in future work, an interesting aspect showing how much richer in possible variations the hierarchical case is as compared to the non-hierarchical one.

When the mixing parameter µ is increased, the community structure of the seed LFR becomes fuzzier and harder to detect. Therefore, the seed and the replica communities within the RB-LFR benchmark become harder to detect, too. This occurs in all replicas, while the number of inter-layer links remains the same regardless of µ. Therefore, the seed LFR and the replicas may be interpreted as R + 1 communities at the first level. Each of them has as many sub-communities at the second level as the seed LFR had, i.e. C (see Figs. 3b & d). Again, the total number of sub-communities at the second level is (R + 1) × C but, this time, this number is reached through different means, as can be seen by comparing Figs. 3a & b.

If the mixing parameter of the seed LFR becomes too large, then the communities become impossible to detect and the community structure of the RB-LFR benchmark network becomes mono-level; i.e. no second level arises and only R + 1 communities exist at the first level, one for the seed LFR and the others for the replicas.

FIG. 3. (a) and (b) are circular representations of the hierarchical structure of an RB-LFR benchmark with R = 4 replicas. The center represents the whole network at level 0. In the example, LFR seeds with N = 1000 nodes, C = 13 communities and varying mixing parameter µ are used. In (a), the mixing parameter of the seed LFR benchmark is small, and the RB-LFR has C communities on the first level, each of which has R + 1 sub-communities on the second level. A larger mixing parameter for the seed LFR is used in (b), where the RB-LFR benchmark has R + 1 communities on the first level, each having C sub-communities on the second level. In panels (c) and (d), schematic network representations corresponding to the hierarchies in (a) and (b) are shown, respectively. The shaded (blue) areas represent communities on the first level, and the black circles represent sub-communities on the second level. Communities might have different sizes. For clarity, links between the seed LFR and the replicas are not shown.

III. TEST

In the previous section, we have given the intuition that the RB-LFR benchmark is compatible with different ground truths for the hierarchical community structure. In this section, we verify that this topological transition occurs. The main result of this section, however, is the use of the RB-LFR benchmark to test the performance of three hierarchical community detection algorithms: Infomap [15], a recursive application of the Louvain method for the generation of hierarchies [16, 27], and the Minimum Description Length implementation of the Hierarchical Stochastic Block Model (HSBM) [20]. The Spinglass algorithm [17] is not tested because it is computationally slow, and OSLOM [18] is not employed because we focus on networks with non-overlapping communities.

As already mentioned, we compare the ground-truth and detected community structures employing the Normalized Mutual Information (NMI) [28] and the Normalized Hierarchical Mutual Information (NHMI) [27]. In addition, we calculate the difference between the Hierarchical Mutual Information (HMI) and the Mutual Information (MI) at the different levels, in order to quantify the cumulative contributions of the deeper levels of the graphs alone.
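To make these measures concrete, the sketch below computes the MI and NMI of two flat partitions and a recursive HMI for hierarchies stored as per-node label paths of equal depth (top level first). It follows our reading of the recursive definition in [27]: the HMI adds, to the MI of the top-level partitions, the HMI of the sub-hierarchies inside every intersection of a top-level community of one hierarchy with one of the other, weighted by the fraction of nodes in that intersection. Natural logarithms, the normalizations NMI = 2I(X;Y)/(H(X) + H(Y)) and NHMI = 2I(T;S)/(I(T;T) + I(S;S)), and all function names are assumptions of this sketch, not the implementation used in the paper.

```python
import math
from collections import Counter, defaultdict


def mutual_information(x, y):
    """MI (in nats) between two flat partitions given as equal-length label sequences."""
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum(c / n * math.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def entropy(x):
    n = len(x)
    return -sum(c / n * math.log(c / n) for c in Counter(x).values())


def nmi(x, y):
    """Assumed normalization 2 I(X;Y) / (H(X) + H(Y)); 1 for two trivial partitions."""
    h = entropy(x) + entropy(y)
    return 1.0 if h == 0 else 2 * mutual_information(x, y) / h


def hmi(paths_t, paths_s, level=0):
    """Recursive HMI; both hierarchies are per-node label paths of the same depth."""
    if not paths_t or level == len(paths_t[0]):
        return 0.0
    x = [p[level] for p in paths_t]
    y = [p[level] for p in paths_s]
    total = mutual_information(x, y)
    # Group the nodes by the intersection t' ∩ s' of their communities at this level ...
    blocks = defaultdict(list)
    for i, key in enumerate(zip(x, y)):
        blocks[key].append(i)
    # ... and add the HMI of the sub-hierarchies inside each block, weighted by its size.
    for idx in blocks.values():
        sub_t = [paths_t[i] for i in idx]
        sub_s = [paths_s[i] for i in idx]
        total += len(idx) / len(x) * hmi(sub_t, sub_s, level + 1)
    return total


def nhmi(paths_t, paths_s):
    """Assumed normalization 2 I(T;S) / (I(T;T) + I(S;S))."""
    denom = hmi(paths_t, paths_t) + hmi(paths_s, paths_s)
    return 1.0 if denom == 0 else 2 * hmi(paths_t, paths_s) / denom


# Toy version of the two ground truths: C = 2 seed communities of 4 nodes, R + 1 = 3 copies.
nodes = [(r, c, i) for r in range(3) for c in range(2) for i in range(4)]
seed_replica = [(c, (c, r)) for r, c, _ in nodes]   # level 1: seed community; level 2: copy
replica_seed = [(r, (r, c)) for r, c, _ in nodes]   # level 1: copy; level 2: seed community

fine_t = [p[1] for p in seed_replica]
fine_s = [p[1] for p in replica_seed]
print(nmi(fine_t, fine_s))                  # ~1: the bottom levels are the same partition
print(nhmi(seed_replica, seed_replica))     # 1.0: a hierarchy compared with itself
print(nhmi(seed_replica, replica_seed))     # 0.0: same leaves, nested differently
```

In this toy case a second-level NMI cannot distinguish the two ground truths, whereas the HMI as computed here is zero: the first levels share no information, and nothing further is gained inside each intersection. This illustrates, under the assumptions above, why the HMI and the level-by-level MI carry complementary information.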

A. Test on two-level RB-LFR benchmark

We first concentrate on the two-level RB-LFR benchmark ensemble. The seed LFR benchmark graphs we employ are undirected and unweighted networks with non-overlapping communities. The parameters of the LFR benchmark are shown in Table I. The number of replicas is R = 4.

First, we study the accuracy of the community detection methods as a function of the mixing parameter µ. We define three different ground truths: the first, namely seed-replica, corresponds to the hierarchy that should emerge for a small mixing parameter (Fig. 3a); the second, namely replica-seed, corresponds to a larger value of the mixing parameter (Fig. 3b); and the last corresponds to a flat structure with only one level [9, 35]. These three ground truths are represented in black, red, and green, respectively.

The results are shown in Fig. 4. In the left panels, the accuracy of the different community detection algorithms is quantified by the average NHMI computed between the detected hierarchical community structures and the different ground truths. In the center column, the similarity is quantified with the average NMI computed between the detected partitions at the second level and those exhibited by the different ground truths. In the right panels, the similarity is quantified by the difference HMI − MI between the HMI computed for the full hierarchies and the MI computed for the partitions at the first level. The tested methods are, from top to bottom, Infomap, Louvain, and HSBM. Taking the top-left panel as an example, Infomap can unveil the community structure, discriminating between the two ground truths, up to µ ≈ 0.…. For µ ≲ 0.… it detects the first type of ground truth, and for 0.… ≲ µ ≲ 0.… it detects the second type. We observe a clear transition between the ground truths for µ between 0.… and 0.…; in both regions, the NHMI reaches values close to one, making apparent that the algorithm gives a description of the hierarchy very close to the ground truth. For µ ≳ 0.…, Infomap detects a flat community structure. This result showcases that the RB-LFR benchmark has a clear hierarchical community structure which can be recognized successfully by Infomap. The fact that the NHMI reaches 1 highlights that this is indeed non-trivial.

Comparing panels (a), (d), and (g) of Fig. 4, we observe that the new benchmark poses a challenging task that can test the performance of the algorithms: the accuracy of Louvain reaches 0.6 up to µ ≈ 0.…, but it still detects some hierarchical community structure up to µ ≈ 0.…, a far wider range than Infomap. The HSBM always has an accuracy smaller than 0.2. We note here that the poor performance of the HSBM is most likely related to its bottom-up approach, while the other two methods take top-down approaches to build the hierarchies [27].

The right panels, Figs. 4c, f, and i, show the difference between the full HMI and the MI of the first level, i.e. the contribution of the second level to the HMI. In other words, this difference quantifies how accurately the algorithms detect the second level and how relevant the corresponding contribution is, as measured by the HMI. For instance, when µ = 0.…, for Infomap under the second definition of ground truth, the observed value represents 64.7% of the total value of the HMI. Hence, the contribution of the second level is non-negligible, showing the convenience of the Hierarchical Mutual Information as a measure for the comparison of hierarchical community structures, when compared to the traditional Mutual Information.

FIG. 4. Average NHMI, NMI, and (HMI − MI) as a function of the mixing parameter µ in the left, middle, and right panels, respectively. Here, the NMI compares partitions at the second level of the detected and ground-truth hierarchies. Similarly, the HMI compares full hierarchies while the MI compares partitions at the first level. From top to bottom, the methods are Infomap, Louvain, and HSBM. Averages are computed over 10 different network realizations with the same set of parameters of the seed LFR benchmark. The parameters of the seed networks can be found in Table I.
[Figure 4 panels (a)–(i); legend: Seed-Replica, Replica-Seed, Flat]

Now, we measure the effect of the average degree ⟨k⟩ on the performance of the algorithms. We use the NHMI to quantify the accuracies of the algorithms, and the results are shown in Fig. 5. The top panels correspond to ⟨k⟩ = 10, and the bottom ones to ⟨k⟩ = 40.
Comparing panels (a) and (d), and panels (b) and (e), we can observe that for sparse RB-LFR benchmark graphs the community detection methods perform better with increasing ⟨k⟩. This result is typically observed [9] and is a reasonable one since, in the sparse regime ⟨k⟩ ≪ N, where N is the number of nodes in the network, the larger ⟨k⟩ is, the less important are the sample-to-sample fluctuations that may affect how well defined the communities are. Furthermore, we observe a pattern similar to that of Fig. 4: while Infomap exhibits higher accuracy, Louvain is able to detect a hierarchical structure in a wider range of the mixing parameter µ (Figs. 5d & e and Figs. 4a & d).

FIG. 5. Average NHMI as a function of the mixing parameter µ. The top panels correspond to seed LFR benchmarks with average degree ⟨k⟩ = 10 and the bottom ones to ⟨k⟩ = 40. From left to right, the methods are Infomap, Louvain, and HSBM. Averages are computed over 10 different network realizations with the same set of parameters of the seed LFR benchmark. The parameters of the seed networks can be found in Table I.
[Figure 5 panels (a)–(f); legend: Seed-Replica, Replica-Seed, Flat]

a. Decimated inter-layer connections

So far we have considered a highly stylized model where the communities in the seed network are deterministically replicated in deeper layers. In this subsection, we relax this assumption. We note that in these less stylized cases, all the nodes would have more links to their own communities, such that the topologies of the networks would remain the same. With this in mind, we introduce a parameter p, which specifies the probability of randomly removing connections between the seed communities and the replicas (Fig. 1d). The decimation procedure associated with p is applied to every pair of seed-replica communities independently. In this way, p = 0 means that all connections are kept (the case studied in the previous subsection) and p = 1 means that all connections are removed. Hence, p is a sort of complementary mixing parameter: while µ controls the connectivity at the LFR level, p controls the connectivity at the inter-layer level. We study the accuracy of the community detection methods by plotting the NHMI as a function of p. We repeat the calculations for three different values of the mixing parameter, µ = 0.05, 0.3, and 0.7, which represent the three qualitatively different regions of the mixing parameter found in the previous results. The findings are shown in Fig. 6. In Fig. 6a, a transition between the seed-replica and replica-seed ground truths is observed as p is varied. This is analogous to what is observed in Fig. 4a when µ is varied. In other words, this result confirms the role of p as a complementary mixing parameter. The rest of the panels in Fig. 6 essentially show that, when the mixing parameter is large, the number of connections between communities and their replicas is already very small and p cannot have a significant impact on the detected structure. Overall, we can conclude that the RB-LFR benchmark graphs are relatively robust to the random removal of some connections, a desirable characteristic for a well-defined ensemble of benchmark graphs. Importantly, only the Infomap algorithm is able to unveil the topological transition induced by p. From now on, p = 0.… is used.

FIG. 6. Average NHMI as a function of the complementary mixing parameter p. From left to right, the mixing parameters are µ = 0.05, 0.3, and 0.7, respectively. From top to bottom, the methods are Infomap, Louvain, and HSBM. Averages are computed over 10 different network realizations with the same set of parameters of the seed LFR benchmark. The parameters of the seed networks can be found in Table I.
[Figure 6 panels (a)–(i); legend: Seed-Replica, Replica-Seed, Flat]

Since the previous results show that Infomap performs well and, in some cases, considerably better than the other options, in what follows we restrict our analysis to the results obtained with Infomap.

b. Decimation of replicas

We now randomly remove a fraction q of the existing replicas, together with all their connections, from a previously generated RB-LFR benchmark graph. For q = 0 all the replica communities are kept, while for q = 1 all of them are removed. As before, we use µ = 0.05, 0.3 and 0.7 to represent the three different regions of the mixing parameter. The results indicate that, in all cases, the RB-LFR benchmark graphs still preserve a relatively stable hierarchical structure even after 60% of the replicated communities have been removed (Fig. 7). From now on, we set q = 0.

c. Network sizes

We have also measured the effect of network size on the performance of Infomap, observing that the accuracy of the method decreases mildly as the number of nodes N increases. The effect only becomes measurable as µ → ….

d. Number of replicas

Finally, we studied the effect of the number of replicas on the performance of Infomap, going from R = 4 to R = 9. We observe that the range of the mixing parameter µ over which the transition between ground truths occurs becomes slightly wider. Overall, we conclude that the results are robust to variations in the number of replicas.

[Figure 7: NHMI (top row, panels a to c) and NMI (bottom row, panels d to f) of Infomap vs. q; columns: µ = 0.05, 0.3, 0.7; curves: Seed-Replica, Replica-Seed and Flat ground truths.]

FIG. 7. Average NHMI and NMI (top and bottom, respectively) as a function of q, the fraction of replica communities removed from a standard RB-LFR benchmark. From left to right, the mixing parameter is set to µ = 0.05, 0.3 and 0.7, respectively. Averages are computed over 10 different network realizations with the same set of parameters of the seed LFR benchmark. The parameters of the seed networks can be found in Table I.

B. Test on three-level RB-LFR benchmark

In the last study, we focus on the three-level RB-LFR benchmark. The setting is the same as in the first study, i.e. Table I and Fig. 4. Under this setting, the first ground truth becomes seed-replica-replica (Seed-Replica*2) and the second becomes replica-replica-seed (Replica*2-Seed), while the third one (Flat) remains the same. We report the accuracy of Infomap as a function of the mixing parameter µ. The results are shown in Fig. 8. The three-level RB-LFR benchmark is a much harder test, but Infomap is still able to unveil the network structure for certain values of the mixing parameter. On the other hand, the accuracies are much worse than those obtained for the two-level benchmark graphs in most cases (compare with Figs. 4a & b). In Fig. 8c we show the difference between the full HMI and the MI of the first level. Similar to what we observed in Fig. 4c, the second and third levels contribute an important fraction of the total value of the HMI.

[Figure 8: (a) NHMI, (b) NMI and (c) HMI - MI of Infomap vs. the mixing parameter µ; curves: Seed-Replica*2, Replica*2-Seed and Flat ground truths.]

FIG. 8. Average NHMI, NMI, and (HMI - MI) as a function of the mixing parameter µ, in the left, middle and right panels, respectively, for RB-LFR benchmark graphs with three levels. Averages are computed over 10 different network realizations with the same set of parameters of the seed LFR benchmark. The parameters of the seed networks can be found in Table I.

Finally, in Fig. 9, we provide three examples of the ground-truth hierarchical structure of different RB-LFR benchmark graphs (top panels) and the corresponding hierarchical structures detected by Infomap (bottom panels). The mixing parameters µ are 0.01, 0.33, and 0.77 from left to right. Panel (a) corresponds to the first type of ground truth. In this case, the mixing parameter is small enough that the structure of the seed LFR is found on the upper level, and the mechanism of the Ravasz-Barabási model is observed in the second and third levels. Panels (b) and (c) correspond to the second type of ground truth. In this case, the mixing parameter is large enough that the mechanism of the Ravasz-Barabási model is observed in levels 1 and 2, while the structure of the seed LFR is detected at the third level. In all cases we have fixed the values of R, p, and q to 4, 0, and 0, respectively. Each node on the last level represents a community that does not contain any sub-communities [see Fig. 3c & d].

Looking at the detected communities in detail and comparing the bottom panels with the top ones, we can see that for µ = 0.01 Infomap made a mistake in the detection of the first level: two communities have been merged together. On the second level, Infomap makes even more mistakes, merging pairs of communities in several cases (Figs. 9a & d). For µ = 0.33, Infomap successfully unveils the first level, but it makes mistakes on the second level (Figs. 9b & e). For µ = 0.77, Infomap neither correctly detects the community structure of the first level nor unveils the structure of the deeper levels. In this case, the detected network structure is close to a flat one: there are three communities on the first level, each community on the first level contains several sub-communities on the second level, and each community on the second level has only one sub-community, i.e. itself, on the third level (Figs. 9c & f).

IV. SUMMARY

In this study, we have introduced a new class of benchmark graphs to test hierarchical community detection algorithms. These benchmark graphs combine the LFR benchmark with the rule for constructing hierarchical networks proposed by Ravasz and Barabási, hence the name RB-LFR benchmark. They retain the properties of the standard LFR benchmark, i.e. power-law distributions of node degrees and community sizes, while also possessing the clear hierarchical structure of the Ravasz-Barabási model, and they can be extended to an arbitrary number of levels.

[Figure 9: panels (a) to (f), circular representations of hierarchical structures, with levels ("layer") arranged radially.]

FIG. 9. The top panels are circular representations of the hierarchical structure of three-level RB-LFR benchmark graphs. The bottom panels are the corresponding hierarchical structures detected by Infomap. In cases (a) and (d) the mixing parameter of the seed LFR benchmark is µ = 0.01, in cases (b) and (e) µ = 0.33, and in cases (c) and (f) µ = 0.77. The center of every panel represents the whole network at level 0. Similar to the 2nd and last level of the two-level RB-LFR, the 3rd and last level of the three-level RB-LFR represents communities that do not contain any sub-communities [see Fig. 3c & d].

We have found that the newly introduced RB-LFR benchmark graphs pose challenging tests for state-of-the-art hierarchical community detection algorithms. In particular, we have seen that the size of the graph and the average degree of nodes have a sizeable effect on the accuracy of the methods. Our benchmark graphs, while parsimonious, exhibit a rich phenomenology, including a variety of topological transitions between co-existing ground truths. Furthermore, by introducing two parameters to randomly remove connections and replicas, we have observed that the RB-LFR benchmark exhibits a robust hierarchical community structure.
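The two decimation procedures recalled above (parameters p and q) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph is a plain edge list; `copy_of` and `community_of` are hypothetical maps giving each node its copy index (0 for the seed, r > 0 for replica r) and its community label; p-decimation is read as removing each edge that joins two different copies independently with probability p; and q-decimation as deleting round(qK) of the K replica communities together with all their incident edges.

```python
import random

# Hypothetical toy representation (an assumption, not the authors' data structures):
#   edges        -- list of (u, v) node pairs
#   copy_of      -- node -> 0 for the seed copy, r > 0 for replica r
#   community_of -- node -> community label, e.g. (copy index, seed label)


def decimate_interlayer(edges, copy_of, p, rng):
    """Remove each edge joining two different copies with probability p.

    Assumes per-edge decimation: p = 0 keeps every inter-copy edge and
    p = 1 removes all of them. Edges inside a single copy are never touched.
    """
    kept = []
    for u, v in edges:
        inter_copy = copy_of[u] != copy_of[v]
        if inter_copy and rng.random() < p:
            continue  # this seed-replica edge is decimated
        kept.append((u, v))
    return kept


def decimate_replicas(edges, copy_of, community_of, q, rng):
    """Delete round(q * K) of the K replica communities and all their edges.

    Returns the surviving edges and the set of deleted nodes.
    """
    # Sorting makes the draw reproducible for a seeded rng.
    replica_comms = sorted({community_of[n] for n in community_of if copy_of[n] > 0})
    chosen = set(rng.sample(replica_comms, round(q * len(replica_comms))))
    deleted = {n for n in community_of if community_of[n] in chosen}
    kept = [(u, v) for u, v in edges if u not in deleted and v not in deleted]
    return kept, deleted


# Toy example: a seed with communities A and B (nodes 0-3) and one replica (nodes 4-7).
copy_of = {n: 0 if n < 4 else 1 for n in range(8)}
community_of = {n: (copy_of[n], "A" if n % 4 < 2 else "B") for n in range(8)}
edges = [(0, 1), (2, 3), (1, 2),  # inside the seed
         (4, 5), (6, 7), (5, 6),  # inside the replica
         (0, 4), (2, 6)]          # seed-replica links
rng = random.Random(42)

print(len(decimate_interlayer(edges, copy_of, 1.0, rng)))  # 6
kept, deleted = decimate_replicas(edges, copy_of, community_of, 1.0, rng)
print(len(kept), sorted(deleted))  # 3 [4, 5, 6, 7]
```

In the toy graph, p = 1 removes exactly the two seed-replica links, and q = 1 leaves only the seed copy, matching the limiting cases described in the text. This section does not say whether the original procedure decimates edge by edge or community pair by community pair; both readings share the p = 0 and p = 1 limits.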
Additionally, our tests have also validated that the recently introduced Hierarchical Mutual Information (HMI) is better suited for the comparison of hierarchical partitions than the traditional Mutual Information (MI).

The comparison of the performance of the tested algorithms, Infomap, Louvain, and the Hierarchical Stochastic Block Model (HSBM), against the RB-LFR benchmark indicates that Infomap produces the best results overall. More specifically, the tests on the two-level RB-LFR benchmark graphs indicate that Infomap outperforms the other two methods in terms of accuracy. However, the three-level RB-LFR benchmark appears to be very challenging for existing algorithms.

Our next step is to conduct a more comprehensive comparison of hierarchical community detection algorithms by evaluating their performance on the RB-LFR benchmark. By doing this, we will gain a deeper understanding of the features of the RB-LFR benchmark and learn more about its limitations and about how it differs from real hierarchical systems. The benchmark introduced in this paper has a very stylized hierarchical structure, which may be seen as a limitation of the approach. However, existing empirical work on hierarchical community detection has found hierarchies whose complexity is rather limited. Our results highlight that algorithms for community detection must be vastly improved to uncover more complex hierarchies. This paper provides the foundation to proceed with this important line of research.
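The summary's remark that the HMI is better suited than the MI for hierarchical partitions can be illustrated with a small, hypothetical example. The sketch below does not implement the HMI; it only shows the blind spot of a flat NMI computed on a single level. It assumes the common normalization 2I(a; b)/(H(a) + H(b)) (other normalizations exist) and uses made-up labels for an 8-node, two-level hierarchy.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a partition given as a label list."""
    n = len(labels)
    return -sum(c / n * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two partitions of the same nodes."""
    n = len(a)
    ca, cb, cab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(c / n * log(c * n / (ca[x] * cb[y])) for (x, y), c in cab.items())


def nmi(a, b):
    """NMI normalized as 2 I(a; b) / (H(a) + H(b)) (an assumed normalization)."""
    ha, hb = entropy(a), entropy(b)
    if ha + hb == 0:
        return 1.0  # both partitions are trivial (a single community)
    return 2 * mutual_information(a, b) / (ha + hb)


# Hypothetical two-level hierarchies on 8 nodes.
truth_level1 = [0, 0, 0, 0, 1, 1, 1, 1]
truth_level2 = [0, 0, 1, 1, 2, 2, 3, 3]  # each top community splits in two
found_level1 = [0, 0, 0, 0, 1, 1, 1, 1]  # top level recovered exactly
found_level2 = [0, 0, 0, 0, 1, 1, 1, 1]  # no sub-communities detected

print(round(nmi(truth_level1, found_level1), 3))  # 1.0
print(round(nmi(truth_level2, found_level2), 3))  # 0.667
```

The first-level NMI is perfect even though the detected hierarchy misses every second-level split; only a comparison of the second levels reveals the disagreement (NMI = 2/3 here, because the detected level 2 is a coarsening of the true one). A measure that combines all levels into a single number, such as the HMI used in this paper, is designed to account for such differences.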

ACKNOWLEDGEMENTS

ZY and CJT acknowledge financial support from the URPP Social Networks at the University of Zurich. They are also thankful to the S3IT (Service and Support for Science IT) of the University of Zurich for providing the support and the computational resources that have contributed to the research results reported in this study. JIP acknowledges financial support from grants by CONICET (PIP 112 20150 10028) and SeCyT-UNC (Argentina), and institutional support from IFEG-CONICET.