Triadic closure as a basic generating mechanism of communities in complex networks

Abstract

Most complex social, technological and biological networks have a significant community structure. Community structure should therefore be regarded as a universal property of complex networks, together with the much explored small-world and scale-free properties. Despite the large interest in characterizing the community structure of real networks, not enough attention has been devoted to identifying universal mechanisms able to spontaneously generate networks with communities. Triadic closure is a natural mechanism for making new connections, especially in social networks. Here we show that models of network growth based on simple triadic closure naturally lead to the emergence of community structure, together with fat-tailed distributions of node degree and high clustering coefficients. Communities emerge from the initial stochastic heterogeneity in the concentration of links, followed by a cycle of growth and fragmentation. Communities are more pronounced the sparser the graph, and they disappear for high values of link density and of randomness in the attachment procedure. By introducing a fitness-based link attractivity for the nodes, we find a novel phase transition, where communities disappear for high heterogeneity of the fitness distribution, but a new mesoscopic organization of the nodes emerges, with groups of nodes being shared between just a few superhubs, which attract most of the links of the system.

Ginestra Bianconi, Richard K. Darst, Jacopo Iacovacci, and Santo Fortunato
School of Mathematical Sciences, Queen Mary University of London, London, UK
Department of Biomedical Engineering and Computational Science, Aalto University School of Science, P.O. Box 12200, FI-00076, Finland

PACS numbers: 89.75.Hc, 89.75.Fb, 89.75.Kd, 89.75.-k, 05.40.-a
Keywords: Networks, triads, community structure

I. INTRODUCTION

Complex networks are characterized by a number of general properties that link together systems of very diverse origin, from nature, society and technology [1–3]. The feature that has received the most attention in the literature is the distribution of the number of neighbors of a node (degree), which is highly skewed, with a tail that can often be well approximated by a power law [4]. This property explains a number of striking characteristics of complex networks, like their high resilience to random failures [5] and the very rapid dynamics of diffusion phenomena, like epidemic spreading [6]. The generally accepted mechanism yielding broad degree distributions is preferential attachment [7]: in a growing network, new nodes set links with existing nodes with a probability proportional to the degree of the latter. This way the rate of accretion of neighbors will be higher for nodes with more connections, and the final degrees will be distributed according to a power law. This basic mechanism, however, taken alone without additional growth rules, generates networks with very low values of the clustering coefficient, a relevant feature of real networks [8]. Furthermore, these networks have no community structure [9, 10] either.

High clustering coefficients imply a high proportion of triads (triangles) in the network. It has been pointed out that there is a close relationship between a high density of triads and the existence of community structure, especially in social networks, where the density of triads is remarkably high [11–15]. Indeed, if we stick to the usual concept of communities as subgraphs with an appreciably higher density of (internal) links than in the whole graph, one would expect that triads are formed more frequently between nodes of the same group than between nodes of different groups [16]. This concept has actually been used to implement well-known community finding methods [17, 18]. Foster et al. [15] have studied equilibrium graph ensembles obtained by rewiring links of several real networks so as to preserve their degree sequences and introduce tunable values of the average clustering coefficient and degree assortativity. They found that the modularity of the resulting networks is more pronounced the larger the value of the clustering coefficient. Correlation, however, does not imply causation, and the work does not provide a dynamic mechanism explaining the emergence of high clustering and community structure.

Triadic closure [19] is a strong candidate mechanism for the creation of links in networks, especially social networks. Acquaintances are frequently made via intermediate individuals who know both us and the new friends. Besides, such a process has the additional advantage of not depending on the features of the nodes that get attached. With preferential attachment, it is the node's degree that determines the probability of linking, implying that each new node knows this information about all other nodes, which is not realistic. Instead, triadic closure induces an effective preferential attachment: getting linked to a neighbor $A$ of a node corresponds to choosing $A$ with a probability increasing with the degree $k_A$ of that node, according to a linear or sublinear preferential attachment. This principle is at the basis of several generative network models [13, 20–29], all yielding graphs with fat-tailed degree distributions and high clustering coefficients, as desired. Toivonen et al. have found that community structure emerges as well [13].

Here we propose a first systematic analysis of models based on triadic closure, and demonstrate that this basic mechanism can indeed endow the resulting graphs with all basic properties of real networks, including a significant community structure. These models may or may not include an explicit preferential attachment, and they can even be temporal networks, but as long as triadic closure is included, the networks are sufficiently sparse, and the growth is random, a significant community structure spontaneously emerges in the networks. In fact, the nodes of these networks are not assigned any "a priori" hidden variable that correlates with the community structure of the networks.

We will first discuss a basic model including triadic closure but not an explicit preferential attachment mechanism, and we will characterize the community formation and evolution as a function of the main variables of the linking mechanism, i.e. the relative importance of closing a triad versus random attachment and the average degree of the graph. We find that communities emerge when there is a high propensity for triadic closure and when the network is sufficiently sparse (low average degree). We will also consider further models from the literature that include triadic closure, and we show that the results concerning the emergence of community structure are qualitatively the same, independently of the presence of an explicit preferential attachment mechanism and of the temporal dynamics of the links. Finally, we will introduce a variant of the basic model, in which nodes have a fitness and a propensity to attract new links depending on their fitness. Here clusters are less pronounced and, when the fitness distribution is sufficiently skewed, they disappear altogether, while new peculiar aggregations of the nodes emerge, where all nodes of each group are attached to a few superhubs.

II. THE BASIC MODEL INCLUDING TRIADIC CLOSURE

We begin with what is possibly the simplest model of network growth based on triadic closure. The starting point is a small connected network of $n_0$ nodes and $m_0 \ge m$ links. The basic model contains two ingredients:

• Growth. At each time step a new node is added to the network with $m$ links.

• Proximity bias. The probability of attaching the new node to an existing node depends on the order in which the links are added.

The first link of the new node is attached to a random node $i_1$ of the network. The probability that the new node is attached to node $i_1$ is then given by

$$\Pi^{[0]}(i_1) = \frac{1}{n_0 + t}. \tag{1}$$

The second link is attached to a random node of the network with probability $1-p$, while with probability $p$ it is attached to a node chosen randomly among the neighbors of node $i_1$. Therefore in the first case the probability of attaching to a node $i_2 \neq i_1$ is given by

$$\Pi^{[0]}(i_2) = \frac{1-\delta_{i_1,i_2}}{n_0 + t - 1}, \tag{2}$$

where $\delta_{i_1,i_2}$ indicates the Kronecker delta, while in the second case the probability $\Pi^{[1]}(i_2)$ that the new node links to node $i_2$ is given by

$$\Pi^{[1]}(i_2) = \frac{a_{i_1,i_2}}{k_{i_1}}, \tag{3}$$

where $a_{ij}$ is the adjacency matrix of the network and $k_i$ is the degree of node $i$.

• Further edges. For the model with $m > 2$, further edges are added according to the "second link" rule in the previous point. With probability $p$, an edge is added to a random neighbor of the first node $i_1$ that is not yet linked to the new node. With probability $1-p$, a link is attached to a random node of the network that is not yet linked to the new node. A total of $m$ edges are added: one initial random edge and $m-1$ edges placed by the second-link rule.

FIG. 1: (Color online) Basic model. One link associated to a new node $i$ is attached to a randomly chosen node $j$, the other links are attached to neighbors of $j$ with probability $p$, closing triangles, or to other randomly chosen nodes with probability $1-p$.

For $m = 2$, in the basic model the probability that a node $i$ acquires a new link at time $t$ is given by

$$\frac{1}{t}\left[(2-p) + p\sum_j \frac{a_{ij}}{k_j}\right]. \tag{4}$$

In an uncorrelated network, where the probability $p_{ij}$ that a node $i$ is connected to a node $j$ is $p_{ij} = k_i k_j/(\langle k\rangle n)$ ($n$ being the number of nodes of the network), we might expect that the proximity bias always induces a linear preferential attachment, i.e.

$$\sum_j \frac{a_{ij}}{k_j} \propto k_i, \tag{5}$$

since replacing $a_{ij}$ by $p_{ij}$ gives $\sum_j p_{ij}/k_j = k_i/\langle k\rangle$. For a correlated network, however, this guess might not be correct. Therefore, assuming, as supported by the simulation results (see Fig. 2), that the proximity bias induces a linear or sublinear preferential attachment, i.e.

$$\Theta_i = p\sum_j \frac{a_{ij}}{k_j} \simeq c\,k_i^{\theta}, \tag{6}$$

with $\theta = \theta(p) \le 1$ and $c = c(p)$, we can write the master equation [30] for the average number $n_k(t)$ of nodes of degree $k$ at time $t$. From the simulation results it is found that $\theta(p)$ is an increasing function of $p$ for $m = 2$. Moreover, the exponent $\theta$ is also an increasing function of the number of edges $m$ of the new node. Assuming the scaling in Eq. (6), the master equation for $m = 2$ reads

$$n_k(t+1) = n_k(t) + \frac{2-p+c\,(k-1)^{\theta}}{t}\,n_{k-1}(t)\,(1-\delta_{k,2}) - \frac{2-p+c\,k^{\theta}}{t}\,n_k(t) + \delta_{k,2}. \tag{7}$$

In the limit of large values of $t$, we assume that the degree distribution $P(k)$ can be found as $n_k/t \to P(k)$. So we find the solution for $P(k)$

$$P(k) = C\,\frac{1}{3-p+c\,k^{\theta}}\prod_{j=1}^{k-1}\left(1-\frac{1}{3-p+c\,j^{\theta}}\right), \tag{8}$$

where $C$ is a normalization factor.
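As a numerical aside, the product in Eq. (8) can be evaluated directly. The sketch below is only an illustration: the function name `degree_distribution`, the cutoff `k_max`, and the values of `p`, `c` and `theta` used in any call are assumptions, not fitted values from the simulations. The normalization factor $C$ is fixed numerically over the truncated range.

```python
import math

def degree_distribution(p, c, theta, k_max=2000, k_min=2):
    """Evaluate Eq. (8) for k = k_min..k_max and normalise it numerically.

    p, c and theta are taken as given inputs; k_max is an arbitrary
    truncation of the tail (an assumption of this sketch).
    """
    # Running value of log prod_{j=1}^{k-1} (1 - 1/(3 - p + c j^theta)).
    log_prod = 0.0
    for j in range(1, k_min):
        log_prod += math.log(1.0 - 1.0 / (3.0 - p + c * j**theta))

    weights = {}
    for k in range(k_min, k_max + 1):
        weights[k] = math.exp(log_prod) / (3.0 - p + c * k**theta)
        # Extend the product by the factor j = k, needed for degree k + 1.
        log_prod += math.log(1.0 - 1.0 / (3.0 - p + c * k**theta))

    norm = sum(weights.values())
    return {k: w / norm for k, w in weights.items()}
```

Working with the logarithm of the product avoids underflow for large $k$. By construction, consecutive values satisfy the stationary form of Eq. (7), $P(k)\,(3-p+ck^{\theta}) = P(k-1)\,[2-p+c\,(k-1)^{\theta}]$ for $k > 2$.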
For $\theta < 1$, this expression can be approximated in the continuous limit by

$$P(k) \simeq D\,\frac{1}{3-p+c\,k^{\theta}}\;e^{(k-1)\,G(k-1,\theta,c)}, \tag{9}$$

where $D$ is the normalization constant and $G(k,\theta,c)$ is given by

$$G(k,\theta,c) = -\theta\,{}_2F_1\!\left(1,\frac{1}{\theta},1+\frac{1}{\theta},-\frac{c\,k^{\theta}}{3-p}\right) + \theta\,{}_2F_1\!\left(1,\frac{1}{\theta},1+\frac{1}{\theta},-\frac{c\,k^{\theta}}{2-p}\right) + \log\left(1-\frac{1}{3-p+c\,k^{\theta}}\right). \tag{10}$$

In this case the distribution is broad but not power law.

FIG. 2: (Color online) Scaling of $\Theta = \langle\Theta_i\rangle_{k_i=k}$, the average of $\Theta_i$ performed over nodes of degree $k_i = k$, versus the degree $k$. This scaling allows us to define the exponents $\theta = \theta(p)$ of Eq. (6). The figure is obtained by performing 100 realizations of networks of size $n = 100\,000$.

For $\theta = 1$, instead, the distribution can be approximated in the continuous limit by a power law, given by

$$P(k) \simeq D\,\frac{1}{(2-p+c\,k)^{1/c+1}}, \tag{11}$$

where $D$ is a normalization constant. Therefore we find that the network is scale free only for $\theta = 1$, i.e. only in the absence of degree correlations. In order to confirm the result of our theory, we have extracted from the simulation results the values of the exponents $\theta = \theta(p)$ as a function of $p$. With these values of $\theta(p)$, which turn out to be all smaller than 1, we have evaluated the theoretically expected degree distribution $P(k)$ given by Eq. (9) and we have compared it with simulations (see Fig. 3), finding very good agreement.

FIG. 3: (Color online) Degree distributions of the basic model, for different values of the parameter $p$. The continuous lines indicate the theoretical predictions of Eq. (9), the symbols the distributions obtained from numerical simulations of the model. The figure is obtained by performing 100 realizations of networks of size $n = 100\,000$.

We remark that this model has already been studied in independent papers by Vazquez [22] and Jackson [24], who claimed that the model always yields power-law degree distributions. Our derivation for $m = 2$ shows that this is not correct in general, and in particular it is not correct when the growing network exhibits degree correlations, in which case we do not expect that the probability to reach a node of degree $k_A$ by following a link is proportional to $k_A$. When the network is correlated we always find $\theta < 1$, i.e. the effective link probability is sublinear in the degree of the target node. We note, however, that the duplication model [25–28], in which every new node is attached to a random node and to each of its neighbors with probability $p$, displays at the same time degree correlations and a power-law degree distribution.

We also find that the model spontaneously generates communities during the evolution of the system. To quantify how pronounced communities are, we use a measure called embeddedness, which estimates how strongly nodes are attached to their own cluster. Embeddedness, which we shall indicate with $\xi$, is defined as follows:

$$\xi = \frac{1}{n_c}\sum_c \frac{k^{c}_{\mathrm{in}}}{k^{c}_{\mathrm{tot}}}, \tag{12}$$

where $k^{c}_{\mathrm{in}}$ and $k^{c}_{\mathrm{tot}}$ are the internal and the total degree of community $c$ and the sum runs over all $n_c$ communities of the network. If the community structure is strong, most of the neighbors of each node in a cluster will be nodes of that cluster, so $k^{c}_{\mathrm{in}}$ will be close to $k^{c}_{\mathrm{tot}}$ and $\xi$ turns out to be close to 1; if there is no community structure $\xi$ is close to zero. However, one could still get values of embeddedness which are not too small, even in random graphs, which have no modular structure, as $k^{c}_{\mathrm{in}}$ might still be sizeable there. To eliminate such borderline cases, we introduce a new variable, the node-based embeddedness, that we shall indicate with $\xi_n$. It is based on the idea that for a node to be properly assigned to a cluster, it must have more neighbors in that cluster than in any of the others.
This leads to the following definition

$$\xi_n = \frac{1}{n}\sum_i \frac{k_{i,\mathrm{in}} - k^{\max}_{i,\mathrm{ext}}}{k_i}, \tag{13}$$

where $k_{i,\mathrm{in}}$ is the number of neighbors of node $i$ in its cluster, $k^{\max}_{i,\mathrm{ext}}$ is the maximum number of neighbors of $i$ in any one other cluster and $k_i$ the total degree of $i$. The sum runs over all $n$ nodes of the graph. For a proper community assignment, the difference $k_{i,\mathrm{in}} - k^{\max}_{i,\mathrm{ext}}$ is expected to be positive; it is negative if the node is misclassified. In a random graph, and for subgraphs of approximately the same size, $\xi_n$ would be around zero. In a set of disconnected cliques (a clique being a subgraph where all nodes are connected to each other), which is the paradigm of perfect community structure, $\xi_n$ would be 1.

FIG. 4: (Color online) Heat map of node-based embeddedness $\xi_n$ (a) and average clustering coefficient $C$ (b) as a function of $p$ and $m$ for the basic model. Community structure (higher embeddedness and clustering coefficient) is pronounced in the lower left region, when $m$ is not too large (sparse graphs) and when the probability of triadic closure $p$ is very high. For each pair of parameter values we report the average over 50 network realizations. The white area in the upper right corresponds to systems where a single community, consisting of the whole network, is found. Here one would get a maximum value 1 for $\xi_n$, but it is not meaningful, hence we discard this portion of the phase diagram, as well as in Figs. 7 and 8.

In Fig. 4a we show a heat map for $\xi_n$ as a function of the two main variables of the model, the probability $p$ and the number of edges per node $m$, which is half the average degree. Communities were detected with non-hierarchical Infomap [31] in all cases. Results obtained by applying the Louvain algorithm [32] (taking the most granular level to avoid artifacts caused by the resolution limit [33]) yield a consistent picture. All networks are grown until $n = 50\,000$ nodes.
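Both embeddedness measures, Eqs. (12) and (13), are straightforward to compute once a partition is available. The following is a minimal sketch, not the code used for the figures: the function names and the input format (a dictionary of neighbor sets plus a dictionary of community labels) are assumptions, and the internal degree of a community is taken here to be the sum of the internal degrees of its nodes.

```python
from collections import Counter

def node_embeddedness(adj, community):
    """Node-based embeddedness xi_n of Eq. (13).

    adj: dict mapping each node to the set of its neighbours
         (undirected graph, every node with degree >= 1).
    community: dict mapping each node to its community label.
    """
    total = 0.0
    for i, neighbours in adj.items():
        # Number of neighbours of i in each community.
        counts = Counter(community[j] for j in neighbours)
        k_in = counts.pop(community[i], 0)
        # Largest number of neighbours in any single other community.
        k_ext_max = max(counts.values(), default=0)
        total += (k_in - k_ext_max) / len(neighbours)
    return total / len(adj)

def community_embeddedness(adj, community):
    """Community-level embeddedness xi of Eq. (12)."""
    k_in = Counter()
    k_tot = Counter()
    for i, neighbours in adj.items():
        c = community[i]
        k_tot[c] += len(neighbours)
        k_in[c] += sum(1 for j in neighbours if community[j] == c)
    return sum(k_in[c] / k_tot[c] for c in k_tot) / len(k_tot)
```

For two disconnected triangles labelled as two communities both functions return 1, in line with the clique limit described above. The partition itself has to come from a community detection method; none is implemented here.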
We see that large values of $\xi_n$ are associated with the bottom left portion of the diagram, corresponding to high values of the probability of triadic closure and to low values of degree. So, a high density of triangles ensures the formation of clusters, provided the network is sufficiently sparse. In Fig. 4b we present an analogous heat map for the average clustering coefficient $C$, which is defined [8] as

$$C = \frac{1}{n}\sum_i\sum_{j,k}\frac{a_{ij}\,a_{jk}\,a_{ki}}{k_i\,(k_i-1)}, \tag{14}$$

where $a_{ij}$ is the element of the adjacency matrix of the graph and $k_i$ is again the degree of node $i$. Fig. 4b confirms that $C$ is largest when $p$ is high and $m$ is low, as expected.

The mechanism of formation and evolution of communities is schematically illustrated in Fig. 5. When the first denser clumps of the network are formed (a), out of random fluctuations in the density of triangles, newly added nodes are more likely to close triads within the protoclusters than between them (b). As more nodes and links are added, the protoclusters become larger and larger and their internal density of links becomes inhomogeneous, so there will be a selective triadic closure within the denser parts, which yields a separation into smaller clusters (c). This cycle of growing and splitting repeats throughout the evolution of the system.

FIG. 5: (Color online) Schematic illustration of the formation and evolution of communities. Initial inhomogeneities in the link density make the closure of triads more likely in the denser parts, which keep growing until they become themselves inhomogeneous, leading to a split into smaller communities (different colors).

FIG. 6: (Color online) Evolution of node-based community embeddedness $\xi_n$ along the growth of the network. The curves refer to the extreme cases of absence of triadic closure (lower curve), yielding a random graph without communities, and of systematic triadic closure (upper curve), yielding a graph with pronounced community structure. For the latter case, we magnify in the inset the initial portion of the curve, to highlight the sudden drops of $\xi_n$, indicated by the arrows, which correspond to the breakout of clusters into smaller ones.

In Fig. 6 we show the time evolution of the node-based embeddedness $\xi_n$ during the growth of the system, until 500 nodes are added to the network, for $m = 2$.
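The growth rules of Section II, together with the clustering coefficient of Eq. (14), can be sketched in a few lines. This is an illustrative implementation under stated assumptions rather than the code behind the figures: the seed network is taken to be a complete graph on $m+1$ nodes (the text only asks for a small connected network), and when every neighbor of the first node is already linked to the new node the sketch falls back to random attachment. The function names are hypothetical.

```python
import random

def grow_basic_model(n, m=2, p=0.9, seed=0):
    """Grow a network with n nodes; returns dict node -> set of neighbours."""
    rng = random.Random(seed)
    # Assumed seed network: complete graph on m + 1 nodes.
    adj = {i: set(range(m + 1)) - {i} for i in range(m + 1)}
    for new in range(m + 1, n):
        first = rng.randrange(new)          # first link: uniform choice, Eq. (1)
        targets = {first}
        while len(targets) < m:
            free_neighbours = sorted(adj[first] - targets)
            if free_neighbours and rng.random() < p:
                # Triadic closure: random neighbour of the first node, Eq. (3).
                targets.add(rng.choice(free_neighbours))
            else:
                # Random attachment to a node not yet linked, Eq. (2).
                # Also used as the assumed fallback when no neighbour is free.
                candidate = rng.randrange(new)
                while candidate in targets:
                    candidate = rng.randrange(new)
                targets.add(candidate)
        adj[new] = set(targets)
        for v in targets:
            adj[v].add(new)
    return adj

def average_clustering(adj):
    """Average clustering coefficient C of Eq. (14)."""
    total = 0.0
    for i, neighbours in adj.items():
        k = len(neighbours)
        if k < 2:
            continue  # nodes of degree < 2 contribute zero
        # Ordered pairs (j, k) of neighbours of i that are themselves linked.
        linked_pairs = sum(len(adj[j] & neighbours) for j in neighbours)
        total += linked_pairs / (k * (k - 1))
    return total / len(adj)
```

Comparing `average_clustering(grow_basic_model(n, p=1.0))` with the same quantity at `p=0.0` gives a quick check that triadic closure is what produces the triangles.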
We consider the two extreme situations $p = 0$, corresponding to the absence of triadic closure, and $p = 1$, where both links close a triangle every time and there is no additional noise. In the first case (green line), after a transient, $\xi_n$ settles to a low value, with small fluctuations; in the case with pure triadic closure, instead, the equilibrium value is much higher, indicating strong community structure, and fluctuations are modest. In contrast with the random case, we recognize a characteristic pattern, with $\xi_n$ increasing steadily and then suddenly dropping. The smooth increase of $\xi_n$ signals that the communities are growing, the rapid drop that a cluster splits into smaller pieces: in the inset such breakouts are indicated by arrows. Embeddedness drops when clusters break up because the internal degrees $k_{i,\mathrm{in}}$ of the nodes of the fragments in Eq. (13) suddenly decrease, since some of the old internal neighbors now belong to a different community, while the values of $k^{\max}_{i,\mathrm{ext}}$ are typically unaffected.

III. PREFERENTIAL ATTACHMENT OR TEMPORAL NETWORK MODELS INCLUDING TRIADIC CLOSURE

The scenario depicted in Section II is not limited to the basic model we have investigated, but is quite general. To show this, we consider here two other models based on triadic closure.

The model by Holme and Kim [20] is a variant of the Barabási-Albert model of preferential attachment (BA model) which generates scale-free networks with clustering. The new node joining the network sets a link with an existing node, chosen with a probability proportional to the degree of the latter, just like in the BA model. The other $m-1$ links are attached with probability $P_t$ to a random neighbor of the node which received the most recent preferentially-attached link, closing a triangle, and with probability $1-P_t$ to another node chosen with preferential attachment. By varying $P_t$ it is possible to tune the level of clustering in the network, while the degree distribution is the same as in the BA model, i.e. a power law with exponent $-3$, for any value of $P_t$. In Fig. 7 we show the same heat map as in Fig. 4 for this model, where we now report the probability $P_t$ on the y-axis. Networks are again grown until $n = 50\,000$ nodes. The picture is very similar to what we observe for the basic model.

FIG. 7: (Color online) Heat map of node-based embeddedness $\xi_n$ (a) and average clustering coefficient $C$ (b) as a function of $P_t$ and $m$ for the model by Holme and Kim [20]. For each pair of parameter values we report the average over 50 network realizations. The white area in the upper right corresponds to systems where a single community, consisting of the whole network, is found, which is not interesting. The diagrams look qualitatively similar to that of the basic model (Fig. 4), with highest embeddedness and clustering coefficient in the lower left region.

The model by Marsili et al. [23], at variance with most models of network formation, is not based on a growth process. It is a model for temporal networks [34], in which the links are created and destroyed on a fast time scale while the number of nodes remains constant. The starting point is a random graph with $n$ nodes. Then, three processes take place, at different rates:

1. any existing link vanishes (rate $\lambda$);
2. a new link is created between a pair of nodes, chosen at random (rate $\eta$);
3. a triangle is formed by joining a node with a random neighbor of one of its neighbors, chosen at random (rate $\xi_M$).

In our simulations we start from a random network of $n = 50\,000$ nodes with average degree 10. The three rates $\lambda$, $\eta$ and $\xi_M$ can be reduced to two independent parameters, since what counts is their relative size. The number of links deleted at each iteration is proportional to $\lambda M$, where $M$ is the number of links of the network, while the number of links created via the two other processes is proportional to $\eta n$ and $\xi_M n$, respectively.
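The three processes listed above can be sketched as a simple discrete-time simulation. This is a rough illustration, not the simulation scheme of Ref. [23] or of this work: the small time step `dt`, the per-link interpretation of $\lambda$, the per-node interpretation of $\eta$ and $\xi_M$ (suggested by the counts $\lambda M$, $\eta n$ and $\xi_M n$), and all function names are assumptions.

```python
import random

def random_graph(n, avg_degree, rng):
    """Random graph with n nodes (labelled 0..n-1) and n*avg_degree/2 links."""
    adj = {i: set() for i in range(n)}
    links = 0
    while links < int(n * avg_degree / 2):
        u, v = rng.sample(range(n), 2)
        if v not in adj[u]:
            adj[u].add(v)
            adj[v].add(u)
            links += 1
    return adj

def step(adj, lam, eta, xi, dt, rng):
    """Advance the three processes by a short time dt (first order in dt)."""
    n = len(adj)
    # 1. Every existing link vanishes with probability lam * dt.
    doomed = [(u, v) for u in adj for v in adj[u]
              if u < v and rng.random() < lam * dt]
    for u, v in doomed:
        adj[u].discard(v)
        adj[v].discard(u)
    for i in range(n):
        # 2. With probability eta * dt, node i links to a random other node.
        if rng.random() < eta * dt:
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1
            adj[i].add(j)
            adj[j].add(i)
        # 3. With probability xi * dt, node i links to a random neighbour
        #    of one of its neighbours, closing a triangle.
        if adj[i] and rng.random() < xi * dt:
            j = rng.choice(sorted(adj[i]))
            k = rng.choice(sorted(adj[j]))
            if k != i:
                adj[i].add(k)
                adj[k].add(i)

def mean_degree(adj):
    return sum(len(nb) for nb in adj.values()) / len(adj)
```

With `xi = 0` the number of links in this sketch should fluctuate around the value at which the $\eta n$ links created and the $\lambda M$ links deleted per unit time balance, which is a convenient sanity check before switching triadic closure on.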
The number of links $M$ varies in time, but in order to get a non-trivial stationary state, one should reach an equilibrium situation where the numbers of deleted and created links match. A variety of scenarios are possible, depending on the choices of the parameters. For instance, if $\xi_M$ is set equal to zero, there is no triadic closure, and what one gets at stationarity is a random graph with average degree $2\eta/\lambda$ (balancing $\lambda M$ against $\eta n$ gives $M = \eta n/\lambda$). So, if $\eta \ll \lambda$, the graph is fragmented into many small connected components. If one introduces triadic closure, the clustering coefficient grows with $\xi_M$ if the network is fragmented, as triangles concentrate in the connected components. Moreover, the model can display a veritable first-order phase transition, and in a region of the phase diagram it displays two stable phases: one corresponding to a connected network with large average clustering coefficient and the other one corresponding to a disconnected network. Interestingly, if there is a dense single component, the clustering coefficient decreases with $\xi_M$. The degree distribution can follow different patterns too: it is Poissonian in the diluted phase, where the system is fragmented, and broad in the dense phase, where the system consists of a single component with an appreciable density of links. In Fig. 8 we show the analogous heat map as in Figs. 4 and 7, for the two parameters $\lambda$ and $\xi_M$. The third parameter is fixed at $\eta = 1$. We consider only configurations where the giant component covers more than half of the nodes of the network. The diagrams are now different because of the different role of the parameters, but the picture is consistent nevertheless. The clustering coefficient $C$ is highest when the ratio of $\lambda$ and $\xi_M$ lies within a narrow range, yielding a sparse network with a giant component having a high density of triangles and a corresponding presence of strong communities.

FIG. 8: (Color online) Heat map of node-based embeddedness $\xi_n$ (a) and average clustering coefficient $C$ (b) as a function of the rates $\lambda$ and $\xi_M$ for the model by Marsili et al. [23] ($\eta = 1$). For each pair of parameter values we report the average over 50 network realizations. The white area in the upper right corresponds to systems where a single community, consisting of the whole network, is found, which is not interesting. These diagrams have better communities (higher embeddedness and clustering coefficient) towards the upper right, different from those in Figs. 4 and 7, because of the different meaning and effect of the parameters. However, there is a strong correspondence between high clustering coefficient and strong community structure, as in the other models.

IV. THE BASIC MODEL INCLUDING TRIADIC CLOSURE AND FITNESS OF THE NODES

In this Section we introduce a variant of the basic model, where the link attractivity depends on some intrinsic fitness of the nodes. We will assume that the nodes are not all equal and assign to each node $i$ a fitness $\eta_i$ representing the ability of a node to attract new links. We have chosen to parametrize the fitness with a parameter $\beta > 0$ as

$$\eta_i = e^{-\beta\epsilon_i}, \tag{15}$$

with $\epsilon$ chosen from a distribution $g(\epsilon)$ and $\beta$ representing a tuning parameter of the model. We take

$$g(\epsilon) = (1+\nu)\,\epsilon^{\nu}, \tag{16}$$

with $\epsilon \in (0,1)$. When $\beta = 0$ all the fitness values are the same; when $\beta$ is large, small differences in the $\epsilon_i$ cause large differences in fitness. For simplicity we assume that the fitness values are quenched variables assigned once and for all to the nodes. As in the basic model without fitness, the starting point is a small connected network of $n_0$ nodes and $m_0 \ge m$ links. The model contains two ingredients:

• Growth. At time $t$ a new node is added to the network with $m \ge 2$ links.

• Proximity and fitness bias. The probability of attaching the new node to an existing node depends on the order in which links are added.

The first link of the new node is attached to a random node $i_1$ of the network with probability proportional to its fitness. The probability that the new node is attached to node $i_1$ is then given by

$$\Pi^{[0]}(i_1) = \frac{\eta_{i_1}}{\sum_j \eta_j}. \tag{17}$$

For $m = 2$ the second link is attached to a node of the network chosen according to its fitness, as above, with probability $1-p$, while with probability $p$ it is attached to a node chosen randomly among the neighbors of the node $i_1$ with probability proportional to its fitness.
Therefore in the first case the probability of attaching to a node $i_2 \neq i_1$ is given by

$$\Pi^{[0]}(i_2) = \frac{\eta_{i_2}\,(1-\delta_{i_1,i_2})}{\sum_{j\neq i_1}\eta_j}, \tag{18}$$

with $\delta_{i_1,i_2}$ indicating the Kronecker delta, while in the second case the probability $\Pi^{[1]}(i_2)$ that the new node links to node $i_2$ is given by

$$\Pi^{[1]}(i_2) = \frac{\eta_{i_2}\,a_{i_1,i_2}}{\sum_j \eta_j\,a_{i_1,j}}, \tag{19}$$

where $a_{ij}$ indicates the matrix element $(i,j)$ of the adjacency matrix of the network.

• Further edges. For $m > 2$, further edges are added according to the "second link" rule in the previous point. With probability $p$ an edge is added to a neighbor of the first node $i_1$, not already attached to the new node, according to the fitness rule. With probability $1-p$, a link is set to any node in the network, not already attached to the new node, according to the fitness rule.

For simplicity we shall consider here the case $m = 2$. The probability that a node $i$ acquires a new link at time $t$ is given by

$$\frac{e^{-\beta\epsilon_i}}{t}\left[(2-p) + p\sum_j \frac{\eta_j\,a_{ij}}{\sum_r \eta_r\,a_{jr}}\right]. \tag{20}$$

Similarly to the case without fitness, here we will assume, supported by simulations, that

$$\Theta_i = p\sum_j \frac{\eta_j\,a_{ij}}{\sum_r \eta_r\,a_{jr}} \simeq c\,k_i^{\theta(\epsilon)}, \tag{21}$$

where, for every value of $p$, $\theta = \theta(\epsilon) \le 1$ and $c = c(\epsilon)$. We can write the master equation for the average number $n_{k,\epsilon}(t)$ of nodes of degree $k$ and energy $\epsilon$ at time $t$, as

$$n_{k,\epsilon}(t+1) = n_{k,\epsilon}(t) + \frac{e^{-\beta\epsilon}\left[2-p+c(\epsilon)\,(k-1)^{\theta(\epsilon)}\right]}{t}\,n_{k-1,\epsilon}(t)\,(1-\delta_{k,2}) - \frac{e^{-\beta\epsilon}\left[2-p+c(\epsilon)\,k^{\theta(\epsilon)}\right]}{t}\,n_{k,\epsilon}(t) + \delta_{k,2}\,g(\epsilon). \tag{22}$$

In the limit of large values of $t$ we assume that $n_{k,\epsilon}/t \to P_\epsilon(k)$, and therefore we find that the solution for $P_\epsilon(k)$ is given by

$$P_\epsilon(k) = C(\epsilon)\,\frac{1}{1+e^{-\beta\epsilon}\left[2-p+c(\epsilon)\,k^{\theta(\epsilon)}\right]} \times \prod_{j=1}^{k-1}\left\{1-\frac{1}{1+e^{-\beta\epsilon}\left[2-p+c(\epsilon)\,j^{\theta(\epsilon)}\right]}\right\}, \tag{23}$$

where $C(\epsilon)$ is the normalization factor. For $\theta(\epsilon) < 1$, this expression can be approximated in the continuous limit by

$$P_\epsilon(k) \simeq D(\epsilon)\,\frac{e^{(k-1)\,G[k-1,\epsilon,\theta(\epsilon),c(\epsilon)]}}{1+e^{-\beta\epsilon}\left[2-p+c(\epsilon)\,k^{\theta(\epsilon)}\right]}, \tag{24}$$

where $D(\epsilon)$ is the normalization constant and $G(k,\epsilon,\theta,c)$ is given by

$$G(k,\epsilon,\theta,c) = -\theta\,{}_2F_1\!\left(1,\frac{1}{\theta},1+\frac{1}{\theta},-\frac{c\,k^{\theta}}{2-p+e^{\beta\epsilon}}\right) + \theta\,{}_2F_1\!\left(1,\frac{1}{\theta},1+\frac{1}{\theta},-\frac{c\,k^{\theta}}{2-p}\right) + \log\left(1-\frac{1}{1+e^{-\beta\epsilon}\left(2-p+c\,k^{\theta}\right)}\right). \tag{25}$$

When $\theta(\epsilon) = 1$, instead, we can approximate $P_\epsilon(k)$ with a power law, i.e.

$$P_\epsilon(k) \simeq D(\epsilon)\left[e^{-\beta\epsilon}\left(2-p+c(\epsilon)\,k\right)\right]^{-\frac{1}{e^{-\beta\epsilon}c(\epsilon)}-1}. \tag{26}$$

Therefore, the degree distribution $P(k)$ of the entire network is a superposition of the degree distributions $P_\epsilon(k)$ at fixed $\epsilon$, i.e.

$$P(k) = \int d\epsilon\,P_\epsilon(k). \tag{27}$$

As a result of this expression, we found that the degree distribution can be a power law also if the network exhibits degree correlations and $\theta(\epsilon) < 1$ at every value of $\epsilon$. Moreover, we observe that for large values of the parameter $\beta$ the distribution becomes broader and broader until a condensation transition occurs at $\beta = \beta_c$, with the value of $\beta_c$ depending on both the parameters $\nu$ and $p$ of the model. For $\beta > \beta_c$ successive nodes with maximum fitness (minimum value of $\epsilon$) become "superhubs", attracting a finite fraction of all the links, similarly to what happens in Ref. [35]. In Fig. 9 we see the degree distribution of the model, obtained via numerical simulations, for different values of $\beta$. The continuous lines, illustrating the theoretical behavior, are well aligned with the numerical results, as long as $\beta < \beta_c$.

FIG. 9: (Color online) Degree distribution of the model with fitness, for three values of the parameter $\beta$ ($\beta = 2$, 6 and 16), which indicates the heterogeneity of the distribution of the fitness of the nodes. Symbols stand for the results obtained by building the network via simulations, continuous lines for our analytical derivations. The figure is obtained by performing 100 realizations of networks of size $n = 100\,000$ with $\nu = 6$.

In Fig. 10 we show the heat map of $\xi_n$ and $C$ for the model, as a function of the parameters $p$ and $\beta$. The number of edges per node is $m = 2$, and the networks consist of 50 000 nodes.
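The fitness model of this Section can be sketched for $m = 2$ as follows. The energies are drawn from $g(\epsilon)$ of Eq. (16) by inverse transform sampling, since the cumulative distribution is $\epsilon^{1+\nu}$. This is a minimal illustration and not the simulation code of this work: the seed triangle, the function names and the linear-time weighted choice are assumptions.

```python
import math
import random

def sample_energy(nu, rng):
    """Draw epsilon from g(eps) = (1 + nu) * eps**nu on (0, 1], Eq. (16).

    1 - rng.random() lies in (0, 1], so epsilon is strictly positive.
    """
    return (1.0 - rng.random()) ** (1.0 / (1.0 + nu))

def grow_fitness_model(n, p, beta, nu=6.0, seed=0):
    """Grow the m = 2 fitness model; returns (adjacency dict, energies)."""
    rng = random.Random(seed)
    eps = [sample_energy(nu, rng) for _ in range(n)]
    eta = [math.exp(-beta * e) for e in eps]      # fitness, Eq. (15)
    # Assumed seed network: a triangle on nodes 0, 1, 2.
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
    for new in range(3, n):
        # First link: existing node chosen proportionally to fitness, Eq. (17).
        first = rng.choices(range(new), weights=eta[:new])[0]
        if rng.random() < p:
            # Triadic closure among the neighbours of `first`, Eq. (19).
            pool = sorted(adj[first])
        else:
            # Fitness-weighted choice among all other nodes, Eq. (18).
            pool = [j for j in range(new) if j != first]
        second = rng.choices(pool, weights=[eta[j] for j in pool])[0]
        adj[new] = {first, second}
        adj[first].add(new)
        adj[second].add(new)
    return adj, eps
```

Each step costs a time linear in the current network size, which is acceptable for a few thousand nodes; the network sizes quoted in the text would need a faster weighted-sampling structure.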
Everywhere in this work, we set the parameter $\nu = 6$. For $\beta = 0$ all nodes have identical fitness and the model reduces to the basic model. We therefore recover the previous results, with the emergence of communities for sufficiently large values of the probability of triadic closure $p$, following a large density of triangles in the system. The situation changes dramatically when $\beta$ starts to increase: community structure progressively weakens while the clustering coefficient keeps growing, which appears counterintuitive. In the analogous diagrams for $m = 5$ (Fig. 11), this pattern holds, though with a weaker overall community structure and lower values of the clustering coefficient.

FIG. 10: (Color online) Heat map of node-based embeddedness (a) and average clustering coefficient (b) as a function of the probability of triadic closure $p$ and the heterogeneity parameter $\beta$ of the fitness distribution of the nodes, for the model with fitness. The number of new edges per node is $m = 2$. For each pair of parameter values we report the average over 50 network realizations. When $\beta = 0$ we recover the basic model, without fitness. We see the highest values of embeddedness in the lower left, while the highest values of the clustering coefficient are in the lower right. When $\beta$ increases, we see a drastic change of structure in contrast to the previous pattern: communities disappear, whereas the clustering coefficient gets higher. [Panels: (a) $\xi_n$, (b) $C$; axes: $\beta$, $p$.]

FIG. 11: (Color online) Same as Fig. 10, but for $m = 5$. The picture is consistent with the case $m = 2$, but communities are less pronounced. [Panels: (a) $\xi_n$, (b) $C$; axes: $\beta$, $p$.]

When $\beta$ is sufficiently large, communities disappear, despite the high density of triangles. To check what happens, we compute the probability distribution of the scaled link density $\tilde{\rho}$ and the node-based embeddedness $\xi_n$ of the communities of the networks obtained from 100 runs of the model, for three different values of $\beta$: 0, 6 and 20. All networks are grown until they reach 100 000 nodes. The scaled link density $\tilde{\rho}$ of a cluster is defined [36] as

\[
\tilde{\rho} = \frac{2 l_c}{n_c - 1}, \qquad (28)
\]

where $l_c$ and $n_c$ are the number of internal links and of nodes of cluster $c$. If the cluster is tree-like, $\tilde{\rho} \approx 2$; if it is clique-like, $\tilde{\rho} \approx n_c$, so it grows linearly with the size of the cluster. The distributions of $\xi_n$ and $\tilde{\rho}$ are shown in Fig. 12. They are peaked, but the peaks undergo a rapid shift when $\beta$ goes from 0 to 20. The situation resembles what one usually observes in first-order phase transitions. The embeddedness ends up peaking at low values, quite distant from the maximum 1, while the scaled link density eventually peaks sharply at 2, indicating that the subgraphs are effectively tree-like.

FIG. 12: (Color online) Probability distributions of the scaled link density $\tilde{\rho}$ (left) and node-based embeddedness $\xi_n$ (right) of the communities of the fitness model, for $m = 2$ and $\beta = 0, 6, 20$. For each $\beta$-value we derived 100 network realizations, each with 100 000 nodes. We see that at $\beta = 0$ the detected communities satisfy the expectations of good communities, while at $\beta = 20$ they do not. [Panels: (a) $\tilde{\rho}$, (b) $\xi_n$; vertical axis: probability density; legend: $\beta = 0$, $\beta = 6$, $\beta = 20$.]

What kind of objects are we looking at? To answer this question, in Figs. 13 and 14 we display two pictures of networks obtained by the fitness model, for $\beta = 0$ and $\beta = 20$, respectively. The number of nodes is 2 000, and the number of edges per node is $m = 2$. The probability of triadic closure is $p = 0.97$, as we want a very favorable scenario for the emergence of structure. The subgraphs found by our community detection method (non-hierarchical Infomap, but the Louvain method yields a similar picture) are identified by the different colors. The insets show an enlarged picture of the subgraphs, which clarifies the apparent puzzle posed by the previous diagrams. For the basic model, $\beta = 0$ (Fig. 13), the subgraphs are indeed communities, as they are cohesive objects which are only loosely connected to the rest of the graph. The situation remains similar for low values of $\beta$. However, for sufficiently high $\beta$ (Fig. 14), a phenomenon of link condensation takes place, with a few superhubs attracting most of the links of the network [35]. Most of the other nodes are organized in groups which are "shared" between pairs (for $m = 2$, more generally $m$-tuples) of superhubs (see figure). The community embeddedness is low because there are always many links flowing out of the subgraphs, towards superhubs. Besides, since the superhubs are all linked to each other, this generates a high clustering coefficient for the subgraphs, as observed in Figs. 10 and 11. In fact, the clustering coefficient for the non-hubs attains the maximum possible value of 1, as their neighbors are nodes which are all linked to each other.

V. CONCLUSIONS

Triadic closure is a fundamental mechanism of link formation, especially in social networks. We have shown that this mechanism alone is capable of generating systems with all the characteristic properties of complex networks, from fat-tailed degree distributions to high clustering coefficients and strong community structure. In particular, we have seen that communities emerge naturally via triadic closure, which tends to generate cohesive subgraphs around portions of the system that happen to have a higher density of links, due to stochastic fluctuations. When clusters become sufficiently large, their internal structure exhibits in turn link density inhomogeneities, leading to a progressive differentiation and eventual separation into smaller clusters (separation in the sense that the density of links between the parts is appreciably lower than within them). This occurs both in the basic version of the network growth model based on triadic closure, and in more complex variants. Community structure is stronger the sparser the network and the higher the probability of triadic closure.

We have also introduced a new variant, in which link attractivity depends on some intrinsic appeal of the nodes, or fitness. Here we have seen that, when the distribution of fitness is not too heterogeneous, community structure still emerges, though it is weaker than in the absence of fitness. By increasing the heterogeneity of the fitness distribution, instead, we observe a major change in the structural organization of the network: communities disappear and are replaced by special subgraphs, whose nodes are connected only to superhubs of the network, i.e. nodes attracting most of the links. This structural phase transition is associated with very high values of the clustering coefficient.

FIG. 13: (Color online) Picture of a network with 2 000 nodes generated by the fitness model, for $p = 0.97$, $m = 2$ and $\beta = 0$. Since $\beta = 0$, fitness does not play a role and we recover the results of the basic model. Colors indicate communities as detected by the non-hierarchical Infomap algorithm [31].

Acknowledgments

R. K. D. and S. F. gratefully acknowledge MULTIPLEX, grant number 317532 of the European Commission, and the computational resources provided by the Aalto University Science-IT project.

FIG. 14: (Color online) Picture of a network with 2 000 nodes generated by the fitness model, for $p = 0.97$, $m = 2$ and $\beta = 20$. The growing process is the same as in Fig. 13, but the addition of fitness changes the structural organization of the network. As seen in the inset, node aggregations form around hub nodes with high fitness. Looking at the inset we see that such aggregations do not satisfy the typical requirements for communities: they are internally tree-like, and there are more external edges (blue or light gray) than internal ones (red or dark gray) touching their nodes. In particular, internal edges only go from regular nodes to superhubs.

[1] R. Albert and A.-L. Barabási, Rev. Mod. Phys. , 47 (2002).
[2] A. Barrat, M. Barthélemy, and A. Vespignani, Dynamical processes on complex networks (Cambridge University Press, Cambridge, UK, 2008).
[3] M. Newman, Networks: An Introduction (Oxford University Press, Inc., New York, NY, USA, 2010).
[4] R. Albert, H. Jeong, and A.-L. Barabási, Nature , 130 (1999).
[5] R. Albert, H. Jeong, and A.-L. Barabási, Nature , 378 (2000).
[6] R. Pastor-Satorras and A. Vespignani, Phys. Rev. Lett. , 3200 (2001).
[7] A.-L. Barabási and R. Albert, Science , 509 (1999).
[8] D. Watts and S. Strogatz, Nature , 440 (1998).
[9] M. Girvan and M. E. Newman, Proc. Natl. Acad. Sci. USA , 7821 (2002).
[10] S. Fortunato, Physics Reports , 75 (2010).
[11] M. Newman and J. Park, Physical Review E , 036122 (2003).
[12] M. E. J. Newman, Phys. Rev. E , 026121 (2003).
[13] R. Toivonen, J.-P. Onnela, J. Saramäki, J. Hyvönen, and K. Kaski, Physica A Statistical Mechanics and its Applications , 851 (2006), arXiv:physics/0601114.
[14] J. M. Kumpula, J.-P. Onnela, J. Saramäki, K. Kaski, and J. Kertész, Phys. Rev. Lett. , 228701 (2007).
[15] D. V. Foster, J. G. Foster, P. Grassberger, and M. Paczuski, Phys. Rev. E , 066117 (2011).
[16] M. Granovetter, Am. J. Sociol. , 1360 (1973).
[17] G. Palla, I. Derényi, I. Farkas, and T. Vicsek, Nature , 814 (2005).
[18] F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, and D. Parisi, Proc. Natl. Acad. Sci. USA , 2658 (2004).
[19] A. Rapoport, The bulletin of mathematical biophysics , 523 (1953).
[20] P. Holme and B. J. Kim, Physical Review E , 026107+ (2002).
[21] J. Davidsen, H. Ebel, and S. Bornholdt, Phys. Rev. Lett. , 128701 (2002).
[22] A. Vázquez, Phys. Rev. E , 056104 (2003).
[23] M. Marsili, F. Vega-Redondo, and F. Slanina, Proceedings of the National Academy of Sciences of the USA , 1439 (2004).
[24] M. O. Jackson and B. W. Rogers, American Economic Review , 890 (2007).
[25] R. V. Solé, R. Pastor-Satorras, E. Smith, and T. B. Kepler, Adv. Complex Syst. , 43 (2002).
[26] P. L. Krapivsky and S. Redner, Phys. Rev. E , 036118 (2005).
[27] I. Ispolatov, P. L. Krapivsky, and A. Yuryev, Phys. Rev. E , 061911 (2005).
[28] R. Lambiotte, URL .
[29] T. Aynaud, V. D. Blondel, J.-L. Guillaume, and R. Lambiotte, Multilevel Local Optimization of Modularity (John Wiley & Sons, Inc., 2013), pp. 315–345.
[30] J. F. F. Mendes and S. N. Dorogovtsev, Evolution of Networks: from biological nets to the Internet and WWW (Oxford University Press, Oxford, UK, 2003).
[31] M. Rosvall and C. T. Bergstrom, Proc. Natl. Acad. Sci. USA , 1118 (2008).
[32] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, J. Stat. Mech. P10008 (2008).
[33] S. Fortunato and M. Barthélemy, Proc. Natl. Acad. Sci. USA , 36 (2007).
[34] P. Holme and J. Saramäki, Physics Reports , 97 (2012), temporal Networks.
[35] G. Bianconi and A.-L. Barabási, Phys. Rev. Lett. , 5632 (2001).
[36] A. Lancichinetti, M. Kivelä, J. Saramäki, and S. Fortunato, PLoS ONE 5