Two universal physical principles shape the power-law statistics of real-world networks
arXiv [physics.soc-ph]
Tom Lorimer, Florian Gomez, Ruedi Stoop ∗
Institute of Computational Science and Institute of Neuroinformatics, University of Zurich and ETH Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
(Dated: October 15, 2018)
Abstract
The study of complex networks has pursued an understanding of macroscopic behavior by focusing on power laws in microscopic observables. Here, we uncover two universal fundamental physical principles that are at the basis of complex network generation. Together, these principles predict the generic emergence of deviations from ideal power laws, which were previously explained away by reference to the thermodynamic limit. Our approach proposes a paradigm shift in the physics of complex networks, toward the use of power-law deviations to infer meso-scale structure from macroscopic observations.

Introduction

A recent seminal discovery elucidated that in nature a simple physical principle often rules the growth of 'random networks'. The so-called preferential attachment ('the rich get richer') rule leads to complex networks whose properties contrast with those predicted by classical random network theory. A fundamental universality principle of physics must be held responsible for this change of paradigm. In our interpretation, the preferential attachment principle expresses that the formation of ensembles requires attractive forces that are generally valid over decades of spatial extension (in physics these may involve, e.g., mass or charge). It is this principle that generates the celebrated power laws observed in the distribution of mesoscopic network indicators, such as network degree, connectivity weight, or neuronal avalanche size. A second fundamental universality principle of physics is, however, active at the same time and has so far passed unnoticed: real-world connectivity requires space, and this space is limited. The question that we address in our work is what traces this principle leaves, both during network formation and in the final network. This question has not been answered so far.
Generic network building algorithm
To study this question, we consider a novel generic network building algorithm (our 'primary model') that implements both principles at the most basic level, as follows. We start from a connected network of N nodes. With probability p, an 'outside' node, taken from a finite set of available nodes, is added; alternatively, with probability 1 − p, an attempt is made to construct an 'inside' edge (see below). If an outside node is added, the new node joins the network by m edges, where the target nodes are sampled according to their degree k (i.e. ∝ k), following preferential attachment. For an inside edge, two nodes are chosen independently by preferential attachment (i.e., with probability proportional to their current degree). If the two chosen nodes are not identical and not already connected, an edge is established; otherwise the attempt fails. In this way, the algorithm's second alternative expresses the second fundamental principle in terms of an 'edge saturation' (at a level defined by p and m, implemented right from the start of the network's growth). The process stops when the set of available nodes is depleted. The algorithm generates undirected topological networks of arbitrary size, void of loops and multiple edges; examples will be discussed later. Fig. 1 shows the stereotypical degree distribution obtained in this way, exhibiting an extended power-law part terminated by a hump (which, as the network grows, moves towards larger degrees until the process is stopped by node depletion, cf. Fig. 7b).
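The growth rule described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function name `grow_network`, the seed network (a simple path of `n_init` nodes), and the rejection of duplicate targets when an outside node attaches are choices made here for concreteness. Degree-proportional sampling is done by drawing uniformly from a list in which every node appears once per unit of degree.

```python
import random
from collections import Counter


def grow_network(n_total, p, m, n_init=3, seed=None):
    """Grow one realization of the two-principle model (illustrative sketch).

    Returns the adjacency structure as a dict mapping node -> set of neighbors.
    Assumptions (not from the paper): the seed network is a path of n_init
    nodes, and the m targets of a new node are forced to be distinct.
    """
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]; p = 0 never adds nodes")
    if n_init < max(2, m):
        raise ValueError("need at least max(2, m) seed nodes")
    rng = random.Random(seed)
    adj = {i: set() for i in range(n_init)}
    stubs = []  # node i appears k_i times, so rng.choice(stubs) samples ∝ k

    def add_edge(u, v):
        adj[u].add(v)
        adj[v].add(u)
        stubs.extend((u, v))

    for i in range(n_init - 1):  # connected seed: a simple path
        add_edge(i, i + 1)

    while len(adj) < n_total:  # stop when the available nodes are depleted
        if rng.random() < p:
            # Outside node: attach to m distinct targets chosen ∝ degree.
            targets = []
            while len(targets) < m:
                v = rng.choice(stubs)
                if v not in targets:
                    targets.append(v)
            new = len(adj)
            adj[new] = set()
            for v in targets:
                add_edge(new, v)
        else:
            # Inside edge attempt: both endpoints chosen ∝ degree.
            u, v = rng.choice(stubs), rng.choice(stubs)
            if u != v and v not in adj[u]:
                add_edge(u, v)
            # Otherwise the attempt fails (edge saturation); nothing is added.
    return adj


def degree_counts(adj):
    """Histogram of node degrees: degree -> number of nodes."""
    return Counter(len(nbrs) for nbrs in adj.values())
```

Calling `degree_counts(grow_network(2000, 1/12, 2, seed=1))` gives the degree histogram of one realization; averaging such histograms over several seeds would be the natural way to approach curves like those in Fig. 1.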
[Figure 1: log-log plot of P(k) versus k for p = 1/48, 1/24, 1/12, 1/6 and p = 1 (BA), with visual guides P(k) ~ k^-3 and P(k) ~ k^-1.5.]
FIG. 1: Characteristic degree distributions from the two key principles (for different values of parameter p and fixed parameter m = 2; the effect of m is exhibited in Fig. 3 and Fig. 4). Network size t = 10 nodes, mean of 10 realizations. Dashed lines: power-law visual guides. The effect is most saliently expressed for exponents < 2, occurring often in gene or protein networks.
Network properties
While we observe widespread activity to find power-law distributions in all areas of physics, we emphasize that, given the fundamental ingredients necessary in the network building process, neat power laws will be found only in rare cases. Examples of experimental data with the deviations that our key principles predict are shown in Fig. 2. While our real-world examples are often related to biology (mostly because of the great availability of the underlying data, and because of the greater simplicity of the examples), all of our arguments are immediately transferable to physical situations where previous analysis has generally stopped at the preferential attachment level. Our analysis now provides guidelines for inferring from macroscopic measurements the microscopic properties that dominate network growth (cf. Fig. 3, where the 'humpiness' of the distribution P(k) was evaluated as the deviation from the power law p(k) that excludes the hump, as (P(k) − p(k))/p(k)). This provides an important input for the modeling of real-world systems (see, e.g., the Drosophila network example discussed below). By superposition of prototypes with different p and m parameters, more general hump structures can be generated (Fig. 2). This mechanism provides an as yet unexplored link between the macro- and meso-scales that can be invaluable for both the modeling and the further analysis of real-world systems.
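As an illustration of the humpiness measure (P(k) − p(k))/p(k), the following sketch fits a reference power law to the low-degree part of a distribution and reports the relative deviation at every k. The text does not state how p(k) was obtained, so the least-squares fit in log-log space and the cutoff `k_max_fit` (the largest degree considered free of the hump) are assumptions made here, as are the function names.

```python
import math


def fit_power_law(ks, Ps, k_max_fit):
    """Fit p(k) = c * k**slope by least squares in log-log space.

    Only points with k <= k_max_fit and P > 0 are used, so that the hump
    is excluded from the fit (the cutoff is chosen by the user).
    """
    pts = [(math.log(k), math.log(P))
           for k, P in zip(ks, Ps) if k <= k_max_fit and P > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    c = math.exp(mean_y - slope * mean_x)
    return c, slope


def humpiness(ks, Ps, k_max_fit):
    """Relative deviation (P(k) - p(k)) / p(k) for every k in ks."""
    c, slope = fit_power_law(ks, Ps, k_max_fit)
    return [(P - c * k ** slope) / (c * k ** slope) for k, P in zip(ks, Ps)]


# Synthetic check: a k^-1.5 law whose values are doubled for 30 <= k <= 40.
ks = list(range(1, 51))
Ps = [k ** -1.5 * (2.0 if 30 <= k <= 40 else 1.0) for k in ks]
h = humpiness(ks, Ps, k_max_fit=20)
```

In this synthetic case the deviation is close to 0 outside the doubled range and close to 1 inside it. For real data the result will depend on the chosen cutoff, so it is worth checking how stable the fitted exponent is when `k_max_fit` is varied.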
FIG. 2: Typical weight and degree distributions, respectively, from experiments, and their qualitative modeling (black: experimental data, red: simulation data). a) Network of synchronizing linear phase oscillators (network weight distribution during synchronization). b) Gene family for S. cerevisiae (family size distribution). For the modeling, different (p, m)-models were superimposed for a).

FIG. 3: Modeling guidelines: Phase diagram of the humped power law's exponent and 'humpiness' as functions of the local parameters (p, m) (see text). Exponent ranges shown: > 2.5, 2 - 2.5, < 2. Domains of humpiness: I) not resolvable, II) minor, III) significant, IV) salient. Guided by the power-law paradigm, investigations have mostly focused on examples from domains I and II. Network sizes: t = 10 .
In contrast to preferential attachment networks, a network generated along the two fundamental physical principles embodied in our primary model will not necessarily be sparse (this would imply a power-law exponent > [...]). [...] deviates from the fundamental principles that we have worked out. That model uses a second internal linking process that is always successful in making new connections. In our case, it is exactly the edge connection failures (by edge saturation) that define the network structure. Whereas the rate of internal linking in that algorithm accelerates with the network size, our approach does not share this property. Moreover, the network structures that we obtain depend primarily on parameter p, and the obtained distributions are generally unaffected by the network's initial condition (in contrast to Refs. [...]).

The modeling of biological networks that contain only a small number of nodes is a particular challenge. The example of Drosophila's courtship network, a network that is built on observable irreducible acts of body language (cf. Figs. 4 and 5), illustrates that our approach also masters this challenge (a further discussion of this example is given towards the end of the paper).

Statistical modeling
To better understand how the statistical properties, and in particular saturation, emerge from the model, we focus on a semi-analytical growth description in which the natural time step t is the addition of one node to the network. The degree distribution from a network growth algorithm is usually determined from a differential equation that describes the rate of addition of new edges to a given node, as a function of the time s at which the node has joined the network, i.e.

$$\frac{\partial k(s,t)}{\partial t} = f(k, s, t).$$

For our algorithm, the topological constraint on the addition of inside edges implies that ∂k(s,t)/∂t cannot be determined analytically from the single-node information f(k, s, t), but requires the full pairwise connection information of the network encoded in the adjacency matrix at time t, A_t, i.e.

$$\frac{\partial k(s,t)}{\partial t} = f(k, s, t, A_t).$$

To work around this complication, we make the following ansatz. We suppose that the probability of failure while trying to add an inside edge (i, j) to an already chosen node i can be expressed by a mean-field 'saturation' function F(k, t) in terms of the degree k of node i. Furthermore, we suppose that the total number of edges present in the network at time t can be approximated by K(t). F(k, t) is then defined as the average probability that a node with degree k is already connected to a second node j chosen with P ∝ k_j. Thus,

$$F(k,t) := \big\langle F_i(t) \big\rangle_{k_i = k}, \tag{1}$$

where F_i(t) is the probability that node i with degree k_i is already connected to node j. F_i(t) then has the form

$$F_i(t) := \frac{k_i(t) + \sum_{(i,j)\in E(t)} k_j(t)}{\sum_j k_j(t)}, \tag{2}$$

where k_i(t) accounts for the case where node i would be chosen twice, and the second term is the degree-weighted sum over the nodes to which node i is already connected (E(t) denotes the network's set of edges).

Using this approximation, we can express our algorithm by the rate of addition of new edges to a node of degree k(s, t) as

$$\frac{\partial k(s,t)}{\partial t} = \frac{m\,k(s,t)}{2K(t)} + \frac{1-p}{p}\,\frac{k(s,t)}{K(t)}\,\big[1 - F(k,t)\big]. \tag{3}$$

In this case, the network grows out from a connected network of N nodes, with k(s, s) ≈ m as the initial condition. The first term on the right-hand side of Eq. (3) describes the increase in k due to connections to outside nodes, and the second term describes the addition of inside edges. The whole equation has been rescaled by p (canceling the p in the first term's numerator) such that t corresponds to the number of nodes in the network. As can easily be seen from Eq. (3), our growth algorithm has two well-known limiting cases. For p = 1 we retrieve the preferential attachment growth process. For p = 0, the network will not add nodes and must asymptotically become a clique of size N. In between, for p << 1, the second term dominates, which renders the network more dense and produces the large deviation from power-law structure in the distribution tail.

[Figure 4: panels (a)-(f), log-log plots of P(k) versus k; legend entries p = 1, 1/6, 1/12, 1/24, 1/48 and m = 2, 3, 5, 10.]
FIG. 4: a)-e) Effect of the choice of m on the network degree distribution, for different values of p (network size t = 10 nodes, mean of 10 realizations). Increasing m for p << [...] k < m have small probability.

[Figure 5: SF(k) versus k.]
FIG. 5: Drosophila courtship language network degree distribution. a) Survival function SF(k) := 1 − CDF(k), where CDF is the cumulative distribution function (red dots: original data). Solid line: means, dashed lines: 0.05 quantiles, from 1000 realizations of our network growth algorithm (N = 34, p = [...], m = 2). Inset: mapped-out Drosophila language network.

To demonstrate the validity of our mean-field approximation, we compare the node degree evolution obtained from a 4th-order Runge-Kutta integration of Eq. (3), using our approximation for F(k, t) (see below), against the averaged result from 10 realizations of the primary model. As a result, an approximate power-law scaling clearly emerges at the early evolution stage, and an upper bound to the envelope of node degrees emerges for the longer evolution times t necessary to attain larger network sizes (cf. Fig. 6, where the results of the semi-analytical description are based on exponents and prefactors from an approximation of the results of Fig. 7a) via Eq. (4)).

[Figure 6: log-log plot of k(s,t) versus t.]
FIG. 6: Comparison: Primary model / semi-analytical description. Degree evolution k(s, t) of nodes entering the network at s = 21, [...] primary model realizations (dashed), compared with numerical integration of Eq. (3) (solid).

F(k, t) has a very regular behavior in both variables (k, t) (Fig. 7a) and is accompanied by a node degree distribution P(k) as found for our primary model (Fig. 7b). Over a large range, we can approximate F(k, t) by a power law for small k, and by a second power law at large k:

$$F(k,t) \approx \begin{cases} t^{\alpha} k^{\beta} & \text{if } k \le k_c, \\[4pt] k^{\gamma}\, \dfrac{t^{\alpha} k_c^{\beta}}{k_c^{\gamma}} & \text{if } k > k_c, \end{cases} \tag{4}$$

where k_c ∼ t^λ, and the fractional term for k > k_c simply makes F(k, t) continuous at k_c. The exponents α, β, γ, λ vary according to the choice of algorithm parameter p, where 0 < λ < 1, i.e. 1 < k_c < t. In accordance with Fig. 7a), the following observations can be made. First, γ < β (the exponent of the power-law fit decreases as k crosses k_c). Second, F(t − 1, t) = 1, since a node of degree t − 1 is already connected to every other node at time t (achieved in Fig. 7a) for t = 25 only). Similarly, as p → 0, F(k, t) → 1 (the network will tend toward a clique, where all possible connections already exist). When p = 1, F(k, t) ceases to be relevant. Finally, for any p ∈ (0, 1), as t → ∞, F(k, t) → 0, since the number of inside edges added at each time step approximates a constant value, so the network becomes increasingly sparse.

We can use F(k, t) to infer the generated unnormalized degree probability distribution N(k, t) as follows. Starting from the continuity equation, we may write

$$\frac{\partial}{\partial t} N(k,t) = -\frac{\partial}{\partial k}\Big( N(k,t)\,\frac{\partial k}{\partial t} \Big) + \delta_{m,k}, \tag{5}$$

where ∂k/∂t is given by Eq. (3), and the Kronecker delta has been included to account for the addition of outside nodes. By differentiating Eq. (3), we notice that Eq. (5) contains the product of k and the derivative of the saturation function F:

$$\frac{\partial}{\partial k}\frac{\partial k}{\partial t} = a_1 + a_2 - a_2\Big( k\,\frac{\partial}{\partial k}F(k,t) + F(k,t) \Big), \tag{6}$$

where a_1 := m/(2K(t)) and a_2 := (1 − p)/(pK(t)). The form of F(k, t) implies that a sharp change should occur in the solutions of Eq. (6) around k_c. Indeed, a comparison between P(k, t) and F(k, t) (Fig. 7) supports this suggestion. Thus, we hold the properties of the saturation function F(k, t) responsible for the form of the deviation of P(k, t) from the ideal power law.

[Figure 7: panel (a) F(k) versus k and panel (b) P(k) versus k, each for several times t.]
FIG. 7: Relation between power-law deviation hump and saturation function: a) Mean-field saturation F(k, t), b) mean of the degree distribution. Data set: 10 network realizations for given time t using p = [...]. Vertical grey lines are visual aids. The figure indicates the disappearance of the hump structure in the thermodynamic limit.

Discussion

Examples of edge saturation network growth emerge from the fundamental situation where the state of a physical system is described by a symbol, and where time acting on the states leads to a description in terms of a language (symbolic dynamics and formal languages, natural languages).
Starting with a finite number of N states, observations of the system in time yield sequences of states that define links on a graph between nodes (states), which implies that more important or more versatile nodes will have more links. As such a network evolves toward a finer description, two processes may occur: 1) adjacencies are established between previously unconnected nodes (preferentially between more versatile ones); 2) a new node is added and connected preferentially to already highly connected nodes. Evidently, in many networks there will, however, be a limitation on the number of edges that can be hosted by a given node.

The Drosophila courtship body language of 37 fundamental behavioral states and its network is an example of such a process. The states are fundamental in the sense that each act could, from the view of the physics of body motion, be followed by any other act. Some transitions, however, are generally not taken, leading to missing edges. Well-defined connected sub-networks characterize a chosen courtship partner's class, according to which protagonists can be distinguished (male, female (virgin, mature, mated), fruitless). Within these bounds, courtship exploits the available expression space, corroborating the view that it might advertise individual properties of the sender in the eyes of a courtship partner. To compare our network growth algorithm with the data from male-female interaction, we grow the network until the number of nodes (symbols) is depleted, with p chosen so that on average the number of edges matches that of the courtship network. A comparison, without further fitting, shows that the two degree distributions match extremely well and that the proposed generating algorithm is very specific (Fig. 5).

Our paradigm may also appear in the guise of an equilibrium condition in the following sense. Complex networks in physics or in biology are often constrained to maintain some 'average' conditions.
As soon as (possibly self-enhancing) node interaction sets in, this needs to be balanced by homeostasis, i.e. a competitive, counter-balancing mechanism that weakens other connections of the same node to the network. In the neural networks domain, a closely related principle is known as 'Hebbian learning'. Self-organized Hebbian learning in the super-paramagnetic phase of ensembles has been proven a reliable and efficient way of clustering that does away with convexity requirements of cluster borders. A very similar approach has also been used as a synchronization model for coupled oscillators, where the oscillators' struggle to synchronize is expressed by competing connection strengths w_ij that evolve according to the dynamical update rule

$$\frac{d w_{ij}}{dt} = s_{ij} - w_{ij}\Big(\sum_{(i,k)\in E} s_{ik}\Big),$$

where s_ij measures the pairwise oscillator synchrony. The resulting distribution of w_ij has been shown to tend, for intermediate coupling strengths, towards a hump-terminated power law (cf. Fig. 2a). This dynamical law expresses the limited resources available for the local wiring around each node, which in our model is encoded in the probability p ruling the edge saturation. We envisage that avalanche distributions of the typical form of Fig. 2a) could also be understood similarly.

Many interesting real-world phenomena dwell on the mesoscale. In social networks, the largest scale is relevant, e.g., for the study of disease and rumor spreading, but more subtle social dynamics happen within the community structures. Our results suggest that a large class of systems can be formulated as growing along simple principles, similar and in addition to preferential attachment. The sets of m, p parameters needed to recover an experimental distribution, i.e. the violation of the ideal power law on the macroscopic scale, provide us with an insight into the local mesoscale structures present in the network. In this way, starting from non-ideal power-law distributions of complex networks, an avenue opens towards the identification and understanding of interesting mesoscale real-world phenomena in physics.

Work supported by the Swiss National Science Foundation (Grant 200021-153542/1 to R.S.).

∗ Electronic address: [email protected]

S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, and D.-U. Hwang, Complex networks: structure and dynamics, Phys. Rep. , 175 (2006).
R. Albert and A.-L. Barabási, Statistical mechanics of complex networks, Rev. Mod. Phys. , 47 (2002).
R. Cohen and S. Havlin, Complex networks: structure, robustness and function (Cambridge University Press, 2010).
A.-L. Barabási and R. Albert, Emergence of scaling in random networks, Science , 509 (1999).
L.A.N. Amaral, A. Scala, M. Barthélémy, and H.E. Stanley, Classes of small-world networks, Proc. Natl. Acad. Sci. U.S.A. , 11149 (2000).
S. Mossa, M. Barthélémy, H.E. Stanley, and L.A.N. Amaral, Truncation of power law behavior in "scale-free" network models due to information filtering, Phys. Rev. Lett. , 138701 (2002).
S.N. Dorogovtsev and J.F.F. Mendes, Language as an evolving word web, Proc. R. Soc. Lond. B , 2603 (2001).
S. Assenza, R. Gutiérrez, J. Gómez-Gardeñes, V. Latora, and S. Boccaletti, Emergence of structural patterns out of synchronization in networks with competitive interactions, Sci. Rep. , 99 (2011).
C.W. Eurich, J.M. Herrmann, and U.A. Ernst, Finite-size effects of avalanche dynamics, Phys. Rev. E , 066137 (2002).
A. Levina, J.M. Herrmann, and T. Geisel, Dynamical synapses causing self-organized criticality in neural networks, Nat. Phys. , 857 (2007).
L. de Arcangelis, F. Lombardi, and H.J. Herrmann, Criticality in the brain, J. Stat. Mech. , P03026 (2014).
I. Yanai, C.J. Camacho, and C. DeLisi, Predictions of gene family distributions in microbial genomes: evolution by gene duplication and modification, Phys. Rev. Lett. , 2641 (2000).
C.I. Del Genio, T. Gross, and K.E. Bassler, All scale-free networks are sparse, Phys. Rev. Lett. , 178701 (2011).
S.N. Dorogovtsev, J.F.F. Mendes, and A.N. Samukhin, Size-dependent degree distribution of a scale-free growing network, Phys. Rev. E , 062101 (2001).
P.R. Guimaraes, M.A.M. de Aguiar, J. Bascompte, P. Jordano, and S.F. dos Reis, Random initial condition in small Barabasi-Albert networks and deviations from the scale-free behavior, Phys. Rev. E , 037101 (2005).
B. Waclaw and I.M. Sokolov, Finite-size effects in Barabási-Albert growing networks, Phys. Rev. E , 056114 (2007).
R. Stoop and B. Arthur, Periodic orbit analysis demonstrates genetic constraints, variability, and switching in Drosophila courtship behavior, Chaos , 023123 (2008).
R. Stoop and J. Joller, Mesocopic Comparison of Complex Networks Based on Periodic Orbits, Chaos , 016112 (2011).
S.N. Dorogovtsev and J.F.F. Mendes, Evolution of Networks (Oxford University Press, Oxford, 2003).
P. Grassberger and H. Kantz, Generating partitions for the dissipative Hénon map, Phys. Lett. A , 235 (1985).
P. Cvitanović, G.H. Gunaratne, and I. Procaccia, Topological and metric properties of Hénon-type strange attractors, Phys. Rev. A , 1503 (1988).
H. Bai-Lin, Elementary Symbolic Dynamics and Chaos in Dissipative Systems (World Scientific, Singapore, 1989).
R. Stoop, Bivariate thermodynamic formalism and anomalous diffusion, Phys. Rev. E , 4913 (1994).
R. Stoop and J. Parisi, Evaluation of probabilistic and dynamical invariants from finite symbolic substrings-comparison between two approaches, Physica D , 325 (1992).
R. Stoop, Phase transitions in the approximated and asymptotic generalized entropy spectrum of a nonhyperbolic system, Phys. Rev. A , 7450 (1992).
Y.-C. Lai, E. Bollt, and C. Grebogi, Communicating with chaos using two-dimensional symbolic dynamics, Phys. Lett. A , 75 (1999).
R. Klages, Microscopic chaos, fractals and transport in non-equilibrium statistical mechanics (World Scientific, Singapore, 2007).
R. Stoop, P. Nüesch, R.L. Stoop, and L.A. Bunimovich, At grammatical faculty of language, flies outsmart men, PLoS ONE , e70284 (2013).
D. Hebb, The Organization of Behavior (Wiley & Sons, New York, 1949).
F. Landis, T. Ott, and R. Stoop, Hebbian self-organizing integrate-and-fire networks for data clustering, Neur. Comp. , 273 (2010).
T. Ott, A. Kern, A. Schuffenhauer, M. Popov, P. Acklin, E. Jacoby, and R. Stoop, Sequential superparamagnetic clustering for unbiased classification of high-dimensional chemical data, J. Chem. Inf. Comput. Sci. , 1358 (2004).
F. Gomez, R.L. Stoop, and R. Stoop, Universal dynamical properties preclude standard clustering in a large class of biochemical data, Bioinformatics , 2486 (2014).
M. Girvan and M.E.J. Newman, Community structure in social and biological networks, Proc. Natl. Acad. Sci. U.S.A. , 7821 (2002).
G. Palla, I. Derényi, I. Farkas, and T. Vicsek, Uncovering the overlapping community structure of complex networks in nature and society, Nature , 814 (2005).
Author contributions
R.S. and T.L. designed the research, T.L. and F.G. carried out the analysis, R.S. wrote the manuscript.