Component sizes in networks with arbitrary degree distributions


arXiv [cond-mat.stat-mech], Jun

M. E. J. Newman

Department of Physics, University of Michigan, Ann Arbor, MI 48109 and Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501

We give an exact solution for the complete distribution of component sizes in random networks with arbitrary degree distributions. The solution tells us the probability that a randomly chosen node belongs to a component of size $s$, for any $s$. We apply our results to networks with the three most commonly studied degree distributions (Poisson, exponential, and power-law) as well as to the calculation of cluster sizes for bond percolation on networks, which correspond to the sizes of outbreaks of SIR epidemic processes on the same networks. For the particular case of the power-law degree distribution, we show that the component size distribution itself follows a power law everywhere below the phase transition at which a giant component forms, but takes an exponential form when a giant component is present.

There has in recent years been considerable interest within the physics community in the properties of networks [1, 2, 3]. Methods from physics, and particularly from statistical physics, have proved invaluable for understanding the structure and behavior of networked systems such as the Internet, the world wide web, metabolic networks, protein interaction networks, and social networks of interactions between people. In particular, by creating simple (and sometimes not-so-simple) models of network structure and formation, researchers have gained insight about the way networks behave as a function of the basic parameters governing their topology.

One of the most fundamental parameters of a network is its degree distribution. The degree of a node or vertex in a network is the number of edges connected to that vertex, and the frequency distribution of the degrees of vertices has been shown to have a profound influence on almost every aspect of network structure and function, including path lengths, clustering, robustness, centrality indices, spreading processes, and many others.
Various network models have been used to illuminate the effects of the degree distribution, but perhaps the most widely studied, and certainly one of the simplest, is the so-called configuration model.

In the configuration model only the degrees of vertices are specified and nothing else; except for the constraint imposed by the degrees, connections between vertices are random. Equivalently, configuration model networks can be thought of as networks drawn uniformly at random from the set of all possible networks whose vertices have the specified degrees. One of the primary attractions of the configuration model is that many of its properties can be calculated exactly in the limit of large system size, and for this reason it has become one of the fundamental tools for the quantitative understanding and study of networks. In 1995 Molloy and Reed [4] gave an exact criterion for the existence of a giant component in the model and later also gave an expression for the expected size of that component [5]. Newman, Strogatz, and Watts [6] gave additional expressions for a variety of other properties, including the number of vertices a given distance from a randomly chosen vertex, the average path length in the giant component, and critical exponents near the transition at which the giant component appears, as well as generalizations of the model to bipartite and directed networks, and many further results have been presented since by a variety of authors.

One fundamental result that has been missing, however, is an expression for the sizes of components in the model other than the giant component. More specifically, if we choose a vertex at random from the network, what is the probability that it belongs to a component of a given size?
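The question just posed can be explored empirically before any theory is developed. The following is a minimal sketch, not the simulation code used for the paper: it assumes the common "stub matching" construction of the configuration model (each vertex of degree $k$ gets $k$ stubs, and stubs are paired uniformly at random), and it does not remove the self-loops and multi-edges that this construction can produce. The function names are hypothetical.

```python
import random
from collections import Counter

def configuration_model_components(degrees, rng):
    """Pair stubs uniformly at random; return the component size seen by each vertex."""
    stubs = [v for v, k in enumerate(degrees) for _ in range(k)]
    if len(stubs) % 2:
        raise ValueError("the sum of the degrees must be even")
    rng.shuffle(stubs)

    parent = list(range(len(degrees)))  # union-find forest over the vertices

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    # Consecutive stubs in the shuffled list form the edges.
    for a, b in zip(stubs[::2], stubs[1::2]):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    size = Counter(find(v) for v in range(len(degrees)))
    return [size[find(v)] for v in range(len(degrees))]

def empirical_pi(degrees, rng):
    """Fraction of vertices that belong to a component of each size s."""
    sizes = configuration_model_components(degrees, rng)
    n = len(sizes)
    return {s: count / n for s, count in sorted(Counter(sizes).items())}

rng = random.Random(0)
print(empirical_pi([1] * 10, rng))  # degree-1 vertices can only pair up: {2: 1.0}
```

Averaging `empirical_pi` over many large networks with degrees drawn from a chosen distribution gives an estimate of the probability of belonging to a component of size $s$.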
As well as being a central structural property of the network, this distribution is directly related to important practical issues such as the distribution of the sizes of disease outbreaks for diseases spreading over contact networks [7, 8].

At first sight, calculation of the component sizes appears difficult. One can derive equations that must be satisfied by the generating function for the distribution of component sizes [6], but usually these equations cannot be solved. Here we show, however, that it is nonetheless possible to derive an explicit expression for the complete distribution of component sizes in the configuration model for general degree distribution. In particular, we show that it is possible to derive closed-form expressions for component sizes for the three most commonly studied degree distributions, the Poisson, exponential, and power-law distributions. We also show that the same techniques can be used to calculate the sizes of percolation clusters for percolation models on networks of arbitrary degree distribution, a development of some interest because of the close connection between percolation and epidemic processes. We explore this connection in the last part of the paper.

Let $p_k$ be the degree distribution of our network, i.e., the probability that a randomly chosen vertex has degree $k$. If rather than a vertex we choose an edge and follow it to the vertex at one of its ends, then the number of other edges emerging from that vertex follows a different distribution, the so-called excess degree distribution:

$$q_k = \frac{(k+1)\,p_{k+1}}{\langle k \rangle}, \tag{1}$$

as shown in, for example, Ref. [6]. Here $\langle k \rangle = \sum_k k p_k$ is the average degree in the network.

It will be convenient to introduce the probability generating functions for the two distributions $p_k$ and $q_k$, thus:

$$g_0(z) = \sum_{k=0}^{\infty} p_k z^k, \qquad g_1(z) = \sum_{k=0}^{\infty} q_k z^k. \tag{2}$$

Many of our results are more easily expressed in terms of these generating functions than directly in terms of the degree distributions.
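As a small illustration of Eqs. (1) and (2), the sketch below builds the excess degree distribution from a finite degree distribution and evaluates a generating function numerically. The example distribution `p` is purely illustrative and does not come from the paper, and the function names are hypothetical.

```python
def excess_degree_distribution(p):
    """Return q_k = (k + 1) p_{k+1} / <k>, Eq. (1), for a finite list p = [p_0, p_1, ...]."""
    mean_degree = sum(k * pk for k, pk in enumerate(p))
    return [(k + 1) * p[k + 1] / mean_degree for k in range(len(p) - 1)]

def generating_function(coeffs, z):
    """Evaluate sum_k coeffs[k] * z**k by Horner's rule, as in Eq. (2)."""
    total = 0.0
    for c in reversed(coeffs):
        total = total * z + c
    return total

p = [0.1, 0.4, 0.3, 0.2]           # illustrative degree distribution, mean degree 1.6
q = excess_degree_distribution(p)  # approximately [0.25, 0.375, 0.375]
print(q, generating_function(q, 1.0))
```

Because $q_k$ is itself a normalized distribution, `generating_function(q, 1.0)` should return 1 up to rounding.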
It will also be convenient to note that

$$\langle k \rangle = g_0'(1), \qquad g_1(z) = \frac{g_0'(z)}{g_0'(1)}, \tag{3}$$

where we have made use of Eq. (1) in the second equality.

Now let us consider the distribution of the sizes of components in our network. Every vertex belongs to a component of size at least one (the vertex itself) and every edge connected to the vertex adds at least one more vertex to the component, and possibly many, if there are lots of other vertices that are reachable via that edge. Let us denote by $t$ the total number of vertices reachable via a particular edge, let the probability distribution of $t$ be $\rho_t$, and let the generating function for this distribution be $h_1(z) = \sum_t \rho_t z^t$.

The probability that a vertex of degree $k$ belongs to a component of size $s$ is the probability that the numbers of vertices reachable along each of its $k$ edges sum to $s - 1$. This probability, which we will denote $P(s|k)$, is given by

$$P(s|k) = \sum_{t_1=1}^{\infty} \cdots \sum_{t_k=1}^{\infty} \delta\Bigl(s-1, \sum_{m=1}^{k} t_m\Bigr) \prod_{m=1}^{k} \rho_{t_m}, \tag{4}$$

where $\delta(i,j)$ is the Kronecker delta symbol. Then the probability $\pi_s$ of a randomly chosen vertex belonging to a component of size $s$ is $\pi_s = \sum_{k=0}^{\infty} p_k P(s|k)$ and the corresponding generating function is

$$\begin{aligned}
h_0(z) &= \sum_{s=1}^{\infty} \pi_s z^s = \sum_{s=1}^{\infty} \sum_{k=0}^{\infty} p_k P(s|k)\, z^s \\
&= \sum_{k=0}^{\infty} p_k \sum_{s=1}^{\infty} z^s \sum_{t_1=1}^{\infty} \cdots \sum_{t_k=1}^{\infty} \delta\Bigl(s-1, \sum_{m=1}^{k} t_m\Bigr) \prod_{m=1}^{k} \rho_{t_m} \\
&= z \sum_{k=0}^{\infty} p_k \sum_{t_1=1}^{\infty} \cdots \sum_{t_k=1}^{\infty} z^{\sum_m t_m} \prod_{m=1}^{k} \rho_{t_m} \\
&= z \sum_{k=0}^{\infty} p_k \Bigl[ \sum_{t=1}^{\infty} \rho_t z^t \Bigr]^k = z \sum_{k=0}^{\infty} p_k \bigl[ h_1(z) \bigr]^k.
\end{aligned} \tag{5}$$

But the final sum is simply the generating function $g_0(z)$, Eq. (2), evaluated at $h_1(z)$, and hence

$$h_0(z) = z\, g_0(h_1(z)). \tag{6}$$

By a similar argument the generating function $h_1(z)$ can be shown to satisfy

$$h_1(z) = z\, g_1(h_1(z)). \tag{7}$$

Between them, Eqs. (6) and (7) allow us, in principle, to calculate the entire distribution of cluster sizes in our network given the degree distribution $p_k$. Unfortunately, the self-consistent relation for $h_1(z)$, Eq. (7), is in most cases not solvable and hence we cannot calculate the value of the generating function. Surprisingly, however, we can still calculate the probabilities $\pi_s$.

Since every component is of size at least 1, the generating function $h_0(z)$ for the component sizes is of leading order $z$ (or higher) and hence contains an overall factor of $z$. Dividing out this factor and differentiating, we can write the probability of belonging to a cluster of size $s$ as

$$\pi_s = \frac{1}{(s-1)!} \left[ \frac{\mathrm{d}^{s-1}}{\mathrm{d}z^{s-1}} \left( \frac{h_0(z)}{z} \right) \right]_{z=0}. \tag{8}$$

Using Eq. (6), this can also be written

$$\pi_s = \frac{1}{(s-1)!} \left[ \frac{\mathrm{d}^{s-1}}{\mathrm{d}z^{s-1}}\, g_0(h_1(z)) \right]_{z=0} = \frac{1}{(s-1)!} \left[ \frac{\mathrm{d}^{s-2}}{\mathrm{d}z^{s-2}} \bigl[ g_0'(h_1(z))\, h_1'(z) \bigr] \right]_{z=0}. \tag{9}$$

Equation (9) can be rewritten using Cauchy's formula for the $n$th derivative of a function,

$$\left. \frac{\mathrm{d}^n f}{\mathrm{d}z^n} \right|_{z=z_0} = \frac{n!}{2\pi\mathrm{i}} \oint \frac{f(z)}{(z-z_0)^{n+1}}\, \mathrm{d}z, \tag{10}$$

where the integral is around a contour that encloses $z_0$ in the complex plane but encloses no poles of $f(z)$. Applying this formula to Eq. (9) with $z_0 = 0$ we get

$$\pi_s = \frac{1}{2\pi\mathrm{i}\,(s-1)} \oint \frac{g_0'(h_1(z))}{z^{s-1}}\, \frac{\mathrm{d}h_1}{\mathrm{d}z}\, \mathrm{d}z \tag{11a}$$

$$\phantom{\pi_s} = \frac{\langle k \rangle}{2\pi\mathrm{i}\,(s-1)} \oint \frac{g_1(h_1)}{z^{s-1}}\, \mathrm{d}h_1, \tag{11b}$$

where we have used Eq. (3) to eliminate $g_0'$ in favor of $g_1$. In (11a) we choose the contour to be an infinitesimal loop around the origin and, since $h_1(z)$ goes to zero as $z \to 0$, the contour in (11b) is then also an infinitesimal loop around the origin. Regarding $z$ as a function of $h_1$, rather than the other way around, we make use of (7) to eliminate $z$ and write

$$\pi_s = \frac{\langle k \rangle}{2\pi\mathrm{i}\,(s-1)} \oint \frac{\bigl[ g_1(h_1) \bigr]^s}{h_1^{s-1}}\, \mathrm{d}h_1. \tag{12}$$

Applying (10) again we then find that

$$\pi_s = \frac{\langle k \rangle}{(s-1)!} \left[ \frac{\mathrm{d}^{s-2}}{\mathrm{d}z^{s-2}} \bigl[ g_1(z) \bigr]^s \right]_{z=0}. \tag{13}$$

(An alternative and equivalent way to derive this formula, although a less transparent one, would be to rearrange Eq. (7) to give $z$ as a function of $h_1$ and then apply the Lagrange inversion theorem [9] to derive the Taylor expansion of $h_1$ or $h_0$. Indeed, Eqs. (8) to (13) are essentially a proof of a special case of the inversion theorem, as applied to the problem in hand.)

The only exception to Eq. (13) is for the case $s = 1$, for which Eq. (11) gives $0/0$. In this case, however, $\pi_1$ is trivially equal to the probability of having degree zero:

$$\pi_1 = p_0. \tag{14}$$

Between them, Eqs. (13) and (14) give the entire distribution of component sizes in terms of the degree distribution. They tell us explicitly the probability that a randomly chosen vertex belongs to a component of any given size $s$. For any specific choice of degree distribution, the application of Eq. (13) still requires us to perform the derivatives. Any finite number of derivatives can always be carried out exactly to give expressions for $\pi_s$ to finite order.
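For a degree distribution with finitely many nonzero $p_k$, the derivatives in Eq. (13) reduce to polynomial arithmetic, since $\frac{1}{(s-2)!}\,\mathrm{d}^{s-2}/\mathrm{d}z^{s-2}$ at $z=0$ simply extracts the coefficient of $z^{s-2}$. The sketch below uses that observation; it is one straightforward way to evaluate Eqs. (13) and (14) numerically, not a procedure given in the paper. The function name and the example distribution are hypothetical, and a nonzero mean degree is assumed.

```python
def component_size_distribution(p, smax):
    """Return [pi_1, ..., pi_smax] from Eqs. (13) and (14) for p = [p_0, p_1, ...]."""
    mean = sum(k * pk for k, pk in enumerate(p))                 # <k>
    q = [(k + 1) * p[k + 1] / mean for k in range(len(p) - 1)]   # Eq. (1)
    order = max(smax - 2, 0)         # highest power of z that is ever needed
    power = [1.0] + [0.0] * order    # series coefficients of [g1(z)]**0
    pi = [p[0]]                      # Eq. (14): pi_1 = p_0
    for s in range(1, smax + 1):
        # Multiply the running power by g1(z), truncating above z**order.
        nxt = [0.0] * (order + 1)
        for i, a in enumerate(power):
            for j, b in enumerate(q):
                if i + j > order:
                    break
                nxt[i + j] += a * b
        power = nxt                  # now the series of [g1(z)]**s
        if s >= 2:
            # Eq. (13): pi_s = <k> / (s - 1) * (coefficient of z**(s-2) in g1**s)
            pi.append(mean / (s - 1) * power[s - 2])
    return pi

p = [0.1, 0.4, 0.3, 0.2]   # illustrative degree distribution, not from the paper
print(component_size_distribution(p, 6))
```

For this example the first two entries can be checked by hand: $\pi_1 = p_0 = 0.1$ and $\pi_2 = \langle k \rangle\,[g_1(0)]^2 = 1.6 \times 0.25^2 = 0.1$.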
It is also possible in some cases to find a general formula for any derivative and so derive a closed-form expression for $\pi_s$ for general $s$. In particular, it turns out to be possible, as we now show, to find such closed-form expressions for the three distributions most commonly studied in the literature, the Poisson, exponential, and power-law distributions.

A network in which edges are placed between vertices uniformly at random has a Poisson degree distribution

$$p_k = \mathrm{e}^{-c}\, \frac{c^k}{k!}, \tag{15}$$

where $c$ is the distribution mean. Such networks have been studied widely for some decades, most famously by Erdős and Rényi in the 1950s and 1960s [10, 11]. Given Eq. (15), it is straightforward to show that $g_0(z) = g_1(z) = \mathrm{e}^{c(z-1)}$ and the derivatives in Eq. (13) can be performed to give

$$\pi_s = \frac{\mathrm{e}^{-cs} (cs)^{s-1}}{s!}. \tag{16}$$

(The same expression also works for the special case $s = 1$.) This expression for the component size distribution of the Poisson random graph has been derived in the past by a number of other methods (see for instance [12]), but it is a useful check on our methods to see it appear here as a special case of the more general formulation.

Few real-world networks, however, have Poisson degree distributions. Most have highly right-skewed distributions in which most vertices have low degree and a small number of "hubs" have higher degree. A number of networks, for example, are observed to have exponential degree distributions or distributions with an exponential tail. Examples include food webs, power grids, and some social networks [13, 14]. Consider the exponential distribution $p_k = C \mathrm{e}^{-\lambda k}$, where $C$ is the appropriate normalizing constant. The generating functions in this case are

$$g_0(z) = \frac{\mathrm{e}^{\lambda} - 1}{\mathrm{e}^{\lambda} - z}, \qquad g_1(z) = \left[ \frac{\mathrm{e}^{\lambda} - 1}{\mathrm{e}^{\lambda} - z} \right]^2. \tag{17}$$

Again the derivatives are straightforward to carry out and we find that

$$\frac{\mathrm{d}^n}{\mathrm{d}z^n} \bigl[ g_1(z) \bigr]^s = \frac{(2s-1+n)!}{(2s-1)!}\, \frac{\bigl[ g_1(z) \bigr]^s}{(\mathrm{e}^{\lambda} - z)^n}, \tag{18}$$

and hence

$$\pi_s = \frac{(3s-3)!}{(s-1)!\,(2s-1)!}\, \mathrm{e}^{-\lambda(s-1)} \bigl( 1 - \mathrm{e}^{-\lambda} \bigr)^{2s-1}. \tag{19}$$

Applying Stirling's approximation for large $s$ we can show that this distribution behaves asymptotically as $\pi_s \sim s^{-3/2}\, \mathrm{e}^{-\mu s}$, where $\mu = \lambda - 2 \ln \bigl[ \tfrac{3}{2}\sqrt{3}\, (1 - \mathrm{e}^{-\lambda}) \bigr]$. Thus the component size distribution approximately follows an exponential law itself, although with an extra leading factor of $s^{-3/2}$ and a different exponential constant.

However, perhaps the greatest amount of attention in recent years has been focused on networks that have power-law degree distributions of the form $p_k \propto k^{-\alpha}$ for some constant exponent $\alpha$ [15, 16, 17]. A number of networks appear to follow this pattern, at least approximately, including the world wide web, the Internet, citation networks, and some social and biological networks [1]. The observed value of the exponent typically lies in the range $2 < \alpha < 3$. Equivalently, we could say that the excess degree distribution $q_k$ (which appears in the fundamental formula (13) via its generating function) follows a power law with exponent $\alpha - 1$. As an example, we consider a Yule distribution for $q_k$, with a typical real-world value of $\alpha = 2.5$:

$$q_k = C\, \frac{\Gamma(k + \tfrac{1}{2})}{\Gamma(k + 2)}, \tag{20}$$

where $\Gamma(x)$ is the standard gamma function and $C$ is again a normalizing constant. It is straightforward to show (by Stirling's approximation) that this distribution asymptotically follows a power law $q_k \sim k^{-3/2}$, which corresponds to a raw degree distribution $p_k \sim k^{-5/2}$. The Yule distribution appears in a number of contexts in the study of networks, particularly in the solutions of preferential attachment models that may explain the origin of power laws in some networks [18, 19], and is considered by some to be the most natural choice of power-law form for discrete distributions. Employing this particular choice for our configuration model gives

$$g_1(z) = \frac{1}{1 + \sqrt{1 - z}}, \tag{21}$$

which in turn gives, for $n \ge 1$,

$$\left[ \frac{\mathrm{d}^n}{\mathrm{d}z^n} \bigl[ g_1(z) \bigr]^s \right]_{z=0} = \frac{2^{-(2n+s)}}{(s-1)!} \sum_{j=0}^{n-1} \frac{(n-1+j)!\,(s+n-1-j)!}{j!\,(n-1-j)!} = 2^{-(2n+s)}\, \frac{s\,(2n+s-1)!}{(n+s)!}. \tag{22}$$

Setting $n = s - 2$ and substituting into Eq. (13), we find, for $s \ge 2$,

$$\pi_s = [1 - \ln 2]^{-1}\, \frac{s\,(3s-5)!}{(s-1)!\,(2s-2)!}\, 2^{-(3s-3)}. \tag{23}$$

In Fig. 1 we show the form of this distribution, along with those for the Poisson and exponential networks, Eqs. (16) and (19). Also shown in the figure are numerical results for the distributions of component sizes measured on computer generated networks with the same degree distributions. As the figure shows, there is excellent agreement between the simulations and the exact calculations.

[Figure 1: probability $\pi_s$ (logarithmic vertical axis) plotted against component size $s$, with curves for the Poisson, exponential, and power-law networks.]

FIG. 1: The distribution of component sizes in random graphs with Poisson ($c = 1.\ldots$), exponential ($\lambda = 1$), and power-law ($\alpha = 2.5$) degree distributions. Solid lines indicate the exact solutions derived in this paper. Points are the results of computer simulations for the same degree distributions. Each point is an average over 5000 networks of $10^{\ldots}$ vertices each. Error bars have been omitted, but are smaller than the data points in each case.

As with the exponential network, we can study the asymptotic form of the component size distribution (23) for the power-law network by making use of Stirling's approximation. We find that in the limit of large $s$, $\pi_s \sim s^{-3/2}\, \mathrm{e}^{-\nu s}$, where $\nu = 5 \ln 2 - 3 \ln 3 \simeq 0.1699$. Thus again we have an exponential tail to the distribution.

This last result is at first slightly surprising. One might imagine that the component size distribution should itself fall off as a power law or slower, because the degree of a vertex provides a lower bound on the size of the component to which the vertex belongs: the fraction of vertices in components of size $s$ or greater must be at least as large as the fraction of vertices of degree $s$ or greater, and hence the cumulative distribution of component sizes falls off as slowly as, or more slowly than, the cumulative distribution of degrees.

So how is it possible that we have an exponential distribution of component sizes in the present case? The answer is that we are studying a network that has a giant component.
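The presence of a giant component in the power-law example can be checked directly from the component size distribution: the $\pi_s$ for finite $s$ sum to less than one, and the deficit is the fraction of vertices in the giant component. The sketch below assumes Eq. (23) in the form $\pi_s = [1-\ln 2]^{-1}\, 2^{-(3s-3)}\, s\,(3s-5)!/[(s-1)!\,(2s-2)!]$ for $s \ge 2$, together with $\pi_1 = p_0 = 0$, which is the choice of $p_0$ that the prefactor $[1-\ln 2]^{-1}$ corresponds to. The function name is hypothetical.

```python
import math

def pi_power_law(s):
    """Eq. (23): component size distribution for the Yule example (alpha = 2.5)."""
    if s < 2:
        return 0.0  # assumption: p_0 = 0, so there are no isolated vertices
    log_pi = (-math.log(1.0 - math.log(2.0))   # ln of [1 - ln 2]^(-1)
              - (3 * s - 3) * math.log(2.0)    # ln of 2^(-(3s - 3))
              + math.log(s)
              + math.lgamma(3 * s - 4)         # ln (3s - 5)!
              - math.lgamma(s)                 # ln (s - 1)!
              - math.lgamma(2 * s - 1))        # ln (2s - 2)!
    return math.exp(log_pi)

# The exponential tail makes the sum converge long before s = 1000.
total = sum(pi_power_law(s) for s in range(1, 1001))
print(f"sum of pi_s = {total:.4f}, giant component fraction = {1 - total:.4f}")
```

Working in logarithms via `math.lgamma` avoids overflow in the factorials. Under the stated assumptions, somewhat under half of the vertices fall in the giant component.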
Vertices not in the giant component (which make up almost all of the component size distribution) have a different degree distribution from the graph as a whole, because the probability of not being in the giant component dwindles exponentially with increasing degree [8]. This creates an exponential cutoff for the degree distribution, and hence we are back to the situation we had for the exponential network, which gave an exponential component size distribution.

Thus in a power-law network we expect $\pi_s$ to have an exponential tail whenever there is a giant component in the network, but a power-law tail when there is no giant component. This contrasts with the case for essentially every other degree distribution, where we expect a power-law distribution of component sizes only precisely at the phase transition where the giant component forms; everywhere else we expect the distribution to fall off exponentially or faster [6].

The methods described here can also be extended to the calculation of cluster sizes for percolation processes on networks. Of particular interest is the bond percolation process, whose cluster sizes give the distribution of outbreaks for a standard SIR epidemiological process on the same network [7, 20]. Bond percolation can be framed in the same language as the calculation of component sizes above by considering the network formed by just the occupied edges. If the occupation probability is $\phi$, then it is straightforward to show [8] that the generating functions for the degree distribution and excess degree distribution of this latter network are $g_0(1 - \phi + \phi z)$ and $g_1(1 - \phi + \phi z)$, with $g_0$ and $g_1$ defined as before. Substituting into Eq. (13), we then find

$$\pi_s = \frac{\phi^{s-1} \langle k \rangle}{(s-1)!} \left[ \frac{\mathrm{d}^{s-2}}{\mathrm{d}z^{s-2}} \bigl[ g_1(z) \bigr]^s \right]_{z=1-\phi}. \tag{24}$$

This result immediately implies that for all $\phi < 1$ the distribution of cluster sizes has an exponential tail at large $s$. Thus, in the language of epidemiology, we will never see a power-law distribution of outbreak sizes, even if the network has a power-law degree distribution.
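The percolation calculation can be sketched numerically by following the construction in the text: first form the degree distribution of the network of occupied edges, then apply Eqs. (13) and (14) to it. The sketch below assumes a finite list of $p_k$, an occupation probability $0 < \phi \le 1$, and a nonzero mean degree. The function name is hypothetical, and the truncated Poisson input is only an illustration.

```python
import math

def percolation_cluster_sizes(p, phi, smax):
    """Return [pi_1, ..., pi_smax] for bond percolation with occupation probability phi."""
    kmax = len(p) - 1
    # Degree distribution of the occupied-edge network: each edge of a degree-k
    # vertex is kept independently with probability phi, which is the
    # coefficient form of g0(1 - phi + phi z).
    occ = [sum(p[k] * math.comb(k, m) * phi**m * (1.0 - phi)**(k - m)
               for k in range(m, kmax + 1))
           for m in range(kmax + 1)]
    mean = sum(m * pm for m, pm in enumerate(occ))
    q = [(m + 1) * occ[m + 1] / mean for m in range(kmax)]
    # Eqs. (13) and (14) applied to the occupied-edge network.
    order = max(smax - 2, 0)
    power = [1.0] + [0.0] * order
    pi = [occ[0]]
    for s in range(1, smax + 1):
        nxt = [0.0] * (order + 1)
        for i, a in enumerate(power):
            for j, b in enumerate(q):
                if i + j > order:
                    break
                nxt[i + j] += a * b
        power = nxt
        if s >= 2:
            pi.append(mean / (s - 1) * power[s - 2])
    return pi

c, phi = 2.0, 0.4
poisson = [math.exp(-c) * c**k / math.factorial(k) for k in range(61)]  # truncated
print(percolation_cluster_sizes(poisson, phi, 5))
```

For the Poisson case the occupied-edge network is again Poisson, with mean $\phi c$, so the output can be compared with Eq. (16) evaluated at $c \to \phi c$.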
This is, overall, good news: it implies that there will be no fat tail to the outbreak distribution and hence no unexpectedly large outbreaks, regardless of whether the network has a giant component.

To conclude, we have given an exact solution for the distribution of component sizes in random graphs with arbitrary degree distributions and applied it to networks with Poisson, exponential, and power-law distributed degrees. In the last case we find that though the network has a power-law distribution of component sizes when there is no giant component, the distribution develops an exponential tail once a giant component appears. We have also applied our methods to bond percolation on networks, finding that percolation clusters always have an exponential tail to their distribution whenever the bond occupation probability is less than one.

The author thanks Cris Moore for useful conversations. This work was funded in part by the National Science Foundation under grant DMS–0405348 and by the Santa Fe Institute.

[1] S. N. Dorogovtsev and J. F. F. Mendes, Evolution of networks. Advances in Physics, 1079–1187 (2002).

[2] M. E. J. Newman, The structure and function of complex networks. SIAM Review, 167–256 (2003).

[3] M. E. J. Newman, A.-L. Barabási, and D. J. Watts, The Structure and Dynamics of Networks. Princeton University Press, Princeton (2006).

[4] M. Molloy and B. Reed, A critical point for random graphs with a given degree sequence. Random Structures and Algorithms, 161–179 (1995).

[5] M. Molloy and B. Reed, The size of the giant component of a random graph with a given degree sequence. Combinatorics, Probability and Computing, 295–305 (1998).

[6] M. E. J. Newman, S. H. Strogatz, and D. J. Watts, Random graphs with arbitrary degree distributions and their applications. Phys. Rev. E, 026118 (2001).

[7] P. Grassberger, On the critical behavior of the general epidemic process and dynamical percolation. Math. Biosci., 157–172 (1982).

[8] M. E. J. Newman, Spread of epidemic disease on networks. Phys. Rev. E, 016128 (2002).

[9] M. Abramowitz and I. A. Stegun (eds.), Handbook of Mathematical Functions. Dover Publishing, New York (1974).

[10] P. Erdős and A. Rényi, On random graphs. Publicationes Mathematicae, 290–297 (1959).

[11] P. Erdős and A. Rényi, On the evolution of random graphs. Publications of the Mathematical Institute of the Hungarian Academy of Sciences, 17–61 (1960).

[12] B. Bollobás, Random Graphs. Academic Press, New York, 2nd edition (2001).

[13] L. A. N. Amaral, A. Scala, M. Barthélemy, and H. E. Stanley, Classes of small-world networks. Proc. Natl. Acad. Sci. USA, 11149–11152 (2000).

[14] J. A. Dunne, R. J. Williams, and N. D. Martinez, Food-web structure and network theory: The role of connectance and size. Proc. Natl. Acad. Sci. USA, 12917–12922 (2002).

[15] R. Albert, H. Jeong, and A.-L. Barabási, Diameter of the world-wide web. Nature, 130–131 (1999).

[16] M. Faloutsos, P. Faloutsos, and C. Faloutsos, On power-law relationships of the internet topology. Computer Communications Review, 251–262 (1999).

[17] J. M. Kleinberg, S. R. Kumar, P. Raghavan, S. Rajagopalan, and A. Tomkins, The Web as a graph: Measurements, models and methods. In T. Asano, H. Imai, D. T. Lee, S.-I. Nakano, and T. Tokuyama (eds.), Proceedings of the 5th Annual International Conference on Combinatorics and Computing, number 1627 in Lecture Notes in Computer Science, pp. 1–18, Springer, Berlin (1999).

[18] S. N. Dorogovtsev, J. F. F. Mendes, and A. N. Samukhin, Structure of growing networks with preferential linking. Phys. Rev. Lett., 4633–4636 (2000).

[19] P. L. Krapivsky, S. Redner, and F. Leyvraz, Connectivity of growing random networks. Phys. Rev. Lett., 4629–4632 (2000).

[20] D. Mollison, Spatial contact models for ecological and epidemic spread. Journal of the Royal Statistical Society B 39