Scaling of load in communications networks

Abstract

We show that the load at each node in a preferential attachment network scales as a power of the degree of the node. For a network whose degree distribution is p(k) ~ k^(-gamma), we show that the load is l(k) ~ k^eta with eta = gamma - 1, implying that the probability distribution for the load is p(l) ~ 1/l^2 independent of gamma. The results are obtained through scaling arguments supported by finite size scaling studies. They contradict earlier claims, but are in agreement with the exact solution for the special case of tree graphs. Results are also presented for real communications networks at the IP layer, using the latest available data. Our analysis of the data shows relatively poor power-law degree distributions as compared to the scaling of the load versus degree. This emphasizes the importance of the load in network analysis.



Onuttom Narayan and Iraj Saniee
Department of Physics, University of California, Santa Cruz, CA 95064, and Mathematics of Networks Department, Bell Laboratories, Alcatel-Lucent, 600 Mountain Avenue, Murray Hill, NJ 07974
(Dated: October 29, 2018)
arXiv category: physics.soc-ph


A variety of problems in fields ranging from the social sciences to biology to engineering deal with networks, for which a unified understanding has been sought [1, 2, 3]. One of the models commonly used is the preferential attachment (PA) model due to Barabasi and Albert [4] and its generalizations [5, 6, 7]. This model generates scale-free networks, in which the probability for a node to have a degree $k$ scales as $p(k) \sim k^{-\gamma}$. Depending on the parameters of the model, $\gamma$ can be varied continuously over the range $2 < \gamma \le$ […]. It has been argued that this model is appropriate to describe communications networks such as the Internet.

Among the various properties of PA networks that have been studied is the distribution of load (with uniform demand) at different nodes of the network. This is defined by assuming that one unit of traffic flows between each pair of nodes in the network along the shortest path connecting them [8]. (If there are multiple shortest paths between a pair of nodes, the traffic between them is divided equally among all the shortest paths [9].) In this setting, the amount of traffic flowing through a node is its load. Based on numerical simulations [10], it was claimed that the probability distribution for the load scales as $p(l) \sim 1/l^{\delta}$ with $\delta = 2.2$. Subsequently, data for networks in various different fields were presented, and it was argued [11] that there are two universality classes with $\delta = 2$ and $\delta = 2.2$. These claims were disputed [12] on the basis of further numerical simulations, which seemed to indicate that $\delta$ varies continuously with $\gamma$ and is therefore not universal. For the special case of PA networks that are trees, it was argued [13] and then proved [14] that $\delta = 2$.

In this paper, we show that the average load at nodes of degree $k$ scales as $l(k) \sim k^{\eta}$ with $\eta = \gamma - 1$ for all $\gamma$. (As $k$ is increased for fixed $N$, finite size effects are seen.)
If we assume that the distribution of load for fixed $k$ and $N$ does not have an anomalously large width, this implies that the exponent $\delta$ is universal, but is equal to 2, contradicting the earlier claims [10, 11, 12], and extending the exact result for PA trees. We also extend the analytical proof of Ref. [14] for tree graphs to show directly that $\eta = 2$, supporting the assumption that the distribution of $l$ for fixed $k$ is not anomalous. Our results are obtained by simple scaling arguments that are reinforced by finite size scaling studies. The deviations from this universal result that are observed [10, 12] are due to finite size scaling and subleading corrections to the asymptotic scaling form.

We also show results for load analyses on networks drawn from a recent database [15] of connectivity of communications networks at the IP layer. The data are collected with new measurement techniques, and find many more routers and links than earlier studies [16]. The results demonstrate that the scaling of the load $l(k) \sim k^{\eta}$ is much clearer than that of the more commonly studied degree distribution.

In the generalized PA model, a network grows one node at a time. Each node is born with $m$ undirected edges which are attached to preexisting nodes. The probability of attachment to a preexisting node of degree $k$ is proportional to $k + k_0$. Thus $k_0$ and $m$ are the parameters of the model, with $k_0 > -m$. For an infinite network, it can be shown that the probability of a randomly chosen node having a degree $k$ is

$$p_k \propto k^{-\gamma}, \qquad \gamma = 3 + k_0/m \tag{1}$$

for large $k$. For such a network, we assume, as verified later through numerical simulations, that the average load $l_N(k)$ at all the nodes of degree $k$ in a network of $N$ nodes has the scaling form

$$l_N(k) = N k^{\eta}\, \hat{l}(k/N^{\mu}) \tag{2}$$

where $\hat{l}(x) \to$ […] for $x \to 0$ and $\hat{l}(x) \to$ […] for $x \to \infty$. The prefactor of $N$ is reasonable, since most of the load at nodes near the periphery of the network, for which $k \sim O(1)$, is due to traffic that starts or ends there, and is therefore $O(N)$.
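The model and the load measurement described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the simulations in this paper: the function names are hypothetical, the seed graph (a complete graph on $m + 1$ nodes) and the requirement that a new node's $m$ targets be distinct are our own choices, and the load is accumulated with a Brandes-style shortest-path counting pass in which traffic that starts or ends at a node is counted as part of its load, in line with the remark that peripheral nodes carry $O(N)$ load.

```python
import random
from collections import defaultdict, deque


def generalized_pa_graph(n, m, k0, seed=None):
    """Grow a graph to n nodes; each new node attaches m edges to distinct
    existing nodes chosen with probability proportional to (degree + k0).
    Assumes k0 > -m so that every weight is positive."""
    rng = random.Random(seed)
    # Assumed seed graph: complete graph on m + 1 nodes (degree m each).
    adj = {i: set() for i in range(m + 1)}
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
    for new in range(m + 1, n):
        nodes = list(adj)
        weights = [len(adj[v]) + k0 for v in nodes]
        targets = set()
        while len(targets) < m:  # redraw until m distinct targets are found
            targets.add(rng.choices(nodes, weights=weights)[0])
        adj[new] = set()
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
    return adj


def node_loads(adj):
    """Load of each node: one unit of traffic per unordered pair of nodes,
    split equally among all shortest paths, endpoints included."""
    load = {v: 0.0 for v in adj}
    for s in adj:
        dist = {s: 0}
        sigma = {s: 1.0}            # number of shortest paths from s
        preds = defaultdict(list)   # predecessors on shortest paths
        order = []                  # nodes in order of increasing distance
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    sigma[w] = 0.0
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                load[w] += delta[w] + 1.0   # pass-through traffic + endpoint w
        load[s] += len(order) - 1           # s is an endpoint of every pair (s, t)
    # Each unordered pair was visited from both ends.
    return {v: x / 2.0 for v, x in load.items()}


def mean_load_by_degree(adj, load):
    """Average the load over all nodes of the same degree k."""
    total, count = defaultdict(float), defaultdict(int)
    for v, nbrs in adj.items():
        total[len(nbrs)] += load[v]
        count[len(nbrs)] += 1
    return {k: total[k] / count[k] for k in sorted(total)}


if __name__ == "__main__":
    g = generalized_pa_graph(n=300, m=1, k0=0, seed=1)
    for k, l in mean_load_by_degree(g, node_loads(g)).items():
        print(k, l)
```

For a single small graph the averages are noisy; the scaling analysis in the text relies on averaging over many graphs for each $N$.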
The exponent $\eta$ can be found by noting that $\sum_k [N p_k]\, l_N(k)$ is the total traffic flowing in the network. Since $\sim N^2$ units of traffic are generated in the graph, and the average geodesic length is $\sim \ln N$, the sum should scale as $\sim N^2 \ln N$ for large $N$. From Eqs. (1) and (2), the sum is $\sim N^2 \sum_k k^{\eta - \gamma}\, \hat{l}(k/N^{\mu})$, which grows as $\ln N$ only if $\eta - \gamma = -1$. This implies that

$$\eta = \gamma - 1. \tag{3}$$

To find the exponent $\mu$, we note that for a network of $N$ nodes, the maximum degree $k_{\max}$ that is achieved can be estimated by requiring $\left(1 - \sum_{k_{\max}}^{\infty} p_k\right)^{N}$ to be $\sim O(1)$. From Eq. (1), for large $k_{\max}$ this is equivalent to $\exp[-AN/k_{\max}^{\gamma - 1}] \sim O(1)$ with some constant $A$, from which $k_{\max} \sim N^{1/(\gamma - 1)}$. If we assume that all characteristic $k$'s scale with $N$ in the same way, we obtain

$$\mu = 1/(\gamma - 1). \tag{4}$$

For $k \ll N^{\mu}$, Eq. (2) implies that $l_N(k) \sim N k^{\eta}$. When combined with Eq. (1), we have $p(l)\,dl \sim dl/l^{\delta}$ with

$$\delta = \frac{\gamma - 1}{\eta} + 1 = 2 \tag{5}$$

where we have used Eq. (3).

Although these results are plausible, they are based on assumptions, most notably the scaling hypothesis of Eq. (2) itself. To check these assumptions, we turn to finite size scaling numerical simulations. Networks with $N$ ranging from 500 to 8000 or 16000 were generated for different values of $m$ and $k_0$. We considered the cases $k_0 = 0$ with $m = 1,$ […] and $m = 6$ with $k_0 = 1,$ […]. For each choice of $k_0$, $m$ and $N$, 100 graphs were generated, and $l_N(k)$ was calculated by averaging over all the nodes of degree $k$ in the 100 graphs. For the plots, it is more convenient to use the form $l_N(k) = N^2\, \tilde{l}(k/N^{\mu})$ instead of Eq. (2), which one obtains if one uses Eqs. (4) and (3): since $\mu\eta = 1$, $N k^{\eta} = N^2 (k/N^{\mu})^{\eta}$.

Figure 1 shows the results for $m = 1$ and $k_0 = 0$, for which $\mu = 1/2$ […] $l(k) \sim k^{[\ldots]}$, consistent with earlier results [13]. In view of the analytical results for PA tree graphs, this discrepancy must be attributed to finite size scaling effects which flatten the curve for large $k$ and presumably reduce the apparent value of $\eta$. The same discrepancy is seen for $k_0 = 0$ with $m = 2,$ […]$, 6$; it is reasonable to attribute it to the same cause.

[Figure 1: Load/$N^2$ versus $k/N^{\mu}$; curves for N = 2000, 4000, 8000, 16000.]

FIG. 1: Average traffic at nodes of degree $k$ as a function of $k$ for the PA model with $m = 1$, $k_0 = 0$. (Here and in the subsequent figures, […].) A scaling collapse with the exponents from Eqs. (3) and (4) works reasonably well. However, a straight line with the predicted slope of 2.0 is shown and only fits the curve, if at all, for small $k$. Similar deviations are seen for $m = 2,$ […] with $k_0 = 0$.

Figure 2 shows a similar scaling plot for $m = 6$ and $k_0 = 3$, for which $\mu = 2/5$. The scaling collapse and the fit to $l(k) \sim k^{2.5}$ are both very good. Figure 3 shows a similar plot for $m = 6$ and $k_0 = 5$. The scaling collapse is again very good, but finite size corrections for large $k$ now increase the slope of the curve. Thus the fit to the predicted form of $\sim k^{17/6}$ only works when $k \ll N^{\mu}$ and $k \gg O(1)$. (Unless the scaling hypothesis breaks down, Eq. (3) follows from the $\ln N$ factor in the total load. Therefore, we do not try a scaling plot with adjustable exponents.) We have also made similar plots for $m = 6$ and $k_0 = 1,$ […]. For $k_0 <$ […], finite size corrections reduce the apparent $\eta$ for large $k$, while for $k_0 >$ […], they increase the apparent $\eta$ for large $k$. This is consistent with Ref. [12].

[Figure 2: Load/$N^2$ versus $k/N^{\mu}$; curves for N = 2000, 4000, 8000.]

FIG. 2: Scaling plot of the average traffic at nodes of degree $k$ as a function of $k$ for the PA model with $m = 6$, $k_0 = 3$. The scaling collapse is very good, as is the fit to $\sim k^{5/2}$ in the scaling regime.

[Figure 3: Load/$N^2$ versus $k/N^{\mu}$; curves for N = 2000, 4000, 8000, 16000.]

FIG. 3: A plot similar to Figure 2 but with $k_0 = 5$. For $k \sim O(1)$, the individual curves pull away from the scaling form. For $k \sim O(N^{6/17})$, finite size effects cause the curves to bend upwards. Between these two regimes, the slope is consistent with $\sim k^{17/6}$ as predicted.

For the tree graphs generated by the PA model with $m = 1$, $k_0 = 0$, the result $l(k) \sim k^2$ follows from $p(k) \sim 1/k^3$ and $p(l) \sim 1/l^2$ if we assume that $l(k)$ scales as a power of $k$ and that the distribution of $l$ for fixed $k$ is not anomalously broad. Although these are reasonable assumptions, it is not difficult to prove $l(k) \sim k^2$ directly.

The probability that the node created at time $\tau$ will be attached to a preexisting node of degree $k$ is equal to $k/(2\tau - 2)$. Therefore the probability that a node created at time $\tau$ will have exactly $k$ nodes subsequently attached to it is

$$p_{k+1,N}(\tau) = \sum_{\tau < \tau_1 < \cdots < \tau_k}^{N} P(\tau + 1, \tau_1 - 1, 1)\, \frac{1}{2(\tau_1 - 1)}\, P(\tau_1 + 1, \tau_2 - 1, 2)\, \frac{2}{2(\tau_2 - 1)} \cdots P(\tau_{k-1} + 1, \tau_k - 1, k)\, \frac{k}{2(\tau_k - 1)}\, P(\tau_k + 1, N, k + 1) \tag{6}$$

where

$$P(\tau, \tau', m) = \left(1 - \frac{m}{2\tau - 2}\right)\left(1 - \frac{m}{2\tau}\right) \cdots \left(1 - \frac{m}{2\tau' - 2}\right). \tag{7}$$

Replacing $P(\tau, \tau', m)$ by the exponential of an integral instead of a sum in the approximation $\tau \gg 1$, we have $P(\tau, \tau', m) = (\tau/\tau')^{m/2}$. With this, Eq. (6) simplifies to

$$p_{k+1,N}(\tau) \approx \sum_{\tau_1 = \tau}^{N} \cdots \sum_{\tau_k = \tau}^{N} \sqrt{\frac{\tau}{N}}\, \frac{1}{2\sqrt{\tau_1 N}} \cdots \frac{1}{2\sqrt{\tau_k N}} \tag{8}$$

where we have used the symmetry of the $\tau_i$'s to eliminate the restriction $\tau_1 < \tau_2 < \cdots < \tau_k$. If the sums are replaced with integrals, $p_{k+1,N}(\tau) = \sqrt{\tau/N}\,[1 - \sqrt{\tau/N}]^{k}$ [17] and $p_k = (1/N) \sum_{\tau} p_{k,N}(\tau) \sim 4/[k(k+1)(k+2)]$ for large $N$.

If $n_1, n_2, \ldots, n_k$ are the sizes of the $k$ subtrees descending from a node of degree $k + 1$, one can show [14] that for large $N$ the load at the node is proportional to $N \sum n_i$. If $n_i(t)$ is the size of the $i$'th subtree at time $t \ge \tau_i$, with $n_i(N) = n_i$, the probability that $n_i(t + 1) = n_i(t) + 1$ is $(2n_i - 1)/(2t)$ at time $t$, with the initial condition $n_i(\tau_i) = 1$. Averaging over randomness for fixed $\tau_i$, the solution is $\langle n_i(t) \rangle = (t/\tau_i + 1)/2$. With the symmetrization of the previous paragraph,

$$l_N(k + 1; \tau) \propto N \Bigl\langle \sum n_i \Bigr\rangle = N k \langle n_i \rangle = \frac{kN}{1 - \sqrt{x}} \int_{x}^{1} \frac{1 + x_i}{4 x_i \sqrt{x_i}}\, dx_i \tag{9}$$

where $\tau_i = N x_i$, $\tau = N x$ and we have replaced sums with integrals. This yields

$$l_N(k + 1) \propto N k^{[\ldots]} \int_0^1 [\ldots]\,(1 - \sqrt{x})^{k}\,[\ldots]\, dx \propto N k^{2} \tag{10}$$

where we have only kept the terms that are relevant for $N \gg k \gg 1$. The terms dropped with the $k \gg$ […]

FIG. 4: Log-log plot of $N/(k\, l(k))$ as a function of $k$ for the Sprintlink network [15]. The dashed line is visually adjusted for best fit, and has a slope of $-$[…] […] $l(k) \sim k^{\eta}$ with $\eta = 1.$[…]. The plot also shows the degree distribution $p(k)$ and the cumulative distribution $P(k) = \sum_{k}^{\infty} p(k)$. The inset shows the cumulative load distribution $P(l) = \int_{l}^{\infty} p(l)\, dl$ and a straight line with slope $-1$.

We now compare with network data from the Rocketfuel database [15], which is the most recent, comprehensive and publicly available collection of measurements of the connectivity between nodes of communications networks at the IP layer. There are ten networks with 121 to 10214 routing nodes (from here on referred to as "routers") in the database. Since the data are insufficient to test scaling functions (in our simulations we worked with 100 networks for each $N$, with $500 < N <$ […]

FIG. 5: Plot similar to Figure 4 but with results from all ten networks in the database combined.

[…] law with a slope of 1.3. The degree distribution itself is much more irregular, and it is difficult to say whether it is of the form $\sim k^{-\gamma}$ with $\gamma = \eta + 1$. The cumulative degree distribution is also shown in the figure; it is much smoother, but does not show clear power law behavior. The inset shows the cumulative load distribution and a straight line with the predicted slope of $-1$, which is equally unconvincing. Figure 5 is a similar figure with all ten networks in the database merged. The plots are straighter, and one could perhaps argue for a narrow power law regime in the cumulative degree distribution as is done in Ref. [15] or a regime with slope $-$[…]

[…] $p(k) \sim k^{-\gamma}$, the average traffic as a function of node degree scales as $l(k) \sim k^{\gamma - 1}$. This is equivalent to the statement that the probability distribution for the load scales as $p(l) \sim 1/l^2$ regardless of $\gamma$. Although the numerical simulations and analytical calculations are for a specific model, the result follows from the scaling assumption and the small-world phenomenon and is therefore more robust. The scaling $l(k) \sim k^{\eta}$ is also seen clearly in networks at the IP layer, and is in fact much better than the degree distribution, which has attracted much more interest.

This work was supported by AFOSR grant FA9550-08-1-0064.

[1] R. Albert and A.-L. Barabasi, Rev. Mod. Phys. 74, 47 (2002).
[2] S.N. Dorogovtsev and J.F.F. Mendes, Adv. Phys. 51, 1079 (2002).
[3] M.E.J. Newman, SIAM Review 45, 167 (2003).
[4] A.-L. Barabasi and R. Albert, Science 286, 509 (1999).
[5] S.N. Dorogovtsev, J.F.F. Mendes and A.N. Samukhin, Phys. Rev. Lett. 85, 4633 (2000).
[6] S.N. Dorogovtsev and J.F.F. Mendes, Phys. Rev. E 63, 056125 (2001).
[7] P.L. Krapivsky, S. Redner and F. Leyvraz, Phys. Rev. Lett. 85, 4629 (2000).
[8] This notion of load is sometimes referred to as "betweenness centrality".
[9] We do not distinguish between geodesics that are partly overlapping and those that are completely distinct. Thus if there are three geodesics between a pair of nodes, with the first two overlapping for most of their lengths and the last one completely separate, the traffic along each would be 1/3, resulting in greater traffic over most of the overlapping geodesics.
[10] K-I. Goh, B. Kahng and D. Kim, Phys. Rev. Lett. 87, 278701 (2001).
[11] K-I. Goh, E. Oh, H. Jeong, B. Kahng and D. Kim, Proc. Natl. Acad. Sci. 99, 12583 (2002).
[12] M. Barthelemy, Phys. Rev. Lett. 91, 189803 (2003).
[13] G. Szabo, M. Alava and J. Kertesz, Phys. Rev. E 66, 026101 (2002).
[14] B. Bollobas and O. Riordan, Phys. Rev. E63