A critique of the Mean Field Approximation in preferential attachment networks
Matthijs Ruijgrok
Department of Mathematics, Utrecht University, P.O. Box 80010, 3508 TA Utrecht, The Netherlands. [email protected]
June 25, 2020

Abstract
The Mean Field Approximation (MFA), or continuum method, is often used in courses on Networks to derive the degree distribution of preferential attachment networks. This method is simple and the outcome is close to the correct answer. However, this paper shows that the method is flawed in several aspects, leading to unresolvable contradictions. More importantly, the MFA is not explicitly derived from a mathematical model. An analysis of the implied model shows that it makes an approximation which is far from the truth and another one which cannot be motivated in general. The success of the MFA for preferential attachment networks is therefore accidental and the method is not suitable for teaching undergraduates.

Keywords: BA networks · Mean Field Approximation · continuum method · master equation

1 Introduction

The birth of Network Science occurred somewhere in the late nineties. One of its foundational papers is the article by Barabási and Albert [2] on preferential attachment networks, which has become the most cited paper in the discipline, with currently more than 36000 citations. The paper introduced an analytical method to calculate the node degree distribution of preferential attachment networks, which are now known as Barabási-Albert (BA) networks. This method is called the Mean Field Approximation (MFA) or continuum method. The result of the MFA is close to correct, but not exact. The MFA has since been reproduced in many textbooks, see for example [1], [4] and [6], and in (online) course notes, see [10] and [11] (specifically [12]).

In this paper, I will analyse the MFA from the point of view of mathematical modelling. The conclusion is that the method is the result of poor modelling, and its relative success is accidental. Because the MFA is so widely used in courses on networks, I believe it is worthwhile to point out its failings.

The motivation for writing this paper is my experience with teaching the MFA in an undergraduate course on networks.
I encountered two major problems with the method. First, the MFA is a sequence of steps, each of which is familiar from introductory undergraduate mathematics courses. However, none of the steps is carefully motivated; they are simply applied mechanically. This obscures where the approximations are made, and what assumptions are necessary. Especially for students without much experience in the field, it gives a wrong impression of mathematical modelling. The second problem with the MFA is in its details. It contains fallacies that lead to contradictions, makes arbitrary choices, confuses stochastic variables with deterministic ones, uncritically changes discrete variables to continuous ones and produces an incorrect answer.

This last set of problems is actually not the most serious of the two. By slightly reformulating the method, these problems can be removed, except for the not quite correct final answer. However, the reformulation reveals what modelling choices had been implicitly made. A closer look at the reconstructed full mathematical model shows that it relies on a very poor approximation, and an assumption which, although almost true for BA networks, is false in general. The modelling process cannot be rehabilitated and this is the reason I believe the MFA is not a suitable method to teach to students.

2 The BA network

The BA network starts at time t = 1 with one node, which has m ≥ 1 self-loops. At time-step t + 1, a new node is added to the network. The new node has m outgoing links, each of which is attached to one of the t existing nodes. The probability that the new node links to a certain existing node is taken to be proportional to the degree of this target node. This process is then repeated.

The primary quantity of interest for the BA network is its expected degree distribution p(k; t), which is the expected fraction of nodes at time t that have degree k.
Equivalently, p(k; t) is the probability that a randomly and uniformly chosen node at time t has degree k.

Numerical simulations in [2] show that, when averaged over many realisations, p(k; t) converges in time to a stationary distribution p(k). In particular, p(k) ≈ c k^(−γ) for large k, where γ = 2.9 ± 0.1 and c a constant.

The MFA predicts:

    p(k) ≈ 2m²/k³,    (1)

for large k.

Soon after [2], Krapivsky, Redner and Leyvraz [9] and Dorogovtsev, Mendes and Samukhin [8], simultaneously and independently, derived the approximation:

    p(k) ≈ 2m(m + 1) / (k(k + 1)(k + 2)),    (2)

valid for all k ≥ m.

For large k, both (1) and (2) have a leading term of order k^(−3), although the prefactors differ. In contrast to (1), expression (2) defines a probability distribution, because ∑_{k≥m} p(k) = 1. Both Krapivsky et al. and Dorogovtsev et al. use the rate equation method to arrive at the result. See Appendix A for a quick review of this method.

In 2001, Bollobás et al. [5] proved that, when couched in a careful mathematical formulation, the limit p(k) indeed exists and that the approximation (2) is exact. In the following years, research on networks quickly expanded, both in its foundations and in finding applications in many fields. Soon, textbooks were published and Network Science became a subject of undergraduate curricula. Many authors chose to use the MFA to analyse BA networks, although in most cases these books and notes also discuss the rate equation method, and mention both result (1) and (2).

3 The Mean Field Approximation

In the BA model, time is discrete and will be denoted as t = 1, t = 2, .... Nodes will be identified by the moment in time they were added to the network, also known as their birth date. The degree at time t of a node born at time t_i is denoted K_i(t), with t ≥ t_i.
For all nodes, K_i(t_i) = m.

The probability that a node t_i is chosen as a target node at time t + 1 is proportional to K_i(t), specifically:

    Π_i(t) = K_i(t) / ∑_{j=1}^{N(t)} K_j(t),    (3)

where N(t) is the number of nodes in the network at time t.

3.1 Description of the method

The following description of the MFA follows [2] and [3].

(a) Let N(t) be the number of nodes at time t and S(t) = ∑_{j=1}^{N(t)} K_j(t). Obviously N(t) = t. Since at each time t = 1, 2, ... a new node is introduced which has m outgoing links and the rest of the network receives m incoming links, it is clear that:

    S(t) = 2mt.    (4)

(Footnote: this choice of starting network is made for simplicity. Starting with a network with more nodes and arbitrary connections does not change its properties for large t.)

(b) K_i(t) and t are taken to be reals, rather than integers. The rate of change of K_i(t) per unit time is equal to m Π_i(t), giving

    dK_i/dt = m Π_i(t) = m K_i / S(t) = K_i / (2t),    K_i(t_i) = m,    (5)

with solution

    K_i(t) = m (t/t_i)^(1/2).    (6)

(c) Let P(k; t) be the probability that, at time t, a randomly and uniformly chosen node has degree k or larger. Because there is a one-to-one correspondence between the birth date of a node and its degree, given by (6), this probability is equal to the probability that the birth date of this node is t_i or less. All nodes born after this date have degree smaller than k. Therefore,

    P(k; t) = Pr(t_i ≤ (m/k)² t) = (m/k)².    (7)

The last equality follows from the fact that t_i is chosen with uniform probability from the interval [0, t].

(d) The degree distribution of the network at time t is given by

    p(k; t) = −∂P(k; t)/∂k = 2m²/k³.    (8)

3.2 Objections

The above steps raise many objections, as itemized below:

• Since S(t) = 2mt is a deterministic function, it is implied that K_i(t) is also a deterministic quantity, which it is not. The degree K_i(t) is a stochastic variable, with a corresponding probability distribution over all its possible outcomes.
• The equality S(t) = 2mt is true for integer values of t, but not for other values. Since no nodes or edges are added between integer times t = n and t = n + 1, the value of S(t) is constant during that time.

• The degree K_i(t) has integer values. In the MFA, K_i(t) can take on non-integer values, the meaning of which is not explained.

• The erroneous formula S(t) = 2mt leads to a contradiction. Take t ∈ (n − 1, n). Summing both sides of K'_i(t) = m Π_i(t) from i = 1 to i = n − 1 gives S'(t) = m, with solution S(t) = mt + c, which is obviously not the same as 2mt.

• If the assumption of K_i(t) as a deterministic quantity is accepted, no further steps are needed. Define the jump points at time t as

    ¯k_i(t) = m (t/t_i)^(1/2).    (9)

Picking a node at time t, randomly and uniformly, produces a probability distribution

    ˜p(k; t) = 1/t if k = ¯k_i(t) for some i, and 0 otherwise,    (10)

which is by definition the sought-after degree distribution.

• Equation (7) is not correct. Since the degrees of the nodes in the network have integer values, the true P(k; t) is a piece-wise constant function of k, with jumps at integer values of k.

3.3 A visual overview of the MFA
(a) Assuming that K_i(t) is a deterministic quantity, the probability distribution ˜p(k; t) for the degree of a randomly and uniformly chosen node at time t is uniform and concentrated in k = ¯k_i(t), i = 1, 2, ..., t.
(b) ˜P(k; t) = Pr(K_i(t) ≥ k) (red) is the (complementary) cumulative distribution corresponding to ˜p(k; t). It is approximated by P_MFA(k; t) = (m/k)² (black).
(c) Define p_MFA(k) = −∂P_MFA(k; t)/∂k = 2m²/k³ (blue). Compared with the starting distribution ˜p(k; t) (red), a considerable redistribution of probability has occurred.
(d) The cumulative distribution ˜P(k; t) (red) agrees well with the complementary cumulative distribution of the true p(k; t) (blue) given by (2).

Figure 1: The MFA method in pictures. In all graphs, t = 25 and m = 8.

Apart from the errors mentioned in 3.2, the procedure lacks a heuristic content, as illustrated in Figure 1. The procedure starts with the probability distribution ˜p(k; t). It is clear that ˜p(k; t) is not in any way close to the true p(k; t), if only because the support of ˜p(k; t) is the set of ¯k_i(t) rather than the integers. Then, the corresponding cumulative distribution ˜P(k; t) is approximated by P_MFA(k; t) = (m/k)². This expression is differentiated with respect to k, and its evaluation on the integers is declared to be the sought-after solution. There is no obvious reason why this redistribution of probabilities should lead to an approximation of p(k; t).

Note that Figure 1d shows that ˜P(k; t), although derived from incorrect assumptions and through a questionable redistribution of probability, agrees well with the complementary cumulative distribution of the correct expression (2). For large values of k, the error seems to be no larger than 1/t.

4 The implied model
There are two sources for the problems identified in 3.2. The first is the extension of t and K_i(t) to continuous quantities. Since there is no assumption of anything changing in the network between two integer times, and degrees of nodes are always integer valued, these extensions are unnecessary. The second source is taking K_i(t) to be deterministic. The intention of this assumption is that K_i(t), as it is used in the MFA, should be seen as the expected value of the degree of node i at time t. This becomes clear by noting that the MFA value of K_i(t) is defined by the differential equation (5), the right hand side of which is the amount by which the expected value of the actual K_i(t) changes during one time step. Moreover, it is apparent that rather than taking K_i(t) as deterministic, the MFA makes the approximation that the probability distribution for K_i(t) is completely concentrated in a single point. Obviously, this single point is the expected value of K_i(t).

4.1 The model

The above considerations imply the following mathematical model, which follows the steps of the procedure outlined in 3.1, but without its contradictions. Start with the description of the model as in section 3. An approximation P_MFA(k; t) of the cumulative probability function P(k; t) will be calculated using the probability distributions of the K_i(t). From this approximation, a probability distribution p_MFA(k; t) can then be derived.

(a) Let B(t) be the birth date of a randomly and uniformly chosen node at time t. Clearly, Pr(B(t) = t_i) = 1/t, for all i = 1, 2, ..., t.
Then

    P(k; t) = ∑_{i=1}^{t} Pr(B(t) = t_i) Pr(K_i(t) ≥ k)
            = (1/t) ∑_{i=1}^{t} Pr(K_i(t) ≥ k)
            = (1/t) ∑_{i=1}^{t} ∑_{j=k}^{∞} p_i(j; t),    (11)

where p_i(j; t) is the probability that K_i(t) = j, at time t.

(b) Now approximate the probability distribution p_i(x; t) of K_i(t) by the delta-distribution:

    ˜p_i(x; t) = 1 if x = ⟨K_i(t)⟩, and 0 otherwise,    (12)

with ⟨K_i(t)⟩ the expected value of K_i(t).

(c) The above assumption leads to:

    ∑_{j=k}^{∞} ˜p_i(j; t) = 1 if k ≤ ⟨K_i(t)⟩, and 0 otherwise.

For a given value of k, let i = b(k; t) be the largest index such that k ≤ ⟨K_i(t)⟩ at time t. Then P(k; t) is approximated by:

    ˜P(k; t) = (1/t) ∑_{i=1}^{b(k;t)} 1 = b(k; t)/t.

(d) From the definition of the model, the following recursion can be obtained:

    ⟨K_i(t + 1)⟩ = ⟨K_i(t)⟩ + ⟨K_i(t)⟩/(2t),    ⟨K_i(t_i)⟩ = m.    (13)

The solution of (13) is approximated by the following differential equation (for ease of presentation, this approximation will still be denoted ⟨K_i⟩):

    d⟨K_i⟩/dt = ⟨K_i⟩/(2t),    ⟨K_i(t_i)⟩ = m,

with solution

    ⟨K_i(t)⟩ = m (t/t_i)^(1/2).    (14)

(e) From (14) follows:

    b(k; t) = ⌊(m/k)² t⌋, and ˜P(k; t) = (1/t) ⌊(m/k)² t⌋.    (15)

(f) Make the following approximation:

    P_MFA(k; t) = (m/k)².    (16)

(g) The value of p_MFA(k; t) is then defined as

    p_MFA(k; t) = P_MFA(k; t) − P_MFA(k + 1; t).    (17)

(h) The right hand side of (17) can be approximated by a derivative, giving

    p_MFA(k; t) = −∂/∂k (m/k)² = 2m²/k³.
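The steps above can be traced numerically. The following sketch is my own illustration (the function names are not from the paper): it implements ˜P from (15) and P_MFA from (16), verifies that the two agree at the jump points ¯k_i(t) = m(t/t_i)^(1/2), and compares the difference (17) with the derivative of step (h).

```python
import math

def P_tilde(k, t, m):
    """Step (e): the cumulative approximation (1/t) * floor((m/k)^2 * t).
    The small epsilon guards against floating-point error at the jump points."""
    return min(math.floor((m / k) ** 2 * t + 1e-9), t) / t

def P_mfa(k, t, m):
    """Step (f): the smooth form (m/k)^2 chosen by the MFA."""
    return min((m / k) ** 2, 1.0)

t, m = 25, 8  # the values used in Figure 1

# P_MFA agrees with P-tilde exactly at the jump points k_i(t) = m*sqrt(t/t_i)
for t_i in range(1, t + 1):
    k_bar = m * math.sqrt(t / t_i)
    assert abs(P_tilde(k_bar, t, m) - P_mfa(k_bar, t, m)) < 1e-9

# Steps (g) and (h): the difference (17) versus the derivative 2 m^2 / k^3
k = 20
p_diff = P_mfa(k, t, m) - P_mfa(k + 1, t, m)
p_deriv = 2 * m ** 2 / k ** 3
print(p_diff, p_deriv)
```

Between the jump points, however, P_MFA is only one of many decreasing interpolations of ˜P, which is exactly the objection raised against step (f) below.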
Although the model in 4.1 is now free of contradictions, it has many questionable aspects, listed below.

(b) The distributions p_i(j; t) are concentrated on integer values of j, whereas the support of ˜p_i(x; t) is on the values x = ⟨K_i(t)⟩. Since it is not likely that these values are close to integers, ˜p_i(x; t) is in no sense an approximation of p_i(j; t). Moreover, (12) is very crude. It assumes that all the probability of K_i(t) is concentrated in a single point. Put another way, this assumption says that all moments of K_i(t) apart from the mean are negligible. Appendix B, which presents a method to calculate the p_i(j; t), shows that this assumption is far from the truth. The distributions p_i(j; t) have a considerable variance.

(c) Given the previous remark, there is no a priori reason to believe that ˜P(k; t) is close to the true P(k; t).

(d) The recursion (13) can be solved without recourse to the dubious method of replacing it with a differential equation. Appendix C shows that the result is still close to expression (14), at least when t_i is not too small.

(f) The function P_MFA(k; t) is exactly equal to ˜P(k; t) for all values of k such that (m/k)² t is an integer. These are the jump points k = ¯k_i(t). However, for other values of k, notably the integers, there is no reason to choose the particular form (16) as an approximation. There are many functions that agree with ˜P(k; t) in the jump points and are decreasing in k. Each of these different functions will lead to different values of p_MFA(k; t). Because the distances between successive jump points increase without bound, ˜P(k; t) simply does not provide enough data points to make a reasoned guess what the values of P(k; t) might be at integer values of k.

(h) This step is superfluous.

The objection against item (f) already invalidates the reasoning behind the statement that 2m²/k³ is a good approximation of p(k; t). However, when (15) is accepted as an approximation of P(k; t), a weaker result can be derived, which only uses the value of ˜P(k; t) in the jump points. The average value of ˜p(k; t) on the interval between two successive jump points can be calculated, recalling that ¯k_i(t) > ¯k_{i+1}(t) and ˜P(k; t) is a decreasing function of k:

    ˜p_av(¯k_i(t); t) = (1/(¯k_i(t) − ¯k_{i+1}(t))) ∑_{¯k_{i+1}(t) ≤ j < ¯k_i(t)} ˜p(j; t)
                      = (˜P(¯k_{i+1}(t); t) − ˜P(¯k_i(t); t)) / (¯k_i(t) − ¯k_{i+1}(t)).

Using expressions (15) and (9), and the fact that t_{i+1} = t_i + 1, this becomes:

    ˜p_av(¯k_i(t); t) = (2 t_i^(3/2) / (m t^(3/2))) (1 + O(t_i^(−1))) = (2m² / ¯k_i(t)³) (1 + O(t_i^(−1))).    (18)

Without making further assumptions, (18) is the strongest possible statement about the asymptotic behaviour of p(k; t) for large values of k, using only the values of the means of the K_i(t).

6 Why does the MFA work for BA networks?
Figure 2: A partition of the domain of p(l, t_i), the probability that at time t a uniformly randomly chosen node has birth date t_i and degree l. The black horizontal line shows l = k, the vertical line t_i = b(k; t). The dashed line is ⟨K_i⟩.

The MFA delivers a result which, even in its weak form (18), is reasonably close to the correct expression. Considering the above objections, this is an unexplained phenomenon. The answer to this conundrum is found in a closer inspection of step (c), which is the crucial part of the model. In this step, the true P(k; t) is pronounced to be close to the constructed ˜P(k; t). This is not supported by any heuristic, since the underlying probability distributions are very dissimilar, as illustrated in Figure 1. However, it is possible to derive a condition under which this closeness is indeed achieved. To this end, rewrite:

    ˜P(k; t) = (1/t) ∑_{i=1}^{b(k;t)} 1
             = (1/t) ∑_{i=1}^{b(k;t)} ∑_{l=m}^{∞} p(l, t_i)
             = (1/t) ( ∑_{l=m}^{k−1} ∑_{i=1}^{b(k;t)} p(l, t_i) + ∑_{l=k}^{∞} ∑_{i=1}^{b(k;t)} p(l, t_i) )
             = (1/t) ( ∑_A p(l, t_i) + ∑_B p(l, t_i) ).    (19)

On the other hand, P(k; t) can be written as

    P(k; t) = (1/t) ( ∑_{l=k}^{∞} ∑_{i=1}^{b(k;t)} p(l, t_i) + ∑_{l=k}^{∞} ∑_{i=b(k;t)+1}^{t} p(l, t_i) )
            = (1/t) ( ∑_B p(l, t_i) + ∑_C p(l, t_i) ).    (20)

See Figure 2 for an illustration of the areas A, B and C.

Comparing (19) and (20) shows that ˜P(k; t) is a good approximation of P(k; t) if and only if the relative error

    | ∑_A p(l, t_i) − ∑_C p(l, t_i) | / ∑_B p(l, t_i)    (21)

is small.

The expression (∑_A p(l, t_i))/t gives the probability that, at time t, a uniformly randomly chosen node has a birth date before t_i = b(k; t) and a degree smaller than k, whereas (∑_C p(l, t_i))/t is the probability that such a random node has birth date after b(k; t) and a degree larger than k. For the MFA to work, the error (21) must be small for all large values of k.

The distribution of p(l, t_i)/t for the case of the BA network is shown in Figure 3, using the results found in Appendix B. The figure is somewhat stylized, since it shows the distribution as if t_i and l were continuous variables. This is only done for illustration purposes.

For large values of k, the value of b(k; t) is small. Then, ∑_A p(l, t_i)/t is roughly the product of a small probability (the birth date of the chosen node is less than t_i = b(k; t)) and a quite substantial probability (the degree of the node, conditional on its birth date, is less than k). For ∑_C p(l, t_i)/t the magnitudes of the probabilities are reversed. Apparently, the products of these probabilities in A and C are more or less the same size.
This balancing phenomenon results from the facts that the width of the distribution p(l, t_i) is larger for small values of t_i compared to larger ones, and that p(l, t_i) is somewhat skewed (see Figure 5).

There is, however, no reason to expect that in general the relative error (21) is small. Moreover, checking whether it is indeed small requires detailed knowledge of p_i(l; t) beyond its expected value. If such information is available, an approximation such as the MFA becomes superfluous, since then the exact expression for P(k; t) given by (20) can be used.

Figure 3: Plots of p(l + m, t_i)/t. (a) Contour plot of p(l + m, t_i)/t; horizontally t_i, vertically l; blue indicates low values, red high values; the black curve is ⟨K_i(t)⟩ as a function of t_i. (b) A 3d plot of p(l + m, t_i)/t, showing only the values for the domains A, B and C. For k < m, p(k, t_i) = 0, for all t_i. In both figures t = 100 and m = 7. The total probability in areas A and C is approximately equal.

7 A counterexample

There is nothing in the MFA that requires the network to be of the preferential attachment type. If the MFA is correct, it should work for networks with an arbitrary degree distribution.

Consider a random network consisting of t nodes. Let K_i(t) be the degree of the node t_i at time t and assume it has a uniform distribution over {1, 2, ..., t + 1 − t_i}, for each t_i = 1, 2, ..., t. The expected values of the K_i(t) are

    ⟨K_i(t)⟩ = (t − t_i + 2)/2,    t_i = 1, 2, ..., t.

The jump points of the cumulative probability function ˜P(k; t), shown in Figure 4, are therefore successive multiples of 1/2:

    ˜P(k; t) = 1 if k ≤ 1;
    ˜P(k; t) = 1 − j/t for j = 1, 2, ..., t − 1 and 1 + (j − 1)/2 < k ≤ 1 + j/2;
    ˜P(k; t) = 0 otherwise.    (22)

The approximation P_MFA(k; t), which agrees with ˜P(k; t) in the jump points k = 1 + j/2, is given by

    P_MFA(k; t) = 1 if k ≤ 1;
    P_MFA(k; t) = 1 − 2(k − 1)/t if 1 < k ≤ (t + 2)/2;
    P_MFA(k; t) = 0 otherwise.

This leads to p_MFA(k; t) = 2/t for k = 1, 2, ..., (t + 2)/2. The true distribution p(k; t) is

    p(k; t) = (1/t) ∑_{i=1}^{t} p_i(k; t)
            = (1/t) ∑_{i=1}^{t+1−k} 1/(t + 1 − i)
            = (1/t) ∑_{n=k}^{t} 1/n
            ≈ (1/t) log(t/k),

for large t and moderately large k. It is clear from Figure 4 that p_MFA(k; t) is not a good approximation of p(k; t).
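The counterexample can be verified directly. The sketch below is my own check, not code from the paper; it uses exact rational arithmetic to compare the flat MFA answer 2/t with the true degree distribution p(k; t) = (1/t) ∑_{n=k}^{t} 1/n.

```python
from fractions import Fraction

t = 41  # the value used in Figure 4

def p_true(k, t):
    """Exact degree distribution of the counterexample:
    p(k; t) = (1/t) * sum_{n=k}^{t} 1/n, roughly (1/t) log(t/k)."""
    return sum(Fraction(1, n) for n in range(k, t + 1)) / t

p_mfa = Fraction(2, t)  # the MFA prediction is flat in k

for k in (1, 5, 20):
    print(k, float(p_true(k, t)), float(p_mfa))
```

Here p_true sums exactly to 1 over k = 1, ..., t, while the flat MFA value misses the logarithmic decay entirely: it underestimates p(1; t) and overestimates the tail.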
(a) ˜P(k; t) (orange), P_MFA(k; t) (black) and the true cumulative distribution P(k; t) (blue).
(b) The approximation p_MFA(k; t) (orange) and the true probability distribution p(k; t) (blue).

Figure 4: Probability distributions corresponding to the counterexample. Here, t = 41.

8 Conclusions

The MFA consists of a sequence of mathematical manipulations, resulting in the calculation of the degree distribution of a network. In the case of the BA network, its result is close to correct. Nonetheless, the MFA is not a good general method to teach students. There are many reasons for this assessment.

First, the method is not backed up by a proper mathematical model. This makes it impossible for students to see where approximations are made and how the MFA actually works. Students are presented with a number of steps, each of which they may remember from Calculus or Probability class, but without a model, the method cannot be critically examined.

Second, the method is faulty. It treats the degree of a node simultaneously as deterministic and as stochastic. It changes discrete variables, such as a time step and the degree of a node, into continuous ones without a reason. It interpolates functions which are only defined on integers to functions defined on the reals in a haphazard way. All these fallacies lead to unresolvable contradictions.

Third, the mathematical model behind the MFA can be reconstructed, and it shows that the familiar looking operations hide a lack of intuitive content. Figure 1 shows that the final distribution is derived from a rather mysterious rearrangement of a different initial distribution. There is no indication why this would work.

Fourth, the reconstructed model contains dubious elements. It replaces a distribution, which is shown numerically to have a large variance, by a distribution concentrated in one point. Crucially, at one point an approximation is used which is completely unmotivated. This approximation works for BA networks, but only by chance. A counterexample
A counterexampleshows that this approximation is, in general, false.Finally, a simple alternative method exists, namely the rate-equation method, which gives the correct answer and isbased on a standard mathematical technique.This leads to the question why, after more than twenty years, the MFA method can be found in many textbooks andcourse notes, apparently having passed dozens of authors, editors and referees unscathed. Finding the answer couldbe an interesting exercise in the sociology of science, but ironically one of the reasons might involve preferentialattachment networks themselves. De Solla Price [7] already noted that scientific papers tend to form a preferential9ttachment network, with references from one paper to another acting as the edges. The huge success of [2], the initialnode of the citation network on preferential attachment, may have given it such an aura of authority that a critical reviewof its every aspect would have been considered unnecessary.
Appendix A
The value of p(k) can be found using the rate equation, an approximation method based on the master equation approach. Let n(k; t) be the expected number of nodes of degree k ≥ m at time t. Neglecting the probability that an existing node receives two or more links from a new node, the transition probability at time t + 1 for a node to go from k links to k + 1 links is k/(2t). This leads to the following set of equations:

    n(k; t + 1) = n(k; t) + (k − 1)/(2t) n(k − 1; t) − k/(2t) n(k; t),    k > m,
    n(m; t + 1) = n(m; t) + 1 − m/(2t) n(m; t).

Since p(k; t) = n(k; t)/t, the above equations can be written as:

    2(t + 1) p(k; t + 1) = 2t p(k; t) + (k − 1) p(k − 1; t) − k p(k; t),    k > m,
    2(t + 1) p(m; t + 1) = 2t p(m; t) − m p(m; t) + 2.

Now assume that lim_{t→∞} p(k; t) = p(k). Making the approximation p(k; t + 1) ≈ p(k; t) ≈ p(k) for large t, substituting in the above equations and rearranging leads to:

    p(k) = (k − 1)/(k + 2) p(k − 1),    k > m,
    p(m) = 2/(m + 2).

It is easy to check that

    p(k) = 2m(m + 1) / (k(k + 1)(k + 2))

is the solution of this recursion.

Appendix B
The model will be changed so that it becomes continuous in time. This model was first used in connection with BA networks in [5]. Time is taken to be the finite interval [0, T]. New nodes are born at integer values of time. During a short time-interval ∆t, at most one connection is made from the new node to an existing node. The probability that an existing node receives a connection is proportional to its degree k and to ∆t. The expected total number of connections made during a time-interval of unit length is m. This leads to the probability of attachment to a node of degree k during a short time-interval [t, t + ∆t] to be mk∆t/S(t). Here, S(t) = 2m⌊t⌋ is the total degree of the network. For large values of t, the relative error in approximating ⌊t⌋ by t is small. Therefore, the probability of attachment is simplified to k∆t/(2t). Eventually, the limit ∆t → 0 is taken.

The change in the model implies that its outcomes are not guaranteed to match those of the original. Numerical simulations show that the continuous-time model is a good approximation to the discrete-time one, apart from the earliest nodes. A theoretical justification can be found in [5].

If the network is considered as a directed graph, the out-degree of every node is constant and equal to m. It is the in-degree d which contains all the uncertainty. Consequently, it is useful to write the total degree as k = d + m. Let D_i(t) = K_i(t) − m be the stochastic variable representing the in-degree, at time t, of the node born at t_i.
The probability distribution for D_i(t) will be denoted q_i(d; t). The master equations for q_i(d; t + ∆t) become

    q_i(0; t + ∆t) = (1 − m∆t/(2t)) q_i(0; t),
    q_i(d; t + ∆t) = (1 − (m + d)∆t/(2t)) q_i(d; t) + ((m + d − 1)∆t/(2t)) q_i(d − 1; t),    1 ≤ d ≤ T − 1,
    q_i(T; t + ∆t) = ((m + T − 1)∆t/(2t)) q_i(T − 1; t),

with initial conditions

    q_i(0; t_i) = 1,    (23)
    q_i(d; t_i) = 0 for all d > 0.    (24)

Note that there is a set of equations for each i = 1, 2, ..., T. Rearranging the terms and taking the limit ∆t → 0 leads to the differential equations:

    d/dt q_i(0; t) = −(m/(2t)) q_i(0; t),
    d/dt q_i(d; t) = (1/(2t)) ((m + d − 1) q_i(d − 1; t) − (m + d) q_i(d; t)),    d ≥ 1.    (25)

The equation for q_i(0; t) is easily solved, and yields:

    q_i(0; t) = (t_i/t)^(m/2),

where the initial condition (23) was used. To derive the solution for q_i(d; t) with d ≥ 1 the following transformation is introduced:

    q_i(d; t) = τ_i^(−(d+m)) r(d; τ_i),    with τ_i = (t/t_i)^(1/2).

Substitution in equation (25) gives:

    d/dτ_i r(d; τ_i) = (m + d − 1) r(d − 1; τ_i),    d ≥ 1.    (26)

After solving for d = 1, d = 2, ..., it soon becomes clear that the solution of (26) which satisfies the initial condition r(d; 1) = 0 is

    r(d; τ_i) = C(m + d − 1, m − 1) (τ_i − 1)^d,

where C(·, ·) denotes the binomial coefficient. This gives the result:

    q_i(d; t) = τ_i^(−m) C(m + d − 1, m − 1) (1 − τ_i^(−1))^d,    d = 0, 1, 2, ...    (27)

and therefore

    p_i(k; t) = q_i(k − m; t) = τ_i^(−m) C(k − 1, m − 1) (1 − τ_i^(−1))^(k−m),    k = m, m + 1, ...    (28)

The function (27) is known as the negative binomial distribution with parameters r = m and p = 1 − τ_i^(−1) = 1 − (t_i/t)^(1/2).

Figure 5 compares simulations of the BA network to expression (28). The match is very good, except for nodes with a very low birth date. For those nodes, the mean of the numerically calculated distribution can be appreciably larger than that of p_i(k; t), although the general shape of the theoretical distribution and the simulation agree.
See Figure 6 for a plot of the means of K_i(t).

The mean and variance of the negative binomial distribution can be found in many reference sources:

    ⟨D_i(t)⟩ = pr/(1 − p) = m(t/t_i)^(1/2) − m,    (29)

    Var(D_i(t)) = pr/(1 − p)² = m((t/t_i) − (t/t_i)^(1/2)).    (30)

Because K_i(t) = D_i(t) + m, the model predicts

    ⟨K_i(t)⟩ = m(t/t_i)^(1/2).    (31)
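Formulas (29)-(31) can be checked directly against the probability mass function (27). The sketch below is my own check (the truncation bound D is an arbitrary choice): it sums the negative binomial pmf numerically and compares the resulting mean and variance with the closed forms.

```python
import math

def q(d, m, tau):
    """The pmf (27): q_i(d; t) = tau^(-m) * C(m+d-1, m-1) * (1 - 1/tau)^d."""
    return tau ** (-m) * math.comb(m + d - 1, m - 1) * (1 - 1 / tau) ** d

m, t, t_i = 3, 50.0, 5.0           # t and m as in Figure 5, with t_i = 5
tau = math.sqrt(t / t_i)

D = 2000                           # truncation; the tail beyond this is negligible here
total = sum(q(d, m, tau) for d in range(D))
mean = sum(d * q(d, m, tau) for d in range(D))
var = sum(d * d * q(d, m, tau) for d in range(D)) - mean ** 2

mean_pred = m * math.sqrt(t / t_i) - m           # (29)
var_pred = m * (t / t_i - math.sqrt(t / t_i))    # (30)
print(total, mean, mean_pred, var, var_pred)
```

Adding m to the mean recovers (31), the value the MFA treats as the whole distribution; the variance (30) shows how much probability mass the delta-approximation (12) throws away.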
Figure 5: The distribution p_i(k; t) corresponding to K_i(t), for (a) t_i = 2, (b) t_i = 5, (c) t_i = 10, (d) t_i = 20. In orange, the theoretical prediction p_i(k; t) = q_i(k − m; t) (for clarity this function is plotted for all values of k, not just the integer values). In black, the numerical result based on 20000 simulations. In all graphs, t = 50 and m = 3.

Appendix C
The expected value of K_i(t) can be derived by the reasoning in section 3.1, using discrete time steps:

    ⟨K_i(t + 1)⟩ − ⟨K_i(t)⟩ = ⟨K_i(t)⟩/(2t),    ⟨K_i(t_i)⟩ = m.

This leads to the recursion

    ⟨K_i(t + 1)⟩ = (1 + 1/(2t)) ⟨K_i(t)⟩,    ⟨K_i(t_i)⟩ = m,

with solution

    ⟨K_i(t)⟩ = (1 + 1/(2(t − 1))) ··· (1 + 1/(2t_i)) m,    t > t_i.    (32)

This expression is not very informative, so an upper and lower bound will be derived. Taking logarithms leads to

    log⟨K_i(t)⟩ = ∑_{k=t_i}^{t−1} log(1 + 1/(2k)) + log m.

Because log(1 + 1/(2k)) is strictly decreasing in k, the sum in the right-hand side is both a Riemann lower sum and an upper sum:

    ∫_{t_i}^{t} log(1 + 1/(2x)) dx ≤ ∑_{k=t_i}^{t−1} log(1 + 1/(2k)) ≤ ∫_{t_i−1}^{t−1} log(1 + 1/(2x)) dx.

Evaluating the integrals gives:

    ∑_{k=t_i}^{t−1} log(1 + 1/(2k)) ≤ (1/2) log((2t − 1)/(2t_i − 1)) + (t − 1) log(1 + 1/(2(t − 1))) − (t_i − 1) log(1 + 1/(2(t_i − 1))),

    ∑_{k=t_i}^{t−1} log(1 + 1/(2k)) ≥ (1/2) log((2t + 1)/(2t_i + 1)) + t log(1 + 1/(2t)) − t_i log(1 + 1/(2t_i)).

For large t, the terms (t − 1) log(1 + 1/(2(t − 1))) and t log(1 + 1/(2t)) can both be replaced by 1/2, and the terms t − 1/2 and t + 1/2 both by t. Taking exponents again yields the bounds:

    m (e t/(t_i + 1/2))^(1/2) (1 + 1/(2t_i))^(−t_i) ≤ ⟨K_i(t)⟩ ≤ m (e t/(t_i − 1/2))^(1/2) (1 + 1/(2(t_i − 1)))^(−(t_i−1)).    (33)

When t_i is moderately large, both the upper bound and the lower bound in expression (33) are close to the value m(t/t_i)^(1/2) found in (31). For small values of t_i, calculating ⟨K_i(t)⟩ using (32) shows that for all but the smallest values of t_i, m(t/t_i)^(1/2) is also an excellent approximation for ⟨K_i(t)⟩, see Figure 6.

Figure 6: The blue lines are the upper and lower bounds for ⟨K_i(t)⟩ using (33). The orange line is m(t/t_i)^(1/2). The black dots represent the exact value of ⟨K_i(t)⟩ calculated from (32). As in Figure 5, t = 50 and m = 3.

Acknowledgements
I thank Rob Bisseling, Anton van de Ven and Theo Ruijgrok for their comments. Th. R. also wrote some of the code for the simulations.
References

[1] A.L. Barabási, "Network Science," Cambridge University Press (2016).
[2] A.L. Barabási and R. Albert, "Emergence of scaling in random networks," Science 286, 509–512 (1999).
[3] A.L. Barabási, R. Albert and H. Jeong, "Mean-field theory for scale-free random networks," Physica A 272, 173–187 (1999).
[4] A. Barrat, M. Barthélemy and A. Vespignani, "Dynamical processes on complex networks," Cambridge University Press (2008).
[5] B. Bollobás, O. Riordan, J. Spencer and G. Tusnády, "The degree sequence of a scale-free random graph process," Random Structures and Algorithms 18, 279–290 (2001).
[6] G. Caldarelli, "Scale-free networks: complex webs in nature and technology," Oxford University Press (2007).
[7] D. De Solla Price, "A general theory of bibliometric and other cumulative advantage processes," Journal of the American Society for Information Science 27, 292–306 (1976).
[8] S.N. Dorogovtsev, J.F.F. Mendes and A.N. Samukhin, "Structure of growing networks with preferential linking," Phys. Rev. Lett. 85, 4633–4636 (2000).
[9] P.L. Krapivsky, S. Redner and F. Leyvraz, "Connectivity of growing random networks," Phys. Rev. Lett. 85, 4629–4632 (2000).
[10] Massachusetts Institute of Technology, "Networks," https://ocw.mit.edu/courses/economics/14-15j-networks-spring-2018/lecture-and-recitation-notes/MIT14_15JS18_lec12.pdf, accessed January 2, 2020.
[11] Stanford University, "Social and economic networks: models and analysis," https://online.stanford.edu/courses/sohs-yecon0001-social-and-economic-networks-models-and-analysis, accessed January 2, 2020.
[12] Stanford University, "Social and Economic Networks 3.3 Week 3: Preferential Attachment,"