A Random Growth Model with any Real or Theoretical Degree Distribution
Frédéric Giroire, Stéphane Pérennes, and Thibaud Trolliet
Université Côte d'Azur/CNRS, France
INRIA Sophia-Antipolis, France
Abstract.
The degree distributions of complex networks are usually considered to be power law. However, it is not the case for a large number of them. We thus propose a new model able to build random growing networks with (almost) any wanted degree distribution. The degree distribution can either be theoretical or extracted from a real-world network. The main idea is to invert the recurrence equation commonly used to compute the degree distribution in order to find a convenient attachment function for node connections, which is commonly chosen as linear. We compute this attachment function for some classical distributions, such as the power-law, broken power-law, geometric and Poisson distributions. We also use the model on an undirected version of the Twitter network, for which the degree distribution has an unusual shape.
Keywords:
Complex Networks, Random Growth Model, Preferential Attachment, Degree Distribution, Twitter
1 Introduction

Complex networks appear in the empirical study of real-world networks from various domains, such as social sciences, biology, economics and technology. Most of those networks exhibit common properties, such as a high clustering coefficient or a community structure. Probably the most studied of those properties is the degree distribution (named DD in the rest of the paper), which is often observed to follow a power-law distribution. Random network models have thus focused on building graphs exhibiting power-law DDs, such as the well-known Barabási-Albert model [2] or the Chung-Lu model [6], but also models for directed networks [4] or for networks with communities [18]. However, it is common to find real networks whose DDs do not perfectly follow a power law. For instance, among social networks, Facebook has been shown to follow a broken power-law [12], while Twitter only has the tail of its distribution following a power law, together with some atypical behaviors due to Twitter's policies, as we report in Section 5.1.

It is yet crucial to build models able to reproduce the properties of real networks. Indeed, some studies, such as those of fake news propagation or of the evolution of networks over time, cannot always be carried out empirically, for technical or ethical reasons. Carrying out simulations with random networks created with well-built models is a way to study real networks without directly experimenting on them. Those models have to create networks with properties similar to real ones, while staying as simple as possible.

[Figure 1, panels:
(a) DD of the number of unique callers and callees from a mobile phone operator [19].
(b) In-DD between shop-to-shop recommendations from an online marketplace [20].
(c) Graphlet DD from a biological model [17].
(d) DD of users of Cyworld, the largest online social network of South Korea [1].
(e) DDs of users of Flickr, an online social network [5].
(f) DD of the length of the contact list in the Microsoft Messenger network [13].
(g) DD of the number of friends from Facebook, a social network [12].
(h) Out-DD of the number of followees on Twitter [21].]

Fig. 1: DDs extracted from different seminal papers studying networks from various domains.

In this paper, we propose a random growth model able to create graphs with almost any (under some conditions) given DD. Classical models usually choose the nodes receiving new edges proportionally to a linear attachment function $f(i) = i$ (or $f(i) = i + b$) [2,4]. The theoretical DD of the networks generated by those models is computed using a recurrence equation. The main idea of this paper is to reverse this recurrence equation to express the attachment function $f$ as a function of the DD. This way, for a given DD, we can compute the associated attachment function, and use it in the proposed random growth model to create graphs with the wanted DD. The given DD can either be theoretical, or extracted from a real network.

We compute the attachment function associated with some classical DDs: homogeneous ones, such as the Poisson or geometric distributions, and heterogeneous ones, such as the exact power-law and the broken power-law. We also study the undirected DD of a Twitter snapshot of 505 million nodes and 23 billion edges, extracted by Gabielkov et al. [10] and made available by the authors, and see that it has an atypical form, due to Twitter's policies. We empirically compute the associated attachment function, and use the model to build random graphs with this DD.

The rest of the paper is organized as follows. We first discuss the related work in Section 2. In Section 3, we present the new model, and invert the recurrence equation to find the relation between the attachment function and the DD. In Section 4, we apply this relation to compute the attachment function associated with a power-law DD, a broken power-law DD, and other theoretical distributions. In Section 5, we apply our model to a real-world DD, the undirected DD of Twitter.
2 Related Work

The degree distribution has been computed for many networks, in particular for social networks such as Facebook [22], Microsoft Messenger [13] or Instagram [9]. Note that Myers et al. have also studied DDs for Twitter in [15], using a different dataset than the one of [10].

Questioning the relevance of power-law fits is not new: for instance, Clauset et al. [7] and Lima-Mendez and van Helden [14] have already deeply questioned the myth of the power law, as Lima-Mendez and van Helden call it, and developed tools to verify whether a DD can be considered a power law or not. Clauset et al. apply the developed tools to 24 DDs extracted from various domains of the literature, which had all been considered to be power laws. Among them, "17 of the 24 data sets are consistent with a power-law distribution", and "there is only one case in which the power law appears to be truly convincing, in the sense that it is an excellent fit to the data and none of the alternatives carries any weight".

The study of Clauset et al. only considered DDs which have a power-law shape when looking at the distribution in log-log scale. As a complement, we gathered from the literature DDs which clearly do not follow power-law distributions, to show their diversity. They come from networks of various domains: biology, economics, computer science, etc. Each presented DD comes from a seminal, well-cited paper of its domain. They are gathered in Figure 1. Various shapes can be observed, which could (by eye) be associated with exponential (Fig. 1b, 1c), broken power-law (Fig. 1a, 1e, 1g), or even some kind of inverted broken power-law (Fig. 1d) distributions. We also observe DDs with specific behaviors (Fig. 1f, 1h).

The first proposed models of random networks, such as the Erdős–Rényi model [8], build networks with a homogeneous DD. The observation that a lot of real-world networks follow power-law DDs led Albert and Barabási to propose their famous model with linear preferential attachment [2].
It has been followed by a lot of random growth models, e.g. [4,6], also giving a power-law DD. A few models make it possible to build networks with any DD: for instance, the configuration model [3,16] takes as parameters a DD $P$ and a number of nodes $n$, creates $n$ nodes with a degree randomly picked following $P$, then randomly connects the half-edges of every node. Ghoshal and Newman propose in [11] a model generating non-growing networks (where, at each time step, a node is added and another is deleted) which can achieve any DD, using a method close to the one proposed in this paper. However, both of those models generate non-growing networks, while most real-world networks are constantly growing.

3 Presentation of the Model

The proposed model is a generalization of the model introduced by Chung and Lu in [6]. At each time step, we have either a node event or an edge event. During a node event, a node is added with an edge attached to it; during an edge event, an edge is added between two existing nodes. Each node to which the edge is connected is randomly chosen among all nodes with a probability proportional to a given function $f$, called the attachment function. The model is as follows:

- We start with an initial graph $G_0$.
- At each time step $t$:
  - With probability $p$: we add a node $u$ and an edge $(u, v)$, where the node $v$ is chosen randomly among all existing nodes with probability $\frac{f(\deg(v))}{\sum_{w \in V} f(\deg(w))}$;
  - With probability $(1 - p)$: we add an edge $(u, v)$, where the nodes $u$ and $v$ are chosen randomly among all existing nodes with probabilities $\frac{f(\deg(u))}{\sum_{w \in V} f(\deg(w))}$ and $\frac{f(\deg(v))}{\sum_{w \in V} f(\deg(w))}$.

Note that the Chung-Lu model is the particular case where $f(i) = i$ for all $i \ge 1$. We call generalized Chung-Lu model the proposed model where $f(i) = i + b$ for all $i \ge 1$, with $b > -1$.

3.1 Inversion of the Recurrence Equation

The common way to find the DD of classical random growth models is to study the recurrence equation of the evolution of the number of nodes with degree $i$ between two time steps. This equation can sometimes be easily solved, sometimes not. But what matters for us is that the common process is to start from a given model, thus from an attachment function $f$, and use the recurrence equation to find the DD $P$. In this section, we show that the recurrence equation of the proposed model can be reversed such that, if $P$ is given, we can find an associated attachment function $f$.

Theorem 1. In the proposed model, if the attachment function is chosen as:

$$\forall i \ge 1, \quad f(i) = \frac{1}{P(i)} \sum_{k=i+1}^{\infty} P(k), \tag{1}$$

then the DD of the created graph is distributed according to $P$.

Proof. We consider the variation of the number of nodes of degree $i$, $N(i, t)$, between two time steps $t$ and $t+1$. We have:

$$N(i, t+1) - N(i, t) = p\,\delta_{i,1} + (2-p)\frac{f(i-1)}{\sum_{j \ge 1} f(j) N(j,t)} N(i-1, t) - (2-p)\frac{f(i)}{\sum_{j \ge 1} f(j) N(j,t)} N(i, t)$$

where $\delta_{i,j}$ is the Kronecker delta. The first term of the right-hand side is the probability of addition of a node. The second (resp. third) term is the probability that a node of degree $i-1$ (resp. $i$) gets chosen to be the end of an edge. The factor $(2-p) = p + 2(1-p)$ comes from the fact that this happens with probability $p$ during a node event (connection of a single half-edge) and with probability $2(1-p)$ during an edge event (possible connection of 2 half-edges).

Let $P(i) = \lim_{t \to +\infty} \frac{N(i,t)}{pt}$ (the $p$ in the denominator comes from the fact that $E[N(t)] = pt$). We denote

$$g(i) = \frac{2-p}{p}\,\frac{f(i)}{\sum_{j \ge 1} f(j) P(j)}.$$

We first show that $g(i) = \frac{1}{P(i)} \sum_{k=i+1}^{\infty} P(k)$. We will then show that we can choose $f = g$. We use the following lemma from [6]:

Lemma 1. Let $(a_t)$, $(b_t)$, $(c_t)$ be three sequences such that $a_{t+1} = \left(1 - \frac{b_t}{t}\right) a_t + c_t$, $\lim_{t \to +\infty} b_t = b > 0$, and $\lim_{t \to +\infty} c_t = c$. Then $\lim_{t \to +\infty} \frac{a_t}{t}$ exists and equals $\frac{c}{1+b}$.

For $i = 1$, the equation becomes:

$$N(1, t+1) - N(1, t) = p - (2-p)\frac{f(1)}{\sum_{j \ge 1} f(j) N(j,t)} N(1, t).$$

Taking $a_t = \frac{N(1,t)}{p}$, $b_t = \frac{(2-p) f(1)}{p \sum_{j \ge 1} f(j) \frac{N(j,t)}{pt}}$, and $c_t = 1$, we have $\lim_{t \to +\infty} b_t = g(1) > 0$ and $\lim_{t \to +\infty} c_t = 1$. We can thus apply Lemma 1:

$$\lim_{t \to +\infty} \frac{N(1,t)}{pt} = P(1) = \frac{1}{1 + g(1)}.$$

Now, $\forall i \ge 2$, taking $a_t = \frac{N(i,t)}{p}$, $b_t = \frac{(2-p) f(i)}{p \sum_{j \ge 1} f(j) \frac{N(j,t)}{pt}}$, and $c_t = \frac{(2-p) f(i-1)}{p \sum_{j \ge 1} f(j) \frac{N(j,t)}{pt}} \cdot \frac{N(i-1,t)}{pt}$, we have $\lim_{t \to +\infty} b_t = g(i) > 0$ and $\lim_{t \to +\infty} c_t = g(i-1) P(i-1)$. Lemma 1 gives:

$$\lim_{t \to +\infty} \frac{N(i,t)}{pt} = P(i) = \frac{g(i-1) P(i-1)}{1 + g(i)}. \tag{2}$$

Iterating over Equation 2, we express $g$ as a function of $P$:

$$g(i) P(i) = g(i-1) P(i-1) - P(i) = g(1) P(1) - \sum_{k=2}^{i} P(k) = 1 - \sum_{k=1}^{i} P(k)$$

$$\implies g(i) = \frac{1}{P(i)} \sum_{k=i+1}^{\infty} P(k)$$

Now, notice that:

$$\sum_{k=1}^{\infty} g(k) P(k) = \sum_{k=1}^{\infty} \frac{2-p}{p}\,\frac{f(k)}{\sum_{k'=1}^{\infty} f(k') P(k')} P(k) = \frac{2-p}{p}.$$

So $g$ satisfies $g(i) = \frac{2-p}{p}\,\frac{g(i)}{\sum_{k=1}^{\infty} g(k) P(k)}$. Hence the attachment function can be chosen as $f = g$, which concludes the proof. □

For a given probability law, Theorem 1 can be used to compute the attachment function which, when used in the model, will give this probability law as DD.

With the presented model, we also have an implicit constraint between the mean degree of $P$ and the parameter $p$. Indeed, by construction, we have $E[N(t)] = pt$ and $E[|E(t)|] = t$, leading to a mean degree of $\frac{2}{p}$. But the mean degree can also be expressed as $\sum_{k \ge 1} k P(k)$.

Condition 1. The parameter $p$ has to satisfy:

$$p = \frac{2}{\sum_{k \ge 1} k P(k)} \tag{3}$$

We can finally combine the previous results and present the method to build a random network with a fixed DD:

1) Use Equation 1 to compute $f$ from $P$;
2) Compute $p$ using Condition 1;
3) Build the graph with the proposed model, given $(f, p)$ as parameters.

4 Application to Some Distributions

We now apply Equation 1 to compute the attachment function for some classical distributions. We first start in Section 4.1 from the distribution obtained with the generalized Chung-Lu model to show that we find a linear dependence, as expected. We then compute in Section 4.2 the attachment function associated with the broken power-law distribution. We finally treat the exact power-law, geometric law, and Poisson law distributions in Sections 4.3, 4.4 and 4.5. Table 1 summarizes those results.

| Name | $P(i)$ | $f(i)$ | Condition |
|---|---|---|---|
| Generalized Chung-Lu | $P(1)\frac{\Gamma(i+b)}{\Gamma(i+b+\alpha)}$ | $\frac{1}{\alpha-1}\,i + \frac{b}{\alpha-1}$ | $p = \frac{2(\alpha-2)}{\alpha+b-1}$ |
| Exact Power-Law | $\frac{i^{-\alpha}}{\zeta(\alpha)}$ | $\frac{\zeta(\alpha, i+1)}{i^{-\alpha}}$ | $2 < \alpha < 2.48$ (approximately), $p = \frac{2\zeta(\alpha)}{\zeta(\alpha-1)}$ |
| Geometric Law | $q(1-q)^{i-1}$ | $\frac{1-q}{q}$ | $q \le 0.5$, $p = 2q$ |
| Poisson Law | $\frac{1}{e^{\lambda}-1}\frac{\lambda^i}{i!}$ | $\frac{e^{\lambda}\gamma(i+1,\lambda)}{\lambda^i}$ | $p = \frac{2(1-e^{-\lambda})}{\lambda}$ |
| Broken Power-Law | $\begin{cases} C\frac{\Gamma(i+b_1)}{\Gamma(i+b_1+\alpha_1)} & \text{if } i \le d \\ C\gamma\frac{\Gamma(i+b_2)}{\Gamma(i+b_2+\alpha_2)} & \text{if } i > d \end{cases}$ | cf. eq. 9 & 10 | cf. eq. 8 |

Table 1: Attachment functions $f$ and conditions on $p$ for some classical probability distributions $P$. $\zeta(s)$ is the Riemann zeta function, $\zeta(s, q)$ the Hurwitz zeta function, and $\gamma(a, x)$ the lower incomplete Gamma function.
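The three-step method (compute $f$ from $P$, compute $p$, then grow the graph) can be sketched in Python as follows. This is only an illustrative sketch under assumptions that are not stated in the paper: $P$ is truncated to degrees $1..K$ (so $f(K) = 0$), the initial graph is a single edge between two nodes, the two endpoints of an edge event are drawn independently (so self-loops may occur), and the function names `attachment_function`, `node_event_probability` and `grow` are hypothetical. The example uses the geometric law, for which $f$ should be close to the constant $(1-q)/q$ and $p$ close to $2q$.

```python
import random

def attachment_function(P):
    """Equation 1 on a truncated law: P[i-1] holds P(i) for i = 1..K."""
    f, tail = [0.0] * len(P), 0.0
    for idx in range(len(P) - 1, -1, -1):   # walk from degree K down to 1
        f[idx] = tail / P[idx]              # tail = sum of P(k) for k > i
        tail += P[idx]
    return f

def node_event_probability(P):
    """Condition 1: p = 2 / (mean degree of P)."""
    return 2.0 / sum(i * P_i for i, P_i in enumerate(P, start=1))

def grow(f, p, steps, seed=0):
    """Run the growth model for `steps` time steps; return the node degrees."""
    rng = random.Random(seed)

    def weight(d):
        # Degrees beyond the truncation point K get weight 0 (assumption).
        return f[d - 1] if d <= len(f) else 0.0

    deg = [1, 1]                            # assumed G0: one edge, two nodes
    w = [weight(1), weight(1)]

    def add_half_edge(v):
        deg[v] += 1
        w[v] = weight(deg[v])

    for _ in range(steps):
        if rng.random() < p:                # node event
            v = rng.choices(range(len(deg)), weights=w)[0]
            deg.append(1)
            w.append(weight(1))
            add_half_edge(v)
        else:                               # edge event (u == v is allowed here)
            u, v = rng.choices(range(len(deg)), weights=w, k=2)
            add_half_edge(u)
            add_half_edge(v)
    return deg

# Example: geometric law with q = 0.4, truncated at degree K = 200.
q, K = 0.4, 200
P = [q * (1 - q) ** (i - 1) for i in range(1, K + 1)]
f = attachment_function(P)
p = node_event_probability(P)
degrees = grow(f, p, steps=4000)
frac_deg1 = degrees.count(1) / len(degrees)
print(f[0], p, frac_deg1)
```

The weighted draw with `random.choices` costs time linear in the number of nodes at every step, which is enough for a small illustration; a large-scale run would need a faster sampling structure.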
4.1 Generalized Chung-Lu Model

As a first example, by taking the DD of the generalized Chung-Lu model, which is a power law for high degrees, we should recover the linear attachment function of this model.

In the generalized Chung-Lu model, we can show that the real DD is not an exact power law but a ratio of Gamma functions, equivalent to a power law for high degrees, of the form:

$$\forall i \ge 1, \quad P(i) = P(1)\frac{\Gamma(i+b)}{\Gamma(i+b+\alpha)} \underset{i \gg 1}{\sim} i^{-\alpha}$$

where $P(1) = (\alpha-1)\frac{\Gamma(b+\alpha)}{\Gamma(b+1)}$, and $\alpha > 2$. The choice of $\alpha$ determines the slope of the DD, while the choice of $b$ determines the mean degree of the graph.

Constraint on p: Condition 1 gives:

$$\frac{2}{p} = \sum_{k=1}^{\infty} k P(k) = (\alpha-1)\frac{\Gamma(b+\alpha)}{\Gamma(b+1)} \times \frac{\alpha^2 + \alpha(2b-1) + b(b-1)}{(\alpha-2)(\alpha-1)}\,\frac{\Gamma(b+1)}{\Gamma(\alpha+b+1)} = \frac{\alpha^2 + \alpha(2b-1) + b(b-1)}{(\alpha-2)(\alpha+b)} = \frac{(\alpha+b-1)(\alpha+b)}{(\alpha-2)(\alpha+b)}$$

$$\implies p = \frac{2(\alpha-2)}{\alpha+b-1} \tag{4}$$

Attachment function f: Using Theorem 1:

$$f(i) = \frac{1}{P(i)} \sum_{k \ge i+1} P(k) = \frac{\Gamma(i+b+\alpha)}{\Gamma(i+b)}\,\frac{\Gamma(i+b+1)}{(\alpha-1)\,\Gamma(i+\alpha+b)}$$

$$\implies f(i) = \frac{1}{\alpha-1}\,i + \frac{b}{\alpha-1} \tag{5}$$

We recover a linear attachment function, as expected. To build a graph with a DD of slope $\alpha$ and mean degree $\frac{2}{p}$, one only has to choose $\alpha$ as the wanted slope and $b$ following Equation 4. In the particular case $b = 0$, we recover the Chung-Lu model of [6], with a slope of $\alpha = 2 + \frac{p}{2-p}$, as expected.

4.2 Broken Power-Law

We now study the case of a broken power-law, which corresponds to the DD of several real-world complex networks, as discussed in Section 2, and which is the case we were initially interested in. We consider a distribution of the form:

$$P(i) = \begin{cases} C\,\dfrac{\Gamma(i+b_1)}{\Gamma(i+b_1+\alpha_1)} & \text{if } i \le d \\[2ex] C\gamma\,\dfrac{\Gamma(i+b_2)}{\Gamma(i+b_2+\alpha_2)} & \text{if } i > d \end{cases}$$

where $d$, $b_1$, $\alpha_1$, $b_2$, and $\alpha_2$ are parameters of our distribution such that $\alpha_1 > 2$ and $\alpha_2 > 2$, $C$ is a normalisation constant, and $\gamma$ is chosen in order to obtain continuity at $i = d$. As seen in Section 4.1, the ratio of Gamma functions is close to a power law as soon as $i$ gets large. Hence, this distribution corresponds to two power laws with different slopes, and a switch between the two at the value $d$.

We can easily find the continuity constant $\gamma$, since it satisfies:

$$\frac{\Gamma(d+b_1)}{\Gamma(d+b_1+\alpha_1)} = \gamma\,\frac{\Gamma(d+b_2)}{\Gamma(d+b_2+\alpha_2)} \implies \gamma = \frac{\Gamma(d+b_1)\,\Gamma(d+b_2+\alpha_2)}{\Gamma(d+b_1+\alpha_1)\,\Gamma(d+b_2)}. \tag{6}$$

Constraints on C and p: The value of $C$ can be computed by summing over all degrees:

$$\frac{1}{C} = \sum_{k=1}^{d} \frac{\Gamma(k+b_1)}{\Gamma(k+b_1+\alpha_1)} + \gamma \sum_{k=d+1}^{\infty} \frac{\Gamma(k+b_2)}{\Gamma(k+b_2+\alpha_2)} = \frac{1}{\alpha_1-1}\frac{\Gamma(b_1+1)}{\Gamma(\alpha_1+b_1)} - \frac{1}{\alpha_1-1}\frac{\Gamma(b_1+1+d)}{\Gamma(\alpha_1+b_1+d)} + \frac{\gamma}{\alpha_2-1}\frac{\Gamma(b_2+d+1)}{\Gamma(\alpha_2+b_2+d)}$$

$$C = \left( \frac{1}{\alpha_1-1}\frac{\Gamma(b_1+1)}{\Gamma(\alpha_1+b_1)} + \frac{\Gamma(b_1+d)}{\Gamma(\alpha_1+b_1+d)} \left( \frac{b_2+d}{\alpha_2-1} - \frac{b_1+d}{\alpha_1-1} \right) \right)^{-1} \tag{7}$$

Using Condition 1, $p$ is defined by the following equation:

$$\begin{aligned}
\frac{2}{pC} &= \sum_{k=1}^{d} k\,\frac{\Gamma(k+b_1)}{\Gamma(k+b_1+\alpha_1)} + \gamma \sum_{k=d+1}^{\infty} k\,\frac{\Gamma(k+b_2)}{\Gamma(k+b_2+\alpha_2)} \\
&= \frac{\alpha_1^2 + \alpha_1(2b_1-1) + b_1(b_1-1)}{(\alpha_1-2)(\alpha_1-1)}\,\frac{\Gamma(b_1+1)}{\Gamma(\alpha_1+b_1+1)} \\
&\quad - \frac{\alpha_1^2(d+1) + \alpha_1\left(b_1(d+2) + d^2 - 1\right) + b_1(b_1-1) - d(d+1)}{(\alpha_1-2)(\alpha_1-1)}\,\frac{\Gamma(b_1+d+1)}{\Gamma(\alpha_1+b_1+d+1)} \\
&\quad + \gamma\,\frac{\alpha_2^2(d+1) + \alpha_2\left(b_2(d+2) + d^2 - 1\right) + b_2(b_2-1) - d(d+1)}{(\alpha_2-2)(\alpha_2-1)}\,\frac{\Gamma(b_2+d+1)}{\Gamma(\alpha_2+b_2+d+1)}
\end{aligned} \tag{8}$$

Attachment function f: For the computation of the attachment function, we have to distinguish two cases:

Case 1: $i \ge d$

$$f(i) = \frac{1}{P(i)} \sum_{k=i+1}^{\infty} P(k) = \frac{\Gamma(i+b_2+\alpha_2)}{\Gamma(i+b_2)} \sum_{k=i+1}^{\infty} \frac{\Gamma(k+b_2)}{\Gamma(k+b_2+\alpha_2)} = \frac{\Gamma(i+b_2+\alpha_2)}{\Gamma(i+b_2)}\,\frac{1}{\alpha_2-1}\,\frac{\Gamma(i+b_2+1)}{\Gamma(i+b_2+\alpha_2)}$$

$$f(i) = \frac{1}{\alpha_2-1}\,i + \frac{b_2}{\alpha_2-1} \tag{9}$$

We find a linear attachment function: indeed, for $i > d$, we only take into account the second power law, hence we expect to find the same result as in Section 4.1.

Case 2: $i < d$

$$\begin{aligned}
f(i) &= \frac{\Gamma(i+b_1+\alpha_1)}{\Gamma(i+b_1)} \left( \sum_{k=i+1}^{d} \frac{\Gamma(k+b_1)}{\Gamma(k+b_1+\alpha_1)} + \gamma \sum_{k=d+1}^{\infty} \frac{\Gamma(k+b_2)}{\Gamma(k+b_2+\alpha_2)} \right) \\
&= \frac{\Gamma(i+b_1+\alpha_1)}{\Gamma(i+b_1)} \left( \frac{1}{\alpha_1-1}\left( \frac{\Gamma(i+b_1+1)}{\Gamma(i+\alpha_1+b_1)} - \frac{\Gamma(b_1+d+1)}{\Gamma(b_1+\alpha_1+d)} \right) + \frac{\gamma}{\alpha_2-1}\,\frac{\Gamma(b_2+d+1)}{\Gamma(b_2+\alpha_2+d)} \right) \\
&= \frac{i+b_1}{\alpha_1-1} + \frac{\Gamma(i+b_1+\alpha_1)}{\Gamma(i+b_1)} \left( \frac{d+b_2}{\alpha_2-1}\,\frac{\Gamma(b_1+d)}{\Gamma(b_1+\alpha_1+d)} - \frac{1}{\alpha_1-1}\,\frac{\Gamma(b_1+d+1)}{\Gamma(b_1+\alpha_1+d)} \right)
\end{aligned}$$

$$f(i) = \frac{i+b_1}{\alpha_1-1} + \frac{\Gamma(i+b_1+\alpha_1)\,\Gamma(d+b_1)}{\Gamma(i+b_1)\,\Gamma(d+b_1+\alpha_1)} \left( \frac{b_2+d}{\alpha_2-1} - \frac{b_1+d}{\alpha_1-1} \right) \tag{10}$$

In this second case, we have a linear part, in addition to a more complicated part. Note that, for $(\alpha_1, b_1) = (\alpha_2, b_2)$, i.e., when the two power laws are equal, this second term vanishes, leaving as expected only the linear part. Figure 2a shows the shape of $f$. We see that, while the second part is linear as discussed before, the first part is sub-linear.

We used this attachment function to build a network using our model. The DD is shown in Figure 2b: we see that we built a random network with a broken power-law distribution, as wanted.

[Figure 2, panels: (a) Theoretical attachment function $f$ (vertical axis: $f(i)$). (b) DD of a random network.]

Fig. 2: Theoretical attachment function $f$ and degree distribution of a random network for the broken power-law distribution. Parameters are $N = 5 \cdot$ […], $b_1 = b_2 = 1$, $\alpha_1 = 2.$[…], $\alpha_2 = 4$ and $d = 100$.

4.3 Exact Power-Law

The DD obtained with the Chung-Lu model, like that of most other classical models, is a power law only for high degrees.
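The statement that the Gamma-function ratio of the generalized Chung-Lu DD behaves as a power law only for high degrees can be checked numerically. The short sketch below compares $\Gamma(i+b)/\Gamma(i+b+\alpha)$ with $i^{-\alpha}$; the values $\alpha = 2.5$ and $b = 1$ are arbitrary illustrative choices, and the helper name `gamma_ratio` is hypothetical.

```python
import math

def gamma_ratio(i, b, alpha):
    """Gamma(i + b) / Gamma(i + b + alpha), via log-gamma to avoid overflow."""
    return math.exp(math.lgamma(i + b) - math.lgamma(i + b + alpha))

alpha, b = 2.5, 1.0                      # illustrative parameters (assumption)
ratios = {}
for i in (1, 10, 100, 1000, 10000):
    # A value close to 1 means the DD is close to the pure power law i**(-alpha).
    ratios[i] = gamma_ratio(i, b, alpha) / i ** (-alpha)
    print(i, ratios[i])
```

The printed ratio is far from 1 for small degrees and approaches 1 as $i$ grows, which is the sense in which the distribution is a power law for high degrees only.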
We can ask ourselves what would be the attachment function associated with an exact power-law degree distribution of the form $P(i) = \frac{i^{-\alpha}}{\zeta(\alpha)}$, where $\zeta(s) = \sum_{k \ge 1} \frac{1}{k^s}$ is the Riemann zeta function.

Constraint on p: Condition 1 gives the following equation:

$$\frac{2}{p} = \frac{1}{\zeta(\alpha)} \sum_{k=1}^{\infty} k^{1-\alpha} = \frac{\zeta(\alpha-1)}{\zeta(\alpha)} \implies p = \frac{2\zeta(\alpha)}{\zeta(\alpha-1)} \tag{11}$$

Since we want $0 < p \le 1$, it implies $\frac{\zeta(\alpha)}{\zeta(\alpha-1)} \le \frac{1}{2} \implies 2 < \alpha < 2.48$ (approximately).

Attachment function: Theorem 1 gives immediately:

$$f(i) = \frac{1}{P(i)} \sum_{k=i+1}^{\infty} P(k) = \frac{\zeta(\alpha, i+1)}{i^{-\alpha}}$$

where $\zeta(s, q)$ is the Hurwitz zeta function.

4.4 Geometric Law

We now study the geometric distribution: $\forall i \ge 1$, $P(i) = q(1-q)^{i-1}$.

Constraint on p: Condition 1 gives:

$$\frac{2}{p} = \sum_{k \ge 1} k\,q(1-q)^{k-1} = \frac{q}{1-q}\,\frac{1-q}{q^2} = \frac{1}{q} \implies p = 2q \tag{12}$$

Here again, to satisfy $0 < p \le 1$, $q$ has to satisfy $q \le \frac{1}{2}$.

Attachment function: The attachment function is easy to compute:

$$f(i) = \frac{1}{q(1-q)^{i-1}} \sum_{k \ge i+1} q(1-q)^{k-1} = \frac{1}{(1-q)^{i}}\,\frac{(1-q)^{i+1}}{q} = \frac{1-q}{q} \tag{13}$$

4.5 Poisson Law

Another classic homogeneous law is the Poisson distribution:

$$\forall i \ge 1, \quad P(i) = \frac{1}{e^{\lambda}-1}\,\frac{\lambda^i}{i!}$$

The constant $e^{\lambda} - 1$ has been chosen such that $\sum_{k \ge 1} P(k) = 1$.
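As a quick numerical sanity check of this zero-truncated Poisson law, the sketch below builds $P(i)$ for $i = 1..K$ and verifies that it sums to 1. The values $\lambda = 3$ and $K = 60$ are illustrative assumptions (the mass beyond $K$ is negligible for this $\lambda$), and `truncated_poisson` is a hypothetical helper name. The mean is also computed and compared with $\lambda e^{\lambda}/(e^{\lambda}-1)$, the standard mean of a zero-truncated Poisson law.

```python
import math

def truncated_poisson(lam, K):
    """P(i) = lam**i / (i! * (e**lam - 1)) for i = 1..K; P[i-1] holds P(i)."""
    norm = math.expm1(lam)               # e**lam - 1
    return [math.exp(i * math.log(lam) - math.lgamma(i + 1)) / norm
            for i in range(1, K + 1)]

lam, K = 3.0, 60                         # illustrative values (assumption)
P = truncated_poisson(lam, K)
total = sum(P)                           # should be 1 up to rounding
mean = sum(i * P_i for i, P_i in enumerate(P, start=1))
expected_mean = lam * math.exp(lam) / math.expm1(lam)
print(total, mean, expected_mean)
```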
Constraint on p: Condition 1 gives:

$$\frac{2}{p} = \frac{1}{e^{\lambda}-1} \sum_{k \ge 1} k\,\frac{\lambda^k}{k!} = \frac{\lambda e^{\lambda}}{e^{\lambda}-1} \implies p = \frac{2(1-e^{-\lambda})}{\lambda} \tag{14}$$

Attachment function: Theorem 1 gives:

$$f(i) = \frac{i!}{\lambda^i} \sum_{k \ge i+1} \frac{\lambda^k}{k!} = \frac{i!}{\lambda^i} \times \frac{e^{\lambda}\left(i! - \Gamma(i+1, \lambda)\right)}{i!} = \frac{e^{\lambda}\,\gamma(i+1, \lambda)}{\lambda^i} \tag{15}$$

where $\Gamma(a, x)$ is the upper incomplete Gamma function and $\gamma(a, x) = \int_{0}^{x} t^{a-1} e^{-t}\,dt$ is the lower incomplete Gamma function.

5 Application to a Real-World Degree Distribution

The model can also be applied to an empirical DD. Indeed, we observe in Theorem 1 that $f(i)$ only depends on the values of $P$, which can be arbitrary, that is, not following any classical function. This is a good way to model random networks with an atypical DD. As an example, we apply our model to the DD of an undirected version of Twitter, which shows atypical behavior due to Twitter's policies. We start with a presentation of this DD, then apply our model to build a random graph with this distribution.

5.1 Undirected DD of Twitter

For this study, we use a snapshot of Twitter from 2012, recovered by Gabielkov and Legout [10] and made available by the authors. This network contains 505 million nodes and 23 billion edges, making it one of the biggest social graphs available nowadays. Each node corresponds to an account, and an arc $(u, v)$ exists if the account $u$ follows the account $v$. The in- and out-DDs are presented in [21].

In our case, we look at an undirected version of the Twitter snapshot. We consider the degree of each node as being the sum of its in- and out-degrees. The distribution of this undirected graph is presented in Figure 3a. We notice two spikes, around $d = 20$ and $d = 2000$. We do not know the reason for the first one (which could be social, or due to the recommendation system). The second spike is explained by a specificity of Twitter: until 2015, to avoid bots which were following a very large number of users, Twitter limited the number of possible followings to max(2000, number of followers). In other words, a user is allowed to follow more than 2000 people only if they are also followed by more than 2000 people. This leads to a lot of accounts with around 2000 followings. This highlights the fact that some networks have their own specificities, sometimes due to internal policies, which cannot be modeled except by a model specifically built for them.

[Figure 3, panels: (b) DD of a random network with $8 \cdot$ […] nodes using the attachment function of Figure 3c (axes: degree; nodes with this degree; curve: modelized DD). (c) Attachment function $f$ resulting from the undirected DD of Twitter (axes: degree $i$; $f(i)$).]

Fig. 3: Modelization of the undirected Twitter graph.

5.2 Modelization

Figure 3c presents the attachment function $f$ computed using Equation 1 with the DD of Twitter. We notice that the function is mainly increasing, showing that nodes of higher degree have a higher chance to connect with new nodes, like in classical preferential attachment models. We also notice two drops, around 20 and 2000. They are associated with the spikes of the DD at the same degrees: to increase the number of nodes with those degrees, the attachment function has to be smaller, so that nodes with these degrees have less chance to gain new edges.

We finally use our model with the empirical attachment function of Figure 3c. Note that, in an empirical study, $P$ can be equal to zero for some degrees, when no node has this degree in the network. In Twitter, the smallest of those degrees occurs around 18 […]. For such degrees, $f$ cannot be computed. To get around this difficulty, we interpolate the missing values of $P$, using the closest smaller and closest bigger degrees for which $P$ is known. Since we observe the probability distribution on a log-log scale, we interpolate between the two points as a straight line on a log-log scale, i.e., as a power-law function. We believe this is a fair choice since we only look at the tail of the distribution, which looks like a straight line, and since we interpolate between each pair of closest points only, instead of fitting on the whole tail of the distribution.
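The interpolation step described above can be sketched as follows. This is a minimal illustration, not the authors' code: the empirical distribution is assumed to be given as a dictionary from degree to probability, the function name `fill_missing_loglog` is hypothetical, and the sketch does not renormalise the result (the text does not say whether or how this is done; note that Equation 1 is unchanged when $P$ is multiplied by a constant).

```python
import math

def fill_missing_loglog(P):
    """Fill degrees with P = 0 (or absent) lying between two observed degrees.

    Each gap is filled with a straight line in log-log scale, i.e. a power law,
    drawn between the closest smaller and closest bigger observed degrees.
    """
    observed = sorted(d for d, value in P.items() if value > 0)
    filled = {}
    for low, high in zip(observed, observed[1:]):
        filled[low] = P[low]
        # Slope of the segment joining (low, P[low]) and (high, P[high]) in log-log.
        slope = (math.log(P[high]) - math.log(P[low])) / (math.log(high) - math.log(low))
        for d in range(low + 1, high):
            filled[d] = P[low] * (d / low) ** slope
    filled[observed[-1]] = P[observed[-1]]
    return filled

# Toy example with unnormalised weights following d**-2, with gaps at 3 and 5..9.
toy = {1: 1.0, 2: 0.25, 3: 0.0, 4: 0.0625, 10: 0.01}
filled = fill_missing_loglog(toy)
print(filled[3], filled[5])
```

Because the toy values lie exactly on the power law $d^{-2}$, the filled values land back on that curve, which makes the behaviour of the interpolation easy to check by hand.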
The DD of a random network built with our model is presented in Figure 3b. For computation time reasons, the built network only has $N = 2 \cdot$ […] nodes, to be compared to the $5 \cdot 10^8$ nodes of Twitter. However, it is enough to verify that the shape of its DD follows that of the real Twitter DD: in particular, we recognize the spikes around $d = 20$ and $d = 2000$.

6 Conclusion

In this paper, we proposed a new random growth model picking the nodes to be connected together in the graph with a flexible probability $f$. We expressed this $f$ as a function of any distribution $P$, leading to the possibility to build a random network with any wanted degree distribution. We computed $f$ for some classical distributions, as well as for a snapshot of Twitter of 505 million nodes and 23 billion edges. We believe this model is useful for anyone studying networks with atypical degree distributions, regardless of the domain. Although the presented model is undirected, we believe a directed version of it, based on the Bollobás et al. model [4], can easily be derived from the presented model.

References
1. Yong-Yeol Ahn, Seungyeop Han, Haewoon Kwak, Sue Moon, and Hawoong Jeong. Analysis of topological characteristics of huge online social networking services. In Proceedings of the 16th int. conference on World Wide Web, pages 835–844, 2007.
2. Réka Albert and Albert-László Barabási. Statistical mechanics of complex networks. Reviews of modern physics, 74(1):47, 2002.
3. Béla Bollobás. A probabilistic proof of an asymptotic formula for the number of labelled regular graphs. European Journal of Combinatorics, 1(4):311–316, 1980.
4. Béla Bollobás, Christian Borgs, Jennifer T Chayes, and Oliver Riordan. Directed scale-free graphs. In SODA, volume 3, pages 132–139, 2003.
5. Meeyoung Cha, Alan Mislove, and Krishna P Gummadi. A measurement-driven analysis of information propagation in the flickr social network. In Proceedings of the 18th international conference on World wide web, pages 721–730, 2009.
6. Fan Chung and Linyuan Lu. Complex graphs and networks. American Mathematical Soc., 2006.
7. Aaron Clauset, Cosma Rohilla Shalizi, and Mark EJ Newman. Power-law distributions in empirical data. SIAM review, 51(4):661–703, 2009.
8. Paul Erdős and Alfréd Rényi. On the evolution of random graphs. Publ. Math. Inst. Hung. Acad. Sci, 5(1):17–60, 1960.
9. Emilio Ferrara, Roberto Interdonato, and Andrea Tagarelli. Online popularity and topical interests through the lens of instagram. In Proceedings of the 25th ACM conference on Hypertext and social media, pages 24–34, 2014.
10. Maksym Gabielkov and Arnaud Legout. The complete picture of the twitter social graph. In Proc. on CoNEXT student workshop, pages 19–20. ACM, 2012.
11. Gourab Ghoshal and MEJ Newman. Growing distributed networks with arbitrary degree distributions. The European Physical Journal B, 58(2):175–184, 2007.
12. Minas Gjoka, Maciej Kurant, Carter T Butts, and Athina Markopoulou. Walking in facebook: A case study of unbiased sampling of osns. In 2010 Proceedings IEEE INFOCOM, pages 1–9. IEEE, 2010.
13. Jure Leskovec and Eric Horvitz. Planetary-scale views on a large instant-messaging network. In Proceedings of the 17th international conference on World Wide Web, pages 915–924, 2008.
14. Gipsi Lima-Mendez and Jacques van Helden. The powerful law of the power law and other myths in network biology. Molecular BioSystems, 5(12):1482–1493, 2009.
15. Seth A Myers, Aneesh Sharma, Pankaj Gupta, and Jimmy Lin. Information network or social network?: the structure of the twitter follow graph. In Proceedings of the 23rd Int. Conference on World Wide Web, pages 493–498. ACM, 2014.
16. Mark EJ Newman, Steven H Strogatz, and Duncan J Watts. Random graphs with arbitrary degree distributions and their applications. Physical review E, 64(2):026118, 2001.
17. Nataša Pržulj. Biological network comparison using graphlet degree distribution. Bioinformatics, 23(2):e177–e183, 2007.
18. Arnaud Sallaberry, Faraz Zaidi, and Guy Melançon. Model for generating artificial social networks having community structures with small-world and scale-free properties. Social Network Analysis and Mining, 3(3):597–609, 2013.
19. Mukund Seshadri, Sridhar Machiraju, Ashwin Sridharan, Jean Bolot, Christos Faloutsos, and Jure Leskovec. Mobile call graphs: beyond power-law and lognormal distributions. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 596–604, 2008.
20. Andrew T Stephen and Olivier Toubia. Explaining the power-law degree distribution in a social commerce network. Social Networks, 31(4):262–270, 2009.
21. Thibaud Trolliet, Nathann Cohen, Frédéric Giroire, Luc Hogie, and Stéphane Pérennes. Interest clustering coefficient: a new metric for directed networks like twitter. arXiv preprint arXiv:2008.00517, 2020.
22. Johan Ugander, Brian Karrer, Lars Backstrom, and Cameron Marlow. The anatomy of the facebook social graph. arXiv preprint arXiv:1111.4503, 2011.