Directed Hybrid Random Networks Mixing Preferential Attachment with Uniform Attachment Mechanisms
Tiandong Wang and Panpan Zhang
January 19, 2021
Abstract.
Motivated by the complexity of network data, we propose a directed hybrid random network that mixes preferential attachment (PA) rules with uniform attachment (UA) rules. When a new edge is created, it follows the PA rule with probability p ∈ [0, 1] and the UA rule otherwise. Compared with pure PA networks, the hybrid random network provides a better fit to real-world scenarios, where lighter tails of in- and out-degrees are observed.

AMS subject classifications.
Primary: 05C80, 62G32; Secondary: 05C07
Key words.
Preferential attachment; uniform attachment; in- and out-degrees; power laws; random networks

1. Introduction
The preferential attachment (PA) mechanism [4] has been widely used to model interactions or communications among the entities in a network-based system, especially evolving networks. A precursory study of PA networks was conducted by de Solla Price [29] to model the growth of citation networks, where the research outcome coincides with a sociological theory called the
Matthew Effect [20], inducing a well-known economic manifestation: "The rich get richer; the poor get poorer."

One of the most appealing properties of the PA network is that it is scale-free (i.e., the node degree distribution follows a power law), rendering the PA rule an attractive choice for modeling real networks such as the World Wide Web [15] and collaboration networks [23]. We refer the readers to Durrett [10] and van der Hofstad [30] for text-style elaborations of the elementary descriptions and probabilistic properties of PA networks. Recent studies have extended classical PA networks to directed counterparts, where some asymptotic theories [34, 32, 33] and the maximum likelihood estimators (MLEs) of the parameters [31] have been developed. Other recent works on the mathematical treatments of PA networks and their variants include Gao and van der Vaart [11], Alves et al. [1], Mahmoud [18], Wang and Resnick [35], and Zhang and Mahmoud [36].

However, classical (either directed or undirected) PA networks do not always fit real network data well, nor are they able to precisely capture some key attributes of the networks. Alternatively, Atalay et al. [2] proposed a model mixing PA and uniform attachment (UA) to investigate the buyer-supplier network in the United States, showing that the proposed model outperforms the pure PA model. In this paper, we consider a class of directed hybrid random networks (HRNs) presenting PA and UA mechanisms simultaneously, governed by a tuning parameter p ∈ [0, 1].

Department of Statistics, Texas A&M University, College Station, TX 77843, U.S.A. Email: [email protected]
Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA 19104, U.S.A. Email: [email protected]

Note that by [8], an undirected sublinear PA model with attachment probability proportional to (Degree_v)^α, α ∈ (0, 1), yields a degree distribution whose tail decays at a stretched exponential rate of order exp(−c k^{1−α}) for k large, which is lighter than a power-law tail.
With a stretched exponential tail, widely adopted methods, e.g. [6], may provide inaccurate tail estimates, since the underlying distributional tail deviates from power laws; see [9] for a detailed discussion. Therefore, in this paper, we restrict ourselves to the mixture of PA and UA schemes so as to obtain a lighter tail of the degree distribution while retaining the asymptotic power-law behavior.

In the literature, there is a limited amount of work on random structures that integrate PA and UA during the evolution. Shao et al. [26] carried out a simulation study of the degree distribution in a standard mixed attachment growing network. More recently, [24] investigated the scale-free property of the degree distribution in an analogous model through recursive formulations.

We describe the construction of a hybrid random network in Section 2, and study theoretical properties of its degree distributions in Section 3. We then propose estimation methods and explore properties of the estimators in Section 4, which facilitate the numerical studies on both synthetic datasets (cf. Section 5) and real network data (cf. Section 6). With all results available, we also provide some interesting directions for future research in Section 7.

2. Hybrid Random Networks
Let H_n(V_n, E_n; α, β, γ, p, δ_in, δ_out) denote the structure of a class of HRNs consisting of a vertex set V_n and an edge set E_n at time n, parameterized by a set of parameters θ := (α, β, γ, p, δ_in, δ_out) subject to α + β + γ = 1 and δ_in, δ_out > 0. Specifically, the parameters α, β and γ represent the probabilities of presenting one of the three edge-creation scenarios at each step. With probability α, there emerges a directed edge from the newcomer to an existing node. With probability γ, there emerges a directed edge from an existing node to the newcomer. With probability β = 1 − α − γ, a directed edge is added between two existing nodes. See Figure 1 for a graphical illustration.

Figure 1. Three edge-addition scenarios respectively corresponding to α, β and γ (from left to right).

The offset parameters δ_in and δ_out respectively control the growth rates of in-degree and out-degree in the network. Another parameter 0 ≤ p ≤ 1 balances PA and UA in the model: each attachment step follows the PA rule with probability p and the UA rule with probability 1 − p. Accordingly, the proposed HRN is more flexible than the pure PA network model for characterizing the in-degree and out-degree tail distributions of real network data.

We start the network with H_0, which is a single node labeled 1 carrying a self-loop. At any subsequent point n ≥
1, flip a three-sided coin, for which the probabilities of landing the three faces up are respectively α (associated with scenario 1), β (associated with scenario 2) and γ (associated with scenario 3). Let J_n ∈ {1, 2, 3} indicate the occurrence of the scenario type at time n, i.e., J_n is a trinomial random variable on {1, 2, 3} with cell probabilities α, β and γ, respectively. The network evolves as below.

(1) For J_n = 1, we add a new node u to the network, connecting it to an existing node i ∈ V_{n−1} by a directed edge pointing to i with probability

    p × (D^in_i(n−1) + δ_in) / Σ_{k∈V_{n−1}} (D^in_k(n−1) + δ_in) + (1 − p) × 1/|V_{n−1}|,    (1)

where D^in_i(n) is the in-degree of i in H_n, and |V_n| denotes the number of nodes at time n.

(2) For J_n = 2, we add a directed edge between two existing nodes i, j ∈ V_{n−1}, where i and j are sampled independently. Suppose that the newly added edge points from j to i; then the associated probability is given by

    ( p × (D^out_j(n−1) + δ_out) / Σ_{k∈V_{n−1}} (D^out_k(n−1) + δ_out) + (1 − p) × 1/|V_{n−1}| )
    × ( p × (D^in_i(n−1) + δ_in) / Σ_{k∈V_{n−1}} (D^in_k(n−1) + δ_in) + (1 − p) × 1/|V_{n−1}| ),    (2)

where D^out_i(n) is analogously defined as the out-degree of node i in H_n. Note that no new node is added to the network under this scenario, hence V_n = V_{n−1}. Besides, there is a positive probability that a node is sampled twice; if so, a self-loop is created.

(3) For J_n = 3, a new node u is appended to the network by a directed edge emanating from j ∈ V_{n−1} with probability

    p × (D^out_j(n−1) + δ_out) / Σ_{k∈V_{n−1}} (D^out_k(n−1) + δ_out) + (1 − p) × 1/|V_{n−1}|.    (3)

Some simplifications can be made to the conditional probabilities in Equations (1), (2) and (3) after observing

    Σ_{k∈V_{n−1}} (D^in_k(n−1) + δ_in) = n + δ_in|V_{n−1}|  and  Σ_{k∈V_{n−1}} (D^out_k(n−1) + δ_out) = n + δ_out|V_{n−1}|

(since our initial time is n = 0). Meanwhile, the fact that the two fractions have different denominators in each of the conditional probabilities brings a great deal of challenges to both analytical computations and parameter estimation.

3. Degree Distribution
In this section, we investigate the in-degree and out-degree distributions of H_n. Let F_n be the sigma field generated by the evolution of a hybrid random network up to time n, i.e., by {H_k : k ≤ n}. According to the evolutionary scenarios described in Section 2, we have for i ∈ V_n,

    P(D^in_i(n+1) − D^in_i(n) = 1 | F_n) = (α + β) [ p(D^in_i(n) + δ_in) / Σ_{k∈V_n} (D^in_k(n) + δ_in) + (1 − p)/|V_n| ],    (4)

    P(D^out_i(n+1) − D^out_i(n) = 1 | F_n) = (β + γ) [ p(D^out_i(n) + δ_out) / Σ_{k∈V_n} (D^out_k(n) + δ_out) + (1 − p)/|V_n| ].    (5)

We present important theoretical results on the in- and out-degree sequences in an HRN. Relevant proofs of the theorems in this section are collected in Appendices A, B and C.

3.1. Expected In- and Out-Degrees.
The next theorem specifies the growth rates of E[D^in_i(n)] and E[D^out_i(n)] for a fixed node i ∈ V_n.

Theorem 1. There exist M_1, M_2 > 0 such that

    sup_{i≥1} E[D^in_i(n)]/n^{C_1} ≤ M_1  and  sup_{i≥1} E[D^out_i(n)]/n^{C_2} ≤ M_2,

where C_1 and C_2 are respectively given by

    C_1 = (α + β)p / (1 + δ_in(1 − β))  and  C_2 = (β + γ)p / (1 + δ_out(1 − β)).

It is worth noting that the growth rates C_1 and C_2 are smaller than those in a pure directed PA model (i.e., p = 1). This suggests that incorporating a non-negligible number of uniformly added edges creates lighter distributional tails for both in- and out-degrees.

3.2. Almost Sure Convergence of the In- and Out-Degrees.
Next, we study the asymptotic properties of D^in_i(n) and D^out_i(n) by utilizing martingale formulations [10, Chapter 4]. By the conditional probability in Equation (4), we have for i ∈ V_n,

    P(D^in_i(n+1) − D^in_i(n) = 1 | F_n) ≥ p(α + β)(D^in_i(n) + δ_in) / (n + 1 + δ_in|V_n|),

which implies that for a fixed i,

    M^in_{n+1} := (D^in_i(S_i + n + 1) + δ_in) / ∏_{k=0}^{n} ( 1 + p(α + β)/(S_i + k + 1 + δ_in|V_{S_i+k}|) )    (6)

is a sub-martingale with respect to the filtration (F_n)_{n≥0}, where S_i := inf{n ≥ 0 : |V_n| = i}. Analogously, based on Equation (5), we construct another sub-martingale sequence for out-degrees:

    M^out_{n+1} := (D^out_i(S_i + n + 1) + δ_out) / ∏_{k=0}^{n} ( 1 + p(β + γ)/(S_i + k + 1 + δ_out|V_{S_i+k}|) ).    (7)

In the proof of Theorem 3, we specify the asymptotic order of the denominator in Equation (6) (a similar argument also applies to the denominator in Equation (7)). Then applying the martingale convergence theorem [10, Theorem 4.2.11] gives the following convergence results for the in- and out-degrees of a fixed node. Details of the proof of Theorem 2 are given in Appendix B.

Theorem 2. For a fixed node i, there exist finite random variables ζ_i and ξ_i such that as n → ∞,

    ( D^in_i(n)/n^{C_1}, D^out_i(n)/n^{C_2} ) a.s.→ (ζ_i, ξ_i),

where C_1 and C_2 are identical to those developed in Theorem 1.

3.3. Expected Degree Counts.
In addition, we develop the asymptotics for N^in_m(n)/n and N^out_m(n)/n, m ≥ 0, i.e., the empirical proportions of nodes with in- or out-degree m in H_n. Let NB(n, q) represent a negative binomial random variable with generating function (s + (1 − s)/q)^{−n}, s ∈ [0, 1].

Theorem 3. Define ˜δ_in := δ_in/p + (1 − p)/(p(1 − β)) and ˜δ_out := δ_out/p + (1 − p)/(p(1 − β)). Let T_in and T_out be two independent exponential random variables with rates (1 + δ_in(1 − β))/(p(α + β)) and (1 + δ_out(1 − β))/(p(β + γ)), respectively, and let NB(˜δ_in, e^{−T_in}), NB(1 + ˜δ_in, e^{−T_in}), NB(˜δ_out, e^{−T_out}) and NB(1 + ˜δ_out, e^{−T_out}) be four negative binomial random variables that are conditionally independent given (T_in, T_out). As n → ∞, we have that N^in_m(n)/n and N^out_m(n)/n converge in probability to ˜ψ^in_m and ˜ψ^out_m, respectively, where

    ˜ψ^in_m = α P( NB(˜δ_in, e^{−T_in}) = m ) + γ P( NB(1 + ˜δ_in, e^{−T_in}) = m ),    (8)

    ˜ψ^out_m = α P( NB(˜δ_out, e^{−T_out}) = m ) + γ P( NB(1 + ˜δ_out, e^{−T_out}) = m ).    (9)

We conclude this section by remarking that the limit functions in Equations (8) and (9) coincide with those from a pure PA network with parameters (α, β, γ, ˜δ_in, ˜δ_out). In fact, when β = 0, the HRN is identical to a pure PA network with (α, 0, γ, ˜δ_in, ˜δ_out), where all established results for the pure PA model can be readily applied. The major goal in the proof of Theorem 3 is to show that the discrepancy caused by having a random number of edges is negligible when n is large.

4. Parameter Estimation
In this section, we propose our estimation scheme for the parameters in the HRN model described in Section 2, under a few regularity conditions given as follows. We assume the evolution history of the entire network is available since the beginning, recorded in the edge list E := {E_k}_{k=0}^{n−1}, where E_0 = (1, 1) is deterministic. Notice that γ = 1 − (α + β) completely depends on α and β, so the model is parametrized in terms of (α, β, p, δ_in, δ_out). We also assume 0 ≤ p ≤ 1, 0 ≤ α, β < 1 and 0 < α + β ≤ 1, where the latter two jointly ensure the exclusion of the trivial cases of α, β or γ taking value 1. The offset parameters δ_in and δ_out are assumed to be positive and finite.

4.1. Maximum Likelihood Estimation.
In a slight abuse of notation, let E_k = (v_{k,1}, v_{k,2}) represent the edge (from v_{k,1} to v_{k,2}) added at time k, where v_{k,1} and v_{k,2} can be nodes from the existing network or newcomers. According to Equations (1), (2) and (3), the likelihood of the model is given by

    L(θ | E) = ∏_{k=1}^{n} [ α ( p(D^in_{v_{k,2}}(k−1) + δ_in)/(k + δ_in|V_{k−1}|) + (1 − p)/|V_{k−1}| ) ]^{I{J_k=1}}
        × ∏_{k=1}^{n} [ β ( p(D^out_{v_{k,1}}(k−1) + δ_out)/(k + δ_out|V_{k−1}|) + (1 − p)/|V_{k−1}| ) × ( p(D^in_{v_{k,2}}(k−1) + δ_in)/(k + δ_in|V_{k−1}|) + (1 − p)/|V_{k−1}| ) ]^{I{J_k=2}}
        × ∏_{k=1}^{n} [ (1 − α − β) ( p(D^out_{v_{k,1}}(k−1) + δ_out)/(k + δ_out|V_{k−1}|) + (1 − p)/|V_{k−1}| ) ]^{I{J_k=3}}.

Then the log-likelihood becomes

    log L(θ | E) = log α Σ_{k=1}^{n} I{J_k=1} + log β Σ_{k=1}^{n} I{J_k=2} + log(1 − α − β) Σ_{k=1}^{n} I{J_k=3}
        + Σ_{k=1}^{n} log[ (pD^in_{v_{k,2}}(k−1) + δ_in)|V_{k−1}| + (1 − p)k ] I{J_k ∈ {1,2}}
        + Σ_{k=1}^{n} log[ (pD^out_{v_{k,1}}(k−1) + δ_out)|V_{k−1}| + (1 − p)k ] I{J_k ∈ {2,3}}
        − Σ_{k=1}^{n} log[ k|V_{k−1}| + |V_{k−1}|^2 δ_in ] I{J_k ∈ {1,2}}
        − Σ_{k=1}^{n} log[ k|V_{k−1}| + |V_{k−1}|^2 δ_out ] I{J_k ∈ {2,3}},

from which we see that the score functions of α and β are independent of those of the other parameters. Carrying out an analogous analysis as in [31, Section 3.1], we find that the MLEs for α and β are respectively given by

    ˆα_MLE = (1/n) Σ_{k=1}^{n} I{J_k=1}  and  ˆβ_MLE = (1/n) Σ_{k=1}^{n} I{J_k=2}.

To develop the MLEs for δ_in, δ_out and p, we calculate their score functions to get

    ∂/∂δ_in log L(θ | E) = Σ_{k=1}^{n} |V_{k−1}| I{J_k ∈ {1,2}} / [ (pD^in_{v_{k,2}}(k−1) + δ_in)|V_{k−1}| + (1 − p)k ] − Σ_{k=1}^{n} |V_{k−1}| I{J_k ∈ {1,2}} / (k + |V_{k−1}|δ_in),    (10)

    ∂/∂δ_out log L(θ | E) = Σ_{k=1}^{n} |V_{k−1}| I{J_k ∈ {2,3}} / [ (pD^out_{v_{k,1}}(k−1) + δ_out)|V_{k−1}| + (1 − p)k ] − Σ_{k=1}^{n} |V_{k−1}| I{J_k ∈ {2,3}} / (k + |V_{k−1}|δ_out),    (11)

    ∂/∂p log L(θ | E) = Σ_{k=1}^{n} ( D^in_{v_{k,2}}(k−1)|V_{k−1}| − k ) I{J_k ∈ {1,2}} / [ (pD^in_{v_{k,2}}(k−1) + δ_in)|V_{k−1}| + (1 − p)k ]
        + Σ_{k=1}^{n} ( D^out_{v_{k,1}}(k−1)|V_{k−1}| − k ) I{J_k ∈ {2,3}} / [ (pD^out_{v_{k,1}}(k−1) + δ_out)|V_{k−1}| + (1 − p)k ].    (12)

We proceed by first setting (10) to 0. Note that due to the randomness of |V_{k−1}|, the methodology given in [31] is not directly applicable. Instead, we approximate the score function (10) as follows:

    Σ_{k=1}^{n} |V_{k−1}| I{J_k ∈ {1,2}} / [ (pD^in_{v_{k,2}}(k−1) + δ_in)|V_{k−1}| + (1 − p)k ]
        = Σ_{k=1}^{n} I{J_k ∈ {1,2}} / [ pD^in_{v_{k,2}}(k−1) + δ_in + (1 − p)k/|V_{k−1}| ]
        = (1/p) Σ_{k=1}^{n} I{J_k ∈ {1,2}} / [ D^in_{v_{k,2}}(k−1) + ˜δ_in ] + R_in(n),

where

    R_in(n) = (1/p) Σ_{k=1}^{n} I{J_k ∈ {1,2}} ( 1 / [ D^in_{v_{k,2}}(k−1) + δ_in/p + (1 − p)k/(p|V_{k−1}|) ] − 1 / [ D^in_{v_{k,2}}(k−1) + ˜δ_in ] ).

Therefore,

    |R_in(n)| ≤ (1/p) Σ_{k=1}^{n} ((1 − p)/p) | k/|V_{k−1}| − 1/(1 − β) | I{J_k ∈ {1,2}} / ( [ D^in_{v_{k,2}}(k−1) + δ_in/p + (1 − p)k/(p|V_{k−1}|) ] [ D^in_{v_{k,2}}(k−1) + ˜δ_in ] )
        ≤ ((1 − p)/p^2) Σ_{k=1}^{n} | k/|V_{k−1}| − 1/(1 − β) | / ( [ δ_in/p + (1 − p)k/(p|V_{k−1}|) ] ˜δ_in ).

Since k/|V_{k−1}| a.s.→ 1/(1 − β), by the Cesàro convergence of random variables we have |R_in(n)|/n a.s.→ 0. Then the approximate score equation in (10) becomes

    (1/(np)) Σ_{k=1}^{n} I{J_k ∈ {1,2}} / [ D^in_{v_{k,2}}(k−1) + ˜δ_in ] = (1/n) Σ_{k=1}^{n} |V_{k−1}| I{J_k ∈ {1,2}} / (k + |V_{k−1}|δ_in).

Applying the method in [31] further yields the following approximate score equation:

    Σ_{m=0}^{∞} ( N^in_{>m}(n)/n ) / (m + ˜δ_in) = γ/˜δ_in + (α + β)(1 − β) / (1 + ˜δ_in(1 − β)),    (13)

where N^in_{>m}(n) denotes the number of nodes with in-degree strictly greater than m in H_n. Similarly, the score equation with respect to (11) can be approximated by

    Σ_{m=0}^{∞} ( N^out_{>m}(n)/n ) / (m + ˜δ_out) = α/˜δ_out + (β + γ)(1 − β) / (1 + ˜δ_out(1 − β)),    (14)

with N^out_{>m}(n) being the number of nodes with out-degree strictly greater than m in H_n. However, with (13) and (14) available, the approximation to the third score equation in (12) leads to a deterministic solution of p = 1. This indicates that standard approaches to finding the MLE as in [31] are not able to give us the desirable results. Alternatively, in the next section, we formulate the MLE searching procedure as a nonlinear optimization problem with constraint p ∈ [0, 1].

4.2. Nonlinear optimization.
Having seen that standard ways of finding the MLE by solving score equations do not produce reasonable estimates, we consider the MLE-searching procedure as a nonlinear optimization problem with properly identified constraints such that α, β, p ∈ [0, 1] and δ_in, δ_out > 0. Specifically, we adopt the Nelder-Mead (N-M) algorithm proposed by Nelder and Mead [22]. The N-M algorithm is appealing for its efficiency, as it usually converges within a relatively small number of iterations. Despite limited knowledge about the theoretical properties of the N-M algorithm [16], its utilization is widespread in the community since it generally performs well in practice. One practical issue of the algorithm is that its convergence is quite sensitive to the choice of the initial simplex; an improper initial simplex is the main cause of algorithm breakdown.

With this in mind, we back up the N-M algorithm with an alternative: a Bayesian estimation based on Markov chain Monte Carlo (MCMC) algorithms. Specifically, we consider a Metropolis-Hastings (M-H) algorithm [21, 14]. Being a classical approach, the fundamentals of M-H algorithms have been extensively elaborated in a wide range of texts, such as Chen et al. [5], Liang et al. [17], and Gelman et al. [12]. Here we only present a few essential steps.

Let π(θ; ψ) be the prior distribution of θ, where ψ is a collection of hyper-parameters. Under this setting, the posterior distribution of θ is proportional to L(θ | E)π(θ; ψ). As it is difficult to sample θ from this posterior distribution directly, the M-H algorithm generates a Markov process whose stationary distribution is the same as the posterior distribution, and the generation of the process is done in an iterative manner. Let θ^(t) be the estimate from the t-th iteration, and let Q(θ_prop | θ^(t)) denote the proposal density governing the transition probability from the current estimate to a proposed set of candidates. Suppose that the proposal distribution Q is symmetric; then the acceptance rate a(θ_prop | θ^(t)) is given by

    a(θ_prop | θ^(t)) = min{ L(θ_prop | E) Q(θ^(t) | θ_prop) / [ L(θ^(t) | E) Q(θ_prop | θ^(t)) ], 1 } = min{ L(θ_prop | E)/L(θ^(t) | E), 1 }.

The accept-reject step can be implemented by generating a standard uniform random variable U and setting θ^(t+1) = θ_prop if U < a(θ_prop | θ^(t)), and θ^(t+1) = θ^(t) otherwise. There are multiple ways of selecting an appropriate proposal distribution Q(θ_prop | θ^(t)); here a simple approach based on a random walk is adopted. Consider θ_prop = θ^(t) + ε, where ε is a random variable for the step size. Note that the conditional distribution Q(θ_prop | θ^(t)) is fully

Table 1.
Simulation results with large p = 0.8.

                         Nelder-Mead Algorithm                      Metropolis-Hastings Algorithm
Parameters               ˆα      ˆβ      ˆp      ˆδ_in   ˆδ_out    ˆα      ˆβ      ˆp      ˆδ_in   ˆδ_out
α = 0.45, β = 0.1  Est.  0.4497  0.1003  0.8153  1.3608  0.7339    0.4494  0.1005  0.8305  1.4193  0.7714

Table 2.
Simulation results with moderate p = 0.6.

                         Nelder-Mead Algorithm                      Metropolis-Hastings Algorithm
Parameters               ˆα      ˆβ      ˆp      ˆδ_in   ˆδ_out    ˆα      ˆβ      ˆp      ˆδ_in   ˆδ_out
α = 0.45, β = 0.1  Est.  0.4504  0.1001  0.6227  1.4097  0.7746    0.4503  0.1002  0.6643  1.5867  0.9064

specified by the distribution of ε. Thus, we only need to choose a symmetric distribution for ε satisfying the required condition, such as a normal or uniform distribution. One drawback of MCMC algorithms is the lack of theoretical foundation for the assessment of convergence. Since the initial samples of θ from the prior distribution may fall into a low-density region of the target posterior distribution, a sufficiently large burn-in period is always necessary. The number of iterations needed for the algorithm to converge is closely related to its convergence rate [19], which is practically unwieldy in general. Here we rely on a few widely accepted graphical diagnostics to assess the convergence of MCMC algorithms, such as time-series plots and running mean plots [27].

5. Simulations
In this section, we carry out an extensive simulation study along with a sensitivity analysis for the estimation of the parameters of HRNs. We focus on the performance of the N-M and M-H algorithms under different combinations of (α, β, p). Specifically, we consider p ∈ {0.8, 0.6, 0.2} (respectively corresponding to dominant PA, roughly even PA and UA, and dominant UA), paired with three choices of (α, β), including (0.45, 0.1). The two offset parameters δ_in and δ_out are held fixed across all settings. For each parameter combination, we generate R = 100 replicates of independent HRNs with n = 10,000 edges. When applying the M-H algorithm, we use non-informative priors. The burn-in number is set to be 10,000, and the number of iterations after burn-in is 20,000. To avoid auto-correlation in the posterior sample, a thinning sampling of gap 500 is used. In Tables 1, 2 and 3, we present the point estimates, the absolute percentages of bias and the standard errors based on the simulation results.

Overall, both algorithms provide estimates with low bias for (α, β, p) across all combinations of the parameters, but we do observe that the N-M algorithm outperforms the M-H algorithm.
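To make the data-generating mechanism concrete, one replicate HRN can be simulated as follows. This is a minimal sketch of the evolution rules in Section 2, not the authors' code; the function name and the linear-scan sampler are our own illustration, written for clarity rather than speed:

```python
import random

def simulate_hrn(n_steps, alpha, beta, p, delta_in, delta_out, seed=None):
    """Grow a hybrid random network for n_steps edge additions, starting
    from H_0: a single node 0 carrying a self-loop (in/out-degree 1)."""
    rng = random.Random(seed)
    in_deg, out_deg = [1], [1]
    edges = [(0, 0)]

    def attach(degrees, delta):
        # With probability p, preferential attachment: a node is chosen with
        # probability proportional to degree + delta; otherwise uniform.
        if rng.random() < p:
            u = rng.random() * (sum(degrees) + delta * len(degrees))
            acc = 0.0
            for node, d in enumerate(degrees):
                acc += d + delta
                if u <= acc:
                    return node
            return len(degrees) - 1
        return rng.randrange(len(degrees))

    for _ in range(n_steps):
        r = rng.random()
        if r < alpha:                      # scenario 1: newcomer -> existing
            tgt = attach(in_deg, delta_in)
            in_deg.append(0)
            out_deg.append(1)
            in_deg[tgt] += 1
            edges.append((len(in_deg) - 1, tgt))
        elif r < alpha + beta:             # scenario 2: existing -> existing
            src = attach(out_deg, delta_out)
            tgt = attach(in_deg, delta_in)
            out_deg[src] += 1
            in_deg[tgt] += 1
            edges.append((src, tgt))
        else:                              # scenario 3: existing -> newcomer
            src = attach(out_deg, delta_out)
            in_deg.append(1)
            out_deg.append(0)
            out_deg[src] += 1
            edges.append((src, len(in_deg) - 1))
    return edges, in_deg, out_deg
```

Each step adds exactly one edge, so the total in-degree and the total out-degree both equal the number of edges, matching the bookkeeping identity Σ_{k∈V_{n−1}} (D^in_k(n−1) + δ_in) = n + δ_in|V_{n−1}| used in Section 2.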
Table 3. Simulation results with small p = 0.2.

                         Nelder-Mead Algorithm                      Metropolis-Hastings Algorithm
Parameters               ˆα      ˆβ      ˆp      ˆδ_in   ˆδ_out    ˆα      ˆβ      ˆp      ˆδ_in   ˆδ_out
α = 0.45, β = 0.1  Est.  0.4495  0.0998  0.2125  1.4816  0.8271    0.4493  0.0999  0.2398  1.7302  1.2277

In particular, for p, which controls the percentage of edges produced by the PA rule, the N-M algorithm is preferred as its standard errors are consistently smaller. Estimates of δ_in and δ_out from both algorithms tend to be biased, especially when p is small (cf. Table 3, where the UA part dominates), though the estimated δ_in is slightly less biased than the estimated δ_out. When p = 0.8, Table 1 reveals that the N-M method produces the best (small bias and small standard errors) estimates of (δ_in, δ_out) for one of the three parameter combinations, while the least accurate estimates of (δ_in, δ_out) appear in another; when p = 0.6, the N-M algorithm provides the most accurate estimation overall.

The simulation results reveal that ˆα, ˆβ and ˆp are all unbiased from both algorithms, and there is no significant difference in relative efficiency observed for estimating the scenario probabilities α and β. However, there is a noticeable efficiency gain in estimating the tuning parameter p by using the N-M algorithm, rendering it the preferred approach. Besides, we look into the distributions of the estimates. The standard central limit theorem ensures that the limiting distributions of ˆα_MLE and ˆβ_MLE are normal. However, since the score functions for δ_in, δ_out and p are not separable, no standard approach can be readily used to uncover their limiting distributions. Nonetheless, the approximation method developed in Theorem 3 plausibly suggests that they may follow a Gaussian law asymptotically as well. To verify, we show in Figure 2 the quantile-quantile (Q-Q) plots for the estimates from the case of α = 0.45 and β = 0.1.

The success of the N-M algorithm depends heavily on its initialization. With the fixed initial simplex α^(0) = β^(0) = γ^(0) = 1/3, p^(0) = 1/2 and δ_in^(0) = δ_out^(0) = 1, the algorithm is implemented successfully in most replications. When we use a random initial start (e.g., spacing α, β and γ randomly on the unit interval, sampling p from a standard uniform distribution, and sampling δ_in and δ_out independently from a standard exponential distribution), the success rate may drop significantly to 60% or less. The failure of the algorithm is primarily due to the inaccurate start of δ_in and δ_out. We have also run some experiments on smaller networks. When reducing the simulated network size to 5,000, the success rate of the N-M algorithm declines to 35% or less even with a fixed initial simplex. In contrast, the M-H algorithm is more robust, as it is always able to produce estimation results regardless of the size of the network.
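For concreteness, one random-walk M-H update of the kind described in Section 4.2 might look as follows. This is an illustrative sketch only: `loglik` stands for the log of the likelihood in Section 4.1 (here supplied by the caller), and the proposal scale is a hypothetical choice, not a value from the paper:

```python
import math
import random

def mh_step(theta, loglik, rng, scale=0.02):
    """One random-walk Metropolis step with a symmetric uniform proposal.

    theta = (alpha, beta, p, delta_in, delta_out). Proposals violating the
    constraints alpha, beta, p in [0, 1], alpha + beta <= 1 and
    delta_in, delta_out > 0 are rejected outright (zero posterior density).
    """
    prop = tuple(t + rng.uniform(-scale, scale) for t in theta)
    a, b, p, d_in, d_out = prop
    if not (0 <= a <= 1 and 0 <= b <= 1 and a + b <= 1
            and 0 <= p <= 1 and d_in > 0 and d_out > 0):
        return theta
    # Symmetric proposal: the acceptance ratio reduces to the (flat-prior)
    # likelihood ratio, evaluated on the log scale for numerical stability.
    log_a = loglik(prop) - loglik(theta)
    if log_a >= 0 or rng.random() < math.exp(log_a):
        return prop
    return theta
```

In the simulations above, the chain is run for 10,000 burn-in iterations plus 20,000 sampling iterations with a thinning gap of 500; only the accept-reject core is sketched here.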
For the initialization of the M-H algorithm, we consider a random spacing of α, β and γ on the unit interval, sample δ_in and δ_out independently from a standard exponential distribution, and let p start from a value close to 1, which indicates an almost perfect linear PA. Simulation results show that, with the choice of a non-informative prior for p (i.e., sampling it uniformly from [0, 1]) and the above priors for δ_in and δ_out, the 100% success rate renders the M-H method a competitive alternative.

Figure 2. Q-Q plots for the estimates based on 100 independently simulated HRNs with α = 0.45 and β = 0.1.

To summarize, we recommend the N-M algorithm for parameter estimation if there is auxiliary information available to decide reasonable initial values. In addition, the M-H algorithm is a possible backup when the N-M algorithm fails. In practice, we may consider an integration of the two algorithms. Although the estimates of δ_in and δ_out from the M-H algorithm may not be very accurate, they are close enough to the true values to allow for a successful implementation of the N-M algorithm. Therefore, we may first adopt the M-H algorithm to get coarse estimates of the model parameters, and use them as initial values for the N-M algorithm, which ultimately leads to finer estimation results. It is worth mentioning that this initial value selection procedure does not actually affect the estimation results of the N-M algorithm, but effectively increases the probability of its successful implementation. If the target parameter is p only, but not δ_in or δ_out, the estimation results from the M-H algorithm alone may already be acceptable.

6. Real Data Analysis
In this section, we fit the proposed HRN model to two real network datasets: the Dutch Wikipedia talk network and the Facebook wall posts, both of which are retrieved from the KONECT network data repository (http://konect.cc/). By investigating the timestamp information from both datasets, we see the existence of two additional edge-creation scenarios at each step:

(1) Set J_n = 4 (with probability 0 < ξ < 1) if a new node with a self-loop is added to the network;
(2) Set J_n = 5 (with probability 0 < η < 1) if two new nodes, with a directed edge connecting them, are added to the network.

Note that these two additional scenarios require minor modifications of the log-likelihood given in Section 4, but they do not impose a direct effect on the score functions of p, δ_in and δ_out. The

Figure 3.
Graphical diagnostics of the convergence for the M-H algorithm applied to the sub-network of the Dutch Wikipedia talk dataset; red dashed lines represent the estimated values.

MLEs of ξ and η are straightforward:

    ˆξ_MLE = (1/n) Σ_{k=1}^{n} I{J_k=4}  and  ˆη_MLE = (1/n) Σ_{k=1}^{n} I{J_k=5}.

6.1. Dutch Wikipedia Talk.
In the Dutch Wikipedia talk dataset, each single node representsa specific user of the Dutch Wikipedia, and the creation of a directed edge from node u tonode v refers to the event that user u leaves a message on user v ’s talk page. The datasetincludes the communication among the users of Dutch Wikipedia talk pages from 10/18/2002and 11/23/2015, consisting of three columns. The first two columns represent users’ ID and thethird column gives a UNIX timestamp with the time of a message posted on one’s Wikipediatalk page. For each row, the first user writes a message on the talk page of the second user at atimestamp given in the third column.We select the sub-network according to the timestamp information from 01/01/2013 to03/31/2013. The sub-network is directed, consisting of 3,288 nodes and 21,724 directed edges.We fit the proposed hybrid model to the data by using the integration of the M-H algorithm andthe N-M algorithm. For the M-H algorithm, the burn-in number and iteration sample size areboth set at 100,000, and the gap for thinning sampling is 500. The convergence of the estimatesis checked via the time-series plots in Figure 3. Using the estimates from the M-H algorithm asthe initial values for the N-M algorithm, we get almost identical estimates given by (cid:98) θ WK := ( ˆ α, ˆ β, ˆ γ, ˆ ξ, ˆ p, ˆ δ in , ˆ δ out ) = (0 . , . , . , . , . , . , . , where the value of ˆ p is extremely close to 1. The large ˆ p suggests PA dominates the evolutionaryprocess, thus leading to little difference between the proposed HRN and that proposed in Wanet al. [31]. For comparison, we compute the MLEs from the pure PA network model to get (cid:98) θ (PA)WK := ( ˆ α, ˆ β, ˆ γ, ˆ ξ, ˆ δ in , ˆ δ out ) = (0 . , . , . , . , . , . , from which we see the estimates from the two models are close to each other. − − − − − − − out−degree t a il p r obab ili t y wikiHRNPA 1 5 10 50 100 500 − − − − in−degree t a il p r obab ili t y wikiHRNPA Figure 4.
Figure 4. Out-degree and in-degree tail distributions of HRNs (red), directed PA networks (blue), and the sub-network of the Dutch Wikipedia talk dataset (black).

Next, we simulate an HRN and a pure PA network with the estimated parameters, and plot their corresponding empirical tail distributions of the out- and in-degrees in Figure 4. As expected, the empirical out-degree and in-degree tail distributions from the two simulated networks are alike. It seems that the out-degree tail distribution (of the simulated networks) fits the real data better than the in-degree tail distribution, but the discrepancy between the in-degree tail distributions is also small.

In addition, we generate 50 independent replications of the HRN using the given estimates. The out-degree and in-degree tail distributions of these simulated networks are collected in Figure 5. The left panel shows that the out-degree tail distribution of the selected sub-network is well covered by the overlaid plot of the simulated counterparts, whereas the right panel reveals a small but acceptable discrepancy between the overlaid plot and the in-degree distribution of the real data. Overall, our analysis suggests that the evolution of the selected sub-network of the Dutch Wikipedia talk data follows a linear PA mechanism, which also confirms the flexibility of the proposed hybrid model.

6.2. Facebook Wall Posts.
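The analyses in this and the previous subsection rely on simulating HRNs from fitted parameters. The growth mechanism can be sketched as follows; this is a simplified, hypothetical Python illustration (the function name `simulate_hrn`, the parameter values, and the omission of the parameter ξ are all assumptions for the sketch, not the authors' implementation):

```python
import random
from collections import Counter

def simulate_hrn(n_edges, alpha, beta, p, delta_in, delta_out, seed=0):
    """Grow a toy directed hybrid random network (HRN).

    At each step one directed edge is added:
      * w.p. alpha:  a new node points to an existing target;
      * w.p. beta:   an existing source points to an existing target;
      * w.p. 1 - alpha - beta: an existing source points to a new node.
    Whenever an existing endpoint is needed, it is chosen by linear
    preferential attachment w.p. p (weight = degree + delta), and
    uniformly at random w.p. 1 - p.  Illustrative sketch only.
    """
    rng = random.Random(seed)
    in_deg, out_deg = Counter({0: 0}), Counter({0: 0})
    nodes = [0]

    def pick(deg, delta):
        if rng.random() < p:  # preferential attachment step
            weights = [deg[v] + delta for v in nodes]
            return rng.choices(nodes, weights=weights)[0]
        return rng.choice(nodes)  # uniform attachment step

    for _ in range(n_edges):
        u = rng.random()
        if u < alpha:                       # new source -> existing target
            src = len(nodes); nodes.append(src)
            tgt = pick(in_deg, delta_in)
        elif u < alpha + beta:              # existing -> existing
            src = pick(out_deg, delta_out)
            tgt = pick(in_deg, delta_in)
        else:                               # existing source -> new target
            src = pick(out_deg, delta_out)
            tgt = len(nodes); nodes.append(tgt)
        out_deg[src] += 1
        in_deg[tgt] += 1
    return in_deg, out_deg

in_deg, out_deg = simulate_hrn(5000, alpha=0.3, beta=0.5, p=0.9,
                               delta_in=1.0, delta_out=1.0)
```

From the returned degree counters one can plot the empirical tail (survival) functions of the in- and out-degrees and overlay them on the data, in the spirit of Figures 4, 5, 7, and 8.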
The Facebook wall post dataset collects data from a regional network of users in New Orleans from 09/13/2004 to 01/21/2009. The data forms a directed graph where the nodes are Facebook users and each directed edge represents a post from one node to another node's page. Like the previous one, this dataset contains three columns, where the first two contain the identifiers of the individual users, while the third records the timestamp of the corresponding post.

We select the sub-network based on the timestamps from 01/01/2006 to 06/30/2006, which consists of 4,200 nodes and 11,422 directed edges. Fitting the proposed hybrid model to the data, we get $\widehat{\theta}_{\rm FB} := (\hat\alpha, \hat\beta, \hat\gamma, \hat\xi, \hat p, \hat\delta_{\rm in}, \hat\delta_{\rm out})$. The estimates are obtained via the integration of the N-M and M-H algorithms, where the settings of the burn-in number, iteration size, and thinning gap are identical to those in the previous example. In Figure 6, we verify the convergence of the estimates based on the M-H algorithm.
Figure 5. Out-degree and in-degree tail distributions of 50 independent HRNs (red), compared with those from the Wikipedia data (black).
Figure 6. Graphical diagnostics of the convergence for the M-H algorithm applied to the sub-network of the Facebook wall posts dataset; red dashed lines represent the estimated values.

Once again, we do not observe significant differences between the corresponding estimates from the two algorithms.
Figure 7. Out-degree and in-degree tail distributions of HRNs (red), directed PA networks (blue), and the sub-network of Facebook wall posts (black).

We assess the goodness-of-fit of the model through the tail distributions of the in-degree and out-degree. Specifically, we generate an HRN using $\widehat{\theta}_{\rm FB}$, and compare the empirical out- and in-degree tail distributions from the simulated network with those from the real data. Results are presented in Figure 7. For comparison, we also fit a pure PA model to the same sub-network data using the estimation method derived in Wan et al. [31], and plot the corresponding empirical tail distributions (blue dots) in Figure 7.

The empirical tail distributions show that the HRN provides a better fit to the network data by introducing the extra UA rules controlled by the parameter p. Empirical out- and in-degree distributions from the pure PA model display heavier tails than those from the Facebook dataset, leading to a clear deviation from the data. In the HRN, however, the existence of 100(1 − p)% edges created by the UA rule has reduced the heaviness of the empirical tails, thus reducing the discrepancy between the empirical distributions.

Furthermore, we generate 50 independent HRNs using $\widehat{\theta}_{\rm FB}$, and overlay their empirical out-degree and in-degree tail distributions in Figure 8. The graphical results show that the out-degree tail distribution is better captured by the HRN than the in-degree distribution, as the former appears at the lower bound of the overlaid tail distributions of the simulated networks. The in-degree tail distribution of the Facebook sub-network is not covered by the counterparts of the simulated networks, though the shapes look similar and the deviation is not large.

The discrepancy in Figure 8 may be due to the reciprocity feature in the Facebook wall posts. The wall post activities among the Facebook users in a specific region tend to be reciprocated: when a friend posts a message on one's wall, he/she is likely to reply quickly.
In fact, using the reciprocity() function in the igraph package [7], we see that the proportion of reciprocated edges in the sub-network is over 0.18. Indeed, the reciprocated wall posts are certainly not uniform, and thus are not well characterized by the parameter p. To better study the reciprocity feature, we may consider other variants of the PA model, which are left as future work.
Figure 8. Out-degree and in-degree tail distributions of 50 independent HRNs (red), compared with those from the Facebook data (black).

7. Discussion
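As a compact illustration of the two-stage estimation strategy discussed in this paper — a Metropolis-Hastings run with burn-in and thinning producing initial values, followed by a direct-optimization refinement — consider the following toy sketch. Everything here is an assumption for illustration: the target is a synthetic binomial likelihood in one parameter, and a 1-D ternary search stands in for the multi-dimensional Nelder-Mead simplex used by the authors; none of this is the paper's R code.

```python
import math
import random

# Toy data: k successes out of n trials; the single parameter plays the
# role of the mixing probability p.  The paper's actual likelihood involves
# all HRN parameters simultaneously.
N_TRIALS, K_SUCC = 200, 130

def log_lik(q):
    if not 0.0 < q < 1.0:
        return -math.inf
    return K_SUCC * math.log(q) + (N_TRIALS - K_SUCC) * math.log(1.0 - q)

def metropolis_hastings(n_iter=20000, burn_in=5000, thin=50, seed=1):
    """Random-walk M-H with burn-in and thinning; returns the kept draws."""
    rng = random.Random(seed)
    q, draws = 0.5, []
    for t in range(n_iter):
        prop = q + rng.gauss(0.0, 0.05)       # symmetric proposal
        diff = log_lik(prop) - log_lik(q)
        # accept w.p. min(1, exp(diff)); exp(-inf) = 0 rejects invalid props
        if diff >= 0 or rng.random() < math.exp(diff):
            q = prop
        if t >= burn_in and (t - burn_in) % thin == 0:
            draws.append(q)
    return draws

draws = metropolis_hastings()
init = sum(draws) / len(draws)                # stage 1: M-H posterior mean

# Stage 2: refine by direct likelihood maximization started at `init`
# (ternary search over a window around the M-H estimate; unimodal target).
lo, hi = max(init - 0.3, 1e-6), min(init + 0.3, 1 - 1e-6)
for _ in range(100):
    m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
    if log_lik(m1) < log_lik(m2):
        lo = m1
    else:
        hi = m2
mle = (lo + hi) / 2
```

The warm start from the M-H stage is what makes the deterministic refinement robust: the search (or, in the paper's setting, the initial simplex) begins near the mode rather than at an arbitrary point.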
In this paper, we propose a class of hybrid models simultaneously presenting the preferential attachment (PA) and uniform attachment (UA) mechanisms, governed by a tuning parameter p. Two standard methods, the Nelder-Mead (N-M) algorithm and the Metropolis-Hastings (M-H) algorithm, are adopted for parameter estimation. Through extensive simulations and a sensitivity study, we find that the N-M algorithm is preferred, but its success rate of producing estimation results depends heavily on the selection of the initial simplex. We thus consider an integrated approach where we use the more robust M-H algorithm to get the initial values for the target parameters, followed by the implementation of the N-M algorithm. In addition, we fit the HRN model to two real network datasets, the Dutch Wikipedia talk and the Facebook wall posts, where we see that the proposed hybrid model provides a more flexible modeling framework compared with the directed PA network model as in [31]. The extra tuning parameter p helps correct the tail distributions of the out- and in-degrees.

From the Facebook example, it is worth noting that even though the heaviness of the tail distributions (for both out- and in-degrees) has been weakened by p in the HRN, the proposed UA part is not able to fully capture the reciprocity property in the real network. We here provide three classes of possible remedies: (1) it may be worthwhile to try some algorithmic approaches, such as network rewiring, to wash off the nodes of large in-degree or out-degree in the simulated networks; (2) we may consider modifying the model directly by introducing another parameter measuring the rate of reciprocation; (3) we may consider a more realistic mixer (some suitable light-tailed distribution) rather than simple UA in the present hybrid setting. We will report our research outcomes elsewhere in the future.

References

[1] Alves, C., R. Ribeiro, and R. Sanchis (2019). Preferential attachment random graphs with edge-step functions.
Journal of Theoretical Probability. DOI: https://doi.org/10.1007/s10959-019-00959-0.
[2] Atalay, E., A. Hortaçsu, J. Roberts, and C. Syverson (2011). Network structure of production. Proceedings of the National Academy of Sciences of the United States of America 108(13), 5199–5202.
[3] Athreya, K. and P. Ney (2004). Branching Processes. Reprint of the 1972 original. Springer.
[4] Barabási, A.-L. and R. Albert (1999). Emergence of scaling in random networks. Science 286(5439), 509–512.
[5] Chen, M.-H., Q.-M. Shao, and J. G. Ibrahim (2010). Monte Carlo Methods in Bayesian Computation. New York, NY, U.S.A.: Springer-Verlag.
[6] Clauset, A., C. R. Shalizi, and M. E. J. Newman (2009). Power-law distributions in empirical data. SIAM Review 51(4), 661–703.
[7] Csardi, G. and T. Nepusz (2006). The igraph software package for complex network research. InterJournal Complex Systems, 1695.
[8] Dereich, S. and P. Mörters (2009). Random networks with sublinear preferential attachment: degree evolutions. Electronic Journal of Probability 14, 1222–1267.
[9] Drees, H., A. Janßen, S. I. Resnick, and T. Wang (2020). On a minimum distance procedure for threshold selection in tail analysis. SIAM Journal on Mathematics of Data Science 2(1), 75–102.
[10] Durrett, R. T. (2006). Random Graph Dynamics. Cambridge, U.K.: Cambridge University Press.
[11] Gao, F. and A. van der Vaart (2017). On the asymptotic normality of estimating the affine preferential attachment network models with random initial degrees. Stochastic Processes and their Applications 127(11), 3754–3775.
[12] Gelman, A., J. B. Carlin, D. B. Dunson, A. Vehtari, and D. B. Rubin (2013). Bayesian Data Analysis. Boca Raton, FL, U.S.A.: Chapman and Hall/CRC.
[13] Hasselman, B. (2018). nleqslv: Solve Systems of Nonlinear Equations. CPB Netherlands Bureau for Economic Policy Analysis. R package version 3.3.2, https://CRAN.R-project.org/package=nleqslv.
[14] Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57(1), 97–109.
[15] Henzinger, M. and S. Lawrence (2004). Extracting knowledge from the World Wide Web. Proceedings of the National Academy of Sciences of the United States of America 101(Supplement 1), 5186–5191.
[16] Lagarias, J. C., J. A. Reeds, M. H. Wright, and P. E. Wright (1998). Convergence properties of the Nelder-Mead simplex method in low dimensions. SIAM Journal on Optimization 9(1), 112–147.
[17] Liang, F., C. Liu, and R. J. Carroll (2010). Advanced Markov Chain Monte Carlo Methods: Learning from Past Examples. Hoboken, NJ, U.S.A.: John Wiley & Sons.
[18] Mahmoud, H. M. (2019). Local and global degree profiles of randomly grown self-similar hooking networks under uniform and preferential attachment. Advances in Applied Mathematics 111, 101930.
[19] Mengersen, K. L. and R. L. Tweedie (1996). Rates of convergence of the Hastings and Metropolis algorithms. Annals of Statistics 24(1), 101–121.
[20] Merton, R. K. (1968). The Matthew Effect in science. Science 159(3810), 56–63.
[21] Metropolis, N., A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller (1953). Equation of state calculations by fast computing machines. The Journal of Chemical Physics 21, 1087.
[22] Nelder, J. A. and R. Mead (1965). A simplex method for function minimization. The Computer Journal 7(4), 308–313.
[23] Newman, M. E. J. (2001). Clustering and preferential attachment in growing networks. Physical Review E 64, 025102.
[24] Pachon, A., L. Sacerdote, and S. Yang (2018). Scale-free behavior of networks with the co-presence of preferential and uniform attachment rules. Physica D: Nonlinear Phenomena 371, 1–12.
[25] Samorodnitsky, G., S. Resnick, D. Towsley, R. Davis, A. Willis, and P. Wan (2016). Nonstandard regular variation of in-degree and out-degree in the preferential attachment model. Journal of Applied Probability 53(1), 146–161.
[26] Shao, Z.-G., X.-W. Zou, and Z.-Z. Jin (2006). Growing networks with mixed attachment mechanisms. Journal of Physics A: Mathematical and General 39(9).
[27] Smith, B. J. (2007). boa: An R package for MCMC output convergence assessment and posterior inference. Journal of Statistical Software 21(11), 1–37.
[28] Soetaert, K. (2009). rootSolve: Nonlinear Root Finding, Equilibrium and Steady-State Analysis of Ordinary Differential Equations. Netherlands Institute of Ecology. R package version 1.6, https://cran.r-project.org/web/packages/rootSolve/rootSolve.pdf.
[29] de Solla Price, D. J. (1965). Networks of scientific papers. Science 149(3683), 510–515.
[30] van der Hofstad, R. (2017). Random Graphs and Complex Networks. Cambridge, U.K.: Cambridge University Press.
[31] Wan, P., T. Wang, R. A. Davis, and S. I. Resnick (2017). Fitting the linear preferential attachment model. Electronic Journal of Statistics 11(2), 3738–3780.
[32] Wang, T. and S. I. Resnick (2018). Multivariate regular variation of discrete mass functions with applications to preferential attachment networks. Methodology and Computing in Applied Probability 20(3), 1029–1042.
[33] Wang, T. and S. I. Resnick (2020). Degree growth rates and index estimation in a directed preferential attachment model. Stochastic Processes and their Applications 130(2), 878–906.
[34] Wang, T. and S. I. Resnick (2015). Asymptotic normality of in- and out-degree counts in a preferential attachment model. Stochastic Models 33(2), 229–255.
[35] Wang, T. and S. I. Resnick (2020). A directed preferential attachment model with Poisson measurement. https://arxiv.org/pdf/2008.07005.pdf.
[36] Zhang, P. and H. M. Mahmoud (2020). On nodes of small degrees and degree profile in preferential dynamic attachment circuits. Methodology and Computing in Applied Probability 22(2), 625–645.
Appendix A. Proof of Theorem 1
We explicitly demonstrate the derivations for the in-degree distribution; the methodology is also applicable to out-degrees. Let $\mathcal{F}_n$ denote the $\sigma$-field generated by the network evolution up to $n$ steps. Set $\tau$ to be an $(\mathcal{F}_n)_{n\ge 0}$-stopping time; then $\mathcal{F}_\tau = \{F : F\cap\{\tau=n\}\in\mathcal{F}_n\}$. For $i\ge 1$, let $S_i$ be the time when node $i$ is created, i.e., $S_i = \inf\{n\ge 0 : |V_n| = i\}$. Then $S_i$ is an $(\mathcal{F}_n)_{n\ge 0}$-stopping time. Also, for $n\ge k\ge 0$, we have $\{S_i+k=n\} = \{S_i=n-k\}\in\mathcal{F}_{n-k}\subset\mathcal{F}_n$, so $S_i+k$, $k\ge 0$, is a stopping time with respect to $(\mathcal{F}_n)_{n\ge 0}$.

Since the in-degree of $i$ is increased by at most 1 at each evolutionary step, the probability of the event $\{D_i^{\rm in}(S_i+n+1)=D_i^{\rm in}(S_i+n)\}$ under $\mathbb{P}^{\mathcal{F}_{S_i+n}}(\cdot):=\mathbb{P}(\cdot\mid\mathcal{F}_{S_i+n})$ is
\[
\mathbb{P}^{\mathcal{F}_{S_i+n}}\big(D_i^{\rm in}(S_i+n+1)=D_i^{\rm in}(S_i+n)\big)
=1-(\alpha+\beta)\left[\frac{p\,(D_i^{\rm in}(S_i+n)+\delta_{\rm in})}{\sum_{k\in V_{S_i+n}}(D_k^{\rm in}(S_i+n)+\delta_{\rm in})}+\frac{1-p}{|V_{S_i+n}|}\right].
\]
Therefore,
\begin{align*}
\mathbb{E}^{\mathcal{F}_{S_i+n}}\big[D_i^{\rm in}(S_i+n+1)\big]
&=D_i^{\rm in}(S_i+n)+(\alpha+\beta)\left[\frac{p\,(D_i^{\rm in}(S_i+n)+\delta_{\rm in})}{\sum_{k\in V_{S_i+n}}(D_k^{\rm in}(S_i+n)+\delta_{\rm in})}+\frac{1-p}{|V_{S_i+n}|}\right]\\
&=D_i^{\rm in}(S_i+n)+(\alpha+\beta)\left[\frac{p\,(D_i^{\rm in}(S_i+n)+\delta_{\rm in})}{S_i+n+1+\delta_{\rm in}|V_{S_i+n}|}+\frac{1-p}{|V_{S_i+n}|}\right].\tag{15}
\end{align*}
Write $C:=p(\alpha+\beta)/(1+\delta_{\rm in}(1-\beta))$ and $\tilde\delta_{\rm in}:=\delta_{\rm in}/p+(1-p)/(p(1-\beta))$; then we have
\begin{align*}
\mathbb{E}\big[D_i^{\rm in}(S_i+n+1)+\tilde\delta_{\rm in}\big]
&=\mathbb{E}\left[D_i^{\rm in}(S_i+n)+\tilde\delta_{\rm in}+(\alpha+\beta)\left(\frac{p(D_i^{\rm in}(S_i+n)+\delta_{\rm in})}{S_i+n+1+\delta_{\rm in}|V_{S_i+n}|}+\frac{1-p}{|V_{S_i+n}|}\right)\right]\\
&=\mathbb{E}\left[D_i^{\rm in}(S_i+n)+\tilde\delta_{\rm in}+(\alpha+\beta)\left(\frac{p(D_i^{\rm in}(S_i+n)+\tilde\delta_{\rm in})}{S_i+n+1+\delta_{\rm in}|V_{S_i+n}|}+\frac{1-p}{|V_{S_i+n}|}-\frac{p(\tilde\delta_{\rm in}-\delta_{\rm in})}{S_i+n+1+\delta_{\rm in}|V_{S_i+n}|}\right)\right]\\
&=\mathbb{E}\left[(D_i^{\rm in}(S_i+n)+\tilde\delta_{\rm in})\left(1+\frac{p(\alpha+\beta)}{S_i+n+1+\delta_{\rm in}|V_{S_i+n}|}\right)\right]+(\alpha+\beta)\,\mathbb{E}\left[\frac{1-p}{|V_{S_i+n}|}-\frac{p(\tilde\delta_{\rm in}-\delta_{\rm in})}{S_i+n+1+\delta_{\rm in}|V_{S_i+n}|}\right]\\
&\le\mathbb{E}\left[(D_i^{\rm in}(S_i+n)+\tilde\delta_{\rm in})\left(1+\frac{p(\alpha+\beta)}{n+1+\delta_{\rm in}|V_n|}\right)\right]+\frac{1-p}{n(1-\beta)}\,\mathbb{E}\left|\frac{(1-\beta)(S_i+n+1)}{|V_{S_i+n}|}-1\right|\\
&=\mathbb{E}\left[(D_i^{\rm in}(S_i+n)+\tilde\delta_{\rm in})\left(1+\frac{C}{n}\right)\right]+\mathbb{E}\left[(D_i^{\rm in}(S_i+n)+\tilde\delta_{\rm in})\left(\frac{p(\alpha+\beta)}{n+1+\delta_{\rm in}|V_n|}-\frac{C}{n}\right)\right]\\
&\quad+\frac{1-p}{n(1-\beta)}\,\mathbb{E}\left|\frac{(1-\beta)(S_i+n+1)}{|V_{S_i+n}|}-1\right|=:I_i(n)+II_i(n)+III_i(n).\tag{16}
\end{align*}
For $II_i(n)$, we have
\[
II_i(n)\le\mathbb{E}\left[(D_i^{\rm in}(S_i+n)+\tilde\delta_{\rm in})\left|\frac{p(\alpha+\beta)}{n+1+\delta_{\rm in}|V_n|}-\frac{C}{n}\right|\right]
\le\mathbb{E}\left[(D_i^{\rm in}(S_i+n)+\tilde\delta_{\rm in})\,\frac{\delta_{\rm in}\big||V_n|-(1-\beta)n\big|+1}{(1+\delta_{\rm in}(1-\beta))\,n^2}\right].
\]
Since for $n\ge 1$, $|V_n|-1$ follows a Binomial$(n,1-\beta)$ distribution, we apply the Chernoff bound to get
\[
\mathbb{P}\Big(\big||V_n|-(1-\beta)n\big|\ge\sqrt{8(1-\beta)n\log n}\Big)\le\frac{2}{n^{4}},\tag{17}
\]
and noticing that $\big||V_n|-(1-\beta)n\big|\le n$ and $D_i^{\rm in}(S_i+n)\le n+1$, we have
\begin{align*}
II_i(n)&\le\frac{\delta_{\rm in}\sqrt{8(1-\beta)n\log n}+1}{(1+\delta_{\rm in}(1-\beta))\,n^2}\,\mathbb{E}\big(D_i^{\rm in}(S_i+n)+\tilde\delta_{\rm in}\big)+\frac{2(\delta_{\rm in}n+1)(n+1+\tilde\delta_{\rm in})}{n^{4}}\\
&\le\frac{\delta_{\rm in}\sqrt{8(1-\beta)n\log n}+1}{(1+\delta_{\rm in}(1-\beta))\,n^2}\,\mathbb{E}\big(D_i^{\rm in}(S_i+n)+\tilde\delta_{\rm in}\big)+\frac{2(1+\delta_{\rm in})(2+\tilde\delta_{\rm in})}{n^{2}}.
\end{align*}
For $III_i(n)$, we apply the Cauchy-Schwarz inequality to obtain
\[
III_i(n)\le\frac{1-p}{n(1-\beta)}\Big(\mathbb{E}\big[\big((1-\beta)(S_i+n+1)-|V_{S_i+n}|\big)^2\big]\Big)^{1/2}\Big(\mathbb{E}\big[|V_n|^{-2}\big]\Big)^{1/2}.
\]
By Theorem 3.9.4 in [3], we have as $n\to\infty$, $n^2\,\mathbb{E}\big[|V_n|^{-2}\big]\to(1-\beta)^{-2}$. Meanwhile,
\[
\mathbb{E}\big[\big((1-\beta)(S_i+n+1)-|V_{S_i+n}|\big)^2\big]=\mathrm{Var}\big(|V_{S_i+n}|\big)+\mathrm{Var}\big((1-\beta)(S_i+n+1)\big)=\beta(1-\beta)\big(\mathbb{E}(S_i)+n\big)+2(1-\beta)^2\,\mathrm{Var}(S_i).
\]
Hence, there exists some constant $A_i>0$ such that $III_i(n)\le A_i\,n^{-3/2}$. Then it follows from (16) that
\[
\mathbb{E}\big[D_i^{\rm in}(S_i+n+1)+\tilde\delta_{\rm in}\big]\le\mathbb{E}\left[(D_i^{\rm in}(S_i+n)+\tilde\delta_{\rm in})\left(1+\frac{C}{n}+\frac{1+\delta_{\rm in}\sqrt{8(1-\beta)n\log n}}{(1+\delta_{\rm in}(1-\beta))\,n^2}\right)\right]+\frac{2(1+\delta_{\rm in})(2+\tilde\delta_{\rm in})}{n^{2}}+A_i\,n^{-3/2}.
\]
Iterating backwards for $n$ times gives
\[
\mathbb{E}\big[D_i^{\rm in}(S_i+n+1)+\tilde\delta_{\rm in}\big]\le\big(\tilde\delta_{\rm in}\alpha+(1+\tilde\delta_{\rm in})\gamma\big)\prod_{r=1}^{n}\left(1+\frac{C}{r}+\frac{1+\delta_{\rm in}\sqrt{8(1-\beta)r\log r}}{(1+\delta_{\rm in}(1-\beta))\,r^2}\right)+\sum_{r=1}^{n}\frac{2(1+\delta_{\rm in})(2+\tilde\delta_{\rm in})+A_i}{r^{3/2}}\prod_{s=r+1}^{n}\left(1+\frac{C}{s}+\frac{1+\delta_{\rm in}\sqrt{8(1-\beta)s\log s}}{(1+\delta_{\rm in}(1-\beta))\,s^2}\right).
\]
Note that
\[
\prod_{s=r+1}^{n}\left(1+\frac{C}{s}+\frac{1+\delta_{\rm in}\sqrt{8(1-\beta)s\log s}}{(1+\delta_{\rm in}(1-\beta))\,s^2}\right)\le\exp\left\{\sum_{s=r+1}^{n}\left(\frac{C}{s}+\frac{1+\delta_{\rm in}\sqrt{8(1-\beta)s\log s}}{(1+\delta_{\rm in}(1-\beta))\,s^2}\right)\right\}\le\left(\frac{n}{r}\right)^{C}\exp\left\{\sum_{s=1}^{\infty}\frac{1+\delta_{\rm in}\sqrt{8(1-\beta)s\log s}}{(1+\delta_{\rm in}(1-\beta))\,s^2}\right\},
\]
which gives
\[
\sup_{i\ge 1}\frac{\mathbb{E}\big[D_i^{\rm in}(n)\big]}{n^{C}}\le\sup_{i\ge 1}\frac{\mathbb{E}\big[D_i^{\rm in}(S_i+n)+\tilde\delta_{\rm in}\big]}{n^{C}}<\infty.
\]
This completes the proof.

Appendix B. Proof of Theorem 2
Like the proof of Theorem 1, we only present the details for the in-degree sequence. The derivations for the out-degree sequence are similar, so omitted.

For $k,\ell\ge 0$, we have
\begin{align*}
\mathbb{E}\big|M^{\rm in}_{k+\ell}-M^{\rm in}_{k}\big|&\le\sum_{s=k+1}^{k+\ell}\mathbb{E}\big|M^{\rm in}_{s}-M^{\rm in}_{s-1}\big|\le\sum_{s=k+1}^{k+\ell}\mathbb{E}\left[\frac{(1-p)(\alpha+\beta)/|V_{S_i+s}|}{\prod_{\ell'=1}^{s}\Big(1+\frac{p(\alpha+\beta)}{S_i+\ell'+1+\delta_{\rm in}|V_{S_i+\ell'}|}\Big)}\right]\\
&\le\sum_{s=k+1}^{k+\ell}\mathbb{E}\left[\prod_{\ell'=1}^{s}\left(1+\frac{p(\alpha+\beta)}{(1+\delta_{\rm in})(S_i+\ell'+1)}\right)^{-1}\frac{(1-p)(\alpha+\beta)}{|V_{S_i+s}|}\right]\\
&\le\sum_{s=k+1}^{k+\ell}\prod_{\ell'=1}^{s}\left(1+\frac{p(\alpha+\beta)}{(1+\delta_{\rm in})(\ell'+1)}\right)^{-1}\mathbb{E}\left[\frac{(1-p)(\alpha+\beta)}{|V_{s}|}\right],
\end{align*}
where, for $s$ large,
\[
\prod_{\ell'=1}^{s}\left(1+\frac{p(\alpha+\beta)}{(1+\delta_{\rm in})(\ell'+1)}\right)^{-1}=\frac{\Gamma(s+2)\,\Gamma\big(2+\frac{p(\alpha+\beta)}{1+\delta_{\rm in}}\big)}{\Gamma\big(s+2+\frac{p(\alpha+\beta)}{1+\delta_{\rm in}}\big)\,\Gamma(2)}\sim s^{-p(\alpha+\beta)/(1+\delta_{\rm in})}.
\]
For the expectation in the summand, we have
\[
\mathbb{E}\left[\frac{(1-p)(\alpha+\beta)}{|V(s)|}\right]=(1-p)(\alpha+\beta)\left(\frac{1}{(1-\beta)s}+\mathbb{E}\left[\frac{1}{|V(s)|}-\frac{1}{(1-\beta)s}\right]\right).
\]
By the Chernoff bound in (17), we get
\[
\mathbb{E}\left[\left|\frac{1}{|V(s)|}-\frac{1}{(1-\beta)s}\right|\right]=\mathbb{E}\left[\frac{\big||V(s)|-(1-\beta)s\big|}{(1-\beta)s\,|V(s)|}\right]\le\frac{\sqrt{8(1-\beta)s\log s}}{(1-\beta)s\,\big((1-\beta)s-\sqrt{8(1-\beta)s\log s}\big)}+\frac{(2-\beta)s}{(1-\beta)s}\times\frac{2}{s^{4}}.
\]
Putting them together, we conclude that $\mathbb{E}\big|M^{\rm in}_{k+\ell}-M^{\rm in}_{k}\big|\to 0$ as $k\to\infty$, suggesting that $\{M^{\rm in}_{n}\}_{n\ge 0}$ is a Cauchy sequence in the $L^1$ space. Applying the martingale convergence theorem [10, Theorem 4.2.11] gives that there exists some finite random variable $L_i$ such that as $n\to\infty$,
\[
M^{\rm in}_{n}\xrightarrow{\rm a.s.}L_i.\tag{1}
\]
Then it remains to show the almost sure convergence of
\[
X_n:=\prod_{k=0}^{n-1}\left(1+\frac{(\alpha+\beta)p}{S_i+k+1+\delta_{\rm in}|V_{S_i+k}|}\right).
\]
Consider $\log X_n$, and rewrite it as
\begin{align*}
\log X_n&=\left[\sum_{k=0}^{n-1}\log\left(1+\frac{(\alpha+\beta)p}{S_i+k+1+\delta_{\rm in}|V_{S_i+k}|}\right)-\sum_{k=0}^{n-1}\frac{(\alpha+\beta)p}{S_i+k+1+\delta_{\rm in}|V_{S_i+k}|}\right]\\
&\quad+\left[\sum_{k=0}^{n-1}\frac{(\alpha+\beta)p}{S_i+k+1+\delta_{\rm in}|V_{S_i+k}|}-\sum_{k=1}^{n}\frac{C}{S_i+k+1}\right]+\left[\sum_{k=1}^{n}\frac{C}{S_i+k}-C\log n\right]+C\log n\\
&=:P_1(n)+P_2(n)+P_3(n)+C\log n.\tag{2}
\end{align*}
For $P_1(n)$, we first note that $\log(1+x)-x\le 0$ for $x\ge 0$. Then $P_1(n+1)-P_1(n)\le 0$, i.e., $P_1(n)$ is decreasing in $n$, and it suffices to show that
\[
P_1(\infty):=\sum_{k=0}^{\infty}\left(\log\left(1+\frac{(\alpha+\beta)p}{S_i+k+1+\delta_{\rm in}|V_{S_i+k}|}\right)-\frac{(\alpha+\beta)p}{S_i+k+1+\delta_{\rm in}|V_{S_i+k}|}\right)
\]
is finite almost surely. Note also that $|\log(1+x)-x|\le x^2/2$, for all $x\ge 0$; then
\begin{align*}
&\mathbb{E}\left|\sum_{k=0}^{\infty}\left(\log\left(1+\frac{(\alpha+\beta)p}{S_i+k+1+\delta_{\rm in}|V_{S_i+k}|}\right)-\frac{(\alpha+\beta)p}{S_i+k+1+\delta_{\rm in}|V_{S_i+k}|}\right)\right|\\
&\qquad\le\sum_{k=0}^{\infty}\mathbb{E}\left|\log\left(1+\frac{(\alpha+\beta)p}{S_i+k+1+\delta_{\rm in}|V_{S_i+k}|}\right)-\frac{(\alpha+\beta)p}{S_i+k+1+\delta_{\rm in}|V_{S_i+k}|}\right|\\
&\qquad\le\frac{(\alpha+\beta)^2p^2}{2}\sum_{k=0}^{\infty}\mathbb{E}\left(\frac{1}{S_i+k+1+\delta_{\rm in}|V_{S_i+k}|}\right)^2\le\frac{1}{2}\sum_{k=1}^{\infty}\frac{1}{k^2}<\infty.
\end{align*}
Hence, $P_1(n)\xrightarrow{\rm a.s.}P_1(\infty)$.

For $P_2(n)$, we apply [3, Theorem 3.9.4] to conclude that there exists a finite r.v. $Z$ such that
\[
\sum_{k=1}^{\infty}\left(\frac{(\alpha+\beta)p}{k+\delta_{\rm in}|V(k-1)|}-\frac{C}{k}\right)\xrightarrow{\rm a.s.}Z,
\]
then
\[
P_2(n)\xrightarrow{\rm a.s.}Z-\sum_{k=1}^{S_i}\left(\frac{(\alpha+\beta)p}{k+\delta_{\rm in}|V(k-1)|}-\frac{C}{k}\right)=:P_2(\infty).
\]
Meanwhile, since $\sum_{k=1}^{n}1/k-\log n\to\tilde c$ as $n\to\infty$, where $\tilde c$ is Euler's constant, then for $i\ge 1$,
\[
P_3(n)\xrightarrow{\rm a.s.}C\left(\tilde c+\log(S_i+1)-\sum_{k=1}^{S_i}\frac{1}{k}\right)=:P_3(\infty).
\]
Then we conclude from (2) that
\[
\frac{X_n}{n^{C}}\xrightarrow{\rm a.s.}\exp\big(P_1(\infty)+P_2(\infty)+P_3(\infty)\big)<\infty.\tag{3}
\]
The results in the theorem follow by combining (3) with (1), where we set
\[
\zeta_i:=L_i\,\exp\big(-(P_1(\infty)+P_2(\infty)+P_3(\infty))\big).
\]

Appendix C. Proof of Theorem 3
Analogous to the previous proofs, we present the major steps of the proof for the in-degree. Applying the argument as in the proof of [30, Proposition 8.4], we have the concentration result that
\[
\frac{N^{\rm in}_m(n)-\mathbb{E}\big(N^{\rm in}_m(n)\big)}{n}\xrightarrow{p}0.
\]
Then it suffices to find the asymptotic limit of $\mathbb{E}\big(N^{\rm in}_m(n)\big)/n$. We consider the approximation of the attachment probability:
\[
\frac{p\,\big(D^{\rm in}_i(n)+\delta_{\rm in}\big)}{\big(1+\delta_{\rm in}(1-\beta)\big)\,n}+\frac{1-p}{(1-\beta)\,n}=\frac{D^{\rm in}_i(n)+\delta_{\rm in}+\frac{1-p}{p(1-\beta)}\big(1+\delta_{\rm in}(1-\beta)\big)}{\big(1+\delta_{\rm in}(1-\beta)\big)\,n/p}.
\]
Recall that
\[
\tilde\delta_{\rm in}=\delta_{\rm in}+\frac{1-p}{p(1-\beta)}\big(1+\delta_{\rm in}(1-\beta)\big)=\frac{\delta_{\rm in}}{p}+\frac{1-p}{p(1-\beta)}.
\]
Using the Chernoff bound in (17), we have
\[
\left|\mathbb{E}\left[\frac{p\,\big(D^{\rm in}_i(n)+\delta_{\rm in}\big)}{n+1+|V_n|\,\delta_{\rm in}}+\frac{1-p}{|V_n|}\right]-\mathbb{E}\left[\frac{D^{\rm in}_i(n)+\tilde\delta_{\rm in}}{\big(1+\delta_{\rm in}(1-\beta)\big)\,n/p}\right]\right|\le C\,n^{-3/2}\sqrt{\log n},\tag{18}
\]
for some constant $C>0$. Consider an in-degree sequence $\{\tilde D^{\rm in}_i(n)\}$ from a directed PA network with the set of parameters $(\alpha,\beta,\gamma,\tilde\delta_{\rm in},\tilde\delta_{\rm out})$, as studied in Samorodnitsky et al. [25] and Wan et al. [31]. Establish an argument similar to Equation (18) as follows:
\[
\left|\mathbb{E}\left[\frac{p\,\big(D^{\rm in}_i(n)+\delta_{\rm in}\big)}{n+1+|V_n|\,\delta_{\rm in}}+\frac{1-p}{|V_n|}\right]-\mathbb{E}\left[\frac{\tilde D^{\rm in}_i(n)+\tilde\delta_{\rm in}}{n+1+|V_n|\,\tilde\delta_{\rm in}}\right]\right|\le\tilde C\,n^{-3/2}\sqrt{\log n},
\]
for some constant $\tilde C>0$. Note that
\[
\mathbb{P}\big(D^{\rm in}_i(n)=m\big)=\sum_{j=m-1}^{m}\mathbb{P}\big(D^{\rm in}_i(n)=m\,\big|\,D^{\rm in}_i(n-1)=j\big)\,\mathbb{P}\big(D^{\rm in}_i(n-1)=j\big).
\]
By the developed Chernoff bounds, we have
\[
\mathbb{P}\big(D^{\rm in}_i(n)=m\big)\le\mathbb{P}\big(\tilde D^{\rm in}_i(n)=m\big)+(C+\tilde C)\sum_{k=i}^{n}k^{-3/2}\sqrt{\log k}.
\]
Noticing that $\sum_{k=i}^{n}k^{-3/2}\sqrt{\log k}<\infty$ as $n\to\infty$,