The distribution of shortest path lengths in a class of node duplication network models
arXiv preprint [physics.soc-ph]
Chanania Steinbock, Ofer Biham, and Eytan Katzav
Racah Institute of Physics, The Hebrew University, Jerusalem 91904, Israel
Abstract
We present analytical results for the distribution of shortest path lengths (DSPL) in a network growth model which evolves by node duplication (ND). The model captures essential properties of the structure and growth dynamics of social networks, acquaintance networks and scientific citation networks, where duplication mechanisms play a major role. Starting from an initial seed network, at each time step a random node, referred to as a mother node, is selected for duplication. Its daughter node is added to the network, forming a link to the mother node, and with probability $p$ to each one of its neighbors. The degree distribution of the resulting network turns out to follow a power-law distribution, thus the ND network is a scale-free network. To calculate the DSPL we derive a master equation for the time evolution of the probability $P_t(L=\ell)$, $\ell = 1, 2, \dots$, where $L$ is the distance between a pair of nodes and $t$ is the time. Finding an exact analytical solution of the master equation, we obtain a closed form expression for $P_t(L=\ell)$. The mean distance, $\langle L \rangle_t$, and the diameter, $\Delta_t$, are found to scale like $\ln t$, namely the ND network is a small-world network. The variance of the DSPL is also found to scale like $\ln t$. Interestingly, the mean distance and the diameter exhibit properties of a small-world network, rather than the ultrasmall-world behavior observed in other scale-free networks, in which $\langle L \rangle_t \sim \ln \ln t$.

PACS numbers: 64.60.aq, 89.75.Da

I. INTRODUCTION

The increasing interest in the field of complex networks in recent years is motivated by the realization that a large variety of systems and processes in physics, chemistry, biology, engineering, and society can be usefully described by network models [1–6]. These models consist of nodes and edges, where the nodes represent physical objects, while the edges represent the interactions between them. It was found that networks appearing in different contexts often share various structural properties.
For example, they exhibit repeating network motifs such as the feed-forward loop (FFL) and the auto-regulator [7, 8]. The structure of these motifs and their abundance provide useful information on the growth mechanism of the network and often have functional importance. At the global scale, many of these networks are scale-free, which means that they exhibit power-law degree distributions of the form $P(K=k) \sim k^{-\gamma}$ [9–13]. The most highly connected nodes, called hubs, play a dominant role in dynamical processes on these networks. A central feature of random networks is the small-world property, namely the fact that the mean distance and the diameter scale like $\ln N$, where $N$ is the network size [14–17]. Moreover, it was shown that scale-free networks are generically ultrasmall, namely their mean distance and diameter scale like $\ln \ln N$ [18].

While pairs of adjacent nodes exhibit direct interactions, the interactions between most pairs of nodes are indirect, and are mediated by intermediate nodes and edges. Pairs of nodes may be connected by many different paths. The shortest among these paths are of particular importance because they are likely to provide the fastest and strongest interactions. Therefore, it is of much interest to study the distribution of shortest path lengths (DSPL) between pairs of nodes in different types of networks. Such distributions, which are also referred to as distance distributions, are expected to depend on the network structure and size. They are of great importance for the temporal evolution of dynamical processes [6] such as signal propagation in genetic regulatory networks [19, 20], navigation [21, 22] and epidemic spreading [23]. Central measures of the DSPL such as the mean distance and extremal measures such as the diameter were studied [15, 24–27]. However, apart from a few studies [28–34], the DSPL has not attracted nearly as much attention as the degree distribution.
Recently, an analytical approach was developed for calculating the DSPL [35] in the Erdős-Rényi (ER) network [36], which is the simplest mathematical model of a random network. More general formulations were later developed [37, 38] for the broader class of configuration model networks [28, 39].

To gain insight into the structure of complex networks, it is useful to study the growth dynamics that give rise to these structures. In general, it appears that many of the networks encountered in biological, ecological and social systems grow step by step, by the addition of new nodes and their attachment to existing nodes. In some networks, the new nodes emerge with no predefined connections, while in other networks the new nodes result from the duplication of existing nodes, followed by a stochastic readjustment of their links. A fundamental feature of these growth processes is the preferential attachment mechanism, in which the likelihood of an existing node to gain a link to the new node is proportional to its degree. It was shown that growth models based on preferential attachment give rise to scale-free networks, which exhibit power-law degree distributions [1, 9].

The effect of node duplication (ND) processes on the structure and evolution of networks was studied using a simple network growth model. In this model, at each time step a random node, referred to as a mother node, is selected for duplication [40–47]. The new, daughter node retains a copy of each link of the mother node with probability $p$. In this model the daughter node does not form a link to the mother node, and thus in the following it is referred to as the uncorded ND model. In the case that none of these links is retained, the daughter node remains isolated and is removed from the network. In such a case, a new mother node is randomly selected for duplication and the growth process continues.
Note that as $p$ is decreased, the probability that the daughter node will be discarded increases and the network growth process slows down. It was shown that for $0 < p < 1/2$ the degree distribution of this network follows a power law of the form

$$P_t(K=k) \sim k^{-\gamma}. \tag{1}$$

For $0 < p < 1/e$, where $e$ is the base of the natural logarithm, the exponent is given by the nontrivial solution of the equation

$$\gamma = 3 - p^{\gamma-2}, \tag{2}$$

while for $1/e \le p < 1/2$ the exponent is $\gamma = 2$ [44]. In the former case the mean degree, $\langle K \rangle_t$, converges to an asymptotic value while in the latter case it diverges logarithmically with the network size. For $1/2 \le p \le 1$ ...
FIG. 1: (Color online) Illustration of the corded ND model. A random node, referred to as a mother node, M (empty circle), is selected for duplication. The newly created daughter node, D (empty circle), forms a deterministic edge (solid line) to the mother node, and with probability $p$ it forms a probabilistic edge (dashed line) to each one of the neighbors of M. In this example, D forms links to its two sister nodes, denoted by S, but does not form a link to its grandmother node, denoted by GM. In this illustration, all the other edges (solid lines) are deterministic edges.

Recently, a new node duplication model was introduced and studied [48, 49]. In this model, referred to as the corded ND model, starting from a seed network which consists of a single connected component of $s$ nodes, at each time step a random mother node, M, is selected for duplication. The daughter node, D, is added to the network. It forms a link to its mother node, M, and is also connected with probability $p$ to each neighbor of M (Fig. 1). It was shown that for $0 < p < 1/2$ ... $1/2 \le p \le 1$ ... The probability of a node of degree $k$ to be a neighbor of the randomly selected mother node is proportional to $k$. Therefore, the degrees of the neighbors of the mother node selected at time $t$ are drawn from the distribution

$$\tilde{P}_t(K=k) = \frac{k\,P_t(K=k)}{\langle K \rangle_t}. \tag{3}$$

The daughter node forms a link to each one of these nodes with probability $p$. Thus, the probability that the daughter node will form a link to a node of degree $k$ is proportional to $\tilde{P}_t(K=k)$.

The degree distribution of the corded ND network was studied in Refs. [48, 49]. It was found that for $0 < p < 1/2$, in the asymptotic limit, the degree distribution of this network follows Eq. (1), where the exponent $\gamma = \gamma(p)$ is given by the non-trivial solution of the equation

$$\gamma = 1 + p^{-1} - p^{\gamma-2}. \tag{4}$$

This solution, $\gamma = \gamma(p)$, is a monotonically decreasing function of $p$ in the range $0 < p < 1/2$. In the limit of $p \to 0$, the exponent diverges like $\gamma(p) \sim 1/p$, while $\gamma(1/2) = 2$. In the asymptotic limit, the mean degree is given by

$$\langle K \rangle = \frac{2}{1-2p}, \tag{5}$$

while the second moment of the degree distribution is given by [49]

$$\langle K^2 \rangle = \left(\frac{2}{1-2p}\right)\left(\frac{3+2p-p^2}{1-2p-p^2}\right), \qquad 0 < p < \sqrt{2}-1. \tag{6}$$

The sparse network regime can be divided into two parts. For $0 < p < \sqrt{2}-1$ the exponent satisfies $\gamma(p) > 3$, thus in this range the first two moments, $\langle K \rangle$ and $\langle K^2 \rangle$, are finite. For $\sqrt{2}-1 < p < 1/2$, the exponent $\gamma$ takes values in the range $2 < \gamma(p) < 3$, thus in this range the first moment is finite while the second moment diverges. Using Eqs. (5) and (6), it is found that the connective constant

$$\lambda = \frac{\langle K^2 \rangle - \langle K \rangle}{\langle K \rangle} \tag{7}$$

of the corded ND network is given by

$$\lambda = \frac{2(1+2p)}{1-2p-p^2}. \tag{8}$$

While in the asymptotic limit the probability $P(K=k)$ may be non-zero for any integer value of $k$, for a finite network of $N$ nodes it is bounded in the range $1 \le k \le k_{\max}$, where $k_{\max} = N-1$.

... $C_1, C_2, \dots, C_r$, which were cited in B [58]. The resulting network module consists of $r$ triangles, or triadic closures, which share the AB edge. Since the links of this network are pointing backwards, each one of these triangles can be considered as a feed-backward loop (FBL).

The corded ND model exhibits a unique structure, which is radically different from configuration model networks with the same degree distribution. Unlike the configuration model network [28, 39], which may include small, isolated clusters, the corded ND network consists of a single connected component. Therefore, unlike the configuration model, it does not exhibit a percolation transition. Also, while the configuration model network exhibits a local tree-like structure, the ND network includes a large number of triangles and other short cycles even in the dilute case of $0 < p < 1/2$.

In the limit of $p = 0$, the corded ND network is a tree, which consists only of the mother-daughter edges. This tree turns out to form a backbone for the corded ND network at $p > 0$, and is thus referred to as the backbone tree. Once a mother node is selected for duplication, the mother-daughter edge is added deterministically. Therefore, the edges of the backbone tree are called deterministic edges. The other edges, which exist only for $p > 0$, are called probabilistic edges. In the limit of $p = 0$, where the corded ND network is a tree, the shortest path between any pair of nodes is unique. In fact, on a tree structure the shortest path is the only path between any pair of nodes. Since the path which resides on the backbone tree consists only of deterministic edges, it is referred to as the deterministic path. For $p > 0$ ...

... $0 < p < 1/2$, we derive a master equation for the time evolution of the probabilities $P_t(L=\ell)$, where $\ell = 1, 2, \dots$ is the distance between a pair of nodes and $t$ is the time. The derivation of the master equation requires information on the structure of the backbone tree and on the degeneracies of the shortest paths. Solving the master equation we obtain an expression for $P_t(L=\ell)$, which consists of two convolution-like sums. The first sum emanates from the DSPL of the seed network, $P_0(L=\ell)$, while the second sum involves a discrete exponential function. We calculate the mean distance, $\langle L \rangle_t$, and the diameter, $\Delta_t$, and show that in the long-time limit they scale like $\ln t$, namely the corded ND network is a small-world network [14–17]. Interestingly, this behavior differs from other scale-free networks which are ultrasmall, namely their mean distance follows $\langle L \rangle_t \sim \ln \ln t$ [18].

The paper is organized as follows. In Sec. II we present the corded ND model. In Sec. III we analyze the backbone tree, consisting of the mother-daughter edges. In Sec. IV we consider the degeneracies of the shortest paths in the corded ND network. Using the results of Secs. III and IV we derive, in Sec. V, a master equation for the time evolution of the DSPL and solve it analytically. In Sec. VI we study properties of the DSPL. The mean distance is studied in Sec. VII, the diameter is evaluated in Sec. VIII and the variance of the DSPL is obtained in Sec. IX. The results are discussed in Sec. X and summarized in Sec. XI.

FIG. 2: (Color online) Two instances of corded ND networks of size $N = 50$, shown in panels (a) and (b) for two values of $p$. ... $L = 2$ to $L = 1$, forming a triangle. Increasing $p$ makes the network denser.

II. THE CORDED NODE DUPLICATION MODEL
Consider the corded ND model introduced in Refs. [48, 49]. At each time step, a random node, referred to as the mother node, is selected for duplication. The daughter node is added to the network, forming a link to the mother node and with probability $p$ to each neighbor of the mother node [48, 49]. The growth process starts from an initial seed network of $N_0 = s$ nodes. Thus, the network size after $t$ time steps is $N_t = t + s$.

In Fig. 2 we present two instances of the corded ND network, of size $N_t = 50$, which were formed around the same backbone tree. Both networks were grown from a seed of size $s = 2$, with two different values of $p$. ...

Since the mother node at time $t$ is selected randomly from all the $N_t$ nodes in the network, its degree is effectively drawn from the degree distribution $P_t(K=k)$. The mother node gains a link to the daughter node, thus its degree increases by 1. By construction, the degree of the daughter node cannot exceed the degree of the mother node. In case that all the links are duplicated, the degree of the daughter node is equal to the degree of the mother node, while in case that none of them is duplicated the degree of the daughter node is 1.

In order to obtain a connected network, it is required that the seed network consist of a single connected component. The size of the seed network is denoted by $s$ and its degree distribution is $P_0(K=k)$. The mean degree of the seed network is denoted by $\langle K \rangle_0$. The DSPL of the seed network is denoted by $P_0(L=\ell)$ and the mean distance is denoted by $\langle L \rangle_0$. The DSPL and the mean degree are related by $P_0(L=1) = \langle K \rangle_0/(s-1)$. The probability $P_0(L=\ell)$ may take non-zero values for $\ell = 1, 2, \dots, \Delta_0$, where $\Delta_0$ is the diameter of the seed network, while $P_0(L=\ell) = 0$ for $\ell \ge \Delta_0 + 1$. For seed networks of $s$ nodes, $\Delta_0$ may take values in the range $1 \le \Delta_0 \le s-1$.

One choice of the seed network is a complete graph of $s$ nodes. In this case, the degree distribution of the seed network is $P_0(K=k) = \delta_{k,s-1}$.
The DSPL of the seed network is $P_0(L=\ell) = \delta_{\ell,1}$, where $\delta_{i,j}$ is the Kronecker delta, and its diameter is $\Delta_0 = 1$. To avoid memory effects, which slow down the convergence to the asymptotic structure, it is often convenient to use a seed network which consists of a single node, namely $s = 1$. In this case the degree distribution of the seed network is given by $P_0(K=k) = \delta_{k,0}$, while its DSPL is not defined. However, the DSPL becomes well defined at time $t = 1$, when the network consists of a pair of connected nodes, whose degree distribution is given by $P_1(K=k) = \delta_{k,1}$, its DSPL is $P_1(L=\ell) = \delta_{\ell,1}$ and its diameter is $\Delta_1 = 1$. Another interesting choice for the seed network is a linear chain of $s$ nodes. In this case, the initial degree distribution is $P_0(K=k) = (2/s)\,\delta_{k,1} + (1-2/s)\,\delta_{k,2}$, and the initial DSPL is

$$P_0(L=\ell) = \frac{s-\ell}{\binom{s}{2}}, \tag{9}$$

for $\ell = 1, 2, \dots, s-1$. This choice captures the largest possible diameter in a seed network of $s$ nodes, namely $\Delta_0 = s-1$.
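The growth rule of this section can be sketched in a few lines of Python. This is only an illustrative simulation under stated assumptions, not the code used by the authors: the function names `grow_corded_nd` and `measure_dspl`, the adjacency-set representation and the breadth-first search are all choices made here. The first function applies the duplication step (one deterministic mother-daughter edge, plus a copy of each of the mother's links with probability `p`), and the second measures the DSPL of the resulting network, so that, for example, the linear chain seed can be compared with Eq. (9).

```python
import random
from collections import deque


def grow_corded_nd(t_steps, p, seed_adj, rng):
    """Grow a corded ND network; seed_adj is a list of neighbor sets (hypothetical layout)."""
    adj = [set(neighbors) for neighbors in seed_adj]
    for _ in range(t_steps):
        mother = rng.randrange(len(adj))      # uniformly random mother node
        daughter = len(adj)                   # index of the newly added node
        links = {mother}                      # deterministic mother-daughter edge
        for neighbor in sorted(adj[mother]):  # each link of the mother is copied with prob. p
            if rng.random() < p:
                links.add(neighbor)
        adj.append(links)
        for node in links:
            adj[node].add(daughter)
    return adj


def measure_dspl(adj):
    """Return {l: fraction of node pairs at distance l}, using BFS from every node."""
    n = len(adj)
    counts = {}
    for source in range(n):
        dist = [-1] * n
        dist[source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for d in dist[source + 1:]:           # count each unordered pair once
            counts[d] = counts.get(d, 0) + 1
    pairs = n * (n - 1) // 2
    return {l: c / pairs for l, c in sorted(counts.items())}


if __name__ == "__main__":
    chain = [{1}, {0, 2}, {1, 3}, {2, 4}, {3}]   # linear chain seed with s = 5
    print(measure_dspl(chain))                    # to be compared with (s - l) / C(s, 2)
    network = grow_corded_nd(500, 0.2, [{1}, {0}], random.Random(0))
    print(measure_dspl(network))
```

Averaging `measure_dspl` over many independently grown networks would give an ensemble estimate of $P_t(L=\ell)$; the all-pairs search used here is quadratic in the network size, so it is only practical for small networks.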
III. THE BACKBONE TREE
The mother-daughter links in the corded ND network form a random tree structure, which serves as a backbone tree for the resulting network. The backbone tree is a random recursive tree [59–61]. To study its properties, one can take the limit of $p = 0$, in which the corded ND network is reduced to the backbone tree. The degree distribution of the backbone tree, denoted by $P^B_t(K=k)$, evolves in time according to

$$P^B_{t+1}(K=k) = \frac{1}{N_t+1}\left[(N_t-1)\,P^B_t(K=k) + P^B_t(K=k-1) + \delta_{k,1}\right]. \tag{10}$$

The second term on the right hand side accounts for the degree of the mother node, which increases by 1 due to the link to the daughter node. The third term accounts for the degree of the daughter node (which is $K = 1$), while the first term accounts for all the other nodes in the network. Subtracting $P^B_t(K=k)$ from both sides of Eq. (10) and replacing the difference on the left hand side by a time derivative we obtain

$$\frac{d}{dt}P^B_t(K=k) = \frac{1}{N_t+1}\left[-2P^B_t(K=k) + P^B_t(K=k-1) + \delta_{k,1}\right]. \tag{11}$$

In the long time limit, the degree distribution is expected to reach a steady state, in which the time derivative vanishes. The steady state solution of Eq. (11) is given by

$$P^B(K=k) = \frac{1}{2^k}. \tag{12}$$

The corresponding tail distribution is given by $P^B(K>k) = 1/2^k$. Note that the degree distribution of the backbone tree, given by Eq. (12), is a discrete exponential distribution. It is very different from the degree distribution of the full corded ND network, which is a power-law distribution. Eq. (12) captures important properties of the network. In particular, it shows that half of the nodes in the backbone tree are leaf nodes, which have only one link. One fourth of the nodes in the backbone tree have two links, namely they lie along linear chains with no branching. The remaining nodes are branching points with three or more links.

It is useful to define a conditional degree distribution of the form $P^B(K=k \mid K>k_0)$, namely the degree distribution of all the nodes of degree $K > k_0$. The conditional degree distribution can be expressed in the form

$$P^B(K=k \mid K>k_0) = \frac{P^B(K=k;\,K>k_0)}{P^B(K>k_0)}. \tag{13}$$

Thus, for $k > k_0$ it is given by

$$P^B(K=k \mid K>k_0) = \frac{1}{2^{k-k_0}}. \tag{14}$$

For example, this means that nodes which are not leaves (namely of degree $k > 1$) are of degree 2 with probability of $1/2$, are of degree 3 with probability of $1/4$, and so on.
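As a quick numerical check of the steady state, the recursion of Eq. (10) can be iterated directly. The sketch below assumes a seed of two connected nodes (so that $P^B(K=1) = 1$ and $N = 2$ initially) and truncates the degree at a cutoff `k_max`; both choices, as well as the function name, are assumptions made here for illustration. After many steps the result should approach $2^{-k}$, as in Eq. (12).

```python
def backbone_degree_distribution(t_steps, k_max=40):
    """Iterate Eq. (10) from a two-node seed; returns prob[k] for k = 0..k_max."""
    n = 2                                  # current number of nodes, N_t
    prob = [0.0] * (k_max + 1)
    prob[1] = 1.0                          # both seed nodes have degree 1
    for _ in range(t_steps):
        new = [0.0] * (k_max + 1)
        for k in range(1, k_max + 1):
            daughter = 1.0 if k == 1 else 0.0      # the delta_{k,1} term
            new[k] = ((n - 1) * prob[k] + prob[k - 1] + daughter) / (n + 1)
        prob = new
        n += 1
    return prob


if __name__ == "__main__":
    prob = backbone_degree_distribution(20000)
    for k in range(1, 7):
        print(k, prob[k], 0.5 ** k)        # iterated value next to 2^{-k}
```

The cutoff `k_max` discards only the negligible probability of degrees above 40, which is why the truncated recursion remains a good approximation.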
IV. THE DEGENERACY OF THE SHORTEST PATHS
Consider a pair of nodes, $i$ and $j$, which are at a distance $L = \ell$ from each other. The shortest path from $i$ to $j$ may be unique or it may be degenerate. In case that the shortest path is degenerate, there are at least two different paths of length $\ell$ from $i$ to $j$ (which may have overlapping segments). In particular, the degenerate paths may differ in the first step, starting from node $i$. Here we focus on the degeneracy of the first step, namely on the number of neighbors of node $i$ which reside on shortest paths from $i$ to $j$. We denote the distribution of degeneracy levels of the first steps of the shortest paths by $P(G=g)$, where $g = 1, 2, \dots$. In order to calculate the distribution $P(G=g)$ we follow the growth process of the network and consider the shortest path from the newly formed daughter node, D, to a randomly selected target node T. It is important to note that the distances $L_{DT}$ between the daughter node, D, and all the existing nodes, T, in the network are determined upon formation of the node D. This is due to the fact that nodes and edges which will be added later cannot form paths between D and T which are shorter than $L_{DT}$. However, they can form additional paths of length $L_{DT}$, thus increasing the degeneracy of the shortest paths. Since the shortest paths on the backbone tree are unique, it is expected that for $p \ll 1/2$ the distribution $P(G=g)$ will sharply decrease as $g$ is increased. Therefore, we will focus below on the probability of a double degeneracy, $P(G=2)$.

It turns out that there are two growth scenarios which give rise to a double degeneracy of the shortest path from the daughter node, D, to a random target node T. In the first scenario, two probabilistic edges form an alternate path of length $L = 2$ between nodes D and GM, which is degenerate with the shortest path which goes along the branch of the backbone tree.
In the second scenario, there are two probabilistic edges which form shortcuts between pairs of nodes which are next-nearest neighbors on the backbone tree. As a result, they give rise to two degenerate paths of length $L = 2$, where each path consists of one deterministic edge and one probabilistic edge.

The first scenario is shown in Fig. 3(a). In this scenario, the node D has an older sister, S, which is connected to node GM via a probabilistic edge. In case that D forms a probabilistic edge to S, these two probabilistic edges form an alternate path of length $L = 2$ from D to GM. The probability of this scenario is proportional to $p^2$. In general, node D may have several sister nodes. The number of such sister nodes is given by $k-2$, where $k$ is the degree of the mother node, M. Therefore, the probability that the path from D to T will be doubly degenerate due to the mechanism of Fig. 3(a) is

P_a(G = 2) = Σ_{k=3}^{∞} ( k − ) P^B(K = k | K > p (1 − p)^{k −} , (15)

where $P^B(K=k \mid K>2)$ is the conditional degree distribution of the backbone tree, given by Eq. (14). Evaluating the right hand side of Eq. (15) we find that $P_a(G=2) = p^2 + O(p^3)$.

The second scenario is shown in Fig. 3(b). In this case, the mother node, M, is connected not only to its own mother node, GM, but also (with probability $p$) to its grandmother node, referred to as GGM. Upon formation of node D, it may form (with probability $p$) a probabilistic edge to node GM. In such a case, there are two degenerate paths from D to GGM. The probability of this scenario is $P_b(G=2) = p^2 + O(p^3)$.

It can be shown that the two scenarios presented above are mutually exclusive, thus the overall probability for the shortest path to be doubly degenerate is

$$P(G=2) = P_a(G=2) + P_b(G=2) = 2p^2 + O(p^3). \tag{16}$$

A careful analysis shows that the lowest order contribution to $P(G=3)$ is of order $p^4$, because at least four probabilistic edges are required. Therefore, to leading order we obtain

$$\begin{aligned} P(G=1) &= 1 - 2p^2 + O(p^3) \\ P(G=2) &= 2p^2 + O(p^3) \\ P(G=3) &= O(p^4). \end{aligned} \tag{17}$$

FIG. 3: Illustrations of two local network structures which give rise to double degeneracy, $G = 2$, in the first step of the shortest path between the daughter node, D, on the right and a target node, T, which resides further down the branch, on the left. In both illustrations, solid lines correspond to deterministic edges, which belong to the backbone tree, while dashed lines correspond to probabilistic edges. (a) An alternate path of length $L = 2$ is formed by probabilistic edges between nodes D and GM, via a sister node, S. This path is degenerate with the primary path which resides fully on the backbone tree. Such a structure may form in two distinct sequences of events. One possibility is that S is an older sister which was connected to GM upon formation. When D forms, it connects probabilistically to S and completes the alternate path. In the other possibility, S is a younger sister of D, conditioned on D not forming a probabilistic edge to GM upon formation. When S is formed, it connects simultaneously to GM and to D, thus forming the alternate path. (b) In this structure, the distance between D and GGM along the backbone tree is $L = 3$, while the shortest paths in the entire network are of length $L = 2$. This is achieved by two consecutive probabilistic shortcuts, one from M to GGM (created upon formation of M) and the other from D to GM (created upon formation of D). Note that in this case, the existence of sisters of D (younger or older) makes no difference.

Truncating the distribution $P(G=g)$ at a degree $g = g_{\max}$, its moments can be expressed by

$$\langle G^n \rangle = \sum_{g=1}^{g_{\max}} g^n P(G=g). \tag{18}$$

Taking $g_{\max} = 2$, we find that $\langle G \rangle = 1 + 2p^2$ and $\langle G^2 \rangle = 1 + 6p^2$.

V. THE DISTRIBUTION OF SHORTEST PATH LENGTHS
Consider an instance of the corded ND network with a distance matrix $\mathbf{L}_t$ of dimensions $N_t \times N_t$, where $\mathbf{L}_t(i,j) = \ell_{ij}(t)$ is the distance between nodes $i$ and $j$ at time $t$. A useful property of the corded ND model is that the addition of the daughter node never shortens the distance between any pair of existing nodes, $i$ and $j$, in the network, namely $\ell_{ij}(t) = \ell_{ij}$ is fixed. Thus, the distance matrix $\mathbf{L}_{t+1}$ consists of the matrix $\mathbf{L}_t$, with the addition of a row (and a column) which account for the distances between the daughter node, D, and the rest of the network. This property enables us to express the DSPL at time $t+1$ as a superposition of the DSPL at time $t$ and the DSPL between the daughter node, D, and the rest of the network.

Choosing a random node, $i$, one can describe the shell structure around such a node by the distance distribution

$$P_t(L=\ell) = \frac{N_t(L=\ell)}{N_t-1}, \tag{19}$$

where $N_t(L=\ell)$ is the number of nodes in the shell at distance $\ell$ from node $i$. At each time step, $t$, a random node M, referred to as a mother node, is chosen for duplication. The new, daughter node, D, is then connected to the mother node, and with probability $p$ to each one of its neighbors. The shell structure around the daughter node is closely related to that of the mother node. Those neighbors of M for which the link to M is copied end up at distance $L = 1$ from D. Those neighbors of M for which the link to M is not copied end up at distance $L = 2$ from D. Therefore, the first shell around the daughter node is given by

$$P^D_t(L=1) = p\,P^M_t(L=1) + \frac{1}{N_t-1}, \tag{20}$$

where $P^M_t(L=\ell)$ is the distance distribution around the mother node. More generally, nodes which are at distance $L = \ell$ from the mother node may end up either at distance $L = \ell$ or at distance $L = \ell+1$ from the daughter node. To exemplify this property, consider a target node T at distance $L = \ell$ from the mother node, M.
A shortest path from M to T consistsof a set of nodes M , r , r , . . . , r ℓ − , T in which subsequent nodes are connected. In the casethat the edge between M and r is copied, node T ends up at a distance L = ℓ from D, whilein case it is not copied node T ends up at a distance L = ℓ + 1 from D. In the case that thereis a single shortest path from M to T, the former scenario would occur with probability p while the latter scenario would occur with probability 1 − p , namely P D t ( L = ℓ ) = pP M t ( L = ℓ ) + (1 − p ) P M t ( L = ℓ − , (21)where ℓ ≥
2. However, since the shortest path from M to T may be degenerate, thecalculation of P D t ( L = ℓ ) requires a more careful attention. We express the DSPL betweenthe daughter node, D, and the rest of the network in the form P D t ( L = ℓ ) = ηP M t ( L = ℓ ) + (1 − η ) P M t ( L = ℓ − . (22)where ℓ ≥ < η <
1. The assumption made here is that η = η ( p ) does not dependon the path length L .In order to evaluate the parameter η , consider a random target node, T, which is atdistance ℓ from the mother node, M. In the simplest case, the shortest path, of length ℓ from M to T is unique. However, it may be degenerate, in which case there are severalpaths of length ℓ from M to T. Here we are concerned with the degeneracy of the first stepalong the shortest paths. This degeneracy is given by the number of nearest neighbors of Mwhich reside on at least one shortest path from M to T, and is denoted by G MT . Clearly, G MT ≤ k M , where k M is the degree of the mother node, M.Consider a pair of nodes M and T, which are at a distance L = ℓ from each other, wherethe degeneracy level of the shortest paths is given by G MT = g . In the case that node M ischosen for duplication, if none of the g links of M which reside on shortest paths to T areduplicated, the distance between the daughter node D and T becomes L = ℓ + 1, while in thecase that at least one of these g edges is duplicated, the distance is L = ℓ . Since each linkof the mother node, M, is duplicated with probability p , the probability that none of them15s duplicated is (1 − p ) g . The probability that at least one of these g links will be duplicatedis 1 − (1 − p ) g . In order to account for the probabilistic nature of the degeneracy, we denotethe probability that the first step in the shortest path between random nodes M and T is g -fold degenerate by P t ( G = g ). Thus, the probability, η = η ( p ), that at least one of the g neighbors of the mother node, M, which reside along shortest paths to T are connected tothe daughter node can be expressed by1 − η = ∞ X g =1 (1 − p ) g P ( G = g ) , (23)or more concisely by 1 − η = (cid:10) (1 − p ) G (cid:11) . (24)In fact, Eq. 
(23) can also be expressed in the form1 − η = F (1 − p ) , (25)where F ( x ) = ∞ X g =1 x g P ( G = g ) (26)is the generating function of P ( G = g ).For simplicity we assume that the distribution P ( G MT = g ) does not depend on L MT ,except for the case of L MT = 1, in which P ( G = g ) = δ g, . If this assumption holds, itguarantees that the assumption made above that η is independent of L MT is valid.Using the binomial expansion of (1 − p ) g in Eq. (23), it can be expressed in the form η = − ∞ X n =1 ( − n B n p n , (27)where B n = ∞ X g = n (cid:18) gn (cid:19) P ( G = g ) (28)is the n th binomial moment of P ( G = g ). The first two terms in this expansion are η = B p − B p , where B = h G i and B = ( h G i − h G i ) /
2. Taking the first term in Eq. (27),where B = 1 + 2 p , we obtain 16 = p + 2 p + O ( p ) . (29)While paths of length L = 1 are non-degenerate, for simplicity we replace the parameter p by η also in the equation for P D t ( L = 1). Since p and η differ from each other only inorder p , while P t ( L = 1) is quickly reduced to order 1 /N t , the error introduced by thisapproximation is negligible.Assuming that the mother node, M, is a typical node, we replace the distribution P M t ( L = ℓ ) by P t ( L = ℓ ). As a result, Eqs. (20) and (22) are replaced by P D t ( L = 1) = ηP t ( L = 1) + 1 N t − , (30)and P D t ( L = ℓ ) = ηP t ( L = ℓ ) + (1 − η ) P t ( L = ℓ − , (31)respectively, where ℓ ≥
2. After the node duplication step is completed, the DSPL at time t + 1 is given by P t +1 ( L = ℓ ) = N t − N t + 1 P t ( L = ℓ ) + 2 N t + 1 P D t ( L = ℓ ) − P t ( L = ℓ )( N t − N t + 1) , (32)where the third term on the right hand side accounts for the dilution of the probability P t +1 ( L = ℓ ) due to the addition of the mother-daughter edge to the network. Subtracting P t ( L = ℓ ) from both sides of Eq. (32) and replacing the difference on the left hand side bya time derivative, we obtain ddt P t ( L = ℓ ) = − N t + 1 P t ( L = ℓ ) + 2 N t + 1 P D t ( L = ℓ ) − P t ( L = ℓ )( N t − N t + 1) , (33)where N t = t + s . Plugging in the expressions for P D t ( L = ℓ ) from Eqs. (30) and (31) weobtain ddt P t ( L = 1) = − (cid:18) − ηt + s + 1 (cid:19) P t ( L = 1) + 2 [1 − P t ( L = 1)]( t + s − t + s + 1) , (34)and 17 dt P t ( L = ℓ ) = − (cid:18) − ηt + s + 1 (cid:19) P t ( L = ℓ ) + 2 (cid:18) − ηt + s + 1 (cid:19) P t ( L = ℓ − − t + s − t + s + 1) P t ( L = ℓ ) , (35)where ℓ ≥
2. The solution of Eqs. (34) and (35), for s ≥
2, is given by

\[
P_t(L=1) = \frac{s-1}{t+s-1}\left(\frac{s+1}{t+s+1}\right)^{1-2\eta} P_0(L=1) + \frac{2}{(1-2\eta)(t+s-1)}\left[1-\left(\frac{s+1}{t+s+1}\right)^{1-2\eta}\right], \tag{36}
\]

and

\[
P_t(L=\ell) = \left(\frac{s-1}{s+1}\right)\left(\frac{t+s+1}{t+s-1}\right)\sum_{\ell'=1}^{\min\{\ell,\Delta_0\}} \frac{e^{-c_t} c_t^{\ell-\ell'}}{(\ell-\ell')!}\, P_0(L=\ell') + \frac{1}{(1-\eta)(s+1)}\left(\frac{t+s+1}{t+s-1}\right)\sum_{\ell'=0}^{\infty} \frac{e^{-c_t} c_t^{\ell+\ell'}}{(\ell+\ell')!}\, e^{-\mu\ell'}, \tag{37}
\]

for $\ell \ge 2$, where

\[
c_t = 2(1-\eta)\ln\left(\frac{t+s+1}{s+1}\right), \tag{38}
\]

and

\[
\mu = \ln\left(\frac{2-2\eta}{1-2\eta}\right). \tag{39}
\]

The parameter $\eta$ is given by Eq. (23). Note that for $\eta = 1/2$ the factor $e^{-\mu} = 0$, thus all the terms in the second sum of Eq. (37) vanish except for the term $\ell' = 0$. For $1/2 < \eta < 1$ one replaces $e^{-\mu\ell'}$ by

\[
\left(\frac{1-2\eta}{2-2\eta}\right)^{\ell'} = (-1)^{\ell'}\left|\frac{1-2\eta}{2-2\eta}\right|^{\ell'}. \tag{40}
\]

Thus, for $\eta > 1/2$ the second sum in Eq. (37) consists of positive terms for even values of $\ell'$ and negative terms for odd values of $\ell'$.

Eqs. (36) and (37) provide a closed form expression for the DSPL of the corded ND network at time $t$ for any size and degree distribution of the seed network. The first term in each of these equations accounts for the effect of the DSPL of the seed network, $P_0(L=\ell)$, while the second term does not depend on the initial DSPL. The first sum in Eq. (37) is a convolution between the DSPL of the seed network and a Poisson distribution. The second sum is a convolution between an exponential function and a Poisson distribution.

Eq. (37) can also be written in the form

\[
P_t(L=\ell) = \left(\frac{s-1}{s+1}\right)\left(\frac{t+s+1}{t+s-1}\right)\sum_{\ell'=1}^{\min\{\ell,\Delta_0\}} \frac{e^{-c_t} c_t^{\ell-\ell'}}{(\ell-\ell')!}\, P_0(L=\ell') + \frac{1}{(1-\eta)(s+1)}\left(\frac{t+s+1}{t+s-1}\right) e^{-c_t(1-e^{-\mu})}\, e^{\mu\ell}\left[1-\frac{\Gamma(\ell, c_t e^{-\mu})}{\Gamma(\ell)}\right], \tag{41}
\]

where $\Gamma(x)$ is the Gamma function and $\Gamma(x,y)$ is the incomplete Gamma function.

In Fig. 4 we present the parameter $\eta$ as a function of $p$. The theoretical results (solid line), obtained from Eq. (29), are found to be in good agreement with computer simulations (circles). The value of $\eta$ extracted from the simulations is the value which provides the best fit to the DSPL of Eq. (37), when incorporated in Eq. (22). Since $\eta = \eta(p)$ increases faster than linearly with $p$, there is a point $0 < p^* < 1/2$ for which $\eta(p^*) = 1/2$. Solving Eq. (29) for $\eta(p^*) = 1/2$ we obtain $p^* = [(9+\sqrt{105})/72]^{1/3} - [3(9+\sqrt{105})]^{-1/3} \simeq 0.385$.

In Fig. 5 we present the DSPL, $P_t(L=\ell)$ vs. $\ell$, for an ensemble of corded ND networks of size $N_t = 10^{[\ldots]}$, grown from a seed network of size $s = 2$, with $p = 0.1$, $0.2$, $0.3$ and $0.4$. For small values of $p$, the analytical results (solid lines) are in very good agreement with the simulation results (circles). As $p$ is increased, the analytical results become shifted to the right compared to the simulation results. The simulation data were averaged over 100 network instances.

VI. PROPERTIES OF THE DSPL
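As a numerical companion to the special cases discussed in this section, the following minimal sketch evaluates the DSPL for a complete-graph seed, using Eq. (42) for $\ell = 1$ and Eq. (43) for $\ell \ge 2$. The function name `dspl_complete_seed`, the truncation parameters `l_max` and `n_terms`, and the restriction to $0 \le \eta < 1/2$ are assumptions made for illustration; they are not part of the original analysis.

```python
import math


def dspl_complete_seed(t, s, eta, l_max=150, n_terms=300):
    """Return [P_t(L=1), ..., P_t(L=l_max)] for a complete-graph seed.

    Illustrative sketch: assumes t >= 1, s >= 2 and 0 <= eta < 1/2
    (Eq. (42) is singular at eta = 1/2). Infinite sums are truncated.
    """
    n = t + s                                       # network size N_t
    x = (s + 1) / (n + 1)
    c = 2.0 * (1.0 - eta) * math.log(1.0 / x)       # c_t, Eq. (38)
    kappa = (1.0 - 2.0 * eta) / (2.0 - 2.0 * eta)   # e^{-mu}, Eq. (39)
    ratio = (n + 1) / (n - 1)

    def poisson(m):
        # Poisson weight e^{-c} c^m / m!, evaluated in log space
        return math.exp(-c + m * math.log(c) - math.lgamma(m + 1))

    # Eq. (42): probability of distance 1
    power = x ** (1.0 - 2.0 * eta)
    p1 = (s - 1) / (n - 1) * power \
        + 2.0 / ((1.0 - 2.0 * eta) * (n - 1)) * (1.0 - power)

    probs = [p1]
    for ell in range(2, l_max + 1):
        # Eq. (43): seed contribution plus contribution of new paths
        seed_term = (s - 1) / (s + 1) * poisson(ell - 1)
        new_term = sum(poisson(ell + m) * kappa ** m for m in range(n_terms))
        new_term /= (1.0 - eta) * (s + 1)
        probs.append(ratio * (seed_term + new_term))
    return probs


if __name__ == "__main__":
    dspl = dspl_complete_seed(t=1000, s=2, eta=0.2)
    mean = sum((i + 1) * q for i, q in enumerate(dspl))
    print("normalization:", sum(dspl))
    print("mean distance:", mean)
```

Checking that the returned probabilities sum to 1 is a quick way to catch transcription errors in the prefactors of Eqs. (42) and (43).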
The first sum in Eq. (37) accounts for paths which emerge from repeated duplication of nodes and edges along paths of the seed network. Note that the probability $P_t(L=\ell)$ is affected only by the initial probabilities $P_0(L=\ell')$ for which $\ell' \le \ell$. This is because the distance from a daughter node to any other node in the network is either equal to the distance from the mother node or larger by 1. The second sum accounts for repeated duplication of nodes and edges along new paths that emerge beyond the seed network. The exponential function accounts for the backbone tree structure which emerges from the edges connecting the mother and daughter nodes. The Poisson distribution accounts for the probabilistic connections to the neighbors of the mother node. Both sums in Eq. (37) involve the same Poisson distribution, $P(m) = e^{-c_t} c_t^m/m!$, whose mean, $c_t$, is given by Eq. (38). The first sum runs over terms in the range $m = \ell-1, \ell-2, \ldots, \max\{0, \ell-s+1\}$, while the second sum runs over terms in the range $m = \ell, \ell+1, \ldots, \infty$.

FIG. 4: (Color online) The parameter $\eta$ as a function of the probability $p$. This parameter represents the probability that the distance between the daughter node, D, and a random target node, T, is equal to the distance between the mother node, M, and T, namely $\eta = P(L_{DT} = L_{MT})$. Hence, the probability that $L_{DT} = L_{MT} + 1$ is given by $1 - \eta$. The theoretical results (solid line), obtained from Eq. (29), are found to be in good agreement with the simulation results (circles). For small values of $p$, where the shortest paths are most likely to be unique, $\eta$ is equal to $p$. As $p$ is increased, the shortest paths become degenerate. As a result, $\eta$ acquires a nonlinear dependence on $p$, making it larger than $p$.

Below we consider some special cases and limits in which the expression for $P_t(L=\ell)$ can be simplified. In particular, we study specific choices of the seed network, such as a complete graph of $s$ nodes, and the special case of a single node, in which $s = 1$. We also consider specific values of the parameter $p$, such as $p = 0$, in which the corded ND network is reduced to the backbone tree. Another special case is the value of $p$ for which $\eta(p) = 1/2$, where $\mu$ diverges. As a result, the exponentials, $e^{-\mu\ell'}$, in the second sum in Eq. (37) vanish, except for the term with $\ell' = 0$, thus the sum is reduced to a single term.

FIG. 5: (Color online) The DSPL of the corded ND network of $N_t = 10^{[\ldots]}$ nodes with (a) $p = 0.1$, (b) $p = 0.2$, (c) $p = 0.3$, and (d) $p = 0.4$. The theoretical results (solid lines), obtained from Eqs. (36) and (37), are found to be in good agreement with the results of computer simulations (circles), obtained by averaging over 100 instances. As $p$ is increased, the distances become shorter and the DSPL becomes narrower, consistent with Eqs. (58) and (74). The agreement is better for smaller values of $p$. In fact, Eqs. (36) and (37) are exact, while the deviations from the simulation results are due to the underestimate of $\eta$, as can be seen in Fig. 4.

A convenient choice for the seed network is a complete graph of $s \ge 2$ nodes, for which $P_0(L=1) = 1$ and $P_0(L \ge 2) = 0$. The expression for the DSPL at time $t$ is simplified to

\[
P_t(L=1) = \frac{s-1}{t+s-1}\left(\frac{s+1}{t+s+1}\right)^{1-2\eta} + \frac{2}{(1-2\eta)(t+s-1)}\left[1-\left(\frac{s+1}{t+s+1}\right)^{1-2\eta}\right], \tag{42}
\]

and

\[
P_t(L=\ell) = \left(\frac{t+s+1}{t+s-1}\right)\left[\left(\frac{s-1}{s+1}\right)\frac{e^{-c_t} c_t^{\ell-1}}{(\ell-1)!} + \frac{1}{(1-\eta)(s+1)}\sum_{\ell'=0}^{\infty}\frac{e^{-c_t} c_t^{\ell+\ell'}}{(\ell+\ell')!}\, e^{-\mu\ell'}\right], \tag{43}
\]

for $\ell \ge 2$, where $c_t$ is given by Eq. (38) and $\mu$ is given by Eq. (39).

When the seed network consists of a single node, $s = 1$, the probability $P_0(L=\ell)$ is not defined. However, after one time step, at $t = 1$, the network consists of a pair of connected nodes, for which $P_1(L=1) = 1$ and $P_1(L \ge 2) = 0$. Thus, the ensemble of networks obtained at time $t$ for a seed network of size $s = 1$ is identical to the network ensemble obtained at time $t-1$ for a seed network of size $s = 2$, namely $P_t(L=\ell \,|\, s=1) = P_{t-1}(L=\ell \,|\, s=2)$. The DSPL of the resulting ND network, for $t \ge 1$, takes the form

\[
P_t(L=1) = -\left(\frac{1+2\eta}{1-2\eta}\right)\left(\frac{3}{t+2}\right)^{1-2\eta}\frac{1}{t} + \left(\frac{2}{1-2\eta}\right)\frac{1}{t}, \tag{44}
\]

and

\[
P_t(L=\ell) = \frac{1}{3}\left(\frac{t+2}{t}\right)\left[\frac{e^{-c_t} c_t^{\ell-1}}{(\ell-1)!} + \frac{1}{1-\eta}\sum_{\ell'=0}^{\infty}\frac{e^{-c_t} c_t^{\ell+\ell'}}{(\ell+\ell')!}\, e^{-\mu\ell'}\right], \tag{45}
\]

for $\ell \ge 2$, where $c_t$ is given by Eq. (38), evaluated for $s = 2$ at time $t-1$, namely $c_t = 2(1-\eta)\ln[(t+2)/3]$, and $\mu$ is given by Eq. (39).

When $p = 0$, each daughter node is formed with a single edge connecting it to its mother node. In this case, the corded ND network is reduced to the backbone tree. Upon formation of the daughter node, all the paths from it to existing nodes go through the mother node. They are thus longer by 1 than the paths starting from the mother node. In this case, Eq. (36) is simplified to

\[
P_t(L=1) = \frac{(s-1)(s+1)}{(t+s-1)(t+s+1)}\, P_0(L=1) + \frac{2t}{(t+s-1)(t+s+1)}. \tag{46}
\]

For $p = 0$ the parameters $\eta$ and $\mu$ take the values $\eta = 0$ and $\mu = \ln 2$. Thus, Eq. (41) is reduced to

\[
P_t(L=\ell) = \left(\frac{s-1}{s+1}\right)\left(\frac{t+s+1}{t+s-1}\right)\sum_{\ell'=1}^{\min\{\ell,\Delta_0\}} \frac{e^{-c_t} c_t^{\ell-\ell'}}{(\ell-\ell')!}\, P_0(L=\ell') + \left(\frac{t+s+1}{t+s-1}\right)\left(\frac{e^{-c_t/2}\, 2^{\ell}}{s+1}\right)\left[1-\frac{\Gamma(\ell, c_t/2)}{\Gamma(\ell)}\right], \tag{47}
\]

where

\[
c_t = 2\ln\left(\frac{t+s+1}{s+1}\right). \tag{48}
\]

Another interesting case appears for $p = p^*$, where $\eta = \eta(p^*) = 1/2$. In this case Eq. (37) is reduced to

\[
P_t(L=\ell) = \left(\frac{t+s+1}{t+s-1}\right)\left[\left(\frac{s-1}{s+1}\right)\sum_{\ell'=1}^{\min\{\ell,\Delta_0\}} \frac{e^{-c_t} c_t^{\ell-\ell'}}{(\ell-\ell')!}\, P_0(L=\ell') + \frac{2}{s+1}\,\frac{e^{-c_t} c_t^{\ell}}{\ell!}\right]. \tag{49}
\]

For the special case in which the seed network is a complete graph, Eq. (49) is further reduced to the form

\[
P_t(L=\ell) = \left(\frac{t+s+1}{t+s-1}\right)\left[\left(\frac{s-1}{s+1}\right)\frac{e^{-c_t} c_t^{\ell-1}}{(\ell-1)!} + \frac{2}{s+1}\,\frac{e^{-c_t} c_t^{\ell}}{\ell!}\right], \tag{50}
\]

where $\ell \ge 1$.

VII. THE MEAN DISTANCE
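The differential equation for the mean distance, Eq. (53), and its closed-form solution, Eq. (54), can be cross-checked numerically. The sketch below integrates Eq. (53) with a fourth-order Runge-Kutta scheme and compares the result with Eq. (54). The function names, the step size `dt` and the chosen parameter values are illustrative assumptions rather than part of the original derivation.

```python
import math


def mean_distance_closed_form(t, s, eta, mean0):
    """Eq. (54): mean distance at time t, given the seed mean distance mean0."""
    n = t + s
    growth = (n + 1) / (n - 1)
    return (2.0 * (1.0 - eta) * growth * math.log((n + 1) / (s + 1))
            + (s - 1) / (s + 1) * growth * mean0
            - 2.0 / (s + 1) * (1.0 - 2.0 * eta) * t / (n - 1))


def mean_distance_rk4(t_end, s, eta, mean0, dt=0.01):
    """Integrate Eq. (53) from t = 0 to t_end with classical RK4 (s >= 2)."""

    def rhs(t, mean):
        n = t + s
        denom = (n - 1) * (n + 1)
        return -2.0 * mean / denom + 2.0 * (1.0 - eta) / (n + 1) + 2.0 / denom

    steps = int(round(t_end / dt))
    t, mean = 0.0, mean0
    for _ in range(steps):
        k1 = rhs(t, mean)
        k2 = rhs(t + 0.5 * dt, mean + 0.5 * dt * k1)
        k3 = rhs(t + 0.5 * dt, mean + 0.5 * dt * k2)
        k4 = rhs(t + dt, mean + dt * k3)
        mean += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        t += dt
    return mean


if __name__ == "__main__":
    # Complete-graph seed of s = 3 nodes, for which the seed mean distance is 1
    numeric = mean_distance_rk4(t_end=50.0, s=3, eta=0.2, mean0=1.0)
    exact = mean_distance_closed_form(t=50.0, s=3, eta=0.2, mean0=1.0)
    print(numeric, exact)
```

Agreement between the two values indicates that the prefactors in Eqs. (53) and (54) are mutually consistent; it does not test the value of $\eta$ itself, which enters only as an input.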
The mean distance between a random pair of nodes in the corded ND network is given by

\[
\langle L \rangle_t = \sum_{\ell=1}^{\infty} \ell\, P_t(L=\ell). \tag{51}
\]

Taking the time derivative of Eq. (51) and plugging in the expressions for $dP_t(L=1)/dt$ from Eq. (34) and for $dP_t(L=\ell)/dt$ from Eq. (35) we obtain

\[
\frac{d}{dt}\langle L \rangle_t = \frac{2(\eta-1)(t+s) - 2\eta}{(t+s-1)(t+s+1)}\sum_{\ell=1}^{\infty} \ell\, P_t(L=\ell) + \frac{2(1-\eta)}{t+s+1}\sum_{\ell=1}^{\infty} (\ell+1)\, P_t(L=\ell) + \frac{2}{(t+s-1)(t+s+1)}. \tag{52}
\]

Rearranging terms we obtain

\[
\frac{d}{dt}\langle L \rangle_t = -\frac{2}{(t+s-1)(t+s+1)}\langle L \rangle_t + \frac{2(1-\eta)}{t+s+1} + \frac{2}{(t+s-1)(t+s+1)}. \tag{53}
\]

Solving Eq. (53) we obtain

\[
\langle L \rangle_t = 2(1-\eta)\left(\frac{t+s+1}{t+s-1}\right)\ln\left(\frac{t+s+1}{s+1}\right) + \left(\frac{s-1}{s+1}\right)\left(\frac{t+s+1}{t+s-1}\right)\langle L \rangle_0 - \left(\frac{2}{s+1}\right)(1-2\eta)\left(\frac{t}{t+s-1}\right). \tag{54}
\]

In the long time limit, Eq. (54) is reduced to

\[
\langle L \rangle_t = 2(1-\eta)\ln\left(\frac{t+s+1}{s+1}\right) + C_1 + C_2, \tag{55}
\]

where

\[
C_1 = \left(\frac{s-1}{s+1}\right)\langle L \rangle_0 \tag{56}
\]

and

\[
C_2 = -\left(\frac{2}{s+1}\right)\left[1 - \frac{\eta(1-2\eta)}{1-\eta}\right]. \tag{57}
\]

The term $C_1$ accounts for the effect of the DSPL of the seed network on $\langle L \rangle_t$. The term $C_2$ is a negative term which depends on $p$ and $s$. For $0 < p < p^*$ (where $0 < \eta < 1/2$) it satisfies $-2/(s+1) < C_2 < 0$. For $p > p^*$ it becomes smaller than $-2/(s+1)$, thus reducing the mean distance $\langle L \rangle_t$. In conclusion, in the long time limit the mean distance scales logarithmically with the network size, according to

\[
\langle L \rangle_t \simeq 2(1-\eta)\ln\left(\frac{t+s+1}{s+1}\right), \tag{58}
\]

which means that the corded ND network is a small-world network.

In Fig. 6 we present the mean distance, $\langle L \rangle_t$, as a function of the network size $N_t$, for $p = 0.1$, $0.2$, $0.3$ and $0.4$. The theoretical results, obtained from Eq. (55), where $\eta$ is taken from Eq. (29), are found to be in good agreement with computer simulations (symbols).

FIG. 6: (Color online) The mean shortest path length, $\langle L \rangle_t$, of the corded ND network as a function of network size $N_t$. The theoretical results (solid lines), obtained from Eq. (58), where $\eta$ is taken from Eq. (29), are generally in good agreement with the simulation results (symbols), confirming the logarithmic dependence on the network size. As $p$ is increased, the mean shortest path length decreases. As in Fig. 5, the deviation between the theory and simulation increases as $p$ is increased, due to the fact that the exact value of $\eta$ is not known. For clarity, we focus on network sizes in the range $10 \le N_t \le [\ldots]$.

VIII. THE DIAMETER
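The diameter analysis in this section relies on the Lambert $W$ function, culminating in the ratio of Eq. (69) and its small-$p$ expansion, Eq. (70). The following sketch evaluates both. To stay self-contained it uses a simple Newton iteration for the principal branch of $W$ on $x \ge 0$ (a library routine could be used instead); the function names and tolerances are assumptions made for illustration, and the relation $\eta = p + 2p^3$ is used for the dependence of $\eta$ on $p$.

```python
import math


def lambert_w(x, tol=1e-14, max_iter=100):
    """Principal branch of the Lambert W function for x >= 0 (Newton iteration)."""
    if x < 0.0:
        raise ValueError("this sketch only handles x >= 0")
    w = math.log1p(x)  # upper bound on W(x) for x >= 0, used as starting point
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w


def diameter_to_mean_ratio(eta):
    """Eq. (69): ratio between the diameter and the mean distance (0 < eta < 1)."""
    arg = eta / ((1.0 - eta) * math.e)
    return eta / ((1.0 - eta) * lambert_w(arg))


def ratio_small_p(p):
    """Eq. (70): expansion of the ratio up to second order in p."""
    return math.e + p + (2.0 * math.e - 1.0) / (2.0 * math.e) * p ** 2


if __name__ == "__main__":
    for p in (0.01, 0.05, 0.1, 0.2):
        eta = p + 2.0 * p ** 3
        print(p, diameter_to_mean_ratio(eta), ratio_small_p(p))
```

For small $p$ the two columns should approach each other and tend to $e$, while for larger $p$ the truncated expansion is expected to drift away from the full expression.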
Consider an ensemble of corded ND networks of size $N_t$. In each instance of the network there are $N_t(N_t-1)/2$ pairs of nodes, whose distances follow the distribution $P_t(L=\ell)$. The expectation value of the number of pairs of nodes which reside at a distance $L = \ell$ from each other is given by

\[
N_t(L=\ell) = \frac{N_t(N_t-1)}{2}\, P_t(L=\ell), \tag{59}
\]

where $N_t = t + s$. For sufficiently long times ($t \gg s$), the effect of the seed network is reduced and the DSPL exhibits a well defined peak, above which $P_t(L=\ell)$ gradually decreases. As a result, the tail of the DSPL exhibits a distance $\Delta_t$, at which $N_t(L=\Delta_t) = 1$, which can be considered as the expectation value of the diameter of the network. Below, we use this criterion to evaluate the diameter. For simplicity, we consider the case in which the initial network is a complete graph of $s$ nodes. Note that a network resulting at time $t$ from a seed network of size $s = 1$ is equivalent to a network at time $t-1$ grown from a seed network of size $s = 2$. Thus, in the analysis below there is no need to treat the case of $s = 1$ separately. Considering the large network limit, and focusing on the large distance tail of the DSPL, it can be expressed by

\[
P_t(L=\ell) = \left(\frac{s-1}{s+1}\right)\frac{e^{-c_t} c_t^{\ell-1}}{(\ell-1)!}. \tag{60}
\]

For convenience, we write $c_t$ in the form $c_t = 2(1-\eta)\ln t_s$, where

\[
t_s = \frac{t+s+1}{s+1} \tag{61}
\]

is the network size at time $t+1$, expressed in units of the network size at time $t = 1$. Inserting the expression for $c_t$ into Eq. (60) and using the Stirling formula we find that

\[
N_t(L=\ell) = \frac{(s^2-1)\, t_s^{2\eta}}{4\sqrt{\pi(1-\eta)\ln t_s}}\left(\frac{2(1-\eta)\, e \ln t_s}{\ell}\right)^{\ell}. \tag{62}
\]

Inserting $N_t(L=\Delta_t) = 1$ in Eq. (62) we obtain

\[
\left(\frac{\Delta_t}{2e(1-\eta)\ln t_s}\right)^{\Delta_t} = \frac{(s^2-1)\, t_s^{2\eta}}{4\sqrt{\pi(1-\eta)\ln t_s}}. \tag{63}
\]

Taking a logarithm on both sides and rearranging terms, Eq. (63) can be expressed in the form

\[
\left(\frac{\Delta_t}{2(1-\eta)\, e \ln t_s}\right)\ln\left(\frac{\Delta_t}{2(1-\eta)\, e \ln t_s}\right) = \frac{4\eta\ln t_s - \ln[16\pi(1-\eta)\ln t_s] + 2\ln(s^2-1)}{4(1-\eta)\, e \ln t_s}. \tag{64}
\]

Applying the Lambert $W$ function [62] on both sides and using the relation $W(ze^z) = z$, we obtain

\[
\ln\left(\frac{\Delta_t}{2(1-\eta)\, e \ln t_s}\right) = W\left[\frac{4\eta\ln t_s - \ln[16\pi(1-\eta)\ln t_s] + 2\ln(s^2-1)}{4(1-\eta)\, e \ln t_s}\right], \tag{65}
\]

or

\[
\Delta_t = 2(1-\eta)\, e\, \exp\left\{W\left[\frac{4\eta\ln t_s - \ln[16\pi(1-\eta)\ln t_s] + 2\ln(s^2-1)}{4(1-\eta)\, e \ln t_s}\right]\right\}\ln t_s. \tag{66}
\]

Taking the long time limit, we can approximate the argument of the $W(x)$ function. The numerator can be replaced by its leading term, which is $4\eta\ln t_s$, thus

\[
\Delta_t = 2(1-\eta)\, e^{1+W\left[\frac{\eta}{(1-\eta)e}\right]}\ln t_s. \tag{67}
\]

Using again the above mentioned property of the $W(x)$ function, we obtain that the expectation value, $\Delta_t$, of the diameter of the corded ND network is given by

\[
\Delta_t \simeq \frac{2\eta}{W\left[\frac{\eta}{(1-\eta)e}\right]}\ln\left(\frac{t+s+1}{s+1}\right). \tag{68}
\]

The diameter thus scales logarithmically with the network size, namely it exhibits the same scaling as the mean distance $\langle L \rangle_t$. However, the coefficient is larger than the coefficient of the mean distance. Using Eqs. (58) and (68) we find that

\[
\frac{\Delta_t}{\langle L \rangle_t} = \frac{\eta}{(1-\eta)\, W\left[\frac{\eta}{(1-\eta)e}\right]}. \tag{69}
\]

In the dilute network limit, where $p \ll 1$, the parameter $\eta$ also satisfies $\eta \ll 1$. Using the leading terms in the Taylor expansion of the Lambert $W$ function, given by $W(x) = \sum_{n=1}^{\infty} (-n)^{n-1} x^n/n!$, and the relation $\eta = p + 2p^3$, we obtain

\[
\frac{\Delta_t}{\langle L \rangle_t} = e + p + \frac{2e-1}{2e}\, p^2 + O(p^3). \tag{70}
\]

Thus, in the limit of $p \ll 1$, the diameter satisfies $\Delta_t \simeq e\,\langle L \rangle_t$. This is in contrast to the case of configuration model networks, where $\Delta = \langle L \rangle + \delta$, where $\delta$ is an additive constant [24, 63].

In Fig. 7 we present the diameter of the corded ND network as a function of the network size for $p = 0.1$, $0.2$, $0.3$ and $0.4$. The analytical results (solid lines), obtained from Eq. (68), where $\eta$ is taken from Eq. (29), confirm that the diameter scales logarithmically with the network size. The analytical results over-estimate the slope compared to the simulation results (symbols). This is due to the fact that the argument used to estimate $\Delta_t$ does not account for correlations between the longest distances in a given instance of the network. Thus, the result of Eq. (68) may be considered as an upper bound for the diameter. The simulation data were averaged over 100 network instances.

FIG. 7: (Color online) The diameter $\Delta_t$ of the corded ND network as a function of network size, $N_t$. The theoretical results (solid lines), obtained from Eq. (68), where $\eta$ is taken from Eq. (29), are found to be in good agreement with the simulation results (symbols). The results confirm the logarithmic dependence of the diameter on the network size. As $p$ is increased, the diameter decreases.

IX. THE VARIANCE OF THE DSPL
In order to obtain the variance of the DSPL, we need to calculate its second moment, given by $\langle L^2 \rangle_t = \sum_{\ell=1}^{\infty} \ell^2 P_t(L=\ell)$. Taking the time derivative of $\langle L^2 \rangle_t$ and plugging in the expressions for $dP_t(L=1)/dt$ from Eq. (34) and for $dP_t(L=\ell)/dt$ from Eq. (35) we obtain

\[
\frac{d}{dt}\langle L^2 \rangle_t = -\frac{2}{(t+s-1)(t+s+1)}\langle L^2 \rangle_t + \frac{4(1-\eta)}{t+s+1}\langle L \rangle_t + \frac{2(1-\eta)(t+s-1) + 2}{(t+s-1)(t+s+1)}, \tag{71}
\]

where $\langle L \rangle_t$ is given by Eq. (55). Keeping only the leading terms we obtain

\[
\frac{d}{dt}\langle L^2 \rangle_t = \frac{2(1-\eta)\left\{4(1-\eta)\ln\left[(t+s+1)/(s+1)\right] + 2C_1 + 2C_2 + 1\right\}}{t+s+1}. \tag{72}
\]

Note that as $p$ approaches 1/ [...]

FIG. 8: (Color online) The standard deviation of the DSPL, $\sigma_t$, as a function of network size, $N_t$. The theoretical results (solid lines), obtained from Eq. (75), where $\eta$ is taken from Eq. (29), are found to be in good agreement with the simulation results (symbols).

\[
\langle L^2 \rangle_t = \langle L^2 \rangle_0 + \left[2(1-\eta)\ln\left(\frac{t+s+1}{s+1}\right)\right]^2 + 2(2C_1 + 2C_2 + 1)(1-\eta)\ln\left(\frac{t+s+1}{s+1}\right). \tag{73}
\]

Thus, the variance $\sigma_t^2 = \langle L^2 \rangle_t - \langle L \rangle_t^2$ is given by

\[
\sigma_t^2 = 2(1-\eta)\ln\left(\frac{t+s+1}{s+1}\right) + \langle L^2 \rangle_0 - (C_1 + C_2)^2. \tag{74}
\]

In the long time limit, Eq. (74) can be simplified to the form

\[
\sigma_t^2 = 2(1-\eta)\ln\left(\frac{t+s+1}{s+1}\right) + O(1), \tag{75}
\]

which highlights the logarithmic scaling. Comparing Eqs. (54) and (74) we find that to leading order $\sigma_t^2 = \langle L \rangle_t$, which is the result obtained in the case of a Poisson distribution.

In Fig. 8 we present the standard deviation, $\sigma_t$, of the DSPL of the corded ND model as a function of network size, $N_t$. The analytical results (solid lines), obtained from Eq. (75), where $\eta$ is taken from Eq. (29), are found to be in good agreement with the results of numerical simulations (symbols), thus the logarithmic scaling is confirmed.

X. DISCUSSION

The mean distance, $\langle L \rangle_t$, of the corded ND network was found to scale logarithmically with the network size, $N_t$, according to $\langle L \rangle_t \simeq 2(1-\eta)\ln N_t$, and it is thus a small world network. A similar logarithmic scaling is observed in other random networks such as configuration model networks. However, the pre-factor of the logarithmic term is different. In configuration model networks the mean distance is given by [16, 17]

\[
\langle L \rangle = \frac{1}{\ln\left(\frac{\langle K^2 \rangle - \langle K \rangle}{\langle K \rangle}\right)}\ln N. \tag{76}
\]

The pre-factor of $\ln N$ is equal to the inverse of the logarithm of the connective constant, which is expressed in terms of the first two moments of the degree distribution. Using Eq. (24), the mean distance of the corded ND network can be expressed in the form

\[
\langle L \rangle_t \simeq 2\left\langle (1-p)^G \right\rangle \ln N_t. \tag{77}
\]

Thus, the mean distance in the corded ND network is expressed in terms of the generating function of the distribution of degeneracy levels, $P(G=g)$, unlike the configuration model in which it is given in terms of the first two moments of the degree distribution, $P(K=k)$.

In order to compare the quantitative behaviors of the corded ND network and the configuration model network, we present in Fig. 9 the mean distance, $\langle L \rangle_t$, expressed in units of $\ln N_t$, of the corded ND network (dashed line) and of the corresponding configuration model network with the same degree distribution (solid line), as a function of $p$. For the corded ND network, this ratio is

\[
\frac{\langle L \rangle_t}{\ln N_t} \simeq 2(1-\eta), \tag{78}
\]

where $\eta$ is given by Eq. (29). For the corresponding configuration model network, it is expressed by

\[
\frac{\langle L \rangle}{\ln N} = \frac{1}{\ln\left(\frac{\langle K^2 \rangle - \langle K \rangle}{\langle K \rangle}\right)}, \tag{79}
\]

where $\langle K \rangle$ is given by Eq. (5) and $\langle K^2 \rangle$ is given by Eq. (6). It is found that for the corded ND network this ratio is of order 1 for the whole range of sparse networks, while in the corresponding configuration model network it decreases as $p$ is increased until it falls sharply to zero at $p = \sqrt{2} - 1$.

FIG. 9: (Color online) The mean distance, $\langle L \rangle = \langle L \rangle_t$, expressed in units of $\ln N = \ln N_t$, namely $\langle L \rangle/\ln N \simeq 2(1-\eta)$, of the corded ND network as a function of the parameter $p$ (dashed line), and the corresponding ratio, $\langle L \rangle/\ln N = 1/\ln[(\langle K^2 \rangle - \langle K \rangle)/\langle K \rangle]$, for a configuration model network with the same degree distribution (solid line), where $\langle K \rangle$ is given by Eq. (5) and $\langle K^2 \rangle$ is given by Eq. (6).

We also calculated the diameter, $\Delta_t$, of the corded ND network and found that in the long time limit

\[
\frac{\Delta_t}{\ln N_t} \simeq \frac{2\eta}{W\left[\frac{\eta}{(1-\eta)e}\right]}, \tag{80}
\]

where $\eta = p + 2p^3$. For $p \ll 1$, using the Taylor expansion of the Lambert $W$ function we obtain

\[
\frac{\Delta_t}{\ln N_t} = 2(1-\eta)\left(e + p + \frac{2e-1}{2e}\, p^2\right) + O(p^3). \tag{81}
\]

Thus, in the limit of $p \ll 1$, the diameter is given by $\Delta_t \simeq 2e(1-\eta)\ln N_t$, namely it is larger by a factor of $e$ than the mean distance, $\langle L \rangle_t$. This is in contrast to the case of configuration model networks, where $\Delta = \langle L \rangle + \delta$, and $\delta$ is an additive constant [24, 63].

The variance of the DSPL was found to scale like

\[
\sigma_t^2 = 2(1-\eta)\ln N_t, \tag{82}
\]

namely the variance scales linearly with the mean distance, which reflects the dominance of the Poisson distribution in the DSPL. Thus, the variance of the DSPL in the corded ND network is much larger than in the corresponding configuration model networks, in which the DSPL tends to be narrow.

It will be interesting to generalize the analysis presented here to the calculation of the DSPL of the uncorded ND network, in which there is no link between the mother and daughter nodes. A useful simplifying property of the corded ND model studied here is that the daughter node is never discarded, namely each randomly selected mother node is actually duplicated. This guarantees that the degree of the mother node selected at time $t$ is drawn from the instantaneous degree distribution, $P_t(K=k)$. In the uncorded ND model this is not the case, because the probability that the daughter node will form a link to at least one neighbor of the mother node, and thus will be added to the network, depends on the degree of the mother node. The conditional probability that the daughter node will be added to the network, given that the mother node is of degree $k$, is $P_t(\mathrm{added} \,|\, K=k) = 1 - (1-p)^k$. Using Bayes' theorem, it can be shown that the degree distribution of the mother node under the condition that the daughter node was actually added to the network is

\[
P_t(K=k \,|\, \mathrm{added}) = \frac{1-(1-p)^k}{1-G_t(1-p)}\, P_t(K=k), \tag{83}
\]

where $G_t(x) = \sum_k x^k P_t(K=k)$ is the generating function of the degree distribution at time $t$.
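The conditional degree distribution of Eq. (83) is straightforward to evaluate for any given degree distribution. The sketch below does so for a distribution supplied as a dictionary; the function name `mother_degree_given_added` and the toy two-point distribution in the usage example are hypothetical and serve only to illustrate the formula.

```python
def mother_degree_given_added(degree_dist, p):
    """Eq. (83): degree distribution of the mother node, given that the
    daughter node was added to the network (uncorded ND model).

    degree_dist maps each degree k to P_t(K = k). The sketch assumes p > 0
    and that some degree k >= 1 has nonzero probability, so that the
    normalization 1 - G_t(1 - p) is positive.
    """
    # Generating function G_t(x) evaluated at x = 1 - p
    g = sum(prob * (1.0 - p) ** k for k, prob in degree_dist.items())
    norm = 1.0 - g
    return {k: (1.0 - (1.0 - p) ** k) * prob / norm
            for k, prob in degree_dist.items()}


if __name__ == "__main__":
    # Toy example: half of the nodes have degree 1 and half have degree 2
    print(mother_degree_given_added({1: 0.5, 2: 0.5}, p=0.5))
```

In this toy example the conditional distribution shifts weight toward the higher degree, which illustrates why the mother nodes of the uncorded model are not simply random nodes.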
The fact that $P_t(K=k \,|\, \mathrm{added})$ is different from $P_t(K=k)$ is expected to make the calculation of the DSPL more difficult, because the mother nodes in this case are not simply random nodes. The DSPL between a node, $i$, of degree $k_i$ and the rest of the network depends on $k_i$. It will thus be necessary to derive a set of master equations for the conditional DSPLs, $P_t(L=\ell \,|\, K=k)$, between a random node of degree $k$ and all other nodes in the network.

XI. SUMMARY
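The growth rule of the corded ND model summarized in this section can be simulated directly. The following minimal sketch grows a network from a complete-graph seed and measures the mean distance by breadth-first search. The function names, the adjacency-set representation and the parameter values in the usage example are assumptions made for illustration; this is not the simulation code used to produce the figures.

```python
import random
from collections import deque


def grow_corded_nd(s, t, p, seed=None):
    """Grow a corded ND network for t steps from a complete graph of s nodes.

    Returns a list of neighbor sets, indexed by node.
    """
    rng = random.Random(seed)
    adj = [set(j for j in range(s) if j != i) for i in range(s)]
    for _ in range(t):
        mother = rng.randrange(len(adj))
        daughter = len(adj)
        # The daughter always links to the mother, and to each neighbor
        # of the mother independently with probability p.
        links = {mother}
        for neighbor in adj[mother]:
            if rng.random() < p:
                links.add(neighbor)
        adj.append(links)
        for node in links:
            adj[node].add(daughter)
    return adj


def mean_distance(adj):
    """Mean shortest path length over all pairs (network of at least 2 nodes)."""
    total, pairs = 0, 0
    for source in range(len(adj)):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


if __name__ == "__main__":
    network = grow_corded_nd(s=2, t=998, p=0.2, seed=0)
    print(len(network), mean_distance(network))
```

Because the daughter node is always linked to its mother, the network stays connected, so every breadth-first search reaches all nodes. Averaging over many instances and network sizes would be needed before comparing with the logarithmic scaling discussed above.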
We have studied a node duplication network model, referred to as the corded ND model, in which at each time step a random mother node is selected for duplication. The daughter node is connected deterministically to the mother node, and is also connected, with probability $p$, to each one of its neighbors. We focused on the regime of dilute networks, obtained for $0 < p < 1/2$. We derived a master equation for the time evolution of $P_t(L=\ell)$. Finding an exact analytical solution of the master equation, we obtained a closed form expression for the DSPL, in which the probability $P_t(L=\ell)$ is expressed as a sum of two terms. The first term is a convolution between the DSPL of the seed network, $P_0(L=\ell)$, and a Poisson distribution. The second term is a convolution between a discrete exponential function and the Poisson distribution. We calculated the mean distance $\langle L \rangle_t$ and showed that in the long time limit it scales like $\langle L \rangle_t \simeq 2(1-\eta)\ln N_t$, where $N_t$ is the network size at time $t$. The mean distance thus scales logarithmically with the network size, which means that the corded ND network is a small world network. Interestingly, this behavior differs from other scale-free networks which are ultrasmall, namely their mean distance follows $\langle L \rangle_t \sim \ln\ln N_t$ [18].

[1] R. Albert and A.-L. Barabási, Statistical mechanics of complex networks, Rev. Mod. Phys., 47 (2002).
[2] G. Caldarelli, Scale free networks: complex webs in nature and technology (Oxford University Press, 2007).
[3] S. Havlin and R. Cohen, Complex Networks: Structure, Robustness and Function (Cambridge University Press, 2010).
[4] M.E.J. Newman, Networks: an Introduction (Oxford University Press, 2010).
[5] E. Estrada, The Structure of Complex Networks: Theory and Applications (Oxford University Press, 2011).
[6] A. Barrat, M. Barthélemy and A. Vespignani, Dynamical Processes on Complex Networks (Cambridge University Press, 2012).
[7] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii and U. Alon, Science, 824 (2002).
[8] U. Alon, An Introduction to Systems Biology: Design Principles of Biological Circuits (Chapman and Hall/CRC, 2006).
[9] A.-L. Barabási and R. Albert, Science, 509 (1999).
[10] H. Jeong, B. Tombor, R. Albert, Z.N. Oltvai and A.-L. Barabási, Nature, 651 (2000).
[11] P.L. Krapivsky, S. Redner and F. Leyvraz, Phys. Rev. Lett., 4629 (2000).
[12] P.L. Krapivsky and S. Redner, Phys. Rev. E, 066123 (2001).
[13] A. Vázquez, Phys. Rev. E, 056104 (2003).
[14] S. Milgram, Psychology Today, 61 (1967).
[15] D. Watts and S. Strogatz, Nature, 440 (1998).
[16] F. Chung and L. Lu, Proc. Natl. Acad. Sci. USA, 15879 (2002).
[17] F. Chung and L. Lu, Internet Mathematics, 91 (2003).
[18] R. Cohen and S. Havlin, Phys. Rev. Lett., 058701 (2003).
[19] L. Giot et al., Science
[20] ... Science, 1078 (2005).
[21] E.W. Dijkstra, Numerische Mathematik 1, 269 (1959).
[22] D. Delling, P. Sanders, D. Schultes and D. Wagner, Engineering Route Planning Algorithms, in Algorithmics of Large and Complex Networks: Design, Analysis, and Simulation, J. Lerner, D. Wagner and K.A. Zweig (Eds.), p. 117 (2009).
[23] R. Pastor-Satorras, C. Castellano, P. Van Mieghem and A. Vespignani, Rev. Mod. Phys., 925 (2015).
[24] B. Bollobas, Random Graphs, Second Edition (Academic Press, London, 2001).
[25] R. Durrett, Random Graph Dynamics (Cambridge University Press, Cambridge, 2007).
[26] A. Fronczak, P. Fronczak and J.A. Holyst, Phys. Rev. E, 056110 (2004).
[27] M.E.J. Newman, Proc. Natl. Acad. Sci. USA, 404 (2001).
[28] M.E.J. Newman, S.H. Strogatz and D.J. Watts, Phys. Rev. E, 026118 (2001).
[29] S.N. Dorogovtsev, J.F.F. Mendes and A.N. Samukhin, Nuclear Physics B, 307 (2003).
[30] V.D. Blondel, J.-L. Guillaume, J.M. Hendrickx and R.M. Jungers, Phys. Rev. E, 066101 (2007).
[31] R. van der Hofstad, G. Hooghiemstra and D. Znamenski, Electronic Journal of Probability, 703 (2007).
[32] H. van der Esker, R. van der Hofstad and G. Hooghiemstra, J. Stat. Phys., 169 (2008).
[33] J. Shao, S.V. Buldyrev, R. Cohen, M. Kitsak, S. Havlin and H.E. Stanley, Europhys. Lett., 48004 (2008).
[34] J. Shao, S.V. Buldyrev, L.A. Braunstein, S. Havlin and H.E. Stanley, Phys. Rev. E, 036105 (2009).
[35] E. Katzav, M. Nitzan, D. ben-Avraham, P.L. Krapivsky, R. Kühn, N. Ross and O. Biham, EPL, 26006 (2015).
[36] P. Erdős and A. Rényi, Publ. Math. Debrecen, 290 (1959); Publ. Math. Inst. Hungar. Acad. Sci., 17 (1960); Bull. Inst. Internat. Statist., 343 (1961).
[37] M. Nitzan, E. Katzav, R. Kühn and O. Biham, Phys. Rev. E, 062309 (2016).
[38] S. Melnik and J.P. Gleeson, arXiv:1604.05521.
[39] M. Molloy and B. Reed, Random Struct. Algorithms, 161 (1995).
[40] A. Bhan, D.J. Galas and T.G. Dewey, Bioinformatics, 1486 (2002).
[41] J. Kim, P.L. Krapivsky, B. Kahng and S. Redner, Phys. Rev. E, 055101 (2002).
[42] F. Chung, L. Lu, T.G. Dewey and D.J. Galas, J. Comput. Biol., 677 (2003).
[43] P.L. Krapivsky and S. Redner, Phys. Rev. E, 036118 (2005).
[44] I. Ispolatov, P.L. Krapivsky and A. Yuryev, Phys. Rev. E, 061911 (2005).
[45] I. Ispolatov, P.L. Krapivsky, I. Mazo and A. Yuryev, New J. Phys., 145 (2005).
[46] G. Bebek, P. Berenbrink, C. Cooper, T. Friedetzky, J. Nadeau and S.C. Sahinalp, Theor. Comput. Sci., 239 (2006).
[47] S. Li, K.P. Choi and T. Wu, Theor. Comput. Sci., 94 (2013).
[48] R. Lambiotte, P.L. Krapivsky, U. Bhat and S. Redner, Phys. Rev. Lett., 218301 (2016).
[49] U. Bhat, P.L. Krapivsky, R. Lambiotte and S. Redner, Phys. Rev. E, 062302 (2016).
[50] S. Ohno, Evolution by Gene Duplication (Springer-Verlag, New York, 1970).
[51] S.A. Teichmann and M.M. Babu, Nature Genetics, 492 (2004).
[52] Except for the case in which the duplicated gene is an auto-regulator, namely a transcription factor that regulates its own expression. In this case, one of the copies may end up regulating the other.
[53] R. Toivonen, L. Kovanen, M. Kivelä, J.-P. Onnela, J. Saramäki and K. Kaski, Social Networks, 240 (2009).
[54] M. Granovetter, American Journal of Sociology, 1360 (1973).
[55] S. Redner, Eur. Phys. J. B, 131 (1998).
[56] S. Redner, Physics Today, 49 (2005).
[57] F. Radicchi, S. Fortunato and C. Castellano, Proc. Natl. Acad. Sci. USA, 17268 (2008).
[58] G.J. Peterson, S. Pressé and K.A. Dill, Proc. Natl. Acad. Sci. USA, 16023 (2010).
[59] R.T. Smythe and H. Mahmoud, Theory Probab. Math. Statist., 1 (1995).
[60] M. Drmota and B. Gittenberger, Random Struct. Alg., 421 (1997).
[61] M. Drmota and H.-K. Hwang, Adv. Appl. Probab., 321 (2005).
[62] F.W.J. Olver, D.M. Lozier, R.F. Boisvert and C.W. Clark, NIST Handbook of Mathematical Functions (Cambridge University Press, Cambridge, 2010).
[63] B. Bollobas, S. Janson and O. Riordan, Random Struct. Alg., 3 (2007).