Generating hierarchical scale-free graphs from fractals
JÚLIA KOMJÁTHY AND KÁROLY SIMON
Abstract.
Motivated by the hierarchical network model of E. Ravasz, A.-L. Barabási, and T. Vicsek [3] and [2], we introduce deterministic scale-free networks derived from a graph-directed self-similar fractal $\Lambda$. With rigorous mathematical results we verify that our model captures some of the most important features of many real networks: the scale-free and the high clustering properties. We also prove that the diameter is of the order of the logarithm of the size of the system. Using our (deterministic) fractal $\Lambda$ we generate a random graph sequence sharing similar properties.

Mathematics Subject Classification. Primary 05C80; Secondary 28A80.
Key words and phrases. Random graphs, scale-free networks.
The research was supported by the NKTH OTKA grant.

1. Introduction

In the last two decades a considerable amount of attention has been paid to the study of complex networks like the World Wide Web, social networks, or biological networks. This resulted in the construction of numerous network models, see e.g. [1], [9], [7], [4], [10], [5]. Most of them use a version of preferential attachment and are of probabilistic nature. A completely different approach was initiated by Barabási, Ravasz, and Vicsek [3]. They introduced deterministic network models generated by a method which is common in constructing fractals. Their model exhibits hierarchical structure and the degree sequence obeys power law decay. To also model the clustering behavior of real networks, Ravasz and Barabási [2] modified the original model so that their deterministic network model preserves the same power law decay and has clustering behavior similar to that of many real networks. Namely, the average local clustering coefficient is independent of the size of the network, and the local clustering coefficient decays inversely proportionally to the degree of the node.

In this paper we generalize both of the models above. Starting from an arbitrary initial bipartite graph $G$ on $N$ vertices, we construct a hierarchical sequence of deterministic graphs $G_n$. Namely, the set of vertices $V(G_n)$ of $G_n$ is $\{0, 1, \dots, N-1\}^n$. To construct $G_n$ from $G_{n-1}$, we take $N$ identical copies of $G_{n-1}$, each of them identified with a vertex of $G$. Then we connect these components in a complicated way described in (1). In this way, $G_n$ contains $N^{n-1}$ copies of $G$, which are connected in a hierarchical manner, see Figures 1(a), 1(b) and 3 for two examples.

There are no triangles in $G_n$. Hence, in order to model the clustering properties of many real networks, we need to extend the set of edges of our graph sequence to destroy the bipartite property. Motivated by [2], we add some additional edges to $G$ to obtain the (no longer bipartite) graph $\widehat G$. Then we build up the graph sequence $\widehat G_n$ as follows: $\widehat G_n$ consists of $N^{n-1}$ copies of $\widehat G$, which are connected to each other in the same way as the copies of $G$ are in $G_n$. So, $\widehat G_n$ and $G_n$ have the same vertex set and their edges only differ at the lowest hierarchical level, that is, within the $N^{n-1}$ copies of $G$ and $\widehat G$, see Figures 3 and 4. We give a rigorous proof of the fact that the average local clustering coefficient of $\widehat G_n$ does not depend on the size, and that the local clustering coefficient of a node with degree $k$ is of order $1/k$.

The embedding of the adjacency matrix of the graph sequence $G_n$ is carried out as follows: a vertex $x = (x_1 \dots x_n)$ is identified with the corresponding $N$-adic interval $I_x$ (see (4)). $\Lambda_n$ is the union of those $N^{-n} \times N^{-n}$ squares $I_x \times I_y$ for which the vertices $x, y$ are connected by an edge in $G_n$. So, $\Lambda_n$ is the most straightforward embedding of the adjacency matrix of $G_n$ into the unit square. $\Lambda_n$ turns out to be a nested sequence of compact sets, which can be considered as the $n$-th approximation of a graph-directed self-similar fractal $\Lambda$ in the plane, see Figure 1(c).
We discuss the connection between the graph theoretical properties of $G_n$ and the properties of the limiting fractal $\Lambda$. Furthermore, using $\Lambda$ we generate a random graph sequence $G^r_n$ in a way which was inspired by the $W$-random graphs introduced by Lovász and Szegedy [10]. See also Diaconis and Janson [6], which contains a list of corresponding references. We show that the degree sequence has power law decay with the same exponent as the deterministic graph sequence $G_n$. Thus we can define a random graph sequence with a prescribed power law decay in a given range. Bollobás, Janson and Riordan [5] considered inhomogeneous random graphs generated by a kernel. Our model is not covered by their construction, since $\Lambda$ is a fractal set of zero two-dimensional Lebesgue measure.

The paper is organized as follows. In Section 2 we define the deterministic model and the associated fractal set $\Lambda$. In Section 3 we verify the scale-free property of $G_n$ (Theorem 3.1), and we compare the Hausdorff dimension of $\Lambda$ to the power law exponent of the degree sequence of $G_n$. Our next result is that both the diameter of $G_n$ and the average length of the shortest path between two vertices are of the order of the logarithm of the size of $G_n$ (Corollary 3.6 and Theorem 3.7). In Section 3.4 we prove the above mentioned properties of the clustering coefficient of $\widehat G_n$ (Theorems 3.11 and 3.13). In Section 4 we describe the randomized model, and in Section 5 we prove that the model exhibits the same power law decay as the corresponding deterministic version.

2. Deterministic model
The model was motivated by the hierarchical graph sequence model in [3], and is given as follows.

2.1. Description of the model.
Let $G$, our base graph, be any labeled bipartite graph on the vertex set $\Sigma = \{0, \dots, N-1\}$. We partition $\Sigma$ into the non-empty sets $V_1, V_2$ such that one of the endpoints of any edge is in $V_1$ and the other is in $V_2$. We write $n_i := |V_i|$, $i = 1, 2$, for the number of elements of $V_i$. The edge set of $G$ is denoted by $E(G)$. If the pair $x, y \in \Sigma$ is connected by an edge, then this edge is denoted by $\binom{x}{y}$, since this notation makes it convenient to follow the labels of the vertices along a path.

Now we define the graph sequence $\{G_n\}_{n \in \mathbb{N}}$ generated by the base graph $G$. The vertex set is $\Sigma^n = \{(x_1 x_2 \dots x_n) : x_i \in \Sigma\}$, the set of all words of length $n$ over the alphabet $\Sigma$. To define the edge set, we need some further definitions.

Definition 2.1.
(1) We assign a type to each element of $\Sigma$. Namely, $\mathrm{typ}(x) = 1$ if $x \in V_1$, and $\mathrm{typ}(x) = 2$ if $x \in V_2$.
(2) We define the type of a word $z = (z_1 z_2 \dots z_n) \in \Sigma^n$ as follows: if all the elements $z_j$, $j = 1, \dots, n$, of $z$ fall in the same $V_i$, $i = 1, 2$, then the type $\mathrm{typ}(z)$ of $z$ is $i$. Otherwise $\mathrm{typ}(z) := 0$.
(3) For $x = (x_1 \dots x_n)$, $y = (y_1 \dots y_n) \in \Sigma^n$ we denote the common prefix by $x \wedge y = (z_1 \dots z_k)$, where $x_i = y_i = z_i$ for all $i = 1, \dots, k$ and $x_{k+1} \neq y_{k+1}$.
(4) Given $x = (x_1 \dots x_n)$, $y = (y_1 \dots y_n) \in \Sigma^n$, the postfixes $\tilde x, \tilde y \in \Sigma^{n - |x \wedge y|}$ are determined by $x = (x \wedge y)\tilde x$ and $y = (x \wedge y)\tilde y$, where the concatenation of the words $a, b$ is denoted by $ab$.

Now we can define the edge set $E(G_n)$. Two distinct vertices $x$ and $y$ in $G_n$ are connected by an edge if and only if the following two conditions hold:
(a) one of the postfixes $\tilde x, \tilde y$ is of type 1 and the other is of type 2;
(b) for each $i > |x \wedge y|$, the coordinate pair $\binom{x_i}{y_i}$ forms an edge in $G$.
In addition, every vertex carries a loop. That is, $E(G_n) \subset \Sigma^n \times \Sigma^n$ is given by
$$E(G_n) = \left\{ \binom{x}{y} \;\middle|\; x = y \ \text{ or } \ \Big( \{\mathrm{typ}(\tilde x), \mathrm{typ}(\tilde y)\} = \{1, 2\} \ \text{ and } \ \binom{x_i}{y_i} \in E(G) \ \ \forall\, |x \wedge y| < i \le n \Big) \right\}. \tag{1}$$

Remark 2.2.
Note that we artificially added all loops to the (otherwise bipartite) graph sequence $G_n$; this makes later calculations easier without losing any of the important properties. In particular, $G_1$ differs from $G$ only in the loops.

Remark 2.3 (Hierarchical structure of $G_n$). For every initial digit $x \in \{0, 1, \dots, N-1\}$, consider the set $W_x$ of vertices $(x_1 \dots x_n)$ of $G_n$ with $x_1 = x$. Then the subgraph induced on $W_x$ is isomorphic to $G_{n-1}$.

We write $\deg_n(x)$ for the degree of a vertex $x$ in $G_n$, including the loop, which increases the degree by 2. However, for $x \in \Sigma$, $\deg(x)$ denotes the degree of $x$ in $G$. In particular, $\deg_1(x) = \deg(x) + 2$. In what follows, we will frequently use $\ell(x)$, the length of the longest terminal block of $x$ which has a nonzero type:
$$\ell(x) := \max_{i \in \mathbb{N}} \{\, i : \mathrm{typ}(x_{n-i+1} \dots x_n) \in \{1, 2\} \,\}. \tag{2}$$

Remark 2.4.
The degree of a node $x \in \Sigma^n$ is $\deg_n(x) = 2 + S(x) \cdot \deg(x_n)$, where
$$S(x) := 1 + \deg(x_{n-1}) + \dots + \deg(x_{n-1})\deg(x_{n-2}) \cdots \deg(x_{n-\ell(x)+1}) = \sum_{r=0}^{\ell(x)-1} \left( \prod_{j=1}^{r} \deg(x_{n-j}) \right), \tag{3}$$
and the empty product is meant to be 1.

The following two examples satisfy the requirements of our general model.
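Rule (1) and the degree formula of Remark 2.4 can be cross-checked by brute force on small cases. The Python sketch below is our own illustration, not code from the paper: the helper names (`typ`, `is_edge`, `ell`, `degree_formula`, `brute_degree`) are hypothetical, words are tuples of digits, and the cherry labels $V_1 = \{1\}$, $V_2 = \{0, 2\}$, $E(G) = \{\binom{1}{0}, \binom{1}{2}\}$ are our reading of Example 2.5. It collects the neighbours of each vertex of $G_n$ directly from (1) and compares the resulting degree with $2 + S(x)\deg(x_n)$.

```python
from itertools import product

def typ(word, V1, V2):
    """Type of a word (Definition 2.1): 1 or 2 if all letters lie in V1 or V2, else 0."""
    if all(c in V1 for c in word):
        return 1
    if all(c in V2 for c in word):
        return 2
    return 0

def adjacent(a, b, E):
    """Undirected adjacency in the base graph G; E holds pairs (u, v), u in V1, v in V2."""
    return (a, b) in E or (b, a) in E

def is_edge(x, y, E, V1, V2):
    """Rule (1): a loop, or postfixes of types {1, 2} whose coordinate pairs are edges of G."""
    if x == y:
        return True
    k = next(i for i in range(len(x)) if x[i] != y[i])  # k = |x wedge y|
    tx, ty = x[k:], y[k:]
    if {typ(tx, V1, V2), typ(ty, V1, V2)} != {1, 2}:
        return False
    return all(adjacent(a, b, E) for a, b in zip(tx, ty))

def ell(x, V1, V2):
    """l(x) from (2): length of the longest terminal block of nonzero type."""
    i = 1  # the last letter alone always has a nonzero type
    while i < len(x) and typ(x[len(x) - i - 1:], V1, V2) != 0:
        i += 1
    return i

def degree_formula(x, base_deg, V1, V2):
    """Remark 2.4: deg_n(x) = 2 + S(x) * deg(x_n), with S(x) as in (3)."""
    n = len(x)
    S, prod = 0, 1
    for r in range(ell(x, V1, V2)):
        if r > 0:
            prod *= base_deg[x[n - 1 - r]]  # multiply in deg(x_{n-r}) (1-indexed)
        S += prod
    return 2 + S * base_deg[x[-1]]

def brute_degree(x, verts, E, V1, V2):
    """Degree in G_n counted directly from rule (1); the loop contributes 2."""
    return 2 + sum(is_edge(x, y, E, V1, V2) for y in verts if y != x)

# Cherry labels as we read Example 2.5 (an assumption about the lost digits).
N, V1, V2 = 3, {1}, {0, 2}
E = {(1, 0), (1, 2)}
base_deg = {v: sum(adjacent(v, w, E) for w in range(N)) for v in range(N)}

for n in (1, 2, 3):
    verts = list(product(range(N), repeat=n))
    assert all(brute_degree(x, verts, E, V1, V2) == degree_formula(x, base_deg, V1, V2)
               for x in verts)

print(degree_formula((1, 1), base_deg, V1, V2), degree_formula((1, 1, 1), base_deg, V1, V2))
```

For the cherry, the word $11$ has degree 8 in $G_2$ (six neighbours plus the loop) and $111$ has degree 16 in $G_3$, in agreement with (3).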
Example 2.5 (Cherry). Barabási, Ravasz and Vicsek [3] introduced the "cherry" model presented in Figures 1(a) and 1(b): let $V_1 = \{1\}$ and $V_2 = \{0, 2\}$, $E(G) = \{\binom{1}{0}, \binom{1}{2}\}$.

Example 2.6 (Fan). Our second example is called "fan", and is defined in Figure 3. Note that here $|V_1| > 1$.

2.2. The embedding of the adjacency matrices into $[0,1]^2$. In this section we investigate the sequence of adjacency matrices corresponding to $\{G_n\}_{n \in \mathbb{N}}$. Roughly speaking, we map them into the unit square, see Figure 1(c).

To represent the adjacency matrix of $G_n$ as a subset of the unit square, we first partition $[0,1]^2$ into $N^{2n}$ congruent boxes, i.e. we divide $[0,1]$ into $N^n$ equal subintervals of length $N^{-n}$, corresponding to the first $n$ digits of the $N$-adic expansion of the elements of $[0,1]$:
$$I_{x_1 \dots x_n} = \left[ \sum_{r=1}^{n} \frac{x_r}{N^r},\ \sum_{r=1}^{n} \frac{x_r}{N^r} + \frac{1}{N^n} \right], \qquad \forall\, (x_1 \dots x_n) \in \Sigma^n. \tag{4}$$
We partition $[0,1]^2$ into the corresponding level-$n$ squares:
$$Q\binom{x}{y} := I_x \times I_y, \qquad \binom{x}{y} \in \Sigma^n \times \Sigma^n. \tag{5}$$
A natural embedding of the adjacency matrix of $G_n$ in the unit square is as follows:
$$\Lambda_n(a, b) := \begin{cases} 1, & \text{if } (a, b) \in Q\binom{x}{y} \text{ for some } \binom{x}{y} \in E(G_n); \\ 0, & \text{otherwise.} \end{cases} \tag{6}$$
That is, $\Lambda_n(a,b) = \sum_{x, y \in \Sigma^n,\ \binom{x}{y} \in E(G_n)} \mathbb{1}_{Q\binom{x}{y}}(a, b)$. We also write $\Lambda_n$ for the support of the function $\Lambda_n(a,b)$, see Figure 1(c). Observe that $\Lambda_n$ is a compact set and $\Lambda_{n+1} \subset \Lambda_n$ holds for all $n$. So we can define the non-empty compact set
$$\Lambda := \bigcap_{n=1}^{\infty} \Lambda_n. \tag{7}$$
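The embedding (4) to (7) can be checked with exact rational arithmetic. In the sketch below (ours; `interval`, `squares`, `inside` and `nested` are hypothetical helper names, and the cherry labels are again our reading of Example 2.5), each edge $\binom{x}{y}$ of $G_n$ becomes the square $I_x \times I_y$, and the inclusion $\Lambda_{n+1} \subset \Lambda_n$ is verified square by square. Rule (1) is coded in a compact equivalent form that uses bipartiteness: once every coordinate pair of the postfixes is an edge of $G$, the two postfixes have types $\{1, 2\}$ exactly when the postfix of $x$ lies in a single part.

```python
from fractions import Fraction
from itertools import product

# Cherry base graph (our reading of Example 2.5); E holds pairs (u, v), u in V1, v in V2.
N, V1 = 3, {1}
E = {(1, 0), (1, 2)}

def is_edge(x, y):
    """Rule (1) for a bipartite base graph: a loop, or all coordinate pairs of the
    postfixes are edges of G and the postfix of x lies entirely in one part."""
    if x == y:
        return True
    k = next(i for i in range(len(x)) if x[i] != y[i])
    tx, ty = x[k:], y[k:]
    one_part = len({c in V1 for c in tx}) == 1
    return one_part and all((a, b) in E or (b, a) in E for a, b in zip(tx, ty))

def interval(word):
    """N-adic interval I_word from (4), returned as (left, right)."""
    left = sum((Fraction(d, N ** (r + 1)) for r, d in enumerate(word)), Fraction(0))
    return left, left + Fraction(1, N ** len(word))

def squares(n):
    """The level-n squares Q(x, y) = I_x x I_y whose union is Lambda_n, see (5) and (6)."""
    words = list(product(range(N), repeat=n))
    return [(interval(x), interval(y)) for x in words for y in words if is_edge(x, y)]

def inside(child, parent):
    """Containment of one axis-parallel square in another."""
    (cx, cy), (px, py) = child, parent
    return px[0] <= cx[0] and cx[1] <= px[1] and py[0] <= cy[0] and cy[1] <= py[1]

def nested(n):
    """Check that Lambda_{n+1} is a subset of Lambda_n, one square at a time."""
    coarse = squares(n)
    return all(any(inside(c, p) for p in coarse) for c in squares(n + 1))

print([len(squares(n)) for n in (1, 2, 3)], nested(1), nested(2))
```

For the cherry this reports 7, 29 and 103 squares at levels 1, 2 and 3, and both inclusions hold.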
[Figure 1. The cherry, Example 2.5. (a) $G$, and $G_1$ with loops; (b) $G_2$; (c) the sets $\Lambda_n$ for the first few values of $n$.]

Clearly, $\Lambda(a, b) = \lim_{n \to \infty} \Lambda_n(a, b)$.

Remark 2.7.
This representation obviously depends on the labeling of the graph $G$. For an arbitrary permutation $\pi$ of $\{0, \dots, N-1\}$, the corresponding representation of $G_n$ is denoted by $\Lambda^\pi_n(a, b)$. The relation between these two representations is given by the formula
$$\Lambda^\pi_n(a, b) = \Lambda_n\big(\varphi_{\pi^{-1}}(a), \varphi_{\pi^{-1}}(b)\big), \qquad \Lambda^\pi(a, b) = \Lambda\big(\varphi_{\pi^{-1}}(a), \varphi_{\pi^{-1}}(b)\big),$$
where the measurable function $\varphi_\pi : [0,1] \to [0,1]$ is defined by
$$\varphi_\pi\left( \sum_{i=1}^\infty \frac{x_i}{N^i} \right) = \sum_{i=1}^\infty \frac{\pi(x_i)}{N^i}.$$

2.3. Graph-directed structure of $\Lambda$. Now we prove that the limit $\Lambda$ (defined in (7)) can be considered as the attractor of a graph-directed self-similar iterated function system which is not irreducible (for the definition see [8]), with the directed graph $\mathcal{G}$ defined below.

Definition 2.8.
The vertex set $V(\mathcal{G})$ is partitioned into three subsets:
$$V_{dd} = \left\{ \binom{z}{z} : z \in \Sigma \right\}, \qquad V_{12} = \left\{ \binom{x}{y} \in E(G) : x \in V_1,\ y \in V_2 \right\}, \qquad V_{21} = \left\{ \binom{x}{y} \in E(G) : x \in V_2,\ y \in V_1 \right\}. \tag{8}$$
Then $V(\mathcal{G}) = V_{dd} \cup V_{12} \cup V_{21}$. The set of directed edges $E(\mathcal{G})$ of $\mathcal{G}$ is as follows. First we connect all vertices in both directions within each of the three sets $V_{dd}$, $V_{12}$ and $V_{21}$ (loops included). Then we add an outgoing edge from each vertex in $V_{dd}$ to every vertex in $V_{12}$ and $V_{21}$.

For every directed edge $e = (v_1, v_2) \in E(\mathcal{G})$ we define a homothety:
$$f_e : Q_{v_2} \to Q_{v_1}, \qquad f_e(a, b) := \frac{1}{N}(a, b) + \frac{1}{N}(x_1, y_1), \quad \text{with } v_i = \binom{x_i}{y_i}, \tag{9}$$
where $Q_v := Q\binom{x}{y}$ is the level-1 square for $v = \binom{x}{y} \in V(\mathcal{G})$.

The graph $\mathcal{G}$ corresponding to the graph sequence in the "cherry" example is given in Figure 2. In general, $\mathcal{G}$ is given by the schematic picture on the right hand side of Figure 2, where the double arrow between the complete directed graphs $\overrightarrow{K}(V_{\cdot\cdot})$ illustrates that we connect all pairs of vertices in the given direction.

Let $\mathcal{P}_n$ be the set of all paths of length $n$ in $\mathcal{G}$, i.e.
$$\mathcal{P}_n := \{ \mathbf v = (v_1 \dots v_n) \mid \forall\, 1 \le i < n,\ (v_i, v_{i+1}) \in E(\mathcal{G}) \}.$$

[Figure 2. The graph $\mathcal{G}$ for the "cherry", Example 2.5. Schematic panel: the complete directed graphs $\overrightarrow{K}_{|E|}(V_{12})$, $\overrightarrow{K}_{N}(V_{dd})$ and $\overrightarrow{K}_{|E|}(V_{21})$.]

For a $\mathbf v = (v_1 \dots v_n) = \binom{x_1 \dots x_n}{y_1 \dots y_n} \in \mathcal{P}_n$ it immediately follows from definitions (5) and (9) that
$$Q_{\mathbf v} = f_{\mathbf v}([0,1]^2) = I_{x_1 \dots x_n} \times I_{y_1 \dots y_n}, \tag{10}$$
where
$$f_{\mathbf v}(\cdot) := f_{(v_1, v_2)} \circ \dots \circ f_{(v_{n-1}, v_n)}(\cdot) \ \text{ if } n \ge 2, \qquad f_v(a, b) := \frac{1}{N}(a, b) + \frac{1}{N}(x, y) \ \text{ if } n = 1,\ v = \binom{x}{y}. \tag{11}$$
The key observation connecting $\mathcal{G}$ to the graph sequence $G_n$ is the following.

Claim 2.9.
For all $n$ we have $E(G_n) = \mathcal{P}_n$.

Proof.
Let $\mathbf v = (v_1 \dots v_n) = \binom{a_1 \dots a_n}{b_1 \dots b_n} \in \Sigma^n \times \Sigma^n$; thus $a = (a_1 \dots a_n)$ and $b = (b_1 \dots b_n)$ are vertices of $G_n$. First we assume that $\mathbf v \in E(G_n)$. Observe that by (1), the pairs $\binom{a_i}{b_i}$ are vertices of $\mathcal{G}$. We would like to prove that
$$\binom{a_1}{b_1} \dots \binom{a_n}{b_n} \in \mathcal{P}_n. \tag{12}$$
If $k := |a \wedge b| \ge 1$, then $a_i = b_i$ holds for $i \le k$, thus the sequence of points $\binom{a_1}{b_1} \dots \binom{a_k}{b_k}$ forms a path in $\overrightarrow{K}_N(V_{dd})$. By (1), the pairs $\binom{a_{k+1}}{b_{k+1}}, \dots, \binom{a_n}{b_n}$ are all edges of $G$, thus vertices of $\mathcal{G}$. Furthermore, either they all belong to $V_{12}$ or they are all contained in $V_{21}$, see (8). This implies that this postfix also forms a path in $\overrightarrow{K}_{|E|}(V_{12})$ or in $\overrightarrow{K}_{|E|}(V_{21})$. By the definition of $E(\mathcal{G})$, $\left( \binom{a_k}{b_k}, \binom{a_{k+1}}{b_{k+1}} \right)$ is an edge of $\mathcal{G}$, so $\binom{a_1}{b_1} \dots \binom{a_n}{b_n}$ is a path in $\mathcal{G}$. If $k = 0$, then the whole path is contained either in $V_{12}$ or in $V_{21}$. This completes the proof of (12).

On the other hand, if $\binom{a_1}{b_1} \dots \binom{a_n}{b_n}$ is a path of length $n$ in $\mathcal{G}$, then we claim that for $a = (a_1 \dots a_n)$, $b = (b_1 \dots b_n) \in V(G_n)$ we have $(a, b) \in E(G_n)$. The proof is very similar to the previous one. $\square$
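Claim 2.9 can be tested mechanically for small $n$. The following sketch (ours, with hypothetical helper names) builds the directed graph $\mathcal{G}$ of Definition 2.8, enumerates $\mathcal{P}_n$, and compares the pairs of words read off the paths with $E(G_n)$ computed from (1). As an assumption, $V_{12}$ and $V_{21}$ are taken to consist of the edges of $G$ oriented from $V_1$ to $V_2$ and back, so that each has $|E|$ elements, as the labels $\overrightarrow{K}_{|E|}$ in Figure 2 suggest; the cherry labels are our reading of Example 2.5.

```python
from itertools import product

N, V1 = 3, {1}
E = {(1, 0), (1, 2)}  # cherry; pairs (u, v) with u in V1, v in V2

def is_edge(x, y):
    """Rule (1) for a bipartite base graph (compact form)."""
    if x == y:
        return True
    k = next(i for i in range(len(x)) if x[i] != y[i])
    tx, ty = x[k:], y[k:]
    one_part = len({c in V1 for c in tx}) == 1
    return one_part and all((a, b) in E or (b, a) in E for a, b in zip(tx, ty))

def build_script_g():
    """Vertices and arcs of the directed graph of Definition 2.8 (assumed V_12, V_21)."""
    V_dd = [(z, z) for z in range(N)]
    V_12 = sorted(E)
    V_21 = sorted((b, a) for a, b in E)
    arcs = set()
    for part in (V_dd, V_12, V_21):
        arcs |= set(product(part, part))      # complete digraph, loops included
    arcs |= set(product(V_dd, V_12 + V_21))   # one-way arcs leaving V_dd
    return V_dd + V_12 + V_21, arcs

def paths(n):
    """P_n: sequences (v_1 ... v_n) with (v_i, v_{i+1}) an arc for every i < n."""
    verts, arcs = build_script_g()
    result = [(v,) for v in verts]
    for _ in range(n - 1):
        result = [p + (w,) for p in result for w in verts if (p[-1], w) in arcs]
    return result

def claim_holds(n):
    """Compare E(G_n) from rule (1) with the word pairs read off the paths in P_n."""
    words = list(product(range(N), repeat=n))
    from_rule = {(x, y) for x in words for y in words if is_edge(x, y)}
    from_paths = {(tuple(v[0] for v in p), tuple(v[1] for v in p)) for p in paths(n)}
    return from_rule == from_paths

print([claim_holds(n) for n in (1, 2, 3)], len(paths(2)))
```

For the cherry all three comparisons succeed, and $|\mathcal{P}_2| = 29$.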
In this way we can characterize $\Lambda_n$ as follows:

Corollary 2.10.
$$\Lambda_n = \bigcup_{\mathbf v \in \mathcal{P}_n} Q_{\mathbf v} = \bigcup_{\mathbf v \in \mathcal{P}_n} f_{\mathbf v}\big([0,1]^2\big).$$

Proof.
This follows immediately from (6), (10) and Claim 2.9. $\square$
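Identity (10), on which Corollary 2.10 rests, is easy to confirm numerically. In this sketch (ours; `f_vertex`, `f_seq`, `interval` and `check_10` are hypothetical names) we read $f_{\mathbf v}$ as the composition $f_{v_1} \circ \dots \circ f_{v_n}$ of the level-one maps of (11). Since $f_e$ in (9) depends only on the initial vertex of $e$, this amounts to applying the edge maps of (11) to the level-one square $Q_{v_n} = f_{v_n}([0,1]^2)$; it is our interpretation, not a formula stated in the text. Because (10) does not use the arcs of $\mathcal{G}$, the check runs over all sequences of pairs.

```python
from fractions import Fraction
from functools import reduce
from itertools import product

N = 3

def f_vertex(v):
    """Level-one homothety for v = (x, y): (a, b) -> ((a, b) + (x, y)) / N, cf. (11) with n = 1."""
    x, y = v
    return lambda p: (Fraction(p[0] + x, N), Fraction(p[1] + y, N))

def f_seq(vs):
    """Composition f_{v_1} o ... o f_{v_n}; f_{v_n} is applied first."""
    return lambda p: reduce(lambda q, v: f_vertex(v)(q), reversed(vs), p)

def interval(word):
    """N-adic interval I_word from (4), returned as (left, right)."""
    left = sum((Fraction(d, N ** (r + 1)) for r, d in enumerate(word)), Fraction(0))
    return left, left + Fraction(1, N ** len(word))

def check_10(n):
    """Check f_v([0,1]^2) = I_x x I_y for every sequence v of n pairs. A homothety with a
    positive ratio maps the unit square onto the square spanned by f(0,0) and f(1,1)."""
    pairs = list(product(range(N), repeat=2))
    for vs in product(pairs, repeat=n):
        f = f_seq(vs)
        (a0, b0), (a1, b1) = f((0, 0)), f((1, 1))
        x, y = tuple(v[0] for v in vs), tuple(v[1] for v in vs)
        if (a0, a1) != interval(x) or (b0, b1) != interval(y):
            return False
    return True

print(check_10(1), check_10(2), check_10(3))
```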
Let us define $\mathcal{P}_\infty := \{ \mathbf v = (v_1 v_2 \dots) \mid \forall i \in \mathbb{N},\ (v_i, v_{i+1}) \in E(\mathcal{G}) \}$. For every $\mathbf v \in \mathcal{P}_\infty$, the intersection $\bigcap_{n=1}^{\infty} Q_{(v_1 \dots v_n)}$ is a single point of $[0,1]^2$, which will be denoted by $\Pi(\mathbf v)$. That is,
$$\Pi : \mathcal{P}_\infty \to [0,1]^2, \qquad \Pi(\mathbf v) := \bigcap_{n=1}^{\infty} Q_{(v_1 \dots v_n)} = \lim_{n \to \infty} f_{v_1 \dots v_n}(0, 0).$$
It is an immediate consequence of Corollary 2.10 that
$$\Pi(\mathcal{P}_\infty) = \Lambda, \quad \text{i.e.} \quad \Lambda = \bigcup_{\mathbf v \in \mathcal{P}_\infty} \Pi(\mathbf v). \tag{13}$$
This means that $\Lambda_n$, the embedded adjacency matrix of $G_n$, can be considered as the $n$-th approximation of the fractal set $\Lambda$.

In this way we coded the elements of $\Lambda$ by the elements of $\mathcal{P}_\infty$. This coding is not one-to-one, since the $N$-adic expansion is not one-to-one. However, if neither of the two coordinates of a point $(a, b) \in \Lambda$ is an $N$-adic rational number, then $(a, b)$ has a unique code.

2.4. Fractal geometric characterisation of $\Lambda$. For notational convenience we define the set of finite words over the alphabet $V_{dd}$ (including the empty word as well):
$$V^*_{dd} := \{ \mathbf v \mid \exists\, n \in \mathbb{N} \cup \{0\},\ \mathbf v = (v_1 \dots v_n) \text{ and } v_i \in V_{dd} \}.$$
The three subgraphs $\overrightarrow{K}_{|E|}(V_{12})$, $\overrightarrow{K}_{|E|}(V_{21})$ and $\overrightarrow{K}_{N}(V_{dd})$ of $\mathcal{G}$ are complete directed graphs. We consider the three corresponding self-similar iterated function systems (IFS):
$$\mathcal{F}_{dd} := \{f_v\}_{v \in V_{dd}}, \qquad \mathcal{F}_{12} := \{f_v\}_{v \in V_{12}}, \qquad \mathcal{F}_{21} := \{f_v\}_{v \in V_{21}},$$
where the functions $f_v$, $v \in V(\mathcal{G})$, were defined in (11). The attractors of these IFS-s (see [8, p.30]) are the unique nonempty compact sets satisfying
$$\begin{aligned} \Lambda_{dd} &= \bigcup_{v \in V_{dd}} f_v(\Lambda_{dd}) = \{ \Pi(\mathbf v) \mid \mathbf v = (v_1 v_2 \dots),\ v_i \in V_{dd} \}, \\ \Lambda_{12} &= \bigcup_{v \in V_{12}} f_v(\Lambda_{12}) = \{ \Pi(\mathbf v) \mid \mathbf v = (v_1 v_2 \dots),\ v_i \in V_{12} \}, \\ \Lambda_{21} &= \bigcup_{v \in V_{21}} f_v(\Lambda_{21}) = \{ \Pi(\mathbf v) \mid \mathbf v = (v_1 v_2 \dots),\ v_i \in V_{21} \}. \end{aligned} \tag{14}$$
The Open Set Condition (see e.g. [8, p.35]) holds for these IFS-s, so we can easily compute the Hausdorff dimension of the attractors. Clearly, $\Lambda_{dd}$ is the diagonal of the unit square.

Now we prove that $\Lambda$ is a countable union of homothetic copies of these attractors.

Theorem 2.11.
$$\Lambda = \underbrace{\mathrm{Diag}}_{\Lambda_{dd}} \cup \bigcup_{\mathbf v \in V^*_{dd}} \big( f_{\mathbf v}(\Lambda_{12}) \cup f_{\mathbf v}(\Lambda_{21}) \big),$$
where $\mathrm{Diag} = \{ (x, x) : x \in [0, 1] \}$.

Remark 2.12.
Observe that $\Lambda_{21}$ is the image of $\Lambda_{12}$ under the reflection in the diagonal, hence $\Lambda$ is symmetric with respect to the diagonal. The same is true for the $n$-th approximation $\Lambda_n$ of $\Lambda$. This can be seen immediately from the embedded adjacency matrix characterization of $\Lambda_n$, since the adjacency matrix of $G_n$ is symmetric.

Proof of Theorem 2.11. We start by showing that
$$\Lambda \subset \mathrm{Diag} \cup \bigcup_{\mathbf v \in V^*_{dd}} \big( f_{\mathbf v}(\Lambda_{12}) \cup f_{\mathbf v}(\Lambda_{21}) \big). \tag{15}$$
Pick an arbitrary point $(a, b) \in \Lambda$. As a consequence of (13) there exists a $\mathbf v = (v_1 v_2 \dots) \in \mathcal{P}_\infty$ such that $\Pi(\mathbf v) = (a, b)$. Let $k := \max\{\ell : v_\ell \in V_{dd}\}$, with $k := 0$ if $v_1 \notin V_{dd}$. We distinguish three cases: $k = 0$, $k = \infty$ or $0 < k < \infty$. Note that $v_i \in V_{dd}$ for all $i \le k$, since once the path has left the component $V_{dd}$, there is no way to return. Since no edge of $\mathcal{G}$ leaves $V_{12}$ or $V_{21}$, for $k < \infty$ all $v_i$, $i > k$, are in the same component $V_{12}$ or $V_{21}$.

Case $k = 0$: Clearly either all $v_i$ are in $V_{12}$ or all are in $V_{21}$, so $\Pi(\mathbf v) \in \Lambda_{12} \cup \Lambda_{21}$.

Case $k = \infty$: For the same reason, $\Pi(\mathbf v) = \lim_{n \to \infty} f_{v_1 \dots v_n}(0, 0) \in \Lambda_{dd} = \mathrm{Diag}$. This is so because $f_{v_1 \dots v_n}(0, 0)$ is in the $N^{-n}$-neighborhood of the diagonal $\{(x, x) : x \in [0, 1]\}$.

Case $0 < k < \infty$: Let $\mathbf v^k = (v_1 \dots v_k)$. By symmetry, without loss of generality we may assume that $v_{k+1} \in V_{12}$. As in the first case, we can see that for $\mathbf w := (v_{k+1} v_{k+2} \dots)$, $\Pi(\mathbf w) \in \Lambda_{12}$. Hence $\Pi(\mathbf v) = f_{\mathbf v^k}(\Pi(\mathbf w)) \in f_{\mathbf v^k}(\Lambda_{12})$.

Now we have verified (15). To prove the opposite direction, that is,
$$\Lambda \supset \mathrm{Diag} \cup \bigcup_{\mathbf v \in V^*_{dd}} \big( f_{\mathbf v}(\Lambda_{12}) \cup f_{\mathbf v}(\Lambda_{21}) \big), \tag{16}$$
we will use the symbolic representation of $\Lambda$ given in (13).

Pick an $x \in [0, 1]$ and take the $N$-adic code $(x_1 x_2 \dots)$ of $x$. That is, $x = \sum_{i=1}^{\infty} \frac{x_i}{N^i}$, $x_i \in \{0, \dots, N-1\}$. Then
$$\mathbf v := \Big( \underbrace{\tbinom{x_1}{x_1}}_{v_1}, \underbrace{\tbinom{x_2}{x_2}}_{v_2}, \dots \Big) \in \mathcal{P}_\infty,$$
and it is easy to see that $\Pi(\mathbf v) = (x, x)$. So by (13), $(x, x) \in \Lambda$.

Now we assume that $(a, b) \in \bigcup_{\mathbf v \in V^*_{dd}} \big( f_{\mathbf v}(\Lambda_{12}) \cup f_{\mathbf v}(\Lambda_{21}) \big)$. Without loss of generality we may further assume that $(a, b) \in f_{\mathbf v}(\Lambda_{12})$ for some $\mathbf v \in V^*_{dd}$. That is, $(a, b) = f_{\mathbf v}(a', b')$, where $(a', b') \in \Lambda_{12}$. By (14) there exists a $\mathbf w := (w_1 w_2 \dots)$, $w_i \in V_{12}$, such that $\Pi(\mathbf w) = (a', b')$. In this way, for the concatenation $\mathbf t := \mathbf v \mathbf w \in \mathcal{P}_\infty$ we have $(a, b) = \Pi(\mathbf t)$, which implies $(a, b) \in \Lambda$. This completes the proof of (16). $\square$
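At level $n$, Theorem 2.11 suggests a simple count: the squares of $\Lambda_n$ are the $N^n$ diagonal squares (codes in $V_{dd}$ only), plus, for every prefix $\mathbf v \in V^*_{dd}$ of length $k < n$, $|E|^{n-k}$ squares inside each of $f_{\mathbf v}(\Lambda_{12})$ and $f_{\mathbf v}(\Lambda_{21})$, i.e. $N^n + 2\sum_{k=0}^{n-1} N^k |E|^{n-k}$ in total. The sketch below (ours; function names hypothetical) compares this with a brute-force count of $E(G_n)$, loops included, for the cherry (our reading of Example 2.5) and for a path on four vertices that we chose as an extra test case. It is a sanity check of the level-$n$ structure, not a proof of the theorem.

```python
from itertools import product

def is_edge(x, y, E, V1):
    """Rule (1) for a bipartite base graph (E holds pairs (u, v), u in V1, v in V2)."""
    if x == y:
        return True
    k = next(i for i in range(len(x)) if x[i] != y[i])
    tx, ty = x[k:], y[k:]
    one_part = len({c in V1 for c in tx}) == 1
    return one_part and all((a, b) in E or (b, a) in E for a, b in zip(tx, ty))

def brute_count(n, N, E, V1):
    """Number of level-n squares of Lambda_n, i.e. |E(G_n)| with loops counted once."""
    words = list(product(range(N), repeat=n))
    return sum(is_edge(x, y, E, V1) for x in words for y in words)

def decomposition_count(n, N, num_edges):
    """N^n diagonal squares, plus N^k * |E|^(n-k) squares in each of f_v(Lambda_12)
    and f_v(Lambda_21), summed over the prefix lengths k = 0, ..., n-1."""
    return N ** n + 2 * sum(N ** k * num_edges ** (n - k) for k in range(n))

cherry = (3, {(1, 0), (1, 2)}, {1})
path4 = (4, {(0, 1), (2, 1), (2, 3)}, {0, 2})  # our own test graph: the path 0-1-2-3

for N, E, V1 in (cherry, path4):
    for n in (1, 2, 3, 4):
        assert brute_count(n, N, E, V1) == decomposition_count(n, N, len(E))

print(decomposition_count(2, 3, 2), decomposition_count(2, 4, 3))
```

It prints 29 and 58, the level-2 counts for the two base graphs.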
2.5. The same model without loops.
Let $G'_n$ be the same graph as $G_n$ but without loops, i.e. $V(G'_n) = V(G_n)$ and $E(G'_n) \subset \Sigma^n \times \Sigma^n$,
$$E(G'_n) = \left\{ \binom{x}{y} \;\middle|\; \{\mathrm{typ}(\tilde x), \mathrm{typ}(\tilde y)\} = \{1, 2\} \ \text{ and } \ \binom{x_i}{y_i} \in E(G) \ \ \forall\, |x \wedge y| < i \le n \right\}.$$
In this case $\Lambda'_n = \Lambda_n \setminus \mathrm{Diag}_n$, where $\mathrm{Diag}_n$ is the union of the level-$n$ squares that have nonempty intersection with the diagonal. The sequence $\Lambda'_n$ is not a nested sequence of compact sets. However, it is easy to see that the characteristic function of $\Lambda'_n$ tends to the characteristic function of $\Lambda \setminus \mathrm{Diag}$. Further, $\Lambda'_n$ tends to $\Lambda$ in the Hausdorff metric, see [8].

3. Properties of the sequence $\{G_n\}$ and $\Lambda$

In this section we compute the degree distribution of $G_n$ and relate it to the Hausdorff dimension of $\Lambda$. We also compute the average length of the shortest path in $G_n$. To get interesting results about the local clustering coefficient, we need to modify our graph sequence $G_n$ along the lines of [2].

3.1. Degree distribution of $\{G_n\}$. Here we compute the degree distribution under the following regularity assumption on the base graph $G$:
(A1) $\deg(x) = d_1$ for all $x \in V_1$, and $d_2 := \max_{y \in V_2} \deg(y) \le d_1 - 1$.

Recall that we defined $\ell(x)$ in (2) as the length of the longest terminal block of the node $x$, i.e. the last $\ell(x)$ digits of $x$ belong to the same $V_i$. Put $\Sigma^n_i := \{ x \in \Sigma^n \mid x_n \in V_i \}$, $i = 1, 2$. It follows from (A1) and Remark 2.4 that the degree of a node $x \in \Sigma^n_1$ is $\frac{d_1^{\ell(x)+1} - 1}{d_1 - 1} + 1$, and the number of such nodes with $\ell(x) = \ell < n$ is exactly $N^{n-\ell-1} \cdot n_2 \cdot n_1^{\ell}$.

Under assumption (A1), the decay of the degree distribution is determined by the set of high degree nodes
$$HD_n := \left\{ x \in \Sigma^n \;\middle|\; \deg_n(x) > \max_{y \in \Sigma^n_2} \deg_n(y) \right\}.$$
An equivalent characterisation of $HD_n$ is
$$HD_n = \left\{ x \in \Sigma^n_1 \;\middle|\; \ell(x) > \frac{\max\{(n+1)\log d_2,\ \log n\}}{\log d_1} \right\}.$$
This is so because the degree of any $y \in \Sigma^n_2$ is at most $\max\{d_2^{n+1}, n\}$.

[Figure 3. Example "fan". (a) $G$ on the left and $G_1$ on the right hand side; here $V_1$ has two elements and $V_2$ has four. (b) The graph $G_2$ (which additionally contains all loops).]

The tail of the cumulative degree distribution is
$$P\left[ \deg_n(X) > \frac{d_1^{\ell+1} - 1}{d_1 - 1} \right] = \frac{n_1^{\ell+1} N^{n-\ell-1}}{N^n} = \left( \frac{n_1}{N} \right)^{\ell+1},$$
where $X$ is a uniformly chosen node of $G_n$. Note that as long as $\ell < n$, this probability does not depend on $n$. Writing $\tilde F(t) = P(\deg_n(X) > t)$ for the tail of the cumulative distribution function, we get the power law decay
$$\tilde F(t) = t^{-\frac{\log(N/n_1)}{\log d_1}} \cdot c(d_1) \qquad \text{for } t = \frac{d_1^{\ell+1} - 1}{d_1 - 1}.$$
So we have proved
Theorem 3.1.
The degree distribution of the graph sequence $G_n$ satisfying assumption (A1) has a power law decay with exponent
$$\tilde\gamma = \gamma - 1 = \frac{\log(N/n_1)}{\log d_1}. \tag{17}$$
This implies that the largest decay $\gamma$ we can get in this family of models is $1 + \frac{\log 3}{\log 2}$, and the maximum is attained at $n_1 = 1$ and $d_1 = 2 = n_2$. This is exactly the graph sequence in Example 2.5, see Figures 1(a) and 1(b). We will later see that the case $n_1 = 1$ is important in another sense as well, see Section 3.2.

3.2. Hausdorff dimension of $\Lambda$. In Theorem 2.11 we decomposed $\Lambda$ into the diagonal of the square and countably many homothetic copies of $\Lambda_{12}$ and $\Lambda_{21}$, both attractors of self-similar IFS-s. Hence the Hausdorff dimension of $\Lambda$ is the maximum of the dimension of the diagonal and $\dim_H \Lambda_{12} = \dim_H \Lambda_{21}$. Note that the self-similar IFS $\mathcal{F}_{12}$ consists of $|E|$ similarities of contraction ratio $\frac{1}{N}$ and satisfies the Open Set Condition. As an immediate application of [8, Theorem 2.7], the Hausdorff dimension of $\Lambda_{12}$ and $\Lambda_{21}$ is $\dim_H \Lambda_{12} = \frac{\log |E|}{\log N}$. By the argument above we have proved the following theorem:

Theorem 3.2.
The Hausdorff dimension of $\Lambda$ is
$$\dim_H \Lambda = \max\left\{ \frac{\log |E|}{\log N},\ 1 \right\};$$
furthermore, $\dim_H(\Lambda \setminus \mathrm{Diag}) = \frac{\log |E|}{\log N}$.
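A rough numerical illustration of Theorems 3.1 and 3.2 for the cherry (our reading of Example 2.5) is sketched below; the code and its names are our own. To reach $n = 8$ cheaply, the degrees are evaluated with the formula of Remark 2.4 instead of building $G_n$. Only the slope of $\log P(\deg_n(X) > t)$ against $\log t$ is compared with $-\tilde\gamma$ from (17); the constant $c(d_1)$ is not checked. For the cherry $n_1 = 1$, so the reciprocal of $\log|E| / \log N$ should coincide with $\tilde\gamma$.

```python
import math
from itertools import product

def ell(x, V1):
    """l(x) from (2): length of the longest terminal block lying in one part of G."""
    last_in_v1 = x[-1] in V1
    i = 0
    while i < len(x) and (x[len(x) - 1 - i] in V1) == last_in_v1:
        i += 1
    return i

def degree(x, base_deg, V1):
    """deg_n(x) = 2 + S(x) * deg(x_n), with S(x) as in (3) (Remark 2.4)."""
    n, S, prod = len(x), 0, 1
    for r in range(ell(x, V1)):
        if r > 0:
            prod *= base_deg[x[n - 1 - r]]
        S += prod
    return 2 + S * base_deg[x[-1]]

def tail_slope(n, N, E, V1, lo, hi):
    """Slope of log P(deg_n(X) > t_l) against log t_l between l = lo and l = hi,
    where t_l = (d1^(l+1) - 1) / (d1 - 1) and X is uniform on Sigma^n."""
    base_deg = {v: sum(v in e for e in E) for v in range(N)}
    d1 = base_deg[min(V1)]  # (A1): every vertex of V1 has degree d1
    degs = [degree(x, base_deg, V1) for x in product(range(N), repeat=n)]

    def point(level):
        t = (d1 ** (level + 1) - 1) / (d1 - 1)
        p = sum(d > t for d in degs) / len(degs)
        return math.log(t), math.log(p)

    (u0, w0), (u1, w1) = point(lo), point(hi)
    return (w1 - w0) / (u1 - u0)

N, E, V1 = 3, {(1, 0), (1, 2)}, {1}  # cherry: n_1 = 1 and d_1 = |E| = 2
n1, d1 = len(V1), 2
slope = tail_slope(8, N, E, V1, 3, 7)
gamma_tilde = math.log(N / n1) / math.log(d1)  # exponent (17)
dim = math.log(len(E)) / math.log(N)           # dim_H(Lambda minus Diag), Theorem 3.2
print(round(slope, 3), round(-gamma_tilde, 3), round(1 / dim, 3))
```

Between $\ell = 3$ and $\ell = 7$ the estimated slope is about $-1.55$, close to $-\log 3 / \log 2 \approx -1.585$.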
Corollary 3.3. If $|V_1| = n_1 = 1$, then (A1) holds with $d_1 = |E|$ in the bipartite graph $G$. Hence the degree distribution exponent (17) equals
$$\tilde\gamma = \frac{\log N}{\log |E|} = \frac{1}{\dim_H(\Lambda \setminus \mathrm{Diag})}.$$

3.3. Average shortest path in $G_n$. In many real networks, the typical distance between two randomly chosen points is of order $\log(|G|)$, the logarithm of the size of the network. We will see that our model shares this property in addition to the power law decay and the hierarchical structure, thus combining all these important features.

In this section we calculate the average length of the shortest path between two nodes in $G_n$. First we give a deterministic way to construct one of the shortest paths between any two nodes in the graph. To do so, we need to introduce some notation. Recall that the graph $G$ is a bipartite graph with partition $V_1, V_2$, see the beginning of Section 2. We remind the reader that for $x, y \in \Sigma^n$, $\mathrm{typ}(x)$, the common prefix $x \wedge y$ and the postfixes $\tilde x, \tilde y$ were defined in Definition 2.1.

Definition 3.4.
For two arbitrary vertices $x, y \in \Sigma^n$ we denote the length of their common prefix by $k = k(x, y) := |x \wedge y|$. Furthermore, let us decompose the postfixes $\tilde x, \tilde y$ into blocks of digits of the same type:
$$\tilde x = b_1 b_2 \dots b_r, \qquad \tilde y = c_1 c_2 \dots c_q, \tag{18}$$
such that all of the blocks have a nonzero type and consecutive blocks are of different types. That is, for $i = 1, \dots, r-1$ and $j = 1, \dots, q-1$ we have $\mathrm{typ}(b_i) \neq \mathrm{typ}(b_{i+1}) \in \{1, 2\}$ and $\mathrm{typ}(c_j) \neq \mathrm{typ}(c_{j+1}) \in \{1, 2\}$.

Note that we denoted the number of blocks in $\tilde x, \tilde y$ by $r$ and $q$, respectively. If $X$ and $Y$ are two random vertices of $G_n$, then the same notation as in (18) is used with capital letters.

Now we fix an arbitrary self-map $p$ of $\Sigma$ such that $\binom{x}{p(x)} \in E(G)$ for all $x \in \Sigma$. We do not require $p(p(x)) = x$. Note that $x$ and $p(x)$ have different types since $G$ is bipartite. For a word $z = (z_1 \dots z_m)$ with $\mathrm{typ}(z) \in \{1, 2\}$ we define $p(z) := (p(z_1) \dots p(z_m))$. Then it follows from (1) that
$$\binom{tz}{t\,p(z)} \text{ is an edge in } G_{\ell+m} \quad \forall\, t = (t_1 \dots t_\ell). \tag{19}$$
As usual, we write $\mathrm{Diam}(G)$ for the maximal graph distance within the components of $G$. Clearly $\mathrm{Diam}(G) \le N - 1$.

Lemma 3.5.
Let $x, y$ be arbitrary vertices in the same connected component of $G_n$. Using the notation above, the length of the shortest path between them is at least $r + q - 1$ and at most $r + q + \mathrm{Diam}(G) - 2$.

Considering the worst case scenario, i.e. choosing all blocks of length 1, yields:
Corollary 3.6.
The diameter of the graph $G_n$ is at most $2n + \mathrm{Diam}(G) - 2$. Since the size of the graph is $N^n$, we have
$$\mathrm{Diam}(G_n) = \frac{2}{\log N} \log(|G_n|) + O(1).$$

Proof of Lemma 3.5.
First we construct a path $P(x, y)$ of minimal length. Starting from $x$, the first half of the path $P(x, y)$ is as follows:
$$\begin{aligned} \hat x_0 &= x = (x \wedge y)\, b_1 \dots b_{r-1} b_r, \\ \hat x_1 &= (x \wedge y)\, b_1 \dots b_{r-1}\, p(b_r), \\ &\ \ \vdots \\ \hat x_{r-1} &= (x \wedge y)\, b_1\, p\big(b_2 \dots p(b_{r-1}\, p(b_r))\big). \end{aligned}$$
Starting from $y$, the second half of the path $P(x, y)$ (traversed backwards) is as follows:
$$\begin{aligned} \hat y_0 &= y = (x \wedge y)\, c_1 c_2 \dots c_q, \\ \hat y_1 &= (x \wedge y)\, c_1 \dots c_{q-1}\, p(c_q), \\ &\ \ \vdots \\ \hat y_{q-1} &= (x \wedge y)\, c_1\, p\big(c_2 \dots p(c_{q-1}\, p(c_q))\big). \end{aligned}$$
It follows from (19) that
$$P_x := (\hat x_0, \hat x_1, \dots, \hat x_{r-1}), \qquad P_y := (\hat y_{q-1}, \dots, \hat y_1, \hat y_0)$$
are two paths in $G_n$. To construct $P(x, y)$, the only remaining step is to connect $\hat x_{r-1}$ and $\hat y_{q-1}$. Using (19) it is easy to see that this can be done with a path $P_c$ of length at most $\mathrm{Diam}(G)$. In this way, $P(x, y) := P_x P_c P_y$. Clearly,
$$r + q - 1 \le \mathrm{Length}(P(x, y)) \le r + q + \mathrm{Diam}(G) - 2.$$
Now we show that no path between $x$ and $y$ is shorter than $r + q - 1$. Recall that it follows from (1) that for any path $Q(x, y) = (x = q_0, \dots, q_\ell = y)$, consecutive elements of the path only differ in their postfixes, which have different types. That is,
$$\forall i: \quad q_i = w_i z_i, \quad q_{i+1} = w_i \tilde z_i, \quad \text{with } \mathrm{typ}(z_i) \neq \mathrm{typ}(\tilde z_i) \in \{1, 2\}.$$
This implies that in each step along the path, the number of blocks in (18) changes by at most one. Recall that $|x \wedge y| = k$, so $x_{k+1} \neq y_{k+1}$. Since the digit in the $(k+1)$-th position changes along the path, we have to reach a vertex where all the digits to the right of the $k$-th position are of the same type. Starting from $q_0 = x$, to reach the first vertex $a$ with this property we need at least $r - 1$ edges of the path, where $r$ was defined in (18). Similarly, starting from $y$, we need at least $q - 1$ edges to reach the last vertex $b$ on the path where all the digits after the $k$-th position are of the same type. Because $x_{k+1} \neq y_{k+1}$, we need at least one more edge, and the connecting path $P_c$ uses at most $\mathrm{Diam}(G)$ edges. $\square$

Theorem 3.7.
The expectation of the length of a shortest path between two uniformly chosen vertices $X, Y \in G_n$ can be bounded by
$$\frac{4 n_1 n_2}{N^2}(n - 1) < E(|P(X, Y)|) < N + \frac{4 n_1 n_2}{N^2}(n - 1).$$

Corollary 3.8.
The magnitude of the average length of a shortest path between two uniformly chosen vertices in $G_n$ is the logarithm of the size of $G_n$, which is of the same order as $\mathrm{Diam}(G_n)$.

Proof of Theorem 3.7. Let $X, Y$ be independent, uniformly chosen vertices of $G_n$. In this proof we use the notation introduced in Definitions 2.1 and 3.4. The digits of the code of a uniformly chosen vertex are independent and uniform in $\{0, \dots, N-1\}$, hence $K(X, Y) := |X \wedge Y|$ has a truncated geometric distribution with parameter $\frac{N-1}{N}$. That is,
$$P(K(X, Y) = k) = \begin{cases} \left( \frac{1}{N} \right)^k \cdot \frac{N-1}{N}, & \text{if } 0 \le k < n, \\ \left( \frac{1}{N} \right)^n, & \text{if } k = n. \end{cases}$$
Furthermore, given that the length of the prefix is $k = K(X, Y)$, the random variables $R$ and $Q$ (see Definition 3.4) can be represented as sums of indicators corresponding to the start of a new block:
$$R = 1 + \sum_{i=1}^{n-k-1} \mathbb{1}_{\{\mathrm{typ}(X_{k+i}) \neq \mathrm{typ}(X_{k+i+1})\}}, \qquad Q = 1 + \sum_{i=1}^{n-k-1} \mathbb{1}_{\{\mathrm{typ}(Y_{k+i}) \neq \mathrm{typ}(Y_{k+i+1})\}}.$$
Taking expectations yields
$$\begin{aligned} E(Q \mid K(X, Y) = k) = E(R \mid K(X, Y) = k) &= 1 + E\left( \sum_{i=1}^{n-k-1} \mathbb{1}_{\{\mathrm{typ}(X_{k+i}) \neq \mathrm{typ}(X_{k+i+1})\}} \right) \\ &= 1 + \sum_{i=1}^{n-k-1} P\big(\mathrm{typ}(X_{k+i}) \neq \mathrm{typ}(X_{k+i+1})\big) = 1 + (n - k - 1)\frac{2 n_1 n_2}{N^2}. \end{aligned}$$
Weighting this with the geometric weights of the length of the prefix, we get
$$E(Q) = E(R) = E\big( E(R \mid K(X, Y)) \big) = E\left( 1 + (n - K(X, Y) - 1)\frac{2 n_1 n_2}{N^2} \right) = 1 + \left( n - \frac{1}{N-1}\left(1 - \frac{1}{N^n}\right) - 1 \right) \frac{2 n_1 n_2}{N^2}.$$
Using this and the following immediate consequence of Lemma 3.5,
$$-1 \le E\big( |P(X, Y)| - (R + Q) \big) \le \mathrm{Diam}(G) - 2,$$
we finally obtain
$$1 - \frac{1}{N-1} + \frac{4 n_1 n_2}{N^2}(n - 1) \le E(|P(X, Y)|) < \mathrm{Diam}(G) + \frac{4 n_1 n_2}{N^2}(n - 1). \qquad \square$$

3.4. Decay of the local clustering coefficient of the modified sequence $\{\widehat G_n\}$. An important property of most real networks is their high degree of clustering. In general, the local clustering coefficient of a node $v$ having $n_v$ neighbors is defined as
$$C_v := \frac{\#\{\text{links between neighbors of } v\}}{\binom{n_v}{2}}.$$
Note that the numerator in the formula is the number of triangles containing $v$, and $C_v$ is the proportion of the pairs of neighbors of $v$ which form a triangle with $v$ in the graph.

Observe that without the loops the graph sequence $G_n$ is bipartite, i.e. there are no triangles in $G_n$. However, we can modify the graph sequence $G_n$ in a natural way, as in [2], to get a new sequence $\widehat G_n$ that preserves the hierarchical structure of $G_n$ while reflecting the dependence of the clustering coefficient on the node degree observed in several real networks. Namely, the local clustering coefficient of a vertex $v$ is of order $1/\deg(v)$.

[Figure 4. Clustering extended "fan". (a) We obtain $\widehat G$ by adding the dashed (red) edges to $G$. (b) $\widehat G_2$: the edges of $\widehat G_2$ and $G_2$ differ only at the lowest hierarchical level (cf. Figure 3).]

Definition 3.9.
• We obtain the graph $\widehat G$ by adding a set of extra edges $RE(\widehat G)$ to $G$ satisfying the following property:
Property R: for every $x \in \Sigma$ there exist $y, z \in \Sigma$ such that two of the edges of the triangle $(x, y, z)_\Delta$ are contained in $E(G)$ and one of its edges is in $RE(\widehat G)$.
So, $V(\widehat G) = V(G)$ and $E(\widehat G) = E(G) \cup RE(\widehat G)$. In the example presented in Figure 4, the edges from $RE(\widehat G)$ are the dashed red edges.
• Similarly, we define the graph sequence $\{\widehat G_n\}_{n=1}^{\infty}$ by deleting all loops in $G_n$ and adding extra edges to $G_n$. That is, the vertex set is $V(\widehat G_n) = V(G_n) = \Sigma^n$, and, with the definition of the simple graph $G'_n$ in Section 2.5, the edge set is extended by the following rule:
$$E(\widehat G_n) = E(G'_n) \cup RE(\widehat G_n), \tag{20}$$
where
$$RE(\widehat G_n) = \left\{ \binom{x_1 \dots x_n}{y_1 \dots y_n} : x_i = y_i \text{ for } i \le n - 1,\ \binom{x_n}{y_n} \in RE(\widehat G) \right\}. \tag{21}$$
It is clear from Property R that
$$\widehat C_{\min} := \min_{x \in \widehat G} C_x > 0. \tag{22}$$
Further, using (1) and (21) one can easily see that the degree of a vertex $x \in \widehat G_n$ is
$$\widehat{\deg}_n(x) = S(x) \cdot \deg(x_n) + \big( \widehat{\deg}(x_n) - \deg(x_n) \big), \tag{23}$$
where $\widehat{\deg}(\cdot)$ denotes the degree of a vertex in $\widehat G$, while $\deg(\cdot)$ stands for the degree in $G$.

Remark 3.10.
The difference between the degree of any node x ∈ Σ n in G n and in ˆ G n is bounded, thus the degree sequence of ˆ G n has thesame power law exponent as G n . Theorem 3.11.
There exist $K_1, K_2 > 0$ such that the local clustering coefficient $C_x$ of an arbitrary node $x \in \hat G_n$ satisfies
$$\frac{K_1}{\widehat{\deg}_n(x)} \le C_x \le \frac{K_2}{\widehat{\deg}_n(x)}.$$
Proof.
We write $T_n(x)$ for the set of all triangles in $\hat G_n$ containing the node $x \in \Sigma^n$. We say that a triangle $(x,y,z)_\Delta \in T_n(x)$ is regular if and only if exactly two of its edges are from $E(G_n)$. The triangle $(x,y,z)_\Delta \in T_n(x)$ is called irregular if it is not regular. The set of irregular triangles containing $x$ is denoted by $IRT_n(x)$. We partition the set of regular triangles $RT_n(x)$ into two classes, $RT_n(x) = RT^1_n(x) \cup RT^2_n(x)$, in the following way: a triangle $(x,y,z)_\Delta \in RT_n(x)$ belongs to $RT^1_n(x)$ if and only if $x$ is NOT an endpoint of the edge contained in $RE(\hat G_n)$. That is,
$$RT^1_n(x) := \left\{ (x,y,z)_\Delta \in RT_n(x) : \binom{x}{y}, \binom{x}{z} \in E(G_n) \right\}.$$
Hence $RT^2_n(x)$ is the set of those $(x,y,z)_\Delta \in RT_n(x)$ for which either $\binom{x}{y} \in E(G_n)$ and $\binom{x}{z} \in RE(\hat G_n)$, or vice versa. Summarizing these partitions:
$$T_n(x) = RT_n(x) \cup IRT_n(x) = RT^1_n(x) \cup RT^2_n(x) \cup IRT_n(x).$$
Now we define the cardinalities of these classes:
$$\Delta^1_n(x) := \#RT^1_n(x), \qquad \Delta^2_n(x) := \#RT^2_n(x), \qquad \Delta^{ir}_n(x) := \#IRT_n(x).$$
When $n = 1$ we suppress the index $n$. Observe that by Property R,
$$\Delta^r_n(x) := \Delta^1_n(x) + \Delta^2_n(x) \ge 1 \qquad \forall\, n \ge 1,\ x \in \Sigma^n.$$
Now we compute $\Delta^i_n(x)$, $i \in \{1, 2, ir\}$, for an arbitrary fixed $x \in \Sigma^n$, using the notation $\ell(x)$. First we verify that
$$\Delta^1_n(x) = \sum_{r=0}^{\ell(x)-1} \prod_{j=1}^{r} \deg(x_{n-j}) \cdot \Delta^1(x_n) = S(x) \cdot \Delta^1(x_n), \tag{24}$$
where $S(x)$ was defined in (3). To see this, observe that it follows from (1), (20) and (21) that $(x,y,z)_\Delta \in RT^1_n(x)$ holds if and only if all of the following three assertions are satisfied:
(1) there exists $0 \le r \le \ell(x)-1$ such that $|y \wedge z| = n-1$ and $|x \wedge y| = |x \wedge z| = n-r-1$;
(2) $\binom{x_k}{y_k} \in E(G)$ whenever $n-r \le k \le n-1$;
(3) $(x_n, y_n, z_n)_\Delta \in RT^1(x_n)$.
Hence (24) is obtained by an immediate calculation.
Now we prove that
$$\Delta^2_n(x) = \sum_{r=0}^{\ell(x)-1} \prod_{j=1}^{r} \deg(x_{n-j}) \cdot \Delta^2(x_n) = S(x) \cdot \Delta^2(x_n). \tag{25}$$
This is so because, by (1), (20) and (21), $(x,y,z)_\Delta \in RT^2_n(x)$ holds if and only if all of the following three assertions are satisfied:
(1) there exists $0 \le r \le \ell(x)-1$ such that $|x \wedge y| = n-1$ and $|x \wedge z| = |y \wedge z| = n-r-1$;
(2) $\binom{x_k}{z_k} \in E(G)$ whenever $n-r \le k \le n-1$;
(3) $(x_n, y_n, z_n)_\Delta \in RT^2(x_n)$.
Hence, using the same argument as above, we get (25).
Finally, we determine the number of irregular triangles containing $x$:
$$\Delta^{ir}_n(x) = \Delta^{ir}(x_n). \tag{26}$$
This follows from the fact that $(x,y,z)_\Delta \in IRT_n(x)$ is equivalent to
$$x_i = y_i = z_i \ \text{ for all } 1 \le i \le n-1 \quad \text{and} \quad (x_n, y_n, z_n)_\Delta \in IRT(x_n).$$
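Formulas (24), (25) and the degree formula (23) are straightforward to evaluate once $\ell(x)$ is known. In the Python sketch below a word $x = (x_1, \dots, x_n)$ is a tuple, the value `ell` $= \ell(x)$ is passed in by the caller (its definition appears earlier in the paper and is not re-implemented here), and the base-graph degrees and triangle counts are hypothetical toy values.

```python
from math import prod


def S(x, ell, deg):
    """S(x) = sum_{r=0}^{ell-1} prod_{j=1}^{r} deg(x_{n-j}).

    x   : word (x_1, ..., x_n) as a tuple, so x_{n-j} is x[n-j-1]
    ell : the value l(x), supplied by the caller (assumption of this sketch)
    deg : dict giving the degree in G of each base vertex
    """
    n = len(x)
    if not 1 <= ell <= n:
        raise ValueError("expected 1 <= ell <= n")
    # The empty product (r = 0) equals 1.
    return sum(prod(deg[x[n - j - 1]] for j in range(1, r + 1)) for r in range(ell))


def deg_hat_n(x, ell, deg, deg_hat):
    """Degree of x in hat-G_n by (23): S(x)*deg(x_n) + (deg_hat(x_n) - deg(x_n))."""
    last = x[-1]
    return S(x, ell, deg) * deg[last] + (deg_hat[last] - deg[last])


def regular_triangles(x, ell, deg, delta1, delta2):
    """(Delta^1_n(x), Delta^2_n(x)) = S(x) * (Delta^1(x_n), Delta^2(x_n)) by (24)-(25)."""
    s = S(x, ell, deg)
    return s * delta1[x[-1]], s * delta2[x[-1]]


# Hypothetical base-graph data, not taken from the paper.
deg = {0: 3, 1: 2, 2: 1}
deg_hat = {0: 4, 1: 3, 2: 2}
x = (1, 0, 2)                          # x_1 = 1, x_2 = 0, x_3 = 2
print(S(x, 3, deg))                    # 1 + 3 + 3*2 = 10
print(deg_hat_n(x, 3, deg, deg_hat))   # 10*1 + (2 - 1) = 11
```

Only the last letter $x_n$ enters through $\Delta^i(x_n)$ and $\widehat{\deg}(x_n)$; the rest of the word influences the counts solely through the factor $S(x)$.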
We write $Z_\Delta(x)$ for the number of all triangles in $\hat G_n$ containing $x$:
$$Z_\Delta(x) := \underbrace{\Delta^1_n(x) + \Delta^2_n(x)}_{\Delta^r_n(x)} + \Delta^{ir}_n(x).$$
Using (23), (24), (25) and (26) we get
$$C_x = \frac{Z_\Delta(x)}{\binom{\widehat{\deg}_n(x)}{2}} = \frac{2\Delta^r(x_n) \cdot S(x) + 2\Delta^{ir}(x_n)}{\widehat{\deg}_n(x)\left(\widehat{\deg}_n(x) - 1\right)}, \tag{27}$$
where $S(x)$ was defined in (3). Now we estimate $C_x$.
Claim 3.12.
(i) If $\ell(x) = 1$, then $C_x = C_{x_n}$.
(ii) If $\ell(x) \ge 2$, then
$$\left| C_x - \frac{2\Delta^r(x_n)}{\deg(x_n) \cdot \widehat{\deg}_n(x)} \right| \le \frac{\text{const}}{\widehat{\deg}_n(x)^2}. \tag{28}$$
Proof of the Claim.
Part (i) immediately follows from (1). To prove (ii) we fix an arbitrary $x \in \Sigma^n$ with $\ell(x) \ge 2$. Since $t$, $u$, $v$ introduced below depend only on $x_n$, there exists a constant $C^*$ independent of $n$ and $x$ such that
$$0 \le t := \frac{2\Delta^r(x_n)}{\deg(x_n)}, \quad u := \widehat{\deg}(x_n) - \deg(x_n), \quad v := 2\Delta^{ir}(x_n) < C^*. \tag{29}$$
To prove (28) it is enough to verify that
$$Q := \widehat{\deg}_n(x)\left(\widehat{\deg}_n(x) - 1\right) \cdot C_x - t \cdot \left(\widehat{\deg}_n(x) - 1\right)$$
is bounded uniformly in $n$ and $x \in \Sigma^n$, since $C_x - \frac{t}{\widehat{\deg}_n(x)} = \frac{Q}{\widehat{\deg}_n(x)\left(\widehat{\deg}_n(x) - 1\right)}$. The boundedness of $Q$ follows from (23) and (27): writing $S = S(x)$, we have
$$Q = 2\Delta^r(x_n) \cdot S + v - t \underbrace{\left(S \cdot \deg(x_n) + u - 1\right)}_{\widehat{\deg}_n(x) - 1} = 2\Delta^r(x_n) \cdot S + v - \underbrace{2\Delta^r(x_n) \cdot S}_{t S \deg(x_n)} - t(u-1) = v - t(u-1),$$
which is bounded by (29). ∎
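The algebra behind Claim 3.12 can be double-checked with exact rational arithmetic. The Python sketch below evaluates $C_x$ from (27), with $\widehat{\deg}_n(x)$ taken from (23), forms $Q$ as in the proof, and compares it with the closed form $v - t(u-1)$. The parameter values are arbitrary test inputs, not data from the paper.

```python
from fractions import Fraction


def clustering_and_degree(S, d, d_hat, delta_r, delta_ir):
    """Return (C_x, D), where D = deg_hat_n(x) by (23) and C_x is given by (27).

    S        : S(x) >= 1
    d, d_hat : deg(x_n) in G and in hat-G (d_hat >= d >= 1)
    delta_r  : Delta^r(x_n);  delta_ir : Delta^ir(x_n)
    """
    D = S * d + (d_hat - d)
    C = Fraction(2 * delta_r * S + 2 * delta_ir, D * (D - 1))
    return C, D


def Q_direct(S, d, d_hat, delta_r, delta_ir):
    """Q = D(D-1) C_x - t (D-1), computed from the definitions."""
    C, D = clustering_and_degree(S, d, d_hat, delta_r, delta_ir)
    t = Fraction(2 * delta_r, d)
    return D * (D - 1) * C - t * (D - 1)


def Q_closed(d, d_hat, delta_r, delta_ir):
    """Closed form v - t(u - 1) obtained at the end of the proof."""
    t = Fraction(2 * delta_r, d)
    u = d_hat - d
    v = 2 * delta_ir
    return v - t * (u - 1)


params = dict(d=3, d_hat=5, delta_r=2, delta_ir=1)
# Q does not depend on S, hence |C_x - t/D| = |Q| / (D(D-1)) = O(1/D^2).
for S in (1, 2, 10, 1000):
    assert Q_direct(S, **params) == Q_closed(**params)
print(Q_closed(**params))   # 2/3
```

Since the closed form involves only quantities attached to $x_n$, the check makes visible why the constant in (28) is uniform in $n$.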
Property R implies that both $C_{x_n}$ and $\Delta^r(x_n)/\deg(x_n)$ are bounded away from zero. This completes the proof of Theorem 3.11. ∎
The following theorem shows that the graph sequence $\hat G_n$ displays features similar to those of the model considered in [2]: namely, the average local clustering coefficient of $\hat G_n$ does not tend to zero as the size of $\hat G_n$ grows.
Theorem 3.13.
The average local clustering coefficient $\bar C(\hat G_n)$ of the graph $\hat G_n$ is bounded by two positive constants; more precisely,
$$\frac{2 n_1 n_2\, \hat C_{\min}}{N^2} \le \bar C(\hat G_n) \le \bar C(\hat G), \tag{30}$$
where $\hat C_{\min}$ was defined in (22).
Proof. We use the notation introduced in the proof of Theorem 3.11. It easily follows from the proof of Theorem 3.11 that
$$C_x \le C_{x_n}. \tag{31}$$
Namely, if $\ell(x) = 1$ then by (21), $C_x = C_{x_n}$. If $\ell(x) \ge 2$, then $S(x) \ge 1$ and
$$C_x \le \underbrace{\frac{\Delta^r(x_n) + \Delta^{ir}(x_n)}{\binom{\deg(x_n)}{2}}}_{C_{x_n}} \cdot \underbrace{\frac{\binom{\deg(x_n)}{2} \cdot S(x)}{\binom{\widehat{\deg}_n(x)}{2}}}_{\le 1} \le C_{x_n}.$$
This completes the proof of (31), from which the upper estimate of (30) follows by averaging. On the other hand, to see that the lower estimate holds, we take into consideration only the contribution of those $x \in \Sigma^n$ with $\ell(x) = 1$:
$$\bar C(\hat G_n) > \frac{1}{N^n} \left( \sum_{z \in V_1} N^{n-2} n_2\, C_z + \sum_{z \in V_2} N^{n-2} n_1\, C_z \right).$$
Using $C_z \ge \hat C_{\min}$, the lower bound of (30) follows. ∎
Definition of the randomized model
In this section we randomize the deterministic model of Section 2 by using $\Lambda \subset [0,1]^2$. The random graph sequence $G^r_n$ is generated in a way inspired by the $W$-random graphs introduced by Lovász and Szegedy [10]; see also [5].
Fix a deterministic model with base graph $G$, $|V(G)| = N$. This determines $\Lambda(a,b)$, the limit of the sequence of scaled adjacency matrices; see definitions (7) and (6) in Section 2.2. Now, for each $n$, we throw $M_n + 1$ independent uniform random numbers in $[0,1]$:
$$X^{(1)}, X^{(2)}, \dots, X^{(M_n+1)} \sim U[0,1] \text{ i.i.d.}$$
We denote the $N$-adic expansion of each of these numbers by $X^{(i)} = (X^i_1, X^i_2, \dots)$, i.e.
$$X^{(i)} = \sum_{k=1}^{\infty} \frac{X^i_k}{N^k},$$
where the $X^i_k$ are uniform over the set $\{0, 1, \dots, N-1\}$. The $n$-th approximation of $X^{(i)}$ is
$$X^{(i)}[n] = \sum_{k=1}^{n} \frac{X^i_k}{N^k}, \qquad X^{(i)}_n = (X^i_1, \dots, X^i_n).$$
Now we construct the random graph $G^r_n$ as follows: $V(G^r_n) = \{0, \dots, M_n\}$ and
$$E(G^r_n) = \left\{ (i,j) : \operatorname{int}\left( I_{X^{(i)}[n]} \times I_{X^{(j)}[n]} \right) \cap \Lambda \ne \emptyset \right\},$$
where int denotes the interior of a set. Clearly, $E(G^r_n) = \{ (i,j) : \Lambda_n(X^{(i)}, X^{(j)}) = 1 \}$. Note that
$$\Lambda_n(X^{(i)}, X^{(j)}) = 1 \iff \binom{X^i_1 \dots X^i_n}{X^j_1 \dots X^j_n} \in E(G_n).$$
Namely, we can think of the first $n$ digits $(X^i_1, \dots, X^i_n)$ and $(X^j_1, \dots, X^j_n)$ of the $N$-adic expansions of $X^{(i)}$ and $X^{(j)}$ as vertices of $G_n$. We draw an edge between the two vertices $i$ and $j$ in $G^r_n$ if the vertices $(X^i_1 \dots X^i_n)$ and $(X^j_1 \dots X^j_n)$ are connected by an edge in the deterministic model $G_n$. This gives the following probabilistic interpretation of the random model:
Remark 4.1.
Consider the deterministic graph sequence $G_n$ with urns sitting at each vertex $v \in G_n$. Now throw $M_n + 1$ balls independently and uniformly into the urns, and connect vertex $i$ to vertex $j$ by an edge in the random graph $G^r_n$ if and only if the urns of balls $i$ and $j$ are connected by an edge in $G_n$. We need to introduce some further notation.
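Remark 4.1 translates directly into a sampling procedure. The Python sketch below assumes a caller-supplied predicate `adjacent(u, w)` deciding whether two words of $\Sigma^n$ are joined in $G_n$; the construction of $G_n$ from Section 2 is not re-implemented, so this predicate is hypothetical. Each ball receives the first $n$ digits of the $N$-adic expansion of a uniform number, and the $M_n + 1$ balls are labelled $0, \dots, M_n$.

```python
import random
from itertools import combinations


def nadic_digits(x, N, n):
    """First n digits of the N-adic expansion of x in [0, 1).

    Floating point limits this to roughly 50 reliable bits, i.e. n*log2(N) <= 50.
    """
    digits = []
    for _ in range(n):
        x *= N
        d = min(int(x), N - 1)   # guard against rounding up to N
        digits.append(d)
        x -= d
    return tuple(digits)


def urn_random_graph(adjacent, N, n, M, seed=None):
    """Sample G^r_n by throwing M + 1 balls into the N**n urns of G_n.

    adjacent(u, w): hypothetical predicate for "u and w are joined in G_n".
    Returns (urns, edges): urns[i] is the word of Sigma^n where ball i
    landed, and edges contains the pairs i < j whose urns are adjacent.
    """
    rng = random.Random(seed)
    urns = [nadic_digits(rng.random(), N, n) for _ in range(M + 1)]
    edges = {(i, j) for i, j in combinations(range(M + 1), 2)
             if adjacent(urns[i], urns[j])}
    return urns, edges


# Toy usage: a hypothetical G_1 with N = 2 vertices joined by one edge, no loops.
urns, edges = urn_random_graph(lambda u, w: u != w, N=2, n=1, M=20, seed=1)
degree = [sum(1 for e in edges if i in e) for i in range(21)]
```

In this toy case the degree of a ball is the number of the other $M = 20$ balls landing in the opposite urn, i.e. a $\mathrm{BIN}(20, 1/2)$ variable, in line with the binomial description of the conditional degree. The pairwise loop costs $O(M_n^2)$ and is meant for small illustrations only.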
Frequently used definitions.
Under assumption (A1), for an $x \in G_n$ with $\ell(x) = k$ the degree of $x$ is
$$t_k := \frac{d_1^{k+1} - 1}{d_1 - 1},$$
independently of $n$.
In the random graph $G^r_n$, the conditional distribution of the degree of a random node $V \in \{0, \dots, M_n\}$, conditioned on the first $n$ digits of the $N$-adic expansion of the corresponding code $X^{(V)}$, is Binomial:
$$\left( \deg(V) \mid (X^V_1 \dots X^V_n) = x \right) \sim \mathrm{BIN}\left( M_n, \frac{t_{\ell(x)}}{N^n} \right). \tag{32}$$
This follows from the characterization of $G^r_n$ described in Remark 4.1. Namely, assume that the $V$-th ball has landed in the urn with label $x \in \Sigma^n$. In $G_n$ there are exactly $\deg_n(x) - 1 = t_{\ell(x)}$ vertices $y \in \Sigma^n$ that are connected to $x$. All the balls landing in the urns corresponding to these vertices $y$ will be connected to $V$ in $G^r_n$.
Properties of the randomized model
In this section we determine the proportion of isolated vertices and characterize the degree sequence.
5.1. Isolated vertices.
Theorem 5.1. If $M_n = c_n N^n$ with $\lim_{n \to \infty} c_n = \infty$, then the fraction of isolated vertices tends to zero as $n \to \infty$. More precisely, for a uniformly chosen node $V \in G^r_n$,
$$P(\deg(V) = 0) \le e^{-d_{\min} c_n},$$
where $d_{\min}$ stands for the minimal degree in the base graph $G$, and in $\deg(\cdot)$ we do not count the loops.
The following corollary is an immediate consequence of the Borel–Cantelli lemma.
Corollary 5.2. If $\sum_{n=1}^{\infty} c_n N^n e^{-d_{\min} c_n} < \infty$, then almost surely there will be only finitely many $n$ for which the graph $G^r_n$ has isolated vertices.
The assumption of the Corollary is satisfied if, e.g., $c_n > n \log(N+1)$.
Proof of Theorem 5.1.
Given the $N$-adic expansion of $X^{(V)}$, the probability that a vertex is isolated depends on how many neighbors the vertex $(X^V_1 \dots X^V_n)$ has in the deterministic model. So we can write
$$P(\deg(V) = 0) = \sum_{x \in \Sigma^n} P\left(\deg(V) = 0 \mid (X^V_1 \dots X^V_n) = x\right) \cdot \frac{1}{N^n}.$$
As we have already seen, $\left(\deg(V) \mid (X^V_1 \dots X^V_n) = x\right)$ follows a Binomial distribution with parameters $M_n$ and $(\deg_n(x) - 1)/N^n$, so the conditional probability of isolation is
$$P\left(\deg(V) = 0 \mid (X^V_1 \dots X^V_n) = x\right) = \left(1 - \frac{t_{\ell(x)}}{N^n}\right)^{M_n} \le e^{-\deg_n(x)\, c_n (1 + o(1))}.$$
Obviously $e^{-\deg_n(x) c_n} \le e^{-d_{\min} c_n}$ holds for all $x \in \Sigma^n$, which completes the proof. ∎
Decay of the degree distribution.
Fix a constant $K$ such that for a standard normal variable $Z$, P(|Z| > K) < e − . We write
$$I_{k,n} := \left[ c_n t_k - K\sqrt{c_n t_k},\ c_n t_k + K\sqrt{c_n t_k} \right],$$
and k(n) := max{ (n + 1) log d log d , log n log d }.
Now we describe the degree distribution for the random model.
Theorem 5.3.
Let $k > k(n)$ and $u \in I_{k,n}$. Then for a uniformly chosen node $V$ in $G^r_n$
$$P(\deg(V) = u) = \left(\frac{n_1}{N}\right)^{k} \frac{n_2}{N} \cdot \frac{1}{\sqrt{c_n t_k}}\, \varphi\left( \frac{u - c_n t_k}{\sqrt{c_n t_k \left(1 - \frac{t_k}{N^n}\right)}} \right) \left(1 + O\left(\frac{1}{\sqrt{c_n t_k}}\right)\right),$$
where $\varphi$ denotes the density function of a standard Gaussian variable.
This immediately implies
Corollary 5.4.
The degree distribution of the random model is given by the following formula for $a, b \in [-K, K]$:
$$P\left(\deg(V) \in \left[c_n t_k + a\sqrt{c_n t_k},\ c_n t_k + b\sqrt{c_n t_k}\right]\right) = \left(\frac{n_1}{N}\right)^{k} \frac{n_2}{N} \left(\Phi(b) - \Phi(a)\right) + O\left( \frac{(n_1/N)^{k}}{\sqrt{c_n t_k}} \right),$$
where $k > k(n)$ and $\Phi$ denotes the distribution function of a standard Gaussian variable. So, for $u \in I_{k,n}$, $k > k(n)$, the tail of the probability distribution is
$$P(\deg(V) > u) = \left(\frac{n_1}{N}\right)^{k+1} + \left(\frac{n_1}{N}\right)^{k} \frac{n_2}{N} \left( 1 - \Phi\left( \frac{u - c_n t_k}{\sqrt{c_n t_k \left(1 - \frac{t_k}{N^n}\right)}} \right) \right) + \left(\frac{n_1}{N}\right)^{k+1} O\left( \frac{1}{\sqrt{c_n t_k}} \right). \tag{33}$$
This holds because $P(\deg(V) > u)$ equals the sum of all the probability mass concentrated around the values $c_n t_l$ for $l \ge k+1$, which gives the first term, plus the part of the binomial mass around $c_n t_k$ that lies above $u$, which gives the second term. As a consequence, the decay of the degree distribution follows a power law. Namely, the following holds.
Theorem 5.5.
Let $\gamma := 1 + \frac{\log(N/n_1)}{\log d_1}$. Then the decay of the degree distribution is
$$P(\deg(V) > u) = u^{-\gamma + 1} \cdot L(u),$$
where $L(u)$ is a bounded function: $\frac{n_1}{N} \le L(u) \le \frac{N}{n_1}$.
The idea of the proof of Theorem 5.3.
The conditional distribution of the degree of a node $V$, conditioned on the $n$-digit $N$-adic expansion $X^{(V)}_n = x$, is a $\mathrm{BIN}(c_n N^n, t_{\ell(x)}/N^n)$ law. This is close to a $\mathrm{POI}(c_n t_{\ell(x)})$ random variable, because $c_n$ and $t_{\ell(x)}$ tend to infinity at a much smaller rate than $N^n$. For the $\mathrm{POI}(c_n t_{\ell(x)})$ variable, the Central Limit Theorem holds with an error term of order $1/\sqrt{c_n t_{\ell(x)}}$. The unconditional degree distribution then follows from the law of total probability and the fact that all other errors are negligible. ∎
Proof of Theorem 5.3.
We determined the degree distribution of the deterministic model under assumption (A1) in Section 3.1. Recall that if $k > k(n)$, then the mass at $t_k$ is
$$p_k := P(\ell(x) = k) = \left(\frac{n_1}{N}\right)^{k} \frac{n_2}{N}.$$
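The first term of (33) and the exponent $\gamma$ of Theorem 5.5 both come from a geometric sum. The Python sketch below assumes $p_l = a^l b$ for every $l \ge k+1$, with $a = n_1/N$ and $b = n_2/N = 1 - a$ (that is, $n_1 + n_2 = N$). It checks exactly that the finite tail $\sum_{l=k+1}^{n} p_l$ telescopes to $a^{k+1} - a^{n+1}$, whose limit is the first term of (33), and that $a^k = (d_1^k)^{-(\gamma-1)}$. The sizes $N$, $n_1$, $d_1$ used below are hypothetical test values.

```python
import math
from fractions import Fraction


def tail_mass(a, k, n):
    """sum_{l=k+1}^{n} a**l * (1 - a), computed exactly when a is a Fraction."""
    b = 1 - a
    return sum(a**l * b for l in range(k + 1, n + 1))


def gamma_exponent(N, n1, d1):
    """gamma = 1 + log(N / n1) / log(d1), as in Theorem 5.5."""
    return 1 + math.log(N / n1) / math.log(d1)


# Hypothetical sizes: N = 6 base vertices, n1 = 4 of them in one class, d1 = 3.
N, n1, d1 = 6, 4, 3
a = Fraction(n1, N)
k, n = 3, 20

# Telescoping: each term a^l - a^(l+1) cancels with its neighbour.
assert tail_mass(a, k, n) == a**(k + 1) - a**(n + 1)

g = gamma_exponent(N, n1, d1)
u_k = d1**k    # leading order of u on I_{k,n}, up to the factor (1 + O(1/d1))
print(math.isclose((n1 / N)**k, u_k ** (-(g - 1))))   # True
```

The identity $a^k = (d_1^k)^{-(\gamma - 1)}$ holds exactly because $(d_1^k)^{-\log(N/n_1)/\log d_1} = e^{-k \log(N/n_1)}$; the bounded factor $L(u)$ absorbs the remaining corrections.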
We show that in the random model $G^r_n$ these Dirac masses are turned into Gaussian masses centered at $c_n t_k$. Suppose $u \in I_{k,n}$. By the law of total probability we have
$$P(\deg(V) = u) = P\left(\deg(V) = u \mid (X^V_1 \dots X^V_n) = x,\ \ell(x) = k\right) \cdot p_k + S_1 + S_2, \tag{34}$$
where
$$S_1 = \sum_{j=1}^{k-1} P\left(\deg(V) = u \mid (X^V_1 \dots X^V_n) = x,\ \ell(x) = j\right) \cdot p_j,$$
$$S_2 = \sum_{j=k+1}^{n} P\left(\deg(V) = u \mid (X^V_1 \dots X^V_n) = x,\ \ell(x) = j\right) \cdot p_j.$$
$S_1$ and $S_2$ combine the total contribution of the cases when $\ell(X^V_1 \dots X^V_n) \ne k$; that is, referring to the urn model of our random graph, $S_1 + S_2$ accounts for the cases when the random ball $V$ falls into an urn whose degree in $G_n$ is different from $t_k$. As a first step in our proof we show that the first term on the right-hand side of (34) gives the formula in Theorem 5.3; as a second step we verify that $S_1 + S_2$ is negligible.
First step:
Following the standard proof of the local form of the de Moivre–Laplace CLT, we obtain that for $u \in I_{k,n}$
$$P\left(\deg(V) = u \mid (X^V_1 \dots X^V_n) = x\right) = \frac{1}{\sqrt{c_n t_{\ell(x)} \left(1 - \frac{t_{\ell(x)}}{N^n}\right)}}\, \varphi\left( \frac{u - c_n t_{\ell(x)}}{\sqrt{c_n t_{\ell(x)} \left(1 - \frac{t_{\ell(x)}}{N^n}\right)}} \right) \cdot \left(1 + O\left(\frac{1}{\sqrt{c_n t_{\ell(x)}}}\right)\right).$$
Since $t_{\ell(x)}/N^n \to 0$, we can neglect the factor $1 - t_{\ell(x)}/N^n$ in the prefactor. This completes the first step.
Second step:
Since $u \in I_{k,n}$, we have
$$\begin{aligned} S_1 &\le \sum_{j=1}^{k-1} P\left(\deg(V) > c_n t_k - K\sqrt{c_n t_k} \mid (X^V_1 \dots X^V_n) = x,\ \ell(x) = j\right) \cdot p_j, \\ S_2 &\le \sum_{j=k+1}^{n} P\left(\deg(V) < c_n t_k + K\sqrt{c_n t_k} \mid (X^V_1 \dots X^V_n) = x,\ \ell(x) = j\right) \cdot p_j. \end{aligned} \tag{35}$$
Now we use the following fact, known from Chernoff bounds: for a variable $Z \sim \mathrm{BIN}(m, p)$, P(Z ≥ (1 + δ)E(Z)) ≤ e − δ E(Z), and the same bound holds for $P(Z \le (1 - \delta)\mathbb{E}(Z))$. By (32), to estimate each summand in (35) we can apply these inequalities to
$$Z_j \sim \mathrm{BIN}\left( c_n N^n, \frac{t_j}{N^n} \right), \qquad j \in \{1, \dots, n\} \setminus \{k\},$$
yielding an upper bound S + S ≤ k − Σ j =1 e − d k − j c n · p j + n Σ j = k +1 e − (1 − d k − j ) d j c n · p j ≤ e − d k c n . Since e − d k c n = o ( √ c n t k ), the statement of Theorem 5.3 follows. ∎
Now we are ready to prove the main result of the section.
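The first step of the proof of Theorem 5.3 rests on the local de Moivre–Laplace approximation of the binomial law (32). The Python sketch below compares the exact $\mathrm{BIN}(c_n N^n, t_k/N^n)$ probabilities with the Gaussian expression of the first step (keeping the factor $1 - t_k/N^n$ in both places) on the window $I_{k,n}$ with $K = 2$. The values of $c_n$, $N^n$ and $t_k$ are hypothetical; the check only illustrates that the relative error is of the order $1/\sqrt{c_n t_k}$ and proves nothing.

```python
import math


def binom_pmf(m, p, u):
    """Exact BIN(m, p) probability of the value u, computed in log space."""
    log_pmf = (math.lgamma(m + 1) - math.lgamma(u + 1) - math.lgamma(m - u + 1)
               + u * math.log(p) + (m - u) * math.log1p(-p))
    return math.exp(log_pmf)


def gaussian_local(c_n, t_k, N_pow_n, u):
    """phi(z) / sqrt(c_n t_k (1 - t_k / N^n)) with z the standardized value of u."""
    var = c_n * t_k * (1 - t_k / N_pow_n)
    z = (u - c_n * t_k) / math.sqrt(var)
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi * var)


# Hypothetical values: N^n = 10**6, c_n = 4, t_k = 100, so c_n t_k = 400.
c_n, t_k, N_pow_n = 4, 100, 10**6
m, p = c_n * N_pow_n, t_k / N_pow_n        # the law (32) with M_n = c_n N^n
sd = math.sqrt(c_n * t_k)
K = 2.0
window = range(math.ceil(c_n * t_k - K * sd), math.floor(c_n * t_k + K * sd) + 1)
worst = max(abs(binom_pmf(m, p, u) / gaussian_local(c_n, t_k, N_pow_n, u) - 1)
            for u in window)
print(worst, 1 / sd)   # worst relative error vs. the scale 1/sqrt(c_n t_k)
```

Working with `math.lgamma` and `math.log1p` avoids overflow of the binomial coefficient for $m$ in the millions.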
Proof of Theorem 5.5. If $u \in I_{k,n}$, then $u = d_1^k \cdot \left(1 + O\left(\frac{1}{d_1}\right)\right)$. Using (33) we obtain that there exists $C(u) \in \left[\frac{n_1}{N}, 1\right]$ such that
$$P(\deg(V) > u) = \left(\frac{n_1}{N}\right)^{k} C(u).$$
The last two formulas immediately imply the assertion of the Theorem whenever $u \in I_{k,n}$; in this case we actually have $\frac{n_1}{N} \le L(u) \le 1$. If $u \notin \bigcup_k I_{k,n}$, then there exists $k = k(u)$ such that $u \in (c_n t_k, c_n t_{k+1})$. By monotonicity of the distribution function we have
$$P(\deg(V) > c_n t_{k+1}) \le P(\deg(V) > u) \le P(\deg(V) > c_n t_k).$$
Applying the theorem for $c_n t_{k+1}$ and $c_n t_k$, we lose a factor of $N/n_1$ in the upper bound of $L(u)$, and the assertion of the Theorem follows. ∎
References
[1] A.-L. Barabási, R. Albert, Emergence of Scaling in Random Networks. Science, no. 5439, 509–512 (1999).
[2] R. Albert, A.-L. Barabási, Hierarchical organization in complex networks. Rev. Mod. Phys., 47 (2002).
[3] A.-L. Barabási, E. Ravasz, T. Vicsek, Deterministic Scale-Free Networks. Physica A: Statistical Mechanics and its Applications, Issues 3–4, 559–564 (2001).
[4] B. Bollobás, Random graphs. Cambridge University Press (2001).
[5] B. Bollobás, S. Janson, O. Riordan, The phase transition in inhomogeneous random graphs. Random Structures Algorithms, no. 1, 3–122 (2007).
[6] P. Diaconis, S. Janson, Graph limits and exchangeable random graphs. Rend. Mat. Appl. (7) Wiley, 1997.
[9] S. Jung, S. Kim and B. Kahng, A Geometric Fractal Growth Model for Scale-Free Networks. Phys. Rev. E Journal of Combinatorial Theory, 933–957 (2006).
[11] G. Palla, L. Lovász and T. Vicsek, Multifractal network generator. PNAS, 7640–7645.
Júlia Komjáthy, Institute of Mathematics, Technical University of Budapest, H-1529 B.O.box 91, Hungary