Iterated Global Models for Complex Networks
Anthony Bonato and Erin Meger
Ryerson University, Toronto, ON, Canada
Abstract.
We introduce the Iterated Global model as a deterministic graph process that simulates several properties of complex networks. In this model, for every set $S$ of nodes of a prescribed cardinality, we add a new node that is adjacent to every node in $S$. We focus on the case where the size of $S$ is approximately half the number of nodes at each time-step, and we refer to this as the half-model. The half-model provably generates graphs that densify over time, have bad spectral expansion, and have low diameter. We derive the clique, chromatic, and domination numbers of graphs generated by the model.

Keywords: Network models · social networks · densification · spectral graph theory

Over the last two decades, research in modelling complex networks has become of great interest to mathematicians and theoretical computer scientists. Complex networks arise in technological, social, and biological contexts. The emergence of the study of complex networks such as the web graph and on-line social networks has focused attention on these large-scale graphs, and on the modelling and mining of their emergent properties; see [1,5,6] for more on these models.

Two deterministic models of complex networks of particular interest to the current study were introduced: the
Iterated Local Transitivity (ILT) model and the Iterated Local Anti-Transitivity (ILAT) model [4,3]. Consider a social network where friendships have positive edge signs and adversarial relations have negative edge signs. A triad is a set of three nodes in a signed network. A triad is said to be balanced if the product of the edge signs is positive. Structural balance theory states that these networks seek to balance all triads [8]. The ILT and ILAT models were designed with balanced triads in mind. In the ILT model, nodes are cloned, where a new node is adjacent to all neighbors of its parent node. In the ILAT model, nodes are anti-cloned, where a new node is adjacent to all non-neighbors of its parent node. The ILT and ILAT models simulate many properties of social networks. For example, as shown in [4], graphs generated by the model densify over time (see [10] for more on densification), and exhibit bad spectral expansion (see [9] for more on this topic in social networks). In addition, the ILT model generates graphs with the small-world property, which requires the graphs to have low diameter and dense neighbor sets. Both the ILT and ILAT models were unified in the recent context of Iterated Local Models (ILM) in [2].

(The first author acknowledges funding from an NSERC Discovery grant.)

The ILT, ILAT, and ILM models focused on the local structure of the graph, iteratively generating new graphs from this structure. We now define a model that is independent of the structure of the initial graph but retains the iterative character of the previously defined models. We introduce the
Iterated Global Models, where a dominating node is added for each subset of nodes of a given cardinality.

Let $k \ge 2$ be an integer and let $G = G_0$ be a graph. At each time-step $t \ge 0$, we create $G_{t+1}$ from $G_t$ in the following way: for each set $S$ of nodes of cardinality $\lfloor \frac{1}{k} |V(G_t)| \rfloor$, add a new node $v_S$ that is adjacent to each node of $S$. We name this process the $k$-model. For ease of notation and for consistency with earlier work, we refer to newly added nodes in $G_{t+1}$ as clones. Note that the clones form an independent set in $G_{t+1}$. For the sake of clarity, we focus in this paper on the case $k = 2$, which we refer to as the half-model. In the half-model, each new node is adjacent to approximately half of the existing network. See Figure 1 for an example.

Fig. 1: One time-step of the half-model beginning with a cycle.

While structural balance theory considers the importance of local ties, the half-model may be useful in analyzing complex networks where nodes interact via weaker, non-local ties. In social networks such as Twitter, Instagram, or Reddit, we may form a network of users where links are determined by likes or comments. For example, a user on Reddit may choose to comment on a fraction of the posts they read, which is reflective of the design of the half-model.

The paper is organized as follows. In Section 2, we prove that, as observed in complex networks, the half-model densifies over time and has bad spectral expansion. We also show that after five time-steps, graphs generated by the model have diameter at most 3. The half-model is of graph theoretic interest in its own right, and in Section 3 we determine the clique, chromatic, and domination numbers of graphs generated by the model. We conclude with further directions to investigate for the half-model.

For a general reference on graph theory, the reader is directed to [13]. For background on social and complex networks, see [1,5,7]. Throughout the paper, we consider finite, undirected graphs.
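The definition of the $k$-model can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the paper: the function name `k_model_step`, the integer node labels, and the choice of the 4-cycle as the starting graph are all assumptions made for the example.

```python
from itertools import combinations

def k_model_step(nodes, edges, k=2):
    """Build G_{t+1} from G_t = (nodes, edges) in the k-model (hypothetical helper).

    nodes: list of distinct integers; edges: set of frozenset({u, v}).
    """
    size = len(nodes) // k                  # floor(|V(G_t)| / k)
    new_nodes = list(nodes)
    new_edges = set(edges)
    next_label = max(nodes) + 1
    for subset in combinations(nodes, size):
        clone = next_label                  # the new node v_S for the set S
        next_label += 1
        new_nodes.append(clone)
        for v in subset:                    # v_S is adjacent to every node of S
            new_edges.add(frozenset((v, clone)))
    return new_nodes, new_edges

# One half-model step (k = 2) starting from the 4-cycle.
c4_nodes = [0, 1, 2, 3]
c4_edges = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3), (3, 0)]}
g1_nodes, g1_edges = k_model_step(c4_nodes, c4_edges)
print(len(g1_nodes), len(g1_edges))
```

Starting from four nodes, there are six sets of size two, so the step adds six clones of degree two each. No edge is ever added between two clones, which reflects the observation that the clones form an independent set.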
Our first result establishes the order and size of graphs generated by the half-model. We first recall Stirling's approximation for the factorial, given by

$$n! \sim \sqrt{2\pi n} \left( \frac{n}{e} \right)^n.$$
Stirling's approximation may be used to derive an expression for the central binomial coefficient given by

$$\binom{2n}{n} \sim \frac{4^n}{\sqrt{\pi n}},$$

which is part of folklore. Such an approximation will be useful in our analysis, and its usefulness has provided motivation for the study of the half-model as opposed to other values of $k$. For an exposition of the asymptotics of binomial coefficients, see the book [12].

The number of nodes of $G_t$ is denoted $n_t$, and the number of edges is denoted $e_t$.

Theorem 1. The order and size of the graph $G_t$ in the half-model are given by the following, respectively:

$$n_t \sim \binom{n_{t-1}}{\lfloor n_{t-1}/2 \rfloor} \quad \text{and} \quad e_t \sim \binom{n_{t-1}}{\lfloor n_{t-1}/2 \rfloor} \cdot \left\lfloor \frac{n_{t-1}}{2} \right\rfloor.$$

Before we give the proof of Theorem 1, we simplify notation by defining the function

$$\alpha_t = \binom{n_t}{\lfloor n_t/2 \rfloor}.$$

Proof.
We begin with the order of $G_t$. By the definition of the model, at each time-step $t \ge 1$, we add one node for each set of size $\lfloor n_{t-1}/2 \rfloor$. Hence, we derive the sum

$$n_t = n_0 + \sum_{i=1}^{t} \alpha_{i-1}.$$

The term $\alpha_{t-1}$ dominates the rest of the summation, which gives us the desired expression for the order of $G_t$.

Next, we determine the size of $G_t$. Each new node added is adjacent to a set of size $\lfloor n_{t-1}/2 \rfloor$, and we add $\alpha_{t-1}$ nodes, so we obtain the following recursive formula for the number of edges at time-step $t$:

$$e_t = e_{t-1} + \left\lfloor \frac{n_{t-1}}{2} \right\rfloor \alpha_{t-1}.$$

We observe that the second term dominates the sum, and the result follows. □

We say that a network densifies if the ratio of edges to nodes tends to infinity over time. Densification power laws in complex networks were first reported in [10]. Theorem 1 yields Corollary 1 on the densification of the half-model.
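The two recurrences from the proof of Theorem 1 can be iterated exactly for the first few time-steps. The sketch below is an illustration under assumed inputs: the helper name `half_model_counts` and the starting values $n_0 = 4$, $e_0 = 4$ (a 4-cycle) are choices made for the example. It uses exact integer arithmetic, since $\alpha_t$ grows far too quickly for more than three steps to be practical.

```python
from math import comb

def half_model_counts(n0, e0, steps):
    """Iterate n_t = n_{t-1} + alpha_{t-1} and
    e_t = e_{t-1} + floor(n_{t-1}/2) * alpha_{t-1} (hypothetical helper)."""
    counts = [(n0, e0)]
    n, e = n0, e0
    for _ in range(steps):
        half = n // 2
        alpha = comb(n, half)            # alpha_{t-1}: number of clones added
        n, e = n + alpha, e + half * alpha
        counts.append((n, e))
    return counts

counts = half_model_counts(4, 4, 3)
for t, (n, e) in enumerate(counts):
    # e // n is the integer part of the edge-to-node ratio at time t
    print(t, len(str(n)), e // n)        # time-step, digits in n_t, floor(e_t / n_t)
```

For this starting graph the first two steps give $(n_1, e_1) = (10, 16)$ and $(n_2, e_2) = (262, 1276)$, and at $t = 3$ the ratio $e_t / n_t$ is just below $\lfloor n_2 / 2 \rfloor = 131$, in line with the asymptotic expressions for $n_t$ and $e_t$.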
Corollary 1.
The half-model densifies with time.

Proof. By Theorem 1, we have that

$$\frac{e_t}{n_t} \sim \frac{\alpha_{t-1} \cdot \lfloor n_{t-1}/2 \rfloor}{\alpha_{t-1}} = \left\lfloor \frac{n_{t-1}}{2} \right\rfloor,$$

which tends to infinity with $t$. □

For a graph $G$ and sets of nodes $X, Y \subseteq V(G)$, define $E(X, Y)$ to be the set of edges in $G$ with one endpoint in $X$ and the other in $Y$. For simplicity, we write $E(X) = E(X, X)$. Let $A$ denote the adjacency matrix and $D$ denote the diagonal degree matrix of a graph $G$. The normalized Laplacian of $G$ is

$$\mathcal{L} = I - D^{-1/2} A D^{-1/2}.$$

Let $0 = \lambda_0 \le \lambda_1 \le \cdots \le \lambda_{n-1} \le 2$ denote the eigenvalues of $\mathcal{L}$. The spectral gap of the normalized Laplacian is defined as

$$\lambda = \max\{ |\lambda_1 - 1|, |\lambda_{n-1} - 1| \}.$$

We will use the expander mixing lemma for the normalized Laplacian [6]. For sets of nodes $X$ and $Y$, we use the notation $\mathrm{vol}(X) = \sum_{v \in X} \deg(v)$ for the volume of $X$, $\overline{X} = V \setminus X$ for the complement of $X$, and $e(X, Y)$ for the number of edges with one end in each of $X$ and $Y$. Note that $X \cap Y$ need not be empty, and in this case, the edges completely contained in $X \cap Y$ are counted twice. In particular, $e(X, X) = 2|E(X)|$.

Lemma 1 (Expander mixing lemma). [6] If $G$ is a graph with spectral gap $\lambda$, then, for all sets $X \subseteq V(G)$,

$$\left| e(X, X) - \frac{(\mathrm{vol}(X))^2}{\mathrm{vol}(G)} \right| \le \lambda \, \frac{\mathrm{vol}(X)\,\mathrm{vol}(\overline{X})}{\mathrm{vol}(G)}.$$
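When $X$ is an independent set, $e(X, X) = 0$ and the inequality of Lemma 1 rearranges to $\lambda \ge \mathrm{vol}(X) / \mathrm{vol}(\overline{X})$. The sketch below evaluates these quantities for the set of clones after one half-model step. It is an illustration only: the helper names and the choice of the 4-cycle as the starting graph are assumptions, and the code computes the lower bound implied by the lemma, not the spectral gap itself.

```python
from itertools import combinations

def half_model_step(adj):
    """One half-model step on an adjacency dict {node: set of neighbours}."""
    old = sorted(adj)
    new_adj = {v: set(nbrs) for v, nbrs in adj.items()}
    label = max(old) + 1
    for subset in combinations(old, len(old) // 2):
        new_adj[label] = set(subset)     # the clone is adjacent to all of S
        for v in subset:
            new_adj[v].add(label)
        label += 1
    return new_adj

def vol(adj, nodes):
    """Volume of a node set: the sum of its degrees."""
    return sum(len(adj[v]) for v in nodes)

def e_count(adj, xs, ys):
    """e(X, Y): adjacent pairs (x, y) with x in X and y in Y, so edges
    inside the intersection of X and Y are counted twice."""
    return sum(1 for x in xs for y in adj[x] if y in ys)

c4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
g1 = half_model_step(c4)
clones = set(g1) - set(c4)               # X: the independent set of clones
rest = set(c4)                           # complement of X

e_xx = e_count(g1, clones, clones)
vol_x, vol_rest = vol(g1, clones), vol(g1, rest)
lower_bound = vol_x / vol_rest           # lambda >= vol(X) / vol(complement)
print(e_xx, vol_x, vol_rest, lower_bound)
```

On this small example the bound is $12/20 = 0.6$. The point of the sketch is the mechanism: an independent set carrying a large share of the volume forces the spectral gap away from zero.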
A spectral gap bounded away from zero is an indication of bad expansion properties, which is characteristic of social networks [9]. The next theorem represents a drastic departure from the good expansion found in binomial random graphs, where $\lambda = o(1)$ [6].

Theorem 2. Graphs generated by the half-model satisfy $\lambda_t \sim 1$, where $\lambda_t$ is the spectral gap of $G_t$.

Proof. Let $X = V(G_t) \setminus V(G_{t-1})$ be the set of cloned nodes added to $G_{t-1}$ to form $G_t$. Since $X$ is an independent set, we note that $e(X, X) = 0$. We derive that

$$\mathrm{vol}(G_t) = 2e_t \sim \alpha_{t-1} \cdot n_{t-1}, \quad \mathrm{vol}(X) = \alpha_{t-1} \cdot \left\lfloor \frac{n_{t-1}}{2} \right\rfloor, \quad \mathrm{vol}(\overline{X}) \sim \alpha_{t-1} \cdot \left\lfloor \frac{n_{t-1}}{2} \right\rfloor.$$

Hence, by Lemma 1, we have that

$$\lambda_t \ge \frac{(\mathrm{vol}(X))^2}{\mathrm{vol}(G_t)} \cdot \frac{\mathrm{vol}(G_t)}{\mathrm{vol}(X)\,\mathrm{vol}(\overline{X})} = \frac{\mathrm{vol}(X)}{\mathrm{vol}(\overline{X})} \sim \frac{\alpha_{t-1} \cdot \lfloor n_{t-1}/2 \rfloor}{\alpha_{t-1} \cdot \lfloor n_{t-1}/2 \rfloor} = 1,$$

and the result follows. □

We observe that the half-model has a small (in fact, constant) diameter, as required for the small-world property. We first prove some results about the connectivity of graphs generated by this model.
Lemma 2.
For all $t \ge 0$, if $G_t$ is connected and $n_t \ge 2$, then $G_{t+1}$ is connected.

Proof. If $v$ is a clone in $G_{t+1}$, then since $n_t \ge 2$, we have that $v$ is adjacent to at least one node $u$ in $V(G_t)$. Since $G_t$ is connected by hypothesis, there exists a path from $u$ to any other node of $G_t$, and hence, there is such a path from $v$ to any node of $G_t$. Since the node $v$ was an arbitrary clone, we have shown there exists a path between any two nodes in $G_{t+1}$. □

In the case where $n_0 = 1$, the graph $G_0$ is $K_1$. Note that $G_1$ is $\overline{K_2}$, and $G_2$ is the disjoint union of two edges. In particular, $G_1$ and $G_2$ are not connected. The subsequent lemma will provide insight into how many iterations a disconnected graph requires before becoming connected.

Lemma 3.
For all $t \ge 0$, if $G_t$ is a graph with $n_t \ge 4$, then $G_{t+1}$ is connected.

Proof. We proceed by contraposition. Suppose that $G_{t+1}$ is disconnected, so there exist two nodes $u, v$ in $G_{t+1}$ such that there is no path between them.

Case 1: $u, v$ are both in $V(G_t)$.
In this case, there is no set of size $\lfloor n_t/2 \rfloor$ that contains both $u$ and $v$, since otherwise, a clone in $G_{t+1}$ would be adjacent to both $u$ and $v$. At each time-step $t$, we add a clone for every subset of size $\lfloor n_t/2 \rfloor$; hence, it must be the case that $\lfloor n_t/2 \rfloor < 2$, and so $n_t \le 3$. This is the negation of the hypothesis, and we have proved the result in this case.

Case 2: Exactly one of $u$ or $v$ is not in $V(G_t)$; without loss of generality, say $u \in V(G_{t+1}) \setminus V(G_t)$.
As $u$ is a clone it has degree $\lfloor n_t/2 \rfloor$, and so has a neighbor $x$ in $G_t$ whenever $n_t \ge 2$. Thus, there is no path from $x$ to $v$ in $G_{t+1}$, and we apply Case 1 using these two nodes.

Case 3: Both $u, v$ are in $V(G_{t+1}) \setminus V(G_t)$.
Since there are at least two clones, it must be the case that $\alpha_t \ge 2$, and so $n_t \ge 2$. There then exists some neighbor $x$ of $u$ in $G_t$ and some neighbor $y$ of $v$ in $G_t$. We then have that there is no path from $x$ to $y$ in $G_{t+1}$, and we apply Case 1 to these two nodes. The proof follows. □
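Lemma 3 can be checked by brute force on small inputs. The following sketch is an illustration, not part of the paper: the helper names and the use of edgeless starting graphs (the least favourable case for connectivity) are assumptions. It applies one half-model step and tests connectivity with a breadth-first search.

```python
from collections import deque
from itertools import combinations

def half_model_step(adj):
    """One half-model step on an adjacency dict {node: set of neighbours}."""
    old = sorted(adj)
    new_adj = {v: set(nbrs) for v, nbrs in adj.items()}
    label = max(old) + 1
    for subset in combinations(old, len(old) // 2):
        new_adj[label] = set(subset)
        for v in subset:
            new_adj[v].add(label)
        label += 1
    return new_adj

def is_connected(adj):
    """Breadth-first search from an arbitrary node; True if it reaches all nodes."""
    start = next(iter(adj))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == len(adj)

def edgeless(n):
    """Edgeless graph on nodes 0..n-1, which is disconnected for n >= 2."""
    return {v: set() for v in range(n)}

for n in range(2, 7):
    print(n, is_connected(half_model_step(edgeless(n))))
```

With two or three starting nodes each clone is adjacent to a single node, so the result is a disjoint union of edges. From four nodes onward every pair of original nodes shares a clone, which is exactly the mechanism used in Case 1 of the proof.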
Our next result proves the 2-connectivity of graphs generated by the half-model.
Lemma 4.
The graph $G_t$ is 2-connected whenever $t \ge 4$, regardless of the input graph $G_0$.

Proof. Using the formula for the order of $G_t$ in the proof of Theorem 1, for any graph $G_0$, we have at least four nodes after two time-steps. Using Lemma 3, we require at least one additional time-step to ensure connectivity. Thus, regardless of the input graph $G_0$, it is the case that $G_t$ is connected for $t \ge 3$. It remains to show that once $G_t$ is connected, $G_{t+1}$ will be 2-connected.

Claim: If $G_t$ is connected and $n_t \ge 4$, then $G_{t+1}$ is 2-connected.

If $G_t$ is 2-connected, then we are done, since every node in the set $V(G_{t+1}) \setminus V(G_t)$ has at least one neighbor in $V(G_t)$, and we may use the same two paths between those neighbors to find 2-connectivity. Suppose $G_t$ is at most 1-connected, and let $u$ be a cut-node of $G_t$. Consider two nodes in $G_t$, say $a, b$, that have a shortest path through $u$. In $G_{t+1}$, there is some clone $z$ that is adjacent to both $a$ and $b$. Therefore, we have two paths from $a$ to $b$, and the proofs of the claim and lemma follow. □

Our main result on the diameter of half-model graphs is the following.
Theorem 3.
Suppose that $G_0$ has order at least 4. In the half-model, the diameter of $G_t$, for $t \ge 5$, is at most three.

Proof. We consider the distance between two non-adjacent nodes $x, y \in V(G_t)$ in three cases.

Case 1: $x, y \in V(G_{t-1})$.
There exists some set $S \subseteq V(G_{t-1})$ of cardinality $\lfloor n_{t-1}/2 \rfloor$ containing both $x$ and $y$. Thus, the dominating node $v_S$ for this set $S$ is adjacent to both $x$ and $y$, so their distance is 2.

Case 2: $x \in V(G_{t-1})$ and $y \notin V(G_{t-1})$.
There exists a node $z \in N_{G_t}(y)$. There is some set $S \subseteq V(G_{t-1})$ so that $x, z \in S$. The node $v_S$ that dominates $S$ in $G_t$ is adjacent to both $x$ and $z$, so we have the path $y z v_S x$. Hence, the distance between $x$ and $y$ is at most 3. The symmetric case where $y \in V(G_{t-1})$ and $x \notin V(G_{t-1})$ is analogous.

Case 3: $x, y \notin V(G_{t-1})$.
Since $x, y$ are new nodes in time-step $t$, there must be two sets $S_x, S_y \subseteq V(G_{t-1})$, where $x$ dominates $S_x$ and $y$ dominates $S_y$. If $S_x \cap S_y \ne \emptyset$, then there is some node of $G_{t-1}$ adjacent to both $x$ and $y$, so their distance is 2. Suppose now that $S_x \cap S_y = \emptyset$. Since $|S_x| = |S_y| = \lfloor n_{t-1}/2 \rfloor$, it may be the case that there exists a node $z \notin S_x \cup S_y$.

Suppose first that there is no such node $z$. There must be some edge with one endpoint in $S_x$ and the other in $S_y$, since otherwise, the graph $G_{t-1}$ would be disconnected, which contradicts Lemma 4. We call these two endpoints $a$ and $b$. We then have a path $x a b y$, and the distance between $x$ and $y$ is 3.

If there is such a node $z$, then since $G_{t-1}$ is 2-connected by Lemma 4, $z$ cannot be a cut-node. Therefore, there must be some edge with one endpoint in $S_x$ and the other in $S_y$, and the distance between $x$ and $y$ is 3. □

In this section, we discuss classical graph parameters for the half-model. For further background on these parameters, the reader is directed to [13]. We begin by considering the independence and clique numbers.
Theorem 4.
The independence number of $G_t$ is $\alpha_{t-1}$, and for the clique number we have

$$\omega(G_t) \ge \min\left( \left\lfloor \frac{n_{t-1}}{2} \right\rfloor + 1, \; \omega(G_0) + t \right).$$

Proof. At each time-step $t$, all the cloned nodes form an independent set. The set of new nodes has order $\alpha_{t-1} \ge n_{t-1}$, so this set must be the largest independent set in $G_t$. Therefore, we derive that $\alpha(G_t) = \alpha_{t-1}$.

We next consider the clique number of $G_t$. At each time-step $t$, we add a dominating node to subsets of cardinality $\lfloor n_{t-1}/2 \rfloor$ from $G_{t-1}$. If the largest clique $K$ at time-step $t-1$ has at most $\lfloor n_{t-1}/2 \rfloor$ nodes, then some clone is adjacent to every node of $K$, which increases the order of the largest clique by 1. However, the maximum degree of new nodes is $\lfloor n_{t-1}/2 \rfloor$. Hence, we cannot increase the size of the largest clique to be larger than $\lfloor n_{t-1}/2 \rfloor + 1$. □

We next give the chromatic number of the half-model.
Theorem 5.
For the half-model, we have that the chromatic number is given by

$$\chi(G_t) = \min\left( \chi(G_0) + t, \; \left\lfloor \frac{n_{t-1}}{2} \right\rfloor + 1 \right).$$

Proof. Suppose that $G_t$ is properly colored. Consider a rainbow subset of nodes; that is, a set of nodes that contains every color used in the graph. Let the cardinality of this set be $r \ge 1$. When $r \le \lfloor n_{t-1}/2 \rfloor$, any new clone whose neighbor set contains this set will need a new color. When $r > \lfloor n_{t-1}/2 \rfloor$, any new clone that is added will have a neighbor set smaller than the number of colors, which implies there will always be an available color. □

We finish by proving a result on the domination number of graphs generated by the half-model.
Theorem 6.
The domination number of $G_t$ is

$$\gamma(G_t) = \left\lceil \frac{n_{t-1}}{2} \right\rceil + 1.$$

Proof. We will first establish the upper bound

$$\gamma(G_t) \le \left\lceil \frac{n_{t-1}}{2} \right\rceil + 1.$$

Consider a set $S$ of $\lfloor n_{t-1}/2 \rfloor$ nodes of $G_{t-1}$ (that is, non-clones in $G_t$). The node $x_S$ dominates $S$. The complement $T$ of $S$ in $V(G_{t-1})$ has cardinality $\lceil n_{t-1}/2 \rceil$. Every clone other than $x_S$ has a neighbor in $T$. Hence, $T \cup \{x_S\}$ is the desired dominating set.

For the lower bound, we must show that $\gamma(G_t) > \lceil n_{t-1}/2 \rceil$. For a contradiction, suppose that some set of $\lceil n_{t-1}/2 \rceil$ nodes, say $X$, dominates $G_t$. Suppose first that $X$ consists of non-clones. Regardless of the choice of $X$, there will be some set of non-clones, call it $T$, of size $\lfloor n_{t-1}/2 \rfloor$ such that $X \cap T = \emptyset$. Thus, $x_T$ is not dominated, which is a contradiction.

Fig. 2: The node $z$ is not adjacent to $X$.

Suppose that $X$ contains at least one clone. There is at least one clone $z$ not adjacent to $X \cap V(G_{t-1})$, since $|X \cap V(G_{t-1})| < \lceil n_{t-1}/2 \rceil$. See Figure 2. Any clone in $X$ is not adjacent to $z$, since the clones form an independent set. Therefore, $z$ is not dominated by $X$, which gives a contradiction. □

We introduced the Iterated Global Model (IGM) for complex networks. The IGM adds new nodes, each joined to a set of $\lfloor n_t / k \rfloor$ nodes, where $n_t$ is the number of nodes at time $t$. Our focus was the case $k = 2$, and we proved that graphs generated by the half-model exhibit densification, low distances, and bad spectral expansion, as found in real-world complex networks. We investigated various classical graph parameters for this model, including the clique, chromatic, and domination numbers.

Several open problems remain concerning properties of graphs generated by the half-model. Graph limits consider dense sequences of graphs and analyze their properties based on their homomorphism densities; see [11]. Since the half-model generates dense sequences of graphs, it would be interesting to explore their graph limits. In the full version, we will consider the clustering coefficient of the half-model and analyze its subgraph counts and degree distribution. Another interesting direction would be to generalize our results to integers $k > 2$.

References
1. A. Bonato, A Course on the Web Graph, American Mathematical Society Graduate Studies Series in Mathematics, Providence, Rhode Island, 2008.
2. A. Bonato, H. Chuangpishit, S. English, B. Kay, E. Meger, The iterated local model for social networks, Preprint 2020.
3. A. Bonato, E. Infeld, H. Pokhrel, P. Prałat, Common adversaries form alliances: modelling complex networks via anti-transitivity, In: Proceedings of WAW'17, 2017.
4. A. Bonato, N. Hadi, P. Horn, P. Prałat, C. Wang, Models of on-line social networks, Internet Mathematics (2011) 285–313.
5. A. Bonato, A. Tian, Complex networks and social networks, invited book chapter in: Social Networks, editor E. Kranakis, Springer, Mathematics in Industry series, 2011.
6. F.R.K. Chung, Spectral Graph Theory, American Mathematical Society, Providence, Rhode Island, 1997.
7. F.R.K. Chung, L. Lu, Complex Graphs and Networks, American Mathematical Society, Providence, Rhode Island, 2006.
8. D. Easley, J. Kleinberg, Networks, Crowds, and Markets: Reasoning about a Highly Connected World, Cambridge University Press, 2010.
9. E. Estrada, Spectral scaling and good expansion properties in complex networks, Europhys. Lett. (2006) 649–655.
10. J. Leskovec, J. Kleinberg, C. Faloutsos, Graphs over time: densification laws, shrinking diameters and possible explanations, In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2005.
11. L. Lovász, Large networks and graph limits, American Mathematical Society, Providence, RI, 2012.
12. J. Spencer, L. Florescu, Asymptopia, American Mathematical Society, Providence, Rhode Island, 2014.
13. D.B. West,