An exact formula for percolation on higher-order cycles

Abstract

We present exact solutions for the size of the giant connected component (GCC) of graphs composed of higher-order homogeneous cycles, including weak cycles and cliques, following bond percolation. We use our theoretical result to find the location of the percolation threshold of the model, providing analytical solutions where possible. We expect the results derived here to be useful to a wide variety of applications including graph theory, epidemiology, percolation and lattice gas models as well as fragmentation theory. We also examine the Erdős-Gallai theorem as a necessary condition on the graphicality of configuration model networks comprising higher-order clique sub-graphs.


Peter Mann,∗ V. Anne Smith, John B.O. Mitchell, Christopher Jefferson, and Simon Dobson

School of Computer Science, University of St Andrews, St Andrews, Fife KY16 9SX, United Kingdom; School of Chemistry, University of St Andrews, St Andrews, Fife KY16 9ST, United Kingdom; and School of Biology, University of St Andrews, St Andrews, Fife KY16 9TH, United Kingdom

(Dated: February 19, 2021)

I. INTRODUCTION

Bond percolation on graphs is a process in which each edge is occupied with some probability, $T$, and removed otherwise. As $T$ is reduced to some critical value, $T^{*}$, the graph exhibits a second-order phase transition and fails to be globally connected. The size of the GCC, as well as the location of the critical point, are important quantities within the percolation process. Percolation has not only inherent theoretical interest but is also important for various applications across many disciplines [1–9]. Perhaps the most prominent utilisation is the study of diseases spreading through structured populations with transmission probability $T$. In this instance, the size of the GCC corresponds to the outbreak size of the disease while the critical bond occupation probability is the epidemic threshold.

It is well understood how to extract the properties of graphs using the generating function formulation [1, 2, 10–12]. In its original form, it is assumed that there are no closed loops or cycles among the edges of the graph; it is locally tree-like everywhere. When this condition is true, or approximately true, the generating function formulation yields excellent results compared to simulation. However, if the network fails to be locally tree-like, then the formulation must be modified to describe correctly the emergent properties of percolation. Newman [13] provides an early analytical breakthrough in the study of graphs with closed loops. Within the generating function formulation, the next theoretical milestone came from Miller and Newman, who in 2009 independently studied 3-cliques along with tree-like edges [14, 15]. Shortly thereafter Karrer and Newman [16] developed a general framework that addressed the study of larger subgraphs; however, it was determined that a crucial quantity, which we denote by $g_\tau$, could only be determined by an exponentially slow exhaustive enumeration of states. This quantity is the probability that a node remains unattached to the GCC despite its involvement in a cycle of topology $\tau$. Allard et al. [17, 18] developed a comprehensive and versatile technique based on recursive formulas to determine the percolation properties numerically through iteration. Within the spirit of these developments, Mann et al. developed an analytical approach that approximates the $g_\tau$ expression to high accuracy [19, 20], affording an equation-based treatment of percolation on arbitrary subgraphs. It remains that the percolation properties can be found exactly but slowly through Karrer and Newman's method, exactly but recursively through Allard et al.'s method, or approximately but analytically through Mann et al.'s method.

In this paper, we develop exact analytical expressions for homogeneous subgraphs; that is, cycles whose nodes are all degree-equivalent to one another. We present these equations for simple cycles and cliques; however, we hope it is clear how the method can be extended to other homogeneous classes that arise between these limiting examples. Application of our counting method to inhomogeneous cycles (cycles that contain nodes with different degrees) can readily be performed; however, the final expression depends on the details of the subgraph. The method is most similar to [13–16] and the polynomials we develop herein appear very similar to those found by [13], although we provide closed form expressions.

II. BACKGROUND

It is necessary to review both the generating function formulation and the configuration model before proceeding with the contents of this paper [1, 2]. The framework is based on the degree distribution, $p(k)$, which is the probability of choosing a node of degree $k$ from the graph. Two generating functions are introduced that generate (i) the degree distribution of a node chosen at random from the network

$$G_0(z) = \sum_{k=0}^{\infty} p(k)\, z^{k} \tag{1}$$

and (ii) the distribution of degrees of a node reached by following a randomly chosen edge

$$G_1(z) = \frac{G_0'(z)}{G_0'(1)} \tag{2}$$

Defining $u$ as the probability that a neighbour is unattached to the GCC, the probability that a node fails to become attached through a single edge is $g = 1 - T + uT$, which is the sum of the probability that the edge was not occupied, $1 - T$, and the probability that it was occupied, but the neighbour was unattached to the GCC, $uT$. The quantity $u$ can be found as the solution to a self-consistent expression [11]

$$u = G_1(g) \tag{3}$$

The expected size of the GCC, $S$, is then given by $S = 1 - G_0(g)$. The critical point can then be found by perturbing around $u = 1$, which corresponds to $S = 0$ since $G_0(1) = 1$. Expanding Eq 3 in a Taylor series about $u = 1$ we have $u = 1 + (u - 1)\,T\,G_1'(1) + O\big((u-1)^2\big)$, from which we find $T^{*} = 1/G_1'(1)$ [1, 11, 12].

The configuration model is a method that can be used to create a particular random graph from an ensemble of degree-equivalent, uncorrelated random graphs. In the model, each node of the graph is assigned an integer, drawn at random from the degree distribution, which indicates its degree. The degree sequence $\{k\} = k_1, k_2, \dots, k_N$, where $\sum_i k_i = 2E$ for a network of $N \in \mathbb{Z}$ nodes and $E \in \mathbb{Z}$ edges, is a sequence of the degrees of the nodes and is typically displayed in descending order such that $k_1 \ge k_2 \ge \dots \ge k_N$.
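
As a rough numerical illustration of Eqs 1–3, the following sketch iterates the self-consistent condition $u = G_1(1 - T + uT)$ for a Poisson degree distribution, for which $G_0(z) = G_1(z) = e^{\langle k \rangle (z - 1)}$. The Poisson choice, the function names and the tolerances are assumptions made for this example only and are not taken from the paper.

```python
import math


def solve_gcc_poisson(mean_degree, T, tol=1e-12, max_iter=10_000):
    """Fixed-point iteration of Eq 3 for a Poisson degree distribution.

    Assumption: p(k) is Poisson, so G0(z) = G1(z) = exp(mean_degree * (z - 1)).
    Returns (u, S) where S = 1 - G0(g) and g = 1 - T + u*T.
    """
    def G(z):
        return math.exp(mean_degree * (z - 1.0))

    u = 0.5  # start away from the trivial fixed point u = 1
    for _ in range(max_iter):
        u_next = G(1.0 - T + u * T)
        converged = abs(u_next - u) < tol
        u = u_next
        if converged:
            break
    g = 1.0 - T + u * T
    return u, 1.0 - G(g)


def critical_T_poisson(mean_degree):
    """T* = 1 / G1'(1); for a Poisson distribution G1'(1) = mean_degree."""
    return 1.0 / mean_degree


if __name__ == "__main__":
    for T in (0.2, 0.3, 0.5, 0.9):
        u, S = solve_gcc_poisson(4.0, T)
        print(f"T = {T:.1f}  u = {u:.6f}  S = {S:.6f}")
    print("T* =", critical_T_poisson(4.0))
```

Below the threshold the iteration converges to $u = 1$ and $S = 0$; above it, a second fixed point with $S > 0$ is selected. A plain fixed-point iteration is used here for transparency; it slows down close to $T^{*}$, where a root finder would be a more practical choice.
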
However, not all degree sequences are valid, or graphic; some sequences of integers cannot be used to create a graph. The Erdős-Gallai theorem (EGT) states that, in addition to the handshaking lemma (HL), $\sum_i k_i = 2E$, a sequence is graphic if and only if the Erdős-Gallai inequality (EGI)

$$\sum_{i=1}^{n} k_i \le n(n-1) + \sum_{i=n+1}^{N} \min(k_i, n) \tag{4}$$

holds for $n \in [1, N-1]$. Both conditions are needed: for example, with $N = 3$ and $\{k\} = \{(1), (1), (1)\}$ the inequality in Eq 4 is satisfied but the sum of degrees is not even, whilst $\{k\} = \{(2), (0), (0)\}$ satisfies the lemma but not Eq 4.

To construct the networks, node $i$ is inserted $k_i$ times into a list, for every node $i$, and the list is then shuffled. Pairs of nodes are then drawn at random and connected together. In the limit of large and sparse networks, the probability that the construction process chooses pairs that are either already connected through another edge or belong to the same node is vanishingly small. The networks generated according to this process are locally tree-like and contain no short-range loops; they are also free of degree correlations.

III. GRAPHICALITY OF JOINT DEGREE SEQUENCES

The original configuration model described in section II was extended by Newman to incorporate triangular clustering [13–15] and subsequently higher-order subgraph motifs [16]. In this model the degree distribution is replaced by a joint degree distribution that describes a node's involvement in higher-order cycles such as triangles, squares, 4-cliques etc. For instance, a node that is involved in $s$ ordinary edges and $t$ triangles is specified by joint degree $(s, t)$ and the usual degree is recovered from $k = s + 2t$. Similarly, the joint degree of a node that is a part of $s$ ordinary edges, $t$ triangles, $v$ squares and $w$ 4-cliques is $(s, t, v, w)$ and occurs with probability $p(s, t, v, w)$; its ordinary degree is recovered from $k = s + 2t + 2v + 3w$, a Diophantine condition [21]. In the extended configuration model it is important to note that the cycles are independent of one another, in much the same way that simple edges are in the original model. This means that the probability of accidentally forming a 4-clique during triangle construction, through the choosing of two nodes that are already involved in a triangle, vanishes for large and sparse networks. Thus, upon considering the characteristic size of each motif, the extended configuration model regenerates the locally tree-like property of the subgraphs. The probability of edge sharing between independent cycles is dependent on the number of nodes and triangles in the cycles for a given number of cycles, however.

The degree sequence of a configuration model network is a sequence of tuples

$$(s_1, t_1, \dots, \tau_1), \dots, (s_N, t_N, \dots, \tau_N), \tag{5}$$

and as with ordinary edges, not all sequences lead to the successful creation of networks. We now consider necessary conditions on a joint degree sequence in order that it is graphic. It is natural to separate and order the joint sequence as $s_1 \ge s_2 \ge \dots \ge s_N$ for the ordinary edges, $t_1 \ge t_2 \ge \dots \ge t_N$ for the triangles (and so on). It is clear that the EGT (the EGI and the HL) must still hold among the overall degrees of the model for the joint degree sequence to be graphic. However, the EGT is no longer sufficient to ensure the graphicality of joint degree sequences according to the extended configuration model. For example, consider an ordered joint degree sequence describing ordinary edges and triangles $\{s, t\} = \{(0, \dots), (1, \dots), (1, \dots)\}$ which is graphic according to the EGI, Eq 4, and the HL applied to the overall edges, but is not according to the extended configuration model. We require the EGT to hold among the ordinary edges such that $\sum_i s_i = 2H$, where $H \in \mathbb{Z}$ is the number of ordinary edges, and that

$$\sum_{i=1}^{n} s_i \le n(n-1) + \sum_{i=n+1}^{N} \min(s_i, n) \tag{6}$$

holds for $n \in [1, N-1]$. For the triangles we require

$$\sum_{i=1}^{N} t_i = 3T \tag{7}$$

which is a modified handshaking lemma (here $T$ counts the triangles in the network rather than the occupation probability), as well as a modified inequality

$$2\sum_{i=1}^{n} t_i \le n(n-1) + \sum_{i=n+1}^{N} \min(2t_i, n) \tag{8}$$

which must hold for $n \in [1, N-1]$.
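
The conditions of this section can be checked mechanically. The sketch below is a minimal illustration, assuming joint degrees are given as `(s, t)` pairs (ordinary edges and triangles only); the function names and the test sequences are hypothetical and chosen for this example, not taken from the paper. The loop runs over $n = 1, \dots, N$, which is the standard statement of the Erdős-Gallai inequality and includes the range $[1, N-1]$ used in Eqs 4, 6 and 8.

```python
def erdos_gallai(degrees):
    """True if the handshaking lemma and the inequality of Eq 4 both hold."""
    k = sorted(degrees, reverse=True)
    if any(d < 0 for d in k) or sum(k) % 2 != 0:   # handshaking lemma
        return False
    for n in range(1, len(k) + 1):
        lhs = sum(k[:n])
        rhs = n * (n - 1) + sum(min(d, n) for d in k[n:])
        if lhs > rhs:
            return False
    return True


def triangle_inequality_holds(t):
    """Modified inequality of Eq 8, applied to the triangle counts t_i."""
    two_t = sorted((2 * x for x in t), reverse=True)
    for n in range(1, len(two_t) + 1):
        lhs = sum(two_t[:n])
        rhs = n * (n - 1) + sum(min(x, n) for x in two_t[n:])
        if lhs > rhs:
            return False
    return True


def joint_conditions(joint):
    """Necessary conditions on a sequence of (s, t) joint degrees."""
    s = [a for a, _ in joint]
    t = [b for _, b in joint]
    overall = [a + 2 * b for a, b in joint]         # k = s + 2t
    return (
        erdos_gallai(overall)                       # EGT on overall degrees
        and erdos_gallai(s)                         # HL and Eq 6 on ordinary edges
        and sum(t) % 3 == 0                         # Eq 7
        and triangle_inequality_holds(t)            # Eq 8
    )


if __name__ == "__main__":
    print(erdos_gallai([1, 1, 1]), erdos_gallai([2, 0, 0]))
    # Hypothetical sequence: overall degrees (2, 1, 1) are graphic,
    # but a single triangle corner cannot be completed.
    print(joint_conditions([(0, 1), (1, 0), (1, 0)]))
    # One triangle shared by three nodes.
    print(joint_conditions([(0, 1), (0, 1), (0, 1)]))
```

These checks are necessary conditions only; passing them does not by itself guarantee that a construction algorithm will succeed.
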

IV. SIMPLE CYCLES

In this section we derive a formula for the probability that a node fails to become attached to the GCC despite its involvement in a simple cycle of length $N$. We define a complete graph to indicate that all edges in the cycle are intact, whilst a connected graph is one in which there exists at least one pathway between all nodes. The removal of a single edge from a cycle with $N > 2$ leaves the graph connected but no longer complete. We use $j$ for an index over the number of edges we have removed from the cycle and we reserve $r$ for an index over the number of nodes $n$ we have removed from the cycle.

We begin by defining the probability of the complete cycle. In this case, since all edges are present, each of the other $N - 1$ nodes must not belong to the GCC and hence we have

$$u^{N-1}\, T^{N} \tag{9}$$

We can remove an edge from the cycle and still retain full connectivity among the nodes. Given that there are $N$ edges we obtain

$$N (uT)^{N-1} (1 - T) \tag{10}$$

If another edge is removed, then a node can become isolated and hence we must reduce the power of $u$ by one to obtain

$$(N-1)(uT)^{N-2}(1 - T)^{2} \tag{11}$$

The leading factor of $(N-1)$ accounts for the number of edges to remove. In general, when the focal node remains connected to $j$ other nodes, there are $j + 1$ positions it can occupy along the surviving path and two unoccupied edges bound that path, so that summing over $j$ yields

$$\sum_{j=0}^{N-2} (j+1)(uT)^{j}(1 - T)^{2} \tag{12}$$

Therefore, the entire expression for $g_\tau$ for weak cycles is

$$g_\tau = u^{N-1} T^{N} + N(uT)^{N-1}(1 - T) + \sum_{j=0}^{N-2} (j+1)(uT)^{j}(1 - T)^{2} \tag{13}$$

V. CLIQUES

In this section we derive an exact expression for the probability $g_N$ that a node fails to become attached to the GCC when it is a constituent of a clique of size $N$. Cliques have been studied previously using alternative methods; however, these approaches use recursion to obtain a solution [13, 18]. As with the weak case, we frame our theory in layers around integer powers of $u$ in the range $[0, N-1]$. We refer to the edges of a clique as exterior or interior edges, depending on whether they belong to the outer skeleton of the cycle or connect nodes across the interior, through the shape, respectively; see Fig 1.

FIG. 1. The 6-clique has 6 nodes (blue), 6 exterior edges (black) and $6(6-1-2)/2 = 9$ interior edges.

We define another term, a $(N-n)$-semi-complete graph, to be the complete clique of codimension-$(n)$ embedded in the clique of size $N$ with $n$ nodes, and their edges, coloured. In other words, an $(N-1)$-semi-complete graph is a clique of size $N$ with 1 node coloured, and all edges that connect to the coloured node are also coloured; see Fig 2. A $(N-2)$-semi-complete graph is a clique of size $N$ with 2 coloured nodes, whose edges to all other nodes (and between the coloured nodes themselves) are also marked.

FIG. 2. The (6-1)-semi-complete clique has 1 coloured node (green) and $6 - 1 = 5$ uncoloured nodes (blue). The $(6 - 1) = 5$ edges that emanate from the coloured node have also been coloured (orange). If we were to ignore colouring, this cycle would be the 6-clique.
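
The edge bookkeeping behind these definitions can be illustrated with a short enumeration. The sketch below assumes the $N \ge 3$ nodes of the clique are labelled $0, \dots, N-1$ around a ring, so that exterior edges join ring neighbours; the function name and this labelling are choices made for the example rather than notation from the paper.

```python
from itertools import combinations


def classify_edges(N, coloured=()):
    """Split the edges of an N-clique (N >= 3) into exterior and interior edges.

    Nodes 0..N-1 are placed on a ring. Exterior edges join ring neighbours,
    interior edges cross the shape. 'marked' collects every edge that touches
    at least one coloured node, as in a semi-complete graph.
    """
    exterior, interior, marked = [], [], []
    for a, b in combinations(range(N), 2):
        if (b - a) % N in (1, N - 1):
            exterior.append((a, b))
        else:
            interior.append((a, b))
        if a in coloured or b in coloured:
            marked.append((a, b))
    return exterior, interior, marked


if __name__ == "__main__":
    ext, inter, marked = classify_edges(6, coloured={0})
    # Counts for the 6-clique with one coloured node
    print(len(ext), len(inter), len(marked))
```

For the 6-clique this gives $N = 6$ exterior edges and $N(N-1-2)/2 = 9$ interior edges, with $N - 1 = 5$ marked edges when a single node is coloured.
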

With these definitions in place, let us begin the derivation. The first and arguably the easiest layer is the fully connected graph of size $N$. With all of its edges intact we pick a focal node and set the remaining $(N-1)$ nodes to the $u$ state. The fully connected, complete clique of size $N$ occurs with probability

$$u^{N-1}\, T^{N}\, T^{N(N-1-2)/2} \tag{14}$$

Examining these terms, we note that all nodes other than the focal node must not be in the GCC if all of their edges are occupied. There are $N$ exterior edges and $N(N-1-2)/2$ interior edges. The multiplicity (the number of different ways the configuration can occur) is unity.

As remarked above, for $N > 2$, we can remove edges from this cycle and it will still be fully connected, although no longer complete. It happens that we can remove all of the interior edges, and even one of the exterior edges, and still make connected graphs. If we set one of the interior edges unoccupied, we have

$$q_{N,\,N(N-1)/2-1}\; u^{N-1}\, T^{N}\, T^{N(N-1-2)/2-1}\, (1 - T) \tag{15}$$

where $q_{m,k}$ is the number of connected graphs that can be formed over $m$ labelled nodes with $k$ edges (see Appendix A).

If we remove a second edge we have

$$q_{N,\,N(N-1)/2-2}\; u^{N-1}\, T^{N}\, T^{N(N-1-2)/2-2}\, (1 - T)^{2} \tag{16}$$

The removal of $j$ edges is now given by

$$\sum_{j=1}^{E(N)} q_{N,\,N(N-1)/2-j}\; u^{N-1}\, T^{N}\, T^{N(N-1-2)/2-j}\, (1 - T)^{j} \tag{17}$$

where $E(N) = N(N-1-2)/2 + 1$ is the largest number of edges that can be removed while the clique remains connected. Removing any further edge isolates a node, and hence we must reduce the power of $u$ by one. There are $(N-1)$ nodes that we could remove and all edges that point to the removed node must now be $(1 - T)$, of which there are $(N-1)$. The $(N-1)$-semi-complete $N$-clique occurs with probability

$$(N-1)\, u^{N-2}\, T^{N-1}\, T^{(N-1)(N-1-1-2)/2}\, (1 - T)^{N-1} \tag{18}$$

where the number of interior edges among the non-removed nodes is now $(N-1)(N-1-1-2)/2$. We can imagine this as a clique of size $(N-1)$ embedded within the $N$-clique, and the remaining edges are set to $(1 - T)$. We recall the (6-1)-semi-complete clique of Fig 2, in which the $(1 - T)$ edges are orange. The leading factor of $(N-1)$ in Eq 18 accounts for the choices of node we could remove other than the focal node.

As with the complete case, we can remove edges from this graph and still retain connectivity among the $(N-1)$ non-removed nodes. Removal of a single edge occurs with probability

$$(N-1)\, q_{N-1,\,X_{N-1,1}}\; u^{N-2}\, T^{N-1}\, T^{(N-1)(N-1-1-2)/2-1}\, (1 - T)^{(N-1)+1} \tag{19}$$

where $X_{N-r,j}$ is the number of edges in the $(N-r)$-clique minus $j$

$$X_{N-r,j} = (N-r)(N-r-1)/2 - j \tag{20}$$

Let us remove a second edge from this cycle to obtain

$$(N-1)\, q_{N-1,\,X_{N-1,2}}\; u^{N-2}\, T^{N-1}\, T^{(N-1)(N-4)/2-2}\, (1 - T)^{(N-1)+2} \tag{21}$$

The removal of $j$ edges now proceeds as

$$(N-1) \sum_{j=1}^{E(N-1)} q_{N-1,\,X_{N-1,j}}\; u^{N-2}\, T^{N-1}\, T^{(N-1)(N-4)/2-j}\, (1 - T)^{(N-1)+j} \tag{22}$$

To be clear, this is the equation of the $N$-clique with one node removed and up to $j = (N-1)(N-4)/2 + 1$ further edges removed. So far, every edge attached to the removed node has been placed in the $(1 - T)$ box. However, when a second node is removed, there is a connection between the removed nodes that need not be $(1 - T)$. Therefore, we must subtract from this power those connections between removed nodes. This is simply the number of edges in a clique of size equal to the number of removed nodes, $n$. We introduce the term interface edges to be edges that connect removed nodes to non-removed nodes; see Fig 3.

FIG. 3. The (6-2)-semi-complete clique has 2 coloured nodes (green) and $6 - 2 = 4$ uncoloured nodes (blue). The $(N-1) + (N-2) = 9$ edges that emanate from coloured nodes have also been coloured (orange). Notice that the edge that connects the two coloured nodes (yellow) has been coloured differently than the other edges. Interface edges connect blue nodes to green nodes. There are $9 - 1 = 8$ interface edges in this example.

The number of interface edges is given by the total number of edges that the removed nodes have, minus the number of edges that connect removed nodes to each other. If there are $n < N$ removed nodes, then there are a total of

$$\sum_{i=1}^{n} (N - i)$$

coloured edges (orange plus yellow), of which a total of $n(n-1)/2$ connect removed nodes to each other (yellow). Hence the number, $\omega(r)$, of interface edges (orange) is

$$\omega(r) = \sum_{i=1}^{r} (N - i) - \frac{r(r-1)}{2} \tag{23}$$

The powers of $(1 - T)$ are given by the number of interface edges. In the case that $n = 1$ we find that the number of interface edges is $N - 1$, in agreement with the previous workings.

We will now remove a second node from the $N$-clique and we begin by describing the $(N-2)$-semi-complete graph. There are $(N-1)$ ways to remove the first node and $(N-2)$ ways to remove the second; since the order of removal does not matter, there are $\binom{N-1}{2}$ distinct pairs, so the binomial coefficient will lead the expression. The chain of nodes not in the GCC now occurs with probability $u^{N-3}$, comprising $N$ less two removed nodes and one focal node. The outer $T$ skeleton of the $(N-2)$-clique is $T^{N-2}$. The number of interior edges among present nodes is then $(N-2)(N-2-1-2)/2$, and the power of $(1 - T)$ is

$$\sum_{i=1}^{2} (N - i) - 1 = 2N - 4$$

Hence the probability of the clique of size $N$ with 2 removed nodes (a semi-complete graph of codimension 2) is given by

$$\binom{N-1}{2}\, u^{N-3}\, T^{N-2}\, T^{(N-2)(N-2-1-2)/2}\, (1 - T)^{2N-4} \tag{24}$$

We can then remove all of the interior edges among the non-removed nodes, as well as a single exterior edge, and place them into the $(1 - T)$ box. Removing one edge we have

$$\binom{N-1}{2}\, q_{N-2,\,X_{N-2,1}}\; u^{N-3}\, T^{N-2}\, T^{(N-2)(N-2-1-2)/2-1}\, (1 - T)^{(2N-4)+1} \tag{25}$$

All of the interior edges of the non-removed subgraph can be removed, along with one exterior edge, and still permit connected subgraphs of size $(N-n)$ among the non-removed nodes. Hence, the removal of $j$ such edges yields

$$\binom{N-1}{2} \sum_{j=1}^{E(N-2)} q_{N-2,\,X_{N-2,j}}\; u^{N-3}\, T^{N-2}\, T^{(N-2)(N-2-1-2)/2-j}\, (1 - T)^{(2N-4)+j} \tag{26}$$

Subsequent loss of edges will isolate a further node.

We have now encountered all the logic that we require for the correct abstraction of the formula to account for arbitrary numbers of removed nodes and edges from a clique of size $N$.

For a clique of size $N$, let there be $n$ removed nodes. There are $\binom{N-1}{r}$ ways to choose the $r \le n$ removed nodes. The power of $u$ is given by $(N-r-1)$, which we pair with an equal power of $T$; the remaining power of $T$ is $E(N-r) - j$, where $E(m) = m(m-1-2)/2 + 1$, and the power of $(1 - T)$ is $\omega(r) + j$. The final expression therefore is given by

$$g_N = \sum_{r=0}^{N-1} \binom{N-1}{r} \sum_{j=0}^{E(N-r)} q_{N-r,\,X_{N-r,j}}\; (uT)^{N-r-1}\, T^{E(N-r)-j}\, (1 - T)^{\omega(r)+j} \tag{27}$$

This equation is the main result of this section.

A. Percolation threshold

We now turn our attention to the location of the critical point for the formation of a GCC among networks comprised entirely of $N$-cliques during bond percolation. From section II we understand that in order to obtain the percolation properties of the network, we have to evaluate the derivative of $g_\tau$ with respect to $u$. This derivative is found to be

$$\frac{\partial g_N}{\partial u} = \sum_{r=0}^{N-1} \binom{N-1}{r} (N-r-1) \sum_{j=0}^{E(N-r)} q_{N-r,\,X_{N-r,j}}\; u^{N-r-2}\, T^{N-r-1}\, T^{E(N-r)-j}\, (1 - T)^{\omega(r)+j} \tag{28}$$

The percolation threshold is then obtained when $u = 1$, and following a similar analysis to the tree-like topology we obtain

$$\left.\frac{\partial g_N}{\partial u}\right|_{u=1} \frac{\langle k^2 - k \rangle}{\langle k \rangle} = 1 \tag{29}$$

For example, the derivative for 3-cliques is found to be

$$\frac{\partial g_3}{\partial u} = 2T(1 - T)^2 + 6uT^2(1 - T) + 2uT^3 \tag{30}$$

Evaluated at $u = 1$ and inserted into Eq 29 we have $2(T + T^2 - T^3)\langle t \rangle - 1 = 0$, where $\langle t \rangle$ is the average number of triangles that a node belongs to, and we have assumed that the cycles are Poisson distributed. Using Gauss's lemma, this cubic expression is reducible in $T$ into the quadratic form whose roots yield the critical transmissibilities of the model, and hence, the critical point occurs at

$$T^{*} = \dots - \sqrt{\dots \langle t \rangle \dots} \tag{31}$$

We repeat the calculation for the 4-clique to obtain the following polynomial

$$\left.\frac{\partial g_4}{\partial u}\right|_{u=1} = 3T\left(-2T^5 + 7T^4 - 7T^3 + 2T + 1\right) \tag{32}$$

The Galois group of the quintic part is the symmetric group, $S_5$, and hence its roots cannot be expressed in radicals. It is unlikely that percolation properties of higher-order cycles can be resolved analytically due to the Abel-Ruffini theorem.

In conclusion, we have derived an exact formula to obtain the bond percolation properties, including the size of the GCC and the location of the percolation threshold, of configuration model networks comprised of higher-order subgraphs. We presented our method for simple cycles and cliques; however, a wide range of subgraphs can also be considered. We have also studied the conditions for degree sequences to be considered graphic for these networks and found the correct extension of the Erdős-Gallai theorem.

VI. ACKNOWLEDGMENTS

The authors would like to thank an anonymous referee from an earlier paper for providing Eq 13 to us and thus igniting the pathway to the exact formula. We would also like to thank the School of Chemistry and the School of Biology of the University of St Andrews for funding this work.

∗ [email protected]

[1] M. E. Newman, Networks. Oxford University Press, 2019.
[2] R. Cohen and S. Havlin, Complex networks: structure, robustness and function. Cambridge University Press, 2010.
[3] P. Désesquelles, "Exact solution of finite size mean field percolation and application to nuclear fragmentation," Physics Letters B, vol. 698, no. 4, pp. 284–287, 2011.
[4] A. Dávila, C. Escudero, J. López, and C. Dorso, "Geometrical aspects of isoscaling," Physica A: Statistical Mechanics and its Applications, vol. 374, no. 2, pp. 663–668, 2007.
[5] K. Paech, W. Bauer, and S. Pratt, "Zipf's law in nuclear multifragmentation and percolation theory," Phys. Rev. C, vol. 76, p. 054603, Nov 2007.
[6] W. Trautmann, "Multifragmentation and the liquid-gas phase transition: an experimental overview," Nuclear Physics A, vol. 752, pp. 407–416, 2005. Proceedings of the 22nd International Nuclear Physics Conference (Part 2).
[7] Y. I. Kuz'min and I. V. Pleshakov, "Percolation transitions on an electrified surface," Technical Physics Letters, vol. 45, no. 11, pp. 1167–1170, 2019.
[8] R. Cimino, K. A. Cychosz, M. Thommes, and A. V. Neimark, "Experimental and theoretical studies of scanning adsorption–desorption isotherms," Colloids and Surfaces A: Physicochemical and Engineering Aspects, vol. 437, pp. 76–89, 2013.
[9] D. Wilkinson, "Percolation model of immiscible displacement in the presence of buoyancy forces," Phys. Rev. A, vol. 30, pp. 520–531, Jul 1984.
[10] M. E. J. Newman, S. H. Strogatz, and D. J. Watts, "Random graphs with arbitrary degree distributions and their applications," Phys. Rev. E, vol. 64, p. 026118, Jul 2001.
[11] M. Newman, "Spread of epidemic disease on networks," Physical Review E, Statistical, nonlinear, and soft matter physics, vol. 66, 1 Pt 2, p. 016128, 2002.
[12] M. Molloy and B. Reed, "A critical point for random graphs with a given degree sequence," Random Structures & Algorithms, vol. 6, no. 2-3, pp. 161–180, 1995.
[13] M. E. J. Newman, "Properties of highly clustered networks," Phys. Rev. E, vol. 68, p. 026121, Aug 2003.
[14] J. C. Miller, "Percolation and epidemics in random clustered networks," Phys. Rev. E, vol. 80, p. 020901, Aug 2009.
[15] M. E. J. Newman, "Random graphs with clustering," Phys. Rev. Lett., vol. 103, p. 058701, Jul 2009.
[16] B. Karrer and M. E. J. Newman, "Random graphs containing arbitrary distributions of subgraphs," Physical Review E, vol. 82, no. 6, 2010.
[17] A. Allard, L. Hébert-Dufresne, P.-A. Noël, V. Marceau, and L. J. Dubé, "Exact solution of bond percolation on small arbitrary graphs," EPL (Europhysics Letters), vol. 98, p. 16001, Mar 2012.
[18] A. Allard, L. Hébert-Dufresne, J.-G. Young, and L. J. Dubé, "General and exact approach to percolation on random graphs," Physical Review E, vol. 92, Jul 2015.
[19] P. Mann, V. A. Smith, J. B. O. Mitchell, and S. Dobson, "Percolation in random graphs with higher-order clustering," arXiv e-prints, p. arXiv:2006.06744, June 2020.
[20] P. Mann, V. A. Smith, J. B. O. Mitchell, and S. Dobson, "Random graphs with arbitrary clustering and their applications," 2020.
[21] M. Ritchie, L. Berthouze, and I. Z. Kiss, "Generation and analysis of networks with a prescribed degree sequence and subgraph family: higher-order structure matters," Journal of Complex Networks, vol. 5, pp. 1–31, 05 2016.
[22] H. S. Wilf, Generatingfunctionology. Academic Press, 1994.
[23] R. J. Riddell and G. E. Uhlenbeck, "On the theory of the virial development of the equation of state of monoatomic gases," The Journal of Chemical Physics, vol. 21, no. 11, pp. 2056–2064, 1953.

Appendix A: $q_{n,k}$

The number of connected graphs of $n$ labelled vertices over $k$ edges is given by $q_{n,k}$. This quantity has a well known recursion formula as well as a closed-form analytical solution [13, 22, 23]. Given the importance of this quantity to the contents of this paper, we will review this derivation now.

Let $\mathcal{Q}$ be the combinatorial class of connected graphs and $\mathcal{G}$ the combinatorial class of all labelled graphs. The relation between these two classes is the set-of relation: a graph is a set of connected components. This indicates that the mixed exponential generating function $G(z)$ of $\mathcal{G}$ can be generated from $Q(z)$ according to the following relationship

$$G(z) = \exp Q(z) \tag{A1}$$

We can readily compute $G(z)$ as

$$G(z) = 1 + \sum_{m \ge 1} (1 + u)^{m(m-1)/2}\, \frac{z^m}{m!} \tag{A2}$$

where, in this appendix, $u$ is a formal variable that marks edges. This yields an expression for the entire series of connected graphs, $Q(z)$, since $Q(z) = \log G(z)$, such that we obtain

$$Q(z) = \sum_{q \ge 1} \frac{(-1)^{q+1}}{q} \left( \sum_{m \ge 1} (1 + u)^{m(m-1)/2}\, \frac{z^m}{m!} \right)^{q} \tag{A3}$$

We now examine the case of $n$ nodes and $k$ edges, where $k \ge n - 1$, and extract the coefficient $q_{n,k}$ of $[z^n][u^k]$. Note that the term in the parenthesis has minimum degree $q$ in $z$, allowing us to disregard the series beyond $q > n$. This yields the formula for the number of connected labelled graphs with $n$ nodes and $k$ edges as

$$q_{n,k} = n!\,[z^n][u^k] \sum_{q=1}^{n} \frac{(-1)^{q+1}}{q} \left( \sum_{m=1}^{n} (1 + u)^{m(m-1)/2}\, \frac{z^m}{m!} \right)^{q} \tag{A4}$$

As an example of $q_{n,k}$ in Eq 27, we examine the coefficients of the 4-clique when there are no removed nodes, that is, when $n = 0$. From Table I, we observe the leading coefficients of the terms in $u^3$ are $q_{4,k} = 1, 6, 15$ and $16$, which correspond to the number of connected graphs that can be made with $k = 6, 5, 4$ and $3$ edges among 4 labelled nodes. The case $q_{4,3}$ is presented in Fig 4.

FIG. 4. The 16 connected graphs that can be made among 4 labelled nodes with 3 edges, as counted by $q_{4,3}$.

Appendix B: Displayed Clique formulas

TABLE I. The $g_N$ expressions for cliques of 6 vertices or fewer obtained from Eq 27.

N-clique | $g_N$ equation

3 | $(1-T)^2 + 2uT(1-T)^2 + 3(uT)^2(1-T) + u^2T^3$

4 | $(1-T)^3 + 3uT(1-T)^4 + 3u^2\left(T^3(1-T)^3 + 3T^2(1-T)^4\right) + u^3\left(T^6 + 6T^5(1-T) + 15T^4(1-T)^2 + 16T^3(1-T)^3\right)$

5 | $(1-T)^4 + 4uT(1-T)^6 + 6u^2\left(T^3(1-T)^6 + 3T^2(1-T)^7\right) + 4u^3\left(T^6(1-T)^4 + 6T^5(1-T)^5 + 15T^4(1-T)^6 + 16T^3(1-T)^7\right) + u^4\left(T^{10} + 10T^9(1-T) + 45T^8(1-T)^2 + 120T^7(1-T)^3 + 205T^6(1-T)^4 + 222T^5(1-T)^5 + 125T^4(1-T)^6\right)$

6 | $(1-T)^5 + 5uT(1-T)^8 + 10u^2\left(T^3(1-T)^9 + 3T^2(1-T)^{10}\right) + 10u^3\left(T^6(1-T)^8 + 6T^5(1-T)^9 + 15T^4(1-T)^{10} + 16T^3(1-T)^{11}\right) + 5u^4\left(T^{10}(1-T)^5 + 10T^9(1-T)^6 + 45T^8(1-T)^7 + 120T^7(1-T)^8 + 205T^6(1-T)^9 + 222T^5(1-T)^{10} + 125T^4(1-T)^{11}\right) + u^5\left(T^{15} + 15T^{14}(1-T) + 105T^{13}(1-T)^2 + 455T^{12}(1-T)^3 + 1365T^{11}(1-T)^4 + 2997T^{10}(1-T)^5 + 4945T^9(1-T)^6 + 6165T^8(1-T)^7 + 5700T^7(1-T)^8 + 3660T^6(1-T)^9 + 1296T^5(1-T)^{10}\right)$
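
The entries of Table I can be regenerated numerically. The sketch below is one possible implementation under stated assumptions: it counts $q_{n,k}$ with the standard recursion over the component containing a fixed node (a substitute for the generating-function extraction of Eq A4, chosen here for brevity), takes $E(m) = m(m-3)/2 + 1$, and evaluates the clique sum with total occupied-edge power $X_{N-r,j}$ and unoccupied-edge power $\omega(r) + j$. All function names are hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def q(n, k):
    """Number of connected labelled graphs on n nodes with k edges."""
    if k < 0:
        return 0
    total = comb(comb(n, 2), k)            # all graphs with n nodes, k edges
    for m in range(1, n):                  # size of the component holding node 1
        for j in range(k + 1):             # edges inside that component
            total -= comb(n - 1, m - 1) * q(m, j) * comb(comb(n - m, 2), k - j)
    return total


def omega(N, r):
    """Interface edges of Eq 23; the sum simplifies to r * (N - r)."""
    return sum(N - i for i in range(1, r + 1)) - r * (r - 1) // 2


def g_clique(N, u, T, du=False):
    """Evaluate g_N for an N-clique, or its derivative in u when du=True."""
    value = 0.0
    for r in range(N):                     # r removed nodes
        M = N - r                          # nodes still connected to the focal node
        E = (M - 1) * (M - 2) // 2         # equals M(M-3)/2 + 1 removable edges
        for j in range(E + 1):
            X = M * (M - 1) // 2 - j       # Eq 20: occupied edges among the M nodes
            if du:
                u_part = (M - 1) * u ** (M - 2) if M > 1 else 0.0
            else:
                u_part = u ** (M - 1)
            value += (comb(N - 1, r) * q(M, X) * u_part
                      * T ** X * (1 - T) ** (omega(N, r) + j))
    return value


def g_cycle(N, u, T):
    """Eq 13 for a simple cycle of length N."""
    tail = sum((j + 1) * (u * T) ** j for j in range(N - 1)) * (1 - T) ** 2
    return u ** (N - 1) * T ** N + N * (u * T) ** (N - 1) * (1 - T) + tail


if __name__ == "__main__":
    T = 0.3
    assert [q(4, k) for k in (6, 5, 4, 3)] == [1, 6, 15, 16]
    assert abs(g_clique(5, 1.0, T) - 1.0) < 1e-12            # probabilities sum to 1
    assert abs(g_clique(3, 0.4, T) - g_cycle(3, 0.4, T)) < 1e-12  # triangle: clique = cycle
    quintic = 3 * T * (-2 * T**5 + 7 * T**4 - 7 * T**3 + 2 * T + 1)
    assert abs(g_clique(4, 1.0, T, du=True) - quintic) < 1e-12    # Eq 32
    print("all checks passed")
```

The checks confirm that the coefficients of the 4-clique row match $q_{4,k}$, that $g_N = 1$ at $u = 1$, that the 3-clique agrees with the simple-cycle formula of Eq 13, and that the derivative of the 4-clique at $u = 1$ reproduces the polynomial in Eq 32.
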