The importance sampling technique for understanding rare events in Erdős-Rényi random graphs
SHANKAR BHAMIDI, JAN HANNIG, CHIA YING LEE, AND JAMES NOLEN

Abstract.
In dense Erdős-Rényi random graphs, we are interested in the events where large numbers of a given subgraph occur. The mean behavior of subgraph counts is known, and only recently were the related large deviations results discovered. Consequently, it is natural to ask: can one develop efficient numerical schemes to estimate the probability of an Erdős-Rényi graph containing an excessively large number of a fixed given subgraph? Using the large deviation principle, we study an importance sampling scheme as a method to numerically compute the small probabilities of large triangle counts occurring within Erdős-Rényi graphs. We show that the exponential tilt suggested directly by the large deviation principle does not always yield an optimal scheme. The exponential tilt used in the importance sampling scheme comes from a generalized class of exponential random graphs. Asymptotic optimality, a measure of the efficiency of the importance sampling scheme, is achieved by a special choice of the parameters in the exponential random graph that makes it indistinguishable from an Erdős-Rényi graph conditioned to have many triangles in the large network limit. We show how this choice can be made for the conditioned Erdős-Rényi graphs, both in the replica symmetric phase and in parts of the replica breaking phase, to yield asymptotically optimal numerical schemes to estimate this rare event probability.

1. Introduction
In this paper we study the use of importance sampling schemes to numerically estimate the probability that an Erdős-Rényi random graph contains an unusually large number of triangles. A simple graph $X$ on $n$ vertices can be represented as an element of the space $\Omega_n = \{0,1\}^{\binom{n}{2}}$. A graph $X \in \Omega_n$ will be denoted by $X = (X_{ij})_{i<j}$ […] and $E(X) := \sum_{i<j} X_{ij}$ […]

Primary: 65C05, 05C80, 60F10. Key words and phrases. Erdős-Rényi random graphs, exponential random graphs, rare event simulation, large deviations, graph limits.

1.1. Importance sampling and asymptotic optimality. If $\{X_k\}_{k=1}^{\infty} \subset \Omega_n$ is a sequence of Erdős-Rényi random graphs generated independently from $P_{n,p}$, then for any integer $K > 0$,
$$M_K = \frac{1}{K}\sum_{k=1}^{K} \mathbb{1}_{W_{n,t}}(X_k)$$
is an unbiased estimate of $\mu_n$. By the law of large numbers, $M_K \to \mu_n$ with probability one as $K \to \infty$. Although this estimate of $\mu_n$ is very simple, its relative error is
$$\frac{\sqrt{\operatorname{Var}(M_K)}}{E(M_K)} = \frac{\sqrt{\mu_n - \mu_n^2}}{\mu_n \sqrt{K}},$$
which scales like $(K\mu_n)^{-1/2}$ as $\mu_n \to 0$. Hence the relative error may be very large in the large deviation regime where $\mu_n \ll 1$, unless we have at least $K \sim O(\mu_n^{-1})$ samples. Therefore, it is desirable to devise an estimate of $\mu_n$ which, compared to this simple Monte Carlo estimate, attains the same accuracy with fewer samples or lower computational cost.

Importance sampling is a Monte Carlo algorithm based on a change of measure. Suppose that $P_{n,p}$ is absolutely continuous with respect to another measure $Q$ on $\Omega_n$, with $\frac{dP_{n,p}}{dQ} = Y^{-1} : \Omega_n \to \mathbb{R}$. Then
$$\mu_n = E[M_K] = E\left[\frac{1}{K}\sum_{k=1}^{K} \mathbb{1}_{W_{n,t}}(X_k)\right] = E_Q\left[\frac{1}{K}\sum_{k=1}^{K} \mathbb{1}_{W_{n,t}}(\tilde X_k)\, Y^{-1}(\tilde X_k)\right], \tag{1.7}$$
where $E_Q$ denotes expectation with respect to $Q$, and we now use $\{\tilde X_k\}_{k=1}^{\infty}$ to denote a set of random graphs sampled independently from the new measure $Q$.
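The cost of the simple estimator is easy to see in a short simulation. The following is a minimal Python sketch (NumPy assumed) of $M_K$ and of the relative-error formula above. The function names are hypothetical, and the event is assumed to be $W_{n,t} = \{T(X) \ge \binom{n}{3} t^3\}$, the triangle threshold used for the rare event problem in this paper.

```python
import math

import numpy as np


def sample_gnp(n, p, rng):
    """Adjacency matrix of G(n, p): symmetric 0/1 entries, zero diagonal."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(np.int64)


def triangle_count(A):
    """Triangle count T(X) = trace(A^3) / 6 for a simple graph with adjacency A."""
    return int(np.trace(A @ A @ A)) // 6


def crude_mc(n, p, t, K, seed=0):
    """Simple Monte Carlo estimate M_K of mu_n = P_{n,p}(T(X) >= C(n,3) t^3)."""
    rng = np.random.default_rng(seed)
    threshold = math.comb(n, 3) * t**3  # assumed definition of the event W_{n,t}
    hits = sum(triangle_count(sample_gnp(n, p, rng)) >= threshold for _ in range(K))
    return hits / K


def relative_error(mu, K):
    """Relative error sqrt(mu - mu^2) / (mu sqrt(K)) of M_K; roughly (K mu)^(-1/2)."""
    return math.sqrt(mu - mu * mu) / (mu * math.sqrt(K))


if __name__ == "__main__":
    print("M_K =", crude_mc(n=20, p=0.3, t=0.45, K=2000))
    # Even K = 10**6 samples leave a relative error near 1 when mu_n = 1e-6.
    print("relative error:", relative_error(1e-6, 10**6))
```

Counting triangles through $\operatorname{trace}(A^3)/6$ keeps each sample cheap, but the number of samples, not their cost, is what makes the simple scheme impractical when $\mu_n$ is tiny.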
If we define
$$\tilde M_K = \frac{1}{K}\sum_{k=1}^{K} \mathbb{1}_{W_{n,t}}(\tilde X_k)\, Y^{-1}(\tilde X_k), \tag{1.8}$$
then $\tilde M_K$ is also an unbiased estimate of $\mu_n$, and its relative error is now
$$\frac{\sqrt{\operatorname{Var}_Q(\tilde M_K)}}{E_Q(\tilde M_K)} = \frac{\sqrt{E_Q\big[(\mathbb{1}_{W_{n,t}}(X)\, Y^{-1})^2\big] - \mu_n^2}}{\mu_n \sqrt{K}}. \tag{1.9}$$
Formally this is optimized by the choice $Y = \mu_n^{-1}\, \mathbb{1}_{W_{n,t}}(X)$, in which case the relative error is zero. Such a choice for $Q$ is not feasible, however, since normalizing $Y$ would require a priori knowledge of $\mu_n = P_{n,p}(W_{n,t})$. Intuitively, we should choose the tilted measure $Q$ so that $\tilde X_k \in W_{n,t}$ occurs with high probability under $Q$.

We will refer to $Y^{-1}$ as the importance sampling weights, and to $Q$ as the tilted measure, or tilt. If $Q$ arises naturally as the measure induced by a random graph $G_n$, we will also refer to $G_n$ as the tilt. In the cases where a large deviation principle holds, it gives us an estimate of the relative error of $\tilde M_K$. For any fixed $K$, it is clear from (1.9) that minimizing the relative error is equivalent to minimizing the second moment $E_{Q_n}[(\mathbb{1}_{W_t} Y^{-1})^2]$. Since Jensen's inequality implies that $E_{Q_n}[(\mathbb{1}_{W_t} Y^{-1})^2] \ge \big(E_{Q_n}[\mathbb{1}_{W_t} Y^{-1}]\big)^2 = \mu_n^2$, we have the following asymptotic lower bound:
$$\liminf_{n\to\infty} \frac{1}{n^2} \log E_{Q_n}\big[(\mathbb{1}_{W_t} Y^{-1})^2\big] \ge -2\phi(p,t). \tag{1.10}$$
Thus, the presence of a large deviation principle for the random graphs $G_{n,p}$ as $n \to \infty$ leads to a way to quantify the efficiency of the importance sampling scheme in an asymptotic sense, as is done in other contexts [4].

Definition 1.1. A family of tilted measures $Q_n$ on $\mathcal{W}$ is said to be asymptotically optimal if
$$\lim_{n\to\infty} \frac{1}{n^2} \log E_{Q_n}\big[(\mathbb{1}_{W_t} Y^{-1})^2\big] = -2\inf_{f\in\mathcal{W}_t}[I_p(f)].$$
In contrast, the second moment of each term in the simple Monte Carlo method satisfies
$$\lim_{n\to\infty} \frac{1}{n^2} \log E_{P_n}\big[\mathbb{1}_{W_t}^2\big] = -\phi(p,t) > -2\phi(p,t).$$
Thus, the simple Monte Carlo method is not asymptotically optimal.
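A minimal Python sketch of the estimator (1.8) and of an empirical version of (1.9). For concreteness it assumes that the tilted measure is itself Erdős-Rényi, $Q = P_{n,q}$, so that $Y^{-1}(X) = (p/q)^{E(X)}\big((1-p)/(1-q)\big)^{\binom n2 - E(X)}$. The helper names and the event threshold $\binom n3 t^3$ are illustrative assumptions, not the authors' code.

```python
import math

import numpy as np


def sample_gnp(n, q, rng):
    """Adjacency matrix of G(n, q): symmetric 0/1 entries, zero diagonal."""
    upper = np.triu(rng.random((n, n)) < q, k=1)
    return (upper | upper.T).astype(np.int64)


def log_weight(num_edges, n, p, q):
    """log dP_{n,p}/dP_{n,q}(X); it depends on X only through its edge count E(X)."""
    N = math.comb(n, 2)
    return num_edges * math.log(p / q) + (N - num_edges) * math.log((1 - p) / (1 - q))


def importance_sampling(n, p, q, t, K, seed=0):
    """Return (tilde M_K, empirical relative error) using the proposal Q = P_{n,q}."""
    rng = np.random.default_rng(seed)
    threshold = math.comb(n, 3) * t**3  # assumed definition of the event W_{n,t}
    terms = np.zeros(K)
    for k in range(K):
        A = sample_gnp(n, q, rng)
        if np.trace(A @ A @ A) // 6 >= threshold:  # indicator 1_{W_{n,t}}(X_k)
            terms[k] = math.exp(log_weight(int(A.sum()) // 2, n, p, q))
    m = terms.mean()
    # Sample version of (1.9): standard deviation of the summands over (mean * sqrt(K)).
    rel = terms.std(ddof=1) / (m * math.sqrt(K)) if m > 0 else math.inf
    return m, rel


if __name__ == "__main__":
    print(importance_sampling(n=20, p=0.3, q=0.45, t=0.45, K=2000))
```

Working with log-weights avoids overflow when $\binom n2$ is large; with $q = p$ every weight is 1 and the sketch reduces to the simple estimator $M_K$.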
Observe that Jensen's inequality for conditional expectation implies
$$Q_n(W_t)^{-1} = P_n(W_t)^{-1}\left(\frac{E_{P_n}(\mathbb{1}_{W_t} Y)}{P_n(W_t)}\right)^{-1} \le P_n(W_t)^{-2}\, E_{P_n}(\mathbb{1}_{W_t} Y^{-1}) = P_n(W_t)^{-2}\, E_{Q_n}(\mathbb{1}_{W_t} Y^{-2}). \tag{1.11}$$
So, if $Q_n$ is asymptotically optimal, we must have
$$\liminf_{n\to\infty} \frac{1}{n^2}\log Q_n(W_t) \ge 2\liminf_{n\to\infty}\frac{1}{n^2}\log P_n(W_t) + \liminf_{n\to\infty}\left(-\frac{1}{n^2}\log E_{Q_n}(\mathbb{1}_{W_t} Y^{-2})\right) = 0, \tag{1.12}$$
which is consistent with the intuition that a good choice of $Q_n$ should put $\tilde X_k \in W_t$ with high probability.

To understand which tilts could be relevant in this context, let us now describe in a little more detail the properties of the rate function (1.6), as well as structural results on the Erdős-Rényi model conditioned on rare events and their connections to a sub-family of the well-known exponential random graph models.

1.2. Edge and triangle tilts. In this article we consider tilted measures within a family of exponential random graphs $G^{h,\beta,\alpha}_n$. For parameters $h \in \mathbb{R}$, $\beta \ge 0$ and $\alpha > 0$, these exponential random graphs are defined via the Gibbs measure $Q_n = Q^{h,\beta,\alpha}_n$ on the space of simple graphs on $n$ vertices, where
$$Q^{h,\beta,\alpha}_n(X) \propto e^{H(X)}, \qquad H(X) = h E(X) + \beta n^2 (n^3)^{-\alpha}\, T(X)^{\alpha}. \tag{1.13}$$
Here $E(X)$ is the number of edges in the graph $X$. If $\beta = 0$ and $h = h_q = \log\frac{q}{1-q}$ for some $q \in (0,1)$, then $G^{h_q,0,\alpha}_n$ is an Erdős-Rényi graph with edge probability $q$ (notice that $\alpha$ is irrelevant when $\beta = 0$). In particular, the original graph $G_{n,p}$ is an exponential random graph with parameters $h = h_p$ and $\beta = 0$.

Given the rare event problem, $G_{n,p}$ conditioned on $T(G_{n,p}) \ge \binom{n}{3} t^3$, which we shall henceforth parameterize by $(p,t)$, we will focus on two strategies for choosing the tilted measure. The first is to set $\beta = 0$ and $h = h_q$ for some $q > p$. The resulting tilted measure $Q^{h_q,0,\alpha}_n$ will be called an edge tilt; compared to the original measure for $G_{n,p}$, this tilt simply puts more weight on edges.
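For very small $n$, the second moment appearing in Definition 1.1 can be computed exactly by enumerating all of $\Omega_n$, which makes the edge and triangle tilts of (1.13) easy to compare. The Python sketch below is a hypothetical brute-force check, feasible only for $n \le 6$ or so since $|\Omega_n| = 2^{\binom n2}$. It assumes the threshold $\binom n3 t^3$ and normalizes $Q^{h,\beta,\alpha}_n$ exactly; it illustrates finite-$n$ second moments only and says nothing about the $n \to \infty$ rates.

```python
import itertools
import math

import numpy as np


def enumerate_graphs(n):
    """Edge counts E(X) and triangle counts T(X) for all 2^C(n,2) graphs on n >= 3 vertices."""
    pairs = list(itertools.combinations(range(n), 2))
    index = {pair: k for k, pair in enumerate(pairs)}
    triangles = np.array([[index[(a, b)], index[(a, c)], index[(b, c)]]
                          for a, b, c in itertools.combinations(range(n), 3)])
    codes = np.arange(2 ** len(pairs))
    bits = (codes[:, None] >> np.arange(len(pairs))) & 1  # row r = edge indicators of graph r
    E = bits.sum(axis=1)
    T = bits[:, triangles].prod(axis=2).sum(axis=1)
    return E, T


def exact_moments(n, p, t, h, beta, alpha):
    """Exact mu_n and E_Q[(1_W dP/dQ)^2] for the Gibbs tilt Q^{h,beta,alpha}_n of (1.13)."""
    E, T = enumerate_graphs(n)
    log_p = E * math.log(p) + (math.comb(n, 2) - E) * math.log(1 - p)  # log P_{n,p}(X)
    H = h * E + beta * n**2 * (T / n**3) ** alpha                     # H(X) in (1.13)
    log_q = H - np.logaddexp.reduce(H)                                # exact normalization
    hit = T >= math.comb(n, 3) * t**3                                 # assumed event W_{n,t}
    mu = np.exp(log_p[hit]).sum()
    second = np.exp(2 * log_p[hit] - log_q[hit]).sum()  # sum over W of P(X)^2 / Q(X)
    return float(mu), float(second)


if __name__ == "__main__":
    n, p, t = 6, 0.3, 0.6
    h_p, h_t = math.log(p / (1 - p)), math.log(t / (1 - t))
    tilts = {"crude (Q = P)": (h_p, 0.0),
             "edge tilt, q = t": (h_t, 0.0),
             "triangle tilt": (h_p, 2.0)}
    for name, (h, beta) in tilts.items():
        mu, m2 = exact_moments(n, p, t, h, beta, alpha=1.0)
        print(f"{name:18s} mu = {mu:.3e}  second moment = {m2:.3e}")
```

The ratio `second / mu**2` is $1 + K\cdot(\text{relative error})^2$ at $K = 1$, so it gives a direct finite-$n$ comparison of tilts; the value $\beta = 2$ above is an arbitrary illustrative choice, not a tuned parameter.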
The second strategy is to set $h = h_p$ but vary $\beta > 0$ and $\alpha > 0$. We refer to the resulting tilted measure $Q^{h_p,\beta,\alpha}_n$ as a triangle tilt; compared to the original measure, this tilt puts more weight on triangles, while leaving $h = h_p$ unchanged.

That the two tilts above are natural candidates for the importance sampling scheme can be reasoned in light of the following concept of when two graphs are alike. In [8] it is shown that for the range of $(p,t)$ where one has (1.4), the Erdős-Rényi graph $G_{n,p}$ conditioned on the rare event $\{T(G_{n,p}) \ge \binom{n}{3} t^3\}$ is asymptotically indistinguishable from another Erdős-Rényi graph $G_{n,t}$ with edge probability $t$, in the sense that typical graphs in the conditioned Erdős-Rényi graph resemble a typical graph drawn from $G_{n,t}$ when $n$ is large. (Asymptotic indistinguishability is explained more precisely at (2.7).) Thus, choosing the tilted measure to resemble the typical conditioned graph is a natural choice. While it may seem plausible for any $t > p$ that the conditioned graph resembles another Erdős-Rényi graph, since $E[T(G_{n,t})] \sim \binom{n}{3} t^3$ as $n \to \infty$, this is not always the case. Depending on $p$ and $t$, the graph $G_{n,p}$ conditioned on the event $\{T(G_{n,p}) \ge \binom{n}{3} t^3\}$ may tend to form cliques and hence not resemble an Erdős-Rényi graph. When the conditioned graph does resemble an Erdős-Rényi graph, we say that $(p,t)$ is in the replica symmetric phase. On the other hand, when the conditioned graph is not asymptotically indistinguishable from an Erdős-Rényi graph, we say that $(p,t)$ is in the replica breaking phase.
(See Definition 2.2.)

The main question we wish to address is: given the parameters $(p,t)$ for the rare event problem, how can we choose the tilt parameters ($h_q$ for the edge tilt, or $\beta$ and $\alpha$ for the triangle tilt) so that the resulting importance sampling scheme is asymptotically optimal? And can an optimal importance sampling scheme be constructed for all values of $(p,t)$?

Regarding the edge tilt, our first result (Prop 3.4) is that the edge tilt $Q^{h_q,0,\alpha}_n$ can be asymptotically optimal only if $h_q = h_t$ (i.e. $q = t$). This is not very surprising since $E[T(G_{n,q})] \sim \binom{n}{3} q^3$. On the other hand, we will also prove, in Proposition 3.5, the more surprising result that for some values of $(p,t)$ the importance sampling scheme based on the edge tilt $Q^{h_t,0,\alpha}_n$ is not asymptotically optimal. In particular, there is a subregime of the replica symmetric phase for which the edge tilt with $h = h_t$ produces a suboptimal estimator.

Regarding the triangle tilt $Q^{h_p,\beta,\alpha}_n$, our main result (Prop 3.3) is a necessary and sufficient condition on the tilt parameters for the resulting importance sampling scheme to be asymptotically optimal. Moreover, optimality can be achieved by a triangle tilt for every $(p,t)$ in the replica symmetric phase, and even for some choices of $(p,t)$ in the replica breaking phase, as we will show in Section 4. Thus, the triangle tilt succeeds where the edge tilt fails, because the former appropriately penalizes samples with an undesired number of triangles, whereas the latter inappropriately penalizes samples with an undesired number of edges. As mentioned in the preceding paragraph, a crucial property to be expected of such an optimal triangle tilt is that samples from the tilted measure resemble the original graph $G_{n,p}$ conditioned to have at least $\binom{n}{3} t^3$ triangles.
This is indeed the case for the optimal triangle tilt, thanks to Theorem 2.4.

Finally, we remark that Theorem 2.4 draws the connection between an exponential random graph and a conditioned Erdős-Rényi graph, indicating how the parameters of the two graphs must be related in order for them to resemble each other. This relationship arises from the fact that the free energy of the exponential random graph can be expressed in a variational formulation involving the LDP rate function for the conditioned Erdős-Rényi graph. For $(p,t)$ in the replica symmetric phase, this connection has been observed by Chatterjee and Dey [7], Chatterjee and Diaconis [5], and Lubetzky and Zhao [17]. In this paper, Theorem 2.4 generalizes this connection to include parameters $(p,t)$ in the replica breaking phase.

Organization of the paper: We start by giving precise definitions of the various constructs arising in our study in Section 2. This culminates in Theorem 2.4, which characterizes the limiting free energy of the exponential random graph model. The rest of Section 2 is devoted to drawing a connection between the exponential random graph and the Erdős-Rényi random graph conditioned on an atypical number of triangles, leading to the derivation of the triangle tilts. Section 3 states and proves our main results on asymptotic optimality or non-optimality of the importance sampling estimators. An explicit procedure for determining the optimal triangle tilt parameters, given $(p,t)$, is presented in Section 4 and further expanded on in Appendix A. In Section 5, we carry out numerical simulations on moderate-size networks using the various proposed tilts, to illustrate and compare the viability of the importance sampling schemes. We also discuss alternative strategies for choosing the tilted measure, namely hybrid tilts and conditioned triangle tilts, which are variants of the edge and triangle tilts.
Acknowledgement. This work was funded in part through the 2011-2012 SAMSI Program on Uncertainty Quantification, in which each of the authors participated. JN was partially supported by grant NSF-DMS 1007572. SB was partially supported by grant NSF-DMS 1105581.

2. Large deviations, importance sampling and exponential random graphs

2.1. Large deviations for Erdős-Rényi random graphs. Before the proof of the main result, we give a more detailed description of the large deviation principle for Erdős-Rényi random graphs and introduce the constructs required in our proof. Chatterjee and Varadhan [8] have proved a general large deviation principle based on the theory of dense graph limits developed in [3] (see also Lovász's recent monograph [14]). In this framework, a random graph is represented as a function $X(x,y) \in \widetilde{\mathcal{W}}$, where $\widetilde{\mathcal{W}}$ is the set of all measurable functions $f : [0,1]^2 \to [0,1]$ satisfying $f(x,y) = f(y,x)$. Specifically, a finite simple graph $X$ on $n$ vertices is represented by the function, or graphon,
$$X(x,y) = \sum_{\substack{i,j=1 \\ i \ne j}}^{n} X_{ij}\, \mathbb{1}_{[\frac{i-1}{n}, \frac{i}{n}) \times [\frac{j-1}{n}, \frac{j}{n})}(x,y) \in \widetilde{\mathcal{W}}. \tag{2.1}$$
Here we treat $(X_{ij})$ as a symmetric matrix with entries in $\{0,1\}$ and $X_{ii} = 0$ for all $i$. In general, for a function $f \in \widetilde{\mathcal{W}}$, $f(x,y)$ can be interpreted as the probability of having an edge between vertices $x$ and $y$. We then define the quotient space $\mathcal{W}$ under the equivalence relation $f \sim g$ if $f(x,y) = g(\sigma x, \sigma y)$ for some measure-preserving bijection $\sigma : [0,1] \to [0,1]$. By identifying a graph $X$ with its graphon representation, we can consider the probability measure $P_{n,p}$ as a measure induced on $\mathcal{W}$, supported on the finite subset of graphons of finite graphs. For $f \in \mathcal{W}$, denote
$$\mathcal{E}(f) = \frac{1}{2}\int_0^1\!\!\int_0^1 f(x,y)\, dx\, dy \tag{2.2}$$
and
$$\mathcal{T}(f) = \frac{1}{6}\int_0^1\!\!\int_0^1\!\!\int_0^1 f(x,y) f(y,z) f(x,z)\, dx\, dy\, dz. \tag{2.3}$$
We see that $E(X) = n^2 \mathcal{E}(X)$ and $T(X) = n^3 \mathcal{T}(X)$, so that $\mathcal{E}$ and $\mathcal{T}$ represent the edge and triangle densities of the graph $X$, respectively. Then, rather than considering the event $W_{n,t}$, we shall equivalently consider the upper tails of triangle densities,
$$\mathcal{W}_t := \{ f \in \mathcal{W} \mid \mathcal{T}(f) \ge t^3/6 \}.$$
The large deviation principle of Chatterjee and Varadhan [8] implies that for any $p \in (0,1)$ and $t \in [p,1]$,
$$\lim_{n\to\infty} \frac{1}{n^2} \log P\big(\mathcal{T}(G_{n,p}) \ge t^3/6\big) = -\phi(p,t), \tag{2.4}$$
where $\phi(p,t)$ is the large deviation decay rate given by a variational form,
$$\phi(p,t) = \inf\big\{ I_p(f) \mid f \in \mathcal{W},\ \mathcal{T}(f) \ge t^3/6 \big\} = \inf_{f\in\mathcal{W}_t}[I_p(f)]. \tag{2.5}$$
Here,
$$I_p(f) := \int_0^1\!\!\int_0^1 I_p(f(x,y))\, dx\, dy \tag{2.6}$$
is the large deviation rate function, where $I_p : [0,1] \to \mathbb{R}$ is defined at (1.5). A further important consequence of the large deviation principle concerns the typical behaviour of the conditioned probability measure $P_{n,p}(A \mid \mathcal{W}_t) = P_{n,p}(A \cap \mathcal{W}_t)\, \mu_n^{-1}$. When we refer to $G_{n,p}$ conditioned on the event $\mathcal{W}_t = \{\mathcal{T}(f) \ge t^3/6\}$, we mean the random graph whose law is given by this conditioned probability measure.

Lemma 2.1 ([8, Theorem 3.1], Lemma C.1). Let $F^* \subset \mathcal{W}$ be the non-empty set of graphs that optimize the variational form in (2.5). Then the Erdős-Rényi graph $G_{n,p}$ conditioned on $\{\mathcal{T}(f) \ge t^3/6\}$ is asymptotically indistinguishable from the minimal set $F^*$.

The term "asymptotically indistinguishable" in Lemma 2.1 roughly means that the graphon representation of the graph converges in probability, under the cut distance metric, to some function $f^* \in F^*$ at an exponential rate as $n \to \infty$. Intuitively, this means that the typical conditioned Erdős-Rényi graph resembles some graph $f^* \in F^*$ for large $n$.
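The densities (2.2)–(2.3) and the rate function (2.6) are easy to evaluate for step graphons. The Python sketch below (hypothetical helper names, NumPy assumed) checks that the graphon (2.1) of a graph has $\mathcal{E} = E(X)/n^2$ and $\mathcal{T} = T(X)/n^3$, and uses the fact that every $f \in \mathcal{W}_t$ gives an upper bound $\phi(p,t) \le I_p(f)$. It assumes the standard form $I_p(u) = \frac{u}{2}\log\frac{u}{p} + \frac{1-u}{2}\log\frac{1-u}{1-p}$ for (1.5), which is consistent with (2.11) and (2.15). The two candidates tried are the constant $t$ and the clique $\mathbb{1}_{[0,t]^2}$, whose rate is $t^2 I_p(1) + (1-t^2) I_p(0)$.

```python
import numpy as np


def I_p(u, p):
    """Scalar rate (u/2) log(u/p) + ((1-u)/2) log((1-u)/(1-p)), with 0 log 0 = 0 (assumed form of (1.5))."""
    u = np.asarray(u, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        a = np.where(u > 0, u * np.log(u / p), 0.0)
        b = np.where(u < 1, (1 - u) * np.log((1 - u) / (1 - p)), 0.0)
    return 0.5 * (a + b)


def step_graphon_stats(F, p):
    """(E(f), T(f), I_p(f)) for the step graphon taking value F[i, j] on the (i, j) cell."""
    m = F.shape[0]
    edge = 0.5 * F.mean()                                # (1/2) * double integral, as in (2.2)
    tri = np.einsum("xy,yz,xz->", F, F, F) / m**3 / 6.0  # (1/6) * triple integral, as in (2.3)
    rate = float(I_p(F, p).mean())                       # integral of I_p(f), as in (2.6)
    return float(edge), float(tri), rate


def phi_upper_bound(p, t):
    """phi(p,t) <= I_p(f) for any f in W_t; try the constant t and the clique 1_{[0,t]^2}."""
    constant = float(I_p(t, p))
    clique = t**2 * float(I_p(1.0, p)) + (1 - t**2) * float(I_p(0.0, p))
    return min(constant, clique), constant, clique


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    upper = np.triu(rng.random((8, 8)) < 0.5, k=1)
    A = (upper | upper.T).astype(float)      # graphon (2.1) of an 8-vertex graph
    print(step_graphon_stats(A, p=0.5))      # E(X)/n^2, T(X)/n^3, I_p(X)
    print(phi_upper_bound(p=0.05, t=0.3))    # here the clique value is below I_p(t)
```

When the clique value is smaller than $I_p(t)$, as for $(p,t) = (0.05, 0.3)$, condition (2.8) of Definition 2.2 fails, so such $(p,t)$ lie in the replica breaking phase. The converse does not follow: a larger clique value does not by itself certify replica symmetry.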
In order to give a more precise definition of asymptotic indistinguishability, we first recall the cut distance metric $\delta_\square$, defined for $f, g \in \mathcal{W}$ by
$$\delta_\square(f,g) = \inf_{\sigma} \sup_{S,T \subset [0,1]} \left| \int_{S\times T} \big(f(\sigma x, \sigma y) - g(x,y)\big)\, dx\, dy \right|,$$
where the infimum is taken over all measure-preserving bijections $\sigma : [0,1] \to [0,1]$. For subsets $F_1, F_2 \subset \mathcal{W}$, $\delta_\square(F_1, F_2) = \inf_{f_1 \in F_1, f_2 \in F_2} \delta_\square(f_1, f_2)$. It is known [15] that $(\mathcal{W}, \delta_\square)$ is a compact metric space.

We say that a family of random graphs $G_n$ on $n$ vertices, $n \in \mathbb{N}$, is asymptotically indistinguishable from a subset $F \subset \mathcal{W}$ if for any $\epsilon_1 > 0$ there exists $\epsilon_2 > 0$ such that
$$\limsup_{n\to\infty} \frac{1}{n^2} \log P\big(\delta_\square(G_n, F) > \epsilon_1\big) < -\epsilon_2. \tag{2.7}$$
Further, we say that $G_n$ is asymptotically indistinguishable from the minimal set $F \subset \mathcal{W}$ if $F$ is the smallest closed subset of $\mathcal{W}$ from which $G_n$ is asymptotically indistinguishable. Clearly, if $G_n$ is asymptotically indistinguishable from a singleton set $F$, then $F$ is trivially minimal. Finally, we say two random graphs $G^1_n, G^2_n$ are asymptotically indistinguishable if they are each asymptotically indistinguishable from the same minimal set $F \subset \mathcal{W}$. Intuitively, this means that the random behaviour, or the typical graphs, of $G^1_n$ resemble those of $G^2_n$ for large $n$. (See [5] and [8] for a wide-ranging exploration of this metric in the context of describing limits of dense random graph sequences.)

Using this terminology, we observe that an Erdős-Rényi graph $G_{n,u}$ is asymptotically indistinguishable from the singleton set containing the constant function $f^* \equiv u$. The question of whether the conditioned Erdős-Rényi graph is again an Erdős-Rényi graph leads to the following definition.

Definition 2.2.
The replica symmetric phase is the regime of parameters $(p,t)$ for which the large deviations rate satisfies
$$\inf_{f\in\mathcal{W}_t}[I_p(f)] = I_p(t), \tag{2.8}$$
and the infimum is uniquely attained at the constant function $t$. The replica breaking phase is the regime of parameters $(p,t)$ that are not in the replica symmetric phase. ∎

Hence, the notion of replica symmetry is a property of the rare event problem: in the replica symmetric phase, the Erdős-Rényi graph $G_{n,p}$ conditioned on the event $\{\mathcal{T}(f) \ge t^3/6\}$ is, as a consequence of Lemma 2.1, asymptotically indistinguishable from an Erdős-Rényi graph with the higher edge density, $G_{n,t}$. In contrast, the conditioned graphs in the replica breaking phase are not indistinguishable from any one Erdős-Rényi graph; instead, they may behave like a mixture of Erdős-Rényi graphs or exhibit a clique-like structure with edge density less than $t$. The term "replica symmetric phase" is borrowed from [8], which in turn was inspired by the statistical physics literature. However, we remark that other authors have used this term differently, to refer to other families of graphs behaving like an Erdős-Rényi graph or a mixture of Erdős-Rényi graphs.

2.2. Asymptotic behavior of exponential random graphs. To find "good" importance sampling tilted measures, we focus on the class of exponential random graphs. The exponential random graph is a random graph on $n$ vertices defined by the Gibbs measure
$$Q(X) = Q^{h,\beta,\alpha}_n(X) \propto e^{n^2 \mathcal{H}(X)} \tag{2.9}$$
on $\Omega_n$, where for given $h \in \mathbb{R}$, $\beta \in \mathbb{R}_+$ and $\alpha > 0$, the Hamiltonian is
$$\mathcal{H}(X) = h\, \mathcal{E}(X) + \beta\, \mathcal{T}(X)^{\alpha}. \tag{2.10}$$
We will use $\psi_n = \psi^{h,\beta,\alpha}_n$ to denote the log of the normalizing constant (free energy),
$$\psi_n = \psi^{h,\beta,\alpha}_n = \frac{1}{n^2} \log \sum_{X\in\Omega_n} e^{n^2 \mathcal{H}(X)},$$
so that $Q^{h,\beta,\alpha}_n(X) = \exp\big(n^2(\mathcal{H}(X) - \psi_n)\big)$. We denote by $G^{h,\beta,\alpha}_n$ the exponential random graph defined by the Gibbs measure (2.9).
The case $\alpha = 1$ is the "classical" exponential random graph model, which has an enormous literature in the social sciences (see e.g. [19, 20] and the references therein) and has been rigorously studied in a number of recent papers, see e.g. [1, 5, 17, 18, 22, 23]. In this case, the Hamiltonian can be rewritten as $n^2 \mathcal{H}(X) = h E(X) + \frac{\beta}{n} T(X)$. When $\alpha = 1$ we drop the superscript $\alpha$ and write $\psi^{h,\beta}_n$, $Q^{h,\beta}_n$. The generalization to the exponential random graph with the parameter $\alpha$ was first proposed in [17].

Observe that the Erdős-Rényi random graph is a special case of the exponential random graph: if $\beta = 0$ and $h = h_p$, with $h_p$ defined by (1.2), then $Q^{h_p,0,\alpha}_n = P_{n,p}$ for any $\alpha > 0$ […]. On the other hand, choosing $\beta > 0$ […] $(h,\beta,\alpha)$, the Gibbs measure $Q^{h,\beta,\alpha}_n$ can be adjusted to favor edges and triangles to varying degrees.

The asymptotic behavior of the exponential random graph measures $Q^{h,\beta,\alpha}_n$ and of the free energy $\psi^{h,\beta,\alpha}_n$ is partially characterized by the following result of Chatterjee and Diaconis [5] and Lubetzky and Zhao [17]. In what follows, we will make use of the function
$$I(u) = \tfrac{1}{2} u \log u + \tfrac{1}{2}(1-u)\log(1-u) \tag{2.11}$$
on $u \in [0,1]$ and, for $f \in \mathcal{W}$,
$$I(f) := \int_0^1\!\!\int_0^1 I(f(x,y))\, dx\, dy. \tag{2.12}$$

Theorem 2.3 (See [5], [17]). For the exponential random graph $G^{h,\beta,\alpha}_n$ with parameters $(h,\beta,\alpha) \in \mathbb{R} \times \mathbb{R}_+ \times [2/3, 1]$, the free energy satisfies
$$\lim_{n\to\infty} \psi^{h,\beta,\alpha}_n = \sup_{u\in[0,1]} \left[ \beta \Big(\frac{u^3}{6}\Big)^{\alpha} - I(u) + \frac{h}{2} u \right]. \tag{2.13}$$
If the supremum in (2.13) is attained at a unique point $v^* \in [0,1]$, then the exponential random graph $G^{h,\beta,\alpha}_n$ is asymptotically indistinguishable from the Erdős-Rényi graph $G_{n,v^*}$.
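The variational formula (2.13) reduces to a one-dimensional maximization, which a grid search approximates well. Below is a minimal Python sketch, assuming the constant-graphon densities $\mathcal{E}(u) = u/2$ and $\mathcal{T}(u) = u^3/6$ implied by (2.2)–(2.3); the function names and grid size are illustrative choices. It does not check the uniqueness hypothesis of the theorem: `np.argmax` returns the first grid maximizer, so nearly tied local maxima should be inspected separately.

```python
import math

import numpy as np


def I(u):
    """I(u) = (u/2) log u + ((1-u)/2) log(1-u), evaluated on the open interval (0, 1)."""
    return 0.5 * u * np.log(u) + 0.5 * (1 - u) * np.log(1 - u)


def free_energy_limit(h, beta, alpha, grid=200_001):
    """Grid approximation of sup_u [beta (u^3/6)^alpha - I(u) + (h/2) u]; returns (value, maximizer)."""
    u = np.linspace(0.0, 1.0, grid)[1:-1]  # interior points avoid log(0)
    ell = beta * (u**3 / 6.0) ** alpha - I(u) + 0.5 * h * u
    k = int(np.argmax(ell))
    return float(ell[k]), float(u[k])


if __name__ == "__main__":
    p = 0.3
    h_p = math.log(p / (1 - p))
    for beta in (0.0, 5.0, 20.0):
        value, v_star = free_energy_limit(h_p, beta, alpha=1.0)
        print(f"beta = {beta:5.1f}: limiting free energy ~ {value:.5f}, v* ~ {v_star:.4f}")
```

As a sanity check, with $\beta = 0$ and $h = h_q$ the objective equals $-I_q(u) - \frac12\log(1-q)$, so the maximizer is $u = q$ and the value is $-\frac12\log(1-q)$, the Erdős-Rényi free energy.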
The case $\alpha = 1$ of Theorem 2.3 was proved by Chatterjee and Diaconis (Theorems 4.1 and 4.2 of [5]); the case $\alpha \in [2/3, 1]$ is due to Lubetzky and Zhao (Theorems 1.3 and 4.3 of [17]).

Our main result in this section, stated next, generalizes the variational formulation of the free energy to the Gibbs measure of any exponential random graph. It emphasizes the connection between the exponential random graph and the conditioned Erdős-Rényi graph. Before stating the result we need some extra notation. Extend the Hamiltonian defined in (2.10) to the space of graphons in the natural way,
$$\mathcal{H}(f) := h\, \mathcal{E}(f) + \beta\, \mathcal{T}(f)^{\alpha}, \tag{2.14}$$
where we recall the definitions (2.2) and (2.3) of the edge and triangle densities of graphons. For fixed $q \in (0,1)$, recall the functional $I_q(f)$ from (2.6) and the functional $I(f)$ from (2.12). In particular, observe that
$$I_q(f) = I(f) - h_q\, \mathcal{E}(f) - \tfrac{1}{2}\log(1-q), \tag{2.15}$$
with $h_q = \log\frac{q}{1-q}$.

Theorem 2.4. For the exponential random graph $G^{h,\beta,\alpha}_n$ with parameters $(h,\beta,\alpha) \in \mathbb{R} \times [0,+\infty) \times [0,1]$, the free energy satisfies
$$\lim_{n\to\infty} \psi^{h,\beta,\alpha}_n = \sup_{u\in[0,1]} \left[ \beta \Big(\frac{u^3}{6}\Big)^{\alpha} - \phi_q(u) - \tfrac{1}{2}\log(1-q) \right], \tag{2.16}$$
where $q \in (0,1)$ is such that $h = h_q = \log\frac{q}{1-q}$,
$$\phi_q(u) = \inf_{f\in\partial\mathcal{W}_u}[I_q(f)], \tag{2.17}$$
and $\partial\mathcal{W}_u := \{ f \in \mathcal{W} \mid \mathcal{T}(f) = u^3/6 \}$.
If the supremum in (2.16) is attained at a unique point $v^* > q$, then the exponential random graph $G^{h_q,\beta,\alpha}_n$ is asymptotically indistinguishable from the conditioned Erdős-Rényi graph $G_{n,q}$ conditioned on the event $\{\mathcal{T}(f) \ge (v^*)^3/6\}$.

Remark 2.5. (i) If, in addition, $(q, v^*)$ in Theorem 2.4 belongs to the replica symmetric phase, then $G^{h_q,\beta,\alpha}_n$ is asymptotically indistinguishable from the Erdős-Rényi graph $G_{n,v^*}$. This follows from the remarks after Definition 2.2: in the replica symmetric phase, $G_{n,q}$ conditioned on $\{\mathcal{T}(f) \ge (v^*)^3/6\}$ is asymptotically indistinguishable from the Erdős-Rényi graph $G_{n,v^*}$.
In this case, (2.16) reduces to (2.13).
(ii) Non-uniqueness of $v^*$ is possible. As will be apparent from the proof, if the supremum in (2.16) is attained on the set $U^* \subset [0,1]$, then $G^{h_q,\beta,\alpha}_n$ is asymptotically indistinguishable from the minimal set $F^* = \bigcup_{u\in U^*} F^*_u$, where $F^*_u$ is the set of minimizers of (2.17). In particular, if $U^*$ contains more than one element, then $G^{h,\beta,\alpha}_n$ is asymptotically indistinguishable from a mixture of different conditioned Erdős-Rényi graphs.

Proof. Theorem 3.1 in [5] implies that
$$\lim_{n\to\infty} \psi^{h_q,\beta,\alpha}_n = \sup_{f\in\mathcal{W}} [\mathcal{H}(f) - I(f)]. \tag{2.18}$$
To show (2.16), suppose $f \in \partial\mathcal{W}_u$ for some $u \in (0,1)$. Then, by (2.15),
$$\begin{aligned} \mathcal{H}(f) - I(f) &= h_q\, \mathcal{E}(f) + \beta \Big(\frac{u^3}{6}\Big)^{\alpha} - I(f) = \beta \Big(\frac{u^3}{6}\Big)^{\alpha} - I_q(f) - \tfrac{1}{2}\log(1-q) \\ &\le \beta \Big(\frac{u^3}{6}\Big)^{\alpha} - \inf_{g\in\partial\mathcal{W}_u}[I_q(g)] - \tfrac{1}{2}\log(1-q). \end{aligned} \tag{2.19}$$
This implies that
$$\sup_{f\in\partial\mathcal{W}_u}[\mathcal{H}(f) - I(f)] \le \beta \Big(\frac{u^3}{6}\Big)^{\alpha} - \inf_{f\in\partial\mathcal{W}_u}[I_q(f)] - \tfrac{1}{2}\log(1-q),$$
and
$$\sup_{f\in\mathcal{W}}[\mathcal{H}(f) - I(f)] = \sup_u \sup_{f\in\partial\mathcal{W}_u}[\mathcal{H}(f) - I(f)] \le \sup_u \left[ \beta \Big(\frac{u^3}{6}\Big)^{\alpha} - \inf_{f\in\partial\mathcal{W}_u}[I_q(f)] - \tfrac{1}{2}\log(1-q) \right].$$
Now we show the reverse inequality. Fix $\epsilon > 0$. For each $u \in (0,1)$, let $f_{u,\epsilon} \in \partial\mathcal{W}_u$ be such that $I_q(f_{u,\epsilon}) \le \inf_{f\in\partial\mathcal{W}_u}[I_q(f)] + \epsilon$. Therefore, for each $u \in (0,1)$ we have
$$\mathcal{H}(f_{u,\epsilon}) - I(f_{u,\epsilon}) = \beta \Big(\frac{u^3}{6}\Big)^{\alpha} - I_q(f_{u,\epsilon}) - \tfrac{1}{2}\log(1-q) \ge \beta \Big(\frac{u^3}{6}\Big)^{\alpha} - \inf_{f\in\partial\mathcal{W}_u}[I_q(f)] - \tfrac{1}{2}\log(1-q) - \epsilon. \tag{2.20}$$
Hence
$$\sup_{f\in\mathcal{W}}[\mathcal{H}(f) - I(f)] \ge \sup_u \left[ \beta \Big(\frac{u^3}{6}\Big)^{\alpha} - \inf_{f\in\partial\mathcal{W}_u}[I_q(f)] - \tfrac{1}{2}\log(1-q) \right] - \epsilon. \tag{2.21}$$
Since $\epsilon > 0$ is arbitrary, (2.16) follows.

Now suppose that the supremum in (2.16) is attained at a unique point $v^* > q$. Let $F^*_{v^*} \subset \partial\mathcal{W}_{v^*}$ denote the set of functions that attain the infimum in (2.17). We observe from the preceding proof that, in fact, $F^*_{v^*}$ is the set that attains the supremum in (2.18), so that by [5, Theorem 3.2] and Lemma C.1, the graph $G^{h_q,\beta,\alpha}_n$ is asymptotically indistinguishable from the minimal set $F^*_{v^*}$. On the other hand, since $F^*_{v^*}$ is also the set that attains the infimum in the LDP rate (2.5) (due to [8, Theorem 4.2(iii)]), the Erdős-Rényi graph $G_{n,q}$ conditioned on $\{\mathcal{T}(f) \ge (v^*)^3/6\}$ is also asymptotically indistinguishable from the set $F^*_{v^*}$. Thus, $G^{h_q,\beta,\alpha}_n$ is asymptotically indistinguishable from $G_{n,q}$ conditioned on the event $\{\mathcal{T}(f) \ge (v^*)^3/6\}$. ∎

The mean behaviour of the triangle density of an exponential random graph $G^{h_q,\beta,\alpha}_n$ can be deduced from the variational formulation (2.16), and in special instances so can the mean behaviour of the edge density. This is shown in the next proposition, which follows from [5, Theorem 4.2] and the Lipschitz continuity of the mappings $f \mapsto \mathcal{T}(f)$ and $f \mapsto \mathcal{E}(f)$ under the cut distance metric $\delta_\square$ [3, Theorem 3.7]. The proof is left to the appendix.

Proposition 2.6. Let $(h_q,\beta,\alpha) \in \mathbb{R} \times [0,+\infty) \times [0,1]$. If the supremum in (2.16) is attained at a unique point $v^* \in [0,1]$, then
$$\lim_{n\to\infty} E\big|\mathcal{T}(G^{h_q,\beta,\alpha}_n) - (v^*)^3/6\big| = 0. \tag{2.22}$$
Further, if $(q, v^*)$ belongs to the replica symmetric phase, then
$$\lim_{n\to\infty} E\big|\mathcal{E}(G^{h_q,\beta,\alpha}_n) - v^*/2\big| = 0. \tag{2.23}$$

3. Asymptotic Optimality

Recall that the edge tilt corresponds to the Gibbs measure (1.13) with $\beta = 0$ and $h > h_p = \log\frac{p}{1-p}$. Thus, an edge tilt $Q^{h,0}_n$ satisfies
$$\frac{dP_{n,p}}{dQ^{h,0}_n}(X) = \exp\left[-n^2\Big((h - h_p)\,\mathcal{E}(X) + \psi^{h_p,0}_n - \psi^{h,0}_n\Big)\right]. \tag{3.1}$$
The triangle tilt corresponds to the Gibbs measure (1.13) with $h = h_p$ and $\beta > 0$, $\alpha > 0$, so a triangle tilt $Q^{h_p,\beta,\alpha}_n$ satisfies
$$\frac{dP_{n,p}}{dQ^{h_p,\beta,\alpha}_n}(X) = \exp\left[-n^2\Big(\beta\, \mathcal{T}(X)^{\alpha} + \psi^{h_p,0}_n - \psi^{h_p,\beta,\alpha}_n\Big)\right].$$
Here recall that $\mathcal{T}(X) = n^{-3} T(X)$ is the density of triangles in $X$ and $\mathcal{E}(X) = n^{-2} E(X)$ is the density of edges.

For any admissible parameters $(h,\beta,\alpha)$, the importance sampling estimator based on the tilted measure $Q^{h,\beta,\alpha}_n$ is
$$\begin{aligned} \tilde M_K &= \frac{1}{K}\sum_{k=1}^{K} \mathbb{1}_{\mathcal{W}_t}(\tilde X_k)\, \frac{dP_{n,p}}{dQ^{h,\beta,\alpha}_n}(\tilde X_k) \\ &= \frac{1}{K}\sum_{k=1}^{K} \mathbb{1}_{\mathcal{W}_t}(\tilde X_k) \exp\Big\{ n^2\Big( (h_p - h)\,\mathcal{E}(\tilde X_k) - \beta\, \mathcal{T}(\tilde X_k)^{\alpha} + \psi^{h,\beta,\alpha}_n - \psi^{h_p,0}_n \Big) \Big\}, \end{aligned} \tag{3.2}$$
where the $\tilde X_k$ are i.i.d. samples drawn from $Q^{h,\beta,\alpha}_n$. Denote
$$\hat q_n = \hat q_n(\tilde X) = \mathbb{1}_{\mathcal{W}_t}(\tilde X)\, \frac{dP_{n,p}}{dQ^{h,\beta,\alpha}_n}(\tilde X).$$
For any $(h,\beta,\alpha)$, $E[\hat q_n] = \mu_n$, and so $\tilde M_K$ is an unbiased estimator of $\mu_n$.

Our first result is a necessary condition for asymptotic optimality of the importance sampling scheme.

Proposition 3.1. Given $p < t$, let $(h,\beta,\alpha) \in \mathbb{R} \times [0,+\infty) \times [0,1]$ with $h = h_q = \log\frac{q}{1-q}$. Suppose that the supremum in (2.16) is not attained at $t$:
$$\sup_u \left[ \beta \Big(\frac{u^3}{6}\Big)^{\alpha} - \phi_q(u) - \tfrac{1}{2}\log(1-q) \right] > \beta \Big(\frac{t^3}{6}\Big)^{\alpha} - \phi_q(t) - \tfrac{1}{2}\log(1-q). \tag{3.3}$$
Then the importance sampling scheme based on the Gibbs measure tilt $Q^{h,\beta,\alpha}_n$ is not asymptotically optimal.

Corollary 3.2. Given $p < t$, let $(h,\beta,\alpha) \in \mathbb{R} \times [0,+\infty) \times [0,1]$ with $h = h_q = \log\frac{q}{1-q}$. Suppose that the family of random graphs $G^{h,\beta,\alpha}_n$ is not asymptotically indistinguishable from $G_{n,p}$ conditioned on the event $\{\mathcal{T}(X) \ge t^3/6\}$. Then the importance sampling scheme based on the Gibbs measure tilt $Q^{h,\beta,\alpha}_n$ is not asymptotically optimal.

Our next result shows that for triangle tilts (i.e. $h = h_p$), the necessary condition described in Proposition 3.1 is also a sufficient condition for asymptotic optimality.

Proposition 3.3. Given $p < t$, let $h = h_p$ and $(\beta,\alpha) \in [0,+\infty) \times [0,1]$.
Suppose that the supremum in (2.16) is attained at $t$:
$$\sup_u \left[ \beta \Big(\frac{u^3}{6}\Big)^{\alpha} - \phi_p(u) - \tfrac{1}{2}\log(1-p) \right] = \beta \Big(\frac{t^3}{6}\Big)^{\alpha} - \phi_p(t) - \tfrac{1}{2}\log(1-p). \tag{3.4}$$
Then the importance sampling scheme based on the triangle tilt $Q^{h_p,\beta,\alpha}_n$ is asymptotically optimal.

In Section 4, we give a more explicit way to determine the tilt parameters that satisfy condition (3.4).

Next, we turn to the edge tilts (i.e. $\beta = 0$). Since $\phi_q(u)$ is minimized at $u = q$, (3.3) holds whenever $\beta = 0$ and $q \ne t$. We therefore have, as a corollary of Prop 3.1, the following necessary condition for an edge tilt to produce an optimal scheme.

Proposition 3.4. Given $p < t$, let $\beta = 0$ and $h = h_q$ for some $q \ne t$. Then the importance sampling scheme based on the edge tilt $Q^{h_q,0,\alpha}_n$ is not asymptotically optimal.

Observe that for the edge tilt with $h = h_t$ and $\beta = 0$, the supremum in (2.16) is always attained at $t$, since
$$\inf_u[\phi_t(u)] = \inf_{f\in\mathcal{W}}[I_t(f)] = \phi_t(t) = 0. \tag{3.5}$$
Thus, the edge tilt with $h = h_t$ always satisfies the necessary condition for asymptotic optimality of the importance sampling scheme. Furthermore, if $(p,t)$ is in the replica symmetric phase, the tilted measure $Q^{h_t,0,\alpha}_n$ is asymptotically indistinguishable from the conditioned Erdős-Rényi graph. Nevertheless, the sampling scheme based on the edge tilt with $h = h_t$ may still be suboptimal, even in the replica symmetric phase, as the next result shows.

Proposition 3.5. Let $0 < p < \frac{e^{-1/2}}{1 + e^{-1/2}}$ and $t \in (p,1)$. If $t$ is sufficiently close to $1$ and $(p,t)$ belongs to the replica symmetric phase, then the importance sampling scheme based on the edge tilt $Q^{h_t,0}_n$ is not asymptotically optimal.

Remark A.5 and Figure A.1 indicate that there do exist parameters $(p,t)$ in the replica symmetric phase for which the hypothesis of Prop 3.5 is satisfied.

3.1. Proofs of results. We first prove the asymptotic optimality of the triangle tilts, Prop 3.3.

Proof of Proposition 3.3.
Due to (1.10), it suffices to show that
$$\lim_{n\to\infty} \frac{1}{n^2} \log E_Q[\hat q_n^2] \le -2\inf_{f\in\mathcal{W}_t} I_p(f). \tag{3.6}$$
Let $q \in (0,1)$ be such that $h = h_q = \log\frac{q}{1-q}$. Recall that
$$Q^{h,\beta,\alpha}_n(X) = \exp\left[n^2\Big(h\,\mathcal{E}(X) + \beta\, \mathcal{T}(X)^{\alpha} - \psi^{h,\beta,\alpha}_n\Big)\right].$$
Therefore, by the definition of $\hat q_n$, we have
$$E_{Q_n}[\hat q_n^2] = E_{P_{n,p}}\left[\mathbb{1}_{\mathcal{W}_t}\, \frac{dP_{n,p}}{dQ^{h_q,\beta,\alpha}_n}\right] = E_{P_{n,p}}\left[\exp\Big\{ n^2\Big( w_t(X) + (h_p - h_q)\,\mathcal{E}(X) - \beta\, \mathcal{T}(X)^{\alpha} + \psi^{h_q,\beta,\alpha}_n - \psi^{h_p,0}_n \Big)\Big\}\right],$$
where $\mathbb{1}_{\mathcal{W}_t}(X) = e^{n^2 w_t(X)}$, with $w_t(X) = 0$ if $X \in \mathcal{W}_t$ and $w_t(X) = -\infty$ otherwise. The mappings $\mathcal{E}, \mathcal{T} : \mathcal{W} \to \mathbb{R}$ are bounded and continuous [3, Theorem 3.8], and the function $w_t$ can be approximated by bounded continuous functions. Applying the Laplace principle for the family of measures $P_{n,p}$, for which $I_p(f)$ is the rate function [5, Theorem 3.1], we obtain
$$\begin{aligned} \lim_{n\to\infty} \frac{1}{n^2} \log E_{Q_n}[\hat q_n^2] &= \lim_{n\to\infty} \frac{1}{n^2} \log E_{P_{n,p}}\left[\exp\Big\{ n^2\Big( w_t(X) + (h_p - h_q)\,\mathcal{E}(X) - \beta\, \mathcal{T}(X)^{\alpha} \Big)\Big\}\right] + \lim_{n\to\infty}\big(\psi^{h_q,\beta,\alpha}_n - \psi^{h_p,0}_n\big) \\ &= -\inf_{f\in\mathcal{W}_t}\Big[ I_p(f) + (h_q - h_p)\,\mathcal{E}(f) + \beta\, \mathcal{T}(f)^{\alpha} \Big] + \lim_{n\to\infty}\big(\psi^{h_q,\beta,\alpha}_n - \psi^{h_p,0}_n\big). \end{aligned} \tag{3.7}$$
By (2.16), $\lim_{n\to\infty} \psi^{h_q,\beta,\alpha}_n = V(u^*)$, where $u^* = \arg\sup_u [V(u)]$ and
$$V(u) := \beta \Big(\frac{u^3}{6}\Big)^{\alpha} - \inf_{f\in\partial\mathcal{W}_u}[I_q(f)] - \tfrac{1}{2}\log(1-q).$$
Also, $\lim_{n\to\infty} \psi^{h_p,0}_n = -\frac{1}{2}\log(1-p)$. Hence,
$$\begin{aligned} \lim_{n\to\infty} \frac{1}{n^2} \log E_Q[\hat q_n^2] &= -\inf_{f\in\mathcal{W}_t}\Big[ I_p(f) + (h_q - h_p)\,\mathcal{E}(f) + \beta\, \mathcal{T}(f)^{\alpha} \Big] + \beta \Big(\frac{(u^*)^3}{6}\Big)^{\alpha} - \inf_{f\in\partial\mathcal{W}_{u^*}}[I_q(f)] - \tfrac{1}{2}\log\frac{1-q}{1-p} \\ &\le -\inf_{f\in\mathcal{W}_t}\Big[ I_p(f) + (h_q - h_p)\,\mathcal{E}(f) \Big] + \beta\left( \Big(\frac{(u^*)^3}{6}\Big)^{\alpha} - \Big(\frac{t^3}{6}\Big)^{\alpha} \right) - \inf_{f\in\partial\mathcal{W}_{u^*}}[I_q(f)] - \tfrac{1}{2}\log\frac{1-q}{1-p}. \end{aligned} \tag{3.8}$$
The last inequality follows from the fact that $\mathcal{T}(f) \ge t^3/6$ for all $f \in \mathcal{W}_t$.
Since
$$I_q(f)+\frac12\log\frac{1-q}{1-p}=I_p(f)+\frac{h_p-h_q}{2}E(f),$$
we conclude that
$$\lim_{n\to\infty}\frac{1}{n^2}\log\mathbb E_Q[\hat q_n^2]\le-\inf_{f\in\mathcal W_t}\Big[I_p(f)+\frac{h_q-h_p}{2}E(f)\Big]-\inf_{f\in\partial\mathcal W_{u^*}}\Big[I_p(f)-\frac{h_q-h_p}{2}E(f)\Big]+\frac{\beta}{6}\big((u^*)^{3\alpha}-t^{3\alpha}\big). \tag{3.9}$$
The estimate (3.9) holds for any $(h_q,\beta,\alpha)\in\mathbb R\times[0,+\infty)\times[0,1]$. For the triangle tilt $Q^{h_p,\beta,\alpha}_n$ satisfying (3.4), we have $u^*=t$ and $q=p$. Therefore,
$$\lim_{n\to\infty}\frac{1}{n^2}\log\mathbb E_Q[\hat q_n^2]\le-\inf_{f\in\mathcal W_t}[I_p(f)]-\inf_{f\in\partial\mathcal W_t}[I_p(f)]=-2\inf_{f\in\mathcal W_t}[I_p(f)].$$
Combined with the lower bound $\mathbb E_Q[\hat q_n^2]\ge\mu_n^2$ (Jensen's inequality) for the asymptotic second moment, we conclude that the triangle tilt $Q^{h_p,\beta,\alpha}_n$ yields an asymptotically optimal importance sampling estimator if (3.4) holds. ∎

We now prove the necessary condition for optimality, Prop 3.1.

Proof of Proposition 3.1. We recall from (2.18) that $\lim_{n\to\infty}\psi^{h,\beta,\alpha}_n=\sup_{f\in\mathcal W}[H(f)-I(f)]$. Due to Theorem 2.4, there exists $f^*\in\mathcal W$ such that $f^*$ minimizes the LDP rate function $\inf_{f\in\mathcal W_t}[I_p(f)]$, and $f^*$ does not maximize $\sup_{f\in\mathcal W}[H(f)-I(f)]$. From (3.7),
$$\begin{aligned}\lim_{n\to\infty}\frac{1}{n^2}\log\mathbb E_{Q_n}[\hat q_n^2]&=-\inf_{f\in\mathcal W_t}\Big[I_p(f)+\frac{h-h_p}{2}E(f)+\frac{\beta}{6}T(f)^\alpha\Big]+\lim_{n\to\infty}\psi^{h,\beta,\alpha}_n+\frac12\log(1-p)\\&=-\inf_{f\in\mathcal W_t}\Big[I_p(f)+\frac{h-h_p}{2}E(f)+\frac{\beta}{6}T(f)^\alpha\Big]+\sup_{f\in\mathcal W}[H(f)-I(f)]+\frac12\log(1-p)\\&>-\Big[I_p(f^*)+\frac{h-h_p}{2}E(f^*)+\frac{\beta}{6}T(f^*)^\alpha\Big]+\frac{h}{2}E(f^*)+\frac{\beta}{6}T(f^*)^\alpha-I(f^*)+\frac12\log(1-p)\\&=-2I_p(f^*)=-2\inf_{f\in\mathcal W_t}[I_p(f)],\end{aligned}$$
where the last line uses $I(f)=I_p(f)+\frac{h_p}{2}E(f)+\frac12\log(1-p)$. Hence the importance sampling estimator is not asymptotically optimal. ∎

Proof of Proposition 3.5. For the edge tilt with $h=h_t$, $\beta=0$, we have from (3.8),
$$\lim_{n\to\infty}\frac{1}{n^2}\log\mathbb E_Q[\hat q_n^2]=-\inf_{f\in\mathcal W_t}\Big[I_p(f)+\frac{h_t-h_p}{2}E(f)\Big]-I_p(t)+\frac{h_t-h_p}{2}t. \tag{3.10}$$
Because $(p,t)$ is in the replica symmetric phase, the term $I_p(f)$ is minimized over $\mathcal W_t$ by the constant function
$$f_t(x,y)\equiv t=\arg\inf_{f\in\mathcal W_t}[I_p(f)].$$
On the other hand, the term $E(f)$ is minimized over $\mathcal W_t$ by the clique function
$$g_t(x,y)=\mathbb 1_{[0,t]^2}(x,y)=\arg\inf_{f\in\mathcal W_t}[E(f)]. \tag{3.11}$$
This $g_t$ represents a graph with a large clique, i.e. a complete subgraph on a fraction t of the vertices. Let $\mathcal V(f)=I_p(f)+\frac{h_t-h_p}{2}E(f)$ be the function to be minimized in (3.10). We have
$$\mathcal V(f_t)=I_p(t)+\frac{h_t-h_p}{2}t,\qquad\mathcal V(g_t)=t^2I_p(1)+(1-t^2)I_p(0)+\frac{h_t-h_p}{2}t^2.$$
Thus, if we can show that $\mathcal V(g_t)<\mathcal V(f_t)$, it will follow from (3.10) that
$$\lim_{n\to\infty}\frac{1}{n^2}\log\mathbb E_Q[\hat q_n^2]\ge-\mathcal V(g_t)-I_p(t)+\frac{h_t-h_p}{2}t>-2I_p(t)=-2\inf_{f\in\mathcal W_t}I_p(f). \tag{3.12}$$
We claim that for $p<\frac{e^{-1/2}}{1+e^{-1/2}}$ and t sufficiently close to 1, we have $\mathcal V(g_t)<\mathcal V(f_t)$. Indeed, let
$$G(t):=\mathcal V(g_t)-\mathcal V(f_t)=t^2I_p(1)+(1-t^2)I_p(0)-I_p(t)+\frac{h_t-h_p}{2}(t^2-t). \tag{3.13}$$
Observe that $G(1)=0$ and
$$G'(1)=2I_p(1)-2I_p(0)-\frac12=-\log\frac{p}{1-p}-\frac12,$$
where $G'(1)$ denotes the limit of $G'(t)$ as $t\uparrow1$ (the term $(h_t-h_p)(t-1)$ appearing in $G'(t)$ vanishes in this limit). So $G'(1)>0$ if $h_p<-1/2$, i.e., if $p<\frac{e^{-1/2}}{1+e^{-1/2}}$. Hence, for t sufficiently close to 1, $G(t)<0$, i.e. $\mathcal V(g_t)<\mathcal V(f_t)$, and we conclude that (3.12) holds with strict inequality. Therefore, the importance sampling scheme associated with the edge tilt $Q^{h_t,0}_n$ cannot be asymptotically optimal. ∎

4. Characterizing regimes for the triangle tilt

Proposition 3.3 describes the necessary and sufficient condition (3.4) on the parameters $(\beta,\alpha)$ of a triangle tilt that will produce an optimal importance sampling scheme. Given $(p,t)$, do these optimal tilt parameters $(\beta,\alpha)$ exist, and how can they be found? In this section, we describe a pseudo-explicit procedure for determining the optimal tilt parameters given $(p,t)$.

An explicit determination of the optimal tilt parameters can be made when $(p,t)$ belongs to the replica symmetric phase.

Proposition 4.1.
If $(p,t)$ belongs to the replica symmetric phase, then there exists some $\alpha\in[2/3,1]$ for which the triangle tilt with parameters $(h_p,\beta,\alpha)$ produces an optimal scheme, where β satisfies
$$\beta=\frac{h_t-h_p}{\alpha t^{3\alpha-1}}. \tag{4.1}$$
It will turn out that if there exist some $\alpha\in[0,1]$ and some β for which the triangle tilt produces an optimal scheme, then for any $\alpha'\in[0,\alpha]$ and an appropriate β' depending on α', the triangle tilt with parameters $(h_p,\beta',\alpha')$ also produces an optimal scheme (see Lemma A.4). Thus, in Prop 4.1, we can always take $\alpha=2/3$. […] when $(p,t)$ belongs to the replica breaking phase. Our next result, Prop 4.3, states a more general characterization of the optimal tilt parameters that applies to both the replica symmetric and the replica breaking phases. To state the result, we introduce the minorant condition.

We shall say that $(p,t)$ satisfies the minorant condition with parameter α if the point $(t^{3\alpha},\phi_p(t))$ lies on the convex minorant of the function $x\mapsto\phi_p(x^{1/(3\alpha)})$. In this case, the subdifferentials of the convex minorant of $x\mapsto\phi_p(x^{1/(3\alpha)})$ at $x=t^{3\alpha}$ always exist and are positive. Recall that the subdifferentials of a convex function f at a point x are the slopes of the lines through $(x,f(x))$ that lie below the graph of f. The set of subdifferentials of a convex function at an interior point of its domain is non-empty; if the function is differentiable at x, then this set contains exactly one point, the derivative $f'(x)$.

The minorant condition is not vacuous, as shown in the next lemma.

Lemma 4.2. The set of parameters $(p,t)$ that satisfy the minorant condition with some α includes the replica symmetric phase as well as a non-empty subset of the replica breaking phase.

Proposition 4.3. Suppose $(p,t)$ satisfies the minorant condition for some $\alpha\in[0,1]$.
Then the triangle tilt with the parameters $(h_p,\beta,\alpha)$ produces an optimal scheme, where β is such that $\beta/6$ is a subdifferential of the convex minorant of $x\mapsto\phi_p(x^{1/(3\alpha)})$ at $x=t^{3\alpha}$. Moreover, if $\phi_p(u)$ is differentiable at t, then
$$\beta=\frac{2\phi_p'(t)}{\alpha t^{3\alpha-1}}. \tag{4.2}$$
Combining Lemma 4.2 and Prop 4.3, there exist $(p,t)$ in the replica breaking phase for which a triangle tilt producing an optimal scheme exists. In particular, in the replica symmetric phase, since $\phi_p(t)=I_p(t)$ is differentiable at t with $2I_p'(t)=h_t-h_p$, Prop 4.3 reduces to Prop 4.1. Thus, Prop 4.1 gives an explicit construction of the tilt parameters when $(p,t)$ belongs to the replica symmetric phase. In the replica breaking phase, we may need to resort to numerical strategies to find the tilt parameters. Nonetheless, we emphasize that it is possible in principle to construct an optimal importance sampling estimator in the replica breaking phase even if the limiting behaviour of the conditioned graph is not known exactly.

4.1. Proofs of results. Notice that if t attains the supremum in (3.4), we may rewrite the condition as
$$t=\arg\sup_u\Big[\frac{\beta}{6}u^{3\alpha}-\phi_p(u)\Big].$$
Together with Prop 3.3, the next lemma immediately implies Prop 4.3.

Lemma 4.4. Suppose $(p,t)$ satisfies the minorant condition for some $\alpha>0$. Let β be such that $\beta/6$ is a subdifferential of the convex minorant of $x\mapsto\phi_p(x^{1/(3\alpha)})$ at $x=t^{3\alpha}$. Then $\sup_u[\frac{\beta}{6}u^{3\alpha}-\phi_p(u)]$ is attained at t. Moreover, if $\phi_p(u)$ is differentiable at t, then β is given by (4.2).

Proof. The proof follows a technique similar to [17]. Using the change of variables $u=x^{1/(3\alpha)}$, the variational form $\sup_u[\frac{\beta}{6}u^{3\alpha}-\phi_p(u)]$ can be rewritten as
$$\sup_x\Big[\frac{\beta}{6}x-\phi_p(x^{1/(3\alpha)})\Big].$$
Let $\hat\phi_p(x)$ denote the convex minorant of $x\mapsto\phi_p(x^{1/(3\alpha)})$. The assumption that $\beta/6$ is a subdifferential of $\hat\phi_p$ at $x=t^{3\alpha}$ implies that $\sup_x[\frac{\beta}{6}x-\hat\phi_p(x)]$ is attained at $t^{3\alpha}$.
By the hypothesis of the lemma, $(p,t)$ satisfies the minorant condition for α, so the point $(t^{3\alpha},\phi_p(t))$ lies on the graph of $\hat\phi_p$. Thus $\hat\phi_p(t^{3\alpha})=\phi_p(t)$, and since $\hat\phi_p(x)\le\phi_p(x^{1/(3\alpha)})$, the supremum $\sup_x[\frac{\beta}{6}x-\phi_p(x^{1/(3\alpha)})]$ is also attained at $t^{3\alpha}$. It follows that $\sup_u[\frac{\beta}{6}u^{3\alpha}-\phi_p(u)]$ is attained at t. (However, this maximizer may not be unique: if the supporting line with slope $\beta/6$ touches $\hat\phi_p$ at another point $r^{3\alpha}$, then r is also a maximizer.)

To prove the last part of the lemma: if $\phi_p(u)$ is differentiable at t, then the subdifferential is simply the derivative, and
$$0=\frac{\partial}{\partial x}\Big|_{x=t^{3\alpha}}\Big[\frac{\beta}{6}x-\phi_p(x^{1/(3\alpha)})\Big]=\frac{\beta}{6}-\phi_p'(t)\frac{t^{1-3\alpha}}{3\alpha}$$
implies that $\beta=\frac{2\phi_p'(t)}{\alpha t^{3\alpha-1}}$. ∎

Proof of Lemma 4.2. Recalling Definition 2.2 of the replica symmetric phase, it follows from the arguments in [17] and [8, Theorem 4.3] that any $(p,t)$ in the replica symmetric phase satisfies the minorant condition for some $\alpha\in[2/3,1]$. It remains to exhibit $(p,t)$ in the replica breaking phase that satisfy the minorant condition for some α. Notice from Appendix A and Figure A.1 that there exists a critical value $p_{\mathrm{crit}}$ such that when $p>p_{\mathrm{crit}}$, $(p,t)$ is replica symmetric for all $t\in[p,1)$, while for each $p<p_{\mathrm{crit}}$ there exists an interval $[\underline r_p,\overline r_p]\subset(p,1)$ such that $(p,t)$ is replica breaking if $t\in[\underline r_p,\overline r_p]$ and replica symmetric for all other values of t. Consider $\alpha=1/3$, so that $x\mapsto\phi_p(x^{1/(3\alpha)})=\phi_p(x)$. Since $\phi_p(t)<I_p(t)$ if $t\in[\underline r_p,\overline r_p]$ and $\phi_p(t)=I_p(t)$ for other values of t, and since $I_p(u)$ is convex, the convex minorant of $\phi_p(x)$ must touch $\phi_p$ at at least one $t_p\in[\underline r_p,\overline r_p]$.
So $(p,t_p)$ is replica breaking and satisfies the minorant condition. ∎

5. Numerical simulations using importance sampling

We implement the importance sampling schemes to show the optimality properties of the Gibbs measure tilts in practice. Although we have thus far considered importance sampling schemes that draw i.i.d. samples from the tilted measure Q, in practice it is very difficult to sample independent copies of exponential random graphs. This is because of the dependencies between the edges of an exponential random graph, unlike an Erdős-Rényi graph, whose edges are independent. Thus, to implement the importance sampling scheme, we turn to a Markov chain Monte Carlo method known as the Glauber dynamics to generate samples from the exponential random graph. The Glauber dynamics is a Markov chain whose stationary distribution is the Gibbs measure $Q^{h,\beta,\alpha}_n$. The samples $\tilde X_k$ from the Glauber dynamics are used to form the importance sampling estimator $\tilde M_K$ in (3.2). The variance of $\tilde M_K$ also depends on the correlation between successive samples. However, in this paper, rather than focus on the effect of this correlation on the variance of $\tilde M_K$, we investigate and compare the optimality of the importance sampling schemes, and show that importance sampling is a viable method for moderate values of n.

Glauber dynamics.
For the exponential random graph $G^{h,\beta,\alpha}_n$, the Glauber dynamics proceeds as follows. Suppose we have a graph $X=(X_{ij})_{i<j}$ […]

Regarding the mixing time of the Glauber dynamics, [1] showed for the case $\alpha=1$ that if the variational form for the free energy of the Gibbs measure $Q^{h,\beta,1}_n$,
$$\sup_u\Big[\frac{h}{2}u+\frac{\beta}{6}u^3-I(u)\Big], \tag{5.1}$$
has a unique local maximum, then the mixing time of the Glauber dynamics is $O(n^2\log n)$; otherwise, the variational form has multiple local maxima, and the mixing time is exponential in n. Clearly, the importance sampling tilt must be chosen so that the mixing time of the Glauber dynamics is $O(n^2\log n)$.

5.1. Example 1. The importance sampling scheme was performed for $p=0.35$, $t=0.4$. Samples were drawn from the edge tilt with parameters $h=h_t$, $\beta=0$, as well as from the triangle tilt with parameters $h=h_p$, $\alpha=1$ and $\beta=\frac{h_t-h_p}{t^2}$ as in (4.1). The mixing time for both tilts is $O(n^2\log n)$. In addition to the edge and triangle tilts, we also consider a family of "hybrid" tilts with parameters $h=h_q$ for $q>p$, $\alpha=1$ and
$$\beta=\beta_q=\frac{h_t-h_q}{t^2}. \tag{5.2}$$
With these parameters, the variational form for the free energy of the corresponding Gibbs measure is uniquely maximized at t. Thus, the hybrid tilt satisfies the necessary condition in Proposition 3.1 for optimality (i.e., (3.3) does not hold). By Theorem 2.4, the corresponding exponential random graph $G^{h_q,\beta_q,1}_n$ is indistinguishable from the Erdős-Rényi graph $G_{n,t}$ and has a mean triangle density of $t^3$, in the sense of (2.22).

In the simulations, we used the hybrid tilts with $h=h_q$ for $q=0.35,0.36,\ldots,0.4$, and $\beta_q$ satisfying (5.2). With this notation, $q=p=0.35$ corresponds to the triangle tilt, while $q=t=0.4$ corresponds to the edge tilt. Table 5.1 shows the estimates of $\mu_n:=P(G_{n,p}\in\mathcal W_t)$ using the tilts $Q^{h_q,\beta_q,1}_n$. Also shown is the estimate of the log probability $\frac{1}{n^2}\log P(G_{n,p}\in\mathcal W_t)$, which can be seen to approach the LDP rate
$$\lim_{n\to\infty}\frac{1}{n^2}\log P(G_{n,p}\in\mathcal W_t)=-I_p(t)\approx-0.0027$$
as n is increased.
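The parameters of Example 1 can be reproduced with a few lines of arithmetic. The sketch below is ours, not the authors' code, and uses only the Python standard library. It evaluates $h_q=\log\frac{q}{1-q}$, the hybrid coefficient $\beta_q=(h_t-h_q)/t^2$ from (5.2), and the constant-graphon rate $I_p(t)=\frac12[t\log\frac tp+(1-t)\log\frac{1-t}{1-p}]$, which gives the first-moment rate $-I_p(t)$ and the optimal second-moment rate $-2I_p(t)$. It also checks that each hybrid tilt has a stationary point of $\frac{h_q}{2}u+\frac{\beta_q}{6}u^3-I(u)$ at $u=t$. The helper names are hypothetical, and the form $I(u)=\frac12[u\log u+(1-u)\log(1-u)]$ is an assumption consistent with (5.1).

```python
import math

def logit(q: float) -> float:
    """h_q = log(q / (1 - q))."""
    return math.log(q / (1.0 - q))

def rate_I(p: float, u: float) -> float:
    """I_p(u) = (1/2)[u log(u/p) + (1-u) log((1-u)/(1-p))] for a constant graphon u."""
    return 0.5 * (u * math.log(u / p) + (1.0 - u) * math.log((1.0 - u) / (1.0 - p)))

def hybrid_beta(q: float, t: float) -> float:
    """beta_q = (h_t - h_q) / t^2 from (5.2); q = p is the triangle tilt, q = t the edge tilt."""
    return (logit(t) - logit(q)) / t ** 2

def free_energy_slope(u: float, q: float, t: float) -> float:
    """Derivative in u of (h_q/2) u + (beta_q/6) u^3 - I(u), with I(u) = (1/2)[u log u + (1-u) log(1-u)]."""
    return 0.5 * (logit(q) + hybrid_beta(q, t) * u ** 2 - logit(u))

p, t = 0.35, 0.4
print(f"first-moment rate  -I_p(t)  = {-rate_I(p, t):.6f}")
print(f"second-moment rate -2I_p(t) = {-2 * rate_I(p, t):.6f}")
for q in [0.35, 0.36, 0.37, 0.38, 0.39, 0.40]:
    # each hybrid tilt should have a stationary point of its free-energy functional at u = t
    print(f"q={q:.2f}  beta_q={hybrid_beta(q, t):.4f}  slope at t={free_energy_slope(t, q, t):.1e}")
```

The printed slopes should be zero up to rounding, which is just the stationarity condition that defines (5.2).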
Table 5.2 shows the estimated values of the variance of the estimator, $\mathrm{Var}_{Q_n}(\hat q_n)$, where $\hat q_n=\mathbb 1_{\mathcal W_t}\frac{dP_{n,p}}{dQ^{h,\beta}_n}$, as well as the log second moment $\frac{1}{n^2}\log\mathbb E_{Q_n}[\hat q_n^2]$. The variance of the estimator for all the hybrid and edge tilts appears to be comparable to that of the optimal triangle tilt, and the log second moment likewise appears to converge towards $-2I_p(t)\approx-0.0054$ for $p=0.35$, $t=0.4$. For $n=16,32,64$, the number of MCMC samples used was 5 × n log n, while for $n=96$ the number of MCMC samples used was 10 n log n.

Both the random graphs corresponding to the triangle and edge tilts are expected by (2.22), (2.23) to have triangle density $t^3$ and edge density t, on average. However, the triangle and edge tilts differ in the way they produce events in $\{T(f)\ge t^3\}$: the edge tilt tends to produce more successful samples in $\{T(f)\ge t^3\}$ with higher edge density, compared to the triangle tilt (see Figure 5.1). This is attributable to the fact that the edge tilt penalizes successful samples that contain the desired triangle density but a lower than expected edge density.

n \ q    0.35          0.36          0.37          0.38          0.39          0.40
16       0.12475       0.1247        0.12521       0.12425       0.12441       0.12435
         (-0.008131)   (-0.008132)   (-0.008116)   (-0.008146)   (-0.008141)   (-0.008143)
32       0.01107       0.011056      0.011116      0.010941      0.010972      0.010729
         (-0.004398)   (-0.004399)   (-0.004394)   (-0.004409)   (-0.004407)   (-0.004429)
64       2.1919e-06    2.0283e-06    2.6073e-06    5.3287e-07    1.3822e-06    3.5772e-06
         (-0.003181)   (-0.003200)   (-0.003139)   (-0.003527)   (-0.003294)   (-0.003062)
96       1.1036e-11    1.6868e-11    2.0805e-11    4.4039e-11    2.6124e-11    4.497e-11
         (-0.002738)   (-0.002692)   (-0.002669)   (-0.002587)   (-0.002644)   (-0.002585)

Table 5.1. Comparison of the estimates for the probability $\mu_n$ (top number) for varying tilts with parameters $(h_q,\beta_q,1)$, where $\beta_q$ is defined in (5.2). Also shown is the log probability $\frac{1}{n^2}\log P(G_{n,p}\in\mathcal W_t)$ (lower number, in parentheses).

Table 5.2.
Comparison of the estimates for the variance $\mathrm{Var}_Q(\hat q_n)$ (top number) for varying tilts with parameters $(h_q,\beta_q,1)$, where $\beta_q$ is defined in (5.2). Also shown is the log second moment $\frac{1}{n^2}\log\mathbb E_Q[\hat q_n^2]$ (lower number).

5.2. Example 2: Using α ≠ 1 or conditioned Gibbs measures. The importance sampling scheme was next performed for $p=0.2$, $t=0.3$, in the replica symmetric phase. We again use the Glauber dynamics to draw samples from the edge tilt with parameters $h=h_t$, $\beta=0$; the mixing time here is $O(n^2\log n)$. In contrast to the previous example, for the triangle tilt with $\alpha=1$ the variational form (5.1) has two local maxima, resulting in a mixing time exponential in n. Instead, we use a triangle tilt with $\alpha=2/3$. Since $(p,t)$ is in the replica symmetric phase and $\phi_p$ is differentiable at t, Proposition 4.3 implies that for $\alpha=2/3$ we should choose $\beta=\frac{h_t-h_p}{(2/3)t}$. The simulation results for the importance sampling scheme using the triangle tilt with $\alpha=2/3$ […] $n=32$. We see that the triangle tilt with $\alpha=2/3$ […] ($p=0.2$, $t=0.3$, […] $n=64$).

Alternatively, we also consider a modification of the triangle tilt with $\alpha=1$ and $\beta_p=\frac{h_t-h_p}{t^2}$ as in (4.1). This modification draws samples from the Gibbs measure $Q^{h_p,\beta_p,1}_n$ conditioned on an event of bounded edge and triangle densities.

[Figure 5.1: histograms of the edge count, with panels n = 25 ($\binom{n}{2}t=120$), n = 50 ($\binom{n}{2}t=490$), n = 100 ($\binom{n}{2}t=1980$) and n = 200 ($\binom{n}{2}t=7960$); legend: $h=h_p$, $\beta=(h_t-h_p)/t^2$; $h=h_t$, $\beta=0$; $h=h_p$, $\beta=0$; vertical line at $\binom{n}{2}t$.]

Figure 5.1. Histogram of the number of edges in the samples obtained using the importance sampling scheme based on the triangle tilt (solid red line) and edge tilts (dashed blue line), conditioned on the rare event $\{T(X)\ge\binom{n}{3}t^3\}$.
The dotted green line in the top left panel shows the histogram for direct Monte Carlo sampling. The vertical line indicates the expected number of edges of the graph $G_{n,p}$ conditioned on the rare event.

In this modification, the Gibbs measure is conditioned on the event that the edge and triangle densities do not exceed a given threshold r. To be specific, let
$$A_r=\{f\in\mathcal W:T(f)\le r^3\text{ and }E(f)\le r\} \tag{5.3}$$
for some $r>t$, and let the conditioned triangle tilt be defined by the Gibbs measure conditioned on $A_r$,
$$\tilde Q^{h_p,\beta_p,1}_{n,A_r}(X)\propto\begin{cases}e^{n^2\left(\frac{h_p}{2}E(X)+\frac{\beta_p}{6}T(X)\right)}, & X\in A_r,\\ 0, & X\notin A_r.\end{cases} \tag{5.4}$$
In the numerical simulations, the threshold is chosen to be $r\approx0.43>t$, which is a local minimum of the variational form (5.1). The motivation for this choice of threshold r is discussed in Appendix B. The results for the conditioned triangle tilt are shown in Tables 5.3 and 5.4, which indicate that both triangle tilts perform comparably and both outperform the edge tilt.

Appendix A. Characterizing the phase diagrams

In this appendix we present a framework to define subregimes of the $(p,t)$ phase space, which extends the setup of [17].

Table 5.3. Estimates for the probability $\mu_n$. In parentheses is the estimate of the log probability $\frac{1}{n^2}\log\mu_n$.

Table 5.4. Estimates for the variance $\mathrm{Var}_{Q_n}(\hat q_n)$. In parentheses is the estimate of the log second moment $\frac{1}{n^2}\log\mathbb E_{Q_n}[\hat q_n^2]$.

Recall that $(p,t)$ satisfies the minorant condition with parameter α if the point $(t^{3\alpha},\phi_p(t))$ lies on the convex minorant of the function $x\mapsto\phi_p(x^{1/(3\alpha)})$. Using the minorant condition and Lemma 4.4, we define a parameterized family of subregimes of the $(p,t)$ phase space.

Definition A.1.
Let $\alpha\in[0,1]$. We define $S_\alpha$ to be the set of parameters $(p,t)$ for which the minorant condition holds with α. Further, we define $S^\circ_\alpha\subset S_\alpha$ to be the regime where, for a subdifferential $\beta/6$ of the convex minorant of $x\mapsto\phi_p(x^{1/(3\alpha)})$ at $x=t^{3\alpha}$, the variational form
$$\sup_u\Big[\frac{\beta}{6}u^{3\alpha}-\phi_p(u)\Big]$$
is uniquely maximized at t. ∎

Using Definition A.1, we can characterize the replica symmetric phase for conditioned Erdős-Rényi graphs (Definition 2.2) in terms of $S^\circ_\alpha$. To this effect, the next lemma follows directly from Definition 2.2 using the arguments in [17] and [8, Theorem 4.3].

Lemma A.2. $S^\circ_{2/3}$ is exactly the replica symmetric phase.

We highlight that in Definition A.1 there exists, by Lemma 4.4, a subdifferential $\beta/6$ such that the variational form is maximized at t, but t may not be the unique maximizer, whereas the definition of $S^\circ_\alpha$ requires that t be the unique maximizer. The uniqueness requirement is convenient for making a clean connection with the replica symmetric phase for conditioned Erdős-Rényi graphs.

[Figure A.1: phase diagram in the (p, t) plane.]

Figure A.1. $S^\circ_1$ is the dark gray region to the right of the solid curve (not including the solid curve). $S^\circ_{2/3}$, the replica symmetric phase, is the light gray region to the right of the dashed curve (not including the dashed curve), together with the dark gray region. The diagonal red dotted line shows $G(t)=0$, where G is the function in (3.13). For parameters above this line, the edge tilt does not give an optimal importance sampling estimator. The straight line is $t=p$.

As seen from Figure A.1, the replica symmetric phase $S^\circ_{2/3}$ is the light and dark gray region to the right of the dashed curve, excluding the dashed curve; $S_{2/3}$ includes the dashed curve.

Using Definition A.1, we can rephrase Prop 4.3 and Lemma 4.2 as follows.

Corollary A.3.
If $(p,t)\in S_\alpha$ for some α, then there exists a triangle tilt that produces an optimal scheme. Moreover, the regime $\bigcup_{\alpha>0}S_\alpha$ where an optimal triangle tilt exists is strictly larger than the replica symmetric phase, and contains a nontrivial subset of the replica breaking phase.

Finally, we show in the next lemma that the union $\bigcup_{\alpha>0}S_\alpha$ is an increasing union as $\alpha\to0$. A consequence of this lemma is that if $(p,t)\in S_\alpha$ for some $\alpha\in[0,1]$, then for any $\alpha'\in[0,\alpha]$ the triangle tilt with parameters $(h_p,\beta',\alpha')$ also produces an optimal scheme, where β' is appropriately chosen depending on α'.

Lemma A.4. Let $S_\alpha$ be defined as in Definition A.1. Then $S_{\alpha'}\subset S_\alpha$ for $0<\alpha<\alpha'$.

Proof. Denote $\phi^\alpha_p(x)=\phi_p(x^{1/(3\alpha)})$ and let $\hat\phi^\alpha_p$ be the convex minorant of $\phi^\alpha_p$. Then
$$\phi^{\alpha'}_p(x)=\phi_p(x^{1/(3\alpha')})=\phi_p\big((x^{\alpha/\alpha'})^{1/(3\alpha)}\big)=\phi^\alpha_p(x^{\alpha/\alpha'}).$$
Define $\eta(x)=\hat\phi^{\alpha'}_p(x^{\alpha'/\alpha})$, and let K be the set of x for which $\eta(x)=\phi^{\alpha'}_p(x^{\alpha'/\alpha})$. Then
$$\eta(x)\le\phi^{\alpha'}_p(x^{\alpha'/\alpha})=\phi^\alpha_p(x),$$
with equality if and only if $x\in K$. (The interpretation of K is that $t^{3\alpha}\in K$ if and only if $(p,t)$ satisfies the minorant condition with α'.) Since $\alpha'/\alpha>1$, the function η is convex and is less than $\phi^\alpha_p$; hence it must be less than the convex minorant, $\eta(x)\le\hat\phi^\alpha_p(x)$. For $x\in K$,
$$\phi^\alpha_p(x)=\eta(x)\le\hat\phi^\alpha_p(x)\le\phi^\alpha_p(x),$$
so $(x,\phi^\alpha_p(x))$ lies on the graph of $\hat\phi^\alpha_p$ for all $x\in K$. Hence, if $(p,t)$ satisfies the minorant condition with α', then $t^{3\alpha}\in K$ and $(t^{3\alpha},\phi_p(t))$ lies on $\hat\phi^\alpha_p$, implying that $(p,t)$ satisfies the minorant condition with α.

Now let $(p,t)$ satisfy the minorant condition with α', and suppose that β' is such that $\beta'/6$ is a subdifferential of $\hat\phi^{\alpha'}_p$ at the point $t^{3\alpha'}$ and $\sup_u[\frac{\beta'}{6}u^{3\alpha'}-\phi_p(u)]$ is uniquely maximized at t.
According to the arguments in the proof of Lemma 4.4, this means that the supporting line
$$\ell_{\alpha'}(x):=\frac{\beta'}{6}(x-t^{3\alpha'})+\phi_p(t)$$
lies below $\phi^{\alpha'}_p$ and touches it at exactly one point, $t^{3\alpha'}$. Let $\nu(x)=\ell_{\alpha'}(x^{\alpha'/\alpha})$. We have $\nu(t^{3\alpha})=\phi^{\alpha'}_p(t^{3\alpha'})=\phi^\alpha_p(t^{3\alpha})$ and
$$\nu'(t^{3\alpha})=\frac{\beta'}{6}\,\frac{\alpha'}{\alpha}\,t^{3(\alpha'-\alpha)}.$$
Since $\alpha'/\alpha>1$, ν is convex, and the line
$$\ell_\alpha(x):=\frac{\beta}{6}(x-t^{3\alpha})+\phi_p(t),\qquad\text{where }\frac{\beta}{6}=\nu'(t^{3\alpha}),$$
is tangent to ν at the point $t^{3\alpha}$ and lies below ν. For $x\neq t^{3\alpha}$,
$$\nu(x)=\ell_{\alpha'}(x^{\alpha'/\alpha})<\phi^{\alpha'}_p(x^{\alpha'/\alpha})=\phi^\alpha_p(x),$$
so $\ell_\alpha$ lies below $\phi^\alpha_p$ and touches it at exactly one point, $t^{3\alpha}$. Moreover, since ν is a convex function less than $\phi^\alpha_p$, we have $\hat\phi^\alpha_p\ge\nu\ge\ell_\alpha$. So $\beta/6$ is a subdifferential of $\hat\phi^\alpha_p$ and $\sup_u[\frac{\beta}{6}u^{3\alpha}-\phi_p(u)]$ is uniquely maximized at t. The proof is complete. ∎

Remark A.5. In Prop 3.5, the critical value $\tilde p=\frac{e^{-1/2}}{1+e^{-1/2}}\approx0.378$ is the value for which $h_{\tilde p}=-1/2$. We see from Figure A.1 that the conditions of the proposition are attained if $p<\tilde p$ and $(p,t)$ lies in the region above the red dotted line intersected with the replica symmetric phase. In this region, we have $G(t)<0$, where G is defined in (3.13). The edge tilt $Q^{h_t,0}_n$ does not produce an optimal estimator for the parameters in this region.

Appendix B. Sampling from a conditioned Gibbs measure

For exponential random graphs with $\alpha=1$, the Glauber dynamics is known to have a mixing time exponential in n when the variational form (5.1) has multiple local maxima [1]. When considering a triangle tilt whose variational form has multiple local maxima, the slow mixing is one reason to rule it out as an importance sampling tilt. Another reason to avoid this tilt is that the global maximum of the variational form may not occur at t, even though it has a local maximum at t by definition.
Due to the second reason, such a tilt may produce a large number of samples with an over- or under-abundance of triangles, where the triangle density is determined by the global maximum of the variational form rather than by the local maximum at t. This leads to a poor estimator that is not optimal and has large variance.

We propose to circumvent these problems by modifying the triangle tilt so that the sampled graphs are restricted to the subregion of the state space that has just the 'right' number of triangles.

Conditioned Gibbs measure. Given a set $A\subset\mathcal W$, the exponential random graph conditioned on A, denoted $G^{h,\beta,\alpha}_{n,A}$, has the conditional Gibbs measure
$$\tilde Q^{h,\beta,\alpha}_{n,A}(X)\propto\begin{cases}e^{n^2H(X)}, & X\in A,\\ 0, & X\notin A,\end{cases}$$
where the Hamiltonian H(X) is defined in (2.9). The asymptotic behaviour of the free energy of the conditional Gibbs measure,
$$\tilde\psi^{h,\beta,\alpha}_{n,A}=\frac{1}{n^2}\log\sum_{X\in A}e^{n^2H(X)},$$
is described in the following proposition, which follows from a direct modification of [5, Theorems 3.1, 3.2].

Proposition B.1. For any bounded continuous mapping $H:\mathcal W\to\mathbb R$ and any closed subset $A\subset\mathcal W$, let $\tilde\psi_{n,A}=\tilde\psi^{h,\beta,\alpha}_{n,A}$ as above. Then
$$\lim_{n\to\infty}\tilde\psi_{n,A}=\sup_{f\in A}[H(f)-I(f)]. \tag{B.1}$$
Moreover, if the variational form is maximized on the set $\tilde{\mathcal F}\subset A$, then the corresponding conditioned exponential random graph is asymptotically indistinguishable from $\tilde{\mathcal F}$.

As a consequence of Proposition B.1, an argument akin to the proof of Theorem 2.4 implies that for $(p,t)$ in the replica symmetric phase, the Gibbs measure (5.4) conditioned on $A_r$ has free energy given by
$$\lim_{n\to\infty}\tilde\psi_{n,A_r}=\sup_{u\le r}\Big[\frac{h_p}{2}u+\frac{\beta_p}{6}u^3-I(u)\Big]. \tag{B.2}$$
By choosing r so that the supremum is attained at t, the corresponding exponential random graph conditioned on $A_r$ is asymptotically indistinguishable from the Erdős-Rényi graph $G_{n,t}$. Thus, the necessary condition for asymptotic optimality of the importance sampling estimator is satisfied.
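Sampling from $\tilde Q^{h,\beta,\alpha}_{n,A}$ can be sketched with a heat-bath (Gibbs-sampler) form of Glauber dynamics in which any configuration outside A receives zero weight, so the chain never leaves A. This is a minimal sketch under our own assumptions, not the authors' implementation: we assume $H(X)=\frac h2E(X)+\frac\beta6T(X)^\alpha$ with $E(X)=2|E|/n^2$ and $T(X)=6\,\#\{\text{triangles}\}/n^3$, a uniformly random vertex pair at each step, and an empty starting graph that lies in A. The function name `glauber_heat_bath` and the `in_A` membership test are hypothetical.

```python
import math
import random

def glauber_heat_bath(n, h, beta, alpha, steps, in_A=None, seed=0):
    """Heat-bath Glauber dynamics targeting Q(X) ∝ exp(n^2 H(X)) 1_A(X).

    Assumed (not taken from the paper): H(X) = (h/2) E(X) + (beta/6) T(X)**alpha,
    E(X) = 2|E|/n^2 and T(X) = 6 #triangles / n^3. `in_A(e, T)` tests membership
    of the conditioning set in terms of these densities; None means no conditioning.
    The chain starts from the empty graph, which must lie in A.
    Returns the adjacency matrix and the final edge and triangle counts.
    """
    rng = random.Random(seed)
    adj = [[0] * n for _ in range(n)]
    n_edges, n_tri = 0, 0

    def log_weight(m, tri):
        e, T = 2.0 * m / n ** 2, 6.0 * tri / n ** 3
        if in_A is not None and not in_A(e, T):
            return -math.inf                       # states outside A carry zero mass
        return n ** 2 * (0.5 * h * e + beta / 6.0 * T ** alpha)

    for _ in range(steps):
        i, j = rng.sample(range(n), 2)             # uniformly random vertex pair
        codeg = sum(adj[i][k] & adj[j][k] for k in range(n))  # triangles through (i, j)
        x = adj[i][j]
        m0, tri0 = n_edges - x, n_tri - x * codeg  # configuration with (i, j) absent
        m1, tri1 = m0 + 1, tri0 + codeg            # configuration with (i, j) present
        w0, w1 = log_weight(m0, tri0), log_weight(m1, tri1)
        if w1 == -math.inf:
            prob_one = 0.0
        elif w0 == -math.inf:
            prob_one = 1.0
        else:                                      # numerically stable logistic of w1 - w0
            d = w1 - w0
            prob_one = 1.0 / (1.0 + math.exp(-d)) if d >= 0 else math.exp(d) / (1.0 + math.exp(d))
        new = 1 if rng.random() < prob_one else 0
        adj[i][j] = adj[j][i] = new
        n_edges, n_tri = (m1, tri1) if new else (m0, tri0)
    return adj, n_edges, n_tri
```

For the conditioned triangle tilt (5.4) one would pass, for example, `in_A=lambda e, T: T <= r**3 and e <= r` together with $h_p$ and $\beta_p$; with $\beta=0$ and $h=h_p$ each update sets the edge with probability exactly p, so the chain targets $G_{n,p}$.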
Importance sampling using the conditioned Gibbs measure. The importance sampling scheme based on the conditioned Gibbs measure $\tilde Q^{h,\beta,\alpha}_{n,A_r}$ gives the estimator
$$\hat\nu_n=\frac{1}{K}\sum_{k=1}^K\mathbb 1_{\mathcal W_t}(\tilde X_k)\frac{d\tilde P_{n,p,A_r}}{d\tilde Q^{h,\beta,\alpha}_{n,A_r}}(\tilde X_k),\qquad\tilde X_k\overset{\text{i.i.d.}}{\sim}\tilde Q^{h,\beta,\alpha}_{n,A_r}, \tag{B.3}$$
where $\tilde P_{n,p,A_r}$ is the measure of the Erdős-Rényi graph conditioned on $A_r$. Note that $\hat\nu_n$ is an unbiased estimator for $\nu_n=\tilde P_{n,p,A_r}(\mathcal W_t)$, but it is a biased estimator for $\mu_n=P_{n,p}(\mathcal W_t)$. The bias can be corrected by
$$\hat\mu_n=\hat\nu_n\cdot P_{n,p}(A_r)+P_{n,p}(\mathcal W_t\cap A_r^c),$$
but the two probabilities on the right-hand side are not easily computed or estimated. Nonetheless, the choice of the set $A_r$ ensures that the bias is small and vanishes exponentially faster than the small probability $\mu_n$. In fact, standard computations give that $P_{n,p}(A_r)\to1$ as $n\to\infty$, and
$$\lim_{n\to\infty}\frac{1}{n^2}\log P_{n,p}(\mathcal W_t\cap A_r^c)<-\inf_{f\in\mathcal W_t}[I_p(f)].$$
The asymptotic optimality of the importance sampling scheme is stated in the following result.

Corollary B.2. Given $(p,t)$ in the replica symmetric phase, consider the conditioned triangle tilt defined by the Gibbs measure $\tilde Q^{h_p,\beta_p,1}_{n,A_r}$ in (5.4), conditioned on $A_r$ in (5.3) with $p<t<r$. The importance sampling scheme based on this conditioned triangle tilt is asymptotically optimal.

Choosing the set $A_r$. We motivate the choice of the set $A_r$ in the example of Section 5.2, with $p=0.2$, $t=0.3$. For the triangle tilt $Q^{h_p,\beta_p,1}_n$, the variational form $V(u)=\frac{h_p}{2}u+\frac{\beta_p}{6}u^3-I(u)$ has multiple local maxima. This is illustrated in Figure B.1 (inset), where $t=0.3$ is a local maximum and $u^*\approx0.989$ is the global maximum. Without conditioning, the exponential random graph $G^{h_p,\beta_p,1}_n$ has a mean triangle density of $(u^*)^3$, much greater than the desired triangle density $t^3$; moreover, the Glauber dynamics takes an exponentially long time to move from the region with a high triangle density $(u^*)^3$ to the region with a lower triangle density $t^3$.
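The stationary points that drive the choice of r can be located numerically. The sketch below (ours, standard library only) scans $V'(u)=\frac12\big(h+\beta u^2-\log\frac{u}{1-u}\big)$, the derivative of $V(u)=\frac h2u+\frac\beta6u^3-I(u)$ with $I(u)=\frac12[u\log u+(1-u)\log(1-u)]$ assumed as in (5.1), on a grid of midpoints in (0, 1), refines each sign change by bisection, and classifies it. With $p=0.2$, $t=0.3$ and $\beta_p=(h_t-h_p)/t^2$ it should find the local maximum at t, a separating local minimum (the natural threshold r), and the global maximum near 0.989. The grid size and tolerance are arbitrary choices, and a root lying exactly on a grid point would be missed.

```python
import math

def logit(u):
    return math.log(u / (1.0 - u))

def V(u, h, beta):
    """V(u) = (h/2) u + (beta/6) u^3 - I(u), with I(u) = (1/2)[u log u + (1-u) log(1-u)]."""
    I = 0.5 * (u * math.log(u) + (1.0 - u) * math.log(1.0 - u))
    return 0.5 * h * u + beta / 6.0 * u ** 3 - I

def dV(u, h, beta):
    return 0.5 * (h + beta * u ** 2 - logit(u))

def stationary_points(h, beta, grid=10_000, tol=1e-12):
    """Return [(u, kind), ...] for each sign change of V' on (0, 1), in increasing u."""
    us = [(k + 0.5) / grid for k in range(grid)]   # midpoints avoid u = 0 and u = 1
    found = []
    for a, b in zip(us, us[1:]):
        fa, fb = dV(a, h, beta), dV(b, h, beta)
        if fa * fb < 0:
            lo, hi = a, b
            while hi - lo > tol:                   # bisection on the bracketing interval
                mid = 0.5 * (lo + hi)
                if dV(lo, h, beta) * dV(mid, h, beta) <= 0:
                    hi = mid
                else:
                    lo = mid
            # V' changes from + to - at a local maximum, from - to + at a local minimum
            found.append((0.5 * (lo + hi), "local max" if fa > 0 else "local min"))
    return found

p, t = 0.2, 0.3
h_p = logit(p)
beta_p = (logit(t) - h_p) / t ** 2                 # (4.1) with alpha = 1
for u, kind in stationary_points(h_p, beta_p):
    print(f"{kind:9s} u = {u:.4f}  V(u) = {V(u, h_p, beta_p):.4f}")
```

Comparing the printed values of V at the two maxima shows which one is global; the local minimum between them is the cap used to define $A_r$.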
The effect of conditioning on the set $A_r$ is to cap the triangle density at $r^3$ and to ensure faster mixing of the Glauber dynamics. Thus, one convenient choice is to take $r\approx0.43$, the local minimum of $V(u)$ which separates the two local maxima. Then t is the unique global maximum on the interval $[0,r]$, and the conditioned Gibbs measure has a mean triangle density of $t^3$.

Conditioning the Gibbs measure leads to a significant reduction in the asymptotic log second moment of the importance sampling estimator. This reduction is best illustrated by considering, besides the triangle tilt itself, the family of Gibbs measures with $h=h_p$, $\alpha=1$ and varying $\beta>0$. Figure B.1 illustrates that as β is increased from 0 up to a transition point $\beta\approx4.76$, the variational form $V(u;\beta)=\frac{h_p}{2}u+\frac{\beta}{6}u^{3\alpha}-I(u)$ has a global maximum within the range $[0.2,0.253]$. For $\beta>4.76$, the global maximum jumps up to a value close to 1. There is a range of β for which $V(u;\beta)$ has two local maxima and one local minimum. Observe from the figure inset that the triangle tilt with $\beta=\beta_p$ lies in this range. It is for this range of β that applying the conditioned Gibbs measure leads to a reduction in the asymptotic log second moment.

Figure B.2 shows the asymptotic log second moment, both with and without conditioning of the Gibbs measure. For each β, the threshold r is chosen as the local minimum of $V(u;\beta)$. We see that conditioning the Gibbs measure significantly reduces the asymptotic log second moment, and the conditioned triangle tilt is asymptotically optimal. This is corroborated by the numerical simulations presented in Section 5.2. In contrast, when no conditioning is performed, the IS estimator exhibits a sharp decline in performance when β is increased beyond the transition point $\beta\approx4.76$.

Appendix C. Auxiliary lemmas and proofs

We present a lemma on the asymptotic indistinguishability of an exponential random graph from a minimal set $\mathcal F^*$, as well as the proof of Proposition 2.6.

Lemma C.1.
(i) Given $(p,t)$, let $\mathcal F^*$ be the set of functions that minimize the LDP rate function $\inf_{f\in\mathcal W_t}[I_p(f)]$ in (2.5). Then $\mathcal F^*$ is the minimal set from which the Erdős-Rényi graph $G_{n,p}$ conditioned on $\{T(f)\ge t^3\}$ is asymptotically indistinguishable.
(ii) Given $(h,\beta,\alpha)$, let $\mathcal F^*$ be the set of functions that maximize $\sup_{f\in\mathcal W}[H(f)-I(f)]$. Then $\mathcal F^*$ is the minimal set from which the exponential random graph $G^{h,\beta,\alpha}_n$ is asymptotically indistinguishable.

[Figure B.1: stationary points $u^*$ of $V(u)$ plotted against β; inset: $V(u)$ for $\beta=\beta^*$.]

Figure B.1. The phase curve shows the stationary points of the variational form $V(u)=\frac{h}{2}u+\frac{\beta}{6}u^{3\alpha}-I(u)$ as β varies, for $\alpha=1$, $h=h_p=\log\frac{p}{1-p}$ and $p=0.2$. The red solid line marks stationary points that are global maxima of $V(u)$; the red dotted line marks local maxima; the blue dashed line marks local minima. At the phase transition point $\beta\approx4.76$, the maximizer of the variational form jumps from $u^*\approx0.253$ to a value $u^*$ close to 1. Inset: $V(u)$ for $\beta=\beta^*\approx5.99$, attaining a local maximum at $t=0.3$ and its global maximum at $u^*\approx0.989$.

[Figure B.2: asymptotic second moment plotted against β, without conditioning and with conditioning on $A_r$; $\beta^*\approx5.99$ marked.]

Figure B.2. A plot of the asymptotic second moment, $\lim_{n\to\infty}\frac{1}{n^2}\log\mathbb E_{\tilde Q}[\hat q_{n,A_r}^2]$, of the importance sampling estimator based on the conditioned Gibbs tilt for fixed $h=h_p$ and varying β. The inset is a zoom-in showing that the smallest variance is attained at $\beta=\beta^*$. The dotted line shows the rapid deterioration of the asymptotic second moment of the estimator without conditioning. Parameters used are $p=0.2$, $t=0.3$.

Proof. The asymptotic indistinguishability of $\mathcal F^*$ was shown in [8, Theorem 3.1] for (i) and in [5, Theorem 3.22] for (ii).
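The proofs naturally extend to give the minimality of $\mathcal F^*$, and we record them here.

Observe that for any random graph $G_n$ that is asymptotically indistinguishable from a set $\mathcal F^*$, to show that $\mathcal F^*$ is minimal it suffices to show that, for any relatively open non-empty subset $\mathcal F\subset\mathcal F^*$ such that $\mathcal F^*\setminus\mathcal F$ is non-empty, there exists $\epsilon>0$ such that
$$\lim_{n\to\infty}\frac{1}{n^2}\log P\big(\delta_\square(G_n,\mathcal F^*\setminus\mathcal F)\ge\epsilon\big)=0. \tag{C.1}$$
Let $\mathcal F\subset\mathcal F^*$ be any relatively open non-empty subset with $\mathcal F^*\setminus\mathcal F$ non-empty. Denote, for $\varepsilon>0$,
$$\mathcal F_\varepsilon=\{f\in\mathcal W\mid\delta_\square(f,\mathcal F^*\setminus\mathcal F)\ge\varepsilon\}.$$
(i) Since $\mathcal F$ is relatively open in $\mathcal F^*$, $\delta_\square(f,\mathcal F^*\setminus\mathcal F)>0$ for $f\in\mathcal F$. So there exists $\varepsilon>0$ such that $(\mathcal F_\varepsilon\cap\mathcal W_t)^\circ$ contains at least one element of $\mathcal F$. ($A^\circ$ denotes the interior of A.) It follows that
$$\inf_{f\in(\mathcal F_\varepsilon\cap\mathcal W_t)^\circ}[I_p(f)]=\inf_{f\in\mathcal W_t}[I_p(f)].$$
Since
$$P(G_{n,p}\in\mathcal F_\varepsilon\mid G_{n,p}\in\mathcal W_t)=\frac{P(G_{n,p}\in\mathcal F_\varepsilon\cap\mathcal W_t)}{P(G_{n,p}\in\mathcal W_t)},$$
the large deviation principle in [8, Theorem 2.3] implies that
$$\begin{aligned}\liminf_{n\to\infty}\frac{1}{n^2}\log P(G_{n,p}\in\mathcal F_\varepsilon\mid G_{n,p}\in\mathcal W_t)&=\liminf_{n\to\infty}\Big[\frac{1}{n^2}\log P(G_{n,p}\in\mathcal F_\varepsilon\cap\mathcal W_t)-\frac{1}{n^2}\log P(G_{n,p}\in\mathcal W_t)\Big]\\&\ge-\inf_{f\in(\mathcal F_\varepsilon\cap\mathcal W_t)^\circ}[I_p(f)]+\inf_{f\in\mathcal W_t}[I_p(f)]=0.\end{aligned}$$
(ii) Since $\mathcal F$ is relatively open in $\mathcal F^*$, there exists $\varepsilon>0$ such that $\mathcal F_\varepsilon^\circ$ contains at least one element of $\mathcal F$, and
$$\sup_{f\in\mathcal F_\varepsilon^\circ}[H(f)-I(f)]=\sup_{f\in\mathcal W}[H(f)-I(f)].$$
Since the Hamiltonian H is bounded, for any $\eta>0$ there is a finite set $A\subset\mathbb R$ such that the intervals $\{(a,a+\eta),a\in A\}$ cover the range of H. Let $\mathcal F^a_\varepsilon=\mathcal F_\varepsilon\cap H^{-1}([a,a+\eta])$, and let $\mathcal F^{a,n}_\varepsilon=\mathcal F^a_\varepsilon\cap\Omega_n$ be the functions corresponding to simple graphs on n vertices. Then
$$P(G_n\in\mathcal F_\varepsilon)\ge\sum_{a\in A}e^{n^2(a-\psi_n)}|\mathcal F^{a,n}_\varepsilon|\ge e^{-n^2\psi_n}\sup_{a\in A}\big[e^{n^2a}|\mathcal F^{a,n}_\varepsilon|\big]$$
and
$$\frac{1}{n^2}\log P(G_n\in\mathcal F_\varepsilon)\ge-\psi_n+\sup_{a\in A}\Big[a+\frac{1}{n^2}\log|\mathcal F^{a,n}_\varepsilon|\Big].$$
By an observation in [5, Eqn. (3.4)], for any open set $U\subset\mathcal W$ and $U_n=U\cap\Omega_n$,
$$\liminf_{n\to\infty}\frac{1}{n^2}\log|U_n|\ge-\inf_{f\in U}[I(f)].$$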
Then, since
$$\sup_{f\in F^a_\varepsilon}[H(f) - I(f)] \le \sup_{f\in F^a_\varepsilon}[a + \eta - I(f)] = a + \eta - \inf_{f\in F^a_\varepsilon} I(f),$$
and since $\psi_n \to \sup_{f\in\mathcal{W}}[H(f) - I(f)]$ [5], we have that
$$\begin{aligned}
\liminf_{n\to\infty}\frac{1}{n^2}\log\mathbb{P}(G_n\in F_\varepsilon)
&\ge -\sup_{f\in\mathcal{W}}[H(f) - I(f)] + \sup_{a\in A}\Big[a - \inf_{f\in(F^a_\varepsilon)^\circ} I(f)\Big]\\
&\ge -\sup_{f\in\mathcal{W}}[H(f) - I(f)] + \sup_{a\in A}\sup_{f\in(F^a_\varepsilon)^\circ}[H(f) - I(f)] - \eta\\
&\ge -\sup_{f\in\mathcal{W}}[H(f) - I(f)] + \sup_{f\in F^\circ_\varepsilon}[H(f) - I(f)] - \eta = -\eta.
\end{aligned}$$
Since $\eta > 0$ is arbitrary, the left-hand side is nonnegative; as it is also at most $0$, (C.1) follows. The proof is complete. ∎

Proof of Proposition 2.6. Let $\varepsilon > 0$, and let $F^*_{v^*}$ be the set of minimizers of $\inf_{f\in\partial\mathcal{W}_{v^*}} I_q(f)$. Then
$$\mathbb{E}_{Q_n}|\mathcal{T}(X) - (v^*)^3| = \int_{\{\delta_\square(X, F^*_{v^*}) > \varepsilon\}} |\mathcal{T}(X) - (v^*)^3|\,dQ_n(X) + \int_{\{\delta_\square(X, F^*_{v^*}) \le \varepsilon\}} |\mathcal{T}(X) - (v^*)^3|\,dQ_n(X) = (I) + (II).$$
Here we drop the superscripts and write $Q_n = Q^{h_q,\beta,\alpha}_n$. We estimate the two terms separately. For $(I)$, by [5, Theorem 4.2] there exist $C, \varepsilon' > 0$ (depending on $\varepsilon$) such that for all $n$,
$$Q_n\big(\delta_\square(X, F^*_{v^*}) > \varepsilon\big) \le C e^{-n^2\varepsilon'}.$$
Since $|\mathcal{T}(X) - (v^*)^3| \le 1$,
$$(I) \le Q_n\big(\delta_\square(X, F^*_{v^*}) > \varepsilon\big) \le C e^{-n^2\varepsilon'}.$$
For $(II)$, for any $X \in \{\delta_\square(X, F^*_{v^*}) \le \varepsilon\}$, let $f^*_X \in F^*_{v^*}$ be such that $\delta_\square(X, f^*_X) \le \varepsilon$. Note that $\mathcal{T}(f^*_X) = (v^*)^3$ by definition. By Lipschitz continuity of the map $f \mapsto \mathcal{T}(f)$ with respect to the cut distance $\delta_\square$ [3, Theorem 3.7],
$$|\mathcal{T}(X) - (v^*)^3| = |\mathcal{T}(X) - \mathcal{T}(f^*_X)| \le C\,\delta_\square(X, f^*_X) \le C\varepsilon.$$
So
$$(II) = \int_{\{\delta_\square(X, F^*_{v^*}) \le \varepsilon\}} |\mathcal{T}(X) - (v^*)^3|\,dQ_n(X) \le C\varepsilon\, Q_n\big(\delta_\square(X, F^*_{v^*}) \le \varepsilon\big) \le C\varepsilon.$$
Hence
$$\limsup_{n\to\infty}\mathbb{E}_{Q_n}|\mathcal{T}(X) - (v^*)^3| \le \lim_{n\to\infty}\big(C e^{-n^2\varepsilon'} + C\varepsilon\big) = C\varepsilon.$$
Since $\varepsilon$ is arbitrary, (2.22) follows.

If $(q, v^*)$ belongs to the replica symmetric phase, then by Theorem 2.4, $F^*_{v^*}$ consists solely of the constant function $f^*(x,y) \equiv v^*$. Since $\mathcal{E}(f^*) = v^*$ and $f \mapsto \mathcal{E}(f)$ is also Lipschitz with respect to $\delta_\square$, the same argument yields
$$\limsup_{n\to\infty}\mathbb{E}_{Q_n}|\mathcal{E}(X) - v^*| \le \lim_{n\to\infty}\big(C e^{-n^2\varepsilon'} + C\varepsilon\big) = C\varepsilon,$$
and letting $\varepsilon \to 0$ gives $\mathbb{E}_{Q_n}|\mathcal{E}(X) - v^*| \to 0$. ∎

References

[1] S. Bhamidi, G. Bresler, and A. Sly, Mixing time of exponential random graphs, Ann. Appl. Probab. (2011), no. 6, 2146–2170. MR2895412
[2] J. Blanchet and P. Glynn, Efficient rare-event simulation for the maximum of heavy-tailed random walks, Ann. Appl. Probab. (2008), no. 4, 1351–1378.
[3] C. Borgs, J. T. Chayes, L. Lovász, V. T. Sós, and K. Vesztergombi, Convergent sequences of dense graphs. I. Subgraph frequencies, metric properties and testing, Adv. Math. (2008), no. 6, 1801–1851. MR2455626 (2009m:05161)
[4] J. A. Bucklew, Introduction to rare event simulation, Springer Series in Statistics, Springer-Verlag, New York, 2004. MR2045385 (2005e:62001)
[5] S. Chatterjee and P. Diaconis, Estimating and understanding exponential random graph models, arXiv preprint arXiv:1102.2650 (2011).
[6] S. Chatterjee, The missing log in large deviations for triangle counts, Random Structures Algorithms (2012), no. 4, 437–451. MR2925306
[7] S. Chatterjee and P. S. Dey, Applications of Stein's method for concentration inequalities, Ann. Probab. (2010), no. 6, 2443–2485. MR2683635 (2012f:60073)
[8] S. Chatterjee and S. R. S. Varadhan, The large deviation principle for the Erdős-Rényi random graph, European J. Combin. (2011), no. 7, 1000–1017. MR2825532 (2012m:60067)
[9] B. DeMarco and J. Kahn, Upper tails for triangles, Random Structures Algorithms (2012), no. 4, 452–459. MR2925307
[10] P. Dupuis and H. Wang, Importance sampling, large deviations, and differential games, Stochastics: An International Journal of Probability and Stochastic Processes (2004), no. 6, 481–508.
[11] P. Glasserman and Y. Wang, Counterexamples in importance sampling for large deviations probabilities, Ann. Appl. Probab. (1997), no. 3, 731–746. MR1459268 (98b:60053)
[12] S. Juneja and P. Shahabuddin, Rare event simulation techniques: An introduction and recent advances, Simulation, Handbooks in Operations Research and Management Science (2006), 291–350.
[13] J. H. Kim and V. H. Vu, Divide and conquer martingales and the number of triangles in a random graph, Random Structures Algorithms (2004), no. 2, 166–174. MR2035874 (2005d:05135)
[14] L. Lovász, Large networks and graph limits, American Mathematical Society Colloquium Publications, vol. 60, American Mathematical Society, Providence, RI, 2012. MR3012035
[15] L. Lovász and B. Szegedy, Limits of dense graph sequences, J. Combin. Theory Ser. B (2006), no. 6, 933–957. MR2274085 (2007m:05132)
[16] L. Lovász and B. Szegedy, Szemerédi's lemma for the analyst, Geom. Funct. Anal. (2007), no. 1, 252–270. MR2306658 (2008a:05129)
[17] E. Lubetzky and Y. Zhao, On replica symmetry of large deviations in random graphs, arXiv preprint arXiv:1210.7013 (2012).
[18] C. Radin and M. Yin, Phase transitions in exponential random graphs, arXiv preprint arXiv:1108.0649 (2011).
[19] G. Robins, P. Pattison, Y. Kalish, and D. Lusher, An introduction to exponential random graph (p*) models for social networks, Social Networks (2007), no. 2, 173–191.
[20] G. Robins, T. Snijders, P. Wang, M. Handcock, and P. Pattison, Recent developments in exponential random graph (p*) models for social networks, Social Networks (2007), no. 2, 192–215.
[21] G. Rubino and B. Tuffin, Rare event simulation using Monte Carlo methods, Wiley, 2009.
[22] M. Yin, A cluster expansion approach to exponential random graph models, Journal of Statistical Mechanics: Theory and Experiment (2012), no. 05, P05004.
[23] M. Yin, Critical phenomena in exponential random graphs, arXiv preprint arXiv:1208.2992 (2012).

Department of Statistics and Operations Research, 304 Hanes Hall, University of North Carolina, Chapel Hill, NC 27599
Department of Statistics and Operations Research, 330 Hanes Hall, University of North Carolina, Chapel Hill, NC 27599
Statistical and Applied Mathematical Sciences Institute, 19 T.W. Alexander Drive, P.O. Box 14006, Research Triangle Park, NC 27709, USA
Mathematics Department, Duke University, Box 90320, Durham, North Carolina, 27708, USA