Typicality and entropy of processes on infinite trees
arXiv preprint [math.PR]
ÁGNES BACKHAUSZ, CHARLES BORDENAVE, AND BALÁZS SZEGEDY
Abstract.
Consider a uniformly sampled random d-regular graph on n vertices. If d is fixed and n goes to ∞, then we can relate typical (large probability) properties of such a random graph to a family of invariant random processes (called "typical" processes) on the infinite d-regular tree T_d. This correspondence between ergodic theory on T_d and random regular graphs has already proven to be fruitful in both directions. This paper continues the investigation of typical processes with a special emphasis on entropy. We study a natural notion of micro-state entropy for invariant processes on T_d. It serves as a quantitative refinement of the notion of typicality and is tightly connected to the asymptotic free energy in statistical physics. Using entropy inequalities, we provide new sufficient conditions for typicality for edge-Markov processes. We also extend these notions and results to processes on unimodular Galton-Watson random trees.

1. Introduction
1.1. Typical processes and sofic entropy.
Random d-regular graphs have been extensively studied over the past 50 years [10, 20, 25, 29]. Sophisticated methods from probability theory, combinatorics and statistical physics have been successfully used to uncover many of their properties, such as the independence ratio, the density of a maximal cut or the spectral gap [18, 28, 20]. The recently emerging theory of graph limits [8, 22, 21, 23] gives a new, limiting point of view on the subject. It turns out that many of the crucial properties of random d-regular graphs, for d fixed and n going to infinity, can also be studied in the framework of ergodic theory on the infinite d-regular tree T_d [2]. An illustration of the power of this method is the proof of the Gaussianity of the almost eigenvectors of random d-regular graphs [3, 11]. The proof is a combination of ergodic theory on T_d with information theoretic methods.

In this paper we are interested in a question which is also related to the limit of random regular graphs, namely, the family of typical processes on T_d introduced in [2]. These objects arise as the local limits of vertex-colored random d-regular graphs (see the formal definition below). These typical processes contain useful information on the structure of d-regular graphs. For example, the classical fact from [9] that the independence ratio is separated from 1/2 is equivalent to the fact that the alternating coloring of the infinite d-regular tree is not typical. Several necessary conditions have been formulated for typical processes in recent years. Some of them are about the covariance structure, others are entropy inequalities [4, 5, 15]. However, general sufficient conditions for a process to be typical are less common.

In this paper we obtain sufficient conditions for the typicality of processes on T_d by studying a new micro-state entropy. This entropy measures, in some sense, the number of finite approximations of a process on a random instance of a large d-regular graph.
This entropy is tightly connected to Bowen's sofic entropy in measured group theory, see [13].

Our approach has another connection to the convergence of random graphs. While graph limit theory shows great promise in a variety of questions related to random d-regular graphs, it has also revealed an intriguing open problem. It is believed that for fixed d and n → ∞, a random d-regular graph on n vertices is convergent (in probability) in the local-global sense of [22] and also right-convergent in the sense of [21]; this last notion of convergence relies on a deep statistical physics theory, see the monograph [24]. These general conjectures are a common strengthening of a large variety of conjectures, many of which are already proven. For example, the convergence of the independence ratio is proven in [6, 17], and the convergence of the asymptotic free energies of a large class of statistical physics models on random d-regular graphs in [26, 14]. In this paper, we introduce an upper and a lower version of the micro-state entropies. Equality of these entropies implies the convergence of random regular graphs. For a family of processes we can establish this equality. This leads to the sufficient conditions mentioned above, and it suggests that our results might lead to a deeper understanding of the structure of random regular graphs.

We now formalize the main definitions. Let d ≥ 3 be an integer and T_d the infinite d-regular tree (all vertices have d neighbors). Let M be a finite set. A process X on T_d is a random variable on M^{T_d}. This process (or its law) is invariant if the law of X is invariant under all automorphisms of the tree T_d. We denote by I_d(M) the set of invariant probability measures on M^{T_d}.

We now define the Benjamini-Schramm topology. A pair (G, f) formed by a graph G = (V, E) and a map f : V → M will be called a colored graph with color set M.
A rooted colored graph is a triple (G, f, o) formed by a connected colored graph (G, f) and a distinguished vertex o of G, called the root. Two rooted colored graphs (G, f, o) and (G′, f′, o′) are isomorphic if there is an isomorphism of G and G′ which preserves the colors and the roots. An equivalence class of rooted colored graphs is called an unlabeled rooted colored graph in combinatorics.

Unlabeled rooted graphs give the proper setup for defining a meaningful notion of convergence. It is however more convenient to work with rooted labeled graphs instead of unlabeled rooted graphs. To this end, we now define a randomized canonical graph in each equivalence class. We define the set of finite integer sequences as

N_f = ∪_{k ≥ 0} N^k,   (1.1)

where N^0 = {o} and N = {1, 2, . . .} by convention. The tree T_d can be classically built on a subset of N_f as follows. The root of the tree is o, the d neighbors of o are V_1 = {1, . . . , d} ⊂ N, the neighbors of i ∈ V_1 are o and {(i, 1), . . . , (i, d − 1)} ⊂ N^2, and so on. More generally, if (G, o) is a rooted graph, the breadth-first search tree started at the root o, where ties between vertices are broken uniformly at random, defines a random graph (G′, o) on a subset of N_f whose law depends only on the equivalence class of (G, o): a vertex at distance k from the root receives a label in N^k; if (i_1, . . . , i_{k−1}) is the label of its parent in the search tree, it has the label (i_1, . . . , i_{k−1}, j) if it is the j-th offspring of its parent in the random ordering. We call this random rooted graph the randomly labeled rooted graph associated to (G, o). Conversely, we will say that a random labeled colored graph (G, f, o) on a subset of N_f is randomly labeled if its law is equal to the law of the randomly labeled rooted colored graph associated to its unlabeled rooted colored graph.
By definition, if X ∈ I_d(M) then (T_d, X, o) is randomly labeled.

Recall that a graph is locally finite if all its vertices have a finite number of neighbors. We denote by G• (respectively G•_M) the set of locally finite graphs (respectively locally finite colored graphs on the color set M) on the vertex set N_f, rooted at o, which are admissible in the sense that they are realizable as a breadth-first search labeling of a locally finite graph. The sets G• and G•_M are complete separable metric spaces when equipped with the distance

d(g, g′) = Σ_{r ≥ 0} 2^{−r} 1{(g)_r ≠ (g′)_r},

where 1 is the indicator function and (g)_r is the restriction of g to the vertices at distance at most r from the root. We denote by P(G•_M) the set of probability measures on G•_M. We equip P(G•_M) with a distance, also denoted by d, which generates the weak topology on P(G•_M) (for example, the Lévy-Prohorov distance).

If (G, f) is a locally finite colored graph and v is a vertex of G, then we denote by distr_{G,v}(f) the law in P(G•_M) of the randomly labeled rooted graph ((G, f)(v), v), where (G, f)(v) is the restriction of (G, f) to the connected component of G containing v. The law distr_{G,v} in P(G•) is defined similarly for a graph G and a vertex v. Finally, if G is finite with vertex set V, we may define the probability measures in P(G•) and P(G•_M):

distr_G = (1/|V|) Σ_{v ∈ V} distr_{G,v}   and   distr_G(f) = (1/|V|) Σ_{v ∈ V} distr_{G,v}(f).

For integers n ≥ d + 1 with nd even, the set G_n(d) of simple d-regular graphs on the vertex set [n] = {1, . . . , n} is not empty. For each integer n ≥ d + 1 with nd even, let G_n be a uniformly distributed random graph on G_n(d). Almost surely, the probability distribution distr_{G_n} converges, as n goes to infinity, to the Dirac mass at T_d rooted at o (it is a consequence of the fact that the number of cycles of length k in G_n is O(1) for any fixed k, see [10]).
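To make the truncation distance concrete, here is a small illustrative sketch (our own toy encoding, not the canonical BFS labeling of the paper): rooted graphs are given as adjacency dicts, and the ball (g)_r is encoded as the set of vertices at distance at most r together with their neighbors inside the ball. Vertex labels are compared literally, so this matches the formal definition only when both graphs are presented on the same canonical labels.

```python
def ball(adj, root, r):
    """Encode the ball of radius r around root: each vertex in the ball,
    paired with its (sorted) neighbors that also lie in the ball."""
    seen, frontier = {root}, {root}
    for _ in range(r):
        frontier = {w for v in frontier for w in adj[v]} - seen
        seen |= frontier
    return frozenset((v, tuple(sorted(w for w in adj[v] if w in seen)))
                     for v in sorted(seen))

def local_distance(adj1, root1, adj2, root2, rmax=20):
    """d(g, g') = sum_{r >= 0} 2^{-r} 1{(g)_r != (g')_r}, truncated at rmax."""
    return sum(2.0 ** -r
               for r in range(rmax + 1)
               if ball(adj1, root1, r) != ball(adj2, root2, r))
```

For instance, a 4-cycle and a 4-path rooted at vertex 0 agree at radius 0 but disagree at every larger radius, so their truncated distance is Σ_{r=1}^{20} 2^{−r}.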
It can further be checked that, a.s., if µ is an accumulation point of distr_{G_n}(f_n) for some sequence of colorings f_n ∈ M^n, then µ ∈ I_d(M). This motivates the following definition, introduced in [2].

Definition 1.1 (Typical process). A measure µ ∈ I_d(M) is weakly typical if

lim_{ε→0} lim sup_{n→∞} P(∃ f ∈ M^n : d(distr_{G_n}(f), µ) ≤ ε) = 1.

It is strongly typical if

lim_{ε→0} lim_{n→∞} P(∃ f ∈ M^n : d(distr_{G_n}(f), µ) ≤ ε) = 1.

Note that it is apparent from the definition that being typical does not depend on the choice of the distance d which generates the weak topology. We note also that the definition in [2] is slightly different, but it turns out to be equivalent by using some measure concentration phenomena (see [3, Section 5]).

We may also define a notion of micro-state entropy of µ ∈ I_d(M) as follows. For µ ∈ I_d(M) and r ≥ 0, the probability measure µ_r is defined as the restriction of µ to the vertices at distance at most r from the root. For any ε > 0, if G ∈ G_n(d), we define

F_G(µ, r, ε) = { f ∈ M^n : d(distr_G(f)_r, µ_r) ≤ ε }.   (1.2)

This is the set of coloring functions f on G which are ε-close to µ_r in the Benjamini-Schramm sense. Let G_n be a uniformly distributed random graph on G_n(d) with n ≥ d + 1 and nd even. Roughly speaking, the sofic entropy of µ is a limit of

H_{G_n}(µ, r, ε) = (1/n) log |F_{G_n}(µ, r, ε)|,   (1.3)

in n → ∞ and then in ε → 0 and r → ∞. However, since H_{G_n}(µ, r, ε) is a random variable in {−∞} ∪ [0, ∞), some care is needed. More formally, we fix 0 < α < 1 and consider the following median value of H_{G_n}(µ, r, ε):

h_n(µ, r, ε, α) = sup{ h ∈ {−∞} ∪ [0, ∞) : P(H_{G_n}(µ, r, ε) ≥ h) ≥ α }.

Since h_n is non-decreasing in ε, we may define the upper and lower entropies of µ as

h̄(µ, r, α) = lim_{ε→0} lim sup_{n→∞} h_n(µ, r, ε, α)   and   h(µ, r, α) = lim_{ε→0} lim inf_{n→∞} h_n(µ, r, ε, α).

These entropies are extended real numbers in {−∞} ∪ [0, ∞).
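For intuition, H_{G_n}(µ, r, ε) can be evaluated by brute force on a toy instance. The sketch below is our own illustration, with several simplifying assumptions: d = 3, n = 4 (the complete graph K_4, the unique simple 3-regular graph on 4 vertices), M = {0, 1}, µ the i.i.d. uniform coloring, r = 1, and a simplified 1-neighborhood statistic (root color plus multiset of neighbor colors, compared in total variation) in place of the distance d on P(G•_M).

```python
import itertools
import math
from collections import Counter
from fractions import Fraction

# K4, the 3-regular graph on 4 vertices (toy example graph)
adj = [[1, 2, 3], [0, 2, 3], [0, 1, 3], [0, 1, 2]]
n, d, M = 4, 3, (0, 1)

def local_type(f, v):
    """Simplified 1-neighborhood type of v: (own color, sorted neighbor colors)."""
    return (f[v], tuple(sorted(f[u] for u in adj[v])))

def mu1(t):
    """mu_1 of a type under the i.i.d. uniform process on {0, 1}."""
    _c, nbrs = t
    k = sum(nbrs)  # number of 1s among the d neighbor colors
    return Fraction(1, 2) * Fraction(math.comb(d, k), 2 ** d)

def tv_to_mu1(f):
    """Total variation distance between the empirical type distribution and mu_1."""
    emp = Counter(local_type(f, v) for v in range(n))
    types = set(emp) | {(c, t) for c in M
                        for t in itertools.combinations_with_replacement(M, d)}
    return sum(abs(Fraction(emp.get(t, 0), n) - mu1(t)) for t in types) / 2

def micro_state_entropy(eps):
    """(1/n) log #{f : empirical 1-neighborhood statistics are eps-close to mu_1}."""
    count = sum(1 for f in itertools.product(M, repeat=n) if tv_to_mu1(f) <= eps)
    return math.log(count) / n if count else float("-inf")
```

With ε large, every coloring qualifies and the entropy is ln|M| = ln 2; with ε = 0, no coloring of K_4 matches µ_1 exactly and the value is −∞, illustrating why the variable lives in {−∞} ∪ [0, ∞).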
This entropy can be interpreted as a version of Bowen's sofic entropy, see the survey [13].

The sofic entropies h̄(µ, r, α) and h(µ, r, α) do not depend on α ∈ (0, 1). Note that they do not depend on the choice of the distance d either (in the sense that if two distances are topologically equivalent, the corresponding quantities h̄(µ, r, α) and h(µ, r, α) are equal). We shall prove the following.

Lemma 1.2.
Let µ ∈ I_d(M) and r ≥ 0. The function α ↦ (h̄(µ, r, α), h(µ, r, α)) is constant on (0, 1).

By Lemma 1.2, we may consider the common value of the entropies: for all α ∈ (0, 1), h̄(µ, r) = h̄(µ, r, α) and h(µ, r) = h(µ, r, α). By construction, h̄(µ, r) and h(µ, r) are the growth rates of the number of colorings of a random d-regular graph whose r-neighborhood is close to µ_r. Finally, since h̄(µ, r) and h(µ, r) are non-increasing in r, we may define the upper and lower sofic entropies as

h̄(µ) = lim_{r→∞} h̄(µ, r)   and   h(µ) = lim_{r→∞} h(µ, r).

Taking the limit as α → 1, for h ≥ 0, the inequality h̄(µ) ≥ h is equivalent to the existence of a vanishing sequence (ε_n) such that

lim sup_{n→∞} P( H_{G_n}(µ, 1/ε_n, ε_n) ≥ (h − ε_n)_+ ) = 1,

where (x)_+ = x ∨ 0, and similarly for h(µ). Since for ν, µ ∈ P(G•_M), d(ν_r, µ_r) converges to d(ν, µ) as r → ∞, the sofic entropy is thus closely related to typicality:

Lemma 1.3.
Let µ ∈ I_d(M). We have h̄(µ) ≥ 0 (resp. h(µ) ≥ 0) if and only if µ is weakly (resp. strongly) typical.

This work is notably motivated by the following conjecture, which is connected to the notion of right convergence, see [21] and Subsection 1.5 below.
Conjecture 1.4.
For all µ ∈ I_d(M), we have h(µ) = h̄(µ). In particular, µ is weakly typical if and only if it is strongly typical.

In this work, we will compute the entropy h(µ) = h̄(µ) for a large class of invariant measures µ ∈ I_d(M). This is a class of processes where a second moment method can be applied. In a subsequent work, we will use more advanced statistical physics methods to refine our criterion.

1.2. Annealed entropy. If r ≥ 0 is an integer and S is a subset of T_d, we define B_r(S) as the subset of vertices of T_d at distance at most r from a vertex in S. For ease of notation, for r ≥ 1, we define S_r = B_r(o) as the ball of radius r around the root of T_d, and E_r = B_{r−1}({o, 1}) as the set of vertices at distance at most r − 1 from the edge {o, 1} of T_d.

If X is an invariant process with law µ ∈ I_d(M) and r ≥ 1 is an integer, we set

Σ_r(X) = Σ_r(µ) = H(X_{S_r}) − (d/2) H(X_{E_r}),   (1.4)

where X_S is the restriction of X to the subset S ⊂ T_d and H is the usual Shannon entropy: if Y is a random variable taking values in a finite set F, then

H(Y) = − Σ_{x ∈ F} P(Y = x) ln P(Y = x).

It follows from [3, 12] (see Subsection 2.2 below for details) that Σ_r(µ) is non-increasing in r. We may thus define Σ(µ) = lim_{r→∞} Σ_r(µ). Note that the law of X_{S_r} is µ_r and that the law of X_{E_r} is a marginal of µ_r. The quantities Σ_r(µ) and Σ(µ) will be called the annealed entropy of µ_r and µ; the reason will become clear in the forthcoming Subsection 2.3. The following first moment bound is essentially contained in [2, 12].

Theorem 1.5.
For any µ ∈ I_d(M) and integer r ≥ 1, we have h̄(µ, r) ≤ Σ_r(µ) and h̄(µ) ≤ Σ(µ).

As a corollary, we recover the "star-edge inequality" of [2, 3], which is a necessary condition for typicality: if µ is weakly typical then, by Lemma 1.3, h̄(µ) ≥ 0 and thus, by Theorem 1.5, Σ(µ) ≥ 0.

Corollary 1.6 ([2, 3]). If µ ∈ I_d(M) is a weakly typical process then Σ(µ) ≥ 0.

The main result of this paper is a matching lower bound for a large class of invariant processes. To this end, we first recall the notion of coupling, restricted to our setting. Let M_1 and M_2 be two finite sets, and let X_1 and X_2 be two random variables on M_1^{T_d} and M_2^{T_d} with respective laws µ_1 and µ_2. A coupling of µ_1 and µ_2 is a distribution ν on (M_1 × M_2)^{T_d} such that if Y = (Y_1, Y_2) has law ν, then Y_i has law µ_i for i = 1, 2. If X_i is an invariant process for i = 1, 2, we say that ν or Y is an invariant coupling if ν ∈ I_d(M_1 × M_2).

Theorem 1.7.
Let µ ∈ I_d(M). For any integer r ≥ 1, if all invariant couplings ν of µ and µ satisfy Σ_r(ν) ≤ 2Σ_r(µ), then h(µ, r) = h̄(µ, r) = Σ_r(µ). In particular, if the above condition holds for an increasing sequence of integers (r_k)_{k ≥ 0}, then h(µ) = h̄(µ) = Σ(µ).

We note that the bound Σ_r(ν) ≤ 2Σ_r(µ) is attained for the independent coupling Y = (X_1, X_2) with X_1, X_2 independent with law µ. Note also that if ν is the trivial coupling of µ and µ, that is Y = (X, X) with X of law µ, we find Σ_r(ν) = Σ_r(µ). Hence, under the condition of Theorem 1.7, we have Σ_r(µ) ≤ 2Σ_r(µ), or equivalently Σ_r(µ) ≥ 0. As a corollary, by Lemma 1.3, we thus obtain the following sufficient condition for typicality.

Corollary 1.8.
Let µ ∈ I_d(M) and let (r_k)_{k ≥ 0} be an increasing sequence of integers such that for all invariant couplings ν of µ and µ and all k ≥ 0 we have Σ_{r_k}(ν) ≤ 2Σ_{r_k}(µ). Then µ is strongly typical.

1.3. Edge-Markov processes.
There is a specific class of processes in I_d(M) for which it is possible to improve on Theorem 1.7: the edge-Markov processes, defined as follows. As above, for an integer r ≥ 0, B_r(S) is the r-neighborhood of a subset S in T_d. For r ≥ 1, recall that S_r = B_r(o) and E_r = B_{r−1}({o, 1}).

Definition 1.9 (Edge-Markov process). A probability measure on M^{T_d} is edge-Markov if, conditioned on the value at an edge, the processes on the left and right subtrees of that edge are independent. More generally, for an integer r ≥ 1, a probability measure on M^{T_d} is r-Markov if, conditioned on the value at B_{r−1}(e), the (r − 1)-neighborhood of an edge e, the processes on the left and right subtrees of e are independent (for r = 1 we recover the edge-Markov processes).

Let I_{d,r}(M) denote the set of probability measures on M^{S_r} that are invariant under automorphisms of S_r and whose restriction to E_r is invariant under switching the two sides of the edge {o, 1}. If µ ∈ I_d(M), then µ_r, its restriction to S_r, is in I_{d,r}(M). Conversely, the following lemma is easy to see.

Lemma 1.10.
Let r ≥ 1 be an integer and p ∈ I_{d,r}(M). Then there is a unique r-Markov process µ(p) ∈ I_d(M) such that the marginal of µ(p) on S_r is equal to p.

If p ∈ I_{d,r}(M), we define Σ(p) = Σ_r(µ(p)) = H(X_{S_r}) − (d/2) H(X_{E_r}) as in Equation (1.4), where X has law µ(p). As above, if p_1 and p_2 are probability measures on M_1^{S_r} and M_2^{S_r}, a coupling of p_1 and p_2 is a probability measure on (M_1 × M_2)^{S_r} whose marginals are p_1 and p_2. The following theorem is a strengthening of Theorem 1.7 for edge-Markov processes.

Theorem 1.11.
Let r ≥ 1 be an integer and p ∈ I_{d,r}(M). If for all couplings q ∈ I_{d,r}(M × M) of p and p we have Σ(q) ≤ 2Σ(p), then h(µ(p)) = h̄(µ(p)) = Σ(p) and µ(p) is strongly typical.

Theorem 1.11 provides an easy-to-check criterion for typicality for edge-Markov processes. In the course of the proof, we will need an important maximizing property satisfied by edge-Markov processes (a closely related characterization can be found in [12, Theorem 1.3] and [3, Lemma 10.1]).
Lemma 1.12.
Let X ∈ I_d(M) and r ≥ 1. We have Σ_{r+1}(X) ≤ Σ_r(X), with equality if and only if X_{S_{r+1}} is an r-Markov process on S_{r+1}.

1.4. Vertex-Markov processes.
There is a subclass of edge-Markov processes for which the annealed entropy takes a particularly simple form.
Definition 1.13 (Vertex-Markov process). Let T be a tree; a probability measure on M^T is vertex-Markov if, conditioned on the value at a vertex, the processes on the pending subtrees of that vertex are independent.

Let I_e(M) denote the set of probability measures on M^{E_1} that are invariant under switching the two sides of the edge {o, 1}. If µ ∈ I_d(M), then its restriction to E_1 is in I_e(M). Conversely, if p ∈ I_e(M), there exists a unique vertex-Markov process in I_d(M) whose restriction to E_1 is p. We denote the law of this process by µ(p). If X ∈ I_d(M), we define

Σ_e(X) = (d/2) H(X_{E_1}) − (d − 1) H(X_o).

If p ∈ I_e(M), we set Σ(p) = Σ_e(µ(p)). Vertex-Markov processes satisfy the following extremal property.

Lemma 1.14. If X ∈ I_d(M) then Σ_1(X) ≤ Σ_e(X), with equality if and only if X_{S_1} is a vertex-Markov process on S_1.

Combined with Theorem 1.11, the above lemma implies the following corollary.
Theorem 1.15.
Let p ∈ I_e(M). If for all couplings q ∈ I_e(M × M) of p and p we have Σ(q) ≤ 2Σ(p), then h(µ(p)) = h̄(µ(p)) = Σ(p) and µ(p) is strongly typical.

Proof. Let p′ = µ(p)_1 ∈ I_{d,1}(M) be the law of µ(p) restricted to S_1; since µ(p) is vertex-Markov, Lemma 1.14 gives Σ(p′) = Σ_1(µ(p)) = Σ_e(µ(p)) = Σ(p). Let q′ be an invariant coupling of p′ and p′ and let q be its restriction to E_1. By construction, q ∈ I_e(M × M). Moreover, by Lemma 1.14, Σ(q′) ≤ Σ_1(µ(q)) = Σ_e(µ(q)) = Σ(q) ≤ 2Σ(p) = 2Σ(p′). It follows that Theorem 1.15 is a consequence of Theorem 1.11 applied to r = 1 and p′ = µ(p)_1. □
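As a small numerical illustration (our own sketch; the two symmetric edge laws below are example inputs), the quantity Σ(p) = Σ_e(µ(p)) can be evaluated directly from a symmetric probability law p on M × M:

```python
import math

def H(p):
    """Shannon entropy with natural logarithm; p maps outcomes to probabilities."""
    return -sum(q * math.log(q) for q in p.values() if q > 0)

def sigma_e(edge_law, d):
    """Sigma_e(X) = (d/2) H(X_{E_1}) - (d - 1) H(X_o) for the vertex-Markov
    process determined by a symmetric law edge_law on M x M (restriction to E_1)."""
    root = {}
    for (a, _b), q in edge_law.items():
        root[a] = root.get(a, 0.0) + q
    return (d / 2) * H(edge_law) - (d - 1) * H(root)

# i.i.d. uniform colors on M = {0, 1}: Sigma_e = ln 2 for every d
iid = {(a, b): 0.25 for a in (0, 1) for b in (0, 1)}
# perfectly correlated colors ("global coin flip"): Sigma_e = -(1/2) ln 2 for d = 3
corr = {(0, 0): 0.5, (1, 1): 0.5}
```

For d = 3, the i.i.d. uniform law gives Σ_e = ln 2 > 0, while the perfectly correlated law gives Σ_e = −(1/2) ln 2 < 0; by Corollary 1.6, the latter process cannot be typical.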
In this paragraph, we discuss abasic connection between asymptotic free energy of factor graphs and sofic entropy. This may serve an extramotivation for studying the sofic entropy.Let M be a finite set, r ≥ be an integer and let ϕ be a function on the set of rooted unlabeled M -coloredgraphs of radius r taking value in (0 , ∞ ) . If G ∈ G n ( d ) , Z G = X f ∈ M n n Y v =1 ϕ (( G, f, v ) r ) , here ( G, f, v ) r is the rooted colored graph associated to the ball of radius r around v in G . We set ψ = ln ϕ. The asymptotic free energy is defined as the limit of (1 /n ) ln Z G n where G n is a uniformly sampled graphin G n ( d ) (provided that the limit exists). By standard concentration inequality (see argument in Theorem2.1), it is easy to check that if G n is uniformly sampled in G n ( d ) , then, in probability, as n goes to infinity, n ln Z G n − E n ln Z G n → . (1.5)It is straightforward to express the limits of the expected free energy in terms of the entropy. If p ∈I d,r ( M ) , we set for ease of notation h ( p ) = h ( µ ( p ) , r ) and similarly for ¯ h ( p ) (there are the upper and lowergrowth rates of the number of colorings of G n whose r -neighborhood is close to p ). In the statement below,we use the notation h p, ψ i = E ψ ( X ) where X has law p . Lemma 1.16.
For an integer r ≥ 1 and ψ as above, if G_n is uniformly distributed on G_n(d) (with nd even and n ≥ d + 1), we have

sup_{p ∈ I_{d,r}(M)} ( h(p) + ⟨p, ψ⟩ ) ≤ lim inf_{n→∞} E (1/n) ln Z_{G_n} ≤ lim sup_{n→∞} E (1/n) ln Z_{G_n} ≤ sup_{p ∈ I_{d,r}(M)} ( h̄(p) + ⟨p, ψ⟩ ).

In particular, if Conjecture 1.4 holds true, then it would automatically imply the convergence of the expected free energy for all functions ψ. Note also that Theorem 1.7 can be used to obtain a lower bound on the expected free energy, while Theorem 1.5 can be used to get an upper bound. With proper technical conditions, it is possible to extend Lemma 1.16 to some hard-constrained models, that is, to some functions φ = e^ψ which take values in [0, ∞). For simplicity, we will however not discuss this possibility in detail here.

In the same vein, in combinatorial optimization problems, we are often interested in the computation of a graph functional of the form

L_G = max_{f ∈ M^n} Σ_{v=1}^n ψ((G, f, v)_r),

with r ≥ 1 and ψ as above. Again, it is easy to check that if G_n is uniformly sampled in G_n(d), we have, in probability, L_{G_n}/n − E L_{G_n}/n → 0. The following statement is a corollary of Lemma 1.16. It shows that typical processes are intimately connected to the computation of limits of E L_{G_n}/n.

Lemma 1.17.
For an integer r ≥ 1 and ψ as above, if G_n is uniformly distributed on G_n(d) (with nd even and n ≥ d + 1), we have

sup_{p ∈ I_{d,r}(M) : h(p) ≥ 0} ⟨p, ψ⟩ ≤ lim inf_{n→∞} E L_{G_n}/n ≤ lim sup_{n→∞} E L_{G_n}/n ≤ sup_{p ∈ I_{d,r}(M) : h̄(p) ≥ 0} ⟨p, ψ⟩.

Again, we observe that Theorem 1.7 can be used to obtain a lower bound on E L_{G_n}/n, and Theorem 1.5 an upper bound.

Remark. We conclude this paragraph by mentioning that, for r = 1, there is a simplification of Lemma 1.16 and Lemma 1.17 for functions ψ of the form

ψ((G, f, v)_1) = ψ_1(f(v)) + Σ_{u : u ∼ v} ψ_2(f(v), f(u)),

where the supremum in Lemma 1.16 and Lemma 1.17 is taken over p ∈ I_e(M) instead of p ∈ I_{d,1}(M), and the entropic term is given by h(p) = sup h(q), where the supremum is over all q ∈ I_{d,1}(M) whose restriction to E_1 is p, and similarly for h̄(p). This can be useful because it reduces the dimension of the underlying optimization problem. In that case, Theorem 1.15 can be used to give lower bounds.

1.6. Organization of the paper.
The remainder of this text is organized as follows. In Section 2, we establish the key properties of the sofic and annealed entropies. In Section 3, we prove the main results of this paper. In the final Section 4, we extend our framework and main results to invariant processes on unimodular Galton-Watson trees.

2. Properties of sofic and annealed entropies
2.1. Concentration of entropy: proof of Lemma 1.2.
Let G_n be a uniformly distributed random graph on G_n(d) with n ≥ d + 1 and nd even. Recall the definition of H_{G_n}(µ, r, ε) in (1.3). The aim of this subsection is to establish the following concentration result.

Theorem 2.1.
Let r ≥ 0, µ ∈ I_d(M), and h ∈ {−∞} ∪ [0, ∞). For all continuous functions δ : [0, ∞) → [0, ∞) with δ(0) = 0, we have the following: if for all ε > 0,

lim sup_{n→∞} (1/n) log P( H_{G_n}(µ, r, ε) ≥ h ) ≥ −δ(ε),   (2.1)

then h̄(µ, r, α) ≥ h for all 0 < α < 1. Conversely, there exists a function δ as above, positive on (0, ∞), such that: if for all ε > 0,

lim sup_{n→∞} (1/n) log P( H_{G_n}(µ, r, ε) ≤ h ) ≥ −δ(ε),   (2.2)

then h̄(µ, r, α) ≤ h for all 0 < α < 1. Finally, the same claims hold with lim inf and h replacing lim sup and h̄.

It is immediate to check that Lemma 1.2 is a corollary of Theorem 2.1. Beware of the asymmetry between the lower and the upper bound; we believe that it is a caveat of our proof. It is ultimately due to the fact that H_{G_n}(µ, r, ε) can be equal to −∞.

The proof of Theorem 2.1 makes a detour through a relaxation of the entropy. Fix µ ∈ I_d(M) and r ≥ 0. If G is in G_n(d), we define, for β > 0,

Z_G(β) = Σ_{f ∈ M^n} e^{−nβ d(distr_G(f)_r, µ_r)}.

We start the proof of the theorem with a concentration inequality.
Lemma 2.2.
Let G_n be uniformly distributed on G_n(d) with nd even and n ≥ d + 1. There exist a constant C depending on (d, r) and a deterministic number s_n(β) depending on (n, d, r, β) such that for any t > 0, we have

P( |(1/n) ln Z_{G_n}(β) − s_n(β)| ≥ t ) ≤ C exp( −n t² / (Cβ²) ).

Proof.
The proof follows a standard path. By classical contiguity results, it is enough to establish the claim for the configuration model (see Bollobás [10, Section 2.4]). Recall that the configuration model is the graph (with possible loops and multiple edges) obtained as follows. We attach d half-edges to each vertex in [n]. We sample a matching m on the set ~E of the nd half-edges uniformly at random (recall that a matching is an involution without fixed point). Finally, we form a d-regular graph G = G(m) by creating an edge for each pair of matched half-edges.

Let us say that two matchings m, m′ differ by a switch if there exist (a, b, c, d) in ~E such that m(e) = m′(e) for all e ∈ ~E \ {a, b, c, d} and m(a) = b, m′(a) = c, m(c) = d, m′(b) = d. If m and m′ differ by a switch, we claim that for any f ∈ M^n,

d(distr_{G(m)}(f)_r, distr_{G(m′)}(f)_r) ≤ C₀ d (d − 1)^r / n = θ/n,

where C₀ is the diameter of P(G•_M) for the distance d. Indeed, we have distr_{G,v}(f)_r = distr_{G′,v′}(f′)_r if the rooted subgraphs (G, f, v)_r and (G′, f′, v′)_r are isomorphic. Notably, distr_{G(m),v}(f)_r = distr_{G(m′),v}(f)_r unless v is at distance at most r from an edge in the symmetric difference of G(m) and G(m′). We deduce that

e^{−βθ} ≤ Z_{G(m′)} / Z_{G(m)} ≤ e^{βθ}   and   |ln Z_{G(m)} − ln Z_{G(m′)}| ≤ βθ.

From [29, Theorem 2.19], if m is a uniformly sampled matching on ~E, we get

P( |(1/n) ln Z_{G(m)} − E (1/n) ln Z_{G(m)}| ≥ t ) ≤ 2 exp( −n t² / (2dθ²β²) ).

The conclusion follows with s_n(β) = E (1/n) ln Z_{G(m)}(β). □
Proof of Theorem 2.1.
Recall the definition of F G ( µ, r, ǫ ) in (1.2). Since µ and r are fixed, we write simply F G ( ǫ ) and set F G ( ǫ ) = |F G ( ǫ ) | . For any ǫ > , we have ln Z G ( β ) ≥ ln F G ( ǫ ) − nβǫ, (2.3)where we have used that d(distr G ( f ) r , µ r ) ≤ ǫ for all f ∈ F G ( ǫ ) . The other way around, if f / ∈ F G ( ǫ ) , wehave d(distr G ( f ) r , µ r ) ≥ ǫ . Hence, ln Z G ( β ) ≤ ln (cid:0) | M | n e − nβǫ + F G ( ǫ ) (cid:1) ≤ ln 2 + (ln F G ( ǫ )) ∨ ( n (ln | M | − βǫ )) . (2.4)We may now prove the first claim of the theorem. Assume that (2.1) holds for some h ≥ (if h = −∞ ,there is nothing to prove). Let h < h and E = { H G n ( µ, r, ǫ ) ≥ h } . On the event E , from (2.3), we have forall β > , n ln Z G n ( β ) ≥ h − βǫ. There exists β ǫ such that, as ǫ → , β ǫ ǫ → , β ǫ → ∞ and β ǫ δ ( ǫ ) → . For this choice of β , ( t ǫ /β ǫ ) ≫ δ ( ǫ ) for some t ǫ → . It follows from Lemma 2.2 and (2.1) that for any h , h such that h < h < h < h , s n ( β ǫ ) ≥ h for all n large enough (depending on ǫ ). Let < α < . Applying again Lemma 2.2, we deducethat the event E = { n ln Z G n ( β ǫ ) ≥ h } has probability greater than α for all n large enough.Now, there exists η ǫ such that, as ǫ → , η ǫ → and β ǫ η ǫ → ∞ (for example η ǫ = 1 / √ β ǫ ). We apply(2.4) with ǫ = η ǫ . We get on the event E , if ǫ is small enough, H G n ( µ, r, η ǫ ) ≥ n ln Z G n ( β ǫ ) − n ln 2 ≥ h − n ln 2 . The right-hand side is larger than h if n is large enough. We deduce that ¯ h ( µ, r, α ) ≥ h , since h can bearbitrarily close to h , the first claim follows.The second claim is proven similarly. Since H G n ( µ, r, ǫ ) takes value in {−∞}∪ [0 , ∞ ) , we have H G n ( µ, r, ǫ ) ≤− if and only if H G n ( µ, r, ǫ ) = −∞ . We may thus prove the second claim with h ≥ − . Assume that (2.2)holds for some δ ( ǫ ) which will be defined later on. We now set E = { H G n ( µ, r, ǫ ) ≤ h } . On the event E , from(2.4), we have for all β > , n ln Z G n ( β ) ≤ n ln 2 + h ∨ (ln | M | − βǫ ) . 
If β ≥ β_ε = (ln |M| + 1)/ε, then we get, for all n large enough,

(1/n) ln Z_{G_n}(β) ≤ (1/n) ln 2 + h.

We choose δ(ε) so that δ(ε) β_ε² → 0 as ε → 0 (for example δ(ε) = ε³). Then, if h < h₁ < h₂ < h₃ and ε is small enough, we deduce by Lemma 2.2 that s_n(β_ε) ≤ h₁ for all n large enough. We apply again Lemma 2.2 and deduce that the event E₂ = {(1/n) ln Z_{G_n}(β_ε) ≤ h₂} has probability greater than α for all n large enough. Finally, let η_ε be such that, as ε → 0, η_ε → 0 and β_ε η_ε → 0. From (2.3) applied with η_ε, we have on the event E₂,

H_{G_n}(µ, r, η_ε) ≤ h₂ + β_ε η_ε.

The latter is less than h₃ for all ε small enough. The second claim follows. Obviously, the same argument works with lim inf and h(µ, r, α). □
In this sub-section, we prove Lemma 1.12 and Lemma 1.14. If
X, Y are discrete random variables, we recall that therelative entropy of X given Y is H ( X | Y ) = − X x,y P (( X, Y ) = ( x, y )) ln P ( X = x | Y = y ) , where P ( A | B ) = P ( A ∩ B ) / P ( B ) is the usual conditional probability (if P ( B ) = 0 , P ( A | B ) takes an arbitraryvalue). In other words, H ( X | Y ) is the average over Y of the entropy of the conditional law of X given Y .We will repeatedly use that H ( X, Y ) = H ( Y ) + H ( X | Y ) and H ( X | ( Y, Y ′ )) ≤ H ( X | Y ) , (2.5)with equality if and only if X conditioned on Y is independent of Y ′ .We start with the proof of Lemma 1.12. Proof of Lemma 1.12.
The following fact is useful. For a given integer r ≥ , we introduce the finite set N = M S r − where as usual S r − = B r − ( o ) . We consider the map Ψ from M T d to N T d which maps x to Ψ( x ) such that for v ∈ T d , Ψ( x ) v is the restriction of x to S r − ( v ) (composed by a given isomorphism from S r − ( v ) to S r − ). If X is a process on T d then for all integers t ≥ , we have Σ t + r ( X ) = Σ t +1 (Ψ( X )) .Moreover, if X is a r -Markov process, then Ψ( X ) is an edge-Markov process.As a byproduct, it is sufficient to prove Theorem 1.12 with r = 1 : we should check that Σ ( X ) ≤ Σ ( X ) with equality if and only if X S is an edge Markov process. Note that the above inequality can be equivalentlywritten as H ( X S ) − H ( X S ) − d H ( X E ) + d H ( X E ) ≤ . (2.6)To check that (2.6) holds, we need some extra notation. We denote by L = { , . . . , d } the left side of E along E = { o, } . We also set L i = { ( i, , . . . , ( i, d − } with i = 1 , . . . , d . We have S = L ∪ E and thus,from (2.5) H ( S ) = H ( E ) + H ( L | E ) , where for ease of notation, for sets S, T , we write H ( S ) and H ( S | T ) in place of H ( X S ) and H ( X S | X T ) .Similarly, since E = S ∪ L , H ( E ) = H ( S ) + H ( L | S ) = H ( E ) + H ( L | E ) + H ( L | S ) . Finally, since S is the disjoint union of E and ∪ di =2 L i , we have, H ( S ) = H ( E ) + H ( ∪ di =2 L i | E ) . The last three identities imply that Equation (2.6) is equivalent to H ( ∪ di =2 L i | E ) − (cid:18) d − (cid:19) H ( L | S ) − d H ( L | E ) ≤ . (2.7)Using the invariance, we deduce from (2.5) that H ( ∪ di =2 L i | E ) ≤ ( d − H ( L | E ) with equality if and only if there is conditional independence of the X L i ’s given X E . Now, since E contains S , we get H ( L | E ) ≤ H ( L | S ) = H ( L | S ) , with equality in case of conditional independence of X L and X E \ S given X S . It follows that the left-handside of (2.7) is upper bounded by d H ( L | S ) − d H ( L | E ) . 
(2.8)

From the invariance of X under switching the two sides of e = {o, 1}, we get H(L | E_1) = H(L_1 | E_1), and thus (2.8) is equal to

(d/2) (H(L_1 | S_1) − H(L_1 | E_1)).

Using (2.5) again, since E_1 ⊂ S_1, this last expression is always non-positive, with equality if and only if X_{L_1} is conditionally independent of X_{S_1} given X_{E_1}. This proves that (2.6) holds. By considering the cases of equality, it is then easy to check that they imply that X_{S_2} is an edge-Markov process. It concludes the proof of Lemma 1.12. □

We now prove Lemma 1.14.
Proof of Lemma 1.14.
Let X ∈ I_d(M). From (2.5), with the notation used in the proof of Lemma 1.12, we have

H(S_1) = H(o) + H(S_1 | o) ≤ H(o) + d H(E_1 | o),

with equality if the variables (X_o, X_i), 1 ≤ i ≤ d, conditioned on X_o, are independent. Using (2.5) again, we get H(S_1) ≤ d H(E_1) − (d − 1) H(o). So finally,

Σ_1(X) = H(S_1) − (d/2) H(E_1) ≤ (d/2) H(E_1) − (d − 1) H(o) = Σ_e(X),

as requested. □

Combinatorial characterization of the annealed entropy.
In this subsection, we give a combinatorial interpretation of the annealed entropy Σ_r(µ). Recall that G_n(d) is the set of simple d-regular graphs on the vertex set [n]. For µ ∈ I_d(M), an integer r ≥ 1 and ε > 0, we define the set of colored graphs whose r-neighborhood statistics are close to µ_r as

G_n(µ, r, ε) = {(G, f) : G ∈ G_n(d), f ∈ M^n, d(distr_G(f)_r, µ_r) ≤ ε} = ⊔_{G ∈ G_n(d)} F_G(µ, r, ε),

where ⊔ is the disjoint union and F_G(µ, r, ε) was defined in (1.2). We then set

Σ_n(µ, r, ε) = (1/n) (log |G_n(µ, r, ε)| − log |G_n(d)|) = (1/n) log E|F_{G_n}(µ, r, ε)|,   (2.9)

where the expectation is with respect to the random graph G_n uniformly distributed on G_n(d). In comparison with the definition of H_{G_n}(µ, r, ε) in (1.3), Σ_n(µ, r, ε) appears as an annealed quantity in the sense that there is an average over the randomness of G_n inside the logarithm. The following theorem asserts that Σ_n(µ, r, ε) is close to Σ_r(µ) as n goes to infinity and ε goes to 0.

Theorem 2.3.
Let µ ∈ I_d(M) and r ≥ 1 an integer. We have

lim_{ε→0} liminf_{n→∞} Σ_n(µ, r, ε) = lim_{ε→0} limsup_{n→∞} Σ_n(µ, r, ε) = Σ_r(µ).

Proof.
One side of this identity can be found in [3, Lemma 6.2]. We will however give a proof which relies on [16], which is a generalization of [12] to colored graphs. This is interesting because it connects [12, 16] to the entropic inequalities found in [2, 3]. First, a classical result of Bender and Canfield [7] implies that

(1/n) log |G_n(d)| = (d/2) log n − s(d) − log(d!) + o(1),   (2.10)

where s(d) = d/2 − (d/
2) log d. On the other hand, Propositions 5 and 6 in Delgosha and Anantharam [16] imply that

lim_{ε→0} liminf_{n→∞} ((1/n) log |G_n(µ, r, ε)| − (d/2) log n) = lim_{ε→0} limsup_{n→∞} ((1/n) log |G_n(µ, r, ε)| − (d/2) log n) = J_r(µ),

where J_r(µ) has an explicit formula that we now describe (the same formula appears in [12]). We define T̃^•_{r−1} as the set of unlabeled colored rooted (d−1)-ary trees of depth r−1. An element g = (t, t′) ∈ Ẽ_r = T̃^•_{r−1} × T̃^•_{r−1} can be seen as an unlabeled coloring of E_r rooted at the oriented edge (o, 1). For g = (t, t′) in Ẽ_r and X a coloring of T_d, we then define N_X(g) as the number of neighbors v of the root such that X restricted to E_r(o, v) = B_{r−1}({o, v}) is isomorphic to g: more precisely, such that the restriction of X to the (d−1)-ary tree rooted at o (respectively v) in E_r(o, v) \ {o, v} is isomorphic to t (respectively t′). By construction,

Σ_{g ∈ Ẽ_r} N_X(g) = deg(o) = d.   (2.11)

If X is a random coloring of T_d with law µ, we then define a probability measure on Ẽ_r by, for all g ∈ Ẽ_r:

π_µ(g) = E[N_X(g)] / d,

where the expectation is with respect to the randomness of X. We have

J_r(µ) = −s(d) + H(X̃_{S_r}) − (d/2) H(π_µ) − Σ_{g ∈ Ẽ_r} E[log(N_X(g)!)],

where X̃_{S_r} is the rooted unlabeled coloring associated to X_{S_r}. As a sanity check, if M is a singleton, then J_r(µ) = −s(d) − log(d!) and we retrieve Equation (2.10). Moreover, in view of Equation (2.10), the theorem follows from the claim

J_r(µ) = −s(d) − log(d!) + H(X_{S_r}) − (d/2) H(X_{E_r}).   (2.12)

The expression (2.12) is obtained by putting a random labeling on an unlabeled rooted coloring and following the effect on the Shannon entropy. We first observe that, since X is invariant, for any g ∈ Ẽ_r, we have

P(X_{E_r} ≃ g) = (1/d) Σ_{v=1}^d P(X_{E_r(o,v)} ≃ g) = π_µ(g).
It follows that π_µ is the law of X̃_{E_r}, defined as the unlabeled coloring associated to X_{E_r} rooted at the oriented edge (o, 1). Besides, since X is invariant, X_{E_r} is in one-to-one correspondence with the triple (X̃_{E_r}, σ, σ′) where, given X̃_{E_r}, σ and σ′ are independent, σ is a uniform random labeling of X̃_{E_r} restricted to E_r^o, the (d−1)-ary tree rooted at o in E_r \ {o, 1}, and similarly for σ′. From the relative entropy identity (2.5), we find that

H(X_{E_r}) = H(π_µ) + 2K,

where K is the relative entropy of σ given X̃_{E_r^o}. Secondly, we observe that X_{S_r} is in one-to-one correspondence with the vector Y = (X_{E_r^1}, . . . , X_{E_r^d}), where E_r^k is the (d−1)-ary tree rooted at k in E_r \ {o, k}. It follows that H(Y) = H(X_{S_r}). Also, if Ỹ = (X̃_{E_r^1}, . . . , X̃_{E_r^d}), we find from what precedes and the invariance of X that

H(X_{S_r}) = H(Y) = H(Ỹ) + dK.

Finally, the difference between Ỹ and X̃_{S_r} is that the neighbors of o are ordered (or labeled) in Ỹ. We deduce from Lemma 2.4 below that

H(Ỹ) = H(X̃_{S_r}) − Σ_{g ∈ Ẽ_r} E[log(N_X(g)!)] + log(d!).

This concludes the proof of (2.12). □
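Lemma 2.4 below, which relates the entropy of an exchangeable vector to that of its counting measure, can be verified exactly by enumeration on a small example. The sketch below (illustrative only; the alphabet F, the single-coordinate law and the length n are arbitrary choices of the demo) checks the identity H(Z) = H(N_Z) − Σ_x E[log N_Z(x)!] + log n!, which is the form produced by the multinomial computation in the proof, for an i.i.d. (hence exchangeable) vector Z:

```python
import itertools
import math
from collections import Counter

# An i.i.d. vector Z in F^n is exchangeable; enumerate its law exactly.
F = ['a', 'b', 'c']
marg = {'a': 0.5, 'b': 0.3, 'c': 0.2}   # arbitrary single-coordinate law
n = 4

pZ = {}                 # law of the vector Z
pN = Counter()          # law of the counting measure N_Z
E_log_fact = 0.0        # sum over x in F of E[log N_Z(x)!]
for z in itertools.product(F, repeat=n):
    q = math.prod(marg[s] for s in z)
    pZ[z] = q
    counts = Counter(z)
    pN[tuple(sorted(counts.items()))] += q
    E_log_fact += q * sum(math.lgamma(c + 1) for c in counts.values())

def H(p):
    """Shannon entropy (in nats) of a probability dict."""
    return -sum(q * math.log(q) for q in p.values() if q > 0)

lhs = H(pZ)
rhs = H(pN) - E_log_fact + math.lgamma(n + 1)   # lgamma(n+1) = log n!
assert abs(lhs - rhs) < 1e-10
```

The assertion holds exactly (up to floating-point error) because every exchangeable law factors through the equivalence classes counted by the multinomial coefficient, as in the proof of Lemma 2.4.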
In the proof of Theorem 2.3, we have used the following elementary lemma. Recall that a vector is exchangeable if its law is invariant under any permutation of its coordinates.
Lemma 2.4.
Let F be a finite set and Z = (Z_1, . . . , Z_n) a random exchangeable vector in F^n. The counting measure N_Z = Σ_{i=1}^n δ_{Z_i} associated to Z satisfies

H(Z) = H(N_Z) − Σ_{x ∈ F} E[log N_Z(x)!] + log n!.

Proof.
We consider the equivalence relation on F^n defined by z ∼ z′ if z and z′ are equal up to a permutation of the coordinates. We have z ∼ z′ if and only if N_z = N_{z′}. Moreover, the number of vectors in the equivalence class of z is given by the multinomial coefficient n! / ∏_{x ∈ F} N_z(x)!. Using the exchangeability of Z, we deduce that

P(Z = z) = (∏_{x ∈ F} N_z(x)! / n!) Σ_{z′ ∼ z} P(Z = z′) = (∏_{x ∈ F} N_z(x)! / n!) P(N_Z = N_z).

It then remains to use the relative entropy formula (2.5). □

Proofs of main results
First moment method: proof of Theorem 1.5.
Let r ≥ 1, ε > 0 and let G_n be uniformly sampled on G_n(d). From Markov's inequality, for any real h,

P(H_{G_n}(µ, r, ε) ≥ h) = P(|F_{G_n}(µ, r, ε)| ≥ e^{nh}) ≤ e^{−nh} E|F_{G_n}(µ, r, ε)|.

In particular, we find

(1/n) log P(H_{G_n}(µ, r, ε) ≥ h) ≤ Σ_n(µ, r, ε) − h.

From Theorem 2.3, we deduce the large deviation bound

limsup_{n→∞} (1/n) log P(H_{G_n}(µ, r, ε) ≥ h) ≤ Σ_r(µ) − h + δ(ε),

where δ(ε) goes to 0 as ε → 0. If h > Σ_r(µ), the right-hand side of the above expression is negative for all ε small enough. We deduce in particular that for all ε small enough, P(H_{G_n}(µ, r, ε) ≥ h) converges to 0. By Lemma 1.2, this proves that h̄(µ, r) < h. Since h > Σ_r(µ) was arbitrary, it concludes the proof of Theorem 1.5. □

Second moment method: proof of Theorem 1.7.
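The key probabilistic tool in the proof below is the Paley-Zygmund inequality: for a non-negative random variable W and θ ∈ (0, 1), P(W ≥ θ E[W]) ≥ (1 − θ)² (E[W])² / E[W²]. It is applied with W = |F_{G_n}(p, ε)| and θ = e^{−1}. A quick numerical illustration on a toy distribution (all values below are made up for the demo and have no connection to the graph model):

```python
import math

# A non-negative random variable W given by (value, probability) pairs.
support = [(0.0, 0.35), (1.0, 0.4), (5.0, 0.2), (20.0, 0.05)]

EW = sum(v * p for v, p in support)        # first moment E[W]
EW2 = sum(v * v * p for v, p in support)   # second moment E[W^2]

theta = 1 / math.e                          # the threshold used in the proof
tail = sum(p for v, p in support if v >= theta * EW)
lower = (1 - theta) ** 2 * EW ** 2 / EW2    # Paley-Zygmund lower bound
assert tail >= lower
```

The bound is useful precisely when the second moment E[W²] is of the same exponential order as (E[W])², which is what condition (3.2) below guarantees.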
Let µ ∈ I_d(M), r ≥ 1 and set p = µ_r ∈ I_{d,r}(M). In view of Theorem 1.5, we should prove that

h(µ, r) ≥ Σ(p).   (3.1)

For ease of notation, we write F_G(p, ε) in place of F_G(µ, r, ε) (since this depends on µ only through µ_r = p). The Paley-Zygmund inequality implies that

P(H_{G_n}(µ, r, ε) ≥ Σ_n(µ, r, ε) − 1/n) = P(|F_{G_n}(p, ε)| ≥ e^{−1} E|F_{G_n}(p, ε)|)
≥ (1 − e^{−1})² (E|F_{G_n}(p, ε)|)² / E|F_{G_n}(p, ε)|² = (1 − e^{−1})² exp(2n Σ_n(µ, r, ε)) / E|F_{G_n}(p, ε)|².

Since µ_r = p, we have Σ(p) = Σ_r(µ) and, by Theorem 2.3,

liminf_{n→∞} Σ_n(µ, r, ε) ≥ Σ(p) − δ(ε),

where δ(ε) goes to 0 as ε → 0. We deduce that if we manage to prove that

limsup_{n→∞} (1/n) log E|F_{G_n}(p, ε)|² ≤ 2Σ(p) + δ′(ε),   (3.2)

where δ′(ε) goes to 0 as ε → 0, then we would get that

liminf_{n→∞} (1/n) log P(H_{G_n}(µ, r, ε) ≥ Σ(p) − δ(ε)) ≥ −2δ(ε) − δ′(ε).

From Equation (2.1) in Theorem 2.1, this would imply that h(µ, r) ≥ Σ(p), as claimed in (3.1). It thus remains to prove Equation (3.2). For concreteness, we may assume that the chosen distance d generating the weak topology is the total variation distance. To that end, let ε > 0 and let N_ε be an ε-net on the set of invariant couplings q ∈ I_{d,r}(M²) of p and p. Given a graph G ∈ G_n(d), consider two colorings of G with color set M whose r-neighborhood statistics are at most at total variation distance ε from p. The number of such pairs is |F_G(p, ε)|². On the other hand, each pair is in fact a coloring of G with color set M². Its r-neighborhood statistics form an element q′ ∈ I_{d,r}(M²). Since both marginals of q′ are at most at total variation distance ε from p, there is a measure q* ∈ I_{d,r}(M²) whose total variation distance from q′ is at most 2ε and whose marginals are both exactly p (for each marginal of q′, there is an invariant coupling of this marginal and p such that the two colorings are equal with probability at least 1 − ε).
Therefore there exists an element of the ε-net, q ∈ N_ε, such that the distance of q from the statistics of the original pair of colorings is at most 3ε. We conclude that

|F_G(p, ε)|² ≤ Σ_{q ∈ N_ε} |F_G(q, 3ε)|.

This implies that

E|F_{G_n}(p, ε)|² ≤ (1/|G_n(d)|) Σ_{G ∈ G_n(d)} Σ_{q ∈ N_ε} |F_G(q, 3ε)| = Σ_{q ∈ N_ε} exp(n Σ_n(µ(q), r, 3ε)).

It follows that

(1/n) log E|F_{G_n}(p, ε)|² ≤ max_{q ∈ N_ε} Σ_n(µ(q), r, 3ε) + (1/n) log |N_ε|.

By Theorem 2.3, we deduce that, for some function δ′(ε) going to 0 as ε → 0,

limsup_{n→∞} (1/n) log E|F_{G_n}(p, ε)|² ≤ max_{q ∈ N_ε} Σ(q) + δ′(ε).

By assumption, for any coupling q ∈ I_{d,r}(M²) of p and p, Σ(q) ≤ 2Σ(p). We have thus proved that (3.2) holds. □

Proof of Theorem 1.11.
In view of Theorem 1.5 and Theorem 1.7, it remains to prove that for any t ≥ r,

h(µ(p), t) ≥ Σ_r(µ(p)) = Σ(p).

Let q ∈ I_{d,t}(M²) be an invariant coupling of (µ(p))_t and (µ(p))_t. Then q_r is an invariant coupling of p and p (since ((µ(p))_t)_r = µ(p)_r = p by construction). By Lemma 1.12, we have

Σ(q) = Σ_t(µ(q)) ≤ Σ_r(µ(q)) = Σ(q_r).

By assumption, Σ(q_r) ≤ 2Σ(p). However, by Lemma 1.12, we have Σ(p) = Σ(µ(p)_t). It follows that Σ(q) ≤ 2Σ(µ(p)_t). From Equation (3.1) applied to the radius t, this implies that h(µ(p), t) ≥ Σ(µ(p)_t). By a last application of Lemma 1.12, the right-hand side of the above expression is equal to Σ(p). This concludes the proof of Theorem 1.11. □

Application to factor graphs: proofs of Lemma 1.16 and Lemma 1.17.
We start with the proof of Lemma 1.16.
Proof of Lemma 1.16.
By construction, we have

Z_G = Σ_{f ∈ M^n} ∏_{v=1}^n e^{ψ((G,f,v)_r)} = Σ_{f ∈ M^n} e^{n⟨distr_G(f)_r, ψ⟩}.

Let ε > 0 and let N_ε be an ε-net of I_{d,r}(M). The function p ↦ ⟨p, ψ⟩ being uniformly continuous (since M is finite), there exists a function δ(ε) → 0 as ε → 0 such that for any probability measure q on rooted colored graphs of radius r, if d(q, p) ≤ ε then |⟨q, ψ⟩ − ⟨p, ψ⟩| ≤ δ(ε). If N_G is the number of colorings f such that distr_G(f)_r is at distance larger than ε from N_ε, it follows that

Z_G ≤ Σ_{p ∈ N_ε} |F_G(µ(p), r, ε)| e^{n⟨p,ψ⟩ + nδ(ε)} + N_G e^{n‖ψ‖_∞}
≤ |N_ε| max_{p ∈ I_{d,r}(M)} e^{n(H_G(µ(p), r, ε) + ⟨p,ψ⟩ + δ(ε))} + N_G e^{n‖ψ‖_∞}.

Now, if G_n is uniformly sampled on G_n(d), then, for any fixed ε > 0, P(N_{G_n} = 0) converges to 1 (since distr_{G_n} converges in probability to a Dirac mass at (T_d, o)). Using (1.5) and taking the limit in n, we find

limsup_{n→∞} E (1/n) ln Z_{G_n} ≤ max_{p ∈ I_{d,r}(M)} (h̄(p) + ⟨p, ψ⟩ + δ′(ε)),

with δ′(ε) → 0 as ε → 0. This gives the upper bound in Lemma 1.16. For the lower bound, note that for any p ∈ I_{d,r}(M), every f ∈ F_G(µ(p), r, ε) satisfies ⟨distr_G(f)_r, ψ⟩ ≥ ⟨p, ψ⟩ − δ(ε), so that

Z_G ≥ |F_G(µ(p), r, ε)| e^{n⟨p,ψ⟩ − nδ(ε)}, and hence Z_G ≥ max_{p ∈ I_{d,r}(M)} e^{n(H_G(µ(p), r, ε) + ⟨p,ψ⟩ − δ(ε))}.

The conclusion follows easily. □
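The upper bound above rests on a standard Laplace-type principle: a sum of finitely many terms of order e^{n a_i} is dominated by its largest term, since (1/n) log Σ_i e^{n a_i} lies between max_i a_i and max_i a_i + (log K)/n for K terms. A self-contained numerical illustration (the exponents below are arbitrary stand-ins, not quantities from the model):

```python
import math

def log_sum_exp_rate(a, n):
    """(1/n) * log( sum_i exp(n * a_i) ), computed stably via the max trick."""
    m = max(a)
    return m + math.log(sum(math.exp(n * (x - m)) for x in a)) / n

exponents = [0.2, 0.55, 0.54, -1.0]   # K = 4 competing exponential rates
K = len(exponents)
gaps = [log_sum_exp_rate(exponents, n) - max(exponents) for n in (10, 100, 1000)]
# each gap lies in [0, log(K)/n], so the sum has the rate of its largest term
```

This is exactly why the |N_ε| prefactor and the N_G error term above do not affect the exponential rate as n → ∞.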
Lemma 1.17 is a corollary of Lemma 1.16.
Proof of Lemma 1.17.
For β > 0, let Z_G(β) be the partition function of the factor graph model:

Z_G(β) = Σ_{f ∈ M^n} ∏_{v=1}^n e^{βψ((G,f,v)_r)}.

By construction, we have |M|^{−n} Z_G(β) ≤ e^{βL_G} ≤ Z_G(β). By Lemma 1.16, we find

sup_{p ∈ I_{d,r}(M)} (h(p)/β + ⟨p, ψ⟩ − ln|M|/β) ≤ liminf_n E L_{G_n}/n ≤ limsup_n E L_{G_n}/n ≤ sup_{p ∈ I_{d,r}(M)} (h̄(p)/β + ⟨p, ψ⟩).

We recall that h̄(p) and h(p) take values in {−∞} ∪ [0, ln|M|]. We get

sup_{p ∈ I_{d,r}(M) : h(p) ≥ 0} (⟨p, ψ⟩ − ln|M|/β) ≤ liminf_n E L_{G_n}/n ≤ limsup_n E L_{G_n}/n ≤ sup_{p ∈ I_{d,r}(M) : h̄(p) ≥ 0} (ln|M|/β + ⟨p, ψ⟩).

We obtain the statement of the lemma by taking the limit β → ∞. □

Extension to processes on unimodular Galton-Watson trees
An extended setting.
We now discuss an extension to processes on random trees. We will focus our attention on unimodular Galton-Watson trees. In this section, we fix a probability measure π on the non-negative integers with positive and finite expectation:

d = Σ_{k=0}^∞ k π(k) ∈ (0, ∞).

We define π̂, the size-biased version of π, as the probability measure defined by: for all integers k ≥ 0,

π̂(k) = (k + 1) π(k + 1) / d.

Then, the unimodular Galton-Watson tree with degree distribution π is the Galton-Watson tree whose vertex set is a subset of N^f defined in (1.1), such that the root o has a number of offspring N_o with distribution π, indexed by 1, . . . , N_o, and all other vertices v have an independent number of offspring N_v with distribution π̂, indexed by (v,1), . . . , (v,N_v). We will denote by T a realization of this random tree and by UGW(π) the law of the rooted tree (T, o). We note that (T, o) is randomly labeled in the sense defined in Subsection 1.1. For example, if π is a Dirac mass at d, then T is the d-regular tree. If π is the Poisson distribution with mean d, then π̂ = π and T is a standard Galton-Watson tree with Poisson offspring distribution. As its name suggests, the random rooted tree T is unimodular. Recall that a random rooted graph (G, o) is unimodular if for all non-negative functions f on the set of doubly rooted graphs (a connected graph with two ordered distinguished vertices) which are invariant by isomorphisms, we have

E Σ_{v ∈ V} f(G, o, v) = E Σ_{v ∈ V} f(G, v, o),   (4.1)

where V is the vertex set of G and the expectation is with respect to the randomness of (G, o). If M is a finite set, an invariant process X on T is defined as a random colored tree (T, X) such that (T, X, o) is unimodular (that is, it satisfies (4.1) with G = (T, X) and f defined on the set of doubly rooted colored graphs which are invariant by isomorphisms).
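The size-biased law π̂(k) = (k+1)π(k+1)/d and the Poisson fixed-point property π̂ = π mentioned above are easy to verify numerically. The sketch below (illustrative only; the truncation level K and the mean lam are arbitrary choices of the demo) computes π̂ from a truncated Poisson distribution and checks that it reproduces the input:

```python
import math

def size_biased(pi):
    """pi_hat(k) = (k + 1) * pi(k + 1) / d, with d the mean of pi."""
    d = sum(k * p for k, p in enumerate(pi))
    return [(k + 1) * pi[k + 1] / d for k in range(len(pi) - 1)]

K, lam = 60, 2.5   # truncation level and Poisson mean (demo values)
poisson = [math.exp(-lam) * lam**k / math.factorial(k) for k in range(K)]
hat = size_biased(poisson)

# For Poisson offspring, size-biasing is a fixed point: pi_hat = pi.
err = max(abs(a - b) for a, b in zip(hat, poisson))
assert err < 1e-12
```

For a Dirac mass at d the same function returns a Dirac mass at d − 1, which is why the unimodular Galton-Watson tree with degree distribution δ_d is the d-regular tree.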
We denote by I_π(M) the set of laws of (T, X) with X an invariant coloring of T with color set M. Now, in order to define a relevant notion of sofic entropy, we need to choose an ensemble of finite graphs G_n such that distr G_n converges to UGW(π). A natural choice is the family of uniform random graphs with a given degree sequence. Let d_n = (d_n(1), . . . , d_n(n)) be a sequence of integers, indexed by a subset of N, whose sum is even and such that

distr d_n = (1/n) Σ_{v=1}^n δ_{d_n(v)}

converges weakly to π. For technical simplicity, we assume that the degree sequence is uniformly bounded: for some real ∆,

sup_n max_{1 ≤ v ≤ n} d_n(v) ≤ ∆.   (4.2)

Note in particular that (4.2) implies that the support of π is contained in {0, . . . , ∆}. From the Erdős-Gallai theorem [19], for all n large enough, the set G_n(d_n) of simple graphs with vertex set [n] = {1, . . . , n} such that each v ∈ [n] has degree d_n(v) is not empty. Under these conditions, if G_n is uniformly distributed on G_n(d_n), then almost surely distr G_n converges to UGW(π), see for example [27]. In the statements below, we will not repeat the above assumptions on the sequence (d_n). For a given probability measure µ ∈ I_π(M) (that is, µ is the law of an invariant coloring (T, X)), we can now reproduce the definitions of weakly and strongly typical processes and define the sofic entropy by taking limits of H_{G_n}(µ, r, ε) defined in (1.3). We do not repeat the definitions since they are identical, except that G_n is now a random graph uniformly distributed on G_n(d_n). For an integer r ≥ 1 and 0 < α < 1, we define the quantities h̄(µ, r, α) and h(µ, r, α) exactly as done below (1.3). Lemma 1.2 continues to hold in this more general setting.

Lemma 4.1.
Let µ ∈ I_π(M) and r ≥ 1. The function α ↦ (h̄(µ, r, α), h(µ, r, α)) is constant on (0, 1).

Proof. The proof of Theorem 2.1 works verbatim under the assumption (4.2). □
We define h̄(µ, r) and h(µ, r) as the common values of h̄(µ, r, α) and h(µ, r, α). The upper and lower sofic entropies h̄(µ) and h(µ) are the limits in r of h̄(µ, r) and h(µ, r). Exactly as in Lemma 1.3, as a corollary of Lemma 4.1, we obtain the following claim.
Let µ ∈ I_π(M). We have h̄(µ) ≥ 0 (resp. h(µ) ≥ 0) if and only if µ is weakly (resp. strongly) typical.

Annealed entropy.
In this broader setting, the annealed entropy is defined as follows. Let (T, o) be a randomly labeled rooted unimodular tree and X an invariant coloring of T with (T, X) having law µ. The degree of a vertex v of T is denoted by deg(v). We assume that d = E deg(o) > 0. As above, if r ≥ 0 is an integer and S is a subset of the vertices of T, B_r(S) is the subset of vertices of T at distance at most r from S. For r ≥ 1, we set S_r = B_r(o) and, if deg(o) ≥ 1, we set E_r = B_{r−1}({o, 1}) (since T is randomly labeled, the neighbors of the root are indexed by 1, . . . , deg(o)). We denote by X_{S_r} the colored tree (T, X) restricted to S_r: by construction, X_{S_r} has law µ_r. We also need to define the law of X restricted to E_r, but this requires a biasing of the tree T. This is done as follows. A (directed) edge-rooted graph is defined as a pair (G, ρ) formed by a connected graph G and a distinguished directed edge ρ = (u, v) (that is, {u, v} is an edge of the graph). Now, we denote by ~µ the law on colored edge-rooted trees defined by

~µ(·) = (1/d) E[deg(o) 1{(T, X, (o, 1)) ∈ ·}],   (4.3)

where (T, X) has law µ and d = E deg(o). Note that under the probability measure ~µ, o has degree at least 1 and thus {o, 1} is an edge of the tree. We denote by (~T, ~X, ρ), with ρ = (o, 1), a random variable with law ~µ. It is easy to check that Equation (4.1) implies that ~µ is invariant under switching the two sides of the oriented edge. Moreover, if T has law UGW(π), then ~T is given by two independent Galton-Watson trees with offspring distribution π̂ whose roots are connected by the root edge, see [1, Example 1.1]. In particular, if π is a Dirac mass at d, then T = ~T. We denote by ~X_{E_r} the colored tree (~T, ~X) restricted to E_r. The law of ~X_{E_r} is ~µ_r, the restriction of ~µ to E_r. We observe that ~µ_r depends on µ only through its marginal µ_r.
Now, if X is an invariant coloring of T with law µ ∈ I_π(M) and r ≥ 1 is an integer, we set

Σ_r(X) = Σ_r(µ) = H(X_{S_r}) − (d/2) H(~X_{E_r}) − H(π).   (4.4)

See Remark 4.5 for an alternative expression which is arguably more natural. Thanks to assumption (4.2), it is immediate that the above entropies are finite as soon as M is finite. Note also that Σ_r(µ) depends on µ only through µ_r. We will check in Lemma 4.10 below that Σ_r(µ) is non-increasing in r. We may thus define

Σ(µ) = lim_{r→∞} Σ_r(µ).

The quantities Σ_r(µ) and Σ(µ) are the annealed entropies of µ_r and µ. The following theorem generalizes Theorem 1.5.

Theorem 4.3.
For any µ ∈ I_π(M) and integer r ≥ 1, we have h̄(µ, r) ≤ Σ_r(µ) and h̄(µ) ≤ Σ(µ). There is also an analog of Theorem 1.7.
Theorem 4.4.
Let µ ∈ I_π(M). For any integer r ≥ 1, if all invariant couplings ν of µ and µ satisfy Σ_r(ν) ≤ 2Σ_r(µ), then h(µ, r) = h̄(µ, r) = Σ_r(µ). In particular, if the above condition holds for an increasing sequence of integers (r_k)_{k ≥ 1}, then h(µ) = h̄(µ) = Σ(µ).

Remark 4.5. The annealed entropy is also given by the formula:

Σ_r(X) = H(X_{S_r} | T_{S_r}) − (d/2) H(~X_{E_r} | ~T_{E_r}),

where H(X | Y) = H(X, Y) − H(Y) is the relative entropy. Indeed, ~T is the union of two independent copies of T′, a Galton-Watson tree with offspring distribution π̂, while T is the union of N independent copies of T′ attached to the root, with N independent with distribution π. It follows that H(~T_{E_r}) = 2 H(T′_{S_{r−1}}) and H(T_{S_r}) = H(N) + d H(T′_{S_{r−1}}) (from (2.5)). In particular, H(T_{S_r}) − (d/2) H(~T_{E_r}) = H(N) = H(π).

4.3. Markov processes.
There are extensions of Theorem 1.11 and Theorem 1.15 in our extended setting. The previous definitions of Markov processes carry over when conditioned on the random tree. More precisely, we use the following definitions.
Definition 4.6 (Markov process). Let (T, X) be a random coloring of a tree T with colors in a finite set M and law µ. For an integer r ≥ 1, X (or µ) is r-Markov if, conditioned on T and on the values on B_{r−1}(e), the (r−1)-neighborhood of an edge e, the processes on the left and right subtrees of e are independent. Similarly, X (or µ) is vertex-Markov if, conditioned on T and on the value at a vertex, the processes on the pending subtrees of that vertex are independent.

For an integer r ≥ 1, let I_{π,r}(M) denote the set of laws µ′ of colorings (T′, X′) on M which are randomly labeled, such that T′ has law UGW(π)_r (the law of the restriction of T to S_r) and such that ~µ′, defined as above, is invariant under switching the two sides of the oriented edge. If µ ∈ I_π(M), then µ_r ∈ I_{π,r}(M). Conversely, we have the following (see [12, Proposition 1.1]):

Lemma 4.7.
Let r ≥ 1 be an integer and p ∈ I_{π,r}(M). Then there is a unique r-Markov process µ(p) ∈ I_π(M) such that the marginal of µ(p) on S_r is equal to p.

If p ∈ I_{π,r}(M), we define Σ(p) = Σ_r(µ(p)) as in Equation (4.4). The following theorem is an extension of Theorem 1.11.

Theorem 4.8.
Let r ≥ 1 be an integer and p ∈ I_{π,r}(M). If for all couplings q ∈ I_{π,r}(M²) of p and p we have Σ(q) ≤ 2Σ(p), then h(µ(p)) = h̄(µ(p)) = Σ(p) and µ(p) is strongly typical.

There is a version of this theorem for vertex-Markov processes. As above, let I_e(M) denote the set of probability measures on M^{E_1} that are invariant under switching the two sides of the edge {o, 1}. If µ ∈ I_π(M), then the restriction of ~µ to E_1 is in I_e(M). Conversely, if p ∈ I_e(M), there exists a unique vertex-Markov process µ(p) in I_π(M) such that ~µ(p) restricted to E_1 is equal to p. If (T, X) ∈ I_π(M), we define

Σ_e(X) = (d/2) H(~X_{E_1}) − d H(~X_o) + H(X_o) − H(π).

If p ∈ I_e(M), we set Σ(p) = Σ_e(µ(p)). The following theorem is an extension of Theorem 1.15.

Theorem 4.9.
Let p ∈ I_e(M). If for all couplings q ∈ I_e(M²) of p and p we have Σ(q) ≤ 2Σ(p), then h(µ(p)) = h̄(µ(p)) = Σ(p) and µ(p) is strongly typical.

In the remainder of the paper, we explain the proofs of Theorem 4.3, Theorem 4.4, Theorem 4.8 and Theorem 4.9. The proofs are entirely similar to the proofs of the corresponding results for invariant processes on T_d. We will only sketch the proofs and explain the differences.

4.4. Maximizers of the annealed entropy.
The following lemma is the exact analog of Lemma 1.12 and Lemma 1.14.
Lemma 4.10.
Let X ∈ I_π(M) and r ≥ 1. We have Σ_{r+1}(X) ≤ Σ_r(X), with equality if and only if X_{S_{r+1}} is an r-Markov process on S_{r+1}. Moreover, Σ_1(X) ≤ Σ_e(X), with equality if and only if X_{S_1} is a vertex-Markov process on S_1.

Proof. We start with the first statement. Arguing as in the proof of Lemma 1.12, it is enough to check the inequality for r = 1. The inequality Σ_2(X) ≤ Σ_1(X) is equivalent to

H(X_{S_2}) − H(X_{S_1}) − (d/2) H(~X_{E_2}) + (d/2) H(~X_{E_1}) ≤ 0.   (4.5)

Let deg_T(o) and deg_~T(o) be the degrees of the root in T and ~T. For an integer k ≥ 1, from (4.3), we have, for any event A,

P((~X, ~T) ∈ A, deg_~T(o) = k) = (k/d) P((X, T) ∈ A, deg_T(o) = k).

It follows that P(deg_~T(o) = k) = kπ(k)/d and, if k ≥ 1 is in the support of π,

P((~X, ~T) ∈ A | deg_~T(o) = k) = P((X, T) ∈ A | deg_T(o) = k).   (4.6)

In other words, (X, T) and (~X, ~T) have the same law when conditioned on the root degree being k ≥ 1. For a set S of vertices, let us denote by H_k(S) the entropy of the variable (T, X) restricted to S and conditioned on the event deg_T(o) = k. Similarly, H_k(S | S′) = H_k(S, S′) − H_k(S′) is the associated relative entropy. From (2.5), we may write, for t = 1, 2,

H(X_{S_t}) = H(deg_T(o)) + π(0) H(X_o) + Σ_{k=1}^∞ π(k) H_k(S_t)

and

H(~X_{E_t}) = H(deg_~T(o)) + Σ_{k=1}^∞ (kπ(k)/d) H_k(E_t).

Hence, (4.5) is equivalent to the claim:

Σ_{k=1}^∞ π(k) (H_k(S_2) − H_k(S_1) − (k/2) H_k(E_2) + (k/2) H_k(E_1)) ≤ 0.   (4.7)

On the event deg_T(o) = k, for 1 ≤ i ≤ k, let L_i = {(i,1), . . . , (i, N_i)} be the offspring of vertex i, and let L = {2, . . . , k} be the neighbors of o distinct from 1. Note that the random variables X_{L_i}, conditioned on the event deg_T(o) = k, are exchangeable.
Then, the computation from (2.6) to (2.8) gives

H_k(S_2) − H_k(S_1) − (k/2) H_k(E_2) + (k/2) H_k(E_1) ≤ (k/2) H_k(L_1 | S_1) − (k/2) H_k(L | E_1).

The left-hand side of (4.7) is thus upper bounded by

Σ_{k=1}^∞ (k/2) π(k) (H_k(L_1 | S_1) − H_k(L | E_1)) = (d/2) (H(~X_{L_1} | ~X_{S_1}) − H(~X_L | ~X_{E_1})).

We now use the invariance of ~µ under switching the two sides of the edge E_1 = {o, 1}. We get H(~X_L | ~X_{E_1}) = H(~X_{L_1} | ~X_{E_1}). Finally, since E_1 ⊂ S_1, we deduce that the inequalities (4.5)-(4.7) hold. As in the proof of Lemma 1.12, the case of equality is a direct consequence of the case of equality in (2.5).

We now prove the second statement of Lemma 4.10, Σ_1(X) ≤ Σ_e(X). It is equivalent to prove that

H(X_{S_1}) − (d/2) H(~X_{E_1}) ≤ (d/2) H(~X_{E_1}) − d H(~X_o) + H(X_o).

Arguing as above, we find that this is equivalent to

Σ_{k=1}^∞ π(k) (H_k(S_1) − k H_k(E_1) + (k − 1) H_k(o)) ≤ 0.

As in the proof of Lemma 1.14, it remains to use that H_k(S_1) ≤ H_k(o) + k H_k(E_1 | o) and H_k(E_1 | o) = H_k(E_1) − H_k(o). In the case of equality, this implies the conditional independence of (X_1, . . . , X_k) given X_o on the event that the root degree equals k. □

Combinatorial characterization of the annealed entropy.
In our extended setting, the combinatorial interpretation of the annealed entropy Σ_r(µ) explained in Subsection 2.3 continues to hold. The definition of Σ_n(µ, r, ε) in (2.9) remains unchanged. The following theorem is an extension of Theorem 2.3.

Theorem 4.11.
Let µ ∈ I_π(M) and r ≥ 1 an integer. We have

lim_{ε→0} liminf_{n→∞} Σ_n(µ, r, ε) = lim_{ε→0} limsup_{n→∞} Σ_n(µ, r, ε) = Σ_r(µ).

Proof.
Let s(d) = d/2 − (d/
2) log d. From [10, Theorem 2.16], we have

lim_{n→∞} ((1/n) log |G_n(d_n)| − (d/2) log n) = −s(d) + H(deg(o)) − E[log(deg(o)!)],

where deg(o) has law π. Also, from [12, 16], we have

lim_{ε→0} liminf_{n→∞} ((1/n) log |G_n(µ, r, ε)| − (d/2) log n) = lim_{ε→0} limsup_{n→∞} ((1/n) log |G_n(µ, r, ε)| − (d/2) log n) = J_r(µ),

where J_r(µ) = −s(d) + H(X̃_{S_r}) − (d/2) H(π_µ) − Σ_{g ∈ Ẽ_r} E[log(N_X(g)!)] is defined exactly as in Theorem 2.3, the only difference being that (2.11) is replaced by

Σ_{g ∈ Ẽ_r} N_X(g) = deg(o).

Arguing as in the proof of Theorem 2.3, we have H(π_µ) = H(X̃_{E_r}), where X̃_{E_r} is the unlabeled coloring associated to ~X_{E_r}. Hence, the theorem follows by checking that

H(X̃_{S_r}) − (d/2) H(X̃_{E_r}) = H(X_{S_r}) − (d/2) H(~X_{E_r}) + Σ_{g ∈ Ẽ_r} E[log(N_X(g)!)] − E[log(deg(o)!)].   (4.8)

In order to prove that (4.8) holds, we decompose the left-hand side over the possible values of the root degree. We write

H(X̃_{S_r}) = H(π) + π(0) H(X_o) + Σ_{k=1}^∞ π(k) H_k(X̃_{S_r}),

where H_k is the entropy conditioned on deg_T(o) = k. Similarly,

(d/2) H(X̃_{E_r}) = (d/2) H(π̂) + Σ_{k=1}^∞ (k/2) π(k) H_k(X̃_{E_r}).

Let E_k[·] be the expectation conditioned on deg_T(o) = k. From (4.6), it follows that the identity (4.8) is equivalent to

Σ_{k=1}^∞ π(k) (H_k(X̃_{S_r}) − (k/2) H_k(X̃_{E_r})) = Σ_{k=1}^∞ π(k) (H_k(X_{S_r}) − (k/2) H_k(X_{E_r}) + Σ_{g ∈ Ẽ_r} E_k[log(N_X(g)!)] − log(k!)).   (4.9)

We denote by E_r^o the tree rooted at o in E_r \ {o, 1} and by E_r^1 the tree rooted at 1. Arguing as in the proof of Theorem 2.3, we find

H_k(X_{E_r}) = H_k(X̃_{E_r}) + K_k + K′_k,

where K_k is the relative entropy of a random labeling σ of X̃_{E_r^o} conditioned on deg(o) = k, and K′_k is the relative entropy of a random labeling σ′ of X̃_{E_r^1} conditioned on deg(o) = k.
Similarly, arguing as in the proof of Theorem 2.3, we get

H_k(X_{S_r}) = H_k(X̃_{S_r}) + k K′_k − Σ_{g ∈ Ẽ_r} E_k[log(N_X(g)!)] + log(k!).

We deduce that the right-hand side of (4.9) is equal to

Σ_{k=1}^∞ π(k) (H_k(X̃_{S_r}) − (k/2) H_k(X̃_{E_r})) + Σ_{k=1}^∞ π(k) ((k/2) K′_k − (k/2) K_k).

Finally, we observe that

Σ_{k=1}^∞ π(k) ((k/2) K′_k − (k/2) K_k) = (d/2) (H(σ′) − H(σ)).

The above expression is equal to 0 because the law ~µ is invariant under switching the two sides of the edge {o, 1}. This concludes the proof of Equation (4.8). □

Proofs of Theorem 4.3, Theorem 4.4, Theorem 4.8 and Theorem 4.9.
As already pointed out, the conclusion of Theorem 2.1 holds in our extended setting. We may thus repeat verbatim the proofs in Section 3, invoking Theorem 4.11 in place of Theorem 2.3 and Lemma 4.10 in place of Lemma 1.12. □
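The enumeration asymptotics underlying these proofs can be sanity-checked numerically in the regular case. The sketch below (illustrative only) uses the standard Bender-Canfield estimate |G_n(d)| ≈ ((dn)! / ((dn/2)! 2^{dn/2} (d!)^n)) e^{(1−d²)/4} and checks that (1/n) log |G_n(d)| − (d/2) log n approaches −s(d) − log(d!), as in Equation (2.10); the choice d = 3 and the values of n are arbitrary:

```python
import math

def log_count_regular(n, d):
    # log of the Bender-Canfield estimate for the number of simple
    # d-regular graphs on n vertices (dn assumed even).
    m = d * n
    return (math.lgamma(m + 1) - math.lgamma(m / 2 + 1)
            - (m / 2) * math.log(2.0) - n * math.log(math.factorial(d))
            + (1.0 - d * d) / 4.0)

d = 3
s = d / 2 - (d / 2) * math.log(d)                 # s(d) = d/2 - (d/2) log d
target = -s - math.log(math.factorial(d))         # limit in (2.10)
vals = {n: log_count_regular(n, d) / n - (d / 2) * math.log(n)
        for n in (10**3, 10**5)}
# vals[n] -> target as n grows, with an O(1/n) correction
```

The gap shrinks by roughly the ratio of the two values of n, consistent with the o(1) term in (2.10) being of order 1/n for this estimate.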
References

[1] D. Aldous and R. Lyons. Processes on unimodular random networks. Electronic Journal of Probability, 12:1454–1508, 2007.
[2] A. Backhausz and B. Szegedy. On large-girth regular graphs and random processes on trees. Random Structures Algorithms, 53(3):389–416, 2018.
[3] A. Backhausz and B. Szegedy. On the almost eigenvectors of random regular graphs. Ann. Probab., 47(3):1677–1725, 2019.
[4] A. Backhausz, B. Szegedy, and B. Virág. Ramanujan graphings and correlation decay in local algorithms. Random Structures Algorithms, 47(3):424–435, 2015.
[5] A. Backhausz and B. Virág. Spectral measures of factor of i.i.d. processes on vertex-transitive graphs. Ann. Inst. Henri Poincaré Probab. Stat., 53(4):2260–2278, 2017.
[6] M. Bayati, D. Gamarnik, and P. Tetali. Combinatorial approach to the interpolation method and scaling limits in sparse random graphs. Ann. Probab., 41(6):4080–4115, 2013.
[7] E. A. Bender and E. Canfield. The asymptotic number of labeled graphs with given degree sequences. Journal of Combinatorial Theory, Series A, 24(3):296–307, 1978.
[8] I. Benjamini and O. Schramm. Recurrence of distributional limits of finite planar graphs. Electron. J. Probab., 6:no. 23, 13 pp., 2001.
[9] B. Bollobás. The independence ratio of regular graphs. Proc. Amer. Math. Soc., 83(2):433–436, 1981.
[10] B. Bollobás. Random graphs, volume 73 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, second edition, 2001.
[11] C. Bordenave. Normalité asymptotique des vecteurs propres d'un graphe régulier aléatoire (d'après Backhausz et Szegedy). Séminaire Bourbaki, 71e année, 2018–2019, No. 1151, 2018.
[12] C. Bordenave and P. Caputo. Large deviations of empirical neighborhood distribution in sparse random graphs. Probab. Theory Related Fields, 163(1-2):149–222, 2015.
[13] L. P. Bowen. A brief introduction of sofic entropy theory. In Proceedings of the International Congress of Mathematicians—Rio de Janeiro 2018. Vol. III. Invited lectures, pages 1847–1866. World Sci. Publ., Hackensack, NJ, 2018.
[14] A. Coja-Oghlan and W. Perkins. Spin systems on Bethe lattices. Comm. Math. Phys., 372(2):441–523, 2019.
[15] E. Csóka, V. Harangi, and B. Virág. Entropy and expansion. Ann. Inst. Henri Poincaré Probab. Stat., 56(4):2428–2444, 2020.
[16] P. Delgosha and V. Anantharam. A notion of entropy for stochastic processes on marked rooted graphs. arXiv:1908.00964, 2019.
[17] J. Ding, A. Sly, and N. Sun. Maximum independent sets on random regular graphs. Acta Math., 217(2):263–340, 2016.
[18] W. Duckworth and N. C. Wormald. On the independent domination number of random regular graphs. Combin. Probab. Comput., 15(4):513–522, 2006.
[19] P. Erdős and T. Gallai. Graphs with prescribed degrees of vertices (Hungarian). Mat. Lapok, 11:264–274, 1960.
[20] J. Friedman. A proof of Alon's second eigenvalue conjecture and related problems. Mem. Amer. Math. Soc., 195(910):viii+100, 2008.
[21] D. Gamarnik. Right-convergence of sparse random graphs. Probab. Theory Related Fields, 160(1-2):253–278, 2014.
[22] H. Hatami, L. Lovász, and B. Szegedy. Limits of locally-globally convergent graph sequences. Geom. Funct. Anal., 24(1):269–296, 2014.
[23] L. Lovász. Large networks and graph limits, volume 60 of American Mathematical Society Colloquium Publications. American Mathematical Society, Providence, RI, 2012.
[24] M. Mezard and A. Montanari. Information, Physics, and Computation. Oxford University Press, Inc., USA, 2009.
[25] D. Puder. Expansion of random graphs: new proofs, new results. Invent. Math., 201(3):845–908, 2015.
[26] J. Salez. The interpolation method for random graphs with prescribed degrees. Combin. Probab. Comput., 25(3):436–447, 2016.
[27] R. van der Hofstad. Random Graphs and Complex Networks, Volume 2. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, to appear.
[28] N. C. Wormald. Differential equations for random processes and random graphs.
Ann. Appl. Probab. , 5(4):1217–1235, 1995.1[29] N. C. Wormald. Models of random regular graphs. In
Surveys in combinatorics, 1999 (Canterbury) , volume 267 of
LondonMath. Soc. Lecture Note Ser. , pages 239–298. Cambridge Univ. Press, Cambridge, 1999. 1, 8, pages 239–298. Cambridge Univ. Press, Cambridge, 1999. 1, 8