Inference and mutual information on random factor graphs
Amin Coja-Oghlan, Max Hahn-Klimroth, Philipp Loick, Noela Müller, Konstantinos Panagiotou, Matija Pasch
{acoghlan,hahnklim,loick,nmueller}@math.uni-frankfurt.de, Goethe University, Mathematics Institute, 10 Robert Mayer St, Frankfurt 60325, Germany.
{kpanagio,pasch}@math.lmu.de, Mathematisches Institut der Universität München, Theresienstr. 39, 80333 München, Germany.

Abstract. Random factor graphs provide a powerful framework for the study of inference problems such as decoding problems or the stochastic block model. Information-theoretically the key quantity of interest is the mutual information between the observed factor graph and the underlying ground truth around which the factor graph was created; in the stochastic block model, this would be the planted partition. The mutual information gauges whether and how well the ground truth can be inferred from the observable data. For a very general model of random factor graphs we verify a formula for the mutual information predicted by physics techniques. As an application we prove a conjecture about low-density generator matrix codes from [Montanari: IEEE Transactions on Information Theory 2005]. Further applications include phase transitions of the stochastic block model and the mixed k-spin model from physics.

Amin Coja-Oghlan and Philipp Loick are supported by DFG CO 646/3. Max Hahn-Klimroth is supported by Stiftung Polytechnische Gesellschaft. Konstantinos Panagiotou, Matija Pasch: The research leading to these results has received funding from the European Research Council, ERC Grant Agreement 772606-PTRCSP.

1. Introduction
1.1. Background and motivation.
Since the 1990s there has been an immense interest in inference and learning problems on random graphs. One motivation has been to seize upon random graphs as benchmarks for inference algorithms of all creeds and denominations. An excellent example of this is the stochastic block model; the impressive literature on this model alone is surveyed in [1]. A second, no less salient motivation has been the use of random graphs in probabilistic constructions. Concrete examples include powerful error correcting codes such as low density generator matrix or low density parity check codes, which have since found their way into modern communications standards [31, 43]. Further prominent recent applications include compressed sensing and group testing [3, 22, 23]. It appears hardly a stretch to claim that in terms of real world impact these constructions occupy top ranks among applications of the probabilistic method and, indeed, modern combinatorics generally.

Yet many applications of the probabilistic method to inference problems still lack a satisfactory rigorous justification. Some are supported primarily by empirical evidence, i.e., not much more than a bunch of computer experiments. Quite a few others have been inspired by a versatile but non-rigorous approach from physics known as the 'cavity method.' But while there has been progress in recent years, vast gaps between the physics predictions and their rigorous vindications remain. One important reason for this is that the random graph models used in practical inference tend to be significantly more intricate than, say, a classical binomial random graph. For instance, a highly popular breed of low-density parity check codes uses delicately tailored degree distributions for both the variable nodes and the check nodes of the Tanner graph [43].

In this paper we significantly advance the rigorous state of the art by corroborating important cavity method predictions wholesale for a rich class of inference problems that accommodates the very general choices of degree distributions of interest in high-dimensional Bayesian inference problems and coding theory. Generally, the objective in such inference problems is to recover the ground truth from the observable data. Think, for instance, of retrieving the hidden communities in the stochastic block model or of reconstructing the original message from a noisy codeword. For this broad class of models we rigorously establish the formulas that the cavity method predicts for the mutual information, which is the key information-theoretic potential that gauges precisely how much it is possible in principle to learn about the ground truth. Technically we build upon and extend the methods developed in [14] for random graph models of Erdős-Rényi type. While we follow a similar general proof strategy, the greater generality of the present results necessitates significant upgrades to virtually all of the moving parts. For example, due to the more rigid combinatorial structure of graphs with given degrees many of the manoeuvres that are straightforward for binomial random graphs now require delicate coupling arguments.

We proceed to highlight applications of our main results to three specific problems that have each received a great deal of attention in their own right: low-density generator matrix codes, the stochastic block model and the mixed k-spin model, which hails from mathematical physics. Then in Section 2 we state the main results concerning the general class of random factor graph models.
Section 3 contains an overview of the proof strategy and a detailed comparison with prior work.

1.2. Low-density generator matrix codes.
A powerful and instructive class of error-correcting codes, low-density generator matrix ('ldgm') codes are based on random bipartite graphs with given degree distributions. Specifically, let d, k ≥ 1 be integer-valued random variables and let m ∼ Po(nE[d]/E[k]) be a Poisson variable. One vertex class V = {x_1, …, x_n} of the graph represents the bits of the original message. The other class F = {a_1, …, a_m} represents the rows of the code's generator matrix. To obtain the random graph G create for each variable node x_i an independent copy d_i of d. Similarly, create an independent copy k_i of k for each check node a_i. Then given the event

∑_{i=1}^n d_i = ∑_{i=1}^m k_i   (1.1)

that the total degrees on both sides match let G be a random bipartite graph where every x_i has degree d_i and every a_i has degree k_i. We tacitly restrict to n such that the event (1.1) has positive probability. The generator matrix of the ldgm code is now precisely the m × n biadjacency matrix A(G) of G, viewed as a matrix over F_2. Thus, the rows of A(G) correspond to the check nodes a_1, …, a_m, the columns correspond to x_1, …, x_n and the (i, j)-entry equals one iff a_i and x_j are adjacent. For a given message x ∈ F_2^n the corresponding codeword reads y = A(G)x ∈ F_2^m. The receiver on the other end of a noisy channel observes a scrambled version y* of y. Specifically, y* is obtained from y by flipping every bit with probability η ∈ (0, 1/2) independently. To gauge the potential of the code, the key question is how much information about the original x the receiver can possibly extract from y*. Naturally, the receiver also knows G. Hence, we aim to work out the conditional mutual information

I(x, y* | G) = ∑_{x∈F_2^n, y∈F_2^m} P[x = x, y* = y | G] log( P[x = x, y* = y | G] / (2^{−n} P[y* = y | G]) ).

A precise prediction as to its asymptotical value was put forward on the basis of the physicists' cavity method. As most such predictions, the formula comes as a variational problem that asks to optimise a functional called the Bethe free entropy over a space of probability measures. Specifically, let P*([−1,1]) be the space of all probability measures ρ on the interval [−1,1] with mean zero. Let (θ_{i,j})_{i,j≥1} ⊆ [−1,1] be a family of independent samples from ρ. Further, let (J_i)_{i≥1} be Rademacher variables, i.e., P[J_i = 1] = P[J_i = −1] = 1/2, let k̂ have the size-biased distribution

P[k̂ = ℓ] = ℓ P[k = ℓ]/E[k]   (ℓ ≥ 1),   (1.2)

let (k̂_i)_{i≥1} be independent copies of k̂ and let Λ(z) = z log(z). Then the Bethe free entropy reads

B_ldgm(ρ, η) = E[ Λ( ∑_{σ∈{0,1}} ∏_{i=1}^{d} ( 1 + (−1)^σ J_i (1−2η) ∏_{j=1}^{k̂_i−1} θ_{i,j} ) ) − (E[d](k−1)/E[k]) Λ( 1 + J_1 (1−2η) ∏_{j=1}^{k} θ_{1,j} ) ].
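To make the setup concrete, the following minimal Python sketch samples the bipartite graph, encodes a random message and passes it through the binary symmetric channel. The clone pairing and the rejection step standing in for the conditioning on (1.1) are our own simplifications, and all function names are hypothetical.

```python
import random

def sample_ldgm(n, m, d_dist, k_dist, rng, max_tries=100000):
    # Resample degree sequences until the totals match -- a crude stand-in
    # for conditioning on the event (1.1).
    for _ in range(max_tries):
        d = [d_dist(rng) for _ in range(n)]
        ks = [k_dist(rng) for _ in range(m)]
        if sum(d) == sum(ks):
            break
    else:
        raise RuntimeError("event (1.1) not hit")
    # Pairing model: match variable clones to check clones uniformly at random.
    var_clones = [i for i, di in enumerate(d) for _ in range(di)]
    rng.shuffle(var_clones)
    it = iter(var_clones)
    # Row j of A(G) has its ones in the columns listed in rows[j].
    return [[next(it) for _ in range(kj)] for kj in ks]

def encode_and_transmit(rows, x, eta, rng):
    y = [sum(x[v] for v in row) % 2 for row in rows]    # y = A(G) x over F_2
    y_star = [b ^ (rng.random() < eta) for b in y]      # BSC with flip prob. eta
    return y, y_star

rng = random.Random(42)
n, m, eta = 100, 120, 0.1
rows = sample_ldgm(n, m, lambda r: 3, lambda r: r.choice((2, 3)), rng)
x = [rng.randint(0, 1) for _ in range(n)]
y, y_star = encode_and_transmit(rows, x, eta, rng)
print(sum(a != b for a, b in zip(y, y_star)), "of", m, "codeword bits flipped")
```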
Theorem 1.1. For any d, k and for all η ∈ (0, 1/2) we have

lim_{n→∞} I(x, y* | G)/n = (1 + E[d]/E[k]) log 2 + η log η + (1−η) log(1−η) − sup_{π∈P*([−1,1])} B_ldgm(π, η) in probability.

Theorem 1.1 completely solves a well-known conjecture [38, Conjecture 1] and significantly extends the results from [11, 14], which required the restrictive assumption that the check degree k be constant.

A possible objection to a result such as Theorem 1.1 might be that the resulting formula appears exceedingly complicated as it leaves us with a potentially difficult variational problem. Yet two points are to be made in defense. First, by vindicating the precise formula predicted by the cavity method, the theorem and its proof show that this technique and the ideas behind it do indeed get to the bottom of the problem. Second, since the formula involves a supremum, any π ∈ P*([−1,1]) yields an upper bound on the mutual information. Hence, the heuristic population dynamics algorithm, deemed to produce good candidate maximisers and beloved of physicists, can be harnessed to get rigorous bounds in one direction. Finally, in some cases it is possible to precisely identify the maximiser analytically [6, 12].
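For illustration, the sketch below runs population dynamics for constant degrees d, k: a population of values in [−1,1] represents π, a plausible BP fixed-point update (our own transcription) is iterated, and the two Λ-terms of B_ldgm as displayed above are then estimated by Monte Carlo. Population size and iteration counts are arbitrary choices of ours; this is a heuristic illustration, not a rigorous procedure.

```python
import math, random

rng = random.Random(1)
ETA, D, K = 0.08, 3, 4          # constant degrees, so the size-biased k-hat equals K
POP, ROUNDS, SAMPLES = 2000, 100, 5000
lam = lambda z: z * math.log(z)

def check_bias(pop):
    # J * (1 - 2*eta) * product of K-1 population members: a check-to-variable bias
    J = rng.choice((-1, 1))
    p = 1.0
    for _ in range(K - 1):
        p *= rng.choice(pop)
    return J * (1 - 2 * ETA) * p

pop = [rng.uniform(-1, 1) for _ in range(POP)]
for _ in range(ROUNDS):        # distributional fixed-point iteration
    pop = [math.tanh(sum(math.atanh(check_bias(pop)) for _ in range(D - 1)))
           for _ in range(POP)]

est = 0.0
for _ in range(SAMPLES):
    biases = [check_bias(pop) for _ in range(D)]
    term1 = lam(math.prod(1 + b for b in biases) + math.prod(1 - b for b in biases))
    J = rng.choice((-1, 1))
    p = math.prod(rng.choice(pop) for _ in range(K))
    term2 = (D * (K - 1) / K) * lam(1 + J * (1 - 2 * ETA) * p)
    est += term1 - term2
print("B_ldgm(pi, eta) estimate:", est / SAMPLES)
```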
1.3. The stochastic block model.
An instructive model of graph clustering, the stochastic block model presumes that a random graph is created in two steps. First each of the n vertices {x_1, …, x_n} receives one of q ≥ 2 colours σ*_{x_i} ∈ [q] uniformly and independently. Then a sparse random graph is created where vertices with the same colour are either more likely to be connected by an edge (assortative case), or less likely (disassortative). Different versions of this model have been proposed. While in the simplest one edges are inserted independently, here we consider a model from [33] that produces a d-regular graph. Hence, let d ≥ 3 be an integer and let G = G(n, d) be a random d-regular graph. Further, given a parameter β > 0 let G* = G*(n, d, σ*) be a random graph drawn from the distribution

P[G* = G | σ*] ∝ exp( −β ∑_{vw∈E(G)} 1{σ*_v = σ*_w} ),   (1.3)

with the ∝-symbol hiding the normalisation required to obtain a probability distribution. Thus, the parameter β tunes the penalty that we impose on monochromatic edges by comparison to the null model G. At β = 0 the random graphs G* and G are identical. But even for positive β the random graphs G, G* may still be indistinguishable and in effect recovering σ* may be impossible. Hence, a fundamental question is for what q, d, β it is possible to discriminate between G, G*. Formally, we recall that the Kullback-Leibler divergence of G*, G is defined as

D_KL(G* ‖ G) = ∑_G P[G* = G] log( P[G* = G] / P[G = G] ).

The Kullback-Leibler divergence is an information-theoretic potential that gauges the similarity of two random graph models. In particular, if D_KL(G* ‖ G) = Ω(n), then G, G* can be told apart because natural observables will take vastly different values on the two models.

Whether D_KL(G* ‖ G) = Ω(n) depends on the value of the Bethe free entropy for the stochastic block model. To be precise, let P([q]) be the set of all probability distributions (μ(1), …, μ(q)) on [q]. We identify P([q]) with the standard simplex in R^q. Further, let P*([q]) be the set of all probability measures π on P([q]) such that ∫ μ(σ) dπ(μ) = 1/q for every σ ∈ [q]. In other words, the mean of π is the barycenter of the simplex. Let (μ_{i,π})_{i≥1} be a family of independent samples from π and let

B_sbm(π, β) = E[ Λ( ∑_{σ=1}^{q} ∏_{i=1}^{d} (1 − (1−e^{−β}) μ_{i,π}(σ)) ) / ( q (1 − (1−e^{−β})/q)^d ) − (d/2) Λ( 1 − (1−e^{−β}) ∑_{σ=1}^{q} μ_{1,π}(σ) μ_{2,π}(σ) ) / ( 1 − (1−e^{−β})/q ) ].
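For orientation, at the 'trivial' π — the point mass on the uniform distribution — both Λ-terms of B_sbm can be evaluated in closed form and B_sbm collapses to log q + (d/2) log(1 − (1−e^{−β})/q). The sketch below checks this numerically and also estimates B_sbm by Monte Carlo when π is represented by a finite population of distributions (our own discretisation; a legitimate population should average to the barycenter of the simplex).

```python
import math, random

q, d, beta = 3, 5, 1.0
c = 1 - math.exp(-beta)
lam = lambda z: z * math.log(z)
rng = random.Random(7)

def B_sbm(pop, samples=50000):
    """Monte Carlo estimate of B_sbm(pi, beta), with pi uniform over pop."""
    acc = 0.0
    for _ in range(samples):
        mus = [rng.choice(pop) for _ in range(d)]
        s = sum(math.prod(1 - c * mu[sig] for mu in mus) for sig in range(q))
        term1 = lam(s) / (q * (1 - c / q) ** d)
        mu1, mu2 = rng.choice(pop), rng.choice(pop)
        term2 = (d / 2) * lam(1 - c * sum(mu1[t] * mu2[t] for t in range(q))) \
                / (1 - c / q)
        acc += term1 - term2
    return acc / samples

uniform = [1.0 / q] * q
print("B_sbm at trivial pi :", B_sbm([uniform]))
print("closed form         :", math.log(q) + (d / 2) * math.log(1 - c / q))
```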
Theorem 1.2. Let

β* = inf{ β > 0 : sup_{π∈P*([q])} B_sbm(π, β) > log q + (d/2) log( 1 − (1−e^{−β})/q ) }.

(i) If β < β*, then lim_{n→∞} D_KL(G* ‖ G)/n = 0.
(ii) If β > β*, then lim inf_{n→∞} D_KL(G* ‖ G)/n > 0.

Theorem 1.2 easily implies that for β > β* it is information-theoretically possible to recover a non-trivial approximation to σ* from G*. In other words, there exists an exponential time algorithm that likely outputs a colouring τ of the vertices that has a significantly greater overlap with the ground truth σ* than a random guess. An open question is whether for β > β* this problem can even be solved by a polynomial time algorithm. The going conjecture is that in general the answer is 'no' and that efficient recoverability kicks in only at a second threshold β** > β* for many interesting choices of q, d [20].

1.4. The mixed k-spin model. Not only do the main results of this paper facilitate rigorous proofs of physics predictions for problems in computer science, but, conversely, we also obtain new theorems on problems of keen interest in statistical physics. For example, the mixed k-spin model is an important spin glass model [40]; its purpose is to describe the magnetic interactions in metallic alloys. To define the model let k ≥ 2 be an integer-valued random variable such that E[k^{2+ε}] < ∞ for some ε > 0 and P[k = 2] > 0.
Let (k_i)_{i≥1} be a sequence of independent copies of k. Moreover, let d > 0 and let H = H_k(n, m) be a (non-uniform) random hypergraph on V_n = {x_1, …, x_n} with m ∼ Po(dn/E[k]) independent hyperedges a_1, …, a_m such that a_i comprises k_i vertices, drawn uniformly without replacement. Thus, in the special case that k is constant we obtain the classical binomial random hypergraph. To turn this random hypergraph into a spin glass model we draw for each of its edges a_i an independent standard Gaussian J_i. Additionally, let β > 0. The Boltzmann distribution of the model is the probability distribution on {±1}^{V_n} defined by

μ_{H,J,β}(σ) = exp( β ∑_{i=1}^m J_i ∏_{x∈a_i} σ_x ) / Z(H, J, β)   (σ ∈ {±1}^{V_n}), where
Z(H, J, β) = ∑_{τ∈{±1}^{V_n}} exp( β ∑_{i=1}^m J_i ∏_{x∈a_i} τ_x ).

The normalising term Z(H, J, β) is known as the partition function.

A key question is whether for given d, β, k there occur long-range correlations between the magnetic 'spins' observed at x_1, …, x_n. Formally, let σ ∈ {±1}^{V_n} signify a sample from the Boltzmann distribution. Then we say that long-range correlations are absent if

lim_{n→∞} (1/n²) ∑_{x,y∈V_n} E| μ_{H,J,β}({σ_x = σ_y = 1}) − μ_{H,J,β}({σ_x = 1}) μ_{H,J,β}({σ_y = 1}) | = 0,   (1.4)

i.e., if for most pairs x, y of vertices the spins σ_x, σ_y are essentially independent. If (1.4) is violated, we say that long-range correlations are present.

According to physics predictions, for a given β > 0 long-range correlations emerge at a specific threshold d_{β,k} that can be determined in terms of the Bethe free entropy [30, 36]. The methods developed in this paper enable us to corroborate this formula rigorously. Specifically, let P*([−1,1]) be the space of all probability measures on [−1,1] with mean zero. Given π ∈ P*([−1,1]) let (μ_{π,i,j})_{i,j≥1} be a family of independent samples from π. Additionally, let (k̂_i)_{i≥1} be a family of independent copies of k̂ from (1.2) and let d ∼ Po(d). Then the Bethe free entropy of the k-spin model reads

B_{k-spin}(π) = (1/2) E[ Λ( ∑_{σ∈{±1}} ∏_{i=1}^{d} ( 1 + ∑_{σ_1,…,σ_{k̂_i}∈{±1}} 1{σ_{k̂_i} = σ} tanh( β J_i ∏_{j∈[k̂_i]} σ_j ) ∏_{j=1}^{k̂_i−1} (1 + σ_j μ_{π,i,j})/2 ) ) ]
  − (d/E[k]) E[ (k−1) Λ( 1 + ∑_{σ_1,…,σ_k∈{±1}} tanh( β J_1 ∏_{i=1}^{k} σ_i ) ∏_{i=1}^{k} (1 + σ_i μ_{π,1,i})/2 ) ].
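For very small n the Boltzmann distribution can be handled by exhaustive enumeration, which is a convenient way to get a feel for the correlation quantity in (1.4). The following sketch (our own minimal implementation, with an ad hoc arity distribution uniform on {2, 3}) samples an instance of H together with the Gaussian couplings and computes Z(H, J, β) and the average pairwise correlation exactly.

```python
import itertools, math, random

rng = random.Random(3)
n, d, beta = 10, 2.0, 1.0

def poisson(mean):
    # Knuth's method; fine for small means.
    L, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

Ek = 2.5                                    # E[k] for k uniform on {2, 3}
m = poisson(d * n / Ek)
edges = [(rng.sample(range(n), rng.choice((2, 3))), rng.gauss(0, 1))
         for _ in range(m)]                 # hyperedge a_i = (vertex set, J_i)

def weight(s):
    return math.exp(beta * sum(J * math.prod(s[v] for v in vs) for vs, J in edges))

states = list(itertools.product((-1, 1), repeat=n))
Z = sum(weight(s) for s in states)
p1 = [0.0] * n
p11 = [[0.0] * n for _ in range(n)]
for s in states:                            # exact marginals via enumeration
    w = weight(s) / Z
    for x in range(n):
        if s[x] == 1:
            p1[x] += w
            for y in range(x + 1, n):
                if s[y] == 1:
                    p11[x][y] += w
pairs = [(x, y) for x in range(n) for y in range(x + 1, n)]
corr = sum(abs(p11[x][y] - p1[x] * p1[y]) for x, y in pairs) / len(pairs)
print("m =", m, " log Z =", math.log(Z), " average correlation =", corr)
```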
Theorem 1.3. Let d_{β,k} = inf{ d > 0 : sup_{π∈P*([−1,1])} B_{k-spin}(π) > log 2 }.
(i) Long-range correlations are absent for d < d_{β,k}.
(ii) For any ε > 0 there exists d_{β,k} < d < d_{β,k} + ε where long-range correlations are present.

Thus, the point d_{β,k}, characterised by the Bethe variational principle, marks the onset of complex magnetic interactions in the mixed k-spin model. This critical value is known as the replica symmetry breaking phase transition in physics jargon. As a further application of the main results we can pinpoint the so-called condensation phase transition of the Potts antiferromagnet on random d-regular graphs, another problem of interest in mathematical physics. The details can be found in Section 16.

2. The mutual information of random factor graphs
The theorems quoted in Section 1 are easy consequences of results on general random factor graph models. These more general theorems, which we present next, constitute the main results of the paper.

2.1. Random factor graph models.
Remarkably many classical problems from combinatorics, statistics and physics can be expressed conveniently in the language of factor graph models [36, 41, 44]. A factor graph G is a bipartite graph whose vertex classes are variable nodes V(G) and factor nodes F(G). The former represent the variables of the combinatorial problem in question, such as the individual bits of a codeword. Generally we assume that these variables range over a domain Ω ≠ ∅ of size q = |Ω| ≥ 2. Moreover, every factor node a ∈ F(G) comes with a function ψ_a: Ω^{∂a} → (0, ∞) that assigns a positive weight to value combinations of the adjacent variables ∂a. The factor graph gives rise to a probability distribution

μ_G(σ) = ψ_G(σ)/Z_G, where ψ_G(σ) = ∏_{a∈F(G)} ψ_a(σ_{∂a}) and Z_G = ∑_{τ∈Ω^{V(G)}} ψ_G(τ)   (σ ∈ Ω^{V(G)}).   (2.1)

To describe problems such as the ones from Section 1 we introduce models where the factor graph itself is random. Specifically, let d, k ≥ 1 be integer-valued random variables and let (d_i)_{i≥1}, (k_i)_{i≥1} be independent copies of d, k. Further, for each k in the support of k let Ψ_k be a finite set of k-ary functions ψ: Ω^k → (0, ∞). Let P_k be a probability distribution on Ψ_k and let us write ψ_k for a sample from P_k. Further, let ψ be a random variable distributed as ψ_k, let P be the distribution of ψ and let k_ψ denote the arity of ψ.

Now, to construct a factor graph let V_n = {x_1, …, x_n} be a set of variable nodes and let F_m = {a_1, …, a_m} be a set of m ∼ Po(nE[d]/E[k]) factor nodes. We obtain the random factor graph G as follows.

G1: given the event ∑_{i=1}^n d_i = ∑_{i=1}^m k_i, choose a bipartite graph on variable and factor nodes such that every x_i has degree d_i and every a_j has degree k_j uniformly at random.
G2: choose for every factor node a_i a weight function ψ_{a_i} from the distribution of ψ_{k_i}.

In the language of inference problems the random factor graph G is going to provide a null model because the weight functions in G2 are independent of the graph structure from G1. For instance, in the context of the stochastic block model from Section 1.3, this model plays the role of the purely random graph without a particular underlying colouring.
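A minimal sketch of the null model G1–G2 in code: rejection sampling stands in for conditioning on the event that the degree totals match, and a uniform clone pairing stands in for a uniform simple bipartite graph. The weight-function sampler psi_dist is a placeholder of our own.

```python
import math, random

def poisson(mean, rng):
    L, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def null_model(n, d_dist, k_dist, Ed, Ek, psi_dist, rng, max_tries=100000):
    for _ in range(max_tries):              # condition on matching totals (G1)
        m = poisson(n * Ed / Ek, rng)
        d = [d_dist(rng) for _ in range(n)]
        ks = [k_dist(rng) for _ in range(m)]
        if sum(d) == sum(ks):
            break
    else:
        raise RuntimeError("matching event not hit")
    clones = [i for i, di in enumerate(d) for _ in range(di)]
    rng.shuffle(clones)                     # uniform pairing of clones
    it = iter(clones)
    return [([next(it) for _ in range(kj)], psi_dist(kj, rng))   # G2: weights
            for kj in ks]

rng = random.Random(0)
G = null_model(20, lambda r: r.choice((2, 3)), lambda r: 3, 2.5, 3.0,
               lambda k, r: f"psi_{k}_{r.randint(0, 1)}", rng)
print(len(G), "factor nodes; first:", G[0])
```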
2.2. The teacher-student scheme. The teacher-student scheme organically turns the null model into an inference problem. A helpful metaphor might be to imagine a teacher who attempts to convey a ground truth σ* to a student by presenting examples. The ground truth itself is a random vector chosen uniformly from the space Ω^{V_n}. The set of examples corresponds to a factor graph G*.

To be precise, let D be the σ-algebra generated by the degrees and the total number of factor nodes of the null model G. Then the factor graph G* is chosen from the distribution

P[G* = G | D, σ*] = P[G = G | D] ψ_G(σ*) / E[ψ_G(σ*) | D, σ*].   (2.2)

Hence, we reweigh the null model G1–G2 according to the ground truth σ*, rewarding graphs under which σ* receives a higher weight. In the case of the stochastic block model, G* matches the reweighing (1.3) that prefers bichromatic edges. The obvious question is how much of an imprint σ* leaves on the resulting factor graph G*? Before we answer this question in general let us illustrate how the examples from Section 1 fit into the general framework.

Example 2.1 (ldgm codes). Let Ω = {±1} with +1 representing 0 ∈ F_2 and −1 representing 1 ∈ F_2. For every degree k ≥ 1 there are two k-ary weight functions ψ_{η,k,±1} defined by

ψ_{η,k,J}(σ) = 1 − (1−2η) J ∏_{i=1}^{k} σ_i   (σ ∈ Ω^k).

The probability distribution P_k is defined by P_k(ψ_{η,k,J}) = 1/2. With this setup the bipartite graph structure of the null model G coincides with the bipartite graph introduced in Section 1.2. Moreover, the ±1-labels of the weight functions (i.e., the value of J such that ψ_{a_i} = ψ_{η,k_i,J}) represent the entries of the vector y*. Thus, while in the null model G these vector entries are purely random, in the reweighted model G* the labels are distributed precisely as the entries of the vector y* from the ldgm model.

Example 2.2 (stochastic block model). Let Ω = [q] be a set of q colours. We introduce a single binary weight function ψ_{β,q}(σ_1, σ_2) = exp(−β 1{σ_1 = σ_2}) and we let d be the random variable that is constant and equal to d. With this weight function the construction (2.2) coincides with the definition (1.3) of the stochastic block model.

The main theorem is going to provide a formula for the mutual information of G* and the ground truth σ*, provided that the distribution P on weight functions satisfies a number of easy-to-check conditions. To state these conditions let us denote by P(Ω) the set of all probability distributions on Ω, endowed with the topology inherited from Euclidean space. Moreover, let P*(Ω) signify the space of all probability measures π on P(Ω) such that ∫_{P(Ω)} μ(ω) dπ(μ) = 1/q for all ω ∈ Ω. Finally, for a given π ∈ P*(Ω) let (μ_{i,j,π})_{i,j≥1} be independent samples from π and recall Λ(x) = x log x. The assumptions read as follows.

DEG: there exists ε > 0 such that E[d^{2+ε}], E[k^{2+ε}] < ∞.
SYM: there exist reals ε, ξ > 0 such that for all k ∈ supp k, ψ ∈ Ψ_k, j ∈ [k], ω ∈ Ω we have

∑_{σ∈Ω^k} 1{σ_j = ω} ψ(σ) = q^{k−1} ξ, and ε < ψ(σ) < 2 − ε   (σ ∈ Ω^k).

BAL: for every k ∈ supp k the function μ ∈ P(Ω) ↦ ∑_{σ∈Ω^k} E[ψ_k(σ)] ∏_{i=1}^{k} μ(σ_i) is concave and attains its maximum at the uniform distribution on Ω.
POS: for any two probability distributions π, π′ ∈ P*(Ω) and any k ∈ supp k we have

E[ Λ( ∑_{τ∈Ω^k} ψ_k(τ) ∏_{i=1}^{k} μ_{i,1,π}(τ_i) ) ] + (k−1) E[ Λ( ∑_{τ∈Ω^k} ψ_k(τ) ∏_{i=1}^{k} μ_{i,1,π′}(τ_i) ) ] ≥ ∑_{j=1}^{k} E[ Λ( ∑_{τ∈Ω^k} ψ_k(τ) μ_{j,1,π}(τ_j) ∏_{i≠j} μ_{i,1,π′}(τ_i) ) ].
The first assumption DEG ensures that the factor graphs are 'sparse' or, formally, locally finite. Yet DEG allows for very general degree distributions, including Poisson and power law distributions. Moreover, conditions SYM and BAL are symmetry conditions. Roughly speaking, they provide that all the values ω ∈ Ω are on the same footing, i.e., there is no semantic preference for any value. Finally, condition POS can be viewed as a convexity requirement. This assumption is needed for the technical reason of facilitating the interpolation method, a proof technique that we borrow from mathematical physics. The conditions are easily seen to be satisfied in many models of interest including, of course, the stochastic block model and ldgm codes; see Section 16. Crucially, the assumptions can be checked solely in terms of the weight functions; no random graphs considerations are required.
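For instance, for the stochastic block model weight function ψ_{β,q} of Example 2.2, SYM holds with ξ = (q − 1 + e^{−β})/q and BAL follows since the relevant function equals 1 − (1−e^{−β}) ∑_σ μ(σ)² on the simplex. Both facts can also be spot-checked mechanically, as in the following sketch (the midpoint test for concavity is, of course, only a heuristic check, not a proof).

```python
import itertools, math, random

q, beta = 3, 1.2
psi = lambda s1, s2: math.exp(-beta * (s1 == s2))

# SYM: the sum over sigma with sigma_j = omega must not depend on (j, omega).
sums = {round(sum(psi(*s) for s in itertools.product(range(q), repeat=2)
                  if s[j] == omega), 12)
        for j in (0, 1) for omega in range(q)}
print("SYM sums:", sorted(sums), " xi =", next(iter(sums)) / q)  # = q^{k-1} xi, k = 2

# BAL: F(mu) = sum_{s1,s2} psi(s1,s2) mu(s1) mu(s2); spot-check concavity
# along random midpoints and maximality of the uniform distribution.
rng = random.Random(5)
F = lambda mu: sum(psi(a, b) * mu[a] * mu[b] for a in range(q) for b in range(q))
def rand_mu():
    w = [rng.random() for _ in range(q)]
    return [x / sum(w) for x in w]
uni = [1.0 / q] * q
print("uniform maximal:", all(F(rand_mu()) <= F(uni) + 1e-12 for _ in range(1000)))
print("midpoint concave:",
      all(F([(a + b) / 2 for a, b in zip(m1, m2)]) + 1e-12 >= (F(m1) + F(m2)) / 2
          for m1, m2 in ((rand_mu(), rand_mu()) for _ in range(1000))))
```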
2.3. The mutual information. The main result of the paper vindicates the physicists' hunch that the mutual information between the teacher's ground truth σ* and the data G* presented to the student is determined by the Bethe free entropy. To state the result we introduce the following generic version of the Bethe functional. Let (ψ_{k,i})_{k,i} be a family of independent random weight functions such that ψ_{k,i} is distributed as ψ_k. Further, let (h_{k,i})_{k,i} with h_{k,i} ∈ [k] be a family of independent uniformly distributed indices. Recalling that (k̂_i)_{i≥1} are independent copies of k̂ from (1.2), we define

B(π) = (1/q) E[ ξ^{−d} Λ( ∑_{σ∈Ω} ∏_{i=1}^{d} ∑_{τ∈Ω^{k̂_i}} 1{τ_{h_{k̂_i,i}} = σ} ψ_{k̂_i,i}(τ) ∏_{j∈[k̂_i]\{h_{k̂_i,i}}} μ_{i,j,π}(τ_j) ) ]   (2.3)
  − (E[d]/(ξ E[k])) E[ (k−1) Λ( ∑_{τ∈Ω^k} ψ_k(τ) ∏_{j=1}^{k} μ_{j,π}(τ_j) ) ].

The following theorem expresses the mutual information of G* and σ* given the degrees and the total number of factor nodes as the variational problem of maximising the Bethe functional. We point out that
POS fails to hold in the case of the assortative stochastic block model.

Theorem 2.3. For any random factor graph model that satisfies the conditions DEG, SYM, BAL and POS,

lim_{n→∞} I(σ*, G* | D)/n = log q + (E[d]/(ξ E[k])) E[ q^{−k_ψ} ∑_{τ∈Ω^{k_ψ}} Λ(ψ(τ)) ] − sup_{π∈P*(Ω)} B(π) in probability.   (2.4)

The formula (2.4) is in line with predictions from [45]. Moreover, the results quoted in Section 1 are immediate consequences of Theorem 2.3.
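To see how (2.4) specialises, take the ldgm weight functions of Example 2.1: there ξ = 1 and the inner expectation q^{−k_ψ} ∑_τ Λ(ψ(τ)) works out to log 2 + η log η + (1−η) log(1−η) for every k ≥ 2 and J = ±1, which is how the channel entropy terms of Theorem 1.1 arise from Theorem 2.3. A quick numerical confirmation (parameter values are ours):

```python
import itertools, math

eta = 0.13
lam = lambda z: z * math.log(z)
target = math.log(2) + eta * math.log(eta) + (1 - eta) * math.log(1 - eta)

for k in (2, 3, 4):
    for J in (1, -1):
        taus = list(itertools.product((-1, 1), repeat=k))
        psi = [1 - (1 - 2 * eta) * J * math.prod(t) for t in taus]
        const = sum(lam(p) for p in psi) / 2 ** k           # q^{-k} sum Lambda(psi)
        xi = sum(p for p, t in zip(psi, taus) if t[0] == 1) / 2 ** (k - 1)
        print(f"k={k} J={J:+d}  xi={xi:.10f}  q^-k sum Lambda = {const:.10f}")
print("log2 + eta*log(eta) + (1-eta)*log(1-eta) =", round(target, 10))
```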
3. Proof strategy

In this section we survey the proof of Theorem 2.3. Subsequently we discuss how the strategy compares to prior work, particularly [14]. Throughout we tacitly assume that DEG, SYM, BAL and POS are satisfied.

3.1. The partition function.
The starting point for computing the mutual information is to observe that this quantity is closely connected to the partition function of G*.

Proposition 3.1. W.h.p. we have

I(σ*, G* | D)/n = log q + (E[d]/(ξ E[k])) E[ q^{−k_ψ} ∑_{τ∈Ω^{k_ψ}} Λ(ψ(τ)) ] − E[log Z(G*)]/n + o(1).

Hence, Proposition 3.1 reduces our task to computing E[log Z(G*)]. This is still a formidable challenge because the logarithm sits inside the expectation; hence, routine techniques such as moment calculations do not bite. Instead we will combine two separate techniques. The first is a coupling argument known as the Aizenman-Sims-Starr scheme. This argument will show that E[log Z(G*)] is upper bounded by sup_π B(π). The second component, the interpolation method, will supply the matching lower bound. What these techniques have in common is that they both boil down to 'local' calculations. That is, we need to assess the impact on the partition function Z(G*) of a small number of local changes such as the addition of a few factor or variable nodes to G*. We will perform these computations by way of a probabilistic argument, namely by tracing how they affect the average weight of a sample from the Boltzmann distribution of G*. The key is a simple but powerful fact that goes by the name of the Nishimori identity.

3.2. The Nishimori identity.
To formulate this identity we need to introduce a slightly modified version of the random factor graph model G*. Recall from (2.2) that G* was obtained by first drawing σ* uniformly at random and then reweighting the null model G according to the weight of σ*. If we combine these two steps the net effect should be, at least roughly, that a specific G comes up with probability proportional to Z(G), as every σ ∈ Ω^{V_n} provides G with a ψ_G(σ) chance of being sampled. Thus, G* should be roughly equivalent to the random factor graph model Ĝ defined by

P[Ĝ = G | D] ∝ Z_G P[G = G | D].   (3.1)

Indeed, this equivalence turns out to be exact if we make one minimal change. Namely, instead of drawing the ground truth σ* uniformly at random, we draw a sample from the distribution

P[σ̂ = σ | D] ∝ E[ψ_G(σ) | D]   (σ ∈ Ω^{V_n}).   (3.2)

The following is an extension of [14, Proposition 3.10] to the present, more general class of factor graph models with given degrees.

Proposition 3.2. We have

P[Ĝ = G | D] μ_G(σ) = P[σ̂ = σ | D] P[G* = G | D, σ* = σ].   (3.3)

Furthermore, σ̂ and σ* as well as G*, Ĝ are mutually contiguous and E[log Z_{G*}] = E[log Z_{Ĝ}] + o(n).

The proof of Proposition 3.2 relies on Bayes' formula combined with a somewhat subtle application of local limit theorems and other probabilistic tools. The details can be found in Section 4.5.
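At its core (3.3) is a finite Bayes identity, so it can be verified mechanically on a toy model in which abstract indices play the roles of graphs and assignments; the sketch below builds Ĝ, σ̂ and G*(· | σ) from arbitrary positive weights and checks (3.3) up to floating-point error. This illustrates only the algebra; the substance of Proposition 3.2 lies in the conditioning on D and the contiguity statement.

```python
import random

rng = random.Random(9)
nG, nS = 5, 4        # toy index sets standing in for graphs / assignments
psi = [[rng.uniform(0.5, 2.0) for _ in range(nS)] for _ in range(nG)]
prior = [1.0 / nG] * nG                       # plays the role of P[G = . | D]

Z = [sum(row) for row in psi]                                   # partition functions
Epsi = [sum(prior[g] * psi[g][s] for g in range(nG)) for s in range(nS)]

ghat = [prior[g] * Z[g] for g in range(nG)]                     # (3.1), unnormalised
ghat = [v / sum(ghat) for v in ghat]
shat = [v / sum(Epsi) for v in Epsi]                            # (3.2)
gstar = [[prior[g] * psi[g][s] / Epsi[s] for g in range(nG)]    # (2.2), given sigma
         for s in range(nS)]

# Check (3.3): P[Ghat = G] mu_G(sigma) = P[sighat = sigma] P[G* = G | sigma].
viol = max(abs(ghat[g] * psi[g][s] / Z[g] - shat[s] * gstar[s][g])
           for g in range(nG) for s in range(nS))
print("max violation of (3.3):", viol)
```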
3.3. Degree pruning. A further preparation is degree pruning. Specifically, while the random factor graph models G* and Ĝ may possess degrees that are polynomially large in n, the following proposition shows that it suffices to prove the main result (2.4) for bounded degree sequences.

Proposition 3.3. Assume that for any integer L > 0 and for any d, k such that d, k ≤ L the statement (2.4) is true. Then (2.4) holds for all d, k that satisfy DEG and for which E[d], E[k] > 0.

The proof of Proposition 3.3 is based on concentration inequalities and coupling arguments for bipartite graphs with given degree sequences. Hence, we may assume from here on that d, k are bounded.

3.4. Cavities and couplings.
Two of the main steps towards the proof of Theorem 2.3, the Aizenman-Sims-Starr scheme and the interpolation method, hinge on comparing random factor graphs with slightly different parameters. For example, we will need to compare a random factor graph G* with n variable nodes and Po(E[d]n/E[k]) factor nodes and a factor graph with n + 1 variable nodes and Po(E[d](n+1)/E[k]) factor nodes. In the classical case of binomial factor graphs as treated in [14], where factor nodes are drawn independently, this coupling would be relatively straightforward. Indeed, we could just add a variable node and a few extra factor nodes to the graph with n variables. However, in the present setting of given degrees matters are much more delicate. For instance, how would you set up such a coupling for the d-regular stochastic block model from Section 1.3? Due to the given degrees the graph structure is too rigid to accommodate the necessary local changes.

To cope with this issue we first create a bit of wiggling room for ourselves by slightly reducing the number of factor nodes. This idea has been used in prior work on factor graphs with rigid degree distributions such as [12]. However, matters turn out to be rather more delicate here because we do not just work with purely random factor graphs, but with graphs drawn from the teacher-student model. Thus, we need to take care to meticulously implement the weight shifts in accordance with (2.2). Hence, for a small but fixed ε > 0 let m_ε ∼ Po((1−ε)E[d]n/E[k]) be a Poisson variable with a slightly smaller mean than m. Because we assume that all degrees are bounded, with probability 1 − exp(−Ω(n)) we have ∑_{i=1}^n d_i ≥ ∑_{i=1}^{m_ε} k_i. In fact, w.h.p. the total variable degree exceeds the total degree of the first m_ε factor nodes by Ω(n). Let G(n, m_ε) be a random factor graph with variable nodes x_1, …, x_n and factor nodes a_1, …, a_{m_ε} of degrees k_1, …, k_{m_ε} drawn uniformly at random subject to the condition that the degree of each x_i remains bounded by d_i. Thus, some of the variable nodes will likely have a degree strictly smaller than their 'target degree' d_i. We refer to these variable degrees as cavities. Further, given σ ∈ Ω^{V_n} let G*(n, m_ε, σ) be the random factor graph obtained as in (2.2), i.e., with D_ε denoting the σ-algebra generated by the degrees and the total number of factor nodes of G(n, m_ε) we let

P[G*(n, m_ε, σ) = G | D_ε] ∝ P[G(n, m_ε) = G | D_ε] ψ_G(σ).

The following proposition establishes that we can indeed think of G*(n, m_ε + 1, σ) as being obtained from G*(n, m_ε, σ) by adding one extra factor node a_{m_ε+1}. Further, for two factor graphs G, G′ on the same set of nodes let G △ G′ be the symmetric difference of their edge sets.
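In code, the cavity construction amounts to matching all factor clones into the pool of variable clones and leaving the surplus clones unmatched; a sketch follows (the greedy clone matching is our simplification of drawing the graph uniformly subject to the degree caps).

```python
import random

def cavity_graph(n, d, ks, rng):
    """Match all factor clones into the pool of variable clones; the leftover
    (unmatched) variable clones are the cavities."""
    pool = [i for i, di in enumerate(d) for _ in range(di)]
    assert sum(ks) <= len(pool)
    rng.shuffle(pool)
    it = iter(pool)
    edges = [[next(it) for _ in range(k)] for k in ks]
    cavities = list(it)                      # unmatched clones
    return edges, cavities

rng = random.Random(2)
n = 12
d = [3] * n                                  # target degrees d_i
ks = [3, 3, 2, 2, 4]                         # factor degrees, total 14 <= 36
edges, cav = cavity_graph(n, d, ks, rng)
deg = [0] * n
for e in edges:
    for v in e:
        deg[v] += 1
print("factor neighbourhoods:", edges)
print("cavities per variable:", [d[i] - deg[i] for i in range(n)])
```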
Proposition 3.4. Assume that |σ^{−1}(ω)| = n/q + O(√n log n) for all ω ∈ Ω. Then there exists a coupling of G*(n, m_ε, σ) and G*(n, m_ε + 1, σ) such that

P[ G*(n, m_ε, σ) = G*(n, m_ε+1, σ) − a_{m_ε+1} | D_ε ] = 1 − Õ(1/n),
P[ |G*(n, m_ε, σ) △ (G*(n, m_ε+1, σ) − a_{m_ε+1})| < √n log n | D_ε ] = 1 − Õ(1/n²).

There is a similar coupling that accommodates the addition of an extra variable node.
Proposition 3.5. Assume that |σ^{−1}(ω)| = n/q + O(√n log n) for all ω ∈ Ω. Given the degree γ of x_{n+1} in G*(n+1, m_ε + γ, σ), there exists a coupling of G*(n, m_ε, σ) and G*(n+1, m_ε + γ, σ) such that

P[ G*(n, m_ε, σ) = G*(n+1, m_ε+γ, σ) − x_{n+1} − ∂x_{n+1} | D_ε ] = 1 − Õ(1/n),
P[ |G*(n, m_ε, σ) △ (G*(n+1, m_ε+γ, σ) − x_{n+1} − ∂x_{n+1})| < √n log n | D_ε ] = 1 − Õ(1/n²).
1. Illustration of the interpolation method at ’times’ t = t = O (1/ n ), ˜ O (1/ n ) of the error terms in Propositions 3.4 and 3.5 are vital to facilitate the com-putation of the partition function. On a technical level, the tools that we develop for proving these propo-sitions, and particularly for dealing with the fragile combinatorics of the factor graph models with givendegrees, constitute the main novelty of the paper. This is where we most visibly add to and improve overthe machinery developed in prior work. The details can be found in Section 4.3.3.5. Aizenman-Sims-Starr and interpolation.
Propositions 3.4 and 3.5 in combination with a trick known as the Aizenman-Sims-Starr scheme yield the desired upper bound on the partition function.
Proposition 3.6.
We have E[log Z(G*)] ≤ n sup_{π∈P*(Ω)} B(π) + o(n).

To prove Proposition 3.6 it suffices to establish the corresponding upper bound for G*(n, m_ε, σ*). This is because similar but simpler arguments as in the proof of Proposition 3.4 show that E[log Z(G*)] = E[log Z(G*(n, m_ε, σ*))] + O(εn). Its proof can be found in Section 13. Now, the Aizenman-Sims-Starr scheme for calculating the latter quantity is to write a telescoping sum

E[log Z(G*(n, m_ε, σ*))] = ∑_{N=1}^{n−1} E[log Z(G*(N+1, m_ε(N+1), σ*_{N+1}))] − E[log Z(G*(N, m_ε(N), σ*_N))].

Hence, it suffices to bound the individual summands on the r.h.s., i.e., the differences

E[log Z(G*(n+1, m_ε(n+1), σ*_{n+1}))] − E[log Z(G*(n, m_ε(n), σ*_n))].   (3.4)

To this end we couple these two random factor graphs. This is where Propositions 3.4 and 3.5 enter the fray. Specifically, we think of both these factor graphs as being obtained from a smaller factor graph with variable nodes x_1, …, x_n and slightly fewer factor nodes than either of the two target factor graphs. Then we obtain G*(n, m_ε(n), σ*_n) by adding a few random factor nodes to this smaller graph. Similarly, we obtain G*(n+1, m_ε(n+1), σ*_{n+1}) by adding a few new random factor nodes as well as a new variable node x_{n+1} along with a number of adjacent factor nodes. Crucially, Propositions 3.4 and 3.5 provide the necessary accuracy to trace the impact of these manipulations on the partition function, and the Bethe functional emerges organically as an upper bound on (3.4).

To obtain the matching lower bound we seize upon the interpolation method. The basic idea is to set up a family of random factor graph models parametrised by time t ∈ [0, 1] such that the model at time t = 1 coincides with G*(n, m_ε, σ*), while the free entropy of the model at time t = 0 comes out as the Bethe functional B(π). To derive the desired lower bound we prove that the derivative of the log-partition function remains non-negative as we increase t. As in the Aizenman-Sims-Starr scheme, the computation of the derivative can be reduced to tracing the impact of local changes. Hence, once more we bring Proposition 3.4 to bear, this time in combination with the convexity assumption POS, to prove the following.
Proposition 3.7.
We have E[log Z(G*)] ≥ n sup_{π∈P*(Ω)} B(π) + o(n).

Finally, combining Propositions 3.1–3.7, we obtain Theorem 2.3.

3.6. Discussion.
There has been a great deal of interest in inference problems on random factor graphs recently. The substantial literature on the stochastic block model alone, much of it devoted to corroborating the predictions from [20], is surveyed in [1, 39]. The literature on applications to modern coding theory until about 2008 is surveyed in [43]; important newer contributions include [31, 32]. Further recent applications include compressed sensing [22, 23], group testing [3, 19], code-division multiple access [27, 42] and the patient zero problem [4]. Apart from and beyond this rigorous literature, there is a vast body of work based on either physics techniques such as the cavity method or computer experiments. The great variety of concrete problems studied individually underscores the potential of generic proof techniques or, even better, general theorems that rigorise these predictions wholesale. A first contribution has been made by Coja-Oghlan, Krzakala, Perkins and Zdeborová [14], who studied the teacher-student model on binomial random factor graph models. While the general proof strategy that we pursue here is guided by that paper, the present factor graph models are more general by allowing prescribed degree sequences for both the variable and factor nodes. From an application viewpoint this generality is highly desirable because, for example, the quality of an error correcting code or a group testing scheme can be boosted by optimising the degree distribution [43]. However, from a technical viewpoint this generality comes at the cost of losing (conditional) independence among the factor nodes. This issue is well known in random graph theory, where random graphs with given degrees require far more intricate proofs than, e.g., the Erdős–Rényi model [29]. Here, these difficulties are exacerbated by the fact that we study not just the plain random graph, which serves as our null model, but the reweighted random graph distribution induced by the teacher-student scheme. In effect, many of the steps that were straightforward in [14] become rather delicate due to stochastic dependencies. The key tool that allows us to cope with these dependencies is Proposition 3.4. Thus, while we follow the strategy from [14] of combining the Aizenman-Sims-Starr scheme with the interpolation method and although we adopt some of the technical ingredients from that work such as the 'pinning lemma', the greater generality of the model leads us to crystallise and improve over the previous approach.

What are alternatives to the present strategy of combining the Aizenman-Sims-Starr scheme with the interpolation method? A classical approach to inference problems on random graphs is the second moment method [5]. Unfortunately, this approach does not generally allow for tight information-theoretic results. The reason is that the precise formula for the mutual information or the information-theoretic threshold in, e.g., the stochastic block model comes in terms of the optimiser of the Bethe free entropy functional. The distribution π where the maximum is attained mirrors the outcome of a complicated message passing process. Intuitively, π is an idealised version of the empirical distribution of Belief Propagation messages that whiz around the factor graph upon convergence when launched from either a uniform initialisation or from the completely polarised initialisation corresponding to the ground truth. In some examples this fixed point can be characterised precisely and, unsurprisingly, turns out to be anything but trivial [6]. But we cannot expect the expressiveness required for such a complicated object from a plain second moment computation. A second conceptually elementary approach is to actually compute the message passing fixed point by hand, e.g., via the contraction method. But due to the intricacy of the calculations this method has been pushed through in only a few special cases [33].

Further powerful techniques include spatial coupling [26] and the adaptive interpolation method [7]. Both potentially allow for precise results. The basic idea behind spatial coupling is to convert the given model into a factor graph model with a superimposed geometric structure. A plus of spatial coupling is that it sometimes allows for better inference algorithms. A disadvantage is that the construction has to be carried out case-by-case. By comparison, the adaptive interpolation method has the advantage of being technically relatively clean. However, at least on sparse models its combinatorial nuts and bolts appear to be roughly equivalent to the combination of Aizenman-Sims-Starr and the interpolation argument used here. Furthermore, the latter approach has the merit of being closer in spirit to the physicists' cavity calculation. In addition, at this time the adaptive interpolation method has not been extended to models with given general degree sequences.

Further, there has been quite some work on dense random factor graph models where each variable appears in a constant fraction of factor nodes. Examples are spiked matrix/tensor models [9] or models of neural networks such as the Hopfield model [2, 37]. These methods are closer in nature to the classical Sherrington-Kirkpatrick model [40]. It seems fair to say that more is known about dense models than sparse ones because certain central limit theorem-like simplifications arise. In some cases, the Bethe variational principle reduces to a finite-dimensional or even scalar optimisation problem [21, 34].

To conclude we note that the study of inference problems typically comes in two instalments: an information-theoretic view that asks for thresholds beyond which in principle sufficient information is available to form a non-trivial estimate of the ground truth and an algorithmic view interested in polynomial-time algorithms. While the two perspectives might appear disparate at first glance, information-theoretic results on inference problems like in this paper in combination with tools such as spatial coupling have in the past led to efficient algorithms capable of attaining the information-theoretic thresholds [19, 23]. We view this as an exciting avenue for future research.

3.7. Organisation.
In Section 4 we introduce an extension of the random factor graph model from Section 2 that incorporates the bells and whistles required to facilitate the proofs of Propositions 3.6 and 3.7. The section also contains the proofs of Propositions 3.2 and 3.4. Sections 5–10 lay the foundation to prove Proposition 3.1 in Section 11. Similarly, Section 12 will be used in Section 13 to prove Proposition 3.3. Subsequently in Section 14 we prove Proposition 3.6. The proof of Proposition 3.7 follows in Section 15. In Section 16 we prove the results stated in Section 1 and also point out a few further applications of the theorems from Section 2. Two further extensions of our results can be found in Section 17.

4. Groundwork

4.1. A generalised model.
To facilitate the various parts of the proof we introduce one unified random factor graph model and supply a few tools for analysing it. The generic model has variable nodes V_n = {x_1, …, x_n} and factor nodes F_m = {a_1, …, a_m}. Each variable node comes with a target degree d_i ≥ 0. The sequence (d_1, …, d_n) is denoted by d. Similarly, each factor node a_i comes with a target degree k_i ≥ 1 and we write k = (k_1, …, k_m). The degrees are required to satisfy the condition

∑_{i=1}^n d_i ≥ ∑_{i=1}^m k_i.   (4.1)

Every i ∈ [m] comes with a finite set Ψ_i of weight functions Ω^{k_i} → (0, ∞), each of which is equipped with a probability measure P_i. Let P = (P_1, …, P_m).

The random factor graph G(d, k, P, θ) is now defined as follows. Let Γ be a random maximal matching of the complete bipartite graph with vertex classes

⋃_{i=1}^n {x_i} × [d_i] and ⋃_{i=1}^m {a_i} × [k_i].

Then the bipartite graph underlying G(d, k, P, θ) is obtained from Γ by contracting the vertex sets {x_i} × [d_i] and {a_j} × [k_j] for all i ∈ [n] and all j ∈ [m]. Thus, the construction is similar to the well known pairing model for random graphs with given degree sequences. Strictly speaking, the result of this process is a bipartite multigraph. We turn this multigraph into a factor graph by drawing for each a_i a weight function ψ_{a_i} from the distribution P_i independently. Furthermore, we add a few unary factor nodes p_1, …, p_θ. For each p_i we let ∂p_i = {x_i}. Moreover, with ω_i ∈ Ω drawn independently and uniformly, the weight function of p_i reads ψ_{p_i}(σ) = 1{σ = ω_i}.

The random factor graph induces a Boltzmann distribution and partition function defined via (2.1). Furthermore, G(d, k, P, θ) induces the reweighted factor graph distribution Ĝ(d, k, P, θ) defined by

P[Ĝ(d, k, P, θ) ∈ A] = E[Z(G(d, k, P, θ)) 1{G(d, k, P, θ) ∈ A}] / E[Z(G(d, k, P, θ))] for any event A.   (4.2)

Further, given σ ∈ Ω^{V_n} we define G*(d, k, P, θ, σ) by

P[G*(d, k, P, θ, σ) ∈ A] = E[ψ_{G(d,k,P,θ)}(σ) 1{G(d, k, P, θ) ∈ A}] / E[ψ_{G(d,k,P,θ)}(σ)] for any event A.   (4.3)

Finally, we obtain an induced distribution σ̂(d, k, P, θ) on assignments via

P[σ̂(d, k, P, θ) = σ] = E[ψ_{G(d,k,P,θ)}(σ)] / E[Z(G(d, k, P, θ))].   (4.4)

4.2. Getting started.
The factor graph model and the corresponding Boltzmann distribution facilitate delicate correlations between the spins of different vertices. To cope with them technically, we are in the lucky position that any finite probability space can be partitioned into finitely many sets (so-called pure states) such that a given probability measure behaves like a product measure on these states.
Lemma 4.1 (Regularity Lemma, [17]). For any finite set Ω and for all ε > 0 there are L > 0 and N > 0 such that for all n ≥ N and all μ ∈ P(Ω^n) we find a partition S_0, S_1, …, S_ℓ of Ω^n into finitely many parts (1 ≤ ℓ ≤ L) such that
• ∑_{i=1}^{ℓ} μ(S_i) ≥ 1 − ε,
• for all i ≥ 1 we find μ(S_i) > 0 and E_{j,k}[ d_TV( μ_{j,k}[· | S_i], μ_j[· | S_i] ⊗ μ_k[· | S_i] ) ] ≤ ε.
Lemma 4.2 (Symmetry, [15]). For any finite set Ω and any measure μ ∈ P(Ω^n) we find that for any k ≥ 2,

E[ d_TV( μ_{i,j}, μ_i ⊗ μ_j ) ] = o(1) ⟹ E[ d_TV( μ_{i_1,…,i_k}, ⊗_{ℓ=1}^{k} μ_{i_ℓ} ) ] = o(1).

We are left to find a partition of Ω^n into pure states. It turns out that the pinning operation (that is, assigning specific values to a small number of variables) yields a regular partition.

Lemma 4.3 (Pinning Lemma, Lemma 3.5 of [14]). Let Ω be a finite set. For all ε > 0 there is a number T = T(ε, Ω) such that for any n > T and any probability measure μ ∈ P(Ω^n) we find the following. We create a random probability measure μ̌ ∈ P(Ω^n) as follows.
• Draw a sample σ̌ from μ.
• Independently, choose Θ ∈ (0, T) uniformly at random.
• Create a random subset U of [n] by including each i ∈ [n] independently with probability Θ/n.
• Finally, define

μ̌(σ) = μ(σ) 1{∀i ∈ U: σ̌_i = σ_i} / μ({τ ∈ Ω^n : ∀i ∈ U: τ_i = σ̌_i}).

Then, with probability at least 1 − ε we find E_{i,j}[ d_TV( μ̌_{i,j}, μ̌_i ⊗ μ̌_j ) ] < ε.

The following lemma evinces that if the free energy of G* is larger than the first moment bound, then the free energy of G is strictly smaller than this bound.

Lemma 4.4. We have

E[log Z(G*(d, k, P, θ, σ*))] = log E[Z(G(d, k, P, θ))] + o(n) ⟺ E[log Z(G(d, k, P, θ))] = log E[Z(G(d, k, P, θ))] + o(n).

Lemma 4.4 is an immediate consequence of Lemma 17.8.

Throughout this paper, we will use the standard Landau notation and introduce Õ(·) to hide logarithmic factors. Moreover, if (E_n)_n denotes a sequence of events we say that (E_n)_n holds with high probability (w.h.p.) if lim_{n→∞} P[E_n] = 1.
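The pinning operation of Lemma 4.3 is straightforward to transcribe for a measure given as an explicit table (hence only for toy n; all names below are ours):

```python
import itertools, random

def pin(mu, states, T, rng):
    """One round of the pinning operation from Lemma 4.3."""
    r, acc, sigma = rng.random(), 0.0, states[-1]
    for s in states:                        # draw the reference sample from mu
        acc += mu[s]
        if acc >= r:
            sigma = s
            break
    theta = rng.uniform(0.0, T)
    n = len(states[0])
    U = [i for i in range(n) if rng.random() < theta / n]
    keep = lambda s: all(s[i] == sigma[i] for i in U)
    mass = sum(mu[s] for s in states if keep(s))
    return {s: (mu[s] / mass if keep(s) else 0.0) for s in states}, U

rng = random.Random(11)
n, T = 6, 4.0
states = list(itertools.product((0, 1), repeat=n))
w = {s: rng.uniform(0.1, 1.0) for s in states}
mu = {s: v / sum(w.values()) for s, v in w.items()}
mu_check, U = pin(mu, states, T, rng)
print("pinned coordinates:", U, " total mass:", round(sum(mu_check.values()), 8))
```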
The proofs in the subsequent sections require the weight functions to be bounded and not too small, which we ensure by imposing the condition that they take values in (ε, 2); this can be safely assumed by SYM.

4.3. Adding factor nodes.
Let d = (d_1, …, d_n), k = (k_1, …, k_m) and (Ψ_1, P_1), …, (Ψ_m, P_m) be as before. The aim in this section is to compare the random factor graph model with these parameters with a model with one extra factor node. Hence, let k+ = (k_1, …, k_m, k_{m+1}) be a degree sequence obtained from k by adding one more entry. Additionally, let (Ψ_{m+1}, P_{m+1}) be a set of possible weight functions for the new factor node together with a probability distribution on that set. The aim of the following proposition is to show that G*(d, k+, P, θ, σ) can essentially be obtained by first creating G*(d, k, P, θ, σ) and then adding one extra factor node. While such a description is trivially valid in the realm of binomial factor graph models, in the present setting of given degrees matters turn out to be quite delicate. In particular, we need to assume the following.

SYM′: There exist reals ε, ξ > 0 such that for all ψ ∈ ⋃_{1≤i≤m+1} Ψ_i, j ∈ [k_ψ], ω ∈ Ω we have

q^{1−k_ψ} ∑_{σ∈Ω^{k_ψ}} 1{σ_j = ω} ψ(σ) = ξ, min_{σ∈Ω^{k_ψ}} ψ(σ) > ε.

In particular, SYM holds for P_i, i ∈ [m+1].
Proposition 4.5. For any fixed C > 0, ε > 0 the following is true. Suppose that all degrees satisfy d_i ≤ C for i ∈ [n], k_j ≤ C for j ∈ [m+1], that

∑_{i=1}^n d_i − ∑_{i=1}^m k_i ≥ εn,

and that SYM′ is satisfied. Moreover, assume that σ ∈ Ω^{V_n} is such that for all ω ∈ Ω we have

| ∑_{i=1}^n d_i ( 1{σ_i = ω} − 1/q ) | = O(√n log n) and | ∑_{i=1}^n 1{σ_i = ω} − n/q | = O(√n log n).

Then there exists a coupling of G*(d, k, P, θ, σ) and G*(d, k+, P, θ, σ) such that

P[ G*(d, k, P, θ, σ) = G*(d, k+, P, θ, σ) − a_{m+1} ] = 1 − Õ(n^{−1}),
P[ |G*(d, k, P, θ, σ) △ G*(d, k+, P, θ, σ)| < √n log n ] = 1 − O(n^{−2}).

We also need an estimate of the total variation distance of the two random factor graph models when SYM is not assumed for the last factor node.
SYM′′: There exist reals ε, ξ > 0 such that for all ψ ∈ ⋃_{1≤i≤m} Ψ_i, j ∈ [k_ψ], ω ∈ Ω we have

q^{1−k_ψ} ∑_{σ∈Ω^{k_ψ}} 1{σ_j = ω} ψ(σ) = ξ, min_{σ∈Ω^{k_ψ}} ψ(σ) > ε.
Proposition 4.6. For any fixed C > 0, ε > 0 the following is true. Suppose that all degrees satisfy d_i ≤ C for i ∈ [n], k_j ≤ C for j ∈ [m+1], that

∑_{i=1}^n d_i − ∑_{i=1}^m k_i ≥ εn,

and that SYM′′ is satisfied. Moreover, assume that σ ∈ Ω^{V_n} is such that for all ω ∈ Ω we have

| ∑_{i=1}^n d_i ( 1{σ_i = ω} − 1/q ) | = O(√n log n) and | ∑_{i=1}^n 1{σ_i = ω} − n/q | = O(√n log n).

Then there exists a coupling of G*(d, k, P, θ, σ) and G*(d, k+, P, θ, σ) such that

P[ G*(d, k, P, θ, σ) = G*(d, k+, P, θ, σ) − a_{m+1} ] = 1 − Õ(n^{−1}),
P[ |G*(d, k, P, θ, σ) △ G*(d, k+, P, θ, σ)| < √n log n ] = 1 − O(n^{−2}).

A key feature of Proposition 4.6 is that we do not need to assume SYM′ for the new factor node a_{m+1}. To prove Proposition 4.5 we introduce a more accessible construction of the graph G*(d, k, P, θ, σ). Let

Δ = ∑_{i=1}^n d_i − ∑_{i=1}^m k_i ≥ εn, Δ+ = Δ − k_{m+1}.   (4.5)

Additionally, for each i ∈ [Δ] we introduce a unary factor node b_i whose weight function is just the constant 1. Hence, the overall number of factor nodes becomes m + Δ. Like in the pairing model of random graphs with given degree sequences we further introduce sets

X = ⋃_{i=1}^n {x_i} × [d_i], A = ⋃_{i=1}^m {a_i} × [k_i], A+ = ⋃_{i=1}^{m+1} {a_i} × [k_i], D = {b_1, …, b_Δ}, D+ = {b_1, …, b_{Δ+}}

of clones of variable and factor nodes. Moreover, given the assignment σ ∈ Ω^n let χ ∈ Ω^X be the induced assignment on the variable clones.

We now consider the following experiment whose outcome is a factor graph G♯(d, k, P, σ).
SHARP1: Generate a random assignment y♯ ∈ Ω^{A∪D} as follows. Draw y♭ from the distribution

P[y♭ = y] = ∏_{i=1}^{m} ( E[ψ_{a_i}(y_{a_i})] / ∑_{y′∈Ω^{k_i}} E[ψ_{a_i}(y′)] ) · q^{−Δ}   (y ∈ Ω^{A∪D}),

and then choose

P[y♯ = y] = P[ y♭ = y | ρ_{y♭} = ρ_χ ],

where ρ_τ denotes the empirical distribution of spins under the configuration τ and y_{a_i} denotes the restriction of y to {a_i} × [k_i].
SHARP2: Given y♯ = y, for i ∈ [m] independently, choose weight functions according to

P[ ψ♯_{a_i} ∈ E | y♯ = y ] = E[ ψ_{a_i}(y_{a_i}) 1{ψ_{a_i} ∈ E} ] / E[ ψ_{a_i}(y_{a_i}) ],

where y_{a_i} denotes the restriction of y to {a_i} × [k_i].
SHARP3: Finally, choose a bijection g♯: X → A∪D uniformly from the set of all bijections g such that y ∘ g = χ; thus, for any such g we have

P[ g♯ = g | y♯ = y ] = ( ∏_{z∈Ω} |y^{−1}(z)|! )^{−1}.

We denote the result of this procedure by G♯(d, k, P, σ). From this graph we obtain G♯(d, k, P, θ, σ) by adding unary factor nodes p_1, …, p_θ adjacent to x_1, …, x_θ with weight functions τ ↦ 1{τ = σ_i}. Analogously we define χ+, y♭,+, y♯,+, ψ♯,+_{a_i}, g♯,+ for the degree sequence (k_1, …, k_{m+1}). These give rise to the factor graph G♯(d, k+, P, θ, σ).
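The law of y♯ in SHARP1 is the law of y♭ conditioned on hitting the colour statistics of χ, so for small instances it can be sampled by rejection; a sketch follows (with a generic per-factor-block law standing in for the distribution proportional to E[ψ_{a_i}(·)], and placeholder weights of our own).

```python
import random

def sample_y_flat(block_laws, delta, q, rng):
    """One sample of y-flat: each factor block independently from its law (a dict
    tau -> prob, proportional to E[psi_{a_i}(tau)]); the Delta unary clones uniform."""
    y = []
    for law in block_laws:
        r, acc, pick = rng.random(), 0.0, None
        for t, p in law.items():
            acc += p
            pick = t
            if acc >= r:
                break
        y.extend(pick)
    y.extend(rng.randrange(q) for _ in range(delta))
    return tuple(y)

def sample_y_sharp(block_laws, delta, q, rho_chi, rng, max_tries=200000):
    """SHARP1: the law of y-sharp is y-flat conditioned on the colour statistics."""
    for _ in range(max_tries):
        y = sample_y_flat(block_laws, delta, q, rng)
        if tuple(y.count(z) for z in range(q)) == rho_chi:
            return y
    raise RuntimeError("rejection sampling failed to hit rho_chi")

rng = random.Random(4)
law = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}   # placeholder weights
y = sample_y_sharp([law, law], delta=4, q=2, rho_chi=(4, 4), rng=rng)
print("one sample of y-sharp:", y)
```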
Lemma 4.7. The random factor graphs G*(d, k, P, θ, σ) and G♯(d, k, P, θ, σ) are identically distributed. So are G*(d, k+, P, θ, σ) and G♯(d, k+, P, θ, σ).

Proof. It suffices to prove the second statement. Hence, let g: X → A+ ∪ D+ be a bijection and write y = χ ∘ g^{−1} for the induced assignment on A+ ∪ D+. Then for any events E_1, …, E_{m+1},

P[ G♯(d, k+, P, θ, σ) ∈ {g} × ∏_{i=1}^{m+1} E_i ] = P[y♯,+ = y] P[ g♯,+ = g | y♯,+ = y ] ∏_{i=1}^{m+1} P[ ψ♯,+_{a_i} ∈ E_i | y♯,+ = y ]
 = q^{−Δ+} / ( P[ρ_{y♭,+} = ρ_χ] ∏_{τ∈Ω} |y^{−1}(τ)|! ) · ( ∏_{i=1}^{m+1} E[ψ_{a_i}(y_{a_i})] / ∑_{τ∈Ω^{k_i}} E[ψ_{a_i}(τ)] ) ( ∏_{i=1}^{m+1} E[ψ_{a_i}(y_{a_i}) 1{ψ_{a_i} ∈ E_i}] / E[ψ_{a_i}(y_{a_i})] ).

Moreover, with f ranging over all bijections X → A+ ∪ D+,

P[ G*(d, k+, P, θ, σ) ∈ {g} × ∏_{i=1}^{m+1} E_i ] = E[ ψ_{G(d,k+,P,θ)}(σ) 1{G(d, k+, P, θ) ∈ {g} × ∏_{i=1}^{m+1} E_i} ] / E[ ψ_{G(d,k+,P,θ)}(σ) ]
 = ∏_{i=1}^{m+1} E[ψ_{a_i}(y_{a_i}) 1{ψ_{a_i} ∈ E_i}] / ∑_f ∏_{i=1}^{m+1} E[ψ_{a_i}((χ∘f^{−1})_{a_i})]
 = ( ∏_{i=1}^{m+1} E[ψ_{a_i}(y_{a_i}) 1{ψ_{a_i} ∈ E_i}] / E[ψ_{a_i}(y_{a_i})] ) · ∏_{i=1}^{m+1} E[ψ_{a_i}(y_{a_i})] / ∑_f ∏_{i=1}^{m+1} E[ψ_{a_i}((χ∘f^{−1})_{a_i})].

It thus remains to show that

P[ρ_{y♭,+} = ρ_χ] q^{Δ+} ∏_{i=1}^{m+1} ( ∑_{τ∈Ω^{k_i}} E[ψ_{a_i}(τ)] ) ∏_{τ∈Ω} |y^{−1}(τ)|! = ∑_f ∏_{i=1}^{m+1} E[ψ_{a_i}((χ∘f^{−1})_{a_i})].   (4.6)

On the right hand side we may alternatively sum over all possible images χ∘f^{−1} that arise from bijections f. Observe that each distinct χ∘f^{−1} arises from exactly ∏_{z∈Ω} |χ^{−1}(z)|! different f, as permuting the images of clones within a colour class does not change the induced image on the factor side. Moreover, we can only see images χ∘f^{−1} with ρ_{χ∘f^{−1}} = ρ_χ, which means |χ^{−1}(z)| = |(χ∘f^{−1})^{−1}(z)| for all z ∈ Ω. Therefore,

∑_f ∏_{i=1}^{m+1} E[ψ_{a_i}((χ∘f^{−1})_{a_i})] = ∏_{z∈Ω} |y^{−1}(z)|! ∑_{y′: ρ_{y′} = ρ_χ} ∏_{i=1}^{m+1} E[ψ_{a_i}(y′_{a_i})].

Further, by the definition of y♭,+,

P[ρ_{y♭,+} = ρ_χ] q^{Δ+} ∏_{i=1}^{m+1} ( ∑_{τ∈Ω^{k_i}} E[ψ_{a_i}(τ)] ) = ∑_{y′: ρ_{y′} = ρ_χ} ∏_{i=1}^{m+1} E[ψ_{a_i}(y′_{a_i})],

which establishes (4.6) and thus the lemma. ∎

We prove Proposition 4.5 by showing that the assignments observed on the factor nodes can be coupled so that they agree with probability 1 − Õ(1/n). Let y♯ and y♯,+ denote the assignments drawn as per SHARP1 for the two graphs. Furthermore, let A denote the set of clones of a_1, …, a_m and let y♯_A, y♯,+_A signify the restrictions of y♯, y♯,+ to A. Moreover, let us call y ∈ Ω^A extendible if

∑_{α∈A} 1{y_α = τ} ≤ ρ_χ(τ) ∑_{i=1}^n d_i for all τ ∈ Ω.

Thus, the extendible y are the conceivable outcomes of y♯_A, y♯,+_A.

As a first step we deal with 'atypical' extendible y. To this end we finally introduce for i ∈ [m]

Y♯/♭_i(τ) = ∑_{j=1}^{k_i} 1{y♯/♭(a_i, j) = τ}, A = ∑_{i=1}^m k_i.

Thus, Y♯/♭_i(τ) counts occurrences of τ among the clones of factor node a_i under y♯/♭ from SHARP1.

Lemma 4.8. Assume the assumptions of Proposition 4.5 to hold. We have

P[ |∑_{i=1}^m Y♯_i(τ) − A/q| > √A log A loglog A ] ≤ n^{−2}.

Proof. Due to SYM′ we have E[Y♭_i(τ)] = k_i/q for all τ ∈ Ω. Therefore, Stirling's formula yields that there exists c > 0 such that

P[ρ_{y♭} = ρ_χ] = Ω( n^{−(q−1)/2} n^{−c log n} ).

Hence,

P[ |∑_{i=1}^m Y♯_i(τ) − A/q| > √A log A loglog A ] ≤ P[ |∑_{i=1}^m Y♭_i(τ) − A/q| > √A log A loglog A ] / P[ρ_{y♭} = ρ_χ]
 ≤ O( n^{(q−1)/2} n^{c log n} ) P[ |∑_{i=1}^m Y♭_i(τ) − A/q| > √A log A loglog A ].   (4.7)

Moreover, the fact that the factor degrees are bounded and the Azuma–Hoeffding inequality imply that

P[ |∑_{i=1}^m Y♭_i(τ) − A/q| > √A log A loglog A ] ≤ 2 exp( −C log² m (loglog m)² ) = O( n^{−log n (loglog n)²} ).   (4.8)

Combining (4.7) and (4.8) completes the proof. ∎

Let Y be the set of all extendible y ∈ Ω^A such that for all τ ∈ Ω,

| ∑_{i=1}^m ∑_{j=1}^{k_i} 1{y(a_i, j) = τ} − A/q | ≤ √A log A loglog A.   (4.9)
Lemma 4.9. Suppose that SYM′ is satisfied. There is a coupling of y♯, y♯,+ and of G♯(d, k, P, θ, σ), G♯(d, k+, P, θ, σ) such that

P[ G♯(d, k, P, θ, σ) = G♯(d, k+, P, θ, σ) − a_{m+1} | y♯_A, y♯,+_A ∈ Y ] = 1 − Õ(1/n).

We prove Lemma 4.9 in several steps. The first step is to calculate the following ratio.
Claim 4.10. Suppose that α, β ∈ P(Ω) satisfy d_TV(α, u), d_TV(β, u) = O(n^{−1/2} log n loglog n), where u denotes the uniform distribution on Ω, and d_TV(α, β) = O(1/n). Then, writing C(N, Nγ) = N!/∏_{τ∈Ω}(Nγ(τ))! for the multinomial coefficient,

C(Δ+, Δ+α) C(Δ+, Δ+β)^{−1} = 1 + qΔ+ ∑_{τ∈Ω} (α(τ) − 1/q)(β(τ) − α(τ)) + Õ(1/n).
Proof. By Stirling's formula,

C(Δ+, Δ+α) = (2πΔ+)^{−(q−1)/2} exp( −Δ+ ∑_{τ∈Ω} α(τ) log α(τ) − (1/2) ∑_{τ∈Ω} log α(τ) + O(1/n) ).   (4.10)

Moreover, applying Taylor's formula to the entropy function, we obtain

−∑_{τ∈Ω} α(τ) log α(τ) = log q − (q/2) ∑_{τ∈Ω} (α(τ) − 1/q)² + (q²/6) ∑_{τ∈Ω} (α(τ) − 1/q)³ + Õ(1/n²).   (4.11)

Of course, estimates similar to (4.10) and (4.11) apply to C(Δ+, Δ+β). Combining them, we obtain

C(Δ+, Δ+α) C(Δ+, Δ+β)^{−1} = exp[ −Δ+ ( (q/2) ∑_{τ∈Ω} (α(τ)−1/q)² − (β(τ)−1/q)² − (q²/6) ∑_{τ∈Ω} (α(τ)−1/q)³ − (β(τ)−1/q)³ ) − (1/2) ∑_{τ∈Ω} log( α(τ)/β(τ) ) + O(1/n) ].   (4.12)

Furthermore,

∑_{τ∈Ω} (α(τ)−1/q)² − (β(τ)−1/q)² = −∑_{τ∈Ω} (β(τ)−α(τ))² − 2∑_{τ∈Ω} (α(τ)−1/q)(β(τ)−α(τ)) = O(1/n²) − 2∑_{τ∈Ω} (α(τ)−1/q)(β(τ)−α(τ)),   (4.13)

∑_{τ∈Ω} (α(τ)−1/q)³ − (β(τ)−1/q)³ = −∑_{τ∈Ω} (β(τ)−α(τ))³ + 3(α(τ)−1/q)(β(τ)−α(τ))² + 3(α(τ)−1/q)²(β(τ)−α(τ)) = Õ(1/n²).   (4.14)

Plugging (4.13) and (4.14) into (4.12), we obtain

C(Δ+, Δ+α) C(Δ+, Δ+β)^{−1} = exp( qΔ+ ∑_{τ∈Ω} (α(τ)−1/q)(β(τ)−α(τ)) + Õ(1/n) ).

Expanding the exponential series completes the proof. ∎
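Claim 4.10 lends itself to a quick numerical sanity check: compute the multinomial-coefficient ratio exactly via lgamma and compare it with the quadratic approximation. The parameter choices below are ours, scaled like the regime of the claim.

```python
import math

def log_multinom(N, counts):
    return math.lgamma(N + 1) - sum(math.lgamma(c + 1) for c in counts)

q, Dp = 3, 90000                       # Delta_+
dev = (260, -140, -120)                # ~ sqrt(n) log n deviations from uniform
shift = (1.4, -0.2, -1.2)              # alpha - beta = O(1/n), summing to zero
ca = [Dp // q + e for e in dev]                      # counts of Dp * alpha
cb = [round(c - s) for c, s in zip(ca, shift)]       # counts of Dp * beta
cb[-1] = Dp - sum(cb[:-1])                           # keep the total exact
alpha = [c / Dp for c in ca]
beta = [c / Dp for c in cb]

ratio = math.exp(log_multinom(Dp, ca) - log_multinom(Dp, cb))
approx = 1 + q * Dp * sum((a - 1 / q) * (b - a) for a, b in zip(alpha, beta))
print("exact ratio :", ratio)
print("approx      :", approx)
```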
For y ∈ Y we have

P[y♯_A = y] / P[y^{♯,+}_A = y] = 1 + Õ(n^{−1/2}) ∑_{j=1}^{k_{m+1}} ∑_{τ∈Ω} | P[y^{♯,+}_{b_j} = τ] − q⁻¹ | + Õ(n⁻¹).

Proof. For any y ∈ Y we have, by the definition of y♯ from SHARP1,

P[y♯_A = y] = P[y♭_A = y, ρ_{y♭} = ρ_χ] / P[ρ_{y♭} = ρ_χ]
  = ∑_{y′∈Ω^{k_{m+1}}} P[y♭_A = y, ∀i∈[k_{m+1}] : y♭_{b_i} = y′_i, ρ_{y♭} = ρ_χ] / P[ρ_{y♭} = ρ_χ]
  = ( P[y♭_A = y] / P[ρ_{y♭} = ρ_χ] ) ∑_{y′∈Ω^{k_{m+1}}} P[∀i∈[k_{m+1}] : y♭_{b_i} = y′_i] P[ρ_{(y,y′,y♭_{Δ⁺})} = ρ_χ].  (4.15)

Analogously,

P[y^{♯,+}_A = y] = ( P[y^{♭,+}_A = y] / P[ρ_{y^{♭,+}} = ρ_χ] ) ∑_{y′∈Ω^{k_{m+1}}} P[∀i∈[k_{m+1}] : y^{♭,+}_{b_i} = y′_i] P[ρ_{(y,y′,y^{♭,+}_{Δ⁺})} = ρ_χ].  (4.16)

Set

α_{y,y′}(τ) = (1/Δ⁺)[ ∑_{i=1}^{n} d_i 1{σ_i = τ} − ∑_{i=1}^{m} ∑_{j=1}^{k_i} 1{y(a_i,j) = τ} − ∑_{j=1}^{k_{m+1}} 1{y′_j = τ} ],
α′_y(τ) = (1/Δ⁺)[ ∑_{i=1}^{n} d_i 1{σ_i = τ} − ∑_{i=1}^{m} ∑_{j=1}^{k_i} 1{y(a_i,j) = τ} − k_{m+1}/q ],
α″_{y′}(τ) = (1/Δ⁺) ∑_{j=1}^{k_{m+1}} ( 1/q − 1{y′_j = τ} ),

so that α_{y,y′}(τ) = α′_y(τ) + α″_{y′}(τ). Then

P[ρ_{(y,y′,y♭_{Δ⁺})} = ρ_χ] = P[ Mult(Δ⁺, (1/q, …, 1/q)) = Δ⁺α_{y,y′} ] = q^{−Δ⁺} \binom{Δ⁺}{(α_{y,y′}(τ)Δ⁺)_{τ∈Ω}}.

Moreover, because y ∈ Y we have

α′_y(τ) = 1/q + O( log n log log n / √n )  and  α″_{y′}(τ) = O(1/n).

Claim 4.10, (4.15) and (4.16) therefore yield

P[y♯_A = y] / P[y^{♯,+}_A = y] = ( P[ρ_{y^{♭,+}} = ρ_χ] / P[ρ_{y♭} = ρ_χ] ) ( 1 + Õ(1/n) + Õ(n^{−1/2}) ∑_{τ∈Ω} ∑_{j=1}^{k_{m+1}} |P[y^{♭,+}_{b_j} = τ] − q⁻¹| ).  (4.17)

We finally need to compare P[ρ_{y♭} = ρ_χ] and P[ρ_{y^{♭,+}} = ρ_χ]. This can be done in a similar way to the previous calculation. For y′ ∈ Ω^{k_{m+1}} and τ ∈ Ω, write

α⁻_{y′}(τ) = ( ∑_{i=1}^{n} d_i 1{σ_i = τ} − ∑_{h=1}^{k_{m+1}} 1{y′_h = τ} ) / ( ∑_{i=1}^{n} d_i − k_{m+1} ).

Moreover, let y^{♭,−} be the vector y♭ with the components corresponding to b_1, …, b_{k_{m+1}} removed. Then

P[ρ_{y♭} = ρ_χ] / P[ρ_{y^{♭,+}} = ρ_χ] = ( ∑_{y″∈Ω^{k_{m+1}}} E[ψ_{m+1}(y″)] / q^{k_{m+1}} ) · ∑_{y′∈Ω^{k_{m+1}}} P[ρ_{y^{♭,−}} = (α⁻_{y′}(τ))_{τ∈Ω}] / ∑_{y′∈Ω^{k_{m+1}}} P[ρ_{y^{♭,−}} = (α⁻_{y′}(τ))_{τ∈Ω}] E[ψ_{m+1}(y′)].  (4.18)

We next compare the probabilities to hit certain colour statistics when factor node a_{m+1} is added:

P[ρ_{y^{♭,−}} = (α⁻_{y′}(τ))_{τ∈Ω}] = ∑_{y⁺∈Ω^A} P[ Mult(Δ⁺, (1/q, …, 1/q)) = Δ⁺α_{y⁺,y′} ] P[y♭_A = y⁺].  (4.19)

To estimate (4.19) we notice that for any y⁺ ∈ Ω^A with

| ∑_{i=1}^{m} ∑_{h=1}^{k_i} 1{y⁺(a_i,h) = τ} − |A|/q | > √|A| log|A| log log|A|  (4.20)

and any y′ ∈ Ω^{k_{m+1}}, we find

| Δ⁺α_{y⁺,y′}(τ) − Δ⁺/q | = Ω( √|A| log|A| log log|A| ).  (4.21)

Further, if (4.21) is satisfied, then the Chernoff bound implies that there is a constant δ > 0 such that

P[ Mult(Δ⁺, (1/q, …, 1/q)) = Δ⁺α_{y⁺,y′} ] P[y♭_A = y⁺] = O( n^{−δ log n (log log n)²} ) P[y♭_A = y⁺].

Hence,

P[ρ_{y♭} = ρ_χ] / P[ρ_{y^{♭,+}} = ρ_χ] = ( q^{−k_{m+1}} ∑_{y′∈Ω^{k_{m+1}}} ∑_{y⁺∈Y} P[y♭_A = y⁺] \binom{Δ⁺}{Δ⁺α_{y⁺,y′}} ) / ( ∑_{y′∈Ω^{k_{m+1}}} P[y^{♭,+}_{a_{m+1}} = y′] ∑_{y⁺∈Y} P[y♭_A = y⁺] \binom{Δ⁺}{Δ⁺α_{y⁺,y′}} ) + Õ(1/n).

Thus, Claim 4.10 yields

P[ρ_{y♭} = ρ_χ] / P[ρ_{y^{♭,+}} = ρ_χ] = 1 + Õ(n^{−1/2}) ∑_{j=1}^{k_{m+1}} ∑_{τ∈Ω} |P[y^{♯,+}_{b_j} = τ] − q⁻¹| + Õ(n⁻¹).  (4.22)

Combining (4.15), (4.16), (4.17) and (4.22), we obtain the assertion. ∎

Proof of Lemma 4.9. Claim 4.11 and assumption
SYM′ yield d_TV(y♯_A, y^{♯,+}_A) = Õ(1/n). The coupling lemma for the total variation distance (see e.g. [35]) therefore yields a coupling under which y♯_A and y^{♯,+}_A differ with probability Õ(1/n). The construction SHARP2–3 finally extends this coupling to the desired coupling of G♯(d,k,P,θ,σ) and G♯(d,k⁺,P,θ,σ). ∎

Lemma 4.12.
There is a coupling of y♯, y^{♯,+} and of G♯(d,k,P,θ,σ), G♯(d,k⁺,P,θ,σ) such that

P[ |G♯(d,k,P,θ,σ) △ G♯(d,k⁺,P,θ,σ)| > √n log n | y♯_A, y^{♯,+}_A ∈ Y ] = O(n⁻²).

Proof.
Let us denote by {(ψ,k,τ) : ψ ∈ Ψ, k ∈ ℕ_{≥1}, τ ∈ Ω^k} the set of all possible triples of weight function, arity and neighbourhood spins for a factor node. Further, let q_{ψ,k,τ} denote the probability of observing such a triple. Since each factor node's arity is bounded, Ω is a finite set and there exist only finitely many different weight functions, the number of distinct weight function, arity and neighbourhood triples is also finite. Thus there is a fixed ε > 0 with q_{ψ,k,τ} > ε for every triple that occurs with positive probability. Since there are m = Θ(n) many factor nodes, the Chernoff bound for the binomial distribution ensures that each distinct (ψ,k,τ) occurs in both G♯(d,k,P,θ,σ) and G♯(d,k⁺,P,θ,σ), for any choice of d, k⁺, P, θ, σ, at least d̄nq_{ψ,k,τ}/k̄ − √n log n times with probability 1 − O(n⁻²). Therefore, we can couple G♯(d,k,P,θ,σ) and G♯(d,k⁺,P,θ,σ) in such a way that they differ in at most √n log n factor nodes with probability 1 − O(n⁻²), whence the lemma follows. ∎

Proof of Proposition 4.5.
The proposition is an immediate consequence of Lemmas 4.7–4.12. ∎
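The coupling lemma used in the proof of Lemma 4.9 above can be made concrete: two distributions at total variation distance δ can be sampled jointly so that the samples disagree with probability exactly δ. A minimal Python sketch of this maximal coupling, with invented example distributions:

```python
import random

def draw(dist):
    """Sample from a dict outcome -> probability."""
    u, acc = random.random(), 0.0
    for x, w in dist.items():
        acc += w
        if u <= acc:
            return x
    return x                                   # guard against float rounding

def maximal_coupling(p, q_):
    """Return (X, Y) with X ~ p, Y ~ q_ and P[X != Y] = d_TV(p, q_)."""
    overlap = {x: min(p.get(x, 0.0), q_.get(x, 0.0)) for x in set(p) | set(q_)}
    same = sum(overlap.values())               # = 1 - d_TV(p, q_)
    if random.random() < same:
        x = draw({k: v / same for k, v in overlap.items()})
        return x, x                            # coupled branch: X = Y
    # the residual measures are disjoint, so X != Y on this branch
    px = {k: (p[k] - overlap.get(k, 0.0)) / (1 - same) for k in p}
    qy = {k: (q_[k] - overlap.get(k, 0.0)) / (1 - same) for k in q_}
    return draw(px), draw(qy)

p = {"a": 0.5, "b": 0.3, "c": 0.2}            # invented example distributions
q_ = {"a": 0.4, "b": 0.4, "c": 0.2}           # d_TV(p, q_) = 0.1
random.seed(1)
mismatch = sum(x != y for x, y in (maximal_coupling(p, q_) for _ in range(10**4)))
print("empirical P[X != Y] =", mismatch / 10**4, "   d_TV =", 0.1)
```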
Proof of Proposition 4.6.
By Lemmas 4.7 and 4.8, Claims 4.10 and 4.11, and assuming SYM″, we have

P[y♯_A = y] / P[y^{♯,+}_A = y] = 1 + Õ(n^{−1/2}) ∑_{j=1}^{k_{m+1}} ∑_{τ∈Ω} |P[y^{♯,+}_{b_j} = τ] − q⁻¹| + Õ(n⁻¹) = 1 + Õ(n⁻¹).

Thus, d_TV(y♯_A, y^{♯,+}_A) = Õ(n⁻¹) and the construction SHARP2–3 extends this coupling to the desired coupling of G♯(d,k,P,θ,σ) and G♯(d,k⁺,P,θ,σ). Lemma 4.12 establishes the second statement of the proposition. ∎

Proof of Proposition 3.4.
The proposition is a special case of Proposition 4.5. ∎
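The matching argument behind Lemma 4.12 also lends itself to a quick simulation: draw the type counts of the factor nodes of two independent graphs and match nodes of equal type, leaving only the excess as the symmetric difference. The sketch below uses an invented toy type space; the observed symmetric difference stays well below the √n log n budget.

```python
import math
import random
from collections import Counter

random.seed(2)
types = [(f"psi{a}", k) for a in range(3) for k in (3, 4)]   # invented type space
n = 10_000
m = 2 * n // 3                                               # m = Theta(n) factor nodes

def type_counts():
    """Type counts of one graph: m i.i.d. draws from the type distribution."""
    return Counter(random.choices(types, k=m))

c1, c2 = type_counts(), type_counts()
# Couple by matching min(c1[t], c2[t]) factor nodes of each type t;
# only the excess counts contribute to the symmetric difference.
sym_diff = sum(abs(c1[t] - c2[t]) for t in types)
print(f"|G1 (sym diff) G2| = {sym_diff},  sqrt(n)*log(n) = {math.sqrt(n) * math.log(n):.0f}")
```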
Adding a variable.
We add a variable node with its adjacent factor nodes to G∗(d,k,P,θ,σ) as follows. Let d⁺ be the sequence d extended by the degree of the new variable node x_{n+1}. Similarly, let k⁺ be the sequence k with the degrees of the factor nodes a′_1 = a_{m+1}, …, a′_{d_{n+1}} = a_{m+d_{n+1}} appended. Also let h_i ∈ [k_{m+i}] for each i ∈ [d_{n+1}] and let ψ_{a′_i} signify the weight function of a′_i. Furthermore, let Ǧ∗(d⁺,k⁺,P,σ) be the random factor graph that results from the following experiment.

PLUS1: choose σ_{x_{n+1}} ∈ Ω uniformly at random.
PLUS2: draw a random factor graph G∗(d⁺,k⁺,P,(σ,σ_{x_{n+1}})) given that the clones x_{n+1} × [d_{n+1}] are connected to (a′_1,h_1), …, (a′_{d_{n+1}},h_{d_{n+1}}) in this order.

As in the previous subsection, we ask how the factor graph Ǧ∗(d⁺,k⁺,P,σ) − x_{n+1} − a′_1 − … − a′_{d_{n+1}} obtained by removing x_{n+1}, a′_1, …, a′_{d_{n+1}} compares to G∗(d,k,P,σ). We need the following assumption.

SYM‴: There exist reals ε, ξ > 0 such that for all ψ ∈ ⋃_{1≤i≤m+d_{n+1}} Ψ_i, j ∈ [k_ψ] and ω ∈ Ω we have

q^{−k_ψ+1} ∑_{σ∈Ω^{k_ψ}} 1{σ_j = ω} ψ(σ) = ξ,  min_{σ∈Ω^{k_ψ}} ψ(σ) > ε.

Proposition 4.13. For any fixed C > 0 and ε > 0 the following is true. Suppose that all degrees satisfy d_i ≤ C for i ∈ [n], k_j ≤ C for j ∈ [m], that

∑_{i=1}^{n} d_i − ∑_{i=1}^{m} k_i ≥ εn,

and that SYM‴ is satisfied. Moreover, assume that σ ∈ Ω^{V_n} is such that for all ω ∈ Ω we have

| ∑_{i=1}^{n} d_i ( 1{σ_i = ω} − q⁻¹ ) | = O(√n log n)  and  | ∑_{i=1}^{n} 1{σ_i = ω} − n/q | = O(√n log n).

Then there is a coupling of G∗(d,k,P,θ,σ) and Ǧ∗(d⁺,k⁺,P,θ,(σ,σ_{x_{n+1}})) such that

P[ G∗(d,k,P,θ,σ) = Ǧ∗(d⁺,k⁺,P,θ,(σ,σ_{x_{n+1}})) − x_{n+1} − a′_1 − … − a′_{d_{n+1}} ] = 1 − Õ(1/n),
P[ |G∗(d,k,P,θ,σ) △ Ǧ∗(d⁺,k⁺,P,θ,(σ,σ_{x_{n+1}}))| ≤ √n log n ] = 1 − O(n⁻²).

The proof of Proposition 4.13 is based on the arguments from the previous section. Specifically, we introduce an auxiliary factor graph model in which the new variable x_{n+1} and the new factor nodes a′_i are replaced by a single factor node a′ of degree k′ = ∑_{i=1}^{d_{n+1}} k_{m+i} − d_{n+1}. Moreover, the weight function of a′ is defined as

ψ_{a′}(τ) = ∑_{χ∈Ω} ∏_{i=1}^{d_{n+1}} ψ_{a′_i}(τ^{(i)}_χ),

where τ^{(i)}_χ ∈ Ω^{k_{m+i}} carries χ in coordinate h_i and the corresponding entries of τ in the remaining coordinates. Let G̃∗ = G∗(d,(k,k′),P,σ) be the random factor graph with the additional factor node a′.

Lemma 4.14.
Under the assumptions of Proposition 4.13 there exists a coupling of G̃∗ and G∗(d,k,P,σ) such that

P[ G̃∗ − a′ = G∗(d,k,P,σ) ] = 1 − Õ(1/n),  P[ |(G̃∗ − a′) △ G∗(d,k,P,σ)| > √n log(n)/2 ] = O(n⁻²).

Proof.
We reiterate the argument from Section 4.3 for the G̃∗ model. The assumption SYM‴ ensures that the random factor graph model G̃∗ satisfies the assumption SYM′ from Section 4.3. Indeed, for the factor nodes a_1, …, a_m this is an immediate consequence of SYM‴. Moreover, with respect to a′ we fix i ∈ [d_{n+1}], j ∈ [k_{m+i}] \ {h_i} and ω ∈ Ω. Then

∑_{χ∈Ω} ∏_{t=1}^{d_{n+1}} ∑_{τ∈Ω^{k_{m+t}}} 1{ τ_{h_t} = χ ∧ (t ≠ i ∨ τ_j = ω) } ψ_{a′_t}(τ) = q^{∑_{ℓ=1}^{d_{n+1}} k_{m+ℓ} − d_{n+1} − 1} ξ^{d_{n+1}}.  (4.23)

In particular, the expression on the r.h.s. is independent of ω. Applying SYM′ for a_1, …, a_m and (4.23) for a′ and reiterating the proof of Proposition 4.5, we obtain the assertion. Indeed, SYM‴, (4.23) and Claims 4.10 and 4.11 are everything we need to prove the statement. By Claims 4.10 and 4.11 such a coupling exists if we manage to prove that for each τ ∈ Ω the probability of observing colour τ at any variable x_i, i ∈ [n+1], connected to a′ under σ is 1/q. If i ∈ [n] this is an immediate consequence of SYM‴. If i = n+1, this follows from (4.23). ∎
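The contraction defining ψ_{a′} above can be spelled out in code. The following hedged Python sketch (function names, arities and weights are ours, purely for illustration) marginalizes the hidden spin of x_{n+1} out of the product of its adjacent factor weights, in the spirit of the definition of ψ_{a′}:

```python
import itertools

def merge_factors(psis, hs, q):
    """Marginalize a hidden spin out of the factors psis (a sketch of psi_a').

    psis[i]: dict mapping tuples in range(q)^{k_i} to weights; the hidden
    variable x_{n+1} occupies coordinate hs[i] of factor i.  Returns the
    merged weight on the k' = sum_i (k_i - 1) remaining coordinates.
    """
    ks = [len(next(iter(p))) for p in psis]
    free = [[j for j in range(k) if j != h] for k, h in zip(ks, hs)]
    k_merged = sum(len(f) for f in free)
    merged = {}
    for tau in itertools.product(range(q), repeat=k_merged):
        total = 0.0
        for chi in range(q):                   # sum over the hidden spin chi
            prod, off = 1.0, 0
            for p, k, h, f in zip(psis, ks, hs, free):
                arg = [None] * k
                arg[h] = chi
                for j, c in zip(f, tau[off:off + len(f)]):
                    arg[j] = c
                prod *= p[tuple(arg)]
                off += len(f)
            total += prod
        merged[tau] = total
    return merged

# invented toy instance: two arity-2 factors, hidden spin at position 0
q = 2
psi = {t: 1.0 + 0.1 * sum(t) for t in itertools.product(range(q), repeat=2)}
merged = merge_factors([psi, psi], hs=[0, 0], q=q)
print("k' =", len(next(iter(merged))), " psi'(0,0) =", merged[(0, 0)])
```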
Remark 4.15.
Because the factor graphs are random, we may assume without loss of generality that the distributions ψ_k are invariant under permutations of the arguments; that is, for any permutation κ of [k] and any ψ ∈ Ψ_k, the weight function ψ^κ(σ) = ψ(σ_{κ(1)}, …, σ_{κ(k)}) satisfies P[ψ_k = ψ] = P[ψ_k = ψ^κ].

Lemma 4.16. Under the assumptions of Proposition 4.13 there exists a coupling of G̃∗ and Ǧ∗ such that

P[ G̃∗ = Ǧ∗ ] = 1 − Õ(1/n),  P[ |G̃∗ △ Ǧ∗| > √n log(n)/2 ] = O(n⁻²).

Proof.
In the first step, we claim that the distributions of Ǧ∗ and G̃∗ are identical conditioned on the vertex x_{n+1} receiving the same spin: for ω ∈ Ω,

P[Ǧ∗ = g | σ̌∗_{x_{n+1}} = ω] = P[G̃∗ = g | σ̃∗_{x_{n+1}} = ω].  (4.24)

Indeed, by the definition of Ǧ∗ and G̃∗ and Bayes' theorem we find for any assignment σ ∈ Ω^{V_n}

P[Ǧ∗ = g | σ̌_{x_{n+1}} = ω] = P[Ǧ = g] ψ_g(σ,ω) / E[ψ_{Ǧ}(σ,ω)],  (4.25)
P[G̃∗ = g | σ̃_{x_{n+1}} = ω] = P[G̃ = g] ψ_g(σ) P[σ̃_{x_{n+1}} = ω | G̃∗ = g] / ( E[ψ_{G̃}(σ)] P[σ̃_{x_{n+1}} = ω] ) = P[G̃ = g] ψ_g(σ,ω) / ( E[ψ_{G̃}(σ)] P[σ̃_{x_{n+1}} = ω] ).  (4.26)

Moreover,

P[σ̃_{x_{n+1}} = ω] = E[ ψ_{G̃∗}(σ,ω) / ∑_{χ∈Ω} ψ_{G̃∗}(σ,χ) ] = E[ ψ_{G̃}(σ) ψ_{G̃}(σ,ω) / ∑_{χ∈Ω} ψ_{G̃}(σ,χ) ] / E[ψ_{G̃}(σ)] = E[ψ_{G̃}(σ,ω)] / E[ψ_{G̃}(σ)] = E[ψ_{Ǧ}(σ,ω)] / E[ψ_{G̃}(σ)].  (4.27)

Therefore, (4.24) follows from (4.25)–(4.27) and the fact that by definition P[Ǧ = g] = P[G̃ = g]. We now need to get a handle on the distributions of σ̃∗_{x_{n+1}} and σ̌∗_{x_{n+1}}. Clearly, by construction P[σ̌_{x_{n+1}} = ω] = 1/q. We claim that

P[σ̃_{x_{n+1}} = ω] = 1/q + Õ(n^{−1/2}).  (4.28)

By assumption there is (ε_ω)_{ω∈Ω} such that ε_ω = Õ(n^{−1/2}) for all ω ∈ Ω, with the property that the marginal distribution on a cavity with colour ω is 1/q + ε_ω. It turns out that this is enough to prove the claim. By Remark 4.15 we may suppose without loss of generality that h_i = 1 for all i ∈ [d_{n+1}]. Then

P[σ̃_{x_{n+1}} = ω] ∝ ∏_{i=1}^{d_{n+1}} ∑_{σ∈Ω^{k_{m+i}}} 1{σ_1 = ω} ψ_{a′_i}(σ) ∏_{j=2}^{k_{m+i}} ( 1/q + ε_{σ_j} )
  = ∏_{i=1}^{d_{n+1}} ∑_{σ} 1{σ_1 = ω} ψ_{a′_i}(σ) ( q^{−k_{m+i}+1} + q^{−k_{m+i}+2} ∑_{j=2}^{k_{m+i}} ε_{σ_j} + O(‖ε‖²_∞) )
  = ∏_{i=1}^{d_{n+1}} ( ξ + q^{−k_{m+i}+2} ∑_{σ} 1{σ_1 = ω} ψ_{a′_i}(σ) ∑_{j=2}^{k_{m+i}} ε_{σ_j} + O(‖ε‖²_∞) )
  = ξ^{d_{n+1}} + ξ^{d_{n+1}−1} ∑_{i=1}^{d_{n+1}} q^{−k_{m+i}+2} ∑_{σ} 1{σ_1 = ω} ψ_{a′_i}(σ) ∑_{j=2}^{k_{m+i}} ε_{σ_j} + O(‖ε‖²_∞).  (4.29)

Since ξ^{d_{n+1}} does not depend on ω and since d_{n+1}, k_{m+1}, …, k_{m+d_{n+1}}, ψ_{a′_1}, …, ψ_{a′_{d_{n+1}}} are all bounded, (4.29) implies (4.28).

Thus, we are left to consider two cases.

Case σ̃_{x_{n+1}} = σ̌_{x_{n+1}}: This occurs with probability 1 − Õ(n^{−1/2}). In this case, we find P[Ǧ∗ = g | σ̃_{x_{n+1}} = σ̌_{x_{n+1}}] = P[G̃∗ = g | σ̃_{x_{n+1}} = σ̌_{x_{n+1}}] trivially by the above.

Case σ̃_{x_{n+1}} ≠ σ̌_{x_{n+1}}: By Proposition 4.6 there is a coupling of Ǧ∗ and G̃∗ such that P[Ǧ∗ ≠ G̃∗ | σ̃_{x_{n+1}} ≠ σ̌_{x_{n+1}}] = Õ(n^{−1/2}).

Hence,

P[G̃∗ ≠ Ǧ∗] = P[G̃∗ ≠ Ǧ∗ | σ̃ = σ̌] P[σ̃ = σ̌] + P[G̃∗ ≠ Ǧ∗ | σ̃ ≠ σ̌] P[σ̃ ≠ σ̌] = Õ(1/n),

implying the first part of the lemma. The second part of the lemma follows from Lemma 4.12. ∎

Proof of Proposition 3.5.
The proposition is a special case of Proposition 4.13. ∎
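The computation (4.29) is, in essence, a single belief-propagation-style update: the marginal of the new variable is proportional to the product, over its adjacent factors, of the factor weight summed against near-uniform cavity marginals 1/q + ε_ω. A small Python sketch with invented weights and ε's:

```python
import itertools

def new_variable_marginal(psis, eps, q):
    """Normalized marginal of the new variable, in the spirit of (4.29).

    psis[i]: dict on range(q)^{k_i}, hidden variable at coordinate 0;
    eps[omega]: deviation of the cavity marginal of colour omega from 1/q.
    """
    weights = []
    for omega in range(q):
        w = 1.0
        for p in psis:
            k = len(next(iter(p)))
            s = 0.0
            for tau in itertools.product(range(q), repeat=k):
                if tau[0] != omega:
                    continue
                prob = 1.0
                for spin in tau[1:]:
                    prob *= 1.0 / q + eps[spin]   # cavity marginal 1/q + eps
                s += p[tau] * prob
            w *= s
        weights.append(w)
    z = sum(weights)
    return [w / z for w in weights]

q = 2
psi = {t: 1.0 + 0.05 * t.count(1) for t in itertools.product(range(q), repeat=3)}
print(new_variable_marginal([psi, psi], eps=[0.01, -0.01], q=q))
```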
Nishimori redux.
The general models from Section 4 satisfy the following 'Nishimori identity'.
Proposition 4.17.
For any event A and any σ ∈ Ω^{V_n} we have

E[ 1{Ĝ(d,k,P,θ) ∈ A} μ_{Ĝ(d,k,P,θ)}(σ) ] = P[σ̂(d,k,P,θ) = σ] P[G∗(d,k,P,θ,σ) ∈ A].

Furthermore, σ̂(d,k,P,θ) and σ∗ are mutually contiguous, as are Ĝ(d,k,P,θ) and G∗(d,k,P,θ,σ∗). The proof of the proposition can be found in Section 9.
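On a toy model the Nishimori identity of Proposition 4.17 can be verified by direct enumeration. The sketch below (two invented weight tables on two spins, an arbitrary prior and an arbitrary event A) compares both sides of the identity; the two printed columns agree for every σ:

```python
import itertools

# Toy setup: two "graphs" (weight tables) on n=2 spins with q=2 colours,
# a prior P0 on graphs, uniform prior on assignments.  All values invented.
q, n = 2, 2
sigmas = list(itertools.product(range(q), repeat=n))
psi = {
    "g1": {s: 1.0 + 0.5 * (s[0] == s[1]) for s in sigmas},
    "g2": {s: 1.0 + 0.3 * s.count(1) for s in sigmas},
}
P0 = {"g1": 0.6, "g2": 0.4}
Z = {g: sum(psi[g].values()) for g in psi}            # partition functions
norm = sum(P0[g] * Z[g] for g in psi)                 # normalizer E0[Z]

def mu(g, s):                                         # Gibbs measure of graph g
    return psi[g][s] / Z[g]

A = {"g1"}                                            # an arbitrary event
for sigma in sigmas:
    lhs = sum(P0[g] * Z[g] / norm * mu(g, sigma) for g in A)
    p_hat = sum(P0[g] * Z[g] / norm * mu(g, sigma) for g in psi)   # P[sigma_hat]
    planted = sum(P0[g] * psi[g][sigma] for g in A) / \
              sum(P0[g] * psi[g][sigma] for g in psi)              # P[G*(sigma) in A]
    print(sigma, round(lhs, 6), round(p_hat * planted, 6))         # equal columns
```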
Proof of Proposition 3.2.
This is an immediate consequence of Proposition 4.17. ∎
Lemma 4.18.
Let C_x = {i ∈ [n] : ∃h ∈ [d_i] : (i,h) ∈ C} be the set of cavity variables and let x ∈ C_x denote a randomly chosen cavity, where P[x = i] = |{h ∈ [d_i] : (i,h) ∈ C}| / |C|. Moreover, abbreviate G∗_{ε,n} = G∗(d,k,P,θ,σ). Under the assumptions of Proposition 4.6, for any ω ∈ Ω there exists a sequence ε_ω = Õ(n^{−1/2}) such that

P[ μ_{G∗_{ε,n},x}(ω) = 1/q + ε_ω ] = 1 − o(1).

Proof.
By Lemma 4.8, we have P[y♯_A ∈ Y] = 1 − O(n⁻²) and therefore, for any ω ∈ Ω,

P[ | ∑_{i=1}^{Δ} 1{y♯_{b_i} = ω} − Δ/q | > √Δ log Δ log log Δ ] = O(n⁻²).

Therefore, with probability 1 − O(n⁻²),

P[ | |(σ∗|_{C_x})⁻¹(ω)| − Δ/q | > √Δ log Δ log log Δ ] = O(n⁻²).

By contiguity, this in turn implies that

P[ | |(σ̂|_{C_x})⁻¹(ω)| − Δ/q | > √Δ log Δ log log Δ ] = o(1),

and therefore by Proposition 3.2 we find

E[ μ_{G∗(σ∗),C}( { σ ∈ [q]^C : | |σ⁻¹(ω)| − Δ/q | > √Δ log Δ log log Δ } ) ] = o(1). ∎

Recall Y♯_i(τ) = ∑_{j=1}^{k_i} 1{y♯(a_i,j) = τ} for i ∈ [m] and A = ∑_{i=1}^{m} k_i from above. We now correspondingly denote by

C♯(τ) = ∑_{j=1}^{Δ} 1{y♯_{b_j} = τ}

the number of cavities of each colour τ. The following lemma shows that the spin distribution on the cavities is close to uniform.

Lemma 4.19.
For all τ ∈ [q] we have

P[ |C♯(τ) − Δ/q| = O(√n log n log log n) ] = 1 − O(n⁻²).

Proof. By Lemma 4.8, we have P[y♯_A ∈ Y] = 1 − O(n⁻²). Moreover, ∑_{i=1}^{n} d_i 1{σ_i = τ} = ∑_{i=1}^{n} d_i/q + O(√n log n) and therefore

∑_{i=1}^{m} Y♯_i(τ) + C♯(τ) = ∑_{i=1}^{n} d_i 1{σ_i = τ} = ∑_{i=1}^{n} d_i/q + O(√n log n).

Rearranging, we see that

C♯(τ) = ∑_{i=1}^{n} d_i/q − A/q + O(√n log n) + O(√A log A log log A) = Δ/q + O(√n log n log log n)

with probability at least 1 − O(n⁻²). ∎
5. VARIATION OF MEASURES
This section is entirely self-contained. Fix a number q ∈ ℤ_{>0} of colours and a nonempty index set L ⊆ ℤ_{≥0}. Fix a degree d_ℓ ∈ ℤ_{≥0} for each ℓ ∈ L such that D \ {0} ≠ ∅ holds for the set D = {d_ℓ : ℓ ∈ L} of degrees. Further, for each ℓ ∈ L fix a measure μ_ℓ ∈ P([q]^{d_ℓ}) satisfying the assumption SPAN, i.e. for all ω ∈ [q] we have ω_{[d_ℓ]} ∈ X_ℓ, where X_ℓ ⊆ [q]^{d_ℓ} denotes the support of μ_ℓ and we use the shorthand x_{[d_ℓ]} = (x)_{h∈[d_ℓ]} for constant vectors. Analogously, the family (d_ℓ, μ_ℓ)_{ℓ∈L} satisfies SPAN iff μ_ℓ ∈ P([q]^{d_ℓ}) satisfies SPAN for all ℓ ∈ L. For P ∈ P(L) let ℓ_P have law P, further d_P = d_{ℓ_P}, and let P_L = P_L(L) = {P ∈ P(L) : E[d_P] ∈ ℝ_{>0}} denote the measures with finite positive degree expectation, i.e. exactly the measures P for which P̂ ∈ P(L), given by the Radon–Nikodym derivative E[d_P]⁻¹ d_ℓ with respect to P, is well-defined. For ℓ ∈ L and p ∈ P([q]) let μ_{p,ℓ} ∈ P(X_ℓ) be given by

μ_{p,ℓ}(χ) = Z_{p,ℓ}⁻¹ μ_ℓ(χ) ∏_{h∈[d_ℓ]} p(χ_h),  Z_{p,ℓ} = ∑_χ μ_ℓ(χ) ∏_{h∈[d_ℓ]} p(χ_h),

for χ ∈ X_ℓ. Let μ_{p,ℓ}|_h ∈ P([q]), h ∈ [d_ℓ], denote the marginal on the h-th coordinate and further μ_{p,ℓ}|_∗ = ∑_h d_ℓ⁻¹ μ_{p,ℓ}|_h if d_ℓ > 0. The central quantity of this section is

ι_P : P([q]) → P([q]),  p ↦ E[ μ_{p,ℓ_{P̂}}|_∗ ]

for P ∈ P_L. Notice that ι_P is well-defined since we always have P[d_{P̂} = 0] = 0.

Proposition 5.1.
Let (d_ℓ, μ_ℓ)_{ℓ∈L} be a family satisfying SPAN. Then for any choice of P ∈ P_L the map ι_P is a homeomorphism.

For P ∈ P_L let M_P = ∏_ℓ P(X_ℓ) denote the set of all families of measures that are absolutely continuous with respect to (μ_ℓ)_ℓ, for all ℓ in the support of P. For given assignment distributions ν ∈ M_P let

ρ_P(ν) = ρ_{P,ν} = E[ ν_{ℓ_{P̂}}|_∗ ] ∈ P([q])

denote their (expected) relative colour frequencies. Further, for ρ ∈ P([q]) let M_{P,ρ} = ρ_P⁻¹(ρ) ⊆ M_P denote the assignment distributions ν that are absolutely continuous with respect to μ = (μ_ℓ)_ℓ with colour frequencies ρ. Let O ⊆ [q] denote the support of ρ and P°(O) ⊆ P([q]) the laws with support O. Further, let μ_{O,ℓ} ∈ P°(X_ℓ ∩ O^{d_ℓ}) denote the law of χ_ℓ | χ_ℓ ∈ O^{d_ℓ} with χ_ℓ being a sample from μ_ℓ for given ℓ ∈ L. Finally, the conditional relative entropy on the fibre given by ρ is

f_{P,ρ} : M_{P,ρ} → ℝ_{≥0} ∪ {∞},  ν ↦ E[ D_KL( ν_{ℓ_P} ‖ μ_{O,ℓ_P} ) ].

Proposition 5.2.
Assume that (d_ℓ, μ_ℓ)_{ℓ∈L} satisfies SPAN. Then for any choice of P ∈ P_L and ρ ∈ P([q]) with p = ι_P⁻¹(ρ), the assignment distribution family (μ_{p,ℓ})_ℓ ∈ M_{P,ρ} is the unique minimiser of f_{P,ρ}.

For the last assertion we equip P_L with the metric given by

Δ_L(P, P′) = ∑_{ℓ∈L} (d_ℓ + 1) |P(ℓ) − P′(ℓ)|  for P, P′ ∈ P_L,

where adding one is required since d_ℓ = d_{ℓ′} = 0 is possible for ℓ ≠ ℓ′.

Proposition 5.3. If SPAN holds, then ι, ι⁻¹ : P_L × P([q]) → P([q]) are continuous.
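For a single index ℓ the map ι_P reduces to ι_ℓ : p ↦ μ_{p,ℓ}|_∗, and Proposition 5.1 guarantees it can be inverted. The Python sketch below implements the forward map and a simple multiplicative fixed-point heuristic for the inverse (our own choice of iteration, not the paper's construction; all concrete numbers are invented):

```python
import itertools
import random

def iota(mu, p, d, q):
    """iota_l(p): averaged marginal mu_{p,l}|_* of the p-tilted measure."""
    marg, Z = [0.0] * q, 0.0
    for chi, w in mu.items():
        t = w
        for h in range(d):
            t *= p[chi[h]]
        Z += t
        for h in range(d):
            marg[chi[h]] += t / d
    return [m / Z for m in marg]

def iota_inv(mu, rho, d, q, iters=2000):
    """Heuristic fixed-point inversion of iota_l (illustrative only)."""
    p = [1.0 / q] * q
    for _ in range(iters):
        cur = iota(mu, p, d, q)
        p = [pi * (r / c) ** 0.5 for pi, r, c in zip(p, rho, cur)]
        s = sum(p)
        p = [pi / s for pi in p]
    return p

q, d = 3, 2
random.seed(0)
mu = {chi: random.uniform(0.5, 1.5) for chi in itertools.product(range(q), repeat=d)}
s = sum(mu.values()); mu = {k: v / s for k, v in mu.items()}   # full support: SPAN holds
rho = [0.5, 0.3, 0.2]
p = iota_inv(mu, rho, d, q)
print("recovered p:", [round(x, 4) for x in p],
      " iota(p):", [round(x, 4) for x in iota(mu, p, d, q)])   # matches rho
```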
Proof strategy. In the first part of the proof we derive Proposition 5.1 and Proposition 5.2 without the continuity results, which we postpone together with the proof of Proposition 5.3 to Section 5.8. In the first part we start with the assumption that d_ℓ > 0 for all ℓ in the support of P ∈ P_L and then discuss isolated vertices in Section 5.7. Further, we first restrict to ι_P : P°([q]) → P°([q]) and then extend the results to the boundary in Section 5.6. In this restricted setup, Section 5.2 covers the case |L| = 1, Section 5.3 is dedicated to the case |L| = 2, in Section 5.4 we discuss finite index sets L, and in Section 5.5 we finally extend the results to countable sets L.

5.2. One point masses.
Fix ℓ ∈ L such that d = d_ℓ > 0. With ρ ∈ P°([q]) our atoms are given by

ι_ℓ : P°([q]) → P°([q]), p ↦ μ_{p,ℓ}|_∗,  ρ_ℓ : P(X_ℓ) → P([q]), ν ↦ ν|_∗,
f_{ℓ,ρ} : M_{ℓ,ρ} → ℝ_{≥0}, ν ↦ D_KL(ν ‖ μ_ℓ),  M_{ℓ,ρ} = { ν ∈ P(X_ℓ) : ν|_∗ = ρ }.

Notice that for p ∈ P°([q]) we indeed have μ_{p,ℓ}|_∗ ∈ P°([q]) since ω_{[d]} ∈ X_ℓ for all ω ∈ [q]. We split ι_ℓ into ι_{ℓ,1} : P°([q]) → M_ℓ, p ↦ μ_{p,ℓ}, with M_ℓ = im(ι_{ℓ,1}), and ι_{ℓ,2} : M_ℓ → P°([q]), ν ↦ ν|_∗, i.e. ι_{ℓ,2} is the restriction of ρ_ℓ to M_ℓ. Notice that M_ℓ ⊆ P°(X_ℓ), which we equip with ‖·‖ inherited from ℝ^{X_ℓ}. Further, for any p ∈ P°([q]) and ω ∈ [q] we have

p(ω)/p(q) = ( μ_{p,ℓ}(ω_{[d]}) μ_ℓ(q_{[d]}) / ( μ_{p,ℓ}(q_{[d]}) μ_ℓ(ω_{[d]}) ) )^{1/d},

so uniqueness of p for given μ_{p,ℓ} follows with a normalization argument and hence ι_{ℓ,1} is a bijection.

Next, we show that for any ρ ∈ P°([q]) there exists a unique minimiser μ_{min,ρ} ∈ M_{ℓ,ρ} ∩ P°(X_ℓ) of f_{ℓ,ρ}. First, notice that M_{ℓ,ρ} ≠ ∅ since μ_{b,ρ} ∈ M_{ℓ,ρ} with μ_{b,ρ} = ∑_{ω∈[q]} ρ(ω) e_{X_ℓ,ω_{[d]}}, using the shorthand e_{X_ℓ,ω_{[d]}} = (1{χ = ω_{[d]}})_{χ∈X_ℓ} ∈ ℝ^{X_ℓ} for standard basis vectors. From this we obtain that M_{ℓ,ρ} = μ_{b,ρ} + V_{ℓ,ρ} with

V_{ℓ,ρ} = { v ∈ V_ℓ : μ_{b,ρ} + v ≥ 0_{X_ℓ} },  V_ℓ = { v ∈ ℝ^{X_ℓ} : ∑_χ v(χ) = 0, v|_∗ = 0_{[q]} }.

In the following sections we use the shorthand 0 for 0_{X_ℓ} by an abuse of notation, since the definition of V_ℓ determines the underlying space. Since V_ℓ is a linear subspace of ℝ^{X_ℓ}, the sets V_{ℓ,ρ} and M_{ℓ,ρ} are polytopes, so in particular they are convex and compact. Further, notice that for any v ∈ ℝ^{X_ℓ} and h ∈ [d] we have ∑_χ v(χ) = ∑_ω v|_h(ω) and hence ∑_χ v(χ) = ∑_ω v|_∗(ω), which shows that V_ℓ = { v ∈ ℝ^{X_ℓ} : v|_∗ = 0_{[q]} }, i.e. V_ℓ is the kernel of the linear map v ↦ v|_∗ = Wv given by the matrix W = (|χ⁻¹(ω)|/d)_{χ∈X_ℓ, ω∈[q]}. The fact that the column vectors (W_{ω∗_{[d]},ω})_{ω∈[q]} = e_{[q],ω∗} are exactly the unit vectors for ω∗ ∈ [q] shows that W is surjective, and thereby the kernel has dimension |X_ℓ| − q. Hence, if X_ℓ = {ω_{[d]} : ω ∈ [q]}, then W is bijective, further M_{ℓ,ρ} = {μ_{b,ρ}} and μ_{min,ρ} = μ_{b,ρ} ∈ P°(X_ℓ) is the unique minimizer of f_{ℓ,ρ}. Otherwise, for ε ∈ (0,1) let μ_{ε,ρ} ∈ ℝ^{X_ℓ} be given by

μ_{ε,ρ}(χ) = ε μ_ℓ(χ) + ∑_{ω∈[q]} α(ω) e_{X_ℓ,ω_{[d]}}(χ),  α(ω) = ρ(ω) − ε μ_ℓ|_∗(ω),

for χ ∈ X_ℓ. Notice that μ_{ε,ρ}|_∗ = ρ, that ε + ∑_ω α(ω) = 1, and that α ≥ 0 for ε sufficiently small since ρ ∈ P°([q]), which gives μ_{ε,ρ} ∈ M_{ℓ,ρ} ∩ P°(X_ℓ), meaning that μ_{ε,ρ} is in the relative interior of M_{ℓ,ρ}. For any ν ∈ M_{ℓ,ρ} \ P°(X_ℓ) the derivative of f_{ℓ,ρ} at ν in the direction (μ_{ε,ρ} − ν) ∈ V_ℓ is −∞ by the properties of the relative entropy, hence any minimizer of f_{ℓ,ρ} has to be in M_{ℓ,ρ} ∩ P°(X_ℓ). On the other hand, the minimizer μ_{min,ρ} exists since M_{ℓ,ρ} is compact, and is unique since f_{ℓ,ρ} is strictly convex (with convex domain). Hence, let M_min = {μ_{min,ρ} : ρ ∈ P°([q])} ⊆ P°(X_ℓ). Notice that for any ν ∈ M_min and χ∗ ∈ X_ℓ \ {ω_{[d]} : ω ∈ [q]} we know that ν is a stationary point of f_{ℓ,ν|_∗} (since χ∗ exists), hence by evaluating the first derivative of f_{ℓ,ν|_∗} at ν in the direction v = d·e_{X_ℓ,χ∗} − ∑_ω |χ∗⁻¹(ω)| e_{X_ℓ,ω_{[d]}} ∈ V_ℓ we obtain

∑_χ ( log( ν(χ)/μ_ℓ(χ) ) + 1 ) v(χ) = 0,  hence  ν(χ∗) = μ_ℓ(χ∗) ∏_ω w(ω)^{|χ∗⁻¹(ω)|} = μ_ℓ(χ∗) ∏_{h∈[d]} w(χ∗_h)  with  w(ω) = ( ν(ω_{[d]})/μ_ℓ(ω_{[d]}) )^{1/d} ∈ ℝ_{>0}, ω ∈ [q].

Further, this equation trivially holds for any choice of ν and ω ∈ [q] with χ∗ = ω_{[d]}. Hence, with p ∈ P°([q]) given by p ∝ w, i.e. p = Z⁻¹w with Z = ∑_ω w(ω), a normalization argument applied to ν ∈ M_min shows that ν = μ_{p,ℓ}, so M_min ⊆ M_ℓ. Conversely, for any ν = μ_{p,ℓ} ∈ M_ℓ, p ∈ P°([q]), we have ν ∈ M_{ℓ,ν|_∗}. If |M_{ℓ,ν|_∗}| = 1, then ν is the unique minimizer and hence ν ∈ M_min; otherwise ν is in the relative interior of M_{ℓ,ν|_∗} and evaluating the first derivatives of f_{ℓ,ν|_∗} at ν in any direction v ∈ V_ℓ yields

∑_χ ( log( ν(χ)/μ_ℓ(χ) ) + 1 ) v(χ) = ∑_χ ( ∑_ω |χ⁻¹(ω)| log(p(ω)) − log(Z_{p,ℓ}) ) v(χ) = 0.

Hence M_min = M_ℓ and ι_ℓ is bijective, with the inverse ι_ℓ⁻¹(ρ) determined by μ_{min,ρ} = μ_{ι_ℓ⁻¹(ρ),ℓ}, which completes the proof.
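The minimisation just carried out can be checked numerically: minimise D_KL(ν ‖ μ_ℓ) over the fibre {ν : ν|_∗ = ρ} with an off-the-shelf solver and verify that the minimiser has the tilted product form μ_{p,ℓ}. A hedged sketch (requires NumPy/SciPy; the instance is invented):

```python
import itertools
import numpy as np
from scipy.optimize import minimize

q, d = 2, 3
rng = np.random.default_rng(0)
X = list(itertools.product(range(q), repeat=d))   # full support, so SPAN holds
mu = rng.uniform(0.5, 1.5, size=len(X)); mu /= mu.sum()
rho = np.array([0.6, 0.4])                        # target colour frequencies

def marginal(nu):
    m = np.zeros(q)
    for w, chi in zip(nu, X):
        for s in chi:
            m[s] += w / d
    return m

kl = lambda nu: float(np.sum(nu * np.log(nu / mu)))
cons = [{"type": "eq", "fun": lambda nu: nu.sum() - 1.0},
        {"type": "eq", "fun": lambda nu: marginal(nu) - rho}]
res = minimize(kl, mu.copy(), bounds=[(1e-9, 1)] * len(X), constraints=cons)

# The minimiser should have the tilted product form nu ~ mu * prod_h p(chi_h);
# a product form forces r(011) r(100) = r(111) r(000) for the ratios r = nu/mu.
r = res.x / mu
lhs = r[X.index((0, 1, 1))] * r[X.index((1, 0, 0))]
rhs = r[X.index((1, 1, 1))] * r[X.index((0, 0, 0))]
print("min KL:", round(res.fun, 6), " product-form residual:", round(lhs - rhs, 8))
```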
5.3. Two point masses. Assume that P is supported on two indices ℓ_1, ℓ_2 ∈ L. We let d_1 = d_{ℓ_1}, d_2 = d_{ℓ_2} for transparency and use analogous shorthands for all related quantities throughout this section. Further, we will continue to use the atoms introduced in Section 5.2. For given ρ ∈ P°([q]) let p_1 = ι_1⁻¹(ρ), p_2 = ι_2⁻¹(ρ) ∈ P°([q]) and notice that μ_ρ = (μ_{ρ,1}, μ_{ρ,2}) ∈ M_{P,ρ} with μ_{ρ,1} = μ_{p_1,ℓ_1} ∈ P°(X_1) and μ_{ρ,2} = μ_{p_2,ℓ_2} ∈ P°(X_2). Further, we have M_{P,ρ} = μ_ρ + V_{P,ρ} with V_{P,ρ} = { v ∈ V_P : μ_ρ + v ≥ 0 } and

V_P = { v ∈ ℝ^{X_1} × ℝ^{X_2} : ∑_{χ∈X_1} v_1(χ) = ∑_{χ∈X_2} v_2(χ) = 0, ρ_{P,v} = 0_{[q]} }.

Hence, the set M_{P,ρ} is a polytope. As in Section 5.2, notice that ∑_χ v_i(χ) = ∑_ω v_i|_∗(ω), so using ∑_ω ρ_{P,v}(ω) = 0 one of the constraints is redundant and

V_P = { v ∈ ℝ^{X_1} × ℝ^{X_2} : ∑_{χ∈X_1} v_1(χ) = 0, ρ_{P,v} = 0_{[q]} }.

The linear map W whose kernel is V_P is given by

W = ( 1^t_{X_1}  0^t_{X_2} ; P̂(ℓ_1)W_1  P̂(ℓ_2)W_2 ),

using t to denote the transpose and where W_1, W_2 are the matrices from Section 5.2 corresponding to v_1|_∗ and v_2|_∗. To see that W is surjective fix w ∈ ℝ × ℝ^{[q]}, let v_1 ∈ ℝ^{X_1} be any choice with ∑_χ v_1(χ) = w(0) and use surjectivity of P̂(ℓ_2)W_2 to determine a preimage v_2 of w_{[q]} − P̂(ℓ_1)v_1|_∗. Hence, the (|X_1| + |X_2| − q − 1)-dimensional kernel V_P of W is never trivial since q > 1. As in Section 5.2, for any boundary point ν ∈ M_{P,ρ} \ (P°(X_1) × P°(X_2)) the derivative of f_{P,ρ} at ν in the direction μ_ρ − ν is −∞, hence we have μ_{min,ρ} ∈ M_{P,ρ} ∩ (P°(X_1) × P°(X_2)) for the unique minimizer μ_{min,ρ} of the strictly convex map f_{P,ρ} (with convex and compact domain M_{P,ρ}). Further, since we have at least one degree of freedom, the point μ_{min,ρ} is a stationary point of f_{P,ρ}, and in particular the first derivatives of f_{P,ρ} at μ_{min,ρ} in the directions v ∈ V_P vanish. Now, with μ_{min,ρ} = (μ_{min,ρ,1}, μ_{min,ρ,2}) minimizing f_{P,ρ}, the component μ_{min,ρ,1} obviously needs to be the unique minimizer of f_{ℓ_1,μ_{min,ρ,1}|_∗} and μ_{min,ρ,2} the unique minimizer of f_{ℓ_2,μ_{min,ρ,2}|_∗}. Since μ_{min,ρ} is in the relative interior of M_{P,ρ} we know that μ_{min,ρ,1}|_∗, μ_{min,ρ,2}|_∗ ∈ P°([q]) and can hence use Section 5.2 to obtain p_1, p_2 ∈ P°([q]) with μ_{min,ρ,1} = μ_{p_1,ℓ_1} and μ_{min,ρ,2} = μ_{p_2,ℓ_2}. Now, fix ω_1, ω_2 ∈ [q] with ω_1 ≠ ω_2 and let v ∈ V_P be given by

v_1(ω_1,[d_1]) = d_2P(ℓ_2),  v_1(ω_2,[d_1]) = −d_2P(ℓ_2),  v_2(ω_1,[d_2]) = −d_1P(ℓ_1),  v_2(ω_2,[d_2]) = d_1P(ℓ_1),

and v_1(χ) = v_2(χ) = 0 otherwise. Evaluating the first derivative of f_{P,ρ} at ν = μ_{min,ρ} in the direction v then yields

0 = P(ℓ_1) ∑_{j∈[2]} log( ν_1(ω_j,[d_1]) / μ_1(ω_j,[d_1]) ) v_1(ω_j,[d_1]) + P(ℓ_2) ∑_{j∈[2]} log( ν_2(ω_j,[d_2]) / μ_2(ω_j,[d_2]) ) v_2(ω_j,[d_2]).

Rearranging gives p_1(ω_1)/p_1(ω_2) = p_2(ω_1)/p_2(ω_2). Since this result holds for all ω_1 ≠ ω_2, a normalization argument shows that p_1 = p_2, which shows that ι_P is surjective. Further, for fixed p ∈ P°([q]) we can evaluate the directional derivatives of f_{P,ι_P(p)} at μ_p = (μ_{p,ℓ_1}, μ_{p,ℓ_2}) directly to see that μ_p is indeed a stationary point. This establishes a one-to-one correspondence between μ_p and ι_P(p), and since we have seen that p can be uniquely reconstructed from either of μ_{p,1}, μ_{p,2}, this completes the proof.

5.4. Finite supports.
The arguments in Section 5.3 directly extend to the case where P has finite support. In particular f_{P,ρ} is strictly convex with convex and compact domain, which establishes the existence of a unique minimizer μ_{min,ρ} for any ρ ∈ P°([q]). Analogous arguments to the ones above show that μ_{min,ρ} ∈ ∏_ℓ P°(X_ℓ) with ℓ in the support of P. From this we obtain p_ℓ ∈ P°([q]) with μ_{min,ρ,ℓ} = μ_{p_ℓ,ℓ}, since the components μ_{min,ρ,ℓ} also have to be minimizers for μ_{min,ρ,ℓ}|_∗ as discussed in Section 5.3. But now, for any two distinct ℓ_1, ℓ_2, with P_{12} denoting the law of ℓ_P | ℓ_P ∈ {ℓ_1,ℓ_2} and P_{c12} denoting the law of ℓ_P | ℓ_P ∉ {ℓ_1,ℓ_2}, further μ_{12} = (μ_{p_{ℓ_1},ℓ_1}, μ_{p_{ℓ_2},ℓ_2}), ρ_{12} = ρ_{P_{12},μ_{12}}, μ_{c12} = (μ_{p_ℓ,ℓ})_{ℓ∉{ℓ_1,ℓ_2}} and ρ_{c12} = ρ_{P_{c12},μ_{c12}}, we obtain ρ = P[ℓ_{P̂} ∈ {ℓ_1,ℓ_2}] ρ_{12} + P[ℓ_{P̂} ∉ {ℓ_1,ℓ_2}] ρ_{c12} and further

f_{P,ρ}((μ_{p_ℓ,ℓ})_ℓ) = P[ℓ_P ∈ {ℓ_1,ℓ_2}] f_{P_{12},ρ_{12}}(μ_{12}) + P[ℓ_P ∉ {ℓ_1,ℓ_2}] f_{P_{c12},ρ_{c12}}(μ_{c12}).

But then we necessarily have p_{ℓ_1} = p_{ℓ_2}, since otherwise we could use Section 5.3 to obtain the unique minimizer of f_{P_{12},ρ_{12}} and use it to replace μ_{12}, thereby effectively decreasing f_{P,ρ} without changing ρ_{12} and hence also ρ. This shows that μ_{min,ρ} = (μ_{p,ℓ})_ℓ for some p ∈ P°([q]), and thereby ι_P is surjective. To see injectivity we follow Section 5.3 and show that (μ_{p,ℓ})_ℓ is a stationary point of f_{P,ι_P(p)} by evaluating the directional derivatives of f_{P,ι_P(p)} at (μ_{p,ℓ})_ℓ, and thereby is the unique minimizer.

5.5. Infinite supports.
Fix P with countably infinite support and ρ ∈ P°([q]). Without loss of generality we may assume L = ℤ_{>0}. With ε_n = P[ℓ_{P̂} ∉ [n]], P_{cn} denoting the law of ℓ_P | ℓ_P ∉ [n], further μ_{cn} = (μ_ℓ)_{ℓ∉[n]} and ρ_{cn} = ρ_{P_{cn},μ_{cn}}, we have

ρ_n = (1 − ε_n)⁻¹ ( ρ − ε_n ρ_{cn} ) ∈ P°([q])

for n sufficiently large, since ρ_n → ρ for n → ∞. For ℓ ∈ [n] let p_ℓ = ι_ℓ⁻¹(ρ_n) and p_ℓ = u_{[q]} otherwise, where u_{[q]} = q⁻¹1_{[q]} ∈ P°([q]) denotes the uniform distribution over [q]. Then with ν = (μ_{p_ℓ,ℓ})_ℓ we have ρ = ρ_{P,ν} and further

f_{P,ρ}(ν) = E[ 1{ℓ_P ∈ [n]} D_KL( ν_{ℓ_P} ‖ μ_{ℓ_P} ) ] ∈ ℝ_{>0}.

This shows that M°_{P,ρ} = { ν ∈ M_{P,ρ} : f_{P,ρ}(ν) < ∞ } is non-empty. Since f_{P,ρ} is convex, M°_{P,ρ} is convex and f_{P,ρ} is strictly convex on M°_{P,ρ}, which shows uniqueness of the minimizer μ_{min,ρ} given its existence. With the discussion above and analogously to Section 5.4 we consider M_{P,ρ} = ν∗ + V_{P,ρ} with ν∗ ∈ M°_{P,ρ}, V_{P,ρ} = { v ∈ V_P : ν∗ + v ≥ 0 } and

V_P = { v ∈ ∏_ℓ ℝ^{X_ℓ} : ∀ℓ ∑_χ v_ℓ(χ) = 0, ρ_{P,v} = 0_{[q]} },

an (infinite dimensional) polytope. Notice that v ↦ ρ_{P,v} is continuous with respect to the product topology, since it is continuous for the restriction to finite domains and we have uniform tail bounds since v_ℓ|_∗ is uniformly bounded. This shows that M_{P,ρ} ⊆ M_P is closed and hence compact (and metrizable) since M_P is. Now, fix a minimizing sequence ν_n ∈ M_{P,ρ}, n ∈ ℤ_{>0}, of f_{P,ρ}. Using sequential compactness of M_P we find a converging subsequence of (ν_n)_n with limit ν ∈ M_{P,ρ} and restrict to this subsequence without loss of generality. This shows that f_{P,ρ}(ν) ≥ inf_{ν′} f_{P,ρ}(ν′) is well-defined. Now, assume that f_{P,ρ}(ν) > inf_{ν′} f_{P,ρ}(ν′). Then there exists n∗ such that the contribution to f_{P,ρ}(ν) for ℓ_P ∈ [n∗] is greater than inf_{ν′} f_{P,ρ}(ν′). But the contributions to f_{P,ρ}(ν_n) for ℓ_P ∈ [n∗] converge to the contribution to f_{P,ρ}(ν) for ℓ_P ∈ [n∗] due to continuity; hence for all sufficiently large n these contributions are bounded away from inf_{ν′} f_{P,ρ}(ν′), and thereby f_{P,ρ}(ν_n) is bounded away from inf_{ν′} f_{P,ρ}(ν′) since the tails are non-negative, which is a contradiction to ν_n being a minimizing sequence. Hence ν is a minimizer of f_{P,ρ}, which establishes that ν = μ_{min,ρ} ∈ M°_{P,ρ} is the unique minimizer. Since ρ ∈ P°([q]) is fully supported, we know that the colour frequencies of μ_{min,ρ} conditional on ℓ_P ∈ [n] are fully supported for n sufficiently large. But then the decomposition of ρ and f_{P,ρ} with respect to [n], analogous to Section 5.4, allows us to use the finite support results for [n] to obtain p_n ∈ P°([q]) such that μ_{min,ρ,ℓ} = μ_{p_n,ℓ} for ℓ ∈ [n], due to local optimality of the minimizer μ_{min,ρ} as discussed before. Since this argument holds for any sufficiently large n, we obtain p_n ∈ P°([q]) for any such choice, and further p_n = p_{n′} since μ_{p_n,ℓ} = μ_{min,ρ,ℓ} = μ_{p_{n′},ℓ} for any ℓ ∈ [n] ⊆ [n′] and n ≤ n′. This shows that μ_{min,ρ} = (μ_{p,ℓ})_ℓ and further that ι_P is surjective.

To see injectivity fix p ∈ P°([q]) and let ρ = ι_P(p) ∈ P°([q]), ν∗ = (μ_{p,ℓ})_ℓ. Notice that

f_{P,ρ}(ν∗) = E[ D_KL( ν∗_{ℓ_P} ‖ μ_{ℓ_P} ) ] = E[ −log(Z_{p,ℓ_P}) + d_P ∑_ω ν∗_{ℓ_P}|_∗(ω) log p(ω) ] ≤ E[ −log(Z_{p,ℓ_P}) ] ≤ −log( min_ω p(ω) ) E[d_P] ∈ ℝ_{>0},

i.e. ν∗ ∈ M°_{P,ρ}. With M_{P,ρ} = ν∗ + V_{P,ρ} as before, for any v ∈ V_P we have

E[ ∑_χ ( log( ν∗_{ℓ_P}(χ)/μ_{ℓ_P}(χ) ) + 1 ) v_{ℓ_P}(χ) ] = E[ ∑_{χ,h} v_{ℓ_P}(χ) log(p(χ_h)) ] = E[ ∑_{ω∈[q]} log(p(ω)) d_P v_{ℓ_P}|_∗(ω) ] = ∑_{ω∈[q]} log(p(ω)) E[d_P] ρ_{P,v}(ω) = 0.

For ν ∈ M_{P,ρ} and ℓ in the support of P with ν_ℓ ≠ ν∗_ℓ, the strict convexity of the relative entropy yields

D_KL(ν_ℓ ‖ μ_ℓ) > D_KL(ν∗_ℓ ‖ μ_ℓ) + ∑_χ ( log( ν∗_ℓ(χ)/μ_ℓ(χ) ) + 1 ) ( ν_ℓ(χ) − ν∗_ℓ(χ) ),

i.e. the relative entropy is strictly above its tangent at ν∗. Combining these arguments gives f_{P,ρ}(ν) > f_{P,ρ}(ν∗) for any ν ∈ M_{P,ρ} \ {ν∗}, i.e. ν∗ = μ_{min,ρ}, which completes the proof (since p can be reconstructed from ν∗).

5.6. Extension to the boundary.
Let O ⊆ [q] be non-empty. Notice that ι_P(P°(O)) ⊆ P°(O) since we have μ_ℓ(ω_{[d_ℓ]}) > 0 for all ω ∈ [q]. This shows that the restriction ι_P : P°(O) → P°(O) is a bijection for |O| = q. Otherwise, we may repeat the arguments above with q replaced by q_O = |O|, μ_ℓ replaced by μ_{O,ℓ} (which still satisfies μ_{O,ℓ}(ω_{[d_ℓ]}) > 0 for ω ∈ O) and P unchanged, to see that the corresponding map ι_{O,P} : P°(O) → P°(O) is a bijection and that for any ρ ∈ P°(O) with p = ι_{O,P}⁻¹(ρ) the assignment distribution (μ_{p,ℓ})_ℓ is the unique minimiser of the corresponding map f_{O,P,ρ}. However, for any p ∈ P°(O) we have μ_{p,ℓ} = μ_{O,p,ℓ} and thereby ι_{O,P} = ι_P, f_{O,P,ρ} = f_{P,ρ} on P°(O) (up to relabeling colours). This shows that ι_P : P([q]) → P([q]) is a bijection and (μ_{p,ℓ})_ℓ is the unique minimiser of f_{P,ι_P(p)}. Finally, notice that the choice of μ_{O,ℓ} over μ_ℓ in the definition of f_{P,ρ} for ρ ∈ P°(O) is only relevant for the case where the support of P is infinite, since these two versions of f_{P,ρ} only differ by an additive constant whenever the alternative definition of f_{P,ρ} is finite.

5.7. Including zero.
Assume that P ∈ P_L is such that P[d_P = 0] > 0. By the definition of P_L we have P[d_P = 0] < 1. Further, we have P([q]⁰) = {μ_0}, i.e. the one-point mass μ_0 on the empty assignment is the only possible choice for d_ℓ = 0. Now, let P° be the law of ℓ_P | d_P > 0 and notice that P̂° = P̂, which immediately gives ι_P = ι_{P°}. Further, since P([q]⁰) carries only one element, the contributions to f_{P,ρ}, ρ ∈ P([q]), for d_ℓ = 0 vanish and f_{P,ρ} = P[d_P > 0] f_{P°,ρ}, since f_{P,ρ} only formally depends on the coordinates ℓ with d_ℓ = 0. Thereby the results of the preceding sections for P° directly translate to P.

5.8. Continuity.
First, we discuss continuity for fixed P ∈ P_L. For this purpose we consider the decomposition ι_P = ι_3 ∘ ι_2 ∘ ι_1 with

ι_1 : P([q]) → ∏_{ℓ∈L} P(X_ℓ), p ↦ (μ_{p,ℓ})_{ℓ∈L},  ι_2 : ∏_{ℓ∈L} P(X_ℓ) → P([q])^L, ν ↦ (ν_ℓ|_∗)_{ℓ∈L},  ι_3 : P([q])^L → P([q]), ρ ↦ E[ρ_{ℓ_{P̂}}].

We consider both im(ι_1) and im(ι_2) equipped with the inherited product topology, which is metrizable since L is countable, and therefore all topological spaces in question are compact and metrizable. Thanks to the properties of the product topology both ι_1 and ι_2 are continuous, since p ↦ μ_{p,ℓ}, ν ↦ ν|_∗ and the projections are continuous. This shows that the restrictions ι_1 : P([q]) → im(ι_1) and ι_2 : im(ι_1) → ι_2(im(ι_1)) are homeomorphisms, since they are continuous bijections of compact metrizable spaces, where Section 5.2 is already sufficient to obtain bijectivity (with Section 5.6 and Section 5.7). Continuity of ι_3 was discussed in Section 5.5, which concludes the proof that ι_P is continuous. But since ι_P is then a continuous bijection of compact metric spaces it is a homeomorphism.

Next, we show that ι : P_L × P([q]) → P([q]) is continuous. For this purpose let P, P_n ∈ P_L and p, p_n ∈ P([q]), n ∈ ℤ_{>0}, with (P_n, p_n) → (P, p) for n → ∞. Further let ρ_n = ι(P_n, p_n) and ρ = ι(P, p). Standard arguments show that P_n → P with respect to the metric Δ_L yields E[d_{P_n}] → E[d_P] and further P̂_n → P̂ in ‖·‖_1. For any ε ∈ (0,1) fix m with P[ℓ_{P̂} > m] < ε; then for n sufficiently large this gives

‖ρ_n − ρ‖ ≤ E[ ‖μ_{p_n,ℓ_{P̂}}|_∗ − μ_{p,ℓ_{P̂}}|_∗‖ ] + 2q‖P̂_n − P̂‖_1 ≤ E[ ‖μ_{p_n,ℓ_{P̂}}|_∗ − μ_{p,ℓ_{P̂}}|_∗‖ 1{ℓ_{P̂} ∈ [m]} ] + ε + 2q‖P̂_n − P̂‖_1 → ε,

which shows that ι is continuous, since ε was arbitrary. For the reverse direction fix a sequence (P_n, ρ_n) → (P, ρ), using p_n = ι⁻¹(P_n, ρ_n) and p = ι⁻¹(P, ρ). Let ε ∈ (0,1), R = ι_P(B_ε(p)), and, using that ι_P is a homeomorphism, let δ ∈ (0,1) be sufficiently small such that B_δ(ρ) ⊆ R. For n sufficiently large we have ‖P̂_n − P̂‖_1 < δ/(4q) and ρ_n ∈ B_{δ/2}(ρ), so

‖ι_P(p_n) − ρ‖ ≤ ‖ι_{P_n}(p_n) − ρ‖ + 2q‖P̂_n − P̂‖_1 < δ/2 + δ/2 = δ,

thereby ι_P(p_n) ∈ B_δ(ρ) ⊆ R and hence p_n ∈ B_ε(p). This completes the proof.

6. LOCAL LIMIT THEOREM
This section is mostly self-contained and only depends on the results obtained in Section 5 as well as Chapter 5 in [10]. We start with the setup in Section 5, i.e. we fix a number q ∈ ℤ_{>0} of colours and a family (d_ℓ, μ_ℓ)_{ℓ∈L} satisfying SPAN with D \ {0} ≠ ∅. Following Chapter 5 in [10] let L ⊆ ℤ^{q−1} denote the lattice spanned by (|χ⁻¹(ω)|)_{ω∈[q−1]} ∈ ℤ^{q−1} for χ ∈ X_ℓ and ℓ ∈ L, i.e. the set of all points obtained from (finite) linear combinations with integer coefficients, and notice that L indeed has full rank due to SPAN. Hence, Theorem 21.1 in [10] ensures the existence of a lattice basis b_ω, ω ∈ [q−1], with L = ∑_{ω∈[q−1]} ℤb_ω. Using the proof of this theorem and SPAN, we notice that there exists a unique choice of the b_ω such that (b_ω)_ω is lower triangular with positive diagonal h = (b_ω(ω))_ω ∈ ℤ^{q−1}_{>0}. The set of boxes Q_τ = τ + ∏_{ω∈[q−1]} [−h_ω/2, h_ω/2) centered at τ ∈ L is a partition of ℝ^{q−1} with Q_τ ∩ L = {τ}. Further, since each b_ω is obtained from a finite linear combination of colour frequencies, there exists a (not necessarily unique) finite subset L° ⊆ L that spans L in the sense above.

For given p ∈ P([q]), a number n ∈ ℤ_{>0} of vertices, indices ℓ = (ℓ_i)_{i∈[n]} ⊆ L and i ∈ [n] we use the shorthands d_i = d_{ℓ_i}, X_i = X_{ℓ_i}, further Z_{p,i} = Z_{p,ℓ_i} and μ_{p,i} = μ_{p,ℓ_i} for brevity; further let μ_i = μ_{u_{[q]},i} and also omit the subscript if p = u_{[q]} for derived quantities, since this corresponds to no variation. Let x_n ∼ u_{[n]} denote a uniformly random vertex, and write ℓ_ℓ = ℓ_{x_n} and d_ℓ = d_{x_n}. The main results of this section apply to sequences (L_n)_{n∈ℤ_{>0}} of families L_n ⊆ L^n satisfying the following assumptions.

GEN:
There exist ε_gen ∈ (0,1) and a subset L° ⊆ L that spans L such that P[ℓ_ℓ = ℓ] ≥ ε_gen for all ℓ ∈ L°, ℓ ∈ L_n and n ∈ ℤ_{>0}. VAR:
There exists E^{(2)} ∈ ℝ_{>0} such that E[d_ℓ²] ≤ E^{(2)} for all ℓ ∈ L_n and n ∈ ℤ_{>0}. SKEW:
There exists a sequence E^{(3)}_n ∈ ℝ_{>0}, n ∈ ℤ_{>0}, such that E^{(3)}_n = o(√n/log(n)^{3/2}) and E[d_ℓ³] ≤ E^{(3)}_n for all ℓ ∈ L_n and n ∈ ℤ_{>0}.

We may assume without loss of generality that L° is minimal and in particular d_ℓ > 0 for ℓ ∈ L°. Alternatively, we could define L_n to be maximal for given ε_gen, L°, E^{(2)} and E^{(3)}_n. Further, notice that the empty set satisfies all assumptions.

Now, for given p ∈ P([q]), n ∈ ℤ_{>0} and ℓ ∈ L_n let χ_{p,ℓ} denote a sample from ⊗_{i∈[n]} μ_{p,i}, i.e. χ_{p,ℓ} = (χ_{p,ℓ,i})_{i∈[n]} with the components being independent, and let γ_{p,ℓ} be the corresponding absolute colour frequencies, i.e. for ω ∈ [q] given by

γ_{p,ℓ}(ω) = ∑_{i∈[n], h∈[d_i]} 1{χ_{p,ℓ,i,h} = ω} = ∑_{i∈[n]} |χ_{p,ℓ,i}⁻¹(ω)|,

and further let γ̄_{p,ℓ} = E[γ_{p,ℓ}] denote the expectation. Notice that E[d_ℓ]n is the total degree and in particular ∑_{ω∈[q]} γ_{p,ℓ}(ω) = E[d_ℓ]n. The first result allows us to control the tails of the colour frequencies.

Proposition 6.1.
Let (L_n)_{n∈ℤ_{>0}}, L_n ⊆ L^n, satisfy GEN and VAR. Then there exist constants c, c′ ∈ ℝ_{>0} such that for all n ∈ ℤ_{>0}, all ℓ ∈ L_n and all r ∈ ℝ_{>0} we have

P[ ‖γ_ℓ − γ̄_ℓ‖ ≥ √(E[d_ℓ]n) r ] ≤ c′ exp(−cr²).

For ℓ ∈ L_n let P_ℓ ∈ P(L) denote the law of ℓ_ℓ and notice that d_ℓ = d_{P_ℓ} is consistent. If E[d_ℓ] is positive, let ρ_{p,ℓ} = (E[d_ℓ]n)⁻¹ γ_{p,ℓ} ∈ P([q]) denote the relative colour frequencies for given p ∈ P([q]). With Chapter 5 in [10] (Section 3.5 in [24]) it is immediate that P[γ_{p,ℓ,[q−1]} ∈ L] = 1 for all p ∈ P([q]) and ℓ ∈ L_n, where we use the shorthand γ_{p,ℓ,[q−1]} = (γ_{p,ℓ}(ω))_{ω∈[q−1]} here and in the remainder. Using ∑_{ω∈[q]} γ_{p,ℓ}(ω) = E[d_ℓ]n we extend the lattice L to C_ℓ ⊆ ℤ^q, hence P[γ_{p,ℓ} ∈ C_ℓ] = 1, scale and truncate it to obtain P[ρ_{p,ℓ} ∈ R_ℓ] = 1 with R_ℓ = (E[d_ℓ]n)⁻¹C_ℓ ∩ P([q]), and conclude with 𝒫_ℓ = ι_ℓ⁻¹(R_ℓ) (not to be confused with the law P_ℓ), using the shorthand ι_ℓ = ι_{P_ℓ} for the homeomorphism introduced in Section 5 (notice that indeed P_ℓ ∈ P_L). Finally, let Σ_{ℓ,p} = (E[d_ℓ]n)⁻¹ Cov(γ_{p,ℓ,[q−1]}). The following theorem determines the local limits in the large deviation regime.

Theorem 6.2.
Fix a compact set P∗ ⊆ P°([q]) and a sequence (L_n)_n ⊆ L^n satisfying SPAN, GEN, VAR and SKEW. Then uniformly for all ℓ ∈ L_n and p ∈ 𝒫_ℓ ∩ P∗ the covariance matrix Σ_{ℓ,p} is positive definite with ‖Σ_{ℓ,p}⁻¹‖, ‖Σ_{ℓ,p}‖ = Θ(1), and further, using ρ = ι_ℓ(p), we have

P[ρ_ℓ = ρ] = ( 1 + O( E^{(3)}_n √(log(n)³/n) ) ) ( ∏_ω h(ω) / √(E[d_ℓ]n)^{q−1} ) exp( −n E[ D_KL( μ_{p,ℓ_ℓ} ‖ μ_{ℓ_ℓ} ) ] ) / √( (2π)^{q−1} det(Σ_{ℓ,p}) ).

For frequencies close to the expectation ρ̄_ℓ, with ρ̄_{p,ℓ} = E[ρ_{p,ℓ}], Theorem 6.2 can be simplified to remove the dependency on p.

Theorem 6.3. Fix a sequence r_n ∈ ℝ_{>0} with r_n²n = Ω(1), nE^{(3)}_n r_n³ = o(1) and a family (L_n)_n ⊆ L^n satisfying SPAN, GEN, VAR and SKEW. Then uniformly for all ℓ ∈ L_n and ρ ∈ R_ℓ with ‖ρ − ρ̄_ℓ‖ < r_n we have

P[ρ_ℓ = ρ] = ( 1 + O( √(log(n)³/n) + nE^{(3)}_n r_n³ ) ) ( ∏_ω h(ω) / √(E[d_ℓ]n)^{q−1} ) φ_ℓ( √(E[d_ℓ]n) (ρ − ρ̄_ℓ)_{[q−1]} ),

where φ_ℓ denotes the density of the normal distribution N(0_{[q−1]}, Σ_ℓ).
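In the simplest conceivable instance — every vertex of degree 1 with uniform μ_ℓ and q = 2 — the colour count is binomial and Theorem 6.3 collapses to the classical local limit theorem, with h ≡ 1, E[d_ℓ] = 1 and Σ_ℓ = 1/4. The quick numeric check below compares exact binomial point probabilities with the predicted normal density (parameters invented):

```python
import math
from math import comb

# Binomial case q = 2: gamma(1) ~ Bin(n, 1/2); Theorem 6.3 then predicts
# P[gamma(1) = k] ~ phi((k - n/2)/sqrt(n)) / sqrt(n) with phi the N(0, 1/4)
# density (the variance of a single uniform Bernoulli colour is 1/4).
n = 10_000
for k in (n // 2, n // 2 + 30, n // 2 + 80):
    exact = comb(n, k) / 2**n
    x = (k - n / 2) / math.sqrt(n)
    gauss = math.exp(-x * x / (2 * 0.25)) / math.sqrt(2 * math.pi * 0.25) / math.sqrt(n)
    print(k, f"exact={exact:.3e}  normal approx={gauss:.3e}")
```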
6.1. Proof of Proposition 6.1. Notice that γ_ℓ(ω) is a sum of independent bounded random variables |χ_{ℓ,i}⁻¹(ω)| ∈ [0, d_i] with i ∈ [n], ω ∈ [q], and hence Hoeffding's inequality, with the usual transition from ‖·‖_∞ to ‖·‖, yields

P[ ‖γ_ℓ − γ̄_ℓ‖ ≥ √(E[d_ℓ]n) r ] ≤ 2q exp( − E[d_ℓ]n r² / ( 2q ∑_i d_i² ) ) ≤ 2q exp( − ε_gen r² / ( 2q² E^{(2)} ) ). ∎

6.2. Proof of Theorem 6.2.
The core idea of the proof is to determine the asymptotics of the point probabilities for colour frequencies ρ ∈ R_ℓ by replacing the original law ⊗_i μ_i with the frequency-specific law ⊗_i μ_{p,i}, p = ι_ℓ⁻¹(ρ), which is centered around ρ. The first result introduces the variation of measure. For this purpose let γ_χ ∈ ℤ^q_{≥0} and ρ_χ ∈ P([q]) denote the absolute and relative colour frequencies of an assignment χ ∈ ∏_{i∈[n]} [q]^{d_i} for ℓ ∈ L_n with E[d_ℓ] > 0, i.e. P_ℓ ∈ P_L.

Lemma 6.4.
Assume that L satisfies SPAN. Then for all n ∈ ℤ_{>0}, all ℓ ∈ L_n with P_ℓ ∈ P_L and χ ∈ ∏_{i∈[n]} [q]^{d_i}, using p = ι_ℓ⁻¹(ρ_χ) we have

P[χ_ℓ = χ] = exp( −n E[ D_KL( μ_{p,ℓ_ℓ} ‖ μ_{ℓ_ℓ} ) ] ) P[χ_{p,ℓ} = χ].

The gist in the proof of Lemma 6.4, which is postponed to Section 6.3, is that γ̄_{p,ℓ} = γ_χ by the design of ι_ℓ. Lemma 6.4 directly implies that

P[ρ_ℓ = ρ_χ] = exp( −n E[ D_KL( μ_{p,ℓ_ℓ} ‖ μ_{ℓ_ℓ} ) ] ) P[ρ_{p,ℓ} = ρ_χ],

and hence Theorem 6.2 is an immediate consequence of the following proposition, which gives the asymptotic point probability of exactly the expectation.

Proposition 6.5.
With P∗ ⊆ P°([q]) and (L_n)_n ⊆ L^n from Theorem 6.2, uniformly for all ℓ ∈ L_n and p ∈ 𝒫_ℓ ∩ P∗ the covariance matrix Σ_{ℓ,p} is positive definite with ‖Σ_{ℓ,p}⁻¹‖, ‖Σ_{ℓ,p}‖ = Θ(1), and further

P[ρ_{p,ℓ} = ρ̄_{p,ℓ}] = ( 1 + O( E^{(3)}_n √(log(n)³/n) ) ) ∏_ω h(ω) / ( √(E[d_ℓ]n)^{q−1} √( (2π)^{q−1} det(Σ_{ℓ,p}) ) ).

We show this result in Section 6.4 using the characteristic function inversion formula in Chapter 5 of [10], or equivalently a vanilla version of the saddle point method.

6.3. Proof of Lemma 6.4.
First, notice that P[χ_ℓ = χ] = 0 if and only if P[χ_{p,ℓ} = χ] = 0, since p ∈ P([q]) and ι_ℓ(p) ∈ P([q]) always have the same support. For the non-trivial case, with O ⊆ [q] denoting the support of p we have

P[χ_ℓ = χ] = ∏_i μ_i(χ_i) = c ∏_{ω∈O} p(ω)^{γ_χ(ω)} ∏_i μ_i(χ_i)/Z_{p,i} = c ∏_{i,h} p(χ_{i,h}) ∏_i μ_i(χ_i)/Z_{p,i} = c P[χ_{p,ℓ} = χ],
c = ( ∏_i Z_{p,i} ) ( ∏_{ω∈O} p(ω)^{−γ_χ(ω)} ) = ( ∏_ℓ Z_{p,ℓ}^{n P[ℓ_ℓ = ℓ]} ) ( ∏_{ω∈O} p(ω)^{−γ_χ(ω)} ).

For ℓ ∈ L let χ∗_{p,ℓ} be a sample from μ_{p,ℓ} and γ∗_{p,ℓ} = (|χ∗_{p,ℓ}⁻¹(ω)|)_{ω∈[q]} the colour frequencies. Now, we use ρ_χ = ι_ℓ(p) = ρ̄_{p,ℓ} to obtain log(c) = −n E[α(ℓ_ℓ)] with

α(ℓ) = E[ ∑_{ω∈O} γ∗_{p,ℓ}(ω) log(p(ω)) ] − log(Z_{p,ℓ}) = E[ log( ∏_{h∈[d_ℓ]} p(χ∗_{p,ℓ,h}) / Z_{p,ℓ} ) ] = D_KL( μ_{p,ℓ} ‖ μ_ℓ ).
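The change of measure in Lemma 6.4 can be illustrated on a single coordinate: the likelihood ratio μ(χ)/μ_p(χ) depends on χ only through its colour frequencies, and at the tilted mean it equals exp(−D_KL(μ_p ‖ μ)). The sketch below (invented μ and p) verifies the exact identity log(μ(χ)/μ_p(χ)) = −D_KL(μ_p ‖ μ) − ⟨γ_χ − γ̄_p, log p⟩ for several χ:

```python
import itertools
import math
import random

q, d = 2, 4
random.seed(3)
mu = {chi: random.uniform(0.5, 1.5) for chi in itertools.product(range(q), repeat=d)}
s = sum(mu.values()); mu = {k: v / s for k, v in mu.items()}

p = [0.7, 0.3]                                   # an arbitrary tilt
w = {chi: mu[chi] * math.prod(p[c] for c in chi) for chi in mu}
Z = sum(w.values())
mu_p = {chi: v / Z for chi, v in w.items()}      # the tilted measure mu_p

kl = sum(v * math.log(v / mu[chi]) for chi, v in mu_p.items())
# mean colour frequencies gamma-bar under the tilted measure
gbar = [sum(v * chi.count(c) for chi, v in mu_p.items()) for c in range(q)]

for chi in [(0, 0, 0, 0), (0, 0, 1, 1), (1, 1, 1, 1)]:
    gamma = [chi.count(c) for c in range(q)]
    shift = sum((g - gb) * math.log(pc) for g, gb, pc in zip(gamma, gbar, p))
    # both columns agree exactly; at gamma = gamma-bar the ratio is exp(-KL)
    print(chi, round(math.log(mu[chi] / mu_p[chi]), 6), round(-kl - shift, 6))
```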
6.4. Proof of Proposition 6.5. We start with some standard results based on Section 21 in [10]. Notice that due to GEN the lattice L is the minimal lattice for γ_{p,ℓ,[q−1]}. Slightly deviating from [10] we let the dual basis be given by (b∗_ω)_ω = B∗ = 2π(B⁻¹)^t with B = (b_ω)_ω, so B∗ is upper triangular, further B∗^tB = 2πI_{[q−1]} with I_{[q−1]} denoting the identity, and b∗_ω(ω) = 2π/h(ω) for ω ∈ [q−1], which yields the fundamental domain Q∗ = ∏_ω [−π/h(ω), π/h(ω)). Hence, translating the point probability for ρ_{p,ℓ} to γ_{p,ℓ}, reducing it to γ_{p,ℓ,[q−1]} and using the inversion formula in the lattice case gives

P[ρ_{p,ℓ} = ρ̄_{p,ℓ}] = ( ∏_ω h(ω) / (2π)^{q−1} ) ∫_{Q∗} f_{ℓ,p}(ϕ) dϕ,  f_{ℓ,p}(ϕ) = exp( −iϕ^t γ̄_{ℓ,p,[q−1]} ) E[ exp( iϕ^t γ_{ℓ,p,[q−1]} ) ].

With the shorthand γ̄∗_{p,ℓ} = E[γ∗_{p,ℓ}] for the expectation of γ∗_{p,ℓ} from Section 6.3, and since γ_{p,ℓ} is a sum of independent random vectors, we have

f_{ℓ,p}(ϕ) = ∏_{ℓ∈L} f∗_{p,ℓ}(ϕ)^{nP[ℓ_ℓ = ℓ]},  f∗_{p,ℓ}(ϕ) = E[ exp( iϕ^t ( γ∗_{p,ℓ} − γ̄∗_{p,ℓ} )_{[q−1]} ) ].

Now, we follow the standard scheme in that we first bound the tails at constant distance, then establish subgaussian tails and finally use a normal approximation to obtain the material contribution, with some careful bookkeeping along the way to obtain suitable error bounds.

Lemma 6.6.
For all r ∈ ℝ_{>0} there exists a constant c ∈ ℝ_{>0} such that for all n ∈ ℤ_{>0}, ℓ ∈ L_n, p ∈ P∗ and ϕ ∈ Q∗ \ B_r(0_{[q−1]}) we have |f_{ℓ,p}(ϕ)| ≤ exp(−cn).

Proof. First, since f∗_{p,ℓ} is a characteristic function we have |f∗_{p,ℓ}(ϕ)| ≤ 1. This shows that |f_{ℓ,p}(ϕ)| ≤ |f°_p(ϕ)|^{ε_gen n} with f°_p(ϕ) = ∏_{ℓ∈L°} f∗_{p,ℓ}(ϕ). The unique maximizer of |f°_p(ϕ)| is 0_{[q−1]} on the closure Q_c of Q∗ for all p ∈ P°([q]), due to Lemma 21.6 in [10] and the fact that L° spans L. Considering f°(p,ϕ) = f°_p(ϕ) as a function of both p and ϕ, we notice that f° and further |f°| are both continuous on the compact set P∗ × Q_c, so the latter attains its maximum M_r ∈ [0,1) on P∗ × (Q_c \ B_r(0_{[q−1]})) for sufficiently small r. If r is too large the assertion is trivially true; otherwise take c′ ∈ (M_r, 1) to obtain |f_{ℓ,p}(ϕ)| ≤ exp(−cn) with c = −ε_gen log(c′) ∈ ℝ_{>0}, valid for all required n, ℓ, p and ϕ. ∎

With the coarse tail bound in place we establish subgaussian tails. For this purpose we take a closer look at the atoms f∗_{p,ℓ}. Let Σ∗_{p,ℓ} = Cov(γ∗_{p,ℓ,[q−1]}) denote the corresponding covariance.

Lemma 6.7.
For all ℓ ∈ L with d_ℓ > 0, p ∈ P°([q]) and ϕ ∈ ℝ^{q−1} with ‖ϕ‖_∞ < π/(2d_ℓ) we have

| log( f∗_{p,ℓ}(ϕ) ) + ϕ^t Σ∗_{p,ℓ} ϕ / 2 | ≤ ( C d_ℓ³ / cos³(d_ℓ‖ϕ‖_∞) ) ‖ϕ‖³

for an absolute constant C > 0. Further, there exists c_ℓ ∈ ℝ_{>0} such that c_ℓ ≤ ‖Σ∗_{p,ℓ}⁻¹‖⁻¹ ≤ ‖Σ∗_{p,ℓ}‖ ≤ 4d_ℓ² for all p ∈ P∗.

Proof. Using d = d_ℓ recall that ‖γ∗_{p,ℓ}‖_1 = d almost surely, and thereby |ϕ^t γ∗_{p,ℓ,[q−1]}| ≤ d‖ϕ‖_∞ < π/2. With f∗_{p,ℓ}(ϕ) = a + ib = re^{iα}, and since the cosine is even, non-negative and decreasing on [0, π/2], we have

r = |f∗_{p,ℓ}(ϕ)| ≥ |a| = a = ∑_γ P[γ∗_{p,ℓ} = γ] cos( |ϕ^t γ_{[q−1]}| ) ≥ cos( d‖ϕ‖_∞ ) > 0.

With a > 0 the angle α with |α| < π/2 is unique, and thereby g∗_{p,ℓ}(ϕ) = log(f∗_{p,ℓ}(ϕ)) = log(r) + iα is well-defined. By direct computation we obtain that the first derivatives of g∗_{p,ℓ} at 0_{[q−1]} vanish, the second partial derivatives yield −Σ∗_{p,ℓ}, and for the third partial derivatives we get

∂³g∗_{p,ℓ}(ϕ) / ∏_{i∈[3]} ∂ϕ(ω_i) = −i E[ c(γ,ω) exp( iϕ^t ( ∑_i γ_i )_{[q−1]} ) ] / f∗_{p,ℓ}(ϕ)³,
c(γ,ω) = γ_1(ω_1) ( γ_2(ω_2) − γ_3(ω_2) ) ( γ_2(ω_3) + γ_3(ω_3) − 2γ_1(ω_3) ),

with i.i.d. copies γ_i ∼ γ∗_{p,ℓ} for i ∈ [3] and ω ∈ [q−1]³. This gives |c(γ,ω)| ≤ 2d³ uniformly, hence the third partial derivatives can be upper bounded by 2d³/|f∗_{p,ℓ}(ϕ)|³ ≤ 2d³/cos³(d‖ϕ‖_∞), which proves the first assertion using Taylor's theorem.

Recall that the covariance Σ∗_{p,ℓ} is positive semi-definite. With γ°_{p,ℓ}(ω) = γ∗_{p,ℓ}(ω) − γ̄∗_{p,ℓ}(ω) ∈ [−d, d] almost surely for all ω ∈ [q−1] and p, this gives |v^t γ°_{p,ℓ,[q−1]}| ≤ ‖γ°_{p,ℓ}‖_1 ≤ 2d and hence v^t Σ∗_{p,ℓ} v ≤ 4d² for all v ∈ ℝ^{q−1} with ‖v‖ = 1. Now, fix ω∗ ∈ [q−1] with |v(ω∗)| = ‖v‖_∞ ≥ ‖v‖/√(q−1) = 1/√(q−1). With E = { d·e_{[q],ω∗}, d·e_{[q],q} } this yields

v^t Σ∗_{p,ℓ} v = E[ ( v^t γ°_{p,ℓ,[q−1]} )² 1{ γ∗_{p,ℓ} ∉ E } ] + P[γ∗_{p,ℓ} ∈ E] E[ ( v^t γ°_{p,ℓ,[q−1]} )² | γ∗_{p,ℓ} ∈ E ].

For the conditional expectation we use α_1x_1² + α_2x_2² ≥ α_1α_2(x_1 − x_2)², valid for all α ∈ P([2]) and x ∈ ℝ², which gives

v^t Σ∗_{p,ℓ} v ≥ P[γ∗_{p,ℓ} ∈ E] P[ γ∗_{p,ℓ} = d·e_{[q],ω∗} | γ∗_{p,ℓ} ∈ E ] P[ γ∗_{p,ℓ} = d·e_{[q],q} | γ∗_{p,ℓ} ∈ E ] d²‖v‖²_∞
  ≥ P[γ∗_{p,ℓ} = d·e_{[q],ω∗}] P[γ∗_{p,ℓ} = d·e_{[q],q}] d²‖v‖²_∞ ≥ μ_ℓ(ω∗_{[d]}) p(ω∗)^d μ_ℓ(q_{[d]}) p(q)^d d² / (q−1),

using Z_{p,ℓ} ≤ 1 in the last step. With μ_{min,ℓ} = min_{ω∈[q]} μ_ℓ(ω_{[d]}) and ε_p = min_{p∈P∗} min_{ω∈[q]} p(ω) ∈ (0,1) this gives v^t Σ∗_{p,ℓ} v ≥ μ²_{min,ℓ} ε_p^{2d} d²/(q−1) for all required v, p and ℓ. ∎

With Lemma 6.7 we are ready to establish the subgaussian tails.
Corollary 6.8.
There exists a constant c ∈ ℝ_{>0} such that |f_{ℓ,p}(ϕ)| ≤ exp(−c‖ϕ‖²n) for all n ∈ ℤ_{>0}, ℓ ∈ L_n, p ∈ P∗ and ϕ ∈ Q∗.

Proof. Fix ℓ ∈ L° and r ∈ (0,1) sufficiently small to obtain a good approximation of log(f∗_{p,ℓ}(ϕ)) for ϕ ∈ B_r(0_{[q−1]}) using Lemma 6.7, say |log(f∗_{p,ℓ}(ϕ)) + ϕ^tΣ∗_{p,ℓ}ϕ/2| ≤ c_ℓ‖ϕ‖²/4 uniformly for all p ∈ P∗, i.e. the relative error is at most 1/2. With log(f∗_{p,ℓ}(ϕ)) = a + ib this gives |a + ϕ^tΣ∗_{p,ℓ}ϕ/2| ≤ ϕ^tΣ∗_{p,ℓ}ϕ/4, so |f∗_{p,ℓ}(ϕ)| = e^a ≤ exp(−ϕ^tΣ∗_{p,ℓ}ϕ/4) ≤ exp(−c_ℓ‖ϕ‖²/4), and further |f_{ℓ,p}(ϕ)| ≤ exp(−c_ℓ ε_gen ‖ϕ‖²n/4). For ϕ ∈ Q∗ \ B_r(0_{[q−1]}) we use the constant c′ = c′(r) ∈ ℝ_{>0} from Lemma 6.6 to obtain |f_{ℓ,p}(ϕ)| ≤ exp(−c′n) ≤ exp(−c″‖ϕ‖²n), since ‖ϕ‖ is bounded on Q∗; taking the minimum of the two choices completes the proof. ∎

Since the assertion indicates that the integral is of order √n^{−(q−1)}, we fix ε_{a,n} = c∗√(log(n)/n) for some large c∗ ∈ ℝ_{>0} and set B_{a,n} = B_{ε_{a,n}}(0_{[q−1]}), since then

| ∫_{Q∗} 1{ϕ ∉ B_{a,n}} f_{ℓ,p}(ϕ) dϕ | ≤ n^{−cc∗²} ∏_ω ( 2π/h(ω) ) = o( √n^{−q} )

uniformly in ℓ ∈ L_n and p ∈ P∗. The remainder of the proof is dedicated to the material contributions on B_{a,n}. First, we extend Lemma 6.7 to f_{ℓ,p}. For this purpose let

g_{ℓ,p}(ϕ) = log( f_{ℓ,p}(ϕ) ) = n E[ log( f∗_{p,ℓ_ℓ}(ϕ) ) ],

which is defined for sufficiently small ϕ (depending on ℓ, p) as shown in Lemma 6.7.

Lemma 6.9.
Uniformly for all ℓ ∈ L_n, p ∈ P∗ and ϕ ∈ B_{a,n} we have

| g_{ℓ,p}(ϕ) + E[d_ℓ]n ϕ^t Σ_{ℓ,p} ϕ / 2 | = O( E^{(3)}_n √(log(n)³/n) ).

Further, there exists c ∈ (0,1) with c ≤ ‖Σ_{ℓ,p}⁻¹‖⁻¹ ≤ ‖Σ_{ℓ,p}‖ ≤ c⁻¹ for all p ∈ P∗, ℓ ∈ L_n and n ∈ ℤ_{>0}.

Proof. For given ℓ ∈ L_n the maximum degree d_{max,ℓ} satisfies d³_{max,ℓ} ≤ nE[d_ℓ³] ≤ nE^{(3)}_n, so we have d_{max,ℓ} ≤ (nE^{(3)}_n)^{1/3} = o(√(n/log n)) and further d_{max,ℓ}ε_{a,n} = o(1), uniformly in ℓ (and p). So, with Lemma 6.7 and equivalence of norms we obtain n_0 ∈ ℤ_{>0}, c ∈ ℝ_{>0} such that

| log( f∗_{p,ℓ}(ϕ) ) + ϕ^t Σ∗_{p,ℓ} ϕ / 2 | ≤ cd_ℓ³ ε³_{a,n}

for all ϕ ∈ B_{a,n}, p ∈ P∗, indices ℓ of ℓ ∈ L_n and n ∈ ℤ_{≥n_0}. With the definition of g_{ℓ,p}, Σ_{ℓ,p} and the triangle inequality this gives

| g_{ℓ,p}(ϕ) + E[d_ℓ]n ϕ^t Σ_{ℓ,p} ϕ / 2 | ≤ ncE[d_ℓ³] ε³_{a,n} ≤ ncE^{(3)}_n ε³_{a,n} = O( E^{(3)}_n √(log(n)³/n) )

uniformly in ϕ ∈ B_{a,n}, p ∈ P∗ and ℓ ∈ L_n. Finally, using Lemma 6.7, ℓ ∈ L°, and with v ∈ ℝ^{q−1}, ‖v‖ = 1, we get

ε_gen c_ℓ / E^{(2)} ≤ E[d_ℓ]⁻¹ P[ℓ_ℓ = ℓ] c_ℓ ≤ v^t Σ_{ℓ,p} v ≤ 4E[d_ℓ²]/E[d_ℓ] ≤ 4E^{(2)}/ε_gen

uniformly for all ℓ, p. ∎

For the sake of transparency let f_{ℓ,p}(ϕ) = f_{r,ℓ,p}(ϕ) + if_{i,ℓ,p}(ϕ), g_{ℓ,p}(ϕ) = g_{r,ℓ,p}(ϕ) + ig_{i,ℓ,p}(ϕ) be the decompositions into real and imaginary part, so in particular f_{r,ℓ,p}(ϕ) = exp(g_{r,ℓ,p}(ϕ)) cos(g_{i,ℓ,p}(ϕ)). As directly implied by the left hand side of the inversion formula, we only need to evaluate the integral over f_{r,ℓ,p}(ϕ), since the integral over f_{i,ℓ,p}(ϕ) vanishes. Using |a| ≤ |z| for z = a + ib we obtain the bound on the tails for the real part from the bound on the tails of the complex integral. Further, Lemma 6.9 yields g_{r,ℓ,p}(ϕ) = −E[d_ℓ]n ϕ^tΣ_{ℓ,p}ϕ/2 + O(E^{(3)}_n √(log(n)³/n)) and g_{i,ℓ,p}(ϕ) = O(E^{(3)}_n √(log(n)³/n)) = o(1). Using cos(x) = 1 − O(x²) and exp(x) = 1 + O(x) for bounded x, this gives

∫_{Q∗} f_{r,ℓ,p}(ϕ) dϕ = ( 1 + O( E^{(3)}_n √(log(n)³/n) ) ) ∫_{B_{a,n}} exp( −E[d_ℓ]n ϕ^t Σ_{ℓ,p} ϕ / 2 ) dϕ + o( √n^{−q} ).

Rescaling with v = √(E[d_ℓ]n)ϕ gives a Gaussian integral. With v_{ℓ,p} ∼ N(0, Σ_{ℓ,p}⁻¹) reflecting the corresponding normal and V_{ℓ,p} = √(E[d_ℓ]n)B_{a,n} the corresponding event this gives

∫_{Q∗} f_{r,ℓ,p}(ϕ) dϕ = ( 1 + O( E^{(3)}_n √(log(n)³/n) ) ) ( √(2π)^{q−1} / ( √(E[d_ℓ]n)^{q−1} √det(Σ_{ℓ,p}) ) ) P[v_{ℓ,p} ∈ V_{ℓ,p}] + o( √n^{−q} ).

With E[d_ℓ] ≥ ε_gen we obtain c ∈ ℝ_{>0} with ‖v‖ ≥ c√n ε_{a,n} for all v ∉ V_{ℓ,p} and ℓ, p. Further, with Lemma 6.9 we have c′ ∈ (0,1) to bound the eigenvalues of Σ_{ℓ,p} uniformly, giving the existence of constants c, c′ ∈ ℝ_{>0} such that P[v_{ℓ,p} ∉ V_{ℓ,p}] ≤ c′exp(−cε²_{a,n}n) uniformly for all p ∈ P∗, ℓ ∈ L_n and n ∈ ℤ_{>0}. With the definition of ε_{a,n} this gives P[v_{ℓ,p} ∉ V_{ℓ,p}] ≤ c′n^{−cc∗²}, hence for some fixed large c∗ ∈ ℝ_{>0} we have P[v_{ℓ,p} ∉ V_{ℓ,p}] = o(√n^{−1}) uniformly. Now, since E[d_ℓ] = Θ(1) uniformly and det(Σ_{ℓ,p}) = Θ(1) uniformly, the dominant contribution is of order √n^{−(q−1)}. Hence, extracting the material part gives

∫_{Q∗} f_{r,ℓ,p}(ϕ) dϕ = ( 1 + O( E^{(3)}_n √(log(n)³/n) ) + o(√n^{−1}) ) √(2π)^{q−1} / ( √(E[d_ℓ]n)^{q−1} √det(Σ_{ℓ,p}) ).

Here, the fact that E^{(3)}_n ≥ 1 (without loss of generality) allows us to absorb the o(√n^{−1}) term into the leading error term, which completes the proof. ∎

6.5. Proof of Theorem 6.3.
We split the proof into two parts. The first part is dedicated to a local limit theorem for π_ℓ = ι_ℓ⁻¹(ρ_ℓ) ∈ 𝒫_ℓ around u_{[q]}, and in the second part we translate the result to ρ_ℓ.

Proposition 6.10.
Uniformly for all ℓ ∈ L_n and p ∈ 𝒫_ℓ with ‖p − u_{[q]}‖ < r_n we have

P[π_ℓ = p] = ( 1 + O( √(log(n)³/n) + nE^{(3)}_n r_n³ ) ) det(Σ_ℓ⁻¹) ( ∏_ω h(ω) / √(E[d_ℓ]n)^{q−1} ) φ_ℓ( √(E[d_ℓ]n) ( qp − 1_{[q]} )_{[q−1]} ),

where φ_ℓ denotes the density of N(0, Σ_ℓ⁻¹).

Proof. With Theorem 6.2 we expand the exponent to second order, control the resulting errors, and proceed analogously for the determinant by expanding to zeroth order. For given ℓ and p let α_ℓ(p) = E[d_ℓ]⁻¹ E[D_KL(μ_{p,ℓ_ℓ} ‖ μ_{ℓ_ℓ})]; notice that D_KL(μ_{p,ℓ} ‖ μ_ℓ) = 0 for d_ℓ = 0, and

α_ℓ(p) = E[d_ℓ]⁻¹ E[ ∑_{ω∈[q]} γ̄∗_{p,ℓ_ℓ}(ω) log(p(ω)) − log(Z_{p,ℓ_ℓ}) ] = ∑_ω ρ̄_{ℓ,p}(ω) log(p(ω)) − E[log(Z_{p,ℓ_ℓ})]/E[d_ℓ],

thereby removing dependencies on the product spaces [q]^d. Recall that p = u_{[q]} is the unique global minimizer of α_ℓ as discussed in Section 5, hence the first derivatives vanish. For the sake of completeness and later use we provide the derivatives. For transparency we use the shorthand f^{(ω_1,…,ω_k)} = ∂^k f / ( ∂p(ω_k) ⋯ ∂p(ω_1) ) to denote the k-th partial derivatives of the extension of a map f : P°([q]) → ℝ to ℝ^q_{>0}. With the shorthand Σ°_{p,ℓ} = Cov(γ∗_{p,ℓ}) (as opposed to Σ∗_{p,ℓ} = Cov(γ∗_{p,ℓ,[q−1]}) in Section 6.4), for ℓ ∈ L with d_ℓ > 0 and p ∈ P°([q]) the first derivatives are

Z^{(ω_1)}_{p,ℓ} = Z_{p,ℓ} γ̄∗_{p,ℓ}(ω_1)/p(ω_1),  γ̄∗^{(ω_1)}_{p,ℓ}(ω) = Σ°_{p,ℓ,ω,ω_1}/p(ω_1).

With L_p = (log(p(ω)))_ω and Σ°_{ℓ,p} = E[d_ℓ]⁻¹ E[Σ°_{p,ℓ_ℓ}], i.e. Σ_{ℓ,p} = (Σ°_{ℓ,p})_{[q−1]×[q−1]}, this gives

ρ̄^{(ω_1)}_{ℓ,p}(ω) = Σ°_{ℓ,p,ω,ω_1}/p(ω_1),  α^{(ω_1)}_ℓ(p) = ∑_{ω′} Σ°_{ℓ,p,ω_1,ω′} L_{p,ω′} / p(ω_1).

For given ω ∈ [q]³ we have Σ°^{(ω_3)}_{p,ℓ,ω_1,ω_2} = S°_{p,ℓ,ω}/p(ω_3) with S°_{p,ℓ,ω} = E[ ∏_i ( γ∗_{p,ℓ}(ω_i) − γ̄∗_{p,ℓ}(ω_i) ) ]. Hence, with S°_{ℓ,p} = E[d_ℓ]⁻¹ E[S°_{p,ℓ_ℓ}], the derivatives on the next level are given by

ρ̄^{(ω_1,ω_2)}_{ℓ,p}(ω) = ( S°_{ℓ,p,ω,ω_1,ω_2} − δ_{ω_1,ω_2} Σ°_{ℓ,p,ω,ω_1} ) / ( p(ω_1)p(ω_2) ),
α^{(ω_1,ω_2)}_ℓ(p) = ( Σ°_{ℓ,p,ω_1,ω_2} + ∑_{ω′} ( S°_{ℓ,p,ω_1,ω_2,ω′} − δ_{ω_1,ω_2} Σ°_{ℓ,p,ω_1,ω′} ) L_{p,ω′} ) / ( p(ω_1)p(ω_2) ),

using the Kronecker symbol. For ω ∈ [q]⁴ we have S°^{(ω_4)}_{p,ℓ,ω_1,ω_2,ω_3} = F°_{p,ℓ,ω}/p(ω_4) with

F°_{p,ℓ,ω} = E[ ∏_{i∈[4]} ( γ∗_{p,ℓ}(ω_i) − γ̄∗_{p,ℓ}(ω_i) ) ] − ∑_{i∈[3]} Σ°_{p,ℓ,ω_i,ω_4} Σ°_{p,ℓ,ω_{[3]\{i}}};

recall that Σ°_{p,ℓ} is symmetric, so e.g. Σ°_{p,ℓ,ω_1,ω_2} = Σ°_{p,ℓ,ω_2,ω_1}, and hence F°_{p,ℓ} is symmetric in that F°_{p,ℓ,ω∘σ} = F°_{p,ℓ,ω} for all permutations σ : [4] → [4]. With F°_{ℓ,p,ω} = E[d_ℓ]⁻¹ E[F°_{p,ℓ_ℓ,ω}] and ω ∈ [q]³ this yields

α^{(ω)}_ℓ(p) = ( S°_{ℓ,p,ω} + ∑_{ω′} F°_{ℓ,p,ω,ω′} L_{p,ω′} − α^{(ω)}_{2,ℓ}(p) + 2δ_{ω_1,ω_2}δ_{ω_2,ω_3} ∑_{ω′} Σ°_{ℓ,p,ω_1,ω′} L_{p,ω′} ) / ( p(ω_1)p(ω_2)p(ω_3) ),
α^{(ω)}_{2,ℓ}(p) = ∑_{i<j} δ_{ω_i,ω_j} ( Σ°_{ℓ,p,ω_i,ω_{[3]\{i,j}}} + ∑_{ω′} S°_{ℓ,p,ω_i,ω_{[3]\{i,j}},ω′} L_{p,ω′} ).

Recall that Σ°_{ℓ,p}, S°_{ℓ,p} and F°_{ℓ,p} are all invariant under permutations of the indices and further ∑_{ω′} Σ°_{ℓ,p,ω,ω′} = ∑_{ω′} S°_{ℓ,p,ω,ω′} = ∑_{ω′} F°_{ℓ,p,ω,ω′} = 0 for any ω respectively, i.e. the "column" sum for any given dimension and choice of remaining indices vanishes. On the one hand, since L_p ≡ log(q⁻¹) for p = u_{[q]}, all inner products involving L_p vanish, i.e. all first derivatives vanish and further (α^{(ω_1,ω_2)}_ℓ(u_{[q]}))_{ω_1,ω_2} = q²Σ°_{ℓ,u_{[q]}}. On the other hand, this means that the inner product with L_p equals the inner product with L_p + c1_{[q]} for any c ∈ ℝ.
Since we discuss $\alpha_\ell$ locally around $p = u_{[q]}$ we choose $c = \log(q)$ and let $L^\circ_p = L_p + c1_{[q]} = (\log(qp(\omega)))_\omega$. Now, since $r_n = o(1)$ we can fix any small compact neighbourhood $P^* \subseteq P^\circ([q])$ of $u_{[q]}$ to obtain $B_n \subseteq P^*$ with $B_n = B_{r_n}(u_{[q]})$ for $n \in \mathbb{Z}_{\ge n_0}$ and some $n_0 \in \mathbb{Z}_{>0}$, so in particular we get some $\varepsilon_p \in (0, q^{-1})$ close to $q^{-1}$ with $p(\omega) \ge \varepsilon_p$ for all $p \in B_n$ and $n \in \mathbb{Z}_{\ge n_0}$. This e.g. takes care of the denominator of the third partial derivatives.

Recall from the proof of Lemma 6.7 that $\|\gamma^*_{p,\ell} - \bar\gamma^*_{p,\ell}\|_\infty \le d_\ell$ almost surely for all $\ell \in L$ and $p \in P([q])$, and further that $E[d_\ell] \ge \varepsilon_{\mathrm{gen}}$. This gives $|\Sigma^\circ_{p,\ell,\omega}| \le d_\ell^2$, $|S^\circ_{p,\ell,\omega}| \le d_\ell^3$ and $|F^\circ_{p,\ell,\omega}| \le d_\ell^4$ for all suitable $\omega$ respectively, and uniformly in $p$ and $\ell$, so $|\Sigma^\circ_{\ell,p,\omega}| \le \varepsilon_{\mathrm{gen}}^{-1}E^{(2)}$, $|S^\circ_{\ell,p,\omega}| \le \varepsilon_{\mathrm{gen}}^{-1}E^{(3)}_n$ and $|F^\circ_{\ell,p,\omega}| \le \varepsilon_{\mathrm{gen}}^{-1}E^{(3)}_n d_{\max,\ell}$ uniformly in $p$ and $\ell$, where we recall $d_{\max,\ell} \in \mathbb{Z}_{>0}$ from the proof of Lemma 6.9, in particular that $d_\ell \le d_{\max,\ell} \le nE^{(3)}_n$ and hence $d_{\max,\ell}r_n = o(1)$. Finally, due to the restriction to $P^*$ and with equivalence of norms we get a global constant $c_l \in \mathbb{R}_{>0}$ with $\|L^\circ_p\| \le c_l\|p - u_{[q]}\| \le c_l r_n$ for all $p \in B_n$ and $n \in \mathbb{Z}_{\ge n_0}$. Using these bounds we get $\alpha^{(\omega)}_{3,\ell}(p) = O(E^{(3)}_n)$ with the order given by the first contribution and uniformly in $\omega \in [q]^3$, $\ell \in L_n$ and $p \in B_n$. Now, Taylor's theorem with equivalence of norms yields
\[ \Big| E[d_\ell]n\,\alpha_\ell(p) - \tfrac{q^2}{2}E[d_\ell]n\,(p - u_{[q]})^t\Sigma^\circ_{\ell,u_{[q]}}(p - u_{[q]}) \Big| = O\big(nE^{(3)}_n r_n^3\big) \]
uniformly in $p \in B_n$ and $\ell \in L_n$. Now, for $\omega \in [q-1]$ let $b_\omega = e_{[q],\omega} - e_{[q],q}$ denote a basis of $1^\perp_{[q]}$ and further $B = (b_\omega(\omega^*))_{\omega^*\in[q],\,\omega\in[q-1]}$ the corresponding transformation; then $(p - u_{[q]}) = B(p - u_{[q]})_{[q-1]}$ and the precision matrix of our normal distribution is given by $B^t\Sigma^\circ_{\ell,u_{[q]}}B$. On the other hand, since $1_{[q]}$ is both a row and column eigenvector of $\Sigma^\circ_{\ell,u_{[q]}}$ with eigenvalue $0$ we have $\Sigma^\circ_{\ell,u_{[q]}} = B\Sigma_{\ell,u_{[q]}}B^t = B\Sigma_\ell B^t$. Hence, with $B^tB = I_{[q-1]} + 1_{[q-1]}1^t_{[q-1]}$ we obtain $B^t\Sigma^\circ_{\ell,u_{[q]}}B = \Sigma_\ell$.

With the exponent in place we turn to the asymptotics of the determinant $f_\ell(p) = \det(\Sigma_{\ell,p})$. Interpreting the matrix entries $\Sigma_{\ell,p,\omega}$, $\omega \in [q-1]^2$, as functions in $p$, the discussion above shows that $|\Sigma_{\ell,p,\omega} - \Sigma_{\ell,u_{[q]},\omega}| = O(E^{(3)}_n r_n)$ uniformly in $\omega \in [q-1]^2$, $p \in B_n$ and $\ell \in L_n$. Due to the assumption $nr_n^2 = \Omega(1)$ we have $E^{(3)}_n r_n = O(nE^{(3)}_n r_n^3)$. Using the Leibniz formula to view $f_\ell(p)$ as a polynomial and taking derivatives in $\Sigma_{\ell,p,\omega}$, $\omega \in [q-1]^2$ (as opposed to $p(\omega)$), we obtain
\[ |f_\ell(p) - f_\ell(u_{[q]})| = O\Big(\Big(\sum_\omega|\Sigma_{\ell,p,\omega}|\Big)^{q-2}\sum_\omega|\Sigma_{\ell,p,\omega} - \Sigma_{\ell,u_{[q]},\omega}|\Big) = O\big(nE^{(3)}_n r_n^3\big), \]
since we already showed that $|\Sigma_{\ell,p,\omega}| \le \varepsilon_{\mathrm{gen}}^{-1}E^{(2)}$ uniformly in $\omega \in [q-1]^2$, $p \in B_n$ and $\ell \in L_n$. With $\det(\Sigma_\ell) = \Theta(1)$ uniformly in $\ell \in L_n$ as derived in the proof of Theorem 6.2 the assertion follows. $\Box$

In the remainder of the proof we approximate $\iota_\ell$ to first order and control the errors in the exponent, while the remainder already agrees with the assertion in Theorem 6.3. In the proof of Proposition 6.10 we have already established the first and second partial derivatives of $\iota_\ell(p) = \bar\rho_{\ell,p}$, and further the bounds on $p$, $S^\circ_{\ell,p}$ and $\Sigma^\circ_{\ell,p}$ required to derive $\bar\rho^{(\omega_1,\omega_2)}_{\ell,p}(\omega) = O(E^{(3)}_n)$ uniformly for all $\omega \in [q]^3$, $p \in B_n$ and $\ell \in L_n$. Hence, Taylor's theorem yields
\[ \|\bar\rho_{\ell,p} - \tilde\rho_{\ell,p}\| = O\big(E^{(3)}_n r_n^2\big), \qquad \tilde\rho_{\ell,p} = \bar\rho_\ell + q\Sigma^\circ_{\ell,u_{[q]}}(p - u_{[q]}), \]
uniformly for all $p \in B_n$ and $\ell \in L_n$, with $E^{(3)}_n r_n^2 = o(\sqrt{n^{-1}}) = o(r_n)$ since $r_n = \Omega(\sqrt{n^{-1}})$. Recall that the eigenvalues of $\Sigma^\circ_{\ell,u_{[q]}}$ can be upper bounded by $E[d_\ell^2]/E[d_\ell]$, and are hence uniformly bounded, and further that $1_{[q]}$ is an eigenvector of $\Sigma^\circ_{\ell,u_{[q]}}$ (with eigenvalue $0$), so $\Sigma^\circ_{\ell,u_{[q]}}$ maps $1^\perp_{[q]}$ into $1^\perp_{[q]}$. This shows that (for large enough $n$) the linear approximation $\tilde\rho_{\ell,p}$ is in $P^\circ([q])$ with $\|\tilde\rho_{\ell,p} - \bar\rho_\ell\| = O(r_n)$. With $\tilde\rho_{\ell,p} \in P^\circ([q])$ we can safely project onto the first $(q-1)$ coordinates to obtain $(\tilde\rho_{\ell,p} - \bar\rho_\ell)_{[q-1]} = q\tilde B^t\Sigma^\circ_{\ell,u_{[q]}}B(p - u_{[q]})_{[q-1]}$ with $B$ introduced in the proof of Proposition 6.10 and $\tilde B = (e_{[q],\omega}(\omega^*))_{\omega^*\in[q],\,\omega\in[q-1]}$. With $\Sigma^\circ_{\ell,u_{[q]}} = B\Sigma_\ell B^t$, $\tilde B^tB = I_{[q-1]}$ and $B^tB = I_{[q-1]} + 1_{[q-1]}1^t_{[q-1]}$ we have $(\tilde\rho_{\ell,p} - \bar\rho_\ell)_{[q-1]} = q\Sigma_\ell(p - u_{[q]})_{[q-1]}$.

For one, we already obtained uniform bounds on the eigenvalues of $\Sigma_\ell$ in the proof of Theorem 6.2 and hence a constant $c \in (0,1)$ such that $c\|(p - u_{[q]})_{[q-1]}\| \le \|(\tilde\rho_{\ell,p} - \bar\rho_\ell)_{[q-1]}\| \le c^{-1}\|(p - u_{[q]})_{[q-1]}\|$ for all $p \in P^*$, $\ell \in L_n$, $n \in \mathbb{Z}_{>0}$ whenever $\tilde\rho_{\ell,p} \in P([q])$. More than that, this map is invertible and allows us to substitute $q(p - u_{[q]})_{[q-1]}$ in the exponent $\alpha_{p,\ell}(p)$, i.e.
\[ \alpha_{p,\ell}(p) = -\tfrac{q^2}{2}E[d_\ell]n\,(p - u_{[q]})^t_{[q-1]}\Sigma_\ell(p - u_{[q]})_{[q-1]} = -\tfrac{1}{2}E[d_\ell]n\,(\tilde\rho_{\ell,p} - \bar\rho_\ell)^t_{[q-1]}\Sigma_\ell^{-1}(\tilde\rho_{\ell,p} - \bar\rho_\ell)_{[q-1]}. \]
Using the bounds on the fluctuations $\|\bar\rho_{\ell,p} - \tilde\rho_{\ell,p}\|$ this gives
\[ |\alpha_{r,\ell}(p) - \alpha_{p,\ell}(p)| = E[d_\ell]n\,O\big((E^{(3)}_n)^2r_n^3\big) = o\big(nE^{(3)}_n r_n^3\big), \qquad \alpha_{r,\ell}(p) = -\tfrac{1}{2}E[d_\ell]n\,(\bar\rho_{\ell,p} - \bar\rho_\ell)^t_{[q-1]}\Sigma_\ell^{-1}(\bar\rho_{\ell,p} - \bar\rho_\ell)_{[q-1]}, \]
since $r_n^2n = \Omega(1)$ and hence $E^{(3)}_n r_n = O(nE^{(3)}_n r_n^3) = o(1)$. Hence, the relative error made by approximating the exponent $\alpha_{p,\ell}(p)$ with $\alpha_{r,\ell}(p)$ is strictly smaller than the existing bound.

To be thorough, fix a sequence of radii $r_n$ with $r_n^2n = \Omega(1)$ and $nE^{(3)}_n r_n^3 = o(1)$, bounding the fluctuations $\|\rho - \bar\rho_\ell\|$ (as opposed to $\|p - u_{[q]}\|$, which was the case so far), and let $B_\ell = B_{r_n}(\bar\rho_\ell)$ be the corresponding ball. Let $c^* \in \mathbb{R}_{>0}$ be large and $r'_n = c^*r_n$; then our existing results hold for $r'_n$, respectively $B'_n = B_{r'_n}(u_{[q]})$. The two-sided bounds for $\tilde\rho_{\ell,p}$ imply that all $\tilde\rho \in \tilde B_\ell$, $\tilde B_\ell = B_{r_n}(\bar\rho_\ell)$, are covered by $B'_n$. But since the fluctuations $\|\bar\rho_{\ell,p} - \tilde\rho_{\ell,p}\| = o(\sqrt{n^{-1}}) = o(r_n)$ are very small, all $\rho \in B_\ell$ are covered by $B'_n$, which completes the proof since $r_n = \Theta(r'_n)$.

7. ASSIGNMENT DISTRIBUTIONS
This section extends the results from Section 5 and Section 6. For this purpose fix a non-trivial family $(d_\ell, \mu_\ell)_{\ell\in L}$ satisfying SPAN and a sequence $(L_n)_{n\in\mathbb{Z}_{>0}} \subseteq L^n$ satisfying GEN, VAR and
SKEW. For $P \in P_L$ and $p \in P([q])$ let the expected assignment distribution $\bar\alpha_{P,p} \in P(A)$ be given by $\bar\alpha_{P,p}(\ell,\chi) = P(\ell)\mu_{p,\ell}(\chi)$ for $(\ell,\chi) \in A$ with $A = \{(\ell,\chi): \ell \in L, \chi \in [q]^{d_\ell}\}$. As before, we usually omit the subscript $p$ if $p = u_{[q]}$. We consider the distributions $\bar\alpha_{P,p}$ elements of $P_A = \{\alpha \in P(A): \alpha|_L \in P_L\}$ equipped with the metric
\[ \Delta_A(\alpha, \alpha') = \sum_{\ell\in L}(d_\ell + 1)\sum_{\chi\in[q]^{d_\ell}}|\alpha(\ell,\chi) - \alpha'(\ell,\chi)| \]
for $\alpha, \alpha' \in P_A$. Further, based on the insights from Section 6 we let $\bar\rho_{p,P} = \iota_P(p)$ to stress the interpretation as expected colour frequencies, and recall that $\bar\rho_{p,P_\ell} = \bar\rho_{p,\ell}$ for non-trivial sequences $\ell \in L^n$. For $n \in \mathbb{Z}_{>0}$, non-trivial $\ell \in L^n$ and $\chi \in \prod_i[q]^{d_i}$ let $\alpha_\chi \in P_A$ denote the assignment frequencies, i.e. $\alpha_\chi(\ell,\chi) = \frac{1}{n}|\{i \in [n]: \ell_i = \ell, \chi_i = \chi\}|$ for $(\ell,\chi) \in A$, where we keep the dependence on $\ell$ implicit. Finally, for $p \in P([q])$ and $\rho$ in the support of $\rho_{p,\ell}$ we let $\chi_{p,\ell,\rho} = (\chi_{p,\ell}\,|\,\rho_{p,\ell} = \rho)$, further $\alpha_{p,\ell} = \alpha_{\chi_{p,\ell}}$ and $\alpha_{p,\ell,\rho} = \alpha_{\chi_{p,\ell,\rho}}$. The main result of this section ensures that, given colour frequencies $\rho$ close to their expectation $\bar\rho_P$ and a sequence $\ell$ with frequencies $P_\ell$ close to the reference $P$, the assignment distribution $\alpha_{\ell,\rho}$ is close to the expected unconditional assignment distribution $\bar\alpha_P$ of the reference $P$ with very high probability.

Proposition 7.1.
Fix $(L_n)_n \subseteq L^n$ satisfying SPAN, GEN, VAR and SKEW, a reference distribution $P \in P_L$ and $\varepsilon \in \mathbb{R}_{>0}$. Then there exist $\delta, c, c' \in \mathbb{R}_{>0}$ such that for all $n \in \mathbb{Z}_{>0}$, all $\ell \in L_n$ with $P_\ell \in B_\delta(P)$, and all $\rho \in B_\delta(\bar\rho_P)$ in the support of $\rho_\ell$ we have
\[ P\big[\Delta_A(\alpha_{\ell,\rho}, \bar\alpha_P) \ge \varepsilon\big] \le c'\exp(-cn). \]
The proof of Proposition 7.1 builds intuition for the construction in Section 5, in particular for the distributions $\mu_{p,\ell}$.
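Before turning to the proof strategy, a minimal numerical sketch of the statement may be helpful; the label family, degrees and product measures below are toy assumptions, not the general setup of Proposition 7.1. It draws $n$ i.i.d. pairs $(\ell_i, \chi_i)$ and evaluates $\Delta_A$ between the empirical assignment frequencies and $\bar\alpha_P$.

# Toy sanity check for Proposition 7.1 (all concrete choices are assumptions):
# the empirical assignment distribution of n i.i.d. draws concentrates around
# the expected assignment distribution in the weighted L1-metric Delta_A.
import itertools, random
from collections import Counter

q = 2                                    # colours [q]
degree = {0: 2, 1: 3}                    # toy label family: ell -> d_ell
P = {0: 0.5, 1: 0.5}                     # reference distribution on labels

def mu(ell):
    # toy mu_ell: uniform on [q]^{d_ell} (assumption)
    chis = list(itertools.product(range(q), repeat=degree[ell]))
    return {chi: 1.0 / len(chis) for chi in chis}

def delta_A(alpha, beta):
    # Delta_A(alpha, beta) = sum over (ell, chi) of (d_ell + 1)|alpha - beta|
    keys = set(alpha) | set(beta)
    return sum((degree[ell] + 1) * abs(alpha.get((ell, chi), 0.0)
                                       - beta.get((ell, chi), 0.0))
               for (ell, chi) in keys)

alpha_bar = {(ell, chi): P[ell] * w for ell in degree
             for chi, w in mu(ell).items()}
random.seed(0)
for n in (100, 1000, 10000):
    counts = Counter()
    for _ in range(n):
        ell = random.choices(list(P), weights=list(P.values()))[0]
        m = mu(ell)
        chi = random.choices(list(m), weights=list(m.values()))[0]
        counts[(ell, chi)] += 1
    print(n, delta_A({k: v / n for k, v in counts.items()}, alpha_bar))

The observed distance decays roughly like $n^{-1/2}$, consistent with the exponential tail bound of the proposition.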
7.1. Proof strategy. Consider the specified $(L_n)_n$, $P$ and $\varepsilon$ fixed in the remainder. Further, for given $\varepsilon_p \in \mathbb{R}_{>0}$ let $P^* = \{p \in P([q]): \min_{\omega\in[q]}p(\omega) \ge \varepsilon_p\}$ and for $\ell \in L_n$ let $P^*_\ell$ be the set of distributions $p \in P^*$ with $\bar\rho_{p,\ell}$ in the support of $\rho_\ell$. Our first result is a corollary to Proposition 6.5.

Fact 7.2.
For fixed $\varepsilon_p \in \mathbb{R}_{>0}$ and uniformly over all $\ell \in L_n$ and $p \in P^*_\ell$ we have $P[\rho_{p,\ell} = \bar\rho_{p,\ell}] = \Theta\big(n^{-(q-1)/2}\big)$.

The proof is postponed to Section 7.2. The next result deals with the unconditional case for the adjusted measures.
Lemma 7.3.
There exist constants $\delta, c, c' \in \mathbb{R}_{>0}$ such that for all $n \in \mathbb{Z}_{>0}$, all $\ell \in L_n$ with $P_\ell \in B_\delta(P)$ and all $p \in P^*$ we have $P\big[\Delta_A(\alpha_{p,\ell}, \bar\alpha_{p,\ell}) \ge \varepsilon\big] \le c'\exp(-cn)$.

The proof is postponed to Section 7.3. Combining Fact 7.2 and Lemma 7.3 allows us to derive bounds for the conditional probability, still for the adjusted measures. For this purpose, for $\ell \in L_n$ and $p \in P^*_\ell$ we let $\chi^*_{p,\ell} = \chi_{p,\ell,\bar\rho_{p,\ell}}$ and $\alpha^*_{p,\ell} = \alpha_{\chi^*_{p,\ell}}$, and further use $\bar\alpha_{p,\ell} = \bar\alpha_{P_\ell,p}$ for consistency.

Lemma 7.4.
For all $\varepsilon_p \in \mathbb{R}_{>0}$ there exist constants $\delta, c, c' \in \mathbb{R}_{>0}$ such that for all $n \in \mathbb{Z}_{>0}$, $\ell \in L_n$ with $P_\ell \in B_\delta(P)$ and all $p \in P^*_\ell$ we have $P\big[\Delta_A(\alpha^*_{p,\ell}, \bar\alpha_{p,\ell}) \ge \varepsilon\big] \le c'\exp(-cn)$.

The proof is postponed to Section 7.4. Finally, the following fact justifies the discussion of the adjusted measures.
Fact 7.5.
For all $n \in \mathbb{Z}_{>0}$, $\ell \in L_n$ and $p \in P^*_\ell$ the assignments $\alpha_{\ell,\bar\rho_{p,\ell}}$ and $\alpha^*_{p,\ell}$ have the same law.

The proof is postponed to Section 7.5. Lemma 7.4 combined with Fact 7.5 yields concentration results for the assignment distributions given their colour frequencies $\bar\rho_{p,\ell}$. Hence, the only part left to show is that the local concentration points $\bar\alpha_{p,\ell}$ are close to the reference $\bar\alpha_P$ if $P_\ell$ is close to $P$ and $p$ is close to $u_{[q]}$. The details are presented in Section 7.6.

7.2. Proof of Fact 7.2.
By construction we satisfy the assumptions of Proposition 6.5. But as thoroughly discussed, e.g. in Section 6, we also have $E[d_\ell], \det(\Sigma_{\ell,p}) = \Theta(1)$ uniformly in $\ell \in L_n$, which completes the proof. Notice that $\pi \in P^* \cap P_\ell$ suffices to show this result, since Proposition 6.5 then implies that $\bar\rho_{p,\ell}$ is in the support of $\rho_\ell$ for sufficiently large $n$.

7.3. Proof of Lemma 7.3.
We consider $\varepsilon_p$ fixed throughout this section. Further, fix $\varepsilon_f, \delta \in (0,1)$, $n \in \mathbb{Z}_{>0}$, $\ell \in L_n$ with $P_\ell \in B_\delta(P)$ and $p \in P^*$. Further let $L^- = \{\ell \in L: P[\ell_P = \ell] < \varepsilon_f\}$ and $L^+ = L \setminus L^-$ denote the partition into measures of low and high frequency respectively. Notice that $|L^+| \in \mathbb{Z}_{>0}$ for $\varepsilon_f$ sufficiently small and let $d_{\max} = \max\{d_\ell: \ell \in L^+\}$. With $\Delta = \Delta_A$ we consider the corresponding split
\[ \Delta(\alpha_{p,\ell}, \bar\alpha_{p,\ell}) = \Delta^-(\alpha_{p,\ell}, \bar\alpha_{p,\ell}) + \Delta^+(\alpha_{p,\ell}, \bar\alpha_{p,\ell}), \qquad \Delta^\pm(\alpha_{p,\ell}, \bar\alpha_{p,\ell}) = \sum_{\ell\in L^\pm,\,\chi\in X_\ell}(d_\ell + 1)\,|\alpha_{p,\ell}(\ell,\chi) - \bar\alpha_{p,\ell}(\ell,\chi)|. \]
Recall that $\alpha_{p,\ell}|_L = \bar\alpha_{p,\ell}|_L = P_\ell$, so with $\alpha_{p,\ell,\ell} \in P(X_\ell)$ given by $\alpha_{p,\ell,\ell}(\chi) = \alpha_{p,\ell}(\ell,\chi)/P_\ell(\ell)$ for $\chi \in X_\ell$ and $\ell$ in the support of $P_\ell$, denoting the law conditional on $\ell_\ell = \ell$, we have
\[ \Delta^\pm(\alpha_{p,\ell}, \bar\alpha_{p,\ell}) = \sum_{\ell\in L^\pm}P_\ell(\ell)(d_\ell + 1)\,\|\alpha_{p,\ell,\ell} - \mu_{p,\ell}\|_1. \]
Since we can uniformly bound the norm and $P_\ell \in B_\delta(P)$ we obtain
\[ \Delta^-(\alpha_{p,\ell}, \bar\alpha_{p,\ell}) \le 2\sum_{\ell\in L^-}P_\ell(\ell)(d_\ell + 1) < 2\sum_{\ell\in L^-}P(\ell)(d_\ell + 1) + 2\delta. \]
Since $P \in P_L$ has a finite first moment, the latter expectation tends to $0$ for $\varepsilon_f \to 0$, so for $\varepsilon_f$ sufficiently small and $\delta = \varepsilon_f/2$ we have $\Delta^-(\alpha_{p,\ell}, \bar\alpha_{p,\ell}) \le \varepsilon/2$ almost surely and thereby
\[ p_0 = P\big[\Delta(\alpha_{p,\ell}, \bar\alpha_{p,\ell}) \ge \varepsilon\big] \le P\big[\Delta^+(\alpha_{p,\ell}, \bar\alpha_{p,\ell}) \ge \varepsilon/2\big] = P\big[\Delta^+(\alpha_{p,\ell}, \bar\alpha_{p,\ell}) \ge p^+c_1\varepsilon\big] \]
with $c_1 = (2p^+)^{-1}$ and $p^+ = P[\ell_\ell \in L^+]$. Writing both sides of $\Delta^+(\alpha_{p,\ell}, \bar\alpha_{p,\ell}) \ge p^+c_1\varepsilon$ as expectations with respect to $\ell_\ell$ yields
\[ p_0 \le \sum_{\ell\in L^+}P\big[(d_\ell + 1)\|\alpha_{p,\ell,\ell} - \mu_{p,\ell}\|_1 \ge c_1\varepsilon\big] \le \sum_{\ell\in L^+}P\big[\|\alpha_{p,\ell,\ell} - \mu_{p,\ell}\|_1 \ge c_1'\varepsilon\big] \]
with $c_1' = c_1/(d_{\max} + 1)$, and where we notice that $P_\ell(\ell) > P(\ell) - \delta \ge \varepsilon_f/2$ for all $\ell \in L^+$ in the support of $\ell_\ell$. Recall that for all $\chi \in X_\ell$ the frequency $\alpha_{p,\ell,\ell}(\chi)$ is an average of $P_\ell(\ell)n$ i.i.d. random variables with expectation $\mu_{p,\ell}(\chi)$, so Hoeffding's inequality for $\varepsilon' \in \mathbb{R}_{\ge 0}$ yields
\[ P\big[|\alpha_{p,\ell,\ell}(\chi) - \mu_{p,\ell}(\chi)| \ge \varepsilon'\big] \le 2\exp\big(-2\varepsilon'^2P_\ell(\ell)n\big) \le 2\exp\big(-\varepsilon_f\varepsilon'^2n\big). \]
Standard arguments yield a bound for the $\|\cdot\|_\infty$ norm and further
\[ P\big[\|\alpha_{p,\ell,\ell} - \mu_{p,\ell}\|_1 \ge \varepsilon'\big] \le 2q^{d_{\max}}\exp\Big(-\frac{\varepsilon_f}{q^{2d_{\max}}}\varepsilon'^2n\Big). \]
This uniform bound directly implies
\[ p_0 \le 2|L^+|q^{d_{\max}}\exp\Big(-\frac{\varepsilon_fc_1'^2}{q^{2d_{\max}}}\varepsilon^2n\Big) \]
and thereby completes the proof. Finally, notice that $p \in P^*$ was not required.
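The Hoeffding step above is easy to visualize; the following sketch (with an assumed uniform $\mu_{p,\ell}$ on $[q]^d$ and toy parameters) compares the observed $\|\cdot\|_1$ deviation of the empirical distribution of $N$ i.i.d. draws with the pointwise Hoeffding tail.

# Sketch of the Hoeffding bound used in the proof of Lemma 7.3 (toy instance).
import itertools, math, random
from collections import Counter

random.seed(1)
q, d = 2, 3
support = list(itertools.product(range(q), repeat=d))
mu = {chi: 1.0 / len(support) for chi in support}  # assumed uniform mu_{p,ell}

for N in (100, 1000, 10000):
    counts = Counter(random.choices(support, k=N))
    dev = sum(abs(counts[chi] / N - mu[chi]) for chi in support)
    # pointwise Hoeffding tail at eps' = dev / q^d (union-bound heuristic)
    eps1 = dev / len(support)
    print(N, round(dev, 4), round(2 * math.exp(-2 * eps1 ** 2 * N), 4))

The $\ell_1$ deviation shrinks like $N^{-1/2}$ while the tail bound decays exponentially in $N\varepsilon'^2$, matching the displays above.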
7.4. Proof of Lemma 7.4. For fixed $\varepsilon_p \in \mathbb{R}_{>0}$ and with Fact 7.2 we obtain $c_0 \in \mathbb{R}_{>0}$ such that
\[ P\big[\Delta_A(\alpha^*_{p,\ell}, \bar\alpha_{p,\ell}) \ge \varepsilon\big] \le c_0\sqrt{n^{q-1}}\,P\big[\Delta_A(\alpha_{p,\ell}, \bar\alpha_{p,\ell}) \ge \varepsilon\big] \]
for all sufficiently large $n$, $\ell \in L_n$ and $p \in P^*_\ell$. Now, we summon Lemma 7.3 to obtain $\delta, c_1, c_2$ such that for all $\ell \in L_n$ with $P_\ell \in B_\delta(P)$ and $p \in P^*_\ell$ we have
\[ P\big[\Delta_A(\alpha^*_{p,\ell}, \bar\alpha_{p,\ell}) \ge \varepsilon\big] \le c_0c_2\sqrt{n^{q-1}}\exp(-c_1n) = \exp\Big(-\Big(c_1 - n^{-1}\log\big(c_0c_2\sqrt{n^{q-1}}\big)\Big)n\Big). \]
Hence, we fix a constant $c \in (0, c_1)$ and $n^*$ sufficiently large such that for all $n \in \mathbb{Z}_{\ge n^*}$ the leading coefficient in the exponent exceeds $c$, so for all $\ell \in L_n$ with $P_\ell \in B_\delta(P)$ and $p \in P^*_\ell$ we have $P[\Delta_A(\alpha^*_{p,\ell}, \bar\alpha_{p,\ell}) \ge \varepsilon] \le \exp(-cn)$. Finally, we set $c' = \exp(cn^*)$, which ensures that $c'\exp(-cn) \ge 1$ for $n < n^*$ and hence the assertion holds.

7.5. Proof of Fact 7.5.
For assignments $\chi$ with $\rho_\chi = \bar\rho_{p,\ell}$, using Lemma 6.4 and $\rho = \bar\rho_{p,\ell}$ we have
\[ P[\chi_{\ell,\rho} = \chi] = \frac{P[\chi_\ell = \chi]}{P[\rho_\ell = \rho]} = \frac{\exp\big(-nE[D_{\mathrm{KL}}(\mu_{p,\ell_\ell}\,\|\,\mu_{\ell_\ell})]\big)\,P[\chi_{p,\ell} = \chi]}{\exp\big(-nE[D_{\mathrm{KL}}(\mu_{p,\ell_\ell}\,\|\,\mu_{\ell_\ell})]\big)\,P[\rho_{p,\ell} = \rho]} = P[\chi^*_{p,\ell} = \chi], \]
which directly translates to the distributions and thereby completes the proof.

7.6. Proof of Proposition 7.1.
Fix suitable $(L_n)_n$, $P$ and $\varepsilon$. Further, fix some small $\varepsilon_p \in (0,1)$ and let $\eta = \iota^{-1}$. Since $\eta$ is continuous due to Proposition 5.3, the preimage $\eta^{-1}(B_{\varepsilon_p}(u_{[q]}))$ is open, and $(P, \bar\rho_P) \in \eta^{-1}(B_{\varepsilon_p}(u_{[q]}))$ since $\eta(P, \bar\rho_P) = u_{[q]}$. From this we obtain $\delta_1 \in \mathbb{R}_{>0}$ such that $B_{\delta_1}(P) \times B_{\delta_1}(\bar\rho_P) \subseteq \eta^{-1}(B_{\varepsilon_p}(u_{[q]}))$. With Lemma 7.4 we obtain $\delta_2, c, c'$ such that for all $n \in \mathbb{Z}_{>0}$, all $\ell \in L_n$ with $P_\ell \in B_{\delta_2}(P)$ and all $p \in B_{\varepsilon_p}(u_{[q]})$ with $\bar\rho_{p,\ell}$ in the support of $\rho_\ell$ we have $P[\Delta_A(\alpha^*_{p,\ell}, \bar\alpha_{p,\ell}) \ge \varepsilon] \le c'\exp(-cn)$. Using Fact 7.5 and $\rho = \bar\rho_{p,\ell}$ immediately yields $P[\Delta_A(\alpha_{\ell,\rho}, \bar\alpha_{p,\ell}) \ge \varepsilon] \le c'\exp(-cn)$. Now, let $\delta = \min(\delta_1, \delta_2)$, $n \in \mathbb{Z}_{>0}$, $\ell \in L_n$ with $P_\ell \in B_\delta(P)$ and $\rho \in B_\delta(\bar\rho_P)$ in the support of $\rho_\ell$. By the above we have $p = \eta(P_\ell, \rho) \in B_{\varepsilon_p}(u_{[q]})$, and further with $P_\ell \in B_\delta(P)$ we obtain $P[\Delta_A(\alpha_{\ell,\rho}, \bar\alpha_{p,\ell}) \ge \varepsilon] \le c'\exp(-cn)$.

Now we are left to show that the conditional assignment distribution expectations $\bar\alpha_{p,\ell}$ are close to the unconditional expectation $\bar\alpha_P$. For this purpose notice that by using the triangle inequality and the normalization of $\mu_{p,\ell}$ we have
\[ \Delta_A(\bar\alpha_{p,\ell}, \bar\alpha_P) = \sum_{\ell,\chi}(d_\ell + 1)\,|P_\ell(\ell)\mu_{p,\ell}(\chi) - P(\ell)\mu_\ell(\chi)| \le \Delta_L(P_\ell, P) + \sum_\ell P(\ell)(d_\ell + 1)\,\|\mu_{p,\ell} - \mu_\ell\|_1. \]
For $L^* \subseteq L$ sufficiently large (but still finite) we use the uniform bounds for the norm on the $L \setminus L^*$ contribution to the expectation and an upper bound $d_{\mathrm{cap}} \in \mathbb{R}_{>0}$ for the degrees of $L^*$. Further, since $p \mapsto (\mu_{p,\ell})_{\ell\in L^*}$ is continuous with $u_{[q]} \mapsto (\mu_\ell)_{\ell\in L^*}$, we can also control the norm on $L^*$ and thereby find $\delta_0 \in \mathbb{R}_{>0}$ such that for all $\ell \in L_n$ with $P_\ell \in B_{\delta_0}(P)$ and $p \in B_{\delta_0}(u_{[q]})$ we have
\[ \Delta_A(\bar\alpha_{p,\ell}, \bar\alpha_P) \le \delta_0 + 2E\big[(d_P + 1)1\{\ell_P \notin L^*\}\big] + (d_{\mathrm{cap}} + 1)E\big[1\{\ell_P \in L^*\}\,\|\mu_{p,\ell_P} - \mu_{\ell_P}\|_1\big] < \tfrac{\varepsilon_0}{3} + \tfrac{\varepsilon_0}{3} + \tfrac{\varepsilon_0}{3} = \varepsilon_0. \]
Finally, we combine the two arguments to obtain the result as follows. First, choose $\delta_3 \in (0,1)$ sufficiently small such that $\Delta_A(\bar\alpha_{p,\ell}, \bar\alpha_P) < \varepsilon/2$ for all $p \in B_{\delta_3}(u_{[q]})$ and $\ell \in L_n$ with $P_\ell \in B_{\delta_3}(P)$. Further, for $\varepsilon/2$ and $\varepsilon_p = \delta_3$ the first argument provides $\delta_4, c, c'$ such that for all $n \in \mathbb{Z}_{>0}$, $\ell \in L_n$ with $P_\ell \in B_{\delta_4}(P)$ and all $\rho \in B_{\delta_4}(\bar\rho_P)$ in the support of $\rho_\ell$ we have $p = \eta(P_\ell, \rho) \in B_{\delta_3}(u_{[q]})$ and $P[\Delta_A(\alpha_{\ell,\rho}, \bar\alpha_{p,\ell}) \ge \varepsilon/2] \le c'\exp(-cn)$. Now, let $\delta = \min(\delta_3, \delta_4)$. Then for all $n \in \mathbb{Z}_{>0}$, all $\ell \in L_n$ with $P_\ell \in B_\delta(P)$ and all $\rho \in B_\delta(\bar\rho_P)$ in the support of $\rho_\ell$ we have $p = \eta(P_\ell, \rho) \in B_{\delta_3}(u_{[q]})$, which gives $\Delta_A(\bar\alpha_{p,\ell}, \bar\alpha_P) < \varepsilon/2$, so using the triangle inequality $\Delta_A(\alpha_{\ell,\rho}, \bar\alpha_P) \ge \varepsilon$ implies $\Delta_A(\alpha_{\ell,\rho}, \bar\alpha_{p,\ell}) \ge \varepsilon/2$ and thereby
\[ P\big[\Delta_A(\alpha_{\ell,\rho}, \bar\alpha_P) \ge \varepsilon\big] \le P\big[\Delta_A(\alpha_{\ell,\rho}, \bar\alpha_{p,\ell}) \ge \varepsilon/2\big] \le c'\exp(-cn). \]

8. DEGREE DISTRIBUTIONS
Recall the degree distributions introduced in Section 2.1, let $\bar d = E[d]$, $\bar k = E[k]$, $\bar m_n = \bar dn/\bar k$, $t^*_n = (m_n, (d_i)_{i\in[n]}, (k_i)_{i\in[m_n]})$ and $T^*_n$ denote the support of $t^*_n$. For $t \in T^*_n$ we use $t = (m_t, d_t, k_t)$ to specify the components. Further, let $E_n$ denote the event
\[ \sum_{i=1}^nd_i = \sum_{i=1}^mk_i \]
and $N$ the set of values of $n$ with $P[E_n] > 0$. Finally, for $n \in N$ let $t_n = (t^*_n\,|\,E_n)$ denote the degree sequences for which $G$ is well-defined and $T_n$ the support of $t_n$. Let $\varepsilon_{\mathrm{deg}}$ be such that DEG holds, further $\alpha = 2 + \varepsilon_{\mathrm{deg}}$ and $P_{\mathrm{deg}} = \{p \in P(\mathbb{Z}_{\ge 0}): E[x^\alpha_p] \in \mathbb{R}_{\ge 0}\}$ with $x_p \sim p$. Notice that the map
\[ \Delta_2(p, p') = \sum_xx\,|p'(x) - p(x)| + \big|E[x^2_p] - E[x^2_{p'}]\big| + \big|E[x^\alpha_p] - E[x^\alpha_{p'}]\big| \]
with $p, p' \in P_{\mathrm{deg}}$ defines a metric on $P_{\mathrm{deg}}$. This metric induces a metric on the product space $T_{\mathrm{rel}} = \mathbb{R}_{\ge 0} \times P^2_{\mathrm{deg}}$ given by
\[ \Delta_3(\tau, \tau') = |\tau_r - \tau'_r| + \Delta_2(\tau_v, \tau'_v) + \Delta_2(\tau_f, \tau'_f) \]
for $\tau = (\tau_r, \tau_v, \tau_f)$, $\tau' = (\tau'_r, \tau'_v, \tau'_f) \in T_{\mathrm{rel}}$. With $p_d$, $p_k$ denoting the laws of $d$ and $k$ respectively we notice that $\tau^* = (\bar d/\bar k, p_d, p_k) \in T_{\mathrm{rel}}$. For $n \in \mathbb{Z}_{>0}$ and $t \in T^*_n$ we let $\tau(t) = (m_t/n, p_{d,t}, p_{k,t}) \in T_{\mathrm{rel}}$, with $p_{d,t}$ denoting the relative frequencies of the degrees on the variable side, or equivalently the law of $d_t = d_{t,i}$ with $i$ uniform on $[n]$, and $p_{k,t}$ denoting the relative frequencies of the degrees on the factor side, or equivalently the law of $k_t = k_{t,a_t}$ with $a_t$ uniform on $[m_t]$. For the case $m_t = 0$ we let $p_{k,t}$ be the one-point mass on $0$. Notice that for given $n \in N$ and $t \in T_n$ the number $m_t \in \mathbb{Z}_{\ge 0}$ of factors may still be arbitrarily large. We say that a sequence $f_n: T_n \to \mathbb{R}$, $n \in N$, is sublinear in the number of factors if there exists a constant $c \in \mathbb{R}_{>0}$ such that $|f_n(t)| \le c + cm_t/n$ for all $t \in T_n$ and $n \in N$.
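The following sketch computes the statistics $\tau(t) = (m_t/n, p_{d,t}, p_{k,t})$ for one sampled sequence and its distance to $\tau^*$ in the metric above; the laws of $d$ and $k$ and the Gaussian stand-in for the Poisson number of factors are assumptions for illustration.

# Sketch: empirical tau(t) for a sampled degree sequence vs. tau* (toy laws).
import random
from collections import Counter

random.seed(2)
p_d = {2: 0.5, 3: 0.5}              # toy law of d (assumption)
p_k = {3: 0.5, 4: 0.5}              # toy law of k (assumption)
alpha = 2.25                        # alpha = 2 + eps_deg for some eps_deg

def moment(p, r):
    return sum(x ** r * w for x, w in p.items())

def delta2(p, p2):                  # the metric on P_deg from above
    xs = set(p) | set(p2)
    return (sum(x * abs(p.get(x, 0.0) - p2.get(x, 0.0)) for x in xs)
            + abs(moment(p, 2) - moment(p2, 2))
            + abs(moment(p, alpha) - moment(p2, alpha)))

n = 10000
d_bar, k_bar = moment(p_d, 1), moment(p_k, 1)
m_bar = d_bar * n / k_bar
m = max(1, round(random.gauss(m_bar, m_bar ** 0.5)))  # Po(m_bar) stand-in
d = random.choices(sorted(p_d), weights=[p_d[x] for x in sorted(p_d)], k=n)
k = random.choices(sorted(p_k), weights=[p_k[x] for x in sorted(p_k)], k=m)
p_dt = {x: c / n for x, c in Counter(d).items()}
p_kt = {x: c / m for x, c in Counter(k).items()}
print(abs(m / n - d_bar / k_bar) + delta2(p_dt, p_d) + delta2(p_kt, p_k))

The printed distance is $o(1)$ as $n$ grows, which is the content of Lemma 8.4 below.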
Proposition 8.1. Assume that DEG holds. Then there exists $r_n \in \mathbb{R}_{>0}$ with $r_n = o(1)$ such that for all sequences $f_n: T_n \to \mathbb{R}$, $n \in N$, that are sublinear in the number of factors we have
\[ E[f_n(t_n)] = E\big[f_n(t_n)1\{\tau(t_n) \in B_{r_n}(\tau^*)\}\big] + o(1) = E\big[f_n(t_n)\,\big|\,\tau(t_n) \in B_{r_n}(\tau^*)\big] + o(1). \]
As a byproduct of the proof we will see that $|N| = \infty$, so taking limits is reasonable. Using Proposition 8.1 we consider $r_n$ fixed and use $T^\circ_n$ to denote the typical valid degree sequences, i.e. valid degree sequences $t \in T_n$ with $\tau(t) \in B_{r_n}(\tau^*)$. In particular, we are free to choose $r_n$ such that uniform bounds on the various quantities are enforced, e.g. $\bar d/(2\bar k) \le m_t/n \le 2\bar d/\bar k$ by choosing $r_n \le \bar d/(2\bar k)$ for all $n \in N$, uniform lower bounds for the point probabilities in finite subsets of the supports $D$, $K$ of $d$, $k$, bounds on the moments and so on. Details on further implications can be found in Section 8.6.

8.1. Proof strategy.
The main ingredient in the proof of Proposition 8.1 is the following result.
Proposition 8.2.
Assume that
DEG holds. Then there exists $r_n = o(1)$ such that $\tau(t_n) \in B_{r_n}(\tau^*)$ with high probability.

We split the proof of Proposition 8.2 into three parts. In the first part we determine the order of the probability that $t^*_n \in T_n$ and show that $|N| = \infty$.

Lemma 8.3.
Assume that
DEG holds. Then we have $P[t^*_n \in T_n] = \Theta(n^{-1/2})$.

Notice that the proof only requires the existence of second moments. Next, we show that $\tau(t^*_n)$ is typically close to $\tau^*$.

Lemma 8.4.
Assume that
DEG holds. Then there exists $r_n = o(1)$ such that $\tau(t^*_n) \in B_{r_n}(\tau^*)$ with high probability.

We then use a fairly general argument to show how Proposition 8.2 is immediately implied by Lemma 8.3 and Lemma 8.4. Finally, we derive Proposition 8.1 from Proposition 8.2.
8.2. Proof of Lemma 8.3.
The relevant quantities for the proof are the total variable degree $d_{\mathrm{tot},n}$, $n \in \mathbb{Z}_{\ge 0}$, the total factor degree $k_{\mathrm{tot},m}$, $m \in \mathbb{Z}_{\ge 0}$, and the number of factors $m_n \sim \mathrm{Po}(\bar m_n)$, i.e.
\[ d_{\mathrm{tot},n} = \sum_{i\in[n]}d_i, \qquad k_{\mathrm{tot},m} = \sum_{a\in[m]}k_a, \qquad m'_n = \sum_{i\in[n]}m'_i \]
with $m'_i \sim \mathrm{Po}(\bar d/\bar k)$, $i \in \mathbb{Z}_{>0}$, independent of anything else, hence $m_n \sim m'_n$ by the properties of the Poisson distribution. Hence, all relevant quantities are sums of i.i.d. non-negative integer random variables with slightly more than the second moment, which allows us to treat them simultaneously using Theorem 3.5.2 in [24] and the discussion prior to the theorem.

In particular, we need to distinguish four cases depending on whether or not $d$ and $k$ are degenerate. To be thorough, notice that $D \setminus \{0\} \ne \emptyset$ and $K \setminus \{0\} \ne \emptyset$ since $\bar d \in \mathbb{R}_{>0}$ and $\bar k \in \mathbb{R}_{>0}$. Hence, we have $P[d = \bar d] = 1$ with $\bar d \in \mathbb{Z}_{>0}$ if $d$ is degenerate, and otherwise $P[d \in d^* + h_d\mathbb{Z}] = 1$ with $d^* \in D$ and $h_d \in \mathbb{Z}_{>0}$ denoting the span of $d$ as introduced in Section 3.5 of [24]. In the latter case we say that $d$ is lattice. Obviously, the same holds for $k$, while $\mathrm{Po}(\bar d/\bar k)$ is always lattice with span $1$.

In order to treat the random variables above simultaneously we let $x \in \mathbb{Z}_{\ge 0}$ with $\bar x = E[x] \in \mathbb{R}_{>0}$ and $\sigma^2 = \mathrm{Var}(x) \in \mathbb{R}_{\ge 0}$. Further, for $n \in \mathbb{Z}_{>0}$ we let $s_n = \sum_{i\in[n]}x_i$ with $x_i \sim x$, $i \in \mathbb{Z}_{>0}$, being i.i.d. random variables. If $x$ is degenerate then we have $P[x = \bar x] = 1$ with $\bar x \in \mathbb{Z}_{>0}$ and further $P[s_n = \bar xn] = 1$ for $n \in \mathbb{Z}_{>0}$. Otherwise, we have $P[x \in x^* + h\mathbb{Z}] = 1$ with $x^* \in \mathbb{Z}_{\ge 0}$ such that $P[x = x^*] \in (0,1)$ and $h$ being the span of $x$. In this case, as discussed in [24], we have $P[s_n \in L_n] = 1$ with $L_n = x^*n + h\mathbb{Z}$, and the following local limit theorem.

Theorem 8.5.
For $x$ lattice with finite variance and the notions introduced above we have
\[ \lim_{n\to\infty}\sup_{s\in L_n}\Big|\frac{\sqrt{n}}{h}P[s_n = s] - \phi_n(s)\Big| = 0, \qquad \phi_n(s) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\Big(-\frac{v_{n,s}^2}{2\sigma^2}\Big), \qquad v_{n,s} = \frac{s - \bar xn}{\sqrt{n}}, \quad s \in \mathbb{R}_{\ge 0}. \]
As we will see in the following, Theorem 8.5 has immediate consequences for the distribution of $s_n$ that facilitate the proof of Lemma 8.3. Now, we are ready for the discussion of the four cases.

First, assume that we are in the biregular case, i.e. both $d$ and $k$ are degenerate. Then $d_{\mathrm{tot},n} = \bar dn$ and $k_{\mathrm{tot},m} = \bar km$ are degenerate as well, which implies that $t \in T_n$ iff $m_t = \bar m_n$ and hence $|T_n| = 1$. For $n \in \bar k\mathbb{Z}_{>0}$ we have $\bar m_n \in \mathbb{Z}_{>0}$ and further $P[t^*_n \in T_n] = P[m_n = \bar m_n] > 0$, so $|N| = \infty$. Further, for any $n \in N$ we must have $\bar m_n \in \mathbb{Z}_{>0}$, and we further saw that $P[t^*_n \in T_n] = P[m_n = \bar m_n] = P[m'_n = \bar m_n]$. Hence, we can use the local limit theorem 8.5 for $m'_n$ at $\bar m_n$, i.e. $h = 1$, $\sigma^2 = \bar d/\bar k$, $v_{n,\bar m_n} = 0$ and $\phi_n(\bar m_n) = c$ with $c = (2\pi\sigma^2)^{-1/2} \in \mathbb{R}_{>0}$, which gives $P[t^*_n \in T_n] = (c + o(1))/\sqrt{n} = \Theta(n^{-1/2})$.

Next, we consider the case that $d$ is degenerate and $k$ is lattice. Hence, we have $\bar d \in \mathbb{Z}_{>0}$, $D = \{\bar d\}$ and $P[d_{\mathrm{tot},n} = \bar dn] = 1$ for $n \in \mathbb{Z}_{>0}$ on the variable side. Further, we have $P[k_{\mathrm{tot},m} \in L_m] = 1$ with $L_m = k^*m + h_k\mathbb{Z}$ for $m \in \mathbb{Z}_{>0}$ on the factor side, where $k^* \in K \setminus \{0\}$ and $h_k \in \mathbb{Z}_{>0}$ denotes the span of $k$. Now, for $n \in k^*\mathbb{Z}_{>0}$ we have $m = \bar dn/k^* \in \mathbb{Z}_{>0}$ and hence $t \in T_n$, where $t$ is given by $m_t = m$, $d_{t,i} = \bar d$ for $i \in [n]$ and $k_{t,a} = k^*$ for $a \in [m]$, so $n \in N$ and hence $|N| = \infty$. Further, for any $n \in N$ there exists $t^* \in T_n$, so $\bar dn \in L_{m_{t^*}}$. But by definition we have $L_m = L_{m_{t^*}}$ for any $m \in m_{t^*} + h_k\mathbb{Z}$, so $\bar dn \in L_m$. Now, fix some large radius $r \in \mathbb{R}_{>0}$ and let $M_n$ be given by all $m \in m_{t^*} + h_k\mathbb{Z}$ with $|m - \bar m_n| < r\sqrt{n}$. Due to the lattice structure this gives $|M_n| = \Theta(\sqrt{n})$. Further, notice that for $m \in M_n$ we have $\phi_n(m) = \Theta(1)$ uniformly since $|v_{n,m}| < r$ in the local limit theorem for $m'_n$, so $P[m'_n = m] = \Theta(n^{-1/2})$ uniformly and thereby $P[m'_n \in M_n] = \Theta(1)$. For any $m \in M_n$ we have $|\bar dn - \bar km| = \bar k|\bar m_n - m| < \bar kr\sqrt{n}$, so the required total degree $\bar dn$ is sufficiently close to the expected total degree $\bar km$ on the factor side. Now, since we have $m = \Theta(n)$ uniformly for all $m \in M_n$ and $\bar dn \in L_m$, the local limit theorem for $k_{\mathrm{tot},m}$ gives $P[k_{\mathrm{tot},m} = \bar dn] = \Theta(n^{-1/2})$ uniformly for all $m \in M_n$. This shows that $P[t^*_n \in T_n] = \Omega(n^{-1/2})$. To see that $P[t^*_n \in T_n] = O(n^{-1/2})$ we only have to notice that $P[k_{\mathrm{tot},m} = \bar dn] = O(n^{-1/2})$ uniformly for all $m \in \mathbb{Z}_{\ge\varepsilon n}$ for any fixed $\varepsilon \in \mathbb{R}_{>0}$, and that $P[m_n < \varepsilon n] = o(n^{-1/2})$ for $\varepsilon$ sufficiently small using the well-known Poisson tails.

Now, assume that $d$ is lattice and $k$ is degenerate. Let $d^* \in D \setminus \{0\}$ and $h_d \in \mathbb{Z}_{>0}$ denote the span of $d$. Notice that for any $n \in \bar k\mathbb{Z}_{>0}$ we have $m = d^*n/\bar k \in \mathbb{Z}_{>0}$, so the corresponding sequences $t$ are in $T_n$ and hence $|N| = \infty$. Further, for any $n \in N$ we fix $t^* \in T_n$ and notice that $\bar km_{t^*} \in L_n$ with $L_n = d^*n + h_d\mathbb{Z}$, so $\bar km \in L_n$ for any $m \in m_{t^*} + h_d\mathbb{Z}$. We repeat the previous construction with $r \in \mathbb{R}_{>0}$ to obtain $P[m'_n \in M_n] = \Theta(1)$, and again for any $m \in M_n$ we have $|\bar dn - \bar km| < \bar kr\sqrt{n}$. But this time, the consequence is that $P[d_{\mathrm{tot},n} = \bar km] = \Theta(n^{-1/2})$ uniformly for all $m \in M_n$, so $P[t^*_n \in T_n] = \Omega(n^{-1/2})$. For the upper bound notice that we have the uniform bound $P[d_{\mathrm{tot},n} = \bar km] = O(n^{-1/2})$ for any choice of $m$ and hence $P[t^*_n \in T_n] = O(n^{-1/2})$.

We turn to the final case that both $d$ and $k$ are lattice. Let $d^* \in D \setminus \{0\}$ and $k^* \in K \setminus \{0\}$, further $h_d$ and $h_k$ denote the spans as before. With $n \in k^*\mathbb{Z}_{>0}$ and $m = d^*n/k^* \in \mathbb{Z}_{>0}$ we get $|N| = \infty$. Further, for any $n \in N$ there exists $t^* \in T_n$, so $L_{d,n} \cap L_{k,m_{t^*}} \ne \emptyset$, where $L_{d,n} = d^*n + h_d\mathbb{Z}$ and $L_{k,m} = k^*m + h_k\mathbb{Z}$. Fix $s^* \in L_{d,n} \cap L_{k,m_{t^*}}$; then we have $s^* \in L_{d,n} \cap L_{k,m}$ for any $m \in m_{t^*} + h_k\mathbb{Z}$, since then $L_{k,m} = L_{k,m_{t^*}}$. Hence, we are free to repeat the previous construction for given $r$ to obtain $M_n$ with $P[m_n \in M_n] = \Theta(1)$. In the next step we need to improve on $s^*$, so for fixed $m \in M_n$ we notice that we have $s \in L_{d,n} \cap L_{k,m}$ for any $s \in s^* + h_e\mathbb{Z}$ with $h_e = \mathrm{lcm}(h_d, h_k)$. So, for fixed and large $r' \in \mathbb{R}_{>0}$ let $E_n$ be given by all $s \in s^* + h_e\mathbb{Z}$ with $|s - \bar dn| < r'\sqrt{n}$. But then for any $m \in M_n$ and $s \in E_n$ we have $|s - \bar km| < r'\sqrt{n} + r\bar k\sqrt{n}$, so $s$ is sufficiently close to the expected total degree on the factor side. Now, we can summon the local limit theorem for $d_{\mathrm{tot},n}$ to get $P[d_{\mathrm{tot},n} \in E_n] = \Theta(1)$ and the local limit theorem for $k_{\mathrm{tot},m}$ to get $P[k_{\mathrm{tot},m} = s] = \Theta(n^{-1/2})$ uniformly for all $s \in E_n$ and $m \in M_n$. This gives $P[t^*_n \in T_n] = \Omega(n^{-1/2})$, while $P[d_{\mathrm{tot},n} = s] = O(n^{-1/2})$ uniformly gives $P[t^*_n \in T_n] = O(n^{-1/2})$.
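The local limit theorem drives all four cases, and it is easy to check numerically; the lattice law below (support $\{2,5\}$, span $h = 3$) and the sample sizes are assumptions for illustration.

# Numerical sketch of Theorem 8.5: (sqrt(n)/h) P[s_n = s] approaches the
# N(0, sigma^2) density at v = (s - n E[x]) / sqrt(n) for a lattice law x.
import math, random
from collections import Counter

random.seed(3)
vals, probs = (2, 5), (0.5, 0.5)     # lattice law with x* = 2 and span h = 3
h, n, trials = 3, 400, 20000
xbar = sum(v * p for v, p in zip(vals, probs))
sigma2 = sum(v * v * p for v, p in zip(vals, probs)) - xbar ** 2

counts = Counter(sum(random.choices(vals, probs, k=n)) for _ in range(trials))
s = min(counts, key=lambda t: abs(t - xbar * n))  # lattice point near the mean
v = (s - xbar * n) / math.sqrt(n)
lhs = math.sqrt(n) / h * counts[s] / trials
rhs = math.exp(-v * v / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)
print(round(lhs, 3), "vs", round(rhs, 3))         # the two agree for large n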
8.3. Proof of Lemma 8.4. We split the metric into the seven individual contributions and consider them separately. For this purpose let $n \in \mathbb{Z}_{>0}$ and
\[ T_{r,n} = \{t \in T^*_n: |m_t/n - \bar d/\bar k| < r_{r,n}\}, \]
\[ T_{v1,n} = \Big\{t \in T^*_n: \sum_dd\,|p_{d,t}(d) - p_d(d)| < r_{v1,n}\Big\}, \qquad T_{f1,n} = \Big\{t \in T^*_n: \sum_kk\,|p_{k,t}(k) - p_k(k)| < r_{f1,n}\Big\}, \]
\[ T_{v2,n} = \big\{t \in T^*_n: |E[d_t^2] - E[d^2]| < r_{v2,n}\big\}, \qquad T_{f2,n} = \big\{t \in T^*_n: |E[k_t^2] - E[k^2]| < r_{f2,n}\big\}, \]
\[ T_{v3,n} = \big\{t \in T^*_n: |E[d_t^\alpha] - E[d^\alpha]| < r_{v3,n}\big\}, \qquad T_{f3,n} = \big\{t \in T^*_n: |E[k_t^\alpha] - E[k^\alpha]| < r_{f3,n}\big\}, \]
for some sequences of radii and with $\alpha = 2 + \varepsilon_{\mathrm{deg}}$. Since $m_{t^*_n}$ is $\mathrm{Po}(\bar m_n)$ we can use the standard Poisson bounds, e.g. Theorem 2.1 with Remark 2.6 in [28], to see that $t^*_n \in T_{r,n}$ with high probability for any $r_{r,n} = \omega(n^{-1/2})$ with $r_{r,n} = o(1)$. Further, we notice that $s_t = nE[d_t^2] = \sum_{i\in[n]}d_{t,i}^2$, so $s_{t^*_n} = \sum_{i\in[n]}d_i^2$ and thereby $s_{t^*_n}$ is the sum over the i.i.d. random variables $d_i^2$, $i \in [n]$. Hence, we use the weak law of large numbers, e.g. Chapter 10.2 in [25], applied to $n^{-1}s_{t^*_n}$, considered as the average over the i.i.d. $d_i^2$, $i \in \mathbb{Z}_{>0}$, with finite first moment $E[d^2] \in \mathbb{R}_{>0}$, to obtain $r_{v2,n} = o(1)$ such that $t^*_n \in T_{v2,n}$ with high probability. The discussion of $T_{v3,n}$ is completely analogous. Next, for $r_{e,n} \in \mathbb{R}_{>0}$ consider the event
\[ E_n = \Big\{t \in T^*_n: \sum_{d>0}|p_{d,t}(d) - p_d(d)| < r_{e,n}\Big\}. \]
Let $\alpha'(d) = d^{-(1+\varepsilon_{\mathrm{deg}})/2}$ for $d \in D \setminus \{0\}$, further $a_0 = \sum_d\alpha'(d) \in \mathbb{R}_{>0}$ and $\alpha_0 = a_0^{-1}\alpha' \in P(D \setminus \{0\})$. Then we have $E[1\{d > 0\}\alpha_0(d)^{-2}] = a_0^2E[d^{1+\varepsilon_{\mathrm{deg}}}]$ and further
\[ P[t^*_n \notin E_n] = P\Big[\sum_{d>0}|p_{d,t^*_n}(d) - p_d(d)| \ge \sum_{d>0}\alpha_0(d)r_{e,n}\Big] \le \sum_{d>0}P\big[|p_{d,t^*_n}(d) - p_d(d)| \ge \alpha_0(d)r_{e,n}\big] \le \sum_{d>0}\frac{\mathrm{Var}(np_{d,t^*_n}(d))}{(\alpha_0(d)nr_{e,n})^2} = \sum_{d>0}\frac{p_d(d)(1 - p_d(d))}{\alpha_0(d)^2nr_{e,n}^2} \le \frac{a_0^2E[d^{1+\varepsilon_{\mathrm{deg}}}]}{nr_{e,n}^2}, \]
where we used that $np_{d,t^*_n}(d)$ is binomial with size $n$ and success probability $p_d(d)$. Hence, we can choose any $r_{e,n} = \omega(n^{-1/2})$ with $r_{e,n} = o(1)$ to obtain $t^*_n \in E_n$ with high probability. With $\alpha = 2 + \varepsilon_{\mathrm{deg}}$, $r_{e,n} = o(n^{-2/\alpha})$, $d_{\max,n} = c_nn^{1/\alpha} = \omega(1)$, $c_n = (E[d^\alpha] + r_{v3,n})^{1/\alpha} = \Theta(1)$, and $t \in E_n \cap T_{v3,n}$, Markov's inequality implies that
\[ P[d_t \ge d_{\max,n}] \le \frac{E[d_t^\alpha]}{d_{\max,n}^\alpha} < \frac{E[d^\alpha] + r_{v3,n}}{c_n^\alpha n} = \frac{1}{n} \]
and hence $P[d_t \ge d_{\max,n}] = 0$, which further yields
\[ \sum_dd\,|p_{d,t}(d) - p_d(d)| = \sum_{d<d_{\max,n}}d\,|p_{d,t}(d) - p_d(d)| + E[1\{d \ge d_{\max,n}\}d] < d_{\max,n}r_{e,n} + o(1) = o(1), \]
meaning that there exists $r_{v1,n} = o(1)$ such that $E_n \cap T_{v3,n} \subseteq T_{v1,n}$. This shows the existence of radii $r_{r,n}, r_{v,n} = o(1)$ such that jointly $t^*_n \in T_{r,n}$ and $p_{d,t^*_n} \in B_{r_{v,n}}(p_d)$ with high probability. Due to symmetry we obtain $r_{f,m} = o(1)$ (in the number of factors) such that $p_{k,t^*_n} \in B_{r_{f,m}}(p_k)$ given $m_{t^*_n} = m$ with high probability in $m$, and further uniformly in $n$. But since we have $m_t = \Theta(n)$ for $t \in T_{r,n}$ uniformly, we obtain radii $r_{f,n} = o(1)$ depending only on $n$ by taking the supremum of $r_{f,m_t}$ over $t \in T_{r,n}$. With $r_n = r_{r,n} + r_{v,n} + r_{f,n} = o(1)$ this immediately gives $\tau(t^*_n) \in B_{r_n}(\tau^*)$ with high probability.

8.4. Proof of Proposition 8.2.
Recall from the proof of Lemma 8.3 that $\tau(t_n) = \tau^*$ almost surely for all $n \in N$ if $d$, $k$ are degenerate, i.e. the assertion holds for any choice of $r_n = o(1)$. Otherwise, let $r_n = o(1)$ be a sequence obtained from Lemma 8.4 such that $r_n = \omega(\log(n)/\sqrt{n})$. Notice that $3r_n \in o(1)$ and
\[ P\big[\tau(t^*_n) \in B_{r_n}(\tau^*)\big] \le P\big[t^*_n \in T_{s,n}\big] \le P\big[\tau(t^*_n) \in B_{3r_n}(\tau^*)\big], \qquad t^*_n \in T_{s,n} \text{ iff } m_n/n \in B_{r_n}(\bar d/\bar k),\ p_{d,n} \in B_{r_n}(p_d),\ p_{k,m_n} \in B_{r_n}(p_k), \]
with $p_{d,n} = p_{d,t^*_n}$ denoting the relative frequencies of $(d_i)_{i\in[n]}$, $p_{k,m}$ denoting the relative frequencies of $(k_a)_{a\in[m]}$ for given $m \in \mathbb{Z}_{\ge 0}$, and where we recall that $t^*_n = (m_n, (d_i)_{i\in[n]}, (k_a)_{a\in[m_n]})$. In particular, the above shows that all three events occur with high probability and further $3r_n$ is also a suitable choice in the context of Lemma 8.4. For given $s$ and $m$ we use the shorthands
\[ p_m(m) = P[m_n = m], \qquad P_d(s) = P\Big[\sum_{i\in[n]}d_i = s\Big], \qquad P_k(s,m) = P\Big[\sum_{a\in[m]}k_a = s\Big], \]
\[ P^+_d(s) = P\Big[\sum_{i\in[n]}d_i = s,\ p_{d,n} \in B_{r_n}(p_d)\Big], \qquad P^+_k(s,m) = P\Big[\sum_{a\in[m]}k_a = s,\ p_{k,m} \in B_{r_n}(p_k)\Big], \]
further $P^-_d(s) = P_d(s) - P^+_d(s)$, $P^-_k(s,m) = P_k(s,m) - P^+_k(s,m)$ and $M_n = nB_{r_n}(\bar d/\bar k)$. Using this notation we have
\[ P\big[t^*_n \in T_n\big] = \sum_{s,m}p_m(m)P_d(s)P_k(s,m), \qquad P\big[t^*_n \in T_n \cap T_{s,n}\big] = \sum_{m\in M_n}\sum_sp_m(m)P^+_d(s)P^+_k(s,m) \ge P\big[t^*_n \in T_n,\ m_n \in M_n\big] - E_d - E_k, \]
\[ E_d = \sum_{m\in M_n}\sum_sp_m(m)P^-_d(s)P_k(s,m), \qquad E_k = \sum_{m\in M_n}\sum_sp_m(m)P_d(s)P^-_k(s,m), \]
where we exploited the dependency structure of $t^*_n$. With the Poisson bounds used in the proof of Lemma 8.4, $r_n = \omega(\log(n)/\sqrt{n})$ and Lemma 8.3 we have
\[ P\big[t^*_n \in T_n,\ m_n \in M_n\big] = P\big[t^*_n \in T_n\big] - P\big[t^*_n \in T_n,\ m_n \notin M_n\big] = P\big[t^*_n \in T_n\big] - o(n^{-1/2}) = (1 + o(1))P\big[t^*_n \in T_n\big]. \]
Now, assume that both $d$ and $k$ are lattice. With $m \in M_n$ and the proof of Lemma 8.3, respectively Theorem 8.5, notice that $P_k(s,m) = O(n^{-1/2})$ uniformly in $s$, $m$ since $k$ is lattice, so
\[ E_d = O(n^{-1/2})\sum_{m\in M_n}\sum_sp_m(m)P^-_d(s) = O(n^{-1/2})P[m_n \in M_n]P\big[p_{d,n} \notin B_{r_n}(p_d)\big] = o(n^{-1/2}) \]
since $p_{d,n} \in B_{r_n}(p_d)$ with high probability. Further, we have $P_d(s) = O(n^{-1/2})$ uniformly since $d$ is lattice, and hence we obtain $E_k = o(n^{-1/2})$ analogously. With $P[t^*_n \in T_n \cap T_{s,n}] \le P[t^*_n \in T_n]$ this gives $P[t^*_n \in T_n \cap T_{s,n}] = (1 + o(1))P[t^*_n \in T_n]$ with another application of Lemma 8.3, which shows that $P[t_n \in T_{s,n}] = 1 + o(1)$ and thereby $P[\tau(t_n) \in B_{3r_n}(\tau^*)] = 1 + o(1)$, establishing the assertion for the current case with $3r_n$.

Next, we consider the case that $d$ is lattice and $k$ is degenerate. Then we have $p_{k,m} = p_k$ almost surely for all $m \in \mathbb{Z}_{>0}$ and hence $P^-_k(s,m) = 0$ for all $s$, and further $E_k = 0$ for $n$ sufficiently large. Further, we have $P_k(s,m) = 1\{s = \bar km\}$ and hence
\[ E_d = \sum_{m\in M_n}p_m(m)P^-_d(\bar km) = O(n^{-1/2})\sum_{m\in M_n}P^-_d(\bar km) \le O(n^{-1/2})\sum_sP^-_d(s) = o(n^{-1/2}) \]
using the local limit theorem for $m_n$ and $p_{d,n} \in B_{r_n}(p_d)$ with high probability. Following the discussion above this yields $P[\tau(t_n) \in B_{3r_n}(\tau^*)] = 1 + o(1)$.

Finally, assume that $d$ is degenerate and $k$ is lattice, so in particular $p_{d,n} = p_d$ almost surely, i.e. $P^-_d(s) = 0$ and $E_d = 0$, and $P_d(s) = 1\{s = \bar dn\}$, leaving us with
\[ E_k = \sum_{m\in M_n}p_m(m)P^-_k(\bar dn, m). \]
Now, let $m_{n,1}, m_{n,2}$ be i.i.d. with law $\mathrm{Po}(\bar m_n/2)$, i.e. we consider $m_n = m_{n,1} + m_{n,2} \sim \mathrm{Po}(\bar m_n)$ as a derived random variable. Analogously, we consider i.i.d. copies $k_{1,a}, k_{2,a}$ with law $p_k$ and $a \in \mathbb{Z}_{>0}$, which allows us to consider $(k_a)_{a\in[m_n]} = ((k_{1,a})_{a\in[m_{n,1}]}, (k_{2,a})_{a\in[m_{n,2}]})$ as derived random variables. This immediately gives
\[ p_{k,m_n} = \frac{m_{n,1}}{m_n}p_{k1,m_{n,1}} + \frac{m_{n,2}}{m_n}p_{k2,m_{n,2}} \qquad\text{and further}\qquad \Delta_2(p_{k,m_n}, p_k) \le \frac{m_{n,1}}{m_n}\Delta_2(p_{k1,m_{n,1}}, p_k) + \frac{m_{n,2}}{m_n}\Delta_2(p_{k2,m_{n,2}}, p_k). \]
Hence, in the event that $p_{k,m_n} \notin B_{r_n}(p_k)$ we have $p_{k1,m_{n,1}} \notin B_{r_n}(p_k)$ or $p_{k2,m_{n,2}} \notin B_{r_n}(p_k)$. Using corresponding shorthands for this decomposition we first obtain
\[ E_k = \sum_{m\in M_n}\sum_{m_1}p_{m1}(m_1)p_{m2}(m - m_1)P^-_k(\bar dn, m) \le \sum_{m_1,m_2\in M'_n}p_{m1}(m_1)p_{m2}(m_2)P^-_k(\bar dn, m_1 + m_2) + o(n^{-1/2}) \]
with $M'_n = B_{nr_n}(\bar m_n/2)$, by using the Poisson bounds for both $m_{n,1}, m_{n,2}$ and an extension of the domain. As discussed above this further yields
\[ E_k \le E_{k1} + E_{k2} + o(n^{-1/2}), \qquad E_{k1} = \sum_{m_1,m_2\in M'_n}p_{m1}(m_1)p_{m2}(m_2)\sum_sP^-_{k1}(s, m_1)P_{k2}(\bar dn - s, m_2), \qquad E_{k2} = \sum_{m_1,m_2\in M'_n}p_{m1}(m_1)p_{m2}(m_2)\sum_sP_{k1}(s, m_1)P^-_{k2}(\bar dn - s, m_2). \]
Since both $m_1$ and $m_2$ are uniformly linear in $n$ we can apply the local limit theorem to obtain
\[ E_{k1} = O(n^{-1/2})P\big[m_{n,1} \in M'_n,\ p_{k1,m_{n,1}} \notin B_{r_n}(p_k)\big] \]
and the corresponding result for $E_{k2}$. At this point we notice that both the assertion of Proposition 8.2 and Lemma 8.4 allow the choice of any arbitrarily flat sequence $r_n = o(1)$, in particular such that the assertion of Lemma 8.4 still holds with $r'_n = 2r_n$ (where we may assume $r_n \le r'_n$ without loss of generality). Hence, the observation that the models corresponding to $m_{n,1}$ and $m_{n,2}$ exactly reflect the model corresponding to $\bar m_n/2$ with radii $r'_n/2 = r_n$ shows that we can choose $r_n$ such that $E_{k1}, E_{k2} = o(n^{-1/2})$ and thus $E_k = o(n^{-1/2})$. With these error bounds we also conclude for the last case that $P[\tau(t_n) \in B_{3r_n}(\tau^*)] = 1 + o(1)$.
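The splitting device in the last case is the standard Poisson decomposition; the following sketch (with an elementary inverse-transform sampler, an assumption for self-containment) checks by simulation that $m_{n,1} + m_{n,2}$ with i.i.d. $\mathrm{Po}(\bar m_n/2)$ summands reproduces $\mathrm{Po}(\bar m_n)$.

# Sanity check of the Poisson splitting m_n = m_{n,1} + m_{n,2}.
import math, random

random.seed(4)

def poisson(lam):
    # inverse-transform sampler, fine for moderate lam (assumption)
    u, k, p = random.random(), 0, math.exp(-lam)
    c = p
    while u > c:
        k += 1
        p *= lam / k
        c += p
    return k

lam, trials = 20.0, 50000
direct = [poisson(lam) for _ in range(trials)]
split = [poisson(lam / 2) + poisson(lam / 2) for _ in range(trials)]
for sample in (direct, split):
    mean = sum(sample) / trials
    var = sum(x * x for x in sample) / trials - mean ** 2
    print(round(mean, 2), round(var, 2))   # both ~ lam, as for Po(lam)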
8.5. Proof of Proposition 8.1. Let a sequence $f_n: T_n \to \mathbb{R}$, $n \in N$, be given that is sublinear in the number of factors and let $c \in \mathbb{R}_{>0}$ be such that $|f_n(t)| \le c + cm_t/n$ for all $t \in T_n$ and $n \in N$. Using Proposition 8.2 we obtain $r_n = o(1)$ and let $T^\circ_n$ denote the set of $t \in T_n$ with $\tau(t) \in B_{r_n}(\tau^*)$. With this notation we have
\[ \big|E\big[f_n(t_n)1\{t_n \notin T^\circ_n\}\big]\big| \le cP[t_n \notin T^\circ_n] + cE\Big[\frac{m_{t_n}}{n}1\{t_n \notin T^\circ_n\}\Big] \le o(1) + c\frac{2\bar m_n}{n}P[t_n \notin T^\circ_n] + cE\Big[\frac{m_{t_n}}{n}1\{t_n \notin T^\circ_n,\ m_{t_n} \ge 2\bar m_n\}\Big]. \]
Using the definition of $\bar m_n$ and Proposition 8.2 we notice that the second contribution is also $o(1)$. For the last contribution we recall the definition of $t_n$, resolve the conditional expectation and use Lemma 8.3 for the bound $\Theta(n^{-1/2})$ in the denominator $P[t^*_n \in T_n]$, while the numerator can be upper bounded by $E[m_nn^{-1}1\{m_n \ge 2\bar m_n\}]$. From the definition of the Poisson distribution we have $E[m_n1\{m_n = m\}] = \bar m_nP[m_n = m - 1]$, so the numerator is bounded by $(\bar d/\bar k)P[m_n + 1 \ge 2\bar m_n]$, which is exponentially small using standard Poisson bounds and hence $\sqrt{n}P[m_n + 1 \ge 2\bar m_n] = o(1)$.

8.6. Properties of typical sequences.
In this section we summarize a few properties of the typical sequences $t \in T^\circ_n$ for later usage. First, notice that
\[ \|p_{d,t} - p_d\|_1 \le 2\sum_{d>0}|p_{d,t}(d) - p_d(d)| \le 2\Delta_3(\tau(t), \tau^*) < 2r_n, \]
so we can choose $r_n$ such that for any finite subset $D' \subseteq D$ and sufficiently small $\varepsilon \in (0,1)$ we have $p_{d,t}(d) \ge \varepsilon$ uniformly in $n$, $t \in T^\circ_n$ and $d \in D'$, and further impose any absolute bound on the distance to $p_d$ in $\|\cdot\|_1$ as well as on the degree-reweighted distance $\sum_dd\,|p_{d,t}(d) - p_d(d)|$. In particular, we also obtain convergence of the first moment since
\[ |E[d_t] - E[d]| \le \sum_dd\,|p_{d,t}(d) - p_d(d)| < r_n. \]
Since we obviously have $E[d_t^2] \to E[d^2]$ and $E[d_t^\alpha] \to E[d^\alpha]$ with $\alpha = 2 + \varepsilon_{\mathrm{deg}}$ uniformly, we can choose $r_n$ to enforce uniform upper bounds $E^{(2)}, E^{(\alpha)} \in \mathbb{R}_{>0}$ uniformly in $n$ and $t \in T^\circ_n$. As discussed in the proof of Lemma 8.4, Markov's inequality then implies that $\max\{d_{t,i}: i \in [n]\} \le d_{\max,n}$ with $d_{\max,n} = (E^{(\alpha)}n)^{1/\alpha}$ uniformly in $t \in T^\circ_n$, so $\|d_t\|_\infty \le cn^\beta$ almost surely for some $c \in \mathbb{R}_{\ge 0}$, $\beta \in (0, 1/2)$ and uniformly in $t \in T^\circ_n$. Combining these gives the uniform bound
\[ E[d_t^3] \le d_{\max,n}^{3-\alpha}E^{(\alpha)} = cn^\beta \]
with $c \in \mathbb{R}_{>0}$ given by the above and $\beta = (3 - \alpha)/\alpha \in (0, 1/2)$ (if $\varepsilon_{\mathrm{deg}} < 1$). With $m_t \sim \bar m_n$ uniformly for $t \in T^\circ_n$, the discussion above directly yields the corresponding results for the factor side.

9. MUTUAL CONTIGUITY
This section is dedicated to the mutual contiguity part of Proposition 3.2. We start with the definition of contiguity. Let two sequences $p_n, p^*_n \in P(\Omega_n)$, $n \in \mathbb{Z}_{>0}$, on the same spaces $\Omega_n$ be given. Then $(p_n)_n$ is contiguous with respect to $(p^*_n)_n$ if for every $\varepsilon \in \mathbb{R}_{>0}$ there exist $n_0 \in \mathbb{Z}_{>0}$ and $\delta \in \mathbb{R}_{>0}$ such that for all $n \in \mathbb{Z}_{\ge n_0}$ and all events $E \subseteq \Omega_n$ with $p^*_n(E) < \delta$ we have $p_n(E) < \varepsilon$. If further $(p^*_n)_n$ is contiguous with respect to $(p_n)_n$, then the two sequences are mutually contiguous.

The factor graph model introduced in the following has the same law as the model discussed in Section 4.3 for $\Theta = P$. The reason for explicitly introducing the model is to build the connection to Section 5 and Section 6 for one, and further to simplify the notation for brevity. Another feature is that factor graphs are defined for all possible distribution sequences (corresponding to degree sequences in the standard case), which is useful and required to obtain concentration results in the upcoming sections.
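The definition can be rephrased in terms of the Radon–Nikodym derivative: if $p_n/p^*_n$ is uniformly bounded by some $C$, then $p_n(E) \le C\,p^*_n(E)$ for every event $E$, which is exactly the mechanism exploited below. A toy finite check (with two hand-picked distributions, an assumption for illustration):

# Toy check: a bounded likelihood ratio forces p(E) <= C * p*(E) for all events.
from itertools import chain, combinations

omega = ("a", "b", "c")
p = {"a": 0.5, "b": 0.3, "c": 0.2}        # p_n (toy)
p_star = {"a": 0.4, "b": 0.4, "c": 0.2}   # p_n* (toy)
C = max(p[w] / p_star[w] for w in omega)  # bound on the derivative p / p*

def prob(dist, event):
    return sum(dist[w] for w in event)

events = chain.from_iterable(combinations(omega, r)
                             for r in range(len(omega) + 1))
assert all(prob(p, E) <= C * prob(p_star, E) + 1e-12 for E in events)
print("p(E) <= C p*(E) for every event, with C =", round(C, 3))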
9.1. Product measure families. For $k \in \mathbb{Z}_{\ge 0}$ let $\Psi_k = \mathbb{R}^{\Omega^k}_{>0}$ denote the set of functions $\psi: \Omega^k \to \mathbb{R}_{>0}$. Further, fix a family $(k_\ell, P_\ell)_{\ell\in L_F}$ with $L_F \subseteq \mathbb{Z}_{\ge 0}$ and $k_\ell \in \mathbb{Z}_{\ge 0}$, $P_\ell \in P(\Psi_{k_\ell})$ for $\ell \in L_F$. Let $\psi_\ell \sim P_\ell$, $\bar\psi_\ell = E[\psi_\ell]$, $Z_\ell = \sum_y\bar\psi_\ell(y)$, $\xi_\ell = Z_\ell q^{-k_\ell}$ and $\mu_\ell = Z_\ell^{-1}\bar\psi_\ell \in P(\Omega^{k_\ell})$ if $Z_\ell > 0$. The family $(k_\ell, P_\ell)_{\ell\in L_F}$ satisfies BAL' if
\[ \sum_y\mu_\ell(y)\prod_{h\in[k_\ell]}p(y_h) \le q^{-k_\ell} \]
for all $p \in P(\Omega)$ and $\ell \in L_F$ with $k_\ell > 0$. Further, notice that $(k_\ell, \mu_\ell)_{\ell\in L_F}$ satisfies SPAN, the induced lattice $L$ discussed in Section 6 is $\mathbb{Z}^{q-1}$ and in particular $h \equiv 1$. Analogous to the coupling in Section 4.3 we introduce a new index $\ell^\circ \in \mathbb{Z}_{\ge 0}$ with $k_{\ell^\circ} = 1$ and $P_{\ell^\circ}$ being the one-point mass on $\psi \equiv 1$, and let $L_{F^\circ} = L_F \cup \{\ell^\circ\}$. Notice that this modification does not change the associated lattice and further SPAN still holds. For the sake of symmetry we also fix a family $(d_\lambda)_{\lambda\in L_V}$ with $L_V \subseteq \mathbb{Z}_{\ge 0}$ and $d_\lambda \in \mathbb{Z}_{\ge 0}$ for $\lambda \in L_V$. Further, let $\nu_\lambda \in P(\Omega^{d_\lambda})$ be given by $\nu_\lambda(\omega_{[d_\lambda]}) = q^{-1}$ for $\omega \in \Omega$, and notice that $(d_\lambda, \nu_\lambda)_{\lambda\in L_V}$ satisfies SPAN by definition. In the remainder we tacitly assume that both $L_V$ and $L_F$ are non-trivial, i.e. not all degrees are zero.

9.2. Distribution sequences.
For $n \in \mathbb{Z}_{>0}$ we let $T_n$ denote the set of distribution sequences, i.e.
\[ T_n = \big\{(m, \lambda, \ell): m \in \mathbb{Z}_{\ge 0},\ \lambda \in L_V^n,\ \ell \in L_F^m\big\}. \]
For $t = (m, \lambda, \ell) \in T_n$ we use the same shorthands as in Section 6 and Section 5, e.g. $d_i = d_{\lambda_i}$ for $i \in [n]$. Further, let $D_t = \sum_{i\in[n]}d_i$, $K_t = \sum_{i\in[m]}k_i$, $D^*_t = \max(D_t, K_t)$, $\Delta_D(t) = D^*_t - D_t$ and $\Delta_K(t) = D^*_t - K_t$ denote the total degrees and missing half-edges on both sides. We use the notions $X_t$ and $A_t$ from Section 4.3, but introduce two sets $D_D(t)$, $D_K(t)$ with $|D_D(t)| = \Delta_D(t)$, $D_D(t) \cap X_t = \emptyset$ and $|D_K(t)| = \Delta_K(t)$, $D_K(t) \cap A_t = \emptyset$. As indicated above and in Section 4.3, and using the shorthand $m^\circ = m + \Delta_K(t)$, we let $t^\circ = (m^\circ, \lambda, \ell^\circ)$ with $\ell^\circ \in (L_{F^\circ})^{m+\Delta_K(t)}$ given by $\ell^\circ_{[m]} = \ell$ and $\ell^\circ_i = \ell^\circ$ otherwise.

9.3. Factor graphs.
For given $n \in \mathbb{Z}_{>0}$ and $t = (m, \lambda, \ell) \in T_n$ a factor graph $G$ is given by a bijection $g: X_t \cup D_D(t) \to A_t \cup D_K(t)$ and weights $\psi_{a_i} \in \Psi_{k_i}$ for each factor $a_i \in F_m$, $i \in [m]$. Let
\[ F(G) = \{a: (a,h) \in A_t,\ g^{-1}(a,h) \in X_t\} \]
denote the subset of factors that are not connected to the dummy variables $D_D(t)$. For $\sigma \in \Omega^{V_n}$ let $\chi_{t,\sigma} = (\sigma_x)_{(x,h)\in X_t}$ denote the assignment to the half-edges excluding dummies and further $\gamma_{t,\sigma}$ the corresponding absolute colour frequencies, i.e. $\gamma_{t,\sigma}(\omega) = \sum_{x,h}1\{\chi_{x,h} = \omega\}$ for $\omega \in \Omega$, and notice that these notions do not depend on $G$. Further, let $y_{G,\sigma} = y_{g,\sigma} \in \Omega^{A_t\cup D_K(t)}$ be given by $y_{G,\sigma,h} = \chi_{t,\sigma,g^{-1}(h)}$ for $h \in A_t \cup D_K(t)$ with $g^{-1}(h) \in X_t$, and undefined otherwise. Finally, let $\psi_G(\sigma) = \prod_{a\in F(G)}\psi_a(y_{G,\sigma,a})$, with $Z_G = \sum_\sigma\psi_G(\sigma) \in \mathbb{R}_{>0}$, $\mu_G = Z_G^{-1}\psi_G \in P(\Omega^{V_n})$ unchanged and $\sigma_G \sim \mu_G$.
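A minimal sketch of this construction may help to parse the notation: pad the half-edges with dummies, draw a uniform bijection $g$, skip factors that touch dummy variables (those outside $F(G)$) and evaluate $\psi_G(\sigma)$ and $Z_G$. The concrete degrees and the soft not-all-equal weight are assumptions for illustration.

# Toy instance of the factor-graph definition above (choices are assumptions).
import itertools, random

random.seed(5)
q = 2
d = [2, 2, 1]                              # variable degrees d_i
k = [3, 3]                                 # factor degrees k_i
X = [(i, h) for i, di in enumerate(d) for h in range(di)]  # variable half-edges
A = [(a, h) for a, ka in enumerate(k) for h in range(ka)]  # factor half-edges
while len(X) < len(A): X.append(("dummy_var", len(X)))     # D_D(t)
while len(A) < len(X): A.append(("dummy_fac", len(A)))     # D_K(t)
g = dict(zip(X, random.sample(A, len(A))))                 # uniform bijection g
g_inv = {v: x for x, v in g.items()}

def psi_a(y):
    return 0.1 if len(set(y)) == 1 else 1.0  # toy weight (assumption)

def psi_G(sigma):
    total = 1.0
    for a, ka in enumerate(k):
        xs = [g_inv[(a, h)] for h in range(ka)]
        if any(x[0] == "dummy_var" for x in xs):
            continue                        # factor not in F(G)
        total *= psi_a(tuple(sigma[x[0]] for x in xs))
    return total

Z_G = sum(psi_G(s) for s in itertools.product(range(q), repeat=len(d)))
print("Z_G =", round(Z_G, 4))              # normalization of mu_G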
9.4. Random factor graphs. For $n \in \mathbb{Z}_{>0}$ and $t = (m, \lambda, \ell) \in T_n$ we obtain the null model $G_t$ by drawing a uniformly random bijection $g: X_t \cup D_D(t) \to A_t \cup D_K(t)$ and independently drawing the weight functions $\psi_{a_i}$ from $P_i$. Using $\bar\psi_t = E[\psi_{G_t}]$, the teacher–student scheme $G^*_t(\sigma)$ with ground truth $\sigma \in \Omega^{V_n}$ is given by the Radon–Nikodym derivative $\psi_G(\sigma)/\bar\psi_t(\sigma)$ with respect to $G_t$. Further, using $\bar Z_t = E[Z_{G_t}]$, the Nishimori ground truth $\hat\sigma_t \in \Omega^{V_n}$ is given by $P[\hat\sigma_t = \sigma] = \bar\psi_t(\sigma)/\bar Z_t$. Finally, we use the shorthand $y^*_t(\sigma) = y_{G^*_t(\sigma),\sigma}$ to denote the assignment to the factor side half-edges for a given ground truth. Notice that the models $G_t$, $G^*_t(\sigma)$ and $G_{t^\circ}$, $G^*_{t^\circ}(\sigma)$ are equal in that they show exactly the same behaviour and only differ in the explicit modelling of the dummy factors in the latter case.

9.5. Typical distribution sequences.
A sequence $(T_n)_n \subseteq T_n$ satisfies MC if the following holds. The family $(k_\ell, P_\ell)_{\ell\in L_F}$ satisfies BAL'. There exists $(L_n)_n \subseteq L_V^n$ satisfying GEN, VAR and SKEW such that for all $n \in \mathbb{Z}_{>0}$ and $(m, \lambda, \ell) \in T_n$ we have $\lambda \in L_n$. There exists $(L_m)_m \subseteq L_F^m$ satisfying GEN, VAR and SKEW such that for all $n \in \mathbb{Z}_{>0}$ and $(m, \lambda, \ell) \in T_n$ we have $\ell \in L_m$. Finally, for all $n \in \mathbb{Z}_{>0}$ and $(m, \lambda, \ell) \in T_n$ we have
\[ \sum_{i\in[n]}d_i \ge \sum_{i\in[m]}k_i, \]
using the conventions from Section 6, e.g. $d_i = d_{\lambda_i}$.

Again, notice that for any sequence $(T_n)_n \subseteq T_n$ satisfying MC, the sequence $(T^\circ_n)_n$ given by $T^\circ_n = \{t^\circ: t \in T_n\}$, $n \in \mathbb{Z}_{>0}$, satisfies MC as well, and $\ell^\circ$ spans the lattice as does any other index with non-trivial degree. Further, notice that these assumptions ensure the existence of $c \in (0,1)$ with $cm \le K_t \le D_t \le c^{-1}n$, using GEN on the factor side for the first inequality and VAR on the variable side for the last inequality, so $m \in O(n)$ uniformly for all $t \in T_n$. Using VAR on the factor side and GEN on the variable side we obtain $c \in (0,1)$ with $m^\circ \ge \max(m, cn - c^{-1}m) \ge \frac{c}{1+c^{-1}}n$, so $m^\circ$ is uniformly linear in $n$ for all $t \in T_n$, since the arguments for $m \in O(n)$ also apply to $m^\circ$. Let $\alpha^-, \alpha^+ \in \mathbb{R}_{>0}$ be corresponding bounds, i.e. $\alpha^-n \le m^\circ \le \alpha^+n$ for all $t \in T_n$ and $n$.

The arguments in the remainder of this section will clarify that using $(T^\circ_n)_n$ is not only an alternative modelling approach, but superior to using $(T_n)_n$. Intuitively, factors are not pruned (or missing in any sense), but replaced by trivial factors such that the total degree imposed by the variable side is met. Hence, for consistency we consider $F^\circ_t = \{a_1, \dots, a_{m+\Delta_K(t)}\}$ an extension of $F_m$ for $t = (m, \lambda, \ell) \in T_n$, and let $D_K(t) = \{(a_i, 1): m < i \le m + \Delta_K(t)\}$ and $A^\circ_t = A_t \cup D_K(t)$.

9.6. Random distributions.
In order to complete the picture, recall $m_n$, $m_{\varepsilon,n}$, $d$, $k$, $\psi_k$ from the introduction and $t_n$ from Section 8. For the standard case we let $L_V$ be the support of $d$ with $d_\lambda = \lambda$ for $\lambda \in L_V$, i.e. there is no distinction between labels and degrees. Analogously, we let $L_F$ be the support of $k$ with $k_\ell = \ell$ for $\ell \in L_F$ and $P_\ell$ be the law of $\psi_\ell$. Notice that $T_n = T^*_n$, further $T^\circ_n$ satisfies MC and $t^\circ = t$ for all $t \in T_n$. Hence, compared to the discussion in Section 3.4 and Section 4.3 we slightly change the model in that we do not condition on suitable degree sequences, but define the factor graphs for all possible degree sequences. However, as discussed in Section 3.4 (and as is evident from Section 8), the consistency condition
\[ \sum_{i=1}^nd_i \ge \sum_{i=1}^{m_\varepsilon}k_i \tag{9.1} \]
is satisfied with very high probability and hence the change of the model may be considered of purely technical nature. For $\varepsilon \in (0,1)$ we let $t^*_{\varepsilon,n} = (m_\varepsilon, (d_i)_{i\in[n]}, (k_i)_{i\in[m_\varepsilon]})$ be the analogue of $t^*_n$, but contrary to $t_n$ we let $t_{\varepsilon,n} = t^*_{\varepsilon,n}$ as discussed above. With $\tau^*_\varepsilon = ((1-\varepsilon)\bar d/\bar k, p_d, p_k)$ and the metric $\Delta_3$ from Section 8 it is immediate from the results of Section 8 that there exists $r_n = o(1)$ such that $\tau(t_{\varepsilon,n}) \in B_{r_n}(\tau^*_\varepsilon)$ with high probability and Proposition 8.1 also holds for $\varepsilon > 0$, so we can define $T^\circ_{\varepsilon,n}$ analogously. As discussed in Section 8 we can choose $r_n$ such that uniform bounds hold for $t = (m, d, k) \in T^\circ_{\varepsilon,n}$, and in particular $\sum_{i\in[n]}d_i > \sum_{i\in[m]}k_i$. We define $L_V$ and $L_F$ as before and notice that, as opposed to the boundary case $\varepsilon = 0$, we have $t^\circ \ne t$ for all $t \in T^\circ_{\varepsilon,n}$. The conditions imposed by $T^\circ_{\varepsilon,n}$ ensure that the number of factors is asymptotically equivalent to $(1-\varepsilon)(\bar d/\bar k)n$ and the total degrees are asymptotically equivalent to $\bar dn$ and $(1-\varepsilon)\bar dn$ on the variable and factor side respectively; hence the absolute frequency of $\ell^\circ$ in $t^\circ$ is asymptotically equivalent to $\varepsilon\bar dn$. Hence, for $t = (m, \lambda, \ell) \in T^\circ_{\varepsilon,n}$ with $t^\circ = (m^\circ, \lambda, \ell^\circ)$ the number $m^\circ$ of factors including the dummy factors is asymptotically equivalent to $(\frac{1-\varepsilon}{\bar k} + \varepsilon)\bar dn$. Further, the relative frequencies $P_\lambda$ (introduced in Section 6) converge to $p_d$ with respect to the metric $\Delta_L$ introduced in Section 5, and the frequencies $P_\ell$ converge to $p_k$ with respect to $\Delta_L$. Hence, the frequencies $P_{\ell^\circ}$ converge to $p^\circ_k \in P(L_{F^\circ})$ given by $p^\circ_k(\ell^\circ) = \varepsilon/(\varepsilon + (1-\varepsilon)/\bar k)$ and $p^\circ_k(\ell) = (1 - p^\circ_k(\ell^\circ))p_k(\ell)$ otherwise.

In a nutshell, the arguments above stress the fact that we always only consider factor graphs where the total degrees of the variable side and of the factor side are equal, a change of perspective that is essential for the upcoming sections.

9.7. Mutual contiguity.
Mutual contiguity of $\sigma^*$ and $\hat\sigma_t$ uniformly over $t \in T_n$ follows with standard arguments from the following proposition. Further implications are discussed in Section 9.12. Finally, in Section 9.13 we will briefly discuss why these results are entirely invariant to pinning.

Proposition 9.1.
For all sequences $(T_n)_n \subseteq T_n$ satisfying MC and $\varepsilon \in (0,1)$ there exist $c \in (0,1)$, $r \in \mathbb{R}_{>0}$ and $n_0 \in \mathbb{Z}_{>0}$ such that for all $n \in \mathbb{Z}_{\ge n_0}$, all $t \in T_n$ and all $\sigma \in E_t$, with $E_t = \{\sigma \in \Omega^{V_n}: \|\gamma_{t,\sigma} - D_tu_\Omega\| < r\sqrt{n}\}$, we have
\[ P[\sigma^* \in E_t],\ P[\sigma_{G^*_t(\sigma^*)} \in E_t],\ P[\hat\sigma_t \in E_t] > 1 - \varepsilon \qquad\text{and}\qquad c < P[\sigma^* = \sigma]/P[\hat\sigma_t = \sigma] < c^{-1}. \]
From now on we consider $(T_n)_n \subseteq T_n$ satisfying MC fixed. In order to show Proposition 9.1 we first determine the asymptotics of the normalization constant of $\hat\sigma_t$, i.e. the first moment $\bar Z_t = E[Z_{G_t}]$.

Proposition 9.2.
Uniformly for $t = (m, \lambda, \ell) \in T_n$ we have $\bar Z_t = \Theta(Z^*_t)$ with $Z^*_t = q^n\prod_{i\in[m]}\xi_i$.

From the proof of Proposition 9.2 we directly obtain tail bounds and a local limit theorem for the colour frequencies of $\hat\sigma_t$. For brevity let $\hat\rho_t = D_t^{-1}\gamma_{t,\hat\sigma_t} \in P(\Omega)$ denote the random relative colour frequencies on the half-edges under $\hat\sigma_t$. Recall from Section 6 that we have $P[\hat\rho_t \in R_t] = 1$ for $t = (m, \lambda, \ell) \in T_n$ and $n \in \mathbb{Z}_{>0}$, where $R_t = R_\lambda$ is the set induced by the lattice $L$ obtained from $(d_\lambda, \nu_\lambda)_{\lambda\in L_V}$.

Proposition 9.3.
There exist constants $c, c' \in \mathbb{R}_{>0}$ such that for all $n \in \mathbb{Z}_{>0}$, $t \in T_n$ and $r \in \mathbb{R}_{\ge 0}$ we have $P[\|\hat\rho_t - u_\Omega\| \ge r] \le c'\exp(-cr^2n)$.

In the following we may use the notions for $t = (m, \lambda, \ell) \in T_n$ implied by the notions introduced in Section 5 and Section 6 without explicitly introducing them, e.g. $\lambda_t = \lambda_\lambda$, $\ell_t = \ell_\ell$ for the random indices and $d_t = d_{\lambda}$, $k_t = k_{\ell}$ for the random degrees. In addition, let $\Sigma_{V,t} = \Sigma_\lambda$, $\Sigma_{F,t} = \Sigma_{\ell^\circ}$ as introduced in Section 6 and notice that $\Sigma_{\ell^\circ} \ne \Sigma_\ell$ in general. Further, let $\Sigma_{E,t} = \frac{E[d_t]}{E[d_t^2]}\Sigma_{V,t}$ and let $\Sigma_t$ be given by $\Sigma_t^{-1} = \Sigma_{V,t}^{-1} + \Sigma_{F,t}^{-1} - \Sigma_{E,t}^{-1}$. Let $h = \gcd\{d_\lambda: \lambda \in L_V\}$ denote the greatest common divisor of the attainable variable side degrees.

Proposition 9.4.
For $r_n = \Theta(\sqrt{\log(n)/n})$, uniformly in $t \in T_n$ and $\rho \in R_t \cap B_{r_n}(u_\Omega)$ we have
\[ P[\hat\rho_t = \rho] = (1 + o(1))\,\frac{h^{q-1}}{\sqrt{D_t}^{\,q-1}}\,\phi_t\big(\sqrt{D_t}(\rho - u_\Omega)_{[q-1]}\big), \]
where $\phi_t$ denotes the density of $N(0_{[q-1]}, \Sigma_t)$. Further, $\Sigma_t^{-1}$ is positive definite and $\|\Sigma_t\|, \|\Sigma_t^{-1}\| = \Theta(1)$ uniformly in $t \in T_n$.

This local limit theorem for $\hat\rho_t$, together with the local limit theorem for $\rho^*_t = D_t^{-1}\gamma_{t,\sigma^*} \in R_t$ from Section 6 and the tail bounds above, is sufficient to derive Proposition 9.1.

9.8. Proof of Proposition 9.2.
Fix parameters $\varepsilon_{\mathrm{gen}}$, $E^{(2)}$ and $E^{(3)}_n$ to satisfy the assumptions GEN, VAR and SKEW jointly for the variable and the factor side; this means in particular that $E^{(3)}_n$ is a uniform third moment bound for the variable side distribution sequences in $n$ and also for all factor side sequences $t^\circ$, with $m^\circ$ ranging from $\alpha^-n$ to $\alpha^+n$.

For $t \in T_n$ and $\rho$ in the support of $\hat\rho_t$ we have $\binom{D_t}{D_t\rho}\prod_\omega\rho(\omega)^{D_t\rho(\omega)} \ge \binom{D_t+q-1}{q-1}^{-1}$, i.e. the maximal probability of the multinomial is at least the uniform one. Hence, the uniform bounds on $E[d_t]$ for $t \in T_n$ yield the uniform lower bound $\binom{D_t}{D_t\rho}\prod_\omega\rho(\omega)^{D_t\rho(\omega)} = \Omega(n^{-(q-1)})$. With Proposition 6.1 we have
\[ P[\|\rho^*_t - u_\Omega\| \ge r] \le c'\exp(-cr^2n) \]
for all $n \in \mathbb{Z}_{>0}$, $t \in T_n$ and $r \in \mathbb{R}_{\ge 0}$. For $t = (m, \lambda, \ell) \in T_n$, with $m^\circ = m + \Delta_K(t)$, $\gamma_y$ denoting the colour frequencies of $y \in \Omega^{A^\circ_t}$ and using arguments analogous to Section 4.3, this yields
\[ \frac{\bar Z_t}{Z^*_t} = \sum_\sigma\frac{q^{D_t}}{q^n\binom{D_t}{\gamma_{t,\sigma}}}\sum_y1\{\gamma_y = \gamma_{t,\sigma}\}\prod_{i\in[m^\circ]}\mu_i(y_{a_i}) = r_{t,+} + r_{t,-}, \]
\[ r_{t,+} = \sum_{\gamma\in B_n}\frac{P[\gamma_{t,\sigma^*} = \gamma]\,P[\gamma_{y^*_t} = \gamma]}{\binom{D_t}{\gamma}q^{-D_t}}, \qquad r_{t,-} = q^{D_t}\sum_{\gamma\notin B_n}\frac{P[\gamma_{t,\sigma^*} = \gamma]}{\binom{D_t}{\gamma}}\sum_y1\{\gamma_y = \gamma\}\prod_{i\in[m^\circ]}\mu_i(y_{a_i}), \]
with $B_n = D_tB_{r_n}(u_\Omega)$ and $y^*_t \sim \bigotimes_{i\in[m^\circ]}\mu_i$. For $r_{t,-}$, using $\rho_\gamma = D_t^{-1}\gamma$ and BAL' we get
\[ r_{t,-} = q^{D_t}\sum_{\gamma\notin B_n}\frac{P[\gamma_{t,\sigma^*} = \gamma]}{\binom{D_t}{\gamma}\prod_\omega\rho_\gamma(\omega)^{\gamma(\omega)}}\sum_y1\{\gamma_y = \gamma\}\prod_{i\in[m^\circ]}\Big(\mu_i(y_{a_i})\prod_{h\in[k_i]}\rho_\gamma(y_{a_i,h})\Big) \le q^{D_t}\sum_{\gamma\notin B_n}\frac{P[\gamma_{t,\sigma^*} = \gamma]}{\binom{D_t}{\gamma}\prod_\omega\rho_\gamma(\omega)^{\gamma(\omega)}}\sum_y\prod_{i\in[m^\circ]}\Big(\mu_i(y_{a_i})\prod_{h\in[k_i]}\rho_\gamma(y_{a_i,h})\Big) \]
\[ \le \sum_{\gamma\notin B_n}\frac{P[\gamma_{t,\sigma^*} = \gamma]}{\binom{D_t}{\gamma}\prod_\omega\rho_\gamma(\omega)^{\gamma(\omega)}} = O\big(n^{q-1}P[\gamma_{t,\sigma^*} \notin B_n]\big) = O\big(n^{q-1}\exp(-cr_n^2n)\big) \]
uniformly in $t \in T_n$. Hence, for any $a \in \mathbb{R}_{>0}$, all $c^* \in \mathbb{R}_{>0}$ large enough and with $r_n = c^*\sqrt{\log(n)/n}$ we have $r_{t,-} = o(n^{-a})$. This completes the discussion of the tails.

Next, we turn to the asymptotics of $r_{t,+}$. Preparing the application of the local limit theorem 6.3 and the large deviation result 6.2 jointly for the variable side and the factor side, we proceed with care. First, recall the existence of sequences satisfying MC that cover $\lambda$ and $\ell^\circ$ respectively for all $t = (m, \lambda, \ell) \in T_n$ and $n \in \mathbb{Z}_{>0}$. Further, we fix a sequence $R_{m^\circ} = \Theta(\sqrt{\log(m^\circ)/m^\circ})$, with asymptotics in $m^\circ$, for the factor side, sufficiently large such that $B_n \subseteq D_tB_{R_{m^\circ}}(u_\Omega)$ for all sufficiently large $n$ and uniformly in $m^\circ$ for $t \in T_n$. Further, we fix a compact set $P^* \subseteq P^\circ(\Omega)$, covering $P^\circ(\Omega)$ but for a small residue at the boundary. As discussed in the proof of Theorem 6.3, using the first order approximation of the homeomorphism $\iota$ from Section 5, we eventually have $\iota_\lambda^{-1}(B_n) \subseteq P^*$ and $\iota_{\ell^\circ}^{-1}(B_n) \subseteq P^*$ for all $t \in T_n$ and $n$ sufficiently large. Now, we first use the large deviation result 6.2 with the uniform error bounds. Recalling that $E[k_{t^\circ}]m^\circ = K_{t^\circ} = D_t$, using the notions from Section 6, further for $t = (m, \lambda, \ell) \in T_n$, $\gamma \in B_n$ in the support of $\gamma_{t,\sigma^*}$ and with $\rho = D_t^{-1}\gamma$, $p = \iota_\lambda^{-1}(\rho)$, $p' = \iota_{\ell^\circ}^{-1}(\rho)$, we have
\[ r_{t,+} = \Big(1 + O\Big(E^{(3)}_n\sqrt{\tfrac{\log(n)}{n}}\Big)\Big)\sum_{\gamma\in B_n}W_t(\gamma), \qquad W_t(\gamma) = \frac{W_{V,t}(\gamma)W_{F,t}(\gamma)}{W_{E,t}(\gamma)} = \frac{h^{q-1}\exp(-D_t\alpha_t(\gamma))}{\sqrt{D_t}^{\,q-1}\sqrt{(2\pi)^{q-1}q^q\det(\Sigma_{\lambda,p}\Sigma_{\ell^\circ,p'})}}, \]
\[ \alpha_t(\gamma) = \alpha_{v,t}(\gamma) + \alpha_{f,t}(\gamma) - \alpha_e(\gamma), \]
\[ W_{V,t}(\gamma) = \frac{h^{q-1}\exp(-D_t\alpha_{v,t}(\gamma))}{\sqrt{D_t}^{\,q-1}\sqrt{(2\pi)^{q-1}\det(\Sigma_{\lambda,p})}}, \qquad \alpha_{v,t}(\gamma) = E[d_{t^\circ}]^{-1}E\big[D_{\mathrm{KL}}\big(\nu_{p,\lambda_{t^\circ}}\,\|\,\nu_{\lambda_{t^\circ}}\big)\big], \]
\[ W_{F,t}(\gamma) = \frac{\exp(-D_t\alpha_{f,t}(\gamma))}{\sqrt{D_t}^{\,q-1}\sqrt{(2\pi)^{q-1}\det(\Sigma_{\ell^\circ,p'})}}, \qquad \alpha_{f,t}(\gamma) = E[k_{t^\circ}]^{-1}E\big[D_{\mathrm{KL}}\big(\mu_{p',\ell_{t^\circ}}\,\|\,\mu_{\ell_{t^\circ}}\big)\big], \]
\[ W_{E,t}(\gamma) = \frac{\exp(-D_t\alpha_e(\gamma))}{\sqrt{D_t}^{\,q-1}\sqrt{(2\pi)^{q-1}q^{-q}}}, \qquad \alpha_e(\gamma) = D_{\mathrm{KL}}\big(\rho\,\|\,u_\Omega\big). \]
Using BAL' we notice that for all $\ell \in L_{F^\circ}$ we have $\mu_\ell|_* = u_\Omega$, obtained from the fact that $u_\Omega$ is a maximizer of $p \mapsto \sum_y\mu_\ell(y)\prod_hp(y_h)$, hence a stationary point, and taking the first derivatives. Further, using $p_* = p|_*$ for $p \in P(\Omega^{k_\ell})$, we have
\[ D_{\mathrm{KL}}\big(p\,\|\,\mu_\ell\big) = D_{\mathrm{KL}}\big(p\,\|\,\mu_{p_*,k_\ell}\big) + \log\Big(\frac{q^{-k_\ell}}{Z_{p_*,k_\ell}}\Big) + k_\ell\log(q) - H\big(p\,\|\,p_*^{\otimes k_\ell}\big) = D_{\mathrm{KL}}\big(p\,\|\,\mu_{p_*,k_\ell}\big) + \log\Big(\frac{q^{-k_\ell}}{Z_{p_*,k_\ell}}\Big) + k_\ell D_{\mathrm{KL}}\big(p_*\,\|\,u_\Omega\big) \ge k_\ell D_{\mathrm{KL}}\big(p_*\,\|\,u_\Omega\big). \]
With this result, the convexity of the relative entropy, the size-biased index $\hat\ell_{t^\circ}$ from Section 5, i.e. for $\ell \in L_{F^\circ}$ given by $P[\hat\ell_{t^\circ} = \ell] = \frac{k_\ell}{E[k_{t^\circ}]}P[\ell_{t^\circ} = \ell]$, and the fact that $\rho = E[\mu_{p',\hat\ell_{t^\circ}}|_*]$, we obtain
\[ \alpha_{f,t}(\gamma) \ge E[k_{t^\circ}]^{-1}E\big[k_{t^\circ}D_{\mathrm{KL}}\big(\mu_{p',\ell_{t^\circ}}|_*\,\|\,u_\Omega\big)\big] \ge \alpha_e(\gamma). \]
Since we further have $\alpha_{f,t}(D_tu_\Omega) - \alpha_e(D_tu_\Omega) = 0$ (with $\iota_{\ell^\circ}^{-1}(u_\Omega) = u_\Omega$ by BAL'), this implies that the Hessian $H_t$ of $f_t(\rho_{[q-1]}) = \alpha_{f,t}(\gamma) - \alpha_e(\gamma)$ at $\rho = u_\Omega$ is positive semi-definite. With $B$ and composing the Hessians from the proof of Proposition 6.3 we have the Hessian $\Sigma_{E,t}^{-1} = qB^tB$ (not depending on $t$) for the latter contribution and $\Sigma_{F,t}^{-1}$ for the former, so $H_t = \Sigma_{F,t}^{-1} - \Sigma_{E,t}^{-1}$. Now, we follow the proof of Proposition 6.3 to obtain
\[ D_t\alpha_t(\gamma) = \tfrac{1}{2}v^t\Sigma_t^{-1}v + O\Big(E^{(3)}_n\sqrt{\tfrac{\log(n)}{n}}\Big), \qquad v = \sqrt{D_t}^{\,-1}(\gamma - D_tu_\Omega)_{[q-1]}. \]
The fact that $\Sigma_t^{-1} = \Sigma_{V,t}^{-1} + H_t$ shows that $\Sigma_t^{-1}$ is positive definite with $\|\Sigma_t\| \le \|\Sigma_{V,t}\| = O(1)$ uniformly in $t$. Further, since $\Sigma_{E,t}^{-1} = q(I_{[q-1]} + 1_{[q-1]}1^t_{[q-1]})$ is positive definite with eigenvalues $q$, $q^2$ (and determinant $q^q$), we get $\|\Sigma_t^{-1}\| \le \|\Sigma_{V,t}^{-1}\| + \|\Sigma_{F,t}^{-1}\| = O(1)$ uniformly in $t$, which yields $\|\Sigma_t\|, \|\Sigma_t^{-1}\| = \Theta(1)$ uniformly in $t$. Using $\Sigma_{V,t} = \frac{E[d_t^2]}{E[d_t]}\Sigma_{E,t}$, we obtain $\det(\Sigma_{V,t}) = (E[d_t^2]/E[d_t])^{q-1}q^{-q}$. Following the proof we can take the asymptotics of the determinants to get
\[ r_{t,+} = \Big(1 + O\Big(E^{(3)}_n\sqrt{\tfrac{\log(n)}{n}}\Big)\Big)\sum_{\gamma\in B_n}W_t(\gamma), \qquad W_t(\gamma) = r^*_t\,\frac{h^{q-1}}{\sqrt{D_t}^{\,q-1}}\cdot\frac{\exp\big(-\tfrac{1}{2}v^t\Sigma_t^{-1}v\big)}{\sqrt{(2\pi)^{q-1}\det(\Sigma_t)}}, \qquad r^*_t = \sqrt{\det\Big(\frac{E[d_t^2]}{E[d_t]}\Sigma_t\Sigma_{V,t}^{-1}\Big)}, \]
and notice that $r^*_t = \Theta(1)$ uniformly since all eigenvalues of both matrices and the second moment are uniformly $\Theta(1)$. Recall that $\gamma_{[q-1]}$ sits on a lattice of lengths $h$ in all dimensions, hence $v$ is on a lattice with lengths $h\sqrt{D_t}^{\,-1}$ in all dimensions. Using the uniform bounds on the eigenvalues of $\Sigma_t$ we can approximate the Riemann sum by an integral over a growing domain of radius $c^*\sqrt{\log(n)}$ (in $1^\perp_{[q]}$ with the 2-norm), hence the error is of order $O(\sqrt{\log(n)/n})$, i.e. negligible.
Choosing $c^*$ sufficiently large ensures that the extension of the domain comes at a negligible cost, say $\sqrt{n^{-1}}$, hence we have
\[\frac{\bar Z_t}{Z^*_t} = \left(1+O\Big(E^{(3)}_n\sqrt{\tfrac{\log(n)}{n}}\Big)\right)r^*_t = \Theta(1),\]
uniformly in $t\in T_n$. The constant $r^*_t$ is of interest in its own right and provides further insights, but in this context we only need the uniform bounds.

9.9. Proof of Proposition 9.3.
First, notice that the discussion in Section 9.2 directly translates to $\hat\rho_t$, since
\[P[\hat\rho_t = \rho] = \frac{Z^*_t}{\bar Z_t}\,P[\gamma_{t,\sigma^*} = D_t\rho]\,P[\gamma_{y^*_t} = D_t\rho]\binom{D_t}{D_t\rho}^{-1}q^{D_t} = \left(1+O\Big(E^{(3)}_n\sqrt{\tfrac{\log(n)}{n}}\Big)\right)r^{*-1}_t\,P[\gamma_{t,\sigma^*} = D_t\rho]\,P[\gamma_{y^*_t} = D_t\rho]\binom{D_t}{D_t\rho}^{-1}q^{D_t}\]
uniformly in $\rho\in R_t$ and $t\in T^\circ_n$. Analogously to the bounds derived for $r_{t,-}$ and with the relative error bounds above we find $c, c'\in\mathbb R_{>0}$ such that $P[\|\hat\rho_t - u_\Omega\|\ge r]\le c'n^{q}\exp(-cr^2n)$ for all sufficiently large $n$, $t\in T_n$ and $r\in\mathbb R_{\ge0}$. In particular, if $r\ge r_n = \log(n)/\sqrt n$, then we can weaken $c$ to $c''\in(0,c)$ to absorb the polynomial factor and maintain the bound, since $q\log(n)/(c\log^2(n)) = o(1)$. For $r < r_n$ we can use the discussion of $r_{t,+}$ with uniform bounds on the smallest eigenvalue, which is ensured to be uniformly bounded away from zero, uniform bounds on the leading coefficient and the integral approximation to obtain uniform bounds up to $r_n$, and use the bound above on the remainder, then taking the smaller constant for the exponent and the sum of the coefficients. This completes the proof for $n\ge n_0$. For small $n\le n_0$ we notice that $\|\hat\rho_t - u_\Omega\|\le2$, so if the leading coefficient $c'$ is sufficiently large and the constant $c$ in the exponent sufficiently small, then the right-hand side is larger than $1$ for all choices of $n\le n_0$ and $r$ with $P[\|\hat\rho_t - u_\Omega\|\ge r] > 0$. This ensures the existence of $c, c'$ such that the assertion holds.

9.10. Proof of Proposition 9.4.
Proposition 9.4 is immediate from Section 9.8 combined with the discussion in Section 9.9.

9.11.
Proof of Proposition 9.1.
Recall that the results of Section 6 are also valid for $\rho^*_t$, hence for given $\varepsilon$ we can choose $r$ such that $\|\hat\rho_t - u_\Omega\| < r(E[d_t]\sqrt n)^{-1}$ and $\|\rho^*_t - u_\Omega\| < r(E[d_t]\sqrt n)^{-1}$ with probability at least $1-\varepsilon$, valid for all $n\in\mathbb Z_{>0}$ and $t\in T_n$ using the uniform bounds for $E[d_t]$. Since the relative error bounds are uniform for both models, the Radon-Nikodym derivative $P[\rho^* = \rho]/P[\hat\rho_t = \rho]$ is the ratio of the densities of the normal approximations up to a leading constant. This ratio can be bounded from above and below, uniformly for all sufficiently large $n$, $t\in T_n$ and all $\rho$ in the ball of radius $r(E[d_t]\sqrt n)^{-1}$ around $u_\Omega$. Finally, notice that
\[P[\hat\sigma_t = \sigma] = \frac{Z^*_t}{\bar Z_t}\cdot\frac{q^{D_t}}{q^{n}}\binom{D_t}{\gamma_{t,\sigma}}^{-1}P[\gamma_{y^*_t} = \gamma_{t,\sigma}]\]
by the discussion at the beginning of Section 9.8, which means that $\hat\sigma_t$ given $\gamma_{t,\hat\sigma_t}$ is uniform and hence equal in law to $\sigma^*$ given $\gamma_{t,\sigma^*}$. In particular the derivative of $\sigma^*$ with respect to $\hat\sigma_t$ is the derivative of $\rho^*$ with respect to $\hat\rho_t$ (constant on assignments with the same colour frequencies, to be precise).

We are left to show the assertion that $\sigma_{G^*_t(\sigma^*)}\in\mathcal E_t$ with probability at least $1-\varepsilon$ uniformly. With $c$ denoting the upper bound on the derivative of $\sigma^*$ with respect to $\hat\sigma_t$, notice that
\[P[\sigma_{G^*_t(\sigma^*)}\notin\mathcal E_t]\le E[1\{\sigma_{G^*_t(\sigma^*)}\notin\mathcal E_t,\,\sigma^*\in\mathcal E_t\}] + \varepsilon\le cE[1\{\sigma_{G^*_t(\hat\sigma_t)}\notin\mathcal E_t,\,\hat\sigma_t\in\mathcal E_t\}] + \varepsilon\le cE[1\{\sigma_{G^*_t(\hat\sigma_t)}\notin\mathcal E_t\}] + \varepsilon = cE[1\{\hat\sigma_t\notin\mathcal E_t\}] + \varepsilon\le(c+1)\varepsilon\]
uniformly in $n\in\mathbb Z_{>0}$ and $t\in T_n$. So, if we now choose $r^*\in\mathbb R_{>0}$ sufficiently large such that both $\hat\rho_t$ and $\rho^*_t$ attain frequencies in the corresponding ball with probability at least $1-(c+1)^{-1}\varepsilon$, then we obtain the $1-\varepsilon$ bound for $\sigma_{G^*_t(\sigma^*)}$. Hence, the assertion holds with $r^*$ and the bound $c^*$ on the derivative corresponding to $r^*$. This completes the proof.

9.12. Implications.
The results in Section 8 directly imply mutual contiguity of $\sigma^*$ and $\hat\sigma_{t_n}$ with $t_n$ from Section 8, since the assumptions of Proposition 9.1 are clearly met and the result is uniform in $t\in T_n$. The fact that $G^*_t(\sigma^*)$ and $G^*_t(\hat\sigma_t)$ (or $\hat G_t$ for that matter) conditional on a fixed ground truth obviously have the same law then yields mutual contiguity of the degree/assignment/factor graph triplets. For the same reason we obtain joint mutual contiguity for the factor side half-edge assignments $y_t(\sigma^*)$ and $y_t(\hat\sigma)$, combined with $t_n$, and further for $\sigma^*$ and $\hat\sigma$, or the corresponding factor graph models.

9.13. Pinning.
The pinned model is obtained from the regular model by fixing a subset $U\subseteq V_n$ and attaching constraints to the variables that fix the assignment to a uniformly random colour $\check\sigma_x\in\Omega$ for $x\in U$. Since this process is independent of everything else we have $E[\psi_{G_t,U}(\sigma)] = E[\psi_{G_t}(\sigma)]\,E[\prod_{x\in U}1\{\check\sigma_x = \sigma_x\}] = q^{-|U|}E[\psi_{G_t}(\sigma)]$. The result immediately translates to the partition function, implying that $\hat\sigma_{t,U}$ and $\hat\sigma_t$ have the same law and that thereby the mutual contiguity results also hold for pinned models.
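As a sanity check of the factor $q^{-|U|}$, here is a minimal Python sketch (hypothetical toy parameters, not part of the original text) that estimates $E[\prod_{x\in U}1\{\check\sigma_x = \sigma_x\}]$ by Monte Carlo.

```python
import random

# Minimal sketch: pinning |U| variables to independent uniform colours from
# a q-element alphabet keeps an assignment sigma with probability q^{-|U|},
# which is exactly the factor pulled out of E[psi_{G_t,U}(sigma)] above.
random.seed(0)
q, U, trials = 3, [0, 2, 5], 200_000
sigma = [random.randrange(q) for _ in range(8)]      # any fixed assignment

hits = sum(all(random.randrange(q) == sigma[x] for x in U)
           for _ in range(trials))
print(hits / trials, q ** -len(U))                   # ~0.037 vs 1/27
```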
10. TYPICAL ASSIGNMENTS
In this section we derive results for the variable side and factor side half-edge assignments introduced in Section 9. We use the model, notions and notation introduced in Section 9, further Section 8, Section 7 and Section 6. For the model introduced in Section 9 degrees and labels coincide, i.e. $\ell = k$ and $\lambda = d$ using the corresponding notation. The distributions $\mu_k$ are derived from $\psi_k$ introduced in Section 2.1, while the distributions on the variable side are given by $\nu_d$ as introduced in Section 9. Combining Section 8 and Section 7, we have $p_d, p_k\in\mathcal P_L$; let $A = \{(d,\chi): d\in\mathbb Z_{\ge0},\,\chi\in\Omega^d\}$ denote the joint support, let $\alpha^*_V\in\mathcal P(A)$ be given by $\alpha^*_V(d,\chi) = p_d(d)\nu_d(\chi)$ for $d$ in the support of $d$ and $\chi\in\Omega^d$, and let $\alpha^*_F\in\mathcal P(A)$ be given by $\alpha^*_F(k,y) = p_k(k)\mu_k(y)$ for $k$ in the support of $k$ and $y\in\Omega^k$, i.e. the expected assignment distributions on the variable and factor side. Based on Section 7 we let $\Delta_A$ also denote the metric on $\mathcal P^2$ induced by $\Delta_A$ on $\mathcal P_A$, i.e. $\Delta_A(\alpha,\alpha') = \Delta_A(\alpha_V,\alpha'_V) + \Delta_A(\alpha_F,\alpha'_F)$ for $\alpha,\alpha'\in\mathcal P^2$. Finally, we let $\alpha^* = (\alpha^*_V,\alpha^*_F)\in\mathcal P^2$.

For $n\in\mathbb Z_{>0}$, $t = (m,d,k)\in T^*_n$, $\sigma\in\Omega^{V_n}$ and $y\in\Omega^{A_t}$, we let $\alpha_{V,t,\sigma}\in\mathcal P_A$ denote the variable side half-edge assignment distribution, i.e. $\alpha_{V,t,\sigma}(d,\chi) = n^{-1}|\{i\in[n]: d_i = d,\,(\sigma_{x_i})_{h\in[d_i]} = \chi\}|$ for $(d,\chi)\in A$, and $\alpha_{F,t,y}\in\mathcal P_A$ denote the factor side half-edge assignment distribution, i.e. $\alpha_{F,t,y}(k,y') = m^{-1}|\{i\in[m]: k_i = k,\,y_i = y'\}|$ for $(k,y')\in A$ if $m > 0$. Finally, let $\alpha_s = (\alpha_{V,t,\sigma},\alpha_{F,t,y})\in\mathcal P^2$ with $s = (t,\sigma,y)$.

Now, for $n\in\mathbb N$, $t\in T_n$ and $\sigma\in\Omega^{V_n}$ recall $y^*_t(\sigma)$ from Section 9, let $s^*_t = (t,\sigma^*,y^*_t(\sigma^*))$ and $\hat s_t = (t,\hat\sigma,y^*_t(\hat\sigma_t))$ denote the coloured sequences for the two versions of the teacher-student scheme for given $t\in T_n$, and further $s^* = s^*_{t_n}$, $\hat s = \hat s_{t_n}$. Further, let $S_n$ denote the set of valid coloured sequences $s = (t,\sigma,y)$, i.e. we have $t\in T_n$, $\sigma\in\Omega^{V_n}$ and $y = y_{G,\sigma}$ for some $G$ in the support of $G_t$. Finally, for given $r\in\mathbb R_{>0}$ let $S^\circ_{n,r} = \{s\in S_n: \alpha_s\in B_r(\alpha^*)\}$.

As before, a sequence $f_n(s)$ with $s\in S_n$ and $n\in\mathbb N$ is sublinear in the number of factors if there exists $c\in\mathbb R_{>0}$ with $|f_n(s)|\le c + c\frac mn$ for all $s = (t,\sigma,y)\in S_n$ and $n\in\mathbb N$.

Proposition 10.1.
Assume that DEG and BAL hold. Then there exists $r_n\in\mathbb R_{>0}$ with $r_n = o(1)$ such that for all sequences $f_n(s)$ that are sublinear in the number of factors we have
\[E[f_n(s^*)] = E[f_n(s^*)1\{s^*\in S^\circ_{n,r_n}\}] + o(1) = E[f_n(s^*)\,|\,s^*\in S^\circ_{n,r_n}] + o(1),\]
and the same holds for $s^*$ replaced by $\hat s$.

Using Proposition 10.1 we fix a suitable choice of $r_n$ and let $S^\circ_n = S^\circ_{n,r_n}$ denote the set of valid typical coloured sequences. Notice that while we discuss the standard model for brevity, the entire section canonically translates to the case including dummy factors as discussed in Section 9, where the reference assignment distribution $\alpha^*_F$ is the distribution corresponding to $p^\circ_k$.
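To make the objects $\alpha_{V,t,\sigma}$, $\alpha_{F,t,y}$ and the event $\{\alpha_s\in B_r(\alpha^*)\}$ concrete, here is a minimal Python sketch (hypothetical toy data; the $L^1$ distance merely stands in for the metric $\Delta_A$) that tabulates the two empirical half-edge distributions and their distance to a reference.

```python
from collections import Counter

# Sketch: empirical half-edge assignment distributions for a toy coloured
# sequence s = (t, sigma, y). Variables carry one colour on all their
# half-edges; factors carry a colour tuple y_i of length k_i.
deg_v = [2, 2, 3]                      # d_i (hypothetical)
sigma = [0, 1, 0]                      # sigma_{x_i}, with q = 2 colours
y = [(0, 1), (0, 0, 1)]                # y_i, with k_i = len(y_i)

n, m, q = len(deg_v), len(y), 2
alpha_V = {k: v / n for k, v in
           Counter((d, (s,) * d) for d, s in zip(deg_v, sigma)).items()}
alpha_F = {k: v / m for k, v in
           Counter((len(t), t) for t in y).items()}

def dist(a, b):                        # L1 distance, standing in for Delta_A
    return sum(abs(a.get(k, 0) - b.get(k, 0)) for k in set(a) | set(b))

# reference alpha*_V under a hypothetical degree law and uniform nu_d
p_d = {2: 2 / 3, 3: 1 / 3}
alpha_star_V = {(d, (c,) * d): p_d[d] / q for d in p_d for c in range(q)}
print(alpha_V, alpha_F, dist(alpha_V, alpha_star_V))
```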
10.1. Half-edge assignments.

As opposed to the definition of the teacher-student scheme and the discussion in Section 10 we will work with assignments to the variable side half-edges directly, or equivalently with assignments to non-isolated variables. While there is almost a one-to-one correspondence between assignments $\sigma\in\Omega^{V_n}$ to the variables and assignments $\chi\in\Omega^{X_t}$ to the half-edges given $t\in T_n$, we discard assignments $\sigma_{x_i}$ to isolated variables $x_i\in V_n$ with $d_i = 0$, $i\in[n]$. Hence, this transition needs to be justified.

Let $n_t = nP[d_t > 0]$ denote the number of variables with non-trivial degree and let $\chi_{t,\sigma}\in\Omega^{X_t}$ for $\sigma\in\Omega^{V_n}$ and $t\in T_n$ be given by $\chi_{x_i,h} = \sigma_{x_i}$ for $i\in[n]$ and $h\in[d_i]$. For $\chi$ in the support of $\chi_t = \chi_{t,\sigma^*}$ and $G$ in the support of $G^*_t$ the definitions of $y_{G,\sigma}$, $\psi_G(\chi)$, $\alpha_{V,\chi}$ and hence $\alpha^*$ are completely analogous to the previous case and coincide. However, notice that with $Z'_G = \sum_\chi\psi_G(\chi)$ we have $Z_G = q^{n-n_t}Z'_G$. Let $G^*_t(\chi)$, the teacher-student model with ground truth $\chi$ (in the support of $\chi^*_t$), be given by the Radon-Nikodym derivative $G\mapsto\psi_G(\chi)/E[\psi_{G_t}(\chi)]$ with respect to $G_t$; then $G^*_t(\chi)$ and $G^*_t(\sigma)$ have the same law for all $\sigma\in\Omega^{V_n}$ with $\chi = \chi_{t,\sigma}$, implying that $y^*_t(\chi)$ and $y^*_t(\sigma)$ have the same law. Further, with $\hat\chi_t = \chi_{t,\hat\sigma_t}$ we have
\[P[\chi^*_t = \chi] = q^{n-n_t}q^{-n} = q^{-n_t},\qquad P[\hat\chi_t = \chi] = \frac{q^{n-n_t}E[\psi_{G_t}(\chi)]}{q^{n-n_t}E[Z'_{G_t}]} = \frac{E[\psi_{G_t}(\chi)]}{E[Z'_{G_t}]},\]
i.e. consistent definitions of $\chi^*_t$ and $\hat\chi$. The remaining notions directly translate, hence with the discussion above it is obvious that Proposition 10.1 holds if and only if it holds on the half-edge level.
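The factor $q^{n-n_t}$ for isolated variables is easy to verify on a toy instance; the following Python sketch (hypothetical pairwise weight functions, one isolated variable) checks $Z_G = q^{n-n_t}Z'_G$ by enumeration.

```python
import itertools, math

# Toy check of Z_G = q**(n - n_t) * Z'_G: isolated variables contribute a
# free factor q each, so the partition function over all n variables equals
# q**(#isolated) times the one over the non-isolated variables only.
q, n = 2, 4
factors = [((0, 1), lambda a, b: 1.5 if a == b else 0.5),
           ((1, 2), lambda a, b: 2.0 if a != b else 1.0)]   # x_4 is isolated

def Z(variables):
    total = 0.0
    for s in itertools.product(range(q), repeat=len(variables)):
        val = dict(zip(variables, s))
        total += math.prod(f(val[u], val[v]) for (u, v), f in factors)
    return total

print(Z(range(n)), q ** 1 * Z(range(3)))   # equal: one isolated variable
```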
10.2. Proof strategy.

We start with the main result that yields Proposition 10.1 as a corollary.
Proposition 10.2.
Assume that DEG and BAL hold. For all $\varepsilon\in\mathbb R_{>0}$ there exist constants $c, c'\in\mathbb R_{>0}$ such that the following holds. For all $n\in\mathbb N$ and all $t\in T^\circ_n$ we have
\[P[\Delta_A(\alpha_{s^*_t},\alpha^*)\ge\varepsilon]\le c'\exp(-cn),\qquad P[\Delta_A(\alpha_{\hat s_t},\alpha^*)\ge\varepsilon]\le c'\exp(-cn).\]

The proof of Proposition 10.2 is split into two parts. In the first part we show that the colour frequencies in both models are close to uniform with very high probability. In the second part we show that for colour frequencies sufficiently close to uniform the assignment distributions are indeed very close to the reference with very high probability.
Lemma 10.3.
Assume that DEG and BAL hold. Then there exist constants $c, c'\in\mathbb R_{>0}$ such that the following holds. For all $n\in\mathbb N$, all $t\in T^\circ_n$ and all $\varepsilon\in\mathbb R_{>0}$ we have
\[P[\|\rho^*_t - u_\Omega\|\ge\varepsilon]\le c'\exp(-c\varepsilon^2n),\qquad P[\|\hat\rho_t - u_\Omega\|\ge\varepsilon]\le c'\exp(-c\varepsilon^2n).\]

With the tail bounds in place we can focus on the centre, i.e. colour frequencies close to uniform.
Lemma 10.4.
Assume that DEG and BAL hold. Then for all $\varepsilon\in\mathbb R_{>0}$ there exist $\delta, c, c'\in\mathbb R_{>0}$ such that the following holds. For all $n\in\mathbb N$ and all $t\in T^\circ_n$ we have
\[P[\Delta_A(\alpha_{s^*_t},\alpha^*)\ge\varepsilon\,|\,\rho^*_t\in B_\delta(u_\Omega)]\le c'\exp(-cn)\]
and the same holds for $s^*_t$, $\rho^*_t$ replaced by $\hat s_t$, $\hat\rho_t$.

Proposition 10.2 is then an almost immediate consequence of Lemma 10.3 and Lemma 10.4.

10.3.
Proof of Lemma 10.3.
Notice that $\chi^*_t\sim\bigotimes_{i\in[n]}\nu_{d_i}$ and recall that $T^\circ_n$ satisfies the assumptions in Section 6 on both the variable and the factor side. Hence, Proposition 6.1 yields constants $c, c'\in\mathbb R_{>0}$ such that
\[P[\|\rho^*_t - u_\Omega\|\ge\varepsilon]\le c'\exp(-c\varepsilon^2n)\]
for all $n\in\mathbb N$, $t\in T^\circ_n$ and $\varepsilon\in\mathbb R_{\ge0}$. Proposition 9.3 yields constants $c, c'\in\mathbb R_{>0}$ such that the assertion holds for $\hat\rho_t$. Taking the maximum $c'$ and minimum $c$ completes the proof.

10.4. Proof of Lemma 10.4.
Fix $t = (m,d,k)\in T^\circ_n$, recall that $\chi^*_t\sim\bigotimes_{i\in[n]}\nu_{d_i}$ and let $y^*_t\sim\bigotimes_{i\in[m]}\mu_{k_i}$ be independent of everything else. For $\gamma$ in the support of $\gamma^*_t$ let $\chi_{t,\gamma} = (\chi^*_t\,|\,\gamma^*_t = \gamma)$ and $y_{t,\gamma} = (y^*_t\,|\,\gamma_{y^*_t} = \gamma)$ denote the half-edge assignments on the variable side and factor side for given $\gamma$ (with $\gamma_y$ denoting the colour frequencies of $y$, as introduced in Section 9). Notice that both $(\chi^*_t, y^*_t(\chi^*_t))\,|\,\gamma^*_t = \gamma$ and $(\hat\sigma_t, y^*_t(\hat\sigma_t))\,|\,\hat\gamma_t = \gamma$ have the same law as $(\chi_{t,\gamma}, y_{t,\gamma})$.

With the results in Section 7 we obtain $\delta, c, c'\in\mathbb R_{>0}$ such that for all $n\in\mathbb N$, $t\in T^\circ_n$ with $p_{d,t}\in B_\delta(p_d)$ and $p_{k,t}\in B_\delta(p_k)$ (with respect to the corresponding metric) and attainable $\rho\in B_\delta(u_\Omega)$ with $\gamma = E[d_t]n\rho$ we have
\[P[\Delta_A(\alpha_{V,t,\chi_{t,\gamma}},\alpha^*_V)\ge\varepsilon]\le c'\exp(-cn).\]
Further, a corresponding result holds on the factor side. By weakening the constants and using $m\sim m_n$ uniformly we obtain $\delta, c, c'\in\mathbb R_{>0}$ yielding uniform exponential tail bounds on both sides. Further, since we have $(p_{d,t}, p_{k,t})\to(p_d, p_k)$ uniformly in $t\in T^\circ_n$, the assumptions $p_{d,t}\in B_\delta(p_d)$ and $p_{k,t}\in B_\delta(p_k)$ are redundant for sufficiently large $n$. By readjusting the leading coefficient $c'$ the tail bounds are trivial for small $n$. Finally, the assertion follows from an $\varepsilon/2$ argument.

10.5. Proof of Proposition 10.2.
Using Lemma 10.4 we obtain uniform exponential tail bounds for the centre, i.e. restricted to $\rho^*_t\in B_\delta(u_\Omega)$ and $\hat\rho_t\in B_\delta(u_\Omega)$ respectively for some $\delta\in\mathbb R_{>0}$. With Lemma 10.3 we then obtain exponential tail bounds for $\rho^*_t\notin B_\delta(u_\Omega)$ and $\hat\rho_t\notin B_\delta(u_\Omega)$ respectively, which immediately yield the assertion by splitting the probability into the two regimes and weakening the constants.

10.6. Proof of Proposition 10.1.
With Proposition 10.2 we can construct a sequence $r_n\in\mathbb R_{>0}$, $n\in\mathbb N$, with $r_n = o(1)$ such that uniformly in $t\in T^\circ_n$ we have $s^*_t\in S^\circ_{n,r_n}$ and $\hat s_t\in S^\circ_{n,r_n}$ with high probability. Following the proof of Proposition 8.1 we can restrict to $t\in T^\circ_n$, and since $f$ is bounded on this subset the assertion follows from the above.

11. PROOF OF PROPOSITION 3.1
In the remainder we tacitly assume that DEG, BAL and further SYM are satisfied. Notice that DEG implies the corresponding assumptions in Section 9 and BAL implies BAL'.

Recall the valid numbers $N$ of variables and for $n\in\mathbb N$ the valid degree sequences $T_n$. For $n\in\mathbb N$, $t\in T_n$ and $\sigma\in\Omega^{V_n}$ let $r^*_\sigma$ denote the Radon-Nikodym derivative $G\mapsto\psi_G(\sigma)/\bar\psi_t(\sigma)$, $G$ in the support of $G_t$, of the teacher-student scheme $G^*_t(\sigma)$ with respect to the null model $G_t$, where $\bar\psi_t = E[\psi_{G_t}]$ denotes the expected total weight. Further, let $r^*$ denote the derivative $G\mapsto E[r^*_{\sigma^*}(G)]$ of $G^*_t(\sigma^*)$ with respect to $G_t$. Notice that we can keep the dependence of $r^*_\sigma$ and $r^*$ on $t$ implicit since $t$ is determined by $G$, i.e. the sets of factor graphs for distinct degree sequences are disjoint. The mutual information given $t$ and the unconditional mutual information are given by
\[I(t) = I(\sigma^*, G^*_t(\sigma^*)) = E\left[\log\left(\frac{r^*_{\sigma^*}(G^*_t(\sigma^*))}{r^*(G^*_t(\sigma^*))}\right)\right],\qquad I = I(\sigma^*, G^*_{t_n}(\sigma^*)) = E[I(t_n)].\]
We obtain the following proposition as a corollary. For this purpose recall the notions from Section 8, Section 9 and let $\Lambda(x) = x\log(x)$.

Proposition 11.1.
Under DEG, BAL and SYM we have
\[\frac1nI(\sigma^*, G^*_{t_n}(\sigma^*)) = \log(q) + E\left[\frac{\bar d}{\bar k\,\xi\,q^{k}}\sum_{y\in\Omega^{k}}\Lambda(\psi_k(y))\right] - \frac1nE\big[\log\big(Z_{G^*_{t_n}(\sigma^*)}\big)\big] + o(1).\]

11.1. Preliminaries.
Using Section 11 with respect to Proposition 3.1 allows us to restrict to $t\in T^\circ_n$. With respect to Proposition 11.1 and using the definitions of Section 11 we first notice that under SYM the mutual information per variable is sublinear in the number of factors: there exists $\varepsilon\in(0,1)$ with $\varepsilon^m\le\psi_G(\sigma)\le\varepsilon^{-m}$ uniformly for $G$ in the support of $G_t$, $\sigma\in\Omega^{V_n}$ and $t = (m,d,k)\in T_n$; hence the same holds for $\bar\psi_t(\sigma)$, further $|\log(r^*_\sigma(G))|, |\log(r^*(G))|\le2m\log(\varepsilon^{-1})$, and thereby $i^*_t = \frac1nI(t)$ is sublinear in the number of factors. With $i^* = E[i^*_{t_n}]$ we can hence use Proposition 8.1 to obtain $i^* = E[i^*_{t_n}1\{t_n\in T^\circ_n\}] + o(1)$, which again justifies the restriction to typical degree sequences.

For $t\in T_n$ we can rewrite $i^*_t$ as follows to extract the material contributions. The following steps can be traced algebraically using the definition of $i^*_t$, but we prefer to give the conceptual and more intuitive derivation using the conditional entropy $H(x|y)$ and the cross entropy $H(x\|y)$:
\[i^*_t = \frac1nD_{\mathrm{KL}}\big((\sigma^*, G^*_t(\sigma^*))\,\|\,\sigma^*\otimes G^*_t(\sigma^*)\big) = \frac1nH(\sigma^*) - \frac1nH(\sigma^*\,|\,G^*_t(\sigma^*))\]
\[= \frac1nH(\sigma^*) - \frac1nE\big[H\big(\sigma^*|G^*_t(\sigma^*)\,\big\|\,\sigma_{G^*_t(\sigma^*)}\big)\big] + \frac1nE\big[D_{\mathrm{KL}}\big(\sigma^*|G^*_t(\sigma^*)\,\big\|\,\sigma_{G^*_t(\sigma^*)}\big)\big] = \log(q) + (\eta^*_t - \phi^*_t) + i^*_{\mathrm{err}}(t),\]
\[\eta^*_t = \frac1nE\big[\log\big(\psi_{G^*_t(\sigma^*)}(\sigma^*)\big)\big],\qquad \phi^*_t = \frac1nE\big[\log\big(Z_{G^*_t(\sigma^*)}\big)\big],\qquad i^*_{\mathrm{err}}(t) = -\frac1nE\big[\log\big(r_{G^*_t(\sigma^*)}(\sigma^*)\big)\big],\qquad r_G(\sigma) = E\left[\frac{\bar\psi_t(\sigma)}{\bar\psi_t(\sigma_G)}\right].\]
The quantities $\eta^*_t$ and $\phi^*_t$ reflect the split of $\mu_G(\sigma) = \frac{\psi_G(\sigma)}{Z_G}$ into the weight $\psi_G(\sigma)$ and the normalization constant $Z_G$, and $\phi^*_t$ already appears on the right hand side of the assertion. Hence, we are left to derive the material contributions from $\eta^*_t$ and to show that the relative entropy per variable $i^*_{\mathrm{err}}(t)$ is negligible, where $r_G$ is the derivative of the posterior $\sigma_{G^*_t(\sigma^*)}$ with respect to the prior $\sigma^*|G^*_t(\sigma^*)$ given $G$ from $G^*_t(\sigma^*)$ (notice the leading minus sign in the definition of $i^*_{\mathrm{err}}(t)$).
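The decomposition above is an exact identity for any finite model, which makes it easy to sanity-check numerically. The following Python sketch (a toy teacher-student pair with hypothetical weight tables, not from the paper) enumerates everything and confirms $i^* = \log q + \eta - \phi + i_{\mathrm{err}}$.

```python
import itertools, math, random

# Toy check of the exact identity I/n = log q + (eta - phi) + err.
random.seed(0)
q, n = 2, 3
sigmas = list(itertools.product(range(q), repeat=n))
edges = [(0, 1), (1, 2)]         # fixed structure with 2 pairwise factors

def rand_table():                # a hypothetical strictly positive weight
    return {(a, b): random.uniform(0.5, 2.0) for a in range(q) for b in range(q)}
graphs = [tuple(rand_table() for _ in edges) for _ in range(5)]
P0 = [1 / len(graphs)] * len(graphs)

def psi(G, s):                   # total weight of assignment s under G
    return math.prod(t[(s[u], s[v])] for t, (u, v) in zip(G, edges))

# Teacher-student joint: P(s, G) = q^{-n} P0(G) psi_G(s) / E_{P0}[psi(s)]
Epsi = {s: sum(p * psi(G, s) for p, G in zip(P0, graphs)) for s in sigmas}
joint = {(s, i): q ** -n * P0[i] * psi(graphs[i], s) / Epsi[s]
         for s in sigmas for i in range(len(graphs))}
PG = [sum(joint[(s, i)] for s in sigmas) for i in range(len(graphs))]
Z = [sum(psi(G, s) for s in sigmas) for G in graphs]

I = sum(p * math.log(p / (q ** -n * PG[i])) for (s, i), p in joint.items())
eta = sum(p * math.log(psi(graphs[i], s)) for (s, i), p in joint.items()) / n
phi = sum(p * math.log(Z[i]) for (s, i), p in joint.items()) / n
# err = average KL between the posterior P(.|G) and the Boltzmann law mu_G
err = sum(p * math.log((joint[(s, i)] / PG[i]) / (psi(graphs[i], s) / Z[i]))
          for (s, i), p in joint.items()) / n

print(I / n, math.log(q) + eta - phi + err)   # the two numbers agree
```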
11.2. The material contribution.

For given $t = (m,d,k)\in T^\circ_n$ we add the conditioning level for the factor side assignments, i.e. $\eta^*_t = E[\eta^*_t(\sigma^*, y^*_t(\sigma^*))]$ with $\eta^*_t(\sigma,y) = \frac1nE[\log(\psi_{G^*_t(\sigma,y)}(\sigma))]$. With the results from Section 11.1 we notice that $\eta^*_t(\sigma,y)$ is sublinear in the number of factors, so we can use Proposition 10.1 to obtain $\eta^*_t = E[\eta^*_t(\sigma^*, y^*_t(\sigma^*))1\{(\sigma^*, y^*_t(\sigma^*))\in A^\circ_t\}] + o(1)$ uniformly for all $t\in T^\circ_n$. However, by the very definition of $G^*_t(\sigma,y)$ we have $\psi_{G^*_t(\sigma,y)}(\sigma) = \prod_{a\in F_m}\psi_{G^*_t(\sigma,y),a}(y_a)$, and the weights $\psi_{G^*_t(\sigma,y),a}$, $a\in F_m$, are drawn independently and independent of the bijection $g_{G^*_t(\sigma,y)}$. As discussed in Section 4.3, for $k$ in the support of $k$ and $y\in\Omega^k$ let $p^*_{k,y}\in\mathcal P((0,2)^{\Omega^k})$ be the law given by the derivative $\psi\mapsto\frac{\psi(y)}{\bar\psi_k(y)}\in\mathbb R_{>0}$ with respect to $\psi_k$; then we have $(\psi_{G^*_t(\sigma,y),a})_a\sim\bigotimes_{i\in[m]}p^*_{k_i,y_{a_i}}$ and further
\[\eta^*_t(\sigma,y) = \frac mn\sum_{i\in[m]}\frac1mE\big[\log\big(\psi_{G^*_t(\sigma,y),i}(y_{a_i})\big)\big] = \frac mnE\big[\log\big(\psi^*_{k_i,y_{a_i}}(y_{a_i})\big)\big]\]
using $\psi^*_{k,y}\sim p^*_{k,y}$ and $i\in[m]$ uniform. Since $\eta^*_t(\sigma,y) = \Theta(1)$ uniformly in $t\in T^\circ_n$ and for all $(\sigma,y)$ assuming SYM, and further $(k_i, y_{a_i})$ converges to $(k, y^*_k)$, $y^*_k\sim\mu_k$, in total variation distance for $(t,\sigma,y)\in S^\circ_n$ (to be precise, we have uniform bounds in 1-norm for the laws given $t$ and bounds on the 1-norm of the degree laws), this gives $\eta^*_t(\sigma,y) = \frac{\bar d}{\bar k}E[\log(\psi^*_{k,y^*_k}(y^*_k))] + o(1)$ uniformly for all valid typical coloured sequences $(t,\sigma,y)\in S^\circ_n$. With the discussion at the beginning of this section we obtain $\eta^*_t = \frac{\bar d}{\bar k}E[\log(\psi^*_{k,y^*_k}(y^*_k))] + o(1)$ uniformly for all $t\in T^\circ_n$ and further $E[\eta^*_{t_n}] = \frac{\bar d}{\bar k}E[\log(\psi^*_{k,y^*_k}(y^*_k))] + o(1)$. While this may be considered the natural form in terms of our proof strategy, the form of the assertion can be established by expanding the expectation over $y^*_k$ and using the derivative of $p^*_{k,y}$.
11.3. The negligible contribution.

The discussion in Section 11.1 allows us to restrict to $t\in T^\circ_n$, and as before sublinearity in the number of factors can also be easily obtained for $i^*_{\mathrm{err}}(t)$. Further, the fact that $i^*_{\mathrm{err}}(t) = \frac1nE[D_{\mathrm{KL}}(\sigma^*|G^*_t(\sigma^*)\,\|\,\sigma_{G^*_t(\sigma^*)})]$ directly yields $i^*_{\mathrm{err}}(t)\ge0$. Upper bounding $i^*_{\mathrm{err}}(t)$ is involved, since we consider the relative entropy given $G^*_t(\sigma^*)$, a model that is not as accessible as, say, $s^*_n$.

However, the derivative $r_G$ is an expectation by design and hence we can apply Jensen's inequality to $-\log(x)$ with respect to the inner expectation, yielding $-\log(r_G(\sigma))\le E[-\log(\bar\psi_t(\sigma)/\bar\psi_t(\sigma_G))]$ and hence
\[i^*_{\mathrm{err}}(t)\le\frac1nE\big[\log\big(\bar\psi_t(\sigma_{G^*_t(\sigma^*)})\big)\big] - \frac1nE\big[\log\big(\bar\psi_t(\sigma^*)\big)\big] = \delta^*_1(t) - \delta^*_2(t),\]
\[\delta^*_1(t) = E\Big[\frac1n\log\big(r_t(\sigma^*)\big)\Big],\qquad \delta^*_2(t) = E\Big[\frac1n\log\big(r_t(\sigma_{G^*_t(\sigma^*)})\big)\Big],\qquad r_t(\sigma) = \frac{P[\sigma^* = \sigma]}{P[\hat\sigma_t = \sigma]}.\]
Notice that $\delta^*_1(t) = \frac1nD_{\mathrm{KL}}(\sigma^*\,\|\,\hat\sigma_t)$ and $|\frac1n\log(r_t(\sigma))| = \frac1n|\log(E[\bar\psi_t(\sigma^*)]) - \log(\bar\psi_t(\sigma))|$, which yields $|\frac1n\log(r_t(\sigma))| < c^*$ uniformly in $t\in T^\circ_n$ and $\sigma\in\Omega^{V_n}$ for some $c^*\in\mathbb R_{>0}$. For given $\varepsilon\in(0,1)$ we summon Proposition 9.1 to obtain $c, r\in\mathbb R_{>0}$ such that uniformly in $t\in T^\circ_n$ we have $P[\sigma^*\notin\mathcal E_n], P[\sigma_{G^*_t(\sigma^*)}\notin\mathcal E_n] < \varepsilon$ and $|\log(r_t(\sigma))| < c$ for all $\sigma\in\mathcal E_n$ with $\mathcal E_n = \{\sigma\in\Omega^{V_n}: \|\gamma_{t,\sigma} - D_tu_\Omega\| < r\sqrt n\}$. Then we have $|i^*_{\mathrm{err}}(t)|\le\frac{2c}{n} + 2\varepsilon c^*$. Taking $\varepsilon$ to $0$ shows that $i^*_{\mathrm{err}}(t) = o(1)$ uniformly in $t\in T^\circ_n$, so $E[i^*_{\mathrm{err}}(t_n)] = o(1)$ and the assertion holds.

Proof of Proposition 3.1.
Since standard arguments (i.e., Section 5 in [13]) show that there exists a simple $G$ with the desired degree sequences with positive probability, the proposition is an immediate consequence of Sections 11.1-11.3. □
12. CONCENTRATION
In this section we focus on the central quantity discussed in this work, the quenched free entropy density. In the remainder we tacitly assume that DEG, BAL and SYM hold and reuse the conventions and notions from Section 8, Section 9 and Section 10. For $t\in T_n$ and a factor graph $G$ in the support of $G_t$ the free entropy density of $G$ is $\phi(G) = \frac1n\log(Z_G)$. Now, depending on our model the quenched free entropy densities given $t$ are $\bar\phi_t = E[\phi(G_t)]$, $\phi^*_t = E[\phi(G^*_t(\sigma^*))]$ and $\hat\phi_t = E[\phi(G^*_t(\hat\sigma_t))]$. From these we obtain the model dependent quenched free entropy densities by averaging over $t_n$, i.e. $\bar\phi = E[\bar\phi_{t_n}]$, $\phi^* = E[\phi^*_{t_n}]$ and $\hat\phi = E[\hat\phi_{t_n}]$. As before, the results of this section canonically translate to the factor pruned models as discussed in Section 9, combined with the argument in Section 12.8 which ensures that pathological cases can indeed be neglected.
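For intuition, the following Python sketch (a tiny hypothetical model; brute-force enumeration is only feasible for very small $n$) samples $\phi(G) = \log(Z_G)/n$ repeatedly and shows that its fluctuations around the mean are small, in the spirit of Proposition 12.1 below.

```python
import itertools, math, random, statistics

# Sketch: phi(G) = log(Z_G)/n concentrates over independent draws of a
# small random factor graph model (hypothetical parameters throughout).
random.seed(2)
q, n, m, k, eps = 2, 10, 8, 3, 0.5
states = list(itertools.product(range(q), repeat=n))

def phi():
    # m random k-ary factors with weight tables in [eps, 1/eps], cf. SYM
    factors = []
    for _ in range(m):
        scope = random.sample(range(n), k)
        table = {tau: random.uniform(eps, 1 / eps)
                 for tau in itertools.product(range(q), repeat=k)}
        factors.append((scope, table))
    Z = sum(math.prod(t[tuple(s[v] for v in sc)] for sc, t in factors)
            for s in states)
    return math.log(Z) / n

samples = [phi() for _ in range(100)]
print("mean:", statistics.mean(samples), "stdev:", statistics.stdev(samples))
```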
12.1. Null model.

As opposed to the teacher-student scheme, concentration of $\bar\phi_{t_n}$ around $\bar\phi$ can be easily obtained. In the first step we show concentration of $\phi(G_t)$ around $\bar\phi_t$ for any given $t\in T^\circ_n$.

Proposition 12.1.
There exist constants $c, c'\in\mathbb R_{>0}$ such that for all $n\in\mathbb N$, all $t\in T^\circ_n$ and all $r\in\mathbb R_{\ge0}$ we have
\[P[|\phi(G_t) - \bar\phi_t|\ge r]\le c'\exp(-cr^2n).\]

This result shows that for any $t\in T^\circ_n$ and $r_n\in\omega(n^{-1/2})$ we have $|\phi(G_t) - \bar\phi_t| < r_n$ with high probability, so the free entropy densities of almost all instances asymptotically coincide with their expectation. The next result implies that the same is true for the conditional expectations.

Proposition 12.2.
We have $\bar\phi_t = \bar\phi + o(1)$ uniformly for all $t\in T^\circ_n$.

Combining Proposition 12.2, which controls the free entropies globally via the conditional expectations, with Proposition 12.1, which controls the free entropies locally around the conditional expectation, gives sufficient control for the arguments in the remainder.

12.2.
Teacher student model.
As indicated in Section 10 we introduce another conditioning level based on the choice of assignment pairs $(\sigma,y)$. So, for $t\in T_n$, $\sigma\in\Omega^{V_n}$ and $y$ in the support of $y_t(\sigma)$ let $G^*_{t,\sigma,y} = (G^*_t(\sigma)\,|\,y^*_t(\sigma) = y)$ be the teacher-student model with the assignments on both sides fixed, and notice that the results from Section 4.3 can be directly applied to this model. Further, we introduce the corresponding conditional quenched free entropy density $\phi^*_{t,\sigma,y} = E[\phi(G^*_{t,\sigma,y})]$. In the first step we show concentration of $\phi(G^*_s)$ around $\phi^*_s$ for $s = (t,\sigma,y)$ in the support of $s^*_n$ with $t\in T^\circ_n$.

Proposition 12.3. There exist constants $c, c'\in\mathbb R_{>0}$ such that for all $n\in\mathbb N$, all $s = (t,\sigma,y)$ in the support of $s^*_n$ with $t\in T^\circ_n$ and all $r\in\mathbb R_{\ge0}$ we have
\[P[|\phi(G^*_s) - \phi^*_s|\ge r]\le c'\exp(-cr^2n).\]

This result shows that the free entropy densities of almost all instances asymptotically coincide with their expectation. The next result implies that the same is true for the conditional expectations.
Proposition 12.4.
Uniformly for all $s\in S^\circ_n$ we have $\phi^* = \phi^*_s + o(1) = \hat\phi + o(1)$.

While Proposition 12.4 ensures the equivalence of the quenched free entropy densities, and concentration when combined with Proposition 12.3, we will derive significantly stronger exponential tail bounds for the Nishimori model in Section 17.2.

12.3.
Proof strategy.
The following result ensures that it is sufficient to restrict to typical degree sequences $t\in T^\circ_n$. Further, an immediate consequence is that the quenched free entropy densities are bounded.

Lemma 12.5.
We have $E[\bar\phi_{t_n}] = E[\bar\phi_{t_n}1\{t_n\in T^\circ_n\}] + o(1)$ and the same holds for $\bar\phi$ replaced by $\phi^*$ and $\hat\phi$.

Proof. Using $\varepsilon$ from SYM we have uniform bounds for $\phi(G)$ for all $G$ in the support of $G_t$ given $t = (m,d,k)\in T_n$, namely
\[\log(q) + \frac mn\log(\varepsilon)\le\phi(G)\le\log(q) + \frac mn\log(\varepsilon^{-1}),\]
so $\phi(G)$ is sublinear in the number of factors. Hence, any conditional expectation is also sublinear, which completes the proof using Proposition 8.1. □

Hence, we can safely restrict to typical degree sequences for all proofs. Proposition 12.1 then immediately follows from Azuma's inequality combined with the switching method, discussed in Section 12.4. Proposition 12.2 follows from a coupling argument that ensures Lipschitz continuity of the conditional expectations, discussed in Section 12.5.

For the teacher-student models we follow the same strategy on a more granular level. The first result ensures that we can restrict to typical assignments.
Lemma 12.6.
We have $E[\phi(G^*_{s^*_n})] = E[\phi^*_{s^*_n}1\{s^*_n\in S^\circ_n\}] + o(1)$ and the same holds for $s^*_n$ replaced by $\hat s_n$.

Proof. In the proof of Lemma 12.5 we observed that $\phi(G)$ is sublinear in the number of factors, hence we can use Proposition 10.1. □

Now, the proof of Proposition 12.3 in Section 12.6 and the proof of Proposition 12.4 in Section 12.7 follow the same strategy as their counterparts for the null model, with an additional layer of complexity.

12.4.
Proof of Proposition 12.1.
The proof of Proposition 12.1 is based on Azuma's inequality. For this purpose fix $t = (m,d,k)\in T^\circ_n$ and consider $G$ in the support $\mathcal G_t$ of $G_t$ as an element of the product space $G\in\prod_{i\in[m]}\mathcal G_{t,i}$ with $\mathcal G_{t,i} = X^{k_i}_t\times\Psi_{k_i}$. This allows us to canonically extend the notation for assignments to factor graphs, i.e. for $i\in[m]$ the coordinate $G_i = ((g^{-1}(a_i,h))_{h\in[k_i]},\psi_{a_i})$ encodes the wiring and weight function of the factor $a_i$.

Now, let $\ell\in[m]$ and $G_{\mathrm a}, G_{\mathrm b}\in\mathcal G_t$ be given such that $G_{\mathrm a,[\ell-1]} = G_{\mathrm b,[\ell-1]}$. Recall that $G_t$ is obtained by a uniformly random choice of $g_t$ and independent choices of $\psi_{G,a_i}$ for $i\in[m]$. Hence, $\tilde G_{\mathrm r}\sim G_t\,|\,G_{t,[\ell]} = G_{\mathrm r,[\ell]}$ for $\mathrm r\in\{\mathrm a,\mathrm b\}$ is obtained by a uniformly random completion of $g^{-1}_{G_{\mathrm r},[\ell]}$ and independent choices of the remaining weight functions. This means that we obtain the following canonical coupling of $\tilde G_{\mathrm a}$ and $\tilde G_{\mathrm b}$. For any instance $G$ from $\tilde G_{\mathrm a}$ obtain $G' = \iota(G)$ by replacing $\psi_{G,\ell}$ with $\psi_{G_{\mathrm b},\ell}$ and successively switching the wires $(\ell,h)$ with $g_G(g^{-1}_{G_{\mathrm b},\ell}(\ell,h))$ for $h\in[k_\ell]$. It is obvious from the construction that $G'$ is an instance of $\tilde G_{\mathrm b}$, and further that reversing the construction recovers $G$ from $G'$, hence $\iota$ is a bijection. This in turn shows that $\iota(\tilde G_{\mathrm a})\sim\tilde G_{\mathrm b}$. Further, next to the factor $a_\ell$ the maximum number of coordinates $G_i$, $i\in[m]\setminus[\ell]$, changed by $\iota$ is upper bounded by the maximum number of rewirings, i.e. by $k_\ell$. Using SYM and the definition of the free entropy density this gives $|\phi(G) - \phi(G')| < \frac{k_\ell+1}{n}\log(\varepsilon^{-1})$.

So, under this coupling and using the triangle inequality we have
\[|E[\phi(\tilde G_{\mathrm a})] - E[\phi(\tilde G_{\mathrm b})]| = |E[\phi(\tilde G_{\mathrm a}) - \phi(\iota(\tilde G_{\mathrm a}))]| < \frac{k_\ell+1}{n}\log(\varepsilon^{-1}).\]
Since this bound is uniform in the choice of $G_{\mathrm b}$ we obtain the bound
\[\big|E[\phi(G_t)\,|\,G_{t,[\ell]} = G_{\mathrm a,[\ell]}] - E[\phi(G_t)\,|\,G_{t,[\ell-1]} = G_{\mathrm a,[\ell-1]}]\big| = |E[\gamma(G_{\mathrm a,\ell}) - \gamma(\tilde G)]|\le\frac{k_\ell+1}{n}\log(\varepsilon^{-1}),\]
\[\gamma(G) = E[\phi(G_t)\,|\,G_{t,[\ell-1]} = G_{\mathrm a,[\ell-1]},\,G_{t,\ell} = G],\qquad \tilde G = (G_{t,\ell}\,|\,G_{t,[\ell-1]} = G_{\mathrm a,[\ell-1]}).\]
Since this bound is uniform in the choice of $G_{\mathrm a}$, the corresponding Doob martingale has bounded differences almost surely and Azuma's inequality yields
\[P\big[|\phi(G_t) - \bar\phi_t|\ge r\big]\le2\exp(-c_tr^2n),\qquad c_t = \frac{1}{2\log^2(\varepsilon^{-1})\frac mnE[(k_t+1)^2]} = \frac{1+o(1)}{2\log^2(\varepsilon^{-1})\frac{\bar d}{\bar k}E[(k_t+1)^2]}\]
While Proposition 12.1 allows to control the fluctuations of the freeentropy density locally, i.e. for given t ∈ T ◦ n , Proposition 12.2 allows to control the fluctuations under avariation of the degree sequences. However, the proof strategy is fairly similar. Since the setup for thediscussion of the teacher student scheme is related but far more involved, we discuss the steps in detail.First we notice that ¯ φ t = ¯ φ t ′ if t ′ is obtained from t by only relabeling factors and variables. Hence, theconditional quenched free entropy ¯ φ t only depends on the absolute degree frequencies on both sides.Intuitively, this means that for t , t ′ ∈ T ◦ n we may assume without loss of generality that the degree se-quences are sorted such that the difference on both sides is minimized, i.e. iteratively for increasing d ∈ D we equip min( n P [ d t = d ], n P [ d t ′ = d ]) variables with degree d and keep the difference (in anyorder) at the end. Then we proceed analogously on the factor side. For transparency, let n g ∈ [ n ] de-note the number of good variables, i.e. d t ,[ n g ] = d t ′ ,[ n g ] by our construction above. Analogously, we have m g ∈ [min( m t , m t ′ )] good factors with k t ,[ m g ] = k t ′ ,[ m g ] . The remaining variables I b = [ n ] \ [ n g ] variablesare flagged as bad, so are the remaining factors A b = [ m t ] \ [ m g ] in t and factors A ′ b = [ m t ′ ] \ [ m g ] in t ′ . Finally, assume without loss of generality that the total degree of t is at least the total degree of t ′ ,i.e. E [ d t ] n ≥ E [ d t ′ ] n .Now, we couple G t and G t ′ by choosing the weights for the factors a i , i ∈ [ m g ], identically from ψ k t , i since k t , i = k t ′ , i and independently for A b and A ′ b . Further, we draw the bijection g : E [ d t ] n → E [ d t ] n for G t uniformly and project it down to a bijection g ′ : E [ d t ′ ] n → E [ d t ′ ] n for G t ′ using the switchingmethod, i.e. by rewiring all positions in [ E [ d t ′ ] n ] pointing to [ E [ d t ] n ] \ [ E [ d t ′ ] n ] with the positions in[ E [ d t ] n ] \ [ E [ d t ′ ] n ] pointing to [ E [ d t ′ ] n ] in order of appearance. This perspective induces a partition ofthe variable side half-edges X t , namely the good half-edges X g of the variables x i , i ∈ [ n g ], the bad half-edges X bc that G t and G t ′ have in common with respect to the relative representations above, and thebad half-edges X be that correspond to [ E [ d t ] n ] \ [ E [ d t ′ ] n ]. Now, the switching method only affects goodfactors a i , i ∈ [ m g ] that have already turned bad by the wiring, i.e. that connect to X t \ X g . In otherwords, the good factors a i , i ∈ [ m g ], connecting to A g are not affected by the switching and are therebythe factors on which we know G t and G t ′ to coincide under this coupling. ence, the maximum number of factors on which G t and G t ′ differ under this coupling is given bymax( | A b | , | A ′ b | ) + | X t \ X g | . So, in terms of the free entropy density for given ( G , G ′ ) drawn from the cou-pling we consider the partition of [max( m t , m t ′ )] into the good factors A g , the good factors A bw turnedbad by the wiring, the bad factors A b given by the difference of t and t ′ , and finally some additionaldummy factors A d with constant weights ψ G , a i = i ∈ A d in case m t ′ > m t . Then we have φ ( G ) < | A bw ∪ A b ∪ A d | n log( ε − ) + n log ÃX σ Y i ∈ A g ψ G , a i ( σ ∂ a i ) ! ≤ | A bw ∪ A b ∪ A d | n log ¡ ε − ¢ + φ ( G ′ )and the lower bound follows analogously. 
Now, notice that
\[|A_{\mathrm b}| + |A'_{\mathrm b}| = \sum_k\big|m_tP[k_t = k] - m_{t'}P[k_{t'} = k]\big|,\qquad |X_t\setminus X_{\mathrm g}| + |X_{t'}\setminus X_{\mathrm g}| = \sum_dd\,\big|nP[d_t = d] - nP[d_{t'} = d]\big|,\]
so with $|A_{\mathrm{bw}}|\le|X_t\setminus X_{\mathrm g}|$ (respectively the maximum of the two in general) and $|A_{\mathrm b}\cup A_{\mathrm d}|\le\max(|A_{\mathrm b}|, |A'_{\mathrm b}|)$ the above gives bounds that match the order (while we still counted a fair number of factors as being bad although they might connect to the same variables in both models, only that the degrees of the variables differ).

Finally, by the choice of metric for the degree sequences, multiple applications of the triangle inequality in order to obtain the distance in terms of the reference distributions yield
\[|A_{\mathrm b}| + |A'_{\mathrm b}| = o(n),\qquad |X_t\setminus X_{\mathrm g}| + |X_{t'}\setminus X_{\mathrm g}| = o(n),\]
i.e. bounds that are uniform over any choice of $t, t'\in T^\circ_n$ and $(G, G')$ from the corresponding coupling, so $|\bar\phi_t - \bar\phi_{t'}| = o(1)$ uniformly in $t, t'$ and hence $|\bar\phi_t - \bar\phi| = o(1)$ uniformly in $t$ (using Proposition 8.1).
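The switching projection used in this proof is easy to experiment with. The following Python sketch (hypothetical tiny sizes) projects a uniform bijection on $[N]$ down to a bijection on $[N']$ by rewiring in order of appearance, and the frequency check is consistent with the projection again being uniform.

```python
import random
from collections import Counter

# Sketch of the switching method: positions in [N'] that point outside
# [N'] are rewired, in order of appearance, to the values in [N'] that
# are hit from outside [N'], yielding a bijection on [N'].
def project(g, Np):
    gained = iter(g[i] for i in range(Np, len(g)) if g[i] < Np)
    return tuple(v if v < Np else next(gained) for v in g[:Np])

N, Np, trials = 5, 3, 120_000
counts = Counter()
for _ in range(trials):
    g = list(range(N)); random.shuffle(g)
    counts[project(g, Np)] += 1
print(counts)   # all 3! = 6 bijections appear with frequency ~ trials/6
```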
12.6. Proof of Proposition 12.3.

The proof of this result is very similar to the proof of Proposition 12.1, however with an additional layer of complexity due to the assignment pairs. Recall the distribution of $G^*_s$, $s = (t,\sigma,y)$ in the support of $s^*_n$, from Section 4.3 and notice that it is very similar to obtaining $G_t$, with reweighted distributions for the weight functions and restrictions in the choice of bijections.

The proof via Azuma's inequality for this model is now in almost complete analogy to the proof of Proposition 12.1, only that the completions for the bijections have to be chosen separately, while the switching method is not affected (consistency with colours is preserved by switching, since the assignment pair $(\sigma,y)$ coincides in both models). Recalling the result, this gives
\[c' = c_s = \frac{1}{2\log^2(\varepsilon^{-1})\frac{m_t}{n}E[(k_t+1)^2]}\]
using only SYM and for all $s$. Using DEG we obtain the uniform bound $c\in\mathbb R_{>0}$ for $s$ with $t\in T^\circ_n$ and $n\in\mathbb N$.

12.7. Proof of Proposition 12.4.
The result follows from a combination of the concepts from the proof of Proposition 12.2 and the model introduced in Section 4.3. First, we observe the invariance of $\phi^*_s$ with respect to a relabeling of variables and factors. Fix $s\in S^\circ_n$. Since only the frequencies on both sides are relevant, we may assume that the sequences are sorted as in the proof of Proposition 12.2, with the corresponding partitions into good and bad factors as well as good and bad variables. The model introduced in Section 12.6 allows for the same switching strategy, however this time we draw the $q$ bijections separately, each bijection wiring half-edges of colour $\omega\in\Omega$, for the model with the most half-edges of colour $\omega$ (a quantity that depends on $t$, $t'$, $(\sigma,y)$ and $(\sigma',y')$ only). As before, all good factors that don't turn bad
As discussed in Section 9, all arguments for fixed sequences t canonically trans-late to the factor pruned model. Hence, the only missing argument is that the free entropy of the gener-alized factor graphs in Section 9 for arbitrary sequences is still sublinear in the number of factors, i.e. weonly need to establish Lemma 12.5. But this result is immediate from the definition of the factor graphs,since the number of non-trivial factors, i.e. factors whose weight functions are not constant 1, can alwaysbe upper bounded by the total number of factors.13. P ROOF OF P ROPOSITION t ε , n introduced in Section 9 and t n = t n as introduced in Section 8 let φ ∗ ( ε , n ) = E [ φ ( G ∗ t ε , n ( σ ∗ )].We say that φ ∗ is asymptotically continuous in ε ∗ m ∈ [0, 1) if for all ε ∈ R > there exists δ ∈ R > such that forall ε m ∈ [0, 1) ∩ B δ ( ε ∗ m ) there exists n ∈ N such that | φ ∗ ( ε m , n ) − φ ∗ ( ε ∗ m , n ) | < ε for all n ∈ N with n ≥ n .Further, φ ∗ is asymptotically continuous in the number of factors if the above holds for all ε m . This prop-erty ensures that φ ∗ can be asymptotically approximated, without assuming that a limit lim n →∞ φ ∗ ( ε , n )exists, and without enforcing uniform convergence in that n may depend on the choice of the parame-ter. Proposition 13.1.
The quenched free entropy density $\phi^*$ is asymptotically continuous in the number of factors.

Proof. This result is immediate by combining Proposition 12.4 with the coupling in Section 12.7 used to obtain Proposition 12.4, since we derived bounds in terms of the distance of coloured sequences $s$. □

Proof of Proposition 3.3.
We proceed with the proof for the configuration model and discuss the translation to simple factor graphs at the end. Proposition 13.1 directly translates to degree distributions as follows. We equip $T^*_n$ with the product metric induced by $\Delta$ discussed in Section 8, preferably omitting the parts ensuring convergence of the higher moments. Since the underlying assignment distributions $\mu_k$ of the reference distribution (in Section 10) given $k$ are invariant to the choice of the degree distribution (analogously on the variable side), Proposition 12.4 with the coupling in Section 12.7 ensures that $|\phi^*_{1,n} - \phi^*_{2,n}|$ is small for $n$ sufficiently large if $\Delta(t_1, t_2)$ is small, with $\phi^*_{i,n}$ denoting the quenched free entropy density of the teacher-student model and $t_i\in T^\circ_{i,n}$ denoting typical degree sequences obtained from the degree distributions $(p_{d,i}, p_{k,i})$. This ensures that it is indeed sufficient to work with finitely supported degree distributions in order to approximate the quenched free entropy density (in the limit).

We are left to show that the Bethe functional is also continuous with respect to the degree distributions; then Proposition 3.3 follows from an $\varepsilon$-argument. For given $d$ in the support of $d$, $k_i$ in the support of $k$, $h_i\in[k_i]$ and $\psi_i$ in the support of $\psi_{k_i}$ for $i\in[d]$, and finally $\mu_{i,j}$ in the support of $\pi\in\mathcal P^*(\Omega)$ for $j\in[k_i]$, with $k = (k_i)_i$, $h = (h_i)_i$, $\psi = (\psi_i)_i$ and $\mu = (\mu_{i,j})_{i,j}$, let
\[M_V(d,k,h,\psi,\mu) = \sum_{\omega\in\Omega}\prod_{i=1}^d\sum_{\tau\in\Omega^{k_i}}1\{\tau_{h_i} = \omega\}\,\psi_i(\tau)\prod_{j\in[k_i]\setminus\{h_i\}}\mu_{i,j}(\tau_j).\]
Using the canonical bounds for $\mu$ and SYM yields
\[q\,\varepsilon^d\prod_{i=1}^dq^{-(k_i-1)}\le M_V(d,k,h,\psi,\mu)\le q\,\xi^d\prod_{i=1}^dq^{k_i-1}\]
uniformly in $\psi$ and $\mu$. On the other hand, with $\pi\in\mathcal P^*(\Omega)$ and with respect to Equation 2.3 the first contribution to the Bethe functional is
\[B_1(\pi) = E\Big[q^{-1}\xi^{-d}\Lambda\Big(M_V\big(d,(\hat k_i)_i,(h_{\hat k_i,i})_i,(\psi_{\hat k_i,i})_i,(\mu_{i,j,\pi})_{i,j}\big)\Big)\Big],\]
which can be uniformly bounded by
\[E\Big[\log\Big(q\,\varepsilon^d\prod_{i=1}^dq^{-(\hat k_i-1)}\Big)\Big]\le B_1(\pi)\le E\Big[\log\Big(q\,\xi^d\prod_{i=1}^dq^{\hat k_i-1}\Big)\Big],\]
and is in particular finite. For the second contribution and fixed $k$ we obtain the uniform bounds
\[M_F(k,\psi_k,(\mu_{j,\pi})_j) = \sum_\tau\psi_k(\tau)\prod_j\mu_{j,\pi}(\tau_j)\le q^{k}\xi,\qquad M_F(k,\psi_k,(\mu_{j,\pi})_j)\ge q^{-k}\varepsilon\]
inside the logarithm, and the expectation $E[M_F(k,\psi_k,(\mu_{j,\pi})_j)] = \xi$; hence the second contribution
\[B_2(\pi) = \frac{E[d]}{E[k]}E\Big[(k-1)\Lambda\Big(M_F\big(k,\psi_k,(\mu_{j,\pi})_j\big)\Big)\Big]\]
can be uniformly bounded by
\[\frac{E[d]}{E[k]}E\big[(k-1)\xi\log\big(q^{-k}\varepsilon\big)\big]\le B_2(\pi)\le\frac{E[d]}{E[k]}E\big[(k-1)\xi\log\big(q^{k}\xi\big)\big],\]
so in particular the expectations in the Bethe functional are finite. However, most importantly the above shows that the Bethe functional as a function of the degrees $d$, $k$ is uniformly continuous in the following sense. Let $d'$, $k'$ be finitely supported degrees such that both $d'$, $k'$ are close to $d$, $k$ and $\hat d'$, $\hat k'$ are close to $\hat d$, $\hat k$ in total variation (which gives bounds on the distance of the first moments); then so are the Bethe functionals, uniformly for $\pi\in\mathcal P^*(\Omega)$. The argumentation is similar to the discussion in Section 5.8. The fact that the expectations are finite ensures that we can cut the tails (in $d$, $k = (k_i)_{i\in[d]}$) at arbitrarily small loss, leaving us with a uniform bound for the remaining contributions.
Choosing suitable (finitely supported) distributions $d'$, $k'$ sufficiently close to $d$, $k$ then ensures that cutting the tails with respect to $d'$, $k'$ comes at an arbitrarily small loss; further, using the uniform bounds for the remainder, we obtain a uniform bound on the distance of the Bethe functionals in terms of the distance of the degree distributions, thereby ensuring uniform continuity. This immediately translates to $\sup_{\pi\in\mathcal P^*(\Omega)}B(\pi)$.

With these continuity results we can show that the quenched free entropy density coincides with the supremum of the Bethe functional, given that the assertion holds for finitely supported degree distributions. For any given $\varepsilon$ choose degrees $d'$, $k'$ with finite support close to $d$, $k$ in the metric above. Then the distance of the suprema of the Bethe functional with respect to the two pairs of degree distributions can be bounded by $\varepsilon/3$. Further, for $n$ sufficiently large the quenched free entropy density with respect to $d'$, $k'$ is at distance at most $\varepsilon/3$ from the supremum of the Bethe functional with respect to $d'$, $k'$, since we obtained the results for bounded degrees. But by the continuity result for the quenched free entropy density above, we know that for $n$ sufficiently large the quenched free entropy densities with respect to $d'$, $k'$ and with respect to $d$, $k$ are also at distance at most $\varepsilon/3$. Taking $\varepsilon$ to $0$ completes the proof. Since standard arguments (i.e., Section 5 in [13]) show that there exists a simple $G$ with the desired degree sequences with positive probability, the proposition readily follows. □
14. PROOF OF PROPOSITION 3.6
14.1. Overview.
For a given $\varepsilon > 0$ let $m_{\varepsilon,n}$ be a Poisson variable with mean $(1-\varepsilon)\bar dn/\bar k$. Moreover, let $G_{\varepsilon,n}$ be the random factor graph with variable nodes $x_1,\ldots,x_n$ and factor nodes $a_1,\ldots,a_{m_{\varepsilon,n}}$ obtained as follows. Let
\[X = \bigcup_{i=1}^n\{x_i\}\times[d_i],\qquad A = \bigcup_{i=1}^{m_{\varepsilon,n}}\{a_i\}\times[k_i]\]
contain clones of the variable nodes $x_1,\ldots,x_n$ and of the factor nodes, respectively. Then choose a maximal matching $\Gamma_{\varepsilon,n}$ of the complete bipartite graph on the vertex classes $X$, $A$. For each matching edge we insert the corresponding variable-factor node edge into $G_{\varepsilon,n}$. Finally, for each factor node $a_i$ we choose a weight function $\psi_{a_i}$ independently from the distribution $P$.

Let $\hat G_{\varepsilon,n}$, $G^*_{\varepsilon,n}$ be the random factor graph models obtained from $G_{\varepsilon,n}$ via (4.2), (4.3). Further, let $\sigma^*_n: \{x_1,\ldots,x_n\}\to\Omega$ be a uniformly random assignment. Since $E[m_{\varepsilon,n}] < \bar dn/\bar k - \Omega(n)$, w.h.p. some of the variable clones from $X$ remain vacant in the random factor graph $G^*_{\varepsilon,n}$. Let $\mathcal C^*$ denote the set of all such vacant clones. As before, we refer to them as the cavities. Further, let $(y_{i,j})_{i,j\ge1}$ be a sequence of uniformly chosen independent cavities. Also let $d_\varepsilon$ be a random variable with distribution ${\rm Bin}(d, 1-\varepsilon)$.
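The clone construction is straightforward to implement; here is a minimal Python sketch (hypothetical degree values, not from the paper) of a uniformly random maximal matching between variable and factor clones, with the unmatched variable clones playing the role of the cavities.

```python
import random

# Sketch of the clone construction: variable clones X = U_i {x_i} x [d_i]
# and factor clones A = U_j {a_j} x [k_j] are joined by a uniformly random
# maximal matching; unmatched variable clones are the cavities.
random.seed(1)
deg_v = [2, 3, 1, 2]            # target degrees d_i (hypothetical)
deg_f = [3, 2]                  # target degrees k_j (hypothetical)

X = [(i, h) for i, d in enumerate(deg_v) for h in range(d)]
A = [(j, h) for j, k in enumerate(deg_f) for h in range(k)]
random.shuffle(X)               # uniform matching since here |A| <= |X|

edges = [(x[0], a[0]) for x, a in zip(X, A)]   # variable-factor edges
cavities = X[len(A):]                          # vacant variable clones
print("edges:", edges)
print("cavities:", cavities)
```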
By Remark 4.15, the main step toward the proof of Proposition 3.6 is to show the following.

Proposition 14.1. We have
\[E\big[\log Z_{G^*_{\varepsilon,n+1}}\big] - E\big[\log Z_{G^*_{\varepsilon,n}}\big]\le E\left[q^{-1}\xi^{-d_\varepsilon}\Bigg(\sum_{\sigma\in\Omega}\prod_{i=1}^{d_\varepsilon}\psi_{\hat k_i,i}\big(\sigma,\sigma^*_{y_{i,2}},\ldots,\sigma^*_{y_{i,\hat k_i}}\big)\Bigg)\log\left\langle\sum_{\sigma\in\Omega}\prod_{i=1}^{d_\varepsilon}\psi_{\hat k_i,i}\big(\sigma,\sigma_{y_{i,2}},\ldots,\sigma_{y_{i,\hat k_i}}\big)\right\rangle_{G^*_{\varepsilon,n}}\right]\]
\[-\frac{(1-\varepsilon)\bar d}{\xi\bar k}E\Big[(k_\psi-1)\,\psi\big(\sigma^*_{y_1},\ldots,\sigma^*_{y_{k_\psi}}\big)\log\big\langle\psi\big(\sigma_{y_1},\ldots,\sigma_{y_{k_\psi}}\big)\big\rangle_{G^*_{\varepsilon,n}}\Big] + o(1).\]

To prove Proposition 14.1 we couple the random factor graphs $G^*_{\varepsilon,n+1}$ and $G^*_{\varepsilon,n}$. Specifically, for each $j$ in the support of $k$ let $M_j$ be a random variable with distribution ${\rm Po}((1-\varepsilon)\bar dP[k = j]n/\bar k)$. Further, let $\Delta_j$ be a random variable with distribution $\Delta_j\sim{\rm Po}((1-\varepsilon)\bar dP[k = j]/\bar k)$. Additionally, let $M^+_j = M_j + \Delta_j$. Further, let $M = (M_j)_j$, $M^+ = (M^+_j)_j$ and let $G^*_{n,M}$, $G^*_{n,M^+}$ be the factor graphs obtained as follows. Choose a random maximal matching $\Gamma_{n,M}$ of the complete bipartite graph with vertex classes
\[X_n = \bigcup_{i=1}^n\{x_i\}\times[d_i],\qquad A_{n,M} = \bigcup_{i\in{\rm supp}\,k}\bigcup_{j\in[M_i]}\{a_{i,j}\}\times[i].\]
Then let $G_{n,M}$ be the random factor graph with variable nodes $x_1,\ldots,x_n$ and factor nodes $a_{i,j}$, $i\in{\rm supp}\,k$, $j\in[M_i]$, where each edge of $\Gamma_{n,M}$ induces an edge between the corresponding variable and check node. Additionally, the factor nodes $a_{i,j}$ receive independent weight functions with distribution $P_i$. Finally, $G^*_{n,M}$ is the factor graph obtained from $G_{n,M}$ via (4.3). The model $G^*_{n,M^+}$ is defined analogously.

Lemma 14.2.
The random factor graphs $G^*_{\varepsilon,n}$, $G^*_{n,M}$ and $G^*_{\varepsilon,n+1}$, $G^*_{n+1,M^+}$ are identically distributed.

Proof. This is immediate from the construction. □
Let $\gamma_i$ be the number of factor nodes of degree $i$ adjacent to $x_{n+1}$ in $G^*_{n+1,M^+}$. Further, let $M^-_i = 0\vee(M_i - \gamma_i)$ and let $G_{n,M^-}$, $G^*_{n,M^-}$ be the corresponding factor graphs. Additionally, let $\mathcal D^-$ be the $\sigma$-algebra generated by $(M_i,\gamma_i,\Delta_i)_{i\ge1}$ and $\sigma^*_{n+1}$, and let $\mathcal M^-$ be the $\sigma$-algebra generated by $\mathcal D^-$ and $G^*_{n,M^-}$.

To set up the coupling, obtain $G'$ from $G^*_{n,M^-}$ as follows. Let $\mathcal C^-$ be the set of cavities of $G^*_{n,M^-}$. Moreover, for $i\in{\rm supp}\,k$ and $j\in[M_i - M^-_i]$ let $a'_{i,j}$ be a new factor node. Now, obtain $G'$ by adding the $a'_{i,j}$ to $G^*_{n,M^-}$ by pairing them to cavities from $\mathcal C^-$ and choosing weight functions such that for any possible result of this experiment we have
\[P[G' = g\,|\,\mathcal M^-]\propto\prod_{i,j}P_i(\psi_{a'_{i,j}})\,\psi_{a'_{i,j}}(\sigma^*).\qquad(14.1)\]
Additionally, let $G''$ be the random factor graph obtained from $G^*_{n,M^-}$ via the following process. Add a variable node $x_{n+1}$, factor nodes $a''_{i,j}$ for $i\in{\rm supp}\,k$, $j\in[M^+_i - M_i - \gamma_i]$ and further factor nodes $a'''_{i,j}$ for $i\in{\rm supp}\,k$, $j\in[\gamma_i]$ with $x_{n+1}\in\partial a'''_{i,j}$, according to the distribution
\[P[G'' = g\,|\,G^*_{n,M^-},\sigma^*_{n+1}]\propto\prod_{i,j}P_i(\psi_{a''_{i,j}})\,\psi_{a''_{i,j}}(\sigma^*)\prod_{i,j}P_i(\psi_{a'''_{i,j}})\,\psi_{a'''_{i,j}}(\sigma^*).\]

Lemma 14.3.
We have $E[\log Z(G')] = E[\log Z(G^*_{n,M})] + o(1)$ and $E[\log Z(G'')] = E[\log Z(G^*_{n+1,M^+})] + o(1)$.

Proof.
By construction, $G'$ is obtained from $G^*_{n,M^-}$ by adding $\sum_{i\in{\rm supp}\,k}M_i - M^-_i$ factor nodes. Because all degrees are bounded, we have
\[E\left[\sum_{i\in{\rm supp}\,k}M_i - M^-_i\right] = \Theta(1).\]
Since a Poisson random variable with bounded expectation is bounded by $O(\log n)$ with probability $1 - o(1/n)$, we may assume that the number of factor nodes added from $G^*_{n,M^-}$ to $G'$ is $O(\log n)$. Let us add these factor nodes one-by-one. Then by Proposition 4.5 we can couple $G'$ and $G^*_{n,M}$ such that
\[P[G' = G^*_{n,M}] = 1 - \tilde O(n^{-1}),\]
whence the first statement of the lemma follows.

Let
\[\mathcal E = \Big\{G'' - x_{n+1} - \sum_{i,j}a''_{i,j} - \sum_{i,j}a'''_{i,j} = G^*_{n+1,M^+} - \sum_{i,j}a''_{i,j} - \sum_{i,j}a'''_{i,j}\Big\}\]
be the event that on the first $n$ variables the factor graphs $G''$ and $G^*_{n+1,M^+}$ coincide. Furthermore, denote by
\[\Delta_s = \Bigg|\Big(G'' - x_{n+1} - \sum_{i,j}a''_{i,j} - \sum_{i,j}a'''_{i,j}\Big)\,\triangle\,\Big(G^*_{n+1,M^+} - \sum_{i,j}a''_{i,j} - \sum_{i,j}a'''_{i,j}\Big)\Bigg|\]
the number of edges in which the factor graphs differ (restricted to the first $n$ variables). Then Proposition 4.5 and Proposition 4.13 show that $G''$ and $G^*_{n+1,M^+}$ can be coupled such that
\[P[\mathcal E] = 1 - \tilde O(n^{-1}),\qquad P[\Delta_s > \sqrt n\log n] = \tilde O(n^{-1}).\qquad(14.2)\]
Furthermore, comparing the definitions of $G''$ and $G^*_{n+1,M^+}$, we see that given $\mathcal E$ the factor graphs $G''$ and $G^*_{n+1,M^+}$ satisfy $d_{\rm TV}(G''|\mathcal E, G^*_{n+1,M^+}|\mathcal E) = \tilde O(n^{-1})$. As all weight functions are strictly positive by assumption, there is a coupling of $G''$ and $G^*_{n+1,M^+}$ such that
\[\big|E[\log Z(G'') - \log Z(G^*_{n+1,M^+})\,|\,\mathcal E]\big| = o(1).\qquad(14.3)\]
Additionally, given $\bar{\mathcal E}\cap\{\Delta_s\le\sqrt n\log n\}$, we find
\[\big|E[\log Z(G'') - \log Z(G^*_{n+1,M^+})\,|\,\bar{\mathcal E}\cap\{\Delta_s\le\sqrt n\log n\}]\big| = O(\sqrt n\log n).\qquad(14.4)\]
Since, finally,
\[|\log Z(G'') - \log Z(G^*_{n+1,M^+})| = O(n)\qquad(14.5)\]
deterministically, the second assertion follows from (14.2)-(14.5). □

Let $(\gamma'_i)_{i\in{\rm supp}\,k}$ be a random vector with distribution $\gamma'_i = \sum_{h=1}^{d_\varepsilon}1\{\hat k_h = i\}$.

Lemma 14.4.
We have $d_{\rm TV}((\gamma_i)_{i\in{\rm supp}\,k},(\gamma'_i)_{i\in{\rm supp}\,k}) = o(1)$.

Proof. Let $\mathcal E$ be the event that the new variable node $x_{n+1}$ is adjacent to particular factor nodes $\alpha_1,\ldots,\alpha_\ell$, ordered according to the clones of $x_{n+1}$ that they connect to. Let $\kappa_1,\ldots,\kappa_\ell$ be the degrees of $\alpha_1,\ldots,\alpha_\ell$. Furthermore, let $G^\star$ be the factor graph obtained from $G_{n+1,M^+}$ by removing $x_{n+1}$ and its adjacent factor nodes. Finally, let $\mathcal R$ be the event that $G^\star$ has $(1+o(1))\Delta/q$ cavities with each possible value $\tau\in\Omega$ under $\sigma^*$. Then Proposition 4.6 implies that
\[P[\mathcal E] = P[\mathcal E\,|\,G^\star\in\mathcal R] + o(1).\qquad(14.6)\]
To be precise, in order to apply Proposition 4.6 we think of $x_{n+1}$ and its adjacent factor nodes $\alpha_1,\ldots,\alpha_\ell$ as a single 'super-factor node' with weight function
\[\psi_{x_{n+1},\alpha_1,\ldots,\alpha_\ell}(\sigma) = \sum_{\sigma_{x_{n+1}}\in\Omega}\prod_{i=1}^\ell\psi_{\alpha_i}\big((\sigma_x)_{x\in\partial\alpha_i}\big)\qquad(\sigma\in\Omega^{V_n}).\]
Furthermore, the random factor graph model $G^\star$ can be described as follows. There are $\ell$ fewer factor nodes, and thus Proposition 4.6 and SYM imply that w.h.p.
\[\frac{E[\psi_{G^\star}(\sigma^*)\,|\,\sigma^*]}{E[\psi_{G_{n+1,M^+}}(\sigma^*)\,|\,\sigma^*]} = \xi^{-\ell}.\qquad(14.7)\]
Similarly,
\[\frac{E[\psi_{G^\star}(\sigma^*)\,|\,\sigma^*]}{E[\psi_{G_{n+1,M^+}}(\sigma^*)\,|\,\sigma^*,\mathcal E]} = \xi^{-\ell}.\qquad(14.8)\]
Combining (14.7)-(14.8), we obtain
\[P[\mathcal E]\sim P[d_\varepsilon = \ell]\prod_{h=1}^\ell P[\hat k_h = \kappa_h].\qquad(14.9)\]
Finally, Lemmas 4.18 and 4.19 ensure that w.h.p. there are $(1+o(1))\Delta/q$ cavities of each possible colour $\tau\in\Omega$. Thus, the assertion follows from (14.9). □
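The law $\hat k$ entering $\gamma'_i$ is the size-biased version of $k$, i.e. $P[\hat k = k] = kP[k = k]/E[k]$; the following Python sketch (hypothetical degree law) checks empirically that the degree of the factor containing a uniformly random clone is indeed size-biased.

```python
import random
from collections import Counter

# The factor degree seen from a uniformly random clone follows the
# size-biased law: P[k^ = k] = k P[k = k] / E[k].
random.seed(3)
p = {2: 0.5, 3: 0.3, 4: 0.2}                    # hypothetical law of k
Ek = sum(k * w for k, w in p.items())

clones = []
for _ in range(100_000):                        # factors with i.i.d. degrees
    k = random.choices(list(p), weights=list(p.values()))[0]
    clones.extend([k] * k)                      # one list entry per clone
freq = Counter(random.choices(clones, k=100_000))
for k in sorted(p):
    print(k, freq[k] / 100_000, k * p[k] / Ek)  # empirical vs size-biased
```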
Lemma 14.5. We have
\[E[\log Z_{G'} - \log Z_{G_{n,M^-}}] = \frac{(1-\varepsilon)\bar d}{\xi\bar k}E\Big[k_\psi\,\psi\big(\sigma^*_{y_1},\ldots,\sigma^*_{y_{k_\psi}}\big)\log\big\langle\psi\big(\sigma_{y_1},\ldots,\sigma_{y_{k_\psi}}\big)\big\rangle_{G_{\varepsilon,n}}\Big] + o(1).\]

Proof.
Since $G'$ is obtained from $G_{n,M^-}$ by adding factor nodes $a'_{i,j}$ for $i\in{\rm supp}\,k$ and $j\in[M_i - M^-_i]$ according to (14.1), we obtain
\[\log\frac{Z_{G'}}{Z_{G_{n,M^-}}} = \log\left\langle\prod_{i\in{\rm supp}\,k}\prod_{j\in[M_i - M^-_i]}\psi_{a'_{i,j}}\big(\sigma(\partial_1a'_{i,j}),\ldots,\sigma(\partial_{k_{a'_{i,j}}}a'_{i,j})\big)\right\rangle_{G^*_{n,M^-}}.\]
Therefore, with $(y_i)_{i\ge1}$ signifying independent uniformly random cavities of $G^*_{n,M^-}$, we obtain
\[E[\log Z_{G'} - \log Z_{G_{n,M^-}}] = \frac{(1-\varepsilon)\bar d}{\xi\bar k}E\Big[k_\psi\,\psi\big(\sigma^*_{y_1},\ldots,\sigma^*_{y_{k_\psi}}\big)\log\big\langle\psi\big(\sigma_{y_1},\ldots,\sigma_{y_{k_\psi}}\big)\big\rangle_{G^*_{n,M^-}}\Big] + o(1).\qquad(14.10)\]
Since $G^*_{n,M^-}$ and $G^*_{\varepsilon,n}$ have total variation distance $o(1)$ while the expression inside the expectation is bounded, the assertion follows from (14.10). □

Lemma 14.6.
We have
\[E[\log Z_{G''} - \log Z_{G_{n,M^-}}] = E\left[q^{-1}\xi^{-d_\varepsilon}\Bigg(\sum_{\sigma\in\Omega}\prod_{i=1}^{d_\varepsilon}\psi_{\hat k_i,i}\big(\sigma,\sigma^*_{y_{i,2}},\ldots,\sigma^*_{y_{i,\hat k_i}}\big)\Bigg)\log\left\langle\sum_{\sigma\in\Omega}\prod_{i=1}^{d_\varepsilon}\psi_{\hat k_i,i}\big(\sigma,\sigma_{y_{i,2}},\ldots,\sigma_{y_{i,\hat k_i}}\big)\right\rangle_{G_{\varepsilon,n}}\right]\]
\[+\frac{(1-\varepsilon)\bar d}{\xi\bar k}E\Big[\psi\big(\sigma^*_{y_1},\ldots,\sigma^*_{y_{k_\psi}}\big)\log\big\langle\psi\big(\sigma_{y_1},\ldots,\sigma_{y_{k_\psi}}\big)\big\rangle_{G_{\varepsilon,n}}\Big] + o(1).\]

Proof.
Since $G''$ is obtained from $G_{n,M^-}$ by adding a variable node $x_{n+1}$ with associated factor nodes $a'''_{i,j}$ for $i\in{\rm supp}\,k$, $j\in[\gamma_i]$ and further factor nodes $a''_{i,j}$ for $i\in{\rm supp}\,k$, $j\in[M^+_i - M_i - \gamma_i]$, we obtain
\[\log\frac{Z_{G''}}{Z_{G_{n,M^-}}} = \log\sum_{\sigma\in\Omega}\left\langle\prod_{i\in{\rm supp}\,k}\prod_{j\in[\gamma_i]}\psi_{a'''_{i,j}}\big(\sigma,\sigma(\partial_2a'''_{i,j}),\ldots,\sigma(\partial_{k_{a'''_{i,j}}}a'''_{i,j})\big)\right\rangle_{G^*_{n,M^-}}\qquad(14.11)\]
\[+\log\left\langle\prod_{i\in{\rm supp}\,k}\prod_{j\in[M^+_i - M_i - \gamma_i]}\psi_{a''_{i,j}}\big(\sigma(\partial_1a''_{i,j}),\ldots,\sigma(\partial_{k_{a''_{i,j}}}a''_{i,j})\big)\right\rangle_{G^*_{n,M^-}}.\qquad(14.12)\]
The assertion follows from (14.11), (14.12), Lemma 14.4 and the fact that $G^*_{n,M^-}$ and $G^*_{\varepsilon,n}$ have total variation distance $o(1)$. □

Lemma 14.7.
Let $(y_i)_{i\ge1}$ be a sequence of uniformly random independent cavities of $G^*_{\varepsilon,n}$. For any $\ell\ge1$, $\delta > 0$ there exists $\theta$ such that for all functions $f: \Omega^\ell\to[0,1]$ we have
\[\big|E[f(\sigma^*_{y_1},\ldots,\sigma^*_{y_\ell})\,|\,G^*_{\varepsilon,n}] - E[\langle f(\sigma_{y_1},\ldots,\sigma_{y_\ell})\rangle\,|\,G^*_{\varepsilon,n}]\big| < \delta.\qquad(14.13)\]

Proof. Going back to the definitions of $G^*$ and the Boltzmann distribution, we obtain
\[P[\sigma^* = \sigma\,|\,G^*_{\varepsilon,n} = G] = \frac{P[G^*_{\varepsilon,n} = G\,|\,\sigma^* = \sigma]\,q^{-n}}{P[G^*_{\varepsilon,n} = G]} = \frac{\psi_G(\sigma)}{q^nE[\psi_{G_{\varepsilon,n}}(\sigma)]\sum_{\tau\in\Omega^{V_n}}\psi_G(\tau)/E[Z_{G_{\varepsilon,n}}]} = \frac{\psi_G(\sigma)}{Z_G}\cdot\frac{E[Z_{G_{\varepsilon,n}}]}{q^nE[\psi_{G_{\varepsilon,n}}(\sigma)]}.\qquad(14.14)\]
There are two cases to consider. First, if $|\sigma^{-1}(\omega)| = n/q + O(\sqrt n)$ for all $\omega\in\Omega$, then BAL ensures that $q^nE[\psi_{G_{\varepsilon,n}}(\sigma)] = \Theta(E[Z_{G_{\varepsilon,n}}])$. Hence, (14.14) shows that for such $\sigma$,
\[P[\sigma^* = \sigma\,|\,G^*_{\varepsilon,n} = G] = \Theta(\mu_G(\sigma)).\qquad(14.15)\]
The second case is that $||\sigma^{-1}(\omega)| - n/q|\gg\sqrt n$ for some $\omega\in\Omega$. Then Proposition 4.17 shows that $P[\sigma^* = \sigma], E[\mu_{G^*_{\varepsilon,n}}(\sigma)] = o(1)$. Thus, we may confine ourselves to the former case and assume that (14.15) holds. In light of Lemma 4.3 and Proposition 4.17 we may assume that $\mu_{G^*_{\varepsilon,n}}$ is $\delta$-symmetric for a small $\delta > 0$ (given $\theta$). Hence, (14.15) together with [16, Lemma 3.17] implies that (14.13) is satisfied. □

Proof of Proposition 3.6.
This is an immediate consequence of Proposition 14.1 and Lemma 14.7. □
15. PROOF OF PROPOSITION 3.7

In the remainder we tacitly assume that BAL, SYM and POS hold.

15.1.
Preliminaries and setup.
The proof of Proposition 3.7 relies on showing that for any distribution $\pi\in\mathcal P^*(\Omega)$,
\[\frac1nE[\log Z(\hat G)]\ge B(\pi) + o(1).\qquad(15.1)\]
We will show (15.1) via the interpolation method. To be precise, for a given $\pi\in\mathcal P^*(\Omega)$ we will construct a family of random factor graph models parametrised by $t\in[0,1]$. The proof of Proposition 3.7 is based on two pillars. First, it will be easy to see that the free energy of the $t = 0$ model is $nB(\pi) + o(n)$ and that the $t = 1$ model is $\hat G$. Second, we will show that the derivative of $E[\log Z(\hat G)]/n$ with respect to $t$ is essentially non-negative. (15.1) readily follows.

The interpolating family is constructed from the generalised model described in Section 4.1. To this end, we introduce the model $G_{t,\varepsilon,\pi}$, which is constructed as follows. Let $m_\varepsilon(t)\sim{\rm Po}\big((1-\varepsilon)t\bar dn/\bar k\big)$ and $m'_\varepsilon(t)\sim{\rm Po}\big((1-\varepsilon)(1-t)\bar dn/\bar k\big)$. As before, each variable comes with a target degree $d_i$. Similarly, each of the $m_\varepsilon(t)$ factor nodes comes with target degree $k_i\ge2$, while each of the $m'_\varepsilon(t)$ factor nodes comes with a target degree of $k'_i$, which are independent and distributed as $k$. Let the total number of factor nodes be given by
\[m = m_\varepsilon(t) + \sum_{i=1}^{m'_\varepsilon(t)}k'_i\]
and define the factor degree sequence as $k = (k_i)_{i\in[m_\varepsilon(t)]}\cup(1)_{i\in[m'_\varepsilon(t)],\,j\in[k'_i]}$. Moreover, let $(\psi'_{i,j})_{i,j}$ be a sequence of independent random weight functions such that $\psi'_{i,j}$ has distribution $\psi_{k'_i}$. Then with $(\mu_{i,j,h})_{i,j,h\ge1}$ drawn independently from $\pi$ and $h_{i,j}\in[k'_i]$ drawn independently
The G t , ε , π , T model satisfies the assumptions of Proposition 4.5. Let Γ t = t ¯ d ¯ k ξ E " ( k − Λ Ã X τ ∈ Ω k ψ k ( τ ) k Y j = µ ( π ) j ( τ j ) ! .The following proposition, which we prove in Section 15.2, shows that the free energy essentially in-creases with t , up to the correction term Γ t . Proposition 15.2.
Proposition 15.2. For every $\varepsilon>0$ there is $T>0$ such that for all large enough $n$ the following is true. Let $\phi_T: t\in[0,1]\mapsto \big(E[\log Z(\hat G_{t,\varepsilon,\pi,T})] + \Gamma_t\big)/n$. Then $\phi'_T(t) > -\varepsilon$ for all $t\in[0,1]$.

We complement this statement by computing the free energy at 'time' $t=0$.
Proposition 15.3. We have
\[ \frac1n E[\log Z(\hat G_{\pi,0})] = E\Big[\frac{\xi^{-d}}{|\Omega|}\,\Lambda\Big(\sum_{\sigma\in\Omega}\prod_{i=1}^{d}\sum_{\tau\in\Omega^{\hat k_i}} 1\{\tau_{h_i}=\sigma\}\,\psi_{\hat k_i}(\tau)\prod_{j\ne h_i}\mu_{ij}(\tau_j)\Big)\Big] + o(1). \]
The proof of Proposition 15.3 can be found in Section 15.3.

Proof of Proposition 3.7.
Proposition 15.2 implies that
\[ \frac1n E[\log Z(\hat G_\pi)] = O(\varepsilon) + \frac1n E[\log Z(\hat G_{1,\varepsilon,\pi,T})] \ge O(\varepsilon) + \frac1n E[\log Z(\hat G_{0,\varepsilon,\pi,T})] - \Gamma_1. \quad(15.2) \]
Further, Proposition 15.3 implies, together with the fact that all weight functions are strictly positive, that
\[ \frac1n E[\log Z(\hat G_{0,\varepsilon,\pi,T})] = \frac1n E[\log Z(\hat G_{\pi,0})] + O(\varepsilon) = E\Big[\frac{\xi^{-d}}{|\Omega|}\,\Lambda\Big(\sum_{\sigma\in\Omega}\prod_{i=1}^{d}\sum_{\tau\in\Omega^{\hat k_i}} 1\{\tau_{h_i}=\sigma\}\,\psi_{\hat k_i}(\tau)\prod_{j\ne h_i}\mu_{ij}(\tau_j)\Big)\Big] + O(\varepsilon). \quad(15.3) \]
Combining (15.2) and (15.3) completes the proof. □
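To make the interpolation scheme concrete, the following minimal Python sketch samples the factor counts of the interpolating model; the specific values of n, dbar, kbar and eps are hypothetical and only serve to illustrate how the two Poisson clocks trade off against each other as t moves from 0 to 1.

import numpy as np

rng = np.random.default_rng(0)

# A minimal sketch of the interpolation bookkeeping with hypothetical
# parameters (none of these values come from the paper): at time t there
# are m_eps(t) ~ Po((1-eps) t dbar n / kbar) coupled k-ary factors and
# m'_eps(t) ~ Po((1-eps)(1-t) dbar n / kbar) decoupled factors, each of
# which is replaced by k'_i unary factors.
def sample_factor_counts(t, n=10_000, dbar=3.0, kbar=4.0, eps=0.01):
    lam = (1 - eps) * dbar * n / kbar
    m = rng.poisson(t * lam)          # coupled factors
    mp = rng.poisson((1 - t) * lam)   # decoupled factors
    return m, mp

# t=1 recovers the (eps-diluted) original model, t=0 a product of stars.
for t in (0.0, 0.5, 1.0):
    print(t, *sample_factor_counts(t))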
15.2. Proof of Proposition 15.2. As before, let $\sigma^*\in\Omega^{\{x_1,\dots,x_n\}}$ be a uniformly random assignment. Further, let $\mathcal D'$ be the $\sigma$-algebra generated by $(d_i,k_i,k'_i)_i$. Let $G' = G^*_{t,\varepsilon,\pi,T}(m_\varepsilon(t),m'_\varepsilon(t))$ be the random factor graph drawn from the distribution
\[ P[G'\in\mathcal E\mid\mathcal D',\sigma^*] = \frac{E\big[1\{G_{t,\varepsilon,\pi,T}(m_\varepsilon(t),m'_\varepsilon(t))\in\mathcal E\}\,\psi_{G_{t,\varepsilon,\pi,T}(m_\varepsilon(t),m'_\varepsilon(t))}(\sigma^*)\mid\mathcal D',\sigma^*\big]}{E\big[\psi_{G_{t,\varepsilon,\pi,T}(m_\varepsilon(t),m'_\varepsilon(t))}(\sigma^*)\mid\mathcal D',\sigma^*\big]}. \]
We define $G'' = G^*_{t,\varepsilon,\pi,T}(m_\varepsilon(t)+1,\,m'_\varepsilon(t))$ and $G''' = G^*_{t,\varepsilon,\pi,T}(m_\varepsilon(t),\,m'_\varepsilon(t)+1)$ analogously. Moreover, let $C$ be the set of all variable clones $(x_i,h)$, $h\le d_i$, that remain unmatched in $G'$. Let $(y_i)_{i\ge1}$ denote a sequence of independent uniform samples from $C$. We identify the clone $y_i$ with its underlying variable node where convenient. Finally, let $(\mu_i)_{i\ge1}$ be independent samples from $\pi$. The key step towards the proof of Proposition 15.2 is the derivation of the following formula.

Lemma 15.4. Let
\[ \Xi_t = E\big[\psi_k(\sigma^*(y_1),\dots,\sigma^*(y_k))\,\log\langle\psi_k(\sigma(y_1),\dots,\sigma(y_k))\rangle_{G'}\big] - E\Big[\sum_{i=1}^{k}\sum_{\tau\in\Omega^k} 1\{\tau_i=\sigma^*(y_1)\}\,\psi_k(\tau)\prod_{j\ne i}\mu_j(\tau_j)\,\log\Big\langle\sum_{\sigma'\in\Omega^k} 1\{\sigma'_i=\sigma(y_1)\}\,\psi_k(\sigma')\prod_{j\ne i}\mu_j(\sigma'_j)\Big\rangle_{G'}\Big] + E\Big[(k-1)\,\Lambda\Big(\sum_{\tau\in\Omega^k}\psi_k(\tau)\prod_{j=1}^k\mu^{(\pi)}_j(\tau_j)\Big)\Big]. \]
Then, uniformly for all $t\in(0,1)$ and all $T\ge0$,
\[ \frac{\partial}{\partial t}\phi_T(t) = o(1) + \frac{\bar d}{\bar k\,\xi}\,\Xi_t. \]
The steps towards the proof of Lemma 15.4 are the following. Let
\[ \Delta_t = E\big[\log Z(G^*_{t,\varepsilon,\pi,T}(m_\varepsilon(t)+1,m'_\varepsilon(t)))\big] - E\big[\log Z(G^*_{t,\varepsilon,\pi,T}(m_\varepsilon(t),m'_\varepsilon(t)))\big] = E[\log Z(G'')] - E[\log Z(G')], \]
\[ \Delta'_t = E\big[\log Z(G^*_{t,\varepsilon,\pi,T}(m_\varepsilon(t),m'_\varepsilon(t)+1))\big] - E\big[\log Z(G^*_{t,\varepsilon,\pi,T}(m_\varepsilon(t),m'_\varepsilon(t)))\big] = E[\log Z(G''')] - E[\log Z(G')], \]
\[ \Delta''_t = E\Big[(k-1)\,\Lambda\Big(\sum_{\tau\in\Omega^k}\psi_k(\tau)\prod_{j=1}^k\mu_j(\tau_j)\Big)\Big]. \]
Because $t$ enters into the definition of the various factor graphs only through the Poisson variables $m_\varepsilon(t), m'_\varepsilon(t)$, the following claim follows directly from [14, Lemma 4.2].

Claim 15.5 (Lemma 4.2 of [14]). We have
\[ \frac{\partial}{\partial t}\phi_T(t) = (1-\varepsilon)\,\frac{\bar d}{\bar k}\,\big(\Delta_t - \Delta'_t + \Delta''_t\big). \]
To calculate $\Delta_t, \Delta'_t$ we continue to denote by $\psi^*_k$ a weight function distributed as $\psi_k$, drawn independently of everything else.

Claim 15.6.
We have $\Delta_t = o(1) + E\big[\psi_k(\sigma^*(y_1),\dots,\sigma^*(y_k))\,\log\langle\psi_k(\sigma(y_1),\dots,\sigma(y_k))\rangle_{G'}\big]/\xi$.

Proof.
Due to routine concentration arguments we may safely assume that
\[ \sum_{i=1}^n d_i \ge \sum_{i=1}^{m_\varepsilon(t)} k_i + \sum_{i=1}^{m'_\varepsilon(t)} k'_i. \]
Proposition 4.5 provides a coupling of $G', G''$. There are three possible scenarios.

Case 1: $G' = G'' - a_{m_\varepsilon(t)+1}$: In this case, $G''$ can be obtained from $G'$ by adding a single $k_{m_\varepsilon(t)+1}$-ary factor node $a = a_{m_\varepsilon(t)+1}$. Its weight function and the adjacent variable nodes are drawn from the distribution
\[ P\big[\partial a = (y_1,\dots,y_{k_{m_\varepsilon(t)+1}}),\,\psi_a=\psi\mid\mathcal D',\sigma^*\big] = (1+o(1))\,\frac{P[\psi_{k_{m_\varepsilon(t)+1}}=\psi]\;\psi(\sigma^*(y_1),\dots,\sigma^*(y_{k_{m_\varepsilon(t)+1}}))}{E[\psi_{k_{m_\varepsilon(t)+1}}(\sigma^*(y_1),\dots,\sigma^*(y_{k_{m_\varepsilon(t)+1}}))]} \quad(15.4) \]
with $y_1,\dots,y_{k_{m_\varepsilon(t)+1}}\in C$, $\psi\in\Psi$; the $1+o(1)$ term stems from the fact that the 'cavities' where $a$ attaches should be drawn without replacement. Furthermore, since with probability $1-\exp(-\Omega(n))$ we have $\sum_{y\in C} 1\{\sigma^*(y)=\tau\} = |C|/q + o(n)$ for all $\tau\in\Omega$, the expression (15.4) simplifies to
\[ P\big[\partial a = (y_1,\dots,y_{k_{m_\varepsilon(t)+1}}),\,\psi_a=\psi\mid\mathcal D',\sigma^*\big] = \frac{1+o(1)}{\xi}\,P[\psi_{k_{m_\varepsilon(t)+1}}=\psi]\;\psi(\sigma^*(y_1),\dots,\sigma^*(y_{k_{m_\varepsilon(t)+1}})). \quad(15.5) \]
Furthermore, the ensuing change in free energy upon adding $a$ works out to be
\[ \log\frac{Z(G'')}{Z(G')} = \log\langle\psi_a(\sigma)\rangle_{G'}. \quad(15.6) \]

Case 2: $|G'\,\triangle\,G''| = O(\sqrt n\log n)$: because all weight functions are strictly positive, in this case we obtain
\[ |\log Z(G'') - \log Z(G')| = O(\sqrt n\log n). \quad(15.7) \]

Case 3: Cases 1, 2 do not occur: In this case we have the trivial bound
\[ \log Z(G'')/Z(G') = O(n + m_\varepsilon). \quad(15.8) \]
Proposition 4.5 shows that Case 1 occurs with probability $1-O(1/n)$ and that Case 3 occurs with probability $O(1/n)$. Therefore, (15.4)–(15.8) yield
\[ E\Big[\log\frac{Z(G'')}{Z(G')}\,\Big|\,\mathcal D',\sigma^*\Big] = \frac{1+o(1)}{\xi}\,E\big[\psi_k(\sigma^*(y_1),\dots,\sigma^*(y_k))\,\log\langle\psi_k(\sigma(y_1),\dots,\sigma(y_k))\rangle_{G'}\big], \]
as claimed. □
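The identity (15.6) is elementary but easy to check numerically. The following toy sketch (weights and topology are made up, not from the paper) verifies on a brute-forced four-variable factor graph that attaching one factor node a shifts log Z by exactly log⟨ψ_a(σ)⟩_{G'}.

import itertools, math

# Brute-force check of (15.6): adding a factor node a multiplies Z by the
# Boltzmann average <psi_a(sigma)>_{G'} of its weight function.
n, Omega = 4, (-1, 1)
factors = [((0, 1), lambda s, t: 1.0 + 0.3 * s * t),
           ((2, 3), lambda s, t: 1.0 + 0.5 * s * t)]

def Z_and_avg(new_factor=None):
    Z, acc = 0.0, 0.0
    for sigma in itertools.product(Omega, repeat=n):
        w = 1.0
        for (i, j), psi in factors:
            w *= psi(sigma[i], sigma[j])
        Z += w
        if new_factor:
            (i, j), psi = new_factor
            acc += w * psi(sigma[i], sigma[j])
    return Z, (acc / Z if new_factor else None)

a = ((1, 2), lambda s, t: 1.0 + 0.2 * s * t)   # the added factor node
Z_old, avg = Z_and_avg(a)
factors.append(a)
Z_new, _ = Z_and_avg()
print(math.log(Z_new / Z_old), math.log(avg))  # the two values agree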
Claim 15.7. We have
\[ \Delta'_t = o(1) + E\Big[\sum_{i=1}^{k}\sum_{\tau\in\Omega^k} 1\{\tau_i=\sigma^*(y_1)\}\,\psi_k(\tau)\prod_{j\ne i}\mu_j(\tau_j)\,\log\Big\langle\sum_{\sigma'\in\Omega^k} 1\{\sigma'_i=\sigma(y_1)\}\,\psi_k(\sigma')\prod_{j\ne i}\mu_j(\sigma'_j)\Big\rangle_{G'}\Big]\Big/\xi. \]

Proof.
We apply Proposition 4.5 as in the proof of the previous claim to obtain a coupling of $G', G'''$. As in that proof, because all weight functions are strictly positive we just need to consider the case that $G'$ coincides with the factor graph obtained from $G'''$ by removing the unary factor nodes attached in the last step. Hence, writing $k' = k'_{m'_\varepsilon(t)+1}$ for brevity, we may assume that $G'''$ is obtained from $G'$ by adding unary factor nodes $b_1,\dots,b_{k'}$ defined as follows. Let $(\mu''_{i,j})_{i,j\ge1}$ be independent samples from $\pi$ and let $(h_i)_{i\ge1}$ be independent and uniform samples from $[k']$. To simplify matters, we are going to discretise the continuous distribution on distributions $\pi$. Then
\[ P\Big[\partial b_j = y_j,\ \psi_{b_j}(\,\cdot\,) = \sum_{\tau\in\Omega^{k'}} 1\{\tau_{h_j}=\,\cdot\,\}\,\psi(\tau)\prod_{h\ne h_j}\mu''_{j,h}(\tau_h)\,\Big|\,\mathcal D',\sigma^*\Big] = \frac{1+o(1)}{\xi^{1/k'}}\,\sum_{\tau\in\Omega^{k'}} 1\{\tau_{h_j}=\sigma^*(y_j)\}\,\psi(\tau)\prod_{h\ne h_j}\mu''_{j,h}(\tau_h)\;\pi(\mu''_{j,\cdot}). \quad(15.9) \]
Let $\sigma$ be a sample from $\mu_{G'}$. Since the factor nodes factorise up to a vanishing error term that is due to some variable nodes having two or more cavities, we have
\[ \log\frac{Z(G''')}{Z(G')} = \sum_{j=1}^{k'}\log\big\langle\psi_{b_j}(\sigma)\big\rangle_{G'} + o(1). \quad(15.10) \]
Combining (15.9) and (15.10), we finally obtain
\[ E\Big[\log\frac{Z(G''')}{Z(G')}\,\Big|\,\mathcal D',\sigma^*\Big] = E\Big[\sum_{i=1}^{k'}\sum_{\tau\in\Omega^{k'}} 1\{\tau_{h_i}=\sigma^*(y_i)\}\,\psi_{k'}(\tau)\prod_{j\ne h_i}\mu''_{i,j}(\tau_j)\,\log\Big\langle\sum_{\sigma'\in\Omega^{k'}} 1\{\sigma'_{h_i}=\sigma(y_i)\}\,\psi_{k'}(\sigma')\prod_{j\ne h_i}\mu''_{i,j}(\sigma'_j)\Big\rangle_{G'}\Big]\Big/(\xi+o(1)). \]
The claim follows. □

Claim 15.8. With $\mu_1,\mu_2,\dots$ chosen independently from $\pi$ we have
\[ \Delta''_t = \frac{\bar k\,\xi}{\bar d}\,\frac{\partial}{\partial t}\Gamma_t = E\Big[(k-1)\,\Lambda\Big(\sum_{\tau\in\Omega^k}\psi_k(\tau)\prod_{j=1}^k\mu^{(\pi)}_j(\tau_j)\Big)\Big]. \]

Proof.
This follows immediately by plugging in the definition of $\Gamma_t$. □

Proof of Lemma 15.4.
This lemma follows from Claims 15.6, 15.7 and 15.8. □
Proof of Proposition 15.2.
Let $\hat\rho$ be the empirical distribution of the marginals of $\mu_{\hat G_{t,\varepsilon,\pi,T}}$ over the set of cavities, i.e.
\[ \hat\rho = \frac{1}{|C|}\sum_{x\in C}\delta_{\mu_{\hat G_{t,\varepsilon,\pi,T},x}} \in P(P(\Omega)). \quad(15.11) \]
Lemma 4.3 shows that, choosing $T$ sufficiently large, we can ensure that $\mu_{\hat G_{t,\varepsilon,\pi,T}}$ is $\delta$-symmetric for an arbitrarily small $\delta>0$. Therefore, the Nishimori identity and Lemma 15.4 imply that
\[ \frac{\partial}{\partial t}\phi_T(t) = o(1) + \frac{\bar d}{\bar k\,\xi}\,\Xi_t = O(\delta) + \frac{\bar d}{\bar k\,\xi}\,\Xi'_t, \qquad\text{where} \]
\[ \Xi'_t = E\Big[\Lambda\Big(\sum_{\tau\in\Omega^k}\psi_k(\tau)\prod_{i=1}^k\rho_i(\tau_i)\Big) + (k-1)\,\Lambda\Big(\sum_{\tau\in\Omega^k}\psi_k(\tau)\prod_{i=1}^k\mu_i(\tau_i)\Big) - k\,\Lambda\Big(\sum_{\tau\in\Omega^k}\psi_k(\tau)\,\rho_1(\tau_1)\prod_{i=2}^k\mu_i(\tau_i)\Big)\Big]. \]
Hence, the assertion follows from assumption POS. □
15.3. Proof of Proposition 15.3. Because the random graph model is symmetric under permutations of the variable nodes, we can view $\frac1n E[\log Z(\hat G_{\varepsilon})]$ as the contribution to $E[\log Z(\hat G_{\varepsilon})]$ of the connected component of $x_1$. The partition function of the component of $x_1$ is nothing but
\[ z = \sum_{\sigma\in\Omega}\prod_{j=1}^{d_{x_1}}\psi_{b_{x_1,j}}(\sigma). \]
By construction, at $t=0$ the degree $d$ is chosen from $D$. On the factor side, the variable is assigned to factor nodes by choosing uniformly at random without replacement among the emanating half-edges of the factor nodes. Moreover, changing the total number of half-edges by a bounded number only changes the probability of selecting factor nodes with specific arities by $O(1/n)$. Thus, the arity of the chosen factor nodes is distributed according to (1.2). Hence, we find
\[ \frac1n E[\log Z(\hat G_{\varepsilon})] = E[\log z] + o(1) = E\Big[\frac{\xi^{-d}}{|\Omega|}\,\Lambda\Big(\sum_{\sigma\in\Omega}\prod_{i=1}^{d}\sum_{\tau\in\Omega^{\hat k_i}} 1\{\tau_{h_i}=\sigma\}\,\psi_{\hat k_i}(\tau)\prod_{j\ne h_i}\mu_{ij}(\tau_j)\Big)\Big] + o(1). \qquad\Box \]
Proposition 3.7 now follows from Propositions 15.2 and 15.3.
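For illustration, the t = 0 computation can be mimicked numerically: a single variable with d independent unary factors has partition function z exactly as above. A small sketch with hypothetical positive weights:

import math, random

random.seed(6)

# Brute-force sketch of the t=0 decomposition: the component of a variable
# x_1 with d unary factors has z = sum_sigma prod_j psi_j(sigma),
# here with made-up positive weights over Omega = {-1, +1}.
d = 4
unary = [{-1: random.uniform(0.2, 1.0), 1: random.uniform(0.2, 1.0)}
         for _ in range(d)]
z = sum(math.prod(psi[s] for psi in unary) for s in (-1, 1))
print(math.log(z))  # this component's contribution to log Z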
16. APPLICATIONS

16.1. LDGM codes.
We start by showing how to apply Theorem 2.3 to derive the statement of Theorem 1.1. To this end, let $\Omega=\{\pm1\}$ and $\Psi_k = \{\psi_{k,1},\psi_{k,-1}\}$ for all $k$ in the support of $K$, where
\[ \psi_{k,J}(\sigma) = \frac{1 + (1-2\eta)\,J\prod_{i=1}^k\sigma_i}{2} \qquad\text{for all }\sigma\in\Omega^k,\ J\in\{\pm1\}. \]
The prior $P_k$ is simply the uniform distribution, i.e. $P_k(\psi_{k,J}) = 1/2$ for $J\in\{\pm1\}$. The distribution of $\psi\in\Psi_k$ conditioned on the planted configuration, for a factor node $a$ of degree $k$, is given by
\[ P\big[\psi_a = \psi_{k,J}\mid\sigma_{\partial a} = (\sigma_1,\dots,\sigma_k)\big] = \frac{1 + (1-2\eta)\,J\prod_{i=1}^k\sigma_i}{2}, \]
which yields $1-\eta$ if $\prod_{i=1}^k\sigma_i = J$ and $\eta$ if $\prod_{i=1}^k\sigma_i = -J$. Furthermore, we have
\[ \xi = E\Big[|\Omega|^{-k}\sum_{\tau\in\Omega^k}\psi_k(\tau)\Big] = \frac12 \qquad\text{and}\qquad E\Big[|\Omega|^{-k}\sum_{\tau\in\Omega^k}\Lambda\big(\psi_k(\tau)/\xi\big)\Big] = \log2 - H(\eta), \quad(16.1) \]
where $H(\eta) = -\eta\log\eta - (1-\eta)\log(1-\eta)$ denotes the binary entropy. Next, we check SYM, BAL, POS. SYM and BAL are immediate since the function $\sigma\mapsto E[\psi_k(\sigma)]$ is constant. For POS, we employ an argument from [14, Section 4.4]. Expanding $\Lambda(\,\cdot\,)$ and using Fubini's theorem, we obtain
\[ E\Big[\Lambda\Big(\sum_{\tau\in\Omega^k}\psi_k(\tau)\prod_{i=1}^k\rho_i(\tau_i)\Big)\Big] = c_0 + \sum_{\ell\ge2}\frac{E\big[((1-2\eta)J)^\ell\big]\,E\big[(\rho(1)-\rho(-1))^\ell\big]^k}{2\,\ell(\ell-1)}, \]
where $c_0$ does not depend on $\rho$. Plugging into POS and letting $X_\ell = E[(\rho(1)-\rho(-1))^\ell]$ and $Y_\ell = E[(\rho'(1)-\rho'(-1))^\ell]$, we merely need to show that
\[ \sum_{\ell\ge2}\frac{E\big[((1-2\eta)J)^\ell\big]}{\ell(\ell-1)}\Big(X_\ell^k - kX_\ell Y_\ell^{k-1} + (k-1)Y_\ell^k\Big) \ge 0. \]
If $\ell$ is odd, then $E[((1-2\eta)J)^\ell] = 0$ since $J$ is uniform on $\{\pm1\}$. Moreover, for even $\ell$ we have $X_\ell, Y_\ell \ge 0$ and $E[((1-2\eta)J)^\ell] \ge 0$, and $X^k - kXY^{k-1} + (k-1)Y^k \ge 0$ for all $X, Y \ge 0$. Hence POS holds. Theorem 2.3 together with (16.1) yields
\[ \lim_{n\to\infty}\frac1n I(\sigma^*, G^*) = \Big(1+\frac{\bar d}{\bar k}\Big)\log2 - H(\eta) - B(\eta). \]
Finally, we simplify the Bethe functional $B(\eta)$. To this end, we can map a distribution $\mu^{(\pi')}$ drawn from $\pi'\in P_*(\{\pm1\})$ to $\theta^{(\rho)}$ drawn from $\rho\in P([-1,1])$ via $\theta^{(\rho)} = \mu^{(\pi')}(1) - \mu^{(\pi')}(-1)$. Then
\[ B(\pi') = E\Big[\frac{\xi^{-d}}{|\Omega|}\,\Lambda\Big(\sum_{\sigma\in\Omega}\prod_{i=1}^{d}\sum_{\tau\in\Omega^{\hat k_i}} 1\{\tau_{h_i}=\sigma\}\,\psi_{\hat k_i}(\tau)\prod_{j\ne h_i}\mu^{(\pi')}_{ij}(\tau_j)\Big)\Big] - \frac{\bar d}{\bar k\,\xi}\,E\Big[(k-1)\,\Lambda\Big(\sum_{\tau\in\Omega^k}\psi_k(\tau)\prod_{j=1}^k\mu^{(\pi')}_j(\tau_j)\Big)\Big] \]
\[ = E\Big[\Lambda\Big(\sum_{\sigma\in\{\pm1\}}\prod_{i=1}^{d}\frac{1 + (1-2\eta)\,J_i\,\sigma\prod_{j=1}^{\hat k_i-1}\theta^{(\rho)}_{ij}}{2}\Big)\Big] - \frac{\bar d}{\bar k}\,E\Big[(k-1)\,\Lambda\Big(\frac{1 + (1-2\eta)\,J\prod_{j=1}^{k}\theta^{(\rho)}_{j}}{2}\Big)\Big] = B_{\mathrm{ldgm}}(\rho,\eta), \]
concluding the proof.
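The identities in (16.1), as reconstructed above, are easy to confirm numerically. The following sketch enumerates the LDGM weight function for hypothetical parameters eta = 0.1 and k = 3:

import itertools, math

# Numerical sanity check of (16.1): psi_{k,J} takes values {eta, 1-eta},
# averages to xi = 1/2 over Omega^k, and the normalised Lambda-average
# equals log 2 - H(eta) with H the natural-log binary entropy.
eta, k = 0.1, 3
H = -eta * math.log(eta) - (1 - eta) * math.log(1 - eta)

def psi(tau, J):
    p = 1.0
    for s in tau:
        p *= s
    return (1 + (1 - 2 * eta) * J * p) / 2

vals = [psi(tau, J) for J in (-1, 1)
        for tau in itertools.product((-1, 1), repeat=k)]
xi = sum(vals) / len(vals)
lam = sum((v / xi) * math.log(v / xi) for v in vals) / len(vals)
print(xi, lam, math.log(2) - H)  # 0.5, and the last two values agree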
16.2. Stochastic Block Model. First we need to check that the SBM indeed satisfies the assumptions SYM, BAL and POS, which follows directly from [14].

Lemma 16.1. The Stochastic Block Model satisfies the assumptions SYM, BAL and POS for all $q\ge2$ and $\beta\ge0$.

Proof. The lemma is an immediate consequence of Lemmas 4.3 and 4.5 in [14], which carry over to the Stochastic Block Model defined in Section 1.3. □

Now, the proof of Theorem 1.2 is reduced to an application of Theorem 17.1 to the stochastic block model.
16.3. The Potts antiferromagnet on random regular graphs. Let $G(n,d)$ denote a random regular graph with $n$ vertices, each of degree $d$.

Theorem 16.2. Let $k=2$, $d\in\mathbb N_{\ge3}$ and $m = dn/k$. For $q\ge2$ and $c\in[0,1]$, let
\[ B_{\mathrm{Potts}}(d,q,c) = \sup_{\pi\in P_*([q])} E\Bigg[\frac{\Lambda\Big(\sum_{\sigma=1}^q\prod_{i=1}^{d}\big(1 - c\,\mu^{(\pi)}_i(\sigma)\big)\Big)}{q\,(1-c/q)^{d}} - \frac{d\,\Lambda\Big(1 - c\sum_{\tau=1}^q\mu^{(\pi)}_1(\tau)\,\mu^{(\pi)}_2(\tau)\Big)}{2\,(1-c/q)}\Bigg], \]
\[ \beta_{q,\mathrm{cond}}(d) = \inf\Big\{\beta>0 : B_{\mathrm{Potts}}\big(d,q,1-e^{-\beta}\big) > \log q + \frac d2\log\big(1-(1-e^{-\beta})/q\big)\Big\}. \]
Then we have
\[ \lim_{n\to\infty}\frac1n E\big[\log Z_\beta(G(n,d))\big] = \log q + \frac d2\log\big(1-(1-e^{-\beta})/q\big) \qquad\text{if }\beta<\beta_{q,\mathrm{cond}}(d), \]
\[ \lim_{n\to\infty}\frac1n E\big[\log Z_\beta(G(n,d))\big] < \log q + \frac d2\log\big(1-(1-e^{-\beta})/q\big) \qquad\text{if }\beta>\beta_{q,\mathrm{cond}}(d). \]
The key observation towards the proof of Theorem 16.2 is that the Stochastic Block Model is just the planted version of the Potts antiferromagnet. Indeed, we find
\[ P[G^*_{\mathrm{SBM}} = G\mid\sigma^*] \propto P[G(n,d)=G]\,\exp\Big(-\beta\sum_{(v,w)\in E(G)} 1\{\sigma^*(v)=\sigma^*(w)\}\Big) \propto P[G_{\mathrm{Potts}}(\sigma^*)=G]. \]
Proof of Theorem 16.2. The theorem is an immediate consequence of Theorem 1.2 and Lemma 4.4. □
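As a sanity check of Theorem 16.2 (in the form stated above), evaluating the Potts-Bethe functional at the replica symmetric point pi = delta_uniform reproduces the annealed value exactly, so the supremum is always at least the annealed free entropy. The parameter values below are hypothetical:

import math

# Evaluate B_Potts at the atom on the uniform distribution: at this trivial
# pi the functional equals log q + (d/2) log(1 - c/q), the annealed value.
q, d, beta = 3, 5, 0.7
c = 1 - math.exp(-beta)

Lambda = lambda x: x * math.log(x)

inner1 = q * (1 - c / q) ** d        # sum_sigma prod_i (1 - c mu_i(sigma))
term1 = Lambda(inner1) / (q * (1 - c / q) ** d)
inner2 = 1 - c / q                   # 1 - c sum_tau mu_1(tau) mu_2(tau)
term2 = d * Lambda(inner2) / (2 * (1 - c / q))
bethe_at_uniform = term1 - term2

annealed = math.log(q) + d / 2 * math.log(1 - c / q)
print(bethe_at_uniform, annealed)    # identical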
16.4. Diluted mixed $k$-spin models. The proof of Theorem 1.3 is based on Theorem 2.3. Clearly, the mixed $k$-spin model fits the definition of the generalised model underlying Theorem 2.3. While we have a degree sequence on the factor side, each factor chooses variable nodes uniformly at random without replacement. Thus, up to a smaller-order error that adds $o(1)$ to the free energy, the number of neighbours of a variable node is a Poisson random variable. Of course, one problem is that the number of possible weight functions is infinite. We will tackle this issue in the proof of Theorem 1.3 by introducing a discretised version of $J$ that is cut off at the tails. Let $p_{k,J,\beta}$ be the law of $\psi_{k,J,\beta}$. Then, fix some $r\in\mathbb N$ and define a discretised version of $J$:
\[ J^{(r)} := \sum_{i=0}^{r^2-1} 1\Big\{J\in\Big[-r+\frac ir,\,-r+\frac{i+1}r\Big]\Big\}\Big(-r+\frac ir\Big) + \sum_{i=r^2}^{2r^2-1} 1\Big\{J\in\Big[-r+\frac ir,\,-r+\frac{i+1}r\Big]\Big\}\Big(-r+\frac{i+1}r\Big) - 1\{J<-r\}\,r + 1\{J>r\}\,r. \]
Note that $r$ in $J^{(r)}$ governs both the value range of the random variable and the size of each discretised interval: for $J<0$, $J^{(r)}$ takes the value of the left interval boundary, while for $J>0$ it takes the value of the right one, so that $J^{(r)}$ is symmetric and bounded. Let $p^{(r)}_{k,J,\beta}$ be the law of $\psi_{k,J^{(r)},\beta}$.

Lemma 16.3. For all $r\in\mathbb N$, $k\ge2$, $\bar d>0$ and $\beta>0$, $p^{(r)}_{k,J,\beta}$ satisfies the conditions SYM, BAL and POS.

Proof. Condition SYM is satisfied with $\varepsilon = 1-\tanh(\beta r) > 0$ and $\xi = 1$. In BAL, the function that we need to check for concavity is $\mu\mapsto 1 + E[\tanh(\beta J^{(r)})]\,E_\mu[X]^k$, which is constant equal to $1$, as $J^{(r)}$ is distributed as $-J^{(r)}$. Hence, BAL follows. Finally, for POS, we use the expansion $\Lambda(1-x) = -x + \sum_{\ell\ge2}\frac{x^\ell}{\ell(\ell-1)}$ and, for $j\ge2$,
\[ \Big(1 - \sum_{\tau\in\{\pm1\}^k}\psi_{k,J^{(r)},\beta}(\tau)\prod_{i=1}^k\mu_{i,\rho}(\tau_i)\Big)^j = \big(\tanh(\beta J^{(r)})\big)^j\prod_{i=1}^k\big(\mu_{i,\rho}(1)-\mu_{i,\rho}(-1)\big)^j. \]
Therefore, by the dominated convergence theorem,
\[ E\Big[\Lambda\Big(\sum_{\tau\in\{\pm1\}^k}\psi_{k,J^{(r)},\beta}(\tau)\prod_{i=1}^k\mu_{i,\rho}(\tau_i)\Big)\Big] = c_0 + \sum_{j\ge2}\frac{E\big[(\tanh(\beta J^{(r)}))^j\big]\,E\big[(\mu_\rho(1)-\mu_\rho(-1))^j\big]^k}{j(j-1)}, \]
where $c_0$ does not depend on $\rho$. Plugging into POS and setting $X_j = E[(\mu_\rho(1)-\mu_\rho(-1))^j]$ and $Y_j = E[(\mu_{\rho'}(1)-\mu_{\rho'}(-1))^j]$, we arrive at the condition
\[ \sum_{j\ge2} E\big[(\tanh(\beta J^{(r)}))^j\big]\,\Big(X_j^k + (k-1)Y_j^k - kX_jY_j^{k-1}\Big)\Big/(j(j-1)) \ge 0. \]
If $j$ is odd, then $E[(\tanh(\beta J^{(r)}))^j] = 0$ as $J^{(r)}$ is symmetric, while for even $j$ we have $E[(\tanh(\beta J^{(r)}))^j] \ge 0$. The claim follows from the fact that $X^k - kXY^{k-1} + (k-1)Y^k \ge 0$ for all $X, Y \ge 0$. □
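The discretisation $J^{(r)}$ is straightforward to implement. The following sketch (with a standard normal J, a hypothetical choice of coupling distribution) checks the two properties used in the proof of Lemma 16.3, boundedness and symmetry, via the vanishing of odd moments of tanh(beta J^(r)):

import numpy as np

rng = np.random.default_rng(3)

# Sketch of J^(r): clip J to [-r, r] and round to the grid of mesh 1/r,
# rounding away from zero so that J^(r) stays symmetric when J is.
def discretise(J, r):
    J = np.clip(J, -r, r)
    return np.where(J < 0, np.floor(J * r) / r, np.ceil(J * r) / r)

J = rng.standard_normal(10**6)           # a symmetric coupling distribution
Jr = discretise(J, r=4)
print(np.abs(Jr).max(), Jr.mean())       # bounded by r, mean approx 0
print(np.mean(np.tanh(0.5 * Jr) ** 3))   # odd tanh-moments approx 0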
Lemma 16.4. If long-range correlations are absent in $G$, we have $\lim_{n\to\infty} E[\log Z(G)]/n = \lim_{n\to\infty}\log E[Z(G)]/n$.

Proof. We readily find that
\[ \frac{\partial}{\partial\bar d}\,\frac1n E[\log Z(G)] = E\Big[\log\Big(1 + \tanh(\beta J)\Big\langle\prod_{i=1}^k\sigma_{y_i}\Big\rangle_G\Big)\Big] \le \log\Big(E\Big[1 + \tanh(\beta J)\Big\langle\prod_{i=1}^k\sigma_{y_i}\Big\rangle_G\Big]\Big) = \frac{\partial}{\partial\bar d}\,\frac1n\log E[Z(G)], \quad(16.3) \]
where the inequality follows from Jensen's inequality. Assume that long-range correlations are absent in $G$; then by definition the spins are approximately pairwise independent and, by Lemma 4.2, $k$-wise independent. Therefore, the Jensen gap in (16.3) vanishes. Finally,
\[ \frac1n E[\log Z(G)] = \int_0^{\bar d}\frac{\partial}{\partial s}\,\frac1n E[\log Z(G)]\,\mathrm ds = \int_0^{\bar d}\frac{\partial}{\partial s}\,\frac1n\log E[Z(G)]\,\mathrm ds = \frac1n\log E[Z(G)], \]
whence the lemma follows. □
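The Jensen gap in (16.3) can be observed directly on a toy system. The sketch below (model sizes and couplings are made up and far from the asymptotic regime) contrasts the quenched density E[log Z]/n with the annealed density log E[Z]/n for a small diluted 2-spin model at weak and strong coupling:

import itertools, math, random

random.seed(5)

# Toy contrast of quenched vs annealed free entropy on n=8 spins:
# at small beta the gap is negligible, at large beta the densities separate.
n, m, samples = 8, 12, 40

def log_Z(beta, edges, Js):
    Z = 0.0
    for sigma in itertools.product((-1, 1), repeat=n):
        H = sum(J * sigma[i] * sigma[j] for (i, j), J in zip(edges, Js))
        Z += math.exp(beta * H)
    return math.log(Z)

for beta in (0.2, 2.0):
    logs = []
    for _ in range(samples):
        edges = [tuple(random.sample(range(n), 2)) for _ in range(m)]
        Js = [random.choice((-1, 1)) for _ in range(m)]
        logs.append(log_Z(beta, edges, Js))
    quenched = sum(logs) / samples / n
    annealed = math.log(sum(math.exp(v) for v in logs) / samples) / n
    print(beta, round(quenched, 3), round(annealed, 3))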
Claim 16.5. If we find for almost all $i,j\in[n]$ that $\langle\sigma_i\sigma_j\rangle = o(1)$, then for all but $o(n)$ coordinates $i\in[n]$ we have $\mu_i(1) = \frac12 + o(1)$.

Proof. We prove the claim by passing to limits, i.e., we associate a function $f_\sigma:[0,1]\to P(\{-1,1\})$ with $\sigma\in\{-1,1\}^n$ such that
\[ f_\sigma(x) = \sum_{i=1}^n 1\{x\in[(i-1)/n,\,i/n)\}\,\delta_{\sigma_i}. \]
(Hence, $f_\sigma(x)\in P(\{-1,1\})$ is the atom on $\sigma_i$ that represents the assignment $\sigma$ when we shrink the coordinates from $[n]$ to $[0,1]$.) Coming with this embedding of $\{-1,1\}^n$ into the space of functions $f:[0,1]\to P(\{-1,1\})$, there is an embedding of the corresponding probability measures $\mu\in P(\{-1,1\}^n)$ into the space of functions $\hat\mu:[0,1]\to P(\{-1,1\})$ by taking well-defined limits. A detailed discussion and formal justification of the procedure is provided by [18].

Hence, we effectively need to prove the following. Let $F:(s,x)\in[0,1]^2\to[-1,1]$ be a measurable function such that
\[ \int_0^1 F(s,x)\,F(s,y)\,\mathrm ds = 0 \qquad\text{for almost all } x,y\in[0,1]. \quad(16.4) \]
Then $F(s,x) = 0$ for almost all $(s,x)$. Clearly, $F(\,\cdot\,,x), F(\,\cdot\,,y)\in L^2([0,1])$. Then (16.4) shows that $(F(\,\cdot\,,x))_x$ is an orthogonal family. Since any orthonormal family of the separable Hilbert space $L^2([0,1])$ is countable, this implies that $\{F(\,\cdot\,,x)/\|F(\,\cdot\,,x)\|_2 : F(\,\cdot\,,x)\ne0\}$ is countable. Therefore, unless $F(\,\cdot\,,x) = 0$ for almost all $x$, there exists $x$ with $F(\,\cdot\,,x)\ne0$ such that the set $\{y\in[0,1] : F(\,\cdot\,,y)\ne0,\ F(\,\cdot\,,y)/\|F(\,\cdot\,,y)\|_2 = F(\,\cdot\,,x)/\|F(\,\cdot\,,x)\|_2\}$ has positive measure. But this contradicts (16.4). □

The next lemma follows almost directly from Claim 16.5, as we find that almost all pairs of spins $\sigma_{y_1}$ and $\sigma_{y_2}$ need to be independent; hence, no long-range correlations are present.
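The Hilbert space argument has a transparent finite-dimensional shadow: in an m-dimensional proxy for L^2([0,1]) an orthogonal family of nonzero vectors has at most m members, which is what forces F(.,x) to vanish for almost all x. A minimal numerical illustration:

import numpy as np

rng = np.random.default_rng(4)

# In dimension m, at most m pairwise orthogonal nonzero vectors exist;
# any further nonzero vector necessarily correlates with one of them.
m = 8
Q, _ = np.linalg.qr(rng.standard_normal((m, m)))  # m orthogonal columns
print(np.allclose(Q.T @ Q, np.eye(m)))            # True: maximal family
w = rng.standard_normal(m)                        # one more nonzero vector
print(np.allclose(Q.T @ w, 0))                    # False: orthogonality breaks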
Lemma 16.6. If we have $\lim_{n\to\infty} E[\log Z(G)]/n = \lim_{n\to\infty}\log E[Z(G)]/n$, then long-range correlations are absent in $G$.

Proof. By Jensen's inequality, we have
\[ \lim_{n\to\infty}\frac{\partial}{\partial\bar d}\,\frac1n E[\log Z(G)] \le \lim_{n\to\infty}\frac{\partial}{\partial\bar d}\,\frac1n\log E[Z(G)]. \]
Since $\lim_{n\to\infty} E[\log Z(G)]/n = \lim_{n\to\infty}\log E[Z(G)]/n$ by assumption, we find
\[ \lim_{n\to\infty}\frac{\partial}{\partial\bar d}\,\frac1n E[\log Z(G)] = \lim_{n\to\infty}\frac{\partial}{\partial\bar d}\,\frac1n\log E[Z(G)]. \quad(16.5) \]
Moreover, another application of Jensen's inequality yields
\[ \lim_{n\to\infty}\frac{\partial}{\partial\bar d}\,\frac1n E[\log Z(G)] = E\Big[\log\Big(1+\tanh(\beta J)\Big\langle\prod_{i=1}^k\sigma_{y_i}\Big\rangle_G\Big)\Big] \le \log\Big(E\Big[1+\tanh(\beta J)\Big\langle\prod_{i=1}^k\sigma_{y_i}\Big\rangle_G\Big]\Big) = \lim_{n\to\infty}\frac{\partial}{\partial\bar d}\,\frac1n\log E[Z(G)]. \quad(16.6) \]
By (16.5), equality needs to hold in (16.6). Since $P[k=2] > \varepsilon$ for some $\varepsilon>0$, this equality needs to hold in particular for $k=2$. By Claim 16.5, this implies the absence of long-range correlations in $G$, closing the proof of the lemma. □
Proof of Theorem 1.3. By Lemma 16.3 and since $J^{(r)}$ converges to $J$ in probability as $r\to\infty$, Theorem 2.3 is applicable to the mixed $k$-spin model. Moreover, Lemmas 16.4 and 16.6 evince that long-range correlations are absent in $G$ if and only if
\[ \lim_{n\to\infty} E[\log Z(G)]/n = \lim_{n\to\infty}\log E[Z(G)]/n. \]
The theorem readily follows. □
17. CONDENSATION THRESHOLD

In this section we discuss two (asymptotic) quantities considered as functions of the model parameters $q$, $(\psi_k)_k$, $k$ and $d$. For this purpose let $Z_k = \sum_{y\in\Omega^k} E[\psi_k(y)]$ for $k\in\mathbb Z_{\ge1}$. The annealed free entropy density $\phi_a\in\mathbb R$ is given by
\[ \phi_a = (1-\bar d)\log q + \frac{\bar d}{\bar k}\,E[\log Z_k]. \]
Assuming $q$ to be fixed, we consider the regimes
\[ \mathcal R_{\mathrm{RS}} = \big\{(d,k,(\psi_k)_k) : B_{\sup} \le \phi_a\big\} \qquad\text{and}\qquad \mathcal R_{\mathrm{cond}} = \big\{(d,k,(\psi_k)_k) : B_{\sup} > \phi_a\big\}. \]
The next result is dedicated to the relative entropy of the teacher-student model with respect to the null model.
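For concreteness, phi_a is trivial to evaluate once Z_k is known. The following sketch does so for a toy LDGM-type instance with hypothetical parameters (q = 2, all factors of a single arity k, Z_k = 2^(k-1) since each psi_k averages to 1/2):

import math

# Annealed free entropy density for a toy instance of the general formula
# phi_a = (1 - dbar) log q + (dbar / kbar) E[log Z_k].
q, k, dbar, kbar = 2, 3, 3.0, 3.0
Z_k = 2 ** (k - 1)
phi_a = (1 - dbar) * math.log(q) + (dbar / kbar) * math.log(Z_k)
print(phi_a)  # equals (1 - dbar/k) log 2 here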
Theorem 17.1. Assume that DEG, SYM, BAL and POS hold. Then we have
\[ \lim_{n\to\infty}\frac1n D_{\mathrm{KL}}\big((\sigma^*_n, G^*_n(\sigma^*_n))\,\big\|\,(\sigma_{G_n}, G_n)\big) = 0 \qquad\text{if }(d,k,(\psi_k)_k)\in\mathcal R_{\mathrm{RS}}, \]
\[ \lim_{n\to\infty}\frac1n D_{\mathrm{KL}}\big((\sigma^*_n, G^*_n(\sigma^*_n))\,\big\|\,(\sigma_{G_n}, G_n)\big) > 0 \qquad\text{if }(d,k,(\psi_k)_k)\in\mathcal R_{\mathrm{cond}}. \]
The last result establishes that the quenched free entropy density and the annealed free entropy density coincide exactly in the replica symmetric regime.
Theorem 17.2. Assume that DEG, SYM, BAL and POS hold. Then we have
\[ \lim_{n\to\infty}\frac1n E\big[\log Z_{G_n}\big] = \phi_a \qquad\text{if }(d,k,(\psi_k)_k)\in\mathcal R_{\mathrm{RS}}, \]
\[ \lim_{n\to\infty}\frac1n E\big[\log Z_{G_n}\big] < \phi_a \qquad\text{if }(d,k,(\psi_k)_k)\in\mathcal R_{\mathrm{cond}}. \]
In the following we tacitly assume that DEG, SYM, BAL and POS are satisfied.
17.1. Preliminaries. We use the notation from Section 12 and further let $\phi_a = E[\phi_{a,t_n}]$, where $\phi_{a,t} = \frac1n\log\bar Z_t$ with $\bar Z_t = E[Z_{G_t}]$ denotes the annealed free entropy density for given $t\in T_n$, $n\in\mathbb N$. The first result is a corollary of Proposition 9.2.

Fact 17.3.
Uniformly in $t\in T^\circ_n$ we have $\phi_{a,t} = \phi_{a,\infty} + o(1)$ and further $\phi_a = \phi_{a,\infty} + o(1)$, where
\[ \phi_{a,\infty} = (1-\bar d)\log q + \frac{\bar d}{\bar k}\,E[\log Z_k]. \]
Proof. Recall from Proposition 9.2 that we have $\bar Z_t = (1+o(1))\,r^*_t\,q^n\prod_{i\in[m_t]}\xi_{k_{t,i}}$ uniformly in $t\in T^\circ_n$ with $r^*_t = \Theta(1)$ uniformly, which yields
\[ \phi_{a,t} = \log q + \frac{m_t}{n}\,E\big[\log\xi_{k_t}\big] + o(1) \]
uniformly. Now, notice that $\varepsilon\le\xi_k\le\varepsilon^{-1}$ for all $k$ in the support of $k$ using SYM, so with these uniform bounds on the expectation and the uniform bounds imposed by $T^\circ_n$ we have
\[ \phi_{a,t} = \log q + \frac{\bar d}{\bar k}\,E[\log\xi_k] + o(1) = (1-\bar d)\log q + \frac{\bar d}{\bar k}\,E[\log Z_k] + o(1). \]
Finally, notice that $\phi_{a,t}$, $t\in T_n$, is sublinear in the number of factors using SYM, so with Proposition 8.1 and the uniform convergence given $t\in T^\circ_n$ we have $\phi_a = \phi_{a,t} + o(1) = \phi_{a,\infty} + o(1)$. □

The next fact relates the quantities $\hat\phi_t$, $\phi_{a,t}$ and $\bar\phi_t$ through the distance of the models $G^*_t(\hat\sigma_t)$ and $G_t$.

Fact 17.4.
For all $n\in\mathbb N$ and $t\in T_n$ we have
\[ \hat\phi_t = \phi_{a,t} + \frac1n D_{\mathrm{KL}}\big(G^*_t(\hat\sigma)\,\big\|\,G_t\big) \ge \phi_{a,t} - \frac1n D_{\mathrm{KL}}\big(G_t\,\big\|\,G^*_t(\hat\sigma)\big) = \bar\phi_t. \]
Proof. Notice that the Radon-Nikodym derivative of $G^*_t(\hat\sigma)$ with respect to $G_t$ is $G\mapsto Z_G/\bar Z_t$, which gives the two equalities, while the inequality is obvious due to the non-negativity of the relative entropy. □
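For the reader's convenience, here is the one-line computation behind Fact 17.4, a sketch under the stated identification of the Radon-Nikodym derivative:
\[ \frac1n D_{\mathrm{KL}}\big(G^*_t(\hat\sigma)\,\big\|\,G_t\big) = \frac1n E\Big[\log\frac{Z_{G^*_t(\hat\sigma)}}{\bar Z_t}\Big] = \hat\phi_t - \phi_{a,t}, \qquad \frac1n D_{\mathrm{KL}}\big(G_t\,\big\|\,G^*_t(\hat\sigma)\big) = -\frac1n E\Big[\log\frac{Z_{G_t}}{\bar Z_t}\Big] = \phi_{a,t} - \bar\phi_t; \]
rearranging the first identity and subtracting the (non-negative) second divergence yields the displayed chain.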
17.2. Proof of Theorem 17.2. Before we continue, we observe joint concentration given $t\in T^\circ_n$.

Lemma 17.5.
Jointly in $s = (t,\sigma,y)\in S^\circ_n$ we have $\hat\phi = \phi^* + o(1) = \phi^*_s + o(1) = \phi^*_t + o(1) = \hat\phi_t + o(1)$.

Proof. The first two equalities are immediate from Proposition 12.4. Thanks to Proposition 10.1 we have $\phi^*_t = E[\phi^*_{s^*_t}\,1\{s^*_t\in S^\circ_n\}] + o(1)$ uniformly in $t\in T^\circ_n$, and further, for $(t,\sigma,y)\in S^\circ_n$, Proposition 12.4 and the triangle inequality give $\phi^*_{t,\sigma',y'} = \phi^*_{t,\sigma,y} + o(1)$ for any $(t,\sigma',y')\in S^\circ_n$ uniformly. This yields $\phi^*_t = \phi^*_s + o(1)$ uniformly in $s\in S^\circ_n$.

Now, for given $\varepsilon\in(0,1)$ use Proposition 9.1 to obtain $c, r\in\mathbb R_{>0}$; then for any $t\in T^\circ_n$ we have
\[ P[\hat s_t\notin S^\circ_n] \le \varepsilon + c\,P[s^*_t\notin S^\circ_n,\ \sigma^*\in\mathcal E_t] = \varepsilon + o(1) \]
uniformly in $t\in T^\circ_n$ thanks to Proposition 10.1, i.e. $P[\hat s_t\notin S^\circ_n] = o(1)$ uniformly. Analogously to the above we obtain $\hat\phi_t = \phi^*_{t,\sigma,y} + o(1)$ uniformly, which completes the proof. □

First, we derive the following contiguity-like result for the replica symmetric phase.
Lemma 17.6. If we have $\bar\phi = \phi_a + o(1)$, then for all $\hat c, \hat c'\in\mathbb R_{>0}$ there exist $c, c'\in\mathbb R_{>0}$ such that for all $n\in\mathbb N$, all $t\in T^\circ_n$ and all $E\subseteq\Omega^{V_n}\times\mathcal G_t$ with $P[(\hat\sigma_t, G^*_t(\hat\sigma_t))\in E] \le \hat c'\exp(-\hat cn)$ we have $P[(\sigma_{G_t}, G_t)\in E] \le c'\exp(-cn)$, where $\mathcal G_t$ denotes the support of $G_t$.

Proof. Using Fact 17.4 and Lemma 17.5 we notice that $\bar\phi_t = \phi_{a,t} + o(1)$ jointly in $t\in T^\circ_n$. Now, fix $\hat c, \hat c'\in\mathbb R_{>0}$, $n\in\mathbb N$ and an event $E$ such that
\[ P\big[(\sigma_{G^*_t(\hat\sigma_t)}, G^*_t(\hat\sigma_t))\in E\big] \le \hat c'\exp(-\hat cn). \]
For any $\varepsilon\in\mathbb R_{>0}$, using Proposition 12.1 we find constants $c_1, c'_1\in\mathbb R_{>0}$ such that
\[ P\big[\phi(G_t) \le \bar\phi_t - \varepsilon\big] \le c'_1\exp(-c_1\varepsilon n) \]
uniformly in $t\in T^\circ_n$. Due to the assumption we have $\bar\phi_t \ge \phi_{a,t} - \varepsilon$ uniformly for all sufficiently large $n\in\mathbb N$. Combining these gives $P[G_t\notin\mathcal G^\circ_t] \le c'_1\exp(-c_1\varepsilon n)$ with $\mathcal G^\circ_t = \{G\in\mathcal G_t : \phi(G) > \phi_{a,t} - 2\varepsilon\}$, and further
\[ P[(\sigma_{G_t}, G_t)\in E] \le c'_1\exp(-c_1\varepsilon n) + P\big[(\sigma_{G_t}, G_t)\in E,\ G_t\in\mathcal G^\circ_t\big] = c'_1\exp(-c_1\varepsilon n) + \sum_\sigma E\Big[\frac{\psi_{G_t}(\sigma)}{\exp(n\phi(G_t))}\,1\{(\sigma, G_t)\in E,\ G_t\in\mathcal G^\circ_t\}\Big] \]
\[ < c'_1\exp(-c_1\varepsilon n) + \sum_\sigma E\Big[\frac{\psi_{G_t}(\sigma)}{\exp(n\phi_{a,t} - 2\varepsilon n)}\,1\{(\sigma, G_t)\in E,\ G_t\in\mathcal G^\circ_t\}\Big] = c'_1\exp(-c_1\varepsilon n) + \mathrm e^{2\varepsilon n}\sum_\sigma E\Big[\frac{\psi_{G_t}(\sigma)}{\bar Z_t}\,1\{(\sigma, G_t)\in E,\ G_t\in\mathcal G^\circ_t\}\Big] \]
\[ = c'_1\exp(-c_1\varepsilon n) + \mathrm e^{2\varepsilon n}\,E\Big[\frac{Z_{G_t}}{\bar Z_t}\sum_\sigma\mu_{G_t}(\sigma)\,1\{(\sigma, G_t)\in E,\ G_t\in\mathcal G^\circ_t\}\Big] = c'_1\exp(-c_1\varepsilon n) + \mathrm e^{2\varepsilon n}\,P\big[(\sigma_{G^*_t(\hat\sigma_t)}, G^*_t(\hat\sigma_t))\in E,\ G^*_t(\hat\sigma_t)\in\mathcal G^\circ_t\big] \]
\[ \le c'_1\exp(-c_1\varepsilon n) + \hat c'\exp(2\varepsilon n - \hat cn). \]
Let $\varepsilon$ be the solution for which the coefficients in the exponents coincide, i.e. $c_0 = c_1\varepsilon = \hat c - 2\varepsilon\in\mathbb R_{>0}$; then with $c'_0 = c'_1 + \hat c'$ we have $P[(\sigma_{G_t}, G_t)\in E] < c'_0\exp(-c_0n)$. Recall that the result above holds for all $n > n_0$ for some suitable $n_0\in\mathbb N$. Now, redefine $c = c_0$ and let $c'\ge c'_0$ be sufficiently large such that $c'\exp(-cn)\ge1$ for all $n\le n_0$; then the assertion is trivial for all small $n$ and also holds for large $n$. □

Next, we derive a concentration result for the Nishimori quenched free entropy density.

Lemma 17.7.
For all $\varepsilon\in\mathbb R_{>0}$ there exist $c, c'\in\mathbb R_{>0}$ such that for all $n\in\mathbb N$ and $t\in T^\circ_n$ we have
\[ P\big[|\phi(\hat G_t) - \hat\phi_t| \ge \varepsilon\big] \le c'\exp(-cn). \]
Proof. Using Lemma 17.5 we obtain uniform bounds on the distance of $\phi^*_s$ and $\hat\phi_t$ over any choice of $s\in S^\circ_n$ and $t\in T^\circ_n$ for $n$ sufficiently large. Further, for given $\varepsilon'\in\mathbb R_{>0}$, Lemma 10.4 yields $\delta$ and exponential bounds for $\Delta_A(\alpha_{\hat s_t}, \alpha^*) \ge \varepsilon'$ given that the distance of $\hat\rho_t$ and $u_\Omega$ is less than $\delta$. But Proposition 9.3 provides exactly the corresponding exponential bounds. Combining these results leaves us with assignment distributions close to the reference distribution, for which the coupling in Section 12.7 ensures that the corresponding quenched free entropy densities $\phi^*_s$, i.e. those with $\Delta_A(\alpha_s, \alpha^*) < \varepsilon'$ and $t\in T^\circ_n$, $s = (t,\sigma,y)\in S_n$, are close to each other and to the centre. Finally, Proposition 12.3 provides uniform exponential bounds for the distance of the free entropy density from its expectation given $s$, which concludes the proof for large $n$. However, choosing $c'\in\mathbb R_{>0}$ sufficiently large ensures that the bound is valid for all $n$. □
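The proof of Lemma 17.8 below relies on the Paley-Zygmund inequality, which we record for convenience: for a non-negative random variable $Z$ with finite second moment and $\theta\in[0,1]$,
\[ P\big[Z \ge \theta\,E[Z]\big] \ge (1-\theta)^2\,\frac{E[Z]^2}{E[Z^2]}; \]
applied with $\theta = 1/2$ to the truncated partition function $Z^\circ_t$, this produces the factor $\frac14(\bar Z^\circ_t)^2/E[(Z^\circ_t)^2]$ appearing in the proof.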
Lemma 17.8. We have $\hat\phi = \phi_a + o(1)$ if and only if $\bar\phi = \phi_a + o(1)$.

Proof. With Fact 17.3 we have $\phi_a = \phi_{a,t} + o(1)$, with Lemma 17.5 we have $\hat\phi = \hat\phi_t + o(1)$ and with Proposition 12.2 we have $\bar\phi = \bar\phi_t + o(1)$, all uniformly in $t\in T^\circ_n$. Now, we show that $\hat\phi_t = \phi_{a,t} + o(1)$ if and only if $\bar\phi_t = \phi_{a,t} + o(1)$ for a fixed sequence $t = t_n\in T^\circ_n$, since then the assertion follows by the arguments above. Let $c, c'\in\mathbb R_{>0}$ and $\hat c, \hat c'\in\mathbb R_{>0}$ be the constants obtained from Proposition 12.1 and Proposition 12.3 respectively.

First, assume that $\hat\phi_t = \phi_{a,t} + o(1)$ holds. Fix a sequence $\varepsilon_n\in\mathbb R_{>0}$, $n\in\mathbb N$, such that $\varepsilon_n = o(1)$, $\varepsilon_nn = \omega(1)$ and $|\hat\phi_t - \phi_{a,t}| < \varepsilon_n$. Use Lemma 17.5 to obtain $|\phi^*_s - \hat\phi_t| < \varepsilon_n$ for all $s = (t_n,\sigma,y)\in S^\circ_n$ and sufficiently large $n\in\mathbb N$. The probability of the complement of the event $E_n = \{G\in\mathcal G_t : |\phi(G) - \phi_{a,t}| < 3\varepsilon_n\}$ can be bounded by
\[ P[G^*_t(\hat\sigma_t)\notin E_n] \le P[\hat s_t\notin S^\circ_n] + E\big[1\{\hat s_t\in S^\circ_n\}\,P[G^*_{\hat s_t}\notin E_n\mid\hat s_t]\big] \le o(1) + E\Big[1\{\hat s_t\in S^\circ_n\}\,P\Big[\big|\phi(G^*_{\hat s_t}) - \phi^*_{\hat s_t}\big| \ge \varepsilon_n\,\Big|\,\hat s_t\Big]\Big] \le o(1) + P[\hat s_t\in S^\circ_n]\,\hat c'\exp(-\hat c\,\varepsilon_nn) = o(1), \]
using $P[\hat s_t\in S^\circ_n] = 1 + o(1)$ from the proof of Lemma 17.5, and that the exponent is of order $\omega(1)$ since $\varepsilon_nn = \omega(1)$. Next, we use $Z^\circ_t = Z_{G_t}\,1\{G_t\in E_n\}$ to obtain $\bar Z^\circ_t = \bar Z_t\,P[G^*_t(\hat\sigma_t)\in E_n] \ge \frac12\bar Z_t$, where $\bar Z^\circ_t = E[Z^\circ_t]$ and $n$ is sufficiently large such that $P[G^*_t(\hat\sigma_t)\in E_n] \ge \frac12$. On the other hand, using the definition of $E_n$ we have $E[(Z^\circ_t)^2] \le \exp(2n(\phi_{a,t} + 3\varepsilon_n))\,P[G_t\in E_n] \le \exp(6\varepsilon_nn)\,\bar Z_t^2$, so the Paley-Zygmund inequality yields
\[ P\Big[Z^\circ_t \ge \frac12\bar Z^\circ_t\Big] \ge \frac14\,\frac{(\bar Z^\circ_t)^2}{E[(Z^\circ_t)^2]} \ge \frac1{16}\exp(-6\varepsilon_nn). \]
Since by definition we always have $Z^\circ_t \le Z_{G_t}$, the event $Z^\circ_t \ge \frac12\bar Z^\circ_t$ implies $Z_{G_t} \ge \frac14\bar Z_t$, and hence
\[ P[G_t\in E'_n] = P\Big[Z_{G_t} \ge \frac14\bar Z_t\Big] \ge \frac1{16}\exp(-6\varepsilon_nn), \qquad E'_n = \Big\{G\in\mathcal G_t : \phi(G) \ge \phi_{a,t} - \frac{\log4}n\Big\}. \]
Fix a sequence $\delta_n\in\mathbb R_{>0}$, $n\in\mathbb N$, with $\delta_n = o(1)$ and $\delta_n = \omega(\varepsilon_n)$. Now we can use Proposition 12.1 with $E''_n = \{G\in\mathcal G_t : |\phi(G) - \bar\phi_t| < \delta_n\}$ to obtain
\[ P[G_t\in E'_n\cap E''_n] \ge \frac1{16}\exp(-6\varepsilon_nn) - c'\exp(-c\,\delta_nn) = \Big(1 - 16c'\exp\Big(-c\,\delta_nn\Big(1 - \frac{6\varepsilon_n}{c\,\delta_n}\Big)\Big)\Big)\frac1{16}\exp(-6\varepsilon_nn) = (1+o(1))\,\frac1{16}\exp(-6\varepsilon_nn), \]
so in particular $G_t\in E'_n\cap E''_n$ asymptotically with positive probability, and for all $G\in E'_n\cap E''_n$ we have
\[ |\phi_{a,t} - \bar\phi_t| \le |\phi_{a,t} - \phi(G)| + |\phi(G) - \bar\phi_t| \le \frac{\log4}n + \delta_n = o(1). \]
Conversely, assume that $\hat\phi_t = \phi_{a,t} + \Omega(1)$, so there exists $\delta\in\mathbb R_{>0}$ such that $\hat\phi_t \ge \phi_{a,t} + \delta$ for $n$ sufficiently large, using Fact 17.4. Using Lemma 17.7 yields that $P[|\phi(\hat G_t) - \hat\phi_t| \ge \delta/2] \le c'\exp(-cn)$, so $P[\phi(\hat G_t) \le \phi_{a,t} + \delta/2] \le c'\exp(-cn)$. On the other hand, Fact 17.4 shows that $\bar\phi_t \le \phi_{a,t}$, and further Proposition 12.1 shows that $P[|\phi(G_t) - \bar\phi_t| \ge \delta/2] \le c'\exp(-cn)$, so $P[\phi(G_t) \le \phi_{a,t} + \delta/2] \ge 1 - c'\exp(-cn)$. So, with Lemma 17.6, contraposition and Fact 17.4 we obtain $\bar\phi_t = \phi_{a,t} - \Omega(1)$. □

Recall that $\hat\phi = \sup_{\pi\in P_*(\Omega)} B(\pi) + o(1)$ using all assumptions. Then with Fact 17.3 we obtain $\phi_a = \phi_{a,\infty} + o(1)$, and Lemma 17.8 yields $\bar\phi = \phi_{a,\infty} + o(1)$ if and only if $\sup_{\pi\in P_*([q])} B(\pi) = \phi_{a,\infty}$. If $\sup_{\pi\in P_*([q])} B(\pi) \ne \phi_{a,\infty}$, then we have $\sup_{\pi\in P_*([q])} B(\pi) > \phi_{a,\infty}$ and $\limsup_{n\to\infty}\bar\phi < \phi_{a,\infty}$ using Fact 17.4. This completes the proof of Theorem 17.2.

17.3. Proof of Theorem 17.1.
Notice that the relative entropy density is given by
\[ f(n) = \frac1n D_{\mathrm{KL}}\big((G^*_{t_n}(\sigma^*), \sigma^*)\,\big\|\,(G_{t_n}, \sigma_{G_{t_n}})\big) = \frac1n E\big[\log r\big(G^*_{t_n}(\sigma^*), \sigma^*\big)\big], \qquad r(G,\sigma) = \frac{q^{-n}\,Z_G}{\bar\psi_t}, \quad \bar\psi_t = E[\psi_{G_t}(\sigma)], \]
where $r$ denotes the Radon-Nikodym derivative of $(G^*_{t_n}(\sigma^*), \sigma^*)$ with respect to $(G_{t_n}, \sigma_{G_{t_n}})$. Basic algebra and using $g(t) = -\frac1n D_{\mathrm{KL}}(\sigma^*\,\|\,\hat\sigma_t)$ gives $f(n) = \phi^* - \phi_a + E[g(t_n)]$. Using SYM we get $q^{-n}\varepsilon^{m_t} \le P[\hat\sigma_t = \sigma] \le q^{-n}\varepsilon^{-m_t}$, so $g(t)$ is sublinear in the number of factors and hence $E[g(t_n)] = E[g(t_n)\,1\{t_n\in T^\circ_n\}] + o(1)$. Since $g(t)$ coincides with $\delta^*(t)$ in the mutual information proof, we obtain $E[g(t_n)] = o(1)$ using BAL. Now, the result is immediate using Theorem 17.2. □
REFERENCES

[1] E. Abbe, A. Montanari: Conditional random fields, planted constraint satisfaction and entropy concentration. Theory of Computing (2015) 413–443.
[2] D. Amit, H. Gutfreund, H. Sompolinsky: Storing infinite numbers of patterns in a spin-glass model of neural networks. Physical Review Letters (1985) 1530.
[3] M. Aldridge, O. Johnson, J. Scarlett: Group testing: an information theory perspective. Foundations and Trends in Communications and Information Theory (2019).
[4] F. Altarelli, A. Braunstein, L. Dall'Asta, A. Lage-Castellanos, R. Zecchina: Bayesian inference of epidemics on networks via belief propagation. Physical Review Letters (2014) 118701.
[5] J. Banks, C. Moore, J. Neeman, P. Netrapalli: Information-theoretic thresholds for community detection in sparse networks. Proc. 29th COLT (2016) 383–416.
[6] V. Bapst, A. Coja-Oghlan, S. Hetterich, F. Rassmann, D. Vilenchik: The condensation phase transition in random graph coloring. Communications in Mathematical Physics (2016) 543–606.
[7] J. Barbier, C. Chan, N. Macris: Mutual information for the stochastic block model by the adaptive interpolation method. Proc. IEEE International Symposium on Information Theory (2019) 405–409.
[8] J. Barbier, C. Chan, N. Macris: Adaptive path interpolation for sparse systems: application to a simple censored block model. Proc. IEEE International Symposium on Information Theory (2018) 1879–1883.
[9] J. Barbier, N. Macris: The adaptive interpolation method for proving replica formulas. Applications to the Curie-Weiss and Wigner spike models. Journal of Physics A: Mathematical and Theoretical (2019) 294002.
[10] R. Bhattacharya, R. Ranga Rao: Normal approximation and asymptotic expansions. Society for Industrial and Applied Mathematics (2010).
[11] J. van den Brand, N. Jaafari: The mutual information of LDGM codes. arXiv:1707.04413 (2017).
[12] A. Coja-Oghlan, A. Ergür, P. Gao, S. Hetterich, M. Rolvien: The rank of sparse random matrices. Proc. 31st SODA (2020) 579–591.
[13] A. Coja-Oghlan, P. Gao: The rank of random matrices over finite fields. arXiv:1810.07390 (2018).
[14] A. Coja-Oghlan, F. Krzakala, W. Perkins, L. Zdeborová: Information-theoretic thresholds from the cavity method. Advances in Mathematics (2018) 694–795.
[15] A. Coja-Oghlan, W. Perkins: Bethe states of random factor graphs. Communications in Mathematical Physics (2019), doi:10.1007/s00220-019-03387-7.
[16] A. Coja-Oghlan, W. Perkins: Spin systems on Bethe lattices. Communications in Mathematical Physics (2019) 441–523.
[17] A. Coja-Oghlan, W. Perkins, K. Skubch: Limits of discrete distributions and Gibbs measures on random graphs. European Journal of Combinatorics (2017) 37–59.
[18] A. Coja-Oghlan, M. Hahn-Klimroth: The cut metric for probability distributions. arXiv:1905.13619 (2019).
[19] A. Coja-Oghlan, O. Gebhard, M. Hahn-Klimroth, P. Loick: Optimal group testing. Proceedings of Machine Learning Research (COLT) (2020).
[20] A. Decelle, F. Krzakala, C. Moore, L. Zdeborová: Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Phys. Rev. E (2011) 066106.
[21] M. Dia, N. Macris, F. Krzakala, T. Lesieur, L. Zdeborová: Mutual information for symmetric rank-one matrix estimation: a proof of the replica formula. Advances in Neural Information Processing Systems (2016) 424–432.
[22] D. Donoho: Compressed sensing. IEEE Transactions on Information Theory (2006) 1289–1306.
[23] D. Donoho, A. Javanmard, A. Montanari: Information-theoretically optimal compressed sensing via spatial coupling and approximate message passing. IEEE Transactions on Information Theory (2013) 7434–7464.
[24] R. Durrett: Probability: theory and examples. Cambridge University Press, Cambridge (2010).
[25] W. Feller: An introduction to probability theory and its applications. John Wiley & Sons, Inc., New York-London-Sydney (1968).
[26] A. Giurgiu, N. Macris, R. Urbanke: Spatial coupling as a proof technique and three applications. IEEE Transactions on Information Theory (2016) 5281–5295.
[27] D. Guo, C. Wang: Multiuser detection of sparsely spread CDMA. IEEE Journal on Selected Areas in Communications (2008) 421–431.
[28] S. Janson, T. Łuczak, A. Rucinski: Random graphs. Wiley-Interscience, New York (2000).
[29] S. Janson, T. Łuczak, A. Rucinski: Random graphs. John Wiley & Sons (2011).
[30] F. Krzakala, A. Montanari, F. Ricci-Tersenghi, G. Semerjian, L. Zdeborová: Gibbs states and the set of solutions of random constraint satisfaction problems. Proc. National Academy of Sciences (2007) 10318–10323.
[31] S. Kudekar, T. Richardson, R. Urbanke: Spatially coupled ensembles universally achieve capacity under belief propagation. IEEE Transactions on Information Theory (2013) 7761–7813.
[32] S. Kumar, A. Young, N. Macris, H. Pfister: Threshold saturation for spatially coupled LDPC and LDGM codes on BMS channels. IEEE Transactions on Information Theory (2014) 7389–7415.
[33] E. Mossel, J. Neeman, A. Sly: Reconstruction and estimation in the planted partition model. Probability Theory and Related Fields (2015) 431–461.
[34] M. Lelarge, L. Miolane: Fundamental limits of symmetric low-rank matrix estimation. Conference on Learning Theory (COLT) (2017) 1297–1301.
[35] D. Levin, Y. Peres: Markov chains and mixing times (Vol. 107). American Mathematical Society (2017).
[36] M. Mézard, A. Montanari: Information, physics and computation. Oxford University Press (2009).
[37] M. Mézard: Mean-field message-passing equations in the Hopfield model and its generalizations. Physical Review E (2017) 022117.
[38] A. Montanari: Tight bounds for LDPC and LDGM codes under MAP decoding. IEEE Transactions on Information Theory (2005) 3221–3246.
[39] C. Moore: The computer science and physics of community detection: landscapes, phase transitions, and hardness. Bull. EATCS (2017).
[40] D. Panchenko: The Sherrington-Kirkpatrick model. Springer (2013).
[41] J. Pearl: Probabilistic reasoning in intelligent systems: networks of plausible inference. Elsevier (2014).
[42] J. Raymond, D. Saad: Sparsely spread CDMA: a statistical mechanics-based analysis. Journal of Physics A: Mathematical and Theoretical (2007) 12315.
[43] T. Richardson, R. Urbanke: Modern coding theory. Cambridge University Press (2012).
[44] L. Zdeborová, F. Krzakala: Statistical physics of inference: thresholds and algorithms. Advances in Physics (2016) 453–552.
[45] L. Zdeborová, F. Krzakala: Phase transition in the coloring of random graphs. Phys. Rev. E (2007) 031131.