Storage capacity in symmetric binary perceptrons
arXiv preprint [cond-mat.dis-nn]
Benjamin Aubin, Will Perkins, and Lenka Zdeborová
Institut de physique théorique, Université Paris Saclay, CNRS, CEA Saclay, F-91191 Gif-sur-Yvette, France
Department of Mathematics, Statistics and Computer Science, University of Illinois, Chicago, USA
(Dated: 2 April 2019)
We study the problem of determining the capacity of the binary perceptron for two variants of the problem where the corresponding constraint is symmetric. We call these variants the rectangle-binary-perceptron (RBP) and the $u$-function-binary-perceptron (UBP). We show that, unlike for the usual step-function-binary-perceptron, the critical capacity in these symmetric cases is given by the annealed computation in a large region of parameter space (for all rectangular constraints and for narrow enough $u$-function constraints, $K < K^*$). We prove this fact (under two natural assumptions) using the first and second moment methods. We further use the second moment method to conjecture, without using the replica method, that solutions of the symmetric binary perceptrons are organized in a so-called frozen-1RSB structure. We then use the replica method to estimate the capacity threshold for the UBP case when the $u$-function is wide, $K > K^*$. We conclude that full-step-replica-symmetry breaking would have to be evaluated in order to obtain the exact capacity in this case.

I. INTRODUCTION
In this paper we revisit the problem of computing the capacity of the binary perceptron for storing random patterns. This problem lies at the core of early statistical physics studies of neural networks and their learning and generalization properties; for reviews see e.g. [ref]. While the perceptron problem is motivated by studies of simple artificial neural networks, as discussed in detail in the above literature, in this paper we view it as a random constraint satisfaction problem (CSP) where the vector of binary weights $w \in \{\pm 1\}^N$ (a solution) must satisfy $M$ step constraints of the type

$$\sum_{i=1}^{N} X_{\mu i} w_i \ge K , \tag{1}$$

where $\mu = 1, \dots, M$, $K \in \mathbb{R}$ is the threshold, the random variables $X_{\mu i}$ are iid Gaussian variables with zero mean and variance $1/N$, and the rows of the matrix $X \in \mathbb{R}^{M \times N}$ are called patterns. We define an indicator function associated to the perceptron with a step constraint as $\varphi^s(z) = \mathbb{1}_{z \ge K}$.

We say that a given vector $w$ is a solution of the perceptron instance if all $M$ constraints given by eq. (1) are satisfied. The storage capacity is then defined similarly to the satisfiability threshold in random constraint satisfaction problems: we denote the constraint density as $\alpha \equiv M/N$ and define the storage capacity $\alpha_c(K)$ as the infimum of densities $\alpha$ such that in the limit $N \to \infty$, with high probability (over the choice of the matrix $X$) there are no solutions. It is natural to conjecture that the converse also holds, i.e. the storage capacity $\alpha_c(K)$ equals the supremum of $\alpha$ such that in the limit $N \to \infty$ solutions exist with high probability. In this case we would say the storage capacity is a sharp threshold.

Gardner and Derrida, in their paper [ref], assume the storage capacity $\alpha_c(K)$ is a sharp threshold and apply the replica calculation to compute it, but reach a result inconsistent with a simple upper bound obtained by the first moment method. Mézard and Krauth found a way to obtain a consistent prediction from the replica calculation and concluded that the storage capacity $\alpha^s_c(K)$ for the step binary perceptron (SBP), i.e. associated to the constraint $\varphi^s$, is given by the largest $\alpha$ for which the following quantity, the entropy in physics, is positive:

$$\phi^s_{\rm RS}(\alpha, K) = \underset{q_0, \hat q_0}{\rm extr} \left\{ \frac{1}{2}(q_0 - 1)\hat q_0 + \int Dt \, \log\left[ 2\cosh\left( t\sqrt{\hat q_0} \right) \right] + \alpha \int Dt \, \log\left[ \int_{\frac{K - t\sqrt{q_0}}{\sqrt{1 - q_0}}}^{\infty} Du \right] \right\} , \tag{2}$$

where $Dt = \frac{e^{-t^2/2}}{\sqrt{2\pi}} dt$ is a normal Gaussian measure, and "extr" means that the expression is evaluated where the derivatives of the curly bracket with respect to $q_0 \ge 0$ and $\hat q_0 \ge 0$ are zero. [...] (and many others), the storage capacity of the binary perceptron remains an open mathematical problem. In fact, even the very existence of a sharp threshold, i.e. the fact that in the limit $N \to \infty$ the probability that patterns can be stored drops sharply from one to zero at the capacity, is an open problem. Until very recently only widely non-matching upper and lower bounds for the storage capacity of the binary perceptron were available. As the present work was being finalized, Ding and Sun proved in a remarkable paper a lower bound on the capacity that matches the Krauth and Mézard conjecture (note that, much like Theorem 4 below, their main theorem depends on a numerical hypothesis). A matching upper bound remains an open challenge in mathematical physics and probability theory.

In this paper we introduce two simple symmetric variants of the binary perceptron problem. Let $z_\mu(w) = \sum_{i=1}^{N} X_{\mu i} w_i$. For a threshold $K \in \mathbb{R}^+$, we consider two different types of symmetric constraints:

• The rectangle binary perceptron (RBP) requires $|z_\mu| \le K$, $\forall \mu = 1, \dots, M$. Its associated indicator function is $\varphi^r(z) = \mathbb{1}_{|z| \le K}$.
• The $u$-function binary perceptron (UBP) requires $|z_\mu| \ge K$, $\forall \mu = 1, \dots, M$. Its associated indicator function is $\varphi^u(z) = \mathbb{1}_{|z| \ge K}$.

These constraints are symmetric in the sense that if $w$ is a solution then $-w$ is a solution as well. Our motivation behind these symmetric variants of the perceptron is that this symmetry greatly simplifies the mathematical treatment of the problem, while keeping the relevant physical properties intact. Thus, results that remain open questions for the canonical perceptron can be established rigorously for these symmetric versions.
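To make the two symmetric constraints concrete, the following is a minimal sketch that draws a small random instance and counts its solutions by exhaustive enumeration. The function names (`sample_patterns`, `is_solution`, `count_solutions`) and the sizes used are illustrative assumptions, not part of the paper, and the enumeration is only feasible for small $N$.

```python
import itertools
import math
import random

def sample_patterns(n, m, rng):
    """Draw an m x n matrix with iid N(0, 1/n) entries, as in eq. (1)."""
    sigma = 1.0 / math.sqrt(n)
    return [[rng.gauss(0.0, sigma) for _ in range(n)] for _ in range(m)]

def is_solution(w, patterns, k, kind):
    """Check all constraints: |z| <= k for kind 'r' (RBP), |z| >= k for 'u' (UBP)."""
    for row in patterns:
        z = sum(x * wi for x, wi in zip(row, w))
        ok = abs(z) <= k if kind == "r" else abs(z) >= k
        if not ok:
            return False
    return True

def count_solutions(patterns, k, kind):
    """Exhaustively count solutions w in {-1, +1}^n (hypothetical helper)."""
    n = len(patterns[0])
    return sum(is_solution(w, patterns, k, kind)
               for w in itertools.product((-1, 1), repeat=n))

rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
patterns = sample_patterns(n=12, m=6, rng=rng)
for kind in ("r", "u"):
    print(kind, count_solutions(patterns, k=1.0, kind=kind))
```

Because both constraints depend on $w$ only through $|z_\mu(w)|$, the counts returned by this sketch are always even, which is the $w \to -w$ symmetry described above.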
Symmetric perceptron models are also directly related to the problem of determining the discrepancy of a random matrix or set system, a problem of interest in combinatorics.

The main result of the present paper, presented in section II, is a proof, subject to a numerical hypothesis, of a formula for the storage capacity, defined in the same way as for the step-function binary perceptron above. In particular, we show that in these symmetric variants the first moment upper bound (corresponding to the annealed capacity in physics) on the storage capacity is tight (except for $K > K^* \simeq 0.817$ in the UBP case). We prove this statement using the second moment method. We note that the existing physics literature on perceptron-like problems contains other models where the first moment upper bound on the storage capacity was observed to be tight, in particular the parity machine and the reversed-wedge binary perceptron. Those works, however, rely on the comparison of the first moment bound on the capacity with the result of the replica method, rather than providing a rigorous justification.

To formally state our main result, let $Z \sim \mathcal{N}(0, 1)$, and for $K \in \mathbb{R}^+$ let $p_{r,K} = P[|Z| \le K]$ and $p_{u,K} = P[|Z| \ge K]$.

• The storage capacity for the rectangle binary perceptron is:
$$\alpha^r_c(K) = -\frac{\log(2)}{\log(p_{r,K})} \quad \forall K \in \mathbb{R}^+ . \tag{3}$$
• The storage capacity for the $u$-function binary perceptron is:
$$\alpha^u_c(K) = -\frac{\log(2)}{\log(p_{u,K})} \quad \text{for } 0 < K < K^* \simeq 0.817 . \tag{4}$$

Binary perceptron | Constraint | Constraint function | Range of $K$ | Storage capacity
------------------|------------|---------------------|--------------|-----------------
Step-function | $z \ge K$ | $\varphi^s(z) = \mathbb{1}_{z \ge K}$ | $\forall K \in \mathbb{R}$ | RS, eq. (2)
Rectangle | $\lvert z \rvert \le K$ | $\varphi^r(z) = \mathbb{1}_{\lvert z \rvert \le K}$ | $\forall K \in \mathbb{R}^+$ | Annealed, eq. (3)
$u$-function | $\lvert z \rvert \ge K$ | $\varphi^u(z) = \mathbb{1}_{\lvert z \rvert \ge K}$ | $0 < K < K^* = 0.817$ | Annealed, eq. (4)
$u$-function | $\lvert z \rvert \ge K$ | $\varphi^u(z) = \mathbb{1}_{\lvert z \rvert \ge K}$ | $\forall K > K^* = 0.817$ | FRSB?

TABLE I. This table summarizes results for storage capacity in binary perceptrons with different types of constraints. The result for the canonical step-function is from [ref]. The results for the rectangle and $u$-function are obtained in this paper.

The constant $K^* \simeq 0.817$ stems from the properties of the second moment entropy, eq. (10). In physics terms it is defined as the point of intersection between the annealed capacity $\alpha^u_a(K)$ and the local stability of the RS solution $\alpha^u_{\rm AT}(K)$, eq. (17). That is, $K^*$ is the solution of the following equation:

$$\pi \, p_{u,K}^2 \, e^{K^2} \log(p_{u,K}) = -2 \log(2) K^2 . \tag{5}$$

The two symmetric variants of the perceptron problem considered here share many of the intriguing geometric properties of the original step-function binary perceptron problem. Most significant is the conjectured frozen-1RSB nature of the space of solutions, which splits into well separated clusters of vanishing entropy at any $\alpha > 0$. Remarkably, this frozen-1RSB property can be deduced from the form of the second moment entropy, as we explain in section III. Our justification of the frozen-1RSB property does not rely on the replica method and is hence of independent interest.

For the UBP and $K > K^*$, the second-moment proof technique fails, and this failure marks tightly the onset of the replica symmetry breaking region. In that region, we evaluate the one-step replica symmetry breaking (1RSB) approximation for the storage capacity, but conclude that full-step replica symmetry breaking (FRSB) would be needed to obtain the exact result. While the FRSB equations can be written along the lines of [ref], they are more involved than the ones for the Sherrington-Kirkpatrick model, and solving them numerically or getting additional insight from them is a challenging task left for future work. We present the replica analysis in section IV. Table I contains the summary of our main results along with the predictions for the step-function perceptron.

Finally let us comment on the simpler and more commonly considered case of the spherical perceptron, where the binary constraint on the vector $w$ is replaced by the spherical constraint $w^\intercal w = \sum_{i=1}^{N} w_i^2 = N$. For $K = 0$ the spherical perceptron reduces to the famous problem of intersection of half-spaces with capacity $\alpha_c = 2$, as solved by Wendel and Cover. For $K > 0$ [...] is correct, as proven in [ref]. For $K < 0$ [...], while mathematical considerations about this case were presented in [ref].

II. PROOF OF CORRECTNESS OF THE ANNEALED CAPACITY
To state the main results precisely we introduce some definitions. Let $X(N, M)$ be the random $M \times N$ pattern matrix. Define the partition functions

$$Z_r(X) = \sum_{w \in \{\pm 1\}^N} \prod_{\mu=1}^{M} \varphi^r(z_\mu(w)) \quad \text{and} \quad Z_u(X) = \sum_{w \in \{\pm 1\}^N} \prod_{\mu=1}^{M} \varphi^u(z_\mu(w)) ,$$

which count the number of solutions for the rectangle and $u$-function constraints, respectively. Let $E_r(N, M)$ and $E_u(N, M)$ be the events that $Z_r(X) \ge 1$ and $Z_u(X) \ge 1$, respectively. We formally define the storage capacity.
Definition 1.
The storage capacity $\alpha^r_c(K)$ is
$$\alpha^r_c(K) = \inf \{ \alpha : \lim_{N \to \infty} P[E_r(N, \lfloor \alpha N \rfloor)] = 0 \} ,$$
and likewise for $\alpha^u_c(K)$.

It is believed that there is a sharp threshold for the existence of solutions.
Conjecture 2.
The storage capacity is a sharp threshold:
$$\alpha^r_c(K) = \sup \{ \alpha : \lim_{N \to \infty} P[E_r(N, \lfloor \alpha N \rfloor)] = 1 \} ,$$
and likewise for $\alpha^u_c(K)$.

The corresponding conjecture for the random $k$-SAT model is the celebrated 'satisfiability threshold conjecture', proved for $k$ large by Ding, Sly, and Sun.

Next, couple two standard Gaussians $Z_1, Z_\beta$ by letting $Z$ and $Z'$ be independent standard Gaussians and setting $Z_1 = \sqrt{\beta} Z + \sqrt{1 - \beta} Z'$ and $Z_\beta = \sqrt{\beta} Z - \sqrt{1 - \beta} Z'$. Let

$$q_{r,K}(\beta) = P[|Z_1| \le K \wedge |Z_\beta| \le K] = q_K(\beta) ,$$
$$q_{u,K}(\beta) = P[|Z_1| \ge K \wedge |Z_\beta| \ge K] = 1 - 2 p_{r,K} + q_K(\beta) , \tag{6}$$

with $q_K(\beta)$ the probability that two standard Gaussians with correlation $2\beta - 1$ are both at most $K$ in absolute value, that is:

$$q_K(\beta) = \frac{1}{2\pi} \int_{-K}^{K} dy \int_{\frac{-K + (1 - 2\beta) y}{2\sqrt{\beta(1 - \beta)}}}^{\frac{K + (1 - 2\beta) y}{2\sqrt{\beta(1 - \beta)}}} e^{-\frac{x^2 + y^2}{2}} dx .$$

Note that $q_{t,K}(1) = p_{t,K}$ and $q_{t,K}(1/2) = p_{t,K}^2$ for $t \in \{r, u\}$. We now introduce the functions that dictate the effectiveness of the second moment bound. Let

$$F_{r,K,\alpha}(\beta) = H(\beta) + \alpha \log q_{r,K}(\beta) \tag{7}$$
$$F_{u,K,\alpha}(\beta) = H(\beta) + \alpha \log q_{u,K}(\beta) \tag{8}$$

where $H(\beta) = -\beta \log \beta - (1 - \beta) \log(1 - \beta)$ is the Shannon entropy function.

We state a numerical hypothesis in terms of the derivatives of these two functions.

Hypothesis 3.
For all choices of $K > 0$ and $\alpha > 0$ so that $F''_{r,K,\alpha}(1/2) < 0$, there is exactly one $\beta \in (1/2, 1)$ so that $F'_{r,K,\alpha}(\beta) = 0$. The same holds for $F_{u,K,\alpha}$.

Our main theorem is a proof, under Hypothesis 3, that the storage capacity is given by the annealed computation.
Theorem 4.
Under the assumption of Hypothesis 3, the following hold.
1. For all $K > 0$, we have $\alpha^r_c(K) = -\log(2) / \log(p_{r,K})$.
2. For all $K \in (0, K^*)$, we have $\alpha^u_c(K) = -\log(2) / \log(p_{u,K})$.

Under our definition of $\alpha^r_c(K)$ and $\alpha^u_c(K)$, we must prove two statements to show that $\alpha^r_c(K) = -\log(2) / \log(p_{r,K})$ (and similarly for $\alpha^u_c(K)$). We use the first moment method to show that for $\alpha > -\log(2) / \log(p_{r,K})$, $\lim_{N \to \infty} P[E_r(N, M)] = 0$; then we use the second moment method to show that for $\alpha < -\log(2) / \log(p_{r,K})$, $\liminf_{N \to \infty} P[E_r(N, M)] > 0$. Conjecture 2 asserts the stronger statement that for $\alpha < -\log(2) / \log(p_{r,K})$, $\lim_{N \to \infty} P[E_r(N, M)] = 1$.
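The annealed capacities of eqs. (3) and (4) and the constant $K^*$ of eq. (5) are easy to evaluate numerically. The sketch below uses only the Python standard library; the function names and the bisection bracket $[0.5, 1.5]$ are illustrative assumptions. It locates $K^*$ as the point where the annealed capacity $-\log(2)/\log(p_{u,K})$ meets $\pi p_{u,K}^2 e^{K^2} / (2K^2)$, which is an equivalent rewriting of eq. (5).

```python
import math

def p_rect(k):
    """p_{r,K} = P[|Z| <= K] for a standard Gaussian Z."""
    return math.erf(k / math.sqrt(2.0))

def p_u(k):
    """p_{u,K} = P[|Z| >= K] for a standard Gaussian Z."""
    return math.erfc(k / math.sqrt(2.0))

def alpha_annealed(p):
    """Annealed (first moment) capacity -log(2)/log(p), eqs. (3)-(4)."""
    return -math.log(2.0) / math.log(p)

def stability_bound(k, p):
    """pi p^2 e^{K^2} / (2 K^2): eq. (5) says this equals alpha_annealed at K*."""
    return math.pi * p * p * math.exp(k * k) / (2.0 * k * k)

def k_star(lo=0.5, hi=1.5, iters=60):
    """Bisection on the difference of the two sides (bracket is an assumption)."""
    def gap(k):
        return alpha_annealed(p_u(k)) - stability_bound(k, p_u(k))
    assert gap(lo) < 0.0 < gap(hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print("alpha_a^r(K=1) =", alpha_annealed(p_rect(1.0)))
print("alpha_a^u(K=1) =", alpha_annealed(p_u(1.0)))
print("K* =", k_star())
```

Under these assumptions the root returned by `k_star()` should agree with the value $K^* \simeq 0.817$ quoted in the text, and the value at $K = 1$ for the $u$-function with the $\alpha^u_a = 0.604$ quoted in the caption of fig. 1.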
A. First moment upper bound
Proposition 5.
1. If $\alpha > \alpha^r_a(K) = -\frac{\log(2)}{\log(p_{r,K})}$, then whp there is no satisfying assignment to the binary perceptron with the rectangle activation function.
2. If $\alpha > \alpha^u_a(K) = -\frac{\log(2)}{\log(p_{u,K})}$, then whp there is no satisfying assignment to the binary perceptron with the $u$-function activation function.

Proof. We give the proof for the rectangle function, as the proof for the $u$-function is identical. Let $\epsilon = \alpha - \alpha^r_a(K) > 0$. Let $\mathbf{1}$ denote the vector of dimension $N$ with all entries equal to 1. By Markov's inequality, and because every $w \in \{\pm 1\}^N$ is a solution with the same probability as $\mathbf{1}$,

$$P[E_r(N, \alpha N)] \le E[Z_r(X(N, \alpha N))] = 2^N E\left[ \prod_{\mu=1}^{\alpha N} \mathbb{1}_{|z_\mu(\mathbf{1})| \le K} \right] = 2^N p_{r,K}^{\alpha N} = \exp(N(\log(2) + \alpha \log(p_{r,K}))) = \exp(N \epsilon \log(p_{r,K})) \to 0$$

as $N \to \infty$.

B. Second moment lower bound
Proposition 6.
1. If $\alpha < -\frac{\log(2)}{\log(p_{r,K})}$, then $\liminf_{N \to \infty} P[E_r(N, \alpha N)] > 0$.
2. If $K < K^*$ and $\alpha < -\frac{\log(2)}{\log(p_{u,K})}$, then $\liminf_{N \to \infty} P[E_u(N, \alpha N)] > 0$.

To prove Proposition 6 we will apply the second-moment method in a similar fashion to Achlioptas and Moore, who determined the satisfiability threshold of random $k$-SAT to within a factor 2 by considering not-all-equal satisfying assignments (not-all-equal satisfiability (NAE-SAT) constraints are symmetric in the same way the rectangle and $u$-function constraints are symmetric). Recall the Paley-Zygmund inequality.

Lemma 7.
Let $X$ be a non-negative random variable. Then
$$P[X > 0] \ge \frac{E[X]^2}{E[X^2]} .$$

We will also use the following application of Laplace's method from Achlioptas and Moore.

Lemma 8.
Let $g(\beta)$ be a real analytic function on $[0, 1]$ and let $G(\beta) = \frac{g(\beta)}{\beta^\beta (1 - \beta)^{1 - \beta}}$. If $G(1/2) > G(\beta)$ for all $\beta \ne 1/2$ and $G''(1/2) < 0$, then there exist constants $c_1, c_2$ so that for all sufficiently large $N$
$$c_1 G(1/2)^N \le \sum_{l=0}^{N} \binom{N}{l} g(l/N)^N \le c_2 G(1/2)^N .$$
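As a quick numerical illustration of Lemma 8, the sketch below evaluates the ratio between the binomial sum and $G(1/2)^N$ for a simple test function. The choice $g(\beta) = 1 - (\beta - 1/2)^2 / 2$ is a hypothetical example that satisfies the hypotheses of the lemma; it is not a function used in the paper. The sum is computed in log-space to avoid overflow.

```python
import math

def log_binom(n, l):
    """Logarithm of the binomial coefficient C(n, l) via lgamma."""
    return math.lgamma(n + 1) - math.lgamma(l + 1) - math.lgamma(n - l + 1)

def laplace_ratio(g, n):
    """sum_l C(n, l) g(l/n)^n divided by G(1/2)^n, where G(1/2) = 2 g(1/2)."""
    log_ref = n * math.log(2.0 * g(0.5))
    return sum(math.exp(log_binom(n, l) + n * math.log(g(l / n)) - log_ref)
               for l in range(n + 1))

def g_test(beta):
    """Hypothetical test function, positive on [0, 1] and maximal at 1/2."""
    return 1.0 - 0.5 * (beta - 0.5) ** 2

for n in (50, 200, 800):
    print(n, laplace_ratio(g_test, n))
```

The lemma predicts that the printed ratios stay between two positive constants as $N$ grows, rather than decaying or blowing up exponentially.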
1. Rectangle binary perceptron
We calculate

$$E[Z_r(X)^2] = \sum_{w_1, w_2 \in \{\pm 1\}^N} P[w_1, w_2 \text{ satisfying}] = 2^N \sum_{w \in \{\pm 1\}^N} P[\mathbf{1}, w \text{ satisfying}] = 2^N \sum_{l=0}^{N} \binom{N}{l} q_{r,K}(l/N)^{\alpha N} ,$$

where we recall $q_{r,K}$ from eq. (6). Define

$$G_{r,K,\alpha}(\beta) \equiv \exp(F_{r,K,\alpha}(\beta)) = \frac{q_{r,K}(\beta)^\alpha}{\beta^\beta (1 - \beta)^{1 - \beta}} . \tag{9}$$

If we can show that $G_{r,K,\alpha}(1/2) > G_{r,K,\alpha}(\beta)$ for all $\beta \ne 1/2$ and $G''_{r,K,\alpha}(1/2) < 0$, then by Lemma 8 (applied with $g = q_{r,K}^\alpha$, so that $G(1/2) = 2 q_{r,K}(1/2)^\alpha$) we have

$$E[Z_r(X)^2] \le c_2 4^N q_{r,K}(1/2)^{\alpha N} = c_2 4^N p_{r,K}^{2 \alpha N} .$$

Then since $Z_r(X)$ is integer valued, Lemma 7 gives

$$P[Z_r(X) \ge 1] \ge \frac{E[Z_r(X)]^2}{E[Z_r(X)^2]} = \frac{(2^N p_{r,K}^{\alpha N})^2}{E[Z_r(X)^2]} \ge \frac{(2^N p_{r,K}^{\alpha N})^2}{c_2 4^N p_{r,K}^{2 \alpha N}} = 1/c_2 > 0 .$$

It remains to show that when $\alpha < -\frac{\log(2)}{\log(p_{r,K})}$, then $G_{r,K,\alpha}(1/2) > G_{r,K,\alpha}(\beta)$ for all $\beta \ne 1/2$ and $G''_{r,K,\alpha}(1/2) < 0$. By eq. (9) and the fact that $G'_{r,K,\alpha}(1/2) = 0$, it is enough to show the same for $F_{r,K,\alpha}$.

Certainly one necessary condition is that $F_{r,K,\alpha}(1/2) > F_{r,K,\alpha}(1)$. This reduces to the condition $2 p_{r,K}^{2\alpha} > p_{r,K}^{\alpha}$, or $\alpha < -\frac{\log(2)}{\log(p_{r,K})}$, which is exactly the condition of Proposition 6. Next consider $F''_{r,K,\alpha}(1/2)$. A calculation gives

$$F''_{r,K,\alpha}(1/2) = 4 \left( \frac{2}{\pi} \frac{\alpha K^2 e^{-K^2}}{p_{r,K}^2} - 1 \right) .$$

In particular, $F''_{r,K,\alpha}(1/2) < 0$ if and only if $\alpha < \frac{\pi p_{r,K}^2}{2 K^2 e^{-K^2}}$. But a calculation also shows that $-\frac{\log(2)}{\log(p_{r,K})} < \frac{\pi p_{r,K}^2}{2 K^2 e^{-K^2}}$ for all $K > 0$, and so $F''_{r,K,\alpha}(1/2) < 0$ under the condition of Proposition 6. Since $F_{r,K,\alpha}(\beta)$ is symmetric around $\beta = 1/2$ and has a critical point at $\beta = 1/2$, Hypothesis 3 implies that the global maximum of $F_{r,K,\alpha}(\beta)$ occurs at either $1/2$ or $1$. Since $F_{r,K,\alpha}(1/2) > F_{r,K,\alpha}(1)$, we have that $F_{r,K,\alpha}(1/2) > F_{r,K,\alpha}(\beta)$ for all $\beta \ne 1/2$.

2. $u$-function binary perceptron

The proof for the $u$-function is similar. We can calculate

$$E[Z_u(X)^2] = 2^N \sum_{l=0}^{N} \binom{N}{l} q_{u,K}(l/N)^{\alpha N} ,$$

where each term of the sum is, to leading exponential order, $\exp(N(\log(2) + F_{u,K,\alpha}(l/N)))$, and where we recall $q_{u,K}$ from eq. (6). Using Lemma 8 and Hypothesis 3 again, it suffices to show that for $0 < K < K^*$ and $\alpha < -\frac{\log(2)}{\log(p_{u,K})}$ we have $F_{u,K,\alpha}(1/2) > F_{u,K,\alpha}(1)$ and $F''_{u,K,\alpha}(1/2) < 0$. The first follows immediately from the fact that $\alpha < -\frac{\log(2)}{\log(p_{u,K})}$. For the second, we have

$$F''_{u,K,\alpha}(1/2) = 4 \left( \frac{2}{\pi} \frac{\alpha K^2 e^{-K^2}}{p_{u,K}^2} - 1 \right)$$

and so $F''_{u,K,\alpha}(1/2) < 0$ if and only if $\alpha < \frac{\pi p_{u,K}^2}{2 K^2 e^{-K^2}}$. Unlike with the rectangle function it is not true that

$$-\frac{\log(2)}{\log(p_{u,K})} < \frac{\pi p_{u,K}^2}{2 K^2 e^{-K^2}} \tag{10}$$

for all $K$: the left and right sides of the inequality cross at $K = K^*$, which implicitly defines $K^*$. Thus for $K < K^*$ and $\alpha < -\frac{\log(2)}{\log(p_{u,K})}$ we have $F''_{u,K,\alpha}(1/2) < 0$, which completes the proof of Proposition 6 for the $u$-function binary perceptron.
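The quantities entering the second moment argument can be checked numerically. The following is a sketch under stated assumptions: $q_K(\beta)$ is evaluated by a one-dimensional Simpson rule (after integrating out $x$ analytically), the function names are illustrative, and the values $K = 1$, $\alpha = 1.5$ are an arbitrary choice below $\alpha^r_a(1)$. It compares the closed form for $F''(1/2)$ with a finite difference and locates the maximizer of $F_{r,K,\alpha}$ on a grid.

```python
import math

SQRT2 = math.sqrt(2.0)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / SQRT2))

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def q_rect(beta, k, steps=2000):
    """q_K(beta) = P[|Z_1| <= K, |Z_beta| <= K], correlation 2*beta - 1."""
    rho = 2.0 * beta - 1.0
    s = math.sqrt(max(1.0 - rho * rho, 0.0))
    if s < 1e-12:  # beta = 0 or 1: the two Gaussians agree up to a sign
        return math.erf(k / SQRT2)
    def integrand(y):
        return norm_pdf(y) * (norm_cdf((k - rho * y) / s)
                              - norm_cdf((-k - rho * y) / s))
    h = 2.0 * k / steps  # Simpson's rule on [-K, K]; steps must be even
    total = integrand(-k) + integrand(k)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * integrand(-k + i * h)
    return total * h / 3.0

def entropy(beta):
    if beta <= 0.0 or beta >= 1.0:
        return 0.0
    return -beta * math.log(beta) - (1.0 - beta) * math.log(1.0 - beta)

def f_second_moment(beta, k, alpha):
    """F_{r,K,alpha}(beta) = H(beta) + alpha log q_{r,K}(beta), eq. (7)."""
    return entropy(beta) + alpha * math.log(q_rect(beta, k))

def f_curvature_closed_form(k, alpha, p):
    """F''(1/2) = 4 (2 alpha K^2 e^{-K^2} / (pi p^2) - 1)."""
    return 4.0 * (2.0 * alpha * k * k * math.exp(-k * k)
                  / (math.pi * p * p) - 1.0)

K, ALPHA = 1.0, 1.5  # illustrative values, not taken from the paper
grid = [i / 50 for i in range(1, 50)]
best_beta = max(grid, key=lambda b: f_second_moment(b, K, ALPHA))
print("grid maximizer of F_r:", best_beta)
print("F''(1/2), closed form:", f_curvature_closed_form(K, ALPHA, math.erf(K / SQRT2)))
```

The same closed form with $p_{u,K}$ in place of $p_{r,K}$ becomes positive for the $u$-function at $K = 1$ once $\alpha$ exceeds $\pi p_{u,K}^2 e^{K^2} / (2K^2)$, which is the failure of the second moment method described above for $K > K^*$.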
[Figure 1: two panels plotting $F_{r,K,\alpha}(\beta) + \log(2)$ (panel a) and $F_{u,K,\alpha}(\beta) + \log(2)$ (panel b) against $\beta \in [0, 1]$ at fixed $K$, for several values of $\alpha$.]

FIG. 1. Second moment entropy densities. a): the rectangle binary perceptron. For $\alpha \le \alpha^r_a$ = 1.[...], $\beta = 1/2$ is the global maximizer. For $\alpha \ge \alpha^r_a$, $\beta = 0$ and $\beta = 1$ are the maximizers. b): the $u$-function binary perceptron. For $\alpha \le \alpha^*$ = 0.[...], $\beta = 1/2$ is the maximizer, while for $\alpha^* \le \alpha \le \alpha^u_a = 0.604$ (dashed yellow), the maximizer is non-trivial, $\beta \ne 1/2$.
3. Illustration
As an illustration, we plot the second moment entropy density $\lim_{N \to \infty} \frac{1}{N} \log E[Z_t^2] = \log(2) + F_{t,K,\alpha}$ for $t \in \{r, u\}$ at $K = 1 > K^*$ in fig. 1. For the rectangle function (a), the second moment is tight: the maximum is reached at $\beta = 1/2$ for $\alpha$ smaller than the first moment bound $\alpha^r_a$ (dashed pink). Exactly the same happens for the $u$-function with $K < K^*$. However for $K > K^*$, the second moment method fails (b): $\beta = 1/2$ ceases to be the global maximizer before $\alpha$ reaches $\alpha^u_a$ (dashed yellow).

III. FROZEN-1RSB STRUCTURE OF SOLUTIONS IN BINARY PERCEPTRONS
One of the most striking properties of the canonical step-function perceptron is the predicted frozen-1RSB nature of the space of solutions. This means that the dominant (measure tending to one) part of the space of solutions splits into well separated clusters, each of which has vanishing entropy density at any $\alpha > 0$. This frozen-1RSB scenario and quantitative properties of the solution space were studied in detail recently. Following up on conjectures that such a frozen structure of solutions implies computational hardness in diluted constraint satisfaction problems, it was argued that finding a satisfying assignment in the binary perceptron should also be algorithmically hard, since its solution space is dominated by clusters of vanishing entropy density. Yet this conjecture contradicted empirical results of [ref]. This paradox was resolved in [ref], where the authors identified that there are subdominant parts (i.e. parts of measure converging to zero as the system size diverges) of the solution space that form extended clusters with large local entropy, and that all the algorithms that work well always find a solution belonging to one of those large-local-entropy clusters. These sub-dominant clusters are not frozen and, somewhat strangely, are not captured in the canonical 1RSB calculation. It was argued that the existence of these large-local-entropy clusters bears more general consequences on the dynamics of learning algorithms in neural networks, see e.g. [ref].

While frozen-1RSB structure has also been identified in constraint satisfaction problems on sparse graphs, we want to note that its nature in the binary perceptron is rather different. In sparse systems a simple argument using expansion properties of the underlying graph and properties of the constraints shows that each cluster with high probability contains only one solution. In the perceptron model, which has a fully connected bipartite interaction graph, this argument from sparse models does not apply.

In the present paper, we deduce from the second moment calculation of the previous section that the space of solutions in the symmetric binary perceptrons is also of the frozen-1RSB type, and that this property moreover extends to any finite temperature (with energy being defined as the number of unsatisfied constraints).
This is different from the locked constraint satisfaction problems of [ref] living on diluted hypergraphs, where the solution-clusters have extensive entropy at any non-zero temperature. Another difference is that, whereas in the locked constraint satisfaction problems the size of each cluster is one with high probability, in the binary perceptron there are still many solutions in the clusters; it is only their entropy density (i.e. the logarithm of their number per variable) that vanishes as $N \to \infty$.

Investigation of the large local entropy clusters and their implications for learning in the symmetric perceptrons is also of great interest, but left for future work. Since the symmetric perceptrons are mathematically simpler than the step-function one, they should also be the proper playground to deepen our understanding of the large local entropy clusters and their relation to learning and generalization.

We present the frozen-1RSB scenario as a conjecture and then below indicate how the second moment calculation gives evidence for this conjecture. Given an instance $X$ and a solution $w$, let $\Gamma(w, d)$ denote the set of solutions $w'$ with Hamming distance at most $d$ from $w$.

Conjecture 9.
For every $K > 0$ and every $\alpha \in (0, \alpha^r_c(K))$ there exists $d_{\min} > 0$ so that with high probability over the choice of the random instance $X$ from the RBP, the following property holds: for almost every solution $w$, $\frac{1}{N} \log |\Gamma(w, d_{\min})| \to 0$ as $N \to \infty$. The same holds for the UBP for all $K \le K^*$.

A. The link between the second-moment entropy and size of clusters
In this section we use $t \in \{r, u\}$ and note that the form of the second moment entropy density $\frac{1}{N} \log E[Z_t^2]$ has very direct implications on the structure of solutions in the corresponding models. As we defined it above, the second moment entropy is the normalized logarithm of the expected number of pairs of solutions of overlap $\beta$.

For problems such as the symmetric binary perceptrons, where the quenched and annealed entropies are equal to leading order, there is a striking relation between the planted and the random ensemble of the model. The random ensemble is the problem we have considered so far, while the planted ensemble is defined by starting with a configuration of the weights (a solution) and then including only constraints that are satisfied by this planted configuration. As long as the quenched and annealed entropies of the random ensemble are equal to leading order, the planted and random ensembles should be contiguous, meaning that high-probability properties that hold in one ensemble also hold in the other. Moreover the planted configuration in the planted ensemble has all the properties of a configuration sampled uniformly at random in the random ensemble. These properties follow on the heuristic level from the cavity method reasoning. They were established fully rigorously in a range of models, see e.g. [ref]. In the present case of symmetric binary perceptrons we have not yet managed to prove contiguity between the random and the planted ensemble, and so we leave a rigorous mathematical result for future work. (In fact the missing ingredient is a version of Friedgut's sharp threshold result suitable for perceptrons; such a result combined with Theorem 4 would also prove Conjecture 2.)
We hence rely on the above heuristic argument and assume it holds in what follows.

Given a planted solution $w$ and a configuration $w_\beta$ that agrees with $w$ on $\beta N$ coordinates, the probability that $w_\beta$ is a solution in the planted model is $(q_{t,K}(\beta) / p_{t,K})^M$, and thus the expected number of solutions at overlap $\beta$ with the planted solution (i.e. at Hamming distance $(1 - \beta) N$) in the planted ensemble is

$$E[Z_\beta] = \binom{N}{\beta N} \left( \frac{q_{t,K}(\beta)}{p_{t,K}} \right)^M ,$$

and its entropy density is

$$\omega_t(\beta) \equiv \lim_{N \to \infty} \frac{1}{N} \log E[Z_\beta] = F_{t,K,\alpha}(\beta) - \alpha \log p_{t,K} \quad \text{for } t \in \{r, u\} . \tag{11}$$

Recalling that contiguity implies that the planted solution has the properties of a uniformly chosen solution in the random ensemble, this entropy gives us direct access to properties of the solution space in the random ensemble at equilibrium. Most notably we notice (see derivation in section III B below) that the derivative of $\omega_t(\beta)$ at $\beta = 1$ is $+\infty$, thus implying that for all $\epsilon > 0$ there are, with high probability, no solutions at overlap $\beta \in [d_{\min}(\alpha, K), (1 - \epsilon)]$. In turn, this means that the dominant (measure converging to one as $N \to \infty$) part of the solution space splits into clusters each of which has vanishing entropy density (i.e. the logarithm of the number of solutions in the cluster divided by $N$ goes to zero as $N \to \infty$). The missing ingredient in a full proof of Conjecture 9 is a proof of the contiguity statement.

B. Form of the 2nd moment entropy implying frozen-1RSB
In fig. 2a we plot $\omega_r(\beta)$ for the rectangle binary perceptron, at $K = 1$, $\alpha$ = 1.[...] $\le \alpha^r_c(K = 1)$. Thanks to the contiguity between the planted and random ensembles, which holds as long as the second moment entropy density is twice the first moment entropy density, this curve also represents the annealed entropy of solutions at overlap $\beta$ with a random reference solution. We see notably that there is an interval of distances in which no solutions are present. Analytically we can see from the properties of the functions $F_{t,K,\alpha}(\beta)$ and $\log p_{t,K}$ that $F_{t,K,\alpha}(1) = \alpha \log p_{t,K}$ and that the derivative of $F_{t,K,\alpha}(\beta)$ tends to $+\infty$ as $\beta \to 1$. This is in contrast with, for instance, the satisfiability problems studied in [ref], where the function corresponding to $F_{t,K,\alpha}(\beta)$ would have a negative derivative at $\beta = 1$ (see fig. 2b). There could still be an interval of forbidden distances, but the bump in entropy for $\beta \approx 1$ [...]

[Figure 2: panel a) plots $\omega_r(\beta)$ against $\beta$, with $d_{\min}(\alpha, K)$ marked; panel b) plots $\omega_{k\text{-NAE}}(\beta)$ against $\beta$.]

FIG. 2. a) Density of the annealed entropy of solutions at overlap $\beta$ from a random solution in the rectangle binary perceptron at $K = 1$, $\alpha$ = 1.[...] $\le \alpha^r_c(K = 1)$. We see there are no solutions in an interval of overlaps $(1 - d_{\min}, 1 - \epsilon)$. This curve is obtained from the second moment entropy and contiguity between the random and planted ensembles. It implies the frozen-1RSB nature of the space of solutions. The same holds for the $u$-function. b) For comparison, we plot the density of the annealed entropy of solutions at overlap $\beta$ from a random solution in the $k$-NAE-SAT model at $k = 7$, $\alpha = 40$. We see the density is positive in a large region close to $\beta = 1$, showing the absence of frozen-1RSB structure in this problem.

1. Frozen 1RSB in rectangle binary perceptron
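In the rectangle binary perceptron, the random and planted ensembles are conjectured to be contiguous for all $K > 0$ and $\alpha \in (0, \alpha^r_c(K))$. Using eq. (7), the first derivative of $\omega_r(\beta)$, eq. (11), is given by (see Appendix VI E)

$$\frac{\partial \omega_r}{\partial \beta} = \frac{\partial F_{r,K,\alpha}}{\partial \beta} = \log\left( \frac{1 - \beta}{\beta} \right) + \frac{\alpha}{q_{r,K}(\beta)} \frac{1}{\pi \sqrt{\beta (1 - \beta)}} \left( e^{-\frac{K^2}{2(1 - \beta)}} \left( e^{\frac{(2\beta - 1) K^2}{2 (1 - \beta) \beta}} - 1 \right) \right) ,$$

and it diverges for all $K \in \mathbb{R}^+$, $\alpha > 0$ as $\beta \to 1$:

$$\frac{\partial \omega_r}{\partial \beta} \xrightarrow[\beta \to 1]{} +\infty . \tag{12}$$

This implies vanishing entropy density of clusters to which typical solutions belong.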
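The shape of $\omega_r(\beta)$ described above can be explored numerically. The sketch below is illustrative only: it assumes $K = 1$ and $\alpha = 1.5$ (an arbitrary value below $\alpha^r_c(1)$), evaluates $q_K(\beta)$ by a Simpson rule, and uses hypothetical function names. It finds the overlap at which $\omega_r$ changes sign, i.e. the lower edge of the empty interval of overlaps, and evaluates the closed-form derivative close to $\beta = 1$. The bracket of the derivative is written as $e^{-K^2/(2\beta)} - e^{-K^2/(2(1-\beta))}$, which is algebraically identical to the nested form in the text but avoids overflow.

```python
import math

SQRT2 = math.sqrt(2.0)

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / SQRT2))

def q_rect(beta, k, steps=2000):
    """q_K(beta) by Simpson's rule in y; the pair has correlation 2*beta - 1."""
    rho = 2.0 * beta - 1.0
    s = math.sqrt(max(1.0 - rho * rho, 0.0))
    if s < 1e-12:  # beta = 0 or 1
        return math.erf(k / SQRT2)
    def integrand(y):
        pdf = math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)
        return pdf * (norm_cdf((k - rho * y) / s) - norm_cdf((-k - rho * y) / s))
    h = 2.0 * k / steps
    total = integrand(-k) + integrand(k)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * integrand(-k + i * h)
    return total * h / 3.0

def omega_rect(beta, k, alpha):
    """omega_r(beta) = H(beta) + alpha log(q_K(beta) / p_{r,K}), eq. (11)."""
    p = math.erf(k / SQRT2)
    h_ent = 0.0 if beta in (0.0, 1.0) else (
        -beta * math.log(beta) - (1.0 - beta) * math.log(1.0 - beta))
    return h_ent + alpha * math.log(q_rect(beta, k) / p)

def d_omega_rect(beta, k, alpha):
    """Closed-form derivative of omega_r with respect to beta."""
    bracket = (math.exp(-k * k / (2.0 * beta))
               - math.exp(-k * k / (2.0 * (1.0 - beta))))
    return (math.log((1.0 - beta) / beta)
            + alpha / q_rect(beta, k) * bracket
            / (math.pi * math.sqrt(beta * (1.0 - beta))))

def zero_crossing(k, alpha, lo=0.5, hi=0.99, iters=40):
    """Bisection for the sign change of omega_r (assumes a single crossing)."""
    assert omega_rect(lo, k, alpha) > 0.0 > omega_rect(hi, k, alpha)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if omega_rect(mid, k, alpha) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

K, ALPHA = 1.0, 1.5  # illustrative values, not taken from the paper
beta0 = zero_crossing(K, ALPHA)
print("omega_r changes sign at beta =", round(beta0, 4))
for beta in (0.999, 0.9999):
    print(beta, d_omega_rect(beta, K, ALPHA))
```

In this sketch $\omega_r$ is negative between the reported sign change and $\beta = 1$, where it returns to zero with a growing positive slope, in line with eq. (12).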
2. Frozen 1RSB in the $u$-function binary perceptron

In the $u$-function binary perceptron, the random and planted ensembles are conjectured to be contiguous for all $0 < K \le K^*$ and $\alpha \in (0, \alpha^u_c(K))$. Using eq. (8), the first derivative of $\omega_u(\beta)$, eq. (11), is given by

$$\frac{\partial \omega_u}{\partial \beta} = \frac{\partial F_{u,K,\alpha}}{\partial \beta} = \log\left( \frac{1 - \beta}{\beta} \right) + \frac{\alpha}{q_{u,K}(\beta)} \frac{1}{\pi \sqrt{\beta (1 - \beta)}} \left( e^{-\frac{K^2}{2(1 - \beta)}} \left( e^{\frac{(2\beta - 1) K^2}{2 (1 - \beta) \beta}} - 1 \right) \right) \xrightarrow[\beta \to 1]{} +\infty ,$$

thus reaching the same conclusion on the presence of frozen-1RSB.

In appendix VI E we extend the second moment calculation to finite temperature (for both the rectangle and $u$-function case). This means that we define the energy of a configuration $\mathcal{E}(w)$ as the number of constraints that are violated by this configuration. Then the corresponding partition function is defined as $Z(T) = \sum_w e^{-\mathcal{E}(w)/T}$. There is a one-to-one mapping between the temperature $T$ and the energy density $e = \mathcal{E}/N$; consequently the corresponding finite-temperature second moment entropy density counts the number of pairs of solutions at overlap $\beta$ and energy density $e$. In appendix VI E we apply the same argument as here connecting the random and planted ensemble, and deduce that the finite-temperature solution space of the models is also of the frozen-1RSB type for any $T < \infty$.

C. Frozen-1RSB as derived from the replica analysis
We stress that we derived the frozen-1RSB nature of the space of solutions without the use of replicas. For completeness we summarize here how this translates to the properties of the one-step-replica-symmetry breaking solution. This is the way this phenomenon was originally discovered and described in [ref]. For readers not familiar with the replica method this section should be read after reading section IV.

In general, three kinds of fixed points of the 1RSB equations are possible:
• The replica symmetric (RS) solution $q_0 = q_1 = q_{RS} < 1$,
• The frozen-1RSB solution (f1RSB) $(q_0, q_1) = (q_{RS}, 1)$,
• The 1RSB solution $(q_0, q_1)$ with $q_1 \ne 1$.

[Figure 3: three panels a), b), c) sketching the configuration space, with distances labeled $1 - q_{RS}$, $1 - q_1$ and $1 - q_0$.]

FIG. 3. Illustration of the configuration space for the different phases: a): RS - solutions are concentrated in a single cluster of typical size $1 - q_{RS}$. b): 1RSB - solutions form clusters of size $1 - q_1$ at a distance $1 - q_0$ from each other. c): f1RSB - clusters are point-like ($1 - q_1 \simeq 0$) at a distance $1 - q_0 = 1 - q_{RS}$ from each other.

The frozen-1RSB is characterized by an inner-cluster overlap $q_1 = 1$ and an inter-cluster overlap $q_0 = q_{RS}$, which means that clusters have vanishing entropy density and remain far from each other. Mathematically RS and f1RSB solutions are equivalent in the sense that these solutions have the same free energy, eq. (20), $\Phi\{q_0 = q_{RS}, q_1 = q_{RS}\} = \Phi\{q_0 = q_{RS}, q_1 = 1\}$, and the complexity of the f1RSB solution equals the RS entropy, $\Sigma(\phi = 0) = \phi_{RS}$, eqs. (22, 15). However, RS and f1RSB do not share the same configuration space. The RS phase is associated to a single cluster of solutions with typical size $1 - q_{RS}$, while the f1RSB configuration space is composed of many point-like clusters of size $1 - q_1 \simeq 0$ at a distance $1 - q_0 = 1 - q_{RS}$ from each other, see fig. 3. From this point of view f1RSB is the correct description of the phase space.

IV. REPLICA CALCULATION OF THE STORAGE CAPACITY
In this section we recall the replica calculation leading to the expression of the storage capacity in the step-function binary perceptron. We show that in the symmetric binary perceptrons the annealed calculation is reproduced by the replica symmetric result. For the $u$-function binary perceptron we show that $K^*$ coincides with the onset of replica symmetry breaking and we evaluate the 1RSB capacity for $K > K^*$.

A. Replica calculation
For the purpose of the calculations, we introduce the constraint function $C(z)$ that returns 1 if $w$ satisfies all the constraints $\{\varphi(z_\mu)\}_{\mu=1}^{M}$ and 0 otherwise:

$$C(z) = \prod_{\mu=1}^{M} \varphi(z_\mu) \quad \text{with} \quad z_\mu = X_\mu w .$$

Recall the partition function $Z$ is the number of satisfying vectors $w$, with prior distribution $P_w(w)$, for a given matrix $X$:

$$Z(X) = \sum_{w \in \{\pm 1\}^N} \prod_{\mu=1}^{M} \varphi(X_\mu w) = \int dw \, P_w(w) \int dz \, C(z) \, \delta(z - Xw) .$$

The replica method allows one to compute explicitly the quenched average $E_X[\log(Z(X))]$. More precisely, using the replica trick, the average of the logarithm can be expressed as the limit $n \to 0$ of the derivative with respect to $n$ of the logarithm of the average of the $n$-th moment of the partition function. Finally the free entropy reads:

$$\phi(\alpha) \equiv \lim_{N \to +\infty} \frac{1}{N} E_X[\log(Z(X))] = \lim_{N \to +\infty} \lim_{n \to 0} \frac{1}{N} \frac{\partial \log(E_X[Z(X)^n])}{\partial n} . \tag{13}$$

Computing the $n$-th moment of the partition function $Z$, for $n \in \mathbb{N}$, is equivalent to considering $n$ copies, also called replicas, of the initial system. For a given disorder, these $n$ replicas are non-interacting and $Z^n$ can be computed easily. However, averaging over the "disorder" with distribution $P_X$ makes the replicas interact: replicated weight-vectors $w^a$ and $w^b$, for $a, b \in [1 : n]$, are correlated by the overlap matrix $Q = (Q_{ab})_{a,b=1}^{n} = \left( \frac{1}{N} \sum_{i=1}^{N} w^a_i w^b_i \right)_{a,b=1}^{n}$.

We start by averaging over the distribution $P_X$, and then use an analytical continuation for $n \in \mathbb{R}$ and exchange the limits $N \to \infty$ and $n \to 0$. The exchange of the limits $n \to 0$ and $N \to \infty$ is a key and classical ingredient of replica calculations, and it renders the replica method heuristic rather than rigorously justified. Using this latter point, we show in Appendix VI A that the free entropy $\phi$, eq. (13), can finally be expressed as a saddle point equation over $n \times n$ symmetric matrices $Q$ and $\hat Q$:

$$\phi(\alpha) = -{\rm SP}_{Q, \hat Q} \left\{ \lim_{n \to 0} \frac{\partial S_n(Q, \hat Q)}{\partial n} \right\} , \tag{14}$$

where $\hat Q$ is a parameter involved in the change of variable between $\{w^a, w^b\}$ and $Q_{ab}$, and with

$$S_n(Q, \hat Q) = {\rm Tr}(Q \hat Q) - \log(I^n_w(\hat Q)) - \alpha \log(I^n_z(Q)) ,$$
$$I^n_w(\hat Q) = \int_{\mathbb{R}^n} d\tilde w \, P_{\tilde w}(\tilde w) \, e^{\tilde w^\intercal \hat Q \tilde w} \quad \text{where } \tilde w \in \mathbb{R}^n \text{ and } P_{\tilde w}(\tilde w) = \prod_{a=1}^{n} [\delta(\tilde w_a - 1) + \delta(\tilde w_a + 1)] ,$$
$$I^n_z(Q) = \int_{\mathbb{R}^n} d\tilde z \, P_{\tilde z}(\tilde z) \, C(\tilde z) \quad \text{where } \tilde z \in \mathbb{R}^n \text{ and } P_{\tilde z} = \mathcal{N}(0, Q) .$$

In order to be able to compute the derivative of $S_n$ with respect to $n$ in eq. (14), we need an analytical formulation of $Q$ and $\hat Q$ as a function of $n$.

B. RS entropy
The simplest ansatz is to assume that the overlap matrix $\mathbf{Q}$ is Replica Symmetric (RS), which means that all replicas play the same role: the correlation between two arbitrary, but different, replicas is denoted $q_0$, and therefore the RS ansatz reads:
$$\forall (a, b) \in [1:n] \times [1:n], \quad \frac{1}{N} (\mathbf{w}^a \cdot \mathbf{w}^b) = \begin{cases} q_0 & \text{if } a \neq b , \\ Q = 1 & \text{if } a = b . \end{cases}$$
It enforces the matrix $\hat{\mathbf{Q}}$ to present the same symmetry, with parameters $\hat q_0$ and $\hat Q = 1$ respectively. Using this ansatz and taking the $n \to 0$ limit, the free entropy becomes a function of $q_0$ and $\hat q_0$, evaluated at the saddle point (Appendix VI B):
$$\phi_{RS}(\alpha) = \mathrm{extr}_{q_0, \hat q_0} \left\{ \frac{1}{2} (q_0 \hat q_0 - 1) + I^w_{RS}(\hat q_0) + \alpha I^z_{RS}(q_0) \right\} , \tag{15}$$
with $I^w_{RS}(\hat q_0) \equiv \int Dt \log\left(g^w_0(t, \hat q_0)\right)$, $I^z_{RS}(q_0) \equiv \int Dt \log\left(f^z_0(t, q_0)\right)$, and for $i \in \mathbb{N}$
$$g^w_i(t, \hat q_0) \equiv \int dw\, w^i P_w(w) \exp\left( \frac{(1 - \hat q_0)}{2} w^2 + t \sqrt{\hat q_0}\, w \right) , \qquad f^z_i(t, q_0) \equiv \int Dz\, z^i \varphi\left( \sqrt{q_0}\, t + \sqrt{1 - q_0}\, z \right) . \tag{16}$$
Note that above and in what follows $Dt = \frac{e^{-t^2/2}}{\sqrt{2\pi}} dt$. In the binary perceptron case, the function $P_w$ is defined as $P_w(w) = [\delta(w - 1) + \delta(w + 1)]$ (note that this is not a probability distribution because of the normalization), and recall that $\varphi(z)$ is the indicator function checking that a constraint on the argument is satisfied (e.g. in the step case, $\varphi^s(z) = 1$ if $z \geq K$).

While in the step binary perceptron (SBP) the fixed point solution $(q_0, \hat q_0)$ is non-trivial, the symmetry of the activation function in the RBP and UBP cases enforces the configuration space to be symmetric and the fixed point $(q_0, \hat q_0) = (0, 0)$ to exist. If this symmetric fixed point is stable and has the lowest free energy, the RS free entropy matches the annealed entropy $\phi^t_a(\alpha) = \log(2) + \alpha \log(p_{t,K}) = \frac{1}{N} \log \mathbb{E}_X[Z^t(\mathbf{X})]$ from section II A with $t \in \{r, u\}$.
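The statement above can be checked numerically. The following is a minimal sketch, not the authors' code: it evaluates the bracket of eq. (15) at a given point $(q_0, \hat q_0)$ (it does not perform the extremization) and compares it, at the symmetric point $(0, 0)$, with the annealed entropy. It assumes the constraint forms $\varphi^r(z) = 1$ if $|z| \leq K$ and $\varphi^u(z) = 1$ if $|z| \geq K$, which are defined outside this section, and all function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def gaussian_average(func, lim=8.0, steps=4000):
    """Trapezoid approximation of the integral of func(t) against the Gaussian measure Dt."""
    h = 2.0 * lim / steps
    total = 0.0
    for k in range(steps + 1):
        t = -lim + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * func(t) * math.exp(-0.5 * t * t)
    return total * h / math.sqrt(2.0 * math.pi)

def prob_window(K, kind):
    """p_{t,K}: probability that a standard Gaussian satisfies the constraint (assumed forms)."""
    inside = math.erf(K / SQRT2)  # P[|z| <= K]
    return inside if kind == "r" else 1.0 - inside

def f0(t, q0, K, kind):
    """f^z_0(t, q0) of eq. (16): probability over z that sqrt(q0) t + sqrt(1 - q0) z is accepted."""
    s = math.sqrt(1.0 - q0)  # requires q0 < 1
    upper = (K - math.sqrt(q0) * t) / s
    lower = (-K - math.sqrt(q0) * t) / s
    inside = 0.5 * (math.erf(upper / SQRT2) - math.erf(lower / SQRT2))
    value = inside if kind == "r" else 1.0 - inside
    return max(value, 1e-300)  # guard against log(0) in the far tails

def log_g0(t, qhat0):
    """log g^w_0(t, qhat0) of eq. (16) for P_w = delta(w - 1) + delta(w + 1)."""
    return 0.5 * (1.0 - qhat0) + math.log(2.0 * math.cosh(t * math.sqrt(qhat0)))

def rs_functional(q0, qhat0, alpha, K, kind):
    """Bracket of eq. (15) at a given (q0, qhat0); no extremization is performed."""
    i_w = gaussian_average(lambda t: log_g0(t, qhat0))
    i_z = gaussian_average(lambda t: math.log(f0(t, q0, K, kind)))
    return 0.5 * (q0 * qhat0 - 1.0) + i_w + alpha * i_z

def annealed_entropy(alpha, K, kind):
    """Annealed entropy log(2) + alpha log(p_{t,K})."""
    return math.log(2.0) + alpha * math.log(prob_window(K, kind))

for kind in ("r", "u"):
    print(kind, rs_functional(0.0, 0.0, 0.5, 1.0, kind), annealed_entropy(0.5, 1.0, kind))
```

At $(q_0, \hat q_0) = (0, 0)$ the two printed values agree for both constraints, because $g^w_0 = 2 e^{1/2}$ cancels the $-\frac{1}{2}$ of the trace term and $f^z_0$ reduces to $p_{t,K}$. Whether this point is the relevant extremum is a separate question, addressed by the stability analysis.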
1. Rectangle
Solving numerically the corresponding saddle point equations leads to the single symmetric fixed point $(q_0, \hat q_0) = (0, 0)$, so that the RS free entropy equals the annealed one,
$$\phi^r_{RS}(\alpha) = \log(2) + \alpha \log(p_{r,K}) = \phi^r_a(\alpha) ,$$
and the RS capacity equals the annealed capacity of section II A:
$$\alpha^r_{RS}(K) = \alpha^r_a(K) = -\frac{\log(2)}{\log(p_{r,K})} .$$

2. U-function

• For $K \leq K^*$, only the symmetric fixed point $(q_0, \hat q_0) = (0, 0)$ exists, which leads again to the annealed free entropy, $\phi^u_{RS}(\alpha) = \log(2) + \alpha \log(p_{u,K}) = \phi^u_a(\alpha)$, and to the annealed capacity of section II A: $\alpha^u_{RS}(K) = \alpha^u_a(K) = -\frac{\log(2)}{\log(p_{u,K})}$.

• For $K > K^*$, the RS entropy does not match the annealed entropy because a fixed point $(q_0, \hat q_0) \neq (0, 0)$ corresponds to a lower free energy than the symmetric fixed point $(0, 0)$. This occurs for $K > K^*$, where $K^*$ is remarkably given by the same value as in the independent section II B 2. Hence it naturally verifies eq. (5), even though its definition derives from the stability of the RS solution, which we study in the next section.

C. Stability
The local stability of the RS solution can be studied using the de Almeida and Thouless (AT) method, based on the positivity of the Hessian of $S_n(\mathbf{Q}, \hat{\mathbf{Q}})$. The replica symmetric AT-line $\alpha_{AT}$ is given by the solution of the following implicit equation (Appendix VI D):
$$\frac{1}{\alpha} = \frac{1}{(1 - q_0(\alpha))^2} \int Dt\, \frac{\left( f^z_0 (f^z_0 - f^z_2) + (f^z_1)^2 \right)^2}{(f^z_0)^4}(t, q_0(\alpha)) \int Dt\, \frac{\left( g^w_0 g^w_2 - (g^w_1)^2 \right)^2}{(g^w_0)^4}(t, \hat q_0(\alpha)) .$$
As illustrated above, for the rectangle and u-function, the symmetry of the weights $P_w$ and of the constraint $\varphi$ imposes the existence of the symmetric fixed point $(q_0, \hat q_0) = (0, 0)$. The above condition simplifies at $(q_0, \hat q_0) = (0, 0)$ (see Appendix VI D):
$$\frac{1}{\alpha_{AT}} = \left( \frac{\tilde f^z_0 - \tilde f^z_2}{\tilde f^z_0} \right)^2 \left( \frac{\tilde g^w_2}{\tilde g^w_0} \right)^2 ,$$
where for $i \in \mathbb{N}$:
$$\tilde g^w_i = \int dw\, w^i P_w(w)\, e^{\frac{w^2}{2}} , \qquad \tilde f^z_i = \int Dz\, z^i \varphi(z) .$$
We plotted the annealed capacity, the replica symmetric capacity and the AT-line for the step, rectangle and u-function binary perceptrons as functions of $K$ in fig. 4, 5, 6.
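The symmetric-point AT condition lends itself to a direct numerical check. The sketch below is an illustration under stated assumptions rather than the authors' code: it assumes $\varphi^r(z) = 1$ if $|z| \leq K$ and $\varphi^u(z) = 1$ if $|z| \geq K$ (definitions from outside this section), for which integration by parts gives $|\tilde f^z_0 - \tilde f^z_2| = 2K e^{-K^2/2}/\sqrt{2\pi}$ in both cases, and binary weights, for which $\tilde g^w_2 / \tilde g^w_0 = 1$. The function names and the bisection bracket $[0.5, 1.5]$ are choices made for this example.

```python
import math

LOG2 = math.log(2.0)
SQRT2 = math.sqrt(2.0)

def p_rect(K):
    """P[|z| <= K] for a standard Gaussian z (assumed rectangle constraint)."""
    return math.erf(K / SQRT2)

def p_u(K):
    """P[|z| >= K] for a standard Gaussian z (assumed u-function constraint)."""
    return math.erfc(K / SQRT2)

def annealed_capacity(p):
    """alpha_a = -log(2) / log(p)."""
    return -LOG2 / math.log(p)

def at_line_symmetric(p, K):
    """alpha_AT at (q0, qhat0) = (0, 0) for binary weights, where g2/g0 = 1."""
    boundary = 2.0 * K * math.exp(-0.5 * K * K) / math.sqrt(2.0 * math.pi)  # |f0 - f2|
    return (p / boundary) ** 2

def stability_gap(K, p_func):
    """alpha_AT - alpha_a; positive when the annealed (RS) capacity lies below the AT-line."""
    p = p_func(K)
    return at_line_symmetric(p, K) - annealed_capacity(p)

def find_k_star(lo=0.5, hi=1.5, tol=1e-10):
    """Bisection on the u-function gap; the bracket [lo, hi] is an assumption of this sketch."""
    g_lo = stability_gap(lo, p_u)
    if g_lo * stability_gap(hi, p_u) > 0:
        raise ValueError("the gap does not change sign on the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = stability_gap(mid, p_u)
        if g_lo * g_mid > 0:
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print("K* for the u-function:", find_k_star())
print("smallest rectangle gap on a grid:", min(stability_gap(0.1 * k, p_rect) for k in range(1, 31)))
```

With these assumptions the u-function gap changes sign once on the bracket, at a value of $K$ close to 0.817, while the rectangle gap stays positive on the grid, in line with the local stability of the rectangle RS solution discussed in this section.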
1. Step binary perceptron
We note that for the step binary perceptron the RS solution is always stable towards 1RSB, even for negative threshold $K < 0$. This is interesting in view of recent work on the spherical perceptron with negative threshold, where the replica symmetry breaks for all $K < 0$ and full-step RSB is needed to evaluate the storage capacity.
2. Rectangle
As the RS capacity $\alpha^r_{RS}$ is always below the AT line $\alpha^r_{AT}$, the RS solution is always locally stable.

3. u-function

There is a crossing between the values of the RS capacity $\alpha^u_{RS}$ and the AT-line $\alpha^u_{AT}$, which defines implicitly the value $K^* \simeq 0.817$:
$$-\frac{\log(2)}{\log(p_{u,K^*})} = \frac{\pi\, (p_{u,K^*})^2}{2\, e^{-(K^*)^2} (K^*)^2} . \tag{17}$$
For $K \leq K^*$, the RS solution is locally stable, while for $K > K^*$ the RS solution becomes unstable, and a symmetry breaking solution appears.

[Figure 4: capacity $\alpha = M/N$ versus threshold $K$, with SAT and UNSAT regions; curves: annealed capacity $\alpha^s_a$, RS stability $\alpha^s_{AT}$, RS capacity $\alpha^s_{RS}$.]

FIG. 4. Step binary perceptron (SBP): the RS capacity $\alpha^s_{RS}$ (black) does not match the annealed capacity $\alpha^s_a$ (blue) and is always below the AT-line $\alpha^s_{AT}$ (orange). The AT-line is closest to the annealed capacity for $K_{\min} \simeq$ .62 where the difference $\alpha^s_{AT} - \alpha^s_a \simeq$ . For $K = 0$, we retrieve well known results: $\alpha^s_{RS} \simeq 0.833$, $\alpha^s_{AT} \simeq 1.015$ and $\alpha^s_a = 1$. The left and right hand sides, and the inset, represent the same data on different scales. The satisfiable (SAT) phase is represented by the beige shaded area and is located below the RS capacity, while the unsatisfiable (UNSAT) phase starts at the capacity (black line) and extends to larger numbers of constraints.

[Figure 5: capacity $\alpha = M/N$ versus threshold $K$, with SAT and UNSAT regions; curves: annealed capacity $\alpha^r_a$, RS stability $\alpha^r_{AT}$, RS capacity $\alpha^r_{RS}$.]

FIG. 5. Rectangle binary perceptron (RBP): the RS capacity $\alpha^r_{RS}$ (black) matches the annealed bound $\alpha^r_a$ (blue), and the RS solution is locally stable for all $K$: $\alpha^r_{RS} < \alpha^r_{AT}$. The AT-line (orange) is closest to the annealed capacity for $K_{\min} \simeq$ .24 where the difference $\alpha^r_{AT} - \alpha^r_a \simeq$ .15. The left and right hand sides, and the inset, represent the same data on different scales. The satisfiable (SAT) phase is represented by the beige shaded area and is located below the RS capacity, while the unsatisfiable (UNSAT) phase starts at the capacity (black line) and extends to larger numbers of constraints.

[Figure 6: capacity $\alpha = M/N$ versus threshold $K$, with SAT and UNSAT regions and $K^*$ marked; curves: annealed capacity $\alpha^u_a$, RS stability $\alpha^u_{AT}$, RS capacity $\alpha^u_{RS}$.]

FIG. 6. U-function binary perceptron (UBP): the RS capacity (black) matches the annealed bound (blue) for $K < K^*$. At $K = K^*$, the RS capacity crosses the AT-line (orange). For $K > K^*$, the RS solution is unstable and the RS capacity deviates from the annealed capacity. The left and right hand sides, and the inset, represent the same data on different scales. The satisfiable (SAT) phase is represented by the beige shaded area and is located below the RS capacity, while the unsatisfiable (UNSAT) phase starts at the capacity (black line) and extends to larger numbers of constraints.

D. 1RSB calculation
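In the previous section we concluded that the replica symmetric solution is unstable in the u-function binary perceptron for $K > K^*$. We therefore analyze the first step of replica symmetry breaking (1RSB) ansatz in this section. This ansatz and the associated calculations are due to the seminal works of G. Parisi; they are classic in the field of disordered systems and well presented in the literature. We thus mainly give the key formulas and defer the details to Appendix VI C.

The 1RSB ansatz assumes that the space of configurations splits into states. Consequently replicas are not symmetric anymore and instead the $n$ replicas are organized in $\frac{n}{m}$ groups containing $m$ replicas each:
$$\forall (a, b) \in [1:n] \times [1:n], \quad \frac{1}{N} (\mathbf{w}^a \cdot \mathbf{w}^b) = \begin{cases} q_1 & \text{if } a, b \text{ belong to the same state,} \\ q_0 & \text{if } a, b \text{ do not belong to the same state,} \\ Q = 1 & \text{if } a = b . \end{cases} \tag{18}$$
Following the literature, the partition function $Z^m$ associated to $m$ replicas falling in the same state is expressed as a sum over all possible states $\Psi$ weighted by their corresponding free entropy $\phi$:
$$Z^m = \sum_{\{\Psi\}} e^{N m \phi(\Psi)} = \sum_{\{\phi\}} \mathcal{N}_\phi\, e^{N m \phi} = \sum_{\{\phi\}} e^{N \Sigma(\phi)} e^{N m \phi} \sim \int d\phi\, e^{N (m \phi + \Sigma(\phi))} ,$$
where we introduced the number of states at a given free entropy $\phi$, $\mathcal{N}_\phi \equiv \exp(N \Sigma(\phi))$, and the complexity $\Sigma(\phi)$, also called the configurational entropy.

Using the saddle point method in the $N \to \infty$ limit, the 1RSB replicated free entropy $\Phi_{1RSB}$ is written as a function of the Parisi parameter $m$, the free entropy $\phi_{1RSB}$ and the complexity $\Sigma(\phi_{1RSB})$:
$$\Phi_{1RSB}(m, \alpha) \equiv \lim_{N \to \infty} \frac{1}{N} \mathbb{E}_X[\log(Z^m(\mathbf{X}))] = m \phi_{1RSB} + \Sigma(\phi_{1RSB}) . \tag{19}$$
Injecting the 1RSB ansatz eq. (18) in the replica derivation eq. (14), the 1RSB replicated free entropy $\Phi_{1RSB}$ is written as a saddle point equation over $\mathbf{q} = (q_0, q_1)$ and $\hat{\mathbf{q}} = (\hat q_0, \hat q_1)$ (see Appendix VI C):
$$\Phi_{1RSB}(m, \alpha) = \mathrm{extr}_{\mathbf{q}, \hat{\mathbf{q}}} \left\{ \frac{m}{2} (q_1 \hat q_1 - 1) + \frac{m^2}{2} (q_0 \hat q_0 - q_1 \hat q_1) + m I^w_{1RSB}(\hat{\mathbf{q}}) + \alpha m I^z_{1RSB}(\mathbf{q}) \right\} \tag{20}$$
with
$$I^w_{1RSB}(\hat{\mathbf{q}}) = \frac{1}{m} \int Dt_0 \log\left( \int Dt_1\, g^w_0(\mathbf{t}, \hat{\mathbf{q}})^m \right) , \qquad I^z_{1RSB}(\mathbf{q}) = \frac{1}{m} \int Dt_0 \log\left( \int Dt_1\, f^z_0(\mathbf{t}, \mathbf{q})^m \right) ,$$
denoting $\mathbf{t} = (t_0, t_1)$, and for $i \in \mathbb{N}$:
$$g^w_i(\mathbf{t}, \hat{\mathbf{q}}) = \int dw\, w^i P_w(w) \exp\left( \frac{(1 - \hat q_1)}{2} w^2 + \left( \sqrt{\hat q_0}\, t_0 + \sqrt{\hat q_1 - \hat q_0}\, t_1 \right) w \right) , \qquad f^z_i(\mathbf{t}, \mathbf{q}) = \int Dz\, z^i \varphi\left( \sqrt{q_0}\, t_0 + \sqrt{q_1 - q_0}\, t_1 + \sqrt{1 - q_1}\, z \right) . \tag{21}$$
Taking the derivative of $\Phi_{1RSB}$ with respect to $m$, the free entropy $\phi_{1RSB}$ and the complexity $\Sigma$ can be written as:
$$\phi_{1RSB}(\alpha) = \frac{\partial \Phi_{1RSB}(m, \alpha)}{\partial m} = \mathrm{extr}_{\mathbf{q}, \hat{\mathbf{q}}} \left\{ \frac{1}{2} (q_1 \hat q_1 - 1) + m (q_0 \hat q_0 - q_1 \hat q_1) + J^w_{1RSB}(\hat{\mathbf{q}}) + \alpha J^z_{1RSB}(\mathbf{q}) \right\} ,$$
$$\Sigma(\phi_{1RSB}) = \Phi_{1RSB} - m \phi_{1RSB} = \mathrm{extr}_{\mathbf{q}, \hat{\mathbf{q}}} \left\{ \frac{m^2}{2} (q_1 \hat q_1 - q_0 \hat q_0) + m (I^w_{1RSB} - J^w_{1RSB})(\hat{\mathbf{q}}) + m \alpha (I^z_{1RSB} - J^z_{1RSB})(\mathbf{q}) \right\} , \tag{22}$$
with
$$J^w_{1RSB}(\hat{\mathbf{q}}) = \frac{\partial (m I^w_{1RSB})}{\partial m} = \int Dt_0\, \frac{\int Dt_1 \log\left(g^w_0(\mathbf{t}, \hat{\mathbf{q}})\right) g^w_0(\mathbf{t}, \hat{\mathbf{q}})^m}{\int Dt_1\, g^w_0(\mathbf{t}, \hat{\mathbf{q}})^m} , \qquad J^z_{1RSB}(\mathbf{q}) = \frac{\partial (m I^z_{1RSB})}{\partial m} = \int Dt_0\, \frac{\int Dt_1 \log\left(f^z_0(\mathbf{t}, \mathbf{q})\right) f^z_0(\mathbf{t}, \mathbf{q})^m}{\int Dt_1\, f^z_0(\mathbf{t}, \mathbf{q})^m} .$$

E. 1RSB results for UBP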
From now on, we only consider the u-function binary perceptron, whose RS solution is unstable for $K > K^*$. To describe the equilibrium of the system in the SAT phase, we need to find the value of the Parisi parameter at equilibrium, $m_{eq}$. The complexity $\Sigma(\phi)$ is the entropy of clusters having internal entropy $\phi$. In order to capture the clusters that carry almost all configurations, we need to maximize the total entropy $\phi_{tot} = \Sigma(\phi) + \phi$ under the constraint that the free entropy and the complexity are both positive, $\phi \geq 0$ and $\Sigma(\phi) \geq 0$. The equilibrium value $m_{eq}$ then verifies
$$\phi_{eq} = \mathrm{argmax}_{\phi \geq 0,\, \Sigma \geq 0} \{ \phi + \Sigma(\phi) \} \quad \text{and} \quad m_{eq} = - \left. \frac{d\Sigma}{d\phi} \right|_{\phi_{eq}} .$$
Using the expressions of eq. (22) and varying the Parisi parameter $m \in [0; 1]$, we obtain the curve of the complexity $\Sigma(\phi)$ shown in fig. 7. At $m = 1$, the complexity is negative. Decreasing $m$, the complexity increases and becomes positive at the value $m_{eq}$. Besides, for small values of $m$, an unphysical (convex) branch appears, as commonly observed in other systems solved by the replica method.

We note that as $\alpha$ increases both the equilibrium complexity and the free entropy decrease. In constraint satisfaction problems such as K-satisfiability or random graph coloring, the mechanism by which the satisfiability threshold appears is that the maximum of the complexity becomes negative. In the present UBP problem it is actually both the free entropy and the complexity that vanish together, as illustrated in fig. 7.

[Figure 7: two panels (a) and (b) showing the complexity $\Sigma$ versus the free entropy $\phi$, with the curve parametrized by $m$ and the point $(\phi_{eq}, m_{eq})$ marked.]

FIG. 7. Complexity $\Sigma(\phi)$ as a function of the free entropy $\phi$ for the u-function binary perceptron at $K = 1.5 > K^*$. Complexity reaches $\Sigma = 0$ (black dot) at $m_{eq}$. For $K = 1.5$ and $\alpha = 0.$ (a) the free entropy corresponding to $m_{eq}$ is positive, $\phi_{eq} > 0$, whereas for $\alpha = 0.$ (b) the free entropy at $m_{eq}$ is negative, $\phi_{eq} < 0$, meaning that $\alpha$ is beyond the 1RSB storage capacity, and the capacity is in the interval [0.33; 0.

Computing the equilibrium value $m_{eq}(\alpha)$, we have access to the corresponding equilibrium overlaps $q_0^*$ and $q_1^*$, which we may compare with the RS solution $q_{RS}$. All these are depicted in fig. 8. The function $m_{eq}(\alpha)$ shows a non-monotonic behaviour, as has been previously observed, e.g. in the Sherrington-Kirkpatrick model as a function of temperature.

We also compute the 1RSB entropy, which verifies $\phi^u_{1RSB} \leq \phi^u_{RS}$ and vanishes at the 1RSB capacity $\alpha^u_{1RSB}$, as depicted in fig. 9a. We note that the above inequality is as predicted by Parisi's replica theory, taking into account that we are working at strictly zero energy, where the entropy becomes minus the free energy.

The 1RSB solution provides a small correction to the RS result for the storage capacity, as illustrated in fig. 9b, where we plotted the difference between the annealed upper bound and the capacity for the RS and 1RSB solutions: $\alpha^u_a - \alpha^u_{RS}$ and $\alpha^u_a - \alpha^u_{1RSB}$.

[Figure 8: overlaps at equilibrium $q_{RS}$, $q_0^*$, $q_1^*$ (left axis) and Parisi parameter at equilibrium $m_{eq}$ (right axis) versus $\alpha = M/N$ at $K = 1.5$, with $\alpha^u_{RS}$, $\alpha^u_{1RSB}$ and $\alpha^u_{AT}$ marked.]

FIG. 8. Equilibrium values of the overlap $q_0^* = q_{RS}$, $q_1^*$ and the Parisi parameter $m_{eq}$ for the UBP at $K = 1.5$. For $K < K^*$, the RS solution is stable and the only fixed point is $q_0 = q_1 = q_{RS} = 0$.

[Figure 9: panel (a) entropy $\phi^u(\alpha)$ at $K = 1.5$ versus $\alpha = M/N$, curves $\phi^u_{1RSB}(\alpha)$ and $\phi^u_{RS}(\alpha)$, with $\alpha^u_{RS}$, $\alpha^u_{1RSB}$ and $\alpha^u_{AT}$ marked; panel (b) comparison of RS and 1RSB capacities, $\alpha^u_a - \alpha^u_c$ versus $K$, curves $\alpha^u_a - \alpha^u_{1RSB}$ and $\alpha^u_a - \alpha^u_{RS}$, with $K^*$ marked.]

FIG. 9. a): Comparison of the RS (blue) and 1RSB (orange) entropy for the UBP at $K = 1.5$. For $\alpha < \alpha_{AT} \simeq$ . the two entropies coincide. For $\alpha > \alpha_{AT}$, the 1RSB entropy deviates slightly from the RS entropy before vanishing, respectively at $\alpha^u_{1RSB} \simeq 0.337$ and $\alpha^u_{RS} \simeq$ . b): Difference between the annealed upper bound and the 1RSB capacity $\alpha^u_a - \alpha^u_{1RSB}$ (orange) and the RS capacity $\alpha^u_a - \alpha^u_{RS}$ (blue). Below $K^*$ the RS solution is stable: RS and 1RSB entropies match exactly. Above $K^*$, the RS solution is unstable: the 1RSB entropy deviates slightly from the RS solution.

F. 1RSB Stability
In the previous section we evaluated the 1RSB storage capacity of the u-function binary perceptron for $K > K^*$. In this section we argue that this cannot be an exact solution to the problem.

We could investigate the stability of 1RSB towards further levels of replica symmetry breaking along the same lines as we did for the RS solution. However, in the present case we do not need to do that to see that the obtained solution cannot be correct. The explanation lies in the breaking of the up-down symmetry in the problem. This symmetry must either be broken explicitly as in the ferromagnet, where the system would acquire an overall magnetization, but we have not observed any trace of this in the present problem. Or this up-down symmetry must be conserved in the final correct solution. The conservation of the up-down symmetry is manifested in the value $q_0 = 0$ in the replica symmetric phase. The fact that in the 1RSB solution evaluated above we do not observe $q_0 = 0$, but instead $q_0 > 0$, indicates that this solution is not the correct one. In a full-RSB solution, described by a function $q(x)$ of overlaps, the smallest one of them should be 0 in order to restore the up-down symmetry. We leave the evaluation of the full-RSB for future work.

Finally let us note that the 1RSB solution obtained in the previous section can be interpreted as frozen-2RSB. In 2RSB we would have 3 kinds of overlaps, $q_0$, $q_1$ and $q_2$. In frozen-2RSB we would have $q_2 = 1$, $q_1 = q_1^{1RSB}$, $q_0 = q_0^{1RSB}$.

V. CONCLUSION
The step-function binary perceptron has thus far eluded a rigorous establishment of the conjectured storage capacity, eq. (2). This prediction is expected to be exact because of the frozen-1RSB nature of the problem. At the same time, recent work sheds light on the fact that the structure of the space of solutions is not fully described by the frozen-1RSB picture, and that rare dense and unfrozen regions exist and are in fact amenable to dynamical procedures searching for solutions. It remains to be understood how it is possible that the 1RSB calculation does not capture these dense unfrozen regions of solutions. They do not dominate the equilibrium, but the RSB calculation is expected to describe rare events via their large deviations, which in this case it does not.

In this paper we focus on two cases of the binary perceptron with symmetric constraints, the rectangle binary perceptron and the u-function binary perceptron. We prove (up to a numerical assumption) using the second moment method that the storage capacity agrees in those cases with the annealed upper bound, except for the u-function binary perceptron for $K > K^*$, eq. (5). We analyze the 1RSB solution in that case and indeed obtain a lower prediction for the storage capacity. However, we do not expect the 1RSB to provide the exact solution because it does not respect the up-down symmetry of the problem. Though the precise nature of the satisfiable phase for the u-function binary perceptron for $K > K^*$ remains elusive, we can conjecture it is full-RSB. Establishing this rigorously would provide much deeper understanding and remains a challenging subject for future work.

ACKNOWLEDGEMENT
We thank Florent Krzakala, Joe Neeman, and Pierfrancesco Urbani for useful discussions. We acknowledge funding from the ERC under the European Union's Horizon 2020 Research and Innovation Programme Grant Agreement 714608-SMiLe. WP was supported in part by EPSRC grant EP/P009913/1.

E. Gardner & B. Derrida. Optimal storage properties of neural network models. J. Phys. A: Math. and Gen, 1988.
W. Krauth & M. Mézard. Storage capacity of memory networks with binary couplings. J. Phys. France, 1989.
Timothy LH Watkin, Albrecht Rau, and Michael Biehl. The statistical mechanics of learning a rule. Reviews of Modern Physics, 65(2):499, 1993.
HS Seung, Haim Sompolinsky, and N Tishby. Statistical mechanics of learning from examples. Physical Review A, 45(8):6056, 1992.
A. Engel & C. Van den Broeck. Statistical mechanics of learning. Cambridge University Press, 2001.
H. Nishimori. Statistical Physics of Spin Glasses and Information Processing: An Introduction. Oxford University Press, Oxford, UK, 2001.
Michel Talagrand. The Parisi formula. Annals of Mathematics, pages 221–263, 2006.
Michel Talagrand. Spin glasses: a challenge for mathematicians: cavity and mean field models, volume 46. Springer Science & Business Media, 2003.
M. Mézard & A. Montanari. Information, Physics, and Computation. Oxford Graduate Texts, 2009.
Dimitris Achlioptas, Amin Coja-Oghlan, and Federico Ricci-Tersenghi. On the solution-space geometry of random constraint satisfaction problems. Random Structures & Algorithms, 38(3):251–268, 2011.
Dmitry Panchenko. The Parisi formula for mixed p-spin models. The Annals of Probability, 42(3):946–958, 2014.
Jian Ding, Allan Sly, and Nike Sun. Proof of the satisfiability conjecture for large k. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 59–68. ACM, 2015.
Jeong Han Kim and James R Roche. Covering cubes by random half cubes, with applications to binary neural networks. Journal of Computer and System Sciences, 56(2):223–252, 1998.
Mihailo Stojnic. Discrete perceptrons. arXiv preprint arXiv:1306.4375, 2013.
Jian Ding and Nike Sun. Capacity lower bound for the Ising perceptron. arXiv preprint arXiv:1809.07742, 2018.
Nikhil Bansal and Joel H. Spencer. On-line balancing of random inputs. arXiv preprint arXiv:1903.06898, 2019.
Manfred Opper. Statistical physics estimates for the complexity of feedforward neural networks. Physical Review E, 51(4):3613, 1995.
Geert Jan Bex, Roger Serneels, and Christian Van den Broeck. Storage capacity and generalization error for the reversed-wedge Ising perceptron. Physical Review E, 51(6):6309, 1995.
Tadaaki Hosaka, Yoshiyuki Kabashima, and Hidetoshi Nishimori. Statistical mechanics of lossy data compression using a nonmonotonic perceptron. Physical Review E, 66(6):066126, 2002.
S. Franz, G. Parisi, M. Sevelev, P. Urbani, and F. Zamponi. Universality of the SAT-UNSAT (jamming) threshold in non-convex continuous constraint satisfaction problems. SciPost Phys, 2017.
Giorgio Parisi. Infinite number of order parameters for spin-glasses. Physical Review Letters, 43(23):1754, 1979.
Giorgio Parisi. A sequence of approximated solutions to the SK model for spin glasses. Journal of Physics A: Mathematical and General, 13(4):L115, 1980.
Giorgio Parisi. The order parameter for spin glasses: a function on the interval 0-1. Journal of Physics A: Mathematical and General, 13(3):1101, 1980.
James G Wendel. A problem in geometric probability. Math. Scand, 11:109–111, 1962.
Thomas M Cover. Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Transactions on Electronic Computers, (3):326–334, 1965.
Mariya Shcherbina and Brunello Tirozzi. Rigorous solution of the Gardner problem. Communications in Mathematical Physics, 234(3):383–422, 2003.
Mihailo Stojnic. Another look at the Gardner problem. arXiv preprint arXiv:1306.3979, 2013.
Silvio Franz and Giorgio Parisi. The simplest model of jamming. Journal of Physics A: Mathematical and Theoretical, 49(14):145001, 2016.
Mihailo Stojnic. Negative spherical perceptron. arXiv preprint arXiv:1306.3980, 2013.
Dimitris Achlioptas and Cristopher Moore. The asymptotic order of the random k-SAT threshold. In Foundations of Computer Science, 2002. Proceedings. The 43rd Annual IEEE Symposium on, pages 779–788. IEEE, 2002.
H. Huang, K.Y.M. Wong & Y. Kabashima. Entropy landscape of solutions in the binary perceptron problem. Journal of Physics A: Mathematical and Theoretical, 2013.
Haiping Huang and Yoshiyuki Kabashima. Origin of the computational hardness for learning with binary synapses. Physical Review E, 90(5):052813, 2014.
Lenka Zdeborová and Marc Mézard. Constraint satisfaction problems with isolated solutions are hard. Journal of Statistical Mechanics: Theory and Experiment, 2008(12):P12004, 2008.
Alfredo Braunstein and Riccardo Zecchina. Learning by message passing in networks of discrete synapses. Physical Review Letters, 96(3):030201, 2006.
Carlo Baldassi, Alessandro Ingrosso, Carlo Lucibello, Luca Saglietti, and Riccardo Zecchina. Subdominant dense clusters allow for simple learning and high computational performance in neural networks with discrete synapses. Physical Review Letters, 115(12):128101, 2015.
Carlo Baldassi, Christian Borgs, Jennifer T Chayes, Alessandro Ingrosso, Carlo Lucibello, Luca Saglietti, and Riccardo Zecchina. Unreasonable effectiveness of learning neural networks: From accessible states and robust ensembles to basic algorithmic schemes. Proceedings of the National Academy of Sciences, 113(48):E7655–E7662, 2016.
Lenka Zdeborová and Marc Mézard. Locked constraint satisfaction problems. Physical Review Letters, 101(7):078702, 2008.
Lenka Zdeborová and Florent Krzakala. Quiet planting in the locked constraint satisfaction problems. SIAM Journal on Discrete Mathematics, 25(2):750–770, 2011.
Dimitris Achlioptas and Amin Coja-Oghlan. Algorithmic barriers from phase transitions. In Foundations of Computer Science, 2008. FOCS'08. IEEE 49th Annual IEEE Symposium on, pages 793–802. IEEE, 2008.
Florent Krzakala and Lenka Zdeborová. Hiding quiet solutions in random constraint satisfaction problems. Physical Review Letters, 102(23):238701, 2009.
Elchanan Mossel, Joe Neeman, and Allan Sly. Reconstruction and estimation in the planted partition model. Probability Theory and Related Fields, 162(3-4):431–461, 2015.
Amin Coja-Oghlan, Florent Krzakala, Will Perkins, and Lenka Zdeborová. Information-theoretic thresholds from the cavity method. Advances in Mathematics, 333:694–795, 2018.
Ehud Friedgut. Sharp thresholds of graph properties, and the k-SAT problem. Journal of the American Mathematical Society, 12(4):1017–1054, 1999.
OC Martin, M Mézard, and O Rivoire. Frozen glass phase in the multi-index matching problem. Physical Review Letters, 93(21):217205, 2004.
C. Schülke. Statistical physics of linear and bilinear inference problems. PhD thesis, Université Paris Diderot - La Sapienza, 2016.
J.R.L de Almeida and D.J Thouless. Stability of the Sherrington-Kirkpatrick solution of a spin glass model. J. Phys. A: Math. Gen, 1978.
M. Mézard, G. Parisi & M. Virasoro. Spin glasses and beyond. World Scientific, Singapore, 1987.
R. Monasson. Structural glass transition and the entropy of the metastable states. Physical Review Letters, 75:2847, 1995.
Marc Mézard, Giorgio Parisi, and Miguel Angel Virasoro. Spin Glass Theory and Beyond, 1987.
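VI. APPENDICES

A. General replica calculation

We present here the replica computation for a general prior distribution $P_w$ and constraint function $\varphi$. In order to compute the quenched average of the free entropy, we consider the partition function of $n \in \mathbb{N}$ identical copies of the initial system. Using the replica trick and an analytical continuation, the averaged free entropy $\phi$ of the initial system reads:
$$\phi(\alpha) \equiv \lim_{N \to +\infty} \frac{1}{N} \mathbb{E}_X[\log(Z(\mathbf{X}))] = \lim_{N \to +\infty} \lim_{n \to 0} \frac{1}{N} \frac{\partial \log\left(\mathbb{E}_X[Z(\mathbf{X})^n]\right)}{\partial n} , \tag{23}$$
where the replicated partition function can be written as
$$\mathbb{E}_X[Z(\mathbf{X})^n] = \int d\mathbf{X}\, P_X(\mathbf{X})\, Z(\mathbf{X})^n = \int d\mathbf{X}\, P_X(\mathbf{X}) \prod_{a=1}^{n} \int d\mathbf{w}^a P_w(\mathbf{w}^a) \int d\mathbf{z}^a\, C(\mathbf{z}^a)\, \delta(\mathbf{z}^a - \mathbf{X} \mathbf{w}^a) , \tag{24}$$
with the global constraint function $C(\mathbf{z}) = \prod_{\mu=1}^{M} \varphi(z_\mu)$.

We suppose that the inputs are iid, drawn from $P_X \triangleq \mathcal{N}\left(0, \frac{1}{N}\right)$. More precisely, for $i, j \in [1:N]$ and $\mu, \nu \in [1:M]$, $\mathbb{E}_X[X_{i\mu} X_{j\nu}] = \frac{1}{N} \delta_{\mu\nu} \delta_{ij}$. Hence $z^a_\mu = \sum_{i=1}^{N} X_{i\mu} w^a_i$ is a sum of iid random variables. The central limit theorem ensures that $z^a_\mu \sim \mathcal{N}\left( \mathbb{E}_X[z^a_\mu], \mathbb{E}_X[z^a_\mu z^b_\mu] \right)$, with the first two moments:
$$\begin{cases} \mathbb{E}_X[z^a_\mu] = \sum_{i=1}^{N} \mathbb{E}_X[X_{i\mu}]\, w^a_i = 0 , \\ \mathbb{E}_X[z^a_\mu z^b_\mu] = \sum_{ij} \mathbb{E}_X[X_{i\mu} X_{j\mu}]\, w^a_i w^b_j = \frac{1}{N} \sum_{ij} \delta_{ij} w^a_i w^b_j = \frac{1}{N} \sum_{i=1}^{N} w^a_i w^b_i . \end{cases} \tag{25}$$
In the following we introduce the symmetric overlap matrix $\mathbf{Q} \equiv \left( \frac{1}{N} \sum_{i=1}^{N} w^a_i w^b_i \right)_{a,b=1..n}$. Define $\tilde{\mathbf{z}}_\mu \equiv (z^a_\mu)_{a=1..n}$ and $\tilde{\mathbf{w}}_i \equiv (w^a_i)_{a=1..n}$. The vector $\tilde{\mathbf{z}}_\mu$ follows a multivariate gaussian distribution $\tilde{\mathbf{z}}_\mu \sim P_{\tilde z} \triangleq \mathcal{N}(\mathbf{0}, \mathbf{Q})$, and $P_{\tilde w}(\tilde{\mathbf{w}}) = \prod_{a=1}^{n} [\delta(\tilde w^a - 1) + \delta(\tilde w^a + 1)]$. We introduce the change of variable and the Fourier representation of the Dirac $\delta$-function, which involves a new parameter $\hat{\mathbf{Q}}$:
$$1 = \int d\mathbf{Q} \prod_{a \leq b} \delta\left( N Q_{ab} - \sum_{i=1}^{N} w^a_i w^b_i \right) = \int d\mathbf{Q} \int d\hat{\mathbf{Q}}\, \exp\left( -\frac{N}{2} \mathrm{Tr}(\mathbf{Q} \hat{\mathbf{Q}}) \right) \exp\left( \frac{1}{2} \sum_{i=1}^{N} \tilde{\mathbf{w}}_i^\intercal \hat{\mathbf{Q}} \tilde{\mathbf{w}}_i \right) .$$
The replicated partition function then becomes an integral over the matrix parameters $\mathbf{Q}$ and $\hat{\mathbf{Q}}$, which can be evaluated using the Laplace method in the $N \to \infty$ limit:
$$\mathbb{E}_X[Z(\mathbf{X})^n] = \int d\mathbf{Q}\, d\hat{\mathbf{Q}}\; e^{-N \left( \frac{1}{2} \mathrm{Tr}(\mathbf{Q} \hat{\mathbf{Q}}) - \log\left( \int d\tilde{\mathbf{w}}\, P_{\tilde w}(\tilde{\mathbf{w}})\, e^{\frac{1}{2} \tilde{\mathbf{w}}^\intercal \hat{\mathbf{Q}} \tilde{\mathbf{w}}} \right) - \alpha \log\left( \int d\tilde{\mathbf{z}}\, P_{\tilde z}(\tilde{\mathbf{z}})\, C(\tilde{\mathbf{z}}) \right) \right)} \tag{26}$$
$$= \int d\mathbf{Q}\, d\hat{\mathbf{Q}}\; e^{-N S_n(\mathbf{Q}, \hat{\mathbf{Q}})} \underset{N \to \infty}{\simeq} e^{-N \cdot \mathrm{SP}_{\mathbf{Q}, \hat{\mathbf{Q}}} \{ S_n(\mathbf{Q}, \hat{\mathbf{Q}}) \}} , \tag{27}$$
where SP stands for saddle point and we defined
$$\begin{cases} S_n(\mathbf{Q}, \hat{\mathbf{Q}}) = \frac{1}{2} \mathrm{Tr}(\mathbf{Q} \hat{\mathbf{Q}}) - \log\left(I^w_n(\hat{\mathbf{Q}})\right) - \alpha \log\left(I^z_n(\mathbf{Q})\right) , \\ I^w_n(\hat{\mathbf{Q}}) = \int_{\mathbb{R}^n} d\tilde{\mathbf{w}}\, P_{\tilde w}(\tilde{\mathbf{w}})\, e^{\frac{1}{2} \tilde{\mathbf{w}}^\intercal \hat{\mathbf{Q}} \tilde{\mathbf{w}}} , \\ I^z_n(\mathbf{Q}) = \int_{\mathbb{R}^n} d\tilde{\mathbf{z}}\, P_{\tilde z}(\tilde{\mathbf{z}})\, C(\tilde{\mathbf{z}}) . \end{cases} \tag{28}$$
Finally, using eq. (23) and switching the two limits $n \to 0$ and $N \to \infty$, the quenched free entropy $\phi$ simplifies to a saddle point equation
$$\phi(\alpha) = - \mathrm{SP}_{\mathbf{Q}, \hat{\mathbf{Q}}} \left\{ \lim_{n \to 0} \frac{\partial S_n(\mathbf{Q}, \hat{\mathbf{Q}})}{\partial n} \right\} , \tag{29}$$
over general symmetric matrices $\mathbf{Q}$ and $\hat{\mathbf{Q}}$. In the following we will assume simple ansätze for these matrices, which yield expressions analytic in $n$ so that the derivative can be taken.

B. RS entropy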
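Let us compute the functional $S_n(\mathbf{Q}, \hat{\mathbf{Q}})$ appearing in the free entropy eq. (29) in the simplest ansatz: the Replica Symmetric ansatz. The latter assumes that all replicas remain equivalent, with a common overlap $q_0 = \frac{1}{N} \sum_{i=1}^{N} w^a_i w^b_i$ for $a \neq b$ and a norm $Q = \frac{1}{N} \sum_{i=1}^{N} w^a_i w^a_i$, leading to the following expressions of the matrices $\mathbf{Q}$ and $\hat{\mathbf{Q}} \in \mathbb{R}^{n \times n}$:
$$\mathbf{Q} = \begin{pmatrix} Q & q_0 & \dots & q_0 \\ q_0 & Q & \dots & \vdots \\ \vdots & \vdots & \ddots & q_0 \\ q_0 & \dots & q_0 & Q \end{pmatrix} \quad \text{and} \quad \hat{\mathbf{Q}} = \begin{pmatrix} \hat Q & \hat q_0 & \dots & \hat q_0 \\ \hat q_0 & \hat Q & \dots & \vdots \\ \vdots & \vdots & \ddots & \hat q_0 \\ \hat q_0 & \dots & \hat q_0 & \hat Q \end{pmatrix} . \tag{30}$$
Let us compute separately the terms involved in the functional $S_n(\mathbf{Q}, \hat{\mathbf{Q}})$, eq. (28): the first is a trace term, the second is the prior term $I^w_n$, and the third is the term $I^z_n$ depending on the constraint.

a. Trace term. The trace term can be easily computed and takes the following form:
$$\left. \frac{1}{2} \mathrm{Tr}(\mathbf{Q} \hat{\mathbf{Q}}) \right|_{RS} = \frac{1}{2} \left( n Q \hat Q + n (n - 1) q_0 \hat q_0 \right) . \tag{31}$$

b. Prior integral. Evaluated at the RS fixed point, and using a gaussian identity also known as a Hubbard-Stratonovich transformation, the prior integral can be further simplified:
$$\left. I^w_n(\hat{\mathbf{Q}}) \right|_{RS} = \int d\tilde{\mathbf{w}}\, P_{\tilde w}(\tilde{\mathbf{w}})\, e^{\frac{1}{2} \tilde{\mathbf{w}}^\intercal \hat{\mathbf{Q}} \tilde{\mathbf{w}}} = \int d\tilde{\mathbf{w}}\, P_{\tilde w}(\tilde{\mathbf{w}}) \exp\left( \frac{(\hat Q - \hat q_0)}{2} \sum_{a=1}^{n} (\tilde w^a)^2 \right) \exp\left( \frac{\hat q_0}{2} \left( \sum_{a=1}^{n} \tilde w^a \right)^2 \right) \tag{32}$$
$$= \int Dt \left[ \int dw\, P_w(w) \exp\left( \frac{(\hat Q - \hat q_0)}{2} w^2 + t \sqrt{\hat q_0}\, w \right) \right]^n . \tag{33}$$

c. Constraint integral. Recall that the vector $\tilde{\mathbf{z}} \sim P_{\tilde z} \triangleq \mathcal{N}(\mathbf{0}, \mathbf{Q})$ follows a gaussian distribution with zero mean and covariance matrix $\mathbf{Q}$. In the RS ansatz, the covariance can be rewritten as a linear combination of the identity $\mathbf{I}$ and of $\mathbf{J}$, the all-ones matrix of size $n \times n$: $\left. \mathbf{Q} \right|_{RS} = (Q - q_0) \mathbf{I} + q_0 \mathbf{J}$. This allows us to split the variable as $z^a = \sqrt{q_0}\, t + \sqrt{Q - q_0}\, u^a$ with $t \sim \mathcal{N}(0, 1)$ and $\forall a$, $u^a \sim \mathcal{N}(0, 1)$. Finally, the constraint integral reads:
$$\left. I^z_n(\mathbf{Q}) \right|_{RS} = \int d\tilde{\mathbf{z}}\, P_{\tilde z}(\tilde{\mathbf{z}})\, C(\tilde{\mathbf{z}}) = \int Dt \int \prod_{a=1}^{n} Du^a\, \varphi\left( \sqrt{q_0}\, t + \sqrt{Q - q_0}\, u^a \right) \tag{34}$$
$$= \int Dt \left[ \int Du\, \varphi\left( \sqrt{q_0}\, t + \sqrt{Q - q_0}\, u \right) \right]^n . \tag{35}$$

d. Summary and RS free entropy $\phi_{RS}$. Putting the pieces together, the functional $S_n$ taken at the RS fixed point has an explicit formula and dependency in $n$:
$$\left. S_n(\mathbf{Q}, \hat{\mathbf{Q}}) \right|_{RS} = \left. \frac{1}{2} \mathrm{Tr}(\mathbf{Q} \hat{\mathbf{Q}}) - \log\left(I^w_n(\hat{\mathbf{Q}})\right) - \alpha \log\left(I^z_n(\mathbf{Q})\right) \right|_{RS} \tag{36}$$
$$\underset{n \to 0}{\simeq} \frac{1}{2} \left( n Q \hat Q + n (n - 1) q_0 \hat q_0 \right) - n \int Dt \log\left( \int dw\, P_w(w) \exp\left( \frac{(\hat Q - \hat q_0)}{2} w^2 + t \sqrt{\hat q_0}\, w \right) \right) \tag{37}$$
$$- n \alpha \int Dt \log\left( \int Du\, \varphi\left( \sqrt{q_0}\, t + \sqrt{Q - q_0}\, u \right) \right) . \tag{38}$$
Finally, taking the derivative with respect to $n$ and the $n \to 0$ limit, the RS free entropy has a simple expression:
$$\phi_{RS}(\alpha) = \mathrm{SP}_{q_0, \hat q_0} \left\{ -\frac{1}{2} Q \hat Q + \frac{1}{2} q_0 \hat q_0 + I^w_{RS}(\hat q_0) + \alpha I^z_{RS}(q_0) \right\} , \tag{39}$$
with $Q = \hat Q = 1$ and the following notations:
$$I^w_{RS}(\hat q_0) \equiv \int Dt \log\left( \int dw\, P_w(w) \exp\left( \frac{(\hat Q - \hat q_0)}{2} w^2 + t \sqrt{\hat q_0}\, w \right) \right) , \qquad I^z_{RS}(q_0) \equiv \int Dt \log\left( \int Dz\, \varphi\left( \sqrt{q_0}\, t + \sqrt{Q - q_0}\, z \right) \right) . \tag{40}$$

C. 1RSB entropy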
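The free entropy eq. (23) can also be evaluated at the simplest non-trivial fixed point: the one-step Replica Symmetry Breaking ansatz (1RSB). Instead of assuming that replicas are equivalent, it assumes that the symmetry between replicas is broken and that replicas are clustered in different states, with inner overlap $q_1$ and outer overlap $q_0$. In matrix form, the ansatz reads
$$ \mathbf{Q} = q_0 J_n + (q_1 - q_0)\, I_{n/m} \otimes J_m + (Q - q_1)\, I_n \quad \text{and} \quad \hat{\mathbf{Q}} = \hat{q}_0 J_n + (\hat{q}_1 - \hat{q}_0)\, I_{n/m} \otimes J_m + (\hat{Q} - \hat{q}_1)\, I_n . \tag{41} $$

a. Trace term

Again, the trace term is easily computed:
$$ \frac{1}{2} {\rm Tr}(\mathbf{Q} \hat{\mathbf{Q}})\Big|_{\rm 1RSB} = \frac{1}{2}\left( n Q \hat{Q} + n(m-1)\, q_1 \hat{q}_1 + n(n-m)\, q_0 \hat{q}_0 \right) . \tag{42} $$

b. Prior integral

Separating replicas with different overlaps, the prior integral can be written as
$$ I_w^n(\hat{\mathbf{Q}})\big|_{\rm 1RSB} = \int d\tilde{\mathbf{w}}\, P_{\tilde w}(\tilde{\mathbf{w}}) \exp\left( \frac{(\hat{Q} - \hat{q}_1)}{2} \sum_{a=1}^{n} (\tilde{w}_a)^2 + \frac{(\hat{q}_1 - \hat{q}_0)}{2} \sum_{k=1}^{n/m} \sum_{a,b=(k-1)m+1}^{km} \tilde{w}_a \tilde{w}_b + \frac{\hat{q}_0}{2} \Big( \sum_{a=1}^{n} \tilde{w}_a \Big)^2 \right) \tag{43} $$
$$ = \int Dt_0 \left[ \int Dt_1 \left[ \int dw\, P_w(w) \exp\left( \frac{(\hat{Q} - \hat{q}_1)}{2} w^2 + \left( \sqrt{\hat{q}_0}\, t_0 + \sqrt{\hat{q}_1 - \hat{q}_0}\, t_1 \right) w \right) \right]^{m} \right]^{n/m} . \tag{44} $$

c. Constraint integral

Again the vector $\tilde{\mathbf{z}} \sim P_{\tilde z} = \mathcal{N}(\mathbf{0}, \mathbf{Q})$ is a Gaussian vector with zero mean and covariance $\mathbf{Q}|_{\rm 1RSB} = q_0 J_n + (q_1 - q_0)\, I_{n/m} \otimes J_m + (Q - q_1)\, I_n$. A Gaussian vector with covariance $\mathbf{Q}|_{\rm 1RSB}$ can be decomposed into a sum of standard Gaussian variables $t_0 \sim \mathcal{N}(0, 1)$, $\forall k \in [1 : n/m]$, $t_k \sim \mathcal{N}(0, 1)$ and $\forall a \in [(k-1)m + 1 : km]$, $u_a \sim \mathcal{N}(0, 1)$: $z_a = \sqrt{q_0}\, t_0 + \sqrt{q_1 - q_0}\, t_k + \sqrt{Q - q_1}\, u_a$. Finally the constraint integral reads
$$ I_z^n(\mathbf{Q})\big|_{\rm 1RSB} = \int Dt_0 \int \prod_{k=1}^{n/m} Dt_k \int \prod_{a=(k-1)m+1}^{km} Du_a\, \varphi\left( \sqrt{q_0}\, t_0 + \sqrt{q_1 - q_0}\, t_k + \sqrt{Q - q_1}\, u_a \right) \tag{45} $$
$$ = \int Dt_0 \left[ \int Dt_1 \left[ \int Du\, \varphi\left( \sqrt{q_0}\, t_0 + \sqrt{q_1 - q_0}\, t_1 + \sqrt{Q - q_1}\, u \right) \right]^{m} \right]^{n/m} . \tag{46} $$

d. Summary and 1RSB free entropy $\phi_{\rm 1RSB}$

Gathering the previous computations eq. (42, 44, 46), the functional $S_n$ evaluated at the 1RSB fixed point reads:
$$ S_n(\mathbf{Q}, \hat{\mathbf{Q}})\big|_{\rm 1RSB} = \left. \frac{1}{2} {\rm Tr}(\mathbf{Q} \hat{\mathbf{Q}}) - \log\left(I_w^n(\hat{\mathbf{Q}})\right) - \alpha \log\left(I_z^n(\mathbf{Q})\right) \right|_{\rm 1RSB} \tag{47} $$
$$ \underset{n \to 0}{\simeq} \frac{1}{2}\left( n Q \hat{Q} + n(m-1)\, q_1 \hat{q}_1 + n(n-m)\, q_0 \hat{q}_0 \right) \tag{48} $$
$$ - \frac{n}{m} \int Dt_0\, \log\left( \int Dt_1 \left[ \int d\tilde{w}\, P_w(\tilde{w}) \exp\left( \frac{(\hat{Q} - \hat{q}_1)}{2} \tilde{w}^2 + \left( \sqrt{\hat{q}_0}\, t_0 + \sqrt{\hat{q}_1 - \hat{q}_0}\, t_1 \right) \tilde{w} \right) \right]^{m} \right) \tag{49} $$
$$ - \alpha \frac{n}{m} \int dy \int Dt_0\, \log\left( \int Dt_1 \left[ \int Du\, \varphi\left( y, \sqrt{q_0}\, t_0 + \sqrt{q_1 - q_0}\, t_1 + \sqrt{Q - q_1}\, u \right) \right]^{m} \right) . \tag{50} $$
Let us introduce the replicated free entropy following . We consider $m$ real replicas of the same system and imagine applying a small field that makes the $m$ replicas fall into the same state. The replicated free entropy is the free entropy corresponding to these $m$ uncorrelated copies in the limit of zero coupling. To compute it, we consider $n' = nm$ replicas. Denoting $\mathbf{q} = (q_0, q_1)$ and $\hat{\mathbf{q}} = (\hat{q}_0, \hat{q}_1)$, the replicated free entropy reads as $m$ times the free entropy of $n$ replicas with 1RSB structure:
$$ \Phi_{\rm 1RSB}(\alpha) := \lim_{N \to \infty} \frac{1}{N} \mathbb{E}_X\left[ \log\left( Z^m(X) \right) \right] \simeq \lim_{N \to \infty} \frac{1}{N} \lim_{n' \to 0} \frac{\partial \log\left( \mathbb{E}_X\left[ Z^{m n'}(X) \right] \right)}{\partial n'} \tag{51} $$
$$ = m \left( \lim_{N \to \infty} \lim_{n \to 0} \frac{1}{N} \frac{\partial \log\left( \mathbb{E}_X\left[ Z^{n}(X) \right] \right)}{\partial n} \right) = m \left( - {\rm SP}_{\mathbf{Q}, \hat{\mathbf{Q}}} \left\{ \lim_{n \to 0} \frac{\partial S_n(\mathbf{Q}, \hat{\mathbf{Q}})}{\partial n} \right\} \right) \tag{52} $$
$$ = {\rm SP}_{\mathbf{q}, \hat{\mathbf{q}}} \left\{ \frac{m}{2}\left( q_1 \hat{q}_1 - Q \hat{Q} \right) + \frac{m^2}{2}\left( q_0 \hat{q}_0 - q_1 \hat{q}_1 \right) + m\, I^w_{\rm 1RSB}(\hat{\mathbf{q}}) + \alpha m\, I^z_{\rm 1RSB}(\mathbf{q}) \right\} , \tag{53} $$
with $\mathbf{t} = (t_0, t_1)$, $g^w$ and $f^z$ defined in eq. (21), and
$$ I^w_{\rm 1RSB}(\hat{\mathbf{q}}) = \frac{1}{m} \int Dt_0\, \log\left( \int Dt_1\, g^w(\mathbf{t}, \hat{\mathbf{q}})^m \right) \quad \text{and} \quad I^z_{\rm 1RSB}(\mathbf{q}) = \frac{1}{m} \int Dt_0\, \log\left( \int Dt_1\, f^z(\mathbf{t}, \mathbf{q})^m \right) . \tag{54} $$

D. RS Stability

1. De Almeida Thouless RS Stability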
The stability of a given saddle point ansatz is related to the positivity of the Hessian of the functional $S_n$. This stability analysis was first carried out by de Almeida and Thouless and, following , the replicon eigenvalues $\lambda_A$ and $\lambda_B$ of the RS ansatz can be expressed as functions of $\{g_i^w, f_i^z\}_{i=0}^{2}$ defined in eq. (16):
$$ \lambda_A(q_0) = \frac{1}{(Q - q_0)^2} \int Dt\, \frac{\left( f_0^z (f_2^z - f_0^z) + (f_1^z)^2 \right)^2}{(f_0^z)^4}(t, q_0) , \quad \text{and} \quad \lambda_B(\hat{q}_0) = \int Dt\, \frac{\left( g_0^w g_2^w - (g_1^w)^2 \right)^2}{(g_0^w)^4}(t, \hat{q}_0) . \tag{55} $$
The AT instability line is reached when the determinant of the Hessian vanishes. This translates into an implicit equation for $\alpha$, where $q_0$, $\hat{q}_0$ solve the saddle point equations eq. (15) at $\alpha = \alpha_{AT}$:
$$ \frac{1}{\alpha_{AT}} = \lambda_A\left( q_0(\alpha_{AT}), \beta \right) \lambda_B\left( \hat{q}_0(\alpha_{AT}) \right) . \tag{56} $$
However, for $\alpha < \alpha_{AT}$, $(q_0, \hat{q}_0) = (0, 0)$ is the only solution. Using $\{\tilde{f}_i^z, \tilde{g}_i^w\}_{i=0}^{2}$ defined in eq. (58), this expression simplifies because of the symmetry of the prior distribution $P_w$ and of the constraint $\varphi$ in the rectangle and $u$-function cases. Indeed, the symmetry imposes $\tilde{f}_1^z = 0$ and $\tilde{g}_1^w = 0$, and the condition reads:
$$ \frac{1}{\alpha_{AT}} = \left( \frac{\tilde{f}_2^z - \tilde{f}_0^z}{\tilde{f}_0^z} \right)^2 \left( \frac{\tilde{g}_2^w}{\tilde{g}_0^w} \right)^2 . \tag{57} $$
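As an illustration of eq. (57), the following minimal Python sketch evaluates $\alpha_{AT}$ for the rectangle and $u$-function constraints. It assumes binary weights $w = \pm 1$, for which $w^2 = 1$ and hence $\tilde{g}_2^w = \tilde{g}_0^w$, and it uses the closed forms of the Gaussian moments $\tilde{f}_0^z$, $\tilde{f}_2^z$ obtained by integration by parts. The function names are hypothetical and chosen for this example only.

```python
import math

def moments_rectangle(K):
    """Return (f0, f2) = (int Dz phi(z), int Dz z^2 phi(z)) for phi(z) = 1[|z| <= K]."""
    f0 = math.erf(K / math.sqrt(2.0))
    # Integration by parts: int_{-K}^{K} Dz (z^2 - 1) = -2 K exp(-K^2/2) / sqrt(2 pi)
    f2 = f0 - 2.0 * K * math.exp(-0.5 * K * K) / math.sqrt(2.0 * math.pi)
    return f0, f2

def moments_u(K):
    """Same moments for phi(z) = 1[|z| >= K], the complement of the rectangle."""
    f0_rect, f2_rect = moments_rectangle(K)
    # The full Gaussian measure has total mass 1 and second moment 1.
    return 1.0 - f0_rect, 1.0 - f2_rect

def alpha_at(f0, f2, g2_over_g0=1.0):
    """alpha_AT from eq. (57); g2_over_g0 = 1 is the binary-weight assumption."""
    return (f0 / (f2 - f0)) ** 2 / g2_over_g0 ** 2

if __name__ == "__main__":
    for K in (0.5, 1.0, 1.5):
        print(K, alpha_at(*moments_rectangle(K)), alpha_at(*moments_u(K)))
```

The sketch requires $K > 0$, since $\tilde{f}_2^z - \tilde{f}_0^z$ vanishes at $K = 0$.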
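2. Existence and stability of the RS fixed point $(q_0, \hat{q}_0) = (0, 0)$

We provide an alternative approach to obtain the instability condition of the RS solution for a symmetric prior and a symmetric constraint. In this symmetric case, the stability can be derived from the existence and stability of the symmetric fixed point $(q_0, \hat{q}_0) = (0, 0)$. We define
$$ F(q_0) \equiv \alpha \int Dt\, \frac{(f_1^z)^2 - 2 t \sqrt{q_0}\, f_0^z f_1^z + q_0 t^2 (f_0^z)^2}{(1 - q_0)^2 (f_0^z)^2}(t, q_0) , \qquad G(\hat{q}_0) \equiv \int Dt\, \frac{g_2^w - t\, \hat{q}_0^{-1/2} g_1^w}{g_0^w}(t, \hat{q}_0) , $$
$$ \text{with} \quad \tilde{f}_i^z(y) \equiv \int Dz\, z^i \varphi(z) , \qquad \tilde{g}_i^w \equiv \int dw\, w^i P_w(w)\, e^{\frac{w^2}{2}} . \tag{58} $$
The saddle point equations at the RS fixed point eq. (15) can be written in terms of the functions $F$ and $G$, and reduce to a single fixed point equation for $q_0$:
$$ q_0 = G(\hat{q}_0) , \quad \hat{q}_0 = F(q_0) \quad \Rightarrow \quad q_0 = G \circ F(q_0) \equiv H(q_0) . \tag{59} $$
As stressed above, the RS stability is equivalent to the existence and stability of the fixed point $q_0 = 0$. Let us therefore compute the stability of the fixed point equation eq. (59). We compute $F$, $F'$, $G$, $G'$ in the limit $(q_0, \hat{q}_0) \to (0, 0)$, expressing $\{f_i^z, g_i^w\}_i$ as functions of $\{\tilde{f}_i^z, \tilde{g}_i^w\}_i$ and finally using the symmetry, which implies $\tilde{f}_1^z = 0$ and $\tilde{g}_1^w = 0$:
$$
\begin{aligned}
F(q_0) &\underset{q_0 \to 0}{=} \alpha \left[ \left( \frac{\tilde{f}_1^z}{\tilde{f}_0^z} \right)^2 + q_0 \left( \frac{(\tilde{f}_2^z - \tilde{f}_0^z)^2}{(\tilde{f}_0^z)^2} + 3 \frac{(\tilde{f}_1^z)^4}{(\tilde{f}_0^z)^4} - 4 \frac{(\tilde{f}_1^z)^2 (\tilde{f}_2^z - \tilde{f}_0^z)}{(\tilde{f}_0^z)^3} \right) + O(q_0^2) \right] \sim \alpha q_0 \left( \frac{\tilde{f}_2^z - \tilde{f}_0^z}{\tilde{f}_0^z} \right)^2 \xrightarrow[q_0 \to 0]{} 0 , \\
\frac{\partial F}{\partial q_0}(q_0) &\underset{q_0 \to 0}{=} \alpha \left[ \left( \frac{\tilde{f}_2^z - \tilde{f}_0^z}{\tilde{f}_0^z} \right)^2 + \left( \frac{\tilde{f}_1^z}{\tilde{f}_0^z} \right)^2 \left( 3 \frac{(\tilde{f}_1^z)^2}{(\tilde{f}_0^z)^2} - 4 \frac{(\tilde{f}_2^z - \tilde{f}_0^z)}{\tilde{f}_0^z} \right) + O(q_0) \right] \xrightarrow[q_0 \to 0]{} \alpha \left( \frac{\tilde{f}_2^z - \tilde{f}_0^z}{\tilde{f}_0^z} \right)^2 , \\
G(\hat{q}_0) &\underset{\hat{q}_0 \to 0}{=} \left( \frac{\tilde{g}_1^w}{\tilde{g}_0^w} \right)^2 + \hat{q}_0 \left( \left( \frac{\tilde{g}_2^w}{\tilde{g}_0^w} \right)^2 + \frac{(\tilde{g}_1^w)^2}{(\tilde{g}_0^w)^2} \left( 3 \left( \frac{\tilde{g}_1^w}{\tilde{g}_0^w} \right)^2 - 4 \frac{\tilde{g}_0^w \tilde{g}_2^w}{(\tilde{g}_0^w)^2} \right) \right) + O(\hat{q}_0^{3/2}) \xrightarrow[\hat{q}_0 \to 0]{} 0 , \\
\frac{\partial G}{\partial \hat{q}_0}(\hat{q}_0) &\underset{\hat{q}_0 \to 0}{=} \left( \frac{\tilde{g}_2^w}{\tilde{g}_0^w} \right)^2 + \frac{(\tilde{g}_1^w)^2}{(\tilde{g}_0^w)^2} \left( 3 \left( \frac{\tilde{g}_1^w}{\tilde{g}_0^w} \right)^2 - 4 \frac{\tilde{g}_0^w \tilde{g}_2^w}{(\tilde{g}_0^w)^2} \right) + O\left( \sqrt{\hat{q}_0} \right) \xrightarrow[\hat{q}_0 \to 0]{} \left( \frac{\tilde{g}_2^w}{\tilde{g}_0^w} \right)^2 .
\end{aligned}
\tag{60}
$$
Finally, the existence and stability conditions of the fixed point $(q_0, \hat{q}_0) = (0, 0)$ translate into an explicit condition on $\alpha$ that defines $\alpha_{AT}$:
$$ H(q_0) = G \circ F(q_0) \xrightarrow[q_0 \to 0]{} 0 , \qquad \frac{\partial H}{\partial q_0}\Big|_{q_0 = 0} = \frac{\partial G}{\partial \hat{q}_0}\Big|_{\hat{q}_0 = 0} \frac{\partial F}{\partial q_0}\Big|_{q_0 = 0} \leq 1 \quad \Rightarrow \quad \alpha \leq \left( \frac{\tilde{f}_2^z - \tilde{f}_0^z}{\tilde{f}_0^z} \right)^{-2} \left( \frac{\tilde{g}_2^w}{\tilde{g}_0^w} \right)^{-2} \equiv \alpha_{AT} . \tag{61} $$

E. Moments at finite temperature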
In this section we generalize the definition of the partition function to any temperature $T$. The energy $\mathcal{E}(\mathbf{w})$ of a configuration $\mathbf{w}$ is defined as the number of unsatisfied constraints, and the corresponding partition function is defined by $Z(X, T) = \sum_{\mathbf{w} \in \{\pm 1\}^N} e^{-\mathcal{E}(\mathbf{w})/T}$. In particular, for the rectangle and $u$-function constraints, the partition functions at temperature $T$ read
$$ Z_r(X, T) = \sum_{\mathbf{w} \in \{\pm 1\}^N} \prod_{\mu=1}^{M} e^{-\frac{1}{T}\left( 1 - \mathbb{1}_{|z_\mu(\mathbf{w})| \leq K} \right)} \quad \text{and} \quad Z_u(X, T) = \sum_{\mathbf{w} \in \{\pm 1\}^N} \prod_{\mu=1}^{M} e^{-\frac{1}{T}\left( 1 - \mathbb{1}_{|z_\mu(\mathbf{w})| \geq K} \right)} . \tag{62} $$
We define the probabilities that constraints are satisfied at temperature $T$:
$$
\begin{aligned}
p_{r,K,T} &\equiv \int Dz\, e^{-\frac{1}{T}\left( 1 - \mathbb{1}_{|z| \leq K} \right)} = e^{-\frac{1}{T}} + \left( 1 - e^{-\frac{1}{T}} \right) p_{r,K} , \\
p_{u,K,T} &\equiv \int Dz\, e^{-\frac{1}{T}\left( 1 - \mathbb{1}_{|z| \geq K} \right)} = e^{-\frac{1}{T}} + \left( 1 - e^{-\frac{1}{T}} \right) p_{u,K} , \\
p_{s,K,T} &\equiv \int Dz\, e^{-\frac{1}{T}\left( 1 - \mathbb{1}_{z \geq K} \right)} = e^{-\frac{1}{T}} + \left( 1 - e^{-\frac{1}{T}} \right) p_{s,K} .
\end{aligned}
\tag{63}
$$
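The finite-temperature probabilities of eq. (63) are simple to evaluate. The sketch below is a minimal Python illustration, assuming that the zero-temperature probabilities are the Gaussian masses of the three constraint sets, $p_{r,K} = {\rm erf}(K/\sqrt{2})$, $p_{u,K} = {\rm erfc}(K/\sqrt{2})$ and $p_{s,K} = \frac{1}{2}{\rm erfc}(K/\sqrt{2})$, which follow from the indicators in eq. (63) for $K \geq 0$. Function names are hypothetical.

```python
import math

def p_rectangle(K):
    """P(|z| <= K) for a standard Gaussian z (zero temperature, K >= 0)."""
    return math.erf(K / math.sqrt(2.0))

def p_u(K):
    """P(|z| >= K) for a standard Gaussian z (zero temperature, K >= 0)."""
    return math.erfc(K / math.sqrt(2.0))

def p_step(K):
    """P(z >= K) for a standard Gaussian z (zero temperature)."""
    return 0.5 * math.erfc(K / math.sqrt(2.0))

def at_temperature(p_zero, T):
    """Eq. (63): e^{-1/T} + (1 - e^{-1/T}) p, with the T = 0 limit handled explicitly."""
    boltzmann = 0.0 if T == 0 else math.exp(-1.0 / T)
    return boltzmann + (1.0 - boltzmann) * p_zero

if __name__ == "__main__":
    K = 1.0
    for T in (0, 0.5, 2.0):
        print(T, at_temperature(p_rectangle(K), T), at_temperature(p_u(K), T))
```

Since $p_{r,K} + p_{u,K} = 1$, the two symmetric cases satisfy $p_{r,K,T} + p_{u,K,T} = 1 + e^{-1/T}$, which gives a quick consistency check.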
1. First moment at finite temperature
Let $E_r(N, M, T)$ be the event that $Z_r(X, T) \geq 1$. By Markov's inequality, the first moment in the rectangle case gives
$$ \mathbb{P}\left[ E_r(N, \alpha N, T) \right] \leq \mathbb{E}\left[ Z_r(X(N, \alpha N), T) \right] = 2^N\, \mathbb{E}\left[ \prod_{\mu=1}^{\alpha N} e^{-\frac{1}{T}\left( 1 - \mathbb{1}_{|z_\mu(\mathbf{1})| \leq K} \right)} \right] \tag{64} $$
$$ = 2^N p_{r,K,T}^{\alpha N} = \exp\left( N \left( \log(2) + \alpha \log(p_{r,K,T}) \right) \right) , \tag{65} $$
and the same derivation holds for the step and $u$-function constraints.
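The exponent in eq. (65) changes sign at $\alpha = -\log(2)/\log(p_{r,K,T})$, above which the first moment bound decays exponentially in $N$. The following minimal Python sketch evaluates this exponent and the corresponding density for the rectangle constraint. It assumes $p_{r,K} = {\rm erf}(K/\sqrt{2})$ at zero temperature, and the function names are hypothetical.

```python
import math

def p_rectangle_T(K, T):
    """p_{r,K,T} of eq. (63), assuming p_{r,K} = erf(K / sqrt(2))."""
    boltzmann = 0.0 if T == 0 else math.exp(-1.0 / T)
    return boltzmann + (1.0 - boltzmann) * math.erf(K / math.sqrt(2.0))

def annealed_exponent(alpha, p):
    """Exponent of eq. (65): (1/N) log E[Z] = log 2 + alpha log p."""
    return math.log(2.0) + alpha * math.log(p)

def first_moment_threshold(p):
    """Density at which the exponent of eq. (65) vanishes (requires 0 < p < 1)."""
    return -math.log(2.0) / math.log(p)

if __name__ == "__main__":
    K = 1.0
    for T in (0, 0.25, 0.5):
        p = p_rectangle_T(K, T)
        print(T, p, first_moment_threshold(p))
```

Because $p_{r,K,T}$ increases with $T$, the density at which the bound becomes non-trivial also increases with temperature.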
2. Second moment at finite temperature
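Again we show the computation for the rectangle; it can be done similarly for the $u$-function.

a. Expression of $F_{r,K,\alpha,T}$

$$ \mathbb{E}\left[ Z_r(X(N, \alpha N), T)^2 \right] = \sum_{\mathbf{w}_1, \mathbf{w}_2 \in \{\pm 1\}^N} \mathbb{E}\left[ \prod_{\mu=1}^{\alpha N} e^{-\frac{1}{T}\left( 1 - \mathbb{1}_{|z_\mu(\mathbf{w}_1)| \leq K} \right)} e^{-\frac{1}{T}\left( 1 - \mathbb{1}_{|z_\mu(\mathbf{w}_2)| \leq K} \right)} \right] \tag{66} $$
$$ = 2^N \sum_{\mathbf{w} \in \{\pm 1\}^N} \prod_{\mu=1}^{\alpha N} \mathbb{E}\left[ e^{-\frac{1}{T}\left( \left( 1 - \mathbb{1}_{|z_\mu(\mathbf{1})| \leq K} \right) + \left( 1 - \mathbb{1}_{|z_\mu(\mathbf{w})| \leq K} \right) \right)} \right] \tag{67} $$
$$ = 2^N \sum_{l=0}^{N} \binom{N}{l} q_{r,K,T}(l/N)^{\alpha N} \equiv \exp\left( N \left( \log(2) + F_{r,K,\alpha,T} \right) \right) , \tag{68} $$
where $q_{r,K,T}(\beta)$ is the finite-temperature weight of a pair of standard Gaussians with correlation set by $\beta$; at $T = 0$ it is the probability that both are at most $K$ in absolute value. Defining $\rho(\beta) = 1 - 2\beta$ and
$$ I^{\alpha_2, \beta_2}_{\alpha_1, \beta_1}(\rho) \equiv \int_{\alpha_1}^{\beta_1} \int_{\alpha_2}^{\beta_2} dy\, dx\, \frac{e^{-\frac{x^2 + y^2 + 2 \rho x y}{2 (1 - \rho^2)}}}{2 \pi \sqrt{1 - \rho^2}} = \frac{1}{2\pi} \int_{\alpha_1}^{\beta_1} dy\, e^{-\frac{y^2}{2}} \int_{\frac{\alpha_2 + \rho y}{\sqrt{1 - \rho^2}}}^{\frac{\beta_2 + \rho y}{\sqrt{1 - \rho^2}}} dx\, e^{-\frac{x^2}{2}} , \tag{69} $$
where the lower pair of indices bounds $y$ and the upper pair bounds $x$, the function $F_{r,K,\alpha,T}$ at finite temperature can be written as $F_{r,K,\alpha,T} = H(\beta) + \alpha \log q_{r,K,T}(\beta)$, where
$$ q_{r,K,T}(\beta) \equiv \int_{\mathbb{R}^2} dx\, dy\, \frac{e^{-\frac{x^2 + y^2 + 2 \rho(\beta) x y}{2 (1 - \rho(\beta)^2)}}}{2 \pi \sqrt{1 - \rho(\beta)^2}}\, e^{-\frac{1}{T}\left( \left( 1 - \mathbb{1}_{|x| \leq K} \right) + \left( 1 - \mathbb{1}_{|y| \leq K} \right) \right)} \tag{70} $$
$$ = I^{-K,K}_{-K,K} + e^{-\frac{1}{T}} \left( I^{-K,K}_{-\infty,-K} + I^{-K,K}_{K,+\infty} + I^{-\infty,-K}_{-K,K} + I^{K,+\infty}_{-K,K} \right) + e^{-\frac{2}{T}} \left( I^{-\infty,-K}_{-\infty,-K} + I^{K,+\infty}_{-\infty,-K} + I^{-\infty,-K}_{K,+\infty} + I^{K,+\infty}_{K,+\infty} \right) . \tag{71} $$

b. Expression of $\partial_\beta F_{r,K,\alpha,T}$

To compute the derivative of $q_{r,K,T}$, we first introduce
$$ G^{\alpha_1, \beta_1}_{\gamma}(\rho) \equiv \frac{1}{2\pi} \int_{\alpha_1}^{\beta_1} dy\, e^{-\frac{y^2}{2}}\, e^{-\frac{1}{2} \frac{(\gamma + \rho y)^2}{1 - \rho^2}}\, (y + \gamma \rho) , \qquad \partial_\beta I^{\alpha_2, \beta_2}_{\alpha_1, \beta_1}(\rho(\beta)) = -\frac{1}{4 (\beta (1 - \beta))^{3/2}} \left( G^{\alpha_1, \beta_1}_{\beta_2} - G^{\alpha_1, \beta_1}_{\alpha_2} \right)(\rho(\beta)) . \tag{72} $$
Hence, taking the derivative of each term of the form $I^{\alpha_2, \beta_2}_{\alpha_1, \beta_1}$ and simplifying, the derivative of $q_{r,K,T}$ reads:
$$ \frac{\partial q_{r,K,T}}{\partial \beta}(\beta) = -\frac{1}{4 (\beta (1 - \beta))^{3/2}} \left( G^{-K,K}_{K} - G^{-K,K}_{-K} \right)(\rho) \left( 1 - e^{-1/T} \right)^2 = \frac{\left( 1 - e^{-1/T} \right)^2}{\pi \sqrt{\beta (1 - \beta)}} \left( e^{-\frac{K^2}{2 (1 - \beta)}} \left( e^{\frac{(2\beta - 1) K^2}{2 (1 - \beta) \beta}} - 1 \right) \right) . $$
In the end, the derivative of the second moment exponent can be evaluated for $\beta \to 0$ and $\beta \to 1$ at all temperatures $T$:
$$ \frac{\partial F_{r,K,\alpha,T}}{\partial \beta}(\beta) = \log\left( \frac{1 - \beta}{\beta} \right) + \frac{\alpha}{q_{r,K,T}(\beta)} \frac{\partial q_{r,K,T}(\beta)}{\partial \beta} \tag{73} $$
$$ = \log\left( \frac{1 - \beta}{\beta} \right) + \frac{\alpha}{q_{r,K,T}(\beta)} \frac{\left( 1 - e^{-1/T} \right)^2}{\pi \sqrt{\beta (1 - \beta)}} \left( e^{-\frac{K^2}{2 (1 - \beta)}} \left( e^{\frac{(2\beta - 1) K^2}{2 (1 - \beta) \beta}} - 1 \right) \right) \xrightarrow[\beta \to 0^{+} / 1^{-}]{} \mp \infty . \tag{74} $$
In particular at $T = 0$,
$$ \frac{\partial F_{r,K,\alpha}}{\partial \beta}(\beta) = \log\left( \frac{1 - \beta}{\beta} \right) + \frac{\alpha}{q_{r,K,T}(\beta)} \frac{1}{\pi \sqrt{\beta (1 - \beta)}} \left( e^{-\frac{K^2}{2 (1 - \beta)}} \left( e^{\frac{(2\beta - 1) K^2}{2 (1 - \beta) \beta}} - 1 \right) \right) . \tag{75} $$

c. Expression of $\partial_\beta F_{u,K,\alpha,T}$

Adapting the previous steps and using
$$
\begin{aligned}
q_{u,K,T}(\beta) &\equiv \int_{\mathbb{R}^2} dx\, dy\, \frac{e^{-\frac{x^2 + y^2 + 2 \rho(\beta) x y}{2 (1 - \rho(\beta)^2)}}}{2 \pi \sqrt{1 - \rho(\beta)^2}}\, e^{-\frac{1}{T}\left( \left( 1 - \mathbb{1}_{|x| \geq K} \right) + \left( 1 - \mathbb{1}_{|y| \geq K} \right) \right)} \\
&= \left( I^{-\infty,-K}_{-\infty,-K} + I^{K,+\infty}_{-\infty,-K} + I^{-\infty,-K}_{K,+\infty} + I^{K,+\infty}_{K,+\infty} \right) + e^{-\frac{1}{T}} \left( I^{-K,K}_{-\infty,-K} + I^{-K,K}_{K,+\infty} + I^{-\infty,-K}_{-K,K} + I^{K,+\infty}_{-K,K} \right) + e^{-\frac{2}{T}} \left( I^{-K,K}_{-K,K} \right) = q_{r,K,-T}\, e^{-\frac{2}{T}} ,
\end{aligned}
$$
and eq. (74), the derivative for the $u$-function is straightforward to compute and is given by
$$ \frac{\partial F_{u,K,\alpha,T}}{\partial \beta}(\beta) = \log\left( \frac{1 - \beta}{\beta} \right) + \frac{\alpha}{q_{u,K,T}(\beta)} \frac{\partial q_{u,K,T}}{\partial \beta}(\beta) = \log\left( \frac{1 - \beta}{\beta} \right) + \frac{\alpha}{q_{u,K,T}(\beta)} \frac{\left( e^{-1/T} - 1 \right)^2}{\pi \sqrt{\beta (1 - \beta)}} \left( e^{-\frac{K^2}{2 (1 - \beta)}} \left( e^{\frac{(2\beta - 1) K^2}{2 (1 - \beta) \beta}} - 1 \right) \right) \xrightarrow[\beta \to 0^{+} / 1^{-}]{} \mp \infty
$$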
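The closed form for $\partial_\beta q_{r,K,T}$ can be checked numerically. The Python sketch below is an illustration under stated assumptions rather than part of the derivation: it evaluates $I^{-K,K}_{-K,K}(\rho(\beta))$ from eq. (69) with Simpson's rule, builds $q_{r,K,T}$ from it using the identity $q_{r,K,T} = (1 - e^{-1/T})^2 I^{-K,K}_{-K,K} + 2 e^{-1/T}(1 - e^{-1/T})\, p_{r,K} + e^{-2/T}$ (a rewriting of eq. (71) using the Gaussian marginals, assumed here), and compares a centered finite difference with the closed form. All function names are hypothetical.

```python
import math

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def rectangle_integral(beta, K, n=2000):
    """I^{-K,K}_{-K,K}(rho(beta)) of eq. (69), Simpson's rule on the outer y-integral."""
    rho = 1.0 - 2.0 * beta
    s = math.sqrt(1.0 - rho * rho)

    def integrand(y):
        inner = std_normal_cdf((K + rho * y) / s) - std_normal_cdf((-K + rho * y) / s)
        return math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi) * inner

    h = 2.0 * K / n  # n must be even for Simpson's rule
    total = integrand(-K) + integrand(K)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(-K + i * h)
    return total * h / 3.0

def q_rectangle(beta, K, T):
    """q_{r,K,T}(beta), assembled from the in/in integral and the marginal p_{r,K}."""
    e = 0.0 if T == 0 else math.exp(-1.0 / T)
    p = math.erf(K / math.sqrt(2.0))
    return (1.0 - e) ** 2 * rectangle_integral(beta, K) + 2.0 * e * (1.0 - e) * p + e * e

def dq_rectangle_closed_form(beta, K, T):
    """Closed form of d q_{r,K,T} / d beta for 0 < beta < 1."""
    e = 0.0 if T == 0 else math.exp(-1.0 / T)
    prefactor = (1.0 - e) ** 2 / (math.pi * math.sqrt(beta * (1.0 - beta)))
    # Algebraically equal to exp(-K^2/(2(1-b))) * (exp((2b-1)K^2/(2b(1-b))) - 1),
    # written as a difference of two exponentials to avoid overflow near b = 1.
    return prefactor * (math.exp(-K * K / (2.0 * beta)) - math.exp(-K * K / (2.0 * (1.0 - beta))))

if __name__ == "__main__":
    beta, K, T, h = 0.3, 1.0, 0.5, 1e-5
    finite_diff = (q_rectangle(beta + h, K, T) - q_rectangle(beta - h, K, T)) / (2.0 * h)
    print(finite_diff, dq_rectangle_closed_form(beta, K, T))
```

The derivative vanishes at $\beta = 1/2$ and is negative for $\beta < 1/2$, consistent with $q_{r,K,T}$ being symmetric under $\beta \to 1 - \beta$.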