A Probability Density Theory for Spin-Glass Systems
Gavin S. Hartnett
The RAND Corporation
Masoud Mohseni
Google Research
Abstract
Spin-glass systems are universal models for representing many-body phenomena in statistical physics and computer science. High quality solutions of NP-hard combinatorial optimization problems can be encoded into low energy states of spin-glass systems. In general, evaluating the relevant physical and computational properties of such models is difficult due to critical slowing down near a phase transition. Ideally, one could use recent advances in deep learning for characterizing the low-energy properties of these complex systems. Unfortunately, many of the most promising machine learning approaches are only valid for distributions over continuous variables and thus cannot be directly applied to discrete spin-glass models. To this end, we develop a continuous probability density theory for spin-glass systems with arbitrary dimensions, interactions, and local fields. We show how our formulation geometrically encodes key physical and computational properties of the spin-glass in an instance-wise fashion without the need for quenched disorder averaging. We show that our approach is beyond the mean-field theory and identify a transition from a convex to non-convex energy landscape as the temperature is lowered past a critical temperature. We apply our formalism to a number of spin-glass models including the Sherrington-Kirkpatrick (SK) model, spins on random Erdős-Rényi graphs, and random restricted Boltzmann machines.
Spin-glasses are a general class of models which can be used to study complexity in physics, chemistry, biology, computer science, and social sciences [1]. They also provide a theoretical and phenomenological framework to analyze hard real-world problems in discrete optimization and probabilistic inference over graphical models [2]. At the heart of such complex phenomena is the emergent behavior that can occur when disordered systems contain many particles, variables, or agents which exert two-body or higher order interactions. Today, there is a fundamental gap in our knowledge of how such non-trivial correlations emerge at low temperatures. For these systems there is a sudden increase in correlations, occurring simultaneously at various scales (up to the overall system size), as the temperature is reduced below a critical threshold. After half a century of intense study, it is not yet fully understood why, or under what conditions, a distribution of small to large clusters of variables can become rigid or frozen below a critical point in a hierarchical or multi-scale fashion in the absence of any obvious symmetries. Additionally, the relaxation time-scales grow exponentially large as a function of the correlation length-scales, which in the worst case prevents the system from achieving equilibrium in finite time. This phenomenon is at the heart of the hardness of combinatorial optimization problems, such as random K-SAT, near computational phase transitions [3].

Our main motivation is to explore the critical and low temperature properties of spin-glass systems that encode practical computational problems, which are typically at the intermediate scales with respect to number of variables, range of physical interactions, and spatial dimensions. The spin-glass formulation of such problems often involves thousands or even millions of variables, which precludes any hope of successfully applying brute-force or ab initio methods. A given instance of these problems typically contains considerable structure, with an underlying graph that could have a power-law distribution over the degree of connectivity with a fat tail for variables with many long-range physical interactions. Such realistic spin-glasses are also often in an intermediate zone with respect to their fractal dimensions and their physical and computational properties, and may be thought of as lying between the two well-studied limiting cases of the short-range Edwards-Anderson model [2] and the infinite-range Sherrington-Kirkpatrick (SK) model [4]. Consequently, spin-glass representations of most interesting and relevant problems reside in an uncharted territory that is analytically and computationally intractable. Although the disorder for each instance can be considered fixed or "quenched" for the relevant time-scales, the self-averaging assumption in statistical physics nonetheless becomes inadequate. Mean-field techniques which can be otherwise successfully applied to toy model problems such as random energy models [2], or $p$-spin models [5], become invalid as the fluctuations over the mean values are typically large. Moreover, Renormalization Group (RG) techniques [6] are ineffective as these approaches rely on strong symmetry assumptions, which are difficult to set up for a particular problem class, not well-defined in the presence of strong inhomogeneities, and usually involve crude and irreversible coarse-graining of the microscopic degrees of freedom.

Recent advances in deep learning open up the possibility that these non-linear and non-perturbative emergent properties of spin-glass systems could be machine-learned.
Unfortunately, many of the most promising machine learning approaches, such as gradient-based iterative optimization, are only valid for distributions over continuous variables. They therefore either cannot be directly applied to discrete spin-glass systems, or can be applied only at the cost of ignoring the fact that the machine learning algorithm was developed specifically for distributions over continuous variables. Despite this, there has been some progress in using neural networks for discovering new phases of matter or accelerating Monte Carlo sampling [7, 8, 9, 10, 11]. Here, we are interested in eventually applying recent techniques in deep generative models, such as normalizing flows [12], to discrete spin-glass distributions described by the following family of Hamiltonians $H = -\sum_i h_i s_i - \sum_i \ldots$
… $\tfrac{1}{2}\ln\det(\beta\tilde{J}) + \tfrac{N\beta\Delta}{2}$. (13)

This expression may be derived by equating the joint distribution $p(x, s)$ (Eq. 10) with $p(x|s)\,p(s)$. The second and third terms are simply due to the normalization of the multivariate Gaussian in $p(x|s)$, and the last term is due to the fact that $s^T s = N$ for Ising spins. The fact that the continuous and discrete formulations are related in this way indicates that there is no “free lunch” here: one cannot use the continuous formulation to circumvent the problems associated with complex spin-glass distributions, for example the hardness of sampling or of evaluating the partition function.

The partition function is a generating function for the $n$-point correlation functions. Denote the usual thermodynamic ensemble over discrete spins as $\langle \cdot \rangle_s := \sum_{\{s_i\}} \left(e^{-\beta H}\,\cdot\right)/Z_s$, and the analogous ensemble over continuous configurations as $\langle \cdot \rangle_x := \int d^N x \left(e^{-\beta \mathcal{H}_\beta(x)}\,\cdot\right)/Z_x$. Then, by applying $\partial_{h_{i_1}} \cdots \partial_{h_{i_p}}$ to each side of Eq. 13, the connected correlation functions of the two ensembles may be related through:

$$\langle s_{i_1} \cdots s_{i_p} \rangle_{s,C} = \left\langle \tanh(\beta \tilde{h}_{i_1}(x)) \cdots \tanh(\beta \tilde{h}_{i_p}(x)) \right\rangle_{x,C}, \tag{14}$$

where all indices are assumed distinct and the $C$ subscript denotes connected. In particular, the average local magnetization at site $i$ is related to the continuous variable via $\langle s_i \rangle_s = \langle \tanh(\beta \tilde{h}_i(x)) \rangle_x$. The marginal probability of the spin pointing up or down at site $i$ is

$$p(s_i = \pm 1) = \frac{1}{2}\left(1 \pm \left\langle \tanh(\beta \tilde{h}_i(x)) \right\rangle_x\right). \tag{15}$$

This expression allows for an interpretation of the effective field $\tilde{h}_i(x)$ as a global input signal that determines the local spin polarization after averaging over all possible $x$ configurations. In this expression, the hyperbolic tangent plays the role of an activation function, commonly used in artificial neural networks, that determines the polarization of the spin.

We can express the overlap distribution of the original discrete system in terms of the continuous variable using Eq. 14. If the thermodynamic (Gibbs) measure decomposes into a sum over pure states, each with weight $w_\alpha$, then the disorder-dependent overlap distribution is

$$P_J(q) = \sum_{\alpha\beta} w_\alpha w_\beta\, \delta(q_{\alpha\beta} - q). \tag{16}$$

This distribution can be regarded as the order parameter of mean-field spin-glasses in our continuous formulation, and the moments of this distribution may be expressed in terms of spin correlation functions as [16]:

$$q_J^{(p)} := \int dq\, P_J(q)\, q^p = \frac{1}{N^p} \sum_{i_1 \ldots i_p} \langle s_{i_1} \cdots s_{i_p} \rangle_s^2. \tag{17}$$

By using Eq. 14, this may be equivalently written as:

$$q_J^{(p)} = \frac{1}{N^p} \sum_{i_1 \ldots i_p} \left\langle \tanh(\beta \tilde{h}_{i_1}(x)) \cdots \tanh(\beta \tilde{h}_{i_p}(x)) \right\rangle_x^2. \tag{18}$$

This relation shows how the spin-glass order parameter is encoded in the continuous formulation.

This concludes the derivation of our continuous formulation. One important aspect of using continuous variables is that they provide a geometric encoding of the problem. In particular, $p(s_i|x)$ encodes the likelihood that a given spin will point up or down for a given point in $\mathbb{R}^N$. This probability is in turn determined by the inverse temperature and the strength of the effective local field at that point, $\tilde{h}_i(x)$. The contours of constant $\tilde{h}_i(x)$ are given by shifted ellipsoids, with the shift given by the external local field $h_i$ and the shape and scale of the ellipsoid determined by $\beta\tilde{J}$.
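The $p = 1$ case of Eq. 14 can be checked numerically on a tiny system. The sketch below is illustrative only and rests on assumptions that are not stated in this excerpt: the discrete Hamiltonian is taken to be $H(s) = -h \cdot s - \tfrac{1}{2} s^T J s$, the effective field is taken to be $\tilde{h}(x) = \tilde{J}x + h$, and $x$ is sampled from the Gaussian mixture $p(x) = \sum_s p(x|s)\,p(s)$ with covariance $(\beta\tilde{J})^{-1}$. All variable names are hypothetical.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
N, beta, eps = 4, 0.7, 0.1

# Random symmetric couplings with zero diagonal, plus small local fields.
A = rng.normal(size=(N, N))
J = np.triu(A, 1) + np.triu(A, 1).T
h = 0.3 * rng.normal(size=N)

# Shift the spectrum so that J_tilde = J + Delta * I is positive definite.
Delta = max(0.0, eps - np.linalg.eigvalsh(J)[0])
J_tilde = J + Delta * np.eye(N)

# Exact discrete Boltzmann distribution by enumerating all 2^N spin states.
# ASSUMPTION: H(s) = -h.s - (1/2) s^T J s.
spins = np.array(list(itertools.product([-1.0, 1.0], repeat=N)))
energies = -spins @ h - 0.5 * np.einsum("ai,ij,aj->a", spins, J, spins)
p_s = np.exp(-beta * energies)
p_s /= p_s.sum()
mag_discrete = p_s @ spins  # <s_i>_s for each site i

# Sample x from the mixture: draw s ~ p(s), then x ~ Normal(s, (beta J_tilde)^-1).
n_samples = 200_000
idx = rng.choice(len(spins), size=n_samples, p=p_s)
chol = np.linalg.cholesky(np.linalg.inv(beta * J_tilde))
x = spins[idx] + rng.standard_normal((n_samples, N)) @ chol.T

# ASSUMPTION: effective field h_tilde(x) = J_tilde x + h (J_tilde is symmetric).
h_eff = x @ J_tilde + h
mag_continuous = np.tanh(beta * h_eff).mean(axis=0)  # <tanh(beta h_tilde_i)>_x

# Eq. 15: marginal probability that each spin points up.
p_up = 0.5 * (1.0 + mag_continuous)
```

Under these assumptions the two magnetization vectors should agree up to Monte Carlo noise of order $1/\sqrt{n_{\text{samples}}}$.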
The conditional distribution $p(s_i|x)$ can be used to obtain the marginal probability distribution $p(s_i)$ by integrating over all of $\mathbb{R}^N$ and weighting each point according to its probability under the continuous Boltzmann distribution $p(x)$. The S-shaped activation function that appears in Eq. 15 implies that the spins will be frozen if the regions of large local effective field are assigned a low energy in the energy landscape given by $\mathcal{H}_\beta(x)$. In subsequent sections, we will explore the geometric structure of this landscape further, both for general coupling matrices $\tilde{J}$, and for some well-known examples such as the SK model.

[Footnote: Since this relation holds for all $p$, it follows that Eq. 14 also holds for the unconnected correlation functions (i.e. the subscript $C$ may be dropped): $\langle s_{i_1} \cdots s_{i_p} \rangle_s = \left\langle \tanh(\beta \tilde{h}_{i_1}(x)) \cdots \tanh(\beta \tilde{h}_{i_p}(x)) \right\rangle_x$.]

The probability density formulation affords several advantages over the original discrete formulation as well as some additional mathematical subtleties. Continuous variables allow alternative sampling methods such as Hamiltonian Monte Carlo [17] to be applicable, and indeed, this was one of the motivations for the continuous relaxation method [13]. Another benefit is that the continuous formulation provides a geometric encoding of the combinatorial optimization problems which may be represented in terms of spin-glass systems. In this section we will derive basic properties of $\mathcal{H}_\beta(x)$ for arbitrary values of the couplings and graph topology. Later, in Sections 5, 6, and 7 we will further explore our formulation of the SK model, random restricted Boltzmann machines, and spins on random Erdős-Rényi graphs as specific examples.

One of our main results is that the energy landscape defined by $\mathcal{H}_\beta(x)$ is convex above a disorder-dependent critical temperature $T_{\text{convex}}$, given in terms of the largest eigenvalue of the shifted coupling matrix:

$$T_{\text{convex}} := \lambda_N(J) + \Delta. \tag{19}$$

The proof is given in Appendix A. One of the most important mathematical properties of the energy landscape is whether it is convex or not. In particular, convexity of $\mathcal{H}_\beta(x)$ implies that the probability density $p(x)$ is log-concave, and log-concave probability densities enjoy a number of useful properties, such as the fact that the cumulative distribution function (CDF) is also log-concave, as well as the fact that the marginal density over any subset of the $x_i$ variables will also be log-concave. Convexity of $\mathcal{H}_\beta(x)$ also has practical consequences; for example, it means that certain algorithms such as adaptive rejection sampling may be used to efficiently sample $p(x)$ [18].

As the temperature is lowered past $T_{\text{convex}}$, the Hamiltonian density becomes non-convex. In order to understand this transition, it will be useful to set the external magnetic field to zero, $h = 0$. We first note that the expression for $\mathcal{H}_\beta(x)$ in Eq. 4 is the sum of two terms. The first is quadratic in $x$, and is guaranteed to be positive for any $x \neq 0$ since $\Delta$ was chosen to make $\tilde{J}$ positive-definite. Conversely, the second term is negative for any $x \neq 0$, and it scales linearly in $x$ at large radii, i.e. as $\|x\| \to \infty$. Thus, at large radii the first term dominates and the Hamiltonian density is:

$$\mathcal{H}_\beta(x) \sim \tfrac{1}{2}\, x^T \tilde{J} x, \tag{20}$$

which ensures that $p(x)$ is integrable. In contrast, near the origin $x = 0$ the expression simplifies to

$$\mathcal{H}_\beta(x) \sim \text{const} + \tfrac{1}{2}\, x^T \left(\tilde{J} - \beta \tilde{J}^2\right) x. \tag{21}$$

The linear term in the expansion vanishes, and therefore the origin is a critical point of the Hamiltonian density. If $(\tilde{J} - \beta\tilde{J}^2)$ is also positive-definite, then $x = 0$ is a minimum. This condition is equivalent to $T > T_{\text{convex}}$, and so in this case $x = 0$ is the unique global minimum.
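The stability condition at the origin is easy to test numerically. The following is a minimal sketch, assuming the Hessian at $x = 0$ is $\tilde{J} - \beta\tilde{J}^2$ as in Eq. 21 and using a randomly drawn symmetric coupling matrix; it checks only the origin, not global convexity (which is the subject of Appendix A). The function and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
N, eps = 50, 1e-3

# A random symmetric coupling matrix with zero diagonal (illustrative choice).
A = rng.normal(size=(N, N)) / np.sqrt(N)
J = np.triu(A, 1) + np.triu(A, 1).T

lam = np.linalg.eigvalsh(J)          # eigenvalues in ascending order
Delta = max(0.0, eps - lam[0])       # shift making J_tilde positive definite
J_tilde = J + Delta * np.eye(N)
T_convex = lam[-1] + Delta           # Eq. 19

def min_hessian_eigenvalue_at_origin(T):
    """Smallest eigenvalue of J_tilde - beta * J_tilde^2 (Eq. 21) at temperature T."""
    beta = 1.0 / T
    K0 = J_tilde - beta * (J_tilde @ J_tilde)
    return np.linalg.eigvalsh(K0)[0]

above = min_hessian_eigenvalue_at_origin(1.05 * T_convex)  # expected positive
below = min_hessian_eigenvalue_at_origin(0.95 * T_convex)  # expected negative
```

Since $\tilde{J} - \beta\tilde{J}^2$ has eigenvalues $\mu(1 - \beta\mu)$ for each eigenvalue $\mu$ of $\tilde{J}$, the sign change occurs exactly when $\beta\,\lambda_N(\tilde{J}) = 1$, i.e. at $T = T_{\text{convex}}$.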
As $T$ is lowered below $T_{\text{convex}}$, the matrix $(\tilde{J} - \beta\tilde{J}^2)$ develops negative eigenvalues, and $x = 0$ becomes a saddle.

In addition to the origin becoming unstable, the convex/non-convex transition is also characterized by the appearance of a pair of additional critical points. The critical points of $\mathcal{H}_\beta(x)$ solve

$$x = \tanh(\beta \tilde{J} x). \tag{22}$$

As the temperature approaches $T_{\text{convex}}$ from below, any critical points that exist will merge with the critical point $x = 0$, since $x = 0$ is the sole critical point for $T > T_{\text{convex}}$. We may therefore linearize the critical point equation around $x = 0$. In this case, Eq. 22 simplifies to $x = \beta\tilde{J}x$. A non-trivial solution of this equation is just an eigenvector of $\beta\tilde{J}$ with eigenvalue 1, which corresponds to $T = T_{\text{convex}}$. If $v^{(N)}$ is the eigenvector of $\beta\tilde{J}$ belonging to the largest eigenvalue $\lambda^{(N)}$, then so is $c\,v^{(N)}$ for any non-zero $c$; in other words, the scale is not fixed in the linear treatment. Going beyond linear order will fix $c$ up to a $\mathbb{Z}_2$ reversal $c \to -c$, since $h = 0$. Thus, a pair of critical points will appear as $T_{\text{convex}}$ is reached from above.

The convex/non-convex transition experienced by the continuous distribution $p(x)$ has no counterpart in the original discrete distribution $p(s)$. For every example we study below, $T_{\text{convex}}$ does not correspond to a phase transition in the discrete system. In fact, $T_{\text{convex}}$ may be varied without changing the physical content of the theory by using a shift larger than the minimum, i.e. $\Delta > \Delta_{\min}$. However, there is some physical significance to the minimal value of $T_{\text{convex}}$, which can be seen by noting that $\lambda_N(J)$ is the critical temperature predicted by the naive mean-field equation

$$x = \tanh(\beta J x). \tag{23}$$

Defining $T_{\text{mean-field}} := \lambda_N(J)$, we may then write

$$T_{\text{convex}} = T_{\text{mean-field}} + \Delta. \tag{24}$$

Moreover, if the eigenvalues of $J$ lie in a symmetric interval, with $\lambda_N(J) = -\lambda_1(J)$, then $\Delta_{\min} = \lambda_N(J)$, and $T_{\text{convex}} \geq T_{\text{mean-field}} + \Delta_{\min} = 2\,T_{\text{mean-field}}$.
With our choice of $\Delta = \max(0, \epsilon - \lambda_1(J))$ we have that $T_{\text{convex}} = 2\,T_{\text{mean-field}} + \epsilon$, so that as $\epsilon \to 0$ the inequality is saturated.

To summarize the results of this section: as the temperature is lowered past $T_{\text{convex}}$, the Hamiltonian density becomes non-convex, the critical point at $x = 0$ becomes unstable, and a pair of non-trivial critical points with $x \neq 0$ appears. It is difficult to go much further than this description and make more detailed statements about the geometry of the energy landscape without specifying the couplings $J$. This is to be expected, since our formalism applies to all spin systems of the form Eq. 1, which includes both spin-glasses and ferromagnetic systems like the 2D Ising model. Below in Sections 5, 6 and 7 we will further analyze the landscape for the Sherrington-Kirkpatrick model, random Restricted Boltzmann Machines, and spin-glasses on random Erdős-Rényi graphs respectively. The case of the 2D ferromagnetic Ising model is explored in Appendix C.

In this section we will discuss the low-temperature limit of our formalism. Our goal will be to provide some insight into the geometry of energy landscapes of systems which are deep in the spin-glass phase (when such a phase exists), and to show how the metastable spin-glass states are geometrically encoded in the Hamiltonian density, $\mathcal{H}_\beta(x)$. We will leave the coupling matrices and disorder distribution unspecified, and as a result, our discussion will be somewhat general.

We begin by taking the low-temperature expansion of the Hamiltonian density: $\mathcal{H}_\beta(x) = \mathcal{H}_\infty(x) + O(\beta^{-1})$, where

$$\mathcal{H}_\infty(x) := \tfrac{1}{2}\, x^T \tilde{J} x - \sum_{i=1}^{N} |\tilde{J}x|_i. \tag{25}$$

The equation governing the zero-temperature critical points may be obtained from $\mathcal{H}_\infty(x)$ directly or from the $\beta \to \infty$ limit of Eq. 22:

$$x = \operatorname{sgn}(\tilde{J}x). \tag{26}$$

Additionally, the Hessian (matrix of second derivatives) of $\mathcal{H}_\infty(x)$ is simply the shifted coupling matrix: $K_\infty := \tilde{J}$.
There is a subtlety here, which is that the Hessian is not defined at points which satisfy $(\tilde{J}x)_i = 0$ for any $i$, because the absolute value function is not differentiable at the origin. This is an important observation, since without it one would conclude that $\mathcal{H}_\infty(x)$ is convex, which it certainly is not.

With these ingredients, the integral defining the partition function $Z_x$ may then be formally written as a sum over the critical points using Laplace's method:

$$Z_x = \int d^N x\, e^{-\beta \mathcal{H}_\beta(x)} \approx \sum_\alpha e^{-\beta \mathcal{H}_\infty(x^{(\alpha)})} \int d^N x\, e^{-\frac{\beta}{2} (x - x^{(\alpha)})^T \tilde{J} (x - x^{(\alpha)})} = \sqrt{\frac{(2\pi)^N}{\det(\beta\tilde{J})}} \sum_\alpha e^{-\beta \mathcal{H}_\infty(x^{(\alpha)})}, \tag{27}$$

where $x^{(\alpha)}$ are the critical points of the Hamiltonian density, and the prefactor is due to the Gaussian integration around each critical point. In writing the above expression we have assumed that all critical points are minima, and so the Gaussian integration converges. Without specifying the coupling matrix it is difficult to say much about the existence or non-existence of saddles, beyond the fact that $x = 0$ is always both a solution of the critical point equation and a point for which the Hessian is not defined. This and any other similar points will require some special treatment, for example by rotating the integration contours and including sub-leading corrections in $\beta^{-1}$. Ignoring such complications, general correlation functions may also be formally written as a sum over critical points as:

$$\langle f(x) \rangle_x \approx \sum_\alpha \omega_\alpha f(x^{(\alpha)}), \tag{28}$$

where $\omega_\alpha := e^{-\beta \mathcal{H}_\beta(x^{(\alpha)})}/Z_x$ is the Boltzmann weight of each critical point, and $f$ is an arbitrary function. Thus, the critical points can be seen to encode almost all of the physics of the problem in the low-temperature limit.

[Footnote: We have ignored the Gaussian prefactor here, since 1) it is subleading in $\beta^{-1}$, and 2) all critical points receive the same prefactor since the Hessian is just the constant matrix $\tilde{J}$.]

Figure 1: Contour plot of the Hamiltonian density for a system of two spins with $J_{12} = J_{21} = 0.\ldots$, with shift $\Delta = \max(0, \epsilon - \lambda_1(J))$ and $\epsilon = 0.\ldots$, for (a) $T = T_{\text{convex}}$, (b) $T = T_{\text{convex}}/\ldots$, (c) $T = T_{\text{convex}}/\ldots$, (d) $T = T_{\text{convex}}/\ldots$. The density clouds due to spin configurations overlap above $T_{\text{convex}}$; that is, the Gaussians are sufficiently broad that the resulting continuous distribution $p(x)$ is log-concave. At low temperatures, the distributions become fragmented into several distinct modes. Blue regions correspond to low energy configurations.

Applying the saddle-point method to the 1-point function (i.e. the $p = 1$ case of Eq. 14), the saddle point coordinates are related to the average per-site magnetizations via

$$\langle s_i \rangle_s \approx \sum_\alpha \omega_\alpha\, x_i^{(\alpha)}. \tag{29}$$

Therefore, in our continuous formulation the critical points are closely analogous to the pure states of spin-glass theory. Pure states of spin-glasses are sub-regions in the state space which are separated by large energy barriers, and the system is sub-ergodic in those regions even though global ergodicity is broken [19]. Indeed, if the sum over critical points is restricted to just a single critical point (or if there is only one such dominant critical point in the thermodynamic limit), then all connected correlation functions vanish, for example

$$\langle x_{i_1} x_{i_2} \rangle_x = \langle x_{i_1} \rangle_x \langle x_{i_2} \rangle_x. \tag{30}$$

This property is also known as cluster decomposition.

The pure states have a simple geometric interpretation in our formalism. Recall that the continuous probability density may be written as a weighted sum of Gaussians, each centered around one of the $2^N$ spin configurations, $p(x) = \sum_{\{s\}} p(x|s)\,p(s)$. The covariance matrix of each Gaussian is $\Sigma = (\beta\tilde{J})^{-1}$, and so the level sets of $p(x|s)$ are $N$-dimensional ellipsoids whose shape is determined by the eigenvectors and eigenvalues of $\Sigma$.
In general, the density clouds due to each spin configuration will overlap; for example, above $T_{\text{convex}}$ the Gaussians are so broad that the resulting continuous distribution $p(x)$ is log-concave. At low temperatures, the distribution will “fragment” into a number of distinct modes. An example of this is shown in Fig. 1 for the simple case of just two spins, $N = 2$.

The nature of this fragmentation depends on how the $\beta \to \infty$ limit is taken. Suppose that the original coupling matrix $J$ has both positive and negative eigenvalues, so that the eigenvalues of the shifted matrix $\tilde{J}$ satisfy $\lambda_i(\tilde{J}) \geq \epsilon$ (recall that the purpose of introducing the small positive constant $\epsilon$ was to guarantee positive-definiteness of $\tilde{J}$). If $\epsilon$ is chosen to be temperature-independent, then the $\beta \to \infty$ limit pushes all the eigenvalues of $\beta\tilde{J}$ to infinity, and consequently all the eigenvalues of $\Sigma = (\beta\tilde{J})^{-1}$ approach zero. In this case $p(x)$ is composed of $2^N$ distinct delta functions with different weights, and the pure states are rather trivially just the $2^N$ spin configurations. However, if instead $\beta\epsilon$ is held fixed as $\beta \to \infty$, then the eigenvalue spectrum of $\Sigma$ will range from 0 to the finite value $1/(\beta\epsilon)$. Thus, the ellipsoid defining the level sets of the Gaussians will shrink to a point in some directions, and remain finite in others. In this case the fragmentation of $p(x)$ will be more interesting. Groups of spin configurations will merge to form pure states as determined by the geometry of the zero-temperature ellipsoids in relation to the $2^N$ vertices of the $[-1, 1]^N$ hypercube.

The pure states of non-disorder-averaged spin-glasses can be associated with solutions of a modified mean-field equation known as the Thouless, Anderson, and Palmer (TAP) equation, which was derived in order to correct the failure of naive mean-field theory to describe the spin-glass phase of the SK model. The naive mean-field equation is given in Eq. 23, whereas the TAP equation is given by [20]:

$$x_i = \tanh\Big(\beta \sum_j J_{ij} x_j - \beta^2 x_i \sum_j J_{ij}^2 (1 - x_j^2)\Big). \tag{31}$$

We have argued that the critical points of the continuous probability density may be interpreted as pure states at low temperature, and thus there should be a connection between these and the solutions of the TAP equation. Here we will establish such a connection at zero temperature, for which the TAP equation simplifies considerably: $x = \operatorname{sgn}(Jx)$. Importantly, this is also the zero-temperature limit of the naive mean-field equation Eq. 23. The zero-temperature limit of the TAP/naive mean-field equations may be compared with the equation governing the critical points of $\mathcal{H}_\infty(x)$:

$$x = \operatorname{sgn}(\tilde{J}x), \quad \text{critical points of } \mathcal{H}_\infty(x). \tag{32a}$$

$$x = \operatorname{sgn}(Jx), \quad \text{naive mean-field/TAP}. \tag{32b}$$

Note that the TAP/naive mean-field equation depends on the original coupling matrix $J$, whereas the critical points of the Hamiltonian density depend on the shifted coupling matrix $\tilde{J} = J + \Delta\,\mathbb{1}_{N \times N}$. A key result is that solutions of the zero-temperature naive mean-field/TAP equation are also critical points of the zero-temperature Hamiltonian density:

Proposition 1. If $x$ is a solution of the zero-temperature TAP equation $x = \operatorname{sgn}(Jx)$, then $x$ is also a solution of the zero-temperature critical point equation $x = \operatorname{sgn}(\tilde{J}x)$.

Proof. Suppose $x = \operatorname{sgn}(Jx)$. The result holds for any $\Delta \geq \Delta_{\min} \geq 0$, so we will consider the cases $\Delta = 0$ and $\Delta > 0$ separately. If $\Delta = 0$, then clearly the mean-field and critical point equations are identical. If $\Delta > 0$, then $\operatorname{sgn}(\Delta x) = \operatorname{sgn}(x) = \operatorname{sgn}(\operatorname{sgn}(Jx)) = \operatorname{sgn}(Jx)$. Thus, with absolute values and products taken component-wise,

$$\operatorname{sgn}(\tilde{J}x) = \operatorname{sgn}(Jx + \Delta x) = \operatorname{sgn}\big(\operatorname{sgn}(Jx)\,|Jx| + \operatorname{sgn}(\Delta x)\,|\Delta x|\big) = \operatorname{sgn}\big(\operatorname{sgn}(Jx)\,(|Jx| + |\Delta x|)\big) = \operatorname{sgn}(Jx) = x. \tag{33}$$
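Proposition 1 can be confirmed by brute force on a small instance. The sketch below, which assumes a randomly drawn symmetric $J$ and the shift $\Delta = \max(0, \epsilon - \lambda_1(J))$, enumerates all $2^N$ Ising configurations and checks that every solution of $x = \operatorname{sgn}(Jx)$ also solves $x = \operatorname{sgn}(\tilde{J}x)$. It only examines Ising configurations, so critical points with vanishing components (such as $x = 0$) are not counted. Names are illustrative.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
N, eps = 10, 1e-3

# Random symmetric couplings with zero diagonal (illustrative choice).
A = rng.normal(size=(N, N))
J = np.triu(A, 1) + np.triu(A, 1).T
Delta = max(0.0, eps - np.linalg.eigvalsh(J)[0])
J_tilde = J + Delta * np.eye(N)

# All 2^N Ising configurations, one per row.
spins = np.array(list(itertools.product([-1.0, 1.0], repeat=N)))

def is_fixed_point(M, configs):
    """Row-wise test of x == sgn(M x); M is symmetric, so configs @ M gives (M x)^T."""
    return np.all(np.sign(configs @ M) == configs, axis=1)

tap_mask = is_fixed_point(J, spins)         # solutions of x = sgn(J x)
crit_mask = is_fixed_point(J_tilde, spins)  # solutions of x = sgn(J_tilde x)

n_tap = int(tap_mask.sum())
n_crit = int(crit_mask.sum())
tap_subset_of_crit = bool(np.all(crit_mask[tap_mask]))  # Proposition 1
```

Comparing `n_tap` with `n_crit` gives a quick view of how many Ising-type critical points are not TAP solutions for a given instance.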
This establishes that for $T = 0$ every solution of the TAP equation is also a critical point of $\mathcal{H}_\infty(x)$. The converse does not hold: there are critical points of the Hamiltonian density which are not solutions of the TAP equation. To understand the significance of these points, recall that the nature of the pure states depends on whether $\epsilon$ is held fixed as $\beta \to \infty$, or if instead $\beta\epsilon$ is held fixed. In the first case, the pure states of the continuous formulation are somewhat trivial, as any of the $2^N$ spin configurations will be a pure state according to the above discussion. There may additionally be critical points with $x_i = (\tilde{J}x)_i = 0$ for some $i$ which will not correspond to any Ising spin configuration. For example, the point $x = 0$ is always a critical point. Since the TAP solutions are a subset of all possible spin configurations, the zero-temperature critical points will include both the TAP solutions as well as all other spin configurations and any saddle-like points such as $x = 0$. If the zero-temperature limit is instead taken while holding $\beta\epsilon$ fixed, then the critical points will include just a subset of all $2^N$ spin configurations. That subset will include the TAP states, and possibly other spin configurations and saddle-like points.

In order to build intuition for the probability density formulation of general spin-glass systems, in this section we consider as an example the Sherrington-Kirkpatrick (SK) model [4]. By specifying the coupling matrix $J$ (or rather, the disorder distribution from which $J$ is drawn), we may further explore the geometry of the energy landscape and the nature of both the spin-glass and convexity transitions in our formulation.

The Sherrington-Kirkpatrick (SK) model is defined by specifying that the couplings $J_{ij}$ be drawn from an iid Gaussian distribution [4]:

$$J_{ij} \sim \mathcal{N}\left(0, \frac{J^2}{N}\right), \quad (i < j), \tag{34}$$

where the $i > j$ values are fixed by symmetry of $J$ to be the same as the $i < j$ values, and the diagonal entries are zero. The coupling parameter $J$ controls the variance of the disorder. The eigenvalue distribution of the coupling matrix in the large-$N$ limit is simply the Wigner semi-circle distribution, so that the probability density of its eigenvalues is

$$p_J(\lambda) = \frac{2}{\pi R^2} \sqrt{R^2 - \lambda^2}\; \mathbb{1}_{[-R, R]}(\lambda), \tag{35}$$

where $\mathbb{1}_{[-R, R]}(\lambda)$ is the indicator function. The radius of the semi-circle is related to the coupling parameter via $R = 2J$. Since the eigenvalues of the coupling matrix are restricted to the strip $[-R, R]$, the eigenvalues of the shifted coupling matrix $\tilde{J}$ will be shifted to lie within the strip $[\epsilon, 2R + \epsilon]$. The eigenvalues of both $J$ and $\tilde{J}$ are depicted in Fig. 2.

Figure 2: The eigenvalue distribution of the coupling matrix $J$ and the shifted coupling matrix $\tilde{J}$ for the SK model. Both distributions are described by the Wigner semi-circle distribution, shown in black for both $J$ and $\tilde{J}$. The size of the shift has been chosen so that the shifted distribution has support on the positive real numbers, $\lambda \in (0, \infty)$.

Using the radius of the Wigner semi-circle (and disregarding $\epsilon$ for now by setting it to zero), we have

$$T_{\text{mean-field}} = 2J, \qquad T_{\text{convex}} = 2\,T_{\text{mean-field}}. \tag{36}$$

These may be contrasted with the critical temperature below which the system is in a spin-glass phase:

$$T_{\text{crit}} = J. \tag{37}$$

Therefore, we have found that

$$T_{\text{crit}} < T_{\text{mean-field}} < T_{\text{convex}}. \tag{38}$$

This indicates that the Hamiltonian density becomes non-convex due to the appearance of multiple critical points well before any transition to an ordered phase occurs.

The fact that $T_{\text{convex}} \neq T_{\text{crit}}$ is intriguing. Naively one might have thought that the two temperatures would have coincided because the transition from a convex to non-convex Hamiltonian density represents a real and significant change in the corresponding Boltzmann distribution. Moreover, the minimal value of $T_{\text{convex}} = 4J$ does not appear to have been previously identified as having any particular importance for the well-studied SK model.
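The three temperature scales of Eqs. 36 to 38 can be estimated for a single finite-size SK instance. The following is a rough sketch, assuming couplings with variance $J^2/N$ as in Eq. 34 and setting $\epsilon = 0$; at finite $N$ the spectral edges fluctuate, so agreement with the large-$N$ values $2J$ and $4J$ is only approximate. Variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
N, J_scale = 600, 1.0

# SK couplings: J_ij ~ Normal(0, J^2 / N) for i < j, symmetric, zero diagonal.
A = rng.normal(scale=J_scale / np.sqrt(N), size=(N, N))
J = np.triu(A, 1) + np.triu(A, 1).T

lam = np.linalg.eigvalsh(J)        # ascending eigenvalues
R = 2.0 * J_scale                  # large-N radius of the Wigner semi-circle

T_mean_field = lam[-1]             # largest eigenvalue of J
Delta_min = max(0.0, -lam[0])      # minimal shift (epsilon set to zero)
T_convex = T_mean_field + Delta_min
T_crit = J_scale                   # Eq. 37, quoted large-N value

print(f"spectrum edges: [{lam[0]:.3f}, {lam[-1]:.3f}], semi-circle radius R = {R}")
print(f"T_crit = {T_crit}, T_mean_field = {T_mean_field:.3f}, T_convex = {T_convex:.3f}")
```

For this kind of instance the ordering $T_{\text{crit}} < T_{\text{mean-field}} < T_{\text{convex}}$ should already be visible at a few hundred spins.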
The mathematical transformation from the original discrete variables to the continuous variables was exact and involved no approximation; however, one still needs to verify that the spin-glass transition has not been shifted and still occurs at $T_{\text{crit}}$, not at $T_{\text{convex}}$ where convexity is no longer guaranteed. To this end, we carried out a high-temperature expansion in terms of the continuous variables and find exact agreement with the expansion in terms of the original discrete variables carried out by Thouless, Anderson, and Palmer in [20]. In both cases, the expansion breaks down at the spin-glass phase transition temperature $T = T_{\text{crit}}$ and not at the higher temperature $T_{\text{convex}}$. We will provide an outline of the calculation here, and a more detailed treatment can be found in Appendix B.

[Footnote: It is worth noting that these results are strictly only valid for $N \to \infty$. For finite $N$ both the eigenvalues of $J$ and the critical temperature will exhibit fluctuations due to finite-size effects. The fluctuation of the critical temperature due to finite-size effects is investigated in [21].]

Using Eq. 13, the partition function $Z_s$ may be written in terms of the continuous variables as

$$Z_s = e^{-\frac{N\beta\Delta}{2}} \left\langle \prod_i \cosh\left(\beta^{1/2} (\tilde{J}x)_i\right) \right\rangle. \tag{39}$$

The expectation value is taken over a properly normalized Gaussian distribution with zero mean and covariance matrix $\tilde{J}^{-1}$. A high-temperature expansion may then be performed by expanding around $\beta = 0$. At each order the Gaussian integrals may be performed by Wick contractions, which introduces an increasing number of terms as the order of the expansion increases. The calculation simplifies dramatically if the disorder is averaged over. Denoting the disorder average as $\langle \cdot \rangle_J$, the final result is

$$\langle \ln Z_s \rangle_J = N \left(\frac{(\beta J)^2}{4}\right) + \frac{1}{4} \ln\left(1 - \beta^2 J^2\right) + \text{(non-singular)} + O(N^{-1}). \tag{40}$$

For $T > T_{\text{crit}}$ the sub-extensive terms may be neglected, but as the temperature of the spin-glass transition is approached from above the logarithm becomes singular, indicating a breakdown of the perturbative expansion. Not only is the free energy analytic at the minimal convexity transition temperature $\min_\Delta T_{\text{convex}} = 4J$, but any dependence on the shift $\Delta$ cancels out, as it must since $\Delta$ was only introduced as part of our formulation.

The above result was first derived in [20] by considering the expansion of $Z_s$ in terms of the original discrete spin variables. In both cases, the expansion in terms of $s$ and the expansion in terms of $x$, the singular logarithm term is obtained by summing an infinite number of terms.
In terms of Feynman diagrams, the terms that contribute to the singularity correspond to double-sided regular $n$-gons for $n \geq \ldots$:

[Diagrams: sum of double-sided regular $n$-gons.]

The fact that both expansions agree and yield no non-analyticity at $T_{\text{convex}}$ indicates that the convex/non-convex transition is not associated with a thermodynamic phase transition. It also provides a consistency check that the continuous formulation does not break down below $T_{\text{convex}}$.

Lastly, we investigated the zero-temperature limit of the SK model by studying the critical points. These are solutions of the equation $x = \operatorname{sgn}(\tilde{J}x)$. We generated a large number of such solutions by randomly initializing $x^{(0)} \in \{-1, 1\}^N$ and then applying the iterative update rule below until a solution was found (or the algorithm failed to converge after a set number of iterations):

$$x^{(t)} = \frac{1}{2}\left(x^{(t-1)} + \operatorname{sgn}(\tilde{J}x^{(t-1)})\right), \quad \text{(critical point)}. \tag{41}$$

This update rule corresponds to performing gradient descent on $\mathcal{H}_\beta(x)$, using a learning rate of $1/2$ and $\tilde{J}^{-1}\nabla\mathcal{H}_\beta(x)$ in place of the usual gradient $\nabla\mathcal{H}_\beta(x)$. We also generated a large number of solutions of the zero-temperature mean-field/TAP equation $x = \operatorname{sgn}(Jx)$ using the same procedure with update rule given by:

$$x^{(t)} = \frac{1}{2}\left(x^{(t-1)} + \operatorname{sgn}(Jx^{(t-1)})\right), \quad \text{(mean-field/TAP)}. \tag{42}$$

In agreement with Proposition 1 above, we found that every mean-field/TAP solution also solved the critical point equation. Interestingly, we also found that none of the critical point solutions generated this way also solved the mean-field/TAP equation. This is consistent with our earlier observation (that held for large $\Delta$) that the saddle-like critical points exponentially outnumbered the minima. We also found that the solutions produced by the iterative method applied to each equation had widely separated energies. Fig. 3 plots the distribution of energies of each set of solutions.

[Footnote: We thank Dan Ish for pointing this out to us.]
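The iterative procedure of Eqs. 41 and 42 can be sketched in a few lines. This is an illustrative reimplementation under stated assumptions, not the authors' code: the stopping rule (exact equality of successive iterates), the iteration cap, the system size, and the helper names `iterate_sign_fixed_point` and `h_inf_per_spin` are all choices made here for the example.

```python
import numpy as np

def iterate_sign_fixed_point(M, x0, max_iters=500):
    """Damped iteration x <- (x + sgn(M x)) / 2 (form of Eqs. 41 and 42).

    Returns (x, converged). On convergence x satisfies x == sgn(M x) exactly.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iters):
        x_new = 0.5 * (x + np.sign(M @ x))
        if np.array_equal(x_new, x):
            return x, True
        x = x_new
    return x, False

def h_inf_per_spin(x, J_tilde):
    """Zero-temperature Hamiltonian density of Eq. 25, divided by N."""
    Jx = J_tilde @ x
    return (0.5 * x @ Jx - np.abs(Jx).sum()) / len(x)

rng = np.random.default_rng(4)
N, eps = 40, 1e-3

# One SK-like instance (variance 1/N), shifted to be positive definite.
A = rng.normal(scale=1.0 / np.sqrt(N), size=(N, N))
J = np.triu(A, 1) + np.triu(A, 1).T
Delta = max(0.0, eps - np.linalg.eigvalsh(J)[0])
J_tilde = J + Delta * np.eye(N)

tap_solutions, crit_solutions = [], []
for _ in range(20):
    x0 = rng.choice([-1.0, 1.0], size=N)
    x_tap, ok_tap = iterate_sign_fixed_point(J, x0)         # Eq. 42
    x_crit, ok_crit = iterate_sign_fixed_point(J_tilde, x0)  # Eq. 41
    if ok_tap:
        tap_solutions.append(x_tap)
    if ok_crit:
        crit_solutions.append(x_crit)

# Proposition 1: every TAP solution found also solves the critical point equation.
prop1_holds = all(np.array_equal(x, np.sign(J_tilde @ x)) for x in tap_solutions)
tap_energies = [h_inf_per_spin(x, J_tilde) for x in tap_solutions]
crit_energies = [h_inf_per_spin(x, J_tilde) for x in crit_solutions]
```

Convergence is not guaranteed for every initialization, which is why the function returns a flag; the energies of the two solution sets can then be histogrammed to compare them.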
It should be emphasized that the iterative update rule/gradient descent method we used almost certainly does not generate solutions uniformly. We generated 10,000 unique solutions using each approach, but for the SK model we expect an exponential (in $N$) number of solutions [22] and the solutions plotted in Fig. 3 may not be representative of the overall distribution. Rather, they are representative of the distribution obtained when solutions are generated using the iterative update rule/gradient descent.

Figure 3: Distribution of the energies $H_\infty(x)/N$ obtained by an application of the iterative procedure discussed in the text. This procedure was applied to two equations, the mean-field/TAP equation $x = \mathrm{sgn}(Jx)$ and the critical point equation $x = \mathrm{sgn}(\tilde{J}x)$, and in both cases $N = 500$. Note that every solution of the first equation is also a solution of the second, although the converse is not true. Here we have set $\epsilon = 0$ in $\Delta_{\min}$. This plot shows that the typical critical point has a much higher energy than typical solutions of the mean-field equation (when both sets of solutions are obtained using the iterative procedure).

As a second example, we study the bipartite SK model, which is the natural extension of the SK model to bipartite complete graphs. This example also has significance in machine learning as it represents a randomly initialized Restricted Boltzmann Machine (RBM) [23]. In particular, the bipartite SK model describes a random initialization of RBMs where the biases have been set to zero. The connection between the bipartite SK model and RBMs has been recently studied in [24, 25, 26]. In this case, the coupling matrix $J_{ij}$ takes on the block form:

$$J = \begin{pmatrix} 0 & W \\ W^T & 0 \end{pmatrix}, \tag{43}$$

with $W$ an $N_v \times N_h$ matrix, where $N_v$ is the number of visible spins and $N_h$ is the number of hidden spins. The total number of spins is $N = N_v + N_h$, and the spin vector may be written as $s^T = (v, h)$. As in the SK model, the weights $W_{ij}$ in the bipartite SK model will be iid normally distributed:

$$W_{ij} \sim N\left( 0, \frac{J^2}{\sqrt{N_v N_h}} \right). \tag{44}$$

Using the relation

$$\det \begin{pmatrix} A & B \\ C & D \end{pmatrix} = \det\left( A - B D^{-1} C \right) \det(D) \tag{45}$$

for block matrices $A, B, C, D$, the characteristic equation for $J$, $\det(J - \lambda \mathbb{1}_{N \times N}) = 0$, is equivalent to the condition

$$\det\left( W W^T - \lambda^2 \mathbb{1}_{N_v \times N_v} \right) = 0, \tag{46}$$

provided that $\lambda \neq 0$. Thus, the non-zero eigenvalues of $J$ are related to the eigenvalues of $W W^T$ via

$$\lambda_i(J) = \pm \lambda_i\left( W W^T \right)^{1/2}. \tag{47}$$

Figure 4: (a) The eigenvalue distribution of $W W^T$ for $N_v = 1000$, $N_h = 3000$ and $\beta = J = 1$. The dashed line corresponds to the large-$N$ analytic prediction given by the Marchenko–Pastur distribution. (b) The eigenvalue distribution for the coupling matrix $J$ constructed using the $W$ matrix used in (a). The dashed line corresponds to the large-$N$ analytic prediction, which may be obtained from the Marchenko–Pastur analytic prediction and Eq. 47.

The eigenvalue distribution of $W W^T$ in the large-$N$ limit defined by $N \to \infty$ with $\kappa = N_v / N_h$ held fixed is given by the Marchenko–Pastur distribution [27], which in our conventions is:

$$p_{W W^T}(\lambda) = \frac{\sqrt{(R_+ - \lambda)(\lambda - R_-)}}{2 \pi J^2 \kappa^{1/2} \lambda} \, \mathbb{1}_{[R_-, R_+]}(\lambda) + \max\left( 0, 1 - \kappa^{-1} \right) \delta(\lambda), \tag{48}$$

where

$$R_\pm = J^2 \left( \kappa^{-1/4} \pm \kappa^{1/4} \right)^2. \tag{49}$$

In Fig. 4 we plot the eigenvalue distribution for both $W W^T$ and $J$.

As a result of this analysis, we conclude that in the large-$N$ limit $\lambda_i(J) \in [-\sqrt{R_+}, \sqrt{R_+}]$, and also $\lambda_i(\tilde{J}) \in [0, 2\sqrt{R_+}]$ (again neglecting $\epsilon$). Thus, we have that

$$T_{\text{mean-field}} = J \left( \kappa^{-1/4} + \kappa^{1/4} \right), \quad \text{and} \quad T_{\text{convex}} = 2 T_{\text{mean-field}}. \tag{50}$$

Moreover, as in the SK model, both of these temperatures are higher than the critical temperature of the spin-glass phase transition, which in our conventions is [26]:

$$T_{\text{crit}} = J. \tag{51}$$

Similar to the case of the SK model, the convex/non-convex transition happens well before the phase transition occurs as the temperature is lowered.

As a final example, we examine another prototypical spin-glass model by placing spins on Erdős-Rényi random graphs. We will consider $J_{ij}$ to be a Bernoulli random variable, by which we mean that J i
We would like to thank S. Isakov, D. Ish, K. Najafi, and E. Parker for useful discussions and comments on this manuscript. We would also like to thank the organizers of the workshop Theoretical Physics for Machine Learning, which took place at the Aspen Center for Physics in January 2019, for stimulating this collaboration and project.
A Convexity of the Hamiltonian density
In this appendix we prove that the Hamiltonian density is convex if and only if $T > T_{\text{convex}}$, with $T_{\text{convex}} = \lambda_N(\tilde{J})$. Using our conventions, in [13] it was proven that $p(x)$ for $\beta = 1$ is log-concave if and only if the eigenvalue spectrum of $\tilde{J}$ is sufficiently narrow, by which we mean

$$0 < \lambda_i(\tilde{J}) < 1, \quad \forall i \in \{1, ..., N\}. \tag{56}$$

Note that the left-hand inequality of Eq. 56 is true by construction, since the shift $\Delta$ was chosen so as to make $\tilde{J}$ positive definite. Thus, the spectrum will be narrow if we additionally have that $\lambda_N(\tilde{J}) < 1$. Here we repeat the proof of [13] for the case of Ising spin variables $s \in \{-1, 1\}^N$ and our parametrization. Also, rather than considering the log-concavity of $p(x)$, we shall instead consider the equivalent condition of the convexity of $H_\beta(x)$.

In the proof we will make use of the relations $\lambda_1(-M) = -\lambda_N(M)$ and $\lambda_1(M^{-1}) = \lambda_N(M)^{-1}$, which hold for $M = \tilde{J}$ and for $M = S(x)$, and we will also use the following eigenvalue inequalities for two matrices $A$, $B$:

$$\lambda_1(A) + \lambda_1(B) \leq \lambda_1(A + B) \leq \lambda_1(A) + \lambda_N(B). \tag{57}$$

We will also need the Hessian $K_{ij}(x) := \partial_i \partial_j H_\beta(x)$, which is

$$K(x) = \tilde{J} - \beta \tilde{J} S(x) \tilde{J}, \tag{58}$$

where $S(x)$ is the diagonal matrix given by $S_{ij}(x) := \mathrm{sech}^2(\beta \tilde{h}_i(x)) \, \delta_{ij}$. We will find it useful to work with the matrix $\tilde{K}(x) := \beta^{-1} \tilde{J}^{-1} K(x) \tilde{J}^{-1}$, which is equal to

$$\tilde{K}(x) = (\beta \tilde{J})^{-1} - S(x). \tag{59}$$

Since $K$ and $\tilde{K}$ are congruent, they have the same numbers of positive, negative, and zero eigenvalues according to Sylvester's Law of Inertia [32].

Proposition 2. The Hamiltonian density $H_\beta(x)$ is convex if and only if $\beta \tilde{J}$ has a narrow spectrum, by which we mean $\lambda_N(\beta \tilde{J}) < 1$. This is equivalent to $T_{\text{convex}} < T$.

Proof. For the forward direction, assume that $\lambda_N(\beta \tilde{J}) < 1$. Then

$$\lambda_1\left( \tilde{K}(x) \right) = \lambda_1\left( (\beta \tilde{J})^{-1} - S(x) \right) \geq \lambda_1\left( (\beta \tilde{J})^{-1} \right) + \lambda_1(-S(x)) \geq \lambda_N(\beta \tilde{J})^{-1} - 1 > 0, \tag{60}$$

where we have used $\lambda_1(-S(x)) \geq -1$ in the penultimate step, and the last inequality follows by assumption. Since the smallest eigenvalue of $\tilde{K}(x)$, which is congruent to the Hessian $K(x)$, is everywhere positive, the Hamiltonian density is convex.

To prove the reverse direction, assume that $0 < \inf_x \lambda_1(\tilde{K}(x))$ and let $x^* = -\tilde{J}^{-1} h$. Then,

$$0 < \inf_x \lambda_1\left( \tilde{K}(x) \right) \leq \lambda_1\left( \tilde{K}(x^*) \right) \leq \lambda_1\left( (\beta \tilde{J})^{-1} \right) + \lambda_N(-S(x^*)) = \lambda_N(\beta \tilde{J})^{-1} - 1. \tag{61}$$

Therefore, $\lambda_N(\beta \tilde{J}) < 1$.

B High-temperature expansion of the SK model
According to Eq. 13, the discrete partition function is related to the continuous partition function via:

$$Z_s = e^{-N\beta\Delta/2} \frac{Z_x}{\sqrt{(2\pi)^N \det\left( (\beta \tilde{J})^{-1} \right)}}. \tag{62}$$

The denominator is just the normalization of a Gaussian distribution with zero mean and covariance matrix $(\beta \tilde{J})^{-1}$. Similarly, $Z_x$ may also be written in terms of an un-normalized Gaussian integral with the same mean and covariance:

$$Z_x = \int d^N x \, e^{-\beta H_\beta(x)} = \int d^N x \, e^{-\frac{\beta}{2} x^T \tilde{J} x} \prod_i \left( 2 \cosh\left( \beta (\tilde{J}x)_i \right) \right). \tag{63}$$

Thus, after rescaling $x \to \beta^{-1/2} x$ the partition function may be written as

$$Z_s = 2^N e^{-N\beta\Delta/2} \left\langle \prod_i \cosh\left( \beta^{1/2} (\tilde{J}x)_i \right) \right\rangle, \tag{64}$$

where $\langle \cdot \rangle$ denotes an average with respect to the Gaussian with covariance matrix $\Sigma = \tilde{J}^{-1}$. The expansion around $\beta = 0$ may now be carried out. Explicitly, to fourth order we have

$$Z_s = 2^N e^{-N\beta\Delta/2} \left\langle \prod_i \left( 1 + \frac{\beta}{2} (\tilde{J}x)_i^2 + \frac{\beta^2}{24} (\tilde{J}x)_i^4 + \frac{\beta^3}{720} (\tilde{J}x)_i^6 + \frac{\beta^4}{40320} (\tilde{J}x)_i^8 + O(\beta^5) \right) \right\rangle, \tag{65}$$

and expanding out the product yields

$$\begin{aligned}
Z_s \, 2^{-N} e^{N\beta\Delta/2} = 1 &+ \frac{\beta}{2} \sum_i \langle (\tilde{J}x)_i^2 \rangle + \beta^2 \left( \frac{1}{24} \sum_i \langle (\tilde{J}x)_i^4 \rangle + \frac{1}{8} \sum_{[ij]} \langle (\tilde{J}x)_i^2 (\tilde{J}x)_j^2 \rangle \right) \\
&+ \beta^3 \left( \frac{1}{720} \sum_i \langle (\tilde{J}x)_i^6 \rangle + \frac{1}{48} \sum_{[ij]} \langle (\tilde{J}x)_i^4 (\tilde{J}x)_j^2 \rangle + \frac{1}{48} \sum_{[ijk]} \langle (\tilde{J}x)_i^2 (\tilde{J}x)_j^2 (\tilde{J}x)_k^2 \rangle \right) \\
&+ \beta^4 \left( \frac{1}{40320} \sum_i \langle (\tilde{J}x)_i^8 \rangle + \frac{1}{1440} \sum_{[ij]} \langle (\tilde{J}x)_i^6 (\tilde{J}x)_j^2 \rangle + \frac{1}{1152} \sum_{[ij]} \langle (\tilde{J}x)_i^4 (\tilde{J}x)_j^4 \rangle \right. \\
&\qquad \left. + \frac{1}{192} \sum_{[ijk]} \langle (\tilde{J}x)_i^4 (\tilde{J}x)_j^2 (\tilde{J}x)_k^2 \rangle + \frac{1}{384} \sum_{[ijkl]} \langle (\tilde{J}x)_i^2 (\tilde{J}x)_j^2 (\tilde{J}x)_k^2 (\tilde{J}x)_l^2 \rangle \right) + O(\beta^5).
\end{aligned} \tag{66}$$

The notation $[i_1 i_2 ... i_n]$ means that all the indices are distinct.

The Gaussian integrals can be done via Wick contractions, each of which brings in a factor of $\tilde{J}^{-1}$. Working out the first few terms explicitly, one finds a proliferation of terms involving powers of $\Delta$. These terms may be summed to give $e^{N\beta\Delta/2}$, which cancels the corresponding factor on the LHS of Eq. 66. Of course, this had to be the case because $Z_s$ is independent of $\Delta$. Carrying out the expansion to fourth order, we find that

$$\ln Z_s = N \ln 2 + \frac{\beta^2}{4} \sum_{[ij]} J_{ij}^2 + \frac{\beta^3}{6} \sum_{[ijk]} J_{ij} J_{jk} J_{ki} + \frac{\beta^4}{8} \sum_{[ijkl]} J_{ij} J_{jk} J_{kl} J_{li} - \frac{\beta^4}{24} \sum_{[ij]} J_{ij}^4 + O(\beta^5). \tag{67}$$

Taking the disorder average yields:

$$\langle \ln Z_s \rangle_J = N \left( \ln 2 + \frac{1}{4} (\beta J)^2 \right) - \frac{1}{4} \left( (\beta J)^2 + \frac{1}{2} (\beta J)^4 + O(\beta^6) \right) + O(N^{-1}). \tag{68}$$

With enough effort, the expansion may be extended to arbitrary order. To facilitate comparison with the result obtained by TAP [20], who expanded in terms of the discrete variables, we note that TAP did not keep track of each term in the expansion. They worked out all terms which contribute at $O(N)$, and at sub-leading order $O(1)$ they restricted their attention to only those terms in the series which contribute to a singularity at the spin-glass phase transition. Concretely, they found:

$$\langle \ln Z_s \rangle_J = N \left( \ln 2 + \frac{1}{4} (\beta J)^2 \right) + \frac{1}{4} \ln\left( 1 - \beta^2 J^2 \right) + \text{(non-singular)} + O(N^{-1}). \tag{69}$$

Our calculation has already reproduced the extensive term. Next we will show that the expansion in terms of the $x$ variables also matches the singular logarithm term. For this purpose we will restrict attention to just those terms which involve $n$ distinct copies of $(\tilde{J}x)^2$. Using the notation $A \supset B$ to indicate that the expansion for $A$ contains the expansion $B$, the series restricted to these terms is:

$$Z_s \supset 2^N \sum_{n=3}^{\infty} \frac{\beta^n}{2^n n!} \sum_{[i_1 ... i_n]} \langle (\tilde{J}x)_{i_1}^2 (\tilde{J}x)_{i_2}^2 ... (\tilde{J}x)_{i_n}^2 \rangle. \tag{70}$$

There are $(2n-1)!!$ different Wick contractions to consider. Of these, there are $(2n-2)!!$ contractions that avoid pairing $x$'s connected through a coupling matrix. Thus the contribution of the connected cyclic terms at each order is given by:

$$Z_s \supset 2^N \sum_{n=3}^{\infty} \frac{\beta^n}{2n} \sum_{[i_1 ... i_n]} J_{i_1 i_2} J_{i_2 i_3} ... J_{i_n i_1}. \tag{71}$$

The series contains additional terms that result from other contractions, but these are either sub-leading in $1/N$ or do not contribute to the singularity. The contribution of the cyclic terms above may be represented diagrammatically as regular $n$-sided polygon diagrams, where each side represents a factor of the coupling matrix:

$$Z_s \supset \text{[diagrams: regular } n\text{-gons for } n = 3, 4, 5, \ldots \text{]} \tag{72}$$

The next step is to take the logarithm and perform the disorder average. The logarithm introduces additional terms at each order, although many of these vanish at leading order in $1/N$ after taking the disorder average. Among the terms which survive are the squares of the above polygon terms:

$$\langle \ln Z_s \rangle_J \supset -\frac{1}{2} \sum_{n=3}^{\infty} \frac{\beta^{2n}}{(2n)^2} \left\langle \left( \sum_{[i_1 ... i_n]} J_{i_1 i_2} J_{i_2 i_3} ... J_{i_n i_1} \right)^2 \right\rangle_J. \tag{73}$$

The disorder average may also be performed using Wick contractions. The only non-vanishing contractions are those where each distinct factor of the coupling $J_{ij}$ appears squared. Diagrammatically, this means that the contribution of these terms corresponds to the double-sided regular polygons:

$$\langle \ln Z_s \rangle_J \supset \text{[diagrams: double-sided regular } n\text{-gons for } n = 3, 4, 5, \ldots \text{]} \tag{74}$$

To count the number of each term, note that there are two cyclic groupings of matrices which must be contracted with one another: $J_{i_1 i_2} J_{i_2 i_3} ... J_{i_n i_1}$ and $J_{j_1 j_2} J_{j_2 j_3} ... J_{j_n j_1}$. Each $J$ from the first group will be contracted with a $J$ from the second. With no loss of generality, the ordering of the first cycle can be fixed. There are then $(n-1)!$ ways to order the second cycle. However, this overcounts by a factor of 2 because the direction of the cycle is irrelevant. So, the symmetry factor for the $n$-th diagram is $(n-1)!/2$. There is also a factor of $\binom{N}{n}$ corresponding to the number of ways of choosing $n$ distinct sites to form a cycle. The end result is that the double-sided polygons give a contribution of:

$$\langle \ln Z_s \rangle_J \supset -\sum_{n=3}^{\infty} \frac{(n-1)!}{4} \binom{N}{n} \left( \beta^2 \langle J_{ij}^2 \rangle_J \right)^n = -\sum_{n=3}^{\infty} \frac{(\beta J)^{2n}}{4n} + O(N^{-1}). \tag{75}$$

Up to the overall sign, this is just the series for $-\ln(1 - \beta^2 J^2)/4$ with the $n = 1$ and $n = 2$ terms removed, so the contribution is $\frac{1}{4} \ln(1 - \beta^2 J^2)$ minus its $n = 1$ and $n = 2$ terms. Adding this result to the previous result of Eq. 68, whose $O(1)$ terms supply exactly those two terms, reproduces the expression TAP found in [20], quoted in Eq. 69.

Therefore, we have found that, regardless of which formulation is used, the disorder-averaged high-temperature expansion produces an extensive $O(N)$ result which is valid above the spin-glass phase transition temperature. At sub-leading order $O(1)$ the expansion also contains an infinite number of cyclic terms which diagrammatically correspond to regular polygons. These terms may be re-summed to find a contribution which becomes singular at the phase transition, indicating that the perturbative expansion has broken down.
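The re-summation of the polygon terms can be checked with a short numerical sketch. It assumes only the form of the series in Eq. 75 and the sub-extensive terms of Eq. 68, written in terms of $x = (\beta J)^2 < 1$; the function names are hypothetical.

```python
import math


def polygon_series(x, n_max=2000):
    """Partial sum of -sum_{n=3}^{n_max} x^n / (4 n), with x = (beta * J)^2 < 1 (Eq. 75)."""
    return -sum(x**n / (4.0 * n) for n in range(3, n_max + 1))


def subleading_low_order(x):
    """O(1) terms kept explicitly in Eq. 68: -(1/4) (x + x^2 / 2)."""
    return -0.25 * (x + 0.5 * x**2)


def singular_log(x):
    """Singular term of Eq. 69: (1/4) ln(1 - x)."""
    return 0.25 * math.log(1.0 - x)
```

For any $0 < x < 1$, `polygon_series(x) + subleading_low_order(x)` should agree with `singular_log(x)` up to truncation and rounding error, and `singular_log(x)` diverges as $x \to 1$, that is, as $T \to T_{\text{crit}} = J$ from above.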
We see no indication that $T_{\text{convex}}$ has any particular significance whatsoever in the partition function. Indeed, $T_{\text{convex}}$ depends on $\Delta$, a parameter introduced as part of the definition of the continuous formulation, whereas $Z_s$ does not. Lastly, we note that the two partition functions $Z_s$ and $Z_x$ are proportional, and that the constants of proportionality are completely well-behaved at $T = T_{\text{convex}}$. Thus, we can conclude that the convex/non-convex transition does not correspond to any sort of phase transition or non-analyticity in either partition function.

C 2d Ising model
As an additional example, we consider the phase transition of the well-studied ferromagnetic Ising model defined over the 2-dimensional square lattice with periodic boundary conditions. The probability density formulation of this model was used in [11] as the first step towards modeling the system with normalizing flows [12, 33] near the paramagnetic/ferromagnetic phase transition.

In this case the eigenvalues may be worked out analytically. For a $d$-dimensional hypercubic lattice with $L$ spins per dimension, the eigenvalues are given by

$$\lambda(J) = 2J \sum_{\mu=1}^{d} \cos\left( \frac{2\pi}{L} n_\mu \right), \quad n_\mu \in \{0, 1, ..., L-1\}, \tag{76}$$

where $J$ is the bond strength. Thus,

$$\lambda_N(J) = 2dJ, \qquad \lambda_1(J) = \begin{cases} -2dJ & L \text{ even} \\ 2dJ \cos\left( \pi \left( 1 - L^{-1} \right) \right) & L \text{ odd} \end{cases} \tag{77}$$

In the large-$L$ limit, the difference between even and odd $L$ vanishes, and the eigenvalues lie within the symmetric interval $[-R, R]$ with $R = 2dJ$. As a result, for $d = 2$,

$$T_{\text{mean-field}} = 4J, \qquad T_{\text{convex}} = 2 T_{\text{mean-field}}. \tag{78}$$

Both of these are greater than the critical temperature, which is well known to be

$$T_{\text{crit}} = \frac{2J}{\ln\left( 1 + \sqrt{2} \right)} \approx 2.269 J. \tag{79}$$

Thus, as the temperature is lowered the Hamiltonian density becomes non-convex well before the phase transition.

See for example Sec. 2.2 of [34].

References

[1] D. L. Stein and C. M. Newman, Spin Glasses and Complexity. Princeton University Press, 2013.
[2] M. Mezard and A. Montanari, Information, Physics, and Computation. Oxford University Press, Inc., New York, NY, USA, 2009.
[3] C. Moore and S. Mertens, The Nature of Computation. Oxford University Press, Inc., 2011.
[4] D. Sherrington and S. Kirkpatrick, Solvable model of a spin-glass, Physical Review Letters (1975), no. 26 1792.
[5] T. Castellani and A. Cavagna, Spin-glass theory for pedestrians, Journal of Statistical Mechanics: Theory and Experiment (May, 2005) P05012.
[6] H. Nishimori and G. Ortiz, Phase transitions and critical phenomena, Elements of Phase Transitions and Critical Phenomena (Feb, 2010) 1–15.
[7] M. S. Albergo, G. Kanwar, and P. E. Shanahan, Flow-based generative models for Markov chain Monte Carlo in lattice field theory, Phys. Rev. D (Aug, 2019) 034515.
[8] L. Huang and L. Wang, Accelerated Monte Carlo simulations with restricted Boltzmann machines, Physical Review B (2017), no. 3 035105.
[9] J. Liu, Y. Qi, Z. Y. Meng, and L. Fu, Self-learning Monte Carlo method, Physical Review B (2017), no. 4 041101.
[10] H. Shen, J. Liu, and L. Fu, Self-learning Monte Carlo with deep neural networks, Physical Review B (2018), no. 20 205140.
[11] S.-H. Li and L. Wang, Neural network renormalization group, Physical Review Letters (2018), no. 26 260601.
[12] D. J. Rezende and S. Mohamed, Variational inference with normalizing flows, arXiv preprint arXiv:1505.05770 (2015).
[13] Y. Zhang, Z. Ghahramani, A. J. Storkey, and C. A. Sutton, Continuous relaxations for discrete Hamiltonian Monte Carlo, in Advances in Neural Information Processing Systems, pp. 3194–3202, 2012.
[14] F. Caravelli, On a "continuum" formulation of the Ising model partition function, arXiv preprint arXiv:1908.08065 (2019).
[15] G. S. Hartnett and M. Mohseni, Self-supervised learning of generative spin-glasses with normalizing flows, arXiv preprint arXiv:2001.00585 (2020).
[16] V. Dotsenko, Introduction to the replica theory of disordered statistical systems, vol. 4. Cambridge University Press, 2005.
[17] S. Duane, A. D. Kennedy, B. J. Pendleton, and D. Roweth, Hybrid Monte Carlo, Physics Letters B (1987), no. 2 216–222.
[18] W. R. Gilks and P. Wild, Adaptive rejection sampling for Gibbs sampling, Journal of the Royal Statistical Society: Series C (Applied Statistics) (1992), no. 2 337–348.
[19] M. Mézard, G. Parisi, and M. Virasoro, Spin glass theory and beyond: An Introduction to the Replica Method and Its Applications, vol. 9. World Scientific Publishing Company, 1987.
[20] D. J. Thouless, P. W. Anderson, and R. G. Palmer, Solution of 'Solvable model of a spin glass', Philosophical Magazine (1977), no. 3 593–601.
[21] M. Castellana and E. Zarinelli, Role of Tracy-Widom distribution in finite-size fluctuations of the critical temperature of the Sherrington-Kirkpatrick spin glass, Physical Review B (2011), no. 14 144417.
[22] A. Bray and M. A. Moore, Metastable states in spin glasses, Journal of Physics C: Solid State Physics (1980), no. 19 L469.
[23] P. Smolensky, Information processing in dynamical systems: Foundations of harmony theory, tech. rep., Colorado Univ at Boulder Dept of Computer Science, 1986.
[24] A. Decelle, G. Fissore, and C. Furtlehner, Spectral dynamics of learning restricted Boltzmann machines, arXiv preprint arXiv:1708.02917 (2017).
[25] A. Decelle, G. Fissore, and C. Furtlehner, Thermodynamics of restricted Boltzmann machines and related learning dynamics, Journal of Statistical Physics (2018), no. 6 1576–1608.
[26] G. S. Hartnett, E. Parker, and E. Geist, Replica symmetry breaking in bipartite spin glasses and neural networks, Physical Review E (2018), no. 2 022116.
[27] V. A. Marčenko and L. A. Pastur, Distribution of eigenvalues for some sets of random matrices, Mathematics of the USSR-Sbornik (1967), no. 4 457.
[28] P. Erdős and A. Rényi, On the evolution of random graphs, Publ. Math. Inst. Hung. Acad. Sci (1960), no. 1 17–60.
[29] L. Erdős, A. Knowles, H.-T. Yau, J. Yin, et al., Spectral statistics of Erdős–Rényi graphs I: local semicircle law, The Annals of Probability (2013), no. 3B 2279–2375.
[30] J. Milnor, Morse theory. (AM-51), vol. 51. Princeton University Press, 2016.
[31] N. Le Roux and Y. Bengio,