Entropic barriers as a reason for hardness in both classical and quantum algorithms
Matteo Bellitti, Federico Ricci-Tersenghi, Antonello Scardicchio
Matteo Bellitti
Department of Physics, Boston University, Boston, MA 02215, USA
Federico Ricci-Tersenghi
Dipartimento di Fisica, Sapienza Università di Roma, P.le A. Moro 2, I-00185 Roma, Italy; CNR-Nanotec, Rome unit, P.le A. Moro 2, I-00185 Roma, Italy; and INFN, Sezione di Roma I, P.le A. Moro 2, I-00185 Roma, Italy
Antonello Scardicchio
The Abdus Salam International Center for Theoretical Physics, Strada Costiera 11, 34151, Trieste, Italy; and INFN, Sezione di Trieste, Via Valerio 2, 34127 Trieste, Italy

(Dated: February 25, 2021)

We study both classical and quantum algorithms to solve a hard optimization problem, namely 3-XORSAT on 3-regular random graphs. By introducing a new quasi-greedy algorithm that is not allowed to jump over large energy barriers, we show that the problem hardness is mainly due to entropic barriers. We study, both analytically and numerically, several optimization algorithms, finding that entropic barriers affect classical local algorithms and quantum annealing in a similar way. For the adiabatic algorithm, the difficulty we identify is distinct from that of tunnelling under large barriers, but does, nonetheless, give rise to exponential running (annealing) times.
I. INTRODUCTION
Hard discrete optimization problems are ubiquitous in scientific disciplines and practical applications. The problem of minimizing a complex cost function (or equivalently maximizing a reward function) naturally appears in many different contexts: e.g. in physics in the computation of ground state configurations, in statistics in the maximization of the likelihood, in machine learning in the training of artificial neural networks, and so on.

Although real world problems usually have local structures that make their analysis difficult, it is commonly believed that the main source of computational hardness arises from the strong long-range correlations that exist among variables, and this effect can be studied also in more idealized and simple-to-solve models. In other words, in hard optimization problems, starting from an optimal or near-optimal configuration, the change of a single variable (or a small subset of variables) often requires the rearrangement of many more variables in order to remain close to optimality; often the variables to be rearranged are not even close to the modified variable. This property makes the search for the optimal configuration a challenging task even for sophisticated algorithms (see for example the case of backtracking algorithms, like DPLL [1]).

An ideal setting for studying this kind of hard optimization problems is provided by constraint satisfaction problems defined on sparse random graphs. Such problems have a twofold benefit: they can be solved analytically using the cavity method, a tool from the statistical physics of disordered systems, and they can be efficiently handled on a computer, as the finite mean degree of the graph makes the computational resources required (CPU and memory) grow only linearly with the problem size.

Random constraint satisfaction problems (rCSP) are optimization problems where N discrete variables need to be assigned in order to satisfy M = αN constraints, each one involving a small subset of variables.
The most famous among rCSP is maybe random K-SAT [2]. Recently these rCSP have been the subject of intense studies based on statistical physics ideas, with the aim of understanding the origin of their computational hardness [3–16]. Indeed, a common feature of all the hard rCSP is the presence of a broad range of the constraints-per-variable ratio α such that solutions to the problem exist with high probability (in the large N limit), but all known solving algorithms are unable to find any solution in a time growing polynomially with the problem size N. In this hard region it is expected that any solving algorithm requires a time growing exponentially with the problem size, t ∼ exp(aN).

By defining an energy function that counts the number of violated constraints, one can visualize a rCSP as the problem of searching for a zero-energy configuration in a complex energy landscape. The hard phase in rCSP does actually correspond to an energy landscape with exponentially many (in the system size) local minima that can trap the search dynamics. The energy barriers between these minima are usually considered the main source of computational complexity, as any local dynamics is required to jump over these barriers in order to proceed further in the search for the optimal configuration.

Based on the above picture, it is often believed that a quantum evolution (which allows for tunneling events) may escape local minima more efficiently than a classical stochastic dynamics. This physically reasonable expectation implies that quantum algorithms may be faster than their classical counterparts in the search for optimal configurations, and has fueled interest in quantum algorithms that could benefit from this phenomenon: Quantum Annealing [17–22], its more recent variant, the Quantum Approximate Optimization Algorithm [21], and Population Transfer [23–25] are some well-known examples.
All these algorithms typically show a complexity growth comparable with the best classical algorithms but, despite their initial promise, it is entirely possible that the limitations they exhibit are insurmountable, and it is unlikely that they could solve NP-hard problems in polynomial time. The limitations might come from the exponentially small tunneling rate out of a local minimum (a phenomenon linked to many-body localization and the existence of an emergent integrable dynamical phase [20, 26–28]) or might come from other dynamical phenomena. In this paper we identify one such phenomenon.

We consider one of the hardest sparse rCSP, namely random 3-XORSAT, and show that the problem hardness can be interpreted as coming essentially from entropic barriers: we introduce a quasi-greedy algorithm, unable to jump over large energy barriers, and notice that it is able to solve the problem as efficiently as state-of-the-art algorithms, which are designed to be efficient on problems with large energy barriers. This peculiar property suggests that this model is a perfect candidate to understand the effect of entropic barriers.

We investigate several algorithms, both classical and quantum, in order to better understand the effect of entropic barriers. For all the algorithms analyzed, we find that the time to reach a solution scales exponentially with the system size, and quantum dynamics seem to suffer from the presence of entropic barriers as much as the classical algorithmic dynamics. As the efforts to build a quantum computer are finally yielding results [29], we believe it is important to identify all possible stumbling blocks for quantum architectures.

II. MODEL DEFINITION AND ITS KNOWN SOLUTION
The random 3-XORSAT problem is among the simplest rCSP [3]: it is made of N binary variables x_i ∈ {0, 1} that have to satisfy M = αN parity checks of the kind

x_i ⊕ x_j ⊕ x_k = b_ijk ,   (1)

where the variables entering each constraint are randomly chosen and the parity check bit b_ijk is 0 or 1 with equal probability. In a K-XORSAT problem each constraint involves K variables, but for the sake of simplifying the presentation we restrict to the random 3-XORSAT problem, where each constraint involves exactly 3 randomly chosen variables. Increasing α, the typical problem becomes more and more difficult to solve. Solutions exist with high probability in the large N limit until the sat–unsat threshold α_s [30]. However, the most interesting transition from the point of view of searching algorithms is the clustering (or dynamical) transition that takes place at α_d, before α_s [6]. For α ∈ [α_d, α_s] the space of solutions is shattered into exponentially many clusters of solutions [6, 31], and this is what makes the search for solutions much more difficult [3, 6, 15]. This picture has been proven rigorously to a large extent [32]. The hardness of the problem of finding a solution for some classes of local algorithms in the region α ∈ [α_d, α_s] has been proven [33], and found to depend on the so-called 'overlap gap property', which in practice corresponds to the clustering of solutions taking place after the dynamical transition α_d. In other words, for α > α_d the geometry of solutions is such that the Hamming distance d between any pair of solutions is either very small, d < d_1 (for pairs of solutions in the same cluster), or very large, d > d_2 (for pairs of solutions in different clusters) [6]. It is exactly the existence of such a range of distances with no solutions that creates an algorithmic bottleneck, while for α < α_d local algorithms can sample the space of solutions [32]. So we are going to focus our attention on a 3-XORSAT problem beyond the dynamical threshold α_d.
Given that we are interested in using this model as a benchmark for optimization, we need two more ingredients. (i) We need to define an energy function whose ground state configurations are the solutions to the problem; the simplest choice consists in just counting the number of violated parity checks via the following Hamiltonian

H[s] = (1/2) ( M − Σ_{a=1}^{M} J_a ∏_{i∈∂a} s_i ) ,   (2)

where s_i = (−1)^{x_i} are Ising spins and J_a = (−1)^{b_a} the couplings, a being an index running over all interactions (triplets for 3-XORSAT) and ∂a the set of variables entering the a-th interaction. (ii) At least one solution must always exist, and this can be ensured by enforcing a specific configuration to satisfy all the constraints. For example, by setting all b = 0, the configuration x_i = 0 ∀i is always a solution. One may think this way of building the model naturally favors the imposed or planted configuration, but this is not the case for the XORSAT problem. As noticed since Ref. [35], finding the imposed or planted solution is like finding the crystal in a model of a liquid that upon cooling spontaneously forms a glass: it is well known that crystallization requires an activated dynamical process (nucleation), which is exponentially rare in models with long range interactions, such as random XORSAT.

[Footnote: Strictly speaking, XORSAT is in P, as the solution to a set of αN linear equations in N variables can always be found in time O(N^3) by a simple Gaussian elimination algorithm (more complicated algorithms can do marginally better). However, such an algorithm is very special to XORSAT and cannot be generalized to other CSPs, being inherently non-local (i.e. the operations performed involve in general many variables, possibly very distant on the interaction network). On the contrary, when local algorithms are run on XORSAT they are found to be very slow and strongly affected by the dynamical transition at α_d. Even the very same Gaussian elimination algorithm has an algorithmic phase transition at α_d from linear to cubic behavior [34], and it does not work as soon as the problem is slightly perturbed (e.g. by the addition of a tiny fraction of constraints with the OR operator).]

Planted models which are hard to solve are the most natural candidates for optimization benchmarks, and planted XORSAT turns out to be the hardest among these [5]. Properties of planted models have been studied in great detail [36, 37] and reviewed in [38].

The Hamiltonian for the planted model simplifies to the following

H[s] = (1/2) ( M − Σ_{a=1}^{M} ∏_{i∈∂a} s_i ) ,   (3)

which is indeed minimized by the configuration s_i = 1 ∀i. This is the energy function we are going to minimize in order to test classical and quantum optimization algorithms.

The last relevant choice regards the interaction hyper-graph, that is the set of M triplets. In the model where the M triplets are chosen randomly, the degree of each variable is a Poisson random variable of mean 3α. A different choice is the one where the interaction hyper-graph is chosen such that each variable has the same degree d: this is called a random regular hyper-graph and can be generated via the configurational model, where N variables are given d legs each and the M = Nd/3 triplets are obtained by a random matching of the legs. Hereafter we focus on the d = 3 case. In this case the random and the planted models are equivalent in the large N limit for any positive temperature (at T = 0 the sub-extensive differences between the two models may lead to some discrepancy in the number of solutions, as discussed in Refs. [42, 43]). We are going to use the planted model in order to be sure that a solution always exists even for finite (and small) values of N.

Having fixed the degree of the hyper-graph such that the XORSAT problem is in its hard phase, we can now consider the Gibbs measure corresponding to the Hamiltonian H in Eq. (3) at any temperature T.
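For concreteness, the planted 3-regular instance and the energy of Eq. (3) can be sketched in a few lines of Python. This is our own minimal illustration (function names are ours), not the code used for the experiments in the paper:

```python
import random

def planted_3regular_3xorsat(N, seed=0):
    """Sample a planted 3-regular 3-XORSAT instance via the
    configuration model: each variable gets 3 legs, the legs are
    shuffled and grouped into M = N triplets.  Planting all
    b_a = 0 (i.e. J_a = +1) makes s_i = +1 for all i a solution."""
    rng = random.Random(seed)
    legs = [i for i in range(N) for _ in range(3)]
    while True:
        rng.shuffle(legs)
        triplets = [tuple(legs[3*a:3*a + 3]) for a in range(N)]
        # reject matchings with a repeated variable inside a triplet
        if all(len(set(t)) == 3 for t in triplets):
            return triplets

def energy(s, triplets):
    """H[s] = (M - sum_a prod_{i in da} s_i) / 2, as in Eq. (3)."""
    return (len(triplets) - sum(s[i]*s[j]*s[k] for i, j, k in triplets)) // 2
```

In the planted configuration s_i = +1 the energy is zero; flipping a single spin breaks its three interactions, raising the energy to 3.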
The XORSAT problem corresponds to the problem of finding a zero-energy ground state, so it is somehow related to the T = 0 physics of the model, but the behaviour of the model at T > 0 is also relevant. When the model is in the hard phase (e.g. for α > α_d) it undergoes a dynamical phase transition at a positive temperature T_d, and for T < T_d it has an exponentially large number of metastable states dominating the thermodynamics, 𝒩 ∼ exp[NΣ], where Σ is the so-called complexity. This is the case for the 3-regular 3-XORSAT model, which shows a dynamical phase transition at a positive temperature T_d.
In the study of optimization algorithms, that is of the out-of-equilibrium processes that try to minimize the energy, the order in which the large size limit (N → ∞) and the large times limit (t → ∞) are taken is extremely important. We expect an algorithm-dependent threshold energy, e_thr > 0, to exist such that configurations with e > e_thr can be reached in an "easy" way (e.g. in a time scaling linearly with N), while to reach a solution (i.e. a configuration with e = 0) a time growing exponentially in N is required in general for hard problems (see the schematic picture in Fig. 1).

We will see that for some algorithms we are able to provide an approximate description of the dynamics in the regime where the N → ∞ limit is taken before the t → ∞ limit, thus estimating e_thr (that in the best cases is close to the marginal energy e_marg). However, the interesting question about the scaling of times to reach a solution requires a different analytic approach where fluctuations are taken into account. Most of our results in the regime where times are made large while keeping N finite are based on numerical experiments.

FIG. 2. Simulated Annealing is not able to solve the 3-regular 3-XORSAT problem, since it converges in the long time limit to a positive energy (close to the marginal energy e_marg). The four curves have decreasing annealing rates ∆T (from top to bottom). The blue curve represents the energy above the dynamical transition.

The presentation of our results about optimization algorithms is somewhat split in two parts. In Sections III A and III B we discuss the regime of linear times, where large sizes can be studied and analytical solutions in the large N limit can be obtained (thus estimating the threshold energy for various algorithms). In Sections III C and III D we study the regime where times are made exponentially large in the system size N, and estimate the exponential growth rate of the time to reach a solution.

A. Simulated Annealing: a warming up with the most widely used optimization algorithm
Simulated Annealing (SA) is maybe the most widely used optimization algorithm. It consists in implementing a Monte Carlo Markov Chain sampling from the Gibbs–Boltzmann distribution P_GB(s) ∝ exp(−H(s)/T), with a temperature T slowly decreasing towards zero. In Fig. 2 we report the results of Simulated Annealing run on samples of size N = 10, with a cooling schedule where the temperature is decreased by ∆T after each Monte Carlo Sweep (MCS); the four curves correspond to decreasing values of ∆T (from top to bottom).

From the Simulated Annealing results it is clear that an efficient, but still linear-time, algorithm is not able to solve the 3-regular 3-XORSAT problem, and seems to converge to configurations with a threshold energy close to e_marg. To reach a zero-energy configuration we need to use an algorithm that can go below the threshold energy, overcoming energetic and/or entropic barriers.

B. A broad class of stochastic search algorithms
Given a particular spin configuration, let us classify its variables according to the number of unsatisfied interactions they belong to: we say a variable is of type k if it belongs to k unsatisfied interactions. In the present model k ∈ [0, 3], since d = 3 for all variables. We call f_k(t) the fraction of variables of type k at time t; these fractions satisfy 0 ≤ f_k(t) ≤ 1 and Σ_{k=0}^{3} f_k(t) = 1 at any time.

The searching algorithm we propose is extremely simple and works as follows: at each time step it chooses one variable of type k with probability p_k(t) ∝ w_k f_k(t) and flips it. The time is then incremented by 1/N in order to have a well defined continuous process in the large N limit.

Starting from a random configuration, the behavior of the algorithm is determined only by the vector of weights w = (w_0, w_1, w_2, w_3). The stopping condition depends on the weights: if all weights are non-null the algorithm never stops, while if w_0 = 0 the algorithm cannot flip variables participating only in satisfied interactions, and thus any solution is a stopping configuration. We fix w_0 = 0 hereafter, so as to make any solution a stopping configuration for the algorithm. We study several choices for the vector of weights (the weights need not be normalized, but the probabilities p_k ∝ w_k f_k are).
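The algorithm just described is straightforward to implement. The sketch below is our own illustration (function names ours); it selects a variable with probability proportional to w_k f_k by giving each variable the weight w of its current type, which is equivalent:

```python
import random

def run_search(triplets, w, max_flips, seed=0):
    """Weighted stochastic search: flip a variable of type k
    (k = number of unsatisfied interactions it belongs to) chosen
    with probability p_k ~ w[k] * f_k.  With w[0] = 0 every
    solution is an absorbing configuration.  Returns the final
    energy and the number of flips performed."""
    rng = random.Random(seed)
    N = 1 + max(i for t in triplets for i in t)
    s = [rng.choice((-1, 1)) for _ in range(N)]
    adj = [[] for _ in range(N)]                 # interactions per variable
    for a, t in enumerate(triplets):
        for i in t:
            adj[i].append(a)
    unsat = [s[i] * s[j] * s[k] == -1 for (i, j, k) in triplets]
    ktype = [sum(unsat[a] for a in adj[i]) for i in range(N)]
    for step in range(max_flips):
        E = sum(unsat)
        if E == 0:
            return 0, step                       # solution found
        weights = [w[ktype[i]] for i in range(N)]
        if sum(weights) == 0:
            return E, step                       # blocked configuration
        v = rng.choices(range(N), weights=weights)[0]
        s[v] = -s[v]
        for a in adj[v]:                         # toggle the interactions of v
            unsat[a] = not unsat[a]
            d = 1 if unsat[a] else -1
            for i in triplets[a]:                # neighbors change type by +-1
                ktype[i] += d
    return sum(unsat), max_flips
```

Note the bookkeeping: flipping a type-j variable turns it into a type-(3 − j) variable and shifts the types of its neighbors by ±1, which is exactly the structure exploited by the analytic description.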
1. Analytic description
Before presenting the actual performance of this algorithm, we would like to stress that its evolution can be described analytically under some assumptions, similar to those already used in the literature to approximately describe the relaxation dynamics in models defined on a Bethe lattice [31, 47, 48].

At each time step the fractions {f_k(t)} change depending on the variable chosen. For example, if a variable of type k = 3 is chosen and flipped, then that variable changes its type from 3 to 0 (3 → 0): the fraction f_3 decreases by 1/N and the fraction f_0 increases by 1/N; at the same time its 6 neighboring variables decrease their type by one, and one needs to compute the numbers of changes n_{1→0}, n_{2→1} and n_{3→2} in order to properly update the fractions {f_k} (for example ∆f_2 = (n_{3→2} − n_{2→1})/N). The three numbers n_{1→0}, n_{2→1} and n_{3→2} are random variables distributed according to the multinomial distribution Mult({p_{1→0}, p_{2→1}, p_{3→2}}, 6), with

p_{1→0} ∝ f_1 ,   p_{2→1} ∝ f_2 ,   p_{3→2} ∝ f_3 ,   (4)

where the proportionality constant is fixed by p_{1→0} + p_{2→1} + p_{3→2} = 1. One more example: if the variable chosen to be flipped is of type 2, then, apart from the change 2 → 1, we have that n_{1→0}, n_{2→1} and n_{3→2} are distributed according to Mult({p_{1→0}, p_{2→1}, p_{3→2}}, 4), while n_{0→1}, n_{1→2} and n_{2→3} are distributed according to Mult({p_{0→1}, p_{1→2}, p_{2→3}}, 2), with

p_{0→1} ∝ f_0 ,   p_{1→2} ∝ f_1 ,   p_{2→3} ∝ f_2 ,   (5)

with again the normalization condition p_{0→1} + p_{1→2} + p_{2→3} = 1. So, the variations of the fractions due to a single spin flip are random variables given by

∆f_k(t) = (1/N) [ n^{(2(3−j))}_{k−1→k} − n^{(2(3−j))}_{k→k+1} + n^{(2j)}_{k+1→k} − n^{(2j)}_{k→k−1} + δ_{3−j,k} − δ_{j,k} ]   with probability p_j(t) ,

where the superscript (d) in n^{(d)} refers to the total number of events in the multinomial distribution, and we fix n_{−1→0} = n_{0→−1} = n_{3→4} = n_{4→3} = 0. The generalization of the above equation to other random graphs or interaction types is straightforward. Rescaling the time such that a spin flip happens every ∆t = 1/N, and taking the average over a small, but finite, time interval, corresponding to O(N) single spin flips, the stochastic equation above can be converted in the N → ∞ limit into an ordinary first order differential equation in terms of the mean values E[n^{(d)}_{k→ℓ}] = d p_{k→ℓ}. For simplicity, we keep using the same notation, but now the {f_k(t)} are not single trajectories, but mean values over the trajectories:

f'_k(t) = Σ_{j=0}^{3} p_j(t) [ 2(3−j)(p_{k−1→k} − p_{k→k+1}) + 2j(p_{k+1→k} − p_{k→k−1}) + δ_{3−j,k} − δ_{j,k} ] .   (6)

An easy check is that Σ_{k=0}^{3} f'_k(t) = 0, so the total probability is conserved. The solution to Eq. (6) can be easily obtained by any integration algorithm for ordinary differential equations. This analytic solution will be compared in the following with the actual evolution of the algorithm.

FIG. 3. Evolution of the energy (fraction of unsatisfied constraints) for several greedy versions (w_1 = 0) of the algorithm, for different ratios w_2/w_3. Points are numerical data and lines are the analytic description.

FIG. 4. The actual evolution of the fractions of variables in a greedy version (w_1 = 0) of the algorithm (data points) is well described by the analytic solution (full curves). The algorithm stops when f_2 = f_3 = 0.

FIG. 5. Evolution of the energy and of the fractions of variables during the WalkSAT algorithm. The analytic description of the relaxation to the threshold energy is almost perfect.
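Eq. (6) is easy to integrate numerically. The sketch below (our own illustration, not the paper's code) uses a plain Euler scheme, starting from the random-configuration initial condition f = (1/8, 3/8, 3/8, 1/8) (variable types are Binomial(3, 1/2)); the fraction of unsatisfied constraints is Σ_k k f_k / 3, since each unsatisfied interaction is counted by its 3 variables:

```python
import numpy as np

def rhs(f, w):
    """Right-hand side of the mean-field ODE, Eq. (6), for the
    3-regular model (variable types k = 0..3)."""
    f = np.clip(f, 0.0, None)
    wf = np.asarray(w, dtype=float) * f
    p = wf / wf.sum()                 # prob. of flipping a type-j variable
    pu = np.zeros(5)                  # pu[k] = p_{k->k+1} for k = 0,1,2
    pd = np.zeros(5)                  # pd[k] = p_{k->k-1} for k = 1,2,3
    pu[0:3] = f[0:3] / f[0:3].sum()
    pd[1:4] = f[1:4] / f[1:4].sum()
    df = np.zeros(4)
    for k in range(4):
        for j in range(4):            # out-of-range entries of pu, pd are 0
            df[k] += p[j] * (2*(3 - j)*(pu[k - 1] - pu[k])
                             + 2*j*(pd[k + 1] - pd[k])
                             + (k == 3 - j) - (k == j))
    return df

def integrate(w, t_max=2.0, dt=1e-3):
    """Euler integration from a random initial configuration.
    Use weights with w[1] > 0 (e.g. WalkSAT) so no division vanishes."""
    f = np.array([1.0, 3.0, 3.0, 1.0]) / 8.0
    for _ in range(int(t_max / dt)):
        f = f + dt * rhs(f, w)
    return f
```

A quick sanity check is that Σ_k f_k stays equal to 1 (the ODE conserves total probability) and that, for WalkSAT-like weights, the energy relaxes below its initial value 1/2.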
2. Actual performances
Let us start from a greedy version of the algorithm, that is an algorithm that can never increase the energy (it is the equivalent of gradient descent, but here the space of configurations is discrete, so gradients are not well defined). This greedy version requires w_1 = 0, because flipping a variable of type 1 would increase the energy (i.e. the number of unsatisfied interactions); only the weights w_2 and w_3 can be non-null in the greedy case. For this greedy version the number of stopping configurations is very large (their entropy is computed in Appendix A): any configuration where each variable participates in 0 or 1 unsatisfied interactions is a stopping configuration. These configurations are local minima of the energy function and we call them blocked.

In Fig. 3 we show the energy, or fraction of unsatisfied constraints, as a function of the running time of the greedy algorithm (w_1 = 0) for different choices of the ratio w_2/w_3 (the w's are not normalized, so only their ratio matters). Points are the numerical data and the lines are the solution to the ODE in Eq. (6). We notice that the agreement between analytics and numerics is almost perfect, because at those energy values correlations are very weak. The same almost perfect agreement can be seen also at the level of the fractions f_k of variables participating in k unsatisfied constraints (see Fig. 4 for the case w_2 = w_3 = 1).

Let us now consider non-greedy versions of this algorithm, that is cases with w_1 > 0; the choice w_k ∝ k corresponds to the WalkSAT algorithm. The numerical results and the comparison with the analytical description are provided in Fig. 5: the agreement is excellent. Although this version of the algorithm would run forever, after a finite time a stationary regime is reached and nothing interesting happens any more in the large N limit. Once the threshold energy has been reached, the search for the solution takes place via rare fluctuations, as discussed in Sec. III C.

FIG. 6. Evolution of the energy for WalkSAT and several (quasi-)greedy versions of the algorithm. The vector of weights (w_0, w_1, w_2, w_3) appears in the legend. The analytic description under the assumption of lack of correlations is reported with a full curve and fails to describe the algorithms at the lowest energies, when correlations arise.

However, having achieved a good analytic description of the WalkSAT algorithm is not enough, because this algorithm works at very high energies: its threshold energy is well above the marginal energy. We need a version of the algorithm where w_1 is much smaller than w_2 and w_3 (in practice we fix w_2 = w_3 = 1 and explore the range w_1 ≪ 1). When w_1 is made very small, the algorithm in practice is allowed to make an energy-increasing spin flip only when no other move is available, that is when the configuration is blocked. We have also simulated a version of the algorithm where this is made explicit, and the results are equivalent.

In Fig. 6 we report the evolution of the energy for WalkSAT, the greedy version and several quasi-greedy versions of the algorithm. We observe that the threshold energy approaches e_marg for w_1 ≪ 1. So the quasi-greedy versions of this algorithm seem very effective in reaching the same energy values Simulated Annealing can achieve.

The analytic description of the algorithm changes a lot depending on the energy value. For high enough energies, correlations are very weak and the analytic solution matches the numerical data perfectly: this is true for WalkSAT and the greedy version, as already shown in Figs. 3, 4 and 5. For intermediate values of w_1 the analytic solution shows some deviations at times t ∼ 1, which luckily enough disappear for larger times.

The most interesting cases are those with very small w_1. First of all we observe a very weak dependence on w_1 (the three smallest values of w_1 we used give practically the same results), so we are working in the limit where the algorithm makes an energy-increasing spin flip only when no other move is available. So this algorithm is not able to jump over barriers of height larger than 1 (or a few units at most), which correspond to ∆e = O(1/N).

Indeed, this quasi-greedy algorithm follows closely the greedy version until the time, t* ∼ 0.3, when the latter reaches a blocked configuration that stops it. For t > t* the quasi-greedy algorithm keeps decreasing the energy at a much slower pace, eventually approaching an energy very close to the marginal one (which we expect to be a lower bound for the threshold energy of linear-time algorithms).

In the regime t > t* the analytical approximation based on the lack of correlations fails dramatically in many important aspects: it keeps predicting a very fast energy decrease and it estimates a too small asymptotic energy. The reason for such a failure is clear: low energy configurations are very peculiar and have strong correlations among variables, even among variables which are far apart. In this energy range our approximation fails and the analytic solution is meaningless.

It is worth stressing that we are still working on timescales growing linearly with N (the first regime depicted in Fig. 1). So the failure of the analytic approximation is not due to the fluctuations that become important in the second regime of exponentially large timescales. Here we are still working in a regime such that every spin variable has been flipped a finite number of times. Nonetheless, the evolution of the quasi-greedy algorithm brings the system to configurations correlated up to some distance, such that the hypotheses leading to Eq. (6) are no longer valid.

From a preliminary study we have understood that in this energy regime the algorithm proceeds by performing collective rearrangements of variables forming a local tree-like structure, and the size of these collective rearrangements seems to grow while approaching the threshold energy (this is strongly reminiscent of what happens in similar problems approaching the dynamical transition [50, 51]). The analytical description of the quasi-greedy algorithm in this low energy regime is deferred to future work.

We move now to the problem of estimating the large deviation rate to reach a solution via a rare fluctuation from the threshold energy. Given the differences in the threshold energies clearly visible in Fig. 6, we expect quite different rates for WalkSAT and the several quasi-greedy versions of the algorithm.

C. WalkSAT: the large deviations rate to reach a solution from a simple argument
WalkSAT is a popular randomized (or stochastic) algorithm for solving constraint satisfaction problems [49]. In its simplest version it works as follows: it starts from a random configuration; at each time step, if all constraints are satisfied, it returns the solution; otherwise it picks uniformly at random an unsatisfied constraint and flips a randomly chosen variable participating in that constraint. The flip certainly satisfies the chosen constraint (interaction) but may unsatisfy many other constraints, so the algorithm may increase the energy during the evolution.

An approximate analytic description of this algorithm exists [2, 47, 52], from which we learn that for the hard problem we are studying the algorithm relaxes to a positive energy in a linear time and then reaches a solution by a rare fluctuation taking an exponential time. However, it is not clear which kind of barrier the algorithm crosses, given that the energy increase is in principle without bounds.

Actually, the analytic description of the WalkSAT algorithm can be made even simpler than in previous approaches [2, 47, 52] by noticing that (i) the algorithm stays most of the time on configurations of very high energy, where correlations are very weak, and (ii) the fluctuation leading to a solution is so rapid that correlations do not arise.

These insights suggest that keeping track of the energy alone should capture the essential features of the algorithm. In practice, we think of the energy as a stochastic process, and write down a stochastic differential equation that describes its dynamics during the computation. The details of the calculation are presented in the following section.
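The version of WalkSAT just described can be sketched in a few lines (our own minimal illustration, with constraints stored as spin triplets; function names are ours):

```python
import random

def walksat(triplets, N, max_flips, seed=0):
    """WalkSAT for 3-XORSAT: start from a random configuration; while
    some constraint is broken, pick a uniformly random unsatisfied
    constraint and flip a uniformly random variable in it.  The flip
    fixes that constraint but may break the others the variable touches."""
    rng = random.Random(seed)
    s = [rng.choice((-1, 1)) for _ in range(N)]
    adj = [[] for _ in range(N)]
    for a, t in enumerate(triplets):
        for v in t:
            adj[v].append(a)
    unsat = {a for a, (i, j, k) in enumerate(triplets) if s[i]*s[j]*s[k] == -1}
    for step in range(max_flips):
        if not unsat:
            return s, step                 # all constraints satisfied
        a = rng.choice(sorted(unsat))      # random broken constraint
        v = rng.choice(triplets[a])        # random variable in it
        s[v] = -s[v]
        for b in adj[v]:                   # flipping v toggles every
            unsat.symmetric_difference_update({b})   # constraint touching v
    return None, max_flips                 # no solution found in time
```

The set of broken constraints is updated locally after each flip, so a step costs O(d) with d the degree of the flipped variable.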
1. Analytic description
In this section we focus on random p-XORSAT with α constraints per spin: every spin participates in a Poisson distributed number of interactions (with average and variance pα), while every interaction involves exactly p spins. Random p-XORSAT is known to be less correlated than the regular version (see the comment after Eq. (3)), so we expect it to be more amenable to a simple description.

Consider a uniformly random spin configuration s. Every spin is connected to a Poisson distributed number of broken constraints with average pα_u, where α_u = H[s]/N. The same spin is also connected to a Poisson number of satisfied constraints with average p(α − α_u). The fundamental idea of our approach is to assume that WalkSAT is incapable of building any correlations that violate this property, which is strictly true only on fully random configurations. This is reasonable, as most of the time is spent in high energy states.

Given an initial random configuration s, we run WalkSAT for T steps and denote the number of broken constraints at this time H(T). When we take one more step, H(T) changes by

∆H_{T+1,T} = −1 − u(T) + s(T) ,   (7)

where u(T) is the number of excess broken constraints connected to the flipped spin (i.e. excluding the one that was selected by WalkSAT) and s(T) is the number of satisfied constraints connected to it. As explained earlier, we assume that at all times they are distributed as if the configuration were random.

After a number ∆T of steps, larger than one but small compared to the number of spins N, the total energy change is

∆H_{T+∆T,T} = −∆T − Σ_{k=0}^{∆T−1} u(T+k) + Σ_{k=0}^{∆T−1} s(T+k) ,   (8)

and we approximate the two sums using the central limit theorem:

Σ_{k=0}^{∆T−1} u(T+k) ≈ pα_u ∆T + R_1 √(pα_u ∆T) ,
Σ_{k=0}^{∆T−1} s(T+k) ≈ p(α − α_u) ∆T + R_2 √(p(α − α_u) ∆T) ,   (9)

where R_1 and R_2 are two independent standard Gaussian random variables.
In these expressions we assume∆ T is short enough that the energy density α u does notappreciably change over the interval ∆ T . The sum ofthe two terms involving R and R is again a Gaussianvariable, so that the total energy change after ∆ T stepsis approximately∆ H T +∆ TT = − ∆ T + p ( α − α u )∆ T + √ pαR √ ∆ T . (10)We are interested in the large N limit, so it is convenientto change variable from T to t = T /N . Dividing theprevious equation by N we obtain a stochastic differentialequation describing an Ornstein-Uhlenbeck process: dα u ( t ) = 2 p (cid:18) pα − p − α u (cid:19) dt + (cid:114) pαN dW . (11)To estimate the scaling with N of the time necessary toreach a solution, it is best to study the Fokker–Planckequation associated to Eq. (11): ∂P∂t = − ∂∂α u (cid:16) p (cid:18) pα − p − α u (cid:19) P (cid:17) + 12 pαN ∂ P∂α u . (12)Regardless of the boundary condition P ( α u , t = 0), afteran initial transient, the energy is distributed accordingto the stationary solution of Eq. (12) (the normalizationconstant c N is irrelevant here) P st ( α u ) = c N exp (cid:18) − N ( α u − α ) α (cid:19) , (13)which is centered around the finite value (if pα > α = pα − p . (14)This indicates that the time necessary to reach the solu-tion grows more rapidly than O ( N ), but also gives us aconcrete way to estimate it: the typical time over which atransition from energy α to zero happens is the reciprocalof the rate t sol ∼ α u = 0 , t → ∞ | α u = α , t = 0)= exp (cid:18) N ( pα − αp (cid:19) . (15) l o g ( P s o l ) / N SDELarge dev.
FIG. 7. Comparison to the numerics for 3-XORSAT. The vertical axis is the logarithm of the probability of finding a solution in linear time, divided by $N$. The black dots are the numerical data reported in [47], the blue line is the analytic approximation derived in that reference, based on large-deviation theory, and the red line is our approximation, Eq. (15).

As expected, the solution time scales exponentially with the problem size. This analytical solution shows, in its simplicity, the mechanism by which a solution is found, and why we call the obstacle an entropic barrier. At equilibrium, reached after $O(N)$ steps, the system fluctuates around a rather high energy $\bar\alpha$, well above any local minimum where the system could get stuck. A rapid fluctuation of $O(N)$ spin flips can bring the system to an actual solution, but this happens with exponentially small probability: only an exponentially small fraction of all the possible sequences of $O(N)$ spin flips picks the right direction towards the solution. Hence the problem is of an entropic nature, not an energetic one. We will discuss this in more detail after comparing with numerical results.
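As a consistency check, the stationary solution (13) and the escape time (15) follow from the Ornstein--Uhlenbeck form of Eq. (11) by a one-line zero-current computation:

```latex
\begin{align*}
0 &= 2p\,(\bar\alpha - \alpha_u)\,P_{st}(\alpha_u)
     - \frac{p\alpha}{2N}\,\partial_{\alpha_u} P_{st}(\alpha_u)
  && \text{(vanishing stationary current)}\\
\Rightarrow\quad P_{st}(\alpha_u) &\propto
   \exp\!\left[-\frac{2N}{\alpha}\,(\alpha_u-\bar\alpha)^2\right]
  && \text{(a Gaussian of variance } \alpha/4N \text{)}\\
t_{sol} \sim \frac{1}{P_{st}(0)} &=
   \exp\!\left[\frac{2N\bar\alpha^2}{\alpha}\right]
   = \exp\!\left[\frac{N(p\alpha-1)^2}{2p^2\alpha}\right]
  && \text{using } \bar\alpha=\frac{p\alpha-1}{2p}.
\end{align*}
```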
2. Comparison with numerics
A comparison with the actual rate measured in the numerical experiments of Ref. [47] is provided in Fig. 7. The quality of the rate reported in Eq. (15), obtained under the assumption of a complete lack of correlations, is surprising and supports the idea that WalkSAT performs a random search without building any relevant correlation in the problem.

It is interesting to notice that the same simple argument can also be made in the 3-regular 3-XORSAT case, where it provides the following scaling for the time to reach a solution:
$$t_{sol} \sim \exp(N/6). \tag{16}$$
The rate $\mu = 1/6 \simeq 0.167$ is again close to the numerics reported in Ref. [53], where $\mu \approx 0.124$ was measured.

FIG. 8. Equilibrium distribution of the density of UNSAT clauses $\alpha_u$ for the Fokker--Planck equation (12). A solution is found when a rare fluctuation reaches the value $\alpha_u = 0$, which occurs with probability exponentially small in $N$.

Notice that the physical interpretation of this calculation is that the equilibrium distribution is centered around a relatively large non-zero value of the cost function, as seen in Fig. 8. The solution is found by a rare fluctuation which brings the system out of equilibrium through a "lucky" series of $O(N)$ flips pointing in the exact direction of the ground state. During this series no significant correlation is created between the values of the various random variables, so the central limit theorem we used remains valid. We expect this to be the behavior of any random algorithm that works at high temperature while tackling a problem with entropic barriers.

In the analysis of more complex algorithms, those working at lower temperatures, this assumption is most probably violated: correlations between the random variables need to be taken into account when computing the rate of the rare fluctuation that leads to the ground state. To make progress in the analysis of such algorithms one needs to go beyond what is done in this paper.

D. Times to reach a solution via the quasi–greedy algorithm
Given that the quasi–greedy version of the algorithm has a much lower threshold energy than WalkSAT, we expect the rare fluctuations leading to a solution to occur with a better exponential rate. Unfortunately, in the quasi–greedy case the computation cannot be done analytically: as seen above, the approximation based on the lack of correlations gives very poor results and is therefore of no use. We thus resort to a numerical computation.

We have observed above that in the limit $w \ll 1$ the dependence on $w$ is extremely weak in the first regime of linear times. The same is true in the second regime of exponential times: the exponential rate $\mu$ determining the mean time to reach a solution, $t_{sol} \sim \exp(\mu N)$, does not show any visible dependence on $w$. We have therefore fixed $w = 0.05$ to carry out our numerical experiments with the quasi–greedy version of the algorithm.

FIG. 9. Mean times to reach a solution, measured in spin flips per variable, for Parallel Tempering, WalkSAT and the quasi–greedy algorithm. The exponential growth of the time to reach a solution by the quasi–greedy version of the algorithm has a rate $\mu_{QG} \approx 0.0835$ (fitted curve $21.8\,\exp(0.0835\,N)$).

In Fig. 9 we report with blue points the average time to reach a solution running the quasi–greedy algorithm. The time we report is the mean number of sweeps, where a sweep corresponds to $N$ spin flips. The average is taken over 10 different runs and the error is smaller than the symbol size. For large $N$ the data are very well fitted by an exponential growth with rate $\mu_{QG} \approx 0.0835$, significantly smaller than the WalkSAT rate $\mu_{WS} \approx 0.124$ [53].
A much more meaningful comparison is the one with the Parallel Tempering (PT) algorithm. PT is considered the state of the art for thermalization (and optimization) in the field of disordered systems, as it can deal efficiently with very rough energy landscapes. For spin glass models, which are similar to the problem we are studying here, it has recently been shown that PT performs comparably to Population Annealing (PA) [54], another standard algorithm to optimize complex energy functions. We prefer to study the performance in finding a solution of PT rather than PA because in PT it is straightforward to compute the time to solution once the temperature scheduling is fixed, while in PA one needs to increase the size of the population during the run: if the solution is not found, a new run with a larger population must be made, and comparing population sizes and running times requires some extra care.

We have run PT with the optimal temperature scheduling derived in Appendix B. The results, in terms of Monte Carlo sweeps per replica, are shown in Fig. 9. The comparison with the quasi–greedy algorithm is very favorable to the latter: the growth rate for PT is slightly larger than $\mu_{QG}$ (but still decreasing with $N$, and it could eventually converge to the same value) and the prefactor for PT is larger by two orders of magnitude than the one for the quasi–greedy algorithm.

E. Quantum annealing
Originally proposed twenty years ago [17, 55], Quantum Annealing (QA) is a quantum algorithm designed to solve classical optimization problems by exploiting the adiabatic theorem of quantum mechanics [56, 57]. First, we encode the given classical problem in a problem Hamiltonian $H_P$, so that solutions correspond to its ground states. The problem Hamiltonian typically has the form of a cost function with all terms commuting among themselves (in this sense it is a classical problem); a convenient and customary choice is to have the problem Hamiltonian diagonal in the $\sigma^z$ basis. Then, we choose a fluctuation Hamiltonian $H_F$, an arbitrary operator that should provide "quantum fluctuations" and must have a known and simple ground state. A popular choice is a uniform field in the $\sigma^x$ direction:
$$H_F = \sum_i \sigma^x_i, \tag{17}$$
where the sum runs over all spin variables in the system. The Quantum Annealing algorithm then consists in time-evolving the ground state of
$$H(t) \equiv \frac{t}{T} H_P + \left(1 - \frac{t}{T}\right) H_F \tag{18}$$
from $t = 0$ to $t = T$. For a long enough annealing time $T$, the adiabatic theorem guarantees that a system initially prepared in the ground state of $H(0) = H_F$, known by construction, will evolve into a state belonging to the ground-state manifold of $H(T) = H_P$. Measuring this state in the $\sigma^z$ basis we obtain a solution to the original problem. Notice that choosing the fluctuation Hamiltonian as in Eq. (17) guarantees that the initial ground state has finite overlap with every state in the computational basis; the algorithm will not miss a solution merely because the corresponding state had no initial amplitude.
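The interpolation in Eq. (18) is easy to reproduce numerically for toy sizes. Below is a minimal numpy sketch, independent of the QuSpin code used for the actual numerics in this paper; the 4-spin clause list and the all-$+1$ couplings are an arbitrary illustrative choice, not an instance from the paper:

```python
import numpy as np

I2 = np.eye(2)
sx = np.array([[0., 1.], [1., 0.]])
sz = np.array([[1., 0.], [0., -1.]])

def site_op(single, site, n):
    """Embed a single-spin operator at position `site` of an n-spin system."""
    out = np.array([[1.]])
    for i in range(n):
        out = np.kron(out, single if i == site else I2)
    return out

n = 4
clauses = [(0, 1, 2), (1, 2, 3), (0, 2, 3)]   # toy 3-spin clauses, all J_a = +1

H_F = sum(site_op(sx, i, n) for i in range(n))               # fluctuations, Eq. (17)
H_P = 0.5 * sum(np.eye(2**n)
                - site_op(sz, i, n) @ site_op(sz, j, n) @ site_op(sz, k, n)
                for (i, j, k) in clauses)                    # XORSAT cost

def H(s):
    """Annealing Hamiltonian of Eq. (18), with s = t/T."""
    return s * H_P + (1 - s) * H_F

# instantaneous gap between ground and first excited state along the path
gaps = []
for s in np.linspace(0.0, 1.0, 101):
    ev = np.linalg.eigvalsh(H(s))
    gaps.append(ev[1] - ev[0])

print(round(np.linalg.eigvalsh(H(0.0))[0], 6))  # -4.0: ground energy of H_F
print(round(np.linalg.eigvalsh(H(1.0))[0], 6))  # 0.0: the toy instance is satisfiable
```

For this toy instance the final ground state happens to be doubly degenerate, so the gap closes exactly at $s = 1$; the quantity controlling the adiabatic running time is the minimum gap in the interior of the path.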
The adiabatic theorem also provides a lower bound on $T$: for the algorithm to succeed, the annealing time should satisfy
$$T \gg \frac{\max_s |\langle \psi_1(s) | \partial_s H(s) | \psi_0(s) \rangle|}{\min_s \Delta(s)^2}, \qquad s \equiv t/T, \tag{19}$$
where $|\psi_0(s)\rangle$ is the instantaneous ground state, $|\psi_1(s)\rangle$ is the first excited state, and $\Delta(s)$ is the energy gap between them.

We have already encountered a classical Hamiltonian encoding XORSAT in Eq. (3). The quantum mechanical version is simply
$$H_P = \frac{1}{2} \left( N - \sum_{a=1}^{N} \prod_{i \in \partial a} \sigma^z_i \right). \tag{20}$$
In our numerics we use this as the problem Hamiltonian and a uniform transverse field as the fluctuation one. Once we fix the annealing time $T$ and the initial state $|\psi(0)\rangle$, QA will end up in some final state $|\psi(T)\rangle$. Let $\varepsilon$ be the probability of measuring, in this state, an energy equal to the ground-state energy, which in our model is $E_{GS} = 0$. As $T$ increases, $\varepsilon$ approaches unity. The complexity of the algorithm is not expected to change with $\varepsilon$, as long as $\varepsilon = O(1)$. In fact, since $\varepsilon$ is the probability to find the ground state after a time $T$, one can enhance this probability by repeating the algorithm: a success probability of $\varepsilon$ after two repetitions becomes $1 - (1 - \varepsilon)^2 = 2\varepsilon - \varepsilon^2$, and even a small $\varepsilon$ can be brought close to 1 after $n = O(1/\varepsilon)$ repetitions. This repetition, or restart, technique is commonly used in algorithms for CSP [2], like WalkSAT. One commonly adopted definition of the time complexity of QA therefore inverts the relation between $\varepsilon$ and the annealing time: one fixes $\varepsilon$ and $N$, and asks what is the corresponding annealing time $T(N, \varepsilon)$. For large enough $N$ it is reasonable to expect an exponential scaling of the form
$$T(N, \varepsilon) \sim A(N, \varepsilon) \exp(\mu N), \tag{21}$$
where the prefactor $A(N, \varepsilon)$ is allowed to have a polynomial dependence on $N$, while the rate $\mu$ does not depend on $\varepsilon$.
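The restart argument can be made concrete in a few lines (a generic illustration with a hypothetical helper, not tied to the paper's data): with independent runs of per-run success probability $\varepsilon$, the success probability after $n$ runs is $1 - (1-\varepsilon)^n$, so any target confidence is reached in $O(1/\varepsilon)$ runs.

```python
import math

def runs_needed(eps, target):
    """Smallest n such that 1 - (1 - eps)**n >= target,
    for independent restarts with per-run success probability eps."""
    return math.ceil(math.log(1 - target) / math.log(1 - eps))

# amplification: even eps = 0.1 reaches 99% confidence in O(1/eps) runs
print(runs_needed(0.1, 0.99))            # 44
print(round(1 - (1 - 0.1) ** 2, 3))      # 0.19 = 2*eps - eps**2 for eps = 0.1
```

Since the number of restarts enters only as a prefactor, the rate $\mu$ in Eq. (21) is indeed independent of the chosen $\varepsilon$.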
In practice, $T(N, \varepsilon)$ is rarely estimated from real-time unitary evolution, since the required computational power quickly becomes unmanageable as the number of spins grows above $N = 20$. There are two common strategies adopted to sidestep this problem: estimate the energy gap between the ground and first excited state via Exact Diagonalization or Quantum Monte Carlo [19, 42, 43] and invoke the adiabatic theorem to impose a bound on $T(N, \varepsilon)$, or perform the evolution in imaginary time, assuming that the scaling of the imaginary-time analog of $T(N, \varepsilon)$ is the same as that of the original quantity [18]. Both methods find exponential scaling of the time for hard classical problems, a fact which has been connected with the order of the thermodynamic transition [42] (however, see [58]). While both methods provide reasonable bounds on the scaling of the annealing time, it is unclear how the entropy of excited states affects those estimates. Since entropic effects are the primary focus of this work, we decided to perform real-time evolution to minimize confounding factors, even if this means limiting the simulations to moderate sizes. We integrated the time-dependent Schrödinger equation via an explicit high-order adaptive Runge–Kutta method [59] implemented in the QuSpin library [60].

The results are presented in Fig. 10: it is clear that instances with a unique solution (dark blue dots) are harder than instances with multiple solutions (lighter blue dots), but not exponentially so.

FIG. 10. Annealing time $T$ as a function of instance size $N$ and solution probability $\varepsilon$, for $\varepsilon = 0.99$, $0.9$ and $0.1$ (the three panels give fitted rates $\mu = 0.24 \pm 0.03$, $0.24 \pm 0.03$ and $0.26 \pm 0.01$, respectively). Every dot corresponds to an instance, with lighter color indicating a more degenerate ground state. Notice how the rate $\mu$ is insensitive to the target probability, as expected from the restart argument given in the main text. The red line is a linear fit performed on the median value of the annealing time, using the standard deviation of $\log(T)$ as error bar.

FIG. 11. Lowest 30 levels in the spectrum as a function of the transverse field strength $t/T$ for an instance with nondegenerate ground state ($N = 13$, $d = 1$). This corresponds to a dark blue point in Figure 10.

For any choice of the solution probability $\varepsilon$, the growth rate $\mu$ is compatible with
$$\mu = 0.25 \pm 0.07. \tag{22}$$
This value is higher than the one predicted in Ref. [19].

IV. CONCLUSIONS
In Figs. 11 and 12 we compare the spectrum of a typical instance with a unique solution to the spectrum of an instance with an eight-fold degenerate ground state, as a function of the transverse field. When the ground state is unique (Fig. 11), one of the states in the first excited manifold peels off and has an avoided crossing with the ground state. The performance of the Quantum Adiabatic Algorithm is limited by the size of this gap.

FIG. 12. Lowest 30 levels in the spectrum as a function of the transverse field strength $t/T$ for an instance with eightfold degenerate ground state ($N = 13$, $d = 8$). This corresponds to a white point in Figure 10.

When the ground state is degenerate (Fig. 12) the picture is qualitatively different: as the transverse field is turned off, a few excited states go through a cascade of avoided crossings, eventually ending in the ground-state manifold. In this case, knowledge of the minimum gap between the ground state and the first excited state is not enough to understand the scaling of the annealing time: we should know how many states end up in the ground-state manifold, and which avoided crossings they go through. Typically, these states start high in the spectrum, where there are many crossings and where entropy is important. Such instances are not naturally described by the picture of trapped states which tunnel to the solution (as in a single level crossing), and we cannot avoid noticing the similarity with the phenomenon of entropic barriers, in the sense that only a "lucky" sequence of flips from one state to the other can lead to the ground-state manifold. And this, too, occurs with exponentially small probability.

Last, we must notice that the threshold energy for this problem is $e_{marg} \simeq 0.02$, so a more quantitative understanding of the difference between excited states below and above the threshold energy in the context of Quantum Annealing requires studying instances with $N \simeq 100$ or larger. For the small sizes we can currently study via exact integration of the Schrödinger equation, the ground-state degeneracy seems the best indicator for identifying hard instances.

For rCSP problems, some algorithms encounter entropic rather than energetic barriers. This happens, in particular, when the algorithm works at energies higher than the threshold energy of the given problem and finds the ground state through a rare fluctuation which picks the right sequence of $O(N)$ spin flips among the exponentially many. We have shown this explicitly with a family of almost-greedy algorithms which includes WalkSAT. These algorithms evolve in a time $t = O(N)$ to their equilibrium state (this part of the evolution can be followed by a system of coupled differential equations), and must then benefit from a rare stochastic fluctuation to find the ground state. In the case of WalkSAT we found the fluctuation rate, and hence the time to find a solution, with a simple Brownian-motion (and corresponding Fokker–Planck) analysis of the algorithm. This was possible precisely because the algorithm's equilibrium state lies at high energy, so the region where strong correlations exist between the variables is never explored.

We have also introduced a new class of quasi-greedy algorithms which are not able to jump over large energy barriers. We have shown that algorithms belonging to this class are very effective both in reaching the threshold energy in times $O(N)$ and in reaching the ground state through rare fluctuations in times $O(\exp(\mu N))$, with an optimal value of the exponent $\mu$. Given that these algorithms cannot jump over large energy barriers, they provide direct evidence that the most effective search for solutions in hard XORSAT problems is limited mainly by entropic barriers.
This is likely to be the case in many other hard optimization problems. Very recently, an algorithm from this class of quasi-greedy searches has been used in a challenge whose participants were asked to find the ground state of a 3-regular 3-XORSAT problem [61]. At present, a highly optimized version of this quasi-greedy algorithm, running on Nvidia V100 GPUs, is leading the challenge [62], supporting our claim that the quasi-greedy algorithms introduced here are very effective, as the real barrier is entropic in nature.

We have shown, by integration of the time-dependent Schrödinger equation, that a similar situation is encountered by quantum annealing algorithms. If one requires finding the solution of the problem with $O(1)$ probability as $N$ grows, the same growth rate of the solution time is found for problems with one solution (for which one can apply the adiabatic theorem connecting the gap with the time, as in previous studies [63]) and for those with many solutions, for which most of the action occurs at finite energy density, where entropy dominates. Here, in particular, it is not difficult to draw the parallel with the difficulties attributed to localization phenomena [20], where many small avoided crossings occur between states which are $O(N)$ spin flips apart.

Therefore we conclude that, in situations in which algorithms work in regions of phase space where trapping in local minima is not a real problem, the real obstacle is entropic barriers, which take on for themselves the task of making the solution time exponential. The fact that two such phenomena, apparently so different from each other, can trade places in keeping the problem hard is a fascinating and, we believe, not widely appreciated aspect of complexity theory. That quantum algorithms might suffer from a similar trade-off is, if possible, even more surprising.

ACKNOWLEDGMENTS
This work was supported by the European Research Council under the European Union's Horizon 2020 research and innovation program (Grant No. 694925, Lotglassy). We warmly thank Guilhem Semerjian for the careful reading and suggestions.
Appendix A: Counting blocked configurations andthe size of their basins of attraction
The greedy version of the algorithm we have introduced has many possible stopping configurations. Indeed, any configuration where each variable participates in at most one violated interaction is a local minimum of the energy function and can thus block the greedy dynamics. We call these configurations blocked because, viewing the dynamics as the evolution of energy defects that can only annihilate in pairs, in a blocked configuration the energy defects cannot evolve, since they are isolated.

We start from the observation that all greedy dynamics tend to a threshold energy around 0.11; more precisely, in Fig. 13 we plot the energy of the stopping configurations for several greedy dynamics as a function of the ratio of the two weights (the only degree of freedom of the algorithm once the remaining weights are fixed to zero).

FIG. 13. Mean energy of the stopping configurations for the greedy dynamics as a function of the ratio of the two weights.

It is thus natural to ask whether the stopping energy of the greedy dynamics can be related to the entropy of blocked configurations. To compute the latter we use the replica-symmetric cavity method, that is, the Bethe approximation. We consider the dual lattice of a 3-regular random 3-hypergraph, which is again a 3-regular random 3-hypergraph, where the variables are now the constraints, and we assign variables $n_i \in \{0, 1\}$ indicating whether a constraint is satisfied ($n_i = 0$) or not ($n_i = 1$). The interaction among triplets of constraints (those sharing a variable in the original model) forbids any configuration with more than one violated constraint per triplet ($n_i + n_j + n_k \le 1$). We parametrize configurations by the density of violated constraints ($n_i = 1$), which we call $\rho$.

FIG. 14. Entropy $S$ of blocked configurations as a function of the energy $\rho$.
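The notion of a blocked configuration can be illustrated with a short self-contained sketch (a toy implementation with hypothetical helper names, not the code used for the paper's measurements). On a degree-3 instance, flipping a spin that touches $k$ violated clauses changes the energy by $3 - 2k$, so a strictly greedy dynamics moves only when $k \ge 2$ and stops exactly on a blocked configuration:

```python
import random

def random_3reg_3xorsat(n, seed=0):
    """n spins, n clauses, each spin in exactly 3 clauses (configuration model)."""
    rng = random.Random(seed)
    while True:
        stubs = [i for i in range(n) for _ in range(3)]
        rng.shuffle(stubs)
        clauses = [tuple(stubs[3 * a:3 * a + 3]) for a in range(n)]
        if all(len(set(c)) == 3 for c in clauses):   # reject repeated spins
            return clauses, [rng.choice((-1, 1)) for _ in range(n)]

def violated(clauses, J, s):
    return {a for a, (i, j, k) in enumerate(clauses) if s[i] * s[j] * s[k] != J[a]}

def greedy_descent(clauses, J, s, seed=0):
    """Flip spins only when the flip strictly lowers the energy; for degree-3
    spins this means flipping iff the spin touches >= 2 violated clauses."""
    rng = random.Random(seed)
    adj = [[] for _ in s]
    for a, c in enumerate(clauses):
        for i in c:
            adj[i].append(a)
    while True:
        bad = violated(clauses, J, s)
        movable = [i for i in range(len(s)) if sum(a in bad for a in adj[i]) >= 2]
        if not movable:
            return s          # blocked: every spin touches at most one defect
        s[rng.choice(movable)] *= -1

n = 60
clauses, J = random_3reg_3xorsat(n, seed=1)
rng = random.Random(2)
s = greedy_descent(clauses, J, [rng.choice((-1, 1)) for _ in range(n)])
bad = violated(clauses, J, s)
# at the stopping point every defect is isolated
isolated = all(sum(i in clauses[a] for a in bad) <= 1 for i in range(n))
```

The energy decreases by at least one at every flip, so the descent always terminates, and by construction the stopping configuration satisfies the blocked property discussed above.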
Under the Bethe approximation the joint probability distribution can be factorized as
$$P(\{n_i\}) \simeq \prod_{i=1}^{N} p(n_i) \prod_{(ijk)} \frac{p(n_i, n_j, n_k)}{p(n_i)\, p(n_j)\, p(n_k)}, \tag{A1}$$
where the second product runs over the $N$ randomly chosen triplets forming the 3-regular 3-hypergraph. Given that the random hypergraph is regular, we can assume the one-particle and three-particle marginal probabilities, $p(n_i)$ and $p(n_i, n_j, n_k)$, to be site independent. These can be written explicitly in terms of the energy $\rho$ as
$$p(n) = \begin{cases} 1 - \rho & \text{if } n = 0 \\ \rho & \text{if } n = 1 \end{cases} \tag{A2}$$
$$p(n_1, n_2, n_3) = \begin{cases} 1 - 3\rho & \text{if } n_1 + n_2 + n_3 = 0 \\ \rho & \text{if } n_1 + n_2 + n_3 = 1 \end{cases} \tag{A3}$$
From the Bethe approximation in Eq. (A1) the entropy of blocked configurations can be written as
$$S = -\sum_{n_1, n_2, n_3} p(n_1, n_2, n_3) \log p(n_1, n_2, n_3) + 2 \sum_{n} p(n) \log p(n) = -(1 - 3\rho)\log(1 - 3\rho) - \rho \log \rho + 2(1 - \rho)\log(1 - \rho), \tag{A4}$$
which is plotted in Fig. 14. $S$ is non-negative for $\rho \le \rho^\star \approx 0.316$. The greedy dynamics could in principle stop at any energy in the interval $[0, \rho^\star]$, but we cannot estimate the stopping energy from $S$ alone, because each blocked configuration has a basin of attraction whose size matters as much as the entropy of blocked configurations.

Given the important role of the basins of attraction in predicting the large-time limit of the relaxation dynamics, we have measured their sizes in problems of small size. We expect basin sizes to grow exponentially with $N$ and thus define $b = \log(\text{size of basin})/N$.

FIG. 15. The size of the basin of attraction grows exponentially with the size and the depth of the energy minimum. Finite-size effects are evident only at the lowest energies.

We report in Fig. 15 the results for $b$ as a function of the energy of the blocked configuration for different sizes. The data have been averaged over different samples and different blocked configurations at fixed energy.
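The entropy (A4) and the location of its nontrivial zero $\rho^\star$ can be evaluated in a few lines (a sketch; the bracketing interval for the bisection is chosen by hand):

```python
import math

def S(rho):
    """Bethe entropy (A4) of blocked configurations at violated-clause density rho."""
    def xlogx(x):
        return 0.0 if x <= 0 else x * math.log(x)
    return -xlogx(1 - 3 * rho) - xlogx(rho) + 2 * xlogx(1 - rho)

# bisection for the nontrivial zero rho* of S, bracketed in (0.2, 1/3)
lo, hi = 0.2, 1 / 3 - 1e-9
for _ in range(80):
    mid = (lo + hi) / 2
    if S(mid) > 0:
        lo = mid
    else:
        hi = mid
rho_star = (lo + hi) / 2
print(round(rho_star, 3))   # 0.316
```

Note that $\rho \le 1/3$ is enforced automatically: for $\rho > 1/3$ the weight $1 - 3\rho$ of the defect-free triplet would become negative.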
It is remarkable that even for such small sizes the data show a rather weak size dependence and are thus reliable. The dotted line in the figure is just a guide for the eye, meant to convince the reader that the observed $b(e)$ is not far from a linear behavior up to the energy of the most numerous configurations, $e_{max} = 1/2$. According to this linear behavior, the logarithm of the size of a basin of attraction grows linearly with the depth of the energy minimum (the blocked configuration). The simplest picture compatible with these data is the one schematically represented in Fig. 16.
FIG. 16. A schematic picture of the energy landscape in the 3-regular 3-XORSAT problem. Most energy minima (including the solutions to the problem) have a basin of attraction which is likely to be connected to the energy level $e_{max}$ where the dynamics starts. However, the entropic barrier makes it very unlikely to choose the pit leading to a solution.

FIG. 17. Combining the entropy of blocked configurations $S$ and the log-size of the basins of attraction $b$, we can compute the rate of the large-deviation probability of finding a blocked configuration of a given energy (shifted by $\log 2$ because it is not normalized). The maximum is achieved at an energy close to 0.11 (marked by a vertical dotted line), where the greedy algorithm converges in the large $N$ limit.

In the picture of Fig. 16, each energy minimum corresponds to a pit whose edge is close to $e_{max}$, such that the size of the pit grows exponentially with its depth, that is, with the distance from the edge. According to this simplified picture, every pit is in principle accessible from the initial configuration (which typically has energy $e_{max}$). However, the pits leading to a solution correspond to a tiny minority of the configurations at $e_{max}$, and it is thus extremely unlikely that the dynamics enters one of them: this is essentially the origin of the entropic barrier.

Combining the number of energy minima, $\exp(N S)$, with their sizes, $\exp(N b)$, we can compute the large-deviation rate to find one of the energy minima as a function of their energy. This is shown in Fig. 17, where indeed the maximum is achieved at $e \simeq 0.11$ (marked by a vertical dotted line), which is the typical energy reached by the greedy algorithm in the large $N$ limit. For finite values of $N$, these data also provide the exponential rate to find a solution via the greedy algorithm: this is just the probability of randomly choosing one of the pits having its bottom at $e = 0$, and corresponds to $\mu = \max_e (S + b) - (S + b)|_{e=0}$.

Appendix B: An optimal temperature scheduling for Parallel Tempering
The main problem in setting up an optimized PT is the choice of the temperature set. In the present model, however, we are in a lucky situation, because we know that 3-regular random 3-XORSAT has no thermodynamic phase transition, so that, after convergence, the energy sampled by PT must be that of the paramagnetic solution, $e(\beta) = [1 - \tanh(\beta/2)]/2$. In the large $N$ limit we can assume that the extensive energy at inverse temperature $\beta$ is a Gaussian variable with mean $E(\beta) = N e(\beta)$ and variance $\sigma^2(\beta) = -N e'(\beta)$. This Gaussianity assumption (which is rather well satisfied, except in the vicinity of the ground state) allows us to compute the probability of swapping two replicas at temperatures $\beta_1$ and $\beta_2$, which has to be a function of the ratio of the energy difference to the energy fluctuation,
$$p_{swap} = f\left( \frac{E(\beta_1) - E(\beta_2)}{\sqrt{\sigma^2(\beta_1) + \sigma^2(\beta_2)}} \right).$$
In the large $N$ limit, and up to factors of order one that can be absorbed into the definition of $r$ below, such a ratio can be written as
$$\frac{E(\beta_1) - E(\beta_2)}{\sqrt{\sigma^2(\beta_1) + \sigma^2(\beta_2)}} \simeq \frac{N |e'(\beta)| \Delta\beta}{\sqrt{N |e'(\beta)|}} = \Delta\beta \sqrt{N |e'(\beta)|}.$$
The explicit form of the function $f$ is
$$f(z) = \int dx\, dy\, \frac{e^{-x^2/2 - (y-z)^2/2}}{2\pi} \min\left(1, e^{z(x-y)}\right).$$
The best way to allow replicas to wander quickly between temperatures is to fix a constant $p_{swap}$ between any pair of successive temperatures, that is, to use temperature intervals
$$\Delta\beta = \frac{f^{-1}(p_{swap})}{\sqrt{N |e'(\beta)|}},$$
which in the large $N$ limit implies the following recursive equation for the temperatures to be used in the optimized Parallel Tempering:
$$\beta_{n+1} = \beta_n + \frac{2r}{\sqrt{N [1 - \tanh^2(\beta_n/2)]}},$$
where $r = f^{-1}(p_{swap})$. The solution to the above recursive equation converges in the large $N$ limit to the following set of temperatures,
$$\beta_n = 2\, \mathrm{arcsinh}\left[ \tan\left( \frac{r n}{\sqrt{N}} \right) \right], \tag{B1}$$
with the index $n$ running up to $\lfloor \sqrt{N}\, \pi / (2r) \rfloor$.

The optimal value of $r$ is the one minimizing the mean traveling time between the extremal temperatures, which is proportional to $[r^2 f(r)]^{-1}$. The minimum is achieved at $r_{opt} \approx 1.68$, which corresponds to an optimal swapping rate $f(r_{opt}) \approx 0.23$ (this result is well known as the 0.23 rule). We ran all our PT simulations with the temperature set defined in Eq. (B1) with $r = r_{opt}$.

[1] S. Arora and B. Barak, Computational Complexity: A Modern Approach (Cambridge University Press, 2009).
[2] A. K. Hartmann and M. Weigt,
Phase Transitions in Combinatorial Optimization Problems, vol. 67 (Wiley Online Library, 2005).
[3] F. Ricci-Tersenghi, M. Weigt, and R. Zecchina, Physical Review E, 026702 (2001).
[4] M. Mézard, G. Parisi, and R. Zecchina, Science, 812 (2002).
[5] W. Barthel, A. K. Hartmann, M. Leone, F. Ricci-Tersenghi, M. Weigt, and R. Zecchina, Physical Review Letters, 188701 (2002).
[6] M. Mézard, F. Ricci-Tersenghi, and R. Zecchina, Journal of Statistical Physics, 505 (2003).
[7] S. Cocco, O. Dubois, J. Mandler, and R. Monasson, Physical Review Letters, 047205 (2003).
[8] A. Montanari, G. Parisi, and F. Ricci-Tersenghi, Journal of Physics A: Mathematical and General, 2073 (2004).
[9] M. Mézard, T. Mora, and R. Zecchina, Physical Review Letters, 197205 (2005).
[10] S. Mertens, M. Mézard, and R. Zecchina, Random Structures & Algorithms, 340 (2006).
[11] F. Krzakała, A. Montanari, F. Ricci-Tersenghi, G. Semerjian, and L. Zdeborová, Proceedings of the National Academy of Sciences, 10318 (2007).
[12] F. Krzakala and J. Kurchan, Physical Review E, 021122 (2007).
[13] A. Montanari, F. Ricci-Tersenghi, and G. Semerjian, Journal of Statistical Mechanics: Theory and Experiment, P04004 (2008).
[14] L. Zdeborová and F. Krzakała, Physical Review E, 031131 (2007).
[15] F. Altarelli, R. Monasson, and F. Zamponi, in Journal of Physics: Conference Series (2008), vol. 95, p. 012013.
[16] F. Krzakala, F. Ricci-Tersenghi, L. Zdeborova, R. Zecchina, E. W. Tramel, and L. F. Cugliandolo, Statistical Physics, Optimization, Inference, and Message-Passing Algorithms: Lecture Notes of the Les Houches School of Physics, Special Issue, October 2013 (Oxford University Press, 2016).
[17] E. Farhi, J. Goldstone, S. Gutmann, and M. Sipser, arXiv preprint quant-ph/0001106 (2000).
[18] G. E. Santoro and E. Tosatti, Journal of Physics A: Mathematical and General, R393 (2006).
[19] E. Farhi, D. Gosset, I. Hen, A. Sandvik, P. Shor, A. Young, and F. Zamponi, Physical Review A, 052334 (2012).
[20] B. Altshuler, H. Krovi, and J. Roland, Proceedings of the National Academy of Sciences, 12446 (2010).
[21] E. Farhi, J. Goldstone, and S. Gutmann, arXiv preprint arXiv:1411.4028 (2014).
[22] C. R. Laumann, R. Moessner, A. Scardicchio, and S. L. Sondhi, The European Physical Journal Special Topics, 75 (2015).
[23] G. Mossi and A. Scardicchio, Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 20160424 (2017).
[24] G. Mossi et al. (2017).
[25] V. N. Smelyanskiy, K. Kechedzhi, S. Boixo, S. V. Isakov, H. Neven, and B. Altshuler, Physical Review X, 011017 (2020).
[26] D. M. Basko, I. L. Aleiner, and B. L. Altshuler, Annals of Physics, 1126 (2006).
[27] C. R. Laumann, A. Pal, and A. Scardicchio, Physical Review Letters, 200405 (2014).
[28] J. Z. Imbrie, V. Ros, and A. Scardicchio, Annalen der Physik, 1600278 (2017).
[29] F. Arute, K. Arya, R. Babbush, D. Bacon, J. C. Bardin, R. Barends, R. Biswas, S. Boixo, F. G. Brandao, D. A. Buell, et al., Nature, 505 (2019).
[30] O. Dubois and J. Mandler, Comptes Rendus Mathematique, 963 (2002).
[31] S. Cocco, R. Monasson, A. Montanari, and G. Semerjian, arXiv preprint cs/0302003 (2003).
[32] M. Ibrahimi, Y. Kanoria, M. Kraning, and A. Montanari, in Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms (SIAM, 2012), pp. 760–779.
[33] D. Gamarnik and A. Jagannath, arXiv preprint arXiv:1911.06943 (2019).
[34] A. Braunstein, M. Leone, F. Ricci-Tersenghi, and R. Zecchina, Journal of Physics A: Mathematical and General, 7559 (2002).
[35] S. Franz, M. Mézard, F. Ricci-Tersenghi, M. Weigt, and R. Zecchina, EPL (Europhysics Letters), 465 (2001).
[36] F. Krzakala and L. Zdeborová, EPL (Europhysics Letters), 66002 (2010).
[37] L. Zdeborová and F. Krzakala, Physical Review B, 224205 (2010).
[38] L. Zdeborová and F. Krzakala, Advances in Physics, 453 (2016).
[39] S. Franz, M. Leone, F. Ricci-Tersenghi, and R. Zecchina, Physical Review Letters, 127209 (2001).
[40] A. Montanari and F. Ricci-Tersenghi, The European Physical Journal B - Condensed Matter and Complex Systems, 339 (2003).
[41] A. Montanari and F. Ricci-Tersenghi, Physical Review B, 134406 (2004).
[42] T. Jörg, F. Krzakala, G. Semerjian, and F. Zamponi, Physical Review Letters, 207206 (2010).
[43] V. Bapst, L. Foini, F. Krzakala, G. Semerjian, and F. Zamponi, Physics Reports, 127 (2013).
[44] G. Folena, S. Franz, and F. Ricci-Tersenghi, Physical Review X, 031045 (2020).
[45] S. S. Mannelli, G. Biroli, C. Cammarota, F. Krzakala, P. Urbani, and L. Zdeborová, Physical Review X, 011057 (2020).
[46] S. S. Mannelli and L. Zdeborová, Journal of Statistical Mechanics: Theory and Experiment, 034004 (2020).
[47] G. Semerjian and R. Monasson, Physical Review E, 066103 (2003).
[48] G. Semerjian and M. Weigt, Journal of Physics A: Mathematical and General, 5525 (2004).
[49] B. Selman, H. A. Kautz, B. Cohen, et al., Cliques, Coloring, and Satisfiability, 521 (1993).
[50] A. Montanari and G. Semerjian, Journal of Statistical Physics, 103 (2006).
[51] G. Semerjian, Journal of Statistical Physics, 251 (2008).
[52] W. Barthel, A. K. Hartmann, and M. Weigt, Physical Review E, 066104 (2003).
[53] M. Guidetti and A. Young, Physical Review E, 011102 (2011).
[54] W. Wang, J. Machta, and H. G. Katzgraber, Physical Review E, 063307 (2015).
[55] T. Kadowaki and H. Nishimori, Physical Review E, 5355 (1998).
[56] A. Ambainis and O. Regev, arXiv preprint quant-ph/0411152 (2004).
[57] A. Messiah, Quantum Mechanics: Volume II (North-Holland Publishing Company, Amsterdam, 1962).
[58] C. Laumann, R. Moessner, A. Scardicchio, and S. L. Sondhi, Physical Review Letters, 030502 (2012).
[59] J. R. Dormand and P. J. Prince, Journal of Computational and Applied Mathematics, 19 (1980).
[60] P. Weinberg and M. Bukov, SciPost Phys., 003 (2017), URL https://scipost.org/10.21468/SciPostPhys.2.1.003.
[61] M. Kowalsky, T. Albash, I. Hen, and D. Lidar, in preparation (2021).
[62] M. Bernaschi, M. Bisson, M. Fatica, E. Marinari, V. Martin-Mayor, G. Parisi, and F. Ricci-Tersenghi, arXiv preprint arXiv:2102.09510 (2021).
[63] A. Young, S. Knysh, and V. Smelyanskiy, Physical Review Letters 104, 020502 (2010).