Practical engineering of hard spin-glass instances
Jeffrey Marshall,1,2 Victor Martin-Mayor,3,4 and Itay Hen2,5

1 Department of Physics and Astronomy, University of Southern California, Los Angeles, CA 90089, USA
2 Center for Quantum Information Science & Technology, University of Southern California, Los Angeles, CA 90089, USA
3 Departamento de Física Teórica I, Universidad Complutense, 28040 Madrid, Spain
4 Instituto de Biocomputación y Física de Sistemas Complejos (BIFI), Zaragoza, Spain
5 Information Sciences Institute, University of Southern California, Marina del Rey, CA 90292, USA

∗ [email protected]
Recent technological developments in the field of experimental quantum annealing have made prototypical annealing optimizers with hundreds of qubits commercially available. The experimental demonstration of a quantum speedup for optimization problems has since then become a coveted, albeit elusive goal. Recent studies have shown that the so far inconclusive results regarding a quantum enhancement may have been partly due to the benchmark problems used being unsuitable. In particular, these problems had inherently too simple a structure, allowing both traditional resources and quantum annealers to solve them with no special effort. The need therefore has arisen for the generation of harder benchmarks which would hopefully possess the discriminative power to separate classical scaling of performance with size from quantum scaling. We introduce here a practical technique for the engineering of extremely hard spin-glass Ising-type problem instances that does not require 'cherry picking' from large ensembles of randomly generated instances. We accomplish this by treating the generation of hard optimization problems itself as an optimization problem, for which we offer a heuristic algorithm that solves it. We demonstrate the genuine thermal hardness of our generated instances by examining them thermodynamically and analyzing their energy landscapes, as well as by testing the performance of various state-of-the-art algorithms on them. We argue that a proper characterization of the generated instances offers a practical, efficient way to properly benchmark experimental quantum annealers, as well as any other optimization algorithm.
I. INTRODUCTION
Many problems of theoretical and practical relevance consist of searching for the global minimum of a cost function. These optimization problems are on the one hand notoriously hard to solve but on the other hand ubiquitous, appearing in fields as diverse as machine learning, material design, and software verification, to mention a few examples. The computational difficulties associated with solving these optimization problems stem from the intricate structure of the cost function that needs to be optimized, which often has a rough landscape with many local minima. The design of fast and practical algorithms for optimization has therefore become one of the most important challenges in many areas of science and technology [1].

Recent theoretical and technological breakthroughs have triggered an enormous interest in one such non-traditional method, commonly referred to as Quantum Annealing (QA) [2-8]. The uniqueness of this approach stems from the fact that it does not rely on traditional computational resources but rather manipulates data structures called quantum bits, or qubits, that obey the laws of quantum mechanics. It is believed that by utilizing uniquely quantum features such as entanglement, massive parallelism, and tunneling, a quantum computer can solve certain computational problems in a way that scales better with problem size than is possible on a classical machine.

A huge amount of progress has recently been made in the building of experimental quantum annealers [9, 10], the most notable of which are the D-Wave processors, consisting of hundreds of coupled superconducting flux qubits. These devices offer a very natural approach to solving optimization problems, utilizing gradually decreasing quantum fluctuations to traverse the barriers in the energy landscape in search of global optima [2, 3, 5-7].
As an inherently quantum technique, QA holds the so-far unfulfilled promise of solving combinatorial optimization problems faster than traditional 'classical' algorithms [11-14]. However, to date there is no experimental (nor theoretical) evidence that quantum annealers are capable of producing such speedups [15, 16]. Extensive studies designed to properly benchmark experimental QA processors, such as the D-Wave annealers, have for the most part yielded inconclusive results, despite accumulating evidence for the (indirect) detection of genuinely quantum effects such as entanglement and multi-qubit tunneling [17-21]. Indeed, direct comparison tests between the 128-qubit D-Wave One, and later the 512-qubit D-Wave Two (DW2), processors and classical state-of-the-art algorithms on randomly generated Ising-model instances have shown no evidence of a quantum speedup [15, 16, 19].

The above lack of evidence has motivated a few recent studies to further explore problem classes where one might expect the occurrence of quantum speedups [16, 22-24]. Katzgraber, Hamze, and Andrist [22] pointed out that the random Ising instances used in the previously mentioned comparison tests exhibit a spin-glass phase transition only at T = 0, i.e., at zero temperature. Spin glasses are disordered, frustrated spin systems that may be viewed as prototypical classically hard (also called NP-hard) optimization problems, so challenging that specialized hardware has been built to simulate them [25-27] [the related cost function is given in Eq. (1) below]. That the spin-glass transition occurs at T = 0 implies that for any T > 0 the energy landscapes of these problems are in general fairly simple; they can therefore be solved rather easily by classical heuristic solvers and do not require quantum tunneling to reach global optima, rendering these instances less than ideal for benchmarking.

A subsequent study, which examined the same class of uniformly random Ising problems on the D-Wave architecture [23], measured the correlation between the performance of the DW2 device and a physical effect referred to as temperature chaos [28-43], which has recently been identified as the culprit for the difficulties that classical thermal algorithms encounter when attempting to solve spin glasses [44, 45]. Temperature chaos implies the presence of low-lying excited states (i.e., slightly sub-optimal spin assignments) that have a large Hamming distance with respect to the minimizing assignment, or ground state (GS), of the instance. Furthermore, these excited states are not only stable against local excitations (i.e., bit flips), they also have a much larger entropy than the GS. As a consequence, classical state-of-the-art optimization algorithms such as simulated annealing or parallel tempering often get trapped in one of these excited states. Indeed, non-chaotic problem instances become exponentially unlikely as the problem size is increased [39, 42, 43]. However, while temperature-chaotic instances do exist on the relatively tiny DW2 512-bit hardware graph, they become exceedingly rare the stronger the degree of temperature chaos they exhibit, and they are therefore difficult to find [23]. Since it is these temperature-chaotic instances that are expected to have the discriminative power to separate classical scaling of performance with size from quantum scaling, a natural question arises.
Can one efficiently find or generate 'rare gem' instances, i.e., problems of small size (small enough that encoding them on a quantum device is feasible) that also display a large degree of temperature chaos, or inherent hardness? To date, several techniques for generating hard problems that go beyond the random generation of instances have been explored, such as utilizing instances with planted solutions and tunable frustration [16], the deliberate reduction of GS degeneracy using Sidon sets for the couplings [46], or the porting of fully connected Sherrington-Kirkpatrick (SK) instances [24]. However, the obtained instances were found to lack the necessary degree of inherent hardness (i.e., hard problems were still rare), which necessitated the generation of an initial huge pool of problems followed by the prohibitively expensive procedure of exhaustively 'mining' for instances presenting high degrees of inherent thermal hardness [23, 46].

In this paper, we propose an altogether different, adaptive algorithm to generate such rare-gem instances, extremely hard spin-glass instances of relatively small size, in a considerably more efficient manner than current mining techniques. The remainder of the work is organized as follows. In Sec. II we present the basics of a heuristic algorithm that generates hard spin-glass instances. In Sec. III we thermodynamically analyze the generated instances and test them against classical optimizers as well as the DW2 annealer. We apply our technique to instances with planted solutions in Sec. IV and conclude in Sec. V.
II. ENGINEERING OF HARD SPIN-GLASS BENCHMARKS
For concreteness, in what follows we apply our technique to the D-Wave Two 'Chimera' hardware graph (see Appendix A), as this allows us to experimentally test the hardness of the instances on an actual quantum annealer. It should be noted, however, that the method proposed here is far more general and may be applied to arbitrary connectivity graphs. The cost function on which we generate our instances is of the form

H = Σ_⟨i,j⟩ J_ij s_i s_j ,   (1)

where the couplings {J_ij} are programmable parameters that define the instance. The cost function H is to be minimized over the spin variables s_i = ±1, where i = 1, ..., N and N is the number of participating spins. The angle brackets ⟨i, j⟩ denote that the sum runs only over connected bits of the Chimera graph.

In this work, we treat the process of finding problem instances, i.e., sets of {J_ij} values drawn from any predetermined set of allowed values, in a way that maximizes the hardness of the problem (however hardness is defined), as an optimization problem in itself. The figure of merit, or cost function, for this optimization problem is any faithful characteristic of the inherent hardness of the instance. We call this figure of merit the time to solution, or TTS for short.

Here, we use as the TTS of a problem instance the definition of classical thermal hardness introduced in Ref. [23], namely, the characteristic number of steps it takes a parallel tempering (PT) algorithm to equilibrate. In a PT algorithm, one simulates N_T copies of an N-spin system, at temperatures T_1 < T_2 < ... < T_{N_T}, where Metropolis updates occur independently in each copy. Neighboring copies attempt to swap temperatures, with probabilities satisfying detailed balance [47].
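The PT scheme just described can be illustrated compactly. The following is a minimal, self-contained Python sketch (our own illustration, not the code used in this work): the instance is a dictionary of couplings J_ij keyed by edges, each temperature copy performs Metropolis sweeps, and neighboring copies swap with the detailed-balance probability min(1, exp[(β_k − β_{k+1})(E_k − E_{k+1})]). The temperature grid and sweep count below are illustrative only.

```python
import math
import random

def energy(J, s):
    """Ising cost H = sum over edges <i,j> of J_ij * s_i * s_j, as in Eq. (1)."""
    return sum(Jij * s[i] * s[j] for (i, j), Jij in J.items())

def metropolis_sweep(J, s, beta, nbrs, rng):
    """One full sweep of single-spin Metropolis updates at inverse temperature beta."""
    for i in range(len(s)):
        h = sum(Jij * s[j] for j, Jij in nbrs[i])  # local field acting on spin i
        dE = -2.0 * s[i] * h                       # energy change if s_i is flipped
        if dE <= 0 or rng.random() < math.exp(-beta * dE):
            s[i] = -s[i]

def parallel_tempering(J, betas, n_sweeps, seed=0):
    """Minimal PT: one copy per temperature, Metropolis sweeps plus swap moves
    between neighbouring temperatures with detailed-balance acceptance."""
    rng = random.Random(seed)
    N = 1 + max(max(e) for e in J)
    nbrs = [[] for _ in range(N)]
    for (i, j), Jij in J.items():
        nbrs[i].append((j, Jij))
        nbrs[j].append((i, Jij))
    copies = [[rng.choice((-1, 1)) for _ in range(N)] for _ in betas]
    best = min(energy(J, s) for s in copies)
    for _ in range(n_sweeps):
        for beta, s in zip(betas, copies):
            metropolis_sweep(J, s, beta, nbrs, rng)
        for k in range(len(betas) - 1):  # temperature-swap attempts
            dB = betas[k] - betas[k + 1]
            dE = energy(J, copies[k]) - energy(J, copies[k + 1])
            if rng.random() < min(1.0, math.exp(dB * dE)):
                copies[k], copies[k + 1] = copies[k + 1], copies[k]
        best = min(best, min(energy(J, s) for s in copies))
    return best
```

For example, on a frustrated four-spin cycle with couplings {(0,1): +1, (1,2): +1, (2,3): +1, (3,0): −1}, at most three of the four edges can be satisfied simultaneously, and the sketch recovers the ground-state energy −2.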
The resulting temperature random walk of each system copy allows a global traversal of configuration space, as well as a detailed exploration of local minima (at the lower temperatures). An accurate PT simulation must run longer than the temperature 'mixing' time τ [48, 49]. The time τ can be thought of as an equilibration time: the time each copy needs to explore the entire temperature mesh. Large-τ instances thus take longer to equilibrate, motivating the definition of the mixing time τ as the classical hardness.

Furthermore, we utilize the strong correlation found between the PT mixing time τ and the hardness experienced by other algorithms (consistent with the requirement of intrinsic hardness [23]). Specifically, we use as the TTS the runtime clocked by the Hamze-de Freitas and Selby (HFS) algorithm [50, 51], which has proved to be much faster than, yet strongly correlated with, the classical hardness τ [23]. Our aim in this work is the maximization of this TTS cost function, where the variables over which the maximization is performed are the coupling strengths {J_ij} of the underlying graph.

A. Random adaptive optimization (RAO)
Heuristic optimization algorithms, such as Metropolis and simulated annealing, aim to find the global minimum (equivalently, maximum) of a cost function by changing the state of the system at each step. Changes are accepted when the cost moves in the required direction; often crucially, changes in the 'wrong' direction are also still accepted with a certain probability, so as to reduce the chances of becoming stuck in local minima. This acceptance probability may be determined by defining a simulated 'temperature' parameter β (inspired by thermal annealing). We follow this approach with the cost being the TTS (which is assumed to be an indicator of inherent hardness) and the 'state' of the system being a particular configuration of the J_ij.

By picking some random 'seed' instance and modifying a subset of the J_ij (e.g., flipping the sign of a random edge), the resulting instance may be harder to solve, as determined by the TTS cost. If this is the case, we accept the modification. Repeating this process will necessarily drive the system toward harder and harder instances. If the TTS is lowered by such a modification, it may still be accepted, so as to avoid getting trapped in local minima. We utilize a Boltzmann-type acceptance probability e^(−β|ΔTTS|) [52], where the choice of β defining this distribution depends on the solver used to determine the TTS, and the manner in which one updates β during the algorithm (if at all) depends on the methodology one wishes to pursue (e.g., Metropolis, simulated annealing, etc.).

We outline our algorithm in its most basic (unoptimized) form in Algorithm 1, which generates hard signed (i.e., J_ij = ±1) instances, though we stress that the general technique can be applied in much more diverse settings. In fact, we expect that allowing the coupling constants J_ij to take on a wider range of values will in general result in harder instances being generated (compared to the J_ij = ±1 case). We note that one can also minimize the TTS in order to generate particularly easy instances. Moreover, we apply this method to instances with planted solutions in Sec. IV. In the next section we illustrate the effectiveness of our technique.

Algorithm 1
Random Adaptive Optimization (RAO)

1: procedure GenerateHardProblem
2:   Generate random ±1 instance
3:   for step = 1 to NSTEP do
4:     Pick a random edge and flip its sign
5:     Get new TTS
6:     if TTS increases then
7:       Accept change
8:     else
9:       Accept with probability e^(−β|ΔTTS|)
10:    end if
11:    Update β if required
12:  end for
13: end procedure

III. RESULTS

A. Engineering of hard spin-glass instances

For our work, we made extensive use of the version of the HFS algorithm created by A. Selby [51, 53], and have taken the average wall-clock time as our TTS cost function [54]. We adopt the notation t_HFS for this quantity (see Appendix B for specific implementation details) [55].

The performance of one typical realization of our algorithm on a 512-bit Chimera-type instance is depicted in Fig. 1. Remarkably, the final instance is just 20 successful steps away from the initial one, and the final TTS is about 25 times the initial instance's TTS. Though there does not seem to be a typical (or standard) output of the algorithm, the occurrence of plateaus is found to be fairly common. Indeed, in Fig. 1 we see that the instance remains at a plateau of about 0.25 s between 100 and 400 attempted flips. These plateaus can occasionally halt the optimization of the cost function, and as such it is important to carefully choose an appropriate simulated temperature.

In Fig. 2, we statistically quantify the merit of the generated hard instances by comparing the final TTS to the mean initial TTS (i.e., the typical TTS of a random instance) after 0 (blue), 500 (red), and 2500 (yellow, with red outline) adaptive steps on 150 instances. One immediately notices a clear separation in hardness between the completely random instances (blue) and the other two groups, even after a fairly modest number of update attempts (i.e., 500).
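Algorithm 1 admits a compact implementation. The following Python sketch is our own minimal illustration (not the authors' production code): the TTS oracle, which in this work is the HFS wall-clock time, is abstracted as an arbitrary callable `tts`, so any hardness measure can be plugged in; here β is held fixed rather than updated during the run.

```python
import math
import random

def rao(edges, tts, n_steps, beta, seed=0):
    """Random Adaptive Optimization (Algorithm 1): propose random edge-sign
    flips; accept whenever the TTS grows, otherwise accept with probability
    exp(-beta * |dTTS|). `tts` is any callable scoring instance hardness."""
    rng = random.Random(seed)
    J = {e: rng.choice((-1, 1)) for e in edges}   # random +/-1 seed instance
    cur = tts(J)
    for _ in range(n_steps):
        e = rng.choice(edges)
        J[e] = -J[e]                              # propose: flip one edge sign
        new = tts(J)
        if new >= cur or rng.random() < math.exp(-beta * abs(new - cur)):
            cur = new                             # accept the (usually harder) instance
        else:
            J[e] = -J[e]                          # reject: undo the flip
    return J, cur
```

As a toy sanity check, replacing the expensive hardness oracle with a trivial surrogate (say, the number of negative couplings) drives the instance monotonically toward the surrogate's maximum at large β, mirroring the climb toward harder instances seen in Fig. 1.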
The instances after 2500 steps are on average about 3 times harder than those after 500 steps, which are themselves about an order of magnitude above the random instances.

To obtain the reverse effect, namely instances easier than random, we have also run our algorithm with the acceptance criterion 'reversed' (i.e., step 6 in Algorithm 1), such that it favors instances with shorter TTS values. This allowed a lowering of the TTS on the DW2 Chimera by a factor of about 10 relative to randomly chosen instances (see next subsection).
FIG. 1. (Color online) Performance of the RAO algorithm for a single instance. A single run of the algorithm over 1000 steps, set up to maximize t_HFS of a 512-bit Chimera instance with J_ij = ±1. Updates consist of flipping the values of randomly picked edges. Squares (red) show successful update attempts; crosses (blue) are rejected updates.

FIG. 2. (Color online) Histogram of the ratio of final to mean initial t_HFS for 150 random signed instances (512-bit Chimera graph). As in Fig. 1, we adapted the instances by flipping random edge values, attempting to maximize t_HFS. This plot is a normalized histogram of the ratio of the final TTS, t^(f)_HFS, to the mean initial TTS, ⟨t^(i)_HFS⟩, after 0 (blue), 500 (red), and 2500 (yellow, with red outline) algorithmic steps.
B. Inherent (thermal) hardness of the generated instances
It is crucially important to demonstrate that the generated instances are not only difficult to solve for the solver with whose help they were obtained, but that they are inherently difficult, i.e., possess inherent degrees of hardness. In what follows we illustrate precisely that by measuring the thermodynamic complexity of the instances generated using our RAO algorithm.

To that aim, we have used t_HFS as both a minimizing and a maximizing cost function, to generate 100 signed (J_ij = ±1) 500-bit DW2 Chimera instances in each of four different hardness groups, or generations, classified by t_HFS falling within a narrow window [0.8, 1.2] times a group-dependent scale, with the group index k = 1, 2, 3, 4 ordered by increasing hardness; the k = 1 instances are about an order of magnitude easier than random instances. The hardest instance we have found on the studied graph, after analyzing 780 instances, is roughly 250 times harder than a typical random instance (as clocked on a 3.5 GHz single-core CPU), which to our knowledge makes it the most difficult HFS instance on the DW2 graph to date. We denote this instance by k_max.

Our first task is assessing how difficult the generated instances are for a state-of-the-art classical thermal algorithm such as PT. This question can be answered quantitatively by computing the mixing time τ of the temperature random walk of the algorithm [42, 47-49] and examining its correlation with the hardness group k. Our computation of τ follows exactly the procedure detailed in Ref. [23]. We simulated 120 system copies consisting of four independent parallel-temperature chains with 30 temperatures each. Given the similar system size, we also use the temperature grid of Ref. [23]. The Elementary Monte Carlo Step (EMCS) consisted of 10 full-lattice Metropolis sweeps, performed independently in each system copy, followed by one parallel tempering temperature-exchange sweep. A computation of τ was considered satisfactory if two conditions were met: (i) the system copy (out of the 120) that spends the least time in the hot-half region (i.e., the 15 highest temperatures) spends at least 20% of the total simulation time there; in other words, no system copy got permanently trapped in the cold-half region; (ii) the total simulation time was at least 20τ long. τ is given in units of Metropolis sweeps.

We performed three independent simulations of increasing length.
The shortest runs were enough to compute τ for the 200 problem instances belonging to the first and second hardness groups (k = 1, 2); the intermediate run sufficed for most (but not all) of the third-generation instances; and the longest run was enough to compute τ for all the third-generation instances and for 86 of the 100 fourth-generation instances. As a cross-check, we compared the GS energy found with parallel tempering to the one found with the HFS code. Agreement was reached in all cases (even in cases where the computation of τ was not successful).

Particularly notable are 32 instances of the hardest (k = 4) HFS group, which were found to have exceptionally large mixing times; Ref. [23] estimated that only a tiny fraction of random instances of this type reach such values of τ. This observation helps to quantify the efficiency of our algorithm: a highly optimized PT algorithm screening random instances of this type would require roughly 65 CPU hours [57] to find one such instance, whereas for (unoptimized) RAO we estimate less than 5 hours to generate one [56].

FIG. 3. (Color online)
Mixing time τ vs the ratio of typical GS-ES overlaps to GS-GS overlaps. For each instance for which PT computed τ successfully, we compute the median overlap (see text) between the ground and dominant first-excited states, normalized by the median ground-state-to-ground-state overlap. Also plotted are the median data points for each group (filled black circles); going from bottom right to upper left, the order is k = 1 to k = 4. The linear fit is over the median data points for k = 2, 3, 4. τ is given in units of Metropolis sweeps.

Having classified (most of) these 400 instances by the PT mixing time, we analyze their energy landscapes by computing the overlap between their GS configurations and their dominant first-excited states. The overlap between two spin assignments s̄_1 and s̄_2 is defined as 1 − h(s̄_1, s̄_2)/N, where h is the Hamming distance. This analysis is summarized in Fig. 3. The trend in the figure is rather clear: a large τ (and k) correlates strongly with a smaller overlap (i.e., a large Hamming distance) between the ground and dominant first-excited states; that is, the larger τ, the more difficult the problems are in a thermodynamic sense. Interestingly, the easiest HFS instances, k = 1, which were generated by minimization of the TTS, were found not to correlate as well with the other data groups. We examine this in more detail in the next subsection.

C. Algorithmic scaling
To establish the inherent hardness of the instances generated by the RAO algorithm, we have directly compared their time-to-solution t_HFS to the PT mixing time τ. This is depicted in Fig. 4. Despite the apparent fluctuations, we observe an agreement between these two vastly different solvers, which can be quantified by the dependence τ ∼ t_HFS^1.40 (as measured over the median data points of the k = 2, 3, 4 groups). Obtaining instances with mixing times as large as those of the k = 4 group by straightforward 'mining' would have required the generation and subsequent analysis of an enormous pool of randomly generated instances (as found by Ref. [23]), about 100 times more costly in terms of computational resources [56, 57]. Comparing this to the numbers quoted in the previous subsection, we see that RAO becomes even more beneficial over conventional methods as the problem-difficulty bar is raised, and it is very likely the only practical way to obtain a large number of such temperature-chaotic instances.

FIG. 4. (Color online)
PT-hardness (mixing time) vs HFS-hardness. The 400 instances generated as documented in the main text, examined with parallel tempering (PT). The linear fit of slope 1.40 is obtained from a least-squares fit over the median data points (black squares) of each data group, ignoring the k = 1 group. We also include, indicated by a green diamond marker, the hardest instance (k_max) found using our adaptive algorithm, extrapolated using the linear fit to obtain the corresponding value of τ. τ is given in units of Metropolis sweeps.

We perform a similar analysis using the D-Wave Two QA optimizer as the comparison platform. We define the TTS as measured by DW2, t_DW, as the anneal time divided by the probability of successfully finding the GS. To establish probabilities of success, we ran each of the 400 instances on the D-Wave processor for roughly 650,000 anneals, with individual anneal times in the range 20-40 µs. For the hardest instances according to HFS (k = 4), about 75% of the instances were not solved even once by DW2. For this reason we use the lower quartile as the representative data point in Fig. 5.

Here too, as with the PT comparison, large variations in the data are observed. Nonetheless, we see a strong correlation between the two solvers on average (as we expect), with t_DW ∼ t_HFS^2.48. That DW2 scales less favorably with HFS-hardness than the PT mixing time does suggests that the QA chip may be detrimentally affected by 'classical' causes such as the thermal hardness of the instances. The hardest HFS instance found, k_max, neatly demonstrates the capabilities of the RAO algorithm applied to this particular graph: extrapolating the fits, we estimate that instances of this type would require far more DW2 anneals than were performed in this work.

As mentioned above, the 'easiest' HFS instances (k = 1) do not seem to correlate with the other data groups (as we also see in Figs. 3 and 4).
In fact, the reason here is trivial: since t_DW has a minimum equal to the median annealing time, 30 µs = 3 × 10^-5 s in this work, the easiest instances accumulate at this value, as seen in Fig. 5. We believe there may be an equivalent scenario for PT, i.e., a practical lower bound on τ (though quantifying such a bound is non-trivial).

FIG. 5. (Color online)
DW2-hardness vs HFS-hardness.
We plot the D-Wave TTS, t_DW, against t_HFS for the 400 problem instances, as explained in the main text. Note that there are fewer blue data points (k = 4) than for the other groups; this is because many of these instances were not solved even once by the D-Wave machine in 660,000 attempts and so are left off this graph. The linear fit of slope 2.48 is obtained from a least-squares fit over the lower-quartile data point (black squares) of each data group, excluding the k = 1 group. We also include, indicated by a green diamond marker, the hardest instance (k_max) found using our adaptive algorithm, extrapolated using the linear fit to obtain the corresponding value of t_DW.

The above correlation analyses all suggest that instances generated using our RAO algorithm are indeed inherently more difficult optimization problems than randomly generated ones. In particular, RAO problems are more akin to rare events (i.e., the instances displaying the strongest temperature chaos in a large set of randomly chosen problems). For example, the hardest RAO instances, indexed here by k = 4, are in general harder both for the D-Wave Two quantum annealer and for parallel tempering algorithms. Moreover, we see (Fig. 3) that these instances are thermodynamically more difficult, with larger Hamming distances between the ground and dominant first-excited states.

IV. HARD BENCHMARKS WITH KNOWN GROUND STATE CONFIGURATIONS
The generation of instances with random couplings does not, in general, allow us to know the GS energy of the instances with certainty, an important feature when carrying out comparison tests. This section is therefore devoted to another adaptive technique, building on work from Ref. [16], which generates hard instances but also, crucially, allows knowledge of the GS energy without having to resort to exact solvers.

We apply our method to Ising-type instances with planted solutions, an idea borrowed from constraint satisfaction (SAT) problems [16, 58, 59]. Instances of this type are constructed around some arbitrary solution by splitting the full graph into smaller subgraphs, i.e., the Hamiltonian is written as a sum of small subgraph Ising Hamiltonians, H = Σ_{j=1}^{M} H_j. The coupling values of each sub-Ising Hamiltonian are chosen so that the planted solution is a simultaneous GS of all the H_j, and therefore also a GS of the total Hamiltonian H. This knowledge circumvents the need for exact (provable) solvers, which rapidly become too expensive computationally as the number of variables grows, and as such is very suitable for benchmarking. In what follows, we choose our subgraph Hamiltonians to be randomly placed frustrated cycles, or loops, along the edges of the hardware graph [16], such that no configuration of the variables simultaneously minimizes all terms in the cost function (see Fig. 1 of [16] for examples of Hamiltonian loops on the DW2 graph). This frustration is known to often cause classical algorithms to get stuck in local minima, since the global minimum of the problem satisfies only a fraction of the Ising couplings and/or local fields [44, 45]. Unlike the signed J_ij = ±1 case, here we place M random sub-Hamiltonian loops (either frustrated or not) on the graph, each of which respects the planted solution.
This method allows us to easily calculate the GS energy as the sum of the individual loop energies evaluated at the planted solution. At variance with the RAO method, update attempts now involve Hamiltonian loops instead of single random edges: we remove a random loop from the instance and add a new random loop, keeping track of the GS energy and ensuring that the new loop respects the planted solution.

Algorithm 2
Loop Adaptive Optimization (LAO)

1: procedure GenerateFrustratedProblem
2:   Generate (random) planted solution
3:   Place M random loops on the graph, each respecting the planted solution
4:   Calculate GS energy
5:   for step = 1 to NSTEP do
6:     Remove a random loop from the current instance
7:     Pick a new random loop and add it, respecting the planted solution
8:     Get new TTS
9:     if TTS increases then
10:      Accept change; update GS energy
11:    else
12:      Accept with probability e^(−β|ΔTTS|); update GS energy if accepted
13:    end if
14:  end for
15: end procedure
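To make the planted-loop construction that LAO manipulates concrete, the following Python sketch (our own minimal illustration, not the authors' code) builds an instance out of frustrated cycles that each admit a fixed planted configuration as a ground state: along each cycle, every coupling is set to 'agree' with the planted spins except one deliberately violated edge, so at most all-but-one edge of any cycle can be satisfied. Since the planted configuration minimizes every loop simultaneously, its energy is the global minimum by construction. Loop weights are all ±1 here for simplicity.

```python
import random

def planted_loop(cycle, s_star, rng):
    """Frustrated cycle Hamiltonian whose ground states include s_star:
    every edge agrees with s_star except one randomly chosen violated edge."""
    edges = list(zip(cycle, cycle[1:] + cycle[:1]))
    frustrated = rng.randrange(len(edges))
    J = {}
    for k, (i, j) in enumerate(edges):
        sign = +1 if k == frustrated else -1      # -1: satisfied at s_star
        key = (i, j) if i < j else (j, i)         # canonical edge ordering
        J[key] = sign * s_star[i] * s_star[j]
    return J

def build_instance(n_spins, cycles, seed=0):
    """Sum of planted frustrated loops; couplings add where loops overlap.
    Returns the couplings, the planted solution, and its (ground-state) energy."""
    rng = random.Random(seed)
    s_star = [rng.choice((-1, 1)) for _ in range(n_spins)]  # planted solution
    J = {}
    for c in cycles:
        for e, v in planted_loop(c, s_star, rng).items():
            J[e] = J.get(e, 0) + v
    gs_energy = sum(v * s_star[i] * s_star[j] for (i, j), v in J.items())
    return J, s_star, gs_energy
```

Because s_star minimizes each H_j individually, H(s_star) = Σ_j min H_j ≤ H(s) for every configuration s, so the returned energy is guaranteed to be the ground-state energy of the total Hamiltonian; this can be confirmed by brute force on small instances.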
One now has many different parameters affecting the performance, e.g., the total number of loops in the instance, the ratio of different-sized loops (e.g., one can use a mix of size-4 and size-6 loops), and different (positive) weights on the loops. One can also scrutinize the position of each loop to try to maximize, e.g., the frustration (note that randomly adding loops can have the effect of canceling out frustration). Of course, similar comments about adjusting the algorithm, as mentioned in the RAO section, apply here as well.

FIG. 6. (Color online)
Histogram of the ratio of final to mean initial t_HFS for 100 planted-solution instances (504-bit Chimera graph). We compare the LAO algorithm after 2000 steps (red) to the 100 random initial (blue) planted-solution instances, each containing 350 random loops. Updates consisted of adding and removing random loops, where size-4 (size-6) loops have a probability of 0.1 (0.9) of being chosen, with integer loop weights chosen randomly in the range [1, 5]. We plot the histogram of the ratio of the final HFS TTS, t^(f)_HFS, after 0 (blue) and 2000 (red) LAO steps, to the mean initial TTS, ⟨t^(i)_HFS⟩.

In Fig. 6 we apply one such version of our LAO algorithm to 100 instances and compare the final TTS to the typical TTS of a random instance. While there is a general increase in problem difficulty relative to a random instance, it is far smaller than the equivalent figure for random signed instances (see Fig. 2). Note, however, that planted-solution problems may be tuned in numerous ways (see the previous paragraph) to provide varying degrees of hardness, thereby altering the structure of the problem (in fact, they may be tuned to be harder than random signed instances [16]). Thus the choice of parameters, which we have not optimized here, may heavily affect the performance of LAO. Nevertheless, we have demonstrated the successful application of our main algorithm to instances with planted solutions, generating instances that are about an order of magnitude more difficult.
V. SUMMARY AND CONCLUSIONS
We developed a technique to practically engineer extremely hard optimization problems, to address the challenge of generating proper benchmarks for the testing of experimental quantum annealers. This was accomplished by treating the generation of hard problem instances as an optimization problem in its own right, and subsequently devising a heuristic optimization algorithm to solve it. We demonstrated that one can successfully engineer Ising-type optimization problems with varying degrees of difficulty, defined by some suitable choice of problem hardness. To establish and confirm the inherent hardness of the instances, we measured the correlation between various independent measures of problem difficulty; in particular, from parallel-tempering configurations we computed a Hamming-distance measure between the ground and first excited states. We illustrated the ability to generate signed (i.e., J_ij = ±1) instances as well as instances with planted solutions.

VI. ACKNOWLEDGEMENTS
We thank Ehsan Khatami for useful comments and suggestions. This work was partially supported by MINECO (Spain) through Grant Nos. FIS2012-35719-C02 and FIS2015-65078-C2-1-P. Computation for the work described in this paper was partially supported by the University of Southern California's Center for High-Performance Computing (http://hpcc.usc.edu).

[1] C. H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity, Dover Books on Computer Science (Dover Publications, 2013).
[2] A. B. Finnila, M. A. Gomez, C. Sebenik, C. Stenson, and J. D. Doll, "Quantum annealing: A new method for minimizing multidimensional functions," Chemical Physics Letters, 343–348 (1994).
[3] Tadashi Kadowaki and Hidetoshi Nishimori, "Quantum annealing in the transverse Ising model," Phys. Rev. E, 5355 (1998).
[4] Roman Martoňák, Giuseppe E. Santoro, and Erio Tosatti, "Quantum annealing by the path-integral Monte Carlo method: The two-dimensional random Ising model," Phys. Rev. B, 094203 (2002).
[5] Giuseppe E. Santoro, Roman Martoňák, Erio Tosatti, and Roberto Car, "Theory of quantum annealing of an Ising spin glass," Science, 2427–2430 (2002).
[6] J. Brooke, D. Bitko, T. F. Rosenbaum, and G. Aeppli, "Quantum annealing of a disordered magnet," Science, 779–781 (1999).
[7] Edward Farhi, Jeffrey Goldstone, Sam Gutmann, Joshua Lapan, Andrew Lundgren, and Daniel Preda, "A quantum adiabatic evolution algorithm applied to random instances of an NP-complete problem," Science, 472–475 (2001).
[8] Ben W. Reichardt, "The quantum adiabatic optimization algorithm and local minima," in Proceedings of the Thirty-sixth Annual ACM Symposium on Theory of Computing, STOC '04 (ACM, New York, NY, USA, 2004), pp. 502–510.
[9] M. W. Johnson, M. H. S. Amin, S. Gildert, T. Lanting, F. Hamze, N. Dickson, R. Harris, A. J. Berkley, J. Johansson, P. Bunyk, E. M. Chapple, C. Enderud, J. P. Hilton, K. Karimi, E. Ladizinsky, N. Ladizinsky, T. Oh, I. Perminov, C. Rich, M. C. Thom, E. Tolkacheva, C. J. S. Truncik, S. Uchaikin, J. Wang, B. Wilson, and G. Rose, "Quantum annealing with manufactured spins," Nature, 194–198 (2011).
[10] P. I. Bunyk, E. M. Hoskinson, M. W. Johnson, E. Tolkacheva, F. Altomare, A. J. Berkley, R. Harris, J. P. Hilton, T. Lanting, A. J. Przybysz, and J. Whittaker, "Architectural considerations in the design of a superconducting quantum annealing processor," IEEE Transactions on Applied Superconductivity, 1–10 (Aug. 2014).
[11] A. P. Young, S. Knysh, and V. N. Smelyanskiy, "Size dependence of the minimum excitation gap in the quantum adiabatic algorithm," Phys. Rev. Lett., 170503 (2008).
[12] Itay Hen and A. P. Young, "Exponential complexity of the quantum adiabatic algorithm for certain satisfiability problems," Phys. Rev. E, 061152 (2011).
[13] I. Hen, "Excitation gap from optimized correlation functions in quantum Monte Carlo simulations," Phys. Rev. E, 036705 (2012), arXiv:1112.2269.
[14] E. Farhi, D. Gosset, I. Hen, A. W. Sandvik, P. Shor, A. P. Young, and F. Zamponi, "Performance of the quantum adiabatic algorithm on random instances of two optimization problems on regular hypergraphs," Phys. Rev. A, 052334 (2012), arXiv:1208.3757.
[15] Troels F. Rønnow, Zhihui Wang, Joshua Job, Sergio Boixo, Sergei V. Isakov, David Wecker, John M. Martinis, Daniel A. Lidar, and Matthias Troyer, "Defining and detecting quantum speedup," Science, 420–424 (2014).
[16] I. Hen, J. Job, T. Albash, T. F. Rønnow, M. Troyer, and D. A. Lidar, "Probing for quantum speedup in spin-glass problems with planted solutions," Phys. Rev. A (2015).
[17] T. Lanting, A. J. Przybysz, A. Yu. Smirnov, F. M. Spedalieri, M. H. Amin, A. J. Berkley, R. Harris, F. Altomare, S. Boixo, P. Bunyk, N. Dickson, C. Enderud, J. P. Hilton, E. Hoskinson, M. W. Johnson, E. Ladizinsky, N. Ladizinsky, R. Neufeld, T. Oh, I. Perminov, C. Rich, M. C. Thom, E. Tolkacheva, S. Uchaikin, A. B. Wilson, and G. Rose, "Entanglement in a quantum annealing processor," Physical Review X, 021041 (2014).
[18] Sergio Boixo, Vadim N. Smelyanskiy, Alireza Shabani, Sergei V. Isakov, Mark Dykman, Vasil S. Denchev, Mohammad Amin, Anatoly Smirnov, Masoud Mohseni, and Hartmut Neven, "Computational role of collective tunneling in a quantum annealer," arXiv:1411.4036 (2014).
[19] Sergio Boixo, Troels F. Rønnow, Sergei V. Isakov, Zhihui Wang, David Wecker, Daniel A. Lidar, John M. Martinis, and Matthias Troyer, "Evidence for quantum annealing with more than one hundred qubits," Nat. Phys., 218–224 (2014).
[20] Sergio Boixo, Vadim N. Smelyanskiy, Alireza Shabani, Sergei V. Isakov, Mark Dykman, Vasil S. Denchev, Mohammad H. Amin, Anatoly Yu. Smirnov, Masoud Mohseni, and Hartmut Neven, "Computational multiqubit tunnelling in programmable quantum annealers," Nat. Commun., 10327 (2016).
[21] V. S. Denchev, S. Boixo, S. V. Isakov, N. Ding, R. Babbush, V. Smelyanskiy, J. Martinis, and H. Neven, "What is the Computational Value of Finite Range Tunneling?" arXiv:1512.02206 [quant-ph] (2015).
[22] Helmut G. Katzgraber, Firas Hamze, and Ruben S. Andrist, "Glassy chimeras could be blind to quantum speedup: Designing better benchmarks for quantum annealing machines," Physical Review X, 021008 (2014).
[23] V. Martin-Mayor and I. Hen, "Unraveling quantum annealers using classical hardness," Scientific Reports, 15324 (2015).
[24] Davide Venturelli, Salvatore Mandrà, Sergey Knysh, Bryan O'Gorman, Rupak Biswas, and Vadim Smelyanskiy, "Quantum Optimization of Fully Connected Spin Glasses," Phys. Rev. X (2015).
[25] F. Belletti, M. Cotallo, A. Cruz, L. A. Fernandez, A. Gordillo, A. Maiorano, F. Mantovani, E. Marinari, V. Martin-Mayor, A. Muñoz Sudupe, D. Navarro, S. Perez-Gaviro, J. J. Ruiz-Lorenzo, S. F. Schifano, D. Sciretti, A. Tarancon, R. Tripiccione, and J. L. Velasco (Janus Collaboration), "Simulating spin systems on IANUS, an FPGA-based computer," Comp. Phys. Comm., 208–216 (2008), arXiv:0704.3573.
[26] F. Belletti, M. Guidetti, A. Maiorano, F. Mantovani, S. F. Schifano, R. Tripiccione, M. Cotallo, S. Perez-Gaviro, D. Sciretti, J. L. Velasco, A. Cruz, D. Navarro, A. Tarancon, L. A. Fernandez, V. Martin-Mayor, A. Muñoz-Sudupe, D. Yllanes, A. Gordillo-Guerrero, J. J. Ruiz-Lorenzo, E. Marinari, G. Parisi, M. Rossi, and G. Zanier (Janus Collaboration), "Janus: An FPGA-based system for high-performance scientific computing," Computing in Science and Engineering, 48 (2009).
[27] M. Baity-Jesi, R. A. Baños, Andres Cruz, Luis Antonio Fernandez, Jose Miguel Gil-Narvion, Antonio Gordillo-Guerrero, David Iniguez, Andrea Maiorano, F. Mantovani, Enzo Marinari, Victor Martin-Mayor, Jorge Monforte-Garcia, Antonio Muñoz Sudupe, Denis Navarro, Giorgio Parisi, Sergio Perez-Gaviro, M. Pivanti, F. Ricci-Tersenghi, Juan Jesus Ruiz-Lorenzo, Sebastiano Fabio Schifano, Beatriz Seoane, Alfonso Tarancon, Raffaele Tripiccione, and David Yllanes (Janus Collaboration), "Janus II: a new generation application-driven computer for spin-system simulations," Comp. Phys. Comm., 550–559 (2014), arXiv:1310.1032.
[28] S. R. McKay, A. N. Berker, and S. Kirkpatrick, "Spin-glass behavior in frustrated Ising models with chaotic renormalization-group trajectories," Phys. Rev. Lett., 767 (1982).
[29] A. J. Bray and M. A. Moore, "Chaotic nature of the spin-glass phase," Phys. Rev. Lett., 57 (1987).
[30] J. R. Banavar and A. J. Bray, "Chaos in spin glasses: A renormalization-group study," Phys. Rev. B, 8888 (1987).
[31] I. Kondor, "On chaos in spin glasses," J. Phys. A, L163 (1989).
[32] I. Kondor and Végső, "Sensitivity of spin-glass order to temperature changes," J. Phys. A, L641 (1993).
[33] A. Billoire and E. Marinari, "Evidence against temperature chaos in mean-field and realistic spin glasses," J. Phys. A, L265 (2000).
[34] T. Rizzo, "Against chaos in temperature in mean-field spin-glass models," Journal of Physics A: Mathematical and General, 5531 (2001).
[35] Roberto Mulet, Andrea Pagnani, and Giorgio Parisi, "Against temperature chaos in naive Thouless-Anderson-Palmer equations," Phys. Rev. B, 184438 (2001).
[36] A. Billoire and E. Marinari, "Overlap among states at different temperatures in the SK model," Europhys. Lett., 775 (2002).
[37] F. Krzakala and O. C. Martin, "Chaotic temperature dependence in a model of spin glasses," Eur. Phys. J., 199 (2002).
[38] T. Rizzo and A. Crisanti, "Chaos in temperature in the Sherrington-Kirkpatrick model," Phys. Rev. Lett., 137201 (2003).
[39] G. Parisi and T. Rizzo, "Chaos in temperature in diluted mean-field spin-glass," J. Phys. A, 235003 (2010).
[40] M. Sasaki, K. Hukushima, H. Yoshino, and H. Takayama, "Temperature chaos and bond chaos in Edwards-Anderson Ising spin glasses: Domain-wall free-energy measurements," Phys. Rev. Lett., 267203 (2005).
[41] H. G. Katzgraber and F. Krzakala, "Temperature and disorder chaos in three-dimensional Ising spin glasses," Phys. Rev. Lett., 017201 (2007).
[42] L. A. Fernandez, V. Martin-Mayor, G. Parisi, and B. Seoane, "Temperature chaos in 3d Ising spin glasses is driven by rare events," EPL, 67003 (2013), arXiv:1307.2361.
[43] Alain Billoire, "Rare events analysis of temperature chaos in the Sherrington–Kirkpatrick model," J. Stat. Mech., P04016 (2014), arXiv:1401.4341.
[44] K. H. Fischer and J. A. Hertz, Spin Glasses (University of Cambridge, 1991).
[45] K. Binder and A. P. Young, "Spin glasses: Experimental facts, theoretical concepts, and open questions," Reviews of Modern Physics, 801–976 (1986).
[46] Helmut G. Katzgraber, Firas Hamze, Zheng Shu, Andrew J. Ochoa, and H. Muñoz-Bauza, "Seeking Quantum Speedup Through Spin Glasses: The Good, the Bad, and the Ugly," Phys. Rev. X, 031026 (2015).
[47] A. D. Sokal, "Monte Carlo methods in statistical mechanics: Foundations and new algorithms," in Functional Integration: Basics and Applications, edited by C. DeWitt-Morette, P. Cartier, and A. Folacci (Plenum, 1997).
[48] L. A. Fernandez, V. Martin-Mayor, S. Perez-Gaviro, A. Tarancon, and A. P. Young, "Phase transition in the three dimensional Heisenberg spin glass: Finite-size scaling analysis," Phys. Rev. B, 024422 (2009).
[49] R. Alvarez Baños, A. Cruz, L. A. Fernandez, J. M. Gil-Narvion, A. Gordillo-Guerrero, M. Guidetti, A. Maiorano, F. Mantovani, E. Marinari, V. Martin-Mayor, J. Monforte-Garcia, A. Muñoz Sudupe, D. Navarro, G. Parisi, S. Perez-Gaviro, J. J. Ruiz-Lorenzo, S. F. Schifano, B. Seoane, A. Tarancon, R. Tripiccione, and D. Yllanes (Janus Collaboration), "Nature of the spin-glass phase at experimental length scales," J. Stat. Mech., P06026 (2010), arXiv:1003.2569.
[50] Firas Hamze and Nando de Freitas, "From fields to trees," in UAI, edited by David Maxwell Chickering and Joseph Y. Halpern (AUAI Press, Arlington, Virginia, 2004), pp. 243–250.
[51] Alex Selby, "Efficient subgraph-based sampling of Ising-type models with frustration," arXiv:1409.3934 (2014).
[52] The 'Boltzmann factor' e^{−β|ΔTTS|} should not be confused with the standard one in statistical mechanics, e^{−βH}: in our case the TTS plays the role that the Hamiltonian would play in a standard problem.
[53] Alex Selby, "QUBO-Chimera," https://github.com/alex1770/QUBO-Chimera (2013).
[54] For the HFS algorithm, the most natural choice of TTS, so as to quantify the computational difficulty, is simply the physical, average solve time of an instance; due to the nature of the algorithm, it does not easily lend itself to a 'universal', system-independent measure of TTS (e.g., number of algorithmic steps). The reader is referred to Refs. [16] and [51] for more details.
[55] If one were to use a solver which suffers from intrinsic control errors (e.g., an analogue device, such as the DW annealers), i.e., encoding errors in the J_ij, one may have to perform some kind of averaging procedure to try to estimate the TTS more accurately (e.g., in the DW case, running over several different gauges [15]). The performance of our algorithm will of course be adversely affected in such a case.
[56] To generate the hardest instances, k = 4, we ran our algorithm 780 times with up to 1000 steps (that is, at least 100 of the 780 instances fell within the time interval [0.8, 1.2] s). The total CPU time this took was approximately 400 hours (though note that our code was not optimized). Also note that of these 780, about 120 have t_HFS above this interval. The other k groups took significantly less time to generate.
[57] Screening instances with PT occurs in two phases: Phase 1 screens 76800 random instances, taking ∼900 CPU hours. Phase 2 then screens the 1024 (estimated) worst-case instances for another 100 hours. Given 76800 random 500+ bit instances, we expect approximately 15 to have τ > (∼65 hours per instance).
[58] W. Barthel, A. K. Hartmann, M. Leone, F. Ricci-Tersenghi, M. Weigt, and R. Zecchina, "Hiding solutions in random satisfiability problems: A statistical mechanics approach," Physical Review Letters, 188701 (2002).
[59] Florent Krzakala and Lenka Zdeborová, "Hiding quiet solutions in random constraint satisfaction problems," Physical Review Letters, 238701 (2009).

Appendix A: D-WAVE TWO ANNEALER
The D-Wave Two (DW2) is marketed by D-Wave Systems Inc. as a quantum annealer, which evolves a physical system of superconducting flux qubits according to the time-dependent Hamiltonian

H(t) = A(t) Σ_{i∈V} σ_i^x + B(t) H_p,   t ∈ [0, t_f],   (A1)

with H_p given in Eq. (1). The annealing schedules given by A(t) and B(t) are shown in Fig. 7. Our experiments used the DW2 device housed at the USC Information Sciences Institute, with an operating temperature of ≈ 17 mK. The qubits of this device occupy the vertices of a 'Chimera' graph, whose unit cell is a K_{4,4} bipartite graph (see Fig. 8). In the ideal Chimera graph (of 512 qubits) the degree of each vertex is 6 (except for the corner unit cells). In the actual DW2 device we used, 504 qubits were functional.

FIG. 7. (Color online) Annealing schedule of the DW2 device. The annealing curves A(t) and B(t) (in GHz) are calculated using rf-SQUID models with independently calibrated qubit parameters. Units are such that ℏ = 1. The operating temperature of 17 mK (k_B T) is also shown.

Appendix B: IMPLEMENTATION DETAILS
All of our numerical results were obtained using A. Selby's (heuristic) version of HFS [51, 53] (i.e., with input settings -S3 -m1), running on a single core of an Intel Xeon CPU E5-1650 v2 running at 3.50 GHz. With these settings, the code will halt only after it has found agreement in the lowest energy for Eq. (1) n + 1 consecutive times (option -p set to n in Selby's code), over independent HFS sweeps (Selby's code technically solves Quadratic Unconstrained Binary Optimization (QUBO) problems, but the mapping from Ising problems of this sort is trivial).

We define the HFS time-to-solution, t_HFS in the main text, as t_HFS := lim_{n→∞} T_step/(n + 1), where T_step, depending on n, is the physical wallclock run time of Selby's code for a single QUBO instance (also see footnote [54]). That is, T_step is the time taken to find agreement in the lowest energy n + 1 times in a row, and as such we define this quantity as T_step := Σ_{i=0}^{n} t_i, where t_i is the time taken to find the i-th (i > 0) occurrence of the (presumed) minimum, and t_0 the time to first detect this minimum.

FIG. 8. (Color online) A 4 by 4 patch of the full 8 by 8 Chimera graph for the DW2 chip. Top left (red) shows a single K_{4,4} bipartite graph. The qubits or spin variables occupy the vertices (circles) and the couplings J_ij are along the edges. Of the 512 qubits, 504 were operative in our experiments.

In practice, to obtain a reasonable estimate of t_HFS, one should take a 'large' value for n. This, however, leads to two problems: 1) adapting the instance under RAO of course means that the wallclock runtime T_step of each RAO step increases as the problem becomes harder, meaning that for a large number of RAO steps the algorithm may take a very long time to complete; 2) we noticed that, in addition to t_HFS increasing, so do the differences |Δt_HFS|, and as such the acceptance probability, e^{−β|Δt_HFS|}, may quickly become negligible. We provide a quick and easy solution to these two problems by varying just one parameter, n. This is by no means an optimal solution, but it has enabled us to generate many hard instances with fewer computational resources than would otherwise be required.

We define a cutoff return time, T_max (we picked T_max = 3 s), such that if T_step > T_max, we reduce n (we do not let n drop below 16, and we initialize it with n = 512). This allows us to control the growth of T_step, and hence bound (at least somewhat) the total runtime of the RAO algorithm. The accuracy in the estimation of t_HFS of course decreases as n decreases; therefore one should run the final instance (i.e., the instance after adapting it as per the RAO algorithm) with some large choice of n.

In addition, we let β depend linearly on n, so that as t_HFS increases (and hence |Δt_HFS| too), β is lowered, and as such we can control the acceptance probability (again, at least somewhat). Our particular choice used to generate the 100 hardest instances (k = 4) for the results section was β = 6. · n.
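The t_HFS estimator and the adaptive (n, β) scheme above can be sketched as follows. This is our own illustrative reconstruction, not the authors' code: `solve_once` is a hypothetical stand-in for one HFS sweep, the halving of n is our choice (the text reduces n when T_step > T_max, but the exact reduction factor is not legible in our copy), and the β prefactor and its normalization are likewise illustrative.

```python
import time

def estimate_t_hfs(solve_once, n):
    """Estimate t_HFS ~= T_step / (n + 1): run independent sweeps until the
    lowest energy seen so far has been found n + 1 consecutive times.
    `solve_once` stands in for one HFS sweep and must return an energy."""
    start = time.perf_counter()
    best, streak = None, 0
    while streak < n + 1:
        e = solve_once()
        if best is None or e < best:
            best, streak = e, 1   # new minimum: restart the count
        elif e == best:
            streak += 1           # another consecutive agreement
        else:
            streak = 0            # this sweep missed the minimum
    return (time.perf_counter() - start) / (n + 1)

def adapt(n, t_step, t_max=3.0, n_min=16, beta0=6.0, n0=512):
    """Bound the cost of each RAO step: if the last step exceeded T_max
    seconds, reduce n (halving is our assumption), never going below
    n_min, and scale beta linearly with n (beta0 and n0 set the overall
    scale; both are illustrative values)."""
    if t_step > t_max:
        n = max(n_min, n // 2)
    return n, beta0 * n / n0
```

Starting from n = 512 and repeatedly applying `adapt` whenever a step overruns reproduces the qualitative behavior described in the text: T_step stays bounded at the cost of a noisier t_HFS estimate, which is why the final instance should be re-timed with a large n.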