Optimization and benchmarking of the thermal cycling algorithm

Amin Barzegar (1,2), Anuj Kankani, Salvatore Mandrà (3,4), and Helmut G. Katzgraber (5,2,1,*)

(1) Department of Physics and Astronomy, Texas A&M University, College Station, Texas 77843-4242, USA
(2) Microsoft Quantum, Microsoft, Redmond, Washington 98052, USA
(3) Quantum Artificial Intelligence Laboratory (QuAIL), NASA Ames Research Center, Moffett Field, California 94035, USA
(4) KBR, Inc., 601 Jefferson St., Houston, TX 77002, USA
(5) Professional Services, Amazon Web Services, Seattle, Washington 98101, USA

(Dated: December 18, 2020)

* The work of H. G. K. was performed before joining Amazon Web Services.

Optimization plays a significant role in many areas of science and technology. Most industrial optimization problems have inordinately complex structures that render finding their global minima a daunting task. Therefore, designing heuristics that can efficiently solve such problems is of utmost importance. In this paper we benchmark and improve the thermal cycling algorithm [Phys. Rev. Lett. 79, 4297 (1997)] that is designed to overcome energy barriers in nonconvex optimization problems by temperature cycling of a pool of candidate solutions. We perform a comprehensive parameter tuning of the algorithm and demonstrate that it competes closely with other state-of-the-art algorithms such as parallel tempering with isoenergetic cluster moves, while overwhelmingly outperforming more simplistic heuristics such as simulated annealing.
I. INTRODUCTION
Optimization is ubiquitous in science and industry. Examples range from the search for the ground state of exotic states of matter such as high-temperature superconductors in physics [1], topology optimization in materials science [2], lead optimization in pharmaceutical drug discovery [3], spacecraft trajectory optimization [4], and portfolio optimization in finance [5], to scheduling in transportation [6] and speech recognition in artificial intelligence [7], to name a few. One important category of optimization problems is combinatorial optimization, which is the search for the minima of an objective function within a finite but often large set of solutions. Paradigmatic examples are the traveling salesman problem, the geometrical packing problem [8], graph coloring, the cutting stock problem, integer linear programming, etc. Many of these problems are NP-hard, in the sense that the worst-case time to find the optimum scales worse than a polynomial in the size of the input. Moreover, these problems are not only computationally hard to solve in the worst case, but also in the typical case. These problems have a rough energy (cost function) landscape consisting of numerous metastable states. Therefore, heuristics based on local search (e.g., the greedy algorithm [9]) tend to perform poorly on these types of problems, as they can easily become trapped in local minima [10].

One way to circumvent this difficulty is to use a stochastic process such as Metropolis dynamics [11] to randomly access different parts of the phase space. An example of an algorithm that utilizes such random sampling is simulated annealing (SA) [12]. Simulated annealing is a Markov chain Monte Carlo (MCMC) process where a series of quasi-equilibrium states are visited by following an "annealing schedule" during which the system is gradually cooled from a sufficiently high temperature to a target low temperature. The goal is to guide the stochastic process through an occasionally complex energy landscape toward the low-lying states. At high temperatures, the random "walker" can take long strides across phase space, thus allowing for the exploration of configurations far away in Hamming distance. As the system is cooled, the exploration domain of the walker is reduced according to the Gibbs distribution, and it eventually lands in a low-lying state. There is no guarantee that this is the true optimum, unless an (impractical) infinite annealing time is used [13]. Because SA is stochastic in nature, running many such processes in parallel can increase the chance of finding the true optimum. Nevertheless, without establishing a way for the phase-space information gathered by the random walkers to be shared, mere replication of a simulated annealing process will not yield any meaningful speedup. Multiple Markov chain algorithms such as path-integral Monte Carlo (PIMC) [14–19], parallel tempering (PT) [20, 21], and population annealing (PA) [22–26] take advantage of this "collective knowledge" to efficiently probe the solution space of a problem.

Closely related to genetic local search approaches [27–30], the thermal cycling algorithm (TCA) [31, 32] is another heuristic that integrates the power of parallel annealing processes with the utility of local search methods. The annealing part of this algorithm ensures that the phase space can be visited ergodically, whereas the local search part biases the dynamics toward the lower-energy states.
When introduced more than two decades ago, thermal cycling was shown to outperform simulated annealing in solving some limited instances of the traveling salesman problem. Despite the early indications that TCA might be a useful tool in dealing with hard optimization problems, it has not been carefully benchmarked and hence not widely adopted by the optimization community. Here we reintroduce the thermal cycling algorithm and outline its basic pseudocode. In addition, we conduct a comprehensive parameter optimization of TCA using synthetic planted problems, where we compare the performance of TCA to a number of modern optimizers, including simulated quantum annealing (SQA). In order to quantify the efficiency of the aforementioned heuristics, we study how their time to solution (TTS) [33, 34] scales with the problem size. Our results show that, when optimized properly, TCA can indeed be competitive with state-of-the-art heuristics and therefore should be included in physics-inspired optimization platforms.

The paper is structured as follows. In Sec. II we explain the analysis techniques, followed by the details of the algorithm in Sec. III. Section IV is dedicated to the benchmarking results of the study. Concluding remarks are presented in Sec. V.

II. DETAILS OF ANALYSIS
The cost function that we minimize in this benchmarking study is a 2-local Ising spin system, i.e.,

H = \sum_{i}^{N} \sum_{j \in \mathcal{N}_i} J_{ij} s_i s_j + \sum_{i}^{N} h_i s_i ,    (1)

where N is the total number of variables, \mathcal{N}_i is the adjacency list of the i'th lattice site, J_{ij} is the coupling between spins s_i and s_j, and finally h_i is an external field applied to spin s_i ∈ {±1}.

Most algorithms involve multiple parameters that need to be carefully tuned to observe the true asymptotic scaling. As such, a comprehensive hyperparameter optimization is in order. For benchmarking, we use synthetic problems whose ground state is unique and known beforehand. Here, we use the deceptive cluster loop (DCL) problems [35] that are specifically designed for testing the performance of the D-Wave [36] quantum annealer against classical algorithms. DCLs are inspired by the original frustrated cluster loop (FCL) problems [37, 38], which have a ferromagnetic planted ground state defined on a Chimera graph [39]. The Chimera topology consists of a two-dimensional lattice of fully connected bipartite K_{4,4} cells in which all qubits are coupled ferromagnetically. The entire K_{4,4} unit cell can therefore be viewed as one virtual variable. The cells are then connected via randomly chosen frustrated loops. The magnitude of the inter-cell couplings is capped at a finite value R that adds local "ruggedness" to the problems [40, 41]. The hardness of the FCL instances can be tuned by varying the density of the frustrated loops, often denoted by the parameter α. In the DCL problems, the inter-cell couplers are multiplied by a scaling factor λ. Depending on the value of λ, the internal structure of the cells can be masked or accentuated, thus deceiving the annealers into spending more time optimizing the local structures rather than finding the global minimum.

As the measure of performance, we use the time to solution (TTS) [33, 34], which is defined in the following way:

TTS(α) = n(α) τ_run ,    (2)

where n(α) is the number of times that the algorithm must be repeated, for a given parameter set α, to find the ground state at least once with a desired probability p_d, and τ_run is the average run time, conventionally measured in microseconds. If we assume that the success probability, i.e., the chance of hitting the ground state in a single run of the algorithm, is p_s(α), then one can show from the binomial distribution that

p_d = \sum_{k \ge 1}^{n} \binom{n}{k} p_s^k (1 - p_s)^{n-k} = 1 - (1 - p_s)^n .    (3)

We may now use the above expression to find n(α) in Eq. (2):

n(α) = log[1 - p_d] / log[1 - p_s(α)] .    (4)

It is customary to set the desired probability in Eq. (4) to a high confidence value of p_d = 0.99. Because the TTS is a function of the algorithm parameters, a thorough optimization of the parameters must be performed to reliably compare heuristics based upon it. Note that the optimization is often multidimensional, which makes the benchmarking a relatively laborious task. For each set of parameters α and each problem instance, we repeat the runs multiple times and calculate the success probability p_s(α) as the fraction of runs that hit the ground state. This process is repeated for all instances to calculate the median TTS, and the error bars are estimated using the bootstrap method. The above procedure is carried out for many other parameter-set values, and the optimal parameters are identified as the global minimum of the TTS function.
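To make Eqs. (2)-(4) concrete, the following minimal Python sketch computes the number of required repetitions and the resulting TTS. The function names and the example numbers are ours for illustration; they are not part of the benchmarking code used in this study.

```python
import numpy as np

def repetitions_needed(p_s: float, p_d: float = 0.99) -> float:
    """Number of independent runs needed to observe the ground state
    at least once with probability p_d, given the per-run success
    probability p_s [Eq. (4)]."""
    return np.log1p(-p_d) / np.log1p(-p_s)

def time_to_solution(p_s: float, tau_run: float, p_d: float = 0.99) -> float:
    """Time to solution [Eq. (2)], in the same units as tau_run."""
    return repetitions_needed(p_s, p_d) * tau_run

# Example: 2% per-run success probability, 150 us average run time.
print(time_to_solution(p_s=0.02, tau_run=150.0))  # roughly 3.4e4 us
```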
Having calculated the optimal TTS for all problem sizes (N = 8L^2 for a Chimera graph of linear size L), we can study the scaling behavior of the algorithm, which is typically exponential in L, i.e.,

TTS_opt ∼ 10^{a + bL} .    (5)

The scaling exponent b determines the performance of an algorithm in the asymptotic limit, whereas a is a constant offset that depends on factors nonintrinsic to the algorithm, such as hardware speed, code efficiency, etc. Therefore, a relatively unbiased way to compare different algorithms is to focus on the scaling exponent.
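As an illustration of how the scaling exponent is extracted, the sketch below fits log10(TTS_opt) linearly in L in the spirit of Eq. (5). The data points are placeholders of our own, not the values measured in this study.

```python
import numpy as np

# Placeholder (L, optimal TTS) data, not the paper's measurements.
L = np.array([8, 10, 12, 14, 16])
tts_opt = np.array([3.2e4, 1.1e5, 4.0e5, 1.3e6, 4.8e6])  # microseconds

# Fit log10(TTS_opt) = a + b * L; the slope b is the scaling exponent
# of Eq. (5), while the intercept a absorbs hardware- and
# implementation-dependent constant offsets.
b, a = np.polyfit(L, np.log10(tts_opt), deg=1)
print(f"scaling exponent b = {b:.3f}, offset a = {a:.2f}")
```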
III. THERMAL CYCLING ALGORITHM

The thermal cycling algorithm works by periodically heating and cooling an ensemble of states while following a decreasing temperature schedule. The ensemble is prepared by selecting the N_p lowest-energy states among N quenched random configurations. Starting from the initial inverse temperature β_i = 0, the above pool of states is annealed toward a final inverse temperature β_f in N_T steps. At a given temperature, some energy is deposited into the ensemble states using N_s Metropolis updates (heating), followed by an immediate quench via a local search method (cooling). If any of the resulting states are lower in energy than the original set, they replace the corresponding states in the pool. The heating-cooling cycle is repeated N_c times at a fixed temperature. In practice, the above process steers the ensemble toward the low-lying states while ensuring that metastable configurations do not hinder the dynamics. The temperature is then reduced, and the cycles start over. In Algorithm 1, we present a concise outline of the thermal cycling algorithm.
Algorithm 1 Thermal Cycling
  Randomly initialize N configurations of the problem.
  Quench each of the N states using a local search algorithm.
  Construct a pool of states by selecting the N_p states with the lowest energy from the above quenched states.
  Build a list of lattice sites by comparing the spins on a given site between all pool states; if all are aligned, add the site to the list.
  for N_T steps starting from β = 0 until β = β_f do
      for N_c cycles do
          Pick a random state from the pool.
          Add heat to the pool state using N_s Metropolis sweeps at β, excluding the spins in the site list.
          Quench the selected state.
          if a lower energy is achieved then
              Replace the old state in the pool with the new one.
              Rebuild the site list by comparing the spins between the updated pool states.
          end if
      end for
      Increase β → β + Δβ, where the step size Δβ is usually constant, i.e., a linear schedule.
  end for
  Identify the pool state with the lowest energy as the solution of the problem.
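For readers who prefer code to pseudocode, the following self-contained Python sketch mirrors Algorithm 1 under simplifying assumptions of our own: dense numpy couplings with the convention H = (1/2) s^T J s + h^T s, a greedy single-spin descent standing in for the local search, and illustrative (untuned) parameter defaults rather than the values of Table I.

```python
import numpy as np

rng = np.random.default_rng(42)

def energy(s, J, h):
    # Ising cost function with the convention H = 1/2 s^T J s + h^T s,
    # where J is symmetric with zero diagonal [cf. Eq. (1)].
    return 0.5 * s @ (J @ s) + h @ s

def flip_costs(s, J, h):
    # Energy change incurred by flipping each spin individually.
    return -2.0 * s * (J @ s + h)

def quench(s, J, h):
    # Greedy single-spin descent (a simple stand-in for the local
    # search): flip the most unstable spin until no flip lowers H.
    s = s.copy()
    while True:
        dE = flip_costs(s, J, h)
        k = int(np.argmin(dE))
        if dE[k] >= 0.0:
            return s
        s[k] = -s[k]

def heat(s, J, h, beta, n_sweeps, frozen):
    # n_sweeps Metropolis sweeps at inverse temperature beta,
    # skipping the sites frozen by the pool consensus.
    s = s.copy()
    for _ in range(n_sweeps):
        for k in rng.permutation(len(s)):
            if frozen[k]:
                continue
            dE = -2.0 * s[k] * (J[k] @ s + h[k])
            if dE <= 0.0 or rng.random() < np.exp(-beta * dE):
                s[k] = -s[k]
    return s

def frozen_sites(pool):
    # Sites on which all pool states currently agree.
    arr = np.array(pool)
    return np.all(arr == arr[0], axis=0)

def thermal_cycling(J, h, n_init=64, n_pool=8, n_T=32, n_c=16,
                    n_s=4, beta_f=3.0):
    n = len(h)
    # Pool: the n_pool lowest-energy states among n_init quenched
    # random configurations.
    cand = sorted((quench(rng.choice([-1, 1], size=n), J, h)
                   for _ in range(n_init)),
                  key=lambda s: energy(s, J, h))
    pool = cand[:n_pool]
    for beta in np.linspace(0.0, beta_f, n_T):   # linear schedule
        frozen = frozen_sites(pool)
        for _ in range(n_c):
            i = rng.integers(n_pool)
            trial = quench(heat(pool[i], J, h, beta, n_s, frozen), J, h)
            if energy(trial, J, h) < energy(pool[i], J, h):
                pool[i] = trial
                frozen = frozen_sites(pool)
    return min(pool, key=lambda s: energy(s, J, h))

# Tiny usage example: random +/-1 couplings on a ring of 32 spins.
n = 32
J = np.zeros((n, n))
for i in range(n):
    J[i, (i + 1) % n] = J[(i + 1) % n, i] = rng.choice([-1.0, 1.0])
best = thermal_cycling(J, np.zeros(n))
print("best energy found:", energy(best, J, np.zeros(n)))
```

In a production implementation, a sparse adjacency structure and the more elaborate move classes discussed below would replace the dense matrix and the greedy quench.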
The main advantage of thermal cycling is the possibility of using a variety of variable-update classes in the quenching phase. By using more complex updates, exponentially many smaller local minima can be skipped in favor of lower-energy and configurationally more distinct ones. This, however, does not necessarily translate into increased efficiency of the algorithm, as the implementation overhead associated with those complex moves can negate the overall gain. Thus, there must be a trade-off between the complexity of the moves and the speedup owing to the reduced metastability. One of the simplest updates is the single-spin greedy move (SSGM), in which the most unstable spins (i.e., spins with the largest positive local fields) are flipped in a sequential fashion until no further improvement can be made. Another subset of the move classes are the double-spin random moves (DSRM), which consist of first attempting to flip a randomly chosen spin by itself and, if this is rejected, then trying to flip it together with one of the neighboring spins (looked up sequentially) such that the overall energy is lowered. The updates stop when the rejections accumulate to the total number of bonds in the problem. Another important type of move that we have studied here is the Lin-Kernighan cluster move (LKCM), which is based on the famous Lin-Kernighan algorithm [42, 43], considered to date one of the most efficient heuristics for solving the traveling salesman problem (TSP). In a LKCM, a cluster of spins of size M is constructed starting from the most unstable spin and then appending the neighboring spins to it until the total cost of flipping the cluster becomes positive. The LKCM is essentially a k-opt local search algorithm [44–46], where k is determined from a sequence of partial costs, i.e., {c_1, c_2, ..., c_M}. Here, c_1 is the cost of flipping the first spin, c_2 is the cost of flipping the first and the second spin together, and so on. One then flips the set of k spins with the lowest partial cost, c_k (a sketch of this construction is given at the end of this section). We report the performance of each of the above move classes in the next section.

Another important point that one has to bear in mind is the duration of the heating phase. As mentioned earlier, the heating part of TCA ensures that the algorithm remains dynamic in spite of being quenched to often deep local minima. This requires that the system be subjected to a sufficient number of Metropolis updates. On the other hand, to preserve the gains of the previous cycles, the equilibration must be terminated at an early stage. Otherwise, the system might end up in a configuration too far away in the phase space.

During the cycling process, the states in the pool are treated independently from one another. This has the potential pitfall that some of the states in the pool might wander off to energetically unfavorable parts of the configuration space. As mentioned earlier, this shortcoming can be alleviated by establishing an interaction between the pool states. One way to do this is to freeze the variables that are common among all of the states. We can justify this reduction by realizing that if the states in the pool have a feature in common, it is very likely that the feature will also appear in the ground-state configuration. Note that this step is closely related to metaheuristics such as tabu search [47, 48], as well as self-avoiding random dynamics on integer complex systems (SARDONICS) [49], the search for backbones [50–53] often used in genetic-type algorithms, and sample persistence [54, 55], which has been used in conjunction with algorithms such as simulated annealing and simulated quantum annealing.
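The partial-cost construction behind the LKCM can be summarized in a few lines of code. The sketch below is our illustrative reading of the description above; in particular, the neighbor-selection and stopping rules are assumptions, not the authors' exact prescription.

```python
import numpy as np

def lk_cluster_move(s, J, h, max_size=32):
    """One Lin-Kernighan-style cluster move: grow a cluster from the
    most unstable spin, recording the partial cost c_k of flipping the
    first k spins, then flip the prefix with the lowest partial cost,
    provided that cost is negative."""
    s = s.copy()
    work = s.copy()  # configuration with the tentative cluster flipped

    def flip_cost(k):
        # Cost of additionally flipping spin k on the current state.
        return -2.0 * work[k] * (J[k] @ work + h[k])

    seed = int(np.argmin(-2.0 * work * (J @ work + h)))  # most unstable
    order, partial = [seed], [flip_cost(seed)]
    work[seed] = -work[seed]
    # Append the cheapest neighboring spin until the total cost of
    # flipping the cluster turns positive (or no candidates remain).
    while partial[-1] <= 0.0 and len(order) < max_size:
        nbrs = {int(j) for i in order for j in np.flatnonzero(J[i])} - set(order)
        if not nbrs:
            break
        j = min(nbrs, key=flip_cost)
        partial.append(partial[-1] + flip_cost(j))
        work[j] = -work[j]
        order.append(j)
    k_best = int(np.argmin(partial))  # the k-opt choice: best prefix c_k
    if partial[k_best] < 0.0:         # accept only if it lowers the energy
        for i in order[:k_best + 1]:
            s[i] = -s[i]
    return s
```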
IV. RESULTS

In this study, we compare TCA to simulated annealing (SA) [12], simulated quantum annealing (SQA) [16], parallel tempering (PT) [20, 21], and parallel tempering with isoenergetic cluster moves (PT+ICM) [56]. SQA is the classical implementation of the quantum annealing process [57, 58], in which the system is initialized in the ground state of a simple Hamiltonian and adiabatically [59] deformed into a target Hamiltonian whose ground state is difficult to find. PT is a Monte Carlo algorithm that efficiently samples the equilibrium configurations of a system using the replica-exchange technique. The ICM update, which consists of rearranging a large collection of variables by inspecting the overlap between two replicas, is extremely effective for low-connectivity graphs in which the cluster percolation threshold is small.
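For reference, the replica-exchange step that underlies PT (and PT+ICM) accepts a swap of the configurations held at neighboring inverse temperatures with probability min{1, exp[Δβ ΔE]}. A minimal sketch, with bookkeeping conventions of our own:

```python
import numpy as np

def pt_swap_sweep(betas, energies, order, rng):
    """One sweep of replica-exchange swap attempts between neighboring
    temperatures. order[i] is the index of the replica currently at
    inverse temperature betas[i]; a swap is accepted with probability
    min{1, exp[(beta_{i+1} - beta_i) * (E_{i+1} - E_i)]}."""
    for i in range(len(betas) - 1):
        d_beta = betas[i + 1] - betas[i]
        d_energy = energies[order[i + 1]] - energies[order[i]]
        if d_beta * d_energy >= 0.0 or rng.random() < np.exp(d_beta * d_energy):
            order[i], order[i + 1] = order[i + 1], order[i]
    return order

# Example with stand-in energies for eight replicas:
rng = np.random.default_rng(0)
betas = np.linspace(0.1, 3.0, 8)
energies = rng.normal(size=8)
order = pt_swap_sweep(betas, energies, list(range(8)), rng)
```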
For this benchmarking study, we have generated DCL instances for each linear size ranging from L = 8 to L = 16. The DCL parameters α, R, and λ are fixed to a relatively hard regime with R = 1 [35]. For each studied algorithm, we optimize the parameters via a grid search within its parameter space. All of the simulations are done on a single thread using Intel Xeon E5-2680 v4 2.40GHz processors. In Fig. 1, we show some examples of such hyperparameter optimization. Figure 1(a) illustrates the parameter optimization of TCA for the system size L = 12, in which the color map represents the TTS values on a logarithmic scale and the axes are the number of cycles N_c and the number of annealing steps N_T. Note that this is only a two-dimensional cross section of the entire parameter space, on which the rest of the parameters are held at N = 1024, N_p = 16, N_s = 32, and a constant β_f. We observe two minima with comparable depth within the error bars, which shows that N_T and N_c are anticorrelated. Figure 1(b) shows the optimization of the number of sweeps for PT+ICM using various system sizes. The minimum TTS values correspond to the optimal sweep values. The above optimization has been done for all of the other algorithms as well; we omit the details for clarity.
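Schematically, the grid search reduces to evaluating the TTS on a lattice of parameter values and keeping the minimum. In the hypothetical sketch below, estimate_success_prob and run_time are stand-ins for running a solver repeatedly on an instance and timing it; the grid values are illustrative, not the grids used in this study.

```python
import numpy as np
from itertools import product

def optimize_tts(estimate_success_prob, run_time, p_d=0.99):
    """Grid search over (N_T, N_c) at fixed N, N_p, N_s, and beta_f.
    estimate_success_prob(n_T, n_c) and run_time(n_T, n_c) are
    hypothetical callables supplied by the user."""
    best_tts, best_params = np.inf, None
    for n_T, n_c in product([128, 256, 512, 1024], [32, 64, 128, 256]):
        p_s = estimate_success_prob(n_T, n_c)
        if p_s <= 0.0:
            continue                  # ground state never found: no estimate
        if p_s >= 1.0:
            tts = run_time(n_T, n_c)  # a single run always succeeds
        else:
            tts = np.log1p(-p_d) / np.log1p(-p_s) * run_time(n_T, n_c)
        if tts < best_tts:
            best_tts, best_params = tts, (n_T, n_c)
    return best_tts, best_params
```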
FIG. 1: (a) Parameter tuning of the thermal cycling algorithm using deceptive cluster loop (DCL) problems of linear size L = 12. The figure shows the number of cycles N_c versus the number of annealing steps N_T, with the rest of the parameters fixed. The color map shows the TTS values. Two minima with similar depths are observed; the lower minimum corresponds to the optimal parameters. (b) Optimization of the number of sweeps for PT+ICM for various system sizes. The minima of the TTS curves mark the optimal sweep values used in the scaling analysis.

In Table I, we list the optimal parameters of the thermal cycling algorithm for the studied system sizes. We observe that most of the algorithm parameters are robust with respect to the problem size; the number of annealing steps N_T, much like in SA, is the only varying parameter. This is valuable information, as it eliminates the necessity of a full parameter optimization in a practical application of the algorithm. In reality, the total effort in a TCA simulation is roughly proportional to N_T N_p N_c N_s, with some additional overhead caused by the local search.

TABLE I: Parameters of the thermal cycling algorithm for different linear problem sizes L. N_T is the number of annealing steps, N_p is the pool size, N_c is the number of heating-cooling cycles per temperature, N_s is the number of Metropolis sweeps, and β_f is the target inverse temperature (lowest temperature).

  L     N_T     N_p    N_c    N_s    β_f
  10    512     16     64     32     0.…
  13    724     16     128    32     0.…
  16    1448    16     128    32     0.…

Although this might seem like a large effort compared to SA, in terms of time to solution TCA is orders of magnitude more efficient than SA, as we demonstrate below. This can, in part, be attributed to the use of complex quenching moves in TCA that significantly reduce the number of visited local minima.

In Fig. 2 we show the scaling results of the thermal cycling algorithm with the various move classes, that is, SSGM, DSRM, and LKCM, as explained in Sec. III. The main panel of Fig. 2 displays the optimal TTS values for different system sizes L, with the lines fitted to the five largest system sizes. The inset of the figure shows the scaling exponent b in Eq. (5) obtained from the linear fits; the height of the boxes represents the error bars. It is clear that the double-spin random moves (DSRM) are significantly more efficient than the other two move classes studied here. It is interesting to note that the Lin-Kernighan cluster moves (LKCM) become less efficient as the problem size increases, despite giving almost two orders of magnitude of constant speedup for smaller system sizes. We speculate that this is due to the increased overhead of constructing a long sequence of partial costs, which requires building a large cluster of spins as explained in Sec. III, relative to the optimal subcluster that is flipped in the end. It is worth mentioning here that the efficiency of the LKCM is to some extent topology-dependent because the updates are, in essence, cluster moves and therefore might perform better when implemented on a different set of problems.

FIG. 2: Main panel: Time to solution (TTS) of the thermal cycling algorithm versus the linear problem size L. Various quenching schemes consisting of single-spin greedy moves (SSGM), double-spin random moves (DSRM), and Lin-Kernighan cluster moves (LKCM) are displayed. The lines represent a linear fit on a logarithmic (base 10) scale to the five largest system sizes. We observe a sizable constant speedup with the LKCM, although it diminishes at larger system sizes. Inset: Scaling exponent b corresponding to the slope of the linear fits in the main panel. The height of the boxes represents the error bars. The best scaling is obtained when TCA is used in conjunction with DSRM. These moves are simple enough to cause minimal overhead, yet complex enough to considerably reduce the number of metastable states.

In Fig. 3(a), we show the scaling curves of the various algorithms, corresponding to the optimal values of their parameters, as a function of the linear size L. As before, the TTS values are reported on a logarithmic (base 10) scale, where linear fits are interpreted as exponential scalings. There is a considerable constant offset associated with PT and PT+ICM, which is due to the use of a highly optimized implementation by Salvatore Mandrà as a part of the NASA/TAMU Unified Framework for Optimization (UFO). Figure 3(b) shows the scaling exponent b [the slope of the linear fit in Fig. 3(a)]. We observe that TCA (with DSRM) scales overwhelmingly better than SA, in agreement with previous TSP studies. It is also more efficient than PT and even competitively close to SQA. Note that PT is already established as a powerful heuristic in many optimization-related applications. SQA and PT+ICM show the best performance among the studied solvers. This can be ascribed to the structure of the DCL problems, which involve tall but thin barriers that can easily be tunneled through using SQA.
Isoenergetic cluster moves are also well-suited for the DCL problems, as they cause large rearrangements of the variables, resulting in an efficient sampling of the configuration space.

V. CONCLUSION
In this paper we have thoroughly benchmarked the thermal cycling algorithm. Our results demonstrate that TCA is a competitive heuristic for solving problems with complex structures, as it takes advantage of repeated heating and cooling to push the system toward the lower-energy states while ensuring that the system does not get trapped in an excited state. By reducing the variables among the TCA replicas, the stochastic process can be further accelerated, and the system can be guided more effectively toward the global minimum using the collective "memory" of the solution pool. Having carefully tuned the parameters, we show that TCA can be as effective as state-of-the-art algorithms such as SQA, while overpowering SA and PT by great margins in the asymptotic scaling.
FIG. 3: Comparison between the scaling results of the studied algorithms using the deceptive cluster loop (DCL) problems. (a) Time to solution (TTS) versus the linear problem size L for various algorithms. Note that the TTS is given on a logarithmic (base 10) scale. The sizable offsets in PT and PT+ICM are due to the use of a highly optimized code. (b) Scaling exponent b in Eq. (5) for various algorithms. The height of the boxes represents the error bars. The TCA data points correspond to the best-performing quenching scheme, i.e., DSRM, shown in Fig. 2. TCA scales better than SA and PT, and is even comparable to SQA within the error bars. PT+ICM shows the best scaling; however, the latter is due to the great efficiency of the ICM updates in sampling the DCL phase space, which is nonrepresentative of denser industrial problems.

Due to the special structure of the DCL problems, which involve tall yet narrow barriers, SQA and PT+ICM outperform TCA because they utilize quantum effects and cluster updates, respectively, to bypass those barriers. The true advantage of TCA might be revealed when using dense graphs with broad barriers, where PT+ICM and SQA would naturally struggle. TCA also lends itself to being integrated with ICM updates because it involves simultaneous annealing of many system replicas. This is in close analogy with the iterative partial transcription (IPT) algorithm [60, 61]. It has been shown by Ochoa et al. [62] that a lower-energy state can be generated by overlapping two excited states via an ICM update. Therefore, one interesting addition to TCA could be to push the pool states further down in energy by performing ICM updates at the end of the cycle, provided that the graph density is low enough that the clusters do not percolate.

Acknowledgments
We would like to thank A. Möbius for useful discussions regarding various aspects of the thermal cycling algorithm, and for providing code for the double-spin random moves (DSRM). The authors would also like to acknowledge Jonathan Machta for critically reviewing the manuscript. H. G. K. would like to thank Dr. Pimple Popper for visualizations of energy landscapes. We thank Texas A&M University and NASA Ames Research Center for providing access to computational resources. This work is supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via MIT Lincoln Laboratory Air Force Contract No. FA8721-05-C-0002. S. M. also acknowledges support from the Intelligence Advanced Research Projects Activity (IARPA), IARPA IAA 1198, and the Defense Advanced Research Projects Agency (DARPA), IAA 8839, Annex 125. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of ODNI, IARPA, DARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon.

[1] V. Stanev, C. Oses, A. G. Kusne, E. Rodriguez, S. Curtarolo, and I. Takeuchi, npj Computational Materials, 2057 (2018).
[2] M. P. Bendsøe, Topology Optimization (Springer US, Boston, MA, 2009), pp. 3928-3929.
[3] G. M. Keserű and G. M. Makara, Drug Discovery Today, 741 (2006).
[4] A. Shirazi, J. Ceberio, and J. A. Lozano, Progress in Aerospace Sciences, 76 (2018).
[5] J. Doering, A. A. Juan, R. Kizys, A. Fito, and L. Calvet, in Modeling and Simulation in Engineering, Economics and Management, edited by R. León, M. J. Muñoz-Torres, and J. M. Moneva (Springer International Publishing, Cham, 2016), pp. 22-30.
[6] V. Guihaire and J.-K. Hao, Transportation Research Part A: Policy and Practice, 1251 (2008).
[7] T. Kinnunen and H. Li, Speech Communication 52, 12 (2010).
[8] R. M. Karp, Complexity of Computer Computations (New York: Plenum, 1972), p. 85.
[9] T. H. Cormen, C. E. Leiserson, and R. L. Rivest,
Introduction to Algorithms (MIT Press, Cambridge, MA, 1990).
[10] G. Bendall and F. Margot, Discrete Optimization 3, 288 (2006).
[11] N. Metropolis and S. Ulam, J. Am. Stat. Assoc. 44, 335 (1949).
[12] S. Kirkpatrick, C. D. Gelatt, Jr., and M. P. Vecchi, Science 220, 671 (1983).
[13] S. Geman and D. Geman, IEEE Trans. Pattern Anal. Mach. Intell. PAMI-6, 721 (1984).
[14] H. F. Trotter, Proc. Amer. Math. Soc. 10, 545 (1959).
[15] M. Suzuki, Progress of Theoretical Physics 46, 1337 (1971).
[16] M. Suzuki, Progress of Theoretical Physics 56, 1454 (1976).
[17] M. Suzuki, Quantum Monte Carlo Methods in Condensed Matter Physics (World Scientific, Singapore, 1993).
[18] D. P. Landau and K. Binder, A Guide to Monte Carlo Simulations in Statistical Physics (Cambridge University Press, Cambridge, 2000).
[19] M. Troyer, F. Alet, S. Trebst, and S. Wessel, in AIP Conf. Proc. 690: The Monte Carlo Method in the Physical Sciences (2003), pp. 156-169.
[20] C. Geyer, in Computing Science and Statistics: Proceedings of the 23rd Symposium on the Interface, edited by E. M. Keramidas (Interface Foundation, Fairfax Station, VA, 1991), p. 156.
[21] K. Hukushima and K. Nemoto, J. Phys. Soc. Jpn. 65, 1604 (1996).
[22] K. Hukushima and Y. Iba, in The Monte Carlo Method in the Physical Sciences: Celebrating the 50th Anniversary of the Metropolis Algorithm, edited by J. E. Gubernatis (AIP, Los Alamos, New Mexico, USA, 2003), vol. 690, p. 200.
[23] J. Machta, Phys. Rev. E 82, 026704 (2010).
[24] W. Wang, J. Machta, and H. G. Katzgraber, Phys. Rev. E 92, 063307 (2015).
[25] C. Amey and J. Machta, Phys. Rev. E 97, 033301 (2018).
[26] A. Barzegar, C. Pattison, W. Wang, and H. G. Katzgraber, Phys. Rev. E 98, 053308 (2018).
[27] R. M. Brady, Nature 317, 804 (1985).
[28] H. Mühlenbein, M. Gorges-Schleuter, and O. Krämer, Parallel Computing 7, 65 (1988).
[29] B. Freisleben and P. Merz, in Proceedings of the 1996 IEEE International Conference on Evolutionary Computation, Nagoya University, Japan, May 20-22, 1996, edited by T. Fukuda and T. Furuhashi (IEEE, 1996), pp. 616-621.
[30] P. Merz and B. Freisleben, Lecture Notes in Computer Science 1498 (Springer, Berlin, 1998), p. 765.
[31] A. Möbius, A. Neklioudov, A. Díaz-Sánchez, K. H. Hoffmann, A. Fachat, and M. Schreiber, Phys. Rev. Lett. 79, 4297 (1997).
[32] A. Möbius, K. H. Hoffmann, and C. Schön, Optimization by Thermal Cycling (????), pp. 215-219.
[33] S. Boixo, T. F. Rønnow, S. V. Isakov, Z. Wang, D. Wecker, D. A. Lidar, J. M. Martinis, and M. Troyer, Nat. Phys. 10, 218 (2014).
[34] T. F. Rønnow, Z. Wang, J. Job, S. Boixo, S. V. Isakov, D. Wecker, J. M. Martinis, D. A. Lidar, and M. Troyer, Science 345, 420 (2014).
[35] S. Mandrà and H. G. Katzgraber, Quantum Sci. Technol. 3, 04LT01 (2018).
[36] D-Wave Systems Inc., The D-Wave 2X Quantum Computer.
[37] I. Hen, J. Job, T. Albash, T. F. Rønnow, M. Troyer, and D. A. Lidar, Phys. Rev. A 92, 042325 (2015).
[38] J. King, S. Yarkoni, J. Raymond, I. Ozfidan, A. D. King, M. M. Nevisi, J. P. Hilton, and C. C. McGeoch, J. Phys. Soc. Jpn. 88, 061007 (2019).
[39] P. Bunyk, E. Hoskinson, M. W. Johnson, E. Tolkacheva, F. Altomare, A. J. Berkley, R. Harris, J. P. Hilton, T. Lanting, and J. Whittaker, IEEE Trans. Appl. Supercond. 24, 1 (2014).
[40] A. D. King, T. Lanting, and R. Harris (2015), arXiv:1502.02098.
[41] J. King, S. Yarkoni, J. Raymond, I. Ozfidan, A. D. King, M. M. Nevisi, J. P. Hilton, and C. C. McGeoch (2017), arXiv:1701.04579.
[42] S. Lin and B. W. Kernighan, Operations Research 21, 498 (1973).
[43] K. Helsgaun, European Journal of Operational Research 126, 106 (2000).
[44] M. M. Flood, Operations Research 4, 61 (1956).
[45] G. A. Croes, Operations Research 6, 791 (1958).
[46] S. Lin, Bell System Technical Journal 44, 2245 (1965).
[47] F. Glover, ORSA Journal on Computing 1, 190 (1989).
[48] F. Glover, ORSA Journal on Computing 2, 4 (1990).
[49] F. Hamze, Z. Wang, and N. de Freitas (2011), arXiv:1111.5379.
[50] J. Schneider, C. Froschhammer, I. Morgenstern, T. Husslein, and J. M. Singer, Computer Physics Communications 96, 173 (1996).
[51] W. Zhang, Artif. Intell. 158, 1 (2004).
[52] Y. Wang, Z. Lü, F. Glover, and J.-K. Hao, in Evolutionary Computation in Combinatorial Optimization, edited by P. Merz and J.-K. Hao (Springer, Berlin, Heidelberg, 2011), pp. 72-83.
[53] Y. Wang, Z. Lü, F. Glover, and J.-K. Hao, Journal of Heuristics 19, 679 (2013).
[54] P. Chardaire, J. L. Lutton, and A. Sutter, European Journal of Operational Research, 565 (1995).
[55] H. Karimi, G. Rosenberg, and H. G. Katzgraber, Phys. Rev. E 96, 043312 (2017).
[56] Z. Zhu, A. J. Ochoa, and H. G. Katzgraber, Phys. Rev. Lett. 115, 077201 (2015).
[57] A. B. Finnila, M. A. Gomez, C. Sebenik, C. Stenson, and J. D. Doll, Chem. Phys. Lett. 219, 343 (1994).
[58] T. Kadowaki and H. Nishimori, Phys. Rev. E 58, 5355 (1998).
[59] M. Born and V. Fock, Zeitschrift für Physik 51, 165 (1928).
[60] A. Möbius, B. Freisleben, P. Merz, and M. Schreiber, Phys. Rev. E 59, 4667 (1999).
[61] A. Möbius, A. Díaz-Sánchez, B. Freisleben, M. Schreiber, A. Fachat, K. Hoffmann, P. Merz, and A. Neklioudov, Computer Physics Communications 121-122, 34 (1999), proceedings of the Europhysics Conference on Computational Physics CCP 1998.
[62] A. J. Ochoa, D. C. Jacob, S. Mandrà, and H. G. Katzgraber, Phys. Rev. E 99.