Robust Parameter Selection for Parallel Tempering
Firas Hamze Neil Dickson Kamran Karimi
D-Wave Systems Inc., 100-4401 Still Creek Drive, Burnaby, B.C., V5C 6G9, Canada
{fhamze, ndickson, kkarimi}@dwavesys.com

Abstract
This paper describes an algorithm for selecting parameter values (e.g. temperature values) at which to measure equilibrium properties with Parallel Tempering Monte Carlo simulation. Simple approaches to choosing parameter values can lead to poor equilibration of the simulation, especially for Ising spin systems that undergo 1st-order phase transitions. However, starting from an initial set of parameter values, the careful, iterative respacing of these values based on results with the previous set of values greatly improves equilibration. Example spin systems presented here appear in the context of Quantum Monte Carlo.

1 Introduction

This paper describes our experience with and modifications to an algorithm to enhance Parallel Tempering (PT) Monte Carlo [1] known as Feedback-Optimized Parallel Tempering (FOPT) [3]. Though our experimental analysis will focus on the Quantum Ising Spin Glass in a transverse field, we shall try to keep the concepts as general as possible, as we believe that workers in fields as diverse as Bayesian Statistics, Computer Science, and Molecular Simulation may be interested in optimizing PT on difficult problems. While PT has proved itself to be an indispensable tool in computer simulation, certain aspects of some problems can present serious obstacles to its use. The idea of FOPT is to try to eliminate the barriers in replica-diffusion space that can hinder the decorrelative effect that PT is supposed to provide. Our initial experiments with PT on the Quantum Ising Glass encountered precisely such difficulties, but we found it necessary to make some adaptations to FOPT for it to function robustly.

To ease the presentation for those unfamiliar with Quantum Monte Carlo (QMC) or spin systems but knowledgeable about Markov Chain Monte Carlo (MCMC, also called dynamical Monte Carlo) simulation, the specialized, physics-oriented concepts are confined to Section 4, which briefly reviews the Quantum Ising Glass and how it is simulated. Uninterested readers can skip that section and simply imagine that within it is derived a set of intractable high-dimensional distributions (in this case defined over a binary-valued state space) that we wish to sample from.

Section 2 reviews the ideas of PT and FOPT; our contributions to improving the method are discussed in Section 3. Section 4, as mentioned above, discusses quantum spin simulation, and Section 5 presents some numerical data.
2 Parallel Tempering and Feedback Optimization

An excellent general survey of PT can be found in [2]; for completeness, some basic concepts are presented here. Suppose we have a family {f_k(x)} of M distributions defined on a state space X, each member of which depends on some parameter λ_k, i.e. f_k(x) = f(x | λ_k). We assume that the terminal parameters λ_1 and λ_M are fixed and that implementing naive MCMC at those values, for example local-variable Metropolis or Gibbs (heat bath) sampling, will result in slowly and quickly equilibrating ("mixing") Markov chains respectively. λ_1 and λ_M could, for instance, be low and high temperatures; in the systems we simulate, λ is a parameter that plays roughly the same role as temperature. The values of the remaining {λ_k} are assumed in this work to be flexible, provided that equilibration gets easier with increasing k. We refer to a local MCMC simulation at a given parameter value as a chain.

The method of PT supplements the usual within-system Monte Carlo updates with phases during which swaps of entire states between chains at different parameter values are attempted. A particular state is sometimes called a replica. In practice, to have a non-negligible swapping probability, exchanges are only attempted between neighboring parameters {λ_k, λ_{k+1}}. An exchange is accepted probabilistically according to the Metropolis ratio

    α = min( 1, [f_k(x_{k+1}) f_{k+1}(x_k)] / [f_k(x_k) f_{k+1}(x_{k+1})] )    (1)

It can be shown that the combination of the local MCMC moves and the swapping implements a Markov chain on the joint state space X^M whose invariant distribution is f_1(x_1) f_2(x_2) ... f_M(x_M). The advantage of PT is that, in theory, it can enable the system to escape from local minima it may encounter during the local updates, by allowing states to diffuse to values of λ where mixing is rapid.
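In code, a single exchange attempt might look like the following sketch (illustrative only; the `log_f` interface and the `attempt_swap` name are our own assumptions, and the ratio of Eq. (1) is evaluated in log space for numerical stability):

```python
import math
import random

def attempt_swap(log_f, x, k, rng=random.random):
    """Attempt a Metropolis swap between neighboring chains k and k+1.

    log_f(k, state) returns log f_k(state); x is the list of current
    replica states, one per chain.  Both are hypothetical interfaces.
    """
    # Log of the Metropolis ratio of Eq. (1); computing in log space
    # avoids overflow/underflow for sharply peaked distributions.
    log_alpha = (log_f(k, x[k + 1]) + log_f(k + 1, x[k])
                 - log_f(k, x[k]) - log_f(k + 1, x[k + 1]))
    if log_alpha >= 0 or rng() < math.exp(log_alpha):
        x[k], x[k + 1] = x[k + 1], x[k]   # exchange the entire states
        return True
    return False
```

Passing `rng` explicitly makes the accept/reject step deterministic in tests while defaulting to `random.random` in a real simulation.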
It is desirable that a given replica should drift relatively easily between the terminal λ values, as this would suggest that the state at λ_M at the end of a "round trip" in parameter space has decorrelated from its value at the start. It may appear at first sight that merely allowing a reasonable swapping probability would suffice; this can be achieved by spacing the λ_k close enough together. One can be more precise (see, e.g., [2]); it can be shown, by expanding log f(x | λ) to second order about λ_k, that the expected log acceptance rate between parameters k and k+1 under the equilibrium distributions is

    E[log(α)] ∼ −I(λ_k) (λ_{k+1} − λ_k)^2    (2)

where I(λ_k) = −Σ_x f_k(x) ∂²log f(x|λ)/∂λ² |_{λ=λ_k}. We will use this result in Section 3.1 to determine the initial spacing for the feedback procedure and in Section 3.2 to stabilize the algorithm.

As pointed out by [3], making PT work properly can be more subtle than merely assuring sufficient overlap between chains. Even though the marginal swap rate between all neighboring parameters may seem reasonable (say 25%), there can exist higher-order bottlenecks in parameter space that effectively choke the replica diffusion. A given replica can repeatedly make its way to the bottleneck but only rarely pass through it. Choosing the {λ_k} to remedy this issue is thus not a matter of merely achieving a given swapping rate between adjacent chains, but of ensuring that a large number of replica round trips takes place.

We now briefly summarize the insights of [3]. Let n_up(λ_k) and n_down(λ_k) denote, for a given run of PT, the total number of replicas that have visited parameter λ_k while drifting "upward" and "downward" respectively. An upward-drifting replica is one that, of the two terminal parameters, has visited λ_1 most recently; a downward-drifting one has last visited λ_M. Replicas that have not yet visited either endpoint do not contribute to the n_up or n_down sums.
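The n_up/n_down bookkeeping can be sketched as follows (a minimal illustration, not the authors' implementation; the `positions` and `labels` structures are hypothetical):

```python
class ReplicaFlow:
    """Track the n_up / n_down histograms for replica diffusion.

    Each replica carries a label: +1 if it last visited chain 0
    (parameter lambda_1), -1 if it last visited chain M-1 (lambda_M),
    and 0 if it has not yet reached either terminal chain.
    """
    def __init__(self, n_chains):
        self.M = n_chains
        self.n_up = [0] * n_chains
        self.n_down = [0] * n_chains

    def record(self, positions, labels):
        """positions[r] = chain index of replica r; labels updated in place."""
        for r, k in enumerate(positions):
            if k == 0:
                labels[r] = +1            # replica is now drifting upward
            elif k == self.M - 1:
                labels[r] = -1            # replica is now drifting downward
            if labels[r] == +1:
                self.n_up[k] += 1
            elif labels[r] == -1:
                self.n_down[k] += 1       # unlabeled replicas contribute nothing

    def flow_fraction(self, k):
        """The fraction f(lambda_k) of Eq. (3); None if no labeled visits."""
        tot = self.n_up[k] + self.n_down[k]
        return self.n_up[k] / tot if tot else None
```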
To maximize the rate at which the replicas diffuse from λ_1 to λ_M, the flow fraction

    f(λ_k) = n_up(λ_k) / [n_up(λ_k) + n_down(λ_k)]    (3)

should decrease linearly from λ_1 to λ_M [7]. In [3], an iterative algorithm is described where, after a run of PT with a given set of parameters, new parameters are generated from the old ones and the measured f so as to (hopefully) make f closer to optimal during the subsequent run of PT. In other words, the linearly-decreasing f is a fixed point of the procedure. The algorithm tends to move the parameter values away from where f has a small slope towards locations where it drops off sharply. For convenience, we restate in Algorithm 1 a version of it appearing in [4] that is somewhat more transparent than the original one defined in [3]. The procedure can be stopped before the passage of

Algorithm 1: Feedback-Optimized PT
Input:  {λ_k}: initial parameter set
        M: number of parameters
        N_iter: number of feedback iterations
        N_sweep: number of sweeps of PT within an iteration
Output: {λ_k}: optimized parameter set
begin
  for i = 1 ... N_iter do
    Run ParallelTempering({λ_k}) for N_sweep sweeps; calculate f
    Define g(f) such that:
      1. g(f(λ_k)) = λ_k
      2. Within (λ_k, λ_{k+1}), g(f) is a linear interpolation between g(f(λ_k)) and g(f(λ_{k+1}))
    for k = 2 ... M−1 do
      λ_k ← g(1 − (k−1)/(M−1))

N_iter iterations if some stopping criterion is specified and met; it may also be desirable to increase N_sweep from one iteration to the next [3].

Impressive results in eliminating a flow bottleneck in the planar ferromagnetic Ising model are presented in [3], which suggested that FOPT was a natural candidate to alleviate a similar issue we encountered while simulating Quantum Ising glasses with PT. Unfortunately, in its initial form, the algorithm was unstable on these problems: one issue was that it was too zealous in its placement of the parameters around the measured drop-off in the fraction. For a given number of chains and simulation time, this would cause some parameter intervals to become too wide, resulting in chains that no upward- or downward-moving replica visits, and hence breaking the recursion. We suspect this to be due to the extreme sensitivity of the systems' behavior around the bottleneck, and the fact that, owing to the large system sizes, it was infeasible to run PT for the times required to obtain enough statistics to accurately calculate f. Interestingly, this problem seemed to appear even when the number of chains was increased dramatically; its occurrence was merely delayed by a few iterations.

Nonetheless, several factors drove us to make practical modifications to FOPT, not least of which was the sheer impracticality of manually choosing good parameter values for a sizable number of very large and difficult problems. In the next section we detail our implementation improvements.

3 Improvements to FOPT

There were several components that we found to be necessary for feedback optimization to function properly on our set of problems, but we emphasize that, fundamentally, the goal is the same as that of the original FOPT, namely achieving a straight-line f.
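The interpolation step of Algorithm 1 reduces to a few lines. A sketch, under the assumptions that the measured f is decreasing with f = 1 at λ_1 and f = 0 at λ_M (the function name is our own):

```python
def fopt_update(lams, f):
    """One feedback iteration in the spirit of Algorithm 1 (a sketch).

    lams: current parameters, increasing; f: measured flow fraction at
    each parameter, decreasing with f[0] = 1 and f[-1] = 0.
    """
    M = len(lams)

    def g(target):
        # lambda as a piecewise-linear function of f (f is decreasing);
        # flat segments are skipped to avoid division by zero
        for k in range(M - 1):
            if f[k] >= target >= f[k + 1] and f[k] > f[k + 1]:
                t = (f[k] - target) / (f[k] - f[k + 1])
                return lams[k] + t * (lams[k + 1] - lams[k])
        return lams[-1]

    # place interior parameters at equally spaced target fractions,
    # keeping the terminal parameters fixed
    return ([lams[0]]
            + [g(1.0 - k / (M - 1)) for k in range(1, M - 1)]
            + [lams[-1]])
```

Because g is the inverse of f, parameters accumulate where f drops off steeply, exactly as described above.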
3.1 Initialization via AddChains

In some cases, good rules of thumb exist for choosing parameter spacings in PT; when dealing with temperature, for example, a geometric sequence has been suggested to be reasonable in certain circumstances [5]. Generally, though, such guidelines are unclear; indeed, this is the point of the feedback-optimization idea. In our practical experience, however, the initial parameter spacings, and especially the number of chains, can have a substantial influence on the efficacy of the feedback procedure for a given amount of computation time. In particular, our experience has shown that starting with an excessive number of chains can cause poor performance.

For our initialization strategy, we devised the simple method AddChains(), described in Algorithm 2. Starting from a sparse, linearly-spaced initial set of parameters, a short run of PT takes place within which the exchange rates are estimated. Parameters are then added in such a way as to achieve some minimum swap rate. The number of initial chains can be relatively small; for example, for the size-N spin systems we considered, we typically used on the order of √N chains. The swap rates can be estimated simply by counting accepted swaps after a short burn-in. However, it is possible, instead of averaging the numbers of swaps themselves, to average the log Metropolis ratios between neighboring chains each time swaps are attempted. Specifically, if during the n-th swap attempt we compute the ratio

    α^(n)_{k,k+1} = min( 1, [f_k(x_{k+1}) f_{k+1}(x_k)] / [f_k(x_k) f_{k+1}(x_{k+1})] )

and if N_swap is the number of swap attempts, we define

    L̂_{k,k+1} = (1/N_swap) Σ_{n=1}^{N_swap} log( α^(n)_{k,k+1} )    (4)

It is not necessary to carry out PT to full equilibrium to usefully estimate the log swap rates using (4); crude estimates calculated over a relatively short PT run can satisfactorily inform how many chains need to be added to intervals that are too wide, even when the estimated swap rates are very low.
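A sketch of the estimator (4), together with the per-interval chain count implied by the quadratic scaling (2) (illustrative only; the function names are our own):

```python
import math

def estimate_log_swap_rates(log_ratios):
    """L_hat_{k,k+1} of Eq. (4) for each neighboring pair of chains.

    log_ratios[k] is the list of raw log density ratios recorded for the
    pair (k, k+1); each is capped at 0 because alpha = min(1, ...).
    """
    return [sum(min(0.0, r) for r in ratios) / len(ratios)
            for ratios in log_ratios]

def chains_to_add(log_rate, alpha_min):
    """How many parameters to insert into one interval.

    By Eq. (2) the expected log accept rate scales with the squared
    interval width, so splitting an interval into R + 1 equal pieces
    scales log_rate by 1/(R + 1)^2; choose R so each piece meets alpha_min.
    """
    if log_rate >= math.log(alpha_min):
        return 0  # interval already meets the target swap rate
    return math.ceil(math.sqrt(log_rate / math.log(alpha_min)))
```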
Determination of the number of chains needed in each interval is then done using the second-order approximation in (2). A run of the initialization routine thus proceeds as follows: begin with a minimum swapping threshold α_min (e.g. 20%) and an initial number of chains M, with the terminal parameters λ_1 and λ_M fixed. After measuring the swap rates resulting from a uniform parameter grid, "grow" the number of chains (to a final value of M) so as to meet the threshold; practically, the list of new parameters can be generated by populating it with the initial values, appending the new ones to the end, and sorting it when finished.

Algorithm 2:
AddChains
Input:  M: initial number of parameters
        λ_1, λ_M: terminal parameters
        α_min: minimum target swap rate
Output: M: final number of parameters
        {λ_k}: final set of parameters
begin
  // Initial parameter list; linear spacing
  for k = 2 ... M−1 do
    λ_k ← λ_1 + (k−1)(λ_M − λ_1)/(M−1)
  Run PT on {λ_k} for N_sweep moves, obtaining {L̂_{k,k+1}}
  // Append parameters if intervals too wide
  M′ ← M
  for k = 1 ... M−1 do
    R ← ⌈ √( L̂_{k,k+1} / log(α_min) ) ⌉
    Δλ ← (λ_{k+1} − λ_k)/(R + 1)
    for j = 1 ... R do
      λ_{M′+j} ← λ_k + j Δλ
    M′ ← M′ + R
  M ← M′
  {λ_k} ← sort({λ_k})

The procedure can be repeated with the new parameters, though we have found this to be unnecessary. Once the initial set has been determined, the feedback phase, discussed in the next section, can begin.

3.2 Stabilizing the Feedback

The first problem to tackle in the feedback process is that of instability due to unreliable f data or extreme problem sensitivity. As alluded to in Section 2.2, if f possesses a severe drop-off, the original algorithm tends to over-concentrate the available chains around it, resulting in some intervals being too wide for any replica to pass through at all in the next iteration. f thus becomes mathematically undefined due to n_up = n_down = 0, and the algorithm malfunctions. To redress this, an approach we implemented was to smooth f in such a way that the parameters do not move too quickly. We defined f_smooth(λ_k) = (1 − w) f(λ_k) + w L(λ_k), where w is a weight between 0 and 1, and L(λ_k) = 1 − (k−1)/(M−1) is the ideal linearly-decreasing profile, or alternatively, the fixed point of the optimization algorithm. If w = 0, we recover the original FOPT; at the other extreme, if w = 1, the feedback does nothing to the parameters.
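Both stabilizers used in this section reduce to one-liners. A sketch (indices here are 0-based, so L becomes 1 − k/(M−1); the width cap anticipates the post-process step described next, and the names are our own):

```python
import math

def smoothed_flow(f, w):
    """f_smooth = (1 - w) * f + w * L, with L the straight-line profile.

    f: measured flow fraction at the M parameters; with 0-based
    indexing, L[k] = 1 - k/(M-1).  w in [0, 1] sets the damping strength.
    """
    M = len(f)
    return [(1.0 - w) * f[k] + w * (1.0 - k / (M - 1)) for k in range(M)]

def max_interval_width(old_width, old_log_rate, alpha_min):
    """Widest an interval may become before its predicted swap rate
    drops below alpha_min.

    Eq. (2) makes the expected log accept rate proportional to the
    squared interval width, so rescaling the measured pair
    (old_width, old_log_rate) to the target log(alpha_min) rescales
    the width by sqrt(log(alpha_min) / old_log_rate).
    """
    return old_width * math.sqrt(math.log(alpha_min) / old_log_rate)
```

At w = 1 the parameters are pinned to the ideal profile; at w = 0 the original FOPT update is recovered.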
At intermediate values of w, applying feedback to f_smooth instead of f has a damping effect on the parameters' motion. It may be advantageous to reduce w between subsequent feedback iterations as the estimates of f become better.

In our experiments, the smoothing trick on its own did not always eliminate the problem of overly-wide intervals, though it certainly postponed this unfortunate behavior. Consequently, we developed a method that overrides the feedback routine's output if the resultant swap rates are likely to be so low that they would cause the issue discussed above. We call this the post-process step; it is summarized in Algorithm 3. Using the approximate relation (2) between the log accept rate and the interval width, this procedure uses the accept rates estimated during the iteration to predict what the accept rates would be under the new parameters, i.e. those resulting from the feedback. If, for a given interval, the predicted rate is lower than a threshold α_min, the interval's width is thresholded. This method may result in the final interval being too wide, and since the endpoints are assumed to be static, more parameters are added there to compensate, again with the objective of attaining at least α_min in the resultant new intervals. Hence the total number of chains can continue to grow during the feedback procedure. Of course, this α_min need not be the same as the one used in the AddChains phase; indeed, we have found that it should be kept as low as possible (say around 5%) in order to minimize intrusiveness on the parameter search. Its role should be viewed exclusively as one of "life support."

A somewhat counter-intuitive fact we observed was that, in the initial iterations at least, it can sometimes be favorable to use the normalized function n′_down(λ_k) = 1 − n_down(λ_k)/n_down(λ_M), or its smoothed version, as a surrogate for f.
Due to the rapid equilibration at large λ_k, this quantity tends to stabilize into its final form more quickly than f does; using it instead of f makes the implicit assumption that the analogous function n′_up(λ_k) = n_up(λ_k)/n_up(λ_1) will, when it properly converges, be symmetric to it, i.e. n′_up ≈ n′_down. This assumption may turn out to be incorrect, but in some cases it is more reliable than using faulty f data at the beginning. As with the smoothing of f, as the iterations advance and the bottlenecks are more accurately located, one can start using the actual f in the feedback.

A final implementation aid that we noticed to speed up the convergence of f considerably within each feedback iteration was initializing all the chains with a ground state (global minimum) if one is known. If the ground state is unique, as it was in the systems we were considering, this is the correct equilibrium behavior at the low λ values.

The next section discusses the model we performed our experiments on. It can be skipped by non-physicists.

4 Quantum Monte Carlo Simulation

Equilibrium simulation of quantum spin systems can be done by application of the Suzuki-Trotter (ST) framework [6]; the quantum system's partition function is approximated by a partition function corresponding to that of a classical Ising model consisting

Algorithm 3:
Post-Process Parameters
Input:  {λ_k}: parameters at start of feedback iteration
        {λ′_k}: parameters after feedback
        {L̂_{k,k+1}}: log swap rates measured using parameters {λ_k}
        α_min: minimum target swap rate
        M: initial number of parameters
Output: {λ″_k}: parameters after post-process
        M: final number of parameters
begin
  // This stage handles {λ′_k} intervals that are too wide
  λ″_1 ← λ_1
  for k = 1 ... M−1 do
    l ← max{ j | λ_j ≤ λ″_k }          // old interval containing λ″_k
    Δλ ← λ_{l+1} − λ_l
    Δλ_max ← Δλ √( log(α_min) / L̂_{l,l+1} )
    Δλ′ ← λ′_{k+1} − λ″_k
    if Δλ′ > Δλ_max then
      λ″_{k+1} ← λ″_k + Δλ_max
    else
      λ″_{k+1} ← λ′_{k+1}
  // This stage adds chains to the final interval if needed
  l ← max{ j | λ_j ≤ λ″_{M−1} }
  Δλ ← λ_{l+1} − λ_l
  Δλ_max ← Δλ √( log(α_min) / L̂_{l,l+1} )
  M_old ← M
  λ ← λ″_{M−1} + Δλ_max
  while λ < λ′_{M_old} do
    M ← M + 1
    λ″_M ← λ
    l ← max{ j | λ_j ≤ λ }
    Δλ ← λ_{l+1} − λ_l
    Δλ_max ← Δλ √( log(α_min) / L̂_{l,l+1} )
    λ ← λ + Δλ_max
  M ← M + 1
  λ″_M ← λ_M

of multiple ferromagnetically-coupled copies ("Trotter slices") of the original system. If one can draw equilibrium samples from this effective classical system, expectations with respect to the quantum system can be estimated. Representations using more slices give a better approximation but are more computationally demanding to simulate.

Our aim in these simulations was estimation of the minimum energy gap as a function of the quantum adiabatic parameter λ ∈ [0, 1]. The Hamiltonian is given by H(λ) = A(λ) H_B + B(λ) H_P, where H_B and H_P are purely quantum and classical Hamiltonian operators respectively, and the functions A(λ), B(λ) are such that the system is classical and quantum for λ = 0 and λ = 1 respectively.

Omitting unnecessary details, the Hamiltonians H_P came from a specific class of non-planar Ising Glasses. Our gap calculation methodology was the same one appearing in [8], which crucially depends on obtaining accurate estimates of the spin correlation function.
For large system sizes, naïve Monte Carlo algorithms suffer from equilibration issues at small values of λ, hindering the estimation. It seemed natural that PT would assist with this problem; the set of target distributions f_k simply corresponded to the ST representations of the quantum systems at the {λ_k}. It turned out that for many such problems, PT can have serious problems of the sort discussed in this paper; their presence corresponded to the existence of first-order quantum phase transitions. Tuning the parameters by hand is very difficult for such problems; small changes to the parameters had dramatic effects on f. A PT optimization method was thus essential for the simulations we were interested in. As an added benefit, the method we implemented tends to concentrate the parameters precisely around the point we were most interested in, namely where the spectral gap is minimum.

5 Numerical Results

The PT optimization algorithm discussed in this paper was used to determine parameter spacings for hundreds of quantum spin systems, which, in their approximate representation, correspond to complex, binary-valued systems of up to 32768 variables (these resulted from representing a 128-spin quantum system with 256 Trotter slices). The parameters were determined individually for each instance, as it was observed that a set optimized for one instance was likely to perform poorly on other instances in the same class of problems.

For stepwise illustration, let us consider a particular 16-spin Quantum Ising Glass represented with 128 Trotter slices to yield a system size of N = 2048. This specific instance was extremely difficult to simulate with PT owing to a sharp first-order quantum phase transition. If the parameters were spaced so as to achieve a uniform swap rate, a dramatic flow bottleneck manifested itself.
It was also impervious to the initial FOPT algorithm, as well as to numerous other approaches that had seemed promising on other problems.

To allow a fair comparison, in both the original FOPT and the version with our improvements, we applied the AddChains initialization routine to determine the starting parameters, and the states were all seeded with the ground state.

First we demonstrate how Algorithm 2 performed. AddChains was called on this system using a target swap threshold of 18%, 70000 sweeps of PT, and 20 chains to start. Figure 1 shows both how the acceptance rates responded to the interspersion and the particular locations in λ space at which parameters were added. The estimator of the swap rates at the initial parameter spacing produced extremely low values of L̂ around λ ≈ 0.6. Our rough algorithm succeeded in achieving the target rate in all intervals.

In Figure 2, runs of the initial FOPT algorithm and of our stabilized version are shown. For our algorithm, we used the fairly strong smoothing value of w = 0.75; the number of PT sweeps was held constant at 200000 in each iteration. For PostProcess, the minimum interval swap rate was set to 3%. The
Figure 3: (Color online) Progress of the upward-moving fraction f when our algorithm is applied to the N = 2048 Ising spin system discussed in the text. Note the abrupt decline when PT is run with the results of AddChains. Subsequent iterations progressively make f closer to the desired form, namely a straight line. 200000 PT sweeps were used to gather the f statistics at each iteration. (Curves shown: initial run and iterations 2, 4, 8, and 11.)

details can be read from the caption in the Figure, but the essential point is that the original algorithm aggressively spaces the parameters in such a way that the recursion cannot proceed past a single iteration. For all but the endpoint parameters, a PT run of realistic length using such {λ} would give n_up = n_down = 0. For illustration, in Figure 3 we show the progression of f (not f_smooth) at different iterations of our method. Even on this highly troublesome instance, the parameters were finally spaced in such a way that enough round trips took place to allow accurate estimation of the minimum excitation gap.

To demonstrate efficacy on the typical system sizes we were ultimately interested in simulating, in Figure 4 we present results of the evolution of f on an N = 24576 spin model resulting from representing a 96-spin quantum system with 256 Trotter slices. The results were obtained using the same simulation parameter settings as those used for the

Figure 1: (Color online) Illustration of the AddChains algorithm on an N = 2048 Ising spin system. In the bottom graph, we show the grid of initial linearly-spaced parameters and the resultant λ after AddChains. The top graph plots the initial and final estimated swap rates after 70000 sweeps of PT (on the left side of the plot, the blue and red lines are superimposed). In a region of λ extending up to ≈ 0.8, the initial estimated swap rate plunged to extremely low values; after the parameter addition, all intervals had swaps taking place above the criterion rate, shown by the black dotted line. The initial grid consisted of 20 parameters, to which 46 were added by the algorithm.
Figure 4: (Color online) Progress of the upward-moving fraction f when the algorithm is applied to the N = 24576 Ising spin system discussed in the text. (Curves shown: initial run and iterations 2, 4, and 5.)

previously-discussed 16-spin system. The resultant parameter spacing again allowed sufficiently good PT performance for us to obtain high-quality Monte Carlo data.

6 Conclusions

We have presented improvements to the problem of iterative parameter selection for PT. Our contribution has consisted of three parts: first, an initialization strategy to place parameters in a reasonable manner when no a priori information is known about the problem domain; second, the introduction of damping mechanisms to the FOPT algorithm of [3] that assist in tackling the problem of instability; and third, a post-processing procedure that prevents the algorithm from malfunctioning. We demonstrated the effectiveness of the method by showing experiments on a particularly difficult quantum spin system represented as a classical Ising model with 2048 spins. We have also tested our algorithm on much bigger systems, with satisfactory results.

Figure 2: (Color online) Illustration of the progression of the parameters using the original FOPT algorithm (a) and the version with our modifications (b) on the N = 2048 Ising spin system discussed in the text. Both algorithms began with the λ spacings generated by AddChains; all replicas were initialized with the ground state. At left, we see that in a single iteration, FOPT spaces all the parameters around λ ≈ 0.6, resulting in very wide intervals between the endpoints. The algorithm could thus not proceed past this point because the replicas in the middle could no longer visit either terminus; consequently f was mathematically undefined. Our algorithm's output appears at right. At each iteration step, the resultant λ using a smoothed version of f in FOPT is shown in blue; the λ returned when those parameters are passed to the PostProcess algorithm appear in red. Note that to keep the swap rate over the threshold of 3%, PostProcess often "pulled" parameters back when it predicted that an interval was too wide.

Acknowledgements
We would like to thank Helmut Katzgraber, Peter Young, Mohammad Amin, and Geordie Rose for valuable discussions and insights.
References

[1] K. Hukushima and K. Nemoto. Exchange Monte Carlo method and application to spin glass simulations. Technical Report cond-mat/9512035, Dec 1995.
[2] Y. Iba. Extended ensemble Monte Carlo. International Journal of Modern Physics C, 12:623, 2001.
[3] H. G. Katzgraber, S. Trebst, D. A. Huse, and M. Troyer. Feedback-optimized parallel tempering Monte Carlo. Journal of Statistical Mechanics: Theory and Experiment, 2006(03):P03018, 2006.
[4] W. Nadler and U. H. E. Hansmann. Generalized ensemble and tempering simulations: A unified view. Phys. Rev. E, 75(2):026109, Feb 2007.
[5] C. Predescu, M. Predescu, and C. V. Ciobanu. The incomplete beta function law for parallel tempering sampling of classical canonical systems. The Journal of Chemical Physics, 120(9):4119–4128, 2004.
[6] M. Suzuki. Relationship between d-dimensional quantal spin systems and (d+1)-dimensional Ising systems: Equivalence, critical exponents and systematic approximants of the partition function and spin correlations. Progress of Theoretical Physics, 56(5):1454–1469, 1976.
[7] S. Trebst, D. A. Huse, and M. Troyer. Optimizing the ensemble for equilibration in broad-histogram Monte Carlo simulations. Phys. Rev. E, 70(4):046701, Oct 2004.
[8] A. P. Young, S. Knysh, and V. N. Smelyanskiy. Size dependence of the minimum excitation gap in the quantum adiabatic algorithm.