[PDF] Nonflat Histogram Techniques for Spin Glasses

Abstract

We study the bimodal Edwards-Anderson spin glass comparing established methods, namely the multicanonical method, the 1/k -ensemble and parallel tempering, to an approach where the ensemble is modified by simulating power-law-shaped histograms in energy instead of flat histograms as in the standard multicanonical case. We show that by this modification a significant speed-up in terms of mean round-trip times can be achieved for all lattice sizes taken into consideration.

Full PDF

aa r X i v : . [ c ond - m a t . d i s - nn ] N ov Nonﬂat Histogram Techniques for Spin Glasses

Fabio M¨uller, ∗ Stefan Schnabel, † and Wolfhard Janke ‡ Institut f¨ur Theoretische Physik, Universit¨at Leipzig, IPF 231101, 04081 Leipzig, Germany (Dated: November 18, 2020)We study the bimodal Edwards-Anderson spin glass comparing established methods, namely themulticanonical method, the 1 /k -ensemble and parallel tempering, to an approach where the ensembleis modiﬁed by simulating power-law-shaped histograms in energy instead of ﬂat histograms as inthe standard multicanonical case. We show that by this modiﬁcation a signiﬁcant speed-up in termsof mean round-trip times can be achieved for all lattice sizes taken into consideration. I. INTRODUCTION

Simulations of systems with rugged free-energy land-scape [1] suﬀer from massive slowing down of the dy-namics in the low-temperature phase. This problem en-countered in many physical systems, e.g., folding poly-mers or spin glasses, renders the investigation of thethermodynamical properties of such systems in the low-temperature phase a very challenging task.The Metropolis algorithm [2] is designed to sample con-ﬁgurations according to their statistical weight in thecanonical ensemble. At low temperatures it fails dra-matically for systems with rugged free-energy landscapebecause of the eﬀectively broken ergodicity. The simula-tions get stuck in local minima (metastable states) andthe thermal energy is not suﬃcient to overcome the hugefree-energy barriers. There have been a wide range ofalgorithmic developments tackling this problem that canall be subsumed in the term of broad-energy ensembles.One commonly employed method is the Parallel Tem-pering (PT) [3–5] method where Metropolis simulationsof copies of the system (replicas) at diﬀerent tempera-tures are performed. After certain time intervals ex-changes of the replicas between the diﬀerent tempera-tures are attempted. This procedure enables the replicasat low temperature to fully explore deep free-energy val-leys and at high temperature to travel freely through thephase space and thus to decorrelate. Thereby, the diﬀer-ent replicas can explore the rugged structure of the free-energy landscape much more eﬃciently than in a simpleMetropolis simulation. Using this method at tempera-tures close to the transition, studies of spin-glass sys-tems of sizes up to 40 spins have been reported [6, 7].For ground-state searches systems of about 10 are fea-sible [8]. The great advantage of PT is its simplicity:the algorithm only needs a suitable temperature set andexhibits good performance which makes it probably themost employed method in the investigation of systemswith rugged free-energy landscape.Another recent development is the Population Anneal-ing Monte Carlo [9–13] method which proceeds similarly ∗ [email protected] † [email protected] ‡ [email protected] to Simulated Annealing [14] as the system is graduallycooled down according to an annealing schedule. The an-nealing is performed on a big population of replicas andby introducing intermediate resampling of the populationof replicas after lowering the temperature the simulationis kept at thermal equilibrium. This permits the evalua-tion of thermodynamic observables in contrast to simpleSimulated Annealing. Despite the attempts of optimizingthe method for spin glasses [10, 12, 15] it is not able tooutperform PT. Its optimization, however, remains morecumbersome due to the additional complexity. The mainadvantage of this algorithm is its suitability for massivelyparallel implementation. For disordered systems, how-ever, this advantage does not come into play, becausethe necessity of simulating many diﬀerent disorder real-izations allows the eﬃcient use of parallel computing forany method.The Multicanonical (MUCA) method [16–18] is an-other well established algorithm designed for the simu-lation of systems with ﬁrst-order phase transitions whichperforms well in the simulation of systems with ruggedfree-energy landscape, too. It has already been appliedto spin glasses in Refs. [19, 20]. In this method the simu-lation is set up to visit all possible energies with the sameprobability yielding a ﬂat histogram in energy. However,it has been noted by diﬀerent researchers that this en-semble is not optimal. One suggested improvement is the1 /k -ensemble by Hesselbo and Stinchcombe [21], wherethe sampling distribution is the inverse of the integrateddensity of states. As the authors point out, this descrip-tion samples the low-energy region more often than thehigh-energy region, resulting in energy histograms whichgrow towards the low-temperature phase.Another non-parametric optimization of the MUCA al-gorithm was proposed in Ref. [22]. The method uses anestimator of the local diﬀusivity in order to maximize thenumber of performed round trips in energy. The methodis among others applied to the ferromagnetic Ising modelfor which it improves the scaling behavior of the round-trip times in energy. The improved performance for themodels considered in that work and the nature of thealgorithm of automatically identifying the bottlenecks ofthe simulation and concentrating the simulation eﬀort onthis region suggest that the round-trip times of the simu-lations should diminish independently of the underlyingsystem. However, in our implementation, in the case ofthe three-dimensional (3D) bimodal Edwards-Anderson(EA) spin glass [23], the round-trip times did not sys-tematically improve with this method. Instead the simu-lation got stuck for some of the considered samples, ren-dering a comparison to the other methods impossible.In this work we present a diﬀerent approach: we pre-scribe parametric proﬁles for the histograms of the simu-lation and adjust the simulation weights accordingly. Asfor the three previous MUCA variants, it requires theknowledge of the underlying density of states, but it ismuch more ﬂexible. The proﬁles are all chosen to beshifted power laws having two free parameters.As an example we consider the 3D bimodal EA spinglass. This is one of the simplest models exhibiting arugged free-energy landscape and is also interesting fromthe point of view of an optimization problem where ﬁnd-ing ground states of hard disorder realizations is NP-hard [24]. Despite the exponential growth of the compu-tational resources fundamental questions regarding thenature of the spin-glass phase still remain. For theprogress in understanding the open questions the devel-opment of new methods and an improvement of the ex-isting methods is crucial.The rest of the paper is organized as follows. In Sec. IIthe spin-glass model and the simulation methods are ex-plained. The direct comparison of the round-trip timesof the individual methods is performed in Sec. III. Theframework of extreme-value statistics is introduced inSec. IV. In Sec. V benchmarks for the global comparisonare discussed and the diﬀerent methods are compared interms of those benchmarks. The results are summarizedin Sec. VI. II. MODEL AND EMPLOYED METHODS

We take into consideration the 3D bimodal EA modelwhose Hamiltonian takes the form H = − X h ij i J ij S i S j , (1)where the bonds J ij and the spins S i can take values ± V couplings J ij which are either positive ornegative unity with a probability of 50%, where V = L is number of spins in a lattice of linear lattice size L . Thedisorder realizations are generated prior to the simulationand then kept ﬁxed for all times (quenched disorder). Asan adequate set of disorder realizations 4000 samples with L = 3 and L = 4 are generated and 5000, 6000, and 4000samples of size L = 5 ,

6, and 8, respectively.The method which we adapted is the well-establishedMUCA method [17] employing a generalized Metropolis − − − − −

200 0 E H ( E ) E g E g − ∆ E ﬂat1 /k PTpower law 10 P S H ( E ) P SH ( E ) FIG. 1. The recorded histograms H ( E ) of the diﬀerent meth-ods and the proﬁle function P SH ( E ) for one disorder realiza-tion of linear lattice size L = 8. The dotted and the dashedvertical lines indicate the position of the ground-state energy E g and the position of the pole of the power law (5), respec-tively. criterion with an energy dependent weight function P acc = min (cid:18) , W ( E new ) W ( E old ) (cid:19) , (2)where the weight function is proportional to the inverseof the density of states Ω( E ), W ( E ) ∝ Ω − ( E ) . (3)For the MUCA simulations Ω( E ) has to be suﬃcientlywell-known a priori for each disorder realization. An es-timator for it can, for instance, be obtained by means ofthe Wang-Landau algorithm [25] or, as in this work, byother iterative procedures which are explained, e.g., inRef. [26]. This ensemble produces histograms which areﬂat in energy and is, therefore, often also referred to as“ﬂat histogram method”.A straightforward generalization of the ﬂat histogrammethod are the nonﬂat histogram methods. If the sim-ulation weights for the ﬂat MUCA method are multi-plied with the desired energy dependent shape (or proﬁle)function P SH ( E ) W ( E ) ∝ Ω − ( E ) P SH ( E ) , (4)the resulting histograms will be shaped according to P SH ( E ). In this work all the proﬁles are shifted powerlaws of the form P SH ( E, ∆ E, α ) = (cid:18) E ∆ E − E g + 1 (cid:19) α , (5)where the exponent α < E > E g of therespective spin-glass realization. In this parametrizationthe power laws are normalized to unity at E = 0.In Fig. 1 the recorded histograms of the diﬀerent meth-ods are displayed on a logarithmic y -scale for one disor-der realization with L = 8. In contrast to ﬂat MUCA allmethods have in common that the distribution of sam-pled states grows towards the ground-state energy. Therecorded histogram of nonﬂat MUCA matches perfectlythe imposed proﬁle and its histogram in the ground-stateregion is similar to that of PT. We are convinced that thisfeature which among the existing methods is strongestfor PT enhances the ability of sampling the low-energyregion and especially the ability of ﬁnding low-energystates of investigated systems. There are diﬀerent possi-ble choices of functional forms which enhance the sam-pling of the low-energy region and even stepwise deﬁnedfunction could be employed and might even yield betterresults. We chose a power law because the two involvedparameters allow for a good adaptation but the tuning ofthe parameters in the two-dimensional parameter spaceremains feasible.For the above parametrization we found a ﬁxed param-eter set namely α = − . E = 96 which indepen-dently of the lattice size yielded the shortest mean round-trip times, among the considered proﬁles. Subsequentlywe will refer to the nonﬂat MUCA setting with the power-law shape belonging to this parameter set just as power-law (PL) setting or nonﬂat MUCA method. While theoverall best results are obtained with this parameter set,we want to point out that an improvement compared toﬂat MUCA was visible for each of the considered param-eter sets. The parametrization with a ﬁxed oﬀset fromthe ground-state energy yields diﬀerent relative distribu-tions depending on the ground-state energy encounteredin the respective disorder realization. The value of theproﬁle function at the ground-state energy is given by P SH ( E g , ∆ E, α ) = − E g ∆ E ! α . (6)The sampling at the ground-state energy compared tozero energy is thus enhanced by a factor of ≈

13 for adisorder realization with L = 4 and a typical ground-state energy of ≈ − L = 8 andtypical ground-state energy of ≈ −

900 instead it is en-hanced by a factor of ≈ /k -ensemble [21] is considered which is de-ﬁned by setting the simulation weights equal to the in-verse of the integrated density of states up to the energyof the respective bin W /k ( E ) ∝ /k = Z EE g dE ′ Ω( E ′ ) ! − . (7)Here, a ﬁrst-order Taylor expansion of ln Ω at E leadsto W ( E ) ≈ W /k ( E ) if P SH ( E ) = d ln Ω( E ′ ) /dE ′ | E ′ = E .This prescription again relies on the knowledge of thedensity of states. The authors of Ref. [21] stress its robust ergodicity and apply it to spin glasses and the travelingsalesman problem [27].Since for the above mentioned methods the density ofstates is the only needed input it was determined onlyonce to high accuracy employing the iterative procedureadapted from Ref. [26] but with power-law shaped distri-butions in energy. In this case, and generally when theground-state energy of the system is not known, a priorithe proﬁle function has to be adapted whenever a lowerenergy is found.Lastly, the PT method being probably the most em-ployed algorithm for spin-glass simulations, is includedin the comparison. The ensemble in this case is deﬁnedby a set of M temperatures { T i , i = 1 , ..., M } . For eachtemperature T i a Metropolis simulation of a copy of thesystem (replica) is performed. The temperatures of thereplicas i and j are allowed to exchange conﬁgurationaccording to P ex ij = min (cid:16) , e ( Tj − Ti )( E j − E i ) (cid:17) , (8)where E i and E j are the energies of replica i and j and k B = 1. This prescription allows for fast decorrelationwhen a replica travels to high temperature and the explo-ration of the local minima at low temperatures. Amongthe vast choice of diﬀerent PT protocols available [28]we opted for the constant exchange rate protocol withacceptance rates between 40% and 60% [29]. For all sim-ulations the maximal temperature was chosen to be wellabove the critical temperature, T max > > T c ≈ M = 7 , , , , and 20for L = 3 , , , , and 8, respectively. We note that thechoice of the temperature set is crucial for the PT algo-rithm and also provides the possibility of optimizationsas for example in Ref. [30]. However, in this work werather limit ourselves to a well established protocol forPT focusing on the optimization of the nonﬂat histogramtechnique. III. COMPARISON OF THE ROUND-TRIPTIMES

The observable taken into account for this study is theround-trip time. For all methods except PT and eachdisorder realization it is deﬁned as the time needed by thesimulation to travel from the highest energy ( E ≈

0) tothe ground-state energy and back. For PT, instead, theround trip is measured between the ground-state energyand an energy typical for a canonical ensemble with atemperature well above the freezing point of the disorderrealization [31][32]. This time can be taken as an upperbound of the autocorrelation time of the energy of therespective disorder realization at the ground state. Wewant to stress that the energies we refer to as ground-state energies are the lowest encountered energies and (a) (b) ( ) (d) (e) (f) τ P T ( i ) τ ﬂat ( i ) L = 4 τ ﬂ a t ( i ) τ PL ( i ) L = 4 τ P T ( i ) τ PL ( i ) L = 4 τ P T ( i ) τ ﬂat ( i ) L = 8 τ ﬂ a t ( i ) τ PL ( i ) L = 8 τ P T ( i ) τ PL ( i ) L = 8FIG. 2. Scatter plots of the round-trip times comparing the nonﬂat power-law histogram technique (PL) to the standard ﬂatMUCA (ﬂat) and parallel tempering (PT) methods for sizes L = 4 (upper panels) and L = 8 (lower panels). All points scatteredabove the identity line have longer round-trip times for the method on the y axis. may not be the true ground states. However, the round-trip times were always measured performing at least 100round trips for each individual sample and method sothat several hundred round trips have been performedon each disorder realization. In case lower energies weremeasured during this process the disorder realization wasrequeued and simulated again until the desired numberof round trips was achieved. This procedure renders thediscovery of the true ground state very probable. Afterat least 100 round trips the relative statistical error inthe round-trip time τ i is of the order of ∆ τ i /τ i ≈ . L = 4 and L = 8 on a log-log scale. The strong correlation of the round-trip timesfor each single disorder realization should be noted, indi-cating that the hardness of the underlying optimizationproblem is primarily a characteristic of the disorder real-ization and mostly independent of the employed method.This fact allows us to categorize the disorder realizationsand speak of easy and hard instances. Comparing theround-trip times τ i for the ﬂat MUCA method and the parallel tempering method (left panels) for both latticesizes L = 4 and L = 8, the τ i are systematically lowerfor PT, indicating its superior performance for the wholeclasses of the bimodal EA spin glasses of the respectivelattice sizes.When comparing the performance of the nonﬂat his-togram method to the ﬂat MUCA method (central pan-els) the surrounding area of the scattered round-triptimes shows a bending, i.e., for L = 4 the ﬂat histogrammethod displays only slightly higher round trip times forthe easy disorder realizations. With increasing hardnessthe round-trip times for the ﬂat histogram method growfaster than for the PL setting. This eﬀect gets enhancedwith a further increase of the lattice size (see lower panel)where for the case of L = 8 the round-trip times of theeasiest samples for the ﬂat MUCA method are similar tothose for PL. However, as will become apparent in thenext section, the hard samples contribute most to themean round-trip time so that even a slightly weaker per-formance for the easier samples would hardly contributeto the total computation time.The right panels show the comparison of PL to PT. For L = 4 PT outperforms the nonﬂat histogram method forthe easy disorder realizations, while for the hard onesPL displays shorter round-trip times. For L = 8 a large τ . . . . . . F ( τ ) ﬂat1 /k PTpower law 10 τ . . . . . . F ( τ ) ﬂat1 /k PTpower law200 400 600 800 τ f ( τ ) (a) (b) FIG. 3. Round-trip time distributions (symbols) and best ﬁtting cumulative distribution functions (lines) for the diﬀerentmethods and lattice size L = 4 (a) and L = 8 (b). The inset of the left panel shows the PDF form of the distribution. fraction of the disorder realizations is characterized byshorter round-trip times for PT, but the tail of the dis-tribution describing the hard samples exhibits shorterround-trip times for PL. IV. ROUND-TRIP TIME DISTRIBUTIONS

In order to quantify the observations of the previoussection the distributions of the round-trip times can beexamined. Round trips in energy include the visit of theground state of the respective disorder realization whichis an extreme event. Their statistics must thus be de-scribed in the framework of extreme-value statistics. Oneof the main results in this ﬁeld is given by the Fisher-Tippet-Gnedenko theorem [33] which characterizes thetype of distributions which extreme-value distributionscan converge to. The round-trip time distributions ofthe bimodal EA spin glass all seem to converge to Fr´echetdistributions independently of the method and the sys-tem size. This has already been suggested in Ref. [34]for the round-trip time distributions of the 3D EA modelemploying the ﬂat histogram ensemble.One parametrization of the cumulative distributionfunction (CDF) of the Fr´echet distribution is given by F ( τ ) = exp " − (cid:18) ξ τ − µβ (cid:19) − /ξ , (9)with τ ∈ [ µ − β/ξ, ∞ ). The location of the distributionalong the τ -axis is determined by µ , β is the scale pa-rameter, and the shape parameter ξ describes the decayof the tail of the distribution, i.e., the occurrence of rareevents. The CDF is the integrated form of the proba-bility density function (PDF) f ( τ ). The round-trip timedistributions are thus all deﬁned by sets of parameters µ, β, ξ which are determined by ﬁtting the CDF to therecorded round-trip times.The measured round-trip times and the respectivelybest ﬁtting Fr´echet distribution for lattice sizes L = 4 L . . . . ξ ( L ) ﬂat1 /k PTpower law

FIG. 4. The ﬁgure shows the shape parameter of the bestﬁtting Fr´echet distribution in dependence of the lattice size forthe diﬀerent employed simulation methods. The dotted lineindicates the threshold value from which on the distributionmean diverges. and L = 8 are plotted in Fig. 3. The points representthe measured data and the solid lines are the best ﬁttingFr´echet distributions. The varying performance of themethods in dependence on the diﬃculty of the disorderrealizations which became visible in the last section, alsoreﬂects in the distribution of the round-trip times. Forboth lattice sizes the CDF belonging to the ﬂat MUCAmethod is lower for all τ than the one belonging to PT.The maximum increase which corresponds to the bulkof the distribution is shifted to higher τ for MUCA ascompared to PT.Comparing PT instead to the PL setting yields a dif-ferent picture: the cumulative distribution functions forlattice size L = 4 cross at F ( τ ) ≈ /

3, corresponding toa round-trip time τ ≈ × . This means that for thePT algorithm the easiest one third of all samples havesmaller round-trip times than the easiest third for thePL method, while PL is faster for the harder two thirds.For L = 8 the PL round-trip times are larger for the eas-ier half of the samples and smaller for the harder half.The round-trip times for the hard disorder realizations

10 100 1000 L = 4 L = 6 L = 8 τ ( n ) n FIG. 5. Illustration of the convergence of the population meanof the round-trip times for the ﬂat MUCA method in depen-dence of the population size to the distribution mean of theunderlying distribution. The solid lines are the running meanincluding the ﬁrst n samples and the dotted lines in the re-spective color are the means of the underlying distribution. have most inﬂuence on the decay of the distribution andthus on the shape parameter ξ . In Fig. 4 the scalingof the shape parameter for the diﬀerent methods is dis-played, where the errors of the best ﬁtting parametersare estimated via jackkniﬁng [35]. For the consideredlattice sizes the shape parameter scales similarly for allthe diﬀerent methods. However the values for PL are sys-tematically lower for L ≥

4. This is in good agreementwith its superior performance for the diﬃcult disorderrealizations.

V. ASSESSING THE PERFORMANCE OF THEDIFFERENT METHODS

Next, we want to compare the performance of the dif-ferent simulation methods. The most intuitive observablewould be the disorder average of the round-trip timesover the set of considered disorder realizations. How-ever, as it will turn out the rare-state events which havea dominating inﬂuence on the distribution mean are notwithin the set of simulated disorder realizations. Thiseﬀect is accounted for by considering distribution meansup to large quantiles of the underlying extreme-value dis-tributions, yielding a more reliable measure of the real performance of the diﬀerent methods.

A. Finding a Benchmark

In principle the real performance could be determinedby measuring the round-trip time of every possible dis-order realization. This procedure is discarded due to theenormous number of possible disorder realizations [36].Instead we generate a subset of all possible disorder real-izations and from those we try to infer the expected mean round-trip time of all the disorder realizations belongingto the same problem class, by means of the populationmean τ pop . This is a standard approach in all MonteCarlo studies and the law of large numbers assures itsconvergence for all random variables from distributionswith well deﬁned mean. However, this prerequisite is notfulﬁlled for all of the round-time distributions encoun-tered in this work.The expected mean round-trip time resulting from theunderlying probability density could be estimated by the distribution mean h τ i = ∞ Z µ − β/ξ dτ τ f ( τ ) . (10)This integral can be computed analytically, yielding h τ i = ( µ + βξ [Γ (1 − ξ ) −

1] for ξ < ∞ otherwise , (11)with Γ( x ) being the gamma function. The distributionmean is, therefore, only deﬁned as long as the shape pa-rameter ξ is smaller than one [37]. To illustrate this diﬃ-culty one can consider the running mean which is deﬁnedas the population mean over the ﬁrst n generated disor-der realizations keeping them in a ﬁxed order, τ ( n ) = 1 n n X i =1 τ i , (12)implying τ pop = τ ( N ), where N is the number of allsimulated disorder realizations.In Fig. 5 the running mean for the ﬂat MUCA methodand diﬀerent system sizes is plotted together with therespective distribution mean, if it is deﬁned. For L = 4( ξ ≪

1) the running mean quickly converges to the dis-tribution mean indicated by the dotted line. For L = 6( ξ ≈

1) the jumps due to rare events in the tail of the dis-tribution become more pronounced. The running meanis still expected to approach the distribution mean for aﬁnite number of disorder realizations. For the 6000 sam-ples considered in our work this is still not the case. For L = 8 ( ξ >

1) the distribution mean is not deﬁned. Inthe picture of the running mean, jumps represent round-trip times in the tail of the distribution. In the case of ξ > τ n /n in the running mean are clearlyvisible in Fig. 5 and will lead to a divergence of the popu-lation mean the more disorder realizations are taken intoaccount and hence the more rigorously the tail of the dis-tribution is explored. This illustrates that the populationmean as a measure for the performance of the diﬀerentmethods must be taken with a grain of salt.In order to retain the characteristics of the underlyinground-trip time distribution into the estimator of the per-formance of the diﬀerent methods the distribution meanup to a certain quantile can instead be taken into account. (a) (b) τ p o p L (cid:29)at /k PTpower law h τ i ǫ = − L (cid:29)at /k PTpower law

FIG. 6. Population mean τ pop (a) and quantile mean h τ i ǫ =10 − (b) of the round-trip times for the diﬀerent methods as afunction of system size. The latter is the more reliable statistical quantity. The quantile function is the inverse of the CDF (9), Q ( p ) = F − ( p ) = µ + βξ · h ( − ln p ) − ξ − i , p ∈ (0 , , (13)yielding the round-trip time τ p = Q ( p ) at which a certainfraction p of the distribution is accumulated. For each ǫ < quantile mean h τ i ǫ disregarding afraction ǫ of the tail of the distribution as the integral (10)with the upper bound replaced by Q (1 − ǫ ), h τ i ǫ = Q (1 − ǫ ) Z µ − β/ξ dτ τ f ( τ ) . (14)The integral is evaluated with the parameters of the bestﬁtting distributions to the measured round-trip times, seeFig. 3. This enables a well-deﬁned extrapolation beyondthe measured round-trip times of the simulated disorderrealizations of the underlying study and thus a compari-son of the diﬀerent methods beyond the mere populationmean, which may be strongly dependent on the set ofdisorder realizations taken into account for the study. B. Comparison of the Diﬀerent Methods

Finally, for the comparison of the mean round-triptimes only the population mean τ pop and the quantilemean h τ i ǫ =10 − neglecting a fraction ǫ = 10 − of the tailof the distribution are taken into account as the distribu-tion mean for the parallel tempering method is alreadyill-deﬁned for L = 6.The two deﬁnitions are evaluated for all simulated lat-tice sizes and plotted in Fig. 6. Both deﬁnitions of themean grow exponentially up to linear system size L = 6until which the mean is deﬁned, while for L >

6, wherethe distribution means diverge, they seem to be growingfaster than exponentially. We have also looked at thescaling of the more commonly used quantiles including

TABLE I. Ratios of the population mean τ pop and the quan-tile mean h τ i ǫ =10 − of the round-trip times for ﬂat MUCA,the 1 /k -ensemble, and parallel tempering with respect to thesame quantities for the power-law MUCA method.ﬂat MUCA 1 /k -ensemble parallel tempering L r pop r ǫ =10 − r pop r ǫ =10 − r pop r ǫ =10 − . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . the median [38, 39], which are derived directly from the τ -values without the intermediate step of ﬁtting to a sta-tistical model. These quantiles behave similarly to thequantile means (14) being, however, less stable for small ǫ . For the direct comparison of PL with the existingmethods we introduce the relative performance r whichwe deﬁne as the fraction of the mean of the respectivemethod and the one of PL. In Table I, the relative per-formance for all diﬀerent system sizes is listed. The errorsin r are estimated using the Jackknife resampling tech-nique. It consists in generating a set of ratios { r i } , wherefor the calculation of each r i only a subset of all disorderrealizations is taken. The error in r is derived from thevariance of the so generated Jackknife sample.The speedup of PL compared to ﬂat MUCA increaseswith system size, reaching a factor of more than 10 for L = 8 for both deﬁnitions of the mean, while comparedto the 1 /k ensemble the speedup for the biggest systemsize is still a factor of r ≈ −

9. Compared to PT thespeedup is less pronounced and not steadily growing withsystem size, reaching a factor of r ≈ − VI. CONCLUSION

Setting up multicanonical simulations such that theoutcoming histograms are shaped according to powerlaws instead of being ﬂat is trivially achievable. Nev-ertheless this simple approach enables us to gather sig-niﬁcantly more independent statistics at the ground-stateenergy, which is important because the thermodynamiccontributions of the ground state of spin glasses are be-lieved to be signiﬁcant. It is likely that similar techniqueswill also improve the sampling of the ground state ofother systems with complex free-energy landscape suchas polymers and in particular proteins, for which the im-portance of the native state is well known.While PT has been the most employed method in thesimulation of spin glasses probably also due to its goodability to investigate the ground-state region, we wereable to show that the power-law setting considerably im-proves the performance of multicanonical simulations inthis respect, rendering them at least comparable to PT.The overall gain in performance grows with increasing lattice size and reaches a factor of up to 10 −

15 in com-parison to ﬂat MUCA and still a factor of up to 2 − ACKNOWLEDGEMENT

This project was funded by the Deutsche Forschungs-gemeinschaft (DFG, German Research Foundation) un-der project No. 189 853 844 – SFB/TRR 102 (projectB04). It was further supported by the Deutsch-Franz¨osische Hochschule (DFH-UFA) through the Doc-toral College “ L ” under Grant No. CDFA-02-07. [1] W. Janke, ed., Rugged Free Energy Landscapes: Com-mon Computational Approaches to Spin Glasses, Struc-tural Glasses and Biological Macromolecules , Lect. NotesPhys., Vol. 736 (Springer, Berlin, 2008).[2] N. Metropolis, A. W. Rosenbluth, M. N. Rosen-bluth, A. H. Teller, and E. Teller, Equation ofstate calculations by fast computing machines,J. Chem. Phys. , 1087 (1953).[3] K. Hukushima and K. Nemoto, Exchange Monte Carlomethod and application to spin glass simulations,J. Phys. Soc. Jpn. , 1604 (1996).[4] R. H. Swendsen and J.-S. Wang, ReplicaMonte Carlo simulation of spin-glasses,Phys. Rev. Lett. , 2607 (1986).[5] C. J. Geyer and E. A. Thompson, Annealing Markovchain Monte Carlo with applications to ancestral infer-ence, J. Am. Stat. Assoc. , 909 (1995).[6] M. Hasenbusch, A. Pelissetto, and E. Vicari, Criticalbehavior of three-dimensional Ising spin glass models,Phys. Rev. B , 214205 (2008).[7] M. Baity-Jesi, R. A. Ba˜nos, A. Cruz, L. A. Fer-nandez, J. M. Gil-Narvion, A. Gordillo-Guerrero,D. I˜niguez, A. Maiorano, F. Mantovani, E. Mari-nari, V. Martin-Mayor, J. Monforte-Garcia, A. Mu˜nozSudupe, D. Navarro, G. Parisi, S. Perez-Gaviro, M. Pi-vanti, F. Ricci-Tersenghi, J. J. Ruiz-Lorenzo, S. F.Schifano, B. Seoane, A. Tarancon, R. Tripiccione,and D. Yllanes (Janus Collaboration), Critical pa-rameters of the three-dimensional Ising spin glass,Phys. Rev. B , 224416 (2013).[8] W. Wang, J. Machta, and H. G. Katzgraber, ComparingMonte Carlo methods for ﬁnding ground states of Isingspin glasses: Population annealing, simulated annealing,and parallel tempering, Phys. Rev. E , 013303 (2015). [9] Y. Iba, Population Monte Carlo algorithms, Trans. Jpn.Soc. Artif. Intell. , 279 (2001).[10] K. Hukushima and Y. Iba, Population annealing and itsapplication to a spin glass, AIP Conf. Proc. , 200(2003).[11] J. Machta, Population annealing with weighted averages:A Monte Carlo method for rough free-energy landscapes,Phys. Rev. E , 026704 (2010).[12] W. Wang, J. Machta, and H. G. Katzgraber, Populationannealing: Theory and application in spin glasses, Phys.Rev. E , 063307 (2015).[13] L. Y. Barash, M. Weigel, M. Borovsk´y, W. Janke, andL. N. Shchur, GPU accelerated population annealing al-gorithm, Comput. Phys. Commun. , 341 (2017).[14] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, Optimiza-tion by simulated annealing, Science , 671 (1983).[15] A. Barzegar, C. Pattison, W. Wang, and H. G.Katzgraber, Optimization of population annealingMonte Carlo for large-scale spin-glass simulations,Phys. Rev. E , 053308 (2018).[16] B. A. Berg and T. Neuhaus, Multicanoni-cal algorithms for ﬁrst order phase transitions,Phys. Lett. B , 249 (1991).[17] B. A. Berg and T. Neuhaus, Multicanonical ensemble: Anew approach to simulate ﬁrst-order phase transitions,Phys. Rev. Lett. , 9 (1992).[18] W. Janke, Multicanonical simulation ofthe two-dimensional 7-state Potts model,Int. J. Mod. Phys. C , 1137 (1992).[19] B. A. Berg and T. Celik, New approach to spin-glasssimulations, Phys. Rev. Lett. , 2292 (1992).[20] B. A. Berg, T. Celik, and U. Hansmann, Mul-ticanonical study of the 3d Ising spin glass,Europhys. Lett. , 63 (1993). [21] B. Hesselbo and R. B. Stinchcombe, Monte Carlo sim-ulation and global optimization without parameters,Phys. Rev. Lett. , 2151 (1995).[22] S. Trebst, D. A. Huse, and M. Troyer, Optimizing the en-semble for equilibration in broad-histogram Monte Carlosimulations, Phys. Rev. E , 046701 (2004).[23] S. F. Edwards and P. W. Anderson, Theory of spinglasses, J. Phys. F: Met. Phys. , 965 (1975).[24] F. Barahona, On the computationalcomplexity of Ising spin glass models,J. Phys. A: Math. Gen. , 3241 (1982).[25] F. Wang and D. P. Landau, Eﬃcient, multiple-range ran-dom walk algorithm to calculate the density of states,Phys. Rev. Lett. , 2050 (2001).[26] W. Janke, Histograms and all that, in Computer Simulations of Surfaces and Interfaces ,edited by B. D¨unweg, D. P. Landau, and A. I. Milchev(Springer Netherlands, Dordrecht, 2003) pp. 137–157.[27] The aim of the study was not maximizing the number ofround trips in energy but rather the amount of statisti-cally independent data in an uncorrelated Monte Carlosimulation.[28] T. Papakonstantinou and A. Malakis, Par-allel tempering and 3d spin glass models,J. Phys. Conf. Ser. , 012010 (2014).[29] E. Bittner, A. Nußbaumer, and W. Janke, Make life sim-ple: Unleash the full power of the parallel tempering al-gorithm, Phys. Rev. Lett. , 130603 (2008).[30] H. G. Katzgraber, S. Trebst, D. A. Huse, and M. Troyer,Feedback-optimized parallel tempering Monte Carlo,J. Stat. Mech.: Theory , P03018 (2006).[31] The estimated temperature was extracted from few singledisorder realizations for each lattice size and then taken as constant for all samples with the same lattice size.[32] This diﬀerent measuring prescription gives PT a slightadvantage in comparison to the other methods which is,however, negligible in the authors’ opinion.[33] M. Charras-Garrido and P. Lezaud, Ex-treme value analysis: An introduction,J. Soc. Fr. Statistique , 66 (2013).[34] S. Alder, S. Trebst, A. K. Hartmann, and M. Troyer, Dy-namics of the Wang-Landau algorithm and complexity ofrare events for the three-dimensional bimodal Ising spinglass, J. Stat. Mech.: Theory , P07008 (2004).[35] B. Efron,

The Jackknife, the Bootstrap and Other Resampling Plans (Society for Industrial and Applied Mathematics,Philadelphia, 1982).[36] The bimodal EA spin glass, having discrete randomness,has only a ﬁnite number of possible disorder realizations.Due to the symmetries in the absence of external ﬁeldsthis number can be estimated to be of the order of 2 V ,where V = L is the number of spins contained in thelattice.[37] Due to the ﬁnite number of possible disorder realizationsthe mean is actually deﬁned. Its ﬁnite value is expectedto be of the order of 10 , and can therefore in terms ofcomputation time numerically not be distinguished froma real divergence.[38] B. A. Berg, A. Billoire, and W. Janke, Spin-glass overlap barriers in three and four dimensions,Phys. Rev. B , 12143 (2000).[39] E. Bittner and W. Janke, Free-energy bar-riers in the Sherrington-Kirkpatrick model,Europhys. Lett. , 195 (2006).[40] A. Lucas, Ising formulations of many NP problems,Front. Phys.2