On properties of the Wang–Landau algorithm
L. N. Shchur
Landau Institute for Theoretical Physics, 142432 Chernogolovka, Russia
Science Center in Chernogolovka, 142432 Chernogolovka, Russia
National Research University Higher School of Economics, 101000 Moscow, Russia
Abstract.
We review recent advances in the analysis of the Wang–Landau algorithm, which is designed for the direct Monte Carlo estimation of the density of states (DOS). In the case of a discrete energy spectrum, we present an approach based on introducing the transition matrix in the energy space (TMES). The TMES fully describes a random walk in the energy space biased with the Wang–Landau probability. Properties of the TMES can explain some features of the Wang–Landau algorithm, for example, the flatness of the histogram. We show that the Wang–Landau probability with the true DOS generates a Markov process in the energy space and that the inverse spectral gap of the TMES can estimate the mixing time of this Markov process. We argue that an efficient implementation of the Wang–Landau algorithm consists of two simulation stages: the original Wang–Landau procedure for the first stage and a 1/t modification for the second stage. The mixing time determines the characteristic time for convergence to the true DOS in the second simulation stage. The parameter of the convergence of the estimated DOS to the true DOS is the deviation of the largest TMES eigenvalue from unity. The characteristic time of the first stage is the tunneling time, i.e., the time needed for the system to visit all energy levels.
1. Introduction
The history of Monte Carlo simulations begins in the early years of computing. The method was developed in one of the most cited papers in computational physics and chemistry [1], where both the Monte Carlo method and the molecular dynamics approach were introduced. Many algorithms for Monte Carlo simulations have since been developed. In statistical mechanics, different Monte Carlo algorithms are based on different partition function representations. A general partition function representation is

Z = Σ_{i} e^{−E_i / k_B T},   (1)

where E_i is the energy of the configuration {i} of the system in the phase space, T is the temperature, and k_B is the Boltzmann constant. Representation (1) can be associated, for example, with the Metropolis algorithm [1], which is based on local updates taking the energy changes ΔE_{kl} = E_l − E_k for moves in the phase space from state {k} to state {l} into account. The probability of accepting the move is given by the Metropolis probability

P = min[1, e^{−ΔE_{kl} / k_B T}].   (2)

Modifications of the Metropolis algorithm are based on variations of the local updates, as are more complicated algorithms, which in any event use local update rules and probabilities (see book [2] for references). All of them use partition function (1) as the basic representation.

A more sophisticated algorithm, the MuMC algorithm (multicanonical Monte Carlo), is based on another representation of the partition function [3, 4],

Z = Σ_{k=1}^{N_E} g(E_k) W(E_k),   (3)

where N_E is the number of energy levels, g(E_k) is the number of states with the energy E_k, and W(E) is an a priori unknown weight function, which is estimated in the simulations. The stopping condition is that all g(E_k) W(E_k) are approximately equal to each other.
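To make the acceptance rule (2) concrete, a single Metropolis update for the 2D Ising model can be sketched as follows. This is our own minimal illustration, not code from [1]; the function name and the nested-list lattice representation are ours, and beta stands for 1/(k_B T).

```python
import math
import random

def metropolis_step(spins, L, beta, rng):
    """One attempted single-spin-flip Metropolis update on an L x L
    periodic Ising lattice (beta = 1/(k_B T)).  Returns the energy
    change of the move, or 0 if the move was rejected."""
    i, j = rng.randrange(L), rng.randrange(L)
    # Sum over the four nearest neighbours with periodic boundaries.
    nn = (spins[(i + 1) % L][j] + spins[(i - 1) % L][j]
          + spins[i][(j + 1) % L] + spins[i][(j - 1) % L])
    dE = 2 * spins[i][j] * nn          # energy change of flipping spin (i, j)
    # Metropolis probability (2): min(1, exp(-dE / k_B T)).
    if dE <= 0 or rng.random() < math.exp(-beta * dE):
        spins[i][j] = -spins[i][j]     # accept the move
        return dE
    return 0                           # reject: configuration unchanged
```

Note the common shortcut: a move with dE ≤ 0 is always accepted, since min(1, e^{−βΔE}) = 1 there; the explicit temperature dependence of the update is what distinguishes this family of algorithms from the WL algorithm reviewed below.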
The MuMC algorithm is successfully used in polymer simulations [5]. Another famous example is based on a cluster representation [6] of the Potts model,

Z = Σ_{bonds} p^b (1 − p)^n q^{N_c},   (4)

where p = 1 − exp(−J/k_B T), J is a constant measured in energy units, b is the number of bonds connecting spins, n is the number of bonds not connecting spins, q is the number of spin components, and N_c is the number of clusters in the configuration. The cluster representation leads to several efficient algorithms for simulating the Potts model; the two most famous are the Swendsen–Wang cluster algorithm [7] and the Wolff one-cluster algorithm [8].

The abovementioned algorithms are extensively used in simulations. We do not discuss the pros and cons of the algorithms and refer our readers to the book [2]. A common property of all the mentioned algorithms is that simulations depend explicitly on the temperature T. The algorithm that we review and analyze here is quite different: simulations are independent of the temperature.

The Wang–Landau (WL) algorithm is a way to estimate the density of states (DOS) directly [9, 10]. It can be applied to any system with a defined partition function. Its idea is based on representing the partition function as a sum over the energy levels (somewhat similar to representations used for the MuMC algorithm),

Z = Σ_{k=1}^{N_E} g(E_k) e^{−E_k / k_B T}.   (5)

It has proved to work quite well for many systems, with applications in different areas of statistical physics and mechanics. There are more than 1500 papers on the application of the algorithm and its improvements. At the same time, not all properties of the algorithm are well understood. For example, it is unclear how the auxiliary histogram flatness is related to the convergence of the algorithm and how the accuracy of the DOS estimation evolves with the simulation time. An approach based on constructing the transition matrix in the energy space (TMES) was recently proposed [11].
It answers the above questions and can be used for a deeper analysis of the algorithm and for modifications of the WL algorithm. We present a summary of the approach to the analysis of the WL algorithm. We regard the WL algorithm as a random walk in the energy space, since the WL probabilities of moves between energies are based purely on the ratios of the corresponding densities g(E) of the energy states. We explain some properties of the WL algorithm, including the flatness criteria and the characteristic time scales, and discuss the control parameter for estimating the accuracy of the DOS estimate and for the convergence of the DOS to the true values.

In section 2, we briefly review the WL algorithm. In section 3, we introduce the TMES. In section 4, we define and discuss the characteristic times of the WL algorithm. The discussion in section 5 finishes our review.

2. Wang–Landau algorithm

We briefly describe the WL algorithm here. We take a configuration of the system, compute the energy value E_k, randomly choose an update to a new configuration with the energy E_m, and accept this configuration with the WL probability

P̃_wl = min[1, g̃(E_k)/g̃(E_m)],   (6)

where g̃(E) is the current DOS approximation. The approximation is obtained recursively by multiplying g̃(E_m) by a factor f at each step of the random walk in the energy space. Each time the auxiliary histogram H(E) becomes sufficiently flat, the parameter f is modified by taking the square root, f := √f. Each histogram value H(E_m) counts the number of moves to the energy level E_m. The histogram is filled with zeros after each modification of the refinement parameter f. It is convenient to work with the logarithms of the values, S(E_k) := ln g̃(E_k) and F := ln f, and to replace the multiplication g̃(E_m) := f · g̃(E_m) with the addition S(E_m) := S(E_m) + F. At the end of the simulation, the algorithm provides only a
relative DOS. Either the total number of states or the number of ground states can be used to determine the absolute DOS.
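The procedure above can be sketched for the 1D Ising chain (used again in section 3). This is our own illustrative implementation, not the reference code of [9, 10]: the variables S, F, and H follow the text, while the 80% flatness threshold and the final value of F are typical choices, not prescriptions from the original papers.

```python
import math
import random

def wang_landau_1d_ising(L, flatness=0.8, f_final=1e-3, seed=0):
    """Sketch of the WL procedure for a 1D Ising chain of L spins with
    periodic boundaries.  Energy levels are E_k = -L + 4k, k = 0..L/2,
    so a configuration is tracked by its level index k.  Works with
    logarithms, S(E) = ln g~(E) and F = ln f, as described in the text."""
    rng = random.Random(seed)
    spins = [rng.choice((-1, 1)) for _ in range(L)]

    def level(s):
        # k = (E + L)/4 with E = -sum_j s_j s_{j+1}
        return (L - sum(s[j] * s[(j + 1) % L] for j in range(L))) // 4

    n_levels = L // 2 + 1
    S = [0.0] * n_levels          # running estimate of ln g(E_k)
    H = [0] * n_levels            # auxiliary visit histogram
    F = 1.0                       # ln f; halving F realizes f := sqrt(f)
    k = level(spins)
    while F > f_final:
        j = rng.randrange(L)
        spins[j] = -spins[j]      # propose a single-spin flip
        m = level(spins)
        # WL acceptance (6): min(1, g~(E_k)/g~(E_m)) = exp(min(0, S_k - S_m))
        if rng.random() < math.exp(min(0.0, S[k] - S[m])):
            k = m
        else:
            spins[j] = -spins[j]  # reject: undo the flip
        S[k] += F                 # ln g~(E_k) := ln g~(E_k) + F
        H[k] += 1
        if min(H) > flatness * sum(H) / n_levels:
            F /= 2.0              # refine f and reset the histogram
            H = [0] * n_levels
    return S                      # relative DOS: only differences S_k - S_0 matter
```

The returned S is the relative DOS of the text: the overall additive constant is undetermined, and for this chain it can be fixed, for example, from the two ground states, g(E_0) = 2.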
3. Transition matrix in the energy space
The central idea of the WL algorithm is the WL probability, which generates a random walk in the energy space. We first argue that the random walk in the energy space with the WL probability calculated using the true DOS is a Markov chain and satisfies the detailed balance condition [11]. The WL probability in this case is given by

P_wl = min[1, g(E_k)/g(E_m)],   (7)

where g(E) is the true DOS (cf. WL probability (6)). It is useful to consider a TMES whose elements show the frequency of transitions between energy levels during the WL random walk in the energy space [11]. Its elements are influenced by both the random process of choosing a new configurational state and the WL probability of accepting the new energy,

T(E_k, E_m) = min(1, g(E_k)/g(E_m)) P(E_k, E_m),   (8)

which represents the nondiagonal elements of the TMES of the WL random walk in the energy space. Here, P(E_k, E_m) is the probability for one step of the random walk to move from a configuration with the energy E_k to any configuration with the energy E_m.

The random walk in the configuration space is a Markov chain. Its invariant distribution is uniform, i.e., the probabilities of all states of the physical system are equal to each other. For any pair of configurations Ω_A and Ω_B, the probability of an update from Ω_A to Ω_B is equal to the probability of an update from Ω_B to Ω_A. Hence, the detailed balance condition is satisfied. Therefore,

g(E_k) P(E_k, E_m) = g(E_m) P(E_m, E_k),   (9)

where g(E) is the true DOS. It follows from (8) and (9) that

T(E_k, E_m) = T(E_m, E_k).   (10)

Therefore, the TMES of the WL random walk on the true DOS is a symmetric matrix. Because the matrix is both symmetric and right stochastic, it is also left stochastic. Therefore, the sum over rows (columns) is equal to unity, and the rates of visiting all energy levels are equal to each other.
The corresponding auxiliary histogram of energy visits in the WL algorithm is indeed flat! (See the discussion in section 5.) We therefore observe [11] that the flatness of the histogram in the WL algorithm is just the proximity of the corresponding random walk in the energy space to the Markov process. The closer the estimated DOS is to the true DOS, the closer the TMES is to the fully stochastic matrix. This property can be used to connect control of DOS convergence to the property of TMES convergence to the stochastic matrix.

It is instructive to seek an exact solution for the TMES elements. This can easily be done for the 1D Ising model [11]. In the case of a chain of L spins with periodic boundary conditions, the probability to change the energy from E_k to E_m in a WL random move is

T(E_k, E_m) = min[1, g(E_k)/g(E_m)] Σ_{i=0}^{2k} N_i Q_i^{E_k → E_m} / g(E_k),   (11)

where k ≠ m. Here, k is the number of pairs of domain walls in the configuration, which determines the energy level E_k = −Σ_{j=1}^{L} σ_j σ_{j+1} = −L + 4k, N_i(k, L) is the number of configurations where i domains consist of only one spin and 2k − i domains consist of more than one spin, and Q_i^{E_k → E_m}(L) is the probability that a single spin flip moves the system to the energy E_m from such configurations.

The structure of expression (11) shows that the TMES elements are composed of three probabilities: the probability of the particular configuration, the probability that the move in the configuration space changes the system energy to some value, and the WL probability P_wl to accept the proposed move between energy levels.
It reflects the inner workings of the WL algorithm and explicitly shows how the combination of choosing the site in the configuration space and of the corresponding change in the phase space generates a random walk in the energy space.

Occupations of energy levels of the chain are expressed in terms of binomial coefficients as g(E_k) = 2 C_L^{2k} because there are exactly C_L^{2k} ways to arrange 2k domain walls. Therefore, partition function (5) is

Z_L = 2 Σ_{k=0}^{L/2} C_L^{2k} e^{(L − 4k)/(k_B T)},   (12)

and the TMES for the 1D Ising model is

T(E_k, E_{k+1}) = T(E_{k+1}, E_k) = C_{L−2}^{2k} / max(C_L^{2k}, C_L^{2k+2}).   (13)

The TMES is indeed symmetric and stochastic.

We estimate the TMES elements in simulations as follows [11]. The auxiliary matrix U(E_k, E_m) is initially filled with zeros. The element U(E_k, E_m) is increased by unity after every WL move from a configuration with the energy E_k to a configuration with the energy E_m. During the simulations, we compute the normalized matrix

T̃(E_k, E_m) = U(E_k, E_m) / H̃,   (14)

where

H̃ = Σ_{k,m} U(E_k, E_m) / N_E.   (15)

The obtained matrix T̃ is expected to approach the stochastic matrix T in the final stage of a correct calculation.
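The exact result (13) can be tabulated and its symmetry and stochasticity checked directly. The sketch below is ours; the closed form used for the off-diagonal elements is our reading of (13), and the diagonal elements are fixed by the requirement that each row sums to unity.

```python
from math import comb

def tmes_1d_ising(L):
    """Exact TMES of the WL random walk for the 1D Ising chain with
    periodic boundary conditions.  Off-diagonal elements follow our
    reading of eq. (13): T(E_k, E_{k+1}) = C(L-2, 2k) / max(C(L, 2k),
    C(L, 2k+2)); the diagonal is fixed by row stochasticity."""
    n = L // 2 + 1                        # levels E_k = -L + 4k, k = 0..L/2
    T = [[0.0] * n for _ in range(n)]
    for k in range(n - 1):
        t = comb(L - 2, 2 * k) / max(comb(L, 2 * k), comb(L, 2 * k + 2))
        T[k][k + 1] = T[k + 1][k] = t     # symmetric, as in eq. (10)
    for k in range(n):
        T[k][k] = 1.0 - sum(T[k])         # probability to stay at level E_k
    return T
```

Since the matrix comes out symmetric with unit row sums, it is doubly stochastic, so a WL walk on the true DOS visits all energy levels at equal rates, which is exactly the histogram-flatness property discussed above.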
4. Two characteristic times in WL algorithm
Results of simulations [11, 12] demonstrate that there are two time scales in the WL simulations (although this point is not discussed in those texts). It is well known that the WL algorithm drives the estimated DOS to the vicinity of the true DOS at the beginning of simulations, at least for systems with a discrete energy spectrum and a not very complicated energy landscape. This is a positive feature of the WL algorithm, although still unexplained. The time scale of this stage can be associated with the tunneling time. The second stage is the stage of convergence of the estimated DOS to the true DOS. In this section, we argue that the time scale of the second stage is given by the inverse value of the spectral gap of the TMES.
It has been known since the beginning of WL simulations [13] that a finite value of the desired histogram flatness leads to saturation of the estimates, and saturation was found for systems with a known DOS. The first stage of the WL algorithm drives the estimated DOS close to the true DOS, although not very close. Analysis of Figures 1 and 3–5 in paper [11] shows that there is a characteristic time of the first stage; we suggest that it is the tunneling time of the random walk in the energy space. There are several definitions leading to the same scaling behavior of the tunneling time with system size (more precisely, with the number of energy levels), T_t ∝ L^z. A useful practical definition is that it is the time needed for the WL random walk to reach the highest energy level starting from the lowest one [14].

The modification of the WL algorithm called the 1/t-WL algorithm [15, 16] defines the time when the 1/t regime of the WL algorithm starts as the time when all energy levels have been visited at least once. Clearly, this is practically the same as the tunneling time. The stochastic approximation Monte Carlo (SAMC) algorithm [17, 18] does not specify how to choose the time t when the 1/t regime starts. Our preliminary analysis shows that the optimal choice of t is indeed the tunneling time.

There is a natural lower limit for the exponent z. It can be understood in terms of the random walk. We first consider the classical problem of the 1D unbiased random walk: the position i of the random walk changes with equal probability to i − 1 or i + 1. The point 0 is the reflecting point of the random walk, and the point N is the absorbing point. The first-passage time of the random walk is known to scale as N^2. The number of levels N_E for the 2D Ising model scales as L^2, and the number of random walk jumps necessary to cross the whole energy spectrum hence scales as L^4. Therefore, z = 4 is the minimum value for any type of random walk in the energy space of the 2D Ising model. For a random walk with bias, which is the WL probability P_wl, the growth of the tunneling time should be faster, z > 4.

There is a worse message, which can be learned from the properties of a random walk in a random potential: localization of the random walker occurs in one dimension [19]. This probably explains the difficulties in applying the WL algorithm to systems with complex energy landscapes: the random walker can be temporarily trapped near a global minimum of the probability profile in the simulated DOS. Positions of the vertical dashed lines in Figures 1 and 3–5 in paper [11] can be associated with the tunneling time. During the first simulation stage, the effective tunneling time exponent grows from z = 4 to a value that depends on the investigated system. Reported estimates of the tunneling time exponent lie between z = 4 and approximately z = 5.

The spectral gap G of the TMES determines the mixing time of the Markov chain [20]:

T_m = 1/G,   (16)

G = λ_1 − λ_2,   (17)

where λ_1 and λ_2 are the two largest eigenvalues of the TMES. In the second phase of the 1/t-WL (or SAMC) algorithm, the DOS is very close to the true DOS, and the characteristic time of convergence is therefore just the mixing time T_m of the Markov chain. The exact solution for the 1D Ising model [11] can be used to construct the TMES for varying lattice sizes, and fitting the inverse spectral gap yields an estimate [12] of the mixing time exponent in T_m ∝ L^{z_m}. Numerically estimating the mixing time for the 2D Ising model [12] leads to a power law of the same form.
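The spectral-gap calculation of (16)–(17) can be carried out for the exact 1D Ising TMES without any simulation. The sketch below is ours: it builds the matrix from our reading of eq. (13) and finds λ_2 by a simple power iteration restricted to the subspace orthogonal to the uniform vector, using the fact that λ_1 = 1 with a uniform eigenvector for a doubly stochastic matrix.

```python
from math import comb

def spectral_gap_1d_ising(L, iters=5000):
    """Spectral gap G = lambda_1 - lambda_2 of the exact 1D Ising TMES,
    eq. (13).  lambda_1 = 1 (uniform eigenvector of a doubly stochastic
    matrix); lambda_2 is found by power iteration on the orthogonal
    subspace.  The mixing-time estimate of eq. (16) is T_m = 1/G."""
    n = L // 2 + 1
    T = [[0.0] * n for _ in range(n)]
    for k in range(n - 1):
        T[k][k + 1] = T[k + 1][k] = (comb(L - 2, 2 * k)
                                     / max(comb(L, 2 * k), comb(L, 2 * k + 2)))
    for k in range(n):
        T[k][k] = 1.0 - sum(T[k])
    v = [float(i) for i in range(n)]          # generic start vector
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]             # project out the lambda_1 = 1 mode
        w = [sum(T[i][j] * v[j] for j in range(n)) for i in range(n)]
        scale = max(abs(x) for x in w)
        v = [x / scale for x in w]
    Tv = [sum(T[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam2 = sum(v[i] * Tv[i] for i in range(n)) / sum(x * x for x in v)
    return 1.0 - lam2                         # G; the mixing time is T_m = 1/G
```

Running this for increasing L shows the gap shrinking, i.e., the mixing time T_m = 1/G growing with the lattice size, consistent with the power-law behavior reported in [12].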
5. Discussion
Using our approach, we can calculate the normalized histogram Ĥ(E_m) = H(E_m)/Σ_m H(E_m) from the TMES as H(E_m) = Σ_k T̃(E_k, E_m). It is obvious that the histogram flatness condition is equivalent to the property that the matrix T̃ is close to the stochastic matrix. The histogram flatness in the second simulation stage is closely related to the stochastic properties of the TMES because the estimated DOS is very close to the true DOS.

We have argued that the time t_c in the 1/t-WL algorithm [15, 16] and the time t in the SAMC algorithm [17, 18] are of the order of the tunneling time [14]. Therefore, in the first simulation stage, the optimal choice is the original WL algorithm, which drives the estimated DOS close to the true DOS, at least for systems with a not very complicated free energy landscape. In the second simulation stage, the 1/t-WL algorithm drives the estimated DOS to the true DOS, and the characteristic time is the mixing time T_m. The accuracy of estimating the DOS is given by the deviation of the largest eigenvalue of the estimated TMES from unity [11].

Future research should be done to understand the properties of the WL algorithm when applied to more complicated systems and to systems with a continuous energy spectrum. In systems with a complex energy landscape, the variation of the transition matrix elements can be huge, and this can lead to a large increase of the tunneling time. It follows from the theory of a random walk in a random potential [19] that the random walker can be localized in an extremal valley of the potential. It is important to understand under which conditions this can happen with the WL algorithm. Partitioning a continuous energy spectrum into bins (as done in simulations) allows using the TMES approach to analyze the simulations.
It would be instructive to understand how the bin size influences TMES properties, from which we might understand some properties of simulations, the behavior of the control parameter, and also the tunneling and mixing time values.

This work was supported by grant 14-21-00158 from the Russian Science Foundation.

References
[1] N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, and E. Teller, Equation of State Calculations by Fast Computing Machines, J. Chem. Phys. 21, 1087 (1953).
[2] D.P. Landau and K. Binder, A Guide to Monte Carlo Simulations in Statistical Physics, 4th edition, Cambridge University Press, Cambridge, 2015.
[3] B.A. Berg and T. Neuhaus, Multicanonical algorithms for first order phase transitions, Phys. Lett. B 267, 249 (1991).
[4] B.A. Berg and T. Neuhaus, Multicanonical ensemble: A new approach to simulate first-order phase transitions, Phys. Rev. Lett. 68, 9 (1992).
[5] W. Janke and W. Paul, Thermodynamics and Structure of Macromolecules from Flat-Histogram Monte Carlo Simulations, Soft Matter 12, 642 (2016).
[6] C.M. Fortuin and P.W. Kasteleyn, On the random-cluster model: I. Introduction and relation to other models, Physica (Amsterdam) 57, 536 (1972).
[7] R.H. Swendsen and J.-S. Wang, Nonuniversal critical dynamics in Monte Carlo simulations, Phys. Rev. Lett. 58, 86 (1987).
[8] U. Wolff, Lattice field theory as a percolation process, Phys. Rev. Lett. 60, 1461 (1988).
[9] F. Wang and D.P. Landau, Efficient, multiple-range random walk algorithm to calculate the density of states, Phys. Rev. Lett. 86, 2050 (2001).
[10] F. Wang and D.P. Landau, Determining the density of states for classical statistical models: A random walk algorithm to produce a flat histogram, Phys. Rev. E 64, 056101 (2001).
[11] L.Yu. Barash, M.A. Fadeeva, and L.N. Shchur, Control of accuracy in the Wang–Landau algorithm, Phys. Rev. E 96, 043307 (2017).
[12] M. Fadeeva and L. Shchur, On the mixing time in the Wang–Landau algorithm, J. Phys.: Conf. Ser. 955, 012028 (2018).
[13] D. Landau, Rahman Prize Lecture, 2002.
[14] P. Dayal et al., Performance limitations of flat-histogram methods, Phys. Rev. Lett. 92, 097201 (2004).
[15] R.E. Belardinelli and V.D. Pereyra, Fast algorithm to calculate density of states, Phys. Rev. E 75, 046701 (2007).
[16] R.E. Belardinelli and V.D. Pereyra, Wang–Landau algorithm: A theoretical analysis of the saturation of the error, J. Chem. Phys. 127, 184105 (2007).
[17] F. Liang, C. Liu, and R.J. Carroll, Stochastic Approximation in Monte Carlo Computation, J. Am. Stat. Assoc. 102, 305 (2007).
[18] F. Liang, A Theory on Flat Histogram Monte Carlo Algorithms, J. Stat. Phys. 122, 511 (2006).
[19] Ya.G. Sinai, The Limiting Behavior of a One-Dimensional Random Walk in a Random Medium, Theory Probab. Appl. 27, 256 (1982).
[20] S. Boyd, P. Diaconis, and L. Xiao, Fastest mixing Markov chain on a graph, SIAM Rev. 46, 667 (2004).