On properties of the Wang–Landau algorithm
L. N. Shchur
Landau Institute for Theoretical Physics, 142432 Chernogolovka, Russia
Science Center in Chernogolovka, 142432 Chernogolovka, Russia
National Research University Higher School of Economics, 101000 Moscow, Russia
Abstract.
We review recent advances in the analysis of the Wang–Landau algorithm, which is designed for the direct Monte Carlo estimation of the density of states (DOS). In the case of a discrete energy spectrum, we present an approach based on introducing the transition matrix in the energy space (TMES). The TMES fully describes a random walk in the energy space biased with the Wang–Landau probability. Properties of the TMES can explain some features of the Wang–Landau algorithm, for example, the flatness of the histogram. We show that the Wang–Landau probability with the true DOS generates a Markov process in the energy space and that the inverse spectral gap of the TMES can estimate the mixing time of this Markov process. We argue that an efficient implementation of the Wang–Landau algorithm consists of two simulation stages: the original Wang–Landau procedure for the first stage and a 1/t modification for the second stage. The mixing time determines the characteristic time for convergence to the true DOS in the second simulation stage. The parameter of the convergence of the estimated DOS to the true DOS is the deviation of the largest TMES eigenvalue from unity. The characteristic time of the first stage is the tunneling time, i.e., the time needed for the system to visit all energy levels.
1. Introduction
The history of Monte Carlo simulations begins in the early years of computing. The method was developed in one of the most cited papers in computational physics and chemistry [1], where both the Monte Carlo method and the molecular dynamics approach were introduced. Many algorithms for Monte Carlo simulations have since been developed. In statistical mechanics, different Monte Carlo algorithms are based on different partition function representations. A general partition function representation is

Z = Σ_{i} e^{−E_i / k_B T},   (1)

where E_i is the energy of the configuration {i} of the system in the phase space, T is the temperature, and k_B is the Boltzmann constant. Representation (1) can be associated, for example, with the Metropolis algorithm [1], which is based on local updates taking the energy changes ΔE_{kl} = E_l − E_k for moves in the phase space from state {k} to state {l} into account. The probability of accepting the move is given by the Metropolis probability

P = min[1, e^{−ΔE_{kl} / k_B T}].   (2)

Modifications of the Metropolis algorithm are based on variations of the local updates, as are more complicated algorithms, which in any event use local update rules and probabilities (see book [2] for references). All of them use partition function (1) as the basic representation.

A more sophisticated algorithm, the MuMC algorithm (multicanonical Monte Carlo), is based on another representation of the partition function [3, 4],

Z = Σ_{k=1}^{N_E} g(E_k) W(E_k),   (3)

where N_E is the number of energy levels, g(E_k) is the number of states with the energy E_k, and W(E) is an a priori unknown weight function, which is estimated in the simulations. The stopping condition is that all g(E_k) W(E_k) are approximately equal to each other.
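To make the acceptance rule (2) concrete, a single Metropolis update for the 2D Ising model can be sketched as follows. This is our own minimal illustration, not code from [1]; the function name and the nested-list lattice representation are ours, and beta stands for 1/(k_B T).

```python
import math
import random

def metropolis_step(spins, L, beta, rng):
    """One attempted single-spin-flip Metropolis update on an L x L
    periodic Ising lattice (beta = 1/(k_B T)).  Returns the energy
    change of the move, or 0 if the move was rejected."""
    i, j = rng.randrange(L), rng.randrange(L)
    # Sum over the four nearest neighbours with periodic boundaries.
    nn = (spins[(i + 1) % L][j] + spins[(i - 1) % L][j]
          + spins[i][(j + 1) % L] + spins[i][(j - 1) % L])
    dE = 2 * spins[i][j] * nn          # energy change of flipping spin (i, j)
    # Metropolis probability (2): min(1, exp(-dE / k_B T)).
    if dE <= 0 or rng.random() < math.exp(-beta * dE):
        spins[i][j] = -spins[i][j]     # accept the move
        return dE
    return 0                           # reject: configuration unchanged
```

Note the common shortcut: a move with dE ≤ 0 is always accepted, since min(1, e^{−βΔE}) = 1 there; the explicit temperature dependence of the update is what distinguishes this family of algorithms from the WL algorithm reviewed below.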
The MuMC algorithm is successfully used in polymer simulations [5]. Another famous example is based on a cluster representation [6] of the Potts model,

Z = Σ_{bonds} p^b (1 − p)^n q^{N_c},   (4)

where p = 1 − exp(−J/k_B T), J is a constant measured in energy units, b is the number of bonds connecting spins, n is the number of bonds not connecting spins, q is the number of spin components, and N_c is the number of clusters in the configuration. The cluster representation leads to several efficient algorithms for simulating the Potts model; the two most famous are the Swendsen–Wang cluster algorithm [7] and the Wolff one-cluster algorithm [8].

The abovementioned algorithms are extensively used in simulations. We do not discuss the pros and cons of the algorithms and refer our readers to the book [2]. A common property of all the mentioned algorithms is that simulations depend explicitly on the temperature T. The algorithm that we review and analyze here is quite different: simulations are independent of the temperature.

The Wang–Landau (WL) algorithm is a way to estimate the density of states (DOS) directly [9, 10]. It can be applied to any system with a defined partition function. Its idea is based on representing the partition function as a sum over the energy levels (somewhat similar to representations used for the MuMC algorithm),

Z = Σ_{k=1}^{N_E} g(E_k) e^{−E_k / k_B T}.   (5)

It has proved to work quite well for many systems, with applications in different areas of statistical physics and mechanics. There are more than 1500 papers on the application of the algorithm and its improvements. At the same time, not all properties of the algorithm are well understood. For example, it is unclear how the auxiliary histogram flatness is related to the convergence of the algorithm and how the accuracy of the DOS estimation evolves with the simulation time. An approach based on constructing the transition matrix in the energy space (TMES) was recently proposed [11].
It answers the above questions and can be used for a deeper analysis of the algorithm and for modifications of the WL algorithm. We present a summary of the approach to the analysis of the WL algorithm. We regard the WL algorithm as a random walk in the energy space, since the WL probabilities of moves between energies are based purely on the ratios of the corresponding densities g(E) of the energy states. We explain some properties of the WL algorithm, including the flatness criteria and the characteristic time scales, and discuss the control parameter for estimating the accuracy of the DOS estimate and for the convergence of the DOS to the true values.

In section 2, we briefly review the WL algorithm. In section 3, we introduce the TMES. In section 4, we define and discuss the characteristic times of the WL algorithm. The discussion in section 5 finishes our review.

2. Wang–Landau algorithm

We briefly describe the WL algorithm here. We take a configuration of the system, compute the energy value E_k, randomly choose an update to a new configuration with the energy E_m, and accept this configuration with the WL probability

P̃_wl = min[1, g̃(E_k)/g̃(E_m)],   (6)

where g̃(E) is the current DOS approximation. The approximation is obtained recursively by multiplying g̃(E_m) by a factor f at each step of the random walk in the energy space. Each time the auxiliary histogram H(E) becomes sufficiently flat, the parameter f is modified by taking the square root, f := √f. Each histogram value H(E_m) counts the number of moves to the energy level E_m. The histogram is filled with zeros after each modification of the refinement parameter f. It is convenient to work with the logarithms of the values, S(E_k) := ln g̃(E_k) and F := ln f, and to replace the multiplication g̃(E_m) := f · g̃(E_m) with the addition S(E_m) := S(E_m) + F. At the end of the simulation, the algorithm provides only a
relative DOS. Either the total number of states or the number of ground states can be used to determine the absolute DOS.
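The procedure above can be sketched for the 1D Ising chain (used again in section 3). This is our own illustrative implementation, not the reference code of [9, 10]: the variables S, F, and H follow the text, while the 80% flatness threshold and the final value of F are typical choices, not prescriptions from the original papers.

```python
import math
import random

def wang_landau_1d_ising(L, flatness=0.8, f_final=1e-3, seed=0):
    """Sketch of the WL procedure for a 1D Ising chain of L spins with
    periodic boundaries.  Energy levels are E_k = -L + 4k, k = 0..L/2,
    so a configuration is tracked by its level index k.  Works with
    logarithms, S(E) = ln g~(E) and F = ln f, as described in the text."""
    rng = random.Random(seed)
    spins = [rng.choice((-1, 1)) for _ in range(L)]

    def level(s):
        # k = (E + L)/4 with E = -sum_j s_j s_{j+1}
        return (L - sum(s[j] * s[(j + 1) % L] for j in range(L))) // 4

    n_levels = L // 2 + 1
    S = [0.0] * n_levels          # running estimate of ln g(E_k)
    H = [0] * n_levels            # auxiliary visit histogram
    F = 1.0                       # ln f; halving F realizes f := sqrt(f)
    k = level(spins)
    while F > f_final:
        j = rng.randrange(L)
        spins[j] = -spins[j]      # propose a single-spin flip
        m = level(spins)
        # WL acceptance (6): min(1, g~(E_k)/g~(E_m)) = exp(min(0, S_k - S_m))
        if rng.random() < math.exp(min(0.0, S[k] - S[m])):
            k = m
        else:
            spins[j] = -spins[j]  # reject: undo the flip
        S[k] += F                 # ln g~(E_k) := ln g~(E_k) + F
        H[k] += 1
        if min(H) > flatness * sum(H) / n_levels:
            F /= 2.0              # refine f and reset the histogram
            H = [0] * n_levels
    return S                      # relative DOS: only differences S_k - S_0 matter
```

The returned S is the relative DOS of the text: the overall additive constant is undetermined, and for this chain it can be fixed, for example, from the two ground states, g(E_0) = 2.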
3. Transition matrix in the energy space
The central idea of the WL algorithm is the WL probability, which generates a random walk in the energy space. We first argue that the random walk in the energy space with the WL probability calculated using the true DOS is a Markov chain and satisfies the detailed balance condition [11]. The WL probability in this case is given by

P_wl = min[1, g(E_k)/g(E_m)],   (7)

where g(E) is the true DOS (cf. WL probability (6)). It is useful to consider a TMES whose elements show the frequency of transitions between energy levels during the WL random walk in the energy space [11]. Its elements are influenced by both the random process of choosing a new configurational state and the WL probability of accepting the new energy,

T(E_k, E_m) = min(1, g(E_k)/g(E_m)) P(E_k, E_m),   (8)

which represents the nondiagonal elements of the TMES of the WL random walk in the energy space. Here, P(E_k, E_m) is the probability for one step of the random walk to move from a configuration with the energy E_k to any configuration with the energy E_m.

The random walk in the configuration space is a Markov chain. Its invariant distribution is uniform, i.e., the probabilities of all states of the physical system are equal to each other. For any pair of configurations Ω_A and Ω_B, the probability of an update from Ω_A to Ω_B is equal to the probability of an update from Ω_B to Ω_A. Hence, the detailed balance condition is satisfied. Therefore,

g(E_k) P(E_k, E_m) = g(E_m) P(E_m, E_k),   (9)

where g(E) is the true DOS. It follows from (8) and (9) that

T(E_k, E_m) = T(E_m, E_k).   (10)

Therefore, the TMES of the WL random walk on the true DOS is a symmetric matrix. Because the matrix is both symmetric and right stochastic, it is also left stochastic. Therefore, the sum over rows (columns) is equal to unity, and the rates of visiting all energy levels are equal to each other.
The corresponding auxiliary histogram of energy visits in the WL algorithm is indeed flat! (See the discussion in section 5.) We therefore observe [11] that the flatness of the histogram in the WL algorithm is just the proximity of the corresponding random walk in the energy space to the Markov process. The closer the estimated DOS is to the true DOS, the closer the TMES is to the fully stochastic matrix. This property can be used to connect control of DOS convergence to the property of TMES convergence to the stochastic matrix.

It is instructive to seek an exact solution for the TMES elements. This can easily be done for the 1D Ising model [11]. In the case of a chain of L spins with periodic boundary conditions, the probability to change the energy from E_k to E_m in a WL random move is

T(E_k, E_m) = min[1, g(E_k)/g(E_m)] Σ_{i=0}^{2k} N_i Q_i^{E_k → E_m} / g(E_k),   (11)

where k ≠ m. Here, k is the number of pairs of domain walls in the configuration, which determines the energy level E_k = −Σ_{j=1}^{L} σ_j σ_{j+1} = −L + 4k, N_i(k, L) is the number of configurations where i domains consist of only one spin and 2k − i domains consist of more than one spin, and Q_i^{E_k → E_m}(L) is the probability that a single spin flip moves the system to the energy E_m from such configurations.

The structure of expression (11) shows that the TMES elements are composed of three probabilities: the probability of the particular configuration, the probability that the move in the configuration space changes the system energy to some value, and the WL probability P_wl to accept the proposed move between energy levels.
It reflects the inner workings of the WL algorithm and explicitly shows how the combination of choosing the site in the configuration space and of the corresponding change in the phase space generates a random walk in the energy space.

Occupations of energy levels of the chain are expressed in terms of binomial coefficients as g(E_k) = 2 C_L^{2k} because there are exactly C_L^{2k} ways to arrange 2k domain walls. Therefore, partition function (5) is

Z_L = 2 Σ_{k=0}^{L/2} C_L^{2k} e^{(L − 4k)/(k_B T)},   (12)

and the TMES for the 1D Ising model is

T(E_k, E_{k+1}) = T(E_{k+1}, E_k) = C_{L−2}^{2k} / max(C_L^{2k}, C_L^{2k+2}).   (13)

The TMES is indeed symmetric and stochastic.

We estimate the TMES elements in simulations as follows [11]. The auxiliary matrix U(E_k, E_m) is initially filled with zeros. The element U(E_k, E_m) is increased by unity after every WL move from a configuration with the energy E_k to a configuration with the energy E_m. During the simulations, we compute the normalized matrix

T̃(E_k, E_m) = U(E_k, E_m) / H̃,   (14)

where

H̃ = Σ_{k,m} U(E_k, E_m) / N_E.   (15)

The obtained matrix T̃ is expected to approach the stochastic matrix T in the final stage of a correct calculation.
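The exact result (13) can be tabulated and its symmetry and stochasticity checked directly. The sketch below is ours; the closed form used for the off-diagonal elements is our reading of (13), and the diagonal elements are fixed by the requirement that each row sums to unity.

```python
from math import comb

def tmes_1d_ising(L):
    """Exact TMES of the WL random walk for the 1D Ising chain with
    periodic boundary conditions.  Off-diagonal elements follow our
    reading of eq. (13): T(E_k, E_{k+1}) = C(L-2, 2k) / max(C(L, 2k),
    C(L, 2k+2)); the diagonal is fixed by row stochasticity."""
    n = L // 2 + 1                        # levels E_k = -L + 4k, k = 0..L/2
    T = [[0.0] * n for _ in range(n)]
    for k in range(n - 1):
        t = comb(L - 2, 2 * k) / max(comb(L, 2 * k), comb(L, 2 * k + 2))
        T[k][k + 1] = T[k + 1][k] = t     # symmetric, as in eq. (10)
    for k in range(n):
        T[k][k] = 1.0 - sum(T[k])         # probability to stay at level E_k
    return T
```

Since the matrix comes out symmetric with unit row sums, it is doubly stochastic, so a WL walk on the true DOS visits all energy levels at equal rates, which is exactly the histogram-flatness property discussed above.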
4. Two characteristic times in WL algorithm
Results of simulations [11, 12] demonstrate that there are two time scales in the WL simulations (although this point is not discussed in those texts). It is well known that the WL algorithm drives the estimated DOS to the vicinity of the true DOS at the beginning of simulations, at least for systems with a discrete energy spectrum and a not very complicated energy landscape. This is a positive feature of the WL algorithm, although still unexplained. The time scale of this stage can be associated with the tunneling time. The second stage is the stage of convergence of the estimated DOS to the true DOS. In this section, we argue that the time scale of the second stage is given by the inverse value of the spectral gap of the TMES.
It has been known since the beginning of WL simulations [13] that a finite value of the desired histogram flatness leads to saturation of the estimates, and saturation was found for systems with a known DOS. The first stage of the WL algorithm drives the estimated DOS close to the true DOS, although not very close. Analysis of Figures 1 and 3–5 in paper [11] shows that there is a characteristic time of the first stage; we suggest that it is the tunneling time of the random walk in the energy space. There are several definitions leading to the same scaling behavior of the tunneling time with system size (more precisely, with the number of energy levels), T_t ∝ L^z. A useful practical definition is that it is the time needed for the WL random walk to reach the highest energy level starting from the lowest one [14].

The modification of the WL algorithm called the 1/t-WL algorithm [15, 16] defines the time when the 1/t regime of the WL algorithm starts as the time when all energy levels have been visited at least once. Clearly, this is practically the same as the tunneling time. The stochastic approximation Monte Carlo (SAMC) algorithm [17, 18] does not specify how to choose the time t when the 1/t regime starts. Our preliminary analysis shows that the optimal choice of t is indeed the tunneling time.

There is a natural lower limit for the exponent z. It can be understood in terms of the random walk. We first consider the classical problem of the 1D unbiased random walk: the position i of the random walk changes with equal probability to i − 1 or i + 1. The point 0 is the reflecting point of the random walk, and the point N is the absorbing point. The first-passage time of the random walk is known to scale as N^2. The number of levels N_E for the 2D Ising model scales as L^2, and the number of random walk jumps necessary to cross the whole energy spectrum hence scales as L^4. Therefore, z = 4 is the minimum value for any type of random walk in the energy space of the 2D Ising model. For a random walk with bias, which is the WL probability P_wl, the growth of the tunneling time should be faster, z > 4.

There is a worse message, which can be learned from the properties of a random walk in a random potential: localization of the random walker occurs in one dimension [19]. This probably explains the difficulties in applying the WL algorithm to systems with complex energy landscapes: the random walker can be temporarily trapped near a global minimum of the probability profile in the simulated DOS. Positions of the vertical dashed lines in Figures 1 and 3–5 in paper [11] can be associated with the tunneling time. During the first simulation stage, the effective tunneling time exponent grows from z = 4 to a value that depends on the investigated system. Reported estimates of the tunneling time exponent lie between z = 4 and approximately z = 5.

The spectral gap G of the TMES determines the mixing time of the Markov chain [20]:

T_m = 1/G,   (16)

G = λ_1 − λ_2,   (17)

where λ_1 and λ_2 are the two largest eigenvalues of the TMES. In the second phase of the 1/t-WL (or SAMC) algorithm, the DOS is very close to the true DOS, and the characteristic time of convergence is therefore just the mixing time T_m of the Markov chain. The exact solution for the 1D Ising model [11] can be used to construct the TMES for varying lattice sizes, and fitting the inverse spectral gap yields an estimate [12] of the mixing time exponent in T_m ∝ L^{z_m}. Numerically estimating the mixing time for the 2D Ising model [12] leads to a power law of the same form.
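The spectral-gap calculation of (16)–(17) can be carried out for the exact 1D Ising TMES without any simulation. The sketch below is ours: it builds the matrix from our reading of eq. (13) and finds λ_2 by a simple power iteration restricted to the subspace orthogonal to the uniform vector, using the fact that λ_1 = 1 with a uniform eigenvector for a doubly stochastic matrix.

```python
from math import comb

def spectral_gap_1d_ising(L, iters=5000):
    """Spectral gap G = lambda_1 - lambda_2 of the exact 1D Ising TMES,
    eq. (13).  lambda_1 = 1 (uniform eigenvector of a doubly stochastic
    matrix); lambda_2 is found by power iteration on the orthogonal
    subspace.  The mixing-time estimate of eq. (16) is T_m = 1/G."""
    n = L // 2 + 1
    T = [[0.0] * n for _ in range(n)]
    for k in range(n - 1):
        T[k][k + 1] = T[k + 1][k] = (comb(L - 2, 2 * k)
                                     / max(comb(L, 2 * k), comb(L, 2 * k + 2)))
    for k in range(n):
        T[k][k] = 1.0 - sum(T[k])
    v = [float(i) for i in range(n)]          # generic start vector
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]             # project out the lambda_1 = 1 mode
        w = [sum(T[i][j] * v[j] for j in range(n)) for i in range(n)]
        scale = max(abs(x) for x in w)
        v = [x / scale for x in w]
    Tv = [sum(T[i][j] * v[j] for j in range(n)) for i in range(n)]
    lam2 = sum(v[i] * Tv[i] for i in range(n)) / sum(x * x for x in v)
    return 1.0 - lam2                         # G; the mixing time is T_m = 1/G
```

Running this for increasing L shows the gap shrinking, i.e., the mixing time T_m = 1/G growing with the lattice size, consistent with the power-law behavior reported in [12].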
5. Discussion
Using our approach, we can calculate the normalized histogram Ĥ(E_m) = H(E_m)/Σ_m H(E_m) from the TMES as H(E_m) = Σ_k T̃(E_k, E_m). It is obvious that the histogram flatness condition is equivalent to the property that the matrix T̃ is close to the stochastic matrix. The histogram flatness in the second simulation stage is closely related to the stochastic properties of the TMES because the estimated DOS is very close to the true DOS.

We have argued that the time t_c in the 1/t-WL algorithm [15, 16] and the time t in the SAMC algorithm [17, 18] are of the order of the tunneling time [14]. Therefore, in the first simulation stage, the optimal choice is the original WL algorithm, which drives the estimated DOS close to the true DOS, at least for systems with a not very complicated free energy landscape. In the second simulation stage, the 1/t-WL algorithm drives the estimated DOS to the true DOS, and the characteristic time is the mixing time T_m. The accuracy of estimating the DOS is given by the deviation of the largest eigenvalue of the estimated TMES from unity [11].

Future research should be done to understand the properties of the WL algorithm when applied to more complicated systems and to systems with a continuous energy spectrum. In systems with a complex energy landscape, the variation of the transition matrix elements can be huge, and this can lead to a large increase of the tunneling time. It follows from the theory of a random walk in a random potential [19] that the random walker can be localized in an extremal valley of the potential. It is important to understand under which conditions this can happen with the WL algorithm. Partitioning a continuous energy spectrum into bins (as done in simulations) allows using the TMES approach to analyze the simulations.
It would be instructive to understand how the bin size influences TMES properties, from which we might understand some properties of simulations, the behavior of the control parameter, and also the tunneling and mixing time values.

This work was supported by grant 14-21-00158 from the Russian Science Foundation.

References
[1] N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, and E. Teller, Equation of State Calculations by Fast Computing Machines, J. Chem. Phys. 21, 1087 (1953).
[2] D.P. Landau and K. Binder, A Guide to Monte Carlo Simulations in Statistical Physics, 4th edition, Cambridge University Press, Cambridge, 2015.
[3] B.A. Berg and T. Neuhaus, Multicanonical algorithms for first order phase transitions, Phys. Lett. B 267, 249 (1991).
[4] B.A. Berg and T. Neuhaus, Multicanonical ensemble: A new approach to simulate first-order phase transitions, Phys. Rev. Lett. 68, 9 (1992).
[5] W. Janke and W. Paul, Thermodynamics and Structure of Macromolecules from Flat-Histogram Monte Carlo Simulations, Soft Matter 12, 642 (2016).
[6] C.M. Fortuin and P.W. Kasteleyn, On the random-cluster model: I. Introduction and relation to other models, Physica (Amsterdam) 57, 536 (1972).
[7] R.H. Swendsen and J.-S. Wang, Nonuniversal critical dynamics in Monte Carlo simulations, Phys. Rev. Lett. 58, 86 (1987).
[8] U. Wolff, Lattice field theory as a percolation process, Phys. Rev. Lett. 60, 1461 (1988).
[9] F. Wang and D.P. Landau, Efficient, multiple-range random walk algorithm to calculate the density of states, Phys. Rev. Lett. 86, 2050 (2001).
[10] F. Wang and D.P. Landau, Determining the density of states for classical statistical models: A random walk algorithm to produce a flat histogram, Phys. Rev. E 64, 056101 (2001).
[11] L.Yu. Barash, M.A. Fadeeva, and L.N. Shchur, Control of accuracy in the Wang–Landau algorithm, Phys. Rev. E 96, 043307 (2017).
[12] M. Fadeeva and L. Shchur, On the mixing time in the Wang–Landau algorithm, J. Phys.: Conf. Ser. 955, 012028 (2018).
[13] D. Landau, Rahman Prize Lecture, 2002.
[14] P. Dayal et al., Performance limitations of flat-histogram methods, Phys. Rev. Lett. 92, 097201 (2004).
[15] R.E. Belardinelli and V.D. Pereyra, Fast algorithm to calculate density of states, Phys. Rev. E 75, 046701 (2007).
[16] R.E. Belardinelli and V.D. Pereyra, Wang–Landau algorithm: A theoretical analysis of the saturation of the error, J. Chem. Phys. 127, 184105 (2007).
[17] F. Liang, C. Liu, and R.J. Carroll, Stochastic Approximation in Monte Carlo Computation, J. Am. Stat. Assoc. 102, 305 (2007).
[18] F. Liang, A Theory on Flat Histogram Monte Carlo Algorithms, J. Stat. Phys. 122, 511 (2006).
[19] Ya.G. Sinai, The Limiting Behavior of a One-Dimensional Random Walk in a Random Medium, Theory Probab. Appl. 27, 256 (1982).
[20] S. Boyd, P. Diaconis, and L. Xiao, Fastest mixing Markov chain on a graph, SIAM Rev. 46, 667 (2004).