Distance between configurations in MCMC simulations and the geometrical optimization of the tempering algorithms
Masafumi Fukuma (a), Nobuyuki Matsumoto† (a) and Naoya Umeda (b)
(a) Department of Physics, Kyoto University, Kyoto 606-8502, Japan
(b) PricewaterhouseCoopers Aarata LLC, Otemachi Park Building, 1-1-1 Otemachi, Chiyoda-ku, Tokyo 100-0004, Japan
E-mail: [email protected], [email protected], [email protected]
For a given Markov chain Monte Carlo (MCMC) algorithm, we define the distance between configurations that quantifies the difficulty of transitions. This distance enables us to investigate MCMC algorithms in a geometrical way, and we investigate the geometry of the simulated tempering algorithm implemented for an extremely multimodal system with highly degenerate vacua. We show that the large scale geometry of the extended configuration space is given by an asymptotically anti-de Sitter metric, and argue in a simple, geometrical way that the tempering parameter should be best placed exponentially to acquire high acceptance rates for transitions in the extra dimension. We also discuss the geometrical optimization of the tempered Lefschetz thimble method, which is an algorithm towards solving the numerical sign problem.
Report No.: KUNS-2778
† Speaker.
© Copyright owned by the author(s) under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0). https://pos.sissa.it/
Distance between configurations and the geometrical optimization
Nobuyuki Matsumoto
1. Introduction
In Markov chain Monte Carlo (MCMC) simulations, we often encounter a multimodal distribution, for which transitions between configurations around different modes are difficult. We introduced in [1] the distance between configurations to quantify the difficulty of transitions.
In this talk, we mainly consider the simulated tempering algorithm implemented for an extremely multimodal system with highly degenerate vacua. Our distance enables us to investigate the algorithm in a geometrical way as follows. We first define a metric on the extended configuration space, and show that it is given by an asymptotically anti-de Sitter (AdS) metric [1, 2]. We then show in a simple, geometrical way that the tempering parameter should be best placed exponentially to acquire high acceptance rates for transitions in the extra dimension. We further discuss the optimized form of the tempering parameter in the tempered Lefschetz thimble method (TLTM) [3, 4, 5], which is an algorithm towards solving the numerical sign problem. This talk is based on the work in [1, 2, 4].
2. Definition of distance
In this section, we briefly review the distance introduced in [1]. Let $\mathcal{M} \equiv \{x\}$ be a configuration space, and $S(x)$ the action. Suppose that we are given an MCMC algorithm which generates a configuration $x$ from $y$ with the conditional probability $P(x|y) = (x|\hat{P}|y)$. We assume that it satisfies the detailed balance condition with respect to $p_{\rm eq}(x) \equiv (1/Z)\,e^{-S(x)}$ $\big(Z \equiv \int dx\, e^{-S(x)}\big)$. We further assume that the Markov chain satisfies suitable ergodic properties so that $p_{\rm eq}(x)$ is the unique equilibrium distribution.
To define the distance, we consider the Markov chain in equilibrium. We denote by $W_n$ the set of transition paths with $n$ steps in equilibrium, and by $W_n(x,y)$, which is a subset of $W_n$, the set of transition paths with $n$ steps which start from $y$ and end at $x$ in equilibrium. We define the connectivity between $x$ and $y$ by the fraction of the sizes of the two sets:
$$ f_n(x,y) \equiv \frac{|W_n(x,y)|}{|W_n|} = P_n(x|y)\, p_{\rm eq}(y) = f_n(y,x). \tag{2.1} $$
Here $P_n(x|y) = (x|\hat{P}^n|y)$ is an $n$-step transition matrix. We further introduce the normalized connectivity as
$$ F_n(x,y) \equiv \frac{f_n(x,y)}{\sqrt{f_n(x,x)\, f_n(y,y)}}, \tag{2.2} $$
with which we define the distance as follows:
$$ d_n(x,y) \equiv \sqrt{-2 \ln F_n(x,y)}. \tag{2.3} $$
It can be shown that this distance gives a universal form at large scales for algorithms that generate local moves in the configuration space [1].
As an example, let us first consider the action $S(x) = (\beta/2) \sum_{\mu=1}^{D} x_\mu^2$, which gives a Gaussian distribution in equilibrium. The distance can be calculated analytically for the Langevin algorithm:
$$ d_n^2(x,y) = \frac{\beta}{2 \sinh(\beta n \varepsilon)}\, |x-y|^2, \tag{2.4} $$
where $\varepsilon$ is the increment of the fictitious time. We thus find that a flat and translation invariant metric is obtained for Gaussian distributions.
As a second example, we consider the double-well action $S(x) = (\beta/2)(x^2-1)^2$, which gives a multimodal equilibrium distribution. We again use the Langevin algorithm to calculate the distance. By making use of a quantum mechanical argument and an instanton calculation, we find that the distance behaves for $\beta \gg 1$ as
$$ d_n^2(x,y) \propto \beta \quad (\text{when } x, y \text{ are around different modes}), \qquad d_n^2(x,y) = O\big(e^{-{\rm const.}\,\beta n \varepsilon}\big) \quad (\text{when } x, y \text{ are around the same mode}). \tag{2.5} $$
The distance is thus large between different modes and exponentially small within a single mode. Therefore, we confirm that our distance quantifies the difficulty of transitions.
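The definitions (2.1)-(2.3) can be evaluated directly whenever the transition matrix is small enough to store. The following is a minimal sketch under assumptions that are not part of the original text: the configuration space is replaced by a one-dimensional grid, and the Langevin algorithm is replaced by a lazy nearest-neighbour Metropolis chain (the helper names `lazy_metropolis` and `distance_matrix` are hypothetical). The lazy proposal keeps the transition matrix positive semidefinite, so that $F_n \le 1$ and the square root in (2.3) is real.

```python
import numpy as np

def lazy_metropolis(log_p):
    """Matrix P[i, j] = Prob(j -> i) of a lazy nearest-neighbour Metropolis
    chain on a 1D grid, reversible with respect to exp(log_p)."""
    m = len(log_p)
    P = np.zeros((m, m))
    for j in range(m):
        for i in (j - 1, j + 1):
            if 0 <= i < m:
                # propose each neighbour with probability 1/4, accept with the Metropolis rule
                P[i, j] = 0.25 * min(1.0, np.exp(log_p[i] - log_p[j]))
        P[j, j] = 1.0 - P[:, j].sum()   # rejected or unproposed moves stay put
    return P

def distance_matrix(P, p_eq, n):
    """d_n(x, y) = sqrt(-2 ln F_n(x, y)) for all pairs of grid points."""
    f = np.linalg.matrix_power(P, n) * p_eq[None, :]   # f_n(x, y) = P_n(x|y) p_eq(y)
    diag = np.diag(f)
    F = f / np.sqrt(np.outer(diag, diag))              # normalized connectivity (2.2)
    F = np.clip(F, 1e-300, 1.0)                        # guard against round-off and log(0)
    return np.sqrt(-2.0 * np.log(F))

# Gaussian action S(x) = (beta/2) x^2 on a grid (illustrative parameter values)
beta = 1.0
xs = np.linspace(-3.0, 3.0, 61)
log_p = -0.5 * beta * xs**2
p_eq = np.exp(log_p) / np.exp(log_p).sum()
d = distance_matrix(lazy_metropolis(log_p), p_eq, n=200)
c = len(xs) // 2                                       # grid index of x = 0
print(d[c, c + 10] / d[c, c + 5])                      # ratio of d(0, 1.0) to d(0, 0.5)
```

For a flat metric the distance is proportional to $|x-y|$, so the printed ratio is expected to be close to 2, up to lattice artifacts of the grid chain.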
3. Emergence of AdS geometry and the geometric optimization
The simulated tempering [6] is an algorithm to speed up the relaxation to equilibrium. In this algorithm, we choose a parameter $\beta$ in the action (e.g. an overall coefficient) as the tempering parameter, and extend the configuration space in the $\beta$ direction: $\mathcal{M} \to \mathcal{M} \times \mathcal{A}$, where $\mathcal{A} \equiv \{\beta_0, \beta_1, \cdots, \beta_A\} = \{\beta_a\}_{a=0,\cdots,A}$ and $\beta_0$ is the original parameter of interest. We assume that $\{\beta_a\}$ are ordered as $\beta_0 > \beta_1 > \cdots > \beta_A$. We set up a Markov chain in the extended configuration space $\mathcal{M} \times \mathcal{A}$ in such a way that the global equilibrium distribution becomes $P_{\rm eq}(x, \beta_a) \equiv w_a \exp[-S(x; \beta_a)]$. We choose the weights $\{w_a\}$ to be $w_a = 1/\big((A+1) Z_a\big)$ $\big(Z_a \equiv \int dx\, e^{-S(x;\beta_a)}\big)$, so that every $\beta_a$ is visited with equal probability. Expectation values are to be calculated by first realizing global equilibrium and then retrieving the subsample at $a = 0$.
For multimodal systems, transitions between configurations around different modes are difficult. This situation is improved by extending the configuration space as above, because such configurations can then communicate easily by passing through the region with small $\beta_a$. The benefit due to tempering can be quantified in terms of the distance. Table 1 shows the distance between two modes in the original configuration space with and without tempering for the double-well action. We see that the introduction of the simulated tempering drastically reduces the distance.
Table 1: Comparison of the distance with and without tempering [1].
n          without tempering    with tempering
10         39.…                 …
…          …                    …
…,000      8.46                 2.… × 10^(−…)
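The effect of tempering on the distance can be illustrated with a small matrix computation. The sketch below rests on assumptions that are not from the original text: the double-well action is put on a one-dimensional grid, the Langevin update is replaced by a lazy Metropolis chain, the ladder $\beta_a = 20 \cdot 2^{-a}$ and the step number are illustrative, and the function names are hypothetical. It builds the chain on $\mathcal{M} \times \mathcal{A}$ with equilibrium weights $w_a \propto 1/Z_a$ and evaluates $d_n$ between the two modes at $a = 0$. It does not attempt to reproduce the numbers of Table 1.

```python
import numpy as np

def lazy_metropolis(log_p):
    """Lazy nearest-neighbour Metropolis matrix P[i, j] = Prob(j -> i) for target exp(log_p)."""
    m = len(log_p)
    P = np.zeros((m, m))
    for j in range(m):
        for i in (j - 1, j + 1):
            if 0 <= i < m:
                P[i, j] = 0.25 * min(1.0, np.exp(log_p[i] - log_p[j]))
        P[j, j] = 1.0 - P[:, j].sum()
    return P

def extended_chain(xs, betas, S):
    """Chain on (x, beta_a): half of the steps move x at fixed a, half move a at fixed x."""
    m, A1 = len(xs), len(betas)
    log_w = -S(xs[None, :], np.asarray(betas)[:, None])         # shape (A1, m)
    log_w -= np.log(np.exp(log_w).sum(axis=1, keepdims=True))   # divide by Z_a, i.e. w_a ~ 1/Z_a
    P = np.zeros((A1 * m, A1 * m))
    for a in range(A1):                                          # x-moves within level a
        P[a * m:(a + 1) * m, a * m:(a + 1) * m] += 0.5 * lazy_metropolis(log_w[a])
    for i in range(m):                                           # a-moves at grid point i
        idx = np.arange(A1) * m + i
        P[np.ix_(idx, idx)] += 0.5 * lazy_metropolis(log_w[:, i])
    p_eq = np.exp(log_w).ravel() / A1
    return P, p_eq

def mode_distance(P, p_eq, n, i, j):
    """d_n between states i and j of the chain, following (2.1)-(2.3)."""
    f = np.linalg.matrix_power(P, n) * p_eq[None, :]
    F = f[i, j] / np.sqrt(f[i, i] * f[j, j])
    return np.sqrt(-2.0 * np.log(min(max(F, 1e-300), 1.0)))

S = lambda x, beta: 0.5 * beta * (x**2 - 1.0)**2    # double-well action
xs = np.linspace(-2.0, 2.0, 41)                      # grid containing x = -1 and x = +1
left, right = 10, 30                                 # indices of the two modes in the a = 0 block
n = 20000
d_without = mode_distance(*extended_chain(xs, [20.0], S), n, left, right)
d_with = mode_distance(*extended_chain(xs, [20.0, 10.0, 5.0, 2.5, 1.25], S), n, left, right)
print(d_without, d_with)
```

With a single $\beta$ the $a$-move reduces to the identity, so both chains attempt $x$-moves at the same rate and the comparison isolates the effect of the extra dimension.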
We hereafter consider an extremely multimodal system with highly degenerate vacua. As a typical example, we use the action $S(x; \beta) \equiv \beta \sum_{\mu=1}^{D} \big(1 - \cos(2\pi x_\mu)\big)$.
According to the definition of our distance, $d_n(x,y)$ is negligibly small when $x$ and $y$ lie around the same mode, while $d_n(x,y)$ is large when $x$ and $y$ are around different modes. Therefore, when we investigate the large-scale geometry of $\mathcal{M}$, we can identify configurations around the same mode as a single configuration. We write the coarse-grained configuration space thus obtained as $\bar{\mathcal{M}}$ (for the cosine action, $\bar{\mathcal{M}} = \mathbb{Z}^D$). We can similarly coarse-grain the extended configuration space when the simulated tempering is implemented. We write the extended, coarse-grained configuration space as $\bar{\mathcal{M}} \times \mathcal{A}$.
We define the metric on $\bar{\mathcal{M}} \times \mathcal{A} = \{X \equiv (x, \beta_a)\}$ in terms of the distance:
$$ ds^2 = g_{\mu\nu}\, dX^\mu dX^\nu = d_n^2(X, X + dX), \tag{3.1} $$
where $X$ and $X + dX$ denote nearby points in $\bar{\mathcal{M}} \times \mathcal{A}$. It can be shown that this metric is an asymptotically AdS metric [2]. We here sketch the proof. We first note that the action is invariant under the lattice translation $x_\mu \to x_\mu + m$ ($m \in \mathbb{Z}$), and thus the metric components are independent of $x$. Furthermore, since the action is also invariant under the reflection $x_\mu \to -x_\mu$, there are no off-diagonal components. Thus we deduce that the metric takes the following form:
$$ ds^2 = f(\beta)\, d\beta^2 + g(\beta) \sum_{\mu=1}^{D} dx_\mu^2. \tag{3.2} $$
We are left with determining the two functions $f(\beta)$ and $g(\beta)$.
We first consider $g(\beta)$. Since transitions in the $x$ direction are difficult for larger $\beta$, $g(\beta)$ should be an increasing function of $\beta$, at least when $\beta$ is large. We here assume that the leading dependence on $\beta$ for $\beta \gg 1$ is a power of $\beta$:
$$ d_n^2\big((x, \beta), (x + dx, \beta)\big) = {\rm const.}\; \beta^q \sum_{\mu=1}^{D} dx_\mu^2 \quad (\beta \gg 1), \tag{3.3} $$
where $q$ is a constant. On the other hand, the functional form of $f(\beta)$ for $\beta \gg 1$ can be determined by calculating the distance in the $\beta$ direction from the definition (2.3) as follows. We first approximate the local equilibrium distribution in $\beta \gg 1$ by a Gaussian. One then finds that the distance between $(x, \beta_a)$ and $(x, \beta_{a+1})$ is a function of the ratio $\beta_a / \beta_{a+1}$ [2]. This means that the distance in the $\beta$ direction is invariant under the scaling $\beta \to \lambda\beta$ for large $\beta$, and thus we obtain
$$ d_n^2\big((x, \beta), (x, \beta + d\beta)\big) = {\rm const.}\; \frac{d\beta^2}{\beta^2} \quad (\beta \gg 1). \tag{3.4} $$
Putting everything together, we conclude that the metric on $\bar{\mathcal{M}} \times \mathcal{A}$ is given by
$$ ds^2 = l^2 \left( \frac{d\beta^2}{\beta^2} + \alpha \beta^q \sum_{\mu=1}^{D} dx_\mu^2 \right) \quad (\beta \gg 1) \tag{3.5} $$
with constants $l$, $\alpha$, $q$. This is an AdS metric, as can be seen by the coordinate transformation $\beta \to z$ with $\beta = (\sqrt{\alpha}\, q z / 2)^{-2/q}$:
$$ ds^2 = \left( \frac{2l}{q} \right)^2 \cdot \frac{1}{z^2} \left( dz^2 + \sum_{\mu=1}^{D} dx_\mu^2 \right), \tag{3.6} $$
which is a Euclidean AdS metric in the Poincaré coordinates.
We can verify this metric in the following way. We first numerically calculate the distance $d_n\big(X \equiv (0, \beta_a),\, Y \equiv (x, \beta_a)\big)$ for two values of $a$ and for $x$ up to 10. We then make a $\chi^2$ fit by using as the fitting function the geodesic distance calculated from the metric (3.5):
$$ I(x, \beta_a; l, \alpha, q) \equiv \frac{4l}{q} \ln \frac{\sqrt{\big(q\sqrt{\alpha}\,|x|/4\big)^2 + \beta_a^{-q}} + q\sqrt{\alpha}\,|x|/4}{\beta_a^{-q/2}}. \tag{3.7} $$
We carried out the above calculations for a two-dimensional ($D = 2$) configuration space, and obtained the results shown in Fig. 1 [2]. The fit determines the three parameters $l$, $\alpha$ and $q$ (see [2] for the fitted values and the goodness of fit). The good agreement shows that the distances can be regarded as geodesic distances of an asymptotically Euclidean AdS metric.
Figure 1: Calculated distances [2]. The solid line is the geodesic distance with the fitted parameters.
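As a consistency check of the geodesic fitting function, the closed form can be compared with the standard distance formula in the Poincaré half-space. The sketch below assumes the normalization that follows from (3.5) and (3.6): AdS radius $2l/q$ and $z = \frac{2}{q\sqrt{\alpha}}\,\beta^{-q/2}$, for which two points at equal $z$ separated by $|x|$ are at geodesic distance $2\,(2l/q)\,\mathrm{arcsinh}\big(|x|/(2z)\big)$. The parameter values are arbitrary illustrative numbers, not the fitted values of [2], and the function names are hypothetical.

```python
import math

def geodesic_distance(x, beta, l, alpha, q):
    """Closed-form geodesic distance between (0, beta) and (x, beta) for the metric (3.5)."""
    u = q * math.sqrt(alpha) * abs(x) / 4.0
    return (4.0 * l / q) * math.log((math.sqrt(u * u + beta ** (-q)) + u) / beta ** (-q / 2.0))

def geodesic_distance_poincare(x, beta, l, alpha, q):
    """Same distance computed in Poincare coordinates, where ds^2 = L^2 (dz^2 + dx^2) / z^2."""
    L = 2.0 * l / q                                          # AdS radius read off from (3.6)
    z = 2.0 / (q * math.sqrt(alpha)) * beta ** (-q / 2.0)    # inverse of beta = (sqrt(alpha) q z / 2)^(-2/q)
    return 2.0 * L * math.asinh(abs(x) / (2.0 * z))

l, alpha, q, beta = 1.0, 2.0, 0.5, 3.0                       # illustrative values only
for x in (1, 5, 10):
    print(x, geodesic_distance(x, beta, l, alpha, q),
          geodesic_distance_poincare(x, beta, l, alpha, q))
```

For small $|x|$ both expressions reduce to $l\sqrt{\alpha \beta^q}\,|x|$, which is the line element of (3.5) in the $x$ direction.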
Our aim here is to optimize the functional form of $\beta_a = \beta(a)$ so that transitions in the extended direction become smooth. We make this optimization by referring to the geometry of the extended configuration space. Note that, since it is the parameter $a$ that is directly dealt with in MCMC simulations, we expect that smooth transitions correspond to a flat metric in the extended direction when $a$ is used as one of the coordinates: $g_{\beta\beta}\, d\beta^2 = {\rm const.}\; da^2$. Since the geometry of $\bar{\mathcal{M}} \times \mathcal{A}$ is asymptotically AdS (3.5), this means that $d\beta / \beta \propto da$. This in turn determines the functional form of $\beta_a$ to be exponential in $a$: $\beta_a = \beta_0 R^{-a}$ ($\beta_0$, $R$: constants).
We confirmed this expectation numerically by gradually changing the values of $\beta_a$ so that the distances between different modes are minimized [2]. The result is shown in Fig. 2. We see that the optimized form of $\beta_a$ indeed takes an exponential form.
Figure 2: Optimized values for $\{\beta_a\}$ [2]. The blue dots are the initial values, and the orange dots are the resulting optimized values.
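The geometrical argument can be made concrete by measuring each step of a ladder with the $\beta$ part of the metric (3.5), for which the proper length between $\beta_a$ and $\beta_{a+1}$ is $l \ln(\beta_a/\beta_{a+1})$. The following sketch compares a geometric ladder with a linear one between the same endpoints; the endpoint values, the ratio $R = 2$ and the function names are illustrative assumptions.

```python
import math

def geometric_ladder(beta0, R, A):
    """beta_a = beta0 * R**(-a) for a = 0, ..., A."""
    return [beta0 * R ** (-a) for a in range(A + 1)]

def linear_ladder(beta0, betaA, A):
    """Equally spaced values from beta0 down to betaA."""
    return [beta0 + (betaA - beta0) * a / A for a in range(A + 1)]

def step_lengths(betas, l=1.0):
    """Proper length l * ln(beta_a / beta_{a+1}) of each step in the beta direction of (3.5)."""
    return [l * math.log(b0 / b1) for b0, b1 in zip(betas, betas[1:])]

geo = geometric_ladder(16.0, 2.0, 4)          # 16, 8, 4, 2, 1
lin = linear_ladder(16.0, 1.0, 4)             # 16, 12.25, 8.5, 4.75, 1
print(step_lengths(geo))                      # every step has length ln 2
print(step_lengths(lin))                      # steps grow as beta decreases
```

The geometric ladder spreads the total length $l \ln(\beta_0/\beta_A)$ evenly over the steps, whereas the linear ladder concentrates it in the last steps at small $\beta$, where transitions in the extra dimension would then be hardest.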
4. Tempering parameter in the tempered Lefschetz thimble method
The tempered Lefschetz thimble method (TLTM) [3, 4, 5] (see also [9]) is an algorithm towards solving the sign problem. In this algorithm, by deforming the integration region from $\mathbb{R}^N$ to $\Sigma \subset \mathbb{C}^N$, we reduce the oscillatory behavior of the reweighted integrals that appear in the following expression:
$$ \langle O(x) \rangle = \frac{\int_\Sigma dz\, e^{-S(z)}\, O(z)}{\int_\Sigma dz\, e^{-S(z)}} = \frac{\int_{\mathbb{R}^N} dx\, e^{-{\rm Re}\,S(x)}\, e^{-i\,{\rm Im}\,S(x)}\, O(x) \Big/ \int_{\mathbb{R}^N} dx\, e^{-{\rm Re}\,S(x)}}{\int_{\mathbb{R}^N} dx\, e^{-{\rm Re}\,S(x)}\, e^{-i\,{\rm Im}\,S(x)} \Big/ \int_{\mathbb{R}^N} dx\, e^{-{\rm Re}\,S(x)}}. \tag{4.1} $$
In Lefschetz thimble methods, such a deformation is made according to the antiholomorphic gradient flow: $\dot{z}_t^i = [\partial_i S(z_t)]^*$ with $z_{t=0}^i = x^i$, where the dot denotes the derivative with respect to $t$. This flow equation defines a map from $x \in \mathbb{R}^N$ to $z = z_t(x) \in \mathbb{C}^N$, and a new integration surface is given by $\Sigma_t \equiv z_t(\mathbb{R}^N)$. $\Sigma_t$ approaches a union of Lefschetz thimbles $\{J_\sigma\}$ as $t \to \infty$, and the integrals remain unchanged under the continuous deformation thanks to Cauchy's theorem. Each thimble $J_\sigma$ has a critical point $z_\sigma$ (where $\partial_{z^i} S(z_\sigma) = 0$), and points $z \in J_\sigma$ give the same phase, ${\rm Im}\,S(z) = {\rm Im}\,S(z_\sigma) = {\rm const}$. Thus, the oscillatory behavior of the reweighted integrals will be much reduced for large $t$. In the TLTM, we implement a tempering algorithm by choosing the flow time $t$ as the tempering parameter in order to cure the ergodicity problem caused by infinitely high potential barriers between different thimbles.
We can give a geometrical argument that the optimized form of the flow times $t_a$ is linear in $a$ [4]. In fact, at large $t$, ${\rm Re}\,S(z_t(x))$ increases exponentially in $t$, as if the action had an overall coefficient $\beta_t \propto e^{{\rm const.}\, t}$. As in the simulated tempering, we expect that the optimal form of $\beta_{t_a}$ is an exponential function of $a$. Therefore, $t_a$ should be a linear function of $a$ (see also discussions in [8]).
In the application of the TLTM to the Hubbard model [4], we confirmed that this choice actually works well. Fig. 3 shows the acceptance rates between adjacent time slices, where $t_a$ is taken to be a piecewise linear function of $a$ with a single breakpoint. This choice results in the acceptance rates being roughly above 0.4. Most notably, the acceptance rates become constant for larger $t$ (larger $a$), where $\Sigma_t$ gets close to the thimbles and the above discussion becomes more valid.
Figure 3: Acceptance rates in the $t_a$ direction with $\beta\mu = 8$. Larger $a$ corresponds to larger $t_a$.
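The exponential growth of ${\rm Re}\,S$ along the flow, which underlies the linear choice of $t_a$, can be seen in a one-variable toy model. The action $S(z) = (z - i)^2/2$ below is a hypothetical example chosen only because its flow is exactly solvable ($x_t = x_0 e^{t}$, $y_t = 1 - e^{-t}$); it is not a model treated in the text, and the integrator is a plain fourth-order Runge-Kutta sketch.

```python
import math

def S(z):
    """Toy action (an assumption for illustration): S(z) = (z - i)^2 / 2."""
    return 0.5 * (z - 1j) ** 2

def dS(z):
    """Holomorphic derivative dS/dz of the toy action."""
    return z - 1j

def flow(x, t, n_steps=1000):
    """Integrate dz/dt = [dS/dz]^* from z(0) = x up to flow time t with classical RK4."""
    z = complex(x)
    h = t / n_steps
    rhs = lambda w: dS(w).conjugate()
    for _ in range(n_steps):
        k1 = rhs(z)
        k2 = rhs(z + 0.5 * h * k1)
        k3 = rhs(z + 0.5 * h * k2)
        k4 = rhs(z + h * k3)
        z += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return z

x0 = 0.5
for a in range(4):                       # linear schedule t_a = a (illustrative spacing)
    z = flow(x0, float(a))
    print(a, S(z).real, S(z).imag)       # Re S grows with t_a, Im S stays fixed along the flow
```

In this toy model ${\rm Re}\,S(z_t(x_0)) = (x_0^2 e^{2t} - e^{-2t})/2$, so equally spaced flow times give a geometric sequence of ${\rm Re}\,S$ at large $t$, in line with the exponential ladder $\beta_{t_a}$ discussed above.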
5. Conclusion and outlook
We introduced the distance between configurations, which quantifies the difficulty of transitions. We then argued that an asymptotically AdS geometry emerges in the extended, coarse-grained configuration space, and showed that the optimization of the tempering parameter can be made in a simple, geometrical way. We further argued how to determine the optimized form of the tempering parameter in the tempered Lefschetz thimble method.
As for future work, it should be interesting to investigate the distance in Yang-Mills theory, where the coarse-graining of the configuration space can be made by identifying configurations with the same topological charge as a single configuration. We would further like to apply the distance to models whose degrees of freedom can be interpreted as spacetime coordinates (e.g., matrix models) [10]. Then the geometry of the configuration space directly gives that of a spacetime. We expect that this formulation gives a systematic way to construct a spacetime geometry from randomness and provides us with a way to define a quantum theory of gravity.
A study along these lines is now in progress and will be reported elsewhere.
Acknowledgments
The authors greatly thank the organizers of LATTICE 2019. They also thank Andrei Alexandru, Hiroki Hoshina, Etsuko Itou, Yoshio Kikukawa, Yuto Mori and Akira Onishi for useful discussions. This work was partially supported by JSPS KAKENHI (Grant Numbers 16K05321, 18J22698 and 17J08709) and by SPIRITS 2019 of Kyoto University (PI: M.F.).
References
[1] M. Fukuma, N. Matsumoto and N. Umeda, Distance between configurations in Markov chain Monte Carlo simulations, JHEP (2017) 001.
[2] M. Fukuma, N. Matsumoto and N. Umeda, Emergence of AdS geometry in the simulated tempering algorithm, JHEP (2018) 060.
[3] M. Fukuma and N. Umeda, Parallel tempering algorithm for integration over Lefschetz thimbles, PTEP (2017) 073B01.
[4] M. Fukuma, N. Matsumoto and N. Umeda, Applying the tempered Lefschetz thimble method to the Hubbard model away from half filling, Phys. Rev. D 100 (2019) 114510.
[5] M. Fukuma, N. Matsumoto and N. Umeda, Implementation of the HMC algorithm on the tempered Lefschetz thimble method.
[6] E. Marinari and G. Parisi, Simulated tempering: A New Monte Carlo scheme, Europhys. Lett. (1992) 451 [hep-lat/9205018].
[7] A. Alexandru, G. Başar and P. Bedaque, Monte Carlo algorithm for simulating fermions on Lefschetz thimbles, Phys. Rev. D 93 (2016) 014504.
[8] A. Alexandru, G. Başar, P. F. Bedaque and N. C. Warrington, Tempered transitions between thimbles, Phys. Rev. D 96 (2017) 034513.
[9] M. Fukuma, N. Matsumoto and N. Umeda, Tempered Lefschetz thimble method and its application to the Hubbard model away from half filling, PoS LATTICE2019 (2019) 090.
[10] M. Fukuma and N. Matsumoto, in preparation.