Distance between configurations in MCMC simulations and the geometrical optimization of the tempering algorithms

Abstract

For a given Markov chain Monte Carlo (MCMC) algorithm, we define the distance between configurations that quantifies the difficulty of transitions. This distance enables us to investigate MCMC algorithms in a geometrical way, and we investigate the geometry of the simulated tempering algorithm implemented for an extremely multimodal system with highly degenerate vacua. We show that the large scale geometry of the extended configuration space is given by an asymptotically anti-de Sitter metric, and argue in a simple, geometrical way that the tempering parameter should be best placed exponentially to acquire high acceptance rates for transitions in the extra dimension. We also discuss the geometrical optimization of the tempered Lefschetz thimble method, which is an algorithm towards solving the numerical sign problem.


arXiv [hep-lat]

Masafumi Fukuma (a), Nobuyuki Matsumoto (a, speaker) and Naoya Umeda (b)

(a) Department of Physics, Kyoto University, Kyoto 606-8502, Japan
(b) PricewaterhouseCoopers Aarata LLC, Otemachi Park Building, 1-1-1 Otemachi, Chiyoda-ku, Tokyo 100-0004, Japan

Report No.: KUNS-2778

Copyright owned by the author(s) under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0). https://pos.sissa.it/


1. Introduction

In Markov chain Monte Carlo (MCMC) simulations, we often encounter a multimodal distribution, for which transitions between configurations around different modes are difficult. We introduced in [1] the distance between configurations to quantify the difficulty of transitions.

In this talk, we mainly consider the simulated tempering algorithm implemented for an extremely multimodal system with highly degenerate vacua. Our distance enables us to investigate the algorithm in a geometrical way as follows. We first define a metric on the extended configuration space, and show that it is given by an asymptotically anti-de Sitter (AdS) metric [1, 2]. We then show in a simple, geometrical way that the values of the tempering parameter should be placed exponentially to acquire high acceptance rates for transitions in the extra dimension. We further discuss the optimized form of the tempering parameter in the tempered Lefschetz thimble method (TLTM) [3, 4, 5], which is an algorithm towards solving the numerical sign problem. This talk is based on work [1, 2, 4].

2. Definition of distance

In this section, we briefly review the distance introduced in [1]. Let $\mathcal{M} \equiv \{x\}$ be a configuration space, and $S(x)$ the action. Suppose that we are given an MCMC algorithm which generates a configuration $x$ from $y$ with the conditional probability $P(x|y) = (x|\hat{P}|y)$. We assume that it satisfies the detailed balance condition with respect to $p_{\rm eq}(x) \equiv (1/Z)\, e^{-S(x)}$ $\bigl(Z \equiv \int dx\, e^{-S(x)}\bigr)$. We further assume that the Markov chain satisfies suitable ergodic properties so that $p_{\rm eq}(x)$ is the unique equilibrium distribution.

To define the distance, we consider the Markov chain in equilibrium. We denote by $W_n$ the set of transition paths with $n$ steps in equilibrium, and by $W_n(x,y) \subset W_n$ the set of those paths that start from $y$ and end at $x$. We define the connectivity between $x$ and $y$ by the ratio of the sizes of the two sets:

$$ f_n(x,y) \equiv \frac{|W_n(x,y)|}{|W_n|} = P_n(x|y)\, p_{\rm eq}(y) = f_n(y,x). \tag{2.1} $$

Here $P_n(x|y) = (x|\hat{P}^n|y)$ is the $n$-step transition matrix, and the last equality follows from detailed balance. We further introduce the normalized connectivity as

$$ F_n(x,y) \equiv \frac{f_n(x,y)}{\sqrt{f_n(x,x)\, f_n(y,y)}}, \tag{2.2} $$

with which we define the distance as follows:

$$ d_n(x,y) \equiv \sqrt{-2 \ln F_n(x,y)}. \tag{2.3} $$

It can be shown that this distance gives a universal form at large scales for algorithms that generate local moves in the configuration space [1].

As an example, let us first consider the action $S(x) = (\beta/2) \sum_{\mu=1}^{D} x_\mu^2$, which gives a Gaussian distribution in equilibrium. The distance can be calculated analytically for the Langevin algorithm:

$$ d_n^2(x,y) = \frac{\beta}{2 \sinh(\beta n \varepsilon)}\, |x - y|^2, \tag{2.4} $$

where $\varepsilon$ is the increment of the fictitious time. We thus find that a flat and translation invariant metric is obtained for Gaussian distributions.

As a second example, we consider the double-well action $S(x) = (\beta/2)(x^2 - 1)^2$, which gives a multimodal equilibrium distribution. We again use the Langevin algorithm to calculate the distance. By making use of a quantum mechanical argument and an instanton calculation, we find that the distance behaves for $\beta \gg 1$ as

$$ d_n^2(x,y) \;\begin{cases} \propto \beta & (\text{when } x, y \text{ are around different modes}) \\ = O\bigl(e^{-\mathrm{const.}\, \beta n \varepsilon}\bigr) & (\text{when } x, y \text{ are around the same mode}). \end{cases} \tag{2.5} $$

The distance is thus large between different modes and exponentially small within a mode, which confirms that it quantifies the difficulty of transitions.
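As a numerical illustration of the definitions (2.1)-(2.3), the following minimal sketch evaluates $d_n^2$ for the double-well action on a discretized line. It is not the Langevin calculation of [1]: a nearest-neighbour Metropolis chain on a grid is swapped in so that the $n$-step transition matrix can be formed explicitly. The grid, the values of $\beta$ and $n$, and all function names are illustrative assumptions.

```python
import numpy as np

def metropolis_matrix(p_eq):
    """Column-stochastic matrix P[x, y] = P(x | y) for a nearest-neighbour
    Metropolis chain on a 1D grid (an illustrative stand-in for Langevin)."""
    m = len(p_eq)
    P = np.zeros((m, m))
    for y in range(m):
        for x in (y - 1, y + 1):            # propose each neighbour with prob. 1/2
            if 0 <= x < m:                  # proposals off the grid are rejected
                P[x, y] = 0.5 * min(1.0, p_eq[x] / p_eq[y])
        P[y, y] = 1.0 - P[:, y].sum()       # rejected proposals stay at y
    return P

def distance_squared(P, p_eq, n):
    """Matrix of d_n^2(x, y) = -2 ln F_n(x, y), following Eqs. (2.1)-(2.3)."""
    f = np.linalg.matrix_power(P, n) * p_eq[np.newaxis, :]   # f_n = P_n(x|y) p_eq(y)
    diag = np.diag(f)
    F = f / np.sqrt(np.outer(diag, diag))                    # normalized connectivity
    return np.maximum(-2.0 * np.log(F), 0.0)                 # clip round-off negatives

beta, n = 10.0, 200                         # hypothetical values
x = np.linspace(-2.0, 2.0, 41)              # grid spacing 0.1
S = 0.5 * beta * (x**2 - 1.0) ** 2          # double-well action
p_eq = np.exp(-S)
p_eq /= p_eq.sum()

P = metropolis_matrix(p_eq)
d2 = distance_squared(P, p_eq, n)

left, near_left, right = 10, 11, 30         # grid points x = -1.0, -0.9, +1.0
print("same mode      :", d2[left, near_left])
print("different modes:", d2[left, right])
```

An even $n$ is used so that the symmetrized $n$-step kernel is positive semidefinite and $F_n \le 1$ holds up to round-off.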

3. Emergence of AdS geometry and the geometric optimization

Simulated tempering [6] is an algorithm to speed up the relaxation to equilibrium. In this algorithm, we choose a parameter $\beta$ in the action (e.g. an overall coefficient) as the tempering parameter, and extend the configuration space in the $\beta$ direction: $\mathcal{M} \to \mathcal{M} \times \mathcal{A}$, where $\mathcal{A} \equiv \{\beta_0, \beta_1, \cdots, \beta_A\} = \{\beta_a\}_{a=0,\cdots,A}$ and $\beta_0$ is the original parameter of interest. We assume that $\{\beta_a\}$ are ordered as $\beta_0 > \beta_1 > \cdots > \beta_A$. We set up a Markov chain in the extended configuration space $\mathcal{M} \times \mathcal{A}$ in such a way that the global equilibrium distribution becomes $P_{\rm eq}(x, \beta_a) \equiv w_a \exp[-S(x; \beta_a)]$. We choose the weights $\{w_a\}$ to be $w_a = 1/\bigl((A+1) Z_a\bigr)$ $\bigl(Z_a \equiv \int dx\, e^{-S(x;\beta_a)}\bigr)$, so that every $\beta_a$ is visited with equal probability. Expectation values are to be calculated by first realizing global equilibrium and then retrieving the subsample at $\beta_{a=0}$.

For multimodal systems, transitions between configurations around different modes are difficult. This situation can be improved by extending the configuration space as above, because such configurations can then communicate easily by passing through the region with small $\beta_a$. The benefit due to tempering can be quantified in terms of the distance. Table 1 shows the distance between two modes in the original configuration space with and without tempering for the double-well action. We see that the introduction of the simulated tempering drastically reduces the distance.

Table 1: Comparison of the distance with and without tempering [1]. Columns: $n$, distance without tempering, distance with tempering.
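The comparison in Table 1 can be mimicked on a small grid. The sketch below builds a Metropolis chain on the extended space of states $(\beta_a, x_i)$ with equilibrium weights $w_a e^{-S(x;\beta_a)}$ and evaluates $d_n^2$ between the two wells at $\beta_0$, with and without the extra levels. Several choices here are assumptions made only for illustration and are not taken from [1]: the Metropolis proposals (in place of the Langevin update), the grid, the ladder $\beta_a = \beta_0 2^{-a}$, the value of $n$, and the use of the extended-space distance between $(x, \beta_0)$ points as the measure "in the original configuration space".

```python
import numpy as np

def extended_metropolis(p_ext):
    """Column-stochastic Metropolis matrix on the grid of states (a, i).
    Each step proposes one of the four neighbours (a-1, i), (a+1, i),
    (a, i-1), (a, i+1) with probability 1/4; proposals leaving the grid
    are rejected."""
    n_beta, m = p_ext.shape
    P = np.zeros((n_beta * m, n_beta * m))
    for a in range(n_beta):
        for i in range(m):
            s = a * m + i                                   # flattened state index
            for a2, i2 in ((a - 1, i), (a + 1, i), (a, i - 1), (a, i + 1)):
                if 0 <= a2 < n_beta and 0 <= i2 < m:
                    ratio = p_ext[a2, i2] / p_ext[a, i]
                    P[a2 * m + i2, s] = 0.25 * min(1.0, ratio)
            P[s, s] = 1.0 - P[:, s].sum()                   # stay put on rejection
    return P

def mode_distance_squared(betas, n, x):
    """d_n^2 between the wells x = -1 and x = +1, both taken at betas[0]."""
    S = 0.5 * betas[:, None] * (x[None, :] ** 2 - 1.0) ** 2
    w = np.exp(-S)
    # P_eq(x, beta_a) = w_a exp(-S) with w_a = 1 / ((A + 1) Z_a) on the grid
    p_ext = w / w.sum(axis=1, keepdims=True) / len(betas)
    P = extended_metropolis(p_ext)
    f = np.linalg.matrix_power(P, n) * p_ext.ravel()[None, :]
    i = int(np.argmin(np.abs(x + 1.0)))                     # (a = 0, x = -1)
    j = int(np.argmin(np.abs(x - 1.0)))                     # (a = 0, x = +1)
    F = f[i, j] / np.sqrt(f[i, i] * f[j, j])
    return max(-2.0 * np.log(F), 0.0)

x = np.linspace(-2.0, 2.0, 41)
n, beta0 = 2000, 20.0                                       # hypothetical values
without = mode_distance_squared(np.array([beta0]), n, x)
ladder = beta0 * 2.0 ** (-np.arange(6, dtype=float))        # 20, 10, ..., 0.625
with_tempering = mode_distance_squared(ladder, n, x)
print("without tempering:", without)
print("with tempering   :", with_tempering)
```

Passing a single-element array of $\beta$ values reuses the same kernel with the moves in the $\beta$ direction always rejected, so both numbers come from one construction.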

We hereafter consider an extremely multimodal system with highly degenerate vacua. As a typical example, we use the action $S(x; \beta) \equiv \beta \sum_{\mu=1}^{D} \bigl(1 - \cos(2\pi x_\mu)\bigr)$.

According to the definition of our distance, $d_n(x,y)$ is negligibly small when $x$ and $y$ lie around the same mode, while $d_n(x,y)$ is large when $x$ and $y$ are around different modes. Therefore, when we investigate the large-scale geometry of $\mathcal{M}$, we can identify configurations around the same mode as a single configuration. We write the coarse-grained configuration space thus obtained as $\bar{\mathcal{M}}$ (for the cosine action, $\bar{\mathcal{M}} = \mathbb{Z}^D$). We can similarly coarse-grain the extended configuration space when the simulated tempering is implemented. We write the extended, coarse-grained configuration space as $\bar{\mathcal{M}} \times \mathcal{A}$.

We define the metric on $\bar{\mathcal{M}} \times \mathcal{A} = \{X \equiv (x, \beta_a)\}$ in terms of the distance:

$$ ds^2 = g_{\mu\nu}\, dX^\mu dX^\nu = d_n^2(X, X + dX), \tag{3.1} $$

where $X$ and $X + dX$ denote nearby points in $\bar{\mathcal{M}} \times \mathcal{A}$. It can be shown that this metric is an asymptotically AdS metric [2]. We here sketch the proof. We first note that the action is invariant under the lattice translation $x_\mu \to x_\mu + m$ ($m \in \mathbb{Z}$), and thus the metric components are independent of $x$. Furthermore, since the action is also invariant under the reflection $x_\mu \to -x_\mu$, there are no off-diagonal components. Thus we deduce that the metric takes the following form:

$$ ds^2 = f(\beta)\, d\beta^2 + g(\beta) \sum_{\mu=1}^{D} dx_\mu^2. \tag{3.2} $$

We are left with determining the two functions $f(\beta)$ and $g(\beta)$.

We first consider $g(\beta)$. Since transitions in the $x$ direction are difficult for larger $\beta$, $g(\beta)$ should be an increasing function of $\beta$, at least when $\beta$ is large. We here assume that the leading dependence on $\beta$ for $\beta \gg 1$ is a power of $\beta$:

$$ d_n^2\bigl((x, \beta), (x + dx, \beta)\bigr) = \mathrm{const.}\; \beta^q \sum_{\mu=1}^{D} dx_\mu^2 \quad (\beta \gg 1), \tag{3.3} $$

where $q$ is a constant. On the other hand, the functional form of $f(\beta)$ for $\beta \gg 1$ can be determined by calculating the distance in the $\beta$ direction from the definition (2.3) as follows. We first approximate the local equilibrium distribution for $\beta \gg 1$ by a Gaussian around each mode. One then finds that the distance between $(x, \beta_a)$ and $(x, \beta_{a+1})$ is a function of the ratio $\beta_a / \beta_{a+1}$ [2]. This means that the distance in the $\beta$ direction is invariant under the scaling $\beta \to \lambda \beta$ for large $\beta$, and thus we obtain

$$ d_n^2\bigl((x, \beta), (x, \beta + d\beta)\bigr) = \mathrm{const.}\; \frac{d\beta^2}{\beta^2} \quad (\beta \gg 1). \tag{3.4} $$

Putting everything together, we conclude that the metric on $\bar{\mathcal{M}} \times \mathcal{A}$ is given by

$$ ds^2 = l^2 \left( \frac{d\beta^2}{\beta^2} + \alpha \beta^q \sum_{\mu=1}^{D} dx_\mu^2 \right) \quad (\beta \gg 1) \tag{3.5} $$

with constants $l$, $\alpha$, $q$. This is an AdS metric, as can be seen by the coordinate transformation $\beta \to (\sqrt{\alpha}\, q z / 2)^{-2/q}$:

$$ ds^2 = \left( \frac{2l}{q} \right)^2 \cdot \frac{1}{z^2} \left( dz^2 + \sum_{\mu=1}^{D} dx_\mu^2 \right), \tag{3.6} $$

which is a Euclidean AdS metric in the Poincaré coordinates.

We can verify this metric in the following way. We first numerically calculate the distance $d_n\bigl(X \equiv (0, \beta_a),\, Y \equiv (x, \beta_a)\bigr)$ for a few values of $a$ and for $x$ up to 10. We then make a $\chi^2$ fit by using as the fitting function the geodesic distance calculated from the metric (3.5):

$$ I(x, \beta_a; l, \alpha, q) \equiv \frac{4l}{q} \ln \left[ \frac{\sqrt{(q \sqrt{\alpha}\, |x| / 4)^2 + \beta_a^{-q}} + q \sqrt{\alpha}\, |x| / 4}{\beta_a^{-q/2}} \right]. \tag{3.7} $$

We carried out the above calculations for a two-dimensional ($D = 2$) configuration space, and obtained the results shown in Fig. 1 [2]. The parameters $l$, $\alpha$ and $q$ are determined by this fit [2]. The good agreement shows that the distances can be regarded as geodesic distances of an asymptotically Euclidean AdS metric.

Figure 1: Calculated distances [2]. The solid line is the geodesic distance with the fitted parameters.
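The fitting function (3.7) can be cross-checked against the half-space form (3.6). The sketch below evaluates (3.7) directly and compares it with the standard hyperbolic distance $L\,\mathrm{arccosh}\bigl(1 + |x|^2 / (2 z^2)\bigr)$ between two points at equal height $z$, using the radius $L = 2l/q$ and $z = (2 / (q\sqrt{\alpha}))\, \beta^{-q/2}$ read off from the coordinate transformation. The parameter values are hypothetical and are not the fitted values of [2]; the function names are likewise illustrative.

```python
import math

def geodesic_distance(x, beta, l, alpha, q):
    """Geodesic distance of Eq. (3.7) between (0, beta) and (x, beta)."""
    u = q * math.sqrt(alpha) * abs(x) / 4.0
    numerator = math.sqrt(u * u + beta ** (-q)) + u
    return (4.0 * l / q) * math.log(numerator / beta ** (-q / 2.0))

def poincare_distance(x, beta, l, alpha, q):
    """Same distance from the half-space metric (3.6):
    radius 2l/q, both points at height z(beta)."""
    radius = 2.0 * l / q
    z = 2.0 / (q * math.sqrt(alpha)) * beta ** (-q / 2.0)
    return radius * math.acosh(1.0 + x * x / (2.0 * z * z))

l, alpha, q = 1.0, 0.5, 1.5          # hypothetical parameters
for beta in (1.0, 4.0, 16.0):
    for x in (1, 5, 10):
        direct = geodesic_distance(x, beta, l, alpha, q)
        via_z = poincare_distance(x, beta, l, alpha, q)
        print(beta, x, direct, via_z)
```

The two columns agree because $\mathrm{arccosh}(1 + 2 s^2) = 2\, \mathrm{arcsinh}(s)$ with $s = |x| / (2z)$, which reproduces the logarithm in (3.7).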

Our aim here is to optimize the functional form of $\beta_a = \beta(a)$ so that transitions in the extended direction become smooth. We make this optimization by referring to the geometry of the extended configuration space. Note that, since it is the parameter $a$ that is directly dealt with in MCMC simulations, we expect that smooth transitions correspond to a flat metric in the extended direction when $a$ is used as one of the coordinates: $g_{\beta\beta}\, d\beta^2 = \mathrm{const.}\; da^2$. Since the geometry of $\bar{\mathcal{M}} \times \mathcal{A}$ is asymptotically AdS (3.5), this means that $d\beta / \beta \propto da$. This in turn determines the functional form of $\beta_a$ to be exponential in $a$, $\beta_a = \beta_0 R^{-a}$ ($\beta_0$, $R$: constants).

We confirmed this expectation numerically by gradually changing the values of $\beta_a$ so that the distances between different modes are minimized [2]. The result is shown in Fig. 2. We see that the optimized form of $\beta_a$ indeed takes an exponential form.

Figure 2: Optimized values for $\{\beta_a\}$ [2]. The blue dots are the initial values, and the orange dots are the resulting optimized values.
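The geometrical statement $d\beta / \beta \propto da$ can be made concrete with a short sketch. It builds the exponential ladder $\beta_a = \beta_0 R^{-a}$ and compares the proper distance between adjacent levels, $l\, |\ln(\beta_a / \beta_{a+1})|$ (the integral of $ds = l\, d\beta / \beta$ at fixed $x$), with that of a linearly spaced ladder sharing the same end points. The numerical values and function names are assumptions for illustration, and the asymptotic metric is applied to all levels even though it is derived only for large $\beta$.

```python
import math

def exponential_ladder(beta0, ratio, n_levels):
    """beta_a = beta0 * ratio**(-a) for a = 0, ..., n_levels - 1."""
    return [beta0 * ratio ** (-a) for a in range(n_levels)]

def proper_steps(betas, l=1.0):
    """Proper distance l * |ln(beta_a / beta_{a+1})| between adjacent levels,
    from integrating ds = l * d(beta) / beta at fixed x."""
    return [l * abs(math.log(b0 / b1)) for b0, b1 in zip(betas, betas[1:])]

exp_ladder = exponential_ladder(beta0=20.0, ratio=2.0, n_levels=6)

# Linearly spaced ladder with the same first and last values, for comparison.
step = (exp_ladder[0] - exp_ladder[-1]) / 5
lin_ladder = [exp_ladder[0] - a * step for a in range(6)]

print(proper_steps(exp_ladder))   # equal steps, each l * ln(ratio)
print(proper_steps(lin_ladder))   # unequal steps, largest at small beta
```

With equal proper steps no single pair of adjacent levels is singled out as a bottleneck, which is the geometric content of the exponential placement.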

4. Tempering parameter in the tempered Lefschetz thimble method

The tempered Lefschetz thimble method (TLTM) [3, 4, 5] (see also [9]) is an algorithm towards solving the sign problem. In this algorithm, by deforming the integration region from $\mathbb{R}^N$ to $\Sigma \subset \mathbb{C}^N$, we reduce the oscillatory behavior of the reweighted integrals that appear in the following expression:

$$ \langle O(x) \rangle = \frac{\int_\Sigma dz\, e^{-S(z)} O(z)}{\int_\Sigma dz\, e^{-S(z)}} = \frac{\int_{\mathbb{R}^N} dx\, e^{-\mathrm{Re}\, S(x)}\, e^{-i\, \mathrm{Im}\, S(x)}\, O(x) \big/ \int_{\mathbb{R}^N} dx\, e^{-\mathrm{Re}\, S(x)}}{\int_{\mathbb{R}^N} dx\, e^{-\mathrm{Re}\, S(x)}\, e^{-i\, \mathrm{Im}\, S(x)} \big/ \int_{\mathbb{R}^N} dx\, e^{-\mathrm{Re}\, S(x)}}. \tag{4.1} $$

In Lefschetz thimble methods, such a deformation is made according to the antiholomorphic gradient flow: $\dot{z}_t^i = [\partial_i S(z_t)]^*$ with $z_{t=0}^i = x^i$, where the dot denotes the derivative with respect to $t$. This flow equation defines a map from $x \in \mathbb{R}^N$ to $z = z_t(x) \in \mathbb{C}^N$, and a new integration surface is given by $\Sigma_t \equiv z_t(\mathbb{R}^N)$. $\Sigma_t$ approaches a union of Lefschetz thimbles $\{\mathcal{J}_\sigma\}$ as $t \to \infty$, and the integrals remain unchanged under the continuous deformation thanks to Cauchy's theorem. Each thimble $\mathcal{J}_\sigma$ has a critical point $z_\sigma$ (where $\partial_{z^i} S(z_\sigma) = 0$), and all points $z \in \mathcal{J}_\sigma$ give the same phase, $\mathrm{Im}\, S(z) = \mathrm{Im}\, S(z_\sigma) = \mathrm{const}$. Thus, the oscillatory behavior of the reweighted integrals will be much reduced for large $t$. In the TLTM, we implement a tempering algorithm by choosing the flow time $t$ as the tempering parameter in order to cure the ergodicity problem caused by infinitely high potential barriers between different thimbles.

We can give a geometrical argument that the optimized form of the flow times $t_a$ is linear in $a$ [4]. In fact, at large $t$, $\mathrm{Re}\, S(z_t(x))$ increases exponentially as $\beta_t \propto e^{\mathrm{const.}\, t}$. As in the simulated tempering, we expect that the optimal form of $\beta_{t_a}$ is an exponential function of $a$. Therefore, $t_a$ should be a linear function of $a$ (see also discussions in [8]).

In the application of TLTM to the Hubbard model [4], we confirmed that this choice actually works well. Fig. 3 shows the acceptance rates between adjacent time slices, where $t_a$ is taken to be a piecewise linear function of $a$ with a single breakpoint. This choice results in the acceptance rates being roughly above 0.4. Most notably, the acceptance rates become constant for larger $t$ (larger $a$), where $\Sigma_t$ gets close to the thimbles and the above discussion becomes more valid.

Figure 3: Acceptance rates in the $t_a$ direction with $\beta\mu = 8$. Larger $a$ corresponds to larger $t_a$.
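The exponential growth of $\mathrm{Re}\, S$ along the flow can be seen in a one-variable toy model. The sketch below integrates the antiholomorphic gradient flow for the action $S(z) = z^2/2 + i h z$, which is an assumption chosen only because it can be checked by hand (it is not the Hubbard-model action of [4]); the step size, the value of $h$ and the function names are likewise illustrative. For this action the critical point is $z_\sigma = -ih$, $\mathrm{Im}\, S$ stays fixed along the flow, and $\mathrm{Re}\, S - \mathrm{Re}\, S(z_\sigma)$ grows like $e^{2t}$.

```python
import math

H = 0.5   # hypothetical source strength h in the toy action

def action(z):
    """Toy action S(z) = z^2 / 2 + i * H * z (an illustrative assumption)."""
    return 0.5 * z * z + 1j * H * z

def grad_action(z):
    """dS/dz for the toy action."""
    return z + 1j * H

def flow(x, t, dt=1e-3):
    """Integrate dz/dt = conj(dS/dz) from z(0) = x with fourth-order Runge-Kutta."""
    z = complex(x)

    def rhs(w):
        return grad_action(w).conjugate()

    for _ in range(round(t / dt)):
        k1 = rhs(z)
        k2 = rhs(z + 0.5 * dt * k1)
        k3 = rhs(z + 0.5 * dt * k2)
        k4 = rhs(z + dt * k3)
        z += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return z

x = 1.0
s_crit = action(-1j * H)                       # action at the critical point z = -iH
for t in (0.0, 1.0, 2.0, 3.0):
    s = action(flow(x, t))
    # Re S measured from the critical point grows with t; Im S stays at H * x.
    print(t, s.real - s_crit.real, s.imag)
```

Since the effective coefficient grows like $e^{2t}$ in this toy case, equally spaced flow times give exponentially spaced values of it, in line with the argument for a linear $t_a$.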

5. Conclusion and outlook

We introduced the distance between configurations, which quantifies the difficulty of transitions. We then discussed that an asymptotically AdS geometry emerges in the extended, coarse-grained configuration space, and showed that the optimization of the tempering parameter can be made in a simple, geometrical way. We further argued how to determine the optimized form of the tempering parameter in the tempered Lefschetz thimble method.

As for future work, it should be interesting to investigate the distance in the Yang-Mills theory, where the coarse-graining of the configuration space can be made by identifying configurations with the same topological charge as a single configuration. We further would like to apply the distance to models whose degrees of freedom can be interpreted as spacetime coordinates (e.g., matrix models) [10]. Then the geometry of the configuration space directly gives that of a spacetime. We expect that this formulation gives a systematic way to construct a spacetime geometry from randomness and provides us with a way to define a quantum theory of gravity.

A study along these lines is now in progress and will be reported elsewhere.

Acknowledgments

The authors greatly thank the organizers of LATTICE 2019. They also thank Andrei Alexandru, Hiroki Hoshina, Etsuko Itou, Yoshio Kikukawa, Yuto Mori and Akira Onishi for useful discussions. This work was partially supported by JSPS KAKENHI (Grant Numbers 16K05321, 18J22698 and 17J08709) and by SPIRITS 2019 of Kyoto University (PI: M.F.).

References

[1] M. Fukuma, N. Matsumoto and N. Umeda, Distance between configurations in Markov chain Monte Carlo simulations, JHEP (2017) 001.
[2] M. Fukuma, N. Matsumoto and N. Umeda, Emergence of AdS geometry in the simulated tempering algorithm, JHEP (2018) 060.
[3] M. Fukuma and N. Umeda, Parallel tempering algorithm for integration over Lefschetz thimbles, PTEP (2017) 073B01.
[4] M. Fukuma, N. Matsumoto and N. Umeda, Applying the tempered Lefschetz thimble method to the Hubbard model away from half filling, Phys. Rev. D100 (2019) 114510.
[5] M. Fukuma, N. Matsumoto and N. Umeda, Implementation of the HMC algorithm on the tempered Lefschetz thimble method.
[6] E. Marinari and G. Parisi, Simulated tempering: A New Monte Carlo scheme, Europhys. Lett. (1992) 451 [hep-lat/9205018].
[7] A. Alexandru, G. Başar and P. Bedaque, Monte Carlo algorithm for simulating fermions on Lefschetz thimbles, Phys. Rev. D93 (2016) 014504.
[8] A. Alexandru, G. Başar, P. F. Bedaque and N. C. Warrington, Tempered transitions between thimbles, Phys. Rev. D96 (2017) 034513.
[9] M. Fukuma, N. Matsumoto and N. Umeda, Tempered Lefschetz thimble method and its application to the Hubbard model away from half filling, PoS LATTICE2019 (2019) 090.
[10] M. Fukuma and N. Matsumoto, in preparation.