Representation of complex probabilities and complex Gibbs sampling
RRepresentation of complex probabilities and complex Gibbssampling
Lorenzo Luis
Salcedo , ,(cid:63) Departamento de Física Atómica Molecular y Nuclear, Universidad de Granada, E-18071, Spain Instituto Carlos I de Física Teórica y Computacional, Universidad de Granada, E-18071, Spain
Abstract.
Complex weights appear in Physics which are beyond a straightforward im-portance sampling treatment, as required in Monte Carlo calculations. This is the well-known sign problem. The complex Langevin approach amounts to e ff ectively construct apositive distribution on the complexified manifold reproducing the expectation values ofthe observables through their analytical extension. Here we discuss the direct construc-tion of such positive distributions paying attention to their localization on the complex-ified manifold. Explicit localized representations are obtained for complex probabilitiesdefined on Abelian and non Abelian groups. The viability and performance of a complexversion of the heat bath method, based on such representations, is analyzed. Many physical problems reduce to computing weighted averages (cid:104) A (cid:105) P = (cid:82) d n x P ( x ) A ( x ) (cid:82) d n x P ( x ) x ∈ R n P ( x ) = e − S ( x ) . (1)For a large number of entangled degrees of freedom the Monte Carlo method is often the only viableapproach: x ∼ P ( Importance sampling with respect to P ( x )) . (2)However complex probabilities do appear in certain cases. An outstanding example is lattice QCDat finite density [1]. Straightforward importance sampling is meaningless when P ( x ) is not a positivedistribution and this is the well known sign problem [2].Several proposals have been put forward to deal with complex weights within Monte Carlo, amongother: Reweighting, complex Langevin equation (CLE) [3–8], Lefschetz thimbles and variants [9, 10],reweighting the Complex Langevin equation [11], or particular treatments for special problems, e.g.,worm algorithms [12], or other [13].The sign problem is hard [2]. Nothing as robust and reliable as the Metropolis algorithm is avai-lable in the complex case and each of the abovementioned methods has well-known limitations or is (cid:63) Acknowledges financial support by Spanish Ministerio de Economía y Competitividad (Grant No. FIS2014-59386-P), andAgencia de Innovación y Desarrollo de Andalucía (Grant No. FQM225), and Centro de Servicios de Informática y Redes deComunicaciones (CSIRC), Universidad de Granada, for providing computing time, e-mail: [email protected] a r X i v : . [ h e p - l a t ] O c t oo recent to asses its performance. It is fair to say that, at present, no existing method can be regardedas the complete solution. In this view new alternative approaches are worth pursuing.Our goal is to explore a complex version of the heat bath method. The idea is to do the updatesby means of a representation of the conditional complex probability. This requires studying represen-tations of complex probabilities by themselves (i.e., quite independently of the CLE construction). The CLE is based on representing P ( x ) = e − S ( x ) by an ordinary (real and positive) probability dis-tribution ρ ( z ) defined on the complexified manifold , so that ideally (cid:104) A (cid:105) P = (cid:104) A (cid:105) ρ where A ( z ) is the analytical extension of A ( x ). In the CLE the distribution ρ ( z ) is not obtained nor needed explicitly.Abstracting the idea, one can define [14] a representation ρ ( z ) of a complex distribution P ( x ) on R n ,as a distribution on C n such that (cid:90) d n z ρ ( z ) A ( z ) = (cid:90) d n x P ( x ) A ( x ) for generic A ( x ) . (3)Equivalently, P ( x ) = ( K ρ )( x ) ≡ (cid:90) d n y ρ ( x − i y, y ) ( analytical projection ) . (4)Of interest for Monte Carlo are the positive representations , that is, those with ρ ( z ) ≥ P ( x ) needs not be ananalytical function, any (normalizable) complex distribution has positive representations sharing thesame symmetries enjoyed by the complex probability itself. Moreover, the expectation values of thesubset of the holomorphic function do not fully fix ρ ( z ): Representations of a given P ( x ) are by nomeans unique. E.g., if ρ is a positive representation, so is exp( t ∇ ) ρ ( t >
0) [14].
Given a complex probability P ( x ) on R n let us choose P ( x ) > H ( x ) such that (cid:90) d n x P ( x ) = (cid:90) d n x P ( x ) , P ( x ) = P ( x ) + ∇ ( P ( x ) H ( x ) ) . (5)Then it is easy to verify [15] that (cid:104) A ( z ) (cid:105) = (cid:104) A (cid:105) P with z = x (cid:48) + z (cid:48) H ( x (cid:48) ), where x (cid:48) ∼ P and z (cid:48) ∼ q , and q ( z ) is any positive representation of δ ( x ) + δ (cid:48) ( x ), such as q ( z ) = π (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) − z (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) e −| z | / .Nevertheless, the sign problem is not solved in practice since the available constructions are notviable for large systems: ( i ) the normalization of P ( x ) is not usually known, ( ii ) the vector field H ( x )is not easy to obtain for large n , and ( iii ) the dispersion of z increases exponentially with n , spoilingthe Monte Carlo estimates. .1.3 Conditions on the support of positive representations This is an important point in the representation approach: representations with smaller widths yieldsmaller variances and poor representations may not be better than reweighting. As a rule, the morecomplex P ( x ), the wider must be ρ ( z ) >
0. For given P ( x ), the width (size of the support in theimaginary direction) of any positive ρ ( z ) is bounded from below. For instance, the relation |(cid:104) A ( x ) (cid:105) P | = |(cid:104) A ( z ) (cid:105) ρ | ≤ max z ∈ supp ρ {| A ( z ) |} for arbitrary A (6)constrains how localized ρ can be. In particular A = e − ikx provides quite explicit bounds [16].For local actions good quality positive representations (i.e., with bounded width as n increases)should exist, eventually outperforming reweighting: CLE being an example, when it works. Attending to the issue of the width, let us consider representations with support along two horizontallines, R ± iY ( two-branch representations [16]): ρ ( z ) = Q + ( x ) δ ( y − Y ) + Q − ( x ) δ ( y + Y ) Q ± ( x ) ≥ . (7)It is easy to check that such ρ ( z ) will represent P ( x ) provided P ( x ) = Q + ( x − iY ) + Q − ( x + iY ) , (8)due to (cid:104) A ( x ) (cid:105) P = (cid:88) σ = ± N σ (cid:104) A ( x + i σ Y ) (cid:105) Q σ = (cid:104) A ( z ) (cid:105) ρ , N σ ≡ (cid:90) dx Q σ ( x ) . (9)For any Y , real functions Q ± ( x ) exist and they are essentially unique. In terms of Fourier modes:˜ Q ± ( k ) = ± e ± kY ˜ P ( k ) − e ∓ kY ˜ P ( − k ) ∗ kY ) ( k (cid:44)
0) (10)with P ( x ) = (cid:80) k e ikx ˜ P ( k ). For Y above some critical value, the Q ± ( x ) become positive : there are more e Y in denominator and eventually the positive constant mode dominates. The critical Y is the optimalchoice since it has the smallest width. Two examples are displayed in figure 1. For complex probabilities defined on R n ( n >
1) one can try a strict two-branch scheme [16] with P ( (cid:126) x ) = (cid:88) σ = ± Q σ ( (cid:126) x − i σ(cid:126) Y ) , Q σ ( (cid:126) x ) ≥ , (11) Q σ ( (cid:126) x ) = N σ + (cid:88) (cid:126) k (cid:44) e i (cid:126) k · ( (cid:126) x − i σ(cid:126) Y ) ˜ P ( k )2 sinh(2 σ(cid:126) k · (cid:126) Y ) ≡ N σ + Re (cid:16) C ( (cid:126) x ; σ(cid:126) Y ) ∗ P ( (cid:126) x ) (cid:17) . (12)A similar construction has been proposed in [17, 18]. Once again, this is a formal solution but not a practical one. In addition there is a technical problem: there are no good choices for (cid:126) Y . The natural P H x L =∆ H x L -∆ H x L Q +- Y = - - S H x L = - i cos H x L Q - Q + Y = Figure 1.
Two examples of one-dimensional two-branch representations. choice (cid:126) Y = ( Y , . . . , Y ) meets (cid:126) k · (cid:126) Y = (cid:126) Y are unnatural and also tend to require some Y i larger thannecessary. An elegant solution is to use 2 n branches: For each Fourier mode (cid:126) k and variable x i , either + Y of − Y is selected so that k i Y i ≥ i = , . . . , n : P ( x ) = (cid:88) (cid:126)σ Q (cid:126)σ ( (cid:126) x − i (cid:126)σ Y ) (cid:126)σ = ( ± , . . . , ± ) (13)E ff ectively one is adding a bit for each degree of freedom (say, each site), from (cid:126) x ∈ R n to ( (cid:126) x , (cid:126)σ ) ∈ ( R × Z ) n . Having more branches is compensated by i) simplicity, ii) restoration of symmetry, iii)smaller values of Y , and iv) it is mandatory in the non periodic case. Figure 2.
Left: (Representationusing 2 branches) Q + ( (cid:126) x ), (cid:126) Y = (1 . , . n branches) Q ++ ( (cid:126) x ), (cid:126) Y = (2 . , . Figure 2 displays examples of strict and generalized two-branch representations in two dimensionsfor P ( x , x ) = (1 + β cos( x ))(1 + β cos( x ))(1 + β cos( x − x )) , β = i . (14) Let P ( g ) be a complex probability defined on a matrix group G = { e − i (cid:126) a · (cid:126) T , (cid:126) a ∈ R m } . The observablescan be analytically extended to the complexified group ˜ G = { e − i (cid:126) a · (cid:126) T , (cid:126) a ∈ C m } , Haar measures can beadopted on G and ˜ G and representations can be defined as before.The two-branch scheme can be adapted to the present case: Letting G I ≡ { e − i (cid:126) a · (cid:126) T , (cid:126) a ∈ i R m } , P ( g ) = Q + ( g h ) + Q − ( g h − ) , g ∈ G , h ∈ G I , (15) (cid:104) A (cid:105) P = N + (cid:104) A ( g h − ) (cid:105) Q + + N − (cid:104) A ( g h ) (cid:105) Q − = (cid:104) A (cid:105) ρ , (16)here A ( g h − ), A ( g h ) refer to the analytical extension in ˜ G . So the Monte Carlo sampling is carriedout using the real and positive weights Q ± ( g ) defined on G .To do the matching in Eq. (15), P and Q ± are expanded as a linear combination of group repre-sentations (generalizing the Fourier modes of U(1)). For instance in U( N ), for P ( g ) = (cid:88) n a j ··· j n i ··· i n g i j · · · g i n j n (17)(not the most general case) one finds the solution, choosing h = diag( ω , . . . , ω N ) ( ω i > Q ± ( g ) = (cid:88) n a j ··· j n i ··· i n g i j · · · g i n j n χ ( Ω ± j ,..., j n ) , χ ( Ω ) ≡ ΩΩ − Ω − , (18)where Ω j ,..., j n ≡ ω j · · · ω j n . Once again, for su ffi ciently large h the constant (group singlet) modedominates and Q ± >
0. (The constant mode is to be added separately both in Eq. (17) and in Eq. (18).)
Figure 3.
For P ( g ) = + β tr( σ g ) in SU(2), the function Q + ( g ) is displayed for a = a , a ), where g = a − i (cid:126) a · (cid:126)σ . In the plot β = (1 + i ) and Ω = e Y , Y = . a = ± √ − (cid:126) a ( a = within S (cid:27) SU(2) ).
As an example, a two-branch representation on SU(2) for P ( g ) = + β tr( σ g ) is Q ± ( g ) = + tr (cid:20)(cid:18) ± β R ω − ω − + i σ β I ω + ω − (cid:19) g (cid:21) , h = diag( ω, ω − ) . (19) Q + ( g ) is displayed in figure 3, using g = a − i (cid:126) a · (cid:126)σ , and ( a , (cid:126) a ) ∈ S .As it would be expected subtleties arise in the non-Abelian case: regardless of the choice of h ,certain components (singlet with respect to h ) will remain unchanged (would have Ω = P is real along those components. For them a di ff erent element h (cid:48) ∈ G I has to be applied, subject to the condition (cid:104) singlet h | singlet h (cid:48) (cid:105) = . (20) The construction of direct representations just discussed becomes rapidly inviable as the dimension ofthe manifold increases. This suggests a heat bath approach. As in the standard Gibbs sampling, eachvariable is updated in turn, but, since the conditional probability is complex, a representation of it isused instead: ( z , . . . , z i , . . . , z n ) → ( z , . . . , z (cid:48) i , . . . , z n ) , z (cid:48) i ∼ ρ rep ( z (cid:48) i |{ z j (cid:44) i } ) (21)here ρ rep ( z (cid:48) i |{ z j (cid:44) i } ) is a representation (with respect to z (cid:48) i ) of the conditional probability P ( z (cid:48) i |{ z j (cid:44) i } ) = P ( z , . . . , z (cid:48) i , . . . , z n ) P ( z , . . . , (cid:98) z i , . . . , z n ) . (22)By construction only one-variable representations are needed, however, the method relies on the an-alytical extension of P ( (cid:126) x ), just like CLE. Furthermore convergence to a representation of P ( x ) is notguaranteed as standard Markovian chain theorems do not apply. Also, there is a potential problemfrom zeroes of the marginal probabilities P ( z , . . . , (cid:98) z i , . . . , z n ).To see the performance of the method we have studied a concrete model [16], namely, a complexaction in a d -dimensional periodic lattice S [ φ ] = (cid:88) x (cid:16) λφ x + φ x + β φ x d (cid:88) µ = φ x + ˆ µ (cid:17) , β ∈ C λ ≥ , (23)with conditional probability P [ φ x |{ φ x (cid:48) (cid:44) x } ] ∝ e − λφ x − φ x − β φ x (cid:80) d µ = ( φ x + ˆ µ + φ x − ˆ µ ) . In the free case ( λ = β plane ( e − S is normalizable for bounded Re( β )). We further apply the complex Gibbs sampling approach for λ = β . Some checksfrom T -matrix calculations, for d = | β | ≈ (cid:42) ∂ A ∂φ x (cid:43) = (cid:42) A ∂ S ∂φ x (cid:43) . (24)The choices A = φ x and A = φ x + ˆ µ , give rise to four observables O , and I , such that (cid:104) I (cid:105) = (cid:104) I (cid:105) = . These observables are computed using Monte Carlo and results are displayed in table 1. Reweightingand complex Langevin results are also displayed for comparison.As expected reweighting stops working for the larger lattices. Complex heat bath works ratherwell for any lattice size for moderated couplings although several technical issues arise, including theneed of a cuto ff and filtering of rare (but catastrophic) cases (see [16] for details).Some small bias is observed in (cid:104) I (cid:105) (which ought to be zero). Likely, this is rooted in a realweakness of the method, related to the presence of zeroes in the marginal probabilities . To analyzethis, let us assume, in a two-variable setting, that the current distribution ρ is a proper representation, (cid:104) A (cid:105) ρ = (cid:104) A (cid:105) P . One would like the Gibbs update to preserve this property. A Gibbs update ( z , z ) → ( z (cid:48) , z ), produces a new distribution ρ (cid:48) . Using the representation property one has: (cid:104) A (cid:105) ρ (cid:48) = (cid:90) d z d z (cid:48) ρ ( z ) ρ rep ( z (cid:48) | z ) A ( z (cid:48) , z ) = (cid:90) d z dx (cid:48) ρ ( z ) P ( x (cid:48) , z ) P ( z ) A ( x (cid:48) , z ) = (cid:90) dx dx (cid:48) P ( x ) P ( x (cid:48) , x ) P ( x ) A ( x (cid:48) , x ) + Residues = (cid:104) A (cid:105) P + Residues . (25)So spurious contributions may arise from poles in 1 / P ( z ). This idea is fully sustained by a detailedanalysis of P ( x , x ) = (1 + β cos( x ))(1 + β cos( x ))(1 + β cos( x − x )) . A bias is found whenever[ − Y , Y ] contains the zero of P ( z ) ∼ + β cos( z ), and this cannot be prevented for su ffi ciently large β ( Y increases and the zero z decreases). able 1. Some Monte Carlo results for the action in Eq. (23) with λ = ff ), respectively. CHB ∗ : no cuto ff . CHB ∗∗ : Filtered. β N d × Re (cid:104) O (cid:105) × Im (cid:104) O (cid:105) × Re (cid:104) I (cid:105) × Im (cid:104) I (cid:105) Method0.25i 3 − − − − − − − − − − - - - - - - Figure 4.
Position of the marginal probability (or closely related quantity) obtained on-the-flight. Left: Model ofEq. (23) for a cubic lattice and λ =
1, for β = . i (black triangles) and β = i (blue disks). Right: XY model in an8 lattice, for ( β, µ ) = (1 ,
1) (black triangles), and ( β, µ ) = (0 . , .
2) (blue disks). As expected, one finds that onlywhen the marginal probability stays away from zero (black triangles) the Gibbs sampling gives a fair estimate.
To have a robust method it is mandatory to find a way to remove the spurious contributions fromthe poles in 1 / P ( z , . . . , (cid:98) z i , . . . , z n ), when they exist. No such technique is available yet. On the otherhand, it is not hard to monitor on-the-flight whether the marginal probabilities stay away from zeroduring the Markovian chain. This is displayed in figure 4. Positive representations for any complex probability exist quite generally with variable degrees ofquality, related to their extension on the complexified manifold. Good quality positive representationsare expected to exist for local actions, or more precisely, when the correlation length does not increaseith the size of the system. Unfortunately, there is no known practical way to obtain them beyond thelow dimensional case. CLE would be such a construction when it works.The complex heat bath method, which only needs one-variable representations, works for mode-rate couplings (moderately complex actions). The limitation is due to the problem of possible zeroesin the marginal probabilities used to do the updates. These zeroes introduce poles in the constructionof the representations of the conditional probabilities. The validity of the method in each case can beassessed by monitoring the marginal probabilities on-the-flight. It would be worth studying whetherthe bias introduced by the zeroes of the marginal probability could somehow be removed. In anotherdirection, perhaps the bias problem could be alleviated by considering larger clusters of variables tobe updated at each step (which is also more costly).Despite the impediments noted, may be the most interesting lesson to be learned from the di-rect representation approach is that such positive representations do exist, also in hard problems likeQCD at finite density. This means that there exist real actions on the complexified manifold (how-ever ugly, wide and non-local they could be) that correctly reproduce such real-life systems. Thenone can attempt a di ff erent route, namely, to try to directly write down a real and local action (onthe complexified manifold) in the same universality class as that of QCD at finite density. Such realaction would have the same symmetries (e.g., invariance under real gauge transformations) and wouldinclude a chemical potential as an incorporated parameter but associated to the baryon number con-served charge. In addition the action should be compatible with reflection positivity (at least withinthe subset of holomorphic observables) and should have a support with bounded width in the largevolume limit. While this does not seem to be an easy task, it should be recalled that, as we haveargued, such actions definitely exist, albeit without the locality and width conditions. References [1] P. Hasenfratz, F. Karsch, Phys. Lett. , 308 (1983)[2] M. Troyer, U.J. Wiese, Phys. Rev. Lett. , 170201 (2005), cond-mat/0408370 [3] G. Parisi, Phys. Lett. , 393 (1983)[4] J.R. Klauder, Acta Phys. Austriaca Suppl. , 251 (1983)[5] G. Aarts, E. Seiler, I.O. Stamatescu, Phys. Rev. D81 , 054508 (2010), [6] K. Nagata, J. Nishimura, S. Shimasaki, PTEP , 013B01 (2016), [7] L.L. Salcedo, Phys. Rev.
D94 , 114505 (2016), [8] E. Seiler,
Status of Complex Langevin , in
Proceedings, 35th International Symposium on LatticeField Theory (Lattice2017): Granada, Spain , to appear in EPJ Web Conf.[9] M. Cristoforetti, F. Di Renzo, L. Scorzato (AuroraScience), Phys. Rev.
D86 , 074506 (2012)[10] A. Alexandru, G. Basar, P.F. Bedaque, G.W. Ridgway, N.C. Warrington, JHEP , 053 (2016)[11] J. Bloch, Phys. Rev. Lett. , 132002 (2011), [12] N. Prokof’ev, B. Svistunov, Phys. Rev. Lett. , 160601 (2001)[13] J. Wosiek, JHEP , 146 (2016), [14] L.L. Salcedo, J. Math. Phys. , 1710 (1997), hep-lat/9607044 [15] L.L. Salcedo, J. Phys. A40 , 9399 (2007), [16] L.L. Salcedo, Phys. Rev.
D94 , 074503 (2016), [17] E. Seiler, J. Wosiek (2017), [18] E. Seiler, J. Wosiek,
Positive Representations of a Class of Complex, Periodic Measures , in