Normalizing Flows and the Real-Time Sign Problem
Scott Lawrence ∗ and Yukari Yamauchi †
Department of Physics, University of Colorado, Boulder, CO 80309, USA
Department of Physics, University of Maryland, College Park, Maryland 20742, USA
Normalizing flows have recently been applied to the problem of accelerating Markov chains in lattice field theory. We propose a generalization of normalizing flows that allows them to be applied to theories with a sign problem. These complex normalizing flows are closely related to contour deformations (i.e. the generalized Lefschetz thimble method), which have been applied to sign problems in the past. We discuss the question of the existence of normalizing flows: they do not exist in the most general case, but we argue that approximate normalizing flows are likely to exist for many physically interesting problems. Finally, normalizing flows can be constructed in perturbation theory. We give numerical results on their effectiveness across a range of couplings for the Schwinger-Keldysh sign problem associated to a real scalar field in 0+1 dimensions.
I. INTRODUCTION
Monte Carlo methods, applied to lattice quantum field theory, are unique in providing nonperturbative access to observables in QCD and other field theories. These methods are not, however, equally applicable to all theories and observables. In particular, when applied to theories with a finite density of relativistic fermions, or to observables involving real-time evolution, lattice Monte Carlo methods are afflicted by the so-called sign problem. This obstacle to computing quantum real-time dynamics has persisted for a considerable time, and is a central motivation for the use of quantum computers in high energy and nuclear physics.

Lattice methods work by framing the observable to be computed as a ratio of high-dimensional integrals. Spacetime is discretized, and Feynman’s path integral becomes a finite- (but large-) dimensional integral. For many theories, this procedure results in a probability distribution over field configurations, which can be importance sampled with Markov chain Monte Carlo methods. However, in the case of a finite density of relativistic fermions, or of nonequilibrium calculations, the Boltzmann factor $e^{-S}$ is generally complex, and cannot be treated as a probability distribution. In such cases, a standard approach is to sample according to the “quenched” Boltzmann factor $e^{-\mathrm{Re}\,S}$, and include the phases by reweighting. The cost of reweighting is generically exponential in the spacetime volume of the system being simulated.

Recent work has introduced normalizing flows as a tool for accelerating Markov chain Monte Carlo methods [1–5]. The idea is to construct (usually by training with gradient descent) a generative model that samples approximately according to the lattice Boltzmann factor. Either by reweighting or by using the model to create proposals for a Markov chain, the systematic bias of training is removed.
This method is particularly anticipated to reduce the cost associated with the approach to the continuum limit (“critical slowing down”).

Normalizing flows, as usually formulated, are not directly helpful for the sign problem: a generative model necessarily models a real probability distribution, rather than the complex weights associated to lattice models with a sign problem. In this paper, we show how normalizing flows may be generalized to alleviate or remove a sign problem. The core of this idea is the observation that a normalizing flow, suitably generalized, implicitly defines an integration contour along which the sign problem may be alleviated or removed. These complex normalizing flows are thus in the same family of methods as the Lefschetz thimble approach [6], the generalized Lefschetz thimble method [7, 8], and the search for sign-optimized manifolds [9–12].

Complex normalizing flows exist only when a manifold is available that exactly solves the sign problem. We discuss the conditions under which such perfect manifolds exist. By modifying the holomorphic gradient flow of [7, 13], we argue that locally perfect manifolds, on which there are no local fluctuations of the phase of the Boltzmann factor, always exist. These manifolds may nonetheless possess a global sign problem: different components of the manifold, separated by singularities of the action, may contribute to the integral with different phases, and therefore (partially) cancel. Conditioned on a mild conjecture regarding the dependence of locally perfect manifolds on the parameters of the action, we show that globally perfect manifolds exist for a broad class of physical systems, including the Schwinger-Keldysh sign problem.

When a manifold is available that merely approximately solves the sign problem, an approximate normalizing flow exists.
A simple physical argument suggests that for many problems of physical relevance, manifolds that approximately solve the sign problem (with the approximation getting better in the infinite volume limit) should be available.

We also find that the tool of normalizing flows results in a method for perturbatively approximating sign-problem-ameliorating integration contours, as well as a new approach for machine learning of such contours (a prospect previously explored in [9, 10, 12, 14, 15]). We explore the in-practice effectiveness of the perturbatively constructed flow with numerical experiments on modest 0+1 lattices. This method does not appear to have scaling properties that would allow it to be used, at least without serious improvement, in higher-dimensional theories.

Finally, a perturbative view of normalizing flows gives rise to a method of computing lattice expectation values by solving a certain high-dimensional first-order partial differential equation. We demonstrate this method on lattice scalar field theory. Unfortunately, this is mostly a curiosity, chiefly because the practical algorithm for solving the differential equations represents an uncontrolled approximation. On theories with no sign problem, the fact that reliable error bars are not available renders it inferior to standard methods; on theories with a sign problem, solving the differential equation turns out to be hard in practice (for reasons apparently closely connected to the sign problem itself).

The remainder of this paper is structured as follows. In Sec. II we describe the lattice Schwinger-Keldysh formalism and the origin of the sign problem. Sec. III details the generalization of normalizing flows to the complex setting, and shows how they relate to contour integrals and the generalized thimble method. We discuss the question of the existence of complex normalizing flows (and correspondingly, manifolds that solve the sign problem) in Sec. IV; the notions of “global” and “local” sign problems are defined there. Perturbative constructions of complex normalizing flows are given in Sec. V, with numerical experiments characterizing their effectiveness. Finally, Sec. VI outlines future avenues to explore.

II. LATTICE SCHWINGER-KELDYSH
The lattice Schwinger-Keldysh method was introduced in [16, 17], for use with the generalized Lefschetz thimble method, as a formalism for computing real-time observables, that is, observables in which the operators have some time separation. The Schwinger-Keldysh action is readily derived by considering a lattice field theory in the Hamiltonian formulation. We are interested in a time-separated observable of the form $\langle \mathcal{O}(t)\,\mathcal{O}(0) \rangle$, with the expectation value taken in a thermal ensemble of inverse temperature $\beta$. Removing time-dependences from the operators, this expectation value can be written

$$\langle \mathcal{O}(t)\,\mathcal{O}(0) \rangle = \frac{\mathrm{Tr}\; e^{-\beta H} e^{iHt}\, \mathcal{O}\, e^{-iHt}\, \mathcal{O}}{\mathrm{Tr}\; e^{-\beta H}}. \tag{1}$$

The ordinary lattice path integral involves only the imaginary-time operator, $e^{-\beta H}$. That operator is split up into a product of many $e^{-a_\tau H}$, each of which is Trotterized. Resolutions of the identity are inserted between each pair of operators, resulting in an integral over all (discrete) paths of field configurations.

The Schwinger-Keldysh path integral does not differ in its derivation. After all time-evolution operators (whether real or imaginary) are Trotterized and the field integrals inserted, the expectation value is given by

$$\langle \mathcal{O}(t)\,\mathcal{O}(0) \rangle = \frac{\int \mathcal{D}\phi\; e^{-S[\phi]}\, \mathcal{O}(t)\, \mathcal{O}(0)}{\int \mathcal{D}\phi\; e^{-S[\phi]}}, \tag{2}$$

with the (Euclidean) action, in the case of a single real scalar field,

$$S = \sum_{t,x} \frac{(\phi_{x,t} - \phi_{x,t+1})^2}{2a(t)} + \sum_t \frac{a(t) + a(t-1)}{2} \left[ \sum_{\langle x x' \rangle} \frac{(\phi_{x,t} - \phi_{x',t})^2}{2a_x} + \sum_x \left( \frac{m^2}{2} \phi_{x,t}^2 + \frac{\lambda}{4!} \phi_{x,t}^4 \right) \right]. \tag{3}$$

Here $m$ is the bare mass and $\lambda$ the coupling. Because some of the Trotterized time-evolution operators were imaginary time and others were real time, the timelike lattice spacing $a$ is taken to vary over the lattice.
In this paper we will take it to be defined by the “S-contour”, although other choices are possible:

$$a(t) = \begin{cases} -i & t \in [0, N_t) \\ 1 & t \in [N_t, N_t + N_\beta/2) \\ i & t \in [N_t + N_\beta/2, 2N_t + N_\beta/2) \\ 1 & t \in [2N_t + N_\beta/2, 2N_t + N_\beta). \end{cases} \tag{4}$$

Here $N_t$ and $N_\beta$ denote the number of real-time and thermodynamic time evolution steps, respectively. This choice of action is equivalent to an $O(a)$ Trotter approximation to $e^{-\beta H/2} e^{iHt} e^{-\beta H/2} e^{-iHt}$.

If there were no timeslices with $\mathrm{Im}\, a \neq 0$, the action would be real. In that case, the Metropolis method gives an algorithm by which the computer can sample from the probability distribution proportional to $e^{-S}$. Expectation values with respect to that distribution correspond to physical expectation values.

As things are, the action is not purely real, and the Boltzmann factor does not correspond to any probability distribution (at least, not to any distribution of real-valued fields). The standard approach at this point is to sample with respect to the quenched Boltzmann factor $e^{-\mathrm{Re}\,S}$, and then reweight, computing observables as

$$\langle \mathcal{O} \rangle = \frac{\langle e^{-iS_I}\, \mathcal{O} \rangle_Q}{\langle e^{-iS_I} \rangle_Q}. \tag{5}$$

Here $S_I \equiv \mathrm{Im}\,S$, and $\langle \cdot \rangle_Q$ denotes an expectation value with respect to the quenched distribution. The denominator, $\langle e^{-iS_I} \rangle_Q$, is known as the average phase, and characteristically decays exponentially in the spacetime volume of the system. A simple but robust argument shows that this exponential decay is a generic phenomenon. The average phase can be written as a ratio $Z/Z_Q$ of the physical partition function to the quenched partition function $Z_Q \equiv \int e^{-\mathrm{Re}\,S}$. The physical partition function, in the large volume limit, behaves thermodynamically as the exponential of the (extensive) free energy, and therefore scales as $e^{fV}$, with $f$ the free energy density.
The quenched partition function describes some (less interesting) thermodynamic system, and is therefore expected to have the same scaling, but with a different exponent: $e^{f_Q V}$. Thus, the ratio will decay exponentially. This can be avoided only when $f_Q = f$ exactly, that is, when there is no sign problem at all.

The real-time portion of the Schwinger-Keldysh contour gives the action an imaginary part. Note that fields along the real-time portion of the contour do not contribute at all to the real part of the action. Along those directions, importance sampling has no effect and the sign problem is maximally bad; this is generic to all field theories. Specifically for scalar field theory, because the domain of the path integral has infinite measure, the quenched partition function does not converge and the average phase is exactly zero.

III. NORMALIZING FLOWS AND CONTOUR INTEGRALS
We begin by introducing normalizing flows in the case of a theory with no sign problem. To accelerate the process of sampling from the Boltzmann distribution $e^{-S}$, we can look for a map $\tilde\phi(\phi)$ with the property

$$\left( \det \frac{\partial \tilde\phi}{\partial \phi} \right) e^{-S[\tilde\phi(\phi)]} \approx \mathcal{N} e^{-\phi^2/2}. \tag{6}$$

The map $\tilde\phi$ (termed a normalizing flow in the machine learning literature) transforms a Gaussian distribution, which can be sampled from efficiently, to the physical distribution desired. (Strictly speaking, it is the inverse map $\tilde\phi \mapsto \phi$ that is usually referred to as the normalizing flow, as it transforms the distribution $e^{-S}$ into the normal distribution. The convention used here, of working with $\tilde\phi(\phi)$ itself, allows generalization to actions with a sign problem.) The normalization constant $\mathcal{N}$ is inserted to account for the fact that the partition function $Z$ is generically not equal to the Gaussian integral.

If the flow $\tilde\phi(\phi)$ is exact, it allows expectation values to be computed directly (and is referred to as a trivializing map). A flow which is merely an approximation induces an effective action on the fields $\tilde\phi$ which is unequal to the desired physical action:

$$S_{\mathrm{induced}}(\tilde\phi) = \frac{\phi^2}{2} + \log \det \frac{\partial \tilde\phi}{\partial \phi}, \tag{7}$$

where $\phi$ is the preimage of $\tilde\phi$ under the normalizing flow. To compute the correct expectation values, we must reweight by computing a ratio of expectation values:

$$\langle \mathcal{O} \rangle = \frac{\langle \mathcal{O}\, e^{S_{\mathrm{induced}} - S} \rangle_n}{\langle e^{S_{\mathrm{induced}} - S} \rangle_n}, \tag{8}$$

where $\langle \cdot \rangle_n$ denotes an expectation value with respect to the normal distribution over $\phi$. In practice, it is often more efficient to use the normalizing flow to generate proposals for a Markov chain instead; the distinction will not matter here.

This procedure can begin with any easily sampled distribution. The use of a Gaussian is a convenient choice when the domain of integration is $\mathbb{R}^N$.
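The reweighting of Eq. (8) can be illustrated with a minimal one-variable sketch. Everything in it is an assumption made for illustration, not something taken from the numerical work of this paper: the target action is a Gaussian of width `SIGMA`, and the deliberately imperfect "flow" is a plain rescaling by `SCALE`, for which the induced action of Eq. (7) is known in closed form.

```python
import math
import random

SIGMA = 1.5   # width of the toy target distribution (assumption)
SCALE = 1.3   # imperfect flow; SCALE == SIGMA would make the flow exact


def action(x):
    """Toy target action S(x) = x^2 / (2 sigma^2), for which <x^2> = sigma^2."""
    return x * x / (2.0 * SIGMA ** 2)


def flow(phi):
    """Approximate normalizing flow: a plain rescaling of the Gaussian variable."""
    return SCALE * phi


def induced_action(phi):
    """Eq. (7) for the rescaling flow: phi^2/2 + log det(d flow / d phi)."""
    return 0.5 * phi * phi + math.log(SCALE)


def reweighted_x2(n_samples, seed=0):
    """Estimate <x^2> via Eq. (8), sampling phi from the unit normal."""
    rng = random.Random(seed)
    numerator = 0.0
    denominator = 0.0
    for _ in range(n_samples):
        phi = rng.gauss(0.0, 1.0)
        x = flow(phi)
        weight = math.exp(induced_action(phi) - action(x))
        numerator += weight * x * x
        denominator += weight
    return numerator / denominator


estimate = reweighted_x2(200_000)
print("reweighted <x^2>:", estimate, " exact:", SIGMA ** 2)
```

Without the weights the sample average of $\tilde\phi^2$ would tend to `SCALE**2`; the reweighting factor $e^{S_{\mathrm{induced}} - S}$ is what restores the target value `SIGMA**2`, up to statistical error.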
For compact domains of integration, a uniform distribution is likely to be a more convenient starting point.

Note also that normalizing flows compose. Given a sequence of distributions $p_1, \ldots, p_k$, and $k-1$ normalizing flows, each transforming $p_i$ to $p_{i+1}$, the composition of those normalizing flows transforms $p_1$ to $p_k$. This compositional property is preserved by the complex normalizing flows defined below.

This method is clearly not directly applicable to models with a sign problem. The normalizing flow induces an effective action on the physical fields $\tilde\phi$ which is always real, and therefore will never match the physical action $S[\tilde\phi]$. (Or at least, the Boltzmann factor is always real. A noninvertible flow may induce a negative Boltzmann factor.) We can construct a normalizing flow for the quenched action $\mathrm{Re}\,S[\tilde\phi]$, but this will at most lead to a polynomial speed-up in an exponentially slow algorithm. (Furthermore, as discussed in Sec. II, the real part of the action for real-time sign problems is typically flat in most directions. Sampling from the quenched action is not hard to begin with.)

Instead, inspired by the generalized thimble method, we can allow $\tilde\phi(\phi)$ to map real trivial fields $\phi \in \mathbb{R}^N$ to complex-valued physical fields $\tilde\phi \in \mathbb{C}^N$. We dub such a construction a complex normalizing flow. The condition Eq. (6) remains the same; to guarantee equality of expectation values, we will see that additional constraints on the behavior of $\tilde\phi$ are needed.

Assuming for the moment that Eq. (6) holds exactly, let us see what expectation values are computed:

$$\langle \mathcal{O}(\tilde\phi) \rangle_n = \frac{\int_{\tilde\phi(\mathbb{R}^N)} \mathcal{D}\tilde\phi\; \mathcal{O}\, e^{-S}}{\int_{\tilde\phi(\mathbb{R}^N)} \mathcal{D}\tilde\phi\; e^{-S}}. \tag{9}$$
Although the integrand is the desired one, the domain of integration is incorrect. The physical expectation value $\langle \mathcal{O} \rangle$ is obtained by an integral over the real plane $\mathbb{R}^N \subset \mathbb{C}^N$. The domain of integration used in Eq. (9) is the image of $\mathbb{R}^N$ under the map $\tilde\phi$. In order for the two integrals to be guaranteed equal, we must require the following:

• The Boltzmann factor $e^{-S[\tilde\phi]}$ is holomorphic, as is the product with the observable $e^{-S} \mathcal{O}$ [18].
• The image of $\mathbb{R}^N$ under $\tilde\phi(\phi)$ is a continuous manifold $\mathcal{M} \subset \mathbb{C}^N$.
• The contours $\mathbb{R}^N$ and $\mathcal{M}$ are connected by a homotopy; that is, there exists a continuous family of manifolds $\mathcal{M}(t)$ such that $\mathcal{M}(0) = \mathbb{R}^N$, $\mathcal{M}(1) = \mathcal{M}$, and at no point does $\mathcal{M}(t)$ pass through a singularity of an integrand.

Implicit in the last condition is the requirement that, when the complexified domain is not compact, the asymptotic behavior of the manifold at infinity not change. A change in this asymptotic behavior is considered equivalent to the manifold passing through the singularity at infinity.

From the conditions for equality above, it is clear that a complex normalizing flow induces a manifold of integration $\mathcal{M}$ of exactly the sort used in the generalized thimble method. For an exactly normalizing flow, the integration along this manifold exhibits no sign problem. Therefore, (exact) complex normalizing flows exist only if there is a manifold which exactly solves the sign problem. In fact, as discussed in Sec. IV E below, the converse holds as well: the existence of a manifold with no sign problem implies the existence of an exact normalizing flow.

In cases where the complex normalizing flow is not exact, reweighting is used to recover the precise expectation values as usual. This will generally be necessary throughout the numerical methods explored in this paper.

IV. EXISTENCE
A theory of complex normalizing flows does little good if such flows do not exist for problems of physical interest. This section is devoted to investigating when complex normalizing flows exist. Although in no (non-trivial) case can we show that normalizing flows certainly exist, the evidence suggests that such flows are more likely to exist in the case of bosonic (including real-time) sign problems than in the case of fermion sign problems.

First, we construct manifolds that entirely remove local phase fluctuations, leaving only global cancellations between different parts of the manifold of integration. This construction uses the holomorphic gradient flow (often used to approximate or define Lefschetz thimbles), defined and characterized in Sec. IV A. The existence of locally sign-free manifolds is argued for in the subsequent section. In Sec. IV C, we conjecture that locally perfect manifolds behave smoothly as parameters of the action are varied. The existence of perfect manifolds for the Schwinger-Keldysh sign problem follows from this conjecture. Several examples, where sign-free manifolds can either be found explicitly or shown not to exist at all, are given in Sec. IV D; these examples suggest a pattern in which sign-free manifolds generically exist for bosonic, but not fermionic, sign problems. Finally, in Sec. IV E, we use well-known results regarding normalizing flows in the real setting to conclude that, conditional on a perfect manifold existing, a complex normalizing flow must exist.
A. Holomorphic Gradient Flow
The holomorphic gradient flow is a first-order differential equation used to approximate Lefschetz thimbles [7, 13]. Lefschetz thimbles are the surfaces of steepest descent of $\mathrm{Re}\,S$ proceeding from critical points of the action. A certain union of the thimbles can be shown to yield the same integral as the real plane $\mathbb{R}^N$. Because the thimbles are generically sub-optimal in terms of the sign problem, we will ignore them and focus on the behavior of the flow itself. The key result is that, when a Boltzmann factor has local phase fluctuations on the real plane, the holomorphic gradient flow can always be used to find a nearby manifold with an improved sign problem.

The holomorphic gradient flow is defined by

$$\frac{\mathrm{d}z}{\mathrm{d}t} = \overline{\frac{\partial S}{\partial z}}. \tag{10}$$

Here the partial derivative $\frac{\partial}{\partial z}$ denotes the usual holomorphic derivative (i.e. the Wirtinger derivative). We will assume throughout that the action $S$ is holomorphic in the field variables $z$. This differential equation governs the evolution of a field configuration $z$ through complex space. When applied to all field configurations in a manifold, we obtain a family of manifolds parameterized by the flow time $t$. Note that the flow time is purely fictitious, and is unrelated to the physical time (represented as part of the lattice).

Considering the evolution of a field configuration under this equation, note that the imaginary part of the action never changes, and the real part can only increase (or remain the same, if we begin at a critical point):

$$\frac{\mathrm{d}S}{\mathrm{d}t} = \left| \frac{\partial S}{\partial z} \right|^2. \tag{11}$$

This can be taken as a motivation for the holomorphic gradient flow, as by increasing the real part of the action, we may hope to decrease the quenched partition function and improve the average phase. Following this observation, it is convenient to work with the real part of the action $u \equiv \mathrm{Re}\,S$.

The flow Eq. (10) is most frequently applied to manifolds beginning from the real plane ($\mathbb{R}^N \subset \mathbb{C}^N$). Let us examine its behavior at early times.
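The two properties expressed by Eq. (11) are easy to check numerically on a single complex variable. The sketch below assumes a toy action $S(z) = z^2/2 + i g z^4$ with a hypothetical coupling `G`, and integrates Eq. (10) with a standard fourth-order Runge-Kutta step; neither the action nor the integrator is taken from this paper.

```python
G = 0.05  # hypothetical coupling of the toy action


def S(z):
    """Toy holomorphic action S(z) = z^2/2 + i G z^4."""
    return 0.5 * z * z + 1j * G * z ** 4


def dS(z):
    """Holomorphic derivative dS/dz."""
    return z + 4j * G * z ** 3


def flow_rhs(z):
    """Right-hand side of Eq. (10): the complex conjugate of dS/dz."""
    return dS(z).conjugate()


def rk4_step(z, dt):
    """One fourth-order Runge-Kutta step of the holomorphic gradient flow."""
    k1 = flow_rhs(z)
    k2 = flow_rhs(z + 0.5 * dt * k1)
    k3 = flow_rhs(z + 0.5 * dt * k2)
    k4 = flow_rhs(z + dt * k3)
    return z + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0


def flow_history(z0, t_max=0.5, n_steps=500):
    """Values of S along the flowed trajectory starting from z0."""
    dt = t_max / n_steps
    z = complex(z0)
    history = [S(z)]
    for _ in range(n_steps):
        z = rk4_step(z, dt)
        history.append(S(z))
    return history


history = flow_history(0.7)
# Im S should stay fixed, Re S should never decrease.
im_drift = max(abs(s.imag - history[0].imag) for s in history)
re_monotone = all(b.real >= a.real for a, b in zip(history, history[1:]))
print("max drift of Im S:", im_drift, " Re S non-decreasing:", re_monotone)
```

The drift of $\mathrm{Im}\,S$ is limited only by the integrator's discretization error, while $\mathrm{Re}\,S$ grows at the rate $|\partial S/\partial z|^2$.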
Note that, as a consequence of Cauchy’s integral theorem, the partition function itself will not be changed by the flow [19]; only the quenched partition function will change. The change in the quenched partition function is given by

$$\frac{\mathrm{d}}{\mathrm{d}t} Z_Q = \frac{\mathrm{d}}{\mathrm{d}t} \int_{\mathbb{R}^N} \mathcal{D}x\; e^{-u[z(x)]} \left| \det \frac{\partial z}{\partial x} \right|, \tag{12}$$

where we have chosen to parameterize the infinitesimally flowed manifold $z(x)$ by the real plane. Because the domain of integration is unchanged, being the parameterizing real plane for any $t$, we may proceed to inspect the derivative of the integrand:

$$\frac{\mathrm{d}}{\mathrm{d}t}\, e^{-u} |\det J| = e^{-u} |\det J| \left[ \mathrm{Re}\, \mathrm{Tr}\, J^{-1} \frac{\mathrm{d}J}{\mathrm{d}t} - \frac{\mathrm{d}u}{\mathrm{d}t} \right]. \tag{13}$$

We have already seen that $\frac{\mathrm{d}u}{\mathrm{d}t} = \left| \frac{\partial S}{\partial z} \right|^2$ is guaranteed to be non-negative, and positive away from a critical point. The Jacobian term, however, may be larger than the guaranteed-negative term, resulting in a worsening sign problem with flow time $t$. Empirically, this is indeed the case at sufficiently long flow times: the average phase is maximized at some intermediate $t$, rather than in the limit $t \to \infty$. Beginning from the real plane, however, the Jacobian is the identity, and we find

$$\mathrm{Tr}\, J^{-1} \frac{\mathrm{d}J}{\mathrm{d}t} = \sum_i \overline{\frac{\partial^2 S}{\partial z_i^2}}. \tag{14}$$

The real part reduces to $\sum_i \frac{\partial^2 u}{\partial x_i^2}$. Returning to Eq. (13),

$$\frac{\mathrm{d}}{\mathrm{d}t}\, e^{-u} |\det J| = \sum_i e^{-u} \left[ \frac{\partial^2 u}{\partial x_i^2} - \left| \frac{\partial S}{\partial z_i} \right|^2 \right]. \tag{15}$$

Consider each $i$ independently. Each term is very nearly a total derivative, since

$$\frac{\partial}{\partial x_i} \left( \frac{\partial u}{\partial x_i}\, e^{-u} \right) = e^{-u} \left[ \frac{\partial^2 u}{\partial x_i^2} - \left( \frac{\partial u}{\partial x_i} \right)^2 \right]. \tag{16}$$

To connect the two expressions, observe that the magnitude of the derivative of the action can be written

$$\left| \frac{\partial S}{\partial z_i} \right|^2 = \left( \frac{\partial u}{\partial x_i} \right)^2 + \left( \frac{\partial v}{\partial x_i} \right)^2, \tag{17}$$

where $v \equiv \mathrm{Im}\,S$.
As a result, we find that the change in the quenched partition function, when starting from the real plane (the same argument applies to any flat manifold), is

$$\frac{\mathrm{d}Z_Q}{\mathrm{d}t} = -\int \mathcal{D}x\; e^{-u(x)} \sum_i \left( \frac{\partial v}{\partial x_i} \right)^2. \tag{18}$$
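Eq. (18) can be checked by direct quadrature in one dimension. The sketch below is an illustration under stated assumptions: a toy action $S(z) = z^2/2 + i g z^4$ with hypothetical coupling `G`, a single linearized flow step $z(x) = x + \epsilon\, \overline{S'(x)}$ of size `EPS`, and a plain trapezoid rule on a truncated interval. It compares a central finite difference of $Z_Q$ in $\epsilon$ against the integral on the right-hand side.

```python
import math

G = 0.05     # hypothetical coupling of the toy action S(z) = z^2/2 + i G z^4
EPS = 1e-4   # small flow time used for the finite difference


def S(z):
    return 0.5 * z * z + 1j * G * z ** 4


def dS(z):
    return z + 4j * G * z ** 3


def d2S(z):
    return 1.0 + 12j * G * z * z


def trapezoid(f, half_width=10.0, n=4000):
    """Trapezoid rule on [-half_width, half_width]; the tails are negligible here."""
    h = 2.0 * half_width / n
    total = 0.5 * (f(-half_width) + f(half_width))
    for k in range(1, n):
        total += f(-half_width + k * h)
    return total * h


def quenched_Z(eps):
    """Z_Q on the manifold z(x) = x + eps * conj(S'(x)), as in Eq. (12)."""
    def integrand(x):
        z = x + eps * dS(x).conjugate()
        jac = 1.0 + eps * d2S(x).conjugate()   # dz/dx for real x
        return math.exp(-S(z).real) * abs(jac)
    return trapezoid(integrand)


# Left-hand side of Eq. (18): central finite difference in the flow time.
lhs = (quenched_Z(EPS) - quenched_Z(-EPS)) / (2.0 * EPS)

# Right-hand side of Eq. (18): on the real line u = x^2/2 and dv/dx = 4 G x^3.
rhs = -trapezoid(lambda x: math.exp(-0.5 * x * x) * (4.0 * G * x ** 3) ** 2)

print("dZ_Q/dt (finite difference):", lhs)
print("-int e^{-u} (dv/dx)^2      :", rhs)
```

For this toy action the integral can also be done by hand, giving $-240\, g^2 \sqrt{2\pi}$, and both printed numbers should agree with the right-hand side of Eq. (18).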
This is never positive; hence, the sign problem is never worsened by a small amount of flow from the real plane. Moreover, as long as the imaginary part of the action is non-constant on the portion of the real plane where $e^{-u}$ is nonvanishing, a small amount of flow will make the quenched partition function strictly smaller.

This is the key result regarding the holomorphic gradient flow: when beginning from the real plane, if $\mathrm{Im}\,S$ is not constant where $\mathrm{Re}\,S$ is finite, a small amount of flow is guaranteed to improve the sign problem.

One other property of the holomorphic gradient flow is of interest: regions of $\mathbb{C}^N$ at which $\mathrm{Re}\,S$ diverges (becoming large and positive) act as attractors. The flow will collide with these singularities in a finite flow time. This does not cause the evolution of the manifold itself to be ill-defined. The manifold will be continuous, but not smooth, where it intersects singularities of the action. Because the Boltzmann factor vanishes at these singularities, the manifold’s behavior there contributes neither to the integral nor to the sign problem.

B. Existence of Locally Perfect Manifolds
The holomorphic gradient flow is not guaranteed to result in a perfect manifold at asymptotically large times. The asymptotic manifold under this flow is a union of Lefschetz thimbles. Two features of the Lefschetz thimbles contribute a nonvanishing sign problem. First, on each thimble $\mathrm{Im}\,S$ is constant, but different thimbles generically have different values of $\mathrm{Im}\,S$. Thus, cancellations occur between different thimbles, which may become severe when multiple thimbles have similar quenched weights. Second, although $\mathrm{Im}\,S$ is constant, the phase in the integral is actually $\mathrm{Im}\,S - \mathrm{Im} \log \det J$, with the Jacobian term coming from the integration measure $\mathrm{d}z$. Each thimble, therefore, comes with local phase fluctuations, which have been found to become severe on large lattices [20].

These two problems are sometimes contrasted and referred to as “global” vs “local” sign problems. An example of an unremovable global sign problem was given in [19]: the one-dimensional integral $\int (\cos\theta + \epsilon)\, \mathrm{d}\theta$, for small $\epsilon$, cannot have its sign problem repaired by any contour deformation. Local sign problems, in contrast, have been found to be removable even where the thimbles fail. In the heavy-dense limit of the Thirring model, numerical experiments show that the flow results in a suboptimal manifold even when a perfect manifold does exist [20].

We will see in this section that local sign problems are always removable; that is, a piecewise-smooth contour exists along which there are no locally fluctuating phases, but there may be cancellations between different pieces. Combined with the observation above that at least some global sign problems are unremovable, this indicates that the distinction is well defined.
Sign problems can be decomposed into a local and a global part, with the local part fixable by an appropriate choice of integration contour, but the global part requiring more drastic manipulations.

Now we turn to a procedure, based on the holomorphic gradient flow, for removing local sign problems. Begin with $\mathcal{M}_0 = \mathbb{R}^N \subset \mathbb{C}^N$, and flow for a small amount of time $\epsilon$. This defines a manifold $\mathcal{M}_1$, parameterized approximately via

$$\tilde\phi_1(\phi) = \phi + \epsilon\, \overline{\frac{\partial S}{\partial \phi}}. \tag{19}$$

(In this discussion, for illustrative purposes, we will expand to linear order in the flow time $\epsilon$, as if a discrete jump was made. To treat the zeros of $e^{-S}$ correctly, it is important to perform a proper evolution by the flow equation instead.) By the argument in the previous section, the quenched partition function on $\mathcal{M}_1$ is no larger than that on $\mathcal{M}_0$; moreover, if $\mathcal{M}_1 \neq \mathcal{M}_0$, then the quenched partition function is smaller.

Consider the effective action $S_1$ induced by $\tilde\phi_1$; this is a function from $\mathbb{R}^N$ to the complex numbers. We would now like to flow with respect to this effective action. Doing so would guarantee that the sign problem once again improves. In order for this to make sense, however, $S_1$ must be holomorphic. Naively, it appears not to be. In particular, its definition includes the explicitly antiholomorphic term $\overline{\frac{\partial S}{\partial \phi}}$, preventing us from repeating the last step.

This obstruction is only apparent. First consider $\tilde\phi_1(\phi)$ as defined in Eq. (19). As written, it appears to contain both a holomorphic and an antiholomorphic term. However, the only aspect of $\tilde\phi_1$ we care about is its definition on $\mathbb{R}^N$. Functions $\mathbb{R}^N \to \mathbb{C}$ are neither holomorphic nor antiholomorphic; as long as $\tilde\phi_1$ is sufficiently smooth, it can be analytically continued into the complex plane in a purely holomorphic way. Concretely, we can replace Eq. (19) by

$$\tilde\phi_1(\phi) = \phi + \epsilon \left. \overline{\frac{\partial S}{\partial \phi}} \right|_{\bar\phi}; \tag{20}$$

that is, we evaluate the function $\overline{\frac{\partial S}{\partial \phi}}$ at $\bar\phi$.
This parameterizes exactly the same manifold (as the two functions agree on $\mathbb{R}^N$), but also defines a holomorphic map when evaluated on the rest of the complex plane.

Now we return to the effective action, which after one step is given by

$$S_1(\phi) = S[\tilde\phi_1(\phi)] - \log \det \left( 1 + \epsilon\, \frac{\partial}{\partial \phi} \left. \overline{\frac{\partial S}{\partial \phi}} \right|_{\bar\phi} \right). \tag{21}$$

As initially defined, this is an analytic function on the real plane alone. As with $\tilde\phi_1$, we can choose its behavior in the complex plane to be holomorphic, at least in some region around the real plane. We can now flow the real plane again, this time with respect to $S_1$. As before, this is guaranteed to improve the sign problem. We obtain a function $\tilde\phi_2$ which maps the real plane (the domain of $S_1$) to some slightly deformed contour. Composing $\tilde\phi_1 \circ \tilde\phi_2$ yields a map from the domain of the original action $S$ to a deformed contour. Thus, we obtain a new integration manifold $\mathcal{M}_2 = \tilde\phi_1 \circ \tilde\phi_2(\mathbb{R}^N)$, which induces an effective action $S_2$, and we repeat.

At every step of this modified flow, we have the freedom to arbitrarily reparameterize the integration manifold. There is no requirement that the parameterizations be “connected” from one step to the next. This means that the evolution of the manifold from step to step is non-unique.

What can happen to the manifold in the limit of a large number of steps? As long as the manifold is changing, $Z_Q$ is shrinking; this implies that we cannot reach a cycle. The development of some singular behavior, unremovable by reparameterization, could require us to take ever-smaller steps $\epsilon$.
We do not have a formal proof forbidding this; however, numerical experience with the holomorphic gradient flow indicates that it does not create singularities away from zeros of the Boltzmann factor. The last possibility is a fixed point: subsequent manifolds are ever-better approximations (or perhaps equal) to a manifold $\mathcal{M}_f$ which is unchanged by the holomorphic gradient flow under the effective action.

The properties of such a fixed-point manifold are best understood by considering the corresponding effective action $S_f$. Since $\mathcal{M}_f$ is unchanged after a step of flow, it must be that the flow vectors $\overline{\frac{\partial S_f}{\partial \phi}}$ lie entirely within the real plane. Equivalently, since the sign problem is not improved by flow, $\mathrm{Im}\,S_f$ must be constant everywhere that $|e^{-S_f}| > 0$. Critically, this does not imply that $\mathrm{Im}\,S_f$ is in fact globally constant, merely that regions of distinct $\mathrm{Im}\,S_f$ are separated by vanishing Boltzmann factors.

To summarize: the effective action $S_f$ on the real plane satisfies $e^{-\mathrm{Re}\,S_f}\, \partial\, \mathrm{Im}\,S_f = 0$. The imaginary part is locally constant except at places where the entire action diverges (and the Boltzmann factor vanishes). The real plane is thus divided into distinct regions, each with constant $\mathrm{Im}\,S_f$ and therefore no local sign problem, but with the possibility of cancellations between the regions.

What does this imply about $\mathcal{M}_f$? Regions on $\mathbb{R}^N$ where $S_f$ does not diverge correspond to smooth parts of the fixed-point manifold, which have no sign problem when integrated over. These smooth regions terminate where $\mathcal{M}_f$ intersects singularities of $S$, either due to a fermion determinant becoming 0, or (in the case of bosonic sign problems) where one or more fields $\tilde\phi$ diverge.

The similarities of $\mathcal{M}_f$ to the Lefschetz thimbles $\mathcal{M}_T$ are striking. Like $\mathcal{M}_f$, when the Lefschetz thimbles are parameterized by the real plane, they are separated from each other by regions of vanishing effective Boltzmann factor. Those regions on the real plane correspond to the places in $\mathbb{C}^N$ where $\mathcal{M}_T$ intersects divergences of $S$. The key difference is that, when working with the thimbles, the imaginary part of the physical action is constant but the effective action (due to the Jacobian) may have imaginary fluctuations. The fixed-point manifold $\mathcal{M}_f$ will have a fluctuating imaginary part of the physical action, but a locally constant imaginary part of the effective action on the real plane.

On the fixed-point manifold, the notion of a “global” sign problem becomes clear. Different parts of the manifold have different $\mathrm{Im}\,S_f$, with those differences apparently not removable by any choice of integration contour. Which physical systems possess global sign problems remains an open question.

Hints of this notion of a “global” sign problem are, as mentioned earlier, visible already when considering the Lefschetz thimbles.
When the integral over the real line is equal to a sum of integrals over two (or more) thimbles with different phases, it is tempting to disregard the local part of the sign problem on the thimbles, and attribute the cancellations between thimbles to a global sign problem. It is an open question whether such a global sign problem on the thimbles implies an unremovable global sign problem.

In the context of Lefschetz thimbles, it has been argued that cancellations between different thimbles should not be severe in the infinite volume limit. One such argument proceeds as follows [21]. Each thimble is associated to a critical point of the action; i.e., a classical solution to the equations of motion. At large volumes, we may reasonably expect the integral to be dominated by thimbles associated with large-scale classical solutions. These solutions, and therefore their associated thimbles, persist as we enlarge the volume. Therefore, we may now talk about “a thimble” across multiple volumes. Each thimble’s contribution to the path integral should grow thermodynamically, defining a per-thimble free energy. Unless protected by some symmetry, each of these free energies will generically be different, causing one thimble to dominate in the large-volume limit. Even in the case of the thimbles, this argument is not a proof. Its applicability to the fixed-point manifolds is particularly unclear.

C. Existence of Perfect Manifolds
In the restricted case of polynomial actions (this excludes lattice models with a fermion determinant), we can show that globally perfect manifolds always exist, provided that locally perfect manifolds depend smoothly on the parameters of the action. To be precise, the conjecture we need is:
Conjecture.
Let $S_t$ be a continuous family of actions, and let $\mathcal{M}$ be a manifold on which $e^{-S_0}\,dz$ has no local phase fluctuations. Then there exists a continuous family of manifolds $\mathcal{M}_t$, with $\mathcal{M}_0 = \mathcal{M}$, such that $e^{-S_t}\,dz$ has no local phase fluctuations on $\mathcal{M}_t$.

This conjecture empirically holds in several one-dimensional models explored in the next section. It is also motivated by thinking of normalizing flows as analytic functions not just of the field variables, but also of the parameters of the action.
FIG. 1. Perfect manifolds, found by numerical search, for the one-dimensional integral defined by Eq. (22). All manifolds have an average sign measured to be within 10 − of unity.

Suppose we start from an action $S_0$ that has no sign problem on the real plane, whether local or global. Later actions $S_t$ have a sign problem. By definition the corresponding manifolds $\mathcal{M}_t$ have no local sign problem; can a global sign problem be created?

One way for a global sign problem to be created, without requiring discontinuous behavior of the family $\mathcal{M}_t$, is for singularities of the action to intersect with the manifold. In the case of the Schwinger-Keldysh action Eq. (3) and other polynomial actions, however, there are no singularities of the action except at infinity. Any global sign problem must come from regions of $\mathcal{M}_t$ on which all fields can become arbitrarily large.

Creating a global sign problem, therefore, implies introducing a new such region of $\mathcal{M}_t$. This is a discontinuous operation on the family of manifolds. If we are to hold to the previous conjecture, we must conclude that this family of manifolds is not only locally perfect, but in fact globally perfect.

D. Examples
In one dimension, manifolds with no sign problem can be readily found by a numerical search. As an example, Fig. 1 shows such manifolds for the action
$$S = x^4 + \lambda e^{i\phi} x^2 \tag{22}$$
for various values of $\lambda$ and $\phi$. Note that any coefficient of the $x^4$ term can be absorbed by a linear change of variables, so this is the most general quartic action that is even in $x$. These examples motivate the conjecture that similarly structured sign problems (in particular, those with a polynomial action) generally admit perfect manifolds.

Perfect manifolds are not available even for all one-dimensional integrals, however. A simple example, not physically motivated, was given in [19]. The integral of $(\cos\theta + \epsilon)$ has a sign problem of order $\epsilon^{-1}$ for small $\epsilon$. For sufficiently small $\epsilon$, it cannot have its sign problem removed by any contour deformation. This is readily confirmed by noticing that the magnitude of $\cos(a + ib)$ (which is the quenched Boltzmann factor) is minimized when $b = 0$. The integral along the real line will then have the smallest quenched partition function, and therefore the best possible sign problem.

In the previous section, we discussed how a global sign problem could be created when singularities of the action intersected with a locally perfect manifold. The case of $(\cos\theta + \epsilon)$ is a clear demonstration of this phenomenon. At $\epsilon > 1$, there are two zeros of the Boltzmann factor, at $\mathrm{Re}\,\theta = \pi$ and $\mathrm{Im}\,\theta = \pm\cosh^{-1}\epsilon$ (the solutions of $\cos\theta = -\epsilon$). As $\epsilon$ is lowered, these move towards the real line; at $\epsilon = 1$ they merge at $\theta = \pi$. At this point, no global sign problem yet exists, but the locally perfect manifold now passes through a zero. Continue lowering $\epsilon$, and the two zeros again split, now at $\mathrm{Re}\,\theta = \pi \pm \cos^{-1}\epsilon$. Although the manifold has never changed, it now consists of segments with cancelling phases.

Let us now consider a more physical model: the 0+1-dimensional Thirring model as studied in [7, 13]. The Boltzmann factor defining this model is
$$e^{-S} = \exp\left(\frac{1}{g^2}\sum_i \cos z_i\right) \det_{i,j}\left[ m\,\delta_{i,j} + \frac{1}{2}\left( e^{\mu + i z_i}\delta_{i+1,j} - e^{-\mu - i z_j}\delta_{i-1,j} + e^{-\mu - i z_j}\delta_{i,1}\delta_{j,\beta} - e^{\mu + i z_i}\delta_{j,1}\delta_{i,\beta} \right)\right], \tag{23}$$
where $g$ is a coupling constant, $\mu$ is the chemical potential (and origin of the sign problem), and $m$ is the bare mass. The $z_1, \ldots, z_\beta$ are the degrees of freedom being integrated over; there are $\beta$ links on the lattice.

The fermion determinant depends only on the sum of the fields $\beta\sigma = \sum_i z_i$. A natural simplification, therefore, is to consider the “mean-field” model, a one-dimensional integral with Boltzmann factor
$$e^{-S(\sigma)} = e^{\frac{\beta}{g^2}\cos\sigma}\left[\cos(\beta(\sigma - i\mu)) + 1\right]. \tag{24}$$
We have taken $m = 0$ for convenience (and neglected an overall normalization).

For our purposes, an interesting limit is that of large $\beta$, while keeping the coupling and chemical potential both of order unity. Numerical experiments indicate that there is no contour that exactly solves the sign problem with these parameters; indeed, the average phase falls exponentially in $\beta$, as one would expect. This holds even if we neglect the Jacobian.
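As a rough numerical illustration of this exponential fall-off (a sketch, not the numerical experiments referred to above), the average phase of the mean-field weight of Eq. (24) can be evaluated on the undeformed real contour. The prefactor $\beta/g^2$ in the exponent, the parameter values $g = \mu = 1$, the restriction to integer $\beta$ (so that the integrand is $2\pi$-periodic), and the function names are all assumptions made for this example.

```python
import numpy as np

def mean_field_weight(sigma, beta, g=1.0, mu=1.0):
    """Mean-field Boltzmann factor of Eq. (24); the prefactor beta/g**2 is an assumption."""
    return np.exp(beta / g**2 * np.cos(sigma)) * (np.cos(beta * (sigma - 1j * mu)) + 1.0)

def average_phase_real_line(beta, g=1.0, mu=1.0, n=4096):
    """|sum of weights| / sum of |weights| over one period of the real line.

    For integer beta the integrand is 2*pi-periodic, so a uniform grid over one
    period is used; the grid spacing cancels in the ratio.
    """
    sigma = np.linspace(-np.pi, np.pi, n, endpoint=False)
    w = mean_field_weight(sigma, beta, g, mu)
    return abs(w.sum()) / np.abs(w).sum()

betas = (2, 4, 8)
phases = [average_phase_real_line(beta) for beta in betas]
for beta, p in zip(betas, phases):
    print(f"beta = {beta}: average phase on the real line = {p:.4f}")
```

This probes only the real contour. It does not by itself show that no deformed contour does better; that is the role of the bound on the quenched partition function.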
To bound what any contour can achieve when the Jacobian is neglected, write the quenched Boltzmann factor explicitly in terms of the real and imaginary parts of $\sigma = \sigma_R + i\sigma_I$:
$$|e^{-S}| = e^{\frac{\beta}{g^2}\cos\sigma_R\cosh\sigma_I}\left| 1 + \cos\beta\sigma_R\,\cosh(\beta(\sigma_I - \mu)) - i\sin\beta\sigma_R\,\sinh(\beta(\sigma_I - \mu)) \right|. \tag{25}$$
The quenched partition function can be given a lower bound by minimizing $\int d\sigma_R\, |e^{-S[\sigma_R, \sigma_I(\sigma_R)]}|$ over all functions $\sigma_I$. The minimization over $\sigma_I$ can be done individually for each $\sigma_R$. Even this lower bound on the quenched partition function is still exponentially larger than the physical partition function.

Note that the fact that no manifold exists to resolve the mean-field sign problem does not prove that no manifold exists that resolves the sign problem of the original theory: it is merely suggestive. It seems plausible that a similar technique could be used to establish that the sign problem of the original theory cannot be removed by contour deformation.

This fermionic example differs sharply from the Schwinger-Keldysh action (and from Eq. (22)). The action is not a polynomial, and relatedly, the Boltzmann factor falls to zero away from infinity (and nearly on the real plane). If the failure to have a perfect manifold is related to these features, then we expect fermionic sign problems to frequently be unresolvable via contour deformation, while bosonic sign problems would generically be resolvable.

E. Existence of Normalizing Flows
A parameterization $\tilde\phi(\phi)$ of a manifold with no sign problem induces an effective action on the real plane that is always real. Thanks to the composability of normalizing flows, the problem of finding a complex normalizing flow reduces to the problem of finding an ordinary normalizing flow for that effective action. Provided that this can be done, the existence of a perfect manifold implies the existence of a complex normalizing flow.

As it happens, given probability distributions $p(x)$ and $\pi(\tilde x)$, a map $x \to \tilde x$ always exists such that the measure $p(x)\,dx$ induces the measure $\pi(\tilde x)\,d\tilde x$; that is, such that
$$p(x) = \pi[\tilde x(x)]\,\det\frac{\partial\tilde x}{\partial x}. \tag{26}$$
The construction is simplest, and unique among monotonically increasing maps, in one dimension. Define the cumulative distribution functions $P$ and $\Pi$, of $p$ and $\pi$ respectively:
$$P(x) = \int_{-\infty}^{x} dx'\, p(x'). \tag{27}$$
The CDF can be seen as a normalizing flow from a probability distribution to the uniform distribution on the unit interval. Therefore, the desired map is given by $\Pi^{-1}\circ P$. In the multidimensional case such maps are known to exist as well, but cease to be unique. Finding maps with desirable properties is an active area of research; see [22] for a review.

The remainder of this work is dedicated to the task of finding approximate normalizing flows, under the assumption that such flows exist.

V. PERTURBING FLOWS
In principle, a complex normalizing flow can be trained in much the same way as a regular normalizing flow. In practice, this training is a difficult process. One principal reason, closely linked to the sign problem, is that when comparing Boltzmann factors, a difference of $2\pi$ in the imaginary part of the action is invisible. As a result, if the physical Boltzmann factor is 1 and the induced Boltzmann factor is −1, the gradient descent procedure has no way to know whether the induced Im S should be changed by $\pi$ or $-\pi$ (or perhaps $3\pi$). Circumventing this requires either maintaining a normalizing flow which is always “within $\pi$” of being exact, or defining the flow in such a way that the imaginary part of the induced action is itself well defined.

Instead, we will work in the spirit of [23], and construct normalizing flows in perturbation theory.

A. Leading Order
A normalizing flow need not begin with a Gaussian distribution. In the general case, the condition for a normalizing flow reads
$$\left(\det\frac{\partial\tilde\phi}{\partial\phi}\right) e^{-S[\tilde\phi(\phi)]} = \mathcal{N} e^{-S_0(\phi)}, \tag{28}$$
where $S_0$ is the action defining the original probability distribution, which $\phi \mapsto \tilde\phi$ transforms into $e^{-S}$. Consider the case where $S$ is merely a perturbation of $S_0$; that is, where
$$S = S_0 + \lambda\mathcal{O} \tag{29}$$
for small $\lambda$. When $\lambda = 0$, a suitable normalizing flow is simply $\tilde\phi = \phi$. For small $\lambda$, we expand $\tilde\phi$ as a power series: $\tilde\phi = \phi + \lambda\Delta^{(1)} + O(\lambda^2)$. Returning to Eq. (28) and expanding to leading order in $\lambda$, we find the differential equation for $\Delta^{(1)}$:
$$\nabla\cdot\Delta^{(1)} - \Delta^{(1)}\cdot\nabla S_0 = \mathcal{O} - \langle\mathcal{O}\rangle. \tag{30}$$
Here the expectation value $\langle\mathcal{O}\rangle$ is evaluated with respect to the original action $S_0$. We can obtain Eq. (30) more quickly simply by considering the integral of $\nabla\cdot\left(\Delta e^{-S_0}\right)$, for any $\Delta$ that decays at infinity, or diverges sufficiently slowly. As the integral of a total derivative, it must vanish. This implies that the expectation value of $\nabla\cdot\Delta - \Delta\cdot\nabla S_0$ vanishes as well, which fixes the constant $-\langle\mathcal{O}\rangle$ on the right-hand side.

When perturbing from a free theory (that is, when $S_0$ defines a Gaussian), Eq. (30) can be solved exactly. In particular, with
$$S = \sum_{ij}\phi_i M_{ij}\phi_j + \lambda\sum_i \Lambda_i\phi_i^4, \tag{31}$$
the perturbative flow $\Delta^{(1)}$ is given by
$$\Delta^{(1)}_i = -\sum_j\left[\frac{1}{2} M^{-1}_{ij}\Lambda_j\phi_j^3 + \frac{3}{4} M^{-1}_{ij}M^{-1}_{jj}\Lambda_j\phi_j\right]. \tag{32}$$
Note that Eq. (31) is a generalization of the Schwinger-Keldysh action Eq. (3). Any two Gaussians are trivially connected by a normalizing flow, and as noted earlier, normalizing flows compose. Thus, Eq. (32) implicitly defines a (perturbative) normalizing flow for the Schwinger-Keldysh sign problem in $\phi^4$ field theory.

Unfortunately, Eq. (28) is not the only condition constraining a complex normalizing flow. As discussed in Sec. III, the asymptotic behavior of the contour $\tilde\phi(\mathbb{R}^N)$ must match that of the real plane; i.e., the two manifolds must be in the same homology class.
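The coefficients of the leading-order flow can be checked numerically. The sketch below evaluates the residual of Eq. (30) for the flow $\Delta^{(1)}_i = -\sum_j[\frac{1}{2}M^{-1}_{ij}\Lambda_j\phi_j^3 + \frac{3}{4}M^{-1}_{ij}M^{-1}_{jj}\Lambda_j\phi_j]$, which is the form that follows from Eqs. (30) and (31) for symmetric $M$. The random real test matrix, the finite-difference divergence, and all function names are assumptions of this illustration rather than part of the original calculation.

```python
import numpy as np

def leading_order_flow(phi, M_inv, Lam):
    """Delta^(1)(phi) for S_0 = phi.M.phi and O = sum_i Lam_i phi_i^4 (M symmetric)."""
    cubic = M_inv @ (Lam * phi**3)
    linear = M_inv @ (np.diag(M_inv) * Lam * phi)
    return -(0.5 * cubic + 0.75 * linear)

def flow_equation_residual(phi, M, Lam, h=1e-5):
    """(div Delta - Delta.grad S_0) - (O - <O>), which Eq. (30) requires to vanish."""
    M_inv = np.linalg.inv(M)
    delta = leading_order_flow(phi, M_inv, Lam)
    div = 0.0
    for i in range(phi.size):
        step = np.zeros_like(phi)
        step[i] = h
        # Central finite difference for d Delta_i / d phi_i.
        div += (leading_order_flow(phi + step, M_inv, Lam)[i]
                - leading_order_flow(phi - step, M_inv, Lam)[i]) / (2.0 * h)
    grad_S0 = 2.0 * (M @ phi)
    O = np.sum(Lam * phi**4)
    # Gaussian moments: <phi_i phi_j> = M_inv_ij / 2, so <phi_i^4> = (3/4) M_inv_ii^2.
    O_mean = np.sum(0.75 * Lam * np.diag(M_inv)**2)
    return div - delta @ grad_S0 - (O - O_mean)

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
M = A @ A.T + 4.0 * np.eye(4)          # real, symmetric, positive definite test matrix
Lam = rng.uniform(0.5, 1.5, size=4)
phi = rng.normal(size=4)
residual = flow_equation_residual(phi, M, Lam)
print(f"residual of Eq. (30): {residual:.2e}")
```

For the complex $M$ and $\Lambda$ of the Schwinger-Keldysh case the same algebra should go through, with the Gaussian moments understood by analytic continuation.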
It is not a surprise that the perturbative flow Eq. (32) violates this condition, as the perturbative expansion is equivalent to an expansion in small fields $\phi$, while the asymptotic behavior is purely determined by the behavior of $\Delta^{(1)}$ when $\phi$ is large.

To get the correct asymptotic behavior, we can work instead in the strong coupling expansion to obtain $\Delta^{(1,\mathrm{strong})}$, which becomes a good approximation at large $\phi$. For an action of the form of Eq. (31), it is convenient to construct our normalizing flow as a sequence of four maps:
1. Map the distribution $e^{-\phi^2/2}$ to $e^{-\psi_1^4}$ via $\psi_1 = F_1(\phi)$.
2. Rotate and scale the complex plane via $\psi_2 = F_2(\psi_1)$ to obtain the distribution $e^{-\Lambda\psi_2^4}$.
3. Introduce a perturbative quadratic piece via a perturbative flow $\psi_3 = F_3(\psi_2) = \psi_2 + \frac{1}{\sqrt{\lambda}}\delta^{(1)}(\psi_2)$. The resulting distribution is $e^{-S'(\psi_3)}$, where
$$S'(\psi) = \sum_i \Lambda_i\psi_i^4 + \frac{1}{\sqrt{\lambda}}\sum_{ij}\psi_i M_{ij}\psi_j. \tag{33}$$
4. Rescale the fields to restore the correct field normalization via $\tilde\phi = F_4(\psi_3)$, finally obtaining the desired distribution $e^{-S(\tilde\phi)}$, with the action defined in Eq. (31).

Note that the first two maps, $F_1$ and $F_2$, factor into one-dimensional maps, which can be obtained straightforwardly via the prescription following Eq. (27). Accordingly, $F_1$ can be written as
$$F_1(\phi) = \Pi^{-1}\circ P, \quad\text{with} \tag{34}$$
$$\Pi(\phi) = \frac{1}{2} + \frac{1}{2}\left(1 - \frac{\Gamma\left[1/4, \phi^4\right]}{\Gamma(1/4)}\right)\operatorname{sgn}\phi, \tag{35}$$
$$P(\phi) = \frac{1}{2}\left(1 + \operatorname{Erf}\left(\phi/\sqrt{2}\right)\right). \tag{36}$$
Above, $\Gamma(x)$ is the gamma function, $\Gamma(s, x)$ is the upper incomplete gamma function, and $\operatorname{Erf}(x)$ is the error function.

The second map is given by simply multiplying $\phi$ by $\Lambda_i^{-1/4}$. This is a rotation of the complex plane on most of the lattice, with an additional scaling factor of $2^{1/4}$ on the corners of the Schwinger-Keldysh contour. Thus the map $F_2$ is defined as
$$F_2(\phi) = \phi/\Lambda_i^{1/4}. \tag{37}$$
The map $F_3$ is where the strong coupling expansion is performed. The differential equation for $\delta^{(1)}$ is of the form of Eq. (30), with $\mathcal{O} = \sum_{ij}\phi_i M_{ij}\phi_j$. The expectation value $\langle\mathcal{O}\rangle$ must now be evaluated with respect to the leading-order action $S_0 = \sum_i \Lambda_i\phi_i^4$. Expressed in terms of $f_i = \delta^{(1)}_i(\phi)\, e^{-\Lambda_i\phi_i^4}$, and using the fact that $\langle\phi_i\phi_j\rangle$ vanishes when $i \neq j$ (in the strong coupling limit), the differential equation reads
$$\frac{\partial f_i}{\partial\phi_i} e^{\Lambda_i\phi_i^4} - \sum_j M_{ij}\phi_i\phi_j = -M_{ii}\langle\phi_i^2\rangle. \tag{38}$$
The expectation value required is $\langle\phi_i^2\rangle = \Gamma(3/4)/\left(\Gamma(1/4)\sqrt{\Lambda_i}\right)$. Using the fact that only diagonal and nearest-neighbor terms of $M$ are non-zero, the solution is
$$\delta^{(1)}_i(\phi) = e^{\Lambda_i\phi_i^4} M_{ii}\left[-\frac{\phi_i^3\,\Gamma\left[\tfrac{3}{4}, \Lambda_i\phi_i^4\right]}{4(\Lambda_i\phi_i^4)^{3/4}} + \langle\phi_i^2\rangle\frac{\phi_i\,\Gamma\left[\tfrac{1}{4}, \Lambda_i\phi_i^4\right]}{4(\Lambda_i\phi_i^4)^{1/4}}\right] + \sum_{j\in\{i-1,i+1\}} e^{\Lambda_i\phi_i^4}\frac{\sqrt{\pi}}{4\sqrt{\Lambda_i}}\left[\operatorname{Erf}\left(\sqrt{\Lambda_i}\,\phi_i^2\right) - C\right] M_{ij}\phi_j. \tag{39}$$
Above, $(\cdot)^{1/4}$ refers specifically to the principal fourth root. A specific choice of $C = 1$ gives a solution which vanishes at $\psi_i \to \infty$ and is oscillation-free.

Finally, $F_4$ rescales the field by a factor of $\lambda^{1/4}$:
$$F_4(\phi) = \phi/\lambda^{1/4}. \tag{40}$$
Putting it all together, the entire perturbative flow from $e^{-\sum_i\psi_i^2/2}$ to $e^{-S(\phi)}$ at the leading order is
$$\phi + \Delta^{(1,\mathrm{strong})}(\phi) = \left[F_4\circ F_3\circ F_2\circ F_1\right](\phi). \tag{41}$$

FIG. 2. Simulations with the normalizing flow computed in the strong-coupling expansion to leading order. On the left, the resulting sign problem is computed on a lattice with $N_\beta = 2$, $n_t = 5$, $m = 0.5$, as a function of the coupling $\lambda$. The real-time correlator $\langle\phi(t)\phi(0)\rangle$ is shown on the right, at the same temperature and with m = 0 . λ = 0.33. The solid lines labelled ‘Exact’ include the same Trotterization errors present on the Schwinger-Keldysh lattice.

The left panel of Fig. 2 shows the average phase obtained by this flow on a 12-site lattice with $m = 0.5$, as the coupling is varied. As expected, at strong coupling, the sign problem is almost entirely removed, whereas at sufficiently small coupling the average phase is too small to be distinguished from zero. As a check of the correctness and convergence of the flow, the right panel of the same figure shows the real-time correlator obtained with m = 0 . λ = 0.33, compared with an exact Hamiltonian calculation. The lattice behind this calculation has 14 sites (two thermal links and six temporal links in each direction), and the average phase was computed to be ⟨σ⟩ = 0 .

B. Extracting Expectation Values
When the action $S_0$ defines a probability distribution from which sampling can be performed efficiently, Eq. (30) provides a means to approximately sample from the perturbed distribution defined by $S$. However, that equation is still valid when $S_0$ is itself hard to sample from (and perhaps afflicted with a sign problem). Any vector field $\Delta_i$ corresponds to some observable $\mathcal{O}$ whose expectation value is known; for a fixed desired observable, a numerical solution for $\Delta$ can be attempted, which will automatically yield the expectation value.

The differential equation is in a number of dimensions equal to the number of sites on the lattice. Neural networks have been profitably applied to solving such high-dimensional differential equations [24]. Our strategy is as follows. A multi-layer perceptron (MLP), with parameters labelled $W$, will represent $\Delta$ as a function of the (real) fields $\phi$; a single additional training parameter $E$ represents $\langle\mathcal{O}\rangle$. We train these parameters with respect to the cost function
$$C(W, E) = \int d\phi\, e^{-\phi^2/2}\left|\nabla\cdot\Delta_W(\phi) - \Delta_W(\phi)\cdot\nabla S(\phi) - E + \mathcal{O}(\phi)\right|^2, \tag{42}$$
which is estimated by randomly sampling from the Gaussian distribution in $\phi$.

The left panel of Fig. 3 showcases this method on 0+1-dimensional scalar field theory, equivalent to the model of Eq. (3) with no real-time evolution. The lattice parameters are m = 0 . β = 10; expectation values are given as a function of the coupling. A two-layer MLP is used with hyperbolic tangent as an activation function. The method is seen to have reasonable agreement with the exact answer across all couplings.

This method has serious drawbacks. Most importantly, it represents an uncontrolled approximation.
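A Monte Carlo estimate of the cost function of Eq. (42) can be sketched as follows. This is only an illustration under several assumptions: a single hidden layer (rather than the two-layer network used for Fig. 3), an analytically computed divergence, a free test action, and hypothetical function names. The gradient-based training of $W$ and $E$ is not shown.

```python
import numpy as np

def mlp(params, phi):
    """One-hidden-layer tanh network representing the vector field Delta_W(phi)."""
    W1, b1, W2, b2 = params
    return W2 @ np.tanh(W1 @ phi + b1) + b2

def mlp_divergence(params, phi):
    """Exact divergence sum_i d Delta_i / d phi_i of the network above."""
    W1, b1, W2, b2 = params
    t = np.tanh(W1 @ phi + b1)
    # The Jacobian is W2 diag(1 - t^2) W1; the divergence is its trace.
    return np.sum((W2.T * W1).sum(axis=1) * (1.0 - t**2))

def cost(params, E, grad_S, observable, n_sites, n_samples=2000, seed=0):
    """Sample-mean estimate of Eq. (42) with phi drawn from a unit Gaussian."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for phi in rng.normal(size=(n_samples, n_sites)):
        r = (mlp_divergence(params, phi) - mlp(params, phi) @ grad_S(phi)
             - E + observable(phi))
        total += abs(r)**2
    return total / n_samples

# Example: free action S = phi.phi / 2 and observable O = phi.phi on 3 sites.
n_sites, hidden = 3, 8
rng = np.random.default_rng(1)
params = (rng.normal(size=(hidden, n_sites)), rng.normal(size=hidden),
          0.1 * rng.normal(size=(n_sites, hidden)), np.zeros(n_sites))
grad_S = lambda phi: phi
observable = lambda phi: phi @ phi
print("cost at random parameters:", cost(params, 3.0, grad_S, observable, n_sites))
```

Because the samples are drawn from the Gaussian rather than from $e^{-S}$, a small value of this estimate measures the quality of the fit only with respect to the Gaussian weight.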
The error in the estimation of $\langle\mathcal{O}\rangle$ is due to the shortcomings of the ansatz used for $\Delta$, rather than insufficient statistics; therefore, this error can neither be estimated with a bootstrap nor removed with a larger number of samples.

In the case of the Schwinger-Keldysh action with $N_t > 0$, another issue emerges. The most natural cost function for training $\Delta$ would be
$$C'(W, E) = \int d\phi\, e^{-S}\left|\nabla\cdot\Delta_W(\phi) - \Delta_W(\phi)\cdot\nabla S(\phi) - E + \mathcal{O}(\phi)\right|^2. \tag{43}$$
The task of evaluating this cost function itself has a sign problem; estimating its derivatives with respect to the MLP parameters has a related signal-to-noise problem. Training with respect to the cost function of Eq. (42) has no such difficulty, but a small value of that cost function does not imply that $\Delta$ is a good approximation in terms of the ‘true’ cost function. This mismatch allows an apparently good fit to correspond to very inaccurate expectation values.

The right panel of Fig. 3 showcases this failure more clearly, in the case of $N_\beta = 2$, for various real-time evolutions $N_t$. Shown is the expectation value $\langle\phi(t)\phi(0)\rangle$ for several time separations $t$, at an inverse temperature of $\beta = 2$, with lattice parameters m = 0 . g = 0 .

VI. FURTHER DISCUSSION
This paper introduced the notion of complex normalizing flows, which extend the applicability of normalizing flow-based methods to (some) models afflicted with a sign problem. Unlike real normalizing flows, which always exist, complex normalizing flows exist only when an integration contour is available which exactly removes the sign problem. Approximate normalizing flows can be constructed in perturbation theory, but low-order perturbative approximations were found to not have a tractable sign problem on lattices with more than ∼20 sites.

The general question of when sign problem-solving integration contours exist remains open. For simple fermionic models, they can be readily shown not to exist, although the possibility remains that the manner of integrating out fermions could be changed to remedy this. Simple models inspired by the Schwinger-Keldysh action do have contours which exactly solve the sign problem. Conjecturally, this is a generic feature of polynomial actions.

The Schwinger-Keldysh sign problem for scalar fields comes from a polynomial action, but this is far from an unusual property. Consider the case of $SU(N)$ gauge theories in the absence of fermions. The complexification of $SU(N)$ is the group of complex $N \times N$ matrices $U$ obeying $\det U = 1$, $SL(N; \mathbb{C})$. On this space, the standard Wilson action and its many improvements can be written holomorphically as a polynomial of $U$ and $U^\dagger$ (with $U^\dagger$ continued as $U^{-1}$). Cramer’s rule for the inversion of matrices provides an expression for $U^{-1}$ in terms of the elements of $U$, in which the only non-polynomial factor is $(\det U)^{-1}$. In the case of $SL(N; \mathbb{C})$ matrices, this factor is always 1 and can be neglected. What remains in the action is a polynomial of the field variables.

We argued that contours always exist which locally solve the sign problem, by describing an iterative procedure for removing local fluctuations in Im S, and studying the properties of its fixed points. Analogous to the Lefschetz thimbles, a fixed-point manifold of this procedure can be decomposed into several smooth pieces, separated from each other by singularities of the physical action. The key difference is that on this manifold, the imaginary part of the effective action is constant, rather than the imaginary part of the physical action.

FIG. 3. Evaluation of expectation values via the machine learning method of Sec. V B. On the left, a 10-site lattice with no real-time evolution, with m = 0 . β = 2, m = 0.5, and λ = 0.5. In both cases, exact results are shown by the solid line.

If the evidence presented earlier is taken at face value, it is likely that manifolds exist that resolve the real-time bosonic sign problem. It is critical to note that this is not equivalent to a solution to that sign problem. Difficulties could still exist with the computational task of finding such manifolds, or there may be no efficient algorithms for sampling from them. In fact, in the case of a real-time sign problem with arbitrary time-dependent source terms (which is still a polynomial action), this is the expected result: it was shown in [25] that the task of computing amplitudes in such a context is BQP-hard.

ACKNOWLEDGMENTS
We are indebted to Andrei Alexandru, Tom Cohen, Frederic Koehler, and Henry Lamm for many useful discussions. We are additionally grateful to Henry Lamm for reviewing a previous version of this manuscript. S.L. is supported by the U.S. Department of Energy under Contract No. DE-SC0017905. Y.Y. is supported by the U.S. Department of Energy under Contract No. DE-FG02-93ER-40762 and by the Jefferson Science Associates 2020-2021 graduate fellowship program.

Footnote to Sec. VI: In fact, the regions on the real plane of constant $\mathrm{Im}\,S_f$ are thimbles of the fixed-point effective action.

[1] M. Albergo, G. Kanwar, and P. Shanahan, Flow-based generative models for Markov chain Monte Carlo in lattice field theory, Phys. Rev. D , 034515 (2019), arXiv:1904.12072 [hep-lat].
[2] G. Kanwar, M. S. Albergo, D. Boyda, K. Cranmer, D. C. Hackett, S. Racanière, D. J. Rezende, and P. E. Shanahan, Equivariant flow-based sampling for lattice gauge theory, Phys. Rev. Lett. , 121601 (2020), arXiv:2003.06413 [hep-lat].
[3] D. Boyda, G. Kanwar, S. Racanière, D. J. Rezende, M. S. Albergo, K. Cranmer, D. C. Hackett, and P. E. Shanahan, Sampling using SU(N) gauge equivariant flows, (2020), arXiv:2008.05456 [hep-lat].
[4] K. A. Nicoli, S. Nakajima, N. Strodthoff, W. Samek, K.-R. Müller, and P. Kessel, Asymptotically unbiased estimation of physical observables with neural samplers, Phys. Rev. E , 023304 (2020), arXiv:1910.13496 [cond-mat.stat-mech].
[5] K. A. Nicoli, C. J. Anders, L. Funcke, T. Hartung, K. Jansen, P. Kessel, S. Nakajima, and P. Stornati, Estimation of Thermodynamic Observables in Lattice Field Theories with Deep Generative Models, Phys. Rev. Lett. , 032001 (2021), arXiv:2007.07115 [hep-lat].
[6] M. Cristoforetti, F. Di Renzo, and L. Scorzato (AuroraScience), New approach to the sign problem in quantum field theories: High density QCD on a Lefschetz thimble, Phys. Rev. D , 074506 (2012), arXiv:1205.3996 [hep-lat].
[7] A. Alexandru, G. Basar, P. F. Bedaque, G. W. Ridgway, and N. C. Warrington, Sign problem and Monte Carlo calculations beyond Lefschetz thimbles, JHEP , 053, arXiv:1512.08764 [hep-lat].
[8] A. Alexandru, G. Basar, P. F. Bedaque, G. W. Ridgway, and N. C. Warrington, Monte Carlo calculations of the finite density Thirring model, Phys. Rev. D , 014502 (2017), arXiv:1609.01730 [hep-lat].
[9] Y. Mori, K. Kashiwa, and A. Ohnishi, Application of a neural network to the sign problem via the path optimization method, PTEP , 023B04 (2018), arXiv:1709.03208 [hep-lat].
[10] Y. Mori, K. Kashiwa, and A. Ohnishi, Toward solving the sign problem with path optimization method, Phys. Rev. D , 111501 (2017), arXiv:1705.05605 [hep-lat].
[11] A. Alexandru, P. F. Bedaque, H. Lamm, S. Lawrence, and N. C. Warrington, Fermions at Finite Density in 2+1 Dimensions with Sign-Optimized Manifolds, Phys. Rev. Lett. , 191602 (2018), arXiv:1808.09799 [hep-lat].
[12] A. Alexandru, P. F. Bedaque, H. Lamm, and S. Lawrence, Finite-Density Monte Carlo Calculations on Sign-Optimized Manifolds, Phys. Rev. D , 094510 (2018), arXiv:1804.00697 [hep-lat].
[13] A. Alexandru, G. Basar, and P. Bedaque, Monte Carlo algorithm for simulating fermions on Lefschetz thimbles, Phys. Rev. D , 014504 (2016), arXiv:1510.03258 [hep-lat].
[14] A. Alexandru, P. F. Bedaque, H. Lamm, and S. Lawrence, Deep Learning Beyond Lefschetz Thimbles, Phys. Rev. D , 094505 (2017), arXiv:1709.01971 [hep-lat].
[15] J.-L. Wynen, E. Berkowitz, S. Krieg, T. Luu, and J. Ostmeyer, Leveraging Machine Learning to Alleviate Hubbard Model Sign Problems, (2020), arXiv:2006.11221 [cond-mat.str-el].
[16] A. Alexandru, G. Basar, P. F. Bedaque, S. Vartak, and N. C. Warrington, Monte Carlo Study of Real Time Dynamics on the Lattice, Phys. Rev. Lett. , 081602 (2016), arXiv:1605.08040 [hep-lat].
[17] A. Alexandru, G. Basar, P. F. Bedaque, and G. W. Ridgway, Schwinger-Keldysh formalism on the lattice: A faster algorithm and its application to field theory, Phys. Rev. D , 114501 (2017), arXiv:1704.06404 [hep-lat].
[18] A. Alexandru, G. Başar, P. F. Bedaque, H. Lamm, and S. Lawrence, Finite Density QED Near Lefschetz Thimbles, Phys. Rev. D , 034506 (2018), arXiv:1807.02027 [hep-lat].
[19] S. Lawrence, Sign Problems in Quantum Field Theory: Classical and Quantum Approaches, Ph.D. thesis, Maryland U. (2020), arXiv:2006.03683 [hep-lat].
[20] S. Lawrence, Beyond Thimbles: Sign-Optimized Manifolds for Finite Density, PoS LATTICE2018 , 149 (2018), arXiv:1810.06529 [hep-lat].
[21] T. Cohen, personal communication.
[22] C. Villani, Topics in optimal transportation, 58 (American Mathematical Soc., 2003).
[23] S. Lawrence, Perturbative Removal of a Sign Problem, Phys. Rev. D , 094504 (2020), arXiv:2009.10901 [hep-lat].
[24] J. Han, A. Jentzen, and E. Weinan, Solving high-dimensional partial differential equations using deep learning, Proceedings of the National Academy of Sciences , 8505 (2018).
[25] S. P. Jordan, H. Krovi, K. S. Lee, and J. Preskill, BQP-completeness of scattering in scalar quantum field theory, Quantum 2