[PDF] Multigrid for Chiral Lattice Fermions: Domain Wall

Abstract

Critical slowing down for the Krylov Dirac solver presents a major obstacle to further advances in lattice field theory as it approaches the continuum solution. We propose a new multi-grid approach for chiral fermions, applicable to both the 5-d domain wall or 4-d Overlap operator. The central idea is to directly coarsen the 4-d Wilson kernel, giving an effective domain wall or overlap operator on each level. We provide here an explicit construction for the Shamir domain wall formulation with numerical tests for the 2-d Schwinger prototype, demonstrating near ideal multi-grid scaling. The framework is designed for a natural extension to 4-d lattice QCD chiral fermions, such as the Möbius, Zolotarev or Borici domain wall discretizations or directly to a rational expansion of the 4-d Overlap operator. For the Shamir operator, the effective overlap operator is isolated by the use of a Pauli-Villars preconditioner in the spirit of the Kähler-Dirac spectral map used in a recent staggered MG algorithm [1].

Full PDF

MMultigrid for Chiral Lattice Fermions: Domain Wall

Richard C. Brower * , M. A. Clark † , Dean Howarth * , and Evan S.Weinberg † * Boston University, Boston, MA 02215, USA † NVIDIA Corporation, Santa Clara, CA 95050, USAApril 17, 2020

Abstract

Critical slowing down for the Krylov Dirac solver presents a major obstacle tofurther advances in lattice ﬁeld theory as it approaches the continuum solution. Wepropose a new multi-grid approach for chiral fermions, applicable to both the 5-ddomain wall or 4-d Overlap operator. The central idea is to directly coarsen the 4-dWilson kernel, giving an eﬀective domain wall or overlap operator on each level. Weprovide here an explicit construction for the Shamir domain wall formulation withnumerical tests for the 2-d Schwinger prototype, demonstrating near ideal multi-gridscaling. The framework is designed for a natural extension to 4-d lattice QCD chi-ral fermions, such as the M¨obius, Zolotarev or Borici domain wall discretizationsor directly to a rational expansion of the 4-d Overlap operator. For the Shamiroperator, the eﬀective overlap operator is isolated by the use of a Pauli-Villars pre-conditioner in the spirit of the K¨ahler-Dirac spectral map used in a recent staggeredMG algorithm [1]. a r X i v : . [ h e p - l a t ] A p r ontents L s . . . . . . . . . . . . . . . . . . . . . . . . . . 27 Introduction

Increasingly powerful computers and better theoretical insights continue to improve thepredictive power of lattice quantum ﬁeld theories, most spectacularly for lattice quantumchromodynamics (LQCD) [2]. However, with larger lattice volumes and ﬁner lattice spac-ing exposing multiple scales, the lattice Dirac solver becomes increasingly ill-conditionedthreatening further progress. The cause is well known: as the fermion mass approacheszero the Dirac operator becomes singular due to the exact chiral symmetry of the Diracequation at zero mass, causing critical slowing down [3]. The algorithmic solution to thisproblem for lattice QCD was recognized 30 years ago [4]: the ﬁne grid representation forthe linear solver should be coupled to multiple scales on coarser grids in the spirit of Wil-son’s real space renormalization group and implemented as a recursive multi-grid (MG)preconditioner [5]. Early investigations in the 1990’s introduced a gauge-invariant produc-tive MG algorithm [6, 7] with encouraging results for the Dirac operator in the presenceof weak (or smooth) background gauge ﬁelds near the continuum. However, in practicelattice sizes at that time were too small and the gauge ﬁelds were too rough to achieveuseful improvements.It was not until the development of adaptive geometric MG methods [8, 9] that afully recursive MG algorithm, capable of projecting strong background chromodynamicsﬁelds onto coarser scales, was found for the Wilson Dirac discretization. However thereare two other discretizations, referred to as staggered [10] and domain wall [11] fermions,that are used extensively in high energy applications that more faithfully represent chiral symmetry on the lattice. The extension of adaptive geometric MG to these discretizationshas proven to be more diﬃcult, perhaps related to the improved lattice chiral symmetry.There has been progress on a two-level MG algorithm for domain wall fermions [12, 13, 14]and a non-Galerkin algorithm for the closely related overlap operator [15, 16]. Recentlyan adaptive geometric multigrid algorithm for staggered fermions was discovered based ona novel pre-conditioner inspired by the K¨ahler-Dirac spin structure [17, 18].Here we propose a new approach to the domain wall discretization which leverages,at least on a heuristic level, features developed from both the Wilson and staggered MGmethods. We hope that a comparison of methods will lead to new optimizations acrossthe full set of discretizations. The design strategy of our domain wall MG algorithmconsists of trying to separate the 4-d physical subspace of low modes found in the eﬀective4-d overlap operator from the larger 5-d domain wall vector space. This procedure isconveniently enumerated in 3 steps: 2.

Approximate Pauli-Villars preconditioning of the domain wall operator [19].ii.

Wilson kernel MG projection on the domain wall and Pauli-Villars factors [9].iii.

Truncated projection/prolongation restricted to the domain wall boundary.The salient features of each step are: i.)

The exact Pauli-Villars inverse D − P V ,which is a perfect map from the DW spectrum onto overlap, is well approximated by theapplication of the Pauli-Villars adjoint, D † P V . ii.) A Galerkin MG coarsening is appliedto the 4-d Wilson kernel on each extra-dimensional slice separately for both the domainwall and Pauli-Villars factors. The null space projection is formulated entirely from the4-d Wilson kernel and does not scale with the size of the extra dimension. iii.)

Finally,within the multigrid cycle, the residual coarsening and error interpolation is restricted tothe domain wall, which in turn allows the extent of the extra dimension of the coarse-leveloperator to be reduced.We again follow the successful development strategy for the Wilson [8] and stag-gered [1] MG algorithms by using the two-ﬂavor lattice Schwinger model [20, 21] as aprototype for exploration and testing. The reader is referred to Fig. 3.1 and the accom-panying Table 2 for a concise summary of the performance of our domain wall algorithmfor the 2-d Schwinger model. In Sec. 2, the underlying motivation and formalism is given.For simplicity, the discussion is restricted to the Shamir domain wall operator [22]. This isfollowed in Sec. 3 by the details of the implementation and benchmarks for our prototype2-d Schwinger model. Care is taken to present the formalism in a dimension-agnostic formto accommodate extensions from 2-d to 4-d gauge theories. In Sec. 5 we conclude by notingthat our core developments not only apply to the 4-d Shamir formulation presented herebut also to the M¨obius [23, 24], Borici [25, 26], and Zolotarev [27, 28] formulations, as wellas directly to the overlap operator approximation to the sign function [15, 29, 30, 31].

All lattice discretizations of the Dirac operator seek to rapidly approach the continuumDirac operator, Dψ ( x ) = γ µ ( ∂ µ − iA µ ( x )) ψ ( x ) + mψ ( x ) , (2.1)as the lattice spacing vanishes. The continuum operator is a ﬁrst derivative, an anti-Hermitian operator, thus the spectrum is imaginary indeﬁnite except for a small real shiftfor m >

0. It obeys an exact chiral symmetry at zero mass ( m = 0). The Wilson3iscretization, D W ( U, m ) x,y = − − γ µ U µ ( (cid:126)x ) δ x + µ,y − γ µ U † µ ( (cid:126)x − ˆ µ ) δ x − µ,y + ( d + m ) δ x,y , (2.2)introduces an anti-Hermitian “na¨ıve” ﬁrst diﬀerence and adds a Hermitian second-diﬀerence(or so-called Wilson term) to lift the doublers to the cut-oﬀ scale π/a at the expense ofexplicitly violating lattice chiral symmetry at O ( a ) in lattice spacing. The Wilson latticeoperator then requires ﬁne tuning of the bare quark in order to restore chiral symmetryin the continuum limit.The chiral overlap [15, 32] and domain wall [22] fermions, beyond their remarkablephysical properties, have the feature that the Wilson kernel can be re-purposed. In thedomain wall (DW) approach, the Wilson kernel is present in an extra dimension separating4-d domain walls by a lattice of length L s . Suppressing the four-dimensional indices, theDW operator is given by D DW ( m ) s (cid:48) s =  D W ( M ) + 1 P − · · · − mP + P + D W ( M ) + 1 P − · · · P + D W ( M ) + 1 · · · ...... ... ... . . . P − − mP − · · · P + D W ( M ) + 1  (2.3)where P ± = (1 ± γ ). The indices s, s (cid:48) = 1 , · · · L s label 4-d blocks in the extra ﬁfthdimension (or d+1 dimension). The bulk mass, M < −

1, is tachyonic. The physical baremass parameter is encoded by the boundary parameter m .In the limit of L s → ∞ an exact lattice chiral symmetry appears up to an explicitfermion mass gap given by m q = m − m (cid:39) m as m → . (2.4)The result is that propagators between the domain walls are described by the eﬀective 4-doverlap operator proposed by Neuberger [15, 33] with the deformed chiral algebra of theGinsparg-Wilson identity [34], γ D − ov + D − ov γ = O ( a ) , (2.5)at zero quark mass. The explicit spectral map from the domain wall to the overlap operatorwill be presented in Sec. 2.2, following closely the notation in [19] which we will refer toas BNO . This spectral map between domain wall and overlap operators plays a central rolein our DW MG algorithm. 4 − . − − . . . − I m ( λ ) Re ( λ ) Eigenvalues, , β = 10 . , m = 0 . , L s = 8 D DW Figure 2.1: The spectrum of the domain wall operator lacks the continuum chiral lowmodes; instead there are 2 d/ L s of them (2 d/ coming from the spin degrees of freedom) ina circle around zero in the complex plane. The domain wall operator encodes chiral symmetry in a subtle and indirect fashion. Thefull spectrum of the domain wall operator in Eq. 2.3, illustrated in Fig. 2.1 for two dimen-sions, does not have the expected small eigenvalues of the continuum as youapproach the chiral limit , but instead has O ( L s ) small eigenvalues. This can easily beseen in the free limit ( U = 1) which at zero momentum has the spectrum, λ n = m /L s e i (2 n + 1) π/L s for n = 0 , , ..., L s − . (2.6)In the exact chiral limit, ﬁrst taking L S → ∞ followed by m →

0, this operator has nozero modes. Instead they form a unit circle around the origin in the complex plane. Thisfeature persists when gauge ﬁelds are turned on as illustrated in Fig. 2.1 for two dimensionswith Abelian gauge ﬁelds.The DW operator features further issues. Unlike with staggered or Wilson fermions,5he DW operator is dramatically non-normal ([ D DW ( m ) , D † DW ( m )] (cid:54) = 0), even in the exactfree ﬁeld. The spectrum does not satisfy the half-plane eigenvalue condition with positivereal values ( Re [ λ ] > r centered at a complex point c ∈ C . For GMRES methods,one can show the relative residual on iteration n of a GMRES method is bounded by | r/c | n [35].A standard method to solve the domain wall linear system is to replace D DW ( m ) Ψ = b (2.7)with the normal system, D † DW ( m ) D DW ( m ) Ψ = D † DW ( m ) b . (2.8)This system has multiple beneﬁts but also complications.The normal operator encodes a single low mode with, in the free-ﬁeld limit, eigenval-ues O ( m ). The low modes are “bound” to the domain wall, as can be visually inspectedby looking at the proﬁle of the eigenvectors (the singular vectors of the domain wall op-erator) in the bulk dimension. Both of these properties form a stark contrast with D DW ;the normal operator transforms the physics of the domain wall operator into as a singlechiral fermion below the cut-oﬀ.From a numerical standpoint, the normal operator is Hermitian positive deﬁnite(HPD) and can be solved eﬃciently by traditional Krylov methods, e.g., Conjugate Gradi-ent. Further, it is amenable to deﬂation with eigenpairs generated via an eﬃcient Lanczosprocess. Also as a Hermitian positive deﬁnite matrix, the solver for this normal oper-ator can be implemented as a traditional MG algorithm [12, 13]. However, a numeri-cal implementation of the coarsened normal operator, a distance-two stencil, requires anon-trivial increase in computation and communication relative to a distance-one sten-cil [36, 37, 38, 39]. This owes in large part to a far more complicated gather pattern dueto around-the-corner terms, which the original ﬁne level original normal operator avoidsby being the product of two distance-one operators.One solution to this issue is to recognize that this normal operator is the square ofa distance-one “ γ ” Hermitian operator, D † DW ( m ) D DW ( m ) = (Γ D DW ( m )) , (2.9)6here Γ = γ R is a product of γ and the reﬂection in the extra dimension by R ss (cid:48) = δ ( s + s (cid:48) − L s , . This operator has an indeﬁnite real spectrum similar to the imaginaryspectra in the continuum and staggered operator on the lattice. Appealingly, the operatoralso has a single chiral mode with eigenvalues of O ( m ). However, this operator itself has itsown frustrations: while it can be coarsened, it develops spurious small eigenvalues similarto with a na¨ıve approach to staggered fermions, which was shown in [1] to harm a fullyrecursive algorithm.The Γ D DW operator leads itself to a clean interpretation of spurious low modes viaa free-ﬁeld analysis. The low modes of Γ D DW do not include support on the bulk; assuch, a “coarsening” inspired by the low modes eliminate it. In this case, the coarsenedoperator can be shown to be γ D na¨ıve with well understood extra low modes. This isactually a foretelling of a salient feature of our algorithm: the bulk dimension cannot betrivially eliminated. As a last concern, this approach does not generalize beyond Shamirdomain wall fermions: the fully general M¨obius formulation has a Wilson kernel inverse aspart of its deﬁnition of Γ . Our new approach seeks to avoid these diﬃculties base on thespectral map form the domain wall and to the overlap representations in BNO , combinedwith methods borrowed from prior multigrid algorithms for Wilson [8] and staggered [1]discretizations.

One may view the Pauli-Villars operator, D P V ≡ D DW (1) as a left preconditioner of thedomain wall operator in the linear system D − P V D DW ( m )Ψ = D − P V b . (2.10)We will show that the Pauli-Villars operator is an ideal, albeit expensive, preconditionerand that even a simple approximation to the inverse dramatically accelerates convergence.This is accomplished via the generalized eigenmode problem, D DW ( m )Ψ λ = λD P V Ψ λ ,which separates the low chiral generalized eigenvectors ( λ (cid:39)

0) bound to the walls at s = 1 and s = L s , ψ x = 12 (1 − γ )Ψ x, + 12 (1 + γ )Ψ x,L s , (2.11)from the high bulk modes at the cut-oﬀ: λ = O ( π/a ).To see this explicitly, it is convenient as in BNO to ﬁrst move both walls to s = 1 by7ntroducing a cyclic permutation of the negative chiral modes at s = L s to s = 1 by P s (cid:48) s =  P − P + · · · P − P + · · · · · · P + P + · · · P −  s (cid:48) s , (2.12)with P ± = (1 ± γ ). This deﬁnes a unitary transformation of our domain wall operators, D DW ( m ) → P † D DW ( m ) P . (2.13)Following the derivation via LDU transformation in BNO , the preconditioned matrix in this permuted chiral basis , K DW ( m ) = ( D P V P ) − D DW ( m ) P =  D ov ( m ) 0 0 · · · · · · − (1 − m )∆ · · · − (1 − m )∆ · · · − (1 − m )∆ · · · − (1 − m )∆ L s · · · · · ·  . (2.14)is mapped into a block diagonal form [19] in the extra dimension. This remarkable identityis the central observation for our preconditioned multigrid algorithm. The eﬀective overlapoperator block, K DW , ( m ) = D ov ( m ), has all the non-trivial low eigenvalues. The additionalextra heavy modes are mapped exactly to unit eigenvalues (or in physical units at the1 /a ), irrespective of L s , lattice spacing (here scaled to a = 1) and gauge interactions.Parenthetically, we should acknowledge that this general mechanism to isolate thelow domain wall modes in an eﬀective overlap block is well known and it is the fundamentalinsight to chiral lattice fermions [32]. In particular the Monte Carlo sampling of the pathintegral must divide by the Pauli-Villars determent to give a ﬁnite determinant ratio,det[ K ] = det[ D DW ( m )] / det[ D DW (1)] in the continuum. As explained by Kaplan andSchmaltz in [40] in an elegant exposition based on a kinematic super symmetry cancellationbetween the bulk fermion and the bosonic Pauli-Villars pseudofermions, broken only bydomain wall boundary, to give rise to boundary chiral modes via the Callan-Harvey descentrelations [41]. 8 . − . . . . . . . . . . I m ( λ ) Re ( λ ) Eigenvalues of D − P V D DW , , β = 10 . , m = 0 . , L s = 8 D − P V D DW Figure 2.2: A representative spectrum of the preconditioned domain wall operator, D − P V D DW ( m ). Deviations from an exact circle exist but are qualitatively negligible. Notethe similarity to the K¨ahler-Dirac preconditioned staggered operator in Fig. 2.3.The block structure of K DW ( m ) lends itself to a structured block-inverse given by K − DW ( m ) =  D − ov ( m ) 0 0 · · · · · · − m )∆ · · · − m )∆ · · · − m )∆ · · · − m )∆ L s · · · · · ·  . (2.15)The identiﬁcation of the overlap propagator G ≡ D − ov ( m ) agrees with the practice ofcomputing DW propagators by solving the linear system, D DW ( m ) G = D P V b , from anarbitrary source on the wall b ∼ δ ,s . The heavy modes (0 , , · · · , , · · · ,

0) are static.On the other hand, the non-zero elements in the ﬁrst column show that the chiral modesbleed exponentially into the interior by factors, ∆ s +1 = T ∆ s = T s / (1 + T L ) in terms ofthe transfer matrix: T = (1 − H ) / (1 + H ). At ﬁnite L s the overlap operator is D ov ( m ) = 1 + m − m γ (cid:15) L [ H ] = m + (1 − m ) D ov (0) , (2.16)9ith (cid:15) L [ H ] = (1 − H ) L − (1 + H ) L (1 − H ) L + (1 + H ) L and H = γ D W ( M ) / (2 + D W ( M )) (2.17)in the Shamir implementation. In the limit L s → ∞ this becomes the exact sign function: (cid:15) L [ H ] → (cid:98) γ = sign [ H ]. The same spectral transformation into this sparse structurein Eq. 2.15 applies to other implementations of domain wall fermions (M¨obius, Borici,Zolotarev, etc). The modiﬁcations include a variation of the Hermitian kernel H and thefunctional (cid:15) L [ H ] that converge, (cid:15) L [ H ] → sign [ H ] to the sign function as L s → ∞ .We can trace the sparse block structure to the mass dependence on the dyadic struc-ture at the boundaries relating the Pauli-Villars operator to the domain wall operator, D P V = D DW ( m ) + (1 − m )  P + P −  ⊗ (cid:2) P − · · · P + (cid:3) (2.18)or D P V ≡ D DW ( m ) + (1 − m ) U V † . After applying the Sherman-Morrison-Woodburyformula [42], D − P V D DW ( m ) = 1 − D − DW ( m ) U (1 − m ) I + (1 − m ) V T D − DW ( m ) U V T , (2.19)and again considering the chiral basis, V T → V T P = (cid:2) · · · (cid:3) , we see a clearprojection onto the ﬁrst column in K DWs,s (cid:48) ( m ), reproducing the sparse structure in Eq. 2.14. Free-Field Limit:

The analysis of the free-ﬁeld ( U = 1) limit for the domain wall andPauli-Villars operators gives valuable insight and guidance to our MG construction, par-ticularly when examining the low spectra well below the UV cut-oﬀ scale, π/a . Qualitativeand even quantitative features survive the introduction of the gauge ﬁelds generated bylattice Monte Carlo methods.We begin by transforming the free Wilson kernel in Eq. 2.3 to momentum space, (cid:101) D W ( p µ ) = am + γ µ sin( ap µ ) + 2 sin ( ap µ /

2) = am + (cid:88) µ (1 − e − iγ µ ap µ ) . (2.20)where we introduce the expression on the right to emphasize the well known feature ofcircular arcs in the complex spectrum of the Wilson operator. This gaping separates the10oubler modes from the continuum modes as evident in Fig. 2.4 even with non-zero gaugeﬁelds turned on. The lattice spacing a has been introduced to identify physical low modes( | p | (cid:28) π/a ) relative to UV cut-oﬀ: O ( π/a ). The low spectrum, λ ± (cid:39) m + ± i (cid:112) p + ap / O ( ap ) Wilson term.This Fourier analysis had a straightforward generalization to the Pauli-Villars oper-ator, D P V , as the boundary conditions are antiperiodic in the ﬁfth dimension, (cid:101) D P V ( p µ , p ) = − e − iγ ap + (cid:101) D W ( p µ ) + (1 + M ) . (2.21)The boundary conditions restrict the bulk momentum p to half integer modes: p = π (2 n + 1) /L s . After setting the mass to its free-ﬁeld tachyonic value M = −

1, the lowmomentum expansion again has the familiar Wilson form ( iγ µ p µ + ( a/ p ) with ﬁrst“eye” displaced to a circle centered at λ = 1. Not surprisingly the free domain walloperator, D DW ( m ), which diﬀers by a fermion-mass dependence on the boundary (reducingto Dirichlet boundary conditions for m = 0), has a qualitatively similar spectra as seenin Fig. 2.1 even with non-zero gauge ﬁelds. Both have no small eigenvalues below thecut-oﬀ for free ﬁelds. (With interactions small eigenvalues occur when the topologicalcharge changes.)Turning to the normal equation, we see a dramatic diﬀerence. While the domainwall normal operator has chiral modes at m = 0, as we show in Appendix A, the normalequation for the Pauli-Villars operator is positive deﬁnite with a large gap from zero with singular values of order the cut-oﬀ: D P V D † P V = 1 + O ( p ) (2.22)The ﬁrst correction is O ( a p ) in physical units. Indeed more generally for M = − L s in the free limit that D † P V D P V is bounded from below by 1, i.e.,at the lattice cut-oﬀ scale: 1 /a . This feature suggest the usefulness of our approximatepreconditioning, D − P V D DW = D † P V ( D P V D † P V ) − D DW (cid:39) D † P V D DW (2.23)which avoids the expensive need to invert the Pauli-Villars operator. Even after gauge ﬁeldsare included, we note in Fig. 2.5 this approximation conforms well at small eigenvaluesincluding the O ( ap ) for the parabolic curvature in the complex plane.To explore this further, we summarize results from Appendix A, comparing the lowmomentum spectra for the exact overlap map, D − P V D DW , with the approximate map, D † P V D DW . At m = 0, the low spectrum for D − P V D DW is given by λ ± = ± i (cid:112) p + ap +11 ( a ). By using the shift identity, λ → λ = am + (1 − am ) λ as implicit in Eq. A.6, thelow spectrum for non-zero mass is λ ± = am ± i (1 − am ) (cid:112) p + a (1 − am ) p + O ( a ) . (2.24)which after a rescaling, λ ± → λ ± / (1 − am ) = am q ± i (cid:112) p + ap , is a Wilson-like dispersionrelation. The general expansion in powers of momentum p has a rather remarkableindependence on L s . As noted in Appendix A, a direct evaluation of the free overlapkernel in Eq. A.6 for any ﬁnite L s ≥ p which, up to O (( a p ) k )m isindependent of L s ≥ k . This invariance of the low momentum expansion with respect tosize of the extra dimension may explain the eﬃcacy of reducing the size of extra dimension L s on the coarse level iterations as documented in Sec. 4.2.Finally we compare the low mass spectra to our approximate preconditioned opera-tor, D † P V D DW ( m ), λ ± = am ± i (1 − am ) (cid:112) p + ap + O ( a ) . (2.25)relative to the exact preconditioner in Eq. 2.24. The only diﬀerence to quadratic orderoccurs at dimension 6 with a contribution O ( a mp ), helping to explain why a Wilson-likespectra in the overlap sector is preserved in Fig. 2.34, including the parabolic curvatureto O ( ap ). Similarity with the K¨ahler-Dirac Preconditioned Staggered Operator:

It is in-teresting to compare the Wilson and the overlap spectra with the preconditioned spectrumfor the staggered MG algorithm in Ref. [1]. The staggered lattice operator has the uniqueproperty, shared by the continuum, of being an exactly anti-Hermitian operator plus aconstant mass shift as illustrated for m = 0 by the vertical (red) spectra in Fig. 2.3. Boththe staggered and continuum operators are normal operators.At ﬁrst these similarities between the staggered operator and the continuum mayseem to be ideal for multigrid, but this turned out to be major obstacle to extending theGalerkin projection method used successfully for the Wilson MG algorithm [16] to thestaggered operator. The solution found in Ref. [1] was to ﬁrst precondition by dividing the anti-Hermitian staggered operator by the spin-taste K¨ahler-Dirac block, deforming thespectrum into one resembling the overlap spectrum or the ﬁrst “eye” of Wilson spectra inFig. 2.3.More speciﬁcally, this required writing the staggered operator as an “even/odd” 2 d block operator (i.e., 2 squares in 2d, 2 hypercubes in 4d) decomposed as a sum of block-local terms “ B ” and block-hopping terms “ C ” as described in Eq. 2.11 of [1]. Each of12 . − − . . . − . . . . . I m ( λ ) Re ( λ ) Eigenvalues, , free, m = 0 StaggeredKähler-Dirac Prec

Figure 2.3: The spectrum of the two-dimensional free, massless staggered operator before(vertical line) and after K¨ahler-Dirac preconditioning.these terms are separately anti-Hermitian operators with an indeﬁnite spectra. In thisformalism, the preconditioned operator is simply the block Jacobi preconditioned form, B − D stag . This maps the spectrum onto an exactly unitary circle in the free case, asshown in Fig. 2.3, resembling the exact overlap operator at L s = ∞ .The structural similarity to the the Pauli-Villars preconditioner is striking. Inthis case, we could have also started with a pair of anti-Hermitian indeﬁnite operators,“ i Γ D P V ” for the Pauli-Villars operator and “ i Γ D DW ( m )” for the domain wall operator.We can now formulate the Pauli-Villars preconditioned domain wall operator as D DW ( m ) → D − P V D DW ( m ) ≡ ( i Γ D P V ) − ( i Γ D DW ( m )) , (2.26)where we’ve made explicit that this preconditioning takes an imaginary indeﬁnite spectrumto a unitary circle, identical in form to Fig.2.2. Beyond the practical consequence of thisobservation, it is intriguing to ask why this is the case. It may hint of a unifying principlefor our multigrid algorithm common to all three major fermion discretizations in the chirallimit, which is worthy of additional investigation.13 − I m ( λ ) Re ( λ ) Eigenvalues, , free, m = 0 WilsonOverlap, m ker = − . Kähler-Dirac Prec

Figure 2.4: Comparison of the 2-d free eﬀective overlap and K¨ahler-Dirac preconditionedoperator with almost identical circles centered at λ = 1 overlaid with 2-d free Wilsonoperator with two doublers at λ = 2 and one at λ = 4. Pauli-Villars Preconditioning:

For the ﬁrst step we need to consider the ideal Pauli-Villars preconditioner. It is worth re-emphasizing the challenge and importance of pre-conditioning the domain wall operator. The domain wall operator on a d + 1 dimensionallattice increases the number of eigenvalues from N ev = 2 d/ × N c × L d by a factor of L s .The Pauli-Villars inverse spectra transform is an ideal preconditioner, putting the ”bulk” N ev × ( L s −

1) eigenvalues exactly at the cut-oﬀ 1 /a in physical units.Due to the Pauli-Villars operator having a maximally indeﬁnite spectrum, it is mostoptimally solved via the normal operator. Although the Pauli-Villars normal operator isextremely well conditioned with a positive real spectrum starting at 1 /a , its use as apreconditioner is still prohibitively expensive. Instead we consider an approximate Pauli-Villars preconditioner, D − P V = D † P V [ D P V D † P V ] − (cid:39) D † P V (2.27)14 − . .

51 0 0 . . . I m ( λ ) Re ( λ ) Eigenvalues, , β = 10 . , m = 0 . , L s = 8 D − P V D DW D † P V D DW Figure 2.5: The spectrum of our target multigrid operator, D † P V D DW , compared with theeﬀective overlap spectrum, D − P V D DW . For clarity of presentation we truncate the x-axis;the spectrum of D † P V D DW extends out to Re( λ ) ≈ (cid:16) D † P V D P V (cid:17) D − P V D DW | λ (cid:105) = re iθ | λ (cid:105) , ; (2.28)with − π/ < θ < π/ λ = r exp[ iθ ]. This is proven in Appendix Bbased on the positive deﬁnite spectra of the normal operator factor and the right-halfplane spectrum of D − P V D DW . This does imply Krylov solvers such as BiCGStabcan be directly applied to this approximate operator.

While Fig. 2.5 is consistent with this property, the qualitatively strong match betweenthe low eigenvalues of the two operators suggests we can make a much stronger statement.Indeed the two operators are nearly identical, with deviations conﬁned to larger eigenvaluesin the approach to the cut-oﬀ scale π/a . This is again motivated by the free ﬁeld limit15here we prove for M = − D P V D † P V = 1 + O ( p ) , (2.29)and as a result the spectrum of the approximation is valid up to O ( p ) corrections. Indeedin the free theory, the additive operator to 1 (or 1 /a in physical units) is positive deﬁnitefor all momenta. Further details on this can be found in Appendix A.We note that a point of future investigation could be approximating ( D P V D † P V ) − by a low-order polynomial in the normal operator as opposed to truncating it to 1. Giventhe success of the truncation to 1, it is unclear if higher order polynomials would be worththe additional computational burden. Wilson Kernel MG Projection:

In the second step , we introduce a coarseningprojection using the familiar Galerkin projecting developed for Wilson MG acting inde-pendently on the Wilson kernel for each of the s = 1 , , · · · L s slices, (cid:98) D W (cid:98) x, (cid:98) x (cid:48) = P † (cid:98) x,x D Wx,x (cid:48) P x (cid:48) , (cid:98) x (cid:48) or (cid:98) D W = P † D W P . (2.30)Here color and spin indices are implicit, and on the right we have further followed the con-vention of Wilson MG by suppressing the indices of the d-dimensional space-time lattice.The projection preserves γ so that γ P = P σ and P † (1 ± γ ) P = 1 ± σ . (2.31)With the normalization convention of the restrictor, P † P = I , giving the identity operatoron the coarse vector space and the fact that D DW ( m ) is a linear functional of the Wilsonkernel, we have (cid:98) D W = P † D W P = ⇒ (cid:98) D DW ( m ) = P † D DW ( m ) P . (2.32)with the implicit redeﬁning on the right of the restrictor as diagonal in s-space: P → P δ s,s (cid:48) .This notational slight of hand is common practice in the physics literature with tensorexpressions. For example in Eq. 2.31 we also implicitly redeﬁned γ as diagonal in colorand d-dimensional space-time.Of course this factorization also applies to the Pauli-Villars term and to generalizeddomain wall formulations such as M¨obius, Zolotarev etc. However this factorization does not apply to non-linear functional of the kernel such as the preconditioned product( P † D P V P † ) − P † D DW ( m ) P † (cid:54) = P † D − P V D DW ( m ) P † (2.33)16r the overlap operator. For these the kernel projection does not commute with theoperator. To make this clear we explicitly write the coarsened form of the domain walland Pauli-Villars operators, (cid:98) D DW ( m ) =  (cid:98) D W ( M ) + 1 (cid:98) P − · · · − m (cid:98) P + (cid:98) P + (cid:98) D W ( M ) + 1 (cid:98) P − · · · (cid:98) P + (cid:98) D W ( M ) + 1 · · · ...... ... ... . . . (cid:98) P − − m (cid:98) P − · · · (cid:98) P + (cid:98) D W ( M ) + 1  , (2.34)where (cid:98) P ± = (1 ± σ ). This coarsened operator has identical algebraic structure as origi-nal ﬁne-level Domain wall operator. So when applying a coarse Pauli-Villars preconditioner (cid:98) D − P V , the exact same algebraic manipulations for the

BNO factorization carry over. Againthe bulk modes are moved to the cut-oﬀ and “chiral” modes are conﬁned to the boundaryto form an eﬀective coarse (cid:98) D ov ( m ) operator. This recursive construction is the key elementfor our DW multigrid construction. Truncated projection/prolongation:

Lastly, in the third step , we deﬁne a con-vention for residual coarsening and error correction prolongation. One straightforwardapproach would be to coarsen and prolongate across all L s slices. Instead, we ﬁnd it ispossible and in fact advantageous to only restrict and prolong the boundary contributionfor the residual coarsening and the error correction, respectively. We assert this conventionhere and use it in general going forward, however we quantitatively study only acting onthe boundary as opposed to the entire bulk in Sec. 4.1.In the convention where the boundary is at the s = 1 slice, we deﬁne the projectionof residual and prolongation of the error on the boundary by (cid:98) r s = (cid:40) P † r for s = 10 for s > e s = (cid:40) P (cid:98) e for s = 10 for s > . (2.35)The s > Sec 3.2 Overlap to bulk Domain Wall reconstruction procedure inBNO [19]. It may be surprising, but this is still eﬀective despite the fact we are using (cid:98) D † P V in place of (cid:98) D − P V . We discuss this further below, but in brief, this can be motivatedas reasonable again by equivalence, D † P V ∼ D − P V , up to the quadratic order in momentain the free theory. 17ll of these elements come with a variety of potential parameters for optimizationon each level. For example since we are only transferring the residual and error correctionon a single slice we are also able to reduce the extra dimension on the coarse levels. Wewill denote L s for the ﬁrst coarsening as (cid:98) L s , on the second coarsening (cid:98)(cid:98) L s , and so on. Wewill use a reduced (cid:98) L s and (cid:98)(cid:98) L s going forward, however we quantitatively study varying thecoarser L s in Sec. 4.2. Summary:

The full MG algorithm for the linear system D DW x = b , formulated as anextension to a K-cycle (i.e., a multigrid cycle where each coarse solve is wrapped in aKrylov solver) is as follows: • Left precondition the system by D † P V , giving the new linear system D † P V D DW x = D † P V b (2.36) • Perform an MG-preconditioned iterative solve (via GCR, FGMRES, etc.) using theoperator D † P V D DW . – Relax on the current residual with D † P V D DW , known as the pre-smoother. – Go the next level with the projected Wilson kernel (cid:98) D W ( U, M ) = P † D W ( U, M ) P (2.37)to deﬁne coarse level operators, (cid:98) D DW and (cid:98) D P V , using Eq. 2.34. ∗ Project the residual on the wall using Eq. 2.35 ∗ Using a Krylov solver, approximately solve the coarse level system: (cid:98) D † P V (cid:98) D DW (cid:98) e = (cid:98) r (2.38) ∗ Prolong the error with Eq. 2.35 and correct the solution: x = x + e – Post-smooth on the accumulated error from the previous two steps with D † P V D DW . • Repeat until the desired tolerance on || b − D DW x || .Needless to say, there are several knobs to tune, even as far as MG algorithms go.We have explored a few of these parameters in a preliminary form in the 2-d two-ﬂavorSchwinger model and have left unexplored further until testing for 4-d domain wall methodsdiscussed brieﬂy in the conclusion. 18 m β Deﬂ. Fine D P V , D DW Int. avg. iter. Coarsest avg. iter.64 0.01 3.0 N 820 2.88(07) 2.62(06) ×

128 0.005 12.0 N 484 2.85(17) 4.71(42) ×

256 0.0025 48.0 N 460 3.32(23) 1.06(11) ×

64 0.01 3.0 Y 820 2.62(09) 51.2(3.9)128 0.005 12.0 Y 412 2.29(14) 68.6(0.9)256 0.0025 48.0 Y 412 2.00(09) 93.2(2.0)Table 1: The behavior of our MG algorithm in the approach to the continuum limitat constant physics and physical volume. We see an improvement in convergence as weapproach the continuum limit, noting that stable convergence depends on an inexpensivedeﬂation of the coarsest level. We chose to deﬂate 128 eigenvectors on the coarsest level.

We now turn to testing our domain wall MG algorithm on the two-ﬂavor Schwingermodel [20, 21, 43]. As with the cases of Wilson [8] and staggered MG [1], the Schwingermodel is a useful framework for the development and testing of algorithms for QCD [44].As a low-dimensional prototype model it has the advantage of enabling the rapid explo-ration of a wide variety of alternative features in a serial laptop code. This can be used todemonstrate validity of an MG algorithm and guide the subsequent application to QCDand software tuning at scale on modern GPU accelerated systems. The importance of thistwo-step approach can not be over emphasized.For our investigations in the interacting case, we have ﬁxed M = − .

05 relative tothe correct free-ﬁeld value M = − L s = 16 as a representative large value. Ourcurrent performance of the DW MG algorithm outlined above is illustrated in Fig. 3.1 andthe accompanying Table 2. The one exception to these parameters is our study of thecontinuum limit, where we explore the addition of deﬂation on the coarsest level.Before describing the details, it is apparent that our basic algorithm vastly improvesscaling in the approach to the continuum and chiral limit, nearly eliminating critical slow-ing down. Our study of additional deﬂation in the continuum limit, given in Table 1,suggests that the lack of perfect scaling shown in Fig. 3.1 can be improved with deﬂation.This is supported by ongoing investigations of deﬂation of the coarsest level with bothtwisted mass and HISQ fermions in 4-d QCD and inspiration from [14].19 .

001 0 .

01 0 . N u m b e r ﬁ n e D † P V a nd D D W m Fine D † P V and D DW Applications, , β = 6 . CGNRBiCGstab-6MG-GCR

Figure 3.1: The number of domain wall operator ( D DW , D † DW , D † P V ) applications requiredfor a solve to a tolerance of 10 − using CGNR on D † DW D DW , BiCGStab-6 on D † P V D DW ,and multigrid for a representative ﬁxed β and volume. Note that this is a log-log plot. A beneﬁt of our MG algorithm is its setup is the same as the setup for the traditional γ -preserving MG algorithm for Wilson fermions. In Eq. 2.34 we see that the coarseneddomain wall operator takes a Galerkin-projected Wilson operator as a kernel. For thisreason we expect the setup to be roughly the same cost as one ﬁve-dimensional domainwall solve. We have tuned the Wilson operator to the critical mass for near-null vectorgeneration. Near-null vectors are generated by relaxing on the homogeneous normal systemwith a Gaussian-distributed initial guess followed by a chiral doubling via (1 + P ± ) topreserve a σ -Hermiticity on the coarser levels [1, 8].For the solve, the main structure of the K-cycle is unchanged relative to our previouswork. Fixed parameters related to the setup and the K-cycle (target tolerance on eachlevel, etc) are described in Table 2. We have standardized on L s = 16 throughout thisinvestigation, though explorations for other values of L s are discussed in Sec. 4.20arametersetup setup operator Normal operator, D W D † W setup solver CGmax iterations 250max residual tolerance per null vector 10 − number of null vectors, level 1 ( n vec ) 8size of aggregate block, level 1 4 number of null vectors, level l > n lvec ) 12size of aggregate block, level l > number of levels l max D † P V D DW restart length of GCR 16relative residual tolerance 10 − GCR iterations for pre-, post-smooth 0, 8solver, level 2 operator (cid:98) D † P V (cid:98) D DW (cid:98) L s (cid:99)(cid:99) D † P V (cid:99)(cid:99) D DW ) † ( (cid:99)(cid:99) D † P V (cid:99)(cid:99) D DW ) (cid:98)(cid:98) L s .

001 0 .

01 0 . N u m b e r ﬁ n e D † P V , D D W m Fine D † PV and D DW applications, β = 6 . .

001 0 .

01 0 . N u m b e r ﬁ n e D † P V a nd D D W m Fine D † PV and D DW applications, β = 3 . β = 6 . β = 10 . Figure 3.2: The number of applications of the ﬁne domain wall operator (separately D DW and D † P V ) within an MG-preconditioned solve as a function of mass. On the left, weconsider ﬁxed β = 6 .

0, and on the right, ﬁxed volume 256 .Inspired by the work of [14], we tested adding a deﬂation step to the coarsest level,resulting in a four-level algorithm. This is important for addressing critical slowing downon the coarsest level, as has been also seen in 4-d studies for Wilson and staggered fermions.For conciseness of presentation we do not explore this past the results given in Table 1.To study the viability of our MG algorithm, we consider sweeps in input fermion mass m at both ﬁxed β = 6 .

0, varying the 2-d lattice volume and at ﬁxed volume 256 , varyingthe bare coupling β . An ideal MG algorithm shifts critical slowing down to the coarsestlevel, corresponding to mass-independent behavior on the ﬁnest and intermediate levels.Further, it should be insensitive to the volume at constant physics. We are interested inan algorithm that works for all reasonable values of β , however in practice it only needsto works for large β approaching the continuum at ﬁxed physical correlation lengths wellbelow the UV cut-oﬀ. In Fig. 3.2 we consider the number of ﬁne domain wall operator applications as a functionof the input fermion mass, which is proportional to the number of outer GCR iterations.On the left, we see that for ﬁxed β critical slowing down has been largely eliminated,albeit there is still a small increase at vanishing mass. On the right, we see that for aﬁxed volume, the MG algorithm shows an improved fermion mass independence in theapproach to the continuum limit. The algorithm is unsuccessful for our coarsest β = 3 . l σ ≈ .

4, which is pushing into22 .

001 0 .

01 0 . b D † P V b D D W m Average Intermediate b D † PV b D DW Iterations, β = 6 . .

001 0 .

01 0 . b D † P V b D D W m Average Intermediate b D † PV b D DW Iterations, β = 3 . β = 6 . β = 10 . Figure 3.3: The average number of iterations for the inner Krylov solve as a function ofmass. On the left, we consider ﬁxed β = 6 .

0, and on the right, ﬁxed volume 256 . Notethat this is a log-linear plot.an unphysical regime. Similar eﬀects have been noticed in staggered MG [1]. As notedin Table 1, preliminary investigations of deﬂation on the coarsest level lead to a furtherreduction in iteration count and, by extension, operator application count on the ﬁne level,though not a complete elimination of the increase as a function of decreasing mass. Intermediate level:

In Fig. 3.3 we consider the average number of GCR iterations onthe intermediate level as a function of the input fermion mass. We see that the averageiteration count is roughly independent of the coupling β and the volume, which is encour-aging. There is a weak mass dependence to the iteration count, however we note thatthe growth in iteration count appears to be weaker than power law. This is a signiﬁcantimprovement over the power-law dependence that is traditional of critical slowing down.Preliminary investigations of deﬂation on the coarsest level lead to a reduction in iterationcount on the intermediate level, though not a complete elimination of the increase as afunction of decreasing mass. Coarsest level:

In the previous two paragraphs we have demonstrated the eliminationof critical slowing down from the ﬁnest and the intermediate level. This is because criticalslowing down has been shifted to the coarsest level. In Fig. 3.4 we consider the averagenumber of CGNR iterations on the coarsest level as a function of the input fermion mass.In contrast to plots for the ﬁne and coarse levels, here we present this data on a log-log plotto examine power behavior. In both the left and right panels, we see behavior consistentwith power-law divergence of the iteration count independent of volume and β , showing23 .

001 0 .

01 0 . cc D † P V cc D D W m Average Coarsest cc D † PV cc D DW Iterations, β = 6 . .

001 0 .

01 0 . cc D † P V cc D D W m Average Coarsest cc D † PV cc D DW Iterations, β = 3 . β = 6 . β = 10 . Figure 3.4: The average number of iterations of CGNR on the coarsest level for, on theleft, ﬁxed β and on the right, ﬁxed volume. Note that this is a log-log plot.that critical slowing down has been successfully shifted to the coarsest level. As has beenseen in studies with twisted clover and HISQ fermions in 4-d, this ﬁnal critical slowingdown can be eﬃciently eliminated by deﬂation. Comparison with direct solve:

In the previous paragraphs we have demonstrated aMG algorithm which shifts critical slowing down from the ﬁnest level down to the coarsestlevel. In Fig. 3.1, we can see a stark contrast in behavior between our MG algorithmand CGNR directly on the domain wall operator. While the number of ﬁne domainwall operator applications scales with only weak mass dependence in the case of our MGalgorithm, there is a strong power-law dependence present for CGNR. This is the criticalslowing down which has shifted to the coarsest level in our MG algorithm.We also considered applying the BiCGStab- l Krylov solver directly to D † P V D DW .We can do this because, in contrast to D DW in isolation, D † P V D DW obeys the half-planecondition as proven in Appendix B. While this approach unsurprisingly still demonstratescritical slowing down, there is a marked reduction in ﬁne operator applications. This couldbe of immediate use for computing domain wall propagators for four-dimensional QCD. The MG algorithm described above, while generally successful, does introduce several com-ponents that are worth understanding better and very likely can lead to further improve24erformance. Even if we were to use the full eﬀective overlap operator, D − P V D DW , theassumption that we need only prolong and restrict the boundary mode when going be-tween levels is non-trivial. That this continues to be adequate with our approximation, D † P V D DW is even more surprising. Also the beneﬁt of reducing L s on the coarsened levelsbegs a better understanding. Here is our initial eﬀort to explore these issues.As we did in our prior paper on the staggered multigrid [1], we begin by studyingthe spectrum of our approximate operator and its “coarsened” version to glean some intu-ition. While we ﬁnd a strong overlap of physical low eigenvalues between the ﬁne and thecoarsened operator, we also ﬁnd that the coarsened operator includes additional spurioussmall eigenvalues. Unlike the na¨ıve formulation of MG for staggered fermions, these modesdo not appear to undermine the success of the algorithm. To probe this phenomena, weconsider the local colinearity and the oblique projector as monitors for the quality of ourMG preconditioner. We see that transferring only the boundary component of the ﬁneresidual and the coarse error correction is essential for the success of our multigrid algo-rithm, and present a physical argument for why this “cures” the problem introduced bythe spurious small eigenvalues. In the free ﬁeld limit presented in detail in Appendix A,we note that the low momentum modes to order O ( p ) are ﬁxed for any L s ≥

2, whichmaybe an indication of the underlying mechanism. We investigated reducing L s on thecoarser levels, ﬁnding that this reduction leads to an improved algorithm relative to usingthe ﬁne L s on the coarsened levels. In Fig. 4.1, we see that the spectrum of the coarsened operator (cid:98) D † P V (cid:98) D DW relative to thespectra for the ﬁne operator D † P V D DW introduces a large number of nearly real low modes.The problem resembles the spurious small eigenvalues that plagued the direct applicationof Galerkin projection to the staggered operator prior to the K¨ahler-Dirac preconditioning.In this instance we posit that the saving grace for the domain wall operator is thatthe low modes of D − P V D DW are bound to the chiral walls, while higher modes bleed moredominantly into the bulk as suggested by Eq. 2.15. Based on this we posit that projectingonly the boundary modes between levels acts as a ﬁlter against the bulk spurious modes.In Sec. 2.3 we noted two ways to formulate the transfer operator between levels. Themethod we utilize across this paper is only prolonging and restricting the boundary modeas given in Eq. 2.35. Another formulation would be to repeat the 2-d prolongator/restrictor25 − . .

51 0 0 . . . I m ( λ ) Re ( λ ) Eigenvalues, , β = 10 . , m = 0 . , L s = b L s = 8 D † P V D DW b D † P V b D DW Figure 4.1: A representative spectrum of D † P V D DW and its coarsened version (cid:98) D † P V (cid:98) D DW at ﬁxed L s = (cid:98) L s = 8.across the bulk dimension.As a quantitative approach to this study we follow the investigations of the staggeredMG paper [1]. Consider the normalized right eigenpairs of the ﬁne operator, ( λ, v λ ). Giventhese we inspect both the local colinearity , a measure of preserving the low eigenspace inthe least-squares sense, deﬁned by || (1 − P R ) v λ || , (4.1)and the oblique projector , deﬁned by || (1 − P ( (cid:98) D † P V (cid:98) D DW ) − RD † P V D DW ) v λ || , (4.2)which quantiﬁes the reduction or enhancement of a given error component for a magnitudeless than or greater than one. While an error enhancement is not inherently a problem,a very large error enhancement requires a prohibitively expensive compensation at thesmoother step. 26 , β = 10 . , m = 0 . , L s = 8 . . . . . . . . . || ( − P R ) v λ || Re( λ )Local Co-linearity, D † PV D DW preserve bulkboundary only . . .

53 0 0 . . . || ( − P ( b D † P V b D D W ) − R D † P V D D W ) v λ || Re( λ )Oblique Projector, D † PV D DW preserve bulkboundary only Figure 4.2: On the left, a measurement of local colinearity, and on the right, the eﬀect ofthe oblique projector on eigenvectors. Both ﬁgures compare a prolongator and restrictorwhich transfer the boundary and bulk vs just the boundary. The parameters for the MGaggregation are given in Table 2.In Fig. 4.2, we consider the local colinearity and the oblique projector for a givenconﬁguration. On the left-hand side, we see that the local colinearity is smaller in magni-tude when prolongating and restricting the entire boundary and bulk compared with justtransferring the boundary. This is not surprising as the boundary-only transfer operatorby construction has a smaller span than the full transfer operator. On the other hand,the oblique projector using the boundary-only transfer operator is generally smaller inmagnitude than the full transfer operator. This is a good indicator that the coarseningprescription given in the previous section combined with a transfer operator acting onlyon the boundary modes leads to an improved MG algorithm. L s An additional beneﬁt of using a transfer operator which only acts on the boundaries is itgives us the ﬂexibility to tune L s between levels. We will investigate this by two avenues.First, we will consider the behavior of the local colinearity and the oblique objector fortwo diﬀerent values of the coarse L s . Next, we will perform an explicit test of domain wallMG for a large value of the outer L s , with a range of ﬁxed smaller L s values on all coarserlevels.We illustrate the oblique projector on the left-hand side of Fig. 4.3. We see that a re-duced coarse (cid:98) L s = 4 behaves at least as well as maintaining a constant (cid:98) L s = 8. In addition,27 , β = 10 . , m = 0 . , L s = 8 . .

52 0 0 . . . || ( − P ( b D † P V b D D W ) − R D † P V D D W ) v λ || Re( λ )Oblique Projector, D † PV D DW b L s = 8 b L s = 4 − . − . − . . . . . . I m ( λ ) Re ( λ ) Eigenvalues of b D † PV b D DW , , β = 10 . , m = 0 . b L s = 8 b L s = 4 Figure 4.3: On the left, the eﬀect of the oblique projector for a boundary-only transferoperator for two value of the coarse (cid:98) L s : the original 8, and a reduced 4. On the right, acomparison of the coarse spectrum for each of these choices of (cid:98) L s .for larger values of the ﬁne eigenvalue (i.e., higher momentum and bulk modes), the errorenhancement is further suppressed. This may be related to the spectrum of the reduced (cid:98) L s operator on the right-hand of Fig. 4.3. We see that this operator does suﬀer from fewerspurious small eigenvalues, leading to a reduced risk of error enhancement. Following thisinvestigation of the oblique projector and the spectrum on a small conﬁguration, we havestudied the performance of a solve on a 128 conﬁguration with a representative β = 6 . L s = 32 for the outermost solve. Forsimplicity we chose a ﬁxed reduced L s for both the intermediate and coarsest level.We see in Fig. 4.4 that reducing the L s between the ﬁne and the intermediate level leads to a perfectly well behaved preconditioner . This is impressive for two rea-sons. One, a reduction in L s does lead to an enhancement in chiral symmetry breaking,suggesting that the intermediate and coarsest levels would not accurately capture the lowmodes of the ﬁne level. This does not appear to be a problem. Two, a reduced L s leadsto enhanced stability for very small masses. There also may be a side beneﬁt of fewerspurious small modes for the coarsest operator with reduced L s .We are encouraged that reducing L s on the intermediate and coarsest level leads toan improved algorithm, and it is what informed the formulation studied in Sec. 3. Ofcourse, there is a wider parameter space we could explore in this study: varying the lengthof the extra dimension separately for each level, tuning M , tuning m to approximatelyconstant m + m res , deﬂating the coarsest level to avoid a large iteration count. In lightof our success simply reducing L s and making no other changes, we defer such in depth28 .

001 0 .

01 0 . N u m b e r ﬁ n e D D W m Fine D DW applications, , β = 6 . , L s = 32 both coarse L s = 4 both coarse L s = 8 both coarse L s = 16 both coarse L s = 24 both coarse L s = 32 Figure 4.4: The number of applications of the ﬁne domain wall operator ( D DW and D † P V each count as one) per MG-preconditioned solve as a function of mass for ﬁxed β = 6 . . We vary the intermediate and coarse L s from 4 to 32. M remains ﬁxed.investigations to a study in four dimensions.29 Conclusion

We have presented a new approach to formulating an MG solver for domain wall fermions.This is a critical step to realizing the full beneﬁt of this chiral formulation which is the-oretically superior but more computationally demanding than the Wilson and staggereddiscretizations. For clarity of exposition our formalism was restricted to the Shamir versionof domain wall and for easy of development and testing restricted to domain wall fermionsfor the 2-d two-ﬂavor Schwinger model. Neither of these choices are fundamental. Theseresults convince us that this is a solution to a long sought fully recursive MG algorithm fordomain wall fermions that can eliminate critical slowing down approaching the continuumlimit for small fermion masses.As was the case with prototyping MG solvers the for Wilson [8] and staggered [1]discretizations, the next step is to develop software and optimize performance for the Diracsolver for lattice QCD. We anticipate a range of algorithmic embellishments and softwaremethods to optimize such an algorithm for use on exascale-trajectory machines in boththe weak- and strong-scaling regimes.One salient feature is important to emphasize. The projection and prolongationonly requires ﬁnding the near null space of the 4-d Wilson kernel, saving computationaloverhead and memory occupancy relative to the na¨ıve cost of the 5-d domain wall operator.The fact that there is no expansion of the null space relative to Wilson MG due to the heavy ﬂavors in the extra dimension is very good news. We foresee that this approach willbe eﬀective even for workﬂows that require few solves per gauge ﬁeld, e.g., Hybrid MonteCarlo.This points to another beneﬁt. The basic method presented here for the Shamirimplementation applies equally well to M¨obius, Zolotarev, Borici, etc. formalism via thedomain wall/Pauli-Villars factorization in

BNO . The only requirement is that each factoris a linear functional of the Wilson kernel. As a consequence it should also be straightforward to generalize our algorithm directly to the overlap operator itself. To appreciatethe full landscape consider the large class of chiral fermion methods presented by Edwards,Joo, Kennedy, Orginos, and Wenger in Ref. [31]. In all of these the sign function sign [ H ]in the overlap operator D ov = 1 + m − m γ sign [ H ] (5.1)must be approximated in a variety of ways as functional of the Wilson operator: sign [ H ] (cid:39) (cid:15) [ D W ] using, for example, rational approximations such as Pad`e expansions, partial frac-30ions, continued fractions, etc. The coarsening step would then act by projecting the 4-dWilson kernel into a near null space before building the sign [ H ] function, (cid:15) [ D W ] → (cid:92) (cid:15) [ D W ] = (cid:15) [ P † D W P ] . (5.2)As with Eq. 2.33, the kernel projection does not commute with the eﬀective chiral operator: (cid:15) [ P † D W P ] (cid:54) = P † (cid:15) [ D W ] P . From this vantage point, our domain wall MG algorithm imple-mentation is a special case using the domain wall/Pauli-Villars factorization in Eq. 2.34to identify an eﬀective overlap operator.Clearly there is a larger landscape of MG algorithms for chiral fermion operators toexplore. Optimizations will depend on the speciﬁc applications and target architectures.We anticipate that alternative implementations of the MG solver for domain wall fermionsin four dimensions [12, 13] may contribute to further optimizations. We leave the detailedstudy of these generalizations and optimizations for MG for domain wall and overlap chiralfermions to future investigations. Acknowledgements

We are grateful to Robert Edwards, Balint Joo, Harmut Neﬀ, Kostas Orginos, and PavlosVranas for fruitful discussions. This work was supported in part by the U.S. Department ofEnergy (DOE) under Award No. DE-SC0015845 and by the Exascale Computing Project(17-SC-20-SC), a collaborative eﬀort of the U.S. Department of Energy Oﬃce of Scienceand the National Nuclear Security Administration.

References [1] Richard C. Brower, M. A. Clark, Alexei Strelchenko, and Evan Weinberg. Multigridalgorithm for staggered lattice fermions.

Phys. Rev. , D97(11):114513, 2018.[2] Kenneth G. Wilson. Conﬁnement of Quarks.

Phys.Rev. , D10:2445–2459, 1974.[3] T. Blum, R.S. Van de Water, D. Holmgren, R. Brower, S. Catterall, et al. WorkingGroup Report: Lattice Field Theory. 2013.[4] Richard C. Brower, Claudio Rebbi, and Ettore Vicari. Projective multigrid for prop-agators in lattice gauge theory.

Phys. Rev. Lett. , 66:1263–1266, 1991.315] Kenneth G. Wilson. The Renormalization Group: Critical Phenomena and the KondoProblem.

Rev. Mod. Phys. , 47:773, 1975.[6] Richard C. Brower, Robert G. Edwards, Claudio Rebbi, and Ettore Vicari. Projectivemultigrid for Wilson fermions.

Nucl. Phys. , B366:689–705, 1991.[7] Arjan Hulsebos, Jan Smit, and Jeroen C. Vink. Multigrid inversion of the staggeredfermion matrix.

Nucl. Phys. Proc. Suppl. , 20:94–97, 1991.[8] James Brannick, Richard C. Brower, M A. Clark, James C. Osborn, and ClaudioRebbi. Adaptive Multigrid Algorithm for Lattice QCD.

Phys.Rev.Lett. , 100:041601,2008.[9] Ronald Babich, James Brannick, Richard C. Brower, M A. Clark, Thomas A. Man-teuﬀel, et al. Adaptive multigrid algorithm for the lattice Wilson-Dirac operator.

Phys.Rev.Lett. , 105:201602, 2010.[10] John Kogut and Leonard Susskind. Hamiltonian formulation of wilson’s lattice gaugetheories.

Phys. Rev. D , 11:395–408, Jan 1975.[11] David B. Kaplan. A Method for simulating chiral fermions on the lattice.

Phys.Lett. ,B288:342–347, 1992.[12] Saul D. Cohen, R.C. Brower, M.A. Clark, and J.C. Osborn. Multigrid Algorithms forDomain-Wall Fermions.

PoS , LATTICE2011:030, 2011.[13] P A Boyle. Hierarchically deﬂated conjugate gradient. 2014.[14] Azusa Yamaguchi and Peter Boyle. Hierarchically deﬂated conjugate residual.

PoS ,LATTICE2016:374, 2016.[15] Herbert Neuberger. Vector like gauge theories with almost massless fermions on thelattice.

Phys. Rev. , D57:5417–5433, 1998.[16] James Brannick, Andreas Frommer, Karsten Kahl, Bj¨orn Leder, Matthias Rottmann,and Artur Strebel. Multigrid Preconditioning for the Overlap Operator in LatticeQCD.

Numer. Math. , 132(3):463–490, 2016.[17] P. Becher and H. Joos. The dirac-k¨ahler equation and fermions on the lattice.

Zeitschrift f¨ur Physik C Particles and Fields , 15(4):343–365, Dec 1982.[18] Geoﬀrey T. Bodwin and Eve V. Kov´acs. Equivalence of dirac-k¨ahler and staggeredlattice fermions in two dimensions.

Phys. Rev. D , 38:1206–1219, Aug 1988.3219] Richard C. Brower, Harmut Neﬀ, and Kostas Orginos. The M¨obius domain wallfermion algorithm.

Comput. Phys. Commun. , 220:1–19, 2017.[20] Julian Schwinger. Gauge invariance and mass. ii.

Phys. Rev. , 128:2425–2429, Dec1962.[21] Andrei V. Smilga. Critical amplitudes in two-dimensional theories.

Phys. Rev. ,D55:443–447, 1997.[22] Yigal Shamir. New domain wall fermion actions.

Phys.Rev. , D62:054513, 2000.[23] Richard C. Brower, Hartmut Neﬀ, and Kostas Orginos. Mobius fermions: Improveddomain wall chiral fermions.

Nucl.Phys.Proc.Suppl. , 140:686–688, 2005.[24] R.C. Brower, H. Neﬀ, and K. Orginos. Mobius fermions.

Nucl.Phys.Proc.Suppl. ,153:191–198, 2006.[25] Artan Borici. Truncated overlap fermions: The Link between overlap and domainwall fermions ( in Lattice fermions and structure of the vaccuum, V. K. Mitrjushkinand G. Schierholz (eds)) . pages 41–52, 1999.[26] A. Borici. Truncated overlap fermions.

Nucl.Phys.Proc.Suppl. , 83:771–773, 2000.[27] Ting-Wai Chiu. Optimal domain-wall fermions.

Phys. Rev. Lett. , 90:071601, 2003.[28] Ting-Wai Chiu. Locality of optimal lattice domain-wall fermions.

Phys. Lett. ,B552:97–100, 2003.[29] Robert G. Edwards and Urs M. Heller. Exact chiral symmetry for domain wallfermions with ﬁnite l(s).

Nucl. Phys. Proc. Suppl. , 94:737–740, 2001.[30] Robert G. Edwards and Urs M. Heller. Domain wall fermions with exact chiralsymmetry.

Phys. Rev. , D63:094505, 2001.[31] Robert G. Edwards, Balint Joo, Anthony D. Kennedy, Kostas Orginos, and UrsWenger. Comparison of chiral fermion methods.

PoS , LAT2005:146, 2006.[32] Herbert Neuberger. Exactly massless quarks on the lattice.

Phys.Lett. , B417:141–144,1998.[33] Yoshio Kikukawa and Tatsuya Noguchi. Low-energy eﬀective action of domain wallfermion and the Ginsparg-Wilson relation.

Nucl. Phys. B Proc. Suppl. , 83:630–632,2000. 3334] Paul H. Ginsparg and Kenneth G. Wilson. A remnant of chiral symmetry on thelattice.

Phys. Rev. , D25:2649, 1982.[35] J¨org Liesen and Petr Tich´y. Convergence analysis of krylov subspace methods.

GAMM-Mitteilungen , 27(2):153–173, 2004.[36] M.W. Benson.

Iterative Solution of Large Scale Linear Systems . Mathematics report.Thesis (M.Sc.)–Lakehead University, 1973.[37] Edmond Chow. Parallel implementation and practical use of sparse approximateinverse preconditioners with a priori sparsity patterns.

Int. J. High Perform. Comput.Appl. , 15(1):56–74, February 2001.[38] Hans De Sterck, Ulrike Meier Yang, Jeﬀrey, and J. Heys. Reducing complexity inparallel algebraic multigrid preconditioners.

SIAM J. Matrix Anal. Appl , 27:1019–1039, 2006.[39] Eran Treister and Irad Yavneh. Non-galerkin multigrid based on sparsiﬁed smoothedaggregation.

SIAM Journal on Scientiﬁc Computing , 37(1):A30–A54, 2015.[40] David B. Kaplan and Martin Schmaltz. Supersymmetric Yang-Mills theories fromdomain wall fermions.

Chin.J.Phys. , 38:543–550, 2000.[41] Curtis G. Callan, Jr. and Jeﬀrey A. Harvey. Anomalies and Fermion Zero Modes onStrings and Domain Walls.

Nucl. Phys. , B250:427–436, 1985.[42] Gene H. Golub and Charles F. van Loan.

Matrix Computations . JHU Press, fourthedition, 2013.[43] David H. Adams. Theoretical foundation for the Index Theorem on the lattice withstaggered fermions.

Phys. Rev. Lett. , 104:141602, 2010.[44] Pavlos M. Vranas. Chiral symmetry restoration in the schwinger model with domainwall fermions.

Phys. Rev. , D57:1415–1432, 1998.

A Low Momentum Expansions for the DW Fermions

The free theory, setting U = 1, can be expanded and diagonalized in momentum spacefor ﬁnite L s , giving valuable guidance to our MG construction. First consider our approx-imation, D † P V (cid:39) D − P V , which we will prove to valid up to error O ( p ) at ﬁnite ﬁnite L s .34ndeed the Pauli-Villars operator is especially simple because it is anti-periodic in L s , andthus even with interacting ﬁelds can be diagonalized via Fourier modes giving D P V ( U, p ) = γ sin( p ) + 1 − cos( p ) + M + D W ( U, , (A.1)in terms of d × d space-time blocks for the Wilson operator D W ( U, M = − D P V ( U, p ) = − e − ip γ + D W ( U, , (A.2)we note that the ﬁrst term by in isolation gives a circle of eigenvalues in the complex plane.Moreover, as is apparent in Fig. 2.1, this basic pattern persists even with non-trivial gaugeﬁelds. In the free-ﬁeld limit, we can further diagonalize in space-time Fourier modes, p µ ,giving (cid:101) D P V ( p µ , p ) = i (cid:88) M =1 γ M sin( p M ) + (cid:88) M =1 ( p M /

2) + M , (A.3)where the summation includes γ and p . The free normal operator is (cid:101) D † P V (cid:101) D P V = (cid:88) M =1 sin ( p M ) + ( (cid:88) M =1 ( p M /

2) + M ) , (A.4)diagonal in spin structure. In the case of M = −

1, the low momentum expansion of thisoperator is (cid:101) D † P V (cid:101) D P V = M + O ( p M ). This gives (cid:101) D † P V = (cid:101) D − P V + O ( p ) . (A.5)Indeed more generally for M = − D † P V (cid:101) D P V is bounded below by1, i.e., at the lattice cutoﬀ scale 1 /a . With non-trivial gauge ﬁelds the approximation (cid:101) D † P V ≈ (cid:101) D − P V holds qualitatively particularly when M is appropriately tuned.The next step is to compare the low eigenspectra of the preconditioned operator D − P V D DW ( m ) and the approximation D † P V D DW ( m ). From Fig. 2.5 we note the near exactcoincidence of small eigenvalues even in the presence of gauge ﬁelds. We can elucidate thisin the free-ﬁeld limit.We cannot simultaneously diagonalize D P V and D DW ( m ) in the bulk dimension dueto the diﬀerence in boundary conditions which complicates the analysis. In this casewe take advantage of low-momentum perturbation theory in the space-time dimensions.Given (cid:101) D DW ( p µ , m ) ss (cid:48) , we expand both it and the Pauli-Villars factor at low d-momenta,take the appropriate products and solve the characteristic polynomial for the eigenvalues.The pairing of low modes gives a complex square root singularity in the complex plane,leading to the circular structure of the low spectrum as evident in Fig. 2.5. The results35rom this approach are given in the text in Eq. 2.24 and Eq. 2.25. Comparing the exactvs the approximate low spectra we see they are identical up to a six-dimensional operator O ( a mp ).We additionally note that the low momentum expansion of the eﬀective overlapoperator, D ov ( m ) = 1 + m − m γ (cid:15) L [ H ] , (cid:15) L [ H ] = (1 − H ) L − (1 + H ) L (1 − H ) L + (1 + H ) L , (A.6)has universal coeﬃcients even at ﬁnite L s up to O ( p n ) for L s > n . As a result, theeigenvalues are also equivalent in perturbation theory up to that order. For example,consider for d = 2 the roots of the polynomial, λ − λ tr + det = 0 in terms of thetrace and determinant. For the Shamir kernel, Evaluating this for the Shamir kernel, H = γ D W ( M ) / (2 + D W ( M )), this is given bydet = λ + λ − = p x + p y − (4 / p x + p y ) − p x p y ] , tr = λ + + λ − = 2( p x + p y ) (A.7)up to O ( p ), again universal for any L s ≥

2, resulting in the eigenvalues λ ± = ± i (cid:113) p (1 − p ) + 2 p x p y / p , (A.8)to the same order. To include non-zero mass we can use the mass shift identity foreigenvalus, λ → λ = m + (1 − m ) λ , from Eq. 2.3 to get λ ± = m + p ± i (1 − m ) (cid:113) p (1 − p ) + 2 p x p y / − mp , (A.9)which up to rescaling by 1 / (1 − m ) gives λ ± ∼ m q + p ± i (cid:112) p in agreement with Eq. 2.24in the text up to a dimension-three scaling operator of O ( mp ). Higher-order expansionsin 2-d and 4-d eﬀective overlap operators are easily shown in Mathematica to follow thisexpansion invariance up to O ( p L s ). We believe the fact that the quadratic form is inde-pendent of L s ≥ D † P V D DW ( m ) eﬀective overlap operator is atthe core the eﬀectiveness of using small L s for MG iterations on the coarse level.Finally we note that this expansion is almost certainly a divergent asymptotic seriesand as such does not by itself lend itself to ﬁx the coeﬃcients in improved higher-orderpolynomial approximation[ D P V D † P V ] − (cid:39) c + c D P V D † P V + c ( D P V D † P V ) · · · (A.10)to [ D P V D † P V ] − in Eq. 2.27 beyond the zeroth order. We see this ﬁrst-order equivalenceis given by c i = (1 , , , · · · ). We can take thus further by expanding [ D P V D † P V ] − =36 / (1 + A P V ) in A P V = D P V D † P V − p . For example just the second-order expansion, c i = (2 , − , , , · · · ), giving the equivalence D − P V = D † P V (2 − D P V D † P V ) + O ( p µ ). Thispolynomial on its own cannot be used in our MG algorithm because 2 − D P V D † P V is notpositive deﬁnite so coeﬃcient in a polynomial truncation must be weighted appropriatelyover the entire spectrum.

B Proof that the spectrum has a positive real part

In the full interacting case with gauge ﬁelds we consider our approximation D † P V D DW as (cid:16) D † P V D P V (cid:17) D − P V D DW and note from Eq. 2.14 that D − P V D DW has the eigenspectra of theﬁnite- L s overlap kernel, D ( L s ) ov ( m ) = 1 + m − m γ (cid:15) L s [ H ] , (B.1)plus additional 1 eigenvalues. We recall that for even L s , (cid:15) L s [ H ] ∈ [ − , γ (cid:15) L s [ H ] inequality for a normalized vectors, (cid:104) u | u (cid:105) = (cid:104) v | v (cid:105) = 1, thematrix element is bounded in magnitude, | (cid:104) u | γ (cid:15) L s [ H ] | v (cid:105) | = (cid:112) (cid:104) v | (cid:15) L s [ H ] (cid:15) L s [ H ] | v (cid:105) ≤ , (B.2)Equivalently this is just a statement of the unitarity bound on matrix elements. Byextension, the spectrum of D ov ( m ) lives inside a circle of radius − m centered at ( m , D ov ( m ) and D − P V D DW have positive deﬁnite real partfor m > D † P V D DW for any righteigenvector | λ (cid:105) , D † P V D DW | λ (cid:105) = re iθ | λ (cid:105) . (B.3)where we have generally written the eigenvalue λ as re iθ . We wish to prove the right-halfplane condition θ ∈ ( − π/ , π/ (cid:16) D † P V D P V (cid:17) D − P V D DW | λ (cid:105) = re iθ | λ (cid:105) . (B.4)Left multiplying by (cid:16) D † P V D P V (cid:17) − and taking the full matrix element with (cid:104) λ | we have (cid:104) λ | D − P V D DW | λ (cid:105) = re iθ (cid:104) λ | ( D † P V D P V ) − | λ (cid:105) . (B.5)37ince D − P V D DW satisﬁes the right half-plane condition and D † P V D P V is a Hermitian positiveoperator, this proves the right half-plane condition θ ∈ ( − π/ , π/

2) for m >m >