[PDF] A multigrid accelerated eigensolver for the Hermitian Wilson-Dirac operator in lattice QCD

Abstract

Eigenvalues of the Hermitian Wilson-Dirac operator are of special interest in several lattice QCD simulations, e.g., for noise reduction when evaluating all-to-all propagators. In this paper we present a Davidson-type eigensolver that utilizes the structural properties of the Hermitian Wilson-Dirac operator Q to compute eigenpairs of this operator corresponding to small eigenvalues. The main idea is to exploit a synergy between the (outer) eigensolver and its (inner) iterative scheme which solves shifted linear systems. This is achieved by adapting the multigrid DD- α AMG algorithm to a solver for shifted systems involving the Hermitian Wilson-Dirac operator. We demonstrate that updating the coarse grid operator using eigenvector information obtained in the course of the generalized Davidson method is crucial to achieve good performance when calculating many eigenpairs, as our study of the local coherence shows. We compare our method with the commonly used software-packages PARPACK and PRIMME in numerical tests, where we are able to achieve significant improvements, with speed-ups of up to one order of magnitude and a near-linear scaling with respect to the number of eigenvalues. For illustration we compare the distribution of the small eigenvalues of Q on a 64× 32 3 lattice with what is predicted by the Banks-Casher relation in the infinite volume limit.

Full PDF

AA multigrid accelerated eigensolver for theHermitian Wilson-Dirac operator in lattice QCD

Andreas Frommer, Karsten Kahl, Francesco Knechtli,Matthias Rottmann, Artur Strebel, Ian Zwaan

Abstract

Eigenvalues of the Hermitian Wilson-Dirac operator are of special in-terest in several lattice QCD simulations, e.g., for noise reduction whenevaluating all-to-all propagators. In this paper we present a Davidson-typeeigensolver that utilizes the structural properties of the Hermitian Wilson-Dirac operator Q to compute eigenpairs of this operator corresponding tosmall eigenvalues. The main idea is to exploit a synergy between the(outer) eigensolver and its (inner) iterative scheme which solves shiftedlinear systems. This is achieved by adapting the multigrid DD- α AMGalgorithm to a solver for shifted systems involving the Hermitian Wilson-Dirac operator. We demonstrate that updating the coarse grid operatorusing eigenvector information obtained in the course of the generalizedDavidson method is crucial to achieve good performance when calculat-ing many eigenpairs, as our study of the local coherence shows. We com-pare our method with the commonly used software-packages PARPACKand PRIMME in numerical tests, where we are able to achieve signiﬁcantimprovements, with speed-ups of up to one order of magnitude and a near-linear scaling with respect to the number of eigenvalues. For illustrationwe compare the distribution of the small eigenvalues of Q on a 64 × lattice with what is predicted by the Banks-Casher relation in the inﬁnitevolume limit. In lattice Quantum Chromodynamics (QCD) the Wilson-Dirac operator de-scribes the interaction between quarks and gluons in the framework of quantumﬁeld theory. Results of lattice QCD simulations represent essential input toseveral of the current and planned experiments in elementary particle physics(e.g., BELLE II, LHCb, EIC, PANDA, BES III).Obtaining eigenpairs (eigenvalues and -vectors) of the Wilson-Dirac operatoris an important computational task. For example, eigenpairs can be used todirectly compute physical observables [5, 12, 15, 16, 21, 29] or as a tool fornoise reduction in stochastically estimated quantities like disconnected fermionloops [2]. In most circumstances we are interested in a small to moderate amountof eigenvectors corresponding to the eigenvalues closest to zero, especially for1 a r X i v : . [ h e p - l a t ] A p r he Hermitian Wilson-Dirac operator. As the Hermitian Wilson-Dirac operatoris indeﬁnite, these eigenvalues lie in the interior of the spectrum.Typically, computing interior eigenvalues is particularly expensive, which iswhy in this paper we develop eﬃcient computational methods for the specialcase of the Hermitian Wilson-Dirac operator. The most prominent methods forobtaining interior eigenvalues are shift-and-invert algorithms which extend thebasic inverse iteration approach; cf. [34]. This includes methods ranging fromthe classical Rayleigh quotient iteration (RQI) [34] to the generalized Davidson(GD) methods [34] and its numerous variations like GD+ k [39], Jacobi-Davidson(JD) [36] or JDCG/JDQMR [30, 39]. The generalized Davidson methods can,alternatively, also be regarded as a generalization of Arnoldi’s method [28] withimproved search directions.Most Davidson-type methods share an inner-outer-scheme , where the outeriteration ﬁnds approximations to the sought eigenpairs, while the inner itera-tions generates new search directions by approximately solving shifted linearsystems ( A − τ I ) x = b, (1.1)where τ is an approximation to a target eigenvalue. In fact, these inversionsmake up the bulk of the computational work, and it is thus mandatory to ﬁndparticularly eﬃcient methods for this task. In lattice QCD, adaptive algebraicmultigrid methods have established themselves as the most eﬃcient methods forsolving linear systems with the Wilson-Dirac operator [1, 6, 18, 19, 31]. Theydemonstrate signiﬁcant speed-ups compared to conventional Krylov subspacemethods, achieving orders of magnitudes faster convergence and an insensitiv-ity to conditioning. In this work, we use the adaptive domain decompositionalgebraic multigrid method DD- α AMG [18, 19], but we expect the results forDD- α AMG to carry over to the other aggregation-based multigrid implemen-tations as well. Originally DD- α AMG is composed of an adaptive aggregationbased multigrid construction and a red-black multiplicative Schwarz smoother(traditionally termed “SAP” for Schwarz alternating procedure in lattice QCD),but we also have the option to use other smoothers like GMRES in the DD- α AMG framework.So far, multigrid solvers have been limited to the unshifted Wilson-Diracoperator . This is due to the algebraic construction of the interpolation opera-tor, which is built from approximations of eigenvectors corresponding to smalleigenvalues, which is necessary for an eﬀective overall error reduction in theunshifted case, cf. [18, 19].During the progress of the eigensolver—when the shift τ in (1.1) becomeslarger—the interpolation operator no longer approximates the space spannedby eigenvectors corresponding to small eigenvalues of the shifted operator, thusinvalidating the coarse grid correction step and signiﬁcantly slowing down con- In [28] the authors relate generalized Davidson with the Lanczos method, but the state-ment also holds for non-Hermitian matrices. The original solver can manage small shifts, but quickly deteriorates for increasing shiftsas will be demonstrated later. nner iteration • uses eigenvectors to improvethe multigrid preconditioner outer iteration • uses better search directionto improve the search spaceprovides improved search directionprovides new eigenpairs and shift τ Figure 1: Interleaving eigenpair extraction and construction of the multigridpreconditioner for the inner iterationvergence. The main idea of this paper is that in order to overcome this problemwe dynamically update the interpolation operator of the multigrid solver duringthe outer iteration. This idea is sketched in Figure 1. It illustrates how we areinterleaving the eigenpair extraction with the construction of an eﬃcient solverfor (1.1). While the outer iteration extracts eigenpairs, its eigenvectors are usedto improve the multigrid preconditioner. In turn the multigrid method is aneﬃcient preconditioner for (1.1), which produces improved search directions forthe outer iteration.As an additional topic, we investigate several approaches for the most suit-able smoothing method of the multigrid method for the Hermitian Wilson-Diracoperator in presence of large shifts. Altogether, we obtain a method which scalesclose to linearly with the number of eigenpairs to be computed.The remainder of this paper is organized as follows. We give a brief intro-duction into lattice QCD and algebraic multigrid methods in Section 2. We putparticular emphasis on showing and theoretically justifying how the algebraicmultigrid approach, which has originally been developed for the non-HermitianWilson-Dirac operator, can be applied to the Hermitian Wilson-Dirac operator.In Section 3 we proceed by introducing the generalized Davidson method andpresent our adaptations for the speciﬁc case of the Hermitian Wilson-Dirac op-erator, including the interleaving of the eigensolver and the construction of themultigrid solver used in the inner iteration. Numerical tests and comparisonswith commonly used and state-of-the-art software are presented in Section 4.As a simple example of the use of our method in physics, Section 5 discusseslattice artifacts observed for the spectral gap. Finally, a summary of our resultsis given in Section 6. 3

Algebraic Multigrid Methods in Lattice QCD

The Dirac equation D ψ + m · ψ = η (2.1)describes the dynamics of quarks and the interaction of quarks and gluons.Here, ψ = ψ ( x ) and η = η ( x ) represent quark ﬁelds. They depend on x , thepoints in space-time, x = ( x , x , x , x ). The gluons are represented in theDirac operator D , and m is a scalar mass parameter that is independent of x and sets the mass of the quarks in the QCD theory.More precisely, D is given as D = (cid:88) µ =0 γ µ ⊗ ( ∂ µ + A µ ) , (2.2)where ∂ µ = ∂/∂x µ and A is the gluon (background) gauge ﬁeld with the anti-Hermitian traceless matrices A µ ( x ) being elements of su (3), the Lie algebra ofthe special unitary group SU(3). The (Hermitian) γ -matrices γ , γ , γ , γ ∈ C × represent generators of the Cliﬀord algebra with γ µ γ ν + γ ν γ µ = 2 δ µν I for µ, ν = 0 , , , , (2.3)with I the identity on C . Consequently, at each point x in space-time, thespinor ψ ( x ), i.e., the quark ﬁeld ψ at a given point x , is a twelve componentcolumn vector, each component corresponding to one of three colors (acted uponby A µ ( x )) and four spins (acted upon by γ µ ). For future use we remark that γ = γ γ γ γ satisﬁes γ γ µ = − γ µ γ , µ = 0 , , , , (2.4)independently from the chosen representation.The only known way to obtain predictions in QCD from ﬁrst principlesand non-perturbatively, is to discretize and then simulate on a computer. Thediscretization is typically formulated on an equispaced periodic N t × N s lattice L with uniform lattice spacing a , N s denoting the number of lattice points foreach of the three spatial dimensions and N t the number of lattice points in thetime dimension. A quark ﬁeld ψ is now represented by its values at each latticepoint, i.e., it is a spinor valued function ψ : x ∈ L → ψ ( x ) ∈ C .The Wilson-Dirac discretization is one of the most commonly used discretiza-tions in lattice QCD simulations. It is obtained from the continuum equation byreplacing the covariant derivatives by centralized covariant ﬁnite diﬀerences onthe lattice. It contains an additional, second order ﬁnite diﬀerence stabilizationterm, as otherwise the discretization would suﬀer from ‘red-black’ instability,cf. [37]. The Wilson-Dirac discretization yields a local operator D in the sensethat it represents a nearest neighbor coupling on the lattice L .Introducing shift vectors ˆ µ = (ˆ µ , ˆ µ , ˆ µ , ˆ µ ) T ∈ R in dimension µ on L ,i.e., ˆ µ ν = (cid:40) a µ = ν , D on a discrete quark ﬁeld ψ is given as( Dψ )( x ) = ( m + 4 a ) ψ ( x ) − a (cid:88) µ =0 (( I − γ µ ) ⊗ U µ ( x )) ψ ( x + ˆ µ ) − a (cid:88) µ =0 (cid:0) ( I + γ µ ) ⊗ U Hµ ( x − ˆ µ ) (cid:1) ψ ( x − ˆ µ ) . (2.5)Here, the gauge-links U µ ( x ) are now matrices from the Lie group SU(3), andthe lattice indices x ± ˆ µ are to be understood periodically. The mass parameter m sets the quark mass (for further details, see [27]), and we will write D ( m )whenever the dependence on m is important.To explicitly describe D we ﬁx a representation for the γ -matrices in which γ = (cid:0) I − I (cid:1) .From (2.5) we obtain the couplings of the lattice sites x and x ± ˆ µ as( D ) x,x +ˆ µ = − a ( I − γ µ ) ⊗ U µ ( x ) , ( D ) x,x − ˆ µ = − a ( I + γ µ ) ⊗ U Hµ ( x − ˆ µ ) , (2.6)which shows ( D ) x +ˆ µ,x = − a ( I + γ µ ) ⊗ U Hµ ( x ) . Thus, the commutativity relations(2.4) imply the symmetry( γ ⊗ I ) (cid:0) D (cid:1) x,x +ˆ µ = (cid:0) ( γ ⊗ I ) (cid:0) D (cid:1) x +ˆ µ,x (cid:1) H . With Γ = I n L ⊗ γ ⊗ I , n L the number of lattice sites, this symmetry can bedescribed on the level of the entire Wilson-Dirac operator asΓ D = (Γ D ) H . (2.7)The matrix Γ is Hermitian and unitary, and the Γ -symmetry (2.7) is a non-trivial, fundamental symmetry that the discrete Wilson-Dirac operator inheritsfrom a corresponding symmetry of the continuum Dirac operator (2.2).The Wilson-Dirac operator and its clover-improved variant (where a termwhich is diagonal in space and time is added to reduce the local discretizationerror from O ( a ) to O ( a )) is an adequate discretization for the numerical com-putation of many physical observables. For further details on discretization,its properties and the clover-improved variant, we refer the interested readerto [11, 20, 35]. In this paper we focus on the Hermitian or symmetrized Wilson-Dirac operator Q := Γ D .

Algebraic multigrid methods.

The state-of-the-art approaches for solvinglinear systems involving the (non-Hermitian) Wilson-Dirac operator D are vari-ants of aggregation-based adaptive algebraic multigrid methods, see [1, 6, 19, 31].Typically, these multigrid solvers are used as a (non-stationary) preconditionerwithin a ﬂexible Krylov subspace method like FGMRES [33] or GCR [14]; seealso [25, 26]. 5he error propagator for the two-level version of all these multigrid ap-proaches is E g = ( I − M D ) ν ( I − P D − c RD )( I − M D ) µ , (2.8)where M denotes the smoother—which, in the case of DD- α AMG, is given bythe Schwarz alternating procedure (SAP)—and µ and ν denote the number ofpre- and post-smoothing iterations, respectively. The operator I − P D − c RD isthe coarse grid correction, where P is the adaptively constructed aggregationbased interpolation [1, 6, 19, 31], obtained in a “setup” phase, R = P H is thecorresponding restriction and D c the Galerkin projected coarse grid operator D c = P H DP .As is discussed in [7, 31], this algebraic multigrid approach for D can betransferred to one for Q if the interpolation P preserves spin structure in thesense that on the coarse grid we can partition the degrees of freedom per gridpoint into two groups corresponding to diﬀerent spins and that we have Γ P = P Γ c , where Γ c is diagonal with values ±

1, depending on the spin on the coarsegrid. Putting Q c = Γ c D c we then have I − P Q − c P H Q = I − P D − c Γ c P H Γ D = I − P D − c P H D , (2.9)showing that the coarse grid error propagator for D is identical to the coarsegrid error propagator for Q if we take the same P . Note that the constructionof P in [1, 19, 31] is indeed spin structure preserving, while this is not the casefor the “little Dirac” construction found in [26].As a matter of fact the SAP smoothing used in DD- α AMG is identical for D and Q , as well, which can be shown by the following argument. Mathematically,one step of SAP is a product of block projections, i.e., the error propagator isgiven by E SAP := b (cid:89) i =1 ( I − I L i Q − i I H L i (cid:124) (cid:123)(cid:122) (cid:125) := M Qi Q ) , (2.10)where b is the number of subdomains, L i is the i -th subdomain of the lattice L , I L i the trivial injection from L i into L , and Q i := I H L i QI L i the block restrictionof Q on L i .Note that algorithmically, the calculations corresponding to I L i Q − i I H L i Q can be performed in parallel for all blocks i of the same color if we introduce ared-back ordering on the blocks.With this we get the following proposition, in which we deﬁne M D i and D i analogously to M Q i . Proposition 1.

The error propagator E SAP ( Q ) := (cid:81) bi =1 ( I − M Q i Q ) is equiv-alent to E SAP ( D ) := (cid:81) bi =1 ( I − M D i D ) .Proof. We ﬁrst note that Γ is just a local positive or negative identity, so itsblock restriction Γ i := I H L i Γ I L i on L i not only satisﬁes I H L i Γ = Γ i I H L i but also6Γ i ) − = Γ i . To prove the proposition we only need to show that the errorpropagators are identical for any given subdomain i : I − M Q i Q = I − ( I L i Q − i I H L i ) Q = I − ( I L i ( I H L i Γ DI L i ) − I H L i )Γ D = I − ( I L i (Γ i D i ) − I H L i )Γ D = I − ( I L i D − i Γ i I H L i )Γ D = I − ( I L i D − i I H L i Γ )Γ D = I − M D i D. (2.11)The proposition states that SAP for Q is equivalent to SAP for D if theblock inversions for the block systems Q i are performed exactly, which togetherwith (2.9) implies that the DD- α AMG method has the same error propagator,irrespective of whether it is applied to Q or to D . As observed in [19] SAPsmoothing works well for the standard Wilson-Dirac operator D thus it alsoworks well for the Hermitian Wilson-Dirac operator Q . However, if we performonly approximate block inversions in SAP—and this is what one typically does—this situation becomes less clear; see Section 4.Alternatively, instead of SAP one can use (restarted) GMRES as a smootherfor Q . For the non-Hermitian Wilson-Dirac operator D this is used in the multi-grid methods from [1, 6, 31], and since GMRES is also one of the most numeri-cally stable Krylov subspace methods for indeﬁnite systems, it is to be expectedto work well as a smoother in a multigrid method for Q as well. Interestingly,for GMRES smoothing a connection between Q and D similar to what has justbeen exposed for SAP smoothing does not hold. We compare the above optionsfor the smoothing method experimentally in Section 4. The generalized Davidson (GD) method [10, 28, 34] is an eigensolver frameworkwhich can be seen as a generalization of Arnoldi’s method. Its advantage is thatit does not rely on a Krylov subspace structure and thus oﬀers a more ﬂexibleway of steering the search space V m into a desired direction. The methodsuccessively generates a set of orthogonal vectors v , v , . . . , v m , which span thesearch space V m . An approximate eigenpair ( u, θ ) with u ∈ V m is chosen suchthat the Ritz-Galerkin condition Au − θu ⊥ V m (3.1)holds, which amounts to solving the (small and dense) m × m eigenvalue problem (cid:0) V Hm AV m (cid:1) s − θs = 0 , where V m = [ v | · · · | v m ] , (3.2)and then taking u = V m s with s the eigenvector from (3.2) whose eigenvalue θ is closest to the target eigenvalue. The search space is then extended by a7ew vector t which is obtained as a function of the matrix A , the approximateeigenvalue θ , and the eigenvector residual r := Au − θu . The new vector v m +1 is then retrieved after orthogonalizing t against v , . . . , v m and normalizing it.For our work we focus on obtaining t as an (approximate) solution of the correction equation ( A − τ I ) t = r, (3.3)where τ is an estimate for the target eigenvalue. The choice of τ steers theexpansion of the search space, and with it the Ritz values, towards the desiredeigenvalue regions, e.g., eigenvalues with smallest absolute value or with largestimaginary part.For the (Hermitian) Wilson-Dirac operator, Davidson-type methods are tobe preferred over Arnoldi’s method due to the fact that Arnoldi’s method wouldrequire exact solves of the correction equation to maintain its constitutive or-thogonality relations, whereas Davidson-type methods are tailored to accom-modate approximate solutions, and these can be computed eﬃciently via somesteps of multigrid preconditioned ﬂexible GMRES.A description of the generalized Davidson method to obtain one eigenpair isgiven in Algorithm 1. Techniques for computing several eigenpairs and restart-ing will be reviewed in the subsequent section. Algorithm 1:

Generalized Davidson (basic) input: initial guess t , desired accuracy ε output: eigenpair ( λ, x ) V = ∅ for m = 1 , , . . . t = ( I − V V H ) t v m = t/ || t || V = [ V | v m ] H = V H AV get target eigenpair ( θ, s ) of H u = V s r = Au − θu if || r || ≤ ε λ = θ , x = u return compute t as a function of A , r and θ λ AMG

The method we propose for the Hermitian Wilson-Dirac operator is based onAlgorithm 1 but incorporates several adaptations for the Hermitian Wilson-Dirac operator and the underlying DD- α AMG multigrid solver.8 ﬁrst challenge is that we are confronted with a “maximally indeﬁnite”interior eigenvalue problem, seeking the eigenvalues closest to zero, while theoperator has a nearly equal amount of positive and negative eigenvalues. Thebasic generalized Davidson method uses the Rayleigh Ritz procedure to deter-mine the Ritz approximation by solving the standard eigenvalue problem (3.2)for H = V Hm AV m . Ritz values approximate outer eigenvalues better and fasterthan the interior ones [34], which is why we use harmonic Ritz values [32] in-stead. Deﬁnition 1 (Harmonic Ritz Values) . A value θ ∈ C is called a harmonic Ritzvalue of A with respect to a linear subspace V if θ − is a Ritz value of A − withrespect to V . As the exterior eigenvalues of A − are the inverses of the eigenvalues of A ofsmall modulus, harmonic Ritz values tend to approximate small eigenvalues well.Inverting A to obtain harmonic Ritz values can be avoided with an appropriatechoice for V as stated in the following theorem; cf. [36]. Theorem 1.

Let V be some m -dimensional subspace with basis v , . . . , v m . Avalue θ ∈ C is a harmonic Ritz value of A with respect to the subspace W := A V ,if and only if Au m − θu m ⊥ A V for some u m ∈ V , u m (cid:54) = 0 . (3.4) With V m := [ v | . . . | v m ] , W m := AV m and H m := ( W Hm V m ) − W Hm AV m , (3.4) is equivalent to H m s = θs for some s ∈ C m , s (cid:54) = 0 and u m = V m s. Due to Theorem 1, we can obtain harmonic Ritz values by solving the gen-eralized eigenvalue problem W Hm AV m u = θW Hm V m u. (3.5)The computational overhead compared to the standard Ritz procedure isdominated by m additional inner products to build W Hm AV m . In our numericaltests, we have observed that this is compensated by a faster convergence of thegeneralized Davidson method, cf. Section 4.Although the multigrid approach is viable for the Hermitian Wilson-Diracoperator Q , it is, in practice, slower than for D . For exact solves of the subdo-main systems in the SAP smoother, the discussion in Section 2 and Proposition 1implies that the convergence speeds for Q and for D are comparable as the er-ror propagation operators are identical. Though in computational practice, it ismore eﬃcient to do only approximate solves for the subdomain systems, usinga small number of GMRES steps, for example. In this scenario the multigrid9ethod becomes signiﬁcantly slower when used for Q rather than D , see Fig-ure 6 in Chapter 4. This slowdown can be countered by left-preconditioning thecorrection equation with Γ . This means that instead of solving (1.1) with Q ,we can transform it equivalently according to( Q − τ I ) t = r (3.6) ⇐⇒ Γ ( Q − τ I ) t = Γ r ⇐⇒ ( D − τ Γ ) t = Γ r. (3.7)The spectrum of the resulting operator Γ Q ( τ ) := D − τ Γ has similaritiesto that of D with some eigenvalues collapsing on the real axis. As we willsee in Chapter 4, this simple transformation speeds up the multigrid methodsigniﬁcantly. Figure 2 shows full spectra of D , Q and Γ Q ( τ ) for a conﬁgurationon a small 4 lattice. − − . − − . . .

52 0 1 2 3 4 5 6 7 8 i m ag i n a r y a x i s real axis 0 − − − − i m ag i n a r y a x i s real axis − − . − − . . .

52 0 1 2 3 4 5 6 7 8 i m ag i n a r y a x i s real axis Figure 2: Full Spectra of D , Q and Γ Q ( τ ) for conﬁguration 5 (see Table 1). Theshaded area of spec( Q ) highlights the eigenvalues we are particularly interestedin. Restarting and locking.

As the search space grows in every outer iteration,the storage and orthogonalization costs of the outer iteration in a generalizedDavidson method eventually become prohibitively large. The following tech-niques reduce these costs in order to achieve a near-linear scaling in the numberof computed eigenpairs. The ﬁrst technique is thick restarting [36]. When thesearch space reaches a size of m max , we perform a restart by discarding thecurrent search space. At the same time we keep the ﬁrst m min smallest non-converged harmonic Ritz vectors and use them to span the search space at thebeginning of the next restart cycle. We tuned the parameters m min and m max such that we have reason to assume that we keep both positive and negativeharmonic Ritz values within the new search space. This way the eigensolver ob-tains a (nearly) equal amount of positive and negative eigenpairs in a uniformway.In order to avoid re-targeting converged eigenpairs, we employ the conceptof locking converged eigenpairs [40] as a second technique. Locking keeps thesearch space V orthogonal to the space of already converged eigenvectors X .In this manner, it is not required to keep converged eigenvectors in the searchspace which has the eﬀect that the search space dimension becomes bounded10ndependently of the number of eigenpairs sought. This in turn bounds the costfor computing the harmonic Ritz pairs. The new search direction still has to beorthogonalized against all previous eigenvectors, which leads to costs of order O ( nk ), since it consists of k − k eigenpairs.This is responsible for the fact that, in principle, the cost of our method scalessuperlinearly with k , and this becomes visible when k becomes suﬃciently large. Local coherence and its eﬀect on the correction equation.

The strengthof algebraic multigrid methods relies on an eﬀective coarse grid correction stepand thus on the construction of the interpolation operator P . The methods inuse for the Wilson-Dirac operator are all adaptive: They require a setup phasewhich computes “test vectors” w i , i = 1 , . . . , n tv which are collected as columnsin the matrix W = [ w | . . . | w n tv ]. The test vectors are approximations toeigenvectors corresponding to small eigenvalues of the unshifted Wilson-Diracoperator D . The matrix W is then used to build an aggregation based, “blockdiagonal” interpolation operator P , where each diagonal block constitutes an aggregate , i.e., a block A i of W corresponding to the degrees of freedom of ablock of the lattice L ; see Figure 3 and [19, 31]. ( w , . . . , w n tv ) = = A A A s → P = A A A s Figure 3: Matrix view of the construction of the aggregation based interpolationoperator P .By construction, the range of an aggregation based interpolation P contains at least the range spanned by the test vectors it is being built from. In [26] it hasbeen observed that eigenvectors belonging to small eigenvalues of the Wilson-Dirac operator D are locally coherent in the sense that these eigenvectors arelocally similar, i.e., they are similar on the individual aggregates. This is thereason why the span of an aggregation based interpolation P contains goodapproximations to small eigenpairs far beyond those which are explicitly used forits construction. This in turn explains the eﬃciency of such P in the multigridmethod.We can study local coherence using the local coherence measure lc of a vector v deﬁned as lc( v ) = (cid:107) Π v (cid:107) / (cid:107) v (cid:107) , where Π denotes the orthogonal projection on the range of P . If lc( v ) is closeto 1, there is a good approximation to v in the range of P , implying that the11ultigrid coarse grid correction reduces error components in the direction of v almost to zero. -1000 0 1000020406080100120 -1000 0 1000020406080100120 Figure 4: Local coherence for D (left) and Q (right) for a 4 conﬁguration, cf.Table 1.Figure 4 gives values for lc( v ) for the Wilson-Dirac operator D and thecorresponding Hermitian Wilson-Dirac operator Q on a 4 lattice. Since thislattice is so small, we can compute the full spectrum (12 · = 3072 eigenpairs)of both D and Q . For each matrix we then consider a partitioning of theeigenvectors into 128 sets, each set consisting of 24 consecutive eigenpairs. Here,“consecutive” refers to an ordering based on the modulus and the sign of the realpart; see the next paragraph for details. For each of these “interpolation sets”,the corresponding row displays the color coded value of lc( v ) when projectingan eigenvector v with the projection Π corresponding to the aggregation-basedinterpolation P built with the eigenvectors from that interpolation set as testvectors. The aggregates used were based on a decomposition of the 4 latticeinto 16 sub-lattices of size 2 . Due to the spin structure preserving approach,we have two aggregates per sub-lattice, each built from the corresponding spincomponents of the 24 test vectors . Of course lc( v ) = 1 (dark red) if v is fromthe respective interpolation set.The numbering of the eigenvalues used in these plots is as follows: The plotfor D has its eigenvalues with negative imaginary part in its left half, ordered bydescending modulus and enumerated by increasing, negative indices includingzero, − , . . . ,

0. The eigenvalues with positive imaginary part are located inthe right half, ordered by ascending modulus and enumerated with increasingpositive indices 1 , . . . , Q we just order the real eigenvalues by thenatural ordering on the reals, using again negative and positive indices. Thus,for D as for Q , eigenvalues small in modulus are in the center and their indicesare small in modulus, while eigenvalues with large modulus appear at the left The projection Π therefore projects onto a subspace of dimension 24 · ·

16 = 768. If therewere no local coherence at all, the expected value of lc is thus 768 / (12 · ) = 0 . D and Q , but it is more pronounced for thenon-Hermitian Wilson-Dirac matrix. This especially holds directly next to theinterpolation sets (the diagonal in the plots). Secondly, local coherence is par-ticularly strong and far-reaching when projecting on the interpolation sets cor-responding to the smallest and largest eigenpairs in absolute values. In thecenter of both plots, we observe a star-shaped area with particularly high localcoherence. This area corresponds to around 10% of the smallest eigenvalues. Toa lesser extent, local coherence is also noticeable for the other parts of the spec-trum, as we consistently observe higher values for lc( v ) for eigenvectors close tothe respective interpolation set.The right part of Figure 5 reports similar information for the HermitianWilson-Dirac operator Q coming from a larger, realistic conﬁguration on a 64 × lattice. For lattices of this size we cannot compute the full spectrum, thus weshow the values for the 984 smallest eigenpairs, subdivided in 41 interpolationsets, each consisting of 24 consecutive eigenpairs. The aggregates were this timeobtained from 4 sublattices. For comparison, the left part of the ﬁgure shows azoomed-in part of the local coherence plot for Q for the 4 -lattice from Figure 4. -200 -100 0 100 20051015 -400 -200 0 200 40010203040 Figure 5: Local coherence for Q for diﬀerent lattices, focusing on the eigenpairsclosest to zero. Left:

432 eigenpairs of a 4 lattice. Right:

984 eigenpairs of a64 × lattice.First note that the colors encode diﬀerent values in the left and right part ofFigure 5. Local coherence does not drop below 0 . .

6. On the other hand, 984eigenvalues only correspond to a minuscule fraction of roughly 4 · − % of thetotal of 12 · ·

64 = 25 165 824 eigenvalues, which is much less than the roughly10% depicted for the small conﬁguration. For the interpolation operator of thelarge conﬁguration we used aggregates corresponding to 4 sub-lattices, whichgive a total of 2 × = 16 384 aggregates. In relative terms, this is several13rders of magnitude ﬁner as for the 4 lattice. This ﬁner aggregation leads tointerpolation operators with increased faculties to recombine information, whichexplains the resulting higher local coherence.Both parts of Figure 5 show that local coherence drops oﬀ for eigenvectorsfarther away from the interpolation set. For the large conﬁguration, we see, forexample, that local coherence of the vectors from the last interpolation set withthe second-to-last interpolation set is very high as indicated by the deep redcolor in the top right corner of the plot. The local coherence of these vectorswith respect to the central interpolation set (which contains the eigenpairs witheigenvalues closest to 0) is signiﬁcantly smaller, indicated by the yellow colorat the middle of the right-hand boundary of the plot. In the scenario where wechoose the shift τ in the correction equation (3.7) farther away from zero—aswe are targeting eigenpairs close to τ —a coarse grid operator constructed usingthe eigenpairs closest to zero thus becomes increasingly less eﬀective, reducingthe overall convergence speed of the multigrid method signiﬁcantly. To remedythis, we propose a dynamical interpolation updating approach, resulting in acoarse grid operator that remains eﬀective on the span of the eigenvectors witheigenvalues close to the value of τ set in the outer iteration of the generalizedDavidson method. In the course of the outer iteration, once enough eigenpairsare available, we therefore rebuild the interpolation, and with it the coarsegrid operator, using the already converged eigenvectors which are closest to thecurrently targeted harmonic Ritz value. Once a harmonic Ritz value convergedto an eigenvalue and we choose a new target value τ that has the same sign asthe previous target, we replace one eigenvector from the interpolation set—theone farthest away from τ —with the newly converged eigenvector, and updatethe multigrid hierarchy. If the new τ has its sign opposite to the previous one wereplace the full interpolation set with converged eigenvectors closest to the new τ , and again update the multigrid hierarchy. The updates of the interpolationand coarse grid operators involve some data movement and local operations, buttheir cost is minor compared to the cost of the other parts of the computation.With this approach, the coarse grid is always able to treat the eigenspaceclosest to our current harmonic Ritz approximation eﬃciently and makes op-timal use of the existing local coherence. This results in a signiﬁcantly fastermultigrid solver when larger shifts are used, i.e., when a large number of smalleigenpairs has to be computed. Since the solution of these shifted systemsaccounts for most of the work in the eigensolver this approach improves the14igenvalue scaling to a nearly linear complexity, as seen in Section 4. Algorithm 2:

GD- λ AMG input:

Hermitian Dirac operator Q , no. of eigenvalues n , no. of testvectors n tv , min. and max. subspace size m min and m max , initialguess [ v , . . . , v n tv , t ] =: [ V | t ], desired accuracy ε outer output: set of n eigenpairs (Λ , X ) Λ = ∅ , X = ∅ for m = n tv + 1 , n tv + 2 , . . . t = ( I − V V H ) t , t = ( I − XX H ) t v m = t/ || t || V = [ V | v m ] get all ( θ i , s i ) with ( QV ) H ( QV ) s i = θ i ( QV ) H V s i ﬁnd smallest (in modulus) θ i / ∈ Λ u = V s i , r = Qu − θ i u if || r || ≤ ε outer // current eigenpair has converged Λ = [Λ , θ i ], X = [ X, u ] update smallest (in modulus) θ i / ∈ Λ u = V s i , r = Qu − θ i u rebuild interpolation using the n tv eigenvectors x j with eigenvalue λ j closest to θ i and update multigrid hierarchy // solve correction equation t = DD- α AMG (( D − θ i Γ ) , Γ r ) // restart if m ≥ m max get (Θ , S ) as all eigenpairs ( θ i , s i ) of ( QV ) H ( QV ) s i = θ ( QV ) H V s i sort (Θ , S ) by ascending modulus of Θ for i = 1 , . . . , m min V i = V s i ( QV ) i = ( QV ) s i retain ﬁrst m min vectors of V and QV A summary of our eigensolver, termed GD- λ AMG, a generalized Davidsonmethod with algebraic multigrid acceleration, is given as Algorithm 2.

In this section we present a variety of numerical tests to analyze the eﬃciencyof the GD- λ AMG eigensolver.Table 1 contains information on the gauge conﬁgurations we use in our tests.The two small conﬁgurations on a 4 and a 8 lattice were generated using ourown heat-bath algorithm. The conﬁgurations on the larger lattices (conﬁgura-tions 1 to 4) were provided by our partners at the University of Regensburgwithin the Collaborative Research Centre SFB-TRR55; see [3]. For these con-15D lattice size hopping parameter mass parameter clover term CPU N t × N s κ a · m c sw cores1 48 × − . . × − . . , × − . . , × − . . , × – − . × – − . α AMG setup number of test vectors n tv α AMG solve relative residual ε inner − maximum iterations 5coarse grid tolerance 5 · − eigensolver method relative eigenvector residual ε outer − number of eigenpairs 100minimum subspace size m min m max default parameters.ﬁgurations, we actually use clover improved (see [35]) Wilson-Dirac operators D and Q , where a block diagonal term with 6 × O ( a ) to O ( a ). The resulting mod-iﬁed D is still Γ -Hermitian. The mass parameter m from (2.5) is chosen suchthat a · m = κ − m such that we obtain a comparable conditioning ofthe matrix. A second table, Table 2, shows the default algorithmic parametersettings we used within GD- λ AMG. For a more detailed explanation of theseparameters we refer to [19].The numerical results involving conﬁgurations 1–4 were obtained on the JU-RECA and JUWELS clusters at the J¨ulich Supercomputing Centre [23, 24],while results involving the other lattices were obtained on a smaller worksta-tion. We will compare our results with the state-of-the-art library PRIMME andwith PARPACK, and we start by outlining their underlying basic algorithms.16

RIMME (PReconditioned Iterative MultiMethod Eigensolver) [41, 43]implements a broad framework for diﬀerent Davidson-type eigensolvers. Itsperformance is best if it is given an eﬃcient routine to solve linear systems withthe matrix A , and we do so by providing the Γ -preconditioned DD- α AMGsolver. There are two key diﬀerences compared to GD- λ AMG: • The interpolation cannot be updated eﬃciently within the PRIMME frame-work (at least not without expert knowledge on the underlying data struc-tures), hence we do not update it for this method. • PRIMME uses a Rayleigh Ritz instead of a harmonic Ritz approach toextract eigenvalue approximations.PRIMME has a fairly ﬁne-tuned default parameter set, e.g., for subspace sizeor restart values, and is able to dynamically change the eigensolver method. Wekeep the default settings and provide the same multigrid solver to PRIMME aswe do for GD- λ AMG.

PARPACK (Parallel ARnoldi PACKage) [38] is a somewhat older butwidely used software for the computation of eigenvalues of large sparse ma-trices. It is based on an implicitly restarted Arnoldi method, which is originallydesigned to ﬁnd extremal eigenvalues. It is possible to transform an interiorproblem into an exterior one using a ﬁlter polynomial, i.e., a polynomial whichis large on the k interior eigenvalues we are looking for and small on the re-maining ones. To construct such a polynomial, for example as a Chebyshevpolynomial, we need information on the eigenvalue λ max which is largest inmodulus and the ( k + 1)st smallest in modulus, λ k +1 . While λ max = 8 is asuﬃciently good estimate for the Hermitian Wilson-Dirac matrix Q , no a-prioriguess for λ k +1 is available in realistic scenarios. For our tests, we run one ofthe other methods to compute the ﬁrst k eigenvalues and then use a slightlylarger value as a guess for λ k +1 . While this approach obviously costs a lot ofadditional work and actually makes the subsequent Arnoldi method obsolete, itis a good reference for a near-optimally polynomially ﬁltered Arnoldi method.Since this approach does not require inversions of the matrix Q , the parameterset for this method is rather small. We use a degree ten Chebyshev polynomialas the ﬁlter polynomial and set the maximum subspace size to be twice thenumber of sought eigenpairs. The required eigenvector residual is set to 10 − ,as with the other methods. Solving the correction equation.

Each step of Algorithm 2 uses DD- α AMGin line 14 to solve the Γ -preconditioned correction equation ( D − τ Γ ) t = Γ r .More precisely, as indicated by the parameters given in the middle of Table 2,we stop the outer (FGMRES) iteration of DD- α AMG once the initial residualnorm is reduced by a factor of 0 . α AMG iteration we require a reduction of the residual by afactor of 0.5 when solving the system on the coarsest level. Table 3 shows thatthe Γ -preconditioning yields indeed signiﬁcant gains in compute time.correction equation iterations Timeouter inner in core-h.Eq. (3.6): ( Q − τ I ) t = r

565 10,349 83.0Eq. (3.7): ( D − τ Γ ) t = Γ r

511 3,045 41.3Table 3: Impact of Γ -preconditioning for the computation of 100 eigenpairs ofconﬁguration 1 (see Table 1).A variant of generalized Davidson methods solves, instead of the correctionequation (3.3), the Jacobi-Davidson projected [36] system ( I − uu H )( A − θI )( I − uu H ), where u is the last (harmonic) Ritz vector approximation. This will avoidstagnation in the case that the correction equation is solved too exactly . Thereare theoretically justiﬁed approaches which adaptively determine how accuratelythe projected system should be solved in each iteration. Since we solve thecorrection equation to quite low relative precision (10 − only), we could not seea beneﬁt from using the Jacobi-Davidson projected system. Indeed, even withthe adaptive stopping criterion, this approach increased the compute time byapproximately 15%. Impact of the smoother

The original DD- α AMG method uses SAP as asmoother and we have shown in Chapter 2 that SAP is also applicable forthe Hermitian Wilson-Dirac operator Q , yielding the same error propagationoperator as long as the individual block systems are solved exactly. We nowcompare a cost-eﬃcient, approximate SAP and GMRES as smoothers withinthe multigrid methods constructed for the matrices D − τ I , Q − τ I and D − τ Γ ,where τ ranges from 0 to 0 . D − τ I is not relevantfor this work, since it would arise when computing eigenpairs for D . We stillinclude the results here to be able to compare the performance of DD- α AMGfor Q − τ I and D − τ Γ with the performance of DD- α AMG for D − τ I .Figure 6 shows a scaling plot with respect to the target shift τ for conﬁgu-ration 6. For this plot, we used a two-level DD- α AMG method with six stepsof the adaptive setup procedure to generate the coarse grid system. The GM-RES smoother requires a reduction of the residual by a factor of 10 − or after amaximum of ten iterations have been performed. Similarly, the SAP smootherperforms three sweeps of SAP, where each block solve is performed using GM-RES until a reduction of the residual by a factor of 10 − was achieved for theindividual block of after a maximum of ten iterations have been performed. Thisway the computational work for both smoothers is roughly comparable.Figure 6 veriﬁes what was stated in Section 3.1, namely that DD- α AMG con-verges more slowly for Q compared to D . It also shows that Γ -preconditioning18igure 6: Comparison of iteration counts of the DD- α AMG method using eitherSAP or GMRES smoothing for conﬁguration 6 and increasing target shifts τ .The black diamonds at the bottom depict the eigenvalue distribution of Q .is beneﬁcial in the case of GMRES smoothing whereas in the case of SAPsmoothing, it loses eﬃciency compared to Q , although only by a small margin.Comparing the two smoothing methods for D − τ Γ , we see that both methodsperform nearly identical up to larger shifts, where SAP starts to be slightlymore favorable. We do not expect this to be relevant for larger conﬁgurations,though, since there the spectrum is much more dense. Even when aiming for alarge number of small eigenvalues, we certainly do not expect to end up with τ -values as large as 0 . D − τ Γ within the DD- α AMG framework would require a more substantial remodelingof the DD- α AMG code.

Impact of the coarse grid correction

For an assessment of the impactof the coarse grid correction step we compute 100 eigenvalues for conﬁgura-tion 1, once using DD- α AMG with GMRES smoothing to solve the correctionequation, and once with a modiﬁcation where we turned-oﬀ the coarse grid cor-rection. This yields a generalized Davidson method where the Γ -preconditionedcorrection equation (3.7) is solved using FGMRES with the GMRES-steps ofthe smoother as a non-stationary preconditioner, i.e., GMRESR, the recursiveGMRES method [42]. Note that we do not yet include updating the multigridhierarchy as the outer iterations proceeds.The left part of Figure 7 shows the FGMRES iterations spent on the correc-19 i nn e r i t e r a t i o n s p e r e i g e n v a l u e ( s o li d li n e ) c u m u l a t e d i nn e r i t e r a t i o n s ( d a s h e d li n e ) eigenvalue indexFGMRESRFGMRES + AMG 010203040506070 25 50 75 100 020004000600080001000012000 c o r e - h ( s o li d li n e ) i nn e r i t e r a t i o n s ( d a s h e d li n e ) number of eigenvectorsw/o updates (core-h)w/ updates (core-h) Figure 7:

Left:

Computation of 100 eigenvalues for conﬁguration 1 with GM-RESR and FGMRES + AMG.

Right:

Comparing eigenvalue scaling for con-ﬁguration 2 depending on whether eigenvalue information is provided for theinterpolation operator.tion equation for computing 100 eigenvalues for the two variants. We see thatright from the beginning, including the coarse grid correction, i.e., using themultigrid method, reduces the iteration count by one order of magnitude com-pared to the “pure” GMRESR-Krylov subspace method. The required numberof FGMRES iterations per eigenvalue stays constant at ≈

30 for the multigridmethod, whereas GMRESR starts at ≈

300 and increases to ≈ ,

200 for the lasteigenvalues. This is also reﬂected in CPU time, where on JUWELS multigridpreconditioning results in 30 core-h for the entire computation, whereas 217core-h were necessary when using GMRESR. Thus multigrid gains one order ofmagnitude and, in addition, shows an improved scaling behaviour, despite theloss of local coherence for the larger eigenvalues.The right part of Figure 7 now illustrates the additional beneﬁts that we getfrom turning on the updating of the multigrid hierarchy, i.e., when performingfull GD- λ AMG as described in Algorithm 2. Both approaches perform similarlyas long as a small amount of eigenvalues is sought. This changes substantiallyfor already a moderate amount of eigenvalues to a point where interpolationupdates save roughly a factor of two in both, number of iterations (dashed lines)and consumed core-h (solid lines). In terms of iterations it is also noteworthythat interpolation updates lead to a nearly linear scaling with respect to theeigenvalue count, whereas in the other case the scaling is closer to quadratic.

Scaling with the lattice size.

We now compare GD- λ AMG, PRIMME andPARPACK in terms of scaling with respect to the lattice size. For this, we reportthe total core-h consumed for computing 100 eigenpairs on conﬁgurations 1 to4. Figure 8 shows that PRIMME and GD- λ AMG scale similarly with increas-20 × × × × c o r e - h lattice size N t × N s PARPACKPRIMME + AMGGD- λ AMG 10 × × × × c o r e - h ( s o li d li n e ) i t e r a t i o n s ( d a s h e d li n e ) lattice size N t × N s Rayleigh-RitzHarmonic Ritz

Figure 8:

Left:

Computation of 100 eigenvalues for 48 × to 64 × latticesfor diﬀerent methods. Right:

Comparison of Ritz and harmonic Ritz eigenpairextraction for diﬀerent lattice sizes.ing lattice size. GD- λ AMG shows some improvement in core-h compared toPRIMME, and this improvement tends to get larger when increasing the lat-tice size. The right part of Figure 8 shows, that this improvement might bepartially attributed to the fact that we use a harmonic Ritz extraction. Here,we compare GD- λ AMG with its default harmonic Ritz extraction to a variantwhere we use the standard Rayleigh-Ritz extraction as is done in PRIMME. TheFigure shows that harmonic Ritz extractions result in substantially less inneriterations. This also yields savings in computational time, which are smaller,due to the additional cost for the inner products. Note that for larger latticeseigenvalues become more clustered. The harmonic Ritz extraction is then morefavorable compared to the Rayleigh-Ritz approach, since it is able to better sep-arate the target eigenvalue from the neighboring ones. PARPACK scales worsethan the other methods, even when we use an unrealistic “near optimal” ﬁlterpolynomial as we did here. In practice, i.e., when no guess for | λ k +1 | is available,PARPACK’s performance would fall even further behind. Applying PARPACKto Q − to make use of the eﬃcient multigrid solver is way too costly, due to thenecessity of very exact solves to maintain the Krylov structure. Scaling with the number of eigenvalues.

Figure 9 reports results of ascaling study obtained for conﬁguration 2. We just compare GD- λ AMG andPRIMME, since PARPACK is not competitive.The ﬁgure shows that GD- λ AMG has an advantage over PRIMME whenlarger numbers of eigenvalues are sought. GD- λ AMG needs up to one order ofmagnitude less iterations, which translates to a speed-up of 1 . λ AMG and PRIMME both scale nearly linearly with respect to thenumber of eigenvalues sought, up to at least 300 eigenvalues. Then PRIMME’s21

50 100 500 1000 10 c o r e - h ( s o li d li n e s ) i t e r a t i o n s ( d a s h e d li n e s ) number of eigenvectors N PRIMME + Γ -precGD- λ AMG

Figure 9: Eigenvalue scaling in the range of 50 to 1000 eigenpairs on conﬁgura-tion 2 with a lattice size of 64 × .performance starts to decrease more signiﬁcantly compared to GD- λ AMG. Wesee that the increase in the overall computing time in PRIMME scales morethan linearly with the number of iterations to be performed. This indicatesthat the non-adaptive multigrid solver used in PRIMME is getting increasinglyless eﬃcient, a situation that is remedied with the update strategy realized inGD- λ AMG.

This paper has a clear focus on algorithmic development to compute smalleigenmodes. Fur purposes of illustration, we now include an example whichrelates our computed eigenvalues to the general approach of lattice QCD.The spectral gap is the relevant quantity for the stability of Monte Carlosimulations of lattice QCD [13]. It is deﬁned as the smallest eigenvalue inmagnitude of the Hermitian Wilson-Dirac operator Q . If chiral symmetry waspreserved, the spectral gap would be bounded from below by the bare current-quark mass m . Since the Wilson-Dirac operator breaks chiral symmetry it ispossible that the gap is smaller than m . In the following we adopt the notationof [13] and denote the eigenvalues of Q by α i .Figure 10 shows the number of eigenvalues √ α i per bin and fm (blue line)at the low end of the spectrum computed on conﬁguration 2 with a lattice size of64 × and N f = 2 mass-degenerate non-perturbatively O( a ) improved quarks,cf. Table 1. The lattice spacing is a = 0 .

071 fm from [3] and the bin size is setto 1 MeV.It is interesting to compare these computational results to what we knowanalytically in the continuum theory in the inﬁnite volume limit. The spectral22

Figure 10: Number of eigenvalues √ α i of | Q | per bin and fm at the low end ofthe spectrum on conﬁguration 2 with a lattice size of 64 × . For comparisonthe same quantity in the continuum theory is also shown, cf. Eq. (5.1), taking m = 8 .

22 MeV and Σ = (250 MeV) .density can be computed from the Banks–Casher relation [4, 13] asymptotically˜ ρ ( √ α ) α>m = 2 √ α (cid:20) Σ π √ α − m + O(1) (cid:21) , (5.1)which goes to inﬁnity for √ α → m and is not deﬁned for √ α < m , resulting inno eigenvalues smaller than m for the continuum case. Note that the Banks-Casher relation can be used to determine the chiral condensate Σ from thenon-zero density of eigenmodes at the origin in inﬁnite volume; cf. [22]. Inorder to compare to our lattice results for conﬁguration 2 we compute the barecurrent-quark mass m using [17, Eq. (E.1)]. The spectral gap (black verticalline in ﬁgure 10) is given by √ ¯ α = Z A m [13], which we evaluate using theaxial-current renormalization constant Z A from [9]. The resulting distributionof eigenvalues √ α per bin and fm for the continuum operator (red line) isalso plotted in Figure 10. The comparison to the lattice data shows that onconﬁguration 2 there is a signiﬁcant amount of eigenvalues smaller than √ ¯ α ,and the distribution close to the gap deviates from Equation (5.1). The ﬁgurethus illustrates quantitatively the deviations due to lattice artifacts which arenot unexpected for the given lattice sizes. In this paper we introduced an eigensolver built around the multigrid methodDD- α AMG for eﬃcient shifted inversions within a generalized Davidson method.23everal adaptations were included in order to improve eigenvalue scaling by atleast a factor of three for a moderate to large amount of eigenvalues comparedto current general purpose eigensolver software. This was accomplished by im-plementing a synergy between the generalized Davidson method and the AMGsolver. Additionally, we incorporated several state-of-the-art techniques likelocking, thick restarting and harmonic Ritz extraction to achieve a nearly lin-ear eigenvalue scaling. We included a variety of numerical tests to verify theeﬃciency of our proposed adaptations.We can now use the new algorithm to compute relatively many small eigen-modes of large conﬁgurations. This allows to advance deﬂation approachesfor stochastic trace estimation as required in the computation of disconnectedfermion loops. We plan to explore this further in cooperation with the Europeantwisted mass collaboration. Note that the eigenpairs of the symmetrized Wilsonand twisted mass operators diﬀer by an imaginary shift in the eigenvalues only.With the algorithm presented in this work it is now also possible to perform asystematic study of the spectral gap and the spectral density varying the latticevolume, the lattice spacing, and the quark mass. We plan to extend the workof [13] to the N f = 2 + 1 O( a ) improved theory, cf. [8] in the future. The authors are grateful for the fruitful collaboration with Benjamin M¨uller ofthe University of Mainz and our colleagues within the SFB-TR 55 from the The-oretical Physics department of the University of Regensburg, especially GunnarBali and his group for providing conﬁgurations. We thank the J¨ulich Super-computing Centre (JSC) for access to high performance computing resourcesthrough NIC grant HWU29.

References [1] R. Babich, J. Brannick, R. C. Brower, M. A. Clark, T. A. Manteuﬀel, S. F.McCormick, J. C. Osborn, and C. Rebbi. Adaptive multigrid algorithm forthe lattice Wilson-Dirac operator.

Phys. Rev. Lett. , 105:201602, 2010.[2] G. Bali, S. Collins, A. Frommer, K. Kahl, I. Kanamori, B. M¨uller,M. Rottmann, and J. Simeth. (Approximate) Low-Mode Averaging with anew Multigrid Eigensolver.

PoS , LATTICE2015:350, 2015.[3] G. Bali, S. Collins, B. Gl¨aßle, M. G¨ockeler, J. Najjar, R. H. R¨odl,A. Sch¨afer, R. W. Schiel, A. Sternbeck, and W. S¨oldner. The Moment (cid:104) x (cid:105) u − d of the Nucleon From N f = 2 Lattice QCD Down to Nearly PhysicalQuark Masses. Phys. Rev. , D90(7):074510, 2014.[4] T. Banks and A. Casher. Chiral Symmetry Breaking in Conﬁning Theories.

Nucl. Phys. B , 169:103–125, 1980.245] B. Blossier, M. Della Morte, N. Garron, G. von Hippel, T. Mendes,H. Simma, and R. Sommer. HQET at Order 1 /m : II. Spectroscopy inthe Quenched Approximation. JHEP , 05:074, 2010.[6] J. Brannick, R. C. Brower, M. A. Clark, J. C. Osborn, and C. Rebbi. Adap-tive multigrid algorithm for lattice QCD.

Phys. Rev. Lett. , 100:041601,2007.[7] James Brannick and Karsten Kahl. Bootstrap algebraic multigrid for the 2dWilson Dirac system.

SIAM Journal on Scientiﬁc Computing , 36(3):B321–B347, 2014.[8] M. Bruno, D. Djukanovic, G. P. Engel, A. Francis, G. Herdoiza, H. Horch,P. Korcyl, T. Korzec, M. Papinutto, S. Schaefer, E. E. Scholz, J. Simeth,H. Simma, and W. Soeldner. Simulation of QCD With N f = 2 + 1 Flavorsof Non-Perturbatively Improved Wilson Fermions. JHEP , 02:043, 2015.[9] M. Dalla Brida, T. Korzec, S. Sint, and P. Vilaseca. High Precision Renor-malization of the Flavour Non-Singlet Noether Currents in Lattice QCDWith Wilson Quarks.

Eur. Phys. J. C , 79(1):23, 2019.[10] E. R. Davidson. The Iterative Calculation of a few of the Lowest Eigen-values and Corresponding Eigenvectors of Large Real-Symmetric Matrices.

Journal of Computational Physics , 17:87–94, January 1975.[11] T. DeGrand and C. E. Detar.

Lattice Methods for Quantum Chromody-namics . World Scientiﬁc, 2006.[12] T. A. DeGrand and S. Schaefer. Improving Meson two Point Functions inLattice QCD.

Comput. Phys. Commun. , 159:185–191, 2004.[13] L. Del Debbio, L. Giusti, M. L¨uscher, R. Petronzio, and N. Tantalo. Sta-bility of Lattice QCD Simulations and the Thermodynamic Limit.

JHEP ,02:011, 2006.[14] S. C. Eisenstat, H. C. Elman, and M. H. Schultz. Variational iterativemethods for nonsymmetric systems of linear equations.

SIAM Journal onNumerical Analysis , 20(2):345–357, 1983.[15] E. Endress, C. Pena, and K. Sivalingam. Variance Reduction With Prac-tical all-to-all Lattice Propagators.

Comput. Phys. Commun. , 195:35–48,2015.[16] J. Foley, K. Jimmy Juge, A. O’Cais, M. Peardon, S. M. Ryan, andJ. Skullerud. Practical all-to-all Propagators for Lattice QCD.

Comput.Phys. Commun. , 172:145–162, 2005.[17] P. Fritzsch, F. Knechtli, B. Leder, M. Marinkovi´c, S. Schaefer, R. Sommer,and F. Virotta. The Strange Quark Mass and Lambda Parameter of twoFlavor QCD.

Nucl. Phys. B , 865:397–429, 2012.2518] A. Frommer, K. Kahl, S. Krieg, B. Leder, and M. Rottmann. Aggregation-based multilevel methods for lattice QCD. In

Proceedings of Science, http://pos.sissa.it , volume LATTICE2011:046, 2011.[19] A. Frommer, K. Kahl, S. Krieg, B. Leder, and M. Rottmann. Adaptiveaggregation based domain decomposition multigrid for the lattice Wilson-Dirac operator.

SIAM J. Sci. Comp. , 36(4):A1581–A1608, 2014.[20] C. Gattringer and C. B. Lang.

Quantum Chromodynamics on the Lattice ,volume 788 of

Lect. Notes Phys.

Springer, 2009.[21] L. Giusti, P. Hernandez, M. Laine, P. Weisz, and H. Wittig. Low-EnergyCouplings of QCD From Current Correlators Near the Chiral Limit.

JHEP ,04:013, 2004.[22] L. Giusti and M. Luscher. Chiral Symmetry Breaking and the Banks-Casher Relation in Lattice QCD With Wilson Quarks.

JHEP , 03:013,2009.[23] J¨ulich Supercomputing Centre. JURECA -J¨ulich research on exascale cluster architectures. .[24] J¨ulich Supercomputing Centre. JUWELS -J¨ulich wizard for european leadership science. .[25] M. L¨uscher. Solution of the Dirac Equation in Lattice QCD Using a DomainDecomposition Method.

Comput. Phys. Commun. , 156:209–220, 2004.[26] M. L¨uscher. Local coherence and deﬂation of the low quark modes in latticeQCD.

JHEP , 07(2007)081, 2007.[27] I. Montvay and G. M¨unster.

Quantum Fields on a Lattice . CambridgeMonographs on Mathematical Physics. Cambridge University Press, 1994.[28] R. B. Morgan and D. S. Scott. Generalizations of Davidson’s method forcomputing eigenvalues of sparse symmetric matrices.

SIAM J. Sci. Stat.Comput. , 7(3):817–825, 1986.[29] H. Neﬀ, N. Eicker, T. Lippert, J. W. Negele, and K. Schilling. On thelow Fermionic Eigenmode Dominance in QCD on the Lattice.

Phys. Rev. ,D64:114509, 2001.[30] Y. Notay. Combination of Jacobi-Davidson and conjugate gradients for thepartial symmetric eigenproblem.

Numer. Linear Algebra Appl. , 9(1):21–44,2002. 2631] J. C. Osborn, R. Babich, J. Brannick, R. C. Brower, M. A. Clark, S. D.Cohen, and C. Rebbi. Multigrid solver for clover fermions.

PoS , LAT-TICE2010:037, 2010.[32] C. C. Paige, B. N. Parlett, and H. A. van der Vorst. Approximate solutionsand eigenvalue bounds from Krylov subspaces.

Numer. Linear AlgebraAppl. , 2(2):115–133, 1998.[33] Y. Saad. A ﬂexible inner-outer preconditioned GMRES algorithm.

SIAMJ. Sci. Comput , 14(2):461–469, 1992.[34] Y. Saad.

Numerical Metheods for Large Eigenvalue Problems . SIAM,Philadelphia, PA, USA, 2nd edition, 2011.[35] B. Sheikholeslami and R. Wohlert. Improved Continuum Limit LatticeAction for QCD With Wilson Fermions.

Nucl. Phys. , B259:572, 1985.[36] G. L. G. Sleijpen and H. A. van der Vorst. A Jacobi-Davidson itera-tion method for linear eigenvalue problems.

SIAM J. Matrix Anal. Appl. ,17(2):401–425, 1996.[37] G. D. Smith.

Numerical Solution of Partial Diﬀerential Equations: FiniteDiﬀerence Methods . Oxford Applied Mathematics and Computing ScienceSeries. Clarendon Press, 1985.[38] D.C. Sorensen, R.B. Lehoucq, C. Yang, and K. Maschhoﬀ. PARPACK. , used version:2.1, September 1996.[39] A. Stathopoulos. Nearly optimal preconditioned methods for Hermitianeigenproblems under limited memory. part i: Seeking one eigenvalue.

SIAMJ. Sci. Comput. , 29(2):481–514, 2007.[40] A. Stathopoulos and J. R. McCombs. Nearly optimal preconditioned meth-ods for Hermitian eigenproblems under limited memory. part ii: Seekingmany eigenvalues.

SIAM J. Sci. Comput. , 29(5):2162–2188, 2007.[41] A. Stathopoulos and J. R. McCombs. PRIMME: PReconditioned Itera-tive MultiMethod Eigensolver: Methods and software description.

ACMTransactions on Mathematical Software , 37(2):21:1–21:30, 2010.[42] C. Vuik and H. van der Vorst. GMRESR: A family of nested GMRESmethods.

Numerical Linear Algebra with Applications , 1(4):369–386, 1994.[43] L. Wu, E. Romero, and A. Stathopoulos. PRIMME SVDS: A high-performance preconditioned SVD solver for accurate large-scale compu-tations.