[PDF] Comparison of the mass preconditioned HMC and the DD-HMC algorithm for two-flavour QCD

Abstract

Mass preconditioned HMC and DD-HMC are among the most popular algorithms to simulate Wilson fermions. We present a comparison of the performance of the two algorithms for realistic quark masses and lattice sizes. In particular, we use the locally deflated solver of the DD-HMC environment also for the mass preconditioned simulations.

Full PDF

aa r X i v : . [ h e p - l a t ] N ov Comparison of the mass preconditioned HMC andthe DD-HMC algorithm for two-ﬂavour QCD

CERN-PH-TH-2010-260HU-EP-10/68SFB/CPP-10-95

Marina Marinkovic ∗ Humboldt Universität zu Berlin, Institut für Physik, Newtonstr. 15, 12489 Berlin, GermanyE-mail: [email protected]

Stefan Schaefer

Humboldt Universität zu Berlin, Institut für Physik, Newtonstr. 15, 12489 Berlin, GermanyCERN, Physics Department, 1211 Geneva 23, SwitzerlandE-mail: [email protected]

Mass preconditioned HMC and DD-HMC are among the most popular algorithms to simulateWilson fermions. We present a comparison of the performance of the two algorithms for realisticquark masses and lattice sizes. In particular, we use the locally deﬂated solver of the DD-HMCenvironment also for the mass preconditioned simulations.

The XXVIII International Symposium on Lattice Field Theory, Lattice2010June 14-19, 2010Villasimius, Italy ∗ Speaker. c (cid:13) Copyright owned by the author(s) under the terms of the Creative Commons Attribution-NonCommercial-ShareAlikeLicence. http://pos.sissa.it/ omparison of mass preconditioned HMC and DD-HMC

Marina Marinkovic

1. Introduction

In order to accelerate numerical simulations in lattice QCD, different preconditioning tech-niques are being used. We will try to make a comparison between two popular ways of precon-ditioning the Hybrid Monte Carlo (HMC) for improved Wilson fermions: domain decomposi-tion introduced by Lüscher in the DD-HMC algorithm[1] and Hasenbusch’s mass preconditioning(MP)[2]. In both approaches, the quark determinant is factorized into a part which is dominatedby the infra-red and another which is largely ultra-violet. This leads to the reduction of the quarkforce magnitude in the molecular dynamics equations. Therefore, the associated integration stepsizes can be set to larger values, which gives an acceleration of the algorithm.Comparisons between algorithms are difﬁcult. In a modern lattice QCD computation, manyparameters have to be set. Since their number can go into the dozens—their optimal values de-pending on each other—it is virtually impossible to ﬁnd the minimum, at which each algorithmperforms best, and then make a true comparison. In particular, the performance is determined bythe auto-correlation times of the observables of interest, whose measurements require runs withhigh statistics. Furthermore, the optimal values of the parameters and the relative performance ofthe algorithms might also depend on the implementation and the computer the simulation is run on.At least from the point of view of implementation, we tried to have the comparison on virtuallyequal grounds: for the DD-HMC, we used Lüscher’s publicly available code and from it, wedeveloped an implementation of the mass preconditioned HMC, reusing as many building blocksas possible. In particular the locally deﬂated solver introduced in the DD-HMC framework[3] turnsout to greatly speed up the simulations in both algorithmic setups.The block decomposition, on which the DD-HMC is based, allows for a decoupling of theblocks for a large part of the forces in such a simulation. The links on the boundary are not updatedduring the trajectory, which results in reduced communication, and is therefore suitable for clusterswith fast nodes and a relatively slow connection. However, this also means that only a fraction R of the links is “active”. The naive expectation that the auto-correlation times grow proportionalto R − is conﬁrmed in pure gauge theory[4], however, with dynamical fermions the cost is notreduced accordingly. There is a competition between the need of the computer, which are smallblocks using many processors of massively parallel machines, and the need of the physics, whichasks for blocks of a physical size of at least 0 . http://cern.ch/luscher/dd-hmc omparison of mass preconditioned HMC and DD-HMC Marina Marinkovic

2. Action and algorithms

We are simulating N f = a ) improved Wilsonquarks, using the Wilson gauge action. The Dirac operator in this formulation is given by D ( m ) = D W + c sw (cid:229) m , n i s mn ˆ F mn + m , (2.1)where D W represents the unimproved Wilson Dirac operator without mass term, c sw is the improve-ment coefﬁcient and m the bare quark mass.In our implementation of MP-HMC, we use mass preconditioning for the Schur complementof the symmetric even-odd preconditioning. Starting from the standard decomposition,det Q = det ( g D ) = det Q ee det Q oo det Q S with Q S = − Q − ee Q eo Q − oo Q oe , (2.2)we write det Q = det Q ee det Q oo det (cid:2) W ( D m ) (cid:3) det (cid:2) W − ( D m ) Q S (cid:3) (2.3)where W ( D m ) = Q S + D m with D m >

0. This leads to the effective action S eff = − ( log det Q ee + log det Q oo ) + | W − ( D m ) F | + | ( + D mQ S ) F | . (2.4)Using again a Schur complement approach, the inverse of the preconditioned operator Q S can beconstructed from the inverse of the full Hermitian Dirac operator QQ − S = P e Q − Q ee P e . (2.5)In the following, the forces associated with pseudo-fermion ﬁeld F are denoted by F , whereas F are the forces from F .In the DD-HMC, the quark determinant is written as the product of the determinants of theDirac operator restricted to the blocks and a factor which accounts for the remaining contributionsto the fermion determinant. The latter factor couples the gauge ﬁelds on the different blocks. Theblock forces are referred to as F and F is the block interaction force. For details of this setup see[1]. In both setups, UV/IR separation due to the preconditioning opens the possibility to integrate F on a larger time scale than F .

3. Simulation parameters

We have performed runs on a 48 × lattice at b = . c sw = . a ≈ . r = . k sea = . m p ≈ m p L ≈ .

6. In all our runs, thetrajectory length is set to t = .

5, for which we take the normalization of Ref. [1]. These parameterswere also used in DD-HMC simulations without deﬂation in Ref. [7].In DD-HMC, the locally deﬂated Schwarz-preconditioned generalized conjugate residual (GCR)solver described in [3] is employed for the inversions in F . The less expensive inversions on theblocks, needed in the computation of F , are done with the BiCGstab algorithm. In the multiple3 omparison of mass preconditioned HMC and DD-HMC Marina Marinkovicwithout deﬂ. with deﬂ. h N GCR ( F ) i

107 23comp. time( F ) 780.42s 264.98 s Table 1:

Average number of GCR iterations per trajectory and the total execution time for the F computationin MP-HMC with the deﬂated solver and without the application of deﬂation. The size of the deﬂation blocksis 6 × . Note that a reduction in average time by a factor ∼ time scale integration we use N steps in the fermion force F for each of the N steps per trajectoryof F and analogously N steps of the gauge force per step in F . We choose N , N and N to be18, 5 and 6, respectively, for a block size of 6 × .Without much tuning, for the MP case we have taken the same step size at the outer forceand the rest of the parameters is chosen according to the ratio of the forces magnitude, i.e. k F k : k F k : k F k = N , N , N = 18, 10, 10. In ourversion of MP-HMC, the Schwarz-preconditioned GCR solver is employed for the computationof both F and F . The demanding inversions with the low quark mass in the F computation aredone with the deﬂated version of the solver whereas in the F computations the deﬂation was off.Here, the preconditioning parameter is the positive mass difference added to the symmetricallypreconditioned Dirac operator which we set to D m ≈ .

09. More tuning could probably lead to abetter performance than the one discussed below.In both setups, the standard leap frog integration is implemented and for the prediction of thesolution in all inversions, the chronological inversion method of Brower et al. [8] is used.

4. Solver performance

The Schwarz-preconditioned GCR solver used for all the inversions in MP-HMC is takenover from the DD-HMC environment. The results of intensive tests of this solver implementationwithin DD-HMC can be found in [9, 10]. As expected, we ﬁnd the improved performance of thedeﬂated solver also in the MP-HMC, gaining roughly a factor of three in the time needed for thecomputation of F on our lattices compared to the case without the application of deﬂation, detailscan be found in Table 1.For the update of the deﬂation subspace the same criteria as in the DD-HMC setup wereapplied[10], however, since in MP-HMC all links are active, the deﬂation subspace has to be up-dated more often than in the DD-HMC case to satisfy the same update criteria, see Fig. 1. Inprinciple, we could have optimized the update criterion of the deﬂation space even further to matchthe particular conditions in the MP-HMC given by the cost of the F force computation, but we havealready achieved a very signiﬁcant gain and do not expect any further improvements to dramaticallychange the situation.

5. Quark forces and stability

At small quark masses, instabilities in the numerical integration of the molecular dynamics4 omparison of mass preconditioned HMC and DD-HMC

Marina Marinkovic t / e DD-HMCN

GCR subspace update 1st solution2nd solution t / e MP HMC subspace update 1st solution2nd solution

Figure 1:

History of the iteration numbers N GCR of the deﬂated Schwarz-preconditioned GCR solver alonga molecular-dynamics trajectory, using the DD-HMC (left side) and MP-HMC (right side) algorithm, plottedagainst the molecular-dynamics time t given in units of the integration step size e = t N . The lattice sizein both cases is 48 × and k sea = . equations may occur, which manifest themselves in violent ﬂuctuations in the energy violation D H of the molecular dynamics evolution. According to the tests of the DD-HMC algorithm performedso far, severe integration instabilities were rare [7]. This has to be demonstrated also for the MP-HMC and for both algorithms when going to smaller quark masses.Typically, the force F is the source of these instabilities, and in Fig. 2 we show Monte Carlotime histories for F and F showing the maximal and average forces, together with the correspond-ing D H throughout roughly 600 trajectories for DD-HMC and 300 in the MP-HMC case. One cansee that the average force in the case of MP-HMC ﬂuctuates a lot less than in DD-HMC, which isreﬂected in smaller magnitude and ﬂuctuations in D H for the mass preconditioning.

6. Efﬁciency of the algorithms

The question of performance of the two algorithms must be addressed empirically and al-though a ﬁnal answer would require to include in the comparison a series of lattice sizes and quarkmasses, our study could give us a ﬁrst insight into how the two approaches relate in computa-tional cost. This includes not only the cost of performing a single trajectory, but also the relatedauto-correlation times, because what matters is the cost of achieving a certain statistical error. Thepresented computations are performed on the SGI Altix which is built of Intel Xeon GainestownX5570 CPUs with InﬁniBand connections at HLRN supercomputing system at ZIB in Berlin andRRZN in Hannover, and the relative timings can certainly be different on other architectures.In Table 2 we show the average plaquette value in both runs, integrated autocorrelation time t int and the acceptance rate for the two cases. As we have already mentioned, the fraction of activelinks in the DD-HMC for the chosen parameters is R = L / a =

24. It has been shown for the pure gauge theory that the autocorrelation time scales withthe inverse of this fraction[4] as long as the blocks are of a reasonable size. In order to be able tocompare the two algorithms, we have multiplied the current result for the integrated autocorrelationtime with R , and scaled the execution time accordingly. Since the available statistics is not enoughfor the reliable estimation of the errors in the autocorrelation times, the values for the t int and its5 omparison of mass preconditioned HMC and DD-HMC Marina Marinkovicwall clock / R U P t int ( U P ) × R N tra j acc. rateDD-HMC 2010s 1.65106(10) 10(5) 840 90(1)%MP-HMC 1530s 1.65127(10) 10(4) 432 85(2)%

Table 2:

Comparison of the DD-HMC algorithm with the MP-HMC. Both simulations are done for theimproved Wilson theory with two degenerate fermion ﬂavours. The lattice size is 48 × , lattice spacing a ≈ .

07 fm and the pion mass m p ≈

400 MeV. The block size in DD-HMC is 6 × , while in the MP case,the difference in mass D m ∼ .

09. Here R represents the fraction of active links in the algorithm, R = . R = error should be taken only as ﬁrst estimates. Including the acceptance into consideration, we canconclude from the performed study that in both approaches roughly the same CPU time is neededfor the same error in the measured observable U P .

7. Summary

In this contribution, we have compared DD-HMC, one of the most efﬁcient algorithms for || > 04 D H 0 10 20max ||F || 0.20.40.6 MP HMC04 0 10 202.052.102.15< ||F || > 0 10 20 100 200 300 400 500 600max ||F || 1.952.02.05 0 10 20 50 100 150 200 250 300 Figure 2:

Histories of the energy violation D H , as well as maximum and average forces F and F , foreach force update, plotted as a function of the trajectory number. Values corresponding to the DD-HMCalgorithm are shown to the left and the integration stepsizes for the two forces relate as D t : D t = D t : D t = × and k sea = . omparison of mass preconditioned HMC and DD-HMC Marina Marinkovic dynamical QCD simulations, with our implementation of a mass preconditioned HMC, includingfor the ﬁrst time a locally deﬂated solver, which brings a signiﬁcant speedup. We have conﬁrmedthat relatively large step size for small quark mass is achievable also with MP-HMC. Looking at theseries of average and maximal forces of F in both cases, it is indicative that the energy violationsvisible as spikes in D H are caused by the irregularities in the forces. This is easier to control witha continuous parameter, such as the preconditioning mass in mass preconditioning case, than withthe HMC block size in DD-HMC which can only take few values in practical applications. Thetwo algorithms have demonstrated comparable performance, which is in large owed to the usage ofthe same efﬁcient deﬂated solver in both cases.A future task is to extend this study to larger lattices with even lower quark masses. The MP-HMC also leaves room for improvement. One could study the use of three or more pseudo-fermionﬁelds and also tune the preconditioning masses more carefully than we did in the present results.In particular for smaller preconditioning masses, the use of the deﬂated solver also for F might beadvisable.Besides being able to use larger numbers of CPUs, the approach with mass preconditioningallows for a much easier extension of the program package, such as introducing Schrödinger func-tional boundary conditions, as well as adding additional heavy and non-degenerate ﬂavours.

8. Acknowledgements

We thank M. Hasenbusch, H. Simma, R. Sommer, F. Virotta and U. Wolff for useful and stim-ulating discussions and M. Lüscher for making his DD-HMC code publicly available. This work issupported by the German Science Foundation (DFG) under the grants GRK1504 "Mass, spectrum,symmetry" and SFB/TR9-03. Simulations were performed at HLRN in Berlin and Hannover.

References [1] Martin Lüscher, Comput. Phys. Commun., 165:199–220, 2005.[2] M. Hasenbusch, Phys. Lett., B519:177–182, 2001; M. Hasenbusch, K. Jansen, Nucl. Phys.

B659 (2003) 299-320.[3] M. Lüscher, JHEP (2007) 081.[4] S. Schaefer, R. Sommer, F. Virotta, arXiv:1009.5228 [hep-lat].[5] K. Jansen, A. Shindler, C. Urbach and U. Wenger, PoS

LAT2005 (2006) 118.[6] B. Leder, F. Knechtli, R. Sommer, PoS

LAT2010 (2010) 233.[7] L. Del Debbio, et al. , JHEP (2007) 056; L. Del Debbio, et al.

JHEP (2007) 082.[8] R. C. Brower, T. Ivanenko, A. R. Levi and K. N. Orginos, Nucl. Phys. B (1997) 353.[9] M. Lüscher, Comput. Phys. Commun. (2004) 209.[10] M. Lüscher, JHEP , 011 (2007)., 011 (2007).