Efficient integration of gradient flow in lattice gauge theory and properties of low-storage commutator-free Lie group methods
EE ffi cient integration of gradient flow in lattice gauge theory andproperties of low-storage commutator-free Lie group methods Alexei Bazavov a , Thomas Chuna a a Department of Computational Mathematics, Science and Engineering and Department of Physics and Astronomy,Michigan State University, East Lansing, MI 48824, USA
Abstract
The smoothing procedure known as the gradient flow that suppresses ultraviolet fluctuations of gauge fields plays an important rolein lattice gauge theory calculations. In particular, this procedure is often used for high-precision scale setting and renormalizationof operators. The gradient flow equation is defined on the SU (3) manifold and therefore requires geometric, or structure-preserving,integration methods to obtain its numerical solutions. We examine the properties and origins of the three-stage third-order explicitRunge-Kutta Lie group integrator commonly used in the lattice gauge theory community, demonstrate its relation to 2 N -storageclassical Runge-Kutta methods and explore how its coe ffi cients can be tuned for optimal performance in integrating the gradientflow. We also compare the performance of the tuned method with two third-order variable step size methods. Next, based onthe recently established connection between low-storage Lie group integrators and classical 2 N -storage Runge-Kutta methods, westudy two fourth-order low-storage methods that provide a computationally e ffi cient alternative to the commonly used third-ordermethod while retaining the convenient iterative property of the latter. Finally, we demonstrate that almost no coding e ff ort is neededto implement the low-storage Lie group methods into existing gradient flow codes. Keywords:
Lattice gauge theory, Gradient flow, Geometric integration, Runge-Kutta methods, Lie group methods
1. Introduction
In lattice gauge theory [1], a numerical approach to quan-tum gauge theories, the path integrals are evaluated by samplingthe space of possible field configurations with a Markov ChainMonte Carlo process and averaging over the Monte Carlo timeseries. The fields are defined on a four-dimensional Euclideanspace-time grid and for the physically relevant case of QuantumChromodynamics (QCD) take values in the SU (3) group.A smoothing procedure, referred to as gradient flow , intro-duced by L¨uscher in Ref. [2], allows one to evolve a givengauge field configuration towards the classical solution. Thegradient flow possesses renormalizing properties and is oftenused for renormalization of operators and determining the lat-tice spacing in physical units. Certain lattice calculations re-quire determination of the lattice scale to sub-percent precision.Numerically, gradient flow amounts to integrating a first-orderdi ff erential equation on the SU (3) manifold. While a varietyof structure-preserving integration methods can be used for thistask [3], the three-stage third-order explicit Runge-Kutta inte-grator introduced in Ref. [2] became the most commonly usedmethod in lattice gauge theory applications.In this paper we explore the recently observed relations be-tween classical low-storage explicit Runge-Kutta methods andthe commutator-free Lie group methods [4]. We show how athree-stage third-order integrator can be optimized specifically Email address: [email protected] (Alexei Bazavov) for integrating the gradient flow and how higher-order methodswith similar low-storage properties can be constructed.The paper is organized as follows. In Sec. 2 we introduce theformalism of lattice gauge theory and gradient flow. In Sec. 3we review the properties of standard explicit Runge-Kutta inte-gration methods and so-called low-storage Runge-Kutta meth-ods. In Sec. 4 we discuss structure-preserving integrators, oftencalled geometric integrators or Lie group integrators. In Sec. 5we present the numerical results on integrating the gradient flowon three lattice ensembles with several third- and fourth-orderLie group methods. We present our conclusions and recom-mendations for tuning the methods and improving the compu-tational e ffi ciency of integrating the gradient flow in Sec. 6.
2. Lattice gauge theory and the gradient flow
In lattice gauge theory the primary degrees of freedom are SU (3) matrices U x , µ that reside on the links of a hypercubiclattice with dimensions N σ × N τ . The lattice spacing is a , x is aninteger-valued four-vector, and µ = , . . . ,
4. Gauge-invariantobservables are represented as traces of products of the gaugelink variables along paths on the lattice that are closed loops.The main observables we discuss here are the plaquette (4-linkloop): P x , µ,ν = U x , µ U x + µ, ν U † x + ν, µ U † x , ν (1)rectangle (6-link loop): R x , µ,ν = U x , µ U x + µ, µ U x + µ, ν U † x + µ + ν, µ U † x + ν, µ U † x , ν (2) Preprint submitted to Computer Physics Communications January 15, 2021 a r X i v : . [ h e p - l a t ] J a n nd the so called clover expression constructed as a linear com-bination of several plaquettes forming the shape of a clover leaf: C x , µ,ν = i (cid:16) Q x , ν, µ − Q x , µ,ν (cid:17) (3)where Q x , µ,ν = P x , µ,ν + P x ,ν, − µ + P x , − µ, − ν + P x , − ν, µ .The simplest gauge action is the Wilson action [1] that in-cludes only the plaquette term (for convenience we drop thefactor 1 / g in the definition): S Wilson = (cid:88) x (cid:88) µ<ν ReTr(1 − P x , µ,ν ) . (4)To suppress lattice discretization e ff ects one can construct im-proved actions such as, for instance, the tree-level Symanzik-improved gauge action that includes the plaquette and rectangleterms [5]: S Symanzik = S Wilson − (cid:88) x (cid:88) µ (cid:44) ν ReTr(1 − R x , µ,ν ) . (5)The clover action is S clover = (cid:88) x (cid:88) µ (cid:44) ν ReTr( C x , µ,ν C x , µ,ν ) (6)and another variant of an improved action can be constructed asa linear combination of the plaquette and clover terms.To smoothen the fields and suppress ultraviolet fluctuationsRef. [2] suggested evolving the gauge fields U x , µ with the fol-lowing gradient flow equation: dV x , µ dt = − (cid:110) ∂ x , µ S f ( t ) (cid:111) V x , µ , V x , µ ( t = = U x , µ (7)where the di ff erential operator ∂ x , µ acts on a function of SU (3)group elements as defined in Ref. [2] and S f is the lattice ac-tion evaluated using the evolved gauge link variables V ( t ). Werefer to the gradient flow as the Wilson flow when S f = S Wilson is used in the flow equation, and as the Symanzik flow when S f = S Symanzik . The flow time t has dimensions of lattice spac-ing squared.One of the widespread applications of gradient flow in latticegauge theory is scale setting, i.e. determination of the latticespacing in physical units for a given lattice ensemble. In thiscase the flow is run until the flow time t = w [6] at which (cid:34) t ddt t (cid:104) S o ( t ) (cid:105) (cid:35) t = w = Const (8)and typically
Const = . w -scale in physical units, w phys .Eq. (8) is an improved version of the original proposal wherethe action itself rather than its derivative was used [2]: t (cid:104) S o ( t ) (cid:105) (cid:12)(cid:12)(cid:12) t = t = Const . (9)The observable used for the scale setting in Eq. (8) is the actiondensity S o , not necessarily the same as S f in the flow equa-tion (7). As has been discussed in Ref. [7] di ff erent combina-tions of the flow action and the observable result in di ff erentdependence on the lattice spacing. Here we consider S Wilson and S Symanzik for the flow and S Wilson , S Symanzik and S clover forthe observable.
3. Classical Runge-Kutta methods
Consider a first-order di ff erential equation for a function y ( t ) dydt = f ( t , y ) , (10)and the initial condition y ( t =
0) given. An s -stage explicitRunge-Kutta (RK) method that propagates the numerical ap-proximation to the solution y t of Eq. (10) at time t to time t + h is given in Algorithm 1 [8, 9]. Algorithm 1
Explicit classical s -stage Runge-Kutta method for i = do y i = y t + h (cid:80) i − j = a i j k j (cid:46) a i , j (cid:62) i = k i = f ( t + hc i , y i ) (cid:46) c = end for y t + h = y t + h (cid:80) si = b i k i The self-consistency conditions require c i = i − (cid:88) j = a i j . (11)We refer to this method as classical RK method to distinguish itfrom the Lie group integrators discussed in Sec. 4. To providean order of accuracy p the coe ffi cients a i j , b i need to satisfy the order conditions . The order conditions for a classical RKmethod of third-order global accuracy are given in AppendixA. It is convenient to represent the set of coe ffi cients a i j , b i , c i as a Butcher tableau , for instance, for a 3-stage method (thefirst entry with c = c a c a a b b b (12)The nodes c i in the left column describe the time points at whichthe stages are evaluated, the a i j in the middle give the weights of the right hand side function for each stage, and the bottomrow gives the weights for the final stage of the method. If themethod is explicit each stage can only depend on the previousones and therefore the Butcher tableau has a characteristic tri-angular shape.For a 3-stage third-order classical RK method there are fourorder conditions and six independent coe ffi cients, thus, thesemethods belong to a two-parameter family. Often, the coef-ficients c , c are chosen as free parameters and the rest areexpressed through them, as given in Eqs. (A.5)–(A.10).In the following the discussion is restricted to autonomousproblems where the right hand side of Eq. (10) does not explic-itly depend on time. Extension to non-autonomous problems istrivial. N-storage classical RK methods
As is clear from Algorithm 1, to compute y t + h at the final stepone needs to store k i , i = , . . . , s (the right hand side evalua-tions) from all s stages of the method. It was shown in Ref. [10]2hat a classical RK method may be written in a form where onlythe values from the previous stage are used. Therefore onlytwo quantities need to be stored at all times, independent ofthe number of stages of the method. RK methods with such aproperty are called low-storage methods . A number of di ff erenttypes of low-storage methods have been developed in the liter-ature, e.g. , Refs. [11, 12]. For the later discussion of Lie groupmethods in Sec. 4 we focus on the methods of Ref. [10], whichare also called 2 N -storage methods .Given an s -stage RK method one can express its coe ffi cientsthrough another set of coe ffi cients A i , B i , i = , . . . , s such that a i j = A j + a i , j + + B j , j < i − , B j , j = i − , , otherwise , (13) b i = A i + b i + + B i , i < s , B i , i = s , (14)and for explicit methods necessarily A =
0. A 2 N -storage s -stage explicit classical RK method is given in Algorithm 2. Algorithm 2 N -storage explicit classical s -stage Runge-Kuttamethod y = y t for i = do ∆ y i = A i ∆ y i − + h f ( y i − ) (cid:46) A = y i = y i − + B i ∆ y i end for y t + h = y s For a 3-stage third-order method expressing the original a i j , b i coe ffi cients through A i , B i leads to an additional, fifth, ordercondition for a i j , b i that was found in [10]. This means that thecoe ffi cients of a 2 N -storage scheme now form a one-parameterfamily. One can express the fifth order condition as an implicitfunction of c and c [10]: c (1 − c ) + c (cid:32) c + c − (cid:33) + (cid:32) − c (cid:33) = . (15)This implicit function is shown in Fig. 1.While almost any point in the plane (except c = c = / ffi cient scheme, only the values on the curve correspond toclassical RK methods that can be rewritten in the 2 N -storageformat, Algorithm 2. The plot shown in Fig. 1 first appearedin [10] therefore we refer to it as the Williamson curve . Tofind coe ffi cients for a 3-stage third-order 2 N -storage schemeone can proceed in the following way: • Pick a value of c in the allowed range. • Solve Eq. (15) for c and pick one of the branches. Unlike other types of low-storage methods ( e.g. , 2 R -, 2 S -, 3 R -, 3 S -, etc.)the 2 N -storage methods have special properties that turned out to be related toLie group integrators [4]. c c W6 W7
Figure 1: The Williamson curve, i.e. the set of values of c and c coe ffi -cients for which the 3-stage third-order classical RK schemes can be written inthe 2 N -storage format. The symbols (triangles, box and circle) correspond torational solutions. The blue box and the red circle are the schemes that are dis-cussed in more detail later. W6 and W7 labels indicate the original numberingin Ref. [10]. There is a reflection symmetry along the c + c = • Express all a i j , b i coe ffi cients in terms of c and c usingEqs. (A.5)–(A.10). • Find A i , B i by inverting the relations given in Eqs. (13),(14).The symbols on the Williamson curve are the points where c and c (and all the other coe ffi cients) have rational values.These values are summarized in Table B.2 in Appendix B. Theschemes labeled with the blue box and red circle in the figureplay special role in our discussion later. We denote them: • RK3W6: c = / c = / • RK3W7: c = / c = / N -storage schemes first appeared.For 2 N -storage schemes with more than three stages andorders higher than three there are no analytic solutions avail-able. The coe ffi cients can be found by expressing the orderconditions through the coe ffi cients A i , B i and solving the re-sulting system of non-linear equations numerically. Multiple2 N -storage schemes have been designed in this way in the lit-erature [13, 14, 15, 16, 17, 18, 19, 20]. In the previous sections we considered integration methodsthat operate with fixed step size. If an estimate of the localerror is available one can adjust the step size during the inte-gration. Methods with such a property are known as variablestep size or adaptive integrators. To construct a variable stepsize method one uses two schemes of di ff erent order simultane-ously. The di ff erence between the two solutions after one step3f integration serves as an estimate of the local error and is usedto adjust the step size.Later in Sec. 5.2 we will consider two variable step sizeschemes where a third-order integrator is used for propagatingthe solution and an embedded second-order integrator is used toconstruct an alternative estimate of the solution. Let d be somemeasure of distance between the solutions (the precise metricused is not important at this point) after the current integrationstep and δ a fixed parameter – local tolerance. After an integra-tion step the step size is adjusted as h → . (cid:114) δ d h . (16)If d > δ then the integration step is redone with the adjusted h .Otherwise the integration proceeds with the adjusted h . Such aprocedure ensures that the local error at every step is boundedby δ .It is important to note that setting a bound on the local errordoes not actually tell one what value of the global error will beachieved. We discuss this point in more detail in Sec. 5.2.
4. Lie group integrators
Consider now a di ff erential equation on a manifold: dYdt = F ( Y ) Y . (17)The results in this section are valid in general for Y taking val-ues on an arbitrary manifold equipped with a group action. Forthe main discussion that follows, Y will represent a gauge linkvariable, which is an SU (3) matrix. Capital letters are used hereto emphasize that the variables do not necessarily commute, un-like in the classical RK case.If one uses a classical RK scheme for solving Eq. (17) nu-merically, an update of the form Y + Const · hF ( Y ) Y willmove Y away from the manifold. One needs to use geomet-ric, or structure-preserving integration schemes that update Y as exp( Const · hF ( Y )) Y . We discuss the two main approachesto constructing such methods next. Let us first return to the classical RK Algorithm 1. Onecan modify the step 2 in the following way: Evaluate thesum in the second term, exponentiate it and act on y t , i.e.Y i = exp( h (cid:80) i − j = a i j K j ) Y t , where K j = F ( Y j ). In this case, how-ever, the extra uncanceled terms in the Taylor expansion of thescheme will result in a method whose global order of accuracyis lower than for the classical scheme. This can be cured byadding commutators. A scheme given in Algorithm 3 which isreferred to as the Runge-Kutta-Munthe-Kaas (RKMK) methodwas introduced in Ref. [21]. There, the expansion of the inversederivative of the matrix exponential d exp − U ( V ) is truncated atthe order p − U , V , p ) = p − (cid:88) k = B k k ! ad k U ( V ) , (18) B k are the Bernoulli numbers and the k -th power of the adjointoperator ad U ( V ) is given by an iterated commutator applica-tion: ad U ( V ) = V , (19)ad U ( V ) = [ U , V ] , (20)ad k U ( V ) = ad U (ad k − U ( V )) = [ U , [ U , [ . . . , [ U , V ]]]] . (21)This algorithm results in a Lie group integrator of order p whose coe ffi cients are the coe ffi cients of a classical RK methodof order p . Algorithm 3 s -stage Runge-Kutta-Munthe-Kaas Lie groupmethod for i = do U i = h (cid:80) i − j = a i j ˜ K j (cid:46) a i , j (cid:62) i = Y i = exp( U i ) Y t K i = F ( Y i ) ˜ K i = dexpinv( U i , K i , p ) end for V = (cid:80) si = b i ˜ K i Y t + h = exp( V ) Y t The advantage of Algorithm 3 is that any classical RKmethod can be turned into a Lie group integrator. The num-ber of commutators can often be reduced as discussed in [22].It was found, however, that schemes that avoid commutatorscan be more stable and provide lower global error at the samecomputational cost. We discuss them next.
The earlier work on manifold integrators [23] was extendedin Ref. [24] to design a class of commutator-free
Lie groupmethods where each stage of the method may include a com-position of several exponentials. A general scheme is given inAlgorithm 4. We use the notation of Ref. [4] instead of the orig-inal notation of Ref. [24]. L i is the number of exponentials usedat stage i , J il is the number of right hand side evaluations K j used in the l -th exponential at stage i , L is the number of expo-nentials at the final step, and I l is the number of right hand sideevaluations K i used in the l -th exponential at the final step. T represents a “time-ordered” product meaning that an exponen-tial with a lower value of index l is located to the right . Algorithm 4 s -stage commutator-free Lie group method Y = Y t , K = F ( Y ) for i = do Y i = T (cid:110)(cid:81) L i l = exp (cid:16) h (cid:80) J il j = α l ; i j K j (cid:17)(cid:111) Y t K i = F ( Y i ) end for Y t + h = T (cid:110)(cid:81) Ll = exp (cid:16) h (cid:80) I l i = β l ; i K i (cid:17)(cid:111) Y t The coe ffi cients α l ; i j , β l ; i are related to the coe ffi cients of a4lassical RK method as [24] L i (cid:88) l = α l ; i j = a i j , L (cid:88) l = β l ; i = b i . (22)To better understand the notation of Algorithm 4 we list in Al-gorithm 5 explicit steps of one of the methods of Ref. [24]where s = L = L = J = L = J = L = I = I = Algorithm 5 Y = Y t K = F ( Y ) Y = exp( h α K ) Y t K = F ( Y ) Y = exp( h ( α K + α K )) Y t K = F ( Y ) Y t + h = exp( h ( β K + β K + β K )) exp( h β K ) Y t It was found in Ref. [24] that fixing β = α = / Y at the final step and the other coe ffi cients forma one-parameter family of solutions. Ref. [24] considered such commutator-free methods thatreuse exponentials. For instance, in Algorithm 5 Y is reused atthe final stage so one needs only three exponential evaluationsin total. Recently, Ref. [4] considered designing a commutator-free method where every next stage reuses Y i from the previousstage and contains only one exponential evaluation per stage(inspired by Algorithm 7 of Ref. [2], see below). Such meth-ods form a subclass of methods of Ref. [24] but di ff er from thesolutions found there by how the exponentials are reused. Itturned out that for a 3-stage third-order commutator-free Liegroup method with exponential reuse the additional order con-dition resulting from non-commutativity is the same as the or-der condition for a 2 N -storage 3-stage third-order classical RKmethod, Eq. (15). Thus, it was proven in [4] that all 2 N -storage 3-stage third-order classical RK methods of [10], i.e. all points on the Williamson curve, Fig. 1, are also low-storagethird-order commutator-free Lie group integrators. It was con-jectured in [4] that 2 N -storage classical RK methods of orderhigher than three are also automatically Lie group integratorsof the same order. Numerical evidence was provided in supportof the conjecture. Moreover, for a given set of numerical valuesof coe ffi cients A i , B i of a classical 2 N -storage RK method theorder of the Lie group method based on it can be determinedalgorithmically by using B-series [25]. Thus for all such meth-ods that we use here, given the coe ffi cients A i , B i of a classical2 N -storage RK method with s stages and global order of ac-curacy p , the procedure listed in Algorithm 6 is a low-storagecommutator-free Lie group method of order p .Let us now turn to the discussion of the integrator first in-troduced by L¨uscher in Ref. [2]. In our notation it is givenin Algorithm 7. This scheme belongs to the generic class of Algorithm 6 N -storage s -stage commutator-free Runge-KuttaLie group method Y = Y t for i = do ∆ Y i = A i ∆ Y i − + hF ( Y i − ) (cid:46) A = Y i = exp( B i ∆ Y i ) Y i − end for Y t + h = Y s commutator-free Lie group methods developed in Ref. [24],however, it di ff ers from the classes of solutions found there.Given that the linear combination of K and K is the same atsteps 5 and 7 and the previous stage is reused at steps 3, 5 and7, this integrator has certain reusability property. As far as weare aware, an integrator with the structure and numerical co-e ffi cients of Algorithm 7 was not present in the literature onmanifold integrators prior to Ref. [2]. Thus, we believe that thismethod was derived independently. Since its derivation was notpresented in [2], we present our derivation in Appendix C forillustrative purposes and also to document the order conditionsin the form that we were not able to find in the existing litera-ture. This scheme provides a link to the recent developments ofRef. [4]. Algorithm 7 Y = Y t K = F ( Y ) Y = exp (cid:16) h K (cid:17) Y K = F ( Y ) Y = exp (cid:16) h (cid:16) K − K (cid:17)(cid:17) Y K = F ( Y ) Y t + h = exp (cid:16) h (cid:16) K − K + K (cid:17)(cid:17) Y It turns out that when Algorithm 7 is rewritten in the 2 N -storage format of Algorithm 6 and the A i , B i coe ffi cients areconverted to the coe ffi cients of the classical underlying RKscheme, Eqs. (13), (14), the latter are the same as for theRK3W6 classical RK method discussed in Sec. 3.1. It is shownas a blue square on the Williamson curve, Fig. 1. Thus, themethod of Ref. [2] belongs to the class of 2 N -storage classicalRK methods which are automatically Lie group integrators ofthe same order as proven in [4]. We will explore this to findthe optimal set of coe ffi cients for integrating the gradient flowin Sec. 5.1.
5. Numerical results
To explore the properties of di ff erent integrators we usedthree gauge ensembles with the lattice spacing ranging from0 .
15 fm down to 0 .
09 fm listed in Table 1. These ensembleswere generated by the MILC collaboration with the one-loopimproved gauge action [5] and the Highly Improved StaggeredQuark (HISQ) action [26, 27]. The light quark masses weretuned to produce the Goldstone pion mass of about 300 MeV5 able 1: The MILC 2 + + T max .Here T max is dimensionless and approximately equal to ( w phys / a ) . Ensemble N σ × N τ a , fm T max l1648f211b580m013m065m838 ×
48 0 .
15 1 . l2464f211b600m0102m0509m635 ×
64 0 .
12 2 . l3296f211b630m0074m037m440 ×
96 0 .
09 3 . and the strange and charm quark masses are set to the physicalvalues.We integrate the flow for the amount of time T max that isneeded to determine the w -scale according to Eq. (8) with Const = .
3. The values of T max for each ensemble are givenin Table 1. The w -scale for these ensembles was determinedin [29].To find the global error for each integration method we needto compare to the exact solution. For this purpose we have alsoimplemented a 13-stage eighth-order RK integrator of Munthe-Kaas type, Algorithm 3, with Prince-Dormand coe ffi cients [30]that we refer to as RKMK8. At step size h = − it providesthe result that is exact within the floating point double precision.For this reason the results from RKMK8 are labeled as “exact.”To evaluate the global error introduced by the integrationmethods several quantities are studied. Let us first define thesquared distance between two SU (3) matrices X , Y : D ( X , Y ) ≡ (cid:88) i , j = | X i j − Y i j | . (23)For a set of flowed gauge fields the distance from the exact so-lution is defined as : ∆ V ≡ (cid:88) x (cid:88) µ (cid:113) D ( V x , µ ( t = T max ) , V exactx , µ ( t = T max )) . (24)We also calculate the value of the plaquette, rectangle andclover expression averaged over the lattice: P = N σ N τ (cid:88) x (cid:88) µ<ν ReTr( P x , µν ) , (25) R = N σ N τ (cid:88) x (cid:88) µ (cid:44) ν ReTr( R x , µν ) , (26) C = N σ N τ (cid:88) x (cid:88) µ (cid:44) ν ReTr( C x , µν C x , µν ) . (27)Up to a constant prefactor and a shift these quantities providethe three di ff erent discretizations of the action, Eqs. (4)–(6) that Other definitions are possible, e.g. ∆ V ≡ (cid:115)(cid:88) x (cid:88) µ D ( V x ,µ ( t = T max ) , V exactx ,µ ( t = T max )) . They scale with the step size in the same way as (24). can be used for the w -scale determination in (8). The globalerror is also estimated from the di ff erences: ∆ Z ≡ Z ( t ) − Z exact ( t ) , (28)where Z = P , R or C and t -dependence means that these quan-tities are computed using evolved gauge links V ( t ). We now address the question of what integrator out of thefamily of schemes along the Williamson curve can providethe lowest error for integrating the SU (3) gradient flow. TheWilliamson curve is parametrized with a variable u which isthe distance along the curve from the point (2 / ,
0) to (1 , / u ∈ [0 , ffi cient schemes c ( u ), c ( u )and use their coe ffi cients for low-storage commutator-free Liegroup integrators in the form of Algorithm 6. We refer to thesemethods in general as LSCFRK3 – low-storage, commutator-free, Runge-Kutta, third-order. We picked such values of c and c that are either rational or given in terms of radicals. Two par-ticular schemes are of interest in the following: LSCFRK3W6(equivalent to the integrator of Ref. [2]) and LSCFRK3W7.These are commutator-free Lie group versions of the classi-cal RK integration schemes RK3W6 and RK3W7 discussed inSec. 3.1.For this part of the calculation we have chosen 11 latticesper each ensemble separated by 500 molecular dynamics timeunits for the first two ensembles and 360 time units for the thirdensemble in Table 1. For the first two ensembles we ran both theWilson and Symanzik flow, and only the Symanzik flow for thethird ensemble. We ran all 32 LSCFRK3 methods at step sizes h = /
16, 1 /
32, 1 /
64 and 1 /
128 and RKMK8 at h = / uD V W6 W7 ✕
96 sym24 ✕
64 wil24 ✕
64 sym16 ✕
48 wil16 ✕
48 sym
Figure 2: The leading-order coe ffi cient D V for the global integration errordefined in Eq. (29) as function of the distance along the Williamson curve u . u = c = / c =
0, and u = c = c = /
3. The arrows labeled “W6” and “W7”represent the LSCFRK3W6 and LSCFRK3W7 schemes discussed in the text.The statistical errors are (much) smaller than the symbol size.
We first consider the behavior of the distance metric definedin (24). For a third-order method the distance is expected to6cale as O ( h ). We fit the distance ∆ V as a function of the stepsize to a polynomial form: (cid:104) ∆ V ( h ) (cid:105) = D V h + D V h + D V h , (29)where (cid:104) . . . (cid:105) represents the ensemble average. In some caseswe omitted the fifth-order term if a reasonable fit resulted fromjust the first two terms. The errors on the fit parameters wereestimated with a single elimination jackknife procedure.In Fig. 2 we show the dependence of the leading-order globalerror coe ffi cient D V as function of the distance u along theWilliamson curve ( i.e. the RK coe ffi cient scheme) for all en-sembles and flows that we analyzed. The values of D V arenormalized in the following way. For the Symanzik flow onthe 16 ×
48 lattice D V ( u ) is divided by 5 × . (This largefactor stems from the fact that our definition of ∆ V is exten-sive.) For all the other ensembles and flows D V ( u ) is dividedby such a constant that D V ( u =
0) coincides with the one forthe 16 ×
48 lattice Symanzik flow. As one can observe fromFig. 2, the behavior of D V ( u ) is similar for di ff erent ensemblesand di ff erent types of flow. The LSCFRK3W7 Lie group in-tegrator is closer to the minimum of the global error than theLSCFRK3W6 method. Note that since the definition of ∆ V ismanifestly positive, the leading order coe ffi cient D V ( u ) is alsopositive for all u . -0.0001-5x10 -5 -5 u rescaled ∆ P rescaled ∆ R ∆ C ∆ P ∆ R Figure 3: The global integration error in the plaquette (cid:104) ∆ P(cid:105) , rectangle (cid:104) ∆ R(cid:105) and the clover expression (cid:104) ∆ C(cid:105) as function of distance along the Williamsoncurve u . (cid:104) ∆ P(cid:105) and (cid:104) ∆ R(cid:105) match (cid:104) ∆ C(cid:105) after they are multiplied by a constantfactor. The rescaled values of (cid:104) ∆ P(cid:105) and (cid:104) ∆ R(cid:105) are shifted by ± × − to bedistinguishable from (cid:104) ∆ C(cid:105) . Next, we study the action related observables, Eqs. (25)–(27). In Fig. 3 we show the global errors (cid:104) ∆ P(cid:105) , (cid:104) ∆ R(cid:105) and (cid:104) ∆ C(cid:105) ,defined in (28), as function of u evaluated at T max = . h = /
16 for the Wilson flow on the a = .
15 fmensemble. When (cid:104) ∆ P(cid:105) and (cid:104) ∆ R(cid:105) are rescaled by a constantso that they coincide with (cid:104) ∆ C(cid:105) at u =
0, all three quantitiescollapse onto the same curve. In fact, the collapse is so accu-rate that the quantities labeled “rescaled” in the figure wouldbe completely covered by the clover (cid:104) ∆ C(cid:105) had not we shiftedthem by ± × − . This is expected since at later flow times the gauge fields are smooth and the di ff erence between di ff er-ent discretizations diminishes. It is remarkable that the collapsehappens already at our coarsest lattice, for the least improvedflow at the largest step size, h = /
16. We therefore focussolely on the clover discretization in the following.Similarly to Eq. (29) we fit the global integration error ∆ C asfunction of step size to a polynomial: (cid:104) ∆ C ( h ) (cid:105) = D C h + D C h + D C h . (30)We omit the fifth-order term whenever a reasonable fit is ob-tained with the first two terms. -0.1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0 0.2 0.4 0.6 0.8 1 uD C W6 W7 ✕
96 sym24 ✕
64 wil24 ✕
64 sym16 ✕
48 wil16 ✕
48 sym
Figure 4: The leading-order coe ffi cient D C for the global integration error de-fined in Eq. (30) as function of the distance along the Williamson curve u . Thearrows labeled “W6” and “W7” represent the LSCFRK3W6 and LSCFRK3W7schemes discussed in the text. The statistical errors are (much) smaller than thesymbol size. In Fig. 4 we plot the dependence of the leading order globalerror coe ffi cient D C as function of u for all ensembles and flows,similar to Fig. 2. The data is rescaled such that all values at u = D C for the Symanzik flow on the16 ×
48 lattice. Interestingly, unlike (cid:104) ∆ V ( h ) (cid:105) , (cid:104) ∆ C ( h ) (cid:105) is notnecessarily positive. We observe that while most of the inte-gration schemes approach the exact result from above, there isa region of u where the exact result is approached from below.The LSCFRK3W7 scheme is close to the point where D C = u ∈ [0 . , .
65] is magnified in Fig. 5 for the32 ×
96 ensemble and the Symanzik flow. The fourth-ordercoe ffi cient D C ( u ) is also shown. For the LSCFRK3W7 schemeit is also relatively small. Therefore this method provides closeto the lowest error for the action observables that are central forscale setting.In Fig. 6 we directly compare (cid:104) ∆ C ( h ) (cid:105) for LSCFRK3W6,LSCFRK3W7 and LSCFRK3W9 for the 32 ×
96 ensem-ble. The LSCFRK3W7 scheme has the smallest error (cid:104) ∆ C ( h ) (cid:105) , This scheme has irrational coe ffi cients and it has the minimal theoreticalbound on the global error using the definition of [31]. In other words, these are“Ralston coe ffi cients” but with taking into account the additional constraint ofthe low-storage method. More details can be found in [10, 4]. u W6 W7 D C D C Figure 5: The leading D C and the next-to-leading order D C coe ffi cients forthe global integration error defined in Eq. (30) for the Symanzik flow on 32 ×
96 ensemble. The lines are not fits but are drawn to guide the eye. The D C coe ffi cient is divided by a factor of two to better fit in the frame. which due to the competition between the h and h termscrosses zero, as shown in the inset of Fig. 6. -4x10 -7 -2x10 -7 -7 -7 -7 -7 -6
0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 h 〈 ∆ C ( h ) 〉 LSCFRK3W9LSCFRK3W7LSCFRK3W6-4x10 -10 -2x10 -10 -10 -10
0 0.02
Figure 6: The dependence of the global integration error in the clover expres-sion (cid:104) ∆ C ( h ) (cid:105) on the step size h for three LSCFRK3 methods. Lines are fits tothe data, Eq. (30). For the LSCFRK3W7 method the error changes sign, asshown in the inset. To summarize, we expect that the LSCFRK3W7 integrator isclose to optimal for integrating the gradient flow: it has closeto the lowest global error for the gauge fields, Fig. 2, and itsleading-order error coe ffi cient D C is close to 0 for the actionrelated observables, giving much smaller global error, Fig. 6.Moreover, we observe that these properties of LSCFRK3W7are stable against fluctuations within the gauge ensemble, dif-ferent ensembles, di ff erent types of flow and di ff erent dis-cretizations of the observable.In the following we consider only the Symanzik flow andthe clover expression for the observable. The flow is integratedwith LSCFRK3W6 and LSCFRK3W7 on 100 lattices that are -20-18-16-14-12-10 ln | 〈 ∆ C ( h ) 〉 | ✕ -22-20-18-16-14 ✕ RKMK3LSCFRK3W6LSCFRK3W7-24-22-20-18-16-14 -5 -4.5 -4 -3.5 -3 -2.5 ✕ ln h Figure 7: Scaling of the global integration error in the clover observable (cid:104) ∆ C ( h ) (cid:105) with the step size h = / /
64, 1 /
32 and 1 /
16 for the Symanzikflow on the three gauge ensembles. The three third-order methods are discussedin the text. The lines are drawn to guide the eye. separated by 50 time units for the first two ensembles and 36time units for the third in Table 1. The RKMK8 method with h = − is used to get the exact solution. The error bars arecalculated with respect to 20 jackknife bins. For comparisonwe also implemented a third-order algorithm of Munthe-Kaastype with Ralston coe ffi cients, Algorithm 3, that we refer to asRKMK3. The results for the global integration error for theclover expression as function of step size h for the three third-order methods are shown in Fig. 7.It is customary to plot the logarithm of the absolute valueof the error vs the logarithm of the step size since the slopethen is equal to the order of the method. It also allows one todisplay the scaling for di ff erent integrators that would be toosmall on the linear scale. This works well when the error isdominated by the leading order term. As we observe in Figs. 5,6, the LSCFRK3W7 error crosses zero which translates to −∞ on the log-log plot. This explains the non-monotonicity of theLSCFRK3W7 data in Fig. 7. The point with the largest errorbar for the 32 ×
96 lattice is close to the zero crossing and thusthe jackknife propagated error bars in ln | ∆ C| are large there.This is expected when the leading-order coe ffi cient is small andthe leading term is comparable with the next-to-leading orderterm for a range of step sizes. For the 16 ×
48 lattice the D C coe ffi cient is so close to zero that the error scales almost as h rather than h . At small enough h all LSCFRK3 methods shouldapproach the expected h behavior.In Fig. 7 one can observe that the LSCFRK3W7 methodhas lower global error than the original integrator of Ref. [2],LSCFRK3W6. The RKMK3 method has the largest error, de-spite the fact that its coe ffi cients are chosen to be the ones that8ive the lowest theoretical bound on the global error [31].The most expensive part of the calculation is evaluation ofthe right hand side F ( Y ), Eq. (17). All three methods requirethree evaluations and thus the same computational cost. (Weneglect the fact that the RKMK3 method also requires compu-tation of commutators. In a parallel code that operation is localand thus its overhead is unnoticeable compared with the compu-tation of F ( Y ), which requires communication.) We concludethat LSCFRK3W7 is the most beneficial Lie group integratorfor the gradient flow among the third-order explicit RK meth-ods that we studied. Variable step size integrators were used for gradient flow inRefs. [32, 33], and we also explore methods of this type herefor comparison. In Ref. [32] a second-order method was em-bedded into the LSCFRK3W6 scheme. Methods of this typecan be built for all LSCFRK3 schemes so we consider the mostgeneric case. To connect with the form presented in [32] it isconvenient to start with the form where K , K and K are sep-arated, Algorithm 9 in Appendix C. Once the third-order in-tegrator stages are complete and all K i are computed, anotherstage is performed to get a lower order estimate:˜ Y t + h = exp( h ( λ K + λ K + λ K )) Y t . (31)For ˜ Y t + h to be globally second order (and locally third order)the coe ffi cients λ i need to satisfy the following conditions: λ + λ + λ = , (32) c λ + c λ = . (33)For any LSCFRK3 method determined by c ( u ), c ( u ) there is aone-parameter family of embedded second-order methods. Thevariable step size scheme is no longer a low-storage methodsince K i need to be stored separately for the stage (31) to beapplied .For the measure of the local error Ref. [32] suggested d = max x ,µ (cid:113) D ( V x , µ ( t + h ) , ˜ V x , µ ( t + h )) . (34)The full method is summarized in Algorithm 8 (the local toler-ance δ is a preset parameter).We take λ as a free parameter (this fixes λ and λ ) and testtwo variable step size methods based on W W ffi cients. These methods are referred to as CFRK3W6VSand CFRK3W7VS in the following (VS = variable step size).The tests are performed on a single ×
48 lattice with theSymanzik flow to illustrate a qualitative point. With one exception: For most methods there is a single value of λ wherethe linear combination of K and K can be reused. For instance, for theLSCFRK3W6 method setting λ = − / λ K + λ K = / K − / K ), so one can reuse the same linear combination in the low-order esti-mate of the solution, Eq. (31). And there is an exception to the exception: Areusable embedded second-order scheme does not exist for LSCFRK3W7. Seethe Mathematica script in Appendix D. Algorithm 8 Perform the three stages of Algorithm 9 for each gauge linkto get V x , µ ( t + h ). Store K i , i = , . . . , Compute ˜ V x , µ ( t + h ) using Eq. (31). Compute the maximum distance d with Eq. (34). Set h → . (cid:113) δ d h . If d > δ restart from step 1. ∆ V ( δ =10 -5 ) λ CFRK3W6VSCFRK3W7VS
Figure 8: The global integration error for gauge fields defined in Eq. (24) asfunction of the parameter λ that distinguishes variable step size methods. TheCFRK3W6VS and CFRK3W7VS methods di ff er in what third-order integratoris used. The global error is evaluated at the same local tolerance δ = − for allmethods. The results are from the flow on a single 16 ×
48 lattice and thereforehave no errorbars.
For a set of local tolerances δ = − , 3 × − , 10 − , 3 × − ,10 − , 3 × − and 10 − CFRK3W6VS and CFRK3W7VS wererun for values of λ ∈ [ − ,
2] separated by 0 .
1. The resultsfor the global error in the gauge fields ∆ V ( δ ) defined in (24)are shown in Fig. 8 for a fixed value of tolerance δ = − asfunction of the λ parameter. There seems to be some room fortuning λ since there is a significant di ff erence in what globalerror is achieved by di ff erent methods. In Fig. 9 the depen-dence of the global error ∆ V ( δ ) on the local tolerance δ isshown for the three CFRK3W6VS methods with λ = −
1, 0and 0 .
7. Note that λ = λ = . λ = − i.e. computational e ff ort) each method needs to reach acertain global error. For this purpose all 62 schemes shown inFig. 8 are plotted against the number of right hand side eval-uations (which is the number of steps times three stages perstep) N rhs in Fig. 10. All schemes collapse on the same lineapart from the region of small number of steps at the begin-ning (large local tolerance) where the subleading corrections toscaling are still large. There is no room for tuning – all meth-9 ∆ V ( δ ) δ Figure 9: The global integration error ∆ V ( δ ) as function of the local tolerance δ for the three third-order variable step size methods CFRK3W6VS( λ ) for λ = −
1, 0 and 0 .
7. The results are from the flow on a single 16 ×
48 latticeand therefore have no errorbars. ods would reach a given global error with the same computa-tional e ff ort independently of what coe ffi cients of the embed-ded low-order scheme λ i are used. What is di ff erent for dif-ferent schemes is the value of the local tolerance at which thatglobal error is achieved. For instance, the CFRK3W6VS with λ = . ∆ V =
10 at the local toler-ance δ = . × − , with λ = δ = × − and with λ = − δ = . × − , Fig. 9.This brings us to an important point. It may be tempting, ashappens in some literature, to interpret the local tolerance δ asa universal parameter that can be set once and after that the in-tegrator self-tunes and takes care of the global error. Contraryto that, we observe that there is no apriori way to know whatglobal integration error is achieved for a specific value of δ ona given lattice ensemble. Therefore, δ should be treated in thesame way as the step size h in fixed step size methods. For acalculation performed at a single value of h there is no way toestimate what global error was achieved, apart from knowingthat it is proportional to h p , where p is the order of the method.One needs to either calculate the exact (or high-precision) solu-tion and compare with it, or perform the calculation at severalstep sizes, study the dependence of the error in the observable ofinterest on the step size and decide what step size is appropriatefor a given lattice ensemble. Similarly, one needs to study thescaling of the error with respect to δ and pick δ based on that.It will certainly be di ff erent for each problem, since the numer-ical value of the global integration error is influenced by manyfactors such as the integration method, what gauge ensemblesare used, type of the gradient flow, for how long the flow is run,etc.Given that all embedded methods are equivalent with re-spect to the computational cost for a given higher-orderscheme and we do not observe a significant di ff erence be-tween CFRK3W6VS and CFRK3W7VS , we only perform This happens also because W6 and W7 schemes are close, e.g.
Fig. 1.
50 100 300 ∆ V N rhs
Figure 10: The global error ∆ V as function of the number of right hand sideevaluations of Eq. (7) for all 62 variable step size schemes shown in Fig. 8.The results are from the flow on a single 16 ×
48 lattice and therefore have noerrorbars. calculations with the original method of Ref. [32] which isCFRK3W6VS( λ = i.e. involves commutators).It requires four right hand side evaluations; however, the lastevaluation at the current step is the first on the next, so it canbe stored and reused (so called First Same As Last – FSALproperty). Therefore in practice it requires only three right handside evaluations, which is the same computational e ff ort as forall the other third-order methods explored here. Since we donot find this method to be beneficial, we do not list the fullalgorithm. The details can be found in [34, 33]. This method isreferred to as RKMK3BS.As in Sec. 5.1 the Symanzik flow is measured on 100 lat-tices from each ensemble of Table 1. The error bars are es-timated from 20 jackknife bins. In Fig. 11 the logarithm ofthe global error (cid:104) ∆ C ( h ) (cid:105) is shown vs the logarithm of the num-ber of the right hand side evaluations N rhs for the two fixedstep size integrators LSCFRK3W6 and LSCFRK3W7 that arediscussed in Sec. 5.1 and the two variable step size methodsCFRK3W6VS( λ =
0) and RKMK3BS. The origin of the non-monotonic behavior for CFRK3W6VS is similar to the one forLSCFRK3W7 – its global error crosses zero at some value of δ . Interestingly, while both variable step size methods are morebeneficial than LSCFRK3W6 for the 16 ×
48 and 24 ×
64 en-sembles, for the 32 ×
96 ensemble in the regime where all in-tegrators approach the expected cubic scaling, the variable stepsize schemes require comparable or larger computational e ff ortthan LSCFRK3W6. The fixed step size LSCFRK3W7 methodrequires the least computational e ff ort for all three ensembles For a method further away on the Williamson curve the embedded schemeswould still be equivalent themselves, but the computational cost of that integra-tor would be higher than for CFRK3W6VS and CFRK3W7VS. ln | 〈 ∆ C 〉 | ✕ LSCFRK3W6RKMK3BSCFRK3W6VSLSCFRK3W7-22-20-18-16-14 ✕ -24-22-20-18-16 4 4.5 5 5.5 6 6.5 7 ln N rhs ✕ Figure 11: Scaling of the global integration error in the clover observable (cid:104) ∆ C(cid:105) with the number of right hand side evaluations N rhs for the Symanzik flow onthe three gauge ensembles. The variable step size methods RKMK3BS andCFRK3W6VS were run with a set of local tolerances δ = − , 3 × − , 10 − ,3 × − , 10 − , 3 × − , 10 − , 3 × − . The data for the fixed step size methodsLSCFRK3W6 and LSCFRK3W7 is the same as in Fig. 7. The lines are drawnto guide the eye. except the region of large local tolerance (small number of righthand side evaluations). We conclude that for the scale settingapplications there may be no benefit from using variable stepsize integrators, at least for the gauge ensembles that we usedfor this study.Our findings are in contrast with the studies of Ref. [32]where large computational savings were reported forCFRK3W6VS( λ = ff erent regime. Apartfrom less important di ff erences in the gauge couplings,boundary conditions and lattice volumes, the flow was runsignificantly longer to achieve a much larger smoothing radiusthan is needed for w -scale setting. Translated for the gaugeensembles used in this study the flow would be run in theranges T max = . − a = .
15 fm, T max = . − a = .
12 fm and T max = . −
32 for the a = .
09 fmensembles (compare with T max in Table 1). Reference [4] opened a possibility of building low-storageLie group integrators from classical 2 N -storage RK methods.There are a number of fourth-order 2 N -storage methods in theliterature [13, 14, 15, 16, 17, 18, 19] that were constructed fordi ff erent applications. Most of them include large number ofstages and may therefore be computationally ine ffi cient for in-tegration of the gradient flow in lattice gauge theory.We study two fourth-order methods that have five (the min-imal possible number) [13] and six [15] stages which we refer to as LSCFRK4CK and LSCFRK4BBB , respectively. Theircoe ffi cients were found by solving a system of nonlinear equa-tions resulting from eight order conditions for a classical RKmethod and additional problem-dependent constraints ( e.g. in-creased regions of stability). These coe ffi cients in the 2 N -storage format are listed in Appendix B.As in Sec. 5.1, the tests are performed with the Symanzikflow, clover observable on 100 lattices from the ensembleslisted in Table 1. The step sizes h = /
16, 1 /
32, 1 /
64 and1 /
128 are used and the exact solution is obtained with RKMK8at h = − . For comparison we also implemented a fourth-order method of Munthe-Kaas type, RKMK4. Such a methodwas employed in [35]. Our implementation is not exactly thesame as there and provides a slightly smaller global error, butour findings are qualitatively similar. -24-22-20-18-16-14-12 ln | 〈 ∆ C ( h ) 〉 | ✕ -24-22-20-18-16 ✕ RKMK4LSCFRK3W6LSCFRK4CKLSCFRK4BBB-26-24-22-20-18-16 -5 -4.5 -4 -3.5 -3 -2.5 ✕ ln h Figure 12: Scaling of the global integration error in the clover observable (cid:104) ∆ C ( h ) (cid:105) with the step size h = / /
64, 1 /
32 and 1 /
16 for the Symanzikflow on the three gauge ensembles. The three fourth-order methods are dis-cussed in the text. The lines are drawn to guide the eye.
The scaling of the error in the clover observableln |(cid:104) ∆ C ( h ) (cid:105)| as function of ln h is shown in Fig. 12 for theLSCFRK3W6, LSCFRK4CK, LSCFRK4BBB and RKMK4methods. RKMK4 has the largest error among the fourth-ordermethods while the lowest global integration error is achievedwith LSCFRK4BBB.The 2 N -storage RK methods with s stages have 2 s − A = We use nomenclature consistent with other methods in this paper. In theoriginal work [15] the classical Runge-Kutta method is called RK46-NL. ln | 〈 ∆ C 〉 | ✕ -24-22-20-18-16 ✕ -26-24-22-20-18-16 4 4.5 5 5.5 6 6.5 7 7.5 8 ✕ ln N rhs LSCFRK3W6LSCFRK4CKLSCFRK4BBBLSCFRK3W7
Figure 13: Scaling of the global integration error in the clover observable (cid:104) ∆ C(cid:105) with the number of right hand side evaluations N rhs for the Symanzik flow onthe three gauge ensembles. All integrators were run at step sizes h = / /
64, 1 /
32 and 1 /
16 except LSCFRK4CK and LSCFRK4BBB where h = / similarly to our discussion in Sec. 5.1. However, due to thecomplexity of the order conditions, no analytic solutions suchas Eq. (15) are available. This makes tuning of that integrator acomplicated task. We note that there are three more coe ffi cientschemes reported in Ref. [13], but we found them less e ffi cientthan the main scheme that Ref. [13] recommended and whichwas implemented in this study. The integration schemes that we explored di ff er in the num-ber of stages. To compare their computational e ffi ciency,Fig. 13 shows the dependence of the global error in the cloverobservable vs the number of right hand side evaluations for thethird-order method of Ref. [2] LSCFRK3W6, the third-orderscheme LSCFRK3W7 that we discussed in Sec. 5.1, and thetwo fourth-order methods LSCFRK4CK and LSCFRK4BBB.Compared with LSCFRK3W6 we find that the LSCFRK3W7scheme produces lower global error at all step sizes exploredand is thus more beneficial computationally. The fourth-order LSCFRK4BBB scheme becomes more beneficial thanLSCFRK3W6 at step size of about h = /
16 for the 16 × ×
64 ensembles and about h = /
32 for the 32 × ffi cient at h = /
128 ( h for LSCFRK3W7). Forthe 32 ×
96 ensemble the five-stage LSCFRK4CK method be-comes comparable with LSCFRK4BBB.
6. Conclusion
Based on the connection [4] between the 2 N -storage clas-sical Runge-Kutta methods [10] and commutator-free integra-tors [24] we explored several possible improvements in the ef-ficiency of integrating the gradient flow in lattice gauge theory.Among the low-storage three-stage third-order schemes thatare parametrized by the Williamson curve the LSCFRK3W7 isthe most promising. Its global error in the norm of the gaugefield is close to the minimum, Fig. 2. For the action observ-ables this method is close to the point where the leading or-der error coe ffi cient is close to zero. Like the originally pro-posed LSCFRK3W6 method of L¨uscher [2], LSCFRK3W7 hasrational coe ffi cients that are given in the 2 N -storage form inAppendix B. The performance of the LSCFRK3W7 method isuniversal across ensembles with di ff erent lattice spacing, typesof flow and types of observable that we explored. For a specificgradient flow application the reader can always revisit the tun-ing of the third-order scheme similar to our study in Sec. 5.1 by e.g. running a small-scale test on the set of rational values of c and c coe ffi cients given in Appendix B that reasonably coverthe Williamson curve. For the reader’s convenience we alsoprovide a listing of the Mathematica script that calculates theLSCFRK3 method coe ffi cients in various formats from given c and c in Appendix D.Our studies of the third-order variable step size methods inSec. 5.2 indicate that one needs to exercise caution in interpret-ing the local tolerance δ parameter. Its relation to the globalintegration error is not a priori known, in the same way as ithappens with the step size h for fixed step size methods. Thus,in both situations one needs to study the scaling of the globalerror with the control parameter, step size h or local tolerance δ , and tune it based on that for each specific case. For w -scalesetting applications we find that it is still computationally moree ffi cient to use the fixed step size LSCFRK3W7 method ratherthan the third-order variable step size schemes. For applica-tions where the third-order variable step size methods may bebeneficial we also include the coe ffi cient scheme that allowsone to reuse the second stage of the third-order integrator in theembedded second-order integrator in the Mathematica script inAppendix D.There are two low-storage fourth-order methods studied inSec. 5.3 that may be well-suited for gradient flow applications.We find that the LSCFRK4BBB method is the most compu-tationally e ffi cient one, although the five-stage LSCFRK4CKmethod becomes comparable at finer ensembles. For the gaugeensembles that we studied we conclude that if one needs to runthe LSCFRK3W6 integrator at time steps lower that 1 /
32, itis more beneficial to switch to LSCFRK4BBB. A fourth-orderRKMK4 method was used in Ref. [35] to provide a conserva-tive estimate of the integration error for observables related tothe topology of gauge fields. We believe that LSCFRK4BBBprovides a better alternative to RKMK4, see Fig. 12. For thethree gauge ensembles used here the LSCFRK4BBB integra-tor is stable at the largest step size we tried, h = /
8. Theaverage integration error at this step size for the clover observ-able is (cid:104) ∆ C ( h = / (cid:105) = − . × − for the a = .
15 fm,12 ∆ C ( h = / (cid:105) = − . × − for the a = .
12 fm and (cid:104) ∆ C ( h = / (cid:105) = − . × − for the a = .
09 fm ensemble.Based on our findings we recommend LSCFRK3W7,LSCFRK4BBB and LSCFRK4CK for integrating the gradientflow. Exact speedups depend on the accuracy required for theflow observables. For the ensembles we used LSCFRK3W7 isabout twice as e ffi cient as LSCFRK3W6 for the w -scale set-ting. Since these methods are all written in the 2 N -storage for-mat, it is very easy to implement di ff erent integrators in theexisting code. For instance, in the MILC code [36] the mainloop over the stages in Algorithm 6 is the following : for( i=0; i A 2N , B 2N – the values of the two arrays of size N stages that store the A i , B i coe ffi cients in the 2 N -storage format. Thecode in the integrate RK 2N one stage function performsone stage of Algorithm 6 and is typically already present in thelattice codes that implemented Algorithm 7.To summarize, we presented several Lie group methods thatmay provide better computational e ffi ciency for integrating thegradient flow in lattice gauge theory. Given their low-storageproperties, they are easy to implement in the existing codes.For the reader interested in exploring the properties and imple-mentation of the low-storage Lie group integrators in a simplersetting we point out that a simple Matlab script for integratingthe equation of motion of a rotating free rigid body is includedin the Appendix of Ref. [4]. Acknowledgements. We thank the MILC collaboration forsharing the gauge configurations, Oliver Witzel for an indepen-dent test of the flow observables, Oswald Knoth for checkingthe order conditions for the low-storage Lie group integratorsused here and Claude Bernard, Steven Gottlieb and JohannesWeber for careful reading and comments on the manuscript.This work was in part supported by the U.S. National ScienceFoundation under award PHY-1812332. The minus sign in front of the second argument is to match the historicconvention in the MILC code on how the right hand side of Eq. (7) is computed.Also, at the time of writing the code for all integrators used in this paper islocated in the feature/wilson flow 2 branch. Appendix A. Some order conditions for classical Runge-Kutta methods The coe ffi cients of a third-order explicit classical Runge-Kutta method satisfy the following order conditions: (cid:88) i b i = , (A.1) (cid:88) i b i c i = , (A.2) (cid:88) i b i c i = , (A.3) (cid:88) i , j b i a i j c j = . (A.4)A 3-stage method has six independent parameters. Picking c and c as free parameters one gets the most generic branch ofsolutions c (cid:44) (cid:44) c (cid:44) c (cid:44) / b = c − c ( c − c ) , (A.5) b = − c c ( c − c ) , (A.6) a = c ( c − c ) c (2 − c ) , (A.7) b = − b − b , (A.8) a = c , (A.9) a = c − a . (A.10) Appendix B. Coe ffi cients for several 2 N -storage third- andfourth-order classical RK methods For future reference and possible integrator tuning we listseveral rational coe ffi cient schemes in Table B.2.The coe ffi cients of the RK3W6 and RK3W7 schemes in the2 N -storage format are listed in Table B.3 and of the RK4CKand RK4BBB in Table B.4. Note that the RK4CK coe ffi cientswere found numerically and then close rational expressionswere found that provide 26 digits of accuracy. For the RK4BBBmethod the coe ffi cients are given with 12 digits of accuracyin [15] therefore one cannot expect the error to be below 10 − .This bound is however well below the typical precision neededin gradient flow applications. Appendix C. Derivation of the Lie group integrator ofRef. [2] It is illustrative to discuss the properties of the integrator pro-vided by L¨uscher in Ref. [2]. As this coe ffi cient scheme was notpresent in the literature on Lie group methods at the time, webelieve that the method was derived independently. Let us startwith a generic algorithm that reuses the function value from theprevious stage, Algorithm 9, and contains six yet undeterminedparameters.If Y and Y are substituted in terms of Y , one can recognizethat this scheme belongs to the class of commutator-free Lie13 able B.2: Three-stage third-order 2 N -storage schemes with rational coe ffi -cients. The original numbering of Ref. [10] is listed in the first column. Theadditional schemes that we found are marked with the asterisk. There are manymore than we show here, but as the numerator and denominator grow largerthey are not convenient to use. Numbering [10] c c / / 12 2 / 15* 823 / / / / 12* 1418 / / / / / / 37 1 / / 4* 823 / / / / / 12 3 / 4* 439 / 592 1064 / / 15 5 / / / 314 1 1 / Table B.3: The coe ffi cients of the two 2 N -storage third-order schemes of [10]that are the basis for the LSCFRK3W6 and LSCFRK3W7 methods discussedin Sec. 5.1. Coe ffi cient RK3W6 RK3W7 A A − / − / A − / − / B / / B / / B / / Table B.4: The coe ffi cients of the two 2 N -storage fourth-order schemes of [13]and [15] that are the basis for the LSCFRK4CK and LSCFRK4BBB methodsdiscussed in Sec. 5.3. Coe ffi cient RK4CK A A − / A − / A − / A − / B / B / B / B / B / ffi cient RK4BBB A A − . A − . A − . A − . A − . B . B . B . B . B . B . lgorithm 9 Y = Y t K = F ( Y ) Y = exp ( h α K ) Y K = F ( Y ) Y = exp ( h ( α K + α K )) Y K = F ( Y ) Y t + h = exp ( h ( β K + β K + β K )) Y or Y t + h = exp ( h ( β K + c ( α K + α K ))) Y (cid:46) See text.group integrators explored in Ref. [24]. However, it does notbelong to the classes of solutions found there.We would like to find a solution of the order conditions thatallows for a third-order global accuracy method. For a classicalRK method there are four order conditions, thus, we need atleast four parameters to satisfy them. For a Lie group integratorof this type we need at least five parameters. This can be seenfrom the following argument. If we take, for instance, a third-order Crouch-Grossman method [23] (which is a commutator-free Lie group integrator with a more restricted structure thanthe methods of [24]), there is a non-classical order conditionarising from non-commutativity, so there are five in total. Ifwe follow the RKMK route [37, 21], one needs, at least, onecommutator for a third-order method. Thus, if we would liketo avoid commutators, we need at least one more coe ffi cient totune to cancel the e ff ect of the commutator. Overall, it may bepossible to construct a scheme with five rather than six di ff erentparameters. This can be beneficial to reduce the complexity ofthe order conditions, and also to reuse storage. We can trade β and β for a single coe ffi cient c by requiring: β = c α , β = c α . (C.1)This way one can store the combination α K + α K , ratherthan K and K separately, overwriting K with this combina-tion.As is done for classical RK methods, by Taylor expandingthe numerical scheme and comparing with the expansion of theexact solution one arrives at the following order conditions: β + (1 + c )( α + α ) + α = , (C.2) β ( α + α + α ) + (1 + c ) α α = , (C.3) β ( α + α + α ) + (1 + c ) α α = , (C.4) β α α = , (C.5) 32 (cid:104) β ( α + α + α ) + (1 + c ) α α (cid:105) + β ( β + c ( α + α ))( α + α + α ) + 12 [ c ( β + c ( α + α )) + α + α ] α α + c ( α + α ) α α = , (C.6)12 β ( β + c ( α + α ))( α + α + α ) + 12 [ c ( β + c ( α + α )) + α + α ] α α + ( β + c ( α + α )) α α = . (C.7)Eqs. (C.2)–(C.5) are equivalent to the classical order conditionsfor a third-order scheme, expressed in terms of the coe ffi cients α i j , β i (their relation to the classical coe ffi cients will becomeclear in a moment). Eqs. (C.6), (C.7) appear when Y is a vectoror a matrix and terms F (cid:48) F and FF (cid:48) F in the Taylor expansioncan no longer be combined. By using the four classical orderconditions (C.2)–(C.5) one finds that the two conditions (C.6),(C.7) are not independent and can be condensed into a singlecondition:12 (cid:2) β + c ( α + α ) (cid:3) + (1 + c )( α + α ) α α = . (C.8)Now there are five unknowns and five (non-linear) equations.As is obvious from Eqs. (C.2)–(C.5), (C.8) this system canbecome significantly simpler if 1 + c = 0, so we can try c = − α + α + α = / 3, substituting that into (C.3) we immedi-ately get β = / 4, from (C.2) α = / 4, from (C.5) α = / α = − / 36. We can check that Eq. (C.8) isalso satisfied. These are nothing else but the coe ffi cients of thescheme of Ref. [2], Algorithm 7.However, as the reader can verify, there are solutions forother values of c . This is possible because, as one can show, inthe form involving the coe ffi cient c , Eq. (C.1), the non-classicalconstraint (C.8) is a linear combination of (C.2)–(C.4). Had wekept β and β as independent coe ffi cients this would not hap-pen. We would still end up with a one-parameter family of so-lutions (six coe ffi cients with five constraints), but (C.8) wouldbe linearly independent.As is now clear from the discussion in Sec. 4.3, the integra-tor of [2] is, in fact, a low-storage commutator-free Lie groupintegrator that belongs to the family of schemes based on theclassical 2 N -storage methods of Ref. [10]. See [4] for detaileddiscussion. We call this scheme LSCFRK3W6 in Sec. 5.1. Itsset of classical RK coe ffi cients is a = / a = − / a = / b = / b = b = / α ’s and β ’s we started with as α = a , (C.9) α = a − a , (C.10) α = a , (C.11) β = b − a , (C.12) β = b − a , (C.13) β = b . (C.14)The coe ffi cients in the 2 N -storage format, Algorithm 6, aregiven in the second column of Table B.3. Appendix D. Mathematica script A Wolfram Mathematica script (tested with version 11) thatcalculates the coe ffi cients for the 2 N -storage explicit three-stage third-order Runge-Kutta methods from provided c and c coe ffi cients is given below. The reader can simply copy andpaste it into an empty Mathematica notebook (the formattingwill most likely be lost). (* Note: enclosing parentheses are neededso that Abort[] function could actually abortthe execution of the cell *)( (* <---- do not remove *)(* SET c2 AND c3 HERE e.g. from Table B.2 *)c2 := 1/4;c3 := 2/3;(* check the singular point *)If[c2 == 1/3 && c3 == 1/3,Print["No RK scheme with c2=c3=1/3 exists"];Abort[]];(* check if low-storage *)If[c3^2*(1 - c2) + c3*(c2^2 + 1/2*c2 - 1)+ (1/3 - 1/2*c2) != 0,Print["c2,c3 -- not a low-storage scheme"];Abort[]];(* check if limiting case c2=2/3,c3=0or c2=c3=2/3 is hit *)If[c2 == 2/3 && c3 == 0,b3 := -1/3; b2 := 3/4; b1 := 1/4 - b3;a32 := 1/4/b3; a31 := -a32; a21 := 2/3,If[c2 == 2/3 && c3 == 2/3,b3 := 1/3; b2 := 3/4 - b3; b1 := 1/4;a32 := 1/4/b3; a31 := 2/3 - a32; a21 := 2/3,b2 := (3*c3 - 2)/6/c2/(c3 - c2);b3 := (2 - 3*c2)/6/c3/(c3 - c2);a32 := c3*(c3 - c2)/c2/(2 - 3*c2);b1 := 1 - b2 - b3; a31 = c3 - a32; a21 := c2]];Print["Coefficients in classical RK form:"];Print["a21=", a21]; Print["a31=", a31];Print["a32=", a32];Print["b1=", b1];Print["b2=", b2];Print["b3=", b3];alpha21 := a21;alpha31 := a31 - a21;alpha32 := a32;beta3 := b3;c := (b1 - a31)/(a31 - a21);Print["Coefficients in the form of"];Print["Luescher, 1006.4518"];Print["with reusability condition"];Print["beta1=c*alpha31, beta2=c*alpha32:"];Print["alpha21=", alpha21];Print["alpha31=", alpha31];Print["alpha32=", alpha32];Print["beta3=", beta3];Print["c=", c];A1 := 0;B3 := b3;B2 := a32;A3 := (b2 - B2)/b3;B1 := a21;If[b2 == 0,A2 := (a31 - a21)/a32, A2 := (b1 - B1)/b2];Print["Low-storage form of Williamson:"];Print["A1=", A1];Print["A2=", A2];Print["A3=", A3];Print["B1=", B1];Print["B2=", B2];Print["B3=", B3];Print["Variable step size:"];Print["Second-order coefficients"];Ds := c2*alpha32 - c3*(c3 - c2);If[Ds == 0,Print["No reusable embedded scheme"];Print["Pick lambda3=0"];l3 := 0;l2 := 1/2/c2;l1 := 1 - l2;Print["lambda1=", l1];Print["lambda2=", l2];Print["lambda3=", l3],l2 := alpha32*(1/2 - c3)/Ds;l3 := 1 - (c3 - c2)*(1/2 - c3)/Ds;l1 := 1 - l2 - l3;q := l2/a32;Print["with reuse of third-order second stage"];Print["lambda2=q*alpha32, lambda1=q*alpha31"];Print["lambda3=", l3]; rint["q=", q]];) (* <---- do not remove *)(* end of script *) References [1] K. G. Wilson, Confinement of Quarks, Phys. Rev. D10 (1974) 2445–2459. doi:10.1103/PhysRevD.10.2445 .[2] M. Luescher, Properties and uses of the Wilson flow in lattice QCD, JHEP08 (2010) 071, [Erratum: JHEP03,092(2014)]. arXiv:1006.4518 , doi:10.1007/JHEP08(2010)071,10.1007/JHEP03(2014)092 .[3] E. Hairer, C. Lubich, G. Wanner, Geometric Numerical Integration:Structure-Preserving Algorithms for Ordinary Di ff erential Equations; 2nded., Springer, Dordrecht, 2006. doi:10.1007/3-540-30666-8 .URL https://cds.cern.ch/record/1250576 [4] A. Bazavov, Commutator-free Lie group methods with minimum stor-age requirements and reuse of exponentials, arXiv e-prints (2020)arXiv:2007.04225 arXiv:2007.04225 .[5] M. Luscher, P. Weisz, Computation of the Action for On-Shell ImprovedLattice Gauge Theories at Weak Coupling, Phys. Lett. 158B (1985) 250–254. doi:10.1016/0370-2693(85)90966-9 .[6] S. Borsanyi, et al., High-precision scale setting in lattice QCD, JHEP 09(2012) 010. arXiv:1203.4469 , doi:10.1007/JHEP09(2012)010 .[7] Z. Fodor, K. Holland, J. Kuti, S. Mondal, D. Nogradi, C. H. Wong, Thelattice gradient flow at tree-level and its improvement, JHEP 09 (2014)018. arXiv:1406.0827 , doi:10.1007/JHEP09(2014)018 .[8] J. Butcher, Numerical Methods for Ordinary Di ff erential Equations, 3rdEdition, Wiley, 2016.[9] E. Hairer, S. Nørsett, G. Wanner, Solving Ordinary Di ff erential EquationsI Nonsti ff problems, 2nd Edition, Springer, Berlin, 2000.[10] J. Williamson, Low-storage Runge-Kutta schemes, Journal ofComputational Physics 35 (1) (1980) 48 – 56. doi:https://doi.org/10.1016/0021-9991(80)90033-9 .URL [11] C. A. Kennedy, M. H. Carpenter, R. Lewis, Low-storage, explicitRunge-Kutta schemes for the compressible Navier-Stokes equa-tions, Applied Numerical Mathematics 35 (3) (2000) 177 – 219. doi:https://doi.org/10.1016/S0168-9274(99)00141-5 .URL [12] D. I. Ketcheson, Runge-Kutta methods with minimum storage imple-mentations, Journal of Computational Physics 229 (5) (2010) 1763 –1773. doi:https://doi.org/10.1016/j.jcp.2009.11.006 .URL [13] M. Carpenter, C. Kennedy, Fourth-order 2N-storage Runge-Kuttaschemes, Tech. Rep. NASA-TM-109112, NASA (1994).[14] M. Bernardini, S. Pirozzoli, A general strategy for the optimiza-tion of Runge-Kutta schemes for wave propagation phenomena,Journal of Computational Physics 228 (11) (2009) 4182 – 4199. doi:https://doi.org/10.1016/j.jcp.2009.02.032 .URL [15] J. Berland, C. Bogey, C. Bailly, Low-dissipation and low-dispersionfourth-order Runge-Kutta algorithm, Computers and Fluids 35 (10)(2006) 1459 – 1463. doi:https://doi.org/10.1016/j.compfluid.2005.04.003 .URL [16] T. Toulorge, W. Desmet, Optimal Runge-Kutta schemes for discon-tinuous Galerkin space discretizations applied to wave propagationproblems, Journal of Computational Physics 231 (4) (2012) 2067 – 2091. doi:https://doi.org/10.1016/j.jcp.2011.11.024 .URL [17] D. Stanescu, W. Habashi, 2N-storage low dissipation and dis-persion Runge-Kutta schemes for computational acoustics,Journal of Computational Physics 143 (2) (1998) 674 – 681. doi:https://doi.org/10.1006/jcph.1998.5986 . URL [18] V. Allampalli, R. Hixon, M. Nallasamy, S. D. Sawyer, High-accuracylarge-step explicit Runge-Kutta (HALE-RK) schemes for computationalaeroacoustics, Journal of Computational Physics 228 (10) (2009) 3837 –3850. doi:https://doi.org/10.1016/j.jcp.2009.02.015 .URL [19] J. Niegemann, R. Diehl, K. Busch, E ffi cient low-storage Runge-Kutta schemes with optimized stability regions, Journal of Com-putational Physics 231 (2) (2012) 364 – 372. doi:https://doi.org/10.1016/j.jcp.2011.09.003 .URL [20] Y. an Yan, Low-storage Runge-Kutta method for simulating time-dependent quantum dynamics, Chinese Journal of Chemical Physics30 (3) (2017) 277 – 286.[21] H. Munthe-Kaas, High order Runge-Kutta methods on man-ifolds, Applied Numerical Mathematics 29 (1) (1999) 115– 127, proceedings of the NSF / CBMS Regional Conferenceon Numerical Analysis of Hamiltonian Di ff erential Equations. doi:https://doi.org/10.1016/S0168-9274(98)00030-0 .URL [22] H. Z. Munthe-Kaas, B. Owren, Computations in a free Lie algebra, Philo-sophical Transactions of the Royal Society of London. Series A: Mathe-matical, Physical and Engineering Sciences 357 (1999) 957 – 981.[23] P. E. Crouch, R. Grossman, Numerical integration of ordinary di ff erentialequations on manifolds, Journal of Nonlinear Science 3 (1) (1993) 1–33. doi:10.1007/BF02429858 .URL https://doi.org/10.1007/BF02429858 [24] E. Celledoni, A. Marthinsen, B. Owren, Commutator-free Lie groupmethods, Future Generation Computer Systems 19 (3) (2003)341 – 352, special Issue on Geometric Numerical Algorithms. doi:https://doi.org/10.1016/S0167-739X(02)00161-9 .URL [25] O. Knoth, B-Series (2020).URL https://github.com/OsKnoth/B-Series [26] E. Follana, Q. Mason, C. Davies, K. Hornbostel, G. P. Lepage,J. Shigemitsu, H. Trottier, K. Wong, Highly improved staggered quarkson the lattice, with applications to charm physics, Phys. Rev. D75 (2007)054502. arXiv:hep-lat/0610092 , doi:10.1103/PhysRevD.75.054502 .[27] A. Bazavov, et al., Scaling studies of QCD with the dynamical HISQaction, Phys. Rev. D82 (2010) 074501. arXiv:1004.0342 , doi:10.1103/PhysRevD.82.074501 .[28] A. Bazavov, et al., Lattice QCD Ensembles with Four Flavors of HighlyImproved Staggered Quarks, Phys. Rev. D87 (5) (2013) 054505. arXiv:1212.4768 , doi:10.1103/PhysRevD.87.054505 .[29] A. Bazavov, et al., Gradient flow and scale setting on MILC HISQ en-sembles, Phys. Rev. D 93 (9) (2016) 094510. arXiv:1503.02769 , doi:10.1103/PhysRevD.93.094510 .[30] P. Prince, J. Dormand, High order embedded runge-kutta formulae,Journal of Computational and Applied Mathematics 7 (1) (1981) 67 – 75. doi:https://doi.org/10.1016/0771-050X(81)90010-3 .URL [31] A. Ralston, Runge-Kutta methods with minimum error bounds, Math.Comp. 16 (1962) 431 – 437. doi:https://doi.org/10.1090/S0025-5718-1962-0150954-0 .URL [32] P. Fritzsch, A. Ramos, The gradient flow coupling in the SchroedingerFunctional, JHEP 10 (2013) 008. arXiv:1301.4388 , doi:10.1007/JHEP10(2013)008 .[33] M. Wandelt, M. Guenther, E ffi cient numerical simulation of the wilsonflow in lattice qcd, in: G. Russo, V. Capasso, G. Nicosia, V. Romano(Eds.), Progress in Industrial Mathematics at ECMI 2014, Springer Inter-national Publishing, Cham, 2016, pp. 1065–1071.[34] P. Bogacki, L. Shampine, A 3(2) pair of runge - kutta for- ulas, Applied Mathematics Letters 2 (4) (1989) 321 – 325. doi:https://doi.org/10.1016/0893-9659(89)90079-7 .URL [35] M. Ce, C. Consonni, G. P. Engel, L. Giusti, Non-Gaussianities in thetopological charge distribution of the SU(3) Yang–Mills theory, Phys.Rev. D92 (7) (2015) 074502. arXiv:1506.06052 , doi:10.1103/PhysRevD.92.074502 .[36] MILC Collaboration, MILC collaboration code for lattice QCD calcula-tions (Dec. 2020).URL https://github.com/milc-qcd/milc_qcd [37] H. Munthe-Kaas, Runge-Kutta methods on Lie groups, BIT NumericalMathematics 38 (1) (1998) 92–111. doi:10.1007/BF02510919 .URL https://doi.org/10.1007/BF02510919https://doi.org/10.1007/BF02510919