[PDF] Efficient integration of gradient flow in lattice gauge theory and properties of low-storage commutator-free Lie group methods

Abstract

The smoothing procedure known as the gradient flow that suppresses ultraviolet fluctuations of gauge fields plays an important role in lattice gauge theory calculations. In particular, this procedure is often used for high-precision scale setting and renormalization of operators. The gradient flow equation is defined on the SU(3) manifold and therefore requires geometric, or structure-preserving, integration methods to obtain its numerical solutions. We examine the properties and origins of the three-stage third-order explicit Runge-Kutta Lie group integrator commonly used in the lattice gauge theory community, demonstrate its relation to 2N-storage classical Runge-Kutta methods and explore how its coefficients can be tuned for optimal performance in integrating the gradient flow. We also compare the performance of the tuned method with two third-order variable step size methods. Next, based on the recently established connection between low-storage Lie group integrators and classical 2N-storage Runge-Kutta methods, we study two fourth-order low-storage methods that provide a computationally efficient alternative to the commonly used third-order method while retaining the convenient iterative property of the latter. Finally, we demonstrate that almost no coding effort is needed to implement the low-storage Lie group methods into existing gradient flow codes.

Full PDF

EE ﬃ cient integration of gradient ﬂow in lattice gauge theory andproperties of low-storage commutator-free Lie group methods Alexei Bazavov a , Thomas Chuna a a Department of Computational Mathematics, Science and Engineering and Department of Physics and Astronomy,Michigan State University, East Lansing, MI 48824, USA

Abstract

The smoothing procedure known as the gradient ﬂow that suppresses ultraviolet ﬂuctuations of gauge ﬁelds plays an important rolein lattice gauge theory calculations. In particular, this procedure is often used for high-precision scale setting and renormalizationof operators. The gradient ﬂow equation is deﬁned on the SU (3) manifold and therefore requires geometric, or structure-preserving,integration methods to obtain its numerical solutions. We examine the properties and origins of the three-stage third-order explicitRunge-Kutta Lie group integrator commonly used in the lattice gauge theory community, demonstrate its relation to 2 N -storageclassical Runge-Kutta methods and explore how its coe ﬃ cients can be tuned for optimal performance in integrating the gradientﬂow. We also compare the performance of the tuned method with two third-order variable step size methods. Next, based onthe recently established connection between low-storage Lie group integrators and classical 2 N -storage Runge-Kutta methods, westudy two fourth-order low-storage methods that provide a computationally e ﬃ cient alternative to the commonly used third-ordermethod while retaining the convenient iterative property of the latter. Finally, we demonstrate that almost no coding e ﬀ ort is neededto implement the low-storage Lie group methods into existing gradient ﬂow codes. Keywords:

Lattice gauge theory, Gradient ﬂow, Geometric integration, Runge-Kutta methods, Lie group methods

1. Introduction

In lattice gauge theory [1], a numerical approach to quan-tum gauge theories, the path integrals are evaluated by samplingthe space of possible ﬁeld conﬁgurations with a Markov ChainMonte Carlo process and averaging over the Monte Carlo timeseries. The ﬁelds are deﬁned on a four-dimensional Euclideanspace-time grid and for the physically relevant case of QuantumChromodynamics (QCD) take values in the SU (3) group.A smoothing procedure, referred to as gradient ﬂow , intro-duced by L¨uscher in Ref. [2], allows one to evolve a givengauge ﬁeld conﬁguration towards the classical solution. Thegradient ﬂow possesses renormalizing properties and is oftenused for renormalization of operators and determining the lat-tice spacing in physical units. Certain lattice calculations re-quire determination of the lattice scale to sub-percent precision.Numerically, gradient ﬂow amounts to integrating a ﬁrst-orderdi ﬀ erential equation on the SU (3) manifold. While a varietyof structure-preserving integration methods can be used for thistask [3], the three-stage third-order explicit Runge-Kutta inte-grator introduced in Ref. [2] became the most commonly usedmethod in lattice gauge theory applications.In this paper we explore the recently observed relations be-tween classical low-storage explicit Runge-Kutta methods andthe commutator-free Lie group methods [4]. We show how athree-stage third-order integrator can be optimized speciﬁcally Email address: [email protected] (Alexei Bazavov) for integrating the gradient ﬂow and how higher-order methodswith similar low-storage properties can be constructed.The paper is organized as follows. In Sec. 2 we introduce theformalism of lattice gauge theory and gradient ﬂow. In Sec. 3we review the properties of standard explicit Runge-Kutta inte-gration methods and so-called low-storage Runge-Kutta meth-ods. In Sec. 4 we discuss structure-preserving integrators, oftencalled geometric integrators or Lie group integrators. In Sec. 5we present the numerical results on integrating the gradient ﬂowon three lattice ensembles with several third- and fourth-orderLie group methods. We present our conclusions and recom-mendations for tuning the methods and improving the compu-tational e ﬃ ciency of integrating the gradient ﬂow in Sec. 6.

2. Lattice gauge theory and the gradient ﬂow

In lattice gauge theory the primary degrees of freedom are SU (3) matrices U x , µ that reside on the links of a hypercubiclattice with dimensions N σ × N τ . The lattice spacing is a , x is aninteger-valued four-vector, and µ = , . . . ,

4. Gauge-invariantobservables are represented as traces of products of the gaugelink variables along paths on the lattice that are closed loops.The main observables we discuss here are the plaquette (4-linkloop): P x , µ,ν = U x , µ U x + µ, ν U † x + ν, µ U † x , ν (1)rectangle (6-link loop): R x , µ,ν = U x , µ U x + µ, µ U x + µ, ν U † x + µ + ν, µ U † x + ν, µ U † x , ν (2) Preprint submitted to Computer Physics Communications January 15, 2021 a r X i v : . [ h e p - l a t ] J a n nd the so called clover expression constructed as a linear com-bination of several plaquettes forming the shape of a clover leaf: C x , µ,ν = i (cid:16) Q x , ν, µ − Q x , µ,ν (cid:17) (3)where Q x , µ,ν = P x , µ,ν + P x ,ν, − µ + P x , − µ, − ν + P x , − ν, µ .The simplest gauge action is the Wilson action [1] that in-cludes only the plaquette term (for convenience we drop thefactor 1 / g in the deﬁnition): S Wilson = (cid:88) x (cid:88) µ<ν ReTr(1 − P x , µ,ν ) . (4)To suppress lattice discretization e ﬀ ects one can construct im-proved actions such as, for instance, the tree-level Symanzik-improved gauge action that includes the plaquette and rectangleterms [5]: S Symanzik = S Wilson − (cid:88) x (cid:88) µ (cid:44) ν ReTr(1 − R x , µ,ν ) . (5)The clover action is S clover = (cid:88) x (cid:88) µ (cid:44) ν ReTr( C x , µ,ν C x , µ,ν ) (6)and another variant of an improved action can be constructed asa linear combination of the plaquette and clover terms.To smoothen the ﬁelds and suppress ultraviolet ﬂuctuationsRef. [2] suggested evolving the gauge ﬁelds U x , µ with the fol-lowing gradient ﬂow equation: dV x , µ dt = − (cid:110) ∂ x , µ S f ( t ) (cid:111) V x , µ , V x , µ ( t = = U x , µ (7)where the di ﬀ erential operator ∂ x , µ acts on a function of SU (3)group elements as deﬁned in Ref. [2] and S f is the lattice ac-tion evaluated using the evolved gauge link variables V ( t ). Werefer to the gradient ﬂow as the Wilson ﬂow when S f = S Wilson is used in the ﬂow equation, and as the Symanzik ﬂow when S f = S Symanzik . The ﬂow time t has dimensions of lattice spac-ing squared.One of the widespread applications of gradient ﬂow in latticegauge theory is scale setting, i.e. determination of the latticespacing in physical units for a given lattice ensemble. In thiscase the ﬂow is run until the ﬂow time t = w [6] at which (cid:34) t ddt t (cid:104) S o ( t ) (cid:105) (cid:35) t = w = Const (8)and typically

Const = . w -scale in physical units, w phys .Eq. (8) is an improved version of the original proposal wherethe action itself rather than its derivative was used [2]: t (cid:104) S o ( t ) (cid:105) (cid:12)(cid:12)(cid:12) t = t = Const . (9)The observable used for the scale setting in Eq. (8) is the actiondensity S o , not necessarily the same as S f in the ﬂow equa-tion (7). As has been discussed in Ref. [7] di ﬀ erent combina-tions of the ﬂow action and the observable result in di ﬀ erentdependence on the lattice spacing. Here we consider S Wilson and S Symanzik for the ﬂow and S Wilson , S Symanzik and S clover forthe observable.

3. Classical Runge-Kutta methods

Consider a ﬁrst-order di ﬀ erential equation for a function y ( t ) dydt = f ( t , y ) , (10)and the initial condition y ( t =

0) given. An s -stage explicitRunge-Kutta (RK) method that propagates the numerical ap-proximation to the solution y t of Eq. (10) at time t to time t + h is given in Algorithm 1 [8, 9]. Algorithm 1

Explicit classical s -stage Runge-Kutta method for i = do y i = y t + h (cid:80) i − j = a i j k j (cid:46) a i , j (cid:62) i = k i = f ( t + hc i , y i ) (cid:46) c = end for y t + h = y t + h (cid:80) si = b i k i The self-consistency conditions require c i = i − (cid:88) j = a i j . (11)We refer to this method as classical RK method to distinguish itfrom the Lie group integrators discussed in Sec. 4. To providean order of accuracy p the coe ﬃ cients a i j , b i need to satisfy the order conditions . The order conditions for a classical RKmethod of third-order global accuracy are given in AppendixA. It is convenient to represent the set of coe ﬃ cients a i j , b i , c i as a Butcher tableau , for instance, for a 3-stage method (theﬁrst entry with c = c a c a a b b b (12)The nodes c i in the left column describe the time points at whichthe stages are evaluated, the a i j in the middle give the weights of the right hand side function for each stage, and the bottomrow gives the weights for the ﬁnal stage of the method. If themethod is explicit each stage can only depend on the previousones and therefore the Butcher tableau has a characteristic tri-angular shape.For a 3-stage third-order classical RK method there are fourorder conditions and six independent coe ﬃ cients, thus, thesemethods belong to a two-parameter family. Often, the coef-ﬁcients c , c are chosen as free parameters and the rest areexpressed through them, as given in Eqs. (A.5)–(A.10).In the following the discussion is restricted to autonomousproblems where the right hand side of Eq. (10) does not explic-itly depend on time. Extension to non-autonomous problems istrivial. N-storage classical RK methods

As is clear from Algorithm 1, to compute y t + h at the ﬁnal stepone needs to store k i , i = , . . . , s (the right hand side evalua-tions) from all s stages of the method. It was shown in Ref. [10]2hat a classical RK method may be written in a form where onlythe values from the previous stage are used. Therefore onlytwo quantities need to be stored at all times, independent ofthe number of stages of the method. RK methods with such aproperty are called low-storage methods . A number of di ﬀ erenttypes of low-storage methods have been developed in the liter-ature, e.g. , Refs. [11, 12]. For the later discussion of Lie groupmethods in Sec. 4 we focus on the methods of Ref. [10], whichare also called 2 N -storage methods .Given an s -stage RK method one can express its coe ﬃ cientsthrough another set of coe ﬃ cients A i , B i , i = , . . . , s such that a i j =  A j + a i , j + + B j , j < i − , B j , j = i − , , otherwise , (13) b i =  A i + b i + + B i , i < s , B i , i = s , (14)and for explicit methods necessarily A =

0. A 2 N -storage s -stage explicit classical RK method is given in Algorithm 2. Algorithm 2 N -storage explicit classical s -stage Runge-Kuttamethod y = y t for i = do ∆ y i = A i ∆ y i − + h f ( y i − ) (cid:46) A = y i = y i − + B i ∆ y i end for y t + h = y s For a 3-stage third-order method expressing the original a i j , b i coe ﬃ cients through A i , B i leads to an additional, ﬁfth, ordercondition for a i j , b i that was found in [10]. This means that thecoe ﬃ cients of a 2 N -storage scheme now form a one-parameterfamily. One can express the ﬁfth order condition as an implicitfunction of c and c [10]: c (1 − c ) + c (cid:32) c + c − (cid:33) + (cid:32) − c (cid:33) = . (15)This implicit function is shown in Fig. 1.While almost any point in the plane (except c = c = / ﬃ cient scheme, only the values on the curve correspond toclassical RK methods that can be rewritten in the 2 N -storageformat, Algorithm 2. The plot shown in Fig. 1 ﬁrst appearedin [10] therefore we refer to it as the Williamson curve . Toﬁnd coe ﬃ cients for a 3-stage third-order 2 N -storage schemeone can proceed in the following way: • Pick a value of c in the allowed range. • Solve Eq. (15) for c and pick one of the branches. Unlike other types of low-storage methods ( e.g. , 2 R -, 2 S -, 3 R -, 3 S -, etc.)the 2 N -storage methods have special properties that turned out to be related toLie group integrators [4]. c c W6 W7

Figure 1: The Williamson curve, i.e. the set of values of c and c coe ﬃ -cients for which the 3-stage third-order classical RK schemes can be written inthe 2 N -storage format. The symbols (triangles, box and circle) correspond torational solutions. The blue box and the red circle are the schemes that are dis-cussed in more detail later. W6 and W7 labels indicate the original numberingin Ref. [10]. There is a reﬂection symmetry along the c + c = • Express all a i j , b i coe ﬃ cients in terms of c and c usingEqs. (A.5)–(A.10). • Find A i , B i by inverting the relations given in Eqs. (13),(14).The symbols on the Williamson curve are the points where c and c (and all the other coe ﬃ cients) have rational values.These values are summarized in Table B.2 in Appendix B. Theschemes labeled with the blue box and red circle in the ﬁgureplay special role in our discussion later. We denote them: • RK3W6: c = / c = / • RK3W7: c = / c = / N -storage schemes ﬁrst appeared.For 2 N -storage schemes with more than three stages andorders higher than three there are no analytic solutions avail-able. The coe ﬃ cients can be found by expressing the orderconditions through the coe ﬃ cients A i , B i and solving the re-sulting system of non-linear equations numerically. Multiple2 N -storage schemes have been designed in this way in the lit-erature [13, 14, 15, 16, 17, 18, 19, 20]. In the previous sections we considered integration methodsthat operate with ﬁxed step size. If an estimate of the localerror is available one can adjust the step size during the inte-gration. Methods with such a property are known as variablestep size or adaptive integrators. To construct a variable stepsize method one uses two schemes of di ﬀ erent order simultane-ously. The di ﬀ erence between the two solutions after one step3f integration serves as an estimate of the local error and is usedto adjust the step size.Later in Sec. 5.2 we will consider two variable step sizeschemes where a third-order integrator is used for propagatingthe solution and an embedded second-order integrator is used toconstruct an alternative estimate of the solution. Let d be somemeasure of distance between the solutions (the precise metricused is not important at this point) after the current integrationstep and δ a ﬁxed parameter – local tolerance. After an integra-tion step the step size is adjusted as h → . (cid:114) δ d h . (16)If d > δ then the integration step is redone with the adjusted h .Otherwise the integration proceeds with the adjusted h . Such aprocedure ensures that the local error at every step is boundedby δ .It is important to note that setting a bound on the local errordoes not actually tell one what value of the global error will beachieved. We discuss this point in more detail in Sec. 5.2.

4. Lie group integrators

Consider now a di ﬀ erential equation on a manifold: dYdt = F ( Y ) Y . (17)The results in this section are valid in general for Y taking val-ues on an arbitrary manifold equipped with a group action. Forthe main discussion that follows, Y will represent a gauge linkvariable, which is an SU (3) matrix. Capital letters are used hereto emphasize that the variables do not necessarily commute, un-like in the classical RK case.If one uses a classical RK scheme for solving Eq. (17) nu-merically, an update of the form Y + Const · hF ( Y ) Y willmove Y away from the manifold. One needs to use geomet-ric, or structure-preserving integration schemes that update Y as exp( Const · hF ( Y )) Y . We discuss the two main approachesto constructing such methods next. Let us ﬁrst return to the classical RK Algorithm 1. Onecan modify the step 2 in the following way: Evaluate thesum in the second term, exponentiate it and act on y t , i.e.Y i = exp( h (cid:80) i − j = a i j K j ) Y t , where K j = F ( Y j ). In this case, how-ever, the extra uncanceled terms in the Taylor expansion of thescheme will result in a method whose global order of accuracyis lower than for the classical scheme. This can be cured byadding commutators. A scheme given in Algorithm 3 which isreferred to as the Runge-Kutta-Munthe-Kaas (RKMK) methodwas introduced in Ref. [21]. There, the expansion of the inversederivative of the matrix exponential d exp − U ( V ) is truncated atthe order p − U , V , p ) = p − (cid:88) k = B k k ! ad k U ( V ) , (18) B k are the Bernoulli numbers and the k -th power of the adjointoperator ad U ( V ) is given by an iterated commutator applica-tion: ad U ( V ) = V , (19)ad U ( V ) = [ U , V ] , (20)ad k U ( V ) = ad U (ad k − U ( V )) = [ U , [ U , [ . . . , [ U , V ]]]] . (21)This algorithm results in a Lie group integrator of order p whose coe ﬃ cients are the coe ﬃ cients of a classical RK methodof order p . Algorithm 3 s -stage Runge-Kutta-Munthe-Kaas Lie groupmethod for i = do U i = h (cid:80) i − j = a i j ˜ K j (cid:46) a i , j (cid:62) i = Y i = exp( U i ) Y t K i = F ( Y i ) ˜ K i = dexpinv( U i , K i , p ) end for V = (cid:80) si = b i ˜ K i Y t + h = exp( V ) Y t The advantage of Algorithm 3 is that any classical RKmethod can be turned into a Lie group integrator. The num-ber of commutators can often be reduced as discussed in [22].It was found, however, that schemes that avoid commutatorscan be more stable and provide lower global error at the samecomputational cost. We discuss them next.

The earlier work on manifold integrators [23] was extendedin Ref. [24] to design a class of commutator-free

Lie groupmethods where each stage of the method may include a com-position of several exponentials. A general scheme is given inAlgorithm 4. We use the notation of Ref. [4] instead of the orig-inal notation of Ref. [24]. L i is the number of exponentials usedat stage i , J il is the number of right hand side evaluations K j used in the l -th exponential at stage i , L is the number of expo-nentials at the ﬁnal step, and I l is the number of right hand sideevaluations K i used in the l -th exponential at the ﬁnal step. T represents a “time-ordered” product meaning that an exponen-tial with a lower value of index l is located to the right . Algorithm 4 s -stage commutator-free Lie group method Y = Y t , K = F ( Y ) for i = do Y i = T (cid:110)(cid:81) L i l = exp (cid:16) h (cid:80) J il j = α l ; i j K j (cid:17)(cid:111) Y t K i = F ( Y i ) end for Y t + h = T (cid:110)(cid:81) Ll = exp (cid:16) h (cid:80) I l i = β l ; i K i (cid:17)(cid:111) Y t The coe ﬃ cients α l ; i j , β l ; i are related to the coe ﬃ cients of a4lassical RK method as [24] L i (cid:88) l = α l ; i j = a i j , L (cid:88) l = β l ; i = b i . (22)To better understand the notation of Algorithm 4 we list in Al-gorithm 5 explicit steps of one of the methods of Ref. [24]where s = L = L = J = L = J = L = I = I = Algorithm 5 Y = Y t K = F ( Y ) Y = exp( h α K ) Y t K = F ( Y ) Y = exp( h ( α K + α K )) Y t K = F ( Y ) Y t + h = exp( h ( β K + β K + β K )) exp( h β K ) Y t It was found in Ref. [24] that ﬁxing β = α = / Y at the ﬁnal step and the other coe ﬃ cients forma one-parameter family of solutions. Ref. [24] considered such commutator-free methods thatreuse exponentials. For instance, in Algorithm 5 Y is reused atthe ﬁnal stage so one needs only three exponential evaluationsin total. Recently, Ref. [4] considered designing a commutator-free method where every next stage reuses Y i from the previousstage and contains only one exponential evaluation per stage(inspired by Algorithm 7 of Ref. [2], see below). Such meth-ods form a subclass of methods of Ref. [24] but di ﬀ er from thesolutions found there by how the exponentials are reused. Itturned out that for a 3-stage third-order commutator-free Liegroup method with exponential reuse the additional order con-dition resulting from non-commutativity is the same as the or-der condition for a 2 N -storage 3-stage third-order classical RKmethod, Eq. (15). Thus, it was proven in [4] that all 2 N -storage 3-stage third-order classical RK methods of [10], i.e. all points on the Williamson curve, Fig. 1, are also low-storagethird-order commutator-free Lie group integrators. It was con-jectured in [4] that 2 N -storage classical RK methods of orderhigher than three are also automatically Lie group integratorsof the same order. Numerical evidence was provided in supportof the conjecture. Moreover, for a given set of numerical valuesof coe ﬃ cients A i , B i of a classical 2 N -storage RK method theorder of the Lie group method based on it can be determinedalgorithmically by using B-series [25]. Thus for all such meth-ods that we use here, given the coe ﬃ cients A i , B i of a classical2 N -storage RK method with s stages and global order of ac-curacy p , the procedure listed in Algorithm 6 is a low-storagecommutator-free Lie group method of order p .Let us now turn to the discussion of the integrator ﬁrst in-troduced by L¨uscher in Ref. [2]. In our notation it is givenin Algorithm 7. This scheme belongs to the generic class of Algorithm 6 N -storage s -stage commutator-free Runge-KuttaLie group method Y = Y t for i = do ∆ Y i = A i ∆ Y i − + hF ( Y i − ) (cid:46) A = Y i = exp( B i ∆ Y i ) Y i − end for Y t + h = Y s commutator-free Lie group methods developed in Ref. [24],however, it di ﬀ ers from the classes of solutions found there.Given that the linear combination of K and K is the same atsteps 5 and 7 and the previous stage is reused at steps 3, 5 and7, this integrator has certain reusability property. As far as weare aware, an integrator with the structure and numerical co-e ﬃ cients of Algorithm 7 was not present in the literature onmanifold integrators prior to Ref. [2]. Thus, we believe that thismethod was derived independently. Since its derivation was notpresented in [2], we present our derivation in Appendix C forillustrative purposes and also to document the order conditionsin the form that we were not able to ﬁnd in the existing litera-ture. This scheme provides a link to the recent developments ofRef. [4]. Algorithm 7 Y = Y t K = F ( Y ) Y = exp (cid:16) h K (cid:17) Y K = F ( Y ) Y = exp (cid:16) h (cid:16) K − K (cid:17)(cid:17) Y K = F ( Y ) Y t + h = exp (cid:16) h (cid:16) K − K + K (cid:17)(cid:17) Y It turns out that when Algorithm 7 is rewritten in the 2 N -storage format of Algorithm 6 and the A i , B i coe ﬃ cients areconverted to the coe ﬃ cients of the classical underlying RKscheme, Eqs. (13), (14), the latter are the same as for theRK3W6 classical RK method discussed in Sec. 3.1. It is shownas a blue square on the Williamson curve, Fig. 1. Thus, themethod of Ref. [2] belongs to the class of 2 N -storage classicalRK methods which are automatically Lie group integrators ofthe same order as proven in [4]. We will explore this to ﬁndthe optimal set of coe ﬃ cients for integrating the gradient ﬂowin Sec. 5.1.

5. Numerical results

To explore the properties of di ﬀ erent integrators we usedthree gauge ensembles with the lattice spacing ranging from0 .

15 fm down to 0 .

09 fm listed in Table 1. These ensembleswere generated by the MILC collaboration with the one-loopimproved gauge action [5] and the Highly Improved StaggeredQuark (HISQ) action [26, 27]. The light quark masses weretuned to produce the Goldstone pion mass of about 300 MeV5 able 1: The MILC 2 + + T max .Here T max is dimensionless and approximately equal to ( w phys / a ) . Ensemble N σ × N τ a , fm T max l1648f211b580m013m065m838 ×

48 0 .

15 1 . l2464f211b600m0102m0509m635 ×

64 0 .

12 2 . l3296f211b630m0074m037m440 ×

96 0 .

09 3 . and the strange and charm quark masses are set to the physicalvalues.We integrate the ﬂow for the amount of time T max that isneeded to determine the w -scale according to Eq. (8) with Const = .

3. The values of T max for each ensemble are givenin Table 1. The w -scale for these ensembles was determinedin [29].To ﬁnd the global error for each integration method we needto compare to the exact solution. For this purpose we have alsoimplemented a 13-stage eighth-order RK integrator of Munthe-Kaas type, Algorithm 3, with Prince-Dormand coe ﬃ cients [30]that we refer to as RKMK8. At step size h = − it providesthe result that is exact within the ﬂoating point double precision.For this reason the results from RKMK8 are labeled as “exact.”To evaluate the global error introduced by the integrationmethods several quantities are studied. Let us ﬁrst deﬁne thesquared distance between two SU (3) matrices X , Y : D ( X , Y ) ≡ (cid:88) i , j = | X i j − Y i j | . (23)For a set of ﬂowed gauge ﬁelds the distance from the exact so-lution is deﬁned as : ∆ V ≡ (cid:88) x (cid:88) µ (cid:113) D ( V x , µ ( t = T max ) , V exactx , µ ( t = T max )) . (24)We also calculate the value of the plaquette, rectangle andclover expression averaged over the lattice: P = N σ N τ (cid:88) x (cid:88) µ<ν ReTr( P x , µν ) , (25) R = N σ N τ (cid:88) x (cid:88) µ (cid:44) ν ReTr( R x , µν ) , (26) C = N σ N τ (cid:88) x (cid:88) µ (cid:44) ν ReTr( C x , µν C x , µν ) . (27)Up to a constant prefactor and a shift these quantities providethe three di ﬀ erent discretizations of the action, Eqs. (4)–(6) that Other deﬁnitions are possible, e.g. ∆ V ≡ (cid:115)(cid:88) x (cid:88) µ D ( V x ,µ ( t = T max ) , V exactx ,µ ( t = T max )) . They scale with the step size in the same way as (24). can be used for the w -scale determination in (8). The globalerror is also estimated from the di ﬀ erences: ∆ Z ≡ Z ( t ) − Z exact ( t ) , (28)where Z = P , R or C and t -dependence means that these quan-tities are computed using evolved gauge links V ( t ). We now address the question of what integrator out of thefamily of schemes along the Williamson curve can providethe lowest error for integrating the SU (3) gradient ﬂow. TheWilliamson curve is parametrized with a variable u which isthe distance along the curve from the point (2 / ,

0) to (1 , / u ∈ [0 , ﬃ cient schemes c ( u ), c ( u )and use their coe ﬃ cients for low-storage commutator-free Liegroup integrators in the form of Algorithm 6. We refer to thesemethods in general as LSCFRK3 – low-storage, commutator-free, Runge-Kutta, third-order. We picked such values of c and c that are either rational or given in terms of radicals. Two par-ticular schemes are of interest in the following: LSCFRK3W6(equivalent to the integrator of Ref. [2]) and LSCFRK3W7.These are commutator-free Lie group versions of the classi-cal RK integration schemes RK3W6 and RK3W7 discussed inSec. 3.1.For this part of the calculation we have chosen 11 latticesper each ensemble separated by 500 molecular dynamics timeunits for the ﬁrst two ensembles and 360 time units for the thirdensemble in Table 1. For the ﬁrst two ensembles we ran both theWilson and Symanzik ﬂow, and only the Symanzik ﬂow for thethird ensemble. We ran all 32 LSCFRK3 methods at step sizes h = /

16, 1 /

32, 1 /

64 and 1 /

128 and RKMK8 at h = / uD V W6 W7 ✕

96 sym24 ✕

64 wil24 ✕

64 sym16 ✕

48 wil16 ✕

48 sym

Figure 2: The leading-order coe ﬃ cient D V for the global integration errordeﬁned in Eq. (29) as function of the distance along the Williamson curve u . u = c = / c =

0, and u = c = c = /

3. The arrows labeled “W6” and “W7”represent the LSCFRK3W6 and LSCFRK3W7 schemes discussed in the text.The statistical errors are (much) smaller than the symbol size.

We ﬁrst consider the behavior of the distance metric deﬁnedin (24). For a third-order method the distance is expected to6cale as O ( h ). We ﬁt the distance ∆ V as a function of the stepsize to a polynomial form: (cid:104) ∆ V ( h ) (cid:105) = D V h + D V h + D V h , (29)where (cid:104) . . . (cid:105) represents the ensemble average. In some caseswe omitted the ﬁfth-order term if a reasonable ﬁt resulted fromjust the ﬁrst two terms. The errors on the ﬁt parameters wereestimated with a single elimination jackknife procedure.In Fig. 2 we show the dependence of the leading-order globalerror coe ﬃ cient D V as function of the distance u along theWilliamson curve ( i.e. the RK coe ﬃ cient scheme) for all en-sembles and ﬂows that we analyzed. The values of D V arenormalized in the following way. For the Symanzik ﬂow onthe 16 ×

48 lattice D V ( u ) is divided by 5 × . (This largefactor stems from the fact that our deﬁnition of ∆ V is exten-sive.) For all the other ensembles and ﬂows D V ( u ) is dividedby such a constant that D V ( u =

0) coincides with the one forthe 16 ×

48 lattice Symanzik ﬂow. As one can observe fromFig. 2, the behavior of D V ( u ) is similar for di ﬀ erent ensemblesand di ﬀ erent types of ﬂow. The LSCFRK3W7 Lie group in-tegrator is closer to the minimum of the global error than theLSCFRK3W6 method. Note that since the deﬁnition of ∆ V ismanifestly positive, the leading order coe ﬃ cient D V ( u ) is alsopositive for all u . -0.0001-5x10 -5 -5 u rescaled ∆ P rescaled ∆ R ∆ C ∆ P ∆ R Figure 3: The global integration error in the plaquette (cid:104) ∆ P(cid:105) , rectangle (cid:104) ∆ R(cid:105) and the clover expression (cid:104) ∆ C(cid:105) as function of distance along the Williamsoncurve u . (cid:104) ∆ P(cid:105) and (cid:104) ∆ R(cid:105) match (cid:104) ∆ C(cid:105) after they are multiplied by a constantfactor. The rescaled values of (cid:104) ∆ P(cid:105) and (cid:104) ∆ R(cid:105) are shifted by ± × − to bedistinguishable from (cid:104) ∆ C(cid:105) . Next, we study the action related observables, Eqs. (25)–(27). In Fig. 3 we show the global errors (cid:104) ∆ P(cid:105) , (cid:104) ∆ R(cid:105) and (cid:104) ∆ C(cid:105) ,deﬁned in (28), as function of u evaluated at T max = . h = /

16 for the Wilson ﬂow on the a = .

15 fmensemble. When (cid:104) ∆ P(cid:105) and (cid:104) ∆ R(cid:105) are rescaled by a constantso that they coincide with (cid:104) ∆ C(cid:105) at u =

0, all three quantitiescollapse onto the same curve. In fact, the collapse is so accu-rate that the quantities labeled “rescaled” in the ﬁgure wouldbe completely covered by the clover (cid:104) ∆ C(cid:105) had not we shiftedthem by ± × − . This is expected since at later ﬂow times the gauge ﬁelds are smooth and the di ﬀ erence between di ﬀ er-ent discretizations diminishes. It is remarkable that the collapsehappens already at our coarsest lattice, for the least improvedﬂow at the largest step size, h = /

16. We therefore focussolely on the clover discretization in the following.Similarly to Eq. (29) we ﬁt the global integration error ∆ C asfunction of step size to a polynomial: (cid:104) ∆ C ( h ) (cid:105) = D C h + D C h + D C h . (30)We omit the ﬁfth-order term whenever a reasonable ﬁt is ob-tained with the ﬁrst two terms. -0.1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0 0.2 0.4 0.6 0.8 1 uD C W6 W7 ✕

96 sym24 ✕

64 wil24 ✕

64 sym16 ✕

48 wil16 ✕

48 sym

Figure 4: The leading-order coe ﬃ cient D C for the global integration error de-ﬁned in Eq. (30) as function of the distance along the Williamson curve u . Thearrows labeled “W6” and “W7” represent the LSCFRK3W6 and LSCFRK3W7schemes discussed in the text. The statistical errors are (much) smaller than thesymbol size. In Fig. 4 we plot the dependence of the leading order globalerror coe ﬃ cient D C as function of u for all ensembles and ﬂows,similar to Fig. 2. The data is rescaled such that all values at u = D C for the Symanzik ﬂow on the16 ×

48 lattice. Interestingly, unlike (cid:104) ∆ V ( h ) (cid:105) , (cid:104) ∆ C ( h ) (cid:105) is notnecessarily positive. We observe that while most of the inte-gration schemes approach the exact result from above, there isa region of u where the exact result is approached from below.The LSCFRK3W7 scheme is close to the point where D C = u ∈ [0 . , .

65] is magniﬁed in Fig. 5 for the32 ×

96 ensemble and the Symanzik ﬂow. The fourth-ordercoe ﬃ cient D C ( u ) is also shown. For the LSCFRK3W7 schemeit is also relatively small. Therefore this method provides closeto the lowest error for the action observables that are central forscale setting.In Fig. 6 we directly compare (cid:104) ∆ C ( h ) (cid:105) for LSCFRK3W6,LSCFRK3W7 and LSCFRK3W9 for the 32 ×

96 ensem-ble. The LSCFRK3W7 scheme has the smallest error (cid:104) ∆ C ( h ) (cid:105) , This scheme has irrational coe ﬃ cients and it has the minimal theoreticalbound on the global error using the deﬁnition of [31]. In other words, these are“Ralston coe ﬃ cients” but with taking into account the additional constraint ofthe low-storage method. More details can be found in [10, 4]. u W6 W7 D C D C Figure 5: The leading D C and the next-to-leading order D C coe ﬃ cients forthe global integration error deﬁned in Eq. (30) for the Symanzik ﬂow on 32 ×

96 ensemble. The lines are not ﬁts but are drawn to guide the eye. The D C coe ﬃ cient is divided by a factor of two to better ﬁt in the frame. which due to the competition between the h and h termscrosses zero, as shown in the inset of Fig. 6. -4x10 -7 -2x10 -7 -7 -7 -7 -7 -6

0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 h 〈 ∆ C ( h ) 〉 LSCFRK3W9LSCFRK3W7LSCFRK3W6-4x10 -10 -2x10 -10 -10 -10

0 0.02

Figure 6: The dependence of the global integration error in the clover expres-sion (cid:104) ∆ C ( h ) (cid:105) on the step size h for three LSCFRK3 methods. Lines are ﬁts tothe data, Eq. (30). For the LSCFRK3W7 method the error changes sign, asshown in the inset. To summarize, we expect that the LSCFRK3W7 integrator isclose to optimal for integrating the gradient ﬂow: it has closeto the lowest global error for the gauge ﬁelds, Fig. 2, and itsleading-order error coe ﬃ cient D C is close to 0 for the actionrelated observables, giving much smaller global error, Fig. 6.Moreover, we observe that these properties of LSCFRK3W7are stable against ﬂuctuations within the gauge ensemble, dif-ferent ensembles, di ﬀ erent types of ﬂow and di ﬀ erent dis-cretizations of the observable.In the following we consider only the Symanzik ﬂow andthe clover expression for the observable. The ﬂow is integratedwith LSCFRK3W6 and LSCFRK3W7 on 100 lattices that are -20-18-16-14-12-10 ln | 〈 ∆ C ( h ) 〉 | ✕ -22-20-18-16-14 ✕ RKMK3LSCFRK3W6LSCFRK3W7-24-22-20-18-16-14 -5 -4.5 -4 -3.5 -3 -2.5 ✕ ln h Figure 7: Scaling of the global integration error in the clover observable (cid:104) ∆ C ( h ) (cid:105) with the step size h = / /

64, 1 /

32 and 1 /

16 for the Symanzikﬂow on the three gauge ensembles. The three third-order methods are discussedin the text. The lines are drawn to guide the eye. separated by 50 time units for the ﬁrst two ensembles and 36time units for the third in Table 1. The RKMK8 method with h = − is used to get the exact solution. The error bars arecalculated with respect to 20 jackknife bins. For comparisonwe also implemented a third-order algorithm of Munthe-Kaastype with Ralston coe ﬃ cients, Algorithm 3, that we refer to asRKMK3. The results for the global integration error for theclover expression as function of step size h for the three third-order methods are shown in Fig. 7.It is customary to plot the logarithm of the absolute valueof the error vs the logarithm of the step size since the slopethen is equal to the order of the method. It also allows one todisplay the scaling for di ﬀ erent integrators that would be toosmall on the linear scale. This works well when the error isdominated by the leading order term. As we observe in Figs. 5,6, the LSCFRK3W7 error crosses zero which translates to −∞ on the log-log plot. This explains the non-monotonicity of theLSCFRK3W7 data in Fig. 7. The point with the largest errorbar for the 32 ×

96 lattice is close to the zero crossing and thusthe jackknife propagated error bars in ln | ∆ C| are large there.This is expected when the leading-order coe ﬃ cient is small andthe leading term is comparable with the next-to-leading orderterm for a range of step sizes. For the 16 ×

48 lattice the D C coe ﬃ cient is so close to zero that the error scales almost as h rather than h . At small enough h all LSCFRK3 methods shouldapproach the expected h behavior.In Fig. 7 one can observe that the LSCFRK3W7 methodhas lower global error than the original integrator of Ref. [2],LSCFRK3W6. The RKMK3 method has the largest error, de-spite the fact that its coe ﬃ cients are chosen to be the ones that8ive the lowest theoretical bound on the global error [31].The most expensive part of the calculation is evaluation ofthe right hand side F ( Y ), Eq. (17). All three methods requirethree evaluations and thus the same computational cost. (Weneglect the fact that the RKMK3 method also requires compu-tation of commutators. In a parallel code that operation is localand thus its overhead is unnoticeable compared with the compu-tation of F ( Y ), which requires communication.) We concludethat LSCFRK3W7 is the most beneﬁcial Lie group integratorfor the gradient ﬂow among the third-order explicit RK meth-ods that we studied. Variable step size integrators were used for gradient ﬂow inRefs. [32, 33], and we also explore methods of this type herefor comparison. In Ref. [32] a second-order method was em-bedded into the LSCFRK3W6 scheme. Methods of this typecan be built for all LSCFRK3 schemes so we consider the mostgeneric case. To connect with the form presented in [32] it isconvenient to start with the form where K , K and K are sep-arated, Algorithm 9 in Appendix C. Once the third-order in-tegrator stages are complete and all K i are computed, anotherstage is performed to get a lower order estimate:˜ Y t + h = exp( h ( λ K + λ K + λ K )) Y t . (31)For ˜ Y t + h to be globally second order (and locally third order)the coe ﬃ cients λ i need to satisfy the following conditions: λ + λ + λ = , (32) c λ + c λ = . (33)For any LSCFRK3 method determined by c ( u ), c ( u ) there is aone-parameter family of embedded second-order methods. Thevariable step size scheme is no longer a low-storage methodsince K i need to be stored separately for the stage (31) to beapplied .For the measure of the local error Ref. [32] suggested d = max x ,µ (cid:113) D ( V x , µ ( t + h ) , ˜ V x , µ ( t + h )) . (34)The full method is summarized in Algorithm 8 (the local toler-ance δ is a preset parameter).We take λ as a free parameter (this ﬁxes λ and λ ) and testtwo variable step size methods based on W W ﬃ cients. These methods are referred to as CFRK3W6VSand CFRK3W7VS in the following (VS = variable step size).The tests are performed on a single ×

48 lattice with theSymanzik ﬂow to illustrate a qualitative point. With one exception: For most methods there is a single value of λ wherethe linear combination of K and K can be reused. For instance, for theLSCFRK3W6 method setting λ = − / λ K + λ K = / K − / K ), so one can reuse the same linear combination in the low-order esti-mate of the solution, Eq. (31). And there is an exception to the exception: Areusable embedded second-order scheme does not exist for LSCFRK3W7. Seethe Mathematica script in Appendix D. Algorithm 8 Perform the three stages of Algorithm 9 for each gauge linkto get V x , µ ( t + h ). Store K i , i = , . . . , Compute ˜ V x , µ ( t + h ) using Eq. (31). Compute the maximum distance d with Eq. (34). Set h → . (cid:113) δ d h . If d > δ restart from step 1. ∆ V ( δ =10 -5 ) λ CFRK3W6VSCFRK3W7VS

Figure 8: The global integration error for gauge ﬁelds deﬁned in Eq. (24) asfunction of the parameter λ that distinguishes variable step size methods. TheCFRK3W6VS and CFRK3W7VS methods di ﬀ er in what third-order integratoris used. The global error is evaluated at the same local tolerance δ = − for allmethods. The results are from the ﬂow on a single 16 ×

48 lattice and thereforehave no errorbars.

For a set of local tolerances δ = − , 3 × − , 10 − , 3 × − ,10 − , 3 × − and 10 − CFRK3W6VS and CFRK3W7VS wererun for values of λ ∈ [ − ,

2] separated by 0 .

1. The resultsfor the global error in the gauge ﬁelds ∆ V ( δ ) deﬁned in (24)are shown in Fig. 8 for a ﬁxed value of tolerance δ = − asfunction of the λ parameter. There seems to be some room fortuning λ since there is a signiﬁcant di ﬀ erence in what globalerror is achieved by di ﬀ erent methods. In Fig. 9 the depen-dence of the global error ∆ V ( δ ) on the local tolerance δ isshown for the three CFRK3W6VS methods with λ = −

1, 0and 0 .

7. Note that λ = λ = . λ = − i.e. computational e ﬀ ort) each method needs to reach acertain global error. For this purpose all 62 schemes shown inFig. 8 are plotted against the number of right hand side eval-uations (which is the number of steps times three stages perstep) N rhs in Fig. 10. All schemes collapse on the same lineapart from the region of small number of steps at the begin-ning (large local tolerance) where the subleading corrections toscaling are still large. There is no room for tuning – all meth-9 ∆ V ( δ ) δ Figure 9: The global integration error ∆ V ( δ ) as function of the local tolerance δ for the three third-order variable step size methods CFRK3W6VS( λ ) for λ = −

1, 0 and 0 .

7. The results are from the ﬂow on a single 16 ×

48 latticeand therefore have no errorbars. ods would reach a given global error with the same computa-tional e ﬀ ort independently of what coe ﬃ cients of the embed-ded low-order scheme λ i are used. What is di ﬀ erent for dif-ferent schemes is the value of the local tolerance at which thatglobal error is achieved. For instance, the CFRK3W6VS with λ = . ∆ V =

10 at the local toler-ance δ = . × − , with λ = δ = × − and with λ = − δ = . × − , Fig. 9.This brings us to an important point. It may be tempting, ashappens in some literature, to interpret the local tolerance δ asa universal parameter that can be set once and after that the in-tegrator self-tunes and takes care of the global error. Contraryto that, we observe that there is no apriori way to know whatglobal integration error is achieved for a speciﬁc value of δ ona given lattice ensemble. Therefore, δ should be treated in thesame way as the step size h in ﬁxed step size methods. For acalculation performed at a single value of h there is no way toestimate what global error was achieved, apart from knowingthat it is proportional to h p , where p is the order of the method.One needs to either calculate the exact (or high-precision) solu-tion and compare with it, or perform the calculation at severalstep sizes, study the dependence of the error in the observable ofinterest on the step size and decide what step size is appropriatefor a given lattice ensemble. Similarly, one needs to study thescaling of the error with respect to δ and pick δ based on that.It will certainly be di ﬀ erent for each problem, since the numer-ical value of the global integration error is inﬂuenced by manyfactors such as the integration method, what gauge ensemblesare used, type of the gradient ﬂow, for how long the ﬂow is run,etc.Given that all embedded methods are equivalent with re-spect to the computational cost for a given higher-orderscheme and we do not observe a signiﬁcant di ﬀ erence be-tween CFRK3W6VS and CFRK3W7VS , we only perform This happens also because W6 and W7 schemes are close, e.g.

Fig. 1.

50 100 300 ∆ V N rhs

Figure 10: The global error ∆ V as function of the number of right hand sideevaluations of Eq. (7) for all 62 variable step size schemes shown in Fig. 8.The results are from the ﬂow on a single 16 ×

48 lattice and therefore have noerrorbars. calculations with the original method of Ref. [32] which isCFRK3W6VS( λ = i.e. involves commutators).It requires four right hand side evaluations; however, the lastevaluation at the current step is the ﬁrst on the next, so it canbe stored and reused (so called First Same As Last – FSALproperty). Therefore in practice it requires only three right handside evaluations, which is the same computational e ﬀ ort as forall the other third-order methods explored here. Since we donot ﬁnd this method to be beneﬁcial, we do not list the fullalgorithm. The details can be found in [34, 33]. This method isreferred to as RKMK3BS.As in Sec. 5.1 the Symanzik ﬂow is measured on 100 lat-tices from each ensemble of Table 1. The error bars are es-timated from 20 jackknife bins. In Fig. 11 the logarithm ofthe global error (cid:104) ∆ C ( h ) (cid:105) is shown vs the logarithm of the num-ber of the right hand side evaluations N rhs for the two ﬁxedstep size integrators LSCFRK3W6 and LSCFRK3W7 that arediscussed in Sec. 5.1 and the two variable step size methodsCFRK3W6VS( λ =

0) and RKMK3BS. The origin of the non-monotonic behavior for CFRK3W6VS is similar to the one forLSCFRK3W7 – its global error crosses zero at some value of δ . Interestingly, while both variable step size methods are morebeneﬁcial than LSCFRK3W6 for the 16 ×

48 and 24 ×

64 en-sembles, for the 32 ×

96 ensemble in the regime where all in-tegrators approach the expected cubic scaling, the variable stepsize schemes require comparable or larger computational e ﬀ ortthan LSCFRK3W6. The ﬁxed step size LSCFRK3W7 methodrequires the least computational e ﬀ ort for all three ensembles For a method further away on the Williamson curve the embedded schemeswould still be equivalent themselves, but the computational cost of that integra-tor would be higher than for CFRK3W6VS and CFRK3W7VS. ln | 〈 ∆ C 〉 | ✕ LSCFRK3W6RKMK3BSCFRK3W6VSLSCFRK3W7-22-20-18-16-14 ✕ -24-22-20-18-16 4 4.5 5 5.5 6 6.5 7 ln N rhs ✕ Figure 11: Scaling of the global integration error in the clover observable (cid:104) ∆ C(cid:105) with the number of right hand side evaluations N rhs for the Symanzik ﬂow onthe three gauge ensembles. The variable step size methods RKMK3BS andCFRK3W6VS were run with a set of local tolerances δ = − , 3 × − , 10 − ,3 × − , 10 − , 3 × − , 10 − , 3 × − . The data for the ﬁxed step size methodsLSCFRK3W6 and LSCFRK3W7 is the same as in Fig. 7. The lines are drawnto guide the eye. except the region of large local tolerance (small number of righthand side evaluations). We conclude that for the scale settingapplications there may be no beneﬁt from using variable stepsize integrators, at least for the gauge ensembles that we usedfor this study.Our ﬁndings are in contrast with the studies of Ref. [32]where large computational savings were reported forCFRK3W6VS( λ = ﬀ erent regime. Apartfrom less important di ﬀ erences in the gauge couplings,boundary conditions and lattice volumes, the ﬂow was runsigniﬁcantly longer to achieve a much larger smoothing radiusthan is needed for w -scale setting. Translated for the gaugeensembles used in this study the ﬂow would be run in theranges T max = . − a = .

15 fm, T max = . − a = .

12 fm and T max = . −

32 for the a = .

09 fmensembles (compare with T max in Table 1). Reference [4] opened a possibility of building low-storageLie group integrators from classical 2 N -storage RK methods.There are a number of fourth-order 2 N -storage methods in theliterature [13, 14, 15, 16, 17, 18, 19] that were constructed fordi ﬀ erent applications. Most of them include large number ofstages and may therefore be computationally ine ﬃ cient for in-tegration of the gradient ﬂow in lattice gauge theory.We study two fourth-order methods that have ﬁve (the min-imal possible number) [13] and six [15] stages which we refer to as LSCFRK4CK and LSCFRK4BBB , respectively. Theircoe ﬃ cients were found by solving a system of nonlinear equa-tions resulting from eight order conditions for a classical RKmethod and additional problem-dependent constraints ( e.g. in-creased regions of stability). These coe ﬃ cients in the 2 N -storage format are listed in Appendix B.As in Sec. 5.1, the tests are performed with the Symanzikﬂow, clover observable on 100 lattices from the ensembleslisted in Table 1. The step sizes h = /

16, 1 /

32, 1 /

64 and1 /

128 are used and the exact solution is obtained with RKMK8at h = − . For comparison we also implemented a fourth-order method of Munthe-Kaas type, RKMK4. Such a methodwas employed in [35]. Our implementation is not exactly thesame as there and provides a slightly smaller global error, butour ﬁndings are qualitatively similar. -24-22-20-18-16-14-12 ln | 〈 ∆ C ( h ) 〉 | ✕ -24-22-20-18-16 ✕ RKMK4LSCFRK3W6LSCFRK4CKLSCFRK4BBB-26-24-22-20-18-16 -5 -4.5 -4 -3.5 -3 -2.5 ✕ ln h Figure 12: Scaling of the global integration error in the clover observable (cid:104) ∆ C ( h ) (cid:105) with the step size h = / /

64, 1 /

32 and 1 /

16 for the Symanzikﬂow on the three gauge ensembles. The three fourth-order methods are dis-cussed in the text. The lines are drawn to guide the eye.

The scaling of the error in the clover observableln |(cid:104) ∆ C ( h ) (cid:105)| as function of ln h is shown in Fig. 12 for theLSCFRK3W6, LSCFRK4CK, LSCFRK4BBB and RKMK4methods. RKMK4 has the largest error among the fourth-ordermethods while the lowest global integration error is achievedwith LSCFRK4BBB.The 2 N -storage RK methods with s stages have 2 s − A = We use nomenclature consistent with other methods in this paper. In theoriginal work [15] the classical Runge-Kutta method is called RK46-NL. ln | 〈 ∆ C 〉 | ✕ -24-22-20-18-16 ✕ -26-24-22-20-18-16 4 4.5 5 5.5 6 6.5 7 7.5 8 ✕ ln N rhs LSCFRK3W6LSCFRK4CKLSCFRK4BBBLSCFRK3W7

Figure 13: Scaling of the global integration error in the clover observable (cid:104) ∆ C(cid:105) with the number of right hand side evaluations N rhs for the Symanzik ﬂow onthe three gauge ensembles. All integrators were run at step sizes h = / /

64, 1 /

32 and 1 /

16 except LSCFRK4CK and LSCFRK4BBB where h = / similarly to our discussion in Sec. 5.1. However, due to thecomplexity of the order conditions, no analytic solutions suchas Eq. (15) are available. This makes tuning of that integrator acomplicated task. We note that there are three more coe ﬃ cientschemes reported in Ref. [13], but we found them less e ﬃ cientthan the main scheme that Ref. [13] recommended and whichwas implemented in this study. The integration schemes that we explored di ﬀ er in the num-ber of stages. To compare their computational e ﬃ ciency,Fig. 13 shows the dependence of the global error in the cloverobservable vs the number of right hand side evaluations for thethird-order method of Ref. [2] LSCFRK3W6, the third-orderscheme LSCFRK3W7 that we discussed in Sec. 5.1, and thetwo fourth-order methods LSCFRK4CK and LSCFRK4BBB.Compared with LSCFRK3W6 we ﬁnd that the LSCFRK3W7scheme produces lower global error at all step sizes exploredand is thus more beneﬁcial computationally. The fourth-order LSCFRK4BBB scheme becomes more beneﬁcial thanLSCFRK3W6 at step size of about h = /

16 for the 16 × ×

64 ensembles and about h = /

32 for the 32 × ﬃ cient at h = /

128 ( h for LSCFRK3W7). Forthe 32 ×

96 ensemble the ﬁve-stage LSCFRK4CK method be-comes comparable with LSCFRK4BBB.

6. Conclusion

Based on the connection [4] between the 2 N -storage clas-sical Runge-Kutta methods [10] and commutator-free integra-tors [24] we explored several possible improvements in the ef-ﬁciency of integrating the gradient ﬂow in lattice gauge theory.Among the low-storage three-stage third-order schemes thatare parametrized by the Williamson curve the LSCFRK3W7 isthe most promising. Its global error in the norm of the gaugeﬁeld is close to the minimum, Fig. 2. For the action observ-ables this method is close to the point where the leading or-der error coe ﬃ cient is close to zero. Like the originally pro-posed LSCFRK3W6 method of L¨uscher [2], LSCFRK3W7 hasrational coe ﬃ cients that are given in the 2 N -storage form inAppendix B. The performance of the LSCFRK3W7 method isuniversal across ensembles with di ﬀ erent lattice spacing, typesof ﬂow and types of observable that we explored. For a speciﬁcgradient ﬂow application the reader can always revisit the tun-ing of the third-order scheme similar to our study in Sec. 5.1 by e.g. running a small-scale test on the set of rational values of c and c coe ﬃ cients given in Appendix B that reasonably coverthe Williamson curve. For the reader’s convenience we alsoprovide a listing of the Mathematica script that calculates theLSCFRK3 method coe ﬃ cients in various formats from given c and c in Appendix D.Our studies of the third-order variable step size methods inSec. 5.2 indicate that one needs to exercise caution in interpret-ing the local tolerance δ parameter. Its relation to the globalintegration error is not a priori known, in the same way as ithappens with the step size h for ﬁxed step size methods. Thus,in both situations one needs to study the scaling of the globalerror with the control parameter, step size h or local tolerance δ , and tune it based on that for each speciﬁc case. For w -scalesetting applications we ﬁnd that it is still computationally moree ﬃ cient to use the ﬁxed step size LSCFRK3W7 method ratherthan the third-order variable step size schemes. For applica-tions where the third-order variable step size methods may bebeneﬁcial we also include the coe ﬃ cient scheme that allowsone to reuse the second stage of the third-order integrator in theembedded second-order integrator in the Mathematica script inAppendix D.There are two low-storage fourth-order methods studied inSec. 5.3 that may be well-suited for gradient ﬂow applications.We ﬁnd that the LSCFRK4BBB method is the most compu-tationally e ﬃ cient one, although the ﬁve-stage LSCFRK4CKmethod becomes comparable at ﬁner ensembles. For the gaugeensembles that we studied we conclude that if one needs to runthe LSCFRK3W6 integrator at time steps lower that 1 /

32, itis more beneﬁcial to switch to LSCFRK4BBB. A fourth-orderRKMK4 method was used in Ref. [35] to provide a conserva-tive estimate of the integration error for observables related tothe topology of gauge ﬁelds. We believe that LSCFRK4BBBprovides a better alternative to RKMK4, see Fig. 12. For thethree gauge ensembles used here the LSCFRK4BBB integra-tor is stable at the largest step size we tried, h = /

8. Theaverage integration error at this step size for the clover observ-able is (cid:104) ∆ C ( h = / (cid:105) = − . × − for the a = .

15 fm,12 ∆ C ( h = / (cid:105) = − . × − for the a = .

12 fm and (cid:104) ∆ C ( h = / (cid:105) = − . × − for the a = .

09 fm ensemble.Based on our ﬁndings we recommend LSCFRK3W7,LSCFRK4BBB and LSCFRK4CK for integrating the gradientﬂow. Exact speedups depend on the accuracy required for theﬂow observables. For the ensembles we used LSCFRK3W7 isabout twice as e ﬃ cient as LSCFRK3W6 for the w -scale set-ting. Since these methods are all written in the 2 N -storage for-mat, it is very easy to implement di ﬀ erent integrators in theexisting code. For instance, in the MILC code [36] the mainloop over the stages in Algorithm 6 is the following : for( i=0; i

A 2N , B 2N – the values of the two arrays of size

N stages that store the A i , B i coe ﬃ cients in the 2 N -storage format. Thecode in the integrate RK 2N one stage function performsone stage of Algorithm 6 and is typically already present in thelattice codes that implemented Algorithm 7.To summarize, we presented several Lie group methods thatmay provide better computational e ﬃ ciency for integrating thegradient ﬂow in lattice gauge theory. Given their low-storageproperties, they are easy to implement in the existing codes.For the reader interested in exploring the properties and imple-mentation of the low-storage Lie group integrators in a simplersetting we point out that a simple Matlab script for integratingthe equation of motion of a rotating free rigid body is includedin the Appendix of Ref. [4]. Acknowledgements.

We thank the MILC collaboration forsharing the gauge conﬁgurations, Oliver Witzel for an indepen-dent test of the ﬂow observables, Oswald Knoth for checkingthe order conditions for the low-storage Lie group integratorsused here and Claude Bernard, Steven Gottlieb and JohannesWeber for careful reading and comments on the manuscript.This work was in part supported by the U.S. National ScienceFoundation under award PHY-1812332. The minus sign in front of the second argument is to match the historicconvention in the MILC code on how the right hand side of Eq. (7) is computed.Also, at the time of writing the code for all integrators used in this paper islocated in the feature/wilson flow 2 branch.

Appendix A. Some order conditions for classical Runge-Kutta methods

The coe ﬃ cients of a third-order explicit classical Runge-Kutta method satisfy the following order conditions: (cid:88) i b i = , (A.1) (cid:88) i b i c i = , (A.2) (cid:88) i b i c i = , (A.3) (cid:88) i , j b i a i j c j = . (A.4)A 3-stage method has six independent parameters. Picking c and c as free parameters one gets the most generic branch ofsolutions c (cid:44) (cid:44) c (cid:44) c (cid:44) / b = c − c ( c − c ) , (A.5) b = − c c ( c − c ) , (A.6) a = c ( c − c ) c (2 − c ) , (A.7) b = − b − b , (A.8) a = c , (A.9) a = c − a . (A.10) Appendix B. Coe ﬃ cients for several 2 N -storage third- andfourth-order classical RK methods For future reference and possible integrator tuning we listseveral rational coe ﬃ cient schemes in Table B.2.The coe ﬃ cients of the RK3W6 and RK3W7 schemes in the2 N -storage format are listed in Table B.3 and of the RK4CKand RK4BBB in Table B.4. Note that the RK4CK coe ﬃ cientswere found numerically and then close rational expressionswere found that provide 26 digits of accuracy. For the RK4BBBmethod the coe ﬃ cients are given with 12 digits of accuracyin [15] therefore one cannot expect the error to be below 10 − .This bound is however well below the typical precision neededin gradient ﬂow applications. Appendix C. Derivation of the Lie group integrator ofRef. [2]

It is illustrative to discuss the properties of the integrator pro-vided by L¨uscher in Ref. [2]. As this coe ﬃ cient scheme was notpresent in the literature on Lie group methods at the time, webelieve that the method was derived independently. Let us startwith a generic algorithm that reuses the function value from theprevious stage, Algorithm 9, and contains six yet undeterminedparameters.If Y and Y are substituted in terms of Y , one can recognizethat this scheme belongs to the class of commutator-free Lie13 able B.2: Three-stage third-order 2 N -storage schemes with rational coe ﬃ -cients. The original numbering of Ref. [10] is listed in the ﬁrst column. Theadditional schemes that we found are marked with the asterisk. There are manymore than we show here, but as the numerator and denominator grow largerthey are not convenient to use. Numbering [10] c c / /

12 2 /

15* 823 / / / /

12* 1418 / / / / / /

37 1 / /

4* 823 / / / / /

12 3 /

4* 439 /

592 1064 / /

15 5 / / /

314 1 1 / Table B.3: The coe ﬃ cients of the two 2 N -storage third-order schemes of [10]that are the basis for the LSCFRK3W6 and LSCFRK3W7 methods discussedin Sec. 5.1. Coe ﬃ cient RK3W6 RK3W7 A A − / − / A − / − / B / / B / / B / / Table B.4: The coe ﬃ cients of the two 2 N -storage fourth-order schemes of [13]and [15] that are the basis for the LSCFRK4CK and LSCFRK4BBB methodsdiscussed in Sec. 5.3. Coe ﬃ cient RK4CK A A − / A − / A − / A − / B / B / B / B / B / ﬃ cient RK4BBB A A − . A − . A − . A − . A − . B . B . B . B . B . B . lgorithm 9 Y = Y t K = F ( Y ) Y = exp ( h α K ) Y K = F ( Y ) Y = exp ( h ( α K + α K )) Y K = F ( Y ) Y t + h = exp ( h ( β K + β K + β K )) Y or Y t + h = exp ( h ( β K + c ( α K + α K ))) Y (cid:46) See text.group integrators explored in Ref. [24]. However, it does notbelong to the classes of solutions found there.We would like to ﬁnd a solution of the order conditions thatallows for a third-order global accuracy method. For a classicalRK method there are four order conditions, thus, we need atleast four parameters to satisfy them. For a Lie group integratorof this type we need at least ﬁve parameters. This can be seenfrom the following argument. If we take, for instance, a third-order Crouch-Grossman method [23] (which is a commutator-free Lie group integrator with a more restricted structure thanthe methods of [24]), there is a non-classical order conditionarising from non-commutativity, so there are ﬁve in total. Ifwe follow the RKMK route [37, 21], one needs, at least, onecommutator for a third-order method. Thus, if we would liketo avoid commutators, we need at least one more coe ﬃ cient totune to cancel the e ﬀ ect of the commutator. Overall, it may bepossible to construct a scheme with ﬁve rather than six di ﬀ erentparameters. This can be beneﬁcial to reduce the complexity ofthe order conditions, and also to reuse storage. We can trade β and β for a single coe ﬃ cient c by requiring: β = c α , β = c α . (C.1)This way one can store the combination α K + α K , ratherthan K and K separately, overwriting K with this combina-tion.As is done for classical RK methods, by Taylor expandingthe numerical scheme and comparing with the expansion of theexact solution one arrives at the following order conditions: β + (1 + c )( α + α ) + α = , (C.2) β ( α + α + α ) + (1 + c ) α α = , (C.3) β ( α + α + α ) + (1 + c ) α α = , (C.4) β α α = , (C.5) 32 (cid:104) β ( α + α + α ) + (1 + c ) α α (cid:105) + β ( β + c ( α + α ))( α + α + α ) +

12 [ c ( β + c ( α + α )) + α + α ] α α + c ( α + α ) α α = , (C.6)12 β ( β + c ( α + α ))( α + α + α ) +

12 [ c ( β + c ( α + α )) + α + α ] α α + ( β + c ( α + α )) α α = . (C.7)Eqs. (C.2)–(C.5) are equivalent to the classical order conditionsfor a third-order scheme, expressed in terms of the coe ﬃ cients α i j , β i (their relation to the classical coe ﬃ cients will becomeclear in a moment). Eqs. (C.6), (C.7) appear when Y is a vectoror a matrix and terms F (cid:48) F and FF (cid:48) F in the Taylor expansioncan no longer be combined. By using the four classical orderconditions (C.2)–(C.5) one ﬁnds that the two conditions (C.6),(C.7) are not independent and can be condensed into a singlecondition:12 (cid:2) β + c ( α + α ) (cid:3) + (1 + c )( α + α ) α α = . (C.8)Now there are ﬁve unknowns and ﬁve (non-linear) equations.As is obvious from Eqs. (C.2)–(C.5), (C.8) this system canbecome signiﬁcantly simpler if 1 + c =

0, so we can try c = − α + α + α = /

3, substituting that into (C.3) we immedi-ately get β = /

4, from (C.2) α = /

4, from (C.5) α = / α = − /

36. We can check that Eq. (C.8) isalso satisﬁed. These are nothing else but the coe ﬃ cients of thescheme of Ref. [2], Algorithm 7.However, as the reader can verify, there are solutions forother values of c . This is possible because, as one can show, inthe form involving the coe ﬃ cient c , Eq. (C.1), the non-classicalconstraint (C.8) is a linear combination of (C.2)–(C.4). Had wekept β and β as independent coe ﬃ cients this would not hap-pen. We would still end up with a one-parameter family of so-lutions (six coe ﬃ cients with ﬁve constraints), but (C.8) wouldbe linearly independent.As is now clear from the discussion in Sec. 4.3, the integra-tor of [2] is, in fact, a low-storage commutator-free Lie groupintegrator that belongs to the family of schemes based on theclassical 2 N -storage methods of Ref. [10]. See [4] for detaileddiscussion. We call this scheme LSCFRK3W6 in Sec. 5.1. Itsset of classical RK coe ﬃ cients is a = / a = − / a = / b = / b = b = / α ’s and β ’s we started with as α = a , (C.9) α = a − a , (C.10) α = a , (C.11) β = b − a , (C.12) β = b − a , (C.13) β = b . (C.14)The coe ﬃ cients in the 2 N -storage format, Algorithm 6, aregiven in the second column of Table B.3. Appendix D. Mathematica script

A Wolfram Mathematica script (tested with version 11) thatcalculates the coe ﬃ cients for the 2 N -storage explicit three-stage third-order Runge-Kutta methods from provided c and c coe ﬃ cients is given below. The reader can simply copy andpaste it into an empty Mathematica notebook (the formattingwill most likely be lost). (* Note: enclosing parentheses are neededso that Abort[] function could actually abortthe execution of the cell *)( (* <---- do not remove *)(* SET c2 AND c3 HERE e.g. from Table B.2 *)c2 := 1/4;c3 := 2/3;(* check the singular point *)If[c2 == 1/3 && c3 == 1/3,Print["No RK scheme with c2=c3=1/3 exists"];Abort[]];(* check if low-storage *)If[c3^2*(1 - c2) + c3*(c2^2 + 1/2*c2 - 1)+ (1/3 - 1/2*c2) != 0,Print["c2,c3 -- not a low-storage scheme"];Abort[]];(* check if limiting case c2=2/3,c3=0or c2=c3=2/3 is hit *)If[c2 == 2/3 && c3 == 0,b3 := -1/3; b2 := 3/4; b1 := 1/4 - b3;a32 := 1/4/b3; a31 := -a32; a21 := 2/3,If[c2 == 2/3 && c3 == 2/3,b3 := 1/3; b2 := 3/4 - b3; b1 := 1/4;a32 := 1/4/b3; a31 := 2/3 - a32; a21 := 2/3,b2 := (3*c3 - 2)/6/c2/(c3 - c2);b3 := (2 - 3*c2)/6/c3/(c3 - c2);a32 := c3*(c3 - c2)/c2/(2 - 3*c2);b1 := 1 - b2 - b3; a31 = c3 - a32; a21 := c2]];Print["Coefficients in classical RK form:"];Print["a21=", a21]; Print["a31=", a31];Print["a32=", a32];Print["b1=", b1];Print["b2=", b2];Print["b3=", b3];alpha21 := a21;alpha31 := a31 - a21;alpha32 := a32;beta3 := b3;c := (b1 - a31)/(a31 - a21);Print["Coefficients in the form of"];Print["Luescher, 1006.4518"];Print["with reusability condition"];Print["beta1=c*alpha31, beta2=c*alpha32:"];Print["alpha21=", alpha21];Print["alpha31=", alpha31];Print["alpha32=", alpha32];Print["beta3=", beta3];Print["c=", c];A1 := 0;B3 := b3;B2 := a32;A3 := (b2 - B2)/b3;B1 := a21;If[b2 == 0,A2 := (a31 - a21)/a32, A2 := (b1 - B1)/b2];Print["Low-storage form of Williamson:"];Print["A1=", A1];Print["A2=", A2];Print["A3=", A3];Print["B1=", B1];Print["B2=", B2];Print["B3=", B3];Print["Variable step size:"];Print["Second-order coefficients"];Ds := c2*alpha32 - c3*(c3 - c2);If[Ds == 0,Print["No reusable embedded scheme"];Print["Pick lambda3=0"];l3 := 0;l2 := 1/2/c2;l1 := 1 - l2;Print["lambda1=", l1];Print["lambda2=", l2];Print["lambda3=", l3],l2 := alpha32*(1/2 - c3)/Ds;l3 := 1 - (c3 - c2)*(1/2 - c3)/Ds;l1 := 1 - l2 - l3;q := l2/a32;Print["with reuse of third-order second stage"];Print["lambda2=q*alpha32, lambda1=q*alpha31"];Print["lambda3=", l3]; rint["q=", q]];) (* <---- do not remove *)(* end of script *) References [1] K. G. Wilson, Conﬁnement of Quarks, Phys. Rev. D10 (1974) 2445–2459. doi:10.1103/PhysRevD.10.2445 .[2] M. Luescher, Properties and uses of the Wilson ﬂow in lattice QCD, JHEP08 (2010) 071, [Erratum: JHEP03,092(2014)]. arXiv:1006.4518 , doi:10.1007/JHEP08(2010)071,10.1007/JHEP03(2014)092 .[3] E. Hairer, C. Lubich, G. Wanner, Geometric Numerical Integration:Structure-Preserving Algorithms for Ordinary Di ﬀ erential Equations; 2nded., Springer, Dordrecht, 2006. doi:10.1007/3-540-30666-8 .URL https://cds.cern.ch/record/1250576 [4] A. Bazavov, Commutator-free Lie group methods with minimum stor-age requirements and reuse of exponentials, arXiv e-prints (2020)arXiv:2007.04225 arXiv:2007.04225 .[5] M. Luscher, P. Weisz, Computation of the Action for On-Shell ImprovedLattice Gauge Theories at Weak Coupling, Phys. Lett. 158B (1985) 250–254. doi:10.1016/0370-2693(85)90966-9 .[6] S. Borsanyi, et al., High-precision scale setting in lattice QCD, JHEP 09(2012) 010. arXiv:1203.4469 , doi:10.1007/JHEP09(2012)010 .[7] Z. Fodor, K. Holland, J. Kuti, S. Mondal, D. Nogradi, C. H. Wong, Thelattice gradient ﬂow at tree-level and its improvement, JHEP 09 (2014)018. arXiv:1406.0827 , doi:10.1007/JHEP09(2014)018 .[8] J. Butcher, Numerical Methods for Ordinary Di ﬀ erential Equations, 3rdEdition, Wiley, 2016.[9] E. Hairer, S. Nørsett, G. Wanner, Solving Ordinary Di ﬀ erential EquationsI Nonsti ﬀ problems, 2nd Edition, Springer, Berlin, 2000.[10] J. Williamson, Low-storage Runge-Kutta schemes, Journal ofComputational Physics 35 (1) (1980) 48 – 56. doi:https://doi.org/10.1016/0021-9991(80)90033-9 .URL [11] C. A. Kennedy, M. H. Carpenter, R. Lewis, Low-storage, explicitRunge-Kutta schemes for the compressible Navier-Stokes equa-tions, Applied Numerical Mathematics 35 (3) (2000) 177 – 219. doi:https://doi.org/10.1016/S0168-9274(99)00141-5 .URL [12] D. I. Ketcheson, Runge-Kutta methods with minimum storage imple-mentations, Journal of Computational Physics 229 (5) (2010) 1763 –1773. doi:https://doi.org/10.1016/j.jcp.2009.11.006 .URL [13] M. Carpenter, C. Kennedy, Fourth-order 2N-storage Runge-Kuttaschemes, Tech. Rep. NASA-TM-109112, NASA (1994).[14] M. Bernardini, S. Pirozzoli, A general strategy for the optimiza-tion of Runge-Kutta schemes for wave propagation phenomena,Journal of Computational Physics 228 (11) (2009) 4182 – 4199. doi:https://doi.org/10.1016/j.jcp.2009.02.032 .URL [15] J. Berland, C. Bogey, C. Bailly, Low-dissipation and low-dispersionfourth-order Runge-Kutta algorithm, Computers and Fluids 35 (10)(2006) 1459 – 1463. doi:https://doi.org/10.1016/j.compfluid.2005.04.003 .URL [16] T. Toulorge, W. Desmet, Optimal Runge-Kutta schemes for discon-tinuous Galerkin space discretizations applied to wave propagationproblems, Journal of Computational Physics 231 (4) (2012) 2067 – 2091. doi:https://doi.org/10.1016/j.jcp.2011.11.024 .URL [17] D. Stanescu, W. Habashi, 2N-storage low dissipation and dis-persion Runge-Kutta schemes for computational acoustics,Journal of Computational Physics 143 (2) (1998) 674 – 681. doi:https://doi.org/10.1006/jcph.1998.5986 . URL [18] V. Allampalli, R. Hixon, M. Nallasamy, S. D. Sawyer, High-accuracylarge-step explicit Runge-Kutta (HALE-RK) schemes for computationalaeroacoustics, Journal of Computational Physics 228 (10) (2009) 3837 –3850. doi:https://doi.org/10.1016/j.jcp.2009.02.015 .URL [19] J. Niegemann, R. Diehl, K. Busch, E ﬃ cient low-storage Runge-Kutta schemes with optimized stability regions, Journal of Com-putational Physics 231 (2) (2012) 364 – 372. doi:https://doi.org/10.1016/j.jcp.2011.09.003 .URL [20] Y. an Yan, Low-storage Runge-Kutta method for simulating time-dependent quantum dynamics, Chinese Journal of Chemical Physics30 (3) (2017) 277 – 286.[21] H. Munthe-Kaas, High order Runge-Kutta methods on man-ifolds, Applied Numerical Mathematics 29 (1) (1999) 115– 127, proceedings of the NSF / CBMS Regional Conferenceon Numerical Analysis of Hamiltonian Di ﬀ erential Equations. doi:https://doi.org/10.1016/S0168-9274(98)00030-0 .URL [22] H. Z. Munthe-Kaas, B. Owren, Computations in a free Lie algebra, Philo-sophical Transactions of the Royal Society of London. Series A: Mathe-matical, Physical and Engineering Sciences 357 (1999) 957 – 981.[23] P. E. Crouch, R. Grossman, Numerical integration of ordinary di ﬀ erentialequations on manifolds, Journal of Nonlinear Science 3 (1) (1993) 1–33. doi:10.1007/BF02429858 .URL https://doi.org/10.1007/BF02429858 [24] E. Celledoni, A. Marthinsen, B. Owren, Commutator-free Lie groupmethods, Future Generation Computer Systems 19 (3) (2003)341 – 352, special Issue on Geometric Numerical Algorithms. doi:https://doi.org/10.1016/S0167-739X(02)00161-9 .URL [25] O. Knoth, B-Series (2020).URL https://github.com/OsKnoth/B-Series [26] E. Follana, Q. Mason, C. Davies, K. Hornbostel, G. P. Lepage,J. Shigemitsu, H. Trottier, K. Wong, Highly improved staggered quarkson the lattice, with applications to charm physics, Phys. Rev. D75 (2007)054502. arXiv:hep-lat/0610092 , doi:10.1103/PhysRevD.75.054502 .[27] A. Bazavov, et al., Scaling studies of QCD with the dynamical HISQaction, Phys. Rev. D82 (2010) 074501. arXiv:1004.0342 , doi:10.1103/PhysRevD.82.074501 .[28] A. Bazavov, et al., Lattice QCD Ensembles with Four Flavors of HighlyImproved Staggered Quarks, Phys. Rev. D87 (5) (2013) 054505. arXiv:1212.4768 , doi:10.1103/PhysRevD.87.054505 .[29] A. Bazavov, et al., Gradient ﬂow and scale setting on MILC HISQ en-sembles, Phys. Rev. D 93 (9) (2016) 094510. arXiv:1503.02769 , doi:10.1103/PhysRevD.93.094510 .[30] P. Prince, J. Dormand, High order embedded runge-kutta formulae,Journal of Computational and Applied Mathematics 7 (1) (1981) 67 – 75. doi:https://doi.org/10.1016/0771-050X(81)90010-3 .URL [31] A. Ralston, Runge-Kutta methods with minimum error bounds, Math.Comp. 16 (1962) 431 – 437. doi:https://doi.org/10.1090/S0025-5718-1962-0150954-0 .URL [32] P. Fritzsch, A. Ramos, The gradient ﬂow coupling in the SchroedingerFunctional, JHEP 10 (2013) 008. arXiv:1301.4388 , doi:10.1007/JHEP10(2013)008 .[33] M. Wandelt, M. Guenther, E ﬃ cient numerical simulation of the wilsonﬂow in lattice qcd, in: G. Russo, V. Capasso, G. Nicosia, V. Romano(Eds.), Progress in Industrial Mathematics at ECMI 2014, Springer Inter-national Publishing, Cham, 2016, pp. 1065–1071.[34] P. Bogacki, L. Shampine, A 3(2) pair of runge - kutta for- ulas, Applied Mathematics Letters 2 (4) (1989) 321 – 325. doi:https://doi.org/10.1016/0893-9659(89)90079-7 .URL [35] M. Ce, C. Consonni, G. P. Engel, L. Giusti, Non-Gaussianities in thetopological charge distribution of the SU(3) Yang–Mills theory, Phys.Rev. D92 (7) (2015) 074502. arXiv:1506.06052 , doi:10.1103/PhysRevD.92.074502 .[36] MILC Collaboration, MILC collaboration code for lattice QCD calcula-tions (Dec. 2020).URL https://github.com/milc-qcd/milc_qcd [37] H. Munthe-Kaas, Runge-Kutta methods on Lie groups, BIT NumericalMathematics 38 (1) (1998) 92–111. doi:10.1007/BF02510919 .URL https://doi.org/10.1007/BF02510919https://doi.org/10.1007/BF02510919