Optimal Residuals and the Dahlquist Test Problem
Numerical Algorithms manuscript No. (will be inserted by the editor)
Robert M. Corless · C. Yalçın Kaya · Robert H. C. Moir

Received: date / Accepted: date
Abstract
We show how to compute the optimal relative backward error for the numerical solution of the Dahlquist test problem by one-step methods. This is an example of a general approach that uses results from optimal control theory to compute optimal residuals, but elementary methods can also be used here because the problem is so simple. This analysis produces new insight into the numerical solution of stiff problems.
Keywords stiff IVP · backward error · residual · optimal control

Mathematics Subject Classification (2010) · · ·

(Affiliations: The Rotman Institute of Philosophy, The Ontario Research Center for Computer Algebra, and The School of Mathematical and Statistical Sciences, The University of Western Ontario; School of Information Technology and Mathematical Sciences, University of South Australia.)

The study of stiff differential equations and their efficient numerical solution is by now a mature field. There are several, perhaps many, efficient practical methods with freely available high-quality implementations. The literature on the theory of such methods is extensive. Surprisingly, it is not yet complete: for instance, see the survey [24], which has the intriguing title Stiffness 1952–2012: Sixty years in search of a definition. That paper re-examines the fundamentals, thoroughly surveys the literature, and proposes a new stiffness indicator that its authors claim is useful both a priori for indicating stiffness and a posteriori for indicating the varying regions of the solution where stiffness is important.

This paper takes a different approach, that of optimal residuals, i.e., backward error, and uses it on the Dahlquist test problem to generate some new observations about this, the simplest of all stiff problems. Indeed, [24] calls this problem “simplistic,” and with good reason, but surprisingly it still has things to teach us.

Trying to study stiff problems from the point of view of backward error analysis is itself not new. For instance, there is the PhD thesis of W. L. Seward and the paper [8]. But there is an intrinsic dissonance: a stiff problem has the feature that errors are (often sharply) damped as the integration proceeds forward in time, and thus it is not obvious why one might prefer to look at backward error when, if the right method is used, small forward error happens more or less automatically.

We claim that backward error, in particular optimal residuals, which we explain below, really can be useful. One way to see the usefulness of this type of analysis is in contrast to the classical stability analysis on the Dahlquist test problem, $\dot{y} = \lambda y$, $\lambda \in \mathbb{C}$.
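Before recalling that analysis, here is a minimal numerical illustration (our own sketch; the function name and parameter values are not from the paper) of the stability phenomenon on this test problem.

```python
# A minimal sketch of what the classical analysis captures: explicit Euler on
# the Dahlquist test problem y' = lam*y, y(0) = 1.  Each step multiplies y by
# (1 + mu), where mu = lam*h, so the computed solution decays only when
# |1 + mu| < 1.
def euler_final_value(lam, h, t_end):
    """Fixed-step explicit Euler for y' = lam*y, y(0) = 1, up to t_end."""
    y = 1.0
    for _ in range(round(t_end / h)):
        y += h * lam * y               # y_{n+1} = (1 + mu) * y_n
    return y

lam = -100.0                                  # stiff decay; exact solution -> 0
stable = euler_final_value(lam, 0.01, 1.0)    # mu = -1.0: |1 + mu| = 0, decays
unstable = euler_final_value(lam, 0.03, 1.2)  # mu = -3.0: |1 + mu| = 2, blows up
```

Here stability, not accuracy, dictates the admissible step size $h$; that is the hallmark of stiffness which the classical theory quantifies.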
In the classical stability analysis there is an emphasis on A-stability, with an understanding that, at least locally where linear stability assessments are valid, a basic criterion for stability of a method is that the numerical solution decays where the actual solution does. This condition is of particular concern for stiff problems, since it entails that the method will not be subject to stability restrictions on account of eigenvalues with large negative real part.

The classical stability analysis really does explain most of the behaviour of implicit methods on stiff problems, and much insight has been gained thereby. The present analysis is a refinement only, one that offers the possibility of explaining some extra considerations which are “well known” to practitioners, namely that, under certain circumstances, a stiff method may well be stable but not as accurate as one might wish. With the optimal residual approach a rather different set of criteria emerges, which implies a de-emphasis of the criterion of A-stability, supplemented by a consideration of those regions of the problem space in which a given method can reproduce the dynamics of the reference problem accurately; such regions often extend comfortably into the right half of the complex plane, even for explicit methods.

There are also deeper reasons why backward error analysis is useful for stiff problems, as a result of the fact that a small backward error entails a small perturbation of the dynamics of the problem. In this regard, one place where optimal residuals are especially useful is in the solution of systems which have nontrivial attracting sets: the problems can be stiff because decay to the attracting set can be very strong, but small backward error is important too in order to get the dynamics right on the attracting set, which might in fact be chaotic.
Of course, for a chaotic problem, small forward error is not possible at all (at least, not for very long), but as explained for instance in [3], small backward error can be perfectly satisfactory as an explanation of the success of a numerical method on a chaotic problem. The question of what good a numerical method is, even one providing a small backward error, for computation of an attracting set of a chaotic system, deserves a repetition of the answer given in that reference. Every numerical analyst knows that small forward error requires exponentially accurate initial conditions and exponentially accurate integration. One way to explain a successful computation is if some form of shadowing is invoked. That is, the forward accuracy of the computed trajectory is explained by its being “shadowed” by an exact solution of the reference problem, typically from some nearby initial condition. This is a form of backward error analysis [20]. Computationally verifying that shadowing has occurred is expensive, however, and while shadowing is generic, there are no a priori guarantees that shadowing will occur or has occurred.

A cheaper and simpler way to explain the success of a numerical method on a chaotic problem is to verify that the residual (also called the defect, deviation, or slope error) is small. Note also the utility of interpreting ordinary numerical errors as a modelling error in many circumstances. If the model equations are written

$\dot{y} = f(t, y)$

after having been derived for some physical context, universally by neglecting minor effects, and our numerical solution gives us $z(t)$ exactly satisfying

$\dot{z}(t) = f(t, z(t)) + \Delta(t),$

or perhaps

$\dot{z}(t) = (I + \delta(t)) f(t, z(t)),$

and $\Delta$ or $\delta$ are small compared to the neglected terms, then as Nick Higham puts it, “for all we know, we already have the exact solution,” for error has been introduced into the dynamics by modelling assumptions.

We also emphasize that there must be some feature of this system that is robust under perturbations, or else even the reference solution of $\dot{y} = f(t, y)$ would be useless in the face of real-life perturbations. The existence of such a feature for a given system was termed “well-enough conditioning” in [2]. Nearly all models used have this property, even chaotic ones. For instance, the dimension of the Lorenz attractor is quite robust under forcings of this type: the attracting sets of

$\dot{y}_1 = \sigma (y_2 - y_1) + \delta_1(t)\, y_1,$
$\dot{y}_2 = y_1 (\rho - y_3) - y_2 + \delta_2(t)\, y_2,$
$\dot{y}_3 = y_1 y_2 - \beta y_3 + \delta_3(t)\, y_3,$

are remarkably close to one another even for quite large $\Delta(t) = (\delta_1(t), \delta_2(t), \delta_3(t))^T$ [2].

So backward error may be important in explaining the success or failure of numerical methods for chaotic systems, which can be stiff. More than this, however, we believe our approach to stability analysis on the Dahlquist test problem may lead to a refinement of the classical explanation of the practical success of various numerical methods.

We now consider backward error and the general context of optimal residuals briefly, before moving on to an elementary analysis of the Dahlquist test problem.

Given an interpolant, $z(t)$, to the numerical skeleton $(t_k, y_k)$ of our computed solution to the ODE $\dot{y} = f(t, y)$, we define the residual $\Delta(t)$ as

$\Delta(t) := \dot{z}(t) - f(t, z(t)).$ (1)

As previously stated, this is sometimes called the defect. In one sense, this is a kind of backward error, because the computed $z(t)$ can be interpreted as the exact solution of the perturbed equation

$\dot{y}(t) = f(t, y(t)) + \Delta(t).$
(2)

Note that the residual as defined here depends on just which interpolant $z(t)$ is used. To be compatible with the accuracy of the numerical solutions they interpolate, these interpolants should be $O(h^p)$ accurate, but sometimes are not. For example, in MATLAB, ode45 uses a fifth-order Runge–Kutta pair, but has only a fourth-order interpolant: so $z'(t)$ will only be third-order accurate. A consequence is that the residual, or defect, $\Delta(t) := \dot{z}(t) - f(t, z)$ will sometimes be overestimated.

The work of [9,7], [11], [10], [21], [1] has shown that one can interpolate the skeleton $\{t_k, y_k\}$ of a numerical solution in a practical way that gives the correct asymptotic size of the residual, $\|\Delta\| = O(h^p)$ as the mean time step $h \to 0$, for a method of order $p$. This gives a practical method with tolerance proportionality and robust reliability for tight tolerances.

If instead one interpolates the skeleton badly, one will obviously get a large residual. In one of their examples, Hubbard & West wonder if a smaller residual (they term it “slope error”) might be achieved with some other interpolant [14]. This raises the question of how to find the “best” interpolant, which gives the “smallest” residual. This smallest residual is the one that most accurately measures how good a job the underlying method did in generating the skeleton.

This question is answered in general elsewhere (Corless, Kaya & Moir, in preparation), where optimization methods are used to find interpolants from $(t_k, y_k)$ to $(t_{k+1}, y_{k+1})$ that minimize $\|\Delta(t)\|$ or $\|\delta(t)\|$ (either the 2-norm or the $\infty$-norm is handled).

2.2 Interpolants via optimal control theory

For ease of exposition in this section we work entirely in $\mathbb{R}^n$. Define the control variable vector $u(t) := \delta(t)$, where $u : [t_i, t_{i+1}] \to \mathbb{R}^n$ is a piecewise continuous vector function. The problem of finding interpolants $z : [t_i, t_{i+1}] \to \mathbb{R}^n$ from $(t_i, y_i)$ to $(t_{i+1}, y_{i+1})$ that minimize the $L^\infty$-norm of the relative error $\|\delta\|_{L^\infty}$ can be stated as an optimal control problem:

P1: minimize $\max_{t_i \le t \le t_{i+1}} \|u(t)\|_\infty$ subject to $\dot{z}(t) = f(t, z(t))(1 + u(t))$, a.e. $t_i \le t \le t_{i+1}$, $z(t_i) = y_i$, $z(t_{i+1}) = y_{i+1}$,

where $\|\cdot\|_\infty$ is the $\ell^\infty$-norm in $\mathbb{R}^n$. Problem (P1), where $z(t)$ assumes the role of a state variable vector, can be transformed into a more standard form to which a maximum principle applies, as in [17], as follows. Let $\alpha$ be a new parameter. Then one can rewrite Problem (P1) equivalently as

P2: minimize $\alpha$ subject to $\dot{z}(t) = f(t, z(t))(1 + u(t))$, a.e.
$t_i \le t \le t_{i+1}$, $z(t_i) = y_i$, $z(t_{i+1}) = y_{i+1}$, $|u_j(t)| \le \alpha$, a.e. $t_i \le t \le t_{i+1}$, $j = 1, \ldots, n$.

To apply the maximum principle, we define the Hamiltonian function for Problem (P2):

$H(z, u, \psi, t) := \sum_{j=1}^{n} \psi_j f_j(t, z)(1 + u_j),$

where $\psi : [t_i, t_{i+1}] \to \mathbb{R}^n$ is referred to as the adjoint (or costate) variable vector, such that

$\dot{\psi}_j = -\frac{\partial H}{\partial z_j} = -\sum_{i=1}^{n} \psi_i \frac{\partial f_i}{\partial z_j}(t, z)(1 + u_i), \qquad j = 1, \ldots, n.$ (3)

The maximum principle asserts that if $u$ is optimal then there exists a nontrivial adjoint variable vector such that $u$ minimizes the Hamiltonian; namely

$u(t) \in \operatorname{argmin}_{\|v\|_\infty \le \alpha} H(z(t), v, \psi(t), t),$ a.e. $t \in [t_i, t_{i+1}]$,

i.e.,

$u_j(t) = \alpha$ if $\psi_j(t) f_j(t, z(t)) < 0$; $u_j(t) = -\alpha$ if $\psi_j(t) f_j(t, z(t)) > 0$; undetermined if $\psi_j(t) f_j(t, z(t)) = 0$; for $j = 1, \ldots, n$. (4)

This implies that the components of the optimal control, or the optimal backward error, will either be bang–bang (i.e., switching between the constants $\alpha$ and $-\alpha$) or singular (when $\psi_j(t) f_j(t, z(t)) = 0$ on a subinterval).

We now specialize to the Dahlquist test equation, $\dot{y} = \lambda y$, for which the optimality conditions derived above simplify drastically. For Dahlquist with real $\lambda$, the adjoint equation (3) reduces to the scalar ODE

$\dot{\psi}(t) = -\lambda (1 + u(t)) \psi(t).$ (5)

Singular control can be ruled out for this case as follows. If the optimal control is singular, then from (4) either $\psi(t) = 0$ or $f(t, z(t)) = \lambda z(t) = 0$ on some subinterval $[s_1, s_2] \subset [t_i, t_{i+1}]$. Suppose that $\psi(t) = 0$ on $[s_1, s_2]$. Then $\dot{\psi}(t) = -\lambda(1 + u(t))\psi(t) = 0$ on $[s_1, s_2]$, for any $u$. However, this further means that $\psi(t) = 0$ for all $t \in [t_i, t_{i+1}]$, which, being trivial, is not allowed by the maximum principle. Now, suppose that $z(t) = 0$ on $[s_1, s_2]$. Then similarly $z(t) = 0$ for all $t \in [t_i, t_{i+1}]$, which we can leave out as a very special case with $y_i = 0$. Therefore the optimal control is bang–bang and $|\delta(t)| = \alpha$, i.e., the optimal relative error is of constant magnitude.

Suppose that the control $u(t) = \alpha$ or $-\alpha$ on $[t_i, t_i + \varepsilon]$, with a small $\varepsilon > 0$. Then (5) implies that $\psi(t)$ does not change sign for any $t \in [t_i, t_{i+1}]$. Thus, either $u(t) = \alpha$ or $u(t) = -\alpha$ for all $t \in [t_i, t_{i+1}]$. As a result, the optimal interpolant is given by

$\dot{z}(t) = (1 \pm \alpha) \lambda z(t),$

whose solution can simply be obtained as $z(t) = y_i e^{(1 \pm \alpha)\lambda(t - t_i)}$. It follows that

$y_{i+1} = y_i e^{(1 \pm \alpha)\lambda h},$ (6)

or

$\pm\alpha = \frac{1}{\lambda h} \ln\frac{y_{i+1}}{y_i} - 1.$ (7)

Rearranging further yields the minimum value of the $L^\infty$-norm of the relative error as

$\alpha = \left| 1 - \frac{1}{\lambda h} \ln\frac{y_{i+1}}{y_i} \right|.$

One may also look for a control defined over more than one piece of the skeleton; this would allow the analysis to be used for multistep methods. In this paper, however, we restrict attention to one-step methods. Going from equation (6) to equation (7) is correct over the positive reals but not quite right over $\mathbb{C}$: $\ln z$ means the principal branch of the logarithm, with argument in $(-\pi, \pi]$. We will see how to choose the correct complex branch in equation (9), in a later section.

Indeed for some problems the minimal $\delta$ is surprisingly easy to find: see also [6] for an investigation of Torricelli's law using this idea. We will use this approach to re-examine the Dahlquist test problem, $\dot{y} = \lambda y$. We will be able to find the interpolant giving the optimal $\delta$ without difficulty using only elementary methods. Our analysis will be valid for any one-step method. The results are quite surprising.

Without loss of generality suppose that our one-step numerical method gives, for $\dot{y} = \lambda y$ with $y(t_n) = y_n$ and time-step $h$,

$y_{n+1} = y_n + h\Phi(y_n) = R(\mu)\, y_n,$

where $\mu = \lambda h \in \mathbb{C}$ and the time-step $h >$
0. For instance, the explicit Euler method gives $y_{n+1} = y_n + h\lambda y_n = (1 + \mu) y_n$, while implicit Euler gives $y_{n+1} = y_n + h\lambda y_{n+1}$, or $y_{n+1} = \frac{1}{1 - \mu} y_n$. A small table of $R(\mu)$ corresponding to various methods is given in Table 1. Note that Taylor series methods of order $p$ have

$R(\mu) = \sum_{k=0}^{p} \frac{\mu^k}{k!}.$

The absolute stability region is defined as the region in the complex plane where the value of $\mu = \lambda h$ guarantees that the numerical solution of the Dahlquist test problem remains uniformly bounded for all forward time steps. Again the rationale for focusing on such a simple problem is that it characterizes the local behaviour of a method on a scalar component of a linearized version of any nonlinear problem.

For Runge–Kutta methods, the absolute stability region may be characterized in terms of the quantity $R(\mu)$ as the region satisfying $|R(\mu)| \le 1$. Having this condition satisfied strictly ($<$) for those problems where the exact solution decays (eigenvalues with negative real part) underlies the criterion of A-stability. An A-stable method is any method whose stability region contains the left-half plane. This is an important criterion for stiff problems, in the sense of problems that have eigenvalues with real parts widely separated and negative, because an A-stable method will avoid stability restrictions that force an extremely small time step when small time steps are not required for an accurate solution. An exactly A-stable method is one that has precisely the left-half plane, with the imaginary axis as boundary, as its absolute stability region. The stability regions of the Euler method, the implicit Euler method and the implicit midpoint rule are shown in figure 1. The implicit midpoint rule is an example of an exactly A-stable method.

(a) Euler method (b) Implicit midpoint rule (c) Implicit Euler method
Fig. 1: The classical stability regions for the one-stage $\theta$ methods (a) $\theta = 0$, (b) $\theta = 1/2$, (c) $\theta = 1$. The stability region of (b) is the same for any exactly A-stable method.

method                      $R(\mu)$
Euler                       $1 + \mu$
Backward Euler              $1/(1 - \mu)$
Implicit Midpoint Rule      $(1 + \mu/2)/(1 - \mu/2)$
Taylor Series order $p$     $\sum_{k=0}^{p} \mu^k/k!$

Table 1: Approximations to $e^\mu$ for some numerical methods

Higher order methods have more complex and more interesting absolute stability regions. The Runge–Kutta–Fehlberg (or RKF45) method includes a pair of Runge–Kutta methods consisting of a fourth order method together with a fifth order error estimator. The absolute stability regions for these methods are shown in figure 2.

(a) Fourth order component of RKF45 (b) Fifth order component of RKF45

Fig. 2: The classical stability regions for the components of the Fehlberg method.

The shift to consider optimal residual stability regions entails a very different perspective on numerical methods for stiff problems. As is well known, consideration of the relative forward error $|R(\mu) e^{-\mu} - 1|$ gives rise to the beautiful theory of Order Stars, which considers the regions $A_+ : |R(\mu) e^{-\mu}| > 1$, $A_0 : |R(\mu) e^{-\mu}| = 1$, and $A_- : |R(\mu) e^{-\mu}| <$
1. For example, the three regions for the fifth-order method of RKF45 are shown in figure 3. For a detailed account see [15]. In hindsight, it is not a surprise that a similar theory will arise from a consideration of relative backward error.

Fig. 3: Order star for the fifth order component of RKF45. The $A_-$ region is shaded, the $A_0$ boundary is dotted, and the $A_+$ region is the unshaded part of the plane.

Suppose, as above, that the one-step method produces $y_{n+1} = R(\mu) y_n$. Also without loss of generality consider only $t_n = 0$, $t_{n+1} = h$. We search for an optimal interpolant, which we call $z(t)$, satisfying $z(0) = y_n$, $z(h) = y_{n+1}$ and

$\dot{z}(t) = \lambda (1 + \delta(t)) z(t),$ (8)

with $\delta(t)$ as small as possible. By constructing it explicitly we will demonstrate its existence and its minimality for $\delta(t)$. We use the $\infty$-norm to measure the size of $\delta$: $\|\delta(t)\|_\infty = \max_{0 \le t \le h} |\delta(t)|$. Rearranging, and assuming $z(s) \ne 0$ for $0 \le s \le h$,

$\frac{\dot{z}(t)}{z(t)} - \lambda = \lambda \delta(t),$

so

$\int_0^h \frac{\dot{z}(s)}{z(s)}\, ds - \lambda \int_0^h ds = \lambda \int_0^h \delta(s)\, ds,$

or

$\ln z(h) - \ln z(0) - \lambda h = \lambda \int_0^h \delta(s)\, ds - 2\pi i k,$ (9)

where the integer $k$ is determined by the (as yet unknown) number of times the path $z(s)$ winds around $z = 0$. We will see later that $k \ne 0$ sometimes, especially for higher-order methods. By adjusting $k$ if necessary we get

$\ln_k\!\left(\frac{z(h)}{z(0)}\right) - \mu = \lambda \int_0^h \delta(s)\, ds,$

where we have used David Jeffrey's compact notation $\ln_k z$ for $\ln z + 2\pi i k$. Since $z(h)/z(0) = y_{n+1}/y_n = R(\mu)$ we have

$\ln_k R(\mu) - \mu = \lambda \int_0^h \delta(s)\, ds.$

Taking absolute values and using the triangle inequality,

$|\ln_k R(\mu) - \mu| \le |\lambda| \left| \int_0^h ds \right| \|\delta(t)\|_\infty$

or $|\ln_k R(\mu) - \mu| \le |\mu|\, \|\delta\|_\infty$, or

$\|\delta\|_\infty \ge \left| \frac{\ln_k R(\mu)}{\mu} - 1 \right|,$

where we ignore the uninteresting case of $\mu = 0$.

This fundamental inequality gives a lower bound on any backward error $\delta(t)$ capable of taking $y_n$ to $y_{n+1}$. We now show that this lower bound is achieved, if we choose $\delta(t)$ to be constant. Suppose

$\delta = \frac{\ln_k R(\mu)}{\mu} - 1.$ (10)

Then $\dot{z} = \lambda(1 + \delta) z$ is well-defined and indeed $z = e^{\lambda(1+\delta)t} z(0) = e^{\lambda \frac{\ln_k R(\mu)}{\mu} t} y_n$. When $t = h$,

$z(h) = e^{\lambda \frac{\ln_k R(\mu)}{\mu} h} y_n = e^{\mu \frac{\ln_k R(\mu)}{\mu}} y_n = e^{\ln_k R(\mu)} y_n = R(\mu) y_n = y_{n+1},$ (11)

as desired, for any choice of $k$. Since the set $\left\{ \left| \frac{\ln_k R(\mu)}{\mu} - 1 \right| : k \in \mathbb{Z} \right\}$ is countable and nonnegative, it has a least member. We will use the $k$ that picks out the least member of this set.

Lemma:
To minimize $\left| \frac{\ln_k R(\mu)}{\mu} - 1 \right|$ over choices of branch, we must choose

$k = K := \left[ \frac{\operatorname{Im}(\mu - \ln R(\mu))}{2\pi} \right],$

where $[a]$ is the nearest integer to $a$; in case of a tie, either the integer above or below will do.

Proof We seek a choice of $k$ that minimizes $\left| \frac{\ln_k R(\mu)}{\mu} - 1 \right|$, i.e., that minimizes $|\varepsilon|$ for

$\varepsilon := \frac{\ln_k R(\mu)}{\mu} - 1 = \frac{\ln R(\mu) + 2\pi i k}{\mu} - 1.$

Putting this over a common denominator, we get

$\varepsilon = \frac{\ln R(\mu) - \mu + 2\pi i k}{\mu}.$ (12)

For the magnitude of $\varepsilon$ to be as small as possible, we choose $k$. But this alters only the imaginary part of the numerator, because $k$ is an integer and therefore $2\pi i k$ is purely imaginary. In order to make this imaginary term as small as possible, it follows that the integer $k$ must cancel as much as possible of the imaginary part in equation (12), or

$k = K := \left[ \frac{\operatorname{Im}(\mu - \ln R(\mu))}{2\pi} \right].$ (13)

This formula is very reminiscent of the unwinding number from [16], but is different in detail.

Remark 1
We will be able to examine the optimal backward error of a very large class of methods, essentially all one-step methods, with just formula (10), with the integer $k$ specified by formula (13).

Remark 2 If ever an interpolant hits $z(s) = 0$, then from $\dot{z} = \lambda(1+\delta)z$ we would have $z \equiv 0$; so if $y_n \ne 0$ but $y_{n+1} = 0$, then $\|\delta\|_\infty = \infty$ is necessary (a small absolute $\Delta(t)$ is possible in that case, but the relative error must go to infinity). As we saw in the general optimal control approach, this corresponds to the case when the control is singular.

We now examine properties of the optimal backward error along with contour plots of the optimal $\infty$-norm residuals. Note that $|\delta| = 1$ is a natural boundary: $|\delta|$ larger than this means we are solving a totally different equation. In the absence of systematic structure preservation, we may regard any likeness of the solution to what we want as coincidence. Taking $|\delta| = 0.05$ is analogous to the “95% confidence limit”. If $|\delta| \le 0.05$, we have the exact solution to a problem within 5% of the one we wanted to solve. We will also be concerned with the asymptotic limit $|\delta| = \varepsilon$ as $\varepsilon \to 0$.

(a) Euler method (b) Implicit midpoint rule (c) Implicit Euler method

Fig. 4: The $\infty$-norm optimal residual stability regions from equation (10) for the one-stage $\theta$ methods (a) $\theta =$
0, (b) $\theta = 1/2$, (c) $\theta = 1$. In each image, the classical stability region is shaded and the $\infty$-norm optimal residual stability region is coloured, with contours at 5% intervals. Regions where $|\delta| > 1$ are left uncoloured.

Notice the region where $|\delta| \le 0.05$ for the second-order method with $\theta = 1/2$. In essence this is why second-order methods are more efficient than first-order methods, for a given accuracy. Perhaps most important, the regions where $|\delta| > 1$, i.e. those regions where the problem solved is more than 100% different to the problem we wanted to solve, have a nontrivial intersection with the classical stability region.

4.1 Theta methods

The rational approximation that arises from the theta method

$y_{n+1} = y_n + \mu((1 - \theta) y_n + \theta y_{n+1})$ (14)

is

$R(\mu) = \frac{1 + (1 - \theta)\mu}{1 - \theta\mu}.$ (15)

For $\theta = 0$, $\theta = 1$, and $\theta = 1/2$ this gives the explicit Euler, implicit Euler, and implicit midpoint methods, respectively; the optimal $\delta$ for each of these three $\theta$ is plotted in figure 4. We see that for explicit Euler there is a substantial portion of the classical stability region $|\mu + 1| < 1$ where $|\delta| > 1$. This means that even though the solution will decay, it must be the solution of a problem that is more than 100% different to the original problem. Likewise, a substantial portion of the right-half plane will have $|\delta| > 1$.

The implicit midpoint rule ($\theta = 1/2$) doesn't have such large 100% error zones (although it does have them, near the portions of the real axis where $|\mu|$ is large), and it has a larger region with $|\delta| \le 0.05$ than either explicit or implicit Euler does. This is because the method is second-order, and the substantially larger region “of 95% confidence,” if you like, is a reflection of the value of a second-order method. Higher order methods really can be more efficient, and this picture shows why.

(a) Fourth order component of RKF45 (b) Fifth order component of RKF45
Fig. 5: The ∞ -norm optimal residual stability regions from equation (10) for the componentsof the RK-Fehlberg method. In each image, the classical stability region is shaded and the ∞ -norm optimal residual stability region is coloured, with contours at 5% intervals.4.2 RKF45In figure 5 we find the optimal stability regions for each member of the RKF45 pair. Wesee substantial regions in the left half plane where the optimal residual must be larger than1 in magnitude; that is, where the backward error must be larger than 100%. The regionswhere | δ | ≤ .
05 are comparable for each member of the pair, showing that if one method isaccurate, the other is likely to be as well. This information is complementary to that of theclassical stability regions for the pair in figure 2.4.3 A third order SDIRK methodSingly-Diagonally Implicit Runge-Kutta (SDIRK) methods are important and efficient im-plicit solvers for stiff problems. In [25] we find a detailed derivation of a third-order SDIRKmethod that contains the diagonal parameter γ . Two values of γ make the method 3rd order.It is argued there that one value of γ is to be preferred over the other, as making the methodhave a larger (much larger) classical stability region. However, the optimal backward errordiagram in figure 6 shows that it is the other value of γ that is potentially more accurate.Of course, real life is more complicated than either of these images would show. See themassive review [18] for a more nuanced picture. , C. Yalc¸ın Kaya and Robert H. C. Moir (a) Larger g SDIRK method (b) Smaller g SDIRK method
Fig. 6: The ∞ -norm optimal residual stability regions from equation (10) for the two singly-diagonally-implicit Runge-Kutta (SDIRK) methods of order 3. In each image, the classicalstability region is shaded and the ∞ -norm optimal residual stability region is coloured, withcontours at 5% intervals. Classical stability analysis suggests that the larger γ method (left)is more stable. This analysis suggests that the smaller γ method (right) is more accurate. Fig. 7: Lanczos τ -methodFig. 8: The ∞ -norm optimal residual stability regions from equation (10) for the Lanczos Taumethod with n =
1. The ∞ -norm optimal residual stability region is shaded in each image,with contours at 5% intervals.4.4 The Lanczos τ -methodWe have included one figure (figure 8) with the results from solving ˙ y = λ y , y ( ) = y ,on 0 ≤ t ≤ h , by using the Lanczos τ -method (see for instance [19] or [23]). For ease ofreference, we include a brief description of the method here. One starts with a Chebyshevexpansion for ˙ y , namely ˙ y ( t ) = n ∑ k = c k T k ( θ ) , (16) ptimal Residuals and the Dahlquist Test Problem 13 with θ = − + t / h . Integrating T with respect to θ gives T ( θ ) , integrating T ( θ ) gives T ( θ ) / + T ( θ ) /
4, and thereafter (ignoring constants of integration) (cid:90) T k ( θ ) d θ = k + T k + ( θ ) − k − T k − ( θ ) . (17)Lanczos chose to expand the derivative of the unknown, and then integrate using these sim-ple formulas, because this was simpler to do by hand. The details of the process are nowof course automatable in a variety of ways. Then integration of equation (16) gives an ex-pression for y ( t ) , in terms of Chebyshev polynomials of degree n + y ( ) = y to identify the constant introduced on integration. Onesubtracts λ times this expression from the expression for ˙ y , and sets the coefficients of T k ( θ ) to zero for k = k = n . This leaves here a residual term containing T n + ( θ ) . If we dothis using n = z ( t ) = y (cid:18) ( − µ ) T ( θ ) + µ ( − µ ) T ( θ ) + µ ( − µ ) T ( θ ) (cid:19) , (18)which can be simplified at t = h to y = + µ / + µ / − µ / + µ / y . (19)This identifies R ( µ ) = ( + µ / ) / ( − µ / ) and from there we can identify the optimal δ , which is plotted in figure 8. Similar graphs can be produced for larger n . Remark 3
The Lanczos τ -method is quite close in spirit to finding the minimal ∞ -normresidual, because Chebyshev expansions are, as is well-known, near-minimal for real µ inthis sense. One important difference here is that because λ ∈ C we are concerned withthe region in the complex plane where | δ | is small (less than 1 certainly and less than,say, 0 .
05 in practice). Comparison of the optimal backward error on the real interval ( , h ) to the size of the Chebyshev residual from the τ method (not shown here) shows that theChebyshev method is less than a factor of five worse. Reluctantly, however, we do not pursuethe Lanczos τ -method further here. (a) Order 2 Taylor method (b) Order 4 Taylor method (c) Order 8 Taylor method (d) Order 16 Taylormethod Fig. 9: The ∞ -norm optimal residual stability regions from equation (10) for the higher orderTaylor methods of order (a) 2, (b) 4, (c) 8 and (d) 16 , . The ∞ -norm optimal residual stabilityregion is shaded in each image, with contours at 5% intervals. , C. Yalc¸ın Kaya and Robert H. C. Moir (a) 2,2 Pad´e method (b) 4,4 Pad´e method (c) 8,8 Pad´e method (d) 16,16 Pad´e method Fig. 10: The ∞ -norm optimal residual stability regions from equation (10) for the higherorder Pad´e methods of signature (a) 2 ,
2, (b) 4,4, (c) 8,8, (d) 16,16. The $\infty$-norm optimal residual stability region is shaded in each image, with contours at 5% intervals.

4.5 Taylor series methods and Padé methods

Taylor series methods, including implicit Taylor series methods, and their generalization to Hermite–Obreshkoff methods, remain of interest for practical problems. See for instance [22]. For the Dahlquist test problem, all of these methods lead to $R(\mu)$ being a Padé approximant to $\exp(\mu)$. Figures 9 and 10 were generated using Maple's built-in function for computation of Padé approximants [12]. Notice that as the order of the method increases, the size of the area enclosed by the $\delta = 0.05$ contour increases. We remark that the unwinding number from formula (13) was necessary to get this large central region correct, for high-order methods. Without the correct branch of logarithm chosen, the central region only had a vertical width of $2\pi$, corresponding to the range of the principal branch of logarithm.

We emphasize that the analysis above does not rely on $h$ being small. The optimal $\delta$ is the optimal $\delta$, with almost no reservations or caveats. It exists for the Dahlquist test problem for $O(1)$ intervals around $h =$
0, regardless of the one-step numerical method used, so long as the optimal interpolant does not go through 0. Nonetheless there are some interesting connections to asymptotic results. First, if the underlying method has forward error $O(h^p)$ as $h \to 0$, that is, it is a $p$th order method, then the residual $\delta$ will also go to zero like $h^p$. That is, $\delta = O(h^p)$. This is in contrast to the distinction between so-called local error (which is $O(h^{p+1})$) and forward error (also called global error). For instance, if

$R(\mu) = \frac{1 + \mu/2 + \mu^2/16}{1 - \mu/2 + \mu^2/16},$ (20)

which it is for the Lanczos $\tau$-method with $n = 1$, then

$\delta = \frac{\ln_k(R(\mu))}{\mu} - 1 = \frac{\mu^2}{48} + \frac{\mu^4}{1280} + O\!\left(\mu^6\right),$ (21)

showing that this particular instance of the Lanczos $\tau$-method acts on the Dahlquist test problem as if it were a second-order method (in fact it is in general a second order method, but all this expansion does is illustrate that it's second-order on this problem). Note that for small $h$, $R(\mu) = 1 + O(\mu)$ and hence the unwinding number is $k = 0$ for this method. Similar expansions for other methods confirm their numerical order.

The classical stability analysis using the Dahlquist test problem is informative in that it gives the fundamental reason for loss of stability on taking too large a step size for a stiff problem. Yet the classical analysis neglects the case of eigenvalues with positive real part (which are indeed important in applications) and furthermore says little about accuracy: decay is a qualitative feature, not a quantitative one, and one can have decay and yet be 100% wrong. As we see in the figures for some common methods, computing the optimal backward error gives some complementary information, namely the relative size of the figure enclosed by the 0.
05 contour level: the larger the area, the larger the time-step that can be taken. We seeindeed that this region (and indeed even the 100% region) can be much smaller than theclassical stability region. This indicates that even though the method may be stable for timesteps inside the classical stability region but outside the 100% contour, it certainly is notaccurate, solving a problem that is more than 100% different to the one that was intended tobe solved.This observation is in some sense not new. Experienced numerical analysts knew this,and knew that one had to take small time steps for accuracy anyway (when it was needed).Still, this quantitative assessment of optimal relative backward accuracy is indeed new. Itseems possible that one might choose a different method to solve some problems, using thiscriterion, than one might choose using the classical stability criterion.
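The expansion (21) for the stability function (20) can be checked numerically. The following sketch (not from the paper; it uses only the standard library, and for small real $\mu$ the principal branch of the logarithm suffices, so $k = 0$) evaluates $\delta(\mu) = \ln(R(\mu))/\mu - 1$ and compares it with the leading term $\mu^2/12$:

```python
# Sketch (not from the paper): check that the optimal relative residual
# delta(mu) = ln(R(mu))/mu - 1 behaves like mu^2/12 for the trapezoidal-rule
# stability function R(mu) = (1 + mu/2)/(1 - mu/2) of equation (20).
import cmath

def R(mu):
    """Stability function of the trapezoidal rule (Lanczos tau-method, n = 1)."""
    return (1 + mu / 2) / (1 - mu / 2)

def delta(mu):
    """Optimal relative backward error, principal branch (k = 0 for small mu)."""
    return cmath.log(R(mu)) / mu - 1

for mu in (0.1, 0.01, 0.001):
    # the ratio |delta|/mu^2 tends to 1/12 as mu -> 0, confirming second order
    print(mu, abs(delta(mu)), abs(delta(mu)) / mu**2)
```

The printed ratios approach $1/12 \approx 0.0833$, confirming that the residual, not just the local error, reveals the order of the method on this problem.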
The analysis here applies to any one-step method.
Remark 4
Since the optimal δ is constant (for any one-step method), we have the curious observation that a constant-stepsize solution using a one-step method gives the exact solution of $\dot y = \Lambda y$ where $\Lambda = \lambda(1 + \delta) = \lambda \ln_k(R(\mu))/\mu$; that is, the optimal interpolant is the exact solution of not only a nearby problem, but a nearby problem of the same kind. That is, we have not only an optimal backward error, but an optimal structured backward error.

That constant-stepsize one-step methods solve $\dot y = \Lambda y$ was known; what was not known was that this solution has the optimality property derived in this paper.

Remark 5
If the optimal backward error is large, then the numerical method has necessarily solved a very different problem from the one intended. This is a sure and certain indication that the underlying numerical method that generated the skeleton has failed. In particular, when a numerical method introduces spurious chaos into a nonchaotic system, the optimal backward error must be too large. A more frequently detected failure (for nonchaotic systems) will be that the automatically chosen time-step sizes are too large; because the analysis of this paper and the more general paper in preparation do not rely on the asymptotic limit as the mean stepsize goes to zero, detection of failure is sure and certain.
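Remark 4 can be illustrated numerically. The following sketch (an illustration under the paper's notation, not part of it) iterates the trapezoidal-rule skeleton $y_{n+1} = R(\mu)\,y_n$ with a constant step and checks that the mesh values coincide, to rounding error, with the exact solution of the modified problem $\dot y = \Lambda y$:

```python
# Sketch: the constant-stepsize trapezoidal skeleton is the *exact* solution,
# at the mesh points, of the modified problem y' = Lambda*y with
# Lambda = lambda*(1 + delta) = lambda*ln(R(mu))/mu (Remark 4).
import cmath

lam, h = -2.0, 0.1                    # test problem y' = lam*y, constant step h
mu = lam * h
R = (1 + mu / 2) / (1 - mu / 2)       # trapezoidal stability function, eq. (20)
Lambda = cmath.log(R) / h             # = lam*(1 + delta); principal branch, k = 0

y = 1.0
for n in range(1, 11):
    y *= R                            # one step of the skeleton: y_{n+1} = R(mu)*y_n
    exact_modified = cmath.exp(Lambda * n * h)   # exact solution of y' = Lambda*y at t_n
    assert abs(y - exact_modified) < 1e-13 * abs(y)
```

The assertion passes at every mesh point: the skeleton is not merely close to a nearby solution, it *is* the exact solution of a nearby problem of the same structure.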
John C. Butcher was correct when, critiquing an earlier incarnation of this idea, he said that the classical stability theory does indeed explain many features of the behaviour of many numerical methods on stiff problems. It is a highly successful theory. The refinement of using Order Stars is also highly successful, in that new proofs of classical order barriers were obtained using it (for instance).

However, this optimal backward error idea for $y_{n+1} = R(\mu) y_n$ with $\mu = h\lambda$, namely formula (10) with the unwinding number $k$ from formula (13),
$$\delta = \frac{\ln_k R(\mu)}{\mu} - 1, \qquad (22)$$
with
$$k = \mathcal{K} := \left[ \frac{\operatorname{Im}\left(\mu - \ln R(\mu)\right)}{2\pi} \right], \qquad (23)$$
also explains, quantitatively, some features of the numerical solution of stiff problems by one-step methods. In particular, it explains clearly why one might sometimes wish to take smaller timesteps than strictly necessary for stability reasons. That is, any adaptive step-size control that looks at accuracy must be affected by these regions. We note that asymptotically as $h \to 0$, the optimal residual approaches the local error per unit step, as explained by a theorem of Stetter (this is cited and extended in [5]). There are other puzzles that this approach might help to explain. We look forward, for instance, to finding an example problem for which the smaller-γ SDIRK method performs better than the larger-γ SDIRK method, as is predicted to exist by figure 6.
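The unwinding number in (23) matters as soon as $\operatorname{Im}(\mu)$ leaves $(-\pi, \pi]$. A small sketch (assumptions: the bracket in (23) denotes rounding to the nearest integer; standard library only): take $R(\mu) = e^{\mu}$, the stability function of an "exact" method, at $\mu = 4i$. The principal logarithm alone lands on the wrong branch and gives a spurious residual, while the unwound logarithm recovers $\delta = 0$:

```python
# Sketch: effect of the unwinding number k in formula (23).  With R(mu) = exp(mu)
# (an "exact" method) the true optimal residual is delta = 0, but for Im(mu)
# outside (-pi, pi] the principal log alone lands on the wrong branch.
import cmath
import math

def unwinding(mu, Rmu):
    # k = [Im(mu - ln R(mu)) / (2*pi)], nearest integer (assumed reading of (23))
    return round((mu - cmath.log(Rmu)).imag / (2 * math.pi))

def delta(mu, Rmu):
    # ln_k R(mu)/mu - 1, with ln_k z = ln z + 2*pi*i*k
    k = unwinding(mu, Rmu)
    return (cmath.log(Rmu) + 2j * math.pi * k) / mu - 1

mu = 4j
Rmu = cmath.exp(mu)
print(unwinding(mu, Rmu))   # 1: the principal branch is one full turn short
print(abs(delta(mu, Rmu)))  # 0 up to rounding: the intended problem is recovered
```

With $k$ forced to zero, the same computation would report a residual of about $2\pi/4 \approx 1.57$ in magnitude, a wildly wrong backward error for a method that is exact; this is the large central region that the unwinding number corrects.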
Maple Worksheet

The Maple worksheet written by R.H.C. Moir used to produce the graphs in this paper is at publish.uwo.ca/~rcorless/Dahlquist/ResidualAnalysis-3D-Contours.mw.

Acknowledgements

Part of this work was supported by the Natural Sciences and Engineering Research Council of Canada. Thanks are also due to the Rotman Institute of Philosophy, the Fields Institute for Research in the Mathematical Sciences, the Ontario Research Center for Computer Algebra, and the University of South Australia, which supported a visit of the first author to the second. Special thanks are owed to John C. Butcher for hosting the beautiful ANODE conferences over the years, and for his long-term support of research in the Runge-Kutta world.
References
1. Cao, Y., Petzold, L.: A posteriori error estimation and global error control for ordinary differential equations by the adjoint method. SIAM Journal on Scientific Computing (2), 359–374 (2004)
2. Corless, R.M.: Error backward. Contemporary Mathematics, 31–31 (1994)
3. Corless, R.M.: What good are numerical simulations of chaotic dynamical systems? Computers & Mathematics with Applications (10-12), 107–121 (1994)
4. Corless, R.M., Essex, C., Nerenberg, M.: Numerical methods can suppress chaos. Physics Letters A (1), 27–36 (1991)
5. Corless, R.M., Fillion, N.: A Graduate Introduction to Numerical Methods. Springer (2013)
6. Corless, R.M., Jankowski, J.E.: Variations on a theme of Euler. SIAM Review (4), 775–792 (2016)
7. Enright, W.: Continuous numerical methods for ODEs with defect control. Journal of Computational and Applied Mathematics (1), 159–170 (2000)
8. Enright, W., Seward, W.: Achieving tolerance proportionality in software for stiff initial-value problems. Computing (4), 341–352 (1989)
9. Enright, W.H.: A new error-control for initial value solvers. Applied Mathematics and Computation, 288–301 (1989)
10. Enright, W.H., Hayes, W.B.: Robust and reliable defect control for Runge–Kutta methods. ACM Transactions on Mathematical Software (TOMS) (1), 1 (2007)
11. Enright, W.H., Higham, D.J.: Parallel defect control. BIT Numerical Mathematics (4), 647–663 (1991)
12. Geddes, K.O.: Symbolic computation of Padé approximants. ACM Transactions on Mathematical Software (TOMS) (2), 218–233 (1979)
13. Hairer, E., Nørsett, S.P., Wanner, G.: Solving Ordinary Differential Equations I. Springer (1993)
14. Hubbard, J., West, B.H.: Differential Equations: A Dynamical Systems Approach. Springer-Verlag, New York (1991)
15. Iserles, A., Nørsett, S.P.: Order Stars: Theory and Applications, vol. 2. CRC Press (1991)
16. Jeffrey, D.J., Hare, D.E., Corless, R.M.: Unwinding the branches of the Lambert W function. Mathematical Scientist (1), 1–7 (1996)
17. Kaya, C.Y., Noakes, J.L.: Finding interpolating curves minimizing $L^\infty$ acceleration in the Euclidean space via optimal control theory. SIAM J. Control Optim., 442–464 (2013)
18. Kennedy, C.A., Carpenter, M.H.: Diagonally implicit Runge–Kutta methods for ordinary differential equations. A review (2016). URL https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20160005923.pdf
19. Lanczos, C.: Applied Analysis. Dover (1988)
20. Moir, R.H.: Reconsidering backward error analysis for ordinary differential equations. Master's thesis, The University of Western Ontario (2010)
21. Muir, P., Pancer, R., Jackson, K.R.: PMIRKDC: a parallel mono-implicit Runge–Kutta code with defect control for boundary value ODEs. Parallel Computing (6), 711–741 (2003)
22. Neher, M., Jackson, K.R., Nedialkov, N.S.: On Taylor model based integration of ODEs. SIAM Journal on Numerical Analysis (1), 236–262 (2007)
23. Ortiz, E.L.: The tau method. SIAM Journal on Numerical Analysis (3), 480–492 (1969)
24. Söderlind, G., Jay, L., Calvo, M.: Stiffness 1952–2012: Sixty years in search of a definition. BIT Numerical Mathematics 55