A residual concept for Krylov subspace evaluation of the φ matrix function
Mike A. Botchev, Leonid A. Knizhnerman, Eugene E. Tyrtyshnikov
AA RESIDUAL CONCEPT FOR KRYLOV SUBSPACE EVALUATIONOF THE ϕ MATRIX FUNCTION
M.A. BOTCHEV ∗ , L.A. KNIZHNERMAN † , AND
E.E. TYRTYSHNIKOV ‡ Abstract.
An efficient Krylov subspace algorithm for computing actions of the ϕ matrix functionfor large matrices is proposed. This matrix function is widely used in exponential time integration,Markov chains and network analysis and many other applications. Our algorithm is based on areliable residual based stopping criterion and a new efficient restarting procedure. For matriceswith numerical range in the stable complex half plane, we analyze residual convergence and provethat the restarted method is guaranteed to converge for any Krylov subspace dimension. Numericaltests demonstrate efficiency of our approach for solving large scale evolution problems resulting fromdiscretized in space time-dependent PDEs, in particular, diffusion and convection–diffusion problems. Key words. phi matrix function, Krylov subspace methods, exponential time integration,matrix exponential, restarting, exponential residual
AMS subject classifications.
1. Introduction.
Exponential time integration is an actively developing re-search field [41], with numerous successful applications in challenging large scalecomputations, such as elastic wave equations [43], wind farm simulations [30], powerdelivery network analysis [69], large circuit analysis [70] vector finite element dis-cretizations of Maxwell’s equations [12] and photonic crystal modeling [9, 11].In this paper we propose a Krylov subspace method for solving initial-value prob-lem (IVP)(1.1) y (cid:48) ( t ) = − Ay ( t ) + g, y (0) = v, t (cid:62) , where the matrix A ∈ R n × n and vectors g, v ∈ R n ( g (cid:54) = 0) are given and the problemsize n is supposed to be large, i.e., n (cid:29)
1. Using the variation of constants formula(see, e.g., [45, Chapter 2.3]) y ( t ) = exp( − tA ) v + (cid:20)(cid:90) t exp( − ( t − s ) A ) ds (cid:21) g, t (cid:62) , we can write y ( t ) in the form(1.2) y ( t ) = v + tϕ ( − tA )( g − Av ) , t (cid:62) . Here ϕ ( − tA )( g − Av ) is a matrix-vector product with the matrix function ϕ ( − tA )and ϕ is defined as(1.3) ϕ ( z ) ≡ e z − z , z ∈ C , where we set ϕ (0) = 1 so that ϕ is an entire function. Thus, the matrix function ϕ provides the exact solution (1.2) to system (1.1). In other words, solving IVP (1.1) is ∗ Corresponding author. Keldysh Institute of Applied Mathematics, Russian Academy of Sciences,Miusskaya Sq. 4, Moscow 125047, Russia, [email protected]. † Mathematical Modelling Department, Central Geophysical Expedition, Narodnogo OpolcheniyaSt., 38, Bldg. 3, Moscow 123298, Russia, [email protected]. ‡ Marchuk Institute of Numerical Mathematics, Russian Academy of Sciences, Gubkin St. 8, Mos-cow 119333, Russia, [email protected]. Work of this author is supported by the Rus-sian Science Foundation grant No. 19-11-00338. 1 a r X i v : . [ m a t h . NA ] O c t quivalent to evaluating the matrix function ϕ times a vector in (1.2). The methodfor solving (1.1) we propose in this paper is based on an efficient Krylov subspaceevaluation of the matrix function ϕ ( − tA ) in (1.2).For homogeneous ODEs (ordinary differential equations), i.e., for g = 0, the exactsolution of (1.1) takes the well-known form y ( t ) = exp( − tA ) v . Therefore, computingmatrix-vector products with the matrix function ϕ and the matrix exponential is animportant task in exponential time integration [41], occurring independently as well aswithin higher-order ODE Krylov subspace methods [42, 32] or splitting schemes [36, 8].For large problems a direct evaluation of the matrix function based on dense linearalgebra techniques [50, 37] is out of question. Therefore, the matrix-vector productswith the matrix function are approximately computed, rather than the matrix functionitself. Krylov subspace methods have been a successful tool for doing this since theend of the eighties; we list chronologically [54, 65, 23, 46, 57, 24, 39]. This progress innumerical linear algebra has triggered advances in numerical time integration methods(see, e.g., [31, 40, 18]) and a revival of the exponential time integrators [41], whichhave been developed since the sixties [19, 49, 48].In this work we are interested in solving large scale problems (1.1) which result,for instance, within a method of lines, when spatially discretized PDEs (partial dif-ferential equations) have to be efficiently solved in time. This means that A doesnot have to be symmetric and is often far from a normal matrix. Furthermore, fineand nonuniform grids as well as realistic coefficient values in the PDE may lead toa stiffness in the problem, i.e., the eigenvalues of A may significantly vary in magni-tude [29]. Taking into account unavoidable space discretization errors, error producedby a time integrator does not have to be extremely small in these problems. Usuallyabsolute time errors in the range [10 − , − ] suffice and this is the range we aim here.As we see in numerical tests, delivering accuracies in this range can be a challengefor existing exponential solvers. 
The proposed method appears to carry out this tasksuccessfully.Krylov subspace exponential integrators are good candidates for solving largescale problems with sparse nonnormal matrices A , such as space-discretized PDEs.Indeed, these solvers have sufficient stability properties and often lead to an efficientcompromise between standard implicit time integrators (which may occur too expen-sive in large scale applications) and explicit stabilized solvers. Moreover, as Krylovsubspace methods can significantly profit from eigenvalue clustering [67], their appli-cation for solving stiff problems can be successful [33, 20, 16, 15] because stiffnesstypically means a large discrepancy in eigenvalues. Other approaches, not employingKrylov subspaces, for evaluating matrix exponential and related matrix functions oflarge sparse matrices include the Chebyshev polynomials, scaling and squaring withTaylor or Pad´e approximations and contour integrals [64, 21, 59, 17, 4].Our approach for computing ϕ matrix-vector products is essentially based ontwo ingredients presented in this paper: (i) a residual notion for ϕ matrix functionsand (ii) a restarting, i.e., an algorithm aiming to preserve convergence while keepingthe Krylov dimension k bounded, k (cid:54) k max [25]. The residual notion leads to areliable stopping criterion and facilitates the proposed restarting. Restarting is oftenindispensable for solving large scale problems because the size n of the matrix A tends to be extremely large in real life applications. Thus, storing k max Krylov basisvectors of size n can be prohibitively expensive for n (cid:29) k max is moderatelylarge. Therefore, in the tests we restrict k max by 30. Numerous restarting algorithmsfor Krylov subspace evaluations of the matrix exponential have been proposed, forinstance [2, 18, 26, 52, 10]. Restarting for general matrix functions include [27, 35]. wo restarting techniques have been designed specifically for the ϕ matrix functions,namely [60] and [53]. They both are based on error estimates and choose a timeinterval length t such that solving (1.1) by k max Krylov steps yields a sufficientlysmall error. In the numerical experiments presented below we show that our methodperforms favorably compared to the codes based on these techniques.Furthermore, we provide convergence analysis of the method, obtaining upperbounds for the residual norm with no asymptotic estimates involved, and show thatthe restarted algorithm is guaranteed to converge for any restart length, i.e., for anyKrylov dimension k max .The structure of this paper is as follows. In Section 2 we describe details of theKrylov subspace approximation and introduce a residual concept for the ϕ matrixfunction. In Section 3 some observations on the residual behavior are given andthe restarting algorithm, based on the observed behavior, is formulated. Section 4is devoted to residual convergence analysis for both symmetric and nonsymmetricmatrices. Based on this analysis, in Section 5 we show convergence of the restartedalgorithm. Section 6 presents numerical experiments. Finally, some conclusions aredrawn in Section 7.
2. Krylov subspace evaluations of the ϕ matrix function.2.1. Residual notion for the ϕ matrix function. Throughout the paper weassume that(2.1) Re( z ∗ Az ) (cid:62) , ∀ z ∈ C n , where the function Re( · ) is the real part of its argument, and, unless reported other-wise, (cid:107) · (cid:107) denotes the vector or matrix 2-norm.To keep our presentation simpler, we first assume that the initial value vectorin (1.1) is zero: v = 0. In this case relation (1.2) takes a form(1.2 (cid:48) ) y ( t ) = tϕ ( − tA ) g, t (cid:62) , and, thus, to solve (1.1) we have to evaluate the action of the matrix function ϕ ( − tA )on the vector g . Let upper Hessenberg matrix H k ∈ R ( k +1) × k and matrix V k +1 = (cid:2) v v . . . v k v k +1 (cid:3) ∈ R n × ( k +1) , v = g/ (cid:107) g (cid:107) , with orthonormal columns v j , be the matrices which result after carrying out k steps ofthe Arnoldi process. These matrices satisfy the Arnoldi decomposition [34, 55, 67, 58](2.2) AV k = V k +1 H k = V k H k + h k +1 ,k v k +1 e Tk . Here H k = V Tk AV k ∈ R k × k consists of the first k rows of H k and the columns v , . . . , v k span the Krylov subspace K k ( A, g ) = span( g, Ag, . . . , A k − g ) . A standard way to compute a Krylov subspace approximation y k ( t ) to y ( t ) in (1.2 (cid:48) )is to set [67, Chapter 11](2.3) y k ( t ) := V k tϕ ( − tH k )( βe ) , with β = (cid:107) g (cid:107) and e = [1 , , . . . , T ∈ R k . This expression for y k ( t ) can be derivedby projecting the IVP (1.1) onto the Krylov subspace K k ( A, g ) = colspan( V k ) in the alerkin sense. Indeed, let us define a residual r k ( t ) of y k ( t ) with respect to (1.1),i.e., let(2.4) r k ( t ) ≡ − Ay k ( t ) + g − y (cid:48) k ( t ) . Now consider y k ( t ) ≈ y ( t ) , y k ( t ) = V k u ( t )such that its residual is kept orthogonal to the Krylov subspace: V Tk r k ( t ) = 0. Since g = βV k e , this orthogonality condition leads to a projected IVP in u ( t ) : R → R k (2.5) u (cid:48) ( t ) = − H k u ( t ) + βe , u (0) = 0 , where u (0) = 0 follows from the assumption y (0) = 0. With u ( t ) = tϕ ( − tH k )( βe ),we see that V k u ( t ) yields (2.3).Now it is easy to observe that the residual (2.4) is readily computable in thecourse of the Arnoldi process:(2.6) r k ( t ) (2.3) = − AV k u ( t ) + g − V k u (cid:48) ( t ) (2.5) = − AV k u ( t ) + g − V k ( − H k y ( t ) + βe )= − AV k u ( t ) + g + V k H k u ( t ) − g (2.2) = − h k +1 ,k v k +1 e Tk u ( t ) , where the bracketed numbers above the equality signs indicate the relations used inthe derivation.In case the initial value vector v in (1.1) is not zero, we introduce a function˜ y ( t ) = y ( t ) − v which satisfies an IVP˜ y (cid:48) ( t ) = − A ˜ y ( t ) + ˜ g, ˜ y (0) = 0 , t (cid:62) , with ˜ g = g − Av . Using (1.2) for this transformed problem we see that ˜ y ( t ) = tϕ ( − tA )˜ g and, thus, a Krylov subspace approximation ˜ y k ( t ) to ˜ y ( t ), as well as its residual ˜ r k ( t )(cf. (2.4)), can be computed as discussed above. Our Krylov subspace approximationto y ( t ) is then y k ( t ) = ˜ y k ( t ) + v . For this approximation we introduce the residual r k ( t ) in the same way as for the case v = 0: see (2.4). 
Then we have r k ( t ) = − Ay k ( t ) + g − y (cid:48) k ( t ) = − A ˜ y k ( t ) − Av + g − y (cid:48) k ( t ) = − A ˜ y k ( t ) + ˜ g − ˜ y (cid:48) k ( t ) = ˜ r k ( t ) , i.e., the residual r k ( t ) coincides with the residual ˜ r k ( t ) for the transformed solution˜ y k ( t ) and, hence, is also easily computable.Note that the error (cid:15) k ( t ) ≡ y ( t ) − y k ( t ) of the Krylov approximation y k ( t ) satisfiesan IVP(2.7) (cid:15) (cid:48) k ( t ) = − A(cid:15) k ( t ) + r k ( t ) , (cid:15) k (0) = 0 . This shows that the introduced residual can be seen as the backward error for theapproximate solution y k ( t ). This is similar to the well known property of the linearsystem residual: if ˜ x approximates exact solution x of a linear system Ax = b thenfor its residual ˜ r = b − A ˜ x we have A ( x − ˜ x ) = ˜ r . Using (2.1),(2.7) we can bound theerror norm as [10, Lemma 4.1](2.8) (cid:107) (cid:15) k ( t ) (cid:107) (cid:54) tϕ ( − tω ) max s ∈ [0 ,t ] (cid:107) r k ( s ) (cid:107) , where ω (cid:62) .2. ϕ via the matrix exponential, two approaches. If A is nonsingular,a naive approach to compute y := ϕ ( − A ) g for a given g would be to solve v from Av = − g and evaluate y := exp( − A ) v − v . This means that we replace (1.1) byan equivalent IVP with the initial value v and a zero source term. Comparing thisapproach with Krylov subspace evaluation of y := ϕ ( − A ) g shows that both Krylovsubspace methods (for y := exp( − A ) v − v and for y := ϕ ( − A ) g ) require approximatelythe same number of Krylov steps. However, the additional computational costs forsolving Av = − g are not paid off. Hence, unless the systems with A can be solved ata negligible cost, computing actions of ϕ by the matrix exponential actions appearsto be inefficient. Of course, this approach can not be used for a general g in case A issingular.We remark that quite a different and successful approach to compute the ϕ matrixfunction via a matrix exponential is presented in [57, Proposition 2.1] (see also [4]). Forcomputing ϕ ( − tA ) v , A ∈ R n × n , it amounts to evaluating the last column of exp( − t (cid:98) A ),where (cid:98) A ∈ R ( n +1) × ( n +1) is a matrix obtained by augmenting A with v as the lastcolumn and zeros as the last row. In the context of large scale problems and Krylovsubspace methods this approach has a drawback that eventual (skew) symmetry of A is lost in (cid:98) A . We use this approach for the evaluating u ( t ) = tϕ ( − tH k )( βe ) to solvethe small scale projected IVP (2.5), see Section 6.
3. A restarting procedure for ϕ Krylov subspace evaluation.3.1. The residual norm as a function of time.
For real matrices A it is notdifficult to see that property (2.1) is equivalent to x T Ax (cid:62) , ∀ x ∈ R n , which means that the symmetric part ( A + A T ) of A is nonnegative definite. Denote(3.1) ω ≡ min (cid:54) = x ∈ R n x T Axx T x (cid:62) . Then it holds (see, e.g., [45, Section I.2.3])(3.2) (cid:107) exp( − tA ) (cid:107) (cid:54) e − tω , ∀ t (cid:62) . A similar estimate holds for the matrix H k = V Tk AV k produced by the Arnoldi process(see (2.2)). Indeed, let(3.3) ω k ≡ min (cid:54) = y ∈ R k y T H k yy T y . Then ω k = min (cid:54) = y ∈ R k ( V k y ) T A ( V k y ) y T y (cid:62) ω (cid:62) , so that, similarly to (3.2),(3.4) (cid:107) exp( − tH k ) (cid:107) (cid:54) e − tω k , ∀ t (cid:62) . Exploiting the estimate [41, Proof of Lemma 2.4] (cid:107) ϕ ( − tA ) (cid:107) (cid:54) (cid:90) (cid:107) exp( − t (1 − θ ) A ) (cid:107) dθ, ounding the norm under the integral by e − t (1 − θ ) ω and evaluating the integral, weobtain a useful inequality(3.5) (cid:107) ϕ ( − tA ) (cid:107) (cid:54) ϕ ( − tω ) . Then a similar relation holds also for H k :(3.6) (cid:107) ϕ ( − tH k ) (cid:107) (cid:54) ϕ ( − tω k ) . Proposition
Let t > , g ∈ R n and let A ∈ R n × n be such that (2.1) holds.If y k ( t ) is the Krylov subspace approximation (2.3) to the vector y ( t ) = tϕ ( − tA ) g then for its residual r k ( t ) , defined by (2.4) , (2.6) , holds (3.7) (cid:107) r k ( t ) (cid:107) (cid:54) βth k +1 ,k ϕ ( − tω k ) = βω k h k +1 ,k (1 − e − tω k ) , for ω k > ,βth k +1 ,k , for ω k = 0 , where ω k is defined by (3.3) .Proof. From (2.5) we have u ( t ) = tϕ ( − tH k )( βe ), so that (cid:107) u ( t ) (cid:107) (cid:54) βt (cid:107) ϕ ( − tH k ) (cid:107) (cid:54) βtϕ ( − tω k ) = βω k (1 − e − tω k ) , for ω k > ,βt, for ω k = 0 , where (3.6) is used. Relation (2.6) implies (cid:107) r k ( t ) (cid:107) = h k +1 ,k | e Tk u ( t ) | (cid:54) h k +1 ,k (cid:107) u ( t ) (cid:107) , where h k +1 ,k (cid:62) (cid:107) u ( t ) (cid:107) , gives the required bound (3.7).Note that (3.7) exhibits solely the time behavior of the residual r k ( t ) as a functionof time for a fixed k . A more detailed analysis of the residual dependence on k andon t is given below in Section 4. The method for evaluatingthe ϕ matrix functions we propose in this paper adopts the so-called residual-time(RT) restarting strategy. It is proposed for the matrix exponential evaluations in ourrecent paper [13]. The RT strategy is essentially based on a notion of the matrixexponential residual defined, for an approximation w k ( t ) ≈ w ( t ) = exp( − tA ) v , as [18,22, 10] − Aw k ( t ) − w (cid:48) k ( t ) . The RT restarting is based on the observation that the residual is easily computableand the residual norm tends to be a monotonically increasing function of time, whichexplains the name “residual-time” restarting. The ϕ residual (2.4) exhibits the samebehavior (see relation (2.6) and Proposition 3.1 above). Therefore the RT restartingapproach can be readily adopted for the ϕ evaluations.The RT restarting approach is quite simple. We solve (1.1) by performing k =1 , , . . . , k max steps of the Arnoldi process (see Section 2.1). We stop as soon as theresidual norm max s ∈ [0 ,t ] (cid:107) r k ( s ) (cid:107) is small enough. If, after k max carried out steps, thisis not the case we find a time moment δ such that the residual norm (cid:107) r k max ( s ) (cid:107) issufficiently small for s ∈ [0 , δ ]. Then we set in (1.1) v := y k ( δ ), decrease the time Given: A ∈ R n × n , g (cid:54) = 0 , v ∈ R n , t > , k max and tol > convergence := false, tol := tol /t while not(convergence) β := (cid:107) g − Av (cid:107) , v := β ( g − Av ) for k = 1 , . . . , k max carry out step k of Arnoldi process:compute v k +1 and h ,k , ..., h k +1 ,k if max s ∈ [0 ,t ] (cid:107) r k ( s ) (cid:107) (cid:54) tol then ( ∗ ) convergence := truebreak (leave the for loop)elseif k = k max % -------- restart after k max stepsfind largest δ such that max s ∈ [0 ,δ ] (cid:107) r k ( s ) (cid:107) (cid:54) tol ( ◦ ) u := δϕ ( − δH k ) e v := v + V k ( βu ) t := t − δ break (leave the for loop)end ifend forend while u := tϕ ( − tH k ) e , y k := v + V k ( βu ) Fig. 3.1 . Description of the RT (residual–time) restarting algorithm for the ϕ matrix functionevaluations. The algorithm computes Krylov subspace approximation y k ( t ) ≈ v + ϕ ( − tA )( g − Av ) such that for its residual r k ( t ) holds max s ∈ [0 ,t ] (cid:107) r k ( s ) (cid:107) (cid:54) tol . interval t := t − δ and start the Arnoldi process for this modified IVP again. 
This RTrestarting for the ϕ matrix function evaluations is outlined in Figure 3.1.To monitor the values of the residual norm in the algorithm (see the algorithmlines marked with ( ∗ ), ( ◦ )), we compute the exact solution u ( s ) of the projectedIVP (2.5) at time moments s = ∆ t , 2∆ t , 3∆ t , . . . . This is done by recursion(3.8) u ( s + ∆ t ) = E k u ( s ) + ∆ td k ,E k = exp( − ∆ tH k ) , d k = βϕ ( − ∆ tH k ) e , where the matrix E k and vector d k are computed once. The value of ∆ t is chosen inlines ( ∗ ) and ( ◦ ) differently:(3.9) ∆ t := t/ , to estimate max s ∈ [0 ,t ] (cid:107) r k ( s ) (cid:107) , line ( ∗ ) ,t p , with smallest p = 0 , , . . . suchthat (cid:107) r k (∆ t ) (cid:107) (cid:54) tol , line ( ◦ ) . For estimating max s ∈ [0 ,t ] (cid:107) r k ( s ) (cid:107) in line ( ∗ ) of the algorithm, ∆ t does not have to bevery small. This is because (cid:107) r k ( s ) (cid:107) usually achieves its maximum at the end point s = t and the value (cid:107) r k ( t ) (cid:107) determines the quality of the approximation y k ( t ) ≈ y ( t ).We emphasize that we compute the values of u ( s ) exactly, and the computationaccuracy does not depend on ∆ t as is the case in a regular time integration scheme.To determine the restart interval length δ (line ( ◦ ) in Figure 3.1), we proceedaccording to formula (3.9). More specifically, ∆ t is initially set to t/ (cid:107) r k (∆ t ) (cid:107) (cid:54) ol , we trace the values (cid:107) r k (∆ t ) (cid:107) , (cid:107) r k (2∆ t ) (cid:107) , . . . by recursion (3.8), until they exceedthe tolerance value. Then we set δ to the largest p ∆ t for which (cid:107) r k ( p ∆ t ) (cid:107) (cid:54) tol .Otherwise, if (cid:107) r k (∆ t ) (cid:107) > tol for ∆ t = t/ t until (cid:107) r k (∆ t ) (cid:107) (cid:54) tol and then set δ := ∆ t .
4. Estimates on the residual in terms of the Faber series.
For simplifi-cation of formulae we assume in this section that β = 1.Faber series as a means to investigate convergence of Arnoldi’s method wereintroduced in [46]; see also [6]. Denote by W ( A ) be the numerical range of matrix A and let Φ j be the Faber polynomials [63] for the compact W ( A ). Since, for a fixed t , ϕ ( − tz ) is an entire function of z , we can consider the Faber series decomposition(4.1) f ( z ; t ) ≡ tϕ ( − tz ) = 1 − e − tz z = + ∞ (cid:88) j =0 f j ( t )Φ j ( z ) , z ∈ W ( A ) , t (cid:62) , where t is considered as a parameter.The following assertion is an analogue of [14, Proposition 1]. Proposition
The residual (2.4) satisfies the inequality (4.2) (cid:107) r k ( t ) (cid:107) (cid:54) h k +1 ,k + ∞ (cid:88) j = k − | f j ( t ) | (cid:54) (cid:107) A (cid:107) + ∞ (cid:88) j = k − | f j ( t ) | . Proof.
The superexponential convergence in j and the smoothness of f j ( t ) in t enable one to differentiate series (4.1) in t . Decomposition (4.1) implies that theapproximant (2.3) can be represented as(4.3) y k ( t ) = V k f ( H k ; t ) e = V k + ∞ (cid:88) j =0 f j ( t )Φ j ( H k ) e . From (4.1) we see that(4.4) e − tz = 1 − zf ( z ; t ) = 1 − + ∞ (cid:88) j =0 f j ( t ) z Φ j ( z ) . Therefore, since ∂e − tz ∂t = − ze − tz , we obtain, differentiating (4.4), − ze − tz = ∂∂t − + ∞ (cid:88) j =0 f j ( t ) z Φ j ( z ) , − z − + ∞ (cid:88) j =0 f j ( t ) z Φ j ( z ) = − + ∞ (cid:88) j =0 f (cid:48) j ( t ) z Φ j ( z ) , − + ∞ (cid:88) j =0 f j ( t ) z Φ j ( z ) = + ∞ (cid:88) j =0 f (cid:48) j ( t )Φ j ( z ) , − + ∞ (cid:88) j =0 (cid:2) f (cid:48) j ( t ) + zf j ( t ) (cid:3) Φ j ( z ) = 0 . (4.5) xploiting equality deg Φ j = j and relation (2.4), (4.3), (2.2), (4.5) with H k substi-tuted for z , we get r k ( t ) = − ( V k H k + h k +1 ,k v k +1 e T k ) + ∞ (cid:88) j =0 f j ( t )Φ j ( H k ) e + g − V k + ∞ (cid:88) j =0 f (cid:48) j ( t )Φ j ( H k ) e = − V k + ∞ (cid:88) j =0 (cid:2) f (cid:48) j ( t ) I k + H k f j ( t ) (cid:3) Φ j ( H k ) e + V k e − h k +1 ,k v k +1 e T k + ∞ (cid:88) j =0 f j ( t )Φ j ( H k ) e = − h k +1 ,k v k +1 e T k + ∞ (cid:88) j = k − f j ( t )Φ j ( H k ) e , where I k denotes the k × k identity matrix. The bound (cid:107) Φ j ( H k ) (cid:107) (cid:54) Remark Comparing estimate (4.2) and the estimate on the error (cid:107) y ( t ) − y k ( t ) (cid:107) (cid:54) + ∞ (cid:88) j = k | f j ( t ) | (see [6, Theorem 3.2]) shows that the two upper bounds mainly differ in coefficients.Thus, we can conjecture that the error and the residual behave similarly to each other. We assume here that A T = A and the spectralinterval of A is contained in the segment [0 , c ], c >
0. The change of variable x = ( z − c ) /c translates f ( z ; t ) into the function g ( x ; t ) ≡ − e − ct e − ctx c ( x + 1) , and we wish to estimate its Chebyshev coefficients in x , because in this case theFaber series essentially reduces to the Chebyshev series up to a linear variable scaling.Following [56], we denote by a m [ h ] the m th Chebyshev coefficient of a function h defined for x ∈ [ − ,
1] as h ( x ) = + ∞ (cid:88) m =0 (cid:48) a m [ h ] T m ( x ) , where T m is the m th first kind Chebyshev polynomial and the prime (cid:48) means that theterm at m = 0 is divided by two. Lemma
For k (cid:62) the inequality (4.6) + ∞ (cid:88) m = k − (cid:12)(cid:12)(cid:12)(cid:12) a m (cid:20) − e − ct e − ctx c ( x + 1) (cid:21)(cid:12)(cid:12)(cid:12)(cid:12) (cid:54) e − ct c + ∞ (cid:88) ν = k ( ν − k + 2)( ν − k + 1) I ν ( ct ) . holds with Bessel functions I ν .Proof. With formulae [56, Ch. II, §
11, (10), (54)], we have e − ctx = 2 + ∞ (cid:88) j =0 (cid:48) I j ( − ct ) T j ( x ) = 2 + ∞ (cid:88) j =0 (cid:48) ( − j I j ( ct ) T j ( x ) . hen, with δ i,j denoting the Kronecker delta, we obtain a m (cid:20) − e − ct e − ctx c (cid:21) = 2 δ m, − e − ct a m [ e − ctx ] c = 2 δ m, − − m e − ct I m ( ct ) c , which in view of [56, Ch. II, §
9, Theorem 9.4, (42)] gives a m (cid:20) − e − ct e − ctx c ( x + 1) (cid:21) = 2 + ∞ (cid:88) l =0 ( − l (cid:18) l + 11 (cid:19) a m + l +1 (cid:20) − e − ct e − ctx c (cid:21) = 4 c + ∞ (cid:88) l =0 ( − l ( l + 1)[ δ m + l +1 , − e − ct ( − m + l +1 I m + l +1 ( ct )]= 4 c ( − m + ∞ (cid:88) l =0 ( l + 1) e − ct I m + l +1 ( ct ) , whence + ∞ (cid:88) m = k − (cid:12)(cid:12)(cid:12)(cid:12) a m (cid:20) − e − ct e − ctx c ( x + 1) (cid:21)(cid:12)(cid:12)(cid:12)(cid:12) (cid:54) e − ct c + ∞ (cid:88) m = k − ∞ (cid:88) l =0 ( l + 1) I m + l +1 ( ct )= 4 e − ct c + ∞ (cid:88) ν = k (cid:34) ν − k (cid:88) l =0 ( l + 1) (cid:35) I ν ( ct ) = 4 e − ct c + ∞ (cid:88) ν = k ( ν − k + 2)( ν − k + 1)2 I ν ( ct )= 2 e − ct c + ∞ (cid:88) ν = k ( ν − k + 2)( ν − k + 1) I ν ( ct ) . Proposition
For k (cid:62) (cid:107) r k ( t ) (cid:107) (cid:54) e − ct + ∞ (cid:88) ν = k ( ν − k + 2)( ν − k + 1) I ν ( ct ) . Proof.
Gathering (4.2), the definition of c , (4.6) and the fact that Φ [ − , ,ν = 2 T ν , ν (cid:62) §
1, (21)]), derive (cid:107) r k ( t ) (cid:107) (cid:54) c + ∞ (cid:88) j = k − | f j ( t ) | = 2 c + ∞ (cid:88) j = k − (cid:12)(cid:12)(cid:12)(cid:12) a j (cid:20) − e − ct e − ctx c ( x + 1) (cid:21)(cid:12)(cid:12)(cid:12)(cid:12) (cid:54) e − ct + ∞ (cid:88) ν = k ( ν − k + 2)( ν − k + 1) I ν ( ct ) . A similar use of Bessel functions I k can be found in [23, § ν − k + 2)( ν − k + 1) as a complication. Note that the upper bound (4.7) can beeasily computed with a high accuracy by truncating the infinite sum (cid:80) + ∞ ν = k to (cid:80) Mν = k where M can be estimated as discussed in [23, § Lemma
Let µ ∈ N and w (cid:62) . Then the Bessel function I µ obeys the in-equality (4.8) e − w I µ ( w ) (cid:54) w µ (2 µ − . roof. Using formulae [1, 9.6.18, 6.1.8, 6.1.12], derive e − w I µ ( w ) (cid:54) ( w/ µ √ π Γ( µ + 1 / (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) π (cid:90) e w cos θ sin µ θ dθ (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:54) e − w ( w/ µ √ π Γ( µ + 1 / πe w = √ πw µ µ · (2 µ − µ · √ π = w µ (2 µ − . Proposition k of order at least ct in the symmetriccase). If (4.9) k (cid:62) max { ct, } , then (4.10) (cid:107) r k ( t ) (cid:107) (cid:54) ct ) k (2 k − Proof.
Exploiting assumption (4.9) and the inequality (see [51, Proposition 1]) I ν +1 ( w ) < I ν ( w )1 + νw , w > , ν (cid:62) − , obtain for ν (cid:62) k ( ν − k + 3)( ν − k + 2) I ν +1 ( ct )( ν − k + 2)( ν − k + 1) I ν ( ct ) = (cid:18) ν − k + 1 (cid:19) · I ν +1 ( ct ) I ν ( ct ) <
31 + νct (cid:54)
31 + ctct (cid:54) , so that, in view of (4.8), e − ct + ∞ (cid:88) ν = k ( ν − k + 2)( ν − k + 1) I ν ( ct ) (cid:54) e − ct · I k ( ct ) (cid:54) ct ) k (2 k − . It remains to substitute the above bound into (4.7).Proposition 4.5 is an analogue of [23, Theorem 5]. Bound (4.10) would be expo-nentially worse, if we used the Taylor series instead of the Chebyshev one; this clarifiesthe role of Faber series.In Figure 4.1 we plot the graph of the right-hand side of (4.7) for the values c and t as occur in the numerical test presented in Section 6.1. Since (cid:107) A (cid:107) (cid:54) (cid:107) A (cid:107) (cid:107) A (cid:107) ∞ (see, e.g., [44, Chapter 5.6]) and (cid:107) A (cid:107) = (cid:107) A (cid:107) ∞ for A = A T , we can estimate c by (cid:107) A (cid:107) /
2. This leads to the c values reported in caption of Figure 4.1. The t valuesused in the plots (and reported in the caption) are taken to be approximately theaverage time interval length ¯ δ per restart, i.e., t := ¯ δ , obtained from data in Table 6.1as the total time interval length divided by the number of restarts. In the figure wealso plot the actual residual norm values computed with the regular Lanczos process.From Figure 4.1 we see that the obtained upper bound is far from being sharp.This is certainly expected. Although (4.7) is a true upper bound and contains noasymptotic estimates, it does not reflect the sensibility of the Krylov subspace methods
50 100 15010 -15 -13 -11 -09 -07 -05 -03 -01 Krylov dimension k test 1, 40 x 40 x 40 grid c=501.1359, t=0.8||r k || 0 20 40 60 80 100 120 14010 -15 -13 -11 -09 -07 -05 -03 -01 Krylov dimension k test 1, 60 x 60 x 60 grid c=98.9898, t=3.0||r k || Fig. 4.1 . The solid lines are error upper bound (4.7) for c = 501 . , t = ¯ δ = 0 . (left plot)and c = 98 . , t = ¯ δ = 3 . (right plot). The values correspond to A being discretized anisotropicelliptic operator described in Section 6.1, for × × (left plot) and × × (right plot)finite volume grids, respectively. The dashed lines are the actual scaled residual norm values (cid:107) r k (cid:107) /β . with respect to the discrete structure of the spectrum. To confirm this and to showthat the upper bound in (4.7) can be sharp, in Figure 4.2 we plot the upper boundalong with the actual residual norm values for the following tridiagonal A ∈ R n × n and g ∈ R n :(4.11) A = / √ . . . / √ / / , g = . It can be shown that (cid:107) A (cid:107) ≈ n , hence, we take n = 1000 and c = 1. Theresidual values are computed with the regular Lanczos process. As we see from theplots, the upper bound (4.7) is sharp. -15 -13 -11 -09 -07 -05 -03 -01 Krylov dimension k c=1.0, t=1.0||r k || 0 10 20 30 4010 -15 -13 -11 -09 -07 -05 -03 -01 Krylov dimension k c=1.0, t=20||r k || Fig. 4.2 . The solid lines are error upper bound (4.7) for c = 1 , t = 1 (left plot) and c = 1 , t = 20 (right plot). The values correspond to A defined in (4.11) . The dashed lines are the actualscaled residual norm values (cid:107) r k (cid:107) /β obtained for g defined in (4.11) . .3. A weakly non-symmetric case. Proposition
Let thenumerical range W ( A ) be enlcosed by an ellipse E c,ρ with foci and c with the sumof semiaxes ρc , c > , ρ > . Then for k (cid:62) (cid:107) r k ( t ) (cid:107) (cid:54) e − ct + ∞ (cid:88) ν = k ( ν − k + 2)( ν − k + 1) ρ ν I ν ( ct ) . Proof.
The proof is a slight generalization of that of proposition 4.3; the lattercorresponds to the limit case ρ = 1. One should exploit the relation (see [63, Ch. II, §
1, (25)]) between the Faber polynomials for E c,ρ and the Faber polynomials for thefocal segment [0 , c ] (actually, the shifted Chebyshev polynomials).Similarly to (4.7), upper bound (4.12) is easily computable by truncating theinfinite series in (4.12).In Figure 4.3 we plot the right-hand side of (4.12) for the matrices of the secondand third test problems, see Sections 6.2 and 6.3. In each of these two test problemsseveral grids and corresponding parameter settings are used, and the two matricesfor the plots in Figure 4.3 are built for the grids 202 ×
202 (the second test) and512 ×
512 (the third test), respectively. The t values are approximate average timeinterval length ¯ δ per restart, computed as the total time interval length divided bythe number of restarts. To obtain the ellipse parameters appearing in the bound, weapproximate the numerical ranges of the matrices A by the numerical ranges of theirKrylov projections H k = V Tk AV k , see (2.2). This is done for a sufficiently large k such that the exterior eigenvalues of A are well approximated by the eigenvalues of H k . Thus, the plots in Figure 4.3 are produced for approximate spectral data but dosuffice for illustrative purposes. In the figure the actual residual norm is also plotted. Krylov dimension k -15 -13 -11 -9 -7 -5 -3 -1 test 2: complex diffusion c=4.0398e07,t=1.25e-05, =1.0142||r k || Krylov dimension k -15 -13 -11 -9 -7 -5 -3 -1 test 3: dominating convection c=0.49949,t=3750, =1.0937||r k || Fig. 4.3 . The solid lines are error upper bound (4.12) for c = . , t = ¯ δ = , ρ = 1 . (left plot) and c = 0 . , t = ¯ δ = 3750 , ρ = 1 . (right plot). The values correspondto the matrices A of the second test problem (see Section 6.2, grid × ) and the third testproblem (see Section 6.3, grid × ), respectively. The dashed lines are the actual scaledresidual norm values (cid:107) r k (cid:107) /β .
5. Convergence of the restarted method.
Proposition
Let the RT restarted Krylov subspace method be used to com-pute an approximation to the solution y ( t ) , t < ∞ , of IVP (1.1) and let tol be he residual tolerance used at each restart. Then for any restart length k max the RTrestarted Krylov subspace method stops after a finite number J of restarts and theerror of its approximate solution y k ( t ) is bounded by t · tol , (cid:107) y ( t ) − y k ( t ) (cid:107) (cid:54) t · tol , with t being the length of the time interval on which problem (1.1) is solved.Proof. Based on Propositions 3.1, 4.3, 4.6, we know that at each restart j a δ j > tol on thetime interval of length δ j . Hence, since the time interval length t is finite and isreduced at each restart by δ j , a finite number J of restarts will suffice. The timeinterval after j restarts is reduced from [0 , t ] to [0 , t − ( δ + · · · + δ j )] and (cid:80) Jj =1 δ j = t .To simplify the notation in the proof, by y j ( s ), s ∈ [0 , δ j ] we denote the Krylovsubspace solution which is obtained, with k (cid:54) k max steps of the Arnoldi process, atrestart j ( k < k max is possible only for j = J ). We also denote the residual of y j ( s )by r j ( s ). With this notation, y j ( s ) approximates the exact solution y ( (cid:80) j − i =1 δ i + s )for s ∈ [0 , δ j ] with residual r j ( s ) such thatmax s ∈ [0 ,δ j ] (cid:107) r j ( s ) (cid:107) (cid:54) tol . Then the approximate solution at the first restart is y ( δ ) = v + δ ϕ ( − δ A )( g − Av ) + (cid:15) ( δ ) , where (cid:15) ( s ) is the error of the approximation y ( s ) ≈ y ( s ). Using (2.8) we can bound (cid:107) (cid:15) (cid:107) as (cid:107) (cid:15) ( δ ) (cid:107) (cid:54) δ ϕ ( − δ ω ) max s ∈ [0 ,δ ] (cid:107) r ( s ) (cid:107) (cid:54) δ ϕ ( − δ ω ) · tol (cid:54) δ · tol , where ω (cid:62) (cid:15) j = (cid:15) j ( δ j ) and y j = y j ( δ j ). Then at restart 2 we have y = y + (cid:15) + δ ϕ ( − δ A )( g − A ( y + (cid:15) )) + (cid:15) = y + δ ϕ ( − δ A )( g − Ay ) (cid:124) (cid:123)(cid:122) (cid:125) exact solution + δ ϕ ( − δ A )( − A(cid:15) ) + (cid:15) + (cid:15) (cid:124) (cid:123)(cid:122) (cid:125) ε , where we denote all the error terms by ε . Now note that δ ϕ ( − δ A )( − A(cid:15) ) + (cid:15) = exp( − δ A ) (cid:15) , and, hence, ε = exp( − δ A ) (cid:15) + (cid:15) , (cid:107) ε (cid:107) (cid:54) (cid:107) (cid:15) + (cid:15) (cid:107) (cid:54) ( δ + δ ) · tol . Here we bound (cid:15) in the same way as (cid:15) and use estimate (3.2). Continuing in thesame way, we obtain at the last restart Jε J = exp( − ( t − δ ) A ) (cid:15) + exp( − ( t − δ − δ ) A ) (cid:15) + . . . + exp − ( t − J − (cid:88) j =1 δ j ) A (cid:15) J − + (cid:15) J , (cid:107) ε J (cid:107) (cid:54) (cid:107) (cid:15) (cid:107) + · · · + (cid:107) (cid:15) J (cid:107) (cid:54) ( δ + · · · + δ J ) · tol = t · tol .
20 40 60 80 100 120 140 160 180
Krylov dimension k -15 -13 -11 -9 -7 -5 -3 -1 test 1, 40 x 40 x 40 grid t=0.1t=0.2t=0.4t=0.8 Fig. 5.1 . Residual upper bound (4.7) for c = 501 . (the first test problem, × × grid)and different time interval lengths t . The number of restarts J appearing in Proposition 5, can be estimated from theresidual upper bounds (4.7),(4.10),(4.12). This can be done, for example, using plotsas shown in Figure 5.1. There, for the test problem described in Section 6.1, we plotthe bound (4.7) for different time interval lengths t . For instance, if k max ≈
30 andthe requested tolerance is in the range [10 − , − ], then the restart interval length δ should be approximately 0 . t = 1000 used in the test, J ≈ / . J . Indeed, as we see from Table 6.1, the actual number of restarts is roughly fourtimes smaller.
6. Numerical experiments.
In the experiments presented here we test ourmethod to compute actions of ϕ ( − tA ) implemented with the proposed RT restarting.We compare our code phiRT with two other well known Krylov subspace codes forevaluating ϕ ( − tA ), namely the phiv function of the EXPOKIT package [60] and the phip function presented in [53] and publicly available online . We do not include therecently proposed code phipm [53] in the comparisons because this code uses a variableKrylov dimension up to 100. Using such a high Krylov dimension in the large scaleproblems is not always feasible, and in the tests we restrict the Krylov dimension forall the three solvers by 30.All the tests are carried out on a Linux laptop with four AMD Ryzen 3 3200UCPUs (clock rate 2.6 GHz) and 8 Gb memory, in Octave (version 4.2.2) with noparallelization. Our implementation of the Arnoldi and Lanczos processes does notinclude Krylov basis vector reorthogonalization and no serious orthogonality loss hasbeen observed. In the numerical tests the Krylov subspace matrices are computed bythe Lanczos process as soon as A = A T , otherwise the Arnoldi process is employed.To calculate matrix-vector products ϕ ( − tH k ) x for the small projected Krylovsubspace matrix H k in our algorithm (cf. (3.8)), we employ relation [57, Proposi-tion 2.1] exp( (cid:20) − tH k x (cid:21) ) = (cid:20) exp( − tH k ) ϕ ( − tH k ) x (cid:21) , computing the matrix exponential at the left hand side and, thus, reducing com-putations with ϕ to computations with the matrix exponential (see also [4]). Tocompute the matrix exponentials exp( − tH k ) here and in (3.8) the standard built-in ∼ jitse/software.html15 unction expm of Octave is used. It is based on Ward’s scaling and squaring methodcombined with diagonal Pad´e approximation and is certainly not the best methodavailable, see [50, 38]. For the moderate tolerance range [10 − , − ] we aim in thetests, we have observed no accuracy problems with this Octave implementation. Wehave chosen to stick to the built-in function expm to have our Octave code fully Mat-lab portable. Note that Matlab’s implementation of expm [3] is seen as more reliablebut is not publicly available. For higher accuracy requirements we could switch, forinstance, to the phipade function of the EXPINT package [7].In all the tests the reported achieved error is measured for each of the three Krylovsubspace solvers as (cid:107) y k ( t ) − y ref ( t ) (cid:107)(cid:107) y ref ( t ) (cid:107) , where y k ( t ) is the solution of the Krylov subspace solver at final time t and y ref ( t ) isa reference solution computed with a high tolerance (reported for each test below) bythe phiv function. Since the reference solution y ref ( t ) is affected by the same spacediscretization error as y k ( t ), (cid:107) y k ( t ) − y ref ( t ) (cid:107) is a good indicator of the time error. In this test wesolve IVP (1.1) with a symmetric matrix A . It is a 3D (three-dimensional) anisotropicelliptic operator 10 ∂ xx + 10 ∂ yy + ∂ zz with periodic boundary conditions defined inthe spatial domain [0 , × [0 , × [0 ,
1] and discretized by a fourth order accurate finitevolume discretization described in [68]. The source vector g contains the grid valuesof either of the functionssin(2 πx ) sin(2 πy ) sin(2 πz ) + x (1 − x ) y (1 − y ) z (1 − z ) , e − x − . +( y − . +( z − . )) + x (1 − x ) y (1 − y ) z (1 − z ) . We use two different source vectors g to make sure that Krylov subspace methods donot profit from a particular choice of the source vector. In this test the initial valuevector v in (1.1) is set to zero.In the test we use uniform 40 × ×
40 and 60 × ×
60 grids, for which theproblem size is n = 64 000 and n = 216 000, respectively. For these grids the matrix A has its norms (cid:107) A (cid:107) = (cid:107) A (cid:107) ∞ approximately equal to 1002 .
27 and 197 .
98, respectively.These values give the estimations (cid:107) A (cid:107) / c used in Figure 4.1. The final timeis t = 1000 and the reference solution is computed by function phiv with tolerance tol = .The results for this test are presented in Table 6.1 and in Figure 6.1, wherefor each of the solvers the values of achieved accuracy, the number of matrix-vectormultiplications (matvecs) and the CPU times are given. These values are reported forthe same tolerance . Exploring performance of the solvers for different tolerancevalues leads to plots in Figure 6.1. As can be seen in the plots, on the coarser gridfunction phiv fails to produce results with an accuracy less stringent than ≈ − .Its least accurate result (corresponding to the upper left end of the dashed curve)is obtained for tolerance . An attempt to run the code with a less stringenttolerance yields an error message. Furthermore, the phip solver does not producethe results within the required accuracy range [10 − , − ] and appears to be moreexpensive for this test than our phiRT . For both spatial grids the phip solver exhibitsan abrupt change in the delivered accuracy with gradually varying requested tolerance,indicated in the Figure by segments AB and CD . Thus, we see that both phiv and phip solvers have difficulties with delivering results within the desired accuracy range[10 − , − ]. able 6.1 Results for the fourth order finite-volume discretized anisotropic heat equation method(Krylov dim.), delivered matvecs /requested tolerance error CPU time, s40 × ×
40 grid, sine g phiv (10), phip (10), phiRT (10), phiv (30), phip (30), phiRT (30), × ×
40 grid, Gaussian g phiv (30), phip (30), phiRT (30), × ×
60 grid, sine g phiv (30), phip (30), phiRT (30), -13 -12 -11 -10 -09 -08 -07 -06 -05 -04 -03 -02 -01 a cc u r a cy A BphivphipphiRT 100 200 300 400 50010 -13 -12 -11 -10 -09 -08 -07 -06 -05 -04 -03 -02 -01 CPU time a cc u r a cy A B phivphipphiRT5000 10000 1500010 -13 -12 -11 -10 -09 -08 -07 -06 -05 -04 -03 -02 -01 a cc u r a cy C D phivphipphiRT 100 200 300 400 500 600 70010 -13 -12 -11 -10 -09 -08 -07 -06 -05 -04 -03 -02 -01 CPU time a cc u r a cy C D phivphipphiRT
Fig. 6.1 . The fourth order finite-volume discretized anisotropic heat equation. Relative errorversus number of the matrix-vector products (left) and versus the CPU time (right) for the phiv (dashed line), phip (dash-dotted line) and phiRT (solid line). The restart length is 30 for all thethree solvers. The upper left end of the phiv curve at the top plots is obtained for tolerance and running phiv with tolerance fails with an error message. The points A , B , C , and D onthe phip curve are obtained for the tolerances , , , and , respectively. − − − y x -1 -0.5 0 0.5 1-1-0.500.51 x y velocity field Fig. 6.2 . Diffusion coefficients D , ( x, y ) (left) and velocity field [ v ( x, y ) , v ( x, y )] (right). Theregion where D , = 10 occupies the square [ − . , . × [ − . , . and is surrounded by the wallwhere D , = 10 − which is . wide and has a . broad slit. We now consider a nonsymmetric matrix A resulting from a standard fivepoint central-difference discretization of the following convection–diffusion operator: L [ u ] = − ( D u x ) x − ( D u y ) y + Pe (cid:18)
12 ( v u x + v u y ) + 12 (( v u ) x + ( v u ) y ) (cid:19) , where ( x, y ) ∈ [ − , × [ − ,
1] and the homogeneous Dirichlet conditions are imposedon the boundary. Here the first derivatives representing the convective terms arewritten in a special way. If the velocity field is divergence-free, i.e., if ( v ) x + ( v ) y =0, then the central-difference discretization of the convective terms yields a skew-symmetric matrix [47]. This physically desirable property [68] guarantees that thefirst derivatives do not contribute to the symmetric part of the matrix which is thecase for diffusive upwind discretizations.The diffusion coefficients D , ( x, y ) are defined as shown at the left plot in Fig-ure 6.2. This choice is inspired by [66, test 4]. The velocity field, taken as v ( x, y ) = y (1 − x ) , v ( x, y ) = x ( y − , is the recirculating wind adopted from the IFISS package [62], see the right plot inFigure 6.2. The initial value vector v , cf. (1.1), has all its entries set to 0 .
01. For v = 0 the solution may become unstable due to appearance of small negative solutionvalues (recall that central difference discretizations are used for both diffusion andconvection terms). The components of the source vector g are the values of function1000e − x + y ) on the finite difference grid. Uniform 202 × ×
402 and 802 ×
802 grids areused in the experiments, so that the problem size is n = 40 000, n = 160 000 and n = 640 000, respectively. On the first two grids the Peclet number Pe is set to 10,whereas Pe = 20 on the finest grid. For these Peclet values and all the three gridsthe ratio of the norms of the skew-symmetric and symmetric parts of the matrices,measured as the maximal row and column norms, is of order 10 − , i.e., the matrices are ig. 6.3 . Solution of finite-difference discretized convection–diffusion problem at time t =5 × − (left) and t = 10 − (right), × grid, Pe = 10 . At the right plot a correspondenceto the values of the diffusion coefficients D , is clearly visible, with a leakage through the slit, seeFigure 6.2. weakly nonsymmetric. Solution samples of this test problem can be seen in Figure 6.3.The reference solution is computed by the phiv function run with tolerance .The results of the comparison runs are presented in Table 6.2. To see clearlythe dependence of the attained accuracy on the computational work and CPU time,we also present plots in Figure 6.4. We see that phiv tends to deliver a much moreaccurate solution than is requested by given tolerance. We also see that for thecoarsest 202 ×
202 grid our solver is more efficient than phip in terms of matrix–vector multiplications but requires more CPU time. This effect disappears on the finergrids. There, for the moderate tolerance values we are interested in, phiRT clearlyoutperforms the other solvers in terms of CPU time and matrix–vector multiplications.
In this test IVP (1.1) is solved where A and g are produced by theIFISS software package [62, 61, 28]. More precisely, A results from a finite-elementdiscretization of the following two-dimensional convection–diffusion problem:(6.1) − ν ∇ u + (cid:126)w · ∇ u = 0 , u = u ( x, y ) , ( x, y ) ∈ [ − , × [ − , , where ν is viscosity parameter and Dirichlet boundary conditions u = 0 . y = 1, where u ( x,
1) = 12 + 12 (1 − x ) . The initial value vector v in (1.1) is set to zero in this test.Discretization of (6.1) in IFISS by bilinear quadrilateral ( Q ) finite elements withthe streamline upwind Petrov–Galerkin (SUPG) stabilization leads to a linear system Au = g , where g enforces the Dirichlet boundary conditions. Thus, the IVP we solveis a nonstationary version of the stationary discretized convection–diffusion problem.We use uniform 512 ×
512 and 1024 × ν = 1 / ν = 1 / ν min (cid:26) h x cos α , h y sin α (cid:27) (cid:107) (cid:126)w (cid:107) , α = arctan w y w x , is ≈
25, where h x,y and w x,y are respectively the element sizes and the componentsof the velocity (cid:126)w . able 6.2 Results for the finite-difference discretized convection–diffusion test problem. method(Krylov dim.), delivered matvecs /requested tolerance error CPU time, s202 ×
202 grid, Pe = 10, t = 10 − phiv (30), phiv (30), phip (30), phip (30), phiRT (30), phiRT (30), ×
402 grid, Pe = 10, t = 10 − phiv (30), phiv (30), phip (30), phip (30), phiRT (30), phiRT (30), phiv (20), phip (20), phiRT (20), phiRT (20), ×
802 grid, Pe = 20, t = 5 · − phiv (30), phip (30), phiRT (30), phiv (20), phip (20), phiRT (20), (cid:107) A + A T (cid:107) = 1, (cid:107) A − A T (cid:107) ≈ . ×
512 grid and (cid:107) A + A T (cid:107) = 1, (cid:107) A − A T (cid:107) ≈ . × t = 2 · . For this t the solution y ( t )of IVP (1.1) still differs significantly from the stationary solution A − g to which y ( t )converges for growing t , see Figure 6.5. The reference solution is computed by phiv with tolerance .The results are presented in Tables 6.3 and 6.4 and in Figure 6.6. We see that phiRT appears to be more efficient for this test and that obtaining a time error withinthe desired tolerance range [10 − , − ] can be problematic with phiv and phip . Onboth grids the phip solver demonstrates a remarkable “all-or-nothing” behavior: onthe 1024 × to yields adrop in the delivered accuracy from to . This last accuracy valueis not plotted in Figure 6.6 as it is too low for a satisfactory PDE solution.
7. Conclusions.
We have proposed a Krylov subspace method for computingmatrix-vector products with the ϕ matrix function. It is based on the residual no-tion (2.4), which is essentially used in our algorithm both as a stopping criterion andfor efficient restarting. Our analysis provides true upper bounds for the residual, withno asymptotic estimates, and shows convergence of the restarted method for an arbi-
000 10000 15000 20000 2500010 -11 -10 -09 -08 -07 -06 -05 -04 -03 -02 a cc u r a cy
402 x 402 grid, Pe = 10 phivphipphi-RT
50 100 150 200 250 300 350 40010 -11 -10 -09 -08 -07 -06 -05 -04 -03 -02 CPU time a cc u r a cy
402 x 402 grid, Pe = 10 phivphipphi-RT -12 -11 -10 -09 -08 -07 -06 -05 -04 -03 -02 a cc u r a cy
802 x 802 grid, Pe = 20 phivphipphi-RT
100 200 300 400 50010 -12 -11 -10 -09 -08 -07 -06 -05 -04 -03 -02 CPU time a cc u r a cy
802 x 802 grid, Pe = 20 phivphipphi-RT
Fig. 6.4 . Finite-difference discretized convection–diffusion problem. Relative error versus num-ber of the matrix-vector products (left) and versus the CPU time (right) for phiv (dashed line), phip (dash-dotted line) and phiRT (solid line). Top plots: × grid, Pe = 10 , t = 10 − , bottomplots: × grid, Pe = 20 , t = 5 · − . The restart length is 30 for all the three solvers.The loosest tolerance value for phiv and phip solvers is . Hence, it is problematic to obtaincheaper, less accurate results with these solvers. Fig. 6.5 . Solution of the IFISS convection-dominated problem y ( t ) at t = 2 · (left) and thestationary solution A − g (right) on the × grid. trary final time interval [0 , t ], t < ∞ , and for any restart length (i.e., Krylov subspacedimension).Since the residual can be seen as the backward error, the residual based stoppingcriterion and residual-time (RT) restarting provide a reliable error control. This isconfirmed by the performance of the method: in all the test runs it either outperforms able 6.3 Results for the finite-element discretized convection-dominated convection–diffusion test prob-lem, ν = 1 / , × grid method(Krylov dim.), delivered matvecs /requested tolerance error CPU time, s phiv (30),
160 / 5.6 phiv (30),
224 / 7.7 phip (30),
31 / 1.2 phip (30),
210 / 7.7 phip (30),
180 / 6.7 phiRT (30),
89 / 3.3 phiRT (30),
160 / 5.7 phiv (10),
216 / 4.3 phip (10),
320 / 7.7 phiRT (10),
166 / 4.1
50 100 150 20010 -9 -8 -7 -6 -5 -4 -3 -2 -1 a cc u r a cy
512 x 512 grid phivphipphi-RT 0 2 4 6 810 -9 -8 -7 -6 -5 -4 -3 -2 -1 CPU time a cc u r a cy
512 x 512 grid phivphipphi-RT0 20 40 60 80 100 120 14010 -11 -10 -09 -08 -07 -06 -05 -04 -03 -02 -01 a cc u r a cy phivphipphi-RT 0 5 10 15 2010 -11 -10 -09 -08 -07 -06 -05 -04 -03 -02 -01 CPU time a cc u r a cy phivphipphi-RT Fig. 6.6 . The finite-element discretized convection-dominated convection–diffusion test prob-lem. Relative error versus number of the matrix-vector products (left) and versus the CPU time(right) for phiv (dashed line), phip (dash-dotted line) and phiRT (solid line). The restart lengthis 30 for all the three solvers. Perturbing requested tolerance from to for phip yieldsa change in delivered accuracy from to , see the two upper left points of thedash-dotted line in the upper plots, × grid. A similar effect is observed for × grid. able 6.4 Results for the finite-element discretized convection-dominated convection–diffusion test prob-lem, ν = 1 / , × grid method(Krylov dim.), delivered matvecs /requested tolerance error CPU time, s phiv (30),
128 / 18.1 phiv (30),
128 / 18.2 phip (30), phip (30),
60 / 9.0 phip (30),
90 / 13.8 phiRT (30), phiRT (30), phiRT (30),
19 / 2.5 phiRT (30),
28 / 4.2 phiv (10),
60 / 5.2 phiv (10),
96 / 8.2 phip (10), phip (10),
100 / 9.9 phiRT (10), phiRT (10), phiRT (10),
45 / 4.4the phiv and phip solvers or delivers a comparable efficiency. More importantly,in all the test runs our method appears to deliver a monotone dependence of theobtained accuracy on the required tolerance. The phiv and phip solvers have in somecases difficulties with delivering error in the desired moderate PDE tolerance range[10 − , − ]; they tend to produce more accurate and more expensive results thanrequested.Thus, the presented residual based Krylov subspace method for evaluating matrix-vector products with the ϕ matrix function appears to be a promising approach usefulin practical computations. REFERENCES[1]
M. Abramowitz and I. A. Stegun , Handbook of mathematical functions with formulas,graphs, and mathematical tables , vol. 55 of National Bureau of Standards Applied Math-ematics Series, For sale by the Superintendent of Documents, U.S. Government PrintingOffice, Washington, D.C., 1964.[2]
M. Afanasjew, M. Eiermann, O. G. Ernst, and S. G¨uttel , Implementation of a restartedKrylov subspace method for the evaluation of matrix functions , Linear Algebra Appl., 429(2008), pp. 2293–2314.[3]
A. H. Al-Mohy and N. J. Higham , A new scaling and squaring algorithm for the matrixexponential , SIAM Journal on Matrix Analysis and Applications, 31 (2010), pp. 970–989,https://doi.org/10.1137/09074721X.[4]
A. H. Al-Mohy and N. J. Higham , Computing the action of the matrix exponential, withan application to exponential integrators , SIAM J. Sci. Comput., 33 (2011), pp. 488–511.http://dx.doi.org/10.1137/100788860.[5]
B. Beckermann , Image num´erique, GMRES et polynˆomes de Faber , C. R. Acad. Sci. Paris:Ser. I, 340 (2005), pp. 855–860.[6]
B. Beckermann and L. Reichel , Error estimation and evaluation of matrix functions via theFaber transform , SIAM J. Num. Anal., 47 (2009), pp. 3849–3883.[7]
H. Berland, B. Skaflestad, and W. M. Wright , EXPINT—a MATLAB package for ex- onential integrators M. Botchev, I. Farag´o, and R. Horv´ath , Application of operator splitting to theMaxwell equations including a source term , Applied Numerical Mathematics, 59(2009), pp. 522–541, https://doi.org/DOI:10.1016/j.apnum.2008.03.031, http://dx.doi.org/10.1016/j.apnum.2008.03.031.[9]
M. A. Botchev , Krylov subspace exponential time domain solution of Maxwell’s equations inphotonic crystal modeling , J. Comput. Appl. Math., 293 (2016), pp. 24–30. http://dx.doi.org/10.1016/j.cam.2015.04.022.[10]
M. A. Botchev, V. Grimm, and M. Hochbruck , Residual, restarting and Richardson it-eration for the matrix exponential , SIAM J. Sci. Comput., 35 (2013), pp. A1376–A1397.http://dx.doi.org/10.1137/110820191.[11]
M. A. Botchev, A. M. Hanse, and R. Uppu , Exponential Krylov time integration for modelingmulti-frequency optical response with monochromatic sources , Journal of Computationaland Applied Mathematics, 340 (2018), pp. 474–485. https://doi.org/10.1016/j.cam.2017.12.014.[12]
M. A. Botchev, D. Harutyunyan, and J. J. W. van der Vegt , The Gautschi time steppingscheme for edge finite element discretizations of the Maxwell equations , J. Comput. Phys.,216 (2006), pp. 654–686. http://dx.doi.org/10.1016/j.jcp.2006.01.014.[13]
M. A. Botchev and L. A. Knizhnerman , ART: Adaptive residual-time restarting for Krylovsubspace matrix exponential evaluations , J. Comput. Appl. Math., 364 (2020), p. 112311.https://doi.org/10.1016/j.cam.2019.06.027.[14]
M. A. Botchev and L. A. Knizhnerman , ART: Adaptive residual-time restarting for Krylovsubspace matrix exponential evaluations , J. Comput. Appl. Math., 364 (2020). https://doi.org/10.1016/j.cam.2019.06.027.[15]
M. A. Botchev, G. L. G. Sleijpen, and H. A. van der Vorst , Stability control for approx-imate implicit time stepping schemes with minimum residual iterations , Appl. Numer.Math., 31 (1999), pp. 239–253.[16]
[16] G. D. Byrne, A. C. Hindmarsh, and P. N. Brown, VODPK, large non-stiff or stiff ordinary differential equation initial-value problem solver.
[17] M. Caliari and A. Ostermann, Implementation of exponential Rosenbrock-type integrators, Appl. Numer. Math., 59 (2009), pp. 568–581, https://doi.org/10.1016/j.apnum.2008.03.021.
[18] E. Celledoni and I. Moret, A Krylov projection method for systems of ODEs, Appl. Numer. Math., 24 (1997), pp. 365–378, https://doi.org/10.1016/S0168-9274(97)00033-0.
[19] J. Certaine, The solution of ordinary differential equations with large time constants, in Mathematical Methods for Digital Computers, A. Ralston and H. S. Wilf, eds., Wiley, New York, 1960, pp. 128–132.
[20] T. F. Chan and K. R. Jackson, The use of iterative linear-equation solvers in codes for large systems of stiff IVPs for ODEs, SIAM J. Sci. Stat. Comput., 7 (1986), pp. 378–417.
[21] H. De Raedt, K. Michielsen, J. S. Kole, and M. T. Figge, One-step finite-difference time-domain algorithm to solve the Maxwell equations, Phys. Rev. E, 67 (2003), p. 056706.
[22] V. L. Druskin, A. Greenbaum, and L. A. Knizhnerman, Using nonorthogonal Lanczos vectors in the computation of matrix functions, SIAM J. Sci. Comput., 19 (1998), pp. 38–54, https://doi.org/10.1137/S1064827596303661.
[23] V. L. Druskin and L. A. Knizhnerman, Two polynomial methods of calculating functions of symmetric matrices, U.S.S.R. Comput. Maths. Math. Phys., 29 (1989), pp. 112–121.
[24] V. L. Druskin and L. A. Knizhnerman, Krylov subspace approximations of eigenpairs and matrix functions in exact and computer arithmetic, Numer. Lin. Alg. Appl., 2 (1995), pp. 205–217.
[25] M. Eiermann and O. G. Ernst, A restarted Krylov subspace method for the evaluation of matrix functions, SIAM J. Numer. Anal., 44 (2006), pp. 2481–2504.
[26] M. Eiermann, O. G. Ernst, and S. Güttel, Deflated restarting for matrix functions, SIAM J. Matrix Anal. Appl., 32 (2011), pp. 621–641.
[27] M. Eiermann, O. G. Ernst, and S. Güttel, Deflated restarting for matrix functions, SIAM J. Matrix Anal. Appl., 32 (2011), pp. 621–641.
[28] H. C. Elman, A. Ramage, and D. J. Silvester, IFISS: A computational laboratory for investigating incompressible flow problems, SIAM Rev., 56 (2014), pp. 261–273.
[29] R. P. Fedorenko, Stiff systems of ordinary differential equations, in Numerical Methods and Applications, CRC, Boca Raton, FL, 1994, pp. 117–154.
[30] X. Fu, C. Wang, P. Li, and L. Wang, Exponential integration algorithm for large-scale wind farm simulation with Krylov subspace acceleration, Applied Energy, 254 (2019), p. 113692, https://doi.org/10.1016/j.apenergy.2019.113692.
[31] E. Gallopoulos and Y. Saad, Efficient solution of parabolic equations by Krylov approximation methods, SIAM J. Sci. Statist. Comput., 13 (1992), pp. 1236–1264, http://dx.doi.org/10.1137/0913071.
[32] S. Gaudreault, G. Rainwater, and M. Tokman, KIOPS: A fast adaptive Krylov subspace solver for exponential integrators, J. Comput. Phys., 372 (2018), pp. 236–255, https://doi.org/10.1016/j.jcp.2018.06.026.
[33] C. W. Gear and Y. Saad, Iterative solution of linear equations in ODE codes, SIAM J. Sci. Stat. Comput., 4 (1983), pp. 583–601.
[34] G. H. Golub and C. F. Van Loan, Matrix Computations, The Johns Hopkins University Press, Baltimore and London, third ed., 1996.
[35] S. Güttel, A. Frommer, and M. Schweitzer, Efficient and stable Arnoldi restarts for matrix functions based on quadrature, SIAM J. Matrix Anal. Appl., 35 (2014), pp. 661–683.
[36] E. Hansen and A. Ostermann, High-order splitting schemes for semilinear evolution equations, BIT Numerical Mathematics, 56 (2016), pp. 1303–1316.
[37] N. J. Higham, Functions of Matrices: Theory and Computation, Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2008.
[38] N. J. Higham and A. H. Al-Mohy, Computing matrix functions, Acta Numer., 19 (2010), pp. 159–208, https://doi.org/10.1017/S0962492910000036.
[39] M. Hochbruck and C. Lubich, On Krylov subspace approximations to the matrix exponential operator, SIAM J. Numer. Anal., 34 (1997), pp. 1911–1925.
[40] M. Hochbruck, C. Lubich, and H. Selhofer, Exponential integrators for large systems of differential equations, SIAM J. Sci. Comput., 19 (1998), pp. 1552–1574.
[41] M. Hochbruck and A. Ostermann, Exponential integrators, Acta Numer., 19 (2010), pp. 209–286, https://doi.org/10.1017/S0962492910000048.
[42] M. Hochbruck, A. Ostermann, and J. Schweitzer, Exponential Rosenbrock-type methods, SIAM J. Numer. Anal., 47 (2008/09), pp. 786–803, https://doi.org/10.1137/080717717.
[43] M. Hochbruck, T. Pažur, A. Schulz, E. Thawinan, and C. Wieners, Efficient time integration for discontinuous Galerkin approximations of linear wave equations, ZAMM, 95 (2015), pp. 237–259, http://dx.doi.org/10.1002/zamm.201300306.
[44] R. A. Horn and C. R. Johnson, Matrix Analysis, Cambridge University Press, 1986.
[45] W. Hundsdorfer and J. G. Verwer, Numerical Solution of Time-Dependent Advection-Diffusion-Reaction Equations, Springer Verlag, 2003.
[46] L. A. Knizhnerman, Calculation of functions of unsymmetric matrices using Arnoldi’s method, U.S.S.R. Comput. Maths. Math. Phys., 31 (1991), pp. 1–9.
[47] L. A. Krukier, Implicit difference schemes and an iterative method for solving them for a certain class of systems of quasi-linear equations, Sov. Math., 23 (1979), pp. 43–55. Translation from Izv. Vyssh. Uchebn. Zaved., Mat. 1979, No. 7(206), 41–52 (1979).
[48] J. D. Lawson, Generalized Runge-Kutta processes for stable systems with large Lipschitz constants, SIAM J. Numer. Anal., 4 (1967), pp. 372–380.
[49] J. Legras, Résolution numérique des grands systèmes différentiels linéaires, Numer. Math., 8 (1966), pp. 14–28.
[50] C. B. Moler and C. F. Van Loan, Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later, SIAM Rev., 45 (2003), pp. 3–49.
[51] I. Nåsell, Inequalities for modified Bessel functions, Math. Comp., 28 (1974), pp. 253–256, https://doi.org/10.2307/2005831.
[52] J. Niehoff, Projektionsverfahren zur Approximation von Matrixfunktionen mit Anwendungen auf die Implementierung exponentieller Integratoren, PhD thesis, Mathematisch-Naturwissenschaftliche Fakultät der Heinrich-Heine-Universität Düsseldorf, December 2006.
[53] J. Niesen and W. M. Wright, Algorithm 919: A Krylov subspace algorithm for evaluating the ϕ-functions appearing in exponential integrators, ACM Trans. Math. Softw., 38 (2012), pp. 22:1–22:19, https://doi.org/10.1145/2168773.2168781.
[54] T. J. Park and J. C. Light, Unitary quantum time evolution by iterative Lanczos reduction, J. Chem. Phys., 85 (1986), pp. 5870–5876.
[55] B. N. Parlett, The Symmetric Eigenvalue Problem, SIAM, 1998.
[56] S. Paszkowski, Computational Applications of Chebyshev Polynomials and Series, Nauka, 1983. Russian, translated from Polish.
[57] Y. Saad, Analysis of some Krylov subspace approximations to the matrix exponential operator, SIAM J. Numer. Anal., 29 (1992), pp. 209–228.
[58] Y. Saad, Iterative Methods for Sparse Linear Systems, ∼saad/books.html.
[59] T. Schmelzer and L. N. Trefethen, Evaluating matrix functions for exponential integrators via Carathéodory-Fejér approximation and contour integrals, Electron. Trans. Numer. Anal., 29 (2007/08), pp. 1–18.
[60] R. B. Sidje, Expokit: A software package for computing matrix exponentials.
[61] D. J. Silvester, H. C. Elman, and A. Ramage, Algorithm 886: IFISS, a Matlab toolbox for modelling incompressible flow, ACM Trans. Math. Softw., 33 (2007).
[62] D. J. Silvester, H. C. Elman, and A. Ramage, Incompressible flow & iterative solver software.
[63] P. K. Suetin, Series of Faber Polynomials, CRC Press, 1998.
[64] H. Tal-Ezer, Spectral methods in time for parabolic problems, SIAM J. Numer. Anal., 26 (1989), pp. 1–11.
[65] H. A. van der Vorst, An iterative solution method for solving f(A)x = b, using Krylov subspace information obtained for the symmetric positive definite matrix A, J. Comput. Appl. Math., 18 (1987), pp. 249–263.
[66] H. A. van der Vorst, BiCGSTAB: a fast and smoothly converging variant of BiCG for the solution of nonsymmetric linear systems, SIAM J. Sci. Stat. Comput., 13 (1992), pp. 631–644.
[67] H. A. van der Vorst, Iterative Krylov Methods for Large Linear Systems, Cambridge University Press, 2003.
[68] R. W. C. P. Verstappen and A. E. P. Veldman, Symmetry-preserving discretization of turbulent flow, J. Comput. Phys., 187 (2003), pp. 343–368, http://dx.doi.org/10.1016/S0021-9991(03)00126-8.
[69] X. Wang, P. Chen, and C. Cheng, Stability and convergency exploration of matrix exponential integration on power delivery network transient simulation, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, (2019), pp. 1–1, https://doi.org/10.1109/TCAD.2019.2954473.
[70]