A Block Krylov Subspace Implementation of the Time-Parallel Paraexp Method and its Extension for Nonlinear Partial Differential Equations
GIJS L. KOOIJ†, MIKE A. BOTCHEV‡, AND BERNARD J. GEURTS†§

Abstract.
A parallel time integration method for nonlinear partial differential equations is proposed. It is based on a new implementation of the Paraexp method for linear partial differential equations (PDEs) employing a block Krylov subspace method. For nonlinear PDEs the algorithm is based on our Paraexp implementation within a waveform relaxation. The initial value problem is solved iteratively on a complete time interval. Nonlinear terms are treated as a source term, provided by the solution from the previous iteration. At each iteration, the problem is decoupled into independent subproblems by the principle of superposition. The decoupled subproblems are solved fast by exponential integration, based on a block Krylov method. The new time integration is demonstrated for the one-dimensional advection-diffusion equation and the viscous Burgers equation. Numerical experiments confirm excellent parallel scaling for the linear advection-diffusion problem, and good scaling when the nonlinear Burgers equation is simulated.
Key words. parallel computing, exponential integrators, partial differential equations, parallel in time, block Krylov subspace.
AMS subject classifications.
1. Introduction.
Recent developments in hardware architecture, bringing to practice computers with hundreds of thousands of cores, urge the creation of new, as well as a major revision of existing, numerical algorithms [14]. To be efficient on such massively parallel platforms, the algorithms need to employ all possible means to parallelize the computations. When solving partial differential equations (PDEs) with a time-dependent solution, an important way to parallelize the computations is, next to parallelization across space, parallelization across time. This adds a new dimension of parallelism with which the simulations can be implemented. In this paper we present a new time-parallel integration method extending the Paraexp method [21] to nonlinear partial differential equations using Krylov methods and waveform relaxation.

Several approaches to parallelize the simulation of time-dependent solutions in time can be distinguished. The first important class of methods are the waveform relaxation methods [6, 38, 44, 56], including the space-time multigrid methods for parabolic PDEs [8, 24, 30, 37]. The key idea is to start with an approximation to the numerical solution on the whole time interval of interest and to update the solution by solving an easier-to-solve approximate system in time. The Parareal method [36], which has attracted significant attention recently, is a prime example related to the class of waveform relaxation methods [19].

Parallel Runge–Kutta methods and general linear methods, where the parallelism is determined and restricted by the number of stages or steps, form another class of time-parallel methods [8, 12, 52]. Time-parallel schemes can also be obtained by treating time as an additional space dimension and solving for values at all time steps at once [13, 57]. This approach requires significantly more memory and is used, e.g., in multidimensional tensor computations and (discontinuous) Galerkin finite element methods [9, 31, 35, 54]. Recently a "parallel full approximation scheme in space and time" (PFASST) was introduced, which is based on multigrid methods [17]. PFASST was observed to have a generally improved parallel efficiency compared to Parareal. We also mention parallel methods that facilitate parallelism by replacing the exact solves in implicit schemes by approximate ones [7], and parallel methods based on operator splitting [4]. Finally, there is the Paraexp method [21] for parallel integration of linear initial-value problems. This algorithm is based on the fact that linear initial-value problems can be decomposed into homogeneous and nonhomogeneous subproblems. The homogeneous subproblems can be solved fast by an exponential integrator, while the nonhomogeneous subproblems can be solved by a traditional time-stepping method.

In this paper we propose a time-parallel method which is based on the matrix exponential time stepping technique and waveform relaxation. Our method adopts the Paraexp method within a waveform relaxation framework, which enables the extension of parallel time integration to nonlinear PDEs. The method is inspired by and relies on recent techniques in Krylov subspace matrix exponential methods such as shift-and-invert acceleration [22, 40, 51], restarting [1, 10, 16, 45, 50] and the use of block Krylov subspaces [2] to handle nonautonomous PDEs efficiently. The method also relies on a singular value decomposition (SVD) of source terms in the PDE, which is used to construct the block Krylov subspace. To improve the efficiency of the method, the SVD is truncated by retaining only the relatively large singular values.

†Multiscale Modeling and Simulation, Faculty of EEMCS, University of Twente, 7500 AE Enschede, The Netherlands. Email: [email protected] (first and corresponding author), [email protected] (third author).
‡Mathematics of Computational Science, Faculty of EEMCS, University of Twente, 7500 AE Enschede, The Netherlands ([email protected]).
§Faculty of Applied Physics, Fluid Dynamics Laboratory, Eindhoven University of Technology, Eindhoven, The Netherlands.
We show in theory and in practice that for a source term that can be approximated by a smooth function, the singular values decay algebraically with the time interval size. In that case, a truncated SVD approximation of the source term is adequate.

The contribution of this paper is more specifically as follows. First, to solve systems of linear ODEs, we propose an implementation of the Paraexp method based on the exponential block Krylov (EBK) method introduced in [2]. The EBK method is a Krylov method that approximates the exact solution of a system of nonhomogeneous linear ordinary differential equations by a projection onto a block Krylov subspace. Our Paraexp implementation does not involve a splitting into homogeneous and nonhomogeneous subproblems, which leads to a better parallel efficiency. Second, we extend our EBK-based implementation of the Paraexp method to nonlinear initial-value problems. We solve the problem iteratively on a chosen time interval. In our case, the nonlinear term of the PDE is incorporated as a source term, which is updated after every iteration with the latest solution. Third, we show that our Paraexp-EBK (PEBK) implementation has promising properties for time-parallelization. The PEBK method can be seen as a special continuous-time waveform relaxation method, which is typically more efficient than standard waveform relaxation methods [6]. The PEBK method is tested for the one-dimensional advection-diffusion equation and the viscous Burgers equation in an extensive series of numerical experiments. Since we are interested in time-parallel solvers for large-scale applications in computational fluid dynamics (CFD), such as turbulent flow simulations, we also test our method with respect to the diffusion/advection ratio and grid resolution, reflecting high Reynolds-number conditions. For example, the parallel efficiency of the Parareal algorithm was found to decrease considerably with increasing number of processors for the advection equation [20, 47, 48].
In contrast, the EBK method was found to have excellent weak scaling.
2. Exponential time-integration.
Our time integration method for linear and nonlinear PDEs is based on the exponential block Krylov (EBK) method [2]. In this section, we first provide a brief description of the EBK method. Then, we extend the EBK method to integrate nonlinear PDEs in an iterative way. Finally, the parallelization of the time integrator is discussed.
The EBK method is a time integrator for linear systems of nonhomogeneous ordinary differential equations (ODEs). Details of the method are given in [2]. We follow the method of lines approach [55], i.e., the PDE is discretized in space first. We start with linear PDEs. After applying a spatial discretization to the PDE, we obtain the initial value problem

u′(t) = A u(t) + g(t),  u(0) = u_0,  (2.1)

where u(t) is a vector function u(t): R → R^n, A ∈ R^{n×n} is a square matrix, and g(t) ∈ R^n a time-dependent source term. The vector u can readily be identified with the solution to the PDE in terms of its values taken on a computational grid in physical space. In Section 2.3, we will introduce a more general term g(t, u(t)), which contains the nonlinear terms of the PDE. The matrix A is typically large and sparse, depending on the spatial discretization method. The dimension of the system, n, corresponds to the number of degrees of freedom used in the spatial discretization. Exponential integrators approximate the exact solution of the semi-discrete system (2.1), formulated in terms of the matrix exponential.

The first step in the EBK method is to approximate g(t) with a function that can be treated more easily. Here, g(t) is approximated based on a truncated singular value decomposition (SVD) [3] as follows. The source term is sampled at s points in time, 0 = t_1 < t_2 < … < t_{s−1} < t_s = ∆T, in the integration interval [0, ∆T], and the source samples form the matrix

G = [ g(t_1)  g(t_2)  …  g(t_s) ] ∈ R^{n×s}.  (2.2)

We make the natural assumption that the typical number of time samples s, necessary for a good approximation of g(t), is much lower than n: s ≪ n. Since we assume s ≪ n, it is more economical to calculate the so-called thin SVD of G [23], instead of the full SVD, without loss of accuracy. In this case, the thin SVD of G is

G = Ũ Σ̃ Ṽ^T,  (2.3)

where Σ̃ ∈ R^{s×s} is a diagonal matrix containing the singular values σ_1 ≥ σ_2 ≥ … ≥ σ_s, and Ũ ∈ R^{n×s}, Ṽ ∈ R^{s×s} are matrices with orthonormal columns. The thin SVD can be approximated by a truncation in which the m < s largest singular values are retained. As seen from this truncated SVD, the samples of g(t) can be approximated by linear combinations of the first m columns of Ũ, i.e.,

g(t) ≈ U p(t),  (2.4)

where U ∈ R^{n×m} is formed by the first m columns of Ũ, and p(t) ∈ R^m is obtained by an interpolation of the coefficients in these linear combinations [3]. There are several possible choices for p(t), among which cubic piecewise polynomials. The approximation error in the source term, ‖g(t) − U p(t)‖, can then be easily controlled within a desired tolerance, depending on the number of samples in [0, ∆T] and the number of singular values truncated (see [2]).

The number of retained singular values, m, is equal to the block width in the block Krylov subspace. The efficiency of the EBK method therefore depends on how many singular values are required for an accurate approximation of the source term (2.4). In applications of the method this parameter m can be varied and practical convergence can be assessed. If the subinterval ∆T is small and the source function g(t) can be well approximated by a smooth function, then the singular values of the samples of g decrease rapidly, thus allowing for an efficient low-rank approximation in (2.4). Furthermore, let the subinterval length ∆T be small in the sense that (∆T)² ≪ ∆T (in dimensionless time units). Of course, the assumption that ∆T is small is not always realistic, and we comment on how it can be partially relaxed below in Remark 1. Denote by g^{(j)} the jth derivative of g and assume that the constants

M_j := max_{ξ∈[t, t+∆T]} ‖g^{(j)}(ξ)‖,  j = 0, …, n,

are bounded.
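For illustration, the sampling and truncation steps (2.2)–(2.4) can be sketched in a few lines of NumPy. The source term below is a hypothetical smooth function (a slowly travelling Gaussian pulse), not one of the paper's test problems, and the parameter values are chosen only for the sketch; Chebyshev-type sample points are used, as in the experiments reported later.

```python
import numpy as np

n, s, m = 400, 32, 6      # grid size, time samples, retained singular values
dT = 0.05                 # subinterval length [0, dT]

# Chebyshev-type sample points on [0, dT] (uniform points would also work)
k = np.arange(s)
t = 0.5 * dT * (1.0 - np.cos(np.pi * k / (s - 1)))

# hypothetical smooth source g(x, t): a slowly travelling Gaussian pulse
x = np.linspace(0.0, 1.0, n)
G = np.stack([np.exp(-(x - 0.5 - ti)**2 / 0.02) for ti in t], axis=1)  # n-by-s

# thin SVD (2.3) and rank-m truncation (2.4)
U, sv, Vt = np.linalg.svd(G, full_matrices=False)
Um = U[:, :m]                 # block of m columns used to build the Krylov block
coeffs = Um.T @ G             # coefficients p(t_i) at the sample points

err = np.linalg.norm(G - Um @ coeffs) / np.linalg.norm(G)
print(sv[:6] / sv[0])         # rapid decay for a smooth source
print(err)                    # truncation error set by the discarded sigma's
```

The printed relative truncation error is of the order of the first discarded singular value, which is how the tolerance on ‖g(t) − U p(t)‖ is controlled in practice.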
We can then show that the singular values of the samples decrease, starting from a thin QR factorization of G. Consider the thin QR factorization G = QR, where Q ∈ R^{n×s} has orthonormal columns and R ∈ R^{s×s} is an upper triangular matrix with nonnegative diagonal entries. As Theorem 1 in [25] states, the entries r_{jk} of R satisfy

|r_{jk}| ≤ C_j M_{j−1} (∆T)^{j−1},  k ≥ j,

and r_{jk} = 0 for k < j because R is upper triangular. In this estimate the constants C_j depend only on the points t_1, …, t_s at which the samples are computed and do not depend on g and ∆T.

Since G and R have the same singular values, we now consider the singular values of R. According to [29, 3.1.3],

σ_{j+1} ≤ ‖R_j‖_2,  j = 1, …, s−1,  (2.5)

where R_j is a matrix obtained from R by skipping j rows or columns, chosen arbitrarily. Since the entries in R decrease rowwise as |r_{jk}| = O((∆T)^{j−1}), to have the sharpest estimate in (2.5) we choose R_j to be the matrix R with the first j rows skipped. To bound the 2-norm of R_j we use [28, Chapter 5.6]

‖R_j‖_2 ≤ √( ‖R_j‖_1 ‖R_j‖_∞ )

and note that for j = 1, …, s−1

‖R_j‖_∞ ≤ (s − j) C_{j+1} M_j (∆T)^j,  ‖R_j‖_1 = O((∆T)^j).

Thus we obtain ‖R_j‖_2 = √(s − j) · O((∆T)^j), which, together with (2.5), yields the following result.

Theorem 2.1.
Let g(t): R → R^n be a smooth function such that the constants

M_j := max_{t∈[0,∆T]} ‖g^{(j)}(t)‖,  j = 0, …, n,

are bounded. Furthermore, let the subinterval length ∆T be small in the sense that (∆T)² ≪ ∆T. Then for the singular values σ_j, j = 1, …, s, of the sample matrix G = [ g(t_1)  g(t_2)  …  g(t_s) ] ∈ R^{n×s} it holds that

σ_{j+1} = √(s − j) · O((∆T)^j),  j = 1, …, s−1.  (2.6)

Remark 1.
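The prediction of Theorem 2.1 can be checked numerically: since σ_{j+1} = O((∆T)^j), halving ∆T should shrink σ_{j+1} by roughly a factor 2^j. The sketch below does this for a hypothetical smooth source (a travelling Gaussian), which is not one of the paper's test problems.

```python
import numpy as np

def sample_singular_values(dT, n=300, s=12):
    # singular values of the sample matrix G = [g(t_1) ... g(t_s)] for a
    # hypothetical smooth source g, sampled uniformly on [0, dT]
    x = np.linspace(0.0, 1.0, n)
    t = np.linspace(0.0, dT, s)
    G = np.stack([np.exp(-(x - 0.5 - ti)**2 / 0.02) for ti in t], axis=1)
    return np.linalg.svd(G, compute_uv=False)

sv_a = sample_singular_values(0.10)
sv_b = sample_singular_values(0.05)

# Theorem 2.1: sigma_{j+1} = O((dT)^j), so halving dT should reduce
# sigma_{j+1} by about 2^j; the ratios below should grow like 2, 4, 8, 16
ratios = sv_a[1:5] / sv_b[1:5]
print(ratios)
```

The observed ratios grow roughly geometrically in j, in line with (2.6), up to the hidden constants in the O symbol.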
We may extend the result of Theorem 2.1 to the case of a large ∆T if the constants M_j are bounded more strongly or even decay with j. Indeed, if ∆T is large, we can consider the function g̃(ξ̃) := g(t + ξ̃ q), with q chosen such that τ := ∆T/q is small. As the function g̃ takes the same values for ξ̃ ∈ [0, τ] as the function g for ξ ∈ [t, t + ∆T], Theorem 2.1 formally holds for g̃ with ∆T replaced by τ and M_j multiplied by q^j. The coefficients in the O symbol in (2.6) may now grow with the powers of ∆T, thus rendering the result meaningless unless a stricter assumption on M_j is made.

The truncated SVD approximation of the source term (2.4) facilitates the solution of the initial value problem (IVP) in (2.1) by the block Krylov subspace method [2]. Here, the block Krylov subspace at iteration l is defined as

K_l(A, U) := span{ U, AU, A²U, …, A^{l−1}U }.  (2.7)

Furthermore, the block Krylov subspace method can be accelerated by the shift-and-invert technique [22, 40, 51] and, to keep the Krylov subspace dimension bounded, implemented with restarting [5, 10].

In this section, the parallelization of the linear ODE solution is discussed. As we will see, the method is equivalent to the Paraexp method, but there are differences in implementation leading to a better parallel efficiency (see Section 3.2). Linear IVPs can be solved parallel in time by using the principle of superposition. First, the time interval is divided into P non-overlapping subintervals,

0 = T_0 < T_1 < … < T_P = T,

where P is also the number of processors that can be used in parallel. We introduce the shift of u(t) with respect to the initial condition, u(t) = u_0 + û(t). The shifted variable û(t) solves the IVP with homogeneous initial conditions,

û′(t) = A û(t) + ĝ(t),  t ∈ [0, T],  û(0) = 0,  (2.8)

where

ĝ(t) := A u_0 + g(t).  (2.9)

The source term is approximated using s time samples per subinterval, as illustrated in Fig. 2.1. The shifted IVP (2.8) can be decoupled into independent subproblems by the principle of superposition. To each subinterval [T_{j−1}, T_j] we associate a part of the source term ĝ(t), defined for j = 1, …, P as

ĝ_j(t) = { ĝ(t),  for T_{j−1} ≤ t < T_j;  0,  otherwise. }  (2.10)

Figure 2.1: Time samples t_1, …, t_s of the source term on the subinterval [T_j, T_{j+1}].

The expected parallel speedup is based on the observation that the solution to (2.8) is given by a variation-of-constants formula (see, e.g., [26]),

û(T) = ∫_0^T exp((T − s)A) ĝ(s) ds = Σ_{j=1}^P ∫_{T_{j−1}}^{T_j} exp((T − s)A) ĝ_j(s) ds,  (2.11)

where the integrals ∫_{T_{j−1}}^{T_j} … ds can be evaluated independently and in parallel.
More precisely, by exploiting the linearity of the problem we decompose the IVP (2.8) into P independent subproblems,

v′_j(t) = A v_j(t) + ĝ_j(t),  t ∈ [0, T],  v_j(0) = 0,  (2.12)

where v_j(t) is referred to as a subsolution of the total problem. The subproblems are independent and can be solved in parallel. We integrate each subproblem individually with the EBK method. Observe that the source is only nonzero on the subinterval [T_{j−1}, T_j). We follow the ideas of the Paraexp method [21] and note that the nonhomogeneous part of the ODE requires most of the computational work in the EBK method: an accurate SVD approximation of the source term (2.4) generally requires more singular values to be retained, increasing the dimensions of the block Krylov subspace (2.7). According to the principle of superposition, the solution of the original problem is then given as

u(t) = u_0 + Σ_{j=1}^P v_j(t).  (2.13)

The summation of the subsolutions v_j(t) is the only communication required between the parallel processes. The parallel algorithm is summarized in Fig. 2.2. In principle, this algorithm is identical to the Paraexp method [21]. The only practical difference is that in our implementation both the nonhomogeneous and the homogeneous part of the subproblems are solved by the EBK method. The original version of the Paraexp method assumes a conventional time integration method to solve the "difficult" nonhomogeneous part, and a Krylov subspace method for exponential integration of the homogeneous part. Our implementation of the Paraexp method is compared numerically with the original one in Section 3.2.

Finally, additional parallelism could be exploited within the block Krylov subspace method itself. If the block size is m, then the m matrix-vector products can be executed entirely in parallel; see [34, 46]. This approach could be applied in combination with the parallelization described in this section.
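The decomposition (2.12)–(2.13) can be verified on a small stand-in problem. The sketch below replaces the EBK solver by SciPy's general-purpose `solve_ivp` (the paper's method is not reimplemented here) and checks that summing the subsolutions reproduces the directly computed solution; the problem, grid, and source term are all hypothetical.

```python
import numpy as np
from scipy.integrate import solve_ivp

# stand-in problem: 1-D diffusion with a time-dependent source
n, T, P = 50, 1.0, 4
h = 1.0 / (n + 1)
A = 0.01 * (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
            + np.diag(np.ones(n - 1), -1)) / h**2
x = np.linspace(h, 1 - h, n)
u0 = np.sin(np.pi * x)
g = lambda t: np.cos(2 * np.pi * t) * np.exp(-(x - 0.5)**2 / 0.05)

# reference: solve the full IVP u' = Au + g directly
ref = solve_ivp(lambda t, u: A @ u + g(t), (0, T), u0,
                rtol=1e-10, atol=1e-12).y[:, -1]

# shifted problem (2.8): u(t) = u0 + uhat(t), with ghat = A u0 + g(t)
Tj = np.linspace(0, T, P + 1)
ghat = lambda t: A @ u0 + g(t)
total = u0.copy()
for j in range(P):
    # subproblem (2.12): source restricted to [T_{j-1}, T_j), zero initial
    # condition, integrated over the whole interval [0, T]
    gj = lambda t: ghat(t) if Tj[j] <= t < Tj[j + 1] else np.zeros(n)
    vj = solve_ivp(lambda t, v: A @ v + gj(t), (0, T), np.zeros(n),
                   rtol=1e-10, atol=1e-12, max_step=0.01).y[:, -1]
    total += vj        # superposition (2.13)

err = np.linalg.norm(total - ref) / np.linalg.norm(ref)
print(err)             # decomposed solve matches the direct solve
```

The `max_step` bound keeps the adaptive integrator from stepping over the window where each piecewise source ĝ_j is active; in the actual PEBK method each subproblem is instead propagated exactly via the matrix exponential.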
Algorithm: parallel time integration.
Given: A, u_0, g(t), …
Solve: u′(t) = A u(t) + g(t), u(0) = u_0.
for j = 1, …, P (in parallel)
    Solve the nonhomogeneous part of the ODE (2.12), for t ∈ [T_{j−1}, T_j].
    Solve the homogeneous part of the ODE (2.12), for t ∈ [T_j, T_P].
end for
Construct the solution u(t), see (2.13).

Figure 2.2: The algorithm for solving linear ODE systems with the Paraexp exponential block Krylov (PEBK) method.

The EBK method, designed to solve a linear system of nonhomogeneous ODEs (2.1), can be extended to handle nonlinear systems of ODEs by including the nonlinearities in the source term. The system is then solved iteratively, in such a way that the iterand u_k(t) is updated on the entire interval [0, T]:

u_k(t) → u_{k+1}(t),  (2.14)

where k denotes the iteration index. We proceed in a few steps. First, the problem is reduced to the form of Eq. (2.1). The nonlinear IVP is therefore approximated by evaluating the source term with the current iterand, u_k(t). Because the current iterand u_k(t) is a known function, we can write the source term as an explicit function of time,

g_k(t) := g(t, u_k(t)).  (2.15)

The resulting initial value problem is

u′_{k+1}(t) = A u_{k+1}(t) + g_k(t),  u_{k+1}(0) = u_0.  (2.16)

This system is solved by the EBK method to complete one iteration. This process is continued until the solution is sufficiently converged. This approach is similar to applying Picard or fixed-point iterations [42] to the nonlinear term. The treatment of the nonlinear behaviour in the source term can be further improved by using the Jacobian matrix, similar to a Newton–Raphson method. In this case, we introduce the average Jacobian matrix,

J_k := (1/T) ∫_0^T [ ∂g_k/∂u_1  …  ∂g_k/∂u_n ] dt,  (2.17)

which is averaged over the interval of integration [0, T]. Generally, the solution varies in time and so does the Jacobian matrix. The nonlinear remainder, with respect to the time-averaged state, is contained in the source term.
Using this correction, we then find the recursive relation

u′_{k+1}(t) = [A + J_k] u_{k+1}(t) + g_k(t) − J_k u_k(t),  u_{k+1}(0) = u_0.  (2.18)
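A minimal sketch of the iteration (2.18), for a hypothetical scalar nonlinear ODE rather than a discretized PDE, illustrates how the nonlinear term is carried as a source and the averaged Jacobian (2.17) enters the linear solve; the linear problems are here integrated with SciPy instead of the EBK method.

```python
import numpy as np
from scipy.integrate import solve_ivp

# toy nonlinear IVP: u' = a*u - u^2 (scalar), solved by iteration (2.18)
a, T, u0 = -1.0, 2.0, 0.8
ts = np.linspace(0.0, T, 201)

g  = lambda u: -u**2          # nonlinear term, treated as a source
dg = lambda u: -2.0 * u       # its Jacobian (scalar here)

uk = np.full_like(ts, u0)     # initial iterand: constant in time
for k in range(8):
    Jk = dg(uk).mean()        # time-averaged Jacobian (2.17), uniform grid
    src = lambda t: g(np.interp(t, ts, uk)) - Jk * np.interp(t, ts, uk)
    # linear IVP (2.18): u'_{k+1} = (a + Jk) u_{k+1} + g_k(t) - Jk u_k(t)
    uk = solve_ivp(lambda t, u: (a + Jk) * u + src(t), (0, T), [u0],
                   t_eval=ts, rtol=1e-10, atol=1e-12).y[0]

# reference: solve the nonlinear IVP directly
ref = solve_ivp(lambda t, u: a * u + g(u), (0, T), [u0],
                t_eval=ts, rtol=1e-10, atol=1e-12).y[0]
err = np.max(np.abs(uk - ref))
print(err)    # the waveform iteration converges to the nonlinear solution
```

After a handful of iterations the iterand agrees with the nonlinear reference solution on the whole interval, consistent with the superlinear convergence discussed in Remark 2 below.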
When converged, i.e., u_{k+1}(t) = u_k(t), the terms containing J_k cancel.

Remark 2.
The iteration (2.18) is well known in the literature on waveform relaxation methods [37, 38]. Its convergence is discussed, e.g., in [5, 42] and, in the case of an inexact iteration, in [6]. These results, in particular, show that the iteration (2.18) converges superlinearly on finite time intervals if the norm of the matrix exponential of t(A + J_k) decays exponentially with t, i.e., if the eigenvalues of A + J_k lie in the left half-plane.

Nonlinear IVPs are solved in an iterative way with the PEBK method. We follow the waveform relaxation approach [6, 33, 42], that is, the problem is solved iteratively on the entire time interval of interest, [0, T]. The original problem is decomposed into a number of independent, "easier" subproblems on [0, T] that can be solved iteratively, and in parallel. At each iteration the nonlinear term is treated as a source term, which gives us a linear system of ODEs that can be integrated in parallel, see Section 2.2. In the case of nonlinear problems, we take the average of the Jacobian matrix on each subinterval, which gives us the piecewise constant function, for j = 1, …, P:

J_k(t) := 1/(T_j − T_{j−1}) ∫_{T_{j−1}}^{T_j} [ ∂g_k/∂u_1  …  ∂g_k/∂u_n ] dt,  t ∈ [T_{j−1}, T_j].  (2.19)

The IVP is then shifted with respect to the initial condition,

û′_{k+1}(t) = [A + J_k(t)] û_{k+1}(t) + ĝ_k(t),  t ∈ [0, T],  û_{k+1}(0) = 0,  (2.20)

where

ĝ_k(t) := A u_0 + g_k(t) + J_k(t)[u_0 − u_k(t)].  (2.21)

The shifted IVP (2.20) can then be solved in parallel. We follow an approach similar to the Paraexp method, as explained in Section 2.2. Here, we define

ĝ_{j,k}(t) := { ĝ_k(t),  for T_{j−1} ≤ t < T_j;  0,  otherwise. }  (2.22)

Now, the problem is decomposed into P independent subproblems. The subsolutions v_j result from convergence of v_{j,k} obeying

v′_{j,k+1}(t) = [A + J_k(t)] v_{j,k+1}(t) + ĝ_{j,k}(t),  t ∈ [0, T],  (2.23)
v_{j,k+1}(0) = 0,  (2.24)

which can be integrated with the EBK method.
Here, we have introduced a second subindex k to indicate the jth subsolution at the kth iteration. The total solution can then be updated according to the principle of superposition,

u_{k+1}(t) = u_0 + Σ_{j=1}^P v_{j,k+1}(t).  (2.25)

After each iteration, the solution is assembled, and the time-averaged Jacobian matrix is updated on each subinterval, according to Eq. (2.19). Note that for linear systems a single iteration suffices, since the source term then does not depend on the iterand.

Figure 2.3: Flow diagram of the serial EBK method for nonlinear PDEs. The algorithm stops after K iterations, after which the solution is assumed to be converged.

We consider the parallel efficiency in an idealized setting, and estimate a theoretical upper bound here, assuming that communication among the processors can be carried out very efficiently. The computational cost can then be simply estimated by the number of iterations required for the numerical solution to converge. Suppose the computation time of a single EBK iteration is τ. The maximum parallel speedup is then

speedup = K τ / (K_P τ / P) = K P / K_P,

where K is the number of iterations for serial time integration, and K_P for parallel time integration. The theoretical upper bound of the parallel efficiency is then

efficiency = speedup / P = K / K_P.  (2.26)

High parallel efficiency can be achieved if the parallelization does not slow down the convergence of the numerical solution. As will be demonstrated in Section 4, we typically observe that K_P is not significantly larger than K, and a near-optimal efficiency is achieved in various relevant cases. For comparison, the parallel efficiency of the Parareal algorithm is formally bounded by 1/K_P [39]. In our case, this upper bound is improved by a factor K.
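A small worked example of the bound (2.26), with hypothetical iteration counts: suppose a serial run needs K = 10 iterations and the parallel run needs K_P = 12 on P = 16 processors.

```python
# hypothetical iteration counts for the serial and parallel runs
K, K_P, P = 10, 12, 16

speedup    = (K * P) / K_P      # maximum speedup K*P / K_P
efficiency = K / K_P            # upper bound (2.26), independent of P
parareal   = 1.0 / K_P          # formal Parareal bound, for comparison

print(speedup, efficiency, parareal)
```

Here the efficiency bound is K/K_P ≈ 83%, whereas the corresponding Parareal bound 1/K_P is only about 8%, illustrating the factor-K improvement mentioned above.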
Parareal is an iterative method for the parallelization of sequential time integration methods, whereas the EBK method, for nonlinear problems, is an iterative method to start with, and its parallelization does not necessarily increase the total number of iterations. Note that the PFASST method [17] has a similar estimate of the parallel efficiency as the PEBK method.

Figure 2.4: Flow diagram of the Paraexp-EBK method for nonlinear PDEs. The algorithm stops after K iterations, after which the solution is assumed to be converged.

Algorithm: parallel time integration.
Given: A, u_0, g(t, u(t)), …
Solve: u′(t) = A u(t) + g(t, u(t)), u(0) = u_0.
for k = 1, …, K
    for j = 1, …, P (in parallel)
        Calculate the time-averaged Jacobian matrix J_k(t), see (2.19).
        Solve the nonhomogeneous part of the linearized ODE (2.23), for t ∈ [T_{j−1}, T_j].
        Solve the homogeneous part of the linearized ODE (2.23), for t ∈ [T_j, T_P].
    end for
    Update the solution u_k(t) following (2.25).
end for

Figure 2.5: The algorithm for solving nonlinear problems with the Paraexp exponential block Krylov (PEBK) method, see also Fig. 2.4.
3. Advection-diffusion equation.
In this section, we present results of numerical tests where the space-discretized advection-diffusion equation is solved with the EBK method. We demonstrate the consistency and stability of the EBK method for this problem; reference solutions are computed with ode15s in MATLAB (2013b). The relative tolerance of the PEBK solver is denoted by tol. In our tests, the block Krylov subspace is restarted every 20 iterations. For the truncated SVD approximation (2.4), p(t) are chosen to be piecewise cubic polynomials, although other types of approximations are possible as well, and are not crucial for the performance of the PEBK method. The sample points per subinterval are Chebyshev nodes. This is not crucial but gives a slightly better approximation than with uniform sample points.

We consider the advection-diffusion equation with a short-pulse initial condition, to clearly distinguish the separate effects of advection and diffusion. The PDE and the periodic boundary conditions are as follows:

u_t + a u_x = ν u_xx,  x ∈ [0, 1],  t ∈ [0, 1],
u(x, 0) = sin(πx),
u(0, t) = u(1, t),
u_x(0, t) = u_x(1, t),  (3.1)

where a ∈ R is the advection velocity, and ν ∈ R the diffusivity coefficient. Both parameters are constant in space and time. The PDE is first discretized in space with a second-order central finite difference scheme [41]. The corresponding semi-discrete system of ODEs is then

v′(t) = A v(t) + A u_0,  v(0) = 0,  (3.2)

where the matrix A results from the discretization of the spatial derivatives, and represents both the diffusive and the advective term. In (3.2), the substitution u = v + u_0 has been applied, with u(t) being the vector function containing the values of the numerical solution on the mesh at time t, and u_0 = u(0). The substitution leads to homogeneous initial conditions. In this case, the source term is constant in time, and its SVD polynomial approximation (2.4) is exact with m = 1. This allows us to focus on the two remaining parameters: the grid resolution and the tolerance of the EBK solver.

The linear IVP (3.2) is decoupled into independent subproblems by partitioning of the source term (A u_0) on the time interval. The superposition of the subsolutions is illustrated in Fig. 3.1, in which the time interval of interest is divided into P = 8 subintervals. The advection-diffusion equation is solved for three different combinations of a and ν, see Fig. 3.2. The solutions are computed for different tol and ∆x. The discretization error is controlled by the time integration with tol, and by the spatial discretization with ∆x. The error of the numerical solution is measured in the relative ℓ₂-norm,

‖u_h(T) − u(T)‖ / ‖u(T)‖,  (3.3)

where the exact solution u(T) is taken on the mesh, i.e., it has the entries u_j(T) = u(x_j, T). The exact solution of Eq. (3.1) is

u(x, t) = Σ_{n=0} a_n cos(nπ(x − at)) exp(−(nπ)² ν t),  (3.4)

with coefficients

a_0 = (1/2) ∫ sin(πx) dx,  (3.5)
a_n = ∫ sin(πx) cos(nπx) dx.  (3.6)

The coefficients are calculated using numerical quadrature with high precision. The error in (3.3) shows second-order convergence with ∆x, as expected from the truncation error of the finite difference scheme. The convergence plots show that we are able to control the error of the parallel time integration method. In this case, the final error can be made to depend only on the spatial resolution and the tolerance of the EBK method. Also, the EBK method has no principal restrictions on the time step size, as it directly approximates the exact solution of Eq. (3.2) by Krylov subspace projections.

Figure 3.1: Superposition of subsolutions to the advection-diffusion equation, with a = 1. Panels: (a) u_0; (b) u_0 + v_1(t); (c) u_0 + v_1(t) + v_2(t); (d) u_0 + v_1(t) + v_2(t) + v_3(t); (e) u(t) = u_0 + Σ_j v_j(t). The color depends on the time coordinate: blue corresponds to t = 0 and green to t = 1.

Figure 3.2: Left: numerical solution on the finest mesh at t = 0 (dashed) and at a later time; the case a = 1 and ν = 0 does not show odd-even decoupling because of the high spatial resolution used. Right: convergence of the numerical solution (at final time t = 1) with ∆x, for three values of tol (markers □, ◦, ×). Panels: (a) a = 0 with nonzero diffusivity; (b) a = 1, ν = 0; (c) a = 1 with nonzero diffusivity.

In the previous examples, we solved homogeneous IVPs.
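The second-order convergence in ∆x can be reproduced with a generic stand-in experiment: a single-Fourier-mode exact solution of the advection-diffusion equation (instead of the paper's pulse problem), central differences in space, and a tightly-toleranced SciPy integrator in place of the EBK method so that the time error is negligible. All parameter values below are illustrative.

```python
import numpy as np
from scipy.integrate import solve_ivp

# exact solution u(x,t) = cos(2*pi*(x - a t)) exp(-(2 pi)^2 nu t) on [0,1],
# periodic; a stand-in for the pulse problem of (3.1)
a, nu, T = 1.0, 1e-2, 0.5

def rel_error(n):
    h = 1.0 / n
    x = np.arange(n) * h
    # periodic second-order central differences for -a*u_x + nu*u_xx
    def rhs(t, u):
        ux  = (np.roll(u, -1) - np.roll(u, 1)) / (2 * h)
        uxx = (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / h**2
        return -a * ux + nu * uxx
    u0 = np.cos(2 * np.pi * x)
    uh = solve_ivp(rhs, (0, T), u0, rtol=1e-10, atol=1e-12).y[:, -1]
    ue = np.cos(2 * np.pi * (x - a * T)) * np.exp(-(2 * np.pi)**2 * nu * T)
    return np.linalg.norm(uh - ue) / np.linalg.norm(ue)   # relative l2, as in (3.3)

e1, e2 = rel_error(64), rel_error(128)
print(e1 / e2)   # close to 4: second-order convergence in dx
```

Halving the mesh width reduces the error by a factor close to four, confirming that with a sufficiently tight time-integration tolerance the final error is governed by the spatial truncation error alone.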
Nonhomogeneous problems are generally more expensive to solve, because an accurate SVD approximation of the source term (2.4) requires more singular values, which increases the block width of the block Krylov subspace (2.7). Parallel speedup can then be expected by splitting the nonhomogeneous problem into subproblems (2.12), which require less individual effort to solve by the PEBK method. To measure the parallel efficiency of our algorithm for nonhomogeneous IVPs, we introduce a source term f(x, t) in the advection-diffusion equation,

u_t + a u_x = ν u_xx + f(x, t),  (3.7)

with a = 1 and a small diffusivity ν. The source term is chosen such that the solution is a series of five travelling pulses,

u(x, t) = −cos(10π(x − t)).  (3.8)

The mesh width of the spatial discretization is ∆x = 10⁻³, such that the error due to the spatial discretization is small compared to the time integration error. This mesh width gives a semi-discrete system with a 1000 × 1000 matrix. The SVD polynomial approximation is constructed from 100 time samples per subinterval. The singular values are plotted in Fig. 3.4, which reveals that the first two singular values are several orders of magnitude larger than the rest. Therefore, we retain only the first two singular values in the truncated SVD. The decay of the singular values is guaranteed by the upper bound from Theorem 2.1. We have verified that the SVD approximation is sufficiently accurate, i.e., the approximation error, measured in the L₂-norm, is less than the tolerance of the EBK method.

We compare the parallel efficiency of the Paraexp-EBK implementation with a conventional implementation of the Paraexp algorithm [21], where the nonhomogeneous problem is integrated with the Crank–Nicolson (CN) scheme. The linear system is solved directly using MATLAB.
The matrix exponential propagator, for the homogeneous problem in the Paraexp method, is realized with an Arnoldi method using the shift-and-invert technique [51].

In order to have a Courant number of one, we take ∆t = 10^{−…} (for ∆x = 10^{−…} and a = 1). According to the Paraexp method, the time step size needs to be decreased in parallel computation by a factor P^{1/q} [21], where q is the order of the time integration method, in order to control the error. In case of the Crank–Nicolson scheme, we have q = 2.

As discussed in Section 2.2, there is no communication required between processors, except for the superposition of the solutions to the subproblems at the end. Therefore, we are able to test the parallel algorithm on a serial computer by measuring the computation time of each independent subproblem. The computation time of the serial time integration is denoted τ_s. For the parallel time integration, we measure the computation times of the nonhomogeneous and the homogeneous part of the subproblems separately, denoted by τ_1 and τ_2 respectively. The parallel speedup can then be estimated as

speedup = τ_s / (max(τ_1) + max(τ_2)),    (3.9)

where we take the maximum value of τ_1 and τ_2 over all parallel processes. The timings of the EBK method and Paraexp are listed in Table 3.1.

Figure 3.3: Parallel efficiency for solving the advection-diffusion equation. □
PEBK; × Paraexp/CN.

Figure 3.4: Singular values σ_m of the matrix composed of the source term samples.

The parallel efficiency of both methods is illustrated in Figure 3.3. Note that the parallel efficiency of the standard Paraexp method steadily decreases, whereas the PEBK method maintains a constant efficiency level around 90%. The decrease in efficiency of the standard Paraexp method is due to the reduced time step size in the Crank–Nicolson scheme, with respect to its serial implementation. The nonhomogeneous part of the subproblems requires more computation time as the number of processors increases.

The numerical test confirms the initial assumption that parallel speedup with the EBK method can be achieved by decomposing the original problem into simpler independent subproblems (2.12). Furthermore, the Paraexp implementation using the EBK method appears to be more efficient than one using a traditional time-stepping method.
Table 3.1: Timings of the serial and parallel algorithms for different numbers of processors P. Timing τ_s corresponds to the serial algorithm. For the parallel algorithm, τ_1 and τ_2 are timings of the nonhomogeneous and homogeneous problem respectively.

                 Serial                 Parallel
           P     τ_s        Error      max(τ_1)   max(τ_2)   Error      Efficiency
PEBK       2     1.43e+00   3.05e-04   7.22e-01   6.44e-02   2.70e-04   91%
           4     2.81e+00   3.05e-04   7.23e-01   7.06e-02   2.68e-04   89%
           8     5.59e+00   3.05e-04   7.40e-01   6.21e-02   2.88e-04   87%
          16     1.14e+01   3.05e-04   7.30e-01   6.28e-02   3.22e-04   90%
          32     2.27e+01   3.05e-04   7.33e-01   6.63e-02   3.65e-04   89%
Paraexp    2     2.15e+00   4.56e-04   1.29e+00   1.66e-02   4.10e-04   82%
           4     4.37e+00   4.56e-04   1.52e+00   1.28e-02   3.80e-04   71%
           8     8.56e+00   4.56e-04   1.85e+00   1.24e-02   3.59e-04   58%
          16     1.70e+01   4.56e-04   2.18e+00   1.31e-02   3.43e-04   49%
          32     3.43e+01   4.56e-04   2.62e+00   1.16e-02   3.32e-04   41%
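As a consistency check, the PEBK efficiency column of Table 3.1 can be reproduced from the speedup estimate (3.9), with efficiency defined as speedup divided by P:

```python
# Reproduce the PEBK efficiency column of Table 3.1 from (3.9):
#   speedup = tau_s / (max(tau_1) + max(tau_2)),  efficiency = speedup / P.
rows = [  # (P, tau_s, max_tau1, max_tau2), PEBK rows of Table 3.1
    (2, 1.43e+00, 7.22e-01, 6.44e-02),
    (4, 2.81e+00, 7.23e-01, 7.06e-02),
    (8, 5.59e+00, 7.40e-01, 6.21e-02),
    (16, 1.14e+01, 7.30e-01, 6.28e-02),
    (32, 2.27e+01, 7.33e-01, 6.63e-02),
]

efficiencies = []
for P, tau_s, tau1, tau2 in rows:
    speedup = tau_s / (tau1 + tau2)
    efficiencies.append(round(100 * speedup / P))

print(efficiencies)  # [91, 89, 87, 90, 89], matching the table
```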
4. Burgers equation.
In the previous section, the PEBK method was applied to a linear PDE. In this section, the performance of the PEBK method is tested on a nonlinear PDE, the viscous Burgers equation.
Consider the viscous Burgers equation,

u_t + u u_x = ν u_xx + f(x, t),   x ∈ [0, 1], t ∈ [0, T],    (4.1)

where ν ∈ R denotes the diffusivity (or viscosity) coefficient. In the following experiments, we take ν = 10^{−…}, such that the problem is dominated by the nonlinear convective term. As in the previous example, the boundary conditions are periodic. Exact solutions to the Burgers equation can be found by the Cole–Hopf transformation [27]. In this test case, we construct a desired solution by introducing an appropriate source term, as shown for the advection-diffusion equation in Section 3.2. The source term balances the dissipation of energy due to diffusion, and prevents the solution from vanishing in the limit t → ∞.

The Burgers equation is an important equation in fluid dynamics. It can be seen as a simplification of the Navier–Stokes equation, which retains the nonlinear convective term, u u_x. In the limit ν →
0, the nonlinearity may produce discontinuous solutions, or shocks, so that a typical solution may resemble a sawtooth wave. A sawtooth wave can be represented by an infinite Fourier series. A smooth version of the sawtooth wave can be obtained by the modified Fourier series,

u(ξ) = 1/2 − Σ_{k=1}^{k_max} [1/(πk)] sin(2πkξ) Φ(k, ǫ),   ξ ∈ R,    (4.2)

where k is the wavenumber, k_max a cutoff wavenumber, and Φ(k, ǫ) is a smoothing function which suppresses the amplitudes associated with high wavenumbers:

Φ(k, ǫ) = [ sin(πkǫ/2) / (πkǫ/2) ].    (4.3)

Here, ǫ is the smoothing parameter. The smoothing function Φ(k, ǫ) is motivated by the viscosity-dependent inertial spectrum of the Burgers equation found by Chorin & Hald [11]. The smoothing function for ǫ = 0.… and the resulting solution for k_max = 100 are shown in Fig. 4.1. The chosen solution is advected in the x-direction by introducing the parametrization ξ = x − t + … in (4.1). The source term is then readily found by substituting the chosen solution (4.2) into the Burgers equation (4.1),

f(x, t) = u_t + u u_x − ν u_xx.    (4.4)

The space derivatives in the Burgers equation are discretized with a second-order central finite difference scheme, using a uniform mesh width ∆x >
0. The error of the numerical solution is again measured in the relative ℓ_2-norm, cf. (3.3). The tolerance of the PEBK method is set to 10^{−…}. The SVD approximation of the source term, which includes the nonlinear term, is constructed from s = 50 samples per subinterval, and reveals that m = 12 singular values are sufficient in the truncated SVD, so that the error of the SVD approximation is less than the tolerance of the PEBK method.

Figure 4.1: Manufactured solution (4.2) of the Burgers equation. (a) Smoothing function (solid line) and cutoff wavenumber (dashed line); (b) solution for ǫ = 0.… and k_max = 100.

The nonlinear system of ODEs is solved iteratively, as outlined in Section 2.3. Figure 4.2 shows the error history at different grid resolutions. Here, the error converges to a value that depends on the mesh width. In other words, the final error of the time integration method is much smaller than the error due to the spatial discretization.

Figure 4.2: Error history with ∆T = 0.…: □, ∆x = 10^{−…}; ◦, ∆x = 5 · 10^{−…}; △, ∆x = 2.… · 10^{−…}.

The PEBK method is parallelized as discussed in Section 2.4. The time interval [0, T] can be partitioned into P subintervals with a uniform subinterval size ∆T. In the following experiment, the complete time interval is extended with each subinterval added, T = P∆T. The mesh width is ∆x = 2.… · 10^{−…}. Figure 4.3a shows that increasing the number of processors, and hence the simulated time, generally increases the number of iterations required to achieve the same level of accuracy. The increased number of required iterations implies a decrease in theoretical parallel efficiency, which can be estimated by the ratio K_1/K_P, see (2.26).

Next, the final time is kept constant, and the subinterval size reduces with an increase in the number of processors, ∆T = T/P.
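The construction of the smoothed sawtooth (4.2) can be sketched in a few lines. The precise form of the smoothing factor used below (a Lanczos-type sinc) and the parameter values ǫ = 0.05, k_max = 100 are illustrative assumptions, since the exact values are given in the text above only up to the figure.

```python
import numpy as np

# Smoothed sawtooth via a damped Fourier series, cf. (4.2):
# the plain series 1/2 - sum sin(2*pi*k*xi)/(pi*k) equals the ramp xi on (0,1);
# the factor phi suppresses high wavenumbers and removes the Gibbs oscillations.
def phi(k, eps):
    z = np.pi * k * eps / 2.0
    return np.sinc(z / np.pi)            # sin(z)/z, assumed sinc-type smoothing

def u_smooth(xi, k_max=100, eps=0.05):   # parameter values are illustrative
    k = np.arange(1, k_max + 1)
    terms = np.sin(2 * np.pi * np.outer(xi, k)) / (np.pi * k) * phi(k, eps)
    return 0.5 - terms.sum(axis=1)

xi = np.linspace(0.0, 1.0, 512, endpoint=False)
u = u_smooth(xi)
print(u.min(), u.max())  # a bounded, sawtooth-like profile on [0, 1)
```

Advecting this profile, ξ = x − t + …, and substituting into (4.1) then yields the source term (4.4).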
Figure 4.3b shows that the number of iterations is roughly independent of the number of processors, i.e., in this case, the parallel efficiency, K_1/K_P, does not decrease with P. Parallel speedup might also be improved by the fact that the SVD approximation converges faster on smaller subintervals, see Theorem 2.1. That is, fewer singular values have to be retained in the truncated SVD in order to achieve a certain accuracy. Also, the number of samples of the source term could be decreased on smaller subintervals.

Figure 4.3: Error history for the Paraexp-EBK method for (a) a fixed subinterval size, ∆T = 0.…, T = P∆T, and (b) a fixed final time, ∆T = T/P, T = 0.…. □, P = 1; ◦, P = 2; △, P = 4; ⋆, P = 8; ⋄, P = 16; ×, P = 32.

In this section, we test the parallel efficiency of the PEBK method for the viscous Burgers equation. In the following experiments, the source term is chosen such that the solution of the spatially discretized system is equal to the exact solution described in Section 4.1. In other words, the source term accounts for the error by the spatial discretization, and we measure only the error due to the time integration. The spatial discretization is here performed with a mesh width of ∆x = 2 · 10^{−…}.

If the subinterval size, ∆T, is fixed, the number of iterations generally increases in case the number of processors P increases, see Fig. 4.3. Increasing the number of iterations would reduce the parallel efficiency, according to (2.26). We therefore fix the final time T, such that the subinterval size decreases with increasing P. Also, the total number of samples over [0, T] is fixed, such that the local number of samples decreases with increasing P. To keep the global distribution of samples constant in the parallel computations, the local sample points are uniformly distributed instead of Chebyshev points as used in previous experiments. In our experiments, we use ∆T = 0.…
/P and s = 128 /P . Furthermore, the number of (retained) singular valuesis m = 12, and the tolerance of the EBK method is set to 10 − . The error history fordifferent P is shown in Fig. 4.4. The results show that the convergence rate of thePEBK method can improve by increasing P . The nonlinear corrections, see (2.18),appear to be more effective on smaller subintervals. TIME-PARALLEL EXPONENTIAL INTEGRATOR ν = 10 − and ν = 10 − are shown in Fig. 4.5a,which clearly illustrate a parallel speedup. The slightly higher timings of ν = 10 − could be attributed to the increased stiffness of the problem. Figure 4.5b shows thatthe parallel efficiency of the PEBK method steadily decreases with the number of(virtual) processors, P . There is no significant difference between the performanceof the method at these two values of the viscosity coefficient. The parallel efficiencymight even be further tuned in practice, based on the previous observation that therequired number of iterations could decrease with higher P . Also, the number ofsingular values, m , could possibly be reduced on smaller subintervals, based on (2.1),which would further enhance the potential parallel speedup.The PFASST algorithm [17] shows a good parallel efficiency for the viscous Burg-ers equation as well. It is unknown whether changing the viscosity coefficient has animpact on the performance of PFASST. Our observations point in the direction ofa good parallel efficiency of the PEBK method for simulations of the Navier–Stokesequation at high Reynolds numbers. For these problems there is a high demand forhighly efficient parallel solvers [53]. The Parareal algorithm was for example reportedto perform poorly in advection-dominated problems [18, 49]. − − − Iteration ℓ - e rr o r n o r m Figure 4.4: Error history for the Paraxp-EBK method with ∆ T = 0 . /P . (cid:3) , P = 1; ◦ , P = 2; △ , P = 4; ⋆ , P = 8. 
Turbulent flow is a multiscale phenomenon that is characterized by a wide range of length and time scales. In the case of the Burgers equation, we can emulate such a solution by combining two functions that are periodic
Figure 4.5: Total computation time (left) and parallel efficiency (right) of ten PEBK iterations, with ∆T = 0.…/P. Left: ×, ν = 10^{−…}; □, ν = 10^{−…}; dotted and dashed lines: corresponding ideal scaling. Right: ×, ν = 10^{−…}; □, ν = 10^{−…}.

in both space and time, e.g.,

u(x, t) = sin(2πx) sin(2πt) + (1/k) sin(2kπx) sin(2kπt),   k > 1.    (4.5)

The solution features a large scale mode with wavenumber one, and a smaller scale mode with wavenumber k >
1. This combination allows the construction of an arbitrarily wide dynamic range. The factor 1/k is included in compliance with an assumed energy distribution [û(k)]^2 ∝ k^{−2} [11], where û(k) is the Fourier transform,

û(k) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} u(x, t) exp(−2πik(x + t)) dx dt,    (4.6)

where we have used the fact that the solution is periodic in space and time. The previous experiment, see Fig. 4.3a, is repeated for the manufactured solution (4.5) with different values of k. Figure 4.6 illustrates that widening the spectrum does not significantly affect the convergence of the PEBK iterations. Remarkably, the curves appear to form pairs based on P.
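The 1/k amplitude scaling of the two-mode solution (4.5) can be checked directly from its discrete Fourier coefficients; the grid size below is an illustrative choice.

```python
import numpy as np

# Verify the spectral content of the two-mode solution (4.5): the mode at
# wavenumber k carries a 1/k smaller amplitude than the wavenumber-one mode.
k, n = 16, 128
x = np.linspace(0.0, 1.0, n, endpoint=False)
X, T = np.meshgrid(x, x, indexing="ij")
u = (np.sin(2 * np.pi * X) * np.sin(2 * np.pi * T)
     + (1 / k) * np.sin(2 * np.pi * k * X) * np.sin(2 * np.pi * k * T))

U = np.fft.fft2(u) / n**2        # 2D Fourier coefficients of u(x, t)
a1 = abs(U[1, 1])                # large-scale mode (1, 1)
ak = abs(U[k, k])                # small-scale mode (k, k)
print(a1 / ak)                   # ratio is k, i.e. amplitudes scale like 1/k
```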
5. Conclusions.
We propose an implementation of the Paraexp method with enhanced parallelism based on the exponential block Krylov (EBK) method. Furthermore, the method, Paraexp-EBK (PEBK), is extended to solve nonlinear PDEs iteratively by a waveform relaxation method. The nonlinear terms are represented by a source term in a nonhomogeneous linear system of ODEs. At each iteration, the source term is updated with the latest solution. The convergence of the iterative process can be accelerated by adding a correction term based on the Jacobian matrix of the nonlinear term. At each iteration, the initial value problem can then be decoupled into independent subproblems, which can be solved in parallel in time. Essentially, we implement the Paraexp method within a waveform relaxation approach in order to integrate nonlinear PDEs. Also, the PEBK method is used to integrate
Figure 4.6: Error history with ∆T = 0.…, T = P∆T: (a) k = 4; (b) k = 16. □, P = 1; ◦, P = 2; △, P = 4; ⋆, P = 8; ⋄, P = 16; ×, P = 32.

both the homogeneous and the nonhomogeneous parts of the subproblems. This is in contrast to the original Paraexp method, which assumes a conventional time integration method for the nonhomogeneous parts.

The PEBK method is tested on the advection-diffusion equation, for which we demonstrate the parallelization concept for linear PDEs. The parallelization also works in cases without diffusion present, in which the PDE is purely hyperbolic. The parallel efficiency is compared with a Crank–Nicolson (CN) scheme parallelized with the Paraexp algorithm. The parallel efficiency of the PEBK method remains roughly constant around 90%. On the other hand, the parallel efficiency of the CN/Paraexp combination steadily decreases with the number of processors.

As a model nonlinear PDE, the viscous Burgers equation is solved. The number of waveform relaxation iterations required for a certain error tolerance increases when the relative importance of nonlinearity grows by decreasing the viscosity coefficient. Good parallel efficiency of the EBK method was observed for different values of the viscosity coefficient. Since the nonlinear convective term in the Burgers equation is shared by the Navier–Stokes equation, the presented results give a hint of the potential of the PEBK method as an efficient parallel solver in turbulent fluid dynamics, where nonlinearity plays a key role. The question remains how to treat the pressure in the incompressible Navier–Stokes equation, which enforces the incompressibility constraint on the velocity field (see for possible approaches [15, 43]). This will be explored in future work.

REFERENCES

[1] M. Afanasjew, M. Eiermann, O. G. Ernst, and S. Güttel.
Implementation of a restarted Krylov subspace method for the evaluation of matrix functions. Linear Algebra Appl., 429:2293–2314, 2008.
[2] M. Botchev. A block Krylov subspace time-exact solution method for linear ordinary differential equation systems. Numer. Linear Algebra Appl., 20(4):557–574, 2013.
[3] M. Botchev, G. Sleijpen, and A. Sopaheluwakan. An SVD-approach to Jacobi–Davidson solution of nonlinear Helmholtz eigenvalue problems. Linear Algebra Appl., 431(3):427–440, 2009.
[4] M. A. Botchev, I. Faragó, and A. Havasi. Testing weighted splitting schemes on a one-column transport-chemistry model. Int. J. Environment and Pollution, 22(1/2):3–16, 2004.
[5] M. A. Botchev, V. Grimm, and M. Hochbruck. Residual, restarting, and Richardson iteration for the matrix exponential. SIAM J. Sci. Comput., 35(3):A1376–A1397, 2013.
[6] M. A. Botchev, I. V. Oseledets, and E. E. Tyrtyshnikov. Iterative across-time solution of linear differential equations: Krylov subspace versus waveform relaxation. Comput. Math. Appl., 67(12):2088–2098, 2014. http://dx.doi.org/10.1016/j.camwa.2014.03.002
[7] M. A. Botchev and H. A. van der Vorst. A parallel nearly implicit scheme. J. Comput. Appl. Math., 137:229–243, 2001.
[8] K. Burrage. Parallel and Sequential Methods for Ordinary Differential Equations. The Clarendon Press, Oxford University Press, New York, 1995. Oxford Science Publications.
[9] A. Cella and M. Lucchesi. Space-time finite elements for the wave propagation problem. Meccanica, 10(3):168–170, 1975. http://dx.doi.org/10.1007/BF02149028
[10] E. Celledoni and I. Moret. A Krylov projection method for systems of ODEs. Appl. Numer. Math., 24(2):365–378, 1997.
[11] A. J. Chorin and O. H. Hald. Viscosity-dependent inertial spectra of the Burgers and Korteweg–deVries–Burgers equations. Proc. Natl. Acad. Sci. USA, 102(11):3921–3923, 2005.
[12] J. J. B. de Swart. Parallel Software for Implicit Differential Equations. PhD thesis, University of Amsterdam, Nov. 1997. ISBN 90-74795-77-3.
[13] S. V. Dolgov, B. N. Khoromskij, and I. V. Oseledets. Fast solution of parabolic problems in the tensor train/quantized tensor train format with initial application to the Fokker–Planck equation. SIAM J. Sci. Comput., 34(6):A3016–A3038, 2012. http://dx.doi.org/10.1137/120864210
[14] J. Dongarra et al. The international exascale software project roadmap. Int. J. High Performance Computing Applications, page 1094342010391989, 2011.
[15] W. Edwards, L. Tuckerman, R. Friesner, and D. Sorensen. Krylov methods for the incompressible Navier–Stokes equations. J. Comput. Phys., 110(1):82–102, 1994.
[16] M. Eiermann, O. G. Ernst, and S. Güttel. Deflated restarting for matrix functions. SIAM J. Matrix Anal. Appl., 32(2):621–641, 2011.
[17] M. Emmett and M. Minion. Toward an efficient parallel in time method for partial differential equations. Commun. Appl. Math. Comput. Sci., 7(1):105–132, 2012.
[18] P. F. Fischer, F. Hecht, and Y. Maday. A Parareal in time semi-implicit approximation of the Navier–Stokes equations. In Domain Decomposition Methods in Science and Engineering, pages 433–440. Springer, 2005.
[19] M. Gander and S. Vandewalle. Analysis of the Parareal time-parallel time-integration method. SIAM J. Sci. Comput., 29(2):556–578, 2007.
[20] M. J. Gander. Analysis of the parareal algorithm applied to hyperbolic problems using characteristics. Boletín de la Sociedad Española de Matemática Aplicada, 42:21–35, 2008.
[21] M. J. Gander and S. Güttel. Paraexp: A parallel integrator for linear initial-value problems. SIAM J. Sci. Comput., 35(2):C123–C142, 2013.
[22] T. Göckler and V. Grimm. Uniform approximation of ϕ-functions in exponential integrators by a rational Krylov subspace method with simple poles. SIAM J. Matrix Anal. Appl., 35(4):1467–1489, 2014. http://dx.doi.org/10.1137/140964655
[23] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, MD, 1996.
[24] W. Hackbusch. Parabolic multigrid methods. In Computing Methods in Applied Sciences and Engineering VI (Versailles, 1983), pages 189–197. North-Holland, Amsterdam, 1984.
[25] M. Hochbruck and J. Niehoff. Approximation of matrix operators applied to multiple vectors. Math. Comput. Simulation, 79(4):1270–1283, 2008.
[26] M. Hochbruck and A. Ostermann. Exponential integrators. Acta Numerica, 19:209–286, 2010.
[27] E. Hopf. The partial differential equation u_t + uu_x = µu_xx. Comm. Pure Appl. Math., 3(3):201–230, 1950.
[28] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, 1986.
[29] R. A. Horn and C. R. Johnson. Topics in Matrix Analysis. Cambridge University Press, 1991.
[30] G. Horton and S. Vandewalle. A space-time multigrid method for parabolic partial differential equations. SIAM J. Sci. Comput., 16(4):848–864, 1995. http://dx.doi.org/10.1137/0916050
[31] T. J. R. Hughes and G. M. Hulbert. Space-time finite element methods for elastodynamics: formulations and error estimates. Comput. Methods Appl. Mech. Engrg., 66(3):339–363, 1988. http://dx.doi.org/10.1016/0045-7825(88)90006-0
[32] W. Hundsdorfer and J. G. Verwer. Numerical Solution of Time-Dependent Advection-Diffusion-Reaction Equations, volume 33. Springer Science & Business Media, 2013.
[33] E. Lelarasmee, A. E. Ruehli, and A. Sangiovanni-Vincentelli. The waveform relaxation method for time-domain analysis of large scale integrated circuits. IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, 1(3):131–145, July 1982.
[34] G. Li. A block variant of the GMRES method on massively parallel processors. Parallel Computing, 23(8):1005–1019, 1997.
[35] J.-L. Lions. Équations différentielles opérationnelles et problèmes aux limites. Die Grundlehren der mathematischen Wissenschaften, Bd. 111. Springer-Verlag, Berlin-Göttingen-Heidelberg, 1961.
[36] J.-L. Lions, Y. Maday, and G. Turinici. Résolution d'EDP par un schéma en temps pararéel. Comptes Rendus de l'Académie des Sciences, Série I, Mathématique, 332(7):661–668, 2001.
[37] C. Lubich and A. Ostermann. Multi-grid dynamic iteration for parabolic equations. BIT Numerical Mathematics, 27:216–234, 1987. doi:10.1007/BF01934186
[38] U. Miekkala and O. Nevanlinna. Convergence of dynamic iteration methods for initial value problems. SIAM J. Sci. Statist. Comput., 8(4):459–482, 1987.
[39] M. Minion. A hybrid parareal spectral deferred corrections method. Commun. Appl. Math. Comput. Sci., 5(2):265–301, 2011.
[40] I. Moret and P. Novati. RD rational approximations of the matrix exponential. BIT, 44:595–615, 2004.
[41] K. W. Morton and D. F. Mayers. Numerical Solution of Partial Differential Equations: An Introduction. Cambridge University Press, 2005.
[42] O. Nevanlinna. Remarks on Picard–Lindelöf iteration. BIT Numerical Mathematics, 29(2):328–346, 1989.
[43] C. K. Newman. Exponential Integrators for the Incompressible Navier–Stokes Equations. PhD thesis, Virginia Polytechnic Institute and State University, 2004.
[44] A. R. Newton and A. L. Sangiovanni-Vincentelli. Relaxation-based electrical simulation. IEEE Transactions on Electron Devices, 30(9):1184–1207, 1983.
[45] J. Niehoff. Projektionsverfahren zur Approximation von Matrixfunktionen mit Anwendungen auf die Implementierung exponentieller Integratoren. PhD thesis, Mathematisch-Naturwissenschaftliche Fakultät der Heinrich-Heine-Universität Düsseldorf, December 2006.
[46] A. Nikishin and A. Y. Yeremin. Variable block CG algorithms for solving large sparse symmetric positive definite linear systems on parallel computers, I: General iterative scheme. SIAM J. Matrix Anal. Appl., 16(4):1135–1153, 1995.
[47] D. Ruprecht and R. Krause. Explicit parallel-in-time integration of a linear acoustic-advection system. Computers & Fluids, 59:72–83, 2012.
[48] G. A. Staff and E. M. Rønquist. Stability of the parareal algorithm. In Domain Decomposition Methods in Science and Engineering, pages 449–456. Springer, 2005.
[49] J. Steiner, D. Ruprecht, R. Speck, and R. Krause. Convergence of Parareal for the Navier–Stokes equations depending on the Reynolds number. In Numerical Mathematics and Advanced Applications – ENUMATH 2013, pages 195–202. Springer, 2015.
[50] H. Tal-Ezer. On restart and error estimation for Krylov approximation of w = f(A)v. SIAM J. Sci. Comput., 29(6):2426–2441, 2007.
[51] J. van den Eshof and M. Hochbruck. Preconditioning Lanczos approximations to the matrix exponential. SIAM J. Sci. Comput., 27(4):1438–1457, 2006.
[52] P. J. van der Houwen and B. P. Sommeijer. CWI contributions to the development of parallel Runge–Kutta methods. CWI Quarterly, 11(1):33–53, 1998. Solving differential equations on parallel computers.
[53] E. P. van der Poel, R. Ostilla-Mónico, J. Donners, and R. Verzicco. A pencil distributed finite difference code for strongly turbulent wall-bounded flows. Computers & Fluids, 116:10–16, 2015.
[54] J. J. W. van der Vegt and H. van der Ven. Space-time discontinuous Galerkin finite element method with dynamic grid motion for inviscid compressible flows. I. General formulation. J. Comput. Phys., 182(2):546–585, 2002.
[55] J. G. Verwer and J. M. Sanz-Serna. Convergence of method of lines approximations to partial differential equations.