Massively parallel-in-space-time, adaptive finite element framework for non-linear parabolic equations
Robert Dyja, Baskar Ganapathysubramanian, Kristoffer G. van der Zee
MMASSIVELY PARALLEL-IN-SPACE-TIME, ADAPTIVE FINITEELEMENT FRAMEWORK FOR NON-LINEAR PARABOLICEQUATIONS
ROBERT DYJA, BASKAR GANAPATHYSUBRAMANIAN,AND KRISTOFFER G. VAN DER ZEE
Abstract.
We present an adaptive methodology for the solution of (linearand) non-linear time dependent problems that is especially tailored for mas-sively parallel computations. The basic concept is to solve for large blocks ofspace-time unknowns instead of marching sequentially in time. The method-ology is a combination of a computationally efficient implementation of aparallel-in-space-time finite element solver coupled with a posteriori space-time error estimates and a parallel mesh generator. This methodology en-ables, in principle, simultaneous adaptivity in both space and time (withinthe block) domains. We explore this basic concept in the context of a varietyof time-steppers including Θ-schemes and Backward Differentiate Formulas.We specifically illustrate this framework with applications involving time de-pendent linear, quasi-linear and semi-linear diffusion equations. We focus oninvestigating how the coupled space-time refinement indicators for this class ofproblems affect spatial adaptivity. Finally, we show good scaling behavior upto 150,000 processors on the Blue Waters machine. This is achieved by carefulusage of memory via block storage and non-zero formats along with lumpedcommunication for matrix assembly. This methodology enables scaling on nextgeneration multi-core machines by simultaneously solving for large number oftime-steps, and reduces computational overhead by refining spatial blocks thatcan track localized features. This methodology also opens up the possibility ofefficiently incorporating adjoint equations for error estimators and inverse de-sign problems, since blocks of space-time are simultaneously solved and storedin memory. Introduction
We describe the methodology, implementation details and application examplesof space-time block adaptive solutions to parabolic partial differential equations(evolution problems). This approach is primarily motivated by the necessity of
Czestochowa University of Technology, ul. Dabrowskiego 73, 42-201 Czestochowa,PolandIowa State University, 306 Lab of Mechanics, Ames, IA 50011University of Nottingham, School of Mathematical Sciences, University Park, Not-tingham NG7 2RD, UK
E-mail addresses : [email protected], [email protected],[email protected] . Key words and phrases. parabolic problems, parallel-in-time, finite element method, adaptivemesh refinement.We gratefully acknowledge NSF 1435587 and XSEDE resources at TACC, as well as an ex-ploratory account on BlueWaters.Kristoffer G. van der Zee’s research was supported by the Engineering and Physical SciencesResearch Council (EPSRC) under grant EP/I036427/1. a r X i v : . [ phy s i c s . c o m p - ph ] A ug R. DYJA, B. GANAPATHYSUBRAMANIAN, AND K.G. VAN DER ZEE designing computational methodologies that can scale to leverage the availabilityof very large computing clusters (with ∼ ∼ thousands of processors, beyond which efficiency iscommunication limited) which make this approach unsuitable on larger machines.To overcome this barrier, a natural approach is to consider the time domain as anadditional dimension and simultaneously solve for blocks of time, instead of thestandard approach of sequential time-stepping. Early work on this approach wasconsidered by Hughes and coworkers [15, 16], Tezduyar and co-workers [24], andReddy and co-workers [21], while variations on this theme have recently been ex-plored by several groups [6, 9, 18, 19, 22, 28]. Other examples include methodsthat exploit the fact that some coefficients during matrix assembly can be calcu-lated concurrently [7] and methods that transform the equations so as to build onesystem of equations for more than one timestep [17].The concept of solving for blocks of time simultaneously has recently gaineda lot of attention to enable effective usage of exascale computing resources (seefor instance the US DOE’s Exascale Mathematics Working Group [14]). In addi-tion to this obvious advantage, solving for space-time blocks also allows naturalincorporation of a posteriori error estimates for mesh adaptivity, and enables thesolution of inverse problems (involving adjoints). This has several additional tangi-ble benefits in the context of computational overhead. For evolution problems thatexhibit regionalized behavior in space and time (wave equations, moving interfaceslike bubbles and shocks) solving in blocks of space-time that are locally refined tomatch the regional behavior provides substantial computational gain [9]. Similarly,the availability of error estimates across a block of time allows optimal choices ofspace and time adaptivity.Motivated by these considerations, this paper presents a methodology for thesolution of linear, quasi-linear and semi-linear (time dependent) diffusion equationsin three dimensions. We discuss the development of a parallel adaptive frameworkfor the solution of large blocks of space-time. We detail the development of theblock space-time framework for two classes of time-steppers ( θ schemes, BackwardDifference Formula (BDF)). We discuss implementation details of the massivelyparallel adaptive block-space time framework and illustrate scaling behavior up to150,000 processors. We subsequently define a posteriori space-time error indicators to identify spatial regions for mesh adaption. Finally, we demonstrate that for suffi-ciently large problems the block space-time approach is much more computationallyefficient than the standard sequential time-stepping approach.The outline of the rest of the paper is as follows: Section 2, and 3 detail the blockspace-time framework for linear, and non-linear evolution equations, respectively.Section 4 discusses the space-time error estimates for these classes of problems. InSection 5, we discuss implementation details and show scaling performance andanalysis. Section 6 illustrates several numerical examples of the framework. Weconclude in Section 7. DAPTIVE PARALLEL-IN-SPACE-TIME FINITE ELEMENT 3 Basic space-time formulation: linear and non-linear versions
Space-time framework for a linear problem.
Given a bounded domainΩ ∈ R , and a finite time domain [0 , T ], consider the parabolic equation that solvesfor u : Ω × [0 , T ] → R :(1) (cid:40) ∂ t u ( x , t ) − ∇ · κ ∇ u ( x , t ) = f ( x , t ) in Ω × [0 , T ] u ( x ,
0) = u where f : Ω × [0 , T ] → R is a smooth source function, and κ >
0. We consider (with-out loss of generality) that Dirichlet boundary conditions are imposed on the bound-ary Γ , unless otherwise specified. Considering a tesselation,
T ≡ { Ω , ..., Ω e , ... } ,of the domain Ω into elements (with average size h ), the weak form of this equationis given as:(2) (cid:40) Find u h ( · , t ) ∈ U h :( w h , ∂ t u h ( · , t )) + ( ∇ w h , κ ∇ u h ( · , t )) = ( w h , f ) ∀ w h ∈ V h where ( . , . ) is the L inner product on Ω and(3) U h := (cid:8) u h | u h ∈ H (Ω) , u h ∈ P (Ω e ) ∀ e (cid:9) V h := (cid:8) w h | w h ∈ H (Ω) , w h ∈ P (Ω e ) ∀ e (cid:9) with P (Ω e ) being the space of the standard polynomial finite element shape func-tions on element Ω e . To obtain a fully discretized form, we employ a time steppingtechnique on the above semi-discrete equation. While any time-stepping methodcan be used, as an example, consider the Euler Backward Formula that is definedon a discretization { , t , t , ..., T } of the time domain:(4) (cid:18) w h , u hn +1 − u hn ∆ t (cid:19) − (cid:0) ∇ w h , κ ∇ u hn +1 (cid:1) = (cid:0) w h , f n +1 (cid:1) , for n = 0 , , ... where the subscript denotes evaluation at that discrete time, and ∆ t = t n +1 − t n is the timestep.Following standard FEM practice, with the tesselation of the domain resultingin k nodal values (to describe spatial variation) of the field u , equation (4) can beexpressed in terms of matrix-vector products as:(5) Mu n +1 − ∆ t Ku n +1 = Mu n + ∆ t f n +1 , for n = 0 , , ... where M and K are the global mass and stiffness matrices, respectively. u n +1 and u n are vectors containing the nodal values of the field u at time step n + 1 and n , respectively. Equation (5) represents the system of equations solved to get thesolution for timestep n + 1. The size of vector u is equal to the number of nodalunknowns, k . Similarly matrices K , M are sparse matrices of size k × k .Consider a block-wise division of the total time domain. Each block, B i , consistsof multiple timesteps. This is schematically represented in Figure 1. Instead ofsequentially solving for each time step (as in (5)), consider solving for the fieldvariable in a complete time block, B , consisting of N timesteps simultaneously, i.e.solve for u i , i = 1 , ..., N simultaneously. With some abuse of notation, the elements of matrix M are equal to M ij = ( w hi , w hj ) andmatrix K are equal to K ij = ( ∇ w hi , κ ∇ w hj ). R. DYJA, B. GANAPATHYSUBRAMANIAN, AND K.G. VAN DER ZEE B B t= t n t n+ t N t N +B t=Tt t t n +B Figure 1.
Indexing of timesteps per block (B). Number oftimesteps per block is equal to N.This results in a block diagonal matrix of size ( N × k )-by-( N × k ) given by:(6) I − M M + ∆ t K − M M + ∆ t K . . . − M M + ∆ t K u u u ... u N = IC∆ t f ∆ t f ...∆ t f N where I is an identity matrix (of size k ) and IC are the imposed initial conditions.This system solves for N timesteps at once with a total number of unknowns N × k . Remark 1.
By treating the (unknown) nodal values at different time steps as mul-tiple degrees of freedom associated with each spatial node, we can leverage standardalgorithmic approaches (assembly, memory usage) tailored for multiple d.o.f prob-lems. Many approaches in uncertainty quantification (polynomial chaos represen-tation, spectral stochastic methods) leverage such an approach of representing fieldvariation along additional dimensions (stochastic dimensions) as simply additionaldegrees of freedom at each spatial location [20, 27, 11, 23] . Figure 2(left) illustratesan elemental matrix considering N (= 10) time steps (i.e. degrees of freedom pernode), for a 2D quadrilateral element . Note the sparse structure of the elementalmatrix as well as the resulting global matrix (for a × quad mesh) in Figure 2(right). Figure 2.
Left, Nonzero entries in an elemental matrix. Right,Nonzero entries in the resultant global matrix for a 2x2 mesh ofquadrilateral elements. Dark squares represent nonzero entries andnumber are node indices. The time block consists of 10 time steps. using Backward-Euler discretization in time DAPTIVE PARALLEL-IN-SPACE-TIME FINITE ELEMENT 5
Space-time framework for non-linear problems.
Extending the approachto certain non-linear problems is straightforward. Consider the case where κ is afunction of the dependent variable, u . We assume that κ ( u ) satisfies appropri-ate smoothness and boundedness assumptions to ensure existence and uniqueness,0 < κ ≤ κ ( u ) ≤ κ < ∞ . In this case the weak form for a block B is(7) ( w h , u hn +1 − u hn ∆ t ) + ( ∇ w h , κ ( u hn +1 ) ∇ u hn +1 ) = ( w h , f n +1 ) , for n = 0 , , ..., N The solution to such non-linear equations are usually via (quasi-)Newton schemes.The methodology involves construction of the Jacobian and residual, which areused to compute updates. This is represented in matrix-vector terms as: J u in +1 δ u i +1 n +1 = F u in +1 , u i +1 n +1 = u in +1 + δ u i +1 n +1 , (8) for i = 1 , ..., until convergence and for n = 0 , , ... where J u in +1 is the Jacobian (or linearized form), and F u in +1 is the residual of theabove equation, both computed using u in +1 . More specifically, for the non-lineardiffusion equations defined by (7), the residual, F u in +1 , is given by(9) F u in +1 = 1∆ t Mu in +1 − t Mu n − K ( u in +1 ) u in +1 − f n +1 where K ( u in +1 ) denotes the solution dependent stiffness matrix. The Jacobian isgiven as:(10) J u in +1 = 1∆ t M + (cid:18) K ( u in +1 ) + d K ( u in +1 )d u (cid:19) Instead of sequentially solving for each time step (as in (9)), consider solving forthe field variable in a complete time block, B , consisting of N timesteps simulta-neously. That is,(11) J u i − t M J u i . . . − t M J u iN δ u i +11 δ u i +12 ... δ u i +1 N = F u i F u i ... F u iN and u i +11 u i +12 ... u i +1 N = u i u i ... u iN + δ u i +11 δ u i +12 ... δ u i +1 N Remark 2.
One can alternatively ignore the off-diagonal entries of the block Ja-cobian to construct an approximate diagonal Jacobian. The propagation of timeinformation is then limited to the residual on the right hand side. We tested bothapproaches, with the latter approach taking marginally more iterations to conver-gence, while providing substantial ease of implementation. Unless otherwise stated,all our results are based on the latter approach.
R. DYJA, B. GANAPATHYSUBRAMANIAN, AND K.G. VAN DER ZEE Space-time formulation: Higher Order Time schemes
We next look at extending the space-time strategy to incorporate two families ofhigher order time-steppers: Θ scheme and Backward difference formula. We con-sider linear and nonlinear diffusion, and moreover, we also consider the treatment ofthe Allen-Cahn equation, which is a parabolic PDE with a lower-order nonlinearity,whose solution has evolving layers and for which adaptivity is particularly useful.3.1. Θ scheme: Linear equation.
The semi-discrete form of the Θ-scheme –which is a generalization of the Euler Backward scheme – is as follows:(12) ( w, u n +1 ) − ( w, u n ) + ∆ t [(1 − Θ)( ∇ w, κ ∇ u n ) + Θ( ∇ w, κ ∇ u n +1 )] =∆ t [(1 − Θ)( w, f n ) + Θ( w, f n +1 )] , for n = 0 , , ... The fully discrete matrix-vector representation is given by(13) Mu n +1 − Mu n +∆ t (1 − Θ) Ku n +∆ t Θ Ku n +1 = ∆ t (1 − Θ) f n +∆ t Θ f n +1 , for n = 0 , , ... Again, it is straightforward to group and simultaneously solve for N time stepstogether. The corresponding matrix form is expressed as: I − M + ∆ t (1 − Θ) K M + ∆ t Θ K − M + ∆ t (1 − Θ) K M + ∆ t Θ K . . . − M + ∆ t (1 − Θ) K M + ∆ t Θ K u u u ... u N = IC∆ t [(1 − Θ) f + Θ f ]∆ t [(1 − Θ) f + Θ f ]...∆ t [(1 − Θ) f N − + Θ f N ] (14)Again, the global space-time matrix (14) has a block structure.3.2. Θ scheme: Nonlinear diffusion with variable coefficient. The corre-sponding weak form for this case is given as:(15) ( w h , u hn +1 − u hn ∆ t ) + (1 − Θ)( ∇ w h , κ ( u hn ) ∇ u hn ) + Θ( ∇ w h , κ ( u hn +1 ) ∇ u hn +1 ) =(1 − Θ)( w h , f n ) + Θ( w h , f n +1 ) , for n = 0 , , ... The Jacobian is:(16) J u in +1 = M + ∆ t Θ (cid:18) K ( u in +1 ) + d K ( u in +1 )d u (cid:19) while the residual is given by:(17) F u in +1 = Mu in +1 − Mu n +∆ t Θ (cid:0) K ( u in +1 ) u in +1 − f n +1 (cid:1) +∆ t (1 − Θ) ( K ( u n ) u n − f n ) DAPTIVE PARALLEL-IN-SPACE-TIME FINITE ELEMENT 7
Again, it is straightforward to group and simultaneously solve for N time stepstogether. The corresponding (diagonal) block Jacobian (see Remark 2) is given as:(18) M +∆ t Θ (cid:16) K ( u i ) + d K ( u i )d u (cid:17) M +∆ t Θ (cid:16) K ( u i ) + d K ( u i )d u (cid:17) . . . M +∆ t Θ (cid:16) K ( u iN ) + d K ( u iN )d u (cid:17) As before, the upper index denotes Newton-Raphson iteration.3.3. Θ scheme: Allen Cahn Equation.
The Allen-Cahn equation is a semi-linear diffusion equation with a non-linear reaction term:(19) ∂ t u ( x , t ) − ∆ u ( x , t ) + (cid:15) − f ( u ) = 0where we set f ( u ) = u ( u − u ( x ,
0) = u along with zeroflux conditions in the boundaries.The corresponding semi-discrete form is given as:(20) ( w h , ∂ t u h ) + ( ∇ w h , ∇ u h ) + (cid:15) − ( w h , f ( u h )) = 0Using the Θ-scheme results in the fully discrete form:(21) ( w h , u hn +1 ) + ∆ t Θ (cid:2) ( ∇ w h , ∇ u hn +1 ) + (cid:15) − ( w h , f ( u hn +1 )) (cid:3) =( w h , u hn ) − ∆ t (1 − Θ) (cid:2) ( ∇ w h , ∇ u hn ) + (cid:15) − ( w h , f ( u hn )) (cid:3) Again, it is straightforward to group and simultaneously solve for N time stepstogether. The corresponding (diagonal) block Jacobian is given as:(22) M + ∆ t Θ (cid:0) K ( u i )+ (cid:15) − f ( u i )d u (cid:17) M + ∆ t Θ (cid:0) K ( u i )+ (cid:15) − f ( u i )d u (cid:17) . . . M + ∆ t Θ (cid:0) K ( u iN )+ (cid:15) − f ( u iN )d u (cid:17) Alternative schemes for the Allen-Cahn equation and other phase-field models aredescribed in, e.g., [13].3.4.
Backward difference formula based time steppers.
BDF based time-steppers of order s utilize the solution at s previous time steps to construct thesolution at the next time step. A general s order BDF scheme is given as:(23) s (cid:88) k =0 α k u n + k = ∆ tβg n + s R. DYJA, B. GANAPATHYSUBRAMANIAN, AND K.G. VAN DER ZEE where the left hand side is the BDF scheme representation of the time derivative, ∂u∂t , in terms of the solution u i at time point i , and the right hand side collects allother terms. Here, α and β are known BDF coefficients [8]. A first order ( s = 1)BDF scheme is identical to the Euler Backward scheme described earlier. Thesimplest multi-step scheme is for s = 2 and for the linear diffusion equation it isgiven as:(24)( w h , u hn +2 ) −
43 ( w h , u hn +1 ) + 13 ( w h , u hn ) + 23 ∆ t ( ∇ w h , κ ∇ u hn +2 ) = 23 ∆ t ( w h , f n +2 )Again, it is straightforward to group and simultaneously solve for N time stepstogether. The corresponding space-time block equations are as follows:(25) I − M M + ∆ t K M − M M + ∆ t K . . . M − M M + ∆ t K u u u ... u N = IC∆ t f ∆ t f ... ∆ t f N It is clear that higher order multistep methods produce block matrices that have alarger bandwidth (see Figure 3). Nonlinear problems can be treated similarly.
Figure 3.
View of nonzero entries of an elemental matrix (2Dquad with linear basis functions) for a block of space-time con-sisting of 2 timesteps + initial condition (left) and 9 timesteps +initial condition(right) with Backward difference formula of secondorder. First timestep is calculated using Euler Backward scheme.4.
Adaptive meshing for the block-space-time method: Residualbased error estimator
A central idea of this work is to develop a block space-time methodology that canbe integrated with mesh adaptivity. This will enable targeted refinement of regionsthat exhibit variations in the corresponding block of time. Mesh adaptivity requiresthe definition of an indicator function that determines which regions of the spacerequire refinement/coarsening. In this work, we build on prior work and utilize Note that the first time step is approximated using a backward Euler time stepper as thesecond order BDF scheme requires knowledge of the solution at two previous time steps
DAPTIVE PARALLEL-IN-SPACE-TIME FINITE ELEMENT 9 standard residual-based error indicators; see, e.g. [4, 26, 25]. Alternative duality-based indicators for nonlinear parabolic problems are described in, e.g. [9, 10, 12].Residual based error indicators η e are constructed for each spatial element, Ω e ,and consist of two terms – an interior residual, r int , and a jump residual, r jump [26]:(26) η e = h e (cid:107) r hint (cid:107) L (Ω e ) + h e (cid:107) r hjump (cid:107) L ( ∂ Ω e ) where h e is the size of element, Ω e . For example, for the linear and quasi-lineardiffusion equation, the detailed derivation of the interior and jump residual is avail-able in the work of Verfurth [26]. We refer the interested reader to that work andonly show the key results here. Basically, the two terms are constructed (as thename suggests) from the definition of the residual:(27) R h ( w ) = ( w, f ) − ( w, ∂ t u h ) − ( ∇ w, κ ( u h ) ∇ u h )This residual is decomposed into element-wise terms as:(28) R h ( w ) = (cid:88) e (cid:18)(cid:90) Ω e wr h int dΩ + (cid:90) Γ e wr h jump dΓ (cid:19) where(29) r h int = f − ∂ t u h + ∇ · κ ( u h ) ∇ u h and(30) r h jump = (cid:26) ∂ Ω e ∩ Γ h κ ( u h ) ∇ u h + · n + + κ ( u h ) ∇ u h − · n − on ∂ Ω e \ Γ h where Γ h is that part of the boundary with Dirichlet conditions imposed.The interior and jump residuals for the Allen-Cahn equations are similarly definedas(31) r h int = (cid:15) − f ( u h ) − ∂ t u h + ∆ u h and(32) r h jump = (cid:26) ∂ Ω e ∩ Γ h ∇ u h + · n + + ∇ u h − · n − on ∂ Ω e \ Γ h Recall that we are using error indicators defined over a block of time. We extendthe concept of error indicator defined at one time step to the notion of an errorindicator defined over a block of time steps. There are several choice of errorindicators with two of them being:the average value of the error indicator across the block of time(33) η e,avge = (cid:88) Block of time, B (cid:32) (cid:80) Nn =1 η e,n N (cid:33) and the maximum value in the block of time:(34) η e,max = max Block of time, B η e,n The former approach is advantageous for slow variations in the block and avoidsfrequent migrating of elements between processors, which can negatively affect scal-ability. The latter approach is advantageous when there is rapid changes across afew timesteps. In this case the error estimator is not ’diffused’ by timesteps wheresolution is changing slowly.
Remark 3.
In this work, we approximate the semi-discrete term, ∂u∂t in terms ofits finite difference representation ( Θ or BDF scheme). An implicit assumptionis that the time steps are small enough that this approximation is valid. Ideally,one would choose a consistent representation in both space and time; i.e. usinga finite element representation for time variations [21, 15] . This provides severaladvantages in terms of mathematical elegance. We defer this development to asubsequent paper. Implementation details
We utilize our in-house scalable, parallel FEM framework that is optimized fordistributed memory computing. The FEM software library is implemented in C++and uses object oriented software principles. Linear algebra, parallel matrix andvector storage are all performed by the PETSc library [5]. PETSc modules (KSP,SNES) are used to solve (non)linear equations. The FEM library is dynamicallylinked to the Parallel Hierarchical Grid (PHG) library [2] which is a parallel meshrefinement framework. PHG uses a bisection type algorithm [29], specifically newestvertex bisection to refine/coarsen elements . PHG operates on simplex elements andproduces conforming meshes after refinement. While our existing implementationwas reasonably optimized, several software engineering principles had to be carefullyimplemented to ensure efficient execution of the block space-time problems. Someof the simpler changes that had substantial impact for the space-time formulation , but had minimal impact on the existing, standard iterative formulation are: (a)moving calculations out of loops whenever possible, (b) executing calculations andstoring results in array before loops, (c) conversion to 64-bit based integer variablesfor storing matrix indices, which allows going beyond 4 billion unknowns, (d) usinglocal values instead of values obtained through pointer or reference .We made software engineering decisions to ensure that the space-time imple-mentation is compatible with our existing sequential FEM frameworks (see remark1). Basically, the space-time formulation is treated as the solution of a steady-stateproblem. Initial conditions are imposed as boundary condition on the first degreeof freedom. We discuss key aspects of the implementation next.5.1. Memory interlacing and matrix bandwidth.
We rearrange the vectorof unknowns u to enumerate time points before looping over space. i.e. u = { u , u , ..., u N , u , u , ..., u N , ..., u kN } , where subscript refers to time and superscriptdenotes space. This allows for more efficient assembly, because all data (coefficientsin system of equation) for a specific element share memory locality, thus preventingcache misses. Moreover, this approach is compatible with existing frameworksfor FEM, because problems with multiple DOFs are supported in existing FEMframeworks, so we can use standard procedures for system assembling or imposingboundary conditions. This has the additional advantage of reducing the matrix In newest vertex bisection, the edge that lies in opposite to the newest node is divided even very simple changes possibly like avoiding division operations by replacing with multipli-cation, or using ’const’ to enable the compiler to automatically simplify expressions, and avoidingunnecessary conversions between integer and double values have some impact these standard best practices provide minimal improvements to sequential time stepping, andare usually not reported or ignored. the function can access values more quickly, as it does not have to first fetch pointer, thancheck where pointer points and get value DAPTIVE PARALLEL-IN-SPACE-TIME FINITE ELEMENT 11 bandwidth. Table 1 enumerates the bandwidth for space-time formulation (2Dmesh with 100 ×
100 quad elements, linear basis function) using a Euler Backwardformulation. As expected, the bandwidth linearly increases with increasing numberof timesteps.
Table 1.
The bandwidth size for space-time formulationnumber of timestepssequential 5 10 25 50row size 10201 51005 102010 255025 510050bandwidth 205 1023 2043 5103 10203
10 100 1000 10000 CN BDF2 EB t i m e [ s ] Elliptic problem with force term aij125102550100no zeros
Figure 4.
Computational time for different sizes of block size inblock based matrix aij format . Investigated example was the 2Dlinear diffusion problem using 100x100 quad elements with oneblock of 100 timesteps. Numbers in legend describes block size,the ’aij’ is sparse matrix without blocks and ’no zeroes’ is specialcase with ignored zero entries in element matrix.5.2.
Matrix access and storage.
We explored the use of block structured matrixformats that are optimized for storing the global matrix, where the block size is afunction of the number of timesteps. We use blocks of sizes ranging from 2 × N × N (where N is the number of timesteps in a block, B ). Figure 4 plots thetime required to assemble and solve a linear diffusion problem using the space-timeformulation (Backward Euler). We can see that different non-zero blocks sizes cangive observable savings in computational time. Note also that each of these blocksis itself quite sparse (see Figure. 6). Careful identification of the non-zero patterncould potentially result in much larger savings. This is illustrated in the ’no-zero’column in Figure 4 where we avoid inserting zero elements to the global matrix.The memory requirement for this case is also substantially minimized (see figure5). m e m o r y [ b y t e s ] size of blockElliptic problem with force term Figure 5.
Memory requirements for different sizes of block sizein block based matrix aij format and sparse element matrix On theexample of elliptic problem on grid of 100x100 elements with oneblock of 100 timesteps
Figure 6.
View of full element matrix for Euler Backward (left)and compressed element matrix (right). Dark grey color meansnonzero entries in matrix.5.3.
Compressed element matrix.
Figure 6 shows the non zero patterns withinan elemental stiffness matrix. The size of this matrix is ( nbf × N ) × ( nbf × N ),where nbf is the number of basis functions (assuming 1 dof) and N is the numberof time steps in the block, B . Clearly, as N increases, the sparsity of the elementalstiffness matrix improves. This suggests using compressed formats for storing theelemental stiffness matrices. This issue becomes more important with increasingblock size.5.4. Scaling.
We performed scaling studies on two machines. Preliminary scalingwas performed on the TACC Stampede [3]. Stampede consists of 6400 computenodes each equipped with two Intel E5-2680 8-core processors and 32 GB of memory.Each compute node has access to 250GB of local storage and Lustre filesystem witha write performance of 150GB/s. The nodes are connected with FDR InfiniBand
DAPTIVE PARALLEL-IN-SPACE-TIME FINITE ELEMENT 13
16 32 64 128 256 512 1024 2048 16 32 64 128 256 512 1024 2048 R e l a t i v e S peed U p t i m e [ s ] Figure 7.
Speed up on Stampede (top) and time requirements forspace-time and iterative formulations on the same machine (bot-tom)network. Scaling results are shown in Figure 7. We also performed scaling on NCSABlue Waters [1]. Blue Waters consists of 22,640 nodes, each consisting of two AMD6276 ”Interlagos” processors for total number of 362,240 computing cores. Eachnode has 64GB of memory, which gives total system memory equal to 1.476PB.Compute nodes have access to storage with Lustre filesystem that has size 26.4 PBand offers total bandwith of more than 1TB/s.Figures 8, 9, 10 show scaling for the linear, nonlinear and Allen-Cahn equationsrespectively. We show good scaling upto 131,072 processors. The largest prob-lem size used was ∼ . mesh with 10timesteps per block. The results illustrate reasonably good scaling for the space-time approach, and Figure 7 suggests that for larger number of processors, thespace-time approach performs better than the iterative approach. Specifically, be-yond a threshold number of processors, the total time to solve a given problem is
16 64 256 1024 4096 16384 65536 262144 16 64 256 1024 4096 16384 65536 262144 R e l a t i v e S peed U p *10200 *10400 *10800 *10 Figure 8.
Speed up on Blue Waters for linear time dependent diffusion
16 64 256 1024 4096 16384 65536 262144 16 64 256 1024 4096 16384 65536 262144 R e l a t i v e S peed U p *10200 *10400 *10550 *10 Figure 9.
Speed up on Blue Waters for nonlinear time dependentdiffusion (nonlinearity induced by diffusion coefficient dependingon u )lower for the space time approach compared to the iterative approach. We antic-ipate that for more complex problems (complex geometries, remeshing, I/O) thespace-time approach may perform even better.6. Numerical Examples
We illustrate our adaptive parallel-in-space-time framework on linear and non-linear diffusion equation and the Allen-Cahn equation.6.1.
Problem A – Linear Diffusion.
We first consider the linear case. We set κ = 1. We use the method of manufactured solutions to construct the forcing termin (1) to ensure an analytical solution, u :(35) u ( x, y, z, t ) = exp ( − α ( x, y, z, t )) DAPTIVE PARALLEL-IN-SPACE-TIME FINITE ELEMENT 15
16 64 256 1024 4096 16384 65536 262144 16 64 256 1024 4096 16384 65536 262144 R e l a t i v e S peed U p *10200 *10400 *10550 *10 Figure 10.
Speed up on Blue Waters Supercomputer for Allen-Cahn problemwhere α ( x, y, z, t ) is equal to:(36) α ( x, y, z, t ) = ( x − x ( t )) + ( y − y ( t )) + ( z − z ( t )) d and x ( t ) = a cos( ωt ) + by ( t ) = a sin( ωt ) + bz ( t ) = b with a = 0 . b = 0 . ω = 2 π , d = 0 .
1. Thus, u is a rotating exponential hillcentered at the midplane and rotating with a radius of 0.2 and an angular speed of2 π . This manufactured solution is constructed by the following forcing term(37) f ( x , t ) = − (cid:18) x − x ) + ( y − y ) + ( z − z ) ) d − d −− aω (cid:18) cos( tω )( y − y ) − sin( tω )( x − x ) d (cid:19)(cid:19) exp ( − α ( x, y, z, t ))The equation is solved in a region of dimensions [0 : 1] × [0 : 1] × [0 : 1]. Everyboundary face of the region has an essential boundary condition with prescribedvalue of u equal to the value computed from (35). We use a time step of 0.01 andsolve for a block of 100 timesteps.Figure 11 shows a cutoff of the refined spatial mesh after 20 refinement iterations.We remind the reader that each iteration consists of solving the space-time problem,constructing the time-averaged elemental refinement indicators, equation (33), andthen refining the 3D mesh according to the indicators. Given that this is a movingsource problem, it is clearly seen that there is refinement in regions where the sourcehas passed through.We compare spatial convergence rates for three implementations: (a) sequen-tial time stepping with no spatial adaptivity, (b) space-time implementation withno spatial adaptivity, and (c) space-time implementation with spatial adaptivity. Figure 11.
Refined mesh: view from top (left) and front (right). l a s t || e || L2 h ‾ grid convergence for space-time formulation, dt=0.01iterspace-timespace-time, adapt Figure 12.
Grid convergence for linear heat equation. Error is (cid:107) u − u h (cid:107) from last time-step versus average element size ¯h. l a s t || e || L2 dttime convergence for space-time formulation (linear problem)EBBDF2 Figure 13.
Time convergence for linear heat equation. Error is (cid:107) u − u h (cid:107) from last time-step versus size of time-step.We plot convergence in Figure 12, where error is (cid:107) u − u h (cid:107) . The first two imple-mentations (obviously) overlap, with the adaptive mesh implementation showing a DAPTIVE PARALLEL-IN-SPACE-TIME FINITE ELEMENT 17 reduced error. All three curves show a slope of 2, which is to be expected. The Fig-ure 13 shows time-step convergence, with expected slopes of 1 and 2 for BackwardEuler and Backward Difference Formulae, respectively.6.2.
Problem B – Nonlinear Diffusion.
For the nonlinear case, we set thecoefficient κ ( u ) = 1 + 10 u and we choose an exact solution such that | u | ≤
1, thusbounding κ . As before, we choose our analytical solution to be given by (35). Theforcing term, f , is consequently:(38) f ( x , t ) =80 ud (cid:0) ( x − x ) + ( y − y ) + ( z − z ) (cid:1) exp ( − α ( x, y, z, t ))++ (1 + 10 u ) (cid:18) x − x ) + ( y − y ) + ( z − z ) ) d − d (cid:19) exp ( − α ( x, y, z, t )) −− aω cos( tω )( y − y ) − sin( tω )( x − x ) d exp ( − α ( x, y, z, t ))Figure. 14 plots several time snapshots of the moving non-linear source problem,with the solution accurately tracking the moving source. In Figure. 15, we plotspatial convergence for three implementations: (a) sequential time stepping with nospatial adaptivity, (b) space-time implementation with no spatial adaptivity, and(c) space-time implementation with spatial adaptivity (plotted with the averageelement size). Convergence rates follow along expected lines, with a slope of 2.6.3. Problem C – Allen-Cahn.
In this final example we solve the modified Allen-Cahn problem. The key physics (i.e. phase change) which is described by thisnon-linear equation essentially occurs in a highly localized region of the domain (ona surface of co-dimension 1). Adaptive refinement (and coarsening) has been a veryeffective approach to accurately resolve this localized region. The equation is givenas ∂u∂t = − D (cid:0) f ( u ) − C n ∇ u (cid:1) , where(39) f ( u ) = 2 Au (1 − u + 2 u ) − k and D = 1, C n = 0 . A = 16, k = 0 .
1. The initial conditions are(40) u ( x ) = 0 . . r − . (cid:113) A C n , r = (cid:112) x + y + z with zero flux conditions on all boundaries. This represents an initial solid of radius, r = 0 .
5, that is melting. Using symmetry arguments, we consider a single octantof the space ([0 : 1] × [0 : 1] × [0 : 1]). We consider a time step of 0.02 and a blocksize of 50 time-steps. Figure 16 shows snapshots of the melting sphere at varioustime points.Figure 17 illustrates adaptive mesh refinement across the time block. The figureshows the refined mesh for the first block of time ( t = 0 to t = 1) Note thatrefinement is localized along the solidification front within the time block . To capturethis thin interface using a uniform mesh would have required 589 824 elements,instead of the 137 776 elements that were used here. (a) t=0s (b) t=0.25s (c) t=0.75s (d) t=1s Figure 14.
Results of computations where upper half of compu-tational domain is removed. Values of u change in time from 0s(a) to 1s (d) l a s t || e || L2 h ‾ grid convergence for space-time formulation, dt=0.01, nonlineartet, itertet, adapttet Figure 15.
Grid convergence for nonlinear heat equation. Erroris (cid:107) u − u h (cid:107) from last time-step versus average element size ¯h. DAPTIVE PARALLEL-IN-SPACE-TIME FINITE ELEMENT 19 (a) t=0s (b) t=1s (c) t=3s (d) t=5s
Figure 16.
Results of computations for the 3D Allen-Cahn prob-lem. Values of u for initial condition (a), after 1s (b), 3s (c) and5s (d) Y XZ
Figure 17.
Finite element mesh after 20 refinement iterationswith superimposed solution at time 1 s at the dense part of mesh7. Conclusion
We present formulation, implementation details and representative examples ofa parallel-in-space-time based adaptive methodology for the solution of (linear and)non-linear time dependent problems. The basic concept is to solve for large blocks ofspace-time unknowns instead of marching sequentially in time. The methodology isa combination of a computationally efficient implementation of a parallel-in-space-time finite element solver coupled with a posteriori space-time error estimates and a parallel mesh generator. We illustrate how this implementation is especially tailoredfor massively parallel computations. We show good scaling behavior up to 150,000processors on the Blue Waters machine. This methodology enables scaling on nextgeneration multi-core machines by simultaneously solving for large number of time-steps, and reduces computational overhead by refining spatial blocks that can tracklocalized features. This methodology also opens up the possibility of efficientlyincorporating adjoint equations for error estimators and inverse design problems,since blocks of space-time are simultaneously solved and stored in memory. Ourfuture work is focused on extending the space-time framework to utilizing finiteelement basis functions in time (which enables formal derivation of space-time aposteriori error estimates), and subsequently implementing 4D finite elements toenable simultaneous space and time adaptivity.
References Blue waters user portal | system summary .2. Phg (parallel hierarchical grid) .3.
Texas advanced computing center - stampede technical details .4. Mark Ainsworth and J. Tinsley Oden,
A posteriori error estimation in finite element analysis ,John Wiley & Sons, New York, 2000.5. Satish Balay, William D. Gropp, Lois Curfman McInnes, and Barry F. Smith,
Efficient man-agement of parallelism in object oriented numerical software libraries , Modern Software Toolsin Scientific Computing (E. Arge, A. M. Bruaset, and H. P. Langtangen, eds.), Birkh¨auserPress, 1997, pp. 163–202.6. Marek Behr,
Simplex space-time meshes in finite element simulations , Int. J. Numer. Meth.Fluids (2008), no. 9, 1421–1434.7. Kevin Burrage,
10. parallel methods for systems of ordinary differential equations , Applica-tions on Advanced Architecture Computers, 1996, pp. 101–120.8. John C. Butcher,
Numerical methods for ordinary differential equations , John Wiley & Sons,West Sussex, 2008.9. V. Carey, D. Estep, A. Johansson, M. Larson, and S. Tavener,
Blockwise adaptivity for timedependent problems based on coarse scale adjoint solutions , SIAM J. Sci. Comput. (2010),no. 4, 2121–2145.10. G. S¸im¸sek, X. Wu, K.G. van der Zee, and E.H. van Brummelen, Duality-based two-level errorestimation for time-dependent pdes: Application to linear and nonlinear parabolic equations ,Computer Meth. Appl. Mech. Engrg. (2015), 83–109.11. Bert J. Debusschere, Habib N. Najm, Philippe P. Pbay, Omar M. Knio, Roger G. Ghanem,and Olivier P. Le Matre,
Numerical challenges in the use of polynomial chaos representationsfor stochastic processes , SIAM J. Sci. Comput. (2004), no. 2, 698–719.12. K. Eriksson, C. Johnson, and A. Logg, Adaptive computational methods for parabolic prob-lems , Encyclopedia of Computational Mechanics, in: Fundamentals (E. Stein, R. de Borst,and T.J.R. Hughes, eds.), John Wiley & Sons, Ltd., 2004, pp. 675–702.13. Hector Gomez and Kristoffer G. van der Zee,
Computational phase-field modeling , Encyclo-pedia of Computational Mechanics, Second Edition, John Wiley & Sons, Ltd., (In Press).14. Jeffrey Hittinger, Sven Leyffer, and Jack Dongarra,
Models and algorithms for exascale com-puting pose challenges for applied mathematicians , SIAM News, 2013.15. Thomas J. R. Hughes and G. M. Hulbert,
Space-time finite element methods for elastody-namics: Formulations and error estimates , Comput. Methods Appl. Mech. Eng. (1988),no. 3, 339–363.16. Thomas J.R. Hughes and James R. Stewart, A space-time formulation for multiscale phe-nomena , J. Comput. Appl. Math. (1996), no. 1, 217–229.17. J. Lions, Yvon Maday, and Gabriel Turinici, A”parareal”in time discretization of pde’s , C. R.Acad. Sci. Series I Mathematics (2001), no. 7, 661–668.
DAPTIVE PARALLEL-IN-SPACE-TIME FINITE ELEMENT 21
18. Robert B. Lowrie, Philip L. Roe, and Bram van Leer,
Space-time methods for hyperbolic con-servation laws , Barriers and Challenges in Computational Fluid Dynamics (V. Venkatakrish-nan, Manuel D. Salas, and Sukumar R. Chakravarthy, eds.), Springer Netherlands, Dordrecht,1998, pp. 79–98.19. Karthik Mani and Dimitri Mavriplis,
Efficient solutions of the euler equations in a time-adaptive space-time framework , 49th AIAA Aerospace Sciences Meeting including the NewHorizons Forum and Aerospace Exposition, American Institute of Aeronautics and Astronau-tics, 2011.20. Akil Narayan and Dongbin Xiu,
Stochastic collocation methods on unstructured grids in highdimensions via interpolation , SIAM J. Sci. Comput. (2012), no. 3, A1729–A1752.21. J. P. Pontaza and J. N. Reddy, Space-time coupled spectral/hp least-squares finite elementformulation for the incompressible navier-stokes equations , J. Comput. Phys. (2004),no. 2, 418–459.22. Thomas C. S. Rendall, Christian B. Allen, and Edward D. C. Power,
Conservative unsteadyaerodynamic simulation of arbitrary boundary motion using structured and unstructuredmeshes in time , Int. J. Numer. Meth. Fluids (2012), no. 12, 1518–1542.23. Christian Soize and Roger Ghanem, Physical systems with random uncertainties: Chaos rep-resentations with arbitrary probability measure , SIAM J. Sci. Comput. (2004), no. 2, 395–410.24. Tayfun E. Tezduyar, Sunil Sathe, Ryan Keedy, and Keith Stein, Spacetime finite element tech-niques for computation of fluidstructure interactions , Computer Meth. Appl. Mech. Engrg. (2006), no. 17–18, 2002–2027.25. R¨udiger Verf¨urth,
A review of a posteriori error estimation and adaptive mesh-refinementtechniques , Wiley-Teubner, New York and Stuttgart, 1996.26. ,
A posteriori error estimation techniques for finite element methods , Oxford Univer-sity Press, Oxford, 2008.27. Xiaoliang Wan and George Em Karniadakis,
Multi-element generalized polynomial chaos forarbitrary probability measures , SIAM J. Sci. Comput. (2006), no. 3, 901–928.28. Luming Wang and Per-Olof Persson, A high-order discontinuous galerkin method with un-structured spacetime meshes for two-dimensional compressible flows on domains with largedeformations , Comput Fluids (2015), 53–68.29. Lin-Bo Zhang,
A parallel algorithm for adaptive local refinement of tetrahedral meshes usingbisection , Numer. Math. Theor. Meth. Appl.2