[PDF] PyMGRIT: A Python Package for the parallel-in-time method MGRIT

Abstract

In this paper, we introduce the Python framework PyMGRIT, which implements the multigrid-reduction-in-time (MGRIT) algorithm for solving the (non-)linear systems arising from the discretization of time-dependent problems. The MGRIT algorithm is a reduction-based iterative method that allows parallel-in-time simulations, i. e., calculating multiple time steps simultaneously in a simulation, by using a time-grid hierarchy. The PyMGRIT framework features many different variants of the MGRIT algorithm, ranging from different multigrid cycle types and relaxation schemes, as well as various coarsening strategies, including time-only and space-time coarsening, to using different time integrators on different levels in the multigrid hierachy. PyMGRIT allows serial runs for prototyping and testing of new approaches, as well as parallel runs using the Message Passing Interface (MPI). Here, we describe the implementation of the MGRIT algorithm in PyMGRIT and present the usage from both user and developer point of views. Three examples illustrate different aspects of the package, including pure time parallelism as well as space-time parallelism by coupling PyMGRIT with PETSc or Firedrake, which enable spatial parallelism through MPI.

Full PDF

PPyMGRIT: A Python Package for the parallel-in-time method MGRIT

JENS HAHNE, STEPHANIE FRIEDHOFF, and MATTHIAS BOLTEN,

Bergische Universit¨at Wuppertal

In this paper, we introduce the Python framework

PyMGRIT , which implements the multigrid-reduction-in-time (MGRIT) algorithm forsolving the (non-)linear systems arising from the discretization of time-dependent problems. The MGRIT algorithm is a reduction-basediterative method that allows parallel-in-time simulations, i. e., calculating multiple time steps simultaneously in a simulation, by usinga time-grid hierarchy. The

PyMGRIT framework features many different variants of the MGRIT algorithm, ranging from differentmultigrid cycle types and relaxation schemes, as well as various coarsening strategies, including time-only and space-time coarsening,to using different time integrators on different levels in the multigrid hierachy. The comprehensive documentation with tutorials andmany examples, the fully documented code, and a large number of pre-implemented problems allow an easy start into the work withthe package. The functionality of the code is ensured by automated serial and parallel tests using continuous integration.

PyMGRIT allows serial runs for prototyping and testing of new approaches, as well as parallel runs using the Message Passing Interface (MPI).Here, we describe the implementation of the MGRIT algorithm in

PyMGRIT and present the usage from both user and developer pointof views. Three examples illustrate different aspects of the package, including pure time parallelism as well as space-time parallelismby coupling

PyMGRIT with

PETSc or Firedrake , which enable spatial parallelism through MPI.CCS Concepts: •

Mathematics of computing → Solvers;

Ordinary differential equations; Partial differential equations;

Differentialalgebraic equations;Additional Key Words and Phrases: Multigrid-reduction-in-time (MGRIT), parallel-in-time integration

ACM Reference format:

Jens Hahne, Stephanie Friedhoff, and Matthias Bolten. 2020. PyMGRIT: A Python Package for the parallel-in-time method MGRIT. 1,1, Article 1 (January 2020), 21 pages.DOI: 10.1145/1122445.1122456

The classical approach for solving an initial value problem is based on a time-stepping procedure, which computesthe solution at a point in time based on the solution at one or more previous time steps and, thus, propagates thesolution over time. The method is optimal, i. e., of order

O ( N t ) for N t time steps, and gives the solution after N t applications of the time integrator. However, time-stepping algorithms are completely sequential in time and limitpotential parallelization to the spatial domain. In recent years, the structure of modern computer systems has beencharacterized by a growing number of processors, as the clock rates of the individual processors stagnate. As aconsequence, possibilities of spatial parallelization are quickly exhausted, although further resources are available.Promising approaches for using these modern computer systems to further reduce the runtime of simulations of initialvalue problems are parallel-in-time methods, which allow not only spatial but also temporal parallelism. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are notmade or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for componentsof this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or toredistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].© 2020 ACM. Manuscript submitted to ACMManuscript submitted to ACM a r X i v : . [ c s . M S ] A ug Hahne and Friedhoff, et al.One of these approaches is the iterative multigrid-reduction-in-time (MGRIT) algorithm (Falgout et al. 2014), whichapplies multigrid reduction principles to the time domain. The idea of the algorithm is based on using a hierarchy oftime grids, where the finest grid contains the same points in time as in the time-stepping approach and the number ofpoints on the coarser grids decreases, so that the coarsest grid consists of only a few points. While time integration isapplied to the coarsest grid sequentially, time subdomains are handled in parallel on the other grids. Over the recentyears, the algorithm has been successfully applied to various problems, e. g., linear and nonlinear parabolic problems(Falgout et al. 2014, 2017), compressible fluid dynamics (Falgout et al. 2015), power grids (Lecouvez et al. 2016; Schroderet al. 2018), eddy-current simulations (Bolten et al. 2019; Friedhoff et al. 2020, 2019), linear advection (De Sterck et al.2019; Howse et al. 2019), and machine learning (G¨unther et al. 2020).Besides the MGRIT algorithm, there are many other iterative and direct parallel-in-time methods. Probably the mostwell-known method is Parareal (Lions et al. 2001), whose success is due to its simplicity and applicability: For the use ofthe algorithm only an expensive and a cheap time integrator is needed. This idea of Parareal can be interpreted in avariety of frameworks of numerical schemes. In particular, Parareal can be seen as a two-level MGRIT variant (Falgoutet al. 2014). The parallel full aproximation scheme in space and time (PFASST) (Emmett and Minion 2012) is based onspectral deferred corrections (SDC) (Dutt et al. 2000) and allows space-time parallelism by simultaneously executingseveral SDC “sweeps” on a space-time hierarchy. Another method, the revisionist integral deferred correction method(RIDC) (Christlieb et al. 2010; Ong et al. 2016) allows time parallelism through the pipelined parallel application ofintegral deferred corrections. A more detailed, structured, and general overview of parallel-time-integration approachesis given in (Gander 2015). Furthermore, the website https://parallel-in-time.org offers many more referencesand information about parallel-in-time methods.Despite the increasing number of algorithms, concepts, and papers in the field of parallel-in-time integration, thenumber of accessible stand-alone parallel-in-time libraries providing parallelization in time or allowing space-timeparallelism is relatively small. The following libraries are most noteable: The XBraid (LLNL 2020) library, written inC provides an implementation of the MGRIT algorithm and libridc (Ong et al. 2016) is a C ++ implementation of theRIDC algorithm. The PFASST method is implemented by various libraries which are summarized in (LLBL 2020), inparticular by the Fortran library libpfasst , PFASST++ written in C ++ , and the Python implementation pySDC . Besidesthese, there are a number of other implementations of parallel-in-time concepts, but most of them are either morespecialized or more difficult to access, such as the SWEET code (Martin Schreiber 2020) or various implementations ofthe PARAEXP method (Gander and G¨uttel 2013). This paper describes the new Python framework

PyMGRIT , whichimplements the MGRIT algorithm.

PyMGRIT is designed to be easy to use, ranging from the very simple installation ofthe package to calling or overriding functions, making it ideal for testing the application of MGRIT to problems, or forprototyping new ideas and strategies for the MGRIT algorithm. More precisely,

PyMGRIT allows on the one hand thesimple, pure application of different variations of the MGRIT algorithm to problems without having to worry aboutimplementation details, parallel communication, etc., but on the other hand, it allows flexible adaptations of individualcomponents of the algorithm by overriding existing functions. This allows users to try MGRIT for their application,and it makes

PyMGRIT particularly attractive for the training of students. Furthermore,

PyMGRIT allows easy couplingwith other software that already exists as Python packages, e. g.,

Matplotlib (Hunter 2007) for creating static andinteractive visualizations or

Firedrake (Rathgeber et al. 2016) for using the Finite Element Method (FEM). Additionally,

PyMGRIT allows space-time parallel runs that go far beyond prototyping.

Manuscript submitted to ACM yMGRIT: A Python Package for the parallel-in-time method MGRIT 3

Initial value 512 time steps 1024 time steps

Fig. 1. Sequential time stepping for a simple heat conduction problem. Starting with the initial condition (left), the solution ispropagated over time (middle, right).

The following sections of this paper are organized as follows: Section 2 describes the MGRIT algorithm. Based on this,Section 3 introduces the

PyMGRIT framework and describes the implementation of the algorithm and its variations. Then,the use of

PyMGRIT is examined, describing first basic usage of the package, followed by a description of implementinga custom application using

PyMGRIT . In Section 4, different aspects and possibilities of using

PyMGRIT are demonstratedin three numerical examples. Finally, conclusions are drawn and possible extensions of the package are discussed inSection 5.

The idea of MGRIT is to enable parallelism in a traditionally sequential process. For time-dependent problems, startingwith an initial value, the solution is classically computed step by step, i. e., starting with the initial condition, the solutionat each time step is computed based on the solution at one or more previous time steps. This sequential time-steppingprocess is shown in Figure 1, which displays the initial condition of a simple linear heat conduction problem along withthe solutions after 512 and 1024 time steps.In contrast, the iterative MGRIT algorithm solves the problem by updating the solution at many points in timesimultaneously. Thereby, the initial guess of the solution can be chosen arbitrarily for all points in time. The idea of thealgorithm is based on a multilevel hierarchy for the problem. While the finest level contains the same points in time asin the time-stepping approach, on coarse levels only a subset of these points are considered. Optimally, the coarsestgrid contains only a very small number of points in time, e. g., three or five, and can therefore be solved exactly andquickly by the classical time-marching approach. This solution on the coarsest grid is used in the algorithm to updatethe solution on the finer grids until the solution on the finest grid is accurate enough based on a predefined tolerance.More precisely, one iteration of the algorithm consists of the following steps:(1) Relaxation on the finest grid for multiple points in time simultaneously. Relaxation corresponds to a localapplication of the time integrator, which is also used for time stepping.(2) Transferring the solution to the next coarser grid, which contains only a subset of the points in time.(3) Repeating steps 1 and 2 for the next coarser grid until the coarsest grid is reached.(4) Solving the problem on the coarsest grid directly.(5) Interpolating the solution from the coarsest grid to the finest grid, improving the solution on all grid levels.These steps are applied iteratively until a desired quality of the solution is achieved. Figure 2 shows this process fortwo MGRIT iterations. While this is only a rough description of the algorithm, it reveals one of the key features of

Manuscript submitted to ACM

Hahne and Friedhoff, et al. relaxation restriction interpolation relaxation restriction interpolation

Iteration 0 Iteration 1

Fig. 2. Two MGRIT iterations for a simple heat conduction problem.

MGRIT: its non-intrusiveness. The only requirement for the algorithm is a time-stepping procedure, which is also usedin the time-marching approach and, thus, this procedure already exists for many problems and can be integrated easilyinto a parallel framework with MGRIT.After the previous schematic representation of time stepping and the MGRIT algorithm, we now want to examineboth approaches algorithmically. Therefore, we consider an initial value problem of the form u (cid:48) ( t ) = f ( t , u ( t )) , u ( t ) = g , t ∈ ( t , t f ] , (1)which can, for example, be a system of ordinary differential equations after the spatial discretization of a space-timepartial differential equation. We discretize the time interval on a grid with uniformly distributed points t i = i ∆ t , i = , , . . . , N t with constant time step ∆ t = ( t f − t )/ N t and let u i ≈ u ( t i ) for i = , . . . , N t with u = u ( ) . Note thatnon-uniform distributions are also possible, but are not considered here for reasons of simplicity. Further, let Φ i bea time integrator, propagating the solution u i − from a time point t i − to time point t i , including forcing from theright-hand side. Then, the one-step time integration method for the time-discrete initial value problem (1) can bewritten as u i = Φ i ( u i − ) , i = , , . . . , N t , or, considering all time points at once, as the space-time system A( u ) ≡  u u − Φ ( u ) ... u N t − Φ N t ( u N t − )  =  g ...  ≡ g . (2)Sequential time-stepping solves problem (2) through a sequential forward solve, yielding the discrete solution after N t applications of the time integrator and, thus, sequential time-stepping is optimal, i. e., of order O ( N t ) .In order to embed the time integrator Φ i into a multi-level algorithm that iteratively solves problem (2), we firstdefine the components of the algorithm and then construct the algorithm. More precisely, we define a coarse gridsystem, a restriction and a prolongation operator between temporal grids, and a relaxation scheme. For a given finetemporal grid t i = i ∆ t , i = , , . . . , N t , and a given integer coarsening factor m >

1, we define a splitting of the finegrid points into F - and C -points, where every m -th point is a C -point and all other points are F -points. Looking only Manuscript submitted to ACM yMGRIT: A Python Package for the parallel-in-time method MGRIT 5Fine level t t t · · · t m t N t ∆ t Coarse level T T · · · T N T ∆ T = m ∆ t Fig. 3. Fine and coarse temporal grid with uniformly distributed points. The coarse grid is created by removing all F -points(represented by short markers) of the fine grid. t t t t t t t t t t t t t Φ Φ Φ Φ Φ Φ Φ Φ Φ t t t t t t t t t t t t t Φ Φ Φ F -relaxation C -relaxation Fig. 4. F - and C -relaxation for a temporal grid with 13 time points and a coarsening factor of four. In both relaxations, the independentintervals (blue) can be updated simultaneously. at the C -points, we obtain a coarser grid T i c = i c ∆ T , i c = , , . . . , N T , with N T = N t / m and time step ∆ T = m ∆ t , asshown in Figure 3.As the time-step size is different on the coarse grid, a time integrator Φ i c is needed for the coarse level. We choose are-discretization of the problem with time step ∆ T , but several choices are possible, such as coarsening in the order of thediscretization (Falgout et al. 2019; Nielsen et al. 2018). Next, we define two types of relaxation strategies, i. e., schemes oflocal applications of the time integrator. The first scheme, the so-called F -relaxation, performs a relaxation of all F -pointsby propagating the solution from a C -point to all following F -points up to the next C -point. Each interval of F -pointscan be executed in parallel, whereby each interval consists of m − C -relaxation, analogously performs a relaxation of all C -points consisting of propagatingthe solution from the preceding F -point to a C -point. Again, all intervals of C -points can be updated simultaneously.Both relaxation schemes are shown in Figure 4 and can be combined to new schemes. The FCF -relaxation, i. e., an F -relaxation followed by a C - and another F -relaxation, is the typical choice for the MGRIT algorithm, but othercombinations are also possible. Finally, we define two operators for the transfer between the temporal fine and coarsegrid, more precisely a restriction and a prolongation operator. While restriction is an injection, we define the “ideal”prolongation as an injection at C -points followed by an F -relaxation.Using the full approximation storage (FAS) framework (Brandt 1977) for solving both linear and nonlinear problems,the two-level MGRIT-FAS algorithm (Falgout et al. 2015) extended by spatial coarsening (Falgout et al. 2017) can bewritten as in Algorithm 1.In algorithm 1, A l ( u ( l ) ) = g ( l ) specifies the space-time problem on levels l = ,

2, which can be either linear ornonlinear, and the two operators R I and P denote the transfer operators between temporal grids. Additionally, thealgorithm allows for spatial coarsening and we define R s as spatial restriction and P s as spatial prolongation. Therelaxation scheme can be controlled by the parameter ν , whereby at least one F -relaxation is performed per iterationand then ν subsequent CF -relaxations are executed. Typically ν =

1, i. e., an

FCF -relaxation scheme, is chosen for theMGRIT algorithm. Note that the two-level variant with F -relaxation is equivalent to the parareal method (Lions et al. Manuscript submitted to ACM

Hahne and Friedhoff, et al.

Algorithm 1

MGRIT-FAS( A , u , g ) Apply F -relaxation to A ( u ( ) ) = g ( ) For 0 to ν : Apply CF -relaxation to A ( u ( ) ) = g ( ) Inject the approximation and its residual to the coarse grid: u ( ) = R I ( u ( ) ) , g ( ) = R I ( g ( ) − A u ( ) ) If spatial coarsening: u ( ) = R s ( u ( ) ) g ( ) = R s ( g ( ) ) Solve A ( v ( ) ) = A ( u ( ) ) + g ( ) Compute the error approximation: e = v ( ) − u ( ) If spatial coarsening: e = P s ( e ) Correct using ideal interpolation: u ( ) = u ( ) + P ( e ) N t / m or N t /( m ) iterationsfor F - or FCF -relaxation, respectively (Falgout et al. 2014).

In the previous section, we have familiarized ourselves with the way MGRIT works and how the algorithm looks like.In this section, we introduce the

PyMGRIT framework. First, we describe the structure and availability of the framework.We then present the implementation of the MGRIT algorithm in

PyMGRIT . In terms of algorithmic parameters, manychoices must be made such as the relaxation scheme or the cycling strategy that lead to many different variants of thealgorithm.

PyMGRIT pursues two goals: On the one hand,

PyMGRIT should enable the use of MGRIT with a minimalchoice of algorithmic parameters. On the other hand,

PyMGRIT should be as flexible as possible and give users thepossibility to adjust all settings with little effort. Section 3.2 gives an overview of the most important MGRIT parametersand their implementation in

PyMGRIT . Pursuing the first goal, most parameters have default values, but various choicesare possible for pursuing the second goal. Section 3.3 presents a simple but typical code example for an application,while in Section 3.4, we take a developer’s view of

PyMGRIT and describe the classes required for implementing a customapplication.

The

PyMGRIT framework consists of three components: the code repository (Hahne and Friedhoff 2020c), the docu-mentation (Hahne and Friedhoff 2020b), and the Python Package Index (PyPI) version (Hahne and Friedhoff 2020d).

Manuscript submitted to ACM yMGRIT: A Python Package for the parallel-in-time method MGRIT 7The installation of the package is typically done using pip , and only a few requirements have to be fulfilled. First,a Python version ≥ . NumPy (van der Walt et al.2011),

SciPy (Virtanen et al. 2020),

Matplotlib (Hunter 2007), and mpi4py (Dalcin et al. 2011). While

NumPy and

SciPy are used throughout the complete package,

Matplotlib is mainly used for the visualization in different examples.The package mpi4py provides Python bindings for parallelization using the Message Passing Interface (MPI) standard.The requirements are listed in the file setup.py and will be installed automatically if PyPI and pip are used for theinstallation of the

PyMGRIT package.The code repository on GitHub (Hahne and Friedhoff 2020c) contains the latest version of the current code. Aftereach commit on the master branch,

GitHub Actions are used as a continuous integration tool to trigger automatedprocesses. Automated serial and parallel tests are performed for all Python versions ≥ . pytest (Krekel et al.2004) and tox (Krekel 2020). Additionally, the code coverage of the tests is calculated and the coverage is automaticallyuploaded to Codecov (Hahne and Friedhoff 2020a), where the results can be viewed. Furthermore, an up-to-date versionof the documentation is created using Sphinx. The documentation consists of two parts, a pure application programminginterface (API) documentation and a documentation of the features of PyMGRIT . While the API documentation isautomatically generated from the Python comments, the feature documentation includes a quick start, tutorial, andmany examples for both basic and advanced usage of

PyMGRIT . By creating a new release on GitHub, the currentrepository version is automatically uploaded to PyPI and is available via pip afterwards.The

PyMGRIT package primarily consists of four directories: • src: this directory contains the core features of PyMGRIT , namely implementations of the MGRIT algorithmand of template classes of other required components. It also contains a number of sample problems and theirimplementation. • examples: in this directory, a number of application examples can be found, both for basic and advanced usageof PyMGRIT , including all examples from the tutorial and from the documentation. • tests: the test directory contains all written tests that are automatically executed after each commit on themaster branch. Tests include core functions of PyMGRIT as well as sample applications. • docs: contains the documentation for the examples, quick start, tutorial, etc. PyMGRIT is based on classes, with the exception of some auxiliary functions. Two steps are required to use the MGRITalgorithm: First, an instance of the

Mgrit class from

PyMGRIT ’s core must be created. In this step, all algorithmic choicesand settings can be selected by parameters. The MGRIT implementation of

PyMGRIT offers high flexibility for differentvariants of the algorithm, but at the same time the possibility to solve a problem with minimal specifications by usingdefault values for most parameters. The second step is the actual solving of the problem, which is executed by simplycalling a method of the

Mgrit class.Listing 1 provides a minimal example of creating an instance of the class with subsequent solving of the problem.Only the problem hierachy has to be passed to the class via constructor parameters in line 3 (the structure of a problemhierarchy is described in section 3.2.1). Other parameters of the constructor have default values, but can also be setmanually. In line 4, the problem is then solved by calling the member function solve . The most important parametersof

PyMGRIT ’s core class

Mgrit and their effects are presented below; a complete list of all parameters can be found inthe documentation.

Manuscript submitted to ACM

Hahne and Friedhoff, et al. from pymgrit import Mgrit mgrit = Mgrit ( problem = problem ) mgrit . solve ( ) Listing 1. Example for the minimal generation of an instance of the class Mgrit. Only a problem hierarchy is required, all otherparameters have default values. Fig. 5. Structure of V - and F -cycles for four grid levels. The only mandatory parameter for instantiating an object of the

Mgrit class is a problem hierarchy. This problem hierarchy is a list of problems (objects of classes that inherit from

PyMGRIT ’score

Application class) that defines the time-multigrid hierarchy. The exact structure of the problems and how toimplement a problem in an application class is described in Section 3.4. At this point, it is sufficient to know that thetime-multigrid hierarchy, i. e., the number of levels and the coarsening factor for the MGRIT solver, is given by thisproblem hierarchy. The problem for each level contains a vector with all time points for that level. The only requirementfor the time points of each coarse level is that the time points are a subset of the time points of the finest level. Thisallows great flexibility in the choice of the coarsening factor, as it allows the implementation of regular coarseningstrategies, coarsening strategies with different coarsening factors per level, and also semi-coarsening strategies veryeasily.

Two of the most important parameters for MGRIT are the cycling andthe relaxation strategy. Both parameters have a significant impact on the convergence, parallelism, and efficiency ofthe algorithm.

PyMGRIT provides two cycling strategies: V - and F -cycles. These can be controlled by the parameter cycle type , i. e., by adding the cycling strategy via cycle type=‘V’ or cycle type=‘F’ to the constructor call. Thestructure of the two cycles is shown in Figure 5, which differs primarily in how often coarse grids are used. The V -cycleis the recursive implementation of algorithm 1, visiting the coarsest grid only once per MGRIT iteration. In contrast,the F -cycle visits the coarsest level several times per iteration, going back to the coarsest grid after having reachedeach level for the first time. As a consequence, an F -cycle better approximates the two-level method and, thus, fasterconvergence can be expected for F -cycles. However, F -cycles require more communication on intermediate coarsegrids and more serial work on the coarsest grid, leading to worse parallel scalability compared to V -cycles for largenumbers of processes.Furthermore, an improved initial guess can be computed by using the nested iteration strategy, which can becontrolled by the Mgrit constructor parameter nested iteration . By default, nested iteration is active, but it can bedisabled by passing nested iteration=False to the constructor. As described at the end of Section 2, nested iterationcomputes an initial solution guess on the fine grid from coarse grids. More precisely, starting on the coarsest grid, anapproximate solution is computed and interpolated to the next finer grid until the finest grid is reached. On each gridlevel, one V -cycle is used to compute an approximate solution, except on the coarsest grid, where the problem is solved Manuscript submitted to ACM yMGRIT: A Python Package for the parallel-in-time method MGRIT 9

Fig. 6. Structure of nested iteration when using four grid levels. directly. The resulting structure is illustrated in Figure 6. Note: In

PyMGRIT , nested iteration is part of the setup phase ofthe algorithm, since it provides an initial solution guess on the finest grid.The relaxation scheme in

PyMGRIT can be set by the parameter cf iter . This parameter specifies how many CF -relaxations follow an always executed F -relaxation. By default, the parameter is set to 1, i. e., an FCF -relaxation scheme.By varying this parameter, the user can easily switch between F -, FCF -, FCFCF -relaxation, and other relaxationschemes. Again, the choice of the scheme has a direct impact on the convergence of the algorithm. Usually, the morerelaxation steps are performed, the better is the convergence. However, this improved convergence comes from theadditional work performed in each iteration. The choice of the relaxation scheme also depends on the cycling strategyused. While an

FCF -relaxation scheme is often a good starting point for a V -cycle, only F -relaxation is often sufficientfor an F -cycle (Falgout et al. 2014). Note that the always performed F -relaxation on the finest grid is skipped after thesecond MGRIT iteration, because the updates during this F -relaxation are already performed during the post-relaxationof the ideal interpolation of the previous iteration. Another important parameter is the stopping tolerance of the algorithm, i. e., a thresholdwhich stops the iterations if crossed. By default,

PyMGRIT uses the absolute (space-)time residual of system (2) to checkconvergence. More precisely, denoting the residual of system (2) at the i c -th C -point of the finest grid at iteration k by r ( k ) i c , where i c is an integer index for looping over all C -points, convergence is measured in the 2-norm based on theabsolute (space-)time residual, (cid:107) r ( k ) (cid:107) = (cid:32)(cid:213) i c (cid:107) r ( k ) i c (cid:107) (cid:33) / , where the computation of the norm of the residual at each C -point, (cid:107) r ( k ) i c (cid:107) , is specified by the user in a class that inheritsfrom PyMGRIT ’s core

Vector class (see Section 3.4 for details). The convergence tolerance defines when to stop theiterations by checking (cid:107) r ( k ) (cid:107) < tol at each iteration, where the tolerance can be set using the constructor parameter tol ; by default, tol is set to 10 − . Currently, the only stopping criterion implemented in PyMGRIT is based on the(space-)time residual; however, the documentation includes examples of how to customize and change this criterion.

In implementations based on a classical time-stepping approach,typically solution values are stored for one time point (or for a few time points in case of a multistep method) at a time,and values are updated at each step of the method. Therefore, such algorithms only offer the possibility to decomposethe computational domain of the problem in space and, thus, allow spatial parallelism only to speed up the calculation.On the contrary,

PyMGRIT stores the solution at every point in time and, thus,

PyMGRIT provides another dimensionfor the decomposition of the computational domain: the time dimension. Figure 7 shows both decompositions of aspace-time domain, reflecting the parallelization strategies applied in classical space-parallel time-stepping algorithmsand in space-time-parallel MGRIT. Notice, that the partitioning of the temporal domain into time slices is added to the

Manuscript submitted to ACM t i m e t i m e Fig. 7. Decomposition of a space-time domain used for parallelizing computations in classical time-stepping algorithms (left) and inMGRIT (right).

Level 0Level 1Level 2 Process 0Process 1

Fig. 8. Example of the distribution of time points across two processes in

PyMGRIT . The time points of the finest level are distributedevenly, and the coarser levels contain all C -points of their parent fine grids. On the coarsest grid, one process holds all time points. spatial decomposition, allowing a distribution of the time slices across additional parallel processing units. PyMGRIT provides a function split communicator that splits a communicator into spatial and temporal components and returnstwo communicators for these components. The two communicators can be passed to the

Mgrit constructor using theparameters comm x for the space communicator and comm t for the time communicator.

PyMGRIT distributes the time points of the finest grid equally over all processes contained in the time communicator.The distributions on coarser grids are based on the decompositions on their parent fine grids as each coarser gridcontains a subset of their parent fine grid. The only exception is the coarsest grid, on which the problem is solvedsequentially by one process. This process then distributes the solution values to other processes. Figure 8 shows thedistribution of N t =

33 time points for two processes in a three-level setting with a coarsening factor of m = PyMGRIT creates an object for the solution at this time point. Additionally,on coarser levels, two additional objects must be stored for each time point: one for a restricted copy of the fine-gridobject and one for the FAS right-hand side. There are several ways to reduce the increased memory requirements forthe MGRIT algorithm, e. g., storing only C -points. Yet, this is not implemented in PyMGRIT , but it is a possible extensionfor future versions.

By default,

PyMGRIT applies only temporal coarsening and does not require any additionalinformation about the restriction or interpolation between spatial grids. If spatial coarsening is desired, the user can

Manuscript submitted to ACM yMGRIT: A Python Package for the parallel-in-time method MGRIT 11 dahlquist = Dahlquist ( t_start =0 , t_stop =5 , nt =101) dahlquist_multilevel_structure = simple_setup_problem ( problem = dahlquist , level =2 , coarsening =2) mgrit = Mgrit ( problem = dahlquist_multilevel_structure , tol =1 e − info = mgrit . solve ( ) Listing 2. Typical main routine of an application code that uses

PyMGRIT . pass a list of spatial transfer operators using the transfer parameter. Thereby, a transfer operator is expected for eachtransfer between MGRIT levels, which ensures high flexibility for the user. The

PyMGRIT class

MgritWithPlots provides an extension of the solver with several plotting functions,e. g., for visualizing MGRIT cycles or the distribution of time points across processes; example plots can be found inSection 4.1.

In this section, we describe how to solve a simple application problem and we explain the structure of a typical mainroutine of an application code that uses

PyMGRIT . For a simple application problem, we choose the simplest case of ascalar ODE. More precisely, our goal is solving Dahlquist’s test problem, u (cid:48) = λu in ( , ] (3)with u ( ) = λ = −

1. This and other application problems are included in

PyMGRIT . For an overview of theapplications included, see the online documentation. It provides a detailed tutorial and many examples for basic andadvanced usage of

PyMGRIT .The main routine of an application code that uses

PyMGRIT to solve Dahlquist’s test problem is presented in Listing2. The program uses a two-level MGRIT algorithm to solve Dahlquist’s test problem in the time interval [ , ] with101 equidistant time points. The 101 time points are composed of one point for the start time t = PyMGRIT . The structure of a mainroutine usually consists of three steps. First, the problem is created. Then, a multigrid hierarchy is constructed for thisproblem. Finally, the problem is solved using the MGRIT algorithm. In our example, for the first step, an instance of

PyMGRIT ’s class

Dahlquist is created in line 2 that describes the fine problem. The time domain is passed to the problemclass by using the parameters t start and t stop for specifying the time interval bounds and the parameter nt for thenumber of time steps. Afterwards, for the second step, a multilevel hierarchy is constructed in line 5, based on theproblem instance dahlquist . Here, the auxiliary function simple setup problem is used and a two-level hierarchywith a coarsening factor of two is chosen. This results in a coarse level with 51 points in time. Note that PyMGRIT offersseveral ways to pass the time domain information to a problem class, as well as for creating a multilevel hierarchy; seethe documentation for more details. For the third step, the MGRIT solver for the test problem is set up in line 8 as aninstance of

PyMGRIT ’s core class

Mgrit using the multilevel object dahlquist multilevel structure . Furthermore,the halting tolerance is set to 1 e −

10. Finally, the problem is solved by calling the solve routine of the solver mgrit inline 11. The solver returns some statistical information about the run in the dictionary info . Manuscript submitted to ACM constant_lambda = − value = 1 nt = 101 t = np . linspace (0 , 5 , nt ) for i in range (1 , nt ) : value = 1 / (1 − ( t [ i ] − t [ i − constant_lambda ) ∗ value Listing 3. Example timestepping code for problem 3, discretized by backward Euler on a temporal mesh with 101 points.

PyMGRIT is based on four different types of classes: • Solver: the solver class provides the implementation of the MGRIT algorithm. • Vector: vector classes contain the solution of a single point in time. Every vector class must inherit from

PyMGRIT ’s core

Vector class. • Application: application classes contain information about the problem we want to solve. Every applicationclass must inherit from

PyMGRIT ’s core

Application class. • GridTransfer: grid transfer classes contain information about the transfer of spatial grids between consecutiveMGRIT levels. Every grid transfer class must inherit from

PyMGRIT ’s core

GridTransfer class.The three types of classes Vector, Application and GridTransfer are all based on abstract super classes. These classesindependently create some structures that are valid for each type of class and also ensure that all necessary membervariables and member functions exist in the respective child classes. A developer who wants to create and solve aproblem that is not included in the

PyMGRIT package usually only has to specify parts of the four classes. In most cases itis sufficient to write a vector class and an application class. The grid transfer class is primarily needed for the additionalfeature of spatial coarsening and, if spatial coarsening is not used, it is automatically created by the solver. For the mostpart, the solver class can be used without modifications, but changes are possible; see the documentation for examples.

Listing 3 shows a typical time-stepping code for solving problem 2 discretizedby backward Euler on a temporal mesh with 101 points. The code consists of three components: first, the initialcondition value = 1 is set in line 2. The variable value is further used to store the propagated solution at the currenttime. The second component consists of the time information in lines 3 and 4. The temporal grid contains nt = [ , ] and is created using the Numpy function linspace . In the last step, we iterate over allpoints in time and apply the time integration in form of backward Euler. In summary, we have as components a variablethat contains the solution at a point in time, time information belonging to the problem, and the time-integration loop.To implement these components in

PyMGRIT , we first write a vector class

VectorDahlquist , which stores thesolution at a point in time and inherits from the

PyMGRIT core class

Vector . Therefore, we create a member variable value , which contains the solution of a point in time. Furthermore, the following member functions have to beimplemented: set values , get values , clone , clone zero , clone rand , add , sub , norm , pack , and unpack .The function set values receives data values and overwrites the values of the vector data and get values returnsthe vector data. The function clone clones the object, clone zero returns a vector object initialized with zeros, and clone rand similarly returns a vector object initialized with random data. The functions add , sub , and norm define the addition and subtraction of two vector objects and the norm of a vector object, respectively. Finally, thefunctions pack and unpack define the data to be communicated and how data is unpacked after receiving it. Thedefinition of the class VectorDahlquist with implementations of the constructor and of the functions add , sub Manuscript submitted to ACM yMGRIT: A Python Package for the parallel-in-time method MGRIT 13 class VectorDahlquist ( Vector ) : def __init__ ( self , value ) : super ( ) . __init__ ( ) self . value = value def __add__ ( self , other ) : return VectorDahlquist ( self . get_values ( ) + other . get_values ( ) ) def __sub__ ( self , other ) : return VectorDahlquist ( self . get_values ( ) − other . get_values ( ) ) Listing 4. Vector class for Dahlquist’s test equation. Note: The definition of the class is not complete, the member functions set values , get values , clone , clone zero , clone rand , norm , pack , and unpack are not shown. class Dahlquist ( Application ) : def __init__ ( self , constant_lambda = −

1, ∗ args , ∗ ∗ kwargs ) : super ( ) . __init__ ( ∗ args , ∗ ∗ kwargs ) self . constant_lambda = constant_lambda self . vector_template = VectorDahlquist ( 0 ) self . vector_t_start = VectorDahlquist ( 1 ) def step ( self , u_start : VectorDahlquist , t_start : float , t_stop : float ) − > VectorDahlquist : return VectorDahlquist (1 / (1 − ( t_stop − t_start ) ∗ self . constant_lambda ) ∗ u_start . get_values ( ) ) Listing 5. Application class for Dahlquist’s test equation. is shown in Listing 4. The implementation of the other functions is straightforward and not specified in detail here;please refer to the documentation for more information.Second, we write an application class

Dahlquist which contains information about the problem we want to solve.This class contains information about the time grid and the step function and is shown in Listing 5. The time informationis automatically provided by the

PyMGRIT core class

Application , from which every

PyMGRIT application must inheritfrom; for more information see the tutorial. The function step must be defined and contains the time integration routine,which is the same as in Listing 3 except for names and accesses. To compute the new solution, the function receives asparameters the solution of the previous time point, u start , as well as the start point and the end point of the timeintegration step, t start and t stop , respectively. Furthermore, two mandatory member variables, vector template and vector t start , must be created in the application class. The variable vector template stores an instance of thecorresponding Vector class, i. e., the

DahlquistVector class, and vector t start defines the initial condition usingthe same class. This is all we need for our test problem. We can now use

PyMGRIT to solve our problem as described inListing 2.

In this section, we present several examples for using

PyMGRIT to run simulations with MGRIT. Each example has itsown focus and highlights different aspects of the framework. The first example gives an overview of runs with differentvariants of

PyMGRIT ’s MGRIT algorithm and shows some plots generated by

PyMGRIT . The goal of this example is toprovide a better understanding of the algorithm. The second example shows how

PyMGRIT can dramatically reduce thesimulation time of an existing realistic application. As an example, a two-dimensional electrical machine is used, wherea single time step calls the

GetDP (Dular and Geuzaine 2020; Dular et al. 1998; Geuzaine 2007) solver. This example alsouses spatial coarsening to further reduce the simulation time. The third example demonstrates the coupling of

PyMGRIT

Manuscript submitted to ACM

Fig. 9. Results of five MGRIT variants in terms of runtime (left) and number of iterations (right). The runtime results are shown forusing up to 64 processes for parallelizing only in time. and

PETSc and shows space-time parallel results. The tests of all three examples were performed on an Intel Xeon PhiCluster consisting of four 1.4 GHz Intel Xeon Phi processors.

In the first example, we compare several variants of the MGRIT algorithm for the one-dimensional heat equation, u t − au xx = b ( x , t ) in [ , ] × [ , ] , with thermal conductivity a =

1, right-hand side b ( x , t ) = − sin ( πx )( sin ( t )− π cos ( t )) , homogeneous Dirichlet boundaryconditions in space, and subject to the initial condition u ( x , ) = sin ( πx ) .The problem is discretized using second-order central finite differences with 1025 degrees of freedom in space andon an equidistant time grid with 1024 intervals using backward Euler. We choose five-level MGRIT algorithms withcoarsening factor m = V -cycles with FCF -relaxation,(2) V -cycles with FCFCF -relaxation,(3) F -cycles with F -relaxation,(4) F -cycles with FCF -relaxation, and(5) F -cycles with FCFCF -relaxation.For all variants, the stopping tolerance is set to 10 − and a random initial guess is used. Note that we are not consideringa five-level V -cycle with F -relaxation, since a multilevel setting with only F -relaxation generally does not provide anoptimal algorithm (Falgout et al. 2014). The code for the example can be found in the examples/toms directory of PyMGRIT .Figure 9 shows the runtimes and number of iterations of the

PyMGRIT simulations. The left figure presents the runtimeas a function of the number of processes for up to 64 processes and the right figure shows the number of iterationsrequired to reach the halting tolerance for the different MGRIT variants. Comparing only the number of iterations,we see that iterations decrease when stronger (and, thus, more expensive) relaxation schemes are used. Similarly, an F -cycle typically requires fewer iterations than a V -cycle with the same relaxation scheme due to the extra work in Manuscript submitted to ACM yMGRIT: A Python Package for the parallel-in-time method MGRIT 15

Fig. 10. Plots provided by

PyMGRIT ’s class

MgritWithPlots for solving the one-dimensional heat equation using five-level MGRIT V -cycles with FCF -relaxation and using four processes in time. While the convergence plot (left) requires running a full

PyMGRIT simulation, the plots visualizing the cycling strategy (middle) and the distribution of time points across processes (right) can becreated directly after setting up the MGRIT solver. each iteration. As an example, the number of iterations for the F -cycle with F -relaxation is about one and a half timesthe number of iterations for with F -cycle variant with FCF -relaxation. Using the even stronger

FCFCF -relaxation, thenumber can be reduced further, but at a smaller scale.However, the choice of the relaxation scheme and the cycle type affects the cost per iteration. Looking at theruntimes of V -cycles, the runtime of the variant with FCFCF -relaxation is always higher than that of the variant with

FCF -relaxation, although fewer iterations are required for the stronger relaxation scheme. Looking at F -cycles, we seethat runtimes are higher than those of V -cycles for the same relaxation scheme, although iteration counts are smaller.Additionally, F -cycles scale worse than V -cycles.Figure 10 shows three sample plots that can be automatically generated by the PyMGRIT class

MgritWithPlots .Shown are the plots for the V -cycle variant with FCF -relaxation and using four processes in time. The diagram on theleft shows the residual norm after each MGRIT iteration of a

PyMGRIT simulation. The other two plots in Figure 10 canbe created immediately after initializing the

MgritWithPlots object. The middle plot visualizes the MGRIT cycle andthe right plot presents the distribution of time points across processes for all MGRIT levels.The figures demonstrate how easy it is to visualize different aspects of

PyMGRIT ’s MGRIT algorithm. More examplesof how

PyMGRIT can be used to create diagrams, e. g., plotting the (space-)time solution of a problem, can be found inthe documentation.

In this example, we apply

PyMGRIT ’s MGRIT algorithm to a simulation of a two-dimensional electrical machine, wherebythe solution of the spatial problem at each time step is executed by the external

GetDP solver. The example demonstrateshow an existing time integration routine from a complex simulation code can be integrated into the parallel

PyMGRIT framework, enabling a dramatic reduction of the simulation time. This section summarizes the results presented in(Bolten et al. 2019); more details about the simulation can be found in (Bolten et al. 2019) and all files are available in

PyMGRIT ’s folder examples/induction machine .The standard approach for the simulation of electrical machines is given by the so-called eddy current problem, asimplification of Maxwell’s equations in which the displacement current is neglected, and is defined in terms of the

Manuscript submitted to ACM A : Ω × ( t , t f ] → R as σ ∂ t A + ∇ × ( ν (·)∇ × A ) = J s in Ω × ( t , t f ] , (4a) n × A = ∂ Ω , (4b) A ( x , t ) = A ( x ) , (4c)with spatial domain Ω , consisting of rotor, stator, and the air gap in between, time interval ( t , t f ] and Dirichletboundary conditions. The geometry is encoded by the scalar electrical conductivity σ ( x ) ≥ ν ( x , |∇ × A |) >

0. Three ( n s =

3) homogeneously distributed stranded conductors (Sch¨ops et al.2018) are modeled by the source current density J s = n s (cid:213) s = χ s i s , with winding functions χ s : Ω → R and currents i s : ( t , t f ] → R . An attached electrical network provides aconnection between the voltage v s ( t ) , s = , ,

3, and the so-called flux-linkage.To consider the rotation of the rotor, the problem is extended by an additional equation of motion, ω ( t ) = dθ ( t ) dt , t ∈ ( t , t f ] , (5a) I d θdt + C dθdt + κθ = T mag ( A ) in ( t , t f ] , (5b) θ ( t ) = θ , (5c) ω ( t ) = ω , (5d)where the torque T mag defines the mechanical excitation, ω is the angular velocity of the rotor, I denotes the moment ofinertia, and C and κ are the fricition and torsion coefficients, respectively. The moving band approach (Ferreira da Luzet al. 2002) is used to model the movement of the mesh.We reduce the three-dimensional (3D) domain Ω into a two-dimensional (2D) domain Ω D ⊂ R in the x , y -planeand discretize the problem (4), combined with (5), in space using linear finite elements with n a degrees of freedom. Thisyields a system of equations of the form M u (cid:48) ( t ) + K ( u ( t )) u ( t ) = f ( t ) , t ∈ ( t , t f ] (6a) u ( t ) = u , (6b)with unknown u (cid:62) = [ a (cid:62) , i (cid:62) , θ , ω ] : ( t , t f ] → R n . At one point in time t ∈ ( t , t f ] , the solution u ( t ) ∈ R n consists ofthe magnetic vector potential a ( t ) ∈ R n a , the currents of the three phases i ( t ) ∈ R , the rotor angle θ ( t ) ∈ R , and theangular velocity of the rotor ω ( t ) ∈ R . Note that problem (6), due to the presence of non-conducting materials, consistsof differential-algebraic equations (DAEs) of index-1 (Bartel et al. 2011; Cortes Garcia et al. 2019).We use the multi-slice finite element model “im 3 kw” (Gyselinck et al. 2001) of an induction machine for modelingthe semi-discrete problem (6). More precisely, a modified version of the machine (Gander et al. 2019) supplied by athree-phase pulse width modulated voltage source of 20 kHz is considered. We refer to (Bolten et al. 2019) for moredetails. Gmsh (Geuzaine and Remacle 2009, 2020) is used to construct the mesh representation of the model and ahierarchy of two nested meshes is considered to allow for spatial coarsening. The fine mesh Ω , consisting of n a = , Manuscript submitted to ACM yMGRIT: A Python Package for the parallel-in-time method MGRIT 17

Fig. 11. Runtimes of five different MGRIT variants and sequential time stepping for the nonlinear electrical machine “im kw”. Solidlines represent variants without spatial coarsening and dotted lines with spatial coarsening. For reference purposes, the dashed lineshows the runtime for sequential time stepping on one processor. degrees of freedom, is constructed by refining the coarser mesh Ω with n a = GetDP solver, which implements the time integration using backward Euler.We apply five different MGRIT variants to the simulation, choosing Ω as the spatial grid, a time step ∆ t = − and N t = ,

753 time steps, resulting in a final time t f ≈ .

01 s. For all MGRIT variants, we choose a convergencecriterion based on the relative change of joule losses at C -points of two consecutive iterations. The algorithm stopsif the maximum norm of the relative difference of two successive iterations is less than 1%. The following MGRITvariants are chosen: a two-level MGRIT V -cycle with F -relaxation, a five-level MGRIT V -cycle with FCF -relaxationand a five-level MGRIT F -cycle with FCF -relaxation, whereby both five-level variants are applied with and withoutspatial coarsening. Further, a non-uniform temporal coarsening strategy with coarsening factor 42 on the first leveland, for the multilevel variants, a coarsening factor of four on all coarse levels is chosen.Figure 11 shows the runtimes for the five different MGRIT variants as a function of the number of processes and, forreference purposes, the runtime for sequential time stepping on one processor. Thereby, the dashed line represents theruntime results for sequential time stepping, which is about five days, solid lines represent the MGRIT variants withoutspatial coarsening and dotted lines with spatial coarsening. While the multilevel variants require about eight processesto achieve the same runtime results as the sequential method, the two-level variant with eight-way parallelism achievesa runtime of about four days. However, the multilevel variants show better strong scaling, such that the multilevelvariants and the two-level variant using 16 processes have approximately the same runtime. Increasing the number ofprocesses to 256, the runtime of the fastest multilevel variant is about 5.5 hours, which corresponds to a speedup ofabout 22 compared to sequential time stepping.

In the two previous examples, only parallelization in time was considered. In the last example, we show results forspace-time parallel runs with

PyMGRIT . There are several approaches for extending

PyMGRIT with spatial parallelism,e. g., the use of external libraries that take care of spatial parallelism. Two examples, the coupling of

PyMGRIT with

Manuscript submitted to ACM

Firedrake (Rathgeber et al. 2016) and with

PETSc (Balay et al. 2020), are available in the online documentation and inthe repository. In this example, we consider the coupling of

PyMGRIT with

PETSc via the Python package petsc4py (Dalcin et al. 2011), which allows access to

PETSc data types, solvers and MPI-based parallization in space withouthaving to use lower-level programming languages like Fortran or C ++ . Note that petsc4py is not automatically installedwith the installation of PyMGRIT , but can be easily installed using pip . The code for the example can be found in the examples/toms directory.To demonstrate space-time parallelization we choose a standard example of parallel-in-time integration methods(Gander 2015). More precisely, we choose the the forced 2D heat equation, u t − ∆ u = b ( x , y , t ) in [ , ] × [ , ] × ( t , t f ] , with initial condition u ( x , y , t ) = u ( x , y ) , homogeneous Dirichlet boundary conditions, and forcing term b ( x , y , t ) suchthat the exact solution is given by u ( x , y , t ) = sin ( πx ) sin ( πy ) cos ( t ) in [ , ] × [ , ] × [ t , t f ] . The problem is discretized using standard central finite differences in space with N x = degrees of freedom. Intime, the problem is discretized on a grid with N t = ,

385 points using backward Euler.We apply two five-level MGRIT variants to the problem, a V -cycle with FCF -relaxation and an F -cycle with F -relaxation. Both variants use a coarsening strategy with varying coarsening factors m = m = m = m = − . For both variants, the nested iteration strategy is used to get an improved initial guess.For the implementation of the PyMGRIT application and vector classes for the 2D heat equation,

PETSc structures canbe used thanks to the

PETSc backend, such that the vector class uses DMDA global vectors as data structure to store thesolution at a point in time and the time-stepping method in the application class uses the linear Krylov space (KSP)solver of

PETSc . Then, space-time parallelism is easily achieved by partitioning the

MPI COMM WORLD communicatorinto two communicators: one time and one space communicator.Figure 12 presents strong scaling results for the two MGRIT variants and additionally for a time stepping algorithm,with parallelization only in space. The time-stepping method achieves the fastest runtime for 32 processes in space, sothis would be the optimal value to add time parallelism. However, due to the limited number of cores on the cluster, weselect only four processes in space for the two MGRIT variants, such that 64 processes in the plot correspond to usingfour processes for spatial parallelization and 16 processes in time for the MGRIT variants, i. e., a total of 4 ∗ = This paper introduces the Python framework

PyMGRIT , which implements the parallel-in-time method multigrid-reduction-in-time (MGRIT). The

PyMGRIT framework allows the application of the MGRIT algorithm without having toworry about implementation details, parallel communication and so forth, as well as easy prototyping of new variants

Manuscript submitted to ACM yMGRIT: A Python Package for the parallel-in-time method MGRIT 19

Fig. 12. Strong scaling results for two five-level MGRIT variants and space-parallel time-stepping. For the multi-level variants, fourprocesses in space are used, i. e., 16 processes correspond to four processes in time and four processes in space. of the algorithm, and (space-)time parallel simulations using MPI. In addition to the Python code, whose functionalityis guaranteed by a continuous integration environment with automated serial and parallel tests, the framework alsooffers extensive documentation with a quickstart code example, a tutorial, and many examples demonstrating variousfeatures of

PyMGRIT . Together with a simple installation and many pre-implemented ordinary and partial differentialequations, the documentation provides an easy start into working with the package. Advanced usage is described ina separate section of the documentation, and more information about

PyMGRIT ’s core classes and functions can befound in the API documentation, which is automatically generated from Python comments. By coupling

PyMGRIT withother libraries, such as

Firedrake or PETSc , the time parallelism provided by the package can be combined with spatialparallelism for space-time parallel simulations.The

PyMGRIT package already offers many features; however, there are still many possible extensions, open tasks,and new directions. In the following we summarize possible future work. Due to its structure and iterative nature,the MGRIT algorithm allows for a large number of variations and adaptations, e. g., considering different cycle types,applying different relaxation strategies on different MGRIT levels or for different MGRIT iterations, and so forth. Whilemany of these MGRIT variants are already implemented in the

PyMGRIT package, there are still some extensions, suchas adaptive time stepping, that are not implemented yet. Furthermore, for nonlinear problems, options such as theadditional storage of an auxiliary vector (Falgout et al. 2017) that provides an improved starting solution for nonlinearsolvers, are attractive extensions.Another area is the improvement of the required storage space. The MGRIT algorithm, as implemented in

PyMGRIT requires more memory than sequential time stepping, because the solution is stored for multiple points in time. Thereare different strategies to reduce the memory costs, e. g., storing the solution only at C -points, but these are notimplemented yet and are topics for future work. PyMGRIT contains examples of coupling the package with

Firedrake , PETSc and

GetDP , but of course many otherpowerful libraries exist. Coupling with more libraries could increase the number of applications for the package and

Manuscript submitted to ACM

FENiCS (Logg et al. 2012)and

PyClaw (Ketcheson et al. 2012), which both provide a wide range of space-specific algorithms.

ACKNOWLEDGMENTS

The authors are grateful to Jacob Schroder and Robert Falgout for many helpful conversations and insightful commentsregarding the implementation of the MGRIT algorithm, and would like to thank Ryo Yoda for sharing his experiencesof using

PyMGRIT that led to improvements of the package. The authors also acknowledge the support by the BMBF inthe framework of project PASIROM (grant number 05M18PXB).

REFERENCES

Satish Balay, Shrirang Abhyankar, Mark F. Adams, Jed Brown, Peter Brune, Kris Buschelman, Lisandro Dalcin, Alp Dener, Victor Eijkhout, William D.Gropp, Dmitry Karpeyev, Dinesh Kaushik, Matthew G. Knepley, Dave A. May, Lois Curfman McInnes, Richard Tran Mills, Todd Munson, Karl Rupp,Patrick Sanan, Barry F. Smith, Stefano Zampini, Hong Zhang, and Hong Zhang. 2020.

PETSc Users Manual

AppliedNumerical Mathematics

61 (Sept. 2011), 1257–1270.Matthias Bolten, Stephanie Friedhoff, Jens Hahne, and Sebastian Sch¨ops. 2019. Parallel-in-Time Simulation of an Electrical Machine using MGRIT.arXiv:1912.03106 [math.NA]Achi Brandt. 1977. Multi-level adaptive solutions to boundary-value problems.

Math. Comp.

31, 138 (1977), 333–390.Andrew J. Christlieb, Colin B. Macdonald, and Benjamin W. Ong. 2010. Parallel High-Order Integrators.

SIAM Journal on Scientific Computing

32, 2 (2010),818–835.Idoia Cortes Garcia, Herbert De Gersem, and Sebastian Sch¨ops. 2019. A Structural Analysis of Field/Circuit Coupled Problems Based on a GeneralisedCircuit Element.

Numerical Algorithms (March 2019), 1–22.Lisandro D. Dalcin, Rodrigo R. Paz, Pablo A. Kler, and Alejandro Cosimo. 2011. Parallel distributed computing using Python.

Advances in Water Resources

IEEE Transactions on Magnetics

34, 5 (Sep. 1998), 3395–3398.Alok Dutt, Leslie Greengard, and Vladimir Rokhlin. 2000. Spectral deferred correction methods for ordinary differential equations.

BIT NumericalMathematics

40, 2 (June 2000), 241–266.Matthew Emmett and Michael Minion. 2012. Toward an efficient parallel in time method for partial differential equations.

Communications in AppliedMathematics and Computational Science

7, 1 (2012), 105–132.Robert D. Falgout, Stephanie Friedhoff, Tzanio Kolev, Scott MacLachlan, and Jacob B. Schroder. 2014. Parallel time integration with multigrid.

SIAMJournal on Scientific Computing

36, 6 (2014), C635–C661.Robert D. Falgout, Aaron Katz, Tzanio Kolev, Jacob B. Schroder, Andrew Wissink, and Ulrike M. Yang. 2015.

Parallel Time Integration with MultigridReduction for a Compressible Fluid Dynamics Application . Technical Report, LLNL-JRNL-663416. Lawrence Livermore National Laboratory.Robert D. Falgout, Matthieu Lecouvez, and Carol S. Woodward. 2019. A parallel-in-time algorithm for variable step multistep methods.

Journal ofComputational Science

37 (2019), 101029, 12.Robert D. Falgout, Thomas Manteuffel, Ben O’Neill, and Jacob B. Schroder. 2017. Multigrid reduction in time for nonlinear parabolic problems: a casestudy.

SIAM Journal on Scientific Computing

39, 5 (2017), S298–S322.Mauricio V. Ferreira da Luz, Patrick Dular, Nelson Sadowski, Christophe Geuzaine, and Jo˜ao P. A. Bastos. 2002. Analysis of a permanent magnet generatorwith dual formulations using periodicity conditions and moving band.

IEEE Transactions on Magnetics

38, 2 (March 2002), 961–964.Stephanie Friedhoff, Jens Hahne, Iryna Kulchytska-Ruchka, and Sebastian Sch¨ops. 2020. Exploring Parallel-in-Time Approaches for Eddy CurrentProblems. In

Progress in Industrial Mathematics at ECMI 2018 (The European Consortium for Mathematics in Industry) , Istv´an Farag´o, Ferenc Izs´ak, andP´eter L. Simon (Eds.), Vol. 30. Springer, Berlin.Stephanie Friedhoff, Jens Hahne, and Sebastian Sch¨ops. 2019. Multigrid-reduction-in-time for Eddy Current problems.

PAMM

19, 1 (2019), e201900262.Martin J. Gander. 2015. 50 years of Time Parallel Time Integration. In

Multiple Shooting and Time Domain Decomposition . Springer, 69–113.Martin J. Gander and Stefan G¨uttel. 2013. PARAEXP: A Parallel Integrator for Linear Initial-Value Problems.

SIAM Journal on Scientific Computing

35, 2(2013), C123–C142.Manuscript submitted to ACM yMGRIT: A Python Package for the parallel-in-time method MGRIT 21

Martin J. Gander, Iryna Kulchytska-Ruchka, Innocent Niyonzima, and Sebastian Sch¨ops. 2019. A New Parareal Algorithm for Problems with DiscontinuousSources.

SIAM Journal on Scientific Computing

41, 2 (2019), B375–B395.Christophe Geuzaine. 2007. GetDP: a general finite-element solver for the de Rham complex.

PAMM

7, 1 (2007), 1010603–1010604.Christophe Geuzaine and Jean-Franc¸ois Remacle. 2009. Gmsh: A 3-D finite element mesh generator with built-in pre- and post-processing facilities.

Internat. J. Numer. Methods Engrg.

Magnetics, IEEE Transactions on

37 (10 2001), 3233 – 3237.Stefanie G¨unther, Lars Ruthotto, Jacob B. Schroder, Eric C. Cyr, and Nicolas R. Gauger. 2020. Layer-Parallel Training of Deep Residual Neural Networks.

SIAM Journal on Mathematics of Data Science

2, 1 (2020), 1–23.Jens Hahne and Stephanie Friedhoff. 2020a. Code coverage for

PyMGRIT . https://codecov.io/gh/pymgrit/pymgrit, Online; accessed June 30, 2020.Jens Hahne and Stephanie Friedhoff. 2020b. Documentation for

PyMGRIT . https://pymgrit.github.io/pymgrit/, Online; accessed June 30, 2020.Jens Hahne and Stephanie Friedhoff. 2020c. Github repository for

PyMGRIT . https://github.com/pymgrit/pymgrit, Online; accessed June 30, 2020.Jens Hahne and Stephanie Friedhoff. 2020d. PyPI site for

PyMGRIT . https://pypi.org/project/pymgrit/, Online; accessed June 30, 2020.A. Howse, H. De Sterck, Robert D. Falgout, Scott MacLachlan, and Jacob B. Schroder. 2019. Parallel-In-Time Multigrid with Adaptive Spatial Coarseningfor The Linear Advection and Inviscid Burgers Equations.

SIAM Journal on Scientific Computing

41, 1 (2019), A538–A565.John D. Hunter. 2007. Matplotlib: A 2D Graphics Environment.

Computing in Science Engineering

9, 3 (2007), 90–95.David I. Ketcheson, Kyle T. Mandli, Aron J. Ahmadia, Amal Alghamdi, Manuel Quezada de Luna, Matteo Parsani, Matthew G. Knepley, and MatthewEmmett. 2012. PyClaw: Accessible, Extensible, Scalable Tools for Wave Propagation Problems.

SIAM Journal on Scientific Computing

34, 4 (Nov. 2012),C210–C231.Holger Krekel. 2020. Welcome to the tox automation project. https://tox.readthedocs.io/en/latest/, Online; accessed June 30, 2020.Holger Krekel, Bruno Oliveira, Ronny Pfannschmidt, Floris Bruynooghe, Brianna Laugher, and Florian Bruhin. 2004. pytest. https://github.com/pytest-dev/pytestLydia Kronsj¨o. 1975. A note on the “nested iterations” methods.

Nordisk Tidskr. Informationsbehandling (BIT)

15, 1 (1975), 107–110.Lydia Kronsj¨o and Germund Dahlquist. 1972. On the design of nested iterations for elliptic difference equations.

Nordisk Tidskr. Informationsbehandling(BIT)

12 (1972), 63–71.Matthieu Lecouvez, Robert D. Falgout, Carol S. Woodward, and Philip Top. 2016. A parallel multigrid reduction in time method for power systems. In . 1–5.Jacques-Louis Lions, Yvon Maday, and Gabriel Turinici. 2001. R´esolution d’EDP par un sch´ema en temps “parar´eel”.

Comptes Rendus de l’Acad´emie desSciences. S´erie I. Math´ematique

Automated Solution of Differential Equations by the Finite Element Method . Springer.Martin Schreiber. 2020. Website for

SWEET . https://schreiberx.github.io/sweetsite/, Online; accessed May 11, 2020.Allan S. Nielsen, Gilles Brunner, and Jan S. Hesthaven. 2018. Communication-aware adaptive Parareal with application to a nonlinear hyperbolic systemof partial differential equations.

J. Comput. Phys.

371 (2018), 483–505.Benjamin W. Ong, Ronald D. Haynes, and Kyle Ladd. 2016. Algorithm 965: RIDC Methods: A Family of Parallel Time Integrators.

ACM Trans. Math.Software

43, 1, Article 8 (Aug. 2016), 13 pages.Florian Rathgeber, David A. Ham, Lawrence Mitchell, Michael Lange, Fabio Luporini, Andrew T. T. Mcrae, Gheorghe-Teodor Bercea, Graham R. Markall,and Paul H. J. Kelly. 2016. Firedrake: Automating the Finite Element Method by Composing Abstractions.

ACM Trans. Math. Software

43, 3, Article 24(Dec. 2016), 27 pages.Sebastian Sch¨ops, Innocent Niyonzima, and Markus Clemens. 2018. Parallel-in-time Simulation of Eddy Current Problems using Parareal.

IEEE Transactionson Magnetics

54, 3 (M¨arz 2018).Jacob B. Schroder, Robert D. Falgout, Carol S. Woodward, Philip Top, and Matthieu Lecouvez. 2018. Parallel-in-Time Solution of Power Systems withScheduled Events. In . 1–5.Stefan van der Walt, S. Chris Colbert, and Gael Varoquaux. 2011. The NumPy Array: A Structure for Efficient Numerical Computation.

Computing inScience Engineering

13, 2 (2011), 22–30.Pauli Virtanen, Ralf Gommers, Travis E. Oliphant, Matt Haberland, Tyler Reddy, David Cournapeau, Evgeni Burovski, Pearu Peterson, Warren Weckesser,Jonathan Bright, St´efan J. van der Walt, Matthew Brett, Joshua Wilson, K. Jarrod Millman, Nikolay Mayorov, Andrew R. J. Nelson, Eric Jones, RobertKern, Eric Larson, CJ Carey, ˙Ilhan Polat, Yu Feng, Eric W. Moore, Jake Vand erPlas, Denis Laxalde, Josef Perktold, Robert Cimrman, Ian Henriksen,E. A. Quintero, Charles R Harris, Anne M. Archibald, Antˆonio H. Ribeiro, Fabian Pedregosa, Paul van Mulbregt, and SciPy 1. 0 Contributors. 2020.SciPy 1.0: Fundamental Algorithms for Scientific Computing in Python.