A neural network multigrid solver for the Navier-Stokes equations
Nils Margenberg∗, Dirk Hartmann†, Christian Lessig‡, Thomas Richter§

August 27, 2020
Abstract

We present a deep neural network multigrid solver (DNN-MG) that we develop for the instationary Navier-Stokes equations. DNN-MG improves computational efficiency using a judicious combination of a geometric multigrid solver and a recurrent neural network with memory. The multigrid method is used in DNN-MG to solve on coarse levels while the neural network corrects interpolated solutions on fine ones, thus avoiding the increasingly expensive computations that would have to be performed there. A reduction in computation time is thereby achieved through DNN-MG's highly compact neural network. The compactness results from its design for local patches and the available coarse multigrid solutions that provide a "guide" for the corrections. A compact neural network with a small number of parameters also reduces training time and data. Furthermore, the network's locality facilitates generalizability and allows one to use DNN-MG trained on one mesh domain also on an entirely different one. We demonstrate the efficacy of DNN-MG for variations of the 2D laminar flow around an obstacle. For these, our method significantly improves the solutions as well as lift and drag functionals while requiring only about half the computation time of a full multigrid solution. We also show that DNN-MG trained for the configuration with one obstacle can be generalized to other time dependent problems that can be solved efficiently using a geometric multigrid method.

∗ Helmut Schmidt University, Holstenhofweg 85, 22043 Hamburg, Germany, [email protected]
† Siemens AG, Corporate Technology, Otto-Hahn-Ring 6, 81739 Munich, Germany, [email protected]
‡ University of Magdeburg, Institute for Simulation and Graphics, Universitätsplatz 2, 39104 Magdeburg, Germany, [email protected]
§ University of Magdeburg, Institute for Analysis and Numerics, Universitätsplatz 2, 39104 Magdeburg, Germany, [email protected]

1 Introduction
The last decade has seen enormous progress on the application of deep neural networks to tasks in machine translation, computer vision, and many other fields, e.g. [32, 46]. These advances lead to a growing (and renewed) interest to apply neural networks also to problems in computational science and engineering, including the simulation of partial differential equations. There, the existence of efficient numerical methods raises the question of how neural networks can be combined with, e.g., multigrid finite element methods, to leverage the benefits the different approaches provide.

Towards this end, we propose the deep neural network multigrid solver (DNN-MG) that we develop in this paper for the simulation of the instationary Navier-Stokes equations. DNN-MG combines a geometric multigrid solver with a recurrent neural network to replace the computations on one or multiple finest mesh levels with a network-based correction. For this, the Navier-Stokes equations are first solved on the first L levels of the multigrid mesh hierarchy using a classical multigrid solver. The obtained solution is then interpolated (or prolongated) to the next finer level L+1 where the network predicts a correction towards the unknown ground truth. The interpolation and prediction steps are repeated up to a finest level where the right hand side for the classical multigrid solve in the next time step is computed. Through this, the neural network-based corrections feed back to the coarse levels and affect the overall time evolution of the flow while the costs of the multigrid computations on fine levels are avoided.

The key to the efficiency of the DNN-MG solver is the use of a highly compact neural network with a small number of parameters. The compactness is achieved by the network's locality, i.e. that it operates independently on patches of the mesh domain, e.g. a single mesh element on level L or a collection of few adjacent elements. It is also enabled by the availability of the coarse multigrid solution that provides significant information about the flow behavior and thus reduces the state that needs to be carried in the network. A local, patch-based network also reduces the required amount of training data and time, since even a single flow already contains a multitude of patches and different behaviors. The latter also facilitates the network's ability to generalize to flows not seen during training. To ensure that the neural network can model complex flow behavior, it contains a Gated Recurrent Unit (GRU) with memory that incorporates a flow's past into a prediction, similar to how many classical time integration schemes use multiple past time steps for their update. The GRU's memory as well as the use of the network's prediction in the right hand side at time step n+1 ensure a stable and temporally consistent time integration.

We demonstrate the efficacy of the DNN-MG solver for the 2D laminar flow in a channel with obstacles of varying elliptical eccentricities. We first consider the classical flow with one obstacle and training performed with different eccentricities than used for testing.
The obtained solutions as well as the lift and drag functionals demonstrate that DNN-MG obtains considerably better accuracy than a coarse multigrid solution on level L while requiring only about half the computation time of a full solution on level L+1. To analyze DNN-MG's ability to generalize to flow regimes not seen during training, we also consider the channel flow with no or with two obstacles using the neural network trained for the one obstacle case. In both instances, DNN-MG is able to provide high fidelity solutions, and for the two obstacle flow it again reduces the error of the lift and drag functionals substantially.

The remainder of the paper is structured as follows. In the next section we review related work that uses neural networks for the simulation of partial differential equations. In Sec. 3 we then recall the solution of the Navier-Stokes equations using the geometric multigrid method before providing an introduction to recurrent neural networks in Sec. 4. In Sec. 5 we introduce the deep neural network multigrid solver, discuss its design and also sketch its generalization to other problems. Our numerical results are presented in Sec. 6.

2 Related work

Recently, there has been an increasing interest to employ (deep) neural networks for the simulation of physical systems. In the following, we will discuss existing approaches with an emphasis on those that consider partial differential equations and in particular the Navier-Stokes equations.

For learning, partial differential equations are typically considered as sequential time series so that neural network architectures developed for such data can also be employed for these. The most common architecture are recurrent neural networks (RNNs), e.g. with Long-Short-Term-Memory units (LSTMs) [25] to avoid the vanishing gradient problem. Because of the prevalence of sequential data, LSTMs led to many variations over the years. One variation are Gated Recurrent Units (GRUs) which, while not as powerful as LSTMs, often provide similar performance and are easier to train [13]. An alternative to LSTM-type architectures are temporal convolutional neural networks (TCNs) where the temporal dependence is modeled using a causal (i.e. one-sided) convolution in the time variable. Bai et al. [1] demonstrated that this can provide substantial improvements in prediction accuracy over recurrent neural networks with memory. A different approach to modeling temporal dependencies was developed by Voelker et al. [49] who represented the time domain in Legendre polynomials and learned basis function coefficients. Through this, they obtained a neural network model with continuous time. A related idea are the neural ordinary differential equation architectures by Chen et al. [11] where the depth of the network is continuous.

Approaches for a direct description of the time evolution of partial differential equations using neural networks go back more than 20 years, e.g. [31]. In more recent work, Raissi et al. [41] demonstrated that the velocity field and pressure of the Navier-Stokes equations can be learned to good approximation only from observing passively advected tracer particles. Nabian and Meidani [37], Yang and Perdikaris [55], and Raissi and Karniadakis [40] exploited that not only observations are available but also a known analytic model. These authors hence also use the deviation of the network prediction from the model as an additional penalty term in the training loss. Kasim et al.
[29] recently demonstrated neural network-based simulations for a broad range of applications, including fluid dynamics, by also optimizing the network architecture itself during the training process. Eichinger, Heinlein and Klawonn [33] use convolutional neural networks and techniques from image processing to learn flow patterns for the Navier-Stokes flow around objects of different shape.

Next to the above direct, neural network-based approaches to the simulation of partial differential equations, there have been different attempts to integrate these into existing numerical techniques. As in our work, the objective is to combine the benefits of neural networks and classical methods and obtain techniques that would be difficult or impossible with either approach alone. For elasticity, neural network-based representations for constitutive relations were learned in [47, 7, 43]. These were then used in classical simulations, e.g. based on finite elements. Wiewel et al. [53] presented a simulation of the Navier-Stokes equations where an LSTM-based deep neural network is used to predict the pressure correction. Xi et al. [54] and Werhahn et al. [52] use neural networks to upsample a coarse simulation and obtain physically plausible high frequency details. An approach based on the splitting method for high dimensional PDEs was developed by Beck et al. [3] where one obtains a set of smaller learning problems that are easier to treat than a single large one. Wan and Sapsis [50] describe the motion of finite size particles in fluid flows by combining analytic arguments with a neural network that models the discrepancy between the (highly) idealized analytic model and real world observations.

To ensure that neural network-based predictions respect physical invariants, such as energy conservation or divergence freedom, network architectures tailored towards physical simulations have been proposed in the literature. The authors of [36, 12, 27] developed neural networks that incorporate the symplectic structure of Hamiltonian mechanics and they demonstrate that this improves generalization and accuracy. In a related line of research, ideas from nonlinear analysis and stability theory were exploited. Erichson et al. [16], for example, improve the accuracy of the learning process by ensuring that the learned system has the same Lyapunov stability as the system of interest. Similarly, Gin et al. [19] train a neural network to provide a coordinate transformation that linearizes a nonlinear PDE, in the spirit of the Cole-Hopf transform for the Burgers equation, but using a neural network to find and represent the transform. For this, they use a Koopman operator representation. Related to this is the work of Zhang et al. [56] where a neural network is used to improve the dynamically orthogonal (DO) and bi-orthogonal (BO) methods for stochastic PDEs.

An alternative line of research aims to learn surrogate neural network representations of physical systems from the governing (stochastic) partial differential equations without recourse to simulations or measurements [18, 57, 28]. This avoids the training data generation that is often a bottleneck with the approaches above and allows one to quantify the uncertainty of the prediction. Karumuri et al. [28] thereby obtain their loss function through a variational formulation of the stochastic PDEs they are considering. Related to this is the Deep Ritz Method by E and Yu [15] where the ansatz of the Ritz method is used but with neural networks for the function approximation. Qin et al.
[39] show that residual networks can be seen as exact one-step time stepping methods and that these provide building blocks for multi-step methods.

The performance of (deep) neural networks has also been analyzed theoretically. Already in the 1980s and 1990s, the first results demonstrated the potential of neural networks as universal function approximators [14, 26] that do not suffer from the curse of dimensionality [2]. More recent results on the approximation properties of deep neural networks, e.g. [8], focused in particular on the required size of a network to achieve a certain approximation bound. Mallat [34] provided insight into the effectiveness of deep neural networks using tools from harmonic analysis and pointed out the importance of invariants for their understanding. For partial differential equations, currently only few results exist. Kutyniok et al. [30] studied the theoretical power of feed-forward deep neural networks for reduced order models and they show that the solution map, which yields the basis coefficients of the reduced basis for given values of the parameters, can be efficiently approximated using neural networks and in a manner essentially independent of the dimension. Grohs et al. [20] study the space-time error for PDEs when Euler steps are realized using neural networks. To our knowledge, for partial differential equations no theoretical results for recurrent neural networks with memory exist.
3 Solution of the Navier-Stokes equations with the geometric multigrid method

In the following, we provide an overview of the solution of the Navier-Stokes equations using the geometric multigrid method.

3.1 The Navier-Stokes equations

We consider a domain $\Omega \subset \mathbb{R}^d$ with $d \in \{2, 3\}$ and Lipschitz-continuous boundary $\Gamma$ and a bounded time interval $[0, T]$. The solution of the instationary Navier-Stokes equations then seeks the velocity $v : [0,T] \times \Omega \to \mathbb{R}^d$ and pressure $p : [0,T] \times \Omega \to \mathbb{R}$, such that

$$\partial_t v + (v \cdot \nabla) v - \frac{1}{\mathrm{Re}} \Delta v + \nabla p = f \quad\text{on } [0,T] \times \Omega, \qquad \nabla \cdot v = 0 \quad\text{on } [0,T] \times \Omega, \tag{1}$$

where $\mathrm{Re} > 0$ is the Reynolds number and $f$ an external force. The initial and boundary conditions are given by

$$v(0,\cdot) = v_0(\cdot) \ \text{on } \Omega, \qquad v = v^D \ \text{on } [0,T] \times \Gamma_D, \qquad \frac{1}{\mathrm{Re}} (\vec{n} \cdot \nabla) v - p \vec{n} = 0 \ \text{on } [0,T] \times \Gamma_N, \tag{2}$$

where $\vec{n}$ denotes the outward facing unit normal vector on the boundary $\partial\Omega$ of the domain. The boundary $\Gamma = \Gamma_D \cup \Gamma_N$ is split into subsets $\Gamma_D$ with Dirichlet boundary conditions and $\Gamma_N$ with Neumann type conditions.

For a discretization with finite elements, we briefly sketch the variational formulation of Eq. 1. By $L^2(\Omega)$ we denote the space of square integrable functions on the domain $\Omega \subset \mathbb{R}^d$ with scalar product $(\cdot,\cdot)$ and by $H^1(\Omega)$ those $L^2(\Omega)$ functions with weak first derivative in $L^2(\Omega)$. The function spaces for the velocity and pressure are then

$$V := v^D + H^1_0(\Omega; \Gamma_D)^d, \quad H^1_0(\Omega; \Gamma_D)^d := \big\{ v \in H^1(\Omega)^d : v = 0 \text{ on } \Gamma_D \big\}, \quad L := \Big\{ p \in L^2(\Omega),\ \text{and, if } \Gamma_N = \emptyset,\ \int_\Omega p \,\mathrm{d}x = 0 \Big\}, \tag{3}$$

where $v^D \in H^1(\Omega)^d$ is an extension of the Dirichlet data on $\Gamma_D$ into the domain. Given Dirichlet data on the complete boundary, the pressure is normalized to yield uniqueness. With these spaces, the variational formulation of Eq. 1 is given by

$$(\partial_t v, \phi) + (v \cdot \nabla v, \phi) + \frac{1}{\mathrm{Re}}(\nabla v, \nabla \phi) - (p, \nabla \cdot \phi) = (f, \phi) \quad \forall \phi \in H^1_0(\Omega;\Gamma_D)^d,$$
$$(\nabla \cdot v, \xi) = 0 \quad \forall \xi \in L, \qquad v(0,\cdot) = v_0(\cdot) \ \text{on } \Omega. \tag{4}$$

Let $\Omega_h$ be a quadrilateral or hexahedral finite element mesh of the domain $\Omega$ satisfying the usual requirements on the structural and form regularity such that the standard interpolation results hold, compare [42, Section 4.2]. For the finite element discretization of Eq. 4, we choose $W_h^{(r)}$ as the space of continuous functions which are polynomials of degree r on each element $T \in \Omega_h$. If the finite element mesh contains elements of generalized quadrilateral or hexahedral type, e.g. to resolve curved boundaries, we use isoparametric finite elements, see [42, Section 4.2.1] for details and the specific realization chosen for this work. Throughout this paper, we choose the discrete trial and test spaces for the discretization of Eq. 4 as $v_h, \phi_h \in V_h = [W_h^{(2)}]^d$ and $p_h, \xi_h \in L_h = W_h^{(2)}$. Since the resulting equal order finite element pair $V_h \times L_h$ does not fulfil the inf-sup condition, we add additional stabilization terms of local projection type [4]. The resulting semidiscrete variational problem then reads

$$(\partial_t v_h, \phi_h) + (v_h \cdot \nabla v_h, \phi_h) + \frac{1}{\mathrm{Re}}(\nabla v_h, \nabla \phi_h) - (p_h, \nabla \cdot \phi_h) = (f, \phi_h) \quad \forall \phi_h \in V_h,$$
$$(\nabla \cdot v_h, \xi_h) + \sum_{T \in \Omega_h} \alpha_T \big(\nabla(p_h - \pi_h p_h), \nabla(\xi_h - \pi_h \xi_h)\big) = 0 \quad \forall \xi_h \in L_h, \tag{5}$$

where $\pi_h : W_h^{(2)} \to W_h^{(1)}$ denotes the local projection operator into the space of linear polynomials. The local stabilization parameter $\alpha_T$ is chosen as $\alpha_T = \alpha \cdot \mathrm{Re} \cdot h_T^2$ with a fixed constant $\alpha$, where $h_T$ is the local mesh size of element T. We refer to [4, 9] for details.
Since the numerical experiments described in Section 6 consider moderate Reynolds numbers only, no stabilization of the convective term is required.

3.2 Time discretization

For temporal discretization, the time interval $[0, T]$ is split into discrete time steps of uniform size

$$0 = t_0 < t_1 < \cdots < t_N = T, \qquad k = t_n - t_{n-1}.$$

The generalization to a non-equidistant time discretization is straightforward and only omitted for ease of presentation. We define $v_n := v_h(t_n)$ and $p_n := p_h(t_n)$ for the fully discrete approximation of velocity and pressure at time $t_n$ and apply the second order Crank-Nicolson method to Eq. 5, resulting in the fully discrete problem

$$(\nabla \cdot v_n, \xi_h) + \sum_{T \in \Omega_h} \alpha_T \big(\nabla(p_n - \pi_h p_n), \nabla(\xi_h - \pi_h \xi_h)\big) = 0 \quad \forall \xi_h \in L_h,$$
$$\frac{1}{k}(v_n, \phi_h) + \frac{1}{2}(v_n \cdot \nabla v_n, \phi_h) + \frac{1}{2\mathrm{Re}}(\nabla v_n, \nabla \phi_h) - (p_n, \nabla \cdot \phi_h) = \frac{1}{k}(v_{n-1}, \phi_h) + \frac{1}{2}(f_n, \phi_h) + \frac{1}{2}(f_{n-1}, \phi_h) - \frac{1}{2}(v_{n-1} \cdot \nabla v_{n-1}, \phi_h) - \frac{1}{2\mathrm{Re}}(\nabla v_{n-1}, \nabla \phi_h) \quad \forall \phi_h \in V_h. \tag{6}$$

The right hand side only depends on the velocity $v_{n-1}$ at the last time step n−1 and will be denoted by $b_{n-1}$ in the following. The Crank-Nicolson time discretization in Eq. 6 corresponds to a nonlinear system of equations that can be solved, e.g., with the Newton-Krylov method. In the presented form it is sufficiently robust for smooth initial data $v_0$; we refer to [22] for small modifications with improved robustness and stability.
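To make the structure of Eq. 6 concrete, the following minimal sketch applies one Crank-Nicolson step to a generic linear system u'(t) = Au(t) + f(t): the known data from the previous step forms the right hand side, while the new state is implicit. The matrix A, the force f and all sizes are illustrative stand-ins, not the Navier-Stokes discretization above.

    import numpy as np

    def crank_nicolson_step(A, u_old, f_old, f_new, k):
        """One step of (u_n - u_{n-1})/k = (A u_n + f_n)/2 + (A u_{n-1} + f_{n-1})/2."""
        n = u_old.size
        I = np.eye(n)
        lhs = I / k - 0.5 * A                                    # implicit part at t_n
        rhs = (I / k + 0.5 * A) @ u_old + 0.5 * (f_new + f_old)  # known data, cf. b_{n-1}
        return np.linalg.solve(lhs, rhs)

    # Usage on a damped two-component toy system.
    A = np.array([[-1.0, 0.5], [0.0, -2.0]])
    u, k = np.array([1.0, 1.0]), 0.01
    for n in range(1, 11):
        f_old = np.array([np.sin((n - 1) * k), 0.0])
        f_new = np.array([np.sin(n * k), 0.0])
        u = crank_nicolson_step(A, u, f_old, f_new, k)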
3.3 Solution of the nonlinear problems

In each time step, Eq. 6 results in a large nonlinear system of algebraic equations which, by introducing the unknown $x = (v_n, p_n)$, we write as $A_h(x) = f_h$ with

$$[A_h(x)]_i := (\nabla\cdot v_n, \xi_h^i) + \sum_{T\in\Omega_h} \alpha_T \big(\nabla(p_n - \pi_h p_n), \nabla(\xi_h^i - \pi_h \xi_h^i)\big) + \frac{1}{k}(v_n, \phi_h^i) + \frac{1}{2}(v_n\cdot\nabla v_n, \phi_h^i) + \frac{1}{2\mathrm{Re}}(\nabla v_n, \nabla\phi_h^i) - (p_n, \nabla\cdot\phi_h^i),$$
$$[f_h]_i := \frac{1}{k}(v_{n-1}, \phi_h^i) + \frac{1}{2}(f_n, \phi_h^i) + \frac{1}{2}(f_{n-1}, \phi_h^i) - \frac{1}{2}(v_{n-1}\cdot\nabla v_{n-1}, \phi_h^i) - \frac{1}{2\mathrm{Re}}(\nabla v_{n-1}, \nabla\phi_h^i), \tag{7}$$

for all test functions $\phi_h^i$ and $\xi_h^i$. The nonlinear problem is solved by Newton's method based on the initial guess $x^{(0)} = (v_{n-1}, p_{n-1})$ and the iteration for $l = 1, 2, \dots$

$$A'(x^{(l-1)})\, w^{(l)} = f_h - A(x^{(l-1)}), \qquad x^{(l)} = x^{(l-1)} + w^{(l)}, \tag{8}$$

where we denote by $A'(x^{(l-1)})$ the Jacobian of $A$ at $x^{(l-1)}$, i.e. the matrix of first partial derivatives. For the Navier-Stokes equations this is easily computed analytically, cf. [42, Sec. 4.4.2].

Each Newton step requires the solution of a linear system of equations, where the system matrix $A'(x^{(l-1)})$ is sparse, but non-symmetric and not definite, due to the saddle point structure of the underlying Navier-Stokes equation. To approximate the solution with optimal robustness, we employ the generalized minimal residual method (GMRES) which has been introduced by Saad [44]. The convergence is thereby accelerated through a geometric multigrid solver that is used as a preconditioner, with a single sweep performed for every GMRES step (usually about 10 for each nonlinear Newton iteration).

3.4 The geometric multigrid method

Multigrid methods are a class of efficient algorithms for the solution of linear systems that can achieve the optimal complexity of O(n), where n is the number of degrees of freedom (in our case proportional to the number of mesh vertices). We employ the geometric multigrid method as a preconditioner for the GMRES iterations and hence only discuss this method in the following. It is summarized in Algo. 1.

Algorithm 1 The geometric multigrid method for the solution of the linear system $A_L x_L = b_L$ given on a finest level L. If all high frequency errors are smoothed by S at a constant rate, the method achieves optimal complexity O(n). The multigrid solver is initiated on the finest mesh level L and then used recursively:

     1: procedure multigrid(l, A_l, b_l, x_l)
     2:     s_l ← S(A_l, b_l, x_l)                         ▷ Smoothing
     3:     r_l ← b_l − A_l s_l                            ▷ Residual
     4:     r_{l−1} ← R(l, r_l)                            ▷ Restriction
     5:     if l − 1 = 0 then                              ▷ Coarse-grid solution
     6:         c_0 ← A_0^{−1} r_0                         ▷ Direct solution
     7:     else
     8:         c_{l−1} ← multigrid(l − 1, A_{l−1}, r_{l−1}, 0)
     9:     end if
    10:     x'_l ← s_l + P(l, c_{l−1})                     ▷ Prolongation
    11:     s'_l ← S(A_l, b_l, x'_l)                       ▷ Smoothing
    12:     return s'_l
    13: end procedure

The geometric multigrid method is based on a hierarchical approximation of a linear system on a sequence of finite element spaces $V_0 \subset V_1 \subset \cdots \subset V_L$ defined over a hierarchy of meshes $\Omega_0, \dots, \Omega_L = \Omega_h$. Instead of solving the linear system on the finest mesh level L, as one would do with traditional solvers, geometric multigrid methods smooth high frequency errors with a simple iterative method and treat the remaining errors on lower levels. Usually, smoothing is applied at the beginning (Algo. 1, line 2) and at the end (Algo. 1, line 11) of each iteration. In between, the remaining residual is computed (Algo. 1, line 3) and restricted to the next coarser levels L−1, L−2, ..., before the coarse corrections are prolongated back to level L (Algo. 1, line 10).

The mesh transfer from fine to coarse levels is accomplished with $L^2$-projections, known as restrictions, and from coarse to fine ones with interpolations, known as prolongations. The smoothing that precedes the restrictions is performed with a smoothing operator $S(A_l, b_l, x_l)$. It is usually a simple iteration that yields an approximation to the solution of the linear system, i.e. $S(A_l, b_l, x_l) \approx A_l^{-1} b_l$, and that aims to quickly reduce all high frequency components of the residual $b_l - A_l x_l$. If this happens on all levels l at a constant rate, then the method achieves the optimal complexity O(n). In our implementation we use a simple iteration of Vanka type [48], which allows for easy parallelization and gives very good performance with less than 5 pre- and post-smoothing steps [17]. The idea of the Vanka type smoother is to exactly solve small subproblems and to replace the inverse $A_l^{-1}$ by

$$V_l(A_l) = \sum_{T\in\Omega_l} R_T^T \big[ R_T A_l R_T^T \big]^{-1} R_T, \tag{9}$$

where $R_T$ is the restriction to those nodes that belong to one mesh element $T \in \Omega_l$ and $R_T^T$ its transpose. Considering piecewise quadratic finite elements, 9 nodes are attached to each element such that the local matrices to be inverted have the dimension $R_T A_l R_T^T \in \mathbb{R}^{27\times 27}$, since each node comprises the scalar pressure and two velocity components. For a more detailed discussion of the geometric multigrid method we refer to [21, 10]; the realization in Gascoigne 3D [5] used throughout this work is described in [6].
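The recursion of Algo. 1 can be illustrated in a few lines of code. The sketch below runs V-cycles for a 1D Poisson problem; the damped Jacobi smoother stands in for the Vanka smoother of Eq. 9, and the dense coarse solve for a proper direct solver, both simplifying assumptions for illustration.

    import numpy as np

    def poisson(n):
        """1D Poisson matrix with Dirichlet conditions, n interior points."""
        h = 1.0 / (n + 1)
        return (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

    def smooth(A, b, x, steps=3, omega=2.0 / 3.0):
        """Damped Jacobi iteration as smoothing operator S(A, b, x)."""
        d = np.diag(A)
        for _ in range(steps):
            x = x + omega * (b - A @ x) / d
        return x

    def restrict(r):
        """Full-weighting restriction R to the next coarser level."""
        return 0.25 * r[0:-2:2] + 0.5 * r[1:-1:2] + 0.25 * r[2::2]

    def prolongate(c):
        """Linear interpolation P to the next finer level."""
        e = np.zeros(2 * c.size + 1)
        e[1::2] = c
        e[0], e[-1] = 0.5 * c[0], 0.5 * c[-1]
        e[2:-1:2] = 0.5 * (c[:-1] + c[1:])
        return e

    def multigrid(l, A, b, x):
        """One V-cycle of Algo. 1; level l = 0 is the coarsest."""
        s = smooth(A[l], b, x)                    # line 2: pre-smoothing
        r = restrict(b - A[l] @ s)                # lines 3-4: residual and restriction
        if l - 1 == 0:
            c = np.linalg.solve(A[0], r)          # line 6: direct coarse solve
        else:
            c = multigrid(l - 1, A, r, np.zeros_like(r))
        x = s + prolongate(c)                     # line 10: prolongation
        return smooth(A[l], b, x)                 # line 11: post-smoothing

    # Three-level hierarchy with 7, 15 and 31 interior points.
    A = [poisson(n) for n in (7, 15, 31)]
    b, x = np.ones(31), np.zeros(31)
    for _ in range(10):
        x = multigrid(2, A, b, x)
    print(np.linalg.norm(b - A[2] @ x))           # residual norm after 10 V-cycles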
4 Recurrent neural networks

Different types of neural networks exist. The ones originally introduced under the name are today known as feed-forward neural networks and are used for tasks such as image recognition or data mining. Recurrent neural networks are an extension where an activation variable $a_n^k \in \mathbb{R}^{p_k}$ allows one to propagate information in a discrete time variable indexed by n, such as it occurs in time series (e.g. the stock market) or sequential data (e.g. text). An extension of this model uses network nodes with memory. These allow one to more effectively model long term dependencies and also to avoid the vanishing gradient problem that otherwise occurs with recursive neural networks [24]. In our work, we use Gated Recurrent Units (GRUs) [13] that are somewhat simpler than the better known Long-Short-Term-Memory (LSTM) units [51] but are therefore also easier to train. At layer k in the network, a GRU is given by

$$z_n^k = \sigma_z\big( W_k^{(z)} h_n^{k-1} + U_k^{(z)} h_{n-1}^{k} + b_k^{(z)} \big), \tag{10a}$$
$$r_n^k = \sigma_r\big( W_k^{(r)} h_n^{k-1} + U_k^{(r)} h_{n-1}^{k} + b_k^{(r)} \big), \tag{10b}$$
$$h_n^k = z_n^k \odot h_{n-1}^k + (1 - z_n^k) \odot \sigma_h\big( W_k^{(h)} h_n^{k-1} + U_k^{(h)} ( r_n^k \odot h_{n-1}^k ) + b_k^{(h)} \big), \tag{10c}$$

where $\odot$ denotes the element-wise product and the $W_k^{(\cdot)}$, $U_k^{(\cdot)}$ and $b_k^{(\cdot)}$ are weight matrices and bias vectors determined by training. The so called update gate vector $z_n^k \in \mathbb{R}^q$ in Eq. 10 controls to what extent $h_{n-1}^k$ is carried over to the output $h_n^k$ at the current time n and the reset gate vector $r_n^k \in \mathbb{R}^q$ controls the contribution of $h_{n-1}^k$ to the nonlinearity of the cell. Together, $z_n^k$ and $r_n^k$ thus control the memory of a GRU cell, i.e. to what extent the past hidden output $h_{n-1}^k$ contributes to the current one $h_n^k$.
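For concreteness, the following PyTorch sketch implements one GRU layer exactly as written in Eq. 10, with illustrative sizes. Note that Eq. 10c combines the states as z ⊙ h_{n−1} + (1 − z) ⊙ σ_h(...), which interchanges the roles of z and 1 − z compared to the built-in torch.nn.GRUCell; the behavior is otherwise equivalent.

    import torch
    import torch.nn as nn

    class GRUCell(nn.Module):
        """One GRU layer following Eq. 10; W acts on the layer input h^{k-1}_n,
        U on the previous hidden state h^k_{n-1}."""
        def __init__(self, in_dim, hidden_dim):
            super().__init__()
            self.W_z = nn.Linear(in_dim, hidden_dim)                   # W^(z) and b^(z)
            self.U_z = nn.Linear(hidden_dim, hidden_dim, bias=False)   # U^(z)
            self.W_r = nn.Linear(in_dim, hidden_dim)
            self.U_r = nn.Linear(hidden_dim, hidden_dim, bias=False)
            self.W_h = nn.Linear(in_dim, hidden_dim)
            self.U_h = nn.Linear(hidden_dim, hidden_dim, bias=False)

        def forward(self, x, h_prev):
            z = torch.sigmoid(self.W_z(x) + self.U_z(h_prev))          # Eq. 10a
            r = torch.sigmoid(self.W_r(x) + self.U_r(h_prev))          # Eq. 10b
            h_new = torch.tanh(self.W_h(x) + self.U_h(r * h_prev))     # cell nonlinearity
            return z * h_prev + (1 - z) * h_new                        # Eq. 10c

    # Unroll over a sequence of 10 time steps with batch size 1.
    cell, h = GRUCell(in_dim=51, hidden_dim=32), torch.zeros(1, 32)
    for x in torch.randn(10, 1, 51):
        h = cell(x, h)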
5 The deep neural network multigrid solver

In this section, we develop the deep neural network multigrid solver (DNN-MG). It uses a recurrent neural network to predict the correction of a coarse mesh solution that has been prolongated (or interpolated) onto a finer mesh (finer by one or multiple mesh levels). Through this, we can obtain solutions that are more accurate than those obtained with the coarse mesh only while being computationally more efficient than performing the full (multigrid) computations on the fine mesh level(s).

We will develop the DNN-MG solver in the context of the Navier-Stokes equations and return to the general formulation at the end of the section. To simplify the exposition, we will also assume that the neural network operates on only one finer level L+1. However, bridging more levels as well as an extension to other partial differential equations is possible within our framework.

5.1 One time step of DNN-MG

We begin by detailing one time step of the simulation of the Navier-Stokes equations using the DNN-MG solver. The computations are summarized in Algo. 2 and a conceptual depiction is provided in Fig. 1.

At the beginning of time step n, we first solve for the unknown velocity $v_n^L$ and pressure $p_n^L$ on level L defined by Eq. 6 using the classical Newton-Krylov simulation (Algo. 2, lines 2-5) as described in Sec. 3.3. To improve the solution, we interpolate (i.e. prolongate) $v_n^L$ to $\tilde{v}_n^{L+1}$ onto the next finer level L+1 where a richer function space $V_{L+1}$ is available (Algo. 2, line 6). The neural network part of DNN-MG then predicts the velocity update $d_n^{L+1}$, i.e. the difference $d_n^{L+1} = \bar{v}_n^{L+1} - \tilde{v}_n^{L+1}$ between $\tilde{v}_n^{L+1}$ and the (unknown) ground truth solution $\bar{v}_n^{L+1}$ on level L+1 (Algo. 2, line 7), based on the interpolated velocity $\tilde{v}_n^{L+1} \in V_{L+1}$ and information about the local mesh structure. The n-th time step ends by computing the right hand side $b_{n+1}^{L+1}$ of Eq. 6 on level L+1 using the corrected velocity $\tilde{v}_n^{L+1} + d_n^{L+1}$ (Algo. 2, line 8) and restricting it to level L (Algo. 2, line 9).

Algorithm 2 DNN-MG for the solution of the Navier-Stokes equations. Lines 6-9 provide the modifications of the DNN-MG method compared to a classical Newton-Krylov simulation with geometric multigrid preconditioning.

     1: for all time steps n do
     2:     while not converged do                          ▷ Newton's method for Eq. 6
     3:         δz_i ← multigrid(L, A_L^n, b_L^n, δz_i)     ▷ Algo. 1 with z = (p_n^L, v_n^L)
     4:         z_{i+1} ← z_i + ε δz_i
     5:     end while
     6:     ṽ_n^{L+1} ← P(v_n^L)                            ▷ Prolongation to level L+1
     7:     d_n^{L+1} ← N(ṽ_n^{L+1}, Ω_L, Ω_{L+1})          ▷ Prediction of velocity correction
     8:     b_{n+1}^{L+1} ← Rhs(ṽ_n^{L+1} + d_n^{L+1}, f_n, f_{n+1}) ▷ Set up rhs of Eq. 6 for next time step
     9:     b_{n+1}^L ← R(b_{n+1}^{L+1})                    ▷ Restriction of rhs to level L
    10: end for

Through this right hand side, the neural network-based correction becomes part of the multigrid computations in the next time step (Algo. 2, lines 2-5) and thus affects the overall time evolution of the flow. A key aspect of the DNN-MG solver is to build the right hand side $b_{n+1}^{L+1}$ on level L+1, since a restriction of the corrected velocity $\tilde{v}_n^{L+1} + d_n^{L+1}$ itself would (essentially) yield again the velocity $v_n^L$ that resulted from the multigrid computations only. The pressure is handled implicitly on level L in the multigrid solve and thus plays no role in the neural network correction of DNN-MG.
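The data flow of Algo. 2 can be summarized in the following schematic sketch. All helper functions are dummy stand-ins for the Newton-Krylov solve on level L, the finite element transfer operators, the trained network and the right hand side assembly of Eq. 6, so only the order of operations is meant to be accurate.

    import numpy as np

    # Dummy stand-ins so that the sketch runs.
    newton_multigrid_solve = lambda b_L: b_L.copy()          # returns v_n^L
    prolongate = lambda v: np.repeat(v, 2)                   # P: level L -> L+1
    restrict = lambda b: 0.5 * (b[0::2] + b[1::2])           # R: level L+1 -> L
    network = lambda v_tilde: np.zeros_like(v_tilde)         # predicted defect d_n^{L+1}
    assemble_rhs = lambda v, f0, f1: v + 0.5 * (f0 + f1)     # Rhs(...) of Eq. 6

    def dnn_mg_step(b_L, f_n, f_np1):
        v_L = newton_multigrid_solve(b_L)                # lines 2-5: solve on level L
        v_tilde = prolongate(v_L)                        # line 6: P(v_n^L)
        d = network(v_tilde)                             # line 7: N(...)
        b_next = assemble_rhs(v_tilde + d, f_n, f_np1)   # line 8: rhs on level L+1
        return v_tilde + d, restrict(b_next)             # line 9: rhs for the next step

    # One step on a toy vector: 4 coarse unknowns, 8 fine ones.
    v_corr, b_L = dnn_mg_step(np.ones(4), np.ones(8), np.ones(8))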
5.2 Design of the neural network

The neural network component is at the heart of the DNN-MG solver. Care in its design is required to ensure that it improves the efficiency of the runtime computations, is easy to train, and generalizes well to flow regimes not seen during training. These objectives are attained through a very compact neural network with a small number of parameters and a local, patch-based design. An overview of the network component is provided in Fig. 2.

A patch-based neural network
The efficiency of DNN-MG relies on the neural network evaluation (and related auxiliary computations, e.g. of its inputs) being less computationally expensive than using traditional solvers for Eq. 7 on the level L+1. When a neural network predicts the defect $d_n^{L+1}$ for the entire domain on level L+1, the size of the network's in- and outputs is of order $O(m_{L+1})$ where $m_{L+1}$ is the number of degrees of freedom on the level. The cost of evaluating the network is then at least of order $O(m_{L+1})$ and, for fully connected layers, even $O(m_{L+1}^2)$; it would hence be more expensive than a direct multigrid solution. The neural network would additionally have to have a large number of parameters to model the solution's behavior on the entire simulation domain, necessitating substantial training time and data, while the domain would be "baked" into the network, hampering generalization.

Figure 1: The neural network part of DNN-MG predicts a correction $d_n^{L+1}$ of the prolongated velocity $P(v_n^L)$ towards the unknown ground truth $\bar{v}_n^{L+1}$, which traditionally would be obtained using the multigrid solver for level L+1. The correction $d_n^{L+1}$ is incorporated into the time integration by using the improved solution $\tilde{v}_n^{L+1} + d_n^{L+1}$ to build a right hand side $b_{n+1}^{L+1}$ and restricting it to level L. The restriction $b_{n+1}^L$ is then used in the next time step in the multigrid solver. Temporal coherence of the correction (and hence the overall flow) is achieved through the memory of the GRU in the neural network.

To avoid these issues, DNN-MG uses a neural network that operates patch-wise over small neighborhoods of the mesh. In its most local variant, which we use for the numerical examples discussed in Section 6, each patch consists of the unknowns that belong to one single mesh element only, cf. Fig. 3; a more global approach is easily realized by considering larger patches consisting of multiple adjacent elements. Our network thus takes as input only information from one patch and predicts the velocity update $d_n^{L+1}$ only for the mesh nodes in the patch, with the prediction done independently for all patches to cover the entire simulation domain (duplicate predictions for adjacent patches are averaged). The locality is one key to the compactness of the neural network and hence also to its efficient evaluation. Furthermore, since a single neural network is used for all patches in the domain, training with a small number of flows exposes the network to a large number of different flow behaviors. This facilitates the network's ability to generalize to flows not seen during training and reduces training time and data.

Neural network inputs
The efficacy of the neural network prediction relies critically on suitable network inputs that provide rich information about the coarse flow behavior on level L as well as the local patch geometry (similar to how the geometry of mesh cells plays an important role in classical finite element simulations). In particular, with carefully selected inputs one requires fewer parameters in the neural network, which reduces the evaluation time at runtime and the amount of data and time required for training.

Figure 2: High level overview of data generation, training and application of DNN-MG. The training inputs are velocity fields on levels L and L+1, with the latter one serving as reference solution. From these as well as the meshes $\Omega_L$ and $\Omega_{L+1}$ on the respective levels we derive the input to the neural network that operates patch-wise (cf. Fig. 3). At application time, we use a classical multigrid solution on level L and apply the neural network part of DNN-MG to obtain a corrected solution $P(v_n^L) + d_n^{L+1}$ on level L+1. With it, a right hand side $b_{n+1}^{L+1}$ is computed that is then restricted to level L where it is used in the multigrid solver in the next time step.

Using the most localized patches directly supported by the available mesh structure, as we do in our numerical experiments, each patch consists of exactly one mesh element on level L with $N_L = 4$ nodes and thus with $N_{L+1} = 9$ nodes on level L+1. Our inputs to the neural network are then:

• the nonlinear residuals $r_n^{L+1} = f^{L+1} - A^{L+1}(P(x_n^L))$ of the prolongated coarse mesh solution for Eq. 7;
• the velocities $v_n^L$ on mesh level L;
• Péclet numbers $\mathrm{Pe}_L$ on mesh level L;
• the geometry of the cells, in particular the edge lengths $h_c$, the aspect ratios $a_c$ (of two neighboring and two opposite edges each), and the angles between the edges $\alpha_c$.

We hence have in total 51 inputs, reflecting the compactness of our neural network.

Figure 3: DNN-MG uses a local approach where the neural network operates patch-wise on small neighborhoods of the simulation domain and provides a velocity correction independently for each patch.

The most important input to the neural network is the nonlinear residual $r_n^{L+1} = f^{L+1} - A^{L+1}(P(x_n^L))$, cf. Eq. 7, that provides a local measure of the error (or defect) at every mesh node on level L+1 and thereby incorporates information about the partial differential equation. We also use the Péclet number $\mathrm{Pe}_L = q\, \tilde{v}_n^L / \nu$, where $\nu$ is the fluid's viscosity and q a characteristic length scale, which is the patch size in our case. The Péclet number describes the ratio between transport and diffusion at each node on level L and it hence measures which part of the Navier-Stokes equations dominates the local behavior of the flow.
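The following sketch assembles the 51 inputs for one patch. The per-component split used here (27 residual values from the 9 fine nodes with two velocity components and the pressure, 8 coarse velocity values, 4 Péclet numbers, and 4 + 4 + 4 geometry values) is an assumption that is consistent with the stated total of 51; all argument names are hypothetical.

    import numpy as np

    def patch_inputs(residual, v_coarse, peclet, edge_len, aspect, angles):
        assert residual.shape == (9, 3)    # r_n^{L+1} on the 9 fine nodes (v_x, v_y, p)
        assert v_coarse.shape == (4, 2)    # v_n^L on the 4 coarse patch nodes
        assert peclet.shape == (4,)        # Pe_L on the coarse patch nodes
        assert edge_len.shape == aspect.shape == angles.shape == (4,)  # cell geometry
        x = np.concatenate([residual.ravel(), v_coarse.ravel(),
                            peclet, edge_len, aspect, angles])
        assert x.size == 51                # matches N_in = 51
        return x

    # Example with dummy values for a single patch.
    x = patch_inputs(np.zeros((9, 3)), np.zeros((4, 2)),
                     np.zeros(4), np.ones(4), np.ones(4), np.zeros(4))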
Neural network architecture

To predict a defect $d_n^{L+1}$ that is consistent with the fluid flow, we use recurrent neural networks with memory. Through this, the past simulation states $\dots, v_{n-2}, v_{n-1}$ affect the predicted update $d_n$ at time step n, similar to how classical time stepping methods (often) use information from a sequence of last time steps to predict the next one.

The specific network architecture we use for DNN-MG is shown in Fig. 4. At its heart is a GRU unit with memory, cf. Sec. 4. It could in principle be replaced by another cell with memory, such as an LSTM unit or a TCN, but since the GRU unit performed well in our experiments and is, through its simpler design, easier to train, we used it in the present work. The fully connected layer that follows the GRU reduces its output size to the size of the update. The two convolutional blocks following the fully connected layer are designed to learn local dependencies, with one learning the interactions of the two velocity components at each vertex and the other the spatial dependence of each component across a patch. Each of these blocks consists of two layers, where the second one is just the transposed version of the first layer so that input and output sizes are equal. The final convolutional layer reduces the dimensionality of the data from the multiple filters that form the preceding convolutional layers to the output $d_n$ of the network.
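A PyTorch sketch of this architecture is given below. The GRU, the fully connected layer and the overall data flow follow the text (N_in = 51, N_GRU = 32, N_out = 18, no biases); the kernel sizes and the exact wiring of the convolutional blocks are assumptions, so their parameter counts do not exactly match the 36 and 18 parameters stated in Sec. 6.2.

    import torch
    import torch.nn as nn

    class DNNMGNet(nn.Module):
        def __init__(self, n_in=51, n_gru=32, n_out=18):
            super().__init__()
            self.gru = nn.GRU(n_in, n_gru, bias=False)     # 3(32*32 + 32*51) weights
            self.fc = nn.Linear(n_gru, n_out, bias=False)  # 32*18 weights
            # Each block expands to 4 filters and reduces back to 1 via the
            # transposed version of its first layer (kernel sizes assumed).
            def block():
                return nn.Sequential(
                    nn.Conv1d(1, 4, kernel_size=2, stride=2, bias=False),
                    nn.ConvTranspose1d(4, 1, kernel_size=2, stride=2, bias=False))
            self.block_a, self.block_b = block(), block()
            self.reduce = nn.Conv1d(2, 1, kernel_size=1, bias=False)  # merge branches

        def forward(self, x, h=None):
            y, h = self.gru(x, h)                 # x: (time steps, batch, n_in)
            y = self.fc(y)                        # (time steps, batch, n_out)
            s, b, n = y.shape
            y = y.reshape(s * b, 1, n)
            z = torch.cat([self.block_a(y), self.block_b(y)], dim=1)
            d = self.reduce(z)                    # (s*b, 1, n_out)
            return d.reshape(s, b, n), h

    net = DNNMGNet()
    d, h = net(torch.randn(10, 7, 51))            # 10 time steps, 7 patches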
Training of DNN-MG

The training of DNN-MG is based on simulations of the Navier-Stokes equations for which a multigrid representation of the velocity $v_n^l$ with the two levels L and L+1 is available. The velocity $v_n^{L+1}$ thereby serves as ground truth $\bar{v}_n^{L+1}$. The training then finds network weights (in the GRU unit, the dense and the convolutional layers) such that the norm $\| (\tilde{v}_n^{L+1} + d_n^{L+1}) - \bar{v}_n^{L+1} \|$ of the difference between the predicted velocity $\tilde{v}_n^{L+1} + d_n^{L+1}$ (i.e. the velocity after correction, Algo. 2, line 8) and the ground truth $\bar{v}_n^{L+1}$ is minimized over the course of a simulation.

Figure 4: The neural network architecture used: a GRU with input size N_in and state size N_GRU, a fully connected N_GRU × N_out layer, two convolutional blocks, and a final convolution reducing the number of filters, yielding the output $d_n \in \mathbb{R}^{N_{out}}$.

5.3 Generalization to other partial differential equations

After the detailed description of DNN-MG for the solution of the Navier-Stokes equations, we will in the following show how it can be applied to a general PDE in variational form: find $u \in V$ such that $A(u)(\Phi) = F(\Phi)$ for all $\Phi \in V$, where $A(\cdot)(\cdot)$ is a semi-linear form (linear in the second argument) and V is a Banach space chosen for the problem. Given a suitable Galerkin formulation in the discrete subspace $V_h \subset V$, the abstract Newton iteration for the solution of the problem $A_h(x_h) = f_h$ can be formulated exactly as for the Navier-Stokes equations, compare Eq. 7 and Eq. 8. The Newton-Krylov approach based on the preconditioned GMRES iteration and a geometric multigrid solver as preconditioner is highly robust and it can be applied to a variety of PDE models (as is done in Gascoigne [5]). The DNN-MG solver in Algorithm 2 can hence be applied to various PDEs.

More generally, DNN-MG does not strictly require a geometric multigrid solver but only an algorithm that can solve on coarse levels and an interpolation onto one or multiple finer levels. However, the geometric multigrid framework gives us immediate access to a nested hierarchy of problems.

5.4 Algorithmic complexity of DNN-MG

The Newton-Krylov geometric multigrid method can already achieve optimal complexity, which raises the question what advantage DNN-MG has to offer. Before addressing the question from a practical point of view in the next section, we sketch in the following an answer from a theoretical point of view.

The Newton-Krylov geometric multigrid framework achieves linear complexity O(N) in the number N of degrees of freedom, which roughly quadruples with each global mesh refinement, i.e. $N_{L+1} \approx 4 N_L$. The constant hidden in O(N), however, can be immense, since on average 5 Newton steps are required in each time step and within each Newton step one has to compute on average 10 GMRES steps with one sweep of the geometric multigrid solver as preconditioning. Furthermore, in the geometric multigrid typically 5 pre-smoothing and 5 post-smoothing steps have to be determined. Hence, a total of about 500 Vanka smoothing steps must be performed on each of the mesh levels L, L−1, ..., 0. One Vanka step thereby requires the inversion of one 27 × 27 matrix on each element of the mesh, cf. Eq. 9, with approximately $27^3 \approx 20\,000$ operations per inversion. Since the complexity of all mesh levels sums up to about $N_L + N_{L-1} + \dots + N_0 \approx N_L (1 + 4^{-1} + \dots + 4^{-L}) \approx \tfrac{4}{3} N_L$, we can estimate the effort for the complete solution process on level L as $5 \cdot 10 \cdot (5+5) \cdot 20\,000 \cdot \tfrac{4}{3} \approx 1.3 \cdot 10^7$ operations per element of level L. We thereby only counted the effort for smoothing and neglected all further matrix, vector and scalar products. Solving the problem on an additional mesh level L+1 would require about four times this effort, i.e. $\approx 5 \cdot 10^7$ operations per element.

On the other hand, the DNN-MG method only requires the prolongation of the solution to the next finer mesh and there, as main effort, the evaluation of the neural network on each patch. If we again consider patches of minimal size (one patch is one element of mesh level L), about $O(N_{L+1}/4) = O(N_L)$ patches must be processed. The effort for the evaluation of the network can be estimated by its number of trainable parameters, and the DNN-MG approach asks for only one such evaluation of the network in each time step. The specific network model used for the numerical test cases comprises 8634 trainable parameters. Hence, the effort for correcting the level-L Newton-Krylov solution by the neural network on level L+1 can be estimated by $8634\, N_L$, which is negligible in comparison to the effort of solving on level L+1.
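The estimates above can be reproduced with a few lines of arithmetic. The iteration counts and the parameter count are the figures quoted in the text; the per-element normalization is a simplifying assumption.

    newton, gmres, smoothing = 5, 10, 5 + 5   # average iteration counts from the text
    vanka_ops = 27**3                          # ~20 000 ops per 27x27 block inverse
    hierarchy = 4.0 / 3.0                      # 1 + 1/4 + 1/16 + ... over the levels

    mg_L = newton * gmres * smoothing * vanka_ops * hierarchy   # ~1.3e7 per element
    mg_L1 = 4 * mg_L                                            # one more refinement
    dnn = 8634                                                  # one network evaluation

    print(f"multigrid on L:    {mg_L:.1e} ops per element")
    print(f"multigrid on L+1:  {mg_L1:.1e} ops per element")
    print(f"DNN-MG correction: {dnn:.1e} ops per patch (negligible)")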
6 Numerical results

In this section, we present numerical results obtained with the deep neural network multigrid solver (DNN-MG) for the Navier-Stokes equations. We consider a classical benchmark problem describing the laminar flow around a cylinder, cf. [45]. In contrast to the original benchmark configuration, we parametrize the geometry by using an ellipsoid with varying aspect ratio to study the robustness of our technique. We also report on the runtime performance of DNN-MG and study its ability to generalize to flow regimes that differ from the training data.

6.1 Flow around a cylinder

We consider a variation of the well-established "laminar flow around a cylinder" introduced by Schäfer and Turek [45], in particular the unsteady 2D-2 test case where the laminar flow around a circular obstacle at Reynolds number Re = 100 leads to a stable periodic flow pattern. The geometry of the flow domain is shown in Fig. 5. The flow is driven by a Dirichlet profile $v = v^D$ at the left boundary $\Gamma_{\mathrm{in}}$ given by

$$v^D(x, y, t) = \frac{6\, v_{\mathrm{avg}}\, y (H - y)}{H^2}\, \omega(t) \quad \text{on } \Gamma_{\mathrm{in}} := \{0\} \times [0, H], \qquad \omega(t) = \begin{cases} 1 - \cos(2\pi t), & t \le \tfrac{1}{4}, \\ 1, & t > \tfrac{1}{4}, \end{cases} \tag{11}$$

where H = 0.41 is the height of the channel and the function ω(t) regularizes the startup phase of the flow. On the wall boundary $\Gamma_{\mathrm{wall}}$ and on the obstacle we prescribe no-slip boundary conditions v = 0 and on the outflow boundary $\Gamma_{\mathrm{out}}$ we use a do-nothing outflow condition [23]. Together with the different obstacles used for training and testing, the average flow rate $v_{\mathrm{avg}}$ results in the Reynolds numbers $\mathrm{Re}_{\mathrm{train}} = 166$ and $\mathrm{Re}_{\mathrm{test}} = 200$, respectively.

Figure 5: Geometry of the flow domain with inflow boundary $\Gamma_{\mathrm{in}}$, outflow boundary $\Gamma_{\mathrm{out}}$ and walls $\Gamma_{\mathrm{wall}}$; the inflow profile is prescribed on $\Gamma_{\mathrm{in}}$, do-nothing boundary conditions are used at the outflow boundary $\Gamma_{\mathrm{out}}$ and no-slip conditions on the walls $\Gamma_{\mathrm{wall}}$. The center of the obstacle is at (0.2, 0.2).

    mesh        unknowns   multigrid levels
    coarse      …          3 (L)
    fine        8088       4 (L + 1)
    reference   31536      5 (L + 2)

Table 1: Spatial meshes with increasing number of multigrid levels and unknowns.

In all numerical examples the temporal step size is k = 0.01. In space, we consider the sequence of meshes in Table 1. Finer levels result from a uniform refinement of the "coarse" mesh conforming to the obstacle.
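A small sketch of the inflow condition of Eq. 11 reads as follows. Since the ramp length and the normalization of the average velocity did not survive in all sources, the constants t0 and the factor 6 should be read as assumptions consistent with the reconstruction above.

    import numpy as np

    def inflow(y, t, v_avg=1.0, H=0.41, t0=0.25):
        """Parabolic inflow profile with a cosine startup ramp, cf. Eq. 11."""
        omega = 1.0 - np.cos(2.0 * np.pi * t) if t <= t0 else 1.0
        return 6.0 * v_avg * y * (H - y) / H**2 * omega

    v_in = inflow(0.2, 0.5)   # evaluate mid-channel after the startup phase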
6.2 Neural network parameters

As discussed in Sec. 5, DNN-MG uses a local neural network that operates over patches of the mesh domain. For our numerical experiments, we defined a patch to be the refinement of a coarse mesh cell on level L, i.e. we used the most localized patches that are naturally supported by the mesh structure. This results in a simple network (cf. Fig. 4) with only 8634 trainable parameters. With none of the layers having a bias and $N_{\mathrm{in}} = 51$ (the input length), $N_{\mathrm{GRU}} = 32$ (the size of the GRU state), and $N_{\mathrm{out}} = 18$ (the output length), there are $3(32 \cdot 32 + 32 \cdot 51) = 7968$ parameters in the GRU and $32 \cdot 18 = 576$ in the fully connected layer. Each of the convolutional blocks that learn vertex interactions has, furthermore, 36 parameters and the last convolutional layer has 18 parameters.

Each convolutional block learns 4 filters: in the first layer the number of filters is expanded to 4 and in the following transposed layer it is reduced to 1. In the last layer the outputs of the convolutional blocks are concatenated along the filter dimension, which then has to be reduced to a length of 1. This is achieved by a convolution with length 1 (summing along the filter dimension and weighting the output), resulting in 18 trainable parameters.

The network was implemented using PyTorch [38].
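The stated layer sizes can be checked to reproduce the 8634 trainable parameters:

    n_in, n_gru, n_out = 51, 32, 18
    gru = 3 * (n_gru * n_gru + n_gru * n_in)   # 7968 parameters in the GRU
    fc = n_gru * n_out                          # 576 in the fully connected layer
    conv_blocks = 2 * 36                        # two convolutional blocks
    reduce_conv = 18                            # final reducing convolution
    assert gru + fc + conv_blocks + reduce_conv == 8634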
6.3 Training

As training data for DNN-MG we used three (L+1)-level multigrid simulations of the flow around a cylinder, each with a slightly different elliptical obstacle. Each simulation provides $N_c \times N_T$ training items, where $N_c$ is the number of patches and $N_T$ is the length of the time series. By the periodicity of the flow it also sufficed to consider a small number of full periods. We hence used $t \in [1, 6]$, resulting in $N_T = 500$.

As loss function $\mathcal{L}$ we employed a simple $l^2$-loss

$$\mathcal{L} = \sum_{n=1}^{N_T} \frac{1}{N_c} \sum_{c=1}^{N_c} l\big( \tilde{v}_{n,c}^{L+1} + d_{n,c}^{L+1},\, \bar{v}_{n,c}^{L+1} \big) = \sum_{n=1}^{N_T} \frac{1}{N_c} \sum_{c=1}^{N_c} \big\| \big( \tilde{v}_{n,c}^{L+1} + d_{n,c}^{L+1} \big) - \bar{v}_{n,c}^{L+1} \big\|^2.$$

Tikhonov regularization of the network weights was used as well, with a small scaling factor α. Training was performed using the ADAM optimizer with a limit of 1000 epochs. At the end, the model with the lowest validation loss was saved. This resulted in a training time of 4h 23m on an Intel Xeon Gold 6254.
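A PyTorch version of the loss above could look as follows; the tensor layout (time step × patch × values) is an assumption.

    import torch

    def dnn_mg_loss(v_tilde, d, v_bar):
        """Shapes: (N_T, N_c, values); v_tilde + d is the corrected velocity,
        v_bar the level-(L+1) reference."""
        err = (v_tilde + d) - v_bar
        return err.pow(2).sum(dim=-1).mean(dim=1).sum()  # sum_n (1/N_c) sum_c ||.||^2

    # Example with dummy tensors: 500 time steps, 100 patches, 18 values each.
    loss = dnn_mg_loss(torch.zeros(500, 100, 18), torch.zeros(500, 100, 18),
                       torch.zeros(500, 100, 18))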
6.4 Results

For testing we used the flow around an elliptic obstacle with an increased height, resulting in $\mathrm{Re}_{\mathrm{test}} = 200$. Fig. 6 shows the magnitude of the velocity field at time t = 9 for the flow obtained using DNN-MG as well as classical multigrid solutions on levels L and L+1. It can be observed that DNN-MG is indeed able to predict high frequency fluctuations that are not visible in the coarse mesh solution. In particular in the vicinity of the obstacle the quality of the solution is strongly enhanced, with distinct features in the wake being apparent in the DNN-MG simulation.

Figure 6: Magnitude of the velocity field for the channel flow with one obstacle at time t = 9 for a.) a multigrid solution with L+1 levels, b.) DNN-MG, c.) a multigrid solution with L levels.

Figure 7: The lift functional for 3, 4 and 5 multigrid levels on the sequence of spatially refined finite element meshes from Table 1 reveals a frequency shift which depends on the spatial finite element discretization. We show the startup phase [0, 3] and the interval [9, 10].

To further quantify the accuracy, we consider the drag and lift functionals

$$J_d(v, p) = \int_\Gamma \Big( \frac{1}{\mathrm{Re}} \nabla v - p I \Big) \vec{n} \cdot \vec{e}_1 \,\mathrm{d}s, \qquad J_l(v, p) = \int_\Gamma \Big( \frac{1}{\mathrm{Re}} \nabla v - p I \Big) \vec{n} \cdot \vec{e}_2 \,\mathrm{d}s,$$

respectively, where $\vec{e}_1 = (1, 0)^T$ and $\vec{e}_2 = (0, 1)^T$. The results in Fig. 8 and Table 2 demonstrate that DNN-MG is also able to substantially correct both the drag and lift functionals. Maximum and minimum values but also the frequency of oscillation of the augmented simulation are by far closer to the fine mesh one that serves as reference. These results substantiate that DNN-MG is able to considerably improve the solution quality.

Fig. 10, finally, shows the relative velocity and pressure errors

$$E_n^{\mathrm{rel}}(v) = \frac{\big\| (\tilde{v}_n^{L+1} + d_n^{L+1}) - v_n^{L+1} \big\|}{\big\| v_n^{L+1} \big\|}, \qquad E_n^{\mathrm{rel}}(p) = \frac{\big\| \tilde{p}_n^{L+1} - p_n^{L+1} \big\|}{\big\| p_n^{L+1} \big\|},$$

where $\|\cdot\|$ denotes the Euclidean norm of the discrete solution. DNN-MG is able to robustly reduce the velocity error from 20%-27% to about 10%. While the relative pressure error is also reduced by a factor of 2 on average, some oscillatory peaks remain. These stem from the shift in periodicity as discussed in the following remark.
Remark.
Fig. 7 shows the lift functional for classical multigrid simulations with three different resolutions, both in the startup phase t ∈ [0, 3] and for t ∈ [9, 10] where the flow profile is fully developed. The results show that on coarser meshes it takes longer for the periodic flow profile to develop (Fig. 7, left) and that the period of oscillations is getting shorter with increasing mesh resolution (Fig. 7, right). This shift in frequency and phase as a function of mesh resolution prevents a direct comparison of results obtained on different levels in a multigrid-type algorithm such as DNN-MG. In [35], a shifted variant of the Crank-Nicolson scheme is discussed to remedy the problem. Since this cannot be used directly for DNN-MG, we instead compare the functional values in shifted intervals [t*, t* + 1], where t* is chosen as the first peak in the lift functional after time t = 9 s, identified separately for the coarse mesh solution, DNN-MG and the fine mesh one on level L+1. The results reported in Fig. 8, Fig. 10, Fig. 13 and Table 2, Table 3 are those in [t*, t* + 1].

                    drag                              lift
    type (level)    min    max    mean   ampl.        min     max    mean    ampl.
    MG (L + 1)      0.…    0.…    0.…    0.…          −0.…    0.…    −0.…    0.…
    DNN-MG (L)      0.…    0.…    0.…    0.…          −0.…    0.…    0.…     0.…
    MG (L)          0.…    0.…    0.…    0.…          −0.…    0.…    −0.…    0.…

Table 2: Functional outputs for the drag and the lift functional on the coarse mesh, the fine mesh and the coarse mesh with ANN correction.

Figure 8: Drag (top) and lift (bottom) functionals for the channel with one obstacle for the coarse mesh solution, the fine mesh solution and DNN-MG.
Figure 9: Comparison of the wall-clock times of the different methods.
6.5 Runtime

The use of the DNN-MG method in applications is only meaningful when it not only improves the accuracy compared to a coarse mesh solution (as demonstrated in the last subsection) but also reduces the computation time compared to solving on a finer mesh. As shown in Figure 9, DNN-MG saves 55% of the wall clock time required for a fine mesh solution. It hence indeed improves the computational efficiency. Compared to solving on a coarse mesh, the runtime increases by 37%, but with an improved accuracy for DNN-MG. Fig. 9 also shows that the evaluation of the neural network requires only 2% of the DNN-MG runtime and that a substantial amount of time is spent on data conversion between the multigrid framework and the neural network and other auxiliary tasks.

Figure 10: Relative errors for the coarse solution with and without ANN correction compared to the reference solution on level L+1.

The DNN-MG runtime can hence most likely be further reduced with a more optimized implementation.

6.6 Generalization

For the practicality of the DNN-MG method it is important that a neural network trained on one flow performs well also on similar ones, i.e. that it generalizes to flows beyond those seen during training. The results in Sec. 6.4 already demonstrated that the network is able to do so under small perturbations of the geometry of the obstacle. In the following, we consider three more substantial changes to the flow: a channel without an obstacle, a channel with two obstacles, and a flow in an L-shaped domain. Unless stated otherwise, we reuse the network trained as discussed in Sec. 6.3 for the channel with a single obstacle.
Channel flow without obstacle
In our first test case we considered the stationary, one-dimensional flow that develops in the channel of Fig. 5 when the obstacle close to the inflow is removed. The stationary flow is challenging for DNN-MG in our setup since it is trained on oscillatory flow data only. Fig. 11 shows the DNN-MG solution that has been obtained. We observe a slight deformation of the velocity field (which should be strictly constant in the horizontal direction) but an otherwise stable and in particular stationary flow. The network is thus able to generalize to this entirely different flow regime. We conjecture that the network's ability to do so results from our network design that learns individual patches. Hence, even for the non-stationary flow around a cylinder that served as training data one has, at least locally, also relatively stationary flows in the training data.

Figure 11: Magnitude of the velocity field obtained with DNN-MG trained on the flow around one cylinder for the channel without obstacle (bottom) and reference solution (top).
Channel with two obstacles
The second generalization test was also an extension of the initial benchmark problem in Sec. 6.4 where we replaced the one obstacle with two. These are centered at (0.…, 0.18) and (0.…, 0.…). Since the incoming flow is already disturbed by the first obstacle before reaching the second one, the configuration differs substantially from the training scenario with only one obstacle. To represent the modified geometry of this test case we employ a finite element mesh that features 9741 degrees of freedom on level L and 37917 on the fine mesh of level L+1.

The magnitude of the velocity field obtained with DNN-MG as well as multigrid solutions on levels L and L+1 are shown in Figure 12. All fields look similar, although visually DNN-MG is closer to the reference than the coarse solution in the vicinity of the two obstacles and differs more in the far field on the right. To quantitatively assess the performance of DNN-MG we again used the drag and lift functionals, which we now evaluated for the second obstacle. Fig. 13 and Table 3 show that DNN-MG is able to substantially improve the estimated amplitude, despite being trained with only one obstacle. Moreover, the results were obtained using a network trained on a different mesh, which was possible since our neural network operates on local patches that are agnostic to the global mesh structure. The frequency in Fig. 13 again suffers from the frequency and phase shift that arises from the different mesh resolutions but that is unrelated to DNN-MG, cf. Sec. 6.4.

Figure 12: Magnitude of the velocity field of DNN-MG trained on the flow around one cylinder for the channel with two obstacles at time t = 10 (b.) as well as a multigrid solution with L+1 levels (a.) and L levels (c.).
Figure 13: Drag (left) and lift (right) functionals of DNN-MG trained on the flow around one cylinder for the channel with two obstacles as well as for classical multigrid solutions on L and L+1 levels.

L-shaped domain
As a final test case we considered an L-shaped domain as shown in Fig. 14. Using DNN-MG with the trained neural network from Sec. 6.4 fails in this case since the training data contains only flows in the positive x-direction (the predicted solution then has an unphysical drift in the x-direction also in the lower arm of the L-shape). To alleviate this problem, we retrained DNN-MG with the existing data but with the x- and y-axes swapped. Through this, no new and expensive simulations on level L+1 were necessary, and the training converged faster since the optimization could start from an already trained network. In Figure 14 the magnitude of the resulting velocity fields for DNN-MG and multigrid solutions on levels L and L+1 is depicted. As can be seen there, DNN-MG does not provide a substantial improvement over the coarse multigrid solution. In our opinion, this is not surprising given that the training data does not contain a flow past a sharp edge. Nonetheless, we would like to point out that the DNN-MG simulation remains stable and the neural network correction does not deteriorate the solution quality of the coarse simulation on level L.

                    drag                                  lift
    type (level)    min     max     mean    ampl.         min      max     mean     ampl.
    MG (L + 1)      0.335   0.375   0.355   0.…           −0.401   0.…     −0.001   0.…
    DNN-MG (L)      0.319   0.353   0.336   0.…           −0.303   0.399   0.048    0.…
    MG (L)          0.302   0.330   0.315   0.…           −0.302   0.…     −0.000   0.…

Table 3: Drag and the lift functionals for the generalization of DNN-MG trained on the flow around one cylinder to the channel with two obstacles. As reference we show the classical multigrid solutions on the coarse and the fine meshes.

Figure 14: Comparison between simulations with L+1 multigrid levels, DNN-MG, and L multigrid levels (from left to right).

7 Conclusion
We have presented a deep neural network multigrid solver (DNN-MG) that uses a recurrent neural network to improve the efficiency of a geometric multigrid solver, e.g. for the simulation of the Navier-Stokes equations. The neural network is integrated into the multigrid hierarchy and corrects on fine levels the prolongated solutions (instead of the smoothing operations used classically), affecting the overall time evolution through the right hand side of the problem. Central to DNN-MG is the compactness of its neural network. This is enabled through the network's locality, namely that it operates independently on small patches of the domain, and the coarse multigrid solution that provides a "guide" for the network's corrections. The compactness is vital for the computational efficiency of DNN-MG and also reduces the amount of training data and time that are required. The locality furthermore facilitates generalizability and even allows one to use a network trained on one mesh domain with an entirely different one. The use of recurrent neural networks with memory thereby enables DNN-MG to account for time dependencies in a flow and to obtain predicted corrections that are temporally coherent.

We analyzed the efficacy of DNN-MG for the classical Navier-Stokes benchmark problem of a laminar flow around an obstacle. Our results show that DNN-MG is able to substantially improve the accuracy of solutions compared to coarser multigrid ones and also provides substantial improvements for the drag and lift functionals. It thereby requires only approximately half the time of a multigrid solution at the same resolution and only a modest increase compared to a coarse one. DNN-MG is thereby able to generalize substantially beyond the flow used for training, and with the neural network trained on the one obstacle flow we were able to obtain high fidelity solutions also for channel flows with no or two obstacles.

The experiment with the flow in the L-shaped domain, where we do not obtain improvements over a coarse multigrid solution, shows the limits of the current DNN-MG implementation. We believe that a more complex neural network and more variability in the training data will be able to alleviate the limitation. We want to study this in future work. There, we also want to generalize our current implementation to 3D flows. Up to now, we made no effort to use accelerator hardware for the neural network. A hybrid approach where the finite element multigrid levels reside on the CPU and the neural network prediction is realized on a GPU would potentially provide further speedups. Although DNN-MG already considers the physics of the underlying problem by using the nonlinear residual on level L+1, it is worth exploring how additional model knowledge such as the divergence freedom of the predicted velocity fields can be integrated into DNN-MG. In future work, we also want to consider the application of DNN-MG to other partial differential equations, e.g. from nonlinear elasticity, cf. Sec. 5.3.

Acknowledgement
Acknowledgement

NM and TR acknowledge the financial support by the Federal Ministry of Education and Research of Germany (grant number 05M16NMA) as well as by the GRK 2297 MathCoRe, funded by the Deutsche Forschungsgemeinschaft (grant number 314838170).

References

[1] S. Bai, J. Z. Kolter, and V. Koltun. An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling. arXiv e-prints, arXiv:1803.01271, 2018.
[2] A. R. Barron. Approximation and Estimation Bounds for Artificial Neural Networks. Machine Learning, 14(1):115–133, 1994.
[3] C. Beck, S. Becker, P. Cheridito, A. Jentzen, and A. Neufeld. Deep splitting method for parabolic PDEs. 2019.
[4] R. Becker and M. Braack. A finite element pressure gradient stabilization for the Stokes equations based on local projections. Calcolo, 38(4):173–199, 2001.
[5] R. Becker, M. Braack, D. Meidner, T. Richter, and B. Vexler. The finite element toolkit Gascoigne.
[6] R. Becker, M. Braack, and T. Richter. Parallel multigrid on locally refined meshes. In R. Rannacher et al., editor, Reactive Flows, Diffusion and Transport. Springer, Berlin, 2006.
[7] J. Berg and K. Nyström. Data-driven discovery of PDEs in complex datasets. Journal of Computational Physics, 384:239–252, 2019.
[8] H. Bölcskei, P. Grohs, G. Kutyniok, and P. Petersen. Optimal Approximation with Sparsely Connected Deep Neural Networks. SIAM Journal on Mathematics of Data Science, 1(1):8–45, 2019.
[9] M. Braack, E. Burman, V. John, and G. Lube. Stabilized finite element methods for the generalized Oseen problem. Comput. Methods Appl. Mech. Engrg., 196:853–866, 2007.
[10] J. H. Bramble. Multigrid Methods. Longman Scientific & Technical, Harlow, 1993.
[11] R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. Duvenaud. Neural Ordinary Differential Equations. In NIPS 2018, 2018.
[12] Z. Chen, J. Zhang, M. Arjovsky, and L. Bottou. Symplectic Recurrent Neural Networks. 2019.
[13] K. Cho, B. van Merrienboer, C. Gulcehre, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Conference on Empirical Methods in Natural Language Processing (EMNLP 2014), 2014.
[14] G. Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems, 2(4):303–314, 1989.
[15] W. E and B. Yu. The deep Ritz method: A deep learning-based numerical algorithm for solving variational problems. Communications in Mathematics and Statistics, 6(1):1–12, 2018.
[16] N. B. Erichson, M. Muehlebach, and M. W. Mahoney. Physics-informed Autoencoders for Lyapunov-stable Fluid Flow Prediction. 2019.
[17] L. Failer and T. Richter. A Newton multigrid framework for optimal control of fluid-structure interactions. Optimization and Engineering, 82(2), accepted 2020. https://arxiv.org/abs/2003.14018.
[18] N. Geneva and N. Zabaras. Modeling the dynamics of PDE systems with physics-constrained deep auto-regressive networks. Journal of Computational Physics, 403:109056, 2020.
[19] C. Gin, B. Lusch, S. L. Brunton, and J. N. Kutz. Deep Learning Models for Global Coordinate Transformations that Linearize PDEs. arXiv e-prints, arXiv:1911.02710, 2019.
[20] P. Grohs, F. Hornung, A. Jentzen, and P. Zimmermann. Space-time error estimates for deep neural network approximations for differential equations. arXiv e-prints, arXiv:1908.03833, 2019.
[21] W. Hackbusch. Multi-Grid Methods and Applications. Springer, 1985.
[22] J. Heywood and R. Rannacher. Finite element approximation of the nonstationary Navier-Stokes problem. IV: Error analysis for second-order time discretization. SIAM J. Numer. Anal., 27(3):353–384, 1990.
[23] J. G. Heywood, R. Rannacher, and S. Turek. Artificial boundaries and flux and pressure conditions for the incompressible Navier-Stokes equations. Int. J. Numer. Meth. Fluids, 22:325–352, 1992.
[24] S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber. Gradient Flow in Recurrent Nets: The Difficulty of Learning Long-Term Dependencies, pages 237–243. Wiley-IEEE Press, 2001.
[25] S. Hochreiter and J. Schmidhuber. Long Short-Term Memory. Neural Computation, 9(8):1735–1780, 1997.
[26] K. Hornik. Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2):251–257, 1991.
[27] P. Jin, A. Zhu, G. E. Karniadakis, and Y. Tang. Symplectic networks: Intrinsic structure-preserving networks for identifying Hamiltonian systems. 2020.
[28] S. Karumuri, R. Tripathy, I. Bilionis, and J. Panchal. Simulator-free solution of high-dimensional stochastic elliptic partial differential equations using deep neural networks. Journal of Computational Physics, 404:109120, 2020.
[29] M. F. Kasim, D. Watson-Parris, L. Deaconu, S. Oliver, P. Hatfield, D. H. Froula, G. Gregori, M. Jarvis, S. Khatiwala, J. Korenaga, J. Topp-Mugglestone, E. Viezzer, and S. M. Vinko. Up to two billion times acceleration of scientific simulations with deep neural architecture search. 2020.
[30] G. Kutyniok, P. Petersen, M. Raslan, and R. Schneider. A Theoretical Analysis of Deep Neural Networks and Parametric PDEs. 2019.
[31] I. E. Lagaris, A. Likas, and D. I. Fotiadis. Artificial neural networks for solving ordinary and partial differential equations. Trans. Neur. Netw., 9(5):987–1000, 1998.
[32] Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
[33] M. Eichinger, A. Heinlein, and A. Klawonn. Stationary flow predictions using convolutional neural networks. In F. Vermolen and C. Vuik, editors, Numerical Mathematics and Advanced Applications, ENUMATH 2019. Springer, 2020.
[34] S. Mallat. Understanding deep convolutional networks. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 374(2065):20150203, 2016.
[35] N. Margenberg and T. Richter. Parallel time-stepping for fluid-structure interactions. Submitted, 2019. https://arxiv.org/abs/1907.01252.
[36] M. Mattheakis, P. Protopapas, D. Sondak, M. Di Giovanni, and E. Kaxiras. Physical Symmetries Embedded in Neural Networks. 2019.
[37] M. A. Nabian and H. Meidani. Physics-Informed Regularization of Deep Neural Networks. Technical report, 2018.
[38] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala. PyTorch: An imperative style, high-performance deep learning library. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32, pages 8024–8035. Curran Associates, Inc., 2019.
[39] T. Qin, K. Wu, and D. Xiu. Data driven governing equations approximation using deep neural networks. Journal of Computational Physics, 395:620–635, 2019.
[40] M. Raissi and G. E. Karniadakis. Hidden physics models: Machine learning of nonlinear partial differential equations. Journal of Computational Physics, 357:125–141, 2018.
[41] M. Raissi, A. Yazdani, and G. E. Karniadakis. Hidden Fluid Mechanics: A Navier-Stokes Informed Deep Learning Framework for Assimilating Flow Visualization Data. 2018.
[42] T. Richter. Fluid-structure Interactions: Models, Analysis and Finite Elements, volume 118 of Lecture Notes in Computational Science and Engineering. Springer, 2017.
[43] S. Rudy, A. Alla, S. L. Brunton, and J. N. Kutz. Data-Driven Identification of Parametric Partial Differential Equations. SIAM Journal on Applied Dynamical Systems, 18(2):643–660, 2019.
[44] Y. Saad. Iterative Methods for Sparse Linear Systems. PWS Publishing Company, 1996.
[45] M. Schäfer and S. Turek. Benchmark computations of laminar flow around a cylinder (with support by F. Durst, E. Krause and R. Rannacher). In E. H. Hirschel, editor, Flow Simulation with High-Performance Computers II: DFG Priority Research Program Results 1993–1995, number 52 in Notes Numer. Fluid Mech., pages 547–566. Vieweg, Wiesbaden, 1996.
[46] J. Schrittwieser, I. Antonoglou, T. Hubert, K. Simonyan, L. Sifre, S. Schmitt, A. Guez, E. Lockhart, D. Hassabis, T. Graepel, T. Lillicrap, and D. Silver. Mastering Atari, Go, chess and shogi by planning with a learned model. arXiv e-prints, arXiv:1911.08265, 2019.
[47] A. M. Tartakovsky, C. O. Marrero, P. Perdikaris, G. D. Tartakovsky, and D. Barajas-Solano. Learning Parameters and Constitutive Relationships with Physics Informed Deep Neural Networks. 2018.
[48] S. P. Vanka. Block-implicit multigrid solution of Navier-Stokes equations in primitive variables. Journal of Computational Physics, 65:138–158, 1985.
[49] A. Voelker, I. Kajić, and C. Eliasmith. Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks. In NIPS 2019, pages 15544–15553, 2019.
[50] Z. Y. Wan and T. P. Sapsis. Machine learning the kinematics of spherical particles in fluid flows. Journal of Fluid Mechanics, 857:R2, 2018.
[51] G. Weiss, Y. Goldberg, and E. Yahav. On the practical computational power of finite precision RNNs for language recognition. In ACL 2018, 2018.
[52] M. Werhahn, Y. Xie, M. Chu, and N. Thuerey. A multi-pass GAN for fluid flow super-resolution. Proc. ACM Comput. Graph. Interact. Tech., 2(2), 2019.
[53] S. Wiewel, M. Becher, and N. Thuerey. Latent-space Physics: Towards Learning the Temporal Evolution of Fluid Flow. 2018.
[54] Y. Xie, E. Franz, M. Chu, and N. Thuerey. tempoGAN: A temporally coherent, volumetric GAN for super-resolution fluid flow. ACM Trans. Graph., 37(4), 2018.
[55] Y. Yang and P. Perdikaris. Physics-informed deep generative models. 2018.
[56] D. Zhang, L. Guo, and G. E. Karniadakis. Learning in Modal Space: Solving Time-Dependent Stochastic PDEs Using Physics-Informed Neural Networks. 2019.
[57] Y. Zhu, N. Zabaras, P.-S. Koutsourelakis, and P. Perdikaris. An efficient fluid–solid coupling algorithm for single-phase flows.