A neural network multigrid solver for the Navier-Stokes equations
Nils Margenberg∗, Dirk Hartmann†, Christian Lessig‡, Thomas Richter§

August 27, 2020
Abstract

We present a deep neural network multigrid solver (DNN-MG) that we develop for the instationary Navier-Stokes equations. DNN-MG improves computational efficiency using a judicious combination of a geometric multigrid solver and a recurrent neural network with memory. The multigrid method is used in DNN-MG to solve on coarse levels while the neural network corrects interpolated solutions on fine ones, thus avoiding the increasingly expensive computations that would have to be performed there. A reduction in computation time is thereby achieved through DNN-MG's highly compact neural network. The compactness results from its design for local patches and the available coarse multigrid solutions that provide a "guide" for the corrections. A compact neural network with a small number of parameters also reduces training time and data. Furthermore, the network's locality facilitates generalizability and allows one to use DNN-MG trained on one mesh domain also on an entirely different one. We demonstrate the efficacy of DNN-MG for variations of the 2D laminar flow around an obstacle. For these, our method significantly improves the solutions as well as lift and drag functionals while requiring only about half the computation time of a full multigrid solution. We also show that DNN-MG trained for the configuration with one obstacle can be generalized to other time dependent problems that can be solved efficiently using a geometric multigrid method.

∗ Helmut Schmidt University, Holstenhofweg 85, 22043 Hamburg, Germany, [email protected]
† Siemens AG, Corporate Technology, Otto-Hahn-Ring 6, 81739 Munich, Germany, [email protected]
‡ University of Magdeburg, Institute for Simulation and Graphics, Universitätsplatz 2, 39104 Magdeburg, Germany, [email protected]
§ University of Magdeburg, Institute for Analysis and Numerics, Universitätsplatz 2, 39104 Magdeburg, Germany, [email protected]

1 Introduction
The last decade has seen enormous progress on the application of deep neural networks to tasks in machine translation, computer vision, and many other fields, e.g. [32, 46]. These advances lead to a growing (and renewed) interest to apply neural networks also to problems in computational science and engineering, including the simulation of partial differential equations. There, the existence of efficient numerical methods raises the question of how neural networks can be combined with, e.g., multigrid finite element methods, to leverage the benefits the different approaches provide.

Towards this end, we propose the deep neural network multigrid solver (DNN-MG) that we develop in this paper for the simulation of the instationary Navier-Stokes equations. DNN-MG combines a geometric multigrid solver with a recurrent neural network to replace the computations on one or multiple finest mesh levels with a network-based correction. For this, the Navier-Stokes equations are first solved on the first L levels of the multigrid mesh hierarchy using a classical multigrid solver. The obtained solution is then interpolated (or prolongated) to the next finer level L+1 where the network predicts a correction towards the unknown ground truth. The interpolation and prediction steps are repeated up to a finest level where the right hand side for the classical multigrid solve in the next time step is computed. Through this, the neural network-based corrections feed back to the coarse levels and affect the overall time evolution of the flow while the costs of the multigrid computations on fine levels are avoided.

The key to the efficiency of the DNN-MG solver is the use of a highly compact neural network with a small number of parameters. The compactness is achieved by the network's locality, i.e. that it operates independently on patches of the mesh domain, e.g. a single mesh element on level L or a collection of few adjacent elements. It is also enabled by the availability of the coarse multigrid solution that provides significant information about the flow behavior and thus reduces the state that needs to be carried in the network. A local, patch-based network also reduces the required amount of training data and time, since even a single flow already contains a multitude of patches and different behaviors. The latter also facilitates the network's ability to generalize to flows not seen during training. To ensure that the neural network can model complex flow behavior, it contains a Gated Recurrent Unit (GRU) with memory that incorporates a flow's past into a prediction, similar to how many classical time integration schemes use multiple past time steps for their update. The GRU's memory as well as the use of the network's prediction in the right hand side at time step n+1 ensure a stable and temporally consistent time integration.

We demonstrate the efficacy of the DNN-MG solver for the 2D laminar flow in a channel with obstacles of varying elliptical eccentricities. We first consider the classical flow with one obstacle and training performed with different eccentricities than used for testing.
The obtained solutions as well as the lift and drag functionals demonstrate that DNN-MG obtains considerably better accuracy than a coarse multigrid solution on level L while requiring only about half the computation time of a full solution on level L+1. To analyze DNN-MG's ability to generalize to flow regimes not seen during training, we also consider the channel flow with no or with two obstacles using the neural network trained for the one obstacle case. In both instances, DNN-MG is able to provide high fidelity solutions, and for the two obstacle flow it again reduces the error of the lift and drag functionals substantially.

The remainder of the paper is structured as follows. In the next section we review related work that uses neural networks for the simulation of partial differential equations. In Sec. 3 we then recall the solution of the Navier-Stokes equations using the geometric multigrid method before providing an introduction to recurrent neural networks in Sec. 4. In Sec. 5 we introduce the deep neural network multigrid solver, discuss its design and also sketch its generalization to other problems. Our numerical results are presented in Sec. 6.

2 Related work

Recently, there has been an increasing interest to employ (deep) neural networks for the simulation of physical systems. In the following, we will discuss existing approaches with an emphasis on those that consider partial differential equations and in particular the Navier-Stokes equations.

For learning, partial differential equations are typically considered as sequential time series so that neural network architectures developed for such data can also be employed for these. The most common architecture are recurrent neural networks (RNNs), e.g. with Long-Short-Term-Memory units (LSTMs) [25] to avoid the vanishing gradient problem. Because of the prevalence of sequential data, LSTMs led to many variations over the years. One variation are Gated Recurrent Units (GRUs) which, while not as powerful as LSTMs, often provide similar performance and are easier to train [13]. An alternative to LSTM-type architectures are temporal convolutional neural networks (TCNs) where the temporal dependence is modeled using a causal (i.e. one-sided) convolution in the time variable. Bai et al. [1] demonstrated that this can provide substantial improvements in prediction accuracy over recurrent neural networks with memory. A different approach to modeling temporal dependencies was developed by Voelker et al. [49] who represented the time domain in Legendre polynomials and learned basis function coefficients. Through this, they obtained a neural network model with continuous time. A related idea are the neural ordinary differential equation architectures by Chen et al. [11] where the depth of the network is continuous.

Approaches for a direct description of the time evolution of partial differential equations using neural networks go back more than 20 years, e.g. [31]. In more recent work, Raissi et al. [41] demonstrated that the velocity field and pressure of the Navier-Stokes equations can be learned to good approximation only from observing passively advected tracer particles. Nabian and Meidani [37], Yang and Perdikaris [55], and Raissi and Karniadakis [40] exploited that not only observations are available but also a known analytic model. These authors hence also use the deviation of the network prediction from the model as an additional penalty term in the training loss. Kasim et al.
[29] recently demonstrated neural network-based simulations for a broad range of applications, including fluid dynamics, by also optimizing the network architecture itself during the training process. Eichinger, Heinlein and Klawonn [33] use convolutional neural networks and techniques from image processing to learn flow patterns for the Navier-Stokes flow around objects of different shape.

Next to the above direct, neural network-based approaches to the simulation of partial differential equations, there have been different attempts to integrate these into existing numerical techniques. As in our work, the objective is to combine the benefits of neural networks and classical methods and obtain techniques that would be difficult or impossible with either approach alone. For elasticity, neural network-based representations for constitutive relations were learned in [47, 7, 43]. These were then used in classical simulations, e.g. based on finite elements. Wiewel et al. [53] presented a simulation of the Navier-Stokes equations where an LSTM-based deep neural network is used to predict the pressure correction. Xi et al. [54] and Werhahn et al. [52] use neural networks to upsample a coarse simulation and obtain physically plausible high frequency details. An approach based on the splitting method for high dimensional PDEs was developed by Beck et al. [3] where one obtains a set of smaller learning problems that are easier to treat than a single large one. Wan and Sapsis [50] describe the motion of finite size particles in fluid flows by combining analytic arguments with a neural network that models the discrepancy between the (highly) idealized analytic model and real world observations.

To ensure that neural network-based predictions respect physical invariants, such as energy conservation or divergence freedom, network architectures tailored towards physical simulations have been proposed in the literature. The authors of [36, 12, 27] developed neural networks that incorporate the symplectic structure of Hamiltonian mechanics and they demonstrate that this improves generalization and accuracy. In a related line of research, ideas from nonlinear analysis and stability theory were exploited. Erichson et al. [16], for example, improve the accuracy of the learning process by ensuring that the learned system has the same Lyapunov stability as the system of interest. Similarly, Gin et al. [19] train a neural network to provide a coordinate transformation that linearizes a nonlinear PDE, in the spirit of the Cole-Hopf transform for the Burgers equation, but using a neural network to find and represent the transform. For this, they use a Koopman operator representation. Related to this is the work of Zhang et al. [56] where a neural network is used to improve the dynamically orthogonal (DO) and bi-orthogonal (BO) methods for stochastic PDEs.

An alternative line of research aims to learn surrogate neural network representations of physical systems from the governing (stochastic) partial differential equations without recourse to simulations or measurements [18, 57, 28]. This avoids the training data generation that is often a bottleneck with the approaches above and allows one to quantify the uncertainty of the prediction. Karumuri et al. [28] thereby obtain their loss function through a variational formulation of the stochastic PDEs they are considering. Related to this is the Deep Ritz Method by E and Yu [15] where the ansatz of the Ritz method is used but with neural networks for the function approximation. Qin et al.
[39] show that residual networks can be seen as exact one-step time stepping methods and that these provide building blocks for multi-step methods.

The performance of (deep) neural networks has also been analyzed theoretically. Already in the 1980s and 1990s, the first results demonstrated the potential of neural networks as universal function approximators [14, 26] that do not suffer from the curse of dimensionality [2]. More recent results on the approximation properties of deep neural networks, e.g. [8], focused in particular on the required size of a network to achieve a certain approximation bound. Mallat [34] provided insight into the effectiveness of deep neural networks using tools from harmonic analysis and pointed out the importance of invariants for their understanding. For partial differential equations, currently only few results exist. Kutyniok et al. [30] studied the theoretical power of feed-forward deep neural networks for reduced order models and they show that the solution map, which yields the basis coefficients of the reduced basis for given values of the parameters, can be efficiently approximated using neural networks and in a manner essentially independent of the dimension. Grohs et al. [20] study the space-time error for PDEs when Euler steps are realized using neural networks. To our knowledge, for partial differential equations no theoretical results for recurrent neural networks with memory exist.
3 Solution of the Navier-Stokes equations with the geometric multigrid method

In the following, we provide an overview of the solution of the Navier-Stokes equations using the geometric multigrid method.

3.1 The Navier-Stokes equations

We consider a domain $\Omega \subset \mathbb{R}^d$ with $d \in \{2, 3\}$ and Lipschitz-continuous boundary $\Gamma$ and a bounded time interval $[0, T]$. The solution of the instationary Navier-Stokes equations then seeks the velocity $v : [0,T] \times \Omega \to \mathbb{R}^d$ and pressure $p : [0,T] \times \Omega \to \mathbb{R}$, such that

$$\partial_t v + (v \cdot \nabla) v - \frac{1}{\mathrm{Re}} \Delta v + \nabla p = f \quad\text{on } [0,T] \times \Omega, \qquad \nabla \cdot v = 0 \quad\text{on } [0,T] \times \Omega, \tag{1}$$

where $\mathrm{Re} > 0$ is the Reynolds number and $f$ an external force. The initial and boundary conditions are given by

$$v(0,\cdot) = v_0(\cdot) \ \text{on } \Omega, \qquad v = v^D \ \text{on } [0,T] \times \Gamma_D, \qquad \frac{1}{\mathrm{Re}} (\vec{n} \cdot \nabla) v - p \vec{n} = 0 \ \text{on } [0,T] \times \Gamma_N, \tag{2}$$

where $\vec{n}$ denotes the outward facing unit normal vector on the boundary $\partial\Omega$ of the domain. The boundary $\Gamma = \Gamma_D \cup \Gamma_N$ is split into subsets $\Gamma_D$ with Dirichlet boundary conditions and $\Gamma_N$ with Neumann type conditions.

For a discretization with finite elements, we briefly sketch the variational formulation of Eq. 1. By $L^2(\Omega)$ we denote the space of square integrable functions on the domain $\Omega \subset \mathbb{R}^d$ with scalar product $(\cdot,\cdot)$ and by $H^1(\Omega)$ those $L^2(\Omega)$ functions with weak first derivative in $L^2(\Omega)$. The function spaces for the velocity and pressure are then

$$V := v^D + H^1_0(\Omega; \Gamma_D)^d, \quad H^1_0(\Omega; \Gamma_D)^d := \big\{ v \in H^1(\Omega)^d : v = 0 \text{ on } \Gamma_D \big\}, \quad L := \Big\{ p \in L^2(\Omega),\ \text{and, if } \Gamma_N = \emptyset,\ \int_\Omega p \,\mathrm{d}x = 0 \Big\}, \tag{3}$$

where $v^D \in H^1(\Omega)^d$ is an extension of the Dirichlet data on $\Gamma_D$ into the domain. Given Dirichlet data on the complete boundary, the pressure is normalized to yield uniqueness. With these spaces, the variational formulation of Eq. 1 is given by

$$(\partial_t v, \phi) + (v \cdot \nabla v, \phi) + \frac{1}{\mathrm{Re}}(\nabla v, \nabla \phi) - (p, \nabla \cdot \phi) = (f, \phi) \quad \forall \phi \in H^1_0(\Omega;\Gamma_D)^d,$$
$$(\nabla \cdot v, \xi) = 0 \quad \forall \xi \in L, \qquad v(0,\cdot) = v_0(\cdot) \ \text{on } \Omega. \tag{4}$$

Let $\Omega_h$ be a quadrilateral or hexahedral finite element mesh of the domain $\Omega$ satisfying the usual requirements on the structural and form regularity such that the standard interpolation results hold, compare [42, Section 4.2]. For the finite element discretization of Eq. 4, we choose $W_h^{(r)}$ as the space of continuous functions which are polynomials of degree r on each element $T \in \Omega_h$. If the finite element mesh contains elements of generalized quadrilateral or hexahedral type, e.g. to resolve curved boundaries, we use isoparametric finite elements, see [42, Section 4.2.1] for details and the specific realization chosen for this work. Throughout this paper, we choose the discrete trial and test spaces for the discretization of Eq. 4 as $v_h, \phi_h \in V_h = [W_h^{(2)}]^d$ and $p_h, \xi_h \in L_h = W_h^{(2)}$. Since the resulting equal order finite element pair $V_h \times L_h$ does not fulfil the inf-sup condition, we add additional stabilization terms of local projection type [4]. The resulting semidiscrete variational problem then reads

$$(\partial_t v_h, \phi_h) + (v_h \cdot \nabla v_h, \phi_h) + \frac{1}{\mathrm{Re}}(\nabla v_h, \nabla \phi_h) - (p_h, \nabla \cdot \phi_h) = (f, \phi_h) \quad \forall \phi_h \in V_h,$$
$$(\nabla \cdot v_h, \xi_h) + \sum_{T \in \Omega_h} \alpha_T \big(\nabla(p_h - \pi_h p_h), \nabla(\xi_h - \pi_h \xi_h)\big) = 0 \quad \forall \xi_h \in L_h, \tag{5}$$

where $\pi_h : W_h^{(2)} \to W_h^{(1)}$ denotes the local projection operator into the space of linear polynomials. The local stabilization parameter $\alpha_T$ is chosen as $\alpha_T = \alpha \cdot \mathrm{Re} \cdot h_T^2$ with a fixed constant $\alpha$, where $h_T$ is the local mesh size of element T. We refer to [4, 9] for details.
Since the numerical experiments described in Section 6 consider moderate Reynolds numbers only, no stabilization of the convective term is required.

3.2 Time discretization

For temporal discretization, the time interval $[0, T]$ is split into discrete time steps of uniform size

$$0 = t_0 < t_1 < \cdots < t_N = T, \qquad k = t_n - t_{n-1}.$$

The generalization to a non-equidistant time discretization is straightforward and only omitted for ease of presentation. We define $v_n := v_h(t_n)$ and $p_n := p_h(t_n)$ for the fully discrete approximation of velocity and pressure at time $t_n$ and apply the second order Crank-Nicolson method to Eq. 5, resulting in the fully discrete problem

$$(\nabla \cdot v_n, \xi_h) + \sum_{T \in \Omega_h} \alpha_T \big(\nabla(p_n - \pi_h p_n), \nabla(\xi_h - \pi_h \xi_h)\big) = 0 \quad \forall \xi_h \in L_h,$$
$$\frac{1}{k}(v_n, \phi_h) + \frac{1}{2}(v_n \cdot \nabla v_n, \phi_h) + \frac{1}{2\mathrm{Re}}(\nabla v_n, \nabla \phi_h) - (p_n, \nabla \cdot \phi_h) = \frac{1}{k}(v_{n-1}, \phi_h) + \frac{1}{2}(f_n, \phi_h) + \frac{1}{2}(f_{n-1}, \phi_h) - \frac{1}{2}(v_{n-1} \cdot \nabla v_{n-1}, \phi_h) - \frac{1}{2\mathrm{Re}}(\nabla v_{n-1}, \nabla \phi_h) \quad \forall \phi_h \in V_h. \tag{6}$$

The right hand side only depends on the velocity $v_{n-1}$ at the last time step n−1 and will be denoted by $b_{n-1}$ in the following. The Crank-Nicolson time discretization in Eq. 6 corresponds to a nonlinear system of equations that can be solved, e.g., with the Newton-Krylov method. In the presented form it is sufficiently robust for smooth initial data $v_0$; we refer to [22] for small modifications with improved robustness and stability.
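To make the structure of Eq. 6 concrete, the following minimal sketch applies one Crank-Nicolson step to a generic linear system u'(t) = Au(t) + f(t): the known data from the previous step forms the right hand side, while the new state is implicit. The matrix A, the force f and all sizes are illustrative stand-ins, not the Navier-Stokes discretization above.

    import numpy as np

    def crank_nicolson_step(A, u_old, f_old, f_new, k):
        """One step of (u_n - u_{n-1})/k = (A u_n + f_n)/2 + (A u_{n-1} + f_{n-1})/2."""
        n = u_old.size
        I = np.eye(n)
        lhs = I / k - 0.5 * A                                    # implicit part at t_n
        rhs = (I / k + 0.5 * A) @ u_old + 0.5 * (f_new + f_old)  # known data, cf. b_{n-1}
        return np.linalg.solve(lhs, rhs)

    # Usage on a damped two-component toy system.
    A = np.array([[-1.0, 0.5], [0.0, -2.0]])
    u, k = np.array([1.0, 1.0]), 0.01
    for n in range(1, 11):
        f_old = np.array([np.sin((n - 1) * k), 0.0])
        f_new = np.array([np.sin(n * k), 0.0])
        u = crank_nicolson_step(A, u, f_old, f_new, k)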
3.3 Solution of the nonlinear problems

In each time step, Eq. 6 results in a large nonlinear system of algebraic equations which, by introducing the unknown $x = (v_n, p_n)$, we write as $A_h(x) = f_h$ with

$$[A_h(x)]_i := (\nabla\cdot v_n, \xi_h^i) + \sum_{T\in\Omega_h} \alpha_T \big(\nabla(p_n - \pi_h p_n), \nabla(\xi_h^i - \pi_h \xi_h^i)\big) + \frac{1}{k}(v_n, \phi_h^i) + \frac{1}{2}(v_n\cdot\nabla v_n, \phi_h^i) + \frac{1}{2\mathrm{Re}}(\nabla v_n, \nabla\phi_h^i) - (p_n, \nabla\cdot\phi_h^i),$$
$$[f_h]_i := \frac{1}{k}(v_{n-1}, \phi_h^i) + \frac{1}{2}(f_n, \phi_h^i) + \frac{1}{2}(f_{n-1}, \phi_h^i) - \frac{1}{2}(v_{n-1}\cdot\nabla v_{n-1}, \phi_h^i) - \frac{1}{2\mathrm{Re}}(\nabla v_{n-1}, \nabla\phi_h^i), \tag{7}$$

for all test functions $\phi_h^i$ and $\xi_h^i$. The nonlinear problem is solved by Newton's method based on the initial guess $x^{(0)} = (v_{n-1}, p_{n-1})$ and the iteration for $l = 1, 2, \dots$

$$A'(x^{(l-1)})\, w^{(l)} = f_h - A(x^{(l-1)}), \qquad x^{(l)} = x^{(l-1)} + w^{(l)}, \tag{8}$$

where we denote by $A'(x^{(l-1)})$ the Jacobian of $A$ at $x^{(l-1)}$, i.e. the matrix of first partial derivatives. For the Navier-Stokes equations this is easily computed analytically, cf. [42, Sec. 4.4.2].

Each Newton step requires the solution of a linear system of equations, where the system matrix $A'(x^{(l-1)})$ is sparse, but non-symmetric and not definite, due to the saddle point structure of the underlying Navier-Stokes equation. To approximate the solution with optimal robustness, we employ the generalized minimal residual method (GMRES) which has been introduced by Saad [44]. The convergence is thereby accelerated through a geometric multigrid solver that is used as a preconditioner, with a single sweep performed for every GMRES step (usually about 10 for each nonlinear Newton iteration).

3.4 The geometric multigrid method

Multigrid methods are a class of efficient algorithms for the solution of linear systems that can achieve the optimal complexity of O(n), where n is the number of degrees of freedom (in our case proportional to the number of mesh vertices). We employ the geometric multigrid method as a preconditioner for the GMRES iterations and hence only discuss this method in the following. It is summarized in Algo. 1.

Algorithm 1 The geometric multigrid method for the solution of the linear system $A_L x_L = b_L$ given on a finest level L. If all high frequency errors are smoothed by S at a constant rate, the method achieves optimal complexity O(n). The multigrid solver is initiated on the finest mesh level L and then used recursively:

     1: procedure multigrid(l, A_l, b_l, x_l)
     2:     s_l ← S(A_l, b_l, x_l)                         ▷ Smoothing
     3:     r_l ← b_l − A_l s_l                            ▷ Residual
     4:     r_{l−1} ← R(l, r_l)                            ▷ Restriction
     5:     if l − 1 = 0 then                              ▷ Coarse-grid solution
     6:         c_0 ← A_0^{−1} r_0                         ▷ Direct solution
     7:     else
     8:         c_{l−1} ← multigrid(l − 1, A_{l−1}, r_{l−1}, 0)
     9:     end if
    10:     x'_l ← s_l + P(l, c_{l−1})                     ▷ Prolongation
    11:     s'_l ← S(A_l, b_l, x'_l)                       ▷ Smoothing
    12:     return s'_l
    13: end procedure

The geometric multigrid method is based on a hierarchical approximation of a linear system on a sequence of finite element spaces $V_0 \subset V_1 \subset \cdots \subset V_L$ defined over a hierarchy of meshes $\Omega_0, \dots, \Omega_L = \Omega_h$. Instead of solving the linear system on the finest mesh level L, as one would do with traditional solvers, geometric multigrid methods smooth high frequency errors with a simple iterative method and treat the remaining errors on lower levels. Usually, smoothing is applied at the beginning (Algo. 1, line 2) and at the end (Algo. 1, line 11) of each iteration. In between, the remaining residual is computed (Algo. 1, line 3) and restricted to the next coarser levels L−1, L−2, ..., before the coarse corrections are prolongated back to level L (Algo. 1, line 10).

The mesh transfer from fine to coarse levels is accomplished with $L^2$-projections, known as restrictions, and from coarse to fine ones with interpolations, known as prolongations. The smoothing that precedes the restrictions is performed with a smoothing operator $S(A_l, b_l, x_l)$. It is usually a simple iteration that yields an approximation to the solution of the linear system, i.e. $S(A_l, b_l, x_l) \approx A_l^{-1} b_l$, and that aims to quickly reduce all high frequency components of the residual $b_l - A_l x_l$. If this happens on all levels l at a constant rate, then the method achieves the optimal complexity O(n). In our implementation we use a simple iteration of Vanka type [48], which allows for easy parallelization and gives very good performance with less than 5 pre- and post-smoothing steps [17]. The idea of the Vanka type smoother is to exactly solve small subproblems and to replace the inverse $A_l^{-1}$ by

$$V_l(A_l) = \sum_{T\in\Omega_l} R_T^T \big[ R_T A_l R_T^T \big]^{-1} R_T, \tag{9}$$

where $R_T$ is the restriction to those nodes that belong to one mesh element $T \in \Omega_l$ and $R_T^T$ its transpose. Considering piecewise quadratic finite elements, 9 nodes are attached to each element such that the local matrices to be inverted have the dimension $R_T A_l R_T^T \in \mathbb{R}^{27\times 27}$, since each node comprises the scalar pressure and two velocity components. For a more detailed discussion of the geometric multigrid method we refer to [21, 10]; the realization in Gascoigne 3D [5] used throughout this work is described in [6].
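The recursion of Algo. 1 can be illustrated in a few lines of code. The sketch below runs V-cycles for a 1D Poisson problem; the damped Jacobi smoother stands in for the Vanka smoother of Eq. 9, and the dense coarse solve for a proper direct solver, both simplifying assumptions for illustration.

    import numpy as np

    def poisson(n):
        """1D Poisson matrix with Dirichlet conditions, n interior points."""
        h = 1.0 / (n + 1)
        return (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

    def smooth(A, b, x, steps=3, omega=2.0 / 3.0):
        """Damped Jacobi iteration as smoothing operator S(A, b, x)."""
        d = np.diag(A)
        for _ in range(steps):
            x = x + omega * (b - A @ x) / d
        return x

    def restrict(r):
        """Full-weighting restriction R to the next coarser level."""
        return 0.25 * r[0:-2:2] + 0.5 * r[1:-1:2] + 0.25 * r[2::2]

    def prolongate(c):
        """Linear interpolation P to the next finer level."""
        e = np.zeros(2 * c.size + 1)
        e[1::2] = c
        e[0], e[-1] = 0.5 * c[0], 0.5 * c[-1]
        e[2:-1:2] = 0.5 * (c[:-1] + c[1:])
        return e

    def multigrid(l, A, b, x):
        """One V-cycle of Algo. 1; level l = 0 is the coarsest."""
        s = smooth(A[l], b, x)                    # line 2: pre-smoothing
        r = restrict(b - A[l] @ s)                # lines 3-4: residual and restriction
        if l - 1 == 0:
            c = np.linalg.solve(A[0], r)          # line 6: direct coarse solve
        else:
            c = multigrid(l - 1, A, r, np.zeros_like(r))
        x = s + prolongate(c)                     # line 10: prolongation
        return smooth(A[l], b, x)                 # line 11: post-smoothing

    # Three-level hierarchy with 7, 15 and 31 interior points.
    A = [poisson(n) for n in (7, 15, 31)]
    b, x = np.ones(31), np.zeros(31)
    for _ in range(10):
        x = multigrid(2, A, b, x)
    print(np.linalg.norm(b - A[2] @ x))           # residual norm after 10 V-cycles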
4 Recurrent neural networks

Different types of neural networks exist. The ones originally introduced under the name are today known as feed-forward neural networks and are used for tasks such as image recognition or data mining. Recurrent neural networks are an extension where an activation variable $a_n^k \in \mathbb{R}^{p_k}$ allows one to propagate information in a discrete time variable indexed by n, such as it occurs in time series (e.g. the stock market) or sequential data (e.g. text). An extension of this model uses network nodes with memory. These allow one to more effectively model long term dependencies and also to avoid the vanishing gradient problem that otherwise occurs with recursive neural networks [24]. In our work, we use Gated Recurrent Units (GRUs) [13] that are somewhat simpler than the better known Long-Short-Term-Memory (LSTM) units [51] but are therefore also easier to train. At layer k in the network, a GRU is given by

$$z_n^k = \sigma_z\big( W_k^{(z)} h_n^{k-1} + U_k^{(z)} h_{n-1}^{k} + b_k^{(z)} \big), \tag{10a}$$
$$r_n^k = \sigma_r\big( W_k^{(r)} h_n^{k-1} + U_k^{(r)} h_{n-1}^{k} + b_k^{(r)} \big), \tag{10b}$$
$$h_n^k = z_n^k \odot h_{n-1}^k + (1 - z_n^k) \odot \sigma_h\big( W_k^{(h)} h_n^{k-1} + U_k^{(h)} ( r_n^k \odot h_{n-1}^k ) + b_k^{(h)} \big), \tag{10c}$$

where $\odot$ denotes the element-wise product and the $W_k^{(\cdot)}$, $U_k^{(\cdot)}$ and $b_k^{(\cdot)}$ are weight matrices and bias vectors determined by training. The so called update gate vector $z_n^k \in \mathbb{R}^q$ in Eq. 10 controls to what extent $h_{n-1}^k$ is carried over to the output $h_n^k$ at the current time n and the reset gate vector $r_n^k \in \mathbb{R}^q$ controls the contribution of $h_{n-1}^k$ to the nonlinearity of the cell. Together, $z_n^k$ and $r_n^k$ thus control the memory of a GRU cell, i.e. to what extent the past hidden output $h_{n-1}^k$ contributes to the current one $h_n^k$.
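For concreteness, the following PyTorch sketch implements one GRU layer exactly as written in Eq. 10, with illustrative sizes. Note that Eq. 10c combines the states as z ⊙ h_{n−1} + (1 − z) ⊙ σ_h(...), which interchanges the roles of z and 1 − z compared to the built-in torch.nn.GRUCell; the behavior is otherwise equivalent.

    import torch
    import torch.nn as nn

    class GRUCell(nn.Module):
        """One GRU layer following Eq. 10; W acts on the layer input h^{k-1}_n,
        U on the previous hidden state h^k_{n-1}."""
        def __init__(self, in_dim, hidden_dim):
            super().__init__()
            self.W_z = nn.Linear(in_dim, hidden_dim)                   # W^(z) and b^(z)
            self.U_z = nn.Linear(hidden_dim, hidden_dim, bias=False)   # U^(z)
            self.W_r = nn.Linear(in_dim, hidden_dim)
            self.U_r = nn.Linear(hidden_dim, hidden_dim, bias=False)
            self.W_h = nn.Linear(in_dim, hidden_dim)
            self.U_h = nn.Linear(hidden_dim, hidden_dim, bias=False)

        def forward(self, x, h_prev):
            z = torch.sigmoid(self.W_z(x) + self.U_z(h_prev))          # Eq. 10a
            r = torch.sigmoid(self.W_r(x) + self.U_r(h_prev))          # Eq. 10b
            h_new = torch.tanh(self.W_h(x) + self.U_h(r * h_prev))     # cell nonlinearity
            return z * h_prev + (1 - z) * h_new                        # Eq. 10c

    # Unroll over a sequence of 10 time steps with batch size 1.
    cell, h = GRUCell(in_dim=51, hidden_dim=32), torch.zeros(1, 32)
    for x in torch.randn(10, 1, 51):
        h = cell(x, h)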
5 The deep neural network multigrid solver

In this section, we develop the deep neural network multigrid solver (DNN-MG). It uses a recurrent neural network to predict the correction of a coarse mesh solution that has been prolongated (or interpolated) onto a finer mesh (finer by one or multiple mesh levels). Through this, we can obtain solutions that are more accurate than those obtained with the coarse mesh only while being computationally more efficient than performing the full (multigrid) computations on the fine mesh level(s).

We will develop the DNN-MG solver in the context of the Navier-Stokes equations and return to the general formulation at the end of the section. To simplify the exposition, we will also assume that the neural network operates on only one finer level L+1. However, bridging more levels as well as an extension to other partial differential equations is possible within our framework.

5.1 One time step of DNN-MG

We begin by detailing one time step of the simulation of the Navier-Stokes equations using the DNN-MG solver. The computations are summarized in Algo. 2 and a conceptual depiction is provided in Fig. 1.

At the beginning of time step n, we first solve for the unknown velocity $v_n^L$ and pressure $p_n^L$ on level L defined by Eq. 6 using the classical Newton-Krylov simulation (Algo. 2, lines 2-5) as described in Sec. 3.3. To improve the solution, we interpolate (i.e. prolongate) $v_n^L$ to $\tilde{v}_n^{L+1}$ onto the next finer level L+1 where a richer function space $V_{L+1}$ is available (Algo. 2, line 6). The neural network part of DNN-MG then predicts the velocity update $d_n^{L+1}$, i.e. the difference $d_n^{L+1} = \bar{v}_n^{L+1} - \tilde{v}_n^{L+1}$ between $\tilde{v}_n^{L+1}$ and the (unknown) ground truth solution $\bar{v}_n^{L+1}$ on level L+1 (Algo. 2, line 7), based on the interpolated velocity $\tilde{v}_n^{L+1} \in V_{L+1}$ and information about the local mesh structure. The n-th time step ends by computing the right hand side $b_{n+1}^{L+1}$ of Eq. 6 on level L+1 using the corrected velocity $\tilde{v}_n^{L+1} + d_n^{L+1}$ (Algo. 2, line 8) and restricting it to level L (Algo. 2, line 9).

Algorithm 2 DNN-MG for the solution of the Navier-Stokes equations. Lines 6-9 provide the modifications of the DNN-MG method compared to a classical Newton-Krylov simulation with geometric multigrid preconditioning.

     1: for all time steps n do
     2:     while not converged do                          ▷ Newton's method for Eq. 6
     3:         δz_i ← multigrid(L, A_L^n, b_L^n, δz_i)     ▷ Algo. 1 with z = (p_n^L, v_n^L)
     4:         z_{i+1} ← z_i + ε δz_i
     5:     end while
     6:     ṽ_n^{L+1} ← P(v_n^L)                            ▷ Prolongation to level L+1
     7:     d_n^{L+1} ← N(ṽ_n^{L+1}, Ω_L, Ω_{L+1})          ▷ Prediction of velocity correction
     8:     b_{n+1}^{L+1} ← Rhs(ṽ_n^{L+1} + d_n^{L+1}, f_n, f_{n+1}) ▷ Set up rhs of Eq. 6 for next time step
     9:     b_{n+1}^L ← R(b_{n+1}^{L+1})                    ▷ Restriction of rhs to level L
    10: end for

Through this right hand side, the neural network-based correction becomes part of the multigrid computations in the next time step (Algo. 2, lines 2-5) and thus affects the overall time evolution of the flow. A key aspect of the DNN-MG solver is to build the right hand side $b_{n+1}^{L+1}$ on level L+1, since a restriction of the corrected velocity $\tilde{v}_n^{L+1} + d_n^{L+1}$ itself would (essentially) yield again the velocity $v_n^L$ that resulted from the multigrid computations only. The pressure is handled implicitly on level L in the multigrid solve and thus plays no role in the neural network correction of DNN-MG.
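The data flow of Algo. 2 can be summarized in the following schematic sketch. All helper functions are dummy stand-ins for the Newton-Krylov solve on level L, the finite element transfer operators, the trained network and the right hand side assembly of Eq. 6, so only the order of operations is meant to be accurate.

    import numpy as np

    # Dummy stand-ins so that the sketch runs.
    newton_multigrid_solve = lambda b_L: b_L.copy()          # returns v_n^L
    prolongate = lambda v: np.repeat(v, 2)                   # P: level L -> L+1
    restrict = lambda b: 0.5 * (b[0::2] + b[1::2])           # R: level L+1 -> L
    network = lambda v_tilde: np.zeros_like(v_tilde)         # predicted defect d_n^{L+1}
    assemble_rhs = lambda v, f0, f1: v + 0.5 * (f0 + f1)     # Rhs(...) of Eq. 6

    def dnn_mg_step(b_L, f_n, f_np1):
        v_L = newton_multigrid_solve(b_L)                # lines 2-5: solve on level L
        v_tilde = prolongate(v_L)                        # line 6: P(v_n^L)
        d = network(v_tilde)                             # line 7: N(...)
        b_next = assemble_rhs(v_tilde + d, f_n, f_np1)   # line 8: rhs on level L+1
        return v_tilde + d, restrict(b_next)             # line 9: rhs for the next step

    # One step on a toy vector: 4 coarse unknowns, 8 fine ones.
    v_corr, b_L = dnn_mg_step(np.ones(4), np.ones(8), np.ones(8))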
5.2 Design of the neural network

The neural network component is at the heart of the DNN-MG solver. Care in its design is required to ensure that it improves the efficiency of the runtime computations, is easy to train, and generalizes well to flow regimes not seen during training. These objectives are attained through a very compact neural network with a small number of parameters and a local, patch-based design. An overview of the network component is provided in Fig. 2.

A patch-based neural network
The efficiency of DNN-MG relies on the neural network evaluation (and related auxiliary computations, e.g. of its inputs) being less computationally expensive than using traditional solvers for Eq. 7 on the level L+1. When a neural network predicts the defect $d_n^{L+1}$ for the entire domain on level L+1, the size of the network's in- and outputs is of order $O(m_{L+1})$ where $m_{L+1}$ is the number of degrees of freedom on the level. The cost of evaluating the network is then at least of order $O(m_{L+1})$ and, for fully connected layers, even $O(m_{L+1}^2)$; it would hence be more expensive than a direct multigrid solution. The neural network would additionally have to have a large number of parameters to model the solution's behavior on the entire simulation domain, necessitating substantial training time and data, while the domain would be "baked" into the network, hampering generalization.

Figure 1: The neural network part of DNN-MG predicts a correction $d_n^{L+1}$ of the prolongated velocity $P(v_n^L)$ towards the unknown ground truth $\bar{v}_n^{L+1}$, which traditionally would be obtained using the multigrid solver for level L+1. The correction $d_n^{L+1}$ is incorporated into the time integration by using the improved solution $\tilde{v}_n^{L+1} + d_n^{L+1}$ to build a right hand side $b_{n+1}^{L+1}$ and restricting it to level L. The restriction $b_{n+1}^L$ is then used in the next time step in the multigrid solver. Temporal coherence of the correction (and hence the overall flow) is achieved through the memory of the GRU in the neural network.

To avoid these issues, DNN-MG uses a neural network that operates patch-wise over small neighborhoods of the mesh. In its most local variant, which we use for the numerical examples discussed in Section 6, each patch consists of the unknowns that belong to one single mesh element only, cf. Fig. 3; a more global approach is easily realized by considering larger patches consisting of multiple adjacent elements. Our network thus takes as input only information from one patch and predicts the velocity update $d_n^{L+1}$ only for the mesh nodes in the patch, with the prediction done independently for all patches to cover the entire simulation domain (duplicate predictions for adjacent patches are averaged). The locality is one key to the compactness of the neural network and hence also to its efficient evaluation. Furthermore, since a single neural network is used for all patches in the domain, training with a small number of flows exposes the network to a large number of different flow behaviors. This facilitates the network's ability to generalize to flows not seen during training and reduces training time and data.

Neural network inputs
The efficacy of the neural network prediction relies critically on suitable network inputs that provide rich information about the coarse flow behavior on level L as well as the local patch geometry (similar to how the geometry of mesh cells plays an important role in classical finite element simulations). In particular, with carefully selected inputs one requires fewer parameters in the neural network, which reduces the evaluation time at runtime and the amount of data and time required for training.

Figure 2: High level overview of data generation, training and application of DNN-MG. The training inputs are velocity fields on levels L and L+1, with the latter one serving as reference solution. From these as well as the meshes $\Omega_L$ and $\Omega_{L+1}$ on the respective levels we derive the input to the neural network that operates patch-wise (cf. Fig. 3). At application time, we use a classical multigrid solution on level L and apply the neural network part of DNN-MG to obtain a corrected solution $P(v_n^L) + d_n^{L+1}$ on level L+1. With it, a right hand side $b_{n+1}^{L+1}$ is computed that is then restricted to level L where it is used in the multigrid solver in the next time step.

Using the most localized patches directly supported by the available mesh structure, as we do in our numerical experiments, each patch consists of exactly one mesh element on level L with $N_L = 4$ nodes and thus with $N_{L+1} = 9$ nodes on level L+1. Our inputs to the neural network are then:

• the nonlinear residuals $r_n^{L+1} = f^{L+1} - A^{L+1}(P(x_n^L))$ of the prolongated coarse mesh solution for Eq. 7;
• the velocities $v_n^L$ on mesh level L;
• Péclet numbers $\mathrm{Pe}_L$ on mesh level L;
• the geometry of the cells, in particular the edge lengths $h_c$, the aspect ratios $a_c$ (of two neighboring and two opposite edges each), and the angles between the edges $\alpha_c$.

We hence have in total 51 inputs, reflecting the compactness of our neural network.

Figure 3: DNN-MG uses a local approach where the neural network operates patch-wise on small neighborhoods of the simulation domain and provides a velocity correction independently for each patch.

The most important input to the neural network is the nonlinear residual $r_n^{L+1} = f^{L+1} - A^{L+1}(P(x_n^L))$, cf. Eq. 7, that provides a local measure of the error (or defect) at every mesh node on level L+1 and thereby incorporates information about the partial differential equation. We also use the Péclet number $\mathrm{Pe}_L = q\, \tilde{v}_n^L / \nu$, where $\nu$ is the fluid's viscosity and q a characteristic length scale, which is the patch size in our case. The Péclet number describes the ratio between transport and diffusion at each node on level L and it hence measures which part of the Navier-Stokes equations dominates the local behavior of the flow.
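The following sketch assembles the 51 inputs for one patch. The per-component split used here (27 residual values from the 9 fine nodes with two velocity components and the pressure, 8 coarse velocity values, 4 Péclet numbers, and 4 + 4 + 4 geometry values) is an assumption that is consistent with the stated total of 51; all argument names are hypothetical.

    import numpy as np

    def patch_inputs(residual, v_coarse, peclet, edge_len, aspect, angles):
        assert residual.shape == (9, 3)    # r_n^{L+1} on the 9 fine nodes (v_x, v_y, p)
        assert v_coarse.shape == (4, 2)    # v_n^L on the 4 coarse patch nodes
        assert peclet.shape == (4,)        # Pe_L on the coarse patch nodes
        assert edge_len.shape == aspect.shape == angles.shape == (4,)  # cell geometry
        x = np.concatenate([residual.ravel(), v_coarse.ravel(),
                            peclet, edge_len, aspect, angles])
        assert x.size == 51                # matches N_in = 51
        return x

    # Example with dummy values for a single patch.
    x = patch_inputs(np.zeros((9, 3)), np.zeros((4, 2)),
                     np.zeros(4), np.ones(4), np.ones(4), np.zeros(4))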
Neural network architecture

To predict a defect $d_n^{L+1}$ that is consistent with the fluid flow, we use recurrent neural networks with memory. Through this, the past simulation states $\dots, v_{n-2}, v_{n-1}$ affect the predicted update $d_n$ at time step n, similar to how classical time stepping methods (often) use information from a sequence of last time steps to predict the next one.

The specific network architecture we use for DNN-MG is shown in Fig. 4. At its heart is a GRU unit with memory, cf. Sec. 4. It could in principle be replaced by another cell with memory, such as an LSTM unit or a TCN, but since the GRU unit performed well in our experiments and is, through its simpler design, easier to train, we used it in the present work. The fully connected layer that follows the GRU reduces its output size to the size of the update. The two convolutional blocks following the fully connected layer are designed to learn local dependencies, with one learning the interactions of the two velocity components at each vertex and the other the spatial dependence of each component across a patch. Each of these blocks consists of two layers, where the second one is just the transposed version of the first layer so that input and output sizes are equal. The final convolutional layer reduces the dimensionality of the data from the multiple filters that form the preceding convolutional layers to the output $d_n$ of the network.
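A PyTorch sketch of this architecture is given below. The GRU, the fully connected layer and the overall data flow follow the text (N_in = 51, N_GRU = 32, N_out = 18, no biases); the kernel sizes and the exact wiring of the convolutional blocks are assumptions, so their parameter counts do not exactly match the 36 and 18 parameters stated in Sec. 6.2.

    import torch
    import torch.nn as nn

    class DNNMGNet(nn.Module):
        def __init__(self, n_in=51, n_gru=32, n_out=18):
            super().__init__()
            self.gru = nn.GRU(n_in, n_gru, bias=False)     # 3(32*32 + 32*51) weights
            self.fc = nn.Linear(n_gru, n_out, bias=False)  # 32*18 weights
            # Each block expands to 4 filters and reduces back to 1 via the
            # transposed version of its first layer (kernel sizes assumed).
            def block():
                return nn.Sequential(
                    nn.Conv1d(1, 4, kernel_size=2, stride=2, bias=False),
                    nn.ConvTranspose1d(4, 1, kernel_size=2, stride=2, bias=False))
            self.block_a, self.block_b = block(), block()
            self.reduce = nn.Conv1d(2, 1, kernel_size=1, bias=False)  # merge branches

        def forward(self, x, h=None):
            y, h = self.gru(x, h)                 # x: (time steps, batch, n_in)
            y = self.fc(y)                        # (time steps, batch, n_out)
            s, b, n = y.shape
            y = y.reshape(s * b, 1, n)
            z = torch.cat([self.block_a(y), self.block_b(y)], dim=1)
            d = self.reduce(z)                    # (s*b, 1, n_out)
            return d.reshape(s, b, n), h

    net = DNNMGNet()
    d, h = net(torch.randn(10, 7, 51))            # 10 time steps, 7 patches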
Training of DNN-MG

The training of DNN-MG is based on simulations of the Navier-Stokes equations for which a multigrid representation of the velocity $v_n^l$ with the two levels L and L+1 is available. The velocity $v_n^{L+1}$ thereby serves as ground truth $\bar{v}_n^{L+1}$. The training then finds network weights (in the GRU unit, the dense and the convolutional layers) such that the norm $\| (\tilde{v}_n^{L+1} + d_n^{L+1}) - \bar{v}_n^{L+1} \|$ of the difference between the predicted velocity $\tilde{v}_n^{L+1} + d_n^{L+1}$ (i.e. the velocity after correction, Algo. 2, line 8) and the ground truth $\bar{v}_n^{L+1}$ is minimized over the course of a simulation.

Figure 4: The neural network architecture used: a GRU with input size N_in and state size N_GRU, a fully connected N_GRU × N_out layer, two convolutional blocks, and a final convolution reducing the number of filters, yielding the output $d_n \in \mathbb{R}^{N_{out}}$.

5.3 Generalization to other partial differential equations

After the detailed description of DNN-MG for the solution of the Navier-Stokes equations, we will in the following show how it can be applied to a general PDE in variational form: find $u \in V$ such that $A(u)(\Phi) = F(\Phi)$ for all $\Phi \in V$, where $A(\cdot)(\cdot)$ is a semi-linear form (linear in the second argument) and V is a Banach space chosen for the problem. Given a suitable Galerkin formulation in the discrete subspace $V_h \subset V$, the abstract Newton iteration for the solution of the problem $A_h(x_h) = f_h$ can be formulated exactly as for the Navier-Stokes equations, compare Eq. 7 and Eq. 8. The Newton-Krylov approach based on the preconditioned GMRES iteration and a geometric multigrid solver as preconditioner is highly robust and it can be applied to a variety of PDE models (as is done in Gascoigne [5]). The DNN-MG solver in Algorithm 2 can hence be applied to various PDEs.

More generally, DNN-MG does not strictly require a geometric multigrid solver but only an algorithm that can solve on coarse levels and an interpolation onto one or multiple finer levels. However, the geometric multigrid framework gives us immediate access to a nested hierarchy of problems.

5.4 Algorithmic complexity of DNN-MG

The Newton-Krylov geometric multigrid method can already achieve optimal complexity, which raises the question what advantage DNN-MG has to offer. Before addressing the question from a practical point of view in the next section, we sketch in the following an answer from a theoretical point of view.

The Newton-Krylov geometric multigrid framework achieves linear complexity O(N) in the number N of degrees of freedom, which roughly quadruples with each global mesh refinement, i.e. $N_{L+1} \approx 4 N_L$. The constant hidden in O(N), however, can be immense, since on average 5 Newton steps are required in each time step and within each Newton step one has to compute on average 10 GMRES steps with one sweep of the geometric multigrid solver as preconditioning. Furthermore, in the geometric multigrid typically 5 pre-smoothing and 5 post-smoothing steps have to be determined. Hence, a total of about 500 Vanka smoothing steps must be performed on each of the mesh levels L, L−1, ..., 0. One Vanka step thereby requires the inversion of one 27 × 27 matrix on each element of the mesh, cf. Eq. 9, with approximately $27^3 \approx 20\,000$ operations per inversion. Since the complexity of all mesh levels sums up to about $N_L + N_{L-1} + \dots + N_0 \approx N_L (1 + 4^{-1} + \dots + 4^{-L}) \approx \tfrac{4}{3} N_L$, we can estimate the effort for the complete solution process on level L as $5 \cdot 10 \cdot (5+5) \cdot 20\,000 \cdot \tfrac{4}{3} \approx 1.3 \cdot 10^7$ operations per element of level L. We thereby only counted the effort for smoothing and neglected all further matrix, vector and scalar products. Solving the problem on an additional mesh level L+1 would require about four times this effort, i.e. $\approx 5 \cdot 10^7$ operations per element.

On the other hand, the DNN-MG method only requires the prolongation of the solution to the next finer mesh and there, as main effort, the evaluation of the neural network on each patch. If we again consider patches of minimal size (one patch is one element of mesh level L), about $O(N_{L+1}/4) = O(N_L)$ patches must be processed. The effort for the evaluation of the network can be estimated by its number of trainable parameters, and the DNN-MG approach asks for only one such evaluation of the network in each time step. The specific network model used for the numerical test cases comprises 8634 trainable parameters. Hence, the effort for correcting the level-L Newton-Krylov solution by the neural network on level L+1 can be estimated by $8634\, N_L$, which is negligible in comparison to the effort of solving on level L+1.
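The estimates above can be reproduced with a few lines of arithmetic. The iteration counts and the parameter count are the figures quoted in the text; the per-element normalization is a simplifying assumption.

    newton, gmres, smoothing = 5, 10, 5 + 5   # average iteration counts from the text
    vanka_ops = 27**3                          # ~20 000 ops per 27x27 block inverse
    hierarchy = 4.0 / 3.0                      # 1 + 1/4 + 1/16 + ... over the levels

    mg_L = newton * gmres * smoothing * vanka_ops * hierarchy   # ~1.3e7 per element
    mg_L1 = 4 * mg_L                                            # one more refinement
    dnn = 8634                                                  # one network evaluation

    print(f"multigrid on L:    {mg_L:.1e} ops per element")
    print(f"multigrid on L+1:  {mg_L1:.1e} ops per element")
    print(f"DNN-MG correction: {dnn:.1e} ops per patch (negligible)")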
6 Numerical results

In this section, we present numerical results obtained with the deep neural network multigrid solver (DNN-MG) for the Navier-Stokes equations. We consider a classical benchmark problem describing the laminar flow around a cylinder, cf. [45]. In contrast to the original benchmark configuration, we parametrize the geometry by using an ellipsoid with varying aspect ratio to study the robustness of our technique. We also report on the runtime performance of DNN-MG and study its ability to generalize to flow regimes that differ from the training data.

6.1 Flow around a cylinder

We consider a variation of the well-established "laminar flow around a cylinder" introduced by Schäfer and Turek [45], in particular the unsteady 2D-2 test case where the laminar flow around a circular obstacle at Reynolds number Re = 100 leads to a stable periodic flow pattern. The geometry of the flow domain is shown in Fig. 5. The flow is driven by a Dirichlet profile $v = v^D$ at the left boundary $\Gamma_{\mathrm{in}}$ given by

$$v^D(x, y, t) = \frac{6\, v_{\mathrm{avg}}\, y (H - y)}{H^2}\, \omega(t) \quad \text{on } \Gamma_{\mathrm{in}} := \{0\} \times [0, H], \qquad \omega(t) = \begin{cases} 1 - \cos(2\pi t), & t \le \tfrac{1}{4}, \\ 1, & t > \tfrac{1}{4}, \end{cases} \tag{11}$$

where H = 0.41 is the height of the channel and the function ω(t) regularizes the startup phase of the flow. On the wall boundary $\Gamma_{\mathrm{wall}}$ and on the obstacle we prescribe no-slip boundary conditions v = 0 and on the outflow boundary $\Gamma_{\mathrm{out}}$ we use a do-nothing outflow condition [23]. Together with the different obstacles used for training and testing, the average flow rate $v_{\mathrm{avg}}$ results in the Reynolds numbers $\mathrm{Re}_{\mathrm{train}} = 166$ and $\mathrm{Re}_{\mathrm{test}} = 200$, respectively.

Figure 5: Geometry of the flow domain with inflow boundary $\Gamma_{\mathrm{in}}$, outflow boundary $\Gamma_{\mathrm{out}}$ and walls $\Gamma_{\mathrm{wall}}$; the inflow profile is prescribed on $\Gamma_{\mathrm{in}}$, do-nothing boundary conditions are used at the outflow boundary $\Gamma_{\mathrm{out}}$ and no-slip conditions on the walls $\Gamma_{\mathrm{wall}}$. The center of the obstacle is at (0.2, 0.2).

    mesh        unknowns   multigrid levels
    coarse      …          3 (L)
    fine        8088       4 (L + 1)
    reference   31536      5 (L + 2)

Table 1: Spatial meshes with increasing number of multigrid levels and unknowns.

In all numerical examples the temporal step size is k = 0.01. In space, we consider the sequence of meshes in Table 1. Finer levels result from a uniform refinement of the "coarse" mesh conforming to the obstacle.
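A small sketch of the inflow condition of Eq. 11 reads as follows. Since the ramp length and the normalization of the average velocity did not survive in all sources, the constants t0 and the factor 6 should be read as assumptions consistent with the reconstruction above.

    import numpy as np

    def inflow(y, t, v_avg=1.0, H=0.41, t0=0.25):
        """Parabolic inflow profile with a cosine startup ramp, cf. Eq. 11."""
        omega = 1.0 - np.cos(2.0 * np.pi * t) if t <= t0 else 1.0
        return 6.0 * v_avg * y * (H - y) / H**2 * omega

    v_in = inflow(0.2, 0.5)   # evaluate mid-channel after the startup phase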
6.2 Neural network parameters

As discussed in Sec. 5, DNN-MG uses a local neural network that operates over patches of the mesh domain. For our numerical experiments, we defined a patch to be the refinement of a coarse mesh cell on level L, i.e. we used the most localized patches that are naturally supported by the mesh structure. This results in a simple network (cf. Fig. 4) with only 8634 trainable parameters. With none of the layers having a bias and $N_{\mathrm{in}} = 51$ (the input length), $N_{\mathrm{GRU}} = 32$ (the size of the GRU state), and $N_{\mathrm{out}} = 18$ (the output length), there are $3(32 \cdot 32 + 32 \cdot 51) = 7968$ parameters in the GRU and $32 \cdot 18 = 576$ in the fully connected layer. Each of the convolutional blocks that learn vertex interactions has, furthermore, 36 parameters and the last convolutional layer has 18 parameters.

Each convolutional block learns 4 filters: in the first layer the number of filters is expanded to 4 and in the following transposed layer it is reduced to 1. In the last layer the outputs of the convolutional blocks are concatenated along the filter dimension, which then has to be reduced to a length of 1. This is achieved by a convolution with length 1 (summing along the filter dimension and weighting the output), resulting in 18 trainable parameters.

The network was implemented using PyTorch [38].
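The stated layer sizes can be checked to reproduce the 8634 trainable parameters:

    n_in, n_gru, n_out = 51, 32, 18
    gru = 3 * (n_gru * n_gru + n_gru * n_in)   # 7968 parameters in the GRU
    fc = n_gru * n_out                          # 576 in the fully connected layer
    conv_blocks = 2 * 36                        # two convolutional blocks
    reduce_conv = 18                            # final reducing convolution
    assert gru + fc + conv_blocks + reduce_conv == 8634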
6.3 Training

As training data for DNN-MG we used three (L+1)-level multigrid simulations of the flow around a cylinder, each with a slightly different elliptical obstacle. Each simulation provides $N_c \times N_T$ training items, where $N_c$ is the number of patches and $N_T$ is the length of the time series. By the periodicity of the flow it also sufficed to consider a small number of full periods. We hence used $t \in [1, 6]$, resulting in $N_T = 500$.

As loss function $\mathcal{L}$ we employed a simple $l^2$-loss

$$\mathcal{L} = \sum_{n=1}^{N_T} \frac{1}{N_c} \sum_{c=1}^{N_c} l\big( \tilde{v}_{n,c}^{L+1} + d_{n,c}^{L+1},\, \bar{v}_{n,c}^{L+1} \big) = \sum_{n=1}^{N_T} \frac{1}{N_c} \sum_{c=1}^{N_c} \big\| \big( \tilde{v}_{n,c}^{L+1} + d_{n,c}^{L+1} \big) - \bar{v}_{n,c}^{L+1} \big\|^2.$$

Tikhonov regularization of the network weights was used as well, with a small scaling factor α. Training was performed using the ADAM optimizer with a limit of 1000 epochs. At the end, the model with the lowest validation loss was saved. This resulted in a training time of 4h 23m on an Intel Xeon Gold 6254.
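A PyTorch version of the loss above could look as follows; the tensor layout (time step × patch × values) is an assumption.

    import torch

    def dnn_mg_loss(v_tilde, d, v_bar):
        """Shapes: (N_T, N_c, values); v_tilde + d is the corrected velocity,
        v_bar the level-(L+1) reference."""
        err = (v_tilde + d) - v_bar
        return err.pow(2).sum(dim=-1).mean(dim=1).sum()  # sum_n (1/N_c) sum_c ||.||^2

    # Example with dummy tensors: 500 time steps, 100 patches, 18 values each.
    loss = dnn_mg_loss(torch.zeros(500, 100, 18), torch.zeros(500, 100, 18),
                       torch.zeros(500, 100, 18))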
6.4 Results

For testing we used the flow around an elliptic obstacle with an increased height, resulting in $\mathrm{Re}_{\mathrm{test}} = 200$. Fig. 6 shows the magnitude of the velocity field at time t = 9 for the flow obtained using DNN-MG as well as classical multigrid solutions on levels L and L+1. It can be observed that DNN-MG is indeed able to predict high frequency fluctuations that are not visible in the coarse mesh solution. In particular in the vicinity of the obstacle the quality of the solution is strongly enhanced, with distinct features in the wake being apparent in the DNN-MG simulation.

Figure 6: Magnitude of the velocity field for the channel flow with one obstacle at time t = 9 for a.) a multigrid solution with L+1 levels, b.) DNN-MG, c.) a multigrid solution with L levels.

Figure 7: The lift functional for 3, 4 and 5 multigrid levels on the sequence of spatially refined finite element meshes from Table 1 reveals a frequency shift which depends on the spatial finite element discretization. We show the startup phase [0, 3] and the interval [9, 10].

To further quantify the accuracy, we consider the drag and lift functionals

$$J_d(v, p) = \int_\Gamma \Big( \frac{1}{\mathrm{Re}} \nabla v - p I \Big) \vec{n} \cdot \vec{e}_1 \,\mathrm{d}s, \qquad J_l(v, p) = \int_\Gamma \Big( \frac{1}{\mathrm{Re}} \nabla v - p I \Big) \vec{n} \cdot \vec{e}_2 \,\mathrm{d}s,$$

respectively, where $\vec{e}_1 = (1, 0)^T$ and $\vec{e}_2 = (0, 1)^T$. The results in Fig. 8 and Table 2 demonstrate that DNN-MG is also able to substantially correct both the drag and lift functionals. Maximum and minimum values but also the frequency of oscillation of the augmented simulation are by far closer to the fine mesh one that serves as reference. These results substantiate that DNN-MG is able to considerably improve the solution quality.

Fig. 10, finally, shows the relative velocity and pressure errors

$$E_n^{\mathrm{rel}}(v) = \frac{\big\| (\tilde{v}_n^{L+1} + d_n^{L+1}) - v_n^{L+1} \big\|}{\big\| v_n^{L+1} \big\|}, \qquad E_n^{\mathrm{rel}}(p) = \frac{\big\| \tilde{p}_n^{L+1} - p_n^{L+1} \big\|}{\big\| p_n^{L+1} \big\|},$$

where $\|\cdot\|$ denotes the Euclidean norm of the discrete solution. DNN-MG is able to robustly reduce the velocity error from 20%-27% to about 10%. While the relative pressure error is also reduced by a factor of 2 on average, some oscillatory peaks remain. These stem from the shift in periodicity as discussed in the following remark.
Remark.
Fig. 7 shows the lift functional for classical multigrid simulations with three different resolutions, both in the startup phase t ∈ [0, 3] and for t ∈ [9, 10] where the flow profile is fully developed. The results show that on coarser meshes it takes longer for the periodic flow profile to develop (Fig. 7, left) and that the period of oscillations is getting shorter with increasing mesh resolution (Fig. 7, right). This shift in frequency and phase as a function of mesh resolution prevents a direct comparison of results obtained on different levels in a multigrid-type algorithm such as DNN-MG. In [35], a shifted variant of the Crank-Nicolson scheme is discussed to remedy the problem. Since this cannot be used directly for DNN-MG, we instead compare the functional values in shifted intervals [t*, t* + 1], where t* is chosen as the first peak in the lift functional after time t = 9 s, identified separately for the coarse mesh solution, DNN-MG and the fine mesh one on level L+1. The results reported in Fig. 8, Fig. 10, Fig. 13 and Table 2, Table 3 are those in [t*, t* + 1].

                    drag                              lift
    type (level)    min    max    mean   ampl.        min     max    mean    ampl.
    MG (L + 1)      0.…    0.…    0.…    0.…          −0.…    0.…    −0.…    0.…
    DNN-MG (L)      0.…    0.…    0.…    0.…          −0.…    0.…    0.…     0.…
    MG (L)          0.…    0.…    0.…    0.…          −0.…    0.…    −0.…    0.…

Table 2: Functional outputs for the drag and the lift functional on the coarse mesh, the fine mesh and the coarse mesh with ANN correction.

Figure 8: Drag (top) and lift (bottom) functionals for the channel with one obstacle for the coarse mesh solution, the fine mesh solution and DNN-MG.
Figure 9: Comparison of the wall-clock times of the different methods.
6.5 Runtime

The use of the DNN-MG method in applications is only meaningful when it not only improves the accuracy compared to a coarse mesh solution (as demonstrated in the last subsection) but also reduces the computation time compared to solving on a finer mesh. As shown in Figure 9, DNN-MG saves 55% of the wall clock time required for a fine mesh solution. It hence indeed improves the computational efficiency. Compared to solving on a coarse mesh, the runtime increases by 37%, but with an improved accuracy for DNN-MG. Fig. 9 also shows that the evaluation of the neural network requires only 2% of the DNN-MG runtime and that a substantial amount of time is spent on data conversion between the multigrid framework and the neural network and other auxiliary tasks.

Figure 10: Relative errors for the coarse solution with and without ANN correction compared to the reference solution on level L+1.

The DNN-MG runtime can hence most likely be further reduced with a more optimized implementation.

6.6 Generalization

For the practicality of the DNN-MG method it is important that a neural network trained on one flow performs well also on similar ones, i.e. that it generalizes to flows beyond those seen during training. The results in Sec. 6.4 already demonstrated that the network is able to do so under small perturbations of the geometry of the obstacle. In the following, we consider three more substantial changes to the flow: a channel without an obstacle, a channel with two obstacles, and a flow in an L-shaped domain. Unless stated otherwise, we reuse the network trained as discussed in Sec. 6.3 for the channel with a single obstacle.
Channel flow without obstacle
In our first test case we considered the stationary, one-dimensional flow that develops in the channel of Fig. 5 when the obstacle close to the inflow is removed. The stationary flow is challenging for DNN-MG in our setup since it is trained on oscillatory flow data only. Fig. 11 shows the DNN-MG solution that has been obtained. We observe a slight deformation of the velocity field (which should be strictly constant in the horizontal direction) but an otherwise stable and in particular stationary flow. The network is thus able to generalize to this entirely different flow regime. We conjecture that the network's ability to do so results from our network design that learns individual patches. Hence, even for the non-stationary flow around a cylinder that served as training data one has, at least locally, also relatively stationary flows in the training data.

Figure 11: Magnitude of the velocity field obtained with DNN-MG trained on the flow around one cylinder for the channel without obstacle (bottom) and reference solution (top).
Channel with two obstacles
The second generalization test was also an extension of the initial benchmark problem in Sec. 6.4 where we replaced the one obstacle with two. These are centered at (0.…, 0.18) and (0.…, 0.…). Since the incoming flow is already disturbed by the first obstacle before reaching the second one, the configuration differs substantially from the training scenario with only one obstacle. To represent the modified geometry of this test case we employ a finite element mesh that features 9741 degrees of freedom on level L and 37917 on the fine mesh of level L+1.

The magnitude of the velocity field obtained with DNN-MG as well as multigrid solutions on levels L and L+1 are shown in Figure 12. All fields look similar, although visually DNN-MG is closer to the reference than the coarse solution in the vicinity of the two obstacles and differs more in the far field on the right. To quantitatively assess the performance of DNN-MG we again used the drag and lift functionals, which we now evaluated for the second obstacle. Fig. 13 and Table 3 show that DNN-MG is able to substantially improve the estimated amplitude, despite being trained with only one obstacle. Moreover, the results were obtained using a network trained on a different mesh, which was possible since our neural network operates on local patches that are agnostic to the global mesh structure. The frequency in Fig. 13 again suffers from the frequency and phase shift that arises from the different mesh resolutions but that is unrelated to DNN-MG, cf. Sec. 6.4.

Figure 12: Magnitude of the velocity field of DNN-MG trained on the flow around one cylinder for the channel with two obstacles at time t = 10 (b.) as well as a multigrid solution with L+1 levels (a.) and L levels (c.).
Figure 13: Drag (left) and lift (right) functionals of DNN-MG trained on the flow around one cylinder for the channel with two obstacles as well as for classical multigrid solutions on L and L+1 levels.

L-shaped domain
As a final test case we considered an L-shaped domain as shown in Fig. 14. Using DNN-MG with the trained neural network from Sec. 6.4 fails in this case since the training data contains only flows in the positive x-direction (the predicted solution then has an unphysical drift in the x-direction also in the lower arm of the L-shape). To alleviate this problem, we retrained DNN-MG with the existing data but with the x- and y-axes swapped. Through this, no new and expensive simulations on level L+1 were necessary, and the training converged faster since the optimization could start from an already trained network. In Figure 14 the magnitude of the resulting velocity fields for DNN-MG and multigrid solutions on levels L and L+1 is depicted. As can be seen there, DNN-MG does not provide a substantial improvement over the coarse multigrid solution. In our opinion, this is not surprising given that the training data does not contain a flow past a sharp edge. Nonetheless, we would like to point out that the DNN-MG simulation remains stable and the neural network correction does not deteriorate the solution quality of the coarse simulation on level L.

                    drag                                  lift
    type (level)    min     max     mean    ampl.         min      max     mean     ampl.
    MG (L + 1)      0.335   0.375   0.355   0.…           −0.401   0.…     −0.001   0.…
    DNN-MG (L)      0.319   0.353   0.336   0.…           −0.303   0.399   0.048    0.…
    MG (L)          0.302   0.330   0.315   0.…           −0.302   0.…     −0.000   0.…

Table 3: Drag and the lift functionals for the generalization of DNN-MG trained on the flow around one cylinder to the channel with two obstacles. As reference we show the classical multigrid solutions on the coarse and the fine meshes.

Figure 14: Comparison between simulations with L+1 multigrid levels, DNN-MG, and L multigrid levels (from left to right).

7 Conclusion
We have presented a deep neural network multigrid solver (DNN-MG) that uses a recurrent neural network to improve the efficiency of a geometric multigrid solver, e.g. for the simulation of the Navier-Stokes equations. The neural network is integrated into the multigrid hierarchy and corrects on fine levels the prolongated solutions (instead of the smoothing operations used classically), affecting the overall time evolution through the right hand side of the problem. Central to DNN-MG is the compactness of its neural network. This is enabled through the network's locality, namely that it operates independently on small patches of the domain, and the coarse multigrid solution that provides a "guide" for the network's corrections. The compactness is vital for the computational efficiency of DNN-MG and also reduces the amount of training data and time that are required. The locality furthermore facilitates generalizability and even allows one to use a network trained on one mesh domain with an entirely different one. The use of recurrent neural networks with memory thereby enables DNN-MG to account for time dependencies in a flow and to obtain predicted corrections that are temporally coherent.

We analyzed the efficacy of DNN-MG for the classical Navier-Stokes benchmark problem of a laminar flow around an obstacle. Our results show that DNN-MG is able to substantially improve the accuracy of solutions compared to coarser multigrid ones and also provides substantial improvements for the drag and lift functionals. It thereby requires only approximately half the time of a multigrid solution at the same resolution and only a modest increase compared to a coarse one. DNN-MG is thereby able to generalize substantially beyond the flow used for training, and with the neural network trained on the one obstacle flow we were able to obtain high fidelity solutions also for channel flows with no or two obstacles.

The experiment with the flow in the L-shaped domain, where we do not obtain improvements over a coarse multigrid solution, shows the limits of the current DNN-MG implementation. We believe that a more complex neural network and more variability in the training data will be able to alleviate the limitation. We want to study this in future work. There, we also want to generalize our current implementation to 3D flows. Up to now, we made no effort to use accelerator hardware for the neural network. A hybrid approach where the finite element multigrid levels reside on the CPU and the neural network prediction is realized on a GPU would potentially provide further speedups. Although DNN-MG already considers the physics of the underlying problem by using the nonlinear residual on level L+1, it is worth exploring how additional model knowledge such as the divergence freedom of the predicted velocity fields can be integrated into DNN-MG. In future work, we also want to consider the application of DNN-MG to other partial differential equations, e.g. from nonlinear elasticity, cf. Sec. 5.3.

Acknowledgement
Acknowledgement

NM and TR acknowledge the financial support by the Federal Ministry of Education and Research of Germany (grant number 05M16NMA) as well as by the GRK 2297 MathCoRe, funded by the Deutsche Forschungsgemeinschaft (grant number 314838170).

References

[1] S. Bai, J. Z. Kolter, and V. Koltun. An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling. arXiv e-prints, arXiv:1803.01271, 2018.
[2] A. R. Barron. Approximation and Estimation Bounds for Artificial Neural Networks. Machine Learning, 14(1):115–133, 1994.
[3] C. Beck, S. Becker, P. Cheridito, A. Jentzen, and A. Neufeld. Deep splitting method for parabolic PDEs. 2019.
[4] R. Becker and M. Braack. A finite element pressure gradient stabilization for the Stokes equations based on local projections. Calcolo, 38(4):173–199, 2001.
[5] R. Becker, M. Braack, D. Meidner, T. Richter, and B. Vexler. The finite element toolkit Gascoigne.
[6] R. Becker, M. Braack, and T. Richter. Parallel multigrid on locally refined meshes. In R. Rannacher et al., editor, Reactive Flows, Diffusion and Transport. Springer, Berlin, 2006.
[7] J. Berg and K. Nyström. Data-driven discovery of PDEs in complex datasets. Journal of Computational Physics, 384:239–252, 2019.
[8] H. Bölcskei, P. Grohs, G. Kutyniok, and P. Petersen. Optimal Approximation with Sparsely Connected Deep Neural Networks. SIAM Journal on Mathematics of Data Science, 1(1):8–45, 2019.
[9] M. Braack, E. Burman, V. John, and G. Lube. Stabilized finite element methods for the generalized Oseen problem. Comput. Methods Appl. Mech. Engrg., 196:853–866, 2007.
[10] J. H. Bramble. Multigrid Methods. Longman Scientific & Technical, Harlow, 1993.
[11] R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. Duvenaud. Neural Ordinary Differential Equations. In NIPS 2018, 2018.
[12] Z. Chen, J. Zhang, M. Arjovsky, and L. Bottou. Symplectic Recurrent Neural Networks. 2019.
[13] K. Cho, B. van Merrienboer, C. Gulcehre, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Conference on Empirical Methods in Natural Language Processing (EMNLP 2014), 2014.
[14] G. Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems, 2(4):303–314, 1989.
[15] W. E and B. Yu. The deep Ritz method: A deep learning-based numerical algorithm for solving variational problems. Communications in Mathematics and Statistics, 6(1):1–12, 2018.
[16] N. B. Erichson, M. Muehlebach, and M. W. Mahoney. Physics-informed Autoencoders for Lyapunov-stable Fluid Flow Prediction. 2019.
[17] L. Failer and T. Richter. A Newton multigrid framework for optimal control of fluid-structure interactions. Optimization and Engineering, 82(2), accepted 2020. https://arxiv.org/abs/2003.14018.
[18] N. Geneva and N. Zabaras. Modeling the dynamics of PDE systems with physics-constrained deep auto-regressive networks. Journal of Computational Physics, 403:109056, 2020.
[19] C. Gin, B. Lusch, S. L. Brunton, and J. N. Kutz. Deep Learning Models for Global Coordinate Transformations that Linearize PDEs. arXiv e-prints, arXiv:1911.02710, 2019.
[20] P. Grohs, F. Hornung, A. Jentzen, and P. Zimmermann. Space-time error estimates for deep neural network approximations for differential equations. arXiv e-prints, arXiv:1908.03833, 2019.
[21] W. Hackbusch. Multi-Grid Methods and Applications. Springer, 1985.
[22] J. Heywood and R. Rannacher. Finite element approximation of the nonstationary Navier-Stokes problem. IV: Error analysis for second-order time discretization. SIAM J. Numer. Anal., 27(3):353–384, 1990.
[23] J. G. Heywood, R. Rannacher, and S. Turek. Artificial boundaries and flux and pressure conditions for the incompressible Navier-Stokes equations. Int. J. Numer. Meth. Fluids, 22:325–352, 1992.
[24] S. Hochreiter, Y. Bengio, P. Frasconi, and J. Schmidhuber. Gradient Flow in Recurrent Nets: The Difficulty of Learning Long-Term Dependencies, pages 237–243. Wiley-IEEE Press, 2001.
[25] S. Hochreiter and J. Schmidhuber. Long Short-Term Memory. Neural Computation, 9(8):1735–1780, 1997.
[26] K. Hornik. Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2):251–257, 1991.
[27] P. Jin, A. Zhu, G. E. Karniadakis, and Y. Tang. Symplectic networks: Intrinsic structure-preserving networks for identifying Hamiltonian systems. 2020.
[28] S. Karumuri, R. Tripathy, I. Bilionis, and J. Panchal. Simulator-free solution of high-dimensional stochastic elliptic partial differential equations using deep neural networks. Journal of Computational Physics, 404:109120, 2020.
[29] M. F. Kasim, D. Watson-Parris, L. Deaconu, S. Oliver, P. Hatfield, D. H. Froula, G. Gregori, M. Jarvis, S. Khatiwala, J. Korenaga, J. Topp-Mugglestone, E. Viezzer, and S. M. Vinko. Up to two billion times acceleration of scientific simulations with deep neural architecture search. 2020.
[30] G. Kutyniok, P. Petersen, M. Raslan, and R. Schneider. A Theoretical Analysis of Deep Neural Networks and Parametric PDEs. 2019.
[31] I. E. Lagaris, A. Likas, and D. I. Fotiadis. Artificial neural networks for solving ordinary and partial differential equations. Trans. Neur. Netw., 9(5):987–1000, 1998.
[32] Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
[33] M. Eichinger, A. Heinlein, and A. Klawonn. Stationary flow predictions using convolutional neural networks. In F. Vermolen and C. Vuik, editors, Numerical Mathematics and Advanced Applications, ENUMATH 2019. Springer, 2020.
[34] S. Mallat. Understanding deep convolutional networks. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 374(2065):20150203, 2016.
[35] N. Margenberg and T. Richter. Parallel time-stepping for fluid-structure interactions. Submitted, 2019. https://arxiv.org/abs/1907.01252.
[36] M. Mattheakis, P. Protopapas, D. Sondak, M. Di Giovanni, and E. Kaxiras. Physical Symmetries Embedded in Neural Networks. 2019.
[37] M. A. Nabian and H. Meidani. Physics-Informed Regularization of Deep Neural Networks. Technical report, 2018.
[38] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala. PyTorch: An imperative style, high-performance deep learning library. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32, pages 8024–8035. Curran Associates, Inc., 2019.
[39] T. Qin, K. Wu, and D. Xiu. Data driven governing equations approximation using deep neural networks. Journal of Computational Physics, 395:620–635, 2019.
[40] M. Raissi and G. E. Karniadakis. Hidden physics models: Machine learning of nonlinear partial differential equations. Journal of Computational Physics, 357:125–141, 2018.
[41] M. Raissi, A. Yazdani, and G. E. Karniadakis. Hidden Fluid Mechanics: A Navier-Stokes Informed Deep Learning Framework for Assimilating Flow Visualization Data. 2018.
[42] T. Richter. Fluid-structure Interactions: Models, Analysis and Finite Elements, volume 118 of Lecture Notes in Computational Science and Engineering. Springer, 2017.
[43] S. Rudy, A. Alla, S. L. Brunton, and J. N. Kutz. Data-Driven Identification of Parametric Partial Differential Equations. SIAM Journal on Applied Dynamical Systems, 18(2):643–660, 2019.
[44] Y. Saad. Iterative Methods for Sparse Linear Systems. PWS Publishing Company, 1996.
[45] M. Schäfer and S. Turek. Benchmark computations of laminar flow around a cylinder (with support by F. Durst, E. Krause and R. Rannacher). In E. H. Hirschel, editor, Flow Simulation with High-Performance Computers II: DFG Priority Research Program Results 1993–1995, number 52 in Notes Numer. Fluid Mech., pages 547–566. Vieweg, Wiesbaden, 1996.
[46] J. Schrittwieser, I. Antonoglou, T. Hubert, K. Simonyan, L. Sifre, S. Schmitt, A. Guez, E. Lockhart, D. Hassabis, T. Graepel, T. Lillicrap, and D. Silver. Mastering Atari, Go, chess and shogi by planning with a learned model. arXiv e-prints, arXiv:1911.08265, 2019.
[47] A. M. Tartakovsky, C. O. Marrero, P. Perdikaris, G. D. Tartakovsky, and D. Barajas-Solano. Learning Parameters and Constitutive Relationships with Physics Informed Deep Neural Networks. 2018.
[48] S. P. Vanka. Block-implicit multigrid solution of Navier-Stokes equations in primitive variables. Journal of Computational Physics, 65:138–158, 1985.
[49] A. Voelker, I. Kajić, and C. Eliasmith. Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks. In NIPS 2019, pages 15544–15553, 2019.
[50] Z. Y. Wan and T. P. Sapsis. Machine learning the kinematics of spherical particles in fluid flows. Journal of Fluid Mechanics, 857:R2, 2018.
[51] G. Weiss, Y. Goldberg, and E. Yahav. On the practical computational power of finite precision RNNs for language recognition. In ACL 2018, 2018.
[52] M. Werhahn, Y. Xie, M. Chu, and N. Thuerey. A multi-pass GAN for fluid flow super-resolution. Proc. ACM Comput. Graph. Interact. Tech., 2(2), 2019.
[53] S. Wiewel, M. Becher, and N. Thuerey. Latent-space Physics: Towards Learning the Temporal Evolution of Fluid Flow. 2018.
[54] Y. Xie, E. Franz, M. Chu, and N. Thuerey. tempoGAN: A temporally coherent, volumetric GAN for super-resolution fluid flow. ACM Trans. Graph., 37(4), 2018.
[55] Y. Yang and P. Perdikaris. Physics-informed deep generative models. 2018.
[56] D. Zhang, L. Guo, and G. E. Karniadakis. Learning in Modal Space: Solving Time-Dependent Stochastic PDEs Using Physics-Informed Neural Networks. 2019.
[57] Y. Zhu, N. Zabaras, P.-S. Koutsourelakis, and P. Perdikaris. An efficient fluid–solid coupling algorithm for single-phase flows.