Neural network guided adjoint computations in dual weighted residual error estimation
J. Roth, M. Schröder, and T. Wick
Leibniz Universität Hannover, Institut für Angewandte Mathematik, AG Wissenschaftliches Rechnen, Welfengarten 1, 30167 Hannover, Germany
Cluster of Excellence PhoenixD (Photonics, Optics, and Engineering - Innovation Across Disciplines), Leibniz Universität Hannover, Germany
Abstract
In this work, we are concerned with neural network guided goal-oriented a posteriori error estimation and adaptivity using the dual weighted residual method. The primal problem is solved using classical Galerkin finite elements. The adjoint problem is solved in strong form with a feedforward neural network using two or three hidden layers. The main objective of our approach is to explore alternatives for solving the adjoint problem with greater potential of a numerical cost reduction. The proposed algorithm is based on the general goal-oriented error estimation theorem including both linear and nonlinear stationary partial differential equations and goal functionals. Our developments are substantiated with some numerical experiments that include comparisons of neural network computed adjoints and classical finite element solutions of the adjoints. In the programming software, the open-source library deal.II is successfully coupled with LibTorch, the PyTorch C++ application programming interface.
This work is devoted to an innovative solution of the adjoint equation in goal-oriented error estimation with the dual weighted residual (DWR) method [4, 5, 3] (based on former adjoint concepts [18]); we also refer to [7, 1, 21, 37] for some important early work. Since then, the DWR method has been applied to numerous applications such as fluid-structure interaction [49, 42, 19], Maxwell's equations [11], surrogate models in stochastic inversion [33], model adaptivity in multiscale problems [32], and adaptive multiscale predictive modeling [35]. A summary of theoretical advancements in efficiency estimates and multi-goal-oriented error estimation was recently made in [14]. An important part in these studies is the adjoint problem, as it measures the sensitivity of the primal solution with respect to a single or multiple given goal functionals (quantities of interest). This adjoint solution is usually obtained by global higher order finite element method (FEM) solutions or local higher order approximations [5]. In general, the former is more stable, see e.g. [17], but the latter often works sufficiently well in practice. As the adjoint solution is only required to evaluate the a posteriori error estimator, a cheap solution is of interest.

Consequently, in this work, the main objective is to explore alternatives for computing the adjoint. Due to the universal approximation property [38], neural networks are a prime candidate, as they are already successfully employed for solving partial differential equations (PDEs) [39, 46, 6, 40, 28, 48, 22, 45, 23]. A related work aiming to improve goal-oriented computations with the help of neural network data-driven finite elements is [9]. Moreover, a recent summary of the key concepts of neural networks and deep learning was compiled in [24]. The advantage of neural networks is a greater flexibility, as they belong to the class of meshless methods. We follow the methodology of [39, 46] to solve PDEs by minimizing the residual using an L-BFGS (limited memory Broyden-Fletcher-Goldfarb-Shanno) method [29]. We address both linear and nonlinear PDEs and goal functionals in stationary settings. However, a shortcoming of the current approach is that we need to work with strong adjoint formulations, which may limit extensions to nonlinear coupled PDEs such as multiphysics problems and coupled variational inequality systems. If such problems can be restated in an energy formulation, again neural network algorithms are known [13, 45]. Despite this drawback, namely the necessity of working with strong formulations, the current study provides useful insights into whether neural network guided adjoints can be an alternative concept for dual weighted residual error estimation. For this reason, our resulting modified adaptive algorithm and related numerical simulations are compared side by side in all numerical tests to classical Galerkin finite element solutions (see e.g. [10]) of the adjoint. Our proposed algorithm is implemented in the open-source finite element library deal.II [2] coupled with LibTorch, the PyTorch C++ API [36].

The outline of this paper is as follows: In Section 2, we recapitulate the DWR method. Next, in Section 3 we gather the important ingredients of the neural network solution. This section also includes an extension of an approximation theorem from Lebesgue spaces to classical function spaces. The algorithmic realization is addressed in Section 4. Then, in Section 5 several numerical experiments are conducted.
Our findings are summarized in Section 6.

Let $U$ and $V$ be Banach spaces and let $A : U \to V^*$ be a nonlinear mapping, where $V^*$ denotes the dual space of $V$. With this, we can define the problem: Find $u \in U$ such that
\[
A(u)(v) = 0 \quad \forall v \in V. \qquad (1)
\]
Additionally, we can look at an approximation of this problem. For subspaces $\tilde U \subset U$ and $\tilde V \subset V$ the problem reads: Find $\tilde u \in \tilde U$ such that
\[
A(\tilde u)(\tilde v) = 0 \quad \forall \tilde v \in \tilde V.
\]
Remark 2.1.
In the following, the nonlinear mapping $A(\cdot)(\cdot)$ will represent the variational formulation of a stationary partial differential equation with the associated function spaces $U$ and $V$.

We define the finite element approximation of the abstract problem as follows: Find $u_h \in U_h$ such that
\[
A(u_h)(v_h) = 0 \quad \forall v_h \in V_h, \qquad (2)
\]
where $U_h \subset U$ and $V_h \subset V$ denote the finite element spaces. Here the operator is given by $A(u_h)(\cdot) := a(u_h)(\cdot) - l(\cdot)$ with the linear forms $a(u_h)(\cdot)$ and $l(\cdot)$.

In many applications we are not necessarily interested in the whole solution to a given problem, but more explicitly only in the evaluation of a certain quantity of interest. This quantity of interest can often be represented mathematically by a goal functional $J : U \to \mathbb{R}$. Here the main target is to minimize the error in this given goal functional and to use the computational resources efficiently. This leads to the approach of [4, 5], the DWR method, which this work follows closely. We are interested in the evaluation of the goal functional $J$ at the solution $u \in U$ of the problem $A(u)(v) = 0$ for all $v \in V$. Under the assumption that the problem yields a unique solution, the formulation from above can be rewritten into the equivalent optimization problem
\[
\min_{u \in U} J(u) \quad \text{s.t.} \quad A(u)(v) = 0 \quad \forall v \in V.
\]
For this constrained optimization problem we can introduce the corresponding Lagrangian
\[
\mathcal{L}(u, z) = J(u) - A(u)(z)
\]
with the adjoint variable $z \in V$. A stationary point needs to fulfill the first-order necessary conditions $\mathcal{L}' = 0$, i.e.,
\[
\mathcal{L}'_u(u, z)(\delta u) = J'(u)(\delta u) - A'(u)(\delta u, z) \overset{!}{=} 0, \qquad
\mathcal{L}'_z(u, z)(\delta z) = -A(u)(\delta z) \overset{!}{=} 0,
\]
which is equivalent to
\[
A'(u)(\delta u, z) = J'(u)(\delta u), \qquad A(u)(\delta z) = 0,
\]
where $J'$ and $A'$ denote the Fréchet derivatives. We see that a defining equation for the adjoint variable arises therein: Find $z \in V$ such that
\[
A'(u)(\varphi, z) = J'(u)(\varphi) \quad \forall \varphi \in U, \qquad (3)
\]
which is known as the adjoint problem.
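To make the abstract adjoint problem (3) more concrete, the following short calculation specializes it to the Poisson problem with the mean value goal functional, the linear model case that reappears in Section 5.1.1; it is only an illustrative sketch. Consider
\[
A(u)(v) = (\nabla u, \nabla v) - (f, v), \qquad J(u) = \frac{1}{|\Omega|} \int_\Omega u \, \mathrm{d}x.
\]
Since $A$ is linear in $u$ and $J$ is linear, we have $A'(u)(\varphi, z) = (\nabla\varphi, \nabla z)$ and $J'(u)(\varphi) = \frac{1}{|\Omega|}\int_\Omega \varphi \, \mathrm{d}x$, so that (3) becomes: Find $z \in V$ such that
\[
(\nabla\varphi, \nabla z) = \frac{1}{|\Omega|} \int_\Omega \varphi \, \mathrm{d}x \quad \forall \varphi \in U,
\]
i.e., in strong form $-\Delta z = 1/|\Omega|$ with homogeneous Dirichlet boundary conditions.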
The above considerations lead to the error representation for arbitrary approximations, as derived in [41].

Theorem 2.2.
Let $(u, z) \in U \times V$ solve (1) and (3). Further, let $A \in C^3(U, V^*)$ and $J \in C^3(U, \mathbb{R})$. Then for arbitrary approximations $(\tilde u, \tilde z) \in U \times V$ the error representation
\[
J(u) - J(\tilde u) = \frac{1}{2}\,\rho(\tilde u)(z - \tilde z) + \frac{1}{2}\,\rho^*(\tilde u, \tilde z)(u - \tilde u) + \rho(\tilde u)(\tilde z) + \mathcal{R}^{(3)} \qquad (4)
\]
holds true, where
\[
\rho(\tilde u)(\cdot) := -A(\tilde u)(\cdot), \qquad
\rho^*(\tilde u, \tilde z)(\cdot) := J'(\tilde u)(\cdot) - A'(\tilde u)(\cdot, \tilde z).
\]
With $e = u - \tilde u$ and $e^* = z - \tilde z$, the remainder term reads as follows:
\[
\mathcal{R}^{(3)} := \frac{1}{2} \int_0^1 \Big[ J'''(\tilde u + s e)(e, e, e) - A'''(\tilde u + s e)(e, e, e, \tilde z + s e^*) - 3\,A''(\tilde u + s e)(e, e, e^*) \Big]\, s (s - 1) \, \mathrm{d}s.
\]

Proof.
The proof can be found in [41].
Remark 2.3. If $\tilde u := u_h \in U_h \subset U$ is the Galerkin projection which solves (2) and $\tilde z := z_h \in V_h \subset V$, then the iteration error $\rho(\tilde u)(\tilde z)$ vanishes, which yields the theorems presented in the early work [3]. Therefore, from now on we omit the iteration error. The remainder term is usually of third order [5] and can be omitted, for which detailed computational evidence was demonstrated in [16]. In the case of a linear problem, it clearly holds that
\[
\eta = \rho(\tilde u)(z - \tilde z) = \frac{1}{2}\,\rho(\tilde u)(z - \tilde z) + \frac{1}{2}\,\rho^*(\tilde u, \tilde z)(u - \tilde u).
\]
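The identity for linear problems can be verified by a short computation; the following sketch assumes that $A(u)(v) = a(u, v) - l(v)$ with a bilinear form $a$ and that $J$ is linear (assumptions we add here to make the statement of Remark 2.3 precise). Using $a(u, \cdot) = l(\cdot)$ and the adjoint equation $a(\varphi, z) = J(\varphi)$,
\[
\rho(\tilde u)(z - \tilde z) = l(z - \tilde z) - a(\tilde u, z - \tilde z) = a(u - \tilde u, z - \tilde z) = J(u - \tilde u) - a(u - \tilde u, \tilde z).
\]
On the other hand, since $J'(\tilde u) = J$ and $A'(\tilde u)(\varphi, \tilde z) = a(\varphi, \tilde z)$ for linear problems,
\[
\rho^*(\tilde u, \tilde z)(u - \tilde u) = J(u - \tilde u) - a(u - \tilde u, \tilde z).
\]
Hence both residual terms coincide, and their arithmetic mean equals $\rho(\tilde u)(z - \tilde z)$.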
Remark 2.4.
Theorem 2.2 motivates the error estimator
\[
\eta = \frac{1}{2}\,\rho(\tilde u)(z - \tilde z) + \frac{1}{2}\,\rho^*(\tilde u, \tilde z)(u - \tilde u).
\]
This error estimator is exact but not computable. Therefore, the exact solutions $u$ and $z$ are now approximated by higher-order solutions $\big(u_h^{(2)}, z_h^{(2)}\big) \in U_h^{(2)} \times V_h^{(2)}$. These higher-order solutions can be realized by a globally refined grid or by using higher-order basis functions. The practical error estimator reads
\[
\eta^{(2)} = \frac{1}{2}\,\rho(\tilde u)\big(z_h^{(2)} - \tilde z\big) + \frac{1}{2}\,\rho^*(\tilde u, \tilde z)\big(u_h^{(2)} - \tilde u\big). \qquad (5)
\]
In principle, we need to solve four problems, where especially the computation of $u_h^{(2)}$ is expensive. It is well-known that different possibilities exist, such as global higher-order finite element solutions or local interpolations [5, 43, 7]. Moreover, one can consider only the primal part of the error estimator, which is justified for linear problems only and yields a second order remainder term in nonlinear problems [5, Proposition 2.3]:
\[
\eta_h^{(2)} = \rho(u_h)\big(z_h^{(2)} - \tilde z\big).
\]
For many nonlinear problems this version is used, as it reduces to solving only two problems and yields excellent values for mildly nonlinear problems such as incompressible flow [8]. On the other hand, for quasi-linear problems, there is a strong need to work with the adjoint error parts $\rho^*$ as well [15, 16]. In our work, we employ solutions in enriched spaces. We compute the adjoint solution $z_h^l = i_h z_h^{l,(2)} \in V_h^l \subset V_h^{l,(2)}$ via restriction. For nonlinear problems, we approximate the primal solution in the enriched space via interpolation, $u_h^{l,(2)} = I_h^{(2)} u_h^l \in U_h^{l,(2)} \supset U_h^l$. Therefore, we only solve two problems in practice: the primal problem and the enriched adjoint problem.
Algorithm 1 DWR algorithm for general nonlinear problems
1. Start with some initial guess $u_h$, $l = 1$.
2. Solve the primal problem: Find $u_h^l \in U_h^l$ such that
\[
A(u_h^l)(\varphi_h^l) = 0 \quad \forall \varphi_h^l \in V_h^l,
\]
using some nonlinear solver.
3. Compute the interpolation $u_h^{l,(2)} = I_h^{(2)} u_h^l \in U_h^{l,(2)}$.
4. Solve the enriched adjoint problem: Find $z_h^{l,(2)} \in V_h^{l,(2)}$ such that
\[
A'\big(u_h^{l,(2)}\big)\big(\psi_h^{l,(2)}, z_h^{l,(2)}\big) = J'\big(u_h^{l,(2)}\big)\big(\psi_h^{l,(2)}\big) \quad \forall \psi_h^{l,(2)} \in U_h^{l,(2)},
\]
using some linear solver.
5. Compute the restriction $z_h^l = i_h z_h^{l,(2)} \in V_h^l$.
6. Compute the error estimator $\eta^{(2)}$.
7. If $\big|\eta^{(2)}\big| < \mathrm{TOL}$, the algorithm terminates with final output $J(u_h^l)$.
8. Localize the error estimator $\eta^{(2)}$ and mark elements.
9. Refine marked elements: $\mathcal{T}_h^l \mapsto \mathcal{T}_h^{l+1}$, set $l = l + 1$, and go to Step 2.
The error estimator $\eta^{(2)}$ must be localized to the regions of the domain that contribute to the error. This can be done by the methods proposed in [4, 5, 3], which use integration by parts in a backwards manner and result in an element-wise localization employing the strong form of the equations. However, in this work we use the technique of [43], where a partition-of-unity (PU) $\sum_i \psi_i \equiv 1$ was introduced, in which the error contribution is localized on a nodal level. To realize this partition-of-unity, one can simply choose piece-wise bilinear elements $\{\psi_i^h \mid i = 1, \dots, N\}$. Then, the approximated error indicator reads
\[
\eta^{(2),\mathrm{PU}} = \sum_{i=1}^{N} \left( \frac{1}{2}\,\rho(\tilde u)\big((z_h^{(2)} - \tilde z)\,\psi_i\big) + \frac{1}{2}\,\rho^*(\tilde u, \tilde z)\big((u_h^{(2)} - \tilde u)\,\psi_i\big) \right). \qquad (6)
\]
Some recent theoretical work on the effectivity and the efficiency of $\eta^{(2),\mathrm{PU}}$ can be found in [43] and [16], respectively. The main objective of the remainder of this paper is to compute the adjoint solution with a feedforward neural network.

To evaluate the quality of the error estimator we introduce the effectivity index
\[
I_{\mathrm{eff}} = \frac{\big|\eta^{(2),\mathrm{PU}}\big|}{|J(u) - J(\tilde u)|}.
\]
Since $J(u)$ is unknown, we approximate it by $J(\hat u)$, where $\hat u$ is the solution of the PDE on a very fine grid. We desire that the effectivity index converges to $1$, which signifies that our error estimator is a good approximation of the error in the goal functional.

In order to realize neural network guided DWR, we consider feedforward neural networks $u_{NN} : \mathbb{R}^d \to \mathbb{R}$, where $d$ is the dimension of the domain $\Omega$ plus the dimension of $u$ and the dimension of all the derivatives of $u$ that are required for the adjoint problem. The neural networks can be expressed as
\[
u_{NN}(x) = T^{(L)} \circ \sigma \circ T^{(L-1)} \circ \cdots \circ \sigma \circ T^{(1)}(x),
\]
where $T^{(i)} : \mathbb{R}^{n_{i-1}} \to \mathbb{R}^{n_i},\ y \mapsto W^{(i)} y + b^{(i)}$, are affine transformations for $1 \le i \le L$, with weight matrices $W^{(i)} \in \mathbb{R}^{n_i \times n_{i-1}}$ and bias vectors $b^{(i)} \in \mathbb{R}^{n_i}$. Here $n_i$ denotes the number of neurons in the $i$-th layer, with $n_0 = d$ and $n_L = 1$, and $\sigma : \mathbb{R} \to \mathbb{R}$ is a nonlinear activation function, which is the hyperbolic tangent function throughout this work. Derivatives of neural networks can be computed with backpropagation (see e.g. [44, 24]), a special case of reverse mode automatic differentiation [34]. Similarly, higher order derivatives can be calculated by applying automatic differentiation recursively.

Cybenko [12] and Hornik [25] proved a first version of the universal approximation theorem, which states that continuous functions can be approximated to arbitrary precision by single hidden layer neural networks. A few years later, Pinkus [38] generalized their findings and showed that single hidden layer neural networks can uniformly approximate a function and its partial derivatives. The space of single hidden layer neural networks is given by
\[
\mathcal{M}(\sigma) := \operatorname{span}\big\{ \sigma(w \cdot x + b) \;\big|\; w \in \mathbb{R}^d,\ b \in \mathbb{R} \big\}.
\]

Theorem 3.1 (Universal Approximation Theorem [38]).
Let $m^{(i)} \in \mathbb{N}_0^d$ be multi-indices for $1 \le i \le s$ and set $m = \max_{1 \le i \le s} |m^{(i)}|$. Assume $\sigma \in C^m(\mathbb{R})$ and that $\sigma$ is not a polynomial. Then for any $f \in \bigcap_{i=1}^s C^{m^{(i)}}(\mathbb{R}^d)$, any compact $K \subset \mathbb{R}^d$, and any $\epsilon > 0$, there exists $g \in \mathcal{M}(\sigma)$ such that
\[
\max_{x \in K} \big| D^k f(x) - D^k g(x) \big| < \epsilon
\]
for all $k \in \mathbb{N}_0^d$ for which $k \le m^{(i)}$ for some $1 \le i \le s$.
This theoretical result motivates the application of neural networks for the numerical approximationof partial differential equations.
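The following C++ sketch illustrates how such a fully connected tanh network can be set up with LibTorch, the PyTorch C++ API used in our implementation. It is only a minimal sketch under our own naming (class and member names are not taken from our actual code); the two hidden layers with 32 neurons each correspond to the configuration used in Section 5.1.

```cpp
#include <torch/torch.h>

// Fully connected feedforward network u_NN : R^d -> R with tanh activations,
// i.e. u_NN(x) = T^(L)( sigma( T^(L-1)( ... sigma(T^(1)(x)) ... ) ) ).
struct AdjointNetImpl : torch::nn::Module
{
  AdjointNetImpl(int64_t d, int64_t width = 32)
  {
    input  = register_module("input",  torch::nn::Linear(d, width));
    hidden = register_module("hidden", torch::nn::Linear(width, width));
    output = register_module("output", torch::nn::Linear(width, 1));
  }

  // Composition of affine maps T^(i) and the activation sigma = tanh;
  // the last layer is applied without an activation.
  torch::Tensor forward(torch::Tensor x)
  {
    x = torch::tanh(input->forward(x));
    x = torch::tanh(hidden->forward(x));
    return output->forward(x);
  }

  torch::nn::Linear input{nullptr}, hidden{nullptr}, output{nullptr};
};
TORCH_MODULE(AdjointNet);  // defines the module holder type 'AdjointNet'
```

Adding a third hidden layer, as used in Sections 5.1.3 and 5.2, amounts to registering one further `torch::nn::Linear` block of the same width.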
Residual minimization with neural networks has become popular in the last few years through the works of Raissi, Perdikaris and Karniadakis on physics-informed neural networks (PINNs) [39] and the paper of Sirignano and Spiliopoulos on the "Deep Galerkin Method" [46]. For their approach one considers the strong formulation of the stationary PDE
\[
\mathcal{N}(u, x) = 0 \ \text{ in } \Omega, \qquad \mathcal{B}(u, x) = 0 \ \text{ on } \partial\Omega, \qquad (7)
\]
where $\mathcal{N}$ is a differential operator and $\mathcal{B}$ is a boundary operator. An example for the differential operator $\mathcal{N}$ is given by the semi-linear form $A(u)(v)$ introduced in Section 2.1. The boundary operator $\mathcal{B}$, in the case of Dirichlet conditions, is realized in the weak formulation as usual in the function space $U$. One then needs to find a neural network $u_{NN}$ which minimizes the loss function
\[
L(u_{NN}) = \frac{1}{n_\Omega} \sum_{i=1}^{n_\Omega} \mathcal{N}\big(u_{NN}, x_i^\Omega\big)^2 + \frac{1}{n_{\partial\Omega}} \sum_{i=1}^{n_{\partial\Omega}} \mathcal{B}\big(u_{NN}, x_i^{\partial\Omega}\big)^2,
\]
where $x_1^\Omega, \dots, x_{n_\Omega}^\Omega \in \Omega$ are collocation points inside the domain and $x_1^{\partial\Omega}, \dots, x_{n_{\partial\Omega}}^{\partial\Omega} \in \partial\Omega$ are collocation points on the boundary. In [47] it has been shown that the two components of the loss function need to be weighted appropriately to yield accurate results. Therefore, we use a modified version of this method which circumvents these issues.

Let us again consider the abstract PDE problem in its strong formulation (7). For simplicity, we only consider Dirichlet boundary conditions, i.e. $\mathcal{B}(u, x) := u(x) - g(x)$. Additionally, in our work we use the approach of Berg and Nyström [6], who used the ansatz
\[
u(x) := d_{\partial\Omega}(x) \cdot u_{NN}(x) + \tilde g(x) \quad \text{for } x \in \bar\Omega \qquad (8)
\]
to fulfill inhomogeneous Dirichlet boundary conditions exactly. Here $\tilde g$ denotes the extension of the boundary data $g$ to the entire domain $\bar\Omega$, which is continuously differentiable up to the order of the differential operator $\mathcal{N}$. Berg and Nyström [6] used the distance to the boundary $\partial\Omega$ as their function $d_{\partial\Omega}$. However, it is sufficient to use a function $d_{\partial\Omega}$ which is continuously differentiable up to the order of the differential operator $\mathcal{N}$ and has the properties
\[
d_{\partial\Omega}(x) = 0 \ \text{ for } x \in \partial\Omega, \qquad d_{\partial\Omega}(x) \ne 0 \ \text{ for } x \in \Omega.
\]
Thus, $d_{\partial\Omega}$ can be interpreted as a level-set function, since
\[
\Omega = \big\{x \in \bar\Omega \mid d_{\partial\Omega}(x) \ne 0\big\} \quad \text{and} \quad \partial\Omega = \big\{x \in \bar\Omega \mid d_{\partial\Omega}(x) = 0\big\}.
\]
Obviously, for this kind of ansatz for the solution of the PDE, it holds that
\[
\mathcal{B}(u, x) = u(x) - g(x) = \big[d_{\partial\Omega}(x) \cdot u_{NN}(x) + \tilde g(x)\big] - g(x) = 0 \quad \text{on } \partial\Omega.
\]
Therefore, in contrast to some previous works, we do not need to account for the boundary conditions in our loss function, which is a big benefit of our approach, since proper weighting of the different residual contributions in the loss function is not required. It might only be a little cumbersome to fulfill the boundary conditions exactly when dealing with mixed boundary conditions, but the form of the ansatz function for such boundary conditions has been laid out in [31].

Figure 1: Section 3.3: Diagram of our ansatz $u = d_{\partial\Omega} \cdot u_{NN} + \tilde g$ for the two dimensional Poisson problem. Here we used the abbreviations $u_i := u(x_i, y_i)$ and $f_i := f(x_i, y_i)$.
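As an illustration, the following sketch evaluates the mean squared PDE residual for the two dimensional Poisson problem of Figure 1 using the ansatz (8), with the Laplacian obtained by applying reverse-mode automatic differentiation twice. It reuses the AdjointNet type from the previous listing; all names are illustrative and not taken from our actual implementation.

```cpp
#include <torch/torch.h>

// Mean squared residual of the strong form -Delta u = f on the unit square,
// with the ansatz u = d * u_NN + g_tilde from (8) so that the homogeneous
// Dirichlet conditions are fulfilled exactly (g_tilde = 0 here).
torch::Tensor poisson_residual_loss(AdjointNet &net,
                                    const torch::Tensor &collocation, // (n, 2)
                                    const torch::Tensor &f)           // (n, 1)
{
  using torch::indexing::Slice;

  // Leaf tensor of collocation points we can differentiate with respect to.
  auto xy = collocation.detach().requires_grad_(true);
  const auto x = xy.index({Slice(), 0}).unsqueeze(1);
  const auto y = xy.index({Slice(), 1}).unsqueeze(1);

  // d(x,y) = x(1-x) y(1-y) vanishes on the boundary of (0,1)^2 (Section 5.1).
  const auto d = x * (1.0 - x) * y * (1.0 - y);
  const auto u = d * net->forward(xy);

  // First derivatives; create_graph keeps the graph for the second pass.
  const auto du = torch::autograd::grad({u.sum()}, {xy}, {},
                                        /*retain_graph=*/true,
                                        /*create_graph=*/true)[0];
  const auto u_x = du.index({Slice(), 0});
  const auto u_y = du.index({Slice(), 1});

  // Second derivatives u_xx and u_yy by differentiating u_x and u_y again.
  const auto u_xx = torch::autograd::grad({u_x.sum()}, {xy}, {}, true, true)[0]
                        .index({Slice(), 0}).unsqueeze(1);
  const auto u_yy = torch::autograd::grad({u_y.sum()}, {xy}, {}, true, true)[0]
                        .index({Slice(), 1}).unsqueeze(1);

  // Strong residual as in Figure 1: u_xx + u_yy + f (squaring removes the sign).
  const auto residual = u_xx + u_yy + f;
  return residual.pow(2).mean();
}
```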
In the following, we prove that our neural network solutions approximate the analytical solutions well if their loss is sufficiently small. Our neural networks $u_{NN}$ have been trained with the mean squared error of the residual of the PDE, i.e.
\[
L(u) = \frac{1}{n} \sum_{i=1}^{n} \mathcal{N}(u, x_i)^2, \qquad (9)
\]
where $n$ is the number of collocation points $x_i$ from the domain $\Omega$. For the sake of generality, let us consider the generalized loss
\[
\hat L^p(u) = \frac{1}{|\Omega|} \int_\Omega |\mathcal{N}(u, x)|^p \, \mathrm{d}x \quad \text{for } p \ge 1.
\]
Then, the loss (9) is just the Monte Carlo approximation of the generalized loss for $p = 2$. We briefly recall the approximation theorem from [47] and show that the classical solution of the Poisson problem satisfies the assumptions of the approximation theorem.

Lemma 3.2 (Approximation theorem [47]).
Let $2 \le p \le \infty$. We consider a PDE of the form (7) on a bounded, open domain $\Omega \subset \mathbb{R}^m$ with Lipschitz boundary $\partial\Omega$ and $\mathcal{N}(u, x) = N(u, x) - \hat f(x)$, where $N$ is a linear, elliptic operator and $\hat f \in L^2(\Omega)$. Let there be a unique solution $\hat u \in H^1(\Omega)$ and let the following stability estimate
\[
\|u\|_{H^1(\Omega)} \le C \, \|f\|_{L^2(\Omega)}
\]
hold for $u \in H^1(\Omega)$, $f \in L^2(\Omega)$ with $N(u, x) = f(x)$ in $\Omega$. Then we have for an approximate solution $u \in H^1(\Omega)$ that
\[
\forall \epsilon > 0 \ \exists \delta > 0: \quad \hat L^p(u) < \delta \implies \|u - \hat u\|_{H^1(\Omega)} < \epsilon.
\]

Proof.
Let $\delta = \epsilon^p \, C^{-p} \, |\Omega|^{-\frac{p}{2}}$. Let $u = d_{\partial\Omega}(x) \cdot u_{NN}(x) + \tilde g(x) \in H^1(\Omega)$ be an approximate solution of the PDE with $\hat L^p(u) < \delta$, which means that there exists a perturbation of the right-hand side $f_{\mathrm{error}} \in L^2(\Omega)$ such that $N(u, x) = \hat f(x) + f_{\mathrm{error}}(x)$. By the stability estimate and the linearity of $N$, we have
\[
\|u - \hat u\|_{H^1(\Omega)} \le C \, \|(\hat f + f_{\mathrm{error}}) - \hat f\|_{L^2(\Omega)} = C \, \|f_{\mathrm{error}}\|_{L^2(\Omega)}.
\]
Applying the Hölder inequality to the norm of $f_{\mathrm{error}}$ and using $2 \le p \le \infty$ yields
\[
\|f_{\mathrm{error}}\|_{L^2(\Omega)} \le |\Omega|^{\frac{1}{2} - \frac{1}{p}} \, \|f_{\mathrm{error}}\|_{L^p(\Omega)}.
\]
Combining the last two inequalities gives us the desired error bound
\[
\|u - \hat u\|_{H^1(\Omega)} \le C \, \|f_{\mathrm{error}}\|_{L^2(\Omega)} \le C \, |\Omega|^{\frac{1}{2} - \frac{1}{p}} \, \|f_{\mathrm{error}}\|_{L^p(\Omega)} = C \, |\Omega|^{\frac{1}{2}} \, \hat L^p(u)^{\frac{1}{p}} < C \, |\Omega|^{\frac{1}{2}} \, \delta^{\frac{1}{p}} = \epsilon.
\]
In the last inequality, we used that the generalized loss of our approximate solution is sufficiently small, i.e. $\hat L^p(u) < \delta$.

Let us recapitulate an important result from the Schauder theory [20], which yields the existence and uniqueness of classical solutions of the Poisson problem if we assume higher regularity of our problem, i.e. when we work with Hölder continuous functions and sufficiently smooth domains.

Lemma 3.3 (Solution in classical function spaces).
Let $0 < \lambda < 1$ be such that $\Omega \subset \mathbb{R}^m$ is a domain with $C^{2,\lambda}$ boundary, $\tilde g \in C^{2,\lambda}(\bar\Omega)$ and $\hat f \in C^{0,\lambda}(\bar\Omega)$. Then Poisson's problem, which is of the form (7) with $N(u, x) := -\Delta u$, has a unique solution $\hat u \in C^{2,\lambda}(\bar\Omega)$.

Proof. Follows immediately from [20, Theorem 6.14].

With Lemma 3.3 we can now show that the approximation theorem holds for the Poisson problem in classical function spaces.
Theorem 3.4.
Let $0 < \lambda < 1$ be such that $\Omega \subset \mathbb{R}^m$ is a bounded, open domain with $C^{2,\lambda}$ boundary, $\tilde g \in C^{2,\lambda}(\bar\Omega)$ and $\hat f \in C^{0,\lambda}(\bar\Omega)$. Then Poisson's problem, which is of the form (7) with $N(u, x) := -\Delta u$, has a unique solution $\hat u \in H^1(\Omega)$. Furthermore, there exists $u = d_{\partial\Omega}(x) \cdot u_{NN}(x) + \tilde g(x) \in H^1(\Omega)$ with the estimate
\[
\forall \epsilon > 0 \ \exists \delta > 0: \quad \hat L^p(u) < \delta \implies \|u - \hat u\|_{H^1(\Omega)} < \epsilon.
\]

Proof. From Lemma 3.3 it follows that there exists a unique solution $\hat u \in C^{2,\lambda}(\bar\Omega) \subset H^1(\Omega)$. Analogously, it holds that $u = d_{\partial\Omega}(x) \cdot u_{NN}(x) + \tilde g(x) \in H^1(\Omega)$. Furthermore, we have by the Lax-Milgram lemma that $\hat u \in H^1(\Omega)$ is the unique weak solution and fulfills the stability estimate
\[
\|\hat u\|_{H^1(\Omega)} \le C \, \|\hat f\|_{L^2(\Omega)}.
\]
By Lemma 3.2 the estimate
\[
\forall \epsilon > 0 \ \exists \delta > 0: \quad \hat L^p(u) < \delta \implies \|u - \hat u\|_{H^1(\Omega)} < \epsilon
\]
then also holds.
Theorem 3.4 implies that a low loss value of a neural network with high probability corresponds to an accurate approximation $u$ of the exact solution $\hat u$ of the PDE, since the loss is a Monte Carlo approximation of the generalized loss, and the two should be close in value for a large number of collocation points.

To make a posteriori error estimates for our FEM solution of the primal problem (1), we now use neural networks to solve the adjoint PDE (3). In an FEM approach, the adjoint PDE would be solved in its variational form as described in Algorithm 1, but we minimize the residual of the strong form using neural networks and hence need to derive the strong formulation of the adjoint PDE first. After training, the neural network is projected into the FEM ansatz function space of the adjoint problem. Finally, the a posteriori error estimates can be made as usual with the DWR method, following again Algorithm 1.
Remark 3.6.
For linear goal functionals the Riesz representation theorem yields the existence and uniqueness of the strong formulation. Nevertheless, deriving the strong form of the adjoint PDE might be very involved for complicated PDEs, such as fluid-structure interaction, e.g. [42, 49], and goal functionals $J : U \to \mathbb{R}$. In future works, we aim to extend to alternative approaches which do not require the derivation of the strong form.
We use neural networks to trade off accuracy for speed. In general, the neural network approach requires fewer collocation points than the finite element method. Therefore, we would expect the neural networks to be faster than the finite element method on finer grids. In our numerical tests we used the coordinates of the degrees of freedom as our collocation points, but the collocation points could also be sampled randomly, or one could choose the collocation points adaptively as proposed in [30].
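As a small illustration of this choice of collocation points, the following sketch gathers the support points of the degrees of freedom with deal.II and evaluates a trained network there; the same nodal evaluation realizes the projection into the FEM space used by the algorithm in the next section. The function name is ours, the snippet assumes Lagrangian finite elements with support points, and it reuses the AdjointNet type from the sketch above.

```cpp
#include <deal.II/base/point.h>
#include <deal.II/dofs/dof_handler.h>
#include <deal.II/dofs/dof_tools.h>
#include <deal.II/fe/mapping_q1.h>
#include <deal.II/lac/vector.h>

#include <torch/torch.h>

#include <vector>

// Gather the support points of all degrees of freedom (our collocation points)
// and evaluate the trained network there; the nodal values define a unique
// function in the corresponding FEM space.
template <int dim>
dealii::Vector<double>
evaluate_network_at_dofs(const dealii::DoFHandler<dim> &dof_handler,
                         AdjointNet                    &net)
{
  const dealii::MappingQ1<dim> mapping;
  std::vector<dealii::Point<dim>> support_points(dof_handler.n_dofs());
  dealii::DoFTools::map_dofs_to_support_points(mapping, dof_handler,
                                               support_points);

  // Pack the support points into an (n_dofs x dim) tensor.
  std::vector<double> coords;
  coords.reserve(support_points.size() * dim);
  for (const auto &p : support_points)
    for (unsigned int d = 0; d < dim; ++d)
      coords.push_back(p[d]);
  const auto x = torch::tensor(coords, torch::kFloat64)
                     .reshape({static_cast<int64_t>(support_points.size()), dim});

  // Evaluate the network (assumed to operate in double precision, e.g. after
  // net->to(torch::kFloat64)); no gradients are needed here. In practice one
  // evaluates the full ansatz z = d * z_NN + g_tilde, not the raw network.
  torch::NoGradGuard no_grad;
  const auto values = net->forward(x).squeeze(1);

  dealii::Vector<double> nodal_values(dof_handler.n_dofs());
  for (dealii::types::global_dof_index i = 0; i < nodal_values.size(); ++i)
    nodal_values(i) = values[i].item<double>();
  return nodal_values;
}
```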
In this section, we describe our final algorithm for the neural network guided dual weighted residual method. In the algorithm, we work with hierarchical FEM spaces, i.e. $U_h^l \subset U_h^{l,(2)}$ and $V_h^l \subset V_h^{l,(2)}$.

Algorithm 2 Neural network guided DWR algorithm
1. Start with some initial guess $u_h$, $l = 1$.
2. Solve the primal problem: Find $u_h^l \in U_h^l$ such that
\[
A(u_h^l)(\varphi_h^l) = 0 \quad \forall \varphi_h^l \in V_h^l,
\]
using some nonlinear solver.
3. Compute the interpolation $u_h^{l,(2)} = I_h^{(2)} u_h^l \in U_h^{l,(2)}$.
4. Solve the adjoint problem with a neural network: Find $z = d_{\partial\Omega} \cdot z_{NN} + \tilde g \in H^1(\Omega)$ such that
\[
z_{NN} = \arg\min_{\hat z_{NN}} L\big(d_{\partial\Omega} \cdot \hat z_{NN} + \tilde g\big).
\]
5. Project the neural network solution into the enriched FEM space, $z_h^{l,(2)} = \pi(z)$, with a projection $\pi : H^1(\Omega) \to V_h^{l,(2)}$.
6. Compute the restriction $z_h^l = i_h z_h^{l,(2)} \in V_h^l$.
7. Compute the error estimator $\eta_h^{(2)}$ defined in (5).
8. If $\big|\eta_h^{(2)}\big| < \mathrm{TOL}$, the algorithm terminates with final output $J(u_h^l)$.
9. Localize the error estimator $\eta_h^{(2)}$ and mark elements.
10. Refine marked elements: $\mathcal{T}_h^l \mapsto \mathcal{T}_h^{l+1}$, set $l = l + 1$, and go to Step 2.

Here we only consider the Galerkin method, for which the ansatz function space and the test function space coincide, i.e. $U = V$, but $U \ne V$ can be realized in a similar fashion. The novelty compared to the DWR method presented in Section 2 are steps 4 and 5 of the algorithm. In the following, we describe these parts in more detail.

In step 4, we solve the strong form of the adjoint problem, which for nonlinear PDEs or nonlinear goal functionals also depends on the primal solution $u_h^{l,(2)}$. The strong form of the adjoint problem is of the form (7), and thus we can find a neural network based solution by minimizing the loss (9) with L-BFGS [29], a quasi-Newton method. We observed that with L-BFGS sometimes the loss exploded or the optimizer got stuck at a saddle point. Consequently, we restarted the training loop with a new neural network when the loss exploded, or used a few steps with the Adam optimizer [27] when a saddle point was reached. Afterwards, L-BFGS can be used as an optimizer again. During training we used the coordinates of the degrees of freedom as our collocation points. We stopped the training when the loss did not decrease by more than a prescribed tolerance in the last $n = 5$ epochs, or when we reached the prescribed maximum number of epochs. An alternative stopping criterion on fine meshes could be early stopping, where the collocation points are split into a training and a validation set and the training stops when the loss on the validation set starts deviating from the loss on the training set, i.e. when the neural network begins to overfit on the training data.

In step 5, we project the neural network based solution into the enriched FEM space by evaluating it at the coordinates of the degrees of freedom, which yields a unique function $z_h^{l,(2)}$.
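The L-BFGS training in step 4 can be organized as in the following sketch. This is a minimal illustration under our own naming, not the actual implementation: the residual-loss routine is assumed to be something like the poisson_residual_loss sketch above, bound to the collocation points via a lambda, and the Adam fallback at saddle points as well as the loss-decrease stopping criterion are omitted for brevity.

```cpp
#include <torch/torch.h>

#include <cmath>

// Step 4 of Algorithm 2: minimize the strong adjoint residual with L-BFGS and
// restart with a fresh network if the loss explodes. 'LossFn' is any callable
// returning the residual loss of a given network.
template <typename LossFn>
AdjointNet train_adjoint(LossFn residual_loss, int64_t d, int max_epochs = 400)
{
  for (;;)  // restart loop
  {
    AdjointNet net(d);
    torch::optim::LBFGS lbfgs(net->parameters(),
                              torch::optim::LBFGSOptions(1.0).max_iter(20));
    bool exploded = false;

    for (int epoch = 0; epoch < max_epochs && !exploded; ++epoch)
    {
      auto closure = [&]() -> torch::Tensor {
        lbfgs.zero_grad();
        auto loss = residual_loss(net);
        loss.backward();
        return loss;
      };
      const double loss = lbfgs.step(closure).item<double>();
      exploded = !std::isfinite(loss);  // blow-up detected: discard this run
    }

    if (!exploded)
      return net;  // training finished without a blow-up
  }
}
```

A call could then look like `auto net = train_adjoint([&](AdjointNet &n) { return poisson_residual_loss(n, points, f); }, 2);`, where `points` and `f` are the collocation points and right-hand side values.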
In this section we consider two stationary problems (with in total four numerical tests) with our proposed approach. We consider both linear and nonlinear PDEs and goal functionals. The primal problem, i.e. the original PDE, is solved with bilinear shape functions. The adjoint PDE is solved by minimizing the residual of our neural network ansatz (Sections 3 and 4), and we project the solution into the biquadratic finite element space. For studying the performance, we also compute the adjoint problem with finite elements employing biquadratic shape functions. Finally, the neural network solution is plugged into the PU DWR error estimator (6), which decides which elements will be marked for refinement. To realize the numerical experiments, we couple deal.II [2] with LibTorch, the PyTorch C++ API [36].
At first we consider the two dimensional Poisson equation with homogeneous Dirichlet conditions on the unit square. In our ansatz (8), we choose the function
\[
d_{\partial\Omega}(x, y) = x(1 - x)\,y(1 - y).
\]
Poisson's problem is given by
\[
-\Delta u = f \ \text{ in } \Omega := (0, 1)^2, \qquad u = 0 \ \text{ on } \partial\Omega.
\]
For a linear goal functional $J : V \to \mathbb{R}$ the adjoint problem then reads: Find $z \in H_0^1(\Omega)$ such that
\[
(\nabla\psi, \nabla z) = J(\psi) \quad \forall \psi \in H_0^1(\Omega).
\]
Here $(\cdot, \cdot)$ denotes the $L^2$ inner product, i.e. $(f, g) := \int_\Omega f \cdot g \, \mathrm{d}x$.

As a first numerical example of a linear goal functional, we consider the mean value goal functional
\[
J(u) = \frac{1}{|\Omega|} \int_\Omega u \, \mathrm{d}x.
\]
The adjoint PDE can be written as
\[
(\nabla\psi, \nabla z) = \Big(\psi, \frac{1}{|\Omega|}\Big), \qquad \text{i.e.} \qquad -\Delta z = \frac{1}{|\Omega|} \ \text{ in } \Omega, \quad z = 0 \ \text{ on } \partial\Omega.
\]
We trained a fully connected neural network with two hidden layers with 32 neurons each and the hyperbolic tangent activation function for 400 epochs on 1,000 uniformly sampled points. In [39] it has been shown that wider and deeper neural networks can achieve a lower $L^2$ error between the neural network and the analytical solutions. However, if we use the support points of the FEM mesh as the collocation points, we cannot use bigger neural networks, since we do not have enough training data. Therefore, we decided to use smaller networks.

We compared our neural network based error estimator with a standard finite element based error estimator:

Ref.  DoFs   J(u) − J(u_h)   Est. error (FEM)   Est. error (NN)   I_eff (FEM)   I_eff (NN)
0        9   1.17e-2         1.15e-2            1.14e-2           0.979         0.971
1       25   3.17e-3         3.14e-3            3.14e-3           0.992         0.990
2       81   8.10e-4         8.08e-4            8.08e-4           0.998         0.998
3      289   2.03e-4         2.04e-4            2.04e-4           1.00          1.00
4     1089   5.03e-5         5.11e-5            5.11e-5           1.02          1.02
5     4225   1.20e-5         1.28e-5            1.28e-5           1.07          1.07

Figure 2: Section 5.1.1: Error estimator results for mean value goal functional.

In this numerical test the neural network refined in the same way as the finite element method, and both error indicators yield effectivity indices $I_{\mathrm{eff}}$ of approximately $1.0$, which means that the exact error and the estimated error were almost identical. The error reduction is of second order, as to be expected, and the overall results confirm well similar computations presented in [43, Table 1].

In the second numerical example, we analyze the mean value goal functional which is only computed on a subset $D \subset \Omega$ of the domain. We choose $D$ as an axis-parallel square inside $\Omega$. For the regional goal functional
\[
J(u) = \frac{1}{|D|} \int_D u \, \mathrm{d}x,
\]
the strong form of the adjoint PDE is given by
\[
-\Delta z = \frac{\chi_D}{|D|},
\]
where $\chi_D$ is the indicator function of $D$. The rest of the training setup is the same as for the previous goal functional.

We obtain the following computational results when comparing finite elements with our neural network approach:

(Results table: Ref., DoFs, $J(u) - J(u_h)$, estimated error, and $I_{\mathrm{eff}}$, each for FEM and NN.)

Both error indicators again yield comparable effectivity indices $I_{\mathrm{eff}}$.

Figure 4: Section 5.1: Grid refinement with regional mean value goal functional (panels (a) FEM and (b) NN).

On the grids which have been refined with the two approaches, we can see that the finite element method creates a symmetrical grid refinement. This symmetry cannot be observed in the neural network based refinement. Furthermore, our approach refined a few more elements than FEM, but overall our methodology still produced a reasonable grid adaptivity.
5.1.3 Mean squared value goal functional

In this third numerical test, an example of a nonlinear goal functional is the mean squared value, which reads
\[
J(u) = \frac{1}{|\Omega|} \int_\Omega u^2 \, \mathrm{d}x.
\]
For a nonlinear goal functional the adjoint problem then needs to be modified to (see also (3) in Section 2): Find $z \in H_0^1(\Omega)$ such that
\[
(\nabla\psi, \nabla z) = J'(u)(\psi) \quad \forall \psi \in H_0^1(\Omega).
\]
Computing the Fréchet derivative of the mean squared value goal functional, we can rewrite the adjoint problem as
\[
(\nabla\psi, \nabla z) = \Big(\psi, \frac{2u}{|\Omega|}\Big),
\]
which can be transformed into its strong form
\[
-\Delta z = \frac{2u}{|\Omega|}.
\]
Our training setup also changed slightly. The problem statement has become more difficult, and we decided to use slightly bigger networks to compute a sufficiently good approximation of the adjoint solution. We used three hidden layers with 32 neurons each and retrained the neural network on each grid, since the primal solution is part of the adjoint PDE.

(Results table: Ref., DoFs, $J(u) - J(u_h)$, estimated error, and $I_{\mathrm{eff}}$, each for FEM and NN.)

The effectivity indices $I_{\mathrm{eff}}$ are stable without major oscillations.

Figure 6: Section 5.1: Grid refinement with mean squared value goal functional (panels (a) FEM and (b) NN).
In the second numerical problem, we now consider the case where both the PDE and the goal functional are nonlinear. We add the scaled nonlinear term $u^2$ to the previous equation, such that the new problem is given by
\[
-\Delta u + \gamma u^2 = f \ \text{ in } \Omega, \qquad u = 0 \ \text{ on } \partial\Omega,
\]
with $\gamma > 0$. For our nonlinear goal functional, we choose the mean squared value goal functional from the previous example. The adjoint problem thus reads: Find $z \in H_0^1(\Omega)$ such that
\[
(\nabla\psi, \nabla z) + 2\gamma\,(\psi, z u) = \Big(\psi, \frac{2u}{|\Omega|}\Big) \quad \forall \psi \in H_0^1(\Omega),
\]
with corresponding strong form
\[
-\Delta z + 2\gamma\, z u = \frac{2u}{|\Omega|}.
\]
The training setup is the same as for the previous goal functional. For $\gamma = 50$ we obtain the following results:

(Results table: Ref., DoFs, $J(u) - J(u_h)$, estimated error, and $I_{\mathrm{eff}}$, each for FEM and NN.)

Figure 8: Section 5.2: Grid refinement for the nonlinear PDE (panels (a) FEM and (b) NN).
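For completeness, the linearization behind the adjoint problem of this nonlinear example can be sketched as follows (a short calculation re-deriving the forms stated above). The semi-linear form is
\[
A(u)(\varphi) = (\nabla u, \nabla\varphi) + \gamma\,(u^2, \varphi) - (f, \varphi),
\]
so that the directional derivative with respect to $u$ in direction $\psi$ is
\[
A'(u)(\psi, z) = \frac{\mathrm{d}}{\mathrm{d}\varepsilon} A(u + \varepsilon\psi)(z)\Big|_{\varepsilon = 0}
= (\nabla\psi, \nabla z) + 2\gamma\,(u\psi, z),
\]
and similarly $J'(u)(\psi) = \frac{2}{|\Omega|}(u, \psi)$ for $J(u) = \frac{1}{|\Omega|}\int_\Omega u^2 \, \mathrm{d}x$. Inserting both into the adjoint problem (3) gives the weak form above, and testing with $\psi \in H_0^1(\Omega)$ and integrating by parts yields the strong form $-\Delta z + 2\gamma u z = 2u/|\Omega|$.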
In this work, we proposed neural network guided a posteriori error estimation with the dual weighted residual method. Specifically, we computed the adjoint solution with feedforward neural networks with two or three hidden layers. To use existing FEM software, we first solved the adjoint PDE with neural networks and then projected the solution into the FEM space of the adjoint PDE. We demonstrated experimentally that neural network based solutions of the strong formulation of the adjoint PDE yield excellent approximations for dual weighted residual error estimates. Therefore, neural networks might be an effective way to compute adjoint sensitivities within goal-oriented error estimators for certain problems when the number of degrees of freedom is high. Furthermore, they admit greater flexibility, being a meshless method, and it would be interesting to investigate in future works how different choices of collocation points influence the quality of the error estimates. A sophisticated choice of collocation points could lead to a significant speedup over the finite element method for a high number of degrees of freedom. However, an important current limitation of our methodology is that we work with the strong formulation of the PDE, whose derivation from the weak formulation can be very involved for more complex problems, e.g. multiphysics. Hence, if an energy minimization formulation exists, this should be a viable alternative to our strong form of the adjoint PDE. This alternative problem can be solved with neural networks with the "Deep Ritz Method" [13, 45]. Nevertheless, the energy minimization formulation does not exist for all partial differential equations. For this reason, in the future we are going to analyze neural network based methods which work with the variational formulation, e.g. VPINNs [26].
Acknowledgements
This work is supported by the Deutsche Forschungsgemeinschaft (DFG) under Germany’s Excellence Strategy within the Cluster of Excellence PhoenixD (EXC 2122, Project ID 390833453).
References

[1] M. Ainsworth and J. T. Oden.
A Posteriori Error Estimation in Finite Element Analysis. Pure and Applied Mathematics (New York). Wiley-Interscience [John Wiley & Sons], New York, 2000. [2] D. Arndt, W. Bangerth, B. Blais, T. C. Clevenger, M. Fehling, A. V. Grayver, T. Heister, L. Heltai, M. Kronbichler, M. Maier, P. Munch, J.-P. Pelteret, R. Rastak, I. Thomas, B. Turcksin, Z. Wang, and D. Wells. The deal.II library, version 9.2.
Journal of Numerical Mathematics , 28(3):131–146,2020.[3] W. Bangerth and R. Rannacher.
Adaptive Finite Element Methods for Differential Equations. Birkhäuser Verlag, 2003. [4] R. Becker and R. Rannacher. A feed-back approach to error control in finite element methods: basic analysis and examples.
East-West J. Numer. Math., 4:237–264, 1996. [5] R. Becker and R. Rannacher. An optimal control approach to a posteriori error estimation in finite element methods.
Acta Numerica 2001 , 10:1 – 102, 05 2001.[6] J. Berg and K. Nyström. A unified deep artificial neural network approach to partial differentialequations in complex geometries.
Neurocomputing , 317:28–41, 2018.[7] M. Braack and A. Ern. A posteriori control of modeling errors and discretization errors.
Multiscale Modeling & Simulation, 1(2):221–238, 2003. [8] M. Braack and T. Richter. Solutions of 3D Navier-Stokes benchmark problems with adaptive finite elements.
Computers & Fluids , 35(4):372 – 392, 2006.[9] I. Brevis, I. Muga, and K. G. van der Zee. A machine-learning minimal-residual (ML-MRes) frame-work for goal-oriented finite element discretizations.
Computers & Mathematics with Applications ,2020.[10] P. Ciarlet.
The Finite Element Method for Elliptic Problems . Classics in Applied Mathematics.Society for Industrial and Applied Mathematics, 2002.[11] P. I. Cilliers and M. M. Botha. Goal-Oriented Error Estimation for the Method of Moments toCompute Antenna Impedance.
IEEE Antennas and Wireless Propagation Letters , 19(6):997–1001,2020.[12] G. Cybenko. Approximation by superpositions of a sigmoidal function.
Mathematics of Control,Signals and Systems , 2(4):303–314, 1989.[13] W. E and B. Yu. The Deep Ritz Method: A Deep Learning-Based Numerical Algorithm forSolving Variational Problems.
Communications in Mathematics and Statistics , 6(1):1–12, Mar2018.[14] B. Endtmayer.
Multi-goal oriented a posteriori error estimates for nonlinear partial differential equations. PhD thesis, Johannes Kepler University Linz, 2021. [15] B. Endtmayer, U. Langer, and T. Wick. Multigoal-Oriented Error Estimates for Non-linear Problems.
Journal of Numerical Mathematics , 27(4):215–236, 2019.[16] B. Endtmayer, U. Langer, and T. Wick. Two-side a posteriori error estimates for the dual-weightedresidual method.
SIAM Journal on Scientific Computing , 42(1):A371–A394, 2020.[17] B. Endtmayer, U. Langer, and T. Wick. Reliability and efficiency of dwr-type a posteriori error es-timates with smart sensitivity weight recovering.
Computational Methods in Applied Mathematics, 2021. [18] K. Eriksson, D. Estep, P. Hansbo, and C. Johnson. Introduction to adaptive methods for differential equations. In A. Iserles, editor,
Acta Numerica 1995, pages 105–158. Cambridge University Press, 1995. [19] L. Failer and T. Wick. Adaptive time-step control for nonlinear fluid-structure interaction.
Journalof Computational Physics , 366:448 – 477, 2018.[20] D. Gilbarg and N. S. Trudinger.
Elliptic Partial Differential Equations of Second Order, Classicsin Mathematics , volume 224. Springer Berlin Heidelberg, Berlin, Heidelberg, 2001.[21] M. Giles and E. Süli. Adjoint methods for pdes: a posteriori error analysis and postprocessing byduality.
Acta Numerica 2002, pages 145–236, 2002. A. Iserles, ed. [22] D. Hartmann, C. Lessig, N. Margenberg, and T. Richter. A neural network multigrid solver for the Navier-Stokes equations, arXiv:2008.11520, 2020. [23] O. Hennigh, S. Narasimhan, M. A. Nabian, A. Subramaniam, K. Tangsali, M. Rietmann, J. del Aguila Ferrandis, W. Byeon, Z. Fang, and S. Choudhry. NVIDIA SimNet
™: an AI-accelerated multi-physics simulation framework, arXiv:2012.07938, 2020. [24] C. Higham and D. Higham. Deep learning: an introduction for applied mathematicians.
SIAMReview , 61(4):860–891, 2019.[25] K. Hornik. Approximation capabilities of multilayer feedforward networks.
Neural Networks ,4(2):251 – 257, 1991.[26] E. Kharazmi, Z. Zhang, and G. E. Karniadakis. Variational Physics-Informed Neural NetworksFor Solving Partial Differential Equations, arXiv:1912.00873, 2019.[27] D. P. Kingma and J. Ba. Adam: A Method for Stochastic Optimization, arXiv:1412.6980, 2017.[28] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, and A. Anandkumar.Fourier neural operator for parametric partial differential equations, arXiv:2010.08895, 2020.[29] D. C. Liu and J. Nocedal. On the limited memory BFGS method for large scale optimization.
Mathematical Programming , 45(1):503–528, Aug 1989.[30] L. Lu, X. Meng, Z. Mao, and G. E. Karniadakis. DeepXDE: A Deep Learning Library for SolvingDifferential Equations.
SIAM Review , 63(1):208–228, 2021.[31] L. Lyu, K. Wu, R. Du, and J. Chen. Enforcing exact boundary and initial conditions in the deepmixed residual method, arXiv:2008.01491, 2020.[32] M. Maier and R. Rannacher. A duality-based optimization approach for model adaptivity inheterogeneous multiscale problems.
Multiscale Modeling & Simulation , 16(1):412–428, 2018.[33] S. A. Mattis and B. Wohlmuth. Goal-oriented adaptive surrogate construction for stochasticinversion.
Computer Methods in Applied Mechanics and Engineering , 339:36 – 60, 2018.[34] J. Nocedal and S. J. Wright.
Numerical Optimization . Springer, New York, NY, USA, secondedition, 2006.[35] J. T. Oden. Adaptive multiscale predictive modelling.
Acta Numerica , 27:353–450, 2018.[36] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein,L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy,B. Steiner, L. Fang, J. Bai, and S. Chintala. Pytorch: An imperative style, high-performancedeep learning library. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché Buc, E. Fox,and R. Garnett, editors,
Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. [37] J. Peraire and A. Patera. Bounds for linear-functional outputs of coercive partial differential equations: local indicators and adaptive refinement. In P. Ladeveze and J. Oden, editors,
Advancesin Adaptive Computational Methods in Mechanics , pages 199–215. Elsevier, Amsterdam, 1998.[38] A. Pinkus. Approximation theory of the MLP model in neural networks.
Acta Numerica , 8:143–195, 1999.[39] M. Raissi, P. Perdikaris, and G. E. Karniadakis. Physics-informed neural networks: A deeplearning framework for solving forward and inverse problems involving nonlinear partial differentialequations.
Journal of Computational Physics , 378:686–707, 2019.[40] M. Raissi, A. Yazdani, and G. E. Karniadakis. Hidden fluid mechanics: Learning velocity andpressure fields from flow visualizations.
Science , 2020.[41] R. Rannacher and J. Vihharev. Adaptive finite element analysis of nonlinear problems: balancingof discretization and iteration errors.
Journal of Numerical Mathematics , 21(1):23 – 62, 01 Mar.2013.[42] T. Richter. Goal-oriented error estimation for fluid-structure interaction problems.
ComputerMethods in Applied Mechanics and Engineering , 223-224:28 – 42, 2012.[43] T. Richter and T. Wick. Variational localizations of the dual weighted residual estimator.
Journalof Computational and Applied Mathematics , 279(0):192 – 208, 2015.[44] D. Rumelhart, G. Hinton, and R. Williams. Learning representations by back-propagating errors.
Nature , 323:533–536, 1986.[45] E. Samaniego, C. Anitescu, S. Goswami, V. Nguyen-Thanh, H. Guo, K. Hamdia, X. Zhuang, andT. Rabczuk. An energy approach to the solution of partial differential equations in computationalmechanics via machine learning: Concepts, implementation and applications.
Computer Methodsin Applied Mechanics and Engineering , 362:112790, 2020.[46] J. Sirignano and K. Spiliopoulos. Dgm: A deep learning algorithm for solving partial differentialequations.
Journal of Computational Physics , 375:1339–1364, 2018.[47] R. van der Meer, C. Oosterlee, and A. Borovykh. Optimally weighted loss functions for solvingPDEs with Neural Networks, arXiv:2002.06269, 2020.[48] H. Wessels, C. Weißenfels, and P. Wriggers. The neural particle method - An updated Lagrangianphysics informed neural network for computational fluid dynamics.
Computer Methods in AppliedMechanics and Engineering , 368:113127, 2020.[49] K. Zee, E. Brummelen, I. Akkerman, and R. Borst. Goal-oriented error estimation and adaptiv-ity for fluid-structure interaction using exact linearized adjoints.