Bayesian neural networks for weak solution of PDEs with uncertainty quantification
Xiaoxuan Zhang, Krishna Garikipati*
Department of Mechanical Engineering, University of Michigan, United States
Department of Mathematics, University of Michigan, United States
Michigan Institute for Computational Discovery & Engineering, University of Michigan, United States
*Corresponding author. E-mail address: [email protected]
Abstract
Solving partial differential equations (PDEs) is the canonical approach for understanding the behavior of physical systems. However, large-scale solutions of PDEs using state-of-the-art discretization techniques remain an expensive proposition. In this work, a new physics-constrained neural network (NN) approach is proposed to solve PDEs without labels, with a view to enabling high-throughput solutions in support of design and decision-making. Distinct from existing physics-informed NN approaches, where the strong form or weak form of PDEs is used to construct the loss function, we write the loss function of NNs based on the discretized residual of PDEs through an efficient, convolutional operator-based, and vectorized implementation. We explore an encoder-decoder NN structure for both deterministic and probabilistic models, with Bayesian NNs (BNNs) for the latter, which allow us to quantify both epistemic uncertainty from model parameters and aleatoric uncertainty from noise in the data. For BNNs, the discretized residual is used to construct the likelihood function. In our approach, both deterministic and probabilistic convolutional layers are used to learn the applied boundary conditions (BCs) and to detect the problem domain. As both Dirichlet and Neumann BCs are specified as inputs to the NNs, a single NN can solve for similar physics, but with different BCs and on a number of problem domains. The trained surrogate PDE solvers can also make interpolating and, to a certain extent, extrapolating predictions for BCs that they were not exposed to during training. Such surrogate models are of particular importance for problems where similar types of PDEs need to be solved repeatedly with slight variations. We demonstrate the capability and performance of the proposed framework by applying it to different steady-state and equilibrium boundary value problems with physics that spans diffusion, linear elasticity, and nonlinear elasticity.

Keywords: weak form constrained neural network · Bayesian neural network · uncertainty quantification · nonlinear elasticity · surrogate PDE solver · partial differential equations

1 Introduction

Solving partial differential equations (PDEs) is crucial for many scientists and engineers to understand the behavior of different physical systems. Popular numerical methods to solve PDEs include, but are not limited to, the finite element method (FEM), the finite difference method, the finite volume method, and Fourier methods, where each has its own advantages and limitations. Among these methods, the FEM is arguably the most widely used due to its flexibility for solving problems with complex geometrical and irregular shapes. However, when a large number of simulations needs to be carried out, such as in homogenization, optimization, or inverse problems, these numerical methods can be very expensive; efficient surrogate models are therefore needed.

With the drastically increasing computational power of graphics processing units (GPUs), machine learning has gained great popularity in a wide range of applications in fields such as computer vision, speech recognition, and anomaly detection. It has also emerged as a powerful approach among data-driven methods for surrogate modeling in many materials, physics, and engineering applications [1, 2], such as material screening [3, 4], constitutive modeling [5–8], scale bridging [9–11], system identification [12–16], prediction of effective material properties [17–20], and characterization of nonlinear material response [21–27].
Attempts to use machine learning techniques to learn the full-field solution of physical systems are explored in [28–31], and many others. For example, [28] proposed a Bayesian approach to convolutional neural networks (CNNs) to study the flow problem in heterogeneous media with
uncertainty quantification. CNNs are also used to predict the velocity and pressure fields for aerodynamic problems in [30] and to predict concentration distributions for single-species reaction-diffusion systems in [31]. These approaches generally require a large number of full-field solutions as training data, either from experiments or direct numerical simulations (DNSs), which might not be easily available or might be expensive to obtain.

Another thrust in using machine learning techniques in the computational mechanics, materials, and physics community is to solve PDEs with little or no pre-labeled data [32–41]. For example, high-dimensional free-boundary PDEs are solved with fully connected NNs by reformulating PDEs as backward stochastic differential equations in [33]. The Deep Galerkin Method proposed in [34] is used to solve high-dimensional free-boundary PDEs with neural networks (NNs), which satisfy the differential operators, initial condition (IC), and boundary conditions (BCs). The Physics-Informed Neural Networks (PINNs) approach has been proposed to solve different transient systems [35]. In this approach, the strong form of PDEs, using either the continuous or discrete time derivative, is constructed and serves as part of the loss function, which further consists of contributions from the IC and BCs [35]. Various extensions of PINNs have been made to solve different systems [37, 38, 42–46]. Surrogate PDE solvers based on the weak/variational formulation are also studied in [47–51].

When solving PDE systems, it is important to quantify uncertainties from different sources, such as geometry representation, BCs, material parameters, and others, to better understand the systems and make reliable predictions. The sources of uncertainty can generally be categorized as either epistemic or aleatoric, where the former can be reduced by gathering more data or using a more sophisticated model, and the latter is less prone to reduction [52]. With machine learning techniques, probabilistic models can be constructed to easily quantify the uncertainty. Uncertainty quantification (UQ) with surrogate PDE solvers has been investigated in [37, 38, 53, 54]. For example, [53] employs the dropout technique proposed in [55] to quantify uncertainties of a surrogate solver that combines arbitrary polynomial chaos with PINNs. In [54], probabilistic PINNs are constructed based on latent variable models and trained with an adversarial inference process for UQ. A Bayesian framework is used in [37] for UQ, where the posterior distribution of the surrogate model parameters is constructed based on the Stochastic Weight Averaging Gaussian technique proposed in [56]. A Bayesian PINN is proposed in [38] for UQ, where PINNs are used to construct the likelihood function and either Hamiltonian Monte Carlo or variational inference (VI) techniques are used to estimate the posterior distribution.

With properly trained surrogate PDE solvers, there is interest in using them in problems, such as homogenization or optimization, to rapidly predict the response of the same PDE systems, but with different ICs or BCs, and potentially even on different problem domains. However, such goals are in general difficult to achieve with existing surrogate approaches, which typically enforce only one specific set of BCs via the loss function. It is very challenging to make predictions for new sets of BCs with such surrogate solvers without re-training them.
In this work, we aim to address these challenges by proposing a new physics-constrained NN to solve PDEs, where the BCs are specified as inputs to the NNs. Motivated by the FEM, which uses the weak formulation to completely define a physical system described by the governing PDEs and the associated BCs, and which solves the problem based on the discretized residual, we construct the discretized residual of PDEs from NN-predicted solutions to form the loss function, through an efficient, convolutional operator-based, and vectorized residual calculation implementation. As shown in Fig. 1, the weak PDE loss layers are independent of the NN that serves as the surrogate PDE solver and do not introduce any new trainable parameters. These features offer great flexibility in choosing the NN architecture. We focus on an encoder-decoder NN structure, which has been investigated for other physical systems [28–30]. We study both deterministic and probabilistic models, with Bayesian NNs (BNNs) for the latter. The encoder-decoder structure can be easily adapted to BNNs with the modularized probabilistic layers provided in the TensorFlow Probability (TFP) library. In our approach, deterministic/probabilistic convolutional NN layers are used to learn the applied BCs (both Dirichlet and Neumann) and to detect the problem domains through a carefully designed input data structure. Thus, with our approach, a single NN can be used to simultaneously solve different BVPs that are governed by the same PDEs but posed on different domains with different BCs. In addition, similar to other surrogate PDE solvers, our approach is label free. Furthermore, the trained surrogate solvers can make predictions for interpolated and, to a certain extent, extrapolated BCs that they were not exposed to during training.

In our BNNs, each model parameter is sampled from a posterior distribution. We solve for the posterior distribution of model parameters with the VI method instead of Markov Chain Monte Carlo (MCMC) sampling, as the latter involves expensive iterative inference steps and is not suitable for systems with a large number of parameters [57, 58]. In our work, the likelihood function is constructed based on the discretized PDE residuals. The BNNs allow us to quantify both epistemic uncertainty from model parameters and aleatoric uncertainty from noise in the data. In our study, an additive noise is applied to the NN-predicted solution, as in [16, 28, 36, 37, 59], to represent the aleatoric uncertainty. Such an additive noise represents potential errors from various sources, such as discretization error [37], geometry representation, boundary conditions, and material parameter measurement, among others.
The proposed framework is a generalized approach that is applicable to both steady-state and transient problems. In this work, we present the details of this new framework and its application to steady-state diffusion, linear elasticity, and nonlinear elasticity. We defer the investigation of transient problems to a subsequent work. To the authors' best knowledge, this is the first attempt to simultaneously solve PDEs on different domains with different BCs with a single surrogate solver, with the further feature of UQ. In this study, the problem domains are represented via pixels on a square background grid for simplicity. Thus, the boundary of a domain is not a smooth curve, but has a pixel-level resolution. One can refer to [30, 31, 60] and many others for strategies to map complex and irregular domains onto a regular grid mesh. Such geometry transformations can be taken into account in the proposed PDE loss layers with the isoparametric mapping concept of the FEM via the shape functions, though this is not the focus of this work.

The rest of the paper is organized as follows. In Section 2, we briefly summarize the general mathematical description of the type of physical systems studied in this work. The structures of the discretized residual constrained NNs used in this work are presented in Section 3. Section 4 provides the details of an efficient implementation of the discretized residual calculation. Section 5 covers the data structure of NN inputs, domain/boundary detection, the setup of BVPs, and the NN training procedures. Detailed simulation results are presented in Section 6, where steady-state diffusion, linear elasticity, and nonlinear elasticity are studied. Concluding remarks and perspectives are offered in Section 7.
In this work, we are interested in solving steady-state diffusion, linear elasticity, and nonlinear elasticity problems with discretized residual constrained NNs. These three physical systems are described by a general elliptic PDE on a domain $\Omega$ with the Dirichlet BC on $\Gamma_{\varphi}$ and the Neumann BC on $\Gamma_{k}$ as

$$\nabla \cdot \boldsymbol{A}(\boldsymbol{\varphi}) = \boldsymbol{0} ~\text{on}~ \Omega, \qquad \boldsymbol{\varphi}(\boldsymbol{X}) = \bar{\boldsymbol{\varphi}}(\boldsymbol{X}) ~\text{on}~ \Gamma_{\varphi}, \qquad \boldsymbol{k}(\boldsymbol{X}) = \bar{\boldsymbol{k}}(\boldsymbol{X}) ~\text{on}~ \Gamma_{k}, \qquad (1)$$

where $\boldsymbol{\varphi}(\boldsymbol{X})$ represents the location-dependent unknown variable and $\boldsymbol{X}$ is the coordinate. The overall boundary of the continuum body satisfies $\Gamma = \Gamma_{\varphi} \cup \Gamma_{k}$ and $\Gamma_{\varphi} \cap \Gamma_{k} = \emptyset$. It is worth mentioning that even though the bold typefaces $\boldsymbol{\varphi}$, $\boldsymbol{A}$, and $\boldsymbol{k}$ are used in (1), depending on the degrees of freedom (DOFs) of each physical system, they can represent scalar, vector, or tensor fields. For example, in the diffusion problem, $\boldsymbol{\varphi}$, $\boldsymbol{A}$, and $\boldsymbol{k}$ represent the compositional order parameter (scalar), the diffusive flux (vector), and the outward flux (scalar), respectively. In elasticity problems, $\boldsymbol{\varphi}$, $\boldsymbol{A}$, and $\boldsymbol{k}$ represent the deformation field (vector), the stress field (second-order tensor), and the surface traction (vector), respectively. The details of each system are provided in the numerical simulation section.

The weak form of (1) states: for variations $\boldsymbol{\omega}$ satisfying $\forall \boldsymbol{\omega} \in \mathcal{V}$ with $\mathcal{V} = \{\boldsymbol{\omega} \,|\, \boldsymbol{\omega} = \boldsymbol{0} ~\text{on}~ \Gamma_{\varphi}\}$, seek trial solutions $\boldsymbol{\varphi} \in \mathcal{S}$ with $\mathcal{S} = \{\boldsymbol{\varphi} \,|\, \boldsymbol{\varphi} = \bar{\boldsymbol{\varphi}} ~\text{on}~ \Gamma_{\varphi}\}$ such that

$$\int_{\Omega} \nabla\boldsymbol{\omega} \cdot \boldsymbol{A}(\boldsymbol{\varphi})\, dV - \int_{\Gamma_{k}} \bar{\boldsymbol{k}} \cdot \boldsymbol{\omega}\, dS = 0. \qquad (2)$$

Eq. (2) is obtained by multiplying (1$_1$) with $\boldsymbol{\omega}$, integrating by parts, and then incorporating the Neumann BC in (1$_3$). For the diffusion problem, $\boldsymbol{\omega}$ is a scalar field; for elasticity problems, $\boldsymbol{\omega}$ is a vector field.

To obtain approximate solutions of (2), finite-dimensional approximations of $\boldsymbol{\omega}$ and $\boldsymbol{\varphi}$, denoted by $\boldsymbol{\omega}^h$ and $\boldsymbol{\varphi}^h$, are constructed with $\forall \boldsymbol{\omega}^h \in \mathcal{V}^h = \{\boldsymbol{\omega}^h \,|\, \boldsymbol{\omega}^h = \boldsymbol{0} ~\text{on}~ \Gamma_{\varphi}\}$ and $\boldsymbol{\varphi}^h \in \mathcal{S}^h = \{\boldsymbol{\varphi}^h \,|\, \boldsymbol{\varphi}^h = \bar{\boldsymbol{\varphi}} ~\text{on}~ \Gamma_{\varphi}\}$. The discretized terms $\boldsymbol{\omega}^h$, $\nabla\boldsymbol{\omega}^h$, and $\boldsymbol{\varphi}^h$ are computed as

$$\boldsymbol{\omega}^h = \boldsymbol{N} \boldsymbol{d}^{\omega}, \qquad \nabla\boldsymbol{\omega}^h = \boldsymbol{B} \boldsymbol{d}^{\omega}, \qquad \boldsymbol{\varphi}^h = \boldsymbol{N} \boldsymbol{d}^{\varphi} \qquad (3)$$

in terms of the nodal solutions $\boldsymbol{d}^{\omega}$ and $\boldsymbol{d}^{\varphi}$, the basis functions $\boldsymbol{N}$, and the gradient matrix $\boldsymbol{B} = \nabla\boldsymbol{N}$. Inserting (3) into (2), we obtain the discretized residual by a sum over subdomains $\Omega_e$ and their associated boundaries $\Gamma_e$ as

$$\boldsymbol{R} = \sum_{e=1}^{n_{\text{elem}}} \left\{ \int_{\Omega_e} \boldsymbol{B}^T \boldsymbol{A}(\boldsymbol{\varphi}^h)\, dV - \int_{\Gamma_{e,k}} \boldsymbol{N}^T \bar{\boldsymbol{k}}\, dS \right\}, \qquad (4)$$

where $n_{\text{elem}}$ represents the total number of subdomains. The volume and surface integrations in (4) are evaluated numerically via Gaussian quadrature rules. In this work, the problem domain $\Omega$ is represented by images. Adjacent pixels in the images are used to form the subdomains $\Omega_e$, whose connectivity information is preserved automatically by the image data. The values at each pixel of the image are treated as nodal values. A more detailed discussion on constructing the subdomains based on image pixels is provided in Section 4.
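To make the quadrature evaluation of (4) concrete, the following is a minimal sketch of the bulk residual contribution of a single bilinear quadrilateral element, assuming the diffusive flux $\boldsymbol{A}(\varphi) = -D\nabla\varphi$ of the diffusion example treated later. The function names and the square element of edge length h are illustrative assumptions, not taken from the paper's implementation.

```python
import numpy as np

# Gauss points of the 2-point rule on the parent interval [-1, 1] (weights = 1)
GP = np.array([-1.0, 1.0]) / np.sqrt(3.0)

def shape_grads(xi, eta):
    """dN/d(xi, eta) of a bilinear quad; nodes ordered (-,-), (+,-), (-,+), (+,+)."""
    return 0.25 * np.array([[-(1 - eta), -(1 - xi)],
                            [ (1 - eta), -(1 + xi)],
                            [-(1 + eta),  (1 - xi)],
                            [ (1 + eta),  (1 + xi)]])  # shape: (4 nodes, 2 dims)

def element_residual(d, D=1.0, h=1.0):
    """Bulk term of Eq. (4) for one square element: sum_i w_i det(J) B^T A(phi^h)."""
    r = np.zeros(4)
    for xi in GP:
        for eta in GP:
            B = shape_grads(xi, eta) * (2.0 / h)  # parent -> physical derivatives
            flux = -D * (B.T @ d)                 # A(phi) = -D grad(phi) at this point
            r += (h / 2.0) ** 2 * (B @ flux)      # det(J) = (h/2)^2, weight = 1
    return r
```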
Figure 1: Illustration of the discretized residual constrained NNs, which consist of an encoder-decoder NN structure and a group of layers (the weak PDE loss layers, producing R_bulk, R_Neu, R_tot, and the reduced R_tot^red) to compute the PDE-related loss. The NNs can be either deterministic or probabilistic. The weak PDE loss layers take both NN inputs and outputs as their inputs. The Dirichlet BCs from the NN inputs (zero/non-zero, with the interior filled with random numbers) are applied to the NN-predicted solution to compute the bulk residual. The Neumann residual is directly computed based on the non-zero Neumann BCs from the inputs.

In this section, we present the formulation of discretized residual constrained deterministic/probabilistic NNs for solving PDEs for given BCs without labels.
We first consider deterministic NNs, whose parameters $\boldsymbol{\Theta}$ are represented by single values instead of distributions as in probabilistic NNs. In our approach, the NNs take image data that contains information on both Dirichlet and Neumann BCs as inputs, and output the full-field weak solutions of the PDEs associated with the input BCs. As shown in (4), the discretized residual of PDEs consists of two contributions: one bulk term and one surface term. We propose the weak PDE loss layers, which are discussed in detail in Section 4, to compute the residual in the bulk and on the Neumann boundary. As illustrated in Fig. 1, the weak PDE loss layers are constructed based on NN-predicted solutions. These weak PDE loss layers only contain forward calculations based on the FEM, without introducing any new parameters to be optimized. Since the weak PDE loss layers are independent of the NN that serves as the surrogate PDE solver, and they do not introduce any new trainable parameters, there is flexibility in choosing the NN architecture. As shown in Fig. 1, we focus on an encoder-decoder NN structure, which has been investigated for other physical systems [28–30]. Material parameters can also be specified as inputs to the NNs to further enrich the flexibility of the proposed approach; for simplicity, only fixed material parameters are considered in the numerical sections of this work.

When using mini-batch optimization to train the discretized residual constrained deterministic NNs over a dataset $\mathcal{D}$, the batch loss $\mathcal{L}_i$ is written in terms of the reduced total residual $\boldsymbol{R}^{\text{red}}_{\text{tot}}$, as illustrated in Fig. 1, as

$$\mathcal{L}_i = \frac{1}{N} \sum_{n=1}^{N} \left( \boldsymbol{R}^{\text{red}}_{\text{tot}}(\mathcal{D}_i^n, \boldsymbol{\Theta}) \right)^2, \qquad (5)$$

for each mini-batch $i = 1, 2, \cdots, M$, with $N$ indicating the size of the data in each mini-batch. The detailed training process of the discretized residual constrained NNs is discussed in Section 5.4.
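As a minimal sketch, the batch loss (5) can be assembled from the reduced residual returned by the weak PDE loss layers along the following lines; the function name and the mean-of-squares reduction are illustrative assumptions.

```python
import tensorflow as tf

def deterministic_batch_loss(r_tot_red):
    """Eq. (5): mean squared reduced total residual over a mini-batch.

    r_tot_red: tensor of shape (N, ny, nx) holding R_tot^red for each example
    in the mini-batch; residuals at Dirichlet and margin pixels are already
    masked to zero by the weak PDE loss layers.
    """
    per_example = tf.reduce_sum(tf.square(r_tot_red), axis=[1, 2])
    return tf.reduce_mean(per_example)
```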
For probabilistic NNs, we consider BNNs, whose model parameters $\boldsymbol{\Theta}$ are stochastic and sampled from a posterior distribution $P(\boldsymbol{\Theta}|\mathcal{D})$ instead of being represented by single values as in deterministic NNs. The posterior distribution $P(\boldsymbol{\Theta}|\mathcal{D})$ is computed based on Bayes' theorem

$$P(\boldsymbol{\Theta}|\mathcal{D}) = \frac{P(\mathcal{D}|\boldsymbol{\Theta})\, P(\boldsymbol{\Theta})}{P(\mathcal{D})}, \qquad (6)$$

where $\mathcal{D}$ denotes the i.i.d. observations (training data) and $P$ represents the probability density function. In (6), $P(\mathcal{D}|\boldsymbol{\Theta})$ is the likelihood, $P(\boldsymbol{\Theta})$ is the prior probability, and $P(\mathcal{D})$ is the evidence function. The likelihood is the probability of $\mathcal{D}$ given $\boldsymbol{\Theta}$, which describes the probability of the observed data for given parameters $\boldsymbol{\Theta}$. A larger value of $P(\mathcal{D}|\boldsymbol{\Theta})$ means the probability of the observed data is larger, implying that $\boldsymbol{\Theta}$ is more reasonable. The prior needs to be specified before the Bayesian inference process [61].

To compute the posterior distribution of $\boldsymbol{\Theta}$, one can use popular sampling-based methods, such as MCMC. However, sampling methods involve expensive iterative inference steps and are difficult to use when datasets are large or models are very complex [57, 58, 62]. Alternatively, we can use VI, which approximates the exact posterior distribution $P(\boldsymbol{\Theta}|\mathcal{D})$ with a more tractable surrogate distribution $Q(\boldsymbol{\Theta})$ by minimizing the Kullback-Leibler (KL) divergence [57, 62, 63]

$$Q^* = \arg\min D_{\text{KL}}\left( Q(\boldsymbol{\Theta}) \,||\, P(\boldsymbol{\Theta}|\mathcal{D}) \right). \qquad (7)$$

Compared with MCMC, VI is faster and easier to scale to large datasets. We therefore explore it in this work, even though it is less rigorously studied than MCMC [57]. The KL divergence is computed as

$$D_{\text{KL}}\left( Q(\boldsymbol{\Theta}) \,||\, P(\boldsymbol{\Theta}|\mathcal{D}) \right) = \mathbb{E}_Q[\log Q(\boldsymbol{\Theta})] - \mathbb{E}_Q[\log P(\boldsymbol{\Theta}, \mathcal{D})] + \log P(\mathcal{D}), \qquad (8)$$

which requires computing the logarithm of the evidence, $\log P(\mathcal{D})$ in (6) [57]. Since $P(\mathcal{D})$ is hard to compute, it is challenging to directly evaluate the objective function in (7). Alternatively, we can optimize the so-called evidence lower bound (ELBO)

$$\text{ELBO}(Q) = \mathbb{E}_Q[\log P(\boldsymbol{\Theta}, \mathcal{D})] - \mathbb{E}_Q[\log Q(\boldsymbol{\Theta})] = \mathbb{E}_Q[\log P(\mathcal{D}|\boldsymbol{\Theta})] - D_{\text{KL}}\left( Q(\boldsymbol{\Theta}) \,||\, P(\boldsymbol{\Theta}) \right), \qquad (9)$$

which is equivalent to the KL divergence up to an added constant. So, the loss function for the BNN is written as

$$\mathcal{L} = D_{\text{KL}}\left( Q(\boldsymbol{\Theta}) \,||\, P(\boldsymbol{\Theta}) \right) - \mathbb{E}_Q[\log P(\mathcal{D}|\boldsymbol{\Theta})], \qquad (10)$$

which consists of a prior-dependent part and a data-dependent part, with the former being the KL divergence of the surrogate posterior distribution $Q(\boldsymbol{\Theta})$ and the prior $P(\boldsymbol{\Theta})$, and the latter being the negative log-likelihood cost.

Different methods are available for training NNs with stochastic weights, such as weight perturbation [63–65], activation perturbation [66], reparameterization [58], and many others. In this work, we follow a specific weight perturbation method, the so-called Flipout, proposed in [65]. Compared with other weight perturbation algorithms, which suffer from high variance of the gradient estimates because the same perturbation is shared by all training examples in a mini-batch, Flipout is an efficient method that decorrelates the gradients within a mini-batch by implicitly sampling pseudo-independent weight perturbations for each example, and thus reduces the variance of NNs with stochastic weights [65]. This method can be efficiently implemented in a vectorized manner with unbiased stochastic gradients. A brief description of Flipout is summarized here; readers are directed to Ref. [65] for details.
Flipout assumes that the perturbations of different weights are independent, and that the perturbation distribution is symmetric around zero. Under these assumptions, the perturbation distribution is invariant to element-wise multiplication by a random sign matrix. To minimize the loss $\mathcal{L}$, the distribution $Q(\boldsymbol{\Theta})$ can be described in terms of perturbations with $\boldsymbol{W} = \overline{\boldsymbol{W}} + \Delta\boldsymbol{W}$, where $\overline{\boldsymbol{W}}$ and $\Delta\boldsymbol{W}$ are the mean and a stochastic perturbation of $\boldsymbol{\Theta}$, respectively. Flipout uses a base perturbation $\widehat{\Delta\boldsymbol{W}}$ shared by all examples (training data points) in a mini-batch, and arrives at the perturbation for an individual example by multiplying $\widehat{\Delta\boldsymbol{W}}$ with a different rank-one sign matrix

$$\Delta\boldsymbol{W}_n = \widehat{\Delta\boldsymbol{W}} \circ \boldsymbol{r}_n \boldsymbol{s}_n^T, \qquad (11)$$

where the subscript $n$ indicates an individual example in a mini-batch, and the entries of the random vectors $\boldsymbol{r}_n$ and $\boldsymbol{s}_n$ are uniformly sampled from $\pm 1$. Using different perturbations for each example in a mini-batch rather than an identical
perturbation for all the examples in a mini-batch ensures the reduction of the variance of the stochastic gradients in Flipout during training. For BNNs, $\overline{\boldsymbol{W}}$ and $\widehat{\Delta\boldsymbol{W}}$ are related to the mean and standard deviation of the posterior distribution $Q(\boldsymbol{\Theta})$, which are obtained via backpropagation with stochastic optimization algorithms. For mini-batch optimization, the batch loss is written as

$$\mathcal{L}_i = \frac{1}{M} D_{\text{KL}}\left( Q(\boldsymbol{\Theta}) \,||\, P(\boldsymbol{\Theta}) \right) - \mathbb{E}_Q[\log P(\mathcal{D}_i|\boldsymbol{\Theta}^{(i)})], \qquad (12)$$

for each mini-batch $i = 1, 2, \cdots, M$ [64]. With (12), we have $\mathcal{L} = \sum_i \mathcal{L}_i$. Following [64], Monte Carlo (MC) sampling is used to approximate the expectation in (12) as

$$\mathcal{L}_i \approx \frac{1}{M} D_{\text{KL}}\left( Q(\boldsymbol{\Theta}) \,||\, P(\boldsymbol{\Theta}) \right) - \frac{1}{N} \sum_{n=1}^{N} \log P(\mathcal{D}_i^n|\boldsymbol{\Theta}^{(i)}), \qquad (13)$$

where $N$ is the size of each mini-batch dataset, and $\boldsymbol{\Theta}^{(i)}$ denotes the $i$th batch sample drawn from the posterior distribution $Q(\boldsymbol{\Theta})$. Even though only one set of parameters $\boldsymbol{\Theta}^{(i)}$ is drawn from $Q(\boldsymbol{\Theta})$ for each mini-batch, the perturbation approach proposed by Flipout ensures that the parameters differ for each individual example $\mathcal{D}_i^n$ when calculating the log-likelihood cost. Probabilistic dense layers and convolutional layers with the Flipout weight perturbation technique have been implemented in the TFP library and are used to construct the BNNs in this work.

As the probabilistic layers are implemented in the TFP library in a modularized form, we can easily construct the discretized residual constrained BNNs to have an encoder-decoder architecture similar to the deterministic model, as shown in Fig. 1, but with all weights drawn from probability distributions. The loss of the BNNs is given in (10). The probabilistic layers in the TFP library automatically calculate the prior-dependent KL divergence and add it to the total loss.

The data-dependent loss is accounted for by the likelihood function. In general, data contains noise that leads to aleatoric uncertainty, which cannot be reduced by training the surrogate model with more observations. Additive noise, which is independent of the data and is commonly treated as Gaussian, is often added to the output of the surrogate model to construct the likelihood function [16, 28, 36, 37, 59]. Such an additive noise represents potential errors from various sources, such as discretization error [37], geometry representation, boundary conditions, and material parameter measurement. Assuming a Gaussian noise $\boldsymbol{\epsilon} \sim \mathcal{N}(\boldsymbol{0}, \Sigma_0 \boldsymbol{I})$ with zero mean and a pre-specified constant covariance $\Sigma_0$, the NN output $\boldsymbol{y}$ is written as

$$\boldsymbol{y} = \boldsymbol{f}(\boldsymbol{x}, \boldsymbol{\Theta}) + \boldsymbol{\epsilon}, \qquad (14)$$

where $\boldsymbol{f}(\boldsymbol{x}, \boldsymbol{\Theta})$ represents the surrogate NNs. (Aleatoric uncertainty can further be categorized into homoscedastic and heteroscedastic uncertainty [67]. We assume the former for simplicity, with $\Sigma_0$ constant for different inputs; the latter is useful in cases where the output noise depends on the model inputs.) For discretized residual constrained NNs, the likelihood function is constructed based on the residual value rather than on NN-predicted solutions. The expected value of the point-wise residual is zero, which states that the governing PDEs are weakly satisfied at each location. This ensures that the proposed surrogate PDE solvers are label free. With the noise $\boldsymbol{\epsilon}$ in (14) propagating through the residual calculation, the likelihood function is written as

$$P(\boldsymbol{R}^{\text{red}}_{\text{tot}} \,|\, \boldsymbol{0}, \Sigma_1 \boldsymbol{I}) = \prod_{k=1}^{K} \mathcal{N}\left( R^{\text{red},k}_{\text{tot}} \,|\, 0, \Sigma_1 \right), \qquad (15)$$

where $k$ indicates the pixel location, with $K$ total pixels. As it is challenging to directly calculate $\Sigma_1$ via error propagation based on $\Sigma_0$, we treat $\Sigma_1$ as a learnable parameter to be optimized based on the NN loss. In (15), $\Sigma_1$ essentially serves as a threshold for the residual to converge to. (For systems where nonlinear operations are involved in the residual calculation, the residual noise distribution is in general non-Gaussian even if the noise in the NN outputs is assumed to be Gaussian. Under the conditions that $\Sigma_0$ is small and the nonlinear operations are smooth and approximately linear locally, we assume that the noise distribution of the residual is approximately Gaussian.)
The batch-wise loss of the residual constrained BNNs then has the following form:

$$\mathcal{L}_i \approx \frac{1}{M} D_{\text{KL}}\left( Q(\boldsymbol{\Theta}) \,||\, P(\boldsymbol{\Theta}) \right) - \frac{1}{N} \sum_{n=1}^{N} \sum_{k=1}^{K} \log\left( \mathcal{N}\left( R^{\text{red},k}_{\text{tot}}(\mathcal{D}_i^n, \boldsymbol{\Theta}^{(i)}) \,|\, 0, \Sigma_1 \right) \right). \qquad (16)$$

The detailed training process of the residual constrained BNNs is discussed in Section 5.4.
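A minimal sketch of how the pieces of (16) come together with the TFP Flipout layers is given below. The KL term of each probabilistic layer is accumulated by Keras in model.losses; the variable and function names, and the parameterization of $\Sigma_1$ as a single learnable variance, are illustrative assumptions.

```python
import tensorflow as tf
import tensorflow_probability as tfp

# A Flipout convolutional layer as used in the encoder-decoder BNN; its
# KL(Q || P) contribution is added automatically to model.losses.
conv = tfp.layers.Convolution2DFlipout(filters=16, kernel_size=5,
                                       padding="same", activation="relu")

sigma1 = tf.Variable(1e-8)  # learnable residual noise variance Sigma_1

def bnn_batch_loss(model, r_tot_red, num_batches):
    """Eq. (16): scaled KL divergence minus the Gaussian log-likelihood of
    the pixel-wise reduced residual around zero."""
    noise = tfp.distributions.Normal(loc=0.0, scale=tf.sqrt(sigma1))
    log_lik = tf.reduce_mean(tf.reduce_sum(noise.log_prob(r_tot_red), axis=[1, 2]))
    kl = tf.add_n(model.losses) / num_batches
    return kl - log_lik
```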
ANUARY
14, 2021
The BNNs allow us to quantify both epistemic uncertainty from model parameters and aleatoric uncertainty from noise in the data. With the discretized residual constrained BNNs, the posterior predictive distribution $P(\boldsymbol{y}^*|\boldsymbol{x}^*, \mathcal{D})$ for a specific testing data point $\{\boldsymbol{x}^*, \boldsymbol{y}^*\}$ is expressed as [28, 59]

$$P(\boldsymbol{y}^*|\boldsymbol{x}^*, \mathcal{D}) = \int P(\boldsymbol{y}^*|\boldsymbol{x}^*, \boldsymbol{\Theta})\, P(\boldsymbol{\Theta}|\mathcal{D})\, d\boldsymbol{\Theta} \approx \int P(\boldsymbol{y}^*|\boldsymbol{x}^*, \boldsymbol{\Theta})\, Q(\boldsymbol{\Theta})\, d\boldsymbol{\Theta}, \qquad (17)$$

which can be numerically evaluated via MC sampling as

$$P(\boldsymbol{y}^*|\boldsymbol{x}^*, \mathcal{D}) \approx \frac{1}{S} \sum_{s=1}^{S} P(\boldsymbol{y}^*|\boldsymbol{x}^*, \boldsymbol{\Theta}^s) \quad \text{with} \quad \boldsymbol{\Theta}^s \sim Q(\boldsymbol{\Theta}), \qquad (18)$$

with $s$ indicating each sample. To represent the uncertainty, we compute the statistical moments of $\boldsymbol{y}^*$ via the predictive expectation

$$\mathbb{E}[\boldsymbol{y}^*|\boldsymbol{x}^*, \mathcal{D}] \approx \frac{1}{S} \sum_{s=1}^{S} \boldsymbol{f}(\boldsymbol{x}^*, \boldsymbol{\Theta}^s) \qquad (19)$$

and the predictive variance

$$\text{Var}[\boldsymbol{y}^*|\boldsymbol{x}^*, \mathcal{D}] = \mathbb{E}[(\boldsymbol{y}^* + \boldsymbol{\epsilon})(\boldsymbol{y}^* + \boldsymbol{\epsilon})^T] - \mathbb{E}[\boldsymbol{y}^* + \boldsymbol{\epsilon}]\, \mathbb{E}[\boldsymbol{y}^* + \boldsymbol{\epsilon}]^T \approx \frac{1}{S} \sum_{s=1}^{S} \left( \boldsymbol{f}(\boldsymbol{x}^*, \boldsymbol{\Theta}^s)\, \boldsymbol{f}^T(\boldsymbol{x}^*, \boldsymbol{\Theta}^s) + \Sigma_0 \boldsymbol{I} \right) - \left( \frac{1}{S} \sum_{s=1}^{S} \boldsymbol{f}(\boldsymbol{x}^*, \boldsymbol{\Theta}^s) \right) \left( \frac{1}{S} \sum_{s=1}^{S} \boldsymbol{f}(\boldsymbol{x}^*, \boldsymbol{\Theta}^s) \right)^T. \qquad (20)$$

In this section, we describe the implementation details of the weak PDE loss layers. We heavily utilize convolutional operations and vector/matrix/tensor operations to achieve numerical efficiency. Readers are directed to our source code (github.com/mechanoChem/mechanoChemML) for additional details. As shown in Fig. 1, the weak PDE loss layers take both NN inputs (BC information) and outputs (the NN-predicted solution) as their inputs. The data structure used to represent the BCs is discussed in detail in Section 5.1. A schematic of the major implementation steps of the weak PDE loss layers is shown in Fig. 2. We choose a steady-state diffusion problem with a scalar unknown at each node for illustration purposes, with Dirichlet BCs applied on the left side, non-zero Neumann BCs applied on the bottom and right sides, and zero Neumann BCs on the top. Assuming that the output of the NN shown in Fig. 1 is a 5×5 matrix (an image with 5×5 pixels), denoted as $M_{\text{NN}}$, with the value of each entry being the actual concentration, $M_{\text{NN}}$ is equivalent to the nodal solution on a domain that is discretized with 4×4 elements, as shown in Fig. 2(a). (In the source code, $M_{\text{NN}}$ is stored with a third dimension of 1, which indicates the DOFs per node; for elasticity problems, the third dimension has a size of 2. Here, we drop this dimension to simplify the notation.) The implementation procedure is summarized in Algorithm Box 1.

The channel of the NN inputs with Dirichlet BC information is denoted as $I_{\text{D}}$. To enforce the Dirichlet BCs, we replace the nodal values of $M_{\text{NN}}$ at the locations of the Dirichlet boundary with the actual values of $I_{\text{D}}$ to obtain a new matrix, denoted as $M^{1}$, as indicated by the green color in Fig. 2(a). The Dirichlet BCs are then automatically incorporated into the residual vector during the bulk residual calculation discussed in the next section.

The matrix representation of the nodal solution automatically contains the element connectivity information of the mesh. To compute the bulk residual, we first apply convolutional operations to $M^{1}$ with the following kernels, each of which selects one of the four nodes of an element:

$$\boldsymbol{k}_{B,1} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}, \quad \boldsymbol{k}_{B,2} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad \boldsymbol{k}_{B,3} = \begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}, \quad \boldsymbol{k}_{B,4} = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}. \qquad (21)$$
Figure 2: Illustration of the implementation steps of computing the residual in the weak PDE loss layers: (a) nodal solution; (b) selected nodes by each kernel; (c) element-like nodal solution; (d) vectorized representation; (e) element-like nodal residual; (f) assemble residual (unfold); (g) Neumann BC representation and residual from Neumann BCs; (h) selected nodes by each kernel (groups I and II); (i) reduced total residual. Nodal values are replaced with the actual values at Dirichlet BCs. Paddings of zeros are not shown in (c,d,e).

Each convolutional operation results in a matrix corresponding to the selected nodes, as highlighted with colors in Fig. 2(b). (The raw convolution output has a size of 4×4; zero padding is used to keep a fixed 5×5 matrix size. Keeping the matrix size unchanged during the convolutional operations is not necessary and might require a small number of extra floating-point operations, but handling matrices of a fixed size is less error prone.) With these four convolutional operations, we now have a 5×5×4 matrix $M^{3D}$, as shown in Fig. 2(c). We then reshape this matrix into a two-dimensional array $M^{2D}$, as shown in Fig. 2(d).
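The node-selection step of Fig. 2(b-d) can be written with a single fixed (non-trainable) convolution, as in the minimal sketch below; the one-hot 2×2 kernels correspond to $\boldsymbol{k}_{B,1}, \ldots, \boldsymbol{k}_{B,4}$ in (21), while the function name and tensor layout are illustrative assumptions.

```python
import tensorflow as tf

def gather_element_nodes(m1):
    """m1: (batch, ny, nx, 1) nodal solution with Dirichlet values inserted.
    Returns the vectorized element representation M^2D, with one row of four
    nodal values per 2x2 element."""
    # One one-hot 2x2 kernel per element node; output channel c selects node c.
    kernels = tf.constant([[[[1., 0., 0., 0.]], [[0., 1., 0., 0.]]],
                           [[[0., 0., 1., 0.]], [[0., 0., 0., 1.]]]])  # (2,2,1,4)
    m3 = tf.nn.conv2d(m1, kernels, strides=1, padding="SAME")  # (batch, ny, nx, 4)
    return tf.reshape(m3, [tf.shape(m1)[0], -1, 4])
```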
Each row of $M^{2D}$ corresponds to the local nodal solution vector inside one finite element, the subdomain $\Omega_e$ in (4), which can then be used to efficiently evaluate the residual via matrix-vector operations.

To evaluate the residual of the steady-state diffusion problem with 2×2 Gauss points, the B-operator matrix in (35) is stored in the form $B^{3D}$ (Gauss points × dimensions × nodes), with its transpose denoted as $B^{T,3D}$. The bulk residual at each Gauss point $i$ is evaluated as

$$\left(R^{2D}_{\text{bulk}}\right)_i = \omega_i\, D\, M^{2D} B_i^{T} B_i, \qquad (22)$$

with $\omega_i$ denoting the quadrature weights. The total bulk residual is computed as

$$R^{2D}_{\text{bulk}} = \sum_{i=1}^{4} \left(R^{2D}_{\text{bulk}}\right)_i, \qquad (23)$$

as shown in Fig. 2(d). $R^{2D}_{\text{bulk}}$ is then reshaped to $R^{3D}_{\text{bulk}}$ and stored in the element-like form, as shown in Fig. 2(e). Next, we use the tf.roll function to unfold the element-like residual to the correct nodal positions, as shown in Fig. 2(f), with

$$R^{3D,1}_{\text{bulk}} = R^{3D,1}_{\text{bulk}}, \quad R^{3D,2}_{\text{bulk}} = \text{tf.roll}(R^{3D,2}_{\text{bulk}}, [1], [2]), \quad R^{3D,3}_{\text{bulk}} = \text{tf.roll}(R^{3D,3}_{\text{bulk}}, [1], [1]), \quad R^{3D,4}_{\text{bulk}} = \text{tf.roll}(R^{3D,4}_{\text{bulk}}, [1,1], [1,2]). \qquad (24)$$

The assembly operation in (35) for the bulk integration is now achieved by the tf.reduce_sum function applied to $R^{3D}_{\text{bulk}}$, without looping over all the elements as done traditionally in the FEM. Readers are directed to our source code for the implementation of the linear/nonlinear elasticity problems.

One channel of the inputs contains purely Neumann BCs, denoted as $I_{\text{Neu}}$ and shown in Fig. 2(g), where the matrix contains non-zero entries only at the non-zero Neumann boundary locations. The Neumann residual needs to be evaluated within surface elements. Similar to computing the bulk residual, we apply convolutional operations to $I_{\text{Neu}}$ to construct surface elements. Two sets of kernels are used to construct two groups of surface elements, with group I for computing the residual on the top and bottom edges, and group II for the left and right edges. We use the two kernels $\boldsymbol{k}_{I,1}$ and $\boldsymbol{k}_{I,2}$ (25), each a binary matrix selecting one of the two nodes of a surface element, to construct the surface elements $I^{3D}_{\text{Neu,I}}$ of the first group, with the selected nodal information shown in Fig. 2(h-I), and the kernels $\boldsymbol{k}_{II,1}$ and $\boldsymbol{k}_{II,2}$ (26) to construct the surface elements $I^{3D}_{\text{Neu,II}}$ of the second group, with the selected nodal information shown in Fig. 2(h-II).

Similar to the bulk residual calculation, we form two matrices, $I^{2D}_{\text{Neu,I}}$ and $I^{2D}_{\text{Neu,II}}$, to compute the Neumann residual. We use 2 Gauss points for the surface integration. The shape functions $N$ in (35) are stored in the form $N^{2D}$ (Gauss points × nodes). We evaluate the Neumann residual at each Gauss point $i$ via

$$\left(R^{2D}_{\text{Neu,I}}\right)_i = \omega_i\, I^{2D}_{\text{Neu,I}} N_i^{T} N_i \quad \text{and} \quad \left(R^{2D}_{\text{Neu,II}}\right)_i = \omega_i\, I^{2D}_{\text{Neu,II}} N_i^{T} N_i, \qquad (27)$$

with $\omega_i$ denoting the quadrature weights. The total Neumann residual is computed as

$$R^{2D}_{\text{Neu,I}} = \sum_{i=1}^{2} \left(R^{2D}_{\text{Neu,I}}\right)_i \quad \text{and} \quad R^{2D}_{\text{Neu,II}} = \sum_{i=1}^{2} \left(R^{2D}_{\text{Neu,II}}\right)_i. \qquad (28)$$

Again, we use the tf.roll function to unfold the element-like residual to the correct nodal positions, similar to Fig. 2(f): for group I,

$$R^{3D,1}_{\text{Neu,I}} = R^{3D,1}_{\text{Neu,I}}, \quad R^{3D,2}_{\text{Neu,I}} = \text{tf.roll}(R^{3D,2}_{\text{Neu,I}}, [1], [1]), \qquad (29)$$

and for group II,

$$R^{3D,1}_{\text{Neu,II}} = R^{3D,1}_{\text{Neu,II}}, \quad R^{3D,2}_{\text{Neu,II}} = \text{tf.roll}(R^{3D,2}_{\text{Neu,II}}, [1], [2]). \qquad (30)$$

The assembly operation in (35) for the surface integration is now achieved by tf.reduce_sum applied to $R^{3D}_{\text{Neu,I}}$ and $R^{3D}_{\text{Neu,II}}$, without looping over elements. We obtain the final Neumann residual

$$R^{3D}_{\text{Neu}} = R_{\text{Neu,I}} + R_{\text{Neu,II}}. \qquad (31)$$

The total residual $R_{\text{tot}}$ in (35), as shown in Fig. 1, is computed as

$$R^{3D}_{\text{tot}} = R^{3D}_{\text{bulk}} - R^{3D}_{\text{Neu}} \qquad (32)$$

by applying the Neumann residual to the bulk residual. To construct the deterministic loss in (5) and the likelihood function in (15), the reduced residual $R^{\text{red}}_{\text{tot}}$, obtained by excluding the residual at the Dirichlet boundary locations from $R_{\text{tot}}$, is used, as shown in Fig. 2(i). It is worth mentioning that additional auxiliary matrix/vector/tensor operations, not included in this description, have been introduced to complete this efficient residual evaluation. Readers are invited to refer to our code for the detailed implementation.
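The unfold-and-assemble step of (24) and Fig. 2(e,f) can be sketched as follows, assuming an element-like residual stored with one channel per element node; the function name is an illustrative assumption.

```python
import tensorflow as tf

def assemble_bulk_residual(r3):
    """r3: (batch, ny, nx, 4) element-like residual R^3D_bulk.
    Each channel is rolled back to its nodal position (Eq. 24) and the
    channels are summed, replacing the traditional FEM assembly loop."""
    r = tf.stack([r3[..., 0],
                  tf.roll(r3[..., 1], shift=[1], axis=[2]),
                  tf.roll(r3[..., 2], shift=[1], axis=[1]),
                  tf.roll(r3[..., 3], shift=[1, 1], axis=[1, 2])], axis=-1)
    return tf.reduce_sum(r, axis=-1)  # (batch, ny, nx) assembled nodal residual
```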
Algorithm 1: Residual calculation for the steady-state diffusion example.

Bulk residual with applied Dirichlet BCs, $R^{3D}_{\text{bulk}}$:
1. Apply Dirichlet BCs to the NN-predicted solution $M_{\text{NN}}$ by replacing the nodal values at the corresponding locations to obtain $M^{1}$ (Fig. 2a).
2. Convert $M^{1}$ from the nodal value representation to a four-node element representation $M^{3D}$ by convolutional operations with kernels $\boldsymbol{k}_{B,1}$, $\boldsymbol{k}_{B,2}$, $\boldsymbol{k}_{B,3}$, and $\boldsymbol{k}_{B,4}$ (Fig. 2b, 2c). For elasticity problems, NN-predicted solutions have two channels to represent both $u_x$ and $u_y$; the same four kernels are applied to both channels, resulting in an element representation with a third dimension of 8 instead of 4.
3. Get the vectorized representation $M^{2D}$, with each row being the local nodal solutions of one element (Fig. 2d).
4. Compute the bulk residual $R^{2D}_{\text{bulk}}$ for each element (Fig. 2d). Readers are directed to our source code for details of the bulk residual calculation of linear/nonlinear elasticity systems.
5. Switch back to the matrix representation of the element-like nodal residual $R^{3D}_{\text{bulk}}$ (Fig. 2e).
6. Assemble the bulk residual $R^{3D}_{\text{bulk}}$ (Fig. 2f).

Residual at Neumann BCs, $R^{3D}_{\text{Neu}}$:
1. Use kernels $\boldsymbol{k}_{I,1}$, $\boldsymbol{k}_{I,2}$ and $\boldsymbol{k}_{II,1}$, $\boldsymbol{k}_{II,2}$ to construct two groups of two-node surface elements $I^{3D}_{\text{Neu,I}}$ and $I^{3D}_{\text{Neu,II}}$. For elasticity problems, we have four groups of surface elements, with two for $T_x$ and two for $T_y$.
2. Get the vectorized representation of the surface elements $I^{2D}_{\text{Neu,I}}$ and $I^{2D}_{\text{Neu,II}}$.
3. Compute the residuals $R^{2D}_{\text{Neu,I}}$ and $R^{2D}_{\text{Neu,II}}$ at the Neumann BCs.
4. Switch back to the matrix representation of the element-like nodal residuals $R^{3D}_{\text{Neu,I}}$ and $R^{3D}_{\text{Neu,II}}$.
5. Assemble the residual at the Neumann BCs, $R^{3D}_{\text{Neu}}$.

Reduced total residual, $R^{\text{red}}_{\text{tot}}$:
1. Create a mask matrix $M^{2D}_{\text{bulk}}$ based on $I_{\text{D}}$ to represent the pixel locations with valid bulk residual values. The entries of $M^{2D}_{\text{bulk}}$ are zero for the components of $I_{\text{D}}$ with a value of −2, which indicates the margins between the actual problem domain and the background grid (see more details in Section 5.1). For the steady-state diffusion examples, all entries of $M^{2D}_{\text{bulk}}$ are one.
2. Create a reverse mask matrix $M^{2D}_{\text{D,rev}}$ based on $I_{\text{D}}$ to represent the pixel locations that are not on the Dirichlet boundary. The entries of $M^{2D}_{\text{D,rev}}$ are zero at the entry locations of $I_{\text{D}}$ with a value larger than zero.
3. Compute the total residual $R_{\text{tot}}$ based on (32).
4. Multiply (element-wise) $R_{\text{tot}}$ with $M^{2D}_{\text{D,rev}}$ and $M^{2D}_{\text{bulk}}$ to get $R^{\text{red}}_{\text{tot}}$.

In this section, we present details on the data structure of NN inputs, domain/boundary detection, the setup of BVPs, and the NN training procedure.
Figure 3: Illustration of the data structure of NN inputs for the steady-state diffusion problem. The NN inputs contain two channels, with the first containing the Dirichlet BCs (red) and the second containing the Neumann BCs (blue). Only the boundary locations have values that are greater than 0. In the first channel, the problem domain (gray color) is filled with a value of −1, which serves as an indicator to fill with random numbers during the training process. The margin part (white region) is filled with a value of −2; the residual contribution from this region is excluded when computing $R^{\text{red}}_{\text{tot}}$. In the second channel, the problem domain is filled with a value of 0. Similarly, the margin is filled with a value of −2.

Since the discretized residual constrained NNs do not require labels for training, the NN inputs are synthetically generated with only information on the problem domains and the applied BCs. We consider a fixed square background grid of $[0, 1] \times [0, 1]$, with $nx$ and $ny$ total pixels in each dimension. For both diffusion and elasticity problems, each input data point is a three-dimensional matrix $I^{nx, ny, 2 \times \text{DOF}}$ representing one set of BCs. The first two indices of $I$ indicate the pixel locations in the X- and Y-directions. For the steady-state diffusion problem, with one scalar DOF per node, there are two channels in the third dimension, which contain the information of the Dirichlet and Neumann BCs, respectively. For elasticity problems, there are four channels in the third dimension, with the first two channels containing the Dirichlet BCs in the X- and Y-directions and the last two channels containing the Neumann BCs in the X- and Y-directions, respectively. Data normalization over $[-1, 1]$ is used to ensure that all the physically meaningful data in our study has a value greater than 0.

The structure of the input data is illustrated in Fig. 3 with the diffusion problem as an example. In our study, the problem domain does not necessarily occupy the whole background grid, which results in the margin region shown in Figs. 3 and 4. The channel(s) containing Dirichlet BCs are filled with −1 over the problem domain, except for the Dirichlet boundary values, which are greater than 0. The auxiliary number −1 serves as an indicator to be filled with random numbers during the training process. The margin region, which represents the space between the background grid and the problem domain, if there is any, is filled with −2. The auxiliary number −2 serves as an indicator to evaluate $R^{\text{red}}_{\text{tot}}$ with the residual in this region excluded. The channel(s) containing Neumann BCs are filled with a value of 0 over the problem domain, except for the Neumann boundary values. When convolutional kernels operate on the problem domain, only the boundary makes a non-zero contribution. Similarly, the margin is filled with a value of −2 to assist the calculation of $R^{\text{red}}_{\text{tot}}$. Examples of the actual inputs for steady-state diffusion are shown in Fig. 6(a,b).

As discussed in Section 5.1, a fixed value of −2 is assigned to the margins. When calculating the residual, a mask matrix is created for domain detection. This mask matrix is created based on the information of the Dirichlet BCs from the inputs and ensures that only the residual inside the actual problem domain is evaluated. One can refer to [30, 31, 60] and many others for strategies to map complex and irregular domains onto a regular grid mesh. Such geometry transformations can easily be taken into account in the proposed PDE loss layers with the isoparametric mapping concept of the FEM via the shape functions.
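As a minimal sketch of this input data structure (Fig. 3), the following builds one synthetic two-channel input for a diffusion BVP on the background grid; the function name, the interior rectangle, and the particular boundary values are illustrative assumptions.

```python
import numpy as np

def make_diffusion_input(nx, ny, dirichlet_value=0.8, flux_value=0.6):
    """Two-channel input of Fig. 3: channel 0 = Dirichlet BCs, channel 1 = Neumann BCs."""
    x = np.full((nx, ny, 2), -2.0, dtype=np.float32)  # margin indicator: -2
    x[1:nx - 1, 1:ny - 1, 0] = -1.0  # domain, Dirichlet channel: fill randomly later
    x[1:nx - 1, 1:ny - 1, 1] = 0.0   # domain, Neumann channel: zeros
    x[1, 1:ny - 1, 0] = dirichlet_value  # Dirichlet BC (> 0) on one edge
    x[nx - 2, 1:ny - 1, 1] = flux_value  # non-zero Neumann BC on the opposite edge
    return x
```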
The proposed approach, using a mask matrix for domain detection, should also be applicable to other parametric domain representations, though this is not the focus of this work.

In our study, each input data point represents a unique BVP: a specific problem domain with a specific set of applied BCs. (For elasticity problems, the inputs contain four channels, with the first two representing the Dirichlet BCs for $\bar{u}_x$ and $\bar{u}_y$ and the last two representing the Neumann BCs for $\bar{T}_x$ and $\bar{T}_y$. The auxiliary numbers −1 and −2 are arbitrary choices with no physical meaning; users can choose different values to assign to the margin and the problem domain in the inputs.)
Figure 4: Illustration of the setup of BVPs on different domains for different physics. In these drawings, red represents a zero Dirichlet BC, green represents a non-zero Dirichlet BC, and blue represents a non-zero Neumann BC. No color is assigned to zero Neumann BCs. (a) Setup of five rectangular simulation domains of different sizes and locations on a fixed background grid with different applied BCs. For steady-state diffusion, 4 sets of BCs are assigned to each simulation domain, leading to 20 diffusion BVPs. For linear/nonlinear elasticity, 6 sets of BCs are assigned to each simulation domain, leading to 30 linear/nonlinear elasticity BVPs. (b) Setup of one diffusion BVP on an octagonal simulation domain with mixed BCs. (c) Setup of one linear elasticity BVP on an L-shaped simulation domain with the bottom edge fixed and the left edge loaded vertically.

To detect the Dirichlet BCs during NN training, the input data is first passed to a customized Keras layer called
LayerFillRandomNumber, which fills the pixel locations with values of −1 in the Dirichlet BCs channel with uniformly generated random numbers in the range $[0, 1)$. As the problem domain is filled with random numbers, the convolutional kernels eventually only pick up and learn the actual Dirichlet boundary values. The data structure in the Neumann BCs channel automatically ensures that the kernels learn the information of the BCs, as the problem domain is filled with zeros.

To demonstrate the performance of our proposed method, we investigate different setups of BVPs for the three considered physical systems. Specifically, we consider rectangular, octagonal, and L-shaped simulation domains, as shown in Fig. 4. For the rectangular domain, we allow its size and location to change with respect to a fixed background grid. (The fixed background grid is necessary to ensure that the same NN structure can be used to solve PDEs on different simulation domains.) Five rectangular domains are considered, as shown in Fig. 4(a). For steady-state diffusion, four unique sets of BCs are assigned to each domain, resulting in 20 different diffusion BVPs. For linear/nonlinear elasticity, six unique sets of BCs are assigned to each domain, resulting in 30 different linear/nonlinear elasticity BVPs. On the octagonal domain, we solve the steady-state diffusion problem with one specific set of BCs, as shown in Fig. 4(b). On the L-shaped domain, we solve linear elasticity with one specific set of BCs, as shown in Fig. 4(c). The NN inputs corresponding to these BVP setups are synthetically generated to train the discretized residual constrained NNs. To compare the solution accuracy between NNs and DNSs, we also solve these BVPs with mechanoChemFEM (available at github.com/mechanoChem/mechanoChemFEM), a publicly available multiphysics code developed by us based on the deal.II library [68].
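Returning to the LayerFillRandomNumber layer introduced above, a minimal sketch of such a customized layer is shown below, assuming the −1 flag convention of Section 5.1; this is an illustrative reimplementation, not the library's code.

```python
import tensorflow as tf

class LayerFillRandomNumber(tf.keras.layers.Layer):
    """Replace pixels flagged with -1 by fresh U[0, 1) samples on every call,
    so the kernels can only learn the actual Dirichlet boundary values."""
    def call(self, inputs):
        rand = tf.random.uniform(tf.shape(inputs))
        return tf.where(tf.equal(inputs, -1.0), rand, inputs)
```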
Figure 5: Illustration of the deterministic NN predicted solution at different epochs for a diffusion BVP setup with domain id 5 and BCs id 2 (flux loading from the right edge), as shown in Fig. 4(a). Top: without zero initialization. Bottom: with zero initialization for the first 100 epochs.
Algorithm 2: Training procedure for deterministic NNs.
1. Load the NN inputs, with each data point being a unique set of BCs.
2. Augment the inputs by duplicating them multiple times to generate $\mathcal{D}$.
3. Split $\mathcal{D}$ into training, validation, and testing datasets.
4. Set up the encoder-decoder deterministic NN structure, with the first layer being a customized layer that fills the locations with values of −1 in $\mathcal{D}$ with uniform random numbers in $[0, 1)$, to ensure $\mathcal{D}$ is i.i.d.
5. for epoch < total epochs do: batch train the NNs.
   - if zero initialization is used and epoch < total zero initialization epochs, then use dummy labels with values of 0.5, which is equivalent to an actual zero before data normalization, to form an MSE loss to train the NN;
   - else use $R^{\text{red}}_{\text{tot}}$ to form the deterministic loss to train the NN.
6. Make predictions.
For deterministic NNs, a fixed learning rate is used to batch optimize the loss function (5) to solve the PDE systems. In our study, we found that problems loaded with Dirichlet BCs converge faster than cases loaded with Neumann BCs, and the proposed approach sometimes fails to find the correct solution for the latter case. This observation holds for all three considered systems. This is mainly because, for the latter case, it is essentially the gradient of the unknown(s) that drives the loss down, instead of the unknown(s) itself as in the former case. We demonstrate this by showing the NN-predicted solution at different epochs in Fig. 5 for a diffusion BVP setup with domain id 5 and BCs id 2 (see Fig. 4a), with zero concentration on the left edge and non-zero flux on the right edge. The top row of Fig. 5 shows that the NN-predicted concentration changes very slowly, with a front progressing from the left edge (zero Dirichlet BC) to the right edge (flux BC), and the solution is not yet close to the DNS results.

This challenge arises mainly because the parameters of the NNs are randomly initialized. NN-predicted solutions at the earlier training stage are random numbers close to zero. Since data normalization is used, the NNs output scaled results, with zero being equivalent to an actual value of −1. Such random outputs can easily violate the governing equations, e.g. resulting in a deformation gradient with a negative determinant in nonlinear elasticity. Recalling that the solution vector in the FEM is generally initialized to zero, we adopt the same approach for the NNs. For the first few epochs, we train the NNs with dummy labels with values of 0.5 (equivalent to an actual value of 0), without enforcing the PDE constraint. We call this the zero initialization process. This process helps to improve the initialization of the NN parameters. After the zero initialization procedure is completed, the PDE constraints are enabled to train the NNs to solve the PDE systems. We found that this remedy drastically improves the training results. In addition, it also speeds up the overall training process, as shown in the bottom row of Fig. 5, where the NN-predicted solutions approach the DNS results at 500 epochs, much faster than the case without the zero initialization process. The training process for deterministic NNs is summarized in Algorithm Box 2.
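A minimal sketch of the zero initialization stage of Algorithm 2 is given below; the loop structure and names are illustrative assumptions, with the dummy label of 0.5 and the switch to the residual loss taken from the procedure above.

```python
import tensorflow as tf

def train(model, dataset, epochs, zero_init_epochs, residual_loss_fn,
          optimizer=tf.keras.optimizers.Nadam(2.5e-4)):
    for epoch in range(epochs):
        for x in dataset:
            with tf.GradientTape() as tape:
                y = model(x, training=True)
                if epoch < zero_init_epochs:
                    # Zero initialization: fit dummy labels of 0.5 (a scaled zero)
                    loss = tf.reduce_mean(tf.square(y - 0.5))
                else:
                    # Weak-PDE constraint: loss from the reduced total residual
                    loss = residual_loss_fn(x, y)
            grads = tape.gradient(loss, model.trainable_variables)
            optimizer.apply_gradients(zip(grads, model.trainable_variables))
```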
For probabilistic NNs, the proposed approach can successfully solve a single BVP. However, when we try to solve multiple BVPs, we notice that the BNNs converge faster for purely Dirichlet problems (boundary ids 1 and 3 for the diffusion problem, and boundary ids 1 and 4 for elasticity problems) than for those with non-zero Neumann BCs. Once the BNNs converge to a sub-optimal state, it is very challenging to optimize them further for other BVPs with Neumann BCs. To overcome this challenge, we first train deterministic NNs with an architecture identical to that of the BNNs. Once the deterministic NNs have converged to a desired state, we initialize the mean of the posterior distribution of the parameters in the BNNs with the optimized parameters from the deterministic model. We call this the optimal parameter initialization process. During the subsequent training of the BNNs, similar to [37], we use a small learning rate to explore the local parameter space around these optimized parameters. The training process for BNNs is summarized in Algorithm Box 3.

Algorithm 3: Training procedure for BNNs.
1. Load the NN inputs, with each data point being a unique set of BCs.
2. Augment the inputs by duplicating them multiple times to generate $\mathcal{D}$.
3. Split $\mathcal{D}$ into training, validation, and testing datasets.
4. Set up the encoder-decoder probabilistic NN structure, with the first layer being a customized layer that fills the locations with values of −1 in $\mathcal{D}$ with uniform random numbers in $[0, 1)$.
5. if optimal parameter initialization is used, then load the optimized parameters from the deterministic NNs to initialize the mean of the posterior distribution of the BNN parameters; else use random initialization for the posterior distribution of the BNN parameters.
6. for epoch < total epochs do: batch train the NNs.
   - if zero initialization is used and epoch < total zero initialization epochs and optimal parameter initialization is not used, then use dummy labels with values of 0.5, which is equivalent to an actual zero before data normalization, to form an MSE loss to train the NN;
   - else use $R^{\text{red}}_{\text{tot}}$ to form the likelihood loss to train the NN.
7. MC sampling for UQ.

In this section, the discretized residual constrained NNs are used to solve the BVP setups presented in Section 5 for steady-state diffusion, linear elasticity, and nonlinear elasticity, to demonstrate the capability and performance of the proposed framework.
In this section, we use the proposed method to solve different steady-state diffusion problems. (Usually, the number of unique sets of BCs is small compared to $\mathcal{D}$, which is augmented multiple times. Thus, even though the dataset is split into training, validation, and testing groups, each group could potentially contain all the unique BCs. The difference among these dataset groups lies in the interior domain, which is filled with random numbers.)
Table 1: Details of both deterministic and probabilistic NNs for solving 20 steady-state diffusion BVPs.

| Deterministic | Probabilistic | Size | Layer arguments |
| Input | Input | - | - |
| LayerFillRandomNumber | LayerFillRandomNumber | - | - |
| Conv2D | Convolution2DFlipout | filters = 8 | kernel (5,5), padding: same, ReLU |
| MaxPooling2D | MaxPooling2D | - | kernel (2,2), padding: same |
| Conv2D | Convolution2DFlipout | filters = 16 | kernel (5,5), padding: same, ReLU |
| MaxPooling2D | MaxPooling2D | - | kernel (2,2), padding: same |
| Conv2D | Convolution2DFlipout | filters = 16 | kernel (5,5), padding: same, ReLU |
| MaxPooling2D | MaxPooling2D | - | kernel (2,2), padding: same |
| Flatten | Flatten | - | - |
| Dense | DenseFlipout | units = 64 | ReLU |
| Dense | DenseFlipout | units = 64 | ReLU |
| Reshape | Reshape | - | [4, 4, 4] |
| Conv2D | Convolution2DFlipout | filters = 16 | kernel (5,5), padding: same, ReLU |
| UpSampling2D | UpSampling2D | - | size (2,2) |
| Conv2D | Convolution2DFlipout | filters = 16 | kernel (5,5), padding: same, ReLU |
| UpSampling2D | UpSampling2D | - | size (2,2) |
| Conv2D | Convolution2DFlipout | filters = 16 | kernel (5,5), padding: same, ReLU |
| Conv2D | Convolution2DFlipout | filters = 1 | kernel (5,5), padding: same, ReLU |

Table 2: Training related parameters for solving 20 steady-state diffusion BVPs (Aug: data augmentation).

| Description | Deterministic | Probabilistic |
| Total parameters | 33,209 | 66,202 |
| Size of D | 20 BCs × Aug | 20 BCs × Aug |
| Epochs | 20,000 | 5,000 |
| Zero initialization epochs | 100 | - |
| Optimizer | Nadam | Nadam |
| Learning rate | 2.5e-4 | 1e-8 |
| Batch size | 256 | 64 |
| Σ₀ | - | 1e-8 |
| Initial value of Σ₁ | - | 1e-8 |

The general description of an elliptic PDE system given in (1) is rewritten as

$$\nabla \cdot \boldsymbol{H} = 0 ~\text{on}~ \Omega, \qquad C(\boldsymbol{X}) = \bar{C}(\boldsymbol{X}) ~\text{on}~ \Gamma_C, \qquad H = \bar{H}(\boldsymbol{X}) ~\text{on}~ \Gamma_H \qquad (33)$$

for the one-species steady-state diffusion problem. In (33), $C$ represents the compositional order parameter, $\boldsymbol{H}$ is the diffusive flux, defined as

$$\boldsymbol{H} = -D \nabla C, \qquad (34)$$

with $D$ the diffusivity, and $H$ is the outward surface flux in the normal direction. (In Sections 6.1.2 and 6.1.3, the inward flux has a positive sign.) The discretized residual function (4) for steady-state diffusion is written as

$$\boldsymbol{R} = \sum_{e=1}^{n_{\text{elem}}} \left\{ \int_{\Omega_e} \boldsymbol{B}^T \boldsymbol{H}\, dV - \int_{\Gamma_{e,H}} \boldsymbol{N}^T \bar{H}\, dS \right\}. \qquad (35)$$

A diffusivity $D = 1.0$ is used in both the DNSs and the surrogate PDE solver.

In this section, we use the proposed PDE constrained NNs to simultaneously solve 20 steady-state diffusion BVPs, as shown in Fig. 4(a), with a resolution of 16×16. The architectures of both the deterministic and probabilistic NNs and other training related NN parameters are summarized in Tables 1 and 2, respectively.
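A minimal Keras sketch of the deterministic column of Table 1 is given below; the probabilistic variant replaces Conv2D/Dense with tfp.layers.Convolution2DFlipout/DenseFlipout. The customized LayerFillRandomNumber first layer is omitted here, and the builder name is an illustrative assumption.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_encoder_decoder(input_shape=(16, 16, 2)):
    conv = lambda f: layers.Conv2D(f, 5, padding="same", activation="relu")
    return tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        conv(8),  layers.MaxPooling2D(2, padding="same"),   # 16 -> 8
        conv(16), layers.MaxPooling2D(2, padding="same"),   # 8 -> 4
        conv(16), layers.MaxPooling2D(2, padding="same"),   # 4 -> 2
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(64, activation="relu"),
        layers.Reshape([4, 4, 4]),
        conv(16), layers.UpSampling2D(2),                   # 4 -> 8
        conv(16), layers.UpSampling2D(2),                   # 8 -> 16
        conv(16),
        layers.Conv2D(1, 5, padding="same", activation="relu"),
    ])
```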
Figure 6: Results of two selected BVPs out of the 20 steady-state diffusion BVPs with varying domains and different applied BCs, simultaneously solved by a single NN with the proposed method. BVPs (i) and (ii) correspond to BC ids 1 and 2 for domain id 1, as shown in Fig. 4(a). (a, b) NN inputs for the different BVPs. (c, e) Solutions from DNS, deterministic (det) NNs, and BNNs (Mean, Std.) for the different BVPs. (d, f) Quantitative comparison of the solution distribution between DNS and BNNs along the dashed line.

The NN hyperparameters are manually tuned to achieve the desired performance. We follow the training procedures in Algorithm Boxes 2 and 3 to first train the deterministic NN with zero initialization, followed by training the BNNs with the optimal parameter initialization process. The results of two selected BVPs are shown in Fig. 6, with the remaining results from other setups given in Appendix A.1.1. The statistical moments of the BNN predictions are evaluated based on 50 MC samples. In Fig. 6, BVPs (i) and (ii) correspond to BC id 1 (non-zero Dirichlet loading) and BC id 2 (non-zero Neumann loading) applied to domain id 1. The NN inputs for both BVPs are shown in Fig. 6(a,b), in which only the red colored regions are physically meaningful, with values > 0. The comparison of solutions among the DNSs, the deterministic NN, and the BNN for these two BVPs is shown qualitatively in Fig. 6(c,e), with a quantitative comparison of the solution distribution along the dashed line between the DNSs and the BNN given in Fig. 6(d,f). (As Dirichlet BCs are enforced on the NN-predicted solutions, the uncertainty at the Dirichlet boundary locations is not evaluated; a zero standard deviation of the solution at these locations is shown in Fig. 6(c,e).) This comparison shows that the proposed method has successfully solved the BVPs with the desired accuracy. We further observe from Fig. 6(f) that the uncertainty at the locations with the Neumann BCs is higher than at other places, which is expected.

The training losses for both the deterministic and probabilistic NNs are given in Fig. 7(a, b). The negative loss in Fig. 7(b) is reasonable, because the total loss of the BNNs in (16) consists of two terms: the first term is non-negative, whereas the second can be either positive or negative depending on the values of both $R^{\text{red}}_{\text{tot}}$ and $\Sigma_1$. The evolution of $\Sigma_1$ from the BNN is shown in Fig. 7(c); it converges to a specific value during training. The evolution of $\Sigma_1$ is correlated with the sign change of the BNN loss. To further evaluate the relation between the BNN predicted results and the value of $\Sigma_1$, we report the solution distribution along the dashed line for epochs 100, 500, 1000, 2000, and 4000, indicated by the vertical lines in Fig. 7(c). The results for both BVPs (i) and (ii) are presented in Fig. 8, which shows that the solutions from the BNN are stable during training regardless of the evolution of $\Sigma_1$. Such behavior is expected, as the BNN is initialized with optimal parameters from the deterministic NN and is trained with a very small learning rate to only explore the local parameter space around the optimized parameters.
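A minimal sketch of the MC evaluation of the statistical moments (Eqs. 18-20, with a diagonal covariance) is given below; the function name is an illustrative assumption, and sigma0 denotes the aleatoric noise variance of (14).

```python
import numpy as np

def predictive_moments(model, x, num_samples=50, sigma0=1e-8):
    """Predictive mean and standard deviation from S stochastic forward
    passes of a Flipout BNN (each call samples new weight perturbations)."""
    samples = np.stack([np.asarray(model(x)) for _ in range(num_samples)])
    mean = samples.mean(axis=0)          # Eq. (19)
    var = samples.var(axis=0) + sigma0   # Eq. (20): epistemic + aleatoric parts
    return mean, np.sqrt(var)
```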
Figure 7: NN training information for the results shown in Fig. 6. (a) Loss of the deterministic NN. (b) Loss of the BNN. (c) Evolution of Σ for the BNN.

Figure 8: Solution distribution from the BNN along the dashed line at epochs 100, 500, 1000, 2000, and 4000 for BVP (i) (top) and BVP (ii) (bottom).

To further evaluate the relation between the BNN-predicted results and the value of Σ, we report the solution distribution along the dashed line at epochs 100, 500, 1000, 2000, and 4000, indicated by the vertical lines in Fig. 7(c). The results for both BVP (i) and BVP (ii) are presented in Fig. 8, which shows that the solutions from the BNN are stable during training regardless of the evolution of Σ. Such behavior is expected, as the BNN is initialized with the optimal parameters from the deterministic NN and is trained with a very small learning rate that only explores the local parameter space around the optimized parameters. The change of the probabilistic loss in Fig. 7(b) is attributed to the initial value of Σ, which differs from its converged value. Based on the observations in Fig. 8, it is therefore reasonable to train the BNNs for only a small number of epochs before evaluating the statistical moments of the BNN-predicted solutions. For the remaining simulations in Section 6, the BNNs are trained for 100 epochs before the statistical moments of the related quantities are evaluated.

The example in the previous section is fairly simple and is essentially a one-dimensional problem. In this section, we use the proposed PDE-constrained NNs to solve steady-state diffusion on an octagonal domain with mixed BCs, as shown in Fig. 4(b), whose solution is nonlinear in both the X- and Y-directions. To keep the discussion concise, the architectures of both deterministic and probabilistic NNs, along with other training-related information, are provided in Appendix A.1.2. We follow the procedures described in Section 5.4 to train both types of NNs at two output resolutions.

Similar to the development in Section 6.1.2, we compare the solutions from the DNSs, the deterministic NN, and the BNN at the two output resolutions qualitatively in Fig. 9(a,b), with a quantitative comparison of the solution distribution along both the horizontal and vertical dashed lines between the DNSs and the BNN given in Fig. 9(c-f).
Figure 9: Steady-state diffusion BVP on an octagonal domain with mixed BCs at two output resolutions. (a, b) Solutions from the DNS, deterministic (det) NNs, and BNNs (Mean, Std.) at the coarser and finer output resolutions, respectively. (c-f) Quantitative comparison of the solution distribution between the DNS and the BNN along the horizontal and vertical dashed lines, which shows that the NN solutions are more accurate at the finer output resolution.

Standard deviations along the bottom edge and the top-right corner edge in Fig. 9(a,b) are not evaluated and are assigned a value of zero, because Dirichlet BCs are enforced on the NN-predicted solutions along these edges. Fig. 9 shows that the NN results at both output resolutions are comparable with the DNS solution. The comparison between Figs. 9(c,e) and 9(d,f) further shows that, as expected, the NN solutions improve with a finer output resolution. The uncertainty of the NN-predicted solution along the horizontal line is larger than that along the vertical line. This is reasonable, as the nonlinearity of the solution along the horizontal line is higher than that along the vertical line. This example and the example in the previous section demonstrate that the proposed framework can properly recognize both regular and irregular problem domains, learn different BCs, and simultaneously solve multiple BVPs.

In this section, we use the proposed method to solve different linear elasticity problems.
The general description of the elliptic PDE system given in (1) is rewritten as

\[
\nabla \cdot \boldsymbol{\sigma} = \boldsymbol{0} \ \text{on } \Omega, \qquad
\boldsymbol{u}(\boldsymbol{X}) = \bar{\boldsymbol{u}}(\boldsymbol{X}) \ \text{on } \Gamma_u, \qquad
\boldsymbol{T} = \bar{\boldsymbol{T}}(\boldsymbol{X}) \ \text{on } \Gamma_T, \tag{36}
\]

for the linear elasticity problem.
Figure 10: Illustration of the deformed shapes of selected linear elasticity BVPs. The wireframe and the gray region indicate the undeformed and the deformed problem domain, respectively. (a) Three selected BVPs out of the 30 BVPs solved in Section 6.2.2. (b) The L-shape BVP solved in Section 6.2.3.

| Description | Deterministic | Probabilistic |
|---|---|---|
| Total parameters | 34,010 | 67,803 |
| Size of D | × (Aug: ×) | × (Aug: ×) |
| Epochs | 20,000 | 100 |
| Zero initialization epochs | 100 | - |
| Optimizer | Nadam | Nadam |
| Learning rate | 2.5e-4 | 1e-8 |
| Batch size | 128 | 64 |
| Learning rate of Σ | - | 1e-8 |
| Initial value of Σ | - | 1e-8 |

Table 3: Training-related parameters for solving 30 linear elasticity BVPs. Aug: data augmentation.

In (36), u represents the displacement field, σ is the stress tensor, and T is the surface traction. Here, σ is related to the strain ε = ½(∇u + (∇u)^T) via the constitutive relationship

\[
\boldsymbol{\sigma} = \lambda \, \mathrm{tr}(\boldsymbol{\varepsilon}) \, \boldsymbol{1} + 2 \mu \boldsymbol{\varepsilon}, \tag{37}
\]

where λ and μ are the Lamé constants and 1 is the second-order identity tensor. The discretized residual function (4) for the linear elasticity problem is written as

\[
\boldsymbol{R} = \sum_{e=1}^{n_{\mathrm{elem}}} \left\{ \int_{\Omega^{e}} \boldsymbol{B}^{T} \boldsymbol{\sigma} \, dV - \int_{\Gamma^{e,T}} \boldsymbol{N}^{T} \bar{\boldsymbol{T}} \, dS \right\}. \tag{38}
\]

A set of material parameters with λ = 14. and μ = 9. is used in both the DNSs and the surrogate PDE solver.

In this section, we use the proposed PDE-constrained NNs to simultaneously solve 30 linear elasticity BVPs, as shown in Fig. 4(a), with a resolution of 16 × 16. The deformed problem domains from the DNSs for three representative setups are shown in Fig. 10(a). Architectures similar to those summarized in Table 1 are used, except that the last layer has two filters, representing u_x and u_y, instead of the single filter used for the steady-state diffusion problem. The other training-related NN parameters are summarized in Table 3. We follow the procedures described in Section 5.4 to train both types of NNs. The NN results of the three selected BVPs shown in Fig. 10(a) are presented in Fig. 11, with the remaining results from the other setups given in Appendix A.2.1. The statistical moments of the BNN predictions are evaluated from 50 MC samples. In Fig. 11, BVPs (i), (ii), and (iii) correspond to bc id 1 (non-zero Dirichlet loading), bc id 2 (non-zero Neumann loading), and bc id 3 (mixed loading) applied to domain id 5, respectively. The solutions from the DNSs, the deterministic NN, and the BNN for these three BVPs are compared qualitatively in Fig. 11(a,c,e,g,i,k), with a quantitative comparison of the solution distribution along the dashed lines between the DNSs and the BNN given in Fig. 11(b,d,f,h,j,l). This comparison shows that the proposed method solves most of the BVPs with the desired accuracy. The results from the NNs in Fig. 11(i,k,l) are slightly worse than the DNSs, mainly because the deformations in linear elasticity are small: the scaled results occupy a very narrow range, which makes it challenging for the NNs to distinguish between solutions, particularly for purely non-zero traction loadings.
Figure 11: Results of three selected BVPs out of the 30 linear elasticity BVPs with varying domains and different applied BCs, simultaneously solved by a single deterministic or probabilistic NN with the proposed method. BVPs (i), (ii), and (iii) correspond to bc ids 1, 2, and 3 for domain id 5, as shown in Fig. 4(a). (a, c, e, g, i, k) Solutions from the DNS, deterministic (det) NNs, and BNNs (Mean, Std.) for the different BVPs. (b, d, f, h, j, l) Quantitative comparison of the solution distribution between the DNS and the BNN along the dashed lines.
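The BNN means and standard deviations reported in Fig. 11 (and in the analogous figures) are obtained from repeated stochastic forward passes. A minimal sketch, assuming a trained Flipout model bnn_model and an input batch x (both hypothetical names):

```python
import numpy as np

def predictive_moments(bnn_model, x, n_samples=50):
    # Each forward pass through a Flipout model draws fresh weight
    # perturbations, so repeated calls yield samples from the approximate
    # predictive posterior; mean and std follow from the sample stack.
    samples = np.stack([np.asarray(bnn_model(x, training=False))
                        for _ in range(n_samples)])
    return samples.mean(axis=0), samples.std(axis=0)
```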
Figure 12: Results for the L-shape BVP. (a, c) Solutions from the DNS, deterministic (det) NNs, and BNNs (Mean, Std.). (b, d) Quantitative comparison of the solution distribution between the DNS and the BNN along the dashed lines. (e, g) Comparison of the reaction forces in the X- and Y-directions between the DNS and the deterministic NN. (f, h) Comparison of the reaction forces in the X- and Y-directions between the DNS and the BNN. "Inter" indicates the interpolated prediction (see the text for an explanation).

For the more challenging nonlinear elasticity case, where the deformations are large, the NNs successfully solve such BVPs, as shown in Fig. 14. The performance difference between the linear and nonlinear elasticity problems suggests that a carefully chosen data normalization is important for improving the performance of the PDE-constrained surrogate solvers.
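One generic remedy, sketched below under the assumption of a simple min-max scaling (the exact normalization used here is not spelled out in the text), is to map each solution channel to the unit interval before training and invert the map at prediction time:

```python
import numpy as np

def minmax_scale(u, lo, hi):
    # Map a solution field from its physical range [lo, hi] onto [0, 1].
    return (u - lo) / (hi - lo)

def minmax_unscale(u_scaled, lo, hi):
    # Invert the scaling to recover physical units after prediction.
    return u_scaled * (hi - lo) + lo
```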
So far, we have demonstrated the capability of the proposed method to solve PDEs on regular/irregular and fixed/varying domains with different applied BCs. In this section, we use the proposed method to solve linear elasticity on an L-shape domain with a fully constrained bottom edge and a non-zero vertical Dirichlet loading applied on the left edge, as shown in Fig. 4(c). We focus on the linear loading regime, with the final deformed shape shown in Fig. 10(b), without accounting for nonlinear behavior such as crack propagation [69, 70]. In the previous examples, for each set of BCs, only one specific set of values for the non-zero Dirichlet/Neumann loadings was exposed to the NNs. Even though the NNs can successfully learn the BCs and solve the corresponding BVPs, it remains challenging for them to make interpolating/extrapolating predictions, as the NNs cannot learn the physical meaning of a set of BCs from a single loading data point. To enforce learning of the physical meaning of the BCs, for each set of BCs we expose the NNs to multiple incremental loadings. For the L-shape BVP, we create 10 input data points (see the reaction-force plots in Fig. 12(e-h)), each differing from the others only in the actual values at the non-zero Dirichlet BCs. Since the strains are small, we focus on the interpolated predictions of the NNs, and leave the study of extrapolated predictions to the nonlinear elasticity BVPs investigated in Section 6.3.3. During training, the NNs are exposed to 5 loading cases, indicated by the red dots in Fig. 12(e-h). The 5 loading cases for interpolating prediction are marked with green dots in Fig. 12(e-h).
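Schematically, the 10 training inputs differ only in the value written into the non-zero Dirichlet channel. A possible construction, assuming a base input image x0 and a boolean mask dirichlet_mask marking the loaded edge (both hypothetical names):

```python
import numpy as np

def incremental_loading_inputs(x0, dirichlet_mask, u_max, n_steps=10):
    # Replicate the base input n_steps times, writing an increasing Dirichlet
    # value onto the loaded edge; all other pixels are left unchanged.
    inputs = []
    for u in np.linspace(u_max / n_steps, u_max, n_steps):
        x = x0.copy()
        x[dirichlet_mask] = u
        inputs.append(x)
    return np.stack(inputs)
```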
Again, to keep the discussion concise, the architectures of both deterministic and probabilistic NNs, along with other training-related information, are provided in Appendix A.2.2. We follow the procedures described in Section 5.4 to train both types of NNs with an output resolution of 32 × 32.

We compare the solutions from the DNSs, the deterministic NN, and the BNN for the last interpolating loading step qualitatively in Fig. 12(a,c), with a quantitative comparison of the solution distribution along both the horizontal and vertical dashed lines between the DNSs and the BNN given in Fig. 12(b,d). Additional results from the other interpolating loading steps are given in Appendix A.2.2. The reaction forces in both directions at the bottom edge are shown in Fig. 12(e-h). For the BNN, the statistical moments of the reaction forces are evaluated by averaging the reaction forces computed from 50 MC samples. Fig. 12 shows that the NN results are comparable with the DNS solution. For this BVP setup, the reaction force in the X-direction is zero and is also very small in the Y-direction, as indicated by the DNS results in Fig. 12(e,g). The deterministic NN predicts F_y in the correct range, but fails to predict F_x. For the BNN, the predicted reaction forces are quite different from the DNS, which is expected, as both u_x and u_y at the bottom of the geometry are very small, leading to high uncertainties in the reaction forces. Still, the accurate interpolating prediction shown in Fig. 12(b,d) is very appealing for problems, such as homogenization and inverse-modeling studies, where many similar BVPs with small variations need to be simulated repeatedly.
In this section, we use the proposed method to solve different nonlinear elasticity problems.

The general description of the elliptic PDE system given in (1) is rewritten as

\[
\nabla_0 \cdot \boldsymbol{P} = \boldsymbol{0} \ \text{on } \Omega_0, \qquad
\boldsymbol{u}(\boldsymbol{X}) = \bar{\boldsymbol{u}}(\boldsymbol{X}) \ \text{on } \Gamma_0^{u}, \qquad
\boldsymbol{T} = \bar{\boldsymbol{T}}(\boldsymbol{X}) \ \text{on } \Gamma_0^{T}, \tag{39}
\]

for the nonlinear elasticity problem, with the subscript 0 indicating the reference configuration. In (39), u represents the displacement field, P is the first Piola-Kirchhoff stress tensor, and T is the surface traction. In the nonlinear elasticity problem, the deformation gradient is defined as F = 1 + ∂u/∂X, with 1 being the second-order identity tensor. The right Cauchy-Green deformation tensor is written as C = F^T F. The following compressible neo-Hookean hyperelastic free energy function is considered:

\[
W = \frac{1}{2} \mu \left( \mathrm{tr}(\boldsymbol{C}) - 3 - 2 \ln(J) \right) + \frac{1}{2} \lambda \left( J - 1 \right)^2, \tag{40}
\]

with μ and λ as the Lamé constants and J = det(F). The Piola stress tensor P is computed as

\[
\boldsymbol{P} = \frac{\partial W}{\partial \boldsymbol{F}} = \lambda \left( J^2 - J \right) \boldsymbol{F}^{-T} + \mu \left( \boldsymbol{F} - \boldsymbol{F}^{-T} \right). \tag{41}
\]

The discretized residual function (4) for the nonlinear elasticity problem is written as

\[
\boldsymbol{R} = \sum_{e=1}^{n_{\mathrm{elem}}} \left\{ \int_{\Omega_0^{e}} \boldsymbol{B}^{T} \boldsymbol{P} \, dV - \int_{\Gamma_0^{e,T}} \boldsymbol{N}^{T} \bar{\boldsymbol{T}} \, dS \right\}. \tag{42}
\]

A set of material parameters with λ = 14. and μ = 9. is used in both the DNSs and the surrogate PDE solver.

In this section, we use the proposed PDE-constrained NNs to simultaneously solve 30 nonlinear elasticity BVPs, as shown in Fig. 4(a), with a resolution of 16 × 16. The deformed problem domains from the DNSs for three representative setups are shown in Fig. 13. The architectures of both deterministic and probabilistic NNs and the training-related NN parameters used in this section are identical to those used in Section 6.2.2 for solving the linear elasticity BVPs. We follow the procedures described in Section 5.4 to train both types of NNs.

(Footnote: Even with the zero-initialization process, the NN outputs at early stages of training can violate the physics, e.g., with a negative or zero determinant of the deformation gradient, J. To ensure that the residual can be evaluated, and to prevent the residuals from these "bad" pixels from contributing to the final loss, we regularize the loss by omitting the residual contributions from pixels whose J falls outside a prescribed admissible range. As training proceeds, the NN-predicted solutions gradually fulfill the governing PDEs, and the regularization on J ceases to function.)
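A direct transcription of (41), together with the admissibility mask described in the footnote, might look as follows; the bounds j_min and j_max are illustrative placeholders for the prescribed admissible range of J:

```python
import numpy as np

def neo_hookean_P(F, lam, mu):
    # First Piola-Kirchhoff stress, Eq. (41), for a single 2x2 deformation
    # gradient F: P = lam*(J^2 - J)*F^{-T} + mu*(F - F^{-T}).
    J = np.linalg.det(F)
    F_invT = np.linalg.inv(F).T
    return lam * (J**2 - J) * F_invT + mu * (F - F_invT)

def admissible_J(F, j_min=0.1, j_max=10.0):
    # Footnoted regularization: residual contributions are dropped wherever J
    # leaves the admissible range; j_min and j_max are placeholders here.
    J = np.linalg.det(F)
    return (J > j_min) & (J < j_max)
```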
Figure 13: Illustration of the deformed shapes of the three selected nonlinear elasticity BVPs. The wireframe and the gray region indicate the undeformed and the deformed problem domain, respectively.

| | Hardware | Software | Wall-time | Averaged L2 error |
|---|---|---|---|---|
| FEM | Intel i7-8750, 2.2 GHz (single core) | mechanoChemFEM | 110 ms | - |
| Deterministic NN | GeForce GTX 1050 Ti, 4 GB memory | TensorFlow | 0.22 ms | 3.06e-4 |
| BNN | GeForce GTX 1050 Ti, 4 GB memory | TensorFlow | 0.29 ms | 4.04e-4 |

Table 4: Comparison of the wall-clock times of the finite element simulation and the NN predictions for solving the BVP in Fig. 15. The wall-times are averaged over multiple simulations/predictions. The averaged L2 error (43) between the DNSs and the NN predictions confirms the accuracy of the surrogate PDE solver.

The NN results of the three selected BVPs shown in Fig. 13 are presented in Fig. 14, with the remaining results from the other setups given in Appendix A.3.1. The statistical moments of the BNN predictions are evaluated from 50 MC samples. In Fig. 14, BVPs (i), (ii), and (iii) correspond to bc id 1 (non-zero Dirichlet loading), bc id 2 (non-zero Neumann loading), and bc id 3 (mixed loading) applied to domain id 5. The solutions from the DNSs, the deterministic NN, and the BNN for these three BVPs are compared qualitatively in Fig. 14(a,c,e,g,i,k), with a quantitative comparison of the solution distribution along the dashed lines between the DNSs and the BNN given in Fig. 14(b,d,f,h,j,l). This comparison shows that the proposed method has successfully solved multiple BVPs with the desired accuracy.

In this section, we explore the interpolating and extrapolating capability of the proposed framework for the BVP setup shown in Fig. 15. Both the DNS and the NN solution have a resolution of 16 × 16. The architectures of both deterministic and probabilistic NNs and the training-related NN parameters used in this section are identical to those used in Section 6.2.2 for solving the linear elasticity BVPs. The domain is fixed in both directions on the left edge and is loaded with a non-zero Dirichlet loading in the X-direction and a non-zero Neumann loading in the Y-direction. As discussed in Section 6.2.3, in order to enforce the learning of the boundary conditions and make interpolating predictions, the NNs need to be exposed to step-wise boundary loading. We train the NNs for four different cases and show the reaction forces in both directions for trained, interpolated, and extrapolated BCs in Fig. 16. In each case, the training dataset contains a different number of loading steps, indicated by the red dots in Fig. 16. The total number of training loading steps decreases with increasing case number; correspondingly, the number of loading steps with extrapolating NN predictions increases across the four cases, as indicated by the blue dots in Fig. 16. The interpolating NN predictions are marked with green dots in Fig. 16. Unlike the reaction forces in the L-shape BVP, the NN results match the DNSs well. We further observe that the interpolating predicted reaction forces are, in general, accurate for all four cases, and that the extrapolating predicted reaction forces are reasonable for a short range beyond the training range, especially for F_x. The NN results of the last loading step in case (i), an extrapolating prediction, are shown in Fig. 17; the extrapolating NN prediction remains very accurate. Additional interpolating and extrapolating NN prediction results for case (i) are given in Appendix A.3.2.
With properly trained NNs, the surrogate model can make predictions for new BCs orders of magnitude faster than traditional numerical methods, as shown in Table 4. (The wall-times might be reduced further if the software were optimized; however, it is generally the case that NN prediction can be orders of magnitude faster than the DNS.) The time- and volume-averaged L2 error between the DNSs and the NN predictions for case (i) is computed as

\[
e_{L_2} = \frac{1}{L} \sum_{l=1}^{L} \frac{1}{K} \sqrt{ \sum_{k=1}^{K} \left( y^{\mathrm{DNS}}_{l,k} - y^{\mathrm{NN}}_{l,k} \right)^2 }, \tag{43}
\]

with L (= 30) indicating the total number of incremental loading steps (time) and K (= 16 × 16) indicating the total number of pixels in the problem domain (volume). As shown in Table 4, the averaged L2 error is about 3.06e-4 for the deterministic NN and 4.04e-4 for the BNN. Compared to the magnitude of the unscaled DNS solution, this error is small, which further confirms the accuracy of the surrogate PDE solver. The capability to predict interpolated and extrapolated BCs with good accuracy is very useful for rapidly estimating the solution of BVPs with similar physics but different BCs, particularly for homogenization and inverse problems, where many similar BVPs with small variations need to be simulated repeatedly.
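A direct implementation of (43), assuming solution arrays of shape (L, K), i.e., L loading steps by K pixels:

```python
import numpy as np

def averaged_l2_error(y_dns, y_nn):
    # Eq. (43): average over loading steps of (1/K)*sqrt(sum over pixels of
    # the squared pointwise difference).
    n_steps, n_pixels = y_dns.shape
    per_step = np.sqrt(np.sum((y_dns - y_nn) ** 2, axis=1)) / n_pixels
    return per_step.mean()
```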
Figure 14: Results of three selected BVPs out of the 30 nonlinear elasticity BVPs with varying domains and different applied BCs, simultaneously solved by a single deterministic or probabilistic NN with the proposed method. BVPs (i), (ii), and (iii) correspond to bc ids 1, 2, and 3 for domain id 5, as shown in Fig. 4(a). (a, c, e, g, i, k) Solutions from the DNS, deterministic (det) NNs, and BNNs (Mean, Std.) for the different BVPs. (b, d, f, h, j, l) Quantitative comparison of the solution distribution between the DNS and the BNN along the dashed lines.
Figure 15: Illustration of the deformed shape of the nonlinear elasticity BVP used for the NN interpolating and extrapolating predictions, with non-zero Dirichlet loading in the X-direction and non-zero Neumann loading in the Y-direction. The wireframe and the gray region indicate the undeformed and the deformed problem domain, respectively.
Figure 16: Comparison of the reaction forces in both the X- and Y-directions between the DNSs and the BNN-predicted solutions for cases (i)-(iv). Across the four cases, the number of loading steps exposed to the NNs during training decreases, with a corresponding increase in the number of extrapolating loading steps predicted by the NNs. The interpolating predicted reaction forces are in general good for all four cases; the extrapolating predicted reaction forces are reasonable for a short range beyond the training range, especially for F_x.

In this work, an approach to solve PDEs with discretized-residual-constrained NNs is proposed. Both deterministic and probabilistic NNs with an encoder-decoder structure are explored, the latter to quantify the uncertainties from model parameters and from noise in the data. An efficient NN-based implementation to calculate the discretized PDE residual is proposed. The NNs take a specially designed data structure, which contains information on the problem domain and the applied BCs, to solve BVPs. The proposed approach is applied to different physical problems, including steady-state diffusion, linear elasticity, and nonlinear elasticity. Different examples for each system are considered to demonstrate the capability and performance of the proposed approach, which can simultaneously solve BVPs with varying domains and different applied BCs. We also show the interpolation and extrapolation capability of the proposed NN solvers.
Figure 17: Results of the last loading step, an NN extrapolating prediction, for case (i) in Fig. 16. (a, c) Solutions from the DNS, deterministic (det) NNs, and BNNs (Mean, Std.). (b, d) Quantitative comparison of the solution distribution between the DNS and the BNN along the dashed lines.

The ability to make accurate interpolated and extrapolated predictions for BCs that the NNs were not exposed to during training is particularly useful for rapidly estimating the solution of BVPs with similar physics but different BCs, particularly for homogenization and inverse problems where many similar BVPs with small variations need to be simulated repeatedly. A well-trained NN-based PDE solver can easily be shared and reused to investigate BVPs, and attains solutions much faster than traditional numerical methods, with acceptable error for most engineering design and decision-making applications. The trained NN PDE solver can be further trained to improve its accuracy and capability using transfer learning techniques, as studied in [27, 71]. The proposed approach is generalizable and can easily be applied to other physical systems. Extending the proposed approach to transient systems is currently being investigated in a subsequent work by the authors.
Acknowledgments
We gratefully acknowledge the support of Toyota Research Institute, Award
A Supporting materials
A.1 Steady-state diffusion
A.1.1 Multiple rectangle domains with different BCs
Additional results from the 20 BVPs on rectangular domains are summarized in Fig. 18.
A.1.2 Single octagon domain with mixed BCs
The NN structure and training information for the octagon-domain simulation are summarized in Tables 5 and 6.
A.2 Linear elasticity
A.2.1 Multiple rectangle domains with different BCs
Additional results from the 30 BVPs on rectangular domains are summarized in Figs. 19 and 20.
Figure 18: Additional NN results for steady-state diffusion BVPs on rectangular domains.

| Deterministic | Probabilistic | Size | Layer arguments |
|---|---|---|---|
| Input | Input | - | - |
| LayerFillRandomNumber | LayerFillRandomNumber | - | - |
| Conv2D | Convolution2DFlipout | filters = 8 | kernel (5,5), padding: same, ReLU |
| MaxPooling2D | MaxPooling2D | - | kernel (2,2), padding: same |
| Conv2D | Convolution2DFlipout | filters = 8 | kernel (5,5), padding: same, ReLU |
| MaxPooling2D | MaxPooling2D | - | kernel (2,2), padding: same |
| Conv2D | Convolution2DFlipout | filters = 8 | kernel (5,5), padding: same, ReLU |
| MaxPooling2D | MaxPooling2D | - | kernel (2,2), padding: same |
| Flatten | Flatten | - | - |
| Dense | DenseFlipout | units = 32 | ReLU |
| Dense | DenseFlipout | units = 32 | ReLU |
| Reshape | Reshape | - | [4, 4, 2] |
| Conv2D | Convolution2DFlipout | filters = 8 | kernel (5,5), padding: same, ReLU |
| UpSampling2D | UpSampling2D | - | size (2,2) |
| Conv2D | Convolution2DFlipout | filters = 8 | kernel (5,5), padding: same, ReLU |
| UpSampling2D | UpSampling2D | - | size (2,2) |
| Conv2D | Convolution2DFlipout | filters = 8 | kernel (5,5), padding: same, ReLU |
| UpSampling2D | UpSampling2D | - | size (2,2) |
| Conv2D | Convolution2DFlipout | filters = 16 | kernel (5,5), padding: same, ReLU |
| Conv2D | Convolution2DFlipout | filters = 1 | kernel (5,5), padding: same, ReLU |

Table 5: Details of both deterministic and probabilistic NNs for solving diffusion BVPs on the octagon domain with an output resolution of 32 × 32.
Figure 19: Additional NN results for linear elasticity BVPs on rectangular domains.
Figure 20: Additional NN results for linear elasticity BVPs on rectangular domains (continued).
| Description | Deterministic | Probabilistic |
|---|---|---|
| Total parameters | 16,049 | 31,970 |
| Size of D | × (Aug: ×) | × (Aug: ×) |
| Epochs | 20,000 | 100 |
| Zero initialization epochs | 100 | - |
| Optimizer | Nadam | Nadam |
| Learning rate | 2.5e-4 | 1e-8 |
| Batch size | 256 | 64 |
| Learning rate of Σ | - | 1e-8 |
| Initial value of Σ | - | 1e-8 |

Table 6: Training-related parameters for solving steady-state diffusion on the octagon domain. Aug: data augmentation.

| Deterministic | Probabilistic | Size | Layer arguments |
|---|---|---|---|
| Input | Input | - | - |
| LayerFillRandomNumber | LayerFillRandomNumber | - | - |
| Conv2D | Convolution2DFlipout | filters = 8 | kernel (5,5), padding: same, ReLU |
| MaxPooling2D | MaxPooling2D | - | kernel (2,2), padding: same |
| Conv2D | Convolution2DFlipout | filters = 8 | kernel (5,5), padding: same, ReLU |
| MaxPooling2D | MaxPooling2D | - | kernel (2,2), padding: same |
| Conv2D | Convolution2DFlipout | filters = 16 | kernel (5,5), padding: same, ReLU |
| MaxPooling2D | MaxPooling2D | - | kernel (2,2), padding: same |
| Flatten | Flatten | - | - |
| Dense | DenseFlipout | units = 32 | ReLU |
| Dense | DenseFlipout | units = 128 | ReLU |
| Reshape | Reshape | - | [4, 4, 8] |
| Conv2D | Convolution2DFlipout | filters = 16 | kernel (5,5), padding: same, ReLU |
| UpSampling2D | UpSampling2D | - | size (2,2) |
| Conv2D | Convolution2DFlipout | filters = 16 | kernel (5,5), padding: same, ReLU |
| UpSampling2D | UpSampling2D | - | size (2,2) |
| Conv2D | Convolution2DFlipout | filters = 16 | kernel (5,5), padding: same, ReLU |
| UpSampling2D | UpSampling2D | - | size (2,2) |
| Conv2D | Convolution2DFlipout | filters = 16 | kernel (5,5), padding: same, ReLU |
| Conv2D | Convolution2DFlipout | filters = 2 | kernel (5,5), padding: same, ReLU |

Table 7: Details of both deterministic and probabilistic NNs for solving the linear elasticity L-shape BVP.
A.2.2 L-shape domains
The NN structure and training information for the L-shape domain simulation are summarized in Tables 7 and 8. Additional interpolated-prediction results from the L-shape domain simulation are summarized in Fig. 21.

| Description | Deterministic | Probabilistic |
|---|---|---|
| Total parameters | 41,346 | 82,435 |
| Size of D | × (Aug: ×) | × (Aug: ×) |
| Epochs | 10,000 | 100 |
| Zero initialization epochs | 100 | - |
| Optimizer | Adam | Nadam |
| Learning rate | 2.5e-4 | 1e-8 |
| Batch size | 256 | 64 |
| Learning rate of Σ | - | 1e-8 |
| Initial value of Σ | - | 1e-8 |

Table 8: Training-related parameters for solving the linear elasticity L-shape BVP. Aug: data augmentation.
Figure 21: Additional NN results for the linear elasticity BVP on the L-shape domain.
A.3 Non-linear elasticity
A.3.1 Multiple rectangle domains with different BCs
Additional results from the 30 BVPs on rectangular domains are summarized in Figs. 22 and 23.

Figure 22: Additional NN results for nonlinear elasticity BVPs on rectangular domains.

Figure 23: Additional NN results for nonlinear elasticity BVPs on rectangular domains (continued).
A.3.2 Rectangle domain with solution interpolation and extrapolation
Additional interpolated and extrapolated NN prediction results for case (i) are given in Figs. 24 and 25, respectively.

Figure 24: Additional interpolating NN prediction results for case (i).

Figure 25: Additional extrapolating NN prediction results for case (i).
References
[1] Rampi Ramprasad, Rohit Batra, Ghanshyam Pilania, Arun Mannodi-Kanakkithodi, and Chiho Kim. Machine learning in materials informatics: Recent applications and prospects. npj Computational Materials, 3, 2017.
[2] Frederic E. Bock, Roland C. Aydin, Christian J. Cyron, Norbert Huber, Surya R. Kalidindi, and Benjamin Klusemann. A Review of the Application of Machine Learning and Data Mining Approaches in Continuum Materials Mechanics. Frontiers in Materials, 6, 2019.
[3] B. Meredig, A. Agrawal, S. Kirklin, J. E. Saal, J. W. Doak, A. Thompson, K. Zhang, A. Choudhary, and C. Wolverton. Combinatorial screening for new materials in unconstrained composition space with machine learning. Physical Review B - Condensed Matter and Materials Physics, 89:1–7, 2014.
[4] Logan Ward, Ankit Agrawal, Alok Choudhary, and Christopher Wolverton. A general-purpose machine learning framework for predicting properties of inorganic materials. npj Computational Materials, 2:1–7, 2016.
[5] Y. M. A. Hashash, S. Jung, and J. Ghaboussi. Numerical implementation of a neural network based material model in finite element analysis. Int. J. Numer. Meth. Eng., 59:989–1005, 2004.
[6] Rubén Ibanez, Emmanuelle Abisset-Chavanne, David Gonzalez, Jean Louis Duval, Elias Cueto, and Francisco Chinesta. Hybrid constitutive modeling: data-driven learning of corrections to plasticity models. International Journal of Material Forming, pages 717–725, 2018.
[7] Kun Wang and WaiChing Sun. Meta-modeling game for deriving theory-consistent, microstructure-based traction–separation laws via deep reinforcement learning. Comput. Methods Appl. Mech. Engrg., 346:216–241, 2019.
[8] Daniel Z. Huang, Kailai Xu, Charbel Farhat, and Eric Darve. Learning constitutive relations from indirect observations using deep neural networks. J. Comput. Phys., 416:109491, 2020.
[9] Felix Brockherde, Leslie Vogt, Li Li, Mark E. Tuckerman, Kieron Burke, and Klaus Robert Müller. Bypassing the Kohn-Sham equations with machine learning. Nat. Commun., 8, 2017.
[10] Gregory H. Teichert, A. R. Natarajan, A. Van der Ven, and Krishna Garikipati. Machine learning materials physics: Integrable deep neural networks enable scale bridging by learning free energy functions. Comput. Methods Appl. Mech. Engrg., 353:201–216, 2019.
[11] G. H. Teichert, A. R. Natarajan, A. Van der Ven, and K. Garikipati. Scale bridging materials physics: Active learning workflows and integrable deep neural networks for free energy function representations in alloys. Comput. Methods Appl. Mech. Engrg., 371:113281, 2020.
[12] Steven L. Brunton, Joshua L. Proctor, J. Nathan Kutz, and William Bialek. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proc. Natl. Acad. Sci. U.S.A., 113:3932–3937, 2016.
[13] Zhenlin Wang, Xun Huan, and Krishna Garikipati. Variational system identification of the partial differential equations governing the physics of pattern-formation: Inference under varying fidelity and noise. Comput. Methods Appl. Mech. Engrg., 356:44–74, 2019.
[14] Z. Wang, X. Zhang, G. H. Teichert, M. Carrasco-Teja, and K. Garikipati. System inference for the spatio-temporal evolution of infectious diseases: Michigan in the time of COVID-19. Comput. Mech., 66:1153–1176, 2020.
[15] Z. Wang, X. Huan, and K. Garikipati. Variational system identification of the partial differential equations governing microstructure evolution in materials: Inference over sparse and spatially unrelated data. pages 1–43, 2020.
[16] Zhenlin Wang, Bowei Wu, Krishna Garikipati, and Xun Huan. A Perspective on Regression and Bayesian Approaches for System Identification of Pattern Formation Dynamics. Theoretical & Applied Mechanics Letters, 10:188–194, 2020.
[17] Ahmet Cecen, Hanjun Dai, Yuksel C. Yabansu, Surya R. Kalidindi, and Le Song. Material structure-property linkages using three-dimensional convolutional neural networks. Acta Mater., 146:76–84, 2018.
[18] Xiang Li, Zhanli Liu, Shaoqing Cui, Chengcheng Luo, Chenfeng Li, and Zhuo Zhuang. Predicting the effective mechanical property of heterogeneous materials by image based modeling and deep learning. Comput. Methods Appl. Mech. Engrg., 347:735–753, 2019.
[19] Zijiang Yang, Yuksel C. Yabansu, Reda Al-Bahrani, Wei-keng Liao, Alok N. Choudhary, Surya R. Kalidindi, and Ankit Agrawal. Deep learning approaches for mining structure-property linkages in high contrast composites from simulation datasets. Comput. Mater. Sci., 151:278–287, 2018.
[20] Ruho Kondo, Shunsuke Yamakawa, Yumi Masuoka, Shin Tajima, and Ryoji Asahi. Microstructure recognition using convolutional neural networks for prediction of ionic conductivity in ceramics. Acta Mater., 141:29–38, 2017.
[21] Ridha Hambli, Houda Katerchi, and Claude Laurent Benhamou. Multiscale methodology for bone remodelling simulation using coupled finite element and neural network computation. Biomech. Model. Mechanobiol., 10:133–145, 2011.
[22] Miguel A. Bessa, R. Bostanabad, Zeliang Liu, A. Hu, Daniel W. Apley, C. Brinson, W. Chen, and Wing Kam Liu. A framework for data-driven analysis of materials under uncertainty: Countering the curse of dimensionality. Comput. Methods Appl. Mech. Engrg., 320:633–667, 2017.
[23] Ari L. Frankel, Reese E. Jones, Coleman Alleman, and Jeremy Templeton. Predicting the mechanical response of oligocrystals with deep learning. pages 1–22, 2019.
[24] Kun Wang and WaiChing Sun. A multiscale multi-permeability poroplasticity model linked by recursive homogenizations and deep learning. Comput. Methods Appl. Mech. Engrg., 334:337–380, 2018.
[25] B. A. Le, Julien Yvonnet, and Q. C. He. Computational homogenization of nonlinear elastic materials using neural networks. Int. J. Numer. Meth. Eng., 104:1061–1084, 2015.
[26] Xiaoxin Lu, Dimitris G. Giovanis, Julien Yvonnet, Vissarion Papadopoulos, Fabrice Detrez, and Jinbo Bai. A data-driven computational homogenization method based on neural networks for the nonlinear anisotropic electrical response of graphene/polymer nanocomposites. Comput. Mech., 64:307–321, 2019.
[27] Xiaoxuan Zhang and Krishna Garikipati. Machine learning materials physics: Multi-resolution neural networks learn the free energy and nonlinear elastic response of evolving microstructures. Comput. Methods Appl. Mech. Engrg., 372:113362, 2020.
[28] Yinhao Zhu and Nicholas Zabaras. Bayesian deep convolutional encoder-decoder networks for surrogate modeling and uncertainty quantification. J. Comput. Phys., 366:415–447, 2018.
[29] Nick Winovich, Karthik Ramani, and Guang Lin. ConvPDE-UQ: Convolutional neural networks with quantified uncertainty for heterogeneous elliptic partial differential equations on varied domains. J. Comput. Phys., 394:263–279, 2019.
[30] Saakaar Bhatnagar, Yaser Afshar, Shaowu Pan, and Karthik Duraisamy. Prediction of Aerodynamic Flow Fields Using Convolutional Neural Networks. Comput. Mech., 5:1–30, 2019.
[31] Angran Li, Ruijia Chen, Amir Barati Farimani, and Yongjie Jessica Zhang. Reaction diffusion system prediction based on convolutional neural network. Sci. Rep., 10:1–9, 2020.
[32] Isaac Lagaris, Aristidis Likas, and Dimitrios I. Fotiadis. Artificial neural networks for solving ordinary and partial differential equations. IEEE Transactions on Neural Networks, 9:987–1000, 1998.
[33] Jiequn Han, Arnulf Jentzen, and Weinan E. Solving high-dimensional partial differential equations using deep learning. Proceedings of the National Academy of Sciences, 115:8505–8510, 2018.
[34] Justin Sirignano and Konstantinos Spiliopoulos. DGM: A deep learning algorithm for solving partial differential equations. J. Comput. Phys., 375:1339–1364, 2018.
[35] Maziar Raissi, Paris Perdikaris, and George E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J. Comput. Phys., 378:686–707, 2019.
[36] Yinhao Zhu, Nicholas Zabaras, Phaedon-Stelios Koutsourelakis, and Paris Perdikaris. Physics-constrained deep learning for high-dimensional surrogate modeling and uncertainty quantification without labeled data. J. Comput. Phys., 394:56–81, 2019.
[37] Nicholas Geneva and Nicholas Zabaras. Modeling the dynamics of PDE systems with physics-constrained deep auto-regressive networks. J. Comput. Phys., 403:109056, 2020.
[38] Liu Yang, Xuhui Meng, and George Em Karniadakis. B-PINNs: Bayesian physics-informed neural networks for forward and inverse PDE problems with noisy data. J. Comput. Phys., 425:109913, 2021.
[39] Jens Berg and Kaj Nystroem. A unified deep artificial neural network approach to partial differential equations in complex geometries. Neurocomputing, 317:28–41, 2018.
[40] Luning Sun, Han Gao, Shaowu Pan, and Jian Xun Wang. Surrogate modeling for fluid flows based on physics-constrained deep learning without simulation data. Comput. Methods Appl. Mech. Engrg., 361:112732, 2020.
[41] E. Samaniego, C. Anitescu, S. Goswami, V. M. Nguyen-Thanh, H. Guo, K. Hamdia, X. Zhuang, and T. Rabczuk. An energy approach to the solution of partial differential equations in computational mechanics via machine learning: Concepts, implementation and applications. Comput. Methods Appl. Mech. Engrg., 362:112790, 2020.
[42] Xiaowei Jin, Shengze Cai, Hui Li, and George Em Karniadakis. NSFnets (Navier-Stokes flow nets): Physics-informed neural networks for the incompressible Navier-Stokes equations. J. Comput. Phys., 1:109951, 2020.
[43] Guofei Pang, Lu Lu, and George E. Karniadakis. fPINNs: Fractional physics-informed neural networks. SIAM Journal on Scientific Computing, 41:A2603–A2626, 2019.
[44] Xuhui Meng, Zhen Li, Dongkun Zhang, and George E. Karniadakis. PPINN: Parareal physics-informed neural network for time-dependent PDEs. Comput. Methods Appl. Mech. Engrg., 370:113250, 2020.
[45] Sifan Wang and Paris Perdikaris. Deep Learning of Free Boundary and Stefan Problems. arXiv, page 109914, 2020.
[46] Ameya D. Jagtap, Ehsan Kharazmi, and George Em Karniadakis. Conservative physics-informed neural networks on discrete domains for conservation laws: Applications to forward and inverse problems. Comput. Methods Appl. Mech. Engrg., 365:113028, 2020.
[47] Yaohua Zang, Gang Bao, Xiaojing Ye, and Haomin Zhou. Weak adversarial networks for high-dimensional partial differential equations. J. Comput. Phys., 411:109409, 2020.
[48] Fan Chen, Jianguo Huang, Chunmei Wang, and Haizhao Yang. Friedrichs Learning: Weak Solutions of Partial Differential Equations via Deep Learning. 1:1–24, 2020.
[49] Reza Khodayi-mehr and Michael M. Zavlanos. VarNet: Variational Neural Networks for the Solution of Partial Differential Equations. arXiv, 2019.
[50] Ke Li, Kejun Tang, Tianfan Wu, and Qifeng Liao. D3M: A Deep Domain Decomposition Method for Partial Differential Equations. IEEE Access, 8:5283–5294, 2020.
[51] Ehsan Kharazmi, Zhongqiang Zhang, and George E. M. Karniadakis. hp-VPINNs: Variational physics-informed neural networks with domain decomposition. Comput. Methods Appl. Mech. Engrg., 374:113547, 2021.
[52] Armen Der Kiureghian and Ove Ditlevsen. Aleatory or epistemic? Does it matter? Structural Safety, 31:105–112, 2009.
[53] Dongkun Zhang, Lu Lu, Ling Guo, and George Em Karniadakis. Quantifying total uncertainty in physics-informed neural networks for solving forward and inverse stochastic problems. J. Comput. Phys., 397:1–19, 2019.
[54] Yibo Yang and Paris Perdikaris. Adversarial uncertainty quantification in physics-informed neural networks. J. Comput. Phys., 394:136–152, 2019.
[55] Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. In Proceedings of the 33rd International Conference on Machine Learning, volume 48, pages 1050–1059, 2016.
[56] Wesley J. Maddox, Timur Garipov, Pavel Izmailov, Dmitry Vetrov, and Andrew Gordon Wilson. A simple baseline for Bayesian uncertainty in deep learning. Advances in Neural Information Processing Systems, 32:1–25, 2019.
[57] David M. Blei, Alp Kucukelbir, and Jon D. McAuliffe. Variational Inference: A Review for Statisticians. Journal of the American Statistical Association, 112:859–877, 2017.
[58] Diederik P. Kingma and Max Welling. Auto-encoding variational bayes. pages 1–14, 2014.
[59] Xihaier Luo and Ahsan Kareem. Bayesian deep learning with hierarchical prior: Predictions from limited and noisy data. Structural Safety, 84:101918, 2020.
[60] Han Gao, Luning Sun, and Jian Xun Wang. PhyGeoNet: Physics-Informed Geometry-Adaptive Convolutional Neural Networks for Solving Parametric PDEs on Irregular Domain. arXiv, pages 1–45, 2020.
[61] Andrew Gelman, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin. Bayesian data analysis. CRC Press, 2013.
[62] Qiang Liu and Dilin Wang. Stein variational gradient descent: A general purpose Bayesian inference algorithm. Advances in Neural Information Processing Systems, pages 2378–2386, 2016.
[63] Alex Graves. Practical variational inference for neural networks. Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011, pages 1–9, 2011.
[64] Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, and Daan Wierstra. Weight uncertainty in neural networks. 2:1613–1622, 2015.
[65] Yeming Wen, Paul Vicol, Jimmy Ba, Dustin Tran, and Roger Grosse. Flipout: Efficient pseudo-independent weight perturbations on mini-batches. pages 1–16, 2018.
[66] Sergey Ioffe and Christian Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv preprint arXiv:1502.03167, 2015.
[67] Alex Kendall and Yarin Gal. What uncertainties do we need in Bayesian deep learning for computer vision? Advances in Neural Information Processing Systems, 2017-December:5575–5585, 2017.
[68] G. Alzetta, D. Arndt, Wolfgang Bangerth, V. Boddu, B. Brands, D. Davydov, R. Gassmoeller, T. Heister, L. Heltai, K. Kormann, M. Kronbichler, M. Maier, J.-P. Pelteret, B. Turcksin, and D. Wells. The deal.II Library, Version 9.0. Journal of Numerical Mathematics, 2018.
[69] Bernhard Josef Winkler. Traglastuntersuchungen von unbewehrten und bewehrten Betonstrukturen auf der Grundlage eines objektiven Werkstoffgesetzes für Beton. Innsbruck University Press, 2001.
[70] Christian Linder and Xiaoxuan Zhang. A marching cubes based failure surface propagation concept for three-dimensional finite elements with non-planar embedded strong discontinuities of higher-order kinematics. International Journal for Numerical Methods in Engineering, 96(6):339–372, 2013.
[71] Gregory H. Teichert and Krishna Garikipati. Machine learning materials physics: Surrogate optimization and multi-fidelity algorithms predict precipitate morphology in an alternative to phase field dynamics.