SciANN: A Keras/Tensorflow wrapper for scientific computations and physics-informed deep learning using artificial neural networks
Ehsan Haghighat, Ruben Juanes
Massachusetts Institute of Technology, Cambridge, MA
Abstract
In this paper, we introduce SciANN, a Python package for scientific computing and physics-informed deep learning using artificial neural networks. SciANN uses the widely used deep-learning packages Tensorflow and Keras to build deep neural networks and optimization models, thus inheriting many of Keras's functionalities, such as batch optimization and model reuse for transfer learning. SciANN is designed to abstract neural network construction for scientific computations and solution and discovery of partial differential equations (PDEs) using the physics-informed neural networks (PINN) architecture, therefore providing the flexibility to set up complex functional forms. We illustrate, in a series of examples, how the framework can be used for curve fitting on discrete data, and for solution and discovery of PDEs in strong and weak forms. We summarize the features currently available in SciANN, and also outline ongoing and future developments.
Keywords:
SciANN, Deep Neural Networks, Scientific Computations, PINN, vPINN
1. Introduction
Over the past decade, artificial neural networks, also known as deep learning, have revolutionized many computational tasks, including image classification and computer vision [1, 2, 3], search engines and recommender systems [4, 5], speech recognition [6], autonomous driving [7], and healthcare [8] (for a review, see, e.g., [9]). Even more recently, this data-driven framework has made inroads in engineering and scientific applications, such as earthquake detection [10, 11, 12], fluid mechanics and turbulence modeling [13, 14], dynamical systems [15], and constitutive modeling [16, 17]. A recent class of deep learning known as physics-informed neural networks (PINN) has been shown to be particularly well suited for solution and inversion of equations governing physical systems, in domains such as fluid mechanics [18, 19], solid mechanics [20] and dynamical systems [21]. This increased interest in engineering and science is due to the increased availability of data and of open-source platforms such as Theano [22], Tensorflow [23], MXNet [24], and Keras [25], which offer features such as high-performance computing and automatic differentiation [26]. Advances in deep learning have led to the emergence of different neural network architectures, including densely connected multi-layer deep neural networks (DNNs), convolutional neural networks (CNNs), recurrent neural networks (RNNs) and residual networks (ResNets). This proliferation of network architectures, and the (often steep) learning curve for each package, makes it challenging for new researchers in the field to use deep learning
tools in their computational workflows. In this paper, we introduce an open-source Python package, SciANN, developed on Tensorflow and Keras, which is designed with scientific computations and physics-informed deep learning in mind. As such, the abstractions used in this programming interface target engineering applications such as model fitting, solution of ordinary and partial differential equations, and model inversion (parameter identification). The outline of the paper is as follows. We first describe the functional form associated with deep neural networks. We then discuss different interfaces in SciANN that can be used to set up neural networks and optimization problems. We then illustrate SciANN's application to curve fitting, the solution of the Burgers equation, and the identification of the Navier–Stokes equations and the von Mises plasticity model from data. Lastly, we show how to use SciANN in the context of the variational PINN framework [27]. The examples discussed here and several additional applications are freely available at github.com/sciann/sciann-applications.
2. Artificial Neural Networks as Universal Approximators
A single-layer feed-forward neural network with inputs x ∈ R^m, outputs y ∈ R^n, and d hidden units is constructed as:

y = W_1 σ(W_0 x + b_0) + b_1, (1)

where (W_0 ∈ R^{d×m}, b_0 ∈ R^d) and (W_1 ∈ R^{n×d}, b_1 ∈ R^n) are the parameters of this transformation, also known as weights and biases, and σ is the activation function. As shown in [28, 29], this transformation can approximate any measurable function, independently of the size of the input features m or the activation function σ. If we define the transformation Σ_i as Σ_i(x̂_i) := ŷ_i = σ_i(W_i x̂_i + b_i), with x̂_i as the input to and ŷ_i as the output of any hidden layer i, x = x̂_1 as the main input to the network, and y = Σ_L(x̂_L) as the final output of the network, we can construct a general L-layer neural network as a composition of the Σ_i functions:

y = Σ_L ∘ Σ_{L−1} ∘ ⋯ ∘ Σ_1(x), (2)

with σ_i as activation functions that make the transformations nonlinear. Some common activation functions are:

ReLU: x̂ ↦ x̂^+,
sigmoid: x̂ ↦ 1/(1 + e^{−x̂}),
tanh: x̂ ↦ (e^{x̂} − e^{−x̂})/(e^{x̂} + e^{−x̂}). (3)

In general, this multilayer feed-forward neural network is capable of approximating functions to any desired accuracy [28, 30]. Inaccurate approximation may arise due to a lack of a deterministic relation between inputs and outputs, an insufficient number of hidden units, inadequate training, or a poor choice of the optimization algorithm. The parameters of the neural network, W_i and b_i of all layers i ∈ {1, …, L}, are identified through minimization using a back-propagation algorithm [31]. For instance, if we approximate a field variable such as temperature T with a multi-layer neural network as T(x) ≈ T̂(x) = N_T(x; W, b), we can set up the optimization problem as

arg min_{W,b} L(W, b) := ‖T(x*) − T̂(x*)‖ = ‖T(x*) − N_T(x*; W, b)‖, (4)

where x* is the set of discrete training points and ‖∘‖ is the mean squared-error norm. Note that one can use other choices for the loss function L, such as mean absolute error or cross-entropy. The optimization problem (4) is nonconvex, which may require significant trial and error to find an effective optimization algorithm and optimization parameters.

We can construct deep neural networks with an arbitrary number of layers and neurons. We can also define multiple networks and combine them to generate the final output. There are many types of neural networks that have been optimized for specific tasks. An example is the ResNet architecture introduced for image classification, consisting of many blocks, each of the form:

z^k = Σ^k_3 ∘ Σ^k_2 ∘ Σ^k_1(z^{k−1}) + z^{k−1}, (5)

where k is the block number and z^{k−1} is the output of the previous block, with x = z^0 and y = z^K as the main inputs to and outputs of the network. Therefore, artificial neural networks offer a simple way of constructing very complex but dependent solution spaces (see, e.g., Fig. 1).

Figure 1: A sample multi-net architecture to construct a complex functional space g as g(x, y) = g(f_1(x, y), f_2(x, y)).
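As a concrete illustration of Eqs. (1)–(2), the forward pass of an L-layer network reduces to a few lines of array code. The following minimal NumPy sketch is for illustration only (it is not part of SciANN; the layer sizes and the tanh activation are arbitrary choices):

    import numpy as np

    def forward(x, weights, biases):
        # Eq. (2): apply each hidden layer's transformation σ(W x + b) in sequence,
        # with tanh as the activation for all hidden layers.
        for W, b in zip(weights[:-1], biases[:-1]):
            x = np.tanh(W @ x + b)
        # Linear output layer, as in Eq. (1).
        return weights[-1] @ x + biases[-1]

    # A network with m = 2 inputs, two hidden layers of d = 6 units, and n = 1 output.
    rng = np.random.default_rng(0)
    sizes = [2, 6, 6, 1]
    weights = [rng.standard_normal((sizes[i+1], sizes[i])) for i in range(3)]
    biases = [rng.standard_normal(sizes[i+1]) for i in range(3)]
    print(forward(np.array([0.5, -0.3]), weights, biases))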
3. SciANN: Scientific Computing with Artificial Neural Networks
SciANN is an open-source neural-network library, based on Tensorflow [23] and Keras [25], which abstracts the application of deep learning for scientific computing purposes. In this section, we discuss abstraction choices for SciANN and illustrate how one can use it for scientific computations.

3.1. Brief description of SciANN
SciANN is implemented on the most popular deep-learning packages, Tensorflow and Keras, and therefore it inherits all the functionalities they provide. Among those, the most important ones include graph-based automatic differentiation and massive heterogeneous high-performance computing capabilities. It is designed for an audience with a background in scientific computation or computational science and engineering. SciANN currently supports fully connected feed-forward deep neural networks, and recurrent networks are under development. Some architectures, such as convolutional networks, are not a good fit for scientific computing applications and therefore are not currently in our development plans. Tensorflow and Keras provide a wide range of features, including optimization algorithms, automatic differentiation, and model parameter exports for transfer learning. To install SciANN, one can simply use Python's pip package installer:

    pip install sciann
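A quick way to confirm the installation, assuming the package exposes the customary __version__ attribute:

    import sciann as sn
    print(sn.__version__)   # prints the installed SciANN version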
It can be imported into the active Python environment using Python's import module:

    import sciann as sn

Its mathematical functions are located in the sn.math interface. For instance, the function diff is accessed through sn.math.diff. The main building blocks of SciANN include:

• sn.Variable: class to define inputs to the network.
• sn.Field: class to define outputs of the network.
• sn.Functional: class to construct a nonlinear neural network approximation.
• sn.Parameter: class to define a parameter for inversion purposes.
• sn.Data, sn.Tie: classes to define the targets. If there are observations for any variable, the sn.Data interface is used when building the optimization model. For physical constraints such as PDEs or equality relations between different variables, the sn.Tie interface is designed to build the optimizer.
• sn.SciModel: class to set up the optimization problem, i.e. inputs to the networks, targets (objectives), and the loss function.
• sn.math: mathematical operations are accessed here. SciANN also supports operator overloading, which improves readability when setting up complex mathematical relations such as PDEs.

3.2. An illustrative example: curve fitting

We illustrate SciANN's capabilities with its application to a curve-fitting problem. Given a set of discrete data, generated from f(x, y) = sin(x) sin(y) over the domain [−π, π] × [−π, π], we want to fit a surface, in the form of a neural network, to this dataset. A multi-layer neural network approximating the function f can be constructed as f̂ : (x, y) ↦ N_f(x, y; W, b), with inputs x, y and output f̂. In the most common mathematical and Pythonic abstraction, the inputs x, y and output f̂ can be implemented as:

    x = sn.Variable("x")
    y = sn.Variable("y")
    f = sn.Field("f")

A 3-layer neural network with 6 neural units per layer and a hyperbolic-tangent activation function can then be constructed as:

    f = sn.Functional(
        fields=[f],
        variables=[x, y],
        hidden_layers=[6, 6, 6],
        actf="tanh")

This definition can be further compressed as:

    f = sn.Functional("f", [x, y], [6, 6, 6], "tanh")

At this stage, the parameters of the network, i.e. the set of W, b for all layers, are randomly initialized. Their current values can be retrieved using the command get_weights:

    f.get_weights()

One can set the parameters of the network to any desired values using the command set_weights. As another example, a more complex neural network functional, composed of three blocks as shown in Fig. 1, can be constructed as:

    f1 = sn.Functional("f1", [x, y], [4, 4], "tanh")
    f2 = sn.Functional("f2", [x, y], [4, 4], "tanh")
    g = sn.Functional("g", [f1, f2], [4, 4], "tanh")

Any of these functionals can be evaluated immediately, or after training, using the eval function, by providing discrete data for the inputs:

    f_test = f.eval([x_data, y_data])
    f1_test = f1.eval([x_data, y_data])
    f2_test = f2.eval([x_data, y_data])
    g_test = g.eval([f1_data, f2_data])

Once the networks are initialized, we set up the optimization problem and train the network by minimizing an objective function, i.e. solving the optimization problem for W and b.
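For completeness, the following is a minimal sketch of how such a training set could be generated with NumPy, assuming the uniform 51 × 51 grid used later in this section (the names x_data, y_data, f_data are the ones referenced throughout the example):

    import numpy as np

    # Uniform 51 x 51 grid over [-pi, pi] x [-pi, pi].
    x_grid, y_grid = np.meshgrid(np.linspace(-np.pi, np.pi, 51),
                                 np.linspace(-np.pi, np.pi, 51))

    # Flatten into columns of training points and evaluate the target surface.
    x_data = x_grid.reshape(-1, 1)
    y_data = y_grid.reshape(-1, 1)
    f_data = np.sin(x_data)*np.sin(y_data)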
The optimization problem for data-driven curve fitting is defined as:

arg min_{W,b} L(W, b) := ‖f(x*, y*) − N_f(x*, y*; W, b)‖, (6)

where (x*, y*) is the set of all discrete points where f is given. For the loss function ‖∘‖, we use the mean squared-error norm ‖∘‖ = (1/N) Σ_{(x*,y*) ∈ I} (f(x*, y*) − f̂(x*, y*))². This problem is set up in SciANN through the SciModel class as:

    m = sn.SciModel(
        inputs=[x, y],
        targets=[f],
        loss_func="mse",
        optimizer="adam")

The train method is then used to perform the training and identify the parameters of the neural network:

    m.train([x_data, y_data], [f_data], epochs=400)

Once the training is completed, one can set the parameters of a Functional to be trainable or non-trainable (fixed). For instance, to set f1 to be non-trainable:

    f1.set_trainable(False)

The result of this training is shown in Fig. 2, where we have used 400 epochs to perform the training on a dataset generated on a uniform grid of 51 × 51 points.

Since the data are generated from f(x, y) = sin(x) sin(y), we know that this is a solution to ∆f + 2f = 0, with ∆ the Laplacian operator. As a first illustration of SciANN for physics-informed deep learning, we can constrain the curve-fitting problem with this 'governing equation'. In SciANN, the differentiation operators are evaluated through the sn.math.diff function. Denoting by fxy the functional f̂ defined above, this differential equation can be evaluated as:

    L = diff(fxy, x, order=2) + diff(fxy, y, order=2) + 2*fxy

with order expressing the order of differentiation. Based on the physics-informed deep learning framework, the governing equation can be imposed through the objective function. The optimization problem can then be defined as

arg min_{W,b} L(W, b) := ‖f(x*, y*) − f̂(x*, y*)‖ + ‖∆f̂(x*, y*) + 2f̂(x*, y*)‖, (7)

and implemented in SciANN as:

    m = sn.SciModel([x, y], [fxy, L])
    m.train([x_mesh, y_mesh], [(ids_data, fxy_data), 'zero'],
            epochs=400)

Note that while the inputs are the same as for the previous case, the optimization model is defined with two targets, fxy and L. The training data for fxy remains the same; the sampling grid, however, can be expanded further since the 'physics' can be imposed everywhere. A sampling grid of 101 × 101 points is used here, where data is only given at the same locations as in the previous case, i.e. on the 51 × 51 grid. To impose the target L, it is simply set to 'zero'. The new result is shown in Fig. 3. We find that, for the same network size and training parameters, incorporating the 'physics' reduces the error significantly.

Figure 2: Using SciANN to train a network on synthetic data generated from sin(x) sin(y); (a): network predictions; (b): absolute error with respect to true values.

Figure 3: Using SciANN to train a network on synthetic data generated from sin(x) sin(y) and imposing the governing equation f_,xx + f_,yy + 2f = 0; (a): network predictions; (b): absolute error with respect to true values.

Once the training is completed, the weights W, b for all layers can be saved using the command save_weights, for future use. These weights can later be used to initialize a network of the same structure using the load_weights_from keyword in SciModel.
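The tuple (ids_data, fxy_data) used above restricts the data loss to the grid points where observations exist. A minimal sketch of how these arrays could be assembled, exploiting the fact that every other node of the 101 × 101 sampling grid coincides with the 51 × 51 data grid:

    import numpy as np

    # 101 x 101 sampling grid for the physics target.
    x1 = np.linspace(-np.pi, np.pi, 101)
    xg, yg = np.meshgrid(x1, x1)
    x_mesh, y_mesh = xg.reshape(-1, 1), yg.reshape(-1, 1)

    # Flat indices of the nodes with both grid coordinates even,
    # i.e. the 51 x 51 subset where data is available.
    ii, jj = np.meshgrid(np.arange(0, 101, 2), np.arange(0, 101, 2))
    ids_data = (ii*101 + jj).reshape(-1)
    fxy_data = np.sin(x_mesh[ids_data])*np.sin(y_mesh[ids_data])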
4. Application of SciANN to Physics-Informed Deep Learning
In this section, we explore the use of SciANN in representative case studies of physics-informed deep learning, for both solution and discovery of PDEs.
As the first example, we illustrate the use of SciANN to solve the Burgers equation, which arises in fluid mechanics, acoustics, and traffic flow [32]. Following [18], we explore the governing equation:

u_,t + u u_,x − (0.01/π) u_,xx = 0, t ∈ [0, 1], x ∈ [−1, 1], (8)

subject to the initial condition u(t = 0, x) = −sin(πx) and boundary conditions u(t, x = ±1) = 0. The solution variable u can be approximated by û, defined in the form of a nonlinear neural network as û : (t, x) ↦ N_u(t, x; W, b). The network used in [18] consists of 8 hidden layers, each with 20 neurons and a tanh activation function, and can be defined in SciANN as:

    t = sn.Variable("t")
    x = sn.Variable("x")
    u = sn.Functional("u", [t, x], 8*[20], "tanh")

To set up the optimization problem, we need to identify the targets. The first target, as used in the PINN framework, is the PDE residual of Eq. (8), defined in SciANN as:

    from numpy import pi
    from sciann.math import diff

    L1 = diff(u, t) + u*diff(u, x) - (0.01/pi)*diff(u, x, order=2)
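Training also requires discrete collocation points (t_data, x_data) covering the space-time domain. A minimal sketch, assuming a simple uniform grid (the original study [18] samples collocation points randomly; the resolution here is an arbitrary choice):

    import numpy as np

    # Uniform space-time collocation grid over t in [0, 1], x in [-1, 1].
    t_mesh, x_mesh = np.meshgrid(np.linspace(0, 1, 101),
                                 np.linspace(-1, 1, 201))
    t_data = t_mesh.reshape(-1, 1)
    x_data = x_mesh.reshape(-1, 1)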
To impose the initial and boundary conditions, one can define them as continuous mathematical functions defined at all sampling points:

L2 := (1 − sign(t − t_min)) (u + sin(πx)),
L3 := (1 − sign(x − x_min)) u,
L4 := (1 + sign(x − x_max)) u. (9)

For instance, L2 is zero at all sampling points except where t < t_min, with t_min chosen slightly above the initial time, i.e. t_min = 0 + tol. Instead of sign, one can use smoother functions such as tanh. In this way, the optimization model can be set up as:

    m = sn.SciModel([t, x], [L1, L2, L3, L4], "mse", "Adam")

In this case, all targets should 'vanish', and therefore the training is done as:

    m.train(
        [t_data, x_data],
        ['zeros', 'zeros', 'zeros', 'zeros'],
        batch_size=256,
        epochs=10000)

An alternative approach to define the boundary conditions in SciANN is to define the target in the sn.SciModel as the variable of interest, and pass the 'ids' of the training data where the conditions should be imposed. This is achieved as:

    m = sn.SciModel([t, x], [L1, u], "mse", "Adam")
    m.train(
        [t_data, x_data],
        ['zeros', (ids_ic_bc, U_ic_bc)],
        batch_size=256,
        epochs=10000)

Here, ids_ic_bc are the ids associated with the collocation points (t_data, x_data) where the initial and boundary conditions are given. An important point to keep in mind is that if the sampling points where boundary conditions are imposed make up only a very small portion of the dataset, the mini-batch optimization parameter batch_size should be set to a large number to guarantee consistent mini-batch optimization. Otherwise, some mini-batches may not acquire any data on the boundary and therefore not generate the correct gradient for the gradient-descent update. Also worth noting is that setting governing relations to 'zero' is conveniently done in SciANN.

The result of solving the Burgers equation using the deep learning framework is shown in Fig. 4. The results match the exact solution accurately, and reproduce the formation of a shock (self-sharpening discontinuity) in the solution at x = 0.

Figure 4: Solution of the Burgers equation using PINN. (a) True solution for u; (b) PINN predicted values û; (c) Absolute error between true and predicted values, |u − û|.

As a second example, we show how SciANN can be used for discovery of partial differential equations. We choose the incompressible Navier–Stokes problem used in [18]. The equations are:

u_,t + p_,x + λ_1 (u u_,x + v u_,y) − λ_2 (u_,xx + u_,yy) = 0,
v_,t + p_,y + λ_1 (u v_,x + v v_,y) − λ_2 (v_,xx + v_,yy) = 0, (10)

where u and v are the components of the velocity field in the x and y directions, respectively, p is the density-normalized pressure, λ_1 should be identically equal to 1 for Newtonian fluids, and λ_2 is the kinematic viscosity. The true values of the parameters to be identified are λ_1 = 1 and λ_2 = 0.01. Given the assumption of fluid incompressibility, we use the divergence-free form of the equations, from which the components of the velocity are obtained as:

u = ψ_,y, v = −ψ_,x, (11)

where ψ is the potential (stream) function; this construction satisfies mass conservation, u_,x + v_,y = 0, identically.

Here, the independent field variables p and ψ are approximated as p(t, x, y) ≈ p̂(t, x, y) and ψ(t, x, y) ≈ ψ̂(t, x, y), respectively, using nonlinear artificial neural networks p̂ : (t, x, y) ↦ N_p(t, x, y; W, b) and ψ̂ : (t, x, y) ↦ N_ψ(t, x, y; W, b). Using the same network size and activation function as in [19], we set up the neural networks in SciANN as:

    p = sn.Functional("p", [t, x, y], 8*[20], 'tanh')
    psi = sn.Functional("psi", [t, x, y], 8*[20], 'tanh')

Note that this way of defining the networks results in two separate networks for p and ψ, which we find more suitable for many problems. To replicate the one-network model used in the original study, one can use:

    p, psi = sn.Functional(
        ["p", "psi"], [t, x, y], 8*[20], 'tanh').split()

Here, the objective is to identify the parameters λ_1 and λ_2 of the Navier–Stokes equations (10) on a dataset with a given velocity field. Therefore, we need to define these as trainable parameters of the network. This is done using the sn.Parameter interface:

    lamb1 = sn.Parameter(0.0, [x, y], name="lamb1")
    lamb2 = sn.Parameter(0.0, [x, y], name="lamb2")

Note that these parameters are initialized with a value of 0.0. The required derivatives in Equations (10) and (11) are evaluated as:

    u, v = diff(psi, y), -diff(psi, x)
    u_t, v_t = diff(u, t), diff(v, t)
    u_x, u_y = diff(u, x), diff(u, y)
    v_x, v_y = diff(v, x), diff(v, y)
    u_xx, u_yy = diff(u, x, order=2), diff(u, y, order=2)
    v_xx, v_yy = diff(v, x, order=2), diff(v, y, order=2)
    p_x, p_y = diff(p, x), diff(p, y)

with 'order' indicating the order of differentiation. We can now set up the targets of the problem as:

    L1 = u_t + p_x + lamb1*(u*u_x + v*u_y) - lamb2*(u_xx + u_yy)
    L2 = v_t + p_y + lamb1*(u*v_x + v*v_y) - lamb2*(v_xx + v_yy)
    L3 = u
    L4 = v
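The training data for this example are velocity measurements from the flow-past-a-cylinder dataset used in [18, 19]. As a hedged sketch only, assuming that dataset has been exported to a NumPy archive with hypothetical file and key names, the flattened training arrays could be prepared as:

    import numpy as np

    # Hypothetical file and key names; any container with the same content works.
    data = np.load("cylinder_wake.npz")
    t_data = data["t"].reshape(-1, 1)
    x_data = data["x"].reshape(-1, 1)
    y_data = data["y"].reshape(-1, 1)
    u_data = data["u"].reshape(-1, 1)   # measured x-velocity at (t, x, y)
    v_data = data["v"].reshape(-1, 1)   # measured y-velocity at (t, x, y)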
The optimization model is now set up as:

    m = sn.SciModel([t, x, y], [L1, L2, L3, L4], "mse", "Adam")
    m.train(
        [t_data, x_data, y_data],
        ['zeros', 'zeros', u_data, v_data],
        batch_size=64,
        epochs=10000)

Training data for u and v are provided, as in [19]. The results are shown in Fig. 5.

Figure 5: Predicted values from the PINN framework, for the field variables u, v and p, at different times t. The identified parameters λ_1 and λ_2 closely match their true values.

As a third example, we illustrate the use of PINN for solution and discovery in nonlinear solid mechanics. We use the von Mises elastoplastic constitutive model, which is commonly used to describe the mechanical behavior of solid materials, in particular metals. Elastoplasticity relations give rise to inequality constraints on the governing equations [33] and, therefore, compared to the Navier–Stokes equations, they pose a different challenge for incorporation in PINN. The elastoplastic relations for a plane-strain problem are:

σ_ij,j + f_i = 0,
σ_ij = s_ij − p δ_ij,
p = −σ_kk/3 = −(λ + 2μ/3) ε_v,
s_ij = 2μ e^e_ij,
ε_ij = (u_i,j + u_j,i)/2 = e_ij + ε_v δ_ij/3,
ε_v = ε_kk = ε_xx + ε_yy,
e_ij = e^e_ij + e^p_ij. (12)

Here, the summation convention is used with i, j, k ∈ {x, y}. σ_ij are the components of the Cauchy stress tensor, and s_ij and p are its deviatoric components and its pressure invariant, respectively. ε_ij are the components of the infinitesimal strain tensor derived from the displacements u_x, u_y, and e_ij and ε_v are its deviatoric and volumetric components, respectively.

According to the von Mises plasticity model, the admissible state of stress is defined inside the cylindrical yield surface F = F(σ_ij), with F := q − σ_Y ≤ 0. Here, q is the equivalent stress, defined as q = √(3/2 s_ij s_ij). Assuming the associative flow rule, the plastic strain components are:

ε^p_ij ≡ e^p_ij = ε̄^p ∂F/∂σ_ij = (3/2) ε̄^p s_ij/q, (13)

where ε̄^p is the equivalent plastic strain, subject to ε̄^p ≥ 0. For the von Mises model, it can be shown that ε̄^p is evaluated as

ε̄^p = ε̄ − σ_Y/(3μ) ≥ 0, (14)

where ε̄ is the total equivalent strain, defined as ε̄ = √(2/3 e_ij e_ij). Note that for von Mises plasticity, the volumetric part of the plastic strain tensor is zero, ε^p_v = 0. Finally, the parameters of this model include the Lamé elastic parameters λ and μ, and the yield stress σ_Y.

We use a classic example to illustrate our framework: a perforated strip subjected to uniaxial extension [34, 33]. Consider a plate of dimensions 200 mm × 360 mm, with a circular hole of diameter 100 mm located in the center of the plate. The plate is subjected to extension displacements of δ = 1 mm along the short edge, under plane-strain conditions and without body forces, f_i = 0. The parameters are λ = 19.44 GPa, μ = 29.17 GPa and σ_Y = 243.0 MPa. We approximate the solution variables u_x, u_y, σ_xx, σ_yy, σ_zz, σ_xy with nonlinear neural networks as:

û_x : (x, y) ↦ N_{u_x}(x, y; W, b),
û_y : (x, y) ↦ N_{u_y}(x, y; W, b),
σ̂_xx : (x, y) ↦ N_{σ_xx}(x, y; W, b),
σ̂_yy : (x, y) ↦ N_{σ_yy}(x, y; W, b),
σ̂_zz : (x, y) ↦ N_{σ_zz}(x, y; W, b),
σ̂_xy : (x, y) ↦ N_{σ_xy}(x, y; W, b). (15)

Note that due to plastic deformation, the out-of-plane stress σ_zz is not predefined, and therefore we also approximate it with a neural network. These neural networks and the parameters λ, μ, σ_Y are defined as follows:

    ux = sn.Functional('ux', [x, y], 4*[50], 'tanh')
    uy = sn.Functional('uy', [x, y], 4*[50], 'tanh')
    sxx = sn.Functional('sxx', [x, y], 4*[50], 'tanh')
    syy = sn.Functional('syy', [x, y], 4*[50], 'tanh')
    szz = sn.Functional('szz', [x, y], 4*[50], 'tanh')
    sxy = sn.Functional('sxy', [x, y], 4*[50], 'tanh')
    lmbd = sn.Parameter(1.0, [x, y])
    mu = sn.Parameter(1.0, [x, y])
    sy = sn.Parameter(1.0, [x, y])

The kinematic relations, deviatoric stress components and plastic strains can then be defined as:

    Exx = diff(ux, x)
    Eyy = diff(uy, y)
    Exy = (diff(ux, y) + diff(uy, x))/2
    Evol = Exx + Eyy
    # Deviatoric strains (the shear component is unaffected by the volumetric split).
    exx = Exx - Evol/3
    eyy = Eyy - Evol/3
    ezz = -Evol/3
    exy = Exy
    ebar = sn.math.sqrt(2/3*(exx**2 + eyy**2 + ezz**2 + 2*exy**2))
    # Pressure and deviatoric stresses.
    prs = -(sxx + syy + szz)/3
    dxx = sxx + prs
    dyy = syy + prs
    dzz = szz + prs
    dxy = sxy
    q = sn.math.sqrt(3/2*(dxx**2 + dyy**2 + dzz**2 + 2*dxy**2))
    # Equivalent plastic strain (Eq. 14) and plastic strain components (Eq. 13).
    pebar = sn.math.relu(ebar - sy/(3*mu))
    pexx = 1.5*pebar*dxx/q
    peyy = 1.5*pebar*dyy/q
    pezz = 1.5*pebar*dzz/q
    pexy = 1.5*pebar*dxy/q
    F = q - sy
The operator-overloading abstraction of SciANN improves readability significantly. Assuming access to measured data for the variables u_x, u_y, σ_xx, σ_yy, σ_zz, σ_xy, ε_xx, ε_yy, ε_xy, the optimization targets for the training data can be described using L* = sn.Data(*), where * refers to each variable. The physics-informed constraints are set as:

    # Pressure and deviatoric constitutive relations.
    kappa = lmbd + 2*mu/3   # bulk modulus, from Eq. (12)
    L1 = sn.Tie(prs, -kappa*Evol)
    L2 = sn.Tie(dxx, 2*mu*(exx - pexx))
    L3 = sn.Tie(dyy, 2*mu*(eyy - peyy))
    L4 = sn.Tie(dzz, 2*mu*(ezz - pezz))
    L5 = sn.Tie(dxy, 2*mu*(exy - pexy))
    # Yield condition.
    L6 = sn.math.relu(F)
    # (Quasi-static) stress equilibrium.
    L7 = sn.math.diff(sxx, x) + sn.math.diff(sxy, y)
    L8 = sn.math.diff(sxy, x) + sn.math.diff(syy, y)
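The listing above stops at the targets, so the following is a hedged sketch of how the optimization model could be assembled from them, following the SciModel/train pattern of the earlier examples; the data targets shown, the ids/values pairs, and the epoch count are placeholders, not the settings of the original study:

    # Data targets for the measured variables (one per observed field).
    Lux = sn.Data(ux)
    Luy = sn.Data(uy)
    Lsxx = sn.Data(sxx)
    # ... and similarly for syy, szz, sxy and the measured strain components.

    m = sn.SciModel(
        [x, y],
        [Lux, Luy, Lsxx, L1, L2, L3, L4, L5, L6, L7, L8],
        "mse", "Adam")
    m.train(
        [x_data, y_data],
        [(ids_data, ux_data), (ids_data, uy_data), (ids_data, sxx_data)]
        + ['zeros']*8,
        epochs=10000)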
We use 2,000 data points from a reference solution, obtained with the finite-element package COMSOL [35] and randomly distributed in the simulation domain, to provide the training data. The PINN training is performed using networks with 4 layers, each with 100 neurons, and with a hyperbolic-tangent activation function. The optimization parameters are the same as those used in [20]. The results predicted by the PINN approach match the reference results very closely, as evidenced by: (1) the very small errors in each of the components of the solution, except for the out-of-plane plastic strain components (Fig. 6); and (2) the precise identification of the yield stress σ_Y and the relatively accurate identification of the elastic parameters λ and μ, yielding estimated values of approximately λ = 18 GPa, μ = 27 GPa and σ_Y = 243 MPa.

Figure 6: The predicted values from the PINN framework for displacements, strains, plastic strains and stresses. The inverted parameters are approximately λ = 18 GPa, μ = 27 GPa and σ_Y = 243 MPa.
5. Application to Variational PINN
Neural networks have recently been used to solve the variational form of differential equations as well [36, 37]. In a recent study [27], the vPINN framework for solving PDEs was introduced and analyzed. Like PINN, it is based on graph-based automatic differentiation. The authors of [27] suggest a Petrov–Galerkin approach, where the test functions are chosen differently from the trial functions. For the test functions, they propose the use of polynomials that vanish on the boundary of the domain. Here, we illustrate how to use SciANN for vPINN, and we show how to construct proper test functions using neural networks.

Consider the steady-state heat equation subject to Dirichlet boundary conditions and a known heat source f(x, y) [27]:

∆T + f(x, y) = 0, x, y ∈ [−1, 1] × [−1, 1], (16)

subject to the following boundary conditions:

T(x = ±1, y) = ±sin(2πy),
T(x, y = ±1) = 0, (17)

and a heat source:

f(x, y) = sin(2πy) (20 tanh(10x) (10 tanh²(10x) − 10) − (2π²/5) sin(2πx))
          − 4π² sin(2πy) (tanh(10x) + (1/10) sin(2πx)). (18)

The analytical solution to this problem is:

T(x, y) = (0.1 sin(2πx) + tanh(10x)) sin(2πy). (19)

The weak form of Eq. (16) is obtained by multiplying the equation by a test function Q and integrating by parts over the domain, which yields:

∫_Ω [∇Q · ∇T + Q f(x, y)] dV = ∫_∂Ω Q q_n dS, (20)

where Ω is the domain of the problem, ∂Ω is the boundary of the domain, q_n is the boundary heat flux, and Q is the test function. The trial space for the temperature field T is constructed by a neural network as T : (x, y) ↦ N_T(x, y; W, b). For the test space Q, the authors of [27] suggest the use of polynomials that satisfy the boundary conditions. However, considering the universal approximation capabilities of neural networks, we suggest that this step is unnecessary, and a general neural network can be used as the test function. Note that test functions should satisfy continuity requirements as well as boundary conditions. A multi-layer neural network with any nonlinear activation function is a good candidate for the continuity requirements. To satisfy the boundary conditions, we can simply train the test function to vanish on the boundary. Note that this step is associated with the construction of a proper test function and is done as a preprocessing step. Once the test function satisfies the (homogeneous) boundary conditions, there is no need to train it further, and therefore its parameters can be set to non-trainable at this stage. We also find that there is no need for the N_T and N_Q networks to be of the same size, or to use the same activation functions. Therefore, the test function Q is defined as Q : (x, y) ↦ N_Q(x, y; W̄, b̄), subject to Q(x = ±1, y) = Q(x, y = ±1) = 0. Here, the overbars on the weights and biases W̄, b̄ indicate that their values are predefined and fixed (non-trainable). The boundary flux integral on the right-hand side of Eq. (20) then vanishes, and the resulting weak form can be expressed as:

∫_Ω [∇Q · ∇T + Q f(x, y)] dV = 0. (21)

The problem can be defined in SciANN as follows. The first step is to construct a proper test function:

    Q = sn.Functional('Q', [x, y], 4*[20], 'sigmoid')
    m = sn.SciModel([x, y], [Q])
    m.train([x_data, y_data], [Q_data])
    Q.set_trainable(False)
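The source does not spell out how Q_data is constructed; the sketch below shows one plausible choice, assuming the test function is fit to an arbitrary smooth field that vanishes on the boundary of [−1, 1] × [−1, 1], here (1 − x²)(1 − y²):

    import numpy as np

    # Sampling grid over the domain [-1, 1] x [-1, 1].
    xg, yg = np.meshgrid(np.linspace(-1, 1, 71), np.linspace(-1, 1, 71))
    x_data = xg.reshape(-1, 1)
    y_data = yg.reshape(-1, 1)

    # Any smooth target that is zero on the boundary yields an admissible test function.
    Q_data = (1 - x_data**2)*(1 - y_data**2)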
As discussed earlier, Q_data takes a value of 0 on the boundary, and the parameters of Q are set to non-trainable at the end of this step. The trial function T and the target weak form in Eq. (21) are now implemented as:

    T = sn.Functional('T', [x, y], 4*[20], 'tanh')
    Q_x, Q_y = diff(Q, x), diff(Q, y)
    T_x, T_y = diff(T, x), diff(T, y)
    fxy = sn.Variable('fxy')
    vol = sn.Variable('vol')
    J = (Q_x*T_x + Q_y*T_y + Q*fxy)*vol
Since the variational relation (21) takes an integral form, we need to perform a domain integral. Therefore, the volume (quadrature-weight) information should be passed to the network, along with the heat-source information, at the quadrature points. This is achieved by introducing two new SciANN variables as inputs to the network. The optimization model is then defined as:

    m = sn.SciModel([x, y, vol, fxy], [J, T], "mse")
    m.train(
        [x_data, y_data, vol_data, fxy_data],
        ['zeros', (bc_ids, bc_vals)])

The second target on T imposes the boundary conditions at specific quadrature points bc_ids. Following the details in [27], we perform the integration on a 70 × 70 grid. The results are shown in Fig. 7, and are very similar to those reported in [27].

Figure 7: Solution of a steady-state heat equation using the vPINN framework. (a) True temperature field, T(x, y). (b) Temperature field predicted by the neural network, T̂(x, y). (c) Absolute error between true and predicted values, |T(x, y) − T̂(x, y)|.
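For completeness, a hedged sketch of how the quadrature inputs used above (x_data, y_data, vol_data, fxy_data) could be assembled, assuming simple midpoint quadrature on the uniform 70 × 70 grid ([27] uses Gauss quadrature; f is the heat source of Eq. (18)):

    import numpy as np

    def source(x, y):
        # Heat source of Eq. (18).
        t = np.tanh(10*x)
        return (np.sin(2*np.pi*y)*(20*t*(10*t**2 - 10)
                                   - 2*np.pi**2*np.sin(2*np.pi*x)/5)
                - 4*np.pi**2*np.sin(2*np.pi*y)*(t + np.sin(2*np.pi*x)/10))

    # Midpoint rule: 70 x 70 cells over [-1, 1] x [-1, 1], each of area h*h.
    h = 2.0/70
    centers = np.linspace(-1 + h/2, 1 - h/2, 70)
    xg, yg = np.meshgrid(centers, centers)
    x_data, y_data = xg.reshape(-1, 1), yg.reshape(-1, 1)
    fxy_data = source(x_data, y_data)
    vol_data = np.full_like(x_data, h*h)   # quadrature weight per point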
6. Conclusions
In this paper, we have introduced the open-source deep-learning package SciANN, designed specifically to facilitate physics-informed simulation, inversion, and discovery in the context of computational science and engineering problems. It can be used for regression and physics-informed deep learning with minimal effort on the neural network setup. It is based on the Tensorflow and Keras packages, and therefore it inherits all the high-performance computing capabilities of the Tensorflow back-end, including CPU/GPU parallelization capabilities. The objective of this paper is to introduce an environment based on a modern implementation of graph-based neural networks and automatic differentiation, to be used as a platform for scientific computations. In a series of examples, we have shown how to use SciANN for curve fitting, solving PDEs in strong and weak form, and for model inversion in the context of physics-informed deep learning. The examples presented here, as well as the package itself, are all open-source and available in the GitHub repository github.com/sciann.
Acknowledgments
This work was funded by the KFUPM-MIT collaborative agreement ‘Multiscale ReservoirScience’.
References

[1] C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
[2] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, in: F. Pereira, C. J. C. Burges, L. Bottou, K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 25, MIT Press, 2012, pp. 1097–1105.
[3] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436–444. doi:10.1038/nature14539.
[4] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich, Recommender Systems: An Introduction, Cambridge University Press, 2010.
[5] S. Zhang, L. Yao, A. Sun, Y. Tay, Deep learning based recommender system: A survey and new perspectives, ACM Computing Surveys (CSUR) 52 (1) (2019) 1–38.
[6] A. Graves, A.-r. Mohamed, G. Hinton, Speech recognition with deep recurrent neural networks, in: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, 2013, pp. 6645–6649.
[7] M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang, et al., End to end learning for self-driving cars, arXiv preprint arXiv:1604.07316 (2016).
[8] R. Miotto, F. Wang, S. Wang, X. Jiang, J. T. Dudley, Deep learning for healthcare: review, opportunities and challenges, Briefings in Bioinformatics 19 (6) (2018) 1236–1246.
[9] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016.
[10] Q. Kong, D. T. Trugman, Z. E. Ross, M. J. Bianco, B. J. Meade, P. Gerstoft, Machine learning in seismology: turning data into insights, Seismological Research Letters 90 (1) (2018) 3–14. doi:10.1785/0220180259.
[11] Z. E. Ross, D. T. Trugman, E. Hauksson, P. M. Shearer, Searching for hidden earthquakes in Southern California, Science 364 (2019) 767–771. doi:10.1126/science.aaw6888.
[12] K. J. Bergen, P. A. Johnson, M. V. de Hoop, G. C. Beroza, Machine learning for data-driven discovery in solid Earth geoscience, Science 363 (6433) (2019) eaau0323. doi:10.1126/science.aau0323.
[13] M. P. Brenner, J. D. Eldredge, J. B. Freund, Perspective on machine learning for advancing fluid mechanics, Physical Review Fluids 4 (10) (2019) 100501. doi:10.1103/PhysRevFluids.4.100501.
[14] S. L. Brunton, B. R. Noack, P. Koumoutsakos, Machine learning for fluid mechanics, Annual Review of Fluid Mechanics 52 (2020). arXiv:1905.11075, doi:10.1146/annurev-fluid-010719-060214.
[15] S. Dana, M. F. Wheeler, A machine learning accelerated FE homogenization algorithm for elastic solids (2020). arXiv:2003.11372.
[16] A. M. Tartakovsky, C. O. Marrero, P. Perdikaris, G. D. Tartakovsky, D. Barajas-Solano, Learning parameters and constitutive relationships with physics informed deep neural networks (2018). arXiv:1808.03398.
[17] K. Xu, D. Z. Huang, E. Darve, Learning constitutive relations using symmetric positive definite neural networks (2020) 1–31. arXiv:2004.00265.
[18] M. Raissi, Deep hidden physics models: Deep learning of nonlinear partial differential equations, Journal of Machine Learning Research 19 (2018) 1–24. arXiv:1801.06637.
[19] M. Raissi, P. Perdikaris, G. E. Karniadakis, Numerical Gaussian processes for time-dependent and nonlinear partial differential equations, SIAM Journal on Scientific Computing 40 (1) (2018) A172–A198. arXiv:1703.10230, doi:10.1137/17M1120762.
[20] E. Haghighat, M. Raissi, A. Moure, H. Gomez, R. Juanes, A deep learning framework for solution and discovery in solid mechanics (2020). arXiv:2003.02751.
[21] S. Rudy, A. Alla, S. L. Brunton, J. N. Kutz, Data-driven identification of parametric partial differential equations, SIAM Journal on Applied Dynamical Systems 18 (2) (2019) 643–660. doi:10.1137/18M1191944.
[22] J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley, Y. Bengio, Theano: a CPU and GPU math expression compiler, in: Proceedings of the Python for Scientific Computing Conference (SciPy), Vol. 4, Austin, TX, 2010.
[23] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, X. Zheng, TensorFlow: A system for large-scale machine learning, in: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), USENIX Association, Savannah, GA, 2016, pp. 265–283.
[24] T. Chen, M. Li, Y. Li, M. Lin, N. Wang, M. Wang, T. Xiao, B. Xu, C. Zhang, Z. Zhang, MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems (2015). arXiv:1512.01274.
[25] F. Chollet, Deep Learning with Python, Manning Publications Company, 2017.
[26] A. G. Baydin, B. A. Pearlmutter, A. A. Radul, J. M. Siskind, Automatic differentiation in machine learning: a survey, Journal of Machine Learning Research 18 (2018) 1–43.
[27] E. Kharazmi, Z. Zhang, G. E. Karniadakis, hp-VPINNs: Variational physics-informed neural networks with domain decomposition (2020) 1–21. arXiv:2003.05385.
[28] K. Hornik, M. Stinchcombe, H. White, Multilayer feed-forward networks are universal approximators, Neural Networks 2 (5) (1989) 359–366. doi:10.1016/0893-6080(89)90020-8.
[29] G. Cybenko, Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals and Systems 2 (4) (1989) 303–314. doi:10.1007/BF02551274.
[30] K. Hornik, Approximation capabilities of multilayer feed-forward networks, Neural Networks 4 (2) (1991) 251–257. doi:10.1016/0893-6080(91)90009-T.
[31] D. E. Rumelhart, G. E. Hinton, R. J. Williams, Learning representations by back-propagating errors, Nature 323 (6088) (1986) 533–536. doi:10.1038/323533a0.
[32] C. M. Dafermos, Hyperbolic Conservation Laws in Continuum Physics, Springer-Verlag, Berlin, 2000.
[33] J. C. Simo, T. J. R. Hughes, Computational Inelasticity, Vol. 7 of Interdisciplinary Applied Mathematics, Springer, New York, 1998.
[34] O. Zienkiewicz, S. Valliappan, I. King, Elasto-plastic solutions of engineering problems 'initial stress', finite element approach, International Journal for Numerical Methods in Engineering 1 (1) (1969) 75–100.
[35] COMSOL, COMSOL Multiphysics User's Guide, COMSOL, Stockholm, Sweden, 2020.
[36] E. Weinan, B. Yu, The Deep Ritz Method: A deep learning-based numerical algorithm for solving variational problems, Communications in Mathematics and Statistics 6 (1) (2018) 1–14. arXiv:1710.00211, doi:10.1007/s40304-018-0127-z.
[37] J. Berg, K. Nyström, A unified deep artificial neural network approach to partial differential equations in complex geometries, Neurocomputing 317 (2018) 28–41. arXiv:1711.06464, doi:10.1016/j.neucom.2018.06.056.