SciANN: A Keras/Tensorflow wrapper for scientific computations and physics-informed deep learning using artificial neural networks
Ehsan Haghighat, Ruben Juanes
Massachusetts Institute of Technology, Cambridge, MA
Abstract
In this paper, we introduce SciANN, a Python package for scientific computing and physics-informed deep learning using artificial neural networks. SciANN uses the widely used deep-learning packages Tensorflow and Keras to build deep neural networks and optimization models, thus inheriting many of Keras's functionalities, such as batch optimization and model reuse for transfer learning. SciANN is designed to abstract neural network construction for scientific computations and solution and discovery of partial differential equations (PDEs) using the physics-informed neural networks (PINN) architecture, therefore providing the flexibility to set up complex functional forms. We illustrate, in a series of examples, how the framework can be used for curve fitting on discrete data, and for solution and discovery of PDEs in strong and weak forms. We summarize the features currently available in SciANN, and also outline ongoing and future developments.
Keywords:
SciANN, Deep Neural Networks, Scientific Computations, PINN, vPINN
1. Introduction
Over the past decade, artificial neural networks, also known as deep learning, have revolutionized many computational tasks, including image classification and computer vision [1, 2, 3], search engines and recommender systems [4, 5], speech recognition [6], autonomous driving [7], and healthcare [8] (for a review, see, e.g., [9]). Even more recently, this data-driven framework has made inroads in engineering and scientific applications, such as earthquake detection [10, 11, 12], fluid mechanics and turbulence modeling [13, 14], dynamical systems [15], and constitutive modeling [16, 17]. A recent class of deep learning known as physics-informed neural networks (PINN) has been shown to be particularly well suited for solution and inversion of equations governing physical systems, in domains such as fluid mechanics [18, 19], solid mechanics [20] and dynamical systems [21]. This increased interest in engineering and science is due to the increased availability of data and of open-source platforms such as Theano [22], Tensorflow [23], MXNet [24], and Keras [25], which offer features such as high-performance computing and automatic differentiation [26]. Advances in deep learning have led to the emergence of different neural network architectures, including densely connected multi-layer deep neural networks (DNNs), convolutional neural networks (CNNs), recurrent neural networks (RNNs) and residual networks (ResNets). This proliferation of network architectures, and the (often steep) learning curve for each package, makes it challenging for new researchers in the field to use deep learning
tools in their computational workflows. In this paper, we introduce an open-source Python package, SciANN, developed on Tensorflow and Keras, which is designed with scientific computations and physics-informed deep learning in mind. As such, the abstractions used in this programming interface target engineering applications such as model fitting, solution of ordinary and partial differential equations, and model inversion (parameter identification). The outline of the paper is as follows. We first describe the functional form associated with deep neural networks. We then discuss different interfaces in SciANN that can be used to set up neural networks and optimization problems. We then illustrate SciANN's application to curve fitting, the solution of the Burgers equation, and the identification of the Navier–Stokes equations and the von Mises plasticity model from data. Lastly, we show how to use SciANN in the context of the variational PINN framework [27]. The examples discussed here and several additional applications are freely available at github.com/sciann/sciann-applications.
2. Artificial Neural Networks as Universal Approximators
A single-layer feed-forward neural network with inputs x ∈ R^m, outputs y ∈ R^n, and d hidden units is constructed as:

y = W_1 σ(W_0 x + b_0) + b_1, (1)

where (W_0 ∈ R^{d×m}, b_0 ∈ R^d) and (W_1 ∈ R^{n×d}, b_1 ∈ R^n) are the parameters of this transformation, also known as weights and biases, and σ is the activation function. As shown in [28, 29], this transformation can approximate any measurable function, independently of the size of the input features m or the activation function σ. If we define the transformation Σ_i as Σ_i(x̂_i) := ŷ_i = σ_i(W_i x̂_i + b_i), with x̂_i as the input to and ŷ_i as the output of any hidden layer i, x = x̂_1 as the main input to the network, and y = Σ_L(x̂_L) as the final output of the network, we can construct a general L-layer neural network as a composition of the Σ_i functions:

y = Σ_L ∘ Σ_{L−1} ∘ ⋯ ∘ Σ_1(x), (2)

with σ_i as activation functions that make the transformations nonlinear. Some common activation functions are:

ReLU: x̂ ↦ x̂^+,
sigmoid: x̂ ↦ 1/(1 + e^{−x̂}),
tanh: x̂ ↦ (e^{x̂} − e^{−x̂})/(e^{x̂} + e^{−x̂}). (3)

In general, this multilayer feed-forward neural network is capable of approximating functions to any desired accuracy [28, 30]. Inaccurate approximation may arise due to a lack of a deterministic relation between inputs and outputs, an insufficient number of hidden units, inadequate training, or a poor choice of the optimization algorithm. The parameters of the neural network, W_i and b_i of all layers i ∈ {1, …, L}, are identified through minimization using a back-propagation algorithm [31]. For instance, if we approximate a field variable such as temperature T with a multi-layer neural network as T(x) ≈ T̂(x) = N_T(x; W, b), we can set up the optimization problem as

arg min_{W,b} L(W, b) := ‖T(x*) − T̂(x*)‖ = ‖T(x*) − N_T(x*; W, b)‖, (4)

where x* is the set of discrete training points and ‖∘‖ is the mean squared-error norm. Note that one can use other choices for the loss function L, such as mean absolute error or cross-entropy. The optimization problem (4) is nonconvex, which may require significant trial and error to find an effective optimization algorithm and optimization parameters.

We can construct deep neural networks with an arbitrary number of layers and neurons. We can also define multiple networks and combine them to generate the final output. There are many types of neural networks that have been optimized for specific tasks. An example is the ResNet architecture introduced for image classification, consisting of many blocks, each of the form:

z^k = Σ^k_3 ∘ Σ^k_2 ∘ Σ^k_1(z^{k−1}) + z^{k−1}, (5)

where k is the block number and z^{k−1} is the output of the previous block, with x = z^0 and y = z^K as the main inputs to and outputs of the network. Therefore, artificial neural networks offer a simple way of constructing very complex but dependent solution spaces (see, e.g., Fig. 1).

Figure 1: A sample multi-net architecture to construct a complex functional space g as g(x, y) = g(f_1(x, y), f_2(x, y)).
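As a concrete illustration of Eqs. (1)–(2), the forward pass of an L-layer network reduces to a few lines of array code. The following minimal NumPy sketch is for illustration only (it is not part of SciANN; the layer sizes and the tanh activation are arbitrary choices):

    import numpy as np

    def forward(x, weights, biases):
        # Eq. (2): apply each hidden layer's transformation σ(W x + b) in sequence,
        # with tanh as the activation for all hidden layers.
        for W, b in zip(weights[:-1], biases[:-1]):
            x = np.tanh(W @ x + b)
        # Linear output layer, as in Eq. (1).
        return weights[-1] @ x + biases[-1]

    # A network with m = 2 inputs, two hidden layers of d = 6 units, and n = 1 output.
    rng = np.random.default_rng(0)
    sizes = [2, 6, 6, 1]
    weights = [rng.standard_normal((sizes[i+1], sizes[i])) for i in range(3)]
    biases = [rng.standard_normal(sizes[i+1]) for i in range(3)]
    print(forward(np.array([0.5, -0.3]), weights, biases))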
3. SciANN: Scientific Computing with Artificial Neural Networks
SciANN is an open-source neural-network library, based on Tensorflow [23] and Keras [25], which abstracts the application of deep learning for scientific computing purposes. In this section, we discuss abstraction choices for SciANN and illustrate how one can use it for scientific computations.

3.1. Brief description of SciANN
SciANN is implemented on the most popular deep-learning packages, Tensorflow and Keras, and therefore it inherits all the functionalities they provide. Among those, the most important ones include graph-based automatic differentiation and massive heterogeneous high-performance computing capabilities. It is designed for an audience with a background in scientific computation or computational science and engineering. SciANN currently supports fully connected feed-forward deep neural networks, and recurrent networks are under development. Some architectures, such as convolutional networks, are not a good fit for scientific computing applications and therefore are not currently in our development plans. Tensorflow and Keras provide a wide range of features, including optimization algorithms, automatic differentiation, and model parameter exports for transfer learning. To install SciANN, one can simply use Python's pip package installer:

    pip install sciann
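A quick way to confirm the installation, assuming the package exposes the customary __version__ attribute:

    import sciann as sn
    print(sn.__version__)   # prints the installed SciANN version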
It can be imported into the active Python environment using Python's import module:

    import sciann as sn

Its mathematical functions are located in the sn.math interface. For instance, the function diff is accessed through sn.math.diff. The main building blocks of SciANN include:

• sn.Variable: class to define inputs to the network.
• sn.Field: class to define outputs of the network.
• sn.Functional: class to construct a nonlinear neural network approximation.
• sn.Parameter: class to define a parameter for inversion purposes.
• sn.Data, sn.Tie: classes to define the targets. If there are observations for any variable, the sn.Data interface is used when building the optimization model. For physical constraints such as PDEs or equality relations between different variables, the sn.Tie interface is designed to build the optimizer.
• sn.SciModel: class to set up the optimization problem, i.e. inputs to the networks, targets (objectives), and the loss function.
• sn.math: mathematical operations are accessed here. SciANN also supports operator overloading, which improves readability when setting up complex mathematical relations such as PDEs.

3.2. An illustrative example: curve fitting

We illustrate SciANN's capabilities with its application to a curve-fitting problem. Given a set of discrete data, generated from f(x, y) = sin(x) sin(y) over the domain [−π, π] × [−π, π], we want to fit a surface, in the form of a neural network, to this dataset. A multi-layer neural network approximating the function f can be constructed as f̂ : (x, y) ↦ N_f(x, y; W, b), with inputs x, y and output f̂. In the most common mathematical and Pythonic abstraction, the inputs x, y and output f̂ can be implemented as:

    x = sn.Variable("x")
    y = sn.Variable("y")
    f = sn.Field("f")

A 3-layer neural network with 6 neural units per layer and a hyperbolic-tangent activation function can then be constructed as:

    f = sn.Functional(
        fields=[f],
        variables=[x, y],
        hidden_layers=[6, 6, 6],
        actf="tanh")

This definition can be further compressed as:

    f = sn.Functional("f", [x, y], [6, 6, 6], "tanh")

At this stage, the parameters of the network, i.e. the set of W, b for all layers, are randomly initialized. Their current values can be retrieved using the command get_weights:

    f.get_weights()

One can set the parameters of the network to any desired values using the command set_weights. As another example, a more complex neural network functional, composed of three blocks as shown in Fig. 1, can be constructed as:

    f1 = sn.Functional("f1", [x, y], [4, 4], "tanh")
    f2 = sn.Functional("f2", [x, y], [4, 4], "tanh")
    g = sn.Functional("g", [f1, f2], [4, 4], "tanh")

Any of these functionals can be evaluated immediately, or after training, using the eval function, by providing discrete data for the inputs:

    f_test = f.eval([x_data, y_data])
    f1_test = f1.eval([x_data, y_data])
    f2_test = f2.eval([x_data, y_data])
    g_test = g.eval([f1_data, f2_data])

Once the networks are initialized, we set up the optimization problem and train the network by minimizing an objective function, i.e. solving the optimization problem for W and b.
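For completeness, the following is a minimal sketch of how such a training set could be generated with NumPy, assuming the uniform 51 × 51 grid used later in this section (the names x_data, y_data, f_data are the ones referenced throughout the example):

    import numpy as np

    # Uniform 51 x 51 grid over [-pi, pi] x [-pi, pi].
    x_grid, y_grid = np.meshgrid(np.linspace(-np.pi, np.pi, 51),
                                 np.linspace(-np.pi, np.pi, 51))

    # Flatten into columns of training points and evaluate the target surface.
    x_data = x_grid.reshape(-1, 1)
    y_data = y_grid.reshape(-1, 1)
    f_data = np.sin(x_data)*np.sin(y_data)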
The optimization problem for data-driven curve fitting is defined as:

arg min_{W,b} L(W, b) := ‖f(x*, y*) − N_f(x*, y*; W, b)‖, (6)

where (x*, y*) is the set of all discrete points where f is given. For the loss function ‖∘‖, we use the mean squared-error norm ‖∘‖ = (1/N) Σ_{(x*,y*) ∈ I} (f(x*, y*) − f̂(x*, y*))². This problem is set up in SciANN through the SciModel class as:

    m = sn.SciModel(
        inputs=[x, y],
        targets=[f],
        loss_func="mse",
        optimizer="adam")

The train method is then used to perform the training and identify the parameters of the neural network:

    m.train([x_data, y_data], [f_data], epochs=400)

Once the training is completed, one can set the parameters of a Functional to be trainable or non-trainable (fixed). For instance, to set f1 to be non-trainable:

    f1.set_trainable(False)

The result of this training is shown in Fig. 2, where we have used 400 epochs to perform the training on a dataset generated on a uniform grid of 51 × 51 points.

Since the data are generated from f(x, y) = sin(x) sin(y), we know that this is a solution to ∆f + 2f = 0, with ∆ the Laplacian operator. As a first illustration of SciANN for physics-informed deep learning, we can constrain the curve-fitting problem with this 'governing equation'. In SciANN, the differentiation operators are evaluated through the sn.math.diff function. Denoting by fxy the functional f̂ defined above, this differential equation can be evaluated as:

    L = diff(fxy, x, order=2) + diff(fxy, y, order=2) + 2*fxy

with order expressing the order of differentiation. Based on the physics-informed deep learning framework, the governing equation can be imposed through the objective function. The optimization problem can then be defined as

arg min_{W,b} L(W, b) := ‖f(x*, y*) − f̂(x*, y*)‖ + ‖∆f̂(x*, y*) + 2f̂(x*, y*)‖, (7)

and implemented in SciANN as:

    m = sn.SciModel([x, y], [fxy, L])
    m.train([x_mesh, y_mesh], [(ids_data, fxy_data), 'zero'],
            epochs=400)

Note that while the inputs are the same as for the previous case, the optimization model is defined with two targets, fxy and L. The training data for fxy remains the same; the sampling grid, however, can be expanded further since the 'physics' can be imposed everywhere. A sampling grid of 101 × 101 points is used here, where data is only given at the same locations as in the previous case, i.e. on the 51 × 51 grid. To impose the target L, it is simply set to 'zero'. The new result is shown in Fig. 3. We find that, for the same network size and training parameters, incorporating the 'physics' reduces the error significantly.

Figure 2: Using SciANN to train a network on synthetic data generated from sin(x) sin(y); (a): network predictions; (b): absolute error with respect to true values.

Figure 3: Using SciANN to train a network on synthetic data generated from sin(x) sin(y) and imposing the governing equation f_,xx + f_,yy + 2f = 0; (a): network predictions; (b): absolute error with respect to true values.

Once the training is completed, the weights W, b for all layers can be saved using the command save_weights, for future use. These weights can later be used to initialize a network of the same structure using the load_weights_from keyword in SciModel.
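The tuple (ids_data, fxy_data) used above restricts the data loss to the grid points where observations exist. A minimal sketch of how these arrays could be assembled, exploiting the fact that every other node of the 101 × 101 sampling grid coincides with the 51 × 51 data grid:

    import numpy as np

    # 101 x 101 sampling grid for the physics target.
    x1 = np.linspace(-np.pi, np.pi, 101)
    xg, yg = np.meshgrid(x1, x1)
    x_mesh, y_mesh = xg.reshape(-1, 1), yg.reshape(-1, 1)

    # Flat indices of the nodes with both grid coordinates even,
    # i.e. the 51 x 51 subset where data is available.
    ii, jj = np.meshgrid(np.arange(0, 101, 2), np.arange(0, 101, 2))
    ids_data = (ii*101 + jj).reshape(-1)
    fxy_data = np.sin(x_mesh[ids_data])*np.sin(y_mesh[ids_data])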
4. Application of SciANN to Physics-Informed Deep Learning
In this section, we explore the use of SciANN in representative case studies of physics-informed deep learning, for both solution and discovery of PDEs.
As the first example, we illustrate the use of SciANN to solve the Burgers equation, which arises in fluid mechanics, acoustics, and traffic flow [32]. Following [18], we explore the governing equation:

u_,t + u u_,x − (0.01/π) u_,xx = 0, t ∈ [0, 1], x ∈ [−1, 1], (8)

subject to the initial condition u(t = 0, x) = −sin(πx) and boundary conditions u(t, x = ±1) = 0. The solution variable u can be approximated by û, defined in the form of a nonlinear neural network as û : (t, x) ↦ N_u(t, x; W, b). The network used in [18] consists of 8 hidden layers, each with 20 neurons and a tanh activation function, and can be defined in SciANN as:

    t = sn.Variable("t")
    x = sn.Variable("x")
    u = sn.Functional("u", [t, x], 8*[20], "tanh")

To set up the optimization problem, we need to identify the targets. The first target, as used in the PINN framework, is the PDE residual of Eq. (8), defined in SciANN as:

    from numpy import pi
    from sciann.math import diff

    L1 = diff(u, t) + u*diff(u, x) - (0.01/pi)*diff(u, x, order=2)
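Training also requires discrete collocation points (t_data, x_data) covering the space-time domain. A minimal sketch, assuming a simple uniform grid (the original study [18] samples collocation points randomly; the resolution here is an arbitrary choice):

    import numpy as np

    # Uniform space-time collocation grid over t in [0, 1], x in [-1, 1].
    t_mesh, x_mesh = np.meshgrid(np.linspace(0, 1, 101),
                                 np.linspace(-1, 1, 201))
    t_data = t_mesh.reshape(-1, 1)
    x_data = x_mesh.reshape(-1, 1)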
To impose the initial and boundary conditions, one can define them as continuous mathematical functions defined at all sampling points:

L2 := (1 − sign(t − t_min)) (u + sin(πx)),
L3 := (1 − sign(x − x_min)) u,
L4 := (1 + sign(x − x_max)) u. (9)

For instance, L2 is zero at all sampling points except where t < t_min, with t_min chosen slightly above the initial time, i.e. t_min = 0 + tol. Instead of sign, one can use smoother functions such as tanh. In this way, the optimization model can be set up as:

    m = sn.SciModel([t, x], [L1, L2, L3, L4], "mse", "Adam")

In this case, all targets should 'vanish', and therefore the training is done as:

    m.train(
        [t_data, x_data],
        ['zeros', 'zeros', 'zeros', 'zeros'],
        batch_size=256,
        epochs=10000)

An alternative approach to define the boundary conditions in SciANN is to define the target in the sn.SciModel as the variable of interest, and pass the 'ids' of the training data where the conditions should be imposed. This is achieved as:

    m = sn.SciModel([t, x], [L1, u], "mse", "Adam")
    m.train(
        [t_data, x_data],
        ['zeros', (ids_ic_bc, U_ic_bc)],
        batch_size=256,
        epochs=10000)

Here, ids_ic_bc are the ids associated with the collocation points (t_data, x_data) where the initial and boundary conditions are given. An important point to keep in mind is that if the sampling points where boundary conditions are imposed make up only a very small portion of the dataset, the mini-batch optimization parameter batch_size should be set to a large number to guarantee consistent mini-batch optimization. Otherwise, some mini-batches may not acquire any data on the boundary and therefore not generate the correct gradient for the gradient-descent update. Also worth noting is that setting governing relations to 'zero' is conveniently done in SciANN.

The result of solving the Burgers equation using the deep learning framework is shown in Fig. 4. The results match the exact solution accurately, and reproduce the formation of a shock (self-sharpening discontinuity) in the solution at x = 0.

Figure 4: Solution of the Burgers equation using PINN. (a) True solution for u; (b) PINN predicted values û; (c) Absolute error between true and predicted values, |u − û|.

As a second example, we show how SciANN can be used for discovery of partial differential equations. We choose the incompressible Navier–Stokes problem used in [18]. The equations are:

u_,t + p_,x + λ_1 (u u_,x + v u_,y) − λ_2 (u_,xx + u_,yy) = 0,
v_,t + p_,y + λ_1 (u v_,x + v v_,y) − λ_2 (v_,xx + v_,yy) = 0, (10)

where u and v are the components of the velocity field in the x and y directions, respectively, p is the density-normalized pressure, λ_1 should be identically equal to 1 for Newtonian fluids, and λ_2 is the kinematic viscosity. The true values of the parameters to be identified are λ_1 = 1 and λ_2 = 0.01. Given the assumption of fluid incompressibility, we use the divergence-free form of the equations, from which the components of the velocity are obtained as:

u = ψ_,y, v = −ψ_,x, (11)

where ψ is the potential (stream) function; this construction satisfies mass conservation, u_,x + v_,y = 0, identically.

Here, the independent field variables p and ψ are approximated as p(t, x, y) ≈ p̂(t, x, y) and ψ(t, x, y) ≈ ψ̂(t, x, y), respectively, using nonlinear artificial neural networks p̂ : (t, x, y) ↦ N_p(t, x, y; W, b) and ψ̂ : (t, x, y) ↦ N_ψ(t, x, y; W, b). Using the same network size and activation function as in [19], we set up the neural networks in SciANN as:

    p = sn.Functional("p", [t, x, y], 8*[20], 'tanh')
    psi = sn.Functional("psi", [t, x, y], 8*[20], 'tanh')

Note that this way of defining the networks results in two separate networks for p and ψ, which we find more suitable for many problems. To replicate the one-network model used in the original study, one can use:

    p, psi = sn.Functional(
        ["p", "psi"], [t, x, y], 8*[20], 'tanh').split()

Here, the objective is to identify the parameters λ_1 and λ_2 of the Navier–Stokes equations (10) on a dataset with a given velocity field. Therefore, we need to define these as trainable parameters of the network. This is done using the sn.Parameter interface:

    lamb1 = sn.Parameter(0.0, [x, y], name="lamb1")
    lamb2 = sn.Parameter(0.0, [x, y], name="lamb2")

Note that these parameters are initialized with a value of 0.0. The required derivatives in Equations (10) and (11) are evaluated as:

    u, v = diff(psi, y), -diff(psi, x)
    u_t, v_t = diff(u, t), diff(v, t)
    u_x, u_y = diff(u, x), diff(u, y)
    v_x, v_y = diff(v, x), diff(v, y)
    u_xx, u_yy = diff(u, x, order=2), diff(u, y, order=2)
    v_xx, v_yy = diff(v, x, order=2), diff(v, y, order=2)
    p_x, p_y = diff(p, x), diff(p, y)

with 'order' indicating the order of differentiation. We can now set up the targets of the problem as:

    L1 = u_t + p_x + lamb1*(u*u_x + v*u_y) - lamb2*(u_xx + u_yy)
    L2 = v_t + p_y + lamb1*(u*v_x + v*v_y) - lamb2*(v_xx + v_yy)
    L3 = u
    L4 = v
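The training data for this example are velocity measurements from the flow-past-a-cylinder dataset used in [18, 19]. As a hedged sketch only, assuming that dataset has been exported to a NumPy archive with hypothetical file and key names, the flattened training arrays could be prepared as:

    import numpy as np

    # Hypothetical file and key names; any container with the same content works.
    data = np.load("cylinder_wake.npz")
    t_data = data["t"].reshape(-1, 1)
    x_data = data["x"].reshape(-1, 1)
    y_data = data["y"].reshape(-1, 1)
    u_data = data["u"].reshape(-1, 1)   # measured x-velocity at (t, x, y)
    v_data = data["v"].reshape(-1, 1)   # measured y-velocity at (t, x, y)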
The optimization model is now set up as:

    m = sn.SciModel([t, x, y], [L1, L2, L3, L4], "mse", "Adam")
    m.train(
        [t_data, x_data, y_data],
        ['zeros', 'zeros', u_data, v_data],
        batch_size=64,
        epochs=10000)

Training data for u and v are provided, as in [19]. The results are shown in Fig. 5.

Figure 5: Predicted values from the PINN framework, for the field variables u, v and p, at different times t. The identified parameters λ_1 and λ_2 closely match their true values.

As a third example, we illustrate the use of PINN for solution and discovery in nonlinear solid mechanics. We use the von Mises elastoplastic constitutive model, which is commonly used to describe the mechanical behavior of solid materials, in particular metals. Elastoplasticity relations give rise to inequality constraints on the governing equations [33] and, therefore, compared to the Navier–Stokes equations, they pose a different challenge for incorporation in PINN. The elastoplastic relations for a plane-strain problem are:

σ_ij,j + f_i = 0,
σ_ij = s_ij − p δ_ij,
p = −σ_kk/3 = −(λ + 2μ/3) ε_v,
s_ij = 2μ e^e_ij,
ε_ij = (u_i,j + u_j,i)/2 = e_ij + ε_v δ_ij/3,
ε_v = ε_kk = ε_xx + ε_yy,
e_ij = e^e_ij + e^p_ij. (12)

Here, the summation convention is used with i, j, k ∈ {x, y}. σ_ij are the components of the Cauchy stress tensor, and s_ij and p are its deviatoric components and its pressure invariant, respectively. ε_ij are the components of the infinitesimal strain tensor derived from the displacements u_x, u_y, and e_ij and ε_v are its deviatoric and volumetric components, respectively.

According to the von Mises plasticity model, the admissible state of stress is defined inside the cylindrical yield surface F = F(σ_ij), with F := q − σ_Y ≤ 0. Here, q is the equivalent stress, defined as q = √(3/2 s_ij s_ij). Assuming the associative flow rule, the plastic strain components are:

ε^p_ij ≡ e^p_ij = ε̄^p ∂F/∂σ_ij = (3/2) ε̄^p s_ij/q, (13)

where ε̄^p is the equivalent plastic strain, subject to ε̄^p ≥ 0. For the von Mises model, it can be shown that ε̄^p is evaluated as

ε̄^p = ε̄ − σ_Y/(3μ) ≥ 0, (14)

where ε̄ is the total equivalent strain, defined as ε̄ = √(2/3 e_ij e_ij). Note that for von Mises plasticity, the volumetric part of the plastic strain tensor is zero, ε^p_v = 0. Finally, the parameters of this model include the Lamé elastic parameters λ and μ, and the yield stress σ_Y.

We use a classic example to illustrate our framework: a perforated strip subjected to uniaxial extension [34, 33]. Consider a plate of dimensions 200 mm × 360 mm, with a circular hole of diameter 100 mm located in the center of the plate. The plate is subjected to extension displacements of δ = 1 mm along the short edge, under plane-strain conditions and without body forces, f_i = 0. The parameters are λ = 19.44 GPa, μ = 29.17 GPa and σ_Y = 243.0 MPa. We approximate the solution variables u_x, u_y, σ_xx, σ_yy, σ_zz, σ_xy with nonlinear neural networks as:

û_x : (x, y) ↦ N_{u_x}(x, y; W, b),
û_y : (x, y) ↦ N_{u_y}(x, y; W, b),
σ̂_xx : (x, y) ↦ N_{σ_xx}(x, y; W, b),
σ̂_yy : (x, y) ↦ N_{σ_yy}(x, y; W, b),
σ̂_zz : (x, y) ↦ N_{σ_zz}(x, y; W, b),
σ̂_xy : (x, y) ↦ N_{σ_xy}(x, y; W, b). (15)

Note that due to plastic deformation, the out-of-plane stress σ_zz is not predefined, and therefore we also approximate it with a neural network. These neural networks and the parameters λ, μ, σ_Y are defined as follows:

    ux = sn.Functional('ux', [x, y], 4*[50], 'tanh')
    uy = sn.Functional('uy', [x, y], 4*[50], 'tanh')
    sxx = sn.Functional('sxx', [x, y], 4*[50], 'tanh')
    syy = sn.Functional('syy', [x, y], 4*[50], 'tanh')
    szz = sn.Functional('szz', [x, y], 4*[50], 'tanh')
    sxy = sn.Functional('sxy', [x, y], 4*[50], 'tanh')
    lmbd = sn.Parameter(1.0, [x, y])
    mu = sn.Parameter(1.0, [x, y])
    sy = sn.Parameter(1.0, [x, y])

The kinematic relations, deviatoric stress components and plastic strains can then be defined as:

    Exx = diff(ux, x)
    Eyy = diff(uy, y)
    Exy = (diff(ux, y) + diff(uy, x))/2
    Evol = Exx + Eyy
    # Deviatoric strains (the shear component is unaffected by the volumetric split).
    exx = Exx - Evol/3
    eyy = Eyy - Evol/3
    ezz = -Evol/3
    exy = Exy
    ebar = sn.math.sqrt(2/3*(exx**2 + eyy**2 + ezz**2 + 2*exy**2))
    # Pressure and deviatoric stresses.
    prs = -(sxx + syy + szz)/3
    dxx = sxx + prs
    dyy = syy + prs
    dzz = szz + prs
    dxy = sxy
    q = sn.math.sqrt(3/2*(dxx**2 + dyy**2 + dzz**2 + 2*dxy**2))
    # Equivalent plastic strain (Eq. 14) and plastic strain components (Eq. 13).
    pebar = sn.math.relu(ebar - sy/(3*mu))
    pexx = 1.5*pebar*dxx/q
    peyy = 1.5*pebar*dyy/q
    pezz = 1.5*pebar*dzz/q
    pexy = 1.5*pebar*dxy/q
    F = q - sy
The operator-overloading abstraction of SciANN improves readability significantly. Assuming access to measured data for the variables u_x, u_y, σ_xx, σ_yy, σ_zz, σ_xy, ε_xx, ε_yy, ε_xy, the optimization targets for the training data can be described using L* = sn.Data(*), where * refers to each variable. The physics-informed constraints are set as:

    # Pressure and deviatoric constitutive relations.
    kappa = lmbd + 2*mu/3   # bulk modulus, from Eq. (12)
    L1 = sn.Tie(prs, -kappa*Evol)
    L2 = sn.Tie(dxx, 2*mu*(exx - pexx))
    L3 = sn.Tie(dyy, 2*mu*(eyy - peyy))
    L4 = sn.Tie(dzz, 2*mu*(ezz - pezz))
    L5 = sn.Tie(dxy, 2*mu*(exy - pexy))
    # Yield condition.
    L6 = sn.math.relu(F)
    # (Quasi-static) stress equilibrium.
    L7 = sn.math.diff(sxx, x) + sn.math.diff(sxy, y)
    L8 = sn.math.diff(sxy, x) + sn.math.diff(syy, y)
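The listing above stops at the targets, so the following is a hedged sketch of how the optimization model could be assembled from them, following the SciModel/train pattern of the earlier examples; the data targets shown, the ids/values pairs, and the epoch count are placeholders, not the settings of the original study:

    # Data targets for the measured variables (one per observed field).
    Lux = sn.Data(ux)
    Luy = sn.Data(uy)
    Lsxx = sn.Data(sxx)
    # ... and similarly for syy, szz, sxy and the measured strain components.

    m = sn.SciModel(
        [x, y],
        [Lux, Luy, Lsxx, L1, L2, L3, L4, L5, L6, L7, L8],
        "mse", "Adam")
    m.train(
        [x_data, y_data],
        [(ids_data, ux_data), (ids_data, uy_data), (ids_data, sxx_data)]
        + ['zeros']*8,
        epochs=10000)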
We use 2,000 data points from a reference solution, obtained with the finite-element package COMSOL [35] and randomly distributed in the simulation domain, to provide the training data. The PINN training is performed using networks with 4 layers, each with 100 neurons, and with a hyperbolic-tangent activation function. The optimization parameters are the same as those used in [20]. The results predicted by the PINN approach match the reference results very closely, as evidenced by: (1) the very small errors in each of the components of the solution, except for the out-of-plane plastic strain components (Fig. 6); and (2) the precise identification of the yield stress σ_Y and the relatively accurate identification of the elastic parameters λ and μ, yielding estimated values of approximately λ = 18 GPa, μ = 27 GPa and σ_Y = 243 MPa.

Figure 6: The predicted values from the PINN framework for displacements, strains, plastic strains and stresses. The inverted parameters are approximately λ = 18 GPa, μ = 27 GPa and σ_Y = 243 MPa.
5. Application to Variational PINN
Neural networks have recently been used to solve the variational form of differential equations as well [36, 37]. In a recent study [27], the vPINN framework for solving PDEs was introduced and analyzed. Like PINN, it is based on graph-based automatic differentiation. The authors of [27] suggest a Petrov–Galerkin approach, where the test functions are chosen differently from the trial functions. For the test functions, they propose the use of polynomials that vanish on the boundary of the domain. Here, we illustrate how to use SciANN for vPINN, and we show how to construct proper test functions using neural networks.

Consider the steady-state heat equation subject to Dirichlet boundary conditions and a known heat source f(x, y) [27]:

∆T + f(x, y) = 0, x, y ∈ [−1, 1] × [−1, 1], (16)

subject to the following boundary conditions:

T(x = ±1, y) = ±sin(2πy),
T(x, y = ±1) = 0, (17)

and a heat source:

f(x, y) = sin(2πy) (20 tanh(10x) (10 tanh²(10x) − 10) − (2π²/5) sin(2πx))
          − 4π² sin(2πy) (tanh(10x) + (1/10) sin(2πx)). (18)

The analytical solution to this problem is:

T(x, y) = (0.1 sin(2πx) + tanh(10x)) sin(2πy). (19)

The weak form of Eq. (16) is obtained by multiplying the equation by a test function Q and integrating by parts over the domain, which yields:

∫_Ω [∇Q · ∇T + Q f(x, y)] dV = ∫_∂Ω Q q_n dS, (20)

where Ω is the domain of the problem, ∂Ω is the boundary of the domain, q_n is the boundary heat flux, and Q is the test function. The trial space for the temperature field T is constructed by a neural network as T : (x, y) ↦ N_T(x, y; W, b). For the test space Q, the authors of [27] suggest the use of polynomials that satisfy the boundary conditions. However, considering the universal approximation capabilities of neural networks, we suggest that this step is unnecessary, and a general neural network can be used as the test function. Note that test functions should satisfy continuity requirements as well as boundary conditions. A multi-layer neural network with any nonlinear activation function is a good candidate for the continuity requirements. To satisfy the boundary conditions, we can simply train the test function to vanish on the boundary. Note that this step is associated with the construction of a proper test function and is done as a preprocessing step. Once the test function satisfies the (homogeneous) boundary conditions, there is no need to train it further, and therefore its parameters can be set to non-trainable at this stage. We also find that there is no need for the N_T and N_Q networks to be of the same size, or to use the same activation functions. Therefore, the test function Q is defined as Q : (x, y) ↦ N_Q(x, y; W̄, b̄), subject to Q(x = ±1, y) = Q(x, y = ±1) = 0. Here, the overbars on the weights and biases W̄, b̄ indicate that their values are predefined and fixed (non-trainable). The boundary flux integral on the right-hand side of Eq. (20) then vanishes, and the resulting weak form can be expressed as:

∫_Ω [∇Q · ∇T + Q f(x, y)] dV = 0. (21)

The problem can be defined in SciANN as follows. The first step is to construct a proper test function:

    Q = sn.Functional('Q', [x, y], 4*[20], 'sigmoid')
    m = sn.SciModel([x, y], [Q])
    m.train([x_data, y_data], [Q_data])
    Q.set_trainable(False)
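The source does not spell out how Q_data is constructed; the sketch below shows one plausible choice, assuming the test function is fit to an arbitrary smooth field that vanishes on the boundary of [−1, 1] × [−1, 1], here (1 − x²)(1 − y²):

    import numpy as np

    # Sampling grid over the domain [-1, 1] x [-1, 1].
    xg, yg = np.meshgrid(np.linspace(-1, 1, 71), np.linspace(-1, 1, 71))
    x_data = xg.reshape(-1, 1)
    y_data = yg.reshape(-1, 1)

    # Any smooth target that is zero on the boundary yields an admissible test function.
    Q_data = (1 - x_data**2)*(1 - y_data**2)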
As discussed earlier, Q_data takes a value of 0 on the boundary, and the parameters of Q are set to non-trainable at the end of this step. The trial function T and the target weak form in Eq. (21) are now implemented as:

    T = sn.Functional('T', [x, y], 4*[20], 'tanh')
    Q_x, Q_y = diff(Q, x), diff(Q, y)
    T_x, T_y = diff(T, x), diff(T, y)
    fxy = sn.Variable('fxy')
    vol = sn.Variable('vol')
    J = (Q_x*T_x + Q_y*T_y + Q*fxy)*vol
Since the variational relation (21) takes an integral form, we need to perform a domain integral. Therefore, the volume (quadrature-weight) information should be passed to the network, along with the heat-source information, at the quadrature points. This is achieved by introducing two new SciANN variables as inputs to the network. The optimization model is then defined as:

    m = sn.SciModel([x, y, vol, fxy], [J, T], "mse")
    m.train(
        [x_data, y_data, vol_data, fxy_data],
        ['zeros', (bc_ids, bc_vals)])

The second target on T imposes the boundary conditions at specific quadrature points bc_ids. Following the details in [27], we perform the integration on a 70 × 70 grid. The results are shown in Fig. 7, and are very similar to those reported in [27].

Figure 7: Solution of a steady-state heat equation using the vPINN framework. (a) True temperature field, T(x, y). (b) Temperature field predicted by the neural network, T̂(x, y). (c) Absolute error between true and predicted values, |T(x, y) − T̂(x, y)|.
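For completeness, a hedged sketch of how the quadrature inputs used above (x_data, y_data, vol_data, fxy_data) could be assembled, assuming simple midpoint quadrature on the uniform 70 × 70 grid ([27] uses Gauss quadrature; f is the heat source of Eq. (18)):

    import numpy as np

    def source(x, y):
        # Heat source of Eq. (18).
        t = np.tanh(10*x)
        return (np.sin(2*np.pi*y)*(20*t*(10*t**2 - 10)
                                   - 2*np.pi**2*np.sin(2*np.pi*x)/5)
                - 4*np.pi**2*np.sin(2*np.pi*y)*(t + np.sin(2*np.pi*x)/10))

    # Midpoint rule: 70 x 70 cells over [-1, 1] x [-1, 1], each of area h*h.
    h = 2.0/70
    centers = np.linspace(-1 + h/2, 1 - h/2, 70)
    xg, yg = np.meshgrid(centers, centers)
    x_data, y_data = xg.reshape(-1, 1), yg.reshape(-1, 1)
    fxy_data = source(x_data, y_data)
    vol_data = np.full_like(x_data, h*h)   # quadrature weight per point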
6. Conclusions
In this paper, we have introduced the open-source deep-learning package SciANN, designed specifically to facilitate physics-informed simulation, inversion, and discovery in the context of computational science and engineering problems. It can be used for regression and physics-informed deep learning with minimal effort on the neural network setup. It is based on the Tensorflow and Keras packages, and therefore it inherits all the high-performance computing capabilities of the Tensorflow back-end, including CPU/GPU parallelization capabilities. The objective of this paper is to introduce an environment based on a modern implementation of graph-based neural networks and automatic differentiation, to be used as a platform for scientific computations. In a series of examples, we have shown how to use SciANN for curve fitting, solving PDEs in strong and weak form, and for model inversion in the context of physics-informed deep learning. The examples presented here, as well as the package itself, are all open-source and available in the GitHub repository github.com/sciann.
Acknowledgments
This work was funded by the KFUPM-MIT collaborative agreement ‘Multiscale ReservoirScience’.
References

[1] C. M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
[2] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with deep convolutional neural networks, in: F. Pereira, C. J. C. Burges, L. Bottou, K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems 25, MIT Press, 2012, pp. 1097–1105.
[3] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436–444. doi:10.1038/nature14539.
[4] D. Jannach, M. Zanker, A. Felfernig, G. Friedrich, Recommender Systems: An Introduction, Cambridge University Press, 2010.
[5] S. Zhang, L. Yao, A. Sun, Y. Tay, Deep learning based recommender system: A survey and new perspectives, ACM Computing Surveys (CSUR) 52 (1) (2019) 1–38.
[6] A. Graves, A.-r. Mohamed, G. Hinton, Speech recognition with deep recurrent neural networks, in: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, 2013, pp. 6645–6649.
[7] M. Bojarski, D. Del Testa, D. Dworakowski, B. Firner, B. Flepp, P. Goyal, L. D. Jackel, M. Monfort, U. Muller, J. Zhang, et al., End to end learning for self-driving cars, arXiv preprint arXiv:1604.07316 (2016).
[8] R. Miotto, F. Wang, S. Wang, X. Jiang, J. T. Dudley, Deep learning for healthcare: review, opportunities and challenges, Briefings in Bioinformatics 19 (6) (2018) 1236–1246.
[9] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016.
[10] Q. Kong, D. T. Trugman, Z. E. Ross, M. J. Bianco, B. J. Meade, P. Gerstoft, Machine learning in seismology: turning data into insights, Seismological Research Letters 90 (1) (2018) 3–14. doi:10.1785/0220180259.
[11] Z. E. Ross, D. T. Trugman, E. Hauksson, P. M. Shearer, Searching for hidden earthquakes in Southern California, Science 364 (2019) 767–771. doi:10.1126/science.aaw6888.
[12] K. J. Bergen, P. A. Johnson, M. V. de Hoop, G. C. Beroza, Machine learning for data-driven discovery in solid Earth geoscience, Science 363 (6433) (2019) eaau0323. doi:10.1126/science.aau0323.
[13] M. P. Brenner, J. D. Eldredge, J. B. Freund, Perspective on machine learning for advancing fluid mechanics, Physical Review Fluids 4 (10) (2019) 100501. doi:10.1103/PhysRevFluids.4.100501.
[14] S. L. Brunton, B. R. Noack, P. Koumoutsakos, Machine learning for fluid mechanics, Annual Review of Fluid Mechanics 52 (2020). arXiv:1905.11075, doi:10.1146/annurev-fluid-010719-060214.
[15] S. Dana, M. F. Wheeler, A machine learning accelerated FE homogenization algorithm for elastic solids (2020). arXiv:2003.11372.
[16] A. M. Tartakovsky, C. O. Marrero, P. Perdikaris, G. D. Tartakovsky, D. Barajas-Solano, Learning parameters and constitutive relationships with physics informed deep neural networks (2018). arXiv:1808.03398.
[17] K. Xu, D. Z. Huang, E. Darve, Learning constitutive relations using symmetric positive definite neural networks (2020) 1–31. arXiv:2004.00265.
[18] M. Raissi, Deep hidden physics models: Deep learning of nonlinear partial differential equations, Journal of Machine Learning Research 19 (2018) 1–24. arXiv:1801.06637.
[19] M. Raissi, P. Perdikaris, G. E. Karniadakis, Numerical Gaussian processes for time-dependent and nonlinear partial differential equations, SIAM Journal on Scientific Computing 40 (1) (2018) A172–A198. arXiv:1703.10230, doi:10.1137/17M1120762.
[20] E. Haghighat, M. Raissi, A. Moure, H. Gomez, R. Juanes, A deep learning framework for solution and discovery in solid mechanics (2020). arXiv:2003.02751.
[21] S. Rudy, A. Alla, S. L. Brunton, J. N. Kutz, Data-driven identification of parametric partial differential equations, SIAM Journal on Applied Dynamical Systems 18 (2) (2019) 643–660. doi:10.1137/18M1191944.
[22] J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley, Y. Bengio, Theano: a CPU and GPU math expression compiler, in: Proceedings of the Python for Scientific Computing Conference (SciPy), Vol. 4, Austin, TX, 2010.
[23] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D. G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, X. Zheng, TensorFlow: A system for large-scale machine learning, in: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), USENIX Association, Savannah, GA, 2016, pp. 265–283.
[24] T. Chen, M. Li, Y. Li, M. Lin, N. Wang, M. Wang, T. Xiao, B. Xu, C. Zhang, Z. Zhang, MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems (2015). arXiv:1512.01274.
[25] F. Chollet, Deep Learning with Python, Manning Publications Company, 2017.
[26] A. G. Baydin, B. A. Pearlmutter, A. A. Radul, J. M. Siskind, Automatic differentiation in machine learning: a survey, Journal of Machine Learning Research 18 (2018) 1–43.
[27] E. Kharazmi, Z. Zhang, G. E. Karniadakis, hp-VPINNs: Variational physics-informed neural networks with domain decomposition (2020) 1–21. arXiv:2003.05385.
[28] K. Hornik, M. Stinchcombe, H. White, Multilayer feed-forward networks are universal approximators, Neural Networks 2 (5) (1989) 359–366. doi:10.1016/0893-6080(89)90020-8.
[29] G. Cybenko, Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals and Systems 2 (4) (1989) 303–314. doi:10.1007/BF02551274.
[30] K. Hornik, Approximation capabilities of multilayer feed-forward networks, Neural Networks 4 (2) (1991) 251–257. doi:10.1016/0893-6080(91)90009-T.
[31] D. E. Rumelhart, G. E. Hinton, R. J. Williams, Learning representations by back-propagating errors, Nature 323 (6088) (1986) 533–536. doi:10.1038/323533a0.
[32] C. M. Dafermos, Hyperbolic Conservation Laws in Continuum Physics, Springer-Verlag, Berlin, 2000.
[33] J. C. Simo, T. J. R. Hughes, Computational Inelasticity, Vol. 7 of Interdisciplinary Applied Mathematics, Springer, New York, 1998.
[34] O. Zienkiewicz, S. Valliappan, I. King, Elasto-plastic solutions of engineering problems 'initial stress', finite element approach, International Journal for Numerical Methods in Engineering 1 (1) (1969) 75–100.
[35] COMSOL, COMSOL Multiphysics User's Guide, COMSOL, Stockholm, Sweden, 2020.
[36] E. Weinan, B. Yu, The Deep Ritz Method: A deep learning-based numerical algorithm for solving variational problems, Communications in Mathematics and Statistics 6 (1) (2018) 1–14. arXiv:1710.00211, doi:10.1007/s40304-018-0127-z.
[37] J. Berg, K. Nyström, A unified deep artificial neural network approach to partial differential equations in complex geometries, Neurocomputing 317 (2018) 28–41. arXiv:1711.06464, doi:10.1016/j.neucom.2018.06.056.