Meshless physics-informed deep learning method for three-dimensional solid mechanics
A Preprint
Diab W. Abueidda*
National Center for Supercomputing Applications
Department of Mechanical Science and Engineering
University of Illinois at Urbana-Champaign

Qiyue Lu
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign

Seid Koric
National Center for Supercomputing Applications
Department of Mechanical Science and Engineering
University of Illinois at Urbana-Champaign

Abstract
Deep learning and the collocation method are merged and used to solve partial differential equations describing structures' deformation. We have considered different types of materials: linear elasticity, hyperelasticity (neo-Hookean) with large deformation, and von Mises plasticity with isotropic and kinematic hardening. The performance of this deep collocation method (DCM) depends on the architecture of the neural network and the corresponding hyperparameters. The presented DCM is meshfree and avoids any spatial discretization, which is usually needed for the finite element method (FEM). We show that the DCM can capture the response qualitatively and quantitatively, without the need for any data generation using other numerical methods such as the FEM. Data generation is usually the main bottleneck in most data-driven models. The deep learning model is trained to learn the model's parameters yielding accurate approximate solutions. Once the model is properly trained, solutions can be obtained almost instantly at any point in the domain, given its spatial coordinates. Therefore, the deep collocation method is potentially a promising standalone technique to solve partial differential equations involved in the deformation of materials and structural systems as well as other physical phenomena.
Keywords
Computational mechanics · Machine learning · Meshfree method · Neural networks · Partial differential equations · Physics-informed learning
* [email protected]

Computational solid mechanics aims to predict and/or optimize the behavior of a particular problem using computer methods. Such problems emerge in natural and engineered systems. Conventional approaches employed to solve partial differential equations (PDEs) governing physical phenomena appearing in computational solid mechanics problems include isogeometric analysis [1], meshfree methods [2], and the finite element method (FEM) [3, 4]. Computational methods capable of capturing the physical responses can be computationally expensive and time-consuming. Such problems include, but are not limited to, multiphysics [5, 6], modeling of architected materials with complex geometries [7, 8, 9], nonlinear topology optimization [10, 11, 12, 13], multiscale analysis [14, 15], etc. Recently, machine learning (ML) has proven effective and successful in many fields such as medical diagnosis, image and speech recognition, financial services, autopilot in automotive scenarios, and many other engineering and medical applications [16, 17, 18]. Computational engineering and mechanics are no exception [19, 20, 21]. Several data-driven approaches have been developed to capture the thermal conductivity of composites [22], the
elastic properties of composites [23], anisotropic hyperelasticity [24], the plastic response of different systems [25, 26, 27, 28], the thermo-viscoplastic modeling of solidification [28], the failure of composites [29], the fatigue of materials [30], the effect of flexoelectricity on nanostructures [31], the properties of phononic crystals [32], etc. Also, several researchers programmed user-defined material subroutines (UMATs) in which neural networks were trained and used to replace the constitutive model integration in typical implicit nonlinear finite element solution procedures [27, 33, 34]. Additionally, data-driven models have shown broad applicability to accelerate design processes and materials discovery [23, 31, 35, 36, 37, 38]. Recent studies have also shown that deep-learning based surrogate models can establish the material law for composite materials using computationally or experimentally generated stress-strain data under different loading paths [39].

While deep learning (DL) provides a platform capable of prominently rapid inference, it requires a large training dataset to learn the multifaceted relationships between the inputs and outputs. The dataset size used for training a model is problem-based, i.e., complex problems necessitate large datasets to yield models capable of predicting the response accurately. Thus, such deep-learning based surrogate models usually require a discretization method, such as the finite element method, to generate the data needed to train the model (e.g., [22, 23, 24, 25, 26, 29, 30, 31, 36, 37, 38, 40]). Data-driven approaches are still dominant when it comes to using machine learning algorithms in the field of computational mechanics. Nevertheless, several researchers recently proposed using deep neural networks to directly solve partial differential equations, an approach that was first proposed some time ago [41, 42, 43, 44].
However, it did not gain much interest at the time because of the lack of efficient techniques and tools, such as automatic differentiation [45], and recent advances related to the graphics processing unit (GPU). Raissi et al. [46] developed a physics-informed neural network (PINN) to solve forward and inverse problems. The method deals with the strong form, where it is based on a collocation approach. They have used limited data observations that are merged with partial differential equations. The PINN approach [46, 47, 48] has been adopted in several fields, such as cardiovascular flow modeling [49], poromechanics [50], subsurface transport [51], etc.

Solving partial differential equations using deep learning algorithms is very intriguing, and it is gaining increasing interest from both academia and industry [52, 53, 54]. Weinan et al. [55] proposed a method called the deep Ritz method, in which they solve variational problems, particularly those arising from partial differential equations. They have shown that the deep Ritz method is naturally nonlinear and adaptive and can be applied to systems with high-dimensional partial differential equations. Additionally, Sirignano [56] developed an approach in which the solution is approximated using a deep neural network. Specifically, the proposed meshfree approach trains a deep neural network model to satisfy the differential operator, boundary conditions, and initial conditions.

There are limited attempts to solve partial differential equations related to the field of computational solid mechanics using deep neural networks [57, 58]. In these attempts, the authors developed a novel approach, called the deep energy method (DEM), which is well suited for problems possessing energy functionals. In their approach, the potential energy defines the neural network model's loss function, which is minimized using a predefined optimizer, such as the Adam [59] and L-BFGS [60] optimizers.
The proposed approach has shown promising results when applied to elasticity, elastodynamics, hyperelasticity, phase-field fracture, and piezoelectricity [57, 58].

This study proposes a meshfree approach to solve partial differential equations involving different constitutive models, using deep neural networks (DNNs). In this meshfree approach, the DNN attempts to find the displacement field that satisfies the partial differential equation and the essential and natural boundary conditions. Since the training of a machine learning model is an optimization problem, in which the loss function is minimized, we define the loss function using the strong form. The loss function can be minimized using one of the optimizers, such as the Adam [59] and the quasi-Newton L-BFGS [60] optimizers. Here, we refer to this approach as the deep collocation method (DCM). Note that this approach is meshfree and does not require the tangent modulus assembly, which is a primary step in any finite element analysis (FEA) and can be computationally expensive. Additionally, this approach does not necessarily require the definition of a potential energy functional.

The remainder of the paper is laid out as follows. Section 2 presents the details of the proposed approach, where a generic problem setup is introduced. In Sections 3, 4, and 5, the DCM approach is applied to solve three-dimensional (3D) examples involving elastic, hyperelastic, and elastoplastic constitutive models, respectively. We conclude the paper in Section 6 by summarizing the significant results and stating possible future work directions.

When a nonlinear problem involving material and/or geometric nonlinearities is solved using the implicit finite element method, one often ends up defining a residual vector.
Using an iterative scheme (e.g., Newton-Raphson), in each iteration, the tangent matrix and residual vector must be obtained to solve the corresponding linear system of equations, using a direct or iterative solver, to find the vector of unknowns. In contrast, explicit nonlinear finite element methods do not solve a linear system of equations simultaneously, but they are often bounded by conditional stability, often requiring very small time increments. Moreover, when explicit FEM is used for quasi-static simulations, care must be taken so that the inertia effects are insignificant [61]. This study uses meshfree deep learning to attain the displacement field. Here, the displacement field obtained from the DNN is used to calculate strains, stresses, and other variables needed to satisfy the strong form. The loss function is minimized within a deep learning framework such as PyTorch [62] or TensorFlow [63]. Below, we include a brief introduction to deep learning and then scrutinize the proposed framework.
Deep learning is a special kind of machine learning working with various types of artificial neural networks (ANNs) [64]. Loosely inspired by the brain's structure and functionality, artificial neural networks are layers of interconnected individual units, called neurons, where each neuron performs a distinct function given input data. The most straightforward neural network, the so-called dense feedforward neural network, comprises linked layers of neurons that map the input data to predictions, as shown in Figure 1. An artificial neural network's depth is controlled by the number of hidden layers, i.e., the layers between the input and output layers. The numbers of layers and neurons in each layer are specified based on problem complexity.

Figure 1: Fully connected (dense) artificial neural network.

The mapping from the input to the output is expressed as Z : R^n → R^m, where n and m denote the number of neurons in the input and output layers, respectively. Upon initialization, the weights W and biases b of the model will be far from ideal. Throughout the training process, the data flow from the model goes forward from the input layer to the output layer of the neural network. The output ô^l of the l-th layer is calculated as:

Z^l = W^l \hat{o}^{l-1} + b^l, \quad \hat{o}^l = f^l(Z^l).    (1)

After each training pass, W and b are updated. f^l represents the activation function, which is an R → R mapping that acts on the i-th neuron in the l-th layer. Neural networks employ simple differentiable nonlinear activation functions, which assist the network in learning complex functional relationships between outputs and inputs and render accurate predictions. Some standard activation functions utilized in neural networks are the hyperbolic tangent, the rectified linear unit (ReLU), the sigmoid, etc.

One needs to define a loss function L and then find the weights W and biases b that lead to a minimized loss value, where such an optimization process is called training in the context of machine learning.
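The forward pass of Equation (1) can be sketched in a few lines. The following NumPy snippet is a minimal illustration; the layer sizes, random weights, and tanh activation here are arbitrary choices, not the specific network used later in the paper:

```python
import numpy as np

def forward(x, params, activation=np.tanh):
    """Feedforward pass: Z^l = W^l o^{l-1} + b^l, o^l = f^l(Z^l).

    `params` is a list of (W, b) pairs; the output layer is linear.
    """
    o = x
    for l, (W, b) in enumerate(params):
        z = W @ o + b
        # Hidden layers use the nonlinear activation; the last layer is linear.
        o = activation(z) if l < len(params) - 1 else z
    return o

rng = np.random.default_rng(0)
sizes = [3, 8, 8, 3]  # input, two hidden layers, output (illustrative)
params = [(rng.standard_normal((m, n)), np.zeros(m))
          for n, m in zip(sizes[:-1], sizes[1:])]
y = forward(np.array([0.1, 0.2, 0.3]), params)
print(y.shape)  # (3,)
```

Each `(W, b)` pair plays the role of one dense layer's weights and biases in Equation (1).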
The training process requires the use of backpropagation, wherein the loss function is minimized iteratively. One of the most common and most straightforward optimizers used in machine learning is gradient descent, as expressed below:

W_{ij}^{c+1} = W_{ij}^{c} - \beta \frac{\partial L}{\partial W_{ij}^{c}}, \quad b_{i}^{c+1} = b_{i}^{c} - \beta \frac{\partial L}{\partial b_{i}^{c}}    (2)

where β denotes the learning rate, which is a crucial parameter in the neural network training process. The gradients of the loss function are obtained using the chain rule. In other words, the gradients of the loss function are calculated with respect to the weights and biases in the last layer, and the weights and biases are updated for each neuron. The same process is then performed for the previous layer, and so on, until all of the layers have had their weights and biases updated. Then, a new iteration c with forward propagation starts again. Eventually, after a reasonable number of iterations, the weights W^c and biases b^c will converge toward a minimized loss value. One can also use more intricate optimizers such as Adam and L-BFGS. Although we use only dense layers and activation functions, the approach is not limited to those; one may incorporate other machine learning algorithms such as convolutional neural networks and dropout.

The universal approximation theorem [65, 66] can explain why a feedforward neural network can be used as an arbitrary approximator for any continuous function F(X) defined on a compact subset of R^n. The universal approximation theorem states that multilayer feedforward neural networks with an arbitrary bounded and nonconstant activation function f and as few as a single hidden layer can be universal approximators with arbitrary accuracy ε > 0. Provided the activation function is bounded, nonconstant, and continuous, continuous mappings can be uniformly learned over compact input sets. Mathematically, the universal approximation theorem states that there exist W, b, N, and a such that the approximation function g(X) satisfies:

|F(X) - g(X)| < \epsilon, \quad g(X) = \sum_{i=1}^{N} a_i f(W_{ij} X_j + b_i)    (3)

where a_i ∈ R is fixed. It is worth highlighting that the theorem makes no conclusions about the network's training, nor the number of neurons needed in the hidden layers to attain the desired accuracy ε, nor whether the estimation of the network's parameters is even feasible.

Now, we discuss the deep collocation method (DCM), and then, in Section 2.4, we discuss the architecture of the neural network model used in this study.
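As a concrete illustration of the update rule in Equation (2) before turning to the DCM itself, the following toy example applies plain gradient descent to a scalar model; the data, learning rate, and iteration count are arbitrary illustrative choices:

```python
# Gradient-descent update of Equation (2) on a toy scalar model o = w*x + b,
# with loss L = (o - t)^2. Learning rate beta and the data are illustrative.
def train_step(w, b, x, t, beta):
    o = w * x + b
    dL_dw = 2.0 * (o - t) * x   # chain rule: dL/do * do/dw
    dL_db = 2.0 * (o - t)       # chain rule: dL/do * do/db
    return w - beta * dL_dw, b - beta * dL_db

w, b = 0.0, 0.0
for _ in range(200):
    w, b = train_step(w, b, x=1.5, t=3.0, beta=0.05)
print(w * 1.5 + b)  # approaches the target t = 3.0
```

Backpropagation generalizes exactly this chain-rule computation layer by layer through the network.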
In a general sense, a partial differential equation, with solution u(t, X), can be expressed as:

(\partial_t + \mathcal{N}) u(t, X) = 0, \quad (t, X) \in [0, T] \times \Omega,
u(0, X) = u_o, \quad X \in \Omega,
u(t, X) = \bar{u}, \quad (t, X) \in [0, T] \times \Gamma_u,
t(t, X) = \bar{t}, \quad (t, X) \in [0, T] \times \Gamma_t,    (4)

where ∂_t is the partial derivative with respect to time t, T is the total time, N is a spatial differential operator, u_o is the initial condition, ū is a defined essential boundary condition, and t̄ is a defined natural boundary condition. Ω is the material domain, while Γ_u and Γ_t are the surfaces with essential and natural boundary conditions, respectively.

In this work, we solve partial differential equations by training a neural network with parameters φ = {W, b}. Specifically, we train the model such that the approximate solution û(t, X; φ) obtained from the neural network is as close as possible to the solution of the differential equation. Hence, the loss function L is defined using the mean square error (MSE) in light of Equation 4, i.e., L = MSE_G + λ_u MSE_u + λ_t MSE_t + λ_i MSE_i, where

MSE_G = \frac{1}{N_G} \sum_{j=1}^{N_G} \| (\partial_t + \mathcal{N}) \hat{u}(t_j, X_j; \phi) \|^2, \quad (t_j, X_j) \in [0, T] \times \Omega
MSE_u = \frac{1}{N_u} \sum_{j=1}^{N_u} \| \hat{u}(t_j, X_j; \phi) - \bar{u} \|^2, \quad (t_j, X_j) \in [0, T] \times \Gamma_u
MSE_t = \frac{1}{N_t} \sum_{j=1}^{N_t} \| \hat{t}(t_j, X_j; \phi) - \bar{t} \|^2, \quad (t_j, X_j) \in [0, T] \times \Gamma_t
MSE_i = \frac{1}{N_i} \sum_{j=1}^{N_i} \| \hat{u}(0, X_j; \phi) - u_o \|^2, \quad X_j \in \Omega    (5)

where N_G, N_u, N_t, and N_i are the numbers of sampled points corresponding to the different terms of the loss function. λ_u > 0, λ_t > 0, and λ_i > 0 are hyperparameters (penalty coefficients) weighting the contributions of the different boundary and initial conditions.
In our experience, these weighting hyperparameters are essential to guarantee that the different terms contribute to a similar degree to the value of the loss function.
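A weighted loss of the form of Equation (5) (without the initial-condition term, which drops for the static problems below) can be sketched as follows; the residual arrays and penalty weights are placeholders, not values from the paper:

```python
import numpy as np

# Sketch of the weighted loss of Equation (5) for a static problem (no MSE_i):
# L = MSE_G + lambda_u * MSE_u + lambda_t * MSE_t. The residual arrays below
# stand in for the PDE residual and boundary-condition mismatches at the
# sampled points; the penalty weights are illustrative.
def dcm_loss(pde_residual, bc_u_error, bc_t_error, lam_u=1.0, lam_t=1.0):
    mse = lambda r: np.mean(np.sum(r**2, axis=-1))
    return mse(pde_residual) + lam_u * mse(bc_u_error) + lam_t * mse(bc_t_error)

rng = np.random.default_rng(1)
L = dcm_loss(rng.normal(size=(7500, 3)) * 1e-2,
             rng.normal(size=(4000, 3)) * 1e-3,
             rng.normal(size=(4000, 3)) * 1e-3,
             lam_u=100.0, lam_t=1.0)
print(L >= 0.0)  # True: the loss is a nonnegative sum of mean-square terms
```

Tuning `lam_u` and `lam_t` is how the terms are brought to a comparable magnitude, as discussed above.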
In the examples we have considered in this study, we assume that the inertia term is zero, i.e., ∂_t u = 0, and the solution is u(X). Hence, there is no need to define initial conditions, i.e., the MSE_i term drops from the definition of the loss function. Figure 2 depicts the deep collocation method used here. The deep neural network (DNN) minimizes the loss function to obtain the optimized network parameters φ* = {W*, b*}. The DNN model maps the coordinates X of the sampled points to the displacement field û(X; φ) using feedforward propagation. The predicted displacement field û is used to calculate the dependent variables involved in a specific problem, and the loss function. The computation of the dependent variables and the loss function typically requires finding the first and second derivatives of û, which are found using the automatic differentiation offered by deep learning frameworks. The optimization (minimization) problem is written as:

\phi^* = \arg\min_{\phi} L(\phi).    (6)

Figure 2: Flow chart of the deep collocation method (DCM).

The optimization problem is solved using an optimizer such as L-BFGS, where the gradients are calculated using the backpropagation process. Note that the DCM is a meshfree approach: there is no need to assemble the tangent modulus, and no linear system of equations has to be solved. Such a method does not necessitate the definition of potential energies, making it intriguing for a broader spectrum of problems, such as plasticity. It is worth mentioning that the optimization problem involved in any machine learning problem, including the DCM, is generally a nonconvex one. Hence, one needs to pay attention to the possibility of getting trapped in local minima and saddle points. This study is far from being the last word on the topic.

The design of experiments (DOE) has become prominent in the era of the swiftly growing fields of data science, machine learning, and statistical modeling.
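The spatial derivatives of û required by the strong form are obtained via automatic differentiation. A minimal PyTorch sketch (the tiny network here is only a stand-in for the DNN used in the paper):

```python
import torch

# Sketch of how spatial derivatives of the network output are obtained by
# automatic differentiation in PyTorch (the small network is a stand-in for
# the DNN used in the paper).
torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(3, 16), torch.nn.Tanh(),
                          torch.nn.Linear(16, 3))

X = torch.rand(128, 3, requires_grad=True)   # sampled collocation points
u = net(X)                                   # predicted displacement field

# Gradient of one displacement component w.r.t. the coordinates; repeating
# this per component assembles the displacement gradient, and second
# derivatives follow by differentiating again (create_graph=True keeps the
# graph for that purpose).
du0_dX = torch.autograd.grad(u[:, 0].sum(), X, create_graph=True)[0]
print(du0_dX.shape)  # torch.Size([128, 3])
```

The same `torch.autograd.grad` call, applied once more to `du0_dX`, yields the second derivatives needed for the divergence of the stress.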
The proposed approach is meshfree, in which many points are sampled from three sets, namely Ω, Γ_u, and Γ_t. This set of randomly distributed collocation points is used to minimize the loss function while accounting for a set of constraints (boundary conditions). These collocation points are used to train the machine learning model, i.e., to solve the partial differential equations in the physical domain. There are many sampling methods that can be adopted. Here, we use the Monte Carlo method, i.e., we create the points through repeated random sampling [67].

In this paper, a six-layer network is used: three neurons in the input layer for the coordinates X, three neurons in the output layer for the displacement û, and four hidden layers of equal width in between. After each dense layer, we use an activation function, as demonstrated in Equation 1. A network based on dense layers without nonlinear activation functions reduces to a linear one, making it challenging to capture nonlinear relationships between input and output.
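The Monte Carlo sampling of collocation points can be sketched as follows for the beam geometry of Figure 4; the assignment of the two end faces to Γ_u and the loaded part of Γ_t is our illustrative assumption about where the boundary conditions are prescribed:

```python
import numpy as np

# Monte Carlo sampling of collocation points for the beam of Figure 4
# (L = 4 m, D = H = 1 m): interior points from Omega, and boundary points
# from the two end faces. The point counts match those used in the text.
rng = np.random.default_rng(42)
L_beam, D, H = 4.0, 1.0, 1.0

X_int = rng.uniform([0, 0, 0], [L_beam, D, H], size=(7500, 3))  # Omega

def sample_face(n, x_fixed):
    """Uniform points on a face x = const of the beam (an assumption about
    where Gamma_u and the loaded part of Gamma_t lie)."""
    pts = rng.uniform([0, 0, 0], [0, D, H], size=(n, 3))
    pts[:, 0] = x_fixed
    return pts

X_u = sample_face(4000, 0.0)      # fixed face, Gamma_u
X_t = sample_face(4000, L_beam)   # loaded face (part of Gamma_t)
print(X_int.shape, X_u.shape, X_t.shape)
```

In the actual problem, the traction-free lateral faces also belong to Γ_t and would be sampled the same way.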
Popular activation functions in many machine learning practices are ReLU and leaky ReLU. However, this approach requires finding second-order derivatives. While ReLU is a globally nonlinear activation function, it is linear in a neighborhood of almost every input, rendering the network linear in a neighborhood of the inputs (see Figure 3). Hence, it is not a good choice for our case. Here, we use the hyperbolic tangent (tanh) activation function within the hidden layers, while, for the output layer, we use a linear activation function. The architecture of the network is obtained by trying several architectures and fine-tuning the hyperparameters.

Figure 3: Illustration of some activation functions.

As mentioned earlier, the loss function, defined in Equation 5, is minimized. PyTorch is used in this study to solve the deep learning problem (minimize the loss function). Here, we use two optimizers: we start with the Adam optimizer, and then the limited-memory BFGS (L-BFGS) optimizer with the strong Wolfe line search [60, 68] is used. High-throughput computations are done on iForge, an HPC cluster hosted at the National Center for Supercomputing Applications (NCSA). A representative compute node on iForge has the following hardware specifications: two NVIDIA Tesla V100 GPUs and two multi-core Intel Skylake CPUs. Since it is challenging to find an analytical solution for most cases, we obtain a reference solution using conventional finite element analysis to measure the accuracy of the solution obtained from the DCM. We use the normalized "L₂-error" as a metric reflecting the accuracy of the DCM solution:

L_2\text{-error} = \frac{\| u_{REF} - \hat{u} \|_{L_2}}{\| u_{REF} \|_{L_2}}.    (7)

Here, we consider a homogeneous, isotropic, elastic body undergoing small deformation.
The equilibrium equation, in the absence of body and inertial forces, is expressed as:

\nabla \cdot \sigma = 0, \quad x \in \Omega,
\hat{u} = \bar{u}, \quad x \in \Gamma_u,
\sigma \cdot n = \bar{t}, \quad x \in \Gamma_t,    (8)

where σ is the Cauchy stress tensor, n is the unit normal vector, ∇· denotes the divergence operator, and ∇ is the gradient operator. Since small deformation is assumed, the strain is given by:

\varepsilon = \frac{1}{2} \left( \nabla \hat{u} + \nabla \hat{u}^T \right)    (9)

where û is the displacement field, and ε denotes the infinitesimal strain tensor. The relationship between the stress and strain is expressed as:

\sigma = \lambda \, \mathrm{trace}(\varepsilon) \, I + 2 \mu \varepsilon    (10)

where λ and μ denote the Lamé constants, and I represents the second-order identity tensor. The steps used to solve a problem with a linear elastic constitutive model are summarized in Algorithm 1.

Now, let us consider a 3D bending beam, as shown in Figure 4. Let L = 4 m and D = H = 1 m. Using a displacement-controlled approach, the displacement in the y-direction is set to a prescribed value u_y = C at the face with the normal
Algorithm 1: Linear elasticity

Input: Physical domain, BCs, and DNN
  Material parameters (λ and μ)
  Sample points X_int from Ω
  Sample points X_u from Γ_u
  Sample points X_t from Γ_t
  Neural network architecture
  Neural network hyperparameters
  Optimizer (Adam followed by L-BFGS)
Initialization: Initial weights and biases of the DNN
Output: Optimized weights and biases of the DNN

while not minimized do
  Obtain û from the DNN
  Compute ∇û using automatic differentiation
  Compute ε
  Compute σ
  if X_int then
    Compute ∇·σ using automatic differentiation
    Calculate MSE_G
  else if X_t then
    Compute t = σ·n
    Calculate MSE_t
  else
    Calculate MSE_u
  Calculate loss function
  Update the weights and biases
end
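A single constitutive evaluation inside Algorithm 1, from displacement gradient to strain (Equation 9), stress (Equation 10), and von Mises stress, might look as follows; the material constants and the displacement gradient are illustrative values, not those of the example problem:

```python
import numpy as np

# One constitutive evaluation from Algorithm 1: given the displacement
# gradient at a point (an arbitrary illustrative value), compute the small
# strain of Equation (9) and the stress of Equation (10).
lam, mu = 1.0e9, 0.5e9          # illustrative Lame constants [Pa]
grad_u = np.array([[1e-4, 2e-4, 0.0],
                   [0.0, -3e-5, 0.0],
                   [0.0,  0.0,  1e-5]])

eps = 0.5 * (grad_u + grad_u.T)                       # infinitesimal strain
sig = lam * np.trace(eps) * np.eye(3) + 2 * mu * eps  # isotropic Hooke's law

# von Mises stress from the deviatoric part (used for the stress contours).
s = sig - np.trace(sig) / 3.0 * np.eye(3)
von_mises = np.sqrt(1.5 * np.sum(s * s))
print(sig.shape, von_mises > 0.0)
```

In the DCM, `grad_u` comes from automatic differentiation of the DNN output rather than from a fixed array.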
Figure 4: 3D beam bending test.

unit vector n = [1, 0, 0]^T. On the other hand, the displacements u_x, u_y, and u_z are set to zero at the face with the normal unit vector n = [-1, 0, 0]^T. Since this approach is based on the strong form, the degrees of freedom on the boundaries with both zero and nonzero tractions have to be explicitly satisfied. The MSE_t term appearing in Equation 5 takes care of this. The numbers of points sampled from the cantilever beam domain are as follows: N_G = 7500, N_u = 4000, and N_t = 4000. Figure 5 shows the convergence of the loss function. As mentioned earlier, we have used two optimizers. We start with the Adam optimizer, and then it is followed by the L-BFGS optimizer. We found that combining both optimizers stabilizes the optimization procedure. Figure 6 depicts the vertical displacement û_y contours obtained from the DCM and compares them with those obtained from the FEM. Also, the von Mises stress contours obtained using the DCM are presented. After the model is trained and the optimized weights and biases are attained, 1000 test points from the DCM, different from the ones used in the training of the DNN, and their corresponding ones from the FEM are utilized to calculate the L₂-error.

Figure 5: Elasticity example: Loss function convergence.

Figure 6: Elasticity example: (a) Comparison between the vertical displacement û_y contours obtained from the DCM and FEM, (b) Comparison between the FEM and DCM vertical displacement û_y along the AB path, and (c) von Mises stress contours.

Since the constitutive model here is path-independent, the solution can be obtained using one pseudo-time step. Also, although we sample the points at the very beginning of the solution procedure and fix those points during the optimization process, one is not really required to do so. In other words, this case is path-independent; therefore, one can sample new points at each optimization iteration, which might help increase the model's accuracy. This is not the case for elastoplastic problems, as discussed later. We sample the points at the beginning of the optimization procedure for simplicity, and we fix them throughout the optimization iterations.
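The two-stage optimization used above (Adam followed by L-BFGS with a strong Wolfe line search) can be sketched in PyTorch as follows; the toy quadratic loss stands in for the actual DCM loss, and the iteration counts and learning rate are illustrative:

```python
import torch

# Sketch of the two-stage optimization used in the examples: a number of Adam
# iterations followed by L-BFGS with a strong Wolfe line search. The toy
# quadratic loss stands in for the DCM loss function.
torch.manual_seed(0)
p = torch.nn.Parameter(torch.tensor([2.0, -3.0]))
loss_fn = lambda: ((p - torch.tensor([1.0, 1.0])) ** 2).sum()

adam = torch.optim.Adam([p], lr=0.1)
for _ in range(100):              # stage 1: Adam
    adam.zero_grad()
    loss_fn().backward()
    adam.step()

lbfgs = torch.optim.LBFGS([p], line_search_fn="strong_wolfe")
def closure():                    # stage 2: L-BFGS refinement
    lbfgs.zero_grad()
    loss = loss_fn()
    loss.backward()
    return loss
lbfgs.step(closure)
print(loss_fn().item())  # close to zero after refinement
```

Adam makes robust progress from a poor initial guess, and L-BFGS then refines the minimum efficiently, which matches the stabilizing effect reported above.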
Let us consider a body made of a homogeneous and isotropic hyperelastic material under finite deformation. The mapping ζ of material points from the initial configuration to the current configuration is given by:

x = \zeta(X, t) = X + \hat{u}.    (11)

In the absence of body and inertial forces, the strong form is written as:

\nabla_X \cdot P = 0, \quad X \in \Omega,
\hat{u} = \bar{u}, \quad X \in \Gamma_u,
P \cdot N = \bar{t}, \quad X \in \Gamma_t,    (12)

where P is the first Piola-Kirchhoff stress, and N represents the outward unit normal vector in the initial configuration. The constitutive law for such a material is expressed as:

P = \frac{\partial \psi(F)}{\partial F}, \quad F = \nabla_X \zeta(X)    (13)

where F denotes the deformation gradient. For the neo-Hookean hyperelastic material, the Helmholtz free energy ψ(F) is given by:

\psi(F) = \frac{1}{2} \lambda (\ln(J))^2 - \mu \ln(J) + \frac{1}{2} \mu (I_1 - 3),    (14)

where the first invariant is defined as I_1 = trace(C), the right Cauchy-Green tensor C is defined as C = F^T F, and the Jacobian is defined as J = det(F). Thus, P is given by:

P = \frac{\partial \psi(F)}{\partial F} = \mu F + (\lambda \ln(J) - \mu) F^{-T}, \quad P = J \sigma F^{-T}.    (15)

Algorithm 2: neo-Hookean hyperelasticity
Input: Physical domain, BCs, and DNN
  Material parameters (λ and μ)
  Sample points X_int from Ω
  Sample points X_u from Γ_u
  Sample points X_t from Γ_t
  Neural network architecture
  Neural network hyperparameters
  Optimizer (Adam followed by L-BFGS)
Initialization: Initial weights and biases of the DNN
Output: Optimized weights and biases of the DNN

while not minimized do
  Obtain û from the DNN
  Compute ∇_X û using automatic differentiation
  Compute F = I + ∇_X û
  Compute J = det(F), C, I_1, and P
  if X_int then
    Compute ∇_X·P using automatic differentiation
    Calculate MSE_G
  else if X_t then
    Compute t = P·N
    Calculate MSE_t
  else
    Calculate MSE_u
  Calculate loss function
  Update the weights and biases
end
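The closed-form stress evaluation of Equation (15) used inside Algorithm 2 can be sketched as follows; the Lamé constants and the deformation gradient are illustrative values:

```python
import numpy as np

# Closed-form first Piola-Kirchhoff stress of Equation (15) for the
# neo-Hookean model, evaluated for an illustrative deformation gradient.
lam, mu = 1.0, 0.5                      # illustrative Lame constants

def neo_hookean_P(F, lam, mu):
    J = np.linalg.det(F)
    Finv_T = np.linalg.inv(F).T
    return mu * F + (lam * np.log(J) - mu) * Finv_T

F = np.eye(3) + np.array([[0.0, 0.3, 0.0],
                          [0.0, 0.1, 0.0],
                          [0.0, 0.0, 0.0]])  # a simple shear-like deformation
P = neo_hookean_P(F, lam, mu)
print(P.shape)

# Sanity check: at F = I the material is stress-free (J = 1, ln J = 0).
print(np.allclose(neo_hookean_P(np.eye(3), lam, mu), 0.0))  # True
```

In the DCM, F is assembled from the automatic-differentiation gradient of the DNN displacement, F = I + ∇_X û, rather than from a fixed array.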
The steps used to solve a problem with a neo-Hookean hyperelastic constitutive model are described in Algorithm 2. A similar procedure, with slight changes, can be used for other hyperelastic constitutive models, such as the Mooney-Rivlin and Arruda-Boyce hyperelastic models.

Going back to the cantilever beam example (see Figure 4), assume now that the cantilever beam is made of a neo-Hookean hyperelastic material. The same boundary conditions are applied, except that the vertical displacement u_y at the face with the normal unit vector N = [1, 0, 0]^T is increased to a larger prescribed value C to ensure that large deformation is induced. The numbers of points sampled from the cantilever beam domain are as follows: N_G = 7500, N_u = 4000, and N_t = 4000. Figure 7 illustrates the vertical displacement û_y contours found using the DCM and compares them with those obtained from the FEM. Additionally, the von Mises stress contours obtained using the DCM are shown. Similar to the linear elasticity case, after optimizing the weights and biases of the DNN, 1000 test points from the DCM, different from the ones used in the training of the DNN, and their FEM corresponding ones are used to find the L₂-error.

Figure 7: Hyperelasticity example: (a) Comparison between the vertical displacement û_y contours obtained from the DCM and FEM, (b) Comparison between the DCM and FEM vertical displacement û_y along the AB path, and (c) von Mises stress contours.

When the FEM is used to solve a hyperelastic problem involving large deformation, one usually needs to implement pseudo-time discretization to avoid divergence, although the problem is history-independent. Using the DCM, we managed to solve the problem using one pseudo-time step. Of course, one can find the solution at the intermediate pseudo-time steps if it is of interest. Also, we have not studied the effect of obtaining the solution using multiple pseudo-time steps. It could help increase the accuracy of the obtained solution, but it is not required, as we have shown. Furthermore, similar to the elasticity case, we sample the points at the very beginning of the solution procedure, and those points stay unchanged during the optimization process. Nevertheless, one is not obligated to stick with the same points when the material is hyperelastic, and these points can be resampled at each optimization iteration. We anticipate that resampling the points at each optimization iteration would increase the model's accuracy.
In the second case presented in this study, we consider a body made of a material obeying a J₂ flow theory with linear isotropic and kinematic hardening, where both forms of the hardening law evolve linearly. A cantilever beam problem is solved here as an example of applying the DCM to elastoplastic systems. The strong form, in the absence of inertial and body forces, is given by:

\nabla \cdot \sigma = 0, \quad x \in \Omega,
\hat{u} = \bar{u}, \quad x \in \Gamma_u,
\sigma \cdot n = \bar{t}, \quad x \in \Gamma_t.    (16)

Under the assumption of small deformation, the strain tensor ε is additively decomposed into the elastic strain tensor ε^el and the plastic strain tensor ε^pl:

\varepsilon = \frac{1}{2} \left( \nabla u + (\nabla u)^T \right), \quad \varepsilon = \varepsilon^{el} + \varepsilon^{pl}.    (17)

The von Mises yield condition y is used:

y(\sigma, q, \alpha) = \| \eta \|_f - \sqrt{\frac{2}{3}} (\sigma_y + K \alpha), \quad \eta = s - q    (18)

where ‖·‖_f is the Frobenius norm, η is the relative stress tensor, s is the deviatoric stress tensor, q denotes the back stress, and α is the internal plastic variable known as the equivalent plastic strain. σ_y and K are material hardening constants. The Karush-Kuhn-Tucker (KKT) conditions and the consistency condition are required to complete the definition of the constitutive model:

y(\sigma, q, \alpha) \le 0, \quad \gamma \ge 0, \quad \gamma \, y(\sigma, q, \alpha) = 0, \quad \gamma \, \dot{y}(\sigma, q, \alpha) = 0    (19)

where γ is the consistency parameter. The evolution of α and q is defined as:

\dot{\alpha} = \sqrt{\frac{2}{3}} \gamma, \quad n = \frac{\eta}{\| \eta \|_f}, \quad \dot{q} = \frac{2}{3} \gamma H n    (20)

where H is the material kinematic hardening constant, and n is the tensor normal to the yield surface.

The radial return mapping algorithm, originally presented by Wilkins [69], is used here. Given the state at an integration point, α_t and ε^p_t at the previous time step t, and ε_{t+1} at the current time step t+1, one can calculate α_{t+1} and ε^p_{t+1}. Firstly, the trial state is computed:

e_{t+1} = \varepsilon_{t+1} - \frac{1}{3} \mathrm{trace}(\varepsilon_{t+1}) I, \quad s^{trial}_{t+1} = 2 \mu (e_{t+1} - e^p_t), \quad \eta^{trial}_{t+1} = s^{trial}_{t+1} - q_t    (21)

where I is the second-order identity tensor, μ represents the shear modulus, and e is the deviatoric strain tensor. Then, the yield condition y^{trial}_{t+1} is checked:

y^{trial}_{t+1} = \| \eta^{trial}_{t+1} \|_f - \sqrt{\frac{2}{3}} (\sigma_y + K \alpha_t).    (22)

If y^{trial}_{t+1} \le 0, then (\cdot)_{t+1} = (\cdot)^{trial}_{t+1}. Otherwise, we proceed with the calculations of n_{t+1}, Δγ_{t+1}, and α_{t+1}:

n_{t+1} = \frac{\eta^{trial}_{t+1}}{\| \eta^{trial}_{t+1} \|_f}, \quad \Delta\gamma_{t+1} = \frac{y^{trial}_{t+1}}{2\mu + \frac{2}{3}(H + K)}, \quad \alpha_{t+1} = \alpha_t + \sqrt{\frac{2}{3}} \Delta\gamma_{t+1}.    (23)

Next, the back stress, plastic strain, and stress are updated:

q_{t+1} = q_t + \frac{2}{3} \Delta\gamma_{t+1} H n_{t+1}, \quad e^p_{t+1} = e^p_t + \Delta\gamma_{t+1} n_{t+1}, \quad \sigma_{t+1} = \kappa \, \mathrm{trace}(\varepsilon_{t+1}) I + s^{trial}_{t+1} - 2 \mu \Delta\gamma_{t+1} n_{t+1}    (24)

where κ is the bulk modulus. The procedure used to solve the elastoplastic problem is described in Algorithm 3. For elastoplastic problems, the solution is obtained using M pseudo-time steps. Hence, the optimization problem is solved M times, where the optimized weights and biases attained at a step t are used as the initial weights and biases for the next step t+1, which can be considered a form of transfer learning. This transfer learning is equivalent to updating displacements after each converged step in a typical nonlinear implicit finite element analysis solution procedure.

We use the DCM to solve the cantilever beam problem, shown in Figure 4 with a prescribed displacement C, where the beam is made of an elastoplastic material. The numbers of points sampled from the cantilever beam domain are as follows: N_G = 7500, N_u = 4000, and N_t = 4000. Figure 8 portrays the vertical displacement û_y contours obtained using the DCM and FEM, and it also shows û_y along the AB path. Additionally, we present the von Mises stress and some plastic strain component contours in Figure 8. Elastoplastic materials are path-dependent. Even if the problem is rate-independent, a pseudo-time stepping scheme is implemented, where the state variables at a step t are inputs for the next step t+1. Hence, unlike the linear elasticity and large-deformation hyperelasticity cases, resampling the points at each optimization iteration is tricky. One would then need to adopt interpolation techniques to find the state variables at the resampled points, where such interpolation functions might jeopardize accuracy.
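As a concrete illustration of the trial state and plastic corrector in Eqs. (21)-(24), the radial return update can be sketched in a few lines of NumPy. This is our own minimal sketch, not the paper's implementation; the function name, argument names, and material constants below are illustrative assumptions.

```python
import numpy as np

def radial_return(eps, ep_prev, q_prev, alpha_prev, kappa, mu, sigma_y, K, H):
    """One radial-return update for von Mises plasticity with linear
    isotropic (K) and kinematic (H) hardening, following Eqs. (21)-(24)."""
    I = np.eye(3)
    # Trial state, Eq. (21)
    e = eps - np.trace(eps) / 3.0 * I              # deviatoric strain
    s_trial = 2.0 * mu * (e - ep_prev)             # trial deviatoric stress
    eta_trial = s_trial - q_prev                   # trial relative stress
    norm_eta = np.linalg.norm(eta_trial)           # Frobenius norm
    # Trial yield condition, Eq. (22)
    y_trial = norm_eta - np.sqrt(2.0 / 3.0) * (sigma_y + K * alpha_prev)
    if y_trial <= 0.0:                             # elastic step: keep the trial state
        ep, q, alpha, s = ep_prev, q_prev, alpha_prev, s_trial
    else:                                          # plastic corrector, Eq. (23)
        n = eta_trial / norm_eta
        dgamma = y_trial / (2.0 * mu + 2.0 / 3.0 * (H + K))
        alpha = alpha_prev + np.sqrt(2.0 / 3.0) * dgamma
        q = q_prev + 2.0 / 3.0 * dgamma * H * n    # back stress update, Eq. (24)
        ep = ep_prev + dgamma * n                  # plastic strain update
        s = s_trial - 2.0 * mu * dgamma * n
    sigma = kappa * np.trace(eps) * I + s          # total stress, Eq. (24)
    return sigma, ep, q, alpha
```

A quick check of the sketch: for a small strain the trial state is returned unchanged, while for a large strain the corrector brings the relative stress back onto the updated yield surface.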
Algorithm 3: von Mises plasticity with isotropic and kinematic hardening
Input: Physical domain, BCs, and DNN
    Material parameters (κ, μ, σ_y, K, and H)
    Sample points X_int from Ω, X_u from Γ_u, and X_t from Γ_t
    Neural network architecture and hyperparameters
    Optimizer (Adam followed by L-BFGS)
    Initialize e^p_0, q_0, and α_0
Initialization: Initial weights and biases of the DNN
Output: Optimized weights and biases of the DNN
for t ← 1 to number of steps do
    Use weights and biases from the previous step
    while not minimized do
        Obtain û from the DNN
        Compute ∇û using automatic differentiation
        Compute ε_{t+1}, e_{t+1}, s^{trial}_{t+1}, η^{trial}_{t+1}, and y^{trial}_{t+1}
        if y^{trial}_{t+1} ≤ 0 then
            (·)_{t+1} = (·)^{trial}_{t+1}
        else
            Compute n_{t+1}, Δγ_{t+1}, and α_{t+1}
            Compute q_{t+1}, e^p_{t+1}, and σ_{t+1}
        end
        if X_int then
            Compute ∇·σ using automatic differentiation and calculate MSE_G
        else if X_t then
            Compute t̄ = σ·n and calculate MSE_t
        else
            Calculate MSE_u
        end
        Calculate the loss function
        Update the weights and biases
    end
end
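The warm-starting strategy in Algorithm 3 — carrying the optimized weights of pseudo-time step t over as the initial guess for step t+1 — can be illustrated with a toy loop. This is only a schematic sketch: the quadratic `loss` whose gradient is hard-coded below stands in for the physics-informed loss (MSE_G + MSE_u + MSE_t), and all names and values are hypothetical, not taken from the paper.

```python
import numpy as np

def optimize(theta0, target, lr=0.1, iters=200):
    """Plain gradient descent on the toy loss ||theta - target||^2,
    a stand-in for minimizing the DCM loss at one pseudo-time step."""
    theta = theta0.copy()
    for _ in range(iters):
        grad = 2.0 * (theta - target)   # gradient of the stand-in loss
        theta -= lr * grad
    return theta

rng = np.random.default_rng(0)
theta = rng.normal(size=4)              # "initial weights and biases" of the DNN
for step in range(1, 6):                # M pseudo-time (load) steps
    # The state from the previous step shifts the optimum, as in plasticity.
    target = step * 0.1 * np.ones(4)
    theta = optimize(theta, target)     # warm start: theta carries over from step t
```

Because each step starts near the previous optimum, far fewer iterations are needed per step than a cold start would require — the same rationale as updating displacements after each converged step in an implicit FEM solve.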
Figure 8: Plasticity example: (a) Comparison between the vertical displacement û_y contours obtained from the DCM and FEM, (b) comparison between the DCM and FEM vertical displacement û_y along the AB path, (c) von Mises stress contours, and (d)-(e) contours of two plastic strain ε^p components. All results are at the last load increment, i.e., C = 0. m.
In this paper, the collocation method and deep learning are merged to solve the partial differential equations involved in the mechanical deformation of different materials: linear elasticity, neo-Hookean hyperelasticity, and von Mises plasticity with isotropic and kinematic hardening. This approach is not data-driven in the sense that no data generation is required; data generation is usually the most time-consuming stage in developing a data-driven model. Instead, physical laws are used to obtain the solution. Once the DNN model is appropriately trained, high-quality solutions can be inferred almost instantly for any point in the domain based on its spatial coordinates. The deep collocation method is meshfree. Thus, there is no need to define connectivity between the nodes or to generate a mesh, which can be the bottleneck in many cases [70] and may require special partitioning methods for large meshes [71]. Meshfree methods have additional advantages as well, such as the absence of element distortion and volumetric locking.

Using deep learning to solve partial differential equations is a relatively recent research topic, and several issues are still open for improvement. The optimization of a deep learning model is nonconvex in most cases; hence, one needs to be vigilant about getting trapped in local minima. Using more intricate methods to identify the optimal activation functions, architecture, and hyperparameters of a DNN model is another active research area; one possible option is to use genetic algorithms [72]. In this study, we have used Monte Carlo sampling. It would be interesting to investigate the effect of sampling techniques (such as Latin hypercube, the Halton sequence, etc.), explore whether different sampling techniques affect the accuracy, and determine which ones lead to higher accuracy.
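As an illustration of the sampling alternatives mentioned above, the following minimal sketch contrasts plain Monte Carlo sampling with a basic Latin hypercube design over the unit cube. The implementation details are ours, not the paper's; in practice the samples would be mapped to the physical domain Ω and its boundaries.

```python
import numpy as np

def monte_carlo(n, dim, rng):
    """Uniform Monte Carlo collocation points in the unit hypercube."""
    return rng.random((n, dim))

def latin_hypercube(n, dim, rng):
    """Basic Latin hypercube: exactly one sample per stratum along each axis."""
    # Row i lands in stratum [i/n, (i+1)/n) along every axis, jittered within it.
    u = (rng.random((n, dim)) + np.arange(n)[:, None]) / n
    for d in range(dim):
        rng.shuffle(u[:, d])            # decorrelate the axes column by column
    return u
```

With the same point budget, the Latin hypercube guarantees coverage of every one-dimensional stratum, whereas Monte Carlo points can cluster, which is the kind of difference whose effect on DCM accuracy the authors propose to study.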
Also, resampling the points at each optimization iteration, especially for linear elasticity and hyperelasticity, is another direction to explore.

In this study, we assumed that the body and inertial forces are negligible. It would be interesting to include such effects in future works and investigate how they impact the model's accuracy. Moreover, for all the cases studied here, we have used feedforward neural networks. However, one can use more advanced architectures such as long short-term memory (LSTM), gated recurrent unit (GRU), or temporal convolutional networks (TCN), which are deep sequence learning models. Such architectures are very useful for cases that are history-dependent, such as plasticity and viscoplasticity [26, 28]. Therefore, it would be intriguing to employ such sequence learning models within the DCM to solve path-dependent problems. Additionally, future works need to consider other geometries, including irregular ones, and study the DCM's accuracy for more complex geometries. It is also worth mentioning that the penalty method has been used to satisfy the constraints (boundary conditions). One can reformulate the loss function so that the constraints are satisfied using Lagrange multipliers; the displacement field and the Lagrange multipliers can then be inferred by the DNN, and one can check whether a reasonable accuracy can be obtained. Solving partial differential equations using deep learning methods is an active research area, and this study is far from being the last word on the topic.
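For reference, the penalty-based loss discussed above can be sketched as a weighted sum of the interior and boundary mean squared errors. The weights `w_u` and `w_t` and all array names below are illustrative assumptions, not values from the paper; in the actual DCM the residual and boundary terms come from the network via automatic differentiation.

```python
import numpy as np

def dcm_loss(residual_pde, u_pred_bc, u_prescribed, traction_pred, traction_bar,
             w_u=100.0, w_t=100.0):
    """Penalty-method composite loss: MSE_G + w_u * MSE_u + w_t * MSE_t."""
    mse_g = np.mean(residual_pde ** 2)                    # interior PDE residual on X_int
    mse_u = np.mean((u_pred_bc - u_prescribed) ** 2)      # essential BCs on X_u (penalized)
    mse_t = np.mean((traction_pred - traction_bar) ** 2)  # natural BCs on X_t (penalized)
    return mse_g + w_u * mse_u + w_t * mse_t
```

A Lagrange-multiplier reformulation would instead add multiplier fields as extra network outputs and seek a saddle point of the augmented loss rather than tuning penalty weights by hand.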
Acknowledgment
The authors would like to thank the National Center for Supercomputing Applications (NCSA) Industry Program and the Center for Artificial Intelligence Innovation.
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
References
[1] T. J. Hughes, G. Sangalli, M. Tani, Isogeometric analysis: Mathematical and implementational aspects, with applications, in: Splines and PDEs: From Approximation Theory to Numerical Linear Algebra, Springer, 2018, pp. 237–315.
[2] A. Huerta, T. Belytschko, S. Fernández-Méndez, T. Rabczuk, X. Zhuang, M. Arroyo, Meshfree methods, Encyclopedia of Computational Mechanics Second Edition (2018) 1–38.
[3] T. J. Hughes, The Finite Element Method: Linear Static and Dynamic Finite Element Analysis, Courier Corporation, 2012.
[4] A. Ibrahimbegovic, Nonlinear Solid Mechanics: Theoretical Formulations and Finite Element Solution Methods, Vol. 160, Springer Science & Business Media, 2009.
[5] W. B. Zimmerman, Multiphysics Modeling with Finite Element Methods, Vol. 18, World Scientific Publishing Company, 2006.
[6] K. Ghosh, O. Lopez-Pamies, On the two-potential constitutive modeling of dielectric elastomers, Meccanica.
[7] D. W. Abueidda, M. Elhebeary, C.-S. A. Shiang, R. K. A. Al-Rub, I. M. Jasiuk, Compression and buckling of microarchitectured Neovius-lattice, Extreme Mechanics Letters 37 (2020) 100688.
[8] S. Babaee, J. Shim, J. C. Weaver, E. R. Chen, N. Patel, K. Bertoldi, 3D soft metamaterials with negative Poisson's ratio, Advanced Materials 25 (36) (2013) 5044–5049.
[9] D. W. Abueidda, P. Karimi, J.-M. Jin, N. A. Sobh, I. M. Jasiuk, M. Ostoja-Starzewski, Shielding effectiveness and bandgaps of interpenetrating phase composites based on the Schwarz primitive surface, Journal of Applied Physics 124 (17) (2018) 175102.
[10] R. Alberdi, K. Khandelwal, Design of periodic elastoplastic energy dissipating microstructures, Structural and Multidisciplinary Optimization 59 (2) (2019) 461–483.
[11] K. A. James, H. Waisman, Topology optimization of viscoelastic structures using a time-dependent adjoint method, Computer Methods in Applied Mechanics and Engineering 285 (2015) 166–187.
[12] T. E. Bruns, D. A. Tortorelli, Topology optimization of non-linear elastic structures and compliant mechanisms, Computer Methods in Applied Mechanics and Engineering 190 (26-27) (2001) 3443–3459.
[13] D. Jung, H. C. Gea, Topology optimization of nonlinear structures, Finite Elements in Analysis and Design 40 (11) (2004) 1417–1427.
[14] K. Matouš, M. G. Geers, V. G. Kouznetsova, A. Gillman, A review of predictive nonlinear theories for multiscale modeling of heterogeneous materials, Journal of Computational Physics 330 (2017) 192–220.
[15] D. L. McDowell, J. Panchal, H.-J. Choi, C. Seepersad, J. Allen, F. Mistree, Integrated Design of Multiscale, Multifunctional Materials and Products, Butterworth-Heinemann, 2009.
[16] W. Lim, D. Jang, T. Lee, Speech emotion recognition using convolutional and recurrent neural networks, in: 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), IEEE, 2016, pp. 1–4.
[17] Z. Thorat, S. Mahadik, S. Mane, S. Mohite, A. Udugade, Self driving car using Raspberry-Pi and machine learning 6 (2019) 969–972.
[18] M. de Bruijne, Machine learning approaches in medical image analysis: From detection to diagnosis, Medical Image Analysis 33 (2016) 94–97.
[19] A. Mielke, T. Ricken, Evaluating artificial neural networks and quantum computing for mechanics, PAMM 19 (1) (2019) e201900470.
[20] M. Ghommem, V. Puzyrev, F. Najar, Fluid sensing using microcantilevers: From physics-based modeling to deep learning, Applied Mathematical Modelling 88 (2020) 224–237.
[21] M. Bartoň, V. Puzyrev, Q. Deng, V. Calo, Efficient mass and stiffness matrix assembly via weighted Gaussian quadrature rules for B-splines, Journal of Computational and Applied Mathematics 371 (2020) 112626.
[22] Q. Rong, H. Wei, X. Huang, H. Bao, Predicting the effective thermal conductivity of composites from cross sections images using deep learning methods, Composites Science and Technology 184 (2019) 107861.
[23] D. W. Abueidda, M. Almasri, R. Ammourah, U. Ravaioli, I. M. Jasiuk, N. A. Sobh, Prediction and optimization of mechanical properties of composites using convolutional neural networks, Composite Structures 227 (2019) 111264.
[24] N. N. Vlassis, R. Ma, W. Sun, Geometric deep learning for computational mechanics Part I: Anisotropic hyperelasticity, Computer Methods in Applied Mechanics and Engineering 371 (2020) 113299.
[25] R. Ma, W. Sun, Computational thermomechanics for crystalline rock Part II: Chemo-damage-plasticity and healing in strongly anisotropic polycrystals, Computer Methods in Applied Mechanics and Engineering 369 (2020) 113184.
[26] M. Mozaffar, R. Bostanabad, W. Chen, K. Ehmann, J. Cao, M. Bessa, Deep learning predicts path-dependent plasticity, Proceedings of the National Academy of Sciences 116 (52) (2019) 26414–26420.
[27] C. Settgast, M. Abendroth, M. Kuna, Constitutive modeling of plastic deformation behavior of open-cell foam structures using neural networks, Mechanics of Materials 131 (2019) 1–10.
[28] D. W. Abueidda, S. Koric, N. A. Sobh, H. Sehitoglu, Deep learning for plasticity and thermo-viscoplasticity, International Journal of Plasticity 136 (2021) 102852.
[29] C. Yang, Y. Kim, S. Ryu, G. X. Gu, Prediction of composite microstructure stress-strain curves using convolutional neural networks, Materials & Design 189 (2020) 108509.
[30] A. D. Spear, S. R. Kalidindi, B. Meredig, A. Kontsos, J.-B. Le Graverend, Data-driven materials investigations: The next frontier in understanding and predicting fatigue behavior, JOM 70 (7) (2018) 1143–1146.
[31] K. M. Hamdia, H. Ghasemi, Y. Bazi, H. AlHichri, N. Alajlan, T. Rabczuk, A novel deep learning based method for the computational material design of flexoelectric nanostructures with topology optimization, Finite Elements in Analysis and Design 165 (2019) 21–30.
[32] S. M. Sadat, R. Y. Wang, A machine learning based approach for phononic crystal property discovery, Journal of Applied Physics 128 (2) (2020) 025106.
[33] Y. Hashash, S. Jung, J. Ghaboussi, Numerical implementation of a neural network based material model in finite element analysis, International Journal for Numerical Methods in Engineering 59 (7) (2004) 989–1005.
[34] S. Jung, J. Ghaboussi, Neural network constitutive model for rate-dependent materials, Computers & Structures 84 (15-16) (2006) 955–963.
[35] M. A. Bessa, P. Glowacki, M. Houlder, Bayesian machine learning in metamaterial design: Fragile becomes supercompressible, Advanced Materials 31 (48) (2019) 1904845.
[36] C.-T. Chen, G. X. Gu, Generative deep neural networks for inverse materials design using backpropagation and active learning, Advanced Science 7 (5) (2020) 1902607.
[37] D. W. Abueidda, S. Koric, N. A. Sobh, Topology optimization of 2D structures with nonlinearities using deep learning, Computers & Structures 237 (2020) 106283.
[38] H. T. Kollmann, D. W. Abueidda, S. Koric, E. Guleryuz, N. A. Sobh, Deep learning for topology optimization of 2D metamaterials, Materials & Design 196 (2020) 109098.
[39] H. Yang, Q. Xiang, S. Tang, X. Guo, Learning material law from displacement fields by artificial neural network, Theoretical and Applied Mechanics Letters 10 (3) (2020) 202–206.
[40] G. X. Gu, C.-T. Chen, M. J. Buehler, De novo composite design based on machine learning algorithm, Extreme Mechanics Letters 18 (2018) 19–28.
[41] H. Lee, I. S. Kang, Neural algorithm for solving differential equations, Journal of Computational Physics 91 (1) (1990) 110–131.
[42] I. E. Lagaris, A. Likas, D. I. Fotiadis, Artificial neural networks for solving ordinary and partial differential equations, IEEE Transactions on Neural Networks 9 (5) (1998) 987–1000.
[43] I. E. Lagaris, A. C. Likas, D. G. Papageorgiou, Neural-network methods for boundary value problems with irregular boundaries, IEEE Transactions on Neural Networks 11 (5) (2000) 1041–1049.
[44] A. Malek, R. S. Beidokhti, Numerical solution for high order differential equations using a hybrid neural network—optimization method, Applied Mathematics and Computation 183 (1) (2006) 260–271.
[45] A. G. Baydin, B. A. Pearlmutter, A. A. Radul, J. M. Siskind, Automatic differentiation in machine learning: A survey, The Journal of Machine Learning Research 18 (1) (2017) 5595–5637.
[46] M. Raissi, P. Perdikaris, G. E. Karniadakis, Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, Journal of Computational Physics 378 (2019) 686–707.
[47] Y. Guo, X. Cao, B. Liu, M. Gao, Solving partial differential equations using deep learning and physical constraints, Applied Sciences 10 (17) (2020) 5917.
[48] S. Luo, M. Vellakal, S. Koric, V. Kindratenko, J. Cui, Parameter identification of RANS turbulence model using physics-embedded neural network, in: International Conference on High Performance Computing, Springer, 2020, pp. 137–149.
[49] G. Kissas, Y. Yang, E. Hwuang, W. R. Witschey, J. A. Detre, P. Perdikaris, Machine learning in cardiovascular flows modeling: Predicting arterial blood pressure from non-invasive 4D flow MRI data using physics-informed neural networks, Computer Methods in Applied Mechanics and Engineering 358 (2020) 112623.
[50] T. Kadeethum, T. M. Jørgensen, H. M. Nick, Physics-informed neural networks for solving nonlinear diffusivity and Biot's equations, PLoS ONE 15 (5) (2020) e0232683.
[51] Q. He, D. Barajas-Solano, G. Tartakovsky, A. M. Tartakovsky, Physics-informed neural networks for multiphysics data assimilation with application to subsurface transport, Advances in Water Resources (2020) 103610.
[52] J. Han, A. Jentzen, E. Weinan, Solving high-dimensional partial differential equations using deep learning, Proceedings of the National Academy of Sciences 115 (34) (2018) 8505–8510.