Recurrent Localization Networks applied to the Lippmann-Schwinger Equation
Conlain Kelly, Surya R. Kalidindi∗
Georgia Institute of Technology, Atlanta, Georgia 30332, USA

∗ Corresponding author. Email addresses: [email protected] (Conlain Kelly), [email protected] (Surya R. Kalidindi)

Abstract
The bulk of computational approaches for modeling physical systems in materials science derive from either analytical (i.e., physics-based) or data-driven (i.e., machine-learning based) origins. In order to combine the strengths of these two approaches, we advance a novel machine learning approach for solving equations of the generalized Lippmann-Schwinger (L-S) type. In this paradigm, a given problem is converted into an equivalent L-S equation and solved as an optimization problem, where the optimization procedure is calibrated to the problem at hand. As part of a learning-based loop unrolling, we use a recurrent convolutional neural network to iteratively solve the governing equations for a field of interest. This architecture leverages the generalizability and computational efficiency of machine learning approaches, but also permits a physics-based interpretation. We demonstrate our learning approach on the two-phase elastic localization problem, where it achieves excellent accuracy on the predictions of the local (i.e., voxel-level) elastic strains. Since numerous governing equations can be converted into an equivalent L-S form, the proposed architecture has potential applications across a range of multiscale materials phenomena.
Keywords:
Machine Learning, Learned Optimization, Localization, Convolutional Neural Networks
1. Introduction
Most problems in materials science and engineering require the exploration of linkages between materials processing history, materials structure, and materials properties. Generally referred to as Process-Structure-Property linkages [1], they constitute the core materials knowledge needed to drive materials innovation supporting advances in technology [2]. Traditional physics-based numerical methods have long been the standard for solving the governing field equations underpinning these linkages. For mechanical problems, these have included ubiquitous finite element methods [3, 4, 5, 6] as well as FFT-based spectral methods [7, 8, 9]. However, standard solvers can constitute a major performance bottleneck in problems which require repeated solution over varied inputs, such as inverse problems [10, 11, 12] and multi-scale materials design [13, 14, 15].

As an alternative, machine learning (ML) provides tools to approximate unknown linkages in a parametrized fashion, with great success in many domains [16]. One of the most successful classes of ML models is neural networks [17], which have been applied with excellent results both in general applications [16, 18, 19, 20] and within materials science [15, 21, 22, 23]. Unfortunately, ML models tend to act as “black boxes” whose inner workings do not permit the depth of analysis provided by purely physics-based models [24]. There is a clear demand for approaches that leverage the advantages of both methodologies in order to build reliable, scalable, and interpretable reduced-order models.

One example of such an effort is the Materials Knowledge Systems (MKS) framework [25, 26]. Aimed at multiscale materials design [27, 28], MKS formulates the governing field equations for heterogeneous materials in a Lippmann-Schwinger (L-S) form [7, 29]. Using regression techniques to calibrate the first-order terms of a series expansion to the L-S equation, MKS presents a generalized approach for solving a broad class of scale-bridging problems in materials design and optimization [30, 31]. However, improving the accuracy of these models requires higher-order L-S terms, which rapidly become more computationally expensive.

As an alternative, we propose an approach inspired by the intersections between iterative spectral methods [32] and recent advances in inverse imaging [19, 33]; we cast the recurrent L-S equation as an optimization problem. Rather than employing a predefined optimization strategy, such as gradient descent or conjugate gradients [34], the optimizer is posed as a recurrent collection of convolutional neural networks. After being calibrated to available curated data (e.g., results of FEA simulations, phase field models), these networks act as proximal (or “update”) operators which take in a candidate solution and output an improved version. This iterative methodology emphasizes the underlying physics to permit greater model robustness and deeper analysis.

In this paper, we begin with a brief analysis of the L-S equation and its application to the linear elasticity problem, followed by discussion of the general L-S equation. Using this analysis, we then demonstrate how the L-S equation can be naturally posed as a machine learning problem, and how a neural network can learn proximal operations which minimize a physical quantity (e.g., stress field divergence) within a solution field. By exploring the interplay between the physical and computational interpretations of the L-S equation, we provide insight into a new class of ML models for materials science. We then analyze which aspects of our approach provide the greatest gains by exploring various model configurations, reinforcing the value of an iterative (rather than feed-forward) approach. Finally, we evaluate our methodology on the problem of elastic localization and compare it to a previous machine learning model.
2. Background
Originally developed in the context of quantum mechanics [29], the L-S equation – or class of equations – is an implicit integral form that can represent a fairly general space of physical phenomena. The L-S equation is especially useful in the context of physics of heterogeneous media with spatially-varying physical parameters: stiffness, conductivity, density, etc. We defer discussion of the general Lippmann-Schwinger form until Section 2.2.

2.1. Elastic localization

As a case study, we consider the problem of computing the internal elastic strain field of a composite material undergoing bulk stresses or strains [7, 32, 35]. The composite microstructure is assumed to be composed of two or more distinct phases (i.e., thermodynamic material constituents), each exhibiting its own constitutive laws. This problem is herein referred to as elastic localization. An example two-phase microstructure and corresponding elastic strain field are presented in Figure 1.

Physically, elastic localization is governed by a generalized Hooke's law relating the variation in stress σ(x), strain ε(x), and stiffness C(x) over some volume V, along with the demand that the equilibrium stress field be divergence-free:

    σ = C ε,    (1)
    ∇ · σ = 0.    (2)

We consider constant periodic boundary conditions that correspond to the imposed volume-averaged strain, ε̄. With these choices, one can model the internal mechanical response of a representative volume element (RVE) of the material. In these models, the RVE often serves as a statistical unit cell of a larger material structure. We note that the problem statement expressed here serves as a simple illustration of the L-S approach, which has been successfully applied to more complex material systems and/or boundary conditions [32, 30].

Figure 1: Example microstructure-strain field pair for two-phase elastic localization. Yellow is the high-stiffness phase; purple is the low-stiffness phase; the contrast between elastic moduli is CR = 50.

Following the work of Kröner [36], this system is converted into an equivalent form (the elastic L-S equation) by splitting the local elastic stiffness C into a selected (constant) reference value C^R and a perturbation value C′. Substituting these stiffness tensors into Equations (1) and (2) gives ∇ · ((C^R + C′)ε) = 0, which rearranges into a partitioned version of the governing field equations:

    ∇ · (C^R ε) = −∇ · (C′ ε)    (3)

Observe that the left-hand side of this equation is a linear operator, denoted as L, which acts on ε:

    L ε ≡ ∇ · (C^R ε) = C^R ∇ · ε    (4)

Clearly L ε is linear in ε; therefore, it will have a corresponding Green's function G(x, s) [37]. Since divergence is uniform in space, we make the simplification G(x, s) = G(x − s). This function represents the system's response to an impulse inhomogeneity:

    L G(x − s) = δ(x − s)    (5)

where δ denotes the Dirac delta function.

We also partition the strain field as ε = ε^R + ε′, where ε^R represents a (constant) reference strain determined by boundary conditions (this is also equal to the internal average strain ε̄) and ε′ is the corresponding perturbation strain. Both C^R and ε^R are constant, so ∇ · (C^R ε^R) = 0 and thus ε^R is the homogeneous solution to L.
For a given inhomogeneity b(x), we can use the Green's function to solve for the particular solution ε′(x):

    L ε(x) = b(x)    (6)
    ⟹ ε(x) = ε^R + ∫_V G(x − s) b(s) ds    (7)

Now we formally treat the right-hand side of Equation (3) as the inhomogeneity b to obtain the elastic Lippmann-Schwinger equation:

    ε′(x) = −∫_V G(x − s) (∇ · [C′(s) ε(s)]) ds,    (8)

or

    ε(x) = ε̄ − ∫_V G(x − s) (∇ · [C′(s) ε(s)]) ds.    (9)

Since many governing equations can be converted into a similar form, we refer to their transformed version as an “L-S form”; that is, Equation (9) is the Lippmann-Schwinger equation corresponding to the elastic governing equations. A Lippmann-Schwinger derivation corresponding to a general governing equation is presented in Section 2.2. The reader is referred to Refs. [38, 39] for more background on Green's functions.

Using insights from previous work [22, 27, 35], we modify this form to make it amenable to a learning paradigm. First, we integrate by parts to shift the derivative onto G and absorb it into a new operator, Ĝ. Using ∗ to concisely denote a convolution, we obtain

    ε(x) = ε̄ − ∫_V Ĝ(x − s) C′(s) ε(s) ds = ε̄ − Ĝ ∗ (C′ ε)    (10)

Next, we define a binary microstructure representation m^h(x) that equals 1 if the material at point x is of phase (or material type) h, and 0 otherwise. Since each phase has its own stiffness, we can project the stiffness tensor C onto each phase: C = Σ_h C^h m^h and likewise C′ = Σ_h C′^h m^h. Finally, we combine the Green's function terms with the C′^h expression to obtain yet another operator Γ^h(x − s) ≡ Ĝ(x − s) C′^h. Applying all of these modifications, the elastic L-S form becomes

    ε(x) = ε̄ − Σ_h Γ^h ∗ (m^h ε).    (11)

The problem of elastic localization has thus been reduced to a single convolutional integral containing the microstructure m^h, candidate strain field ε, and a physics-determined stencil Γ^h. Curiously, the first two terms appear solely as an element-wise product between m^h and ε. This is due to the fact that the strain field is constrained indirectly by the divergence-free condition on σ. One also observes that all effects of C have been absorbed into Γ^h. This corresponds to the fact that Γ^h is not unique: infinitely many choices of C^R could result in this equation. Although mathematically equivalent to the original physics (and visually more complicated), Equation (11) provides significant advantages for solution over large heterogeneous volumes. Several solution strategies have been explored to address elastic localization in heterogeneous material systems, resulting from different interpretations of the L-S equation.

Mathematically, Γ^h is a set of convolutional kernels – one for each phase – encoding the underlying physics of the problem. Given a strain field, it computes the corresponding stresses and peels off the strain perturbation field required to minimize the stress divergence. Using this perspective, many existing models view the L-S equation as a fixed-point equation and solve it via root-finding [7].
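As a concrete illustration of this fixed-point view, the following sketch applies Equation (11) directly as an update rule, evaluating each convolution spectrally via the convolution theorem. This is a minimal NumPy sketch, not the solver of Ref. [7]: the Fourier-space operators Γ̂^h (which encode C^R and the Green's function) are assumed to be precomputed, and all names are illustrative.

    import numpy as np

    def ls_fixed_point(m, gamma_hat, eps_bar, n_iter=50):
        # Naive fixed-point iteration on the elastic L-S form (Eq. 11):
        #     eps <- eps_bar - sum_h Gamma^h * (m^h eps)
        # m         : (H, Nx, Ny, Nz) binary phase indicators m^h
        # gamma_hat : (H, Nx, Ny, Nz) assumed precomputed FFTs of the Gamma^h stencils
        # eps_bar   : imposed average strain (a single scalar component)
        eps = np.full(m.shape[1:], eps_bar, dtype=np.float64)  # homogeneous initial guess
        for _ in range(n_iter):
            correction = np.zeros_like(eps)
            for h in range(m.shape[0]):
                # convolution theorem: Gamma^h * (m^h eps) = IFFT(Gamma_hat^h . FFT(m^h eps))
                correction += np.fft.ifftn(gamma_hat[h] * np.fft.fftn(m[h] * eps)).real
            eps = eps_bar - correction
        return eps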
The rate of convergence of these methods tends to depend heavily on the variation in material properties and the choice of C^R [32]. All of these physics-based approaches require a quantitative knowledge of the Green's function G.

From a computer science perspective, the term m^h ε simply represents the strain field segmented by phase (since m^h is a binary indicator function). Given a collection of true structure-strain pairs, one could either learn Γ^h, or some approximation, to best conserve the equality. Following this view, several ML-based elastic localization models [22, 35] have been applied to learn (non-iterative) linkages between m^h and ε by either approximating a series expansion of Γ^h, or using a neural network to map m^h to ε directly, bypassing Γ^h completely. The disadvantage of these models is that they either truncate or ignore the underlying physics, trying to re-learn it from data. The tension between these two perspectives leaves room for a hybrid method which retains the iterative L-S structure, but uses ML to deduce the internal details of the Γ^h operator.

2.2. The general Lippmann-Schwinger form

This section presents a derivation for the general L-S equation from a generic governing equation, drawing on the work of Moulinec [7] and Kröner [36]. This is provided for two reasons: to provide greater intuition and background for the L-S equation, and to motivate its compatibility with ML solvers. First, we write (in conservation form) a governing differential equation controlling a field y which varies over space x spanning some volume V:

    H(y(x); x) = 0.    (12)

Observe that any governing equation can be partitioned into two coupled subequations, each governing their own subsystems. First define an operator L_a which captures all linear (and spatially homogeneous) components of H. Now define a second (possibly nonlinear) operator B containing the rest of H. One obtains the earlier example of elastic localization with the substitutions y ≡ ε, L_a ε ≡ ∇ · (C^R ε), and B(ε) ≡ ∇ · (C′ ε). Although not explicitly denoted, both L_a and B may contain implicit information about the solution domain's structure (terms such as C or m).

Using these operators we can rewrite the original equation as:

    H(y; x) = L_a y + B(y; x) = 0    (13a)

or

    L_a y = −B(y; x).    (13b)

This partitions the governing equation into two coupled systems: a linear homogeneous system permitting only “simple” solutions, and a nonlinear, heterogeneous system where the solution is more complicated. Before solving the complete equation, we consider the auxiliary system

    L_a y(x) = b(x)    (14)

for some inhomogeneity b. We define the homogeneous solution to this system as y^R, so that L_a y^R = 0. Note that in general, y^R is determined by both L_a and the relevant boundary conditions, and for some problems there may be more than one suitable y^R. For problems with a constant solution field on the boundary, one finds that y^R = ȳ, i.e., the reference field is the average solution everywhere.

The choice of y^R induces a corresponding perturbation (or “particular solution”) y′ = y − y^R. Because y^R is annihilated by L_a, note that L_a y = L_a y′. Since L_a is linear, it will have a Green's function G(x, s), which captures the system's impulse response [38].
Using this we write the particular solution to the auxiliary equation as a convolution between G and b and reconstruct the complete solution:

    L_a G(x, s) = δ(x − s)    (15)
    ⟹ y′(x) = ∫_V G(x, s) b(s) ds    (16)

or

    y(x) = y^R + ∫_V G(x, s) b(s) ds.    (17)

Now we return to Equation (13b) and apply the auxiliary approach, this time treating the entire B term as our inhomogeneity b (and noting the attached minus sign). Plugging this into the perturbation expression for y gives us:

    y = y^R − ∫_V G(x, s) B(y(s); s) ds.    (18)

This is the general Lippmann-Schwinger equation; since this derivation holds for any operator H, we use the term “L-S form” for a given H to describe the result of partitioning and substituting that H into Equation (18). Referring back to the example of elastic localization, Equation (9) is the equivalent L-S form for Hooke's law (Equation (2)). Note that the linear system L_a only enters the L-S equation through the definitions of y^R and G. For example, if one used the trivial choice of the identity for L_a, the corresponding Green's function would just be the Dirac delta function, and Equation (18) would simplify to the original governing equation.

For the L-S form to be advantageous over H, L_a must capture a non-trivial amount of the underlying equation. There are three primary factors which make the L-S equation useful. First, it is partitioned: the original system is broken into two coupled physical systems. This makes it similar to a traditional “splitting method” where the linear and nonlinear components are separated, allowing the solution of the nonlinear components to be informed by the homogeneous, linear solution. The coupling between systems means that the L-S equation is also recursive: the presence of y in the inhomogeneity term leads to its appearance on both sides of the equation. If one desires to solve the problem analytically, the L-S form is likely no more useful than the original governing equation. However, the implicit structure is very suitable for iterative and optimization-based solvers [9, 40]. Finally, the L-S equation is convolutional: the integral is actually a convolution between the B term and a (possibly-unknown) Green's function G. Roughly speaking, Equation (18) presents the solution field y(x) as a balance between the global homogeneous “pull” (y^R) and the localized “tug” (y′(x)) of the solution values in a neighborhood near x. In situations where B is a purely differential operator (such as elastic localization), and with appropriate boundary conditions, Equation (18) can be integrated by parts to shift part of B onto G. This can simplify the integral term so that all of the physics is contained in a single convolutional stencil.

2.3. Neural networks and learned optimization

As one of the most popular ML tools in use, neural networks are a class of parametrized function approximators that can be calibrated to curated data [17]. At a high level, an artificial neural network (ANN) operates as an alternating sequence of tunable linear transforms and nonlinear activation functions – mimicking the operation of physical neurons in the brain [41, 42].
Under certain conditions, a sufficiently large neural network can be shown to act as a universal function approximator [43, 44], motivating the use of neural networks in a myriad of disciplines.

Two relevant specializations are the convolutional neural network (CNN), which uses convolution with a fixed-width stencil as its transform operation [45, 18], and the recurrent neural network (RNN), which operates on sequential data and considers latent information carried across input iterations [46]. These tools can be combined to model the underlying structure of various problems. A well-designed ML model trained on sufficient data can be significantly faster than an analytical equivalent and still provide reasonable accuracy [47]. This comes at the expense of interpretability – such models are “black boxes” which are difficult to understand and analyze [24]. Additionally, the topology of these networks (e.g., number of layers, nodes per layer, activation functions) strongly determines their success [48], and the “best” configuration is problem-dependent and often constructed ad hoc.

Recently there has been tremendous interest in the application of neural networks to mathematical problems [16]. Specifically, variations of recurrent CNNs have been explored [49] to learn Bayesian priors for image denoising [33] or proximal operators for medical imaging [19]. These image analysis methods pose the problem such that the desired output is obtained via a learned optimization procedure, where the optimizer itself is formulated as a neural network. Surprisingly, these methods often employ very simple network designs, especially compared to deeper and more elaborate structures found in mainstream ML [20, 50].
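The contrast between a hand-written and a learned optimizer can be made concrete in a few lines. The sketch below is our own illustration with hypothetical names; it does not reproduce any specific architecture from Refs. [19, 33, 49]. It places a fixed gradient-descent step next to a small CNN that proposes the update itself, whose weights would then be calibrated over a family of problems.

    import torch
    import torch.nn as nn

    # Hand-written optimizer: a fixed, problem-agnostic update rule.
    def gradient_descent_step(y, grad, lr=0.1):
        return y - lr * grad

    # Learned optimizer: a small CNN proposes the update instead.
    class LearnedUpdate(nn.Module):
        def __init__(self, width=8):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv3d(2, width, kernel_size=3, padding=1),
                nn.PReLU(width),
                nn.Conv3d(width, 1, kernel_size=1),
            )

        def forward(self, y, grad):
            # the update is inferred from the current iterate and its gradient
            return y + self.net(torch.cat([y, grad], dim=1))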
3. Methodology
We now explore how the perturbation expansion and L-S form allow a governing equation to be interpreted naturally as a machine learning problem. We first define a new operator Φ representing the entire right-hand side of Equation (18). We also use m to represent a problem domain's underlying microstructure, which influences the inhomogeneity B. Given a sample microstructure m∗, we obtain the corresponding strain y∗ by minimizing the error (or loss) L between y and Φ(y, m∗) over all y:

    Φ(y, m) ≡ y^R − ∫_V G(x, s) B(y(s); s) ds    (19)
    y∗ = Φ(y∗; m∗) = arg min_y L(y, Φ(y; m∗))    (20)

Although Φ may not be linear itself, linear analysis methods provide a useful interpretation: for a given microstructure m∗, Φ has a (possibly non-unique) generalized eigenfunction y∗ with unit eigenvalue. Issues regarding the solution's uniqueness and existence can arise from the original governing equation's nonlinearity. In the case of the elasticity problem, the governing equation is linear, so a unique solution will exist.

Now the original problem of solving the governing equation H has been reduced to that of minimizing the scalar loss L given a particular microstructure via a learned optimization strategy. To do this, we define a parametrized learner F_θ that performs a sequence of proximal operations which progressively improve the solution field to match the governing physics. Here θ represents all possible parameters of this learner. The optimal set of parameters θ_opt is obtained by minimizing the expected error produced by F_θ w.r.t. θ; in the case of a CNN this represents the optimal network weights, and can be obtained via standard backpropagation [51]. Given a microstructure and initial guess y_0, we want F_θ to provide a solution field which is approximately the true solution field y∗:

    F_θ(y_0, m∗) = ŷ ≈ y∗    (21)

This is accomplished by a sequence of updates

    y_i = y^R + f_i(y_{i−1}, m^h)    (22)

where f_i represents the perturbation proximal operator used at iteration i; given a microstructure and candidate strain field, it outputs the perturbation component corresponding to an improved estimate of the true strain field. A pseudocode and visual representation of the approach developed in this work are presented in Figure 2. Ideally, after the model is trained, F_θ and Φ have the same eigenfunctions (i.e., F_θ(y∗, m∗) = y∗).

The optimization strategy employed by the F_θ model is directly determined by the choice of f_i. To explore what this might look like for the elastic problem, consider an L2 loss function and plug in the elastic L-S formula (Equation (11)):

    Φ(y; m^h) = y^R − Σ_h Γ^h ∗ (m^h y)    (23)
    L(y, Φ(y; m^h)) ≡ ½ (y − Φ(y; m^h))² = ½ (y − y^R + Σ_h Γ^h ∗ (m^h y))²    (24)

The original fixed-point approach of Moulinec [7] corresponds to the choice

    f_i^M(y_{i−1}, m^h) = −Σ_h Γ^h ∗ (m^h y_{i−1})  ∀ i    (25)

As an alternative, we can obtain a “steepest descent” formula by taking the gradient [34] of Equation (24):

    f_i^G(y_{i−1}, m^h) = y_{i−1} − y^R − γ_i ∂/∂y_{i−1} L(y_{i−1}, Φ(y_{i−1}; m^h))    (26)

or

    f_i^G(y_{i−1}, m^h) = y_{i−1} − y^R − γ_i (I + ∂/∂y_{i−1} Σ_h Γ^h ∗ (m^h y_{i−1})) (y_{i−1} − y^R + Σ_h Γ^h ∗ (m^h y_{i−1}))    (27)

where γ_i denotes the step size at iteration i and I represents the identity operator.
By flattening everythinginto vectors and representing the convolution with an equivalent Toeplitz matrix [52], one can convert the7 e f F θ ( y R , m ) : y = y R f o r i = 1 t o N : y i = y R + f i ( y i − , m )r e t u r n y N (a) Pseudocode for optimization procedure F ? my f Legend: y R : reference solution y i : iterates ( i-k) through ( i) (last k candidate solutions) ? : data combination? : element-wise sum y R my f N y N ... (b) Visualization of data flow through optimization procedure Figure 2: Pseudocode and visualization for optimization-based solution of the L-S equation H acting on y i − . The linearity of H comesfrom the fact that the ( I + ∂∂ y i − . . . ) term is actually independent of y i − . Using a constant λ i to collectremaining terms of y R , the steepest descent rule becomes: f Gi ( y i − , m h ) = (1 − γ i )( y i − − y R ) + γ i Hy i − + λ i y R (28)Effectively, the gradient descent rule says that the new perturbation field is obtained by correcting theprevious perturbation field ( y i − − y R ) using a set of convolutional operations involving Γ h and m h , thensubtracting off a factor of y R such that the output perturbation field is zero-mean.A variety of more complicated update rules have been proposed to accelerate the solution of differentforms of the L-S equation [32]. From the examples above one sees that any update rule will involve variousterms of Γ h , which itself contains both physical properties ( C h (cid:48) ) as well as derivatives of the Green’s function( (cid:98) G ); more complicated update rules will simply require higher-order combinations. Therefore, if we hopeto learn f i , it must be parametrized in a way such that it can capture global convolutional stencils, as wellas various derivatives thereof. Finally, we note that although most analytical approaches employ the sameoperator for each iteration, the F θ formulation permits varying f i across iterations. Most analytical minimization procedures are (by design) problem-agnostic; this means that they can beexpected to work reasonably well for many problems, but may not be optimal for the problem at hand.Rather than using a predefined optimization strategy, we formulate each f i as a CNN that learns a proximaloperator mapping a given microstructure and candidate solution field to an improved perturbation field.The central motivation behind using a CNN proximal operator is that given sufficient parameterization, itcan emulate almost any optimization strategy; furthermore, that strategy will be customized to the problemat hand during training. Of course, this comes with the immense caveat that, absent any advances intheoretical ML, a learned optimizer will not have any provable convergence guarantees. The means thateven as N → ∞ , our learned model may not converge to the true solution field for a given microstructure.In practice, however, the model can be tuned and trained until it consistently produces solutions withinacceptable error tolerances.We define the F θ model imbued with a CNN f i as a recurrent localization network (RLN), which performsthe following operations during both training and evaluation phases: given a microstructure m and initialguess y R , estimate the true solution field by refining it over N iterations. At iteration i , the microstructureis combined with candidate solutions from previous iterations and passed through CNN f i . Note that f i outputs a perturbation field y (cid:48) i . The reference solution y R is already known, so there is no need to learnthat. 
In order to simulate a multi-step solver [3, 53] and estimate higher-order derivative terms, f_i considers the last k solutions via multiple input channels (rather than just y_{i−1}). One could potentially achieve this property, and perhaps obtain better results, by using GRU or LSTM modules [33] which learn a “latent” state to pass between iterations; however, 3D convolutional implementations for these operations were not part of major ML libraries at the time of writing.

Specifically for elastic localization, m and y are combined via element-wise multiplication following Equation (11). To enforce computational stability, all strains are normalized by the average strain ε̄. Moreover, the output of each f_i network has its average subtracted to enforce the constraint that the perturbation strain field is always zero-mean.

Following prior work [19], we define a full RLN as using a different f_i for each iteration (although we use the same CNN structure for each), for a total of N distinct networks. The idea behind this is to allow the network to capture different properties at each iteration, akin to terms in a series expansion. Having significantly more tunable parameters, this configuration provides the most expressive model. By allowing different operators to be employed at different iterations, this approach also deviates the most from standard analytical optimization procedures.

Alternatively, one may wish to reuse the same intermediate network across iterations (f_i = f ∀ i). This approach is denoted as RLN-t since the weights are tied between iterations. This means that the same operator will be employed at each iteration; however, since a time series is fed into each f_i (rather than a single data point), the RLN-t is still able to learn higher-order derivatives. It has the primary advantage of simplicity and efficiency, since it uses a factor of N fewer parameters than the full RLN.

Finally, we test the importance of the iterative and recurrent nature by considering a single f_i network, i.e., choosing N = 1. We call this a feed-forward localization network (FLN) and use it as a control to quantify the benefits of iteration vs. network design. Although the RLN-t and the FLN have the same number of parameters, the RLN-t uses each parameter N times, effectively simulating a deeper network.
For N > 1, the proximal CNNs are all calibrated simultaneously via backpropagation. During training a batch of true structure-strain pairs is fed through F_θ in its entirety, and all proximal networks are updated simultaneously to minimize a calibration loss function L^(cal) (different from the solution field loss L above). Rather than only considering the loss of the last iterate, we use a weighted sum of the losses of individual iterates: L^(cal) = Σ_i w_i L_i^(cal) for some weights w_i. The goal is to encourage the network to progress between iterations, while also finding the best possible solution. Following the analysis of Andrychowicz et al. [49] we interpret the use of L^(cal) as a variant of Backpropagation Through Time [54]. The choice of weights w_i could theoretically act as a form of stabilization or even regularization. By requiring that each iteration output a reasonable candidate solution, each proximal operator is constrained to behave somewhat physically, which might help prevent overfitting. However, this means that the network is encouraged to make larger changes in early iterations, potentially reducing its final-iterate accuracy. Conversely, if only the final result is important, then intermediate iterations could explore the loss curve more. However, only updating based on the last iteration will slow the model's training, and possibly increase the chances of the network weights converging to a poor local minimum. Clearly further experiments are required to explore these hypotheses.

Following similar works [33] we chose the uniform weighting w_i = 1 ∀ i. This induces the network to make larger corrections in early iterations (to avoid carrying costly errors through several iterations) and relatively smaller corrections in later iterations. However, our numerical experiments (Section 4) indicate that perhaps a different weighting might help the network capture fine-scale microstructure features. The appropriate number of iterations depends on the specific problem; for elasticity N = 5 proved sufficient in both the RLN and the RLN-t, and more iterations yielded little benefit. For the number of previous iterations to track, the value k = 2 was chosen. The above choices of hyperparameters were largely heuristic and made for simplicity, but they worked well for elasticity; for a more intensive problem a cross-validation procedure would be a better method for their selection.

Various network topologies for f_i [19, 33, 55] were tested with one goal in mind: use convolutional kernels to effectively capture local interactions. The architecture that proved most successful for elasticity was based roughly on a V-Net [55] and is presented in Figure 3. By combining information across length scales through the use of up- and down-sampling operations, this architecture is able to encode and combine both fine- and coarse-grained features. Notably, even a simple sequence of 3 convolutional layers (similar to that of Adler [19]) worked reasonably well. As with the hyperparameters above, a cross-validation procedure could aid in picking a superior network topology for a given problem; for this work, we intentionally avoided hyperparameter optimization to emphasize the mathematical analysis and iterative approach.

For this study, most convolutional kernels are 3×3×3, with 1×1×1 kernels used for channel reduction and 2×2×2 for up/down-sampling, summing to ≈ … parameters for each f_i. A parametric rectified linear unit (PReLU) activation function [56] was used after the 3×3×3 convolutions.
This activation is similar to the regular ReLU, but has a non-zero slope α for negative inputs; α is trained with all the other network weights during backpropagation. In our experiments this improved the model's stability during training and accuracy during testing. Effectively, it allows the network to be much more expressive with only a few extra parameters – changing one α rescales the entire output of a channel, rather than adjusting each of the 3×3×3 kernel weights individually.
Figure 3: Architecture for the internal f_i networks. Legend: Conv M×M×M @X denotes a convolution with M×M×M kernels and X output channels; /Y denotes downsampling by a factor of Y; *Z denotes (transposed-convolution) upsampling by a factor of Z; PReLU denotes a parametric ReLU. The network combines Conv 3×3×3 @16 + PReLU layers, a Conv 2×2×2 @32 /2 downsampling path with three Conv 3×3×3 @32 + PReLU layers and a ConvT 2×2×2 @32 *2 upsampling, further Conv 3×3×3 @16 + PReLU layers, and a final Conv 1×1×1 @1 output layer.
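As a concrete reference point, the snippet below sketches the simple three-convolution variant of f_i mentioned above (the V-Net-style network of Figure 3 additionally includes the down/up-sampling path). It is our own minimal rendering, not the authors' released code, and the channel width is illustrative. Note the per-channel PReLU, whose single α per channel rescales that channel's entire output.

    import torch.nn as nn

    def make_simple_proximal(in_channels, width=16):
        # A minimal f_i: 3x3x3 convolutions with per-channel PReLU activations,
        # followed by a 1x1x1 convolution reducing to a single perturbation field.
        return nn.Sequential(
            nn.Conv3d(in_channels, width, kernel_size=3, padding=1),
            nn.PReLU(num_parameters=width),
            nn.Conv3d(width, width, kernel_size=3, padding=1),
            nn.PReLU(num_parameters=width),
            nn.Conv3d(width, 1, kernel_size=1),
        )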
4. Linear Elasticity Experiments
We now present results from the application of RLN-type models to the elastic localization problem in two-phase composites. A synthetic dataset of 20,480 31 × 31 × 31 microstructure/strain field pairs was generated and randomly split 40% / 20% / 40% into disjoint train/validation/test sets, respectively. The train set was used to calibrate F_θ, the validation set served to measure its quality while training, and the test set was used to compute error metrics.

The microstructures were generated via PyMKS [25] by creating a uniform random field, applying a set of Gaussian filters, and thresholding the result. In this microstructure generation process, the filter's width in each direction controls the characteristic size and shape of the material grains, and the threshold controls the relative volume fraction of each phase.

Combinations of these four parameters were generated via a Latin hypercube sampling procedure to generate 4096 design points for microstructure generation. For each design point, 5 random microstructures were generated in order to produce statistically similar inputs, so that the datasets had some statistical redundancy. A random sample of 8 training microstructures is displayed in Figure 4. By varying these design parameters, one can cover a vast subspace of all possible microstructures, although we note that the space of all microstructures is intractably large – ignoring circular symmetries there are O(2^S) possible different microstructures with S voxels. By selecting our training and testing sets using the same procedure, we demonstrate the RLN's ability to interpolate between “familiar” microstructure samples, but not to extrapolate to microstructures which lie outside the closure of the training set. It is important to note that these microstructures do not necessarily correspond to thermodynamically favorable structures (i.e., structures that might appear in nature). However, this is a limitation only in data and not methodology – by using a fully-convolutional structure [57] the RLN can handle any voxelized input, including experimental data.

Figure 4: Random sample of 8 training microstructures sliced along the (y, z) axis. Yellow is the high-stiffness phase, purple is the low-stiffness phase.

The elastic strain fields in each microstructure were obtained from previously-built finite element models [22]. One major simplification is that even though the imposed 3D strain field is actually a second-order tensor (and the elastic stiffness field a fourth-order tensor), the linear nature of the problem allows us to solve for each component individually and superimpose their solutions as needed [35]. As such we apply an overall displacement to a material RVE along only the x-axis and we focus solely on the ε_xx term. For these simulations, a total strain of 0.1% was applied via periodic boundary conditions.

The relevant material parameters to be prescribed are thus the elastic moduli E_1, E_2 and Poisson ratios ν_1, ν_2 of the two phases. To roughly match most common metals, we chose ν_1 = ν_2 = 0.3.
With these selections, the contrast ratio CR ≡ E_1/E_2 has the most dominant role on the final strain field. With the choice E_1 ≥ E_2, one observes that CR ≥ 1.
In general, as the contrast in stiffnesses increases, the problem becomes harder to solve with both iterative and data-driven methods [32]. Following prior work [22], we tested the RLN on contrast ratios of 10 and 50.

Each RLN model was implemented in PyTorch [58] and trained independently for 60 epochs on an NVIDIA V100 GPU [59], which took approximately 8 hours. The RLN models were all calibrated using the Adam optimizer, a Mean Square Error loss, and a cosine annealing learning rate decay [60]. The last choice proved especially important since the models demonstrated great sensitivity to learning rate; introducing the cosine decay helped the model converge smoothly and quickly. After training, the epoch with the lowest validation loss was chosen as the “optimal” one. In reality, the training and validation losses were tightly coupled, as demonstrated in Figure 5. Note that training losses are aggregated during the epoch, whereas validation losses are computed after the epoch is complete, which causes the validation loss to occasionally appear lower.
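A minimal training loop matching these choices (Adam, MSE loss, cosine-annealed learning rate over 60 epochs, and the uniform iterate weights w_i = 1 from Section 3) might look as follows. This is a sketch under stated assumptions: RLN and make_simple_proximal come from our earlier sketches (with in_channels = k × number of phases), and train_loader is an assumed DataLoader yielding structure/reference/strain triplets.

    import torch

    model = RLN(lambda: make_simple_proximal(in_channels=4), n_iter=5, k=2)
    optimizer = torch.optim.Adam(model.parameters())
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=60)
    mse = torch.nn.MSELoss()

    for epoch in range(60):
        for m, y_R, y_true in train_loader:
            optimizer.zero_grad()
            iterates = model(m, y_R)
            # L_cal = sum_i w_i * L_i with uniform weights w_i = 1
            loss = sum(mse(y_i, y_true) for y_i in iterates)
            loss.backward()
            optimizer.step()
        scheduler.step()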
Following Yang et al. [22], the accuracy of the model was evaluated for each instance using the mean absolute strain error (MASE) over all S voxels:

    MASE(y_pred, y_true) = (100 / S) Σ_{i=1}^{S} | (y_pred[i] − y_true[i]) / y^R |.    (29)

Figure 6 presents the MASE distribution across each microstructure/strain pair in the test set. Note that the MASE is an aggregate measure of RVE-wide error; the pointwise error variation within a microstructure is explored below. The mean and standard deviation of the MASE distribution are collected in Table 1 for the RLN-type models.
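Equation (29) translates directly into a few lines of NumPy (our sketch; here y_R is the constant reference strain used for normalization):

    import numpy as np

    def mase(y_pred, y_true, y_R):
        # mean absolute strain error as a percentage of the reference strain (Eq. 29)
        return 100.0 * np.mean(np.abs((y_pred - y_true) / y_R))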
Figure 5: Learning curves (MSE loss vs. epoch) for the FLN, RLN-t, and full RLN, showing training and validation calibration losses for both CR = 10 and CR = 50.
Model                       MASE (mean ± std. dev.)
Contrast-10
  Comparison DL model [22]  3.07% ± …
  FLN                       … ± …
  RLN-t                     … ± …
  RLN                       … ± …
Contrast-50
  Comparison DL model [22]  5.71% ± …
  FLN                       … ± …
  RLN-t                     … ± …
  RLN                       … ± …

Table 1: Test-set MASE metrics for RLN and control models, for both CR = 10 and CR = 50.

For comparison we also present results from a recent study [22] using a feed-forward deep learning (DL) model to predict the strain fields; note that the DL model was trained and tested on its own dataset prior to this effort.

It is important to note that the DL model had a fundamentally different architecture: it was designed to work on 21³ voxel structures (whereas ours was tested on 31³), and it predicted the strain one voxel at a time (whereas ours predicts strain across the entire microstructure simultaneously). The dataset for the DL model employed large amounts of data augmentation using circular permutations, and contained significantly more input-output pairs. Finally, the DL model employed linear layers (to collapse to a single output); the size of these layers implies that the DL model used substantially more network weights than the RLN. As a result, it is difficult to compare the DL dataset and results with ours. We emphasize that the DL model represents a strong comparison model for ML elastic localization, but that objective ranking of relative performance is difficult. Nevertheless, the RLN architecture is able to produce significantly more accurate strain field estimates on the RLN dataset than the DL architecture produces on the DL dataset. Keeping these caveats in mind, the following analysis explores the difference between RLN configurations.

Looking at the aggregate statistics, the FLN performs worst of all models analyzed; it is vastly outperformed by the RLN-t even though they have the same number of parameters. This is also reflected in the learning curve in Figure 5: the RLN-t trained faster than the FLN and converged to a more accurate model simply by applying the proximal operator repeatedly across multiple iterations. Intriguingly, the FLN results are somewhat worse than the DL results. This implies that our simple network topology and relative lack of hyperparameter tuning produced a less powerful model than the DL. However, the RLN-t did much better, producing less error on average (and significantly less variance) than the DL control for both contrast ratios.

Figure 6: MASE distribution for RLN, CR = 50.
Figure 7: Slices of RLN predicted strain, true (FEA) strain, and ASE for worst test-set instance, CR = 50.

The disparity in performance indicates that dataset and network topology alone cannot explain the improved MASE relative to the DL model – the iterative methodology produces a more powerful model. The full RLN is the most accurate model, producing roughly half as much error and variability as the DL model for both contrast ratios. The improvement over the RLN-t has a number of possible origins: the RLN uses a factor of N more parameters, and in turn it uses a different operator at each iteration. We note that after training, the full RLN increases the memory overhead but not the prediction time: up to GPU memory details, each iteration has the same computational costs regardless of weight tying.
Excluding data I/O overhead, the RLN-t and the full RLN take only ≈ 87 seconds to predict strain fields for the entire testing set (8,192 microstructures), or roughly 11 milliseconds per microstructure (cf. several seconds per microstructure for the finite element solver). To examine the iterative refinement directly, we plot how y_i evolves across iterations, as well as its difference ∆_i ≡ y_i − y_{i−1}. This is presented in Figure 8 for the FLN, Figure 9 for the RLN-t, and Figure 10 for the full RLN.
Figure 8: FLN predictions and differences between iterations for selected test-set instance, CR = 50 y y y y y y N o r m a li z e d S t r a i n RLN-t Evolution
Figure 9: RLN-t predictions (normalized strain) and differences between iterations for a selected test-set instance, CR = 50.

The greyscale coloring is the same scale as Figure 7 and has its maximum at the true strain response's maximum. The redscale coloring represents a negative strain update (y decreasing between iterations). The FLN appears to handle low-strain areas fairly well, but misses the strain peaks in the center. Note that it does not apply the same operation as the first iteration of the RLN-t; the FLN is trained to make the best possible answer within a single jump. In comparison, the RLN-t does a better job of capturing strain peaks, likely since it can build them up over several iterations. Less obvious is that it fails to capture the central strain peak as well as the full RLN does – the differences between the two are evidently rather fine-scale, so we focus our analysis on the full RLN.

After the first iteration the RLN has picked up most of the relative troughs and peaks, and finer tweaks are handled by the later iterations. For example, most of the high-strain regions (greyscale areas) appear to be captured in the first two iterations, whereas low-strain regions (redscale areas) continue to be refined in later iterations. This is due to the RLN's ability to learn different, and nonlinear, operators at each iteration – the MSE loss function tends to magnify high-magnitude errors, even if they are very localized (i.e., strain peaks and troughs). The model is therefore encouraged to estimate these outlier areas first (corresponding to the early ∆_i having much more magnitude). Of course, this puts a lot of weight on the first proximal network to capture the strain outliers correctly. One possible solution would be to adjust the loss function weighting so that early iterations are encouraged to converge gradually towards a good solution, rather than quickly towards a decent one.
Figure 10: RLN predictions and differences between iterations for selected test-set instance, CR = 50 early iterations are encouraged to converge gradually towards a good solution, rather than quickly towardsa decent one. In other words, by allowing earlier iterations to mispredict strain peaks somewhat, the modelmay be more likely to escape local minima and actually obtain higher final-iteration accuracy.This approach differs from traditional finite-element based approaches in that it seeks a solution bylearning Green’s functions (and derivatives thereof) of the governing equation, rather than solving thegoverning equation numerically. This provides an advantage in prediction speed by requiring a rather costly(but one-time) training overhead. The computational complexity of convolving a 3D field containing S voxels with a 3D stencil containing k voxels is O ( Sk ) (since each voxel in the first field must be multipliedby each voxel in the stencil). For a fixed network topology, k is a constant; therefore the prediction runtimewill increase linearly with the number of microstructure voxels. A spectral method using the Fast FourierTransform will cost at least O ( S log S ), and any numerical methods (finite element or otherwise) employinglinear solvers will likely be even more expensive. We reiterate that using GPU parallelization, the RLNrequires on average 11 milliseconds to predict the strain field for a given microstructure, compared to severalseconds for the finite element solver. This makes it very valuable for inverse problems, where the localizationproblem must be solved thousands or even millions of times in order to solve a higher-level metaproblem[15]. Once trained for a specific governing equation (e.g. linear elasticity) and set of material properties(e.g., contrast ratio), the RLN methodology can be applied to any voxelized microstructure. Although weonly tested a 31 structure, in principle a model could be trained on one structure size and used on another(possibly with reduced accuracy); this is the subject of ongoing work. Note that the model must be trainedanew to predict strains for a different contrast ratio.
5. Conclusions
In this paper, we describe a new learning-based methodology for addressing Lippmann-Schwinger type physics problems. Embedding recurrent CNNs into a learned optimization procedure provides for a flexible, but interpretable, ML model. The design of proximal networks is informed by problem-specific domain knowledge; that knowledge also provides a physical interpretation of what the model is learning. Furthermore, the partitioned and convolutional structure of this approach acts as a regularizer by enforcing underlying physical properties such as mean field values and spatial invariance. The iterative scheme allows for emulation of a deeper network, vastly increasing model robustness without increasing the parameterization space. If space allows, using a different network for each iteration further improves the model's expressiveness and accuracy.

When applied to the elasticity localization problem and using our dataset, our model produced much more accurate and interpretable results than previous deep learning models produced on similar datasets, while being much faster than analytical approaches. The CNN architecture used here was designed for simplicity, but could be improved with more advanced techniques such as inception modules [61], perceptual losses [62], Fourier layers [63], or variational layers [64]. Moreover, many hyperparameters, such as the number of iterations and the loss function weighting, can be tuned on a problem-specific basis.
6. Acknowledgements
This work was supported by NSF Graduate Research Fellowship DGE-1650044, and SK acknowledges support from NSF 2027105. We used the Hive cluster supported by NSF 1828187 and managed by PACE at Georgia Institute of Technology, USA. The authors thank Anne Hanna, Andreas Robertson, Nic Olsen, and Mady Veith for insightful discussions that helped shape this work.
7. Data and Software Availability
The raw and processed data required to reproduce these findings are available to download from [dataset] [65]. The implementation and training code for each RLN configuration, as well as trained models, are available under an open source license on GitHub (see Ref. [66]).

References

[1] S. Kalidindi, Hierarchical Materials Informatics: Novel Analytics for Materials Data, Butterworth-Heinemann, Boston, 2015. doi:10.1016/B978-0-12-410394-8.00001-1.
[2] NIST, Materials Genome Initiative Strategic Plan, 2014.
[3] K. W. Morton, D. F. Mayers, Numerical Solution of Partial Differential Equations: An Introduction, 2nd Edition, Cambridge University Press, 2005. doi:10.1017/CBO9780511812248.
[4] F. Roters, P. Eisenlohr, L. Hantcherli, D. Tjahjanto, T. Bieler, D. Raabe, Overview of constitutive laws, kinematics, homogenization and multiscale methods in crystal plasticity finite-element modeling: Theory, experiments, applications, Acta Materialia 58 (4) (2010) 1152–1211. doi:10.1016/j.actamat.2009.10.058.
[5] M. Cheng, J. Warren, Controlling the accuracy of unconditionally stable algorithms in the Cahn-Hilliard equation, Physical Review E 75 (2007) 017702. doi:10.1103/PhysRevE.75.017702.
[6] D. McDowell, J. Panchal, H.-J. Choi, C. Seepersad, J. Allen, F. Mistree, Integrated Design of Multiscale, Multifunctional Materials and Products, 2009. doi:10.1016/C2009-0-20058-4.
[7] H. Moulinec, P. Suquet, A numerical method for computing the overall response of nonlinear composites with complex microstructure, Computer Methods in Applied Mechanics and Engineering 157 (1) (1998) 69–94. doi:10.1016/S0045-7825(97)00218-1.
[8] T. de Geus, J. Vondřejc, J. Zeman, R. Peerlings, M. Geers, Finite strain FFT-based non-linear solvers made simple, Computer Methods in Applied Mechanics and Engineering 318 (2017) 412–430. doi:10.1016/j.cma.2016.12.032.
[9] J. C. Michel, H. Moulinec, P. Suquet, A computational scheme for linear and non-linear composites with arbitrary phase contrast, International Journal for Numerical Methods in Engineering 52 (1-2) (2001) 139–160. doi:10.1002/nme.275.
[10] R. Snieder, Inverse problems in geophysics, in: Signal Recovery and Synthesis, Optical Society of America, 2001, p. SMA2. doi:10.1364/SRS.2001.SMA2.
[11] X. Wu, G. Proust, M. Knezevic, S. Kalidindi, Elastic–plastic property closures for hexagonal close-packed polycrystalline metals using first-order bounding theories, Acta Materialia 55 (8) (2007) 2729–2737. doi:10.1016/j.actamat.2006.12.010.
[12] A. Jain, J. Bollinger, T. Truskett, Inverse methods for material design, AIChE Journal 60 (2014). doi:10.1002/aic.14491.
[13] M. Parno, T. Moselhy, Y. Marzouk, A multiscale strategy for Bayesian inference using transport maps, SIAM/ASA Journal on Uncertainty Quantification 4 (1) (2016) 1160–1190. doi:10.1137/15m1032478.
[14] M. Horstemeyer, Multiscale Modeling: A Review, 2009, pp. 87–135. doi:10.1007/978-90-481-2687-3_4.
[15] C.-T. Chen, G. X. Gu, Generative deep neural networks for inverse materials design using backpropagation and active learning, Advanced Science 7 (5) (2020) 1902607. doi:10.1002/advs.201902607.
[16] G. Carleo, I. Cirac, K. Cranmer, L. Daudet, M. Schuld, N. Tishby, L. Vogt-Maranto, L. Zdeborová, Machine learning and the physical sciences, Rev. Mod. Phys. 91 (2019) 045002. doi:10.1103/RevModPhys.91.045002.
[17] J. Schmidhuber, Deep learning in neural networks: An overview, Neural Networks 61 (2015) 85–117. doi:10.1016/j.neunet.2014.09.003.
[18] A. Krizhevsky, I. Sutskever, G. Hinton, ImageNet classification with deep convolutional neural networks, Neural Information Processing Systems 25 (2012). doi:10.1145/3065386.
[19] J. Adler, O. Oktem, Learned primal-dual reconstruction, IEEE Transactions on Medical Imaging 37 (6) (2018) 1322–1332. doi:10.1109/tmi.2018.2799231.
[20] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, CoRR abs/1512.03385 (2015). arXiv:1512.03385.
[21] Z. Yang, Y. C. Yabansu, R. Al-Bahrani, W.-k. Liao, A. N. Choudhary, S. R. Kalidindi, A. Agrawal, Deep learning approaches for mining structure-property linkages in high contrast composites from simulation datasets, Computational Materials Science 151 (2018) 278–287. doi:10.1016/j.commatsci.2018.05.014.
[22] Z. Yang, Y. C. Yabansu, D. Jha, W.-k. Liao, A. N. Choudhary, S. R. Kalidindi, A. Agrawal, Establishing structure-property localization linkages for elastic deformation of three-dimensional high contrast composites using deep learning approaches, Acta Materialia 166 (2019) 335–345. doi:10.1016/j.actamat.2018.12.045.
[23] J. Schmidt, M. R. G. Marques, S. Botti, M. A. L. Marques, Recent advances and applications of machine learning in solid-state materials science, npj Computational Materials 5 (2019) 1–36.
[24] V. Buhrmester, D. Münch, M. Arens, Analysis of explainers of black box deep neural networks for computer vision: A survey (2019). arXiv:1911.12116.
[25] D. Brough, D. Wheeler, S. Kalidindi, Materials knowledge systems in Python - a data science framework for accelerated development of hierarchical materials, Integrating Materials and Manufacturing Innovation 6 (2016) 1–18. doi:10.1007/s40192-017-0089-0.
[26] M. I. Latypov, L. S. Toth, S. R. Kalidindi, Materials knowledge system for nonlinear composites, Computer Methods in Applied Mechanics and Engineering 346 (2018) 180–196.
[27] S. R. Kalidindi, A Bayesian framework for materials knowledge systems, MRS Communications 9 (2) (2019) 518–531. doi:10.1557/mrc.2019.56.
[28] S. Kalidindi, A. Khosravani, B. Yucel, A. Shanker, A. Blekh, Data infrastructure elements in support of accelerated materials innovation: ELA, PyMKS, and MATIN, Integrating Materials and Manufacturing Innovation (2019) 1–14. doi:10.1007/s40192-019-00156-1.
[29] B. A. Lippmann, J. Schwinger, Variational principles for scattering processes. I, Phys. Rev. 79 (1950) 469–480. doi:10.1103/PhysRev.79.469.
[30] D. B. Brough, D. Wheeler, J. A. Warren, S. R. Kalidindi, Microstructure-based knowledge systems for capturing process-structure evolution linkages, Current Opinion in Solid State and Materials Science 21 (3) (2017) 129–140. doi:10.1016/j.cossms.2016.05.002.
[31] M. W. Priddy, N. H. Paulson, S. R. Kalidindi, D. L. McDowell, Strategies for rapid parametric assessment of microstructure-sensitive fatigue for HCP polycrystals, International Journal of Fatigue 104 (2017) 231–242. doi:10.1016/j.ijfatigue.2017.07.015.
[32] R. A. Lebensohn, A. D. Rollett, Spectral methods for full-field micromechanical modelling of polycrystalline materials, Computational Materials Science 173 (2020) 109336. doi:10.1016/j.commatsci.2019.109336.
[33] P. Putzky, M. Welling, Recurrent inference machines for solving inverse problems (2017). arXiv:1706.04008.
[34] M. Stein, Gradient methods in the solution of systems of linear equations, Journal of Research of the National Bureau of Standards 48 (1952) 407.
[35] G. Landi, S. R. Niezgoda, S. R. Kalidindi, Multi-scale modeling of elastic response of three-dimensional voxel-based microstructure datasets using novel DFT-based knowledge systems, Acta Materialia 58 (7) (2010) 2716–2725. doi:10.1016/j.actamat.2010.01.007.
[36] E. Kröner, Statistical Continuum Mechanics: Course Held ... October 1971, Springer, 1972.
[37] G. Green, An essay on the application of mathematical analysis to the theories of electricity and magnetism (1828). arXiv:0807.0088.
[38] T. Eisler, An Introduction to Green's Functions, 1969.
[39] Wikipedia, Green's function, https://en.wikipedia.org/w/index.php?title=Green%27s_function&oldid=966577686 (Jul 2020).
[40] R. A. Lebensohn, J. P. Escobedo, E. K. Cerreta, D. Dennis-Koller, C. A. Bronkhorst, J. F. Bingert, Modeling void growth in polycrystalline materials, Acta Materialia 61 (18) (2013) 6918–6932. doi:10.1016/j.actamat.2013.08.004.
[41] W. S. McCulloch, W. Pitts, A Logical Calculus of the Ideas Immanent in Nervous Activity, MIT Press, Cambridge, MA, USA, 1988, pp. 15–27.
[42] F. Rosenblatt, The Perceptron, a Perceiving and Recognizing Automaton (Project Para), Report: Cornell Aeronautical Laboratory, 1957.
[43] G. Cybenko, Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals and Systems 2 (1989) 303–314.
[44] A. Pinkus, Approximation theory of the MLP model in neural networks, Acta Numerica 8 (1999) 143–195.
[45] Y. Lecun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324.
[46] Z. C. Lipton, J. Berkowitz, C. Elkan, A critical review of recurrent neural networks for sequence learning (2015). arXiv:1506.00019.
[47] F. Emmert-Streib, Z. Yang, H. Feng, S. Tripathi, M. Dehmer, An introductory review of deep learning for prediction models with big data, Frontiers in Artificial Intelligence (2020).
[48] V. K. Ojha, A. Abraham, V. Snásel, Metaheuristic design of feedforward neural networks: A review of two decades of research, CoRR abs/1705.05584 (2017). arXiv:1705.05584.
[49] M. Andrychowicz, M. Denil, S. Gomez, M. W. Hoffman, D. Pfau, T. Schaul, B. Shillingford, N. de Freitas, Learning to learn by gradient descent by gradient descent (2016). arXiv:1606.04474.
[50] F. N. Iandola, M. W. Moskewicz, K. Ashraf, S. Han, W. J. Dally, K. Keutzer, SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size (2016). arXiv:1602.07360.
[51] D. E. Rumelhart, G. E. Hinton, R. J. Williams, Learning Representations by Back-Propagating Errors, MIT Press, Cambridge, MA, USA, 1988, pp. 696–699.
[52] R. M. Gray, Toeplitz and Circulant Matrices: A Review, 2006. doi:10.1561/0100000006.
[53] M. Z. Ullah, S. Serra-Capizzano, F. Ahmad, An efficient multi-step iterative method for computing the numerical solution of systems of nonlinear equations associated with ODEs, Applied Mathematics and Computation 250 (2015) 249–259. doi:10.1016/j.amc.2014.10.103.
[54] M. Mozer, A focused backpropagation algorithm for temporal pattern recognition, Complex Systems 3 (1989).
[55] F. Milletari, N. Navab, S.-A. Ahmadi, V-Net: Fully convolutional neural networks for volumetric medical image segmentation (2016). arXiv:1606.04797.
[56] K. He, X. Zhang, S. Ren, J. Sun, Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification (2015). arXiv:1502.01852.
[57] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation (2015). arXiv:1411.4038.
[58] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, S. Chintala, PyTorch: An imperative style, high-performance deep learning library, in: H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, R. Garnett (Eds.), Advances in Neural Information Processing Systems 32, Curran Associates, Inc., 2019, pp. 8024–8035. URL http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf
[59] PACE, Partnership for an Advanced Computing Environment (PACE) (2017).
[60] I. Loshchilov, F. Hutter, SGDR: Stochastic gradient descent with warm restarts (2017). arXiv:1608.03983.
[61] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, CoRR abs/1409.4842 (2014). arXiv:1409.4842.
[62] J. Johnson, A. Alahi, F. Li, Perceptual losses for real-time style transfer and super-resolution, CoRR abs/1603.08155 (2016). arXiv:1603.08155.
[63] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, A. Anandkumar, Fourier neural operator for parametric partial differential equations (2020). arXiv:2010.08895.
[64] K. Shridhar, F. Laumann, M. Liwicki, A comprehensive guide to Bayesian convolutional neural network with variational inference (2019). arXiv:1901.02731.
[65] C. Kelly, Elastic localization data (2020).
[66] C. Kelly, RLN elasticity, https://github.com/conlain-k/RLN_elasticity (2020).