Recurrent Localization Networks applied to the Lippmann-Schwinger Equation
Conlain Kelly, Surya R. Kalidindi∗
Georgia Institute of Technology, Atlanta, Georgia 30332, USA

∗ Corresponding author. Email addresses: [email protected] (Conlain Kelly), [email protected] (Surya R. Kalidindi)

Abstract
The bulk of computational approaches for modeling physical systems in materials science derive from either analytical (i.e., physics-based) or data-driven (i.e., machine-learning based) origins. In order to combine the strengths of these two approaches, we advance a novel machine learning approach for solving equations of the generalized Lippmann-Schwinger (L-S) type. In this paradigm, a given problem is converted into an equivalent L-S equation and solved as an optimization problem, where the optimization procedure is calibrated to the problem at hand. As part of a learning-based loop unrolling, we use a recurrent convolutional neural network to iteratively solve the governing equations for a field of interest. This architecture leverages the generalizability and computational efficiency of machine learning approaches, but also permits a physics-based interpretation. We demonstrate our learning approach on the two-phase elastic localization problem, where it achieves excellent accuracy on the predictions of the local (i.e., voxel-level) elastic strains. Since numerous governing equations can be converted into an equivalent L-S form, the proposed architecture has potential applications across a range of multiscale materials phenomena.
Keywords:
Machine Learning, Learned Optimization, Localization, Convolutional Neural Networks
1. Introduction
Most problems in materials science and engineering require the exploration of linkages between materials processing history, materials structure, and materials properties. Generally referred to as Process-Structure-Property linkages [1], they constitute the core materials knowledge needed to drive materials innovation supporting advances in technology [2]. Traditional physics-based numerical methods have long been the standard for solving the governing field equations underpinning these linkages. For mechanical problems, these have included ubiquitous finite element methods [3, 4, 5, 6] as well as FFT-based spectral methods [7, 8, 9]. However, standard solvers can constitute a major performance bottleneck in problems which require repeated solution over varied inputs, such as inverse problems [10, 11, 12] and multi-scale materials design [13, 14, 15].

As an alternative, machine learning (ML) provides tools to approximate unknown linkages in a parametrized fashion, with great success in many domains [16]. One of the most successful classes of ML models is neural networks [17], which have been applied with excellent results both in general applications [16, 18, 19, 20] and within materials science [15, 21, 22, 23]. Unfortunately, ML models tend to act as “black boxes” whose inner workings do not permit the depth of analysis provided by purely physics-based models [24]. There is a clear demand for approaches that leverage the advantages of both methodologies in order to build reliable, scalable, and interpretable reduced-order models.

One example of such an effort is the Materials Knowledge Systems (MKS) framework [25, 26]. Aimed at multiscale materials design [27, 28], MKS formulates the governing field equations for heterogeneous materials in a Lippmann-Schwinger (L-S) form [7, 29]. Using regression techniques to calibrate the first-order terms of a series expansion to the L-S equation, MKS presents a generalized approach for solving a broad class of scale-bridging problems in materials design and optimization [30, 31]. However, improving the accuracy of these models requires higher-order L-S terms, which rapidly become more computationally expensive.

As an alternative, we propose an approach inspired by the intersections between iterative spectral methods [32] and recent advances in inverse imaging [19, 33]; we cast the recurrent L-S equation as an optimization problem. Rather than employing a predefined optimization strategy, such as gradient descent or conjugate gradients [34], the optimizer is posed as a recurrent collection of convolutional neural networks. After being calibrated to available curated data (e.g., results of FEA simulations, phase field models), these networks act as proximal (or “update”) operators which take in a candidate solution and output an improved version. This iterative methodology emphasizes the underlying physics to permit greater model robustness and deeper analysis.

In this paper, we begin with a brief analysis of the L-S equation and its application to the linear elasticity problem, followed by discussion of the general L-S equation. Using this analysis, we then demonstrate how the L-S equation can be naturally posed as a machine learning problem, and how a neural network can learn proximal operations which minimize a physical quantity (e.g., stress field divergence) within a solution field. By exploring the interplay between the physical and computational interpretations of the L-S equation, we provide insight into a new class of ML models for materials science. We then analyze which aspects of our approach provide the greatest gains by exploring various model configurations, reinforcing the value of an iterative (rather than feed-forward) approach. Finally, we evaluate our methodology on the problem of elastic localization and compare it to a previous machine learning model.
2. Background
Originally developed in the context of quantum mechanics [29], the L-S equation – or class of equations – is an implicit integral form that can represent a fairly general space of physical phenomena. The L-S equation is especially useful in the context of physics of heterogeneous media with spatially-varying physical parameters: stiffness, conductivity, density, etc. We defer discussion of the general Lippmann-Schwinger form until Section 2.2.

2.1. Elastic localization

As a case study, we consider the problem of computing the internal elastic strain field of a composite material undergoing bulk stresses or strains [7, 32, 35]. The composite microstructure is assumed to be composed of two or more distinct phases (i.e., thermodynamic material constituents), each exhibiting its own constitutive laws. This problem is herein referred to as elastic localization. An example two-phase microstructure and corresponding elastic strain field are presented in Figure 1.

Physically, elastic localization is governed by a generalized Hooke's law relating the variation in stress σ(x), strain ε(x), and stiffness C(x) over some volume V, along with the demand that the equilibrium stress field be divergence-free:

    σ = C ε,    (1)
    ∇ · σ = 0.    (2)

We consider constant periodic boundary conditions that correspond to the imposed volume-averaged strain, ε̄. With these choices, one can model the internal mechanical response of a representative volume element (RVE) of the material. In these models, the RVE often serves as a statistical unit cell of a larger material structure. We note that the problem statement expressed here serves as a simple illustration of the L-S approach, which has been successfully applied to more complex material systems and/or boundary conditions [32, 30].

Figure 1: Example microstructure-strain field pair for two-phase elastic localization. Yellow is the high-stiffness phase; purple is the low-stiffness phase; the contrast between elastic moduli is CR = 50.

Following the work of Kröner [36], this system is converted into an equivalent form (the elastic L-S equation) by splitting the local elastic stiffness C into a selected (constant) reference value C^R and a perturbation value C′. Substituting these stiffness tensors into Equations (1) and (2) gives ∇ · ((C^R + C′)ε) = 0, which rearranges into a partitioned version of the governing field equations:

    ∇ · (C^R ε) = −∇ · (C′ ε)    (3)

Observe that the left-hand side of this equation is a linear operator, denoted as L, which acts on ε:

    L ε ≡ ∇ · (C^R ε) = C^R ∇ · ε    (4)

Clearly L ε is linear in ε; therefore, it will have a corresponding Green's function G(x, s) [37]. Since divergence is uniform in space, we make the simplification G(x, s) = G(x − s). This function represents the system's response to an impulse inhomogeneity:

    L G(x − s) = δ(x − s)    (5)

where δ denotes the Dirac delta function.

We also partition the strain field as ε = ε^R + ε′, where ε^R represents a (constant) reference strain determined by boundary conditions (this is also equal to the internal average strain ε̄) and ε′ is the corresponding perturbation strain. Both C^R and ε^R are constant, so ∇ · (C^R ε^R) = 0 and thus ε^R is the homogeneous solution to L.
For a given inhomogeneity b(x), we can use the Green's function to solve for the particular solution ε′(x):

    L ε(x) = b(x)    (6)
    ⟹ ε(x) = ε^R + ∫_V G(x − s) b(s) ds    (7)

Now we formally treat the right-hand side of Equation (3) as the inhomogeneity b to obtain the elastic Lippmann-Schwinger equation:

    ε′(x) = −∫_V G(x − s) (∇ · [C′(s) ε(s)]) ds,    (8)

or

    ε(x) = ε̄ − ∫_V G(x − s) (∇ · [C′(s) ε(s)]) ds.    (9)

Since many governing equations can be converted into a similar form, we refer to their transformed version as an “L-S form”; that is, Equation (9) is the Lippmann-Schwinger equation corresponding to the elastic governing equations. A Lippmann-Schwinger derivation corresponding to a general governing equation is presented in Section 2.2. The reader is referred to Refs. [38, 39] for more background on Green's functions.

Using insights from previous work [22, 27, 35], we modify this form to make it amenable to a learning paradigm. First, we integrate by parts to shift the derivative onto G and absorb it into a new operator, Ĝ. Using ∗ to concisely denote a convolution, we obtain

    ε(x) = ε̄ − ∫_V Ĝ(x − s) C′(s) ε(s) ds = ε̄ − Ĝ ∗ (C′ ε)    (10)

Next, we define a binary microstructure representation m^h(x) that equals 1 if the material at point x is of phase (or material type) h, and 0 otherwise. Since each phase has its own stiffness, we can project the stiffness tensor C onto each phase: C = Σ_h C^h m^h and likewise C′ = Σ_h C′^h m^h. Finally, we combine the Green's function terms with the C′^h expression to obtain yet another operator Γ^h(x − s) ≡ Ĝ(x − s) C′^h. Applying all of these modifications, the elastic L-S form becomes

    ε(x) = ε̄ − Σ_h Γ^h ∗ (m^h ε).    (11)

The problem of elastic localization has thus been reduced to a single convolutional integral containing the microstructure m^h, candidate strain field ε, and a physics-determined stencil Γ^h. Curiously, the first two terms appear solely as an element-wise product between m^h and ε. This is due to the fact that the strain field is constrained indirectly by the divergence-free condition on σ. One also observes that all effects of C have been absorbed into Γ^h. This corresponds to the fact that Γ^h is not unique: infinitely many choices of C^R could result in this equation. Although mathematically equivalent to the original physics (and visually more complicated), Equation (11) provides significant advantages for solution over large heterogeneous volumes. Several solution strategies have been explored to address elastic localization in heterogeneous material systems, resulting from different interpretations of the L-S equation.

Mathematically, Γ^h is a set of convolutional kernels – one for each phase – encoding the underlying physics of the problem. Given a strain field, it computes the corresponding stresses and peels off the strain perturbation field required to minimize the stress divergence. Using this perspective, many existing models view the L-S equation as a fixed-point equation and solve it via root-finding [7].
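As a concrete illustration of this fixed-point view, the following sketch applies Equation (11) directly as an update rule, evaluating each convolution spectrally via the convolution theorem. This is a minimal NumPy sketch, not the solver of Ref. [7]: the Fourier-space operators Γ̂^h (which encode C^R and the Green's function) are assumed to be precomputed, and all names are illustrative.

    import numpy as np

    def ls_fixed_point(m, gamma_hat, eps_bar, n_iter=50):
        # Naive fixed-point iteration on the elastic L-S form (Eq. 11):
        #     eps <- eps_bar - sum_h Gamma^h * (m^h eps)
        # m         : (H, Nx, Ny, Nz) binary phase indicators m^h
        # gamma_hat : (H, Nx, Ny, Nz) assumed precomputed FFTs of the Gamma^h stencils
        # eps_bar   : imposed average strain (a single scalar component)
        eps = np.full(m.shape[1:], eps_bar, dtype=np.float64)  # homogeneous initial guess
        for _ in range(n_iter):
            correction = np.zeros_like(eps)
            for h in range(m.shape[0]):
                # convolution theorem: Gamma^h * (m^h eps) = IFFT(Gamma_hat^h . FFT(m^h eps))
                correction += np.fft.ifftn(gamma_hat[h] * np.fft.fftn(m[h] * eps)).real
            eps = eps_bar - correction
        return eps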
The rate of convergence of these methods tends to depend heavily on the variation in material properties and the choice of C^R [32]. All of these physics-based approaches require a quantitative knowledge of the Green's function G.

From a computer science perspective, the term m^h ε simply represents the strain field segmented by phase (since m^h is a binary indicator function). Given a collection of true structure-strain pairs, one could either learn Γ^h, or some approximation, to best conserve the equality. Following this view, several ML-based elastic localization models [22, 35] have been applied to learn (non-iterative) linkages between m^h and ε by either approximating a series expansion of Γ^h, or using a neural network to map m^h to ε directly, bypassing Γ^h completely. The disadvantage of these models is that they either truncate or ignore the underlying physics, trying to re-learn it from data. The tension between these two perspectives leaves room for a hybrid method which retains the iterative L-S structure, but uses ML to deduce the internal details of the Γ^h operator.

2.2. The general Lippmann-Schwinger form

This section presents a derivation for the general L-S equation from a generic governing equation, drawing on the work of Moulinec [7] and Kröner [36]. This is provided for two reasons: to provide greater intuition and background for the L-S equation, and to motivate its compatibility with ML solvers. First, we write (in conservation form) a governing differential equation controlling a field y which varies over space x spanning some volume V:

    H(y(x); x) = 0.    (12)

Observe that any governing equation can be partitioned into two coupled subequations, each governing their own subsystems. First define an operator L_a which captures all linear (and spatially homogeneous) components of H. Now define a second (possibly nonlinear) operator B containing the rest of H. One obtains the earlier example of elastic localization with the substitutions y ≡ ε, L_a ε ≡ ∇ · (C^R ε), and B(ε) ≡ ∇ · (C′ ε). Although not explicitly denoted, both L_a and B may contain implicit information about the solution domain's structure (terms such as C or m).

Using these operators we can rewrite the original equation as:

    H(y; x) = L_a y + B(y; x) = 0    (13a)

or

    L_a y = −B(y; x).    (13b)

This partitions the governing equation into two coupled systems: a linear homogeneous system permitting only “simple” solutions, and a nonlinear, heterogeneous system where the solution is more complicated. Before solving the complete equation, we consider the auxiliary system

    L_a y(x) = b(x)    (14)

for some inhomogeneity b. We define the homogeneous solution to this system as y^R, so that L_a y^R = 0. Note that in general, y^R is determined by both L_a and the relevant boundary conditions, and for some problems there may be more than one suitable y^R. For problems with a constant solution field on the boundary, one finds that y^R = ȳ, i.e., the reference field is the average solution everywhere.

The choice of y^R induces a corresponding perturbation (or “particular solution”) y′ = y − y^R. Because y^R is annihilated by L_a, note that L_a y = L_a y′. Since L_a is linear, it will have a Green's function G(x, s), which captures the system's impulse response [38].
Using this we write the particular solution to the auxiliary equation as a convolution between G and b and reconstruct the complete solution:

    L_a G(x, s) = δ(x − s)    (15)
    ⟹ y′(x) = ∫_V G(x, s) b(s) ds    (16)

or

    y(x) = y^R + ∫_V G(x, s) b(s) ds.    (17)

Now we return to Equation (13b) and apply the auxiliary approach, this time treating the entire B term as our inhomogeneity b (and noting the attached minus sign). Plugging this into the perturbation expression for y gives us:

    y = y^R − ∫_V G(x, s) B(y(s); s) ds.    (18)

This is the general Lippmann-Schwinger equation; since this derivation holds for any operator H, we use the term “L-S form” for a given H to describe the result of partitioning and substituting that H into Equation (18). Referring back to the example of elastic localization, Equation (9) is the equivalent L-S form for Hooke's law (Equation (2)). Note that the linear system L_a only enters the L-S equation through the definitions of y^R and G. For example, if one used the trivial choice of the identity for L_a, the corresponding Green's function would just be the Dirac delta function, and Equation (18) would simplify to the original governing equation.

For the L-S form to be advantageous over H, L_a must capture a non-trivial amount of the underlying equation. There are three primary factors which make the L-S equation useful. First, it is partitioned: the original system is broken into two coupled physical systems. This makes it similar to a traditional “splitting method” where the linear and nonlinear components are separated, allowing the solution of the nonlinear components to be informed by the homogeneous, linear solution. The coupling between systems means that the L-S equation is also recursive: the presence of y in the inhomogeneity term leads to its appearance on both sides of the equation. If one desires to solve the problem analytically, the L-S form is likely no more useful than the original governing equation. However, the implicit structure is very suitable for iterative and optimization-based solvers [9, 40]. Finally, the L-S equation is convolutional: the integral is actually a convolution between the B term and a (possibly-unknown) Green's function G. Roughly speaking, Equation (18) presents the solution field y(x) as a balance between the global homogeneous “pull” (y^R) and the localized “tug” (y′(x)) of the solution values in a neighborhood near x. In situations where B is a purely differential operator (such as elastic localization), and with appropriate boundary conditions, Equation (18) can be integrated by parts to shift part of B onto G. This can simplify the integral term so that all of the physics is contained in a single convolutional stencil.

2.3. Neural networks and learned optimization

As one of the most popular ML tools in use, neural networks are a class of parametrized function approximators that can be calibrated to curated data [17]. At a high level, an artificial neural network (ANN) operates as an alternating sequence of tunable linear transforms and nonlinear activation functions – mimicking the operation of physical neurons in the brain [41, 42].
Under certain conditions, a sufficiently large neural network can be shown to act as a universal function approximator [43, 44], motivating the use of neural networks in a myriad of disciplines.

Two relevant specializations are the convolutional neural network (CNN), which uses convolution with a fixed-width stencil as its transform operation [45, 18], and the recurrent neural network (RNN), which operates on sequential data and considers latent information carried across input iterations [46]. These tools can be combined to model the underlying structure of various problems. A well-designed ML model trained on sufficient data can be significantly faster than an analytical equivalent and still provide reasonable accuracy [47]. This comes at the expense of interpretability – such models are “black boxes” which are difficult to understand and analyze [24]. Additionally, the topology of these networks (e.g., number of layers, nodes per layer, activation functions) strongly determines their success [48], and the “best” configuration is problem-dependent and often constructed ad hoc.

Recently there has been tremendous interest in the application of neural networks to mathematical problems [16]. Specifically, variations of recurrent CNNs have been explored [49] to learn Bayesian priors for image denoising [33] or proximal operators for medical imaging [19]. These image analysis methods pose the problem such that the desired output is obtained via a learned optimization procedure, where the optimizer itself is formulated as a neural network. Surprisingly, these methods often employ very simple network designs, especially compared to deeper and more elaborate structures found in mainstream ML [20, 50].
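The contrast between a hand-written and a learned optimizer can be made concrete in a few lines. The sketch below is our own illustration with hypothetical names; it does not reproduce any specific architecture from Refs. [19, 33, 49]. It places a fixed gradient-descent step next to a small CNN that proposes the update itself, whose weights would then be calibrated over a family of problems.

    import torch
    import torch.nn as nn

    # Hand-written optimizer: a fixed, problem-agnostic update rule.
    def gradient_descent_step(y, grad, lr=0.1):
        return y - lr * grad

    # Learned optimizer: a small CNN proposes the update instead.
    class LearnedUpdate(nn.Module):
        def __init__(self, width=8):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv3d(2, width, kernel_size=3, padding=1),
                nn.PReLU(width),
                nn.Conv3d(width, 1, kernel_size=1),
            )

        def forward(self, y, grad):
            # the update is inferred from the current iterate and its gradient
            return y + self.net(torch.cat([y, grad], dim=1))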
3. Methodology
We now explore how the perturbation expansion and L-S form allow a governing equation to be interpreted naturally as a machine learning problem. We first define a new operator Φ representing the entire right-hand side of Equation (18). We also use m to represent a problem domain's underlying microstructure, which influences the inhomogeneity B. Given a sample microstructure m∗, we obtain the corresponding strain y∗ by minimizing the error (or loss) L between y and Φ(y, m∗) over all y:

    Φ(y, m) ≡ y^R − ∫_V G(x, s) B(y(s); s) ds    (19)
    y∗ = Φ(y∗; m∗) = arg min_y L(y, Φ(y; m∗))    (20)

Although Φ may not be linear itself, linear analysis methods provide a useful interpretation: for a given microstructure m∗, Φ has a (possibly non-unique) generalized eigenfunction y∗ with unit eigenvalue. Issues regarding the solution's uniqueness and existence can arise from the original governing equation's nonlinearity. In the case of the elasticity problem, the governing equation is linear, so a unique solution will exist.

Now the original problem of solving the governing equation H has been reduced to that of minimizing the scalar loss L given a particular microstructure via a learned optimization strategy. To do this, we define a parametrized learner F_θ that performs a sequence of proximal operations which progressively improve the solution field to match the governing physics. Here θ represents all possible parameters of this learner. The optimal set of parameters θ_opt is obtained by minimizing the expected error produced by F_θ w.r.t. θ; in the case of a CNN this represents the optimal network weights, and can be obtained via standard backpropagation [51]. Given a microstructure and initial guess y_0, we want F_θ to provide a solution field which is approximately the true solution field y∗:

    F_θ(y_0, m∗) = ŷ ≈ y∗    (21)

This is accomplished by a sequence of updates

    y_i = y^R + f_i(y_{i−1}, m^h)    (22)

where f_i represents the perturbation proximal operator used at iteration i; given a microstructure and candidate strain field, it outputs the perturbation component corresponding to an improved estimate of the true strain field. A pseudocode and visual representation of the approach developed in this work are presented in Figure 2. Ideally, after the model is trained, F_θ and Φ have the same eigenfunctions (i.e., F_θ(y∗, m∗) = y∗).

The optimization strategy employed by the F_θ model is directly determined by the choice of f_i. To explore what this might look like for the elastic problem, consider an L2 loss function and plug in the elastic L-S formula (Equation (11)):

    Φ(y; m^h) = y^R − Σ_h Γ^h ∗ (m^h y)    (23)
    L(y, Φ(y; m^h)) ≡ ½ (y − Φ(y; m^h))² = ½ (y − y^R + Σ_h Γ^h ∗ (m^h y))²    (24)

The original fixed-point approach of Moulinec [7] corresponds to the choice

    f_i^M(y_{i−1}, m^h) = −Σ_h Γ^h ∗ (m^h y_{i−1})  ∀ i    (25)

As an alternative, we can obtain a “steepest descent” formula by taking the gradient [34] of Equation (24):

    f_i^G(y_{i−1}, m^h) = y_{i−1} − y^R − γ_i ∂/∂y_{i−1} L(y_{i−1}, Φ(y_{i−1}; m^h))    (26)

or

    f_i^G(y_{i−1}, m^h) = y_{i−1} − y^R − γ_i (I + ∂/∂y_{i−1} Σ_h Γ^h ∗ (m^h y_{i−1})) (y_{i−1} − y^R + Σ_h Γ^h ∗ (m^h y_{i−1}))    (27)

where γ_i denotes the step size at iteration i and I represents the identity operator.
By flattening everythinginto vectors and representing the convolution with an equivalent Toeplitz matrix [52], one can convert the7 e f F θ ( y R , m ) : y = y R f o r i = 1 t o N : y i = y R + f i ( y i − , m )r e t u r n y N (a) Pseudocode for optimization procedure F ? my f Legend: y R : reference solution y i : iterates ( i-k) through ( i) (last k candidate solutions) ? : data combination? : element-wise sum y R my f N y N ... (b) Visualization of data flow through optimization procedure Figure 2: Pseudocode and visualization for optimization-based solution of the L-S equation H acting on y i − . The linearity of H comesfrom the fact that the ( I + ∂∂ y i − . . . ) term is actually independent of y i − . Using a constant λ i to collectremaining terms of y R , the steepest descent rule becomes: f Gi ( y i − , m h ) = (1 − γ i )( y i − − y R ) + γ i Hy i − + λ i y R (28)Effectively, the gradient descent rule says that the new perturbation field is obtained by correcting theprevious perturbation field ( y i − − y R ) using a set of convolutional operations involving Γ h and m h , thensubtracting off a factor of y R such that the output perturbation field is zero-mean.A variety of more complicated update rules have been proposed to accelerate the solution of differentforms of the L-S equation [32]. From the examples above one sees that any update rule will involve variousterms of Γ h , which itself contains both physical properties ( C h (cid:48) ) as well as derivatives of the Green’s function( (cid:98) G ); more complicated update rules will simply require higher-order combinations. Therefore, if we hopeto learn f i , it must be parametrized in a way such that it can capture global convolutional stencils, as wellas various derivatives thereof. Finally, we note that although most analytical approaches employ the sameoperator for each iteration, the F θ formulation permits varying f i across iterations. Most analytical minimization procedures are (by design) problem-agnostic; this means that they can beexpected to work reasonably well for many problems, but may not be optimal for the problem at hand.Rather than using a predefined optimization strategy, we formulate each f i as a CNN that learns a proximaloperator mapping a given microstructure and candidate solution field to an improved perturbation field.The central motivation behind using a CNN proximal operator is that given sufficient parameterization, itcan emulate almost any optimization strategy; furthermore, that strategy will be customized to the problemat hand during training. Of course, this comes with the immense caveat that, absent any advances intheoretical ML, a learned optimizer will not have any provable convergence guarantees. The means thateven as N → ∞ , our learned model may not converge to the true solution field for a given microstructure.In practice, however, the model can be tuned and trained until it consistently produces solutions withinacceptable error tolerances.We define the F θ model imbued with a CNN f i as a recurrent localization network (RLN), which performsthe following operations during both training and evaluation phases: given a microstructure m and initialguess y R , estimate the true solution field by refining it over N iterations. At iteration i , the microstructureis combined with candidate solutions from previous iterations and passed through CNN f i . Note that f i outputs a perturbation field y (cid:48) i . The reference solution y R is already known, so there is no need to learnthat. 
In order to simulate a multi-step solver [3, 53] and estimate higher-order derivative terms, f_i considers the last k solutions via multiple input channels (rather than just y_{i−1}). One could potentially achieve this property, and perhaps obtain better results, by using GRU or LSTM modules [33] which learn a “latent” state to pass between iterations; however, 3D convolutional implementations for these operations were not part of major ML libraries at the time of writing.

Specifically for elastic localization, m and y are combined via element-wise multiplication following Equation (11). To enforce computational stability, all strains are normalized by the average strain ε̄. Moreover, the output of each f_i network has its average subtracted to enforce the constraint that the perturbation strain field is always zero-mean.

Following prior work [19], we define a full RLN as using a different f_i for each iteration (although we use the same CNN structure for each), for a total of N distinct networks. The idea behind this is to allow the network to capture different properties at each iteration, akin to terms in a series expansion. Having significantly more tunable parameters, this configuration provides the most expressive model. By allowing different operators to be employed at different iterations, this approach also deviates the most from standard analytical optimization procedures.

Alternatively, one may wish to reuse the same intermediate network across iterations (f_i = f ∀ i). This approach is denoted as RLN-t since the weights are tied between iterations. This means that the same operator will be employed at each iteration; however, since a time series is fed into each f_i (rather than a single data point), the RLN-t is still able to learn higher-order derivatives. It has the primary advantage of simplicity and efficiency, since it uses a factor of N fewer parameters than the full RLN.

Finally, we test the importance of the iterative and recurrent nature by considering a single f_i network, i.e., choosing N = 1. We call this a feed-forward localization network (FLN) and use it as a control to quantify the benefits of iteration vs. network design. Although the RLN-t and the FLN have the same number of parameters, the RLN-t uses each parameter N times, effectively simulating a deeper network.
For N > 1, the proximal CNNs are all calibrated simultaneously via backpropagation. During training a batch of true structure-strain pairs is fed through F_θ in its entirety, and all proximal networks are updated simultaneously to minimize a calibration loss function L^(cal) (different from the solution field loss L above). Rather than only considering the loss of the last iterate, we use a weighted sum of the losses of individual iterates: L^(cal) = Σ_i w_i L_i^(cal) for some weights w_i. The goal is to encourage the network to progress between iterations, while also finding the best possible solution. Following the analysis of Andrychowicz et al. [49] we interpret the use of L^(cal) as a variant of Backpropagation Through Time [54]. The choice of weights w_i could theoretically act as a form of stabilization or even regularization. By requiring that each iteration output a reasonable candidate solution, each proximal operator is constrained to behave somewhat physically, which might help prevent overfitting. However, this means that the network is encouraged to make larger changes in early iterations, potentially reducing its final-iterate accuracy. Conversely, if only the final result is important, then intermediate iterations could explore the loss curve more. However, only updating based on the last iteration will slow the model's training, and possibly increase the chances of the network weights converging to a poor local minimum. Clearly further experiments are required to explore these hypotheses.

Following similar works [33] we chose the uniform weighting w_i = 1 ∀ i. This induces the network to make larger corrections in early iterations (to avoid carrying costly errors through several iterations) and relatively smaller corrections in later iterations. However, our numerical experiments (Section 4) indicate that perhaps a different weighting might help the network capture fine-scale microstructure features. The appropriate number of iterations depends on the specific problem; for elasticity N = 5 proved sufficient in both the RLN and the RLN-t, and more iterations yielded little benefit. For the number of previous iterations to track, the value k = 2 was chosen. The above choices of hyperparameters were largely heuristic and made for simplicity, but they worked well for elasticity; for a more intensive problem a cross-validation procedure would be a better method for their selection.

Various network topologies for f_i [19, 33, 55] were tested with one goal in mind: use convolutional kernels to effectively capture local interactions. The architecture that proved most successful for elasticity was based roughly on a V-Net [55] and is presented in Figure 3. By combining information across length scales through the use of up- and down-sampling operations, this architecture is able to encode and combine both fine- and coarse-grained features. Notably, even a simple sequence of 3 convolutional layers (similar to that of Adler [19]) worked reasonably well. As with the hyperparameters above, a cross-validation procedure could aid in picking a superior network topology for a given problem; for this work, we intentionally avoided hyperparameter optimization to emphasize the mathematical analysis and iterative approach.

For this study, most convolutional kernels are 3×3×3, with 1×1×1 kernels used for channel reduction and 2×2×2 for up/down-sampling, summing to ≈ … parameters for each f_i. A parametric rectified linear unit (PReLU) activation function [56] was used after the 3×3×3 convolutions.
This activation is similar to the regular ReLU, but has a non-zero slope α for negative inputs; α is trained with all the other network weights during backpropagation. In our experiments this improved the model's stability during training and accuracy during testing. Effectively, it allows the network to be much more expressive with only a few extra parameters – changing one α rescales the entire output of a channel, rather than adjusting each of the 3×3×3 kernel weights individually.
Figure 3: Architecture for the internal f_i networks. Legend: Conv M×M×M @X denotes a convolution with M×M×M kernels and X output channels; /Y denotes downsampling by a factor of Y; *Z denotes (transposed-convolution) upsampling by a factor of Z; PReLU denotes a parametric ReLU. The network combines Conv 3×3×3 @16 + PReLU layers, a Conv 2×2×2 @32 /2 downsampling path with three Conv 3×3×3 @32 + PReLU layers and a ConvT 2×2×2 @32 *2 upsampling, further Conv 3×3×3 @16 + PReLU layers, and a final Conv 1×1×1 @1 output layer.
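As a concrete reference point, the snippet below sketches the simple three-convolution variant of f_i mentioned above (the V-Net-style network of Figure 3 additionally includes the down/up-sampling path). It is our own minimal rendering, not the authors' released code, and the channel width is illustrative. Note the per-channel PReLU, whose single α per channel rescales that channel's entire output.

    import torch.nn as nn

    def make_simple_proximal(in_channels, width=16):
        # A minimal f_i: 3x3x3 convolutions with per-channel PReLU activations,
        # followed by a 1x1x1 convolution reducing to a single perturbation field.
        return nn.Sequential(
            nn.Conv3d(in_channels, width, kernel_size=3, padding=1),
            nn.PReLU(num_parameters=width),
            nn.Conv3d(width, width, kernel_size=3, padding=1),
            nn.PReLU(num_parameters=width),
            nn.Conv3d(width, 1, kernel_size=1),
        )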
4. Linear Elasticity Experiments
We now present results from the application of RLN-type models to the elastic localization problem in two-phase composites. A synthetic dataset of 20,480 31 × 31 × 31 microstructure/strain field pairs was generated and randomly split 40% / 20% / 40% into disjoint train/validation/test sets, respectively. The train set was used to calibrate F_θ, the validation set served to measure its quality while training, and the test set was used to compute error metrics.

The microstructures were generated via PyMKS [25] by creating a uniform random field, applying a set of Gaussian filters, and thresholding the result. In this microstructure generation process, the filter's width in each direction controls the characteristic size and shape of the material grains, and the threshold controls the relative volume fraction of each phase.

Combinations of these four parameters were generated via a Latin hypercube sampling procedure to generate 4096 design points for microstructure generation. For each design point, 5 random microstructures were generated in order to produce statistically similar inputs, so that the datasets had some statistical redundancy. A random sample of 8 training microstructures is displayed in Figure 4. By varying these design parameters, one can cover a vast subspace of all possible microstructures, although we note that the space of all microstructures is intractably large – ignoring circular symmetries there are O(2^S) possible different microstructures with S voxels. By selecting our training and testing sets using the same procedure, we demonstrate the RLN's ability to interpolate between “familiar” microstructure samples, but not to extrapolate to microstructures which lie outside the closure of the training set. It is important to note that these microstructures do not necessarily correspond to thermodynamically favorable structures (i.e., structures that might appear in nature). However, this is a limitation only in data and not methodology – by using a fully-convolutional structure [57] the RLN can handle any voxelized input, including experimental data.

Figure 4: Random sample of 8 training microstructures sliced along the (y, z) axis. Yellow is the high-stiffness phase, purple is the low-stiffness phase.

The elastic strain fields in each microstructure were obtained from previously-built finite element models [22]. One major simplification is that even though the imposed 3D strain field is actually a second-order tensor (and the elastic stiffness field a fourth-order tensor), the linear nature of the problem allows us to solve for each component individually and superimpose their solutions as needed [35]. As such we apply an overall displacement to a material RVE along only the x-axis and we focus solely on the ε_xx term. For these simulations, a total strain of 0.1% was applied via periodic boundary conditions.

The relevant material parameters to be prescribed are thus the elastic moduli E_1, E_2 and Poisson ratios ν_1, ν_2 of the two phases. To roughly match most common metals, we chose ν_1 = ν_2 = 0.3.
With these selections, the contrast ratio CR ≡ E_1/E_2 has the most dominant role on the final strain field. With the choice E_1 ≥ E_2, one observes that CR ≥ 1.
In general, as the contrast in stiffnesses increases, the problem becomes harder to solve with both iterative and data-driven methods [32]. Following prior work [22], we tested the RLN on contrast ratios of 10 and 50.

Each RLN model was implemented in PyTorch [58] and trained independently for 60 epochs on an NVIDIA V100 GPU [59], which took approximately 8 hours. The RLN models were all calibrated using the Adam optimizer, a Mean Square Error loss, and a cosine annealing learning rate decay [60]. The last choice proved especially important since the models demonstrated great sensitivity to learning rate; introducing the cosine decay helped the model converge smoothly and quickly. After training, the epoch with the lowest validation loss was chosen as the “optimal” one. In reality, the training and validation losses were tightly coupled, as demonstrated in Figure 5. Note that training losses are aggregated during the epoch, whereas validation losses are computed after the epoch is complete, which causes the validation loss to occasionally appear lower.
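A minimal training loop matching these choices (Adam, MSE loss, cosine-annealed learning rate over 60 epochs, and the uniform iterate weights w_i = 1 from Section 3) might look as follows. This is a sketch under stated assumptions: RLN and make_simple_proximal come from our earlier sketches (with in_channels = k × number of phases), and train_loader is an assumed DataLoader yielding structure/reference/strain triplets.

    import torch

    model = RLN(lambda: make_simple_proximal(in_channels=4), n_iter=5, k=2)
    optimizer = torch.optim.Adam(model.parameters())
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=60)
    mse = torch.nn.MSELoss()

    for epoch in range(60):
        for m, y_R, y_true in train_loader:
            optimizer.zero_grad()
            iterates = model(m, y_R)
            # L_cal = sum_i w_i * L_i with uniform weights w_i = 1
            loss = sum(mse(y_i, y_true) for y_i in iterates)
            loss.backward()
            optimizer.step()
        scheduler.step()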
Following Yang et al. [22], the accuracy of the model was evaluated for each instance using the mean absolute strain error (MASE) over all S voxels:

    MASE(y_pred, y_true) = (100 / S) Σ_{i=1}^{S} | (y_pred[i] − y_true[i]) / y^R |.    (29)

Figure 6 presents the MASE distribution across each microstructure/strain pair in the test set. Note that the MASE is an aggregate measure of RVE-wide error; the pointwise error variation within a microstructure is explored below. The mean and standard deviation of the MASE distribution are collected in Table 1 for the RLN-type models.
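Equation (29) translates directly into a few lines of NumPy (our sketch; here y_R is the constant reference strain used for normalization):

    import numpy as np

    def mase(y_pred, y_true, y_R):
        # mean absolute strain error as a percentage of the reference strain (Eq. 29)
        return 100.0 * np.mean(np.abs((y_pred - y_true) / y_R))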
Figure 5: Learning curves (MSE loss vs. epoch) for the FLN, RLN-t, and full RLN, showing training and validation calibration losses for both CR = 10 and CR = 50.
Model                       MASE (mean ± std. dev.)
Contrast-10
  Comparison DL model [22]  3.07% ± …
  FLN                       … ± …
  RLN-t                     … ± …
  RLN                       … ± …
Contrast-50
  Comparison DL model [22]  5.71% ± …
  FLN                       … ± …
  RLN-t                     … ± …
  RLN                       … ± …

Table 1: Test-set MASE metrics for RLN and control models, for both CR = 10 and CR = 50.

For comparison we also present results from a recent study [22] using a feed-forward deep learning (DL) model to predict the strain fields; note that the DL model was trained and tested on its own dataset prior to this effort.

It is important to note that the DL model had a fundamentally different architecture: it was designed to work on 21³ voxel structures (whereas ours was tested on 31³), and it predicted the strain one voxel at a time (whereas ours predicts strain across the entire microstructure simultaneously). The dataset for the DL model employed large amounts of data augmentation using circular permutations, and contained significantly more input-output pairs. Finally, the DL model employed linear layers (to collapse to a single output); the size of these layers implies that the DL model used substantially more network weights than the RLN. As a result, it is difficult to compare the DL dataset and results with ours. We emphasize that the DL model represents a strong comparison model for ML elastic localization, but that objective ranking of relative performance is difficult. Nevertheless, the RLN architecture is able to produce significantly more accurate strain field estimates on the RLN dataset than the DL architecture produces on the DL dataset. Keeping these caveats in mind, the following analysis explores the difference between RLN configurations.

Looking at the aggregate statistics, the FLN performs worst of all models analyzed; it is vastly outperformed by the RLN-t even though they have the same number of parameters. This is also reflected in the learning curve in Figure 5: the RLN-t trained faster than the FLN and converged to a more accurate model simply by applying the proximal operator repeatedly across multiple iterations. Intriguingly, the FLN results are somewhat worse than the DL results. This implies that our simple network topology and relative lack of hyperparameter tuning produced a less powerful model than the DL. However, the RLN-t did much better, producing less error on average (and significantly less variance) than the DL control for both contrast ratios.

Figure 6: MASE distribution for RLN, CR = 50.
Figure 7: Slices of RLN predicted strain, true (FEA) strain, and ASE for worst test-set instance, CR = 50.

The disparity in performance indicates that dataset and network topology alone cannot explain the improved MASE relative to the DL model – the iterative methodology produces a more powerful model. The full RLN is the most accurate model, producing roughly half as much error and variability as the DL model for both contrast ratios. The improvement over the RLN-t has a number of possible origins: the RLN uses a factor of N more parameters, and in turn it uses a different operator at each iteration. We note that after training, the full RLN increases the memory overhead but not the prediction time: up to GPU memory details, each iteration has the same computational costs regardless of weight tying.
Excluding data I/O overhead, the RLN-t and the full RLN take only ≈ 87 seconds to predict strain fields for the entire testing set (8,192 microstructures), or roughly 11 milliseconds per microstructure (cf. several seconds per microstructure for the finite element solver). To examine the iterative refinement directly, we plot how y_i evolves across iterations, as well as its difference ∆_i ≡ y_i − y_{i−1}. This is presented in Figure 8 for the FLN, Figure 9 for the RLN-t, and Figure 10 for the full RLN.
Figure 8: FLN predictions and differences between iterations for selected test-set instance, CR = 50 y y y y y y N o r m a li z e d S t r a i n RLN-t Evolution
Figure 9: RLN-t predictions (normalized strain) and differences between iterations for a selected test-set instance, CR = 50.

The greyscale coloring is the same scale as Figure 7 and has its maximum at the true strain response's maximum. The redscale coloring represents a negative strain update (y decreasing between iterations). The FLN appears to handle low-strain areas fairly well, but misses the strain peaks in the center. Note that it does not apply the same operation as the first iteration of the RLN-t; the FLN is trained to make the best possible answer within a single jump. In comparison, the RLN-t does a better job of capturing strain peaks, likely since it can build them up over several iterations. Less obvious is that it fails to capture the central strain peak as well as the full RLN does – the differences between the two are evidently rather fine-scale, so we focus our analysis on the full RLN.

After the first iteration the RLN has picked up most of the relative troughs and peaks, and finer tweaks are handled by the later iterations. For example, most of the high-strain regions (greyscale areas) appear to be captured in the first two iterations, whereas low-strain regions (redscale areas) continue to be refined in later iterations. This is due to the RLN's ability to learn different, and nonlinear, operators at each iteration – the MSE loss function tends to magnify high-magnitude errors, even if they are very localized (i.e., strain peaks and troughs). The model is therefore encouraged to estimate these outlier areas first (corresponding to the early ∆_i having much more magnitude). Of course, this puts a lot of weight on the first proximal network to capture the strain outliers correctly. One possible solution would be to adjust the loss function weighting so that early iterations are encouraged to converge gradually towards a good solution, rather than quickly towards a decent one.
Figure 10: RLN predictions and differences between iterations for selected test-set instance, CR = 50 early iterations are encouraged to converge gradually towards a good solution, rather than quickly towardsa decent one. In other words, by allowing earlier iterations to mispredict strain peaks somewhat, the modelmay be more likely to escape local minima and actually obtain higher final-iteration accuracy.This approach differs from traditional finite-element based approaches in that it seeks a solution bylearning Green’s functions (and derivatives thereof) of the governing equation, rather than solving thegoverning equation numerically. This provides an advantage in prediction speed by requiring a rather costly(but one-time) training overhead. The computational complexity of convolving a 3D field containing S voxels with a 3D stencil containing k voxels is O ( Sk ) (since each voxel in the first field must be multipliedby each voxel in the stencil). For a fixed network topology, k is a constant; therefore the prediction runtimewill increase linearly with the number of microstructure voxels. A spectral method using the Fast FourierTransform will cost at least O ( S log S ), and any numerical methods (finite element or otherwise) employinglinear solvers will likely be even more expensive. We reiterate that using GPU parallelization, the RLNrequires on average 11 milliseconds to predict the strain field for a given microstructure, compared to severalseconds for the finite element solver. This makes it very valuable for inverse problems, where the localizationproblem must be solved thousands or even millions of times in order to solve a higher-level metaproblem[15]. Once trained for a specific governing equation (e.g. linear elasticity) and set of material properties(e.g., contrast ratio), the RLN methodology can be applied to any voxelized microstructure. Although weonly tested a 31 structure, in principle a model could be trained on one structure size and used on another(possibly with reduced accuracy); this is the subject of ongoing work. Note that the model must be trainedanew to predict strains for a different contrast ratio.
5. Conclusions
In this paper, we describe a new learning-based methodology for addressing Lippmann-Schwinger type physics problems. Embedding recurrent CNNs into a learned optimization procedure provides for a flexible, but interpretable, ML model. The design of proximal networks is informed by problem-specific domain knowledge; that knowledge also provides a physical interpretation of what the model is learning. Furthermore, the partitioned and convolutional structure of this approach acts as a regularizer by enforcing underlying physical properties such as mean field values and spatial invariance. The iterative scheme allows for emulation of a deeper network, vastly increasing model robustness without increasing the parameterization space. If space allows, using a different network for each iteration further improves the model's expressiveness and accuracy.

When applied to the elasticity localization problem and using our dataset, our model produced much more accurate and interpretable results than previous deep learning models produced on similar datasets, while being much faster than analytical approaches. The CNN architecture used here was designed for simplicity, but could be improved with more advanced techniques such as inception modules [61], perceptual losses [62], Fourier layers [63], or variational layers [64]. Moreover, many hyperparameters, such as the number of iterations and the loss function weighting, can be tuned on a problem-specific basis.
6. Acknowledgements
This work was supported by NSF Graduate Research Fellowship DGE-1650044, and SK acknowledges support from NSF 2027105. We used the Hive cluster supported by NSF 1828187 and managed by PACE at Georgia Institute of Technology, USA. The authors thank Anne Hanna, Andreas Robertson, Nic Olsen, and Mady Veith for insightful discussions that helped shape this work.
7. Data and Software Availability
The raw and processed data required to reproduce these findings are available to download from [dataset] [65]. The implementation and training code for each RLN configuration, as well as trained models, are available under an open source license on GitHub (see Ref. [66]).

References

[1] S. Kalidindi, Hierarchical Materials Informatics: Novel Analytics for Materials Data, Butterworth-Heinemann, Boston, 2015. doi:10.1016/B978-0-12-410394-8.00001-1.
[2] NIST, Materials Genome Initiative Strategic Plan, 2014.
[3] K. W. Morton, D. F. Mayers, Numerical Solution of Partial Differential Equations: An Introduction, 2nd Edition, Cambridge University Press, 2005. doi:10.1017/CBO9780511812248.
[4] F. Roters, P. Eisenlohr, L. Hantcherli, D. Tjahjanto, T. Bieler, D. Raabe, Overview of constitutive laws, kinematics, homogenization and multiscale methods in crystal plasticity finite-element modeling: Theory, experiments, applications, Acta Materialia 58 (4) (2010) 1152–1211. doi:10.1016/j.actamat.2009.10.058.
[5] M. Cheng, J. Warren, Controlling the accuracy of unconditionally stable algorithms in the Cahn-Hilliard equation, Physical Review E 75 (2007) 017702. doi:10.1103/PhysRevE.75.017702.
[6] D. McDowell, J. Panchal, H.-J. Choi, C. Seepersad, J. Allen, F. Mistree, Integrated Design of Multiscale, Multifunctional Materials and Products, 2009. doi:10.1016/C2009-0-20058-4.
[7] H. Moulinec, P. Suquet, A numerical method for computing the overall response of nonlinear composites with complex microstructure, Computer Methods in Applied Mechanics and Engineering 157 (1) (1998) 69–94. doi:10.1016/S0045-7825(97)00218-1.
[8] T. de Geus, J. Vondřejc, J. Zeman, R. Peerlings, M. Geers, Finite strain FFT-based non-linear solvers made simple, Computer Methods in Applied Mechanics and Engineering 318 (2017) 412–430. doi:10.1016/j.cma.2016.12.032.
[9] J. C. Michel, H. Moulinec, P. Suquet, A computational scheme for linear and non-linear composites with arbitrary phase contrast, International Journal for Numerical Methods in Engineering 52 (1-2) (2001) 139–160. doi:10.1002/nme.275.
[10] R. Snieder, Inverse problems in geophysics, in: Signal Recovery and Synthesis, Optical Society of America, 2001, p. SMA2. doi:10.1364/SRS.2001.SMA2.
[11] X. Wu, G. Proust, M. Knezevic, S. Kalidindi, Elastic–plastic property closures for hexagonal close-packed polycrystalline metals using first-order bounding theories, Acta Materialia 55 (8) (2007) 2729–2737. doi:10.1016/j.actamat.2006.12.010.
[12] A. Jain, J. Bollinger, T. Truskett, Inverse methods for material design, AIChE Journal 60 (2014). doi:10.1002/aic.14491.
[13] M. Parno, T. Moselhy, Y. Marzouk, A multiscale strategy for Bayesian inference using transport maps, SIAM/ASA Journal on Uncertainty Quantification 4 (1) (2016) 1160–1190. doi:10.1137/15m1032478.
[14] M. Horstemeyer, Multiscale Modeling: A Review, 2009, pp. 87–135. doi:10.1007/978-90-481-2687-3_4.
[15] C.-T. Chen, G. X. Gu, Generative deep neural networks for inverse materials design using backpropagation and active learning, Advanced Science 7 (5) (2020) 1902607. doi:10.1002/advs.201902607.
[16] G. Carleo, I. Cirac, K. Cranmer, L. Daudet, M. Schuld, N. Tishby, L. Vogt-Maranto, L. Zdeborová, Machine learning and the physical sciences, Rev. Mod. Phys. 91 (2019) 045002. doi:10.1103/RevModPhys.91.045002.
[17] J. Schmidhuber, Deep learning in neural networks: An overview, Neural Networks 61 (2015) 85–117. doi:10.1016/j.neunet.2014.09.003.
[18] A. Krizhevsky, I. Sutskever, G. Hinton, ImageNet classification with deep convolutional neural networks, Neural Information Processing Systems 25 (2012). doi:10.1145/3065386.
[19] J. Adler, O. Oktem, Learned primal-dual reconstruction, IEEE Transactions on Medical Imaging 37 (6) (2018) 1322–1332. doi:10.1109/tmi.2018.2799231.
[20] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, CoRR abs/1512.03385 (2015). arXiv:1512.03385.
[21] Z. Yang, Y. C. Yabansu, R. Al-Bahrani, W.-k. Liao, A. N. Choudhary, S. R. Kalidindi, A. Agrawal, Deep learning approaches for mining structure-property linkages in high contrast composites from simulation datasets, Computational Materials Science 151 (2018) 278–287. doi:10.1016/j.commatsci.2018.05.014.
[22] Z. Yang, Y. C. Yabansu, D. Jha, W.-k. Liao, A. N. Choudhary, S. R. Kalidindi, A. Agrawal, Establishing structure-property localization linkages for elastic deformation of three-dimensional high contrast composites using deep learning approaches, Acta Materialia 166 (2019) 335–345. doi:10.1016/j.actamat.2018.12.045.
[23] J. Schmidt, M. R. G. Marques, S. Botti, M. A. L. Marques, Recent advances and applications of machine learning in solid-state materials science, npj Computational Materials 5 (2019) 1–36.
[24] V. Buhrmester, D. Münch, M. Arens, Analysis of explainers of black box deep neural networks for computer vision: A survey (2019). arXiv:1911.12116.
[25] D. Brough, D. Wheeler, S. Kalidindi, Materials knowledge systems in Python - a data science framework for accelerated development of hierarchical materials, Integrating Materials and Manufacturing Innovation 6 (2016) 1–18. doi:10.1007/s40192-017-0089-0.
[26] M. I. Latypov, L. S. Toth, S. R. Kalidindi, Materials knowledge system for nonlinear composites, Computer Methods in Applied Mechanics and Engineering 346 (2018) 180–196.
[27] S. R. Kalidindi, A Bayesian framework for materials knowledge systems, MRS Communications 9 (2) (2019) 518–531. doi:10.1557/mrc.2019.56.
[28] S. Kalidindi, A. Khosravani, B. Yucel, A. Shanker, A. Blekh, Data infrastructure elements in support of accelerated materials innovation: ELA, PyMKS, and MATIN, Integrating Materials and Manufacturing Innovation (2019) 1–14. doi:10.1007/s40192-019-00156-1.
[29] B. A. Lippmann, J. Schwinger, Variational principles for scattering processes. I, Phys. Rev. 79 (1950) 469–480. doi:10.1103/PhysRev.79.469.
[30] D. B. Brough, D. Wheeler, J. A. Warren, S. R. Kalidindi, Microstructure-based knowledge systems for capturing process-structure evolution linkages, Current Opinion in Solid State and Materials Science 21 (3) (2017) 129–140. doi:10.1016/j.cossms.2016.05.002.
[31] M. W. Priddy, N. H. Paulson, S. R. Kalidindi, D. L. McDowell, Strategies for rapid parametric assessment of microstructure-sensitive fatigue for HCP polycrystals, International Journal of Fatigue 104 (2017) 231–242. doi:10.1016/j.ijfatigue.2017.07.015.
[32] R. A. Lebensohn, A. D. Rollett, Spectral methods for full-field micromechanical modelling of polycrystalline materials, Computational Materials Science 173 (2020) 109336. doi:10.1016/j.commatsci.2019.109336.
[33] P. Putzky, M. Welling, Recurrent inference machines for solving inverse problems (2017). arXiv:1706.04008.
[34] M. Stein, Gradient methods in the solution of systems of linear equations, Journal of Research of the National Bureau of Standards 48 (1952) 407.
[35] G. Landi, S. R. Niezgoda, S. R. Kalidindi, Multi-scale modeling of elastic response of three-dimensional voxel-based microstructure datasets using novel DFT-based knowledge systems, Acta Materialia 58 (7) (2010) 2716–2725. doi:10.1016/j.actamat.2010.01.007.
[36] E. Kröner, Statistical Continuum Mechanics: Course Held ... October 1971, Springer, 1972.
[37] G. Green, An essay on the application of mathematical analysis to the theories of electricity and magnetism (1828). arXiv:0807.0088.
[38] T. Eisler, An Introduction to Green's Functions, 1969.
[39] Wikipedia, Green's function, https://en.wikipedia.org/w/index.php?title=Green%27s_function&oldid=966577686 (Jul 2020).
[40] R. A. Lebensohn, J. P. Escobedo, E. K. Cerreta, D. Dennis-Koller, C. A. Bronkhorst, J. F. Bingert, Modeling void growth in polycrystalline materials, Acta Materialia 61 (18) (2013) 6918–6932. doi:10.1016/j.actamat.2013.08.004.
[41] W. S. McCulloch, W. Pitts, A Logical Calculus of the Ideas Immanent in Nervous Activity, MIT Press, Cambridge, MA, USA, 1988, pp. 15–27.
[42] F. Rosenblatt, The Perceptron, a Perceiving and Recognizing Automaton (Project Para), Report: Cornell Aeronautical Laboratory, 1957.
[43] G. Cybenko, Approximation by superpositions of a sigmoidal function, Mathematics of Control, Signals and Systems 2 (1989) 303–314.
[44] A. Pinkus, Approximation theory of the MLP model in neural networks, Acta Numerica 8 (1999) 143–195.
[45] Y. Lecun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (11) (1998) 2278–2324.
[46] Z. C. Lipton, J. Berkowitz, C. Elkan, A critical review of recurrent neural networks for sequence learning (2015). arXiv:1506.00019.
[47] F. Emmert-Streib, Z. Yang, H. Feng, S. Tripathi, M. Dehmer, An introductory review of deep learning for prediction models with big data, Frontiers in Artificial Intelligence (2020).
[48] V. K. Ojha, A. Abraham, V. Snásel, Metaheuristic design of feedforward neural networks: A review of two decades of research, CoRR abs/1705.05584 (2017). arXiv:1705.05584.
[49] M. Andrychowicz, M. Denil, S. Gomez, M. W. Hoffman, D. Pfau, T. Schaul, B. Shillingford, N. de Freitas, Learning to learn by gradient descent by gradient descent (2016). arXiv:1606.04474.
[50] F. N. Iandola, M. W. Moskewicz, K. Ashraf, S. Han, W. J. Dally, K. Keutzer, SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size (2016). arXiv:1602.07360.
[51] D. E. Rumelhart, G. E. Hinton, R. J. Williams, Learning Representations by Back-Propagating Errors, MIT Press, Cambridge, MA, USA, 1988, pp. 696–699.
[52] R. M. Gray, Toeplitz and Circulant Matrices: A Review, 2006. doi:10.1561/0100000006.
[53] M. Z. Ullah, S. Serra-Capizzano, F. Ahmad, An efficient multi-step iterative method for computing the numerical solution of systems of nonlinear equations associated with ODEs, Applied Mathematics and Computation 250 (2015) 249–259. doi:10.1016/j.amc.2014.10.103.
[54] M. Mozer, A focused backpropagation algorithm for temporal pattern recognition, Complex Systems 3 (1989).
[55] F. Milletari, N. Navab, S.-A. Ahmadi, V-Net: Fully convolutional neural networks for volumetric medical image segmentation (2016). arXiv:1606.04797.
[56] K. He, X. Zhang, S. Ren, J. Sun, Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification (2015). arXiv:1502.01852.
[57] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation (2015). arXiv:1411.4038.
[58] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, S. Chintala, PyTorch: An imperative style, high-performance deep learning library, in: H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, R. Garnett (Eds.), Advances in Neural Information Processing Systems 32, Curran Associates, Inc., 2019, pp. 8024–8035. URL http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf
[59] PACE, Partnership for an Advanced Computing Environment (PACE) (2017).
[60] I. Loshchilov, F. Hutter, SGDR: Stochastic gradient descent with warm restarts (2017). arXiv:1608.03983.
[61] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. E. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, Going deeper with convolutions, CoRR abs/1409.4842 (2014). arXiv:1409.4842.
[62] J. Johnson, A. Alahi, F. Li, Perceptual losses for real-time style transfer and super-resolution, CoRR abs/1603.08155 (2016). arXiv:1603.08155.
[63] Z. Li, N. Kovachki, K. Azizzadenesheli, B. Liu, K. Bhattacharya, A. Stuart, A. Anandkumar, Fourier neural operator for parametric partial differential equations (2020). arXiv:2010.08895.
[64] K. Shridhar, F. Laumann, M. Liwicki, A comprehensive guide to Bayesian convolutional neural network with variational inference (2019). arXiv:1901.02731.
[65] C. Kelly, Elastic localization data (2020).
[66] C. Kelly, RLN elasticity, https://github.com/conlain-k/RLN_elasticity (2020).