A physics-informed operator regression framework for extracting data-driven continuum models
Ravi G. Patel a,∗, Nathaniel A. Trask a, Mitchell A. Wood b, Eric C. Cyr a

a Sandia National Laboratories, Computational Mathematics Department
b Sandia National Laboratories, Computational Multiscale Department
∗ Corresponding author. Email addresses: [email protected] (Ravi G. Patel), [email protected] (Nathaniel A. Trask), [email protected] (Mitchell A. Wood), [email protected] (Eric C. Cyr)
Abstract
The application of deep learning toward discovery of data-driven models requires careful application of inductive biases to obtain a description of physics which is both accurate and robust. We present here a framework for discovering continuum models from high fidelity molecular simulation data. Our approach applies a neural network parameterization of governing physics in modal space, allowing a characterization of differential operators while providing structure which may be used to impose biases related to symmetry, isotropy, and conservation form. We demonstrate the effectiveness of our framework for a variety of physics, including local and nonlocal diffusion processes and single and multiphase flows. For the flow physics we demonstrate this approach leads to a learned operator that generalizes to system characteristics not included in the training sets, such as variable particle sizes, densities, and concentration.
Keywords: physics-informed machine learning, operator regression, spectral methods, continuum scale modeling
1. Introduction
Data-driven physics models have recently emerged in response to the surge of available science and engineering data [1, 2, 3]. Traditionally, equations governing the dynamics of a given system have been derived by arguing from first principles, typically appealing to conservation laws and using either empirical arguments or statistical mechanics/homogenization/multiscale approaches to derive closures and constitutive relations [4, 5]. Such approaches work well when simplifying assumptions can be made, representative of near-equilibrium settings [6], simplified geometries [7], or small numbers of dilute species. Recently, however, applications require complex models capturing multiscale, multiphysics, and nonequilibrium processes, mandating increasingly technical frameworks for prescribing dynamics, such as the Mori-Zwanzig or GENERIC formalisms [8, 6, 9]. In this context, data-driven models have gained traction as a means to use the ubiquity of data to prescribe models, augmenting the traditional role of first-principles derivations.

Given the recent advances in machine learning (ML), there has been substantial interest in determining whether techniques from deep learning [10, 11] may be used to prescribe data-driven models, providing machine-learned surrogates and closures to augment traditional modeling and simulation approaches [12]. In ML, inductive biases describe the priors and biases imposed by a learning strategy, typically exploiting domain expertise to alleviate the required size of training data [13]. In scientific machine learning (SciML) contexts, this is often referred to as physics-informed ML, and inductive biases manifest as choices of model form that enforce preservation of physical invariants or mathematical structure [14, 15, 16]. The challenge of SciML is to carefully balance these inductive biases, imposing an appropriate model form to extract robust and physically realizable system dynamics without losing descriptiveness.

In the data-driven modeling literature, approaches assuming known dynamics with unknown material parameters or constitutive relations are differentiated from those assuming unknown dynamics. Mature techniques exist for the former, where one obtains an inverse problem for parameter estimation that may be approached with tools such as PDE-constrained optimization [17, 18, 19] or Bayesian inversion [20]. In the latter, it is an open question how to assume a model form which will guarantee that extracted dynamics are numerically well-posed. Many techniques exist toward this end. The spectrum of choices can be characterized on one end by "black box" ML techniques, relying on the universal approximation property of neural networks to impart no explicit bias on extracted dynamics [21], and dictionary-based techniques on the other, which restrict dynamics to a sparse selection of candidate models specified a priori [22, 23, 24, 1, 2, 25]. While the former is potentially the most descriptive, the latter provides more structure to explicitly impose physical biases. In this work, we extend the framework in [26] for parameterizing dynamics that lies in between, allowing imposition of physical biases without imposing an explicit model form a priori via a dictionary.

In this work we adopt the perspective of modal/nodal description of operators that is ubiquitous in the spectral/spectral element discretization of PDEs [27, 28, 29]. This perspective is not particularly new in the ML literature; see e.g. [26, 30, 31] for examples where this modal perspective provides an effective means of parameterizing differential operators without the need for dictionaries. In this work, we focus on developing specializations of the architectures in [26, 30, 31] which allow the introduction of physical biases. We will assume that dynamics may be described by two neural networks: one parameterizing a pseudo-differential operator in modal space composed with another network parameterizing a point-wise nonlinear operator in physical space. While applicable to a general choice of spectral basis (see for example [32, 33, 30, 34]), we focus here on the Fourier basis in rectangular domains with naturally imposed boundary conditions.
This provides access to both fast projection operators via the Fourier transform, and an effective parameterization of differential operators via the symbol of the operator in Fourier space. In contrast to the architecture considered for example in [30], this treatment of differential operators allows handling a single scalar function of the wavevector, rather than learning a high-dimensional black-box mode-to-mode mapping, and supports the imposition of biases, including conservation form and isotropy, which may be easily imparted to extract accurate and stable data-driven dynamics. We present this approach through the perspective of operator regression, in which one may learn an operator via observations of its action on a representative family of functions. We establish that this perspective may be interpreted as a reduced space PDE-constrained optimization problem [35, 36] that may be implemented entirely in popular machine learning packages such as TensorFlow and PyTorch [37, 38], allowing use of "off-the-shelf" neural network architectures. Our implementation is available at github.com/rgp62/MOR-Physics.

While numerous works have sought to learn dynamics, our focus is on the interplay of inductive biases and data. While we present many necessary components under the umbrella of operator regression (the modal/nodal perspective, learning update operators, imposing physical biases, etc.), several of these ingredients have been previously pursued within the literature. The key contribution of the current work is to carefully design the loss and architectures so that one may extract a model which is both accurate and stable for long time integration. While several works have considered learning dynamics for simple scalar problems with synthetic data, to our best knowledge there are few frameworks which can handle a general "black-box" form for the governing equations for real engineering data. We have gathered a collection of benchmark systems for which the necessity of physical biases becomes increasingly important. We initially start with synthetic data corresponding to a simple scalar PDE, which is easily handled in many frameworks. We then progress to extraction of continuum models from realistic simulations of particle/molecular dynamics. The corresponding data sets incorporate multiscale effects and are inherently noisy, and we informally note that without imposing physical biases it is difficult (if not impossible) to extract a stable and accurate solution. We additionally consider necessary extensions to handle systems of PDEs for which the choice of architecture and loss are critical to obtain models which scale reasonably with model complexity. This contribution is critical to bridging the gap between proof-of-concept calculations for synthetic scalar PDE data (e.g. [26, 30, 31]) and the model complexity necessary for large systems of interest to the engineering community consisting of multiscale, multidimensional, and multiphysics data.

We organize this work by first introducing a general abstract framework for data-driven modeling, and then use manufactured data to highlight its attractive properties in the absence of measurement noise. In Section 5 we demonstrate how anomalous diffusion processes involving nonlocal operators may be extracted from particle simulations of Levy flights.
In Section 6 we consider microchannels containing both single phase and two phase Lennard-Jones fluids, extracting multiscale continuum models for the coarse-grained multiphase system that generalize to particle densities, concentrations, and particle size ratios unseen in the training set. For all applications, we apply a subset of the biases discussed in Section 3, providing particular specializations of the general framework necessary for each application. As a whole, these exemplar multiscale problems present a series of tutorials demonstrating how the general theory and physical biases may be combined for a broad range of applications.
2. Abstract data-driven modeling framework
We present a general framework for data-driven modeling that we will later specialize toward specific problems. As input to the framework, we consider samples of field data

$$X = \{ u^i_m \}_{i=1,\ldots,I;\; m=1,\ldots,M}, \tag{1}$$

where $u^i_m$ belong to a Banach space $V$ defined on a compact domain $\Omega \subset \mathbb{R}^d$. Each index $i$ refers to a time series of functions with $I$ distinct initial conditions, while the index $m$ denotes evaluation at a time point $t_m \in \mathbb{R}^+$ with $t_0 = 0$. We further assume there exists a function $u(\mathbf{x}, t) \in V \times \mathbb{R}^+$ that satisfies dynamics of the form

$$\partial_t u = \mathcal{N} u, \qquad \mathcal{B} u = 0, \qquad u(\mathbf{x}, 0) = u^i(\mathbf{x}), \tag{2}$$

where $\mathcal{N}$ is a spatial operator, $\mathcal{B}$ is a known boundary operator, and $u^i$ are known initial conditions in $X$, as described above. We seek to determine the dynamics $\mathcal{N}$ so that $u(\mathbf{x}, t_m)$ is close to $u_m$ for all $m$ in an appropriate norm. While we consider in this work a first-order time derivative appropriate for conservation laws, the process generalizes naturally to higher-order time derivatives.

By integrating the principal component of Eq. 2 in time, we obtain the update operator $\hat{L}_{t_1 - t_0}$ prescribing the increment of the dynamical system state over a finite interval

$$u_1 - u_0 = \int_{t_0}^{t_1} \mathcal{N} u \, dt := \hat{L}_{t_1 - t_0} u_0, \tag{3}$$

where $u_i$ is the solution at $t_i$. By restricting to a uniform time mesh (for all $n$, $t_{n+1} - t_n = \Delta t$), assuming dynamics are autonomous (i.e. $\mathcal{N}$ does not depend on time), and applying one-point quadrature, we obtain for all $n$ the approximate update operator

$$u_{m+1} - u_m = L_{\Delta t}[u_{m+\theta}] \approx \hat{L}_{t_{m+1} - t_m} u_m, \tag{4}$$

where $L_{\Delta t}[u_{m+\theta}] = \Delta t \, \mathcal{N} u_{m+\theta}$. The evaluation point $u_{m+\theta}$ denotes the choice of one-point quadrature scheme; selecting $u_{m+\theta}$ as $u_m$ or $u_{m+1}$ yields regression of explicit Euler or implicit Euler dynamics, respectively. For simplicity in this work, we will consider explicit dynamics and drop the subscript, adopting $L_{\Delta t} = L$. Note that in the traditional forward modeling context, this encapsulates both explicit Euler integrators and multi-stage integrators such as Runge-Kutta; in either case only the state at the previous timestep is used to advance in time.

To regress the dynamics of Equation 2, we introduce a parameterized family of operators $L_\xi$ (here the subscript indicates the parametrization while the time step $\Delta t$ is assumed) and use the data in Equation 1 to estimate appropriate parameters $\xi$ such that

$$u_{m+1} - u_m = L_\xi u_m, \tag{5}$$

or equivalently, that data at $t_{m+1}$ may be related to the initial condition via repeated application of the update operator

$$u_{m+1} = (I + L_\xi)^{m+1} u_0, \tag{6}$$

where $I$ denotes the identity map.

To define the optimal parameterization $\xi$, we introduce the following abstract operator regression problem. Consider a Banach space $V$, with finite dimensional subspace $U$. Assume one has training data of the form $X = \{ f_n, \mathcal{L}[f_n] \}_{n=1}^{N}$, where $f_n$ are sampled from $U$ and $\mathcal{L} : U \to V$. Then, introducing a parameterized family of operators $\mathcal{L}_\xi : U \to V$, we seek a choice of $\xi$ such that $\mathcal{L}_\xi$ provides an optimal characterization of $\mathcal{L}$ in the following sense:

$$\xi^* = \operatorname{argmin}_\xi \sum_{f \in U} \| \mathcal{L}[f] - \mathcal{L}_\xi[f] \|, \tag{7}$$

where $\| \cdot \|$ denotes an appropriate norm over $V$. While we pursue this problem in the context of data-driven modeling, it may be studied in a purely mathematical context as a generalization of classical regression. In regression, one traditionally seeks a characterization of a function $f$ at input points $\mathbf{x} \in \Omega$, while in operator regression we instead seek a characterization of the operator $\mathcal{L}$ at input functions $f \in V$.

To perform data-driven regression of the dynamics in Equation 2, we specialize Equation 7 and find a choice of $\xi$ which best matches the data (Equation 1). This corresponds to selecting an appropriate space $U$ such that $X$ characterizes the range of $\mathcal{L}$. In this work we select as norm the mean-square norm evaluating the action of the operator at a collection of points $Q = \{ \mathbf{x}_q \}_{q=1}^{N_Q} \subset \Omega$:

$$\xi^* = \operatorname{argmin}_\xi \sum_{u^i \in X} \sum_{m=0}^{M} \sum_{\mathbf{x}_q \in Q} \left| u^i_m(\mathbf{x}_q) - (I + L_\xi)^m u^i_0(\mathbf{x}_q) \right|^2. \tag{8}$$
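The loss in Eq. 8 composes the parameterized update operator with itself, so any framework with automatic differentiation can optimize it directly. The following is a minimal sketch in PyTorch, under the assumption that a parameterized update module `L` (a stand-in for $L_\xi$, not the authors' implementation) and a dataset `X` of trajectories sampled at the quadrature points are available.

```python
import torch

def rollout_loss(L, X):
    """Eq. 8: sum over trajectories i and times m of |u_m^i - (I + L)^m u_0^i|^2."""
    loss = torch.tensor(0.0)
    for traj in X:                        # traj: tensor of shape (M + 1, N_Q)
        u = traj[0]                       # initial condition u_0^i at the points Q
        for m in range(1, traj.shape[0]):
            u = u + L(u)                  # one application of (I + L_xi)
            loss = loss + torch.sum((traj[m] - u) ** 2)
    return loss

# Training then reduces to standard unconstrained optimization:
# opt = torch.optim.Adam(L.parameters(), lr=1e-3)
# for step in range(num_steps):
#     opt.zero_grad(); rollout_loss(L, X).backward(); opt.step()
```

Because the constraint of Eq. 2 is folded into the repeated update, no adjoint solver is required; gradients with respect to ξ come from backpropagation through the rollout.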
To apply this framework, one must specify the following choices:

1. Choice of reconstruction space U
2. Choice of parameterization for L_ξ

We explore both choices in the following Sections 2.3 and 2.4. To motivate the choice of U, we first relate this perspective to traditional PDE-constrained optimization.

An equivalent PDE-constrained optimization formulation of operator regression is defined as the equality constrained quadratic program:

$$\operatorname{argmin}_{\mathcal{N}} \sum_{i=1}^{I} \| v^i(\mathbf{x}_m, t_m) - u^i_m \|^2$$

subject to

$$v^i(\mathbf{x}, t_n) - u^i(\mathbf{x}) = \int_0^{t_n} \mathcal{N} v^i \, dt, \qquad \mathcal{B} v^i = 0, \qquad v^i(\mathbf{x}, 0) = u^i(\mathbf{x}), \tag{9}$$

where $\mathcal{B}$ and $u^i$ are boundary and initial data. The constraint is written in integral form, as opposed to the form in Eq. 2, to match the form of the increment operator. Computing the gradient of the objective function subject to the equality constraint typically requires calculation of forward and adjoint problems. The adjoint computation is often burdensome to implement and not always available within a PDE solver, and it is therefore natural to pursue the unconstrained operator regression formulation defined in Eq. 8. As we will discuss in Section 2.4, we will pursue neural network parameterizations of operators. One convenient consequence of this choice is that the machine learning frameworks TensorFlow [37] and PyTorch [38] enable automatic computation of the derivatives necessary to handle Eq. 8.

For a certain choice of U and L, the operator regression may be interpreted as a reduced space [35, 36] approach to Equations 9. We first note that due to the assumption of uniform discretization in time, the first constraint is automatically satisfied. This follows trivially from

$$v^i(\mathbf{x}, t_n) - u^i(\mathbf{x}) = \int_0^{t_n} \mathcal{N} v^i \, dt = (I + L_\xi)^{n+1} u^i.$$

To satisfy the second constraint, one must choose the space U ⊂ V to be boundary-conforming, i.e. for any f ∈ U we have Bf = 0. From this perspective, our operator regression framework is equivalent to a PDE-constrained optimization approach which "plays well" with machine learning libraries implementing unconstrained optimizers, allowing one to explore parameterizations L_ξ that use "off-the-shelf" neural network architectures without the need to develop interfaces to PDE-constrained optimization libraries.

2.3. Choice of reconstruction space

The choice of U must enforce the boundary operator constraint (constraint two in Equation 9) while supporting a broad class of parameterized operators L_ξ. In principle any boundary-conforming basis satisfies the first requirement. In light of the second, we consider modal spaces typical of spectral methods for PDEs, such as Fourier series or orthogonal polynomials. These bases generally provide a natural description of differential operators in modal space and nonlinear operators in physical space, which we explain below.

Specifically, we seek U supporting description of differential operators via an invertible mapping Π : U → R^{dim(U)} and a bounded operator S, so that

$$D^\alpha = \Pi^{-1} \circ S \circ \Pi. \tag{10}$$

As a particular example, defining U as the range of a truncated Fourier series will provide two desirable features. First, any differential operator may be parameterized via the Fourier symbol of $D^\alpha$ by selecting $S(\kappa) = \sum_\alpha C_\alpha \kappa^\alpha$, for multi-index α and coefficients $C_\alpha$, where κ denotes wave-number. Secondly, Π may be selected as the Fourier transform to allow a fast algorithm. For these reasons we will consider the Fourier basis in this work; however, it is natural to consider whether one may select other choices for U more appropriate for complex geometries.

We gather the relevant conditions on U characterizing broader choices of modal/physical spaces. Consider a complete basis such that ∀f ∈ U, f(x) = c^T P(x), where c and P are vectors of coefficients and basis functions, respectively. We assume the function space is characterized via a collection Λ of degrees of freedom (DOFs), and that these DOFs are unisolvent over U in the sense that there exists a unique bijection mapping c to Λ. In the modal setting, operators are described in terms of c; in the physical, operators are described in terms of Λ. In the physical setting, nonlinear operators are often more easily described; e.g. for nodal DOFs one may sparsely evaluate the operator u² by simply squaring the DOFs at each node, whereas a modal description may yield a dense operator. Following from unisolvency, there exists an invertible mapping Π : Λ → c. As an example, the choice of orthogonal polynomials rather than Fourier series for U fits the desired form of Equation 10, but would instead provide a differentiation matrix for S and L²-projection for Π.
Recent work[32, 33, 30] adopts this viewpoint, which may be appropriate for generalizingthe current approach to complex geometries via spectral element bases.Regarding imposition of boundary conditions, in this paper we will assumedata to be arbitrarily smooth V = C ∞ , and denote the space of Fourier se-ries U = (cid:110)(cid:80) Jj =0 c j exp ( ij πx ) (cid:111) , with the following Dirichlet and Neumannboundary-conforming restrictions: U D = { f ∈ U such that f (0) = f (1) = 0 } ,U N = (cid:26) f ∈ U such that dfdx (0) = dfdx (1) = 0 (cid:27) . Q in Equation 8 the set of evenly spaced points, 0 , . . . , J −
1, theboundary conditions are satisfied at j = − / j = J − /
2. For problemsin two-dimensions, we consider tensor-product generalizations of these spaceswith appropriate boundary conditions.
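On such cell-centered grids the Dirichlet- and Neumann-conforming projections can be realized with fast discrete sine and cosine transforms (the DST/DCT used again in Section 6). A minimal sketch with SciPy follows; the choice of type-II transforms, which place the implied boundaries at the half-grid points, is our assumption rather than a stated choice of the authors.

```python
import numpy as np
from scipy.fft import dst, idst, dct, idct

J = 64
x = (np.arange(J) + 0.5) / J            # cell-centered points on (0, 1)

u_dirichlet = np.sin(np.pi * x)         # vanishes at x = 0, 1
u_neumann = np.cos(np.pi * x)           # zero slope at x = 0, 1

c_D = dst(u_dirichlet, type=2, norm='ortho')   # Pi for Dirichlet-conforming fields
c_N = dct(u_neumann, type=2, norm='ortho')     # Pi for Neumann-conforming fields

# Pi^{-1} round trips recover the fields exactly
assert np.allclose(idst(c_D, type=2, norm='ortho'), u_dirichlet)
assert np.allclose(idct(c_N, type=2, norm='ortho'), u_neumann)
```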
2.4. Choice of parameterization

Motivated by the form of Equation 10, we will consider in this work the following parameterization of L_ξ:

$$L_\xi = \Pi^{-1} \circ g_\xi \circ \Pi \circ h_\xi. \tag{11}$$

Again, Π represents the mapping from physical to modal space; thus Π and Π^{-1} are the Fourier transform and its inverse, respectively. In general, Π is the L²-projection onto U, and Π^{-1} is evaluation, Π^{-1}[u] = c^T P(x). The functions g_ξ and h_ξ are neural networks whose architecture will be given later. Following the discussion in the previous section, g and h are designed to characterize differential operators in modal space and nonlinearities in physical space, respectively. We note a slight abuse of notation, where a given h may map from U → V (consider e.g. h = u², and U = {sin(x)}, for which h(u) ∉ U) so that the composition Π ∘ h is not well defined. If one works with a discrete Fourier transform, then the evaluation of h(u) at discrete points implicitly provides the requisite projection of V → U; we assume this to be understood when we write Π ∘ h.

Several works consider various parameterizations for operator-regression-like tasks in the literature [33, 21, 40, 41]. Broadly, they may be characterized as employing varying degrees of inductive biases, balancing the expressivity of the parameterization against the number of parameters and consequent data required for training. In increasing order of bias: some works use only neural networks to characterize operators, imposing no explicit bias; some assume the network may be expressed as a stencil; and others assume one may draw upon domain expertise to build a dictionary of potential model forms. The form of Equation 11 assumes that the dynamics may be characterized as the composition of a differential operator and a nonlinear operator, using Π as an encoder into modal space. In [30], the authors consider a similar encoding into modal space, but do not include a nonlinear network, and treat g_ξ as a black-box mapping from modal coefficients to modal coefficients. By instead treating the modal operator as a parameterization of the Fourier symbol, we exchange the dim(U)-dimensional problem for a one-dimensional one. This provides a further benefit of allowing one to use the Fourier symbol as a means of imposing biases, which we explore in the following section.
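Concretely, on a periodic grid the action of Eq. 11 is three FFT-sized operations. A minimal numpy sketch, with `g_net` and `h_net` as hypothetical stand-ins for the two networks:

```python
import numpy as np

def apply_L(u, g_net, h_net, dx):
    """Eq. 11 on a periodic 1D grid: Pi^{-1} o g o Pi o h, with Pi the FFT."""
    kappa = 2 * np.pi * np.fft.fftfreq(u.size, d=dx)   # wavevector of each mode
    v = h_net(u)                        # point-wise nonlinearity in physical space
    v_hat = np.fft.fft(v)               # Pi: physical -> modal
    v_hat *= g_net(kappa)               # multiply by the learned symbol g_xi(kappa)
    return np.real(np.fft.ifft(v_hat))  # Pi^{-1}: modal -> physical

# Sanity check: g(kappa) = -kappa**2 and h(u) = u reproduce the Laplacian, so
# apply_L(u, lambda k: -k**2, lambda v: v, dx) approximates u_xx spectrally.
```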
3. Inductive bias
For many scientific machine learning tasks, it is infeasible to conduct the expensive experiments or fine grain simulations necessary to extract a suitably large training set. A common remedy is data augmentation, in which known symmetries are applied to existing samples to enlarge the training set. However, data augmentation increases the difficulty of training and, particularly for continuous transformations such as rotation, it is unclear how thoroughly one should sample the transformations. Building inductive biases directly into the model alleviates these concerns by restricting the search space of models to only those a priori deemed feasible. For instance, it is common in image classification tasks to use convolutional neural networks to enforce translational invariance [46, 47]. An active area of research seeks models with built-in equivariances [48, 49, 50, 51].

We impose inductive biases by choosing the form of Eq. 11. For regression problems using the Fourier basis, we use the following parameterization of the PDE,

$$\partial_t u = L_\xi u = \Pi^{-1} \circ g_\xi(\kappa) \, \Pi \circ h_\xi \circ u, \tag{12}$$

where h is a nonlinear, point-wise operator, without explicit dependence on x, i.e. (h ∘ u)(x) = h(u(x)). Figure 1 diagrams the architecture for the discretized L_ξ. This parameterization satisfies translational invariance due to the shifting property of the Fourier transform. If u satisfies Equation 12, so does a shifted function, $u(x) \to u(x - x_0) = T_{x_0} u$, because translation commutes with L_ξ:

$$\Pi^{-1} \circ g_\xi(\kappa) \Pi \circ h_\xi \circ T_{x_0} = \Pi^{-1} \circ g_\xi(\kappa) \Pi \circ T_{x_0} \circ h_\xi = \Pi^{-1} \circ T_{x_0} g_\xi(\kappa) \Pi \circ h_\xi = T_{x_0} \circ \Pi^{-1} \circ g_\xi(\kappa) \Pi \circ h_\xi. \tag{13}$$

We consider the benefits of enforcing three additional inductive biases:

1. Reflective symmetry: if u satisfies Equation 12, so does −u. This symmetry may be enforced by letting $h(u) = \operatorname{sign}(u)\,\hat{h}(|u|)$, where $\hat{h}$ is a neural network. By linearity, $\Pi^{-1} \circ g(\kappa) \Pi \circ h \circ (-u) = -\Pi^{-1} \circ g(\kappa) \Pi \circ h \circ u$.

2. Isotropy. This symmetry may be enforced by letting $g(\kappa) = \hat{g}(\|\kappa\|)$, where $\hat{g}$ is a neural network. By the rotation property of the Fourier transform, for a rotated function, $u(\mathbf{x}) \to u(R\mathbf{x}) = Ru$,

$$\Pi^{-1} \circ \hat{g}(\|\kappa\|) \Pi \circ h \circ R = \Pi^{-1} \circ \hat{g}(\|\kappa\|) \Pi \circ R \circ h = \Pi^{-1} \circ R \circ \hat{g}(\|\kappa\|) \Pi \circ h = R \circ \Pi^{-1} \circ \hat{g}(\|\kappa\|) \Pi \circ h, \tag{14}$$

where $(R \circ \Pi \circ v)(\kappa) = (\Pi \circ v)(R\kappa)$. This restriction requires h to be a point-wise operator without explicit dependence on x.

3. Conservation. This constraint may be enforced globally by letting $g(\kappa) = \chi(\kappa)\,\hat{g}(\kappa)$, where $\hat{g}$ is a neural network and

$$\chi(\kappa) = \begin{cases} 0 & \kappa = 0 \\ 1 & \text{otherwise} \end{cases}.$$

Integrating Equation 12 over space,

$$\int \partial_t u \, dx = \int \Pi^{-1} \circ g(\kappa) \Pi \circ h \circ u \, dx = \frac{d}{dt}(\Pi \circ u)(0) = g(0)(\Pi \circ h \circ u)(0) = 0. \tag{15}$$

This bias can be imposed in models using the cosine basis and an equivalent parameterization as Equation 12 by a similar argument.

These inductive biases are physically motivated. For example, the response of a material is often assumed to be constant at every point in the material and in all directions, so translational invariance and isotropy are commonly supposed in mechanics models. Conservation, e.g. of mass, momentum, or energy, is often a fundamental property of a physical system. The reflective symmetry considered here is representative of symmetries encountered in many physically relevant PDEs.
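Each of these biases amounts to a one-line restriction on the networks of Eq. 12. A minimal sketch, with `g_hat_net` and `h_hat_net` as hypothetical unconstrained networks:

```python
import numpy as np

def g_conservative(kappa, g_hat_net):
    """Zero the kappa = 0 mode so the learned dynamics conserve the integral of u."""
    return np.where(kappa == 0, 0.0, g_hat_net(kappa))

def g_isotropic(kx, ky, g_hat_net):
    """Make the symbol depend only on |kappa| so the operator commutes with rotations."""
    return g_hat_net(np.sqrt(kx**2 + ky**2))

def h_reflective(u, h_hat_net):
    """Odd extension h(u) = sign(u) h_hat(|u|): if u solves the PDE, so does -u."""
    return np.sign(u) * h_hat_net(np.abs(u))
```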
4. Validation for abstract operator regression problems
To study the effectiveness of the neural network parameterization, this section learns known steady-state spatial operators on a periodic domain. Applied to Eq. 7, the goal is to find a pair of parameterized operators to match two spatial operators,

$$L_\xi u \approx \partial_x u^2 \quad \text{and} \quad L_\xi u \approx \nabla^2 u. \tag{16}$$

The observations, u and $\hat{\mathcal{L}}u$, are sampled using the following strategy. Each input function, u, is generated by a Fourier expansion,

$$u = \sum_{|\kappa| \le 10} c_\kappa \exp(j \kappa \cdot x), \tag{17}$$

where $c_{-\kappa} = \bar{c}_\kappa$, $|c_\kappa| = 1$, and $\arg(c_\kappa)$ is sampled from the uniform distribution U[0, 2π]. The action of the differential operator, $\hat{\mathcal{L}}u$, is defined using Equation 11. For each problem, we generate training sets by sampling N functions and computing the actions of the operator. A similar parametrization is used for both the advection operator and Laplacian,

$$L_\xi = \Pi^{-1}\, g_\xi(\kappa)\, \Pi\, h_\xi(u), \tag{18}$$

where g_ξ and h_ξ are neural networks. Table 1 shows the parameters used for the networks and training.

Table 1: Hyperparameters for spatial operator regression
Parameter            Value
Network width        4
Network depth        4
Activation function  elu
Optimizer            Adam
Learning rate        0.001
Batch size           10
Epochs               1
Grid size            256 for ∂_x u²; 128 × 128 for ∇²u
Training set size    800, 1600, 3200, 6400, 12800

To facilitate comparisons with the spatial operators, the parameterization is modified in a post-processing step to remove redundant degrees of freedom. For instance, the choice of h_ξ and g_ξ can easily be modified by a scaling to give an equivalent operator. To this end, for both operators in Eq. 16, in post-processing we shift h_ξ and g_ξ(0) such that h_ξ(0) = 0. Additionally, for the Burgers operator we scale h_ξ and g_ξ such that d²h/du² = 2 at u = 0. For the Laplacian, we scale h_ξ and g_ξ such that dh_ξ/du = 1 at u = 0.

Figure 2 shows the results of learning a 1D nonlinear spatial operator motivated by Burgers' equation: ∂_x u². The upper left image shows a representative sample from the solution space with the true value of the differential operator in blue and the regressed evaluation in orange. The log of relative error in minimizing the loss is in the upper right, with error bars representing one standard deviation of the log error. The high accuracy of the relative loss indicates that the learned operator is relatively accurate. This is confirmed by the lower two images, which show the learned (orange) and exact (blue) modal function g on the left, and the nonlinear function h on the right.

Figure 2: Application of the Burgers operator ∂_x u² and the regressed operator to a function (top left). Im[g(κ)] (bottom left) and h(u) (bottom right) for both operators. Relative error, ‖Lu − ∂_x u²‖ / ‖∂_x u²‖, vs. training set size (top right).

Figure 3 similarly learns ∇²u in 2 spatial dimensions. An examination of the effect of including an isotropy assumption (see Section 3) in the construction of the parameterization is studied. Here the lower right image shows the relative error and demonstrates that a higher level of agreement is achieved when the isotropy assumption is applied (orange) rather than ignored (blue). Again, this observation is clear from the contour plot of the value of g for different values of the modal space. The parameterization enforcing isotropy agrees strongly with the exact values for the Laplacian operator (dotted black). However, the agreement with the unrestricted parameterization is not as strong. This demonstrates the importance of enforcing appropriate inductive biases to improve the quality of the learned operator.

Figure 3: Application of the Laplacian ∇²u to a function (top left) and application of the regressed operator with isotropy assumption to the same function (top right). Real part of the symbol of the Laplacian, the regressed operator without the isotropy assumption, and the regressed operator with the isotropy assumption (bottom left). Relative error, ‖Lu − ∇²u‖ / ‖∇²u‖, vs. training set size (bottom right).
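The training pairs of Eqs. 16–17 can be generated entirely spectrally. A minimal sketch for the 1D Burgers operator, assuming a domain of [0, 2π) and phases drawn from U[0, 2π]:

```python
import numpy as np

def sample_pair(n=256, kmax=10, rng=np.random.default_rng()):
    """One training pair (u, d(u^2)/dx) built from Eq. 17 on [0, 2*pi)."""
    kappa = np.fft.fftfreq(n, d=1.0 / n)          # integer wavenumbers
    c = np.zeros(n, dtype=complex)
    for k in range(1, kmax + 1):
        c[k] = np.exp(1j * rng.uniform(0.0, 2 * np.pi))   # |c_k| = 1, random phase
        c[-k] = np.conj(c[k])                      # c_{-k} = conj(c_k) keeps u real
    u = np.real(np.fft.ifft(c)) * n               # invert the Fourier expansion
    Lu = np.real(np.fft.ifft(1j * kappa * np.fft.fft(u ** 2)))  # exact d(u^2)/dx
    return u, Lu
```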
5. Diffusion Example: Extraction of anomalous diffusion from molecular data
As a canonical example of a particle system exhibiting anomalous diffusion [52], we consider the evolution of a number of particles described by the Levy process,

$$dX_t = dZ_t, \tag{19}$$

in a periodic domain, whose increment is specified by the α-stable distribution with skewness 0, scale parameter $\Delta t^{1/\alpha}$, and location parameter 0 (i.e. $Z_{t+\Delta t} - Z_t \sim L(\alpha, 0, \Delta t^{1/\alpha}, 0)$). For a discussion of the α-stable distribution, along with algorithms and code to sample it, we refer readers to [53] and references therein.

In the continuum limit, the particle density satisfies a fractional diffusion equation. For the particular choice of α = 2, one recovers the familiar Brownian motion with continuum limit corresponding to the heat equation,

$$\partial_t u = \Delta u, \tag{20}$$

while selecting α < 2 yields superdiffusive dynamics whose continuum limit involves a fractional Laplacian.

For operator regression the task is to train on coarse grained data generated by Eq. 19, and learn a continuum representation approximating Eq. 20. This provides a test that the operator regression strategy we employ can yield accurate predictions. In addition, we will demonstrate the effect that the inductive biases have on the performance of the regression. As a validation experiment, a trajectory of the learned continuum operator is applied to an initial condition that was not used in training, and compared to the evolution of a corresponding molecular trajectory with identical initial condition. This will demonstrate the ability of the regressed operator to generalize to unseen initial conditions.

Due to the periodicity of the domains, the Fourier basis is used with the parameterization in Eq. 12 for the spatial operator. In both 1D and 2D the continuum evolution of the density is computed by binning the particles in histograms. Regular grids are used for the histograms, as discussed in Sections 5.2.1 and 5.2.3, so that the transforms can be computed efficiently.

To avoid performing the optimization in Eq. 8 through a large number of compositions, m, we approximate it as

$$\xi^* = \operatorname{argmin}_\xi \sum_{u^j \in X} \sum_{\mathbf{x}_q \in Q} \left| u^j_m(\mathbf{x}_q) - (I + L_\xi)^m u^j_0(\mathbf{x}_q) \right|^2, \tag{21}$$

where we iterate the above optimization problem M times, setting m = 1 for the first iteration and increasing m by 1 every iteration. For the examples in this section, we use the hyperparameters listed in Table 2.

Table 2: Hyperparameters for diffusion operator regression
Parameter      Value
Network width  4
Network depth  4
Grid size      256 for 1D; 64 × 64 for 2D

We remark that the problem setup bears similarities to that considered in [30]. An important distinction is that the update operator I + L_ξ will be modeled by a single update, whereas [30] model the update operator using a residual network. Their approach may be interpreted as learning a multi-stage update operator, at the expense of requiring a separate set of hyperparameters ξ at each stage. From this perspective the current work may be considered as a special case corresponding to a single layer ResNet. Practically this amounts to a smaller number of hyperparameters, learning one operator L_ξ rather than multiple for each substage. When we later move to more difficult problems, however, we will require a more involved loss in Eq. 34, and at that point we will lose the ResNet interpretation.
5.2.1. Generation of training data from 1D Brownian motion

The training data is generated by particle motion evolved according to Eq. 19. Particle positions, X(t = 0), are initialized by sampling from a parameterized multi-von Mises distribution,

$$P(X) = \frac{1}{10} \sum_{i=1}^{10} \frac{\exp(\kappa_i \cos(x - \mu_i))}{2 \pi I_0(\kappa_i)}, \tag{22}$$

where $I_0$ is the zeroth order modified Bessel function of the first kind. For a single parameterization (κ_i and μ_i are fixed) the initial location of each particle is generated by first drawing i from a discrete uniform distribution and then sampling from the corresponding von Mises density $f_i$. The trajectories, X(t), are evolved in the periodic domain [0, 2π) using the Euler-Maruyama algorithm with a fixed timestep. The empirical density u(x, t) is computed by binning the particles in 256 uniform cells. We generate 100 such particle trajectories, each containing 51200 particles, at randomly selected parameter values. The scale and location parameters κ_i and μ_i are sampled from uniform distributions, with the scales κ_i drawn from an interval bounded below by 5 and the locations μ_i drawn from U[0, 2π]. We perform 5 realizations of training using the same data, randomly initializing the neural network parameters.

The top row of Figure 4 shows the training data at the final time (in blue) compared to the final time of the trajectories evolved using the 5 learned operators (in orange). The black line in the images is the initial condition.

Figure 4: Evolution of the Brownian motion training data (top row) and the heat equation validation example (bottom row) from t = 0 to t = 1, compared to the evolution of 5 realizations of the learned equation. Each column (no restrictions; conservation; conservation and reflection) depicts the final learned solution with different physical assumptions yielding improved training and validation accuracy.

The bottom row shows a validation example: a square wave initial condition not present in the training data, whose exact heat equation solution is

$$u(x, t) = \Pi^{-1}\left[ g(\kappa) \exp(-\kappa^2 t) \right], \qquad g(\kappa) = \begin{cases} 0 & \kappa \text{ is even} \\ -j/(\pi \kappa) & \text{else} \end{cases}, \tag{23}$$

shown in blue for t = 1. The learned evolutions often produce solutions that drift over time or oscillate when u is negative. This situation can be rectified, however, by enforcing inductive biases on the parameterization of the learned operator.

To explain the improvements gained from enforcing the inductive biases, consider first the "no restrictions" validation result in Fig. 4. Here several of the learned approximations drift away from the exact solution. In Figure 5 this qualitative observation is quantitatively expressed as a time history of the current total particle density relative to the initial particle density. For both training (left image) and validation (right image) results, the learned evolved particle density can deviate substantially from the initial particle density. Introducing the conservation constraint, as discussed in Sec. 3, provides a correction to the solution, as observed in the middle images of Fig. 4. These solutions don't drift; however, the validation and training solutions remain flawed.

Figure 6 demonstrates why the reflection constraint can be used to address these problems. The figure shows the learned and exact h (top row) and g (bottom row) functions. The learned g_ξ corresponds well to the known value at low frequencies regardless of the enforcement of constraints. At high frequencies, the approximation deviates. This is not surprising, as the spatial resolution of the original Brownian motion is bounded by the choice of histogram size. More relevant to the issue at hand is the function h. Here enforcement of the reflection constraint is critical to getting this function correct. When this constraint is unenforced, the optimizer lacks the information to fit the function h_ξ for u < 0; enforcing the reflection symmetry yields g_ξ and h_ξ functions that closely match the exact functions.
When thisconstraint is unenforced, the optimizer lacks the information to fit function h ξ for u < g ξ and h ξ functions that closely matches the exact functions.17 t . . . R u ( x , t ) d x / R u ( x , ) d x t − − − R u ( x , t ) d x / R u ( x , ) d x Figure 5: Evolution of total particle density for initial condition provided by training set ( left )and square wave initial condition ( right ). Results are shown for parameterization that doesnot enforce any physics ( ) and parameterization that enforces conservation ( ). − . . . u − h ( u ) No restrictions − . . . u Conservation − . . . u ConservationReflection κ R e { − g ( κ ) } κ κ Figure 6: h ( u ) ( top row ) and g ( κ ) ( bottom row ) for the true heat operator ( ) and learnedoperator ( ) without enforcing physics ( left column ), enforcing conservation ( middle col-umn ), and enforcing conservation and parity ( right column ). .2.3. Generation of training data from DLevy flight
Empirical particle densities are computed in a similar manner as in Section 5.2.1. Multi-von Mises distributions are used to generate the x and y coordinates of the initial particle positions,

$$P(X) = \frac{1}{5} \sum_{i=1}^{5} \frac{\exp(\kappa^d_i \cos(x_d - \mu^d_i))}{2 \pi I_0(\kappa^d_i)}, \tag{24}$$

where the scale factors, $\kappa^d_i$, and location parameters, $\mu^d_i$, are sampled independently for the d coordinate. The particle trajectories are computed using the Euler-Maruyama algorithm with increments sampled from an isotropic stable distribution with α = 1.5 and a fixed timestep. The empirical densities are computed by binning the particles in 64 × 64 bins. As in Section 5.2.1, $\mu^d_i$ is sampled from a uniform distribution, U[0, 2π].

We vary the distributions of $\kappa^d_i$ to generate anisotropically biased data. For example, if $\kappa^x_i = 0$, the initial particle density will be uniform in x and will remain so for all time. In this situation, the empirical time evolution of particle density lacks information about dynamics in the x direction. In the following section, we quantify the effects that anisotropically biased data has on training. We introduce β as the "degree of anisotropy" in the training set, sampling $\hat{\kappa}^x_i$ and $\kappa^y_i$ from a uniform distribution bounded below by 1 and setting $\kappa^x_i = (1 - \beta)\hat{\kappa}^x_i$. We perform 10 realizations of training using the same data, randomly initializing the neural network parameters.

The previous examples demonstrated the ability of the operator regression approach to learn integral order spatial operators, and the effects that assuming an inductive bias can have on the results. In this section, we demonstrate the advantage of using a spectral discretization of the operator to realize fractional spatial operators. In addition, we study the accuracy of the operator regression with and without the isotropic bias. This confirms, for this case, the intuitive expectation that more accuracy can be obtained with less data if the appropriate physics is included in the parameterization of the operator.

The top row of Figure 7 presents the training data generated by a Levy flight yielding a continuum fractional diffusion operator with exponent α/2. The second row shows the evolution of the learned operator from the same initial condition. For validation, the third row shows the evolution of the fractional heat equation with a square wave initial condition,

$$u(x, t) = \Pi^{-1}\left[ g(\kappa) \exp(-|\kappa|^{1.5} t) \right], \qquad g(\kappa) = \begin{cases} 0 & \kappa_x \text{ or } \kappa_y \text{ is even} \\ -1/(\pi^2 \kappa_x \kappa_y) & \text{else} \end{cases}. \tag{25}$$

This solution matches closely the evolution of the field using the learned operator, as shown in the fourth row.

Figure 7: Histograms showing evolution of the Levy flight (first row). Evolution of the learned operator using the histogram at t = 0 as the initial condition (second row). Evolution of the fractional heat equation with square wave initial condition (third row). Evolution of the learned operator on the square wave initial condition (fourth row).

To examine the impact of inductive biases, the fractional isotropic diffusion of a truncated Fourier expansion of the Dirac δ function is considered. Figure 8 shows the analytical solution,

$$\hat{u}(x, t) = 4096\, \Pi^{-1}\left[ \exp(-|\kappa|^{1.5} t) \right], \tag{26}$$

in the top row. The remaining rows show the learned diffusion of the operator without an isotropic inductive bias (middle row), and with an isotropic inductive bias (bottom row). These images clearly demonstrate that including the inductive biases helps maintain symmetry. Figure 9 shows isocontours of the learned spectrum. For smaller modes (with larger wavelengths) the isotropic assumption clearly benefits the learning problem. As the wavelengths shrink, however, we see that all learned operators deviate from the theory as noise overtakes signal in the training data.

Figure 8: Evolution of the fractional heat equation with Dirac delta initial condition (first row). Evolution of the learned equation without the isotropy assumption (second row) and with the isotropy assumption (third row).

Figure 9: Real part of g(κ) for the fractional Laplacian, the regressed operator without the isotropic assumption, and the regressed operator with the isotropic assumption.

To quantify the impact of including the isotropic assumption, Figure 10 presents the error achieved in reproducing the training data using the learned operator. In this experiment, an isotropic Levy flight is used. However, the degree of anisotropy of the training data, denoted β in the figure, varies over 3 training sets. Further, the sizes of the training sets are varied. In the images, the RMS error in the learned operator is plotted as a function of the training set size, with each line representing a different degree of anisotropy. The regression used for the solid lines has no inductive bias, while the regression used for the dotted lines assumes isotropy. Comparing these two operators, it is clear that including the inductive bias reduces the error by multiple orders of magnitude in this case, regardless of the degree of anisotropy in the training data. Furthermore, this is not simply an effect of the amount of training data, as the isotropic case is able to do better for all values of β. In the case of β = 1, the initial condition has a statistically 1D distribution, and despite this the isotropic case still performs better, although it does not show the marked improvement with training set size.

Figure 10: Test error vs. size of training set for models trained on data with varying degree of anisotropy β (isotropic: β = 0; statistically 1D: β = 1), with and without the isotropy assumption.
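The fractional operators targeted here are simple to realize spectrally: in modal space the fractional Laplacian reduces to the scalar symbol −|κ|^α, exactly the kind of function the Fourier parameterization can represent. A minimal sketch of the exact modal update the regression is compared against:

```python
import numpy as np

def fractional_heat_step(u, dt, alpha=1.5, dx=1.0):
    """Advance the 2D fractional heat equation by dt via its Fourier symbol."""
    kx = 2 * np.pi * np.fft.fftfreq(u.shape[0], d=dx)
    ky = 2 * np.pi * np.fft.fftfreq(u.shape[1], d=dx)
    k_mag = np.sqrt(kx[:, None] ** 2 + ky[None, :] ** 2)
    u_hat = np.fft.fft2(u)
    u_hat *= np.exp(-k_mag ** alpha * dt)   # exact modal decay over one step
    return np.real(np.fft.ifft2(u_hat))
```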
6. Continuum mechanics: extraction of continuum mechanics models from molecular dynamics simulations
A common problem in computational science and engineering is bridging micro- and macro-scale descriptions of a physical system. For many systems a first principles micro-scale description is available, but the range of length- and time-scales needed to simulate a complete system is outside the reach of even leadership computing facilities. To simulate such systems, accurate macro-scale models are needed. We demonstrate the present method as a strategy for extracting continuum scale models from data collected in molecular dynamics simulations.

We consider two pressure driven, planar channel systems: a Lennard-Jones (LJ) fluid and a colloidal fluid. For the LJ system, as training we vary the density, while for the colloidal system we vary both the colloid concentration and the particle sizes. Our goal is to find models that accurately predict the evolution of a set of macroscopic states. We will demonstrate the scheme's ability to generalize learned dynamics beyond the concentrations, particle sizes, and densities contained in the training data.
It is difficult to know which fields carry sufficient information for prediction to be accurate or even possible. Furthermore, increasing the number of fields to evolve may increase the difficulty of training. Therefore, we compare models that evolve different fields for the LJ and colloidal systems. For the LJ system, we consider a one field model that evolves only the particle momentum, a two field model that evolves the momentum and the particle density, and a three field model that evolves momentum, density, and the density weighted temperature. For the colloidal problem, we consider a two field model that evolves the large particle density and total particle momentum, a four field model that evolves the particle density and momentum for the large and small particles separately, and a six field model that evolves density, momentum, and density weighted temperature for the two phases. The geometry of the systems ensures they are statistically 1D in the wall normal direction. Thus, we seek 1D models for the evolution of these fields. We assume homogeneous Neumann boundary conditions for the density and temperature fields and homogeneous Dirichlet boundary conditions for the momentum fields.

All models considered have the following parameterization,

$$\partial_t u^i_N = \sum_{k=1}^{K} C^{-1} g^{ik}(\kappa; \gamma)\, C\, h^{ik}(u; \gamma), \qquad \partial_t u^i_D = \sum_{k=1}^{K} S^{-1} g^{ik}(\kappa; \gamma)\, S\, h^{ik}(u; \gamma), \tag{27}$$

where C and S are the DCT and DST discussed in Section 2.3, u are all the fields being evolved, $u_N$ are the fields satisfying Neumann boundary conditions, $u_D$ are the fields satisfying Dirichlet boundary conditions, γ are the varying simulation parameters, and K is a hyperparameter.

For the LJ system, γ = ρ* is the LJ reduced density and

1 field: $(u_N, u_D) = ([\emptyset], [p])$, (28)
2 field: $(u_N, u_D) = ([\rho], [p])$, (29)
3 field: $(u_N, u_D) = ([\rho, \rho T], [p])$, (30)

where p is the particle momentum, ρ is the particle density, and ρT is the density weighted temperature. For the colloidal system γ = (c, d) is the colloid concentration and colloid particle size and

2 field: $(u_N, u_D) = ([\rho^L], [p^L + p^S])$, (31)
4 field: $(u_N, u_D) = ([\rho^L, \rho^S], [p^L, p^S])$, (32)
6 field: $(u_N, u_D) = ([\rho^L, \rho^S, (\rho T)^L, (\rho T)^S], [p^L, p^S])$, (33)

where ρ, p, and (ρT) are again the density, momentum and density weighted temperature, and the superscripts L and S refer to the colloid particles and solvent particles, respectively.

Using these parameterizations, we attempt to find models that capture both the transient and steady state behaviors. We found the following loss function to yield successful models,

$$\xi^* = \operatorname{argmin}_\xi \sum_{u^j \in X} \sum_{\mathbf{x}_q \in Q} \sum_{m \in [1, \ldots, M]} m^{-\beta} \left| u^j_m(\mathbf{x}_q) - (I + L_\xi)^m u^j_0(\mathbf{x}_q) \right|^2, \tag{34}$$

where β is an additional hyperparameter and m is sampled from a uniform distribution U[0, M]. For β > 0, this loss function preferentially penalizes update operators that fail to capture dynamics on fast timescales. Compared to the loss function in Equation 21, this loss function does not lead to the optimizer forgetting short time scale dynamics as it learns longer time scale dynamics (a sketch is given at the end of this subsection).

Due to the configuration of these systems, the number of particles in the domain remains constant over time, and therefore we may assume conservation of ρ. In Section 6.3.2 we consider the benefits of enforcing conservation on ρ.

The LAMMPS particle simulation engine is used to generate the training data needed. LAMMPS evolves the dynamics of each particle based on a Verlet time integration of Newton's equations of motion, e.g. $m_i \partial_t^2 \mathbf{r}_i = \nabla U(\{r_{ij}\})$ [55, 56, 57]. Inter-particle forces (∇U) are used in a discrete time integration algorithm to update velocities and positions. Forces for particle i are accumulated by summing over all neighboring particles j using the distance $r_{ij}$ between particles. There are many ways to select U({r_ij}) functionals; for instance, recent work has applied machine learning techniques to achieve significant advances in model fidelity [58]. However, for this work we have employed the colloid potential energy functional of Everaers et al. [59] that reduces to a Lennard-Jones model for point masses. Simulations of a two-phase mixture of large and small particles are initialized on a regular face-centered cubic lattice with particle types assigned at random in a domain with periodic boundary conditions. This initial spatial ordering evolves quickly, approaching steady-state flow within a small number of timesteps. For the Lennard-Jones system, the volume of the cell is varied to produce a range of particle densities ρ*; for the colloidal system, a range of colloid concentrations c and particle diameters d is used. Particles near the ±X boundaries are held fixed in time. This creates a rigid wall that prevents particles from passing through the boundary. A uniform particle flow is simulated by adding a constant force (in addition to ∇U({r_ij})) to the remaining mobile particles in the +Y direction. A constant temperature is maintained by a Nose-Hoover thermostat. This results in steady-state dynamics where the number of particles (of each type), simulation volume, and temperature are conserved quantities (i.e. the canonical ensemble). Training data of volume-averaged velocity and density is collected on a rectangular grid perpendicular to the flow direction. Within each grid volume the particle number density and velocities are averaged for 10 timesteps and output every 10 timesteps to a file. We found this to be a necessary step to avoid excess noise due to particle fluctuations near grid boundaries. Training simulations totaling forty output frames were generated. The total time is scaled to 1. In addition, the width of the channel is scaled to be 1. This scaling is reflected in Figures 12 through 15. This method allows us to capture initial concentration gradients between species, and their time evolution. To test the generalization of the operator regression framework, the results given in subsequent sections will be of data that are not directly represented by the training conditions outlined here. In particular, particle densities or colloidal mass ratios that interpolate between training, as well as times that extrapolate beyond the training set, will be used as validation. We plan to make the configuration files used to run these simulations available at github.com/rgp62/MOR-Physics.
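The weighted loss of Eq. 34 can be sketched as follows, with `L` and `X` as in the earlier sketches; the number of sampled horizons per evaluation is an illustrative assumption.

```python
import torch

def weighted_loss(L, X, M, beta, n_samples=32):
    """Eq. 34: horizons m sampled uniformly, errors weighted by m**(-beta)."""
    loss = torch.tensor(0.0)
    for _ in range(n_samples):
        m = int(torch.randint(1, M + 1, (1,)))   # sampled prediction horizon
        for traj in X:                            # traj: shape (M + 1, N_Q)
            u = traj[0]
            for _ in range(m):
                u = u + L(u)                      # (I + L_xi)^m u_0
            loss = loss + m ** (-beta) * torch.sum((traj[m] - u) ** 2)
    return loss
```

The m**(-beta) factor keeps the short-horizon terms from being swamped by the (larger, slowly decaying) long-horizon errors, which is the "forgetting" failure mode observed with Eq. 21.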
For many physical systems, it is difficult to determine the grid resolution needed to accurately capture the dynamics, given the ambiguity of defining a representative volume element. Additionally, more resolution may increase the difficulty of training, as finer length scale dynamics must be learned. In this study, we examine the effects of resolution on training for the LJ system. We create a dataset with varying grid resolution by performing an ensemble of LJ simulations for ρ* = 0.01 and binning on a fine scale, with N = 80. The high resolution dataset is obtained by averaging over the ensemble. The lower resolutions, N = 40 and N = 20, are obtained by downsampling the N = 80 runs with a flat kernel of width 2 and 4 and averaging over one-half and one-fourth of the ensemble, respectively. This ensures the number of particles sampled per bin is constant among resolutions, controlling for noise (a sketch of this coarsening is given at the end of this subsection). We fit the 2 field model for all three resolutions. Table 3 lists the hyperparameters used in this study.

Table 3: Hyperparameters for LAMMPS modeling
Parameter      Value
Network width  4
Network depth  4

Figure 11: LJ system with ρ* = 0.01. Minimum relative error at steady state over 20 training runs for each grid resolution (left). Density and momentum at steady state (right) for LAMMPS and the learned equation.

We observe that the N = 80 case performs worse than the other two cases. As discussed above, higher resolutions may increase the difficulty of training, as finer scale dynamics must be learned. In particular, we observe additional flow features in both fields near the walls with increased resolution. The remaining studies use a resolution of N = 20.
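A minimal sketch of the flat-kernel coarsening used to build the N = 40 and N = 20 fields from an N = 80 profile:

```python
import numpy as np

def downsample(u_fine, width):
    """Flat-kernel average over non-overlapping blocks of size `width`."""
    return u_fine.reshape(-1, width).mean(axis=1)

u80 = np.random.rand(80)        # stand-in for one binned LAMMPS profile
u40 = downsample(u80, 2)        # N = 40 field
u20 = downsample(u80, 4)        # N = 20 field
```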
Figure 12: LJ system with ρ* = 0.01. Relative error at steady state vs. the cutoff time of the training set. Median error (left) and minimum error (right) over sets of 20 runs is shown for the 1 field model, the 2 field model with and without conservation, and the 3 field model with and without conservation.
Figure 13: Colloidal system. Relative error at steady state vs. the cutoff time of the training set. Median error (left) and minimum error (right) over sets of 20 runs is shown for the 2 field model with and without conservation, the 4 field model with and without conservation, and the 6 field model with and without conservation.

Table 4: Hyperparameter optimization ranges
Hyperparameter  Range
Network width   [1, 8]
Network depth   [1, 8]
K               [1, 4]
M               [1, 8]
Learning rate   [1e-5, 1e-1]
Batch size      [10, 200]
Epochs          [10, 1000]

We next compare the models for the LJ and colloidal systems and the benefits of enforcing conservation on ρ. We again use the hyperparameters in Table 3. For this study, we vary the training data available to the optimizer by including only data up to a cutoff time, t_c. For all models, we train over all initial conditions and simulation parameters, i.e., reduced (particle number) density, ρ*, for the LJ system and colloid concentration and particle size, (c, d), for the colloidal system. We test the models' prediction at steady state with the relative error of the momentum for the LJ system and the colloid concentration for the colloidal system.

Figure 12 compares the 1, 2 and 3 field models for the LJ system. We observe that the model accuracy improves quickly as more of the training data is made available, but then levels off. We find that the one equation model performs poorly, while the two and three equation models perform similarly. The one field model, for which only the particle momentum is evolved, may not carry enough information for good predictions to be possible. We do not observe a consistent benefit to imposing conservation of ρ in the 2 and 3 field models. The data available may contain enough information for conservation to be inferred, or conservation may confer no advantage in the quality of prediction.

Figure 13 compares the 2, 4, and 6 field models for the colloidal system. We again observe that the model accuracy is poor for heavily censored datasets but quickly improves and saturates with more data. The 2 and 4 field models perform similarly, while the 6 field model performs poorly. The complexity of the model may have increased the difficulty of training such that proper fits could not be achieved. For this system, we do observe a more consistent, albeit small, benefit to imposing conservation of ρ.

In this section, we attempt to capture the transient behavior of the two systems and evaluate the ability of the models to interpolate between simulation parameters. To perform this study, we separate the LAMMPS data into training sets and test sets. The LAMMPS run with ρ* = 0.1 for the LJ system and one (c, d) run for the colloidal system are set aside as the test sets. All other runs form the training sets. We use the 2 field and 4 field models for the LJ and colloidal systems, respectively.
1. Evolution of LAMMPS simulation ( ) and regressed 2equation model ( ). Particle density ( first column ), particle momentum ( second column ),and particle temperature ( third column ) is shown for increasing time ( rows ). . . t = . ρ L p L ρ S p S . . t = .
01 05 020 . . t = .
01 05 020 . . t = .
01 05 020 1 x . . t = . x
01 0 1 x
05 0 1 x Figure 15: Colloidal system with ( c, d ) = (0 . , . first column ), smallparticle momentum ( second column ), large particle density ( third column ), and large particlemomentum ( fourth column ) is shown for increasing time ( rows ).
We found the quality of transient predictions to be heavily sensitive to the choice of hyperparameters. Therefore, we use the black box hyperparameter optimization tool scikit-optimize [60] to tune the models. We use the Bayesian optimization tool within scikit-optimize to identify the optimal set of hyperparameters, with the ranges searched shown in Table 4. The default settings for this tool are used. To find the optimal hyperparameters that result in good fits for both the steady state and transient, we use the following loss,

$$\text{Loss} = \left. \frac{\| u_{\text{regressed}} - u_{\text{LAMMPS}} \|}{\| u_{\text{LAMMPS}} \|} \right|_{t = 0.1} + \left. \frac{\| u_{\text{regressed}} - u_{\text{LAMMPS}} \|}{\| u_{\text{LAMMPS}} \|} \right|_{t = 1.0}. \tag{35}$$

This loss penalizes hyperparameters that result in poor fits for both the transient and steady state dynamics. The hyperparameter optimization is performed separately for the LJ and colloidal systems; a sketch of this search is given at the end of this section.

After training with the optimal hyperparameters, we use the models to predict the evolution of the test systems, shown in Figures 14 and 15 for the LJ and colloidal systems. For both systems, we observe the models have good agreement with the LAMMPS simulations, up to the final time of the LAMMPS simulations. Additionally, the steady state for the model appears to be stable, even well past the end time of the LAMMPS simulation.
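A sketch of the hyperparameter search referenced above, using scikit-optimize's gp_minimize over the ranges of Table 4; train_and_score is a hypothetical routine that trains a model with the given hyperparameters and returns the loss of Eq. 35, and the call budget n_calls is an illustrative assumption.

```python
from skopt import gp_minimize
from skopt.space import Integer, Real

space = [Integer(1, 8),                          # network width
         Integer(1, 8),                          # network depth
         Integer(1, 4),                          # K
         Integer(1, 8),                          # M
         Real(1e-5, 1e-1, prior='log-uniform'),  # learning rate
         Integer(10, 200),                       # batch size
         Integer(10, 1000)]                      # epochs

def train_and_score(params):
    width, depth, K, M, lr, batch, epochs = params
    # ... train with these hyperparameters, then evaluate Eq. 35 ...
    return 0.0   # placeholder objective for this sketch

result = gp_minimize(train_and_score, space, n_calls=50)
print(result.x)   # best hyperparameters found
```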
7. Conclusion
In this work, we extend a PDE discovery method based upon pseudo-spectral approximation [26] to systems of PDEs, multidimensional problems, and domains with non-trivial boundary conditions. Assuming a first order in time PDE, we parameterize the spatial operator as the composition of a Fourier multiplier and a nonlinear point-wise functional, both parameterized by neural networks. Such a parameterization allows for simple introduction of physically motivated inductive biases, such as conservation, translational invariance, isotropy, and reflective symmetry. We demonstrate that we can recover the heat equation and fractional heat equation from density measurements of Brownian motion and Lévy flight trajectories in 1D and 2D. The inductive biases improve the accuracy of the regression technique and allow it to discover models that extrapolate beyond the dynamics present in the training set. Additionally, we demonstrate that the method can extract suitable continuum models from molecular dynamics simulations of a Lennard-Jones fluid and a colloidal fluid under Poiseuille flow.

Future work will focus on translating this framework to other discretizations, such as the spectral element method, so that complex geometries may be handled more naturally. Additionally, we will apply this method to extract physics in realistic engineering settings. As demonstrated in this work, we are able to extract accurate models from fine-grained simulations. Using this framework, we may extract models from a broad range of physics, such as turbulence, multiphase flows, and material mechanics. While this work primarily considered molecular dynamics data, one may generalize to non-synthetic experimental data as well.
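For readers who prefer a computational summary, the sketch below illustrates the structure of this parameterization for a single 1D periodic field, with hand-picked stand-ins for the two networks. It is a schematic of the idea, not the implementation used in this work, and the order of composition shown is only one of several possibilities.

    import numpy as np

    def rhs(u, multiplier_net, pointwise_net):
        # du/dt = g(F^{-1}[m(k) F[u]]): a Fourier multiplier m (acting in
        # modal space) composed with a pointwise nonlinearity g (acting in
        # physical space). Both callables are stand-ins for trained networks.
        k = np.fft.fftfreq(u.size)                 # wavenumbers of the Fourier basis
        u_hat = np.fft.fft(u)                      # transform to modal space
        v = np.fft.ifft(multiplier_net(k) * u_hat).real  # apply learned multiplier
        return pointwise_net(v)                    # learned pointwise nonlinearity

    # With m(k) = -(2*pi*k)^2 and g the identity, this reduces to the heat
    # equation right-hand side u_xx on a unit-spaced periodic grid.
    u = np.sin(2 * np.pi * np.arange(64) / 64)
    dudt = rhs(u, lambda k: -(2 * np.pi * k) ** 2, lambda v: v)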
Acknowledgement

Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA0003525. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government.

The work of R. Patel and N. Trask has also been supported by the U.S. Department of Energy, Office of Advanced Scientific Computing Research under the Collaboratory on Mathematics and Physics-Informed Learning Machines for Multiscale and Multiphysics Problems (PhILMs) project. The work of E. Cyr was supported by the U.S. Department of Energy, Office of Advanced Scientific Computing Research under the Early Career Research Program. SAND Number: SAND2020-5359 J.
References

[1] S. L. Brunton, J. L. Proctor, J. N. Kutz, Discovering governing equations from data by sparse identification of nonlinear dynamical systems, Proceedings of the National Academy of Sciences 113 (15) (2016) 3932–3937.
[2] S. H. Rudy, S. L. Brunton, J. L. Proctor, J. N. Kutz, Data-driven discovery of partial differential equations, Science Advances 3 (4) (2017) e1602614.
[3] F. J. Montáns, F. Chinesta, R. Gómez-Bombarelli, J. N. Kutz, Data-driven modeling and learning in science and engineering, Comptes Rendus Mécanique 347 (11) (2019) 845–855.
[4] G. Pavliotis, A. Stuart, Multiscale methods: averaging and homogenization, Springer Science & Business Media, 2008.
[5] T. Arbogast, Mixed multiscale methods for heterogeneous elliptic problems, in: Numerical analysis of multiscale problems, Springer, 2012, pp. 243–283.
[6] R. Zwanzig, Nonequilibrium statistical mechanics, Oxford University Press, 2001.
[7] J.-C. Michel, H. Moulinec, P. Suquet, Effective properties of composite materials with periodic microstructure: a computational approach, Computer Methods in Applied Mechanics and Engineering 172 (1-4) (1999) 109–143.
[8] H. Mori, Transport, collective motion, and Brownian motion, Progress of Theoretical Physics 33 (3) (1965) 423–455.
[9] Z. Li, X. Bian, X. Li, G. E. Karniadakis, Incorporation of memory effects in coarse-grained modeling via the Mori-Zwanzig formalism, The Journal of Chemical Physics 143 (24) (2015) 243128.
[10] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436–444.
[11] I. Goodfellow, Y. Bengio, A. Courville, Deep learning, MIT Press, 2016.
[12] N. Baker, F. Alexander, T. Bremer, A. Hagberg, Y. Kevrekidis, H. Najm, M. Parashar, A. Patra, J. Sethian, S. Wild, et al., Workshop report on basic research needs for scientific machine learning: Core technologies for artificial intelligence, Tech. rep., USDOE Office of Science, Washington, DC (2019).
[13] P. W. Battaglia, J. B. Hamrick, V. Bapst, A. Sanchez-Gonzalez, V. Zambaldi, M. Malinowski, A. Tacchetti, D. Raposo, A. Santoro, R. Faulkner, et al., Relational inductive biases, deep learning, and graph networks, arXiv preprint arXiv:1806.01261 (2018).
[14] J.-L. Wu, H. Xiao, E. Paterson, Physics-informed machine learning approach for augmenting turbulence models: A comprehensive framework, Physical Review Fluids 3 (7) (2018) 074602.
[15] M. Raissi, P. Perdikaris, G. E. Karniadakis, Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations, Journal of Computational Physics 378 (2019) 686–707.
[16] J. Ling, R. Jones, J. Templeton, Machine learning strategies for systems with invariance properties, Journal of Computational Physics 318 (2016) 22–35.
[17] L. T. Biegler, O. Ghattas, M. Heinkenschloss, B. van Bloemen Waanders, Large-scale PDE-constrained optimization: an introduction, in: Large-Scale PDE-Constrained Optimization, Springer, 2003, pp. 3–13.
[18] E. Haber, U. M. Ascher, D. W. Oldenburg, Inversion of 3D electromagnetic data in frequency and time domain using an inexact all-at-once approach, Geophysics 69 (5) (2004) 1216–1228.
[19] I. Epanomeritakis, V. Akçelik, O. Ghattas, J. Bielak, A Newton-CG method for large-scale three-dimensional elastic full-waveform seismic inversion, Inverse Problems 24 (3) (2008) 034015.
[20] A. M. Stuart, Inverse problems: a Bayesian perspective, Acta Numerica 19 (2010) 451–559.
[21] L. Lu, P. Jin, G. E. Karniadakis, DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators, arXiv preprint arXiv:1910.03193 (2019).
[22] J. Bongard, H. Lipson, Automated reverse engineering of nonlinear dynamical systems, Proceedings of the National Academy of Sciences 104 (24) (2007) 9943–9948.
[23] M. Schmidt, H. Lipson, Distilling free-form natural laws from experimental data, Science 324 (5923) (2009) 81–85.
[24] I. Bright, G. Lin, J. N. Kutz, Compressive sensing based machine learning strategy for characterizing the flow around a cylinder with limited pressure measurements, Physics of Fluids 25 (12) (2013) 127102.
[25] Z. Long, Y. Lu, B. Dong, PDE-Net 2.0: Learning PDEs from data with a numeric-symbolic hybrid deep network, Journal of Computational Physics 399 (2019) 108925.
[26] R. G. Patel, O. Desjardins, Nonlinear integro-differential operator regression with neural networks, arXiv preprint arXiv:1810.08552 (2018).
[27] J. P. Boyd, Chebyshev and Fourier spectral methods, Courier Corporation, 2001.
[28] J. S. Hesthaven, S. Gottlieb, D. Gottlieb, Spectral methods for time-dependent problems, Vol. 21, Cambridge University Press, 2007.
[29] G. Karniadakis, S. Sherwin, Spectral/hp element methods for computational fluid dynamics, Oxford University Press, 2013.
[30] K. Wu, D. Xiu, Data-driven deep learning of partial differential equations in modal space, Journal of Computational Physics 408 (2020) 109307.
[31] D. Zhang, L. Guo, G. E. Karniadakis, Learning in modal space: Solving time-dependent stochastic PDEs using physics-informed neural networks, SIAM Journal on Scientific Computing 42 (2) (2020) A639–A665.
[32] T. Qin, K. Wu, D. Xiu, Data driven governing equations approximation using deep neural networks, Journal of Computational Physics 395 (2019) 620–635.
[33] K. Wu, D. Xiu, Numerical aspects for approximating governing equations using data, Journal of Computational Physics 384 (2019) 200–221.
[34] D. Zhang, L. Guo, G. E. Karniadakis, Learning in Modal Space: Solving Time-Dependent Stochastic PDEs Using Physics-Informed Neural Networks, SIAM Journal on Scientific Computing 42 (2020) A639–A665. doi:10.1137/19M1260141.
[35] J. Nocedal, S. Wright, Numerical optimization, Springer Science & Business Media, 2006.
[36] J. Hicken, J. Alonso, Comparison of reduced- and full-space algorithms for PDE-constrained optimization, in: 51st AIAA Aerospace Sciences Meeting including the New Horizons Forum and Aerospace Exposition, 2013, p. 1043.
[37] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, X. Zheng, TensorFlow: Large-scale machine learning on heterogeneous systems, software available from tensorflow.org (2015).
[38] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, S. Chintala, PyTorch: An imperative style, high-performance deep learning library, in: H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, R. Garnett (Eds.), Advances in Neural Information Processing Systems 32, Curran Associates, Inc., 2019, pp. 8024–8035.
[39] W. H. Press, S. A. Teukolsky, W. T. Vetterling, B. P. Flannery, Numerical Recipes 3rd Edition: The Art of Scientific Computing, 3rd Edition, Cambridge University Press, 2007.
[40] N. Trask, R. G. Patel, B. J. Gross, P. J. Atzberger, GMLS-Nets: A framework for learning from unstructured data, arXiv preprint arXiv:1909.05371 (2019).
[41] Y. Bar-Sinai, S. Hoyer, J. Hickey, M. P. Brenner, Learning data-driven discretizations for partial differential equations, Proceedings of the National Academy of Sciences 116 (31) (2019) 15344–15349.
[42] S. Dieleman, K. W. Willett, J. Dambre, Rotation-invariant convolutional neural networks for galaxy morphology prediction, Monthly Notices of the Royal Astronomical Society (2015).
[43] S. C. Wong, A. Gatt, V. Stamatescu, M. D. McDonnell, Understanding data augmentation for classification: when to warp?, arXiv preprint arXiv:1609.08764 (2016).
[44] J. Wang, L. Perez, The Effectiveness of Data Augmentation in Image Classification using Deep Learning, arXiv preprint arXiv:1712.04621v1 (2017).
[45] H. Inoue, Data Augmentation by Pairing Samples for Images Classification, arXiv preprint arXiv:1801.02929 (2018).
[46] S. Lawrence, C. Giles, Ah Chung Tsoi, A. Back, Face recognition: a convolutional neural-network approach, IEEE Transactions on Neural Networks 8 (1) (1997) 98–113.
[47] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, in: Conference on Neural Information Processing Systems, 2012, pp. 1097–1105.
[48] P. Meltzer, M. D. G. Mallea, P. J. Bentley, PiNet: A Permutation Invariant Graph Neural Network for Graph Classification, arXiv preprint arXiv:1905.03046 (2019).
[49] N. Thomas, T. Smidt, S. Kearnes, L. Yang, L. Li, K. Kohlhoff, P. Riley, Tensor field networks: Rotation- and translation-equivariant neural networks for 3D point clouds, arXiv preprint arXiv:1802.08219 (2018).
[50] C. Esteves, C. Allen-Blanchette, A. Makadia, K. Daniilidis, Learning SO(3) Equivariant Representations with Spherical CNNs, arXiv preprint arXiv:1711.06721 (2017).
[51] D. Worrall, G. Brostow, CubeNet: Equivariance to 3D Rotation and Translation, arXiv preprint arXiv:1804.04458 (2018).
[52] I. M. Sokolov, J. Klafter, From diffusion to anomalous diffusion: a century after Einstein's Brownian motion, Chaos: An Interdisciplinary Journal of Nonlinear Science 15 (2) (2005) 026103.
[53] M. Gulian, G. Pang, Stochastic solution of elliptic and parabolic boundary value problems for the spectral fractional Laplacian, arXiv preprint arXiv:1812.01206 (2018).
[54] A. Lischke, G. Pang, M. Gulian, F. Song, C. Glusa, X. Zheng, Z. Mao, W. Cai, M. M. Meerschaert, M. Ainsworth, et al., What is the fractional Laplacian?, arXiv preprint arXiv:1801.09767 (2018).
[55] S. Plimpton, Fast parallel algorithms for short-range molecular dynamics, Tech. rep., Sandia National Labs., Albuquerque, NM (1993).
[56] M. Tuckerman, B. J. Berne, G. J. Martyna, Reversible multiple time scale molecular dynamics, The Journal of Chemical Physics 97 (3) (1992) 1990–2001.
[57] M. Parrinello, A. Rahman, Polymorphic transitions in single crystals: A new molecular dynamics method, Journal of Applied Physics 52 (12) (1981) 7182–7190.
[58] Y. Zuo, C. Chen, X. Li, Z. Deng, Y. Chen, J. Behler, G. Csányi, A. V. Shapeev, A. P. Thompson, M. A. Wood, et al., Performance and cost assessment of machine learning interatomic potentials, The Journal of Physical Chemistry A 124 (4) (2020) 731–745.
[59] R. Everaers, M. Ejtehadi, Interaction potentials for soft and hard ellipsoids, Physical Review E 67 (4) (2003) 041710.
[60] T. Head, MechCoder, G. Louppe, I. Shcherbatyi, fcharras, Z. Vinícius, cmmalone, C. Schröder, nel215, N. Campos, T. Young, S. Cereda, T. Fan, rene-rex, K. K. Shi, J. Schwabedal, carlosdanielcsantos, Hvass-Labs, M. Pak, SoManyUsernamesTaken, F. Callaway, L. Estève, L. Besson, M. Cherti, K. Pfannschmidt, F. Linzberger, C. Cauet, A. Gut, A. Mueller, A. Fabisch, scikit-optimize/scikit-optimize: v0.5.2 (Mar. 2018). doi:10.5281/zenodo.1207017.