Non-intrusive model reduction of large-scale, nonlinear dynamical systems using deep learning
Han Gao^{a,b}, Jian-Xun Wang^{a,b}, Matthew J. Zahr^{a,b,*}

a Department of Aerospace and Mechanical Engineering, University of Notre Dame, Notre Dame, IN
b Center for Informatics and Computational Science, University of Notre Dame, Notre Dame, IN
Abstract
Projection-based model reduction has become a popular approach to reduce the cost associated with integrating large-scale dynamical systems so they can be used in many-query settings such as optimization and uncertainty quantification. For nonlinear systems, significant cost reduction is only possible with an additional layer of approximation to reduce the computational bottleneck of evaluating the projected nonlinear terms. Prevailing methods to approximate the nonlinear terms are code intrusive, potentially requiring years of development time to integrate into an existing codebase, and have been known to lack parametric robustness. This work develops a non-intrusive method that uses deep neural networks to efficiently and accurately approximate the expensive nonlinear terms that arise in reduced nonlinear dynamical systems. The neural network is trained using only the simulation data used to construct the reduced basis and evaluations of the nonlinear terms at these snapshots. Once trained, the neural network-based reduced-order model only requires forward and backward propagation through the network to evaluate the nonlinear term and its derivative, which are used to integrate the reduced dynamical system at a new parameter configuration. We provide two numerical experiments, in which the dynamical systems result from the semi-discretization of parametrized, nonlinear, hyperbolic partial differential equations, that show the proposed approach is not only non-intrusive, but also provides more stable and accurate approximations to each dynamical system across a large number of training and testing points than the popular empirical interpolation method.
Keywords: nonlinear model reduction, non-intrusive hyperreduction, deep learning
1. Introduction
Numerical simulations have made an undeniable impact on the fields of science, engineering, and medicine due to the possibility to study or analyze a physical system in a virtual setting. However, modeling and simulation of most practical systems is a computationally expensive endeavor, usually requiring days on a supercomputer, essentially limiting users to a few runs. Yet the true power of computational physics lies in many-query analyses, e.g., optimization and uncertainty quantification (UQ), which require simulations at a large number of parameter configurations. For example, optimization problems are ubiquitous in science and engineering, and their solutions can lead to systems with unprecedented efficiency (e.g., energetically optimal flapping flight [1-4]), help gain insight into physical phenomena, or determine properties of a system from sparse, noisy observations of the solution (inverse problems). To enable these types of many-query analyses on important problems, the high computational cost of a single simulation must be addressed. Reduced-order models (ROMs) offer a promising means to do so.

The number of degrees of freedom (DOF) of a dynamical system is dramatically reduced in ROMs by constraining the dynamics to evolve in a very low-dimensional subspace, spanned by a set of reduced basis (RB) functions. These basis functions are usually learned from training data, i.e., solution snapshots obtained from high-dimensional model (HDM) simulations. Although the number of DOF can be significantly

* Corresponding author. Tel.: +1 574-631-1298
Email address: [email protected] (Matthew J. Zahr)
Preprint submitted to Elsevier, June 11, 2020

reduced by RB projection, for nonlinear problems the speedup of standard ROMs is often marginal due to the large number of high-dimensional operations involved in the evaluation of the nonlinear terms in the ROM system. Therefore, an additional step, usually called hyperreduction, must be taken to efficiently approximate the nonlinear terms. Most existing hyperreduction techniques, e.g., the empirical interpolation method (EIM) [5] and its discrete variant, discrete EIM (DEIM) [6], approximate the HDM nonlinear velocity function using a low-dimensional subspace as well, which provides a satisfactory tool to deal with nonlinear PDE systems in an efficient way. Massive speedups can be gained by hyperreduced ROMs in many cases where the solution of a dynamical system and its nonlinear terms are well-approximated in low-dimensional subspaces, including non-parametric problems (reproducing training data), linear elliptic PDEs, or problems with limited parameter variations [7-9].

Despite the tremendous promise of ROMs, standard hyperreduction techniques often struggle to provide a stable and accurate approximation of nonlinear terms and present notable limitations in parametric settings [10]. This lack of parametric robustness remains the main roadblock to ROMs becoming the technology that enables large-scale many-query analyses, which inherently involve parametric variations. In addition, standard hyperreduction techniques are code intrusive [6], usually requiring hundreds of person-hours to implement, which poses great challenges to leveraging existing open-source or commercial legacy codes for computational mechanics. Therefore, there is increasing interest in developing non-intrusive or weakly-intrusive ROMs, without the need for access to HDM operators, to achieve better robustness and stability [11-14]. For example, Audouze et al.
[11] proposed a non-intrusive proper orthogonal decomposition (POD)-based ROM, where the Galerkin projection is bypassed by using a radial basis regression to estimate the RB coefficients directly, which does not require hyperreduction for efficiency. Peherstorfer and Willcox [12] proposed a data-driven operator inference approach based on least-squares fitting to establish a non-intrusive projection-based ROM framework. Reduced-order models based on piecewise polynomial approximation of the dynamical system in state space [15, 16] are popular in subsurface flow [17-19] and electrical engineering applications [15, 20], but are difficult to train since they are sensitive to the choice of expansion points [21].

Recent advances in scientific machine learning (SciML) have been receiving increased attention in the computational modeling community [22-26] and offer new opportunities to develop more efficient and accurate reduced-order models. A growing amount of research using machine learning, particularly deep learning techniques, for model reduction has appeared most recently. A majority of these studies focused on learning the closure model (or discrepancy terms) of projection-based ROMs from data to improve their predictive accuracy [27-34]. San and Maulik [28, 29] employed feedforward neural networks (NNs) to build the ROM closures, with which the performance can be notably improved. Pan and Duraisamy [30] modeled the truncated dynamics in a data-driven way using sparse regression and neural networks. In a similar vein, Mohebujjaman et al. [32] proposed a data-driven correction ROM (DDC-ROM) framework, which has been tested on a number of fluid dynamics problems. In addition to building closure models, machine learning has also been used to construct more representative basis functions for model reduction.
For example, Lee and Carlberg [22] applied deep convolutional autoencoders to learn the nonlinear manifold defined by the parametrized dynamical system solution, which was shown to outperform the linear POD basis. Another important direction for using machine learning in model reduction is the development of accurate non-intrusive ROMs [35-49]. In most of these works, the general idea is to use machine learning models to learn the temporal evolution of the states in the reduced subspace, so that the Galerkin projection and intrusive hyperreduction are bypassed. For example, the dynamics in the low-dimensional manifold can be learned using multi-layer perceptrons (MLP) [39, 40, 42], deep residual recurrent neural networks (RNN) [36], or Long Short-Term Memory (LSTM) based RNNs [37, 43, 49]. Most of these approaches are purely data-driven and equation-free, which makes it difficult to respect the underlying PDE structure. Moreover, these works are focused on problems with non-parametric settings [44, 49].

In this work, we propose a novel method to approximate the nonlinear terms arising in projection-based ROMs that uses deep learning (DL) to overcome the parametric robustness issues and code intrusion of existing hyperreduction methods. Namely, a deep fully-connected neural network (NN) is built to learn the nonlinear velocity function in the ROM equations by leveraging the same HDM solution data used to construct the POD basis and the corresponding velocity data. The deep NN model is embedded into the standard projection-based ROM setting and the resulting dynamical system is solved using numerical time integration. The proposed method is non-intrusive in the sense that the hyperreduced model is a small dynamical system with a velocity function defined by the NN that can be run independently of the original simulation code once the NN has been trained.
The performance of the proposed method is compared against a ROM without hyperreduction and a ROM with (D)EIM hyperreduction. Note that this work is focused on parametrized, nonlinear dynamical systems. To the best of our knowledge, the current work is the first attempt to build a DL-based hyperreduction for projection-based ROMs in parametric settings.

The rest of the paper is organized as follows. Section 2 introduces projection-based model reduction for nonlinear dynamical systems and briefly discusses the popular, intrusive hyperreduction method (D)EIM. Section 3 presents the detailed formulation and training procedure for the proposed non-intrusive reduced-order model that uses deep learning to approximate the reduced velocity function. Section 4 compares the accuracy of the proposed method against classical model reduction and (D)EIM using two dynamical systems that result from the semi-discretization of nonlinear, hyperbolic PDEs. Finally, Section 5 concludes the paper.
2. Classical model reduction of nonlinear dynamical systems
Consider a parametrized, nonlinear dynamical system that we will take to be the HDM,

$M \frac{du}{dt} = f(u, t, \mu), \qquad u(0) = u_0, \qquad (1)$

where $\mathcal{D} \subset \mathbb{R}^{N_\mu}$ is the parameter space, $u : [0,T] \times \mathcal{D} \to \mathbb{R}^{N_u}$ is the time- and parameter-dependent state, $u_0 \in \mathbb{R}^{N_u}$ is the initial condition, $f : \mathbb{R}^{N_u} \times [0,T] \times \mathcal{D} \to \mathbb{R}^{N_u}$, $(\xi, t, \mu) \mapsto f(\xi, t, \mu)$, is the velocity of the nonlinear dynamical system, and $M \in \mathbb{R}^{N_u \times N_u}$ is the mass matrix. To approximately integrate the system in (1), we introduce a discretization of the time domain into $N_t$ intervals with endpoints $\mathcal{T} := \{t_0, t_1, \ldots, t_{N_t}\}$ such that $t_0 = 0$, $t_{N_t} = T$, and $t_n < t_{n+1}$ for $n = 0, \ldots, N_t - 1$. The dynamical system is assumed to be high-dimensional ($N_u \gg$
1) and computationally intensive to numerically integrate. Of particular interest are dynamical systems that result from the semi-discretization of partial differential equations for large, complex systems; e.g., in computational fluid dynamics it is not uncommon to have very large semi-discrete models [7, 10].

We assume the computational complexity of evaluating the $N_u$ components of the velocity function $(\xi, t, \mu) \mapsto f(\xi, t, \mu)$ is $\mathcal{O}(g(N_u))$. Furthermore, we assume the complexity of evaluating the Jacobian matrix $(\xi, t, \mu) \mapsto \frac{\partial f}{\partial \xi}(\xi, t, \mu)$ is also $\mathcal{O}(g(N_u))$. This holds for local discretizations such as the finite difference, finite element, or finite volume methods, where the Jacobian is sparse, with the number of nonzeros per column and the computational complexity of evaluating each column independent of $N_u$. The complexity of an entire implicit time step, dominated by the velocity and Jacobian evaluations and the linear solve with the Jacobian matrix, is $\mathcal{O}(g(N_u) + N_u^3)$ if a direct solver is used and $\mathcal{O}(g(N_u) + N_u^2)$ if an iterative solver with an effective preconditioner is used.

The construction of projection-based reduced-order models begins with the ansatz that the solution of the dynamical system can be well-approximated in a low-dimensional affine subspace $\mathcal{V} := \{\bar{u} + \Phi y \mid y \in \mathbb{R}^{k_u}\}$, where $\Phi \in \mathbb{R}^{N_u \times k_u}$ with $\Phi^T \Phi = I$ denotes the reduced basis of dimension $k_u \ll N_u$ and $\bar{u} \in \mathbb{R}^{N_u}$ is an affine offset. That is,

$u(t, \mu) \approx u_r(t, \mu) := \bar{u} + \Phi y(t, \mu), \qquad (2)$

where $u_r : [0,T] \times \mathcal{D} \to \mathcal{V}$ is the approximation of $u(t, \mu)$ in the reduced subspace and $y : [0,T] \times \mathcal{D} \to \mathbb{R}^{k_u}$ are the reduced coordinates of $u_r(t, \mu)$ corresponding to the basis $\Phi$ and offset $\bar{u}$.
The reduced coordinates are defined by enforcing the subspace approximation (2) in the governing equation and constraining the resulting residual to be orthogonal to a test subspace $\mathcal{W}$ of dimension $\dim \mathcal{W} = k_u$:

$M_r \frac{dy}{dt} = f_r(y, t, \mu), \qquad y(0) = y_0, \qquad (3)$

where $\Psi \in \mathbb{R}^{N_u \times k_u}$ with $\Psi^T \Psi = I$ is a basis for $\mathcal{W}$. The velocity of the reduced dynamical system is

$f_r : \mathbb{R}^{k_u} \times [0,T] \times \mathcal{D} \to \mathbb{R}^{k_u}, \qquad (\tau, t, \mu) \mapsto \Psi^T f(\bar{u} + \Phi \tau, t, \mu), \qquad (4)$

and the reduced mass matrix $M_r \in \mathbb{R}^{k_u \times k_u}$ and the initial condition for the reduced coordinates $y_0 \in \mathbb{R}^{k_u}$ are defined as

$M_r := \Psi^T M \Phi, \qquad y_0 := \Phi^T (u_0 - \bar{u}). \qquad (5)$

The reduced initial condition $y_0$ is the orthogonal projection of the initial condition $u_0$ onto the trial subspace.

In this work, we choose the test space to be the same as the trial space, up to the affine offset, i.e., a Galerkin projection $\Psi = \Phi$. The affine offset is taken to be the initial condition, $\bar{u} = u_0$, based on the observations in [50]. The reduced basis $\Phi$ is defined using the method of snapshots and proper orthogonal decomposition (POD) [51]. For each parameter in a given training set $\Xi := \{\mu_1, \ldots, \mu_{N_s}\} \subset \mathcal{D}$, we compute the approximate solution on the time discretization $\mathcal{T}$ and agglomerate into a global snapshot matrix $X = [X_1 \, \cdots \, X_{N_s}] \in \mathbb{R}^{N_u \times N_t N_s}$, where the fixed-parameter snapshot matrices $X_k \in \mathbb{R}^{N_u \times N_t}$ are defined as

$X_k := [u(t_1, \mu_k) \; \cdots \; u(t_{N_t}, \mu_k)] \qquad (6)$

for $k = 1, \ldots, N_s$. The reduced basis $\Phi$ is defined by compressing the information in the snapshot matrix using POD, i.e., retaining the dominant left singular vectors of the translated snapshot matrix $X - u_0 \mathbb{1}^T$ (to account for the affine offset). This computationally expensive training phase requires $N_s$ solutions of the large-scale dynamical system and compression of the resulting snapshot matrix of size $N_u \times N_s N_t$, but is only required once; the resulting reduced-order model can be leveraged on a testing set $\Xi^*$ without re-training to ameliorate the offline cost.
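The POD construction above (snapshot agglomeration, translation by the affine offset, and truncated SVD) can be sketched as follows. This is a minimal NumPy illustration of the standard procedure; the function name and the toy sizes are ours, not the paper's.

```python
import numpy as np

def pod_basis(X, u0, k_u):
    """Compute a POD reduced basis from a snapshot matrix.

    X  : (N_u, N_t*N_s) snapshot matrix, columns are HDM states u(t_i, mu_j)
    u0 : (N_u,) initial condition, used as the affine offset u_bar
    k_u: reduced dimension (k_u << N_u)
    """
    # Translate the snapshots by the affine offset (here u_bar = u0)
    Xc = X - u0[:, None]
    # Thin SVD; the dominant left singular vectors span the reduced subspace
    U, _, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k_u]  # orthonormal: Phi.T @ Phi = I

# Toy usage with synthetic snapshots (illustrative sizes, not the paper's)
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20))
Phi = pod_basis(X, X[:, 0], 4)
```

In practice the energy captured by the leading singular values is often used to choose `k_u`; here it is simply an input.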
Generalizability of the basis to $\Xi^*$ depends on the coverage of the parameter space with training samples. Sophisticated greedy methods exist to select training samples based on the maximum ROM error in the parameter space [52, 53], provided a reliable error indicator is available. Since we consider complex nonlinear problems, such error indicators with high effectivity are not available, so we simply use uniform sampling, which is feasible in our setting due to the low-dimensional parameter spaces considered ($N_\mu \le 3$).

The computational complexities of evaluating the reduced velocity $(\tau, t, \mu) \mapsto f_r(\tau, t, \mu)$ and its Jacobian $(\tau, t, \mu) \mapsto \frac{\partial f_r}{\partial \tau}(\tau, t, \mu)$ are $\mathcal{O}(g(N_u) + N_u k_u)$ and $\mathcal{O}(g(N_u) + N_u k_u^2)$, respectively. An entire implicit time step requires $\mathcal{O}(g(N_u) + N_u k_u^2 + k_u^3)$ operations, assuming a direct solver is used for the linear system of equations, which is almost exclusively the case due to the small size of $k_u$.

Despite the substantial reduction in the dimensionality of the dynamical system, from $N_u$-dimensional in (1) to $k_u$-dimensional in (3) with $k_u \ll N_u$, it is well-known that the ROM only achieves marginal speedup relative to the HDM due to an inherent bottleneck in the evaluation of the nonlinear term $f_r(\tau, t, \mu)$, with complexity proportional to the large dimension $N_u$: $\mathcal{O}(g(N_u) + N_u k_u)$. From the definition in (4) it is clear that even though $f_r$ is a mapping between low-dimensional spaces, it is expensive to evaluate due to a sequence of high-dimensional operations: reconstruction of $u_r = \bar{u} + \Phi \tau$ ($\mathcal{O}(N_u k_u)$ operations), evaluation of the HDM velocity function $f(u_r, t, \mu)$ ($\mathcal{O}(g(N_u))$ operations), and projection of the velocity onto $\mathrm{Range}(\Phi)$ ($\mathcal{O}(N_u k_u)$ operations). To overcome this computational bottleneck, a host of so-called hyperreduction methods [5-9, 54-58] have been introduced to approximate $f_r$ at a cost that does not scale with $N_u$.
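The three high-dimensional operations in the bottleneck can be made concrete with a short sketch of the exact Galerkin reduced velocity (4). The names `rom_velocity` and `f_full` are illustrative, not from the paper; the comments annotate the cost of each step.

```python
import numpy as np

def rom_velocity(Phi, u_bar, f_full, tau, t, mu):
    """Exact reduced velocity f_r of Eq. (4) (Galerkin: Psi = Phi).

    Every step below touches N_u-dimensional objects; this is the
    O(N_u) bottleneck that hyperreduction is designed to remove.
    """
    u_r = u_bar + Phi @ tau     # reconstruct state:  O(N_u k_u)
    f = f_full(u_r, t, mu)      # HDM velocity:       O(g(N_u))
    return Phi.T @ f            # project velocity:   O(N_u k_u)

# Toy usage: quadratic "HDM" velocity on a random orthonormal basis
rng = np.random.default_rng(0)
Phi, _ = np.linalg.qr(rng.standard_normal((30, 4)))
fr = rom_velocity(Phi, np.zeros(30), lambda u, t, mu: u**2,
                  rng.standard_normal(4), 0.0, None)
```

Note that `rom_velocity` requires calling the full model `f_full`, which is precisely why it cannot be evaluated independently of the HDM code.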
However, these methods usually have limited predictive capability for complex problems [10] and, most importantly, are difficult and code-intrusive to implement properly so as to achieve substantial speedup in practice.

For example, the empirical interpolation method [5] and its discrete variant [6] approximate the ROM velocity function as

$f_r(\tau, t, \mu) \approx f_d(\tau, t, \mu) := A P^T f(\bar{u} + \Phi \tau, t, \mu), \qquad A = \Psi^T \Pi (P^T \Pi)^{-1} \in \mathbb{R}^{k_u \times k_f}, \qquad (7)$

where $\Pi \in \mathbb{R}^{N_u \times k_f}$ is a basis for a $k_f$-dimensional subspace ($k_f \ll N_u$) used to approximate the HDM velocity function $f$, and $P \in \mathbb{R}^{N_u \times k_f}$ is a subset of the columns of the $N_u \times N_u$ identity matrix used to sample entries of the HDM velocity function. Then the (D)EIM reduced coordinates $y_d : [0,T] \times \mathcal{D} \to \mathbb{R}^{k_u}$ are computed such that

$M_r \frac{dy_d}{dt} = f_d(y_d, t, \mu), \qquad y_d(0) = y_0, \qquad (8)$

and the HDM approximation $u_d : [0,T] \times \mathcal{D} \to \mathbb{R}^{N_u}$ is reconstructed as

$u(t, \mu) \approx u_d(t, \mu) := \bar{u} + \Phi y_d(t, \mu). \qquad (9)$

As noted in [6, 8, 59], for this approach to be efficient, it is not sufficient to first evaluate $f(\bar{u} + \Phi \tau, t, \mu)$ and then sample its entries. Rather, the term $P^T f$ must be evaluated directly given the appropriate sampling of the state $\hat{P}^T u$, where $\hat{P} \in \mathbb{R}^{N_u \times k_s}$ is the matrix that samples all entries of the state $u$ required to evaluate the necessary entries of the velocity function $P^T f$. This approach assumes sparse dependence of the velocity function on the state, i.e., each entry of the velocity function depends on a small number of entries of the state vector. This sparsity property is guaranteed if the dynamical system in (1) corresponds to the semi-discretization of a PDE using local methods, e.g., finite difference or finite volume methods. Direct implementation of $P^T f$ given $\hat{P}^T u$ in the context of a PDE discretization involves the use of a sparsified or sample mesh on which the state $\hat{P}^T u$ is defined [6].
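The algebra of (7) can be sketched compactly. The snippet below builds the (D)EIM operator $A = \Psi^T \Pi (P^T \Pi)^{-1}$ with the standard greedy point-selection of the discrete variant [6]; it is an illustration of the formula only, and (deliberately) commits the inefficiency noted above, evaluating the full velocity before sampling it, which a production implementation must avoid via a sample mesh.

```python
import numpy as np

def deim_operator(Pi, Psi):
    """Build A = Psi^T Pi (P^T Pi)^{-1} of Eq. (7).

    Pi : (N_u, k_f) basis for the HDM velocity snapshots
    Psi: (N_u, k_u) test basis (Psi = Phi for Galerkin)
    The sampling matrix P is represented by its row indices idx.
    """
    _, k_f = Pi.shape
    idx = [int(np.argmax(np.abs(Pi[:, 0])))]
    # Greedy DEIM point selection: interpolate the next basis vector at the
    # current points and sample where its residual is largest.
    for j in range(1, k_f):
        c = np.linalg.solve(Pi[idx, :j], Pi[idx, j])
        r = Pi[:, j] - Pi[:, :j] @ c
        idx.append(int(np.argmax(np.abs(r))))
    A = Psi.T @ Pi @ np.linalg.inv(Pi[idx, :])   # (k_u, k_f)
    return A, np.array(idx)

def f_deim(A, idx, f_full, u_r, t, mu):
    """Hyperreduced velocity A P^T f; for illustration only. An efficient
    code would assemble only the sampled entries idx of f."""
    return A @ f_full(u_r, t, mu)[idx]
```

A useful sanity check of (7): for any velocity lying exactly in $\mathrm{Range}(\Pi)$, the (D)EIM approximation reproduces the exact projection $\Psi^T f$.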
While successful, this approach is code intrusive and difficult to implement, often requiring years of development to incorporate into existing codes. In addition, the implementation is highly dependent on the semi-discretization approach used for the PDE [6-8, 57, 58]. Other physics-based hyperreduction methods besides (D)EIM exist [5-9, 54-58] to approximate the ROM velocity; however, they all rely on this concept of partial assembly over a sample mesh and require a specialized, code-intrusive implementation.

Assuming an efficient implementation of the sampled nonlinear velocity function and its Jacobian, the computational complexities of evaluating each term are $\mathcal{O}(g(k_f) + k_f k_u)$ and $\mathcal{O}(g(k_f) + k_f k_u + \gamma k_f k_u)$ [6], respectively, where $\gamma$ is the average number of nonzero entries per column of the full Jacobian matrix $\frac{\partial f}{\partial \xi}$. In this work, we assume $\gamma$ is independent of $N_u$, which simplifies the complexity of the reduced Jacobian evaluation to $\mathcal{O}(g(k_f) + k_f k_u)$. An implicit time step requires $\mathcal{O}(g(k_f) + k_f k_u + k_u^3)$ operations, assuming a direct solver is used for the linear system of equations, which is independent of $N_u$ and substantially cheaper than integrating the HDM (1) or the ROM without hyperreduction (3).
3. Non-intrusive hyperreduction using deep neural networks
We propose a new approach to hyperreduction that approximates the ROM velocity function using a deep neural network, which we abbreviate ROM-NN in the remainder. That is, we introduce a function

$\hat{f}_r : \mathbb{R}^{k_u} \times [0,T] \times \mathcal{D} \times \mathbb{R}^{N_w} \to \mathbb{R}^{k_u}, \qquad (\tau, t, \mu, w) \mapsto \hat{f}_r(\tau, t, \mu; w), \qquad (10)$

and a vector of weights $w \in \mathbb{R}^{N_w}$ such that

$\hat{f}_r(y(t, \mu), t, \mu; w) \approx f_r(y(t, \mu), t, \mu) \qquad (11)$

for any $t \in [0,T]$ and $\mu \in \mathcal{D}$, and compute $y_n : [0,T] \times \mathcal{D} \to \mathbb{R}^{k_u}$ that solves

$M_r \frac{dy_n}{dt} = \hat{f}_r(y_n, t, \mu; w), \qquad y_n(0) = y_0. \qquad (12)$

The HDM approximation $u_n : [0,T] \times \mathcal{D} \to \mathbb{R}^{N_u}$ is reconstructed as

$u(t, \mu) \approx u_n(t, \mu) := \bar{u} + \Phi y_n(t, \mu). \qquad (13)$

If (11) holds, we expect $y_n$ to be a good approximation to $y$ and, provided the reduced basis is sufficient, $u_n$ to be a good approximation to $u$.

The ROM velocity (4) is a $k_u$-valued function of $k_u + 1 + N_\mu$ variables; training a neural network to approximate this mapping requires a (large) number of instances of the function input $(\tau, t, \mu) \in \mathbb{R}^{k_u} \times [0,T] \times \mathcal{D}$ and the corresponding output $f_r(\tau, t, \mu) \in \mathbb{R}^{k_u}$ so the network weights can be tuned to minimize a loss function. In this work, the network weights are defined as the solution of the following optimization problem:

$\underset{w \in \mathbb{R}^{N_w}}{\text{minimize}} \;\; \sum_{i=1}^{N_t} \sum_{j=1}^{N_s} \left\| \hat{f}_r(\tau_{ij}, t_i, \mu_j; w) - f_r(\tau_{ij}, t_i, \mu_j) \right\|^2, \qquad (14)$

where $\{t_i\}_{i=1}^{N_t} \subset [0,T]$ are the nodes of the temporal discretization (Section 2.1), $\{\mu_j\}_{j=1}^{N_s} \subset \mathcal{D}$ are the training parameters (Section 2.2), and $\tau_{ij} \in \mathbb{R}^{k_u}$ for $i = 1, \ldots, N_t$, $j = 1, \ldots, N_s$ are the reduced coordinates used for training the network. Given the requirement in (11) that the DNN velocity function match the ROM velocity function on the manifold of ROM solutions, a sensible choice is $\tau_{ij} = y(t_i, \mu_j)$.
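Assembling the input-target pairs for the loss (14) requires only stored HDM snapshots. A minimal NumPy sketch (the function name is ours) using the projected coordinates $\Phi^T(u - \bar{u})$ of the HDM snapshots as inputs, which is the cheap surrogate for the exact ROM trajectory advocated in this work:

```python
import numpy as np

def build_training_data(U_snaps, F_snaps, Phi, u_bar):
    """Assemble (input, target) pairs for the loss (14).

    U_snaps: (N_u, N_t*N_s) HDM state snapshots u(t_i, mu_j)
    F_snaps: (N_u, N_t*N_s) HDM velocity snapshots f(u(t_i, mu_j), t_i, mu_j)
    Phi    : (N_u, k_u) POD basis;  u_bar: (N_u,) affine offset

    Inputs are the projected coordinates tau_ij = Phi^T (u - u_bar), so no
    expensive ROM solves are needed; targets are the exact reduced
    velocities f_r = Phi^T f (Galerkin: Psi = Phi).
    """
    tau = Phi.T @ (U_snaps - u_bar[:, None])   # (k_u, N_t*N_s)
    f_r = Phi.T @ F_snaps                      # (k_u, N_t*N_s)
    return tau.T, f_r.T                        # one row per training sample
```

In a complete pipeline each row would additionally be concatenated with $(t_i, \mu_j)$ to form the full $(k_u + 1 + N_\mu)$-dimensional network input.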
While consistent with the requirement in (11), this approach requires that both the HDM solution $u(t, \mu)$ and the expensive ROM solution $y(t, \mu)$, i.e., without hyperreduction, be computed for each $\mu \in \Xi$ to define the training data, which can substantially increase the offline cost. To mitigate the additional cost of computing the expensive ROM solution, we propose to use the projection of the HDM solution onto the subspace $\mathcal{V}$ in place of the ROM solution itself. That is, we take $\tau_{ij} = \tilde{y}(t_i, \mu_j)$, where

$\tilde{y} : [0,T] \times \mathcal{D} \to \mathbb{R}^{k_u}, \qquad (t, \mu) \mapsto \Phi^T (u(t, \mu) - \bar{u}), \qquad (15)$

which requires exactly the same data used to compute the reduced basis $\Phi$. The complete training procedure is summarized in Algorithm 1.

Algorithm 1
Training procedure for deep learning-based reduced-order model
Input:
Training set $\{\mu_j\}_{j=1}^{N_s} \subset \mathcal{D}$, temporal discretization $\{t_i\}_{i=1}^{N_t} \subset [0,T]$, initial condition $u_0 \in \mathbb{R}^{N_u}$
Output:
Affine offset $\bar{u} \in \mathbb{R}^{N_u}$, reduced basis $\Phi \in \mathbb{R}^{N_u \times k_u}$, network weights $w \in \mathbb{R}^{N_w}$

for $j = 1, \ldots, N_s$ do
  Compute the solution of the HDM dynamical system $u(\cdot, \mu_j)$
end for
Define the snapshot matrix $X \in \mathbb{R}^{N_u \times N_s N_t}$ according to (6)
Compute the left singular vectors of $X - u_0 \mathbb{1}^T$: $u_i \in \mathbb{R}^{N_u}$, $i = 1, \ldots, N_s N_t$
Define the reduced subspace: $\bar{u} \leftarrow u_0$, $\Phi \leftarrow [u_1 \, \cdots \, u_{k_u}]$
for $j = 1, \ldots, N_s$ do
  for $i = 1, \ldots, N_t$ do
    Compute $\tau_{ij} = \Phi^T (u(t_i, \mu_j) - \bar{u})$
  end for
end for
Solve (14) for the network weights $w$

In this work, we define $\hat{f}_r$ using a fully-connected, feed-forward neural network (FCNN) architecture, which contains one input layer (the input vector $(\tau, t, \mu)$), five hidden layers, and one output layer (the prediction $\hat{f}_r(\tau, t, \mu)$). Each layer is fed forward to the next layer by a linear weighted sum and a nonlinear activation function (e.g., ReLU). The network is built in the form of a sparse autoencoder (SAE), namely, the hidden layers follow a decoder-encoder (diverging-converging) structure in order to capture the complex hidden nonlinear pattern of the mapping. Standardized normalization is applied to both the input and output layers. The training is conducted in a supervised manner, i.e., minimizing the data-misfit loss function, using a stochastic gradient descent (SGD) based optimizer (e.g., the Adam algorithm [60]). To avoid over-fitting, the dropout [61] and early stopping [62] techniques are applied. To demonstrate the robustness of the ROM-NN, the architecture and hyperparameters of the network remain the same for all test cases throughout the paper.
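The architecture just described can be sketched in PyTorch as follows. The diverging-converging hidden widths and the dropout rate below are illustrative placeholders (the paper's exact values did not survive this extraction), but the structure (five ReLU hidden layers with dropout, a linear output layer, input $(\tau, t, \mu)$) follows the text.

```python
import torch.nn as nn

class ReducedVelocityNet(nn.Module):
    """FCNN surrogate f_hat_r(tau, t, mu; w) with five hidden layers in a
    diverging-converging pattern (Section 3). Widths are illustrative."""

    def __init__(self, k_u, n_mu, widths=(64, 128, 256, 128, 64)):
        super().__init__()
        sizes = (k_u + 1 + n_mu, *widths, k_u)   # input: (tau, t, mu)
        layers = []
        for m_in, m_out in zip(sizes[:-1], sizes[1:-1]):
            layers += [nn.Linear(m_in, m_out), nn.ReLU(), nn.Dropout(0.1)]
        layers.append(nn.Linear(sizes[-2], sizes[-1]))   # linear output layer
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)
```

Training would minimize the mean-squared data misfit (14) with `torch.optim.Adam`, with early stopping on a held-out split of the snapshot pairs.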
We considered a number of other FCNN structures (uniform, converging-diverging, diverging-converging; Figure 1) of varying depths and breadths and found the accuracy of the ROM-NN to be rather insensitive to the network structure; for the problems considered, even a shallow neural network with only two hidden layers and 80 neurons per layer leads to a ROM-NN with similar overall accuracy as one with the aforementioned diverging-converging structure. The only exception is in the limit of sparse training, where a deeper network leads to a slightly more accurate ROM-NN.

Figure 1: Neural network structures tested: uniform (left), converging-diverging (middle), and diverging-converging (right).

This approach is guaranteed to be non-intrusive because the training procedure only relies on snapshots of the HDM solutions and evaluations of the ROM velocity function, and the online solution only requires evaluation of the neural network velocity function (forward pass) and its derivative with respect to $\tau$ (backward propagation). As a result, both the DNN and the dynamical system code can be treated as black boxes, which substantially eases the implementation burden. Another advantage of the proposed ROM-NN method is that we directly approximate $f_r$, a mapping between low-dimensional input and output spaces, using nonlinear basis functions. We will show in our numerical experiments (Section 4) that this approximation, when sufficiently trained, mitigates some parametric robustness and stability issues of traditional hyperreduction techniques, such as (D)EIM, that approximate the mapping $(\tau, t, \mu) \mapsto f(\bar{u} + \Phi \tau, t, \mu)$ (low-dimensional input space, high-dimensional output space) using a linear basis.

The computational complexity of a single pass through an FCNN with $M + 2$ layers, the $i$th layer consisting of $m_i$ neurons, $i = 0, \ldots, M +$
1, where layer 0 is the input and layer $M + 1$ is the output, is $\mathcal{O}(\sum_{i=1}^{M+1} m_{i-1} m_i)$, which can be seen from a simple analogy to dense matrix-vector multiplication. In our case, $m_0 = k_u + N_\mu + 1$ and $m_{M+1} = k_u$. Therefore, evaluation of the approximate velocity function $(\tau, t, \mu, w) \mapsto \hat{f}_r(\tau, t, \mu; w)$ scales quadratically with the breadth of the network and linearly with the depth: $\mathcal{O}(k_u m_1 + N_\mu m_1 + k_u m_M + \sum_{i=2}^{M} m_{i-1} m_i)$. In the special case where all hidden layers are the same size, $m_i = m$ for $i = 1, \ldots, M$, this reduces to $\mathcal{O}(k_u m + N_\mu m + M m^2)$. The cost to evaluate the Jacobian $(\tau, t, \mu, w) \mapsto \frac{\partial \hat{f}_r}{\partial \tau}(\tau, t, \mu; w)$ using automatic differentiation is within a constant factor of the cost of the velocity function itself [63, 64]. Therefore, the cost of an entire implicit time step is $\mathcal{O}(k_u^3 + k_u m_1 + N_\mu m_1 + k_u m_M + \sum_{i=2}^{M} m_{i-1} m_i)$, assuming a direct solver is used for the linear system. In the special case where all hidden layers are the same size, this reduces to $\mathcal{O}(k_u^3 + k_u m + N_\mu m + M m^2)$. Assuming the breadth ($m$) and depth ($M$) of the network scale independently of $N_u$, we expect this approach to be efficient because the dependence on the large dimension has been removed. If we require the network breadth and depth to be on the order of the size of the reduced basis, i.e., $m \sim \mathcal{O}(k_u)$ and $M \sim \mathcal{O}(k_u)$, the complexity of a time step reduces to $\mathcal{O}(k_u^3 + N_\mu k_u)$. This shows the bottleneck caused by the nonlinear terms has been completely eliminated using the proposed FCNN approximation of the nonlinear velocity function with $\mathcal{O}(k_u)$ layers and neurons per layer because, asymptotically, the cost is similar to that of a direct solve with the reduced Jacobian matrix. A similar result follows with broader, shallower networks, e.g., $m \sim \mathcal{O}(k_u^{3/2})$ and $M \sim \mathcal{O}(1)$, with complexity per time step $\mathcal{O}(k_u^3 + N_\mu k_u^{3/2})$.
Similarly, deeper, narrower networks can be used, e.g., $m \sim \mathcal{O}(k_u^{1/2})$ and $M \sim \mathcal{O}(k_u^2)$, with a complexity per time step of $\mathcal{O}(k_u^3 + N_\mu k_u^{1/2})$. The number of parameters ($N_\mu$) is usually small and always independent of $k_u$; therefore, the dominant complexity for all the network structures mentioned is $\mathcal{O}(k_u^3)$.
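The online phase described above, implicit time integration of (12) with the Jacobian of $\hat{f}_r$ obtained by automatic differentiation, can be sketched with a single backward-Euler step. This is a minimal illustration under our own naming, not the paper's implementation (the paper's time integrator for the experiments is a diagonally implicit Runge-Kutta method); the key point it demonstrates is that no HDM code is touched: the velocity is a forward pass and its Jacobian a backward pass.

```python
import torch

def backward_euler_step(f_hat, y, t, mu, dt, M_r, n_newton=10):
    """One backward-Euler step for M_r dy/dt = f_hat(y, t, mu)  (cf. Eq. (12)).

    f_hat is any differentiable velocity model (here: a neural network with
    weights baked in); its Jacobian comes from automatic differentiation,
    so the original simulation code is never needed online.
    """
    y_new = y.clone()
    for _ in range(n_newton):
        # Nonlinear residual of the backward-Euler update
        r = M_r @ (y_new - y) / dt - f_hat(y_new, t, mu)
        if r.norm() < 1e-10:
            break
        # Jacobian of f_hat w.r.t. the reduced state via autodiff
        J_f = torch.autograd.functional.jacobian(
            lambda z: f_hat(z, t, mu), y_new)
        J = M_r / dt - J_f
        y_new = y_new + torch.linalg.solve(J, -r)   # Newton update
    return y_new
```

For a linear test velocity $\hat{f}_r(y) = -y$ with $M_r = I$, one step of size $\Delta t$ returns $y/(1 + \Delta t)$, matching the backward-Euler formula exactly.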
4. Numerical experiments
In this section, we test the accuracy, stability, and parametric robustness of the proposed ROM-NN method using two dynamical systems that result from the semi-discretization of nonlinear, hyperbolic PDEs. We compare the performance of the ROM-NN method to a standard Galerkin-POD ROM, which provides a lower bound on the error attainable by the ROM-NN, and to the most popular intrusive hyperreduction method, (D)EIM. For both problems, we define the parameter space $\mathcal{D}$ and introduce two subsets $\Xi \subset \Xi^* \subset \mathcal{D}$, where $\Xi$ are the parameters used to train the reduced-order models and $\Xi^*$ are all parameters where the accuracy of the models is tested (including the training points). Recall the definitions of the parametric HDM solution $u$ (1) and its approximations provided by the ROM $u_r$ (2), (D)EIM $u_d$ (9), and ROM-NN $u_n$ (13). For a given parameter $\mu \in \mathcal{D}$, we quantify the error between the HDM solution $u(\cdot, \mu)$ and an approximate solution $v(\cdot, \mu)$ as

$\epsilon(v; \mu) := \sqrt{\dfrac{\sum_{i=1}^{N_t} \| v(t_i, \mu) - u(t_i, \mu) \|^2}{\sum_{i=1}^{N_t} \| u(t_i, \mu) \|^2}}. \qquad (16)$

Therefore, the errors in the ROM, (D)EIM, and ROM-NN solutions are

$\epsilon_r(\mu) := \epsilon(u_r; \mu), \qquad \epsilon_d(\mu) := \epsilon(u_d; \mu), \qquad \epsilon_n(\mu) := \epsilon(u_n; \mu), \qquad (17)$

respectively. In the rest of this section, we consider the statistics (minimum, maximum, and median) of these error metrics over the training set $\Xi$ and the testing set $\Xi^* \setminus \Xi$. In this work, we do not compare the computational cost of the HDM, ROM, (D)EIM, and ROM-NN because it is heavily dependent on the implementation and a number of algorithmic choices, e.g., the choice of linear solver.
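The error metric (16) is straightforward to compute from stored trajectories; a minimal sketch (our own function name):

```python
import numpy as np

def relative_error(V, U):
    """Time-aggregated relative error eps(v; mu) of Eq. (16).

    V, U: (N_u, N_t) arrays of approximate / HDM states at times t_1..t_{N_t};
    columns are snapshots, so summing squared entries over both axes gives
    the sums of squared snapshot norms in (16).
    """
    num = np.sum((V - U) ** 2)
    den = np.sum(U ** 2)
    return float(np.sqrt(num / den))
```

The metric is zero when the trajectories coincide and equals one when the approximation error has the same aggregate magnitude as the solution itself.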
The first numerical experiment we consider is the solution of the one-dimensional, parametrized, viscous Burgers' equation on a spatial domain $\Omega \subset \mathbb{R}$, where $u : \Omega \times [0,T] \times \mathcal{D} \to \mathbb{R}$ solves

$\partial_t u(x,t,\mu) + u(x,t,\mu)\, \partial_x u(x,t,\mu) = \nu(\mu)\, \partial_{xx} u(x,t,\mu) + g(x,\mu), \qquad x \in \Omega, \; t \in [0,T], \; \mu \in \mathcal{D}, \qquad (18)$

with time-independent Dirichlet boundary conditions prescribed at both endpoints of $\Omega$ for all $t \in [0,T]$ and $\mu \in \mathcal{D}$. The parameter space $\mathcal{D} \subset \mathbb{R}^3$ is a box. For any $\mu \in \mathcal{D}$, where $\mu = (\mu_1, \mu_2, \mu_3)$, the parametrized viscosity and source term are defined as

$\nu(\mu) = \mu_1, \qquad g(x, \mu) = \mu_2 e^{\mu_3 x}. \qquad (19)$

The PDE is discretized in space using 200 linear finite elements with the essential boundary conditions strongly enforced to yield a dynamical system of the form (1) with a total of $N_u =$
199 spatial degrees of freedom. The dynamical system is discretized in time using the two-stage diagonally implicit Runge-Kutta method [65] with 100 time steps.

For this problem, we define the testing set $\Xi^*$ as the uniform sampling of $\mathcal{D}$ on a $5 \times 5 \times 5$ grid, i.e., $|\Xi^*| =$
125 parameter configurations. We consider two training sets: $\Xi_a$ and $\Xi_b$ are the uniform samplings of $\mathcal{D}$ on $2 \times 2 \times 2$ and $3 \times 3 \times 3$ grids, respectively, such that $\Xi_a \subset \Xi_b \subset \Xi^*$. For both training sets, we construct a POD-Galerkin ROM without hyperreduction, one accelerated with (D)EIM, and one accelerated with the neural network approximation of $f_r$, each of size $k_u = 8$, and evaluate all models over $\Xi^*$.

The reduced-order model without hyperreduction is the most stable and accurate method, which is expected since it computes the velocity function $f_r$ exactly. Nonlinear approximation via (D)EIM is the least accurate approach and even goes unstable for a number of training and testing points for this small basis size. The neural network approach to approximating the nonlinear terms is more accurate than (D)EIM and is stable for all points in $\Xi^*$. These observations are taken from Figures 2 and 3, which contain the PDE state vector computed with each model at several instances in time for various points in $\Xi^*$, and Figure 4, which directly compares the accuracy of (D)EIM and ROM-NN.

For the training set $\Xi_a$, the minimum errors across both the training and testing sets are comparable for (D)EIM and ROM-NN. Since (D)EIM is unstable on both training and testing points, its maximum error is large. The ROM-NN approach is stable for all points in $\Xi^*$; however, its maximum error on the test set $\Xi^* \setminus \Xi_a$ is large. When the training set is enriched to $\Xi_b$ and the ROM size is kept fixed ($k_u = 8$), the accuracy over $\Xi^*$ improves for stable configurations.
Figure 2: Snapshots of the viscous Burgers' equation solution (at six times between t = 0 and
1) at various parameter configurations using the HDM, ROM, (D)EIM, and ROM-NN. The model reduction methods are trained using 8 parameter samples (Ξ_a) for a total of 800 snapshots and compressed to a size k_u = 8. In most cases, including both training and testing configurations, the ROM-NN model is more accurate than the (D)EIM model and does not exhibit the same stability issues.

Figure 3: Snapshots of the viscous Burgers' equation solution (at six times between t = 0 and 1) at various parameter configurations using the HDM, ROM, (D)EIM, and ROM-NN. The model reduction methods are trained using 27 parameter samples (Ξ_b) for a total of 2700 snapshots and compressed to a size k_u =
8. In most cases, including both training and testing configurations, the ROM-NN model is more accurate than the (D)EIM model and does not exhibit the same stability issues.

Table 1: Summary of the performance of the model reduction methods trained on Ξ_a, compressed to k_u = 8, and tested on Ξ*. The error statistics are reported for the training set Ξ_a and testing set Ξ* \ Ξ_a separately. The ROM-NN is stable for all training and testing points considered and has a median error less than 3% on the training set and about 5% on the testing set. The (D)EIM method goes unstable at a number of training and testing points and has a median error greater than 10%. [Table body not recoverable; columns: ROM, (D)EIM, ROM-NN, each with train-set and test-set error statistics and the number of unstable points.]
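The offline pipeline behind the ROM-NN results above, a POD basis from snapshots followed by input/output training pairs for the network harvested from the same snapshots, can be sketched as follows. This is an illustrative sketch (autonomous velocity function, invented names), not the authors' implementation.

```python
import numpy as np

def pod_basis(snapshots, k):
    # snapshots: N_u x n_s matrix whose columns are HDM states.
    Phi, _, _ = np.linalg.svd(snapshots, full_matrices=False)
    return Phi[:, :k]  # N_u x k basis with orthonormal columns

def training_pairs(snapshots, Phi, f):
    # Inputs: reduced states u_hat = Phi^T u at each existing snapshot.
    # Outputs: exact reduced velocities f_r(u_hat) = Phi^T f(Phi u_hat),
    # so only velocity evaluations at existing snapshots are required and
    # no new HDM simulations are needed to train the network.
    u_hat = Phi.T @ snapshots
    f_r = np.column_stack([Phi.T @ f(Phi @ u_hat[:, j])
                           for j in range(u_hat.shape[1])])
    return u_hat.T, f_r.T  # rows are training samples for the network
```

The network then regresses the rows of the second output onto the rows of the first, replacing the expensive projected nonlinear term at prediction time.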
Table 2: Summary of the performance of the model reduction methods trained on Ξ_b, compressed to k_u = 8, and tested on Ξ*. The error statistics are reported for the training set Ξ_b and testing set Ξ* \ Ξ_b separately. The ROM-NN is stable for all training and testing points considered and has a median error less than 3% on both the training and testing sets. The (D)EIM method goes unstable at a number of training and testing points and has a median error greater than 10%. [Table body not recoverable; columns: ROM, (D)EIM, ROM-NN, each with train-set and test-set error statistics and the number of unstable points.]

When Ξ_b rather than Ξ_a is used as the training set, the errors on the testing set become smaller, suggesting the additional training leads to better prediction. The median and maximum errors of the ROM-NN for both training and testing sets are roughly 3% and 6%, respectively (Table 2).

Premixed H2-air flame model

The second numerical experiment we consider is the solution of a simplified model of a premixed H2-air flame at a constant and uniform pressure, in a constant, divergence-free velocity field, and with constant, uniform diffusivities for all species and temperature, in the domain Ω := [0, L_x] × [0, L_y] (dimensions shown in Figure 5) over the time interval [0, T]. The chemistry is modeled by the one-step reaction 2H2 + O2 → 2H2O. The PDE model of this system [66] is

∂_t U(x, t, µ) − κ ΔU(x, t, µ) + β · ∇U(x, t, µ) = N(U, µ),  x ∈ Ω, t ∈ [0, T], µ ∈ D,
U(x, t, µ) = U_D(x),  x ∈ Γ_D, t ∈ [0, T], µ ∈ D,
∇U(x, t, µ) · n(x) = 0,  x ∈ Γ_N, t ∈ [0, T], µ ∈ D,
U(x, 0, µ) = U_0,  x ∈ Ω, µ ∈ D  (20)

with solution

U : Ω × [0, T] × D → R^4,  (x, t, µ) ↦ (Y_F(x, t, µ), Y_O(x, t, µ), Y_P(x, t, µ), Θ(x, t, µ)),  (21)

where n : ∂Ω → R^2 is the outward unit normal, Y_i : Ω × [0, T] × D → R is the mass fraction of the hydrogen fuel (i = F), oxygen (i = O), and water product (i = P), and Θ : Ω × [0, T] × D → R is the temperature. The domain boundary is split into six segments (Figure 5)

∂Ω = ⋃_{i=1}^{6} Γ_i,  Γ_D := ⋃_{i ∈ I_D} Γ_i,  Γ_N := ⋃_{i ∈ I_N} Γ_i,  (22)

where the index sets I_D and I_N partition the segments between the essential and natural boundaries (Figure 5). The essential boundary condition U_D : Ω → R^4 prescribes the unburnt fuel and oxidizer mass fractions at the inlet temperature on the inlet segment and (0, 0, 0, 300) on the remaining essential segments (23).

Figure 4: Comparison of the error in the (D)EIM and ROM-NN (k_u =
8) with respect to the HDM solution when trained with Ξ_a (left) or Ξ_b (right) for each point in Ξ*. The individual marks correspond to the (D)EIM error ε_d(µ) vs. the ROM-NN error ε_n(µ) for training and testing points. All entries that lie below the line of identity correspond to parameters where the ROM-NN is more accurate than (D)EIM. For both training cases, far more points lie below the line of identity, indicating the ROM-NN is more accurate across the testing set Ξ* than (D)EIM.

Figure 5: Schematic setup for the hydrogen-air flame (units: mm).

Homogeneous natural boundary conditions are prescribed on Γ_N ⊂ ∂Ω. The nonlinear reaction source term is of Arrhenius type and modeled as in Cuenot and Poinsot [67] as N(U, µ) = [N_F(U, µ), N_O(U, µ), N_P(U, µ), N_Θ(U, µ)], where

N_i(U, µ) = −ν_i (W_i / ρ) (ρ Y_F / W_F)^{ν_F} (ρ Y_O / W_O)^{ν_O} A(µ) exp(−E(µ) / (R Θ)),  i = F, O, P,
N_Θ(U, µ) = Q N_P(U, µ).  (24)

The velocity field β (cm/sec) is constant and divergence-free, and the diffusivity κ (cm²/sec) and mixture density ρ (gr/cm³) are constants. The molecular weights are W_F, W_O, and W_P = 18 gr/mol, the stoichiometric coefficients are ν_F = 2, ν_O = 1, and ν_P = −2 (signed so that fuel and oxidizer are consumed and the product is created), Q is the heat of reaction, and R is the universal gas constant. The two-dimensional parameter domain D ⊂ R² bounds the Arrhenius parameters: for any µ = (µ_1, µ_2) ∈ D, the parametrized pre-exponential factor and activation energy are taken as

A(µ) = µ_1,  E(µ) = µ_2.  (25)

At t =
0, the domain is considered empty at a temperature of 300 K, i.e., U_0 = (0, 0, 0, 300). The PDE is discretized in space using the finite difference method on a 40 × 20 grid with essential boundary conditions strongly enforced to yield a dynamical system of the form (1) with a total of N_u spatial degrees of freedom. We define the testing set Ξ* as the uniform sampling of D on a 7 × 7 grid, giving |Ξ*| = 49 test points. The training set Ξ is a uniform sampling of D on a 4 × 4 grid with Ξ ⊂ Ξ*. Similar to the previous section, we construct a POD-Galerkin ROM without

Figure 6: Snapshots of the advection-diffusion-reaction temperature field Θ(x, t, µ) (at six times between t = 0 and t = 0.06; top-to-bottom then left-to-right) at a representative parameter configuration µ ∈ D, computed using the HDM.
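The Arrhenius source term (24) is straightforward to evaluate pointwise. The sketch below shows only its structure: the density, heat of reaction, gas constant, and the fuel and oxidizer molecular weights are placeholder values (only W_P = 18 gr/mol survives in the text), and the signed stoichiometric convention is an assumption.

```python
import numpy as np

# Signed stoichiometric coefficients for 2 H2 + O2 -> 2 H2O: reactants
# positive, product negative, so -nu_i * rate consumes F, O and creates P.
NU = {"F": 2.0, "O": 1.0, "P": -2.0}
W = {"F": 2.016, "O": 31.9, "P": 18.0}  # gr/mol; W_F, W_O are assumed values
RHO = 1.39e-3    # mixture density, gr/cm^3 (assumed)
R_GAS = 8.314    # universal gas constant
Q_HEAT = 9800.0  # heat of reaction (assumed)

def source(U, mu):
    """Evaluate N(U, mu) of (24) at a single spatial point.

    U = (Y_F, Y_O, Y_P, Theta); mu = (mu_1, mu_2) = (A, E) as in (25).
    """
    Y_F, Y_O, _, Theta = U
    A_mu, E_mu = mu
    # Common Arrhenius reaction rate shared by all four components.
    rate = ((RHO * Y_F / W["F"]) ** NU["F"]
            * (RHO * Y_O / W["O"]) ** NU["O"]
            * A_mu * np.exp(-E_mu / (R_GAS * Theta)))
    N_F, N_O, N_P = (-NU[i] * (W[i] / RHO) * rate for i in ("F", "O", "P"))
    return np.array([N_F, N_O, N_P, Q_HEAT * N_P])
```

Applying this function at every grid point (and its analytical derivative with respect to U) is exactly the nonlinear evaluation the ROM-NN is built to avoid online.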
Figure 7: The maximum (circle), minimum (cross), and median (plus) error ε(·; µ) of the ROM (solid), (D)EIM (dotted), and ROM-NN (dashed) over the set Ξ*, as a function of the basis size k_u. While the ROM error statistics demonstrate deep convergence under refinement of k_u, the ROM-NN does not (expected because f̂ has more inputs/outputs as k_u increases and therefore becomes more difficult to train). Even though the ROM-NN does not have deep convergence, its maximum and median errors are small (3%). (D)EIM is unstable for k_u < 150; once it is stable (k_u ≥ 150), it is more accurate than the ROM-NN.

hyperreduction, one accelerated with (D)EIM, and one accelerated with the neural network for a range of basis sizes (k_u), and subsequently test each model on all points in Ξ*. The reduced-order model without hyperreduction is the most stable and accurate method and demonstrates deep convergence under refinement of k_u, even when the error metric is aggregated over both testing and training points. For this problem, (D)EIM was highly unstable; a basis of size k_u =
150 was required for stability, while the other methods were stable for a basis of size k_u = 80. When (D)EIM is stable, it is more accurate than the proposed ROM-NN method. The ROM-NN method is stable for all basis sizes considered, but does not exhibit deep convergence (the error is about 2% across the range of basis sizes considered). We have run extensive numerical tests to confirm this is due to the scaling of the entries of f_r, which can vary by up to seven orders of magnitude for this problem. This implies the loss function used to train the FCNN is heavily biased toward the largest (most important) coefficients, and the smallest coefficients are all but ignored in the training. As a result, the relative error of the small coefficients (required for deep convergence) is large. Weighting the loss function to offset the massively different scales of its terms causes the approximation of the small coefficients to improve, but the accuracy of the large (important) coefficients degrades for the networks considered in this work, which causes the overall error to increase. Because the ROM-NN does not exhibit deep convergence, there is little point in refining the basis beyond k_u = 80 because the reduction in error is negligible. The maximum and median errors of the ROM-NN solution (k_u = 80) across all training and testing parameters are small (a few percent).
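The scaling pathology described above can be seen in miniature: when reduced-velocity components differ by many orders of magnitude, an unweighted mean-squared loss is dominated by the largest component, while a per-component inverse-variance weighting equalizes the contribution of each component's relative error. This is a toy illustration with synthetic data, not the paper's training setup.

```python
import numpy as np

def mse(pred, target, weights=None):
    """Per-component mean-squared error, optionally weighted."""
    err2 = (pred - target) ** 2
    if weights is not None:
        err2 = err2 * weights
    return err2.mean(axis=0)

rng = np.random.default_rng(0)
# Two reduced-velocity components differing by ~7 orders of magnitude.
target = np.column_stack([1e3 * rng.standard_normal(256),
                          1e-4 * rng.standard_normal(256)])
pred = 1.1 * target  # uniform 10% relative error on every component

loss = mse(pred, target)                              # dominated by component 0
wloss = mse(pred, target, 1.0 / target.var(axis=0))  # components balanced
```

As the text notes, this rebalancing is a trade-off rather than a fix: equalizing the loss improves the small coefficients at the expense of the large, important ones.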
5. Conclusion
This work developed a non-intrusive acceleration technique for projection-based model reduction of nonlinear dynamical systems using deep neural networks. The approach is non-intrusive in the sense that it treats both the original dynamical system and neural network code as black boxes. The method is trained using the same HDM solutions computed to train the reduced basis, i.e., no new simulations are required to train the neural network approximation of the ROM velocity function, only evaluations of the ROM velocity at existing snapshots. Unlike traditional hyperreduction methods, this does not require modification of the underlying dynamical system code because, once the neural network is trained, only forward propagation (and symbolic differentiation) through the network is required to compute the approximate velocity function and its derivative. Aside from the benefit of non-intrusivity, the proposed method is more stable and accurate at both training and testing points in the limit of a small basis than the popular (D)EIM hyperreduction for the two dynamical systems considered (semi-discretizations of nonlinear, hyperbolic PDEs). Given we used uniform sampling to train the ROM-NN, this approach may be most appropriate for problems with a limited number of parameters. Future work will explore whether the amount of training can be reduced using POD-Greedy sampling [53] without sacrificing stability or parametric robustness. We will also perform a careful study of the computational cost of the proposed approach and the benefits of using GPUs to train and pass through the neural network.
References

[1] M. J. Zahr, P.-O. Persson, An adjoint method for a high-order discretization of deforming domain conservation laws for optimization of flow problems, Journal of Computational Physics 326 (Supplement C) (2016) 516–543. doi:10.1016/j.jcp.2016.09.012.
[2] M. J. Zahr, P.-O. Persson, J. Wilkening, A fully discrete adjoint method for optimization of flow problems on deforming domains with time-periodicity constraints, Computers & Fluids 139 (2016) 130–147. doi:10.1016/j.compfluid.2016.05.021.
[3] J. Wang, M. Zahr, P.-O. Persson, Energetically optimal flapping flight based on a fully discrete adjoint method with explicit treatment of flapping frequency, in: Proc. of the 23rd AIAA Computational Fluid Dynamics Conference, American Institute of Aeronautics and Astronautics, Denver, Colorado, 6/5/2017–6/9/2017.
[4] M. J. Zahr, P.-O. Persson, Energetically optimal flapping wing motions via adjoint-based optimization and high-order discretizations, in: Frontiers in PDE-Constrained Optimization, Springer, 2018.
[5] M. Barrault, Y. Maday, N. Nguyen, A. Patera, An 'empirical interpolation' method: application to efficient reduced-basis discretization of partial differential equations, Comptes Rendus Mathematique 339 (9) (2004) 667–672.
[6] S. Chaturantabut, D. Sorensen, Nonlinear model reduction via discrete empirical interpolation, SIAM Journal on Scientific Computing 32 (5) (2010) 2737–2764.
[7] K. Carlberg, D. Amsallem, P. Avery, M. Zahr, C. Farhat, The GNAT nonlinear model reduction method and its application to fluid dynamics problems, in: 6th AIAA Theoretical Fluid Mechanics Conference, 2011, p. 3112.
[8] C. Farhat, T. Chapman, P. Avery, Structure-preserving, stability, and accuracy properties of the energy-conserving sampling and weighting method for the hyper reduction of nonlinear finite element dynamic models, Internat. J. Numer. Methods Engrg. 102 (5) (2015) 1077–1110. doi:10.1002/nme.4820.
[9] M. J. Zahr, P.
Avery, C. Farhat, A multilevel projection-based model order reduction framework for nonlinear dynamic multiscale problems in structural and solid mechanics, International Journal for Numerical Methods in Engineering 112 (8) (2017) 855–881. doi:10.1002/nme.5535.
[10] K. Washabaugh, Faster fidelity for better design: A scalable model order reduction framework for steady aerodynamic design applications, Ph.D. thesis, Stanford University (2016).
[11] C. Audouze, F. De Vuyst, P. B. Nair, Nonintrusive reduced-order modeling of parametrized time-dependent partial differential equations, Numerical Methods for Partial Differential Equations 29 (5) (2013) 1587–1628.
[12] B. Peherstorfer, K. Willcox, Data-driven operator inference for nonintrusive projection-based model reduction, Computer Methods in Applied Mechanics and Engineering 306 (2016) 196–215.
[13] W. Chen, J. S. Hesthaven, B. Junqiang, Y. Qiu, Z. Yang, Y. Tihao, Greedy nonintrusive reduced order model for fluid dynamics, AIAA Journal 56 (12) (2018) 4927–4943.
[14] S.-T. Yeh, X. Wang, C.-L. Sung, S. Mak, Y.-H. Chang, L. Zhang, C. J. Wu, V. Yang, Common proper orthogonal decomposition-based spatiotemporal emulator for design exploration, AIAA Journal 56 (6) (2018) 2429–2442.
[15] M. Rewienski, J. White, A trajectory piecewise-linear approach to model order reduction and fast simulation of nonlinear circuits and micromachined devices, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 22 (2) (2003) 155–170.
[16] N. Dong, J. Roychowdhury, Piecewise polynomial nonlinear model reduction, in: Proceedings of the 40th Annual Design Automation Conference, ACM, 2003, pp. 484–489.
[17] M. Cardoso, L. Durlofsky, Linearized reduced-order models for subsurface flow simulation, Journal of Computational Physics 229 (3) (2010) 681–700.
[18] J. He, J. Sætrom, L. Durlofsky, Enhanced linearized reduced-order models for subsurface flow simulation, Journal of Computational Physics 230 (23) (2011) 8313–8341.
[19] S. Trehan, L.
Durlofsky, Trajectory piecewise quadratic reduced-order model for subsurface flow, with application to PDE-constrained optimization, Journal of Computational Physics 326 (2016) 446–473.
[20] D. Vasilyev, M. Rewienski, J. White, Macromodel generation for biomems components using a stabilized balanced truncation plus trajectory piecewise-linear approach, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 25 (2) (2006) 285–293.
[21] M. J. Zahr, K. Carlberg, D. Amsallem, C. Farhat, Comparison of model reduction techniques on high-fidelity linear and nonlinear electrical, mechanical, and biological systems, Tech. rep., University of California, Berkeley (2010).
[22] S. Lee, N. Baker, Basic research needs for scientific machine learning: Core technologies for artificial intelligence, Tech. rep., USDOE Office of Science (SC) (United States) (2018).
[23] S. Brunton, B. Noack, P. Koumoutsakos, Machine learning for fluid mechanics, arXiv preprint arXiv:1905.11075.
[24] J.-X. Wang, J.-L. Wu, H. Xiao, Physics-informed machine learning approach for reconstructing Reynolds stress modeling discrepancies based on DNS data, Physical Review Fluids 2 (3) (2017) 034603.
[25] P. E. Shanahan, D. Trewartha, W. Detmold, Machine learning action parameters in lattice quantum chromodynamics, Physical Review D 97 (9) (2018) 094506.
[26] S. L. Brunton, J. N. Kutz, Data-driven Science and Engineering: Machine Learning, Dynamical Systems, and Control, Cambridge University Press, 2019.
[27] S. Trehan, K. T. Carlberg, L. J. Durlofsky, Error modeling for surrogates of dynamical systems using machine learning, International Journal for Numerical Methods in Engineering 112 (12) (2017) 1801–1827.
[28] O. San, R. Maulik, Machine learning closures for model order reduction of thermal fluids, Applied Mathematical Modelling 60 (2018) 681–710.
[29] O. San, R. Maulik, Neural network closures for nonlinear model order reduction, Advances in Computational Mathematics 44 (6) (2018) 1717–1750.
[30] S.
Pan, K. Duraisamy, Data-driven discovery of closure models, SIAM Journal on Applied Dynamical Systems 17 (4) (2018) 2381–2413.
[31] Z. Y. Wan, P. Vlachas, P. Koumoutsakos, T. Sapsis, Data-assisted reduced-order modeling of extreme events in complex dynamical systems, PLoS ONE 13 (5) (2018) e0197704.
[32] M. Mohebujjaman, L. G. Rebholz, T. Iliescu, Physically constrained data-driven correction for reduced-order modeling of fluid flows, International Journal for Numerical Methods in Fluids 89 (3) (2019) 103–122.
[33] R. Maulik, A. Mohan, B. Lusch, S. Madireddy, P. Balaprakash, Time-series learning of latent-space dynamics for reduced-order model closure, arXiv preprint arXiv:1906.07815.
[34] C. Mou, H. Liu, D. R. Wells, T. Iliescu, Data-driven correction reduced order models for the quasi-geostrophic equations: A numerical investigation, arXiv preprint arXiv:1908.05297.
[35] D. Xiao, F. Fang, A. Buchan, C. Pain, I. Navon, A. Muggeridge, Non-intrusive reduced order modelling of the Navier–Stokes equations, Computer Methods in Applied Mechanics and Engineering 293 (2015) 522–541.
[36] J. N. Kani, A. H. Elsheikh, DR-RNN: A deep residual recurrent neural network for model reduction, arXiv preprint arXiv:1709.00939.
[37] A. T. Mohan, D. V. Gaitonde, A deep learning based approach to reduced order modeling for turbulent flow control using LSTM neural networks, arXiv preprint arXiv:1804.09269.
[38] J. S. Hesthaven, S. Ubbiali, Non-intrusive reduced order modeling of nonlinear problems using neural networks, Journal of Computational Physics 363 (2018) 55–78.
[39] Z. Wang, D. Xiao, F. Fang, R. Govindan, C. C. Pain, Y. Guo, Model identification of reduced order fluid dynamics systems using deep learning, International Journal for Numerical Methods in Fluids 86 (4) (2018) 255–268.
[40] O. San, R. Maulik, M. Ahmed, An artificial neural network framework for reduced order modeling of transient flows, Communications in Nonlinear Science and Numerical Simulation 77 (2019) 271–287.
[41] F. Regazzoni, L.
Dedè, A. Quarteroni, Machine learning for fast and reliable solution of time-dependent differential equations, Journal of Computational Physics 397 (2019) 108852.
[42] Q. Wang, J. S. Hesthaven, D. Ray, Non-intrusive reduced order modeling of unsteady flows using artificial neural networks with application to a combustion problem, Journal of Computational Physics 384 (2019) 289–307.
[43] K. Li, J. Kou, W. Zhang, Deep neural network for unsteady aerodynamic and aeroelastic modeling across multiple Mach numbers, Nonlinear Dynamics 96 (3) (2019) 2157–2177.
[44] H. F. Lui, W. R. Wolf, Construction of reduced-order models for fluid flows using deep feedforward neural networks, Journal of Fluid Mechanics 872 (2019) 963–994.
[45] X. Xie, G. Zhang, C. G. Webster, Non-intrusive inference reduced order model for fluids using deep multistep neural network.
[46] S. Pawar, S. Rahman, H. Vaddireddy, O. San, A. Rasheed, P. Vedula, A deep learning enabler for nonintrusive reduced order modeling of fluid flows, Physics of Fluids 31 (8) (2019) 085101.
[47] Z. L. Jin, Y. Liu, L. J. Durlofsky, Deep-learning-based reduced-order modeling for subsurface flow simulation, arXiv preprint arXiv:1906.03729.
[48] N. D. Santo, S. Deparis, L. Pegolotti, Data driven approximation of parametrized PDEs by reduced basis and neural networks, arXiv preprint arXiv:1904.01514.
[49] R. Maulik, V. Rao, S. Madireddy, B. Lusch, P. Balaprakash, Using recurrent neural networks for nonlinear component computation in advection-dominated reduced-order models, arXiv preprint arXiv:1909.09144.
[50] D. Amsallem, M. Zahr, C. Farhat, Nonlinear model order reduction based on local reduced-order bases, Internat. J. Numer. Methods Engrg. 92 (10) (2012) 891–916.
[51] L. Sirovich, Turbulence and the dynamics of coherent structures. I. Coherent structures, Quarterly of Applied Mathematics 45 (3) (1987) 561–571.
[52] G. Rozza, D. Huynh, A.
Patera, Reduced basis approximation and a posteriori error estimation for affinely parametrized elliptic coercive partial differential equations, Archives of Computational Methods in Engineering 15 (3) (2008) 229–275.
[53] B. Haasdonk, Convergence rates of the POD–Greedy method, ESAIM: Mathematical Modelling and Numerical Analysis 47 (3) (2013) 859–873.
[54] D. Ryckelynck, A priori hyperreduction method: an adaptive approach, Journal of Computational Physics 202 (1) (2005) 346–366.
[55] S. An, T. Kim, D. James, Optimizing cubature for efficient integration of subspace deformations, in: ACM Transactions on Graphics (TOG), Vol. 27, ACM, 2008, p. 165.
[56] P. Astrid, S. Weiland, K. Willcox, T. Backx, Missing point estimation in models described by proper orthogonal decomposition, IEEE Transactions on Automatic Control 53 (10) (2008) 2237–2251.
[57] P. Tiso, D. Rixen, Discrete empirical interpolation method for finite element structural dynamics, in: Topics in Nonlinear Dynamics, Volume 1, Springer, 2013, pp. 203–212.
[58] M. Yano, Discontinuous Galerkin reduced basis empirical quadrature procedure for model reduction of parametrized nonlinear conservation laws, Advances in Computational Mathematics (2019) 1–34.
[59] K. Carlberg, C. Farhat, J. Cortial, D. Amsallem, The GNAT method for nonlinear model reduction: Effective implementation and application to computational fluid dynamics and turbulent flows, J. Comput. Phys. 242 (2013) 623–647. doi:10.1016/j.jcp.2013.02.028.