Projection-based model reduction of dynamical systems using space-time subspace and machine learning
Chi Hoang∗, Kenny Chowdhary∗, Kookjin Lee∗, Jaideep Ray∗

∗ Extreme-scale Data Science and Analytics Department, Sandia National Laboratories, Livermore, CA 94550 ([email protected]). Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA-0003525.

Abstract
This paper considers the creation of parametric surrogate models for applications in science and engineering where the goal is to predict high-dimensional spatiotemporal output quantities of interest, such as pressure, temperature and displacement fields. The proposed methodology develops a low-dimensional parametrization of these quantities of interest using space-time bases combined with machine learning methods. In particular, the space-time solutions are sought in a low-dimensional space-time linear trial subspace that can be obtained by computing tensor decompositions of the usual state-snapshot data. The mapping between the input parameters and the basis expansion coefficients (or generalized coordinates) is approximated using four different machine learning techniques: multivariate polynomial regression, k-nearest neighbors, random forest and neural network. The relative costs and effectiveness of the four machine learning techniques are explored through three engineering problems: steady heat conduction, unsteady heat conduction and an unsteady advective-diffusive-reactive system. Numerical results demonstrate that the proposed method performs well in terms of both accuracy and computational cost, and highlight the important point that the amount of model training data available in an engineering setting is often much less than it is in other machine learning applications, making it essential to incorporate knowledge from physical models. In addition, simpler machine learning techniques are seen to perform better than more elaborate ones.
Keywords: model reduction, data-driven reduced models, physics-based machine learning, space-time bases, surrogate models, non-intrusive reduced models
Many tasks in computational science and engineering are many-query in nature, as they require repeated simulations of a parameterized large-scale computational model. Model reduction has become a popular approach to make such tasks tractable. Most such techniques first perform an "offline" training stage that simulates the computational model for multiple input-parameter instances to build a low-dimensional subspace or manifold. Then, during an "online" deployed stage, these techniques reduce the dimensionality and complexity of the original computational model at arbitrary unseen input-parameter instances by projecting the original computational model onto that low-dimensional subspace or manifold. The projection framework can be combined with various ways of representing parametric dependence, where the input parameters might describe, for example, material properties, system geometry, system configuration, initial conditions and/or boundary conditions. The output generally consists of quantities of interest in the form of whole fields, or of extractions/functions of a field, such as pressure, temperature or displacement fields.

For initial boundary value problems, many model reduction approaches based on the time and frequency domains have been proposed. In particular, moment-matching methods are commonly used for frequency-domain model reduction, such as the Lanczos algorithm [1], the Arnoldi algorithm [2], and the Krylov subspace method [3, 4]. These moment-matching techniques are primarily applicable to model reduction of linear time-invariant systems. Time-domain model reduction techniques can be divided into several types, such as the balanced truncation method [5], the proper orthogonal decomposition (POD) method [6], the reduced basis (RB) method [7], the proper generalized decomposition (PGD) method [8], and recently the dynamic mode decomposition (DMD) method [9], to name just a few. We are also aware of several hyper-reduction techniques that can be used to enhance some of the aforementioned methods, such as the Missing Point Estimation method [10], the Gauss-Newton with approximated tensors (GNAT) method [11], the Empirical Interpolation Method (EIM) [12], and its discrete variant, the Discrete Empirical Interpolation Method (DEIM) [13].

Alternatively, another approach involves learning the mapping from input parameters to low-dimensional output representations of the high-fidelity model data (i.e., generalized coordinates or basis expansion coefficients). That is, these methods first (i) build a set of "snapshots" that are solutions of the high-fidelity model for different input-parameter instances; then (ii) project the snapshot data onto a low-dimensional subspace; and finally (iii) use the low-dimensional, projected snapshots as training data to build the reduced models as mappings from the inputs to the output basis expansion coefficients (or output generalized coordinates).
Both classical interpolation and standard machine-learning-based methods have been used to determine such mappings. Perhaps the earliest example that approximates such an input-output mapping can be found in [14], in which the authors compute a POD representation of the temperature field in a Rayleigh-Bénard convection problem, and then use a cubic spline interpolation to predict the temperature field at Rayleigh numbers that are not included in the snapshot training data. Ref. [15] reconstructs POD representations of aerodynamic quantities from incomplete data using interpolation via gappy POD [16]. Other approaches use Gaussian process regression to build a model for the POD coefficients as a function of the inputs [17, 18, 19]. The authors of Ref. [20] learn the mapping from inputs to POD coefficients using an adaptive combination of self-organizing maps and local response surfaces, while [21, 22] learn the mapping from inputs to POD coefficients using neural networks. In Ref. [23], learning the POD coefficients is coupled with a greedy approach to actively guide the sampling of the input domain. Interestingly, the work in Ref. [24] approximates the mapping between input parameters and the POD coefficients by exploring various machine learning (ML) methods, and partly inspires the current paper.

Instead of learning a mapping from the inputs to the coefficients of POD representations, data-driven model reduction seeks to learn the operators of reduced models (a task closer to system identification or operator regression). These data-driven model reduction techniques thus provide a way to learn the dynamics of the system of interest in the form of reduced-model operators that respect some of the structure of the underlying governing equations. Data-driven model reduction shares similarities with classical system identification [25], where dynamical system models are extracted from time- and frequency-domain measurements. The eigensystem realization algorithm [26, 27] and finite impulse response system identification [28] are two system identification techniques that seek to learn reduced models from impulse responses of linear time-invariant systems. Vector fitting is a regression-based approach to learn a reduced model from frequency-response data [29]. Data-driven model reduction methods based on dynamic mode decomposition learn linear reduced operators that best fit the given data in the ℓ2 norm [30]. The work in [31] uses least-squares regression to find operators in a way similar to dynamic mode decomposition, but is applicable to models with low-order polynomial terms. The work in [32] uses a recursive variable selection strategy to learn a reduced model, with a focus on retaining physical interpretability. A different line of work leverages (data) sparsity-promoting regression techniques to learn reduced models from data, as in [33, 34, 35, 36].

Despite this wealth of literature, it seems that no work has sought to construct a map between input parameters and space-time basis expansion coefficients. In comparison with spatial-basis reduced-order modeling, the main benefit of space-time bases is that fewer generalized coordinates need to be computed in order to characterize the complete space-time solution. This is because both space and time are embedded in the reduced bases, as opposed to just one or the other.
This approach is quite new, as most of the previous works in the literature propose using parameter- and time-dependent generalized coordinates (i.e., spatial bases) [24, 37, 38, 39, 40, 41], and very few works use only parameter-dependent generalized coordinates (i.e., space-time bases) [42, 43]. We also note that space-time basis computations are generally more computationally expensive and complicated than the usual spatial bases due to memory requirements and implementation considerations. The main contributions of this work are summarized as follows:

• Our proposed method can be considered non-intrusive (or data-driven) model reduction and is PDE-agnostic, which means that one does not need to know (or solve) any underlying reduced partial differential equation (PDE) online. The only requirement is that full-order model snapshots are already available (to create space-time bases and train surrogate models).

• We compare various machine learning techniques to approximate the mapping between input parameters and output space-time basis expansion coefficients: multivariate polynomial regression, k-nearest neighbors, random forest and neural network.

• In the offline stage, we collect the usual (space-time) state snapshots to build space-time bases and train the machine learning surrogate models. In the online stage, for an arbitrary input parameter we express the space-time solution as a linear combination of those space-time bases and compute the coefficients using the trained surrogate models.

• The space-time basis expansion coefficients depend only on the input parameters and not on time.

• The proposed method is verified on three benchmark problems: a parameterized steady heat conduction PDE, a parameterized unsteady heat conduction PDE and a parameterized advective-diffusive-reactive problem. Numerical results show that the online computation of the machine learning surrogate models is extremely fast and the relative solution error is less than 1% on average.

• We show that while our proposed approach is less accurate than least-squares Petrov-Galerkin (LSPG) model reduction (without hyper-reduction) [44, 45, 46], it is significantly faster.

The paper is structured as follows. Section 2 characterizes the full-order models for both steady and unsteady problems. Section 3 details the proposed projection-based model reduction technique, addressing both steady (Section 3.1) and unsteady (Section 3.2) problems. Section 4 describes the four machine learning methods used to construct the mapping between the input parameters and the output basis expansion coefficients. Section 5 reports the numerical experiments, and Section 6 concludes the paper.
We consider the (high-fidelity) full-order model (FOM) of a dynamical system represented as a parameterized system of ordinary differential equations (ODEs)

r(x; µ) := f(x, t; µ) − ẋ = 0,   x(0; µ) = x_0(µ),   (2.1)

where t ∈ [0, T] denotes time with final time T ∈ R_+, x : [0, T] × D → R^{N_s} denotes the time-dependent, parameterized state that solves problem (2.1) for the parameters µ ∈ D, and r : R^{N_s} × D → R^{N_s} is the time-continuous residual (trivially zero) at time t. Here, D ⊆ R^{n_µ} is the parameter space, x_0 : D → R^{N_s} is the parameterized initial condition, and f : R^{N_s} × [0, T] × D → R^{N_s} is the velocity. Differentiation of a variable x with respect to time is denoted by ẋ. Dynamical systems such as these may arise from the semidiscretization of a time-dependent PDE model. We refer to Eq. (2.1) as the FOM ODE.

We note that when ẋ vanishes, the FOM reduces to a steady parameterized system of nonlinear algebraic equations. In such a case, all time derivatives in the following treatment disappear.

Numerically solving Eq. (2.1) requires a time-discretization method. In this work we use a linear multistep method. A linear k-step method applied to Eq. (2.1) leads to the algebraic system

r^n(x^n; µ) = 0,   n = 1, ..., N_t,   (2.2)

where the time-discrete residual r^n : R^{N_s} × D → R^{N_s} is defined as

r^n : (ξ; ν) ↦ α_0 ξ − Δt β_0 f(ξ, t^n; ν) + Σ_{j=1}^{k} α_j x^{n−j} − Δt Σ_{j=1}^{k} β_j f(x^{n−j}, t^{n−j}; ν).   (2.3)

Here, x^k is the numerical approximation to x(kΔt; µ), Δt ∈ R_+ is the time step, and the coefficients α_j and β_j, j = 0, ..., k, with Σ_{j=0}^{k} α_j = 0, define a particular linear multistep scheme. We obtain an implicit time-discretization scheme if β_0 ≠ 0. In this paper we use a uniform time step Δt and a fixed k for each time instance. We refer to Eq. (2.2) as the FOM O∆E.

The time-continuous solution field is a function of both space and time. Thus, the full solution lives in the tensor product of both space and time, i.e.,

y(·; µ) ∈ R^{N_s} ⊗ H,   (2.4)

where H is the set of functions from {t^n}_{n=0}^{N_t} to R. We refer to R^{N_s} ⊗ H as the 'FOM trial space'. Let g be an isomorphic function that 'unrolls' time from H to R^{N_t} such that

g : u ↦ [u(t^1) ··· u(t^{N_t})] : R^p ⊗ H → R^p ⊗ R^{N_t},   (2.5)

where p ∈ N is an arbitrary dimension. This yields an equivalent (discrete) representation of the FOM space as

g(y(·; µ)) = [y(t^1; µ) ··· y(t^{N_t}; µ)] ∈ R^{N_s} ⊗ R^{N_t}.   (2.6)

Let us also define a vectorization function that converts the matrix solution to a single column vector,

h : u ↦ vec(g(u)) : R^p ⊗ H → R^{p N_t}.   (2.7)

We now consider applying projection-based model reduction to the full-order models presented in Section 2: Section 3.1 addresses steady problems and Section 3.2 addresses unsteady problems.
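For concreteness, a minimal sketch of one step of the simplest implicit member of this family (backward Euler, k = 1) applied to a generic velocity f is given below; the velocity function, the previous state, the time level and the time step are assumed to be supplied by the user.

```python
import numpy as np
from scipy.optimize import fsolve

def backward_euler_step(f, x_prev, t_next, dt, mu):
    """One backward-Euler step of the FOM O-Delta-E (2.2)-(2.3):
    solve r^n(xi) = xi - x_prev - dt * f(xi, t_next, mu) = 0 for xi."""
    residual = lambda xi: xi - x_prev - dt * f(xi, t_next, mu)
    return fsolve(residual, x_prev)   # root solve initialized at the previous state
```

Iterating such a step over n = 1, ..., N_t and stacking the solutions produces exactly the state snapshots that are used later for basis construction.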
We begin by prescribing the subset of the original state space R^{N_s} in which the ROM techniques will seek approximate solutions. Assume that we have constructed a reduced basis Φ = [φ_1, ..., φ_p] ∈ R_*^{N_s × p} with p ≤ N_s, where R_*^{m × n} denotes the non-compact Stiefel manifold: the set of full-column-rank m × n real-valued matrices. (We shall describe the popular snapshot-POD approach for constructing such bases later in this section.) We then approximate the solution in the associated p-dimensional trial subspace with

x(µ) ≈ x̃(µ) = x_ref(µ) + Σ_{i=1}^{p} φ_i x̂_i(µ) ∈ R^{N_s},   (3.1)

where x̂(µ) = [x̂_1(µ), ..., x̂_p(µ)]^T ∈ R^p are the generalized/reduced coordinates (or coefficients), and x_ref(µ) is a fixed "particular solution", also referred to in some literature as a "static correction". (We refer to [24] for a specific way to compute x_ref(µ).)

Remark 1.
Given the reduced basis Φ ∈ R^{N_s × p}, we approximate the input-output mapping µ ∈ R^{n_µ} ↦ x̂(µ) ∈ R^p using different machine learning regression models, such as polynomials, k-nearest neighbors, random forests, and dense neural networks, which are described in Section 4.

Basis construction. This section describes how the reduced basis Φ can be constructed, assuming that a full-system state-snapshot matrix

X := [x(µ_train^1) − x_ref(µ_train^1), ..., x(µ_train^{n_train}) − x_ref(µ_train^{n_train})] ∈ R^{N_s × n_train}

with {µ_train^j}_{j=1}^{n_train} ⊆ D has been precomputed during an "offline" training stage. Algorithm 1 lists the widely used proper orthogonal decomposition algorithm that we employ to construct all spatial reduced bases in this work. Throughout, υ ∈ [0, 1] is the "energy criterion" used to determine the applied truncation.
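A minimal NumPy sketch of this truncation procedure (formalized in Algorithm 1 below), assuming a snapshot matrix X and an energy criterion upsilon are given, is:

```python
import numpy as np

def pod_basis(X, upsilon):
    """Truncated POD basis via thin SVD (a sketch of Algorithm 1).

    X       : (N_s, m) snapshot matrix
    upsilon : energy criterion in [0, 1]
    """
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    energy = np.cumsum(s) / np.sum(s)                  # cumulative singular-value "energy"
    p = int(np.searchsorted(energy, upsilon)) + 1      # smallest p with energy >= upsilon
    return U[:, :p]                                    # reduced-basis matrix Phi, shape (N_s, p)
```

A call such as Phi = pod_basis(X_train, 0.9999) then yields the spatial basis used in (3.1).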
Algorithm 1:
POD: Proper orthogonal decomposition
Input:
Snapshots X ∈ R^{N_s × m}, energy criterion υ ∈ [0, 1]
Output:
Reduced-basis matrix Φ ∈ R^{N_s × p} (p ≤ m)
1. Compute the (thin) singular value decomposition (SVD) X = U Σ V^T.
2. Set Φ = [u_1 ··· u_p], where p = min_{i ∈ Γ(υ)} i and Γ(υ) := { i | Σ_{j=1}^{i} σ_j / Σ_{k=1}^{m} σ_k ≥ υ }. Here, U ≡ [u_1 ··· u_m] and Σ ≡ diag(σ_1, ..., σ_m).

To compute the basis Φ ∈ R^{N_s × p} for steady problems, we simply execute Algorithm 1 such that Φ = POD(X, υ).

As in the previous section, we begin by formulating the reduced-basis solution, followed by the construction of the space-time basis. Let Φ = [φ_1, ..., φ_{n_s}] ∈ R_*^{N_s × n_s} represent the spatial basis matrix and Ψ = [ψ_1, ..., ψ_{n_t}] ∈ R_*^{N_t × n_t} the temporal basis matrix; then each spatiotemporal basis vector is contained in the span of the tensor product space π := Φ ⊗ Ψ = {φ_i ⊗ ψ_j}_{i,j=1}^{n_s, n_t} ∈ R^{N_s N_t × n_st}, where n_st = n_s n_t. While this tensor-product construction of the space-time basis is the most straightforward, in this paper we have chosen a tailored set of spatiotemporal bases, which is slightly different but, without loss of generality, can be represented in a similar tensor product form.

Following the idea in [43], we seek the approximate numerical solution x̃(µ) in an affine space-time trial subspace,

x̃(µ) = x_ref(µ) + Σ_{i=1}^{n_st} π_i x̂_i(µ) ∈ R^{N_s N_t},   (3.2)

where x̂_i(µ) ∈ R, 1 ≤ i ≤ n_st, are the generalized coordinates (or coefficients), π_i ∈ R^{N_s N_t} are the spatiotemporal basis vectors in Φ ⊗ Ψ, and x_ref(µ) is a reference solution with possible µ dependence, commonly referred to as the "static correction" or centered solution (over both space and time). In the case where the reference solution is the initial condition, x_ref(µ) = x_0(µ) ⊗ 1_{N_t} ∈ R^{N_s N_t}, where 1_{N_t} denotes the vector of ones of length N_t.

Here, the key idea of (3.2) is that the time dependence of the approximate solution is moved from the generalized coordinates to the basis vectors; this enables fewer generalized coordinates to be computed in order to characterize the complete space-time solution. We emphasize again that this is quite new, as most works in the literature propose to use parameter- and time-dependent generalized coordinates (i.e., spatial bases) [24, 37, 38, 39, 40, 41], and very few works use only parameter-dependent generalized coordinates (i.e., space-time bases) [42, 43].
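To make this concrete, the plain tensor-product variant of the basis and the affine reconstruction (3.2) can be sketched in a few lines of NumPy; Phi, Psi, x_hat and x_ref are assumed to be given with the dimensions stated above.

```python
import numpy as np

def spacetime_basis(Phi, Psi):
    """Tensor-product space-time basis: column (i, j) is phi_i (x) psi_j.

    Phi : (N_s, n_s) spatial basis,  Psi : (N_t, n_t) temporal basis
    Returns Pi with shape (N_s * N_t, n_s * n_t).
    """
    return np.kron(Phi, Psi)

def reconstruct(Pi, x_hat, x_ref, N_s, N_t):
    """Affine space-time approximation (3.2), reshaped to an (N_s, N_t) field.

    With np.kron ordering, consecutive blocks of length N_t hold the time
    history of one spatial degree of freedom.
    """
    x_tilde = x_ref + Pi @ x_hat        # vectorized space-time solution
    return x_tilde.reshape(N_s, N_t)
```

For the tailored bases introduced next, the Kronecker products are formed per spatial basis vector, but the reconstruction step is identical.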
Remark 2. Given the space-time reduced basis {π_i}_{i=1}^{n_st} ⊂ R^{N_s N_t}, we approximate the input-output mapping µ ∈ R^{n_µ} ↦ x̂(µ) = [x̂_1(µ), ..., x̂_{n_st}(µ)]^T ∈ R^{n_st} using different machine learning regression models, such as polynomials, k-nearest neighbors, random forests, and dense neural networks, which are described in Section 4.

Space-time basis construction
In order to compute the space-time basis, we evaluate the FOM (2.1) at the set of training parameter instances D_train := {µ_train^1, ..., µ_train^{n_train}}, defined by, e.g., uniform sampling, Latin hypercube sampling, a greedy procedure, etc., using a linear multistep method (2.2). This gives us 'snapshots' of the solution x(t^j; µ_train^k) ∈ R^{N_s}, j ∈ N(N_t), k ∈ N(n_train), in both space and time; here N(N) := {1, ..., N}. This data collection procedure is commonly referred to as the 'offline' stage in model reduction.

The space-time trial subspace is computed by applying tensor-decomposition techniques to the three-way 'state tensor' X ∈ R^{N_s × N_t × n_train} with elements

X_ijk := x_i(t^j; µ_train^k) − x_ref,i(µ_train^k),   i ∈ N(N_s), j ∈ N(N_t), k ∈ N(n_train).   (3.3)

We refer readers to [43] for a complete study of such space-time bases. We expect the resulting ROM to be accurate if the solution exhibits separable behavior in space and time.

In order to construct the basis from this three-way tensor, we introduce two different unfoldings of X. These unfoldings, or reshapings, allow us to utilize standard matrix decomposition techniques such as PCA (principal component analysis) with simplicity. The mode-1 and mode-2 unfoldings of X are defined as

X_(1) = [X(µ_train^1) ... X(µ_train^{n_train})] ∈ R^{N_s × N_t n_train},   (3.4)
X_(2) = [X^T(µ_train^1) ... X^T(µ_train^{n_train})] ∈ R^{N_t × N_s n_train},   (3.5)

respectively, where we have defined X(µ) := g(x(·; µ) − x_ref(µ)).

In short, X_(1) removes time as a variable by treating time and the parameter space the same, and X_(2) removes the spatial dimension as a variable by treating space and parameter samples the same. In the model-reduction literature, the matrix X_(1) is typically referred to as the 'global snapshot matrix'; we refer to it in this work as the 'spatial snapshot matrix', as its columns comprise snapshots of the spatial solution over time and parameter variation. Similarly, we refer to X_(2) as the 'temporal snapshot matrix', as its columns comprise snapshots of the time evolution of the solution over variation in space and parameter. Now we can begin computing the spatiotemporal basis vectors.

The spatial reduced basis Φ = [φ_1, ..., φ_{n_s}] ∈ R^{N_s × n_s} is typically computed using proper orthogonal decomposition applied to the spatial snapshot matrix X_(1), by executing Algorithm 1 to obtain Φ = POD(X_(1), υ). This produces a basis for the spatial component that is, in a sense, averaged over time. The temporal reduced basis Ψ = [ψ_1, ..., ψ_{n_t}] ∈ R^{N_t × n_t} can be computed similarly by applying Algorithm 1 to X_(2), i.e., Ψ = POD(X_(2), υ). Then, a spatiotemporal reduced basis can be computed straightforwardly from the Kronecker product π = Φ ⊗ Ψ.

Alternatively, we propose using the "tailored temporal bases" defined in [43] due to their advantageous ability to capture the time evolution of each individual spatial basis vector. In particular, we use the ST-HOSVD (sequentially truncated high-order SVD) [47, 48]. The idea can be summarized as follows. First, for each spatial basis vector φ_i we project each space-time snapshot X(µ_train^m), m ∈ N(n_train), onto φ_i, i ∈ N(n_s). This computation can be performed by a single matrix-vector multiplication per training instance, collecting the results into X_(2)^{(φ_i)} := [X^T(µ_train^1) φ_i, ..., X^T(µ_train^{n_train}) φ_i] ∈ R^{N_t × n_train}.
Then, in order to obtain the n_{i,t}-dimensional tailored temporal basis corresponding to φ_i, we simply apply an SVD, as in Algorithm 1, giving us a collection of tailored temporal basis vectors Ψ^i ∈ R^{N_t × n_{i,t}}, where n_{i,t} may be different for each spatial basis vector φ_i. We then repeat this for all n_s spatial basis vectors to get a total of Σ_{i=1}^{n_s} n_{i,t} temporal basis vectors. Finally, a Kronecker product is formed for each spatial basis vector to create the resulting spatiotemporal basis π = {φ_i ⊗ Ψ^i} ∈ R^{N_s N_t × n_st}, i ∈ N(n_s). The detailed implementation is listed in Algorithm 2.

Several remarks on this approach are in order. First, the maximum dimension of each temporal basis is limited by the number of training-parameter instances, i.e., n_{i,t} ≤ n_train, i ∈ N(n_s). Second, this approach is less computationally expensive than applying the SVD directly to X_(2) above, since we perform n_s SVDs on X_(2)^{(φ_i)} ∈ R^{N_t × n_train}, whereas X_(2) ∈ R^{N_t × N_s n_train} and typically n_s ≪ N_s. Third, Ref. [43] has shown that, compared with other temporal basis approaches, this approach is very efficient: it provides similar accuracy levels while using far fewer degrees of freedom (in the context of the GNAT hyper-reduction technique), as each temporal basis vector is tailored to its associated spatial basis vector. Finally, this was the approach employed to construct temporal bases in previous works based on forecasting [49, 50].
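A compact NumPy sketch of this tailored construction (formalized in Algorithm 2 below) is given next; it assumes the spatial snapshot matrix is ordered parameter-by-parameter, with N_t time instances per parameter.

```python
import numpy as np

def _pod(M, upsilon):
    # thin-SVD basis truncated by the cumulative-energy criterion of Algorithm 1
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    p = int(np.searchsorted(np.cumsum(s) / np.sum(s), upsilon)) + 1
    return U[:, :p]

def tailored_spacetime_basis(X1, upsilon, N_t):
    """Sketch of Algorithm 2 (tailored temporal bases).

    X1 : (N_s, N_t * n_train) spatial snapshot matrix, snapshots ordered per parameter.
    """
    Phi = _pod(X1, upsilon)                               # spatial POD basis
    n_train = X1.shape[1] // N_t
    blocks = []
    for i in range(Phi.shape[1]):
        # time histories of the i-th spatial mode, one column per training parameter
        proj = (Phi[:, i] @ X1).reshape(n_train, N_t).T   # (N_t, n_train)
        Psi_i = _pod(proj, upsilon)                       # tailored temporal basis for phi_i
        blocks.append(np.kron(Phi[:, [i]], Psi_i))        # phi_i (x) Psi_i
    return np.hstack(blocks)                              # (N_s*N_t, sum_i n_{i,t})
```

The returned matrix plays the role of π in (3.2); its column count is the total number of space-time generalized coordinates.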
Algorithm 2: Space-time POD: Space-time proper orthogonal decomposition
Input:
Spatial snapshot matrix X_(1) ∈ R^{N_s × N_t m}, energy criterion υ ∈ [0, 1]
Output:
Spatiotemporal basis matrix π ∈ R^{N_s N_t × Σ_{i=1}^{p} n_{i,t}}
1. Compute the (thin) singular value decomposition (SVD) X_(1) = U Σ V^T. Set Φ = [u_1 ··· u_p], where p = min_{i ∈ Γ(υ)} i and Γ(υ) := { i | Σ_{j=1}^{i} σ_j / Σ_{k=1}^{N_t m} σ_k ≥ υ }. Here, U ≡ [u_1 ··· u_{N_t m}] and Σ ≡ diag(σ_1, ..., σ_{N_t m}); p ≤ N_t m.
2. Form X_(2) = X_(1)^T ∈ R^{N_t m × N_s}.
3. For each u_i, compute the SVD of X_(2) u_i (reshaped into an N_t × m matrix) as U^i Σ^i (V^i)^T, and set Ψ^i = [ũ_1^i ··· ũ_{n_{i,t}}^i], where U^i = [ũ_1^i ··· ũ_m^i], i ∈ N(p); n_{i,t} ≤ m.
4. For each u_i, compute the Kronecker product π^i = u_i ⊗ Ψ^i, i ∈ N(p).
5. Set π = [π^1 ··· π^p].

Note that here we generalize so that the number n_{i,t} can be adapted for each spatial basis vector, but for simplicity we will assume a single n_t, for a total of n_s n_t spatiotemporal basis vectors.

Four different machine learning methods were used to approximate the mappings µ ↦ x̂(µ) in Remark 1 and Remark 2, respectively. This section describes the setup and gives a brief overview of each machine learning method. Specific implementation choices for each modeling approach are given in Section 5.

In general, a surrogate model is constructed for the mapping α(µ) : D ↦ A, e.g., α ≡ x̂ and A ⊆ R^p (or A ⊆ R^{n_st}), respectively. The outputs α(µ) are the coefficients defined in Eq. (3.1) or Eq. (3.2), and the inputs are the system parameters. In general, let µ ∈ D ⊆ R^{n_µ} and A ⊆ R^r.

Consider the case where we have m snapshots, each corresponding to a different input parameter. (Note that for a steady problem a snapshot is a spatial FOM solution, while for an unsteady problem a snapshot is a space-time FOM solution.) The input parameters corresponding to the snapshots are collected in the matrix D ∈ R^{m × n_µ}, while the corresponding basis coefficients are collected in the matrix A ∈ R^{m × r}. Subsequent sections provide an overview of the four methods used to make predictions. For each of these methods, the input and output data D and A are divided into training and testing sets denoted by (D_train, A_train) and (D_test, A_test), where m = n_train + n_test. The goal is to learn the mapping α : D ↦ A from the training data (D_train, A_train), and to test the performance of the ML methods on (D_test, A_test).

With the available snapshot data x(µ) and the constructed reduced basis Φ (or π, respectively), we can obtain the "best" linear ROM approximations by projecting all FOM solutions onto the reduced basis and then computing the associated "best" relative error: for all µ ∈ D,

x̂_best(µ) = (Φ^T Φ)^{-1} Φ^T x(µ),
compute x̃_best(µ) from x̂_best(µ) following Eq. (3.1) or Eq. (3.2),
ε(µ) = ||x(µ) − x̃_best(µ)|| / ||x(µ)||,   (4.1)

where x̂_best(µ) ∈ R^r are the "best" generalized coordinates (in the linear ROM sense). The training and testing data are then defined as

D_train = [µ_train^1, ..., µ_train^{n_train}]^T ∈ R^{n_train × n_µ};   A_train = [x̂_best(µ_train^1), ..., x̂_best(µ_train^{n_train})]^T ∈ R^{n_train × r},   (4.2)
D_test = [µ_test^1, ..., µ_test^{n_test}]^T ∈ R^{n_test × n_µ};   A_test = [x̂_best(µ_test^1), ..., x̂_best(µ_test^{n_test})]^T ∈ R^{n_test × r}.   (4.3)

The first model considered is multivariate polynomial regression (MPR), which approximates the outputs (or functions) α(µ) using a multivariate polynomial expansion and least-squares regression.
In this paper, we use polynomial functions up to fifth order to define the model. The model for a coefficient α_l(µ) can be written as

α_l(µ) = b_0 + Σ_{i=1}^{n_µ} b_i µ_i + Σ_{i=1}^{n_µ} Σ_{j=1}^{n_µ} c_ij µ_i µ_j + Σ_{i=1}^{n_µ} Σ_{j=1}^{n_µ} Σ_{k=1}^{n_µ} d_ijk µ_i µ_j µ_k + ...,   l ∈ N(r),   (4.4)

where b_i, i = 0, 1, ..., n_µ; c_ij, i, j ∈ N(n_µ); and d_ijk, i, j, k ∈ N(n_µ) are the coefficients of the approximation model, and µ_i, µ_j, µ_k are components of µ. Note that in this type of expansion, known as a full tensor-product grid, the number of basis elements scales like s^{n_µ}, where s is the polynomial degree. While this remains manageable for low n_µ, e.g., 2 or 3, the number of basis elements grows exponentially; for example, for n_µ = 7 and fifth-order polynomials the count grows to 78,125, which is why a total-order or sparse multi-index set is necessary in higher dimensions. In the offline/training stage, the unknown coefficients are determined via least-squares regression that minimizes the mean squared error between the predicted and actual training outputs. Once these coefficients are determined, in the online/evaluation stage one can quickly compute α(µ_test) for any given µ_test. (Note that in all numerical experiments in Section 5, the number of training samples is always larger than the number of unknown coefficients, ensuring that the least-squares problem is well-defined.)

The largest computational cost arises from solving the least-squares system, i.e., inverting the covariance term D_train^T D_train. The computational complexity depends on n_train, n_µ and the degree of the regression polynomial. As shown in Section 5, the polynomial regression model is much cheaper to train and to evaluate than the other, more complicated ML models (such as the neural network or random forest regressors).

The k-nearest-neighbors (kNN) model can be interpreted as a localized form of multivariate regression: the approximation is computed over a local set of k training samples that are the closest neighbors of the evaluation point µ in the input space. Due to the local nature of kNN approximations, the resulting models do not provide a global interpretation of the underlying relationship between inputs and outputs. kNN is therefore an unstructured but flexible scheme, balancing the simplicity of polynomial regression with the flexibility of localized models.

In this paper, we consider a standard implementation of kNN that approximates α(µ) by averaging the outputs associated with the k nearest neighbors. In the offline stage, a k-d tree partitions the data along each input dimension and can be constructed in O(n_µ n_train) time without explicitly calculating any n_µ-dimensional distances. This also allows fast nearest-neighbor searches with an average computational complexity of O(log(n_train)) [51, 52]. In the online stage, the predicted output of the kNN model is the weighted average of the training outputs at the neighbors; this is essentially a localized linear regression model.

The third model considered is random forest regression, which is an ensemble of decision trees, generally trained via the bagging method (or pasting method) [53]. A decision tree partitions the input domain into many regions and makes local average estimates of the output. This is performed in a top-down fashion where the data are recursively partitioned at each node using a greedy algorithm to group similar data together. Each node is assigned a split criterion, which aims to create subsets of data with similar values. Nodes in the tree continue to split the data into left and right child nodes using a series of logical statements.
This process continues until certain stopping criteria are satisfied and nodes cease to split, becoming leaf nodes. After the data are partitioned into multiple regions, a piecewise-constant estimate of the output is made in each region, and the random forest prediction averages the estimates of the individual trees in the ensemble. Building a tree takes on the order of O(n_µ n_train log(n_train)) operations. If a decision tree is approximately balanced, this allows fast predictions on the order of O(log(n_train)). One disadvantage of a random forest regression model is that it is not globally differentiable, as opposed to a polynomial model, which is infinitely differentiable.
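Both of these regressors are available as off-the-shelf multi-output models in scikit-learn. A minimal sketch is given below; the training and testing matrices D_train, A_train and D_test are assumptions following the notation of Section 4.1, and the hyper-parameter values simply reuse those of Table 2.

```python
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import RandomForestRegressor

# kNN: the prediction is a (distance-weighted) average over the k closest training points
knn = KNeighborsRegressor(n_neighbors=3, weights="distance", algorithm="kd_tree")
knn.fit(D_train, A_train)

# Random forest: an ensemble of decision trees trained by bagging
rf = RandomForestRegressor(n_estimators=271, max_leaf_nodes=16, max_features=2)
rf.fit(D_train, A_train)

# Both models map a parameter instance to the r basis-expansion coefficients
A_pred_knn = knn.predict(D_test)
A_pred_rf = rf.predict(D_test)
```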
Figure 1: Structure of a neural network with a three-dimensional input, two hidden layers and a two-dimensional output.

The last model considered is a fully connected, feed-forward neural network, or multi-layer perceptron model. Neural network models estimate the output α(µ) at an input µ using weights and biases that are adapted to the data during training. With this adaptivity, these models are structured but can achieve great flexibility. A disadvantage of using neural networks is that interpretation of the resulting model may be difficult.

A neural network consists of sequential layers of nodes that define a mapping between an input and an output. The basic structure of a neural network is shown in Figure 1. The first layer of nodes is called the input layer, followed by intermediate hidden layers, and ended with an output layer. The input layer directly receives the input to the model: the number of nodes in the input layer is equal to the number of inputs. Next, in each hidden (i.e., not an input or output) layer l, there are n_l nodes. The i-th node in layer l is denoted η_l^i for i ∈ {1, ..., n_l}. Each node takes an input, evaluates an activation function g, and produces an intermediate output o_l^i, which is used as an input to nodes in the next layer l + 1. Every node in a given layer is connected to every node in the following layer (fully connected), and information is passed forward through the network (feed-forward). Each of these connections is given a weight w_l^i and a bias b_l^i, which define the importance and the effect of a particular input on the output [55]. The activation function at each node is a function of h_l^i = w_l^i o_{l−1}^i + b_l^i and thus depends on the input to the node, the weight assigned to the connection to that node, and the bias. The output of node η_l^i is determined by evaluating the activation function, o_l^i = g(h_l^i). The last layer is the output layer, which produces the final output: the number of nodes in the last layer is equal to the number of output values.

To train a neural network, the training inputs D_train and corresponding training outputs A_train are provided to the model. The training process iterates forward and backward over the network, adjusting weights and biases to minimize the mean squared error between the predicted and actual training outputs [56]. Each pair of forward and backward passes is referred to as an epoch. The computational complexity of training a neural network is approximately linear in the size of the network (nodes and layers), which in turn depends on the dimensions of the input and output, and in the number of epochs, assuming a smooth activation function [57].

This section reports numerical experiments that assess the performance of the proposed space-time machine learning ROM (ST-ML-ROM) method on three benchmark problems: a steady heat PDE, an unsteady heat PDE and an advective-diffusive-reactive problem. In particular, the first numerical problem checks our methods against established results [24], the second problem extends the approach to unsteady problems, and the third problem checks feasibility when there are many spatial/temporal scales and the modes depend nonlinearly on the input parameters. We focus on the accuracy and speed of the four regression methods (polynomial, kNN, random forest and neural network) and compare the following models:

• FOM. This model corresponds to the full-order model, i.e., the solution satisfying Eq. (2.1).

• ST-ML-ROM.
This model corresponds to the data-driven space-time ROMs, i.e., the surrogate models constructed using each ML technique described in Sections 4.2-4.5.

For steady and unsteady problems, the accuracy of a ROM solution x̃(µ) is measured by computing the ℓ2-norm relative errors

relative error = sqrt( ||x(µ) − x̃(µ)||² / ||x(µ)||² ),   (5.1)

and

relative error = sqrt( Σ_{n=1}^{N_t} ||x^n(µ) − x̃^n(µ)||² ) / sqrt( Σ_{n=1}^{N_t} ||x^n(µ)||² ),   (5.2)

respectively. All timings are obtained by performing calculations in sklearn [58] on a Mac laptop with a 2.3 GHz Intel Core i9 and 16 GB of RAM. Reported timings comprise the average time over a specified number of repeated runs of the offline and online stages of each machine learning method used. We also note that the implementation of all ML methods in this paper is fairly straightforward, with standard settings and no special treatments such as regularization or ad hoc tuning.
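To make the offline/online workflow concrete, the following scikit-learn sketch trains one surrogate (the polynomial regressor of Section 4.2), predicts the coefficients at an unseen parameter, reconstructs the state via Eq. (3.1) and evaluates the relative error (5.1). The arrays Phi, x_ref, D_train, A_train, D_test and the FOM test snapshots X_test are assumptions following the notation above.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Offline: fit the map mu -> best coefficients, Eqs. (4.1)-(4.2)
mpr = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
mpr.fit(D_train, A_train)

# Online: predict coefficients at an unseen parameter and reconstruct Eq. (3.1)
mu_test = D_test[:1]                       # a single test parameter, shape (1, n_mu)
x_hat = mpr.predict(mu_test).ravel()       # predicted generalized coordinates
x_tilde = x_ref + Phi @ x_hat              # approximate state

# Accuracy: relative error of Eq. (5.1)
x_fom = X_test[:, 0]                       # corresponding FOM solution snapshot
rel_err = np.linalg.norm(x_fom - x_tilde) / np.linalg.norm(x_fom)
```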
Table 1: Steady heat equation, parameters of the finite element discretization
Finite element mesh
Number of elements: 3600
Number of nodes: 3721
Number of degrees of freedom: n

(a) Finite element mesh with 60x60 elements   (b) Finite element solution with µ = (10, ·)
Figure 2: Steady heat equation, finite element mesh and solution.

We first consider the model example introduced in Refs. [59, 13, 46]. This is a parametric nonlinear 2D heat problem which consists of computing u(x, µ), with x ≡ (x_1, x_2) ∈ Ω = [0, 1]² and µ ≡ (µ_1, µ_2) ∈ D = [0.01, 10]², subject to the homogeneous Dirichlet boundary condition on Γ ≡ ∂Ω and satisfying
−∇²u + (µ_1/µ_2)(e^{µ_2 u} − 1) = 100 sin(2π x_1) sin(2π x_2).   (5.3)

This model can be interpreted as a 2D stationary diffusion problem with a nonlinear interior heat source. The resulting solution exhibits a strongly nonlinear dependence on the parameters µ. Note that we use this steady problem to verify all of the machine learning methods developed earlier (as in the work [24]).

For the spatial discretization, we apply the finite-element method using a mesh of 3600 (60x60) bilinear quadrilateral (Q1) elements. Figure 2a depicts the mesh, while Table 1 reports the corresponding discretization parameters. Figure 2b plots the FOM reference solution on the mesh for an input parameter µ = (10, ·).

We use Latin hypercube sampling with a uniform distribution [60] to generate a total of 300 sample points for the input parameters µ in D. These 300 sample points are split into 200 training samples (D_train) and 100 testing samples (D_test), where D_train is used for training and D_test for testing. Figure 3a depicts the Latin hypercube samples, while Figure 3b plots the training and testing sample sets (D_train and D_test). We also note that the 300 samples comprise D_train ∪ D_test and that D_train ∩ D_test = ∅.

To generate the reduced basis required for the reduced-order models, we solve the FOM (2.1) for all µ ∈ D_train, i.e., X_train = [x(µ_train^1), ..., x(µ_train^{n_train})], and then feed X_train to Algorithm 1 to obtain the reduced basis. Figure 4 shows the results of Algorithm 1: Figures 4a and 4b plot the singular values and the corresponding cumulative energy of the spatial SVD. Figure 4b implies that we only need the first 5 POD basis vectors to capture more than 99.99% of the energy of the system, hence we choose p = 5 for all computations afterward. Note that we also solve the FOM (2.1) for all µ ∈ D_test, i.e., X_test = [x(µ_test^1), ..., x(µ_test^{n_test})], in order to compute the relative solution errors of all proposed ML-ROM techniques later.

Next, we obtain the "best" linear ROM approximations by projecting all FOM solutions onto the POD basis and then computing the associated "best" relative errors following Eq. (4.1). Figure 5 plots the "best" relative solution errors for µ ∈ D.

(a) Latin Hypercube Sampling, uniform distribution   (b) Stratified split into training and testing samples
Figure 3: Steady heat equation, sampling the input parameter domain.
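The sampling and split described above can be reproduced in a few lines of Python; the bounds in this minimal sketch are an assumption matching the parameter domain stated earlier.

```python
import numpy as np
from scipy.stats import qmc
from sklearn.model_selection import train_test_split

# 300 Latin-hypercube samples of mu = (mu_1, mu_2) in D (bounds assumed)
sampler = qmc.LatinHypercube(d=2, seed=0)
samples = qmc.scale(sampler.random(n=300), l_bounds=[0.01, 0.01], u_bounds=[10.0, 10.0])

# split into 200 training and 100 testing parameter instances
D_train, D_test = train_test_split(samples, train_size=200, random_state=0)
```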
(a) Singular values   (b) Relative cumulative energy
Figure 4: Steady heat equation, singular value decomposition.

We now use the four machine learning techniques described in Section 4 to approximate the mapping µ ↦ x̂_best(µ) by µ ↦ x̂(µ) in the offline (training) stage, and then compute the approximate ROM solution of Eq. (3.1) and the associated relative error of Eq. (5.1) in the online (prediction) stage. Note that D_train is used to build the surrogate models and D_test is used to assess the performance (accuracy and speed) of these surrogate models, as described in Section 4.1.

(a)   (b)
Figure 5: Steady heat equation, "best" relative solution error with µ ∈ D.

We compare the four methods for fixed values of their parameters and for a single randomly selected online point µ_test ∈ D_test. In particular, we use a 2nd-order polynomial regressor, a kNN regressor with 3 neighbors, a random forest regressor with configuration (max_features=2, max_leaf_nodes=16, n_estimators=271) and a neural network with configuration (n_hidden=2, n_neurons=50, learning_rate=10⁻·, activation=tanh), respectively. (We refer to [58] for details about these hyper-parameters.) Note that the configurations and hyper-parameters of these regressors are chosen somewhat arbitrarily to check their performance; we perform a more rigorous parameter study in Section 5.1.4.

Table 2 reports the chosen hyper-parameters and the associated performance of the methods, while Figure 6 and Figure 7 visualize the ML-ROM solutions and solution errors of all methods, respectively. The results in Table 2 and Figures 6 and 7 verify that the proposed methods yield accurate solutions, as anticipated [24].

(a) FOM solution   (b) 2nd-order polynomial regression   (c) kNN (3 neighbors)   (d) Random forest   (e) Neural network
Figure 6: Steady heat equation, FOM solution and ROM predictions using the four methods with the configurations defined in Table 2 at a point µ_test ∈ D_test.

(a) 2nd-order polynomial regression   (b) kNN (3 neighbors)   (c) Random forest   (d) Neural network
Figure 7: Steady heat equation, solution error using the four methods with the configurations defined in Table 2 at a point µ_test ∈ D_test.

Table 2: Steady heat equation, ML-ROM method performance at a point µ_test ∈ D_test for one online computation.
method         | polynomial regression | kNN         | random forest                                        | neural network
configuration  | 2nd order             | 3 neighbors | (max_features=2, max_leaf_nodes=16, n_estimators=271) | (n_hidden=2, n_neurons=50, learning_rate=10⁻·, activation=tanh)
relative error | ·×10⁻·                | ·×10⁻·      | ·×10⁻·                                               | ·×10⁻·

Because assessing a given method's performance for a single instance of its parameters does not lend insight into the model's complete error-cost tradeoff, this section subjects each of the proposed ROMs to a parameter study wherein each model parameter is varied within relatively wide ranges. In this spirit, the degree of the polynomial regressor is varied in the range 1 ≤ r ≤ 5.
For the kNN regressor, the number of neighbors is varied in the range 1 ≤ k ≤ 19.
For the random forest regressor, we first perform a cross-validation randomized search [61] with the hyper-parameter limits specified in Table 3; Table 4 reports the hyper-parameter values associated with the nine best configurations of the random forest regressor resulting from the randomized search. For the neural network regressor, we also first perform a cross-validation randomized search [61] with the hyper-parameter ranges specified in Table 5; Table 6 reports the hyper-parameter values associated with the nine best configurations of the neural network regressor after the randomized search.

Table 3: Steady heat equation, hyper-parameter ranges of the cross-validation randomized search for the random forest regressor
Hyper-parameters | Range/Limit
max features     | [1, 3]
max leaf nodes   | [1, 300]
n estimators     | [1, 500]

Table 4: Steady heat equation, hyper-parameters of the nine best configurations of the random forest regressor
Hyper-parameters | config 1 | config 2 | config 3 | config 4 | config 5 | config 6 | config 7 | config 8 | config 9
max features     |    ·     |    ·     |    ·     |    ·     |    ·     |    ·     |    ·     |    ·     |    ·
max leaf nodes   |   16     |   47     |    3     |    7     |   23     |   24     |   38     |   21     |    8
n estimators     |  271     |  446     |  485     |   21     |  331     |  492     |  386     |  161     |  189

Table 5: Steady heat equation, hyper-parameter ranges of the cross-validation randomized search for the neural network regressor
Hyper-parameters | Range/Limit
n hidden         | { }
n neurons        | [1, 100]
learning rate    | [10⁻·, 10⁻·]
activation       | {relu, tanh, sigmoid}

Table 6: Steady heat equation, hyper-parameters of the nine best configurations of the neural network regressor
Hyper-parameters | config 1 | config 2 | config 3 | config 4 | config 5 | config 6 | config 7 | config 8 | config 9
n hidden         |    ·     |    ·     |    ·     |    ·     |    ·     |    ·     |    ·     |    ·     |    ·
n neurons        |   79     |    2     |   90     |   39     |   13     |   98     |   17     |   54     |   86
learning rate    |  ·×10⁻·  |  ·×10⁻·  |  ·×10⁻·  |  ·×10⁻·  |  ·×10⁻·  |  ·×10⁻·  |  ·×10⁻·  |  ·×10⁻·  |  ·×10⁻·
activation       |  tanh    | sigmoid  |  relu    |  tanh    |  relu    | sigmoid  |  relu    |  relu    |  tanh

Table 7: Steady heat equation, number of repeated runs of the offline and online stages for each configuration of the four regressors
methods       | polynomial regressor | kNN regressor | random forest | neural network
offline stage | 2000                 | 2000          | 500           | 5
online stage  | 2000                 | 2000          | 500           | 1000

We then investigate the accuracy of these configurations by computing their relative solution errors for all µ ∈ D_test. The computational results are shown in Figure 8. In particular, the relative errors are presented as box plots in which the median, the 25th and 75th percentiles, the most extreme data points and the outliers are shown together. For reference, the "best" relative error defined in (4.1) is also added to Figure 8 as the lowermost box. Figure 8 shows that the proposed methods yield accurate solutions and error levels that are reasonable for engineering problems.

(a) Polynomial regression   (b) k-nearest neighbors   (c) Random forest   (d) Neural network
Figure 8: Steady heat equation, relative solution error with µ ∈ D_test. The lowermost box is the "best" relative error defined in Eq. (4.1); all upper boxes are for the ML-ROM using the four machine learning techniques. (On each box, the central red mark indicates the median and the bottom and top blue edges indicate the 25th and 75th percentiles. The whiskers extend to the most extreme data points not considered outliers, and the outliers are plotted with red plus symbols.)

We also examine the computational time of the considered configurations. To quantify the computation time, the offline and online stages of each configuration of the four methods are repeated many times and the reported time is the average over these repeated runs. Table 7 lists the number of repeated runs. We emphasize that the reported time only measures the time to compute µ → x̂(µ) and does not include the step x̂(µ) → x̃(µ), as that step is the same for all regressors. Figures 9a and 9b report the training time and prediction time of several configurations of the four regressors. In general, we observe that the neural network takes much more time to train and to predict than the other three techniques.

From the results in Figures 8 and 9b, we construct a Pareto front to characterize the error-cost tradeoff of each method. This Pareto front is characterized by the method hyper-parameters that minimize the competing objectives of relative error and computational time.

(a) Training time   (b) Prediction time
(c) Pareto front
Figure 9: Steady heat equation, training and prediction time and Pareto fronts of the four ML-ROM techniques: PR = Polynomial Regression, kNN = k-Nearest Neighbors, RF = Random Forest and NN = Neural Network.

Figure 9c reports the Pareto fronts of the four methods considered. From Figure 9c, we observe a trend very similar to that in the work [24], in which simple ML techniques (such as the polynomial, kNN and random forest regressors) outperform complex ones (such as the neural network) in terms of accuracy and computational cost (for this specific example). For example, at the same accuracy level the neural network is about two orders of magnitude slower than the polynomial and kNN regressors, and one order of magnitude slower than the random forest regressor. The conclusion is similar to that in [24]: in engineering problems, where training data is often expensive to obtain and sparse, simple ML techniques can outperform complex ones because the limited amount of training data limits the performance of the more complicated ML techniques.
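The cross-validation randomized searches behind Tables 3-6 follow the standard scikit-learn pattern. A minimal sketch for the random forest case is shown below; the ranges mirror Table 3, while D_train, A_train and the search budget are assumptions.

```python
from scipy.stats import randint
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# hyper-parameter ranges mirroring Table 3
param_dist = {
    "max_features": randint(1, 4),        # [1, 3]
    "max_leaf_nodes": randint(2, 301),    # [1, 300]; scikit-learn requires >= 2
    "n_estimators": randint(1, 501),      # [1, 500]
}
search = RandomizedSearchCV(RandomForestRegressor(), param_dist,
                            n_iter=50, cv=5, random_state=0)
search.fit(D_train, A_train)
best_rf = search.best_estimator_          # one of the "best configurations"
```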
We finally provide an accuracy comparison of the proposed methods with the well-known LSPG-ROM method [44, 11, 45]. The LSPG-ROM method is implemented using our in-house code [46] with standard settings. Figure 10 shows all relative solution errors of the proposed methods and of the LSPG-ROM method, together with the "best" relative errors (4.1). (In other words, Figure 10 comprises Figure 8 with the relative errors of the LSPG-ROM method added.) From Figure 10, we observe that LSPG-ROM provides very accurate solutions that are very close to the "best" possible linear ROM solutions, and that the LSPG-ROM solutions are much more accurate than those of the proposed methods. However, the computational cost of LSPG-ROM (without hyper-reduction) is much higher than that of our proposed methods: LSPG-ROM needs to solve a nonlinear optimization problem in the online stage whose computational cost still depends on the full-order model dimension, while our proposed methods have an online cost that is completely independent of the FOM dimension.

(a) Polynomial regression   (b) k-nearest neighbors   (c) Random forest   (d) Neural network
Figure 10: Steady heat equation, relative solution error with µ ∈ D_test. The lowermost box is the "best" relative error defined in Eq. (4.1), the middle (green) box is for LSPG-ROM [44], and all upper boxes are for the ML-ROM using the four machine learning techniques.

We now consider the model example introduced in Refs. [59, 13]. This is a parametric nonlinear 2D unsteady heat problem which consists of computing u(x, t, µ) with x ≡ (x_1, x_2) ∈ Ω = [0, 1]², t ∈ [0, T] with T = 2, and µ ≡ (µ_1, µ_2) ∈ D = [0.01, 10]², subject to the homogeneous Dirichlet boundary condition on Γ ≡ ∂Ω, such that
ρ ∂u/∂t − ∇²u + (µ_1/µ_2)(e^{µ_2 u} − 1) = ρ b(t) sin(2π x_1) sin(2π x_2),   (5.4)

(a) Finite element mesh with 60x60 elements   (b) Finite element solution at t = 1.5 s with µ = (3, ·)
Figure 11: Unsteady heat equation, finite element mesh and solution.

Here ρ = 10 is a constant and b(t) = sin(2πt) is the control input. This model can be interpreted as a 2D dynamic diffusion problem with a nonlinear interior heat-source density, and the solution again depends nonlinearly on the input parameter µ.
For the spatial discretization, we use a finite element mesh with 3600 (60 × 60) bilinear quadrilateral (Q1) elements, which results in a system of parameterized ODEs of the form (2.1). Table 1 also reports the spatial discretization parameters used for this model. For the time discretization, we employ the first-order backward Euler scheme, which is the linear multistep method characterized by k = 1, α_0 = β_0 = 1, α_1 = −1, β_1 = 0 in (2.3). We use a uniform time step Δt = 8 × 10⁻³, leading to N_t = 250 time instances. Figure 11a depicts the mesh, while Figure 11b plots the FOM reference solution at time instance t = 1.5 s with µ = (3, ·).
(a) Latin Hypercube Sampling, uniform distribution   (b) Stratified split into training and testing samples
Figure 12: Unsteady heat equation, sampling the input parameter domain.

(a) Spatial singular values   (b) Spatial cumulative energy   (c) Temporal singular values associated with Φ_s   (d) Temporal cumulative energy associated with Φ_s
Figure 13: Unsteady heat equation, singular value decomposition.

Similar to Section 5.1.2, we use Latin hypercube sampling with a uniform distribution [60] to generate a total of 300 sample points for the input parameter µ in D; this set is denoted D_total. These 300 sample points are split into 200 training samples (D_train) and 100 testing samples (D_test), where D_train is used for training and D_test for testing. Figure 12a depicts the Latin hypercube samples, while Figure 12b plots the training and testing sample sets (D_train and D_test). We also note that D_total = D_train ∪ D_test and D_train ∩ D_test = ∅.

To generate the space-time reduced bases required for the reduced-order models, we solve the FOM (2.2) for all µ ∈ D_train, i.e., X_train = [x(t^n, µ_train^1), ..., x(t^n, µ_train^{n_train})], n ∈ N(N_t), and then feed X_train to Algorithm 2 to obtain the space-time bases. Figure 13 shows the results of this step. In particular, Figures 13a and 13b present the singular values and cumulative energy of the spatial SVD, while Figures 13c and 13d plot the singular values and cumulative energy of the temporal SVD associated with Φ_s. From Figures 13b and 13d, we choose n_s = 3 spatial basis vectors and n_t = 2 temporal basis vectors, hence n_st = n_s n_t = 6 space-time reduced basis vectors, for all computations afterward. Again, we also solve the FOM (2.2) for all µ ∈ D_test, i.e., X_test = [x(t^n, µ_test^1), ..., x(t^n, µ_test^{n_test})], n ∈ N(N_t), to compute the relative solution errors of all proposed ML-ROM techniques. We finally obtain the "best" linear ROM approximations by projecting all space-time FOM solutions onto the space-time bases and then computing the associated "best" relative errors following Eq. (4.1).

(a)   (b)
Figure 14: Unsteady heat equation, "best" relative error for µ ∈ D_total with n_s = 3, n_t = 2, p = n_s n_t = 6.

Figure 14 plots the "best" relative solution errors for µ ∈ D_total using the n_s, n_t and n_st chosen above. We now use the four machine learning techniques described in Section 4 to approximate the mapping µ → x̂(µ) in the offline (training) stage, and then compute the approximate ROM solution of Eq. (3.2) and the associated relative error of Eq. (5.2) in the online (prediction) stage. Note again that D_train is used to build the surrogate models and D_test is used to assess the performance (accuracy and speed) of these surrogate models.

Table 8: Unsteady heat equation, ST-ML-ROM method performance at a point µ_test ∈ D_test for one online computation.
method         | polynomial regression | kNN         | random forest                                          | neural network
configuration  | 2nd order             | 3 neighbors | (max_features=2, max_leaf_nodes=44, n_estimators=162)  | (n_hidden=4, n_neurons=75, learning_rate=3.·×10⁻·, activation=sigmoid)
relative error | ·×10⁻·                | ·×10⁻·      | ·×10⁻·                                                 | ·×10⁻·

We now compare the four methods for fixed values of their parameters, and for a single randomly selected online point µ_test ∈ D_test.
In particular, we use a 2nd-order polynomial regressor, a kNN regressor with 3 neighbors, a random forest regressor with configuration (max_features=2, max_leaf_nodes=44, n_estimators=162) and a neural network with configuration (n_hidden=4, n_neurons=75, learning_rate=3.·×10⁻·, activation=sigmoid), respectively. Note that the configurations and hyper-parameters of these regressors are chosen somewhat arbitrarily to check their performance; we perform a more rigorous parameter study in Section 5.2.4. Table 8 reports the chosen hyper-parameters and the associated performance of the methods, while Figure 15 and Figure 16 visualize the ST-ML-ROM solutions and solution errors of all methods at time instance t = 1.5 s.

(a) FOM solution   (b) 2nd-order polynomial regression   (c) kNN (3 neighbors)   (d) Random forest   (e) Neural network
Figure 15: Unsteady heat equation, solution visualization using the four methods with the configurations specified in Table 8 at time t = 1.5 s and µ_test ∈ D_test.

Table 9: Unsteady heat equation, hyper-parameters of the nine best configurations of the random forest regressor
Hyper-parameters | config 1 | config 2 | config 3 | config 4 | config 5 | config 6 | config 7 | config 8 | config 9
max features     |    ·     |    ·     |    ·     |    ·     |    ·     |    ·     |    ·     |    ·     |    ·
max leaf nodes   |  264     |  167     |   49     |  106     |   44     |   22     |    2     |   88     |  135
n estimators     |  431     |  274     |  475     |  260     |  162     |  253     |  390     |  373     |   21

Table 10: Unsteady heat equation, hyper-parameters of the nine best configurations of the neural network regressor
Hyper-parameters | config 1 | config 2 | config 3 | config 4 | config 5 | config 6 | config 7 | config 8 | config 9
n hidden         |    ·     |    ·     |    ·     |    ·     |    ·     |    ·     |    ·     |    ·     |    ·
n neurons        |   99     |   43     |   67     |   79     |   75     |   28     |   30     |   67     |   71
learning rate    |  ·×10⁻·  |  ·×10⁻·  |  ·×10⁻·  |  ·×10⁻·  |  ·×10⁻·  |  ·×10⁻·  |  ·×10⁻·  |  ·×10⁻·  |  ·×10⁻·
activation       | sigmoid  |  tanh    |  tanh    | sigmoid  | sigmoid  |  tanh    | sigmoid  |  tanh    | sigmoid

We again compare the performance of the ROM methods across a wide variation of all method parameters. In particular, the degree of the polynomial regressor is varied in the range 1 ≤ r ≤ 5.
For the kNN regressor, the number of neighbors is varied in the range 1 ≤ k ≤ 19.
For the random forest and neural network regressors, we employ the same approach as described in Section 5.1.4 to find several near-optimal configurations of each.

(a) 2nd-order polynomial regression   (b) kNN (3 neighbors)   (c) Random forest   (d) Neural network
Figure 16: Unsteady heat equation, solution error using the four methods with the configurations defined in Table 8 at time t = 1.5 s and µ_test ∈ D_test.

Since the input-output mapping of this problem (R² → R⁶) is very similar to that of the previous problem in Section 5.1 (R² → R⁵), we use the same randomized-search ranges for the hyper-parameters of the random forest and neural network regressors as specified in Tables 3 and 5, respectively. Tables 9 and 10 report the hyper-parameter values associated with the nine best configurations of the random forest and neural network regressors resulting from the corresponding cross-validation randomized searches.

We then investigate the accuracy of these configurations by computing their relative solution errors, together with the relative solution errors of the space-time LSPG-ROM method [43] and the "best" relative error (4.1), for all µ ∈ D_test. Figure 17 reports all these relative solution errors. We observe a similarity between Figure 17 and Figure 10: the space-time LSPG-ROM provides very accurate solutions that are close to the "best" possible linear ROM solutions, and the space-time LSPG-ROM solutions are slightly more accurate than those of our proposed methods. Again, we also note that the computational cost of space-time LSPG-ROM (without hyper-reduction) is much higher than that of our proposed methods (see Section 5.1.5).

The computational times of the considered configurations are also examined. We apply the same procedure described in Section 5.1.4 to measure computational times; Table 7 again lists the number of repeated offline and online runs. Figures 18a and 18b report the training time and the Pareto front for online predictions, respectively. Again, Figure 18b shows a trend very similar to Figure 9c, in which simple ML techniques (polynomial, kNN and random forest regressors) outperform the more complex one (neural network) in terms of accuracy and computational cost for this problem.

We also investigate the effect of the amount of training data on the computational results. In particular, within the training data set D_train, we pick a random nonstratified partition [62] that has 75% of the size of D_train to train all ML models, and then evaluate the accuracy of these models as in Section 5.2.4 above. Figure 19 reports the relative solution errors together with the "best" relative error (4.1) for all µ ∈ D_test. Comparing Figure 19 with Figure 17 shows that all ML algorithms still work well and provide accurate solutions even when using only 75% of the training data. (It is, of course, expected that using much less training data would adversely affect the performance of the ML algorithms.)

(a) Polynomial regression   (b) k-nearest neighbors   (c) Random forest   (d) Neural network
Figure 17: Unsteady heat equation, relative solution error with µ ∈ D_test. The lowermost box is the "best" relative error defined in Eq. (4.1), the middle (green) box is for space-time LSPG-ROM [43], and all upper boxes are for the ST-ML-ROM using the four machine learning techniques.

(a) Training time   (b) Pareto front
Figure 18: Unsteady heat equation, training time and Pareto front of the four ST-ML-ROM techniques: PR = Polynomial Regression, kNN = k-Nearest Neighbors, RF = Random Forest and NN = Neural Network.

We now consider an advective-diffusive-reactive system modeling the premixed combustion of a H₂-O₂ mixture at constant pressure.
We now consider an advective–diffusive–reactive system modeled on the premixed combustion of an H2–O2 mixture at constant pressure. The governing equation system is

\[
\frac{\partial w(\vec{x},t;\mu)}{\partial t} = \nabla \cdot \big(\kappa \nabla w(\vec{x},t;\mu)\big) - v \cdot \nabla w(\vec{x},t;\mu) + q\big(w(\vec{x},t;\mu);\mu\big) \quad \text{in } \Omega \times \mathcal{D} \times [0,T], \tag{5.5}
\]

where ∇ is the spatial gradient, κ is the molecular diffusivity, v is the velocity field, and

\[
w(\vec{x},t;\mu) \equiv \big[w_T(\vec{x},t;\mu),\, w_{H_2}(\vec{x},t;\mu),\, w_{O_2}(\vec{x},t;\mu),\, w_{H_2O}(\vec{x},t;\mu)\big]^T \in \mathbb{R}^4
\]

denotes the state vector containing the temperature w_T(x⃗,t;µ) and the mass fractions w_i(x⃗,t;µ) of the chemical species i ∈ {H2, O2, H2O}. The Arrhenius reaction source term q(w;µ) ≡ [q_T(w;µ), q_{H2}(w;µ), q_{O2}(w;µ), q_{H2O}(w;µ)]^T is

\[
q_T(w;\mu) = Q\, q_{H_2O}(w;\mu), \qquad
q_i(w;\mu) = -\nu_i \left(\frac{W_i}{\rho}\right) \left(\frac{\rho\, w_{H_2}}{W_{H_2}}\right)^{\nu_{H_2}} \left(\frac{\rho\, w_{O_2}}{W_{O_2}}\right)^{\nu_{O_2}} A \exp\left(-\frac{E}{R\, w_T}\right), \quad i \in \{H_2, O_2, H_2O\},
\]

where (ν_{H2}, ν_{O2}, ν_{H2O}) = (2, 1, −2) are the stoichiometric coefficients, (W_{H2}, W_{O2}, W_{H2O}) = (2. , . , ) g·mol⁻¹ are the molecular weights, ρ = 1.×10⁻ g·cm⁻³ is the density of the mixture, R = 8.314 J·mol⁻¹·K⁻¹ is the universal gas constant, and Q = 9800 K captures the heat of the reaction. The n_µ = 2 uncertain parameters are µ = (A, κ): the pre-exponential factor A and the molecular diffusivity κ. The parameter domain is set to D = [2.×10 , ×10 ] × [1 , ], the velocity field is v = [50 cm·s⁻¹, ]^T, the activation energy is E = 7.×10 , and the final time is T = 0.04 s. A small sketch of evaluating this reaction source term numerically is given below.
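The following NumPy sketch shows how this source term can be evaluated pointwise. The numeric values of the molecular weights, the mixture density, the pre-exponential factor and the activation energy are placeholders (several of these constants are not fully legible in the source text), so they should be read as assumptions rather than the values used in our experiments.

```python
import numpy as np

# Stoichiometric coefficients and constants from Eq. (5.5); W, RHO and the
# example values of A and E are placeholders standing in for constants whose
# digits are not legible in the source.
NU = {"H2": 2.0, "O2": 1.0, "H2O": -2.0}
W = {"H2": 2.0, "O2": 32.0, "H2O": 18.0}   # g/mol (placeholder values)
RHO = 1.0e-3                               # g/cm^3 (placeholder value)
R_GAS = 8.314                              # J/(mol K)
Q_HEAT = 9800.0                            # K

def arrhenius_source(w_T, w_H2, w_O2, A, E):
    """Evaluate q(w; mu) = [q_T, q_H2, q_O2, q_H2O] at one spatial point."""
    # Common Arrhenius rate factor shared by all species equations.
    rate = ((RHO * w_H2 / W["H2"]) ** NU["H2"]
            * (RHO * w_O2 / W["O2"]) ** NU["O2"]
            * A * np.exp(-E / (R_GAS * w_T)))
    q = {s: -NU[s] * (W[s] / RHO) * rate for s in ("H2", "O2", "H2O")}
    q_T = Q_HEAT * q["H2O"]
    return np.array([q_T, q["H2"], q["O2"], q["H2O"]])

# Example call with illustrative state values and parameters.
print(arrhenius_source(w_T=1000.0, w_H2=0.01, w_O2=0.2, A=5.0e11, E=7.0e3))
```

In the full-order model this evaluation is carried out at every grid point and time instance, which is what makes the problem nonlinear and expensive to simulate.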
Figure 20a shows the geometry of the spatial domain. At the inflow boundary, Dirichlet boundary conditions are imposed on the mass fractions of H2 and O2 (with w_{H2O} = 0), together with w_T = 950 K for the temperature. On two further boundaries, homogeneous Dirichlet conditions are applied to the mass fractions and the temperature is set to w_T = 300 K; on the remaining three boundaries, homogeneous Neumann conditions are imposed on both the temperature and the mass fractions. For the initial conditions, we use w_T = 300 K and w_{H2} = w_{O2} = w_{H2O} = 0.

Eq. (5.5) is solved on a 65 × 32 finite-difference mesh (see Fig. 20b). A second-order backward difference formula (k = 2, α_0 = 1, α_1 = −4/3, α_2 = 1/3, β_0 = 2/3, and β_1 = β_2 = 0) is used for the temporal discretization. This results in a nonlinear system of the form of Eq. (2.3) with n = 8192 degrees of freedom; the time step ∆t = 2 × 10⁻⁴ s implies N_t = 200 time instances to solve the problem.

We again use Latin hypercube sampling with a uniform distribution to generate a total of 400 sample points for the input parameter µ in D; this sample set is also denoted by D. The 400 sample points are split into 320 training samples (D_train) and 80 testing samples (D_test), used for training and testing, respectively. Within D_train, only the first 150 points are used to build the space–time bases because of implementation constraints (the space–time implementation for this problem is more expensive than the previous example in Section 5.2 in terms of cost and memory, owing to the fairly large number of spatial degrees of freedom N_s = 8192 and the stronger nonlinearity). Figure 21a depicts the training and testing sample sets, while Figure 21b plots the training sample points (within D_train) used to build the space–time bases. Note that D = D_train ∪ D_test and D_train ∩ D_test = ∅.

To generate the space–time reduced bases required for the reduced-order models, we solve the FOM (2.2) for all µ ∈ D_train, i.e., \(X_{\text{train}} = [x(t^n, \mu_1), \ldots, x(t^n, \mu_{n_{\text{train}}})],\ n \in \mathbb{N}(N_t)\), and feed X_train to Algorithm 2 to obtain the space–time bases; a minimal sketch of this two-stage SVD construction is given below. Figure 22 reports the results of this step: Figures 22a and 22b present the singular values and cumulative energy of the spatial SVD, while Figures 22c and 22d plot the singular values and cumulative energy of the temporal SVD associated with Φ_s. Based on Figures 22b and 22d and some trial-and-error experiments, we choose n_s = 20 spatial basis vectors and n_t = 3 temporal basis vectors, hence n_st = n_s n_t = 60 space–time reduced basis vectors for all subsequent computations.

We also solve the FOM (2.2) for all µ ∈ D_test, i.e., \(X_{\text{test}} = [x(t^n, \mu_1), \ldots, x(t^n, \mu_{n_{\text{test}}})],\ n \in \mathbb{N}(N_t)\), in order to compute the relative solution errors of all proposed ST-ML-ROM techniques.
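The space–time basis referenced above is obtained by two successive SVDs (our Algorithm 2). The snippet below is only a minimal sketch of that idea under simplifying assumptions (dense snapshot storage, plain truncated SVDs, no weighting); the variable names, data sizes and truncation levels are illustrative and do not reproduce the paper's implementation.

```python
import numpy as np

def space_time_bases(snapshots, n_s, n_t):
    """Sketch of a two-stage SVD construction of space-time bases.

    snapshots: array of shape (N_s, N_t, n_train) holding FOM states
               x(t^n, mu_k) for every time instance and training parameter.
    Returns a spatial basis Phi_s (N_s x n_s) and, for each retained spatial
    mode, a temporal basis (N_t x n_t).
    """
    N_s, N_t, n_train = snapshots.shape

    # Stage 1: spatial SVD of all snapshots gathered column-wise.
    X = snapshots.reshape(N_s, N_t * n_train)
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    Phi_s = U[:, :n_s]

    # Stage 2: temporal SVD of the generalized-coordinate trajectories
    # associated with each retained spatial mode.
    coords = np.einsum("si,stk->itk", Phi_s, snapshots)   # (n_s, N_t, n_train)
    Phi_t = np.empty((n_s, N_t, n_t))
    for i in range(n_s):
        U_t, _, _ = np.linalg.svd(coords[i], full_matrices=False)
        Phi_t[i] = U_t[:, :n_t]
    return Phi_s, Phi_t

# Illustrative usage with small random data standing in for FOM snapshots.
rng = np.random.default_rng(0)
Phi_s, Phi_t = space_time_bases(rng.standard_normal((400, 50, 30)), n_s=20, n_t=3)
print(Phi_s.shape, Phi_t.shape)   # (400, 20) (20, 50, 3)
```

The retained dimensions n_s and n_t play the role of the truncation levels chosen from the cumulative-energy plots in Figure 22.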
Figure 20: Reacting flow, geometry and finite-difference grid. Panels: (a) geometry of the spatial domain Ω with its boundary segments Γ_i (an 18 mm dimension is annotated), (b) finite-difference grid.

Figure 21: Reacting flow, sampling of the input parameter domain. Panels: (a) stratified split into training and testing samples, (b) training samples used to build the space–time bases.
We finally obtain the “best” linear ROM approximations by projecting all space–time FOM solutions onto the space–time bases and then compute the associated “best” relative errors following Eq. (4.1). Figure 23 plots these “best” relative solution errors for µ ∈ D using the values of n_s, n_t and n_st chosen above.

We then build the surrogate models (using the four machine learning techniques described in Section 4) for the mapping µ ↦ x̂(µ). The surrogate models are built in the offline (training) stage using the data in D_train, and their performance (accuracy and speed) is evaluated in the online (prediction) stage using the data in D_test; a minimal sketch of this offline/online workflow is given below.

We first compare the four methods for fixed values of their hyper-parameters at a single randomly selected online point µ_test ∈ D_test. In particular, we use a 2nd-order polynomial regressor, a kNN regressor with 3 neighbors, a random forest regressor with configuration (max leaf nodes = 116, n estimators = 81), and a neural network with configuration (n hidden = 10, n neurons = 19, learning rate = 1.×10⁻, activation = sigmoid). Table 11 reports the chosen hyper-parameters and the associated performance of the methods, while Figure 24 visualizes the ST-ML-ROM solution and solution error of the 2nd-order polynomial regressor. The results in Table 11 and Figure 24 verify that our proposed methods yield very accurate solutions.

Figure 22: Reacting flow, singular value decomposition. Panels: (a) spatial singular values, (b) spatial cumulative energy, (c) temporal singular values associated with Φ_s, (d) temporal cumulative energy associated with Φ_s.

Table 11: Reacting flow, ST-ML-ROM performance at the point µ_test ∈ D_test for one online computation.
method: polynomial regression | kNN | random forest | neural network
configuration: 2nd order | 3 neighbors | (max leaf nodes = 116, n estimators = 81) | (n hidden = 10, n neurons = 19, learning rate = 1.×10⁻, activation = sigmoid)
×10⁻ | ×10⁻ | ×10⁻ | ×10⁻

Figure 23: Reacting flow, “best” relative error for µ ∈ D with n_s = 20, n_t = 3, n_st = n_s n_t = 60. Panels: (a), (b).
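The offline/online workflow referenced above can be summarized by the following sketch, which fits one regressor per space–time generalized coordinate and then reconstructs the space–time prediction and a relative error in the spirit of Eq. (4.1). The regressor choice, variable names and data shapes are illustrative assumptions, not the exact implementation used here.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

def train_st_ml_rom(mu_train, alpha_train):
    """Offline stage: fit one regressor per space-time generalized coordinate."""
    return [KNeighborsRegressor(n_neighbors=3).fit(mu_train, alpha_train[:, i])
            for i in range(alpha_train.shape[1])]

def predict_st_ml_rom(models, mu, Phi_st):
    """Online stage: predict the coordinates, then reconstruct the space-time state."""
    alpha = np.array([m.predict(mu.reshape(1, -1))[0] for m in models])
    return Phi_st @ alpha          # approximate space-time solution

# Illustrative stand-ins for the FOM-derived quantities (small sizes for brevity).
rng = np.random.default_rng(0)
n_train, n_mu, n_st, n_spacetime = 320, 2, 60, 1000
Phi_st = rng.standard_normal((n_spacetime, n_st))        # space-time basis
mu_train = rng.uniform(size=(n_train, n_mu))             # training parameters
alpha_train = rng.standard_normal((n_train, n_st))       # training coordinates

models = train_st_ml_rom(mu_train, alpha_train)
x_pred = predict_st_ml_rom(models, rng.uniform(size=n_mu), Phi_st)

# Relative solution error against a reference space-time solution.
x_ref = rng.standard_normal(n_spacetime)
print(np.linalg.norm(x_ref - x_pred) / np.linalg.norm(x_ref))
```

Fitting one regressor per coordinate, rather than a single multi-output model, is one reason the simpler regressors remain competitive in this setting (see the concluding discussion).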
Table 12: Reacting flow, hyper-parameter ranges of the cross-validation randomized search for the random forest regressor.
Hyper-parameters | Range/Limit
max leaf nodes | [10, 1000]
n estimators | [10, 1000]

Table 13: Reacting flow, hyper-parameters of the nine best configurations of the random forest regressor.
Hyper-parameters | config 1 | config 2 | config 3 | config 4 | config 5 | config 6 | config 7 | config 8 | config 9
max leaf nodes | 709 | 109 | 967 | 885 | 815 | 116 | 624 | 710 | 286
n estimators | 985 | 881 | 696 | 576 | 395 | 81 | 131 | 30 | 170

Table 14: Reacting flow, hyper-parameter ranges of the cross-validation randomized search for the neural-network regressor.
Hyper-parameters | Range/Limit
n hidden | { }
n neurons | [1, 100]
learning rate | [10⁻ , 10⁻ ]
activation | {relu, tanh, sigmoid}

We again compare the performance of the ROM methods across a wide variation of all method parameters. In particular, the degree of the polynomial regressor is varied in the range 1 ≤ r ≤ 5.
For the kNN regressor, the number of neighbors is varied in the range 1 ≤ k ≤ 19. For the random forest and neural-network regressors, we employ the same approach as described in Section 5.1.4 to find several near-optimal configurations. In this spirit, the hyper-parameter limits of the cross-validation randomized searches for the random forest and neural-network regressors are listed in Tables 12 and 14, while Tables 13 and 15 report the hyper-parameter values of their nine and seven best configurations, respectively, after the randomized searches.

Figure 24: Reacting flow, solution visualization for the 2nd-order polynomial regressor at a fixed time instance and the test point µ_test ∈ D_test of Table 11. Panels show the FOM solution, the ROM solution and the absolute error for the temperature and for the H2, O2 and H2O mass fractions (spatial coordinates in cm).

Table 15: Reacting flow, hyper-parameters of the seven best configurations of the neural-network regressor.
Hyper-parameters | config 1 | config 2 | config 3 | config 4 | config 5 | config 6 | config 7
n hidden | 10 | 10 | 60 | 5 | 10 | 60 | 20
n neurons | 90 | 67 | 83 | 76 | 19 | 26 | 39
learning rate | ×10⁻ | ×10⁻ | ×10⁻ | ×10⁻ | ×10⁻ | ×10⁻ | ×10⁻
activation | tanh | sigmoid | tanh | tanh | sigmoid | sigmoid | tanh

We then investigate the accuracy of these configurations by computing their relative solution errors together with the “best” relative error of Eq. (4.1) for all µ ∈ D_test. Figure 25 reports all of these relative solution errors. We again observe from Figure 25 that the proposed methods yield accurate solutions, with error levels that are reasonable for engineering settings.

Figure 25: Reacting flow, relative solution error with µ ∈ D_test. The lowermost box shows the “best” relative error defined in (4.1); the upper boxes show the ST-ML-ROM with the four machine learning techniques. Panels: (a) polynomial regression, (b) k-nearest neighbors, (c) random forest, (d) neural network.

Finally, the computational time of the considered configurations is also investigated. We apply the same procedure described in Section 5.1.4 to measure computational times; Table 7 lists the number of repeated offline and online runs for each configuration (a small sketch of this timing procedure is given below). Figure 26a and Figure 26b report the training time and the Pareto front for online predictions, respectively. Figure 26b again shows a trend very similar to that of Figures 18b and 9c: the simple ML techniques (polynomial, kNN and random forest regressors) outperform the more complex neural network in terms of accuracy and computational cost; in particular, the random forest is the best choice for this more complicated problem.

Figure 26: Reacting flow, training time and Pareto front of the four ST-ML-ROM techniques: PR = polynomial regression, kNN = k-nearest neighbors, RF = random forest, NN = neural network. Panels: (a) training time, (b) Pareto front.
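The timing procedure referenced above amounts to averaging wall-clock times over repeated offline (training) and online (prediction) runs. The snippet below is a minimal sketch of that idea; the number of repetitions, the regressor and the data are illustrative and do not reproduce the exact counts of Table 7.

```python
import time
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def average_time(fn, n_repeats):
    """Average wall-clock time of fn() over n_repeats runs."""
    times = []
    for _ in range(n_repeats):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    return float(np.mean(times))

# Illustrative stand-ins for training parameters, one coordinate, and test inputs.
rng = np.random.default_rng(0)
mu_train = rng.uniform(size=(320, 2))
alpha_train = rng.standard_normal(320)
mu_test = rng.uniform(size=(80, 2))

model = RandomForestRegressor(n_estimators=81, max_leaf_nodes=116, random_state=0)
offline = average_time(lambda: model.fit(mu_train, alpha_train), n_repeats=5)
online = average_time(lambda: model.predict(mu_test), n_repeats=50)
print(f"offline (training): {offline:.4f} s, online (prediction): {online:.6f} s")
```

Plotting the averaged online time against the relative solution error for each configuration yields the Pareto fronts shown in Figures 18b and 26b.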
Conclusions

The numerical experiments in this paper have demonstrated that space–time subspace bases combined with machine learning methods provide a tractable and effective way to map input parameters to a time-dependent, high-dimensional output quantity of interest. The method can be applied straightforwardly to applications that exhibit a fast-decaying Kolmogorov n-width, defined as \(d_n(\mathcal{M}) := \inf_{\mathcal{S}_n} \sup_{f \in \mathcal{M}} \inf_{g \in \mathcal{S}_n} \|f - g\|\), where the first infimum is taken over all n-dimensional subspaces \(\mathcal{S}_n\) of the state space and \(\mathcal{M}\) denotes the manifold of solutions over all time and parameters (see, e.g., [63, 64, 65] for approaches that handle problems with slowly decaying Kolmogorov n-width). The numerical experiments have also highlighted the important point that the amount of data available in an engineering setting is typically much smaller than in other machine learning applications, because engineering data are often generated by expensive physics-based codes or by sensors embedded in a physical system. In either case, it is reasonable to expect hundreds or thousands of training data points, but not the millions that are common in many non-engineering machine learning applications. The strategy for choosing appropriate machine learning methods is affected by this fact: we have observed that, in this setting, simpler polynomial regression or random forest models may be better choices than more complicated neural networks, even when the underlying physical phenomena exhibit strongly nonlinear behavior. Another possible reason why polynomial regression, kNN and random forest provide more accurate solutions than the neural network in this setting is that these three methods build one surrogate model for each mapping µ ↦ α_i(µ), i ∈ ℕ(r) (hence r models in total to approximate the whole mapping), whereas the neural network builds a single surrogate model for the entire mapping µ ↦ α(µ).

Acknowledgements

This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. Department of Energy or the United States Government. Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC, a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-NA-0003525.
References

[1] Gallivan, K., Grimme, E., and Dooren, P. V., “Asymptotic waveform evaluation via a Lanczos method,” Applied Mathematics Letters, Vol. 7, No. 5, 1994, pp. 75–80.
[2] Silveira, L. M., Kamon, M., Elfadel, I., and White, J., “A coordinate-transformed Arnoldi algorithm for generating guaranteed stable reduced-order models of RLC circuits,” Computer Methods in Applied Mechanics and Engineering, Vol. 169, No. 3-4, 1999, pp. 377–389.
[3] Boley, D. L., “Krylov space methods on state-space control models,” Circuits, Systems and Signal Processing, Vol. 13, No. 6, 1994, pp. 733–758.
[4] Bai, Z., “Krylov subspace techniques for reduced-order modeling of large-scale dynamical systems,” Applied Numerical Mathematics, Vol. 43, No. 1-2, 2002, pp. 9–44.
[5] Willcox, K. and Peraire, J., “Balanced Model Reduction via the Proper Orthogonal Decomposition,” AIAA Journal, Vol. 40, No. 11, 2002.
[6] Sirovich, L., “Turbulence and the dynamics of coherent structures. Part I: Coherent structures,” Quarterly of Applied Mathematics, Vol. 45, No. 3, 1987, pp. 561–571.
[7] Rozza, G., Huynh, D., and Patera, A. T., “Reduced basis approximation and a posteriori error estimation for affinely parametrized elliptic coercive partial differential equations,” Archives of Computational Methods in Engineering, Vol. 15, No. 3, 2008, pp. 229–275.
[8] Chinesta, F., Ladeveze, P., and Cueto, E., “A short review on model order reduction based on proper generalized decomposition,” Archives of Computational Methods in Engineering, Vol. 18, No. 4, 2011, pp. 395.
[9] Schmid, P. J., “Dynamic mode decomposition of numerical and experimental data,” Journal of Fluid Mechanics, Vol. 656, 2010, pp. 5–28.
[10] Astrid, P., Weiland, S., Willcox, K., and Backx, T., “Missing point estimation in models described by proper orthogonal decomposition,” IEEE Transactions on Automatic Control, Vol. 53, No. 10, 2008, pp. 2237–2251.
[11] Carlberg, K., Farhat, C., Cortial, J., and Amsallem, D., “The GNAT method for nonlinear model reduction: effective implementation and application to computational fluid dynamics and turbulent flows,” Journal of Computational Physics, Vol. 242, 2013, pp. 623–647.
[12] Barrault, M., Maday, Y., Nguyen, N. C., and Patera, A. T., “An ‘empirical interpolation’ method: application to efficient reduced-basis discretization of partial differential equations,” Comptes Rendus Mathematique, Vol. 339, No. 9, 2004, pp. 667–672.
[13] Chaturantabut, S. and Sorensen, D. C., “Nonlinear model reduction via discrete empirical interpolation,” SIAM Journal on Scientific Computing, Vol. 32, No. 5, 2010, pp. 2737–2764.
[14] Ly, H. V. and Tran, H. T., “Modeling and control of physical processes using proper orthogonal decomposition,” Mathematical and Computer Modelling, Vol. 33, No. 1-3, 2001, pp. 223–236.
[15] Bui-Thanh, T., Damodaran, M., and Willcox, K., “Aerodynamic data reconstruction and inverse design using proper orthogonal decomposition,” AIAA Journal, Vol. 42, No. 8, 2004, pp. 1505–1516.
[16] Everson, R. and Sirovich, L., “Karhunen–Loève procedure for gappy data,” Journal of the Optical Society of America A, Vol. 12, No. 8, 1995, pp. 1657–1664.
[17] Audouze, C., De Vuyst, F., and Nair, P., “Reduced-order modeling of parameterized PDEs using time–space-parameter principal component analysis,” International Journal for Numerical Methods in Engineering, Vol. 80, No. 8, 2009, pp. 1025–1057.
[18] Audouze, C., De Vuyst, F., and Nair, P. B., “Nonintrusive reduced-order modeling of parametrized time-dependent partial differential equations,” Numerical Methods for Partial Differential Equations, Vol. 29, No. 5, 2013, pp. 1587–1628.
[19] Wirtz, D., Karajan, N., and Haasdonk, B., “Surrogate modeling of multiscale models using kernel methods,” International Journal for Numerical Methods in Engineering, Vol. 101, No. 1, 2015, pp. 1–28.
[20] Mainini, L. and Willcox, K., “Surrogate modeling approach to support real-time structural assessment and decision making,” AIAA Journal, Vol. 53, No. 6, 2015, pp. 1612–1626.
[21] Ulu, E., Zhang, R., and Kara, L. B., “A data-driven investigation and estimation of optimal topologies under variable loading configurations,” Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, Vol. 4, No. 2, 2016, pp. 61–72.
[22] Hesthaven, J. S. and Ubbiali, S., “Non-intrusive reduced order modeling of nonlinear problems using neural networks,” Journal of Computational Physics, Vol. 363, 2018, pp. 55–78.
[23] Chen, W., Hesthaven, J. S., Junqiang, B., Qiu, Y., Yang, Z., and Tihao, Y., “Greedy nonintrusive reduced order model for fluid dynamics,” AIAA Journal, Vol. 56, No. 12, 2018, pp. 4927–4943.
[24] Swischuk, R., Mainini, L., Peherstorfer, B., and Willcox, K., “Projection-based model reduction: Formulations for physics-based machine learning,” Computers & Fluids, Vol. 179, 2019, pp. 704–717.
[25] Ljung, L., “System identification,” Wiley Encyclopedia of Electrical and Electronics Engineering, 1999, pp. 1–19.
[26] Viberg, M., “Subspace-based methods for the identification of linear time-invariant systems,” Automatica, Vol. 31, No. 12, 1995, pp. 1835–1851.
[27] Reynders, E., “System identification methods for (operational) modal analysis: review and comparison,” Archives of Computational Methods in Engineering, Vol. 19, No. 1, 2012, pp. 51–124.
[28] Mendel, J. M., “Tutorial on higher-order statistics (spectra) in signal processing and system theory: Theoretical results and some applications,” Proceedings of the IEEE, Vol. 79, No. 3, 1991, pp. 278–305.
[29] Drmac, Z., Gugercin, S., and Beattie, C., “Vector fitting for matrix-valued rational approximation,” SIAM Journal on Scientific Computing, Vol. 37, No. 5, 2015, pp. A2346–A2379.
[30] Proctor, J. L., Brunton, S. L., and Kutz, J. N., “Dynamic mode decomposition with control,” SIAM Journal on Applied Dynamical Systems, Vol. 15, No. 1, 2016, pp. 142–161.
[31] Peherstorfer, B. and Willcox, K., “Data-driven operator inference for nonintrusive projection-based model reduction,” Computer Methods in Applied Mechanics and Engineering, Vol. 306, 2016, pp. 196–215.
[32] Castelletti, A., Galelli, S., Restelli, M., and Soncini-Sessa, R., “Data-driven dynamic emulation modelling for the optimal management of environmental systems,” Environmental Modelling & Software, Vol. 34, 2012, pp. 30–43.
[33] Balzano, L., Nowak, R., and Recht, B., “Online identification and tracking of subspaces from highly incomplete information,” 2010 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton), IEEE, 2010, pp. 704–711.
[34] Peherstorfer, B. and Willcox, K., “Online adaptive model reduction for nonlinear systems via low-rank updates,” SIAM Journal on Scientific Computing, Vol. 37, No. 4, 2015, pp. A2123–A2150.
[35] Brunton, S. L., Proctor, J. L., and Kutz, J. N., “Discovering governing equations from data by sparse identification of nonlinear dynamical systems,” Proceedings of the National Academy of Sciences, Vol. 113, No. 15, 2016, pp. 3932–3937.
[36] Dean, S., Mania, H., Matni, N., Recht, B., and Tu, S., “On the sample complexity of the linear quadratic regulator,” Foundations of Computational Mathematics, 2019, pp. 1–47.
[37] Wang, Q., Hesthaven, J. S., and Ray, D., “Non-intrusive reduced order modeling of unsteady flows using artificial neural networks with application to a combustion problem,” Journal of Computational Physics, Vol. 384, 2019, pp. 289–307.
[38] Fresca, S., Dede, L., and Manzoni, A., “A comprehensive deep learning-based approach to reduced order modeling of nonlinear time-dependent parametrized PDEs,” arXiv preprint arXiv:2001.04001, 2020.
[39] Renganathan, S. A., Maulik, R., and Rao, V., “Machine learning for nonintrusive model order reduction of the parametric inviscid transonic flow past an airfoil,” Physics of Fluids, Vol. 32, No. 4, 2020, pp. 047110.
[40] Kast, M., Guo, M., and Hesthaven, J. S., “A non-intrusive multifidelity method for the reduced order modeling of nonlinear problems,” Computer Methods in Applied Mechanics and Engineering, Vol. 364, 2020, pp. 112947.
[41] Rajaram, D., Puranik, T. G., Perron, C., and Mavris, D. N., “Non-Intrusive Parametric Reduced Order Modeling using Randomized Algorithms,” AIAA Scitech 2020 Forum, 2020, p. 0417.
[42] Fritzen, F. and Hassani, M., “Space–time model order reduction for nonlinear viscoelastic systems subjected to long-term loading,” Meccanica, Vol. 53, No. 6, 2018, pp. 1333–1355.
[43] Choi, Y. and Carlberg, K., “Space–Time Least-Squares Petrov–Galerkin Projection for Nonlinear Model Reduction,” SIAM Journal on Scientific Computing, Vol. 41, No. 1, 2019, pp. A26–A58.
[44] Carlberg, K., Bou-Mosleh, C., and Farhat, C., “Efficient non-linear model reduction via a least-squares Petrov–Galerkin projection and compressive tensor approximations,” International Journal for Numerical Methods in Engineering, Vol. 86, No. 2, 2011, pp. 155–181.
[45] Carlberg, K., Barone, M., and Antil, H., “Galerkin v. least-squares Petrov–Galerkin projection in nonlinear model reduction,” Journal of Computational Physics, Vol. 330, 2017, pp. 693–734.
[46] Hoang, C., Choi, Y., and Carlberg, K., “Domain-decomposition least-squares Petrov-Galerkin (DD-LSPG) nonlinear model reduction,” arXiv preprint arXiv:2007.11835, 2020.
[47] Vannieuwenhoven, N., Vandebril, R., and Meerbergen, K., “A new truncation strategy for the higher-order singular value decomposition,” SIAM Journal on Scientific Computing, Vol. 34, No. 2, 2012, pp. A1027–A1052.
[48] Austin, W., Ballard, G., and Kolda, T. G., “Parallel tensor compression for large-scale scientific data,” 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), IEEE, 2016, pp. 912–922.
[49] Carlberg, K., Ray, J., and van Bloemen Waanders, B., “Decreasing the temporal complexity for nonlinear, implicit reduced-order models by forecasting,” Computer Methods in Applied Mechanics and Engineering, Vol. 289, 2015, pp. 79–103.
[50] Carlberg, K., Brencher, L., Haasdonk, B., and Barth, A., “Data-driven time parallelism via forecasting,” SIAM Journal on Scientific Computing, Vol. 41, No. 3, 2019, pp. B466–B496.
[51] Bentley, J. L., “Multidimensional binary search trees used for associative searching,” Communications of the ACM, Vol. 18, No. 9, 1975, pp. 509–517.
[52] Friedman, J. H., Bentley, J. L., and Finkel, R. A., “An algorithm for finding best matches in logarithmic expected time,” ACM Transactions on Mathematical Software (TOMS), Vol. 3, No. 3, 1977, pp. 209–226.
[53] Géron, A., Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems, O'Reilly Media, 2019.
[54] Breiman, L., Friedman, J., Stone, C. J., and Olshen, R. A., Classification and Regression Trees, CRC Press, 1984.
[55] Nielsen, M. A., Neural Networks and Deep Learning, Vol. 2018, Determination Press, San Francisco, CA, 2015.
[56] Rumelhart, D. E., Hinton, G. E., and Williams, R. J., “Learning representations by back-propagating errors,” Nature, Vol. 323, No. 6088, 1986, pp. 533–536.
[57] Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., and Duchesnay, E., “Scikit-learn: Machine Learning in Python,” Journal of Machine Learning Research, Vol. 12, 2011, pp. 2825–2830.
[58] Livni, R., Shalev-Shwartz, S., and Shamir, O., “On the computational efficiency of training neural networks,” Advances in Neural Information Processing Systems, 2014, pp. 855–863.
[59] Grepl, M. A., Maday, Y., Nguyen, N. C., and Patera, A. T., “Efficient reduced-basis treatment of nonaffine and nonlinear partial differential equations,” ESAIM: Mathematical Modelling and Numerical Analysis, Vol. 41, No. 03, 2007, pp. 575–605.
[60] Budiman, M., “Latin Hypercube Sampling, MATLAB Central File Exchange,” July 2020.
[61] scikit-learn 0.24.0, “Randomized Search Cross Validation,” https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html, Dec. 2020.
[62] MatlabR2020b, “Matlab cvpartition command,”