Latent-space time evolution of non-intrusive reduced-order models using Gaussian process emulation
Romit Maulik, Themistoklis Botsas, Nesar Ramachandra, Lachlan Robert Mason, Indranil Pan
Romit Maulik
Argonne Leadership Computing Facility, Argonne National Laboratory, Lemont, IL 60439
[email protected]

Themistoklis Botsas
The Alan Turing Institute, London NW1 2DB
[email protected]

Nesar Ramachandra
High Energy Physics Division, Argonne National Laboratory, Lemont, IL 60439
[email protected]

Lachlan Robert Mason
Imperial College London, SW7 2AZ; The Alan Turing Institute, NW1 2DB
[email protected]

Indranil Pan
Imperial College London, SW7 2AZ; The Alan Turing Institute, NW1 2DB
[email protected]
July 24, 2020

Abstract
Non-intrusive reduced-order models (ROMs) have recently generated considerable interest for constructing computationally efficient counterparts of nonlinear dynamical systems emerging from various domain sciences. They provide a low-dimensional emulation framework for systems that may be intrinsically high-dimensional. This is accomplished by utilizing a construction algorithm that is purely data-driven. It is no surprise, therefore, that the algorithmic advances of machine learning have led to non-intrusive ROMs with greater accuracy and computational gains. However, in bypassing the utilization of an equation-based evolution, it is often seen that the interpretability of the ROM framework suffers. This becomes more problematic when black-box deep learning methods are used, which are notorious for lacking robustness outside the physical regime of the observed data. In this article, we propose the use of a novel latent-space interpolation algorithm based on Gaussian process regression. Notably, this reduced-order evolution of the system is parameterized by control parameters to allow for interpolation in space. The use of this procedure also allows for a continuous interpretation of time, which enables temporal interpolation. The latter aspect provides information, with quantified uncertainty, about full-state evolution at a finer resolution than that utilized for training the ROMs. We assess the viability of this algorithm for an advection-dominated system given by the inviscid shallow water equations.

Keywords: Reduced-order models; Deep learning; Gaussian process regression
1 Introduction

Recently, researchers have shown sustained interest in using machine learning (ML) methods for non-intrusive reduced-order models (ROMs) for systems that may be governed by advection-dominated partial differential equations (PDEs). This is because solving PDE forward-models for such systems may require very fine spatiotemporal numerical discretizations, which cause a significant bottleneck in design and forecast tasks [44]. The prospect of bypassing traditional numerical methods and building surrogates from data alone [26, 36, 37, 41] is attractive for multiple applications ranging from engineering design [4, 38] and control [35, 23] to climate modeling [8, 9, 28]. This is because data-driven ROMs allow for rapid predictions of nonlinear dynamics unencumbered by the stability-based
limitations of numerical discretizations. In almost all ROM applications, forecasts must be conditioned on time and several control parameters, such as the initial conditions or the physical properties of the governing laws. In addition, since these models eschew the use of PDEs, it is necessary to associate some notion of uncertainty quantification during the time evolution of these surrogate dynamical systems. This is to ensure that the loss of interpretability and reliability from bypassing equations is offset by a feedback process from the ML.

Neural networks have been used for ROMs for decades. One of the earliest examples [14] used a simple fully connected network for forecasting meteorological information. More recently, researchers have incorporated a single-layered feed-forward neural network into a nonlinear dynamical system and built a surrogate model for a high-dimensional aerodynamics problem [27]; radial basis function networks have been used to make forecasts for a nonlinear unsteady aerodynamics task [51, 21]; and a simple fully connected network has been used for learning the dynamics of an advection-dominated system [40, 13].

Neural networks are commonly used for two tasks in typical ROM construction: compression and time evolution. For the former, they may be used as a nonlinear equivalent of methods based on the proper orthogonal decomposition (POD) or principal component analysis (PCA), which find a linear affine subspace for the full system. The identification of this reduced basis to ensure a compressed representation that is minimally lossy is a core component of most ROM development strategies (some examples include [39, 19, 16]). While POD-based methods currently represent the most popular technique for reduced-basis (or latent-space) construction, data generated from PDE simulations can often be interpreted as images on a square grid; therefore, convolutional neural networks have also been applied [43, 17, 12, 30] for building computationally effective ROMs.

Once this basis (or latent representation) is identified, we need a cost-effective strategy for accurate nonlinear dynamical system evolution to reproduce the full-order spatiotemporal complexity of the problem in the reduced basis. For example, linear reduced-basis construction allows for the use of intrusive methods (which project the governing equations onto the reduced basis), as seen in [15, 34], which use a Galerkin projection; or [6, 49, 11], which use the Petrov–Galerkin method (see [5] for a comparison of these two methods). Such extensions are not straightforward for autoencoder-based latent-space constructions, since projecting to a basis space is infeasible. We note, though, that the use of convolutional autoencoder architectures has been demonstrated for the Petrov–Galerkin method, where there is no requirement to project the governing equations onto a trial space [24]. As mentioned previously, neural networks are also commonly used for the temporal evolution of these latent-space representations. Mainstream ML has generated a large body of time-series forecasting literature that lends itself readily to latent-space evolution of dynamical systems. Recently, long short-term memory architectures (LSTMs) have become popular for the non-intrusive characterization of dynamical systems [45, 1, 32, 33, 46], as they allow for the construction of non-i.i.d., directional relationships between the states of a dynamical system at different times before a future forecast is performed.
Other methods that have been utilized for such tasks include the temporal convolutional network [50], non-autoregressive variants of the LSTM [29], and system identification within the latent space [7]. In most of these developments, the evolution framework (in time) is deterministic in nature. These frameworks highlight a crucial drawback when coupled with the black-box nature of purely data-driven ROMs: it is imperative to embed some notion of uncertainty into the time evolution of ROMs in the latent space. Past work has shown that LSTMs suffer from stability and interpretability issues when utilized for the reduced-order modeling of advection-dominated systems [29], and this study proposes an alternative that provides interpretable forecasts with quantified uncertainty. To that end, we propose the use of a Gaussian process regression (GPR) framework for evolving the low-dimensional representation of data obtained from a PDE evolution. In addition to providing forecasts at the temporal resolution of the training data, the use of our framework allows for interpolation in time. This is because time is interpreted as a continuous variable. When coupled with quantified uncertainty, the framework allows a ROM user to interrogate the behavior of the emulated system at a finer temporal resolution, with implications for data sources that are temporally sparse.

To summarize, the major contributions of this article are:

• We introduce a technique to construct parametric non-intrusive ROMs with quantified uncertainty during the latent-space time evolution of dynamical systems.
• We detail the use of Gaussian processes (GPs) that are customized for the time evolution of parametric nonlinear dynamical systems.
• We demonstrate the ability of the proposed time-evolution algorithm for systems compressed by linear reduced-basis methods such as the POD, as well as nonlinear compression frameworks such as variational and convolutional autoencoders.
• We test the ability of a ROM constructed from coarse training data to interpolate on a finer temporal resolution with quantified uncertainty.
Figure 1: Schematic of reduced-order modeling. The upper panel shows the dimensionality-reduction operation to a latent space; the lower panel shows the forecasting in the latent space for reconstructions.

In the following, we shall introduce our experimental setup, which relies on the inviscid shallow water equations, in Section 2; our various compression mechanisms in Section 3; and the UQ-embedded emulation strategy in Section 4; followed by results and conclusions in Sections 5 and 6, respectively.
2 The inviscid shallow water equations

The inviscid shallow water equations belong to a prototypical system of equations for geophysical flows. In particular, the shallow water equations admit solutions where advection dominates dissipation, and they pose challenges for conventional ROMs [47]. These governing equations are given by

\[
\frac{\partial (\rho\eta)}{\partial t} + \frac{\partial (\rho\eta u)}{\partial x} + \frac{\partial (\rho\eta v)}{\partial y} = 0, \quad (1)
\]
\[
\frac{\partial (\rho\eta u)}{\partial t} + \frac{\partial}{\partial x}\left(\rho\eta u^2 + \frac{1}{2}\rho g \eta^2\right) + \frac{\partial (\rho\eta u v)}{\partial y} = 0, \quad (2)
\]
\[
\frac{\partial (\rho\eta v)}{\partial t} + \frac{\partial (\rho\eta u v)}{\partial x} + \frac{\partial}{\partial y}\left(\rho\eta v^2 + \frac{1}{2}\rho g \eta^2\right) = 0, \quad (3)
\]

where η corresponds to the total fluid column height, (u, v) is the fluid's horizontal flow velocity averaged across the vertical column, g is the acceleration due to gravity, and ρ is the fluid density, typically set to 1.0. Here, t, x, and y are the independent variables of time and the spatial coordinates of the two-dimensional system. Equation 1 captures the law of mass conservation, whereas Equations 2 and 3 denote the conservation of momentum. The initial conditions of the problem are given by

\[
\rho\eta(x, y, t = 0) = 1 + \exp\left[-\left(\frac{(x - \bar{x})^2}{\sigma^2} + \frac{(y - \bar{y})^2}{\sigma^2}\right)\right], \quad (4)
\]
\[
\rho\eta u(x, y, t = 0) = 0, \quad (5)
\]
\[
\rho\eta v(x, y, t = 0) = 0, \quad (6)
\]
i.e., a Gaussian perturbation, of width σ, centered at a particular location [x̄, ȳ] ≡ w on the grid. We solve the system of equations up to a fixed final time, with a fixed time-step, on a square two-dimensional grid of collocation points to completely capture the advection and gradual decay of this perturbation. Note that these choices may vary according to the forecasting and fidelity requirements of a particular problem and perturbation. The initial and boundary conditions for this particular shallow-water experiment represent a tightly controlled traveling-wave problem that is translation invariant. Different realizations of the initial condition lead to translationally shifted trajectories. We also note the presence of mirror symmetries with respect to x = x̄ and y = ȳ, coupled with a rotational symmetry about the origin. However, our motivation for a first assessment of our emulators on this system stems from the well-known fact that POD and Galerkin-projection based methods are severely limited in their ability to forecast on these simple traveling-wave systems [22, 31] and require special treatment with intrinsic knowledge of the flow dynamics. This is in addition to the fact that equation-based models are impossible to construct here because of the absence of information from the other variables of the PDE. We seek to build predictive models solely from observations of ρη conditioned on w, mimicking a real-world scenario where complete observations of all relevant variables (in this case, velocities) are unavailable.
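For concreteness, a minimal sketch of constructing the initial condition of Equation 4 on such a grid follows; the grid resolution, domain extent, and perturbation width sigma used here are illustrative assumptions rather than the values from our experiments.

```python
import numpy as np

# Illustrative grid; N and the domain extent are assumptions, not the
# discretization used in our experiments.
N = 64
x = np.linspace(-1.0, 1.0, N)
y = np.linspace(-1.0, 1.0, N)
X, Y = np.meshgrid(x, y, indexing="ij")

def initial_height(x_bar, y_bar, sigma=0.1):
    """Gaussian perturbation of Equation 4 centered at w = (x_bar, y_bar).

    The width sigma is a hypothetical value chosen for illustration.
    """
    return 1.0 + np.exp(-(((X - x_bar) ** 2) + ((Y - y_bar) ** 2)) / sigma ** 2)

rho_eta_0 = initial_height(x_bar=0.0, y_bar=0.0)  # rho * eta at t = 0
```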
3 Compression mechanisms

3.1 Proper orthogonal decomposition

In this section, we review the POD technique for the construction of a reduced basis [20, 2]. The interested reader may also find an excellent explanation of POD and its relationship with other dimension-reduction techniques in [42]. The POD procedure is tasked with identifying a space

\[
X^f = \mathrm{span}\left\{\vartheta^1, \ldots, \vartheta^f\right\}, \quad (7)
\]

which approximates snapshots optimally with respect to the L²-norm. The process of ϑ generation commences with the collection of snapshots in the snapshot matrix

\[
S = \left[\hat{q}_h^1 \; \hat{q}_h^2 \; \cdots \; \hat{q}_h^{N_s}\right] \in \mathbb{R}^{N_h \times N_s}, \quad (8)
\]

where N_s is the number of snapshots and \(\hat{q}_h^i : T \times P \to \mathbb{R}^{N_h}\) corresponds to an individual snapshot in time of the discrete solution domain with the mean value removed, i.e.,

\[
\hat{q}_h^i = q_h^i - \bar{q}_h, \qquad \bar{q}_h = \frac{1}{N_s}\sum_{i=1}^{N_s} q_h^i, \quad (9)
\]

with \(\bar{q}_h : P \to \mathbb{R}^{N_h}\) being the time-averaged solution field. Our POD bases can then be extracted efficiently through the method of snapshots, where we solve the eigenvalue problem on the correlation matrix \(C = S^\intercal S \in \mathbb{R}^{N_s \times N_s}\). Then

\[
C W = W \Lambda, \quad (10)
\]

where \(\Lambda = \mathrm{diag}\{\lambda_1, \lambda_2, \cdots, \lambda_{N_s}\} \in \mathbb{R}^{N_s \times N_s}\) is the diagonal matrix of eigenvalues and \(W \in \mathbb{R}^{N_s \times N_s}\) is the eigenvector matrix. Our POD basis matrix can then be obtained by

\[
\vartheta = S W \in \mathbb{R}^{N_h \times N_s}. \quad (11)
\]

In practice, a reduced basis \(\psi \in \mathbb{R}^{N_h \times N_r}\) is built by choosing the first N_r columns of ϑ for the purpose of efficient ROMs, where N_r ≪ N_s. This reduced basis spans a space given by

\[
X^r = \mathrm{span}\left\{\psi^1, \ldots, \psi^{N_r}\right\}. \quad (12)
\]

The coefficients of this reduced basis (which capture the underlying temporal effects) may be extracted as

\[
A = \psi^\intercal S \in \mathbb{R}^{N_r \times N_s}. \quad (13)
\]

The POD approximation of our solution is then obtained via

\[
\hat{S} = \left[\tilde{q}_h^1 \; \tilde{q}_h^2 \; \cdots \; \tilde{q}_h^{N_s}\right] \approx \psi A \in \mathbb{R}^{N_h \times N_s}, \quad (14)
\]

where \(\tilde{q}_h^i : T \times P \to \mathbb{R}^{N_h}\) corresponds to the POD approximation of \(\hat{q}_h^i\). The optimal nature of the reconstruction may be understood by defining the relative projection error

\[
\frac{\sum_{i=1}^{N_s} \left\| \hat{q}_h^i - \tilde{q}_h^i \right\|^2_{\mathbb{R}^{N_h}}}{\sum_{i=1}^{N_s} \left\| \hat{q}_h^i \right\|^2_{\mathbb{R}^{N_h}}} = \frac{\sum_{i=N_r+1}^{N_s} \lambda_i}{\sum_{i=1}^{N_s} \lambda_i}, \quad (15)
\]

which exhibits that, with increasing retention of POD bases, increasing reconstruction accuracy may be obtained. We remark that for dimension d > 1, the solution variables may be stacked to obtain this set of bases that are utilized for the reduction of each PDE within the coupled system. Another approach may be to obtain reduced bases for each dependent variable within the coupled system and evolve each PDE on a different manifold. Each dependent variable is then projected onto bases constructed from its snapshots alone. This affects the computation of the nonlinear term for computing the updates for each dimension in q. In practice, this operation manifests itself in the concatenation of reduced bases to obtain one linear operation for the reconstruction of all field quantities.
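A minimal NumPy sketch of the method of snapshots described above is given below; it assumes the snapshot matrix S has already had its temporal mean removed (Equation 9), and it is illustrative rather than the exact implementation used in this work.

```python
import numpy as np

def pod_basis(S, n_r):
    """Method of snapshots: S is the mean-subtracted N_h x N_s snapshot matrix."""
    C = S.T @ S                                # correlation matrix, Eq. (10)
    lam, W = np.linalg.eigh(C)                 # eigenpairs, ascending order
    idx = np.argsort(lam)[::-1]                # reorder to descending eigenvalues
    lam, W = lam[idx], W[:, idx]
    theta = S @ W                              # POD basis matrix, Eq. (11)
    theta /= np.linalg.norm(theta, axis=0)     # normalize each mode to unit norm
    psi = theta[:, :n_r]                       # truncated reduced basis, Eq. (12)
    A = psi.T @ S                              # latent coefficients, Eq. (13)
    proj_err = lam[n_r:].sum() / lam.sum()     # relative projection error, Eq. (15)
    return psi, A, proj_err

# Reconstruction, Eq. (14): S_hat = psi @ A
```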
Figure 2: Network architecture showing the encoder, bottleneck, and decoder of a typical convolutional autoencoder (CAE). The input image is reconstructed using convolution, pooling, and upsampling/unpooling layers.

3.2 Convolutional autoencoders

Autoencoders are neural networks that learn a new representation of the input data, usually with lower dimensionality. The initial layers, called the encoder, map the input x ∈ R^m to a new representation z ∈ R^k. The remaining layers, called the decoder, map z back to R^m with the goal of reconstructing x. The objective is to minimize the reconstruction error. Autoencoders are unsupervised; the data x are given, but the representation z must be learned.

More specifically, we use convolutional autoencoders (CAEs), which have convolutional layers. In a convolutional layer, instead of learning a matrix that connects all m neurons of the layer's input to all n neurons of the layer's output, we learn a set of filters. Each filter f_i is convolved with patches of the layer's input. Suppose a one-dimensional convolutional
layer has filters of length m_{f_i}. Then each of the layer's output neurons corresponding to filter f_i is connected to a patch of m_{f_i} of the layer's input neurons. In particular, a one-dimensional convolution of filter f and patch p is defined as f ∗ p = Σ_j f_j p_j (for neural networks, convolutions are usually technically implemented as cross-correlations). Then, for a typical one-dimensional convolutional layer, the layer's output neuron is y_{ij} = φ(f_i ∗ p_j + B_i), where φ is an activation function and B_i are the entries of a bias term. As j increases, patches are shifted by the stride s. For example, a one-dimensional convolutional layer with a filter f of length m_f = 3 and stride s = 1 could be defined so that y_j involves the convolution of f and inputs j − 1, j, and j + 1. To calculate the convolution, it is common to add zeros around the inputs to a layer, which is called zero padding. In the decoder, we use deconvolutional layers to return to the original dimension. These layers upsample with nearest-neighbor interpolation.

Two-dimensional convolutions are defined similarly, but each filter and each patch are two-dimensional. A two-dimensional convolution sums over both dimensions, and patches are shifted both ways. For a typical two-dimensional convolutional layer, the output neuron is y_{ijk} = φ(f_i ∗ p_{jk} + B_i). Input data can also have a "channel" dimension, such as red, green, and blue for images. The convolutional operator sums over channel dimensions, but each patch contains all of the channels. The filters remain the same size as patches, so they can have different weights for different channels. In our study, we use solely one channel for the spatial magnitude of ρη.

It is common to follow a convolutional layer with a pooling layer, which outputs a sub-sampled version of the input. In this paper, we specifically use max-pooling layers. Each output of a max-pooling layer is connected to a patch of the input, and it returns the maximum value in the patch.
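For illustration, a minimal Keras sketch of a CAE in the spirit of Figure 2 follows; the layer counts, filter sizes, and latent dimension are assumptions chosen for brevity, not the exact architecture used in our experiments.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cae(nx=64, ny=64, latent_dim=8):
    """Toy CAE: conv/max-pooling encoder, dense bottleneck, upsampling decoder.

    All hyperparameters here are illustrative assumptions.
    """
    inp = layers.Input(shape=(nx, ny, 1))            # one channel: rho * eta
    x = layers.Conv2D(16, 3, activation="relu", padding="same")(inp)
    x = layers.MaxPooling2D(2)(x)                    # sub-sample via max-pooling
    x = layers.Conv2D(8, 3, activation="relu", padding="same")(x)
    x = layers.MaxPooling2D(2)(x)
    x = layers.Flatten()(x)
    z = layers.Dense(latent_dim, name="latent")(x)   # bottleneck representation

    x = layers.Dense((nx // 4) * (ny // 4) * 8, activation="relu")(z)
    x = layers.Reshape((nx // 4, ny // 4, 8))(x)
    x = layers.UpSampling2D(2)(x)                    # nearest-neighbor upsampling
    x = layers.Conv2D(16, 3, activation="relu", padding="same")(x)
    x = layers.UpSampling2D(2)(x)
    out = layers.Conv2D(1, 3, activation="linear", padding="same")(x)

    model = models.Model(inp, out)
    model.compile(optimizer="adam", loss="mse")      # minimize reconstruction error
    return model
```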
3.3 Variational autoencoders

As opposed to CAEs, the variational autoencoder (VAE) [18] takes a Bayesian approach to latent-space modeling. We utilize a convolutional VAE architecture, where the outer convolutional, pooling, and upscaling layers are identical to those of the CAE. The difference arises only in the bottleneck, where the final encoder layers are fully connected layers that project the input x ∈ R^n onto the latent-variable space z. Thus, the encoder is effectively a function q(z|x). The fully connected part of the decoder network p(x′|z) generates newly sampled x′ from the latent space. The output of this undergoes upsampling and convolutions (similar to those of the CAE) to reconstruct the image. The VAE also constrains the latent-space variables to follow a normal distribution z ∼ N(µ, σ²) via the Kullback–Leibler (KL) divergence term D_KL(q(z|x_i) || p(z|x_i)). The architecture of the VAE (shown in Figure 3) is similar to that of the conventional CAE shown in Figure 2, except at the bottleneck, where the encoder network outputs the mean µ and the variance σ².

The inference itself is undertaken using variational inference (VI), by modeling the true distribution q(z|x) with a simple Gaussian distribution and then minimizing the difference via the KL divergence as an addition to the loss function E_{q(z|x_i)}[log p(z, x_i) − log q(z|x_i)]. The KL-divergence loss is applied such that the distribution of z is as close to the normal distribution as possible.

Figure 3: Convolutional VAE architecture. The convolutional parts of the encoder and decoder are similar to those of a convolutional autoencoder. The bottleneck includes a mapping onto a normal distribution of latent-space variables. Sampling from this space is performed and decoded for the generation of new data.

With the inclusion of additional complexity in the loss function and the bottleneck, VAEs have more parameters to tune than CAEs. However, this also gives significant control over the latent-space distribution. Depending on the data, dimensionality reduction using CAEs may not allow for straightforward interpolation, due to the presence of discontinuous clusters in the latent-space representation. On the other hand, VAEs constrain the latent-space representation to follow a pre-specified distribution and hence, by design, facilitate easier interpolation.
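A minimal sketch of the VAE bottleneck described above follows (the convolutional outer layers match the earlier CAE sketch); the reparameterization and the KL term are standard, while the latent dimension is again an illustrative assumption.

```python
import tensorflow as tf
from tensorflow.keras import layers

class VAEBottleneck(layers.Layer):
    """Maps encoder features to (mu, log_var), samples z, and adds the KL loss."""

    def __init__(self, latent_dim=8):  # latent_dim is an illustrative choice
        super().__init__()
        self.mu = layers.Dense(latent_dim)
        self.log_var = layers.Dense(latent_dim)

    def call(self, features):
        mu, log_var = self.mu(features), self.log_var(features)
        eps = tf.random.normal(tf.shape(mu))      # reparameterization trick
        z = mu + tf.exp(0.5 * log_var) * eps      # sample z ~ N(mu, sigma^2)
        # KL(q(z|x) || N(0, I)), added to the reconstruction loss so that the
        # total objective is the negative evidence lower bound (ELBO).
        kl = -0.5 * tf.reduce_mean(
            tf.reduce_sum(1.0 + log_var - tf.square(mu) - tf.exp(log_var), axis=-1))
        self.add_loss(kl)
        return z
```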
4 Gaussian process emulation

Dimensionality reduction performed using the POD, CAE, or VAE techniques results in a representation of the original data, along with a model (the POD bases or the decoders) to reconstruct the data for any point in the latent space. We then only require a temporal interpolation scheme fitted on the representation space, so that a time evolution of the dynamical system can be reconstructed, as shown schematically in Figure 1. In our approach, we deploy GPs [48] as our interpolation algorithm. While GPs perform Bayesian regression tasks with a high level of interpretability, they are generally computationally expensive for large data sizes. We achieve a considerable reduction of computational cost by fitting in the space of reduced dimensions. In addition, the GPR may also be restricted to a less noisy space compared with the original dataset of the dynamical evolution.

A GP is a collection of random variables, every finite subset of which follows a multivariate Gaussian distribution. It can be perceived as a generalization of a multivariate Gaussian distribution to infinite dimensions. GPs are a popular choice for nonlinear regression due to their flexibility and ease of use. In addition, one of their main advantages is that they incorporate a principled way of measuring uncertainty, since they provide predictions in distributional form. For the purpose of this paper, a GPR model is used to fit the reduced space from the
data-compression algorithms of Section 3. Subsequently, the mean prediction, which corresponds to the maximum a posteriori (MAP) estimate, is used for the reconstructions. We use the GPflow library for the experiments [10].

A GP can be completely specified by its second-order statistics. Thus, a mean function m(x) equal to zero can be assumed, and a positive-definite covariance function (kernel) k(x, x′), which can be perceived as a measure of similarity between x and x′, is the only requirement to specify the GP.

For the experiments in Section 5, we initially experimented with changepoint kernels [25]. The intuition came from the data (see, e.g., Figure 11), where two typical behaviours are commonly observed: a steep increase or decrease of the latent-dimension values at early times, and a subsequent, smoother change in direction that eventually leads to stabilization. At first, we examined a changepoint kernel only for the time feature, which was then added to a regular kernel that accounted for the other variables. The results were discouraging, which we attribute to the fact that this kernel structure leads to a loss of correlational information between time and the other variables. Subsequently, we examined changepoint kernels that accounted for all parameters in both of their sub-kernels. Even though this type of kernel was successful in producing acceptable results, and also in adequately detecting the position of the changepoint, the output was only a slight improvement over a standard kernel. Furthermore, the computational time was substantially larger, which led us to abandon the pursuit of a changepoint kernel. Instead, we settled on a Matérn 3/2 kernel,

\[
k(\mathbf{x}, \mathbf{x}') = \sigma^2 \left(1 + \frac{\sqrt{3}\,|\mathbf{x} - \mathbf{x}'|}{l}\right) \exp\left(-\frac{\sqrt{3}\,|\mathbf{x} - \mathbf{x}'|}{l}\right), \quad (16)
\]

due to its versatility, flexibility, and smoothness, and more specifically its automatic relevance determination (ARD) extension [3], which incorporates a separate lengthscale parameter for each input variable and gave a significant improvement in our results.
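For concreteness, a direct NumPy transcription of the ARD form of Equation 16 might look as follows; in our experiments we rely on GPflow's built-in implementation rather than this illustrative function.

```python
import numpy as np

def matern32_ard(X1, X2, variance=1.0, lengthscales=None):
    """ARD Matern 3/2 kernel, Eq. (16): one lengthscale per input dimension."""
    d = X1.shape[1]
    l = np.ones(d) if lengthscales is None else np.asarray(lengthscales)
    # Scaled pairwise distances r = sqrt(sum_k ((x_k - x'_k) / l_k)^2)
    diff = (X1[:, None, :] - X2[None, :, :]) / l
    r = np.sqrt(np.sum(diff ** 2, axis=-1))
    return variance * (1.0 + np.sqrt(3.0) * r) * np.exp(-np.sqrt(3.0) * r)
```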
For a GPR model, we consider a GP f and noisy training observations y at n data points x, derived from the true values f(x) with additive i.i.d. Gaussian noise ε of variance σ_n². In mathematical form, that is

\[
y = f(\mathbf{x}) + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, \sigma_n^2), \qquad f(\mathbf{x}) \sim \mathrm{GP}(0, k(\mathbf{x}, \mathbf{x}')), \quad (17)
\]

where k(·, ·) is the kernel. We obtain the complete specification of the GP by maximizing the marginal likelihood, which we can acquire by integrating the product of the Gaussian likelihood and the GP prior over f:

\[
p(\mathbf{y} \mid \mathbf{x}) = \int_f p(\mathbf{y} \mid f, \mathbf{x})\, p(f \mid \mathbf{x})\, \mathrm{d}f. \quad (18)
\]

For testing inputs x⋆ and testing outputs f⋆, we derive the joint marginal likelihood

\[
\begin{bmatrix} \mathbf{y} \\ \mathbf{f}_\star \end{bmatrix} \sim \mathcal{N}\left( \mathbf{0},\; \begin{bmatrix} k(\mathbf{x}, \mathbf{x}) + \sigma_n^2 I & k(\mathbf{x}, \mathbf{x}_\star) \\ k(\mathbf{x}_\star, \mathbf{x}) & k(\mathbf{x}_\star, \mathbf{x}_\star) \end{bmatrix} \right),
\]

where I is the identity matrix. Finally, by conditioning the joint distribution on the training data and the testing inputs, we derive the predictive distribution

\[
\mathbf{f}_\star \mid \mathbf{x}, \mathbf{x}_\star, \mathbf{y} \sim \mathcal{N}(\bar{\mathbf{f}}_\star, \mathrm{cov}(\mathbf{f}_\star)), \quad (19)
\]

where

\[
\bar{\mathbf{f}}_\star = k(\mathbf{x}_\star, \mathbf{x})\left[k(\mathbf{x}, \mathbf{x}) + \sigma_n^2 I\right]^{-1}\mathbf{y}, \qquad \mathrm{cov}(\mathbf{f}_\star) = k(\mathbf{x}_\star, \mathbf{x}_\star) - k(\mathbf{x}_\star, \mathbf{x})\left[k(\mathbf{x}, \mathbf{x}) + \sigma_n^2 I\right]^{-1} k(\mathbf{x}, \mathbf{x}_\star). \quad (20)
\]

During the reconstruction phase, we focus on the predictions that correspond to \(\bar{\mathbf{f}}_\star\).
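To make the workflow concrete, the following minimal GPflow sketch fits such a model to latent-space training data; the input layout (time stacked with the control parameters x̄ and ȳ), the array shapes, and the placeholder data are illustrative assumptions.

```python
import numpy as np
import gpflow

# X: inputs (time stacked with the control parameters x_bar, y_bar), shape (n, 3).
# Y: latent-space coefficients from the compression stage, shape (n, n_latent).
# Placeholder arrays stand in for the training/validation simulations.
X = np.random.rand(100, 3)
Y = np.random.rand(100, 8)

# Matern 3/2 kernel with ARD: one lengthscale per input dimension, Eq. (16).
kernel = gpflow.kernels.Matern32(lengthscales=np.ones(X.shape[1]))
model = gpflow.models.GPR(data=(X, Y), kernel=kernel)

# Maximize the marginal likelihood, Eq. (18), over kernel and noise parameters.
gpflow.optimizers.Scipy().minimize(model.training_loss, model.trainable_variables)

# Predictive distribution, Eqs. (19)-(20): mean and variance at test inputs.
X_test = np.random.rand(20, 3)
mean, var = model.predict_f(X_test)
```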
5 Results

In this section, we outline several experiments designed to assess the various compression frameworks and how they interface with latent-space emulation using our aforementioned GPs. A first series of assessments is solely targeted at the fidelity of reconstruction (i.e., which framework offers the most efficient compression). Following this, we interface latent-space representations of our compressed fields with GP emulators to obtain low-dimensional surrogates with embedded uncertainty quantification. Finally, we outline the ability of the GP emulators to predict the evolution of the dynamics at finer temporal resolutions.

For the purpose of training the compression frameworks and the GP emulators, we generate forward-model solves obtained by Latin hypercube sampling of different initial conditions w in each dimension x̄ and ȳ. For each of these simulations, evenly spaced snapshots in time are obtained to construct our total data set of ρη flow-fields. We remind the reader that the equation-based simulations require the solution of a system of PDEs (for ρη, ρηu, and ρηv), but our emulators will be built from ρη information alone. We split the simulations into training, validation, and testing subsets. The validation data set is primarily utilized for the early-stopping criteria of the deep neural network autoencoders. Note that the training and validation data are combined into one data set for training the GP emulators. All statistical and qualitative assessments in the following will be shown for the testing data alone.
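A sketch of this sampling protocol using SciPy's quasi-Monte Carlo module follows; the number of samples and the bounds on w are placeholders, since the experiment-specific values are not reproduced here.

```python
import numpy as np
from scipy.stats import qmc

n_sims = 100                       # placeholder sample count
sampler = qmc.LatinHypercube(d=2)  # two dimensions: x_bar and y_bar
unit_samples = sampler.random(n=n_sims)

# Scale the unit hypercube to the admissible range of w = (x_bar, y_bar);
# the bounds below are placeholders for the experiment-specific values.
w = qmc.scale(unit_samples, l_bounds=[-0.5, -0.5], u_bounds=[0.5, 0.5])
```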
5.1 Assessment of compression frameworks

We begin by assessing the ability of our different compression frameworks, i.e., the POD, CAE, and VAE. This comparison is obtained by training multiple encoders with varying degrees of freedom (DOF) in the latent space. The results of these experiments can be seen in Table 1. Here, we have chosen 2, 4, 8, 16, 30, and 40 DOF for all of our compression frameworks and have compared the fidelity of the reconstruction. We use metrics given by the coefficient of determination (R²), the mean squared error (MSE), and the mean absolute error (MAE) to compare the true and reconstructed fields for the testing data sets. Our analysis of the metrics indicates that the CAE is able to reach optimal reconstruction accuracy faster than both the POD and the VAE. Both the VAE and CAE are seen to possess an advantage over the POD, due to their ability to find a nonlinear low-dimensional manifold; the POD, instead, obtains a linear affine subspace of the high-dimensional system. Interestingly, with increasing DOF in the latent space, the POD method is seen to outperform the VAE. We also note that the CAE and VAE frameworks obtain their peak accuracy at around 8 DOF in the latent space and proceed to saturate in accuracy with greater latent-space dimensions. We remark that this aligns with the lack of any guarantees on convergence with increasing DOF for these nonlinear compression frameworks; the POD, instead, is guaranteed to converge with increasing latent-space dimensions.

Table 1: Latent-space compression and reconstruction metrics for the testing data. The CAE is seen to obtain high accuracy with relatively few DOF in comparison to the POD and VAE.

Coefficient of determination (R²)
Model / Latent DOF      2        4        8        16       30       40
POD                    0.10     0.30     0.55     0.69     0.82     0.87
CAE                    0.37     0.87     0.91     0.88     0.91     0.91
VAE                    0.35     0.66     0.86     0.83     0.82     0.79

Mean squared error
Model / Latent DOF      2        4        8        16       30       40
POD                    0.0025   0.0021   0.0014   0.0010   0.00063  0.00045
CAE                    0.0017   0.00034  0.00025  0.00031  0.00026  0.00025
VAE                    0.0017   0.00084  0.00036  0.00043  0.00045  0.00052

Mean absolute error
Model / Latent DOF      2        4        8        16       30       40
POD                    0.029    0.027    0.021    0.017    0.013    0.011
CAE                    0.021    0.0090   0.0075   0.0083   0.0075   0.0076
VAE                    0.023    0.015    0.0094   0.010    0.010    0.011

Following these quantitative assessments, we assess the reconstruction fidelity of the different frameworks by comparing contours from the different methods with varying DOF in the latent space. Figure 4 shows the performance of our three compression frameworks for four DOF in the latent space. At this coarse resolution (in latent space), the linear compression of the POD is inadequate at capturing the coherent features in the solution field upon reconstruction. This is due to the advective nature of the dynamics of this data set and the associated high Kolmogorov width. In contrast, both the CAE and VAE are able to identify coherent structures in the flow field after reconstruction from the latent space. Note, however, that the CAE is able to identify the crests and troughs in the flow field more accurately. We observe similar results for an eight-dimensional latent space, where the CAE and VAE are still seen to outperform the POD (although improvements in the latter may be observed).
For both cases (i.e., four and eight DOF), the CAE and VAE are seen to struggle with reconstructing the dissipating coherent features later in the evolution of the system. For completeness, we also show a result for a forty-dimensional latent space in Figure 6, where the POD can be seen to capture the spatio-temporal trends of the true solution appropriately. Note that the improvements in the CAE and VAE are marginal, with the POD outperforming both of these frameworks later in the evolution of the system.

5.2 Latent-space forecasting

Next, we test the ability of our trained GP emulators to forecast the evolution of systems in their latent-space representations. For this assessment, we choose trained encoders (with 4, 8, and 40 latent-space dimensions) to obtain training and validation data for fitting our previously introduced GPs. Once trained, the GPs are tasked with predicting the evolution of the latent state in reduced space for a set of test initial conditions.

Figure 7 shows the latent-space evolution of a testing simulation over time for the three different compression methodologies. This result utilizes solely four latent dimensions. It is readily apparent that CAE compression leads to a smooth evolution of the system in latent space. In contrast, the POD and VAE methods display significant oscillations. The GP is able to capture the behavior of the system evolution and also provides confidence intervals based on two standard deviations around the mean. With the exception of a few instances in time, the confidence intervals are able to envelope the true evolution of the system. We see similar results in Figure 8, where eight latent-space DOF are obtained for each of our compression frameworks. While the POD is seen to provide an oscillatory system evolution (as before), the CAE and VAE are smooth. In either case, the constructed GP is able to recover the evolution well.
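As a brief illustration, the two-standard-deviation confidence intervals shown in Figures 7 and 8 can be computed from the fitted model of the earlier GPflow sketch (the names model and X_test carry over from that illustrative snippet).

```python
import numpy as np

# Two-standard-deviation confidence band around the GP predictive mean,
# as used for the shaded regions in Figures 7 and 8.
mean, var = model.predict_f(X_test)
mean, var = mean.numpy(), var.numpy()
lower = mean - 2.0 * np.sqrt(var)
upper = mean + 2.0 * np.sqrt(var)
```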
5.3 Reconstruction of full fields

We proceed by assessing the fidelity of the reconstructions from latent space using the GP forecasts of the previous subsection. Qualitative comparisons for the reconstruction of a test simulation are shown in Figure 9 for a four-dimensional latent space at three different times. The CAE and VAE compression is seen to outperform the linear reduced basis constructed by the POD. This aligns with past studies in which nonlinear compression methods have outperformed the POD. The CAE is seen to be more accurate than the VAE, due to its deterministic formulation during compression. We observe the CAE and VAE to outperform the POD for the eight-dimensional latent space as well, as shown in Figure 10. Qualitatively, the CAE is seen to be the best compression technique of all the methods. Table 2 shows different metrics that establish these conclusions quantitatively. Note that all of these metrics are evaluated on the reconstructed data in physical space.
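For reference, a minimal sketch of how such physical-space metrics can be computed with scikit-learn is shown below; the array names and shapes are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

# true_fields, recon_fields: arrays of shape (n_snapshots, n_gridpoints),
# reconstructed in physical space before any metric is evaluated.
def reconstruction_metrics(true_fields, recon_fields):
    y_true, y_pred = true_fields.ravel(), recon_fields.ravel()
    return {
        "R2": r2_score(y_true, y_pred),
        "MSE": mean_squared_error(y_true, y_pred),
        "MAE": mean_absolute_error(y_true, y_pred),
    }
```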
5.4 Interpolation in time

It must be highlighted that ML-based time-series forecasting methods are generally formulated in a discrete fashion, where the temporal resolution of the training data determines the resolution of the ROM deployment. Also, most studies of ML-based forecasting in time assume a regular sampling of the state in time. In practice, due to simulation or experimental limitations, state information may be available sparsely in time and at irregular intervals.
Figure 4: Reconstruction from true latent-space evolutions for a four-dimensional latent space at three successive times (top to bottom) for a testing initial condition. Qualitatively, for the same data set and similar architectures, the CAE outperforms the VAE. Note that both the VAE and CAE are superior to the POD.

Table 2: Metrics obtained via GP-based forecasts in latent space followed by reconstruction, for the testing data.

Coefficient of determination (R²)
Model / DOF     4        8
POD            –3.89    –1.99
CAE             0.86     0.87
VAE             0.63     0.82

Mean squared error
Model / DOF     4        8
POD            0.034    0.0041
CAE            0.00032  0.00029
VAE            0.00086  0.00046

Mean absolute error
Model / DOF     4        8
POD            0.030    0.044
CAE            0.0085   0.0076
VAE            0.014    0.010

Therefore, the construction of a ROM that is continuous in the temporal variable allows us to sample at intermediate time steps with quantified uncertainty. We thus test the ability of our parameterized non-intrusive ROM for interpolation in time. In this assessment, we establish the performance of the ROM in sampling at locations (in time) that were not
obtained in the discrete training and testing data. This is made possible by the continuous function approximation property of the GPR.

Figure 5: Reconstruction from true latent-space evolutions for an eight-dimensional latent space at three successive times (top to bottom) for a testing initial condition. Qualitatively, for the same data set and similar architectures, the CAE outperforms the VAE. Note that both the VAE and CAE are superior to the POD.

To assess this capability, we re-generate our testing data, now sampled more finely in the temporal dimension. We utilize our pretrained CAE (trained solely on the coarsely sampled training data) to compress this system evolution to an eight-DOF latent space. Following this, our previously trained GP is tasked with sampling at the intermediate points in time that correspond to this finely sampled testing data. Note that the GP is likewise trained only on the coarse data set. Thus, this assessment represents interpolation in both space and time. The results for the GP forecast in this assessment are shown in Figure 11. A good agreement can be observed between the true latent-space trajectory and its GP-interpolated counterpart. We also direct our attention to the forecast behavior for the latent dimensions in which the uncertainties are seen to oscillate in correspondence with the coarsely sampled training points. We qualitatively assess the accuracy of the temporal interpolation in Figure 12, where the reconstruction from the true latent-space trajectory and the GP interpolation are compared against the truth. A good agreement is seen. Naturally, the root mean square error (RMSE) and MAE for this finely sampled test data set are seen to be higher than for the case of interpolating solely at the coarser sample locations (see Table 2 for comparison), but this adds a useful utility to this ROM strategy.
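As an illustrative sketch of this temporal interpolation, the fitted GP of the earlier snippets can simply be queried at intermediate time instants and the predictions decoded; the time grid, the test parameters, and the decoder handle below are assumptions for illustration only.

```python
import numpy as np

# Query the GP at time instants finer than the training resolution for a
# fixed test initial condition w = (x_bar, y_bar); values are placeholders.
t_fine = np.linspace(0.0, 0.4, 200)[:, None]           # illustrative time grid
w_test = np.tile([[0.1, -0.2]], (t_fine.shape[0], 1))  # placeholder test w
X_fine = np.hstack([t_fine, w_test])

latent_mean, latent_var = model.predict_f(X_fine)

# Decode the predicted latent trajectory back to full fields; `decoder`
# stands in for the trained CAE decoder network (a hypothetical handle).
fields = decoder(latent_mean.numpy())
```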
Figure 6: Reconstruction from true latent-space evolutions for a forty-dimensional latent space at three successive times (top to bottom) for a testing initial condition. Qualitatively, for the same data set and similar architectures, the CAE outperforms the VAE and is comparable to the POD encoding. This is because of the greater DOF in the latent space.

Table 3: Metrics for the temporal super-resolution of testing simulations. A CAE and GP combination trained on coarsely sampled data in time is utilized to reconstruct at finer intervals.

Metric                              MAE      RMSE
GP interpolation + reconstruction   0.0175   0.00094
Reconstruction                      0.0084   0.00022
Figure 7: Latent-space forecasts on testing data for (a) the POD, (b) the CAE, and (c) the VAE with a reduced dimension of 4 DOF. The shaded areas indicate confidence intervals from the probabilistic forecasts.

6 Conclusions

The development of parametric non-intrusive reduced-order models for advective PDE systems offers great potential for cost reductions in large numerical simulation campaigns across multiple domain sciences. This article addresses their limitations associated with reduced interpretability by proposing the use of GPRs, conditioned on time and system control parameters, that provide quantification of uncertainty. This is particularly useful when nonlinear compression techniques, such as autoencoders, are used for efficient DOF reduction. In addition to the ability to interpolate in the initial-condition space, we also investigate the ability of the proposed framework to interpolate in time. This addresses the fact that the sampling of a dynamical system for training these ROMs may not match the temporal resolution requirement of the emulation. We also remark that the modular nature of the compression and time evolution allows for the use of conventional reduced-basis methods, such as the POD, for dynamics that are intrinsically low-dimensional.

Our results indicate that the proposed model-order reduction technique is successful at dealing with advective dynamics, as demonstrated through assessments on the inviscid shallow water equations. We establish this by testing on unseen initial conditions for our system evolution, where a low-dimensional evolution successfully replicates high-dimensional results when coupled with a convolutional or variational autoencoder. The non-intrusive nature of our framework also allows for the construction of emulators from remote-sensing or experimental data. This is of value when the underlying governing PDEs are not known a priori. Extensions of the present study shall investigate the integration of a feedback loop to sample points in the control-parameter space with knowledge of the prediction uncertainty. Through this, we aim to establish continually learning model-order reduction frameworks for advective problems spanning large physical regimes.
Figure 8: Latent-space forecasts on testing data for (a) the POD, (b) the CAE, and (c) the VAE with a reduced dimension of 8 DOF. The shaded areas indicate confidence intervals from the probabilistic forecasts.
Figure 9: Reconstruction from GP forecasts for a four-dimensional latent space at three successive times (a–c) for a testing initial condition. Qualitatively, for the same data set and similar architectures, the CAE outperforms the VAE. Note that both the VAE and CAE are superior to the POD.

Acknowledgments

RM acknowledges support from the Margaret Butler Fellowship at the Argonne Leadership Computing Facility. TB, LRM, and IP acknowledge support from Wave 1 of The UKRI Strategic Priorities Fund under the EPSRC Grant EP/T001569/1, particularly the
Digital Twins for Complex Engineering Systems theme within that grant, and The Alan Turing Institute. IP acknowledges funding from the Imperial College Research Fellowship scheme. This material is based upon work supported by the U.S. Department of Energy (DOE), Office of Science, Office of Advanced Scientific Computing Research, under Contract DE-AC02-06CH11357. This research was funded in part and used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357. This paper describes objective technical results and analysis. Any subjective views or opinions that might be expressed in the paper do not necessarily represent the views of the U.S. DOE or the United States Government.

Declaration of Interests: None.
Figure 10: Reconstruction from GP forecasts for an eight-dimensional latent space at three successive times (a–c) for a testing initial condition. Qualitatively, for the same data set and similar architectures, the CAE outperforms the VAE. Note that both the VAE and CAE are superior to the POD.

Figure 11: Latent-space interpolation for a testing data set with finer resolution in time. The purpose of this assessment is to determine the proposed framework's viability for super-resolution in time. While all the training data are obtained by coarsely sampling a simulation over the course of its evolution, the assessments are tested against a more finely sampled simulation.

Figure 12: True fields (left), CAE reconstructions (middle), and GP-interpolated reconstructions (right) for a testing simulation that has been sampled more finely in time. Note that the instances in time shown are not the usual sampling points utilized for our CAE or GP training.

References

[1] Shady E Ahmed, Omer San, Adil Rasheed, and Traian Iliescu. A long short-term memory embedding for hybrid uplifted reduced order models. Physica D: Nonlinear Phenomena, page 132471, 2020.
[2] Gal Berkooz, Philip Holmes, and John L Lumley. The proper orthogonal decomposition in the analysis of turbulent flows. Annual Review of Fluid Mechanics, 25(1):539–575, 1993.
[3] Christopher M Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
[4] Tan Bui-Thanh, Murali Damodaran, and Karen Willcox. Aerodynamic data reconstruction and inverse design using proper orthogonal decomposition. AIAA Journal, 42(8):1505–1516, 2004.
[5] Kevin Carlberg, Matthew Barone, and Harbir Antil. Galerkin v. least-squares Petrov–Galerkin projection in nonlinear model reduction. Journal of Computational Physics, 330:693–734, 2017.
[6] Kevin Carlberg, Charbel Bou-Mosleh, and Charbel Farhat. Efficient non-linear model reduction via a least-squares Petrov–Galerkin projection and compressive tensor approximations. International Journal for Numerical Methods in Engineering, 86(2):155–181, 2011.
[7] Kathleen Champion, Bethany Lusch, J Nathan Kutz, and Steven L Brunton. Data-driven discovery of coordinates and governing equations. arXiv preprint arXiv:1904.02107, 2019.
[8] Ashesh Chattopadhyay, Pedram Hassanzadeh, and Saba Pasha. Predicting clustered weather patterns: A test case for applications of convolutional neural networks to spatio-temporal climate data. Scientific Reports, 10(1):1–13, 2020.
[9] Ashesh Chattopadhyay, Ebrahim Nabizadeh, and Pedram Hassanzadeh. Analog forecasting of extreme-causing weather patterns using deep learning. arXiv preprint arXiv:1907.11617, 2019.
[10] Alexander G De G. Matthews, Mark Van Der Wilk, Tom Nickson, Keisuke Fujii, Alexis Boukouvalas, Pablo León-Villagrá, Zoubin Ghahramani, and James Hensman. GPflow: A Gaussian process library using TensorFlow. The Journal of Machine Learning Research, 18(1):1299–1304, 2017.
[11] Fangxin Fang, Christopher C Pain, IM Navon, Ahmed H Elsheikh, Juan Du, and D Xiao. Non-linear Petrov–Galerkin methods for reduced order hyperbolic equations and discontinuous finite element methods. Journal of Computational Physics, 234:540–559, 2013.
[12] Francisco J Gonzalez and Maciej Balajewicz. Learning low-dimensional feature dynamics using deep convolutional recurrent autoencoders. arXiv preprint arXiv:1808.01346, 2018.
[13] Jan S Hesthaven and Stefano Ubbiali. Non-intrusive reduced order modeling of nonlinear problems using neural networks. Journal of Computational Physics, 363:55–78, 2018.
[14] William W Hsieh and Benyang Tang. Applying neural network models to prediction and data analysis in meteorology and oceanography. Bulletin of the American Meteorological Society, 79(9):1855–1870, 1998.
[15] I Kalashnikova and MF Barone. On the stability and convergence of a Galerkin reduced order model (ROM) of compressible flow with solid wall and far-field boundary treatment. International Journal for Numerical Methods in Engineering, 83(10):1345–1375, 2010.
[16] Virginia L Kalb and Anil E Deane. An intrinsic stabilization scheme for proper orthogonal decomposition based low-dimensional models. Physics of Fluids, 19(5):054106, 2007.
[17] Byungsoo Kim, Vinicius C Azevedo, Nils Thuerey, Theodore Kim, Markus Gross, and Barbara Solenthaler. Deep fluids: A generative network for parameterized fluid simulations. In Computer Graphics Forum, volume 38, pages 59–70. Wiley Online Library, 2019.
[18] Diederik P Kingma and Max Welling. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
[19] Milan Korda, Mihai Putinar, and Igor Mezić. Data-driven spectral analysis of the Koopman operator. Applied and Computational Harmonic Analysis, 2018.
[20] DD Kosambi. Statistics in function space. Journal of the Indian Mathematical Society, 7:76–88, 1943.
[21] Jiaqing Kou and Weiwei Zhang. Layered reduced-order models for nonlinear aerodynamics and aeroelasticity. Journal of Fluids and Structures, 68:174–193, 2017.
[22] J N Kutz, X Fu, and S L Brunton. Multi-resolution dynamic mode decomposition. SIAM Journal on Applied Dynamical Systems, 15(2):713–735, 2016.
[23] J Nathan Kutz, Steven L Brunton, Bingni W Brunton, and Joshua L Proctor. Dynamic Mode Decomposition: Data-Driven Modeling of Complex Systems. SIAM, 2016.
[24] Kookjin Lee and Kevin T Carlberg. Model reduction of dynamical systems on nonlinear manifolds using deep convolutional autoencoders. Journal of Computational Physics, 404:108973, 2020.
[25] James Robert Lloyd, David Duvenaud, Roger Grosse, Joshua B Tenenbaum, and Zoubin Ghahramani. Automatic construction and natural-language description of nonparametric regression models. stat, 1050:24, 2014.
[26] Zichao Long, Yiping Lu, Xianzhong Ma, and Bin Dong. PDE-Net: Learning PDEs from data. arXiv preprint arXiv:1710.09668, 2017.
[27] Andrea Mannarino and Paolo Mantegazza. Nonlinear aeroelastic reduced order modeling by recurrent neural networks. Journal of Fluids and Structures, 48:103–121, 2014.
[28] Romit Maulik, Romain Egele, Bethany Lusch, and Prasanna Balaprakash. Recurrent neural network architecture search for geophysical emulation. arXiv preprint arXiv:2004.10928, 2020.
[29] Romit Maulik, Bethany Lusch, and Prasanna Balaprakash. Non-autoregressive time-series methods for stable parametric reduced-order models. arXiv preprint arXiv:2006.14725, 2020.
[30] Romit Maulik, Arvind Mohan, Bethany Lusch, Sandeep Madireddy, Prasanna Balaprakash, and Daniel Livescu. Time-series learning of latent-space dynamics for reduced-order model closure. Physica D: Nonlinear Phenomena, page 132368, 2020.
[31] Ariana Mendible, Steven L Brunton, Aleksandr Y Aravkin, Wes Lowrie, and J Nathan Kutz. Dimensionality reduction and reduced order modeling for traveling wave physics. arXiv preprint arXiv:1911.00565, 2019.
[32] Arvind Mohan, Don Daniel, Michael Chertkov, and Daniel Livescu. Compressed convolutional LSTM: An efficient deep learning framework to model high fidelity 3D turbulence. arXiv preprint arXiv:1903.00033, 2019.
[33] Arvind T Mohan and Datta V Gaitonde. A deep learning based approach to reduced order modeling for turbulent flow control using LSTM neural networks. arXiv preprint arXiv:1804.09269, 2018.
[34] Muhammad Mohebujjaman, Leo G Rebholz, and Traian Iliescu. Physically constrained data-driven correction for reduced-order modeling of fluid flows. International Journal for Numerical Methods in Fluids, 89(3):103–122, 2019.
[35] Bernd R Noack, Marek Morzynski, and Gilead Tadmor. Reduced-Order Modelling for Flow Control, volume 528. Springer Science & Business Media, 2011.
[36] Maziar Raissi. Deep hidden physics models: Deep learning of nonlinear partial differential equations. The Journal of Machine Learning Research, 19(1):932–955, 2018.
[37] Maziar Raissi, Paris Perdikaris, and George E Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.
[38] S Ashwin Renganathan, Yingjie Liu, and Dimitri N Mavris. Koopman-based approach to nonintrusive projection-based reduced-order modeling with black-box high-fidelity models. AIAA Journal, 56(10):4087–4111, 2018.
[39] Omer San and Jeff Borggaard. Basis selection and closure for POD models of convection dominated Boussinesq flows. In , volume 5, 2014.
[40] Omer San and Romit Maulik. Neural network closures for nonlinear model order reduction. Advances in Computational Mathematics, 44(6):1717–1750, 2018.
[41] Justin Sirignano and Konstantinos Spiliopoulos. DGM: A deep learning algorithm for solving partial differential equations. Journal of Computational Physics, 375:1339–1364, 2018.
[42] Kunihiko Taira, Maziar S Hemati, Steven L Brunton, Yiyang Sun, Karthik Duraisamy, Shervin Bagheri, Scott Dawson, and Chi-An Yeh. Modal analysis of fluid flows: Applications and outlook. arXiv preprint arXiv:1903.05750, 2019.
[43] Nils Thuerey, Konstantin Weißenow, Lukas Prantl, and Xiangyu Hu. Deep learning methods for Reynolds-averaged Navier–Stokes simulations of airfoil flows. AIAA Journal, pages 1–12, 2019.
[44] RWCP Verstappen and AEP Veldman. Direct numerical simulation of turbulence at lower costs. Journal of Engineering Mathematics, 32(2-3):143–159, 1997.
[45] Pantelis R Vlachas, Wonmin Byeon, Zhong Y Wan, Themistoklis P Sapsis, and Petros Koumoutsakos. Data-driven forecasting of high-dimensional chaotic systems with long short-term memory networks. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 474(2213):20170844, 2018.
[46] Qian Wang, Nicolò Ripamonti, and Jan S Hesthaven. Recurrent neural network closure of parametric POD-Galerkin reduced-order models based on the Mori–Zwanzig formalism. Journal of Computational Physics, page 109402, 2020.
[47] Zhu Wang, Imran Akhtar, Jeff Borggaard, and Traian Iliescu. Proper orthogonal decomposition closure models for turbulent flows: A numerical comparison. Computer Methods in Applied Mechanics and Engineering, 237:10–26, 2012.
[48] Christopher KI Williams and Carl Edward Rasmussen. Gaussian Processes for Machine Learning, volume 2. MIT Press, Cambridge, MA, 2006.
[49] D Xiao, F Fang, J Du, CC Pain, IM Navon, AG Buchan, Ahmed H Elsheikh, and G Hu. Non-linear Petrov–Galerkin methods for reduced order modelling of the Navier–Stokes equations using a mixed finite element pair. Computer Methods in Applied Mechanics and Engineering, 255:147–157, 2013.
[50] Jiayang Xu and Karthik Duraisamy. Multi-level convolutional autoencoder networks for parametric prediction of spatio-temporal dynamics. arXiv preprint arXiv:1912.11114, 2019.
[51] Weiwei Zhang, Jiaqing Kou, and Ziyi Wang. Nonlinear aerodynamic reduced-order model for limit-cycle oscillation and flutter.