Multifidelity Ensemble Kalman Filtering Using Surrogate Models Defined by Physics-Informed Autoencoders
Andrey A. Popov, Adrian Sandu
Computational Science Laboratory Report CSL-TR-21-1
Computational Science Laboratory ("Compute the Future!")
Department of Computer Science, Virginia Tech, Blacksburg, VA 24060
Phone: (540) 231-2193, Fax: (540) 231-6075
Email: [email protected], [email protected]
Web: http://csl.cs.vt.edu
Andrey A. Popov* & Adrian Sandu
*Address all correspondence to: Andrey A. Popov, E-mail: [email protected]
Original Manuscript Submitted: mm/dd/yyyy; Final Draft Received: mm/dd/yyyy

The multifidelity ensemble Kalman filter aims to combine a full-order model and a hierarchy of reduced order surrogate models in an optimal statistical framework for Bayesian inference in sequential data assimilation. In this work we extend the multifidelity ensemble Kalman filter to work with non-linear couplings between the models. Using autoencoders it is possible to train optimal projection and interpolation operators, and to obtain reduced order surrogate models with less error than conventional linear methods. We show on the canonical Lorenz '96 model that such a surrogate does indeed perform better in the context of multifidelity filtering.
KEY WORDS: Bayesian inference, control variates, data assimilation, multifidelity ensemble Kalman filter, reduced order modeling, machine learning, surrogate models
1. INTRODUCTION
Unlike the "big-data" approaches that are en vogue at the moment, data assimilation (Asch et al., 2016; Reich and Cotter, 2015) is, by necessity, a "small-data" affair. Instead of more traditional data-based approaches, which aim to learn black-box solutions, at the heart of data assimilation is a well-posed inference problem, where one aims to learn knowledge about a quantity of interest through a mathematically rigorous approach. This work aims to land somewhere in the middle (Karpatne et al., 2017; Willard et al., 2020), by attempting to combine a mathematically rigorous data assimilation algorithm and a data-rigorous machine learning algorithm, through powerful techniques in multilevel inference (Giles, 2008, 2015).

More formally, data assimilation is an inference process that aims at fusing information obtained from an inexact physics-based model, and from noisy sparse observations of reality, in order to enhance our understanding of some unknown process. From the perspective of machine learning, data assimilation is a small-data learning problem, where the quantity of interest is weakly constrained by prior assumptions, and nudged towards the optimum by small quantities of imperfect data.
The ensemble Kalman filter (EnKF) (Evensen, 1994, 2009; Burgers et al., 1998) is the name for a family of related methods that attempt to solve the data assimilation problem using Gaussian assumptions on an ensemble of model state realizations.

Ensemble size is typically the limiting factor in the efficacy of ensemble-based data assimilation methods, thus heuristic correction methods such as covariance shrinkage (Nino-Ruiz and Sandu, 2015, 2019) and localization (Moosavi et al., 2019; Petrie, 2008; Popov and Sandu, 2019) are required in operational implementations of the ensemble Kalman filter; reducing the need for such heuristic corrections is an important and active area of research.

Full order model runs are one of the dominant costs in any operational data assimilation setup, thus augmenting data assimilation algorithms with reduced order models (Cao et al., 2007; Farrell and Ioannou, 2001) is an attractive way to reduce that cost.
The idea of leveraging model hierarchies in numerical algorithms for uncertainty quantification (Peherstorfer et al., 2018) is fast gaining traction in both the data assimilation and machine learning communities. We focus our attention on reduced order models (ROMs) whose dynamics can be described by a smaller number of variables than the corresponding full order model (FOM). Many different types of ROMs exist. We however focus on just two: a proper orthogonal decomposition (POD) based ROM corresponding to a linear projection of the dynamics (Brunton and Kutz, 2019), and a ROM based on autoencoders (Aggarwal et al., 2018) corresponding to a non-linear projection of the dynamics.
The multifidelity ensemble Kalman filter (MFEnKF) (Popov et al., 2021; Popov and Sandu, in press) combines the ensemble Kalman filter with the linear control variate framework in order to leverage a hierarchy of models in sequential data assimilation. While our focus is motivated by the numerical weather prediction literature (Kalnay, 2003), other applications can benefit from our multifidelity approach, such as mechanical engineering (Blanchard et al., 2009, 2010a,b) and air quality modeling (Constantinescu et al., 2007a,b,c).

This paper is organized as follows. Section 2 discusses the data assimilation problem, provides background on control variates, the EnKF, and the MFEnKF, as well as ROMs and autoencoders. Section 3 introduces the non-linear extension to the MFEnKF. Section 4 introduces the Lorenz '96 model and the POD-ROM and NN-ROM. Section 5 presents numerical experiments. Section 6 provides concluding remarks.
2. BACKGROUND
As the aphorism goes, "all models are wrong, but some are useful." Sequential data assimilation propagates our imperfect knowledge about some physical quantity of interest through an imperfect model which represents a time-evolving dynamical system, typically a chaotic one (Strogatz, 2018). Without an additional influx of external information, our knowledge about the system would rapidly degrade; therefore acquiring and assimilating imperfect external information about the quantity of interest is the core of the problem that we are interested in.

To be specific, consider a dynamical system of interest whose true state at time t_i is X^t_i. The evolution of the system is captured by the dynamical model

X_i = \mathcal{M}_{t_{i-1},t_i}(X_{i-1}) + \Xi_i, \qquad (2.1)

where X_i is a random variable whose distribution describes our knowledge of the state of the physical process at time index i, and \Xi_i is a random variable describing the modeling error. In this paper we assume a perfect model (\Xi_i \equiv 0), as a discussion of model error methods is significantly outside the scope of this paper.

Additional independent information about the system is obtained through imperfect measurements of the observable aspects Y_i of the truth X^t_i, i.e., through noisy observations

Y_i = \mathcal{H}(X^t_i) + \eta_i, \qquad \eta_i \sim \mathcal{N}(0, \Sigma_{\eta_i,\eta_i}), \qquad (2.2)

where the "observation operator" \mathcal{H} maps the model space onto the observation space (i.e., selects the observable aspects of the state).

Our aim is to combine the two sources of information in a Bayesian framework:

\pi(X_i \mid Y_i) \propto \pi(Y_i \mid X_i)\, \pi(X_i). \qquad (2.3)

In the remainder of the paper we use the following notation. Let W and V be random variables. The exact mean of W is denoted by \mu_W, and the exact covariance between W and V by \Sigma_{W,V}. E_W denotes an ensemble of samples of W, and \widetilde{\mu}_W and \widetilde{\Sigma}_{W,V} are the empirical ensemble mean of W and the empirical ensemble covariance of W and V, respectively.

2.1. Control Variates

Bayesian inference requires that all available information is used in order to obtain correct results (Jaynes, 2003). Variance reduction techniques (Owen, 2013) are methods that provide estimates of some quantity of interest with lower variability. From a Bayesian perspective they represent a reintroduction of information that was previously ignored. The linear control variate (LCV) approach (Rubinstein and Marcus, 1985) is a variance reduction method that aims to incorporate the new information in an optimal linear sense.

LCV works with two vector spaces, a principal space \mathcal{X} and a control space \mathcal{U}, and several random variables, as follows. The principal variate X is a \mathcal{X}-valued random variable, and the control variate \hat{U} is a highly correlated \mathcal{U}-valued random variable. The ancillary variate U is a \mathcal{U}-valued random variable, which is uncorrelated with the preceding two variables, but shares the same mean as \hat{U}, meaning \mu_U = \mu_{\hat{U}}. The linear control variate framework builds a new \mathcal{X}-valued random variable Z, called the total variate:

Z = X - S(\hat{U} - U), \qquad (2.4)

where the linear "gain" operator S is used to minimize the variance in Z by utilizing the information about the distributions of the constituent variates X, \hat{U}, and U.

In this work \mathcal{X} and \mathcal{U} are finite dimensional vector spaces. The dimension of \mathcal{X} is n; the dimension of \mathcal{U} is denoted by m when it is the observation space, and by r when it is the reduced order model state space. The following lemma is a slight generalization of (Rubinstein and Marcus, 1985, Appendix).
Lemma 1 (Optimal gain). The optimal gain S that minimizes the trace of the covariance of Z (2.4) is:

S = \Sigma_{X,\hat{U}} \left( \Sigma_{\hat{U},\hat{U}} + \Sigma_{U,U} \right)^{-1}. \qquad (2.5)

Here we focus on the case where the principal and control variates are related through a deterministic projection operator:

\hat{U} = \theta(X), \qquad (2.6)

and where there exists an interpolation operator such that:

X \approx \widetilde{X} = \phi(\hat{U}), \qquad (2.7)

which in some optimal sense approximately recovers the information embedded in X. The variable \widetilde{X} is known as the reconstruction. We make the assumption that the projection and interpolation operators are linear:

\theta(X) := \Theta X, \qquad \phi(\hat{U}) := \Phi \hat{U}. \qquad (2.8)

We reproduce below the useful result (Popov et al., 2021, Theorem 3.1).

Theorem 1. Under the assumptions that \hat{U} and U have equal covariances, and that the principal variate residual is uncorrelated with the control variate, the optimal gain of (2.4) is half the interpolation operator:

\Sigma_{\hat{U},\hat{U}} = \Sigma_{U,U} \text{ and } \Sigma_{(X - \Phi\hat{U}),\hat{U}} = 0 \implies S = \tfrac{1}{2}\Phi. \qquad (2.9)

Under the assumptions of Theorem 1 the control variate structure (2.4) is:

Z = X - \tfrac{1}{2}\Phi\left(\hat{U} - U\right). \qquad (2.10)
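To make the total variate concrete, the following minimal Python sketch (our illustration, not code from the paper; the dimensions, the sampling model, and the orthonormal operator pair are assumptions) draws correlated principal/control samples and an independent ancillary copy, and checks that the total variate (2.10), with gain S = Φ/2, has a smaller total variance than the principal variate alone.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, N = 8, 3, 200_000  # illustrative dimensions and sample count

# Orthonormal interpolation/projection pair: Theta @ Phi = I_r.
Phi, _ = np.linalg.qr(rng.standard_normal((n, r)))
Theta = Phi.T

A = rng.standard_normal((n, n))            # fixes a nontrivial covariance A A^T
X = rng.standard_normal((N, n)) @ A.T      # principal variate samples (rows)
U_hat = X @ Theta.T                        # control variate: U_hat = Theta X
U = (rng.standard_normal((N, n)) @ A.T) @ Theta.T  # ancillary: independent, same law

Z = X - 0.5 * (U_hat - U) @ Phi.T          # total variate, Eq. (2.10)

# The total variate keeps the mean of X but has a smaller total variance.
print("tr Cov(X):", np.trace(np.cov(X, rowvar=False)))
print("tr Cov(Z):", np.trace(np.cov(Z, rowvar=False)))
```

In this orthonormal setting the reduction in total variance equals half the energy of X captured by the projector ΦΘ, so a surrogate subspace that captures more of the dynamics yields a larger variance reduction.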
2.2. Non-linear Control Variates

While working with linear transformations is easy, many applications require reducing the variance of a non-linear transformation of a random variable. We generalize the control variate framework to such an application.

Following Owen (2013), assume that our transformed principal variate is of the form h(X), and the transformed control variate is of the form g(\hat{U}), where h and g are smooth non-linear functions. Similarly, assume that the transformed ancillary variate has the form g(U). The total variate Z_h, in the same space as h(X), is defined as:

Z_h = h(X) - S\left( g(\hat{U}) - g(U) \right), \qquad (2.11)

with the optimal linear gain given by Lemma 1:

S = \Sigma_{h(X),g(\hat{U})} \left( \Sigma_{g(\hat{U}),g(\hat{U})} + \Sigma_{g(U),g(U)} \right)^{-1}. \qquad (2.12)

Theorem 2. If \hat{U} and U are Gaussian, and share the same mean and covariance (\mu_{\hat{U}} = \mu_U, \Sigma_{\hat{U},\hat{U}} = \Sigma_{U,U}), then

\mu_{g(\hat{U})} = \mu_{g(U)}, \qquad (2.13)

the control variate structure (2.11) holds, and the optimal linear gain is:

S = \tfrac{1}{2}\, \Sigma_{h(X),g(\hat{U})} \Sigma^{-1}_{g(\hat{U}),g(\hat{U})}. \qquad (2.14)

Proof. As \hat{U} and U are Gaussian, they are completely defined by their mean and covariance, and any deterministic non-linear transformation of the variates has the same moments, thereby sharing the mean and covariance. In that case g(\hat{U}) - g(U) is unbiased, and the control variate framework can be applied.

We now consider the case where the principal variate transformation is linear, h = \mathcal{H}, and the control and ancillary variate transformation is the interpolation operator, g = \phi, such that the underlying control variate is a projection of the principal variate (2.42),

\mathcal{H} X \approx \phi(\hat{U}). \qquad (2.15)
Theorem 3. Without proof, if the assumptions of Theorem 2 hold, and the transformed principal variate residual is uncorrelated with the transformed control variate,

\Sigma_{(\mathcal{H}X - \phi(\hat{U})),\,\phi(\hat{U})} \overset{\text{assumed}}{=} 0, \qquad (2.16)

then the optimal gain is

S = \tfrac{1}{2} I, \qquad (2.17)

where I is the identity operator.

2.3. The Ensemble Kalman Filter

The EnKF is a statistical approximation to the optimal control variate structure (2.4), where the underlying probability density functions are represented by empirical measures using ensembles, i.e., a finite number of realizations of the random variables. The linear control variate framework allows to combine multiple ensembles into one that better represents the desired quantity of interest.

Let E_{X^b_i} \in \mathbb{R}^{n \times N_X} be an ensemble of N_X realizations of the n-dimensional principal variate, which represents our prior uncertainty in the model state at time index i from (2.1). Likewise, let E_{\mathcal{H}_i(X^b_i)} = \mathcal{H}_i(E_{X^b_i}) \in \mathbb{R}^{m \times N_X} be an ensemble of N_X realizations of the m-dimensional control observation state variate, which represents the same model realizations cast into observation space. Let E_{Y_i} \in \mathbb{R}^{m \times N_X} be an ensemble of N_X 'perturbed observations', which is a statistical correction required in the ensemble Kalman filter (Burgers et al., 1998).

Remark 4 (EnKF Perturbed Observations). Each ensemble member of the perturbed observations is sampled from a Gaussian distribution whose mean is the measured value, with the known observation covariance from (2.2):

[E_{Y_i}]_{:,e} \sim \mathcal{N}(\mu_{Y_i}, \Sigma_{\eta_i,\eta_i}). \qquad (2.18)

The prior ensemble at time step i is obtained by propagating the posterior ensemble at time i-1:

E_{X^b_i} = \mathcal{M}_{t_{i-1},t_i}\left( E_{X^a_{i-1}} \right), \qquad (2.19)

where the slight abuse of notation indicates an independent model propagation of each ensemble member. Application of the Kalman filter formula constructs an ensemble E_{X^a_i} describing the posterior uncertainty:

E_{X^a_i} = E_{X^b_i} - \widetilde{K}_i \left( E_{\mathcal{H}_i(X^b_i)} - E_{Y_i} \right), \qquad (2.20)

where the statistical Kalman gain is an ensemble-based approximation to the exact gain in Lemma 1:

\widetilde{K}_i = \widetilde{\Sigma}_{X^b_i, \mathcal{H}_i(X^b_i)} \left( \widetilde{\Sigma}_{\mathcal{H}_i(X^b_i), \mathcal{H}_i(X^b_i)} + \Sigma_{\eta_i,\eta_i} \right)^{-1}. \qquad (2.21)

Remark 5 (Inflation). Inflation is a probabilistic correction necessary to account for the Kalman gain being correlated to the ensemble (Popov and Sandu, 2020). In inflation the ensemble anomalies (deviations from the statistical mean) are multiplied by a constant \alpha > 1, thereby increasing the covariance of the distribution described by the ensemble:

E_{X^b_{i+1}} \leftarrow \widetilde{\mu}_{X^b_{i+1}} + \alpha \left( E_{X^b_{i+1}} - \widetilde{\mu}_{X^b_{i+1}} \right). \qquad (2.22)
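A compact Python sketch of the analysis step (2.18)-(2.21) follows; this is our illustration under the paper's notation, not the authors' implementation, and the function and argument names are our own.

```python
import numpy as np

def enkf_analysis(E_X, H, y, Sigma_eta, rng):
    """Perturbed-observation EnKF analysis step, Eqs. (2.18)-(2.21).

    E_X:       n x N ensemble of prior (background) states
    H:         observation operator mapping an n x N ensemble to m x N
    y:         m-vector of measured values (the mean of the perturbed obs)
    Sigma_eta: m x m observation error covariance
    """
    N = E_X.shape[1]
    E_HX = H(E_X)                                      # observation-space ensemble
    E_Y = rng.multivariate_normal(y, Sigma_eta, N).T   # perturbed observations (2.18)

    A_X = E_X - E_X.mean(axis=1, keepdims=True)        # state anomalies
    A_HX = E_HX - E_HX.mean(axis=1, keepdims=True)     # observed anomalies

    S_XH = A_X @ A_HX.T / (N - 1)                      # empirical Sigma_{X, H(X)}
    S_HH = A_HX @ A_HX.T / (N - 1)                     # empirical Sigma_{H(X), H(X)}
    K = S_XH @ np.linalg.inv(S_HH + Sigma_eta)         # statistical Kalman gain (2.21)

    return E_X - K @ (E_HX - E_Y)                      # analysis ensemble (2.20)
```

Inflation (2.22) would then rescale the analysis anomalies about their mean before the next forecast step.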
2.4. The Multifidelity Ensemble Kalman Filter

The Multifidelity Ensemble Kalman Filter (MFEnKF) (Popov et al., 2021) seeks to merge the information from a hierarchy of models and the corresponding observations into a coherent representation that can be propagated forward in time. This necessitates that the information from the models is decoupled while implicitly preserving some underlying structure; we take the linear control variate structure for convenience.
Without loss of generality, we discuss here a two-fidelity approach, with a telescopic extension to multiple fidelities provided at the end of the section. Instead of having access to one model \mathcal{M}, assume that we have access to a hierarchy of models. In the two fidelity case, the principal space model is denoted by \mathcal{M}^X and the control space model is denoted by \mathcal{M}^U.

We now consider the total variate

Z^b_i = X^b_i - \tfrac{1}{2}\Phi\left( \hat{U}^b_i - U^b_i \right), \qquad (2.23)

that describes the prior total information from a model that evolves in principal space (X^b_i) and a model that evolves in ancillary space (\hat{U}^b_i and U^b_i).

Assume that our prior total variate is represented by the three ensembles: E_{X^b_i} \in \mathbb{R}^{n \times N_X} consisting of N_X realizations of the n-dimensional principal model state variate, E_{\hat{U}^b_i} \in \mathbb{R}^{r \times N_X} consisting of N_X realizations of the r-dimensional control model state variate, and E_{U^b_i} \in \mathbb{R}^{r \times N_U} consisting of N_U realizations of the r-dimensional ancillary model state variate. Each of these ensembles has a corresponding ensemble of m-dimensional control observation space realizations.

MFEnKF performs sequential data assimilation using the above constituent ensembles, without having to explicitly calculate the ensemble of the total variates. The MFEnKF forecast step propagates the three ensembles from the previous step:

E_{X^b_i} = \mathcal{M}^X_{t_{i-1},t_i}(E_{X^a_{i-1}}), \quad E_{\hat{U}^b_i} = \mathcal{M}^U_{t_{i-1},t_i}(E_{\hat{U}^a_{i-1}}), \quad E_{U^b_i} = \mathcal{M}^U_{t_{i-1},t_i}(E_{U^a_{i-1}}). \qquad (2.24)

Two observation operators \mathcal{H}^X_i and \mathcal{H}^U_i cast the principal model and control model spaces, respectively, into the control observation space. In this paper we assume that the principal model space observation operator is the canonical observation operator (2.2):

\mathcal{H}^X_i(X_i) := \mathcal{H}_i(X_i), \qquad (2.25)

and that the control model space observation operator is the canonical observation operator (2.2) applied to the linear interpolated reconstruction (2.7) of a variable in control model space:

\mathcal{H}^U_i(U_i) := \mathcal{H}_i(\Phi U_i). \qquad (2.26)

Additionally, we define an (approximate) observation operator for the total model variate:

\mathcal{H}^Z_i(Z_i) := \mathcal{H}^X_i(X_i) - \tfrac{1}{2}\left( \mathcal{H}^U_i(\hat{U}_i) - \mathcal{H}^U_i(U_i) \right), \qquad (2.27)

which, under the linearity assumptions on \mathcal{H}^X_i of Theorem 3 and the underlying Gaussian assumptions on \hat{U}_i and U_i of Theorem 2, begets that \mathcal{H}^Z_i = \mathcal{H}^X_i. Even without the linearity assumption the definition (2.27) is operationally useful.

The MFEnKF analysis updates each constituent ensemble as follows:

E_{X^a_i} = E_{X^b_i} - \widetilde{K}_i\left( E_{\mathcal{H}^X_i(X^b_i)} - E_{Y^X_i} \right),
E_{\hat{U}^a_i} = E_{\hat{U}^b_i} - \Theta\widetilde{K}_i\left( E_{\mathcal{H}^U_i(\hat{U}^b_i)} - E_{Y^X_i} \right),
E_{U^a_i} = E_{U^b_i} - \Theta\widetilde{K}_i\left( E_{\mathcal{H}^U_i(U^b_i)} - E_{Y^U_i} \right), \qquad (2.28)

with the heuristic correction to the means

\widetilde{\mu}_{X^a_i} \leftarrow \widetilde{\mu}_{Z^a_i}, \quad \widetilde{\mu}_{\hat{U}^a_i} \leftarrow \Theta\widetilde{\mu}_{Z^a_i}, \quad \widetilde{\mu}_{U^a_i} \leftarrow \Theta\widetilde{\mu}_{Z^a_i}, \qquad (2.29)

applied in order to fulfill the unbiasedness requirement of the control variate structure:

\widetilde{\mu}_{Z^a_i} = \widetilde{\mu}_{Z^b_i} - \widetilde{K}_i\left( \widetilde{\mu}_{\mathcal{H}^X_i(Z^b_i)} - \mu_{Y_i} \right). \qquad (2.30)
The Kalman gain and the covariances are defined by the semi-linearization:

\widetilde{K}_i = \widetilde{\Sigma}_{Z^b_i, \mathcal{H}^Z_i(Z^b_i)} \left( \widetilde{\Sigma}_{\mathcal{H}^Z_i(Z^b_i), \mathcal{H}^Z_i(Z^b_i)} + \Sigma_{\eta_i,\eta_i} \right)^{-1}, \qquad (2.31)

\widetilde{\Sigma}_{Z^b_i, \mathcal{H}^Z_i(Z^b_i)} = \widetilde{\Sigma}_{X^b_i, \mathcal{H}^X_i(X^b_i)} + \tfrac{1}{4}\widetilde{\Sigma}_{\Phi\hat{U}^b_i, \mathcal{H}^U_i(\hat{U}^b_i)} + \tfrac{1}{4}\widetilde{\Sigma}_{\Phi U^b_i, \mathcal{H}^U_i(U^b_i)} - \tfrac{1}{2}\widetilde{\Sigma}_{X^b_i, \mathcal{H}^U_i(\hat{U}^b_i)} - \tfrac{1}{2}\widetilde{\Sigma}_{\Phi\hat{U}^b_i, \mathcal{H}^X_i(X^b_i)}, \qquad (2.32)

\widetilde{\Sigma}_{\mathcal{H}^Z_i(Z^b_i), \mathcal{H}^Z_i(Z^b_i)} = \widetilde{\Sigma}_{\mathcal{H}^X_i(X^b_i), \mathcal{H}^X_i(X^b_i)} + \tfrac{1}{4}\widetilde{\Sigma}_{\mathcal{H}^U_i(\hat{U}^b_i), \mathcal{H}^U_i(\hat{U}^b_i)} + \tfrac{1}{4}\widetilde{\Sigma}_{\mathcal{H}^U_i(U^b_i), \mathcal{H}^U_i(U^b_i)} - \tfrac{1}{2}\widetilde{\Sigma}_{\mathcal{H}^X_i(X^b_i), \mathcal{H}^U_i(\hat{U}^b_i)} - \tfrac{1}{2}\widetilde{\Sigma}_{\mathcal{H}^U_i(\hat{U}^b_i), \mathcal{H}^X_i(X^b_i)}. \qquad (2.33)

In order to ensure that the control variate \hat{U} remains highly correlated to the principal variate X, at the end of each analysis step we replace the analysis control variate ensemble with the corresponding projection of the principal variate ensemble:

E_{\hat{U}^a_i} \leftarrow \Theta E_{X^a_i}. \qquad (2.34)

Remark 6 (MFEnKF Perturbed observations). There is no unique way to perform perturbed observations (Remark 4) in the MFEnKF; we present one way in this paper. As Theorem 2 requires both the control and ancillary variates to share the same covariance, we utilize here the 'control space uncertainty consistency' approach. The perturbed observations ensembles in (2.28) are defined by:

[E_{Y^X_i}]_{:,e} \sim \mathcal{N}(\mu_{Y_i}, \Sigma_{\eta_i,\eta_i}), \qquad (2.35)
[E_{Y^U_i}]_{:,e} \sim \mathcal{N}(\mu_{Y_i}, s\,\Sigma_{\eta_i,\eta_i}), \qquad (2.36)

where s is a scaling factor. See (Popov et al., 2021, Section 4.2) for a more detailed discussion about perturbed observations.

Remark 7 (MFEnKF Inflation). Similarly to the EnKF (see Remark 5), the MFEnKF also requires inflation in order to account for the statistical Kalman gain being correlated to its constituent ensembles. For a two fidelity MFEnKF, two inflation factors are required: \alpha_X, which acts on the anomalies of the principal and control variates (as they must remain highly correlated), and \alpha_U, which acts on the ensemble anomalies of the ancillary variate:

E_{X^b_{i+1}} \leftarrow \widetilde{\mu}_{X^b_{i+1}} + \alpha_X \left( E_{X^b_{i+1}} - \widetilde{\mu}_{X^b_{i+1}} \right),
E_{\hat{U}^b_{i+1}} \leftarrow \widetilde{\mu}_{\hat{U}^b_{i+1}} + \alpha_X \left( E_{\hat{U}^b_{i+1}} - \widetilde{\mu}_{\hat{U}^b_{i+1}} \right),
E_{U^b_{i+1}} \leftarrow \widetilde{\mu}_{U^b_{i+1}} + \alpha_U \left( E_{U^b_{i+1}} - \widetilde{\mu}_{U^b_{i+1}} \right). \qquad (2.37)
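The following Python sketch assembles the semi-linearized gain from the three constituent ensembles; it is our reading of (2.31)-(2.33), assuming the one-half weighting of the total variate (2.23), and all names are illustrative.

```python
import numpy as np

def _cross_cov(A, B):
    """Empirical cross covariance of two equally sized ensembles (columns = members)."""
    Ac = A - A.mean(axis=1, keepdims=True)
    Bc = B - B.mean(axis=1, keepdims=True)
    return Ac @ Bc.T / (A.shape[1] - 1)

def mfenkf_gain(E_X, E_Uhat, E_U, H_X, H_U, Phi, Sigma_eta):
    """Semi-linearized MFEnKF gain of Eqs. (2.31)-(2.33).

    E_X and E_Uhat share the ensemble size N_X; E_U has size N_U."""
    HX, HUhat, HU = H_X(E_X), H_U(E_Uhat), H_U(E_U)

    # Eq. (2.32): cross covariance of the total variate with its observation.
    S_ZH = (_cross_cov(E_X, HX)
            + 0.25 * _cross_cov(Phi @ E_Uhat, HUhat)
            + 0.25 * _cross_cov(Phi @ E_U, HU)
            - 0.5 * _cross_cov(E_X, HUhat)
            - 0.5 * _cross_cov(Phi @ E_Uhat, HX))

    # Eq. (2.33): observation-space covariance of the total variate.
    S_HH = (_cross_cov(HX, HX)
            + 0.25 * _cross_cov(HUhat, HUhat)
            + 0.25 * _cross_cov(HU, HU)
            - 0.5 * _cross_cov(HX, HUhat)
            - 0.5 * _cross_cov(HUhat, HX))

    return S_ZH @ np.linalg.inv(S_HH + Sigma_eta)      # Eq. (2.31)
```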
2.5. Reduced Order Modeling

It is well known that the important information of many dynamical systems can be expressed with significantly fewer dimensions than the discretization dimension n (Sell and You, 2013). For many infinite dimensional equations it is possible to construct a finite-dimensional inertial manifold that represents the dynamics of the system (including the global attractor). The Hausdorff dimension of the global attractor of some dynamical system is a useful lower bound for the minimal representation of the dynamics, though a representation of just the attractor is likely not sufficient to fully capture all the 'useful' aspects of the data. For data-based reduced order models an important aspect is the intrinsic dimension (Lee and Verleysen, 2007) of the data. The authors are not aware of any formal statements relating the dimension of an inertial manifold and the intrinsic dimension of some finite discretization of the dynamics. For some reduced dimension r that is sufficient to represent either the dynamics or the data, or both, it is possible to build a 'useful' surrogate model.

We will now discuss the construction of reduced order models for problems posed as ordinary differential equations. The following derivations are similar to those found in Lee and Carlberg (2020), but assume vector spaces and no re-centering. Just like in the control variate framework in section 2.1, our full order model will reside in the principal space \mathcal{X} and our reduced order model will be defined in the space \mathcal{U}, which is related to \mathcal{X} through some coupling. Given an initial value problem in \mathcal{X}:

\frac{\mathrm{d}X}{\mathrm{d}t} = f(X), \quad X(t_0) = X_0, \quad t \in [t_0, t_f], \qquad (2.38)

and the projection operator (2.42), the induced reduced order model initial value problem in \mathcal{U} is defined by simple differentiation of U = \theta(X), by dynamics in the space \mathcal{U}:

\frac{\mathrm{d}U}{\mathrm{d}t} = \theta'(X)\, f(X), \quad X(t_0) = X_0, \quad t \in [t_0, t_f]. \qquad (2.39)

As is common, the full order trajectory is not available during integration, as there is no bijection from \mathcal{X} to \mathcal{U}; thus an approximation using the interpolation operator (2.7) that fully resides in \mathcal{U} is used instead:

\frac{\mathrm{d}U}{\mathrm{d}t} = \theta'(\phi(U))\, f(\phi(U)), \quad U(t_0) = \theta(X_0), \quad t \in [t_0, t_f]. \qquad (2.40)

Note that this is not the only way to obtain a reduced order model by using arbitrary projection and interpolation operators. It is however the simplest extension of the POD-based ROM framework.

Remark 8 (Linear ROM). In the linear case (2.8) the reduced order model (2.40) takes the form

\frac{\mathrm{d}U}{\mathrm{d}t} = \Theta f(\Phi U), \quad U(t_0) = \Theta X_0, \quad t \in [t_0, t_f]. \qquad (2.41)

2.6. Autoencoders

An autoencoder (Aggarwal et al., 2018) is an artificial neural network consisting of two components, an encoder \theta and a decoder \phi, such that given a variable X in the principal space, the variable

\hat{U} = \theta(X), \qquad (2.42)

resides in the control space of the encoder. Conversely the reconstruction,

\widetilde{X} = \phi(\hat{U}), \qquad (2.43)

is an approximation to X in the principal space. In our context we think of the encoder as a non-linear projection operator (2.42), and of the decoder as a non-linear interpolation operator (2.7). While the dimension n of the principal space is relatively high, the arbitrary structure of an artificial neural network allows the autoencoder to learn the optimal r-dimensional (small) representation of the data.
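As a sketch of how the induced ROM (2.40) can be evaluated for a non-linear encoder such as an autoencoder, the Jacobian-vector product θ'(x)v can be approximated without forming the full Jacobian. The forward difference below is an assumption for illustration; in practice automatic differentiation would be used.

```python
import numpy as np

def induced_rom_rhs(u, f, theta, phi, eps=1e-6):
    """Right-hand side of the induced ROM (2.40): du/dt = theta'(phi(u)) f(phi(u)).

    f, theta, phi are callables on NumPy vectors; the Jacobian-vector product
    theta'(x) v is approximated by a first order forward difference.
    """
    x = phi(u)                                    # lift to the principal space
    v = f(x)                                      # full order tendency at the lift
    return (theta(x + eps * v) - theta(x)) / eps  # approximates theta'(x) @ v
```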
3. NON-LINEAR PROJECTION-BASED MFENKF
We extend MFEnKF to work with non-linear projection and interpolation operators. Theoretical extensions of the linear control variate framework to the non-linear case (Nelson, 1987) are not as of yet satisfactory. As is traditional in the data assimilation literature, we resort to heuristic assumptions, which in practice may work significantly better than the theory predicts. The new algorithm is named NL-MFEnKF.
The main idea is to replace the optimal control variate structure for linear projection and interpolation operators (2.10) with one that works with their non-linear counterparts (2.11):

Z^b_i = X^b_i - \tfrac{1}{2}\left( \phi(\hat{U}^b_i) - \phi(U^b_i) \right). \qquad (3.1)

We assume that \hat{U} and U are Gaussian, and have the same mean and covariance, such that they obey the assumptions made in Theorem 2 and in Theorem 3 for the optimal gain. Similar to MFEnKF (2.26), the control model space observation operator is the application of the canonical observation operator (2.2) to the reconstruction:

\mathcal{H}^U_i(U_i) := \mathcal{H}_i(\phi(U_i)), \qquad (3.2)

with the other observation operators (2.25, 2.27) defined as in the MFEnKF.

Remark 9. It is of independent interest to explore control model space observation operators that are not of the form (3.2). For example, if the interpolation operator \phi is created through an autoencoder, the control model space observation operator \mathcal{H}^U could similarly be a different decoder of the same latent space.

The MFEnKF equations (2.28) are replaced by their non-linear counterparts in a manner similar to what is done with non-linear observation operators,

E_{X^a_i} = E_{X^b_i} - \widetilde{K}_i\left( E_{\mathcal{H}^X_i(X^b_i)} - E_{Y^X_i} \right),
E_{\hat{U}^a_i} = E_{\hat{U}^b_i} - \widetilde{K}^\theta_i\left( E_{\mathcal{H}^U_i(\hat{U}^b_i)} - E_{Y^X_i} \right),
E_{U^a_i} = E_{U^b_i} - \widetilde{K}^\theta_i\left( E_{\mathcal{H}^U_i(U^b_i)} - E_{Y^U_i} \right), \qquad (3.3)

where, as opposed to (2.28), there are now two Kalman gains, defined by

\widetilde{K}_i = \widetilde{\Sigma}_{Z^b_i, \mathcal{H}^Z_i(Z^b_i)}\left( \widetilde{\Sigma}_{\mathcal{H}^Z_i(Z^b_i), \mathcal{H}^Z_i(Z^b_i)} + \Sigma_{\eta_i,\eta_i} \right)^{-1}, \qquad (3.4)

\widetilde{K}^\theta_i = \widetilde{\Sigma}_{\theta(Z^b_i), \mathcal{H}^Z_i(Z^b_i)}\left( \widetilde{\Sigma}_{\mathcal{H}^Z_i(Z^b_i), \mathcal{H}^Z_i(Z^b_i)} + \Sigma_{\eta_i,\eta_i} \right)^{-1}. \qquad (3.5)

For our purposes the covariances are semi-linear approximations, similar to (2.32) and (2.33), and the perturbed observations are defined just like in the MFEnKF (Remark 6).
For linear operators the projection of the mean is the mean of the projection. This is however not true for general non-linear operators. Thus in order to correct the means like in the MFEnKF (2.29), additional assumptions have to be made.

The empirical mean of the total analysis variate (similar to (3.1) and (2.30)) is

\widetilde{\mu}_{Z^a_i} = \widetilde{\mu}_{X^a_i} - \tfrac{1}{2}\left( \widetilde{\mu}_{\phi(\hat{U}^a_i)} - \widetilde{\mu}_{\phi(U^a_i)} \right). \qquad (3.6)

We use it to find the optimal mean adjustments in reduced space. Specifically, we set the mean of the principal variate to be the mean of the total variate (3.6),

\widetilde{\mu}_{X^a_i} \leftarrow \widetilde{\mu}_{Z^a_i}, \qquad (3.7)

enforce the recorrelation of the principal and control variates (2.34) via

E_{\hat{U}^a_i} \leftarrow \theta(E_{X^a_i}), \qquad (3.8)

and define the control variate mean adjustment as

\widetilde{\mu}_{\hat{U}^a_i} \leftarrow \widetilde{\mu}_{\theta(X^a_i)}. \qquad (3.9)

Unlike the linear control variate framework of the MFEnKF (2.23), the non-linear framework of the NL-MFEnKF (3.1) does not induce a unique way to impose unbiasedness on the control-space variates. There are many possible non-linear formulations of the MFEnKF, and many possible heuristic corrections of the mean of the ancillary variate. Here we discuss three approaches based on:
1. control space unbiased mean adjustment,
2. principal space unbiased mean adjustment, and
3. Kalman approximate mean adjustment,

each stemming from a different assumption on the relationship between the ancillary variate and the other variates.
By assuming that the control variate \hat{U}^a_i and the ancillary variate U^a_i are unbiased in the control space, the mean adjustment of \hat{U}^a in (3.9) directly defines the mean adjustment of the ancillary variate:

\widetilde{\mu}_{U^a_i} \leftarrow \widetilde{\mu}_{\hat{U}^a_i}. \qquad (3.10)

The authors choose this method of correction in the numerical experiments for both its properties and ease of implementation.

If instead we assume that the control variate \phi(\hat{U}_i) and the ancillary variate \phi(U_i) are unbiased in the principal space, then the mean of the total variate Z_i (3.1) equals the mean of the principal variate X_i, a desirable property. Finding a mean adjustment for U^a_i in the control space such that the unbiasedness is satisfied in the principal space is a non-trivial problem. Explicitly, we seek a vector \nu such that:

\widetilde{\mu}_{\phi(\hat{U}^a_i)} = \widetilde{\mu}_{\phi(U^a_i - \widetilde{\mu}_{U^a_i} + \nu)}. \qquad (3.11)

The solution to (3.11) requires the solution to an expensive inverse problem. Note that (3.11) is equivalent to (3.10) under the assumptions of Theorem 2, in the limit of ensemble size.

Instead of assuming that the control and ancillary variates are unbiased, we can consider directly the mean of the control-space total variate:

\widetilde{\mu}_{\theta(Z^a_i)} = \widetilde{\mu}_{\theta(X^a_i)} - \tfrac{1}{2}\left( \widetilde{\mu}_{\hat{U}^a_i} - \widetilde{\mu}_{U^a_i} \right), \qquad (3.12)

defined with the mean values in the NL-MFEnKF formulas (3.3). The following adjustment to the mean of the ancillary variate:

\widetilde{\mu}_{U^a_i} \leftarrow \widetilde{\mu}_{\theta(Z^a_i)}, \qquad (3.13)

is not unbiased with respect to the control variate in any space, but provides a heuristic approximation of the total variate mean in control space.

As in (Popov et al., 2021, Section 4.5) one can telescopically extend the two fidelity NL-
MFEnKF algorithm to L + 1 fidelities, where \phi_\ell interpolates from the space of fidelity \ell to the space of fidelity \ell - 1, and \phi_1 interpolates to the principal space. A telescopic extension of (3.1) is

Z = X - \sum_{\ell=1}^{L} 2^{-\ell}\left( \phi^\ell(\hat{U}_\ell) - \phi^\ell(U_\ell) \right), \qquad (3.14)

where the operator at each fidelity is defined by the composition

\phi^\ell = \phi_1 \circ \cdots \circ \phi_\ell, \qquad (3.15)

which maps from the space of fidelity \ell to the principal space. The telescopic extension of the NL-MFEnKF is not analyzed further in this work.
4. MODELS FOR NUMERICAL EXPERIMENTS
For numerical experiments we use the Lorenz '96 model, and two surrogate models that approximate its dynamics:

1. a proper orthogonal decomposition-based quadratic reduced order model (POD-ROM), and
2. an autoencoder neural network-based reduced order model (NN-ROM).

For each case we construct reduced order models (ROMs) for a range of reduced dimension sizes r.
4.1. The Lorenz '96 Model

The Lorenz '96 model (Lorenz, 1996) is conjured from the PDE (Reich and Cotter, 2015),

\frac{\mathrm{d}y}{\mathrm{d}t} = -y y_x - y + F, \qquad (4.1)

where the forcing parameter is set to F = 8. In the semi-discrete version y \in \mathbb{R}^n, and the nonlinear term is approximated by a numerically unstable finite difference approximation,

[y y_x]_k = (\hat{I} y)_k \cdot (\hat{D} y)_k = [y]_{k-1} \cdot \left( [y]_{k-2} - [y]_{k+1} \right), \qquad (4.2)

where \hat{I} is a (linear) shift operator, and the linear operator \hat{D} is a first order approximation to the first spatial derivative. The canonical n = 40 variable discretization with cyclic boundary conditions is used. The canonical fourth order Runge-Kutta method is used to discretize the time dimension.

For the given discrete formulation of the Lorenz '96 system, 14 represents the number of non-negative Lyapunov exponents, 28 represents the rounded-up Kaplan-Yorke dimension of 27.1, and 35 represents an approximation of the intrinsic dimension of the system (calculated by the method provided by Bahadur and Paffenroth (2019)). To the authors' knowledge, the inertial manifold of the system, if it exists, is not known. The relatively high ratio between the intrinsic dimension of the system and the spatial dimension of the system makes constructing a reduced order model particularly challenging.
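A minimal Python sketch of the semi-discrete model and its time discretization (a straightforward reading of (4.1)-(4.2), not the authors' code; the initial condition and step count are illustrative):

```python
import numpy as np

def lorenz96_rhs(y, F=8.0):
    """Cyclic Lorenz '96 tendency, Eqs. (4.1)-(4.2):
    dy_k/dt = -y_{k-1} (y_{k-2} - y_{k+1}) - y_k + F."""
    return -np.roll(y, 1) * (np.roll(y, 2) - np.roll(y, -1)) - y + F

def rk4_step(f, y, dt):
    """One step of the classical fourth order Runge-Kutta method."""
    k1 = f(y)
    k2 = f(y + 0.5 * dt * k1)
    k3 = f(y + 0.5 * dt * k2)
    k4 = f(y + dt * k3)
    return y + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

y = 8.0 * np.ones(40)   # canonical n = 40 variables
y[0] += 0.01            # small perturbation to trigger chaotic behavior
for _ in range(1000):
    y = rk4_step(lorenz96_rhs, y, 0.05)
```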
4.2. POD-ROM

The data to construct the reduced order models is taken to be T snapshots of a representative model run. The snapshots are spaced 36 time units apart, equivalent to six months in model time. Using the method of snapshots (Sirovich, 1987), we construct optimal linear operators, \Phi^T = \Theta \in \mathbb{R}^{r \times n}, such that the projection captures the dominant orthogonal modes of the system dynamics. The reduced order model approximation with linear projection and interpolation operators (2.41) is quadratic (similar to Mou et al. (2021); San and Iliescu (2015)):

\frac{\mathrm{d}u}{\mathrm{d}t} = a + B u + u^T C u, \qquad (4.3)

where the corresponding vector a, matrix B, and 3-tensor C are defined by:

a = F\, \Theta \mathbf{1}_n, \qquad (4.4a)
B = -\Theta \Phi, \qquad (4.4b)
[C]_{pqr} = -\left( \hat{I}\Phi_{:,p} \right)^T \left( \hat{D}\Phi_{:,q} \odot \Phi_{:,r} \right), \qquad (4.4c)

where \mathbf{1}_n is the vector of ones and \odot denotes the element-wise product.
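A Python sketch of the operator construction (our illustration; the thin SVD realization of the method of snapshots and the roll-based shift/difference operators of (4.2) are assumptions consistent with the text):

```python
import numpy as np

def pod_operators(snapshots, r):
    """Dominant POD modes of an n x T snapshot matrix (method of snapshots,
    Sirovich 1987), with no re-centering, as assumed in the text."""
    U, _, _ = np.linalg.svd(snapshots, full_matrices=False)
    Phi = U[:, :r]                 # interpolation operator (n x r)
    return Phi, Phi.T              # and projection operator Theta = Phi^T

def quadratic_rom_tensors(Phi, Theta, F=8.0):
    """Precompute a, B, C of Eqs. (4.3)-(4.4) for the Lorenz '96 POD-ROM."""
    n, _ = Phi.shape
    IPhi = np.roll(Phi, 1, axis=0)                             # (I_hat Phi)_k = Phi_{k-1}
    DPhi = np.roll(Phi, 2, axis=0) - np.roll(Phi, -1, axis=0)  # Phi_{k-2} - Phi_{k+1}
    a = F * Theta @ np.ones(n)                                 # Eq. (4.4a)
    B = -Theta @ Phi                                           # Eq. (4.4b)
    C = -np.einsum('kp,kq,kr->pqr', IPhi, DPhi, Phi)           # Eq. (4.4c)
    return a, B, C

def pod_rom_rhs(u, a, B, C):
    """Quadratic ROM tendency of Eq. (4.3): du/dt = a + B u + u^T C u."""
    return a + B @ u + np.einsum('p,pqr,q->r', u, C, u)
```

Precomputing a, B, and C makes every ROM tendency evaluation purely r-dimensional, which is the main computational appeal of the quadratic POD-ROM.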
4.3. NN-ROM

We now discuss building the neural network-based reduced order model (NN-ROM). Given the principal space variable X, consider an encoder that computes its projection \hat{U} onto the control space (2.42), and a decoder that computes the reconstruction \widetilde{X} (2.43).

Canonical autoencoders simply aim to minimize the reconstruction error:

X \approx \widetilde{X}, \qquad (4.5)

which attempts to capture the dominant modes of the intrinsic manifold of the data. However, for our purposes, we seek to preserve two additional properties.

The first property that is especially important to our application is the stable preservation of the latent space through multiple projections and interpolations:

\theta(\phi(\hat{U})) \approx \hat{U}, \qquad (4.6)

which we will call the left-inverse property. For POD, this property is automatically preserved by construction and the linearity of the methods. For non-linear operators, the authors have not explicitly seen this property preserved; however, as the MFEnKF requires sequential applications of projections and interpolations, we believe that for the use-case outlined in this paper the property is especially important.

The autoencoder structure automatically induces a reduced order model \mathcal{M}^{AE}, corresponding to the equation (2.40), that captures the dynamics of the system in the r-dimensional control space. The second property is that the r-dimensional ROM dynamics follows accurately the n-dimensional dynamics of the full order model.

Remark 10. Note that unlike the POD-ROM, whose linear structure induces a purely r-dimensional initial value problem, the NN-ROM (2.40) still involves n-dimensional function evaluations. In a practical method it would be necessary to reduce the internal dimension of the ROM, however that is significantly outside the scope of this paper. We wish to construct a proof-of-concept neural network surrogate.

We seek to ensure that the induced dynamics (2.40) makes accurate predictions, by not only capturing the intrinsic manifold of the data, but also attempting to capture the inertial manifold of the system. Explicitly, we wish to ensure that

\left\| \mathcal{M}(X) - \phi\left( \mathcal{M}^{AE}(\theta(X)) \right) \right\| \qquad (4.7)

is minimized. In this sense (2.40) would represent an approximation of the dynamics along a submanifold of the inertial manifold. In practice we replace (4.7) by the difference between a short trajectory in the full space started from a certain initial value, and the corresponding short trajectory in the latent space started from the projected initial value.

Combining the canonical autoencoder reconstruction error term (4.5), the latent space reconstruction error term (preserving property (4.6)), and the temporal reconstruction error (4.7) (computed along K small steps), we arrive at the following loss function for each snapshot:

\ell_j(X_j) = \frac{1}{n}\left\| X_j - \phi(\theta(X_j)) \right\|^2 + \frac{\lambda_1}{r}\left\| \theta(X_j) - \theta(\phi(\theta(X_j))) \right\|^2 + \sum_{k=1}^{K} \frac{\lambda_2}{n}\left\| \mathcal{M}_{t_j, t_j+k\Delta t}(X_j) - \phi\left( \mathcal{M}^{AE}_{t_j, t_j+k\Delta t}(\theta(X_j)) \right) \right\|^2, \qquad (4.8)

where the hyper-parameters \lambda_1 and \lambda_2 represent the inverse relative variance of the mismatch. The full loss function, combining the cost functions for all T snapshots,

\mathcal{L}(X) = \sum_{j=1}^{T} \ell_j(X_j), \qquad (4.9)

can be minimized through typical stochastic optimization methods.
Similar to the POD model, we construct r-dimensional NN-based surrogate ROMs. To this end, we use one hidden layer networks with the tanh activation function for the encoder (2.42) and decoder (2.43):

\theta(X) = W^\theta_2 \tanh(W^\theta_1 X + b^\theta_1) + b^\theta_2, \quad W^\theta_1 \in \mathbb{R}^{h \times n},\ W^\theta_2 \in \mathbb{R}^{r \times h},\ b^\theta_1 \in \mathbb{R}^h,\ b^\theta_2 \in \mathbb{R}^r, \qquad (4.10)

\phi(U) = W^\phi_2 \tanh(W^\phi_1 U + b^\phi_1) + b^\phi_2, \quad W^\phi_1 \in \mathbb{R}^{h \times r},\ W^\phi_2 \in \mathbb{R}^{n \times h},\ b^\phi_1 \in \mathbb{R}^h,\ b^\phi_2 \in \mathbb{R}^n, \qquad (4.11)

where the hidden layer dimension on both the encoder and decoder is set to h = 200 in order to approximate well most non-linear projection and interpolation functions. The hyperparameter values used in the numerical experiments are \lambda_2 = 1 and K = 5. We employ the ADAM (Kingma and Ba, 2014) optimization procedure to train the NN and produce the various ROMs needed. Gradients of the loss function are computed through automatic differentiation.
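The following Python sketch spells out the encoder/decoder maps (4.10)-(4.11) and the first two terms of the loss (4.8); the weights are randomly initialized for illustration only (the actual training uses ADAM with automatic differentiation), and all names are our own.

```python
import numpy as np

def make_autoencoder(n, r, h=200, seed=0):
    """One-hidden-layer tanh encoder/decoder of Eqs. (4.10)-(4.11)."""
    rng = np.random.default_rng(seed)
    W1t, b1t = 0.1 * rng.standard_normal((h, n)), np.zeros(h)   # encoder layer 1
    W2t, b2t = 0.1 * rng.standard_normal((r, h)), np.zeros(r)   # encoder layer 2
    W1p, b1p = 0.1 * rng.standard_normal((h, r)), np.zeros(h)   # decoder layer 1
    W2p, b2p = 0.1 * rng.standard_normal((n, h)), np.zeros(n)   # decoder layer 2
    theta = lambda X: W2t @ np.tanh(W1t @ X + b1t) + b2t        # projection (4.10)
    phi = lambda U: W2p @ np.tanh(W1p @ U + b1p) + b2p          # interpolation (4.11)
    return theta, phi

def snapshot_loss(X_j, theta, phi, lam1=1.0):
    """Reconstruction and left-inverse terms of the per-snapshot loss (4.8);
    the temporal term would additionally compare short FOM/ROM trajectories."""
    n, r = X_j.shape[0], theta(X_j).shape[0]
    rec = np.sum((X_j - phi(theta(X_j))) ** 2) / n
    latent = np.sum((theta(X_j) - theta(phi(theta(X_j)))) ** 2) / r
    return rec + lam1 * latent
```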
5. NUMERICAL EXPERIMENTS
The numerical experiments compare the following four methodologies:
1. Standard EnKF with the Lorenz ’96 full order model;
2. MFEnKF with the POD surrogate model, an approach named MFEnKF(POD);
3. NL-MFEnKF with the autoencoder surrogate model, named NL-MFEnKF(NN); and
4. MFEnKF with the autoencoder surrogate model, named MFEnKF(NN).
Since MFEnKF does not support non-linear projections and interpolations, in MFEnKF(NN) the ensembles are interpolated into the principal space, and assimilated under the assumption that \Theta = \Phi = I.

To measure the accuracy of the analysis mean with respect to the truth we use the spatio-temporal root mean square error (RMSE):

\text{RMSE} = \sqrt{ \frac{1}{N n} \sum_{i=1}^{N} \sum_{k=1}^{n} \left( \left[ \widetilde{\mu}_{X^a_i} \right]_k - \left[ X^t_i \right]_k \right)^2 }, \qquad (5.1)

where N is the number of data assimilation steps for which we wish to measure the error.

For sequential data assimilation experiments we observe all 40 variables of the Lorenz '96 system, with an observation error covariance of \Sigma_{\eta_i,\eta_i} = I. Observations are performed every 0.05 time units, corresponding to 6 hours in model time. We run 20 independent realizations (independent ensemble initializations) for 1100 time steps, but discard the first 100 steps for spinup.
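For completeness, the error metric (5.1) amounts to a single root mean square over all times and states; a short sketch (the names and the 1/(Nn) normalization are our assumptions):

```python
import numpy as np

def spatiotemporal_rmse(analysis_means, truths):
    """Spatio-temporal RMSE of Eq. (5.1) for N x n arrays of analysis means
    and true states, assuming the 1/(N n) normalization of a root mean square."""
    d = np.asarray(analysis_means) - np.asarray(truths)
    return np.sqrt(np.mean(d ** 2))
```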
Our first experiment is concerned with the preservation of energy by our ROMs, and seeks to compare the accuracy of NN-ROM against that of POD-ROM. For the Lorenz '96 model, we use the following equation (Karimi and Paul, 2010) to model the spatio-temporal kinetic energy,

\text{KE} = \frac{1}{2T} \sum_{i=1}^{T} \sum_{k=1}^{n} \left( [y_i]_k \right)^2, \qquad (5.2)

where T is the number of temporal snapshots that were used to construct the ROMs.

[TABLE 1: Relative kinetic energies preserved by the reconstructions of the POD-ROM and the NN-ROM solutions of the Lorenz '96 system. Various reduced-order model dimensions r are considered.]

Table 1 shows the relative kinetic energies of the POD-ROM and the NN-ROM reconstructed solutions (2.7) (the energies of the reconstructed ROM solutions are divided by the kinetic energy of the full order solution). The results lead to several observations. First, the NN-ROM always preserves more energy than the POD-ROM. We have achieved our goal to build an NN-ROM that is more accurate than the POD-ROM. Second, the NN-ROMs with dimensions r = 21 and r = 28 preserve almost as much energy as the POD-ROMs with dimension r = 28 and r = 35, respectively. Intuitively this tells us that they should be just as accurate.
The second set of experiments seeks to learn how the ROM dimension affects the accuracy of the various multifidelity data assimilation algorithms.

We take the principal ensemble size to be N_X = 32, and the surrogate ensemble sizes equal to N_U = r - 3, in order to always work in the undersampled regime. All multifidelity algorithms are run with inflation factors \alpha_X = 1.05 and \alpha_U = 1.01. The traditional EnKF using the full order model is run with an ensemble size of N = N_X and an inflation factor \alpha = 1.06 to ensure stability.
The results are shown in Figure 1. For the 'interesting' dimensions r = 28 and r = 35, and an underrepresented r = 21, the NL-MFEnKF(NN) performs significantly better than the MFEnKF(POD). For a severely underrepresented ROM dimension of r = 14 the MFEnKF(POD) outperforms or ties the NL-MFEnKF(NN). The authors suspect that this is due to the NN-ROMs sharing the same hyperparameters; if the hyperparameters were tailored to each r-dimensional NN-ROM individually, the NN-ROM would likely beat the POD-ROM in all cases.

Of note is that, excluding the case of r = 35, the MFEnKF(NN), using the standard MFEnKF method in the principal space, is beaten by all other algorithms, indicating that the non-linear method presented in this paper is truly needed for models involving non-linear model reduction. We note that for r = 35, the suspected intrinsic dimension of the data, the NL-MFEnKF(NN) outperforms the EnKF, both in terms of RMSE and variability within runs. This is additionally strengthened by the results of the MFEnKF(NN) assimilated in the principal space, as it implies that there is little-to-no loss of information in the projection into the control space.
[FIG. 1: Analysis RMSE versus ROM dimension for three multifidelity data assimilation algorithms (MFEnKF(POD), NL-MFEnKF(NN), MFEnKF(NN)) and the classical EnKF. Ensemble sizes are N_X = 32 and N_U = r - 3. Error bars show two standard deviations. The inflation factor for the surrogate ROMs is fixed at \alpha_U = 1.01, the inflation of \alpha = 1.07 is used for the EnKF, and \alpha_X = 1.05 is used for all other algorithms.]
Our last set of experiments focuses on the particular ROM dimension r = 28, as we believe that it is representative of an operationally viable dimension reduction, covering the dimensionality of the global attractor.

For each of the four algorithms we vary the principal ensemble size N_X = N, and the principal inflation factor \alpha_X = \alpha. As before, we set the control ensemble size to N_U = r - 3 = 25 and the control-space inflation factor to \alpha_U = 1.01.

Figure 2 shows the spatio-temporal RMSE for various choices of the free variables. The results show compelling evidence of the viability of the NL-MFEnKF(NN) as compared to that of the MFEnKF(POD), as the methods have similar stability properties for a wide range of principal ensemble sizes N_X and principal inflations \alpha_X, with the NL-MFEnKF(NN) exhibiting smaller error for almost all the points for which the two methods are stable. For a few points with lower values of principal inflation \alpha_X, the NL-MFEnKF(NN) is not as stable as the MFEnKF(POD). This could be due to an instability in the NN-ROM itself, in the NL-MFEnKF itself, or in the projection and interpolation operators.

An interesting observation is that the MFEnKF(NN), which is assimilated naively in the principal space, becomes less stable the greater the ensemble size N_X is. One possible explanation for this is that the ensemble mean estimates become more accurate, thus the bias between the ancillary and control variates is amplified in (2.4), and more error is introduced from the surrogate model. This is in stark contrast to most other ensemble based methods, including all others in this paper, whose error is lowered by an increase in ensemble size.
[FIG. 2: Analysis RMSE for four data assimilation algorithms and various values of the principal ensemble size N_X and inflation factor \alpha_X. The surrogate ROM size is fixed to r = 28 with an ensemble size of N_U = 25 and inflation of \alpha_U = 1.01. The top left panel represents the MFEnKF with a POD-ROM surrogate, the top right panel represents the NL-MFEnKF with a NN-ROM, the bottom left panel represents the MFEnKF with a NN-ROM surrogate assimilated in the principal space, and the bottom right panel represents the classical EnKF.]
6. CONCLUSIONS
The results obtained in this paper strongly imply that reduced order models based on non-linear projections that fully capture the intrinsic dimension of the data are excellent surrogates for use in multifidelity sequential data assimilation. Moreover, the results show that generalizing the control variate framework to non-linear projections and interpolations does not beget significant error, and thus assimilation can safely be done in the space of a reduced model.

Some future directions to pursue might involve more robust constructions of projection and interpolation operators, the use of recurrent neural network models, and a more operational test problem.
ACKNOWLEDGMENTS
This work was supported by awards NSF ACI–1709727, NSF CDS&E-MSS–1953113, DOE ASCR DE–SC0021313, and by the Computational Science Laboratory at Virginia Tech.
REFERENCES
Aggarwal, C.C., Neural Networks and Deep Learning, Springer, 2018.

Asch, M., Bocquet, M., and Nodet, M., Data Assimilation: Methods, Algorithms, and Applications, SIAM, 2016.

Bahadur, N. and Paffenroth, R., Dimension Estimation Using Autoencoders, arXiv preprint arXiv:1909.10702, 2019.

Blanchard, E., Sandu, A., and Sandu, C., Parameter Estimation for Mechanical Systems via an Explicit Representation of Uncertainty, Engineering Computations: International Journal for Computer-Aided Engineering and Software, no. 5, pp. 541–569, 2009.

Blanchard, E., Sandu, A., and Sandu, C., A Polynomial Chaos-Based Kalman Filter Approach for Parameter Estimation of Mechanical Systems, Journal of Dynamic Systems, Measurement, and Control, no. 6, 2010a.

Blanchard, E., Sandu, A., and Sandu, C., Polynomial Chaos-Based Parameter Estimation Methods Applied to a Vehicle System, Journal of Multi-body Dynamics, pp. 59–81, 2010b.

Brunton, S.L. and Kutz, J.N., Data-Driven Science and Engineering: Machine Learning, Dynamical Systems, and Control, Cambridge University Press, 2019.

Burgers, G., van Leeuwen, P.J., and Evensen, G., Analysis Scheme in the Ensemble Kalman Filter, Monthly Weather Review, vol. 126, pp. 1719–1724, 1998.

Cao, Y., Zhu, J., Navon, I.M., and Luo, Z., A Reduced-Order Approach to Four-Dimensional Variational Data Assimilation Using Proper Orthogonal Decomposition, International Journal for Numerical Methods in Fluids, no. 10, pp. 1571–1583, 2007.

Chada, N.K., Jasra, A., and Yu, F., Multilevel Ensemble Kalman–Bucy Filters, arXiv preprint arXiv:2011.04342, 2020.

Chernov, A., Hoel, H., Law, K.J.H., Nobile, F., and Tempone, R., Multilevel Ensemble Kalman Filtering for Spatio-Temporal Processes, Numerische Mathematik, no. 1, pp. 71–125, 2021.

Constantinescu, E., Sandu, A., Chai, T., and Carmichael, G., Assessment of Ensemble-Based Chemical Data Assimilation in an Idealized Setting, Atmospheric Environment, no. 1, pp. 18–36, 2007a.

Constantinescu, E., Sandu, A., Chai, T., and Carmichael, G., Ensemble-Based Chemical Data Assimilation. I: General Approach, Quarterly Journal of the Royal Meteorological Society, no. 626, pp. 1229–1243, 2007b.

Constantinescu, E., Sandu, A., Chai, T., and Carmichael, G., Ensemble-Based Chemical Data Assimilation. II: Covariance Localization, Quarterly Journal of the Royal Meteorological Society, no. 626, pp. 1245–1256, 2007c.

Evensen, G., Sequential Data Assimilation with a Nonlinear Quasi-Geostrophic Model Using Monte Carlo Methods to Forecast Error Statistics, Journal of Geophysical Research: Oceans, vol. 99, no. C5, pp. 10143–10162, 1994.

Evensen, G., Data Assimilation: The Ensemble Kalman Filter, Springer Science & Business Media, 2009.

Farrell, B.F. and Ioannou, P.J., State Estimation Using a Reduced-Order Kalman Filter, Journal of the Atmospheric Sciences, no. 23, pp. 3666–3680, 2001.

Giles, M.B., Multilevel Monte Carlo Path Simulation, Operations Research, vol. 56, no. 3, pp. 607–617, 2008.

Giles, M.B., Multilevel Monte Carlo Methods, Acta Numerica, vol. 24, pp. 259–328, 2015.

Hoel, H., Law, K.J.H., and Tempone, R., Multilevel Ensemble Kalman Filtering, SIAM Journal on Numerical Analysis, vol. 54, no. 3, 2016.

Hoel, H., Shaimerdenova, G., and Tempone, R., Multilevel Ensemble Kalman Filtering Based on a Sample Average of Independent EnKF Estimators, Foundations of Data Science, 2019.

Jaynes, E.T., Probability Theory: The Logic of Science, Cambridge University Press, 2003.

Kalnay, E., Atmospheric Modeling, Data Assimilation, and Predictability, Cambridge University Press, 2003.

Karimi, A. and Paul, M.R., Extensive Chaos in the Lorenz-96 Model, Chaos: An Interdisciplinary Journal of Nonlinear Science, vol. 20, no. 4, p. 043105, 2010.

Karpatne, A., Atluri, G., Faghmous, J.H., Steinbach, M., Banerjee, A., Ganguly, A., Shekhar, S., Samatova, N., and Kumar, V., Theory-Guided Data Science: A New Paradigm for Scientific Discovery from Data, IEEE Transactions on Knowledge and Data Engineering, vol. 29, no. 10, pp. 2318–2331, 2017.

Kingma, D.P. and Ba, J., Adam: A Method for Stochastic Optimization, arXiv preprint arXiv:1412.6980, 2014.

Lee, J.A. and Verleysen, M., Nonlinear Dimensionality Reduction, Springer Science & Business Media, 2007.

Lee, K. and Carlberg, K.T., Model Reduction of Dynamical Systems on Nonlinear Manifolds Using Deep Convolutional Autoencoders, Journal of Computational Physics, vol. 404, p. 108973, 2020.

Lorenz, E.N., Predictability: A Problem Partly Solved, Proc. Seminar on Predictability, Vol. 1, 1996.

Moosavi, A., Attia, A., and Sandu, A., Tuning Covariance Localization Using Machine Learning, Machine Learning and Data Assimilation for Dynamical Systems track, International Conference on Computational Science ICCS 2019, Vol. 11539 of Lecture Notes in Computer Science, Faro, Algarve, Portugal, pp. 199–212, 2019.

Mou, C., Wang, Z., Wells, D.R., Xie, X., and Iliescu, T., Reduced Order Models for the Quasi-Geostrophic Equations: A Brief Survey, Fluids, vol. 6, no. 1, p. 16, 2021.

Nelson, B.L., On Control Variate Estimators, Computers & Operations Research, vol. 14, no. 3, pp. 219–225, 1987.

Nino-Ruiz, E. and Sandu, A., Ensemble Kalman Filter Implementations Based on Shrinkage Covariance Matrix Estimation, Ocean Dynamics, no. 11, pp. 1423–1439, 2015.

Nino-Ruiz, E. and Sandu, A., Cluster Computing, pp. 2211–2221, 2019.

Owen, A.B., Monte Carlo Theory, Methods and Examples, 2013.

Peherstorfer, B., Willcox, K., and Gunzburger, M., Survey of Multifidelity Methods in Uncertainty Propagation, Inference, and Optimization, SIAM Review, vol. 60, no. 3, pp. 550–591, 2018.

Petrie, R., Localization in the Ensemble Kalman Filter, MSc thesis, Atmosphere, Ocean and Climate, University of Reading, 2008.

Popov, A.A., Mou, C., Sandu, A., and Iliescu, T., A Multifidelity Ensemble Kalman Filter with Reduced Order Control Variates, SIAM Journal on Scientific Computing, in press, 2021.

Popov, A.A. and Sandu, A., A Bayesian Approach to Multivariate Adaptive Localization in Ensemble-Based Data Assimilation with Time-Dependent Extensions, Nonlinear Processes in Geophysics, vol. 26, pp. 109–122, 2019.

Popov, A.A. and Sandu, A., in Data Assimilation for Atmospheric, Oceanic and Hydrologic Applications (Vol. IV), S.K. Park and L. Xu, Eds., Springer, in press.

Reich, S. and Cotter, C., Probabilistic Forecasting and Bayesian Data Assimilation, Cambridge University Press, 2015.

Roberts, S., Popov, A.A., and Sandu, A., ODE Test Problems: A MATLAB Suite of Initial Value Problems, https://github.com/ComputationalScienceLaboratory/ODE-Test-Problems, 2019.

Rubinstein, R.Y. and Marcus, R., Efficiency of Multivariate Control Variates in Monte Carlo Simulation, Operations Research, vol. 33, no. 3, pp. 661–677, 1985.

San, O. and Iliescu, T., A Stabilized Proper Orthogonal Decomposition Reduced-Order Model for Large Scale Quasigeostrophic Ocean Circulation, Advances in Computational Mathematics, pp. 1289–1319, 2015.

Sell, G.R. and You, Y., Dynamics of Evolutionary Equations, Vol. 143, Springer Science & Business Media, 2013.

Sirovich, L., Turbulence and the Dynamics of Coherent Structures. I. Coherent Structures, Quarterly of Applied Mathematics, vol. 45, no. 3, pp. 561–571, 1987.

Strogatz, S.H., Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering, CRC Press, 2018.

Willard, J., Jia, X., Xu, S., Steinbach, M., and Kumar, V., Integrating Physics-Based Modeling with Machine Learning: A Survey, arXiv preprint arXiv:2003.04919, 2020.