Improve Unscented Kalman Inversion With Low-Rank Approximation and Reduced-Order Model
Daniel Z. Huang^a, Jiaoyang Huang^b
^a California Institute of Technology, Pasadena, CA
^b New York University, New York, NY
Abstract
The unscented Kalman inversion (UKI) presented in [1] is a general derivative-free approach to solving inverse problems. The UKI is particularly suitable for inverse problems where the forward model is given as a black box and may not be differentiable. The regularization strategy and convergence properties of the UKI have been studied thoroughly, and the method has been demonstrated to handle noisy observation data and chaotic inverse problems effectively. In this paper, we aim to make the UKI more efficient, in terms of computational and memory costs, for large-scale inverse problems. We take advantage of the low-rank covariance structure to reduce both the number of forward problem evaluations and the memory cost related to the need to propagate large covariance matrices. And we leverage reduced-order model techniques to further speed up these forward evaluations. The effectiveness of the enhanced UKI is demonstrated on a barotropic model inverse problem with $O(10^5)$ unknown parameters and a 3D general circulation model (GCM) inverse problem, where each iteration is as efficient as that of gradient-based optimization methods.
Keywords: Inverse Problem, Optimization, Kalman Filter, Low-Rank Approximation, Reduced-Order Model
1. Introduction
Inverse problems are ubiquitous in engineering and scientific applications. These include, to name only a few, global climate model calibration [2, 3, 4], material constitutive relation calibration [5, 6, 7], seismic inversion in geophysics [8, 9], and medical tomography [10, 11]. The associated forward problems, which may feature multiple scales or include chaotic and turbulent phenomena, can be very expensive. Moreover, the observational data is often noisy, and the inverse problem may involve a large number of unknown parameters to recover and may be ill-posed.

Inverse problems can be formulated as recovering unknown parameters (or states) $\theta \in \mathbb{R}^{N_\theta}$ from the noisy observation $y \in \mathbb{R}^{N_y}$, as follows,
$$y = \mathcal{G}(\theta) + \eta, \qquad (1)$$
where $\mathcal{G}$ denotes a mapping between the parameter space and the observation space, and $\eta \sim \mathcal{N}(0, \Sigma_\eta)$ denotes the observational error. Central to both the optimization and probabilistic approaches to inversion is the regularized objective function $\Phi_R(\theta)$ defined by
$$\Phi_R(\theta) := \Phi(\theta) + \frac{1}{2}\big\lVert \Sigma_0^{-\frac{1}{2}}(\theta - r_0)\big\rVert^2, \qquad (2a)$$
$$\Phi(\theta) := \frac{1}{2}\big\lVert \Sigma_\eta^{-\frac{1}{2}}\big(y - \mathcal{G}(\theta)\big)\big\rVert^2, \qquad (2b)$$
where $r_0$ is a prior mean, and $\Sigma_\eta \succ 0$ and $\Sigma_0 \succ 0$ are the observation error covariance and a regularization matrix, respectively.

The focus of this paper is mainly on the unscented Kalman inversion [12, 1], a version of the Kalman inversion method. The Kalman inversion methodology pairs the parameter-to-data map $\mathcal{G}$ with a stochastic dynamical system for the parameter and then employs techniques from Kalman filtering [13, 14, 15, 16] to estimate the parameter $\theta$ given the data $y$. When unscented Kalman filtering is applied, this leads to the UKI, a derivative-free method that solves the inverse problem by minimizing $\Phi_R$. For inverse problems in general, the UKI is attractive because it is derivative-free and hence introduces significant flexibility in the forward solver design. Therefore, the UKI is suitable for complex multiphysics problems requiring the coupling of different solvers [17, 18, 19, 20, 21], where forward models can be treated as black boxes, and for methods containing discontinuities (e.g., immersed/embedded boundary methods [22, 23, 24, 25] and adaptive mesh refinement [26, 27]). Moreover, the Tikhonov-type regularization embedded in the stochastic dynamics [1] enables the UKI to handle noisy observations effectively; the Levenberg–Marquardt connection [1] enables the UKI to achieve linear (sometimes superlinear) convergence in practice; and the smoothing property [1] introduced by the unscented transform enables the UKI to handle chaotic inverse problems effectively. Therefore, the UKI is a general approach to solving inverse problems.

However, at each iteration, the UKI requires evaluating the forward problem $2N_\theta + 1$ times and propagating large $N_\theta \times N_\theta$ covariance matrices. Although these forward problem evaluations are embarrassingly parallel, the computational or memory costs can be intractable when $N_\theta$ is large or the forward problem evaluation is expensive. The present work focuses on alleviating the computational and memory costs of the UKI.

The Kalman filter [13] and its variants, including but not limited to the extended Kalman filter [14], the ensemble Kalman filter [15, 28, 29], the unscented Kalman filter [16, 12], and the cubature Kalman filter [30], were developed to sequentially update the probability distribution of states in partially observed dynamics.
Most Kalman filters use a Gaussian ansatz to formulate the Kalman gain, assimilate the observation, and update the distribution. The unscented Kalman filter represents the mean and covariance of the Gaussian distribution directly. A quadrature rule, the unscented transform, designed specifically for Gaussian distributions and based on $2N_\theta + 1$ deterministic $\sigma$-points [16, 31, 12], is applied to propagate the state mean and covariance. Besides state estimation, Kalman methods are used for parameter inversion, which brings about the extended Kalman inversion (ExKI), the ensemble Kalman inversion (EKI), and the unscented Kalman inversion (UKI). Extended, ensemble, and unscented Kalman inversions have been applied to train neural networks [32, 33, 12, 34], and the EKI has been applied in the oil industry [35, 36, 37]. Dual and joint Kalman filters [38, 12] have been designed to simultaneously estimate the unknown states and the parameters [38, 39, 12, 40, 41] from noisy sequential observations. The EKI has been systematically developed and analyzed as a general purpose methodology [42, 43, 44, 45, 46] for the solution of inverse and parameter estimation problems, as has the UKI [1]. However, unscented Kalman filtering and inversion generally require $2N_\theta + 1$ forward problem evaluations and the storage of an $N_\theta \times N_\theta$ covariance matrix; therefore, they are (mistakenly) supposed to be unsuitable for high-dimensional data-assimilation/inverse problems.

Since many physical phenomena or systems feature large-scale structure or low-dimensional attractors, the model error covariance matrices are generally low-rank. The low-rank covariance structures are leveraged in ensemble based Kalman filters, which leads to reduced space Kalman filters [47, 48, 49] and square root filters (e.g., the ensemble adjustment Kalman filter [28] and the ensemble transform Kalman filter [29, 50, 51]). These square root filters have been applied successfully in weather forecasting. Inspired by these square root filters [28, 29, 52], we introduce the reparameterization strategy and the low-rank square root form of the UKI, namely the truncated unscented Kalman inversion (TUKI), where the inverse problems are solved in a vastly reduced subspace. This reduces the computational and memory costs from $O(N_\theta)$ forward evaluations and $O(N_\theta^2)$ storage to $O(N_r)$ forward evaluations and $O(N_r N_\theta)$ storage, where $N_r$ is the rank of the covariance matrix. It has also been pointed out in [28] that "filter methods are unlikely to work without making use of such information about the covariance structure."

Moreover, the $2N_\theta + 1$ deterministic $\sigma$-points in the unscented transform that represent the Gaussian distribution consist of 1 point at the mean and $2N_\theta$ points distributed symmetrically around the mean. The forward problem evaluation at the mean gives a first-order approximation of the transformed mean, and the other $2N_\theta$ forward evaluations give a second-order approximation of the transformed covariance. They correspond to $\mathcal{G}(\theta)$ and $d\mathcal{G}(\theta)$ in minimizing Eq. (2), respectively. We observe that the covariance evaluation does not need to be very accurate, and this gives room for another level of speedup. Reduced-order models, including but not limited to low-fidelity/low-resolution models [53, 54], projection based reduced-order models (PROM) [55, 56, 57], Gaussian process based surrogate models [58, 59], and neural network based surrogate models [60, 61, 62], can be applied to speed up these $2N_\theta$ forward evaluations.
Our main contribution is the development of several speedup strategies for the UKI:

• We take advantage of the low-rank covariance structure by reparameterization or by reformulating the UKI in a low-rank square root form. These strategies reduce the number of forward problem evaluations from $O(N_\theta)$ to $O(N_r)$ and the memory cost from $O(N_\theta^2)$ to $O(N_r N_\theta)$.

• We further speed up forward problem evaluations by using reduced-order models. The computational cost of one iteration is reduced from $2N_\theta + 1$ (or $2N_r + 1$) expensive forward evaluations to a single expensive forward evaluation and $2N_\theta$ (or $2N_r$) cheap forward evaluations.

• The effectiveness of the enhanced unscented Kalman inversion is demonstrated on a barotropic model inverse problem with $O(10^5)$ unknown parameters and a 3D general circulation model inverse problem, where each iteration is as efficient as that of gradient-based optimization methods. (Generally, gradient-based optimization methods require one forward problem evaluation and one equally expensive backward propagation per iteration.)

The remainder of the paper is organized as follows. In Section 2, an overview of the unscented Kalman inversion is provided. In Section 3, speedup strategies leveraging the low-rank covariance structure are introduced. In Section 4, speedup strategies leveraging reduced-order models are introduced. Numerical applications that demonstrate the efficiency of the enhanced UKI are provided in Section 5.

2. Unscented Kalman Inversion (UKI)

The UKI pairs the parameter-to-data map $\mathcal{G}$ with a stochastic dynamical system for the parameter, defined as follows,
$$\text{evolution:} \quad \theta_{n+1} = r + \alpha(\theta_n - r) + \omega_{n+1}, \quad \omega_{n+1} \sim \mathcal{N}(0, \Sigma_\omega), \qquad (3a)$$
$$\text{observation:} \quad y_{n+1} = \mathcal{G}(\theta_{n+1}) + \nu_{n+1}, \quad \nu_{n+1} \sim \mathcal{N}(0, \Sigma_\nu), \qquad (3b)$$
where $\theta_{n+1}$ is the unknown state vector and $y_{n+1}$ is the observation; the artificial evolution error $\omega_{n+1}$ and the artificial observation error $\nu_{n+1}$ are mutually independent, zero-mean Gaussian sequences with covariances $\Sigma_\omega$ and $\Sigma_\nu$, respectively.
$\alpha \in (0, 1]$ is the regularization parameter, and $r$ is an arbitrary vector.

Denote by $Y_n := \{y_1, y_2, \cdots, y_n\}$ the observation set at time $n$. Techniques from filtering are employed to approximate the distribution $\mu_n$ of $\theta_n | Y_n$. The iterative algorithm starts from $\mu_0$ and updates $\mu_n$ through the prediction and analysis steps [63, 64]: $\mu_n \mapsto \hat{\mu}_{n+1}$, and then $\hat{\mu}_{n+1} \mapsto \mu_{n+1}$, where $\hat{\mu}_{n+1}$ is the distribution of $\theta_{n+1} | Y_n$.

In the prediction step, we assume that $\mu_n \approx \mathcal{N}(m_n, C_n)$; then, under (3a), $\hat{\mu}_{n+1}$ is also Gaussian with mean and covariance
$$\hat{m}_{n+1} = \mathbb{E}[\theta_{n+1}|Y_n] = \alpha m_n + (1-\alpha) r, \quad \hat{C}_{n+1} = \mathrm{Cov}[\theta_{n+1}|Y_n] = \alpha^2 C_n + \Sigma_\omega. \qquad (4)$$
In the analysis step, we assume that the joint distribution of $\{\theta_{n+1}, y_{n+1}\}|Y_n$ can be approximated by a Gaussian distribution
$$\mathcal{N}\Big(\begin{bmatrix} \hat{m}_{n+1} \\ \hat{y}_{n+1} \end{bmatrix}, \begin{bmatrix} \hat{C}_{n+1} & \hat{C}^{\theta p}_{n+1} \\ \hat{C}^{\theta p\,T}_{n+1} & \hat{C}^{pp}_{n+1} \end{bmatrix}\Big), \qquad (5)$$
where
$$\hat{y}_{n+1} = \mathbb{E}[\mathcal{G}(\theta_{n+1})|Y_n], \quad \hat{C}^{\theta p}_{n+1} = \mathrm{Cov}[\theta_{n+1}, \mathcal{G}(\theta_{n+1})|Y_n], \quad \hat{C}^{pp}_{n+1} = \mathrm{Cov}[\mathcal{G}(\theta_{n+1})|Y_n] + \Sigma_\nu. \qquad (6)$$
Conditioning the Gaussian in (5) to find $\theta_{n+1}|\{Y_n, y_{n+1}\} = \theta_{n+1}|Y_{n+1}$ gives the following expressions for the mean $m_{n+1}$ and covariance $C_{n+1}$ of the approximation to $\mu_{n+1}$:
$$m_{n+1} = \hat{m}_{n+1} + \hat{C}^{\theta p}_{n+1}\big(\hat{C}^{pp}_{n+1}\big)^{-1}(y_{n+1} - \hat{y}_{n+1}), \quad C_{n+1} = \hat{C}_{n+1} - \hat{C}^{\theta p}_{n+1}\big(\hat{C}^{pp}_{n+1}\big)^{-1}\hat{C}^{\theta p\,T}_{n+1}. \qquad (7)$$
By assuming all observations $\{y_n\}$ are identical to $y$ ($Y_n = y$), Equations (4) to (7) establish a conceptual description of the Kalman inversion for solving the inverse problem (1). The UKI uses the following unscented transform to evaluate Eq. (6).

Definition 1 (Modified Unscented Transform [1]). Let $\theta \sim \mathcal{N}(m, C) \in \mathbb{R}^{N_\theta}$ denote a Gaussian random variable. $2N_\theta + 1$ symmetric $\sigma$-points are chosen deterministically:
$$\theta^0 = m, \quad \theta^j = m + c_j [\sqrt{C}]_j, \quad \theta^{j+N_\theta} = m - c_j [\sqrt{C}]_j \quad (1 \le j \le N_\theta),$$
where $[\sqrt{C}]_j$ is the $j$th column of the Cholesky factor of $C$. The quadrature rule approximates the mean and covariance of the transformed variable $\mathcal{G}_i(\theta)$ as follows,
$$\mathbb{E}[\mathcal{G}_i(\theta)] \approx \mathcal{G}_i(\theta^0) = \mathcal{G}_i(m), \quad \mathrm{Cov}[\mathcal{G}_1(\theta), \mathcal{G}_2(\theta)] \approx \sum_{j=1}^{2N_\theta} W_j^c \big(\mathcal{G}_1(\theta^j) - \mathbb{E}\mathcal{G}_1(\theta)\big)\big(\mathcal{G}_2(\theta^j) - \mathbb{E}\mathcal{G}_2(\theta)\big)^T. \qquad (8)$$
Here the constant weights are
$$c_1 = c_2 = \cdots = c_{N_\theta} = \sqrt{N_\theta + \lambda}, \quad W_1^c = W_2^c = \cdots = W_{2N_\theta}^c = \frac{1}{2(N_\theta + \lambda)}, \quad \lambda = a^2(N_\theta + \kappa) - N_\theta, \quad \kappa = 0, \quad a = \min\Big\{\sqrt{\frac{4}{N_\theta + \kappa}},\, 1\Big\}.$$

Consider the algorithm defined by Eqs. (4) to (7). By utilizing the aforementioned quadrature rule, we obtain the following UKI algorithm:

• Prediction step:
$$\hat{m}_{n+1} = \alpha m_n + (1-\alpha) r, \quad \hat{C}_{n+1} = \alpha^2 C_n + \Sigma_\omega. \qquad (9)$$

• Generate $\sigma$-points:
$$\hat{\theta}^0_{n+1} = \hat{m}_{n+1}, \quad \hat{\theta}^j_{n+1} = \hat{m}_{n+1} + c_j \big[\sqrt{\hat{C}_{n+1}}\big]_j, \quad \hat{\theta}^{j+N_\theta}_{n+1} = \hat{m}_{n+1} - c_j \big[\sqrt{\hat{C}_{n+1}}\big]_j \quad (1 \le j \le N_\theta).$$
• Analysis step:
$$\hat{y}^j_{n+1} = \mathcal{G}(\hat{\theta}^j_{n+1}), \quad \hat{y}_{n+1} = \hat{y}^0_{n+1}, \qquad (10a)$$
$$\hat{C}^{\theta p}_{n+1} = \sum_{j=1}^{2N_\theta} W_j^c (\hat{\theta}^j_{n+1} - \hat{m}_{n+1})(\hat{y}^j_{n+1} - \hat{y}_{n+1})^T, \qquad (10b)$$
$$\hat{C}^{pp}_{n+1} = \sum_{j=1}^{2N_\theta} W_j^c (\hat{y}^j_{n+1} - \hat{y}_{n+1})(\hat{y}^j_{n+1} - \hat{y}_{n+1})^T + \Sigma_\nu, \qquad (10c)$$
$$m_{n+1} = \hat{m}_{n+1} + \hat{C}^{\theta p}_{n+1}\big(\hat{C}^{pp}_{n+1}\big)^{-1}(y - \hat{y}_{n+1}), \qquad (10d)$$
$$C_{n+1} = \hat{C}_{n+1} - \hat{C}^{\theta p}_{n+1}\big(\hat{C}^{pp}_{n+1}\big)^{-1}\hat{C}^{\theta p\,T}_{n+1}. \qquad (10e)$$

Following [1], the hyperparameters of the stochastic dynamical system are
$$r = r_0, \quad \Sigma_\omega = (2 - \alpha^2)\Lambda, \quad \Sigma_\nu = 2\Sigma_\eta, \quad \text{and} \quad \alpha \in (0, 1], \qquad (11)$$
where $r_0$ is the prior mean and $\Lambda$ is a positive definite matrix. The initial distribution $\theta_0 \sim \mathcal{N}(m_0, C_0)$ satisfies $m_0 = r_0$ and $C_0 = \Lambda$.

The UKI [1], as a general derivative-free approach to solving inverse problems, is particularly suitable for inverse problems where the forward model is given as a black box and may not be differentiable. However, the computational cost of the $2N_\theta + 1$ forward model evaluations (10a) in the analysis step can be intractable, especially when the unknown parameter dimension is high or the forward problem evaluation is expensive; likewise, the memory cost of storing and propagating the covariance (10e) can be unaffordable when the unknown parameter dimension is high. In the present work, we focus on improving the UKI by leveraging the low-rank covariance structure and reduced-order model techniques.

3. Speed Up with Low-Rank Covariance Structure

In general, the UKI, as a particle method, requires evaluations at $2N_\theta + 1$ $\sigma$-points and the propagation of the $N_\theta \times N_\theta$ covariance matrix, which is impractical for typical geophysical applications (e.g., initial condition recovery problems) with dimensions of roughly $O(10^6)$–$O(10^9)$ [65, 66]. However, the ensemble Kalman filter [15, 28, 29] is successfully used in high-dimensional data-assimilation applications with ensemble sizes of $O(10^2)$; what is the catch? Since many observed geophysical phenomena have large-scale structure or low-dimensional attractors, the model error covariance can be well-approximated by a low-rank sample covariance. Moreover, these large-scale structures can be well represented by dominant frequency modes (e.g., dominant Fourier or Karhunen-Loève (KL) modes).

In this section, two approaches to leveraging the low-rank large-scale structure in the UKI framework are introduced:

• Reparameterization, namely projecting or reparameterizing the model parameters onto some vastly reduced subspace;

• Truncated unscented Kalman inversion, namely formulating the unscented Kalman filter as a low-rank square-root filter.

For both approaches, the number of $\sigma$-points is reduced from $2N_\theta + 1$ to $2N_r + 1$, where $N_r$ is the truncated rank, and the storage for the covariance matrix is reduced from $O(N_\theta^2)$ to $O(N_\theta N_r)$. Moreover, a discussion of the scenario where the low-rank covariance structure is unknown or does not exist is presented in Subsection 3.3.

3.1. Reparameterization

Let $r_0$ denote the prior mean. The discrepancy $\theta - r_0$ is assumed to be well-approximated in the linear space spanned by the basis $\{u_1, u_2, \cdots, u_{N_r}\}$. Hence, the unknown parameters can be reparameterized as follows,
$$\theta = r_0 + \sum_{i=1}^{N_r} \tau^{(i)} u_i.$$
The UKI [1] is then applied to solve for the vector $\tau = [\tau^{(1)}, \tau^{(2)}, \cdots, \tau^{(N_r)}]^T$, which has prior mean $0$. Let $m_n(\tau)$ and $C_n(\tau)$ denote the mean and covariance estimates of $\tau$; the mean and covariance estimates of $\theta$ can be recovered by
$$m_n(\theta) = r_0 + U m_n(\tau), \quad C_n(\theta) = U C_n(\tau) U^T, \qquad (12)$$
where $U = \begin{pmatrix} u_1 & u_2 & \cdots & u_{N_r} \end{pmatrix}$.
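The iteration (9)-(10) is straightforward to express in code. The following minimal numpy sketch performs one UKI step; the function and variable names (and the toy linear usage at the bottom) are our own illustrative assumptions, not the reference implementation linked in Section 5. Applying it to the reduced forward map $\widetilde{\mathcal{G}}(\tau) = \mathcal{G}(r_0 + U\tau)$ with prior mean $0$ realizes the reparameterized UKI of this subsection.

```python
import numpy as np

def uki_iteration(G, y, Sigma_nu, m, C, r, alpha, Sigma_omega):
    """One UKI iteration (prediction, sigma-points, analysis), Eqs. (9)-(10)."""
    N = len(m)
    # Prediction step, Eq. (9)
    m_hat = alpha * m + (1.0 - alpha) * r
    C_hat = alpha**2 * C + Sigma_omega

    # Modified unscented transform weights (Definition 1, kappa = 0)
    a = min(np.sqrt(4.0 / N), 1.0)
    lam = a**2 * N - N
    c, Wc = np.sqrt(N + lam), 1.0 / (2.0 * (N + lam))

    # Generate the 2N + 1 sigma-points from the Cholesky factor of C_hat
    L = np.linalg.cholesky(C_hat)
    thetas = [m_hat] + [m_hat + c * L[:, j] for j in range(N)] \
                     + [m_hat - c * L[:, j] for j in range(N)]

    # Analysis step, Eq. (10); the mean estimate uses only the 0th point
    ys = [G(t) for t in thetas]
    y_hat = ys[0]
    C_tp = sum(Wc * np.outer(thetas[j] - m_hat, ys[j] - y_hat)
               for j in range(1, 2 * N + 1))
    C_pp = sum(Wc * np.outer(ys[j] - y_hat, ys[j] - y_hat)
               for j in range(1, 2 * N + 1)) + Sigma_nu
    K = C_tp @ np.linalg.inv(C_pp)
    return m_hat + K @ (y - y_hat), C_hat - K @ C_tp.T

if __name__ == "__main__":
    # usage on a toy linear problem (all values illustrative)
    A = np.diag([1.0, 2.0, 3.0])
    y = A @ np.ones(3)
    m, C = np.zeros(3), np.eye(3)
    for _ in range(20):
        m, C = uki_iteration(lambda t: A @ t, y, 0.01 * np.eye(3),
                             m, C, np.zeros(3), 1.0, 0.01 * np.eye(3))
    print(m)  # approaches [1, 1, 1]
```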
Remark 1. Equation (12) also reveals the low-rank covariance structure of $C_n(\theta)$. The square root $Z_n$ of the covariance matrix $C_n(\theta)$ can be written as
$$Z_n = U\sqrt{C_n(\tau)}\, Q,$$
where $Q$ is an orthogonal matrix. Hence, all the square root matrices $\{Z_n\}$ share the same column vector space, namely the space spanned by the column vectors of $U$, which is prescribed in the parameterization.

In the present work, the square root of a matrix $C$ is defined to be a matrix $Z$ such that $C = ZZ^T$, which is inconsistent with its most common use in the mathematical literature. If $Z_1$ and $Z_2$ are two $n \times m$ square root matrices of $C$, then there exists an orthogonal matrix $Q$ such that $Z_1 = Z_2 Q$ [67].

3.2. Truncated Unscented Kalman Inversion (TUKI)

The truncated unscented transform, which is a low-rank approximation of the modified unscented transform (see Definition 1), is central to the TUKI:
Definition 2 (Truncated Unscented Transform). Let $\theta \sim \mathcal{N}(m, C) \in \mathbb{R}^{N_\theta}$ denote a Gaussian random variable, and assume the $N_r$-dimensional truncated singular value decomposition ($N_r$-TSVD) of $C$ is
$$C \approx \sum_{i=1}^{N_r} d_i u_i u_i^T,$$
where $\{u_i\}$ are left singular vectors and $d_1 \ge d_2 \ge \cdots \ge d_{N_r} \ge 0$ are the dominant singular values. $2N_r + 1$ symmetric $\sigma$-points are chosen deterministically:
$$\theta^0 = m, \quad \theta^j = m + c_j \sqrt{d_j}\, u_j, \quad \theta^{j+N_r} = m - c_j \sqrt{d_j}\, u_j \quad (1 \le j \le N_r).$$
The quadrature rule approximates the mean and covariance of the transformed variable $\mathcal{G}_i(\theta)$ as follows,
$$\mathbb{E}[\mathcal{G}_i(\theta)] \approx \mathcal{G}_i(\theta^0), \quad \mathrm{Cov}[\mathcal{G}_1(\theta), \mathcal{G}_2(\theta)] \approx \sum_{j=1}^{2N_r} W_j^c \big(\mathcal{G}_1(\theta^j) - \mathbb{E}\mathcal{G}_1(\theta)\big)\big(\mathcal{G}_2(\theta^j) - \mathbb{E}\mathcal{G}_2(\theta)\big)^T.$$
Here the constant weights are
$$c_1 = c_2 = \cdots = c_{N_r} = \sqrt{N_r + \lambda}, \quad W_1^c = W_2^c = \cdots = W_{2N_r}^c = \frac{1}{2(N_r + \lambda)}, \quad \lambda = a^2(N_r + \kappa) - N_r, \quad \kappa = 0, \quad a = \min\Big\{\sqrt{\frac{4}{N_r + \kappa}},\, 1\Big\}.$$

We choose the hyperparameter $\Lambda$ in Eq. (11) to be a low-rank matrix with square root $Z_0 \in \mathbb{R}^{N_\theta \times N_r}$, which satisfies
$$\Lambda = Z_0 Z_0^T \quad \text{and} \quad \mathrm{rank}(Z_0) = N_r. \qquad (13)$$
Let $Z_\omega = \sqrt{2 - \alpha^2}\, Z_0$ denote the square root matrix of the artificial evolution error covariance $\Sigma_\omega$ in Eq. (11). The iteration procedure of the UKI can be formulated in the square root form $\{Z_n\}$, where $C_n = Z_n Z_n^T$. This leads to the following truncated unscented Kalman inversion (TUKI):

• Prediction step:
$$\hat{m}_{n+1} = \alpha m_n + (1-\alpha) r, \quad N_r\text{-TSVD}: \big(\alpha Z_n \;\; Z_\omega\big) = \hat{U}_n \sqrt{\hat{D}_n}\, \hat{V}_n^T, \qquad (14)$$
where $\hat{U}_n = \begin{pmatrix} u_1 & u_2 & \cdots & u_{N_r} \end{pmatrix}$ and $\hat{D}_n = \mathrm{diag}\{d_1, d_2, \cdots, d_{N_r}\}$. Hence
$$\hat{C}_{n+1} = \alpha^2 C_n + \Sigma_\omega = \big(\alpha Z_n \;\; Z_\omega\big)\big(\alpha Z_n \;\; Z_\omega\big)^T = \hat{U}_n \hat{D}_n \hat{U}_n^T.$$

• Generate truncated $\sigma$-points:
$$\hat{\theta}^0_{n+1} = \hat{m}_{n+1}, \quad \hat{\theta}^j_{n+1} = \hat{m}_{n+1} + c_j \sqrt{d_j}\, u_j, \quad \hat{\theta}^{j+N_r}_{n+1} = \hat{m}_{n+1} - c_j \sqrt{d_j}\, u_j \quad (1 \le j \le N_r).$$

• Analysis step:
$$\hat{y}^j_{n+1} = \mathcal{G}(\hat{\theta}^j_{n+1}), \quad \hat{y}_{n+1} = \hat{y}^0_{n+1}, \qquad (15a)$$
$$\hat{Z}_{n+1} = \Big(\sqrt{W_1^c}(\hat{\theta}^1_{n+1} - \hat{m}_{n+1}) \;\; \sqrt{W_2^c}(\hat{\theta}^2_{n+1} - \hat{m}_{n+1}) \;\; \cdots \;\; \sqrt{W_{2N_r}^c}(\hat{\theta}^{2N_r}_{n+1} - \hat{m}_{n+1})\Big), \qquad (15b)$$
$$\hat{Y}_{n+1} = \Big(\sqrt{W_1^c}(\hat{y}^1_{n+1} - \hat{y}_{n+1}) \;\; \sqrt{W_2^c}(\hat{y}^2_{n+1} - \hat{y}_{n+1}) \;\; \cdots \;\; \sqrt{W_{2N_r}^c}(\hat{y}^{2N_r}_{n+1} - \hat{y}_{n+1})\Big), \qquad (15c)$$
$$\mathrm{SVD}: \hat{Y}_{n+1}^T \Sigma_\nu^{-1} \hat{Y}_{n+1} = \hat{P}_{n+1} \hat{\Gamma}_{n+1} \hat{P}_{n+1}^T, \qquad (15d)$$
$$m_{n+1} = \hat{m}_{n+1} + \hat{Z}_{n+1} \hat{P}_{n+1} (\hat{\Gamma}_{n+1} + I)^{-1} \hat{P}_{n+1}^T \hat{Y}_{n+1}^T \Sigma_\nu^{-1} (y - \hat{y}_{n+1}), \qquad (15e)$$
$$Z_{n+1} = \hat{Z}_{n+1} \hat{P}_{n+1} (\hat{\Gamma}_{n+1} + I)^{-1/2}. \qquad (15f)$$
Since the covariance matrices in Eq. (10) are
$$\hat{C}^{\theta p}_{n+1} = \hat{Z}_{n+1} \hat{Y}_{n+1}^T, \quad \hat{C}^{pp}_{n+1} = \hat{Y}_{n+1} \hat{Y}_{n+1}^T + \Sigma_\nu, \quad \text{and} \quad \hat{C}_{n+1} = \hat{Z}_{n+1} \hat{Z}_{n+1}^T,$$
applying the Sherman–Morrison–Woodbury formula
$$\big(\hat{Y}_{n+1} \hat{Y}_{n+1}^T + \Sigma_\nu\big)^{-1} = \Sigma_\nu^{-1} - \Sigma_\nu^{-1} \hat{Y}_{n+1} \big(I + \hat{Y}_{n+1}^T \Sigma_\nu^{-1} \hat{Y}_{n+1}\big)^{-1} \hat{Y}_{n+1}^T \Sigma_\nu^{-1}$$
to the update equations (10d) and (10e) leads to the update equations (15e) and (15f).
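A compact numpy sketch of this square-root iteration follows; the matrix shapes match Eqs. (14)-(15), while the function name and the way $N_r$ is inferred from $Z_0$ are our own assumptions.

```python
import numpy as np

def tuki_iteration(G, y, Sigma_nu, m, Z, r, alpha, Z0):
    """One TUKI iteration in square-root form, Eqs. (14)-(15)."""
    N_r = Z0.shape[1]
    Z_omega = np.sqrt(2.0 - alpha**2) * Z0        # square root of Sigma_omega, Eq. (11)

    # Prediction step, Eq. (14): N_r-TSVD of the concatenated square root
    m_hat = alpha * m + (1.0 - alpha) * r
    U, s, _ = np.linalg.svd(np.hstack([alpha * Z, Z_omega]), full_matrices=False)
    U, d = U[:, :N_r], s[:N_r] ** 2               # C_hat ~ U diag(d) U^T

    # Truncated unscented transform weights (Definition 2, kappa = 0)
    a = min(np.sqrt(4.0 / N_r), 1.0)
    lam = a**2 * N_r - N_r
    c, Wc = np.sqrt(N_r + lam), 1.0 / (2.0 * (N_r + lam))

    # 2 N_r + 1 truncated sigma-points and their forward evaluations, Eq. (15a)
    thetas = [m_hat] + [m_hat + c * np.sqrt(d[j]) * U[:, j] for j in range(N_r)] \
                     + [m_hat - c * np.sqrt(d[j]) * U[:, j] for j in range(N_r)]
    ys = [G(t) for t in thetas]
    y_hat = ys[0]

    # Square-root factors, Eqs. (15b)-(15c)
    Zh = np.sqrt(Wc) * np.column_stack([t - m_hat for t in thetas[1:]])
    Yh = np.sqrt(Wc) * np.column_stack([yj - y_hat for yj in ys[1:]])

    # Analysis step via the small 2N_r x 2N_r eigenproblem, Eqs. (15d)-(15f)
    Gam, P = np.linalg.eigh(Yh.T @ np.linalg.solve(Sigma_nu, Yh))
    w = (P.T @ (Yh.T @ np.linalg.solve(Sigma_nu, y - y_hat))) / (Gam + 1.0)
    m_new = m_hat + Zh @ (P @ w)
    Z_new = (Zh @ P) / np.sqrt(Gam + 1.0)
    return m_new, Z_new
```

Note that no $N_\theta \times N_\theta$ matrix is ever formed, so the memory cost stays at $O(N_r N_\theta)$; $Z$ grows to $2N_r$ columns after one step, and the $N_r$-TSVD in the next prediction step truncates it back.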
Remark 2. The SVD in Eq. (15d) is applied to a $2N_r \times 2N_r$ matrix, and the truncated SVD in Eq. (14) is applied to an $N_\theta \times 3N_r$ matrix; hence, they are affordable.
Lemma 1. All square root matrices $\{Z_n\}$ and $\{\hat{Z}_n\}$ have rank $N_r$ and share the same column vector space, spanned by the column vectors of $Z_0$ (13). Therefore, the $N_r$-TSVD in the prediction step (14) is exact.

Proof. The proof is in Appendix A.
Remark 3. $\{m_n\}$ lie in the linear space spanned by $r$ (or $r_0$) and the columns of $Z_0$, since
$$m_{n+1} = \alpha m_n + (1-\alpha) r + \hat{Z}_{n+1} \hat{P}_{n+1} (\hat{\Gamma}_{n+1} + I)^{-1} \hat{P}_{n+1}^T \hat{Y}_{n+1}^T \Sigma_\nu^{-1} (y - \hat{y}_{n+1}).$$
Remark 4. Although the square root matrix $Z_{n+1}$ computed in Eq. (15f) is not unique, namely it can be represented as
$$Z_{n+1} = \hat{Z}_{n+1} \hat{P}_{n+1} (\hat{\Gamma}_{n+1} + I)^{-1/2} Q,$$
where $Q$ can be any orthogonal matrix, the algorithm is uniquely defined. Indeed, the SVD of the square root of the predicted covariance matrix (14) can be written as
$$\big(\alpha Z_{n+1} Q \;\; Z_\omega\big) = \hat{U}_{n+1} \sqrt{\hat{D}_{n+1}} \big(\hat{V}_{n+1,1} Q \;\; \hat{V}_{n+1,2}\big),$$
where $\big(\alpha Z_{n+1} \;\; Z_\omega\big) = \hat{U}_{n+1} \sqrt{\hat{D}_{n+1}} \big(\hat{V}_{n+1,1} \;\; \hat{V}_{n+1,2}\big)$; the orthogonal matrix $Q$ does not affect the left singular vectors $\hat{U}_{n+1}$ or the singular values in $\hat{D}_{n+1}$.

3.3. Without Low-Rank Covariance Structure Information

In this subsection, we focus on problems with high-dimensional unknown parameters but without low-rank covariance structure information (i.e., the prior covariance is $I$). For the ensemble Kalman inversion and its variants (see Appendix B), the initial ensemble can still be generated from the prior distribution. For the TUKI, a low-rank matrix $\Lambda = Z_0 Z_0^T$ is required to reduce the computational and memory costs.

However, the ensemble Kalman inversion has the invariant subspace property [42, 43]: the ensemble members remain in the linear space spanned by the initial ensemble. The UKI with reparameterization and the TUKI have similar properties (see Remarks 1 and 3). On one hand, this property brings a regularization effect. On the other hand, it also brings a strong limitation for inverse problems without low-rank covariance information, when the truncated rank $N_r$ or the ensemble size $J$ is much smaller than the unknown parameter dimension $N_\theta$. The limitation is illustrated in the following theorem:
Theorem 2. Let $\{\theta^i\}_{i=1}^{J}$ be random samples from the $N_\theta$-dimensional Gaussian distribution $\mathcal{N}(0, I)$. For any given vector $\theta_{\mathrm{ref}} \in \mathbb{R}^{N_\theta}$, we have
$$\mathbb{E}\big[\mathrm{dist}\big(\theta_{\mathrm{ref}}, \mathrm{span}\{\theta^1, \theta^2, \cdots, \theta^J\}\big)^2\big] = \Big(1 - \frac{J}{N_\theta}\Big)\lVert\theta_{\mathrm{ref}}\rVert^2. \qquad (16)$$

Proof.
The proof is in Appendix A.

Theorem 2 illustrates that, when $J$ and $N_r$ are much smaller than $N_\theta$, even the optimal approximation in the randomly generated invariant subspace is, with high probability, a poor approximation of the solution.
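As a quick sanity check, the identity (16) can be verified by Monte Carlo; the sketch below draws the reference vector randomly for convenience (the theorem holds for any fixed vector) and reproduces $1 - J/N_\theta$.

```python
import numpy as np

rng = np.random.default_rng(0)
N_theta, J, trials = 1000, 11, 200
theta_ref = rng.standard_normal(N_theta)

sq_dists = []
for _ in range(trials):
    Theta = rng.standard_normal((N_theta, J))   # J Gaussian samples as columns
    Q, _ = np.linalg.qr(Theta)                  # orthonormal basis of their span
    resid = theta_ref - Q @ (Q.T @ theta_ref)   # component outside the span
    sq_dists.append(resid @ resid)

print(np.mean(sq_dists) / (theta_ref @ theta_ref))  # ~ 1 - J/N_theta = 0.989
```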
4. Speed Up with Reduced-Order Models
For large scientific or engineering problems, even with a small parameter number $N_\theta$ (or rank $N_r$), the computational cost associated with the $2N_\theta + 1$ (or $2N_r + 1$) forward model evaluations can be intractable. Reduced-order models can be used to speed up these $2N_\theta + 1$ (or $2N_r + 1$) forward model evaluations. Specifically, the forward evaluation $\mathcal{G}(\hat{\theta}^0_{n+1})$ is computed with the high-fidelity model, and the other forward evaluations $\mathcal{G}(\hat{\theta}^j_{n+1})$ with $j \ge 1$ are computed with reduced-order models. For projection based reduced-order models, the high-fidelity solution obtained at $\hat{\theta}^0_{n+1}$ can form a good projection basis. For Gaussian process and neural network based surrogate models, data generation and training procedures are required.
Remark 5. Following Proposition 2 in [1], the UKI ($\alpha = 1$) update equations (10) can be written as
$$m_{n+1} = m_n + C_{n+1} \widetilde{d\mathcal{G}}_{n+1}^{\,T} \big(\Sigma_\nu + \widetilde{\Sigma}_{\nu,n+1}\big)^{-1}\big(y - \mathcal{G}(m_n)\big), \qquad (17a)$$
$$C_{n+1} = \Big(\hat{C}_{n+1}^{-1} + \widetilde{d\mathcal{G}}_{n+1}^{\,T} \big(\Sigma_\nu + \widetilde{\Sigma}_{\nu,n+1}\big)^{-1} \widetilde{d\mathcal{G}}_{n+1}\Big)^{-1}, \qquad (17b)$$
where $\widetilde{d\mathcal{G}}_{n+1} = \hat{C}^{\theta p\,T}_{n+1} \hat{C}_{n+1}^{-1}$ is the averaged derivative and $\lVert\widetilde{\Sigma}_{\nu,n+1}\rVert = O(\lVert\hat{C}_{n+1}\rVert)$. To evaluate $\mathcal{G}(m_n)$ and $\widetilde{d\mathcal{G}}_{n+1}$, the UKI requires $2N_\theta + 1$ forward model evaluations. Specifically, in the modified unscented transform (see Definition 1), we need the forward model evaluation
$$\hat{y}^0_{n+1} = \mathcal{G}(\hat{\theta}^0_{n+1}) = \mathcal{G}(m_n)$$
to estimate the mean, which corresponds to $y - \mathcal{G}(m_n)$ in Eq. (17a), and $2N_\theta$ forward model evaluations
$$\hat{y}^j_{n+1} = \mathcal{G}(\hat{\theta}^j_{n+1}), \quad j \ge 1,$$
to estimate the covariance, which corresponds to $\widetilde{d\mathcal{G}}_{n+1}$. We find that the covariance and $\widetilde{d\mathcal{G}}_{n+1}$ do not need to be very accurate; hence, these $2N_\theta$ forward model evaluations can be accelerated by using reduced-order models.
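In code, the split suggested by Remark 5 is a one-line change to the $\sigma$-point evaluation loop of the earlier sketches; the function and model names below are hypothetical stand-ins.

```python
def evaluate_sigma_points(G_high, G_low, sigma_points):
    """Two-fidelity sigma-point evaluation (Section 4). G_high and G_low are
    assumed stand-ins for a high-fidelity forward model and a reduced-order
    model sharing the same observation space."""
    ys = [G_high(sigma_points[0])]               # mean: one expensive evaluation
    ys += [G_low(p) for p in sigma_points[1:]]   # covariance: 2N cheap evaluations
    return ys
```

The mean evaluation keeps full accuracy in the residual $y - \mathcal{G}(m_n)$, while the cheaper evaluations only enter through the covariance and the averaged derivative.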
5. Applications
In this section, we present numerical results obtained with the aforementioned strategies, UKI with reparameterization (referred to simply as UKI) and TUKI, to speed up the UKI. For comparison, we also apply the ensemble Kalman inversion (EKI), the ensemble adjustment Kalman inversion (EAKI), and the ensemble transform Kalman inversion (ETKI); the detailed implementation of these ensemble-based Kalman inversions is listed in Appendix B.

• Linear model problems: these problems serve as proof-of-concept examples, which compare the behaviors of different Kalman inversions with/without low-rank covariance structure information.

• Barotropic model problem: this is a model data assimilation problem in meteorology, where the initial condition is recovered from random observations in the Northern Hemisphere. This problem demonstrates the effectiveness of using low-rank covariance structure information to speed up the UKI.

• Idealized general circulation model problem: this is a 3D Navier–Stokes problem with the hydrostatic assumption, which describes physical processes in the atmosphere. This problem demonstrates the effectiveness of using reduced-order models (specifically, low-resolution models) to speed up the UKI.

The code is accessible online: https://github.com/Zhengyu-Huang/InverseProblems.jl
5.1. Linear Model Problems

In this subsection, two linear model problems are considered. The first one comes from a one-dimensional elliptic equation, and therefore a low-rank covariance structure exists. The second one is an artificial linear model problem with parameters following a Bernoulli distribution, and therefore no low-rank covariance structure information exists. For comparison, the UKI with reparameterization, TUKI, EKI, EAKI, and ETKI are all applied to these linear problems. Since these problems are well-posed and no observation error is added, the regularization parameter $\alpha$ is set to 1.

5.1.1. With Low-Rank Covariance Structure

Consider the one-dimensional elliptic equation
$$-\frac{d^2\theta}{dx^2} + \theta = f(x), \quad \theta(0) = \theta(1) = 0.$$
Here Dirichlet boundary conditions are applied at both ends, and $f$ defines the source, a piecewise constant function that takes one value on $0 \le x \le \frac{1}{2}$ and another on $\frac{1}{2} < x \le 1$. The elliptic equation is discretized by the finite difference method on a uniform grid with $N_\theta + 2$ points, with $N_\theta = 1000$. The solution is $\theta_{\mathrm{ref}} \in \mathbb{R}^{N_\theta}$ on the interior points. We are interested in the inverse problem of recovering $\theta_{\mathrm{ref}}$ from observations of $f$ on the interior points ($N_y = N_\theta$):
$$y = G\theta + \eta,$$
where $G = \big(-\frac{d^2}{dx^2} + 1\big)$ is the discretized operator and the observation error is assumed to be $\eta \sim \mathcal{N}(0, I) \in \mathbb{R}^{N_y}$. Since the large-scale structure of $\theta$ lies in the Fourier sine space, we can approximate it as
$$\theta = \tau^{(1)}\sin(\pi x) + \cdots + \tau^{(N_r)}\sin(N_r \pi x) \quad \text{with} \quad N_r = 5. \qquad (18)$$
For the UKI, the reparameterization is Eq. (18) and the UKI is initialized with $\tau_0 \sim \mathcal{N}(0, I)$. For the TUKI, the low-rank covariance structure is embedded in the square root of $\Lambda$:
$$Z_0 = 10 \times \big(\sin(\pi x) \;\; \sin(2\pi x) \;\; \cdots \;\; \sin(N_r \pi x)\big).$$
The TUKI is initialized with $\theta_0 \sim \mathcal{N}(0, Z_0 Z_0^T)$. For EKI, EAKI, and ETKI, the initial ensembles are generated from the distribution $\mathcal{N}(0, Z_0 Z_0^T)$, which spans the column space of $Z_0$. The ensemble size is $J = 2N_r + 1$, which matches the number of $\sigma$-points of the UKI and TUKI.

The convergence of $\{m_n\}$ in terms of the relative $L_2$ norm errors is depicted in Fig. 1. All the Kalman inversions behave similarly and converge efficiently, although the EKI suffers from random noise due to the small ensemble size. This problem illustrates how the UKI uses $2N_r + 1 = 11$ $\sigma$-points to solve an $N_\theta = 1000$ dimensional inverse problem.

5.1.2. Without Low-Rank Covariance Structure

Consider the following artificial inverse problem
$$y = G\theta + \eta,$$
with $G = I \in \mathbb{R}^{N_y \times N_\theta}$ and $N_y = 1000$, $N_\theta = 1000$. The entries of the underlying truth $\theta_{\mathrm{ref}}$ are independent, identically distributed Bernoulli random variables $B(1, 0.5)$. The observation is $y = G\theta_{\mathrm{ref}}$, with assumed observation error $\eta \sim \mathcal{N}(0, I)$. However, there is no low-rank covariance structure information. Hence, the initial distribution is set to $\mathcal{N}(0, I)$.

Although there is no low-rank covariance structure information, the TUKI is applied with truncated rank $N_r = 5$. EKI, EAKI, and ETKI are also applied with an ensemble size of $J = 2N_r + 1$.

The convergence of $\{m_n\}$ in terms of the relative $L_2$ norm errors is depicted in Fig. 2. EKI, EAKI, and ETKI all diverge, and TUKI fails to converge to $\theta_{\mathrm{ref}}$. This conforms to the discussion in Subsection 3.3. Gradient-based optimization algorithms with automatic differentiation [68, 69, 5, 6] could be much more effective and efficient for this scenario, when the forward problem is differentiable and not chaotic.
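For concreteness, a minimal numpy sketch of the discretized elliptic setup follows; the truth vector is an assumed stand-in (the exact source values are immaterial to the algorithmic structure), and `uki_iteration` refers to the sketch in Section 3.1.

```python
import numpy as np

# Finite-difference discretization of G = -d^2/dx^2 + 1 on a uniform grid
N_theta = 1000
h = 1.0 / (N_theta + 1)
x = np.linspace(h, 1.0 - h, N_theta)
G = (np.diag((2.0 / h**2 + 1.0) * np.ones(N_theta))
     + np.diag((-1.0 / h**2) * np.ones(N_theta - 1), 1)
     + np.diag((-1.0 / h**2) * np.ones(N_theta - 1), -1))

# Sine-mode reparameterization basis of Eq. (18)
N_r = 5
U = np.column_stack([np.sin((i + 1) * np.pi * x) for i in range(N_r)])

theta_ref = np.sin(np.pi * x) + 0.5 * np.sin(3.0 * np.pi * x)  # assumed truth
y = G @ theta_ref
G_tau = lambda tau: G @ (U @ tau)  # reduced forward map: feed to uki_iteration
```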
Figure 1: Relative $L_2$ error $\frac{\lVert m_n - \theta_{\mathrm{ref}}\rVert}{\lVert \theta_{\mathrm{ref}}\rVert}$ of the elliptic linear model problem.
Figure 2: Relative $L_2$ error $\frac{\lVert m_n - \theta_{\mathrm{ref}}\rVert}{\lVert \theta_{\mathrm{ref}}\rVert}$ of the Bernoulli linear model problem.

5.2. Barotropic Model Problem

The barotropic vorticity equation describes the evolution of a non-divergent, incompressible flow on the surface of the earth:
$$\frac{\partial \omega}{\partial t} = -\mathbf{v} \cdot \nabla(\omega + f), \quad \nabla^2 \psi = \omega, \quad \mathbf{v} = \mathbf{k} \times \nabla\psi, \qquad (19)$$
where $\omega$ and $\psi$ are the (absolute) vorticity and the streamfunction, respectively; $\mathbf{v}$ is the non-divergent flow velocity, $\mathbf{k}$ is the unit vector in the radial direction, and $f = 2\Omega\sin(\theta)$ is the Coriolis force, depending on the latitude $\theta$. The angular velocity is $\Omega = 7.292 \times 10^{-5}\,\mathrm{s}^{-1}$ and the Earth radius is $R = 6.371 \times 10^{6}\,\mathrm{m}$.

The initial condition (see Fig. 3) is the superposition of a zonally symmetric flow
$$u_b = 25\cos(\theta) - 30\cos^3(\theta) + 300\sin^2(\theta)\cos^6(\theta), \qquad (20)$$
where $u_b$ is the zonal wind velocity, and a sinusoidal disturbance in the vorticity field
$$\omega' = A\cos(\theta)\, e^{-((\theta - \theta_0)/\theta_W)^2}\cos(m\lambda),$$
where $\lambda$ is the longitude, $m = 4$, $\theta_0 = 45^\circ\mathrm{N}$, $\theta_W = 15^\circ$, and $A = 8.0 \times 10^{-5}\,\mathrm{s}^{-1}$. This resembles conditions found in the upper troposphere in Northern winter [70, 71].

Figure 3: Initial condition (d) of the barotropic model is the superposition of a zonally symmetric flow (a: $u_b$; b: $\omega_b$) and a sinusoidal disturbance in the vorticity field (c: $\omega'$).
The forward problem is solved by the spectral transform method with T85 spectral resolution (triangular truncation at wavenumber 85, with $256 \times 512$ points on the latitude-longitude transform grid) and a semi-implicit time integrator [72].

For the inverse problem, we recover the initial condition, specifically the initial vorticity field $\omega_0$ of the barotropic vorticity equation, given noisy pointwise observations of the zonal wind velocity field. The pointwise observations are collected at 50 random points in the Northern Hemisphere at $T = 12\,\mathrm{h}$ and $T = 24\,\mathrm{h}$; therefore, we have $N_y = 100$ data points (see Fig. 4). And 5% Gaussian random noise is added to the observations, as follows,
$$y_{\mathrm{obs}} = y_{\mathrm{ref}} + 5\%\, y_{\mathrm{ref}} \odot \mathcal{N}(0, I),$$
where $\odot$ denotes element-wise multiplication. The initial zonal flow $\omega_b$ represents the basic atmospheric circulation, and we are more interested in the initial perturbation field $\omega'$. Hence, we assume the prior mean is $r_0 = \omega_b$. The large-scale structure of $\omega_0$ lies in the spherical harmonics space
$$Y_{m,n}(\lambda, \theta) = P_{m,n}(\theta)e^{im\lambda}, \quad -n \le m \le n, \quad n \ge 0,$$
and hence we reparameterize $\omega_0$ with a triangular truncation at wavenumber $N = 7$, as follows,
$$\omega_0(\lambda, \theta) = \omega_b + \sum_{n=1}^{N} \tau^{(0,n)} P_{0,n}(\theta) + \sum_{m=1}^{N}\sum_{n=m}^{N} \Big(\tau^{s(m,n)} P_{m,n}(\theta)\cos(m\lambda) + \tau^{c(m,n)} P_{m,n}(\theta)\sin(m\lambda)\Big). \qquad (21)$$
For the UKI, the reparameterization is Eq. (21), and the UKI is initialized with $\tau_0 \sim \mathcal{N}(0, I)$. For the TUKI, we assume the square root of $\Lambda$ is
$$Z_0 = \frac{1}{R}\big(\cdots\; P_{0,n},\; \cdots,\; P_{m,n}\cos(m\lambda),\; P_{m,n}\sin(m\lambda),\; \cdots\big) \quad \text{with} \quad m, n \le N.$$
The TUKI starts with $\theta_0 \sim \mathcal{N}(\omega_b, Z_0 Z_0^T)$. For both approaches, we have $N_r = 63$. It is worth mentioning that the inverse problem is ill-posed: another possible initial condition can be obtained by mirroring around the equator, since there is no observation in the Southern Hemisphere. Hence, regularization is required, and we set the regularization parameter $\alpha$ to 0.5, since $\omega_b$ is a good approximation.
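Two small computable details of this setup are the mode count in Eq. (21) and the multiplicative noise model; a short sketch (the stand-in observation vector is an assumption) is:

```python
import numpy as np

# Count the modes retained in Eq. (21): one P_{0,n} per n, plus cos/sin pairs
# for 1 <= m <= n <= N; N = 7 gives N_r = 63.
N = 7
N_r = N + 2 * sum(N - m + 1 for m in range(1, N + 1))
print(N_r)  # 63

# 5% multiplicative observation noise, as in the text
rng = np.random.default_rng(0)
y_ref = rng.standard_normal(100)   # stand-in for the 100 zonal-wind values
y_obs = y_ref + 0.05 * y_ref * rng.standard_normal(y_ref.shape)
```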
Figure 4: The zonal velocity field of the barotropic model and the 50 random pointwise measurements (black dots) in the Northern Hemisphere at the two observation times ($T = 12\,\mathrm{h}$ and $T = 24\,\mathrm{h}$).

The convergence of the initial vorticity field $\omega_0(x, m_n)$ and the optimization errors at each iteration are depicted in Fig. 5. The TUKI performs very close to the UKI with reparameterization for this case. The estimated initial vorticity fields $\omega_0(x, m_n)$ at the 20th iteration are depicted in Fig. 6. Both UKI and TUKI capture the sinusoidal disturbance of the true initial field. The 3-$\sigma$ error fields at each point are depicted in Fig. 7. Although the covariance does not represent the Bayesian posterior uncertainty, it does indicate the sensitivities inherent in the estimation problem; in particular, it features the largest uncertainty in the Southern Hemisphere, since there is no observation there.

5.3. Idealized General Circulation Model Problem

Finally, we consider using reduced-order model techniques to speed up an idealized general circulation model inverse problem. The model is based on the 3D Navier–Stokes equations, making the hydrostatic and shallow-atmosphere approximations common in atmospheric modeling. Specifically, we test on the notable Held-Suarez test case [72], in which a detailed radiative transfer model is replaced by Newtonian relaxation of temperatures toward a prescribed "radiative equilibrium"
$T_{\mathrm{eq}}(\phi, p)$ that varies with latitude $\phi$ and pressure $p$.

Figure 5: Relative $L_2$ error $\frac{\lVert \omega_0(x, m_n) - \omega_0\rVert}{\lVert \omega_0\rVert}$ (left) and the optimization error $\frac{1}{2}\lVert \Sigma_\nu^{-\frac{1}{2}}(y_{\mathrm{obs}} - \hat{y}_n)\rVert^2$ (right) of the barotropic model problem.

Figure 6: Initial vorticity fields $\omega_0(x, m_n)$ recovered by UKI (left) and TUKI (right). The reference initial vorticity field is in Fig. 3-d.

Figure 7: Error variance fields in terms of the standard deviation at each point obtained by UKI (left) and TUKI (right).

Specifically, the thermodynamic equation for temperature $T$,
$$\frac{\partial T}{\partial t} + \cdots = Q$$
(dots denoting advective and pressure work terms), contains a diabatic heat source
$$Q = -k_T(\phi, p, p_s)\big(T - T_{\mathrm{eq}}(\phi, p)\big),$$
with relaxation coefficient (inverse relaxation time)
$$k_T = k_a + (k_s - k_a)\max\Big(0, \frac{\sigma - \sigma_b}{1 - \sigma_b}\Big)\cos^4\phi.$$
Here, $\sigma = p/p_s$, which is pressure $p$ normalized by surface pressure $p_s$, is the vertical coordinate of the model, and
$$T_{\mathrm{eq}} = \max\Big\{200\,\mathrm{K}, \Big[315\,\mathrm{K} - \Delta T_y \sin^2\phi - \Delta\theta_z \log\Big(\frac{p}{p_0}\Big)\cos^2\phi\Big]\Big(\frac{p}{p_0}\Big)^\kappa\Big\}$$
is the equilibrium temperature profile ($p_0 = 10^5\,\mathrm{Pa}$ is a reference surface pressure and $\kappa = 2/7$). The default parameters are
$$k_a = (40\,\mathrm{day})^{-1}, \quad k_s = (4\,\mathrm{day})^{-1}, \quad \Delta T_y = 60\,\mathrm{K}, \quad \Delta\theta_z = 10\,\mathrm{K}.$$

For the numerical simulations, we use the spectral transform method in the horizontal, with T42 spectral resolution (triangular truncation at wavenumber 42, with $64 \times 128$ points on the latitude-longitude transform grid); we use 20 vertical levels equally spaced in $\sigma$. With the default parameters, the model produces an Earth-like zonal-mean circulation, albeit without moisture or precipitation. A single jet is generated with a maximum strength of roughly $30\,\mathrm{m\,s^{-1}}$ near $45^\circ$ latitude (see Fig. 8).

Our inverse problem is constructed to learn the parameters in the Newtonian relaxation term $Q$:
$$(k_a, k_s, \Delta T_y, \Delta\theta_z).$$
We do so in the presence of the following constraints:
$$0\,\mathrm{day}^{-1} < k_a < 1\,\mathrm{day}^{-1}, \quad k_a < k_s < k_a + 1\,\mathrm{day}^{-1}, \quad 0 < \Delta T_y, \quad 0 < \Delta\theta_z.$$
The inverse problem is formed as follows [1],
$$y = \mathcal{G}(\theta) + \eta \quad \text{with} \quad \mathcal{G}(\theta) = T(\phi, \sigma), \qquad (22)$$
with the parameter transformation
$$(k_a, k_s, \Delta T_y, \Delta\theta_z) = \Big(\frac{1}{1 + |\theta^{(1)}|}, \; \frac{1}{1 + |\theta^{(1)}|} + \frac{1}{1 + |\theta^{(2)}|}, \; |\theta^{(3)}|, \; |\theta^{(4)}|\Big). \qquad (23)$$
The observation mapping $\mathcal{G}$ is defined by mapping from the unknown $\theta$ to the 200-day zonal mean of the temperature ($T$) as a function of latitude ($\phi$) and height ($\sigma$), after an initial spin-up of 200 days. The truth observation is the 1000-day zonal mean of the temperature (see Fig. 8-top-left), after an initial spin-up of 200 days to eliminate the influence of the initial condition. Because the truth observations come from an average 5 times as long as the observation window used for parameter learning, the chaotic internal variability of the model introduces noise in the observations.
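A direct transcription of the transformation (23) (the function name is ours) makes the enforced constraints easy to check:

```python
import numpy as np

def constrain(theta):
    """Map unconstrained theta to (k_a, k_s, DT_y, Dth_z), Eq. (23);
    units are (day^-1, day^-1, K, K)."""
    k_a = 1.0 / (1.0 + abs(theta[0]))
    k_s = k_a + 1.0 / (1.0 + abs(theta[1]))
    return np.array([k_a, k_s, abs(theta[2]), abs(theta[3])])

# e.g., (2, 2, 20, 20) maps to (1/3, 2/3, 20, 20), inside the constraint set
print(constrain(np.array([2.0, 2.0, 20.0, 20.0])))
```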
20 K ,
20 K] T and UKI is initial-ized with θ ∼ N (cid:16) r , I (cid:17) . Within the algorithm, we assume that the observation error satisfies η ∼ N (0 K , I K ). Because the problem is over-determined, we set α = 1 . The reduced-ordermodel technique discussed in Section 4 is applied to speed up the UKI. These 2 N θ forward modelevaluations are computed on a T21 grid (triangular truncation at wavenumber 21, with 32 × σ (twicecoarser in all three directions). The 1000-day zonal mean of the temperature and velocity predictedby the low-resolution model with the truth parameters are shown in Fig. 8-bottom. Although thereare large discrepancies comparing with results computed on the T42 grid, these low-resolutionresults only affect the covariance and F u d G . The estimated parameters and associated 3 − σ con-fidence intervals for each component at each iteration are depicted in Fig. 9. The estimation ofmodel parameters at the 20th iteration are (cid:20) k a k s ∆ T y ∆ θ z (cid:21) ∼ N (cid:16) (cid:34) . − .
271 day − .
76 K9 .
84 K (cid:35) , (cid:34) . × − day − − . × − day − − . × − day − K 3 . × − day − K − . × − day − . × − day − . × − day − K − . × − day − K − . × − day − K 5 . × − day − K 2 . × − K . × − K . × − day − K − . × − day − K 3 . × − K . × − K (cid:35) (cid:17) . The UKI with reduced-order models converges efficiently to the true parameters (less than 20iterations). Moreover, the computational cost of these 2 N θ low-resolution forward evaluations iseven lower than a single high-resolution forward evaluation. Hence, for this inverse problems, thederivative-free UKI equipped with reduced order models achieves a CPU time speedup factor of O (10) and is as efficient as gradient-based optimization methods, which generally require an equallyexpensive backward propagation.
6. Conclusion
Unscented Kalman inversion is attractive for at least four main reasons: (i) it is derivative-free; (ii) it is robust for noisy observations and chaotic inverse problems; (iii) it is embarrassingly parallel; (iv) it provides sensitivity information. In the present work, several strategies for making the UKI more efficient for large-scale inverse problems are presented, including the use of low-rank approximation and reduced-order model techniques. Although we demonstrate the success of applying the UKI to high-dimensional inverse problems, the construction of low-rank bases for general scientific and engineering problems with complex geometries is not trivial and is worth further investigation. Another interesting direction for future work would be to speed up the UKI with other reduced-order model techniques.
Acknowledgments.
D.Z.H. is supported by the generosity of Eric and Wendy Schmidt by recommendation of the Schmidt Futures program. J.H. is supported by the Simons Foundation as a Junior Fellow at New York University.
Appendix A. Proof of Theorems
Proof of Lemma 1.
We will prove by induction that each $Z_n$ has rank $N_r$ and shares the column vector space spanned by the column vectors of $Z_0$.

When $n = 0$, since $\mathrm{rank}(Z_0) = N_r$, $Z_0$ trivially has rank $N_r$ and spans the required column vector space.

We assume this holds for all $n \le k$. Since $Z_\omega = \sqrt{2 - \alpha^2}\, Z_0$, the column space and the left singular vector space of $\big(\alpha Z_k \;\; Z_\omega\big)$ are the same as the column space of $Z_0$. Therefore the $N_r$-TSVD
$$\big(\alpha Z_k \;\; Z_\omega\big) = \hat{U}_k \sqrt{\hat{D}_k}\, \hat{V}_k^T$$
is exact. Next, since we have
$$\hat{Z}_{k+1}\hat{Z}_{k+1}^T = \big(\alpha Z_k \;\; Z_\omega\big)\big(\alpha Z_k \;\; Z_\omega\big)^T,$$
$\hat{Z}_{k+1}$ has rank $N_r$ and its column space is the same as that of $Z_0$. Finally, since $\hat{P}_{k+1}(\hat{\Gamma}_{k+1} + I)^{-1/2}$ is not singular, from
$$Z_{k+1} = \hat{Z}_{k+1}\hat{P}_{k+1}(\hat{\Gamma}_{k+1} + I)^{-1/2},$$
we have that $Z_{k+1}$ shares the column space of $\hat{Z}_{k+1}$.

Therefore $\{Z_n\}$ and $\{\hat{Z}_n\}$ have rank $N_r$ and share the same column vector space, spanned by the column vectors of $Z_0$ (13).
Gaussian vectors are rotation invariant. We can take any N θ × N θ orthogonalmatrix O , then the joint law of { Oθ , Oθ , · · · , Oθ J } is the same as the joint law of { θ , θ , · · · , θ J } .18 k a ( day ) k s ( day ) T y ( K ) y ( K ) Figure 9: Convergence of the idealized general circulation model inverse problem with UKI, true parameter valuesare represented by dashed grey lines.
Moreover, rotation preserves distance:
$$\mathrm{dist}\big(\theta_{\mathrm{ref}}, \mathrm{span}\{\theta^1, \theta^2, \cdots, \theta^J\}\big) = \mathrm{dist}\big(O\theta_{\mathrm{ref}}, \mathrm{span}\{O\theta^1, O\theta^2, \cdots, O\theta^J\}\big), \qquad (A.1)$$
which has the same law as $\mathrm{dist}(O\theta_{\mathrm{ref}}, \mathrm{span}\{\theta^1, \theta^2, \cdots, \theta^J\})$. By taking the expectation over $\theta^1, \theta^2, \cdots, \theta^J$ on both sides, we get
$$\mathbb{E}\big[\mathrm{dist}(\theta_{\mathrm{ref}}, \mathrm{span}\{\theta^1, \theta^2, \cdots, \theta^J\})^2\big] = \mathbb{E}\big[\mathrm{dist}(O\theta_{\mathrm{ref}}, \mathrm{span}\{\theta^1, \theta^2, \cdots, \theta^J\})^2\big], \qquad (A.2)$$
for any $N_\theta \times N_\theta$ orthogonal matrix $O$. We can further average (A.2) over the uniform distribution on all $N_\theta \times N_\theta$ orthogonal matrices (the Haar measure over the orthogonal group):
$$\mathbb{E}\big[\mathrm{dist}(\theta_{\mathrm{ref}}, \mathrm{span}\{\theta^1, \theta^2, \cdots, \theta^J\})^2\big] = \mathbb{E}\big[\mathbb{E}_O\big[\mathrm{dist}(O\theta_{\mathrm{ref}}, \mathrm{span}\{\theta^1, \theta^2, \cdots, \theta^J\})^2\big]\big], \qquad (A.3)$$
where $\mathbb{E}_O$ is the expectation with respect to the uniform distribution on all $N_\theta \times N_\theta$ orthogonal matrices. Under this measure, $O\theta_{\mathrm{ref}}$ is uniformly distributed on the sphere of radius $\lVert\theta_{\mathrm{ref}}\rVert$, i.e., it has the same law as $\lVert\theta_{\mathrm{ref}}\rVert\,\omega$, where $\omega$ is uniformly distributed on the unit sphere. We can rewrite the right hand side of (A.3) as
$$\mathbb{E}_O\big[\mathrm{dist}(O\theta_{\mathrm{ref}}, \mathrm{span}\{\theta^1, \theta^2, \cdots, \theta^J\})^2\big] = \lVert\theta_{\mathrm{ref}}\rVert^2\, \mathbb{E}_\omega\big[\mathrm{dist}(\omega, \mathrm{span}\{\theta^1, \theta^2, \cdots, \theta^J\})^2\big]. \qquad (A.4)$$
For Gaussian vectors $\theta^1, \theta^2, \cdots, \theta^J$, almost surely their span $\mathrm{span}\{\theta^1, \theta^2, \cdots, \theta^J\}$ has dimension $J$. $\mathrm{dist}(\omega, \mathrm{span}\{\theta^1, \theta^2, \cdots, \theta^J\})$ is the distance from a uniformly distributed random vector on the unit sphere to a plane of dimension $J$; we can again rotate the plane, which does not change the law. So we can rotate the plane to the plane spanned by the coordinate vectors $e_1, e_2, \cdots, e_J$; then
$$\mathbb{E}_\omega\big[\mathrm{dist}(\omega, \mathrm{span}\{\theta^1, \theta^2, \cdots, \theta^J\})^2\big] = \mathbb{E}_\omega\big[\mathrm{dist}(\omega, \mathrm{span}\{e_1, e_2, \cdots, e_J\})^2\big] = \mathbb{E}_\omega\Big[\sum_{i=J+1}^{N_\theta} \omega_i^2\Big] = \frac{N_\theta - J}{N_\theta} = 1 - \frac{J}{N_\theta}, \qquad (A.5)$$
where we use that, for a uniformly distributed vector on the unit sphere, the expectation of each coordinate squared is $1/N_\theta$. The claim (16) follows from combining (A.4) and (A.5).

Appendix B. Ensemble Based Kalman Inversion

The stochastic ensemble Kalman inversion [1] is

• Prediction step:
$$\hat{\theta}^j_{n+1} = \alpha\theta^j_n + (1-\alpha) r + \omega^j_{n+1}, \quad \hat{m}_{n+1} = \frac{1}{J}\sum_{j=1}^{J}\hat{\theta}^j_{n+1}, \quad \hat{C}_{n+1} = \frac{1}{J-1}\sum_{j=1}^{J}(\hat{\theta}^j_{n+1} - \hat{m}_{n+1})(\hat{\theta}^j_{n+1} - \hat{m}_{n+1})^T.$$

• Analysis step:
$$\hat{y}^j_{n+1} = \mathcal{G}(\hat{\theta}^j_{n+1}), \quad \hat{y}_{n+1} = \frac{1}{J}\sum_{j=1}^{J}\hat{y}^j_{n+1},$$
$$\hat{C}^{\theta p}_{n+1} = \frac{1}{J-1}\sum_{j=1}^{J}(\hat{\theta}^j_{n+1} - \hat{m}_{n+1})(\hat{y}^j_{n+1} - \hat{y}_{n+1})^T, \quad \hat{C}^{pp}_{n+1} = \frac{1}{J-1}\sum_{j=1}^{J}(\hat{y}^j_{n+1} - \hat{y}_{n+1})(\hat{y}^j_{n+1} - \hat{y}_{n+1})^T + \Sigma_\nu,$$
$$\theta^j_{n+1} = \hat{\theta}^j_{n+1} + \hat{C}^{\theta p}_{n+1}\big(\hat{C}^{pp}_{n+1}\big)^{-1}(y - \hat{y}^j_{n+1} - \nu^j_{n+1}), \quad m_{n+1} = \frac{1}{J}\sum_{j=1}^{J}\theta^j_{n+1}. \qquad (B.1)$$
Here the superscript $j = 1, \cdots, J$ is the ensemble particle index, and $\omega^j_{n+1} \sim \mathcal{N}(0, \Sigma_\omega)$ and $\nu^j_{n+1} \sim \mathcal{N}(0, \Sigma_\nu)$ are independent and identically distributed random variables.

Inspired by square root Kalman filters [29, 28, 50, 51], the analysis step of the stochastic Kalman inversion can be replaced by a deterministic analysis, which leads to deterministic ensemble Kalman inversions. Deterministic ensemble Kalman inversions update the mean first, as follows,
$$m_{n+1} = \hat{m}_{n+1} + \hat{C}^{\theta p}_{n+1}\big(\hat{C}^{pp}_{n+1}\big)^{-1}(y - \hat{y}_{n+1}). \qquad (B.2)$$
Then we can define the matrix square roots $\hat{Z}_{n+1}, Z_{n+1} \in \mathbb{R}^{N_\theta \times J}$ of $\hat{C}_{n+1}, C_{n+1}$ as follows,
$$\hat{Z}_{n+1} = \frac{1}{\sqrt{J-1}}\big(\hat{\theta}^1_{n+1} - \hat{m}_{n+1} \;\; \hat{\theta}^2_{n+1} - \hat{m}_{n+1} \;\; \cdots \;\; \hat{\theta}^J_{n+1} - \hat{m}_{n+1}\big),$$
$$Z_{n+1} = \frac{1}{\sqrt{J-1}}\big(\theta^1_{n+1} - m_{n+1} \;\; \theta^2_{n+1} - m_{n+1} \;\; \cdots \;\; \theta^J_{n+1} - m_{n+1}\big). \qquad (B.3)$$
The particles $\{\theta^j_{n+1}\}$ can then be deterministically updated by the ensemble adjustment/transform Kalman methods, which brings about the following ensemble adjustment/transform Kalman inversions.

Appendix B.1. Ensemble Adjustment Kalman Inversion

Following the ensemble adjustment Kalman filter proposed in [28], the analysis step updates the particles deterministically with a pre-multiplier $A$,
$$\theta^j_{n+1} - m_{n+1} = A(\hat{\theta}^j_{n+1} - \hat{m}_{n+1}). \qquad (B.4)$$
Here $A = F\sqrt{D^p}\, U \sqrt{D}\big(\sqrt{D^p}\big)^{-1} F^T$ with
$$\mathrm{SVD}: \hat{Z}_{n+1} = F\sqrt{D^p}\, V^T, \quad \mathrm{SVD}: V^T\big(I + \hat{Y}^T_{n+1}\Sigma_\nu^{-1}\hat{Y}_{n+1}\big)^{-1} V = UDU^T, \qquad (B.5)$$
where both $\sqrt{D^p}$ and $D$ are non-singular diagonal matrices with dimensionality $\mathrm{rank}(\hat{Z}_{n+1})$, and
$$\hat{Y}_{n+1} = \frac{1}{\sqrt{J-1}}\big(\hat{y}^1_{n+1} - \hat{y}_{n+1} \;\; \hat{y}^2_{n+1} - \hat{y}_{n+1} \;\; \cdots \;\; \hat{y}^J_{n+1} - \hat{y}_{n+1}\big). \qquad (B.6)$$
It can be verified that the ensemble covariance matrix from Eq. (B.1) satisfies
$$C_{n+1} = Z_{n+1}Z^T_{n+1} = \hat{Z}_{n+1}\Big(I - \hat{Y}^T_{n+1}\big(\hat{Y}_{n+1}\hat{Y}^T_{n+1} + \Sigma_\nu\big)^{-1}\hat{Y}_{n+1}\Big)\hat{Z}^T_{n+1} = \hat{Z}_{n+1}\Big(I + \hat{Y}^T_{n+1}\Sigma_\nu^{-1}\hat{Y}_{n+1}\Big)^{-1}\hat{Z}^T_{n+1}. \qquad (B.7)$$
Bringing Eq. (B.5) into Eq. (B.7) leads to
$$C_{n+1} = F\sqrt{D^p}\, U D U^T \sqrt{D^p}\, F^T = F\sqrt{D^p}\, U \sqrt{D}\big(\sqrt{D^p}\big)^{-1} F^T\, \hat{C}_{n+1}\, F\big(\sqrt{D^p}\big)^{-1}\sqrt{D}\, U^T \sqrt{D^p}\, F^T = A\hat{Z}_{n+1}\hat{Z}^T_{n+1}A^T. \qquad (B.8)$$
Therefore, the ensemble adjustment Kalman filter delivers the same covariance matrix as the stochastic ensemble Kalman filter, although the covariance matrix is not explicitly required in either algorithm.

Appendix B.2. Ensemble Transform Kalman Inversion
Following the ensemble transform Kalman filter proposed in [29], the analysis step updates the particles deterministically with a post-multiplier $T$,
$$Z_{n+1} = \hat{Z}_{n+1} T. \qquad (B.9)$$
Here $T = P(\Gamma + I)^{-1/2}$, with
$$\mathrm{SVD}: \hat{Y}^T_{n+1}\Sigma_\nu^{-1}\hat{Y}_{n+1} = P\Gamma P^T, \qquad (B.10)$$
where $\hat{Y}_{n+1}$ is defined in Eq. (B.6). Following Eq. (B.7), the ensemble covariance matrix satisfies
$$C_{n+1} = \hat{Z}_{n+1}\big(I + \hat{Y}^T_{n+1}\Sigma_\nu^{-1}\hat{Y}_{n+1}\big)^{-1}\hat{Z}^T_{n+1} = \hat{Z}_{n+1}\big(I + P\Gamma P^T\big)^{-1}\hat{Z}^T_{n+1} = \hat{Z}_{n+1}P\big(I + \Gamma\big)^{-1}P^T\hat{Z}^T_{n+1} = \hat{Z}_{n+1}TT^T\hat{Z}^T_{n+1}. \qquad (B.11)$$
Therefore, the ensemble transform Kalman filter delivers the same covariance matrix as the stochastic ensemble Kalman filter, although the covariance matrix is not explicitly required in either algorithm.
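A small numerical check of the identity (B.11), with random stand-in factors (all names and sizes here are assumptions for illustration), can be written as follows:

```python
import numpy as np

# Verify Eq. (B.11): with T = P (Gamma + I)^{-1/2}, (Z T)(Z T)^T equals
# Z (I + Y^T Sigma_nu^{-1} Y)^{-1} Z^T.
rng = np.random.default_rng(1)
N_theta, N_y, J = 8, 6, 5
Zh = rng.standard_normal((N_theta, J))
Yh = rng.standard_normal((N_y, J))
Sigma_nu = np.eye(N_y)

Gam, P = np.linalg.eigh(Yh.T @ np.linalg.solve(Sigma_nu, Yh))
T = P / np.sqrt(Gam + 1.0)                       # post-multiplier, Eq. (B.10)
Z_new = Zh @ T                                   # Eq. (B.9)
C_direct = Zh @ np.linalg.inv(np.eye(J) + Yh.T @ np.linalg.solve(Sigma_nu, Yh)) @ Zh.T
print(np.allclose(Z_new @ Z_new.T, C_direct))    # True
```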
Remark 6. The original ETKF is biased, since $Z_{n+1}\mathbb{1} \ne 0$ ($m_{n+1}$ is not the mean of $\{\theta^j_{n+1}\}_j$). An unbiased ETKF fix is introduced in [51] by defining a symmetric post-multiplier $T = P(\Gamma + I)^{-1/2}P^T$. Since
$$TT^T = \big(I + \hat{Y}^T_{n+1}\Sigma_\nu^{-1}\hat{Y}_{n+1}\big)^{-1} = I - \hat{Y}^T_{n+1}\big(\hat{Y}_{n+1}\hat{Y}^T_{n+1} + \Sigma_\nu\big)^{-1}\hat{Y}_{n+1}, \qquad (B.12)$$
$\mathbb{1}$ is an eigenvector of $T$, and therefore
$$Z_{n+1}\mathbb{1} = \hat{Z}_{n+1}T\mathbb{1} \propto \hat{Z}_{n+1}\mathbb{1} = 0. \qquad (B.13)$$

References

[1] Daniel Z Huang, Tapio Schneider, and Andrew M Stuart. Unscented Kalman inversion. arXiv preprint arXiv:2102.01580, 2021.

[2] Mrinal K Sen and Paul L Stoffa.
Global optimization methods in geophysical inversion. Cambridge University Press, 2013.

[3] Tapio Schneider, Shiwei Lan, Andrew Stuart, and Joao Teixeira. Earth system modeling 2.0: A blueprint for models that learn from observations and targeted high-resolution simulations. Geophysical Research Letters, 44(24):12–396, 2017.

[4] Oliver RA Dunbar, Alfredo Garbuno-Inigo, Tapio Schneider, and Andrew M Stuart. Calibration and uncertainty quantification of convective parameters in an idealized GCM. arXiv preprint arXiv:2012.13262, 2020.

[5] Daniel Z Huang, Kailai Xu, Charbel Farhat, and Eric Darve. Learning constitutive relations from indirect observations using deep neural networks. Journal of Computational Physics, page 109491, 2020.

[6] Kailai Xu, Daniel Z Huang, and Eric Darve. Learning constitutive relations using symmetric positive definite neural networks. Journal of Computational Physics, 428:110072, 2021.

[7] Philip Avery, Daniel Z Huang, Wanli He, Johanna Ehlers, Armen Derkevorkian, and Charbel Farhat. A computationally tractable framework for nonlinear dynamic multiscale modeling of membrane fabric. arXiv preprint arXiv:2007.05877, 2020.

[8] Brian H Russell. Introduction to seismic inversion methods. SEG Books, 1988.

[9] Carey Bunks, Fatimetou M Saleck, S Zaleski, and G Chavent. Multiscale seismic waveform inversion. Geophysics, 60(5):1457–1473, 1995.

[10] Alexander V Goncharsky and Sergey Y Romanov. Iterative methods for solving coefficient inverse problems of wave tomography in models with attenuation. Inverse Problems, 33(2):025003, 2017.

[11] Janne Hakkarainen, Zenith Purisha, Antti Solonen, and Samuli Siltanen. Undersampled dynamic X-ray tomography with dimension reduction Kalman filter. IEEE Transactions on Computational Imaging, 5(3):492–501, 2019.

[12] Eric A Wan and Rudolph Van Der Merwe. The unscented Kalman filter for nonlinear estimation. In Proceedings of the IEEE 2000 Adaptive Systems for Signal Processing, Communications, and Control Symposium (Cat. No. 00EX373), pages 153–158. IEEE, 2000.

[13] Rudolph Emil Kalman. A new approach to linear filtering and prediction problems. J. Basic Eng. Mar, 82(1):35–45, 1960.

[14] Harold Wayne Sorenson. Kalman filtering: theory and application. IEEE, 1985.

[15] Geir Evensen. Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. Journal of Geophysical Research: Oceans, 99(C5):10143–10162, 1994.

[16] Simon J Julier, Jeffrey K Uhlmann, and Hugh F Durrant-Whyte. A new approach for filtering nonlinear systems. In Proceedings of 1995 American Control Conference-ACC'95, volume 3, pages 1628–1632. IEEE, 1995.

[17] Daniel Z Huang, P-O Persson, and Matthew J Zahr. High-order, linearly stable, partitioned solvers for general multiphysics problems based on implicit–explicit Runge–Kutta schemes. Computer Methods in Applied Mechanics and Engineering, 346:674–706, 2019.

[18] Daniel Z Huang, Will Pazner, Per-Olof Persson, and Matthew J Zahr. High-order partitioned spectral deferred correction solvers for multiphysics problems. Journal of Computational Physics, page 109441, 2020.

[19] Zhengyu Huang, Philip Avery, Charbel Farhat, Jason Rabinovitch, Armen Derkevorkian, and Lee D Peterson. Simulation of parachute inflation dynamics using an Eulerian computational framework for fluid-structure interfaces evolving in high-speed turbulent flows. In , page 1540, 2018.

[20] Daniel Z Huang, Philip Avery, Charbel Farhat, Jason Rabinovitch, Armen Derkevorkian, and Lee D Peterson. Modeling, simulation and validation of supersonic parachute inflation dynamics during Mars landing. In AIAA Scitech 2020 Forum, page 0313, 2020.

[21] Alistair Adcroft, Whit Anderson, V Balaji, Chris Blanton, Mitchell Bushuk, Carolina O Dufour, John P Dunne, Stephen M Griffies, Robert Hallberg, Matthew J Harrison, et al. The GFDL global ocean and sea ice model OM4.0: Model description and simulation features. Journal of Advances in Modeling Earth Systems, 11(10):3167–3211, 2019.

[22] Charles S Peskin. Numerical analysis of blood flow in the heart. Journal of Computational Physics, 25(3):220–252, 1977.

[23] Marsha Berger and Michael Aftosmis. Progress towards a Cartesian cut-cell method for viscous compressible flow. In , page 1301, 2012.

[24] Daniel Z Huang, Dante De Santis, and Charbel Farhat. A family of position- and orientation-independent embedded boundary methods for viscous flow and fluid–structure interaction problems. Journal of Computational Physics, 365:74–104, 2018.

[25] Daniel Z Huang, Philip Avery, and Charbel Farhat. An embedded boundary approach for resolving the contribution of cable subsystems to fully coupled fluid-structure interaction. International Journal for Numerical Methods in Engineering, 2020.

[26] Marsha J Berger, Phillip Colella, et al. Local adaptive mesh refinement for shock hydrodynamics. Journal of Computational Physics, 82(1):64–84, 1989.

[27] Raunak Borker, Daniel Huang, Sebastian Grimberg, Charbel Farhat, Philip Avery, and Jason Rabinovitch. Mesh adaptation framework for embedded boundary methods for computational fluid dynamics and fluid-structure interaction. International Journal for Numerical Methods in Fluids, 90(8):389–424, 2019.

[28] Jeffrey L Anderson. An ensemble adjustment Kalman filter for data assimilation. Monthly Weather Review, 129(12):2884–2903, 2001.

[29] Craig H Bishop, Brian J Etherton, and Sharanya J Majumdar. Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. Monthly Weather Review, 129(3):420–436, 2001.

[30] Ienkaran Arasaratnam and Simon Haykin. Cubature Kalman filters. IEEE Transactions on Automatic Control, 54(6):1254–1269, 2009.

[31] Simon J Julier and Jeffrey K Uhlmann. New extension of the Kalman filter to nonlinear systems. In Signal Processing, Sensor Fusion, and Target Recognition VI, volume 3068, pages 182–193. International Society for Optics and Photonics, 1997.

[32] Sharad Singhal and Lance Wu. Training multilayer perceptrons with the extended Kalman algorithm. In Advances in Neural Information Processing Systems, pages 133–140, 1989.

[33] Gintaras V Puskorius and Lee A Feldkamp. Decoupled extended Kalman filter training of feedforward layered networks. In IJCNN-91-Seattle International Joint Conference on Neural Networks, volume 1, pages 771–777. IEEE, 1991.

[34] Nikola B Kovachki and Andrew M Stuart. Ensemble Kalman inversion: a derivative-free technique for machine learning tasks. Inverse Problems, 35(9):095005, 2019.

[35] Dean S Oliver, Albert C Reynolds, and Ning Liu. Inverse theory for petroleum reservoir characterization and history matching. Cambridge University Press, 2008.

[36] Yan Chen and Dean S Oliver. Ensemble randomized maximum likelihood method as an iterative ensemble smoother. Mathematical Geosciences, 44(1):1–26, 2012.

[37] Alexandre A Emerick and Albert C Reynolds. Investigation of the sampling performance of ensemble-based methods with a simple reservoir model. Computational Geosciences, 17(2):325–350, 2013.

[38] Eric A Wan and Alex T Nelson. Neural dual extended Kalman filtering: applications in speech enhancement and monaural blind signal separation. In Neural Networks for Signal Processing VII. Proceedings of the 1997 IEEE Signal Processing Society Workshop, pages 466–475. IEEE, 1997.

[39] Alexander G Parlos, Sunil K Menon, and A Atiya. An algorithmic approach to adaptive state filtering using recurrent neural networks. IEEE Transactions on Neural Networks, 12(6):1411–1432, 2001.

[40] JH Gove and DY Hollinger. Application of a dual unscented Kalman filter for simultaneous state and parameter estimation in problems of surface-atmosphere exchange. Journal of Geophysical Research: Atmospheres, 111(D8), 2006.

[41] David J Albers, Matthew Levine, Bruce Gluckman, Henry Ginsberg, George Hripcsak, and Lena Mamykina. Personalized glucose forecasting for type 2 diabetes using data assimilation. PLoS Computational Biology, 13(4):e1005232, 2017.

[42] Marco A Iglesias, Kody JH Law, and Andrew M Stuart. Ensemble Kalman methods for inverse problems. Inverse Problems, 29(4):045001, 2013.

[43] Claudia Schillings and Andrew M Stuart. Analysis of the ensemble Kalman filter for inverse problems. SIAM Journal on Numerical Analysis, 55(3):1264–1290, 2017.

[44] Claudia Schillings and Andrew M Stuart. Convergence analysis of ensemble Kalman inversion: the linear, noisy case. Applicable Analysis, 97(1):107–123, 2018.

[45] David J Albers, Paul-Adrien Blancquart, Matthew E Levine, Elnaz Esmaeilzadeh Seylabi, and Andrew Stuart. Ensemble Kalman methods with constraints. Inverse Problems, 35(9):095007, 2019.

[46] Neil K Chada, Andrew M Stuart, and Xin T Tong. Tikhonov regularization within ensemble Kalman inversion. SIAM Journal on Numerical Analysis, 58(2):1263–1294, 2020.

[47] Alexey Kaplan, Yochanan Kushnir, Mark A Cane, and M Benno Blumenthal. Reduced space optimal analysis for historical data sets: 136 years of Atlantic sea surface temperatures. Journal of Geophysical Research: Oceans, 102(C13):27835–27860, 1997.

[48] P Brasseur, J Ballabrera-Poy, and J Verron. Assimilation of altimetric data in the mid-latitude oceans using the singular evolutive extended Kalman filter with an eddy-resolving, primitive equation model. Journal of Marine Systems, 22(4):269–294, 1999.

[49] L Gourdeau, J Verron, Thierry Delcroix, AJ Busalacchi, and R Murtugudde. Assimilation of TOPEX/Poseidon altimetric data in a primitive equation model of the tropical Pacific ocean during the 1992–1996 El Niño-Southern Oscillation period. Journal of Geophysical Research: Oceans, 105(C4):8473–8488, 2000.

[50] Michael K Tippett, Jeffrey L Anderson, Craig H Bishop, Thomas M Hamill, and Jeffrey S Whitaker. Ensemble square root filters. Monthly Weather Review, 131(7):1485–1490, 2003.

[51] Xuguang Wang and Craig H Bishop. A comparison of breeding and ensemble transform Kalman filter ensemble forecast schemes. Journal of the Atmospheric Sciences, 60(9):1140–1158, 2003.

[52] Rudolph Van Der Merwe and Eric A Wan. The square-root unscented Kalman filter for state and parameter-estimation. In , volume 6, pages 3461–3464. IEEE, 2001.

[53] TD Robinson, MS Eldred, KE Willcox, and R Haimes. Surrogate-based optimization using multifidelity models with variable parameterization and corrected space mapping. AIAA Journal, 46(11):2814–2822, 2008.

[54] Timothy MacDonald and Juan J Alonso. Multi-fidelity wing optimization utilizing 2D to 3D pressure distribution mapping in transonic conditions. In AIAA Scitech 2020 Forum, page 1295, 2020.

[55] Kevin Carlberg, Charbel Farhat, Julien Cortial, and David Amsallem. The GNAT method for nonlinear model reduction: effective implementation and application to computational fluid dynamics and turbulent flows. Journal of Computational Physics, 242:623–647, 2013.

[56] Elizabeth Qian, Boris Kramer, Benjamin Peherstorfer, and Karen Willcox. Lift & Learn: Physics-informed machine learning for large-scale nonlinear dynamical systems. Physica D: Nonlinear Phenomena, 406:132401, 2020.

[57] Sebastian Grimberg, Charbel Farhat, Radek Tezaur, and Charbel Bou-Mosleh. Mesh sampling and weighting for the hyperreduction of nonlinear Petrov-Galerkin reduced-order models with local reduced-order bases. arXiv preprint arXiv:2008.02891, 2020.

[58] Ilias Bilionis and Nicholas Zabaras. Multi-output local Gaussian process regression: Applications to uncertainty quantification. Journal of Computational Physics, 231(17):5718–5746, 2012.

[59] Emmet Cleary, Alfredo Garbuno-Inigo, Shiwei Lan, Tapio Schneider, and Andrew M Stuart. Calibrate, emulate, sample. arXiv preprint arXiv:2001.03689, 2020.

[60] Rohit K Tripathy and Ilias Bilionis. Deep UQ: Learning deep neural network surrogate models for high dimensional uncertainty quantification. Journal of Computational Physics, 375:565–588, 2018.

[61] Yuwei Fan and Lexing Ying. Solving inverse wave scattering with deep learning. arXiv preprint arXiv:1911.13202, 2019.

[62] Nicholas H Nelsen and Andrew M Stuart. The random feature model for input-output maps between Banach spaces. arXiv preprint arXiv:2005.10224, 2020.

[63] Sebastian Reich and Colin Cotter. Probabilistic forecasting and Bayesian data assimilation. Cambridge University Press, 2015.

[64] Kody Law, Andrew Stuart, and Kostas Zygalakis. Data assimilation. Cham, Switzerland: Springer, 2015.

[65] Peter Jan van Leeuwen. A variance-minimizing filter for large-scale applications. Monthly Weather Review, 131(9):2071–2084, 2003.

[66] Dean S Oliver and Yan Chen. Recent progress on reservoir history matching: a review. Computational Geosciences, 15(1):185–221, 2011.

[67] David M Livings, Sarah L Dance, and Nancy K Nichols. Unbiased ensemble square root filters. Physica D: Nonlinear Phenomena, 237(8):1021–1028, 2008.

[68] Malcolm Sambridge, Peter Rickwood, Nicholas Rawlinson, and Silvano Sommacal. Automatic differentiation in geophysical inverse problems. Geophysical Journal International, 170(1):1–8, 2007.

[69] Peng Chen, Umberto Villa, and Omar Ghattas. Taylor approximation for PDE-constrained optimization under uncertainty: Application to turbulent jet flow. PAMM, 18(1):e201800466, 2018.

[70] Isaac M Held. Pseudomomentum and the orthogonality of modes in shear flows. Journal of the Atmospheric Sciences, 42(21):2280–2288, 1985.

[71] Isaac M Held and Peter J Phillips. Linear and nonlinear barotropic decay on the sphere. Journal of the Atmospheric Sciences, 44(1):200–207, 1987.

[72] Isaac M Held and Max J Suarez. A proposal for the intercomparison of the dynamical cores of atmospheric general circulation models.