Improve Unscented Kalman Inversion With Low-Rank Approximation and Reduced-Order Model
Daniel Z. Huang^a, Jiaoyang Huang^b
^a California Institute of Technology, Pasadena, CA
^b New York University, New York, NY
Abstract
The unscented Kalman inversion (UKI) presented in [1] is a general derivative-free approach to solving inverse problems. The UKI is particularly suitable for inverse problems where the forward model is given as a black box and may not be differentiable. The regularization strategy and convergence properties of the UKI have been studied thoroughly, and the method has been demonstrated to handle noisy observation data and chaotic inverse problems effectively. In this paper, we aim to make the UKI more efficient, in terms of computational and memory costs, for large-scale inverse problems. We take advantage of the low-rank covariance structure to reduce both the number of forward problem evaluations and the memory cost related to the need to propagate large covariance matrices. And we leverage reduced-order model techniques to further speed up these forward evaluations. The effectiveness of the enhanced UKI is demonstrated on a barotropic model inverse problem with $O(10^5)$ unknown parameters and a 3D general circulation model (GCM) inverse problem, where each iteration is as efficient as that of gradient-based optimization methods.
Keywords: Inverse Problem, Optimization, Kalman Filter, Low-Rank Approximation, Reduced-Order Model
1. Introduction
Inverse problems are ubiquitous in engineering and scientific applications. These include, to name only a few, global climate model calibration [2, 3, 4], material constitutive relation calibration [5, 6, 7], seismic inversion in geophysics [8, 9], and medical tomography [10, 11]. The associated forward problems, which may feature multiple scales or include chaotic and turbulent phenomena, can be very expensive. Moreover, the observational data is often noisy, and the inverse problem may involve a large number of unknown parameters to recover and may be ill-posed.

Inverse problems can be formulated as recovering unknown parameters (or states) $\theta \in \mathbb{R}^{N_\theta}$ from the noisy observation $y \in \mathbb{R}^{N_y}$, as follows,
$$y = \mathcal{G}(\theta) + \eta, \qquad (1)$$
where $\mathcal{G}$ denotes a mapping between the parameter space and the observation space, and $\eta \sim \mathcal{N}(0, \Sigma_\eta)$ denotes the observational error. Central to both the optimization and probabilistic approaches to inversion is the regularized objective function $\Phi_R(\theta)$ defined by
$$\Phi_R(\theta) := \Phi(\theta) + \frac{1}{2}\big\lVert \Sigma_0^{-\frac{1}{2}}(\theta - r_0)\big\rVert^2, \qquad (2a)$$
$$\Phi(\theta) := \frac{1}{2}\big\lVert \Sigma_\eta^{-\frac{1}{2}}\big(y - \mathcal{G}(\theta)\big)\big\rVert^2, \qquad (2b)$$
where $r_0$ is a prior mean, and $\Sigma_\eta \succ 0$ and $\Sigma_0 \succ 0$ are the observation error covariance and a regularization matrix, respectively.

The focus of this paper is mainly on the unscented Kalman inversion [12, 1], a version of the Kalman inversion method. The Kalman inversion methodology pairs the parameter-to-data map $\mathcal{G}$ with a stochastic dynamical system for the parameter and then employs techniques from Kalman filtering [13, 14, 15, 16] to estimate the parameter $\theta$ given the data $y$. When unscented Kalman filtering is applied, this leads to the UKI, a derivative-free method that solves the inverse problem by minimizing $\Phi_R$. For inverse problems in general, the UKI is attractive because it is derivative-free and hence introduces significant flexibility in the forward solver design. Therefore, the UKI is suitable for complex multiphysics problems requiring the coupling of different solvers [17, 18, 19, 20, 21], where forward models can be treated as black boxes, and for methods containing discontinuities (e.g., immersed/embedded boundary methods [22, 23, 24, 25] and adaptive mesh refinement [26, 27]). Moreover, the Tikhonov-type regularization embedded in the stochastic dynamics [1] enables the UKI to handle noisy observations effectively; the Levenberg–Marquardt connection [1] enables the UKI to achieve linear (sometimes superlinear) convergence in practice; and the smoothing property [1] introduced by the unscented transform enables the UKI to handle chaotic inverse problems effectively. Therefore, the UKI is a general approach to solving inverse problems.

However, at each iteration, the UKI requires evaluating the forward problem $2N_\theta + 1$ times and propagating large $N_\theta \times N_\theta$ covariance matrices. Although these forward problem evaluations are embarrassingly parallel, the computational or memory costs can be intractable when $N_\theta$ is large or the forward problem evaluation is expensive. The present work focuses on alleviating the computational and memory costs of the UKI.

The Kalman filter [13] and its variants, including but not limited to the extended Kalman filter [14], the ensemble Kalman filter [15, 28, 29], the unscented Kalman filter [16, 12], and the cubature Kalman filter [30], were developed to sequentially update the probability distribution of states in partially observed dynamics.
Most Kalman filters use a Gaussian ansatz to formulate the Kalman gain, assimilate the observation, and update the distribution. The unscented Kalman filter represents the mean and covariance of the Gaussian distribution directly. A quadrature rule, the unscented transform, designed specifically for Gaussian distributions and based on $2N_\theta + 1$ deterministic $\sigma$-points [16, 31, 12], is applied to propagate the state mean and covariance. Besides state estimation, Kalman methods are used for parameter inversion, which brings about the extended Kalman inversion (ExKI), the ensemble Kalman inversion (EKI), and the unscented Kalman inversion (UKI). Extended, ensemble, and unscented Kalman inversions have been applied to train neural networks [32, 33, 12, 34], and the EKI has been applied in the oil industry [35, 36, 37]. Dual and joint Kalman filters [38, 12] have been designed to simultaneously estimate the unknown states and the parameters [38, 39, 12, 40, 41] from noisy sequential observations. The EKI has been systematically developed and analyzed as a general purpose methodology [42, 43, 44, 45, 46] for the solution of inverse and parameter estimation problems, as has the UKI [1]. However, unscented Kalman filtering and inversion generally require $2N_\theta + 1$ forward problem evaluations and the storage of an $N_\theta \times N_\theta$ covariance matrix; therefore, they are (mistakenly) supposed to be unsuitable for high-dimensional data-assimilation/inverse problems.

Since many physical phenomena or systems feature large-scale structure or low-dimensional attractors, the model error covariance matrices are generally low-rank. The low-rank covariance structures are leveraged in ensemble based Kalman filters, which leads to reduced space Kalman filters [47, 48, 49] and square root filters (e.g., the ensemble adjustment Kalman filter [28] and the ensemble transform Kalman filter [29, 50, 51]). These square root filters have been applied successfully in weather forecasting. Inspired by these square root filters [28, 29, 52], we introduce the reparameterization strategy and the low-rank square root form of the UKI, namely the truncated unscented Kalman inversion (TUKI), where the inverse problems are solved in a vastly reduced subspace. This reduces the computational and memory costs from $O(N_\theta)$ forward evaluations and $O(N_\theta^2)$ storage to $O(N_r)$ forward evaluations and $O(N_r N_\theta)$ storage, where $N_r$ is the rank of the covariance matrix. It has also been pointed out in [28] that "filter methods are unlikely to work without making use of such information about the covariance structure."

Moreover, the $2N_\theta + 1$ deterministic $\sigma$-points in the unscented transform that represent the Gaussian distribution consist of 1 point at the mean and $2N_\theta$ points distributed symmetrically around the mean. The forward problem evaluation at the mean gives a first-order approximation of the transformed mean, and the other $2N_\theta$ forward evaluations give a second-order approximation of the transformed covariance. They correspond to $\mathcal{G}(\theta)$ and $d\mathcal{G}(\theta)$ in minimizing Eq. (2), respectively. We observe that the covariance evaluation does not need to be very accurate, and this gives room for another level of speedup. Reduced-order models, including but not limited to low-fidelity/low-resolution models [53, 54], projection based reduced-order models (PROM) [55, 56, 57], Gaussian process based surrogate models [58, 59], and neural network based surrogate models [60, 61, 62], can be applied to speed up these $2N_\theta$ forward evaluations.
Our main contribution is the development of several speedup strategies for the UKI:

• We take advantage of the low-rank covariance structure by reparameterization or by reformulating the UKI in a low-rank square root form. These strategies reduce the number of forward problem evaluations from $O(N_\theta)$ to $O(N_r)$ and the memory cost from $O(N_\theta^2)$ to $O(N_r N_\theta)$.

• We further speed up forward problem evaluations by using reduced-order models. The computational cost of one iteration is reduced from $2N_\theta + 1$ (or $2N_r + 1$) expensive forward evaluations to a single expensive forward evaluation and $2N_\theta$ (or $2N_r$) cheap forward evaluations.

• The effectiveness of the enhanced unscented Kalman inversion is demonstrated on a barotropic model inverse problem with $O(10^5)$ unknown parameters and a 3D general circulation model inverse problem, where each iteration is as efficient as that of gradient-based optimization methods. (Generally, gradient-based optimization methods require one forward problem evaluation and one equally expensive backward propagation per iteration.)

The remainder of the paper is organized as follows. In Section 2, an overview of the unscented Kalman inversion is provided. In Section 3, speedup strategies leveraging the low-rank covariance structure are introduced. In Section 4, speedup strategies leveraging reduced-order models are introduced. Numerical applications that demonstrate the efficiency of the enhanced UKI are provided in Section 5.

2. Unscented Kalman Inversion (UKI)

The UKI pairs the parameter-to-data map $\mathcal{G}$ with a stochastic dynamical system for the parameter, defined as follows,
$$\text{evolution:} \quad \theta_{n+1} = r + \alpha(\theta_n - r) + \omega_{n+1}, \quad \omega_{n+1} \sim \mathcal{N}(0, \Sigma_\omega), \qquad (3a)$$
$$\text{observation:} \quad y_{n+1} = \mathcal{G}(\theta_{n+1}) + \nu_{n+1}, \quad \nu_{n+1} \sim \mathcal{N}(0, \Sigma_\nu), \qquad (3b)$$
where $\theta_{n+1}$ is the unknown state vector and $y_{n+1}$ is the observation; the artificial evolution error $\omega_{n+1}$ and the artificial observation error $\nu_{n+1}$ are mutually independent, zero-mean Gaussian sequences with covariances $\Sigma_\omega$ and $\Sigma_\nu$, respectively.
$\alpha \in (0, 1]$ is the regularization parameter, and $r$ is an arbitrary vector.

Denote by $Y_n := \{y_1, y_2, \cdots, y_n\}$ the observation set at time $n$. Techniques from filtering are employed to approximate the distribution $\mu_n$ of $\theta_n | Y_n$. The iterative algorithm starts from $\mu_0$ and updates $\mu_n$ through the prediction and analysis steps [63, 64]: $\mu_n \mapsto \hat{\mu}_{n+1}$, and then $\hat{\mu}_{n+1} \mapsto \mu_{n+1}$, where $\hat{\mu}_{n+1}$ is the distribution of $\theta_{n+1} | Y_n$.

In the prediction step, we assume that $\mu_n \approx \mathcal{N}(m_n, C_n)$; then, under (3a), $\hat{\mu}_{n+1}$ is also Gaussian with mean and covariance
$$\hat{m}_{n+1} = \mathbb{E}[\theta_{n+1}|Y_n] = \alpha m_n + (1-\alpha) r, \quad \hat{C}_{n+1} = \mathrm{Cov}[\theta_{n+1}|Y_n] = \alpha^2 C_n + \Sigma_\omega. \qquad (4)$$
In the analysis step, we assume that the joint distribution of $\{\theta_{n+1}, y_{n+1}\}|Y_n$ can be approximated by a Gaussian distribution
$$\mathcal{N}\Big(\begin{bmatrix} \hat{m}_{n+1} \\ \hat{y}_{n+1} \end{bmatrix}, \begin{bmatrix} \hat{C}_{n+1} & \hat{C}^{\theta p}_{n+1} \\ \hat{C}^{\theta p\,T}_{n+1} & \hat{C}^{pp}_{n+1} \end{bmatrix}\Big), \qquad (5)$$
where
$$\hat{y}_{n+1} = \mathbb{E}[\mathcal{G}(\theta_{n+1})|Y_n], \quad \hat{C}^{\theta p}_{n+1} = \mathrm{Cov}[\theta_{n+1}, \mathcal{G}(\theta_{n+1})|Y_n], \quad \hat{C}^{pp}_{n+1} = \mathrm{Cov}[\mathcal{G}(\theta_{n+1})|Y_n] + \Sigma_\nu. \qquad (6)$$
Conditioning the Gaussian in (5) to find $\theta_{n+1}|\{Y_n, y_{n+1}\} = \theta_{n+1}|Y_{n+1}$ gives the following expressions for the mean $m_{n+1}$ and covariance $C_{n+1}$ of the approximation to $\mu_{n+1}$:
$$m_{n+1} = \hat{m}_{n+1} + \hat{C}^{\theta p}_{n+1}\big(\hat{C}^{pp}_{n+1}\big)^{-1}(y_{n+1} - \hat{y}_{n+1}), \quad C_{n+1} = \hat{C}_{n+1} - \hat{C}^{\theta p}_{n+1}\big(\hat{C}^{pp}_{n+1}\big)^{-1}\hat{C}^{\theta p\,T}_{n+1}. \qquad (7)$$
By assuming all observations $\{y_n\}$ are identical to $y$ ($Y_n = y$), Equations (4) to (7) establish a conceptual description of the Kalman inversion for solving the inverse problem (1). The UKI uses the following unscented transform to evaluate Eq. (6).

Definition 1 (Modified Unscented Transform [1]). Let $\theta \sim \mathcal{N}(m, C) \in \mathbb{R}^{N_\theta}$ denote a Gaussian random variable. $2N_\theta + 1$ symmetric $\sigma$-points are chosen deterministically:
$$\theta^0 = m, \quad \theta^j = m + c_j [\sqrt{C}]_j, \quad \theta^{j+N_\theta} = m - c_j [\sqrt{C}]_j \quad (1 \le j \le N_\theta),$$
where $[\sqrt{C}]_j$ is the $j$th column of the Cholesky factor of $C$. The quadrature rule approximates the mean and covariance of the transformed variable $\mathcal{G}_i(\theta)$ as follows,
$$\mathbb{E}[\mathcal{G}_i(\theta)] \approx \mathcal{G}_i(\theta^0) = \mathcal{G}_i(m), \quad \mathrm{Cov}[\mathcal{G}_1(\theta), \mathcal{G}_2(\theta)] \approx \sum_{j=1}^{2N_\theta} W_j^c \big(\mathcal{G}_1(\theta^j) - \mathbb{E}\mathcal{G}_1(\theta)\big)\big(\mathcal{G}_2(\theta^j) - \mathbb{E}\mathcal{G}_2(\theta)\big)^T. \qquad (8)$$
Here the constant weights are
$$c_1 = c_2 = \cdots = c_{N_\theta} = \sqrt{N_\theta + \lambda}, \quad W_1^c = W_2^c = \cdots = W_{2N_\theta}^c = \frac{1}{2(N_\theta + \lambda)}, \quad \lambda = a^2(N_\theta + \kappa) - N_\theta, \quad \kappa = 0, \quad a = \min\Big\{\sqrt{\frac{4}{N_\theta + \kappa}},\, 1\Big\}.$$

Consider the algorithm defined by Eqs. (4) to (7). By utilizing the aforementioned quadrature rule, we obtain the following UKI algorithm:

• Prediction step:
$$\hat{m}_{n+1} = \alpha m_n + (1-\alpha) r, \quad \hat{C}_{n+1} = \alpha^2 C_n + \Sigma_\omega. \qquad (9)$$

• Generate $\sigma$-points:
$$\hat{\theta}^0_{n+1} = \hat{m}_{n+1}, \quad \hat{\theta}^j_{n+1} = \hat{m}_{n+1} + c_j \big[\sqrt{\hat{C}_{n+1}}\big]_j, \quad \hat{\theta}^{j+N_\theta}_{n+1} = \hat{m}_{n+1} - c_j \big[\sqrt{\hat{C}_{n+1}}\big]_j \quad (1 \le j \le N_\theta).$$
• Analysis step:
$$\hat{y}^j_{n+1} = \mathcal{G}(\hat{\theta}^j_{n+1}), \quad \hat{y}_{n+1} = \hat{y}^0_{n+1}, \qquad (10a)$$
$$\hat{C}^{\theta p}_{n+1} = \sum_{j=1}^{2N_\theta} W_j^c (\hat{\theta}^j_{n+1} - \hat{m}_{n+1})(\hat{y}^j_{n+1} - \hat{y}_{n+1})^T, \qquad (10b)$$
$$\hat{C}^{pp}_{n+1} = \sum_{j=1}^{2N_\theta} W_j^c (\hat{y}^j_{n+1} - \hat{y}_{n+1})(\hat{y}^j_{n+1} - \hat{y}_{n+1})^T + \Sigma_\nu, \qquad (10c)$$
$$m_{n+1} = \hat{m}_{n+1} + \hat{C}^{\theta p}_{n+1}\big(\hat{C}^{pp}_{n+1}\big)^{-1}(y - \hat{y}_{n+1}), \qquad (10d)$$
$$C_{n+1} = \hat{C}_{n+1} - \hat{C}^{\theta p}_{n+1}\big(\hat{C}^{pp}_{n+1}\big)^{-1}\hat{C}^{\theta p\,T}_{n+1}. \qquad (10e)$$

Following [1], the hyperparameters of the stochastic dynamical system are
$$r = r_0, \quad \Sigma_\omega = (2 - \alpha^2)\Lambda, \quad \Sigma_\nu = 2\Sigma_\eta, \quad \text{and} \quad \alpha \in (0, 1], \qquad (11)$$
where $r_0$ is the prior mean and $\Lambda$ is a positive definite matrix. The initial distribution $\theta_0 \sim \mathcal{N}(m_0, C_0)$ satisfies $m_0 = r_0$ and $C_0 = \Lambda$.

The UKI [1], as a general derivative-free approach to solving inverse problems, is particularly suitable for inverse problems where the forward model is given as a black box and may not be differentiable. However, the computational cost of the $2N_\theta + 1$ forward model evaluations (10a) in the analysis step can be intractable, especially when the unknown parameter dimension is high or the forward problem evaluation is expensive; likewise, the memory cost of storing and propagating the covariance (10e) can be unaffordable when the unknown parameter dimension is high. In the present work, we focus on improving the UKI by leveraging the low-rank covariance structure and reduced-order model techniques.

3. Speed Up with Low-Rank Covariance Structure

In general, the UKI, as a particle method, requires evaluations at $2N_\theta + 1$ $\sigma$-points and the propagation of the $N_\theta \times N_\theta$ covariance matrix, which is impractical for typical geophysical applications (e.g., initial condition recovery problems) with dimensions of roughly $O(10^6)$–$O(10^9)$ [65, 66]. However, the ensemble Kalman filter [15, 28, 29] is successfully used in high-dimensional data-assimilation applications with ensemble sizes of $O(10^2)$; what is the catch? Since many observed geophysical phenomena have large-scale structure or low-dimensional attractors, the model error covariance can be well-approximated by a low-rank sample covariance. Moreover, these large-scale structures can be well represented by dominant frequency modes (e.g., dominant Fourier or Karhunen-Loève (KL) modes).

In this section, two approaches to leveraging the low-rank large-scale structure in the UKI framework are introduced:

• Reparameterization, namely projecting or reparameterizing the model parameters onto some vastly reduced subspace;

• Truncated unscented Kalman inversion, namely formulating the unscented Kalman filter as a low-rank square-root filter.

For both approaches, the number of $\sigma$-points is reduced from $2N_\theta + 1$ to $2N_r + 1$, where $N_r$ is the truncated rank, and the storage for the covariance matrix is reduced from $O(N_\theta^2)$ to $O(N_\theta N_r)$. Moreover, a discussion of the scenario where the low-rank covariance structure is unknown or does not exist is presented in Subsection 3.3.

3.1. Reparameterization

Let $r_0$ denote the prior mean. The discrepancy $\theta - r_0$ is assumed to be well-approximated in the linear space spanned by the basis $\{u_1, u_2, \cdots, u_{N_r}\}$. Hence, the unknown parameters can be reparameterized as follows,
$$\theta = r_0 + \sum_{i=1}^{N_r} \tau^{(i)} u_i.$$
The UKI [1] is then applied to solve for the vector $\tau = [\tau^{(1)}, \tau^{(2)}, \cdots, \tau^{(N_r)}]^T$, which has prior mean $0$. Let $m_n(\tau)$ and $C_n(\tau)$ denote the mean and covariance estimates of $\tau$; the mean and covariance estimates of $\theta$ can be recovered by
$$m_n(\theta) = r_0 + U m_n(\tau), \quad C_n(\theta) = U C_n(\tau) U^T, \qquad (12)$$
where $U = \begin{pmatrix} u_1 & u_2 & \cdots & u_{N_r} \end{pmatrix}$.
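The iteration (9)-(10) is straightforward to express in code. The following minimal numpy sketch performs one UKI step; the function and variable names (and the toy linear usage at the bottom) are our own illustrative assumptions, not the reference implementation linked in Section 5. Applying it to the reduced forward map $\widetilde{\mathcal{G}}(\tau) = \mathcal{G}(r_0 + U\tau)$ with prior mean $0$ realizes the reparameterized UKI of this subsection.

```python
import numpy as np

def uki_iteration(G, y, Sigma_nu, m, C, r, alpha, Sigma_omega):
    """One UKI iteration (prediction, sigma-points, analysis), Eqs. (9)-(10)."""
    N = len(m)
    # Prediction step, Eq. (9)
    m_hat = alpha * m + (1.0 - alpha) * r
    C_hat = alpha**2 * C + Sigma_omega

    # Modified unscented transform weights (Definition 1, kappa = 0)
    a = min(np.sqrt(4.0 / N), 1.0)
    lam = a**2 * N - N
    c, Wc = np.sqrt(N + lam), 1.0 / (2.0 * (N + lam))

    # Generate the 2N + 1 sigma-points from the Cholesky factor of C_hat
    L = np.linalg.cholesky(C_hat)
    thetas = [m_hat] + [m_hat + c * L[:, j] for j in range(N)] \
                     + [m_hat - c * L[:, j] for j in range(N)]

    # Analysis step, Eq. (10); the mean estimate uses only the 0th point
    ys = [G(t) for t in thetas]
    y_hat = ys[0]
    C_tp = sum(Wc * np.outer(thetas[j] - m_hat, ys[j] - y_hat)
               for j in range(1, 2 * N + 1))
    C_pp = sum(Wc * np.outer(ys[j] - y_hat, ys[j] - y_hat)
               for j in range(1, 2 * N + 1)) + Sigma_nu
    K = C_tp @ np.linalg.inv(C_pp)
    return m_hat + K @ (y - y_hat), C_hat - K @ C_tp.T

if __name__ == "__main__":
    # usage on a toy linear problem (all values illustrative)
    A = np.diag([1.0, 2.0, 3.0])
    y = A @ np.ones(3)
    m, C = np.zeros(3), np.eye(3)
    for _ in range(20):
        m, C = uki_iteration(lambda t: A @ t, y, 0.01 * np.eye(3),
                             m, C, np.zeros(3), 1.0, 0.01 * np.eye(3))
    print(m)  # approaches [1, 1, 1]
```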
Remark 1. Equation (12) also reveals the low-rank covariance structure of $C_n(\theta)$. The square root $Z_n$ of the covariance matrix $C_n(\theta)$ can be written as
$$Z_n = U\sqrt{C_n(\tau)}\, Q,$$
where $Q$ is an orthogonal matrix. Hence, all the square root matrices $\{Z_n\}$ share the same column vector space, namely the space spanned by the column vectors of $U$, which is prescribed in the parameterization.

In the present work, the square root of a matrix $C$ is defined to be a matrix $Z$ such that $C = ZZ^T$, which is inconsistent with its most common use in the mathematical literature. If $Z_1$ and $Z_2$ are two $n \times m$ square root matrices of $C$, then there exists an orthogonal matrix $Q$ such that $Z_1 = Z_2 Q$ [67].

3.2. Truncated Unscented Kalman Inversion (TUKI)

The truncated unscented transform, which is a low-rank approximation of the modified unscented transform (see Definition 1), is central to the TUKI:
Definition 2 (Truncated Unscented Transform). Let $\theta \sim \mathcal{N}(m, C) \in \mathbb{R}^{N_\theta}$ denote a Gaussian random variable, and assume the $N_r$-dimensional truncated singular value decomposition ($N_r$-TSVD) of $C$ is
$$C \approx \sum_{i=1}^{N_r} d_i u_i u_i^T,$$
where $\{u_i\}$ are left singular vectors and $d_1 \ge d_2 \ge \cdots \ge d_{N_r} \ge 0$ are the dominant singular values. $2N_r + 1$ symmetric $\sigma$-points are chosen deterministically:
$$\theta^0 = m, \quad \theta^j = m + c_j \sqrt{d_j}\, u_j, \quad \theta^{j+N_r} = m - c_j \sqrt{d_j}\, u_j \quad (1 \le j \le N_r).$$
The quadrature rule approximates the mean and covariance of the transformed variable $\mathcal{G}_i(\theta)$ as follows,
$$\mathbb{E}[\mathcal{G}_i(\theta)] \approx \mathcal{G}_i(\theta^0), \quad \mathrm{Cov}[\mathcal{G}_1(\theta), \mathcal{G}_2(\theta)] \approx \sum_{j=1}^{2N_r} W_j^c \big(\mathcal{G}_1(\theta^j) - \mathbb{E}\mathcal{G}_1(\theta)\big)\big(\mathcal{G}_2(\theta^j) - \mathbb{E}\mathcal{G}_2(\theta)\big)^T.$$
Here the constant weights are
$$c_1 = c_2 = \cdots = c_{N_r} = \sqrt{N_r + \lambda}, \quad W_1^c = W_2^c = \cdots = W_{2N_r}^c = \frac{1}{2(N_r + \lambda)}, \quad \lambda = a^2(N_r + \kappa) - N_r, \quad \kappa = 0, \quad a = \min\Big\{\sqrt{\frac{4}{N_r + \kappa}},\, 1\Big\}.$$

We choose the hyperparameter $\Lambda$ in Eq. (11) to be a low-rank matrix with square root $Z_0 \in \mathbb{R}^{N_\theta \times N_r}$, which satisfies
$$\Lambda = Z_0 Z_0^T \quad \text{and} \quad \mathrm{rank}(Z_0) = N_r. \qquad (13)$$
Let $Z_\omega = \sqrt{2 - \alpha^2}\, Z_0$ denote the square root matrix of the artificial evolution error covariance $\Sigma_\omega$ in Eq. (11). The iteration procedure of the UKI can be formulated in the square root form $\{Z_n\}$, where $C_n = Z_n Z_n^T$. This leads to the following truncated unscented Kalman inversion (TUKI):

• Prediction step:
$$\hat{m}_{n+1} = \alpha m_n + (1-\alpha) r, \quad N_r\text{-TSVD}: \big(\alpha Z_n \;\; Z_\omega\big) = \hat{U}_n \sqrt{\hat{D}_n}\, \hat{V}_n^T, \qquad (14)$$
where $\hat{U}_n = \begin{pmatrix} u_1 & u_2 & \cdots & u_{N_r} \end{pmatrix}$ and $\hat{D}_n = \mathrm{diag}\{d_1, d_2, \cdots, d_{N_r}\}$. Hence
$$\hat{C}_{n+1} = \alpha^2 C_n + \Sigma_\omega = \big(\alpha Z_n \;\; Z_\omega\big)\big(\alpha Z_n \;\; Z_\omega\big)^T = \hat{U}_n \hat{D}_n \hat{U}_n^T.$$

• Generate truncated $\sigma$-points:
$$\hat{\theta}^0_{n+1} = \hat{m}_{n+1}, \quad \hat{\theta}^j_{n+1} = \hat{m}_{n+1} + c_j \sqrt{d_j}\, u_j, \quad \hat{\theta}^{j+N_r}_{n+1} = \hat{m}_{n+1} - c_j \sqrt{d_j}\, u_j \quad (1 \le j \le N_r).$$

• Analysis step:
$$\hat{y}^j_{n+1} = \mathcal{G}(\hat{\theta}^j_{n+1}), \quad \hat{y}_{n+1} = \hat{y}^0_{n+1}, \qquad (15a)$$
$$\hat{Z}_{n+1} = \Big(\sqrt{W_1^c}(\hat{\theta}^1_{n+1} - \hat{m}_{n+1}) \;\; \sqrt{W_2^c}(\hat{\theta}^2_{n+1} - \hat{m}_{n+1}) \;\; \cdots \;\; \sqrt{W_{2N_r}^c}(\hat{\theta}^{2N_r}_{n+1} - \hat{m}_{n+1})\Big), \qquad (15b)$$
$$\hat{Y}_{n+1} = \Big(\sqrt{W_1^c}(\hat{y}^1_{n+1} - \hat{y}_{n+1}) \;\; \sqrt{W_2^c}(\hat{y}^2_{n+1} - \hat{y}_{n+1}) \;\; \cdots \;\; \sqrt{W_{2N_r}^c}(\hat{y}^{2N_r}_{n+1} - \hat{y}_{n+1})\Big), \qquad (15c)$$
$$\mathrm{SVD}: \hat{Y}_{n+1}^T \Sigma_\nu^{-1} \hat{Y}_{n+1} = \hat{P}_{n+1} \hat{\Gamma}_{n+1} \hat{P}_{n+1}^T, \qquad (15d)$$
$$m_{n+1} = \hat{m}_{n+1} + \hat{Z}_{n+1} \hat{P}_{n+1} (\hat{\Gamma}_{n+1} + I)^{-1} \hat{P}_{n+1}^T \hat{Y}_{n+1}^T \Sigma_\nu^{-1} (y - \hat{y}_{n+1}), \qquad (15e)$$
$$Z_{n+1} = \hat{Z}_{n+1} \hat{P}_{n+1} (\hat{\Gamma}_{n+1} + I)^{-1/2}. \qquad (15f)$$
Since the covariance matrices in Eq. (10) are
$$\hat{C}^{\theta p}_{n+1} = \hat{Z}_{n+1} \hat{Y}_{n+1}^T, \quad \hat{C}^{pp}_{n+1} = \hat{Y}_{n+1} \hat{Y}_{n+1}^T + \Sigma_\nu, \quad \text{and} \quad \hat{C}_{n+1} = \hat{Z}_{n+1} \hat{Z}_{n+1}^T,$$
applying the Sherman–Morrison–Woodbury formula
$$\big(\hat{Y}_{n+1} \hat{Y}_{n+1}^T + \Sigma_\nu\big)^{-1} = \Sigma_\nu^{-1} - \Sigma_\nu^{-1} \hat{Y}_{n+1} \big(I + \hat{Y}_{n+1}^T \Sigma_\nu^{-1} \hat{Y}_{n+1}\big)^{-1} \hat{Y}_{n+1}^T \Sigma_\nu^{-1}$$
to the update equations (10d) and (10e) leads to the update equations (15e) and (15f).
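A compact numpy sketch of this square-root iteration follows; the matrix shapes match Eqs. (14)-(15), while the function name and the way $N_r$ is inferred from $Z_0$ are our own assumptions.

```python
import numpy as np

def tuki_iteration(G, y, Sigma_nu, m, Z, r, alpha, Z0):
    """One TUKI iteration in square-root form, Eqs. (14)-(15)."""
    N_r = Z0.shape[1]
    Z_omega = np.sqrt(2.0 - alpha**2) * Z0        # square root of Sigma_omega, Eq. (11)

    # Prediction step, Eq. (14): N_r-TSVD of the concatenated square root
    m_hat = alpha * m + (1.0 - alpha) * r
    U, s, _ = np.linalg.svd(np.hstack([alpha * Z, Z_omega]), full_matrices=False)
    U, d = U[:, :N_r], s[:N_r] ** 2               # C_hat ~ U diag(d) U^T

    # Truncated unscented transform weights (Definition 2, kappa = 0)
    a = min(np.sqrt(4.0 / N_r), 1.0)
    lam = a**2 * N_r - N_r
    c, Wc = np.sqrt(N_r + lam), 1.0 / (2.0 * (N_r + lam))

    # 2 N_r + 1 truncated sigma-points and their forward evaluations, Eq. (15a)
    thetas = [m_hat] + [m_hat + c * np.sqrt(d[j]) * U[:, j] for j in range(N_r)] \
                     + [m_hat - c * np.sqrt(d[j]) * U[:, j] for j in range(N_r)]
    ys = [G(t) for t in thetas]
    y_hat = ys[0]

    # Square-root factors, Eqs. (15b)-(15c)
    Zh = np.sqrt(Wc) * np.column_stack([t - m_hat for t in thetas[1:]])
    Yh = np.sqrt(Wc) * np.column_stack([yj - y_hat for yj in ys[1:]])

    # Analysis step via the small 2N_r x 2N_r eigenproblem, Eqs. (15d)-(15f)
    Gam, P = np.linalg.eigh(Yh.T @ np.linalg.solve(Sigma_nu, Yh))
    w = (P.T @ (Yh.T @ np.linalg.solve(Sigma_nu, y - y_hat))) / (Gam + 1.0)
    m_new = m_hat + Zh @ (P @ w)
    Z_new = (Zh @ P) / np.sqrt(Gam + 1.0)
    return m_new, Z_new
```

Note that no $N_\theta \times N_\theta$ matrix is ever formed, so the memory cost stays at $O(N_r N_\theta)$; $Z$ grows to $2N_r$ columns after one step, and the $N_r$-TSVD in the next prediction step truncates it back.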
Remark 2. The SVD in Eq. (15d) is applied to a $2N_r \times 2N_r$ matrix, and the truncated SVD in Eq. (14) is applied to an $N_\theta \times 3N_r$ matrix; hence, they are affordable.
Lemma 1. All square root matrices $\{Z_n\}$ and $\{\hat{Z}_n\}$ have rank $N_r$ and share the same column vector space, spanned by the column vectors of $Z_0$ (13). Therefore, the $N_r$-TSVD in the prediction step (14) is exact.

Proof. The proof is in Appendix A.
Remark 3. $\{m_n\}$ lie in the linear space spanned by $r$ (or $r_0$) and the columns of $Z_0$, since
$$m_{n+1} = \alpha m_n + (1-\alpha) r + \hat{Z}_{n+1} \hat{P}_{n+1} (\hat{\Gamma}_{n+1} + I)^{-1} \hat{P}_{n+1}^T \hat{Y}_{n+1}^T \Sigma_\nu^{-1} (y - \hat{y}_{n+1}).$$
Remark 4. Although the square root matrix $Z_{n+1}$ computed in Eq. (15f) is not unique, namely it can be represented as
$$Z_{n+1} = \hat{Z}_{n+1} \hat{P}_{n+1} (\hat{\Gamma}_{n+1} + I)^{-1/2} Q,$$
where $Q$ can be any orthogonal matrix, the algorithm is uniquely defined. Indeed, the SVD of the square root of the predicted covariance matrix (14) can be written as
$$\big(\alpha Z_{n+1} Q \;\; Z_\omega\big) = \hat{U}_{n+1} \sqrt{\hat{D}_{n+1}} \big(\hat{V}_{n+1,1} Q \;\; \hat{V}_{n+1,2}\big),$$
where $\big(\alpha Z_{n+1} \;\; Z_\omega\big) = \hat{U}_{n+1} \sqrt{\hat{D}_{n+1}} \big(\hat{V}_{n+1,1} \;\; \hat{V}_{n+1,2}\big)$; the orthogonal matrix $Q$ does not affect the left singular vectors $\hat{U}_{n+1}$ or the singular values in $\hat{D}_{n+1}$.

3.3. Without Low-Rank Covariance Structure Information

In this subsection, we focus on problems with high-dimensional unknown parameters but without low-rank covariance structure information (i.e., the prior covariance is $I$). For the ensemble Kalman inversion and its variants (see Appendix B), the initial ensemble can still be generated from the prior distribution. For the TUKI, a low-rank matrix $\Lambda = Z_0 Z_0^T$ is required to reduce the computational and memory costs.

However, the ensemble Kalman inversion has the invariant subspace property [42, 43]: the ensemble members remain in the linear space spanned by the initial ensemble. The UKI with reparameterization and the TUKI have similar properties (see Remarks 1 and 3). On one hand, this property brings a regularization effect. On the other hand, it also brings a strong limitation for inverse problems without low-rank covariance information, when the truncated rank $N_r$ or the ensemble size $J$ is much smaller than the unknown parameter dimension $N_\theta$. The limitation is illustrated in the following theorem:
Theorem 2. Let $\{\theta^i\}_{i=1}^{J}$ be random samples from the $N_\theta$-dimensional Gaussian distribution $\mathcal{N}(0, I)$. For any given vector $\theta_{\mathrm{ref}} \in \mathbb{R}^{N_\theta}$, we have
$$\mathbb{E}\big[\mathrm{dist}\big(\theta_{\mathrm{ref}}, \mathrm{span}\{\theta^1, \theta^2, \cdots, \theta^J\}\big)^2\big] = \Big(1 - \frac{J}{N_\theta}\Big)\lVert\theta_{\mathrm{ref}}\rVert^2. \qquad (16)$$

Proof.
The proof is in Appendix A.

Theorem 2 illustrates that, when $J$ and $N_r$ are much smaller than $N_\theta$, even the optimal approximation in the randomly generated invariant subspace is, with high probability, a poor approximation of the solution.
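As a quick sanity check, the identity (16) can be verified by Monte Carlo; the sketch below draws the reference vector randomly for convenience (the theorem holds for any fixed vector) and reproduces $1 - J/N_\theta$.

```python
import numpy as np

rng = np.random.default_rng(0)
N_theta, J, trials = 1000, 11, 200
theta_ref = rng.standard_normal(N_theta)

sq_dists = []
for _ in range(trials):
    Theta = rng.standard_normal((N_theta, J))   # J Gaussian samples as columns
    Q, _ = np.linalg.qr(Theta)                  # orthonormal basis of their span
    resid = theta_ref - Q @ (Q.T @ theta_ref)   # component outside the span
    sq_dists.append(resid @ resid)

print(np.mean(sq_dists) / (theta_ref @ theta_ref))  # ~ 1 - J/N_theta = 0.989
```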
4. Speed Up with Reduced-Order Models
For large scientific or engineering problems, even with a small parameter number $N_\theta$ (or rank $N_r$), the computational cost associated with the $2N_\theta + 1$ (or $2N_r + 1$) forward model evaluations can be intractable. Reduced-order models can be used to speed up these $2N_\theta + 1$ (or $2N_r + 1$) forward model evaluations. Specifically, the forward evaluation $\mathcal{G}(\hat{\theta}^0_{n+1})$ is computed with the high-fidelity model, and the other forward evaluations $\mathcal{G}(\hat{\theta}^j_{n+1})$ with $j \ge 1$ are computed with reduced-order models. For projection based reduced-order models, the high-fidelity solution obtained at $\hat{\theta}^0_{n+1}$ can form a good projection basis. For Gaussian process and neural network based surrogate models, data generation and training procedures are required.
Remark 5. Following Proposition 2 in [1], the UKI ($\alpha = 1$) update equations (10) can be written as
$$m_{n+1} = m_n + C_{n+1} \widetilde{d\mathcal{G}}_{n+1}^{\,T} \big(\Sigma_\nu + \widetilde{\Sigma}_{\nu,n+1}\big)^{-1}\big(y - \mathcal{G}(m_n)\big), \qquad (17a)$$
$$C_{n+1} = \Big(\hat{C}_{n+1}^{-1} + \widetilde{d\mathcal{G}}_{n+1}^{\,T} \big(\Sigma_\nu + \widetilde{\Sigma}_{\nu,n+1}\big)^{-1} \widetilde{d\mathcal{G}}_{n+1}\Big)^{-1}, \qquad (17b)$$
where $\widetilde{d\mathcal{G}}_{n+1} = \hat{C}^{\theta p\,T}_{n+1} \hat{C}_{n+1}^{-1}$ is the averaged derivative and $\lVert\widetilde{\Sigma}_{\nu,n+1}\rVert = O(\lVert\hat{C}_{n+1}\rVert)$. To evaluate $\mathcal{G}(m_n)$ and $\widetilde{d\mathcal{G}}_{n+1}$, the UKI requires $2N_\theta + 1$ forward model evaluations. Specifically, in the modified unscented transform (see Definition 1), we need the forward model evaluation
$$\hat{y}^0_{n+1} = \mathcal{G}(\hat{\theta}^0_{n+1}) = \mathcal{G}(m_n)$$
to estimate the mean, which corresponds to $y - \mathcal{G}(m_n)$ in Eq. (17a), and $2N_\theta$ forward model evaluations
$$\hat{y}^j_{n+1} = \mathcal{G}(\hat{\theta}^j_{n+1}), \quad j \ge 1,$$
to estimate the covariance, which corresponds to $\widetilde{d\mathcal{G}}_{n+1}$. We find that the covariance and $\widetilde{d\mathcal{G}}_{n+1}$ do not need to be very accurate; hence, these $2N_\theta$ forward model evaluations can be accelerated by using reduced-order models.
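In code, the split suggested by Remark 5 is a one-line change to the $\sigma$-point evaluation loop of the earlier sketches; the function and model names below are hypothetical stand-ins.

```python
def evaluate_sigma_points(G_high, G_low, sigma_points):
    """Two-fidelity sigma-point evaluation (Section 4). G_high and G_low are
    assumed stand-ins for a high-fidelity forward model and a reduced-order
    model sharing the same observation space."""
    ys = [G_high(sigma_points[0])]               # mean: one expensive evaluation
    ys += [G_low(p) for p in sigma_points[1:]]   # covariance: 2N cheap evaluations
    return ys
```

The mean evaluation keeps full accuracy in the residual $y - \mathcal{G}(m_n)$, while the cheaper evaluations only enter through the covariance and the averaged derivative.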
5. Applications
In this section, we present numerical results obtained with the aforementioned strategies, UKI with reparameterization (referred to simply as UKI) and TUKI, to speed up the UKI. For comparison, we also apply the ensemble Kalman inversion (EKI), the ensemble adjustment Kalman inversion (EAKI), and the ensemble transform Kalman inversion (ETKI); the detailed implementation of these ensemble-based Kalman inversions is listed in Appendix B.

• Linear model problems: these problems serve as proof-of-concept examples, which compare the behaviors of different Kalman inversions with/without low-rank covariance structure information.

• Barotropic model problem: this is a model data assimilation problem in meteorology, where the initial condition is recovered from random observations in the Northern Hemisphere. This problem demonstrates the effectiveness of using low-rank covariance structure information to speed up the UKI.

• Idealized general circulation model problem: this is a 3D Navier–Stokes problem with the hydrostatic assumption, which describes physical processes in the atmosphere. This problem demonstrates the effectiveness of using reduced-order models (specifically, low-resolution models) to speed up the UKI.

The code is accessible online: https://github.com/Zhengyu-Huang/InverseProblems.jl
5.1. Linear Model Problems

In this subsection, two linear model problems are considered. The first one comes from a one-dimensional elliptic equation, and therefore a low-rank covariance structure exists. The second one is an artificial linear model problem with parameters following a Bernoulli distribution, and therefore no low-rank covariance structure information exists. For comparison, the UKI with reparameterization, TUKI, EKI, EAKI, and ETKI are all applied to these linear problems. Since these problems are well-posed and no observation error is added, the regularization parameter $\alpha$ is set to 1.

5.1.1. With Low-Rank Covariance Structure

Consider the one-dimensional elliptic equation
$$-\frac{d^2\theta}{dx^2} + \theta = f(x), \quad \theta(0) = \theta(1) = 0.$$
Here Dirichlet boundary conditions are applied at both ends, and $f$ defines the source, a piecewise constant function that takes one value on $0 \le x \le \frac{1}{2}$ and another on $\frac{1}{2} < x \le 1$. The elliptic equation is discretized by the finite difference method on a uniform grid with $N_\theta + 2$ points, with $N_\theta = 1000$. The solution is $\theta_{\mathrm{ref}} \in \mathbb{R}^{N_\theta}$ on the interior points. We are interested in the inverse problem of recovering $\theta_{\mathrm{ref}}$ from observations of $f$ on the interior points ($N_y = N_\theta$):
$$y = G\theta + \eta,$$
where $G = \big(-\frac{d^2}{dx^2} + 1\big)$ is the discretized operator and the observation error is assumed to be $\eta \sim \mathcal{N}(0, I) \in \mathbb{R}^{N_y}$. Since the large-scale structure of $\theta$ lies in the Fourier sine space, we can approximate it as
$$\theta = \tau^{(1)}\sin(\pi x) + \cdots + \tau^{(N_r)}\sin(N_r \pi x) \quad \text{with} \quad N_r = 5. \qquad (18)$$
For the UKI, the reparameterization is Eq. (18) and the UKI is initialized with $\tau_0 \sim \mathcal{N}(0, I)$. For the TUKI, the low-rank covariance structure is embedded in the square root of $\Lambda$:
$$Z_0 = 10 \times \big(\sin(\pi x) \;\; \sin(2\pi x) \;\; \cdots \;\; \sin(N_r \pi x)\big).$$
The TUKI is initialized with $\theta_0 \sim \mathcal{N}(0, Z_0 Z_0^T)$. For EKI, EAKI, and ETKI, the initial ensembles are generated from the distribution $\mathcal{N}(0, Z_0 Z_0^T)$, which spans the column space of $Z_0$. The ensemble size is $J = 2N_r + 1$, which matches the number of $\sigma$-points of the UKI and TUKI.

The convergence of $\{m_n\}$ in terms of the relative $L_2$ norm errors is depicted in Fig. 1. All the Kalman inversions behave similarly and converge efficiently, although the EKI suffers from random noise due to the small ensemble size. This problem illustrates how the UKI uses $2N_r + 1 = 11$ $\sigma$-points to solve an $N_\theta = 1000$ dimensional inverse problem.

5.1.2. Without Low-Rank Covariance Structure

Consider the following artificial inverse problem
$$y = G\theta + \eta,$$
with $G = I \in \mathbb{R}^{N_y \times N_\theta}$ and $N_y = 1000$, $N_\theta = 1000$. The entries of the underlying truth $\theta_{\mathrm{ref}}$ are independent, identically distributed Bernoulli random variables $B(1, 0.5)$. The observation is $y = G\theta_{\mathrm{ref}}$, with assumed observation error $\eta \sim \mathcal{N}(0, I)$. However, there is no low-rank covariance structure information. Hence, the initial distribution is set to $\mathcal{N}(0, I)$.

Although there is no low-rank covariance structure information, the TUKI is applied with truncated rank $N_r = 5$. EKI, EAKI, and ETKI are also applied with an ensemble size of $J = 2N_r + 1$.

The convergence of $\{m_n\}$ in terms of the relative $L_2$ norm errors is depicted in Fig. 2. EKI, EAKI, and ETKI all diverge, and TUKI fails to converge to $\theta_{\mathrm{ref}}$. This conforms to the discussion in Subsection 3.3. Gradient-based optimization algorithms with automatic differentiation [68, 69, 5, 6] could be much more effective and efficient for this scenario, when the forward problem is differentiable and not chaotic.
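For concreteness, a minimal numpy sketch of the discretized elliptic setup follows; the truth vector is an assumed stand-in (the exact source values are immaterial to the algorithmic structure), and `uki_iteration` refers to the sketch in Section 3.1.

```python
import numpy as np

# Finite-difference discretization of G = -d^2/dx^2 + 1 on a uniform grid
N_theta = 1000
h = 1.0 / (N_theta + 1)
x = np.linspace(h, 1.0 - h, N_theta)
G = (np.diag((2.0 / h**2 + 1.0) * np.ones(N_theta))
     + np.diag((-1.0 / h**2) * np.ones(N_theta - 1), 1)
     + np.diag((-1.0 / h**2) * np.ones(N_theta - 1), -1))

# Sine-mode reparameterization basis of Eq. (18)
N_r = 5
U = np.column_stack([np.sin((i + 1) * np.pi * x) for i in range(N_r)])

theta_ref = np.sin(np.pi * x) + 0.5 * np.sin(3.0 * np.pi * x)  # assumed truth
y = G @ theta_ref
G_tau = lambda tau: G @ (U @ tau)  # reduced forward map: feed to uki_iteration
```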
Figure 1: Relative $L_2$ error $\frac{\lVert m_n - \theta_{\mathrm{ref}}\rVert}{\lVert \theta_{\mathrm{ref}}\rVert}$ of the elliptic linear model problem.
Figure 2: Relative $L_2$ error $\frac{\lVert m_n - \theta_{\mathrm{ref}}\rVert}{\lVert \theta_{\mathrm{ref}}\rVert}$ of the Bernoulli linear model problem.

5.2. Barotropic Model Problem

The barotropic vorticity equation describes the evolution of a non-divergent, incompressible flow on the surface of the earth:
$$\frac{\partial \omega}{\partial t} = -\mathbf{v} \cdot \nabla(\omega + f), \quad \nabla^2 \psi = \omega, \quad \mathbf{v} = \mathbf{k} \times \nabla\psi, \qquad (19)$$
where $\omega$ and $\psi$ are the (absolute) vorticity and the streamfunction, respectively; $\mathbf{v}$ is the non-divergent flow velocity, $\mathbf{k}$ is the unit vector in the radial direction, and $f = 2\Omega\sin(\theta)$ is the Coriolis force, depending on the latitude $\theta$. The angular velocity is $\Omega = 7.292 \times 10^{-5}\,\mathrm{s}^{-1}$ and the Earth radius is $R = 6.371 \times 10^{6}\,\mathrm{m}$.

The initial condition (see Fig. 3) is the superposition of a zonally symmetric flow
$$u_b = 25\cos(\theta) - 30\cos^3(\theta) + 300\sin^2(\theta)\cos^6(\theta), \qquad (20)$$
where $u_b$ is the zonal wind velocity, and a sinusoidal disturbance in the vorticity field
$$\omega' = A\cos(\theta)\, e^{-((\theta - \theta_0)/\theta_W)^2}\cos(m\lambda),$$
where $\lambda$ is the longitude, $m = 4$, $\theta_0 = 45^\circ\mathrm{N}$, $\theta_W = 15^\circ$, and $A = 8.0 \times 10^{-5}\,\mathrm{s}^{-1}$. This resembles conditions found in the upper troposphere in Northern winter [70, 71].

Figure 3: Initial condition (d) of the barotropic model is the superposition of a zonally symmetric flow (a: $u_b$; b: $\omega_b$) and a sinusoidal disturbance in the vorticity field (c: $\omega'$).
The forward problem is solved by the spectral transform method with T85 spectral resolution (triangular truncation at wavenumber 85, with $256 \times 512$ points on the latitude-longitude transform grid) and a semi-implicit time integrator [72].

For the inverse problem, we recover the initial condition, specifically the initial vorticity field $\omega_0$ of the barotropic vorticity equation, given noisy pointwise observations of the zonal wind velocity field. The pointwise observations are collected at 50 random points in the Northern Hemisphere at $T = 12\,\mathrm{h}$ and $T = 24\,\mathrm{h}$; therefore, we have $N_y = 100$ data points (see Fig. 4). And 5% Gaussian random noise is added to the observations, as follows,
$$y_{\mathrm{obs}} = y_{\mathrm{ref}} + 5\%\, y_{\mathrm{ref}} \odot \mathcal{N}(0, I),$$
where $\odot$ denotes element-wise multiplication. The initial zonal flow $\omega_b$ represents the basic atmospheric circulation, and we are more interested in the initial perturbation field $\omega'$. Hence, we assume the prior mean is $r_0 = \omega_b$. The large-scale structure of $\omega_0$ lies in the spherical harmonics space
$$Y_{m,n}(\lambda, \theta) = P_{m,n}(\theta)e^{im\lambda}, \quad -n \le m \le n, \quad n \ge 0,$$
and hence we reparameterize $\omega_0$ with a triangular truncation at wavenumber $N = 7$, as follows,
$$\omega_0(\lambda, \theta) = \omega_b + \sum_{n=1}^{N} \tau^{(0,n)} P_{0,n}(\theta) + \sum_{m=1}^{N}\sum_{n=m}^{N} \Big(\tau^{s(m,n)} P_{m,n}(\theta)\cos(m\lambda) + \tau^{c(m,n)} P_{m,n}(\theta)\sin(m\lambda)\Big). \qquad (21)$$
For the UKI, the reparameterization is Eq. (21), and the UKI is initialized with $\tau_0 \sim \mathcal{N}(0, I)$. For the TUKI, we assume the square root of $\Lambda$ is
$$Z_0 = \frac{1}{R}\big(\cdots\; P_{0,n},\; \cdots,\; P_{m,n}\cos(m\lambda),\; P_{m,n}\sin(m\lambda),\; \cdots\big) \quad \text{with} \quad m, n \le N.$$
The TUKI starts with $\theta_0 \sim \mathcal{N}(\omega_b, Z_0 Z_0^T)$. For both approaches, we have $N_r = 63$. It is worth mentioning that the inverse problem is ill-posed: another possible initial condition can be obtained by mirroring around the equator, since there is no observation in the Southern Hemisphere. Hence, regularization is required, and we set the regularization parameter $\alpha$ to 0.5, since $\omega_b$ is a good approximation.
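Two small computable details of this setup are the mode count in Eq. (21) and the multiplicative noise model; a short sketch (the stand-in observation vector is an assumption) is:

```python
import numpy as np

# Count the modes retained in Eq. (21): one P_{0,n} per n, plus cos/sin pairs
# for 1 <= m <= n <= N; N = 7 gives N_r = 63.
N = 7
N_r = N + 2 * sum(N - m + 1 for m in range(1, N + 1))
print(N_r)  # 63

# 5% multiplicative observation noise, as in the text
rng = np.random.default_rng(0)
y_ref = rng.standard_normal(100)   # stand-in for the 100 zonal-wind values
y_obs = y_ref + 0.05 * y_ref * rng.standard_normal(y_ref.shape)
```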
Figure 4: The zonal velocity field of the barotropic model and the 50 random pointwise measurements (black dots) in the Northern Hemisphere at the two observation times ($T = 12\,\mathrm{h}$ and $T = 24\,\mathrm{h}$).

The convergence of the initial vorticity field $\omega_0(x, m_n)$ and the optimization errors at each iteration are depicted in Fig. 5. The TUKI performs very close to the UKI with reparameterization for this case. The estimated initial vorticity fields $\omega_0(x, m_n)$ at the 20th iteration are depicted in Fig. 6. Both UKI and TUKI capture the sinusoidal disturbance of the true initial field. The 3-$\sigma$ error fields at each point are depicted in Fig. 7. Although the covariance does not represent the Bayesian posterior uncertainty, it does indicate the sensitivities inherent in the estimation problem; in particular, it features the largest uncertainty in the Southern Hemisphere, since there is no observation there.

5.3. Idealized General Circulation Model Problem

Finally, we consider using reduced-order model techniques to speed up an idealized general circulation model inverse problem. The model is based on the 3D Navier–Stokes equations, making the hydrostatic and shallow-atmosphere approximations common in atmospheric modeling. Specifically, we test on the notable Held-Suarez test case [72], in which a detailed radiative transfer model is replaced by Newtonian relaxation of temperatures toward a prescribed "radiative equilibrium"
$T_{\mathrm{eq}}(\phi, p)$ that varies with latitude $\phi$ and pressure $p$.

Figure 5: Relative $L_2$ error $\frac{\lVert \omega_0(x, m_n) - \omega_0\rVert}{\lVert \omega_0\rVert}$ (left) and the optimization error $\frac{1}{2}\lVert \Sigma_\nu^{-\frac{1}{2}}(y_{\mathrm{obs}} - \hat{y}_n)\rVert^2$ (right) of the barotropic model problem.

Figure 6: Initial vorticity fields $\omega_0(x, m_n)$ recovered by UKI (left) and TUKI (right). The reference initial vorticity field is in Fig. 3-d.

Figure 7: Error variance fields in terms of the standard deviation at each point obtained by UKI (left) and TUKI (right).

Specifically, the thermodynamic equation for temperature $T$,
$$\frac{\partial T}{\partial t} + \cdots = Q$$
(dots denoting advective and pressure work terms), contains a diabatic heat source
$$Q = -k_T(\phi, p, p_s)\big(T - T_{\mathrm{eq}}(\phi, p)\big),$$
with relaxation coefficient (inverse relaxation time)
$$k_T = k_a + (k_s - k_a)\max\Big(0, \frac{\sigma - \sigma_b}{1 - \sigma_b}\Big)\cos^4\phi.$$
Here, $\sigma = p/p_s$, which is pressure $p$ normalized by surface pressure $p_s$, is the vertical coordinate of the model, and
$$T_{\mathrm{eq}} = \max\Big\{200\,\mathrm{K}, \Big[315\,\mathrm{K} - \Delta T_y \sin^2\phi - \Delta\theta_z \log\Big(\frac{p}{p_0}\Big)\cos^2\phi\Big]\Big(\frac{p}{p_0}\Big)^\kappa\Big\}$$
is the equilibrium temperature profile ($p_0 = 10^5\,\mathrm{Pa}$ is a reference surface pressure and $\kappa = 2/7$). The default parameters are
$$k_a = (40\,\mathrm{day})^{-1}, \quad k_s = (4\,\mathrm{day})^{-1}, \quad \Delta T_y = 60\,\mathrm{K}, \quad \Delta\theta_z = 10\,\mathrm{K}.$$

For the numerical simulations, we use the spectral transform method in the horizontal, with T42 spectral resolution (triangular truncation at wavenumber 42, with $64 \times 128$ points on the latitude-longitude transform grid); we use 20 vertical levels equally spaced in $\sigma$. With the default parameters, the model produces an Earth-like zonal-mean circulation, albeit without moisture or precipitation. A single jet is generated with a maximum strength of roughly $30\,\mathrm{m\,s^{-1}}$ near $45^\circ$ latitude (see Fig. 8).

Our inverse problem is constructed to learn the parameters in the Newtonian relaxation term $Q$:
$$(k_a, k_s, \Delta T_y, \Delta\theta_z).$$
We do so in the presence of the following constraints:
$$0\,\mathrm{day}^{-1} < k_a < 1\,\mathrm{day}^{-1}, \quad k_a < k_s < k_a + 1\,\mathrm{day}^{-1}, \quad 0 < \Delta T_y, \quad 0 < \Delta\theta_z.$$
The inverse problem is formed as follows [1],
$$y = \mathcal{G}(\theta) + \eta \quad \text{with} \quad \mathcal{G}(\theta) = T(\phi, \sigma), \qquad (22)$$
with the parameter transformation
$$(k_a, k_s, \Delta T_y, \Delta\theta_z) = \Big(\frac{1}{1 + |\theta^{(1)}|}, \; \frac{1}{1 + |\theta^{(1)}|} + \frac{1}{1 + |\theta^{(2)}|}, \; |\theta^{(3)}|, \; |\theta^{(4)}|\Big). \qquad (23)$$
The observation mapping $\mathcal{G}$ is defined by mapping from the unknown $\theta$ to the 200-day zonal mean of the temperature ($T$) as a function of latitude ($\phi$) and height ($\sigma$), after an initial spin-up of 200 days. The truth observation is the 1000-day zonal mean of the temperature (see Fig. 8-top-left), after an initial spin-up of 200 days to eliminate the influence of the initial condition. Because the truth observations come from an average 5 times as long as the observation window used for parameter learning, the chaotic internal variability of the model introduces noise in the observations.
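A direct transcription of the transformation (23) (the function name is ours) makes the enforced constraints easy to check:

```python
import numpy as np

def constrain(theta):
    """Map unconstrained theta to (k_a, k_s, DT_y, Dth_z), Eq. (23);
    units are (day^-1, day^-1, K, K)."""
    k_a = 1.0 / (1.0 + abs(theta[0]))
    k_s = k_a + 1.0 / (1.0 + abs(theta[1]))
    return np.array([k_a, k_s, abs(theta[2]), abs(theta[3])])

# e.g., (2, 2, 20, 20) maps to (1/3, 2/3, 20, 20), inside the constraint set
print(constrain(np.array([2.0, 2.0, 20.0, 20.0])))
```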
20 K ,
20 K] T and UKI is initial-ized with θ ∼ N (cid:16) r , I (cid:17) . Within the algorithm, we assume that the observation error satisfies η ∼ N (0 K , I K ). Because the problem is over-determined, we set α = 1 . The reduced-ordermodel technique discussed in Section 4 is applied to speed up the UKI. These 2 N θ forward modelevaluations are computed on a T21 grid (triangular truncation at wavenumber 21, with 32 × σ (twicecoarser in all three directions). The 1000-day zonal mean of the temperature and velocity predictedby the low-resolution model with the truth parameters are shown in Fig. 8-bottom. Although thereare large discrepancies comparing with results computed on the T42 grid, these low-resolutionresults only affect the covariance and F u d G . The estimated parameters and associated 3 − σ con-fidence intervals for each component at each iteration are depicted in Fig. 9. The estimation ofmodel parameters at the 20th iteration are (cid:20) k a k s ∆ T y ∆ θ z (cid:21) ∼ N (cid:16) (cid:34) . − .
271 day − .
76 K9 .
84 K (cid:35) , (cid:34) . × − day − − . × − day − − . × − day − K 3 . × − day − K − . × − day − . × − day − . × − day − K − . × − day − K − . × − day − K 5 . × − day − K 2 . × − K . × − K . × − day − K − . × − day − K 3 . × − K . × − K (cid:35) (cid:17) . The UKI with reduced-order models converges efficiently to the true parameters (less than 20iterations). Moreover, the computational cost of these 2 N θ low-resolution forward evaluations iseven lower than a single high-resolution forward evaluation. Hence, for this inverse problems, thederivative-free UKI equipped with reduced order models achieves a CPU time speedup factor of O (10) and is as efficient as gradient-based optimization methods, which generally require an equallyexpensive backward propagation.
6. Conclusion
Unscented Kalman inversion is attractive for at least four main reasons: (i) it is derivative-free; (ii) it is robust for noisy observations and chaotic inverse problems; (iii) it is embarrassingly parallel; (iv) it provides sensitivity information. In the present work, several strategies for making the UKI more efficient for large-scale inverse problems are presented, including the use of low-rank approximation and reduced-order model techniques. Although we demonstrate the success of applying the UKI to high-dimensional inverse problems, the construction of low-rank bases for general scientific and engineering problems with complex geometries is not trivial and is worth further investigation. Another interesting direction for future work would be to speed up the UKI with other reduced-order model techniques.
Acknowledgments.
D.Z.H. is supported by the generosity of Eric and Wendy Schmidt by recommendation of the Schmidt Futures program. J.H. is supported by the Simons Foundation as a Junior Fellow at New York University.
Appendix A. Proof of Theorems
Proof of Lemma 1.
We will prove by induction that each $Z_n$ has rank $N_r$ and shares the column vector space spanned by the column vectors of $Z_0$.

When $n = 0$, since $\mathrm{rank}(Z_0) = N_r$, $Z_0$ trivially has rank $N_r$ and spans the required column vector space.

We assume this holds for all $n \le k$. Since $Z_\omega = \sqrt{2 - \alpha^2}\, Z_0$, the column space and the left singular vector space of $\big(\alpha Z_k \;\; Z_\omega\big)$ are the same as the column space of $Z_0$. Therefore the $N_r$-TSVD
$$\big(\alpha Z_k \;\; Z_\omega\big) = \hat{U}_k \sqrt{\hat{D}_k}\, \hat{V}_k^T$$
is exact. Next, since we have
$$\hat{Z}_{k+1}\hat{Z}_{k+1}^T = \big(\alpha Z_k \;\; Z_\omega\big)\big(\alpha Z_k \;\; Z_\omega\big)^T,$$
$\hat{Z}_{k+1}$ has rank $N_r$ and its column space is the same as that of $Z_0$. Finally, since $\hat{P}_{k+1}(\hat{\Gamma}_{k+1} + I)^{-1/2}$ is not singular, from
$$Z_{k+1} = \hat{Z}_{k+1}\hat{P}_{k+1}(\hat{\Gamma}_{k+1} + I)^{-1/2},$$
we have that $Z_{k+1}$ shares the column space of $\hat{Z}_{k+1}$.

Therefore $\{Z_n\}$ and $\{\hat{Z}_n\}$ have rank $N_r$ and share the same column vector space, spanned by the column vectors of $Z_0$ (13).
Gaussian vectors are rotation invariant. We can take any N θ × N θ orthogonalmatrix O , then the joint law of { Oθ , Oθ , · · · , Oθ J } is the same as the joint law of { θ , θ , · · · , θ J } .18 k a ( day ) k s ( day ) T y ( K ) y ( K ) Figure 9: Convergence of the idealized general circulation model inverse problem with UKI, true parameter valuesare represented by dashed grey lines.
Moreover, rotation preserves distance:
$$\mathrm{dist}\big(\theta_{\mathrm{ref}}, \mathrm{span}\{\theta^1, \theta^2, \cdots, \theta^J\}\big) = \mathrm{dist}\big(O\theta_{\mathrm{ref}}, \mathrm{span}\{O\theta^1, O\theta^2, \cdots, O\theta^J\}\big), \qquad (A.1)$$
which has the same law as $\mathrm{dist}(O\theta_{\mathrm{ref}}, \mathrm{span}\{\theta^1, \theta^2, \cdots, \theta^J\})$. By taking the expectation over $\theta^1, \theta^2, \cdots, \theta^J$ on both sides, we get
$$\mathbb{E}\big[\mathrm{dist}(\theta_{\mathrm{ref}}, \mathrm{span}\{\theta^1, \theta^2, \cdots, \theta^J\})^2\big] = \mathbb{E}\big[\mathrm{dist}(O\theta_{\mathrm{ref}}, \mathrm{span}\{\theta^1, \theta^2, \cdots, \theta^J\})^2\big], \qquad (A.2)$$
for any $N_\theta \times N_\theta$ orthogonal matrix $O$. We can further average (A.2) over the uniform distribution on all $N_\theta \times N_\theta$ orthogonal matrices (the Haar measure over the orthogonal group):
$$\mathbb{E}\big[\mathrm{dist}(\theta_{\mathrm{ref}}, \mathrm{span}\{\theta^1, \theta^2, \cdots, \theta^J\})^2\big] = \mathbb{E}\big[\mathbb{E}_O\big[\mathrm{dist}(O\theta_{\mathrm{ref}}, \mathrm{span}\{\theta^1, \theta^2, \cdots, \theta^J\})^2\big]\big], \qquad (A.3)$$
where $\mathbb{E}_O$ is the expectation with respect to the uniform distribution on all $N_\theta \times N_\theta$ orthogonal matrices. Under this measure, $O\theta_{\mathrm{ref}}$ is uniformly distributed on the sphere of radius $\lVert\theta_{\mathrm{ref}}\rVert$, i.e., it has the same law as $\lVert\theta_{\mathrm{ref}}\rVert\,\omega$, where $\omega$ is uniformly distributed on the unit sphere. We can rewrite the right hand side of (A.3) as
$$\mathbb{E}_O\big[\mathrm{dist}(O\theta_{\mathrm{ref}}, \mathrm{span}\{\theta^1, \theta^2, \cdots, \theta^J\})^2\big] = \lVert\theta_{\mathrm{ref}}\rVert^2\, \mathbb{E}_\omega\big[\mathrm{dist}(\omega, \mathrm{span}\{\theta^1, \theta^2, \cdots, \theta^J\})^2\big]. \qquad (A.4)$$
For Gaussian vectors $\theta^1, \theta^2, \cdots, \theta^J$, almost surely their span $\mathrm{span}\{\theta^1, \theta^2, \cdots, \theta^J\}$ has dimension $J$. $\mathrm{dist}(\omega, \mathrm{span}\{\theta^1, \theta^2, \cdots, \theta^J\})$ is the distance from a uniformly distributed random vector on the unit sphere to a plane of dimension $J$; we can again rotate the plane, which does not change the law. So we can rotate the plane to the plane spanned by the coordinate vectors $e_1, e_2, \cdots, e_J$; then
$$\mathbb{E}_\omega\big[\mathrm{dist}(\omega, \mathrm{span}\{\theta^1, \theta^2, \cdots, \theta^J\})^2\big] = \mathbb{E}_\omega\big[\mathrm{dist}(\omega, \mathrm{span}\{e_1, e_2, \cdots, e_J\})^2\big] = \mathbb{E}_\omega\Big[\sum_{i=J+1}^{N_\theta} \omega_i^2\Big] = \frac{N_\theta - J}{N_\theta} = 1 - \frac{J}{N_\theta}, \qquad (A.5)$$
where we use that, for a uniformly distributed vector on the unit sphere, the expectation of each coordinate squared is $1/N_\theta$. The claim (16) follows from combining (A.4) and (A.5).

Appendix B. Ensemble Based Kalman Inversion

The stochastic ensemble Kalman inversion [1] is

• Prediction step:
$$\hat{\theta}^j_{n+1} = \alpha\theta^j_n + (1-\alpha) r + \omega^j_{n+1}, \quad \hat{m}_{n+1} = \frac{1}{J}\sum_{j=1}^{J}\hat{\theta}^j_{n+1}, \quad \hat{C}_{n+1} = \frac{1}{J-1}\sum_{j=1}^{J}(\hat{\theta}^j_{n+1} - \hat{m}_{n+1})(\hat{\theta}^j_{n+1} - \hat{m}_{n+1})^T.$$

• Analysis step:
$$\hat{y}^j_{n+1} = \mathcal{G}(\hat{\theta}^j_{n+1}), \quad \hat{y}_{n+1} = \frac{1}{J}\sum_{j=1}^{J}\hat{y}^j_{n+1},$$
$$\hat{C}^{\theta p}_{n+1} = \frac{1}{J-1}\sum_{j=1}^{J}(\hat{\theta}^j_{n+1} - \hat{m}_{n+1})(\hat{y}^j_{n+1} - \hat{y}_{n+1})^T, \quad \hat{C}^{pp}_{n+1} = \frac{1}{J-1}\sum_{j=1}^{J}(\hat{y}^j_{n+1} - \hat{y}_{n+1})(\hat{y}^j_{n+1} - \hat{y}_{n+1})^T + \Sigma_\nu,$$
$$\theta^j_{n+1} = \hat{\theta}^j_{n+1} + \hat{C}^{\theta p}_{n+1}\big(\hat{C}^{pp}_{n+1}\big)^{-1}(y - \hat{y}^j_{n+1} - \nu^j_{n+1}), \quad m_{n+1} = \frac{1}{J}\sum_{j=1}^{J}\theta^j_{n+1}. \qquad (B.1)$$
Here the superscript $j = 1, \cdots, J$ is the ensemble particle index, and $\omega^j_{n+1} \sim \mathcal{N}(0, \Sigma_\omega)$ and $\nu^j_{n+1} \sim \mathcal{N}(0, \Sigma_\nu)$ are independent and identically distributed random variables.

Inspired by square root Kalman filters [29, 28, 50, 51], the analysis step of the stochastic Kalman inversion can be replaced by a deterministic analysis, which leads to deterministic ensemble Kalman inversions. Deterministic ensemble Kalman inversions update the mean first, as follows,
$$m_{n+1} = \hat{m}_{n+1} + \hat{C}^{\theta p}_{n+1}\big(\hat{C}^{pp}_{n+1}\big)^{-1}(y - \hat{y}_{n+1}). \qquad (B.2)$$
Then we can define the matrix square roots $\hat{Z}_{n+1}, Z_{n+1} \in \mathbb{R}^{N_\theta \times J}$ of $\hat{C}_{n+1}, C_{n+1}$ as follows,
$$\hat{Z}_{n+1} = \frac{1}{\sqrt{J-1}}\big(\hat{\theta}^1_{n+1} - \hat{m}_{n+1} \;\; \hat{\theta}^2_{n+1} - \hat{m}_{n+1} \;\; \cdots \;\; \hat{\theta}^J_{n+1} - \hat{m}_{n+1}\big),$$
$$Z_{n+1} = \frac{1}{\sqrt{J-1}}\big(\theta^1_{n+1} - m_{n+1} \;\; \theta^2_{n+1} - m_{n+1} \;\; \cdots \;\; \theta^J_{n+1} - m_{n+1}\big). \qquad (B.3)$$
The particles $\{\theta^j_{n+1}\}$ can then be deterministically updated by the ensemble adjustment/transform Kalman methods, which brings about the following ensemble adjustment/transform Kalman inversions.

Appendix B.1. Ensemble Adjustment Kalman Inversion

Following the ensemble adjustment Kalman filter proposed in [28], the analysis step updates the particles deterministically with a pre-multiplier $A$,
$$\theta^j_{n+1} - m_{n+1} = A(\hat{\theta}^j_{n+1} - \hat{m}_{n+1}). \qquad (B.4)$$
Here $A = F\sqrt{D^p}\, U \sqrt{D}\big(\sqrt{D^p}\big)^{-1} F^T$ with
$$\mathrm{SVD}: \hat{Z}_{n+1} = F\sqrt{D^p}\, V^T, \quad \mathrm{SVD}: V^T\big(I + \hat{Y}^T_{n+1}\Sigma_\nu^{-1}\hat{Y}_{n+1}\big)^{-1} V = UDU^T, \qquad (B.5)$$
where both $\sqrt{D^p}$ and $D$ are non-singular diagonal matrices with dimensionality $\mathrm{rank}(\hat{Z}_{n+1})$, and
$$\hat{Y}_{n+1} = \frac{1}{\sqrt{J-1}}\big(\hat{y}^1_{n+1} - \hat{y}_{n+1} \;\; \hat{y}^2_{n+1} - \hat{y}_{n+1} \;\; \cdots \;\; \hat{y}^J_{n+1} - \hat{y}_{n+1}\big). \qquad (B.6)$$
It can be verified that the ensemble covariance matrix from Eq. (B.1) satisfies
$$C_{n+1} = Z_{n+1}Z^T_{n+1} = \hat{Z}_{n+1}\Big(I - \hat{Y}^T_{n+1}\big(\hat{Y}_{n+1}\hat{Y}^T_{n+1} + \Sigma_\nu\big)^{-1}\hat{Y}_{n+1}\Big)\hat{Z}^T_{n+1} = \hat{Z}_{n+1}\Big(I + \hat{Y}^T_{n+1}\Sigma_\nu^{-1}\hat{Y}_{n+1}\Big)^{-1}\hat{Z}^T_{n+1}. \qquad (B.7)$$
Bringing Eq. (B.5) into Eq. (B.7) leads to
$$C_{n+1} = F\sqrt{D^p}\, U D U^T \sqrt{D^p}\, F^T = F\sqrt{D^p}\, U \sqrt{D}\big(\sqrt{D^p}\big)^{-1} F^T\, \hat{C}_{n+1}\, F\big(\sqrt{D^p}\big)^{-1}\sqrt{D}\, U^T \sqrt{D^p}\, F^T = A\hat{Z}_{n+1}\hat{Z}^T_{n+1}A^T. \qquad (B.8)$$
Therefore, the ensemble adjustment Kalman filter delivers the same covariance matrix as the stochastic ensemble Kalman filter, although the covariance matrix is not explicitly required in either algorithm.

Appendix B.2. Ensemble Transform Kalman Inversion
Following the ensemble transform Kalman filter proposed in [29], the analysis step updates the particles deterministically with a post-multiplier $T$,
$$Z_{n+1} = \hat{Z}_{n+1} T. \qquad (B.9)$$
Here $T = P(\Gamma + I)^{-1/2}$, with
$$\mathrm{SVD}: \hat{Y}^T_{n+1}\Sigma_\nu^{-1}\hat{Y}_{n+1} = P\Gamma P^T, \qquad (B.10)$$
where $\hat{Y}_{n+1}$ is defined in Eq. (B.6). Following Eq. (B.7), the ensemble covariance matrix satisfies
$$C_{n+1} = \hat{Z}_{n+1}\big(I + \hat{Y}^T_{n+1}\Sigma_\nu^{-1}\hat{Y}_{n+1}\big)^{-1}\hat{Z}^T_{n+1} = \hat{Z}_{n+1}\big(I + P\Gamma P^T\big)^{-1}\hat{Z}^T_{n+1} = \hat{Z}_{n+1}P\big(I + \Gamma\big)^{-1}P^T\hat{Z}^T_{n+1} = \hat{Z}_{n+1}TT^T\hat{Z}^T_{n+1}. \qquad (B.11)$$
Therefore, the ensemble transform Kalman filter delivers the same covariance matrix as the stochastic ensemble Kalman filter, although the covariance matrix is not explicitly required in either algorithm.
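A small numerical check of the identity (B.11), with random stand-in factors (all names and sizes here are assumptions for illustration), can be written as follows:

```python
import numpy as np

# Verify Eq. (B.11): with T = P (Gamma + I)^{-1/2}, (Z T)(Z T)^T equals
# Z (I + Y^T Sigma_nu^{-1} Y)^{-1} Z^T.
rng = np.random.default_rng(1)
N_theta, N_y, J = 8, 6, 5
Zh = rng.standard_normal((N_theta, J))
Yh = rng.standard_normal((N_y, J))
Sigma_nu = np.eye(N_y)

Gam, P = np.linalg.eigh(Yh.T @ np.linalg.solve(Sigma_nu, Yh))
T = P / np.sqrt(Gam + 1.0)                       # post-multiplier, Eq. (B.10)
Z_new = Zh @ T                                   # Eq. (B.9)
C_direct = Zh @ np.linalg.inv(np.eye(J) + Yh.T @ np.linalg.solve(Sigma_nu, Yh)) @ Zh.T
print(np.allclose(Z_new @ Z_new.T, C_direct))    # True
```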
Remark 6. The original ETKF is biased, since $Z_{n+1}\mathbb{1} \ne 0$ ($m_{n+1}$ is not the mean of $\{\theta^j_{n+1}\}_j$). An unbiased ETKF fix is introduced in [51] by defining a symmetric post-multiplier $T = P(\Gamma + I)^{-1/2}P^T$. Since
$$TT^T = \big(I + \hat{Y}^T_{n+1}\Sigma_\nu^{-1}\hat{Y}_{n+1}\big)^{-1} = I - \hat{Y}^T_{n+1}\big(\hat{Y}_{n+1}\hat{Y}^T_{n+1} + \Sigma_\nu\big)^{-1}\hat{Y}_{n+1}, \qquad (B.12)$$
$\mathbb{1}$ is an eigenvector of $T$, and therefore
$$Z_{n+1}\mathbb{1} = \hat{Z}_{n+1}T\mathbb{1} \propto \hat{Z}_{n+1}\mathbb{1} = 0. \qquad (B.13)$$

References

[1] Daniel Z Huang, Tapio Schneider, and Andrew M Stuart. Unscented Kalman inversion. arXiv preprint arXiv:2102.01580, 2021.

[2] Mrinal K Sen and Paul L Stoffa.
Global optimization methods in geophysical inversion. Cambridge University Press, 2013.

[3] Tapio Schneider, Shiwei Lan, Andrew Stuart, and Joao Teixeira. Earth system modeling 2.0: A blueprint for models that learn from observations and targeted high-resolution simulations. Geophysical Research Letters, 44(24):12–396, 2017.

[4] Oliver RA Dunbar, Alfredo Garbuno-Inigo, Tapio Schneider, and Andrew M Stuart. Calibration and uncertainty quantification of convective parameters in an idealized GCM. arXiv preprint arXiv:2012.13262, 2020.

[5] Daniel Z Huang, Kailai Xu, Charbel Farhat, and Eric Darve. Learning constitutive relations from indirect observations using deep neural networks. Journal of Computational Physics, page 109491, 2020.

[6] Kailai Xu, Daniel Z Huang, and Eric Darve. Learning constitutive relations using symmetric positive definite neural networks. Journal of Computational Physics, 428:110072, 2021.

[7] Philip Avery, Daniel Z Huang, Wanli He, Johanna Ehlers, Armen Derkevorkian, and Charbel Farhat. A computationally tractable framework for nonlinear dynamic multiscale modeling of membrane fabric. arXiv preprint arXiv:2007.05877, 2020.

[8] Brian H Russell. Introduction to seismic inversion methods. SEG Books, 1988.

[9] Carey Bunks, Fatimetou M Saleck, S Zaleski, and G Chavent. Multiscale seismic waveform inversion. Geophysics, 60(5):1457–1473, 1995.

[10] Alexander V Goncharsky and Sergey Y Romanov. Iterative methods for solving coefficient inverse problems of wave tomography in models with attenuation. Inverse Problems, 33(2):025003, 2017.

[11] Janne Hakkarainen, Zenith Purisha, Antti Solonen, and Samuli Siltanen. Undersampled dynamic X-ray tomography with dimension reduction Kalman filter. IEEE Transactions on Computational Imaging, 5(3):492–501, 2019.

[12] Eric A Wan and Rudolph Van Der Merwe. The unscented Kalman filter for nonlinear estimation. In Proceedings of the IEEE 2000 Adaptive Systems for Signal Processing, Communications, and Control Symposium (Cat. No. 00EX373), pages 153–158. IEEE, 2000.

[13] Rudolph Emil Kalman. A new approach to linear filtering and prediction problems. J. Basic Eng. Mar, 82(1):35–45, 1960.

[14] Harold Wayne Sorenson. Kalman filtering: theory and application. IEEE, 1985.

[15] Geir Evensen. Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. Journal of Geophysical Research: Oceans, 99(C5):10143–10162, 1994.

[16] Simon J Julier, Jeffrey K Uhlmann, and Hugh F Durrant-Whyte. A new approach for filtering nonlinear systems. In Proceedings of 1995 American Control Conference-ACC'95, volume 3, pages 1628–1632. IEEE, 1995.

[17] Daniel Z Huang, P-O Persson, and Matthew J Zahr. High-order, linearly stable, partitioned solvers for general multiphysics problems based on implicit–explicit Runge–Kutta schemes. Computer Methods in Applied Mechanics and Engineering, 346:674–706, 2019.

[18] Daniel Z Huang, Will Pazner, Per-Olof Persson, and Matthew J Zahr. High-order partitioned spectral deferred correction solvers for multiphysics problems. Journal of Computational Physics, page 109441, 2020.

[19] Zhengyu Huang, Philip Avery, Charbel Farhat, Jason Rabinovitch, Armen Derkevorkian, and Lee D Peterson. Simulation of parachute inflation dynamics using an Eulerian computational framework for fluid-structure interfaces evolving in high-speed turbulent flows. In , page 1540, 2018.

[20] Daniel Z Huang, Philip Avery, Charbel Farhat, Jason Rabinovitch, Armen Derkevorkian, and Lee D Peterson. Modeling, simulation and validation of supersonic parachute inflation dynamics during Mars landing. In AIAA Scitech 2020 Forum, page 0313, 2020.

[21] Alistair Adcroft, Whit Anderson, V Balaji, Chris Blanton, Mitchell Bushuk, Carolina O Dufour, John P Dunne, Stephen M Griffies, Robert Hallberg, Matthew J Harrison, et al. The GFDL global ocean and sea ice model OM4.0: Model description and simulation features. Journal of Advances in Modeling Earth Systems, 11(10):3167–3211, 2019.

[22] Charles S Peskin. Numerical analysis of blood flow in the heart. Journal of Computational Physics, 25(3):220–252, 1977.

[23] Marsha Berger and Michael Aftosmis. Progress towards a Cartesian cut-cell method for viscous compressible flow. In , page 1301, 2012.

[24] Daniel Z Huang, Dante De Santis, and Charbel Farhat. A family of position- and orientation-independent embedded boundary methods for viscous flow and fluid–structure interaction problems. Journal of Computational Physics, 365:74–104, 2018.

[25] Daniel Z Huang, Philip Avery, and Charbel Farhat. An embedded boundary approach for resolving the contribution of cable subsystems to fully coupled fluid-structure interaction. International Journal for Numerical Methods in Engineering, 2020.

[26] Marsha J Berger, Phillip Colella, et al. Local adaptive mesh refinement for shock hydrodynamics. Journal of Computational Physics, 82(1):64–84, 1989.

[27] Raunak Borker, Daniel Huang, Sebastian Grimberg, Charbel Farhat, Philip Avery, and Jason Rabinovitch. Mesh adaptation framework for embedded boundary methods for computational fluid dynamics and fluid-structure interaction. International Journal for Numerical Methods in Fluids, 90(8):389–424, 2019.

[28] Jeffrey L Anderson. An ensemble adjustment Kalman filter for data assimilation. Monthly Weather Review, 129(12):2884–2903, 2001.

[29] Craig H Bishop, Brian J Etherton, and Sharanya J Majumdar. Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. Monthly Weather Review, 129(3):420–436, 2001.

[30] Ienkaran Arasaratnam and Simon Haykin. Cubature Kalman filters. IEEE Transactions on Automatic Control, 54(6):1254–1269, 2009.

[31] Simon J Julier and Jeffrey K Uhlmann. New extension of the Kalman filter to nonlinear systems. In Signal Processing, Sensor Fusion, and Target Recognition VI, volume 3068, pages 182–193. International Society for Optics and Photonics, 1997.

[32] Sharad Singhal and Lance Wu. Training multilayer perceptrons with the extended Kalman algorithm. In Advances in Neural Information Processing Systems, pages 133–140, 1989.

[33] Gintaras V Puskorius and Lee A Feldkamp. Decoupled extended Kalman filter training of feedforward layered networks. In IJCNN-91-Seattle International Joint Conference on Neural Networks, volume 1, pages 771–777. IEEE, 1991.

[34] Nikola B Kovachki and Andrew M Stuart. Ensemble Kalman inversion: a derivative-free technique for machine learning tasks. Inverse Problems, 35(9):095005, 2019.

[35] Dean S Oliver, Albert C Reynolds, and Ning Liu. Inverse theory for petroleum reservoir characterization and history matching. Cambridge University Press, 2008.

[36] Yan Chen and Dean S Oliver. Ensemble randomized maximum likelihood method as an iterative ensemble smoother. Mathematical Geosciences, 44(1):1–26, 2012.

[37] Alexandre A Emerick and Albert C Reynolds. Investigation of the sampling performance of ensemble-based methods with a simple reservoir model. Computational Geosciences, 17(2):325–350, 2013.

[38] Eric A Wan and Alex T Nelson. Neural dual extended Kalman filtering: applications in speech enhancement and monaural blind signal separation. In Neural Networks for Signal Processing VII. Proceedings of the 1997 IEEE Signal Processing Society Workshop, pages 466–475. IEEE, 1997.

[39] Alexander G Parlos, Sunil K Menon, and A Atiya. An algorithmic approach to adaptive state filtering using recurrent neural networks. IEEE Transactions on Neural Networks, 12(6):1411–1432, 2001.

[40] JH Gove and DY Hollinger. Application of a dual unscented Kalman filter for simultaneous state and parameter estimation in problems of surface-atmosphere exchange. Journal of Geophysical Research: Atmospheres, 111(D8), 2006.

[41] David J Albers, Matthew Levine, Bruce Gluckman, Henry Ginsberg, George Hripcsak, and Lena Mamykina. Personalized glucose forecasting for type 2 diabetes using data assimilation. PLoS Computational Biology, 13(4):e1005232, 2017.

[42] Marco A Iglesias, Kody JH Law, and Andrew M Stuart. Ensemble Kalman methods for inverse problems. Inverse Problems, 29(4):045001, 2013.

[43] Claudia Schillings and Andrew M Stuart. Analysis of the ensemble Kalman filter for inverse problems. SIAM Journal on Numerical Analysis, 55(3):1264–1290, 2017.

[44] Claudia Schillings and Andrew M Stuart. Convergence analysis of ensemble Kalman inversion: the linear, noisy case. Applicable Analysis, 97(1):107–123, 2018.

[45] David J Albers, Paul-Adrien Blancquart, Matthew E Levine, Elnaz Esmaeilzadeh Seylabi, and Andrew Stuart. Ensemble Kalman methods with constraints. Inverse Problems, 35(9):095007, 2019.

[46] Neil K Chada, Andrew M Stuart, and Xin T Tong. Tikhonov regularization within ensemble Kalman inversion. SIAM Journal on Numerical Analysis, 58(2):1263–1294, 2020.

[47] Alexey Kaplan, Yochanan Kushnir, Mark A Cane, and M Benno Blumenthal. Reduced space optimal analysis for historical data sets: 136 years of Atlantic sea surface temperatures. Journal of Geophysical Research: Oceans, 102(C13):27835–27860, 1997.

[48] P Brasseur, J Ballabrera-Poy, and J Verron. Assimilation of altimetric data in the mid-latitude oceans using the singular evolutive extended Kalman filter with an eddy-resolving, primitive equation model. Journal of Marine Systems, 22(4):269–294, 1999.

[49] L Gourdeau, J Verron, Thierry Delcroix, AJ Busalacchi, and R Murtugudde. Assimilation of TOPEX/Poseidon altimetric data in a primitive equation model of the tropical Pacific ocean during the 1992–1996 El Niño-Southern Oscillation period. Journal of Geophysical Research: Oceans, 105(C4):8473–8488, 2000.

[50] Michael K Tippett, Jeffrey L Anderson, Craig H Bishop, Thomas M Hamill, and Jeffrey S Whitaker. Ensemble square root filters. Monthly Weather Review, 131(7):1485–1490, 2003.

[51] Xuguang Wang and Craig H Bishop. A comparison of breeding and ensemble transform Kalman filter ensemble forecast schemes. Journal of the Atmospheric Sciences, 60(9):1140–1158, 2003.

[52] Rudolph Van Der Merwe and Eric A Wan. The square-root unscented Kalman filter for state and parameter-estimation. In , volume 6, pages 3461–3464. IEEE, 2001.

[53] TD Robinson, MS Eldred, KE Willcox, and R Haimes. Surrogate-based optimization using multifidelity models with variable parameterization and corrected space mapping. AIAA Journal, 46(11):2814–2822, 2008.

[54] Timothy MacDonald and Juan J Alonso. Multi-fidelity wing optimization utilizing 2D to 3D pressure distribution mapping in transonic conditions. In AIAA Scitech 2020 Forum, page 1295, 2020.

[55] Kevin Carlberg, Charbel Farhat, Julien Cortial, and David Amsallem. The GNAT method for nonlinear model reduction: effective implementation and application to computational fluid dynamics and turbulent flows. Journal of Computational Physics, 242:623–647, 2013.

[56] Elizabeth Qian, Boris Kramer, Benjamin Peherstorfer, and Karen Willcox. Lift & Learn: Physics-informed machine learning for large-scale nonlinear dynamical systems. Physica D: Nonlinear Phenomena, 406:132401, 2020.

[57] Sebastian Grimberg, Charbel Farhat, Radek Tezaur, and Charbel Bou-Mosleh. Mesh sampling and weighting for the hyperreduction of nonlinear Petrov-Galerkin reduced-order models with local reduced-order bases. arXiv preprint arXiv:2008.02891, 2020.

[58] Ilias Bilionis and Nicholas Zabaras. Multi-output local Gaussian process regression: Applications to uncertainty quantification. Journal of Computational Physics, 231(17):5718–5746, 2012.

[59] Emmet Cleary, Alfredo Garbuno-Inigo, Shiwei Lan, Tapio Schneider, and Andrew M Stuart. Calibrate, emulate, sample. arXiv preprint arXiv:2001.03689, 2020.

[60] Rohit K Tripathy and Ilias Bilionis. Deep UQ: Learning deep neural network surrogate models for high dimensional uncertainty quantification. Journal of Computational Physics, 375:565–588, 2018.

[61] Yuwei Fan and Lexing Ying. Solving inverse wave scattering with deep learning. arXiv preprint arXiv:1911.13202, 2019.

[62] Nicholas H Nelsen and Andrew M Stuart. The random feature model for input-output maps between Banach spaces. arXiv preprint arXiv:2005.10224, 2020.

[63] Sebastian Reich and Colin Cotter. Probabilistic forecasting and Bayesian data assimilation. Cambridge University Press, 2015.

[64] Kody Law, Andrew Stuart, and Kostas Zygalakis. Data assimilation. Cham, Switzerland: Springer, 2015.

[65] Peter Jan van Leeuwen. A variance-minimizing filter for large-scale applications. Monthly Weather Review, 131(9):2071–2084, 2003.

[66] Dean S Oliver and Yan Chen. Recent progress on reservoir history matching: a review. Computational Geosciences, 15(1):185–221, 2011.

[67] David M Livings, Sarah L Dance, and Nancy K Nichols. Unbiased ensemble square root filters. Physica D: Nonlinear Phenomena, 237(8):1021–1028, 2008.

[68] Malcolm Sambridge, Peter Rickwood, Nicholas Rawlinson, and Silvano Sommacal. Automatic differentiation in geophysical inverse problems. Geophysical Journal International, 170(1):1–8, 2007.

[69] Peng Chen, Umberto Villa, and Omar Ghattas. Taylor approximation for PDE-constrained optimization under uncertainty: Application to turbulent jet flow. PAMM, 18(1):e201800466, 2018.

[70] Isaac M Held. Pseudomomentum and the orthogonality of modes in shear flows. Journal of the Atmospheric Sciences, 42(21):2280–2288, 1985.

[71] Isaac M Held and Peter J Phillips. Linear and nonlinear barotropic decay on the sphere. Journal of the Atmospheric Sciences, 44(1):200–207, 1987.

[72] Isaac M Held and Max J Suarez. A proposal for the intercomparison of the dynamical cores of atmospheric general circulation models.