Trimmed Ensemble Kalman Filter for Nonlinear and Non-Gaussian Data Assimilation Problems
Weixuan Li†, W. Steven Rosenthal†, and Guang Lin‡

Abstract.
We study the ensemble Kalman filter (EnKF) algorithm for sequential data assimilation in a general situation, that is, for nonlinear forecast and measurement models with non-additive and non-Gaussian noises. Such applications traditionally force us to choose between inaccurate Gaussian assumptions that permit efficient algorithms (e.g., EnKF), or more accurate direct sampling methods which scale poorly with dimension (e.g., particle filters, or PF). We introduce a trimmed ensemble Kalman filter (TEnKF) which can interpolate between the limiting distributions of the EnKF and PF to facilitate adaptive control over both accuracy and efficiency. This is achieved by introducing a trimming function that removes non-Gaussian outliers which introduce errors in the correlation between the model and observed forecast and otherwise prevent the EnKF from proposing accurate forecast updates. We show for specific trimming functions that the TEnKF exactly reproduces the limiting distributions of the EnKF and PF. We also develop an adaptive implementation which provides control of the effective sample size and allows the filter to overcome periods of increased model nonlinearity. This algorithm allows us to demonstrate substantial improvements over the traditional EnKF in convergence and robustness for the nonlinear Lorenz-63 and Lorenz-96 models.
Key words. ensemble Kalman filter, nonlinear filter, non-Gaussian data assimilation, adaptive data assimilation
AMS subject classifications.
1. Introduction.
A sequential data assimilation problem involves estimating the unknown state variables of a dynamic system from a time sequence of measurement data. From a probabilistic point of view, the solution of such a problem is given by the posterior probability density function (PDF) of the states conditioned on the measurement data. Theoretically, this forms an inverse problem, and the required posterior PDF can be derived following the Bayesian filtering approach. In practice, however, a closed-form expression for the posterior PDF does not exist except for a few simple problems. Consequently, we have to resort to a numerical algorithm to seek an approximate solution.

The ensemble Kalman filter (EnKF) [7, 8] and its variants (e.g., [25, 14, 16]) are a class of widely used algorithms for sequential data assimilation problems encountered in various scientific and engineering areas, such as atmospheric science [12, 20], hydrology [22, 19], reservoir engineering [11, 1], and power systems [15, 9]. Designed as a Monte Carlo approximation of the classic Kalman filter algorithm, the EnKF employs an ensemble of model simulations
Funding:
This work was supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Applied Mathematics program as part of the Multifaceted Mathematics for Complex Energy Systems (M2ACS) project. G. Lin would like to acknowledge the support from NSF Grant DMS-1555072. Pacific Northwest National Laboratory is operated by Battelle for the DOE under Contract DE-AC05-76RL01830.

† Advanced Computing, Mathematics & Data Division, Pacific Northwest National Laboratory, Richland, WA ([email protected], [email protected]). ‡ Department of Mathematics & School of Mechanical Engineering, Purdue University, West Lafayette, IN ([email protected]).

to represent the uncertainty associated with the dynamic system under study. For linear dynamic systems with linear measurement operators and Gaussian noises, the EnKF's solution has been shown to converge, as the ensemble size becomes sufficiently large, to that of the Bayesian filter (the Bayesian filter becomes the classic Kalman filter in this scenario) [18]. However, the dynamic models in many practical applications are nonlinear, and the noise follows non-Gaussian distributions. In this situation, the result given by the EnKF in general does not converge to the correct posterior.
This is in contrast to a fully nonlinear filtering algorithm like the particle filter (PF) [2, 6], which gives the correct estimation of the posterior PDF, regardless of the linearity and Gaussianity conditions, provided that the number of particles is sufficiently large [5].

For moderately nonlinear and non-Gaussian problems, the EnKF algorithm can yield satisfactory approximate results with acceptable computational cost, whereas a classic nonlinear data assimilation algorithm such as the PF may be prohibitively expensive, as it requires simulating a much larger number of realizations to overcome the degeneracy issue (that is, the effective number of realizations shrinks rapidly after a few data assimilation steps). Considering the compromise between computational cost and estimation accuracy, the EnKF could be a preferable choice. Nevertheless, when the nonlinearity in the model is strong, and/or the noise is notably non-Gaussian, the EnKF could lead to unacceptable results. For example, it is well recognized (see [24] for example) that, to make robust inference from data with outliers, one should adopt a long-tail distribution model rather than the Gaussian distribution model on which the EnKF is built.

There have been some research efforts to combine the advantages of both the EnKF and the PF, for example, by using the Gaussian mixture filter [23] or via a two-stage hybrid update scheme [10, 4]. In another study, van Leeuwen [26] applied importance sampling to the particle filter with a proposal transition density based on the Gaussian component of the forecast and observation cross-covariance.
Also, Lei and Bickel [13] proposed to use the importance sampling method to estimate the statistical moments of the conditional PDF of the state variables (as functions with respect to the measurement variables), which are then used to construct a debiasing scheme for the ensemble update.

In this paper, we investigate a method that reduces the bias of the EnKF solution and enhances its robustness in nonlinear/non-Gaussian applications. The main novelty in this study is that we prove that the asymptotic limit distribution of the EnKF solution can be written in a special form: the weighted average of the "shifted" conditional PDFs of the state variables (see Subsection 3.3 for details). Based on this observation, we propose to correct the bias of the EnKF by multiplying the forecast joint PDF (of the state and measurement variables) with a non-negative function, termed the "trimming function", which essentially adjusts the averaging weights. Due to its simplicity and flexibility, the proposed trimmed EnKF (TEnKF) approach is widely applicable to generally nonlinear and non-Gaussian data assimilation problems. For instance, the TEnKF does not need the measurement noise to be additive, which is a required condition for the alternative methods mentioned earlier. We also show the TEnKF methodology to be sufficiently flexible to permit adaptive selection of the trimming function and variable ensemble size in sequential data assimilation steps, which can be exploited to enhance the computational efficiency.

The paper is organized as follows. In Section 2, we state the sequential data assimilation
problem in a general nonlinear non-Gaussian setting, and derive its solution (i.e., the posterior PDF of the state variables) using the Bayesian filtering approach. The algorithmic procedures of the EnKF and its asymptotic convergence are reviewed in Section 3. In Section 4, we describe the trimmed version of the EnKF algorithm with a discussion of its theoretical and practical aspects. We also provide an implementation with extensions to adaptive trimming and ensemble augmentation. Section 5 provides several numerical examples which illustrate the limiting distributions, adaptivity, and performance of the TEnKF. Concluding remarks are given in Section 6.
2. Sequential data assimilation problems.
This section briefly reviews the sequential data assimilation problem and derives its exact solution, known as the Bayesian filter, according to some basic rules of probability. The mathematical notation used in this paper follows these conventions: an uppercase bold letter (e.g., Y) represents a random vector; a lowercase bold letter (e.g., y) represents the values that a random vector takes; p_Y(y) represents the PDF of the random vector Y evaluated at y; p_{XY}(x, y) represents the joint PDF of the random vectors X and Y evaluated at (x, y); and p_{Y|X=x}(y) represents the conditional PDF of Y, given the condition that X = x. Without causing any ambiguity, p_{Y|X=x}(y) is further shortened to p_{Y|x}(y). Finally, the scalar special case of each of these terms is represented with regular typeface.

Consider a dynamic system that is described by a forecast model:

(1)  X_k = f_k(X_{k-1}, W_k),  k = 1, 2, ...,

where X_{k-1} and X_k are the (N × 1) state vectors at time steps k − 1 and k, respectively. The value of the state vector output is uncertain due to both the uncertainty in the previous state and the noisy input vector W_k, which represents the forecast model uncertainty. Also consider (M × 1) observation vectors Y_k given by the measurement model:

(2)  Y_k = h_k(X_k, V_k),  k = 1, 2, ...,

where V_k is the measurement noise. In this paper, we consider the general situation in which both f_k(·) and h_k(·) could be nonlinear functions, and in which W_k and V_k could follow non-Gaussian distributions.

In a sequential data assimilation problem, we need to estimate the state X_k at each time step k based on all the measurements that have been made up to step k: y*_1, ..., y*_k, where y*_i is the measured value of Y_i. When some measurement data are used to estimate the state, we say these data are "assimilated" into the model of the dynamic system. The solution of the data assimilation problem described above can be formally represented with p_{X_k | y*_1, ..., y*_k}(x_k), i.e., the conditional PDF of X_k given all measurements available up to and including time step k. This density is known as the posterior PDF, or the Bayesian filter solution of the data assimilation problem. This subsection gives a general derivation of this conditional PDF.
Remark 2.1: Before moving on to the solution, we note that (1) and (2) describe a first-order Markov chain if the noise vectors W_k and V_k at different time steps are independent, which is a commonly assumed condition in sequential data assimilation problems. The Markov property implies that, given the conditional PDF p_{X_{k-1} | y*_1, ..., y*_{k-1}}(x_{k-1}) obtained in step k − 1, X_k no longer depends on the previous measurements y*_1, ..., y*_{k-1}. To simplify the notation, we drop the dependence on y*_1, ..., y*_{k-1} in all the conditional PDFs throughout the discussion in the rest of the paper. For example, p_{X_{k-1} | y*_1, ..., y*_{k-1}}(x_{k-1}) is shortened to p_{X_{k-1}}(x_{k-1}), and p_{X_k | y*_1, ..., y*_k}(x_k) is shortened to p_{X_k | y*_k}(x_k), with the implication that y*_1, ..., y*_{k-1} have been assimilated in previous steps. This allows us to restrict our analysis to two neighboring time steps: k − 1 and k.

Following some basic rules in probability theory, the required conditional PDF can be obtained in three steps.

1) Determine the prior PDF of X_k from

(3)  p_{X_k}(x_k) = \int p_{X_{k-1}}(x_{k-1}) \, p_{X_k | x_{k-1}}(x_k) \, dx_{k-1}.

This PDF is termed prior in the sense that it represents the estimation of X_k prior to the assimilation of measurement data y*_k. Note that in deriving Eq. (3), we have used the sum rule, i.e., the marginal PDF with respect to a variable x_k can be calculated by summing/integrating the joint PDF over the other random variables, or

p_{X_k}(x_k) = \int p_{X_{k-1} X_k}(x_{k-1}, x_k) \, dx_{k-1},

as well as the product rule, i.e., the joint PDF equals the product of the marginal PDF and the conditional PDF:

p_{X_{k-1} X_k}(x_{k-1}, x_k) = p_{X_{k-1}}(x_{k-1}) \, p_{X_k | x_{k-1}}(x_k).

Eq.
(3) shows that the prior PDF p_{X_k}(x_k) can be calculated from two pieces of information: the posterior PDF of X_{k-1} obtained in the previous time step, and the conditional PDF p_{X_k | x_{k-1}}(x_k), also known as the transition PDF, which is defined by the forecast model Eq. (1).

2) Find the joint prior PDF of X_k and Y_k by another application of the product rule:

(4)  p_{X_k Y_k}(x_k, y_k) = p_{X_k}(x_k) \, p_{Y_k | x_k}(y_k),

where p_{X_k}(x_k) is obtained in Step 1), and p_{Y_k | x_k}(y_k) is defined by the measurement model Eq. (2).

3) Compute the required posterior PDF by fixing y_k = y*_k in the joint PDF (4) and re-normalizing the PDF such that its integral over x_k equals one:

(5)  p_{X_k | y*_k}(x_k) = c \, p_{X_k}(x_k) \, p_{Y_k | x_k}(y*_k),

where c = [\int p_{X_k}(x_k) \, p_{Y_k | x_k}(y*_k) \, dx_k]^{-1} is the normalizing constant, and p_{Y_k | x_k}(y*_k) is known as the likelihood.
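The three-step recursion above can be sketched with a particle (Monte Carlo) representation of the PDFs. The scalar random-walk model, the noise levels, and all function names below are our own illustrative assumptions, not from the paper; only the forecast/weight/renormalize structure follows Eqs. (3)-(5).

```python
import numpy as np

rng = np.random.default_rng(0)

def bayes_filter_step(x_prev, forecast, likelihood, y_obs):
    """One cycle of the Bayesian filter, Eqs. (3)-(5), with a particle
    (Monte Carlo) representation of the PDFs."""
    # Step 1: sample the prior (3) by pushing the previous posterior
    # sample through the forecast model (noise is drawn inside forecast).
    x_prior = forecast(x_prev)
    # Steps 2-3: weight by the likelihood p_{Y_k|x_k}(y*) and renormalize (5).
    w = likelihood(x_prior, y_obs)
    w = w / w.sum()
    # Resample to return an equally weighted posterior ensemble.
    idx = rng.choice(x_prior.size, size=x_prior.size, p=w)
    return x_prior[idx]

# Hypothetical scalar toy model: random-walk forecast with noise std 0.1,
# direct observation of the state with Gaussian noise std tau.
tau = 0.5
forecast = lambda x: x + 0.1 * rng.standard_normal(x.size)
likelihood = lambda x, y: np.exp(-0.5 * ((y - x) / tau) ** 2)

x0 = 2.0 * rng.standard_normal(5000)          # posterior sample at step k-1
x_post = bayes_filter_step(x0, forecast, likelihood, y_obs=1.0)
```

For this linear Gaussian toy the posterior is available in closed form, so the resampled ensemble can be checked against the exact Kalman answer; for the general nonlinear/non-Gaussian models of this paper, no such closed form exists, which is the motivation for the methods that follow.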
3. Ensemble Kalman filter.
The posterior PDF given by the Bayesian filter, i.e., Eqs. (3)-(5), in general cannot be explicitly and analytically expressed when the forecast and measurement models are nonlinear, or when the distributions of the noises are non-Gaussian. Instead, this PDF may be approximately represented with the empirical distribution of n sample points, i.e., a Monte Carlo solution. One commonly used method to generate such a sample is the EnKF. In this section, we review the EnKF algorithm with a discussion on its convergence. This algorithm solves a sequential data assimilation problem through iterations of a two-step process, first sampling from the prior PDF (known as the forecast step in the literature), and then sampling from the posterior PDF (known as the update step) at each time step k. These time steps coincide with times when the system is measured.

In the forecast step of the EnKF at time step k, n sample points from the joint prior PDF p_{X_k Y_k}(x_k, y_k) can be obtained by simulating n realizations of the forecast model Eq. (1):

(6)  X^i_k = f_k(X^i_{k-1}, W^i_k),  i = 1, ..., n,

and the measurement model Eq. (2):

(7)  Y^i_k = h_k(X^i_k, V^i_k),  i = 1, ..., n,

where X^i_{k-1} are independent and identically distributed (i.i.d.) sample points from p_{X_{k-1}}(x_{k-1}) (estimated in the previous time step), and W^i_k and V^i_k are i.i.d. sample points from their respective distributions (given as part of the problem statement).

Remark 3.1: Since every random quantity involved in the update step is associated with time step k, we drop the subscript k throughout the rest of the paper to further simplify the notation.
For example, X_k, y*_k, and p_{X_k | y*_k}(x_k) are shortened to X, y*, and p_{X | y*}(x), respectively, without further clarification.

While sampling from the prior PDF in the forecast step is rather straightforward, sampling from the posterior PDF p_{X_k | y*_k}(x_k) in the update step is tricky because of the extra condition that needs to be satisfied: Y = y*. Indeed, the distinct feature of the EnKF algorithm in comparison with other sample-based data assimilation methods (e.g., a PF) is its linear update scheme, shown below:

(8)  \tilde{X}^i = X^i + K(y* − Y^i),  i = 1, ..., n,

where \tilde{X}^i approximately follows the target posterior PDF (we will explain the approximation in the next subsection). K = C_{XY} C_{YY}^{-1} is the Kalman gain, where C_{XY} is the covariance matrix between the random vectors X and Y, and C_{YY} is the covariance matrix of the random vector Y.

Remark 3.2: In many EnKF studies, it is assumed that the measurement error is additive: Y = h(X) + V, and the error V is independent of X. Thus, the covariance matrices needed in the calculation of the Kalman gain become C_{XY} = C_{Xh} and C_{YY} = C_{hh} + R, where C_{Xh} is the covariance matrix between X and h(X), C_{hh} is the covariance matrix of h(X), and R is the covariance matrix of V (assumed to be known). Additive measurement noise also implies that the update scheme (8) becomes

(9)  \tilde{X}^i = X^i + K[y* − (h(X^i) + V^i)].

Note that often in the EnKF literature the update scheme is given in a different form:

(10)  \tilde{X}^i = X^i + K[(y* + V^i) − h(X^i)],

that is, the error realizations V^i are added to the measurement value y*, which is the procedure known as measurement perturbation [3]. When the error follows a symmetric PDF, i.e., p_V(v) = p_V(−v), (9) and (10) are statistically equivalent. In a general situation, we use the update scheme (8), since it assumes neither additive nor symmetrically distributed measurement noise.
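The update scheme (8) can be sketched compactly, with the gain estimated from ensemble sample covariances (as discussed in Remark 3.3 below). The function name and the linear Gaussian sanity check are our own; in that check the exact Kalman gain is 1/2, so the updated ensemble mean should be near 1 for an observation y* = 2.

```python
import numpy as np

def enkf_update(X, Y, y_obs):
    """EnKF update (8) using the sampled Kalman gain (11).

    X : (N, n) forecast state ensemble, columns X^i from Eq. (6)
    Y : (M, n) forecast measurement ensemble, columns Y^i from Eq. (7)
    """
    n = X.shape[1]
    Xc = X - X.mean(axis=1, keepdims=True)     # state anomalies
    Yc = Y - Y.mean(axis=1, keepdims=True)     # measurement anomalies
    C_xy = Xc @ Yc.T / (n - 1)                 # sample estimate of C_XY
    C_yy = Yc @ Yc.T / (n - 1)                 # sample estimate of C_YY
    K = C_xy @ np.linalg.inv(C_yy)             # \hat{K} = \hat{C}_XY \hat{C}_YY^{-1}
    return X + K @ (np.asarray(y_obs, dtype=float).reshape(-1, 1) - Y)

# Linear Gaussian check: X ~ N(0, 1), Y = X + V with V ~ N(0, 1),
# so K = 1/2 exactly and the posterior given y* = 2 is N(1, 1/2).
rng = np.random.default_rng(0)
X = rng.standard_normal((1, 20_000))
Y = X + rng.standard_normal((1, 20_000))
X_upd = enkf_update(X, Y, y_obs=[2.0])
```

Note that each ensemble member is shifted by the same gain applied to its own innovation y* − Y^i, so the update preserves ensemble spread consistent with the posterior in the Gaussian case.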
Remark 3.3: In general nonlinear and non-Gaussian problems, the covariance matrices C_{XY} and C_{YY} cannot be analytically derived, and thus, the Kalman gain K is not known exactly. In practice, we can use the sample estimates \hat{C}_{XY} and \hat{C}_{YY} calculated from the forecast ensemble. Then K may be approximated with

(11)  \hat{K} = \hat{C}_{XY} \hat{C}_{YY}^{-1},

and \tilde{X}^i may be approximated with \hat{X}^i = X^i + \hat{K}(y* − Y^i). By the law of large numbers, as the sample size n → ∞, we have that \hat{K} and \hat{X}^i converge in probability to K and \tilde{X}^i, respectively.

It has been well recognized that the sample points given by the EnKF update scheme (8) need not converge to the correct posterior PDF asymptotically for the general nonlinear/non-Gaussian data assimilation problem. Intuitively, this is because the Kalman gain is calculated only from covariance matrices instead of using the full information described by the joint prior PDF. The proposition below gives the limit PDF (as the ensemble size n → ∞) of the EnKF sample. In particular, we show that this limit PDF is equal to the weighted average of the "shifted" conditional PDFs conditioned on different observation values. This result provides an insight into how to correct the asymptotic bias in the EnKF algorithm.

Proposition 3.1.
The PDF of \tilde{X} = X + K(y* − Y) is

(12)  p_{\tilde{X}}(\tilde{x}) = \int p_{X|y}(\tilde{x} − K(y* − y)) \, p_Y(y) \, dy.

Proof.
See Appendix A. ∎
Proposition 3.1 shows that the limit PDF of the EnKF solution can be obtained by first shifting the conditional PDF p_{X|y}(x) by K(y* − y) for different conditioning values of y, and then averaging the shifted conditional PDFs with respect to the weighting distribution p_Y(y), i.e., the marginal PDF of Y. Note that when y = y*, the shifted conditional PDF p_{X|y}(\tilde{x} − K(y* − y)) becomes p_{X|y*}(\tilde{x}), which is exactly our target posterior PDF, the Bayesian filter solution for the general nonlinear/non-Gaussian problem. However, for other values of y, the shifted conditional PDFs in general are different from the target posterior, and hence, their weighted average is not guaranteed to equal the target posterior. One exceptional situation is when X and Y follow a joint Gaussian distribution (usually this is not true if the forecast or the measurement models are nonlinear, or if the noises are non-Gaussian), for which we can easily prove that the shifted conditional PDF p_{X|y}(\tilde{x} − K(y* − y)) = p_{X|y*}(\tilde{x}) for any value of y, and as a result, the limit PDF of the EnKF solution is exactly the target posterior.
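The Gaussian exceptional case can be checked numerically. The toy model and its covariances below are our own (not from the paper): for jointly Gaussian (X, Y), the linear map \tilde{X} = X + K(y* − Y) should reproduce the analytic conditional p_{X|y*} in both mean and variance.

```python
import numpy as np

# Monte Carlo check of the Gaussian special case: X ~ N(0, sx^2) and
# Y = X + V with V ~ N(0, sv^2), so (X, Y) is jointly Gaussian and the
# shifted conditionals p_{X|y}(x - K(y* - y)) all coincide with p_{X|y*}.
rng = np.random.default_rng(1)
n = 400_000
sx, sv, y_star = 2.0, 1.0, 3.0
X = sx * rng.standard_normal(n)
Y = X + sv * rng.standard_normal(n)

K = sx**2 / (sx**2 + sv**2)            # exact C_XY / C_YY for this model
X_upd = X + K * (y_star - Y)           # the EnKF map, Eq. (8)

# Analytic conditional p_{X|y*} is N(K y*, (1 - K) sx^2).
mean_err = abs(float(X_upd.mean()) - K * y_star)
var_err = abs(float(X_upd.var()) - (1.0 - K) * sx**2)
```

The variance identity (1 − K)² sx² + K² sv² = (1 − K) sx² is what makes the spread come out exactly right here; it fails once the joint distribution is non-Gaussian, which is the bias the trimming procedure of the next section targets.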
4. A trimmed EnKF for nonlinear and non-Gaussian problems.
In the previous section, we investigated the convergence of the EnKF algorithm and discussed its bias from the exact posterior PDF. This section introduces a trimming procedure that reduces the bias. We first give the theoretical analysis of the trimming procedure in Subsection 4.1 and then discuss its practical implementation in the following subsections.
Proposition 3.1 shows that the limit PDF of the EnKF solution is an average of the shifted conditional PDFs, weighted by the marginal PDF of Y. Motivated by this observation, we propose a modification of the EnKF algorithm that allows us to adjust the averaging weight, and thus reduces the bias in the posterior estimate.

The implementation steps of the new algorithm parallel those of the EnKF described in Sections 3.1 and 3.2, except that we now introduce an adjusted joint PDF obtained by multiplying a non-negative function t(y) to the original joint PDF p_{XY}(x, y):

(13)  p^t_{XY}(x, y) = c_t \, t(y) \, p_{XY}(x, y),

where c_t is the normalizing constant that ensures the integral of the adjusted joint PDF equals 1. Suppose we can draw sample points (X^i_t, Y^i_t) following the adjusted joint PDF (13). Then, similar to (8), the updated sample points are obtained by

(14)  \tilde{X}^i_t = X^i_t + K(y* − Y^i_t).

Similar to (12), we have the following proposition regarding the PDF of the sample points generated with (14).
Proposition 4.1.
The PDF of \tilde{X}_t = X_t + K(y* − Y_t), where X_t and Y_t follow the adjusted joint prior (13), is

(15)  p^t_{\tilde{X}}(\tilde{x}) = \int p_{X|y}(\tilde{x} − K(y* − y)) \, p_Y(y) \, c_t \, t(y) \, dy.

Proof.
See Appendix B. ∎
Comparing (15) with (12), we see that the averaging weight has become p_Y(y) c_t t(y), which can be adjusted by choosing different functions t(y). In particular, to reduce the bias between the limit PDF (15) and the true posterior p_{X|y*}(x), we want to put more weight on the values of y that are close to the measurement y*, and less weight on the values of y that are far from this measurement. Thus, we choose t(y) such that it is monotonically decreasing with respect to the distance between y and y*. For example, one may choose

(16)  t(y) = exp[−d(y, y*)/λ],

where λ is a positive constant, and d(y, y*) is a measure of the distance between y and y*. For instance, we choose the L1 distance (normalized with the prior sample standard deviation of each state variable) in the numerical experiments in our study:

(17)  d(y, y*) = \sum_{j=1}^{N} |y_j − y*_j| / \hat{σ}_j.

Intuitively, multiplying such a function t(y) into the original joint prior leads to a partial "trimming off" of the density distribution in the regions that are inconsistent with the measurement data. Hence, we refer to t(y) as the "trimming function", and the modified version of the EnKF algorithm as the trimmed EnKF, or TEnKF.

One can control how much to trim by adjusting the trimming function. Consider t(y) = exp[−d(y, y*)/λ] for example. A large λ results in a mild trim. In the extreme case, as λ → ∞, c_t t(y) → 1, which implies zero trim. From (15) we see that p^t_{\tilde{X}}(x) → p_{\tilde{X}}(x) in this situation, i.e., the limiting posterior distribution of the TEnKF converges to that of the EnKF. On the other hand, a small λ results in a significant trim. In this extreme case, as λ →
0, then c_t t(y) → δ(y* − y)/p_Y(y*) (where δ is the Dirac delta function), which exemplifies the maximum possible trim. From (15) we see that p^t_{\tilde{X}}(x) → p_{X|y*}(x), i.e., the true posterior PDF for the Bayesian filter problem. We will demonstrate how the choice of trimming function interpolates between the limiting distribution of the EnKF and the true posterior for a simple test problem in Subsection 5.1.

We discussed in the previous subsection how the trimming procedure affects the limit distribution of the EnKF. In this and the following subsections, we focus on the practical implementation details and give the complete description of the TEnKF algorithm.

The first task in implementing the TEnKF algorithm is to sample from the trimmed joint prior PDF (13). A straightforward way to achieve such a sample is to implement an importance sampling procedure using the untrimmed joint prior as the proposal; that is, we associate each sample point realization (X^i, Y^i) obtained with (6) and (7) with a weight proportional to the trimming function:

(18)  w_i = t(Y^i) / \sum_{j=1}^{n} t(Y^j).

To generate equally weighted sample points, we further apply a bootstrapped resampling to the weighted sample points using these weights as selection probabilities. In other words, for each i = 1, ..., n, the probability that the i-th member of the trimmed sample is taken to be the j-th member of the untrimmed sample is w_j, and duplicates in the trimmed sample are permitted.

A critical step in the implementation of the TEnKF is the selection of the trimming function t(y) that results in a satisfactory balance between accuracy and efficiency. In practice, the selection may be made from a family of functions with a tuning parameter that controls the level of trim. For instance, we can choose from the family t(y; λ) = exp[−d(y, y*)/λ] by tuning the parameter λ. As discussed in
Subsection 4.1, a larger trim (i.e., a smaller λ) helps reduce the bias in the TEnKF estimate. However, in practice, a larger trim may also lead to sample degeneracy (similar to that of the PF), as a large portion of the ensemble members are given negligible weights and trimmed off. To deal with this trade-off, we design an adaptive algorithm to automatically tune the trimming function such that the filter maintains a sufficient effective ensemble size n_e after the trimming.

For n weighted ensemble members, the effective ensemble size can be measured by

(19)  n_e = [\sum_{i=1}^{n} w_i^2]^{-1},

where w_i, as defined in Eq. (18), are the trimming weights corresponding to a parameter value of λ. Note that n_e equals n for untrimmed (equally weighted) ensemble members, and decreases as λ becomes smaller (larger trim). In practice, we perturb λ in some iterative scheme until n_e is close to a target effective ensemble size n*_e.

The advantage of this algorithm is that it allows adaptive tuning of the trimming function based on available computational resources. We summarize this procedure in Algorithm 1, which maintains a specified effective ensemble size by automatically adjusting the tuning parameter. When only a relatively small number of ensemble members can be simulated, the algorithm can enforce a mild trim (large λ), and thus behave more like the EnKF. On the other hand, if we can simulate an ensemble size that is significantly larger than the target n*_e, the algorithm will automatically adopt a significant trim (small λ) to reduce the bias in the posterior estimate. We will illustrate this adaptive behavior in Subsection 5.3.

Algorithm 1
TEnKF
1. Given the (untrimmed) forecast ensemble (X^i, Y^i), i = 1, ..., n, the measurement value y*, the distance measure function d(·, y*), and the trimming parameter λ:
2. Calculate the Kalman gain \hat{K} with (11).
3. Compute the weights w_i for the ensemble members with (16) and (18).
4. Compute the effective ensemble size n_e with (19).
5. Decrease/increase λ if n_e is greater/smaller than the target n*_e, until n_e ≈ n*_e.
6. Obtain the trimmed ensemble (X^j_t, Y^j_t), j = 1, ..., n, by bootstrapped resampling of (X^i, Y^i) with respect to the trimming weights w_i.
7. Compute the updated state ensemble \tilde{X}^i_t, i = 1, ..., n, with (14).

Other metrics can be applied alongside the trimming function to further control the effective ensemble size and to enhance the computational efficiency. We note that the level of nonlinearity/non-Gaussianity in the model often varies over different data assimilation cycles. An efficient algorithm should deploy more resources when needed to maintain accuracy and prevent degeneracy, and operate with less computational cost during more linear/Gaussian intervals. We take advantage of the flexibility of the TEnKF and propose adaptively increasing the ensemble size n prior to the trimming step. Specifically, we examine the number of forecast realizations that are within a distance d_max from the observations:

(20)  n_d = \sum_{i=1}^{n} 1[d(Y^i, y*) < d_max].

5. Numerical examples. We demonstrate the properties and efficacy of the TEnKF and the aforementioned algorithms in several numerical examples. Using two well-known numerical models due to Lorenz [17, 21], we illustrate how Proposition 4.1 implies the TEnKF posterior estimate interpolates between those of the EnKF and PF in the large-n limit. Additional numerical exercises show that the TEnKF restores the convergence of the EnKF with increasing ensemble size n as model nonlinearities increase.
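The steps of Algorithm 1 can be sketched as a single analysis function. This is only a sketch under our own assumptions: the paper leaves the iterative scheme of step 5 unspecified, so we use bisection on λ, and we normalize the L1 distance (17) by the forecast measurement-ensemble standard deviations.

```python
import numpy as np

def tenkf_update(X, Y, y_obs, n_e_target, rng):
    """One TEnKF analysis step, a sketch of Algorithm 1.

    X, Y : (N, n) and (M, n) untrimmed forecast ensembles from (6)-(7).
    Uses t(y) = exp[-d(y, y*)/lambda], Eq. (16), with the normalized L1
    distance (17), tuning lambda until n_e is close to n_e_target.
    """
    n = X.shape[1]
    y_obs = np.asarray(y_obs, dtype=float).reshape(-1, 1)

    # Step 2: Kalman gain (11) from the untrimmed forecast ensemble.
    Xc = X - X.mean(axis=1, keepdims=True)
    Yc = Y - Y.mean(axis=1, keepdims=True)
    K = (Xc @ Yc.T) @ np.linalg.inv(Yc @ Yc.T)

    # Normalized L1 distances d(Y^i, y*), Eq. (17).
    sigma = Y.std(axis=1, ddof=1, keepdims=True)
    d = (np.abs(Y - y_obs) / sigma).sum(axis=0)

    def weights_and_ess(lam):
        w = np.exp(-(d - d.min()) / lam)    # t(Y^i), Eq. (16); shift for stability
        w = w / w.sum()                     # Eq. (18)
        return w, 1.0 / np.sum(w ** 2)      # Eq. (19)

    # Steps 3-5: n_e increases monotonically with lambda, so bisection
    # (in log space) is one workable version of the iterative tuning.
    lo, hi = 1e-4, 1e4
    for _ in range(60):
        lam = np.sqrt(lo * hi)
        w, n_e = weights_and_ess(lam)
        lo, hi = (lam, hi) if n_e < n_e_target else (lo, lam)

    # Step 6: bootstrapped resampling with the trimming weights.
    idx = rng.choice(n, size=n, p=w)
    # Step 7: linear update of the trimmed ensemble, Eq. (14).
    return X[:, idx] + K @ (y_obs - Y[:, idx])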
We also examine how well adaptive control of the ensemble size prior to trimming allows the filter to push through transient nonlinearities without sacrificing accuracy.

We first consider a simple example that intuitively illustrates how we can correct the bias in the EnKF posterior estimate with the trimming procedure. The Lorenz-63 model [17], with an additive stochastic noise term to represent model uncertainty, is given by

(21)  dx_1/dt = α(x_2 − x_1) + ξ_1(t),
      dx_2/dt = x_1(ρ − x_3) − x_2 + ξ_2(t),
      dx_3/dt = x_1 x_2 − β x_3 + ξ_3(t),

for t ≥ 0, where ξ_j(t), j = 1, ..., 3, are independent Gaussian white noise processes with covariance σ² δ_{s,t} for all s ≥ 0, where δ is the Kronecker δ-function. We obtain solution trajectories of the stochastic differential equation (SDE) numerically using the O(∆t) strongly/weakly accurate stochastic Heun method, which employs a trapezoidal discretization of the deterministic part of the integral of Eq. (21), and an Euler discretization of the stochastic part. The parameters of this model and the following data assimilation problem are given in Figure 1.

For certain parameter values, solutions of the deterministic equations (σ = 0) are known to be sensitive to initial conditions, such that small perturbations due to numerical round-off or other disturbances cause chaotic trajectories. We consider the Bayesian filtering problem to estimate the distribution of the model state at time t = t_1 from a direct, noisy observation of the second component, y = x_2 + ε, where ε ∼ N(0, τ²). An independent Gaussian prior distribution is assumed for each state variable initial condition, x_j(0) ∼ N(x_{j,0}, σ²_{j,0}), j = 1, ..., 3.

(Figure 1: limiting posterior PDFs for the Lorenz-63 example, with the model and filter parameters, including α = 10, ρ = 28, β = 8/3.)

Figure 1 compares the limiting posterior estimates of the EnKF, the TEnKF for several values of the trimming parameter λ, and the bootstrapped PF.
The latter is known to provide the exact Bayesian filter solution, and a short description of this algorithm is as follows. The forecast ensemble is determined from a Monte Carlo solution to the original Lorenz-63 SDE at t = t_1. Then bootstrapped resampling is applied to the forecast ensemble, with weights as defined in Subsection 4.2 and the trimming function replaced by the likelihood function L(y^i) = exp[−(y^i − d)² / (2τ²)]. For each method, a sufficiently large ensemble size n is used to accurately resolve the limiting (n → ∞) posterior PDFs (Figure 1).

The forecast from the nonlinear Lorenz-63 SDE model can be highly non-Gaussian, despite Gaussian assumptions for the model and initial condition uncertainties. This is the case for the Lorenz-63 SDE example in Figure 1, where periodically trajectories can randomly switch orbits around one of two different states. If the observations are not sufficiently frequent and accurate, the forecast becomes bimodal. Since the EnKF algorithm measures only the Gaussian component of the forecast, the joint distribution between the forecast and observations will be inaccurate, and this can skew the posterior distribution (Figure 1). The joint correlations are most inaccurate for forecast members furthest from the observations. However, as these members are trimmed away in the TEnKF algorithm, the joint distribution approaches that of the PF for sufficiently large n. To wit, as the trimming parameter λ shrinks, the limiting posterior PDF of the TEnKF approaches the exact solution determined from the bootstrapped PF.

As discussed earlier, nonlinear models introduce non-Gaussian forecast perturbations which can be difficult to correct in the posterior using a Gaussian filter like the standard EnKF. When observations occur less frequently, the increased forecast interval allows the nonlinearities to have a stronger effect.
Under such conditions, the EnKF can fail to decrease estimation errors as the ensemble size is increased. On the other hand, a nonlinear/non-Gaussian filter like the TEnKF can restore convergence by trimming outliers and correcting the limiting posterior distribution.

In the examples which follow, we integrate error in posterior estimates over all ensemble members, rather than only the mean of the posterior estimate. Over N_rep runs of the experiment, we analyze the predictive accuracy of the posterior estimates provided by the TEnKF by assimilating a time series of N_t = t_f / ∆t_obs data points spaced ∆t_obs seconds apart. The error in the posterior estimate can be computed from the root-mean-square distance of the filter ensemble from the true system state, measured over all state dimensions j = 1, ..., N. If x^i_{j,k} and x^t_{j,k} are the j-th elements of the i-th ensemble member and truth state vectors, respectively, at time t_k, then

(22) E_{m,k} = [ (1/n) Σ_{i=1}^{n} (1/N) Σ_{j=1}^{N} ( x^i_{j,k} − x^t_{j,k} )² ]^{1/2},

(23) E_m = [ (1/N_t) Σ_{k=1}^{N_t} E²_{m,k} ]^{1/2},

for m = 1, ..., N_rep, are samples from the distribution of time-series (22) and time-averaged (23) posterior root-mean-square errors (RMSE).

To show the applicability of the TEnKF methodology for nonlinear/non-Gaussian filtering at larger scale, we test Algorithm 1 on the Lorenz-96 model [21]. (See Figure 2 for parameters.) This system of nonlinearly-coupled ordinary differential equations is given by

(24) dx_j/dt = −x_{j−2} x_{j−1} + x_{j−1} x_{j+1} − x_j + F + ξ_j(t)

for j = 1, ..., N and forcing constant F > 0. Like the noisy Lorenz-63 model discussed earlier, each noise term ξ_j(t) is an independent Gaussian white noise process with variance σ². (See the example in Subsection 5.1.) Lorenz introduced the deterministic version of the Lorenz-96 system (σ = 0) and showed that for N = 36 and F = 8, this system exhibits chaotic trajectories.
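The error metrics (22) and (23) amount to a root-mean-square over members and state dimensions, then a root-mean-square over assimilation times. A minimal sketch, using illustrative function names:

```python
import numpy as np

def ensemble_rmse(ensemble, truth):
    """Time-series RMSE E_{m,k} of Eq. (22): root-mean-square distance of
    the whole ensemble from the truth state, averaged over the n members
    and the N state dimensions.

    ensemble : (n, N) array at one assimilation time t_k
    truth    : (N,) truth state at the same time
    """
    return np.sqrt(np.mean((ensemble - truth) ** 2))

def time_averaged_rmse(e_series):
    """Time-averaged RMSE E_m of Eq. (23): root-mean-square of the
    sequence E_{m,1}, ..., E_{m,N_t}."""
    e_series = np.asarray(e_series)
    return np.sqrt(np.mean(e_series ** 2))
```

Averaging over all members, rather than scoring only the ensemble mean, penalizes both bias and spurious spread in the posterior ensemble.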
We take direct, noisy observations of the system state at every odd-indexed component, or

(25) y = h(x) + ε = (x_1, x_3, ..., x_{N−1}) + ε,

with ε ∼ N(0, τ² I_{N/2}), where I_s is the s × s identity matrix. In order to prevent degeneracy due to excessive initial errors, we draw the initial condition (IC) from a noisy observation of the truth at time t = 0. The truth and the unobserved components (k even) of the ensemble member ICs are drawn from the same Gaussian distribution, N(μ_0 + μ_1 · z, σ²_0), where z ∼ N(0, 1) is constant for any given experiment. For the observed components (k odd), the ICs are drawn from the likelihood distribution at t = 0 by taking x_{2k−1}(0) ∼ N(y_{0,k}, τ²), k = 1, ..., N/2.

Figure 2: (a) EnKF and TEnKF (Algorithm 1) performance on the noisy Lorenz-96 model over time. (b) Parameters for this experiment.

In this experiment the EnKF develops a bias in forecast variance that, despite a large ensemble and accurate observations, does not enable the filter to keep track of the truth trajectory. We see these relative differences in tracking proficiency persist over many repetitions of the experiment with different truth realizations (Figure 3). For observation intervals ∆t_obs ≤ 0.70, the forecast is sufficiently Gaussian that the performance improvement from the TEnKF is incremental. However, over longer observation intervals the median prediction error is smaller for the TEnKF with any ensemble size larger than 200. The uncorrected bias from non-Gaussian outliers spurred by the model nonlinearities degrades the EnKF posterior estimate, which worsens as ∆t_obs is increased. This bias cannot be corrected simply by increasing the ensemble size, and more accuracy is regained by trimming these outliers than is lost due to a smaller effective sample size.
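The Lorenz-96 tendency (24) and the half-state observation operator (25) are straightforward to vectorize with periodic index shifts. A sketch under our own naming (the authors' implementation is not shown in the text):

```python
import numpy as np

def lorenz96_drift(x, F=8.0):
    """Deterministic Lorenz-96 tendency of Eq. (24) with periodic boundary
    conditions: dx_j/dt = (x_{j+1} - x_{j-2}) x_{j-1} - x_j + F."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def observe_odd(x, eps=None):
    """Observation operator of Eq. (25): every odd-indexed component
    (1-based), i.e. (x_1, x_3, ..., x_{N-1}), plus optional noise eps."""
    y = x[0::2].copy()
    if eps is not None:
        y = y + eps
    return y
```

Note that the uniform state x_j ≡ F is an (unstable) fixed point of the deterministic system, since the quadratic terms cancel and −F + F = 0; chaos develops from perturbations around it.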
The adaptive procedure in Algorithm 1 maintains a minimum effective sample size, but it risks over-trimming whenever the nonlinearities in the model are weak. The algorithm also risks under-trimming if the forecast distribution requires an ensemble larger than we initially used, since each ensemble member carries more weight and is less dispensable. The adaptive ensemble size procedure described in Subsection 4.4 enables the forecast ensemble to be enlarged to meet a target effective sample size. We apply this modification to the Lorenz-96 problem. In particular, we consider a deterministic version of the system (σ = 0), and accelerate forecast generation with a Runge-Kutta 4th-5th adaptive integration scheme. The distance measure in (20) is chosen to be d(y, y*) = max_j |y_j − y*_j|. To compute the new forecast members, we perturb the new initial conditions with N(0, σ²_p) noise independently in each dimension.

Our exhibit of one experimental run shows how the variation of ensemble sizes allows the TEnKF to adjust to time-varying levels of nonlinearity (Figure 4).

Figure 3: Effect of increasing nonlinearity (∆t_obs ↑) on the median and inter-quartile range of the time-averaged RMSE errors from the EnKF and TEnKF applied to the noisy Lorenz-96 model over 500 runs. Parameters are given in Figure 2.

Without model error, an ensemble size of n = 200 is typically sufficient for a Gaussian filter to track the truth trajectory. However, at several points there are sufficient non-Gaussian outliers generated by the nonlinear model and initial variance to prevent the EnKF from making correct forecast updates, despite having relatively accurate measurements.
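As we read Subsection 4.4, the augmentation loop keeps adding perturbed-IC forecast members until enough fall within the max-norm threshold d_max of the observations, capped at r_max times the base ensemble size. The sketch below is our own interpretation with hypothetical names (`augment_until_effective`, `forecast_fn`), not the paper's Algorithm 1.

```python
import numpy as np

def augment_until_effective(forecast_fn, ics, y_obs, d_max, n_target,
                            r_max, sigma_p, rng):
    """Adaptive ensemble augmentation (sketch): while too few observed
    forecast members satisfy d(y, y*) = max_j |y_j - y*_j| <= d_max,
    perturb the initial conditions with N(0, sigma_p^2) noise and forecast
    additional members, up to r_max times the base ensemble size n.

    forecast_fn : maps one initial condition to its observed forecast y
    ics         : (n, d) array of base initial conditions
    """
    n = len(ics)
    ens = np.array([forecast_fn(x0) for x0 in ics])
    keep = np.max(np.abs(ens - y_obs), axis=1) <= d_max
    while keep.sum() < n_target and len(ens) < r_max * n:
        new_ics = ics + sigma_p * rng.standard_normal(ics.shape)
        new = np.array([forecast_fn(x0) for x0 in new_ics])
        ens = np.vstack([ens, new])
        keep = np.max(np.abs(ens - y_obs), axis=1) <= d_max
    return ens, keep
```

The returned mask marks the members that survive the distance test; trimming then operates on the augmented ensemble so the effective sample size stays near the target.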
By allowing the ensemble to enlarge up to 3n, and then disposing of enough corrupting outliers from the non-Gaussian forecast, the TEnKF is capable of tracking the nonlinear Lorenz-96 model without much more effort over time than the standard EnKF.

Figure 5: Effect of increasing nonlinearity (∆t_obs) on the time-averaged increase in ensemble size (n_aug / n) by the TEnKF with adaptive ensemble augmentation, over 250 repetitions of the experiment. Parameters in Figure 4.

To assess the impact of adaptive ensemble augmentation, we examine the rate at which nonlinearities force the TEnKF to augment the ensemble size over several repetitions of the deterministic Lorenz-96 experiment (Figure 5). An ensemble size of n = 200 within an observation threshold distance of d_max = 3 from the measurements was sufficient to enable the filter to track the deterministic Lorenz-96 system. To ensure a robust algorithm, the maximum augmentation ratio r_max need only be so large that the relative increase in average ensemble size with respect to r_max is negligible. At prediction times of ∆t_obs = 0.80, r_max ≈ 3, and n_aug ≈ n · r_max ≈ 600 ensemble members are needed to handle the transient nonlinearities while maintaining the effective ensemble size. The computational cost increases rapidly with forecast length. An exponential extrapolation of the ∆t_obs = 0.90 case suggests an average augmented ensemble size as large as 1500, and a required capacity greater than 4000 members. However, increasing the minimum ensemble size may temper these requirements to some degree.

6. Conclusions. We have introduced a trimmed ensemble Kalman filter, or TEnKF, developed as an extension of the ensemble Kalman filter to solve sequential nonlinear non-Gaussian Bayesian inverse problems.
This algorithm uses a "trimming" function to identify outliers in the observed forecast which contribute to errors in the correlation between the forecast and likelihood ensembles when the forecast is significantly non-Gaussian. For specific trimming functions, we show the TEnKF exactly reproduces the limiting distributions of both the standard EnKF and a particle filter with bootstrapped resampling (i.e., the exact Bayesian posterior) on a nonlinear, non-Gaussian test problem. A one-parameter family of trimming functions allows us to interpolate between these limiting distributions to adaptively balance accuracy and efficiency.

An implementation of the TEnKF methodology is presented with adaptive control of the trimming function and the effective ensemble size. Through numerical experiments on stochastic versions of the 3-dimensional Lorenz-63 model and the 36-dimensional Lorenz-96 model, we show the methods restore convergence to the true posterior in cases where the EnKF fails to converge. We also extended the TEnKF to use adaptive ensemble augmentation to overcome transient increases in model nonlinearity and improve efficiency over smoother intervals.

The efficiency results and the flexibility of the TEnKF algorithm present an opportunity to integrate other methods that increase the number of significant ensemble members. For example, consider importance sampling and steering techniques which draw ensemble members toward observations. We may further improve accuracy in uncertainty calculations by combining these with the TEnKF to further increase the effective ensemble size, and then remove any non-Gaussian outliers.

Appendix A. Proof of Proposition 3.1. To derive the PDF of X̃, we consider the mapping from (X, Y) to (X̃, Y). The Jacobian matrix of this mapping is

J = [ I  −K ; 0  I ],

where I is the identity matrix. It is easy to verify that the determinant of J is |J| = 1.
So we have the joint PDF of (X̃, Y):

(26) p_{X̃Y}(x̃, y) = p_{XY}(x, y) |J|^{−1} = p_{XY}(x, y).

Using the product rule p_{XY}(x, y) = p_{X|y}(x) p_Y(y), and substituting the relationship x = x̃ − K(y* − y) into Eq. (26), we have

(27) p_{X̃Y}(x̃, y) = p_{X|y}(x̃ − K(y* − y)) p_Y(y).

Finally, by integrating out y, we have the marginal PDF of X̃:

p_X̃(x̃) = ∫ p_{X|y}(x̃ − K(y* − y)) p_Y(y) dy. □

Appendix B. Proof of Proposition 4.1. By Proposition 3.1, we have

(28) p^t_X̃(x̃) = ∫ p^t_{X|y}(x̃ − K(y* − y)) p^t_Y(y) dy,

where p^t_{X|y}(·) and p^t_Y(·) are the conditional PDF and the marginal PDF determined by the adjusted joint PDF (13).

By the definition of (13) and the sum rule (i.e., the marginal PDF can be calculated by summing/integrating the joint PDF over the other random variables), we have

(29) p^t_Y(y) = ∫ p^t_{XY}(x, y) dx = ∫ c_t t(y) p_{XY}(x, y) dx = c_t t(y) ∫ p_{XY}(x, y) dx = c_t t(y) p_Y(y).

By Eq. (13), Eq. (29), and the fact that the conditional PDF is equal to the joint PDF divided by the marginal PDF, we have

(30) p^t_{X|y}(x) = p^t_{XY}(x, y) / p^t_Y(y) = [c_t t(y) p_{XY}(x, y)] / [c_t t(y) p_Y(y)] = p_{XY}(x, y) / p_Y(y) = p_{X|y}(x).

Finally, combining Eqs. (28), (29) and (30) yields

p^t_X̃(x̃) = ∫ p_{X|y}(x̃ − K(y* − y)) p_Y(y) c_t t(y) dy. □

REFERENCES

[1] S. I. Aanonsen, G. Nævdal, D. S. Oliver, A. C. Reynolds, B. Vallès, et al., The ensemble Kalman filter in reservoir engineering–a review, SPE Journal, 14 (2009), pp. 393–412.
[2] M. S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp, A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking, IEEE Transactions on Signal Processing, 50 (2002), pp. 174–188.
[3] G. Burgers, P. Jan van Leeuwen, and G.
Evensen, Analysis scheme in the ensemble Kalman filter, Monthly Weather Review, 126 (1998), pp. 1719–1724.
[4] N. Chustagulprom, S. Reich, and M. Reinhardt, A hybrid ensemble transform particle filter for nonlinear and spatially extended dynamical systems, SIAM/ASA Journal on Uncertainty Quantification, 4 (2016), pp. 592–608.
[5] D. Crisan and A. Doucet, A survey of convergence results on particle filtering methods for practitioners, IEEE Transactions on Signal Processing, 50 (2002), pp. 736–746.
[6] P. M. Djuric, J. H. Kotecha, J. Zhang, Y. Huang, T. Ghirmai, M. F. Bugallo, and J. Miguez, Particle filtering, IEEE Signal Processing Magazine, 20 (2003), pp. 19–38.
[7] G. Evensen, Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics, Journal of Geophysical Research: Oceans, 99 (1994), pp. 10143–10162.
[8] G. Evensen, Data Assimilation: The Ensemble Kalman Filter, Springer Science & Business Media, Aug. 2009.
[9] R. Fan, Z. Huang, S. Wang, R. Diao, and D. Meng, Dynamic state estimation and parameter calibration of a DFIG using the ensemble Kalman filter, in Power & Energy Society General Meeting, 2015 IEEE, IEEE, 2015, pp. 1–5.
[10] M. Frei and H. R. Künsch, Bridging the ensemble Kalman and particle filters, Biometrika, (2013), p. ast020.
[11] Y. Gu, D. S. Oliver, et al., History matching of the PUNQ-S3 reservoir model using the ensemble Kalman filter, SPE Journal, 10 (2005), pp. 217–224.
[12] P. L. Houtekamer and H. L. Mitchell, A sequential ensemble Kalman filter for atmospheric data assimilation, Monthly Weather Review, 129 (2001), pp. 123–137.
[13] J. Lei and P. Bickel, A moment matching ensemble filter for nonlinear non-Gaussian data assimilation, Monthly Weather Review, 139 (2011), pp. 3964–3973.
[14] W. Li, G. Lin, and D. Zhang, An adaptive ANOVA-based PCKF for high-dimensional nonlinear inverse modeling, Journal of Computational Physics, 258 (2014), pp. 752–772.
[15] Y. Li, Z. Huang, N.
Zhou, B. Lee, R. Diao, and P. Du, Application of ensemble Kalman filter in power system state tracking and sensitivity analysis, in Transmission and Distribution Conference and Exposition (T&D), 2012 IEEE PES, IEEE, 2012, pp. 1–8.
[16] Q. Liao, D. Zhang, et al., Data assimilation for strongly nonlinear problems by transformed ensemble Kalman filter, SPE Journal, 20 (2015), pp. 202–221.
[17] E. N. Lorenz, Deterministic nonperiodic flow, Journal of the Atmospheric Sciences, 20 (1963), pp. 130–141.
[18] J. Mandel, L. Cobb, and J. D. Beezley, On the convergence of the ensemble Kalman filter, Applications of Mathematics, 56 (2011), pp. 533–541.
[19] H. Moradkhani, S. Sorooshian, H. V. Gupta, and P. R. Houser, Dual state–parameter estimation of hydrological models using ensemble Kalman filter, Advances in Water Resources, 28 (2005), pp. 135–147.
[20] E. Ott, B. R. Hunt, I. Szunyogh, A. V. Zimin, E. J. Kostelich, M. Corazza, E. Kalnay, D. Patil, and J. A. Yorke, A local ensemble Kalman filter for atmospheric data assimilation, Tellus A, 56 (2004), pp. 415–428.
[21] T. Palmer and R. Hagedorn, eds., Predictability of Weather and Climate, Cambridge University Press, 2006.
[22] R. H. Reichle, D. B. McLaughlin, and D. Entekhabi, Hydrologic data assimilation with the ensemble Kalman filter, Monthly Weather Review, 130 (2002), pp. 103–114.
[23] A. S. Stordal, H. A. Karlsen, G. Nævdal, H. J. Skaug, and B. Vallès, Bridging the ensemble Kalman filter and particle filters: the adaptive Gaussian mixture filter, Computational Geosciences, 15 (2011), pp. 293–305.
[24] A. Tarantola, Inverse Problem Theory and Methods for Model Parameter Estimation, SIAM, 2005.
[25] M. K. Tippett, J. L. Anderson, C. H. Bishop, T. M. Hamill, and J. S. Whitaker, Ensemble square root filters, Monthly Weather Review, 131 (2003), pp. 1485–1490.
[26]