Penalized Ensemble Kalman Filters for High Dimensional Non-linear Systems
Elizabeth Hou∗ and Alfred O. Hero
University of Michigan, Ann Arbor, Michigan
Earl Lawrence
Los Alamos National Laboratory, Los Alamos, New Mexico

∗ Corresponding author address: University of Michigan, Ann Arbor, MI. E-mail: [email protected]
ABSTRACT

The ensemble Kalman filter (EnKF) is a data assimilation technique that uses an ensemble of models, updated with data, to track the time evolution of a usually non-linear system. It does so by using an empirical approximation to the well-known Kalman filter. However, its performance can suffer when the ensemble size is smaller than the state space, as is often necessary for computationally burdensome models. This scenario means that the empirical estimate of the state covariance is not full rank and possibly quite noisy. To solve this problem in this high dimensional regime, we propose a computationally fast and easy to implement algorithm called the penalized ensemble Kalman filter (PEnKF). Under certain conditions, it can be theoretically proven that the PEnKF will have good performance (the error will converge to zero) despite having fewer ensemble members than state dimensions. Further, the proposed approach makes fewer assumptions about the structure of the system than localization methods. These theoretical results are supported with simulations of several non-linear and high dimensional systems.

1. Introduction
The Kalman filter is a well-known technique to track a linear system over time, and many variants based on the extended and ensemble Kalman filters have been proposed to deal with non-linear systems. The ensemble Kalman filter (EnKF), Evensen (1994); Burgers et al. (1998), is particularly popular when the non-linear system is extremely complicated and its gradient is infeasible to calculate, which is often the case in geophysical systems. However, these systems are often high dimensional, and forecasting each ensemble member forward through the system is computationally expensive. Thus, the filtering often operates in the high dimensional regime where the number of ensemble members, n, is much less than the size of the state, p. It is well known that even when p/n → const. and the samples are from a Gaussian distribution, the eigenvalues and the eigenvectors of the sample covariance matrix do not converge to their population equivalents, Johnstone (2001); Johnstone and Lu (2004). Since our ensemble is both non-Gaussian and high dimensional (n << p), the sample covariance matrix of the forecast ensemble will be extremely noisy. In this paper, we propose a variant of the EnKF specifically designed to handle covariance estimation in this difficult regime, but with weaker assumptions and less prior information than competing approaches.

Other Work
To deal with the sampling errors, many schemes have been developed to de-noise the forecast sample covariance matrix. These schemes “tune” the matrix with variance inflation and localization, Hamill et al. (2001); Houtekamer and Mitchell (2001); Ott et al. (2004); Houtekamer et al. (2005); Wang et al. (2007); Anderson (2007, 2009); Li et al. (2009); Bishop and Hodyss (2009a,b); Houtekamer et al. (2009); Campbell et al. (2010); Greybush et al. (2011); Miyoshi (2011). However, these schemes are often not trivial to implement because they require carefully choosing the inflation factor and using expert knowledge of the true system to set up the localization. Additionally, the EnKF with perturbed observations introduces additional sampling errors due to the perturbation noise’s lack of orthogonality with the ensemble. Methods have been devised that construct perturbation matrices that are orthogonal, Evensen (2003); however, these methods are computationally expensive, Evensen (2004). This has led to the development of matrix factorization versions of the EnKF such as the square root and transform filters, Bishop et al. (2001); Whitaker and Hamill (2002); Tippett et al. (2003); Evensen (2004); Hunt et al. (2007); Godinez and Moulton (2012); Nerger et al. (2012); Tödter and Ahrens (2015), which do not perturb the observations and are designed to avoid these additional sampling errors.

The ensemble Kalman filter is closely related to the particle filter, Papadakis et al. (2010), although it uses a Gaussian approximation of the conditional state distribution in order to get an update that is a closed form expression for the analysis ensemble (as opposed to one that requires numerical integration). While the particle filter does not use this approximation, it also requires an exponential number of particles to avoid filter collapse, Snyder et al. (2008). Recently, there has been significant effort to apply the particle filter to larger scale systems using equal weights, van Leeuwen (2010); Ades and van Leeuwen (2013), and merging it with the ensemble Kalman filter to form hybrid filters, Papadakis et al. (2010); Lei and Bickel (2011); Frei and Künsch (2013a); Nakano (2014); Robert and Künsch (2016). The EnKF is also related to the unscented Kalman filter, Julier and Uhlmann (1997); Wan and Van Der Merwe (2000), which handles nonlinearity by propagating a carefully selected set of “sigma points” (as opposed to the randomly sampled points of the EnKF) through the nonlinear forecast equations. The results are then used to reconstruct the forecasted mean and covariance.

Most similar to our proposed work are Ueno and Tsuchiya (2009) and Nino-Ruiz et al. (2015), which also propose methods that use sparse inverse covariance matrices. Both methods justify the appropriateness of using the inverse space with large scale simulations or real weather data. The former reports that their computational complexity is polynomial in the state dimension and requires the stronger assumptions of Gaussianity and structural knowledge. The latter algorithm can be implemented in parallel, making it very efficient; however, the paper still makes the much stronger assumptions of Gaussianity and conditional independence.
Proposed Method
We propose a penalized ensemble Kalman filter (PEnKF), which uses an estimator of the forecast covariance whose inverse is sparsity regularized. While the localization approaches effectively dampen or zero out entries in the covariance, our approach zeros out entries in the inverse covariance, resulting in a sparse inverse covariance. This provides two advantages. First, it makes a weaker assumption about the relationship between state variables. Second, our approach does not require anything like localization’s detailed knowledge of which covariances to fix at zero or how much to dampen. Instead, it merely favors sparsity in the inverse covariance. Additionally, our method is very easy to implement because it just requires using a different estimator for the covariance matrix in the EnKF. We can explicitly show the improvement of our estimator through theoretical guarantees.
Outline
In Section 2, we explain the assumptions in our high-dimensional system, and we give background on the EnKF and ℓ1 penalized inverse covariance matrices. In Section 3, we give details on how to modify the EnKF to our proposed PEnKF and provide theoretical guarantees on the filter. Section 4 contains the simulation results of the classical Lorenz 96 system and a more complicated system based on modified shallow water equations.
2. Background
In this paper, we consider the scenario of a noisy, non-linear dynamics model f(·), which evolves a vector of unobserved states x_t ∈ R^p through time. We observe a noisy vector y_t ∈ R^r, which is a transformation of x_t by a function h(·). Both the process noise ω_t and the observation noise ε_t are independent of the states x_t. We assume both noises are zero mean Gaussian distributed with known diagonal covariance matrices, Q and R. Often, it is assumed that the dynamics model does not have noise, making ω_t a zero vector, but for generality we allow ω_t to be a random vector.

$$x_t = f(x_{t-1}) + \omega_t \qquad \text{(Dynamics Model)}$$
$$y_t = h(x_t) + \epsilon_t \qquad \text{(Observation Model)}$$

As with localization methods, we make an assumption about the correlation structure of the state vector in order to handle the high dimensionality of the state. In particular, we assume that relatively few pairs of state variables have non-zero conditional correlation, Cov(x_i, x_j | x_{−(i,j)}) ≠ 0, where x_{−(i,j)} represents all state variables except x_i and x_j. This means that, conditioning on all of the rest of the state, x_i and x_j are uncorrelated. They may have a dependency, meaning that the correlation between them is non-zero, but that correlation is entirely explained by dependence on other parts of the state. A simple example is given by a one-dimensional spatial field with the three locations x_1, x_2, and x_3, where x_1 and x_3 are both connected to x_2, but not to each other. In this case, it might be reasonable to model x_1 and x_3 as uncorrelated conditional on x_2, although not necessarily unconditionally uncorrelated. Their simple correlation might not be zero, but their conditional or partial correlation is zero. This is similar to our assumption, although we do not assume any particular pattern of the conditional dependencies as you might in a spatial field.

We assume that the set of non-zero conditional correlations is sparse. This is equivalent to assuming that the inverse correlation matrix of the model state is sparse. In other words, the inverse covariance matrix will have s non-zero off-diagonal entries. We can also quantify the sparsity level as d, which is the maximum number of non-zero off-diagonals in any row, so d << p. Note that our assumption is on the conditional correlation or lack of it, and we do not make any claims on the independence between states. This is because x_t is not Gaussian when f(·) is non-linear, so uncorrelatedness does not imply independence; thus the zeros in the inverse covariance matrix do not imply conditional independence. This assumption is weaker than the one made using localization. That assumption is equivalent to assuming that the covariance matrix itself is sparse, whereas our assumption admits a dense covariance. Finally, because we do not assume that the state variable interactions are the same for different time points, we allow the set E_t of non-zero conditional correlations and its size s_t to change over time.

a. Ensemble Kalman Filter

The standard EnKF algorithm of Evensen (2003) is shown in Algorithm 1. At time t = 0, n samples are drawn from some distribution, which is often chosen as the standard multivariate normal distribution if the true initial distribution is unknown, to form an initial ensemble A_0 ∈ R^{p×n}. And, at every time point t, the observations y_t are perturbed n times with Gaussian white noise, η_j ∼ N(0, R), to form a perturbed observation matrix D_t ∈ R^{r×n}, where d_{j,t} = y_t + η_j.
Algorithm 1 Ensemble Kalman Filter

Input: A_0, H, Q, R, and D_t, where H is the measurement operator
for t ∈ {1, . . . , T} do
    ▷ Evolve each ensemble member forward in time
    a_j = f(a_j) + w_j ∀ j ∈ {1, . . . , n}, where w_j ∼ N(0, Q)
    ▷ Correct the ensemble with the observations
    A = A + K̂ (D_t − H A), where K̂ = P̂^f H^T (H P̂^f H^T + R)^{−1}
    ▷ Predict using the analysis ensemble mean
    x̂_t = (1/n) ∑_{j=1}^{n} a_j
end for
Output: x̂_t
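For concreteness, the correction step of Algorithm 1 can be written in a few lines of NumPy. This is a minimal sketch rather than a reference implementation: the function name `enkf_analysis` is ours, the forecast covariance is the plain sample covariance, and the dense linear algebra assumes p is small enough to form p × p matrices.

```python
import numpy as np

def enkf_analysis(A, y, H, R, rng):
    """One stochastic EnKF correction step with perturbed observations.

    A : (p, n) forecast ensemble, y : (r,) observation,
    H : (r, p) measurement operator, R : (r, r) observation noise covariance.
    """
    p, n = A.shape
    # Perturbed observation matrix D_t: one noisy copy of y per member
    D = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, size=n).T
    # Sample forecast covariance; rank deficient whenever n < p
    P = np.cov(A)
    # Kalman gain K = P H^T (H P H^T + R)^{-1} via a linear solve
    S = H @ P @ H.T + R
    K = np.linalg.solve(S, H @ P).T
    # Analysis ensemble and state estimate
    A = A + K @ (D - H @ A)
    return A, A.mean(axis=1)

# Illustrative sizes (n < p, as in the high dimensional regime):
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 25))
H = np.eye(20, 40)              # observe the first 20 states (hypothetical)
R = 0.5 * np.eye(20)
A_new, x_hat = enkf_analysis(A, rng.standard_normal(20), H, R, rng)
```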
The forecast covariance estimator P̂^f is typically the sample covariance of the forecast ensemble, defined as

$$\widehat{\mathrm{Cov}}(A) = \frac{1}{n-1}(A - \bar{A})(A - \bar{A})^T,$$

where Ā is the sample mean vector, but it can be another estimator such as a localized estimator (one that is localized with a taper matrix), or a penalized estimator as proposed in this paper.

b. Bregman Divergence and the ℓ1 Penalty
Below, we give a brief overview of the ℓ1 penalized log-determinant Bregman divergence and some properties of its minimizing estimator, as described in Ravikumar et al. (2011). We denote S to be any arbitrary sample covariance matrix, and Σ = E(S) to be its true covariance matrix, where E(·) is the expectation function.

The Bregman divergence is a very general method to measure the difference between two functions. Here the functions to be compared are covariance matrices. Since we are interested in finding a sparse positive definite estimator for the inverse covariance matrix, a natural choice of Bregman function is −log det(·), which has a domain restricted to positive definite matrices. Thus, our optimal estimator for the inverse covariance matrix Σ^{−1} will minimize

$$\arg\min_{\Theta \in S^{p \times p}_{++}} \; -\log\det(\Theta) - \log\det(\Sigma) + \mathrm{tr}\big(\Sigma(\Theta - \Sigma^{-1})\big)$$

where S^{p×p}_{++} is the set of all symmetric positive definite p × p matrices. This loss function requires the covariance matrix Σ to be known, but it can be approximated by an empirical loss, which replaces Σ with its empirical equivalent S and adds a penalty term to ensure strict convexity.

The empirical Bregman divergence with function −log det(·) and an ℓ1 penalty term essentially reduces (by dropping the constants) to

$$\arg\min_{\Theta \in S^{p \times p}_{++}} \; -\log\det(\Theta) + \mathrm{tr}(\Theta S) + \lambda \|\Theta\|_1 \tag{1}$$

where λ ≥ 0 and ‖·‖_1 denotes an element-wise ℓ1 norm. This can be generalized so that each entry of Θ can be penalized differently if λ is a matrix and an element-wise product is used with the norm.

This objective has a unique solution, Θ = (S̃)^{−1}, which satisfies

$$\frac{\partial}{\partial \Theta} B_\lambda(\Theta \,\|\, S^{-1}) = S - \Theta^{-1} + \lambda\, \partial\|\Theta\|_1 = 0$$

where ∂‖Θ‖_1 is a subdifferential of the ℓ1 norm defined in (A1) in the appendix. The solution (S̃)^{−1} is a sparse positive definite estimator of the inverse covariance matrix Σ^{−1}, and we can write its inverse explicitly as S̃ = S + λZ̃, where Z̃ is the unique subdifferential matrix that makes the gradient zero. See Ravikumar et al. (2011) or Janková et al. (2015) for a more thorough explanation of Z̃.

Ravikumar et al. (2011) show that for well-conditioned covariances and certain minimum sample sizes, the estimator (S̃)^{−1} has many nice properties, including having, with high probability, the correct zero and signed non-zero entries and a sum of squared error that converges to 0 as n, p, s → ∞. These properties will allow our method, described in the next section, to attain superior performance over the EnKF.
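The minimizer of (1) can be computed with off-the-shelf solvers. The sketch below uses scikit-learn's `graphical_lasso`, which solves the same ℓ1-penalized log-determinant objective; note that, unlike (1), scikit-learn penalizes only the off-diagonal entries of Θ, a minor difference in how the diagonal bias enters. The sizes and the value of λ here are illustrative only.

```python
import numpy as np
from sklearn.covariance import graphical_lasso

rng = np.random.default_rng(0)
n, p = 25, 40                           # fewer samples than dimensions
X = rng.standard_normal((n, p))
S = np.cov(X, rowvar=False)             # rank-deficient sample covariance

lam = 0.5 * np.sqrt(np.log(p) / n)      # lambda on the order of sqrt(log(p)/n)
# Returns (S_tilde, Theta) with Theta = (S_tilde)^{-1} sparse and positive definite
S_tilde, Theta = graphical_lasso(S, alpha=lam)
print("fraction of zero entries in Theta:", np.mean(Theta == 0))
```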
3. ℓ1 Penalized Ensemble Kalman Filter

Our penalized ensemble Kalman filter modifies the EnKF by using a penalized forecast covariance estimator P̃^f. This penalized estimator is derived from its inverse, which is the minimizer of (1). Thus, from Section 2.b, it can be explicitly written as P̃^f = P̂^f + λZ̃, implying that we essentially learn a matrix Z̃ and use it to modify our sample covariance P̂^f so that (P̂^f + λZ̃)^{−1} is sparse. From this, our modified Kalman gain matrix is

$$\tilde{K} = (\hat{P}^f + \lambda \tilde{Z}) H^T \big(H (\hat{P}^f + \lambda \tilde{Z}) H^T + R\big)^{-1}.$$

The intuition behind this estimator is that since only a small number of the state variables in the state vector x_t are conditionally correlated with each other, the forecast inverse covariance matrix (P^f)^{−1} will be sparse, with many zeros in the off-diagonal entries. Furthermore, since minimizing (1) gives a sparse estimator for (P^f)^{−1}, this sparse estimator will accurately capture the conditional correlations and uncorrelations of the state variables. Thus P̃^f will be a much better estimator of the true forecast covariance matrix P^f because the ℓ1 penalty will depress spurious noise in order to make (P̃^f)^{−1} sparse, while the inverse of the sample forecast covariance (P̂^f)^{−1}, when it exists, will be non-sparse. As in most penalized estimators, P̃^f is a biased estimator of the forecast covariance, while the sample forecast covariance is not. But because the forecast distribution is corrected for in the analysis step, it is acceptable to take this bias as a trade-off for less variance (sampling errors). A more in-depth study of the consequences of this bias in ℓ1 penalized inverse covariance matrices and their inverses is described in Janková et al. (2015). Additionally, this bias due to penalization in the inverse covariance behaves in a similar way as variance inflation because the bias on the diagonal of (P̃^f)^{−1} is due to it being increased by λ, so having a biased estimator is not necessarily disadvantageous. And finally, since we do not assume the state variables interact in the same way over all time, we re-learn the matrix Z̃ every time the ensemble is evolved forward.

We can choose the penalty parameter λ in a systematic fashion by calculating a regularization path, solving (1) for a list of decreasing λs, and evaluating each solution with an information criterion such as an extended or generalized Akaike information criterion (AIC) or Bayesian information criterion (BIC), Foygel and Drton (2010); Lv and Liu (2014). Additionally, if we have knowledge or make assumptions about the moments of the ensemble’s distribution, we know the optimal proportionality of the penalty parameter (see the proof of Theorem 1). Thus, we can refine the penalty parameter by calculating a regularization path for the constant of the optimal order. In Section 4, we describe a practical approach to choosing λ using a free forecast model run like in Robert and Künsch (2016) and the BIC.
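Putting the pieces together, the PEnKF gain can be sketched as follows. This is our own illustrative code, not the authors' implementation: `graphical_lasso` stands in for any solver of (1), it may require a larger λ to converge when the sample covariance is badly rank deficient, and λ is passed in rather than selected by the information criterion search described above.

```python
import numpy as np
from sklearn.covariance import graphical_lasso

def penkf_gain(A, H, R, lam):
    """Modified Kalman gain K_tilde = P_tilde H^T (H P_tilde H^T + R)^{-1},
    where P_tilde = P_hat + lam * Z_tilde has a sparse inverse."""
    P_hat = np.cov(A)                                   # sample forecast covariance
    P_tilde, Theta = graphical_lasso(P_hat, alpha=lam)  # Theta = (P_tilde)^{-1}
    S = H @ P_tilde @ H.T + R
    return np.linalg.solve(S, H @ P_tilde).T

# The filter then replaces K_hat in Algorithm 1 with this gain and re-learns
# Z_tilde (equivalently Theta) after every forecast step.
```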
a. Implications on the Kalman Gain Matrix

The only estimated randomness in the EnKF occurs in the ensemble update step, which is a linear function of the Kalman gain matrix. So, having an accurate estimator of the true Kalman gain matrix K will ensure that the algorithm performs well. And, because the true Kalman gain matrix inherits many of the properties of the forecast covariance matrix P^f, our modified Kalman gain matrix K̃ will benefit from many of the nice properties of our forecast covariance estimator P̃^f.

How good of an estimator we can get for the forecast covariance matrix P^f will of course depend on its structure. If it is close to singular or contains many entries with magnitudes smaller than the noise level, it will always be difficult to estimate. So for the following theorem, we assume that the forecast covariance matrix is well-behaved. This means that it satisfies the standard regularity conditions (incoherence, bounded eigenvalue, sparsity, sign consistency, and monotonicity of the tail function) found in many places, including Ravikumar et al. (2011); Janková et al. (2015), and also defined in the appendix.

Theorem 1.
Under regularity conditions and for the system described in Section 2, when λ ≍ √(log(p)/n) for sub-Gaussian ensembles and λ ≍ √(p^{1/m}/n) for ensembles with bounded 4m-th moments,

$$\text{Sum of Squared Errors of } \tilde{K} \;\lesssim\; \text{Sum of Squared Errors of } \hat{K},$$

and as long as the sample size n is at least of order d² log(p) for sub-Gaussian ensembles and d² p^{1/m} for ensembles with bounded 4m-th moments,

$$\text{Sum of Squared Errors of } \tilde{K} \;\to\; 0$$

with high probability as n, p, s → ∞.

The above theorem gives us a sense of the performance of the modified Kalman gain matrix in comparison to the sample Kalman gain matrix. It shows that with high probability, the modified Kalman gain matrix will have an asymptotically smaller sum of squared error (SSE) than a Kalman gain matrix formed using the sample forecast covariance matrix. Also, for a given number of states p, the theorem tells us the minimum ensemble size n required for our modified Kalman gain matrix to be a good estimate of the true Kalman gain matrix. The sub-Gaussian criterion, where all moments are bounded, is actually very broad and includes any state vectors with a strictly log-concave density and any finite mixture of sub-Gaussian distributions. However, even if not all moments are bounded, the larger the number of bounded fourth-order moments m, the smaller the necessary sample size. In comparison, the sample Kalman gain matrix requires on the order of p² samples in the sub-Gaussian case, and significantly more in the other case (see the appendix for exact details). When the minimum sample size for an estimator is not met, good performance cannot be guaranteed because the asymptotic error will diverge to infinity instead of converging to zero. This is why, when the number of ensemble members n is smaller than the number of states p, just using the sample forecast covariance matrix is not sufficient.

b. Implications on the Analysis Ensemble

It is well known that due to the additional stochastic noise used to perturb the observations, the covariance of the EnKF’s analysis ensemble,
Ĉov(A), is not equivalent to its analysis covariance calculated by the Gaussian update P̂^a = (I − K̂H) P̂^f. This has led to the development of deterministic variants such as the square root and transform filters, which do have Ĉov(A) = P̂^a. However, in a non-linear system, this update is sub-optimal because it uses a Gaussian approximation of Pr(x_t | y_{1:t−1}), the actual distribution of the forecast ensemble A. Thus, let us denote P^a as the true analysis covariance, defined as

$$P^a = \int x_t x_t^T \Pr(x_t \mid y_{1:t})\, dx_t - \left(\int x_t \Pr(x_t \mid y_{1:t})\, dx_t\right)\left(\int x_t \Pr(x_t \mid y_{1:t})\, dx_t\right)^T$$

where Pr(x_t | y_{1:t}) = Pr(y_t | x_t) Pr(x_t | y_{1:t−1}) / Pr(y_t) is not Gaussian. Then E(P̂^a) ≠ P^a, and there will always be this analysis spread error regardless of whether Ĉov(A) = P̂^a or not.

As also mentioned in Lei and Bickel (2011), actually none of the analysis moments of the EnKF are consistent with the true moments, including the analysis mean. However, this analysis error is present in all methods that do not introduce particle filter properties to the EnKF, and thus is not the focus of our paper. We are primarily concerned with the sampling errors in high-dimensional systems and simply wanted to address that the lack of equivalence to the Gaussian update is irrelevant in our case of a non-linear system.

c. Computational Time and Storage Issues

The computational complexity of solving for the minimizer of (1) with the GLASSO algorithm from Friedman et al. (2008) is O(sp²) because it is a coordinate descent algorithm. Although the final estimator (P̃^f)^{−1} is sparse and only requires storing s + p values, the algorithm requires storing p × p matrices in memory. However, by using iterative quadratic approximations to (1), block coordinate descent, and parallelization, the BIGQUIC algorithm of Hsieh et al. (2013) has computational complexity O(s(p/k)²) and only requires storing (p/k) × (p/k) matrices, where k is the number of parallel subproblems or blocks.

The matrix operations for the analysis update A = A + U can also be linear in p if R is diagonal and H is sparse (like in banded interpolation matrices) with at most h << r non-zero entries in a row. Then ((P̃^f)^{−1} + H^T R^{−1} H) has at most (s + p + rh) << p² non-zero entries and can be computed with O(s + p + rh) matrix operations. And, solving for U only takes O(n(s + p + rh)) matrix operations because it is made from the solutions to the sparse linear systems

$$\big((\tilde{P}^f)^{-1} + H^T R^{-1} H\big)\, U = H^T R^{-1} (D_t - H A),$$

where the right-hand side takes O(pr + rpn) matrix operations to form.
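A sketch of this sparse analysis update in SciPy, under the assumptions of this subsection (diagonal R, sparse H, and a sparse penalized precision matrix Theta = (P̃^f)^{−1}); the function name and the LU-based multi-column solve are our own choices.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

def sparse_analysis_update(A, D, H, R_diag, Theta):
    """Compute A + U where ((P_tilde)^{-1} + H^T R^{-1} H) U = H^T R^{-1} (D - H A).

    A : (p, n) dense ensemble, D : (r, n) perturbed observations,
    H : (r, p) sparse operator, R_diag : (r,) diagonal of R,
    Theta : (p, p) sparse estimate of the forecast precision matrix.
    """
    R_inv = sp.diags(1.0 / R_diag)
    M = (Theta + H.T @ R_inv @ H).tocsc()  # stays sparse when Theta and H are
    rhs = H.T @ (R_inv @ (D - H @ A))      # (p, n) right-hand sides
    U = splu(M).solve(np.asarray(rhs))     # one factorization, n back-solves
    return A + U
```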
4. Simulations

In all simulations, we compare to an ensemble Kalman filter where the forecast covariance matrix is localized with a taper matrix generated from equation (4.10) in Gaspari and Cohn (1999). The taper matrix parameter c is chosen using the true interactions of the system, so the localization should be close to optimal for simple systems. We use this TAPER-EnKF as the baseline because if the PEnKF can do as well as this filter, it implies that the PEnKF can learn a close to optimal covariance matrix, even without the need to impose a known neighborhood structure. If the PEnKF can do better than this filter, it implies that the PEnKF is learning some structure that is not captured by localization with a taper matrix.

In order to choose the penalty parameter for the PEnKF, we assume that the state variables in our examples are sub-Gaussian. In this case, we can set λ = c_λ √(R log(p)/n) for some appropriate choice of c_λ (see the proof of Theorem 1), where R is the observation noise’s variance. To estimate c_λ, we generate a representative ensemble (which may also be our initial ensemble) using a free forecast run like in Robert and Künsch (2016), in which a state vector is drawn at random (e.g. from N(0, I)) and evolved forward. The representative ensemble is produced by taking a set of equally spaced points (e.g. every 100th state vector) from the evolution. This ensemble is used to choose c_λ from some predefined interval by minimizing the extended Bayesian information criterion (eBIC) of Foygel and Drton (2010) if p > n or the BIC of Schwarz (1978) if p < n.

Of course, (1) is not a likelihood unless the states are Gaussian. So, we have a misspecified model where we are treating the states as having a Gaussian likelihood when evaluating a potential penalty parameter using an information criterion. In this case, we should correct our information criterion for the misspecification as in Lv and Liu (2014). However, this can be quite difficult, and we leave an in-depth exploration of this problem for future work. In the meantime, we assume the misspecified information criterion is close to the correct information criterion, and it does seem to perform well despite its lack of optimality. A sketch of this selection procedure is given below.
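As an illustration of the selection procedure, the sketch below scores each candidate constant c_λ with the eBIC computed under the Gaussian working model; the γ = 0.5 value and the grid of constants are hypothetical choices, and `graphical_lasso` again stands in for any solver of (1).

```python
import numpy as np
from sklearn.covariance import graphical_lasso

def select_c_lambda(S, n, p, grid, gamma=0.5):
    """Choose c_lambda on a grid by minimizing the eBIC of Foygel and Drton (2010)."""
    best_ebic, best_c = np.inf, None
    for c in grid:
        _, Theta = graphical_lasso(S, alpha=c * np.sqrt(np.log(p) / n))
        # Gaussian log-likelihood of the fitted precision matrix
        loglik = (n / 2.0) * (np.linalg.slogdet(Theta)[1] - np.trace(S @ Theta))
        # model size = number of distinct non-zero off-diagonal pairs (edges)
        k = (np.count_nonzero(Theta) - p) // 2
        ebic = -2.0 * loglik + k * np.log(n) + 4.0 * gamma * k * np.log(p)
        if ebic < best_ebic:
            best_ebic, best_c = ebic, c
    return best_c
```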
We define the root mean squared error (RMSE) used to evaluate a filter’s performance by

$$\mathrm{RMSE}_t = \sqrt{\|\hat{x}_t - x_t\|_2^2 \,/\, p}$$

where RMSE_t is an element of a vector indicating the RMSE at time point t, x_t is a vector of the true hidden state variables, x̂_t is a filter’s estimator for the true state vector, and ‖·‖_2 is the ℓ2 norm. We will refer to quantiles such as the mean or median RMSE to be the mean or median of the elements of the RMSE vector.

a. Lorenz 96 System

The 40-state Lorenz 96 model is one of the most common systems used to evaluate ensemble Kalman filters. The state variables are governed by the following differential equations

$$\frac{dx_t^i}{dt} = \big(x_t^{i+1} - x_t^{i-2}\big)\, x_t^{i-1} - x_t^i + 8 \qquad \forall\, i = 1, \ldots, 40$$

where x_t^0 = x_t^{40}, x_t^{41} = x_t^1, and x_t^{−1} = x_t^{39}.

We use the following simulation settings. We have observations for the odd state variables, so y_t = H x_t + ε_t, where H is a 20 × 40 matrix with ones at entries {i, j = 2i − 1} and zeros everywhere else, and ε_t is a 20 × 1 vector distributed N(0, 0.5 I). We initialize the true state vector from a N(0, I), and we assimilate at every 0.4 time steps, where t = 1, . . . , 2000. The difficulty of this system comes from the relatively long time between assimilations of 0.4, which makes it significantly non-linear, and the lack of observations for the even state variables. A sketch of the forecast dynamics is given below.
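A minimal sketch of the forecast dynamics, with the standard Lorenz 96 forcing F = 8 and a fourth-order Runge-Kutta integrator; the internal step size is our own illustrative choice, not a setting reported here.

```python
import numpy as np

def lorenz96(x, F=8.0):
    """dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F with cyclic indexing."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + F

def forecast(x, dt=0.4, substeps=40):
    """Evolve a state vector one assimilation window of 0.4 time units (RK4)."""
    h = dt / substeps
    for _ in range(substeps):
        k1 = lorenz96(x)
        k2 = lorenz96(x + 0.5 * h * k1)
        k3 = lorenz96(x + 0.5 * h * k2)
        k4 = lorenz96(x + h * k3)
        x = x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return x
```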
Since the exact equations of the Lorenz 96 model are fairly simple, it is clear how the state variables interact with each other. This makes it possible to localize with a taper matrix that is almost optimal by using the Lorenz 96 equations to choose a half-length parameter c. However, we do not incorporate this information in the PEnKF algorithm, which instead learns interactions by essentially extracting them from the sample covariance matrix. We set the penalty parameter λ = c_λ √(0.5 log(p)/n) by using an offline free forecast run to search for the constant c_λ over a predefined range, as described at the beginning of this section.

We average the PEnKF estimator of the forecast inverse covariance matrix at the time points 500, 1000, 1500, and 2000 for 50 trials with 25 ensemble members, and we compare it to the “true” inverse covariance matrix, which is calculated by moving an ensemble of size 2000 through time. In Figure 1, each line represents the averaged normalized rows of an inverse covariance matrix, and the lines are centered at the diagonal. The penalized inverse covariance matrix does a qualitatively good job of capturing the neighborhood information and successfully identifies that any state variables far away from state variable i do not interact with it.

Because the PEnKF is successful at estimating the structure of the inverse covariance matrix and thus the forecast covariance matrix, we expect it will have good performance for estimating the true state variables. We compare the PEnKF to the TAPER-EnKF and other estimators from Bengtsson et al. (2003); Lei and Bickel (2011); Frei and Künsch (2013a,b) by looking at statistics of the RMSE. Note that in order to have comparable statistics to as many other papers as possible, we do not add variance inflation to the TAPER-EnKF (like in Bengtsson et al. (2003); Frei and Künsch (2013a,b) and unlike in Lei and Bickel (2011)). Also, like in those papers, we initialize the ensemble from a N(0, I), and we use this ensemble to start the filters. Note that in this case, the initial ensemble is different than the offline ensemble that we use to estimate the PEnKF’s penalty parameter. This is because the initial ensemble is not representative of the system and its sample covariance is an estimator for the identity matrix. The TAPER-EnKF, which is simply called the EnKF in the other papers, is localized by applying a taper matrix with c = 10 to the sample covariance matrix.
We show the mean, median, 10%, and 90% quantiles of the RMSE averaged over 50 independent trials for ensembles of size 400, 100, 25, and 10 in Table 1. For 400 ensemble members, the PEnKF does considerably better than the TAPER-EnKF, and its relative improvement is larger than that of the XEnKF reported in Bengtsson et al. (2003) and similar to those of the NLEAF, EnKPF, and XEnKF reported in Lei and Bickel (2011); Frei and Künsch (2013a,b), respectively. For 100 ensemble members, the PEnKF does do worse than the TAPER-EnKF and the EnKPF of Frei and Künsch (2013a); this we suspect may be due to the bias-variance trade-off when estimating the forecast covariance matrix. The PEnKF has the most significant improvement over the TAPER-EnKF in the most realistic regime, where we have fewer ensemble members than state variables. For both 25 and 10 ensemble members, the PEnKF does considerably better than the TAPER-EnKF, and it does not suffer from filter divergence, which Frei and Künsch (2013a) report occurs for the EnKPF at 50 ensemble members.

While it is clear the PEnKF does well even when there are fewer ensemble members than state variables, 40 variables is not enough for the problem to be considered truly high-dimensional. We now consider simulation settings where we increase the dimension of the state space p while holding the number of ensemble members n constant. We initialize the ensemble from the free forecast run and set λ and the taper matrix in the same way as in the previous simulations. We examine the mean RMSE averaged over 50 trials and its approximate 95% confidence intervals in Figure 2. The mean RMSE of the PEnKF is significantly smaller than the mean RMSE of the TAPER-EnKF for all p. Additionally, the confidence intervals of the mean RMSE are much narrower than the ones for the TAPER-EnKF. This suggests that there is little variability in the PEnKF’s performance, while the TAPER-EnKF’s performance is more dependent on the trial, with some trials being “easier” for the TAPER-EnKF than others.

b. Modified Shallow Water Equations System

While the Lorenz 96 system shows that the PEnKF has strong performance because it is successful at reducing the sampling errors and capable of learning the interactions between state variables, the system is not very realistic in that all state variables are identical and the relationship between state variables is very simplistic. We now consider a system based on the modified shallow water equations of Würsch and Craig (2014), which models cloud convection with fluid dynamics equations, but is substantially less computationally expensive than actual numerical weather prediction models. The system has three types of state variables: fluid height, rain content, and horizontal wind speed.

To generate this system, we use the R package “modifiedSWEQ” created by Robert (2014) and the same simulation settings as in Robert and Künsch (2016). So we always observe the rain content, but wind speed is only observed at locations where it is raining, and fluid height is never observed. Explicitly, for the R function generate.xy(), we use the same cloud and rain thresholds h_c and h_r, noise standard deviations σ_r and σ_u, and estimated diagonal noise covariance matrix R = diag([R_r, R_u = σ_u²]) as in Robert and Künsch (2016). All other parameters are just the default ones in the function. The initial ensemble is drawn from a free forecast run with 10000/60 time-steps between each ensemble member. We give a snapshot of the system at a random time point in Figure 3.
There are p = 300 state variables for each type, making the state space have 900 dimensions, and we assimilate the system every 5 seconds for a total time period of 6 hours. Like in Robert and Künsch (2016), we choose to use only 50 ensemble members, and we do not perturb rain observations that are 0, because at these points there is no measurement noise.
The TAPER-EnKF uses a 3p × 3p taper matrix with c = 5; however, the entries off the p × p block diagonals are depressed (they are multiplied by 0.9). The NAIVE-LEnKPF uses the same settings as in Robert and Künsch (2016), so a localization parameter of 5 km, which gives the same taper matrix as the one used in the TAPER-EnKF, and an adaptive γ parameter. For the PEnKF, we set the penalty parameter to be a 3p × 3p matrix, Λ = c_λ √(λ_R λ_R^T log(p)/n), where the first p entries of the vector λ_R are reference units and the rest are to scale for the perturbation noise of the different state types. So the first p entries are 1 (reference) for fluid height, the second p are R_u for wind, and the last p are R_r for rain. We choose the constant c_λ with the eBIC as before, searching over a predefined range.

Figure 4 shows the mean and approximate 95% confidence intervals of the RMSE for fluid height, wind speed, and rain content over 6 hours of time using 50 trials. The mean RMSEs of all three filters are well within each others’ confidence intervals for the fluid height and wind variables. For the rain variables, the mean RMSE of neither the TAPER-EnKF nor the NAIVE-LEnKPF is in the PEnKF’s confidence intervals, and the mean RMSE of the PEnKF is on the boundary of the other two filters’ confidence intervals. This strongly suggests that the PEnKF’s rain error is statistically smaller than the rain errors of the other two filters. Since this simulation is not as simple as the previous ones, the interactions between the state variables are most likely not as effectively captured by the taper matrix or other localization methods, and the results from this simulation suggest that the PEnKF is learning more accurate interactions for the rain variables. We do not show the results of the BLOCK-LEnKPF of Robert and Künsch (2016) because the algorithm suffered from filter divergence in 27 of the 50 trials, and in the trials where it did not fail, it performed very similarly to the NAIVE-LEnKPF.
5. Discussion
We propose a new algorithm based on the ensemble Kalman filter that is designed for superior performance in non-linear high dimensional systems. We call this algorithm the penalized ensemble Kalman filter because it uses the popular statistical concept of penalization/regularization in order to make the problem of estimating the forecast covariance matrix well-defined (strictly convex). This in turn both decreases the sampling errors in the forecast covariance estimator by trading them off for bias and prevents filter divergence by ensuring that the estimator is positive definite. The PEnKF is computationally efficient in that it is not significantly slower than the standard EnKF algorithms, and it is easy to implement since it only adds one additional step, and this step uses the well-established GLASSO algorithm available in almost any scientific computing language. We give theoretical results that prove that the Kalman gain matrix constructed from this estimator will converge to the population Kalman gain matrix under the non-simplistic asymptotic case of high-dimensional scaling, where the sample size and the dimensionality increase to infinity.

Through simulations, we show that the PEnKF can do at least as well as, and sometimes better than, localized filters that use much more prior information. We emphasize that by doing just as well as the TAPER-EnKF, which has a close to optimal taper matrix, the PEnKF is effectively correctly learning the structure of interactions between the state variables. In a non-simulation setting where there is no ground-truth knowledge of the interactions between state variables, correct localization is much more difficult, making any localized filter’s performance likely sub-optimal. In contrast, since the PEnKF does not use any of this “oracle” information, its performance will not differ in this way between simulations and real-life situations. The more complicated simulation, based on the modified shallow water equations, highlights this advantage of the PEnKF through its substantially superior performance in estimating the hidden states of the rain variables. Another feature of the approach is that it seems to require less variance inflation. None was applied to any algorithm in our comparison, but the PEnKF approach never collapsed. The penalization of the inverse covariance actually produces a slight inflation on the diagonal of the covariance, which seems to help in this regard.

While we display a very naive way of searching for a good penalty parameter for the PEnKF in the simulations, it is theoretically incorrect and thus not a way to choose the truly optimal penalty parameter. We do believe deriving a specific information criterion for our PEnKF with correct theoretical properties is very important, since the PEnKF can be sensitive to the penalty parameter. However, this model selection in misspecified models problem is not trivial to solve and is an active topic in current statistical research. Therefore, we leave deriving a theoretically correct information criterion for future work.
Acknowledgments.
This work was partially supported by the Consortium for Verification Technology under Department of Energy National Nuclear Security Administration award number DE-NA0002534, and partially supported by the Laboratory Directed Research and Development program at Los Alamos National Laboratory under project 20150033DR, SHIELDS: Space Hazards Induced near Earth by Large Dynamic Storms - Understanding, Modeling, and Predicting.

APPENDIX
Definitions.

(D1) σ_min(A) and σ_max(A) denote the minimum and maximum singular values of any matrix A.

(D2) The spectral ‖·‖_2 and Frobenius ‖·‖_F norms are submultiplicative, ‖AB‖ ≤ ‖A‖ ‖B‖, and unitarily invariant, ‖AU‖ = ‖U^T A‖ = ‖A‖ where U U^T = I. So, writing the singular value decomposition B = U D V^T,

$$\|AB\|_F = \|A U D V^T\|_F = \|A U D\|_F \le \|A U\|_F\, \sigma_{\max}(D) = \|A\|_F\, \|B\|_2.$$

(D3) ‖A^{−1}‖_2 = σ_max(A^{−1}) = 1/σ_min(A).

(D4) K and K̃ can be decomposed like

$$\begin{aligned} K &= P^f H^T (H P^f H^T + R)^{-1} \\ &= P^f H^T \Big(R^{-1} - R^{-1} H \big((P^f)^{-1} + H^T R^{-1} H\big)^{-1} H^T R^{-1}\Big) \\ &= \Big(P^f - P^f \big((P^f)^{-1} (H^T R^{-1} H)^{-1} + I\big)^{-1}\Big) H^T R^{-1} \\ &= \Big(P^f - P^f \big((H^T R^{-1} H)^{-1} + P^f\big)^{-1} P^f\Big) H^T R^{-1} \\ &= \big((P^f)^{-1} + H^T R^{-1} H\big)^{-1} H^T R^{-1}. \end{aligned}$$

(D5) E is the edge set corresponding to the non-zeros in (P^f)^{−1}, and E^c is its complement. Γ is the Hessian of (1).

(D6) ∂‖Θ‖_1 can be any number between −1 and 1 where Θ is 0, because the derivative of an absolute value is undefined at zero. Thus, it is the set of all matrices Z ∈ S^{p×p} such that

$$Z_{ij} = \begin{cases} \mathrm{sign}(\Theta_{ij}) & \text{if } \Theta_{ij} \neq 0 \\ \in [-1, 1] & \text{if } \Theta_{ij} = 0. \end{cases} \tag{A1}$$

(D7) A bounded 4m-th moment is the highest fourth-order moment of a random variable that is finite, where m is the number of fourth-order moments.
Lemma 1.1. Because H is a constant matrix, it does not affect the asymptotic magnitude of the modified or sample Kalman gain matrices under any norm.

Proof of Lemma 1.1. ‖K̃‖ ≍ ‖H‖ ‖K̃‖ ≍ ‖H K̃‖ under any norm, where

$$\begin{aligned} H\tilde{K} &= H \tilde{P}^f H^T (H \tilde{P}^f H^T + R)^{-1} \\ &= \big(I + R (H \tilde{P}^f H^T)^{-1}\big)^{-1} \\ &= R R^{-1} \big(R^{-1} + (H \tilde{P}^f H^T)^{-1}\big)^{-1} R^{-1} \\ &= I - R (H \tilde{P}^f H^T + R)^{-1}. \end{aligned}$$

The same argument holds for K̂, where (H P̂^f H^T)^{−1} is the pseudoinverse if the inverse does not exist.

Assumptions.
The following assumptions are necessary for the minimizer of (1) to have good theoretical properties, Ravikumar et al. (2011). Thus, we assume they are true for the theorem.

(A1) There exists some α ∈ (0, 1] such that max_{e ∈ E^c} ‖Γ_{eE} (Γ_{EE})^{−1}‖_1 ≤ (1 − α).

(A2) The ratio between the maximum and minimum eigenvalues of P^f is bounded.

(A3) The maximum ℓ1 norms of the rows of P^f and (Γ_{EE})^{−1} are bounded.

(A4) The minimum non-zero value of (P^f)^{−1} is Ω(√(log(p)/n)) for a sub-Gaussian state vector and Ω(√(p^{1/m}/n)) for state vectors with bounded 4m-th moments.

Our assumptions are stronger than necessary, and it is common to allow the error rates to depend on the bounding constants above, but for simplicity we give the error rates only as a function of the dimensionality n, p and sparsity s, d parameters.

Proof of Theorem 1. From Vershynin (2012) and Ravikumar et al. (2011), we know that for sub-Gaussian random variables and those with bounded 4m-th moments respectively, the SSE of the sample covariance matrix is

$$\begin{cases} O\big(p^2/n\big) \\ O\big((\log\log(p))^4\, p\, (p/n)^{1-1/m}\big) \end{cases} \tag{A2}$$

with high probability, and the SSE of (P̃^f)^{−1} is

$$\begin{cases} O\big((s + p) \log(p)/n\big) & \text{for } \lambda \asymp \sqrt{\log(p)/n} \\ O\big((s + p)\, p^{1/m}/n\big) & \text{for } \lambda \asymp \sqrt{p^{1/m}/n} \end{cases} \tag{A3}$$

with probability 1 − 1/p.

$$\begin{aligned} \|H\hat{K} - HK\|_F &= \|R (H P^f H^T + R)^{-1} - R (H \hat{P}^f H^T + R)^{-1}\|_F \\ &= \|R \big((H \hat{P}^f H^T + R)^{-1} \big((H \hat{P}^f H^T + R) - (H P^f H^T + R)\big) (H P^f H^T + R)^{-1}\big)\|_F \\ &= \|R (H \hat{P}^f H^T + R)^{-1} H (\hat{P}^f - P^f) H^T (H P^f H^T + R)^{-1}\|_F \\ &\le \|R (H \hat{P}^f H^T + R)^{-1} H\|_2\, \|\hat{P}^f - P^f\|_F\, \|H^T (H P^f H^T + R)^{-1}\|_F. \end{aligned}$$

So, the second term has the rates in (A2) and the final term is a constant. The first term is also a constant because

$$\|R (H \hat{P}^f H^T + R)^{-1} H\|_2 \le \|(H \hat{P}^f H^T R^{-1} + I)^{-1}\|_2\, \|H\|_2 = \|H\|_2 / \sigma_{\min}(H \hat{P}^f H^T R^{-1} + I) \le \|H\|_2.$$

Thus ‖HK̂ − HK‖_F also has the rates in (A2), and from Lemma 1.1, ‖K̂ − K‖_F does too.

$$\begin{aligned} \|\tilde{K} - K\|_F &= \big\|\big(((\tilde{P}^f)^{-1} + H^T R^{-1} H)^{-1} - ((P^f)^{-1} + H^T R^{-1} H)^{-1}\big) H^T R^{-1}\big\|_F \\ &= \big\|((P^f)^{-1} + H^T R^{-1} H)^{-1} \big(((P^f)^{-1} + H^T R^{-1} H) - ((\tilde{P}^f)^{-1} + H^T R^{-1} H)\big) ((\tilde{P}^f)^{-1} + H^T R^{-1} H)^{-1} H^T R^{-1}\big\|_F \\ &\le \|((P^f)^{-1} + H^T R^{-1} H)^{-1}\|_F\, \|(P^f)^{-1} - (\tilde{P}^f)^{-1}\|_F\, \|\tilde{K}\|_2. \end{aligned}$$

The first term is a constant and the second term has the rates in (A3). The final term is also a constant because ‖K̃‖_2 ≍ ‖HK̃‖_2 = 1/σ_min(I + R(H P̃^f H^T)^{−1}) ≤ 1. Thus ‖K̃ − K‖_F also has the rates in (A3).

References
Ades, M., and P. J. van Leeuwen, 2013: An exploration of the equivalent weights particle filter. Quarterly Journal of the Royal Meteorological Society, 139 (672), 820–840, doi:10.1002/qj.1995.

Anderson, J. L., 2007: An adaptive covariance inflation error correction algorithm for ensemble filters. Tellus A, 59 (2), 210–224, doi:10.1111/j.1600-0870.2006.00216.x.

Anderson, J. L., 2009: Spatially and temporally varying adaptive covariance inflation for ensemble filters. Tellus A, 61 (1), 72–83, doi:10.1111/j.1600-0870.2008.00361.x.

Bengtsson, T., C. Snyder, and D. Nychka, 2003: Toward a nonlinear ensemble filter for high-dimensional systems. Journal of Geophysical Research: Atmospheres, 108 (D24).

Bishop, C., and D. Hodyss, 2009a: Ensemble covariances adaptively localized with ECO-RAP. Part 1: Tests on simple error models. Tellus A, 61 (1).

Bishop, C. H., B. J. Etherton, and S. J. Majumdar, 2001: Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. Monthly Weather Review, 129 (3), 420–436, doi:10.1175/1520-0493(2001)129.

Bishop, C., and D. Hodyss, 2009b: Ensemble covariances adaptively localized with ECO-RAP. Part 2: A strategy for the atmosphere. Tellus A, 61 (1), 97–111, doi:10.1111/j.1600-0870.2008.00372.x.

Burgers, G., P. Jan van Leeuwen, and G. Evensen, 1998: Analysis scheme in the ensemble Kalman filter. Monthly Weather Review, 126 (6), 1719–1724.

Campbell, W. F., C. H. Bishop, and D. Hodyss, 2010: Vertical covariance localization for satellite radiances in ensemble Kalman filters. Monthly Weather Review, 138 (1), 282–290, doi:10.1175/2009MWR3017.1.

Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. Journal of Geophysical Research: Oceans, 99 (C5), 10 143–10 162.

Evensen, G., 2003: The ensemble Kalman filter: Theoretical formulation and practical implementation. Ocean Dynamics, 53 (4), 343–367.

Evensen, G., 2004: Sampling strategies and square root analysis schemes for the EnKF. Ocean Dynamics, 54 (6), 539–560.

Foygel, R., and M. Drton, 2010: Extended Bayesian information criteria for Gaussian graphical models. Advances in Neural Information Processing Systems, 604–612.

Frei, M., and H. R. Künsch, 2013a: Bridging the ensemble Kalman and particle filters. Biometrika, 100 (4), 781–800.

Frei, M., and H. R. Künsch, 2013b: Mixture ensemble Kalman filters. Computational Statistics & Data Analysis, 58, 127–138.

Friedman, J., T. Hastie, and R. Tibshirani, 2008: Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9 (3), 432–441.

Gaspari, G., and S. E. Cohn, 1999: Construction of correlation functions in two and three dimensions. Quarterly Journal of the Royal Meteorological Society, 125 (554), 723–757.

Godinez, H. C., and J. D. Moulton, 2012: An efficient matrix-free algorithm for the ensemble Kalman filter. Computational Geosciences, 16 (3), 565–575, doi:10.1007/s10596-011-9268-9.

Greybush, S. J., E. Kalnay, T. Miyoshi, K. Ide, and B. R. Hunt, 2011: Balance and ensemble Kalman filter localization techniques. Monthly Weather Review, 139 (2), 511–522, doi:10.1175/2010MWR3328.1.

Hamill, T. M., J. S. Whitaker, and C. Snyder, 2001: Distance-dependent filtering of background error covariance estimates in an ensemble Kalman filter. Monthly Weather Review, 129 (11), 2776–2790.

Houtekamer, P. L., and H. L. Mitchell, 2001: A sequential ensemble Kalman filter for atmospheric data assimilation. Monthly Weather Review, 129 (1), 123–137.

Houtekamer, P. L., H. L. Mitchell, and X. Deng, 2009: Model error representation in an operational ensemble Kalman filter. Monthly Weather Review, 137 (7), 2126–2143, doi:10.1175/2008MWR2737.1.

Houtekamer, P. L., H. L. Mitchell, G. Pellerin, M. Buehner, M. Charron, L. Spacek, and B. Hansen, 2005: Atmospheric data assimilation with an ensemble Kalman filter: Results with real observations. Monthly Weather Review, 133 (3), 604–620, doi:10.1175/MWR-2864.1.

Hsieh, C.-J., M. A. Sustik, I. S. Dhillon, P. K. Ravikumar, and R. Poldrack, 2013: BIG & QUIC: Sparse inverse covariance estimation for a million variables. Advances in Neural Information Processing Systems, 3165–3173.

Hunt, B. R., E. J. Kostelich, and I. Szunyogh, 2007: Efficient data assimilation for spatiotemporal chaos: a local ensemble transform Kalman filter. Physica D: Nonlinear Phenomena, 230 (1-2), 112–126.

Janková, J., S. Van De Geer, and Coauthors, 2015: Confidence intervals for high-dimensional inverse covariance estimation. Electronic Journal of Statistics, 9, 1205–1229.

Johnstone, I. M., 2001: On the distribution of the largest eigenvalue in principal components analysis. The Annals of Statistics, 29 (2), 295–327.

Johnstone, I. M., and A. Yu Lu, 2004: Sparse principal component analysis. Unpublished manuscript.

Julier, S. J., and J. K. Uhlmann, 1997: New extension of the Kalman filter to nonlinear systems. AeroSense'97, International Society for Optics and Photonics, 182–193.

Lei, J., and P. Bickel, 2011: A moment matching ensemble filter for nonlinear non-Gaussian data assimilation. Monthly Weather Review, 139 (12), 3964–3973.

Li, H., E. Kalnay, and T. Miyoshi, 2009: Simultaneous estimation of covariance inflation and observation errors within an ensemble Kalman filter. Quarterly Journal of the Royal Meteorological Society, 135 (639), 523–533, doi:10.1002/qj.371.

Lv, J., and J. S. Liu, 2014: Model selection principles in misspecified models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76 (1), 141–167.

Miyoshi, T., 2011: The Gaussian approach to adaptive covariance inflation and its implementation with the local ensemble transform Kalman filter. Monthly Weather Review, 139 (5), 1519–1535, doi:10.1175/2010MWR3570.1.

Nakano, S., 2014: Hybrid algorithm of ensemble transform and importance sampling for assimilation of non-Gaussian observations. Tellus A, 66.

Nerger, L., T. Janjić, J. Schröter, and W. Hiller, 2012: A unification of ensemble square root Kalman filters. Monthly Weather Review, 140 (7), 2335–2345, doi:10.1175/MWR-D-11-00102.1.

Nino-Ruiz, E. D., A. Sandu, and X. Deng, 2015: A parallel ensemble Kalman filter implementation based on modified Cholesky decomposition. Proceedings of the 6th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, ScalA '15, 4:1–4:8, doi:10.1145/2832080.2832084.

Ott, E., and Coauthors, 2004: A local ensemble Kalman filter for atmospheric data assimilation. Tellus A, 56 (5), 415–428.

Papadakis, N., É. Mémin, A. Cuzol, and N. Gengembre, 2010: Data assimilation with the weighted ensemble Kalman filter. Tellus A, 62 (5), 673–697, doi:10.1111/j.1600-0870.2010.00461.x.

Ravikumar, P., M. J. Wainwright, G. Raskutti, B. Yu, and Coauthors, 2011: High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence. Electronic Journal of Statistics, 5, 935–980.

Robert, S., 2014: modifiedSWEQ: Simplified cumulus convection with the modified SWEQ. R package version 0.1, https://github.com/robertsy/modifiedSWEQ.

Robert, S., and H. R. Künsch, 2016: Local Ensemble Kalman Particle Filters for efficient data assimilation. ArXiv e-prints, 1605.05476.

Schwarz, G., 1978: Estimating the dimension of a model. Ann. Statist., 6, 461–464, doi:10.1214/aos/1176344136.

Snyder, C., T. Bengtsson, P. Bickel, and J. Anderson, 2008: Obstacles to high-dimensional particle filtering. Monthly Weather Review, 136 (12), 4629–4640, doi:10.1175/2008MWR2529.1.

Tippett, M. K., J. L. Anderson, C. H. Bishop, T. M. Hamill, and J. S. Whitaker, 2003: Ensemble square root filters. Monthly Weather Review, 131 (7), 1485–1490, doi:10.1175/1520-0493(2003)131.

Tödter, J., and B. Ahrens, 2015: A second-order exact ensemble square root filter for nonlinear data assimilation. Monthly Weather Review, 143 (4), 1347–1367, doi:10.1175/MWR-D-14-00108.1.

Ueno, G., and T. Tsuchiya, 2009: Covariance regularization in inverse space. Quarterly Journal of the Royal Meteorological Society, 135 (642), 1133–1156.

van Leeuwen, P. J., 2010: Nonlinear data assimilation in geosciences: an extremely efficient particle filter. Quarterly Journal of the Royal Meteorological Society, 136 (653), 1991–1999, doi:10.1002/qj.699.

Vershynin, R., 2012: How close is the sample covariance matrix to the actual covariance matrix? Journal of Theoretical Probability, 25 (3), 655–686.

Wan, E. A., and R. Van Der Merwe, 2000: The unscented Kalman filter for nonlinear estimation. Adaptive Systems for Signal Processing, Communications, and Control Symposium 2000 (AS-SPCC), IEEE, 153–158.

Wang, X., T. M. Hamill, J. S. Whitaker, and C. H. Bishop, 2007: A comparison of hybrid ensemble transform Kalman filter-optimum interpolation and ensemble square root filter analysis schemes. Monthly Weather Review, 135 (3), 1055–1076, doi:10.1175/MWR3307.1.

Whitaker, J. S., and T. M. Hamill, 2002: Ensemble data assimilation without perturbed observations. Monthly Weather Review, 130 (7), 1913–1924, doi:10.1175/1520-0493(2002)130.

Würsch, M., and G. C. Craig, 2014: A simple dynamical model of cumulus convection for data assimilation research. Meteorologische Zeitschrift, 23 (5), 483–490, doi:10.1127/0941-2948/2014/0492.
Table 1: Mean, median, 10% and 90% quantile of RMSE averaged over 50 trials. The number in the parentheses is the summary statistic's corresponding standard deviation.

n = 400       10%          50%          Mean         90%
TAPER-EnKF    0.580 (.01)  0.815 (.01)  0.878 (.02)  1.240 (.03)
PEnKF         0.538 (.02)  0.757 (.03)  0.827 (.03)  1.180 (.05)

n = 100       10%          50%          Mean         90%
TAPER-EnKF    0.582 (.01)  0.839 (.02)  0.937 (.03)  1.390 (.06)
PEnKF         0.717 (.04)  0.988 (.04)  1.067 (.04)  1.508 (.05)

n = 25        10%          50%          Mean         90%
TAPER-EnKF    0.769 (.04)  1.668 (.13)  1.882 (.09)  3.315 (.11)
PEnKF         0.971 (.03)  1.361 (.03)  1.442 (.03)  2.026 (.04)

n = 10        10%          50%          Mean         90%
TAPER-EnKF    2.659 (.07)  3.909 (.06)  3.961 (.05)  5.312 (.06)
PEnKF         1.147 (.02)  1.656 (.02)  1.735 (.02)  2.437 (.04)

Fig. 1: Each line represents the normalized values of entries of row i of the inverse covariance matrix, ordered from i − 20 to i + 20, where the values are averaged over 50 trials. The PEnKF algorithm is successful at identifying that the state variables far away from variable i have no effect on it, even though there are fewer ensemble members than state variables.

Fig. 2: The RMSE of the TAPER-EnKF and PEnKF over 50 trials. The darker lines of each linetype are the mean and the colored areas are the 95% confidence intervals. There is clear separation between the RMSE of the two filters, with the PEnKF's error significantly smaller.

Fig. 3: Fluid height, rain, and wind at 300 different locations at an instance of time. The blue dots are observations; rain is always observed, wind is only observed when the rain is non-zero, and fluid height is never observed. The dashed lines in fluid height are the cloud and rainwater thresholds.

Fig. 4: The RMSE of the TAPER-EnKF, NAIVE-LEnKPF, and PEnKF over 50 trials. The darker lines of each linetype are the mean and the colored areas are the 95% confidence intervals. All three filters are pretty indistinguishable except for the PEnKF's rain error, which is statistically smaller than the others.