On the Mathematical Theory of Ensemble (Linear-Gaussian) Kalman-Bucy Filtering
OOn the Mathematical Theory of Ensemble (Linear-Gaussian)Kalman-Bucy Filtering
Adrian N. Bishop and Pierre Del Moral CSIRO; and University of Technology Sydney (UTS), Australia INRIA, Bordeaux Research Center, France
Abstract
The purpose of this review is to present a comprehensive overview of the theory ofensemble Kalman-Bucy filtering for linear-Gaussian signal models. We present a system ofequations that describe the flow of individual particles and the flow of the sample covarianceand the sample mean in continuous-time ensemble filtering. We consider these equationsand their characteristics in a number of popular ensemble Kalman filtering variants. Giventhese equations, we study their asymptotic convergence to the optimal Bayesian filter. Wealso study in detail some non-asymptotic time-uniform fluctuation, stability, and contractionresults on the sample covariance and sample mean (or sample error track). We focus ontestable signal/observation model conditions, and we accommodate fully unstable (latent)signal models. We discuss the relevance and importance of these results in characterisingthe filter’s behaviour, e.g. it’s signal tracking performance, and we contrast these resultswith those in classical studies of stability in Kalman-Bucy filtering. We provide intuitionfor how these results extend to nonlinear signal models and comment on their consequenceon some typical filter behaviours seen in practice, e.g. catastrophic divergence.
Contents a r X i v : . [ m a t h . S T ] J un Introduction
Consider a time-invariant, continuous-time, signal and observation model of the form, d X t = a ( X t ) dt + R / d V t d Y t = h ( X t ) dt + R / d W t (1.1)where X t is the underlying signal (latent) process, Y t is the observation signal, a ( · ) and h ( · ) are the signal and sensor model functions, and V t and W t are continuous-time Brownian motion(noise) signals. The filtering problem [2, 6] is concerned with estimating some statistic(s) of thesignal X t conditioned on the observations Y s , ≤ s ≤ t . For example, one may want to charac-terise fully the distribution of X t given Y t , or one may seek some moments of this distribution.The conditional distribution of X t given Y s , ≤ s ≤ t is called the (optimal, Bayesian) filter-ing distribution. When the model functions a ( · ) , h ( · ) are linear, the exact (optimal, Bayesian)solution to this problem is completely characterised by the first two moments of the filteringdistribution and these moments are given by the celebrated Kalman-Bucy filter [43, 2, 8].Apart from the most pathological model cases [6], in the nonlinear model setting, there isin general no finite dimensional optimal filter. In practice, typically some filter approximationsare needed. For example, one may consider a type of “extended” Kalman filter [2] based onlinearisation of the nonlinear model and application of the classical Kalman-Bucy filter. Thismethod works well in suitably regular, and sufficiently close to linear problems. This methoddoes not handle multiple modes in the true conditional filtering distribution well. So-calledGaussian sum filters are another Kalman-filter-type approximation designed to handle in somesense multiple modes in the filtering distribution [2]. More recently there has been some focuson Monte Carlo integration methods for approximating the optimal Bayesian filter [24, 6]. Suchmethods, termed particle filters or sequential Monte Carlo filters/methods [34, 29, 28], havethe advantage of not being subject to the assumption of linearity or Gaussianity in the model.These particle filters are consistent in the number of Monte Carlo samples, i.e. with infinitecomputational power these methods converge to the optimal nonlinear filter. However, typicalparticle filtering algorithms exhibit high computational costs with approximation errors thatgrow (with a fixed sample size) with the signal/observation dimensions [24, 68]. These methodsin practice are not scalable to the high-dimensional filtering or state estimation problems foundin the geosciences and other areas [32, 44, 53, 70].The ensemble Kalman-Bucy filter (generally abbreviated EnKF ) [31, 32] is a type of MonteCarlo sampling approximation of the Kalman filter. In particular, the
EnKF is a recursive algo-rithm for propagating and updating the sample mean and sample covariance of an approximatedBayesian filter [32]. The filter works via the evolution of a collection (i.e. an ensemble) of samples(or ensemble members, or particles) that each satisfies a type of Kalman-Bucy update equation.In classical Kalman-Bucy filtering [43, 2, 8], a gain function weights a predicted state estimatewith the signal observations dependent on the filtering error covariance matrix, see [32, 8]. Inthe
EnKF , the error covariance is replaced by a type of sample covariance in the gain function.The result is a system of interacting particles in the spirit of a mean-field approximation of acertain McKean-Vlasov-type diffusion equation [61, 26]. We may refine this discussion by givingthe relevant equations for a most basic form of
EnKF . Let ( V it , W it , X i ) with ≤ i ≤ N + 1 be ( N + 1) independent copies of ( V t , W t , X ) . The most basic ensemble Kalman filter, originallydue to Evensen [18, 31, 32], is defined by, d X it = a ( X it ) dt + R / d V it + (cid:98) P ht R − (cid:104) d Y t − (cid:16) h ( X it ) dt + R / d W it (cid:17)(cid:105) (1.2)with ≤ i ≤ N + 1 and the (particle) sample mean and the sample cross-covariance defining the2o-called Kalman gain matrix given by, (cid:98) X t := 1 N + 1 N +1 (cid:88) i =1 X it and (cid:98) P ht := 1 N N +1 (cid:88) i =1 (cid:104) X it − (cid:98) X t (cid:105) (cid:34) h ( X it ) − N + 1 N +1 (cid:88) i =1 h ( X it ) (cid:35) (cid:48) (1.3)and we may also write the standard sample covariance, (cid:98) P t := 1 N N +1 (cid:88) i =1 (cid:104) X it − (cid:98) X t (cid:105) (cid:104) X it − (cid:98) X t (cid:105) (cid:48) (1.4)In this work we study this most basic ensemble Kalman filter as described above, and also moresophisticated variants, including the method of Sakov and Oke [72], that exhibit less fluctuationdue to sampling noise. Readers familiar with the Kalman filter will recognise immediately somestructural similarities as discussed above. However, there is no evolution equation given abovefor the covariance as in the Kalman filter (e.g. no Riccati-type matrix flow equation). Instead,we replace the relevant covariance matrices with their sample-based counterparts.Importantly, if the underlying model is linear and Gaussian, then the optimal Bayesian filter-ing distribution is Gaussian, and the EnKF propagates exactly the sample mean and covarianceof the optimal Bayesian filter and is provably consistent. If the model is nonlinear and/or non-Gaussian, then a standard implementation of the
EnKF propagates a sample-based estimate ofthe filtering mean and covariance (but not the true posterior sample mean or covariance andwith no results on consistency). In the context of estimation theory, we may contrast the notionof a state estimator (or observer) with the notion of a Bayesian filter. The goal of the formeris to design an observer that tracks in some suitable (typically point-wise) sense the underlyingsignal and perhaps provides some usable measure of uncertainty on this estimate. The goalof the latter is to compute or approximate the true (Bayesian) filtering distribution (or somerelated statistics). In the nonlinear setting, even with infinite computational power, the
EnKF methods do not converge to the optimal nonlinear filter and indeed their limiting objects arenot well understood in this setting. As discussed more technically later, ensemble Kalman filtersare probably best viewed in practice as a type of (random) sample-based observer or state esti-mator for nonlinear signal/observation models. However, in the special case of linear signal andobservation models they are indeed provably consistent approximations of the optimal Bayesianfilter.In practice, the ensemble Kalman filtering methodology is applied in high-dimensional, non-linear state-space models, e.g. see [31, 32] and the application references listed later in thisintroduction. Empirically, this method has shown good tracking performance in these applica-tions, see [32] and the application references listed later. This tracking behaviour of the
EnKF when applied to practical models may be explainable as a consequence of viewing the
EnKF inthe sense of a dynamic state estimator, cf. the preceding discussion on filtering versus stateestimators/observers. The fluctuation, stability and contraction properties of the
EnKF exploredin this article (albeit mainly for linear-Gaussian models) may be viewed in this context also andprovide some insight into the state estimate tracking behaviour seen in practice.
The purpose of this review is to present a comprehensive overview of the theory of ensembleKalman-Bucy filtering with an emphasis on rigorous results and behavioural characterisations forlinear-Gaussian signal/observation models. We present a system of equations that describe theflow of individual particles, the flow of the sample covariance, and the flow of the sample meanin continuous-time ensemble filtering. We consider these equations and their characteristics ina number of popular
EnKF varieties. Given these equations, we study in detail some fluctuation,stability, and error contraction results for the relevant ensemble Kalman filtering equations .3e discuss the relevance and importance of these results in terms of characterising the
EnKF behaviour, and we contrast these results with those considered in classical studies of stability inKalman-Bucy filtering.We remark that classical studies of stability in (traditional, non-ensemble-type) Kalman-Bucy filtering are important because they rigorously establish the type of “tracking” propertiesdesired in a filtering or estimation problem; and they establish intuitive, testable, model-basedconditions (e.g. so-called model observability) for achieving these convergence properties. Clas-sical results in Kalman-Bucy filtering also establish the (exponential) convergence of the errorcovariance to a fixed steady-state value computable from the model parameters. See the review[8, 10] for detailed results in the classical context and historical remarks. The results in thiswork seek to characterise in an analogous manner the practical performance and behaviour ofensemble Kalman filtering, and these results then provide guidance and intuition on the tracking,approximation error, and other properties of these practical methods.
The
EnKF is a key numerical method for solving high-dimensional forecasting and data assimi-lation problems; see, e.g., [31, 32]. In particular, applications have been motivated by inferenceproblems in ocean and atmosphere sciences [56, 58, 44, 65], weather forecasting [3, 4, 18, 38],environmental and ecological statistics [1, 42], as well as in oil reservoir simulations [33, 64, 76],and many others. This list is by no means exhaustive, nor the cited articles fully representa-tive of the respective applications. We refer to (some of) the seminal methodology papers in[30, 38, 18, 5, 39, 36, 3, 4, 79, 72, 69, 83]. This long list is not exhaustive; see also [32, 44, 53, 70]for more background, and the detailed chronological list of references in Evensen’s text [32]. En-semble Kalman methods for inverse problems have also been considered in the literature [41, 21]with some related analysis [74, 75]. See these references for further details on this topic.In continuous-time, we may broadly break down the class of
EnKF methods into three distincttypes; distinguished by the level of fluctuation added via sampling noise needed to ensure thatthe
EnKF sample mean and covariance are consistent in the linear-Gaussian setting. The originalform of the
EnKF is the so-called ‘vanilla’
EnKF of Evensen [18, 32]; and this method exhibitsthe most fluctuations due to sampling noise. The next class is the so-called ‘deterministic’
EnKF of Sakov and Oke [72], see also [69]; which exhibits (considerably) less fluctuation. In thecontinuous-time linear-Gaussian setting this class is representative of the so-called square-root
EnKF methods [79] (which differ somewhat in discrete-time). Finally, there has been recentinterest in so-called transport-inspired
EnKF methods [69, 78]; which apart from initialisationnoise/randomisation are completely deterministic and whose analysis (in the linear setting)follows that of the classical Kalman-Bucy filter (cf. [8]). These classes do not distinguishthe totality of
EnKF methodology (especially in nonlinear or non-Gaussian models); which mayfurther consist of so-called covariance regularisation methods [5, 39, 36, 62, 31], etc. However,in the linear-Gaussian setting, these three classes broadly capture the fundamentals. We willreturn to these three classes, and some regularisation methods, in the development of this article.Convergence and large-sample asymptotics of the discrete-time
EnKF has been studied in [55,60, 51], in the sense of taking the number of particles to infinity. The discrete-time square rootform of the
EnKF is accommodated in [51], and nonlinear state-space models are accommodatedin [55]. In [54, 22] the authors consider continuous-time (non-Gaussian) state-space models(e.g. certain nonlinear diffusion models) and the convergence (with sample size) of particular(somewhat less standard)
EnKF methods. In the continuous-time, linear-Gaussian, setting, theconvergence (in sample size) of the three broad classes of
EnKF to the true Kalman-Bucy filteris immediate and follows, e.g., from the sample mean and sample covariance matrices evolutionequations in [26, 11]. In this latter sense, we recover the fact that the
EnKF is a consistentapproximation of the optimal, Bayesian filter (i.e. the classical Kalman-Bucy filter) in thelinear-Gaussian setting as discussed earlier. 4n practice, one is typically interested in the non-asymptotic (in terms of ensemble size)fluctuation properties as well as the long time/stability behaviour of the particle-type filteringapproximations.The fluctuation analysis of ensemble Kalman-type filtering is studied in detail in the linear-Gaussian setting in [14, 13, 11]. In [14] a complete Taylor-type stochastic expansion of thesample covariance is given at any order with bounded remainder terms and estimates. Bothnon-asymptotic and asymptotic bias and variance estimates for the
EnKF sample covariance andsample mean are given explicitly in [14]. These latter expansions directly imply an almost surestrong form of a central limit-type result on the sample covariance and sample mean at anytime. The analysis in [14] is considered over the entire path space of the matrix-valued Riccatistochastic differential equation that describes the flow of the sample covariance. However, mostof the non-asymptotic time-uniform results in [14] hold only when the underlying signal is stable.In [13, 11] we consider the case in which the underlying signal may be unstable, and we providetime-uniform, non-asymptotic moment estimates and time-uniform control over the fluctuationof the sample covariance and mean about their limiting Riccati and Kalman-Bucy filtering terms.The emphasis of time-uniformity on the moment bounds and on the fluctuation boundson the sample mean and sample covariance (about the true optimal Bayesian filtering meanand covariance) is important. If these bounds are allowed to grow in time, e.g. typically in thisanalysis one can easily obtain bounds that grow exponentially in time, then these bounds quicklybecome useless for any practical numerical application; e.g. an exponent > may induce anexceedingly pessimistic bound greater than the estimated number of particles of matter in thevisible universe. We remark also that our emphasis on accommodating unstable (latent) signalmodels is important because time-uniform fluctuation results in such cases (which are of realpractical importance) are significantly more difficult to obtain under testable and realistic modelassumptions (like the classical observability and controllability model assumptions in the controland filtering literature [2, 8]).In [26], stability of the EnKF in continuous-time linear-Gaussian models is considered underthe assumption that the underlying signal model is also stable. This latter assumption is incontrast with classical Kalman-Bucy filter stability results, which hold in the linear-Gaussiansetting under the much weaker (and more natural) condition of signal detectability [82, 8, 10].The classical Kalman-Bucy filter is stable as a result of the closed-loop stabilising propertiesof the so-called Kalman gain matrix, which is intimately related to the flow of the filter errorcovariance described by a Riccati differential equation. The
EnKF analogue, in linear-Gaussiansettings, is the sample covariance, and its random fluctuation properties (noted in the precedingparagraph) are the main source of difficulty in establishing the closed-loop filter stability inthose models in which the underlying signal itself is not stable.In [80], the authors analyse the long-time behaviour of the (discrete-time)
EnKF in a classof nonlinear systems, with finite ensemble size, using Foster-Lyapunov techniques. Applyingthe results of [80] to the basic linear-Gaussian filtering problem, the analysis and assumptionsin [80] then also require stability of the underlying signal model. In a traditional sense, theconditions needed in [80] are hard to check, e.g. as compared to the classical observability orcontrollability-type model conditions in Kalman filtering analysis; but a range of examples aregiven in [80]. In [47], the long-time behaviour of the
EnKF is analysed in both discrete andcontinuous time settings with similar conditions on the model as in [80]; and which again iflinearised equates to a form of stability on the signal model.We emphasise again that the type of analysis in [47, 80, 26] cannot handle unstable, ortransient, signal models; i.e. signals with sample paths with at least one coordinate that maygrow unbounded. In the context studied in [47, 80, 26] dealing with stable or bounded latentsignal processes (e.g. the Lorenz-class of signal models [47, 80]), the important question on thefilter stability or filtering error estimates relies on obtaining meaningful quantitative fluctuationconstants decreasing with the number of ensemble members to achieve a desired performance.5f course, time uniformity of these bounds follows trivially in this setting from the boundednessproperties of the latent signal process.Covariance inflation is a mechanism used in practical methods to increase the positive-definiteness of the sample covariance matrix and essentially amplify its effect on the stabilisationproperties of the Kalman gain matrix. In [47] time-uniform
EnKF error boundedness results followunder a true signal stability condition and given a sufficiently large variance inflation regime.See also [81, 59] for related stability analysis in the presence of adaptive covariance inflation andprojection techniques. In [11] in the continuous-time linear-Gaussian setting, the mechanism bywhich covariance inflation acts to stabilise the ensemble filter is exemplified, see also [16].In the continuous-time, and linear-Gaussian setting, the first work to relax the assumptionof underlying signal stability for the
EnKF is in [14, 13, 11]. In those articles, signals withsample paths that may grow unbounded (to infinity exponentially fast) are accommodated.That work is based on both a fluctuation analysis of the sample covariance and the samplemean [14, 13, 11], followed by studies on the long-time behaviour, e.g. stability properties,of both the sample covariance and mean [13, 11]. Time-uniform fluctuation properties aregiven under a type of (strong) signal observability condition. In this setting time-uniformity ofthese results is non-trivial. This assumption is in keeping with classical Kalman-Bucy filteringand Riccati equation results; and does not require any form of underlying signal stability. Asthe authors of [80] note in their stability analysis, they use “few properties of the forecast[predicted] covariance matrix other than positivity”. As noted in [80], this lends generality totheir results, but conversely places the burden back on the signal model assumptions (includingthose assumptions of true signal stability). Contrast this with the work in [14, 13, 11] whereemphasis is placed on the fluctuation analysis of the sample covariance, with a primary aim ofremoving the stability assumptions needed on the underlying signal model. The time-uniformfluctuation and stochastic perturbation contributions in [14, 13, 11], were discussed earlier.Given this fluctuation analysis, the stability of the filter sample mean and sample covarianceand their (time) asymptotic properties are studied in [14, 13, 11] without stability assumptions onthe underlying signal model. These results rigorously establish the type of “tracking” propertiesdesired by a filtering or estimation solution.Although of lesser practical use in ensemble filtering applications, strong results in the one-dimensional setting are also derived in [13] that converge, e.g. in the limit with the ensemblesize, to those properties of the classical Kalman-Bucy filter. For example, we can recover theoptimal exponential contraction and filter stability rates, etc. In the multidimensional setting,the decay rates to equilibrium are not sharp, and the stationary measures are not given in closedform.
The main goal of this article is to: 1). present a novel formulation for ensemble filtering inlinear-Gaussian, continuous-time, systems that lends itself naturally to analysis; 2). study thestability of the resulting stochastic Riccati differential equation that describes the flow of thesample covariance; 3). study the stability of the continuous-time ensemble Kalman-Bucy updateequation that is coupled to this stochastic Riccati equation, and which describes the flow of thesample mean (or the sample mean minus the true signal, i.e. the sample error signal); 4). providedetailed fluctuation analysis of the ensemble Kalman-Bucy flow, the sample mean, and thestochastic Riccati equation describing the sample covariance. This article is primarily a review ofthe literature and results in these directions. The prime focal point of this review are the articles[26, 14, 13, 11], which focus heavily on the linear-Gaussian model setting. In those articles, an emphasis is placed on deriving time-uniform fluctuation, stability, and contraction resultsunder testable model conditions reminiscent of those classical observability and controllability-type model assumptions. Importantly, we do not generally assume the true underlying signal isstable in this review.
EnKF methodsconsidered is always under-biased when compared to the true covariance matrix. This may mo-tivate, from a pure uncertainty quantification viewpoint, some form of covariance regularisation[5, 39, 36, 62, 31]. We provide detailed analysis illustrating the effect of inflation regularisation onstability (similarly to [47, 81, 59]). As another example, we provide strong intuition for so-calledcatastrophic filter divergence (studied previously in [37, 35, 48]) based on rigorous (heavy-tailed)fluctuation properties inherent to the relevant sample covariance matrices and their invariantdistributions. We contrast the so-called ‘vanilla’
EnKF of [18, 32] with the ‘deterministic’
EnKF ofSakov and Oke [72] in terms of their fluctuation and sample noise characteristics and we showhow this effects their respective sample behaviour and stability properties.As with classical (non-ensemble) Kalman filtering, the importance of the results reviewedis in rigorously establishing the type of tracking and stability behaviour desired in filteringapplications [2, 24, 6, 8]. For example, our results imply conditions under which the initialestimation errors are forgotten, and that the flow of the sample mean converges to the trueKalman filtering mean estimate (and thus the true underlying signal) in the average. In thiscase, there must be some emphasis on the stochastic behaviour of the ensemble (Monte Carlo)mean and covariance in order to establish filter stability. We also provide the analogue of theerror covariance fixed point in classical Kalman filtering [2, 8]; whereby we state results thatensure the sample covariance matrix converges to an invariant, steady-state, distribution. Wecharacterise the properties of this invariant distribution and relate this to the sample behaviourof the ‘vanilla’
EnKF [18, 32] and the ‘deterministic’
EnKF [72].We focus on the linear, continuous-in-time, Gaussian setting in this review and note that inthis case the sample mean and sample covariance are consistent approximations of the optimalBayesian filtering mean and covariance. We emphasise that even in the linear-Gaussian case,the samples themselves are not in general independent. The analysis even in the linear settingis highly technical [26, 14, 13, 11] and the results presented in this case are aimed as a step inthe progression to more applied results and intuition in nonlinear model settings. There is someprecedent for studying the relative properties, behaviour, or performance of ensemble Kalmanfiltering firstly with linear-Gaussian signal models [31]. For example, the seminal article [18]illustrated that a perturbation of the observations in the ensemble Kalman filter was necessary torecover a consistent covariance limit (to the true Kalman filter for linear-Gaussian systems); or toachieve the standard Monte Carlo error rate with a finite size set of particles. The analysis (andeven derivation) of ensemble square root filters for linear-Gaussian system models is standard[73, 57], etc. Convergence of the ensemble Kalman filter in inverse problems is studied in [74]in the linear setting. We discuss connections and extensions of the results in this article to thenonlinear model setting toward the end.Note that the analysis and proofs in [26, 14, 13, 11], while motivated originally by en-semble Kalman-type filtering methods, are largely presented as independent technical resultson certain general classes of matrix-valued Riccati diffusion equations and associated linearstochastic differential equations with random coefficients. In this review we emphasise the workin [26, 14, 13, 11] via a series of results directly and solely stated in the context of ensembleKalman-type filtering. Throughout we relate our results to the broader technical literature onensemble Kalman filtering and we emphasise the practical significance of these results, e.g. viathe tracking property of the filter, its stability, or via their error fluctuation or catastrophicdivergence behaviour, among other topics. We also contrast the behaviour of the various classesof continuous-time
EnKF methods. 7 .4 Notation
We remark firstly that some care must be taken throughout to keep track of the font stylings; e.g.upright vs. calligraphic vs. script, etc. There is typically a relationship between like symbolsappearing with different stylings.Hatted terms (cid:98) · := · N should be viewed as terms indexed to the ensemble (particle) size N ≥ .Time is indexed variously by s, t, u, τ ∈ [0 , ∞ [ . We write c, c n , c τ , c n,τ , c n,τ ( Q ) , c n,τ ( z, Q ) . . . forsome positive constants whose values may vary from result to result, and which only dependon some parameters n, τ, z, Q , etc, as well as implicitly on the model parameters ( A, H, R, R ) introduced later. Importantly, these constants do not depend on the time horizon t , nor on thenumber of ensemble particles N .Let M d be the set of ( d × d ) real matrices with r ≥ and M d ,d the set of ( d × d ) realmatrices. Let S d ⊂ M d be the subset of symmetric matrices, and S d , and S + d the subsets ofpositive semi-definite and definite matrices respectively. We write A ≥ B when A − B ∈ S d ; and A > B when A − B ∈ S + d . We denote by and I the null and identity matrices, for any d ≥ .Given R ∈ ∂ S + d := S d − S + d we denote by R / a (non-unique) symmetric square root of R . When R ∈ S + d we choose the unique symmetric square root. We write A (cid:48) the transpose of A , and A sym =( A + A (cid:48) ) / its symmetric part. We denote by Absc( A ) := max { Re ( λ ) : λ ∈ Spec( A ) } its spectralabscissa. We also denote by Tr( A ) the trace. When A ∈ S d we let λ ( A ) ≥ . . . ≥ λ d ( A ) denotethe ordered eigenvalues of A . We equip M d with the spectral norm (cid:107) A (cid:107) = (cid:107) A (cid:107) = (cid:112) λ ( AA (cid:48) ) or the Frobenius norm (cid:107) A (cid:107) = (cid:107) A (cid:107) Frob = (cid:112) Tr( AA (cid:48) ) . Let µ ( A ) denote the matrix logarithmic“norm” (which can be < ) [77]. For example, the (2-)logarithmic “norm” is µ ( A ) = λ ( A sym ) .We have µ ( · ) ≥ Absc( · ) .Given a suitably regular matrix-valued stochastic process t (cid:55)→ Q t ∈ M d , for any t ≥ and n ≥ , we set ||| Q t ||| n = E [ (cid:107) Q t (cid:107) n ] /n where E [ · ] denotes the expectation. Consider a time-invariant linear-Gaussian filtering model of the following form, d X t = A X t dt + R / d V t d Y t = H X t dt + R / d W t (2.1)where A ∈ M d and H ∈ M d y ,d are the signal and sensor model matrices respectively, and R ∈ S d and R ∈ S + d y are the respective signal and sensor noise covariance matrices. The noise inputs V t and W t are d and d y -dimensional Brownian motions, and X is an d -dimensional Gaussianrandom variable (independent of ( V t , W t ) ) with mean E ( X ) and covariance P ∈ S d .We let Y = 0 and Y t = σ ( Y s , s ≤ t ) be the σ -algebra generated by the observations. Theconditional distribution η t := Law ( X t | Y t ) of the signal states X t given Y t is Gaussian with aconditional mean and covariance given by X t := E ( X t | Y t ) and P t := E (cid:0) [ X t − X t ] [ X t − X t ] (cid:48) | Y t (cid:1) . The mean and the covariance obey the Kalman-Bucy and the Riccati equations dX t = A X t dt + P t H (cid:48) R − ( d Y t − HX t dt ) (2.2) ∂ t P t = Ricc( P t ) (2.3)with the Riccati drift function from S d into S r defined for any Q ∈ S d by Ricc( Q ) := AQ + QA (cid:48) − QSQ + R (2.4)8nd with, S := H (cid:48) R − H (2.5)Importantly, the covariance of the conditional distribution Law ( X t | Y t ) in this case does notdepend on the observations Y t . The error Z t := ( X t − X t ) satisfies dZ t = ( A − P t S ) Z t dt + P t H (cid:48) R − / d W t − R / d V tlaw = ( A − P t S ) Z t dt + ( P t S P t + R ) / d B t (2.6)where B t is some independent d -dimensional Brownian motion. Here we make use of a martin-gale representation theorem, e.g. [45, Theorem 4.2], see also [27].Let φ t ( Q ) := P t denote the flow of the matrix differential equation (2.3) with P = Q ∈ S d .Let ψ t ( z, Q ) := Z t denote the flow of the stochastic error (2.6) with Z = z = ( x − X ) ∈ R d and P t = φ t ( Q ) . Finally, we denote the flow of the Kalman-Bucy update (2.2) with X = x ∈ R d by χ t ( x, Q ) := X t . This notation allows us to reference the flows ψ t ( z, Q ) , φ t ( Q ) , χ t ( x, Q ) withrespect to their initialisation at t = 0 which is useful when we compare flows and study stability.Throughout the remainder, we assume that ( A, R / ) and ( A, H ) are controllable and ob-servable pairs in the sense that (cid:104) R / , AR / . . . , A r − R / (cid:105) and HHA ... HA r − have rank d. (2.7)Note that if R ∈ S + d is positive definite it follows the controllability holds trivially. We considerthe observability and controllability Gramians ( O t , C t ( O )) and ( C t , O t ( C )) associated with thetriplet ( A, R, S ) and defined by O t := (cid:90) t e − A (cid:48) s S e − As ds and C t ( O ) := O − t (cid:20)(cid:90) t e − ( t − s ) A (cid:48) O s R O s e − ( t − s ) A ds (cid:21) O − t (2.8) C t := (cid:90) t e As R e A (cid:48) s ds and O t ( C ) := C − t (cid:20)(cid:90) t e ( t − s ) A C s S C s e ( t − s ) A ds (cid:21) C − t . Given (2.7), there exists some parameters τ, (cid:36) o,c ± , (cid:36) c ± ( O ) , (cid:36) o ± ( C ) > such that (cid:36) c − ≤ (cid:107)C τ (cid:107) ≤ (cid:36) c + and (cid:36) o − ≤ (cid:107)O τ (cid:107) ≤ (cid:36) o + (2.9) (cid:36) c − ( O ) ≤ (cid:107)C τ ( O ) (cid:107) ≤ (cid:36) c + ( O ) and (cid:36) o − ( C ) ≤ (cid:107)O τ ( C ) (cid:107) ≤ (cid:36) o + ( C ) . (2.10)The parameter τ is often called the interval of observability-controllability.These rank conditions (2.7) ensure the existence and the uniqueness of a positive definitefixed-point matrix P ∞ solving the so-called algebraic Riccati equation Ricc( P ∞ ) := AP ∞ + P ∞ A (cid:48) − P ∞ SP ∞ + R = 0 . (2.11)Importantly, in this case, the matrix difference ( A − P ∞ S ) is asymptotically stable (Hurwitzstable, i.e. Absc( A − P ∞ S ) < ) even when the signal matrix A is unstable [52, Theorems9.12, 9.15]. More relaxed conditions (i.e. detectability and stabilisability) for a stabilisingsolution (perhaps only positive semi-definite) to exist are discussed widely in the literature; see[49, 63, 52] and the convergence results in [50, 19]. A (marginally) stable solution to (2.11) existsunder a detectability condition, and convergence to this solution is given under mild additional9onditions in [67, 20, 66]. In [82], given only detectability, the stability of a time-varying “closedloop”, e.g. ( A − φ t ( Q ) S ) , is proven even when ( A − P ∞ S ) , is only marginally stable.For any s ≤ t and Q ∈ S d we define the state-transition matrix, E s,t ( Q ) := exp (cid:18)(cid:73) ts ( A − φ u ( Q ) S ) du (cid:19) ⇐⇒ ∂ t E s,t ( Q ) = ( A − φ u ( Q ) S ) E s,t ( Q ) . (2.12)When s = 0 we often write E t ( Q ) instead of E ,t ( Q ) . The matrix E t ( Q ) is the fundamental matrix.We have E s,t ( Q ) = E t ( Q ) E s ( Q ) − . The following convergence estimates follow from [8, 10]: Forany Q, Q , Q ∈ S d and any t ≥ we have the local contraction inequalities (cid:107)E t ( Q ) (cid:107) ≤ c (1 + (cid:107) Q (cid:107) ) (cid:107)E t ( P ∞ ) (cid:107) and (cid:107)E t ( P ∞ ) (cid:107) = (cid:107) e t ( A − P ∞ S ) (cid:107) ≤ c e − α t (2.13)for some finite α, c > and with P ∞ solving (2.11) and (cid:107)E t ( Q ) − E t ( Q ) (cid:107) ≤ c ( Q , Q ) e − α t (cid:107) Q − Q (cid:107) (2.14)for some finite constant c ( Q , Q ) > . In addition, there exists some parameter τ > suchthat for any s ≥ and any t ≥ τ > we have the uniform estimates, (cid:107)E s,s + t ( Q ) (cid:107) ≤ c τ (cid:107)E t ( P ∞ ) (cid:107) (2.15)Note it is desirable to relate the decay of E s,s + t ( Q ) to the decay at the fixed point (cid:107)E t ( P ∞ ) (cid:107) = (cid:107) e t ( A − P ∞ S ) (cid:107) ≤ c e − α t (since as t → ∞ it is clear that we cannot do better). See [10] for anexplicit Floquet-type expression of E t ( Q ) in terms of E t ( P ∞ ) .The convergence and stability properties of the Kalman-Bucy filter and the associated Riccatiequation are directly related to the contraction properties of the state-transition matrix E s,t ( Q ) .To get some intuition for this we note, ψ t ( z, Q ) = E s,t ( Q ) ψ s ( z, Q ) + (cid:90) ts E u,t ( Q ) ( φ u ( Q ) S φ u ( Q ) + R ) / d B u (2.16)and φ t ( Q ) = E s,t ( Q ) φ s ( Q ) E s,t ( Q ) (cid:48) + (cid:90) t E s,t ( Q ) ( φ s ( Q ) S φ s ( Q ) + R ) E s,t ( Q ) (cid:48) ds (2.17)for any s ≤ t .From [8], for any t ≥ τ > and any Q ∈ S d we have the uniform estimates (cid:0) O τ ( C ) + C − τ (cid:1) − ≤ φ t ( Q ) ≤ O − τ + C τ ( O ) . (2.18)We also have ≤ φ t ( Q ) ≤ P ∞ + e ( A − P ∞ S ) t ( R − P ∞ ) e ( A − P ∞ S ) (cid:48) t (2.19)The following stability result follows from [8, 10]: For any Q , Q ∈ S d and for any t ≥ , (cid:107) φ t ( Q ) − φ t ( Q ) (cid:107) ≤ c (1 + (cid:107) Q (cid:107) + (cid:107) Q (cid:107) ) (cid:107)E t ( P ∞ ) (cid:107) (cid:107) Q − Q (cid:107) (2.20)and recall the exponential contraction estimate on (cid:107)E t ( P ∞ ) (cid:107) in (2.13). Similarly, using (2.15),for any s ≥ and any t ≥ τ > , we have (cid:107) φ s,s + t ( Q ) − φ s,s + t ( Q ) (cid:107) ≤ c τ (cid:107)E t ( P ∞ ) (cid:107) (cid:107) Q − Q (cid:107) (2.21)Note that both (2.20) and (2.21) imply immediately that φ t ( Q ) → t →∞ P ∞ exponentially fastfor any Q ∈ S d ; e.g. by letting Q = P ∞ . 10ote that the uniform estimates independent of the initial condition stated throughout,involve some arbitrarily small, positive time parameter τ , which is related to the notion of aso-called observability/controllability interval; for further details on this topic we refer to [8].Results (e.g. bounds and convergence results) on the flow of the inverse of the solution ofthe Riccati equation are considered in [8] and are relevant for proving results on the flow of theRiccati equation itself; e.g. upper bounds on the flow of the inverse solution help to lower boundsolutions of the Riccati flow. The flow of the inverse Riccati solution may also be of interest onits own as it relates to the flow of “information” (as the inverse of covariance).Given the contraction properties on E s,t ( Q ) it is often said the “deterministic part” of thefilter error ∂ t Z t = ( A − P t S ) Z t is stable. From [8] we can be more explicit if desired, forexample, for any t ≥ τ we have the uniform estimate, sup Q ∈ S d (cid:107) E ( ψ t ( z, Q ) | X ) (cid:107) ≤ c e − α t (cid:107) z − X (cid:107) (2.22)for some rate α > and some finite constant c > . Moreover, the conditional probability ofthe following event (cid:107) ψ t ( z, Q ) (cid:107) ≤ c ( Q ) (cid:18) e − αt (cid:107) z − X (cid:107) + e √ (cid:20)
12 + (cid:16) δ + √ δ (cid:17)(cid:21)(cid:19) (2.23)given the state variable X is greater than − e − δ , for any δ ≥ . And, for any t ≥ , z , z ∈ R d , Q , Q ∈ S d and any n ≥ we have the almost sure local contraction estimate E ( (cid:107) ψ t ( z , Q ) − ψ t ( z , Q ) (cid:107) n | X ) n ≤ c ( Q , Q ) e − αt (cid:107) z − z (cid:107) + c n ( Q , Q ) e − αt (1 + (cid:107) z − X (cid:107) ) (cid:107) Q − Q (cid:107) (2.24)with some rate α > and the finite constants c ( Q , Q ) , c n ( Q , Q ) > . For any probability measure η on R d we let P η denote the η -covariance η (cid:55)→ P η := η (cid:0) [ ι − η ( ι )][ ι − η ( ι )] (cid:48) (cid:1) (3.1)with the identity function ι ( x ) := x and the column vector η ( f ) := (cid:82) f dη for some measurablefunction f : R d → R d .We now consider three different cases of a conditional nonlinear McKean-Vlasov-type diffu-sion process, ( F1 ) d X t = A X t dt + R / d V t + P η t H (cid:48) R − (cid:104) d Y t − (cid:16) H X t dt + R / d W t (cid:17)(cid:105) ( F2 ) d X t = A X t dt + R / d V t + P η t H (cid:48) R − (cid:20) d Y t − H (cid:18) X t + η t ( ι )2 (cid:19) dt (cid:21) (3.2) ( F3 ) d X t = A X t dt + R P − η t ( X t − η t ( ι )) dt + P η t H (cid:48) R − (cid:20) d Y t − H (cid:18) X t + η t ( ι )2 (cid:19) dt (cid:21) where η t := Law ( X t | Y t ) (3.3)and thus the diffusions in (3.2) depend in some nonlinear fashion on the conditional law of the dif-fusion process itself. In all three cases ( V t , W t , X ) are independent copies of ( V t , W t , X ) . Thesediffusions are time-varying Ornstein-Uhlenbeck processes [26] and consequently η t is Gaussian;see also [8]. These Gaussian distributions have the same conditional mean η t ( ι ) and conditionalcovariance P η t . 11 roposition 3.1 ([26, 8]) . We have η t := Law ( X t | Y t ) = Law ( X t | Y t ) =: η t (3.4) and X t := η t ( ι ) = η t ( ι ) and P t = P η t = P η t where X t and P t correspond to the Kalman-Bucyfilter update and Riccati equations in (2.2) and (2.3). We may refer to this specific class (3.2) of McKean-Vlasov-type diffusion as a Kalman-Bucydiffusion process [8]. The case ( F1 ) corresponds to the limiting object that is sampled in thecontinuous-time version of the ‘vanilla’ EnKF [32]; while ( F2 ) is the continuous-time limitingobject that is sampled in the ‘deterministic’ EnKF of [72], see also [69]; and ( F3 ) is a fullydeterministic transport-inspired equation [69, 78]. Note that in this case ( F3 ) the existence ofthe inverse of P η t is given by the positive-definiteness properties of the solution of the Riccatiequation in (2.3). In the next section we detail the Monte-Carlo ensemble filters derived fromthese Kalman-Bucy diffusion processes.Note we may define a generalised version of case ( F3 ) by, ( F3 (cid:48) ) d X t = A X t dt + R P − η t ( X t − η t ( ι )) dt + P η t H (cid:48) R − (cid:20) d Y t − H (cid:18) X t + η t ( ι )2 (cid:19) dt (cid:21) + G t P − η t ( X t − η t ( ι )) dt (3.5)for any skew symmetric matrix G (cid:48) t = − G t that may also depend η t . This added tuning parametermay be related to an optimality metric, when deriving this transport equation from an optimaltransport beginning. We may also write similar generalised versions ( F1 (cid:48) ) and ( F2 (cid:48) ) by adding G t P − η t ( X t − η t ( ι )) to ( F1 ) and ( F2 ). Ensemble Kalman-Bucy filters (
EnKF ) coincide with the mean-field particle interpretation of thenonlinear diffusion processes defined in (3.2).Let ( V it , W it , X i ) with ≤ i ≤ N + 1 be ( N + 1) independent copies of ( V t , W t , X ) . Again,we consider three different cases of Kalman-Bucy-type interacting diffusion process, ( F1 ) d X it = A X it dt + R / d V it + (cid:98) P t H (cid:48) R − (cid:104) d Y t − (cid:16) H X it dt + R / d W it (cid:17)(cid:105) ( F2 ) d X it = A X it dt + R / d V it + (cid:98) P t H (cid:48) R − (cid:34) d Y t − H (cid:32) X it + (cid:98) X t (cid:33) dt (cid:35) (4.1) ( F3 ) d X it = A X it dt + R (cid:98) P − t (cid:16) X it − (cid:98) X t (cid:17) dt + (cid:98) P t H (cid:48) R − (cid:34) d Y t − H (cid:32) X it + (cid:98) X t (cid:33) dt (cid:35) with ≤ i ≤ N + 1 and the rescaled (particle) sample mean and covariance (cid:98) η t := η N t = 1 N + 1 N +1 (cid:88) i =1 δ X it = ⇒ (cid:98) X t := X N t = 1 N + 1 N +1 (cid:88) i =1 X it and (cid:98) P t := P N t = N + 1 N P (cid:98) η t (4.2)In cases ( F1 ) and ( F2 ) we have N ≥ and in case ( F3 ) we require N ≥ d for the almost sureinvertibility of (cid:98) P t (although in case ( F3 ) one may substitute a pseudo-inverse of (cid:98) P t withoutchanging the mathematical analysis). A sampled version of case ( F3 (cid:48) ) may also be derived inthe same way. 12 .1 Vanilla Ensemble Kalman-Bucy Filter The vanilla
EnKF , denoted by
VEnKF , is associated with the first case ( F1 ) of nonlinear process X t in (3.2) and is defined by the Kalman-Bucy-type interacting diffusion process ( F1 ) in (4.1).We then have the following key result. Proposition 4.1 ([26]) . Let N ≥ . The stochastic flow of the sample mean satisfies, d (cid:98) X t law = (cid:16) A − (cid:98) P t S (cid:17) (cid:98) X t dt + (cid:98) P t H (cid:48) R − d Y t + 1 √ N + 1 (cid:16) R + (cid:98) P t S (cid:98) P t (cid:17) / d B t (4.3) where B t is an independent d -dimensional Brownian motion.The sample covariance evolves according to a so-called matrix-valued Riccati diffusion processof the form, d (cid:98) P t law = Ricc( (cid:98) P t ) dt + 2 √ N (cid:20) (cid:98) P / t d M t (cid:16) R + (cid:98) P t S (cid:98) P t (cid:17) / (cid:21) sym (4.4) where M t is a ( d × d ) -matrix with independent Brownian entries (also independent of B t ). We see that for the vanilla
EnKF , the convergence of (cid:98) X t → X t and (cid:98) P t → P t as N → ∞ followsimmediately. This result follows via the martingale representation theorem, e.g. Theorem 4.2in [45], see also [27]. The ‘deterministic’
EnKF , denoted
DEnKF , is associated with the second case ( F2 ) of nonlinearprocess X t in (3.2), and is defined by the Kalman-Bucy-type interacting diffusion process ( F2 ) in(4.1). This ‘deterministic’ epithet in the DEnKF follows because the update ‘part’ of the particleflow is deterministic and does not rely on the stochastic perturbations by W it appearing in the VEnKF . This name and idea was proposed in [72]. We have the following key result.
Proposition 4.2 ([13, 11]) . Let N ≥ . The stochastic flow of the sample mean satisfies, d (cid:98) X t law = (cid:16) A − (cid:98) P t S (cid:17) (cid:98) X t dt + (cid:98) P t H (cid:48) R − d Y t + 1 √ N + 1 R / d B t (4.5) where B t is an independent d -dimensional Brownian motion.The sample covariance evolves according to a so-called matrix-valued Riccati diffusion processof the form, d (cid:98) P t law = Ricc( (cid:98) P t ) dt + 2 √ N (cid:104) (cid:98) P / t d M t R / (cid:105) sym (4.6) where M t is a ( d × d ) -matrix with independent Brownian entries (also independent of B t ). Again, for the
DEnKF , the convergence of (cid:98) X t → X t and (cid:98) P t → P t as N → ∞ follows imme-diately. Note the simplified diffusion weighting(s) in the case of the DEnKF , as compared to the
VEnKF . The fully deterministic ensemble transport filter
DEnTF is associated with the third case ( F3 ) ;defined by the Kalman-Bucy-type interacting diffusion process ( F3 ) in (4.1). In this case, wehave the special result. Proposition 4.3 ([69, 78]) . Let N ≥ . The flow of sample mean is given by, d (cid:98) X t = (cid:16) A − (cid:98) P t S (cid:17) (cid:98) X t dt + (cid:98) P t H (cid:48) R − d Y t , (cid:98) X := 1 N + 1 N +1 (cid:88) i =1 X i (4.7)13 he sample covariance evolves according to the deterministic Riccati equation, d (cid:98) P t = Ricc( (cid:98) P t ) dt, (cid:98) P := N + 1 N P (cid:98) η (4.8)Note that the particle mean (cid:98) X t and the particle covariance (cid:98) P t associated with the particleinterpretation ( F3 ) discussed in (4.1) satisfy exactly the equations of the Kalman-Bucy filterwith the associated deterministic Riccati equation.The “randomness” in this case only comes from the initial conditions. The stability analysisof this class of DEnTF model resumes to the one of the Kalman-Bucy filter and the associatedRiccati equation. Thus, the results, e.g., in (2.20), (2.22), (2.23) and (2.24) hold immediately;see also [8] in the linear-Gaussian setting. In [22, 23] this filter is analysed in the case of anonlinear signal, but fully observed (linear observation) model. The fluctuation analysis in thiscase can also be developed easily by combining these stability results w.r.t. the initial state (see[8]) with conventional sample estimates based on independent copies of the initial states, see forinstance [15] for estimates associated with classical sample covariance estimates. Consequently,we do not consider this class of model going forward.
In practice, the ensemble Kalman filtering methodology is applied in high-dimensional, nonlinearstate-space models, e.g. see [31, 32] and the application references listed in the introduction.It is rather straightforward to extend the algorithmic particle methods in (4.1) to nonlinearsystems as we now outline. Consider a time-invariant nonlinear diffusion model of the form, d X t = a ( X t ) dt + R / d V t d Y t = h ( X t ) dt + R / d W t (4.9)where a : R d → R d and h : R d → R d y are the nonlinear signal and sensor model functions ofsome sufficient regularity.Let ( V it , W it , X i ) with ≤ i ≤ N + 1 be ( N + 1) independent copies of ( V t , W t , X ) . Weconsider the three EnKF variants as before and define the flow of particles by, ( NF1 ) d X it = a ( X it ) dt + R / d V it + (cid:98) P ht R − (cid:104) d Y t − (cid:16) h ( X it ) dt + R / d W it (cid:17)(cid:105) ( NF2 ) d X it = a ( X it ) dt + R / d V it + (cid:98) P ht R − (cid:34) d Y t − (cid:32) h ( X it ) + (cid:98) h t (cid:33) dt (cid:35) (4.10) ( NF3 ) d X it = a ( X it ) dt + R (cid:98) P − t (cid:16) X it − (cid:98) X t (cid:17) dt + (cid:98) P ht R − (cid:34) d Y t − (cid:32) h ( X it ) + (cid:98) h t )2 (cid:33) dt (cid:35) with ≤ i ≤ N + 1 and the (particle) sample mean (cid:98) X t and sample covariance (cid:98) P t defined asusual, e.g. see (4.2), and with the observation function sample mean and sample cross-covariancedefined as, (cid:98) h t := 1 N + 1 N +1 (cid:88) i =1 h ( X it ) and (cid:98) P h := 1 N N +1 (cid:88) i =1 (cid:104) X it − (cid:98) X t (cid:105) (cid:104) h ( X it ) − (cid:98) h t (cid:105) (cid:48) (4.11)The EnKF implementation for nonlinear signal and observation models is thus a natural progres-sion of the ensemble filters in (4.1), with the appropriate nonlinear replacements of the drift andsensor functions, e.g. A X it with a ( X it ) , etc. Except in the fully deterministic case ( NF3 ), we arenot aware of any results that rigorously establish a limiting ( N → ∞ ) evolution equation, say X t .We may conjecture that such a limit, X t , is the natural analogue of the McKean-Vlasov equa-tions in (3.2), but this remains to be proven. Similarly, except in case ( NF3 ), the limiting mean14nd sample covariance equations and their interpretation remains a topic of open investigation.For example, in any case, it is clear that the flow of the sample covariance of the particles in(4.10) will not satisfy the type of Riccati diffusion we see under linear models.It is worth noting again that in the linear model setting, the
EnKF algorithms in (4.1) arederived as sampled versions of the McKean-Vlasov equations (3.2). This is a natural derivationof a particle-type approximation of the optimal Bayes filter (i.e. an ensemble filter). This followsbecause the (conditional) distribution of the state in (3.2) is exactly the optimal Bayes filter(which is Gaussian with mean and covariance given by the classical Kalman filter). The
EnKF inthat linear setting is shown to be consistent, i.e. with infinite computational power one recoverstruly the optimal Bayesian filter.In the nonlinear model setting, the
EnKF algorithms in (4.10) are most naturally derived asheuristic extensions of (4.1) so as to accommodate the nonlinearities in the model. For example,swapping A X it with a ( X it ) is a type of heuristic extension moving from (4.1) to (4.10); albeitthis is a very natural heuristic step also. This difference in derivation is exemplified by thefact that, except perhaps in case ( NF3 ), the limiting ( N → ∞ ) object to which each ensemblemember converges has not been rigorously established. And it is exemplified further by notingthat whatever that limiting object may look like, it is certainly not a diffusion equation withdistribution equal to the optimal Bayes filter. That is,Law ( X t | Y t ) (cid:54) = Law ( X t | Y t ) =: η t (4.12)in the nonlinear model setting. Said differently, even with infinite computational power, the EnKF methods as applied in this nonlinear model setting do not converge to the optimal nonlinearBayes filter. As discussed previously and again later, the
EnKF in this nonlinear model setting isprobably best viewed in practice as a type of (random) sample-based observer or (point-valued)state estimator (and not as an approximation of the optimal Bayesian filter).We discuss connections and extensions of our results to the nonlinear model setting in a latersection (at the end of this review article).
Going forward, we consider only the
VEnKF (case ( F1 )) and DEnKF (case ( F2 )) since as noted thetheory of the DEnTF in the linear-Gaussian setting reverts to that of the standard Kalman-Bucyfilter as detailed in [8]. The parameter κ ∈ { , } will distinguish the two cases ( κ = 1 in case( F1 ), and κ = 0 in case ( F2 )) throughout.We may unify the analysis via the following representation, d (cid:98) X t = ( A − (cid:98) P t S ) (cid:98) X t dt + (cid:98) P t H (cid:48) R − d Y t + 1 √ N + 1 Σ / κ ( (cid:98) P t ) d B t (5.1) d (cid:98) P t = Ricc( (cid:98) P t ) dt + 2 √ N (cid:104) (cid:98) P / t d M t Σ / κ ( (cid:98) P t ) (cid:105) sym (5.2)with the mapping, Σ κ ( Q ) := R + κ QSQ with κ = (cid:26) in case ( F1 ) in case ( F2 ) (5.3)Let (cid:98) Z t := ( (cid:98) X t − X t ) and observe that d (cid:98) Z t = ( A − (cid:98) P t S ) (cid:98) Z t dt + (cid:98) P t H (cid:48) R − / d V t − R / d W t + 1 √ N + 1 Σ / κ ( (cid:98) P t ) d B tlaw = ( A − (cid:98) P t S ) (cid:98) Z t dt + Ω / κ ( (cid:98) P t ) d (cid:98) B t (5.4)15or some independent d -dimensional Wiener process (cid:98) B t and with, Ω κ := Σ + 1 N + 1 Σ κ (5.5)Note we often refer to the flows (cid:98) Z t or Z t as error flows.We also underline that (cid:98) Z t − Z t = ( (cid:98) X t − X t ) − ( X t − X t ) = (cid:98) X t − X t (5.6)so that the difference between the noisy error flow (cid:98) Z t and the classical Kalman-Bucy error flow Z t is equal to the difference between the EnKF (sample mean) state estimate and the classicalKalman-Bucy state estimate.Let (cid:98) φ t ( Q ) := (cid:98) P t denote the flow of the Riccati diffusion equation in (5.2) with (cid:98) P = Q ∈ S d .Let (cid:98) ψ t ( z, Q ) := (cid:98) Z t denote the flow of the stochastic error (5.4) with (cid:98) Z = z = ( x − X ) ∈ R d and (cid:98) P t = (cid:98) φ t ( Q ) . Finally, we denote the flow of the sample mean in (5.1) with (cid:98) X = x ∈ R d by (cid:98) χ t ( x, Q ) := (cid:98) X t .We underline further that the difference between two error flows satisfies, (cid:98) ψ t k ( z , Q ) − (cid:98) ψ t k ( z , Q ) = (cid:98) χ t ( x , Q ) − (cid:98) χ t ( x , Q ) (5.7)and is thus equal to the difference between the two corresponding sample means (with compatiblestarting points). Studying the difference between two error flows ( (cid:98) ψ t k ( z , Q ) − ψ t k ( z , Q )) subsumes the study of something like ( (cid:98) χ t ( x , Q ) − χ t ( x , Q )) which is the difference betweenthe EnKF (sample mean) state estimate and the classical Kalman-Bucy state estimate (withdifferent initial conditions).For any s ≤ t and Q ∈ S d we define the stochastic state-transition matrix, (cid:98) E s,t ( Q ) := exp (cid:20)(cid:73) ts (cid:16) A − (cid:98) φ u ( Q ) S (cid:17) du (cid:21) ⇐⇒ ∂ t (cid:98) E s,t ( Q ) = (cid:16) A − (cid:98) φ u ( Q ) S (cid:17) (cid:98) E s,t ( Q ) (5.8)As with the classical Kalman-Bucy filter, e.g. see (2.16) and (2.17), the convergence andstability properties of the ensemble Kalman-Bucy filter and the associated Riccati diffusionequation are directly related to the contraction properties of the stochastic state-transitionmatrix (cid:98) E s,t ( Q ) . For example, the flow of the stochastic error equation (5.4) is given by, (cid:98) ψ t ( z, Q ) = (cid:98) E s,t ( Q ) (cid:98) ψ s ( z, Q ) + (cid:90) ts (cid:98) E u,t ( Q ) Ω / κ ( (cid:98) φ u ( Q )) d (cid:98) B u (5.9)and the stochastic flow of the matrix Riccati diffusion (5.2) is given implicitly by (cid:98) φ t ( Q ) = (cid:98) E s,t ( Q ) (cid:98) φ s ( Q ) (cid:98) E s,t ( Q ) (cid:48) + (cid:90) ts (cid:98) E u,t ( Q ) Σ ( (cid:98) φ u ( Q )) (cid:98) E u,t ( Q ) (cid:48) du + 2 √ N (cid:90) ts (cid:98) E u,t ( Q ) (cid:104) (cid:98) φ / u ( Q ) d M u Σ / κ ( (cid:98) φ u ( Q )) (cid:105) sym (cid:98) E u,t ( Q ) (cid:48) (5.10)for any s ≤ t . We denote by (cid:98) Π t the Markov semigroup of (cid:98) φ t ( Q ) defined for any boundedmeasurable function F on S d and any Q ∈ S d with the property that, (cid:98) Π t ( F )( Q ) := E (cid:104) F ( (cid:98) φ t ( Q )) (cid:105) (5.11)We have the first result concerning the quadratic, matrix-valued, Riccati diffusion process (5.10). Theorem 5.1.
For any N ≥ the Riccati diffusion (5.10) has a unique weak solution on S d .For N ≥ d + 1 there exists a unique strong solution on S + d . Moreover, (cid:98) Π t ( P, dQ ) is a stronglyFeller and irreducible semigroup with a unique invariant probability measure (cid:98) Γ ∞ on S + d . Thismeasure admits a positive density with respect to the natural Lebesgue measure on S d . (cid:98) X t in(5.1) or a solution (cid:98) Z t in (5.4) exists and is unique. This result is proven in [11, Theorem 2.1].Once the problem of existence and uniqueness is tackled, one major problem in this equationis the behavior at infinity: existence of a stationary measure and speed of convergence towardsthis stationary measure or even distance between two solutions starting at different points. In this section we consider the fluctuation of (cid:98) φ t ( Q ) about φ t ( Q ) and of (cid:98) ψ t ( z, Q ) about ψ t ( z, Q ) .The fluctuation properties and moment boundedness properties of (cid:98) φ t ( Q ) and (cid:98) ψ t ( z, Q ) dependnaturally on the size on the fluctuation as determined by N .Typically, we will write either of the the following expressions in stating our results, (cid:48)(cid:48) N is sufficiently large (cid:48)(cid:48) or (cid:48)(cid:48) N ≥ (cid:48)(cid:48) (5.12)In case ( F1 ) with κ = 1 there is often a minimum threshold on N needed to prove the results.In case ( F1 ) this lower threshold on N may be large. In case ( F2 ) with κ = 0 , these same resultstypically hold; but moreover, we can often refine the relevant results and at the same time relaxthe conditions on N , often needing just N ≥ . This is a significant analytical advantage of the DEnKF over the
VEnKF . In some cases, this advantage is practically realised and provable (and notjust a by-product of analysis methods). For example, we will show later that some moments ofthe
VEnKF sample covariance in one-dimension provably do not exist in the steady-state withouta sufficient number of particles (whereas in the
DEnKF these moments always exist with N ≥ ).In some cases, the results stated in this work are only known for the DEnKF . If we do not specifya particular case, or a value for κ ∈ { , } , then the stated results may be assumed to hold forboth the VEnKF and the
DEnKF .We start with the following under-bias estimate on the sample covariance which holds forboth the
VEnKF and the
DEnKF . Theorem 5.2.
In all cases, for any t ≥ , any Q ∈ S d , and any N ≥ , we have the uniformunder-bias estimate, E (cid:104) (cid:98) φ t ( Q ) (cid:105) ≤ φ t ( Q ) ≤ c (1 + (cid:107) Q (cid:107) ) I (5.13) for a finite constant c > that doesn’t depend on the time horizon. We may refine this under-bias result as is done in [11]. For example, if we assume furtherthat S ∈ S + d , then for any time horizon t ≥ we also have the refined bias estimates, ≤ φ t ( Q ) − E (cid:104) (cid:98) φ t ( Q ) (cid:105) ≤ c ( Q ) 1 N I (5.14)when N is sufficiently large (in case ( F1 ) with κ = 1 ) or for any N ≥ (in case ( F2 ) with κ = 0 ).The proof of this refinement, and details on the constant c ( Q ) , is in [11, Theorem 2.3] and in[14].We will see subsequtently that the condition S ∈ S + d ensures that for any n ≥ , the n -th moments of the trace of the sample covariance are uniformly bounded w.r.t. the timehorizon (when the fluctuation parameter is small enough) even when the matrix A is unstable .This condition S ∈ S + d may be thought of as a strengthening of a detectability/observabilitycondition. We discuss this condition more later as it (re-)appears throughout our presentation.The next theorem concerns these time-uniform moment estimates on the stochastic Riccatiflow in (4.4), i.e. on the flow of the sample covariance matrix. Theorem 5.3.
Assume that S ∈ S + d . For any n ≥ , t ≥ , any Q ∈ S d , and any N sufficientlylarge, we have the uniform estimate, (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:98) φ t ( Q ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n ≤ c n (1 + (cid:107) Q (cid:107) ) (5.15)17 urthermore, for any time horizon t ≥ τ > we also have the uniform estimates (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:98) φ t ( Q ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n ≤ c n,τ (5.16) In addition, in case ( F2 ), for any N ≥ , any n ≥ , Q ∈ S d , t ≥ and any s ≥ τ > we havethe refined estimates, (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:98) φ t ( Q ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n ≤ c (1 + (cid:107) Q (cid:107) ) (1 + (cid:114) n N ) and (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:98) φ s ( Q ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n ≤ c τ (1 + (cid:114) n N ) (5.17)The proof of the above Theorem is provided in [11, Theorem 2.2] where a precise descriptionof the (finite) parameters c n , c n,τ , c, c τ > are also provided. The first estimate in (5.15) alsoholds if S = 0 when µ ( A ) < . The proof of this Theorem is based on a reduction of (4.4)to a scalar Riccati diffusion, a novel representation of its n -th powers, and a comparison of itsmoments to a judiciously designed deterministic scalar Riccati equation. We note the proof isconservative by nature (due to the scalar reduction and comparison).Now we turn to quantifying the fluctuations of the matrix Riccati diffusions around theirlimiting (deterministic) values as found when N tends to ∞ . That is, we quantify the fluctuationof the EnKF sample covariance about the limiting covariance of the classical Kalman-Bucy filter.
Theorem 5.4.
Assume that S ∈ S + d . For any n ≥ , t ≥ , any Q ∈ S d , and any N sufficientlylarge we have the uniform estimates, (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:98) φ t ( Q ) − φ t ( Q ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n ≤ c n √ N (1 + (cid:107) Q (cid:107) ) (5.18) In case ( F2 ), for any N ≥ , any n ≥ , t ≥ , and any Q ∈ S d , we have (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:98) φ t ( Q ) − φ t ( Q ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n ≤ c √ N (1 + (cid:107) Q (cid:107) ) (cid:18) (cid:114) n N (cid:19) (5.19)The estimates in Theorem 5.4 do not depend on Q ∈ S d when t ≥ τ for any τ > and with c n , c replaced with c n,τ , c τ ; e.g. similarly to (5.16) in Theorem 5.3.The proof of the preceding Theorem is provided in [11, Theorem 2.3] and in [14]. The prooffollows from a second-order expansion of the stochastic flow (cid:98) φ t about the deterministic flow φ t and then an appropriate bounding of the first and second order stochastic terms. See [14] fordetails on the decomposition of (cid:98) φ t in terms of φ t plus stochastic terms of any order (in thefluctuation N ). We explore this expansion in the scalar case in more detail in a later section.The under bias result (5.13) holds with any N ≥ in both the VEnKF of case ( F1 ), and in the DEnKF of case ( F2 ). This under-bias is a motivation for so-called sample covariance regularisationin practice; e.g. so-called sample covariance inflation or localisation methods [5, 39, 36, 62, 31].Later we discuss the effects of inflation in particular.As with the deterministic Riccati equation, we may bound the moments of the inverse of thestochastic Riccati flow (cid:98) φ t ( Q ) under stronger conditions on the number of particles N required;e.g. see [11]. It follows that with Q ∈ S + d and with additional conditions on N , that for t ≥ τ > there exists a uniform positive definite lower bound on E [ (cid:98) φ t ( Q )] .A number of basic corollaries follow the proofs in [14, 11], for instance, we have the monotoneproperty, S d (cid:51) Q ≤ Q = ⇒ E (cid:104) (cid:98) φ t ( Q ) (cid:105) ≤ E (cid:104) (cid:98) φ t ( Q ) (cid:105) (5.20)and, for any Q ∈ S d , the fixed upper bound, E (cid:104) (cid:98) φ t ( Q ) (cid:105) ≤ P ∞ + E t ( P ∞ ) ( Q − P ∞ ) E t ( P ∞ ) (cid:48) (5.21)18everal spectral estimates can be deduced from the estimates (5.14), (5.18) and (5.19). Forexample, in case ( F2 ), with κ = 0 and N ≥ then combining (5.19) with the n -version of theHoffman-Wielandt inequality we have the uniform estimate, sup ≤ i ≤ r (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) λ i (cid:16) (cid:98) φ t ( Q ) (cid:17) − λ i ( φ t ( Q )) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n ≤ c n ( Q ) 1 √ N (5.22)In [14] we go further with regards to this fluctuation analysis on the EnKF sample covariance.In [14] we present sharp and non-asymptotic expansions of the matrix moments of the matrixRiccati diffusion with respect to the parameter N . These results follow from a Taylor-typeperturbation expansion of the form, (cid:98) φ t = φ t + (cid:88) ≤ k Consider only case ( F2 ). If µ ( A − P ∞ S ) < and S ∈ S + d , then for any n ≥ , z ∈ R d , Q ∈ S d , there exists some time (cid:98) t n −→ N →∞ ∞ such that for any ≤ t ≤ (cid:98) t n we have (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:98) ψ t ( z, Q ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n ≤ c n ( z, Q ) (5.25) See [9] for details on the time parameter (cid:98) t n −→ N →∞ ∞ . Unlike Theorem 5.3, the proof of this result requires contraction (or non-explosiveness) to beestablished a priori for the transition semigroup (cid:98) E s,t ( Q ) defined in (5.8). Moreover, this result isknown only for the DEnKF with κ = 0 in the general multi-dimensional setting without stabilityassumptions on A itself.Now we turn to quantifying the fluctuations of the sample mean around their limiting valuesas found when N tends to ∞ . That is, we quantify the fluctuation of the EnKF sample meanabout the limiting conditional mean estimate from the classical Kalman-Bucy filter. Theorem 5.6. Consider only case ( F2 ). If µ ( A − P ∞ S ) < and S ∈ S + d , then for any n ≥ , z ∈ R d , Q ∈ S d , there exists some time (cid:98) t n −→ N →∞ ∞ such that for any ≤ t ≤ (cid:98) t n we have (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:98) ψ t ( z, Q ) − ψ t ( z, Q ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n ≤ c n ( z, Q ) 1 √ N (5.26) See [9] for details on the time parameter (cid:98) t n −→ N →∞ ∞ . (cid:98) E s,t ( Q ) . Also, this result is known only for the DEnKF with κ = 0 in thegeneral multi-dimensional setting without stability assumptions on A itself.In [26], the preceding two results, Theorem 5.5 and Theorem 5.6, are stated under thestrong stability assumption µ ( A ) < . That assumption ensures the contractive property holdsfor (cid:98) E s,t ( Q ) and consequently that (cid:98) ψ t ( z, Q ) is stable. In that case, with µ ( A ) < , these resultshold for both the VEnKF and the DEnKF . The stability of (cid:98) ψ t ( z, Q ) more generally (without askingfor A to be stable) is considered in the following subsection and Theorem 5.5 and Theorem 5.6should be read in conjunction with those results.Asking for µ ( A − P ∞ S ) < in Theorems 5.5 and 5.6 is much weaker than asking for µ ( A ) < (e.g. we allow unstable A ). Note that the related condition Absc( A − P ∞ S ) < follows directlyfrom the classical observability and controllability assumptions on the signal-observation model(2.1), see (2.11). Model conditions for µ ( A − P ∞ S ) < may be thought of as a strengtheningof observability and controllability and is discussed later.The proof of Theorem 5.5 and Theorem 5.6 is provided in [13] in the one-dimensional settingwhere a detailed description of the (finite) parameters c n ( z, Q ) > are provided. The multi-dimensional result follows using similar proof methods to those used in [13] in combination withthe contraction properties of the transition matrix (cid:98) E s,t ( Q ) detailed in the next subsection. Inthe one-dimensional setting studied in [13], contraction of (cid:98) E s,t ( Q ) is given under very generalmodel conditions which also accommodate both the VEnKF and the DEnKF . Consequently, in onedimension both Theorem 5.5 and Theorem 5.6 hold on an infinite time horizon for any t ≥ and with any κ ∈ { , } .One may consider a perturbation expansion of the sample mean flow as (cid:98) ψ t = ψ t + (cid:88) ≤ k Theorem 5.7. Assume the fluctuation parameter N is sufficiently large such that E [ (cid:107) (cid:98) φ t ( Q ) (cid:107) ] and E [ (cid:107) (cid:98) φ − t ( Q ) (cid:107) ] are uniformly bounded (e.g. as in Theorem 5.3 for bounds on E [ (cid:107) (cid:98) φ t ( Q ) (cid:107) ] ).Then, there exists some finite constants c, α, (cid:126) > such that for any t ≥ and probabilitymeasures Γ , Γ on S + d , we have the Λ -norm contraction inequality (cid:107) Γ (cid:98) Π t − Γ (cid:98) Π t (cid:107) (cid:126) , Λ ≤ c e − α t (cid:107) Γ − Γ (cid:107) (cid:126) , Λ (5.31)Of course, setting Γ = (cid:98) Γ ∞ where (cid:98) Γ ∞ is the unique invariant probability measure describedin Theorem 5.1 implies that for any initial probability measure Q ∼ Γ on S + d we have that (cid:98) φ t ( Q ) tends to be distributed according to (cid:98) Γ ∞ . The proof of the above theorem is provided in [11,Theorem 2.4] and is based on matrix-valued Lyapunov and minorisation conditions (choosingthe Lyapunov candidate, Λ( · ) ).For one-dimensional models, the article [13] provides explicit analytical expressions for thereversible measure of (cid:98) P t in terms of the model parameters. As expected, heavy tailed reversiblemeasures arise when κ = 1 , and weighted Gaussian distributions when κ = 0 ; see the illustrativeexamples in a later section in this article. The article [13] also provides sharp exponential decayrates to equilibrium, in the sense that the decay rates tend to those of the limiting deterministicRiccati equation when N tends to ∞ . Recall that the stability properties of the deterministic ( N = ∞ ) semigroups E s,t ( Q ) associatedwith the classical Kalman-Bucy filter are rather well understood; e.g. see (2.13), (2.15), and(2.14) and also [8, 10].We emphasise that in the deterministic case ( N = ∞ ), stability of the matrix-valued Riccatidifferential equation, e.g. as in (2.20), follows from the contraction properties of E s,t ( Q ) in(2.13); see [8, 10] for the derivation. Some intuition for this follows from the implicit form forthe solution in (2.17). Similarly, in classical Kalman-Bucy filter, the stability properties of theerror flow (2.6) are related to the contraction properties of the state-transition matrix E s,t ( Q ) .The intuition for this follows again from the solution form in (2.16). The stability properties ofthe classical Kalman-Bucy error flow are given in, e.g., (2.22) and (2.24); see [8].We come now to the contractive properties of (cid:98) E s,t ( Q ) defined in (5.8). Firstly, we remarkthat if S ∈ S + d , then up to a change of basis we can always assume that S = I . Moreover, forany s, t ∈ [0 , ∞ [ we immediately have the rather crude almost sure estimate µ ( A ) < ⇒ (cid:107) (cid:98) E s,s + t ( Q ) (cid:107) ≤ exp [ t µ ( A ))] −→ t →∞ (5.32)In general, asking for A to be stable in this form is a very strong and restrictive condition.We typically seek contraction results on (cid:98) E s,t ( Q ) that accomodate arbitrary A ∈ M d matrices;21n particular, we seek to accommodate unstable signal matrices A , i.e. matrices with (some)non-negative eigenvalues. To this end, fix Q ∈ S d and consider the process (cid:98) A defined by (cid:98) A : t ∈ [0 , ∞ [ (cid:55)→ (cid:98) A t := A − (cid:98) φ t ( Q ) S (5.33)We write A for the analogous process driven by φ t ( Q ) , i.e. with N = ∞ .In this notation, for example when κ = 0 , combining (5.17) (5.19) and (2.15) with Krause’sinequality for any nd ≥ we also have the uniform estimate, (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) SpecDist (cid:16) A t , (cid:98) A t (cid:17) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) nd ≤ c n ( Q ) 1 √ N (5.34)where we define the optimal matching distance between the spectrum of matrices A, B ∈ M d by SpecDist ( A, B ) = min perm( · ) max ≤ i ≤ d | λ i ( A ) − λ perm( i ) ( B ) | (5.35)where the minimum is taken over the set of d ! permutations of { , . . . , d } . In addition, for any t ≥ τ > , using the Lipschitz estimates discussed above we also have A ∞ := A − P ∞ S = ⇒ SpecDist ( A t , A ∞ ) ≤ c exp [ − α t/d ] (cid:107) Q − P ∞ (cid:107) /d (5.36)with the rate parameter α coming from (2.13). These spectral estimates are of interest on theirown, but are not immediately usable for controlling the contraction properties of the exponentialsemigroups.By Theorem 5.3 and Theorem 5.4, with S ∈ S + d , the collection of processes ( A , (cid:98) A ) introducedin (5.33) satisfy the following regularity properties: • Case κ ∈ { , } : For any n ≥ , t ≥ , Q ∈ S d , and any N sufficiently large we have theuniform estimates √ N (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) A t − (cid:98) A t (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n ≤ c n (1 + (cid:107) Q (cid:107) ) and (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:98) A t (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n ≤ c n (1 + (cid:107) Q (cid:107) ) • Case κ = 0 :For any n ≥ , t ≥ , Q ∈ S d , and any N ≥ we have the uniform estimates √ N (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) A t − (cid:98) A t (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n ≤ c (1 + (cid:107) Q (cid:107) ) (1 + √ n √ N ) and (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:98) A t (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n ≤ c (1 + (cid:107) Q (cid:107) ) (1 + √ n √ N ) The stability properties of stochastic semigroups associated with a collection of stochasticflows ( A , (cid:98) A ) satisfying the above properties have been developed in our prior work [9]. Severallocal-type contraction estimates can be derived. The following results broadly imply the semi-group (cid:98) E s,t ( Q ) is asymptotically stable in probability when µ ( A ∞ ) < and a sufficient numberof particles are employed. Theorem 5.8. Let κ ∈ { , } and S ∈ S + d . If µ ( A ∞ ) < , then for any increasing sequence ≤ s ≤ t k ↑ k →∞ ∞ , and for any Q ∈ S d , the probability of the following event lim sup k →∞ t k log (cid:107) (cid:98) E s,t k ( Q ) (cid:107) < µ ( A − P ∞ S ) is greater than − ν (5.37) for any ν ∈ ]0 , , as soon as N is sufficiently large (as a function of ν ∈ ]0 , ). This log-Lyapunov estimate (5.37) immediately implies the semigroup (cid:98) E s,t k ( Q ) may be ex-ponentially contracting with a high probability; given strong observability and controllabilityconditions that imply µ ( A − P ∞ S ) < . A number of reformulations of this result that shedinsight individually are worth highlighting: 22 Let κ ∈ { , } . For any ≤ s ≤ t k ↑ k →∞ ∞ , there exists a sequence N := N k ↑ k →∞ ∞ such that we have the almost sure Lyapunov estimate lim sup k →∞ lim sup k →∞ t k log (cid:107) (cid:98) E s,s + t k ( Q )) (cid:107) < µ ( A − P ∞ S ) (5.38) • Let κ ∈ { , } . Then, for any increasing sequence of times ≤ s ≤ t k ↑ k →∞ ∞ , theprobability of the following event, ∀ < ν ≤ ∃ l ≥ such that ∀ k ≥ l it holds that t k log (cid:107) (cid:98) E s,t k ( Q ) (cid:107) ≤ 12 (1 − ν ) µ ( A − P ∞ S ) (5.39)is greater than − ν , for any ν ∈ ]0 , , as soon as N is sufficiently large (as a function of n ≥ and ν ∈ ]0 , ). • Let κ ∈ { , } . Consider any s ≥ , any increasing sequence of time horizons t k ↑ k →∞ ∞ ,and any sequence N := N k ,n ↑ k →∞ ∞ such that (cid:80) k ≥ / (cid:112) N k ,n < ∞ for some n ≥ .Then, we have the almost sure Lyapunov estimate, ∀ < ν ≤ ∃ l , l ≥ such that ∀ k ≥ l , ∀ k ≥ l it holds that t k log (cid:107) (cid:98) E s,s + t k ( Q ) (cid:107) ≤ 12 (1 − ν ) µ ( A − P ∞ S ) (5.40)The first dot-point result captured by (5.38) is derived from (5.37) in Theorem 5.8 viathe Borel-Cantelli lemma. The next two dot-point results provide some reformulation of thesupremum limit estimates (5.37) and (5.38) in terms of random relaxation time horizons andrandom relaxation-type fluctuation parameters. The last reformulation in (5.40) underlines thefact that after some random time (i.e. determined by l ), and given some randomly sufficientlylarge number of particles (determined by l ) the semigroup (cid:98) E s,t ( Q ) is exponentially contractive.We have no direct control over the parameters l and l in (5.40) which depend on the randomnessin any realisation.Stronger results are applicable if we restrict κ = 0 . We have the following immediate corollaryof our prior work in [9]: Theorem 5.9. Assume κ = 0 and S ∈ S + d . If µ ( A ∞ ) < , then the semigroup (cid:98) E s,t ( Q ) isasymptotically L n -stable for any n ≥ over time horizons with lengths controlled by N . Morespecifically, for any n ≥ , s ≥ , Q ∈ S d , there exists some time horizons t n < (cid:98) t n −→ N →∞ ∞ such that for any t n ≤ t ≤ (cid:98) t n we have t log E (cid:104) (cid:107) (cid:98) E s,s + t ( Q ) (cid:107) n (cid:105) ≤ n µ ( A − P ∞ S ) (5.41) whenever N is sufficiently large such that (cid:98) t n > t n ; see [9] for details on these time parameters. Importantly, in this last result we have (cid:98) t n −→ N →∞ ∞ and thus we can control the horizonon which the semigroup (cid:98) E s,t ( Q ) is asymptotically L n -stable for any n ≥ when κ = 0 . In otherwords, the estimate (5.41) ensures that the stochastic semigroup (cid:98) E s,t ( Q ) is stable on arbitrarylong finite time horizons , as soon as κ = 0 , and when the perturbation parameter is chosensufficiently large. We have the following fact immediate from Theorem 5.9:23 Assume κ = 0 . For any n ≥ , s ≥ , we have lim sup N →∞ (cid:98) t n log E (cid:104) (cid:107) (cid:98) E s,s + (cid:98) t n ( Q ) (cid:107) n (cid:105) ≤ n µ ( A − P ∞ S ) Combining Theorem 5.8 and Theorem 5.9 we may draw the basic (qualitative) conclusionthat, after some initial time period, and given enough particles, the (noisy) exponential semi-groups (cid:98) E s,t ( Q ) are exponentially contractive (in some sense, e.g. almost-sure or L n -type) at arate related to µ ( A − P ∞ S ) . We note that the condition µ ( A − P ∞ S ) < is however strongerthan asking for Absc( A − P ∞ S ) < ; the latter of which follows immediately from classical ob-servability/controllability conditions on the model (or even weaker detectability/stabilisabilityconditions); see (2.11).Finally, we also have the following new result which extends the exponential decay resultsfor one-dimensional models presented in [13] to the determinant of the matrix-valued Riccatidiffusions considered herein. This is a type of stochastic Liouville formula. Theorem 5.10. For any n ≥ , t ≥ , any Q ∈ S + d , and N sufficiently large we have theexponential decay estimate E (cid:104) det ( (cid:98) E t ( Q )) n (cid:105) /n = E (cid:20) exp (cid:18) n (cid:90) t Tr( A − (cid:98) φ s ( Q ) S ) ds (cid:19)(cid:21) /n ≤ c n ( Q ) exp (cid:32) − t (cid:114) Tr (cid:16) (cid:98) R n (cid:98) S n (cid:17)(cid:33) (5.42) with (cid:98) R n := R − N (2 n + d + 1) R > and (cid:98) S n := S − N (2 n + d + 1) κ S > (5.43) In addition, the exists some function (cid:98) ν n with lim N →∞ (cid:98) ν n = 0 such that E (cid:104) det ( (cid:98) E t ( Q )) n (cid:105) /n ≤ c n ( Q ) exp (cid:16) − t (1 − (cid:98) ν n ) (cid:112) Tr( A ) + Tr( RS ) (cid:17) (5.44)The proof of this theorem is in [11, Theorem 2.7]. In the one-dimensional case, d = 1 , thisresult collapses to capture the strong exponential contraction results presented in [13]. Indeedin one dimension, Theorem 5.10 can be seen as a significant improvement over both Theorem5.8 and Theorem 5.9 in both theoretical development and practical usability.In the scalar case, strong stability results on the stochastic Riccati flow (cid:98) φ t analogous to thedeterministic setting, e.g. (2.20), also follow from Theorem 5.10; see also [13] and the illustrativeexamples in a later section in this article. In this section we consider the stability of the error flow of (cid:98) ψ t ( z, Q ) := (cid:98) Z t = ( (cid:98) X t − X t ) in bothcase ( F1 ) and case ( F2 ) EnKF models with (cid:98) Z = ( (cid:98) X − X ) = z ∈ R d and (cid:98) P = Q ∈ S d . Thisflow may be related to the Ornstein-Uhlenbeck process (5.4) and whose solution can be writtenmore generally as in (5.9).Now from preceding results on the contraction of (cid:98) E s,t in Section 5.2.1, we may study thestability of the error flow (cid:98) ψ t ( z, Q ) and its contraction properties.The following uniform error contraction estimate follows from (5.9) and Theorem 5.9, sup Q ∈ S d (cid:13)(cid:13)(cid:13) E (cid:16) (cid:98) ψ t ( z, Q ) | X (cid:17) (cid:13)(cid:13)(cid:13) ≤ c exp [ t α µ ( A − P ∞ S )] (cid:107) z − X (cid:107) (5.45)and holds for the DEnKF , with κ = 0 , for some α, c > , and under conditions compatiblewith the conditions in Theorem 5.9. This contraction result is analogous to (2.22) for the24lassical Kalman-Bucy filter; but under stronger conditions dictated by the available results onthe contraction properties of (cid:98) E t stated in Theorem 5.9. In particular, our methods prove thiscontraction (5.45) only in the case of the DEnKF , with κ = 0 , with N sufficiently large, and ontime horizons compatible with those detailed in Theorem 5.9.The next results on the stability of (cid:98) ψ t ( z, Q ) similarly follow immediately from those stabilityresults in the preceding Section 5.2.1 but are stated at the level of the process (cid:98) ψ t ( z, Q ) itself,rather than the stochastic exponential semigroup (cid:98) E t . Theorem 5.11. Let κ ∈ { , } and S ∈ S + d . If µ ( A ∞ ) < , then for any increasing sequence oftimes t k ↑ k →∞ ∞ and any z (cid:54) = z and any Q ∈ S d , the probability of the following event lim sup k →∞ t k log (cid:107) (cid:98) ψ t k ( z , Q ) − (cid:98) ψ t k ( z , Q ) (cid:107) < µ ( A − P ∞ S ) is greater than − ν (5.46) for any ν ∈ ]0 , , as soon as N is sufficiently large (as a function of ν ∈ ]0 , ). Two reformulations of this result may shed insight individually and are worth highlighting: • Let κ ∈ { , } . For ≤ t k ↑ k →∞ ∞ , there exists a sequence N := N k ↑ k →∞ ∞ suchthat we have the almost sure Lyapunov estimate lim sup k →∞ lim sup k →∞ t k log (cid:107) (cid:98) ψ t k ( z , Q ) − (cid:98) ψ t k ( z , Q ) (cid:107) < µ ( A − P ∞ S ) (5.47) • Let κ ∈ { , } . Consider any increasing sequence of time horizons t k ↑ k →∞ ∞ , and anysequence N := N k ,n ↑ k →∞ ∞ such that (cid:80) k ≥ / (cid:112) N k ,n < ∞ for some n ≥ . Then, wehave the almost sure Lyapunov estimate, ∀ < ν ≤ ∃ l , l ≥ such that ∀ k ≥ l , ∀ k ≥ l it holds that t k log (cid:107) (cid:98) ψ t k ( z , Q ) − (cid:98) ψ t k ( z , Q ) (cid:107) ≤ 12 (1 − ν ) µ ( A − P ∞ S ) (5.48)Again we emphasise that the last reformulation in (5.48) highlights that after some randomtime (i.e. determined by l ), and given some randomly sufficiently large number of particles(determined by l ) the difference of error flows (or sample means; see (5.7)) is exponentiallystable.We have stronger L n -type stability results in case ( F2 ), i.e. when κ = 0 , as illustrated next. Theorem 5.12. Assume κ = 0 and S ∈ S + d . If µ ( A ∞ ) < , then for any n ≥ , Q ∈ S d , thereexists some time horizons t n < (cid:98) t n −→ N →∞ ∞ such that for any t n ≤ t ≤ (cid:98) t n we have the stabilityestimate, E (cid:104) (cid:107) (cid:98) ψ t ( z , Q ) − (cid:98) ψ t ( z , Q ) (cid:107) n (cid:105) /n ≤ exp (cid:18) t µ ( A − P ∞ S ) (cid:19) (cid:107) z − z (cid:107) (5.49) whenever N is sufficiently large such that (cid:98) t n > t n ; see [9] for details on these time parameters. We emphasise again that (cid:98) t n −→ N →∞ ∞ . With regards to qualitative reasoning, we maycombine Theorem 5.11 and Theorem 5.12 and draw the basic (qualitative) conclusion that,after some initial time period, and given enough particles, the difference in (noisy) error flows ( (cid:98) ψ t ( z , Q ) − (cid:98) ψ t ( z , Q )) , or the difference in sample means ( (cid:98) χ t ( x , Q ) − (cid:98) χ t ( x , Q )) , is exponentiallystable (in some sense) with a rate related to µ ( A − P ∞ S ) . We note again that µ ( A − P ∞ S ) < is more restrictive than Absc( A − P ∞ S ) < ; the latter of which follows from classical observ-ability/controllability conditions, see (2.11). We have stronger stability results in the case κ = 0 L n -type stability results for any n ≥ ; but on finite horizons thatcan be made arbitrarily long.In the scalar case d = 1 , stronger stability results on the error flow (cid:98) ψ t ( z, Q ) follow fromthe contraction properties in Theorem 5.10 under weaker model and ensemble (particle) sizeassumptions. The strong L n -type stability results in the scalar d = 1 case are quantitative withexponential rates, on infinite horizons, that collapse to the optimal deterministic rates (explicitlycomputable when d = 1 ) as N → ∞ . See [13] and the illustrative examples in the next sectionin this article. Throughout this section we let d = 1 and R ∧ S > . The latter condition R ∧ S > is bothnecessary and sufficient for observability and controllability to hold in one dimension.When P ∈ [0 , ∞ [ , the deterministic Riccati equation defined on [0 , ∞ [ , in the classicalKalman-Bucy filter, satisfies the quadratic differential equation (2.3) which may be written alsoas, ∂ t P t = Ricc( P t ) = − S ( P t − (cid:37) + ) ( P t − (cid:37) − ) , (6.1)with the equilibrium states ( (cid:37) − , (cid:37) + ) defined by S (cid:37) − := A − (cid:112) A + RS < < S (cid:37) + := A + (cid:112) A + RS (6.2)With P ∈ [0 , ∞ [ , we have P t → t →∞ P ∞ = (cid:37) + . It follows that, A − P ∞ S = − (cid:112) A + RS (6.3)and thus simplifying, e.g. (2.13), we have the equality, E t ( Q ) = c t ( Q ) E t ( P ∞ ) ≤ c ( Q ) E t ( P ∞ ) and E t ( P ∞ ) = e − t √ A + RS (6.4)where −√ A + RS may be viewed explicitly as the optimal semigroup contraction rate in thescalar case. The explicit form of the constants c t ( Q ) , c ( Q ) is also available in the scalar case,see [13] and also the general Floquet-type multivariate result in [10].The Riccati drift function Ricc( · ) is also the derivative of the double-well drift function F ( Q ) = − S Q ( Q − ζ − ) ( Q − ζ + ) (6.5)with the roots ζ − := 3 A S − (cid:34)(cid:18) A S (cid:19) + 3 RS (cid:35) / < < ζ + := 3 A S + (cid:34)(cid:18) A S (cid:19) + 3 RS (cid:35) / (6.6)In this situation, the general Riccati diffusion (5.2) describing the flow of the sample covariancein both case ( F1 ) and case ( F2 ) resumes to the Langevin-Riccati drift-type diffusion process, d (cid:98) P t = ∂F ( (cid:98) P t ) dt + 1 √ N (cid:98) P / t Σ / κ ( (cid:98) P t ) d M t (6.7)with the mapping Σ κ defined in (5.3). Recall that case ( F1 ) corresponds to the vanilla EnKF ,denoted by VEnKF , and case ( F2 ) corresponds to the ‘deterministic’ EnKF , denoted by DEnKF .Also observe that ∂F > on the open interval ]0 , ζ + [ and ∂F (0) = R > σ (0) so that theorigin is repellent and instantaneously reflecting.26t any time t ≥ we may comment on the boundedness of certain moments of the sam-ple variance and the fluctuation of the sample variance and sample mean about their limiting(classical Kalman-Bucy variance and mean) values.For example, taking Theorem 5.3 and the detailed scalar exposition in [13], we have for any n ≥ , t ≥ , Q ∈ [0 , ∞ [ , and any N ≥ ∨ κ ( n − , the uniform estimates, (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:98) φ t ( Q ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n ≤ c n ( Q ) (6.8)For any t ≥ , Q ∈ [0 , ∞ [ , and any N ≥ , we have the uniform under bias estimate, E (cid:104) (cid:98) φ t ( Q ) (cid:105) ≤ φ t (6.9)and we have explicit bounds on φ t in the scalar case in [13]. This under bias result holds for any N ≥ in both the VEnKF of case ( F1 ), and in the DEnKF of case ( F2 ). This under bias motivates so-called variance/covariance regularisation methods in practice; e.g. so-called sample covarianceinflation or localisation methods. Later we discuss the effects of inflation in particular.We have bounds on the inverse Riccati flow (leading to lower bounds on the sample covari-ance) under stronger conditions on N ; see [13].From Theorem 5.5 and the detailed scalar exposition in [13] we have for any n ≥ , t ≥ , Q ∈ [0 , ∞ [ , and any N > n + 1)(1 + 4 κ ) , the uniform estimates, (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:98) ψ t ( z, Q ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n ≤ c n ( z, Q ) (6.10)From Theorem 5.4, Theorem 5.6 and the scalar exposition in [13], we have for any n ≥ , t ≥ , Q ∈ [0 , ∞ [ , and any N ≥ ∨ κ ( n − , the uniform fluctuation estimate, (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:98) φ t ( Q ) − φ t ( Q ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n ≤ c n ( Q ) √ N (6.11)and for any N > n + 1)(1 + 4 κ ) we have the uniform fluctuation estimate, (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) (cid:98) ψ t ( Q ) − ψ t ( Q ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n ≤ c n ( z, Q ) √ N (6.12)Note that the constants (e.g. c n ( Q ) ) above in (6.8) and (6.11) are examined in [13] in thescalar case d = 1 in more detail.The benefit and real interest in the scalar case is the ability to explicate the convergencerates, e.g. as in (6.4) and as shown subsequently for the contraction properties of the EnKF sample means and variances.Note that we may expand the stochastic flow of the sample variance as in (5.23). Exploringthis idea further in the scalar case, we may write the first and second-order fluctuations as, (cid:98) ϕ t := √ N [ (cid:98) φ t − φ t ] , (6.13) (cid:98) ϕ (2) t := √ N [ (cid:98) ϕ t − ϕ t ] (6.14)where in the second line we emphasise the superscript · (2) is an order index (not a power) andwhere, ϕ t ( Q ) := (cid:90) t ( ∂φ t − s ) ( φ s ( Q )) Σ / κ ( φ s ( Q )) d M s (6.15)and the derivatives ∂ k φ t of any order are explicitly given in [13]. In this case, ∂φ t ( Q ) = E t ( Q ) (where the superscript here is now a power). We then have, (cid:98) φ t = φ t + 1 √ N ϕ t + 1 N (cid:98) ϕ (2) t (6.16)27he natural central limit theorem follows, √ N (cid:104) (cid:98) φ t − φ t (cid:105) −→ N →∞ ϕ t (6.17)The (non-)asymptotic variance is estimated in [13, 14].The expansion (6.16) allows ones to better understand the bias properties of the samplecovariance (cid:98) φ t . Writing the third-order fluctuation as, (cid:98) ϕ (3) t := √ N [ (cid:98) ϕ (2) t − ϕ (2) t / (6.18)and expanding and taking expectations, E (cid:104) (cid:98) φ t (cid:105) = φ t + 12 N E (cid:104) ϕ (2) t (cid:105) + 1 N / E (cid:104) (cid:98) ϕ (3) t (cid:105) (6.19)and limits we have the dominating ( N -order-asymptotic) bias is given by N (cid:16) E (cid:104) (cid:98) φ t (cid:105) − φ t (cid:17) −→ N →∞ (cid:90) t (cid:0) ∂ φ t − s (cid:1) ( φ s ( Q )) Σ κ ( φ s ( Q )) ds < (6.20)which is negative (agreeing with our general under bias results, e.g. (6.9)).The expansion (5.27) of the sample mean (error) may be explored similarly to the aboveexpansion of the sample covariance. The first order terms ϑ t in (5.27) related to the centrallimit theorem are studied in [14, Section 1.3].The infinitesimal generator of the diffusion (6.7) on ]0 , ∞ [ is given in Sturm-Liouville formby the equation L ( f ) = 2 N ι Σ κ e V ∂ (cid:0) e − V ∂f (cid:1) with V ( · ) = − N (cid:90) · δ ∂F ( x ) ι − ( x ) Σ − κ ( x ) dx (6.21)for any δ > and where we recall the identity function ι ( x ) := x . This implies that a reversiblemeasure of the Riccati diffusion (5.2) in the scalar d = 1 case is given by the formula (cid:98) Γ ∞ ( dx ) ∝ ]0 , ∞ [ ( x ) N ι ( x ) Σ κ ( x ) exp ( − V ( x )) dx (6.22)In case ( F1 ) corresponding to the VEnKF , we have that L is reversible w.r.t. the probabilitymeasure (cid:98) Γ ∞ on ]0 , ∞ [ defined by, (cid:98) Γ ∞ ( dx ) ∝ ]0 , ∞ [ ( x ) exp (cid:32) N A √ RS tan − (cid:32) x (cid:114) SR (cid:33)(cid:33) (cid:18) xR + Sx (cid:19) N x ( R + Sx ) dx (6.23)See also [13] for alternate derivations/forms of this heavy tailed invariant measure. The heavytailed nature of the stationary measure implies that for the n -th moment to exist one requires N > ∨ n − . As expected this condition on N is generally weaker than that required for n -th moment boundedness at any time t ≥ in (6.8). In Figure 1 we plot the line defined by (2 n − / N for various N values. With any N ≥ , we have existence of the first two moments.Higher-order moments even in one dimension are still troublesome (for the VEnKF , κ = 1 ).In fact, the diffusion (cid:98) P t for the sample variance in case ( F1 ) does not have any exponentialmoments in the stationary regime for any finite N ≥ . That is, for any t ≥ and any finite α > we have Law ( Q ) = (cid:98) Γ ∞ = ⇒ E (cid:104) exp (cid:16) α (cid:107) (cid:98) φ t ( Q ) (cid:107) (cid:17)(cid:105) = ∞ for any N ≥ .We also remark that the heavy tailed nature of this stationary distribution implies thatnumerical stability in practice may be worrisome. In the stationary regime, it is realistic to28 Plot of (2n-4)/N...... with N moving from 1 to 50 fanning left to right Figure 1: Existence of moments for the EnKF. Each line corresponds to some number N with N moving from to fanning left to right. The ‘x-axis’ corresponds to moment orders n anda moment n exists whenever the line (2 n − / N is strictly less than one.expect samples from the tails in this case, and these may be large enough and/or frequent enoughto cause numerical divergence. This property may lead to so-called catastrophic divergence asstudied in, e.g., [48]. In [37, 35, 48] mechanisms for catastrophic divergence are studied incomplex nonlinear systems. Here we argue that even in linear systems, the heavy-tailed natureof the invariant measure of the sample covariance may lead to samples numerically large enoughto cause numerical catastrophe in any practical computing system.In case ( F2 ) corresponding to the DEnKF , we have that L is reversible w.r.t. the probabilitymeasure (cid:98) Γ ∞ on ]0 , ∞ [ defined by, (cid:98) Γ ∞ ( dx ) ∝ ]0 , ∞ [ ( x ) x N − exp (cid:32) − S N R (cid:18) x − AS (cid:19) (cid:33) dx (6.24)Note this measure has Gaussian tails, and we contrast this with the heavy tailed nature of (6.23).This is significant, since it implies that the sample variance (and mean) of this DEnKF will exhibitsmaller fluctuations than the VEnKF , and that all moments (including exponential moments) existin this case for any choice of N ≥ . This latter result is consistent with Theorem 5.3 at any time t ≥ (and in the general multivariate setting). We can also expect better numerical stability(e.g. less outliers); including better time-discretisation properties [40] in case ( F2 ). These betterfluctuation properties are already apparent in the preceding results (e.g. see Theorem 5.3, 5.4and 5.12) in the full multi-dimensional setting.As an illustrative example, take A = 20 (i.e. the underlying signal model is highly unstable), R = S = 1 and N = 6 . In Figure 2 we compare the invariant measure for the flow of the samplevariance in each case.We see in Figure 2 the heavy tails of the invariant measure (6.23) for the vanilla EnKF samplevariance, and conversely the Gaussian-type tails in the case (6.24) of the ‘deterministic’ EnKF .Note also the positioning of the mode/mean in each case. In case ( F1 ) of the VEnKF , n -th ordermoments exist only when (2 n − / N is strictly less than one (in this case for n < ); while allmoments exist in case ( F2 ) for the DEnKF .We finally tun to the convergence/stability properties of the sample variance and samplemean. In the case of the sample variance, we know from Theorem 5.7 that convergence of (cid:98) φ t toits invariant measure (cid:98) Γ ∞ (e.g. as depicted in Figure 2 and described by (6.23) or (6.24)) holdsif N > κ S ) / (2 R ) . Proof of this condition on N follows from Theorem 5.7, the originalmultivariate statement of the same result in [11, Theorem 2.4] and bounds on the mean of thesample variance flow and its inverse [13, 11]. In [13] we also consider contraction and stability29 Sample Variance Invariant Measures 'Vanilla' EnKF'Deterministic' EnKF Figure 2: The invariant measure of the sample variance of the ‘vanilla’ EnKF in case ( F1 ), versusthat of the ‘deterministic’ EnKF in case ( F2 ).properties of the distribution of the sample covariance with respect to a particular Wassersteinmetric; as opposed to the Λ -norm contraction used in Theorem 5.7. An interesting result from[13] is that when κ = 0 , and for stable signal models (i.e. A < ), the Riccati diffusion (6.7)(describing the flow of the sample covariance) may converge faster to its invariant measure in(6.24), than the deterministic Riccati (6.1) does to its fixed point in (6.2).In one-dimensional ( d = 1 ) settings we may say more on the (stochastic) stability of the EnKF sample covariance φ t and sample mean ψ t based on the contraction properties of thestochastic transition matrix (cid:98) E t ( Q ) defined in (5.8). It follows from Theorem 5.10 that we havethe exponential decay estimate with N > ∨ (4 n − κ which comes from [11, Theorem 2.7], E (cid:104) (cid:98) E t ( Q ) n (cid:105) /n = E (cid:20) exp (cid:18) n (cid:90) t ( A − (cid:98) φ s ( Q ) S ) ds (cid:19)(cid:21) /n ≤ c n ( Q ) exp (cid:18) − t (cid:113) (cid:98) R n (cid:98) S n (cid:19) (6.25)In addition, the exists some function (of N ) lim N →∞ (cid:98) ν n = 0 such that E (cid:104) (cid:98) E t ( Q ) n (cid:105) /n = c n ( Q ) exp (cid:16) − t (1 − (cid:98) ν n ) (cid:112) A + RS (cid:17) (6.26)which we may relate (or contrast) with the exact contraction rate of the exponential semigroupassociated with the deterministic Riccati equation in (6.4) describing the true filter variancein the classical Kalman-Bucy filter. The rate parameter (cid:98) ν n is different between the VEnKF and DEnKF . Details on the parameter (cid:98) ν n are given in [13] but importantly for both κ ∈ { , } werecover naturally the convergence rate of the deterministic Riccati flow in (6.4).The exponential decay of the exponential semigroup (cid:98) E t ( Q ) plays a central role in the stabilityof the pair of processes ( (cid:98) φ t , (cid:98) ψ t ) . For large time horizons the Lyapunov exponent can be estimatedby the formula t log (cid:98) E t ( Q ) = 1 t (cid:90) t ( A − (cid:98) φ s ( Q ) S ) ds −→ t →∞ A − (cid:98) Γ ∞ ( ι ) S, (6.27)where (cid:98) Γ ∞ denotes the reversible measure (6.23) or (6.24). We also have the following estimatesof the Lyapunov exponent (6.27) from [13], and that relate also to the under-bias result in (6.9).Let κ = 0 and let Law ( Q ) = (cid:98) Γ ∞ be the reversible probability measure defined in (6.24). Then,for any t ≥ , we have N > ⇒ − (cid:112) A + RS ≤ A − E ( (cid:98) φ t ( Q )) S ≤ − (cid:115) A + RS (cid:18) − N (cid:19) < (6.28)30imilarly assuming κ = 1 with Law ( Q ) = (cid:98) Γ ∞ and (cid:98) Γ ∞ defined in (6.23) we have for any t ≥ , N > ⇒ − (cid:112) A + RS ≤ A − E ( (cid:98) φ t ( Q )) S ≤ − (cid:113) A + RS ( − (4 /N ) ) − A/N /N < (6.29)As noted, the left hand inequalities in the preceding two equations follows immediately from theunder-bias result in (6.9).From the contraction properties on E [ (cid:98) E t ( Q ) n ] we may deduce, in the scalar setting, strongstability results on the stochastic Riccati flow (cid:98) φ t analogous to the deterministic setting, e.g.(2.20). Similarly, strong stability results on the error flow (cid:98) ψ t follow from the contraction proper-ties of E [ (cid:98) E t ( Q ) n ] . Importantly, in the scalar d = 1 case of (cid:98) ψ t ( z, Q ) we may relax the multivariateresults like Theorem 5.11 and Theorem 5.12 which require more restrictive model (e.g. the strongobservability/stability µ ( A − P ∞ S ) < condition) and ensemble (particle) size assumptions.From [13, Theorem 5.10] we have that for any N > ∨ κ ( n − , E (cid:104) (cid:107) (cid:98) φ t ( Q ) − (cid:98) φ t ( Q ) (cid:107) n (cid:105) /n ≤ c n (cid:107) Q − Q (cid:107) exp (cid:16) − t (1 − (cid:98) ν n ) (cid:112) A + RS (cid:17) (6.30)for some function (of N ) lim N →∞ (cid:98) ν n = 0 . Note we have found no analogue of this result in themultivariate setting.From [13, Theorem 6.1] we have that for any N > n + 1)(1 + 4 κ ) , E (cid:104) (cid:107) (cid:98) ψ t ( z , Q ) − (cid:98) ψ t ( z , Q ) (cid:107) n (cid:105) /n ≤ c n ( z , z , Q , Q ) ( (cid:107) z − z (cid:107) + (cid:107) Q − Q (cid:107) ) exp (cid:16) − t (1 − (cid:98) ν n ) (cid:112) A + RS (cid:17) (6.31)for some function (of N ) lim N →∞ (cid:98) ν n = 0 . We may contrast this result with the more restrictiveTheorem 5.12 in the multivariate setting. Note in the scalar setting we accommodate differentinitial variance positions, and we recover, over fully infinite horizons, a continuous relationshipwith the optimal stability rates of (6.4).The constants in (6.26), (6.30), and (6.31) are given explicitly in terms of the model param-eters in [13]. We remark that across these three stability results, the details of (cid:98) ν n vary [13], butimportantly we recover the optimal (classical Kalman-Bucy) rates lim N →∞ (cid:98) ν n = 0 .We consider an illustration of the fluctuation and stability properties of the sample variancein the different EnKF variants. Consider again the model leading to Figure 2, and let (cid:98) P = 0 .The deterministic Riccati flow ( N = ∞ , in (6.1)) of the classical Kalman-Bucy filter and withthe chosen model parameters is given in Figure 3, along with N = 100 sample paths of thesample variances for both the VEnKF and the DEnKF . Flow of the deterministic Riccati equation Sample path trajectories ('vanilla' EnKF sample variance) Figure 3: Flow of the deterministic Riccati equation, and N = 100 sample paths of the VEnKF sample variance of case ( F1 ), and N = 100 paths of the DEnKF sample variance of case ( F2 ).31ote in Figure 3 the drastically reduced fluctuations in the ‘deterministic’ EnKF samplevariance sample paths. At equilibrium, these fluctuations are related to the invariant measuresof the two EnKF varieties in (6.23) and (6.24).In Figure 4 we plot the flow of the first two central moments and the rd through the thstandardised central moments for both the VEnKF and DEnKF sample variance distribution. Recallthat N = 6 in this example, and we expect moments of the VEnKF sample variance in case ( F1 )to exist up to n = 4 with n = 5 the boundary case; while all moments exist for the DEnKF ofcase ( F2 ). Vanilla EnKF: Sample Variance Moments Mean Variance × × × Deterministic EnKF: Sample Variance Moments Mean Variance Figure 4: Flow of the sample variance moments for the VEnKF and the DEnKF .We note in Figure 4 that the sample variance moments for the VEnKF in case ( F1 ) beginto destabilize around the th/ th moments as expected. Importantly, the mean of the samplevariance for the VEnKF is very negatively biased in this case, while the mean of the DEnKF in case( F2 ) is quite accurate. Note also the very large variance in the sample variance for the VEnKF .Lastly, observe that (6.7) has non-globally Lipschitz coefficients in case ( F1 ). The drift isquadratic, while the diffusion has a polynomial growth of order / . It follows by [40] that thenaive Euler-type time-discretization may blow up, regardless of the boundedness properties ofthe limiting (continuous-time) diffusion. Let ( V it , W it , X i ) with ≤ i ≤ N + 1 be ( N + 1) independent copies of ( V t , W t , X ) . Now considera modification of the individual particle update equations in the two cases of interest, ( F1 ) d X i,ξt = A X i,ξt dt + R / d V it + (cid:16) (cid:98) P ξt + ξ T (cid:17) H (cid:48) R − (cid:104) d Y t − (cid:16) H X i,ξt dt + R / d W it (cid:17)(cid:105) (7.1) ( F2 ) d X i,ξt = A X i,ξt dt + R / d V it + (cid:16) (cid:98) P ξt + ξ T (cid:17) H (cid:48) R − (cid:34) d Y t − H (cid:32) X i,ξt + (cid:98) X ξt (cid:33) dt (cid:35) ξ ∈ [0 , ∞ [ , and T ∈ S r is some given reference matrix. Here, (cid:98) P ξt = (cid:98) φ ξt ( Q ) denotes thesample covariance-type function of X i,ξt given by, (cid:98) η ξt := η N ,ξt = 1 N + 1 N +1 (cid:88) i =1 δ X i,ξt = ⇒ (cid:98) X ξt := X N ,ξt = 1 N + 1 N +1 (cid:88) i =1 X i,ξt and (cid:98) P ξt := P N ,ξt = N + 1 N P (cid:98) η ξt (7.2)Recall the unified representation (for both the VEnKF and the DEnKF ) for the flow of thesample mean, sample covariance, and the sample error flow in equations (5.1) through to (5.4).Now consider the modification of the state estimator (sample mean) update equation result-ing from the ( ξ T ) -modification to the particle updates, d (cid:98) X ξt = ( A − ( (cid:98) P ξt + ξ T ) S ) (cid:98) X ξt dt + ( (cid:98) P ξt + ξ T ) H (cid:48) R − d Y t + 1 √ N + 1 Σ / κ,ξ ( (cid:98) P ξt ) d B t (7.3)with the mapping, Σ κ,ξ ( Q ) := R + κ ( Q + ξ T ) S ( Q + ξ T ) with κ = (cid:26) in case ( F1 ) in case ( F2 ) (7.4)With (cid:98) Z ξt := ( (cid:98) X ξt − X t ) we then also have that, d (cid:98) Z ξt = ( A − ( (cid:98) P ξt + ξ T ) S ) (cid:98) Z ξt dt + ( (cid:98) P ξt + ξ T ) H (cid:48) R − d V t − R / d W t + 1 √ N + 1 Σ / κ,ξ ( (cid:98) P ξt ) d B tlaw = ( A − ( (cid:98) P ξt + ξ T ) S ) (cid:98) Z ξt dt + Ω / κ,ξ ( (cid:98) P ξt ) d (cid:98) B t (7.5)with, Ω κ,ξ := Σ ,ξ + 1 N + 1 Σ κ,ξ (7.6)In un-regularised ensemble Kalman filtering, we approximate φ t by the sample covariance (cid:98) φ t since the dimension of (cid:98) X t may be in the many millions; see [31]. However, when computingthe sample covariance in high-dimensions, rank deficient estimation is common due to a lack ofenough samples. Covariance inflation, leading to an approximation of the form ( (cid:98) φ ξt ( Q ) + ξ T ) , inthe update equation (e.g. (7.1)), is a common, simple means of addressing this rank deficiency[31].Note that the perturbation in the resulting flow of (cid:98) φ ξt ( Q ) comes from a (rather delicate)feedback loop adding ξ T to the covariance of the signal (cid:98) X ξt at each instant. The flow of (cid:98) φ ξt ( Q ) is given by, d (cid:98) P ξt = (cid:20) ( A − (1 − κ )2 ξ T S − (cid:98) P ξt S ) (cid:98) P ξt + (cid:98) P ξt ( A − (1 − κ )2 ξ T S − (cid:98) P ξt S ) (cid:48) + R + κ ξ T S T (cid:21) dt + 2 √ N (cid:20) (cid:98) P ξt / d M t Σ / κ,ξ ( (cid:98) P ξt ) (cid:21) sym (7.7)In the limit N → ∞ we recover a perturbed, deterministic, Riccati equation that describesthe flow of the limiting covariance. This perturbed Riccati equation is studied in [16, 12]. Forany size (cid:107) ξT (cid:107) < ∞ , the perturbed Riccati flow qualitatively retains all the stability propertiesof the nominal Riccati flow (e.g. (2.20), but with a different steady state value), and the size ofthe error between the two grows in a well-quantified continuous way.33n the limiting case, we have via [12, Theorem 2.1] that φ t ( Q ) ≤ φ ξt ( Q ) in case ( F1 ). In case( F2 ) we have that φ ξt ( Q ) ≤ φ t ( Q ) in the limit N → ∞ .For any s ≤ t and Q ∈ S d we define the stochastic state-transition matrix, (cid:98) E ξs,t ( Q ) := exp (cid:20)(cid:73) ts (cid:16) A − ξ T S − (cid:98) φ ξu ( Q ) S (cid:17) du (cid:21) ⇔ ∂ t (cid:98) E ξs,t ( Q ) = (cid:16) A − ξ T S − (cid:98) φ ξu ( Q ) S (cid:17) (cid:98) E ξs,t ( Q ) (7.8)Note that this semigroup (cid:98) E ξs,t ( Q ) is associated with the evolution of the (inflation) regularisedsample mean in (7.3) or the error flow (7.5) in both case ( F1 ) and ( F2 ). Unlike the un-regularisedsetting, this same semigroup is not directly related to the evolution of the sample covariance,in (7.7); for example, in case ( F1 ) the semigroup associated with the evolution of the samplecovariance is just (cid:98) E s,t ( Q ) as given in (5.8) and studied throughout the preceding section.We can comment on the effect of inflation regularisation on the contraction properties of (cid:98) E ξs,t ( Q ) , as compared e.g. to (cid:98) E s,t ( Q ) . Firstly, it is worth noting, given the contraction estimatesin Section 5.2.1, that, µ (( A − ξ T S ) − P S ) ≤ µ ( A − P S ) (7.9)for any fixed matrix P ∈ S d and S ∈ S d . Arguing as in (5.32), when S ∈ S + d , then up to achange of basis we can always assume that S = I . We then have, µ ( A ) < (cid:107) ξ T (cid:107) = ⇒ (cid:107) (cid:98) E ξs,t ( Q ) (cid:107) ≤ exp ( µ ( A − ξ T )( t − s )) −→ ( t − s ) →∞ (7.10)which illustrates the added stabilising effects of ξ T in the extreme case in which (cid:98) P ξt S hasno stabilising effect at all. Contrast this with (5.32). Then one interpretation of the precedingrelationship is that ξ T extends the set of signal matrices A ∈ M d for which one may immediatelyachieve stabilisation (regardless of the effect of (cid:98) P ξt S ). In practice, (cid:98) P ξt S will also act to stabilisethe filter, see e.g. (5.49). Indeed, in the classical Kalman filtering setting (2.2), (2.3) with ξ = 0 ,the time-varying matrix ( A − P t S ) is stabilising [8] for any A ∈ M d , even A unstable. In the EnKF , we know that (cid:98) P t will fluctuate about P t , e.g. see Theorem 5.4. Therefore, the stabilisationproperties of ( A − (cid:98) P t S ) are unclear; indeed the study of (cid:98) E s,t ( Q ) in the preceding Section 5.2.1is concerned with precisely this issue. The above implies that the addition of ξ T can act tocounter the negative effects of this fluctuation (and directly add a stabilising effect on the stateestimation error).Finally, we have φ t ( Q ) ≤ φ ξt ( Q ) in case ( F1 ) and φ ξt ( Q ) ≤ φ t ( Q ) in case ( F2 ). The semigroupassociated with the error flow (7.5) in both cases is the same. The inequality φ t ( Q ) ≤ φ ξt ( Q ) in case ( F1 ) suggests that the diffusion fluctuation in (7.3) or (7.7) will increase. However, weconversely expect that with S ∈ S + d we have µ (( A − ξ T S ) − φ ξt ( Q ) S ) ≤ µ (( A − ξ T S ) − φ t ( Q ) S ) and thus we gain a type of stabilising effect. Inflation in case ( F1 ) is then a delicate balancingtradeoff between adding noise to the diffusion coefficients (which may kill the existence of samplecovariance moments, for example), and adding a stabilising effect on the sample mean error flow.When ξ > is large enough we can achieve added stabilisation in case ( F2 ), as compared to thenon inflated case. This is not automatic as in case ( F1 ) because φ ξt ( Q ) ≤ φ t ( Q ) . However, thefluctuations are (further) decreased with inflation in case ( F2 ). In places, we switch between rather quantitative estimates to those more qualitative in nature.In part this is to simplify presentation, or when the details are (likely) not tight and thus perhapsof little quantitative interest. In some in places it is because we did not obtain more precisedescriptions of the estimates involved. Refining these estimates may be of practical interest insome cases; e.g. when deriving estimates on the required number of particles N for stability ofthe sample covariance (or convergence to its invariant measure).34he results presented thus far consisted of constants, e.g. c , c n , c τ , etc, that depend onthe model parameters ( A, R, S ) , but importantly not on the ensemble size ( N + 1) or the timehorizon t ∈ [0 , ∞ [ . Due to the dependence on the model (e.g. ( A, R, S ) ), these constants dependimplicitly (via the matrix norms used) on the underlying signal dimension d . It would be ofinterest to pull this dependence out more explicitly depending on the matrix norm we are using,so as to quantify, at least in some general sense, the tradeoff between N and d . For example, inTheorem 5.4 or Theorem 5.6 detailing the fluctuation of the sample covariance and sample meanabout their limiting covariance and (Kalman-Bucy) state estimate values, it would be of interestto know how this fluctuation scales with dimension d , say e.g. with fixed N . Unfortunately, theproof tools used in the development of this work does not lend itself naturally to this analysis.The matrix S := H (cid:48) R − H plays a critical role throughout with regards to obtaining time-uniform fluctuation and then subsequently stability/convergence results. In particular, the as-sumption that S ∈ S + d is strictly positive-definite is needed in numerous places; specifically whenthe signal model A is unstable. This assumption amounts to a type of strong observability result;e.g. a requirement on the “fullness” of the observations and the size and rank of the observationmatrix H . It is worth emphasising that this assumption appears in many technical articlesdiscussing the performance properties of the ensemble Kalman filter; e.g. [47, 26, 80, 25, 22, 23].Typically, the tools used in the proofs in [26, 14, 13, 11] are not sophisticated enough to accom-modate zero eigenvalues of S . A basic example of this deficiency is in the proof of time-uniformmoment boundedness of (cid:98) P t , stated in Theorem 5.3. In that proof, we resort to taking trace oreigenvalue-type reductions of the matrix-valued Riccati diffusion and studying a scalar compari-son Riccati equation. This scalar reduction means that we must look at the minimum eigenvalueof S (because it appears with a minus sign in the Riccati equation) and thus we cannot allowthis value to be zero (because we would lose this term completely in the scalar comparison). Toobtain uniform-in-time bounds, one needs the stabilising effect of this non-zero S in the scalarcomparison. See the proof in [11, Theorem 2.2] for this very transparent example. In this exam-ple, one may relax the condition on S to S ∈ S d at the expense of time exponentially growingbounds. Related difficulties in allowing S ∈ S d instead of S ∈ S + d arise in numerous other places(and as noted in other related works [47, 26, 80, 25, 22, 23]). One difficulty is related to stabilityof the (time-varying) matrix ( A − (cid:98) P t S ) and the positive-definiteness properties of product (cid:98) P t S as discussed subsequently.We have focused significant effort on relaxing the assumption that the underlying signal isstable. Note that if A is stable, e.g. strongly stable in the sense that µ ( A ) < , then the stabilityof µ ( A − (cid:98) P t S ) may be trivially inherited whenever S ∈ S + d via a change of coordinates, see [26].We see here again the use of S ∈ S + d as it pertains to the product (cid:98) P t S . If S ∈ S d is only positivesemi-definite, then one can construct counterexamples such that even if (cid:98) φ t = ( φ t + (cid:98) ϕ t / √ N ) ∈ S + d is positive definite, there exists flows (cid:98) ϕ t such that µ ( A − (cid:98) P t S ) = µ ( A − φ t S − (cid:98) ϕ t S/ √ N ) > .The assumption µ ( A ) < is made in [26] in the linear-Gaussian setting and follow also in, e.g.,[47, 80, 25, 23] when reducing those studies to the linear-Gaussian setting. If A is allowed tobe unstable, then the asymptotic (time-varying) stability of ( A − P t S ) in the classical Kalmanfilter follows under so-called detectability (or observability) conditions [50, 66, 82]. Detectabilityintricately relates the relevant rank deficient directions in P t and S in terms of the unstabledirections in A (i.e. it basically ensures those directions of A that are unstable are observed(as captured by S ) and non-zero weighted in the update Kalman gain via P t ). The rank of thesample covariance (cid:98) P t is at most N ≥ . If N < d , then (cid:98) P t is almost surely rank deficient andthus has zero eigenvalues in some directions. In general, we cannot control the directions inwhich the random, sub-rank, (cid:98) P t has zero eigenvalues (e.g. to play nicely with S in the sense ofdetectability). If A is unstable in those directions, the filter is consequently unstable in thosedirections. Thus, there is a basic, unavoidable, but also transparent tradeoff in requiring eitherstability of µ ( A ) < or sufficiently large ensemble sizes N ≥ d in the derivation of uniform-in-time stability results for the EnKF . In the stability results stated in this work, we emphasised35nstable models A but required sufficiently large ensemble sizes N ≥ d . Nevertheless, moststability results stated in this work with the hypothesis that “ N is sufficiently large” may berestated with this condition replaced with “ N ≥ and µ ( A ) < ”. In [14, 13, 11] the detailson “ N is sufficiently large” are given more explicitly. Note some results that do not consider orrely on the long time stability behaviour of the samples, e.g. the fluctuation size of the samplecovariance about its true value, hold with N ≥ and any matrix A , e.g. this is true for the DEnKF in Theorem 5.4.We remark that the assumption that the true Kalman-Bucy filter is stable in the sense µ ( A − P ∞ S ) < is used in a number of the fluctuation (on the sample mean) and long-time behavioural results in this article. This is stronger than the more natural assumption that Absc( A − P ∞ S ) < . The latter follows from the very natural model assumptions of detectabilityand stabilisability, see (2.11) and the discussion following that equation, e.g. [52, Theorems 9.12,9.15]. In this sense, asking for µ ( A − P ∞ S ) < on the true filter may be viewed as asking for atype of strong observability and controllability. This is similarly in line with asking for S ∈ S + d which is itself a type of strong observability assumption.It is worth noting again that all moment boundedness and fluctuation results stated inthis work hold with any N ≥ and without further assumptions if one replaces the constants c, c n , c n ( Q ) , c n ( z, Q ) . . . with functions that depend on (and grow with) the time horizon t ≥ . The focus of this article is ensemble filtering in the linear-Gaussian (continuous-time) setting.The results surveyed herein portray a rather detailed theory of fluctuation and stability results inthat case. In practice, the ensemble Kalman filtering methodology is applied in high-dimensional,nonlinear state-space models, e.g. see [31, 32]. The evolution equations for each ensemblemember in the case of nonlinear state-space models are given in (4.10).In [83, 78] a novel McKean-Vlasov-type diffusion is studied which has conditional distributionequal to the true Bayesian filter. The mean-field approximation of this McKean-Vlasov-typediffusion in [83, 78] resembles somewhat superficially the particle filters in (4.10). However, theanalogue of the gain function in (4.10) in the filter of [83, 78] is derived as the solution of acertain Poisson-type partial differential equation. In the linear-Gaussian case, the filter of [83]coincides with the DEnKF .In the nonlinear model setting, the ensemble filters in (4.10) are not derived as sampledversions of an equation whose (conditional) distribution is equal to the Bayesian filter. That is,these filters are not derived as sampled versions of the McKean-Vlasov-type diffusion in [83, 78].Conversely, in the limit ( N → ∞ ) the ensemble filters in (4.10) do not converge to an object withdistribution equal to the optimal Bayes filter. In fact, the object these filters converge to hasnot been rigorously established in general and its properties, as compared to the true Bayesianfilter, remain an open topic. Thus, in the nonlinear setting, the ensemble filters discussed inthis work, see (4.10), may be viewed as approximations of the filter in [83, 78] only in somevery weak sense (despite any superficial resemblance to the contrary). Indeed, the gain functionapproximation in (4.10) is likely a very poor approximation of the solution of the Poisson-typepartial differential equation in [83]; except of course in linear-Gaussian models. Rather, we mayargue, as we have earlier in this article, that the ensemble filters in (4.10) should be viewed in thecontext of so-called observer theory, and related not to Bayesian filtering but rather to the moregeneral topic of (dynamic) state estimation [2, 7]. The goal of state estimation in this contextis to design an observer that tracks in some suitable (typically point-wise) sense the underlyingsignal and perhaps provides some usable measure of uncertainty on this estimate. The goal isnot to develop an approximation (at each time) of the true conditional (Bayesian) distributionof the signal given the observations. The latter contains significantly more information than isperhaps needed in many practical applications. Nevertheless, we also argue that the filtering36deas in [83, 78], and suitable approximations thereof, are in need of further investigation.In [25] a class of so-called ensemble extended Kalman filters ( En-EKF ) is developed that isbased on a type of particle approximation of the linearisation-based extended Kalman filter,see [2]. This ensemble filter is interesting because the sample mean is shown to converge (in N → ∞ ) to the extended Kalman filter state estimate. This extended Kalman state estimatorhas been widely studied in nonlinear filtering and control theory [2, 7, 71, 17, 46], and may beviewed more as a type of nonlinear state estimator rather than a Bayesian filter [7, 71].When considering nonlinear signal models, the long time behavioural analysis of various EnKF methods in [47, 80, 25] assumes a strong type of stability property on the signal (whichin the linear case would reduce to assuming that A is Hurwitz stable in our model (2.1)). Thisstability assumption on the true signal is precisely what we aim to relax in our work; albeitlimited in our study to linear models. Filter stability without assumptions on the stability ofthe true signal will ultimately require some control of the fluctuation properties of the sampledobserver, e.g. see the discussion in the preceding section on this topic (in the linear-Gaussianmodel setting). This fluctuation analysis is lacking somewhat in the nonlinear model setting. Itis complicated in that case by the absence of any closed-form evolution equations for the samplemean and sample covariance.Viewing, or even designing, an ensemble filter (or its sample mean for example) as a (dy-namic) state estimator (or observer) may have some benefits. In particular, stability may be alarger design consideration if starting from this viewpoint rather than seeking Bayesian proba-bilistic properties. It may be possible to then also exploit the properties of existing nonlinearstate estimators which have traditionally been rigorously analysed, e.g. [7, 71, 17, 46].This is exemplified in the ( En-EKF ) in [25] that converges to the extended Kalman filterin the limit N → ∞ . The stability of the extended Kalman filter as a nonlinear observer hasbeen widely studied, e.g. see [7, 71, 17, 46]. Although strong signal stability assumptions aretaken in [25], it would be natural to consider the ( En-EKF ) in [25] without the underlying signalstability assumption and look at developing the fluctuation type analysis considered herein inthe linear-Gaussian setting. We may then also exploit the stability analysis that already exists[7, 71, 17, 46] for the limiting extended Kalman state estimator. This is analogous in manyways to the stability properties and observability/controllability properties used herein in thelinear-Gaussian setting.Inflation is used in [47, 81] in the nonlinear model setting to aid in stability. This is similarto the study considered herein on stability under inflation in linear-Gaussian models. It seemsnatural that added inflation acts to stabilise the various ensemble filters. In the context of thepreceding discussion, inflation-based state estimators may also be viewed in the context of stablenonlinear observers, rather than heuristic adaptions of approximate Bayesian filters.Finally, we remark that the transport-based ensemble filter DEnTF , see case ( NF3 ) in (4.10),is studied in [22] in a particular nonlinear setting. In that case again, it is typical to assumea square observation matrix, or in other words to take the observations as linear and assumetheir exists a change of coordinate so that H = I . Note this assumption is made also in[47, 80, 25, 23] which otherwise consider certain classes of nonlinear signals. Thus, this strong(and linear) observability assumption seems key to analysis in the ensemble filtering literatureeven when moving away from the linear signal model. References [1] J.I. Allen, M. Eknes and G. Evensen. An ensemble Kalman filter with a complex marineecosystem model: Hindcasting phytoplankton in the Cretan Sea. Annales Geophysicae. vol.21. pp. 399–411 (2003).[2] B.D.O. Anderson and J.B. Moore. Optimal Filtering. Dover Publications (1979).373] J.L. Anderson. An ensemble adjustment Kalman filter for data assimilation. MonthlyWeather Review. vol. 129, no. 12. pp. 2884–2903 (2001).[4] J.L. Anderson. A local least squares framework for ensemble filtering. Monthly WeatherReview. vol. 131, no. 4. pp 634–642 (2003).[5] J.L. Anderson and S.L. Anderson. A Monte Carlo Implementation of the Nonlinear FilteringProblem to Produce Ensemble Assimilations and Forecasts. Monthly Weather Review. vol.127, no. 12. pp. 2741–2758 (1999).[6] A. Bain and D. Crisan. Fundamentals of Stochastic Filtering. Springer (2009).[7] J.S. Baras, A. Bensoussan, and M.R. James. Dynamic observers as asymptotic limits ofrecursive filters: Special cases. SIAM Journal on Applied Mathematics. vol. 48, no. 5. pp.1147–1158 (1988).[8] A.N. Bishop and P. Del Moral. On the Stability of Kalman-Bucy Diffusion Processes. SIAMJournal on Control and Optimization. vol. 55, no. 6. pp 4015–4047 (2017); arxiv e-print arXiv:1610.04686 updated.[9] A.N. Bishop and P. Del Moral. Stability Properties of Systems of Linear Stochastic Differ-ential Equations with Random Coefficients. SIAM Journal on Control and Optimization.vol. 57, no. 2. pp. 1023–1042 (2019); arXiv e-print, arXiv:1804.09349 .[10] A.N. Bishop and P. Del Moral. An explicit Floquet-type representation of Riccati aperiodicexponential semigroups. arXiv e-print, arXiv:1805.02127 (2018).[11] A.N. Bishop and P. Del Moral. On the stability of matrix-valued Riccati diffusions. Elec-tronic Journal of Probability. vol. 24, paper no. 24 (2019); arXiv e-print, arXiv:1808.00235 .[12] A.N. Bishop and P. Del Moral. On the robustness of Riccati flows to complete modelmisspecification. Journal of the Franklin Institute. vol. 355, no. 15. pp 7178–7200 (2018).[13] A.N. Bishop, P. Del Moral, K. Kamatani, and B. Remillard. On one-dimensional Riccatidiffusions. Annals of Applied Probability. vol. 29, no. 2. pp. 1127–1187 (2019); arXiv e-print, arXiv:1711.10065 .[14] A.N. Bishop, P. Del Moral, and A. Niclas. A perturbation analysis of stochastic matrixRiccati diffusions. Annales de l’Institut Henri Poincaré, Probabilités et Statistiques. vol.56, no. 2. pp. 884–916 (2020); arXiv e-print, arXiv:1709.05071 .[15] A.N. Bishop, P. Del Moral and A. Niclas. An introduction to Wishart matrix moments.Foundations and Trends in Machine Learning. vol. 11, no. 2. pp. 97–218 (2018); arXive-print, arXiv:1710.10864 .[16] A.N. Bishop, P. Del Moral and S. Pathiraja. Perturbations and Projections of Kalman-BucySemigroups. Stochastic Processes and their Applications. vol. 128, no. 9. pp. 2857–2904.(2018).[17] A.N. Bishop and P. Jensfelt. A stochastically stable solution to the problem of robocen-tric mapping. In Proceedings of the 2009 IEEE International Conference on Robotics andAutomation, Kobe, Japan (May, 2009).[18] G. Burgers, P.J. van Leeuwen, and G. Evensen. Analysis Scheme in the Ensemble KalmanFilter. Monthly Weather Review, vol. 126, no. 6. pp. 1719–1724 (1998).3819] F.M. Callier and J.L. Willems. Criterion for the Convergence of the Solution of the RiccatiDifferential Equation. IEEE Transactions on Automatic Control. vol. 26, no. 6. pp. 1232–1242 (1981).[20] F.M. Callier and J. Winkin. Convergence of the Time-Invariant Riccati Differential Equa-tion towards Its Strong Solution for Stabilizable Systems. Journal of Mathematical Analysisand Applications. vol. 192, no. 1. pp. 230–257 (1995).[21] N.K. Chada, M.A. Iglesias, L. Roininen, and A.M. Stuart. Parameterizations for ensembleKalman inversion. Inverse Problems. vol. 34, no. 5. (2018).[22] J. de Wiljes, S. Reich, and W. Stannat. Long-Time Stability and Accuracy of the EnsembleKalman-Bucy Filter for Fully Observed Processes and Small Measurement Noise. SIAMJournal on Applied Dynamical Systems. vol. 17, no. 2. pp. 1152–1181 (2018).[23] J. de Wiljes and X.T. Tong. Analysis of a localised nonlinear Ensemble Kalman Bucy Filterwith complete and accurate observations. arXiv e-print, arXiv:1908.10580 (2019).[24] P. Del Moral. Feynman-Kac Formulae. Springer (2004).[25] P. Del Moral, A. Kurtzmann, and J. Tugaut. On the Stability and the Uniform Propagationof Chaos of a Class of Extended Ensemble Kalman–Bucy Filters. SIAM Journal on Controland Optimization. vol. 55, no.1. pp. 119–155 (2017).[26] P. Del Moral and J. Tugaut. On the stability and the uniform propagation of chaos prop-erties of ensemble Kalman-Bucy filters. Annals of Applied Probability. vol. 28, no. 2. pp790–850 (2018).[27] J.L. Doob. Stochastic Processes. John Wiley & Sons, New York (1953).[28] A. Doucet, N. de Freitas and N.J. Gordon (editors). Sequential Monte Carlo Methods inPractice. Springer (2001).[29] A. Doucet, S. Godsill and C. Andrieu. On sequential Monte Carlo sampling methods forBayesian filtering. Statistics and Computing. vol. 10, no. 3. pp. 197–208 (2000).[30] G. Evensen. Sequential data assimilation with a nonlinear quasi-geostrophic model usingMonte Carlo methods to forecast error statistics. Journal of Geophysical Research: Oceans.vol. 99, no. C5. pp. 10143–10162 (1994).[31] G. Evensen. The Ensemble Kalman Filter: Theoretical Formulation and Practical Imple-mentation. Ocean Dynamics. vol. 53, no. 4. pp. 343–367 (2003).[32] G. Evensen. Data Assimilation: The Ensemble Kalman Filter. Springer Science & BusinessMedia, 2nd edition, (2009).[33] G. Evensen, J. Hove, H.C. Meisingset, E. Reiso, K.S. Seim. Using the EnKF for assistedhistory matching of a North Sea Reservoir Model. In Proceedings of the 2007 SPE ReservoirSimulation Symposium, Houston, Texas (February, 2007).[34] N. Gordon, J. Salmond and A. Smith. A novel approach to non-linear/non-GaussianBayesian state estimation. IEE Proceedings on Radar and Signal Processing. vol. 140, no.2. pp. 107–113 (1993).[35] G.A. Gottwald and A.J. Majda. A mechanism for catastrophic filter divergence in dataassimilation for sparse observation networks. Nonlinear Processes in Geophysics. vol. 20,no. 5. pp. 705–712 (2013). 3936] T.M. Hamill, J.S. Whitaker, and C. Snyder. Distance-Dependent Filtering of BackgroundError Covariance Estimates in an Ensemble Kalman Filter. Monthly Weather Review. vol.129, no. 11. pp. 2776–2790 (2001).[37] J. Harlim and A.J. Majda. Catastrophic Filter Divergence in Filtering Nonlinear DissipativeSystems. Communications in Mathematical Sciences. vol. 8, no. 1. pp. 27–43 (2010).[38] P.L. Houtekamer and H.L. Mitchell. Data assimilation using an ensemble Kalman filtertechnique. Monthly Weather Review. vol. 126, no. 3. pp. 796–811 (1998).[39] P.L. Houtekamer and H.L. Mitchell. A Sequential Ensemble Kalman Filter for AtmosphericData Assimilation. Monthly Weather Review. vol. 129, no. 1. pp. 123–137 (2001).[40] M. Hutzenthaler, A. Jentzen and P.E. Kloeden. Strong and weak divergence in finite time ofEuler’s method for stochastic differential equations with non-globally Lipschitz continuouscoefficients. Proceedings of the Royal Society of London A: Mathematical, Physical andEngineering Sciences. vol. 467, no. 2130. pp. 1563–1576 (2011).[41] M.A. Iglesias, K.J.H. Law, and A.M. Stuart. Ensemble Kalman methods for inverse prob-lems. Inverse Problems. vol. 29, no. 4. (2013).[42] C.J. Johns and J. Mandel. A two-stage ensemble Kalman filter for smooth data assimilation.Environmental and Ecological Statistics. vol. 15, no. 1. pp. 101–110 (2008).[43] R.E. Kalman and R.S. Bucy. New Results in Linear Filtering and Prediction Theory. Journalof Basic Engineering. vol. 83, no. 1. pp. 95–108 (1961).[44] E. Kalnay. Atmospheric Modelling, Data Assimilation and Predictability. Cambridge Uni-versity Press (2003).[45] I. Karatzas and S.E. Shreve. Brownian Motion and Stochastic Calculus. Springer (1996).[46] T. Karvonen, S. Bonnabel, E. Moulines and S. Särkkä. On stability of a class of filters fornon-linear stochastic systems. arXiv e-print, arXiv:1809.05667 (2018).[47] D. Kelly, K.H.J. Law, and A.M. Stuart. Well-posedness and accuracy of the ensembleKalman filter in discrete and continuous time. Nonlinearity. vol. 27, no. 10. pp. 2579–2603(2014).[48] D. Kelly, A.J. Majda, and X.T. Tong. Concrete ensemble Kalman filters with rigorous catas-trophic filter divergence. Proceedings of the National Academy of Sciences of the UnitedStates of America. vol 112, no. 34. pp. 10589–10594 (2015).[49] V. Kucera. A Contribution to Matrix Quadratic Equations. IEEE Transactions on Auto-matic Control. vol. 17, no. 3. pp. 344–347 (1972).[50] H. Kwakernaak and R. Sivan. Linear Optimal Control Systems. Wiley-Interscience (1972).[51] E. Kwiatkowski and J. Mandel. Convergence of the Square Root Ensemble Kalman Filterin the Large Ensemble Limit. SIAM/ASA Journal on Uncertainty Quantification. vol. 3,no. 1. pp. 1–17 (2015).[52] P. Lancaster and L. Rodman. Algebraic Riccati Equations. Oxford University Press (1995).[53] K.J.H. Law, A.M. Stuart and K. Zygalakis. Data Assimilation: A Mathematical Introduc-tion. Springer (2015). 4054] K.J.H. Law, H. Tembine and R. Tempone. Deterministic Mean-Field Ensemble KalmanFiltering. SIAM Journal on Scientific Computing. vol. 38, no. 3. pp. A1251-A1279 (2016).[55] F. Le Gland, V. Monbet and V.D. Tran. Large sample asymptotics for the ensemble Kalmanfilter. Chapter 22 in The Oxford Handbook of Nonlinear Filtering. pp. 598–631 (2011).[56] K.A. Lisaeter, J. Rosanova and G. Evensen. Assimilation of ice concentration in a coupledice-ocean model using the Ensemble Kalman Filter. Ocean Dynamics. vol. 53, no. 4. pp.368–388 (2003).[57] D.M. Livings, S.L. Dance and N.K. Nichols. Unbiased Ensemble Square Root Filters. Phys-ica D: Nonlinear Phenomena. vol. 237, no. 8. pp. 1021–1028 (2008).[58] A.J. Majda and J. Harlim. Filtering Complex Turbulent Systems. Cambridge UniversityPress (2012).[59] A.J. Majda and X.T. Tong. Performance of Ensemble Kalman Filters in Large Dimensions.Communications in Mathematical Sciences. vol. 71, no. 5. pp. 892–937 (2018).[60] J. Mandel, L. Cobb, and J.D. Beezley. On the convergence of the ensemble Kalman filter.Applications of Mathematics. vol. 56, no. 6. pp. 533–541 (2011).[61] H.P. McKean. A class of Markov processes associated with nonlinear parabolic equations.Proceedings of the National Academy of Sciences. vol. 56, no. 6. pp.1907–1911 (1966).[62] H.L. Mitchell, P.L. Houtekamer and G. Pellerin. Ensemble size, balance, and model-errorrepresentation in an ensemble Kalman filter. Monthly Weather Review. vol. 130, no. 11.pp. 2791–2808 (2002).[63] B.P. Molinari. The time-invariant linear-quadratic optimal control problem. Automatica.vol. 13, no. 4. pp. 347–357 (1977).[64] G. Naevdal, L.M. Johnsen, S.I. Aanonsen and E.H. Vefring. Reservoir monitoring andcontinuous model updating using ensemble Kalman filter. In Proceedings of the 2003 SPEAnnual Technical Conference and Exhibition, Denver, Colorado (October, 2003).[65] E. Ott, B.R. Hunt, I. Szunyogh, A.V. Zimin, E.J. Kostelich, M. Corazza, E. Kalnay, D.Patil, and J.A. Yorke. A local ensemble Kalman filter for atmospheric data assimilation.Tellus A. vol. 56, no. 5. pp. 415–428 (2004).[66] P. Park and T. Kailath. Convergence of the DRE solution to the ARE strong solution.IEEE Transactions on Automatic Control. vol. 42, no. 4. 573–578 (1997).[67] M-A. Poubelle, I.R. Petersen, M.R. Gevers, and R.R. Bitmead. A Miscellany of Results onan Equation of Count J. F. Riccati. IEEE Transactions on Automatic Control. vol. 31, no.7. pp. 651–654 (1986).[68] P. Rebeschini and R. Van Handel. Can local particle filters beat the curse of dimensionality?The Annals of Applied Probability. vol. 25, no. 5. pp. 2809–2866 (2015).[69] S. Reich and C.J. Cotter. Ensemble filter techniques for intermittent data assimilation.Large Scale Inverse Problems: Computational Methods and Applications in the EarthSciences (eds: M. Cullen, M.A. Freitag, S. Kindermann, R. Scheichl), pages 91–134. DeGruyter Publishers (2013). See also: arXiv e-print, arXiv:1208.6572arXiv:1208.6572