On the Projective Geometry of Kalman Filter
Francesca Paola Carli and Rodolphe Sepulchre

Francesca Paola Carli is with the Department of Electrical Engineering and Computer Science, University of Liège, Belgium, and visiting the Department of Engineering, University of Cambridge, United Kingdom. She acknowledges support from the FNRS, Belgium. [email protected]

Rodolphe Sepulchre is with the Department of Engineering, University of Cambridge, United Kingdom. [email protected]
Abstract — Convergence of the Kalman filter is best analyzed by studying the contraction of the Riccati map in the space of positive definite (covariance) matrices. In this paper, we explore how this contraction property relates to a more fundamental non-expansiveness property of filtering maps in the space of probability distributions endowed with the Hilbert metric. This is viewed as a preliminary step towards improving the convergence analysis of filtering algorithms over general graphical models.
I. INTRODUCTION
This paper is about the asymptotic behavior of the Kalman filter [11]. The Kalman–Bucy filter merges predictions from a trusted model of the dynamics of the system with incoming measurements in order to get an accurate, real-time estimate of the unknown internal state of the system. The estimation relies on the computation of a positive semidefinite matrix P, the covariance of the estimation error. The difference equation verified by P is a discrete-time algebraic Riccati equation. Kalman showed that, for a linear time-invariant system, under detectability conditions, the Riccati equation converges to a fixed point, which is unique under certain stabilizability conditions ([10], see also [9]). The classical convergence analysis requires several steps: showing that the error covariance is upper bounded; showing that, with zero initial value, it is monotone increasing, so that it admits a limit; and then proving that the corresponding filter is stable and that the limit is the same for all initial covariances.

In [4] Bougerol proposed a more geometric convergence analysis by showing that the discrete-time Riccati iteration is a contraction for the Riemannian metric associated to the cone of positive definite matrices. Other authors elaborated along these lines (see e.g. [16], [19], [13], [7]), showing that the Riccati operator is a contraction with respect to other metrics (e.g. Thompson's metric) and providing explicit formulas for the contraction coefficients.

In this paper, we seek to relate the convergence of the Kalman iteration, and, in particular, of the Riccati flow, to the contraction of the (projective) Hilbert metric under the action of a nonlinear map on the space of positive measurable functions (as opposed to the action of the nonlinear Riccati operator on the space of positive definite matrices). The choice of the Hilbert metric seems particularly sensible in this context since, thanks to its invariance under scaling, it allows one to study the convergence of a nonlinear iteration via the analysis of a linear one. To this end, the Kalman iteration is seen as a specialization for Gaussian distributions of filtering algorithms for general hidden Markov models (HMMs), and the observation is made that the underlying iteration of these general filtering algorithms never expands the Hilbert metric. This approach is more general than the analysis of the Riccati iteration, but at the price of a weaker result, since only non-expansiveness of the Hilbert metric can be shown. The gap between non-expansiveness and contraction is certainly a nontrivial one in the infinite dimensional space of probability distributions. Using the Hilbert metric, convergence results have been proved in [1], [15] (see also [14] for some results concerning HMMs with finite state space), where problems arising from non-compact state spaces or heavy-tailed distributions have been considered. We envision that this approach can open the way to a geometric analysis of filtering algorithms on general graphical models, e.g., of arbitrary topology.

The paper is organized as follows. Sections II and III establish common notation by introducing the Hilbert metric and the Kalman filter iteration. In Section IV we show that the nonlinear iteration underlying filtering algorithms for general HMMs does not expand the Hilbert metric on the space of positive measurable functions. In Section V we show that the Kalman iteration can indeed be seen as a particularization for Gaussian distributions of the forward filtering algorithm for general HMMs and, as such, does not expand the Hilbert metric on the space of positive measurable functions. Section VI discusses convergence. Section VII ends the paper.
Notation.
Throughout the paper, if K is a cone, we denote by K+ the interior of K. In particular, we will denote by P (P+) the cone of positive semidefinite (definite) matrices, while F (F+) will be used to denote the cone of nonnegative (positive) measurable functions with respect to a suitable σ-algebra.

II. HILBERT METRIC
The Hilbert metric was introduced in [8]. Birkhoff [3] (see also [5]) showed that strict positivity of a mapping implies contraction in the Hilbert metric, paving the way to many contraction-based results in the literature on positive operators. The Hilbert metric is defined as follows. Let B be a real Banach space and let K be a closed solid cone in B, that is, a closed subset K with the properties that (i) the interior K+ is non-empty; (ii) K + K ⊆ K; (iii) K ∩ −K = {0}; (iv) λK ⊆ K for all λ ≥ 0. Define the partial order

x ≼ y ⇔ y − x ∈ K,

and for x, y ∈ K \ {0}, let

M(x, y) := inf {λ | x − λy ≼ 0},
m(x, y) := sup {λ | x − λy ≽ 0}.

The Hilbert metric d_H(·, ·) induced by K is defined by

d_H(x, y) := log ( M(x, y) / m(x, y) ),  x, y ∈ K \ {0}. (1)

For example, if B = R^n and the cone K is the positive orthant, K = O := {(x_1, ..., x_n) : x_i ≥ 0, 1 ≤ i ≤ n}, then M(x, y) = max_i (x_i/y_i) and m(x, y) = min_i (x_i/y_i), and the Hilbert metric can be expressed as

d_H(x, y) = log [ max_i (x_i/y_i) / min_i (x_i/y_i) ].

On the other hand, if B = S := {X = X^T ∈ R^{n×n}} is the set of symmetric matrices and K = P := {X ≽ 0 | X ∈ S} is the cone of positive semidefinite matrices, then, for X, Y ≻ 0, M(X, Y) = λ_max(XY^{−1}) and m(X, Y) = λ_min(XY^{−1}). Hence the Hilbert metric is

d_H(X, Y) = log [ λ_max(XY^{−1}) / λ_min(XY^{−1}) ].

In the following, we will be interested in positive operators on finite measures. In this context, the Hilbert metric is defined as follows. Let X be a complete separable metric space and let 𝒳 be the σ-algebra of Borel subsets of X. Moreover, let B = V be the vector space of finite signed measures on (X, 𝒳) and K = C(X) be the set of finite nonnegative measures on X. We recall that two elements λ, µ ∈ C(X) are called comparable if αλ ≤ µ ≤ βλ for suitable positive scalars α, β. The Hilbert metric on C(X) \ {0} is defined as

d_H(µ, µ′) = log [ sup_{A : µ′(A)>0} µ(A)/µ′(A) / inf_{A : µ′(A)>0} µ(A)/µ′(A) ]  if µ, µ′ are comparable,
d_H(µ, µ′) = ∞  otherwise.

An important property of the Hilbert metric is the following. The Hilbert metric is a projective metric on K, i.e. it is nonnegative, symmetric, satisfies the triangle inequality, and is such that, for every x, y ∈ K, d_H(x, y) = 0 if and only if x = λy for some λ > 0. It follows easily that d_H(x, y) is constant on rays, that is,

d_H(λx, µy) = d_H(x, y) for λ, µ > 0. (2)
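To make the formulas above concrete, the following sketch evaluates the Hilbert metric on the positive orthant and on the cone of positive definite matrices (a minimal illustration assuming NumPy; the function names are ours):

```python
import numpy as np

def hilbert_orthant(x, y):
    """Hilbert metric on the positive orthant:
    d_H(x, y) = log(max_i(x_i/y_i) / min_i(x_i/y_i))."""
    r = x / y
    return np.log(r.max() / r.min())

def hilbert_psd(X, Y):
    """Hilbert metric on the positive definite cone:
    d_H(X, Y) = log(lambda_max(X Y^-1) / lambda_min(X Y^-1)).
    For X, Y positive definite the eigenvalues of X Y^-1 are real and positive."""
    lam = np.linalg.eigvals(X @ np.linalg.inv(Y)).real
    return np.log(lam.max() / lam.min())

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 1.0, 1.5])
print(hilbert_orthant(x, y))        # a positive distance
print(hilbert_orthant(5.0 * x, y))  # unchanged: the metric is projective, cf. (2)

X = np.array([[2.0, 0.5], [0.5, 1.0]])
Y = np.eye(2)
print(hilbert_psd(X, Y), hilbert_psd(3.0 * X, Y))  # equal, by scale invariance
```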
Hilbert metric and positive mappings

In this section, we review contraction properties of positive operators with respect to the Hilbert metric. We recall that a map A : K → K is said to be positive; a map that sends K \ {0} into K+ is said to be strictly positive. If A is a strictly positive linear map, we denote by

k(A) := inf {λ : d_H(Ax, Ay) ≤ λ d_H(x, y) ∀ x, y ∈ K+} (3)

the contraction ratio of A, and by

∆(A) := sup {d_H(Ax, Ay) : x, y ∈ K+} (4)

its projective diameter. Contraction properties of positive operators with respect to the Hilbert metric are established in the following theorem [3], [5], [12].

Theorem 2.1: If x, y ∈ K, then the following holds:
(i) if A is a positive linear map on K, then d_H(Ax, Ay) ≤ d_H(x, y), i.e. the Hilbert metric contracts weakly under the action of a positive linear transformation;
(ii) [Birkhoff, 1957] if A is a strictly positive linear map on B, then

k(A) = tanh( ∆(A)/4 ). (5)

Let U denote the unit sphere in B and let E be the metric space E := (K+ ∩ U, d_H). Then, by combining Theorem 2.1 (ii) with the Banach contraction mapping theorem, the following generalization of the Perron–Frobenius theorem holds: if ∆(A) < ∞ and the metric space E is complete, then there exists a unique positive eigenvector of A in E.
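As a numerical illustration of Theorem 2.1 (ii), the sketch below (our own, not part of the original analysis) computes the projective diameter of an entrywise positive matrix acting on the positive orthant, where the closed form ∆(A) = max_{i,j,k,l} log(a_ik a_jl / (a_il a_jk)) is available, and checks Birkhoff's bound (5) on random pairs of positive vectors:

```python
import numpy as np
from itertools import product

def d_H(x, y):
    """Hilbert metric on the positive orthant."""
    r = x / y
    return np.log(r.max() / r.min())

def projective_diameter(A):
    """Projective diameter of an entrywise positive matrix on the orthant:
    Delta(A) = max over i,j,k,l of log(a_ik * a_jl / (a_il * a_jk))."""
    n, m = A.shape
    return max(np.log(A[i, k] * A[j, l] / (A[i, l] * A[j, k]))
               for i, j, k, l in product(range(n), range(n), range(m), range(m)))

rng = np.random.default_rng(0)
A = rng.uniform(0.5, 2.0, size=(4, 4))      # a strictly positive linear map
kA = np.tanh(projective_diameter(A) / 4.0)  # Birkhoff contraction ratio (5)
for _ in range(1000):
    x, y = rng.uniform(0.1, 10.0, size=(2, 4))
    assert d_H(A @ x, A @ y) <= kA * d_H(x, y) + 1e-9
print("contraction ratio k(A) =", kA)       # strictly less than 1
```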
III. KALMAN FILTER AND THE RICCATI OPERATOR

In this section, we briefly introduce the Kalman filter iteration, which is analyzed later on in Section V, where an alternative derivation is also provided. Let us consider the linear dynamical system

X_{k+1} = A X_k + W_k,  k ≥ 0, (6a)
Y_k = C X_k + V_k, (6b)

where {W_k} and {V_k} are mutually uncorrelated white noise Gaussian processes with variances Γ and Σ, respectively, i.e.

W_k ∼ N(0, Γ),  V_k ∼ N(0, Σ), (7)

and with initial condition

X_0 ∼ N(µ_0, P_0) (8)

such that

E[W_k X_0^T] = 0,  E[V_k X_0^T] = 0. (9)

The Kalman filter recursion consists of the following steps.

Time update ("predict") step:

X̂_{k|k−1} = A X̂_{k−1|k−1} (10)
P_{k|k−1} = A P_{k−1|k−1} A^T + Γ (11)

Measurement update ("correct") step:

X̂_{k|k} = X̂_{k|k−1} + K_k (Y_k − C X̂_{k|k−1}) (12)
P_{k|k} = (I − K_k C) P_{k|k−1} (13)
K_k = P_{k|k−1} C^T (C P_{k|k−1} C^T + Σ)^{−1} (14)

The recursion is initialized at X̂_{0|−1} = µ_0, P_{0|−1} = P_0. Equivalently, the following one-step expressions for the a posteriori state estimate and covariance hold:

P_{k|k} = Φ(P_{k−1|k−1}) (15)
X̂_{k|k} = (A − P_{k|k} C^T Σ^{−1} C A) X̂_{k−1|k−1} + P_{k|k} C^T Σ^{−1} Y_k (16)

where Φ is the nonlinear map

Φ(P) = (A P A^T + Γ) [ I + C^T Σ^{−1} C Γ + C^T Σ^{−1} C A P A^T ]^{−1}. (17)

Φ in (17) can be written as

Φ(P) = ( (A P A^T + Γ)^{−1} + C^T Σ^{−1} C )^{−1}. (18)

This equation is called the discrete Riccati equation. In the literature, convergence of the Kalman iteration has been studied by proving that the discrete Riccati operator contracts suitable metrics (e.g. the Riemannian metric [4], the Thompson part metric [16]) on the set of positive definite matrices. In the following, we propose to study convergence of the Kalman iteration by directly analyzing an equivalent iteration on the space of positive measurable functions. This equivalent iteration is introduced and discussed in the next section.
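For illustration, iterating the Riccati map (18) on a small example system (our own values, assuming NumPy) converges to a fixed point that does not depend on the initial covariance, consistently with the classical theory:

```python
import numpy as np

def riccati_map(P, A, C, Gamma, Sigma):
    """One step of the discrete Riccati map (18):
    Phi(P) = ((A P A^T + Gamma)^-1 + C^T Sigma^-1 C)^-1."""
    Q = A @ P @ A.T + Gamma
    return np.linalg.inv(np.linalg.inv(Q) + C.T @ np.linalg.inv(Sigma) @ C)

# Illustrative detectable/stabilizable system
A = np.array([[1.0, 0.1], [0.0, 0.9]])
C = np.array([[1.0, 0.0]])
Gamma = 0.1 * np.eye(2)
Sigma = np.array([[0.5]])

for P0 in (np.eye(2), 100.0 * np.eye(2)):   # two very different initializations
    P = P0
    for _ in range(500):
        P = riccati_map(P, A, C, Gamma, Sigma)
    print(np.round(P, 6))                   # same fixed point for both
```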
IV. NON-EXPANSIVENESS OF THE FILTERING RECURSION IN PROJECTIVE SPACES

In this section, we introduce the filtering algorithm for general hidden Markov models and we show that the map underlying the main iteration does not expand the Hilbert metric on the cone of positive measurable functions. Note that some authors use the term hidden Markov model exclusively for the case where X_k takes values in a finite state space. In this paper, following e.g. [6], when referring to a hidden Markov model we also intend to include models with continuous state space; such models are also referred to as state-space models in the literature.

Problem statement
In the broadest sense of the word, a hidden Markov model is a Markov process that is split into two components: an observable component and an unobservable or "hidden" component. That is, a hidden Markov model is a Markov process {X_k, Y_k}_{k≥0} on the state space X × Y, where we presume that we have a way of observing Y_k, but not X_k. In simple cases, such as discrete-time, countable state space models, it is common to define hidden Markov models by using the concept of conditional independence. It turns out that conditional independence is mathematically more difficult to define in general settings (in particular, when the state space X of the Markov process is not countable, the case we are interested in), so a different route is adopted (see [6] for details). To this aim, we define the transition kernel (the parallel of the transition matrix for countable state spaces).

Definition 4.1 (Transition kernel): A kernel from a measurable space (X, 𝒳) to a measurable space (Y, 𝒴) is a map Q : X × 𝒴 → [0, ∞] such that
(i) for all x ∈ X, A ↦ Q(x, A) is a measure on 𝒴;
(ii) for all A ∈ 𝒴, the map x ↦ Q(x, A) is measurable.
If Q(x, Y) = 1 for every x ∈ X, then Q is called a transition kernel.

We next consider an X-valued stochastic process {X_k}_{k≥0}, i.e., a collection of X-valued random variables on a common underlying probability space (Ω, 𝒢, P), where X is some measure space. The process {X_k}_{k≥0} is Markov if, for every time k ≥ 0, there exists a transition kernel Q_k : X × 𝒳 → [0, 1] such that

P(X_{k+1} ∈ A | X_0, ..., X_k) = Q_k(X_k, A),

for every A ∈ 𝒳, k ≥ 0. If Q_k = Q for every k, then the Markov process is called homogeneous. For simplicity of exposition, from now on we will consider homogeneous Markov processes, though the theory we are about to develop does not rely on this assumption. A hidden Markov model {X_k, Y_k}_{k≥0} is an (only partially observed) Markov process whose transition kernel has a special structure, namely one such that both the joint process {X_k, Y_k}_{k≥0} and the marginal unobservable process {X_k}_{k≥0} are Markov. Formally:

Definition 4.2 (Hidden Markov Model):
Let (X, 𝒳) and (Y, 𝒴) be two measurable spaces and let Q and G denote a transition kernel on (X, 𝒳) and a transition kernel from (X, 𝒳) to (Y, 𝒴), respectively. Consider the transition kernel on the product space (X × Y, 𝒳 ⊗ 𝒴) defined by

T[(x, y), C] = ∫∫_C Q(x, dx′) G(x′, dy′),  for (x, y) ∈ X × Y, C ∈ 𝒳 ⊗ 𝒴.

The Markov process {X_k, Y_k}_{k≥0} with transition kernel T and initial probability measure µ on (X, 𝒳) is called a hidden Markov model.

A hidden Markov model is completely determined by the initial measure µ and its transition kernel T (equivalently, by Q and G). Formally:

Proposition 4.1:
Let {X_k, Y_k}_{k≥0} be a hidden Markov model on (X × Y, 𝒳 ⊗ 𝒴) with transition kernel Q, observation kernel G, and initial measure µ. Then, for every bounded measurable function f : (X × Y)^{k+1} → R,

E[f(X_0, Y_0, ..., X_k, Y_k)] = ∫ f(x_0, y_0, ..., x_k, y_k) G(x_k, dy_k) Q(x_{k−1}, dx_k) ⋯ G(x_1, dy_1) Q(x_0, dx_1) G(x_0, dy_0) µ(dx_0). (19)

In the following, we are interested in the filtering problem for HMMs, namely the problem of computing the sequence of conditional distributions of X_k given Y^k := (Y_0, ..., Y_k). The filtering problem, as well as the related smoothing and prediction problems, has its origin in the work of Wiener, who was interested in stationary processes. In the more general setting of hidden Markov models, early contributions are the works of Stratonovich, Shiryaev, and Baum, Petrie and coworkers [18], [17], [2]; see also [6] for a recent monograph.

Filtering algorithm
Assume that both G and Q are absolutely continuous with respect to the Lebesgue measure (in the next section we will particularize to the case of Gaussian distributions), with transition density functions g and q, respectively. In terms of transition densities, the filtering problem can be solved as follows.

Theorem 4.1 (Forward filtering recursion):
Denote by α̂_s(x_k) the probability density function α̂_s(x_k) := p(x_k | y^s), where y^s := (y_0, ..., y_s), and let g_k(x_k) := g(x_k, y_k). Then α̂_k(x_k) = p(x_k | y^k) can be recursively expressed in terms of α̂_{k−1}(x_{k−1}) = p(x_{k−1} | y^{k−1}) as follows:

α̂_k(x_k) = g_k(x_k) ∫ q(x_{k−1}, x_k) α̂_{k−1}(x_{k−1}) dx_{k−1} / ∫∫ g_k(x_k) q(x_{k−1}, x_k) α̂_{k−1}(x_{k−1}) dx_k dx_{k−1}, (20)

with the iteration initialized at

α̂_0(x_0) = g_0(x_0) µ(x_0) / ∫ g_0(x_0) µ(x_0) dx_0. (21)

The iteration (20) defines a time-varying dynamical system over the cone F of nonnegative measurable functions with respect to the product σ-algebra 𝒳 ⊗ 𝒴^{⊗(k+1)}. The following equivalent two-step formulation holds.

Remark 4.1 (Two-step formulation of the filtering recursion):
The filtering recursion (20) is often split into two steps (a numerical sketch is given after this list):
1) prediction step, in which the one-step-ahead predictive density is computed:

α̂_{k|k−1}(x_k) = ∫ q(x_{k−1}, x_k) α̂_{k−1}(x_{k−1}) dx_{k−1}; (22)

2) update step, in which the observed data from time k are absorbed, yielding the filtering density:

α̂_k(x_k) = g_k(x_k) α̂_{k|k−1}(x_k) / ∫∫ g_k(x_k) q(x_{k−1}, x_k) α̂_{k−1}(x_{k−1}) dx_k dx_{k−1}. (23)
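The recursion (22)–(23) can be sketched numerically by discretizing the state space on a grid; the scalar Gaussian model below is our own toy example, not part of the paper:

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 401)   # grid on a truncated scalar state space
dx = x[1] - x[0]

def gauss(z, mean, var):
    return np.exp(-(z - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

a, gamma, c, sigma = 0.9, 0.2, 1.0, 0.5           # toy model parameters
Q = gauss(x[None, :], a * x[:, None], gamma)      # Q[i, j] ~ q(x_i, x_j)
alpha = gauss(x, 0.0, 1.0)                        # initial filtering density

for y in [0.3, -0.1, 0.7]:                        # some observed data
    pred = (Q.T @ alpha) * dx                     # prediction step (22)
    unnorm = gauss(y, c * x, sigma) * pred        # multiply by g_k(x)
    alpha = unnorm / (unnorm.sum() * dx)          # normalization, as in (23)
print("filtered mean:", (x * alpha).sum() * dx)
```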
Non-expansiveness in projective space

First of all, notice that the nonlinear map in (20), say Ψ̄_k, is the composition of a linear map (the numerator) and a positive scaling, i.e. we can write

(Ψ̄_k f)(x) = (Ψ_k f)(x) / ∫ (Ψ_k f)(x) dx,

where

(Ψ_k f)(x) = g_k(x) ∫ q(x′, x) f(x′) dx′, (24)

with q and g the transition densities associated with the transition and observation kernels Q and G, respectively. The next theorem draws the consequences of the fact that the map Ψ_k takes nonnegative measurable functions into nonnegative measurable functions.

Theorem 4.2:
The map Ψ_k in (24) does not expand the Hilbert metric, i.e.

d_H(Ψ_k f, Ψ_k g) ≤ d_H(f, g).

Proof:
The map Ψ_k is the composition of (i) (Ψ^{(1)} f)(x) = ∫ q(x′, x) f(x′) dx′ and (ii) (Ψ^{(2)} f)(x) = g_k(x) f(x). The maps Ψ^{(1)} and Ψ^{(2)} are positive and linear, and as such they do not expand the Hilbert metric (see Theorem 2.1 (i)). The thesis follows since the composition of nonexpansive operators is nonexpansive.
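Theorem 4.2 can be checked numerically on a grid discretization: one application of Ψ_k never increases the Hilbert distance between two positive functions. The sketch below reuses the toy Gaussian model introduced earlier; all names and values are illustrative:

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 401)
dx = x[1] - x[0]

def gauss(z, mean, var):
    return np.exp(-(z - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def d_H(f, g):
    """Hilbert metric between positive functions sampled on the grid."""
    r = f / g
    return np.log(r.max() / r.min())

a, gamma, c, sigma, y = 0.9, 0.2, 1.0, 0.5, 0.3
Q = gauss(x[None, :], a * x[:, None], gamma)
g_k = gauss(y, c * x, sigma)

def Psi(f):
    """(Psi_k f)(x) = g_k(x) * integral of q(x', x) f(x') dx', cf. (24)."""
    return g_k * ((Q.T @ f) * dx)

rng = np.random.default_rng(1)
for _ in range(100):
    f, g = rng.uniform(0.1, 1.0, size=(2, x.size))  # arbitrary positive functions
    assert d_H(Psi(f), Psi(g)) <= d_H(f, g) + 1e-9  # non-expansiveness
print("non-expansiveness verified on random positive functions")
```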
V. KALMAN FILTERING AS FORWARD FILTERING RECURSION

The classical derivation of the Kalman filter relies on an argument based on projections onto spaces spanned by random variables. As an alternative, the Kalman iteration can be seen as a specialization of the filtering algorithm in Theorem 4.1 for Gaussian distributions. This fact by itself is known in the literature (see e.g. [6]). In this section, we first briefly review this alternative derivation of the Kalman filter. This, combined with the (weak) contraction result of Theorem 4.2, lets us conclude that the Kalman iteration does not expand the Hilbert metric. Convergence of the Kalman iteration is discussed in Section VI.

Before getting started, we observe that the linear dynamical system (6)–(9) is indeed equivalent to a hidden Markov model as specified by (19), with initial, transition and emission probability densities, for k ≥ 0, given by

p(x_0) = N(µ_0, P_0), (25)
p(x_{k+1} | x_k) = N(A x_k, Γ), (26)
p(y_k | x_k) = N(C x_k, Σ). (27)

We also recall that, given the prior and likelihood

p(x) = N(µ_X, Σ_X), (28)
p(y | x) = N(A x + b, Σ_{Y|X}), (29)

the posterior p(x | y) and the normalization constant p(y) are given by

p(y) = N(A µ_X + b, Σ_{Y|X} + A Σ_X A^T), (30)
p(x | y) = N(µ_{X|Y}, Σ_{X|Y}), (31)

with

Σ_{X|Y} = (Σ_X^{−1} + A^T Σ_{Y|X}^{−1} A)^{−1}, (32)
µ_{X|Y} = Σ_{X|Y} [ A^T Σ_{Y|X}^{−1} (y − b) + Σ_X^{−1} µ_X ]. (33)

The next proposition connects the Kalman filter algorithm to the filtering recursion described in Section IV.

Proposition 5.1:
The Kalman filter recursion (10)–(14) is a specialization of the forward filtering recursion of Theorem 4.1 for an HMM with Gaussian initial, transition and emission probabilities as in (25)–(27).
Proof:
Let µ_{k|s} := E[X_k | Y^s] and P_{k|s} := E[(X_k − µ_{k|s})(X_k − µ_{k|s})^T | Y^s].
1) prediction step: By (22), p(x_k | y^{k−1}) is given by

p(x_k | y^{k−1}) = ∫ p(x_k | x_{k−1}) p(x_{k−1} | y^{k−1}) dx_{k−1}.

Now, p(x_k | x_{k−1}) is Gaussian with mean A x_{k−1} and covariance Γ, and p(x_{k−1} | y^{k−1}) is also Gaussian; we denote by µ_{k−1|k−1} and P_{k−1|k−1} its mean and covariance, respectively. By virtue of (30) we get p(x_k | y^{k−1}) ∼ N(A µ_{k−1|k−1}, A P_{k−1|k−1} A^T + Γ), i.e.

µ_{k|k−1} = A µ_{k−1|k−1},
P_{k|k−1} = A P_{k−1|k−1} A^T + Γ,

which are the a priori state estimate and covariance in (10)–(11).
2) update step: By (23), p(x_k | y^k) is given by

p(x_k | y^k) = p(y_k | x_k) p(x_k | y^{k−1}) / p(y_k | y^{k−1}).

Now p(y_k | x_k) is Gaussian with mean C x_k and covariance Σ, and p(x_k | y^{k−1}) is also Gaussian; we denote by µ_{k|k−1} and P_{k|k−1} its mean and covariance. By virtue of (31) we get p(x_k | y^k) ∼ N(µ_{k|k}, P_{k|k}) with

P_{k|k} = (P_{k|k−1}^{−1} + C^T Σ^{−1} C)^{−1}, (34)
µ_{k|k} = P_{k|k} [ C^T Σ^{−1} y_k + P_{k|k−1}^{−1} µ_{k|k−1} ], (35)

from which the expressions (12)–(13) for the a posteriori state estimate and covariance can be recovered via the matrix inversion lemma.

By the results in Theorem 4.2 and Proposition 5.1, we have that the map underlying the Kalman filtering algorithm does not expand the Hilbert metric on the space of positive measurable functions.
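The equivalence stated in Proposition 5.1 can also be verified numerically: the information-form update (34)–(35) and the gain form (12)–(14) produce the same posterior, as the matrix inversion lemma guarantees. A minimal sketch with our own illustrative values:

```python
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 0.9]])
C = np.array([[1.0, 0.0]])
Gamma, Sigma = 0.1 * np.eye(2), np.array([[0.5]])
mu, P = np.zeros(2), np.eye(2)
y = np.array([0.4])

# prediction step (10)-(11), i.e. the marginalization (30)
mu_pred = A @ mu
P_pred = A @ P @ A.T + Gamma

# update step, information form (34)-(35)
P_info = np.linalg.inv(np.linalg.inv(P_pred) + C.T @ np.linalg.inv(Sigma) @ C)
mu_info = P_info @ (C.T @ np.linalg.inv(Sigma) @ y + np.linalg.inv(P_pred) @ mu_pred)

# update step, gain form (12)-(14)
K = P_pred @ C.T @ np.linalg.inv(C @ P_pred @ C.T + Sigma)
mu_gain = mu_pred + K @ (y - C @ mu_pred)
P_gain = (np.eye(2) - K @ C) @ P_pred

assert np.allclose(P_info, P_gain) and np.allclose(mu_info, mu_gain)
print("information form and gain form agree")
```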
VI. ON STRICT CONTRACTIVENESS OF THE KALMAN ITERATION

So far, we have shown that the time-varying nonlinear operator Ψ̄_k that underlies the Kalman iteration does not expand the Hilbert metric. Proving convergence of the Kalman iteration indeed amounts to proving that this iteration strictly contracts the Hilbert metric. As observed in Section IV, the map (20) is the composition of a positive linear map and a positive scaling. By the scaling invariance property of the Hilbert metric, it follows that the convergence analysis can concentrate on the linear numerator of Ψ̄_k alone. By Theorem 2.1 (ii), a sufficient condition for a strictly positive linear operator to be a contraction is that it has finite projective diameter. At this point, one may observe that even the Hilbert distance between two Gaussians with the same variance and different means may tend to infinity (a general discussion that takes into account the problems arising from the use of the Hilbert metric with non-compact state spaces and heavy-tailed distributions is contained in [1]). Proving strict contraction usually requires exploiting the fact that the map Ψ̄_k is time-varying, and showing that the map contracts over a uniform time horizon, as opposed to at each time instant. For iterations on the finite dimensional space of covariance matrices, this is the place where the observability and controllability conditions enter the analysis. Our hope is that similar conditions apply to more general situations than the one covered by the Kalman filter, and that this general approach will find novel applications in the analysis of filtering algorithms on general graphical models.
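The obstacle mentioned above is easy to reproduce numerically: for two Gaussian densities with unit variance and means 0 and m, log(f/g) = (m² − 2mx)/2, so on a truncated domain [−L, L] the Hilbert distance equals 2mL and diverges as the truncation is removed. A quick check (ours):

```python
import numpy as np

def gauss(z, mean, var):
    return np.exp(-(z - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def d_H(f, g):
    r = f / g
    return np.log(r.max() / r.min())

m = 1.0                       # mean offset between the two Gaussians
for L in [2.0, 5.0, 10.0, 20.0]:
    x = np.linspace(-L, L, 2001)
    # distance grows like 2*m*L: finite on every truncation, unbounded overall
    print(L, d_H(gauss(x, 0.0, 1.0), gauss(x, m, 1.0)))
```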
VII. CONCLUSION

As an attempt to generalize the contraction-based convergence analysis of the Kalman filter, we have interpreted the contraction result of Bougerol in the space of positive definite (covariance) matrices as a specialization of the non-expansiveness of the general filtering recursion for hidden Markov models in the space of positive measurable functions. In spite of the obstacles to showing a finite projective diameter in this infinite dimensional space, we feel that this approach is worth revisiting in the convergence analysis of filtering algorithms on general graphical models (arbitrary topology and/or different spaces of distributions). This is the topic of ongoing research.

REFERENCES
[1] R. Atar and O. Zeitouni. Exponential stability for nonlinear filtering. Annales de l'IHP Probabilités et Statistiques, 33(6):697–725, 1997.
[2] L.E. Baum, T. Petrie, G. Soules, and N. Weiss. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. The Annals of Mathematical Statistics, pages 164–171, 1970.
[3] G. Birkhoff. Extensions of Jentzsch's theorem. Transactions of the American Mathematical Society, pages 219–227, 1957.
[4] P. Bougerol. Kalman filtering with random coefficients and contractions. SIAM Journal on Control and Optimization, 31(4):942–959, 1993.
[5] P.J. Bushell. Hilbert's metric and positive contraction mappings in a Banach space. Archive for Rational Mechanics and Analysis, 52(4):330–338, 1973.
[6] O. Cappé, E. Moulines, and T. Rydén. Inference in Hidden Markov Models. Springer-Verlag, New York, 2005.
[7] S. Gaubert and Z. Qu. The contraction rate in Thompson's part metric of order-preserving flows on a cone: application to generalized Riccati equations. Journal of Differential Equations, 256(8):2902–2948, 2014.
[8] D. Hilbert. Über die gerade Linie als kürzeste Verbindung zweier Punkte. Mathematische Annalen, 46(1):91–96, 1895.
[9] A.H. Jazwinski. Stochastic Processes and Filtering Theory. Academic Press, 1970.
[10] R.E. Kalman. New methods in Wiener filtering theory. In Proceedings of the First Symposium on Engineering Applications of Random Function Theory and Probability. John Wiley & Sons, New York, 1963.
[11] R.E. Kalman and R.S. Bucy. New results in linear filtering and prediction theory. Journal of Basic Engineering, 83(1):95–108, 1961.
[12] E. Kohlberg and J.W. Pratt. The contraction mapping approach to the Perron–Frobenius theory: Why Hilbert's metric? Mathematics of Operations Research, 7(2):198–210, 1982.
[13] J. Lawson and Y. Lim. A Birkhoff contraction formula with applications to Riccati equations. SIAM Journal on Control and Optimization, 46(3):930–951, 2007.
[14] F. Le Gland and L. Mevel. Exponential forgetting and geometric ergodicity in hidden Markov models. Mathematics of Control, Signals and Systems, 13(1):63–93, 2000.
[15] F. Le Gland and N. Oudjane. Stability and uniform approximation of nonlinear filters using the Hilbert metric and application to particle filters. The Annals of Applied Probability, 14(1):144–187, 2004.
[16] C. Liverani and M.P. Wojtkowski. Generalization of the Hilbert metric to the space of positive definite matrices. Pacific Journal of Mathematics, 166(2):339–355, 1994.
[17] A.N. Shiryaev. On stochastic equations in the theory of conditional Markov processes. Theory of Probability and Its Applications, 11(1):179–184, 1966.
[18] R.L. Stratonovich. Conditional Markov processes. Theory of Probability and Its Applications, 5(2):156–178, 1960.
[19] M.P. Wojtkowski. Geometry of Kalman filters.