On the Projective Geometry of Kalman Filter
Francesca Paola Carli and Rodolphe Sepulchre

Francesca Paola Carli is with the Department of Electrical Engineering and Computer Science, University of Liège, Belgium, and visiting the Department of Engineering, University of Cambridge, United Kingdom. She acknowledges support from the FNRS, Belgium. [email protected]

Rodolphe Sepulchre is with the Department of Engineering, University of Cambridge, United Kingdom. [email protected]
Abstract — Convergence of the Kalman filter is best analyzed by studying the contraction of the Riccati map in the space of positive definite (covariance) matrices. In this paper, we explore how this contraction property relates to a more fundamental non-expansiveness property of filtering maps in the space of probability distributions endowed with the Hilbert metric. This is viewed as a preliminary step towards improving the convergence analysis of filtering algorithms over general graphical models.
I. INTRODUCTION
This paper is about the asymptotic behavior of the Kalman filter [11]. The Kalman–Bucy filter merges predictions from a trusted model of the dynamics of the system with incoming measurements in order to get an accurate, real-time estimate of the unknown internal state of the system. The estimation relies on the computation of a positive semidefinite matrix P, the covariance of the estimation error. The difference equation verified by P is a discrete-time algebraic Riccati equation. Kalman showed that, for a linear time-invariant system, under detectability conditions, the Riccati equation converges to a fixed point, which is unique under certain stabilizability conditions ([10], see also [9]). The classical convergence analysis requires several steps: showing that the error covariance is upper bounded; showing that, with zero initial value, it is monotone increasing, so that it admits a limit; and then proving that the corresponding filter is stable and that the limit is the same for all initial covariances.

In [4] Bougerol proposed a more geometric convergence analysis by showing that the discrete-time Riccati iteration is a contraction for the Riemannian metric associated to the cone of positive definite matrices. Other authors elaborated along these lines (see e.g. [16], [19], [13], [7]), showing that the Riccati operator is a contraction with respect to other metrics (e.g. Thompson's metric) and providing explicit formulas for the contraction coefficients.

In this paper, we seek to relate the convergence of the Kalman iteration, and, in particular, of the Riccati flow, to the contraction of the (projective) Hilbert metric under the action of a nonlinear map on the space of positive measurable functions (as opposed to the action of the nonlinear Riccati operator on the space of positive definite matrices). The choice of the Hilbert metric seems particularly sensible in this context since, thanks to its invariance under scaling, it allows one to study the convergence of a nonlinear iteration via the analysis of a linear one. To this end, the Kalman iteration is seen as a specialization for Gaussian distributions of filtering algorithms for general hidden Markov models (HMMs), and the observation is made that the underlying iteration of these general filtering algorithms never expands the Hilbert metric. This approach is more general than the analysis of the Riccati iteration, but at the price of a weaker result, since only non-expansiveness of the Hilbert metric can be shown. The gap between non-expansiveness and contraction is certainly a nontrivial one in the infinite dimensional space of probability distributions. Using the Hilbert metric, convergence results have been proved in [1], [15] (see also [14] for some results concerning HMMs with finite state space), where problems arising from non-compact state spaces or heavy-tailed distributions have been considered. We envision that this approach can open the way to a geometric analysis of filtering algorithms on general graphical models, e.g., of arbitrary topology.

The paper is organized as follows. Sections II and III establish common notation by introducing the Hilbert metric and the Kalman filter iteration. In Section IV we show that the nonlinear iteration underlying filtering algorithms for general HMMs does not expand the Hilbert metric on the space of positive measurable functions. In Section V we show that the Kalman iteration can indeed be seen as a particularization for Gaussian distributions of the forward filtering algorithm for general HMMs and, as such, does not expand the Hilbert metric on the space of positive measurable functions. Section VI discusses convergence. Section VII ends the paper.
Notation.
Throughout the paper, if K is a cone, we denote by K+ the interior of K. In particular, we will denote by P (P+) the cone of positive semidefinite (definite) matrices, while F (F+) will be used to denote the cone of nonnegative (positive) measurable functions with respect to a suitable σ-algebra.

II. HILBERT METRIC
The Hilbert metric was introduced in [8]. Birkhoff [3] (see also [5]) showed that strict positivity of a mapping implies contraction in the Hilbert metric, paving the way to many contraction-based results in the literature on positive operators. The Hilbert metric is defined as follows. Let B be a real Banach space and let K be a closed solid cone in B, that is, a closed subset K with the properties that (i) the interior K+ is non-empty; (ii) K + K ⊆ K; (iii) K ∩ −K = {0}; (iv) λK ⊆ K for all λ ≥ 0. Define the partial order

x ≼ y ⇔ y − x ∈ K,

and for x, y ∈ K \ {0}, let

M(x, y) := inf {λ | x − λy ≼ 0},
m(x, y) := sup {λ | x − λy ≽ 0}.

The Hilbert metric d_H(·, ·) induced by K is defined by

d_H(x, y) := log ( M(x, y) / m(x, y) ),  x, y ∈ K \ {0}. (1)

For example, if B = R^n and the cone K is the positive orthant, K = O := {(x_1, ..., x_n) : x_i ≥ 0, 1 ≤ i ≤ n}, then M(x, y) = max_i (x_i/y_i) and m(x, y) = min_i (x_i/y_i), and the Hilbert metric can be expressed as

d_H(x, y) = log [ max_i (x_i/y_i) / min_i (x_i/y_i) ].

On the other hand, if B = S := {X = X^T ∈ R^{n×n}} is the set of symmetric matrices and K = P := {X ≽ 0 | X ∈ S} is the cone of positive semidefinite matrices, then, for X, Y ≻ 0, M(X, Y) = λ_max(XY^{−1}) and m(X, Y) = λ_min(XY^{−1}). Hence the Hilbert metric is

d_H(X, Y) = log [ λ_max(XY^{−1}) / λ_min(XY^{−1}) ].

In the following, we will be interested in positive operators on finite measures. In this context, the Hilbert metric is defined as follows. Let X be a complete separable metric space and let 𝒳 be the σ-algebra of Borel subsets of X. Moreover, let B = V be the vector space of finite signed measures on (X, 𝒳) and K = C(X) be the set of finite nonnegative measures on X. We recall that two elements λ, µ ∈ C(X) are called comparable if αλ ≤ µ ≤ βλ for suitable positive scalars α, β. The Hilbert metric on C(X) \ {0} is defined as

d_H(µ, µ′) = log [ sup_{A : µ′(A)>0} µ(A)/µ′(A) / inf_{A : µ′(A)>0} µ(A)/µ′(A) ]  if µ, µ′ are comparable,
d_H(µ, µ′) = ∞  otherwise.

An important property of the Hilbert metric is the following. The Hilbert metric is a projective metric on K, i.e. it is nonnegative, symmetric, satisfies the triangle inequality, and is such that, for every x, y ∈ K, d_H(x, y) = 0 if and only if x = λy for some λ > 0. It follows easily that d_H(x, y) is constant on rays, that is,

d_H(λx, µy) = d_H(x, y) for λ, µ > 0. (2)
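To make the formulas above concrete, the following sketch evaluates the Hilbert metric on the positive orthant and on the cone of positive definite matrices (a minimal illustration assuming NumPy; the function names are ours):

```python
import numpy as np

def hilbert_orthant(x, y):
    """Hilbert metric on the positive orthant:
    d_H(x, y) = log(max_i(x_i/y_i) / min_i(x_i/y_i))."""
    r = x / y
    return np.log(r.max() / r.min())

def hilbert_psd(X, Y):
    """Hilbert metric on the positive definite cone:
    d_H(X, Y) = log(lambda_max(X Y^-1) / lambda_min(X Y^-1)).
    For X, Y positive definite the eigenvalues of X Y^-1 are real and positive."""
    lam = np.linalg.eigvals(X @ np.linalg.inv(Y)).real
    return np.log(lam.max() / lam.min())

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 1.0, 1.5])
print(hilbert_orthant(x, y))        # a positive distance
print(hilbert_orthant(5.0 * x, y))  # unchanged: the metric is projective, cf. (2)

X = np.array([[2.0, 0.5], [0.5, 1.0]])
Y = np.eye(2)
print(hilbert_psd(X, Y), hilbert_psd(3.0 * X, Y))  # equal, by scale invariance
```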
Hilbert metric and positive mappings

In this section, we review contraction properties of positive operators with respect to the Hilbert metric. We recall that a map A : K → K is said to be positive; a map that sends K \ {0} into K+ is said to be strictly positive. If A is a strictly positive linear map, we denote by

k(A) := inf {λ : d_H(Ax, Ay) ≤ λ d_H(x, y) ∀ x, y ∈ K+} (3)

the contraction ratio of A, and by

∆(A) := sup {d_H(Ax, Ay) : x, y ∈ K+} (4)

its projective diameter. Contraction properties of positive operators with respect to the Hilbert metric are established in the following theorem [3], [5], [12].

Theorem 2.1: If x, y ∈ K, then the following holds:
(i) if A is a positive linear map on K, then d_H(Ax, Ay) ≤ d_H(x, y), i.e. the Hilbert metric contracts weakly under the action of a positive linear transformation;
(ii) [Birkhoff, 1957] if A is a strictly positive linear map on B, then

k(A) = tanh( ∆(A)/4 ). (5)

Let U denote the unit sphere in B and let E be the metric space E := (K+ ∩ U, d_H). Then, by combining Theorem 2.1 (ii) with the Banach contraction mapping theorem, the following generalization of the Perron–Frobenius theorem holds: if ∆(A) < ∞ and the metric space E is complete, then there exists a unique positive eigenvector of A in E.
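As a numerical illustration of Theorem 2.1 (ii), the sketch below (our own, not part of the original analysis) computes the projective diameter of an entrywise positive matrix acting on the positive orthant, where the closed form ∆(A) = max_{i,j,k,l} log(a_ik a_jl / (a_il a_jk)) is available, and checks Birkhoff's bound (5) on random pairs of positive vectors:

```python
import numpy as np
from itertools import product

def d_H(x, y):
    """Hilbert metric on the positive orthant."""
    r = x / y
    return np.log(r.max() / r.min())

def projective_diameter(A):
    """Projective diameter of an entrywise positive matrix on the orthant:
    Delta(A) = max over i,j,k,l of log(a_ik * a_jl / (a_il * a_jk))."""
    n, m = A.shape
    return max(np.log(A[i, k] * A[j, l] / (A[i, l] * A[j, k]))
               for i, j, k, l in product(range(n), range(n), range(m), range(m)))

rng = np.random.default_rng(0)
A = rng.uniform(0.5, 2.0, size=(4, 4))      # a strictly positive linear map
kA = np.tanh(projective_diameter(A) / 4.0)  # Birkhoff contraction ratio (5)
for _ in range(1000):
    x, y = rng.uniform(0.1, 10.0, size=(2, 4))
    assert d_H(A @ x, A @ y) <= kA * d_H(x, y) + 1e-9
print("contraction ratio k(A) =", kA)       # strictly less than 1
```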
III. KALMAN FILTER AND THE RICCATI OPERATOR

In this section, we briefly introduce the Kalman filter iteration, which is analyzed later on in Section V, where an alternative derivation is also provided. Let us consider the linear dynamical system

X_{k+1} = A X_k + W_k,  k ≥ 0, (6a)
Y_k = C X_k + V_k, (6b)

where {W_k} and {V_k} are mutually uncorrelated white noise Gaussian processes with variances Γ and Σ, respectively, i.e.

W_k ∼ N(0, Γ),  V_k ∼ N(0, Σ), (7)

and with initial condition

X_0 ∼ N(µ_0, P_0) (8)

such that

E[W_k X_0^T] = 0,  E[V_k X_0^T] = 0. (9)

The Kalman filter recursion consists of the following steps.

Time update ("predict") step:

X̂_{k|k−1} = A X̂_{k−1|k−1} (10)
P_{k|k−1} = A P_{k−1|k−1} A^T + Γ (11)

Measurement update ("correct") step:

X̂_{k|k} = X̂_{k|k−1} + K_k (Y_k − C X̂_{k|k−1}) (12)
P_{k|k} = (I − K_k C) P_{k|k−1} (13)
K_k = P_{k|k−1} C^T (C P_{k|k−1} C^T + Σ)^{−1} (14)

The recursion is initialized at X̂_{0|−1} = µ_0, P_{0|−1} = P_0. Equivalently, the following one-step expressions for the a posteriori state estimate and covariance hold:

P_{k|k} = Φ(P_{k−1|k−1}) (15)
X̂_{k|k} = (A − P_{k|k} C^T Σ^{−1} C A) X̂_{k−1|k−1} + P_{k|k} C^T Σ^{−1} Y_k (16)

where Φ is the nonlinear map

Φ(P) = (A P A^T + Γ) [ I + C^T Σ^{−1} C Γ + C^T Σ^{−1} C A P A^T ]^{−1}. (17)

Φ in (17) can be written as

Φ(P) = ( (A P A^T + Γ)^{−1} + C^T Σ^{−1} C )^{−1}. (18)

This equation is called the discrete Riccati equation. In the literature, convergence of the Kalman iteration has been studied by proving that the discrete Riccati operator contracts suitable metrics (e.g. the Riemannian metric [4], the Thompson part metric [16]) on the set of positive definite matrices. In the following, we propose to study convergence of the Kalman iteration by directly analyzing an equivalent iteration on the space of positive measurable functions. This equivalent iteration is introduced and discussed in the next section.
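For illustration, iterating the Riccati map (18) on a small example system (our own values, assuming NumPy) converges to a fixed point that does not depend on the initial covariance, consistently with the classical theory:

```python
import numpy as np

def riccati_map(P, A, C, Gamma, Sigma):
    """One step of the discrete Riccati map (18):
    Phi(P) = ((A P A^T + Gamma)^-1 + C^T Sigma^-1 C)^-1."""
    Q = A @ P @ A.T + Gamma
    return np.linalg.inv(np.linalg.inv(Q) + C.T @ np.linalg.inv(Sigma) @ C)

# Illustrative detectable/stabilizable system
A = np.array([[1.0, 0.1], [0.0, 0.9]])
C = np.array([[1.0, 0.0]])
Gamma = 0.1 * np.eye(2)
Sigma = np.array([[0.5]])

for P0 in (np.eye(2), 100.0 * np.eye(2)):   # two very different initializations
    P = P0
    for _ in range(500):
        P = riccati_map(P, A, C, Gamma, Sigma)
    print(np.round(P, 6))                   # same fixed point for both
```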
IV. NON-EXPANSIVENESS OF THE FILTERING RECURSION IN PROJECTIVE SPACES

In this section, we introduce the filtering algorithm for general hidden Markov models and we show that the map underlying the main iteration does not expand the Hilbert metric on the cone of positive measurable functions. Note that some authors use the term hidden Markov model exclusively for the case where X_k takes values in a finite state space. In this paper, following e.g. [6], when referring to a hidden Markov model we also intend to include models with continuous state space; such models are also referred to as state-space models in the literature.

Problem statement
In the broadest sense of the word, a hidden Markov model is a Markov process that is split into two components: an observable component and an unobservable or "hidden" component. That is, a hidden Markov model is a Markov process {X_k, Y_k}_{k≥0} on the state space X × Y, where we presume that we have a way of observing Y_k, but not X_k. In simple cases, such as discrete-time, countable state space models, it is common to define hidden Markov models by using the concept of conditional independence. It turns out that conditional independence is mathematically more difficult to define in general settings (in particular, when the state space X of the Markov process is not countable, the case we are interested in), so a different route is adopted (see [6] for details). To this aim, we define the transition kernel (the parallel of the transition matrix for countable state spaces).

Definition 4.1 (Transition kernel): A kernel from a measurable space (X, 𝒳) to a measurable space (Y, 𝒴) is a map Q : X × 𝒴 → [0, ∞] such that
(i) for all x ∈ X, A ↦ Q(x, A) is a measure on 𝒴;
(ii) for all A ∈ 𝒴, the map x ↦ Q(x, A) is measurable.
If Q(x, Y) = 1 for every x ∈ X, then Q is called a transition kernel.

We next consider an X-valued stochastic process {X_k}_{k≥0}, i.e., a collection of X-valued random variables on a common underlying probability space (Ω, 𝒢, P), where X is some measure space. The process {X_k}_{k≥0} is Markov if, for every time k ≥ 0, there exists a transition kernel Q_k : X × 𝒳 → [0, 1] such that

P(X_{k+1} ∈ A | X_0, ..., X_k) = Q_k(X_k, A),

for every A ∈ 𝒳, k ≥ 0. If Q_k = Q for every k, then the Markov process is called homogeneous. For simplicity of exposition, from now on we will consider homogeneous Markov processes, though the theory we are about to develop does not rely on this assumption. A hidden Markov model {X_k, Y_k}_{k≥0} is an (only partially observed) Markov process whose transition kernel has a special structure, namely one such that both the joint process {X_k, Y_k}_{k≥0} and the marginal unobservable process {X_k}_{k≥0} are Markov. Formally:

Definition 4.2 (Hidden Markov Model):
Let (X, 𝒳) and (Y, 𝒴) be two measurable spaces and let Q and G denote a transition kernel on (X, 𝒳) and a transition kernel from (X, 𝒳) to (Y, 𝒴), respectively. Consider the transition kernel on the product space (X × Y, 𝒳 ⊗ 𝒴) defined by

T[(x, y), C] = ∫∫_C Q(x, dx′) G(x′, dy′),  for (x, y) ∈ X × Y, C ∈ 𝒳 ⊗ 𝒴.

The Markov process {X_k, Y_k}_{k≥0} with transition kernel T and initial probability measure µ on (X, 𝒳) is called a hidden Markov model.

A hidden Markov model is completely determined by the initial measure µ and its transition kernel T (equivalently, by Q and G). Formally:

Proposition 4.1:
Let {X_k, Y_k}_{k≥0} be a hidden Markov model on (X × Y, 𝒳 ⊗ 𝒴) with transition kernel Q, observation kernel G, and initial measure µ. Then, for every bounded measurable function f : (X × Y)^{k+1} → R,

E[f(X_0, Y_0, ..., X_k, Y_k)] = ∫ f(x_0, y_0, ..., x_k, y_k) G(x_k, dy_k) Q(x_{k−1}, dx_k) ⋯ G(x_1, dy_1) Q(x_0, dx_1) G(x_0, dy_0) µ(dx_0). (19)

In the following, we are interested in the filtering problem for HMMs, namely the problem of computing the sequence of conditional distributions of X_k given Y^k := (Y_0, ..., Y_k). The filtering problem, as well as the related smoothing and prediction problems, has its origin in the work of Wiener, who was interested in stationary processes. In the more general setting of hidden Markov models, early contributions are the works of Stratonovich, Shiryaev, and Baum, Petrie and coworkers [18], [17], [2]; see also [6] for a recent monograph.

Filtering algorithm
Assume that both G and Q are absolutely continuous with respect to the Lebesgue measure (in the next section we will particularize to the case of Gaussian distributions), with transition density functions g and q, respectively. In terms of transition densities, the filtering problem can be solved as follows.

Theorem 4.1 (Forward filtering recursion):
Denote by α̂_s(x_k) the probability density function α̂_s(x_k) := p(x_k | y^s), where y^s := (y_0, ..., y_s), and let g_k(x_k) := g(x_k, y_k). Then α̂_k(x_k) = p(x_k | y^k) can be recursively expressed in terms of α̂_{k−1}(x_{k−1}) = p(x_{k−1} | y^{k−1}) as follows:

α̂_k(x_k) = g_k(x_k) ∫ q(x_{k−1}, x_k) α̂_{k−1}(x_{k−1}) dx_{k−1} / ∫∫ g_k(x_k) q(x_{k−1}, x_k) α̂_{k−1}(x_{k−1}) dx_k dx_{k−1}, (20)

with the iteration initialized at

α̂_0(x_0) = g_0(x_0) µ(x_0) / ∫ g_0(x_0) µ(x_0) dx_0. (21)

The iteration (20) defines a time-varying dynamical system over the cone F of nonnegative measurable functions with respect to the product σ-algebra 𝒳 ⊗ 𝒴^{⊗(k+1)}. The following equivalent two-step formulation holds.

Remark 4.1 (Two-step formulation of the filtering recursion):
The filtering recursion (20) is often split into two steps (a numerical sketch is given after this list):
1) prediction step, in which the one-step-ahead predictive density is computed:

α̂_{k|k−1}(x_k) = ∫ q(x_{k−1}, x_k) α̂_{k−1}(x_{k−1}) dx_{k−1}; (22)

2) update step, in which the observed data from time k are absorbed, yielding the filtering density:

α̂_k(x_k) = g_k(x_k) α̂_{k|k−1}(x_k) / ∫∫ g_k(x_k) q(x_{k−1}, x_k) α̂_{k−1}(x_{k−1}) dx_k dx_{k−1}. (23)
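The recursion (22)–(23) can be sketched numerically by discretizing the state space on a grid; the scalar Gaussian model below is our own toy example, not part of the paper:

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 401)   # grid on a truncated scalar state space
dx = x[1] - x[0]

def gauss(z, mean, var):
    return np.exp(-(z - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

a, gamma, c, sigma = 0.9, 0.2, 1.0, 0.5           # toy model parameters
Q = gauss(x[None, :], a * x[:, None], gamma)      # Q[i, j] ~ q(x_i, x_j)
alpha = gauss(x, 0.0, 1.0)                        # initial filtering density

for y in [0.3, -0.1, 0.7]:                        # some observed data
    pred = (Q.T @ alpha) * dx                     # prediction step (22)
    unnorm = gauss(y, c * x, sigma) * pred        # multiply by g_k(x)
    alpha = unnorm / (unnorm.sum() * dx)          # normalization, as in (23)
print("filtered mean:", (x * alpha).sum() * dx)
```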
Non-expansiveness in projective space

First of all, notice that the nonlinear map in (20), say Ψ̄_k, is the composition of a linear map (the numerator) and a positive scaling, i.e. we can write

(Ψ̄_k f)(x) = (Ψ_k f)(x) / ∫ (Ψ_k f)(x) dx,

where

(Ψ_k f)(x) = g_k(x) ∫ q(x′, x) f(x′) dx′, (24)

with q and g the transition densities associated with the transition and observation kernels Q and G, respectively. The next theorem draws the consequences of the fact that the map Ψ_k takes nonnegative measurable functions into nonnegative measurable functions.

Theorem 4.2:
The map Ψ_k in (24) does not expand the Hilbert metric, i.e.

d_H(Ψ_k f, Ψ_k g) ≤ d_H(f, g).

Proof:
The map Ψ_k is the composition of (i) (Ψ^{(1)} f)(x) = ∫ q(x′, x) f(x′) dx′ and (ii) (Ψ^{(2)} f)(x) = g_k(x) f(x). The maps Ψ^{(1)} and Ψ^{(2)} are positive and linear, and as such they do not expand the Hilbert metric (see Theorem 2.1 (i)). The thesis follows since the composition of nonexpansive operators is nonexpansive.
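Theorem 4.2 can be checked numerically on a grid discretization: one application of Ψ_k never increases the Hilbert distance between two positive functions. The sketch below reuses the toy Gaussian model introduced earlier; all names and values are illustrative:

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 401)
dx = x[1] - x[0]

def gauss(z, mean, var):
    return np.exp(-(z - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def d_H(f, g):
    """Hilbert metric between positive functions sampled on the grid."""
    r = f / g
    return np.log(r.max() / r.min())

a, gamma, c, sigma, y = 0.9, 0.2, 1.0, 0.5, 0.3
Q = gauss(x[None, :], a * x[:, None], gamma)
g_k = gauss(y, c * x, sigma)

def Psi(f):
    """(Psi_k f)(x) = g_k(x) * integral of q(x', x) f(x') dx', cf. (24)."""
    return g_k * ((Q.T @ f) * dx)

rng = np.random.default_rng(1)
for _ in range(100):
    f, g = rng.uniform(0.1, 1.0, size=(2, x.size))  # arbitrary positive functions
    assert d_H(Psi(f), Psi(g)) <= d_H(f, g) + 1e-9  # non-expansiveness
print("non-expansiveness verified on random positive functions")
```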
V. KALMAN FILTERING AS FORWARD FILTERING RECURSION

The classical derivation of the Kalman filter relies on an argument based on projections onto spaces spanned by random variables. As an alternative, the Kalman iteration can be seen as a specialization of the filtering algorithm in Theorem 4.1 for Gaussian distributions. This fact by itself is known in the literature (see e.g. [6]). In this section, we first briefly review this alternative derivation of the Kalman filter. This, combined with the (weak) contraction result of Theorem 4.2, lets us conclude that the Kalman iteration does not expand the Hilbert metric. Convergence of the Kalman iteration is discussed in Section VI.

Before getting started, we observe that the linear dynamical system (6)–(9) is indeed equivalent to a hidden Markov model as specified by (19), with initial, transition and emission probability densities, for k ≥ 0, given by

p(x_0) = N(µ_0, P_0), (25)
p(x_{k+1} | x_k) = N(A x_k, Γ), (26)
p(y_k | x_k) = N(C x_k, Σ). (27)

We also recall that, given the prior and likelihood

p(x) = N(µ_X, Σ_X), (28)
p(y | x) = N(A x + b, Σ_{Y|X}), (29)

the posterior p(x | y) and the normalization constant p(y) are given by

p(y) = N(A µ_X + b, Σ_{Y|X} + A Σ_X A^T), (30)
p(x | y) = N(µ_{X|Y}, Σ_{X|Y}), (31)

with

Σ_{X|Y} = (Σ_X^{−1} + A^T Σ_{Y|X}^{−1} A)^{−1}, (32)
µ_{X|Y} = Σ_{X|Y} [ A^T Σ_{Y|X}^{−1} (y − b) + Σ_X^{−1} µ_X ]. (33)

The next proposition connects the Kalman filter algorithm to the filtering recursion described in Section IV.

Proposition 5.1:
The Kalman filter recursion (10)–(14) is a specialization of the forward filtering recursion of Theorem 4.1 for an HMM with Gaussian initial, transition and emission probabilities as in (25)–(27).
Proof:
Let µ_{k|s} := E[X_k | Y^s] and P_{k|s} := E[(X_k − µ_{k|s})(X_k − µ_{k|s})^T | Y^s].
1) prediction step: By (22), p(x_k | y^{k−1}) is given by

p(x_k | y^{k−1}) = ∫ p(x_k | x_{k−1}) p(x_{k−1} | y^{k−1}) dx_{k−1}.

Now, p(x_k | x_{k−1}) is Gaussian with mean A x_{k−1} and covariance Γ, and p(x_{k−1} | y^{k−1}) is also Gaussian; we denote by µ_{k−1|k−1} and P_{k−1|k−1} its mean and covariance, respectively. By virtue of (30) we get p(x_k | y^{k−1}) ∼ N(A µ_{k−1|k−1}, A P_{k−1|k−1} A^T + Γ), i.e.

µ_{k|k−1} = A µ_{k−1|k−1},
P_{k|k−1} = A P_{k−1|k−1} A^T + Γ,

which are the a priori state estimate and covariance in (10)–(11).
2) update step: By (23), p(x_k | y^k) is given by

p(x_k | y^k) = p(y_k | x_k) p(x_k | y^{k−1}) / p(y_k | y^{k−1}).

Now p(y_k | x_k) is Gaussian with mean C x_k and covariance Σ, and p(x_k | y^{k−1}) is also Gaussian; we denote by µ_{k|k−1} and P_{k|k−1} its mean and covariance. By virtue of (31) we get p(x_k | y^k) ∼ N(µ_{k|k}, P_{k|k}) with

P_{k|k} = (P_{k|k−1}^{−1} + C^T Σ^{−1} C)^{−1}, (34)
µ_{k|k} = P_{k|k} [ C^T Σ^{−1} y_k + P_{k|k−1}^{−1} µ_{k|k−1} ], (35)

from which the expressions (12)–(13) for the a posteriori state estimate and covariance can be recovered via the matrix inversion lemma.

By the results in Theorem 4.2 and Proposition 5.1, we have that the map underlying the Kalman filtering algorithm does not expand the Hilbert metric on the space of positive measurable functions.
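The equivalence stated in Proposition 5.1 can also be verified numerically: the information-form update (34)–(35) and the gain form (12)–(14) produce the same posterior, as the matrix inversion lemma guarantees. A minimal sketch with our own illustrative values:

```python
import numpy as np

A = np.array([[1.0, 0.1], [0.0, 0.9]])
C = np.array([[1.0, 0.0]])
Gamma, Sigma = 0.1 * np.eye(2), np.array([[0.5]])
mu, P = np.zeros(2), np.eye(2)
y = np.array([0.4])

# prediction step (10)-(11), i.e. the marginalization (30)
mu_pred = A @ mu
P_pred = A @ P @ A.T + Gamma

# update step, information form (34)-(35)
P_info = np.linalg.inv(np.linalg.inv(P_pred) + C.T @ np.linalg.inv(Sigma) @ C)
mu_info = P_info @ (C.T @ np.linalg.inv(Sigma) @ y + np.linalg.inv(P_pred) @ mu_pred)

# update step, gain form (12)-(14)
K = P_pred @ C.T @ np.linalg.inv(C @ P_pred @ C.T + Sigma)
mu_gain = mu_pred + K @ (y - C @ mu_pred)
P_gain = (np.eye(2) - K @ C) @ P_pred

assert np.allclose(P_info, P_gain) and np.allclose(mu_info, mu_gain)
print("information form and gain form agree")
```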
VI. ON STRICT CONTRACTIVENESS OF THE KALMAN ITERATION

So far, we have shown that the time-varying nonlinear operator Ψ̄_k that underlies the Kalman iteration does not expand the Hilbert metric. Proving convergence of the Kalman iteration indeed amounts to proving that this iteration strictly contracts the Hilbert metric. As observed in Section IV, the map (20) is the composition of a positive linear map and a positive scaling. By the scaling invariance property of the Hilbert metric, it follows that the convergence analysis can concentrate on the linear numerator of Ψ̄_k alone. By Theorem 2.1 (ii), a sufficient condition for a strictly positive linear operator to be a contraction is that it has finite projective diameter. At this point, one may observe that even the Hilbert distance between two Gaussians with the same variance and different means may tend to infinity (a general discussion that takes into account the problems arising from the use of the Hilbert metric with non-compact state spaces and heavy-tailed distributions is contained in [1]). Proving strict contraction usually requires exploiting the fact that the map Ψ̄_k is time-varying, and showing that the map contracts over a uniform time horizon, as opposed to at each time instant. For iterations on the finite dimensional space of covariance matrices, this is the place where the observability and controllability conditions enter the analysis. Our hope is that similar conditions apply to more general situations than the one covered by the Kalman filter, and that this general approach will find novel applications in the analysis of filtering algorithms on general graphical models.
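The obstacle mentioned above is easy to reproduce numerically: for two Gaussian densities with unit variance and means 0 and m, log(f/g) = (m² − 2mx)/2, so on a truncated domain [−L, L] the Hilbert distance equals 2mL and diverges as the truncation is removed. A quick check (ours):

```python
import numpy as np

def gauss(z, mean, var):
    return np.exp(-(z - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def d_H(f, g):
    r = f / g
    return np.log(r.max() / r.min())

m = 1.0                       # mean offset between the two Gaussians
for L in [2.0, 5.0, 10.0, 20.0]:
    x = np.linspace(-L, L, 2001)
    # distance grows like 2*m*L: finite on every truncation, unbounded overall
    print(L, d_H(gauss(x, 0.0, 1.0), gauss(x, m, 1.0)))
```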
VII. CONCLUSION

As an attempt to generalize the contraction-based convergence analysis of the Kalman filter, we have interpreted the contraction result of Bougerol in the space of positive definite (covariance) matrices as a specialization of the non-expansiveness of the general filtering recursion for hidden Markov models in the space of positive measurable functions. In spite of the obstacles to showing a finite projective diameter in this infinite dimensional space, we feel that this approach is worth revisiting in the convergence analysis of filtering algorithms on general graphical models (arbitrary topology and/or different spaces of distributions). This is the topic of ongoing research.

REFERENCES
[1] R. Atar and O. Zeitouni. Exponential stability for nonlinear filtering. Annales de l'IHP Probabilités et Statistiques, 33(6):697–725, 1997.
[2] L.E. Baum, T. Petrie, G. Soules, and N. Weiss. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. The Annals of Mathematical Statistics, pages 164–171, 1970.
[3] G. Birkhoff. Extensions of Jentzsch's theorem. Transactions of the American Mathematical Society, pages 219–227, 1957.
[4] P. Bougerol. Kalman filtering with random coefficients and contractions. SIAM Journal on Control and Optimization, 31(4):942–959, 1993.
[5] P.J. Bushell. Hilbert's metric and positive contraction mappings in a Banach space. Archive for Rational Mechanics and Analysis, 52(4):330–338, 1973.
[6] O. Cappé, E. Moulines, and T. Rydén. Inference in Hidden Markov Models. Springer-Verlag, New York, 2005.
[7] S. Gaubert and Z. Qu. The contraction rate in Thompson's part metric of order-preserving flows on a cone: application to generalized Riccati equations. Journal of Differential Equations, 256(8):2902–2948, 2014.
[8] D. Hilbert. Über die gerade Linie als kürzeste Verbindung zweier Punkte. Mathematische Annalen, 46(1):91–96, 1895.
[9] A.H. Jazwinski. Stochastic Processes and Filtering Theory. Academic Press, 1970.
[10] R.E. Kalman. New methods in Wiener filtering theory. In Proceedings of the First Symposium on Engineering Applications of Random Function Theory and Probability. John Wiley & Sons, New York, 1963.
[11] R.E. Kalman and R.S. Bucy. New results in linear filtering and prediction theory. Journal of Basic Engineering, 83(1):95–108, 1961.
[12] E. Kohlberg and J.W. Pratt. The contraction mapping approach to the Perron–Frobenius theory: Why Hilbert's metric? Mathematics of Operations Research, 7(2):198–210, 1982.
[13] J. Lawson and Y. Lim. A Birkhoff contraction formula with applications to Riccati equations. SIAM Journal on Control and Optimization, 46(3):930–951, 2007.
[14] F. Le Gland and L. Mevel. Exponential forgetting and geometric ergodicity in hidden Markov models. Mathematics of Control, Signals and Systems, 13(1):63–93, 2000.
[15] F. Le Gland and N. Oudjane. Stability and uniform approximation of nonlinear filters using the Hilbert metric and application to particle filters. The Annals of Applied Probability, 14(1):144–187, 2004.
[16] C. Liverani and M.P. Wojtkowski. Generalization of the Hilbert metric to the space of positive definite matrices. Pacific Journal of Mathematics, 166(2):339–355, 1994.
[17] A.N. Shiryaev. On stochastic equations in the theory of conditional Markov processes. Theory of Probability and Its Applications, 11(1):179–184, 1966.
[18] R.L. Stratonovich. Conditional Markov processes. Theory of Probability and Its Applications, 5(2):156–178, 1960.
[19] M.P. Wojtkowski. Geometry of Kalman filters.