Variational Bayesian Adaptation of Noise Covariances in Non-Linear Kalman Filtering
Simo Särkkä and Jouni Hartikainen
Aalto University, Finland
Abstract
This paper is concerned with the joint estimation of the state and the time-varying noise covariance matrices in non-linear stochastic state space models. We present a variational Bayes and Gaussian filtering based algorithm for efficient computation of the approximate filtering posterior distributions. The Gaussian filtering based formulation of the non-linear state space model computation allows the use of efficient Gaussian integration methods, such as the unscented transform, cubature integration and Gauss-Hermite integration, along with the classical Taylor series approximations. The performance of the algorithm is illustrated in a simulated application.

Keywords: non-linear Kalman filtering, variational Bayes, noise adaptation
1. Introduction
In this paper, we consider Bayesian inference on the state $x_k$ and noise covariances $\Sigma_k$ in heteroscedastic non-linear stochastic state space models of the form

$$x_k \sim \mathrm{N}(f(x_{k-1}), Q_k)$$
$$y_k \sim \mathrm{N}(h(x_k), \Sigma_k)$$
$$\Sigma_k \sim p(\Sigma_k \mid \Sigma_{k-1}), \quad (1)$$

where $x_k \in \mathbb{R}^n$ is the state at time step $k$, $y_k \in \mathbb{R}^d$ is the measurement, $Q_k$ is the known process noise covariance and $\Sigma_k$ is the measurement noise covariance. The non-linear functions $f(\cdot)$ and $h(\cdot)$ form the dynamic and measurement models, respectively, and the last equation defines the Markovian dynamic model for the unknown noise covariances $\Sigma_k$. The purpose is to estimate the joint posterior (filtering) distribution of the states and noise covariances:

$$p(x_k, \Sigma_k \mid y_{1:k}), \quad (2)$$

where we have introduced the notation $y_{1:k} = (y_1, \ldots, y_k)$.

If the parameters $Q_k$ and $\Sigma_k$ in the model (1) were known, the state estimation problem would reduce to the classical non-linear (Gaussian) filtering problem [1, 2]. However, here we consider the case where the noise covariances $\Sigma_k$ are unknown. The formal Bayesian filtering solution for general probabilistic state space models, including the one considered here, is well known (see, e.g., [1]) and consists of the Chapman-Kolmogorov equation on the prediction step and Bayes' rule on the update step. However, the formal solution is computationally intractable and we can only approximate it.

In a recent article, Särkkä and Nummenmaa [3] introduced the variational Bayesian adaptive Kalman filter (VB-AKF), which can be used for estimating the measurement noise variances along with the state in linear state space models. In this paper, we extend the method to allow estimation of the full noise covariance matrix and non-linear state space models. The idea is similar to what was recently used by Piché et al.
[4] in the context of outlier-robust filtering, which in turn is based on the linear results of [5].

We use the Bayesian approach with the free-form variational Bayesian (VB) approximation (see, e.g., [6, 7, 8]) for the joint filtering posteriors of states and covariances, and the Gaussian filtering approach [9, 10] for handling non-linear models. The Gaussian filtering approach also allows us to utilize more general methods such as the unscented Kalman filter (UKF) [11], the Gauss-Hermite Kalman filter (GHKF) [9], the cubature Kalman filter (CKF) [12], and various others [13, 14, 15], along with the classical methods [1, 2].

The variational Bayesian approach has also been applied to parameter identification in state space models in [16, 17, 18]; other Bayesian approaches are, for example, Monte Carlo methods [19, 20, 21] and multiple model methods [22]. It is also possible to do adaptive filtering by simply augmenting the noise parameters as state components [2] and using, for example, the above-mentioned Gaussian filters for estimation of the state and parameters.

If the covariances in the model (1) were known, the filtering problem would reduce to the classical non-linear (Gaussian) optimal filtering problem [1, 2]. This non-linear filtering problem can be solved in various ways, but one quite general approach is the Gaussian filtering approach [2, 9, 10], where the idea is to assume that the filtering distribution is approximately Gaussian. That is, we assume that there exist means $m_k$ and covariances $P_k$ such that

$$p(x_k \mid y_{1:k}) \approx \mathrm{N}(x_k \mid m_k, P_k). \quad (3)$$

Preprint submitted to Elsevier, February 5, 2013. arXiv [stat.ME].

The Gaussian filter prediction and update steps can be written as follows [9]:

Prediction:
$$m_k^- = \int f(x_{k-1}) \, \mathrm{N}(x_{k-1} \mid m_{k-1}, P_{k-1}) \, dx_{k-1}$$
$$P_k^- = \int (f(x_{k-1}) - m_k^-)(f(x_{k-1}) - m_k^-)^T \, \mathrm{N}(x_{k-1} \mid m_{k-1}, P_{k-1}) \, dx_{k-1} + Q_k. \quad (4)$$

Update:
$$\mu_k = \int h(x_k) \, \mathrm{N}(x_k \mid m_k^-, P_k^-) \, dx_k$$
$$S_k = \int (h(x_k) - \mu_k)(h(x_k) - \mu_k)^T \, \mathrm{N}(x_k \mid m_k^-, P_k^-) \, dx_k + \Sigma_k$$
$$C_k = \int (x_k - m_k^-)(h(x_k) - \mu_k)^T \, \mathrm{N}(x_k \mid m_k^-, P_k^-) \, dx_k$$
$$K_k = C_k S_k^{-1}$$
$$m_k = m_k^- + K_k (y_k - \mu_k)$$
$$P_k = P_k^- - K_k S_k K_k^T. \quad (5)$$

With different selections for the Gaussian integral approximations, we get different filtering algorithms [10].

In this paper, we approximate the joint filtering distribution of the state and covariance matrix with the free-form variational Bayesian (VB) approximation (see, e.g., [6, 7, 8, 16]):

$$p(x_k, \Sigma_k \mid y_{1:k}) \approx Q_x(x_k) \, Q_\Sigma(\Sigma_k), \quad (6)$$

where $Q_x(x_k)$ and $Q_\Sigma(\Sigma_k)$ are the as yet unknown approximating densities. The VB approximation can be formed by minimizing the Kullback-Leibler (KL) divergence between the true distribution and the approximation:

$$\mathrm{KL}[Q_x(x_k) \, Q_\Sigma(\Sigma_k) \,\|\, p(x_k, \Sigma_k \mid y_{1:k})] = \int Q_x(x_k) \, Q_\Sigma(\Sigma_k) \log\!\left(\frac{Q_x(x_k) \, Q_\Sigma(\Sigma_k)}{p(x_k, \Sigma_k \mid y_{1:k})}\right) dx_k \, d\Sigma_k.$$

Minimizing the KL divergence with respect to the probability densities, we get the following equations:

$$Q_x(x_k) \propto \exp\!\left(\int \log p(y_k, x_k, \Sigma_k \mid y_{1:k-1}) \, Q_\Sigma(\Sigma_k) \, d\Sigma_k\right)$$
$$Q_\Sigma(\Sigma_k) \propto \exp\!\left(\int \log p(y_k, x_k, \Sigma_k \mid y_{1:k-1}) \, Q_x(x_k) \, dx_k\right). \quad (7)$$

These equations can be interpreted and used as a fixed-point iteration for the sufficient statistics of the approximating densities. In the original VB-AKF [3], the VB approximation was derived for linear state space models with a diagonal noise covariance matrix. In this paper, we generalize it to non-linear systems with a non-diagonal noise covariance matrix.
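To make the moment integrals in the update step (5) concrete, the following sketch approximates them with third-degree spherical cubature points (the rule behind the CKF [12]). This is an illustrative implementation, not the authors' code; the measurement function `h`, dimensions, and `Sigma` are placeholders.

```python
import numpy as np

def cubature_points(m, P):
    """Third-degree spherical cubature points for N(m, P): 2n equally weighted points."""
    n = m.shape[0]
    L = np.linalg.cholesky(P)
    unit = np.sqrt(n) * np.concatenate([np.eye(n), -np.eye(n)])
    return m + unit @ L.T  # shape (2n, n)

def gaussian_update(m_pred, P_pred, y, h, Sigma):
    """Gaussian filter update step (5) with cubature integration."""
    X = cubature_points(m_pred, P_pred)
    Y = np.array([h(x) for x in X])      # propagated points
    mu = Y.mean(axis=0)                  # predicted measurement mean
    dY, dX = Y - mu, X - m_pred
    S = dY.T @ dY / X.shape[0] + Sigma   # innovation covariance S_k
    C = dX.T @ dY / X.shape[0]           # cross covariance C_k
    K = C @ np.linalg.inv(S)             # gain K_k
    m = m_pred + K @ (y - mu)
    P = P_pred - K @ S @ K.T
    return m, P, mu, S
```

For a linear $h$ the cubature rule is exact, so the update reproduces the ordinary Kalman filter update; swapping `cubature_points` for unscented or Gauss-Hermite points gives the other variants mentioned in the text.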
2. Main Results
We start by considering the linear state space model with unknown measurement noise covariance:

$$p(x_k \mid x_{k-1}) = \mathrm{N}(x_k \mid A_k x_{k-1}, Q_k)$$
$$p(y_k \mid x_k, \Sigma_k) = \mathrm{N}(y_k \mid H_k x_k, \Sigma_k), \quad (8)$$

where $A_k$ and $H_k$ are known matrices. We assume that the dynamic model for the covariance is independent of the state and of the Markovian form $p(\Sigma_k \mid \Sigma_{k-1})$; we set some restrictions on it shortly. In this section we follow the derivation in [3], and extend the scalar variance case to the full covariance case.

Assume that the filtering distribution of the time step $k-1$ is

$$p(x_{k-1}, \Sigma_{k-1} \mid y_{1:k-1}) = \mathrm{N}(x_{k-1} \mid m_{k-1}, P_{k-1}) \, \mathrm{IW}(\Sigma_{k-1} \mid \nu_{k-1}, V_{k-1}),$$

where the densities, up to non-essential normalization terms, can be written as [23]

$$\mathrm{N}(x \mid m, P) \propto |P|^{-1/2} \exp\!\left(-\tfrac{1}{2} (x - m)^T P^{-1} (x - m)\right)$$
$$\mathrm{IW}(\Sigma \mid \nu, V) \propto |\Sigma|^{-(\nu + n + 1)/2} \exp\!\left(-\tfrac{1}{2} \mathrm{tr}\!\left(V \Sigma^{-1}\right)\right).$$

That is, in the VB approximation (6), $Q_x(x_k)$ is a Gaussian distribution and $Q_\Sigma(\Sigma_k)$ is an inverse Wishart distribution.

We now assume that the dynamic model for the covariance is of such form that it maps an inverse Wishart distribution at the previous step into an inverse Wishart distribution at the current step. That is, the Chapman-Kolmogorov equation [1] gives the prediction

$$p(\Sigma_k \mid y_{1:k-1}) = \int p(\Sigma_k \mid \Sigma_{k-1}) \, \mathrm{IW}(\Sigma_{k-1} \mid \nu_{k-1}, V_{k-1}) \, d\Sigma_{k-1} = \mathrm{IW}(\Sigma_k \mid \nu_k^-, V_k^-),$$

for some parameters $\nu_k^-$ and $V_k^-$. We postpone the discussion on how to actually calculate these parameters to Section 2.2. For the state part we obtain the prediction

$$p(x_k \mid y_{1:k-1}) = \int \mathrm{N}(x_k \mid A_k x_{k-1}, Q_k) \, \mathrm{N}(x_{k-1} \mid m_{k-1}, P_{k-1}) \, dx_{k-1} = \mathrm{N}(x_k \mid m_k^-, P_k^-),$$

where $m_k^-$ and $P_k^-$ are given by the standard Kalman filter prediction equations:

$$m_k^- = A_k m_{k-1}$$
$$P_k^- = A_k P_{k-1} A_k^T + Q_k. \quad (9)$$

Because the distribution at the previous step is separable and the dynamic models are independent, we get the following joint predicted distribution:

$$p(x_k, \Sigma_k \mid y_{1:k-1}) = \mathrm{N}(x_k \mid m_k^-, P_k^-) \, \mathrm{IW}(\Sigma_k \mid \nu_k^-, V_k^-).$$

We are now ready to form the actual VB approximation to the posterior. The integrals in the exponentials of (7) can now be expanded as follows (cf. [3]):

$$\int \log p(y_k, x_k, \Sigma_k \mid y_{1:k-1}) \, Q_\Sigma(\Sigma_k) \, d\Sigma_k = -\tfrac{1}{2} (y_k - H_k x_k)^T \langle \Sigma_k^{-1} \rangle_\Sigma (y_k - H_k x_k) - \tfrac{1}{2} (x_k - m_k^-)^T (P_k^-)^{-1} (x_k - m_k^-) + C_1$$
$$\int \log p(y_k, x_k, \Sigma_k \mid y_{1:k-1}) \, Q_x(x_k) \, dx_k = -\tfrac{1}{2} (\nu_k^- + n + 2) \log |\Sigma_k| - \tfrac{1}{2} \mathrm{tr}\{V_k^- \Sigma_k^{-1}\} - \tfrac{1}{2} \langle (y_k - H_k x_k)^T \Sigma_k^{-1} (y_k - H_k x_k) \rangle_x + C_2, \quad (10)$$

where $\langle \cdot \rangle_\Sigma = \int (\cdot) \, Q_\Sigma(\Sigma_k) \, d\Sigma_k$, $\langle \cdot \rangle_x = \int (\cdot) \, Q_x(x_k) \, dx_k$, and $C_1$, $C_2$ are constants. If $Q_\Sigma(\Sigma_k) = \mathrm{IW}(\Sigma_k \mid \nu_k, V_k)$, then the expectation in the first equation of (10) is

$$\langle \Sigma_k^{-1} \rangle_\Sigma = (\nu_k - n - 1) \, V_k^{-1}. \quad (11)$$

Furthermore, if $Q_x(x_k) = \mathrm{N}(x_k \mid m_k, P_k)$, then the expectation in the second equation of (10) becomes

$$\langle (y_k - H_k x_k)^T \Sigma_k^{-1} (y_k - H_k x_k) \rangle_x = \mathrm{tr}\{H_k P_k H_k^T \Sigma_k^{-1}\} + \mathrm{tr}\{(y_k - H_k m_k)(y_k - H_k m_k)^T \Sigma_k^{-1}\}. \quad (12)$$

Substituting the expectations (11) and (12) into (10) and matching terms on the left and right hand sides of (7) results in the following coupled set of equations:

$$S_k = H_k P_k^- H_k^T + (\nu_k - n - 1)^{-1} V_k$$
$$K_k = P_k^- H_k^T S_k^{-1}$$
$$m_k = m_k^- + K_k (y_k - H_k m_k^-)$$
$$P_k = P_k^- - K_k S_k K_k^T$$
$$\nu_k = \nu_k^- + 1$$
$$V_k = V_k^- + H_k P_k H_k^T + (y_k - H_k m_k)(y_k - H_k m_k)^T. \quad (13)$$

The first four equations have been written in such a suggestive form that they are easily recognized as the Kalman filter update step equations with measurement noise covariance $(\nu_k - n - 1)^{-1} V_k$.

In an analogous manner to [3], the dynamic model $p(\Sigma_k \mid \Sigma_{k-1})$ needs to be chosen such that, when it is applied to an inverse Wishart distribution, it produces another inverse Wishart distribution. Although the explicit construction of such a density is hard, all we need to do is to postulate a transformation rule for the sufficient statistics of the inverse Wishart distributions at the prediction step. Using similar heuristics as in [3], we arrive at the following dynamic model:

$$\nu_k^- = \rho \, (\nu_{k-1} - n - 1) + n + 1$$
$$V_k^- = B \, V_{k-1} \, B^T, \quad (14)$$

where $\rho$ is a real number with $0 < \rho \le 1$ and $B$ is a matrix with $0 < |B| \le 1$. A reasonable choice for the matrix is $B = \sqrt{\rho} \, I$, in which case the parameter $\rho$ controls the assumed dynamics: the value $\rho = 1$ corresponds to stationary covariances, while smaller values allow faster time fluctuations.

The resulting algorithm is the following:

• Predict: Compute the parameters of the predicted distribution as follows:
$$m_k^- = A_k m_{k-1}$$
$$P_k^- = A_k P_{k-1} A_k^T + Q_k$$
$$\nu_k^- = \rho \, (\nu_{k-1} - n - 1) + n + 1$$
$$V_k^- = B \, V_{k-1} \, B^T.$$

• Update: First set $m_k^{(0)} = m_k^-$, $P_k^{(0)} = P_k^-$, $\nu_k = 1 + \nu_k^-$, and $V_k^{(0)} = V_k^-$, and then iterate the following, say $N$, steps for $i = 0, \ldots, N-1$:
$$S_k^{(i+1)} = H_k P_k^- H_k^T + (\nu_k - n - 1)^{-1} V_k^{(i)}$$
$$K_k^{(i+1)} = P_k^- H_k^T \, [S_k^{(i+1)}]^{-1}$$
$$m_k^{(i+1)} = m_k^- + K_k^{(i+1)} (y_k - H_k m_k^-)$$
$$P_k^{(i+1)} = P_k^- - K_k^{(i+1)} S_k^{(i+1)} [K_k^{(i+1)}]^T$$
$$V_k^{(i+1)} = V_k^- + H_k P_k^{(i)} H_k^T + (y_k - H_k m_k^{(i)})(y_k - H_k m_k^{(i)})^T,$$
and set $V_k = V_k^{(N)}$, $m_k = m_k^{(N)}$, $P_k = P_k^{(N)}$.

Algorithm 1: The multidimensional variational Bayesian adaptive Kalman filter (VB-AKF) algorithm.
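The update iteration of Algorithm 1 is compact enough to sketch directly. The following Python/NumPy fragment is an illustrative implementation under the paper's parameterization (the effective measurement covariance is $(\nu_k - n - 1)^{-1} V_k$, eq. (11)); here `d` denotes the measurement dimension, and the iterates are updated sequentially rather than stored per step.

```python
import numpy as np

def vb_akf_update(m_pred, P_pred, nu_pred, V_pred, y, H, n_iter=5):
    """Measurement update of the linear VB-AKF (Algorithm 1), full-covariance case.

    m_pred, P_pred  : predicted state mean and covariance
    nu_pred, V_pred : predicted inverse-Wishart parameters
    """
    d = y.shape[0]                        # measurement dimension
    nu = nu_pred + 1.0                    # degrees of freedom: exact, done once
    m, P, V = m_pred.copy(), P_pred.copy(), V_pred.copy()
    for _ in range(n_iter):               # fixed-point iteration of (7)
        Sigma_eff = V / (nu - d - 1.0)    # <Sigma^{-1}>^{-1}, eq. (11)
        S = H @ P_pred @ H.T + Sigma_eff
        K = P_pred @ H.T @ np.linalg.inv(S)
        m = m_pred + K @ (y - H @ m_pred)
        P = P_pred - K @ S @ K.T
        V = V_pred + H @ P @ H.T + np.outer(y - H @ m, y - H @ m)
    return m, P, nu, V
```

In practice a handful of iterations suffices; with a single iteration and a large `nu_pred` the update reduces to a plain Kalman filter update with covariance `V_pred / (nu - d - 1)`.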
In this section we extend the results of the previous section to the non-linear models of the form (1). We start with the assumption that the filtering distribution is approximately the product of a Gaussian term and an inverse Wishart (IW) term:

$$p(x_{k-1}, \Sigma_{k-1} \mid y_{1:k-1}) = \mathrm{N}(x_{k-1} \mid m_{k-1}, P_{k-1}) \, \mathrm{IW}(\Sigma_{k-1} \mid \nu_{k-1}, V_{k-1}).$$

The prediction step can be handled in a similar manner as in the linear case, except that the computation of the mean and covariance of the state should be done with the Gaussian filter prediction equations (4) instead of the Kalman filter prediction equations (9). The inverse Wishart part of the prediction remains intact. After the prediction step, the approximation is then

$$p(x_k, \Sigma_k \mid y_{1:k-1}) = \mathrm{N}(x_k \mid m_k^-, P_k^-) \, \mathrm{IW}(\Sigma_k \mid \nu_k^-, V_k^-).$$

The expressions corresponding to (10) now become:

$$\int \log p(y_k, x_k, \Sigma_k \mid y_{1:k-1}) \, Q_\Sigma(\Sigma_k) \, d\Sigma_k = -\tfrac{1}{2} (y_k - h(x_k))^T \langle \Sigma_k^{-1} \rangle_\Sigma (y_k - h(x_k)) - \tfrac{1}{2} (x_k - m_k^-)^T (P_k^-)^{-1} (x_k - m_k^-) + C_1$$
$$\int \log p(y_k, x_k, \Sigma_k \mid y_{1:k-1}) \, Q_x(x_k) \, dx_k = -\tfrac{1}{2} (\nu_k^- + n + 2) \log |\Sigma_k| - \tfrac{1}{2} \mathrm{tr}\{V_k^- \Sigma_k^{-1}\} - \tfrac{1}{2} \langle (y_k - h(x_k))^T \Sigma_k^{-1} (y_k - h(x_k)) \rangle_x + C_2. \quad (15)$$

The expectation in the first equation is still given by equation (11), but the resulting distribution in terms of $x_k$ is intractable in closed form due to the non-linearity $h(x_k)$. Fortunately, the approximation problem is exactly the same as encountered in the update step of the Gaussian filter, and thus we can directly use the equations (5) for computing a Gaussian approximation to the distribution.

The simplification (12) does not work in the non-linear case, but we can rewrite the expectation as

$$\langle (y_k - h(x_k))^T \Sigma_k^{-1} (y_k - h(x_k)) \rangle_x = \mathrm{tr}\!\left\{ \langle (y_k - h(x_k))(y_k - h(x_k))^T \rangle_x \, \Sigma_k^{-1} \right\}, \quad (16)$$

where the expectation can be computed separately using one of the Gaussian integration methods in [10]. Because the result of the integration is just a constant matrix, we can now substitute (11) and (16) into (15) and match the terms in the equations (7) in the same manner as in the linear case to obtain the equations:

$$\mu_k = \int h(x_k) \, \mathrm{N}(x_k \mid m_k^-, P_k^-) \, dx_k$$
$$S_k = \int (h(x_k) - \mu_k)(h(x_k) - \mu_k)^T \, \mathrm{N}(x_k \mid m_k^-, P_k^-) \, dx_k + (\nu_k - n - 1)^{-1} V_k$$
$$C_k = \int (x_k - m_k^-)(h(x_k) - \mu_k)^T \, \mathrm{N}(x_k \mid m_k^-, P_k^-) \, dx_k$$
$$K_k = C_k S_k^{-1}$$
$$m_k = m_k^- + K_k (y_k - \mu_k)$$
$$P_k = P_k^- - K_k S_k K_k^T$$
$$\nu_k = \nu_k^- + 1$$
$$V_k = V_k^- + \int (y_k - h(x_k))(y_k - h(x_k))^T \, \mathrm{N}(x_k \mid m_k, P_k) \, dx_k. \quad (17)$$

The general filtering method for the full covariance and non-linear state space model is shown in Algorithm 2. Various useful special cases and extensions can be deduced from the equations:

• Different Gaussian integration methods result in different variants of the algorithm. For example, the Taylor series based approximation could be called VB-AEKF, the unscented transform based method VB-AUKF, the cubature based method VB-ACKF, the Gauss-Hermite based method VB-AGHKF, and so on. For the details of the different Gaussian integration methods, see [9, 10, 11, 12, 24, 13, 14, 15].

• The linear special case can easily be deduced by comparing the equations (13) and (17) to the equations in the algorithm. That is, one simply replaces the Gaussian integrals with their closed form solutions.

• The diagonal covariance case, which was considered in [3], can be recovered by updating only the diagonal elements in the last equation of the algorithm and keeping all other elements of the matrices $V_k^{(i)}$ zero. Of course, the matrix $B$ in the prediction step then needs to be diagonal as well. Note that the inverse Wishart parameterization does not reduce to the inverse Gamma parameterization, but the formulations are still equivalent.

• Non-additive dynamic models can be handled by simply replacing the state prediction with its non-additive counterpart.
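The fixed-point update (17) can be built on top of any sigma-point rule. The sketch below is an illustrative single-function implementation using cubature points (i.e., a VB-ACKF-style update); it is a simplification under assumed conventions (equal weights, Gauss-Seidel-style iteration), not the authors' reference code.

```python
import numpy as np

def sigma_points(m, P):
    """Third-degree spherical cubature points for N(m, P)."""
    n = m.shape[0]
    L = np.linalg.cholesky(P)
    U = np.sqrt(n) * np.concatenate([np.eye(n), -np.eye(n)])
    return m + U @ L.T

def vb_ackf_update(m_pred, P_pred, nu_pred, V_pred, y, h, n_iter=5):
    """Fixed-point iteration of the non-linear VB update, eq. (17)."""
    d = y.shape[0]
    nu = nu_pred + 1.0
    X = sigma_points(m_pred, P_pred)
    Y = np.array([h(x) for x in X])
    w = 1.0 / X.shape[0]
    mu = w * Y.sum(axis=0)                  # mu_k
    S0 = w * (Y - mu).T @ (Y - mu)          # integral part of S_k
    C = w * (X - m_pred).T @ (Y - mu)       # cross covariance C_k
    m, P, V = m_pred, P_pred, V_pred
    for _ in range(n_iter):
        S = S0 + V / (nu - d - 1.0)         # S_k with the current V iterate
        K = C @ np.linalg.inv(S)
        m = m_pred + K @ (y - mu)
        P = P_pred - K @ S @ K.T
        Xp = sigma_points(m, P)             # points for the V_k integral
        E = y - np.array([h(x) for x in Xp])
        V = V_pred + E.T @ E / Xp.shape[0]
    return m, P, nu, V
```

Note that only the last integral in (17) depends on the iterates $m_k$, $P_k$, so the points for $\mu_k$, $S_k$ and $C_k$ are computed once outside the loop.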
3. Numerical Results
In this simple example we illustrate the performance of the developed adaptive filters by tracking a moving target with sensors which measure the distances to the target moving in a 2-dimensional $(u, v)$ space. The measurements are corrupted by noise with time-varying correlations between the sensors. The correlations arise because the noise in the measurements is caused by localized variations in the environment; when the spatial paths of the measured signals are similar, the noises are correlated.

The state contains the position and velocity of the target, $x_k = (u_k \ v_k \ \dot{u}_k \ \dot{v}_k)^T$, and the dynamics of the target are modeled by the standard Wiener velocity model. The distance measurements from the $m$ sensors read

$$y_k^i = \sqrt{(s_u^i - u_k)^2 + (s_v^i - v_k)^2} + r_k^i, \quad i = 1, \ldots, m, \quad (18)$$

where $(s_u^i, s_v^i)$ is the position of the $i$th sensor and $r_k^i$ is the $i$th component of a Gaussian distributed noise vector $r_k \sim \mathrm{N}(0, \Sigma_k)$. In this experiment, the noise in each distance measurement is generated by drawing a random sample from a discretized Gaussian random field $z_k \in \mathbb{R}^{n_z}$ and then collecting all the values of the field connected to the line between the sensor and the target. We take the time-white continuous-valued random field $z(u, v)$ to have zero mean and the covariance function

$$k(u, v, u', v') = \sigma^2 \, \delta(u - u') \, \delta(v - v') + \sum_{i=1}^{n_A} 1_{A_i}(u, v) \, 1_{A_i}(u', v') \, k_i(u, v, u', v'), \quad (19)$$

where the first term corresponds to white background noise and the latter terms to additive correlations for points inside the bounded regions $A_i \subset \mathbb{R}^2$, $i = 1, \ldots, n_A$, with covariance functions $k_i(u, v, u', v')$.
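The range measurement model (18) is straightforward to code; a minimal sketch (with hypothetical sensor positions, and the position in the first two state components) is:

```python
import numpy as np

def range_measurement(x, sensors):
    """Distance from the target position (x[0], x[1]) to each sensor row, eq. (18)."""
    pos = x[:2]
    return np.linalg.norm(sensors - pos, axis=1)  # shape (m,), one range per sensor
```

This plays the role of $h(\cdot)$ in the adaptive filters above; the time-varying noise vector $r_k$ is added on top of these ranges.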
We set the covariance functions to

$$k_i(u, v, u', v') = \sigma_{1,i}^2 \, \delta(u - u') \, \delta(v - v') + \sigma_{2,i}^2 \exp\!\left(-l_i \left((u - u')^2 + (v - v')^2\right)\right), \quad (20)$$

which means that the noise inside the bounded regions consists of independent (white) and correlated components.

The simulation scenario is illustrated in Figure 1. For the lightly shaded area ($A_1$) the covariance function parameters were $\sigma_{1,i} = \ldots$, $\sigma_{2,i} = \ldots$ and $l_i = 2$, and inside the darkly shaded area ($A_2$) they were $\sigma_{1,i} = \ldots$, $\sigma_{2,i} = \ldots$ and $l_i = \ldots$; the background noise level was $\sigma = \ldots$. The spectral density of the process noise was set to $q = \ldots$ and the sampling period to $T = 0.01$. The trajectory shown in Figure 1 was discretized to 1000 time steps, and the measurements were then generated according to the procedure described above. Given the measurements, the target was tracked with the following methods:

• UKF-t: the unscented Kalman filter with the measurement covariance set to its true value on each time step.

• UKF-o: the unscented Kalman filter with a fixed diagonal measurement covariance matrix, with different standard deviations $\sigma$.

• VB-AUKF-f: the proposed adaptive filter with the ADF (Gaussian) approximations made with the UKF. The parameter $\rho$ in the dynamic model of the measurement noise was set to $\rho = 1 - \exp(-\ldots)$.

• VB-AUKF-d: the same as VB-AUKF-f, except that the measurement covariance is forced to be diagonal.

With all methods the parameters of the UKF were set to their default values.

The RMSE values in tracking the position of the target with the tested methods are shown in Figure 2 for a typical simulation run. It can be seen that the best results are achieved with the exact measurement covariance, while the estimation of the full covariance improves the results of VB-AUKF over the diagonal case. The UKF with a fixed diagonal measurement covariance is clearly the worst of all the tested methods. Figure 3 shows the estimates of the elements of $\Sigma_k$ produced by VB-AUKF-f together with their true values.

Figure 1: Simulation scenario in the range-only tracking demonstration. The circle denotes the starting location of the target, the triangles the locations of the sensors, and the dashed line the trajectory of the target. Inside the shaded areas the noise field has spatial correlations.

Figure 2: Range-only tracking: RMSE values for tracking the position of the target on a typical simulation run. The different standard deviation values used for UKF-o are on the x-axis.

Figure 3: Range-only tracking: estimates of the elements of $\Sigma_k$ with VB-AUKF-f. Smoother estimates can be obtained by tuning $\rho$ to be larger, but that introduces more lag to the estimates.

As a second example, we consider the classical multi-sensor bearings-only tracking problem with a coordinated turn model, where the state $x = (u, \dot{u}, v, \dot{v}, \omega)$ contains the 2d location $(u, v)$ and the corresponding velocities $(\dot{u}, \dot{v})$, as well as the turn rate $\omega$ of the target. The dynamic and measurement models for the sensors are

$$x_k = \begin{pmatrix} 1 & \frac{\sin(\omega \Delta t)}{\omega} & 0 & -\frac{1 - \cos(\omega \Delta t)}{\omega} & 0 \\ 0 & \cos(\omega \Delta t) & 0 & -\sin(\omega \Delta t) & 0 \\ 0 & \frac{1 - \cos(\omega \Delta t)}{\omega} & 1 & \frac{\sin(\omega \Delta t)}{\omega} & 0 \\ 0 & \sin(\omega \Delta t) & 0 & \cos(\omega \Delta t) & 0 \\ 0 & 0 & 0 & 0 & 1 \end{pmatrix} x_{k-1} + q_{k-1}$$
$$y_k^i = \arctan\!\left(\frac{v_k - s_v^i}{u_k - s_u^i}\right) + r_{k,i}, \quad (21)$$

where $q_k \sim \mathrm{N}(0, Q)$ is the process noise and $r_k = (r_{k,1}, \ldots)$ are the measurement noises of the sensors, with joint distribution $r_k \sim \mathrm{N}(0, \Sigma_k)$, where $\Sigma_k$ is unknown and time-varying.

We simulated a trajectory and measurements from the model and applied different filters to it. We tested various Gaussian integration based methods (VB-AEKF, VB-AUKF, VB-ACKF, VB-AGHKF), and because the results were much the same with the different Gaussian integration methods (though VB-AEKF was a bit worse than the others), we only present the results obtained with VB-ACKF. Figure 4 shows the simulated trajectory and the VB-ACKF results with full covariance estimation. In the simulation, the variances of the measurement noises as well as the cross-correlations varied smoothly over time. The simulated measurements are shown in Figure 5.

Figure 6 shows the root mean squared errors (RMSEs) for the following methods:

• CKF-t: the CKF with the true covariance matrix.

• CKF-o: the CKF with a diagonal covariance matrix, with diagonal elements given by the value on the x-axis.

• VB-ACKF-f: the CKF with full covariance estimation.

• VB-ACKF-d:
the CKF with diagonal covariance estimation.
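For reference, the coordinated turn transition matrix in (21) can be sketched as follows (an illustrative Python helper; the $\omega \to 0$ limit, where the model reduces to constant velocity, is not handled here):

```python
import numpy as np

def coordinated_turn_F(omega, dt):
    """Transition matrix of the coordinated turn model (21), state (u, du, v, dv, omega)."""
    s, c = np.sin(omega * dt), np.cos(omega * dt)
    return np.array([
        [1.0, s / omega,         0.0, -(1.0 - c) / omega, 0.0],
        [0.0, c,                 0.0, -s,                 0.0],
        [0.0, (1.0 - c) / omega, 1.0, s / omega,          0.0],
        [0.0, s,                 0.0, c,                  0.0],
        [0.0, 0.0,               0.0, 0.0,                1.0],
    ])
```

Applying this matrix rotates the velocity vector by $\omega \Delta t$ while keeping the speed and the turn rate constant, which is the defining property of the model.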
Figure 4: The simulated trajectory and the estimate obtained with VB-ACKF.
Figure 5: The simulated measurements.
As can be seen from the figure, the results of the filters with covariance estimation are indeed better than the results of any filter with a fixed diagonal covariance matrix. The filter with the known covariance matrix gives the lowest error, as would be expected, and the filter with full covariance estimation gives a lower error than the filter with diagonal covariance estimation.
4. Conclusion and Discussion
In this paper, we have presented a variational Bayes and Gaussian filtering based algorithm for joint estimation of the state and time-varying noise covariances in non-linear state space models. The performance of the method has been illustrated in simulated applications.

There are several extensions that could be considered as well. For instance, we could try to estimate the process noise covariance in the model. However, this is not as easy as it sounds, because the process noise covariance does not appear in the filtering equations in a similarly conjugate form as the measurement noise covariance.

Figure 6: Bearings-only tracking: RMSE values for CKF-t, CKF-o, VB-ACKF-f and VB-ACKF-d.