Kalman Filter from the Mutual Information Perspective
Yarong Luo ([email protected]), Hu ([email protected]), Guo ([email protected])
GNSS Research Center, Wuhan University
Abstract
The Kalman filter is the best linear unbiased state estimator. It is also comprehensible from the viewpoint of Bayesian estimation. However, this note gives a detailed derivation of the Kalman filter from the mutual information perspective for the first time. We then extend this result to the Rényi mutual information. Finally, we draw the conclusion that the measurement update of the Kalman filter is the key step in minimizing the uncertainty of the state of the dynamical system.
Key Words
Kalman filter, mutual information, Rényi mutual information, uncertainty, measurement update
1. Introduction
The Kalman filter has been widely used in various fields as an effective state estimator, e.g. for integrated navigation [1] and robotics [2]. The classical Kalman filter can be derived as a best linear unbiased estimate [1], and it is easy to understand from the probabilistic perspective [2]. Recently, the Kalman filter has also been presented using the methods of maximum relative entropy [3] and the temporal derivative of the Rényi entropy [4], which go beyond the general Bayesian filter. More and more evidence shows that the Kalman filter can be regarded as a direct extension of information theory. This note gives a new perspective on the Kalman filter from mutual information, which further bridges the gap between optimal state estimation and information theory. The main contribution of this note is to derive the Kalman filter from the perspective of mutual information and to extend this result to the Rényi mutual information case.
2. Kalman Filter from the Mutual Information
Consider the following discrete-time state-space model:

$$X_k = \Phi_{k|k-1} X_{k-1} + \Gamma_{k|k-1} W_{k-1} \qquad (1)$$
$$Z_k = H_k X_k + V_k \qquad (2)$$

where $X_k$ is the $n$-dimensional state vector; $Z_k$ is the $m$-dimensional measurement vector; $\Phi_{k|k-1}$, $\Gamma_{k|k-1}$ and $H_k$ are the known system structure parameters, called the $n \times n$ one-step state transition matrix, the $n \times l$ system noise distribution matrix, and the $m \times n$ measurement matrix, respectively; $W_{k-1}$ is the $l$-dimensional system noise vector, and $V_k$ is the $m$-dimensional measurement noise vector. Both are zero-mean Gaussian noise vector sequences, independent of each other:

$$E[W_k] = 0, \quad E[W_k W_j^T] = Q_k \delta_{kj} \qquad (3)$$
$$E[V_k] = 0, \quad E[V_k V_j^T] = R_k \delta_{kj} \qquad (4)$$
$$E[W_k V_j^T] = 0 \qquad (5)$$

The one-step prediction covariance matrix is denoted $\Sigma_{k|k-1}$. The state estimate at $t_k$ is denoted $\mathcal{N}(\hat{X}_k, \Sigma_k)$, where $\hat{X}_k$ is the mean of the estimated state and $\Sigma_k$ is the covariance matrix of the estimation error. Assume the optimal estimate of the state can be calculated as

$$\hat{X}_k = \hat{X}_{k|k-1} + K_k \tilde{Z}_{k|k-1} \qquad (6)$$

where $K_k$ is the undetermined correction factor matrix, $\hat{X}_{k|k-1}$ is the one-step state prediction, and $\tilde{Z}_{k|k-1} = Z_k - H_k \hat{X}_{k|k-1}$ is the measurement one-step prediction error.

Then the mean square error matrix of the state estimate $\hat{X}_k$ is given by [1]

$$\Sigma_k = (I - K_k H_k)\Sigma_{k|k-1}(I - K_k H_k)^T + K_k R_k K_k^T \qquad (7)$$

The mean square error matrix $\Sigma_k$ is positive definite, since $(I - K_k H_k)\Sigma_{k|k-1}(I - K_k H_k)^T$ is positive definite and $K_k R_k K_k^T$ is positive semidefinite.

A joint Gaussian distribution can be expressed as

$$p(X, Y) \sim \mathcal{N}\left( \begin{bmatrix} \hat{X} \\ \hat{Y} \end{bmatrix}, \begin{bmatrix} \Sigma_{xx} & \Sigma_{xy} \\ \Sigma_{yx} & \Sigma_{yy} \end{bmatrix} \right) \qquad (8)$$

where $X \sim \mathcal{N}(\hat{X}, \Sigma_{xx})$ and $Y \sim \mathcal{N}(\hat{Y}, \Sigma_{yy})$.

The mutual information for a joint Gaussian PDF can be represented by

$$I(X, Y) = H(X) + H(Y) - H(X, Y) = H(X) - H(X|Y) = \frac{1}{2}\ln\left((2\pi e)^N \det\Sigma_{xx}\right) + \frac{1}{2}\ln\left((2\pi e)^M \det\Sigma_{yy}\right) - \frac{1}{2}\ln\left((2\pi e)^{M+N} \det\Sigma\right) = -\frac{1}{2}\ln\left(\frac{\det\Sigma}{\det\Sigma_{xx}\det\Sigma_{yy}}\right) \qquad (9)$$

where $H(X) = \frac{1}{2}\ln\left((2\pi e)^N \det\Sigma_{xx}\right)$ is the entropy of a Gaussian random variable, and

$$\det\Sigma = \det\Sigma_{xx}\det\left(\Sigma_{yy} - \Sigma_{yx}\Sigma_{xx}^{-1}\Sigma_{xy}\right) = \det\Sigma_{yy}\det\left(\Sigma_{xx} - \Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}\right) \qquad (10)$$

Therefore, the mutual information describes the reduction of the uncertainty in variable $X$ due to gaining knowledge of variable $Y$.

Similarly, the mutual information at time step $t_k$ can easily be computed from the a priori PDF, the a posteriori PDF and the Kalman gain $K_k$ as

$$I(\hat{X}_{k|k-1}, Z_k) = \frac{1}{2}\ln\left(\frac{\det\Sigma_{k|k-1}}{\det\Sigma_k}\right) = \frac{1}{2}\ln\left(\frac{\det\Sigma_{k|k-1}}{\det\left((I - K_k H_k)\Sigma_{k|k-1}(I - K_k H_k)^T + K_k R_k K_k^T\right)}\right) \qquad (11)$$

It describes the reduction of the uncertainty of the state due to gaining knowledge from the measurement $Z_k$. Consequently, we want to maximize the mutual information $I(\hat{X}_{k|k-1}, Z_k)$. Obviously, maximizing the mutual information is equivalent to minimizing $\ln\det\Sigma_k$. Since equation (11) is a function of the unknown factor matrix $K_k$, the maximization can be carried out by taking its derivative with respect to $K_k$ and setting it equal to zero:

$$\frac{dI(\hat{X}_{k|k-1}, Z_k)}{dK_k} = -\frac{1}{2}\Sigma_k^{-T}\frac{d\Sigma_k}{dK_k} = -\frac{1}{2}\Sigma_k^{-T}\left(-2(I - K_k H_k)\Sigma_{k|k-1}H_k^T + 2K_k R_k\right) = 0 \qquad (12)$$

where $\partial \ln\det X / \partial X = X^{-T}$ [5] has been used. Solving for $K_k$ gives

$$K_k = \Sigma_{k|k-1}H_k^T\left(H_k\Sigma_{k|k-1}H_k^T + R_k\right)^{-1} \qquad (13)$$

A subsequent derivative of equation (12) must be taken to check for a maximum, that is,

$$\frac{d}{dK_k}\left(-\frac{1}{2}\Sigma_k^{-T}\left(-2(I - K_k H_k)\Sigma_{k|k-1}H_k^T + 2K_k R_k\right)\right) \qquad (14)$$

Substituting equation (12) into the above equation results in

$$\frac{d}{dK_k}\left(-\frac{1}{2}\Sigma_k^{-T}\left(-2(I - K_k H_k)\Sigma_{k|k-1}H_k^T + 2K_k R_k\right)\right) = -\Sigma_k^{-T}\left(H_k\Sigma_{k|k-1}H_k^T + R_k\right) \qquad (15)$$

which is always negative definite by the definition of the covariance matrices $R_k$ and $H_k\Sigma_{k|k-1}H_k^T$, ensuring the solution for $K_k$ is a maximum.
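The derivation above can be checked numerically. The following sketch (not from the paper; the dimensions and random covariances are illustrative choices) builds the Joseph-form posterior covariance of equation (7), evaluates the mutual information of equation (11), and verifies that the gain of equation (13) attains the maximum over randomly perturbed gains:

```python
# Numerical check: the Kalman gain of Eq. (13) maximizes the mutual
# information I = 0.5 * ln(det Sigma_prior / det Sigma_post) of Eq. (11).
import numpy as np

def posterior_cov(K, Sigma_pred, H, R):
    """Joseph-form covariance (I-KH) Sigma (I-KH)^T + K R K^T, Eq. (7)."""
    A = np.eye(Sigma_pred.shape[0]) - K @ H
    return A @ Sigma_pred @ A.T + K @ R @ K.T

def mutual_info(K, Sigma_pred, H, R):
    """I(X_pred, Z) = 0.5 * ln(det Sigma_pred / det Sigma_post), Eq. (11)."""
    Sigma_post = posterior_cov(K, Sigma_pred, H, R)
    return 0.5 * np.log(np.linalg.det(Sigma_pred) / np.linalg.det(Sigma_post))

rng = np.random.default_rng(0)
n, m = 3, 2                                   # illustrative dimensions
A_ = rng.standard_normal((n, n))
Sigma_pred = A_ @ A_.T + n * np.eye(n)        # random SPD prior covariance
H = rng.standard_normal((m, n))               # measurement matrix
B_ = rng.standard_normal((m, m))
R = B_ @ B_.T + m * np.eye(m)                 # random SPD measurement noise

# Optimal gain from Eq. (13)
K_opt = Sigma_pred @ H.T @ np.linalg.inv(H @ Sigma_pred @ H.T + R)
I_opt = mutual_info(K_opt, Sigma_pred, H, R)

# No perturbed gain achieves a larger mutual information
for _ in range(100):
    K_pert = K_opt + 0.1 * rng.standard_normal((n, m))
    assert mutual_info(K_pert, Sigma_pred, H, R) <= I_opt + 1e-12
```

Note that with $K_k = 0$ the posterior equals the prior and the mutual information is zero: the measurement update is exactly the step that reduces the state uncertainty.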
3. Kalman Filter from the Rényi Mutual Information
Moreover, the Rényi mutual information of a joint Gaussian PDF can be calculated similarly to equation (9):

$$I_R^\alpha(X, Y) = H_R^\alpha(X) + H_R^\alpha(Y) - H_R^\alpha(X, Y) = \frac{1}{2}\ln\left((2\pi)^N \alpha^{\frac{N}{\alpha-1}} \det\Sigma_{xx}\right) + \frac{1}{2}\ln\left((2\pi)^M \alpha^{\frac{M}{\alpha-1}} \det\Sigma_{yy}\right) - \frac{1}{2}\ln\left((2\pi)^{N+M} \alpha^{\frac{N+M}{\alpha-1}} \det\Sigma\right) = \frac{1}{2}\ln\left(\frac{\det\Sigma_{xx}\det\Sigma_{yy}}{\det\Sigma}\right) = I(X, Y) \qquad (16)$$

where $H_R^\alpha(X) = \frac{1}{2}\ln\left((2\pi)^N \alpha^{\frac{N}{\alpha-1}} \det\Sigma_{xx}\right)$ is the Rényi entropy of order $\alpha$ of a continuous random variable with a multivariate Gaussian PDF. Consequently, the mutual information is the same as the Rényi mutual information for a joint Gaussian PDF. Similarly, we can obtain the same result as equation (13) from the Rényi mutual information.
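The cancellation of the $\alpha$-dependent terms in equation (16) can also be verified numerically. The sketch below (illustrative, not from the paper; the joint covariance is randomly generated) evaluates the Gaussian Rényi entropies for several orders $\alpha$ and confirms that the resulting Rényi mutual information always equals the Shannon mutual information:

```python
# Check: for a joint Gaussian, the order-alpha Renyi mutual information
# is independent of alpha and equals the Shannon MI of Eq. (9)/(16).
import numpy as np

def renyi_entropy_gauss(Sigma, alpha):
    """H_R^alpha of a Gaussian: 0.5*ln((2*pi)^N * alpha^(N/(alpha-1)) * det Sigma)."""
    N = Sigma.shape[0]
    return 0.5 * np.log((2 * np.pi) ** N
                        * alpha ** (N / (alpha - 1))
                        * np.linalg.det(Sigma))

rng = np.random.default_rng(1)
N, M = 2, 2                                   # illustrative block sizes
A = rng.standard_normal((N + M, N + M))
S = A @ A.T + (N + M) * np.eye(N + M)         # random SPD joint covariance
Sxx, Syy = S[:N, :N], S[N:, N:]               # marginal covariances

# Shannon MI from Eq. (9): 0.5 * ln(det Sxx * det Syy / det S)
shannon_mi = 0.5 * np.log(np.linalg.det(Sxx) * np.linalg.det(Syy)
                          / np.linalg.det(S))

for alpha in (0.5, 2.0, 5.0):
    renyi_mi = (renyi_entropy_gauss(Sxx, alpha)
                + renyi_entropy_gauss(Syy, alpha)
                - renyi_entropy_gauss(S, alpha))
    assert np.isclose(renyi_mi, shannon_mi)   # the alpha terms cancel, Eq. (16)
```

The exponents $N/(\alpha-1) + M/(\alpha-1) - (N+M)/(\alpha-1)$ sum to zero, which is why the order $\alpha$ drops out of the difference of entropies.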
4. Conclusions
In this paper, the Kalman filter is derived from the perspective of mutual information and extended to the Rényi mutual information case. We show that the measurement update of the Kalman filter can minimize the uncertainty of the state by formulating it as the mutual information between the evolving state and the measurement and maximizing that mutual information. Furthermore, we can think of the Kalman filter, a little more radically, as an extension of information theory.
Acknowledgement
This research was supported by a grant from the National Key Research and Development Program of China (2018YFB1305001). We express our thanks to the GNSS Center, Wuhan University.
References

[1] Y. Gongmin and W. Jun, Lectures on Strapdown Inertial Navigation Algorithm and Integrated Navigation Principles. Northwestern Polytechnical University Press: Xi'an, China, 2019.
[2] S. Thrun, W. Burgard, and D. Fox, Probabilistic Robotics. MIT Press, 2005.
[3] A. Giffin and R. Urniezius, "The Kalman filter revisited using maximum relative entropy," Entropy, vol. 16, no. 2, pp. 1047–1069, 2014.
[4] Y. Luo, C. Guo, S. You, and J. Liu, "A novel perspective of the Kalman filter from the Rényi entropy," Entropy, vol. 22, no. 9, p. 982, 2020.
[5] X.-D. Zhang,