Maximum Correntropy Kalman Filter
Abstract — The traditional Kalman filter (KF) is derived under the well-known minimum mean square error (MMSE) criterion, which is optimal under the Gaussian assumption. However, when the signals are non-Gaussian, especially when the system is disturbed by heavy-tailed impulsive noises, the performance of the KF deteriorates seriously. To improve the robustness of the KF against impulsive noises, we propose in this work a new Kalman filter, called the maximum correntropy Kalman filter (MCKF), which adopts the robust maximum correntropy criterion (MCC) as the optimality criterion instead of the MMSE. As in the traditional KF, the state mean and covariance matrix propagation equations are used to obtain the prior estimates of the state and covariance matrix in the MCKF. A novel fixed-point algorithm is then used to update the posterior estimates. A sufficient condition that guarantees the convergence of the fixed-point algorithm is given. Illustrative examples are presented to demonstrate the effectiveness and robustness of the new algorithm.
Index Terms — Kalman filter, maximum correntropy criterion (MCC), fixed-point algorithm.

I. INTRODUCTION

Estimation problems are among the most important issues in applications ranging from industrial appliances to research areas including signal processing, optimal control, navigation, and so on. Actual applications include parameter estimation [27], system identification [28], target tracking [29], simultaneous localization [30] and many others. For linear dynamic systems, the estimation problem is usually solved by the Kalman filter (KF), which is, in essence, an adaptive least squares filter that provides an optimal recursive solution [1] [2] [3]. The KF performs very well in Gaussian noises [4]. Nevertheless, its performance is likely to degrade when applied to non-Gaussian situations, especially when the system is disturbed by impulsive noises. The main reason for this is that the KF is based on the well-known minimum mean square error (MMSE) criterion, which is sensitive to large outliers, and this deteriorates the robustness of the KF in non-Gaussian noise environments [5].
This work was supported by 973 Program (No. 2015CB351703) and the National Natural Science Foundation of China (No. 61372152). B. Chen (Corresponding author) and X. Liu are with the School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China (e-mail: [email protected] ; [email protected]). Haiquan Zhao is with the School of Electrical Engineering, Southwest Jiaotong University, Chengdu, China. ([email protected]) J. C. Principe is with the Department of Electrical and Computer Engineering, University of Florida, Gainesville FL 32611 USA, and the School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an, China (e-mail: [email protected]).
The optimization criteria of information theoretic learning (ITL) [6] [7] have gained increasing attention over the past few years. Instead of the usual second-order statistical measures, such as variance and covariance, ITL uses information theoretic quantities (e.g. entropy) estimated directly from the data as the optimization costs. Information theoretic quantities can capture higher-order statistics and offer potentially significant performance improvements in machine learning and signal processing applications. ITL links information theory, nonparametric estimators, and reproducing kernel Hilbert spaces (RKHS) in a simple and unconventional way. In particular, correntropy, a nonlinear similarity measure in kernel space, has its root in Renyi's entropy [8]-[12]. Since correntropy is a local similarity measure (hence insensitive to outliers), it is naturally a robust cost for machine learning and signal processing [13]-[21]. In supervised learning, such as regression, the problem can be formulated as that of maximizing the correntropy between the model output and the desired response. In ITL this optimization criterion is called the maximum correntropy criterion (MCC) [6] [7]. Recently, the MCC has been successfully used in robust adaptive filtering in impulsive (heavy-tailed) noise environments [6] [9]-[11] [22]. The MCC solution cannot be obtained in closed form even for a simple linear regression problem, so one has to solve it with an iterative update algorithm such as the gradient-based methods [9]-[11] [22]. Gradient-based methods are simple and widely used, but they depend on a free step-size parameter and usually converge to the optimal solution slowly. The fixed-point iterative algorithm is an alternative and efficient way to obtain the MCC solution; it involves no step-size and may converge to the solution very fast [6] [24] [25]. A sufficient condition that guarantees the convergence of the fixed-point MCC algorithm was given in [26].
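To make the contrast concrete, the following sketch estimates a single regression weight under the MCC with a Gaussian kernel, using the step-size-free fixed-point iteration just described. The data and parameter values are hypothetical (not from this paper's experiments); a single impulsive outlier that ruins the least-squares solution barely affects the MCC solution.

```python
import math

def gaussian_kernel(e, sigma):
    """G_sigma(e) = exp(-e^2 / (2 sigma^2))."""
    return math.exp(-e * e / (2.0 * sigma * sigma))

def mcc_weight_fixed_point(xs, ys, sigma, iters=50, tol=1e-9):
    """Scalar MCC regression: find w maximizing (1/N) sum_i G_sigma(y_i - w*x_i)
    via the fixed-point iteration
        w <- sum_i G_sigma(e_i) x_i y_i / sum_i G_sigma(e_i) x_i^2,
    started from the least-squares solution. (If sigma is taken extremely
    small, the kernel weights can underflow to zero; a moderate bandwidth
    is assumed here.)"""
    w = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    for _ in range(iters):
        g = [gaussian_kernel(y - w * x, sigma) for x, y in zip(xs, ys)]
        w_new = (sum(gi * x * y for gi, x, y in zip(g, xs, ys))
                 / sum(gi * x * x for gi, x in zip(g, xs)))
        if abs(w_new - w) < tol:
            return w_new
        w = w_new
    return w

# Data from y = 2x, except one impulsive outlier in the last response.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 100.0]
w_ls = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)  # badly biased
w_mcc = mcc_weight_fixed_point(xs, ys, sigma=1.0)                   # close to 2
```

Here the least-squares weight is dragged far above 2 by the outlier, while the correntropy weights G_sigma(e_i) automatically down-weight the outlying sample, so the fixed-point iteration settles essentially at the true slope without any step-size tuning.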
In the present paper, we develop a new Kalman filter, called the maximum correntropy Kalman filter (MCKF), based on the MCC and a fixed-point iterative algorithm. Like the traditional KF, the MCKF retains both the state mean propagation process and the covariance matrix propagation process. In particular, the new filter has a recursive structure and is suitable for online implementation. It is worth noting that in [23] the MCC has been used for hidden state estimation, but that approach involves no covariance propagation process and is not, in form, a Kalman filter. The rest of the paper is organized as follows. In Section II, we briefly introduce the maximum correntropy criterion and the Kalman filter. In Section III, we derive the MCKF algorithm and give the computational complexity and convergence analysis. Simulation results are then provided in Section IV to show the excellent performance of the MCKF. Finally, conclusions are given in Section V.
Badong Chen, Senior Member, IEEE, Xi Liu, Haiquan Zhao, José C. Príncipe, Fellow, IEEE

II. PRELIMINARIES

A. Maximum Correntropy Criterion
Correntropy is a generalized similarity measure between two random variables. Given two random variables $X, Y$ with joint distribution function $F_{XY}(x, y)$, correntropy is defined by

$V(X, Y) = \mathrm{E}\left[\kappa(X, Y)\right] = \int \kappa(x, y)\, dF_{XY}(x, y)$   (1)

where $\mathrm{E}$ denotes the expectation operator, and $\kappa(\cdot, \cdot)$ is a shift-invariant Mercer kernel. In this paper, unless mentioned otherwise, the kernel function is the Gaussian kernel given by

$\kappa(x, y) = G_{\sigma}(e) = \exp\left(-\frac{e^{2}}{2\sigma^{2}}\right)$   (2)

where $e = x - y$, and $\sigma > 0$ stands for the kernel bandwidth. In most practical situations, however, only a limited number of data are available and the joint distribution $F_{XY}$ is usually unknown. In these cases, one can estimate the correntropy using a sample mean estimator:
$\hat{V}(X, Y) = \frac{1}{N}\sum_{i=1}^{N} G_{\sigma}(e(i))$   (3)

where $e(i) = x(i) - y(i)$, and $\{x(i), y(i)\}_{i=1}^{N}$ are $N$ samples drawn from $F_{XY}$. Taking the Taylor series expansion of the Gaussian kernel, we have

$V(X, Y) = \sum_{n=0}^{\infty} \frac{(-1)^{n}}{2^{n}\sigma^{2n}\, n!}\, \mathrm{E}\left[(X - Y)^{2n}\right]$   (4)

As one can see, correntropy is a weighted sum of all even-order moments of the random variable $X - Y$. The kernel bandwidth appears as a parameter weighting the second-order and higher-order moments. With a very large $\sigma$ (compared with the dynamic range of the data), the correntropy will be dominated by the second-order moment. Given a sequence of error data $\{e(i)\}_{i=1}^{N}$, the cost function of MCC is given by

$J_{MCC} = \frac{1}{N}\sum_{i=1}^{N} G_{\sigma}(e(i))$   (5)

Suppose the goal is to learn a parameter vector $W$ of an adaptive model, and let $x(i)$ and $y(i)$ denote, respectively, the model output and the desired response. MCC-based learning can then be formulated as the following optimization problem:
$\hat{W} = \arg\max_{W \in \Omega} \frac{1}{N}\sum_{i=1}^{N} G_{\sigma}(e(i))$   (6)

where $\hat{W}$ denotes the optimal solution, and $\Omega$ denotes a feasible set of parameters.

B. Kalman Filter
The Kalman filter provides a powerful tool for the state estimation of linear systems, and it is an optimal estimator under linear and Gaussian assumptions. Consider a linear system described by the following state and measurement equations:

$\mathbf{x}(k) = \mathbf{F}(k-1)\mathbf{x}(k-1) + \mathbf{q}(k-1)$   (7)

$\mathbf{y}(k) = \mathbf{H}(k)\mathbf{x}(k) + \mathbf{r}(k)$   (8)

where $\mathbf{x}(k) \in \mathbb{R}^{n}$ denotes the $n$-dimensional state vector and $\mathbf{y}(k) \in \mathbb{R}^{m}$ represents the $m$-dimensional measurement vector at instant $k$. $\mathbf{F}$ and $\mathbf{H}$ stand for, respectively, the system matrix (or state transition matrix) and the observation matrix. $\mathbf{q}(k-1)$ and $\mathbf{r}(k)$ are mutually uncorrelated process noise and measurement noise, respectively, with zero mean and covariance matrices

$\mathrm{E}\left[\mathbf{q}(k-1)\mathbf{q}^{T}(k-1)\right] = \mathbf{Q}(k-1), \quad \mathrm{E}\left[\mathbf{r}(k)\mathbf{r}^{T}(k)\right] = \mathbf{R}(k)$   (9)

In general, the Kalman filter includes the following two steps:

Predict
The prior mean and covariance matrix are given by

$\hat{\mathbf{x}}(k|k-1) = \mathbf{F}(k-1)\hat{\mathbf{x}}(k-1|k-1)$   (10)

$\mathbf{P}(k|k-1) = \mathbf{F}(k-1)\mathbf{P}(k-1|k-1)\mathbf{F}^{T}(k-1) + \mathbf{Q}(k-1)$   (11)

Update
The Kalman filter gain is computed as

$\mathbf{K}(k) = \mathbf{P}(k|k-1)\mathbf{H}^{T}(k)\left(\mathbf{H}(k)\mathbf{P}(k|k-1)\mathbf{H}^{T}(k) + \mathbf{R}(k)\right)^{-1}$   (12)

The posterior state is equal to the prior state plus the innovation weighted by the KF gain:

$\hat{\mathbf{x}}(k|k) = \hat{\mathbf{x}}(k|k-1) + \mathbf{K}(k)\left(\mathbf{y}(k) - \mathbf{H}(k)\hat{\mathbf{x}}(k|k-1)\right)$   (13)

Additionally, the posterior covariance is recursively updated as follows:

$\mathbf{P}(k|k) = \left(\mathbf{I} - \mathbf{K}(k)\mathbf{H}(k)\right)\mathbf{P}(k|k-1)\left(\mathbf{I} - \mathbf{K}(k)\mathbf{H}(k)\right)^{T} + \mathbf{K}(k)\mathbf{R}(k)\mathbf{K}^{T}(k)$   (14)

III. KALMAN FILTER UNDER MCC

A. Derivation of the Algorithm
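As a baseline for the derivation in this section, the KF recursion (10)-(14) above can be sketched in a scalar setting, where all quantities are 1-by-1 (the numerical values below are a hypothetical toy example):

```python
def kf_step(x_prev, P_prev, y, F, H, Q, R):
    """One cycle of the scalar Kalman filter: predict (10)-(11), update (12)-(14)."""
    # Predict
    x_prior = F * x_prev                          # (10)
    P_prior = F * P_prev * F + Q                  # (11)
    # Update
    K = P_prior * H / (H * P_prior * H + R)       # (12)
    x_post = x_prior + K * (y - H * x_prior)      # (13)
    P_post = (1 - K * H) * P_prior * (1 - K * H) + K * R * K   # (14)
    return x_post, P_post

# Track a constant state near 1.0 from a few noisy measurements.
x, P = 0.0, 1.0
for y in [1.0, 1.2, 0.9, 1.1]:
    x, P = kf_step(x, P, y, F=1.0, H=1.0, Q=0.01, R=0.1)
```

With Gaussian-like measurements the estimate settles near the true value while the posterior variance shrinks; the MCKF derived below keeps exactly this predict step and replaces only the update.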
The traditional Kalman filter works well under Gaussian noises, but its performance may deteriorate significantly under non-Gaussian noises, especially when the underlying system is disturbed by impulsive noises. The main reason is that the KF is developed under the MMSE criterion, which captures only the second-order statistics of the error signal and is sensitive to large outliers. To address this problem, we propose in this work to use the MCC to derive a new Kalman filter, which may perform much better in non-Gaussian noise environments, since correntropy contains second-order and higher-order moments of the error. For the linear model described in the previous section, we have

$\begin{bmatrix} \hat{\mathbf{x}}(k|k-1) \\ \mathbf{y}(k) \end{bmatrix} = \begin{bmatrix} \mathbf{I} \\ \mathbf{H}(k) \end{bmatrix}\mathbf{x}(k) + \boldsymbol{\nu}(k)$   (15)

where $\mathbf{I}$ is the $n \times n$ identity matrix, and $\boldsymbol{\nu}(k) = \begin{bmatrix} -\left(\mathbf{x}(k) - \hat{\mathbf{x}}(k|k-1)\right) \\ \mathbf{r}(k) \end{bmatrix}$, with

$\mathrm{E}\left[\boldsymbol{\nu}(k)\boldsymbol{\nu}^{T}(k)\right] = \begin{bmatrix} \mathbf{P}(k|k-1) & \mathbf{0} \\ \mathbf{0} & \mathbf{R}(k) \end{bmatrix} = \begin{bmatrix} \mathbf{B}_{p}(k|k-1)\mathbf{B}_{p}^{T}(k|k-1) & \mathbf{0} \\ \mathbf{0} & \mathbf{B}_{r}(k)\mathbf{B}_{r}^{T}(k) \end{bmatrix} = \mathbf{B}(k)\mathbf{B}^{T}(k)$   (16)

where $\mathbf{B}(k)$ can be obtained by the Cholesky decomposition of $\mathrm{E}\left[\boldsymbol{\nu}(k)\boldsymbol{\nu}^{T}(k)\right]$. Left multiplying both sides of (15) by $\mathbf{B}^{-1}(k)$, we get

$\mathbf{D}(k) = \mathbf{W}(k)\mathbf{x}(k) + \mathbf{e}(k)$   (17)

where $\mathbf{D}(k) = \mathbf{B}^{-1}(k)\begin{bmatrix} \hat{\mathbf{x}}(k|k-1) \\ \mathbf{y}(k) \end{bmatrix}$, $\mathbf{W}(k) = \mathbf{B}^{-1}(k)\begin{bmatrix} \mathbf{I} \\ \mathbf{H}(k) \end{bmatrix}$, and $\mathbf{e}(k) = \mathbf{B}^{-1}(k)\boldsymbol{\nu}(k)$. Since $\mathrm{E}\left[\mathbf{e}(k)\mathbf{e}^{T}(k)\right] = \mathbf{I}$, the residual errors $\mathbf{e}(k)$ are white. Now we propose the following MCC-based cost function:
$J_{L}\left(\mathbf{x}(k)\right) = \frac{1}{L}\sum_{i=1}^{L} G_{\sigma}\left(d_{i}(k) - \mathbf{w}_{i}(k)\mathbf{x}(k)\right)$   (18)

where $d_{i}(k)$ is the $i$-th element of $\mathbf{D}(k)$, $\mathbf{w}_{i}(k)$ is the $i$-th row of $\mathbf{W}(k)$, and $L = n + m$ is the dimension of $\mathbf{D}(k)$. Then, under the MCC criterion, the optimal estimate of $\mathbf{x}(k)$ is

$\hat{\mathbf{x}}(k) = \arg\max_{\mathbf{x}(k)} J_{L}\left(\mathbf{x}(k)\right) = \arg\max_{\mathbf{x}(k)} \sum_{i=1}^{L} G_{\sigma}\left(e_{i}(k)\right)$   (19)

where $e_{i}(k)$ is the $i$-th element of $\mathbf{e}(k)$:

$e_{i}(k) = d_{i}(k) - \mathbf{w}_{i}(k)\mathbf{x}(k)$   (20)

The optimal solution can thus be obtained by solving

$\frac{\partial J_{L}\left(\mathbf{x}(k)\right)}{\partial \mathbf{x}(k)} = 0$   (21)

It follows easily that

$\mathbf{x}(k) = \left(\sum_{i=1}^{L} G_{\sigma}\left(e_{i}(k)\right)\mathbf{w}_{i}^{T}(k)\mathbf{w}_{i}(k)\right)^{-1}\left(\sum_{i=1}^{L} G_{\sigma}\left(e_{i}(k)\right)\mathbf{w}_{i}^{T}(k)d_{i}(k)\right)$   (22)

Since $e_{i}(k) = d_{i}(k) - \mathbf{w}_{i}(k)\mathbf{x}(k)$, the optimal solution (22) is actually a fixed-point equation of $\mathbf{x}(k)$ and can be rewritten as

$\mathbf{x}(k) = \mathbf{f}\left(\mathbf{x}(k)\right)$   (23)

with

$\mathbf{f}\left(\mathbf{x}(k)\right) = \left(\sum_{i=1}^{L} G_{\sigma}\left(d_{i}(k) - \mathbf{w}_{i}(k)\mathbf{x}(k)\right)\mathbf{w}_{i}^{T}(k)\mathbf{w}_{i}(k)\right)^{-1}\left(\sum_{i=1}^{L} G_{\sigma}\left(d_{i}(k) - \mathbf{w}_{i}(k)\mathbf{x}(k)\right)\mathbf{w}_{i}^{T}(k)d_{i}(k)\right)$

A fixed-point iterative algorithm can be readily obtained as

$\hat{\mathbf{x}}(k)_{t+1} = \mathbf{f}\left(\hat{\mathbf{x}}(k)_{t}\right)$   (24)

where $\hat{\mathbf{x}}(k)_{t}$ denotes the solution at fixed-point iteration $t$. The fixed-point equation (22) can also be expressed as

$\mathbf{x}(k) = \left(\mathbf{W}^{T}(k)\mathbf{C}(k)\mathbf{W}(k)\right)^{-1}\mathbf{W}^{T}(k)\mathbf{C}(k)\mathbf{D}(k)$   (25)

where $\mathbf{C}(k) = \begin{bmatrix} \mathbf{C}_{x}(k) & \mathbf{0} \\ \mathbf{0} & \mathbf{C}_{y}(k) \end{bmatrix}$, with $\mathbf{C}_{x}(k) = \mathrm{diag}\left(G_{\sigma}(e_{1}(k)), \ldots, G_{\sigma}(e_{n}(k))\right)$ and $\mathbf{C}_{y}(k) = \mathrm{diag}\left(G_{\sigma}(e_{n+1}(k)), \ldots, G_{\sigma}(e_{n+m}(k))\right)$. The equation (25) can be further expressed as follows (see the Appendix for a detailed derivation):

$\hat{\mathbf{x}}(k|k) = \hat{\mathbf{x}}(k|k-1) + \bar{\mathbf{K}}(k)\left(\mathbf{y}(k) - \mathbf{H}(k)\hat{\mathbf{x}}(k|k-1)\right)$   (26)

where

$\bar{\mathbf{K}}(k) = \bar{\mathbf{P}}(k|k-1)\mathbf{H}^{T}(k)\left(\mathbf{H}(k)\bar{\mathbf{P}}(k|k-1)\mathbf{H}^{T}(k) + \bar{\mathbf{R}}(k)\right)^{-1}$

$\bar{\mathbf{P}}(k|k-1) = \mathbf{B}_{p}(k|k-1)\mathbf{C}_{x}^{-1}(k)\mathbf{B}_{p}^{T}(k|k-1)$

$\bar{\mathbf{R}}(k) = \mathbf{B}_{r}(k)\mathbf{C}_{y}^{-1}(k)\mathbf{B}_{r}^{T}(k)$   (27)
Remark: Of course, the equation (26) is also a fixed-point equation of $\hat{\mathbf{x}}(k|k)$, because $\bar{\mathbf{K}}(k)$ depends on $\bar{\mathbf{P}}(k|k-1)$ and $\bar{\mathbf{R}}(k)$, both related to $\hat{\mathbf{x}}(k|k)$ via $\mathbf{C}_{x}(k)$ and $\mathbf{C}_{y}(k)$, respectively. The optimal solution of (26) also depends on the prior estimate $\hat{\mathbf{x}}(k|k-1)$, which can be calculated by (10) using the latest estimate $\hat{\mathbf{x}}(k-1|k-1)$. With the above derivations, we summarize the proposed MCKF algorithm as follows:

1) Choose a proper kernel bandwidth $\sigma$ and a small positive number $\varepsilon$; set an initial estimate $\hat{\mathbf{x}}(0|0)$ and an initial covariance matrix $\mathbf{P}(0|0)$; let $k = 1$.

2) Use equations (10) and (11) to obtain $\hat{\mathbf{x}}(k|k-1)$ and $\mathbf{P}(k|k-1)$, and use the Cholesky decomposition to obtain $\mathbf{B}_{p}(k|k-1)$.

3) Let $t = 1$ and $\hat{\mathbf{x}}(k|k)_{0} = \hat{\mathbf{x}}(k|k-1)$, where $\hat{\mathbf{x}}(k|k)_{t}$ denotes the estimated state at fixed-point iteration $t$.

4) Use (28)-(34) to compute $\hat{\mathbf{x}}(k|k)_{t}$:

$\hat{\mathbf{x}}(k|k)_{t} = \hat{\mathbf{x}}(k|k-1) + \bar{\mathbf{K}}(k)\left(\mathbf{y}(k) - \mathbf{H}(k)\hat{\mathbf{x}}(k|k-1)\right)$   (28)

with

$\bar{\mathbf{K}}(k) = \bar{\mathbf{P}}(k|k-1)\mathbf{H}^{T}(k)\left(\mathbf{H}(k)\bar{\mathbf{P}}(k|k-1)\mathbf{H}^{T}(k) + \bar{\mathbf{R}}(k)\right)^{-1}$   (29)

$\bar{\mathbf{P}}(k|k-1) = \mathbf{B}_{p}(k|k-1)\mathbf{C}_{x}^{-1}(k)\mathbf{B}_{p}^{T}(k|k-1)$   (30)

$\bar{\mathbf{R}}(k) = \mathbf{B}_{r}(k)\mathbf{C}_{y}^{-1}(k)\mathbf{B}_{r}^{T}(k)$   (31)

$\mathbf{C}_{x}(k) = \mathrm{diag}\left(G_{\sigma}(\tilde{e}_{1}(k)), \ldots, G_{\sigma}(\tilde{e}_{n}(k))\right)$   (32)

$\mathbf{C}_{y}(k) = \mathrm{diag}\left(G_{\sigma}(\tilde{e}_{n+1}(k)), \ldots, G_{\sigma}(\tilde{e}_{n+m}(k))\right)$   (33)

$\tilde{e}_{i}(k) = d_{i}(k) - \mathbf{w}_{i}(k)\hat{\mathbf{x}}(k|k)_{t-1}$   (34)

5) Compare the estimate of the current iteration with that of the previous iteration. If (35) holds, set $\hat{\mathbf{x}}(k|k) = \hat{\mathbf{x}}(k|k)_{t}$ and continue to 6); otherwise, set $t \leftarrow t + 1$ and go back to 4).

$\frac{\left\|\hat{\mathbf{x}}(k|k)_{t} - \hat{\mathbf{x}}(k|k)_{t-1}\right\|}{\left\|\hat{\mathbf{x}}(k|k)_{t-1}\right\|} \le \varepsilon$   (35)

6) Update the posterior covariance matrix by (36), set $k \leftarrow k + 1$, and go back to 2).

$\mathbf{P}(k|k) = \left(\mathbf{I} - \bar{\mathbf{K}}(k)\mathbf{H}(k)\right)\mathbf{P}(k|k-1)\left(\mathbf{I} - \bar{\mathbf{K}}(k)\mathbf{H}(k)\right)^{T} + \bar{\mathbf{K}}(k)\mathbf{R}(k)\bar{\mathbf{K}}^{T}(k)$   (36)
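The six steps above can be sketched for the scalar case $n = m = 1$, where the Cholesky factors reduce to square roots and $\bar{P}$, $\bar{R}$, $\bar{K}$ are scalars. The numerical setup below is a hypothetical toy example, not this paper's simulation:

```python
import math

def G(e, sigma):
    """Gaussian kernel G_sigma(e) = exp(-e^2 / (2 sigma^2))."""
    return math.exp(-e * e / (2.0 * sigma * sigma))

def mckf_step(x_prev, P_prev, y, F, H, Q, R, sigma, eps=1e-6, max_iter=20):
    """One scalar MCKF cycle: predict (10)-(11), fixed-point update (28)-(35),
    posterior covariance update (36). Note: with a very small bandwidth and a
    huge error, G can underflow to zero; a moderate sigma is assumed here."""
    # Step 2): predict; "Cholesky" factors are square roots in the scalar case.
    x_prior = F * x_prev
    P_prior = F * P_prev * F + Q
    Bp = math.sqrt(P_prior)
    Br = math.sqrt(R)
    # Elements of D(k) and W(k) in the regression model (17).
    d1, w1 = x_prior / Bp, 1.0 / Bp
    d2, w2 = y / Br, H / Br
    x_t = x_prior                                    # step 3): start from the prior
    for _ in range(max_iter):                        # step 4)
        cx = G(d1 - w1 * x_t, sigma)                 # (32)
        cy = G(d2 - w2 * x_t, sigma)                 # (33)
        P_bar = Bp * Bp / cx                         # (30)
        R_bar = Br * Br / cy                         # (31)
        K_bar = P_bar * H / (H * P_bar * H + R_bar)  # (29)
        x_new = x_prior + K_bar * (y - H * x_prior)  # (28)
        done = abs(x_new - x_t) <= eps * max(abs(x_t), 1e-12)   # step 5), (35)
        x_t = x_new
        if done:
            break
    # Step 6): posterior covariance, (36).
    P_post = (1 - K_bar * H) * P_prior * (1 - K_bar * H) + K_bar * R * K_bar
    return x_t, P_post

# Track a constant state near 1.0; the third measurement is an impulse.
x, P = 0.0, 1.0
for y in [1.0, 1.05, 20.0, 0.95]:
    x, P = mckf_step(x, P, y, F=1.0, H=1.0, Q=0.01, R=0.1, sigma=2.0)
```

With sigma = 2 the correntropy weight of the outlying measurement collapses toward zero, so $\bar{R}$ becomes huge, the gain goes to essentially zero, and the impulse is ignored; a plain KF step on the same data would be dragged far from the true state.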
Remark: As one can see, different from the traditional KF algorithm, the MCKF uses a fixed-point algorithm to update the posterior estimate of the state. The small positive number $\varepsilon$ provides a stop condition (or threshold) for the fixed-point iteration. Since the initial value of the fixed-point iteration is set at the prior estimate $\hat{\mathbf{x}}(k|k-1)$, the convergence to the optimal solution is usually very fast (in just a few iterations). The bandwidth $\sigma$ is a key parameter of the MCKF. In general, a smaller bandwidth makes the algorithm more robust (with respect to outliers) but converge more slowly. On the other hand, as $\sigma$ grows larger and larger, the MCKF behaves more and more like the ordinary KF algorithm. In particular, the following theorem holds.

Theorem 1: When the kernel bandwidth $\sigma \to \infty$, the MCKF reduces to the KF algorithm.
Proof: See the Appendix.

B. Computational Complexity
Next, we analyze the computational complexity of the proposed algorithm in terms of floating-point operations. The computational complexities of the relevant equations are given in Table I.

TABLE I
COMPUTATIONAL COMPLEXITIES OF SOME EQUATIONS

Equation | Multiplications and additions/subtractions | Divisions, matrix inversions, Cholesky decompositions and exponentiations
(10) | 2n² − n | 0
(11) | 4n³ − n² | 0
(12) | 4n²m + 4nm² − 3nm | O(m³)
(13) | 4nm | 0
(14) | 4n³ + 6n²m − 2n² + 2nm² − nm | 0
(28) | 4nm | 0
(29) | 4n²m + 4nm² − 3nm | O(m³)
(30) | 2n³ | n + O(n³)
(31) | 2m³ | m + O(m³)
(32) | 2n | n
(33) | 2m | m
(34) | 2n² + 2nm | 0
(36) | 4n³ + 6n²m − 2n² + 2nm² − nm | 0

The traditional Kalman filter involves equations (10)-(14). Thus, from Table I, one can conclude that the computational complexity of the Kalman filter is

$S_{KF} = 8n^{3} + 10n^{2}m - n^{2} + 6nm^{2} - n + O(m^{3})$   (37)

The MCKF algorithm mainly involves equations (10), (11), (28)-(34) and (36). Note that $\mathbf{C}_{x}(k)$ and $\mathbf{C}_{y}(k)$ are diagonal matrices, so their inverses are very easy to obtain. Assume that the average number of fixed-point iterations is $T$. Then, according to Table I, the computational complexity of the MCKF is

$S_{MCKF} = (2T + 8)n^{3} + (4T + 6)n^{2}m + (2T - 1)n^{2} + (4T + 2)nm^{2} + (3T - 1)nm + (4T - 1)n + 2Tm^{3} + 4Tm + T \cdot O(n^{3}) + 2T \cdot O(m^{3})$   (38)

The number of fixed-point iterations $T$ is relatively small in general (see the simulation results in the next section). Thus the computational complexity of the MCKF is moderate compared with that of the traditional KF algorithm.

C. Convergence Issue
A rigorous convergence analysis of the proposed MCKF algorithm is very complicated. In the following, we present only a sufficient condition that guarantees the convergence of the fixed-point iterations in the MCKF. The result is similar to that of [26] and hence will not be proved here. Let $\|\cdot\|_{p}$ denote an $l_{p}$-norm of a vector, or the induced norm of a matrix defined by $\|\mathbf{A}\|_{p} = \max_{\mathbf{X} \ne 0}\left(\|\mathbf{A}\mathbf{X}\|_{p} / \|\mathbf{X}\|_{p}\right)$, with $p \ge 1$, and let $\lambda_{\min}[\cdot]$ denote the minimum eigenvalue of a matrix. According to the results of [26], the following theorem holds.

Theorem 2: If

$\beta > \frac{\sum_{i=1}^{L}\left|d_{i}(k)\right|\left\|\mathbf{w}_{i}(k)\right\|}{\lambda_{\min}\left[\sum_{i=1}^{L}\mathbf{w}_{i}^{T}(k)\mathbf{w}_{i}(k)\right]}$

and $\sigma \ge \max\left\{\sigma^{*}, \sigma^{\dagger}\right\}$, in which $\sigma^{*}$ is the solution of the equation $\phi(\sigma) = \beta$, with

$\phi(\sigma) = \frac{\sum_{i=1}^{L}\left|d_{i}(k)\right|\left\|\mathbf{w}_{i}(k)\right\|}{\lambda_{\min}\left[\sum_{i=1}^{L} G_{\sigma}\left(\beta\left\|\mathbf{w}_{i}(k)\right\| + \left|d_{i}(k)\right|\right)\mathbf{w}_{i}^{T}(k)\mathbf{w}_{i}(k)\right]}, \quad \sigma \in (0, \infty)$   (39)

and $\sigma^{\dagger}$ is the solution of the equation $\psi(\sigma) = \alpha$ ($0 < \alpha < 1$), with

$\psi(\sigma) = \frac{\sum_{i=1}^{L}\left(\beta\left\|\mathbf{w}_{i}(k)\right\| + \left|d_{i}(k)\right|\right)\left\|\mathbf{w}_{i}(k)\right\|\left(\beta\left\|\mathbf{w}_{i}^{T}(k)\mathbf{w}_{i}(k)\right\| + \left\|\mathbf{w}_{i}^{T}(k)\right\|\left|d_{i}(k)\right|\right)}{\sigma^{2}\lambda_{\min}\left[\sum_{i=1}^{L} G_{\sigma}\left(\beta\left\|\mathbf{w}_{i}(k)\right\| + \left|d_{i}(k)\right|\right)\mathbf{w}_{i}^{T}(k)\mathbf{w}_{i}(k)\right]}$   (40)

then it holds that $\left\|\mathbf{f}\left(\mathbf{x}(k)\right)\right\| \le \beta$ and $\left\|\nabla_{\mathbf{x}(k)}\mathbf{f}\left(\mathbf{x}(k)\right)\right\| \le \alpha < 1$ for all $\mathbf{x}(k) \in \left\{\mathbf{x}(k) \in \mathbb{R}^{n} : \left\|\mathbf{x}(k)\right\| \le \beta\right\}$, where $\nabla_{\mathbf{x}(k)}\mathbf{f}\left(\mathbf{x}(k)\right)$ denotes the $n \times n$ Jacobian matrix of $\mathbf{f}\left(\mathbf{x}(k)\right)$ with respect to $\mathbf{x}(k)$, that is,

$\nabla_{\mathbf{x}(k)}\mathbf{f}\left(\mathbf{x}(k)\right) = \left[\frac{\partial \mathbf{f}\left(\mathbf{x}(k)\right)}{\partial x_{1}(k)}, \ldots, \frac{\partial \mathbf{f}\left(\mathbf{x}(k)\right)}{\partial x_{n}(k)}\right]$   (41)

with

$\frac{\partial \mathbf{f}\left(\mathbf{x}(k)\right)}{\partial x_{j}(k)} = -\frac{1}{\sigma^{2}}\mathbf{N}^{-1}\left[\sum_{i=1}^{L} e_{i}(k)\, w_{i}^{j}(k)\, G_{\sigma}\left(e_{i}(k)\right)\mathbf{w}_{i}^{T}(k)\mathbf{w}_{i}(k)\right]\mathbf{f}\left(\mathbf{x}(k)\right) + \frac{1}{\sigma^{2}}\mathbf{N}^{-1}\left[\sum_{i=1}^{L} e_{i}(k)\, w_{i}^{j}(k)\, G_{\sigma}\left(e_{i}(k)\right)\mathbf{w}_{i}^{T}(k)\, d_{i}(k)\right]$

where $\mathbf{N} = \sum_{i=1}^{L} G_{\sigma}\left(e_{i}(k)\right)\mathbf{w}_{i}^{T}(k)\mathbf{w}_{i}(k)$ and $w_{i}^{j}(k)$ is the $j$-th element of $\mathbf{w}_{i}(k)$. By Theorem 2, if the kernel bandwidth $\sigma$ is larger than a certain value, we have
$\left\|\nabla_{\mathbf{x}(k)}\mathbf{f}\left(\mathbf{x}(k)\right)\right\| \le \alpha < 1$   (42)

By the Banach fixed-point theorem [24], given an initial state estimate satisfying $\left\|\mathbf{x}(k)\right\| \le \beta$, the fixed-point iteration in the MCKF will surely converge to a unique fixed point in the range $\left\{\mathbf{x}(k) \in \mathbb{R}^{n} : \left\|\mathbf{x}(k)\right\| \le \beta\right\}$, provided that the kernel bandwidth $\sigma$ is larger than a certain value (e.g. $\sigma \ge \max\left\{\sigma^{*}, \sigma^{\dagger}\right\}$). Theorem 2 implies that the kernel bandwidth has a significant influence on the convergence behavior of the MCKF. If the kernel bandwidth is too small, the algorithm may diverge or converge very slowly. A larger kernel bandwidth ensures a faster convergence speed but usually leads to poorer performance in impulsive noises. In practical applications, the bandwidth can be set manually or optimized by trial and error.

IV. ILLUSTRATIVE EXAMPLES
In this section, we present two illustrative examples to demonstrate the performance of the proposed MCKF algorithm and compare it with the traditional KF algorithm.

A. Example 1
Consider the following linear system:

$\begin{bmatrix} x_{1}(k) \\ x_{2}(k) \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} x_{1}(k-1) \\ x_{2}(k-1) \end{bmatrix} + \begin{bmatrix} q_{1}(k-1) \\ q_{2}(k-1) \end{bmatrix}$   (43)

$y(k) = \begin{bmatrix} 1 & 1 \end{bmatrix}\begin{bmatrix} x_{1}(k) \\ x_{2}(k) \end{bmatrix} + r(k)$   (44)

where $\theta$ denotes the rotation angle. First, we consider the case in which the noises are all Gaussian, that is,

$q_{1}(k-1) \sim N(0, 0.01), \quad q_{2}(k-1) \sim N(0, 0.01), \quad r(k) \sim N(0, 0.01)$

Table II shows the MSEs of $x_{1}$ and $x_{2}$ for the different filters. Here the MSE is computed as an average over 100 independent Monte Carlo runs, and in each run, 1000 samples (time steps) are used to evaluate the MSE. Since all the noises are Gaussian, the Kalman filter performs very well and, in this example, achieves almost the best performance (that is, the smallest MSEs). One can also see that when the kernel bandwidth is too small, the MCKF may achieve a worse performance, while when the bandwidth becomes larger, its performance approaches that of the KF. Actually, it has been proved that when $\sigma \to \infty$, the MCKF reduces to the traditional KF. In general, one should choose a larger kernel bandwidth under Gaussian noises.

TABLE II
MSES OF $x_{1}$ AND $x_{2}$ IN GAUSSIAN NOISES

Filter | MSE of $x_{1}$ | MSE of $x_{2}$
KF | 0.035778 | 0.030052
MCKF | — | —

Second, we consider the case in which the process noises are still Gaussian but the observation noise is a heavy-tailed (impulsive) non-Gaussian noise with a mixed-Gaussian distribution, that is,

$q_{1}(k-1) \sim N(0, 0.01), \quad q_{2}(k-1) \sim N(0, 0.01), \quad r(k) \sim 0.9\,N(0, 0.01) + 0.1\,N(0, 100)$

Fig. 1 and Fig. 2 illustrate the probability densities of the estimation errors of $x_{1}$ and $x_{2}$. As one can see, in impulsive noises, when the kernel bandwidth is too small or too large, the performance of the MCKF is not good. In this case, however, with a proper kernel bandwidth, the MCKF can outperform the KF significantly, achieving a desirable error distribution with a higher peak and smaller dispersion. Again, when $\sigma$ is very large, the MCKF achieves almost the same performance as the KF.
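The heavy-tailed measurement noise $0.9\,N(0, 0.01) + 0.1\,N(0, 100)$ used above can be sampled by mixing two Gaussians; a short sketch (the seed and sample count are arbitrary choices):

```python
import random

def mixed_gaussian_noise(rng, p_impulse=0.1, std_nominal=0.1, std_impulse=10.0):
    """One draw from 0.9*N(0, 0.01) + 0.1*N(0, 100): with probability 0.1 the
    sample comes from the large-variance (impulsive) component."""
    std = std_impulse if rng.random() < p_impulse else std_nominal
    return rng.gauss(0.0, std)

rng = random.Random(0)
samples = [mixed_gaussian_noise(rng) for _ in range(5000)]
# Roughly 9% of the samples are far outside the nominal +/-0.3 range.
frac_large = sum(abs(v) > 1.0 for v in samples) / len(samples)
```

Most of the time the noise behaves like a small Gaussian disturbance, but about one sample in ten is drawn from a component with 1000 times the variance, which is exactly the kind of impulsive contamination that degrades an MMSE-based filter.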
Fig. 3 shows the number of fixed-point iterations at each time step $k$ for different kernel bandwidths. It is evident that the larger the kernel bandwidth, the faster the convergence. In particular, when the kernel bandwidth is large enough, the fixed-point algorithm in the MCKF converges to the optimal solution in just one or two iterations. In practical applications, to avoid slow convergence, the kernel bandwidth should not be set to a very small value. Similar results can also be seen from Table III, which shows the average number of fixed-point iterations per time step for the different filters, computed as averages over 100 independent Monte Carlo runs, with each run containing 1000 time steps.

Fig. 1. Probability densities of the $x_{1}$ estimation errors with different filters

Fig. 2. Probability densities of the $x_{2}$ estimation errors with different filters

Fig. 3. Number of fixed-point iterations at time step $k$ for different kernel bandwidths

TABLE III
AVERAGE ITERATION NUMBERS PER TIME STEP WITH DIFFERENT $\sigma$

Filter | Average Iteration Numbers
MCKF | —

We further investigate the influence of the threshold $\varepsilon$ on the performance. Table IV shows the MSEs of $x_{1}$ and $x_{2}$ for different $\varepsilon$, and Table V presents the corresponding average numbers of fixed-point iterations. One can see that a smaller $\varepsilon$ usually results in slightly lower MSEs but needs more iterations to converge. Obviously, the influence of $\varepsilon$ is not significant compared with that of the kernel bandwidth $\sigma$.

TABLE IV
MSES OF $x_{1}$ AND $x_{2}$ WITH DIFFERENT $\varepsilon$

Filter | MSE of $x_{1}$ | MSE of $x_{2}$
MCKF | — | —

TABLE V
AVERAGE ITERATION NUMBERS PER TIME STEP WITH DIFFERENT $\varepsilon$

Filter | Average Iteration Numbers
MCKF | —

Fig. 4 and Fig. 5 show the true and the estimated values of $x_{1}(k)$ and $x_{2}(k)$ with the KF and the MCKF. The results clearly indicate that the MCKF can achieve much better tracking performance than the traditional KF algorithm.

Fig. 4. The true and the estimated values of $x_{1}$

Fig. 5. The true and the estimated values of $x_{2}$

B. Example 2
Now we consider a practical example of one-dimensional uniformly accelerated linear motion. The state vector is $\mathbf{x}(k) = \left[x(k), \dot{x}(k), \ddot{x}(k)\right]^{T}$, in which $x(k)$ is the position, $\dot{x}(k)$ denotes the speed, and $\ddot{x}(k)$ stands for the acceleration. We assume that there are certain noises in the system and only the speed can be observed, which is also affected by some measurement disturbances. $T$ represents the measurement time interval (in seconds). Then, the state and measurement equations are given by

$\begin{bmatrix} x(k) \\ \dot{x}(k) \\ \ddot{x}(k) \end{bmatrix} = \begin{bmatrix} 1 & T & 0 \\ 0 & 1 & T \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} x(k-1) \\ \dot{x}(k-1) \\ \ddot{x}(k-1) \end{bmatrix} + \begin{bmatrix} q_{1}(k-1) \\ q_{2}(k-1) \\ q_{3}(k-1) \end{bmatrix}$   (45)

$y(k) = \begin{bmatrix} 0 & 1 & 0 \end{bmatrix}\begin{bmatrix} x(k) \\ \dot{x}(k) \\ \ddot{x}(k) \end{bmatrix} + r(k)$   (46)

First, the process noises are assumed to be Gaussian and the measurement noise is a heavy-tailed (impulsive) non-Gaussian noise with a mixed-Gaussian distribution, that is,

$q_{1}(k-1) \sim N(0, 0.01), \quad q_{2}(k-1) \sim N(0, 0.01), \quad q_{3}(k-1) \sim N(0, 0.01), \quad r(k) \sim 0.9\,N(0, 0.01) + 0.1\,N(0, 100)$

Further, the initial values of the true state, estimated state and covariance matrix are assumed to be

$\mathbf{x}(0) = [0\ 0\ 1]^{T}, \quad \hat{\mathbf{x}}(0|0) = [0\ 0\ 1]^{T} + N(0, 0.01) \times [1\ 1\ 1]^{T}, \quad \mathbf{P}(0|0) = 0.01 \cdot \mathrm{diag}\{1, 1, 1\}$
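The system matrices of (45)-(46) can be written down directly. The sketch below uses an assumed sampling interval T = 0.1 s (the source does not give the value) and checks one noise-free propagation step from the initial state [0, 0, 1]^T:

```python
def ca_model(T):
    """Constant-acceleration model (45)-(46): state [position, speed,
    acceleration], with only the speed observed."""
    F = [[1.0, T, 0.0],
         [0.0, 1.0, T],
         [0.0, 0.0, 1.0]]
    H = [0.0, 1.0, 0.0]
    return F, H

def mat_vec(F, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(f * v for f, v in zip(row, x)) for row in F]

F, H = ca_model(T=0.1)                           # T = 0.1 s is an assumed value
x1 = mat_vec(F, [0.0, 0.0, 1.0])                 # one noise-free step
speed_obs = sum(h * v for h, v in zip(H, x1))    # noise-free measurement y(1)
```

After one step the unit acceleration has produced a speed of T = 0.1 while the acceleration stays at 1, and the measurement picks out exactly the speed component, matching the observation matrix [0 1 0].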
Figs. 6-8 show the probability densities of the estimation errors of $x$, $\dot{x}$ and $\ddot{x}$ for the KF and the MCKF, and Table VI summarizes the corresponding MSEs. These results confirm again that the proposed MCKF can outperform the traditional KF significantly when the system is disturbed by Gaussian process noises and a non-Gaussian measurement noise.

TABLE VI
MSES OF $x$, $\dot{x}$ AND $\ddot{x}$ IN GAUSSIAN PROCESS NOISES AND NON-GAUSSIAN MEASUREMENT NOISE

Filter | MSE of $x$ (m²) | MSE of $\dot{x}$ (m²/s²) | MSE of $\ddot{x}$ (m²/s⁴)
KF | 50.7874 | — | —
MCKF | 10.1444 | — | —

Fig. 6. Probability densities of the $x$ estimation errors for the KF and the MCKF in Gaussian process noises and non-Gaussian measurement noise

Fig. 7. Probability densities of the $\dot{x}$ estimation errors for the KF and the MCKF in Gaussian process noises and non-Gaussian measurement noise

Fig. 8. Probability densities of the $\ddot{x}$ estimation errors for the KF and the MCKF in Gaussian process noises and non-Gaussian measurement noise

Next, we consider the situation where the process and measurement noises are all non-Gaussian with mixed-Gaussian distributions, that is,

$q_{1}(k-1) \sim 0.9\,N(0, 0.01) + 0.1\,N(0, 1)$
$q_{2}(k-1) \sim 0.9\,N(0, 0.01) + 0.1\,N(0, 1)$
$q_{3}(k-1) \sim 0.9\,N(0, 0.01) + 0.1\,N(0, 1)$
$r(k) \sim 0.9\,N(0, 0.01) + 0.1\,N(0, 100)$

With the same initial values and parameter settings as before, the results are shown in Figs. 9-11 and Table VII. As expected, the MCKF performs much better than the traditional KF when the system is disturbed by non-Gaussian process and measurement noises.
Fig. 9. Probability densities of the $x$ estimation errors for the KF and the MCKF in non-Gaussian process and measurement noises

Fig. 10. Probability densities of the $\dot{x}$ estimation errors for the KF and the MCKF in non-Gaussian process and measurement noises

Fig. 11. Probability densities of the $\ddot{x}$ estimation errors for the KF and the MCKF in non-Gaussian process and measurement noises

TABLE VII
MSES OF $x$, $\dot{x}$ AND $\ddot{x}$ IN NON-GAUSSIAN PROCESS AND MEASUREMENT NOISES

Filter | MSE of $x$ (m²) | MSE of $\dot{x}$ (m²/s²) | MSE of $\ddot{x}$ (m²/s⁴)
KF | 114.8233 | — | —
MCKF | 44.1290 | — | —

V. CONCLUSION
A new Kalman-type filtering algorithm, called the maximum correntropy Kalman filter (MCKF), has been proposed in this work. The MCKF is derived by using the maximum correntropy criterion (MCC) as the optimality criterion, instead of the well-known minimum mean square error (MMSE) criterion. The propagation equations for the prior estimates of the state and covariance matrix in the MCKF are the same as those in the KF. However, different from the KF, the MCKF uses a novel fixed-point algorithm to update the posterior estimates. The computational complexity of the MCKF is moderate, and its convergence is ensured if the kernel bandwidth is larger than a certain value. When the kernel bandwidth is large enough, the MCKF behaves like the KF. With a proper kernel bandwidth, the MCKF can outperform the KF significantly, especially when the underlying system is disturbed by impulsive non-Gaussian noises.

APPENDIX

A. Derivation of the Formula (26)

$\mathbf{W}(k) = \mathbf{B}^{-1}(k)\begin{bmatrix} \mathbf{I} \\ \mathbf{H}(k) \end{bmatrix} = \begin{bmatrix} \mathbf{B}_{p}^{-1}(k|k-1) & \mathbf{0} \\ \mathbf{0} & \mathbf{B}_{r}^{-1}(k) \end{bmatrix}\begin{bmatrix} \mathbf{I} \\ \mathbf{H}(k) \end{bmatrix} = \begin{bmatrix} \mathbf{B}_{p}^{-1}(k|k-1) \\ \mathbf{B}_{r}^{-1}(k)\mathbf{H}(k) \end{bmatrix}$   (A.1)

$\mathbf{C}(k) = \begin{bmatrix} \mathbf{C}_{x}(k) & \mathbf{0} \\ \mathbf{0} & \mathbf{C}_{y}(k) \end{bmatrix}$   (A.2)

$\mathbf{D}(k) = \mathbf{B}^{-1}(k)\begin{bmatrix} \hat{\mathbf{x}}(k|k-1) \\ \mathbf{y}(k) \end{bmatrix} = \begin{bmatrix} \mathbf{B}_{p}^{-1}(k|k-1)\hat{\mathbf{x}}(k|k-1) \\ \mathbf{B}_{r}^{-1}(k)\mathbf{y}(k) \end{bmatrix}$   (A.3)

By (A.1) and (A.2), we have

$\left(\mathbf{W}^{T}(k)\mathbf{C}(k)\mathbf{W}(k)\right)^{-1} = \left[\left(\mathbf{B}_{p}\mathbf{C}_{x}^{-1}\mathbf{B}_{p}^{T}\right)^{-1} + \mathbf{H}^{T}(k)\left(\mathbf{B}_{r}\mathbf{C}_{y}^{-1}\mathbf{B}_{r}^{T}\right)^{-1}\mathbf{H}(k)\right]^{-1}$   (A.4)

where, for simplicity, we denote $\mathbf{B}_{p}(k|k-1)$ by $\mathbf{B}_{p}$, $\mathbf{B}_{r}(k)$ by $\mathbf{B}_{r}$, $\mathbf{C}_{x}(k)$ by $\mathbf{C}_{x}$ and $\mathbf{C}_{y}(k)$ by $\mathbf{C}_{y}$. Using the matrix inversion lemma with the identification

$\mathbf{B}_{p}\mathbf{C}_{x}^{-1}\mathbf{B}_{p}^{T} \to \mathbf{A}, \quad \mathbf{H}^{T}(k) \to \mathbf{B}, \quad \mathbf{B}_{r}\mathbf{C}_{y}^{-1}\mathbf{B}_{r}^{T} \to \mathbf{C}, \quad \mathbf{H}(k) \to \mathbf{D}$

we arrive at

$\left(\mathbf{W}^{T}(k)\mathbf{C}(k)\mathbf{W}(k)\right)^{-1} = \mathbf{B}_{p}\mathbf{C}_{x}^{-1}\mathbf{B}_{p}^{T} - \mathbf{B}_{p}\mathbf{C}_{x}^{-1}\mathbf{B}_{p}^{T}\mathbf{H}^{T}(k)\left(\mathbf{H}(k)\mathbf{B}_{p}\mathbf{C}_{x}^{-1}\mathbf{B}_{p}^{T}\mathbf{H}^{T}(k) + \mathbf{B}_{r}\mathbf{C}_{y}^{-1}\mathbf{B}_{r}^{T}\right)^{-1}\mathbf{H}(k)\mathbf{B}_{p}\mathbf{C}_{x}^{-1}\mathbf{B}_{p}^{T}$   (A.5)

Further, by (A.1)-(A.3), we derive

$\mathbf{W}^{T}(k)\mathbf{C}(k)\mathbf{D}(k) = \left(\mathbf{B}_{p}\mathbf{C}_{x}^{-1}\mathbf{B}_{p}^{T}\right)^{-1}\hat{\mathbf{x}}(k|k-1) + \mathbf{H}^{T}(k)\left(\mathbf{B}_{r}\mathbf{C}_{y}^{-1}\mathbf{B}_{r}^{T}\right)^{-1}\mathbf{y}(k)$   (A.6)

Combining (25), (A.5) and (A.6), we obtain (26).

B. Proof of Theorem 1

$\lim_{\sigma \to \infty} G_{\sigma}\left(e_{i}(k)\right) = \lim_{\sigma \to \infty} \exp\left(-\frac{e_{i}^{2}(k)}{2\sigma^{2}}\right) = 1$   (A.7)

It follows easily that

$\lim_{\sigma \to \infty} \mathbf{C}_{x}(k) = \lim_{\sigma \to \infty} \mathrm{diag}\left(G_{\sigma}(e_{1}(k)), \ldots, G_{\sigma}(e_{n}(k))\right) = \mathrm{diag}(1, \ldots, 1) = \mathbf{I}$   (A.8)

$\lim_{\sigma \to \infty} \mathbf{C}_{y}(k) = \lim_{\sigma \to \infty} \mathrm{diag}\left(G_{\sigma}(e_{n+1}(k)), \ldots, G_{\sigma}(e_{n+m}(k))\right) = \mathrm{diag}(1, \ldots, 1) = \mathbf{I}$   (A.9)

$\lim_{\sigma \to \infty} \bar{\mathbf{P}}(k|k-1) = \lim_{\sigma \to \infty} \mathbf{B}_{p}(k|k-1)\mathbf{C}_{x}^{-1}(k)\mathbf{B}_{p}^{T}(k|k-1) = \mathbf{B}_{p}(k|k-1)\mathbf{B}_{p}^{T}(k|k-1) = \mathbf{P}(k|k-1)$   (A.10)

$\lim_{\sigma \to \infty} \bar{\mathbf{R}}(k) = \lim_{\sigma \to \infty} \mathbf{B}_{r}(k)\mathbf{C}_{y}^{-1}(k)\mathbf{B}_{r}^{T}(k) = \mathbf{B}_{r}(k)\mathbf{B}_{r}^{T}(k) = \mathbf{R}(k)$   (A.11)

$\lim_{\sigma \to \infty} \bar{\mathbf{K}}(k) = \lim_{\sigma \to \infty} \bar{\mathbf{P}}(k|k-1)\mathbf{H}^{T}(k)\left(\mathbf{H}(k)\bar{\mathbf{P}}(k|k-1)\mathbf{H}^{T}(k) + \bar{\mathbf{R}}(k)\right)^{-1} = \mathbf{P}(k|k-1)\mathbf{H}^{T}(k)\left(\mathbf{H}(k)\mathbf{P}(k|k-1)\mathbf{H}^{T}(k) + \mathbf{R}(k)\right)^{-1} = \mathbf{K}(k)$   (A.12)

Combining (A.12) and (13), we have

$\lim_{\sigma \to \infty} \hat{\mathbf{x}}(k|k) = \hat{\mathbf{x}}(k|k-1) + \mathbf{K}(k)\left(\mathbf{y}(k) - \mathbf{H}(k)\hat{\mathbf{x}}(k|k-1)\right)$   (A.13)

Combining (A.12) and (14), we have

$\lim_{\sigma \to \infty} \mathbf{P}(k|k) = \left(\mathbf{I} - \mathbf{K}(k)\mathbf{H}(k)\right)\mathbf{P}(k|k-1)\left(\mathbf{I} - \mathbf{K}(k)\mathbf{H}(k)\right)^{T} + \mathbf{K}(k)\mathbf{R}(k)\mathbf{K}^{T}(k)$   (A.14)

which are exactly the KF updates (13) and (14). This completes the proof.
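The limit (A.7) behind Theorem 1 is easy to check numerically: as the bandwidth grows, every correntropy weight approaches 1, so $\mathbf{C}_{x}$ and $\mathbf{C}_{y}$ approach the identity and the MCKF gain collapses to the KF gain. A sketch with arbitrary error values:

```python
import math

def G(e, sigma):
    """Gaussian kernel G_sigma(e) = exp(-e^2 / (2 sigma^2))."""
    return math.exp(-e * e / (2.0 * sigma * sigma))

errors = [0.3, -1.7, 4.0]
weights_small = [G(e, 1.0) for e in errors]   # far below 1 for large errors
weights_large = [G(e, 1e4) for e in errors]   # all essentially 1, as in (A.7)
```

With sigma = 1 the largest error is weighted by about exp(-8), i.e. roughly 3e-4, while with sigma = 1e4 every weight exceeds 1 - 1e-7; plugged into (30)-(31), the latter case gives $\bar{\mathbf{P}} \approx \mathbf{P}$ and $\bar{\mathbf{R}} \approx \mathbf{R}$, i.e. the ordinary KF.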
REFERENCES

[1] R. E. Kalman, "A new approach to linear filtering and prediction problems," Trans. ASME-J. Basic Eng., vol. 82, series D, pp. 35-45, 1960.
[2] A. Bryson and Y. Ho, Applied Optimal Control. New York, NY: Wiley, 1975.
[3] N. E. Nahi, Estimation Theory and Applications. New York, NY: Wiley, 1969.
[4] J. M. Morris, "The Kalman filter: A robust estimator for some classes of linear quadratic problems," IEEE Trans. Inf. Theory, vol. 22, no. 5, pp. 526-534, 1976.
[5] Z. Wu, J. Shi, X. Zhang, W. Ma and B. Chen, "Kernel recursive maximum correntropy," Signal Processing, vol. 117, pp. 11-16, 2015.
[6] J. C. Principe, Information Theoretic Learning: Renyi's Entropy and Kernel Perspectives. New York, NY, USA: Springer-Verlag, 2010.
[7] B. Chen, Y. Zhu, J. Hu and J. C. Principe, System Parameter Identification: Information Criteria and Algorithms. Oxford, U.K.: Newnes, 2013.
[8] W. Liu, P. P. Pokharel, and J. C. Principe, "Correntropy: Properties and applications in non-Gaussian signal processing," IEEE Trans. Signal Process., vol. 55, no. 11, pp. 5286-5298, 2007.
[9] B. Chen, L. Xing, J. Liang, N. Zheng, and J. C. Principe, "Steady-state mean-square error analysis for adaptive filtering under the maximum correntropy criterion," IEEE Signal Process. Lett., vol. 21, no. 7, pp. 880-884, 2014.
[10] S. Zhao, B. Chen and J. C. Principe, "Kernel adaptive filtering with maximum correntropy criterion," in Proc. Int. Joint Conference on Neural Networks (IJCNN), 2011, pp. 2012-2017.
[11] A. Singh and J. C. Principe, "Using correntropy as a cost function in linear adaptive filters," in Proc. Int. Joint Conference on Neural Networks (IJCNN), 2009, pp. 2950-2955.
[12] B. Chen and J. C. Principe, "Maximum correntropy estimation is a smoothed MAP estimation," IEEE Signal Process. Lett., vol. 19, pp. 491-494, 2012.
[13] W. Ma, H. Qu and J. Zhao, "Estimator with forgetting factor of correntropy and recursive algorithm for traffic network prediction," in Proc. Chinese Control and Decision Conference (CCDC), 2013, pp. 490-494.
[14] X. Chen, J. Yang, J. Liang and Q. Ye, "Recursive robust least squares support vector regression based on maximum correntropy criterion," Neurocomputing, vol. 97, pp. 63-73, 2012.
[15] R. He, B. Hu, X. Yuan, and L. Wang, Robust Recognition via Information Theoretic Learning. Amsterdam, The Netherlands: Springer, 2014.
[16] A. Singh and J. C. Principe, "A loss function for classification based on a robust similarity metric," in Proc. Int. Joint Conference on Neural Networks (IJCNN), 2010, pp. 1-6.
[17] A. Gunduz and J. C. Principe, "Correntropy as a novel measure for nonlinearity tests," Signal Process., vol. 89, no. 1, pp. 14-23, 2009.
[18] R. He, W. Zheng, and B. Hu, "Maximum correntropy criterion for robust face recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 8, pp. 1561-1576, 2011.
[19] R. He, B. Hu, W. Zheng, and X. Kong, "Robust principal component analysis based on maximum correntropy criterion," IEEE Trans. Image Process., vol. 20, no. 6, pp. 1485-1494, 2011.
[20] J. Xu and J. C. Principe, "A pitch detector based on a generalized correlation function," IEEE Trans. Audio, Speech, Lang. Process., vol. 16, no. 8, pp. 1420-1432, 2008.
[21] R. J. Bessa, V. Miranda, and J. Gama, "Entropy and correntropy against minimum square error in offline and online three-day ahead wind power forecasting," IEEE Trans. Power Syst., vol. 24, no. 4, pp. 1657-1666, 2009.
[22] L. Shi and Y. Lin, "Convex combination of adaptive filters under the maximum correntropy criterion in impulsive interference," IEEE Signal Process. Lett., vol. 21, no. 11, pp. 1385-1388, 2014.
[23] G. T. Cinar and J. C. Principe, "Hidden state estimation using the correntropy filter with fixed point update and adaptive kernel size," in Proc. IEEE World Cong. Comp. Intell. (WCCI), 2012, pp. 10-15.
[24] R. P. Agarwal, M. Meehan, and D. O'Regan, Fixed Point Theory and Applications. Cambridge, U.K.: Cambridge Univ. Press, 2001.
[25] A. Singh and J. C. Principe, "A closed form recursive solution for maximum correntropy training," in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing (ICASSP), 2010, pp. 2070-2073.
[26] B. Chen, J. Wang, H. Zhao, N. Zheng, and J. C. Principe, "Convergence of a fixed-point algorithm under maximum correntropy criterion," IEEE Signal Process. Lett., vol. 22, no. 10, pp. 1723-1727, 2015.
[27] G. Tambini, G. C. Montanari and M. Cacciari, "The Kalman filter as a way to estimate the life-model parameters of insulating materials and system," in Proc. 4th Inter. Conf. on Conduction and Breakdown in Solid Dielectrics, 1992, pp. 523-527.
[28] D. Unsal and M. Dogan, "Implementation of identification system for IMUs based on Kalman filtering," in Proc. IEEE/ION Position, Location and Navigation Symposium (PLANS), 2014, pp. 236-240.
[29] A. Yadav, N. Naik, M. R. Ananthasayanam, A. Gaur and Y. N. Singh, "A constant gain Kalman filter approach to target tracking in wireless sensor networks," in Proc. 7th IEEE Inter. Conf. on Industrial and Information Systems (ICIIS), 2012, pp. 1-7.
[30] "A simultaneous localization and mapping algorithm based on Kalman filtering," in Proc. IEEE Intelligent Vehicles Symposium, 2004, pp. 631-635.