SVD-based Kalman Filter Derivative Computation
J.V. Tsyganova and M.V. Kulikova
Abstract — Recursive adaptive filtering methods are often used for solving the problem of simultaneous state and parameter estimation arising in many areas of research. The gradient-based schemes for adaptive Kalman filtering (KF) require the corresponding filter sensitivity computations. The standard approach is based on the direct differentiation of the KF equations. The shortcoming of this strategy is a numerical instability of the conventional KF (and its derivatives) with respect to roundoff errors. For decades, special attention has been paid in the KF community to designing efficient filter implementations that improve robustness of the estimator against roundoff. The most popular and beneficial techniques are found in the class of square-root (SR) or UD factorization-based methods. They imply the Cholesky decomposition of the corresponding error covariance matrix. Another important matrix factorization method is the singular value decomposition (SVD) and, hence, further encouraging KF algorithms might be found under this approach. Meanwhile, the filter sensitivity computation heavily relies on the use of matrix differential calculus. Previous works on the robust KF derivative computation have produced the SR- and UD-based methodologies. Alternatively, in this paper we design the SVD-based approach. The solution is expressed in terms of the SVD-based KF covariance quantities and their derivatives (with respect to unknown system parameters). The results of numerical experiments illustrate that although the newly-developed SVD-based method is algebraically equivalent to the conventional approach and the previously derived SR- and UD-based strategies, it outperforms the mentioned techniques in estimation accuracy in ill-conditioned situations.
Index Terms — Kalman filter, filter sensitivity equations, SVD factorization, array algorithms.
I. INTRODUCTION
The problem of filter sensitivities evaluation plays a key role in many areas of research; for instance, in the state estimation and parameter identification realm [1], [2], in the field of optimal input design [3], [4], in information theory for computing the Fisher information matrix [5]–[7], etc. In this paper we explore linear discrete-time stochastic systems where the associated Kalman filter (KF) is used for estimating the unknown dynamic states. The standard approach for computing the filter sensitivities (with respect to unknown system parameters) is a direct differentiation of the KF equations. This conventional methodology is comprehensively studied in [3], [8], [9]. The shortcoming of this strategy is a numerical instability of the conventional KF (and its derivatives) with respect to roundoff errors, discussed in [10], [11]. Due to this fact, special attention has been paid in the KF community to designing robust KF implementation methods. The most popular techniques belong to the class of square-root (SR) or UD factorization-based methods; see [12]–[15] and many others. These algorithms imply the Cholesky decomposition and its modifications for the corresponding covariance matrix factorization [13], [16], [17]. We may note that the Cholesky decomposition exists and is unique when the symmetric matrix to be decomposed is positive definite [18]. If it is positive semi-definite, then the Cholesky decomposition still exists; however, it
is not unique [19]. Further encouraging KF implementation methods might be found with the use of the singular value decomposition (SVD). Some evidence of better estimation quality obtained under the SVD-based approach exists in the field of nonlinear filtering; for instance, see the discussion in [20]–[22] and others. For the linear filtering problem examined in this paper, the first SVD-based KF was, to the best of our knowledge, designed in [23]. Our recent analysis exposes that the mentioned SVD-based filter can be further improved for enhancing its numerical robustness. This result is comprehensively studied in [24], where some new stable SVD-based KF implementations are designed. Despite the existence of inherently more stable SR-, UD- and SVD-based KF variants, the problem of robust filter derivative computation is seldom addressed in practice because of its complicated matter. The solution to the mentioned problem heavily relies on the use of matrix differential calculus. The first SR-based information-type algorithm for the KF derivative computations belongs to Bierman et al. and appeared in 1990; see [25]. Alternatively, the SR-based covariance-type method was proposed in [26], as was the UD-based scheme designed in [27]. Later on, a general “differentiated” SR-based methodology was designed for both orthogonal and J-orthogonal transformations involved in the filtering equations (and their derivatives) in [28]–[30].

Manuscript received ??; revised ??. The second author thanks the support of the Portuguese National Fund (Fundação para a Ciência e a Tecnologia) within the scope of project UID/Multi/04621/2013. The first author is with Ulyanovsk State University, Str. L. Tolstoy 42, 432017 Ulyanovsk, Russian Federation. The second author is with CEMAT (Center for Computational and Stochastic Mathematics), Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais 1, 1049-001 Lisboa, Portugal. Emails: [email protected]; [email protected]
Alternatively, in this technical note we develop the SVD-based approach for the KF derivative computation. We show that the new technique is algebraically equivalent to the conventional “differentiated” KF, but it improves the robustness against roundoff errors, as do the existing “differentiated” SR- and UD-based methodologies. Moreover, motivated by the results obtained in the nonlinear filtering realm, we expect the newly-designed SVD-based method to outperform the previously derived algorithms when solving the parameter estimation problem, especially when the error covariance is ill-conditioned.

II. FILTER SENSITIVITY EQUATIONS: CONVENTIONAL APPROACH
Consider the state-space equations
$$x_k = F(\theta)\,x_{k-1} + B(\theta)\,u_{k-1} + G(\theta)\,w_{k-1}, \quad k \ge 1, \tag{1}$$
$$z_k = H(\theta)\,x_k + v_k, \quad v_k \sim \mathcal N\bigl(0, R(\theta)\bigr), \quad w_k \sim \mathcal N\bigl(0, \Omega(\theta)\bigr) \tag{2}$$
where $z_k \in \mathbb R^m$, $u_k \in \mathbb R^d$, $x_k \in \mathbb R^n$ and $\theta \in \mathbb R^p$ are, respectively, the vectors of available measurements, the known deterministic control input, the unknown dynamic state and the unknown system parameters that need to be estimated from the available experimental data $\{z_1, \ldots, z_N\}$. The process and the measurement noises are independent Gaussian zero-mean white-noise processes that are also independent of the initial state $x_0 \sim \mathcal N(\bar x_0, \Pi_0(\theta))$. The covariances are assumed to satisfy $\Omega(\theta) \ge 0$, $R(\theta) > 0$ and $\Pi_0(\theta) \ge 0$.

Equations (1), (2) represent a set of state-space models (SSMs). Each of them corresponds to a particular system parameter value. This means that for any fixed value of $\theta$, say $\hat\theta^*$, the system matrices are known, i.e. there is no uncertainty in model (1), (2). For simplicity, throughout the paper we write $F$ etc. instead of $F(\hat\theta^*)$ etc. when evaluating at the fixed point $\hat\theta^*$. The associated KF yields the linear least-mean-squares estimate of the unknown dynamic state that can be recursively computed via the equations [16, Theorem 9.2.1]:
$$e_k = z_k - H\hat x_{k|k-1}, \quad \hat x_{1|0} = \bar x_0, \quad k \ge 1, \tag{3}$$
$$K_{p,k} = F P_{k|k-1} H^T R_{e,k}^{-1}, \quad R_{e,k} = R + H P_{k|k-1} H^T, \tag{4}$$
$$\hat x_{k+1|k} = F \hat x_{k|k-1} + B u_k + K_{p,k} e_k \tag{5}$$
where $\{e_k\}$ are the innovations of the discrete-time KF. An important property of the KF for Gaussian SSMs is $e_k \sim \mathcal N(0, R_{e,k})$. The matrix $P_{k|k-1} = \mathrm E\{(x_k - \hat x_{k|k-1})(x_k - \hat x_{k|k-1})^T\}$ is the one-step-ahead predicted error covariance matrix computed as follows:
$$P_{k+1|k} = F P_{k|k-1} F^T + G \Omega G^T - K_{p,k} R_{e,k} K_{p,k}^T, \quad P_{1|0} = \Pi_0. \tag{6}$$
The conventional approach for deriving the related sensitivity model is based on differentiation of the corresponding filtering equations.
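Before differentiating, it is convenient to see the recursion (3)–(6) itself in code. The following is a minimal NumPy sketch for illustration only, not the authors' MATLAB library; `Om` stands for $\Omega$:

```python
import numpy as np

def kf_step(x_pred, P_pred, z, u, F, H, B, G, Om, R):
    """One iteration of the conventional covariance KF, Eqs. (3)-(6):
    maps (x_hat_{k|k-1}, P_{k|k-1}) to (x_hat_{k+1|k}, P_{k+1|k})."""
    e = z - H @ x_pred                                  # innovation, Eq. (3)
    Re = R + H @ P_pred @ H.T                           # innovation covariance, Eq. (4)
    Kp = F @ P_pred @ H.T @ np.linalg.inv(Re)           # gain K_{p,k}, Eq. (4)
    x_next = F @ x_pred + B @ u + Kp @ e                # state prediction, Eq. (5)
    P_next = (F @ P_pred @ F.T + G @ Om @ G.T           # Riccati recursion, Eq. (6)
              - Kp @ Re @ Kp.T)
    return x_next, P_next, e, Re
```

The explicit subtraction in Eq. (6) is precisely the operation that can destroy the symmetry and positive (semi-)definiteness of the computed $P_{k+1|k}$ under roundoff, which motivates the factorized implementations discussed below.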
Let $A(\theta) \in \mathbb R^{m \times n}$, $B(\theta) \in \mathbb R^{n \times q}$ be matrices whose entries are differentiable functions of the parameter vector $\theta \in \mathbb R^p$. The $m \times n$ matrix $\partial_i A = \partial A / \partial\theta_i$ denotes the partial derivative of $A$ with respect to the $i$-th component of $\theta$, $i = 1, \ldots, p$. The $m \times n$ matrix $\mathrm d A = \sum_{i=1}^p (\partial_i A)\,(\mathrm d\theta_i)$ is the differential form of the first-order derivatives of $A(\theta)$. Taking into account the matrix product rule of differentiation [31, p. 955], $\mathrm d(AB) = (\mathrm d A)B + A(\mathrm d B)$, and the fact $\mathrm d I = 0$, we derive $\mathrm d(A^{-1}) = -A^{-1}(\mathrm d A)A^{-1}$ for any square and invertible matrix $A$ (this is also known as the Jacobi formula); see also [8, p. 546]. Using these differentiation rules, the necessary differentials of (3)–(6) can be written as follows [8], [9]:
$$\mathrm d e_k = -\bigl[(\mathrm d H)\hat x_{k|k-1} + H(\mathrm d\hat x_{k|k-1})\bigr], \tag{7}$$
$$\mathrm d\hat x_{k+1|k} = (\mathrm d F)\hat x_{k|k-1} + F(\mathrm d\hat x_{k|k-1}) + (\mathrm d B)u_k + (\mathrm d K_{p,k})e_k + K_{p,k}(\mathrm d e_k), \tag{8}$$
$$\mathrm d K_{p,k} = (\mathrm d F)P_{k|k-1}H^T R_{e,k}^{-1} + F(\mathrm d P_{k|k-1})H^T R_{e,k}^{-1} + F P_{k|k-1}(\mathrm d H^T)R_{e,k}^{-1} - F P_{k|k-1}H^T R_{e,k}^{-1}(\mathrm d R_{e,k})R_{e,k}^{-1}, \tag{9}$$
$$\mathrm d R_{e,k} = \mathrm d R + (\mathrm d H)P_{k|k-1}H^T + H(\mathrm d P_{k|k-1})H^T + H P_{k|k-1}(\mathrm d H^T), \tag{10}$$
$$\begin{aligned}\mathrm d P_{k+1|k} &= (\mathrm d F)P_{k|k-1}F^T + F(\mathrm d P_{k|k-1})F^T + F P_{k|k-1}(\mathrm d F^T) + (\mathrm d G)\Omega G^T + G(\mathrm d\Omega)G^T + G\Omega(\mathrm d G^T)\\ &\quad - (\mathrm d K_{p,k})R_{e,k}K_{p,k}^T - K_{p,k}(\mathrm d R_{e,k})K_{p,k}^T - K_{p,k}R_{e,k}(\mathrm d K_{p,k}^T). \end{aligned}\tag{11}$$
In deriving the equations above we take into account that $\mathrm d z_k = 0$ and $\mathrm d u_k = 0$, because the observations $z_k$ and the control input $u_k$ do not depend on the parameters (i.e.
their realizations are independent of variations in $\theta$) and therefore have a differential equal to zero. We may also note that, except for the scalar factor $\mathrm d\theta_i$, $\partial_i A$ is a special case of $\mathrm d A$, so that to obtain partial-derivative forms from differential forms we only have to replace the operator $\mathrm d(\cdot)$ with $\partial_i(\cdot)$ everywhere, for $i = 1, \ldots, p$ [8, p. 546]. Hence, from (7)–(11) we obtain a set of $p$ vector equations, known as the filter sensitivity equations, for computing $\partial_i \hat x_{k+1|k}$, $i = 1, \ldots, p$, and a set of $p$ matrix equations, known as the Riccati-type sensitivity equations, for computing $\partial_i P_{k+1|k}$, $i = 1, \ldots, p$. This approach to the KF sensitivity model derivation is called the “differentiated KF”. Its main drawback is the numerical instability of the conventional KF (3)–(6), and inherently of its derivatives (7)–(11), with respect to roundoff errors. The goal of this paper is to design a robust methodology for updating the “differentiated” KF equations above in terms of the SVD factors (and their derivatives) of the error covariance matrices $P_{k|k-1}$ instead of using the full matrices $P_{k|k-1}$ (and their derivatives).

III. SVD FACTORIZATION-BASED KALMAN FILTERING
To the best of our knowledge, the first SVD-based KF was proposed by Wang et al. in 1992; see Eqs. (17), (22), (23) in [23, pp. 1225–1226]. Our recent research shows that although that implementation is inherently more stable than the KF (3)–(6), it is still sensitive to roundoff and poorly treats ill-conditioned problems. The cited analysis exposes that the SVD-based filter can be further improved for enhancing its numerical robustness. This result is comprehensively studied in [24], where new stable SVD-based KF implementations are designed. The readers are referred to the cited paper for the detailed derivations, numerical stability discussion and proofs. Here, we briefly outline the principal steps in the construction of the most advanced SVD-based KF variant. Next, we extend it to a stable filter sensitivities computation, which is the main purpose of this study.

Consider the SVD factorization [32, Theorem 2.8.1]: suppose $A \in \mathbb C^{m \times n}$, $\operatorname{rank} A = r$. There exist positive numbers $\sigma_1 \ge \ldots \ge \sigma_r > 0$ and unitary matrices $W \in \mathbb C^{m \times m}$ and $V \in \mathbb C^{n \times n}$ such that
$$A = W \Sigma V^*, \quad \Sigma = \begin{bmatrix} S & 0 \\ 0 & 0 \end{bmatrix} \in \mathbb C^{m \times n}, \quad S = \operatorname{diag}\{\sigma_1, \ldots, \sigma_r\}$$
where $V^*$ is the conjugate transpose of $V$. The diagonal entries of $\Sigma$ are known as the singular values of $A$. The non-zero $\sigma_i$ ($i = 1, \ldots, r$) are the square roots of the non-zero eigenvalues of both $A^*A$ and $AA^*$.

If $A$ is a square matrix such that $A^*A = AA^*$, then $A$ can be diagonalized using a basis of eigenvectors according to the spectral theorem, i.e. it can be factorized as $A = QDQ^*$ where $Q$ is a unitary matrix and $D$ is a diagonal matrix. If $A$ is also positive semi-definite, then the spectral decomposition above, $A = QDQ^*$, is also an SVD factorization, i.e. the diagonal matrix $D$ contains the singular values of $A$. For the SSMs examined in this paper, the initial error covariance $\Pi_0 \in \mathbb R^{n \times n}$ is a symmetric positive semi-definite matrix and, hence, the spectral decomposition implies $\Pi_0 = Q_{\Pi_0} D_{\Pi_0} Q_{\Pi_0}^T$ where $Q_{\Pi_0}$ and $D_{\Pi_0}$ are orthogonal and diagonal matrices, respectively. It is also an SVD factorization, i.e. the factor $D_{\Pi_0}$ contains the singular values of $\Pi_0$.

Now we are ready to present the SVD-based KF implementation developed recently in [24]. Instead of the conventional recursion (3)–(6) for $P_{k|k-1}$, we update only its SVD factors, $\{Q_{P_{k|k-1}}, D_{P_{k|k-1}}^{1/2}\}$, at each iteration step of the filter, as shown below.

INITIAL STEP ($k = 0$). Apply the SVD factorization to the initial error covariance matrix, $\Pi_0 = Q_{\Pi_0} D_{\Pi_0} Q_{\Pi_0}^T$, and, additionally, to the process and measurement noise covariances: $\Omega = Q_\Omega D_\Omega Q_\Omega^T$ and $R = Q_R D_R Q_R^T$, respectively. Set the initial values as follows: $Q_{P_{1|0}} = Q_{\Pi_0}$, $D_{P_{1|0}}^{1/2} = D_{\Pi_0}^{1/2}$ and $\hat x_{1|0} = \bar x_0$.

MEASUREMENT UPDATE ($k = 1, \ldots, N$).
Build the pre-arrays from the filter quantities that are currently available and then apply the SVD factorizations in order to obtain the corresponding SVD factors of the updated filter quantities as follows:
$$\underbrace{\begin{bmatrix} D_R^{1/2} Q_R^T \\[2pt] D_{P_{k|k-1}}^{1/2} Q_{P_{k|k-1}}^T H^T \end{bmatrix}}_{\text{Pre-array}} = W_{MU}^{(1)} \underbrace{\begin{bmatrix} D_{R_{e,k}}^{1/2} Q_{R_{e,k}}^T \\[2pt] 0 \end{bmatrix}}_{\text{Post-array SVD factors}}, \tag{12}$$
$$\bar K_k = \bigl(Q_{P_{k|k-1}} D_{P_{k|k-1}} Q_{P_{k|k-1}}^T\bigr) H^T Q_{R_{e,k}}, \tag{13}$$
$$\underbrace{\begin{bmatrix} D_{P_{k|k-1}}^{1/2} Q_{P_{k|k-1}}^T (I - K_k H)^T \\[2pt] D_R^{1/2} Q_R^T K_k^T \end{bmatrix}}_{\text{Pre-array}} = W_{MU}^{(2)} \underbrace{\begin{bmatrix} D_{P_{k|k}}^{1/2} Q_{P_{k|k}}^T \\[2pt] 0 \end{bmatrix}}_{\text{Post-array SVD factors}} \tag{14}$$
where we denote $K_k = \bar K_k D_{R_{e,k}}^{-1} Q_{R_{e,k}}^T$. The matrices $W_{MU}^{(1)} \in \mathbb R^{(m+n) \times (m+n)}$, $Q_{R_{e,k}} \in \mathbb R^{m \times m}$ and $W_{MU}^{(2)} \in \mathbb R^{(n+m) \times (n+m)}$, $Q_{P_{k|k}} \in \mathbb R^{n \times n}$ are the orthogonal matrices of the corresponding
SVD factorizations in (12), (14). Next, $D_{R_{e,k}}^{1/2} \in \mathbb R^{m \times m}$ and $D_{P_{k|k}}^{1/2} \in \mathbb R^{n \times n}$ are diagonal matrices with the square roots of the singular values of $R_{e,k}$ and $P_{k|k}$, respectively. It can be easily seen that the required SVD factors of the innovation covariance $R_{e,k}$, i.e. $\{Q_{R_{e,k}}, D_{R_{e,k}}^{1/2}\}$, and of the a posteriori error covariance matrix $P_{k|k}$, i.e. $\{Q_{P_{k|k}}, D_{P_{k|k}}^{1/2}\}$, are directly read off from the post-array factors in (12) and (14), respectively. Finally, find the a posteriori estimate $\hat x_{k|k}$ through the equations
$$\hat x_{k|k} = \hat x_{k|k-1} + \bar K_k D_{R_{e,k}}^{-1} \bar e_k, \quad \bar e_k = Q_{R_{e,k}}^T\bigl(z_k - H \hat x_{k|k-1}\bigr). \tag{15}$$

TIME UPDATE ($k = 1, \ldots, N$). Build the pre-array and apply the SVD factorization to obtain the a priori error covariance SVD factors $\{Q_{P_{k+1|k}}, D_{P_{k+1|k}}^{1/2}\}$ as follows:
$$\underbrace{\begin{bmatrix} D_{P_{k|k}}^{1/2} Q_{P_{k|k}}^T F^T \\[2pt] D_\Omega^{1/2} Q_\Omega^T G^T \end{bmatrix}}_{\text{Pre-array}} = W_{TU} \underbrace{\begin{bmatrix} D_{P_{k+1|k}}^{1/2} Q_{P_{k+1|k}}^T \\[2pt] 0 \end{bmatrix}}_{\text{Post-array SVD factors}} \tag{16}$$
and find the a priori estimate $\hat x_{k+1|k}$ as follows:
$$\hat x_{k+1|k} = F \hat x_{k|k} + B u_k. \tag{17}$$

The SVD-based KF implementation above is formulated in two-stage form, whereas, following [15], the conventional KF (3)–(6) is expressed in the so-called “condensed” form. Nevertheless, these KF variants are algebraically equivalent. This is easy to prove if we take into account the SVD factorization $A = W\Sigma V^T$ and the properties of orthogonal matrices. Indeed, for each pre-array to be decomposed we have $A^T A = (V\Sigma^T W^T)(W\Sigma V^T) = V\Sigma^T\Sigma V^T$. Next, by comparing both sides of the obtained matrix equations, we arrive at the corresponding SVD-based KF formulas. The detailed derivation can be found in [24].

IV. FILTER SENSITIVITY EQUATIONS: SVD-BASED APPROACH
To begin constructing the “differentiated” SVD-based method for computing the filter sensitivities, we note that the underlying SVD-based filter is formulated in so-called array form. This makes modern KF algorithms better suited to parallel implementation and to very large scale integration (VLSI) implementation, as mentioned in [15]. Each iteration of the SVD-based filter examined has the following pattern: given a pre-array $A \in \mathbb R^{(k+s) \times s}$, compute the post-array SVD factors $W \in \mathbb R^{(k+s) \times (k+s)}$, $\Sigma \in \mathbb R^{(k+s) \times s}$ and $V \in \mathbb R^{s \times s}$ by means of the SVD factorization
$$A = W \Sigma V^T, \quad \Sigma = \begin{bmatrix} S \\ 0 \end{bmatrix}, \quad S = \operatorname{diag}\{\sigma_1, \ldots, \sigma_s\} \tag{18}$$
where the matrix $A$ is of full column rank, i.e. $\operatorname{rank} A = s$; the $W$, $V$ are orthogonal matrices and $S$ is a diagonal matrix with the singular values of the pre-array $A$.

The goal of our study is to develop a method that naturally extends formula (18) to the computation of the post-array SVD factors' derivatives. More precisely, the computational procedure is expected to utilize the pre-array $A$ and its derivative $A'_\theta$ for reproducing the SVD post-arrays $\{W, \Sigma, V\}$ together with their derivatives $\{W'_\theta, \Sigma'_\theta, V'_\theta\}$. To achieve our goal, we prove the result presented below. We also bear in mind that the SVD post-array factor $W$ is of no interest in the presented SVD-based KF for performing the next step of the filter recursion and, hence, the quantity $W'_\theta$ is not required to be computed.

Lemma 1:
Consider the SVD factorization in (18). Let the entries of the pre-array $A(\theta)$ be known differentiable functions of a scalar parameter $\theta$. We assume that $\sigma_i(\theta) \ne \sigma_j(\theta)$, $j \ne i$, for all $\theta$. Given the derivative of the pre-array, $A'_\theta$, the following formulas calculate the corresponding derivatives of the post-arrays:
$$\Sigma'_\theta = \begin{bmatrix} S'_\theta \\ 0 \end{bmatrix}, \quad S'_\theta = \operatorname{diag}\bigl[W^T A'_\theta V\bigr]_{s \times s}, \tag{19}$$
$$V'_\theta = V\bigl[\bar L_2^T - \bar L_2\bigr] \tag{20}$$
where $[W^T A'_\theta V]_{s \times s}$ denotes the main $s \times s$ block of the matrix product $W^T A'_\theta V$, and $\bar L_2$ is a strictly lower triangular matrix whose entries are computed as follows:
$$(\bar l_2)_{ij} = \frac{\bar u_{ji}\,\sigma_j + \bar l_{ij}\,\sigma_i}{\sigma_i^2 - \sigma_j^2}, \quad i = 2, \ldots, s, \quad j = 1, \ldots, i-1. \tag{21}$$
In the equation above, the quantities $\bar u_{ji}$ and $\bar l_{ij}$ denote the entries of the matrices $\bar U$ and $\bar L$, respectively. The $\bar L$, $\bar U$ are the strictly lower and strictly upper triangular parts of the matrix product $[W^T A'_\theta V]_{s \times s}$, respectively.

Proof:
By differentiating (18) with respect to $\theta$, we obtain
$$A'_\theta = W'_\theta \Sigma V^T + W \Sigma'_\theta V^T + W \Sigma (V^T)'_\theta. \tag{22}$$
Having applied a right multiplier $V$ and a left multiplier $W^T$ to equation (22), we have
$$W^T A'_\theta V = \bigl[W^T W'_\theta\bigr]\Sigma + \Sigma'_\theta + \Sigma\bigl[(V^T)'_\theta V\bigr]. \tag{23}$$
In deriving the equation above we take into account the properties of any orthogonal matrix $Q$, i.e. $QQ^T = Q^TQ = I$.

It is also easy to show that for any orthogonal matrix $Q$ the product $Q'_\theta Q^T$ is a skew-symmetric matrix. Indeed, by differentiating both sides of $QQ^T = I$ with respect to $\theta$, we get $Q'_\theta Q^T + Q(Q^T)'_\theta = 0$, or in the equivalent form $Q'_\theta Q^T = -(Q'_\theta Q^T)^T$. The latter implies that the matrix $Q'_\theta Q^T$ is skew-symmetric.

For the sake of simplicity we introduce the following notations: $\Upsilon = W^T W'_\theta$ and $\Lambda = V^T V'_\theta$. As discussed above, the matrices $\Upsilon \in \mathbb R^{(k+s) \times (k+s)}$ and $\Lambda \in \mathbb R^{s \times s}$ are skew-symmetric, because $W$ and $V$ are orthogonal matrices. Hence, we have $\Lambda^T = -\Lambda$. Taking this fact into account, we obtain the following partitioning of the matrix form of equation (23):
$$\begin{bmatrix} [W^T A'_\theta V]_{s \times s} \\ [W^T A'_\theta V]_{k \times s} \end{bmatrix} = \begin{bmatrix} [\Upsilon]_{s \times s} & [\Upsilon]_{s \times k} \\ [\Upsilon]_{k \times s} & [\Upsilon]_{k \times k} \end{bmatrix} \begin{bmatrix} S \\ 0 \end{bmatrix} + \begin{bmatrix} S'_\theta \\ 0 \end{bmatrix} - \begin{bmatrix} S \\ 0 \end{bmatrix} \Lambda.$$
From the equation above, we derive the formula for the main $s \times s$ block of the matrix product $W^T A'_\theta V$:
$$\bigl[W^T A'_\theta V\bigr]_{s \times s} = [\Upsilon]_{s \times s}\, S + S'_\theta - S\Lambda. \tag{24}$$
Hence, the diagonal matrix $S'_\theta$ obeys the equation
$$S'_\theta = \bigl[W^T A'_\theta V\bigr]_{s \times s} - [\Upsilon]_{s \times s}\, S + S\Lambda. \tag{25}$$
Now, let us discuss formula (25) in detail. Recall that $\Upsilon$ and $\Lambda$ are skew-symmetric matrices and, hence, their diagonal entries are equal to zero. The multiplication of any skew-symmetric matrix by a diagonal matrix does not change this structure, i.e. the diagonal entries of the matrix products $[\Upsilon]_{s \times s}\, S$ and $S\Lambda$ are equal to zero as well.
Meanwhile, the matrix $[W^T A'_\theta V]_{s \times s}$ is a full matrix and contains a diagonal part. Hence, from equation (25) we conclude that the diagonal matrix $S'_\theta$ is, in fact, the diagonal part of the main $s \times s$ block of the matrix product $W^T A'_\theta V$. This completes the proof of the formulas in equation (19).

Finally, we need to compute $V'_\theta$ where $\Lambda = V^T V'_\theta$. Since $V$ is an orthogonal matrix, we obtain $V'_\theta = V\Lambda$. Next, any skew-symmetric matrix can be represented as the difference of a strictly lower triangular matrix and its transpose. Hence, the skew-symmetric matrices $[\Upsilon]_{s \times s}$ and $\Lambda$ can be represented as follows:
$$[\Upsilon]_{s \times s} = \bar L_1^T - \bar L_1, \qquad \Lambda = \bar L_2^T - \bar L_2 \tag{26}$$
where $\bar L_1$ and $\bar L_2$ are strictly lower triangular matrices.

Next, we split the matrix product $[W^T A'_\theta V]_{s \times s}$ into strictly lower triangular, diagonal and strictly upper triangular parts, i.e. $[W^T A'_\theta V]_{s \times s} = \bar L + D + \bar U$. It was proved above that $S'_\theta = D$. Taking this fact into account, the substitution of both formulas in (26) into (25) yields
$$\underbrace{D}_{S'_\theta} = \underbrace{\bar L + D + \bar U}_{[W^T A'_\theta V]_{s \times s}} - \underbrace{\bigl[\bar L_1^T - \bar L_1\bigr]}_{[\Upsilon]_{s \times s}} S + S \underbrace{\bigl[\bar L_2^T - \bar L_2\bigr]}_{\Lambda}. \tag{27}$$
Hence, we obtain
$$\bar L + \bar U = \bigl[\bar L_1^T - \bar L_1\bigr] S - S \bigl[\bar L_2^T - \bar L_2\bigr]. \tag{28}$$
In (28), the $\bar L$, $\bar L_1$, $\bar L_2$ are strictly lower triangular matrices, $\bar U$ is a strictly upper triangular matrix and $S$ is diagonal. Hence, equation (28) implies
$$\bar U = \bar L_1^T S - S \bar L_2^T, \qquad \bar L = -\bar L_1 S + S \bar L_2.$$
This system can be solved with respect to the entries of $\bar L_2$ as follows:
$$(\bar l_2)_{ij} = \frac{\bar u_{ji}\,\sigma_j + \bar l_{ij}\,\sigma_i}{\sigma_i^2 - \sigma_j^2}, \quad i = 2, \ldots, s, \quad j = 1, \ldots, i-1.$$
The formula above is exactly equation (21). Having computed the entries $(\bar l_2)_{ij}$ we can form the matrix $\Lambda = \bar L_2^T - \bar L_2$ in (26) and then compute the derivative $V'_\theta = V\Lambda$. This completes the proof of (20) and Lemma 1.

Remark 1:
The assumption that the singular values of $A(\theta)$ are distinct for all values of the parameter $\theta$ is necessary for avoiding division by zero in formula (21). In future work we intend, if possible, to relax this restriction, which reduces the practical applicability of the proposed method.

For readers' convenience, Algorithm 1 provides a pseudocode for the computational scheme derived in Lemma 1.

ALGORITHM 1. DIFFERENTIATED SVD($A$, $A'_\theta$)
Input: $A$, $A'_\theta$ (the pre-array and its derivative)
1. Apply the SVD factorization (18) to $A$. Save $W$, $S$, $V$.
2. Compute the matrix product $W^T A'_\theta V$.
3. Extract the main $s \times s$ block $M = [W^T A'_\theta V]_{s \times s}$.
4. Split $M = \bar L + D + \bar U$ into strictly lower triangular, diagonal and strictly upper triangular parts.
5. Given $\bar L$, $\bar U$ and $S$, compute the strictly lower triangular $\bar L_2$ by (21).
6. Find $V'_\theta = V\bigl[\bar L_2^T - \bar L_2\bigr]$.
7. Find $S'_\theta = D$. Hence, $\Sigma'_\theta = \bigl[(S'_\theta)^T \mid 0\bigr]^T$.
Output: $\Sigma$, $V$ and $\Sigma'_\theta$, $V'_\theta$ (the post-arrays and their derivatives)
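As a concrete illustration, Algorithm 1 admits a direct NumPy transcription. The following is a sketch under the assumptions of Lemma 1 (a full-column-rank pre-array with distinct singular values); the function name is ours, not code from [24]:

```python
import numpy as np

def differentiated_svd(A, dA):
    """Differentiated SVD of a pre-array (a sketch of Algorithm 1 / Lemma 1).

    Given the pre-array A and its derivative dA with respect to a scalar
    parameter, return S, V and their derivatives dS, dV, where
    A = W [S; 0] V^T is the SVD factorization (18)."""
    s_dim = A.shape[1]
    W, sigma, Vh = np.linalg.svd(A)               # full SVD: A = W @ Sigma @ Vh
    V = Vh.T
    # Lines 2-3: main s x s block of W^T A'_theta V
    M = (W.T @ dA @ V)[:s_dim, :s_dim]
    # Line 4: strictly lower, diagonal and strictly upper parts of M
    Lbar, D, Ubar = np.tril(M, -1), np.diag(np.diag(M)), np.triu(M, 1)
    # Line 5: strictly lower triangular L2 from Eq. (21)
    L2 = np.zeros((s_dim, s_dim))
    for i in range(1, s_dim):
        for j in range(i):
            L2[i, j] = (Ubar[j, i] * sigma[j] + Lbar[i, j] * sigma[i]) / (
                sigma[i] ** 2 - sigma[j] ** 2)
    # Lines 6-7: derivatives of the post-array factors
    dV = V @ (L2.T - L2)                          # Eq. (20)
    S, dS = np.diag(sigma), D                     # Eq. (19)
    return S, V, dS, dV
```

The output can be checked through the exact identity $(A^TA)'_\theta = V'_\theta S^2 V^T + V (S^2)'_\theta V^T + V S^2 (V'_\theta)^T$, which follows from $A^TA = VS^2V^T$; the accuracy test used in Example 1 below is of the same kind.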
The theoretical result presented in Lemma 1 can be further applied to the SVD factorization-based KF discussed in Section III. The obtained computational scheme is summarized in Algorithm 2 in the form of a pseudocode. The new “differentiated” SVD-based KF extends the underlying SVD-based filter to the derivative computation (with respect to unknown system parameters) for updating the corresponding filter sensitivity equations. The method can be used for replacing the conventional “differentiated KF” approach discussed in Section II by an inherently more stable approach, which is preferable for practical implementation. Finally, we would like to remark that any “differentiated” filtering technique consists of two parts: i) the underlying KF variant, and ii) its “differentiated” extension used for the filter sensitivities computation.

ALGORITHM 2. DIFFERENTIATED SVD-BASED KF($\bar x_0$, $\Pi_0$)
Initial Step ($k = 0$).
1. $\Omega = Q_\Omega D_\Omega Q_\Omega^T$ and $R = Q_R D_R Q_R^T$ (SVD factorizations)
2. $\Pi_0 = Q_{\Pi_0} D_{\Pi_0} Q_{\Pi_0}^T$ (SVD factorization)
3. Set $Q_{P_{1|0}} = Q_{\Pi_0}$, $D_{P_{1|0}}^{1/2} = D_{\Pi_0}^{1/2}$ and $\hat x_{1|0} = \bar x_0$.
4. Set $\partial_i Q_{P_{1|0}} = \partial_i Q_{\Pi_0}$, $\partial_i D_{P_{1|0}}^{1/2} = \partial_i D_{\Pi_0}^{1/2}$, $\partial_i \hat x_{1|0} = 0$.
Measurement Update ($k = 1, \ldots, N$).
5. Build the pre-array $A$ in (12) and its derivatives $\partial_i A$, $i = 1, \ldots, p$.
6. $[\Sigma, V, \partial_i\Sigma, \partial_i V] \leftarrow$ Differentiated SVD($A$, $\partial_i A$).
7. $\{D_{R_{e,k}}^{1/2}, \partial_i D_{R_{e,k}}^{1/2}\} \leftarrow$ read off from $\Sigma$, $\partial_i\Sigma$ ($i = 1, \ldots, p$).
8. $\{Q_{R_{e,k}}, \partial_i Q_{R_{e,k}}\} \leftarrow$ read off from $V$, $\partial_i V$ ($i = 1, \ldots, p$).
9. Find $\bar K_k$ from (13) and $K_k = \bar K_k D_{R_{e,k}}^{-1} Q_{R_{e,k}}^T$.
10. $\partial_i \bar K_k = \partial_i\bigl(Q_{P_{k|k-1}} D_{P_{k|k-1}} Q_{P_{k|k-1}}^T H^T Q_{R_{e,k}}\bigr)$, $i = 1, \ldots, p$.
11. Build the pre-array $A$ in (14) and its derivatives $\partial_i A$, $i = 1, \ldots, p$.
12. $[\Sigma, V, \partial_i\Sigma, \partial_i V] \leftarrow$ Differentiated SVD($A$, $\partial_i A$).
13. $\{D_{P_{k|k}}^{1/2}, \partial_i D_{P_{k|k}}^{1/2}\} \leftarrow$ read off from $\Sigma$, $\partial_i\Sigma$ ($i = 1, \ldots, p$).
14. $\{Q_{P_{k|k}}, \partial_i Q_{P_{k|k}}\} \leftarrow$ read off from $V$, $\partial_i V$ ($i = 1, \ldots, p$).
15. Find the a posteriori estimate $\hat x_{k|k}$ and $\bar e_k$ from (15).
16. $\partial_i \bar e_k = \bigl(\partial_i Q_{R_{e,k}}^T\bigr)\bigl[z_k - H\hat x_{k|k-1}\bigr] - Q_{R_{e,k}}^T\bigl[(\partial_i H)\hat x_{k|k-1} + H(\partial_i \hat x_{k|k-1})\bigr]$, $i = 1, \ldots, p$.
17. $\partial_i \hat x_{k|k} = \partial_i \hat x_{k|k-1} + (\partial_i \bar K_k) D_{R_{e,k}}^{-1} \bar e_k + \bar K_k D_{R_{e,k}}^{-1} (\partial_i \bar e_k) - \bar K_k D_{R_{e,k}}^{-1} \bigl(\partial_i D_{R_{e,k}}\bigr) D_{R_{e,k}}^{-1} \bar e_k$, $i = 1, \ldots, p$.
Time Update ($k = 1, \ldots, N$).
18. Build the pre-array $A$ in (16) and its derivatives $\partial_i A$, $i = 1, \ldots, p$.
19. $[\Sigma, V, \partial_i\Sigma, \partial_i V] \leftarrow$ Differentiated SVD($A$, $\partial_i A$).
20. $\{D_{P_{k+1|k}}^{1/2}, \partial_i D_{P_{k+1|k}}^{1/2}\} \leftarrow$ read off from $\Sigma$, $\partial_i\Sigma$ ($i = 1, \ldots, p$).
21. $\{Q_{P_{k+1|k}}, \partial_i Q_{P_{k+1|k}}\} \leftarrow$ read off from $V$, $\partial_i V$ ($i = 1, \ldots, p$).
22. Find the a priori estimate $\hat x_{k+1|k}$ from equation (17).
23. $\partial_i \hat x_{k+1|k} = (\partial_i F)\hat x_{k|k} + F(\partial_i \hat x_{k|k}) + (\partial_i B) u_k$, $i = 1, \ldots, p$.
End.
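Each array step of Algorithms 1 and 2 follows the same pattern: build the pre-array, factorize, read the updated quantities off the post-array. The following NumPy sanity check of the measurement-update array (12) is a sketch with arbitrary illustrative dimensions and randomly generated covariances, not the paper's example:

```python
import numpy as np

rng = np.random.default_rng(7)
n, m = 3, 2
# Random symmetric positive definite covariances P_{k|k-1} and R, and H
Xp = rng.standard_normal((n, n)); P = Xp @ Xp.T + n * np.eye(n)
Xr = rng.standard_normal((m, m)); R = Xr @ Xr.T + m * np.eye(m)
H = rng.standard_normal((m, n))

# SVD factors of P and R (for a symmetric PD matrix the SVD is Q D Q^T)
Qp, dp, _ = np.linalg.svd(P)
Qr, dr, _ = np.linalg.svd(R)

# Pre-array of Eq. (12): rows [D_R^{1/2} Q_R^T; D_P^{1/2} Q_P^T H^T]
pre = np.vstack([np.sqrt(dr)[:, None] * Qr.T,
                 np.sqrt(dp)[:, None] * (Qp.T @ H.T)])

# SVD of the pre-array: singular values give D_{R_e,k}^{1/2}, V gives Q_{R_e,k}
_, s, Vh = np.linalg.svd(pre, full_matrices=False)
Re_from_factors = Vh.T @ np.diag(s**2) @ Vh
print(np.allclose(Re_from_factors, R + H @ P @ H.T))   # reproduces Eq. (4)
```

The check confirms the algebraic equivalence argument of Section III: the Gramian of the pre-array equals $R + HPH^T = R_{e,k}$, so the post-array factors are exactly the SVD factors of the innovation covariance.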
In the same manner, one can naturally augment any existing SVD-based KF variant (see, for instance, the algorithms in [23], [24]), or a potentially new SVD-based KF implementation, with the corresponding filter sensitivities computation.

Finally, taking into account the properties of orthogonal matrices, it is not difficult to show that the negative log likelihood function (LF), given as [33]
$$\mathcal L\bigl(\theta, Z^N\bigr) = c + \frac12 \sum_{k=1}^N \bigl\{\ln(\det R_{e,k}) + e_k^T R_{e,k}^{-1} e_k\bigr\},$$
can be rewritten in terms of the SVD filter variables $Q_{R_{e,k}}$, $D_{R_{e,k}}$ and $\bar e_k$ appearing in equations (12)–(17) as follows:
$$\mathcal L\bigl(\theta, Z^N\bigr) = c + \frac12 \sum_{k=1}^N \bigl\{\ln\bigl(\det D_{R_{e,k}}\bigr) + \bar e_k^T D_{R_{e,k}}^{-1} \bar e_k\bigr\} \tag{29}$$
where $Z^N = \{z_1, \ldots, z_N\}$ is the $N$-step measurement history and $c = \frac{Nm}{2}\ln(2\pi)$ is a constant.

Taking into account that the matrix $D_{R_{e,k}}$ is diagonal and using the Jacobi formula, $\mathrm d(A^{-1}) = -A^{-1}(\mathrm d A)A^{-1}$, from (29) we obtain the expression for the log LF gradient evaluation in terms of the SVD filter variables and their derivatives computed in the newly-developed Algorithm 2 (for each $i = 1, \ldots, p$):
$$\partial_i \mathcal L\bigl(\theta, Z^N\bigr) = \frac12 \sum_{k=1}^N \Bigl\{\operatorname{tr}\bigl[\bigl(\partial_i D_{R_{e,k}}\bigr) D_{R_{e,k}}^{-1}\bigr] + 2\bigl(\partial_i \bar e_k^T\bigr) D_{R_{e,k}}^{-1} \bar e_k - \bar e_k^T D_{R_{e,k}}^{-1} \bigl(\partial_i D_{R_{e,k}}\bigr) D_{R_{e,k}}^{-1} \bar e_k\Bigr\}. \tag{30}$$
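Since $D_{R_{e,k}}$ is diagonal, one summand of (29) and of the gradient (30) needs only elementwise operations. In the sketch below (illustrative, not the authors' MATLAB code), the vector `d_re` holds the diagonal of $D_{R_{e,k}}$ and the `d_i_*` arguments hold the corresponding $\theta_i$-derivatives produced by Algorithm 2:

```python
import numpy as np

def log_lf_terms(d_re, e_bar, d_i_dre, d_i_ebar):
    """k-th summand of the negative log LF (29) and of its gradient (30),
    written in the SVD filter variables; D_{R_e,k} is diagonal, so no
    matrix inversion is needed, only elementwise division."""
    inv_d = 1.0 / d_re
    lf = np.sum(np.log(d_re)) + e_bar @ (inv_d * e_bar)
    grad = (np.sum(d_i_dre * inv_d)                        # tr[(d_i D) D^{-1}]
            + 2.0 * (d_i_ebar @ (inv_d * e_bar))           # 2 (d_i e)^T D^{-1} e
            - e_bar @ (inv_d * d_i_dre * inv_d * e_bar))   # - e^T D^{-1}(d_i D)D^{-1} e
    return 0.5 * lf, 0.5 * grad
```

The analytic gradient can be validated against a central finite difference of the summand, which is a useful unit test when wiring these quantities into a gradient-based optimizer.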
V. NUMERICAL EXPERIMENTS

Using a simple test problem, we would like to demonstrate thoroughly each step of the method summarized in Algorithm 1.
Example 1:
Given a pre-array $A(\theta)$ whose entries are differentiable functions of the scalar parameter $\theta$ (involving trigonometric terms such as $\sin\theta$, the term $1/\theta$ and polynomials in $\theta$), compute the corresponding SVD post-arrays $\Sigma$, $V$ and their derivatives $\Sigma'_\theta$, $V'_\theta$ at a fixed point $\hat\theta$.

Table I illustrates each step of the computational scheme in Algorithm 1. To assess the accuracy of the computations, we compute
$$l_\infty = \Bigl\|\,(A^T A)'_\theta\big|_{\theta=\hat\theta} - (V\Sigma^T\Sigma V^T)'_\theta\big|_{\theta=\hat\theta}\,\Bigr\|_\infty.$$
This quantity should be small. Indeed, taking into account the properties of diagonal and orthogonal matrices, from (18) we have $A^T A = V\Sigma^T W^T W\Sigma V^T = V\Sigma^T\Sigma V^T$. Hence, the derivatives of both sides of the last formula should coincide as well. In our numerical experiment the computed $l_\infty$ is at the level of machine precision. This justifies the correctness of the computations via Algorithm 1 and confirms the theoretical derivations in Lemma 1.

TABLE I: ALGORITHM 1 ILLUSTRATIVE CALCULATIONS FOR EXAMPLE 1. (The table reports the input pre-array $A$ and its derivative $A'_\theta$ evaluated at $\hat\theta$ and then, line by line for Lines 1–7 of Algorithm 1, the factors $W$, $\Sigma$, $V$, the product $M = [W^T A'_\theta V]_{s \times s}$, its split into strictly lower triangular, diagonal and strictly upper triangular parts, the matrix $\bar L_2$, and the outputs $\Sigma'_\theta$ and $V'_\theta$; the numerical entries are not reproduced here.)

Next, we wish to demonstrate how the novel method for the filter sensitivities evaluation (Algorithm 2) works in practice. For that, we consider the parameter estimation problem where a gradient-based optimization method is applied for finding the optimal value of the unknown system parameters. We test the conventional “differentiated” KF (Eqs. (3)–(11) in Section II) and the previously derived SR- and UD-based “differentiated” KF variants from [26] and [27], respectively, against the new “differentiated” SVD-based KF (Algorithm 2). As discussed in Section IV, all “differentiated” methods consist of two parts and, hence, they compute the log LF and its gradient simultaneously. These values are utilized by a gradient-based optimization method for maximizing the log LF with respect to the system parameters. Our library of codes is implemented in MATLAB, where we use the built-in optimization method fminunc.

Example 2:
Consider a linearized version of the in-track motiondynamic when a satellite travels in a circular orbit [34]: x k = . .
50 1 1 10 0 1 00 0 0 0 . x k − + w k − , Ω = q z k = (cid:20) δ (cid:21) x k + v k , R = (cid:20) θ δ θ δ (cid:21) where q = 0 . · − , x ∼ N (0 , θ I ) and θ is the unknownsystem parameter that needs to be estimated. In contrast to [34], wewish to test both well-conditioned and ill-conditioned situations. Forthat, following [17], we simulate the roundoff by parameter δ . It isassumed to be δ < ǫ roundoff , but δ > ǫ roundoff where ǫ roundoff denotes the unit roundoff error . When δ → ǫ roundoff , i.e. themachine precision limit, the problem above becomes ill-conditioned.By varying the ill-conditioning parameter δ , we are able to exploresome numerical insights of each method assessed.The numerical experiment is organized as follows. For each fixedvalue of ill-conditioning parameter δ , the SSM in Example 2 issimulated for θ ∗ = 5 to generate N = 100 measurements. Next,the unknown system parameter θ is estimated from the availableexperimental data, Z N = { z , . . . , z N } , by using gradient-basedadaptive KF techniques examined, i.e. by four “differentiated” KFmethods mentioned earlier in this Section. For a fair comparison,each “differentiated” algorithm utilizes the same data Z N and thesame initial value for the optimization method, ˆ θ (0) = 1 . Next, theobtained optimal estimate ˆ θ ∗ is compared with the “true” value of θ ∗ = 5 for assessing the estimation quality of each method. We repeatthe experiment M = 100 times and calculate a posterior mean of theestimate, the root mean squared error (RMSE) and the mean absolutepercentage error (MAPE) over Monte Carlo runs.Having carefully analyzed the obtained numerical results summa-rized in Table 2, we make a few important conclusions. First, all“differentiated” KF variants work equally well when δ is about − and − , i.e. when the problem is not ill-conditioned. 
This confirms that all "differentiated" techniques are algebraically equivalent. Second, among all the methods examined, the conventional approach (the "differentiated" KF) shows the worst performance. It degrades faster than any other algorithm when δ → ǫ_roundoff. Furthermore, the dashes in Table II mean that MATLAB cannot even run the algorithm. Third, we analyze the outcomes obtained by the other methods tested and observe that the UD- and SVD-based "differentiated" techniques produce a better estimation quality than the SR-based counterpart. This conclusion is reasonable if we recall that in this paper we do not explore the filtering algorithms themselves, but their differential form for the KF sensitivities computation. Any existing "differentiated" SR-based scheme requires the inversion of the triangular matrix R_{e,k}^{1/2}, which is a square-root factor of the innovation covariance R_{e,k}; see Eq. (6) in [26]. In contrast, the UD- and SVD-based "differentiated" methods involve the inversion of a diagonal matrix only, D_{R_{e,k}}; see (30) and Eq. (8) in [27]. Finally, we observe that the new SVD-based approach slightly outperforms the UD-based counterpart when δ → ǫ_roundoff.

Computer roundoff for floating-point arithmetic is characterized by a single parameter ǫ_roundoff, defined in different sources as the largest number such that either 1 + ǫ_roundoff = 1 or 1 + ǫ_roundoff/2 = 1 in machine precision.
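The inversion argument above can be sketched briefly. For a symmetric positive definite innovation covariance (the 2×2 matrix below is an arbitrary illustrative value, not a quantity from the paper), the SVD route inverts only the diagonal factor of singular values, whereas a Cholesky-based route must invert a full triangular factor:

```python
import numpy as np

# Illustrative symmetric positive definite "innovation covariance".
R_e = np.array([[4.0, 1.0],
                [1.0, 3.0]])

# SVD of an SPD matrix: R_e = U @ diag(s) @ U.T (here Vt equals U.T).
U, s, Vt = np.linalg.svd(R_e)

# Inversion touches only the diagonal: R_e^{-1} = U @ diag(1/s) @ U.T.
R_e_inv = (U / s) @ U.T

# The Cholesky route instead inverts a triangular factor L of R_e = L @ L.T.
L = np.linalg.cholesky(R_e)
R_e_inv_chol = np.linalg.inv(L).T @ np.linalg.inv(L)

assert np.allclose(R_e_inv, R_e_inv_chol)
assert np.allclose(R_e_inv @ R_e, np.eye(2))
```

Dividing by the scalar singular values is a numerically benign operation, while triangular inversion (or back-substitution) propagates rounding errors through all entries of the factor; this is one plausible reading of why the diagonal-inversion methods fare better here.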
TABLE II
Effect of roundoff errors in ill-conditioned test problems in Example 2; exact θ* = 5, M = 100 Monte Carlo runs. For each value of δ, the table reports the posterior mean, RMSE and MAPE (%) of the estimate for the "differentiated" KF, the "differentiated" SR-based KF, the "differentiated" UD-based KF and the "differentiated" SVD-based KF. [The numerical entries of the table are not reproduced here.]

In summary, the previously derived UD- and the new SVD-based techniques provide the best estimation quality when solving the parameter estimation problem by the gradient-based adaptive filtering methodology. This creates a strong background for their practical use. In our ill-conditioned test example, the new SVD-based approach even slightly outperforms the UD-based counterpart.

REFERENCES
[1] R. K. Mehra, "Approaches to adaptive filtering," IEEE Trans. Automat. Contr., vol. 17, no. 5, pp. 693–698, Oct. 1972.
[2] G. Bastin and M. R. Gevers, "Stable adaptive observers for nonlinear time-varying systems," IEEE Trans. Automat. Contr., vol. 33, no. 7, pp. 650–658, Jul. 1988.
[3] R. K. Mehra, "Optimal input signals for parameter estimation in dynamic systems – survey and new results," IEEE Trans. Automat. Contr., vol. AC-19, no. 6, pp. 753–768, Dec. 1974.
[4] N. K. Gupta and R. K. Mehra, "Computational aspects of maximum likelihood estimation and reduction in sensitivity function calculations," IEEE Trans. Automat. Contr., vol. AC-19, no. 6, pp. 774–783, Dec. 1974.
[5] P. A. Zadrozny and S. Mittnik, "Kalman-filtering methods for computing information matrices for time-invariant, periodic, and generally time-varying VARMA models and samples," Computers & Mathematics with Applications, vol. 28, no. 4, pp. 107–119, 1994.
[6] A. Klein and G. Mélard, "Computation of the Fisher information matrix for time series models," Journal of Computational and Applied Mathematics, vol. 64, no. 1, pp. 57–68, 1995.
[7] A. Klein, G. Mélard, and T. Zahaf, "Construction of the exact Fisher information matrix of Gaussian time series models by means of matrix differential rules," Linear Algebra and its Applications, vol. 321, no. 1, pp. 209–232, 2000.
[8] P. A. Zadrozny, "Analytic derivatives for estimation of linear dynamic models," Computers & Mathematics with Applications, vol. 18, no. 6, pp. 539–553, 1989.
[9] A. Klein and H. Neudecker, "A direct derivation of the exact Fisher information matrix of Gaussian vector state space models," Linear Algebra and its Applications, vol. 321, no. 1, pp. 233–238, 2000.
[10] M. Verhaegen and P. Van Dooren, "Numerical aspects of different Kalman filter implementations," IEEE Trans. Automat. Contr., vol. AC-31, no. 10, pp. 907–917, Oct. 1986.
[11] M. Verhaegen, "Round-off error propagation in four generally-applicable, recursive, least-squares estimation schemes," Automatica, vol. 25, no. 3, pp. 437–444, 1989.
[12] P. G. Kaminski, A. E. Bryson, and S. F. Schmidt, "Discrete square-root filtering: a survey of current techniques," IEEE Trans. Automat. Contr., vol. AC-16, no. 6, pp. 727–735, Dec. 1971.
[13] G. J. Bierman, Factorization Methods for Discrete Sequential Estimation. New York: Academic Press, 1977.
[14] A. H. Sayed and T. Kailath, "Extended Chandrasekhar recursion," IEEE Trans. Automat. Contr., vol. AC-39, no. 3, pp. 619–622, Mar. 1994.
[15] P. Park and T. Kailath, "New square-root algorithms for Kalman filtering," IEEE Trans. Automat. Contr., vol. 40, no. 5, pp. 895–899, May 1995.
[16] T. Kailath, A. H. Sayed, and B. Hassibi, Linear Estimation. New Jersey: Prentice Hall, 2000.
[17] M. Grewal and A. Andrews, Kalman Filtering: Theory and Practice. New Jersey: Prentice Hall, 2001.
[18] G. H. Golub and C. F. Van Loan, Matrix Computations. Baltimore, Maryland: Johns Hopkins University Press, 1983.
[19] N. J. Higham, "Analysis of the Cholesky decomposition of a semi-definite matrix," University of Manchester, Tech. Rep. MIMS EPrint: 2008.56, 1990.
[20] X. Zhang, W. Hu, Z. Zhao, Y. Wang, and Q. Wei, "SVD based Kalman particle filter for robust visual tracking," in Proceedings of the 19th International Conference on Pattern Recognition. Tampa, FL, USA: IEEE, 2008, pp. 1–4.
[21] O. Straka, J. Dunik, M. Simandl, and J. Havlik, "Aspects and comparison of matrix decompositions in unscented Kalman filter," in Proceedings of the IEEE American Control Conference (ACC), 2013, pp. 3075–3080.
[22] Q. Zhang, X. Meng, S. Zhang, and Y. Wang, "Singular Value Decomposition-based Robust Cubature Kalman Filtering for an Integrated GPS/SINS Navigation System," Journal of Navigation, vol. 68, no. 3, pp. 549–562, 2014.
[23] L. Wang, G. Libert, and P. Manneback, "Kalman Filter Algorithm based on Singular Value Decomposition," in Proceedings of the 31st Conference on Decision and Control. Tucson, AZ, USA: IEEE, 1992, pp. 1224–1229.
[24] M. V. Kulikova and J. V. Tsyganova, "Improved discrete-time Kalman filtering within singular value decomposition," IET Control Theory & Applications, 2016, (in progress). [Online]. Available: https://arxiv.org/abs/1611.03686
[25] G. J. Bierman, M. R. Belzer, J. S. Vandergraft, and D. W. Porter, "Maximum likelihood estimation using square root information filters," IEEE Trans. Automat. Contr., vol. 35, no. 12, pp. 1293–1298, Dec. 1990.
[26] M. V. Kulikova, "Likelihood gradient evaluation using square-root covariance filters," IEEE Trans. Automat. Contr., vol. 54, no. 3, pp. 646–651, Mar. 2009.
[27] J. V. Tsyganova and M. V. Kulikova, "State sensitivity evaluation within UD based array covariance filter," IEEE Trans. Automat. Contr., vol. 58, no. 11, pp. 2944–2950, Nov. 2013.
[28] M. V. Kulikova and A. Pacheco, "Kalman filter sensitivity evaluation with orthogonal and J-orthogonal transformations," IEEE Trans. Automat. Contr., vol. 58, no. 7, pp. 1798–1804, Jul. 2013.
[29] M. V. Kulikova and J. V. Tsyganova, "Constructing numerically stable Kalman filter-based algorithms for gradient-based adaptive filtering," International Journal of Adaptive Control and Signal Processing, vol. 29, no. 11, pp. 1411–1426, 2015.
[30] ——, "A unified square-root approach for the score and Fisher information matrix computation in linear dynamic systems," Mathematics and Computers in Simulation, vol. 119, pp. 128–141, 2016.
[31] H. Neudecker, "Some theorems on matrix differentiation with special reference to Kronecker matrix products," Journal of the American Statistical Association, vol. 64, no. 327, pp. 953–963, 1969.
[32] E. E. Tyrtyshnikov, A Brief Introduction to Numerical Analysis. Springer Science & Business Media, 2012.
[33] F. C. Schweppe, "Evaluation of likelihood functions for Gaussian signals," IEEE Trans. Inf. Theory, vol. IT-11, pp. 61–70, Jan. 1965.
[34] H. E. Rauch, C. T. Striebel, and F. Tung, "Maximum likelihood estimates of linear dynamic systems," AIAA Journal, vol. 3, no. 8, pp. 1445–1450, Aug. 1965.