Likelihood Gradient Evaluation Using Square-Root Covariance Filters
M.V. Kulikova
Abstract — Using the array form of numerically stable square-root implementation methods for Kalman filtering formulas, we construct a new square-root algorithm for the log-likelihood gradient (score) evaluation. This avoids the use of the conventional Kalman filter with its inherent numerical instabilities and improves the robustness of computations against roundoff errors. The new algorithm is developed in terms of covariance quantities and is based on the "condensed form" of the array square-root filter.
Index Terms — identification, maximum likelihood estimation, gradient methods, Kalman filtering, numerical stability.
I. INTRODUCTION
Consider the discrete-time linear stochastic system

\[
x_k = F_k x_{k-1} + G_k w_k, \tag{1}
\]
\[
z_k = H_k x_k + v_k, \qquad k = 1, \ldots, N, \tag{2}
\]

where x_k ∈ R^n and z_k ∈ R^m are, respectively, the state and the measurement vectors; k is a discrete time, i.e. x_k means x(t_k). The noises w_k ∈ R^q, v_k ∈ R^m and the initial state x_0 ∼ N(x̄_0, Π_0) are taken from mutually independent Gaussian distributions with zero mean and covariance matrices Q_k and R_k, respectively, i.e. w_k ∼ N(0, Q_k), v_k ∼ N(0, R_k). Additionally, system (1), (2) is parameterized by a vector of unknown system parameters θ ∈ R^p, which needs to be estimated. This means that the entries of the matrices F_k, G_k, H_k, Q_k, R_k and Π_0 are functions of θ ∈ R^p. However, for the sake of simplicity we will suppress the corresponding notations below, i.e. instead of F_k(θ), G_k(θ), H_k(θ), Q_k(θ), R_k(θ) and Π_0(θ) we will write F_k, G_k, H_k, Q_k, R_k and Π_0.

Solving the parameter estimation problem by the method of maximum likelihood requires the maximization of the likelihood function (LF) with respect to the unknown system parameters. It is often done by using a gradient approach, where the computation of the likelihood gradient (LG) is necessary. For the state-space system (1), (2) the negative Log LF is given as [1]:

\[
\mathcal{L}_\theta\left(Z^N\right) = \frac{1}{2} \sum_{k=1}^{N} \left\{ m \ln(2\pi) + \ln\left(\det R_{e,k}\right) + e_k^T R_{e,k}^{-1} e_k \right\}
\]

where Z^N = [z_1, ..., z_N] is the N-step measurement history and the e_k are the innovations, generated by the discrete-time Kalman filter (KF), with zero mean and covariance matrix R_{e,k}. They are e_k = z_k − H_k x̂_{k|k-1} and R_{e,k} = H_k P_{k|k-1} H_k^T + R_k, respectively. The KF defines the one-step-ahead predicted state estimate x̂_{k|k-1} and the one-step predicted error covariance matrix P_{k|k-1}.

Straightforward differentiation of the KF equations is a direct approach to the Log LG evaluation, known as the "score".
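Before differentiating, the Log LF itself can be evaluated with a conventional KF run in its one-step-ahead predicted form. The following sketch is illustrative only (the helper name and the test system are assumptions, not taken from the paper):

```python
import numpy as np

def negative_log_likelihood(F, G, H, Q, R, xbar0, Pi0, zs):
    """Negative log LF of the measurements zs, accumulated from the
    conventional KF innovations e_k and their covariances R_{e,k}."""
    x, P = xbar0.copy(), Pi0.copy()
    m = H.shape[0]
    nll = 0.0
    for z in zs:
        e = z - H @ x                        # innovation e_k
        Re = H @ P @ H.T + R                 # innovation covariance R_{e,k}
        nll += 0.5 * (m * np.log(2.0 * np.pi)
                      + np.log(np.linalg.det(Re))
                      + e @ np.linalg.solve(Re, e))
        K = F @ P @ H.T @ np.linalg.inv(Re)  # predicted-form gain
        x = F @ x + K @ e                    # xhat_{k+1|k}
        P = F @ P @ F.T - K @ Re @ K.T + G @ Q @ G.T  # P_{k+1|k}
    return nll
```

Each summand matches the innovations form of the negative Log LF above; the state and covariance recursions are the standard one-step-ahead predicted KF.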
This leads to a set of p vector equations, known as the filter sensitivity equations, for computing ∂x̂_{k|k-1}/∂θ, and a set of p matrix equations, known as the Riccati-type sensitivity equations, for computing ∂P_{k|k-1}/∂θ. Consequently, the main disadvantage of the standard approach is the numerical instability of the conventional KF, i.e. divergence due to the lack of reliability of the numerical algorithm. Solution of the matrix Riccati equation is a major cause of numerical difficulties in the conventional KF implementation, from the standpoint of computational load as well as from the standpoint of computational errors [2].

An alternative can be found in the so-called square-root filtering algorithms. It is well known that the numerical solution of the Riccati equation tends to be more robust against roundoff errors if Cholesky factors or modified Cholesky factors (such as the U^T DU algorithms [3]) of the covariance matrix are used as the dependent variables. The resulting KF implementation methods are called square-root filters (SRF). They are now generally preferred for practical use [2], [4], [5]. For more insight into the numerical properties of different KF implementation methods we refer to the celebrated paper of Verhaegen and Van Dooren [6].

Increasingly, the preferred form for algorithms in many fields is now the array form [7]. Several useful SRF algorithms for the KF formulas, formulated in the array form, have been recently proposed by Park and Kailath [8].

(Manuscript received September 14, 2007; revised May 9, 2008. M.V. Kulikova is with the School of Computational and Applied Mathematics, University of the Witwatersrand, South Africa; e-mail: [email protected].)
For these implementations the reliability of the filter estimates is expected to be better because numerically stable orthogonal transformations are used at each recursion step. Apart from numerical advantages, array SRF algorithms appear to be better suited to parallel and to very large scale integration (VLSI) implementations [8], [9].

The development of numerically stable implementation methods for the KF formulas has led to the hope that the Log LG (with respect to unknown system parameters) might be computed more accurately. For this problem, a number of questions arise:

• Is it possible to extend reliable array SRF algorithms to the case of the Log LG evaluation?

• If such methods exist, will they inherit the advantages of the source filtering implementations? In particular, will they improve the robustness of the computations against roundoff errors compared to the conventional KF technique? The question about suitability for parallel implementation is beyond the scope of this paper.

The first attempt to answer these questions belongs to Bierman et al. [10]. The authors used the square-root information filter, developed by Dyer and McReynolds [11] and later extended by Bierman [3], as a source filter implementation and constructed a method for score evaluation. The algorithm was developed in the form of measurement and time updates. However, the accuracy of the proposed method has not been investigated.

In contrast to the main result of [10], we focus on the dual class of KF implementation methods (that is, the class of covariance-type methods) and discuss the efficient Log LG evaluation in square-root covariance filters. More precisely, we consider the array form of the square-root covariance filter, the eSRCF, introduced in [8]. The purpose of this paper is to design a method for the Log LG evaluation in terms of the square-root covariance variables, i.e. in terms of the quantities that appear naturally in the eSRCF.
This avoids the use of the conventional KF with its inherent numerical instabilities and gives us an opportunity to improve the robustness of the Log LG computation against roundoff errors.

II. EXTENDED SQUARE-ROOT COVARIANCE FILTER
To achieve our goal, we are first going to present the extended square-root covariance filter (eSRCF), proposed in [8], and second, we will derive the expression for the Log LG evaluation in terms of the variables that are generated by the eSRCF implementation.
Notations to be used:
For the sake of simplicity, we denote the one-step predicted state estimate as x̂_k and the one-step predicted error covariance matrix as P_k. We use a Cholesky decomposition of the form P_k = P_k^{T/2} P_k^{1/2}, where P_k^{1/2} is an upper triangular matrix. Similarly, we define R_k^{1/2}, Q_k^{1/2} and R_{e,k}^{1/2}. For convenience we will write A^{-1/2} = (A^{1/2})^{-1} and A^{-T/2} = (A^{-1/2})^T, and ∂_{θ_i} A implies the partial derivative of the matrix A with respect to the i-th component of θ, i.e. ∂A/∂θ_i.

In this paper, we deal with the "condensed form" of the eSRCF [8]. (The "condensed form" of filtering algorithms refers to the case when the implementation method for the KF formulas is not divided into the measurement and time updates.) Assume that R_k > 0. Given Π_0^{1/2} and Π_0^{-T/2} x̄_0, recursively update P_k^{1/2} and P_k^{-T/2} x̂_k as follows:

\[
Q_k
\begin{bmatrix}
R_k^{1/2} & 0 & -R_k^{-T/2} z_k \\
P_k^{1/2} H_k^T & P_k^{1/2} F_k^T & P_k^{-T/2} \hat{x}_k \\
0 & Q_k^{1/2} G_k^T & 0
\end{bmatrix}
=
\begin{bmatrix}
R_{e,k}^{1/2} & \bar{K}_{p,k}^T & -\bar{e}_k \\
0 & P_{k+1}^{1/2} & P_{k+1}^{-T/2} \hat{x}_{k+1} \\
0 & 0 & \gamma_k
\end{bmatrix}
\tag{3}
\]

where Q_k is any orthogonal rotation that upper-triangularizes the first two (block) columns of the matrix on the left-hand side of (3); K̄_{p,k} = F_k P_k H_k^T R_{e,k}^{-1/2} and ē_k = R_{e,k}^{-T/2} e_k.

One can easily obtain the expression for the negative Log LF in terms of the eSRCF variables:

\[
\mathcal{L}_\theta\left(Z^N\right) = \frac{1}{2} \sum_{k=1}^{N} \left\{ m \ln(2\pi) + 2 \ln\left(\det R_{e,k}^{1/2}\right) + \bar{e}_k^T \bar{e}_k \right\}.
\tag{4}
\]

Let θ = [θ_1, ..., θ_p] denote the vector of parameters with respect to which the likelihood function is to be differentiated. Then from (4) we have

\[
\partial_{\theta_i} \mathcal{L}_\theta\left(Z^N\right) = \sum_{k=1}^{N} \left\{ \partial_{\theta_i}\left[\ln\left(\det R_{e,k}^{1/2}\right)\right] + \frac{1}{2}\, \partial_{\theta_i}\left[\bar{e}_k^T \bar{e}_k\right] \right\}.
\tag{5}
\]

Taking into account that the matrix R_{e,k}^{1/2} is upper triangular, we derive

\[
\partial_{\theta_i}\left[\ln\left(\det R_{e,k}^{1/2}\right)\right]
= \partial_{\theta_i}\left[\sum_{j=1}^{m} \ln\left(r_{e,k}^{jj}\right)\right]
= \sum_{j=1}^{m} \frac{1}{r_{e,k}^{jj}}\, \partial_{\theta_i} r_{e,k}^{jj}
= \mathrm{tr}\left[R_{e,k}^{-1/2} \cdot \partial_{\theta_i} R_{e,k}^{1/2}\right],
\qquad i = 1, \ldots, p,
\tag{6}
\]

where the r_{e,k}^{jj}, j = 1, ..., m, denote the diagonal elements of the matrix R_{e,k}^{1/2}.

Substitution of (6) into (5) yields the result that we are looking for:

\[
\partial_{\theta_i} \mathcal{L}_\theta\left(Z^N\right) = \sum_{k=1}^{N} \left\{ \mathrm{tr}\left[R_{e,k}^{-1/2} \cdot \partial_{\theta_i} R_{e,k}^{1/2}\right] + \bar{e}_k^T \cdot \partial_{\theta_i} \bar{e}_k \right\}.
\tag{7}
\]

Ultimately, our problem is to compute the Log LG (7) by using the eSRCF equation (3). Before we come to the main result of this paper, there are a few points to be considered. As can be seen from (7), the elements ē_k and R_{e,k}^{1/2} involved in the Log LG evaluation are obtained from the underlying filtering algorithm directly, i.e. from (3); no additional computations are needed. Hence, our aim is to explain how the two remaining quantities in the Log LG expression, ∂_{θ_i} ē_k and ∂_{θ_i} R_{e,k}^{1/2}, can be computed using quantities available from the eSRCF (3). To this end, the pre-array of (3) is augmented with the corresponding blocks of derivatives, and the same rotation Q_k is applied:

\[
Q_k
\begin{bmatrix}
R_k^{1/2} & 0 & -R_k^{-T/2} z_k & \partial_{\theta_i} R_k^{1/2} & 0 & \partial_{\theta_i}\left(-R_k^{-T/2} z_k\right) \\
P_k^{1/2} H_k^T & P_k^{1/2} F_k^T & P_k^{-T/2}\hat{x}_k & \partial_{\theta_i}\left(P_k^{1/2} H_k^T\right) & \partial_{\theta_i}\left(P_k^{1/2} F_k^T\right) & \partial_{\theta_i}\left(P_k^{-T/2}\hat{x}_k\right) \\
0 & Q_k^{1/2} G_k^T & 0 & 0 & \partial_{\theta_i}\left(Q_k^{1/2} G_k^T\right) & 0
\end{bmatrix}
=
\begin{bmatrix}
R_{e,k}^{1/2} & \bar{K}_{p,k}^T & -\bar{e}_k & X_i & Y_i & M_i \\
0 & P_{k+1}^{1/2} & P_{k+1}^{-T/2}\hat{x}_{k+1} & N_i & V_i & W_i \\
0 & 0 & \gamma_k & B_i & K_i & T_i
\end{bmatrix}
\tag{8}
\]

III. SUGGESTED SQUARE-ROOT METHOD FOR SCORE EVALUATION
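Before turning to the main result, the array relation (3) can be exercised numerically: one condensed eSRCF step is a QR factorization of the pre-array, from which the post-array blocks are read off. The sketch below is illustrative (it is not the paper's code; NumPy's QR, sign-normalized to a positive diagonal, plays the role of the rotation Q_k) and checks the blocks against the conventional KF recursion.

```python
import numpy as np

def esrcf_step(F, G, H, Qu, Ru, Pu, xhat, z):
    """One condensed eSRCF step.  Qu, Ru, Pu are upper-triangular square
    roots: Q = Qu.T @ Qu, R = Ru.T @ Ru, P = Pu.T @ Pu."""
    n, m, q = F.shape[0], H.shape[0], G.shape[1]
    # First two block columns of the pre-array in (3).
    A = np.block([[Ru,               np.zeros((m, n))],
                  [Pu @ H.T,         Pu @ F.T        ],
                  [np.zeros((q, m)), Qu @ G.T        ]])
    # Third block column: [-R^{-T/2} z ; P^{-T/2} xhat ; 0].
    c = np.concatenate([-np.linalg.solve(Ru.T, z),
                        np.linalg.solve(Pu.T, xhat),
                        np.zeros(q)])
    Qf, Rf = np.linalg.qr(A, mode='complete')   # A = Qf @ Rf
    s = np.sign(np.diag(Rf[:m + n]))            # enforce positive diagonal
    B = Rf[:m + n] * s[:, None]                 # post-array, first m+n rows
    d = Qf.T @ c
    d[:m + n] *= s
    Re_u, P1_u = B[:m, :m], B[m:, m:]           # R_e^{1/2}, P_{k+1}^{1/2}
    ebar, xtil = -d[:m], d[m:m + n]             # ebar_k, P^{-T/2} xhat_{k+1}
    return Re_u, P1_u, ebar, P1_u.T @ xtil      # last item: xhat_{k+1}
```

The test below confirms that the post-array reproduces R_{e,k}, P_{k+1}, the predicted state and the normalized innovation of a conventional KF step.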
We can now prove the following result.
Theorem 1:
Let the entries of the matrices F_k, G_k, H_k, Q_k, R_k, Π_0 describing the linear discrete-time stochastic system (1), (2) be differentiable functions of a parameter θ ∈ R^p. Then, in order to compute the Log LF and its gradient (with respect to the unknown system parameter θ), the eSRCF, which is used to filter the data, needs to be extended as follows. Assume that R_k > 0. Given the initial values Π_0^{1/2}, Π_0^{-T/2} x̄_0 and ∂_{θ_i} Π_0^{1/2}, ∂_{θ_i}(Π_0^{-T/2} x̄_0), recursively update P_k^{1/2}, P_k^{-T/2} x̂_k and ∂_{θ_i} P_k^{1/2}, ∂_{θ_i}(P_k^{-T/2} x̂_k) as follows:

I. Replace the eSRCF equation (3) by (8), where Q_k is any orthogonal rotation that upper-triangularizes the first two (block) columns of the matrix on the left-hand side of (8).

II. Having computed the elements of the right-hand side matrix in (8), calculate for each θ_i:

\[
\begin{bmatrix}
\partial_{\theta_i} R_{e,k}^{1/2} & \partial_{\theta_i} \bar{K}_{p,k}^T \\
0 & \partial_{\theta_i} P_{k+1}^{1/2}
\end{bmatrix}
=
\left[\bar{L}_i^T + D_i + \bar{U}_i\right]
\begin{bmatrix}
R_{e,k}^{1/2} & \bar{K}_{p,k}^T \\
0 & P_{k+1}^{1/2}
\end{bmatrix},
\tag{9}
\]

\[
\begin{bmatrix}
-\partial_{\theta_i} \bar{e}_k \\
\partial_{\theta_i}\left(P_{k+1}^{-T/2}\hat{x}_{k+1}\right)
\end{bmatrix}
=
\left[\bar{L}_i^T - \bar{L}_i\right]
\begin{bmatrix}
-\bar{e}_k \\
P_{k+1}^{-T/2}\hat{x}_{k+1}
\end{bmatrix}
+
\begin{bmatrix}
R_{e,k}^{1/2} & \bar{K}_{p,k}^T \\
0 & P_{k+1}^{1/2}
\end{bmatrix}^{-T}
\begin{bmatrix}
B_i & K_i
\end{bmatrix}^T \gamma_k
+
\begin{bmatrix}
M_i \\
W_i
\end{bmatrix},
\tag{10}
\]

where L̄_i, D_i and Ū_i are the strictly lower triangular, diagonal and strictly upper triangular parts of the following matrix product:

\[
\begin{bmatrix}
X_i & Y_i \\
N_i & V_i
\end{bmatrix}
\begin{bmatrix}
R_{e,k}^{-1/2} & -R_{e,k}^{-1/2}\bar{K}_{p,k}^T P_{k+1}^{-1/2} \\
0 & P_{k+1}^{-1/2}
\end{bmatrix}
= \bar{L}_i + D_i + \bar{U}_i.
\tag{11}
\]

III. Having determined ē_k, R_{e,k}^{1/2} and ∂_{θ_i} ē_k, ∂_{θ_i} R_{e,k}^{1/2}, compute the Log LF (4) and the Log LG (7).

Proof: As discussed earlier, the main difficulty in the score evaluation (7) is to define ∂_{θ_i} R_{e,k}^{1/2} and ∂_{θ_i} ē_k from the underlying filter, i.e. from (3). We divide the proof into two parts, first proving (9) for the ∂_{θ_i} R_{e,k}^{1/2} evaluation and then validating (10) for ∂_{θ_i} ē_k.

Part I.
Our goal is to express ∂_{θ_i} R_{e,k}^{1/2} in terms of the variables that appear naturally in the eSRCF implementation. First, we note that the eSRCF transformation in (3) has the form QA = B, where A is a rectangular matrix and Q is an orthogonal transformation that block upper-triangularizes A into B. If the matrix A is square and invertible, then given the matrix of derivatives A'_θ = [da_{ij}/dθ] we can compute B'_θ as follows [10]:

\[
B'_\theta = \left[L^T + D + U\right] B
\tag{12}
\]

where L, D and U are, respectively, the strictly lower triangular, diagonal and strictly upper triangular parts of the matrix Q A'_θ B^{-1}.

However, this idea cannot be applied to the eSRCF directly because the matrix to be triangularized, i.e. the first two (block) columns of the matrix on the left-hand side of (3), is not square and, hence, not invertible. By using the pseudoinverse (Moore-Penrose inverse) we avoid this obstacle and generalize the scheme of computations (12) to the case of the eSRCF (3).

To begin constructing the method for score evaluation, we augment the matrix to be triangularized by q columns of zeros. Hence, we obtain

\[
Q_k
\begin{bmatrix}
R_k^{1/2} & 0 & 0 \\
P_k^{1/2} H_k^T & P_k^{1/2} F_k^T & 0 \\
0 & Q_k^{1/2} G_k^T & 0
\end{bmatrix}
=
\begin{bmatrix}
R_{e,k}^{1/2} & \bar{K}_{p,k}^T & 0 \\
0 & P_{k+1}^{1/2} & 0 \\
0 & 0 & 0
\end{bmatrix}.
\tag{13}
\]

The matrices in (13) have dimensions (m+n+q) × (m+n+q). For the sake of simplicity, we denote the left-hand side and the right-hand side matrices of (13) as A_k and B_k, respectively. Then, by differentiating (13) with respect to the components of θ, we obtain

\[
\partial_{\theta_i} Q_k \cdot A_k + Q_k \cdot \partial_{\theta_i} A_k = \partial_{\theta_i} B_k.
\tag{14}
\]

Multiplication of both sides of (14) by the pseudoinverse matrix B_k^+ yields

\[
\partial_{\theta_i} B_k \cdot B_k^+
= \partial_{\theta_i} Q_k \left(A_k B_k^+\right) + Q_k \cdot \partial_{\theta_i} A_k \cdot B_k^+
= \partial_{\theta_i} Q_k \left(Q_k^T B_k B_k^+\right) + \left(Q_k \cdot \partial_{\theta_i} A_k\right) B_k^+.
\tag{15}
\]

One can easily obtain the explicit expression for B_k^+:

\[
B_k^+ =
\begin{bmatrix}
R_{e,k}^{-1/2} & -R_{e,k}^{-1/2}\bar{K}_{p,k}^T P_{k+1}^{-1/2} & 0 \\
0 & P_{k+1}^{-1/2} & 0 \\
0 & 0 & 0
\end{bmatrix}.
\tag{16}
\]

By using (8), we replace Q_k · ∂_{θ_i} A_k in (15) by the quantities already computed. Then, taking into account (16), we derive the equation for the (m+n) × (m+n) main block of the matrix ∂_{θ_i} B_k · B_k^+:

\[
\begin{bmatrix}
\partial_{\theta_i} R_{e,k}^{1/2} & \partial_{\theta_i}\bar{K}_{p,k}^T \\
0 & \partial_{\theta_i} P_{k+1}^{1/2}
\end{bmatrix}
\begin{bmatrix}
R_{e,k}^{1/2} & \bar{K}_{p,k}^T \\
0 & P_{k+1}^{1/2}
\end{bmatrix}^{-1}
=
\left[\partial_{\theta_i} Q_k \cdot Q_k^T\right]_{m+n}
+
\begin{bmatrix}
X_i & Y_i \\
N_i & V_i
\end{bmatrix}
\begin{bmatrix}
R_{e,k}^{1/2} & \bar{K}_{p,k}^T \\
0 & P_{k+1}^{1/2}
\end{bmatrix}^{-1}
\tag{17}
\]

where [∂_{θ_i} Q_k · Q_k^T]_{m+n} denotes the (m+n) × (m+n) main block of the matrix ∂_{θ_i} Q_k · Q_k^T.

As discussed in [10], the matrix ∂_{θ_i} Q_k · Q_k^T is skew symmetric and, hence, can be represented in the form L̄^T − L̄, where L̄ is strictly lower triangular.

Now let us consider the matrix equation (17). As can be seen, the matrix on the left-hand side of (17) is block upper triangular. Thus, the strictly lower triangular part of the matrix [∂_{θ_i} Q_k · Q_k^T]_{m+n} must exactly cancel the strictly lower triangular part of the second term on the right-hand side of (17). In other words, if

\[
\begin{bmatrix}
X_i & Y_i \\
N_i & V_i
\end{bmatrix}
\begin{bmatrix}
R_{e,k}^{1/2} & \bar{K}_{p,k}^T \\
0 & P_{k+1}^{1/2}
\end{bmatrix}^{-1}
= \bar{L}_i + D_i + \bar{U}_i,
\qquad \text{then} \qquad
\left[\partial_{\theta_i} Q_k \cdot Q_k^T\right]_{m+n} = \bar{L}_i^T - \bar{L}_i.
\tag{18}
\]

Substitution of (18) into (17) leads to the result

\[
\begin{bmatrix}
\partial_{\theta_i} R_{e,k}^{1/2} & \partial_{\theta_i}\bar{K}_{p,k}^T \\
0 & \partial_{\theta_i} P_{k+1}^{1/2}
\end{bmatrix}
=
\left[\bar{L}_i^T + D_i + \bar{U}_i\right]
\begin{bmatrix}
R_{e,k}^{1/2} & \bar{K}_{p,k}^T \\
0 & P_{k+1}^{1/2}
\end{bmatrix}.
\tag{19}
\]

Formulas (19) and (18) are, in fact, equations (9) and (11) of the proposed method for score evaluation. The theorem is half proved.

Part II.
We need to verify (10). By differentiating the last (block) column equation of the eSRCF (3),

\[
Q_k
\begin{bmatrix}
-R_k^{-T/2} z_k \\
P_k^{-T/2}\hat{x}_k \\
0
\end{bmatrix}
=
\begin{bmatrix}
-\bar{e}_k \\
P_{k+1}^{-T/2}\hat{x}_{k+1} \\
\gamma_k
\end{bmatrix},
\]

with respect to the components of θ, we obtain

\[
\begin{bmatrix}
-\partial_{\theta_i}\bar{e}_k \\
\partial_{\theta_i}\left(P_{k+1}^{-T/2}\hat{x}_{k+1}\right) \\
\partial_{\theta_i}\gamma_k
\end{bmatrix}
=
\partial_{\theta_i} Q_k \cdot Q_k^T \cdot Q_k
\begin{bmatrix}
-R_k^{-T/2} z_k \\
P_k^{-T/2}\hat{x}_k \\
0
\end{bmatrix}
+
Q_k
\begin{bmatrix}
-\partial_{\theta_i}\left(R_k^{-T/2} z_k\right) \\
\partial_{\theta_i}\left(P_k^{-T/2}\hat{x}_k\right) \\
0
\end{bmatrix}.
\tag{20}
\]

Next, we replace the last term in (20) with the quantities already computed and collected in the right-hand side matrix of (8). Furthermore, it is useful to note that the element ∂_{θ_i}γ_k is of no interest here. These two steps give us

\[
\begin{bmatrix}
-\partial_{\theta_i}\bar{e}_k \\
\partial_{\theta_i}\left(P_{k+1}^{-T/2}\hat{x}_{k+1}\right)
\end{bmatrix}
=
\left[\partial_{\theta_i} Q_k \cdot Q_k^T\right]_{m+n}
\begin{bmatrix}
-\bar{e}_k \\
P_{k+1}^{-T/2}\hat{x}_{k+1}
\end{bmatrix}
+
\left[\partial_{\theta_i} Q_k \cdot Q_k^T\right]_{\substack{\text{rows } 1:m+n \\ \text{cols last } q}} \gamma_k
+
\begin{bmatrix}
M_i \\
W_i
\end{bmatrix}
\tag{21}
\]

where [∂_{θ_i} Q_k · Q_k^T]_{rows 1:m+n, cols last q} stands for the (m+n) × q matrix composed of the entries located at the intersections of the last q columns with the first m+n rows of ∂_{θ_i} Q_k · Q_k^T.

Taking into account (18), from the equation above we obtain

\[
\begin{bmatrix}
-\partial_{\theta_i}\bar{e}_k \\
\partial_{\theta_i}\left(P_{k+1}^{-T/2}\hat{x}_{k+1}\right)
\end{bmatrix}
=
\left[\bar{L}_i^T - \bar{L}_i\right]
\begin{bmatrix}
-\bar{e}_k \\
P_{k+1}^{-T/2}\hat{x}_{k+1}
\end{bmatrix}
+
\left[\partial_{\theta_i} Q_k \cdot Q_k^T\right]_{\substack{\text{rows } 1:m+n \\ \text{cols last } q}} \gamma_k
+
\begin{bmatrix}
M_i \\
W_i
\end{bmatrix}
\tag{22}
\]

where L̄_i is the strictly lower triangular part of the matrix in (11).

Since ∂_{θ_i} Q_k · Q_k^T is skew symmetric, we can write

\[
\left[\partial_{\theta_i} Q_k \cdot Q_k^T\right]_{\substack{\text{rows } 1:m+n \\ \text{cols last } q}}
= -\left(\left[\partial_{\theta_i} Q_k \cdot Q_k^T\right]_{\substack{\text{rows last } q \\ \text{cols } 1:m+n}}\right)^T
\tag{23}
\]

where [∂_{θ_i} Q_k · Q_k^T]_{rows last q, cols 1:m+n} stands for the q × (m+n) matrix composed of the entries located at the intersections of the last q rows with the first m+n columns of ∂_{θ_i} Q_k · Q_k^T.

To evaluate the right-hand side of (23), we return to (15) and write it in matrix form:

\[
\begin{bmatrix}
\partial_{\theta_i} R_{e,k}^{1/2} & \partial_{\theta_i}\bar{K}_{p,k}^T & 0 \\
0 & \partial_{\theta_i} P_{k+1}^{1/2} & 0 \\
0 & 0 & 0
\end{bmatrix} B_k^+
=
\partial_{\theta_i} Q_k \cdot Q_k^T
+
\begin{bmatrix}
X_i & Y_i & 0 \\
N_i & V_i & 0 \\
B_i & K_i & 0
\end{bmatrix} B_k^+.
\tag{24}
\]

As can be seen, the last (block) row of the left-hand side matrix in (24) is zero. Thus, the last (block) row of the matrix ∂_{θ_i} Q_k · Q_k^T must exactly cancel the last (block) row of the second term in (24):

\[
\left[\partial_{\theta_i} Q_k \cdot Q_k^T\right]_{\substack{\text{rows last } q \\ \text{cols } 1:m+n}}
= -\begin{bmatrix} B_i & K_i \end{bmatrix}
\begin{bmatrix}
R_{e,k}^{1/2} & \bar{K}_{p,k}^T \\
0 & P_{k+1}^{1/2}
\end{bmatrix}^{-1}.
\tag{25}
\]

By substituting (25) into (23), we obtain

\[
\left[\partial_{\theta_i} Q_k \cdot Q_k^T\right]_{\substack{\text{rows } 1:m+n \\ \text{cols last } q}}
= \begin{bmatrix}
R_{e,k}^{1/2} & \bar{K}_{p,k}^T \\
0 & P_{k+1}^{1/2}
\end{bmatrix}^{-T}
\begin{bmatrix} B_i & K_i \end{bmatrix}^T.
\tag{26}
\]

The final substitution of (26) into (22) validates (10) of the proposed method for the Log LG evaluation. This completes the proof.

Remark 1:
The method for score evaluation introduced above has been derived from the eSRCF implementation. As a consequence, the proposed method is of covariance type.
Remark 2:
The new square-root algorithm for score evaluation naturally extends the eSRCF and, hence, consists of two parts: the "filtered" part and the "differentiated" part. This structure allows the Log LF and its gradient to be computed simultaneously. Thus, the method is ideal for simultaneous state estimation and parameter identification.
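As an illustration of how the "differentiated" part operates, the following sketch carries out steps (8), (11) and (9) for a single recursion. It is a hypothetical implementation, not the paper's code: for brevity the process-noise block is dropped (G_k = 0), and a sign-normalized QR factorization plays the role of the rotation Q_k.

```python
import numpy as np

def esrcf_derivative_step(F, H, Pu, Ru, dF, dH, dPu, dRu):
    """Given upper-triangular square roots (P = Pu.T @ Pu, R = Ru.T @ Ru)
    and their theta-derivatives, return R_e^{1/2}, P_{k+1}^{1/2} and their
    theta-derivatives via (8), (11) and (9)."""
    n, m = F.shape[0], H.shape[0]
    A = np.block([[Ru,       np.zeros((m, n))],
                  [Pu @ H.T, Pu @ F.T        ]])        # pre-array (G = 0)
    dA = np.block([[dRu,                   np.zeros((m, n))],
                   [dPu @ H.T + Pu @ dH.T, dPu @ F.T + Pu @ dF.T]])
    Qf, B = np.linalg.qr(A)              # A = Qf @ B, B upper triangular
    s = np.sign(np.diag(B))
    B, Qf = B * s[:, None], Qf * s       # enforce positive diagonal
    # (11): split Q_k * dA * B^{-1} into strict-lower / diag / strict-upper.
    Mt = (Qf.T @ dA) @ np.linalg.inv(B)
    Lb, D, Ub = np.tril(Mt, -1), np.diag(np.diag(Mt)), np.triu(Mt, 1)
    dB = (Lb.T + D + Ub) @ B             # (9): derivative of the post-array
    return B[:m, :m], B[m:, m:], dB[:m, :m], dB[m:, m:]
```

The derivative of the post-array is obtained without any sensitivity Riccati recursion; the test verifies it against central finite differences of the sign-normalized QR factor.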
Remark 3:
In the KF formulation of the Log LG evaluation, it is necessary to run the "differentiated" KF for each of the parameters θ_i to be estimated. As in [10], in the eSRCF formulation this "bank" of filters is replaced with the augmented arrays to which orthogonal transformations are applied.

IV. NUMERICAL RESULTS
First, we would like to check our theoretical derivations. To do so, we apply the square-root algorithm introduced in Theorem 1 to the following simple test problem.
Example 1:
Consider the special case of the system (1), (2) given by

\[
x_k = \begin{bmatrix} d_k \\ s_k \end{bmatrix}
= \begin{bmatrix} 1 & \Delta t \\ 0 & e^{-\Delta t/\tau} \end{bmatrix} x_{k-1}
+ \begin{bmatrix} 0 \\ 1 \end{bmatrix} w_k,
\qquad
z_k = \begin{bmatrix} 1 & 0 \end{bmatrix} x_k + v_k,
\]

where w_k ∼ N(0, I_1), v_k ∼ N(0, I_1), I_n denotes the n × n identity matrix and τ is a parameter which needs to be estimated.

In our simulation experiment, we compute the negative Log LF and its gradient by the proposed square-root method and then compare the results to those produced by the conventional KF approach. The outcomes of this experiment are illustrated by Figs. 1 and 2.

As can be seen from Fig. 2, all algorithms for score evaluation produce exactly the same result and give the same zero point, which further coincides with the minimum point of the negative Log LF (see Fig. 1). All this evidence substantiates the theoretical derivations of Section III.

Next, we wish to answer the second question posed in this paper: does the algorithm for score evaluation derived from a numerically stable square-root implementation method improve the robustness of computations against roundoff errors? The previously obtained results (Example 1) indicate that both methods, i.e. the conventional KF technique and the new square-root algorithm, produce exactly the same answer for the Log LF and Log LG evaluation. However, numerically they no longer agree. We are now going to explore the accuracy of the numerical algorithms.

To begin designing the ill-conditioned test problem we first stress the type of the proposed method. As discussed in Remark 1, the new square-root algorithm belongs to the class of covariance-type methods. From Verhaegen and Van Dooren's celebrated paper [6], we know that the condition number K(R_{e,k}) of the innovation covariance matrix is the key parameter determining the numerical behavior of the covariance algorithms. Taking into account these two important facts, we construct the following ill-conditioned test problem.

Fig. 1. The negative Log LF, plotted against τ (in sec.), computed by the conventional KF technique and by the "filtered" part of the suggested square-root method for Example 1.

Fig. 2. The Log LG, plotted against τ (in sec.), computed by the conventional KF technique and by the "differentiated" part of the suggested square-root method for Example 1.
Example 2:
Consider the problem with the measurement sensitivity matrix

\[
H_k = \begin{bmatrix} 1 & 1 \\ 1 & 1 + \delta \end{bmatrix}
\]

and F_k = I_2, G_k = 0, Q_k = I_2, R_k = δ²θ I_2, with x_0 ∼ N(0, θ I_2), where θ is an unknown system parameter. To simulate roundoff we assume that δ² < ε_roundoff but δ > ε_roundoff, where ε_roundoff denotes the unit roundoff error. (Computer roundoff for floating-point arithmetic is often characterized by a single parameter ε_roundoff, defined in different sources as the largest number such that either 1 + ε_roundoff = 1 or 1 + ε_roundoff/2 = 1 in machine precision.)

When θ = 1, Example 2 coincides with a well-known ill-conditioned filtering problem (see, for instance, [2]) and demonstrates how a problem that is well-conditioned, as posed, can be made ill-conditioned by the filter implementation. The difficulty to be explored is in matrix inversion. As can be seen, although rank H = 2, the matrix R_{e,1} is singular in machine precision, which yields the failure of the conventional KF implementation. We introduced the unknown system parameter θ, making sure that the same problem applies to the matrix (R_{e,1})'_θ for each value of θ. Thus, both parts of the method for score evaluation, that is, the "filtered" and "differentiated" parts, fail after processing the first measurement. From the discussion above we understand that Example 2 demonstrates the difficulty only for the covariance-type methods.
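The failure mechanism can be reproduced in a few lines. The sketch below is an illustrative NumPy experiment (with θ = 1 and the hypothetical choice δ = 10⁻⁸, so that δ² sits at the level of the unit roundoff): forming R_{e,1} = H Π_0 Hᵀ + R explicitly destroys the δ²-level information, while triangularizing the stacked square-root array, as in (3), retains it.

```python
import numpy as np

delta = 1e-8                        # delta**2 ~ 1e-16: at double-precision roundoff
H = np.array([[1.0, 1.0],
              [1.0, 1.0 + delta]])
Ru = delta * np.eye(2)              # R^{1/2};  R = delta**2 * I
Pu = np.eye(2)                      # Pi_0^{1/2};  Pi_0 = I  (theta = 1)

# Conventional route: form R_e,1 = H Pi_0 H^T + R, then work with it.
Re = H @ Pu.T @ Pu @ H.T + Ru.T @ Ru
det_naive = Re[0, 0] * Re[1, 1] - Re[0, 1] * Re[1, 0]

# Square-root route: triangularize [R^{1/2}; Pi_0^{1/2} H^T], as in (3).
r = np.linalg.qr(np.vstack([Ru, Pu @ H.T]), mode='r')
det_sqrt = (r[0, 0] * r[1, 1]) ** 2           # det(R_e,1) from its square root

# Exact value: det(H H^T + delta**2 I) = 5 d^2 + 2 d^3 + 2 d^4.
det_true = 5 * delta**2 + 2 * delta**3 + 2 * delta**4
```

The conventionally formed determinant is pure roundoff noise, while the square-root route recovers det(R_{e,1}) ≈ 5δ² to many digits; this is the mechanism behind the conventional KF failure discussed above.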
TABLE I
EFFECT OF ROUNDOFF ERRORS ON THE COMPUTED SOLUTIONS FOR THE SET OF TEST PROBLEMS FROM EXAMPLE 2
(Columns: δ; K(R_{e,1}); then ∆P, ∆P′, ∆LogLF, ∆LogLG for the conventional KF technique and for the suggested square-root method.)

Our simulation experiments presented below are organized as follows. All methods were implemented in the same precision (64-bit floating point) in MATLAB, where the unit roundoff error is 2⁻⁵³ ≈ 1.11 · 10⁻¹⁶. The MATLAB constant eps is twice the unit roundoff error and, for instance, δ = eps^{1/2}/2 satisfies the conditions δ² < ε_roundoff and δ > ε_roundoff from Example 2. We provide the computations for a range of values of δ of this order of magnitude and smaller; that is, we consider a set of test problems from Example 2. The unknown system parameter θ is fixed, say θ = 2. The exact answers are produced by the Symbolic Math Toolbox of MATLAB.

Experiment 1:
In this experiment we are going to use the performance profile technique to compare the conventional KF approach for score evaluation with the square-root algorithm introduced in this paper. The performance profile method was developed by Dolan and Moré [12] to answer a common question in scientific computing: how to compare several competing methods on a set of test problems. It can now be found in textbooks (see, for instance, [13]).

In our simulation experiments we consider the set A of n = 2 algorithms mentioned above. The performance measure, t_a(p), is a measure of accuracy. More precisely, t_a(p) is the maximum absolute error in the Log LG computed for the corresponding value of δ. Thus, we consider a set P of m = 7 test problems from Example 2, corresponding to seven decreasing values of δ. According to the performance profile technique, we compute the performance ratio

\[
r_{p,a} = \frac{t_a(p)}{\min\{t_\sigma(p) : \sigma \in A\}} \ge 1,
\]

which is the performance of algorithm a on problem p divided by the best performance of all the methods (we mean a particular implementation method for score evaluation) on this problem. The performance profile of algorithm a is the function

\[
\phi_a(\mu) = \frac{1}{m} \times \left(\text{number of } p \in P \text{ such that } r_{p,a} \le \mu\right),
\]

which is monotonically increasing. Thus, φ_a(μ) is the probability that the performance of algorithm a is within a factor μ of the best performance over all implementations on the given set of problems.

The results of this experiment are illustrated by Fig. 3, where, for each method, μ is plotted against the performance profile φ_a(μ). We are now going to explain Fig. 3.

Let us consider the left-hand side of Fig. 3, where μ = 1. The new square-root algorithm proposed in this paper is the most accurate implementation on most of the test problems, the conventional KF being the most accurate on the remaining ones. Next, we consider the middle of the plot, looking at where each curve first hits probability 1. We conclude that the suggested square-root method is within a small factor μ of being the most accurate implementation on every test problem. The conventional KF approach for score evaluation, however, never manages all of the problems (as δ → ε_roundoff, the machine precision limit, the test problems become ill-conditioned), and μ must be increased considerably before the conventional KF provides an accurate Log LG evaluation, within the factor μ, on most of the test problems.
Fig. 3. Performance profiles, φ_a(μ) against μ, of the methods for score evaluation: the conventional KF implementation and the new square-root algorithm proposed in this paper, on the set of test problems from Example 2.
Thus, the performance profiles clearly indicate that on the set of test problems from Example 2 the new square-root algorithm derived in this paper provides more accurate evaluation of the Log LG compared with the conventional KF approach.
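For completeness, the performance-profile construction of Dolan and Moré used above can be sketched generically (an illustrative helper, not the paper's code; T[p, a] holds the accuracy measure t_a(p), smaller being better):

```python
import numpy as np

def performance_profiles(T, mus):
    """T[p, a] = performance measure of algorithm a on problem p.
    Returns phi[a, j] = fraction of problems with ratio r_{p,a} <= mus[j]."""
    ratios = T / T.min(axis=1, keepdims=True)       # r_{p,a} >= 1
    return np.array([[np.mean(ratios[:, a] <= mu) for mu in mus]
                     for a in range(T.shape[1])])
```

Evaluating phi at μ = 1 gives the fraction of problems on which each algorithm is the most accurate; the value of μ at which a curve first reaches 1 measures the worst-case factor by which that algorithm trails the best.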
Experiment 2:
In this experiment we use the conventional KF technique and the proposed square-root method to compute the maximum absolute error in the Log LF, denoted as ∆LogLF, and in its gradient, denoted as ∆LogLG. The results of this experiment are summarized in Table I. We also present the maximum absolute error among the elements of the matrices P and (P)'_θ (denoted as ∆P and ∆P′, respectively) to explore the numerical behavior of the "filtered" and "differentiated" parts of the methods for score evaluation.

As can be seen from Table I, the square-root implementation of the Riccati-type sensitivity equation degrades more slowly than the conventional Riccati-type sensitivity recursion as δ → ε_roundoff, the machine precision limit (see the columns denoted ∆P′). For the smallest values of δ reported in Table I, the "filtered" (columns ∆P) and "differentiated" (columns ∆P′) parts of the proposed square-root method for score evaluation still maintain several correct digits, while the conventional KF technique provides essentially no correct digits in either computed solution. Besides, it seems that the roundoff errors tend to accumulate and degrade the accuracies of the Log LF and Log LG faster than the accuracies of ∆P and ∆P′: for the same values of δ we obtain no correct digits in the computed Log LF and Log LG for all methods. In MATLAB, the term 'NaN' stands for 'Not a Number', which here signals the failure of the numerical algorithm.
The results of Experiment 2 indicate that the new square-root algorithm provides more accurate computation of the sensitivity matrix (P_k)'_θ compared to the conventional KF. Hence, it can be successfully used in all applications where this quantity is required.
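The conventional-KF baseline used in both experiments, the "differentiated" KF built from the filter sensitivity and Riccati-type sensitivity equations of Section I, can be sketched as follows (illustrative code, not the paper's; here only F, H, Q and R may depend on the scalar parameter θ, while G and the initial values are held fixed):

```python
import numpy as np

def kf_score(F, G, H, Q, R, dF, dH, dQ, dR, x0, P0, zs):
    """Negative log LF and its derivative w.r.t. a scalar parameter theta,
    propagated with the filter sensitivity (dx) and Riccati-type
    sensitivity (dP) recursions; d* are elementwise theta-derivatives."""
    x, P = x0.copy(), P0.copy()
    dx, dP = np.zeros_like(x0), np.zeros_like(P0)  # x0, P0 theta-independent
    m = H.shape[0]
    nll, dnll = 0.0, 0.0
    for z in zs:
        e, de = z - H @ x, -dH @ x - H @ dx
        Re = H @ P @ H.T + R
        dRe = dH @ P @ H.T + H @ dP @ H.T + H @ P @ dH.T + dR
        Ri = np.linalg.inv(Re)
        nll += 0.5 * (m * np.log(2 * np.pi) + np.log(np.linalg.det(Re))
                      + e @ Ri @ e)
        dnll += (0.5 * np.trace(Ri @ dRe) + e @ Ri @ de
                 - 0.5 * e @ Ri @ dRe @ Ri @ e)
        K = F @ P @ H.T @ Ri
        dK = (dF @ P @ H.T + F @ dP @ H.T + F @ P @ dH.T) @ Ri - K @ dRe @ Ri
        x, dx = F @ x + K @ e, dF @ x + F @ dx + dK @ e + K @ de
        P, dP = (F @ P @ F.T - K @ Re @ K.T + G @ Q @ G.T,
                 dF @ P @ F.T + F @ dP @ F.T + F @ P @ dF.T
                 - dK @ Re @ K.T - K @ dRe @ K.T - K @ Re @ dK.T
                 + G @ dQ @ G.T)
    return nll, dnll
```

A finite-difference check of the returned derivative against nll(θ ± h) is a convenient correctness test for such a sensitivity recursion.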
V. CONCLUDING REMARKS

In this paper, a numerically stable square-root implementation method for the KF formulas, the eSRCF, has been extended in order to compute the Log LG for linear discrete-time stochastic systems. The preliminary analysis indicates that the new algorithm for score evaluation provides more accurate computations compared with the conventional KF approach. The new result can be used for efficient calculations in sensitivity analysis and in gradient-search optimization algorithms for the maximum likelihood estimation of unknown system parameters.

As an extension of the eSRCF, the new method for score evaluation is expected to inherit its benefits. However, the question about suitability for parallel implementation is still open.

It can be mentioned that another approach to constructing a numerically stable implementation method for score evaluation is to use the UD filter [3]. Being a modification of the square-root implementations, the UD-type algorithms improve the robustness of computations against roundoff errors while, compared with the SRF, reducing the computational cost (see [3], [6], [5]). As mentioned in [10], and as far as this author knows, it is still not known how to use the UD filter to compute the score.

REFERENCES

[1] F.C. Schweppe, "Evaluation of likelihood functions for Gaussian signals", IEEE Trans. Inf. Theory, vol. IT-11, pp. 61–70, Jan. 1965.
[2] M.S. Grewal and A.P. Andrews, Kalman Filtering: Theory and Practice Using MATLAB, 2nd ed. New York: John Wiley & Sons, 2001.
[3] G.J. Bierman, Factorization Methods for Discrete Sequential Estimation. New York: Academic Press, 1977.
[4] P.G. Kaminski, A.E. Bryson and S.F. Schmidt, "Discrete square-root filtering: a survey of current techniques", IEEE Trans. Autom. Control, vol. AC-16, pp. 727–735, Dec. 1971.
[5] T. Kailath, A.H. Sayed and B. Hassibi, Linear Estimation. New Jersey: Prentice Hall, 2000.
[6] M. Verhaegen and P. Van Dooren, "Numerical aspects of different Kalman filter implementations", IEEE Trans. Autom. Control, vol. AC-31, pp. 907–917, Oct. 1986.
[7] T. Kailath, "Array algorithms for structured matrices", presented at the conference of the International Linear Algebra Society, Winnipeg, Canada, June 3–6, 1998.
[8] P. Park and T. Kailath, "New square-root algorithms for Kalman filtering", IEEE Trans. Autom. Control, vol. 40, pp. 895–899, May 1995.
[9] E.K.B. Lee and S. Haykin, "Parallel implementation of the extended square-root covariance filter for tracking applications", IEEE Trans. Parallel Distrib. Syst., vol. 4, pp. 446–457, April 1993.
[10] G.J. Bierman, M.R. Belzer, J.S. Vandergraft and D.W. Porter, "Maximum likelihood estimation using square root information filters", IEEE Trans. Autom. Control, vol. 35, pp. 1293–1298, Dec. 1990.
[11] P. Dyer and S. McReynolds, "Extensions of square root filtering to include process noise", J. Opt. Theory Appl., vol. 3, pp. 444–459, June 1969.
[12] E.D. Dolan and J.J. Moré, "Benchmarking optimization software with performance profiles", Math. Programming, vol. 91, pp. 201–213, 2002.
[13] D.J. Higham and N.J. Higham, MATLAB Guide, 2nd ed. Philadelphia: SIAM, 2005.