Block Diagonally Dominant Positive Definite Sub-optimal Filters and Smoothers
Running title:
BLOCK DIAGONALLY DOMINANT APPROXIMATE FILTERS
Subtitle: Kalman filters and smoothers are approximated when the transition matrix and the incremental information are nearly block diagonal.
Kurt S. Riedel
Courant Institute of Mathematical Sciences
New York University
251 Mercer St.
New York, New York 10012
Abstract
We examine stochastic dynamical systems where the transition matrix, Φ, and the system noise covariance, ΓQΓ^T, are nearly block diagonal. When H^T R^{-1} H is also nearly block diagonal, where R is the observation noise covariance and H is the observation matrix, our suboptimal filters/smoothers are always positive semidefinite and have improved numerical properties. Applications to distributed dynamical systems with time dependent pixel imaging are discussed.

I. INTRODUCTION

In this article, we examine suboptimal filters and smoothers of stochastic systems when the dynamics and the measurements are nearly block diagonal (N.B.D.). We assume that the transition matrix, Φ(i+1, i), the system noise covariance, [ΓQΓ^T]_i, the initial state covariance, P(0|0), and the incremental information, J_i ≡ H_i^T R_i^{-1} H_i, are all N.B.D. We then derive estimation equations for the state vector, x̂_i, and the covariance, P(i|i), which approximate the optimal estimates to second order in ǫ.

Our stochastic systems are similar to the widely studied weakly coupled systems (Kokotovic et al. (1969), Sezer and Siljak (1986), Gajic et al. (1990), Shen and Gajic (1990)). Our N.B.D. systems are not limited to two-block systems, but apply to an arbitrary number of blocks. Furthermore, we require only that H_i^T R_i^{-1} H_i is N.B.D. This contrasts with the stronger hypothesis of weakly coupled systems, namely that H_i and R_i are separately weakly coupled.

The existing theory of weakly coupled systems concentrates on the convergence of approximations to the complete system as ǫ tends to zero. Thus the existing analysis considers only the case where ǫ is sufficiently small as to preclude the loss of positive definiteness in the approximate equations. Therefore previous analyses have not explicitly required positive definiteness.

Our emphasis is on well-conditioned approximation of x̂_i and P(i|i) for finite, but small, values of the coupling parameter, ǫ. Formally, our expansions require that the zeroth order N.B.D. matrices are all uniformly much larger than the remaining offdiagonal terms.
In practice, the coupling parameter, ǫ, is not vanishingly small, and there may be component directions where the first order terms almost cancel the zeroth order terms. To prevent the approximate covariance matrix, P^{(ǫ)}(i|i), from losing positive definiteness, we add second order terms to the approximate covariance. These additional terms not only guarantee positive semidefiniteness, but also provide a matrix factorization.

Our motivation for the study of N.B.D. systems is the analysis of distributed systems of partial differential equations for fluid flow. We estimate the fluid flow as a function of time and space, u(r, t), where u(r, t) satisfies the Navier-Stokes equations: ∂_t u + u·∇u = −∇p + ν∆u, ∇·u = 0. We are given continuous time measurements of the velocity field on a coarse grid in space. We expand the Navier-Stokes equation in the set of eigenfunctions of the laminar flow linear stability problem (Canuto et al. (1988)). We truncate the eigenfunction expansion in the middle of the inertial range, and model the effects of the discarded modes through an anomalously large diffusion coefficient.

When spatial inhomogeneities and nonlinearities are weak, the transition matrix, Φ(i+1, i), for the zeroth order eigenfunction basis often will be N.B.D. To decouple the estimation equations to leading order, we assume both [ΓQΓ^T]_i and P(0|0) are nearly block diagonal.

In our prototypical system, the elements of the measurement evaluation matrix, H_i, are evaluations of basis functions, Ψ_k, at the spatial locations, z_ℓ, of the measurements, y_ℓ. When the z_ℓ are distributed more or less uniformly in space, the Ψ_k are orthogonal, and R_i is a multiple, σ_i, of the identity, then

(H_i^T R_i^{-1} H_i)_{k,k′} = (1/σ_i) Σ_{ℓ=1}^{m} Ψ_k(z_ℓ) Ψ_{k′}(z_ℓ) ∼ (m/σ_i) ∫ Ψ_k(z) Ψ_{k′}(z) dz ∼ c_k δ_{k,k′} .   (1.1)

Thus H_i^T R_i^{-1} H_i being nearly block diagonal corresponds to the measurement locations being nearly uniformly distributed and approximating spatial integration on the scalelength of the shortest wavelength basis function.

For such pixel type measurements, the number of pixels needs to exceed the number of different diagonal blocks of eigenfunctions. If the measurements are spatially uniform, but of insufficient number to distinguish the various eigenfunctions, the evolution equations will be partially coupled due to spatial aliasing.

Section II defines N.B.D. matrices, and presents several stabilizing transformations and approximate factorizations. In Section III, we review the standard discrete Kalman filter and derive a positive definite suboptimal approximation to the Kalman filter. Section IV and Appendix B derive similar suboptimal positive definite approximations to the discrete Kalman smoothers for fixed intervals and for fixed lags, respectively. Section V discusses our N.B.D. formulation. Appendix A examines the numerical advantages of computing the basic matrix operations only to first order.

II. NEARLY BLOCK DIAGONAL MATRIX REPRESENTATIONS AND OPERATIONS

A) Matrix Structure
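The decomposition defined in this subsection can be made concrete with a small numerical example. The following sketch (assuming NumPy; the block sizes, entries, and helper name `split_nbd` are illustrative) splits a matrix with strong diagonal blocks into its block diagonal part and the offdiagonal-block remainder:

```python
import numpy as np

def split_nbd(P, sizes):
    """Split P into its block diagonal part P0 and the offdiagonal remainder."""
    P0 = np.zeros_like(P)
    s = np.cumsum([0] + sizes)
    for a, b in zip(s[:-1], s[1:]):
        P0[a:b, a:b] = P[a:b, a:b]
    return P0, P - P0

rng = np.random.default_rng(0)
sizes = [2, 3]                                  # two blocks, N_b = 2
N = sum(sizes)
eps = 0.05                                      # weak coupling parameter
coupling = rng.standard_normal((N, N))
P = np.eye(N) + eps * (coupling + coupling.T)   # nearly block diagonal matrix
P0, offdiag = split_nbd(P, sizes)
# The offdiagonal coupling is uniformly smaller than the block diagonal part:
ratio = np.linalg.norm(offdiag) / np.linalg.norm(P0)
```

Here `P0` plays the role of P^{(0)} (including the first order block diagonal terms) and `offdiag` the role of ǫ P^{(1)}.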
We consider the class of nearly block diagonal (N.B.D.) matrices to be N × N matrices of the form P(ǫ) = P^{(0)}(ǫ) + ǫ P^{(1)} + ǫ² P^{(2)}(ǫ), where P^{(2)}(ǫ) contains second order and higher terms in ǫ. The weak coupling parameter, ǫ, is a formal small expansion parameter. P^{(0)}(ǫ) is block diagonal of the form

P^{(0)}(ǫ) = diag( P^{(0)}_{11}(ǫ), P^{(0)}_{22}(ǫ), . . . , P^{(0)}_{N_b N_b}(ǫ) ) ,   (2.1)

where the P^{(0)}_{kk} entry is an n_k × n_k matrix for k = 1, 2, . . . , N_b. The block sizes, n_1, n_2, . . . , n_{N_b}, are fixed in this article, i.e. all matrices have the same block structure. We often suppress the functional dependence on ǫ in P^{(i)}(ǫ), i = 0, 1, 2.

We denote the truncated approximations of P(ǫ) by P̄(ǫ), where P̄(ǫ) = P^{(0)} + ǫ P^{(1)} for first order approximations and P̄(ǫ) = P^{(0)} + ǫ P^{(1)} + ǫ² P^{(2)} for second order approximations. The first order block diagonal terms may be included in either P^{(0)}(ǫ) or ǫ P^{(1)}. Including the block diagonal terms in P^{(0)}(ǫ) reduces storage requirements; however, the resulting equations are slightly more complicated. For simplicity, we include the first order block diagonal terms in ǫ P^{(1)}. We define P_L to be the strictly lower block triangular part of a matrix P plus half of the block diagonal part of P.

B) Stabilizing Transformations and the L D^{-1} L^T Factorization
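The first order stabilizing transformation defined below (Eq. (2.2)) can be exercised numerically. A minimal sketch, assuming NumPy; the block sizes, entries, and helper names (`block_lower`, `stabilize`) are illustrative:

```python
import numpy as np

def block_lower(P, sizes):
    """Strictly lower block triangular part of P plus half of its block diagonal."""
    L = np.zeros_like(P)
    s = np.cumsum([0] + sizes)
    for a, b in zip(s[:-1], s[1:]):
        L[b:, a:b] = P[b:, a:b]          # blocks strictly below the diagonal
        L[a:b, a:b] = 0.5 * P[a:b, a:b]  # half of the diagonal block
    return L

def stabilize(P0, P1, eps, sizes):
    """T[P] = [P0 + eps*P1_L] P0^{-1} [P0 + eps*P1_L]^T  (Eq. (2.2))."""
    L = P0 + eps * block_lower(P1, sizes)
    return L @ np.linalg.solve(P0, L.T)

rng = np.random.default_rng(1)
sizes = [2, 3]
P0 = np.eye(5)                        # block diagonal, positive definite
A = rng.standard_normal((5, 5))
P1 = A + A.T                          # symmetric first order coupling
eps = 0.9                             # large enough that P0 + eps*P1 may be indefinite
T = stabilize(P0, P1, eps, sizes)
```

For symmetric P1, `block_lower` satisfies P1_L + P1_L^T = P1, so `T` agrees with P0 + ǫP1 through first order, differs from it only by the positive second order term ǫ² P^{(1)}_L P^{(0)-1} P^{(1)T}_L, and is positive semidefinite by construction.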
The truncated approximations, P̄(ǫ), to P(ǫ) need not be positive semidefinite even when P^{(0)} and P(ǫ) are positive definite. We assume that P^{(0)} is positive definite and that P^{(0)}, P(ǫ) and P̄(ǫ) are symmetric, and we define the following transformations:

T[P̄(ǫ)] ≡ [P^{(0)} + ǫ P^{(1)}_L] P^{(0)-1} [P^{(0)} + ǫ P^{(1)}_L]^T   (2.2)

for first order approximations, and

T[P̄(ǫ)] ≡ [P^{(0)} + ǫ P^{(1)}_L + ǫ² (P^{(2)}_L − G^{(2)}_L)] P^{(0)-1} [P^{(0)} + ǫ P^{(1)}_L + ǫ² (P^{(2)}_L − G^{(2)}_L)]^T   (2.3)

for second order approximations, where G^{(2)} ≡ P^{(1)}_L P^{(0)-1} P^{(1)T}_L. Both transformations produce positive semidefinite matrices with L D^{-1} L^T block factorizations. The first order transformation (2.2) differs from P^{(0)} + ǫ P^{(1)} by the second order term ǫ² P^{(1)}_L P^{(0)-1} P^{(1)T}_L, and is thus strictly larger than P^{(0)} + ǫ P^{(1)}. The second order transformation (2.3) differs from P^{(0)} + ǫ P^{(1)} + ǫ² P^{(2)} by third order terms; however, these third order terms need not be positive semidefinite. An alternative transformation is

T_b[P̄(ǫ)] ≡ [P^{(0)} + ǫ P^{(1)}_L + ǫ² P^{(2)}_L] P^{(0)-1} [P^{(0)} + ǫ P^{(1)}_L + ǫ² P^{(2)}_L]^T .

T_b[P̄(ǫ)] approximates P(ǫ) only to second order, but adds only positive terms. Therefore T_b[P̄(ǫ)] can be used to provide second order upper bounds.

A number of other stabilizing transformations may be defined. ǫ P^{(1)}_L can be replaced by ǫ P^{(1)} at the cost of losing the L D^{-1} L^T block factorization. A more useful transformation is to decompose P̄(ǫ) into its spectral representation, and then to set any negative eigenvalues of P̄(ǫ) to zero. This spectral transformation has the advantage that it uses the smallest possible correction which makes the transformed matrix positive semidefinite. In Appendix A, we describe a first order approximation to the spectral decomposition. Instead of actually performing the singular value decomposition, we may simply test P̄(ǫ) for negative eigenvalues using Eq. (A3). Only the eigenvalues with small λ^{(0)}_k need be tested and/or replaced.
Thus the eigendecomposition approach is especially attractive when only a small number of eigenvalues are questionable.

T[P̄(ǫ)] also has a block L D L^T representation: [I_N + ǫ P^{(1)}_L P^{(0)-1}] P^{(0)} [I_N + ǫ P^{(1)}_L P^{(0)-1}]^T. This factorization is less numerically efficient than the L D^{-1} L^T representation.

In our filtering applications, we use the transformation, P^{(ǫ)}_+ = T[P^{(ǫ)}], to stabilize the data assimilation and variance evaluations. We note that T[T[P̄(ǫ)]] = T[P̄(ǫ)]. This property is important when the covariance matrix, P^{(ǫ)}, is modified many times with small updates. We let T[·] denote the appropriate stabilizing transformation.

III. DIAGONALLY DOMINANT DISCRETE KALMAN FILTERS
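As a point of reference for the estimation equations reviewed below, one predict/update cycle of the standard discrete Kalman filter can be sketched as follows. This is a minimal NumPy sketch with illustrative dimensions and names, not the suboptimal N.B.D. filter itself:

```python
import numpy as np

def kf_step(x, P, Phi, Q_Gamma, H, R, y):
    """One cycle of the standard filter: time update, then measurement update."""
    # Time update
    x_pred = Phi @ x
    P_pred = Phi @ P @ Phi.T + Q_Gamma
    # Measurement update in information form: P(i|i)^{-1} = P(i|i-1)^{-1} + H^T R^{-1} H
    J = H.T @ np.linalg.solve(R, H)
    P_new = np.linalg.inv(np.linalg.inv(P_pred) + J)
    K = P_new @ H.T @ np.linalg.inv(R)          # Kalman gain
    x_new = x_pred + K @ (y - H @ x_pred)
    return x_new, P_new

rng = np.random.default_rng(2)
N, m = 4, 2
Phi = 0.9 * np.eye(N)
Q_Gamma = 0.1 * np.eye(N)
H = rng.standard_normal((m, N))
R = np.eye(m)
x, P = np.zeros(N), np.eye(N)
for _ in range(3):
    x, P = kf_step(x, P, Phi, Q_Gamma, H, R, rng.standard_normal(m))
```

The gain computed here from the information form coincides with the innovation form K = P(i|i−1) H^T [H P(i|i−1) H^T + R]^{-1}; the N.B.D. filter of this section replaces the explicit inversions by order-by-order block computations.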
We consider the discrete linear state space model:

x_{i+1} = Φ(i+1, i) x_i + Γ_i w_i ,   (3.1)

y_i = H_i x_i + v_i ,   (3.2)

where x_i is the state vector of dimension N, y_i is the measurement vector of dimension m, and Φ(j, i) is the N × N nonsingular deterministic part of the map from time i to time j. The system noise, w_i, is assumed to be an r-dimensional white Gaussian sequence with covariance Q_i. The measurement noise is an m-dimensional white Gaussian sequence with nondegenerate covariance R_i. The m × N measurement evaluation matrix, H_i, maps the state vector, x_i, onto the deterministic part of the measurements. We define the N × N matrices Q_{Γ,i} ≡ Γ_i Q_i Γ_i^T and J_i ≡ H_i^T R_i^{-1} H_i.

The standard Kalman filter estimates the state vector, x̂(i|j), at time i given the measurements, y_1, . . . , y_j, up to time j by the time evolution update:

x̂(i+1|i) = Φ(i+1, i) x̂(i|i) .   (3.3)

The covariance, P(i|j), of the estimate, x̂(i|j), evolves as

P(i+1|i) = Φ(i+1, i) P(i|i) Φ^T(i+1, i) + Q_{Γ,i} .   (3.4)

x̂(0|0) and P(0|0) are given. The measurement update is

x̂(i|i) = x̂(i|i−1) + K_i (y_i − H_i x̂(i|i−1)) ,   (3.5)

P(i|i)^{-1} = P(i|i−1)^{-1} + H_i^T R_i^{-1} H_i ,   (3.6)

where K_i is the N × m Kalman gain:

K_i = [P(i|i−1)^{-1} + H_i^T R_i^{-1} H_i]^{-1} H_i^T R_i^{-1} = P(i|i) H_i^T R_i^{-1} .   (3.7)

We assume that Φ(i+1, i), Q_{Γ,i}, P(0|0) and J_i ≡ H_i^T R_i^{-1} H_i are N.B.D. We assume that the leading order operator, Φ^{(0)}(i+1, i), is normal, so that its eigenvectors are orthogonal. For clarity, we denote Φ^{(0)}(i+1, i) by Λ_i. We denote the k, ℓ-th subblock of P(i|j) by P(i|j)_{k,ℓ}, and use similar subscripts for the subblocks of J_i, Q_{Γ,i}, etc. We present the expansion of the Kalman filter only to first order. Higher order expressions are similar, but longer.

The time evolution update for x̂, Eq. (3.3), may be computed to arbitrary order if desired. The time evolution of the covariance for the standard N.B.D. representation satisfies

P^{(0)}(i+1|i)_{k,k} = Λ(i)_k P^{(0)}(i|i)_{k,k} Λ(i)_k^T + [ΓQΓ^T]^{(0)}_{i,{k,k}} ,   (3.8)

P^{(1)}(i+1|i)_{k,ℓ} = Λ(i)_k P^{(1)}(i|i)_{k,ℓ} Λ(i)_ℓ^T + Φ^{(1)}(i+1, i)_{k,ℓ} P^{(0)}(i|i)_{ℓ,ℓ} Λ(i)_ℓ^T + Λ(i)_k P^{(0)}(i|i)_{k,k} Φ^{(1)}(i+1, i)^T_{ℓ,k} + [Q_{Γ,i}]^{(1)}_{k,ℓ} .   (3.9)

The measurement update of x̂(i|i) and P(i|i) is separated into four steps. First, the zeroth order, block diagonal approximation to P(i|i) is determined by solving the block system

P^{(0)}(i|i) = [P^{(0)}(i|i−1)^{-1} + J^{(0)}_i]^{-1} .   (3.10)

Second, the first order terms of P(i|i) are

P^{(1)}(i|i) = P^{(0)}(i|i) [P^{(0)}(i|i−1)^{-1} P^{(1)}(i|i−1) P^{(0)}(i|i−1)^{-1} − J^{(1)}_i] P^{(0)}(i|i) .   (3.11)

Third, P(i|i) is forced to be positive semidefinite using the transformation P^{(ǫ)}_+(i|i) = T[P^{(ǫ)}(i|i)]. Finally, we update our estimate of x(i|i) using

x̂(i|i) = x̂(i|i−1) + P^{(ǫ)}_+(i|i) H_i^T R_i^{-1} [y_i − H_i x̂(i|i−1)] .   (3.12)

The stabilizing terms, T[P^{(ǫ)}(i|i)] − P^{(ǫ)}(i|i), are not propagated in the filter. Instead, new stabilizing terms are calculated at every data assimilation step. If P(i+1|i) or P(i|i) needs to be evaluated to assess the uncertainty in the state space estimate, we use the stabilized approximations.

Comments:

1) Our covariance matrices, P^{(ǫ)}(i|i), are approximations of the covariance matrices of the optimal estimate, x̂(i|i), and not the actual covariance of the approximate estimate, x̂^{(ǫ)}(i|i).

2) Computational savings occur only for the first order approximation to the state vector covariance matrix, P, and not for the state estimate. The computational requirements of second order calculations are approximately equal to the costs of the original Kalman filter. Our approximate filter does not require successive matrix inversions, and is therefore more numerically stable.

3) The zeroth order block matrices, P^{(0)}(i|i) and P^{(0)}(i|i−1), are positive definite.

4) The stabilizing transformations, P → T[P], guarantee positive semidefiniteness.

5) In general, the block diagonal structure is incompatible with sequential processing of the measurements, since each H_{i,{k}}^T R^{-1}_{{k,k}} H_{i,{k}} separately is usually not block diagonal.

6) Suboptimal versions of the information filter reformulation may be constructed using duality. A second order upper bound on P^{-1} may be constructed using an information filter and the stabilizing transformation, T_b, thereby producing a lower bound on P.

7) Different suboptimal filters with positive definite covariance may be constructed by expanding other formulations of the Kalman filter order by order and inserting the transformation P → T[P] whenever necessary. To guarantee that the N.B.D. structure is fully utilized, each matrix in the reformulation should be N.B.D. For example, replacing Eq. (3.6) by P(i|i) = P(i|i−1) − K_i H_i P(i|i−1), or P(i|i) = [I_N − K_i H_i] P(i|i−1) [I_N − K_i H_i]^T + K_i R_i K_i^T, yields a system of equations where each term in the evaluations is not explicitly N.B.D. Similarly, replacing the Kalman gain matrix, K_i, of Eq. (3.7) with the representation K_i ≡ P(i|i−1) H_i^T [H_i P(i|i−1) H_i^T + R_i]^{-1}, which is not in N.B.D. form, results in a system which is not explicitly N.B.D.

IV. N.B.D. FIXED INTERVAL SMOOTHERS
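One backward step of the Rauch-Tung-Striebel recursion discussed in this section, Eqs. (4.1)-(4.2), can be sketched as follows; a minimal NumPy sketch with illustrative names, written for the full (unexpanded) matrices:

```python
import numpy as np

def rts_step(x_filt, P_filt, x_pred, P_pred, x_next_sm, P_next_sm, Phi):
    """Backward sweep step: combine the filtered estimate at time i with the
    smoothed estimate at time i+1 (Eqs. (4.1)-(4.2))."""
    C = P_filt @ Phi.T @ np.linalg.inv(P_pred)   # smoother gain P(i|i) Phi^T P(i+1|i)^{-1}
    x_sm = x_filt + C @ (x_next_sm - x_pred)
    P_sm = P_filt + C @ (P_next_sm - P_pred) @ C.T
    return x_sm, P_sm

# If the smoothed quantities at i+1 equal the predicted ones, no correction occurs:
Phi = 0.5 * np.eye(3)
P_filt = np.eye(3)
P_pred = Phi @ P_filt @ Phi.T + 0.1 * np.eye(3)
x_filt = np.ones(3)
x_pred = Phi @ x_filt
x_sm, P_sm = rts_step(x_filt, P_filt, x_pred, P_pred, x_pred, P_pred, Phi)
```

In the suboptimal N.B.D. version, `P_filt` and the inverse of `P_pred` are replaced by their stabilized order-by-order approximations.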
In this section, we derive suboptimal, second order approximations to the various formulations of the fixed interval Kalman smoother. We denote the final measurement time by N_f. We begin with the Rauch-Tung-Striebel (R.T.S.) formulation of the smoother. We then present a new information formulation of the R.T.S. smoother as well as the Bryson-Frazier formulation. The R.T.S. smoother consists of a forward Kalman filter followed by a backward smoother correction. This structure arises because the estimation equations for x̂(i|N_f) have a block tribanded structure. The forward-backward sweeps correspond to the standard algorithm for solving block tribanded matrices. In our notation, the R.T.S. smoother (Rauch et al. (1965), Bryson and Ho, Ch. 13.2 (1969)) is

x̂(i|N_f) = x̂(i|i) + P(i|i) Φ(i+1, i)^T P^{-1}(i+1|i) ( x̂(i+1|N_f) − x̂(i+1|i) ) ,   (4.1)

P(i|N_f) = P(i|i) + P(i|i) Φ(i+1, i)^T P^{-1}(i+1|i) [P(i+1|N_f) − P(i+1|i)] P^{-1}(i+1|i) Φ(i+1, i) P(i|i) .   (4.2)

We assume that P(i|i) and P(i+1|i) have been computed using the N.B.D. approximations and stabilizing transformations of Sec. II. We stabilize both P(i|i) and P^{-1}(i+1|i) before evaluating Eq. (4.1) to all orders. The R.T.S. fixed interval smoother is explicitly in N.B.D. form, and the N.B.D. expansion of Secs. II and III is used to evaluate P(i|N_f) to second order. To ensure positive definiteness, we stabilize our estimate of P(i|N_f): P^{(ǫ)}_+(i|N_f) = T[P^{(ǫ)}(i|N_f)].

A desirable property of a smoother is that P^{(ǫ)}(i|i) ≥ P^{(ǫ)}(i|N_f) ≥ 0, and unfortunately our suboptimal approximation of the R.T.S. smoother does not explicitly ensure this property for moderate values of ǫ. In contrast, we now show that the information formulation of the R.T.S. smoother covariance equation possesses the property that P^{(ǫ)}(i|N_f)^{-1} ≥ P^{(ǫ)}(i|i)^{-1} ≥ 0. We apply the Sherman-Morrison matrix inverse identity to Eq. (4.2) twice and simplify to yield

P^{-1}(i|N_f) = P^{-1}(i|i) + Φ(i+1, i)^T [ ( P^{-1}(i+1|N_f) − P^{-1}(i+1|i) )^{-1} + Q_{Γ,i} ]^{-1} Φ(i+1, i) .   (4.3)

We expand Eq. (4.3) order by order in ǫ and apply the stabilizing transformation, P^{(ǫ)}(i|N_f)^{-1} → T[P^{(ǫ)}(i|N_f)^{-1}].

The original R.T.S. formulation requires that the evolution equations be integrated backward in time during the backward sweep. Since we are interested in distributed dynamical systems with dissipation and diffusion, such a backward integration is ill-conditioned. The Bryson-Frazier formulation of the fixed interval smoother reduces this problem by making the following change of variables for the smoother correction:

x̂(i|N_f) = x̂(i|i) − P(i|i) Φ(i+1, i)^T λ(i) ,   (4.4)

P(i|N_f) = P(i|i) − P(i|i) Φ(i+1, i)^T Λ(i) Φ(i+1, i) P(i|i) .   (4.5)

For the N vector λ(i) and the N × N positive semidefinite symmetric matrix Λ(i), equations (4.1)-(4.2) transform to

λ(i−1) = (I_N − P(i|i) J_i)^T [ Φ(i+1, i)^T λ(i) − H_i^T R_i^{-1} (y_i − H_i x̂(i|i−1)) ] ,   (4.6)

Λ(i−1) = (I_N − P(i|i) J_i)^T Φ(i+1, i)^T Λ(i) Φ(i+1, i) (I_N − P(i|i) J_i) + J_i − J_i P(i|i) J_i ,   (4.7)

with the initialization λ(N_f) = 0, Λ(N_f) = 0. Λ(i) is positive semidefinite, but its approximation, Λ^{(ǫ)}(i), need not be. We do not stabilize our estimate of Λ^{(ǫ)}(i) or any term in Eq. (4.7), since adding positive definite terms to Λ(i) will tend to underestimate P(i|N_f). Instead, we again apply the stabilizing transformation only to P(i|N_f): P(i|N_f) → T[P(i|N_f)].

V. CONCLUSION
In this article, we have given first and second order approximations for the Kalman filter and a number of smoothers by expanding the estimation equations in powers of the coupling parameter. We have described formulations of the estimation equations which explicitly preserve the N.B.D. structure. We apply stabilizing transformations to ensure that the approximate covariance is positive semidefinite. We do not propagate these stabilizing terms in the Kalman filter in order to minimize the perturbation.

Other N.B.D. formulations are possible where stabilizing terms are added to the covariance and propagated in the filter. To minimize the effect of these terms, the approximate spectral representation of Eq. (A5) may be used. Only the small or negative eigenvalues need be modified. The stabilizing transformation need not be applied at every time step. Instead, the values of P(i|i) may be examined occasionally or regularly, and stabilized if they have eigenvalues near zero.

The computational advantage in reducing the operations count by using first order approximations is apparent and scales as O(1/N_b). If the stochastic system has a special structure, like nearest neighbor block structure, computational savings may also be present for second order approximations. For general N.B.D. structure, second order approximation actually increases the computational work over straightforward, nonexpansion calculations. In spite of the additional complication and computational cost, higher order calculations are sometimes necessary and valuable. Higher order calculations for weakly coupled systems have been given in Shen and Gajic (1990).

To motivate second order approximations, we consider a case where P^{(0)} is the identity matrix and P^{(1)}(ǫ) has a large negative eigenvalue, λ^{(1)}_1, such that ǫ* λ^{(1)}_1 = −1 + δ* for the value ǫ* of interest. P^{(0)} + ǫ* P^{(1)}(ǫ*) will have at least one small eigenvalue, λ ∼ δ*, and a correspondingly large eigenvalue, O(1/δ*), for its inverse. Our approximate filter effectively replaces this large matrix component by terms of order O(1), since P^{(0)} is bounded from below while P^{(0)} + ǫ* P^{(1)}(ǫ*) is close to singular.

Acknowledgement
We thank the referees for their helpful comments. This work was supported by the U.S. Department of Energy, Grant No. DE-FG02-86ER53223.

References
Anderson, B.D.O. and J.B. Moore (1979). Optimal Filtering. Prentice-Hall, New Jersey.

Bierman, G.J. (1977). Factorization Methods for Discrete Sequential Estimation. Academic Press, New York.

Bryson, Jr., A.E. and Y.C. Ho (1969). Applied Optimal Control. Blaisdell Publishing Co., New York.

Canuto, C., M.Y. Hussaini, A. Quarteroni, and T.A. Zang (1988). Spectral Methods in Fluid Dynamics. Springer-Verlag, New York.

Cohn, S.E. and D.F. Parrish (1991). The behavior of forecast error covariances for a Kalman filter in two dimensions. Monthly Weather Review, 1757-1785.

Gajic, Z., D. Petkovski, and X. Shen (1990). Singularly Perturbed and Weakly Coupled Linear Control Systems. Lecture Notes in Control and Information Sciences, No. 140. Springer-Verlag, Berlin.

Jazwinski, A.H. (1970). Stochastic Processes and Filtering Theory. Academic Press, New York.

Kokotovic, P., W. Perkins, J. Cruz Jr. and G. D'Ans (1969). ǫ-coupling for near-optimum design of large scale linear systems. Proc. IEEE, 889-892.

Moore, J.B. (1973). Discrete time fixed lag smoothing. Automatica, 163.

Rauch, H.E., F. Tung and C.T. Striebel (1965). Maximum likelihood estimates of linear dynamic systems. AIAA J., 1445.

Sezer, M. and D. Siljak (1986). Nested ǫ-decomposition and clustering of complex systems. Automatica, 321-331.

Shen, X.-M. and Z. Gajic (1990). Near-optimum steady state regulators for stochastic linear weakly coupled systems. Automatica, 919-923.

APPENDIX A: FIRST ORDER N.B.D. MATRIX OPERATIONS

A) Storage and Operations Count
We examine the computational savings which occur when the matrix operations are performed only to first order in ǫ. We define the moments N₂ ≡ (Σ_{k=1}^{N_b} n_k²)/N_b and N₃ ≡ (Σ_{k=1}^{N_b} n_k³)/N_b. Thus if all blocks are the same size, n_k = n_b, then N₂ = n_b² and N₃ = n_b³. Our operation count is for the number of scalar multiplications. In contrast to the other sections, we store the first order block diagonal terms, ǫ P^{(1)}_{k,k}, in P^{(0)}(ǫ), and assume P^{(0)}_{k,k}(ǫ) = P^{(0)}_{k,k} + ǫ P^{(1)}_{k,k} is positive definite. This representation slightly reduces the operation count.

We consider two subclasses of N.B.D. matrices: general and nearest neighbor. General block diagonal matrices have no particular structure in P^{(1)} and P^{(2)}. We say a matrix P(ǫ) has nearest neighbor structure if and only if P^{(1)} has nonzero blocks only on the diagonal, P^{(1)}_{k,k}, and the adjacent bands, P^{(1)}_{k,k±1}. We say a matrix P(ǫ) has strongly nearest neighbor structure if and only if P^{(2)} has nonzero blocks only on the diagonal and the two adjacent bands, P^{(2)}_{k,k±1} and P^{(2)}_{k,k±2}, as well.

For N.B.D. matrices with a general structure on P^{(1)}, the storage requirement is N² for an arbitrary matrix and N(N+1)/2 for a symmetric matrix. For nearest neighbor matrices, the storage requirement is Σ_{k=1}^{N_b} (n_k² + 2 n_k n_{k+1}) for arbitrary matrices and Σ_{k=1}^{N_b} ( n_k(n_k+1)/2 + n_k n_{k+1} ) for symmetric matrices, where n_{N_b+1} ≡ 0. When all the n_k are equal, n_k = n_b, the storage requirements for nearest neighbor matrices are (3N_b − 2) n_b² for no symmetry, and (N_b − 1) n_b² + N_b n_b(n_b+1)/2 for symmetric matrices.

B) L D^{-1} L^T Factorization
The transformation T[P̄(ǫ)] ≡ [P^{(0)} + ǫ P^{(1)}_L] P^{(0)-1} [P^{(0)} + ǫ P^{(1)}_L]^T need not be explicitly computed by multiplying out P^{(1)}_L P^{(0)-1} P^{(1)T}_L. Instead, the implicit L D^{-1} L^T representation is usually sufficient. The L D^{-1} L^T factorization requires just Σ_{k=1}^{N_b} n_k³ multiplications to compute P^{(0)-1}. The L D^{-1} L^T representation of T[P̄(ǫ)] does require that both P^{(0)} and P^{(0)-1} be stored. Since both matrices are symmetric, this requires an additional storage allocation of Σ_{k=1}^{N_b} n_k(n_k+1)/2.

C) First Order Matrix Multiplication
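The first order product rule used in this subsection, R^{(0)} = P^{(0)} Q^{(0)} and R^{(1)} = P^{(0)} Q^{(1)} + P^{(1)} Q^{(0)}, can be checked numerically; a minimal NumPy sketch with illustrative names:

```python
import numpy as np

def block_diag_part(P, sizes):
    """Zeroth order part: keep only the diagonal blocks of P."""
    D = np.zeros_like(P)
    s = np.cumsum([0] + sizes)
    for a, b in zip(s[:-1], s[1:]):
        D[a:b, a:b] = P[a:b, a:b]
    return D

rng = np.random.default_rng(3)
sizes = [2, 2, 2]
N = sum(sizes)
eps = 1e-3
# P = P0 + eps*P1, Q = Q0 + eps*Q1 with block diagonal zeroth order parts
P0 = block_diag_part(rng.standard_normal((N, N)), sizes)
Q0 = block_diag_part(rng.standard_normal((N, N)), sizes)
P1 = rng.standard_normal((N, N)); P1 -= block_diag_part(P1, sizes)
Q1 = rng.standard_normal((N, N)); Q1 -= block_diag_part(Q1, sizes)

R_exact = (P0 + eps * P1) @ (Q0 + eps * Q1)
R0 = P0 @ Q0                      # block diagonal: n_k^3 multiplications per block
R1 = P0 @ Q1 + P1 @ Q0            # first order coupling
err = np.abs(R_exact - (R0 + eps * R1)).max()   # truncation error is O(eps^2)
```

The sketch works with full matrices for brevity; the operation counts quoted in the text come from performing the same products block by block.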
The N.B.D. structure is preserved under matrix multiplication. Let R = PQ; then R^{(0)} = P^{(0)} Q^{(0)} and R^{(1)} = P^{(0)} Q^{(1)} + P^{(1)} Q^{(0)}. Calculating R^{(0)} requires Σ_{k=1}^{N_b} n_k³ operations and calculating R^{(1)} requires 2 Σ_{k=1}^{N_b} n_k²(N − n_k) operations, for a total of 2N N₂ N_b − 2 N₃ N_b operations. If both P and Q have nearest neighbor symmetry, calculating R^{(1)} requires only 2 Σ_{k=1}^{N_b} n_k²(n_{k+1} + n_{k−1}) operations, where n_0 ≡ 0 and n_{N_b+1} ≡ 0, rather than 2N N₂ N_b − 2 N₃ N_b.

A second matrix operation which is often performed in filtering is S = RQR^T, where Q is symmetric. For ordinary matrices, this symmetric product requires N³ + N²(N+1)/2 operations. Computing S^{(0)} = R^{(0)} Q^{(0)} R^{(0)T} and R^{(0)} Q^{(0)} R^{(1)T} requires Σ_{k=1}^{N_b} [ n_k³/2 + (N + 1/2) n_k² ] = (N + 1/2) N₂ N_b + N₃ N_b / 2 operations. Estimating R^{(0)} Q^{(1)} R^{(0)T} requires 2 [N N₂ N_b − N₃ N_b] multiplications. Thus the symmetric product requires a total of (6N + 1) N₂ N_b / 2 − (3/2) N₃ N_b multiplications. For nearest neighbor matrices, a total of Σ_{k=1}^{N_b} n_k² [3 n_k + 1 + 5 (n_{k−1} + n_{k+1})] / 2 multiplications are required.

D) Matrix Inversion and the D^{-1} [D − L] D^{-1} [D − L]^T D^{-1} Factorization
To stabilize the order by order approximate inversion, we define the Inv[·] transformation to be the T[·] transformation applied to the approximate inverse: Inv[P̄(ǫ)] ≡ T[P̄(ǫ)^{-1}]. When the approximation is first order, Inv[·] reduces to

T[P^{(0)-1} − ǫ P^{(0)-1} P^{(1)} P^{(0)-1}] = P^{(0)-1} [P^{(0)} − ǫ P^{(1)}_L] P^{(0)-1} [P^{(0)} − ǫ P^{(1)}_L]^T P^{(0)-1} .

We refer to this factorization of the approximate inverse as the D^{-1} L′ D^{-1} L′^T D^{-1} factorization, where L′ ≡ P^{(0)} − ǫ P^{(1)}_L.

The approximate inverse, Inv[P], usually does not need to be computed explicitly. Instead the D^{-1} L′ D^{-1} L′^T D^{-1} representation of Inv[P̄(ǫ)] is defined implicitly. Given the L D^{-1} L^T representation of T[P̄(ǫ)], the D^{-1} L′ D^{-1} L′^T D^{-1} representation of Inv[T[P̄(ǫ)]] requires no additional storage and no multiplications. The inverse of Inv is Inv: Inv[Inv[T[P̄(ǫ)]]] = T[P̄(ǫ)], and the Inv[·] operation commutes with the T[·] operation: Inv[T[P̄(ǫ)]] = T[Inv[P̄(ǫ)]].

E) Inverse Matrix Updates
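The first order update formula derived in this subsection for P^{-1} ≡ M^{-1} + J, Eq. (A1), can be verified against a finite difference; a minimal NumPy sketch with illustrative names:

```python
import numpy as np

rng = np.random.default_rng(4)
N, eps = 4, 1e-5

def sym(A):
    return (A + A.T) / 2

M0 = np.eye(N); M1 = sym(rng.standard_normal((N, N)))
J0 = 2 * np.eye(N); J1 = sym(rng.standard_normal((N, N)))

def P_of(e):
    """Exact P(e) = [ (M0 + e*M1)^{-1} + (J0 + e*J1) ]^{-1}."""
    M = M0 + e * M1
    J = J0 + e * J1
    return np.linalg.inv(np.linalg.inv(M) + J)

P0 = P_of(0.0)                                  # P^{(0)} = [M^{(0)-1} + J^{(0)}]^{-1}
# Eq. (A1): first order correction to P
P1 = P0 @ np.linalg.inv(M0) @ M1 @ np.linalg.inv(M0) @ P0 - P0 @ J1 @ P0

fd = (P_of(eps) - P_of(-eps)) / (2 * eps)       # central difference dP/de at e = 0
err = np.abs(fd - P1).max()                     # should be O(eps^2)
```

The same check applies to Eq. (A2), since P^{(0)} M^{(0)-1} = I − P^{(0)} J^{(0)}.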
In Kalman filtering, we successively update the covariance matrix and then its inverse. We now examine updates of the D^{-1} [D − L] D^{-1} [D − L]^T D^{-1} representation under matrix addition. We let the matrices, M and J, be block diagonally dominant symmetric matrices with L D^{-1} L^T representations. We wish to derive an L D^{-1} L^T representation of P, where P^{-1} ≡ M^{-1} + J. The zeroth order matrix P^{(0)} satisfies P^{(0)-1} = M^{(0)-1} + J^{(0)}. Since M^{(0)-1} is given in the L D^{-1} L^T factorization, the computation of P^{(0)-1} requires only additions and no multiplications. However, computing P^{(0)} from P^{(0)-1} requires Σ_{k=1}^{N_b} (n_k³ + n_k²) operations. We note that P^{(0)-1} = (P^{-1})^{(0)}, but that (P^{-1})^{(1)} ≠ (P^{(1)})^{-1}. To determine P^{(1)}, we first determine (P^{-1})^{(1)}, and then solve for P^{(1)}:

P^{(1)} = P^{(0)} M^{(0)-1} M^{(1)} M^{(0)-1} P^{(0)} − P^{(0)} J^{(1)} P^{(0)} ,   (A1)

P^{(1)} = [I − P^{(0)} J^{(0)}] M^{(1)} [I − J^{(0)} P^{(0)}] − P^{(0)} J^{(1)} P^{(0)} .   (A2)

The second formulation is preferable when J^{(0)} ≪ M^{(0)-1}. Either formulation requires Σ_{k=1}^{N_b} n_k²(3N − n_k) operations.

F) Approximate Eigenvalues and Eigenvectors
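The first order eigenvalue formula of this subsection, Eq. (A3), is easy to check numerically; a minimal NumPy sketch in which S₀ is taken diagonal with well separated eigenvalues (names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
N, eps = 5, 1e-4

S0 = np.diag(np.arange(1.0, N + 1))             # distinct eigenvalues 1, ..., N
S1 = rng.standard_normal((N, N)); S1 = (S1 + S1.T) / 2

lam0, E0 = np.linalg.eigh(S0)                   # zeroth order eigenpairs (ascending)
lam = np.linalg.eigh(S0 + eps * S1)[0]          # exact eigenvalues of the perturbed matrix

# Eq. (A3): lambda_k ~ lambda_k^{(0)} + eps * e_k^{(0)T} S1 e_k^{(0)}
lam_approx = lam0 + eps * np.einsum('ik,ij,jk->k', E0, S1, E0)
err = np.abs(lam - lam_approx).max()            # remaining error is O(eps^2)
```

With well separated zeroth order eigenvalues the ordering of the perturbed spectrum is preserved, so the ascending outputs of `eigh` pair correctly.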
The eigenvalues and eigenvectors may be estimated from perturbation theory. We use the following basic result from linear algebra. Let P be a symmetric matrix of the form S₀ + ǫ S₁, and let {e^{(0)}_k} and {λ^{(0)}_k} be the eigenvectors and eigenvalues of S₀; then the eigenvalues and eigenvectors of P are asymptotically

λ_k ∼ λ^{(0)}_k + ǫ e^{(0)T}_k S₁ e^{(0)}_k ,   (A3)

e_k ∼ e^{(0)}_k − ǫ (S₀ − λ^{(0)}_k I)^{-1} ( S₁ − (e^{(0)T}_k S₁ e^{(0)}_k) I ) e^{(0)}_k ,   (A4)

P = Σ_{k=1}^{N} λ_k e_k e_k^T ∼ Σ_{k=1}^{N} (λ^{(0)}_k + ǫ λ^{(1)}_k)(e^{(0)}_k + ǫ e^{(1)}_k)(e^{(0)}_k + ǫ e^{(1)}_k)^T .   (A5)

(In Eq. (A4), the inverse is taken on the orthogonal complement of e^{(0)}_k, where S₀ − λ^{(0)}_k I is nonsingular.)

APPENDIX B: N.B.D. FIXED LAG DISCRETE SMOOTHERS
We consider suboptimal approximations of fixed lag Kalman smoothers with N.B.D. structure. Moore derived the fixed lag Kalman smoother (Moore (1973); Ch. 7.3 of Anderson and Moore (1979)) as the Kalman filter for the augmented state space, x_A(t)^T ≡ (x(t)^T, x(t−1)^T, . . . , x(t−n)^T), and then simplified the resulting augmented filter. In presenting the fixed lag smoother, we rewrite the equations and reorder the matrix indices of Anderson and Moore to achieve an explicit N.B.D. structure. We define

e^{(1)}_i = [I_N − J_i P(i|i)] H_i^T R_i^{-1} (y_i − H_i x̂(i|i−1)) ,

e^{(j+1)}_i = [I_N − J_{i−j} P(i−j|i−j)] Φ(i−j+1, i−j)^T e^{(j)}_i .

The fixed lag smoother is

x̂(i|i+n) = x̂(i|i) + P(i|i−1) Σ_{ℓ=1}^{n} e^{(ℓ+1)}_{i+ℓ} .   (B1)

The covariance of x̂(i|i+n) is

P(i|i+n) = P(i|i) − Σ_{ℓ=1}^{n} P^{(ℓ)}(i+ℓ|i+ℓ−1) ( J_{i+ℓ} − J_{i+ℓ} P(i+ℓ|i+ℓ) J_{i+ℓ} ) P^{(ℓ)}(i+ℓ|i+ℓ−1)^T ,   (B2)

where

P^{(ℓ)}(i+ℓ|i+ℓ−1) ≡ P(i|i−1) Π_{j=0}^{ℓ−1} [I_N − J_{i+j} P(i+j|i+j)] Φ(i+j+1, i+j)^T .   (B3)

In deriving Eqs. (B1)-(B3), we have replaced [I_N − H_i^T K_i^T] with [I_N − J_i P(i|i)], and replaced H_i^T [H_i P(i|i−1) H_i^T + R_i]^{-1} with [I_N − J_i P(i|i)] H_i^T R_i^{-1}. Alternatively, we could replace [I_N − J_i P(i|i)] with P(i|i−1)^{-1} P(i|i) and/or replace H_i^T R_i^{-1} − J_i P(i|i) H_i^T R_i^{-1} by P^{-1}(i|i−1) P(i|i) H_i^T R_i^{-1}. The alternative formulations have the advantage that they involve fewer matrix multiplications. Our present formulation has the advantage that P(i|i−1) appears only once in the expression, and that all the other terms are input quantities, usually known to all orders. In the limit that J_i ≪ P(i|i−1)^{-1}, our formulation approximates small terms while the alternative formulation approximates large terms. For these reasons, we generally prefer our formulation in Eqs. (B1)-(B3).

We stabilize the data assimilation by using P^{(ǫ)}_+(i−j|i−j) = T[P^{(ǫ)}(i−j|i−j)] in evaluating e^{(j+1)}_i and using P^{(ǫ)}_+(i|i−1) in Eq. (B1). If P(i|i+n) is of interest, we also stabilize our approximation of it.

Our work is motivated by and generalizes the results of Cohn and Parrish. In Appendix B of Cohn and Parrish (1991), the authors show that if H_i^T R_i^{-1} H_i is diagonal and the evolution equations are diagonal, then the estimation covariance will be diagonal. In this article, we extend their results to include block diagonal systems and Kalman smoothers. More importantly, we relax the requirement of exact diagonality and consider small offdiagonal terms. We expand the estimation equations in powers of the offdiagonal terms and develop numerically well-conditioned algorithms to compute these approximate estimation equations.

In applying Kalman filtering to global circulation models, Cohn and Parrish (1991) noted that the evolution equations are simplest in an eigenfunction basis while the data assimilation is simplest in a finite difference or finite element representation. By requiring that the measurement locations be distributed such that H_i^T R_i^{-1} H_i