Block Diagonally Dominant Positive Definite Sub-optimal Filters and Smoothers
Running title:
BLOCK DIAGONALLY DOMINANT APPROXIMATE FILTERS
Subtitle: Kalman filters and smoothers are approximated when the transition matrix and the incremental information are nearly block diagonal.
Kurt S. Riedel
Courant Institute of Mathematical Sciences
New York University
251 Mercer St.
New York, New York 10012
Abstract
We examine stochastic dynamical systems where the transition matrix, Φ, and the system noise covariance, ΓQΓ^T, are nearly block diagonal. When H^T R^{-1} H is also nearly block diagonal, where R is the observation noise covariance and H is the observation matrix, our suboptimal filters/smoothers are always positive semidefinite and have improved numerical properties. Applications to distributed dynamical systems with time dependent pixel imaging are discussed.

I. INTRODUCTION

In this article, we examine suboptimal filters and smoothers of stochastic systems when the dynamics and the measurements are nearly block diagonal (N.B.D.). We assume that the transition matrix, Φ(i+1, i), the system noise covariance, [ΓQΓ^T]_i, the initial state covariance, P(0|0), and the incremental information, J_i ≡ H_i^T R_i^{-1} H_i, are all N.B.D. We then derive estimation equations for the state vector, x̂_i, and the covariance, P(i|i), which approximate the optimal estimates to second order in ǫ.

Our stochastic systems are similar to the widely studied weakly coupled systems (Kokotovic et al. (1969), Sezer and Siljak (1986), Gajic et al. (1990), Shen and Gajic (1990)). Our N.B.D. systems are not limited to two-block systems, but apply to an arbitrary number of blocks. Furthermore, we require only that H_i^T R_i^{-1} H_i is N.B.D. This contrasts with the stronger hypothesis of weakly coupled systems, namely that H_i and R_i are separately weakly coupled.

The existing theory of weakly coupled systems concentrates on the convergence of approximations to the complete system as ǫ tends to zero. Thus the existing analysis considers only the case where ǫ is sufficiently small as to preclude the loss of positive definiteness in the approximate equations. Therefore previous analyses have not explicitly required positive definiteness.

Our emphasis is on well-conditioned approximation of x̂_i and P(i|i) for finite, but small, values of the coupling parameter, ǫ. Formally, our expansions require that the zeroth order N.B.D. matrices are all uniformly much larger than the remaining offdiagonal terms.
In practice, the coupling parameter, ǫ, is not vanishingly small, and there may be component directions where the first order terms almost cancel the zeroth order terms. To prevent the approximate covariance matrix, P^{(ǫ)}(i|i), from losing positive definiteness, we add second order terms to the approximate covariance. These additional terms not only guarantee positive semidefiniteness, but also provide a matrix factorization.

Our motivation for the study of N.B.D. systems is the analysis of distributed systems of partial differential equations for fluid flow. We estimate the fluid flow as a function of time and space, u(r, t), where u(r, t) satisfies the Navier-Stokes equations: ∂_t u + u·∇u = −∇p + ν∆u, ∇·u = 0. We are given continuous time measurements of the velocity field on a coarse grid in space. We expand the Navier-Stokes equation in the set of eigenfunctions of the laminar flow linear stability problem (Canuto et al. (1988)). We truncate the eigenfunction expansion in the middle of the inertial range, and model the effects of the discarded modes through an anomalously large diffusion coefficient.

When spatial inhomogeneities and nonlinearities are weak, the transition matrix, Φ(i+1, i), for the zeroth order eigenfunction basis often will be N.B.D. To decouple the estimation equations to leading order, we assume both [ΓQΓ^T]_i and P(0|0) are nearly block diagonal.

In our prototypical system, the elements of the measurement evaluation matrix, H_i, are evaluations of basis functions, Ψ_k, at the spatial locations, z_ℓ, of the measurements, y_ℓ. When the z_ℓ are distributed more or less uniformly in space, the Ψ_k are orthogonal, and R_i is a multiple, σ_i, of the identity, then

(H_i^T R_i^{-1} H_i)_{k,k′} = (1/σ_i) Σ_{ℓ=1}^{m} Ψ_k(z_ℓ) Ψ_{k′}(z_ℓ) ∼ (m/σ_i) ∫ Ψ_k(z) Ψ_{k′}(z) dz ∼ c_k δ_{k,k′} .   (1.1)

Thus H_i^T R_i^{-1} H_i being nearly block diagonal corresponds to the measurement locations being nearly uniformly distributed and approximating spatial integration on the scalelength of the shortest wavelength basis function.

For such pixel type measurements, the number of pixels needs to exceed the number of different diagonal blocks of eigenfunctions. If the measurements are spatially uniform, but of insufficient number to distinguish the various eigenfunctions, the evolution equations will be partially coupled due to spatial aliasing.

Section II defines N.B.D. matrices, and presents several stabilizing transformations and approximate factorizations. In Section III, we review the standard discrete Kalman filter and derive a positive definite suboptimal approximation to the Kalman filter. Section IV and Appendix B derive similar suboptimal positive definite approximations to the discrete Kalman smoothers for fixed intervals and for fixed lags, respectively. Section V discusses our N.B.D. formulation. Appendix A examines the numerical advantages of computing the basic matrix operations only to first order.

II. NEARLY BLOCK DIAGONAL MATRIX REPRESENTATIONS AND OPERATIONS

A) Matrix Structure
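The decomposition defined in this subsection can be made concrete with a small numerical example. The following sketch (assuming NumPy; the block sizes, entries, and helper name `split_nbd` are illustrative) splits a matrix with strong diagonal blocks into its block diagonal part and the offdiagonal-block remainder:

```python
import numpy as np

def split_nbd(P, sizes):
    """Split P into its block diagonal part P0 and the offdiagonal remainder."""
    P0 = np.zeros_like(P)
    s = np.cumsum([0] + sizes)
    for a, b in zip(s[:-1], s[1:]):
        P0[a:b, a:b] = P[a:b, a:b]
    return P0, P - P0

rng = np.random.default_rng(0)
sizes = [2, 3]                                  # two blocks, N_b = 2
N = sum(sizes)
eps = 0.05                                      # weak coupling parameter
coupling = rng.standard_normal((N, N))
P = np.eye(N) + eps * (coupling + coupling.T)   # nearly block diagonal matrix
P0, offdiag = split_nbd(P, sizes)
# The offdiagonal coupling is uniformly smaller than the block diagonal part:
ratio = np.linalg.norm(offdiag) / np.linalg.norm(P0)
```

Here `P0` plays the role of P^{(0)} (including the first order block diagonal terms) and `offdiag` the role of ǫ P^{(1)}.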
We consider the class of nearly block diagonal (N.B.D.) matrices to be N × N matrices of the form P(ǫ) = P^{(0)}(ǫ) + ǫ P^{(1)} + ǫ² P^{(2)}(ǫ), where P^{(2)}(ǫ) contains second order and higher terms in ǫ. The weak coupling parameter, ǫ, is a formal small expansion parameter. P^{(0)}(ǫ) is block diagonal of the form

P^{(0)}(ǫ) = diag( P^{(0)}_{11}(ǫ), P^{(0)}_{22}(ǫ), . . . , P^{(0)}_{N_b N_b}(ǫ) ) ,   (2.1)

where the P^{(0)}_{kk} entry is an n_k × n_k matrix for k = 1, 2, . . . , N_b. The block sizes, n_1, n_2, . . . , n_{N_b}, are fixed in this article, i.e. all matrices have the same block structure. We often suppress the functional dependence on ǫ in P^{(i)}(ǫ), i = 0, 1, 2.

We denote the truncated approximations of P(ǫ) by P̄(ǫ), where P̄(ǫ) = P^{(0)} + ǫ P^{(1)} for first order approximations and P̄(ǫ) = P^{(0)} + ǫ P^{(1)} + ǫ² P^{(2)} for second order approximations. The first order block diagonal terms may be included in either P^{(0)}(ǫ) or ǫ P^{(1)}. Including the block diagonal terms in P^{(0)}(ǫ) reduces storage requirements; however, the resulting equations are slightly more complicated. For simplicity, we include the first order block diagonal terms in ǫ P^{(1)}. We define P_L to be the strictly lower block triangular part of a matrix P plus half of the block diagonal part of P.

B) Stabilizing Transformations and the L D^{-1} L^T Factorization
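The first order stabilizing transformation defined below (Eq. (2.2)) can be exercised numerically. A minimal sketch, assuming NumPy; the block sizes, entries, and helper names (`block_lower`, `stabilize`) are illustrative:

```python
import numpy as np

def block_lower(P, sizes):
    """Strictly lower block triangular part of P plus half of its block diagonal."""
    L = np.zeros_like(P)
    s = np.cumsum([0] + sizes)
    for a, b in zip(s[:-1], s[1:]):
        L[b:, a:b] = P[b:, a:b]          # blocks strictly below the diagonal
        L[a:b, a:b] = 0.5 * P[a:b, a:b]  # half of the diagonal block
    return L

def stabilize(P0, P1, eps, sizes):
    """T[P] = [P0 + eps*P1_L] P0^{-1} [P0 + eps*P1_L]^T  (Eq. (2.2))."""
    L = P0 + eps * block_lower(P1, sizes)
    return L @ np.linalg.solve(P0, L.T)

rng = np.random.default_rng(1)
sizes = [2, 3]
P0 = np.eye(5)                        # block diagonal, positive definite
A = rng.standard_normal((5, 5))
P1 = A + A.T                          # symmetric first order coupling
eps = 0.9                             # large enough that P0 + eps*P1 may be indefinite
T = stabilize(P0, P1, eps, sizes)
```

For symmetric P1, `block_lower` satisfies P1_L + P1_L^T = P1, so `T` agrees with P0 + ǫP1 through first order, differs from it only by the positive second order term ǫ² P^{(1)}_L P^{(0)-1} P^{(1)T}_L, and is positive semidefinite by construction.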
The truncated approximations, P̄(ǫ), to P(ǫ) need not be positive semidefinite even when P^{(0)} and P(ǫ) are positive definite. We assume that P^{(0)} is positive definite and that P^{(0)}, P(ǫ) and P̄(ǫ) are symmetric, and we define the following transformations:

T[P̄(ǫ)] ≡ [P^{(0)} + ǫ P^{(1)}_L] P^{(0)-1} [P^{(0)} + ǫ P^{(1)}_L]^T   (2.2)

for first order approximations, and

T[P̄(ǫ)] ≡ [P^{(0)} + ǫ P^{(1)}_L + ǫ² (P^{(2)}_L − G^{(2)}_L)] P^{(0)-1} [P^{(0)} + ǫ P^{(1)}_L + ǫ² (P^{(2)}_L − G^{(2)}_L)]^T   (2.3)

for second order approximations, where G^{(2)} ≡ P^{(1)}_L P^{(0)-1} P^{(1)T}_L. Both transformations produce positive semidefinite matrices with L D^{-1} L^T block factorizations. The first order transformation (2.2) differs from P^{(0)} + ǫ P^{(1)} by the second order term ǫ² P^{(1)}_L P^{(0)-1} P^{(1)T}_L, and is thus strictly larger than P^{(0)} + ǫ P^{(1)}. The second order transformation (2.3) differs from P^{(0)} + ǫ P^{(1)} + ǫ² P^{(2)} by third order terms; however, these third order terms need not be positive semidefinite. An alternative transformation is

T_b[P̄(ǫ)] ≡ [P^{(0)} + ǫ P^{(1)}_L + ǫ² P^{(2)}_L] P^{(0)-1} [P^{(0)} + ǫ P^{(1)}_L + ǫ² P^{(2)}_L]^T .

T_b[P̄(ǫ)] approximates P(ǫ) only to second order, but adds only positive terms. Therefore T_b[P̄(ǫ)] can be used to provide second order upper bounds.

A number of other stabilizing transformations may be defined. ǫ P^{(1)}_L can be replaced by ǫ P^{(1)} at the cost of losing the L D^{-1} L^T block factorization. A more useful transformation is to decompose P̄(ǫ) into its spectral representation, and then to set any negative eigenvalues of P̄(ǫ) to zero. This spectral transformation has the advantage that it uses the smallest possible correction which makes the transformed matrix positive semidefinite. In Appendix A, we describe a first order approximation to the spectral decomposition. Instead of actually performing the singular value decomposition, we may simply test P̄(ǫ) for negative eigenvalues using Eq. (A3). Only the eigenvalues with small λ^{(0)}_k need be tested and/or replaced.
Thus the eigendecomposition approach is especially attractive when only a small number of eigenvalues are questionable.

T[P̄(ǫ)] also has a block L D L^T representation: [I_N + ǫ P^{(1)}_L P^{(0)-1}] P^{(0)} [I_N + ǫ P^{(1)}_L P^{(0)-1}]^T. This factorization is less numerically efficient than the L D^{-1} L^T representation.

In our filtering applications, we use the transformation, P^{(ǫ)}_+ = T[P^{(ǫ)}], to stabilize the data assimilation and variance evaluations. We note that T[T[P̄(ǫ)]] = T[P̄(ǫ)]. This property is important when the covariance matrix, P^{(ǫ)}, is modified many times with small updates. We let T[·] denote the appropriate stabilizing transformation.

III. DIAGONALLY DOMINANT DISCRETE KALMAN FILTERS
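As a point of reference for the estimation equations reviewed below, one predict/update cycle of the standard discrete Kalman filter can be sketched as follows. This is a minimal NumPy sketch with illustrative dimensions and names, not the suboptimal N.B.D. filter itself:

```python
import numpy as np

def kf_step(x, P, Phi, Q_Gamma, H, R, y):
    """One cycle of the standard filter: time update, then measurement update."""
    # Time update
    x_pred = Phi @ x
    P_pred = Phi @ P @ Phi.T + Q_Gamma
    # Measurement update in information form: P(i|i)^{-1} = P(i|i-1)^{-1} + H^T R^{-1} H
    J = H.T @ np.linalg.solve(R, H)
    P_new = np.linalg.inv(np.linalg.inv(P_pred) + J)
    K = P_new @ H.T @ np.linalg.inv(R)          # Kalman gain
    x_new = x_pred + K @ (y - H @ x_pred)
    return x_new, P_new

rng = np.random.default_rng(2)
N, m = 4, 2
Phi = 0.9 * np.eye(N)
Q_Gamma = 0.1 * np.eye(N)
H = rng.standard_normal((m, N))
R = np.eye(m)
x, P = np.zeros(N), np.eye(N)
for _ in range(3):
    x, P = kf_step(x, P, Phi, Q_Gamma, H, R, rng.standard_normal(m))
```

The gain computed here from the information form coincides with the innovation form K = P(i|i−1) H^T [H P(i|i−1) H^T + R]^{-1}; the N.B.D. filter of this section replaces the explicit inversions by order-by-order block computations.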
We consider the discrete linear state space model:

x_{i+1} = Φ(i+1, i) x_i + Γ_i w_i ,   (3.1)

y_i = H_i x_i + v_i ,   (3.2)

where x_i is the state vector of dimension N, y_i is the measurement vector of dimension m, and Φ(j, i) is the N × N nonsingular deterministic part of the map from time i to time j. The system noise, w_i, is assumed to be an r-dimensional white Gaussian sequence with covariance Q_i. The measurement noise is an m-dimensional white Gaussian sequence with nondegenerate covariance R_i. The m × N measurement evaluation matrix, H_i, maps the state vector, x_i, onto the deterministic part of the measurements. We define the N × N matrices Q_{Γ,i} ≡ Γ_i Q_i Γ_i^T and J_i ≡ H_i^T R_i^{-1} H_i.

The standard Kalman filter estimates the state vector, x̂(i|j), at time i given the measurements, y_1, . . . , y_j, up to time j by the time evolution update:

x̂(i+1|i) = Φ(i+1, i) x̂(i|i) .   (3.3)

The covariance, P(i|j), of the estimate, x̂(i|j), evolves as

P(i+1|i) = Φ(i+1, i) P(i|i) Φ^T(i+1, i) + Q_{Γ,i} .   (3.4)

x̂(0|0) and P(0|0) are given. The measurement update is

x̂(i|i) = x̂(i|i−1) + K_i (y_i − H_i x̂(i|i−1)) ,   (3.5)

P(i|i)^{-1} = P(i|i−1)^{-1} + H_i^T R_i^{-1} H_i ,   (3.6)

where K_i is the N × m Kalman gain:

K_i = [P(i|i−1)^{-1} + H_i^T R_i^{-1} H_i]^{-1} H_i^T R_i^{-1} = P(i|i) H_i^T R_i^{-1} .   (3.7)

We assume that Φ(i+1, i), Q_{Γ,i}, P(0|0) and J_i ≡ H_i^T R_i^{-1} H_i are N.B.D. We assume that the leading order operator, Φ^{(0)}(i+1, i), is normal, so that its eigenvectors are orthogonal. For clarity, we denote Φ^{(0)}(i+1, i) by Λ_i. We denote the k, ℓ-th subblock of P(i|j) by P(i|j)_{k,ℓ}, and use similar subscripts for the subblocks of J_i, Q_{Γ,i}, etc. We present the expansion of the Kalman filter only to first order. Higher order expressions are similar, but longer.

The time evolution update for x̂, Eq. (3.3), may be computed to arbitrary order if desired. The time evolution of the covariance for the standard N.B.D. representation satisfies

P^{(0)}(i+1|i)_{k,k} = Λ(i)_k P^{(0)}(i|i)_{k,k} Λ(i)_k^T + [ΓQΓ^T]^{(0)}_{i,{k,k}} ,   (3.8)

P^{(1)}(i+1|i)_{k,ℓ} = Λ(i)_k P^{(1)}(i|i)_{k,ℓ} Λ(i)_ℓ^T + Φ^{(1)}(i+1, i)_{k,ℓ} P^{(0)}(i|i)_{ℓ,ℓ} Λ(i)_ℓ^T + Λ(i)_k P^{(0)}(i|i)_{k,k} Φ^{(1)}(i+1, i)^T_{ℓ,k} + [Q_{Γ,i}]^{(1)}_{k,ℓ} .   (3.9)

The measurement update of x̂(i|i) and P(i|i) is separated into four steps. First, the zeroth order, block diagonal approximation to P(i|i) is determined by solving the block system

P^{(0)}(i|i) = [P^{(0)}(i|i−1)^{-1} + J^{(0)}_i]^{-1} .   (3.10)

Second, the first order terms of P(i|i) are

P^{(1)}(i|i) = P^{(0)}(i|i) [P^{(0)}(i|i−1)^{-1} P^{(1)}(i|i−1) P^{(0)}(i|i−1)^{-1} − J^{(1)}_i] P^{(0)}(i|i) .   (3.11)

Third, P(i|i) is forced to be positive semidefinite using the transformation P^{(ǫ)}_+(i|i) = T[P^{(ǫ)}(i|i)]. Finally, we update our estimate of x(i|i) using

x̂(i|i) = x̂(i|i−1) + P^{(ǫ)}_+(i|i) H_i^T R_i^{-1} [y_i − H_i x̂(i|i−1)] .   (3.12)

The stabilizing terms, T[P^{(ǫ)}(i|i)] − P^{(ǫ)}(i|i), are not propagated in the filter. Instead, new stabilizing terms are calculated at every data assimilation step. If P(i+1|i) or P(i|i) needs to be evaluated to assess the uncertainty in the state space estimate, we use the stabilized approximations.

Comments:

1) Our covariance matrices, P^{(ǫ)}(i|i), are approximations of the covariance matrices of the optimal estimate, x̂(i|i), and not the actual covariance of the approximate estimate, x̂^{(ǫ)}(i|i).

2) Computational savings occur only for the first order approximation to the state vector covariance matrix, P, and not for the state estimate. The computational requirements of second order calculations are approximately equal to the costs of the original Kalman filter. Our approximate filter does not require successive matrix inversions, and is therefore more numerically stable.

3) The zeroth order block matrices, P^{(0)}(i|i) and P^{(0)}(i|i−1), are positive definite.

4) The stabilizing transformations, P → T[P], guarantee positive semidefiniteness.

5) In general, the block diagonal structure is incompatible with sequential processing of the measurements, since each H_{i,{k}}^T R^{-1}_{{k,k}} H_{i,{k}} separately is usually not block diagonal.

6) Suboptimal versions of the information filter reformulation may be constructed using duality. A second order upper bound on P^{-1} may be constructed using an information filter and the stabilizing transformation, T_b, thereby producing a lower bound on P.

7) Different suboptimal filters with positive definite covariance may be constructed by expanding other formulations of the Kalman filter order by order and inserting the transformation P → T[P] whenever necessary. To guarantee that the N.B.D. structure is fully utilized, each matrix in the reformulation should be N.B.D. For example, replacing Eq. (3.6) by P(i|i) = P(i|i−1) − K_i H_i P(i|i−1), or P(i|i) = [I_N − K_i H_i] P(i|i−1) [I_N − K_i H_i]^T + K_i R_i K_i^T, yields a system of equations where each term in the evaluations is not explicitly N.B.D. Similarly, replacing the Kalman gain matrix, K_i, of Eq. (3.7) with the representation K_i ≡ P(i|i−1) H_i^T [H_i P(i|i−1) H_i^T + R_i]^{-1}, which is not in N.B.D. form, results in a system which is not explicitly N.B.D.

IV. N.B.D. FIXED INTERVAL SMOOTHERS
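One backward step of the Rauch-Tung-Striebel recursion discussed in this section, Eqs. (4.1)-(4.2), can be sketched as follows; a minimal NumPy sketch with illustrative names, written for the full (unexpanded) matrices:

```python
import numpy as np

def rts_step(x_filt, P_filt, x_pred, P_pred, x_next_sm, P_next_sm, Phi):
    """Backward sweep step: combine the filtered estimate at time i with the
    smoothed estimate at time i+1 (Eqs. (4.1)-(4.2))."""
    C = P_filt @ Phi.T @ np.linalg.inv(P_pred)   # smoother gain P(i|i) Phi^T P(i+1|i)^{-1}
    x_sm = x_filt + C @ (x_next_sm - x_pred)
    P_sm = P_filt + C @ (P_next_sm - P_pred) @ C.T
    return x_sm, P_sm

# If the smoothed quantities at i+1 equal the predicted ones, no correction occurs:
Phi = 0.5 * np.eye(3)
P_filt = np.eye(3)
P_pred = Phi @ P_filt @ Phi.T + 0.1 * np.eye(3)
x_filt = np.ones(3)
x_pred = Phi @ x_filt
x_sm, P_sm = rts_step(x_filt, P_filt, x_pred, P_pred, x_pred, P_pred, Phi)
```

In the suboptimal N.B.D. version, `P_filt` and the inverse of `P_pred` are replaced by their stabilized order-by-order approximations.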
In this section, we derive suboptimal, second order approximations to the various formulations of the fixed interval Kalman smoother. We denote the final measurement time by N_f. We begin with the Rauch-Tung-Striebel (R.T.S.) formulation of the smoother. We then present a new information formulation of the R.T.S. smoother as well as the Bryson-Frazier formulation. The R.T.S. smoother consists of a forward Kalman filter followed by a backward smoother correction. This structure arises because the estimation equations for x̂(i|N_f) have a block tribanded structure. The forward-backward sweeps correspond to the standard algorithm for solving block tribanded matrices. In our notation, the R.T.S. smoother (Rauch et al. (1965), Bryson and Ho, Ch. 13.2 (1969)) is

x̂(i|N_f) = x̂(i|i) + P(i|i) Φ(i+1, i)^T P^{-1}(i+1|i) ( x̂(i+1|N_f) − x̂(i+1|i) ) ,   (4.1)

P(i|N_f) = P(i|i) + P(i|i) Φ(i+1, i)^T P^{-1}(i+1|i) [P(i+1|N_f) − P(i+1|i)] P^{-1}(i+1|i) Φ(i+1, i) P(i|i) .   (4.2)

We assume that P(i|i) and P(i+1|i) have been computed using the N.B.D. approximations and stabilizing transformations of Sec. II. We stabilize both P(i|i) and P^{-1}(i+1|i) before evaluating Eq. (4.1) to all orders. The R.T.S. fixed interval smoother is explicitly in N.B.D. form, and the N.B.D. expansion of Secs. II and III is used to evaluate P(i|N_f) to second order. To ensure positive definiteness, we stabilize our estimate of P(i|N_f): P^{(ǫ)}_+(i|N_f) = T[P^{(ǫ)}(i|N_f)].

A desirable property of a smoother is that P^{(ǫ)}(i|i) ≥ P^{(ǫ)}(i|N_f) ≥ 0, and unfortunately our suboptimal approximation of the R.T.S. smoother does not explicitly ensure this property for moderate values of ǫ. In contrast, we now show that the information formulation of the R.T.S. smoother covariance equation possesses the property that P^{(ǫ)}(i|N_f)^{-1} ≥ P^{(ǫ)}(i|i)^{-1} ≥ 0. We apply the Sherman-Morrison matrix inverse identity to Eq. (4.2) twice and simplify to yield

P^{-1}(i|N_f) = P^{-1}(i|i) + Φ(i+1, i)^T [ ( P^{-1}(i+1|N_f) − P^{-1}(i+1|i) )^{-1} + Q_{Γ,i} ]^{-1} Φ(i+1, i) .   (4.3)

We expand Eq. (4.3) order by order in ǫ and apply the stabilizing transformation, P^{(ǫ)}(i|N_f)^{-1} → T[P^{(ǫ)}(i|N_f)^{-1}].

The original R.T.S. formulation requires that the evolution equations be integrated backward in time during the backward sweep. Since we are interested in distributed dynamical systems with dissipation and diffusion, such a backward integration is ill-conditioned. The Bryson-Frazier formulation of the fixed interval smoother reduces this problem by making the following change of variables for the smoother correction:

x̂(i|N_f) = x̂(i|i) − P(i|i) Φ(i+1, i)^T λ(i) ,   (4.4)

P(i|N_f) = P(i|i) − P(i|i) Φ(i+1, i)^T Λ(i) Φ(i+1, i) P(i|i) .   (4.5)

For the N vector λ(i) and the N × N positive semidefinite symmetric matrix Λ(i), equations (4.1)-(4.2) transform to

λ(i−1) = (I_N − P(i|i) J_i)^T [ Φ(i+1, i)^T λ(i) − H_i^T R_i^{-1} (y_i − H_i x̂(i|i−1)) ] ,   (4.6)

Λ(i−1) = (I_N − P(i|i) J_i)^T Φ(i+1, i)^T Λ(i) Φ(i+1, i) (I_N − P(i|i) J_i) + J_i − J_i P(i|i) J_i ,   (4.7)

with the initialization λ(N_f) = 0, Λ(N_f) = 0. Λ(i) is positive semidefinite, but its approximation, Λ^{(ǫ)}(i), need not be. We do not stabilize our estimate of Λ^{(ǫ)}(i) or any term in Eq. (4.7), since adding positive definite terms to Λ(i) will tend to underestimate P(i|N_f). Instead, we again apply the stabilizing transformation only to P(i|N_f): P(i|N_f) → T[P(i|N_f)].

V. CONCLUSION
In this article, we have given first and second order approximations for the Kalman filter and a number of smoothers by expanding the estimation equations in powers of the coupling parameter. We have described formulations of the estimation equations which explicitly preserve the N.B.D. structure. We apply stabilizing transformations to ensure that the approximate covariance is positive semidefinite. We do not propagate these stabilizing terms in the Kalman filter in order to minimize the perturbation.

Other N.B.D. formulations are possible where stabilizing terms are added to the covariance and propagated in the filter. To minimize the effect of these terms, the approximate spectral representation of Eq. (A5) may be used. Only the small or negative eigenvalues need be modified. The stabilizing transformation need not be applied at every time step. Instead, the values of P(i|i) may be examined occasionally or regularly, and stabilized if they have eigenvalues near zero.

The computational advantage in reducing the operations count by using first order approximations is apparent and scales as O(1/N_b). If the stochastic system has a special structure, like nearest neighbor block structure, computational savings may also be present for second order approximations. For general N.B.D. structure, second order approximation actually increases the computational work over straightforward, nonexpansion calculations. In spite of the additional complication and computational cost, higher order calculations are sometimes necessary and valuable. Higher order calculations for weakly coupled systems have been given in Shen and Gajic (1990).

To motivate second order approximations, we consider a case where P^{(0)} is the identity matrix and P^{(1)}(ǫ) has a large negative eigenvalue, λ^{(1)}_1, such that ǫ* λ^{(1)}_1 = −1 + δ* for the value ǫ* of interest. P^{(0)} + ǫ* P^{(1)}(ǫ*) will have at least one small eigenvalue, λ ∼ δ*, and a correspondingly large eigenvalue, O(1/δ*), for its inverse. Our approximate filter effectively replaces this large matrix component by terms of order O(1), since P^{(0)} is bounded from below while P^{(0)} + ǫ* P^{(1)}(ǫ*) is close to singular.

Acknowledgement
We thank the referees for their helpful comments. This work was supported by the U.S. Department of Energy, Grant No. DE-FG02-86ER53223.

References
Anderson, B.D.O. and J.B. Moore (1979). Optimal Filtering. Prentice-Hall, New Jersey.

Bierman, G.J. (1977). Factorization Methods for Discrete Sequential Estimation. Academic Press, New York.

Bryson, Jr., A.E. and Y.C. Ho (1969). Applied Optimal Control. Blaisdell Publishing Co., New York.

Canuto, C., M.Y. Hussaini, A. Quarteroni, and T.A. Zang (1988). Spectral Methods in Fluid Dynamics. Springer-Verlag, New York.

Cohn, S.E. and D.F. Parrish (1991). The behavior of forecast error covariances for a Kalman filter in two dimensions. Monthly Weather Review, 1757-1785.

Gajic, Z., D. Petkovski, and X. Shen (1990). Singularly Perturbed and Weakly Coupled Linear Control Systems. Lecture Notes in Control and Information Sciences, No. 140. Springer-Verlag, Berlin.

Jazwinski, A.H. (1970). Stochastic Processes and Filtering Theory. Academic Press, New York.

Kokotovic, P., W. Perkins, J. Cruz Jr. and G. D'Ans (1969). ǫ-coupling for near-optimum design of large scale linear systems. Proc. IEEE, 889-892.

Moore, J.B. (1973). Discrete time fixed lag smoothing. Automatica, 163.

Rauch, H.E., F. Tung and C.T. Striebel (1965). Maximum likelihood estimates of linear dynamic systems. AIAA J., 1445.

Sezer, M. and D. Siljak (1986). Nested ǫ-decomposition and clustering of complex systems. Automatica, 321-331.

Shen, X.-M. and Z. Gajic (1990). Near-optimum steady state regulators for stochastic linear weakly coupled systems. Automatica, 919-923.

APPENDIX A: FIRST ORDER N.B.D. MATRIX OPERATIONS

A) Storage and Operations Count
We examine the computational savings which occur when the matrix operations are performed only to first order in ǫ. We define the moments N₂ ≡ (Σ_{k=1}^{N_b} n_k²)/N_b and N₃ ≡ (Σ_{k=1}^{N_b} n_k³)/N_b. Thus if all blocks are the same size, n_k = n_b, then N₂ = n_b² and N₃ = n_b³. Our operation count is for the number of scalar multiplications. In contrast to the other sections, we store the first order block diagonal terms, ǫ P^{(1)}_{k,k}, in P^{(0)}(ǫ), and assume P^{(0)}_{k,k}(ǫ) = P^{(0)}_{k,k} + ǫ P^{(1)}_{k,k} is positive definite. This representation slightly reduces the operation count.

We consider two subclasses of N.B.D. matrices: general and nearest neighbor. General block diagonal matrices have no particular structure in P^{(1)} and P^{(2)}. We say a matrix P(ǫ) has nearest neighbor structure if and only if P^{(1)} has nonzero blocks only on the diagonal, P^{(1)}_{k,k}, and the adjacent bands, P^{(1)}_{k,k±1}. We say a matrix P(ǫ) has strongly nearest neighbor structure if and only if P^{(2)} has nonzero blocks only on the diagonal and the two adjacent bands, P^{(2)}_{k,k±1} and P^{(2)}_{k,k±2}, as well.

For N.B.D. matrices with a general structure on P^{(1)}, the storage requirement is N² for an arbitrary matrix and N(N+1)/2 for a symmetric matrix. For nearest neighbor matrices, the storage requirement is Σ_{k=1}^{N_b} (n_k² + 2 n_k n_{k+1}) for arbitrary matrices and Σ_{k=1}^{N_b} ( n_k(n_k+1)/2 + n_k n_{k+1} ) for symmetric matrices, where n_{N_b+1} ≡ 0. When all the n_k are equal, n_k = n_b, the storage requirements for nearest neighbor matrices are (3N_b − 2) n_b² for no symmetry, and (N_b − 1) n_b² + N_b n_b(n_b+1)/2 for symmetric matrices.

B) L D^{-1} L^T Factorization
The transformation T[P̄(ǫ)] ≡ [P^{(0)} + ǫ P^{(1)}_L] P^{(0)-1} [P^{(0)} + ǫ P^{(1)}_L]^T need not be explicitly computed by multiplying out P^{(1)}_L P^{(0)-1} P^{(1)T}_L. Instead, the implicit L D^{-1} L^T representation is usually sufficient. The L D^{-1} L^T factorization requires just Σ_{k=1}^{N_b} n_k³ multiplications to compute P^{(0)-1}. The L D^{-1} L^T representation of T[P̄(ǫ)] does require that both P^{(0)} and P^{(0)-1} be stored. Since both matrices are symmetric, this requires an additional storage allocation of Σ_{k=1}^{N_b} n_k(n_k+1)/2.

C) First Order Matrix Multiplication
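The first order product rule used in this subsection, R^{(0)} = P^{(0)} Q^{(0)} and R^{(1)} = P^{(0)} Q^{(1)} + P^{(1)} Q^{(0)}, can be checked numerically; a minimal NumPy sketch with illustrative names:

```python
import numpy as np

def block_diag_part(P, sizes):
    """Zeroth order part: keep only the diagonal blocks of P."""
    D = np.zeros_like(P)
    s = np.cumsum([0] + sizes)
    for a, b in zip(s[:-1], s[1:]):
        D[a:b, a:b] = P[a:b, a:b]
    return D

rng = np.random.default_rng(3)
sizes = [2, 2, 2]
N = sum(sizes)
eps = 1e-3
# P = P0 + eps*P1, Q = Q0 + eps*Q1 with block diagonal zeroth order parts
P0 = block_diag_part(rng.standard_normal((N, N)), sizes)
Q0 = block_diag_part(rng.standard_normal((N, N)), sizes)
P1 = rng.standard_normal((N, N)); P1 -= block_diag_part(P1, sizes)
Q1 = rng.standard_normal((N, N)); Q1 -= block_diag_part(Q1, sizes)

R_exact = (P0 + eps * P1) @ (Q0 + eps * Q1)
R0 = P0 @ Q0                      # block diagonal: n_k^3 multiplications per block
R1 = P0 @ Q1 + P1 @ Q0            # first order coupling
err = np.abs(R_exact - (R0 + eps * R1)).max()   # truncation error is O(eps^2)
```

The sketch works with full matrices for brevity; the operation counts quoted in the text come from performing the same products block by block.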
The N.B.D. structure is preserved under matrix multiplication. Let R = PQ; then R^{(0)} = P^{(0)} Q^{(0)} and R^{(1)} = P^{(0)} Q^{(1)} + P^{(1)} Q^{(0)}. Calculating R^{(0)} requires Σ_{k=1}^{N_b} n_k³ operations and calculating R^{(1)} requires 2 Σ_{k=1}^{N_b} n_k²(N − n_k) operations, for a total of 2N N₂ N_b − 2 N₃ N_b operations. If both P and Q have nearest neighbor symmetry, calculating R^{(1)} requires only 2 Σ_{k=1}^{N_b} n_k²(n_{k+1} + n_{k−1}) operations, where n_0 ≡ 0 and n_{N_b+1} ≡ 0, rather than 2N N₂ N_b − 2 N₃ N_b.

A second matrix operation which is often performed in filtering is S = RQR^T, where Q is symmetric. For ordinary matrices, this symmetric product requires N³ + N²(N+1)/2 operations. Computing S^{(0)} = R^{(0)} Q^{(0)} R^{(0)T} and R^{(0)} Q^{(0)} R^{(1)T} requires Σ_{k=1}^{N_b} [ n_k³/2 + (N + 1/2) n_k² ] = (N + 1/2) N₂ N_b + N₃ N_b / 2 operations. Estimating R^{(0)} Q^{(1)} R^{(0)T} requires 2 [N N₂ N_b − N₃ N_b] multiplications. Thus the symmetric product requires a total of (6N + 1) N₂ N_b / 2 − (3/2) N₃ N_b multiplications. For nearest neighbor matrices, a total of Σ_{k=1}^{N_b} n_k² [3 n_k + 1 + 5 (n_{k−1} + n_{k+1})] / 2 multiplications are required.

D) Matrix Inversion and the D^{-1} [D − L] D^{-1} [D − L]^T D^{-1} Factorization
To stabilize the order by order approximate inversion, we define the Inv[·] transformation to be the T[·] transformation applied to the approximate inverse: Inv[P̄(ǫ)] ≡ T[P̄(ǫ)^{-1}]. When the approximation is first order, Inv[·] reduces to

T[P^{(0)-1} − ǫ P^{(0)-1} P^{(1)} P^{(0)-1}] = P^{(0)-1} [P^{(0)} − ǫ P^{(1)}_L] P^{(0)-1} [P^{(0)} − ǫ P^{(1)}_L]^T P^{(0)-1} .

We refer to this factorization of the approximate inverse as the D^{-1} L′ D^{-1} L′^T D^{-1} factorization, where L′ ≡ P^{(0)} − ǫ P^{(1)}_L.

The approximate inverse, Inv[P], usually does not need to be computed explicitly. Instead the D^{-1} L′ D^{-1} L′^T D^{-1} representation of Inv[P̄(ǫ)] is defined implicitly. Given the L D^{-1} L^T representation of T[P̄(ǫ)], the D^{-1} L′ D^{-1} L′^T D^{-1} representation of Inv[T[P̄(ǫ)]] requires no additional storage and no multiplications. The inverse of Inv is Inv: Inv[Inv[T[P̄(ǫ)]]] = T[P̄(ǫ)], and the Inv[·] operation commutes with the T[·] operation: Inv[T[P̄(ǫ)]] = T[Inv[P̄(ǫ)]].

E) Inverse Matrix Updates
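The first order update formula derived in this subsection for P^{-1} ≡ M^{-1} + J, Eq. (A1), can be verified against a finite difference; a minimal NumPy sketch with illustrative names:

```python
import numpy as np

rng = np.random.default_rng(4)
N, eps = 4, 1e-5

def sym(A):
    return (A + A.T) / 2

M0 = np.eye(N); M1 = sym(rng.standard_normal((N, N)))
J0 = 2 * np.eye(N); J1 = sym(rng.standard_normal((N, N)))

def P_of(e):
    """Exact P(e) = [ (M0 + e*M1)^{-1} + (J0 + e*J1) ]^{-1}."""
    M = M0 + e * M1
    J = J0 + e * J1
    return np.linalg.inv(np.linalg.inv(M) + J)

P0 = P_of(0.0)                                  # P^{(0)} = [M^{(0)-1} + J^{(0)}]^{-1}
# Eq. (A1): first order correction to P
P1 = P0 @ np.linalg.inv(M0) @ M1 @ np.linalg.inv(M0) @ P0 - P0 @ J1 @ P0

fd = (P_of(eps) - P_of(-eps)) / (2 * eps)       # central difference dP/de at e = 0
err = np.abs(fd - P1).max()                     # should be O(eps^2)
```

The same check applies to Eq. (A2), since P^{(0)} M^{(0)-1} = I − P^{(0)} J^{(0)}.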
In Kalman filtering, we successively update the covariance matrix and then its inverse. We now examine updates of the D^{-1} [D − L] D^{-1} [D − L]^T D^{-1} representation under matrix addition. We let the matrices, M and J, be block diagonally dominant symmetric matrices with L D^{-1} L^T representations. We wish to derive an L D^{-1} L^T representation of P, where P^{-1} ≡ M^{-1} + J. The zeroth order matrix P^{(0)} satisfies P^{(0)-1} = M^{(0)-1} + J^{(0)}. Since M^{(0)-1} is given in the L D^{-1} L^T factorization, the computation of P^{(0)-1} requires only additions and no multiplications. However, computing P^{(0)} from P^{(0)-1} requires Σ_{k=1}^{N_b} (n_k³ + n_k²) operations. We note that P^{(0)-1} = (P^{-1})^{(0)}, but that (P^{-1})^{(1)} ≠ (P^{(1)})^{-1}. To determine P^{(1)}, we first determine (P^{-1})^{(1)}, and then solve for P^{(1)}:

P^{(1)} = P^{(0)} M^{(0)-1} M^{(1)} M^{(0)-1} P^{(0)} − P^{(0)} J^{(1)} P^{(0)} ,   (A1)

P^{(1)} = [I − P^{(0)} J^{(0)}] M^{(1)} [I − J^{(0)} P^{(0)}] − P^{(0)} J^{(1)} P^{(0)} .   (A2)

The second formulation is preferable when J^{(0)} ≪ M^{(0)-1}. Either formulation requires Σ_{k=1}^{N_b} n_k²(3N − n_k) operations.

F) Approximate Eigenvalues and Eigenvectors
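The first order eigenvalue formula of this subsection, Eq. (A3), is easy to check numerically; a minimal NumPy sketch in which S₀ is taken diagonal with well separated eigenvalues (names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
N, eps = 5, 1e-4

S0 = np.diag(np.arange(1.0, N + 1))             # distinct eigenvalues 1, ..., N
S1 = rng.standard_normal((N, N)); S1 = (S1 + S1.T) / 2

lam0, E0 = np.linalg.eigh(S0)                   # zeroth order eigenpairs (ascending)
lam = np.linalg.eigh(S0 + eps * S1)[0]          # exact eigenvalues of the perturbed matrix

# Eq. (A3): lambda_k ~ lambda_k^{(0)} + eps * e_k^{(0)T} S1 e_k^{(0)}
lam_approx = lam0 + eps * np.einsum('ik,ij,jk->k', E0, S1, E0)
err = np.abs(lam - lam_approx).max()            # remaining error is O(eps^2)
```

With well separated zeroth order eigenvalues the ordering of the perturbed spectrum is preserved, so the ascending outputs of `eigh` pair correctly.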
The eigenvalues and eigenvectors may be estimated from perturbation theory. We use the following basic result from linear algebra. Let P be a symmetric matrix of the form S₀ + ǫ S₁, and let {e^{(0)}_k} and {λ^{(0)}_k} be the eigenvectors and eigenvalues of S₀; then the eigenvalues and eigenvectors of P are asymptotically

λ_k ∼ λ^{(0)}_k + ǫ e^{(0)T}_k S₁ e^{(0)}_k ,   (A3)

e_k ∼ e^{(0)}_k − ǫ (S₀ − λ^{(0)}_k I)^{-1} ( S₁ − (e^{(0)T}_k S₁ e^{(0)}_k) I ) e^{(0)}_k ,   (A4)

P = Σ_{k=1}^{N} λ_k e_k e_k^T ∼ Σ_{k=1}^{N} (λ^{(0)}_k + ǫ λ^{(1)}_k)(e^{(0)}_k + ǫ e^{(1)}_k)(e^{(0)}_k + ǫ e^{(1)}_k)^T .   (A5)

(In Eq. (A4), the inverse is taken on the orthogonal complement of e^{(0)}_k, where S₀ − λ^{(0)}_k I is nonsingular.)

APPENDIX B: N.B.D. FIXED LAG DISCRETE SMOOTHERS
We consider suboptimal approximations of fixed lag Kalman smoothers with N.B.D. structure. Moore derived the fixed lag Kalman smoother (Moore (1973); Ch. 7.3 of Anderson and Moore (1979)) as the Kalman filter for the augmented state space, x_A(t)^T ≡ (x(t)^T, x(t−1)^T, . . . , x(t−n)^T), and then simplified the resulting augmented filter. In presenting the fixed lag smoother, we rewrite the equations and reorder the matrix indices of Anderson and Moore to achieve an explicit N.B.D. structure. We define

e^{(1)}_i = [I_N − J_i P(i|i)] H_i^T R_i^{-1} (y_i − H_i x̂(i|i−1)) ,

e^{(j+1)}_i = [I_N − J_{i−j} P(i−j|i−j)] Φ(i−j+1, i−j)^T e^{(j)}_i .

The fixed lag smoother is

x̂(i|i+n) = x̂(i|i) + P(i|i−1) Σ_{ℓ=1}^{n} e^{(ℓ+1)}_{i+ℓ} .   (B1)

The covariance of x̂(i|i+n) is

P(i|i+n) = P(i|i) − Σ_{ℓ=1}^{n} P^{(ℓ)}(i+ℓ|i+ℓ−1) ( J_{i+ℓ} − J_{i+ℓ} P(i+ℓ|i+ℓ) J_{i+ℓ} ) P^{(ℓ)}(i+ℓ|i+ℓ−1)^T ,   (B2)

where

P^{(ℓ)}(i+ℓ|i+ℓ−1) ≡ P(i|i−1) Π_{j=0}^{ℓ−1} [I_N − J_{i+j} P(i+j|i+j)] Φ(i+j+1, i+j)^T .   (B3)

In deriving Eqs. (B1)-(B3), we have replaced [I_N − H_i^T K_i^T] with [I_N − J_i P(i|i)], and replaced H_i^T [H_i P(i|i−1) H_i^T + R_i]^{-1} with [I_N − J_i P(i|i)] H_i^T R_i^{-1}. Alternatively, we could replace [I_N − J_i P(i|i)] with P(i|i−1)^{-1} P(i|i) and/or replace H_i^T R_i^{-1} − J_i P(i|i) H_i^T R_i^{-1} by P^{-1}(i|i−1) P(i|i) H_i^T R_i^{-1}. The alternative formulations have the advantage that they involve fewer matrix multiplications. Our present formulation has the advantage that P(i|i−1) appears only once in the expression, and that all the other terms are input quantities, usually known to all orders. In the limit that J_i ≪ P(i|i−1)^{-1}, our formulation approximates small terms while the alternative formulation approximates large terms. For these reasons, we generally prefer our formulation in Eqs. (B1)-(B3).

We stabilize the data assimilation by using P^{(ǫ)}_+(i−j|i−j) = T[P^{(ǫ)}(i−j|i−j)] in evaluating e^{(j+1)}_i and using P^{(ǫ)}_+(i|i−1) in Eq. (B1). If P(i|i+n) is of interest, we also stabilize our approximation of it.

Our work is motivated by and generalizes the results of Cohn and Parrish. In Appendix B of Cohn and Parrish (1991), the authors show that if H_i^T R_i^{-1} H_i is diagonal and the evolution equations are diagonal, then the estimation covariance will be diagonal. In this article, we extend their results to include block diagonal systems and Kalman smoothers. More importantly, we relax the requirement of exact diagonality and consider small offdiagonal terms. We expand the estimation equations in powers of the offdiagonal terms and develop numerically well-conditioned algorithms to compute these approximate estimation equations.

In applying Kalman filtering to global circulation models, Cohn and Parrish (1991) noted that the evolution equations are simplest in an eigenfunction basis while the data assimilation is simplest in a finite difference or finite element representation. By requiring that the measurement locations be distributed such that H_i^T R_i^{-1} H_i