A low-rank approach to the solution of weak constraint variational data assimilation problems
Melina A. Freitag* and Daniel L. H. Green†

*Department of Mathematical Sciences, University of Bath, Claverton Down, BA2 7AY, United Kingdom (email: [email protected]). Corresponding author.
†Department of Mathematical Sciences, University of Bath, Claverton Down, BA2 7AY, United Kingdom (email: [email protected]).

Abstract
Weak constraint four-dimensional variational data assimilation is an important method for incorporating data (typically observations) into a model. The linearised system arising within the minimisation process can be formulated as a saddle point problem. A disadvantage of this formulation is the large storage requirement involved in the linear system. In this paper, we present a low-rank approach which exploits the structure of the saddle point system using techniques and theory from the solution of large scale matrix equations. Numerical experiments with the linear advection-diffusion equation and the non-linear Lorenz-95 model demonstrate the effectiveness of a low-rank Krylov subspace solver when compared to a traditional solver.
Keywords: Data assimilation, weak constraint 4D-Var, iterative methods, matrix equations, low-rank methods, preconditioning.
1 Introduction

Data assimilation is a method for combining a numerical model with observations obtained from a physical system, in order to create a more accurate estimate for the true state of the system. One example where data assimilation is used is numerical weather prediction; however, it is also applied in areas such as oceanography, glaciology and other geosciences. A property which these applications all share is the vast dimensionality of the state vectors involved: in numerical weather prediction the systems have variables of order $10^8$ [24]. In addition to the requirement that these computations be carried out quickly, the storage requirement presents an obstacle. In this paper we propose an approach for implementing the weak constraint four-dimensional variational data assimilation method with a low-rank solution, in order to achieve a reduction in storage space as well as computation time. The approach investigated here is based on a recent paper [38] which implemented this method in the setting of PDE-constrained optimisation. We introduce here a low-rank modification of GMRES in order to generate low-rank solutions in the setting of data assimilation.

This method was motivated by recent developments in the area of solving large sparse matrix equations, see [3, 23, 30, 32, 36, 37], notably the Lyapunov equation

AX + XA^T = -BB^T,

in which we solve for the matrix $X$, where $A$, $B$ and $X$ are large matrices of matching size. It is known that if the right hand side of such a matrix equation is low-rank, there exist low-rank approximations to $X$ [21]. There are a number of methods which iteratively generate low-rank solutions, see e.g. [13, 26, 30, 32, 36], and it is these ideas which are employed in this paper.

Alternative methods [14, 31, 39] have been considered for computing low-rank solutions, based on sequential data assimilation methods such as the Kalman filter [22, 31]. Furthermore, there have been developments in applying traditional model reduction techniques such as balanced truncation [29] and proper orthogonal decomposition (POD) to data assimilation, see e.g. [10, 25]. In this paper we take a different approach: the data assimilation problem is considered in its full formulation, but the expensive solve of the linear system is carried out in a low-rank in time framework.

In the next section we introduce a saddle point formulation of weak constraint four-dimensional variational data assimilation. Section 3 explains the connection between the arising linear system and the solution of matrix equations; we also introduce a low-rank approach to GMRES, and consider several preconditioning strategies. Numerical results are presented in Section 4, with an extension to time-dependent systems considered in Section 5.

2 Variational data assimilation

Variational data assimilation, initially proposed in [34, 35], is one of two families of methods for data assimilation, the other being sequential data assimilation, which includes the Kalman filter and its modifications [14, 22, 31].

We consider the discrete-time non-linear dynamical system

x_{k+1} = \mathcal{M}_k(x_k) + \eta_k,   (2.1)

where $x_k \in \mathbb{R}^n$ is the state of the system at time $t_k$ and $\mathcal{M}_k : \mathbb{R}^n \to \mathbb{R}^n$ is the non-linear model operator which evolves the state from time $t_k$ to $t_{k+1}$, for $k = 0, \dots, N-1$. The model errors are denoted $\eta_k$, and are assumed to be Gaussian with zero mean and covariance matrix $Q_k \in \mathbb{R}^{n \times n}$.

Observations of this system, $y_k \in \mathbb{R}^{p_k}$ at time $t_k$ for $k = 0, \dots, N$, are given by

y_k = \mathcal{H}_k(x_k) + \epsilon_k,   (2.2)

where $\mathcal{H}_k : \mathbb{R}^n \to \mathbb{R}^{p_k}$ is an observation operator and $\epsilon_k$ is the observation error. In general, $p_k \ll n$. The observation operator $\mathcal{H}_k$ may also be non-linear, and may have explicit time dependence. The observation errors are assumed to be Gaussian, with zero mean and covariance matrix $R_k \in \mathbb{R}^{p_k \times p_k}$.

We assume that at the initial time we have an a priori estimate of the state, which we refer to as the background state and denote $x_b$. This is commonly the result of a short-range forecast or a previous assimilation, and is typically taken to be the first guess during the assimilation process. We assume that this background state has Gaussian errors with covariance matrix $B \in \mathbb{R}^{n \times n}$.

2.1 Weak constraint 4D-Var

Four-dimensional variational data assimilation (4D-Var) is so called for three spatial dimensions plus time, and to differentiate it from three-dimensional variational data assimilation
(3D-Var), where we do not consider multiple observation times. In 4D-Var, we find an initial state which minimises both the weighted least squares distance to the background state $x_b$, and the weighted least squares distance between the model trajectory of this initial state $x_k$ and the observations $y_k$, for an assimilation window $[t_0, t_N]$. Mathematically, we can write this as the minimisation of a cost function, $\operatorname{argmin} J(x)$, where

J(x) = \frac{1}{2}(x_0 - x_b)^T B^{-1}(x_0 - x_b) + \frac{1}{2}\sum_{i=0}^{N} (y_i - \mathcal{H}_i(x_i))^T R_i^{-1}(y_i - \mathcal{H}_i(x_i)) + \frac{1}{2}\sum_{i=1}^{N} (x_i - \mathcal{M}_i(x_{i-1}))^T Q_i^{-1}(x_i - \mathcal{M}_i(x_{i-1}))
     = \frac{1}{2}\|x_0 - x_b\|^2_{B^{-1}} + \frac{1}{2}\sum_{i=0}^{N} \|y_i - \mathcal{H}_i(x_i)\|^2_{R_i^{-1}} + \frac{1}{2}\sum_{i=1}^{N} \|x_i - \mathcal{M}_i(x_{i-1})\|^2_{Q_i^{-1}},   (2.3)

with the three terms commonly denoted $J_b$, $J_o$ and $J_q$ respectively. Here $x = [x_0^T, x_1^T, \dots, x_N^T]^T$, and $x_k$ is the model state at each timestep $t_k$ for $k = 0, \dots, N$. This is known as weak constraint 4D-Var; omitting the model error term $J_q$ (i.e. assuming a perfect model) yields strong constraint 4D-Var.

The additional cost of weak constraint 4D-Var, and the difficulties in computing $Q_k$, mean that it is not widely implemented in real world systems. However, accounting for this model error (with suitable covariances) would lead to improved accuracy, and the added potential of longer assimilation windows [17, 18].

2.2 Incremental 4D-Var and the saddle point formulation

To implement 4D-Var operationally, an incremental approach [11] is used, which is merely a form of Gauss-Newton iteration and generates an approximation to the solution of $x = \operatorname{argmin} J(x)$. We approximate the 4D-Var cost function by a quadratic function of an increment $\delta x^{(\ell)} = [(\delta x_0^{(\ell)})^T, (\delta x_1^{(\ell)})^T, \dots, (\delta x_N^{(\ell)})^T]^T$ defined as

\delta x^{(\ell)} = x^{(\ell+1)} - x^{(\ell)},   (2.4)

where $x^{(\ell)} = [(x_0^{(\ell)})^T, (x_1^{(\ell)})^T, \dots, (x_N^{(\ell)})^T]^T$ denotes the $\ell$-th iterate of the Gauss-Newton algorithm. Updating this estimate is implemented in an outer loop, whilst generating $\delta x^{(\ell)}$ is referred to as the inner loop. The increment $\delta x^{(\ell)}$ is a solution of the minimisation of the linearised cost function

\tilde{J}(\delta x^{(\ell)}) = \frac{1}{2}(\delta x_0^{(\ell)} - b_0^{(\ell)})^T B^{-1}(\delta x_0^{(\ell)} - b_0^{(\ell)}) + \frac{1}{2}\sum_{i=0}^{N} (d_i^{(\ell)} - H_i\,\delta x_i^{(\ell)})^T R_i^{-1}(d_i^{(\ell)} - H_i\,\delta x_i^{(\ell)}) + \frac{1}{2}\sum_{i=1}^{N} (\delta x_i^{(\ell)} - M_i\,\delta x_{i-1}^{(\ell)} - c_i^{(\ell)})^T Q_i^{-1}(\delta x_i^{(\ell)} - M_i\,\delta x_{i-1}^{(\ell)} - c_i^{(\ell)}).   (2.5)
Here $M_k \in \mathbb{R}^{n \times n}$ and $H_k \in \mathbb{R}^{p_k \times n}$ are linearisations of $\mathcal{M}_k$ and $\mathcal{H}_k$ about the current state trajectory $x^{(\ell)}$. For convenience and conciseness, we introduce

b_0^{(\ell)} = x_b - x_0^{(\ell)},   (2.6)
d_k^{(\ell)} = y_k - \mathcal{H}_k(x_k^{(\ell)}),   (2.7)
c_k^{(\ell)} = \mathcal{M}_k(x_{k-1}^{(\ell)}) - x_k^{(\ell)}.   (2.8)

We define the following vectors in order to rewrite the cost function in a more compact form:

\delta x = [\delta x_0^T, \delta x_1^T, \dots, \delta x_N^T]^T, \qquad \delta p = [\delta x_0^T, \delta q_1^T, \dots, \delta q_N^T]^T,

where we have dropped the superscript for the outer loop iteration. These two vectors are related by $\delta q_k = \delta x_k - M_k\,\delta x_{k-1}$, or in matrix form

\delta p = L\,\delta x,   (2.9)

where

L = \begin{bmatrix} I & & & \\ -M_1 & I & & \\ & \ddots & \ddots & \\ & & -M_N & I \end{bmatrix} \in \mathbb{R}^{(N+1)n \times (N+1)n}.   (2.10)

Furthermore, we introduce the following matrices and vectors:

D = \operatorname{diag}(B, Q_1, \dots, Q_N) \in \mathbb{R}^{(N+1)n \times (N+1)n}, \qquad R = \operatorname{diag}(R_0, R_1, \dots, R_N) \in \mathbb{R}^{\sum_{k=0}^{N} p_k \times \sum_{k=0}^{N} p_k},
H = \operatorname{diag}(H_0, H_1, \dots, H_N) \in \mathbb{R}^{\sum_{k=0}^{N} p_k \times (N+1)n},
b = [b_0^T, c_1^T, \dots, c_N^T]^T \in \mathbb{R}^{(N+1)n}, \qquad d = [d_0^T, d_1^T, \dots, d_N^T]^T \in \mathbb{R}^{\sum_{k=0}^{N} p_k}.

This allows us to write (2.5), with the superscripts dropped, as a function of $\delta x$:

\tilde{J}(\delta x) = \frac{1}{2}(L\,\delta x - b)^T D^{-1}(L\,\delta x - b) + \frac{1}{2}(H\,\delta x - d)^T R^{-1}(H\,\delta x - d).   (2.11)

Minimising this cost function is equivalent to setting its gradient to zero. Indeed, taking the gradient with respect to $\delta x$, we have

\nabla\tilde{J}(\delta x) = L^T D^{-1}(L\,\delta x - b) + H^T R^{-1}(H\,\delta x - d).   (2.12)

Defining $\lambda = D^{-1}(b - L\,\delta x)$ and $\mu = R^{-1}(d - H\,\delta x)$ allows us to write the condition that the gradient vanishes at the minimum as

L^T\lambda + H^T\mu = 0.   (2.13)
Additionally, we have

D\lambda + L\,\delta x = b,   (2.14)
R\mu + H\,\delta x = d,   (2.15)

and (2.13), (2.14) and (2.15) can be combined into a single linear system,

\begin{bmatrix} D & 0 & L \\ 0 & R & H \\ L^T & H^T & 0 \end{bmatrix} \begin{bmatrix} \lambda \\ \mu \\ \delta x \end{bmatrix} = \begin{bmatrix} b \\ d \\ 0 \end{bmatrix},   (2.16)

which is solved for $\delta x$. This equation is known as the saddle point formulation of weak constraint 4D-Var, and allows us to exploit the saddle point structure for linear solves and preconditioning [5, 8, 38].

The saddle point matrix in (2.16) is a square symmetric indefinite matrix of size $2n(N+1) + \sum_{k=0}^{N} p_k$. In order to solve this system successfully we must use an iterative solver such as MINRES or GMRES, as it is unfeasible at these large problem sizes to use a direct method. Additionally, we require a good choice of preconditioner for a saddle point system [5, 6, 7, 8, 9, 18], which in a data assimilation setting has a (1,2) block that is more computationally expensive than the (1,1) block. The inexact constraint preconditioner [8] has been found to be an effective choice of preconditioner for the data assimilation problem [18], but its application results in a nonsymmetric system, necessitating the use of GMRES. We consider different preconditioning approaches in Section 3.4. Furthermore, to overcome the storage requirements of the matrix in (2.16), we wish to avoid forming it (and indeed as many of the submatrices as possible), which motivates the method described in the following section.
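To make the block structure and dimensions of (2.16) concrete, the following sketch assembles a small instance with random placeholder data and solves it with MINRES. Everything here is an illustrative assumption (identity covariances, random linearised models, toy sizes); it is not the experimental setup of Section 4.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Toy sizes, chosen only for illustration (not the experiments of Section 4).
n, N, p = 10, 4, 3          # state dimension, window length, observations per timestep
rng = np.random.default_rng(0)

Ms = [sp.csr_matrix(rng.standard_normal((n, n)) / n) for _ in range(N)]  # linearised models M_k

# L as in (2.10): identities on the diagonal, -M_k on the subdiagonal.
blocks = [[None] * (N + 1) for _ in range(N + 1)]
for k in range(N + 1):
    blocks[k][k] = sp.eye(n)
for k in range(N):
    blocks[k + 1][k] = -Ms[k]
L = sp.bmat(blocks, format="csr")

D = sp.eye(n * (N + 1))     # stands in for diag(B, Q_1, ..., Q_N)
R = sp.eye(p * (N + 1))     # stands in for diag(R_0, ..., R_N)
H = sp.block_diag([sp.random(p, n, density=0.3, random_state=k) for k in range(N + 1)])

# Saddle point system (2.16) with right hand side [b; d; 0].
A = sp.bmat([[D, None, L], [None, R, H], [L.T, H.T, None]], format="csr")
rhs = np.concatenate([rng.standard_normal(n * (N + 1)),
                      rng.standard_normal(p * (N + 1)),
                      np.zeros(n * (N + 1))])

z = spla.minres(A, rhs, maxiter=500)[0]   # A is symmetric indefinite, so MINRES applies
dx = z[-n * (N + 1):]                     # the increment delta x is the last block of z
print(np.linalg.norm(A @ z - rhs))
```

At realistic sizes the matrix is never formed; the point of the remainder of the paper is to apply this operator, and its preconditioners, in a factored low-rank format instead.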
3 A low-rank approach

3.1 Kronecker formulation

As noted above, the matrix formed in the saddle point formulation is very large, as indeed are the vectors $\lambda$, $\mu$ and $\delta x$. We wish to adapt the ideas developed in [38] in order to solve (2.16). This approach depends on the Kronecker product and the $\operatorname{vec}(\cdot)$ operator, defined by

A \otimes B = \begin{bmatrix} a_{11}B & \cdots & a_{1n}B \\ \vdots & \ddots & \vdots \\ a_{m1}B & \cdots & a_{mn}B \end{bmatrix}, \qquad \operatorname{vec}(C) = [c_{11}, \dots, c_{m1}, c_{12}, \dots, c_{mn}]^T,

so that $\operatorname{vec}(\cdot)$ stacks the columns of a matrix. We also make use of the relationship between the two,

(B^T \otimes A)\operatorname{vec}(C) = \operatorname{vec}(ACB).   (3.1)

Employing this definition, we may rewrite (2.16) as

\begin{bmatrix} E_1 \otimes B + E_2 \otimes Q & 0 & I_{N+1} \otimes I_n + C \otimes M \\ 0 & I_{N+1} \otimes R & I_{N+1} \otimes H \\ I_{N+1} \otimes I_n + C^T \otimes M^T & I_{N+1} \otimes H^T & 0 \end{bmatrix} \begin{bmatrix} \lambda \\ \mu \\ \delta x \end{bmatrix} = \begin{bmatrix} b \\ d \\ 0 \end{bmatrix},   (3.2)

where we make the additional assumptions that $Q_i = Q$, $R_i = R$, $H_i = H$, $M_i = M$ and the number of observations $p_i = p$ for each $i$. The extended case relaxing these assumptions is considered in Section 5. Here $C \in \mathbb{R}^{(N+1) \times (N+1)}$ has entries $-1$ on the first subdiagonal and zeros elsewhere, $E_1 = \operatorname{diag}(1, 0, \dots, 0)$ and $E_2 = \operatorname{diag}(0, 1, \dots, 1)$. The matrices
$C, E_1, E_2, I_{N+1} \in \mathbb{R}^{(N+1) \times (N+1)}$, whilst $B, Q, M, I_n \in \mathbb{R}^{n \times n}$, $H \in \mathbb{R}^{p \times n}$ and $R \in \mathbb{R}^{p \times p}$. Using (3.1), we may rewrite (3.2) as the simultaneous matrix equations

B\Lambda E_1 + Q\Lambda E_2 + X + MXC^T = b,
RU + HX = d,   (3.3)
\Lambda + M^T\Lambda C + H^T U = 0,

where we suppose $\lambda$, $\delta x$, $b$, $\mu$ and $d$ are the vectorised forms of the matrices $\Lambda, X, b \in \mathbb{R}^{n \times (N+1)}$ and $U, d \in \mathbb{R}^{p \times (N+1)}$ respectively. These are generalised Sylvester equations, which we solve for $\Lambda$, $U$ and $X$, though for implementing incremental data assimilation we require only $\delta x$, and hence the solution $X$.

For standard Sylvester equations of the form $AX + XB = C$, it is known that if the right hand side $C$ is low-rank, then there exist low-rank approximate solutions [21]. Indeed, recent algorithms for solving these Sylvester equations have focused on constructing low-rank approximate solutions. These algorithms include Krylov subspace methods (see [37]) and ADI based methods (see [2, 4, 19]). It is this knowledge which motivates the following approach.

3.2 Existence of low-rank solutions

We wish to show that we can find a low-rank approximate solution to (3.2). Further to the assumption that the model and observations are not time-dependent, let us additionally assume that the model is linear and perfect. Thus $c_k = \mathcal{M}_k(x_{k-1}) - x_k = 0$ for all $k$, giving $b = [b_0^T, c_1^T, \dots, c_N^T]^T = [b_0^T, 0, \dots, 0]^T$, and hence, in matrix form,

b = [b_0, 0, \cdots, 0] \in \mathbb{R}^{n \times (N+1)}.   (3.4)

Assuming $R$ is non-singular, solving the second block-row of (3.2) for $\mu$ yields

\mu = (I_{N+1} \otimes R^{-1})d - (I_{N+1} \otimes R^{-1}H)\delta x,   (3.5)

which, when substituted into the third block-row of (3.2), gives

(I_{N+1} \otimes I_n + C^T \otimes M^T)\lambda - (I_{N+1} \otimes H^T R^{-1} H)\delta x = -(I_{N+1} \otimes H^T R^{-1})d.   (3.6)
Reformulating this as a matrix equation as before, we are left with the simultaneous (block-row) equations

X + MXC^T + B\Lambda E_1 + Q\Lambda E_2 = b,   (3.7)
\Lambda + M^T\Lambda C - H^T R^{-1} H X = -H^T R^{-1} d.   (3.8)

Assuming $M^{-1}$ exists, we multiply (3.8) by $M^{-T}$ to obtain

M^{-T}\Lambda + \Lambda C = M^{-T} H^T R^{-1} H X - M^{-T} H^T R^{-1} d.   (3.9)

Typically in real world applications we only observe a small proportion of the state space. As such, the matrix $d$ containing these observations is low-rank, as is the observation operator $H$. Hence the right hand side of (3.9) is low-rank. Applying the existence result for low-rank solutions of Sylvester equations shown in [21] to (3.9), we conclude that $\Lambda$, or indeed an approximate solution $\tilde{\Lambda}$, is low-rank.

Finally, multiplying (3.7) by $M^{-1}$ and substituting in $\tilde{\Lambda}$ gives another Sylvester equation of the form

M^{-1}X + XC^T = M^{-1}\big(b - B\tilde{\Lambda}E_1 - Q\tilde{\Lambda}E_2\big).   (3.10)

From the assumption that the model is perfect, we see from (3.4) that $b$ is indeed low-rank, being rank 1, and from above so is $\tilde{\Lambda}$. Thus the right hand side of the Sylvester equation (3.10) is also low-rank. Applying once more the result from [21], we obtain the desired property that $X$ is low-rank, or indeed that there is an approximate solution $\tilde{X}$ to $X$ which is low-rank. We formulate this result as the following theorem.
Theorem 3.1. Consider the solution of the saddle point formulation (3.3) of the linearised weak constraint 4D-Var problem. Let the model and observations be time-independent, with $M = M_k$, $R = R_k$, $H = H_k$ and $Q = Q_k$ for all $k$. Furthermore, assume there is no model error, and that the model operator $M$ and the covariance matrix $R$ are invertible. If the number of observations $p \ll n$, then there exists a low-rank approximation $X_r = WV^T$ to $X$, where $\delta x = \operatorname{vec}(X)$.

It is necessary to note that it would be unfeasible to compute low-rank solutions to (2.16) directly in this way; indeed, in (3.9) the right hand side still contains $X$. However, the structure of the observation operator allows us to conclude that the right hand side is low-rank. Furthermore, we had to make a number of assumptions to obtain this result. Whilst the assumption that $p \ll n$ is realistic, the constant operators and covariance matrices are restrictive. However, as we will see in Section 5, relaxing some of these assumptions still results in low-rank solutions observed numerically.

3.3 Low-rank GMRES (LR-GMRES)

In order to implement the above, we suppose, as in [1, 38], that the matrices $\Lambda, U, X$ in (3.3) have low-rank representations, with

\Lambda = W_\Lambda V_\Lambda^T, \quad W_\Lambda \in \mathbb{R}^{n \times k_\Lambda}, \; V_\Lambda \in \mathbb{R}^{(N+1) \times k_\Lambda},   (3.11)
U = W_U V_U^T, \quad W_U \in \mathbb{R}^{p \times k_U}, \; V_U \in \mathbb{R}^{(N+1) \times k_U},   (3.12)
X = W_X V_X^T, \quad W_X \in \mathbb{R}^{n \times k_X}, \; V_X \in \mathbb{R}^{(N+1) \times k_X},   (3.13)

where $k_\Lambda, k_U, k_X \ll n$ and $k_\Lambda, k_U, k_X \ll N$. This allows us to rewrite (3.3) as follows:

[BW_\Lambda, \; QW_\Lambda, \; W_X, \; MW_X]\,[E_1V_\Lambda, \; E_2V_\Lambda, \; V_X, \; CV_X]^T = b,
[RW_U, \; HW_X]\,[V_U, \; V_X]^T = d,   (3.14)
[W_\Lambda, \; M^TW_\Lambda, \; H^TW_U]\,[V_\Lambda, \; C^TV_\Lambda, \; V_U]^T = 0.

Since using a direct solver would be infeasible, we use an iterative solver, in this case GMRES [33], to allow for flexibility in choosing a preconditioner, see Section 3.4. Algorithm 1 details a low-rank implementation of GMRES, which leads to low-rank approximate solutions of (3.2), making use of (3.14). Fundamentally this is the same as a traditional vector-based GMRES with a vector $z$, where instead here we have

\operatorname{vec}\begin{bmatrix} Z_{11}Z_{12}^T \\ Z_{21}Z_{22}^T \\ Z_{31}Z_{32}^T \end{bmatrix} = z.

Applying the concatenation $X_{k1} = [Y_{k1}, Z_{k1}]$, $X_{k2} = [Y_{k2}, Z_{k2}]$ for $k = 1, 2, 3$ corresponds to the addition $x = y + z$, since $X_{k1}X_{k2}^T = Y_{k1}Y_{k2}^T + Z_{k1}Z_{k2}^T$ and hence

x = \operatorname{vec}\begin{bmatrix} X_{11}X_{12}^T \\ X_{21}X_{22}^T \\ X_{31}X_{32}^T \end{bmatrix} = \operatorname{vec}\begin{bmatrix} Y_{11}Y_{12}^T + Z_{11}Z_{12}^T \\ Y_{21}Y_{22}^T + Z_{21}Z_{22}^T \\ Y_{31}Y_{32}^T + Z_{31}Z_{32}^T \end{bmatrix} = y + z.

Note that here we employ the same notation as in [38], using the brackets $\{\cdot\}$ as a concatenation and truncation operation. Furthermore, after applying the matrix multiplication and the preconditioning, we also truncate the resulting matrices. How this truncation can be implemented is also treated in [38], with options including a truncated singular value decomposition, possibly through Matlab's inbuilt svds function, or a skinny QR factorisation. In the numerical results to follow, we use a modification of the Matlab svds function.

In order to compute the inner product $\langle w, v^{(i)} \rangle$ which arises in GMRES when computing the entries of the Hessenberg matrix (see Algorithm 1), we make use of the relation between the trace and vec operators:

\operatorname{trace}(A^T B) = \operatorname{vec}(A)^T \operatorname{vec}(B).
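The manipulations above rest entirely on (3.1) and on this trace identity, so a quick numerical check is worthwhile; the dimensions below are arbitrary and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))
C = rng.standard_normal((3, 5))
B = rng.standard_normal((5, 2))

# vec stacks columns, hence Fortran (column-major) ordering.
vec = lambda X: X.reshape(-1, order="F")

# Identity (3.1): (B^T kron A) vec(C) = vec(A C B)
print(np.allclose(np.kron(B.T, A) @ vec(C), vec(A @ C @ B)))      # True

# trace(A^T B) = vec(A)^T vec(B), used for the LR-GMRES inner products
A2 = rng.standard_normal((4, 3))
B2 = rng.standard_normal((4, 3))
print(np.isclose(np.trace(A2.T @ B2), vec(A2) @ vec(B2)))         # True
```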
Since here

\operatorname{vec}\begin{bmatrix} W_{11}W_{12}^T \\ W_{21}W_{22}^T \\ W_{31}W_{32}^T \end{bmatrix} = w \quad \text{and} \quad \operatorname{vec}\begin{bmatrix} V_{11}^{(i)}(V_{12}^{(i)})^T \\ V_{21}^{(i)}(V_{22}^{(i)})^T \\ V_{31}^{(i)}(V_{32}^{(i)})^T \end{bmatrix} = v^{(i)},

we see that we may compute the inner product $\langle w, v^{(i)} \rangle$ as

\langle w, v^{(i)} \rangle = \operatorname{trace}\big((W_{11}W_{12}^T)^T V_{11}^{(i)}(V_{12}^{(i)})^T\big) + \operatorname{trace}\big((W_{21}W_{22}^T)^T V_{21}^{(i)}(V_{22}^{(i)})^T\big) + \operatorname{trace}\big((W_{31}W_{32}^T)^T V_{31}^{(i)}(V_{32}^{(i)})^T\big),   (3.15)

by considering the submatrices which make up the vectors $w$ and $v^{(i)}$. Importantly, however, the matrices formed in (3.15) do not exploit the low-rank nature of the submatrices, being $(N+1) \times (N+1)$ matrices. Fortunately, using the cyclic property of the trace operator, we may instead consider

\langle w, v^{(i)} \rangle = \operatorname{trace}\big(W_{11}^T V_{11}^{(i)}(V_{12}^{(i)})^T W_{12}\big) + \operatorname{trace}\big(W_{21}^T V_{21}^{(i)}(V_{22}^{(i)})^T W_{22}\big) + \operatorname{trace}\big(W_{31}^T V_{31}^{(i)}(V_{32}^{(i)})^T W_{32}\big),   (3.16)

and hence compute the trace of much smaller matrices. In Algorithm 1, (3.16) is computed as traceproduct$(W_{11}, W_{12}, W_{21}, W_{22}, W_{31}, W_{32}, V_{11}^{(i)}, V_{12}^{(i)}, V_{21}^{(i)}, V_{22}^{(i)}, V_{31}^{(i)}, V_{32}^{(i)})$.

The matrix-vector multiplication $Az$ of traditional GMRES is implemented in LR-GMRES by considering the low-rank form of the saddle point equations given in (3.14). The concatenation is written explicitly in Algorithm 2 and is denoted Amult in Algorithm 1. Note that we have considered traditional GMRES when implementing LR-GMRES; however, it would require only a small modification to allow for restarted GMRES. All that remains to consider is preconditioning LR-GMRES, which is implemented in Algorithm 1 through the Aprec function.
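For concreteness, here is a minimal sketch of the two low-rank kernels used by Algorithm 1: the inner product (3.16) computed from the factors, and an SVD-based truncation realising the $\{\cdot\}$ operation. The function names mirror the text, but the tolerance and the use of NumPy's dense QR/SVD (rather than a modified svds) are implementation assumptions.

```python
import numpy as np

def traceproduct(W, V):
    """Inner product <w, v> of (3.16). W and V are lists of three
    (left, right) factor pairs, one pair per block component."""
    return sum(np.trace(Wl.T @ Vl @ Vr.T @ Wr)
               for (Wl, Wr), (Vl, Vr) in zip(W, V))

def truncate(Z_left, Z_right, tol=1e-8):
    """Replace Z_left @ Z_right.T by a rank-reduced factorisation,
    dropping singular values below tol (skinny QR followed by a small SVD)."""
    Ql, Rl = np.linalg.qr(Z_left)        # Z_left is tall and skinny
    Qr, Rr = np.linalg.qr(Z_right)
    U, s, Vt = np.linalg.svd(Rl @ Rr.T)
    r = max(1, int(np.sum(s > tol)))
    return Ql @ U[:, :r] * s[:r], Qr @ Vt[:r].T

# Concatenation followed by truncation realises {Y, Z}, i.e. y + z:
rng = np.random.default_rng(2)
W1, W2 = rng.standard_normal((100, 3)), rng.standard_normal((30, 3))
Y1, Y2 = rng.standard_normal((100, 2)), rng.standard_normal((30, 2))
Z1, Z2 = truncate(np.hstack([W1, Y1]), np.hstack([W2, Y2]))
print(np.linalg.norm(Z1 @ Z2.T - (W1 @ W2.T + Y1 @ Y2.T)))   # ~ 0
```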
3.4 Preconditioning

We return to the saddle point problem in (2.16). Many approaches exist for preconditioning saddle point problems, a number of which are detailed in [5, 6]. However, the data assimilation setting introduces an unusual situation where the (1,2) block of the saddle point matrix is more computationally expensive than the (1,1) block. In [15, 18] it is noted that the inexact constraint preconditioner [7, 8, 9] is an effective choice:

P = \begin{bmatrix} D & 0 & \tilde{L} \\ 0 & R & 0 \\ \tilde{L}^T & 0 & 0 \end{bmatrix},   (3.17)

provided a good approximation $\tilde{L}$ to $L = I_{N+1} \otimes I_n + C \otimes M$ is chosen. Using an inexact constraint preconditioner requires the use of GMRES, since the resulting system is nonsymmetric.

Two further requirements must be considered when implementing a preconditioner for LR-GMRES. In order to maintain the low-rank structure we wish to write the preconditioner in Kronecker form; however, we must also consider the inverse of the preconditioner. It is the implementation of the inverse in Kronecker form which allows us to apply it as a simple matrix multiplication, as in (3.14) for the saddle point matrix. We present here a number of different choices of preconditioner for LR-GMRES.
Algorithm 1 Low-rank GMRES (LR-GMRES)

1: Choose $X_{11}^{(0)}, X_{12}^{(0)}, X_{21}^{(0)}, X_{22}^{(0)}, X_{31}^{(0)}, X_{32}^{(0)}$.
2: $\{\tilde{X}_{11}, \tilde{X}_{12}, \tilde{X}_{21}, \tilde{X}_{22}, \tilde{X}_{31}, \tilde{X}_{32}\}$ = Amult$(X_{11}^{(0)}, \dots, X_{32}^{(0)})$
3: $V_{11}^{(1)} = \{B_{11}, -\tilde{X}_{11}\}$, $V_{12}^{(1)} = \{B_{12}, \tilde{X}_{12}\}$, $V_{21}^{(1)} = \{B_{21}, -\tilde{X}_{21}\}$, $V_{22}^{(1)} = \{B_{22}, \tilde{X}_{22}\}$, $V_{31}^{(1)} = \{B_{31}, -\tilde{X}_{31}\}$, $V_{32}^{(1)} = \{B_{32}, \tilde{X}_{32}\}$ (the $B_{ij}$ are the low-rank factors of the right hand side, so $v^{(1)}$ represents the initial residual)
4: $\xi = [\xi_1, 0, \dots, 0]^T$ with $\xi_1 = \sqrt{\text{traceproduct}(V_{11}^{(1)}, \dots, V_{32}^{(1)}, V_{11}^{(1)}, \dots, V_{32}^{(1)})}$; scale the left factors $V_{11}^{(1)}, V_{21}^{(1)}, V_{31}^{(1)}$ by $1/\xi_1$
5: for $k = 1, 2, \dots$ do
6:   $\{Z_{11}^{(k)}, \dots, Z_{32}^{(k)}\}$ = Aprec$(V_{11}^{(k)}, \dots, V_{32}^{(k)})$
7:   $\{W_{11}, \dots, W_{32}\}$ = Amult$(Z_{11}^{(k)}, \dots, Z_{32}^{(k)})$
8:   for $i = 1, \dots, k$ do
9:     $h_{i,k}$ = traceproduct$(W_{11}, \dots, W_{32}, V_{11}^{(i)}, \dots, V_{32}^{(i)})$
10:     $W_{11} = \{W_{11}, -h_{i,k}V_{11}^{(i)}\}$, $W_{12} = \{W_{12}, V_{12}^{(i)}\}$, $W_{21} = \{W_{21}, -h_{i,k}V_{21}^{(i)}\}$, $W_{22} = \{W_{22}, V_{22}^{(i)}\}$, $W_{31} = \{W_{31}, -h_{i,k}V_{31}^{(i)}\}$, $W_{32} = \{W_{32}, V_{32}^{(i)}\}$
11:   end for
12:   $h_{k+1,k} = \sqrt{\text{traceproduct}(W_{11}, \dots, W_{32}, W_{11}, \dots, W_{32})}$
13:   $V_{11}^{(k+1)} = W_{11}/h_{k+1,k}$, $V_{12}^{(k+1)} = W_{12}$, $V_{21}^{(k+1)} = W_{21}/h_{k+1,k}$, $V_{22}^{(k+1)} = W_{22}$, $V_{31}^{(k+1)} = W_{31}/h_{k+1,k}$, $V_{32}^{(k+1)} = W_{32}$
14:   Apply the previous Givens rotations to the $k$th column of $h$: for $j = 1, \dots, k-1$, $\begin{bmatrix} h_{j,k} \\ h_{j+1,k} \end{bmatrix} \leftarrow \begin{bmatrix} c_j & s_j \\ -\bar{s}_j & c_j \end{bmatrix}\begin{bmatrix} h_{j,k} \\ h_{j+1,k} \end{bmatrix}$
15:   Compute the $k$th rotation and apply it to $\xi$ and the last column of $h$: $\begin{bmatrix} \xi_k \\ \xi_{k+1} \end{bmatrix} = \begin{bmatrix} c_k & s_k \\ -\bar{s}_k & c_k \end{bmatrix}\begin{bmatrix} \xi_k \\ 0 \end{bmatrix}$, $h_{k,k} = c_k h_{k,k} + s_k h_{k+1,k}$, $h_{k+1,k} = 0$
16:   if $|\xi_{k+1}|$ sufficiently small then
17:     Solve $\tilde{H}\tilde{y} = \xi$, where the entries of $\tilde{H}$ are the $h_{i,k}$
18:     $Y_{11} = \{\tilde{y}_1 V_{11}^{(1)}, \dots, \tilde{y}_k V_{11}^{(k)}\}$, $Y_{12} = \{V_{12}^{(1)}, \dots, V_{12}^{(k)}\}$, and analogously for $Y_{21}, Y_{22}, Y_{31}, Y_{32}$
19:     $\{\tilde{Y}_{11}, \dots, \tilde{Y}_{32}\}$ = Aprec$(Y_{11}, \dots, Y_{32})$
20:     $X_{11} = \{X_{11}^{(0)}, \tilde{Y}_{11}\}$, $X_{12} = \{X_{12}^{(0)}, \tilde{Y}_{12}\}$, $X_{21} = \{X_{21}^{(0)}, \tilde{Y}_{21}\}$, $X_{22} = \{X_{22}^{(0)}, \tilde{Y}_{22}\}$, $X_{31} = \{X_{31}^{(0)}, \tilde{Y}_{31}\}$, $X_{32} = \{X_{32}^{(0)}, \tilde{Y}_{32}\}$
21:     break
22:   end if
23: end for
Algorithm 2 Matrix multiplication (Amult)

Input: $W_{11}, W_{12}, W_{21}, W_{22}, W_{31}, W_{32}$
Output: $Z_{11}, Z_{12}, Z_{21}, Z_{22}, Z_{31}, Z_{32}$

$Z_{11} = [BW_{11}, QW_{11}, W_{31}, MW_{31}]$, $Z_{12} = [E_1W_{12}, E_2W_{12}, W_{32}, CW_{32}]$,
$Z_{21} = [RW_{21}, HW_{31}]$, $Z_{22} = [W_{22}, W_{32}]$,
$Z_{31} = [W_{11}, M^TW_{11}, H^TW_{21}]$, $Z_{32} = [W_{12}, C^TW_{12}, W_{22}]$

3.4.1 Inexact constraint preconditioners

As mentioned above, the inexact constraint preconditioner [7] has been seen to be an effective preconditioner for the saddle point formulation of weak constraint 4D-Var [18], provided a suitable approximation of $L$ is taken. The inverse of the inexact constraint preconditioner (3.17) is given by

P^{-1} = \begin{bmatrix} 0 & 0 & \tilde{L}^{-T} \\ 0 & R^{-1} & 0 \\ \tilde{L}^{-1} & 0 & -\tilde{L}^{-1}D\tilde{L}^{-T} \end{bmatrix},   (3.18)

which includes the term $\tilde{L}^{-1}$. In order to implement this in LR-GMRES, we write $\tilde{L}^{-1}$ in Kronecker form. This restricts the choice of $\tilde{L}$; however, taking an approximation $\tilde{L}$ of the form $I_{N+1} \otimes I_n + C \otimes \tilde{M}$, where $\tilde{M}$ is an approximation to $M$, the structure of $L$ is maintained. Additionally, we can write the inverse in Kronecker form as

\tilde{L}^{-1} = I_{N+1} \otimes I_n - C \otimes \tilde{M} + C^2 \otimes \tilde{M}^2 - \dots + C^N \otimes \tilde{M}^N = I_{N+1} \otimes I_n + \sum_{k=1}^{N} (-1)^k C^k \otimes \tilde{M}^k.   (3.19)

Despite being able to write this inverse in Kronecker form (the expansion is finite since $C$ is nilpotent, with $C^{N+1} = 0$), it results in an unfeasible number of terms for large $N$; furthermore, for close approximations $\tilde{M}$ to the model matrix $M$, the computations are expensive. A possibility is therefore to approximate $\tilde{L}^{-1}$ by truncating (3.19) after a few terms.

Truncating after one term we obtain the approximation $\tilde{L}^{-1} = I_{n(N+1)}$. In Kronecker form we can then write the resulting inverse of the preconditioner as

P_I^{-1} = \begin{bmatrix} 0 & 0 & I_{N+1} \otimes I_n \\ 0 & I_{N+1} \otimes R^{-1} & 0 \\ I_{N+1} \otimes I_n & 0 & -(E_1 \otimes B + E_2 \otimes Q) \end{bmatrix}.   (3.20)

To illustrate a possible choice of the Aprec function, we present the application of (3.20) as Algorithm 3.

If we take $\tilde{M} = I_n$, we may consider the approximation $\hat{L} = I_{N+1} \otimes I_n + C \otimes I_n$. Truncating the resulting inverse after two terms, we compute that the Kronecker inverse of the preconditioner is

\hat{P}_L^{-1} = \begin{bmatrix} 0 & 0 & I \otimes I - C^T \otimes I \\ 0 & I \otimes R^{-1} & 0 \\ I \otimes I - C \otimes I & 0 & J \end{bmatrix},   (3.21)

where $J = -(I \otimes I - C \otimes I)(E_1 \otimes B)(I \otimes I - C^T \otimes I) - (I \otimes I - C \otimes I)(E_2 \otimes Q)(I \otimes I - C^T \otimes I)$, and we drop the subscripts on the identities.
Algorithm 3 Inexact constraint preconditioner with $\tilde{L}^{-1} = I_{n(N+1)}$ (Aprec)

Input: $W_{11}, W_{12}, W_{21}, W_{22}, W_{31}, W_{32}$
Output: $Z_{11}, Z_{12}, Z_{21}, Z_{22}, Z_{31}, Z_{32}$

$Z_{11} = W_{31}$, $Z_{12} = W_{32}$,
$Z_{21} = R^{-1}W_{21}$, $Z_{22} = W_{22}$,
$Z_{31} = [W_{11}, -BW_{31}, -QW_{31}]$, $Z_{32} = [W_{12}, E_1W_{32}, E_2W_{32}]$

An alternative approach is to consider an inexact constraint preconditioner where we approximate $H$ in (2.16) in addition to $L$. If we approximate $L$ by $\tilde{L} = I$ and use the exact $H$, we obtain

P_{IH} = \begin{bmatrix} D & 0 & I \\ 0 & R & H \\ I & H^T & 0 \end{bmatrix}.   (3.22)

Its inverse is

P_{IH}^{-1} = \begin{bmatrix} H^T F H & -H^T F & I - H^T F H D \\ -FH & F & FHD \\ I - DH^T F H & DH^T F & DH^T F H D - D \end{bmatrix},   (3.23)

where $F = (HDH^T + R)^{-1} = E_1 \otimes (HBH^T + R)^{-1} + E_2 \otimes (HQH^T + R)^{-1}$. If $H$ is computationally expensive (for example if $H$ is not a simple interpolatory observation operator), this choice of preconditioner may prove unfeasible.
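A sketch of Algorithm 3 in the same factored arithmetic, i.e. the application of (3.20), is given below. The factor naming follows Algorithms 2 and 3; treating $B$, $Q$, $R$, $E_1$ and $E_2$ as given dense arrays is an assumption made purely for illustration.

```python
import numpy as np

def aprec_identity_L(W, B, Q, R, E1, E2):
    """Apply the inexact constraint preconditioner (3.20) with Ltilde = I.
    W = [(W11, W12), (W21, W22), (W31, W32)] holds the low-rank factors of
    the three block components; the result is returned in the same format."""
    (W11, W12), (W21, W22), (W31, W32) = W
    Z1 = (W31, W32)                                  # first block: identity times w3
    Z2 = (np.linalg.solve(R, W21), W22)              # second block: R^{-1} applied blockwise
    Z3 = (np.hstack([W11, -B @ W31, -Q @ W31]),      # third block: w1 - (E1 (x) B + E2 (x) Q) w3
          np.hstack([W12, E1 @ W32, E2 @ W32]))
    return [Z1, Z2, Z3]
```

In practice, the output factors would be passed through the truncation operator sketched earlier before the next call to Amult, so that the ranks do not grow.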
3.4.2 Schur complement preconditioners

An alternative choice of preconditioner is a Schur complement preconditioner, such as the block diagonal preconditioner

P_D = \begin{bmatrix} D & 0 & 0 \\ 0 & R & 0 \\ 0 & 0 & \tilde{S} \end{bmatrix},   (3.24)

where $\tilde{S}$ is an approximation to the Schur complement $S = -L^T D^{-1} L - H^T R^{-1} H$. This choice of preconditioner is used in [38], and allows the use of LR-MINRES, though in Section 4.2 we use LR-GMRES to compare the different choices, since in the full-rank case GMRES and MINRES are theoretically equivalent for symmetric systems.

As an approximation to the Schur complement we consider

\tilde{S} = -\tilde{L}^T D^{-1} \tilde{L},   (3.25)

the inverse of which, $\tilde{S}^{-1} = -\tilde{L}^{-1} D \tilde{L}^{-T}$, is familiar as the (3,3) term of the inexact constraint preconditioner inverse (3.18). As such, we must approximate it by truncating the expansion (3.19) of $\tilde{L}^{-1}$ as before. Considering the approximation $\hat{L} = I_{N+1} \otimes I_n + C \otimes I_n$ and truncating after two terms as before, the block diagonal Schur complement preconditioner may be implemented in the same way as the inexact constraint preconditioner (3.21) above. This results in

P_{D,\hat{L}}^{-1} = \begin{bmatrix} E_1 \otimes B^{-1} + E_2 \otimes Q^{-1} & 0 & 0 \\ 0 & I \otimes R^{-1} & 0 \\ 0 & 0 & J \end{bmatrix},   (3.26)

where $J = -(I \otimes I - C \otimes I)(E_1 \otimes B)(I \otimes I - C^T \otimes I) - (I \otimes I - C \otimes I)(E_2 \otimes Q)(I \otimes I - C^T \otimes I)$ as before.

An alternative method for implementing the Schur complement approximation (3.25) in a low-rank form is detailed in [38]. Instead of truncating the resulting inverse and applying the technique used in Algorithm 3, the relationship between the Kronecker product and Sylvester equations is exploited. In order to solve $\tilde{S}\operatorname{vec}(Z_1Z_2^T) = \operatorname{vec}(W_1W_2^T)$, the Kronecker form

-(I \otimes I + C^T \otimes \tilde{M}^T)(E_1 \otimes B^{-1} + E_2 \otimes Q^{-1})(I \otimes I + C \otimes \tilde{M})\operatorname{vec}(Z_1Z_2^T) = \operatorname{vec}(W_1W_2^T)

is written as two consecutive Sylvester equations. These are solved one after the other using a low-rank solver such as an ADI [2, 4] or Krylov [36] method to generate a low-rank approximation; a dense sketch of this two-step solve is given at the end of this subsection. It is this approach which we employ in our numerical implementations in Section 4.2.

An alternative Schur complement preconditioner is the block triangular Schur complement preconditioner, which, unlike the block diagonal one above, requires the use of LR-GMRES. This choice uses approximations to $L$, $H$ and the Schur complement $S$:

P_T = \begin{bmatrix} D & 0 & \tilde{L} \\ 0 & R & \tilde{H} \\ 0 & 0 & \tilde{S} \end{bmatrix}.   (3.27)

When inverted, unlike the other preconditioners we have considered, this maintains a term containing $\tilde{L}$ itself, in addition to the $\tilde{L}^{-1}$ in the Schur complement approximation inverse. Taking the same approximation to $S$ as above, we obtain the inverse

P_T^{-1} = \begin{bmatrix} D^{-1} & 0 & -D^{-1}\tilde{L}\tilde{S}^{-1} \\ 0 & R^{-1} & -R^{-1}\tilde{H}\tilde{S}^{-1} \\ 0 & 0 & \tilde{S}^{-1} \end{bmatrix}.   (3.28)

In order to implement this preconditioner, (3.28) must be expressed in Kronecker form, approximating $\tilde{S}^{-1}$ by truncation or, as we do in Section 4.2, by the Sylvester equation approach above.
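The following is a dense, small-scale sketch of the two-Sylvester-solve idea for applying $\tilde{S}^{-1}$; at realistic sizes a low-rank ADI or Krylov Sylvester solver would replace `solve_sylvester`, and the right hand side would be kept in factored form. As in Section 3.2, $\tilde{M}$ is assumed invertible.

```python
import numpy as np
from scipy.linalg import solve_sylvester

def schur_solve(Wmat, Mt, C, B, Q, E1, E2):
    """Solve Stilde vec(Z) = vec(Wmat), with Stilde = -Ltilde^T D^{-1} Ltilde
    and Ltilde = I (x) I + C (x) Mt, via two Sylvester equations (dense sketch).
    Equivalent to Z = -Ltilde^{-1} D Ltilde^{-T} vec(Wmat) reshaped."""
    Minv = np.linalg.inv(Mt)
    # Step 1: Ltilde^T vec(T) = vec(Wmat)  <=>  Minv.T @ T + T @ C = Minv.T @ Wmat
    T = solve_sylvester(Minv.T, C, Minv.T @ Wmat)
    # Step 2: apply D = E1 (x) B + E2 (x) Q, i.e. Y = B T E1 + Q T E2
    Y = B @ T @ E1 + Q @ T @ E2
    # Step 3: Ltilde vec(Z) = -vec(Y)  <=>  Minv @ Z + Z @ C.T = -Minv @ Y
    return solve_sylvester(Minv, C.T, -Minv @ Y)
```

The multiplications by $\tilde{M}^{-1}$ in steps 1 and 3 turn the Stein-like equations $T + \tilde{M}^T T C = W$ and $Z + \tilde{M} Z C^T = -Y$ into the standard Sylvester form $AX + XB = Q$ that such solvers accept, mirroring the manipulation used for (3.9).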
3.4.3 Eigenvalue analysis

As mentioned above, whilst there has been investigation into preconditioning saddle point problems, such as [5, 6, 8], most of these choices assume that the (1,1) block is the computationally expensive one. Schur complement preconditioners such as the block diagonal and block triangular examples we consider here are detailed in [5, 6]. Using exact matrices for the approximations $\tilde{S}$, $\tilde{L}$ and $\tilde{H}$ in (3.24) and (3.27) results in the preconditioned system having only two or three distinct eigenvalues; methods such as MINRES or GMRES then converge in at most three steps. In general, however, we must use approximations, which reduces the efficacy of the preconditioner. Furthermore, for the data assimilation saddle point problem, these are not necessarily the most appropriate from a computational point of view.

The use of the inexact constraint preconditioner [8] in the data assimilation setting is considered in [15, 16, 18], and has proved effective experimentally. Here, as the covariance matrices are less computationally expensive, the exact (1,1) block is typically used. Thus, using the result in [8], the eigenvalues $\tau$ of the matrix

\begin{bmatrix} D & 0 & \tilde{L} \\ 0 & R & \tilde{H} \\ \tilde{L}^T & \tilde{H}^T & 0 \end{bmatrix}^{-1} \begin{bmatrix} D & 0 & L \\ 0 & R & H \\ L^T & H^T & 0 \end{bmatrix}   (3.29)

are either one (with multiplicity at least $(N+1)(2n+p) - 2\operatorname{rank}([L^T, H^T] - [\tilde{L}^T, \tilde{H}^T])$) or bounded by

|\tau - 1| \le \frac{\|[L^T, H^T] - [\tilde{L}^T, \tilde{H}^T]\|}{\tilde{\sigma}},

where $\tilde{\sigma}$ is the smallest singular value of $[\tilde{L}^T, \tilde{H}^T]$.

When considering the exact approximation $\tilde{L} = L$ and taking $\tilde{H} = 0$, the resulting preconditioned system has eigenvalues

\tau = 1 \pm i\sqrt{\frac{\mu_i^T H L^{-1} D L^{-T} H^T \mu_i}{\mu_i^T R \mu_i}},

where $\mu_i \in \mathbb{R}^{(N+1)p}$. Using the properties of the Rayleigh quotient, we know that the eigenvalues lie on a line parallel to the imaginary axis through 1, where the maximum distance from the real axis is given by

\sqrt{\frac{\lambda_{\max}(H L^{-1} D L^{-T} H^T)}{\lambda_{\min}(R)}}.

Experimental results in [18] demonstrate that when an approximation is taken for $\tilde{L}$, the eigenvalues are clustered in a cloud surrounding $\tau = 1$, with the size of this cloud likely depending on the accuracy of the chosen approximation.

4 Numerical results

In this section we present numerical results using LR-GMRES (for preconditioning strategies see Section 4.2). We run LR-GMRES for up to 20 iterations with a fixed convergence tolerance; during the algorithm, where we truncate the matrices after concatenation and after applying Amult, we use a fixed truncation tolerance. We present examples with different choices of reduced rank $r$.
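For orientation, the experiments below all have the shape of the incremental method of Section 2.2: an outer Gauss-Newton iteration that relinearises, and an inner (LR-)GMRES solve of (2.16) for the increment. The skeleton below is purely schematic; `linearise`, `build_rhs` and `lr_gmres` are placeholder names for the operations of Sections 2.2 and 3.3, not functions defined in the paper.

```python
# Schematic incremental (Gauss-Newton) 4D-Var driver; all callables are
# placeholders for the operations described in Sections 2.2 and 3.3.
def incremental_4dvar(x, n_outer, model, obs, linearise, build_rhs, lr_gmres):
    # x is the current trajectory, stored as an n x (N+1) array (one column per timestep)
    for _ in range(n_outer):                  # outer loop
        Ms, Hs = linearise(model, obs, x)     # tangent linear operators about x
        b, d = build_rhs(model, obs, x)       # residuals (2.6)-(2.8)
        W, V = lr_gmres(Ms, Hs, b, d)         # inner loop: low-rank solve of (2.16)
        x = x + W @ V.T                       # update with delta x = vec(W V^T)
    return x
```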
4.1 Advection-diffusion equation

As a first example, let us consider the one-dimensional (linear) advection-diffusion problem, defined as

\frac{\partial}{\partial t}u(x,t) = c_d \frac{\partial^2}{\partial x^2}u(x,t) + c_a \frac{\partial}{\partial x}u(x,t)   (4.1)

for $x \in [0,1]$, $t \in (0,T)$, subject to the boundary and initial conditions

u(0,t) = 0, \quad u(1,t) = 0, \quad t \in (0,T), \qquad u(x,0) = u_0(x), \quad x \in [0,1].

We solve this system with a centred difference scheme for $u_x$ and $u_t$, and a Crank-Nicolson scheme [12] for $u_{xx}$, discretising $x$ uniformly with $n = 100$ and taking small fixed timesteps $\Delta t$; for this example we fix the diffusion coefficient $c_d$ and the advection coefficient $c_a$, and take $u_0(x) = \sin(\pi x)$ (a sketch of one possible discretisation is given below).

We now consider this example as a data assimilation problem, and compare the solutions obtained by the saddle point formulation (2.16) and the low-rank approximation computed using LR-GMRES. We take an assimilation window of 200 timesteps (giving $N = 199$), followed by a forecast of 800 timesteps. The resulting linear system (2.16) is thus of size $(40{,}000 + 200p)$, where $p$ is the number of observations we take at each timestep. Independent of $p$, the full-rank update is $\delta x \in \mathbb{R}^{20{,}000}$. In contrast, the low-rank update is $WV^T$, where $W \in \mathbb{R}^{100 \times r}$ and $V \in \mathbb{R}^{200 \times r}$. For $r = 20$, this requires only 30% of the storage. In the examples to follow, we compare the full- and low-rank solutions of the data assimilation problem with the background estimate.
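The following sketch constructs a one-step linear model $M$ for (4.1). It is one plausible realisation of the scheme described above (Crank-Nicolson in the diffusion term, centred differences in space), and the parameter values at the end are illustrative only, not those used for the figures.

```python
import numpy as np

def advection_diffusion_model(n, dt, c_d, c_a):
    """One-step linear model x_{k+1} = M x_k for u_t = c_d u_xx + c_a u_x on [0,1]
    with homogeneous Dirichlet boundary conditions, on the n interior grid points."""
    dx = 1.0 / (n + 1)
    e = np.ones(n)
    D2 = (np.diag(-2 * e) + np.diag(e[:-1], 1) + np.diag(e[:-1], -1)) / dx**2
    D1 = (np.diag(e[:-1], 1) - np.diag(e[:-1], -1)) / (2 * dx)
    A = c_d * D2 + c_a * D1
    # Crank-Nicolson step: (I - dt/2 A) x_{k+1} = (I + dt/2 A) x_k
    return np.linalg.solve(np.eye(n) - 0.5 * dt * A, np.eye(n) + 0.5 * dt * A)

M = advection_diffusion_model(100, 1e-3, 0.1, 1.0)   # illustrative parameter values
```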
Perfect observations. First let us suppose we have perfect observations of every state in the assimilation window. Hence $p = 100$, and the size of the saddle point system we consider is 60,000. We take as the background $u_b$ a perturbed initial condition, with background covariance $B$ a scalar multiple of the identity, and for this and the following examples we consider a model error with covariance $Q$ a (small) multiple of the identity.

Figure 4.1 shows the state $u(x, t_a)$ and the absolute error $|u^*(x, t_a) - u(x, t_a)|$ for the time $t_a$ immediately after assimilation. We consider the three approaches, denoting the true solution by $u^*$. In Figure 4.2 we consider the root mean squared errors of the approaches, presenting the errors in both the assimilation window and the forecast. The results show that the low-rank solution matches the full-rank solution very closely, in both the observation window and the forecast. In Figure 4.1 the low- and full-rank approximations are indistinguishable, with both displaying the same characteristics in the state error plot. Both methods for solving the data assimilation problem result in a superior forecast to the initial guess (without assimilation).

It is worth noting that here the low-rank solution to the data assimilation problem achieves a lower root mean squared error than the full-rank solution for half of the forecast window. Investigating different random seeds, we saw that this was not always the case, though in the majority of experiments the two solutions were close. In this example, the full- and low-rank solutions both outperformed the background estimate for all random seeds considered.
Figure 4.1: State $u(x, t_a)$ (left) and error $|u^*(x, t_a) - u(x, t_a)|$ (right) for the time $t_a$ after the assimilation window, for the 1D advection-diffusion problem with perfect observations.

Figure 4.2: Root mean squared errors for the 1D advection-diffusion data assimilation problem with perfect observations.
Partial, noisy observations. Next we introduce partial, noisy observations, taking observations in every fifth component of $u$. These are generated from the truth with observation error covariance $R$ a multiple of $I_p$, for $p = 20$, and as such the linear system we consider for this example is of size 44,000. Here we take a background covariance with exponentially decaying entries, $B_{i,j} \propto \exp(-|i-j|)$, keeping $Q$ and $r = 20$ as before. The resulting errors for the three approaches, and the root mean squared errors, are shown in Figure 4.3.

As with the previous example, the state errors of both the full- and low-rank solutions are similar, though here we notice a greater variation between the two. Unlike above, when we compare the root mean squared errors of the full- and low-rank approaches there is a greater disparity between the two, with the full-rank performing significantly better except at the very start of the forecast. Nonetheless, the low-rank approximation remains superior to using no assimilation.

Figure 4.3: Error for time $t_a$ after the assimilation window (left), and root mean squared error (right), for the 1D advection-diffusion problem with partial, noisy observations ($r = 20$).
Different choices of rank. We now consider the effect of the chosen rank on the assimilation result. In the previous examples we considered $r = 20$, which resulted in the low-rank approximation to $\delta x$ requiring only 30% of the storage needed for the full-rank solution. Here we consider $r = 5$ (requiring 7.5% of the storage) and $r = 1$ (needing just 1.5%). The root mean squared errors, shown in Figure 4.4, behave for $r = 5$ similarly to what we saw for $r = 20$, though the assimilation window has greater variation for $r = 5$ whilst remaining close to the full-rank solution. In contrast, the behaviour of the root mean squared error for $r = 1$ is considerably different to that of the full-rank solution. Despite this, the forecasts for both $r = 5$ and $r = 1$ are close to the full-rank solution, and are comfortably more accurate than using no assimilation. The closeness to the full-rank solution may be aided by the smoothing properties of this model operator, and the particular random seed, as noted above, though taking different random seeds results in similar behaviour in the majority of cases.

Figure 4.4: Root mean squared errors for the 1D advection-diffusion data assimilation problem with partial, noisy observations, for $r = 5$ (left) and $r = 1$ (right).

Table 1 presents the storage requirements for the examples considered in this section. As Figures 4.1-4.4 demonstrate, despite the large reduction in the necessary storage for the low-rank approach, it results in close approximations to the full-rank method.
4.2 Comparison of preconditioners

We present here a comparison between different choices of preconditioner for the 1D advection-diffusion system of Section 4.1. We consider a small example, taking $n = 10$, $N = 19$ and $p = 4$, with an exponentially decaying background covariance $B_{i,j} \propto \exp(-|i-j|)$, and $Q$ and $R$ multiples of the identity as before. The resulting saddle point matrix is $440 \times 440$. A reduced rank of $r = 5$ is considered, though similar results are obtained when we vary this choice.

The preconditioners considered in Figure 4.5a are inexact constraint preconditioners (3.17), which we compare to using no preconditioner. We use $\tilde{L} = I$, the approximations $\hat{L} = I_{N+1} \otimes I_n + C \otimes I_n$ and the exact $L$ (each with a two-term truncation of the inverse), and also consider $P_{IH}$ from (3.22), where $\tilde{L} = I$ and the exact $H$ is used. We see that none of the preconditioned iterations reduce the residual below the requested tolerance, even after 440 iterations, due to the additional restrictions of the low-rank solver (e.g. the truncation during the algorithm). The three inexact constraint preconditioners where we take $\tilde{H} = 0$ exhibit very similar behaviour, with the approximation $\hat{L}$ performing slightly better than the other two on the whole. The only preconditioner which achieved superior results to taking the identity was $P_{IH}$ from (3.22), incorporating the true $H$ and taking $\tilde{L} = I$. Despite this, the improvement occurs only after 70 iterations, which for GMRES is not ideal, since we must store all iterates; even using the low-rank representation, this becomes problematic.

For Figure 4.5b we experimented with a selection of Schur complement preconditioners, all of which approximate the Schur complement using the approximation (3.25). For the block triangular preconditioner, we use the exact $L$ and $H$ in the inverted matrix, in addition to (3.25). Unlike the inexact constraint preconditioners, none of the Schur complement preconditioners we consider here showed better results than using no preconditioner. Comparison with the inexact constraint preconditioners shows the block diagonal Schur complement preconditioners using $\hat{L}$ and $L$ to be comparable. Despite the block triangular preconditioner containing the true $H$, it results in an ineffective choice, performing worse than all others considered.

Figure 4.5: Residual using different preconditioners for the $440 \times 440$ advection-diffusion example: (a) inexact constraint preconditioners; (b) Schur complement preconditioners.

To illustrate a larger problem size, we conduct a further test using $n = 20$, with the remaining setup unchanged from above; thus the saddle point matrix is now of size 880. In Figure 4.6 we compare the best performing of the above preconditioners, the inexact constraint preconditioner $P_{IH}$ from (3.22) using $\tilde{L} = I$ and $\tilde{H} = H$, with using no preconditioner.

Figure 4.6: Residual using the inexact constraint preconditioner for the $880 \times 880$ advection-diffusion example.

We see similar behaviour to the smaller example. A possible explanation is that the truncation within the low-rank solver retains only the components belonging to larger eigenvalues, ignoring the smaller ones; therefore, the low-rank approach acts like a regularisation, and hence in some sense like a projected preconditioner.
5 Time-dependent systems

Next, we consider an extension of the Kronecker formulation (3.2) to the time-dependent case, allowing for time-dependent model and observation operators and the respective covariance matrices. The remaining assumption we must make is that the number of observations at the $i$-th timestep, $p_i$, is constant, i.e. $p_i = p$ for each $i$. With these assumptions, the linear system (3.2) becomes

\begin{bmatrix} F_1 \otimes B + \sum_{i=1}^{N} F_{i+1} \otimes Q_i & 0 & I_{N+1} \otimes I_n + \sum_{i=1}^{N} C_i \otimes M_i \\ 0 & \sum_{i=0}^{N} F_{i+1} \otimes R_i & \sum_{i=0}^{N} F_{i+1} \otimes H_i \\ I_{N+1} \otimes I_n + \sum_{i=1}^{N} C_i^T \otimes M_i^T & \sum_{i=0}^{N} F_{i+1} \otimes H_i^T & 0 \end{bmatrix} \begin{bmatrix} \lambda \\ \mu \\ \delta x \end{bmatrix} = \begin{bmatrix} b \\ d \\ 0 \end{bmatrix},   (5.1)

where $F_i$ denotes the matrix with a one in the $i$-th entry of the diagonal and zeros elsewhere, and $C_i$ is the matrix with $-1$ in the $i$-th column of the subdiagonal and zeros elsewhere. Here $M_i$ and $H_i$ are linearisations of the model and observation operators $\mathcal{M}_i$ and $\mathcal{H}_i$ respectively about $x_i$.

As in Section 3.1, we may use (3.1) to rewrite this as the (now more general) matrix equations

B\Lambda F_1 + \sum_{i=1}^{N} Q_i \Lambda F_{i+1} + X + \sum_{i=1}^{N} M_i X C_i^T = b,
\sum_{i=0}^{N} R_i U F_{i+1} + \sum_{i=0}^{N} H_i X F_{i+1} = d,   (5.2)
\Lambda + \sum_{i=1}^{N} M_i^T \Lambda C_i + \sum_{i=0}^{N} H_i^T U F_{i+1} = 0,

where, as before, $\lambda$, $\delta x$, $b$, $\mu$ and $d$ are the vectorised forms of the matrices $\Lambda, X, b \in \mathbb{R}^{n \times (N+1)}$ and $U, d \in \mathbb{R}^{p \times (N+1)}$ respectively. These matrix equations must again be solved for $\Lambda$, $U$ and $X$, where $X$ is the matrix of interest.

Algorithm 4 is an implementation of Amult for the time-dependent case, explicitly writing the concatenation defined by (5.2) in the form required for LR-GMRES. This requires linearisations of the model and observation operators at all timesteps.

As an example, we consider the Lorenz-95 system [28], which is both non-linear and chaotic, rather than smoothing like the previous example (Section 4.1), so as to better represent real world data assimilation problems such as weather forecasting.
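To see how the time dependence enters the factored arithmetic, the sketch below forms the factors of the first block row of (5.2), mirroring the first two lines of Algorithm 4. All sizes and operators are placeholders chosen for illustration.

```python
import numpy as np

def first_block_row(W11, W12, W31, W32, B, Qs, Ms):
    """Left and right factors of B*Lam*F_1 + sum_i Q_i*Lam*F_{i+1} + X
    + sum_i M_i*X*C_i^T, where Lam = W11 @ W12.T and X = W31 @ W32.T
    (cf. the first line of Algorithm 4). Qs = [Q_1..Q_N], Ms = [M_1..M_N]."""
    N = len(Ms)
    Np1 = W12.shape[0]                            # N + 1 columns in the matrix unknowns
    F = [np.diag((np.arange(Np1) == i).astype(float)) for i in range(Np1)]  # F[i] = F_{i+1}
    C = []
    for i in range(1, N + 1):                     # C_i: -1 in the i-th column of the subdiagonal
        Ci = np.zeros((Np1, Np1)); Ci[i, i - 1] = -1.0; C.append(Ci)
    left = np.hstack([B @ W11] + [Q @ W11 for Q in Qs]
                     + [W31] + [M @ W31 for M in Ms])
    right = np.hstack([F[0] @ W12] + [F[i] @ W12 for i in range(1, N + 1)]
                      + [W32] + [Ci @ W32 for Ci in C])
    return left, right
```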
5.1 The Lorenz-95 system

We consider the Lorenz-95 system [28], a generalisation of the three-dimensional Lorenz system [27] to $n$ dimensions. The model is defined by the system of $n$ non-linear ordinary differential equations

\frac{\mathrm{d}x^i}{\mathrm{d}t} = -x^{i-2}x^{i-1} + x^{i-1}x^{i+1} - x^i + f,   (5.3)

where $x = [x^1, x^2, \dots, x^n]^T$ is the state of the system (with cyclic indexing) and $f$ is a forcing term. It is known that for $f = 8$ the Lorenz system exhibits chaotic behaviour [20, 28]. It has also been noted that, for reasonably large values of $n$ (here we take $n = 40$), this choice of $f$ leads to a model which is comparable to weather forecasting models.

Algorithm 4 Matrix multiplication, time-dependent case (Amult)

Input: $W_{11}, W_{12}, W_{21}, W_{22}, W_{31}, W_{32}$
Output: $Z_{11}, Z_{12}, Z_{21}, Z_{22}, Z_{31}, Z_{32}$

$Z_{11} = [BW_{11}, Q_1W_{11}, \dots, Q_NW_{11}, W_{31}, M_1W_{31}, \dots, M_NW_{31}]$,
$Z_{12} = [F_1W_{12}, F_2W_{12}, \dots, F_{N+1}W_{12}, W_{32}, C_1W_{32}, \dots, C_NW_{32}]$,
$Z_{21} = [R_0W_{21}, \dots, R_NW_{21}, H_0W_{31}, \dots, H_NW_{31}]$,
$Z_{22} = [F_1W_{22}, \dots, F_{N+1}W_{22}, F_1W_{32}, \dots, F_{N+1}W_{32}]$,
$Z_{31} = [W_{11}, M_1^TW_{11}, \dots, M_N^TW_{11}, H_0^TW_{21}, \dots, H_N^TW_{21}]$,
$Z_{32} = [W_{12}, C_1^TW_{12}, \dots, C_N^TW_{12}, F_1W_{22}, \dots, F_{N+1}W_{22}]$

We solve (5.3) using a 4th order Runge-Kutta method in order to obtain

x_{k+1} = \mathcal{M}_k(x_k), \quad \text{where } x_k = [x_k^1, x_k^2, \dots, x_k^n]^T,   (5.4)

and $\mathcal{M}_k$ is the non-linear model operator which evolves the state $x_k$ to $x_{k+1}$. As before, $\mathcal{H}_k$ denotes the potentially non-linear observation operator for the state $x_k$. To formulate the data assimilation problem as a saddle point problem, we generate the tangent linear model and observation operators $M_k$ and $H_k$ by linearising $\mathcal{M}_k$ and $\mathcal{H}_k$ about $x_k$.

As in Section 4.1, we compare the low-rank approximation computed using LR-GMRES to the full-rank solution of the saddle point formulation (2.16), and to the background estimate (i.e. no assimilation). We perform the data assimilation using an assimilation window of 200 timesteps, followed by a forecast of 1300 timesteps, with small fixed timesteps $\Delta t$. The full-rank update is therefore $\delta x \in \mathbb{R}^{8{,}000}$, whilst in contrast the low-rank update $WV^T$ is such that $W \in \mathbb{R}^{40 \times r}$ and $V \in \mathbb{R}^{200 \times r}$. Here we consider $r = 20$ once more, which now requires 60% of the storage, still a significant reduction.
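The model integration just described is straightforward to reproduce; a minimal sketch follows, with (5.3) implemented using cyclic indexing and one RK4 step standing in for $\mathcal{M}_k$. The timestep in the final line is illustrative only, and the tangent linear operators $M_k$ used in the saddle point system would be obtained by differentiating this step.

```python
import numpy as np

def lorenz95_rhs(x, f=8.0):
    """Right hand side of (5.3) with cyclic indexing:
    dx_i/dt = -x_{i-2} x_{i-1} + x_{i-1} x_{i+1} - x_i + f."""
    return (-np.roll(x, 2) * np.roll(x, 1)
            + np.roll(x, 1) * np.roll(x, -1) - x + f)

def rk4_step(x, dt, f=8.0):
    """One step of the 4th order Runge-Kutta scheme, x_{k+1} = M_k(x_k)."""
    k1 = lorenz95_rhs(x, f)
    k2 = lorenz95_rhs(x + 0.5 * dt * k1, f)
    k3 = lorenz95_rhs(x + 0.5 * dt * k2, f)
    k4 = lorenz95_rhs(x + dt * k3, f)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

x = 8.0 + np.zeros(40)      # equilibrium for f = 8, with n = 40
x[0] += 0.01                # small perturbation to trigger chaotic behaviour
x = rk4_step(x, 0.05)       # dt here is an illustrative value
```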
Perfect observations. As with the advection-diffusion equation, let us first suppose we have perfect observations of every state in the assimilation window. We take as the background estimate $x_b$ a perturbed initial condition, with background covariance $B$ a scalar multiple of the identity, and as before we consider a model error with covariance $Q$ a (small) multiple of the identity. The error $|x^* - x|$ for the time after assimilation, and the root mean squared errors for the three approaches in this example, are presented in Figure 5.1.

The choice of $r = 20$ here results in a low-rank approximation which is very close to the full-rank solution. This is very good given that the low-rank approximation requires 40% less storage. In the state error plot we observe small differences between the solutions for the middle states, though these are still substantially smaller than the error with no assimilation. In the forecast, the low-rank approximation matches the full-rank solution until both reach the error with no assimilation, with only small variation.

Figure 5.1: Error $|x^* - x|$ for the time after the assimilation window (left), and root mean squared error (right), for the Lorenz-95 system with perfect observations.
Noisy observations. We next introduce noisy observations, taking an observation error covariance $R$ proportional to $I_p$; furthermore, we take the exponentially decaying background error covariance $B_{i,j} \propto \exp(-|i-j|)$. In Figure 5.2 we consider the root mean squared errors for two different choices of observation operator: taking interpolatory observations in every component ($p = 40$), shown on the left, and in every fifth component ($p = 8$), on the right. In both cases the low-rank approximation matches the full-rank solution very closely, until the time at which both errors are comparable to the background estimate. In this example, the assimilation of noisy observations in every fifth component is similarly difficult for both approaches. Achieving these very similar results with the low-rank approach, despite using just 60% of the storage, is very promising.

Figure 5.2: Root mean squared error for the Lorenz-95 system with noisy observations in every component (left) and noisy, partial observations in every fifth component (right).
Finally, we consider as a larger example the 150-dimensional Lorenz-95 system, with an assimilation window of 150 timesteps. This gives a full-rank update $\delta x \in \mathbb{R}^{22{,}500}$, and we consider two different choices of low rank: $r = 20$, requiring 27% of the storage, and $r = 5$, needing 7%. In this example we take noisy observations in each state, with an exponentially decaying background covariance $B_{i,j} \propto \exp(-|i-j|)$, and observation and model error covariances $R$ and $Q$ multiples of the identity as before.

These examples, shown in Figure 5.3, further demonstrate that a low-rank approximation performs very closely to the full-rank solution for small choices of $r$. Taking $r = 20$, we see that, as in the previous examples, the resulting approximation is nearly indistinguishable until both solutions reach the same level of error as with no assimilation. As before, we see the low-rank solution performing better than the full-rank solution for $r = 5$ in places; this is not always the case, depending on the random seed as noted earlier, and is emphasised by the sensitivity of the chaotic system. However, repeated experimentation shows that the full- and low-rank approximations are often close. Here the approximation using $r = 5$ gives similar results to the full-rank approximation, despite requiring just 7% of the storage.

Figure 5.3: Root mean squared error for the 150-dimensional Lorenz-95 system with $r = 20$ (left) and $r = 5$ (right).

Table 2 presents the storage requirements for the examples considered in this section. As with the advection-diffusion example, despite the large reduction in storage required, the experiments have shown that the low-rank approximations give similar results to the full-rank approach, which is a very good prospect.
6 Conclusions

The saddle point formulation of weak constraint four-dimensional variational data assimilation results in a large linear system, which in the incremental approach is solved to determine the update $\delta x$ at every step. In this paper we have proposed a low-rank approach which approximates the solution of the saddle point system, with significant reductions in the storage needed. This was achieved by considering the structure of the saddle point system and using techniques from the theory of matrix equations. Using the existence of low-rank solutions to Sylvester equations, we showed that low-rank solutions to the data assimilation problem exist under certain assumptions, with numerical experimentation demonstrating that this may be the case even when these assumptions are relaxed.

We introduced a low-rank GMRES solver, considered the requirements for implementing this algorithm, and investigated several preconditioning approaches. For our examples we observed that no preconditioner was necessary; however, further investigation may lead to new choices of preconditioners for the data assimilation setting, and new low-rank solvers for weak constraint 4D-Var.

Numerical experiments have demonstrated that the low-rank approach introduced here is successful for both linear and non-linear models. In these examples we achieved close approximations to the full-rank solutions, in some cases with storage requirements of less than 10% of those needed by the full-rank approach, which is very promising.

References
[1] P. Benner and T. Breiten, Low rank methods for a class of generalized Lyapunov equations and related issues, Numer. Math., 124 (2013), pp. 441–470.
[2] P. Benner and P. Kürschner, Computing real low-rank solutions of Sylvester equations by the factored ADI method, Comput. Math. Appl., 67 (2014), pp. 1656–1672.
[3] P. Benner, J.-R. Li, and T. Penzl, Numerical solution of large-scale Lyapunov equations, Riccati equations, and linear-quadratic optimal control problems, Numer. Linear Algebra Appl., 15 (2008), pp. 755–777.
[4] P. Benner, R.-C. Li, and N. Truhar, On the ADI method for Sylvester equations, J. Comput. Appl. Math., 233 (2009), pp. 1035–1045.
[5] M. Benzi, G. H. Golub, and J. Liesen, Numerical solution of saddle point problems, Acta Numer., 14 (2005), pp. 1–137.
[6] M. Benzi and A. J. Wathen, Some preconditioning techniques for saddle point problems, Springer-Verlag, 2008, pp. 195–211.
[7] L. Bergamaschi, On eigenvalue distribution of constraint-preconditioned symmetric saddle point matrices, Numer. Linear Algebra Appl., 19 (2011), pp. 754–772.
[8] L. Bergamaschi, J. Gondzio, M. Venturin, and G. Zilli, Inexact constraint preconditioners for linear systems arising in interior point methods, Comput. Optim. Appl., 36 (2007), pp. 137–147.
[9] L. Bergamaschi, J. Gondzio, M. Venturin, and G. Zilli, Erratum to: Inexact constraint preconditioners for linear systems arising in interior point methods, Comput. Optim. Appl., 49 (2009), pp. 401–406.
[10] Y. Cao, J. Zhu, I. M. Navon, and Z. Luo, A reduced-order approach to four-dimensional variational data assimilation using proper orthogonal decomposition, Internat. J. Numer. Methods Fluids, 53 (2007), pp. 1571–1583.
[11] P. Courtier, J.-N. Thépaut, and A. Hollingsworth, A strategy for operational implementation of 4D-Var, using an incremental approach, Q. J. R. Meteorol. Soc., 120 (1994), pp. 1367–1387.
[12] J. Crank and P. Nicolson, A practical method for numerical evaluation of solutions of partial differential equations of the heat-conduction type, Math. Proc. Cambridge Philos. Soc., 43 (1947), pp. 50–67.
[13] V. Druskin and V. Simoncini, Adaptive rational Krylov subspaces for large-scale dynamical systems, Systems Control Lett., 60 (2011), pp. 546–560.
[14] G. Evensen, Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics, J. Geophys. Res., 99 (1994), pp. 10143–10162.
[15] M. Fisher, S. Gratton, S. Gürol, Y. Trémolet, and X. Vasseur, Low rank updates in preconditioning the saddle point systems arising from data assimilation problems, Optim. Methods Softw., (2016), pp. 1–25.
[16] M. Fisher and S. Gürol, Parallelisation in the time dimension of four-dimensional variational data assimilation, Q. J. R. Meteorol. Soc., (2017).
[17] M. Fisher, M. Leutbecher, and G. A. Kelly, On the equivalence between Kalman smoothing and weak-constraint four-dimensional variational data assimilation, Q. J. R. Meteorol. Soc., 131 (2005), pp. 3235–3246.
[18] M. Fisher, Y. Trémolet, H. Auvinen, D. Tan, and P. Poli, Weak-constraint and long-window 4D-Var, Tech. Report 655, ECMWF, 2011.
[19] G. M. Flagg and S. Gugercin, On the ADI method for the Sylvester equation and the optimal-$\mathcal{H}_2$ points, Appl. Numer. Math., 64 (2013), pp. 50–58.
[20] M. A. Freitag and R. Potthast, Synergy of inverse problems and data assimilation techniques, vol. 13, Walter de Gruyter, 2013, pp. 1–53.
[21] L. Grasedyck, Existence of a low rank or $\mathcal{H}$-matrix approximant to the solution of a Sylvester equation, Numer. Linear Algebra Appl., 11 (2004), pp. 371–389.
[22] R. E. Kalman, A new approach to linear filtering and prediction problems, J. Basic Eng., 82 (1960), pp. 35–45.
[23] D. Kressner and C. Tobler, Krylov subspace methods for linear systems with tensor product structure, SIAM J. Matrix Anal. Appl., 31 (2010), pp. 1688–1714.
[24] A. S. Lawless, Variational data assimilation for very large environmental problems, vol. 13, Walter de Gruyter, 2013, pp. 55–90.
[25] A. S. Lawless, N. K. Nichols, C. Boess, and A. Bunse-Gerstner, Using model reduction methods within incremental four-dimensional variational data assimilation, Mon. Wea. Rev., 136 (2008), pp. 1511–1522.
[26] J.-R. Li and J. White, Low-rank solution of Lyapunov equations, SIAM J. Matrix Anal. Appl., 24 (2002), pp. 260–280.
[27] E. N. Lorenz, Deterministic nonperiodic flow, J. Atmos. Sci., 20 (1963), pp. 130–141.
[28] E. N. Lorenz, Predictability: A problem partly solved, in Proc. Seminar on Predictability, vol. 1, 1996.
[29] B. C. Moore, Principal component analysis in linear systems: Controllability, observability, and model reduction, IEEE Trans. Automat. Control, 26 (1981), pp. 17–32.
[30] T. Penzl, A cyclic low-rank Smith method for large sparse Lyapunov equations, SIAM J. Sci. Comput., 21 (1999), pp. 1401–1418.
[31] D. T. Pham, J. Verron, and M. C. Roubaud, A singular evolutive extended Kalman filter for data assimilation in oceanography, J. Mar. Syst., 16 (1998), pp. 323–340.
[32] Y. Saad, Numerical solution of large Lyapunov equations, in Signal Processing, Scattering and Operator Theory, and Numerical Methods, Proc. MTNS-89, Birkhäuser, 1990, pp. 503–511.
[33] Y. Saad and M. H. Schultz, GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM J. Sci. Comput., 7 (1986), pp. 856–869.
[34] Y. Sasaki, An objective analysis based on the variational method, J. Meteor. Soc. Japan, 36 (1958), pp. 77–88.
[35] Y. Sasaki, Some basic formalisms in numerical variational analysis, Mon. Wea. Rev., 98 (1970), pp. 875–883.
[36] V. Simoncini, A new iterative method for solving large-scale Lyapunov matrix equations, SIAM J. Sci. Comput., 29 (2007), pp. 1268–1288.
[37] V. Simoncini, Computational methods for linear matrix equations, SIAM Rev., 58 (2016), pp. 377–441.
[38] M. Stoll and T. Breiten, A low-rank in time approach to PDE-constrained optimization, SIAM J. Sci. Comput., 37 (2015), pp. B1–B29.
[39]