Implicit particle filtering for models with partial noise, and an application to geomagnetic data assimilation
Matthias Morzfeld and Alexandre J. Chorin
Lawrence Berkeley National Laboratory, Berkeley, CA, and Department of Mathematics, University of California, Berkeley, CA
Abstract
Implicit particle filtering is a sequential Monte Carlo method for data assimilation, designed to keep the number of particles manageable by focussing attention on regions of large probability. These regions are found by minimizing, for each particle, a scalar function F of the state variables. Some previous implementations of the implicit filter rely on finding the Hessians of these functions. The calculation of the Hessians can be cumbersome if the state dimension is large or if the underlying physics are such that derivatives of F are difficult to calculate. This is the case in many geophysical applications, in particular for models with partial noise, i.e. with a singular state covariance matrix. Examples of models with partial noise include stochastic partial differential equations driven by spatially smooth noise processes and models for which uncertain dynamic equations are supplemented by conservation laws with zero uncertainty. We make the implicit particle filter applicable to such situations by combining gradient descent minimization with random maps and show that the filter is efficient, accurate and reliable because it operates in a subspace whose dimension is smaller than the state dimension. As an example, we assimilate data for a system of nonlinear partial differential equations that appears in models of geomagnetism.

Keywords: data assimilation; implicit sampling; particle filters; partial noise; geomagnetism

AMS Subject Classification: 60G35, 62M20, 86A05

1 Introduction
The task in data assimilation is to use available data to update the forecast of a numerical model. The numerical model is typically given by a discretization of a stochastic differential equation (SDE),

    x^{n+1} = R(x^n, t^n) + G(x^n, t^n) ΔW^{n+1},   (1)

where x is an m-dimensional vector, called the state, t^n, n = 0, 1, 2, ..., is a sequence of times, R is an m-dimensional vector function, G is an m × m matrix and ΔW^{n+1} is an m-dimensional vector whose elements are independent standard normal variates. The random vectors G(x^n, t^n)ΔW^{n+1} represent the uncertainty in the system; however, even for G = 0 the state x^n may be random for any n, because the initial state x^0 can be random. The data

    z^l = h(x^{q(l)}, t^{q(l)}) + Q(x^{q(l)}, t^{q(l)}) V^l,   (2)

are collected at times t^{q(l)}, l = 1, 2, ...; for simplicity, we assume that the data are collected at a subset of the model steps, i.e. q(l) = rl, with r ≥ 1. Here z is a k-dimensional vector (k ≤ m), h is a k-dimensional vector function, V^l is a k-dimensional vector whose components are independent standard normal variates, and Q is a k × k matrix. Throughout this paper, we write x^{0:n} for the sequence of vectors {x^0, ..., x^n}.

Data assimilation is necessary in many areas of science and engineering and is essential in geophysics, for example in oceanography, meteorology, geomagnetism and atmospheric chemistry (see e.g. the reviews [5, 18, 20, 27, 28, 39]). What makes the assimilation of data in geophysical applications difficult is the complicated underlying physics, which lead to a large state dimension m and a nonlinear function R in equation (1).

If the model (1) as well as h in (2) are linear and if, in addition, the initial state x^0 is Gaussian, then the probability density function (pdf) of the state x^n is Gaussian for any n and can be characterized in full by its mean and covariance. The Kalman filter (KF) sequentially computes the mean of the model (1), conditioned on the observations, and thus provides the best linear unbiased estimate of the state [23]. The ensemble Kalman filter (EnKF) is a Monte Carlo approximation of the Kalman filter and can be obtained by replacing the state covariance matrix by the sample covariance matrix in the Kalman formalism. The state covariance is the covariance matrix of the pdf of the current state conditioned on the previous state, which we calculate from the model (1) to be

    p(x^{n+1} | x^n) ∼ N(R(x^n, t^n), G(x^n, t^n) G(x^n, t^n)^T),   (3)

where N(μ, Σ) denotes a Gaussian with mean μ and covariance matrix Σ. To streamline the notation, we write for the state covariance

    Σ_x^n = G(x^n, t^n) G(x^n, t^n)^T,   (4)

where T denotes a transpose. In the EnKF, the sample covariance matrix is computed from an "ensemble," obtained by running the model (1) for different realizations of the noise process ΔW. The Monte Carlo approach avoids the computationally expensive step of updating the state covariance in the Kalman formalism. Both KF and EnKF have extensions to nonlinear, non-Gaussian models, but these rely on linearity and Gaussianity approximations [22].
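For concreteness, here is a minimal sketch of the EnKF analysis step just described, for a linear observation function h(x) = Hx and in the perturbed-observations variant; the dimensions, the toy ensemble and the name enkf_analysis are illustrative assumptions, not part of the filters studied later in this paper.

    import numpy as np

    def enkf_analysis(X, z, H, Qobs, rng):
        # X: m-by-M forecast ensemble (one member per column); z: k-dimensional data.
        m, M = X.shape
        A = X - X.mean(axis=1, keepdims=True)             # ensemble anomalies
        P = A @ A.T / (M - 1)                             # sample covariance, replacing (4)
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + Qobs)   # Kalman gain
        # each member is updated with its own perturbed copy of the data:
        Z = z[:, None] + np.linalg.cholesky(Qobs) @ rng.standard_normal((len(z), M))
        return X + K @ (Z - H @ X)

    # toy usage: m = 3 state components, the first one observed, M = 50 members
    rng = np.random.default_rng(0)
    H = np.array([[1.0, 0.0, 0.0]])
    Xa = enkf_analysis(rng.standard_normal((3, 50)), np.array([0.5]),
                       H, np.array([[0.01]]), rng)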
Variational methods [3, 11, 12, 36-38, 42] aim at assimilating the observations within a given time window by computing the state trajectory of maximum probability. The trajectory is computed by minimizing a suitable cost function which is, up to an additive constant, the negative logarithm of the pdf of the state trajectory x^{0:q(l)} given the set of observations z^{1:l}, p(x^{0:q(l)} | z^{1:l}). In particular, 3D-Var methods assume a static model [36]. Strong constraint 4D-Var determines an optimal initial state given a "perfect" dynamic model, i.e. G = 0, and a Gaussian initial uncertainty, i.e. x^0 ∼ N(μ^0, Σ^0) [11, 12, 36, 37]. Uncertain models with G ≠ 0 are tackled with a weak constraint 4D-Var approach [3, 38, 42]. Many implementations of variational methods compute the gradient of the cost function from tangent linear adjoint equations and rely on linear approximations.

For the remainder of this paper, we focus on sequential Monte Carlo (SMC) methods for data assimilation, called particle filters [1, 7, 8, 14, 15, 19, 29-31, 40, 41]. Particle filters do not rely upon linearity or Gaussianity assumptions and approximate the pdf of the state given the observations, p(x^{0:q(l)} | z^{1:l}), by SMC. The state estimate is a statistic (e.g. the mean, median or mode) of this pdf. Most particle filters rely on the recursive relation

    p(x^{0:q(l+1)} | z^{1:l+1}) ∝ p(x^{0:q(l)} | z^{1:l}) p(z^{l+1} | x^{q(l+1)}) p(x^{q(l)+1:q(l+1)} | x^{q(l)}).   (5)

In the above equation, p(x^{0:q(l+1)} | z^{1:l+1}) is the pdf of the state trajectory up to time t^{q(l+1)} given all available observations up to time t^{q(l+1)}, and is called the target density; p(z^{l+1} | x^{q(l+1)}) is the probability density of the current observation given the current state and can be obtained from (2):

    p(z^{l+1} | x^{q(l+1)}) ∼ N(h(x^{q(l+1)}, t^{q(l+1)}), Σ_z^{q(l+1)}),   (6)

with

    Σ_z^n = Q(x^n, t^n) Q(x^n, t^n)^T.   (7)

The pdf p(x^{q(l)+1:q(l+1)} | x^{q(l)}) is the density of the state trajectory from the previous assimilation step to the current observation, conditioned on the state at the previous assimilation step, and is determined by the model (1).

A standard version of the sampling-importance-resampling (SIR) particle filter (also called the bootstrap filter, see e.g. [14]) generates, at each step, samples from p(x^{q(l)+1:q(l+1)} | x^{q(l)}) (the prior density) by running the model. These samples (particles) are weighted by the observations with weights w ∝ p(z^{l+1} | x^{q(l+1)}), to yield a posterior density that approximates the target density p(x^{0:q(l+1)} | z^{1:l+1}). One then removes particles with a small weight by "resampling" (see e.g. [1] for resampling algorithms) and repeats the procedure when the next observation becomes available. This SIR filter is straightforward to implement; the catch is that many particles have small weights, because the particles are generated without using information from the data. If many particles have a small weight, the approximation of the target density is poor and the number of particles required for a good approximation can grow catastrophically with the dimension of the state [4, 34]. Various methods, e.g. different prior densities and weighting schemes, have been invented to ameliorate this problem (see e.g. [14, 39-41]).
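The SIR step just described fits in a few lines; the sketch below uses a toy drift in place of R in (1), a constant G, and a linear observation of the first state component, all of which are placeholder assumptions.

    import numpy as np

    rng = np.random.default_rng(1)
    m, M = 2, 100                            # state dimension, number of particles
    G = 0.1 * np.eye(m)                      # constant noise matrix in (1)
    s = 0.05                                 # observation noise level (Q = s I in (2))
    X = rng.standard_normal((M, m))          # initial particles
    z = np.array([0.3])                      # one observation of the first component

    # 1) propagate particles with the model alone (samples of the prior density)
    X = np.array([x + 0.1 * np.sin(x) + G @ rng.standard_normal(m) for x in X])
    # 2) weight by the likelihood p(z | x) obtained from (2)
    logw = -0.5 * ((z[0] - X[:, 0]) / s) ** 2
    w = np.exp(logw - logw.max()); w /= w.sum()
    # 3) resample when the effective sample size 1 / sum(w^2) is small
    if 1.0 / np.sum(w ** 2) < 0.9 * M:
        X = X[rng.choice(M, size=M, p=w)]
        w = np.full(M, 1.0 / M)
    estimate = w @ X                         # weighted mean as the state estimate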
The basic idea of implicit particle filters [7, 8, 31] is to use the available observations to find regions of high probability in the target density and to look for samples within these regions. This implicit sampling strategy generates a thin particle beam within the high probability domain and, thus, keeps the number of particles required manageable, even if the state dimension is large. The focussing of particles is achieved by setting up an underdetermined algebraic equation that depends on the model (1) as well as on the data (2), and whose solution generates a high probability sample of the target density.

We review the implicit filter in the next section, and it will become evident that the construction assumes that the state covariance Σ_x^n in (4) is nonsingular. This condition is often not satisfied. If, for example, one wants to assimilate data into a stochastic partial differential equation (SPDE) driven by spatially smooth noise, then the continuous-time noise process can be represented by a series with rapidly decaying coefficients, leading to a singular or ill-conditioned state covariance Σ_x^n in discrete time and space (see Sections 3.1 and 4, as well as [10, 21, 26]). A second important class of models with partial noise consists of uncertain dynamic equations supplemented by conservation laws (e.g. conservation of mass) with zero uncertainty. Such models often appear in data assimilation for fluid dynamics problems [25].

The purpose of the present paper is two-fold. First, in Section 2, we present a new implementation of the implicit particle filter. Most previous implementations of the implicit filter [7, 31] rely in one way or another on finding the Hessians of scalar functions in rm variables. For systems with very large state vectors and considerable gaps between observations, memory constraints may forbid a computation of these Hessians. Our new implementation combines gradient descent minimization with random maps [31] to avoid the calculation of Hessians, and thus reduces the memory requirements.

The second objective is to consider models with a singular or ill-conditioned state covariance Σ_x^n, for which previous implementations of the implicit filter, as described in [7, 8, 31], are not applicable. In Section 3, we make the implicit filter applicable to models with partial noise and show that our approach is then particularly efficient, because the filter operates in a space whose dimension is determined by the rank of Σ_x^n, rather than by the model dimension. We compare the new implicit filter to SIR, EnKF and variational methods, in particular with respect to how information is propagated from observed variables to unobserved ones. In Section 4, we illustrate the theory with an application in geomagnetism and consider two coupled nonlinear SPDEs with partial noise. We observe that the implicit filter gives good results with very few (4-10) particles, while EnKF and SIR require hundreds to thousands of particles for similar accuracy.

2 Implicit particle filtering with gradient descent minimization and random maps

We first follow [31] closely to review implicit sampling with random maps. Suppose we are given a collection of M particles X_j^{0:q(l)}, j = 1, 2, ..., M, whose empirical distribution approximates the target density at time t^{q(l)}, where q(l) = rl, and suppose that an observation z^{l+1} is available after r steps, at time t^{q(l+1)} = t^{r(l+1)}. From (5) we find, by repeatedly using Bayes' theorem, that, for each particle,

    p(X_j^{0:q(l+1)} | z^{1:l+1}) ∝ p(X_j^{0:q(l)} | z^{1:l}) p(z^{l+1} | X_j^{q(l+1)})
        × p(X_j^{q(l+1)} | X_j^{q(l+1)−1}) p(X_j^{q(l+1)−1} | X_j^{q(l+1)−2}) ⋯ p(X_j^{q(l)+1} | X_j^{q(l)}).   (8)

Implicit sampling is a recipe for computing high-probability samples from the above pdf. To draw a sample we define, for each particle, a function F_j by
    exp(−F_j(X_j)) = p(X_j^{q(l+1)} | X_j^{q(l+1)−1}) ⋯ p(X_j^{q(l)+1} | X_j^{q(l)}) p(z^{l+1} | X_j^{q(l+1)}),   (9)

where X_j is shorthand for the state trajectory X_j^{q(l)+1:q(l+1)}. Specifically, we have

    F_j(X_j) = ½ (X_j^{q(l)+1} − R_j^{q(l)})^T (Σ_{x,j}^{q(l)})^{−1} (X_j^{q(l)+1} − R_j^{q(l)})
        + ½ (X_j^{q(l)+2} − R_j^{q(l)+1})^T (Σ_{x,j}^{q(l)+1})^{−1} (X_j^{q(l)+2} − R_j^{q(l)+1})
        + ⋯
        + ½ (X_j^{q(l+1)} − R_j^{q(l+1)−1})^T (Σ_{x,j}^{q(l+1)−1})^{−1} (X_j^{q(l+1)} − R_j^{q(l+1)−1})
        + ½ (h(X_j^{q(l+1)}) − z^{l+1})^T (Σ_{z,j}^{l+1})^{−1} (h(X_j^{q(l+1)}) − z^{l+1}) + Z_j,   (10)

where R_j^n is shorthand notation for R(X_j^n, t^n) and where Z_j is a positive number that can be computed from the normalization constants of the various pdfs in the definition of F_j in (9). With this F_j, we solve the algebraic equation

    F_j(X_j) − φ_j = ½ ξ_j^T ξ_j,   (11)

where ξ_j is a realization of the rm-dimensional reference variable ξ ∼ N(0, I), and where

    φ_j = min F_j.   (12)

The choice of a Gaussian reference variable does not imply linearity or Gaussianity assumptions, and other choices are certainly possible. We find solutions of (11) by using the random map

    X_j = μ_j + λ_j L_j η_j,   (13)

where λ_j is a scalar, μ_j is an rm-dimensional column vector which represents the location of the minimum of F_j, i.e. μ_j = argmin F_j, L_j is a deterministic rm × rm matrix we can choose, and η_j = ξ_j / √(ξ_j^T ξ_j) is uniformly distributed on the unit rm-sphere. Upon substitution of (13) into (11), we can find a solution of (11) by solving a single algebraic equation in the variable λ_j. The weight of the particle can be shown to be (see [31], Section 3)

    w_j^{q(l+1)} ∝ w_j^{q(l)} exp(−φ_j) |det L_j| ρ_j^{1−rm/2} |λ_j^{rm−1} ∂λ_j/∂ρ_j|,   (14)

where ρ_j = ξ_j^T ξ_j and det L_j denotes the determinant of the matrix L_j (see [31] for details of the calculation). An expression for the scalar derivative ∂λ_j/∂ρ_j can be obtained by implicit differentiation of (11):

    ∂λ_j/∂ρ_j = 1 / (2 ∇F_j L_j η_j),   (15)

where ∇F_j denotes the gradient of F_j (an rm-dimensional row vector). The weights are normalized so that their sum equals one. The weighted positions X_j of the particles approximate the target pdf. We compute the mean of the X_j with weights w_j as the state estimate, and then proceed to assimilate the next observation. The method just described makes use of only one set of observations per assimilation step; an extension to multiple observation sets per assimilation step (smoothing) is straightforward.
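To make the random map concrete, the following sketch performs one implicit sampling step, solving (11) along the random direction (13) with L_j = I for a toy quadratic F_j; the function F and its parameters stand in for the F_j of an actual model, and we start the Newton iteration at λ = √ρ (rather than at 0) so that the numerical derivative is well defined at the minimum of this toy F.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 4                                    # plays the role of rm
    mu = np.array([1.0, -0.5, 0.2, 0.0])     # argmin F (found by minimization in general)
    sig2 = np.array([0.5, 1.0, 2.0, 0.1])

    def F(x):                                # toy stand-in for F_j, with min F = 0 at mu
        return 0.5 * np.sum((x - mu) ** 2 / sig2)

    phi = F(mu)
    xi = rng.standard_normal(n)              # reference sample, xi ~ N(0, I)
    rho = xi @ xi
    eta = xi / np.sqrt(rho)                  # uniformly distributed on the unit sphere

    lam = np.sqrt(rho)                       # Newton iteration for F(mu + lam*eta) - phi = rho/2
    for _ in range(100):
        g = F(mu + lam * eta) - phi - 0.5 * rho
        dg = (F(mu + (lam + 1e-7) * eta) - F(mu + lam * eta)) / 1e-7   # numerical derivative
        step = g / dg
        lam -= step
        if abs(step) < 1e-12:
            break

    X = mu + lam * eta                       # high-probability sample; solves (11)
    gradF = (X - mu) / sig2                  # gradient of the toy F at the sample
    dlam_drho = 1.0 / (2.0 * (gradF @ eta))  # from (15), with L = I
    w = np.exp(-phi) * rho ** (1 - n / 2) * abs(lam ** (n - 1) * dlam_drho)  # factor in (14)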
2.1 Gradient descent minimization with random maps

An algorithm for data assimilation with implicit sampling and random maps was presented in [31]. This algorithm relies on the calculation of the Hessians of the F_j's; the Hessians are used for minimizing the F_j's with Newton's method and for setting up the random map. The calculation of the Hessians, however, may not be easy in some applications, because of a very large state dimension, or because the second derivatives are hard to calculate, as is the case for models with partial noise (see Section 3). To avoid the calculation of Hessians, we propose to use a gradient descent algorithm with line search to minimize the F_j's (see e.g. [32]), along with simple random maps. Of course other minimization techniques, in particular quasi-Newton methods (see e.g. [16, 32]), can also be applied here and would perhaps speed up the minimization. However, we decided to use gradient descent with line search to keep the minimization as simple as possible.

For simplicity, we assume that G and Q in (1)-(2) are constant matrices and calculate the gradient of F_j from (10):

    ∇F = (∂F/∂X^{q(l)+1}, ∂F/∂X^{q(l)+2}, ..., ∂F/∂X^{q(l+1)−1}, ∂F/∂X^{q(l+1)}),   (16)

with

    (∂F/∂X^k)^T = Σ_x^{−1} (X^k − R^{k−1}) − (∂R/∂x |_{x=X^k})^T Σ_x^{−1} (X^{k+1} − R^k),   (17)

for k = q(l)+1, q(l)+2, ..., q(l+1)−1, where R^n is shorthand for R(X^n, t^n), and where

    (∂F/∂X^{q(l+1)})^T = Σ_x^{−1} (X^{q(l+1)} − R^{q(l+1)−1}) + (∂h/∂x |_{x=X^{q(l+1)}})^T Σ_z^{−1} (h(X^{q(l+1)}) − z^{l+1}).   (18)

Here, we dropped the index j for the particles for notational convenience.
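A minimal version of the minimization we have in mind is sketched below: gradient descent with a backtracking (Armijo) line search. Here F and gradF stand for a function F_j and the gradient assembled from (16)-(18); the quadratic test problem at the end is a placeholder.

    import numpy as np

    def minimize_gd(F, gradF, x0, tol=1e-8, max_iter=500):
        x = x0.copy()
        for _ in range(max_iter):
            g = gradF(x)
            if np.linalg.norm(g) < tol:
                break
            alpha, fx = 1.0, F(x)
            # backtrack until the sufficient-decrease condition holds
            while F(x - alpha * g) > fx - 1e-4 * alpha * (g @ g) and alpha > 1e-16:
                alpha *= 0.5
            x = x - alpha * g
        return x, F(x)                       # the pair (mu_j, phi_j)

    # toy quadratic test: the minimum is at (1, 2)
    Fq = lambda x: 0.5 * ((x[0] - 1.0) ** 2 + 10.0 * (x[1] - 2.0) ** 2)
    gq = lambda x: np.array([x[0] - 1.0, 10.0 * (x[1] - 2.0)])
    mu, phi = minimize_gd(Fq, gq, np.zeros(2))   # converges to (1, 2), phi = 0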
We initialize the minimization using the result of a simplified implicit particle filter (see the next subsection). Once the minimum is obtained, we substitute the random map (13) with L_j = I, where I is the identity matrix, into (11) and solve the resulting scalar equation by Newton's method. The scalar derivative we need for the Newton steps is computed numerically. We initialize this iteration with λ_j = 0. Finally, we compute the weights according to (14). If some weights are small, as indicated by a small effective sample size [1],

    M_Eff = 1 / Σ_{j=1}^M (w_j^{q(l+1)})²,   (19)

we resample using Algorithm 2 in [1]. The implicit filtering algorithm with gradient descent minimization and random maps is summarized in pseudo-code in Algorithm 1.

Algorithm 1 Implicit Particle Filter with Random Maps and Gradient Descent Minimization

{Initialization, t = 0}
for j = 1, ..., M do
  • sample X_j ∼ p_0(X)
end for
{Assimilate observation z^l}
for j = 1, ..., M do
  • set up and minimize F_j using gradient descent to compute φ_j and μ_j
  • sample the reference density, ξ_j ∼ N(0, I)
  • compute ρ_j = ξ_j^T ξ_j and η_j = ξ_j / √ρ_j
  • solve (11) using the random map (13) with L_j = I
  • compute the weight of the particle using (14)
  • save the particle X_j and its weight w_j
end for
• normalize the weights so that their sum equals 1
• compute the state estimate from the X_j weighted by w_j (e.g. the mean)
• resample if M_Eff < c
• assimilate z^{l+1}

This implicit filtering algorithm shares with weak constraint 4D-Var that a "cost function" (here F_j) is minimized by gradient descent. However, most 4D-Var implementations use tangent linear adjoint equations to compute the gradient; in the implicit filtering Algorithm 1, we do a fully nonlinear calculation of the gradient. Two further differences between 4D-Var and Algorithm 1 are: (i) 4D-Var does not update the state sequentially, but the implicit particle filter does and, thus, reduces memory requirements; (ii) 4D-Var computes the most likely state by minimizing the cost function, and this estimate can be biased; the implicit particle filter approximates the target density itself.

2.2 A simplified implicit particle filter

We wish to simplify the implicit particle filtering algorithm by reducing the dimension of the function F_j. The idea is to do an implicit sampling step only at times t^{q(l+1)}, i.e. when an observation becomes available. The state trajectory of each particle from time t^{q(l)} (the last time an observation became available) to t^{q(l+1)−1} is generated using the model equations (1). This approach reduces the dimension of F_j from rm to m (the state dimension). The simplification is thus very attractive if the number of steps between observations, r, is large. However, difficulties can also be expected for large r: the state trajectories up to time t^{q(l+1)−1} are generated by the model alone and, thus, may not have a high probability with respect to the observations at time t^{q(l+1)}. The focussing effect of implicit sampling can be expected to be less pronounced, and the number of particles required may grow as the gap between observations becomes larger. Whether or not the simplification we describe here can reduce the computational cost is problem dependent, and we will illustrate advantages and disadvantages in the examples in Section 4.

Suppose we are given a collection of M particles X_j^{q(l)}, j = 1, 2, ..., M, whose empirical distribution approximates the target density at time t^{q(l)}, and that the next observation, z^{l+1}, is available after r steps, at time t^{q(l+1)}. For each particle, we run the model for r − 1 steps to obtain X_j^{q(l)+1}, ..., X_j^{q(l+1)−1}. We then define, for each particle, a function F_j by

    F_j(X_j) = ½ (X_j^{q(l+1)} − R_j^{q(l+1)−1})^T (Σ_{x,j}^{q(l+1)−1})^{−1} (X_j^{q(l+1)} − R_j^{q(l+1)−1})
        + ½ (h(X_j^{q(l+1)}) − z^{l+1})^T (Σ_{z,j}^{q(l+1)})^{−1} (h(X_j^{q(l+1)}) − z^{l+1}) + Z_j,   (20)

whose gradient is given by (18). The algorithm then proceeds as Algorithm 1 in the previous section: we find the minimum of F_j using gradient descent and solve (11) with the random map (13) with L_j = I. The weights are calculated by (14) with r = 1, and the mean of the X_j weighted by the w_j is the state estimate at time t^{q(l+1)}.

This simplified implicit filter simplifies further if the observation function is linear, i.e. h(x) = Hx, where H is a k × m matrix. One can show [31] that the minimum of F_j is

    φ_j = ½ (z^{l+1} − H R_j^{q(l+1)−1})^T K_j^{−1} (z^{l+1} − H R_j^{q(l+1)−1}),   (21)

with

    K_j = H Σ_{x,j}^{q(l+1)−1} H^T + Σ_{z,j}^{l+1}.   (22)

A numerical approximation of the minimum is thus not required. The location of the minimum is

    μ_j = Σ_j ((Σ_{x,j}^{q(l+1)−1})^{−1} R_j^{q(l+1)−1} + H^T (Σ_{z,j}^{l+1})^{−1} z^{l+1}),   (23)

with

    Σ_j^{−1} = (Σ_{x,j}^{q(l+1)−1})^{−1} + H^T (Σ_{z,j}^{l+1})^{−1} H.   (24)

Moreover, if L_j is a Cholesky factor of Σ_j, then X_j = μ_j + L_j ξ_j solves (11) and the weights simplify to

    w_j^{n+1} ∝ w_j^n exp(−φ_j) |det L_j|.   (25)

For the special case of a linear observation function and observations available at every model step (r = 1), the simplified implicit filter is the full implicit filter and reduces to a version of optimal importance sampling [1, 5, 7, 31].
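For a linear observation function, (21)-(25) can thus be evaluated directly, without any iteration; a sketch with placeholder numbers for the forecast and the covariances follows.

    import numpy as np

    rng = np.random.default_rng(3)
    Rj = np.array([0.5, -1.0, 0.2])          # forecast R_j^{q(l+1)-1} (placeholder)
    Sx = np.diag([0.1, 0.2, 0.3])            # Sigma_x (placeholder)
    Sz = 0.05 * np.eye(2)                    # Sigma_z (placeholder)
    H = np.array([[1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0]])          # observe the first two components
    z = np.array([0.4, -0.9])

    K = H @ Sx @ H.T + Sz                                    # (22)
    res = z - H @ Rj
    phi = 0.5 * res @ np.linalg.solve(K, res)                # (21), the minimum of F_j
    Sj = np.linalg.inv(np.linalg.inv(Sx) + H.T @ np.linalg.solve(Sz, H))   # (24)
    mu = Sj @ (np.linalg.solve(Sx, Rj) + H.T @ np.linalg.solve(Sz, z))     # (23)
    L = np.linalg.cholesky(Sj)
    X = mu + L @ rng.standard_normal(3)                      # solves (11) exactly
    w_factor = np.exp(-phi) * np.linalg.det(L)               # per-particle factor in (25)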
3 Implicit particle filtering for models with partial noise

We consider the case of a singular state covariance matrix Σ_x in the context of implicit particle filtering. We start with an example taken from [21] to demonstrate how a singular state covariance appears naturally in the context of SPDEs driven by spatially smooth noise. The example serves as a motivation for more general developments in later sections. Another class of models with partial noise consists of dynamical equations supplemented by conservation laws. The dynamics are often uncertain and thus driven by noise processes; however, there is typically zero uncertainty in the conservation laws (e.g. conservation of mass), so that the full model (dynamics and conservation laws) is subject to partial noise [25].

3.1 Example of a model with partial noise: the semilinear heat equation driven by spatially smooth noise

We consider the stochastic semilinear heat equation on the one-dimensional domain x ∈ [0, 1] over the time interval t ∈ [0, T],

    ∂u/∂t = ∂²u/∂x² + Γ(u) + ∂W_t/∂t,   (26)

where Γ is a continuous function and W_t is a cylindrical Brownian motion (BM) [21]. The derivative ∂W_t/∂t in (26) is formal only (it does not exist in the usual sense). Equation (26) is supplemented by homogeneous Dirichlet boundary conditions and the initial value u(x, 0) = u_0(x). We expand the cylindrical BM W_t in the eigenfunctions of the Laplace operator,

    W_t = Σ_{k=1}^∞ √(q_k) sin(kπx) β_t^k,   (27)

where the β_t^k denote independent BMs and where the coefficients q_k ≥ 0 are such that, for some γ ∈ (0, 1),

    Σ_{k=1}^∞ λ_k^{γ−1} q_k < ∞,   (28)

where the λ_k are the eigenvalues of the Laplace operator [21]. If the coefficients q_k decay fast enough, then, by (27) and basic properties of Fourier series, the noise is smooth in space and, in addition, the sum (28) remains finite, as is required. For example, one may be interested in problems where

    q_k = e^{−k} if k ≤ c,  q_k = 0 if k > c,   (29)

for some c > 0.
To discretize (26) in space, we project the equation onto the m-dimensional space spanned by the first m eigenfunctions e_k of the Laplace operator,

    dU_t^m = (A^m U_t^m + Γ^m(U_t^m)) dt + dW_t^m,   (30)

where U_t^m, Γ^m and W_t^m are m-dimensional truncations of the solution, the function Γ and the cylindrical BM W_t, respectively, and where A^m is a discretization of the Laplace operator. Specifically, from (27) and (29), we obtain

    dW_t^m = Σ_{k=1}^c √(e^{−k}) sin(kπx) dβ_t^k.   (31)

After multiplying with the basis functions and integrating over the spatial domain, we are left with a set of m stochastic ordinary differential equations,

    dx = f(x) dt + g dW,   (32)

where x is an m-dimensional state vector, f is a nonlinear vector function and W is a BM. In particular, we calculate from (31):

    g = diag((√(e^{−1}), √(e^{−2}), ..., √(e^{−c}), 0, 0, ..., 0)),  c < m,   (33)

where diag(a) is a diagonal matrix whose diagonal elements are the components of the vector a. Upon time discretization using, for example, a stochastic version of forward Euler with time step δ [24], we arrive at (1) with

    R(x^n) = x^n + δ f(x^n),  G = √δ g.   (34)

It is now clear that the state covariance matrix Σ_x = GG^T is singular for c < m. A singular state covariance causes no problems for running the discrete-time model (1) forward in time. However, problems do arise if we want to know the pdf of the current state given the previous one. For example, the functions F_j in the implicit particle filter algorithms (either those in Section 2 or those in [7, 8, 31]) are not defined for singular Σ_x. If c ≥ m, then Σ_x is ill-conditioned, which causes a number of numerical issues in the implementation of these implicit particle filtering algorithms; ultimately, the algorithms fail.
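The singularity is easy to see numerically; the sketch below, with illustrative dimensions, builds the diagonal noise matrix (33) and checks the rank of the resulting state covariance (4).

    import numpy as np

    m, c, delta = 8, 3, 0.01                 # modes, noise cutoff c < m, time step
    q = np.array([np.exp(-k) if k <= c else 0.0 for k in range(1, m + 1)])   # (29)
    g = np.diag(np.sqrt(q))                  # noise amplitudes, as in (33)
    G = np.sqrt(delta) * g                   # discrete-time noise matrix from (34)
    Sigma_x = G @ G.T                        # state covariance (4)
    print(np.linalg.matrix_rank(Sigma_x))    # prints 3 = c: singular, since c < m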
3.2 The implicit particle filter for models with partial noise

We start deriving the implicit filter for models with partial noise by considering the special case in which observations are available at every model step (r = 1). For simplicity, we assume that the noise is additive, i.e. G(x^n, t^n) = G = constant, and that Q in (2) is a constant matrix. Under these assumptions, we can use a linear coordinate transformation to diagonalize the state covariance matrix and rewrite the model (1) and the observations (2) as

    x^{n+1} = f(x^n, y^n, t^n) + ΔW^{n+1},  ΔW^{n+1} ∼ N(0, Σ̂_x),   (35)
    y^{n+1} = g(x^n, y^n, t^n),   (36)
    z^{n+1} = h(x^{n+1}, y^{n+1}) + Q V^{n+1},   (37)

where x is a p-dimensional column vector, p < m is the rank of the state covariance matrix (4), f is a p-dimensional vector function, Σ̂_x is a nonsingular, diagonal p × p matrix, y is an (m − p)-dimensional vector, and g is an (m − p)-dimensional vector function. For ease of notation, we drop the hat above the "new" state covariance matrix Σ̂_x in (35) and, for convenience, we refer to the sets of variables x and y as the "forced" and "unforced" variables, respectively.
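Here is a sketch of the transformation that brings a model with additive noise and singular state covariance into the form (35)-(37): diagonalize Σ_x, call the eigendirections with eigenvalues above a threshold the forced variables x, and the rest the unforced variables y. The model step and the dimensions below are placeholders.

    import numpy as np

    rng = np.random.default_rng(4)
    m = 6
    B = rng.standard_normal((m, 3))
    Sigma_x = B @ B.T                        # toy PSD covariance of rank 3

    lam, V = np.linalg.eigh(Sigma_x)         # Sigma_x = V diag(lam) V^T
    forced = lam > 1e-10 * lam.max()         # directions the noise actually excites
    p = int(forced.sum())                    # rank of Sigma_x; the filter works in dimension p

    def model_step(u):                       # placeholder model u -> R(u)
        return 0.9 * u

    # one step in the rotated coordinates: noise enters only the p forced directions
    u = rng.standard_normal(m)
    w = np.zeros(m)
    w[forced] = np.sqrt(lam[forced]) * rng.standard_normal(p)
    u_next = model_step(u) + V @ w           # the y-part of V^T u_next is deterministic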
The key to filtering this system is observing that the unforced variables at time t^{n+1}, given the state at time t^n, are not random. To be sure, y^n is random for any n due to the nonlinear coupling g(x, y), but the conditional pdf p(y^{n+1} | x^n, y^n) is a delta-distribution. For a given (not random) initial state x^0, y^0, the target density is

    p(x^{0:n+1}, y^{0:n+1} | z^{1:n+1}) ∝ p(x^{0:n}, y^{0:n} | z^{1:n}) p(z^{n+1} | x^{n+1}, y^{n+1}) p(x^{n+1} | x^n, y^n).   (38)

Suppose we are given a collection of M particles, X_j^n, Y_j^n, j = 1, 2, ..., M, whose empirical distribution approximates the target density p(x^{0:n}, y^{0:n} | z^{1:n}) at time t^n. The pdf for each particle at time t^{n+1} is thus given by (38) with the substitution of X_j for x and Y_j for y. In agreement with the definition of F_j in previous implementations of the implicit filter, we define F_j here by

    exp(−F_j(X_j^{n+1})) = p(z^{n+1} | X_j^{n+1}, Y_j^{n+1}) p(X_j^{n+1} | X_j^n, Y_j^n).   (39)

More specifically,

    F_j(X_j^{n+1}) = ½ (X_j^{n+1} − f_j^n)^T Σ_x^{−1} (X_j^{n+1} − f_j^n)
        + ½ (h(X_j^{n+1}, Y_j^{n+1}) − z^{n+1})^T Σ_z^{−1} (h(X_j^{n+1}, Y_j^{n+1}) − z^{n+1}) + Z_j,   (40)

where f_j^n is shorthand notation for f(X_j^n, Y_j^n, t^n). With this F_j, we can use Algorithm 1 to construct the implicit filter. For this algorithm we need the gradient of F_j:

    (∇F_j)^T = Σ_x^{−1} (X_j^{n+1} − f_j^n) + (∂h/∂x |_{x=X_j^{n+1}})^T Σ_z^{−1} (h(X_j^{n+1}, Y_j^{n+1}) − z^{n+1}).   (41)

Note that Y_j^{n+1} is fixed for each particle if its previous state (X_j^n, Y_j^n) is known, so that the filter only updates X_j^{n+1} when the observation z^{n+1} becomes available. The unforced variables of the particles, Y_j^{n+1}, are moved forward in time using the model, as they should be, since there is no uncertainty in y^{n+1} given x^n, y^n. The data are used in the state estimation of y indirectly, through the weights and through the nonlinear coupling between the forced and unforced variables of the model. If one observes only the unforced variables, i.e. h(x, y) = h(y), then the data are not used directly when generating the forced variables X_j^{n+1}, because the second term in (40) is merely a constant. In this case, the implicit filter becomes equivalent to a standard SIR filter, with weights w_j^{n+1} = w_j^n exp(−φ_j).

This implementation of the implicit filter is numerically effective for filtering systems with partial noise, because the filter operates in a space of dimension p (the rank of the state covariance matrix), which typically is less than the state dimension (see the example in Section 4). The use of a gradient descent algorithm and random maps further makes the often costly computation of the Hessian of F_j unnecessary. If h is linear, no iterative minimization is required.

If the state covariance matrix is ill-conditioned, a direct implementation of Algorithm 1 is not possible. We propose to diagonalize the state covariance and set all eigenvalues below a certain threshold to zero, so that a model of the form (35)-(37) can be obtained. In our experience, such approximations are accurate and the filter of this section can be used.
3.3 Sparse observations in time

We extend the results of Section 3.2 to the more general case of observations that are sparse in time. Again, the key is to realize that y^{n+1} is fixed given x^n, y^n. For simplicity, we assume additive noise and a constant Q in (2), so that the target density becomes

    p(x^{0:q(l+1)}, y^{0:q(l+1)} | z^{1:l+1}) ∝ p(x^{0:q(l)}, y^{0:q(l)} | z^{1:l}) p(z^{l+1} | x^{q(l+1)}, y^{q(l+1)})
        × p(x^{q(l+1)} | x^{q(l+1)−1}, y^{q(l+1)−1}) p(x^{q(l+1)−1} | x^{q(l+1)−2}, y^{q(l+1)−2}) ⋯ p(x^{q(l)+1} | x^{q(l)}, y^{q(l)}).

Given a collection of M particles, X_j^n, Y_j^n, j = 1, 2, ..., M, whose empirical distribution approximates the target density p(x^{0:q(l)}, y^{0:q(l)} | z^{1:l}) at time t^{q(l)}, we define, for each particle, the function F_j by

    exp(−F_j(X_j)) = p(z^{l+1} | X_j^{q(l+1)}, Y_j^{q(l+1)}) p(X_j^{q(l+1)} | X_j^{q(l+1)−1}, Y_j^{q(l+1)−1}) ⋯ p(X_j^{q(l)+1} | X_j^{q(l)}, Y_j^{q(l)}),   (42)

where X_j is shorthand for X_j^{q(l)+1:q(l+1)}, so that

    F_j(X_j) = ½ (X_j^{q(l)+1} − f_j^{q(l)})^T Σ_x^{−1} (X_j^{q(l)+1} − f_j^{q(l)})
        + ½ (X_j^{q(l)+2} − f_j^{q(l)+1})^T Σ_x^{−1} (X_j^{q(l)+2} − f_j^{q(l)+1})
        + ⋯
        + ½ (X_j^{q(l+1)} − f_j^{q(l+1)−1})^T Σ_x^{−1} (X_j^{q(l+1)} − f_j^{q(l+1)−1})
        + ½ (h(X_j^{q(l+1)}, Y_j^{q(l+1)}) − z^{l+1})^T Σ_z^{−1} (h(X_j^{q(l+1)}, Y_j^{q(l+1)}) − z^{l+1}) + Z_j.   (43)

At each model step, the unforced variables of each particle depend on the forced and unforced variables of the particle at the previous time step, so that Y_j^{q(l+1)} is a function of X_j^{q(l)}, X_j^{q(l)+1}, ..., X_j^{q(l+1)−1}, and f_j^{q(l+1)} is a function of X_j^{q(l)+1}, X_j^{q(l)+2}, ..., X_j^{q(l+1)}. The function F_j thus depends on the forced variables only. However, the appearances of the unforced variables in F_j make it rather difficult to compute derivatives. The implicit filter with gradient descent minimization and random maps (see Algorithm 1) is thus a good filter for this problem, because it only requires computation of the first derivatives of F_j, while other implementations (see [7, 31]) require second derivatives as well.

The gradient of F_j is given by the rp-dimensional row vector

    ∇F_j = (∂F_j/∂X_j^{q(l)+1}, ∂F_j/∂X_j^{q(l)+2}, ..., ∂F_j/∂X_j^{q(l+1)}),   (44)

with

    (∂F_j/∂X_j^k)^T = Σ_x^{−1} (X_j^k − f_j^{k−1}) − (∂f/∂x |_k)^T Σ_x^{−1} (X_j^{k+1} − f_j^k)
        − Σ_{i=k+1}^{q(l+1)−1} (∂f/∂y |_i ∂y^i/∂X_j^k)^T Σ_x^{−1} (X_j^{i+1} − f_j^i)
        + (∂h/∂y |_{q(l+1)} ∂y^{q(l+1)}/∂X_j^k)^T Σ_z^{−1} (h(X_j^{q(l+1)}, Y_j^{q(l+1)}) − z^{l+1}),   (45)

for k = q(l)+1, ..., q(l+1)−1, where (·)|_k denotes "evaluate at time t^k." (The component for k = q(l+1) contains, in addition, the direct dependence through X_j^{q(l+1)} and is analogous to (18).) The derivatives ∂y^i/∂X_j^k, i = k+1, ..., q(l+1), can be computed recursively while constructing the sum, starting with

    ∂y^{k+1}/∂X_j^k = ∂(g(X_j^k, Y_j^k))/∂X_j^k = ∂g/∂x |_k,   (46)

and then using

    ∂y^i/∂X_j^k = ∂g/∂y |_{i−1} ∂y^{i−1}/∂X_j^k,  i = k+2, ..., q(l+1).   (47)
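The recursion (46)-(47) in code, for placeholder dynamics g and placeholder Jacobians: the array dy carries ∂y^i/∂X^k forward with the chain rule, exactly as it is needed for the terms in (45).

    import numpy as np

    p, q = 2, 3                              # dimensions of x (forced) and y (unforced)

    def g_fun(x, y):                          # placeholder for y^{i+1} = g(x^i, y^i)
        return 0.9 * y + 0.1 * np.sin(np.sum(x)) * np.ones(q)

    def dg_dx(x, y):                          # Jacobian of g w.r.t. x, shape (q, p)
        return 0.1 * np.cos(np.sum(x)) * np.ones((q, p))

    def dg_dy(x, y):                          # Jacobian of g w.r.t. y, shape (q, q)
        return 0.9 * np.eye(q)

    X = [np.ones(p), 0.5 * np.ones(p), -np.ones(p)]   # forced trajectory (placeholder)
    y = np.zeros(q)                           # y at the step where the recursion starts
    dy = dg_dx(X[0], y)                       # (46): dy^{k+1}/dX^k = dg/dx at step k
    y = g_fun(X[0], y)
    for i in range(1, len(X)):
        dy = dg_dy(X[i], y) @ dy              # (47): dy^{i+1}/dX^k = dg/dy * dy^i/dX^k
        y = g_fun(X[i], y)
    # dy now holds the derivative of the final unforced variables w.r.t. X^k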
The minimization of F_j for each particle is initialized with a free model run of r steps, with initial conditions given by the final position of the j-th particle at the previous assimilation step. With this initial guess, we compute the gradient using (44)-(47) and, after a line search and one step of gradient descent, obtain a new set of forced variables. We use this result to update the unforced variables by the model, and proceed to the next iteration. Once the minimum φ_j and its location μ_j are found, we use the random map (13) with L_j = I to compute X_j^{q(l)+1}, ..., X_j^{q(l+1)} for this particle, and then use these forced variables to compute Y_j^{q(l)+1:q(l+1)}. We do this for all particles and compute the weights from (14) with m = p, then normalize the weights so that their sum equals one, and thereby obtain an approximation of the target density. We resample if the effective sample size M_Eff is below a threshold and move on to assimilate the next observation. The implicit filtering algorithm is summarized in pseudo-code in Algorithm 2.
Algorithm 2
Implicit Particle Filter with Random Maps and Gradient Descent Minimization for Models with Partial Noise

{Initialization, t = 0}
for j = 1, ..., M do
  • sample X_j ∼ p_0(X)
end for
{Assimilate observation z^l}
for j = 1, ..., M do
  • set up and minimize F_j using gradient descent:
      initialize the minimization with a free model run
      while convergence criteria are not satisfied do
        compute the gradient by (44)
        do a line search
        compute the next iterate by a gradient descent step
        use the result to update the unforced variables using the model
        check if the convergence criteria are satisfied
      end while
  • sample the reference density, ξ_j ∼ N(0, I)
  • compute ρ_j = ξ_j^T ξ_j and η_j = ξ_j / √ρ_j
  • solve (11) using the random map (13) with L_j = I to compute X_j
  • use this X_j and the model to compute the corresponding Y_j
  • compute the weight of the particle using (14)
  • save the particle (X_j, Y_j) and its weight w_j
end for
• normalize the weights so that their sum equals 1
• compute the state estimate from the X_j weighted by w_j (e.g. the mean)
• resample if M_Eff < c
• assimilate z^{l+1}

Note that all state variables are computed by using both the data and the model, regardless of which set of variables (the forced or the unforced ones) is observed. The reason is that sparse observations induce a nonlinear coupling, through f and g in (35)-(37), between the unforced and forced variables at the various model steps. It should also be noted that the function F_j is a function of rp variables (rather than rm), because the filter operates in the subspace of the forced variables. If the minimization is computationally too expensive, because p or r is extremely large, then one can easily adapt the "simplified" implicit particle filter of Section 2.2 to the situation of partial noise, using the methods we have described above. If h is nonlinear, this simplified filter requires a minimization of a p-dimensional function for each particle; if h is linear, no numerical minimization is required.

3.4 Comparison with other data assimilation methods

We wish to point out similarities and differences between the implicit filter and three other data assimilation methods. In particular, we discuss how data are used in the generation of the state estimates. It is clear that the implicit filter uses the available data as well as the model to generate the state trajectories for each particle, i.e. it makes use of the nonlinear coupling between forced and unforced variables.

SIR and EnKF make less direct use of the data. In SIR, the particle trajectories are generated using the model alone and only later weighted by the observations. Data thus propagate to the SIR state estimates indirectly, through the weights. In EnKF, the state trajectories are generated using the model, and only the states at times t^{q(l)} (when data are available) are updated by data. Thus, EnKF uses the data only to update its state estimates at times for which data are actually available.

A weak constraint 4D-Var method is perhaps closest in spirit to the implicit filter. In weak constraint 4D-Var, a cost function similar to F_j is minimized (typically by gradient descent) to find the state trajectory with maximum probability given data and model. If one picks the time window for a 4D-Var assimilation from one observation z^l to the next, z^{l+1}, then the use of the data is similar to its use in an implicit filter, because, in both algorithms, the model as well as the data are used to generate the state trajectories.
In fact, one can view the implicit particle filter as a randomized and sequential version of weak constraint 4D-Var (or, one may interpret weak constraint 4D-Var as an implicit smoother with a single particle). These issues will be taken up in more detail in a subsequent paper [2].

Finally, we would like to discuss the implicit filtering algorithm for the special case of a perfect model, i.e.

    y^{n+1} = g(y^n, t^n),   (48)
    z^l = h(y^{q(l)}) + Q^l V^l.   (49)

Following the steps above and assuming we are given a collection of M particles, Y_j^n, j = 1, 2, ..., M, whose empirical distribution approximates the target density p(y^{0:q(l)} | z^{1:l}) at time t^{q(l)}, we define, for each particle, the function F_j by

    exp(−F_j) = p(z^{l+1} | Y_j^{q(l+1)}),   (50)

so that

    F_j = ½ (h(Y_j^{q(l+1)}) − z^{l+1})^T (Σ_{z,j}^{q(l+1)})^{−1} (h(Y_j^{q(l+1)}) − z^{l+1}) + Z_j.   (51)

Since Y_j^{q(l)} is fixed for each particle, F_j is merely a constant that is used to weigh the particle trajectory by the weight

    w_j^{l+1} = w_j^l exp(−F_j).   (52)

The data are used indirectly here, because the initial condition determines the full state trajectory; however, this initial condition is fixed for each particle. For a perfect model, strong constraint 4D-Var makes more efficient use of the available data, by using it to find an "optimal initial condition" compatible with the data.

4 Application to geomagnetic data assimilation

Data assimilation has recently been applied to geomagnetic problems, and there is a need to find out which data assimilation technique is most suitable [18]. Thus far, a strong constraint 4D-Var approach [17] and a Kalman filter approach [35] have been considered. Here, we apply the implicit particle filter to a test problem very similar to the one first introduced by Fournier and his colleagues in [17]. The model is given by the two SPDEs

    ∂_t u + u ∂_x u = b ∂_x b + ν ∂_{xx} u + g_u ∂_t W_u(x, t),   (53)
    ∂_t b + u ∂_x b = b ∂_x u + ∂_{xx} b + g_b ∂_t W_b(x, t),   (54)

where g_u, g_b are scalars and W_u, W_b are stochastic processes (the derivatives here are formal and may not exist in the usual sense). We study the above equations with ν = 10^{−3}, as in [17], and with g_u = 0.1 and g_b = 1, so that the uncertainty in the observed quantity is much larger than the uncertainty in the unobserved quantity. We consider the above equations on the strip 0 ≤ t ≤ T, −1 ≤ x ≤ 1, with boundary and initial conditions
    u(x, t) = 0 if x = ±1,  u(x, 0) = sin(πx) + (2/5) sin(5πx),   (55)
    b(x, t) = ±1 if x = ±1,  b(x, 0) = cos(πx) + 2 sin(π(x + 1)/4).   (56)
Physically, u represents the velocity field and b represents the secular variation of the magnetic field. The model is essentially the model proposed in [17], but with additive noise of the form

    W(x, t) = Σ_{k=1}^∞ α_k sin(kπx) w_k^1(t) + β_k cos(kπx/2) w_k^2(t),   (57)

where the w_k^1, w_k^2 are independent BMs and where

    α_k = β_k = 1/2 if k ≤ 5,  α_k = β_k = 0 if k > 5.   (58)

Here, we are content with this simple noise model, which represents a small uncertainty at the boundaries of both fields and is spatially smooth. However, it is straightforward to incorporate more information about the spatial distribution of the uncertainty by picking suitable coefficients α_k, β_k. An illustration of the noise process is given in Figure 1.

[Figure 1: The noise process W(x, t). Left: the noise plotted as a function of x and t. Right: a snapshot of W(x, t) at a fixed time.]

4.1 Discretization of the dynamical equations

We follow [17] in the discretization of the dynamical equations; however, we present some details of the discretization to explain how the noise process W comes into play. For both fields, we use Legendre spectral elements of order N (see e.g. [6, 13]), so that

    u(x, t) = Σ_{j=0}^N û_j(t) ψ_j(x) = Σ_{j=1}^{N−1} û_j(t) ψ_j(x),
    b(x, t) = Σ_{j=0}^N b̂_j(t) ψ_j(x) = −ψ_0(x) + ψ_N(x) + Σ_{j=1}^{N−1} b̂_j(t) ψ_j(x),
    W(x, t) = Σ_{j=0}^N Ŵ_j(t) ψ_j(x) = Σ_{j=1}^{N−1} Ŵ_j(t) ψ_j(x),

where the ψ_j are the characteristic Lagrange polynomials of order N, centered at the j-th Gauss-Lobatto-Legendre (GLL) node ξ_j. We substitute the expansions into the weak form of (53) and (54) (no integration by parts) and evaluate the integrals by Gauss-Lobatto-Legendre quadrature,

    ∫_{−1}^1 p(x) dx ≈ Σ_{j=0}^N p(ξ_j) w_j,

where the w_j are the corresponding quadrature weights. Making use of the orthogonality of the basis functions, ψ_j(ξ_k) = δ_{j,k}, we obtain the set of SDEs

    M ∂_t û = M (b̂ ∘ D b̂ − û ∘ D û + ν D₂ û + Ψ_Bx b̂ + g_u ∂_t Ŵ_u),
    M ∂_t b̂ = M (b̂ ∘ D û − û ∘ D b̂ + D₂ b̂ − Ψ_Bx û + Ψ_Bxx + g_b ∂_t Ŵ_b),

where ∘ denotes the Hadamard product ((û ∘ b̂)_k = û_k b̂_k), û, b̂, Ŵ_u and Ŵ_b are the m = (N − 1)-dimensional coefficient vectors of u, b, W_u and W_b, respectively, and Ψ_Bx and Ψ_Bxx are a diagonal m × m matrix and an m-dimensional column vector, respectively, which account for the boundary terms −ψ_0 + ψ_N in the expansion of b and make sure that our approximation satisfies the boundary conditions. In the above equations, the m × m matrices M, D and D₂ are given by

    M = diag((w_1, ..., w_{N−1})),  D_{j,k} = ∂_x ψ_j(ξ_k),  (D₂)_{j,k} = ∂_{xx} ψ_j(ξ_k).
We apply a first-order implicit-explicit method with time step δ for the time discretization and obtain the discrete-time, discrete-space equations

    (M − δν M D₂) û^{n+1} = M (û^n + δ (b̂^n ∘ D b̂^n − û^n ∘ D û^n + Ψ_Bx b̂^n)) + ΔW_u^n,
    (M − δ M D₂) b̂^{n+1} = M (b̂^n + δ (b̂^n ∘ D û^n − û^n ∘ D b̂^n − Ψ_Bx û^n + Ψ_Bxx)) + ΔW_b^n,

where

    ΔW_u ∼ N(0, Σ_u),  ΔW_b ∼ N(0, Σ_b),   (59)

and

    Σ_u = g_u² δ M (F_s C C^T F_s^T + F_c C C^T F_c^T) M^T,   (60)
    Σ_b = g_b² δ M (F_s C C^T F_s^T + F_c C C^T F_c^T) M^T,   (61)
    C = diag((α_1, ..., α_m)),   (62)
    (F_s)_{j,k} = sin(kπ ξ_j),   (63)
    (F_c)_{j,k} = cos(kπ ξ_j / 2),   (64)

i.e. the columns of F_s and F_c are the sine and cosine modes of the noise (57), evaluated at the interior GLL nodes ξ_1, ..., ξ_m (recall that α_k = β_k, so that one matrix C serves both families of modes). For our choice of the α_k, β_k in (58), the state covariance matrices Σ_u and Σ_b are singular if m > 10.

To diagonalize the state covariances, we solve the symmetric eigenvalue problems [33]

    (M − δν M D₂) v_u = Σ_u v_u λ_u,  (M − δ M D₂) v_b = Σ_b v_b λ_b,

and define the linear coordinate transformations

    u = V_u (x_u, y_u)^T,  b = V_b (x_b, y_b)^T,   (65)

where the columns of the m × m matrices V_u and V_b are the eigenvectors v_u and v_b, respectively. The discretization using Legendre spectral elements works in our favor here, because the matrices M and D₂ are symmetric, so that we can diagonalize the left-hand side simultaneously with the state covariance matrix, to obtain

    x_u^{n+1} = f_u(x_u^n, y_u^n, x_b^n, y_b^n) + ΔŴ_u^n,
    y_u^{n+1} = g_u(x_u^n, y_u^n, x_b^n, y_b^n),
    x_b^{n+1} = f_b(x_u^n, y_u^n, x_b^n, y_b^n) + ΔŴ_b^n,
    y_b^{n+1} = g_b(x_u^n, y_u^n, x_b^n, y_b^n),

where f_u, f_b are 10-dimensional vector functions, g_u, g_b are (m − 10)-dimensional vector functions (not to be confused with the scalar noise amplitudes in (53)-(54)), and

    ΔŴ_u^n ∼ N(0, diag((λ_u^1, ..., λ_u^{10}))),  ΔŴ_b^n ∼ N(0, diag((λ_b^1, ..., λ_b^{10}))).
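The simultaneous diagonalization amounts to a generalized symmetric eigenvalue problem, which standard libraries solve directly. A sketch with toy matrices: we solve Σ v = λ A v, with A standing in for the symmetric positive definite left-hand-side matrix; this is the eigenproblem above with reciprocal eigenvalues, and it avoids factorizing the singular Σ. The zero eigenvalues mark the unforced directions.

    import numpy as np
    from scipy.linalg import eigh

    rng = np.random.default_rng(5)
    m = 8
    A = np.eye(m) + 0.01 * np.diag(np.arange(m, dtype=float))  # SPD stand-in for M - d*nu*M*D2
    B = rng.standard_normal((m, 3))
    Sigma = B @ B.T                           # rank-3 noise covariance (toy)

    lam, V = eigh(Sigma, A)                   # solves Sigma v = lam * A v
    forced = lam > 1e-10 * lam.max()          # nonzero lam: directions excited by the noise
    print(int(forced.sum()))                  # prints 3 = rank of Sigma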
We test the convergence of our approximation as follows. To assess the convergence in the number of grid points in space, we define a reference solution using N = 2000 grid points and a small time step δ, and compute approximations with up to N = 500 grid points. We compute the error at t = T,

    e_x = ||(u(x, T)^T, b(x, T)^T) − (u_Ref(x, T)^T, b_Ref(x, T)^T)||,

where || · || denotes the Euclidean norm, and store it. We repeat this procedure 500 times and compute the mean of the error norms. The results are shown in the left panel of Figure 2. We observe superalgebraic convergence, as expected from a spectral method.

Similarly, we check the convergence of the approximation in the time step by computing a reference solution with N_Ref = 1000 grid points and a very small reference time step δ_Ref. Using the same BM as in the reference solution, we compute an approximation with time step δ and compute the error at t = T,

    e_t = ||(u_δ(x, T)^T, b_δ(x, T)^T) − (u_Ref(x, T)^T, b_Ref(x, T)^T)||,

and store it. We repeat this procedure 500 times and then compute the mean of these error norms. The results are shown in the right panel of Figure 2. We observe a first-order decay of the error, as expected. The scheme has converged for sufficiently small time steps.

[Figure 2: Convergence of the discretization scheme for the geomagnetic equations. Left: convergence in the number of spatial grid points. Right: convergence in the time step.]
Here, we are satisfied with an approximation with δ = 0.002 and N = 300 grid points in space, as in [17]. The relatively small number of spatial grid points is sufficient because the noise is very smooth in space and because the Legendre spectral elements accumulate nodes close to the boundaries and, thus, represent the steep boundary layer characteristic of (53)-(54) well, even if N is small (see also [17]).

4.2 Data assimilation experiments

We apply the implicit particle filter with gradient descent minimization and random maps (see Algorithm 2 in Section 3), the simplified implicit particle filter (see Section 2.2) adapted to models with partial noise, a standard EnKF (without localization or inflation), as well as a standard SIR filter, to the test problem (53)-(54). The numerical model is given by the discretization described in the previous section, with a random initial state.
The distribution of the initial state is Gaussian, with mean u(x, 0), b(x, 0) as in (55)-(56) and with covariances Σ_u, Σ_b given by (60)-(61). In Figure 3, we illustrate the uncertainty in the initial state and plot 10 realizations of the initial state (grey lines) along with its mean (black lines). We observe that the uncertainty in u is small compared to the uncertainty in b.
[Figure 3: Uncertainty in the initial state. Left: u(x, 0) (unobserved). Right: b(x, 0) (observed). Black: mean. Grey: 10 realizations of the initial state.]

The data are the values of the magnetic field b, measured at k equally spaced locations in [0,
1] and corrupted by noise:

    z^l = H b^{q(l)} + s V^l,   (66)
where s = 0.001 and where H is a k × m matrix that maps the numerical approximation of b (defined at the GLL nodes) to the locations where the data are collected. We consider data that are dense in time (r = 1) as well as sparse in time (r > 1). We also vary the spatial resolution of the data and consider two cases: (i) we collect the magnetic field at 200 equally spaced locations; and (ii) we collect b at 20 equally spaced locations. The variables u are unobserved, and it is of interest to study how the various data assimilation techniques make use of the information in b to update the unobserved variables u [17, 18].

To assess the performance of the filters, we ran 100 twin experiments. A twin experiment amounts to: (i) drawing a sample from the initial state and running the model forward in time until the final time T; (ii) collecting the data from this free model run; and (iii) using the data as the input to a filter and reconstructing the state trajectory. Figure 4 shows the result of one twin experiment for r = 4.

[Figure 4: Outcome of a twin experiment. Black: true state u(x, T) (left) and b(x, T) (right). Grey: reconstruction by the implicit particle filter with 4 particles.]

For each twin experiment, we calculate and store the relative error at the final time in the velocity, e_u = ||u(x, T) − u_Filter(x, T)|| / ||u(x, T)||, and in the magnetic field, e_b = ||b(x, T) − b_Filter(x, T)|| / ||b(x, T)||. After running the 100 twin experiments, we calculate the mean of the error norms (not the mean error) and the variance of the error norms (not the variance of the error). All filters we tested were "untuned," i.e. we have not adjusted or inserted any free parameters to boost the performance of the filters.

Figure 5 shows the results for the implicit particle filter, the EnKF, as well as the SIR filter, for 200 measurement locations and for r = 10.

[Figure 5: Filtering results (SIR, EnKF, implicit filter) for data collected at a high spatial resolution (200 measurement locations) and every r = 10 model steps. The errors at the final time are shown as functions of the number of particles.]

It is evident from this figure that the implicit particle filter requires only very few particles to yield accurate state estimates, with less than 1% error in the observed variables. The SIR filter with 1000 particles gives significantly larger errors (about 10% in the observed variables) and much larger variances of the errors. The EnKF requires about 500 particles to come close to the accuracy of the implicit filter with only 10 particles.

In the experiments, we observed that the minimization in implicit particle filtering typically converged after 4-10 steps (depending on r, the gap in time between observations). The convergence criterion was to stop the iteration when the change in F_j was less than 10%. A more accurate minimization did not improve the results significantly, so that we were satisfied with a relatively crude estimate of the minimum in exchange for a speed-up of the algorithm. We found λ by solving (11) with Newton's method using λ = 0 as the initial guess and observed that it converged after about eight steps. The convergence criterion was to stop the iteration when |F(λ) − φ − ρ/2| fell below a small tolerance, because the accurate solution of this scalar equation is numerically inexpensive. We resampled using Algorithm 2 in [1] if the effective sample size M_Eff in (19) was less than 90% of the number of particles.
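A twin experiment in skeletal form; the model, the observation pattern and all dimensions are placeholders for the discretization and the filters compared in this section.

    import numpy as np

    rng = np.random.default_rng(6)
    m, k, nsteps, r, s = 20, 5, 100, 10, 0.001
    idx = np.linspace(0, m - 1, k).astype(int)     # k roughly equally spaced locations

    def model_step(x):                       # placeholder for one step of the discretized model
        return 0.99 * x + 0.01 * np.tanh(x) + 0.01 * rng.standard_normal(m)

    # (i) free run of the model from a random initial state, (ii) data collection as in (66)
    x_true = rng.standard_normal(m)
    truth, data = [x_true], {}
    for n in range(1, nsteps + 1):
        x_true = model_step(x_true)
        truth.append(x_true)
        if n % r == 0:
            data[n] = x_true[idx] + s * rng.standard_normal(k)

    # (iii) a filter would now reconstruct the trajectory from `data`;
    # its output is scored by the relative error at the final time:
    x_filter = np.zeros(m)                   # placeholder for the filter's estimate
    e = np.linalg.norm(truth[-1] - x_filter) / np.linalg.norm(truth[-1])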
To further investigate the performance of the filters, we ran more numerical experiments, varying the availability of the data in time as well as the number of particles; the results for data collected at 200 locations are shown in Figure 6.

[Figure 6: Filtering results for data collected at a high spatial resolution (200 measurement locations). The errors at the final time are shown as functions of the number of particles for r = 1, 2, 4 and 10 model steps between observations.]

For the implicit particle filter, the error in the unobserved quantity u depends strongly on the gap between observations and, for a large gap, is about 12%.

The reconstructions of the observed variables by the simplified implicit particle filter are rather insensitive to the availability of data in time and, with 20 particles, the simplified filter gives an error in b of less than 1%. The errors in the unobserved quantity depend strongly on the gap between the observations and can be as large as 15%. Here, we need more particles and observe a larger error as well as a larger sensitivity of the errors to the availability of the data, because the simplified filter makes less direct use of the data than the "full" implicit filter: it generates the state trajectories using the model, and only the final position of each particle is updated by the data. Thus, the error increases as the gap in time between the observations becomes larger. Again, the error statistics have converged, and only minor improvements can be expected if the number of particles is increased beyond 20.

The SIR filter also makes less efficient use of the data, so that we require significantly more particles and observe larger errors, as well as a stronger dependence of the errors on the availability of data in time, for both the observed and unobserved quantities. With 1000 particles and for a large gap (r = 10), the SIR filter gives mean errors of 10% for the observed quantity and 22% for the unobserved quantity. An increase in the number of particles did not decrease these errors. The EnKF performs well and, for about 500 particles, gives results that are comparable to those of the implicit particle filter. The reason for the large number of particles is, again, the indirect use of the data in the EnKF.

The errors in the reconstructions of the various filters are not Gaussian, so that an assessment of the errors based on the first two moments is incomplete. In the two left panels of Figure 7, we show histograms of the errors of the implicit filter (10 particles), the simplified implicit filter (20 particles), the EnKF (500 particles) and the SIR filter (1000 particles), for r = 10 model steps between observations.
[Figure 7: Histograms of the errors at the final time, for the implicit filter (10 particles), the simplified implicit filter (20 particles), the EnKF (500 particles) and the SIR filter (1000 particles). Left: data available at a high spatial resolution (200 measurement locations) and every r = 10 model steps. Right: data available at a low spatial resolution (20 measurement locations) and every r = 10 model steps.]

We observe that the errors of the implicit filter, the simplified implicit filter and the EnKF are concentrated at small errors (around 10% in the unobserved quantity u and about 1% in the observed quantity b) and show a considerably smaller spread than the errors of the SIR filter, which are centered at much larger values (20% in the unobserved quantity u and about 9% in the observed quantity b). A closer look at the distribution of the errors thus confirms the conclusions we drew from an analysis based on the first two moments.

We decrease the spatial resolution of the data to 20 measurement locations and show filtering results from 100 twin experiments in Figure 8.

[Figure 8: Filtering results for data collected at a low spatial resolution (20 measurement locations). The errors at the final time are shown as functions of the number of particles for r = 1, 2, 4 and 10 model steps between observations.]

The results are qualitatively similar to those obtained at a high spatial resolution of 200 data points per observation. We observe for the implicit particle filter that the errors in the unobserved quantity are insensitive to the spatial resolution of the data, while the errors in the observed quantity are determined by the spatial resolution of the data and are rather insensitive to the temporal resolution of the data. These observations are in line with those reported in connection with a strong constraint 4D-Var algorithm in [17]. All other filters we tried show a dependence of the errors in the observed quantity on the temporal resolution of the data. Again, the reason for the good performance of the implicit particle filter is its direct use of the data. The two right panels of Figure 7 show histograms of the errors of the implicit filter (10 particles), the simplified implicit filter (20 particles), the EnKF (500 particles) and the SIR filter (1000 particles) for r = 10 model steps between observations. The results are qualitatively similar to the results we obtained at a higher spatial resolution of the data, and the closer look at the distributions of the errors confirms the conclusions we drew from an analysis of the first two moments.

In summary, we observe that the implicit particle filter yields the lowest errors with a small number of particles for all examples we considered, and performs well and reliably in this application. The SIR and simplified implicit particle filters could not reach the accuracy of the implicit particle filter, even when the number of particles was very large. The EnKF requires about 500 particles to come close to the accuracy of the implicit particle filter with only 4 particles. Although the implicit filter uses the computationally most expensive particles, the small number of particles required for a very high accuracy makes the implicit filter the most efficient filter for this problem. The partial noise works in our favor here, because the dimension of the space the implicit filter operates in is 20, rather than the state dimension 600.

Finally, we wish to compare our results with those in [17], where a strong constraint 4D-Var algorithm was applied to the deterministic version of the test problem. Fournier and his colleagues used "perfect data," i.e.
Finally, we wish to compare our results with those in [17], where a strong constraint 4D-Var algorithm was applied to the deterministic version of the test problem. Fournier used "perfect data," i.e. the observations were not perturbed by noise.
With r = 5 model steps between observations, an error of about 1.2% in u and 4.7% in b was achieved. With the implicit filter, we can get to a similar accuracy at the same spatial resolution of the data, but with a larger gap of r = 10 model steps between observations. Moreover, the data assimilation problem we solve here is somewhat harder than the strong constraint 4D-Var problem because we allow for model errors. The implicit particle filter also reduces the memory requirements because it operates in the 20-dimensional subspace of the forced variables. Each minimization is thus not as costly as a 600-dimensional strong constraint 4D-Var minimization.

Conclusions

We considered implicit particle filters for data assimilation. Previous implementations of the implicit particle filter rely on finding the Hessians of functions F_j of the state variables. Finding these Hessians can be expensive if the state dimension is large and can be cumbersome if the second derivatives of the F_j's are hard to calculate. We presented a new implementation of the implicit filter that combines gradient descent minimization with random maps. This new implementation avoids the often costly calculation of the Hessians and thus reduces the memory requirements compared to earlier implementations of the filter.

We have considered models for which the state covariance matrix is singular or ill-conditioned. This happens often, for example, in geophysical applications in which the noise is smooth in space, or if the model includes conservation laws with zero uncertainty. Previous implementations of the implicit filter are not applicable here, and we have shown how to use our new implementation in this situation. The implicit filter is found to be more efficient than competing methods because it operates in a space whose dimension is given by the rank of the state covariance matrix rather than the model dimension.

We applied the implicit filter in its new implementation to a test problem in geomagnetic data assimilation. The implicit filter performed well in comparison to other data assimilation methods (SIR, EnKF and 4D-Var) and gave accurate state estimates with a small number of particles and at a low computational cost. We have studied how the various data assimilation techniques use the available data to propagate information from observed to unobserved quantities and found that the implicit particle filter uses the data in a direct way, propagating information to unobserved quantities faster than competing methods. This direct use of the data is the reason for the very small errors in the reconstructions of the state.
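As an illustration of the sampling step just summarized, the following minimal sketch (our own simplified rendering, not the authors' code) draws one implicit-filter sample in the p-dimensional noise subspace, where p is the rank of the state covariance: F is minimized with a gradient-based quasi-Newton method (no Hessians), a reference Gaussian variable is drawn, and the random-map equation F(mu + lambda*eta) = phi + rho is solved for the scalar lambda along the random direction eta. We assume a generic, user-supplied F and gradient, take the scaling matrix in the map to be the identity, and omit the sampling weights (Jacobian factors) for brevity.

    import numpy as np
    from scipy.optimize import minimize, brentq

    def implicit_sample(F, gradF, p, rng):
        # Draw one sample by implicit sampling with a random map in the
        # p-dimensional noise subspace; weights are omitted in this sketch.
        res = minimize(F, np.zeros(p), jac=gradF, method="BFGS")  # no Hessians
        mu, phi = res.x, res.fun           # minimizer and minimum of F

        xi = rng.standard_normal(p)        # reference Gaussian sample
        rho = 0.5 * xi @ xi
        eta = xi / np.sqrt(xi @ xi)        # random direction

        # Solve F(mu + lam * eta) - phi = rho for lam >= 0 along the ray.
        g = lambda lam: F(mu + lam * eta) - phi - rho
        hi = 1.0
        while g(hi) < 0.0:                 # bracket the root; F grows away from mu
            hi *= 2.0
        lam = brentq(g, 0.0, hi)
        return mu + lam * eta

    # Sanity check with a quadratic F = 0.5*||x - m||^2: the construction
    # then returns mu + xi, i.e. exact samples from N(m, I).
    rng = np.random.default_rng(1)
    m = np.array([1.0, -2.0, 0.5])
    x = implicit_sample(lambda x: 0.5 * np.sum((x - m) ** 2),
                        lambda x: x - m, p=3, rng=rng)
    print(x)

In the quadratic test case the scalar equation reduces to 0.5*lambda^2 = rho, so lambda = ||xi|| and the sample is exactly Gaussian, which is the usual consistency check for random-map implicit sampling.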
Acknowledgments

We would like to thank our collaborators Dr. Ethan Atkins at UC Berkeley, Professors Robert Miller and Yvette Spitz, and Dr. Brad Weir at Oregon State University for their comments and helpful discussions. We thank Mr. Robert Saye for carefully proofreading early versions of this manuscript. This work was supported in part by the Director, Office of Science, Computational and Technology Research, U.S. Department of Energy, under Contract No. DE-AC02-05CH11231.
References

[1] M. S. Arulampalam, S. Maskell, N. Gordon, and T. Clapp. A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Transactions on Signal Processing, 50(2):174–188, 2002.

[2] E. Atkins, M. Morzfeld, and A. J. Chorin. Implicit sampling as a randomized maximum likelihood estimator. In preparation, 2011.

[3] A. F. Bennett, L. M. Leslie, C. R. Hagelberg, and P. E. Powers. Tropical cyclone prediction using a barotropic model initialized by a generalized inverse method. Monthly Weather Review, 121:1714–1728, 1993.

[4] P. Bickel, B. Li, and T. Bengtsson. Sharp failure rates for the bootstrap particle filter in high dimensions. IMS Collections: Pushing the Limits of Contemporary Statistics: Contributions in Honor of Jayanta K. Ghosh, 3:318–329, 2008.

[5] M. Bocquet, C. A. Pires, and L. Wu. Beyond Gaussian statistical modeling in geophysical data assimilation. Monthly Weather Review, 138:2997–3023, 2010.

[6] C. Canuto, M. Y. Hussaini, A. Quarteroni, and T. A. Zang. Spectral Methods: Fundamentals in Single Domains. Springer, Berlin, Germany, 2006.

[7] A. J. Chorin, M. Morzfeld, and X. Tu. Implicit particle filters for data assimilation. Communications in Applied Mathematics and Computational Science, 5(2):221–240, 2010.

[8] A. J. Chorin and X. Tu. Implicit sampling for particle filters. Proceedings of the National Academy of Sciences, 106(41):17249–17254, 2009.

[9] A. J. Chorin and O. H. Hald. Stochastic Tools in Mathematics and Science. Springer, second edition, 2009.

[10] I. Chueshov. Gevrey regularity of random attractors for stochastic reaction-diffusion equations. Random Operators and Stochastic Equations, 8:143–162, 2000.

[11] P. Courtier. Dual formulation of four-dimensional variational assimilation. Quarterly Journal of the Royal Meteorological Society, 123:2449–2461, 1997.

[12] P. Courtier, J.-N. Thepaut, and A. Hollingsworth. A strategy for operational implementation of 4D-Var, using an incremental approach. Quarterly Journal of the Royal Meteorological Society, 120:1367–1387, 1994.

[13] M. O. Deville, P. F. Fischer, and E. H. Mund. High-Order Methods for Incompressible Fluid Flow. Cambridge University Press, Cambridge, UK, 2006.

[14] A. Doucet, N. de Freitas, and N. Gordon. Sequential Monte Carlo Methods in Practice. Springer, 2001.

[15] A. Doucet, S. Godsill, and C. Andrieu. On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing, 10:197–208, 2000.

[16] R. Fletcher. Practical Methods of Optimization. Wiley, second edition, 1987.

[17] A. Fournier, C. Eymin, and T. Alboussiere. A case for variational geomagnetic data assimilation: insights from a one-dimensional, nonlinear, and sparsely observed MHD system. Nonlinear Processes in Geophysics, 14:163–180, 2007.

[18] A. Fournier, G. Hulot, D. Jault, W. Kuang, A. Tangborn, N. Gillet, E. Canet, J. Aubert, and F. Lhuillier. An introduction to data assimilation and predictability in geomagnetism. Space Science Reviews, 155(1-4):247–291, 2010.

[19] N. J. Gordon, D. J. Salmond, and A. F. M. Smith. Novel approach to nonlinear/non-Gaussian Bayesian state estimation. IEE Proceedings F on Radar and Signal Processing, 140:107–113, 1993.

[20] K. Ide, P. Courtier, M. Ghil, and A. C. Lorenc. Unified notation for data assimilation: operational, sequential and variational. Journal of the Meteorological Society of Japan, 75:181–189, 1997.

[21] A. Jentzen and P. E. Kloeden. Overcoming the order barrier in the numerical approximation of stochastic partial differential equations with additive space-time noise. Proceedings of the Royal Society A, 465:649–667, 2009.

[22] S. J. Julier and J. K. Uhlmann. A new extension of the Kalman filter to nonlinear systems. In Proceedings of the International Symposium on Aerospace/Defense Sensing, Simulation and Controls, volume 3, 1997.

[23] R. E. Kalman and R. S. Bucy. New results in linear filtering and prediction theory. Transactions of the ASME–Journal of Basic Engineering, 83(Series D):95–108, 1961.

[24] P. E. Kloeden and E. Platen. Numerical Solution of Stochastic Differential Equations. Springer, 1999.

[25] A. L. Kurapov, G. D. Egbert, J. S. Allen, and R. N. Miller. Representer-based variational data assimilation in a nonlinear model of nearshore circulation. Journal of Geophysical Research, 112:C11019, 2007.

[26] G. J. Lord and J. Rougemont. A numerical scheme for stochastic PDEs with Gevrey regularity. IMA Journal of Numerical Analysis, 24:587–604, 2004.

[27] R. N. Miller, E. F. Carter, Jr., and S. T. Blue. Data assimilation into nonlinear stochastic models. Tellus, 51:167–194, 1999.

[28] R. N. Miller, M. Ghil, and F. Gauthiez. Advanced data assimilation in strongly nonlinear dynamical systems. Journal of the Atmospheric Sciences, 51:1037–1056, 1994.

[29] P. Del Moral. Measure-valued processes and interacting particle systems. Application to nonlinear filtering problems. Annals of Applied Probability, 8:438–495, 1998.

[30] P. Del Moral. Feynman-Kac Formulae. Springer, New York, 2004.

[31] M. Morzfeld, X. Tu, E. Atkins, and A. J. Chorin. A random map implementation of implicit filters. Journal of Computational Physics, 2011.

[32] J. Nocedal and S. J. Wright. Numerical Optimization. Springer, second edition, 2006.

[33] B. N. Parlett. The Symmetric Eigenvalue Problem. Classics in Applied Mathematics, Vol. 20. Society for Industrial and Applied Mathematics, Philadelphia, 1998.

[34] C. Snyder, T. Bengtsson, P. Bickel, and J. Anderson. Obstacles to high-dimensional particle filtering. Monthly Weather Review, 136:4629–4640, 2008.

[35] Z. Sun, A. Tangborn, and W. Kuang. Data assimilation in a sparsely observed one-dimensional modeled MHD system. Nonlinear Processes in Geophysics, 14:181–192, 2007.

[36] O. Talagrand. Assimilation of observations, an introduction. Journal of the Meteorological Society of Japan, 75(1):191–209, 1997.

[37] O. Talagrand and P. Courtier. Variational assimilation of meteorological observations with the adjoint vorticity equation. I: Theory. Quarterly Journal of the Royal Meteorological Society, 113:1311–1328, 1987.

[38] Y. Tremolet. Accounting for an imperfect model in 4D-Var. Quarterly Journal of the Royal Meteorological Society, 132(621):2483–2504, 2006.

[39] P. J. van Leeuwen. Particle filtering in geophysical systems. Monthly Weather Review, 137:4089–4114, 2009.

[40] P. J. van Leeuwen. Nonlinear data assimilation in geosciences: an extremely efficient particle filter. Quarterly Journal of the Royal Meteorological Society, 136:1991–1999, 2010.

[41] J. Weare. Particle filtering with path sampling and an application to a bimodal ocean current model. Journal of Computational Physics, 228:4312–4331, 2009.

[42] D. Zupanski. A general weak constraint applicable to operational 4DVAR data assimilation systems. Monthly Weather Review, 125:2274–2292, 1997.