Ensemble Riemannian Data Assimilation over the Wasserstein Space
Sagar K. Tamang, Ardeshir Ebtehaj, Peter J. Van Leeuwen, Dongmian Zou, Gilad Lerman
Saint Anthony Falls Laboratory, University of Minnesota-Twin Cities; Department of Civil, Environmental and Geo-Engineering, University of Minnesota-Twin Cities; Department of Atmospheric Science, Colorado State University; Duke Kunshan University; School of Mathematics, University of Minnesota-Twin Cities
Abstract
In this paper, we present a new ensemble data assimilation paradigm over a Riemannian manifold equipped with the Wasserstein metric. Unlike Eulerian penalization of error in the Euclidean space, the Wasserstein metric can capture the translation and shape differences between square-integrable probability distributions of the background state and observations, enabling formal penalization of geophysical biases in a non-Gaussian state-space. The new approach is applied to dissipative and chaotic evolutionary dynamics with a wide range of applications in Earth system models. Its advantages over classic variational and particle filter techniques are documented under systematic errors and non-Gaussian state-spaces.

Introduction
Extending the forecast skill of Earth System Models (ESMs) relies on advancing the science of data assimilation (DA). A large body of current DA methodologies, either filtering (Evensen, 1994, 2003; Kalman, 1960; Reichle et al., 2002a,b) or variational approaches (Andersson et al., 1994; Ebtehaj and Foufoula-Georgiou, 2013; Le Dimet and Talagrand, 1986; Lorenc, 1986; Park and Županski, 2003; Reichle et al., 2001), are derived from basic principles of Bayesian inference under the assumption that the model and observation errors are drawn from zero-mean Gaussian distributions. It is well documented that these assumptions often limit the forecast skill of DA systems (Chen et al., 2019a; Dee, 2005; Ebtehaj et al., 2014; Walker et al., 2001), as natural processes are not necessarily Gaussian (Bocquet et al., 2010; Pires et al., 2010) and systematic errors or biases exist in ESMs and observations, chiefly due to under-representation of the underlying physics and errors in instruments as well as retrieval algorithms (Dee, 2003).
Apart from particle filters (Van Leeuwen, 2010), which are intrinsically designed to deal with non-Gaussian state-spaces, numerous modifications to variational DA (VDA) and ensemble filtering techniques have been made to tackle non-Gaussianity (Anderson, 2010; Han and Li, 2008; Mandel and Beezley, 2009; Pires et al., 1996). In four-dimensional VDA, where a non-Gaussian state-space may lead to local minima, a quasi-static VDA has been proposed to ensure convergence to a global minimum by gradually increasing the assimilation intervals (Pires et al., 1996). For ensemble-based filters, Anderson (2010) proposes a new approach to account for non-Gaussian priors and posteriors by utilizing rank histograms (Anderson, 1996; Hamill, 2001). A hybrid ensemble Kalman-particle filter has also been developed (Mandel and Beezley, 2009) to utilize both ensemble Kalman filters (EnKF; Evensen, 1994) and particle filters to cope with non-Gaussian priors in high-dimensional geophysical DA systems.
In recent years, significant progress has been made to treat systematic errors through numerous ad-hoc methods such as field alignment techniques (Ravela et al., 2007) and the morphing EnKF (Beezley and Mandel, 2008), which can tackle position errors between observations and forecasts. A dual state-parameter EnKF (Moradkhani et al., 2005) has also been developed to resolve systematic errors originating from parameter uncertainties. Bias-aware variants of the Kalman filter have been designed (De Lannoy et al., 2007a,b; Drécourt et al., 2006; Kollat et al., 2008) to simultaneously update the state-space and an a priori estimate of the additive biases. In parallel, cumulative distribution function matching (Reichle and Koster, 2004) has garnered widespread attention in land DA to alleviate systematic biases in remote sensing observations.

Despite significant progress in developing versatile DA systems, many existing techniques are ad-hoc and cannot formalize non-Gaussianity and systematic errors rigorously. Even though particle filters can handle non-Gaussian likelihood functions, when observations lie away from the support set of the particles, their ensemble variance collapses over time and the filters degenerate (Poterjoy and Anderson, 2016). In this paper, we aim to address the shortcomings of current Bayesian DA methodologies by developing a new ensemble methodology, which borrows ideas from Riemannian geometry and optimal mass transport and can effectively handle a non-Gaussian state-space and geophysical biases.
In order to emphasize the advantage of this Riemannian perspective over common Bayesian methods for DA, we refer to the following technical property: the set of equal-mean Gaussian distributions, which are common in Bayesian modeling, spans an affine Euclidean subspace (Amari, 2012). However, sufficiently smooth non-Gaussian distributions with finite second-order moments constitute a Riemannian statistical manifold (Lauritzen, 1987). It is equipped with a geodesic distance, which assigns to any two non-Gaussian probability distributions the shortest curve on the Riemannian manifold they belong to. This distance thus encapsulates the dissimilarity between the mean values as well as all higher-order moments of the two distributions (Pennec, 2006).

More specifically, the geodesic distance on the Riemannian manifold of square-integrable probability distributions is the Wasserstein distance (Villani, 2003). It stems from the theory of optimal mass transport (Kantorovich, 1942; Monge, 1781). Unlike the Euclidean distance, which is "Eulerian" in the sense that it becomes insensitive to the magnitude of translation between two disjoint densities, the Wasserstein distance acts as a "Lagrangian" metric and penalizes the mismatch based on the difference between their central positions as well as their shapes (Chen et al., 2019b; Ning et al., 2014). Wasserstein space representations have been established for the following physical models: the Fokker-Planck equation (Jordan et al., 1998), the dissipative porous media equation (Otto, 2001), and the semi-geostrophic approximation of the compressible Navier-Stokes equations (Cullen et al., 1991).

Several attempts have been made to utilize the Wasserstein metric in geophysical DA.
Ning et al. (2014) used the Wasserstein distance to reduce forecast uncertainty due to errors in parameter estimation in dissipative evolutionary equations. Feyeux et al. (2018) suggested replacing the Euclidean distance in VDA with the Wasserstein distance to penalize the position error between the state and observations. Tamang et al. (2020) introduced a Wasserstein regularization of the VDA cost function with particular emphasis on bias correction under chaotic dynamics.

All of the aforementioned techniques stem from the variational problem formulation and are therefore unable to rigorously tackle a non-Gaussian state-space. Furthermore, the high computational cost associated with the Wasserstein distance inhibits the application of these methods in high-dimensional geophysical systems. In this work, we attempt to answer some of the key questions that remain to be explored: How can a Riemannian interpretation of DA using the Wasserstein distance formally account for a non-Gaussian state-space? Why and how can Riemannian DA correct for systematic errors? Is there a way for ensemble implementation in high-dimensional problems, given the fact that the computation of the Wasserstein distance is costly?

We present a novel ensemble DA methodology, namely Ensemble Riemannian DA (EnRDA). As explained, the presented methodology extends the DA formalism over a Riemannian manifold to account for a non-Gaussian state-space under systematic geophysical biases. It is important to note that, since EnRDA operates in the probability domain, it can lead to the estimation of the entire forecast probability distribution. The initial results of the EnRDA are demonstrated for dissipative advection-diffusion and the chaotic Lorenz-63 system, with a wide range of prospective applications in the forecast of mass, momentum, and heat transport in Earth system models.

The organization of the paper is as follows: Section 2 provides a brief background on Bayesian DA formulations and optimal mass transport. The mathematical formalism of the EnRDA is described in Section 3. Section 4 presents the results and compares them with their Euclidean counterparts. Section 5 discusses the findings and ideas for future research.

Throughout, small and capital boldface letters represent m-element column vectors x ∈ R^m and m-by-n matrices X ∈ R^{m×n}, respectively, whereas R^m_+ (R^{m×n}_+) denotes that the vectors (matrices) only contain non-negative real numbers. The vector 1_m denotes an m-element vector of ones, I_m is an m × m identity matrix, δ(x − y) represents the Kronecker delta function with value 1 if x = y and 0 otherwise, diag(x) ∈ R^{m×m} represents a diagonal matrix with entries given by x ∈ R^m, and E_X(x) is the expectation of x under the probability density function p(x). The notation x ∼ N(µ, Σ) denotes that the random vector x is drawn from a Gaussian distribution with mean µ and covariance Σ. The square of the weighted ℓ2-norm of x is represented as ‖x‖²_{B⁻¹} = xᵀB⁻¹x, where B is a positive definite matrix and (·)ᵀ is the transposition operator. The notations x ⊙ y and x ⊘ y represent the element-wise Hadamard product and division between equal-length vectors x and y, respectively; ⟨A, B⟩ = tr(AᵀB) denotes the Frobenius inner product between matrices A and B; and tr(·) and det(·) represent the trace and determinant of a square matrix, respectively.

In this section, we provide a brief review of the derivation of classic variational DA and particle filters based on Bayes' theorem to set the stage for the presented Ensemble Riemannian DA formalism.
Let us consider a discrete-time Markovian state-space and its observations as follows:

x_t = M(x_{t−1}) + ω_t,  ω_t ∼ N(0, B)
y_t = H(x_t) + v_t,  v_t ∼ N(0, R),    (1)

where x_t ∈ R^m and y_t ∈ R^n represent the state variables and the observations at time t; M: R^m → R^m and H: R^m → R^n are the deterministic forward model and observation operators; and ω_t ∈ R^m and v_t ∈ R^n are the independent and identically distributed model and observation errors, respectively.

Recalling Bayes' theorem and dropping the time subscript without loss of generality, the posterior probability density function (pdf) of the state given the observation can be obtained as p(x | y) = p(y | x) p(x) / p(y), where p(y | x) is proportional to the likelihood function, p(x) is the prior density, and p(y) denotes the distribution of observations. In the log-space, the theorem can be expressed as follows:

− log p(x | y) = − log p(x) − log p(y | x) + log p(y).    (2)

Letting x^b = E_X(x) ∈ R^m represent the background state, ignoring the additive constant terms including log p(y), and assuming Gaussian errors, Equation 2 can be expanded into the well-known three-dimensional VDA (3D-Var) cost function (Lorenc, 1986) as follows:

− log p(x | y) ∝ (1/2)(x − x^b)ᵀ B⁻¹ (x − x^b) + (1/2)(y − H(x))ᵀ R⁻¹ (y − H(x))
             ∝ ‖x − x^b‖²_{B⁻¹} + ‖y − H(x)‖²_{R⁻¹}.    (3)

As a result, the analysis state obtained by minimization of the 3D-Var cost function in Equation 3 is the mode of the posterior distribution, which coincides with the posterior mean when the errors are drawn from unbiased Gaussian densities and H is a linear operator. Using the Woodbury matrix inversion lemma (Woodbury and Woodbury, 1950), it can easily be demonstrated that for a linear observation operator, the analysis states in 3D-Var and the Kalman filter are equivalent. As is evident, the zero-mean Gaussian assumption leads to penalization of the error through the weighted Euclidean norm.
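For a linear observation operator, the minimizer of Equation 3 coincides with the Kalman filter analysis. The following sketch, written with hypothetical inputs, only illustrates this standard update and is not the implementation used in this paper.

```python
import numpy as np

def analysis_3dvar(x_b, y, H, B, R):
    """Minimize the 3D-Var cost (Eq. 3) for a linear observation operator H.

    For linear H the minimizer equals the Kalman analysis:
    x_a = x_b + K (y - H x_b), with K = B H^T (H B H^T + R)^(-1).
    """
    innovation = y - H @ x_b
    K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)
    return x_b + K @ innovation

# usage with hypothetical numbers
x_b = np.array([1.0, 2.0])        # background state
y = np.array([1.5])               # one observation of the first component
H = np.array([[1.0, 0.0]])        # linear observation operator
B = 0.5 * np.eye(2)               # background error covariance
R = np.array([[0.25]])            # observation error covariance
x_a = analysis_3dvar(x_b, y, H, B, R)
```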
2.2 Particle Filters

Particle filters (Gordon et al., 1993; Van Leeuwen, 2010; Van Leeuwen et al., 2019) were introduced in DA to address the issue of a non-Gaussian state-space by representing the prior and posterior distributions through a weighted ensemble of model outputs referred to as the "particles". In its standard discrete setting, using Monte Carlo simulations, the state-space prior density p(x) is represented by a sum of equal-weight Kronecker delta functions as:

p(x) = (1/N) Σ_{i=1}^{N} δ(x − x_i),    (4)

where x_i ∈ R^m is the state variable represented by the i-th particle and N is the total number of particles.

Each of these N particles is then evolved through the nonlinear model in Equation 1. Let us assume that at every assimilation cycle, when observations become available, the probability of the observations y ∈ R^n given the state variable x_i follows a Gaussian distribution:

p(y | x_i) = 1 / ((2π)^{n/2} det(R)^{1/2}) exp( −(1/2)(y − H(x_i))ᵀ R⁻¹ (y − H(x_i)) )  ∀ i.    (5)

Using Bayes' theorem, under the Gaussian error assumption, it can easily be shown that the posterior distribution p(x | y) can be approximated using a set of weighted particles as:

p(x | y) = Σ_{i=1}^{N} w_i δ(x − x_i),    (6)

where {w_i}_{i=1}^{N} ∈ R represents the weight of the i-th particle,

w_i = p(y | x_i) / Σ_{j=1}^{N} p(y | x_j).    (7)

The particles are then resampled from the posterior distribution in Equation 6 based on their relative weights and are propagated forward in time according to the model dynamics. As is evident, in particle filters the weight of each particle is updated using the Gaussian likelihood function under a zero-mean error assumption. However, in the presence of systematic biases, when the support sets of the particles and the observation are disjoint, only the weights of a few particles become significantly large and the weights of all other particles tend to zero. As the underlying dynamical system progresses in time, only those few particles with relatively larger weights are resampled, which gradually makes the filter degenerate in time (Poterjoy and Anderson, 2016). It is worth noting that the problem of filter degeneracy exists even when the ensemble size is low and that an exponential growth in the number of particles is required to maintain a constant effective particle size (Doucet and Johansen, 2009).
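A minimal sketch of the weight update and resampling steps in Equations 5-7 may help fix ideas; the observation operator, noise covariance, and ensemble below are hypothetical, and multinomial resampling is only one of several common choices.

```python
import numpy as np

def particle_filter_update(particles, y, H, R, rng):
    """Weight update (Eqs. 5-7) and multinomial resampling.

    particles : (N, m) array of prior particles x_i
    y         : (n,) observation
    H         : callable observation operator
    R         : (n, n) observation error covariance
    """
    R_inv = np.linalg.inv(R)
    innovations = np.array([y - H(x) for x in particles])
    # Gaussian log-likelihood of each particle (Eq. 5), up to a common constant
    log_lik = -0.5 * np.einsum('ij,jk,ik->i', innovations, R_inv, innovations)
    w = np.exp(log_lik - log_lik.max())          # numerically stabilized weights
    w /= w.sum()                                 # normalized weights (Eq. 7)
    idx = rng.choice(len(particles), size=len(particles), p=w)
    return particles[idx], w

# usage with a hypothetical identity observation operator
rng = np.random.default_rng(0)
particles = rng.normal(size=(100, 3))
y = np.array([0.5, -0.2, 1.0])
H = lambda x: x
R = 0.5 * np.eye(3)
resampled, weights = particle_filter_update(particles, y, H, R, rng)
```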
The theory of optimal mass transport (OMT), first formulated by Gaspard Monge (Monge, 1781) and later extended by Kantorovich (Kantorovich, 1942), was developed to minimize transportation cost in resource allocation problems with purely practical motivations. Later on, mathematicians discovered that OMT provides a rich ground to compare and morph probability distributions, and uncovered new connections to partial differential equations (Jordan et al., 1998; Otto, 2001) as well as functional analysis (Benamou and Brenier, 2000; Brenier, 1987; Villani, 2003).

In a discrete setting, for two probability histograms {p_{x^b} ∈ R^m_+ : Σ_i p_{x^b,i} = 1} and {p_y ∈ R^m_+ : Σ_j p_{y,j} = 1}, with their respective measures supported on x^b_i and y_j, a "ground" transportation cost matrix C ∈ R^{m×m}_+ is defined such that its elements c_ij = |x^b_i − y_j|^q represent the cost of transporting unit probability masses from location x^b_i to y_j, where q is a positive exponent. Without loss of generality, assuming that p_{x^b} and p_y have the same number of support points, OMT determines an optimal transportation plan U^a ∈ R^{m×m}_+ that can linearly map the two histograms onto each other with the minimum amount of total transportation cost or "work" as follows:

U^a = argmin_U ⟨C, U⟩   s.t.   U ≥ 0,  U 1_m = p_{x^b},  Uᵀ 1_m = p_y.    (8)

It is worth noting that the transportation plan can be interpreted as a joint distribution that couples the marginal densities p_{x^b} and p_y. For the ground transportation cost with q = 2, the OMT problem in Equation 8 is convex and defines the square of the 2-Wasserstein distance d²_W(p_{x^b}, p_y) between the probability histograms.
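For small histograms, the optimal plan in Equation 8 with q = 2 can be obtained directly with a generic linear programming solver. The sketch below assumes 1-D support points on a common grid and is meant only to illustrate the structure of the problem, not the solver used in this work.

```python
import numpy as np
from scipy.optimize import linprog

def wasserstein2_lp(x, y, p_x, p_y):
    """Solve the discrete OMT problem (Eq. 8, q = 2) by linear programming.

    x, y     : (m,) support points of the two histograms (1-D for brevity)
    p_x, p_y : (m,) probability masses summing to one
    Returns the optimal coupling U^a and the squared 2-Wasserstein distance.
    """
    m = len(x)
    C = (x[:, None] - y[None, :]) ** 2              # ground cost c_ij = |x_i - y_j|^2
    A_eq = np.zeros((2 * m, m * m))
    for i in range(m):
        A_eq[i, i * m:(i + 1) * m] = 1.0            # row-sum constraint: U 1 = p_x
        A_eq[m + i, i::m] = 1.0                     # column-sum constraint: U^T 1 = p_y
    b_eq = np.concatenate([p_x, p_y])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    U = res.x.reshape(m, m)
    return U, float((C * U).sum())

# usage: two small histograms on a shared grid
x = np.linspace(0.0, 1.0, 5)
p_x = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
p_y = np.array([0.3, 0.3, 0.2, 0.1, 0.1])
U_opt, w2_squared = wasserstein2_lp(x, x, p_x, p_y)
```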
One key question that might arise here is: what is the advantage of the Wasserstein distance for interpolating probability distributions compared to other measures of proximity, such as the Hellinger distance (Hellinger, 1909) or the Kullback-Leibler (KL) divergence (Kullback and Leibler, 1951)? To elaborate on this question, we confine our consideration to Gaussian densities, over which the Wasserstein distance can be represented in closed form. In particular, let us interpolate between two Gaussian distributions N(µ_x, Σ_x) and N(µ_y, Σ_y) with an interpolation or displacement parameter η. It can be shown that the interpolated density with the minimum sum of squares of the Wasserstein distance is a Gaussian distribution N(µ_η, Σ_η), where

µ_η = (1 − η) µ_x + η µ_y   and   Σ_η = Σ_x^{−1/2} ( (1 − η) Σ_x + η (Σ_x^{1/2} Σ_y Σ_x^{1/2})^{1/2} )² Σ_x^{−1/2}

(Chen et al., 2019b).

[Figure 1: Interpolation between two Gaussian densities as a function of the displacement parameter η ∈ [0, 1] for (a) the Hellinger distance, (b) the Kullback-Leibler divergence, and (c) the Wasserstein distance.]

Figure 1 shows the spectrum of barycenters between two Gaussian pdfs for a range of the displacement parameter η ∈ [0, 1]. As previously noted, this metric is not limited to any Gaussian assumption. Figure 2 shows the Wasserstein barycenter between two bimodal mixtures of Gaussians as a function of the displacement parameter η ∈ [0, 1].

[Figure 2: Wasserstein barycenters p_η between two Gaussian mixture models p_{x^b} and p_y as a function of the displacement parameter η ∈ [0, 1].]

It is also worth noting that the squared 2-Wasserstein distance decomposes into a translation and a shape term, d²_W(p_{x^b}, p_y) = d²_W(p̄_{x^b}, p̄_y) + ‖µ_{x^b} − µ_y‖², where p̄_{x^b} and p̄_y are the centered zero-mean probability masses and µ_{x^b} and µ_y are the respective mean values.
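The closed-form Gaussian interpolation discussed above can be sketched in a few lines; the two Gaussians in the usage example are hypothetical and serve only to illustrate the formulas for µ_η and Σ_η.

```python
import numpy as np
from scipy.linalg import sqrtm

def gaussian_w2_interpolation(mu_x, Sig_x, mu_y, Sig_y, eta):
    """Wasserstein barycenter N(mu_eta, Sig_eta) of two Gaussians.

    mu_eta  = (1 - eta) mu_x + eta mu_y
    Sig_eta = Sig_x^{-1/2} ((1 - eta) Sig_x
              + eta (Sig_x^{1/2} Sig_y Sig_x^{1/2})^{1/2})^2 Sig_x^{-1/2}
    """
    mu_eta = (1.0 - eta) * mu_x + eta * mu_y
    Sx_half = np.real(sqrtm(Sig_x))
    Sx_half_inv = np.linalg.inv(Sx_half)
    M = np.real(sqrtm(Sx_half @ Sig_y @ Sx_half))
    A = (1.0 - eta) * Sig_x + eta * M
    Sig_eta = Sx_half_inv @ A @ A @ Sx_half_inv
    return mu_eta, Sig_eta

# usage with two hypothetical 2-D Gaussians
mu_x, Sig_x = np.zeros(2), np.diag([1.0, 0.5])
mu_y, Sig_y = np.array([3.0, -1.0]), np.diag([0.2, 2.0])
mu_half, Sig_half = gaussian_w2_interpolation(mu_x, Sig_x, mu_y, Sig_y, eta=0.5)
```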
Ensemble Riemannian Data Assimilation

First, let us recall that the weighted mean of a cloud of points {x_i}_{i=1}^{N} ∈ R^m in Euclidean space is µ_x = Σ_{i=1}^{N} η_i x_i for a given family of non-negative weights with Σ_i η_i = 1. This expected value is equivalent to solving the following variational problem:

µ_x = argmin_x Σ_{i=1}^{N} η_i ‖x_i − x‖².    (9)

Thus, the 3D-Var problem in Equation 3, after Cholesky decomposition (Nash, 1990) of the error covariance matrices and rearrangement of terms, can be interpreted as a "barycenter problem" in the Euclidean space, where the analysis state is the weighted mean of the background state and observation. By changing the distance metric from Euclidean to Wasserstein (Agueh and Carlier, 2011), a Riemannian barycenter can be defined as the Fréchet mean (Fréchet, 1948) of N probability histograms with finite second-order moments:

p_η = argmin_p Σ_{k=1}^{N} η_k d²_W(p, p_k).    (10)

Let us assume that the probability histogram of the background state p_{x^b} = (1/N) Σ_{i=1}^{N} δ(x − x_i) can be represented through a sum of equal-weight Kronecker delta functions located at the N ensemble members of the state variable x_i ∈ R^m. We similarly approximate the distribution of the observations p_y = (1/M) Σ_{j=1}^{M} δ(y − y*_j) using the replicated "pseudo-observations" {y*_j}_{j=1}^{M} ∈ R^n obtained by perturbing the observation with zero-mean Gaussian noise v* ∼ N(0, R). The EnRDA defines the probability histogram of the analysis state p_{x^a} ∈ R^m as the Fréchet barycenter over the Wasserstein space as follows:

p_{x^a} = argmin_{p_x} { η d²_W(p_x, p_{x^b}) + (1 − η) d²_W(|det(H′(x))|⁻¹ p_x, p_y) },    (11)

where η ∈ [0, 1] is the displacement parameter and H′(·) is the Jacobian of the observation operator, assuming that the support sets of the probability histograms of the observations and state variables are the same. The optimal value of η shall be determined empirically using some reference data. It is important to note that the solution of the above DA formalism involves finding the optimal transportation plan (Equation 8), or the joint density, which couples the background and observation marginal distributions.

From a Bayesian point of view, the above problem can be derived assuming that the model and observation errors have finite second-order moments and are explained by generalized Riemannian Gaussian distributions (Said et al., 2017). Specifically, following the Bayesian principle explained in Equation 2, the EnRDA can be derived assuming that the prior p(x) and likelihood function p(y | x) are p(x) ∝ exp[−η d²_W(p_x, p_{x^b})] and p(y | x) ∝ exp[−(1 − η) d²_W(|det H′(x)|⁻¹ p_x, p_y)].

From the joint histogram U^a coupling p_{x^b} and p_y (Equation 8), the histogram of the analysis state can be obtained based on McCann's interpolation (McCann, 1997; Peyré et al., 2019):

p_{x^a} = Σ_{i=1}^{N} Σ_{j=1}^{M} u^a_{ij} δ( x − [(1 − η) x_i + η y*_j] ).    (12)

Samples from the analysis state histogram in Equation 12 can be drawn to produce the model ensemble forecast.
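A minimal sketch of how samples could be drawn from the analysis histogram in Equation 12, given a coupling U^a (for instance from the entropy-regularized solver discussed next); the ensembles and the uniform placeholder coupling in the usage example are hypothetical.

```python
import numpy as np

def enrda_analysis_samples(x_ens, y_pseudo, U, eta, n_samples, rng):
    """Draw samples from the analysis histogram in Eq. 12.

    x_ens    : (N, m) background ensemble members x_i
    y_pseudo : (M, m) pseudo-observations y*_j mapped to the state space
    U        : (N, M) coupling between p_{x^b} and p_y
    eta      : displacement parameter in [0, 1]
    Each draw picks a pair (i, j) with probability u_ij and returns the
    McCann interpolant (1 - eta) x_i + eta y*_j.
    """
    N, M = U.shape
    probs = (U / U.sum()).ravel()
    pairs = rng.choice(N * M, size=n_samples, p=probs)
    i, j = np.unravel_index(pairs, (N, M))
    return (1.0 - eta) * x_ens[i] + eta * y_pseudo[j]

# usage with hypothetical ensembles and a placeholder uniform coupling
rng = np.random.default_rng(1)
x_ens = rng.normal(0.0, 1.0, size=(50, 3))
y_pseudo = rng.normal(2.0, 0.5, size=(50, 3))
U = np.full((50, 50), 1.0 / 2500.0)
analysis = enrda_analysis_samples(x_ens, y_pseudo, U, eta=0.4, n_samples=50, rng=rng)
```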
It should be noted that computation of the joint histogram is an expensive linear programming problem. The widely used interior-point methods (Altman and Gondzio, 1999) and Orlin's algorithm (Orlin, 1993) have super-cubic run time with a computational complexity of O(d³ log d), where d represents the total number of support points over which the input probability histograms are defined. This is a serious limitation in high-dimensional geophysical DA problems.

In order to speed up computation of the coupling between p_{x^b} and p_y, the problem in Equation 8 can be regularized (Cuturi, 2013) as follows:

U^a = argmin_U ⟨C, U⟩ − γ H(U)   s.t.   U ≥ 0,  U 1_m = p_{x^b},  Uᵀ 1_m = p_y,    (13)

where γ > 0 is the regularization parameter and H(U) := −⟨U, log U − 1_m 1_mᵀ⟩ represents the Gibbs-Boltzmann relative entropy function. Note that the relative entropy is a concave function and thus its negative value is convex. The Lagrangian function L of Equation 13 can be obtained by adding two dual variables or Lagrange multipliers q, r ∈ R^m as follows:

L(U, q, r) = ⟨C, U⟩ − γ H(U) − ⟨q, U 1_m − p_{x^b}⟩ − ⟨r, Uᵀ 1_m − p_y⟩.    (14)

Setting the derivative of the Lagrangian function to zero, we have

∂L(U, q, r)/∂u_{ij} = c_{ij} + γ log(u_{ij}) − q_i − r_j = 0,  ∀ i, j.    (15)

The entropic regularization keeps the problem in Equation 13 strongly convex, and it can be shown (Peyré et al., 2019) that Equation 15 leads to a unique optimal joint density of the following form:

U^a = diag(v) K diag(w),    (16)

where v = exp(q ⊘ (γ 1_m)) ∈ R^m and w = exp(r ⊘ (γ 1_m)) ∈ R^m are the unknown scaling variables and K ∈ R^{m×m}_+ is the Gibbs kernel associated with the cost matrix C, with elements k_ij = exp(−c_ij / γ).

From the mass conservation constraints in Equation 13 and the scaling form of the optimal joint density in Equation 16, we can derive

diag(v) K diag(w) 1_m = p_{x^b}   and   diag(w) Kᵀ diag(v) 1_m = p_y.    (17)

Using Equation 17, the two unknown scaling variables v and w can be solved for iteratively using Sinkhorn's algorithm (Cuturi, 2013) as follows:

v^{l+1} = p_{x^b} ⊘ (K w^l)   and   w^{l+1} = p_y ⊘ (Kᵀ v^l).    (18)
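The Sinkhorn iterations in Equation 18 can be sketched as follows. The update below reuses the freshly computed v within each sweep, a common variant of the alternation written in Equation 18, and the histograms in the usage example are hypothetical.

```python
import numpy as np

def sinkhorn_coupling(C, p_x, p_y, gamma, n_iter=500):
    """Entropy-regularized coupling (Eqs. 13-18) via Sinkhorn iterations.

    C     : (m, m) ground cost matrix
    p_x   : (m,) background histogram p_{x^b}
    p_y   : (m,) observation histogram p_y
    gamma : entropic regularization parameter
    Returns U^a = diag(v) K diag(w) with K_ij = exp(-c_ij / gamma).
    """
    K = np.exp(-C / gamma)
    v = np.ones_like(p_x)
    w = np.ones_like(p_y)
    for _ in range(n_iter):
        v = p_x / (K @ w)          # scaling update for the row marginals
        w = p_y / (K.T @ v)        # scaling update for the column marginals
    return np.diag(v) @ K @ np.diag(w)

# usage: couple two histograms on a shared 1-D grid
grid = np.linspace(0.0, 1.0, 50)
C = (grid[:, None] - grid[None, :]) ** 2
p_x = np.exp(-(grid - 0.3) ** 2 / 0.01); p_x /= p_x.sum()
p_y = np.exp(-(grid - 0.7) ** 2 / 0.02); p_y /= p_y.sum()
U_a = sinkhorn_coupling(C, p_x, p_y, gamma=0.01)
```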
The entropic regularization parameter γ plays a very important role in the characterization of the joint density; however, there exists no closed-form solution for its optimal selection. Generally speaking, increasing the value of γ will increase the convexity of the cost function and thus the computational efficiency, however at the expense of reduced coupling between the marginal histograms of the background state and observations. The reduced coupling might be interpreted physically using the first and second laws of thermodynamics, which state that the drive of thermodynamic systems toward equilibrium is a result of a tendency toward maximum entropy or disorder. Therefore, we expect that by increasing the entropic regularization, the degree of coupling is reduced and the probability masses of the joint distribution become more spread out.

As an example, the effects of γ on the coupling between the two Gaussian mixture models p_{x^b} and p_y shown in Figure 2 are demonstrated in Figure 3. It can be seen that at smaller values of γ the probability masses of the joint distribution remain concentrated, while as γ increases they become more spread out, reflecting a lower degree of dependence between the marginals. It is important to note that, in the limiting cases, as γ → 0 the solution of Equation 13 converges to the true optimal joint histogram, while as γ → ∞ the entropy of the coupling increases and U^a tends to p_{x^b} p_yᵀ.

[Figure 3: Effects of the entropic regularization parameter γ on the optimal joint histogram coupling the two Gaussian mixture models p_{x^b} and p_y shown in Figure 2.]
In order to demonstrate the performance of the EnRDA and assess its effectiveness, we focus on the linear advection-diffusion equation and the chaotic Lorenz-63 model (Lorenz, 1963). The advection-diffusion model explains a wide range of heat, mass, and momentum transport across the land, vegetation, and atmospheric continuum, and has been utilized to evaluate the performance of DA methodologies (Berardi et al., 2016; Ebtehaj et al., 2014; Hurkmans et al., 2006; Ning et al., 2014; Zhang et al., 1997). Similarly, the Lorenz-63 model, as a chaotic model of atmospheric convection, has been widely used in testing the performance of DA methodologies (Goodliff et al., 2015; Miller et al., 1994; Nakano et al., 2007; Tamang et al., 2020; Tandeo et al., 2015; Van Leeuwen, 2010). Throughout, under controlled experimental settings with foreknown model and observation errors, we run the forward models under systematic errors and compare the results of the EnRDA with 3D-Var for the advection-diffusion dynamics and with a particle filter for the Lorenz-63 system.
The advection-diffusion equation is a special case of the Navier-Stokes partial differential equation, and its linearized form, with constant diffusivity in an incompressible fluid flow, is expressed for a conserved physical quantity x(s, t) as follows:

∂x(s, t)/∂t + aᵀ ∇x(s, t) = ∇ᵀ D ∇x(s, t),    (19)

where s ∈ R^k represents a k-dimensional spatial domain and t is time. In the above expression, a = (a_1, ..., a_k)ᵀ ∈ R^k is the constant velocity vector and D = diag(D_1, ..., D_k) ∈ R^{k×k} represents the diffusivity matrix.

Given an initial condition x(s, t = 0), owing to its linearity, the solution at time t can be obtained by convolving the initial condition with a Kronecker delta function δ(s − a t) followed by convolution with a zero-mean Gaussian kernel G(s, t):

G(s, t) = 1 / ((2π)^{k/2} det(Σ)^{1/2}) exp( −(1/2) sᵀ Σ⁻¹ s ),    (20)

where the covariance matrix of the kernel is Σ = 2 D t.
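A sketch of this analytical solution in one dimension, assuming a uniform grid: the initial condition is shifted by a·t and smoothed with a Gaussian kernel of variance 2Dt. The pulse locations and parameter values in the example are hypothetical, since the exact values used in the experiments are not reproduced here.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d, shift

def advect_diffuse_1d(x0, ds, a, D, t):
    """Analytical solution of the 1-D linear advection-diffusion equation (Eqs. 19-20).

    The initial condition x0 on a uniform grid with spacing ds is translated
    by a*t and then convolved with a zero-mean Gaussian kernel of variance 2*D*t.
    """
    shifted = shift(x0, a * t / ds, order=1, mode="constant", cval=0.0)
    sigma_grid = np.sqrt(2.0 * D * t) / ds      # kernel std expressed in grid cells
    return gaussian_filter1d(shifted, sigma_grid, mode="constant", cval=0.0)

# usage: evolve a bimodal initial condition on s in (0, 60]
ds = 0.1
s = np.arange(ds, 60.0 + ds, ds)
x0 = np.zeros_like(s)
x0[100] = 300.0 / ds                            # two delta-like pulses (hypothetical positions)
x0[250] = 300.0 / ds
x_t = advect_diffuse_1d(x0, ds, a=0.1, D=0.05, t=15.0)   # hypothetical a, D
```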
In this subsection, we present the results of DA experiments on 1-D and 2-D advection-diffusion equations. For the 1-D case, the state-space is characterized over a spatial domain s ∈ (0, 60] [L] with a uniform discretization ∆s, and the true velocity a [L/T] and diffusivity D [L²/T] are held constant. The initial state is considered to be a bimodal mixture of Gaussian distributions obtained by superposition of two Kronecker delta functions with amplitude 300, evolved for 15 and 25 [t], respectively. The ground truth of the state-space is then obtained by evolving the initial state with a fixed time step ∆t over T = 0–30 [t] in the absence of any model error.

The observations are obtained at assimilation intervals of 10∆t by corrupting the ground truth with a heteroscedastic Gaussian noise v_t ∼ N(0, diag(ε_y x ⊙ x)), where ε_y = 0.05. We introduce both systematic and random errors in the model simulations to evolve the state-space under model error. For the systematic error, the model velocity and diffusivity coefficient are set to a′ = 0.12 [L/T] and a perturbed D′ [L²/T], respectively, and for the random error, a Gaussian noise ω_t ∼ N(0, diag(ε_b x ⊙ x)) is added at every ∆t, where ε_b = 0.02. For the implementation of EnRDA, to construct the distribution of the background state, 100 model ensemble members are generated by corrupting the ground truth at T = 0 with a heteroscedastic Gaussian noise ω_0 ∼ N(0, diag(ε x ⊙ x)), where ε = 0.05. The regularization parameter is set to γ = 3 and the displacement parameter η is held fixed.

[Figure 4: The ground truth x_tr, observations y, and analysis states x_a by 3D-Var and the EnRDA for three snapshots of time at 5, 15, and 25 [t], respectively. Also provided are the bias and ubrmse of the analysis state by EnRDA (3D-Var) at the respective times.]

Figure 4 compares the ground truth, the observations, and the analysis states at three snapshots in time. Under 3D-Var, the shape of the analysis state is no longer preserved properly. For 50 independent simulations, on average, the bias and ubrmse are decreased by 97.9% and 73.1%, respectively, compared to the 3D-Var.
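The bias and unbiased root-mean-square error (ubrmse) reported throughout are not spelled out in the text; the sketch below assumes the standard conventions, namely the mean error and the root-mean-square error after removing the mean difference.

```python
import numpy as np

def bias(x_a, x_tr):
    """Mean difference between the analysis and the true state (assumed convention)."""
    return np.mean(x_a - x_tr)

def ubrmse(x_a, x_tr):
    """Unbiased RMSE: root-mean-square error after removing the mean difference."""
    err = (x_a - np.mean(x_a)) - (x_tr - np.mean(x_tr))
    return np.sqrt(np.mean(err ** 2))

# usage with hypothetical analysis and truth fields
x_tr = np.linspace(0.0, 1.0, 100)
x_a = x_tr + 0.05 + np.random.default_rng(2).normal(0.0, 0.02, size=100)
b, u = bias(x_a, x_tr), ubrmse(x_a, x_tr)
```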
1] overthe Euclidean space is defined based on the model and observation error co-variances as α = tr (cid:0) B − ( H T R − H + B − ) − (cid:1) – assuming that observationand model errors are uncorrelated and homoscedastic. Thus, when assimilat-ing over the Euclidean space, larger values of α assign larger weights to thebackground state and vice versa.The state-space is characterized over a spatial domain s = (0 ,
10] and s = (0 ,
10] with a discretization of ∆ s = ∆ s = 0 .
1. The advection-diffusion is considered to be an isotropic process with the true model param-eter values set as a = a = 0 .
08 [L/T], and D = D = 0 .
02 [L /T]. Similarto the 1-D setting, the ground truth is considered to be a bi-variate Gaus-sian mixture, which is obtained after evolving two Kronecker delta functions x ( s , t ) = 1000 δ ( s , s ) and x ( s , t ) = 4000 δ ( s , s ) for 25 and 35 [t], respec-tively.To resemble a model with systematic errors, a background state is obtainedby increasing the advective velocity isotropically set to 0.12 [L/T] while dif-fusivity is reduced to 0.01 [L /T]. Thus the previous time forecast is not onlycorrupted with systematic biases in the position but also in terms of its spreadand the size of its support set (Figure 5b). Observations are not considered tohave position biases. However, systematic representative errors are imposedassuming that the sensing system has lower resolution than the model. Tothat end, we evolve two Kronecker delta functions x ( s , t ) = 800 δ ( s , s ) and x ( s , t ) = 2400 δ ( s , s ), which have less mass than the true state, for same x tr versus the background states x b and observations y (a–c) withsystematic errors under a 2-D advection-diffusion dynamics as well as the analysis state x a by 3D-Var (d–f) and the EnRDA (g–i) for different displacement parameters in the Euclidean α and Riemannian space η , where the entropic regularization parameter is set to γ = 0 . ime period of 25 and 35 [t], respectively. Then, in order to resemble poten-tial representative error of the sensing system, the observations are up-scaledby a factor of two through box averaging.As shown in Figure 5, the EnRDA preserves the shape of the state-spacevery well and gradually moves the mass towards the background state asthe value of η increases. However, the expected shape of the state is notrecovered properly, in 3D-Var, especially for smaller values of α = 0 .
As shown in Figure 5, the EnRDA preserves the shape of the state-space very well and gradually moves the mass towards the background state as the value of η increases. However, the expected shape of the state is not recovered properly in 3D-Var, especially for the smaller values of α = 0.25 and 0.5, which tend to nudge the analysis state towards the background state. Although the higher value of α = 0.75 improves the preservation of the shape of the state-space in 3D-Var, it does so by nudging the modes of the analysis state further away from their true position towards the observations, ultimately increasing the error metrics.

Specifically, the bias is reduced by more than 30%, from 0.15 to 0.05, as α increases from 0.25 to 0.75 for the 3D-Var; however, this occurs at the expense of an almost three-fold increase in ubrmse, from 0.3 to 1.1. This is primarily because the background state has a position error owing to the erroneous velocity and diffusivity parameters. Since the bias measures the average of the signed errors, the positive differences between the analysis state and the true state due to the position error are compensated by the negative differences in their magnitude, leading to a reduction in bias as α increases. However, the ubrmse is quadratic and thus measures the average magnitude of the error irrespective of its sign. In contrast, the bias (0.002–0.002) and ubrmse (0.12–0.95) in EnRDA are consistently lower than in the 3D-Var, by almost 99% and 60% respectively, over the reported range of η.
4.2 Lorenz-63

The Lorenz system (Lorenz-63; Lorenz, 1963) is derived through truncation of the Fourier series of the Rayleigh-Bénard convection model. This model can be interpreted as a simplistic local weather system only involving the effects of local shear stress and buoyancy forces. The system is expressed using coupled ordinary differential equations that describe the temporal evolution of three coordinates x, y, and z, representing the rate of convective overturn and the horizontal and vertical temperature variations:

dx/dt = −σ(x − y)
dy/dt = ρx − y − xz
dz/dt = xy − βz,    (21)

where σ represents the Prandtl number, ρ is a normalized Rayleigh number proportional to the difference in temperature gradient through the depth of the fluid, and β denotes a horizontal wave number of the convective motion. It is well established that for parameter values of σ = 10, ρ = 28, and β = 8/3, the system exhibits chaotic behavior with the phase space revolving around two attractors located at (√(β(ρ−1)), √(β(ρ−1)), ρ−1) and (−√(β(ρ−1)), −√(β(ρ−1)), ρ−1).
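A minimal sketch of the fourth-order Runge-Kutta integration of Equation 21; the initial condition in the usage example is hypothetical, and only the time step ∆t = 0.01 follows the experimental setting described below.

```python
import numpy as np

def lorenz63_rhs(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz-63 system (Eq. 21)."""
    x, y, z = state
    return np.array([-sigma * (x - y), rho * x - y - x * z, x * y - beta * z])

def rk4_step(state, dt, **params):
    """One fourth-order Runge-Kutta step for the Lorenz-63 equations."""
    k1 = lorenz63_rhs(state, **params)
    k2 = lorenz63_rhs(state + 0.5 * dt * k1, **params)
    k3 = lorenz63_rhs(state + 0.5 * dt * k2, **params)
    k4 = lorenz63_rhs(state + dt * k3, **params)
    return state + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

# usage: integrate a trajectory with dt = 0.01 from a hypothetical initial condition
dt, n_steps = 0.01, 2000
traj = np.empty((n_steps + 1, 3))
traj[0] = np.array([1.5, -1.5, 25.0])
for k in range(n_steps):
    traj[k + 1] = rk4_step(traj[k], dt)
```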
In this subsection, we present the results from DA experiments conducted on the Lorenz system under systematic error and compare the results of the EnRDA with a particle filter. Apart from the systematic error component, we utilize the standard experimental setting used in numerous DA studies (Furtado et al., 2008; Miller et al., 1994; Van Leeuwen, 2010). In order to obtain the ground truth of the model trajectory, the system is initialized and integrated with a time step of ∆t = 0.01 over a time period of T = 0–20 [t] using the fourth-order Runge-Kutta approximation (Kutta, 1901; Runge, 1895). The observations are obtained at every assimilation time step of 40∆t by perturbing the ground truth with Gaussian noise v_t ∼ N(0, σ²_obs Σ_ρ), where σ²_obs = 2 is the observation error variance and Σ_ρ ∈ R^{3×3} denotes the correlation matrix with 1 on the diagonal entries, 0.5 on the first sub- and super-diagonals, and 0.25 on the second sub- and super-diagonals.

In order to characterize the distribution of the background state, 10 ensemble members for the EnRDA and 100 particles for the filter are generated by corrupting the ground truth at T = 0 [t] with a zero-mean Gaussian noise ω_0 ∼ N(0, σ²_0 I), where σ²_0 = 2. For introducing systematic errors, the model parameter values are set to a perturbed σ′, ρ′ = 27, and β′ = 10/3. The random errors are also introduced as the system evolves in time by adding a Gaussian noise ω_t ∼ N(0, σ²_b I) at every ∆t, where σ²_b = 0.02. For EnRDA, the regularization and displacement parameters are set to γ = 10 and η = 0.05. Throughout, to draw a robust statistical conclusion about the error statistics, the DA experiments are repeated for 50 independent and identical simulations.

[Figure 6: The temporal evolution of the ground truth x_tr of the Lorenz-63, observations y, and the mean of the 100 and 10 ensemble members in the particle filter x_a(PF) (first column) and EnRDA x_a(EnRDA) (second column), respectively. The temporal evolution of the ensemble members is shown in gray. Also shown within dashed rectangles is the window of time over which the support set of the observations and the ensemble spread are disjoint.]

Figure 6 shows the temporal evolution of the ground truth and the analysis state by the particle filter (left column) and EnRDA (right column) over a time period of 0 to 15 [t] for one simulation. As is evident, the particle filter is well capable of capturing the ground truth when the observations lie within the particle spread. However, when the observations lie far apart from the support set of the particles (Figure 6, dashed box), i.e., when the pdfs of the background state and observations are disjoint, the particle filter becomes degenerate and the analysis state (particle mean) deviates away from the ground truth. In contrast, the EnRDA is capable of capturing the true state remarkably well even when the ensemble spread and observations are far apart from each other.

The time evolution of the average bias and ubrmse for 50 independent and identical simulations is color coded over the phase space in Figure 7. As is evident, the average bias and ubrmse for the EnRDA are much lower than for the particle filter throughout the entire simulation period. The error statistics show a transient behavior; however, for the EnRDA (Figure 7, left column) the error statistics remain fairly stable and much lower than those of the particle filter (Figure 7, right column). In particular, the mean bias and ubrmse are decreased by 36–88% and 47–55% along the three dimensions, respectively. The expected values of the bias and ubrmse are reported in Table 1.

Table 1. Expected values of the bias and ubrmse along the three dimensions for the particle filter and the EnRDA; values in parentheses denote the percentage reduction relative to the particle filter.

                    bias                                      ubrmse
                    x            y            z               x            y            z
Particle Filter     2.24         2.41         0.59            6.25         7.95         7.88
EnRDA               0.26 (88.4)  0.32 (86.7)  0.38 (35.6)     2.80 (55.2)  4.22 (46.9)  4.17 (47.1)
In this study, we introduced an ensemble data assimilation (DA) methodology over a Riemannian manifold, namely Ensemble Riemannian DA (EnRDA), and illustrated its advantages over classic Bayesian DA techniques for dissipative and chaotic dynamics. We demonstrated that the presented methodology is capable of assimilating information in the probability domain, characterized by the families of probability distributions with finite second-order moments. The key message is that when the forecast and observations exhibit non-Gaussian structure and their support sets are disjoint due to the presence of systematic errors, the translation variance of the Wasserstein metric can be leveraged to extend geophysical forecast skills beyond what is currently viable through classic Euclidean DA techniques. Nevertheless, future research on a comprehensive comparison with existing bias correction methodologies is needed to completely characterize its relative pros and cons.

We have explained the role of the regularization and displacement parameters in the EnRDA and empirically examined their effects on the optimal joint histogram and consequently on the analysis state. Nevertheless, future studies are required to characterize closed-form expressions to better understand their implications on the forecast uncertainty. As explained earlier, unlike the classic Bayesian DA techniques, which assimilate available information using different relative weights across multiple dimensions dictated by the error covariance matrices, a scalar displacement parameter is utilized in the EnRDA that balances the entire analysis state across all dimensions. Future research can be devoted to developing a framework that utilizes a vector representation of the displacement parameters to effectively tackle possible heterogeneity of uncertainty across multiple dimensions.

Although the proposed EnRDA methodology with entropic regularization works properly for the presented low-dimensional problems, future research is needed to test its efficiency in operational applications in high-dimensional geophysical DA systems, with a state-space easily exceeding a million dimensions (Van Leeuwen, 2009). In particular, additional research can be conducted to characterize the effective number of ensemble members required to sufficiently represent such a large state-space. In high-dimensional systems, further constraining the solution of the optimal joint histogram on a submanifold of probability distributions with a Gaussian mixture structure can provide a way forward by significantly lowering the overall computational cost (Chen et al., 2019b).

Nevertheless, the effectiveness of the presented methodology in treating systematic errors shows a great potential to improve precipitation and soil moisture prediction in weather and land surface models, where the position of convective cells (Jones and Macpherson, 1997; Le Coz et al., 2019; Lin et al., 2017) and soil moisture (Dee and Da Silva, 1998; Reichle and Koster, 2004) are often corrupted with significant biases. The intrinsic ability to assimilate probabilistic information regardless of the shape of the distribution further promises the methodology to be effective in the sea ice DA framework, where the subgrid thickness is represented through a probability distribution of sea ice fractions (Asadi et al., 2019).

Furthermore, a promising area of future research is that of extending the presented formalism in Earth system models to assimilate a priori climatological information obtained from in-situ gauges, ships, buoys, and radiosondes, where, due to the existing scale gaps, assimilation of such observations into the model grids is currently not straightforward. A crude framework may be cast to find the analysis state distribution as the barycenter of three marginals, namely the background state, observations, and a priori climatological data. For such a framework, the Sinkhorn algorithm (Cuturi, 2013) can easily be generalized through iterative projections (Peyré et al., 2019).
Acknowledgements
The first and second authors acknowledge the grants from the National Aeronautics and Space Administration (NASA) Terrestrial Hydrology Program (THP, 80NSSC18K1528) and the New (Early Career) Investigator Program (NIP, 80NSSC18K0742). The fifth author also acknowledges the support from the National Science Foundation (NSF, DMS1830418).

References
Agueh, M., and G. Carlier (2011), Barycenters in the Wasserstein space, SIAM Journal on Mathematical Analysis, 10.1137/100805741.
Altman, A., and J. Gondzio (1999), Regularized symmetric indefinite systems in interior point methods for linear and quadratic optimization, Optimization Methods and Software, (1-4), 275–302.
Amari, S.-i. (2012), Differential-Geometrical Methods in Statistics, vol. 28, Springer Science & Business Media.
Anderson, J. L. (1996), A method for producing and evaluating probabilistic forecasts from ensemble model integrations, Journal of Climate, (7), 1518–1530.
Anderson, J. L. (2010), A non-Gaussian ensemble filter update for data assimilation, Monthly Weather Review, (11), 4186–4198.
Andersson, E., J. Pailleux, J.-N. Thépaut, J. Eyre, A. McNally, G. Kelly, and P. Courtier (1994), Use of cloud-cleared radiances in three/four-dimensional variational data assimilation, Quarterly Journal of the Royal Meteorological Society, (517), 627–653.
Asadi, N., K. A. Scott, and D. A. Clausi (2019), Data fusion and data assimilation of ice thickness observations using a regularisation framework, Tellus A: Dynamic Meteorology and Oceanography, pp. 1–20.
Beezley, J. D., and J. Mandel (2008), Morphing ensemble Kalman filters, Tellus A: Dynamic Meteorology and Oceanography, (1), 131–140.
Benamou, J.-D., and Y. Brenier (2000), A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem, Numerische Mathematik, (3), 375–393.
Berardi, M., A. Andrisani, L. Lopez, and M. Vurro (2016), A new data assimilation technique based on ensemble Kalman filter and Brownian bridges: an application to Richards equation, Computer Physics Communications, 43–53.
Bocquet, M., C. A. Pires, and L. Wu (2010), Beyond Gaussian statistical modeling in geophysical data assimilation, Monthly Weather Review, (8), 2997–3023.
Brenier, Y. (1987), Décomposition polaire et réarrangement monotone des champs de vecteurs, CR Acad. Sci. Paris Sér. I Math., 805–808.
Chen, B., L. Dang, Y. Gu, N. Zheng, and J. C. Principe (2019a), Minimum error entropy Kalman filter, IEEE Transactions on Systems, Man, and Cybernetics: Systems, 10.1109/tsmc.2019.2957269.
Chen, Y., T. T. Georgiou, and A. Tannenbaum (2019b), Optimal transport for Gaussian mixture models, IEEE Access, 6269–6278.
Cullen, M., J. Norbury, and R. J. Purser (1991), Generalised Lagrangian solutions for atmospheric and oceanic flows, SIAM Journal on Applied Mathematics, (1), 20–31, 10.1137/0151002.
Cuturi, M. (2013), Sinkhorn distances: Lightspeed computation of optimal transport, in Advances in Neural Information Processing Systems, pp. 2292–2300.
De Lannoy, G. J., P. R. Houser, V. R. Pauwels, and N. E. Verhoest (2007a), State and bias estimation for soil moisture profiles by an ensemble Kalman filter: Effect of assimilation depth and frequency, Water Resources Research, (6).
De Lannoy, G. J., R. H. Reichle, P. R. Houser, V. Pauwels, and N. E. Verhoest (2007b), Correcting for forecast bias in soil moisture assimilation with the ensemble Kalman filter, Water Resources Research, (9).
Dee, D. P. (2003), Detection and correction of model bias during data assimilation, Meteorological Training Course Lecture Series (ECMWF).
Dee, D. P. (2005), Bias and data assimilation, Quarterly Journal of the Royal Meteorological Society, (613), 3323–3343.
Dee, D. P., and A. M. Da Silva (1998), Data assimilation in the presence of forecast bias, Quarterly Journal of the Royal Meteorological Society, (545), 269–295.
Doucet, A., and A. M. Johansen (2009), A tutorial on particle filtering and smoothing: Fifteen years later, Handbook of Nonlinear Filtering, (656-704), 3.
Drécourt, J.-P., H. Madsen, and D. Rosbjerg (2006), Bias aware Kalman filters: Comparison and improvements, Advances in Water Resources, (5), 707–718.
Ebtehaj, A. M., and E. Foufoula-Georgiou (2013), On variational downscaling, fusion, and assimilation of hydrometeorological states: A unified framework via regularization, Water Resources Research, (9), 5944–5963, 10.1002/wrcr.20424.
Ebtehaj, A. M., M. Zupanski, G. Lerman, and E. Foufoula-Georgiou (2014), Variational data assimilation via sparse regularisation, Tellus A: Dynamic Meteorology and Oceanography, (1), 21,789.
Evensen, G. (1994), Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics, Journal of Geophysical Research: Oceans, (C5), 10,143–10,162.
Evensen, G. (2003), The ensemble Kalman filter: Theoretical formulation and practical implementation, Ocean Dynamics, (4), 343–367.
Feyeux, N., A. Vidard, and M. Nodet (2018), Optimal transport for variational data assimilation, Nonlinear Processes in Geophysics, (1), 55–66.
Fréchet, M. (1948), Les éléments aléatoires de nature quelconque dans un espace distancié, in Annales de l'institut Henri Poincaré, vol. 10, pp. 215–310.
Furtado, H. C. M., H. F. de Campos Velho, and E. E. N. Macau (2008), Data assimilation: Particle filter and artificial neural networks, in Journal of Physics: Conference Series, vol. 135, p. 012073, IOP Publishing.
Goodliff, M., J. Amezcua, and P. J. Van Leeuwen (2015), Comparing hybrid data assimilation methods on the Lorenz 1963 model with increasing non-linearity, Tellus A: Dynamic Meteorology and Oceanography, (1), 26,928.
Gordon, N. J., D. J. Salmond, and A. F. Smith (1993), Novel approach to nonlinear/non-Gaussian Bayesian state estimation, in IEE Proceedings F (Radar and Signal Processing), vol. 140, pp. 107–113, IET.
Hamill, T. M. (2001), Interpretation of rank histograms for verifying ensemble forecasts, Monthly Weather Review, (3), 550–560.
Han, X., and X. Li (2008), An evaluation of the nonlinear/non-Gaussian filters for the sequential data assimilation, Remote Sensing of Environment, (4), 1434–1449.
Hellinger, E. (1909), Neue Begründung der Theorie quadratischer Formen von unendlichvielen Veränderlichen, Journal für die reine und angewandte Mathematik (Crelles Journal), (136), 210–271.
Hurkmans, R., C. Paniconi, and P. A. Troch (2006), Numerical assessment of a dynamical relaxation data assimilation scheme for a catchment hydrological model, Hydrological Processes: An International Journal, (3), 549–563.
Jones, C., and B. Macpherson (1997), A latent heat nudging scheme for the assimilation of precipitation data into an operational mesoscale model, Meteorological Applications, (3), 269–277.
Jordan, R., D. Kinderlehrer, and F. Otto (1998), The variational formulation of the Fokker-Planck equation, SIAM Journal on Mathematical Analysis, 10.1137/S0036141096303359.
Kalman, R. E. (1960), A new approach to linear filtering and prediction problems, Journal of Basic Engineering, (1), 35–45.
Kantorovich, L. V. (1942), On the translocation of masses, in Dokl. Akad. Nauk. USSR (NS), vol. 37, pp. 199–201.
Kollat, J., P. Reed, and D. Rizzo (2008), Addressing model bias and uncertainty in three dimensional groundwater transport forecasts for a physical aquifer experiment, Geophysical Research Letters, (17).
Kullback, S., and R. A. Leibler (1951), On information and sufficiency, The Annals of Mathematical Statistics, (1), 79–86.
Kutta, W. (1901), Beitrag zur näherungsweisen Integration totaler Differentialgleichungen, Z. Math. Phys., 435–453.
Lauritzen, S. L. (1987), Statistical manifolds, Differential Geometry in Statistical Inference, 163–216.
Le Coz, C., A. Heemink, M. Verlaan, M.-C. ten Veldhuis, N. van de Giesen, et al. (2019), Correcting position error in precipitation data using image morphing, Remote Sensing, (21), 2557.
Le Dimet, F.-X., and O. Talagrand (1986), Variational algorithms for analysis and assimilation of meteorological observations: theoretical aspects, Tellus A: Dynamic Meteorology and Oceanography, (2), 97–110.
Lin, L.-F., A. M. Ebtehaj, A. N. Flores, S. Bastola, and R. L. Bras (2017), Combined assimilation of satellite precipitation and soil moisture: A case study using TRMM and SMOS data, Monthly Weather Review, (12), 4997–5014, 10.1175/MWR-D-17-0125.1.
Lorenc, A. C. (1986), Analysis methods for numerical weather prediction, Quarterly Journal of the Royal Meteorological Society, (474), 1177–1194.
Lorenz, E. N. (1963), Deterministic nonperiodic flow, Journal of the Atmospheric Sciences, (2), 130–141.
Mandel, J., and J. D. Beezley (2009), An ensemble Kalman-particle predictor-corrector filter for non-Gaussian data assimilation, in International Conference on Computational Science, pp. 470–478, Springer.
McCann, R. J. (1997), A convexity principle for interacting gases, Advances in Mathematics, (1), 153–179.
Miller, R. N., M. Ghil, and F. Gauthiez (1994), Advanced data assimilation in strongly nonlinear dynamical systems, Journal of the Atmospheric Sciences, (8), 1037–1056.
Monge, G. (1781), Mémoire sur la théorie des déblais et des remblais, Histoire de l'Académie Royale des Sciences de Paris.
Moradkhani, H., S. Sorooshian, H. V. Gupta, and P. R. Houser (2005), Dual state-parameter estimation of hydrological models using ensemble Kalman filter, Advances in Water Resources, (2), 135–147.
Nakano, S., G. Ueno, and T. Higuchi (2007), Merging particle filter for sequential data assimilation, Nonlinear Processes in Geophysics, 10.5194/npg-14-395-2007.
Nash, J. C. (1990), Compact Numerical Methods for Computers: Linear Algebra and Function Minimisation, CRC Press.
Ning, L., F. P. Carli, A. M. Ebtehaj, E. Foufoula-Georgiou, and T. T. Georgiou (2014), Coping with model error in variational data assimilation using optimal mass transport, Water Resources Research, (7), 5817–5830.
Orlin, J. B. (1993), A faster strongly polynomial minimum cost flow algorithm, Operations Research, (2), 338–350.
Otto, F. (2001), The geometry of dissipative evolution equations: The porous medium equation, Communications in Partial Differential Equations, 10.1081/PDE-100002243.
Park, S. K., and D. Županski (2003), Four-dimensional variational data assimilation for mesoscale and storm-scale applications, Meteorology and Atmospheric Physics, (1-4), 173–208.
Pennec, X. (2006), Intrinsic statistics on Riemannian manifolds: Basic tools for geometric measurements, Journal of Mathematical Imaging and Vision, 10.1007/s10851-006-6228-4.
Peyré, G., M. Cuturi, et al. (2019), Computational optimal transport, Foundations and Trends in Machine Learning, (5-6), 355–607.
Pires, C., R. Vautard, and O. Talagrand (1996), On extending the limits of variational assimilation in nonlinear chaotic systems, Tellus A, (1), 96–121.
Pires, C. A., O. Talagrand, and M. Bocquet (2010), Diagnosis and impacts of non-Gaussianity of innovations in data assimilation, Physica D: Nonlinear Phenomena, (17), 1701–1717.
Poterjoy, J., and J. L. Anderson (2016), Efficient assimilation of simulated observations in a high-dimensional geophysical system using a localized particle filter, Monthly Weather Review, (5), 2007–2020.
Ravela, S., K. Emanuel, and D. McLaughlin (2007), Data assimilation by field alignment, Physica D: Nonlinear Phenomena, (1-2), 127–145.
Reichle, R. H., and R. D. Koster (2004), Bias reduction in short records of satellite soil moisture, Geophysical Research Letters, (19).
Reichle, R. H., D. Entekhabi, and D. B. McLaughlin (2001), Downscaling of radio brightness measurements for soil moisture estimation: A four-dimensional variational data assimilation approach, Water Resources Research, (9), 2353–2364.
Reichle, R. H., D. B. McLaughlin, and D. Entekhabi (2002a), Hydrologic data assimilation with the ensemble Kalman filter, Monthly Weather Review, (1), 103–114.
Reichle, R. H., J. P. Walker, R. D. Koster, and P. R. Houser (2002b), Extended versus ensemble Kalman filtering for land data assimilation, Journal of Hydrometeorology, (6), 728–740.
Runge, C. (1895), Über die numerische Auflösung von Differentialgleichungen, Mathematische Annalen, (2), 167–178.
Said, S., L. Bombrun, Y. Berthoumieu, and J. H. Manton (2017), Riemannian Gaussian distributions on the space of symmetric positive definite matrices, IEEE Transactions on Information Theory, 10.1109/TIT.2017.2653803.
Tamang, S. K., A. Ebtehaj, D. Zou, and G. Lerman (2020), Regularized variational data assimilation for bias treatment using the Wasserstein metric, Quarterly Journal of the Royal Meteorological Society, (730), 2332–2346.
Tandeo, P., P. Ailliot, J. Ruiz, A. Hannart, B. Chapron, A. Cuzol, V. Monbet, R. Easton, and R. Fablet (2015), Combining analog method and ensemble data assimilation: application to the Lorenz-63 chaotic system, in Machine Learning and Data Mining Approaches to Climate Science, pp. 3–12, Springer.
Van Leeuwen, P. J. (2009), Particle filtering in geophysical systems, Monthly Weather Review, (12), 4089–4114.
Van Leeuwen, P. J. (2010), Nonlinear data assimilation in geosciences: an extremely efficient particle filter, Quarterly Journal of the Royal Meteorological Society, (653), 1991–1999.
Van Leeuwen, P. J., H. R. Künsch, L. Nerger, R. Potthast, and S. Reich (2019), Particle filters for high-dimensional geoscience applications: A review, Quarterly Journal of the Royal Meteorological Society, (723), 2335–2365.
Villani, C. (2003), Topics in Optimal Transportation, vol. 58, American Mathematical Society.
Walker, J. P., G. R. Willgoose, and J. D. Kalma (2001), One-dimensional soil moisture profile retrieval by assimilation of near-surface measurements: A simplified soil moisture model and field application, Journal of Hydrometeorology, (4), 356–373.
Woodbury, M. A., and M. Woodbury (1950), Inverting modified matrices.
Zhang, X., A. Heemink, and J. Van Eijkeren (1997), Data assimilation in transport models, Applied Mathematical Modelling.