Accounting for model errors in iterative ensemble smoothers
Geir Evensen
International Research Institute of Stavanger, Bergen, Norway
Nansen Environmental and Remote Sensing Center, Bergen, Norway

December 4, 2018
Abstract
In the strong-constraint formulation of the history-matching problem, we assume that all the model errors relate to a selection of uncertain model input parameters. One does not account for additional model errors that could result from, e.g., excluded uncertain parameters, neglected physics in the model formulation, the use of an approximate model forcing, or discretization errors resulting from numerical approximations. If parameters with significant uncertainties are unaccounted for, there is a risk for an unphysical update of some uncertain parameters that compensates for errors in the omitted parameters. This paper gives the theoretical foundation for introducing model errors in ensemble methods for history matching. In particular, we explain procedures for practically including model errors in iterative ensemble smoothers like ESMDA and IES, and we demonstrate the impact of adding (or neglecting) model errors in the parameter-estimation problem. Also, we present a new result regarding the consistency of using the sample covariance of the predicted nonlinear measurements in the update schemes.
Introduction

It is standard to assume the model to be perfect when we use ensemble smoothers for solving inverse problems. This paper addresses the problem of consistently including the additional effect of stochastic model errors in different ensemble smoothers. In particular, we consider methods such as the Ensemble Smoother (ES) (Evensen and van Leeuwen, 1996; Evensen, 2009b), the Iterative ES (IES) (Chen and Oliver, 2012, 2013), and the Ensemble Smoother with Multiple Data Assimilations (ESMDA) (Emerick and Reynolds, 2013).

There is a vast literature on solving the data-assimilation problem in the presence of model errors (see, e.g., the reviews by Carrassi and Vannitsem, 2016; Harlim, 2017). We traditionally characterize the data-assimilation problem as being either a weak-constraint or a strong-constraint problem, dependent on whether we include the dynamical model as a strong or a weak constraint in the cost function. The classical book by Bennett (1992) is entirely devoted to solving the weak-constraint inverse problem, and it illustrates that in the weak-constraint case we need to invert simultaneously for the state vector as a function of both space and time.

In Eknes and Evensen (1997), the variational formulation was solved for a weak-constraint parameter- and state-estimation problem using the representer method by Bennett (1992), and in Evensen (2003) different methods, including ensemble methods, were used to solve a weak-constraint state- and parameter-estimation problem. A conclusion from these works is that, if model errors are present, we need to increase the dimension of the problem by either updating the model solution as a function of space and time or by estimating the actual model errors and after that solving for the model solution. It is also interesting to realize that the simultaneous state-parameter estimation problem becomes nonlinear even if the model itself is linear (except for in a very trivial case); see, for instance, Evensen (2003). Also, when using the Ensemble Kalman Filter (EnKF) by Evensen (1994) or ES, it is relatively easy to include stochastic model errors as long as we update both the parameters and the state variables simultaneously (Evensen, 2003, 2009a).

At this point, we should mention that we do not distinguish between model errors and model bias. In fact, with time-correlated stochastic model errors, if the correlation becomes perfect (equal to one), then the model errors become equivalent to a constant bias. Fortunately, the procedure outlined in this paper can be used to estimate both model errors and model bias.

In a recent paper by Sakov et al. (2018), the Iterative EnKF was reformulated to allow for additive model errors. However, for the history-matching problem, we need to account for a more general representation of the error term, since the solution of the reservoir simulator depends nonlinearly on the errors in the rate data used to force the model.

We will start by defining the standard strong-constraint history-matching problem, and after that, we move on to the general weak-constraint formulation. We then formally derive the ES, ESMDA, and IES in the presence of model errors. The different smoother methods are used in a simple example to demonstrate the consistency of the formulation and to illustrate the impact of model errors on the parameter-estimation problem.

Standard history-matching problem
The strong-constraint formulation given by Evensen (2018) is attractive because it simplifies the inverse problem, and efficient ensemble smoothers can be defined. A first fundamental assumption is that we have a perfect deterministic forward model, where the prediction y only depends on the input model parametrization x,

y = g(x).    (1)

In a reservoir history-matching problem, the model operator is the reservoir simulation model, which predicts the observed production of oil, water, and gas from the reservoir. Thus, given the true parametrization x, the true prediction of y is precisely determined by the model in Eq. (1). Also, we have measurements d of the true y given as

d = y + e.    (2)

From evaluating the model operator g(x), given a realization of the uncertain model parameters x ∈ ℜ^n, we uniquely determine a realization of predicted measurements y ∈ ℜ^m (corresponding to the real measurements d ∈ ℜ^m). Here n is the number of parameters and m the number of measurements. We want to use the measurements to estimate the parameters x, and the measurements are assumed to contain random errors e ∈ ℜ^m.

In history matching, it is common to define a prior distribution for the uncertain parameters, since we usually have more degrees of freedom in the parameters than we have independent information in the measurements. Bayes' theorem gives the joint posterior pdf for x and y as

f(x, y | d) ∝ f(x, y) f(d | y) = f(x) f(y | x) f(d | y).    (3)

In the case of no model errors, the transition density f(y | x) becomes the Dirac delta function, and we can write

f(x, y | d) ∝ f(x) δ(y − g(x)) f(d | y).    (4)

We are interested in the marginal pdf for x, which we obtain by integrating Eq. (4) over y, giving

f(x | d) ∝ ∫ f(x) δ(y − g(x)) f(d | y) dy = f(x) f(d | g(x)).    (5)

When introducing the normal priors

f(x) = N(x^f, C_{xx}),    (6)
f(d | g(x)) = f(e) = N(0, C_{dd}),    (7)

we can write Eq. (5) as

f(x | d) ∝ exp{−(1/2) (x − x^f)^T C_{xx}^{-1} (x − x^f)} × exp{−(1/2) (g(x) − d)^T C_{dd}^{-1} (g(x) − d)}.    (8)

Note that the posterior pdf in Eq. (8) is non-Gaussian due to the nonlinear model g(x). Maximizing f(x | d) is equivalent to minimizing the cost function

J(x) = (x − x^f)^T C_{xx}^{-1} (x − x^f) + (g(x) − d)^T C_{dd}^{-1} (g(x) − d).    (9)

Most methods for history matching apply the assumptions of a perfect model and Gaussian priors, and they attempt to solve either one of Eqs. (8) or (9).

For this strong-constraint problem, Evensen (2018) explained how Eqs. (8) or (9) can be approximately solved using the ES, ESMDA, and IES. We can interpret these methods as approximately sampling the posterior pdf in Eq. (8), and we can easily derive them as an ensemble of minimizing solutions of the cost function in Eq. (9), written for each realization as

J(x_j) = (x_j − x_j^f)^T C_{xx}^{-1} (x_j − x_j^f) + (g(x_j) − d_j)^T C_{dd}^{-1} (g(x_j) − d_j).    (10)

This approach relates to the papers on Ensemble Randomized Maximum Likelihood (EnRML) (Kitanidis, 1995; Oliver et al., 1996). Note that the minimizing solutions will not precisely sample the posterior non-Gaussian distribution.
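To make the ensemble of cost functions in Eq. (10) concrete, the following minimal Python sketch evaluates J(x_j) for each realization of a toy scalar problem. The model g, the sample size, and all numerical values are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000                        # ensemble size (illustrative)
x_f, C_xx = 1.0, 1.0            # prior mean and variance for x
d, C_dd = -1.0, 1.0             # measurement value and error variance

def g(x):
    """Toy nonlinear forward model standing in for the simulator."""
    return x * (1.0 + 0.2 * x)

x_prior = rng.normal(x_f, np.sqrt(C_xx), N)   # prior realizations x_j^f
d_pert = rng.normal(d, np.sqrt(C_dd), N)      # perturbed measurements d_j

def J(x, x_fj, dj):
    """Strong-constraint cost function in Eq. (10) for one realization."""
    return (x - x_fj) ** 2 / C_xx + (g(x) - dj) ** 2 / C_dd

# Evaluated at the prior realizations, only the data-mismatch term contributes.
print(np.mean([J(xf, xf, dj) for xf, dj in zip(x_prior, d_pert)]))
```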
ES solution
We obtain the ES solution by first sampling the prior parameters x_j^f and the perturbed measurements d_j from

x_j^f ∼ N(x^f, C_{xx}),    (11)
d_j ∼ N(d, C_{dd}).    (12)

We then compute the model predictions y_j^f from

y_j^f = g(x_j^f).    (13)

We define the covariance matrix between two vectors x and y as the expectation

C_{xy} = E[(x − E[x])(y − E[y])^T],    (14)

while in the ensemble methods we introduce the sample mean

\bar{x} = (1/N) ∑_{j=1}^N x_j,    (15)

and the sample covariance between two arbitrary vectors x and y as

\bar{C}_{xy} = (1/(N−1)) ∑_{j=1}^N (x_j − \bar{x})(y_j − \bar{y})^T.    (16)

Evensen (2018) derived the ES update equations for the parameters x_j^a and the predictions y_j^a as

x_j^a = x_j^f + \bar{C}_{xy} (\bar{C}_{yy} + C_{dd})^{-1} (d_j − y_j^f),    (17)
y_j^a = g(x_j^a),    (18)

but see also the alternative derivation in Secs. 4.3–4.5 below regarding the consistency of using the sample covariance \bar{C}_{yy} in Eq. (17). To compute the ES solution, we start from an initial ensemble of parameters x_j^f and perturbed measurements d_j. First, we generate a prior ensemble prediction y_j^f by evaluating the model in Eq. (13) for each realization x_j^f. Then we compute the prior ensemble covariance between the parameters and the predicted measurements, \bar{C}_{xy} ∈ ℜ^{n×m}, and the ensemble covariance of the predicted measurements, \bar{C}_{yy} ∈ ℜ^{m×m}, and use them in the ES update in Eq. (17). Finally, we can recompute the model prediction using the updated parameters x_j^a in Eq. (18).

As an alternative to solving the model in Eq. (18) for y_j^a, it is possible to compute the updated prediction directly from the update equation

y_j^a = y_j^f + \bar{C}_{yy} (\bar{C}_{yy} + C_{dd})^{-1} (d_j − y_j^f),    (19)

and in the case of a linear model, the result would be identical to solving Eq. (18). However, due to the nonlinearity of the deterministic model, Eqs. (18) and (19) will give different results for y_j^a. Eq. (19) is just the standard ES update of y_j^a given the prior forecast ensemble y_j^f and the perturbed measurements d_j of y. We will later show that, for nonlinear models, there may be a benefit of computing y_j^a indirectly through integration of the model in Eq. (18), initialized with x_j^a, rather than using the direct update in Eq. (19). Also, the indirect update in Eq. (18) allows for the use of iterations.
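The following is a minimal sketch of the strong-constraint ES update in Eqs. (11)–(18), written for a small toy problem; the model g, the dimensions, and all numerical values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n, m = 200, 3, 2                      # ensemble size, parameters, measurements

def g(x):
    """Toy nonlinear model mapping parameters x (length n) to predictions (length m)."""
    return np.array([x[0] * (1 + 0.2 * x[0]), x[1] + 0.1 * x[2] ** 2])

x_f, C_xx = np.zeros(n), np.eye(n)       # prior mean and covariance
d, C_dd = np.array([1.0, -0.5]), 0.25 * np.eye(m)

X = rng.multivariate_normal(x_f, C_xx, N)      # prior parameters, Eq. (11), shape (N, n)
D = rng.multivariate_normal(d, C_dd, N)        # perturbed measurements, Eq. (12)
Y = np.array([g(x) for x in X])                # predicted measurements, Eq. (13)

Xa = X - X.mean(axis=0)                        # ensemble anomalies
Ya = Y - Y.mean(axis=0)
C_xy = Xa.T @ Ya / (N - 1)                     # sample covariances, Eq. (16)
C_yy = Ya.T @ Ya / (N - 1)

K = C_xy @ np.linalg.inv(C_yy + C_dd)          # gain matrix in Eq. (17)
X_a = X + (D - Y) @ K.T                        # updated parameters, Eq. (17)
Y_a = np.array([g(x) for x in X_a])            # rerun the model, Eq. (18)
```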
Let us now look at a formulation where we assume that the model depends nonlinearly on the model errors q as well as the parameters x, i.e.,

y = g(x, q).    (20)

The model operator can be somewhat general, including a recursion or time steps, and the model error is a vector of noise components that could represent, e.g., the time-correlated noise in rate data used to force a reservoir simulation model over a specific period. Thus, there is a significant difference between x and q. Firstly, x is a static parameter that does not change with time. Thus, once the parameters x are estimated, we can use them in a future prediction simulation. The model errors q, on the other hand, will vary in time and are only estimated for the period of the inverse calculation. Only in the case of time-correlated errors can the estimated q be used, to some extent, as a forcing in the future prediction.

We assume that we have prior pdfs for the model errors and the parameters, f(x) and f(q). It is common, but not necessary, to assume that the model errors and parameters are independent, so that we can write the joint prior pdf as f(x, q) = f(x) f(q). The transition density for the model evolution is

f(y | x, q) = δ(y − g(x, q)),    (21)

and we obtain the joint pdf by multiplying the prior with the transition density,

f(y, x, q) = δ(y − g(x, q)) f(x) f(q).    (22)

The likelihood function for the measurements given the prediction y is f(d | y); thus, the posterior conditional pdf becomes

f(y, x, q | d) ∝ f(d | y) δ(y − g(x, q)) f(x) f(q).    (23)

Since y is given by the model in Eq. (20) as soon as we know x and q, we can compute the marginal density

f(x, q | d) ∝ ∫ f(d | y) δ(y − g(x, q)) f(x) f(q) dy = f(d | g(x, q)) f(x) f(q).    (24)

We now assume Gaussian priors

f(x) ∝ exp{−(1/2) (x − x^f)^T C_{xx}^{-1} (x − x^f)},    (25)
f(q) ∝ exp{−(1/2) (q − q^f)^T C_{qq}^{-1} (q − q^f)},    (26)

where q^f would normally be zero. The likelihood for the measurements becomes

f(d | g(x, q)) ∝ exp{−(1/2) (g(x, q) − d)^T C_{dd}^{-1} (g(x, q) − d)},    (27)

and we write the marginal posterior as

f(x, q | d) ∝ exp{−(1/2) (x − x^f)^T C_{xx}^{-1} (x − x^f) − (1/2) (q − q^f)^T C_{qq}^{-1} (q − q^f) − (1/2) (g(x, q) − d)^T C_{dd}^{-1} (g(x, q) − d)}.    (28)

Maximizing the posterior pdf in Eq. (28) is equivalent to minimizing the cost function

J(x, q) = (x − x^f)^T C_{xx}^{-1} (x − x^f) + (q − q^f)^T C_{qq}^{-1} (q − q^f) + (g(x, q) − d)^T C_{dd}^{-1} (g(x, q) − d).    (29)

As in the strong-constraint case, we sample the priors and define a cost function for each sample realization, and we obtain the weak-constraint analog to the strong-constraint cost function in Eq. (10), i.e.,

J(x_j, q_j) = (x_j − x_j^f)^T C_{xx}^{-1} (x_j − x_j^f) + (q_j − q_j^f)^T C_{qq}^{-1} (q_j − q_j^f) + (g(x_j, q_j) − d_j)^T C_{dd}^{-1} (g(x_j, q_j) − d_j).    (30)

Stationarity condition

To develop a consistent set of equations for the different methods, it is simpler to rewrite the cost function for an augmented variable z^T = (x^T, q^T). We can then define the covariance

C_{zz} = \begin{pmatrix} C_{xx} & C_{xq} \\ C_{qx} & C_{qq} \end{pmatrix},    (31)

where we allow for correlations between x and q, since this correlation becomes important in the iterative methods, and we can rewrite the cost function in Eq. (30) as

J(z_j) = (z_j − z_j^f)^T C_{zz}^{-1} (z_j − z_j^f) + (g(z_j) − d_j)^T C_{dd}^{-1} (g(z_j) − d_j).    (32)

This cost function is slightly more general than the one in Eq. (30) since we do not require independence between q and x. By taking the gradient with respect to z_j, we obtain the following closed system of nonlinear equations for z_j:

C_{zz}^{-1} (z_j − z_j^f) + ∇_z g(z_j) C_{dd}^{-1} (g(z_j) − d_j) = 0.    (33)

In the case of a linear model, the solution is the standard Kalman filter update equation for the mean. However, the presence of the nonlinear function g(z_j) makes the solution more elaborate.
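As a small illustration of the augmented variable z^T = (x^T, q^T) and the block prior covariance in Eq. (31), one can assemble the arrays as follows; the dimensions and values are illustrative assumptions.

```python
import numpy as np

n, p = 3, 4                              # number of parameters and of model-error terms
C_xx = np.eye(n)                         # prior parameter covariance
C_qq = 0.25 * np.eye(p)                  # prior model-error covariance
C_xq = np.zeros((n, p))                  # cross covariance (zero for independent priors)

# Block covariance of the augmented vector z^T = (x^T, q^T), Eq. (31)
C_zz = np.block([[C_xx, C_xq],
                 [C_xq.T, C_qq]])

x, q = np.ones(n), np.zeros(p)
z = np.concatenate([x, q])               # augmented state vector z
assert C_zz.shape == (n + p, n + p) and z.shape == (n + p,)
```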
All of ES, ESMDA, and IES are developed to solve this system of equations.

To derive the ES solution, we will use a Taylor expansion and approximation that allows us to obtain a closed-form solution for each realization of z_j, i.e.,

g(z_j) = g(z_j^f) + G_j (z_j − z_j^f),    (34)

where we have defined

G_j^T = ∇_z g(z) |_{z = z_j^f}.    (35)

Thus, we approximate the nonlinear function g(z_j) with its linearization in Eq. (34) around z = z_j^f and, in addition, evaluate the gradient in Eq. (33) at z_j^f. We now have the gradient of the model, G_j, defined in Eq. (35), which differs for each realization. We wish to replace the individual model gradients with an "averaged" gradient that is common for all realizations, and for now we denote it G, which allows us to write Eq. (33) as

(C_{zz}^{-1} + G^T C_{dd}^{-1} G)(z_j − z_j^f) = G^T C_{dd}^{-1} (d_j − g(z_j^f)).    (36)

By solving for z_j and using the matrix identity

(G^T D^{-1} G + C^{-1})^{-1} G^T D^{-1} = C G^T (G C G^T + D)^{-1},    (37)

(which can be derived from the Woodbury identity), where we substitute C_{zz} for C and C_{dd} for D, we obtain the solution for z_j as

z_j = z_j^f + C_{zz} G^T (G C_{zz} G^T + C_{dd})^{-1} (d_j − g(z_j^f)).    (38)

Evensen (2018) used a Taylor expansion of g(z_j) around the ensemble mean z = \bar{z}^f and could then replace the individual gradients in Eq. (35), evaluated at z_j, with the gradient G_{\bar{z}} evaluated at the ensemble mean \bar{z}. He then showed that \bar{C}_{zy} ≈ G_{\bar{z}} \bar{C}_{zz} and \bar{C}_{yy} ≈ G_{\bar{z}} \bar{C}_{zz} G_{\bar{z}}^T (Evensen, 2018, Eqs. 29 and 30), and he replaced the analytical gradients with ensemble covariances.

We will here use an interpretation based on linear regression (see also Reynolds et al., 2006; Chen and Oliver, 2013), where we start by defining

G ≜ C_{yz} C_{zz}^{-1}.    (39)

Thus, we view G as the sensitivity matrix in a linear regression between y and z,

C_{yz} = G C_{zz}.    (40)

When we introduce the ensemble-anomaly matrices

Y = (1/√(N−1)) (y_1 − \bar{y}, ..., y_N − \bar{y}) ∈ ℜ^{m×N},    (41)
Z = (1/√(N−1)) (z_1 − \bar{z}, ..., z_N − \bar{z}) ∈ ℜ^{n×N},    (42)

we can write Eq. (40) as

Y Z^T ≈ \bar{G} Z Z^T,    (43)

or equivalently

Y ≈ \bar{G} Z,    (44)

where the approximation is the use of a finite ensemble size. From Eq. (39) we have

G ≈ \bar{G} = \bar{C}_{yz} (\bar{C}_{zz})^+ = Y Z^T (Z Z^T)^+ = Y Z^+,    (45)

where the superscript + denotes the pseudo inverse. The unbiasedness of \bar{G} can be shown, i.e.,

E[\bar{G}] = G.    (46)

Also, using Eq. (39), we can write

G C_{zz} G^T = C_{yz} C_{zz}^{-1} C_{zy},    (47)

and when we introduce the ensemble representation from Eq. (45), we obtain

\bar{G} \bar{C}_{zz} \bar{G}^T = Y (Z^+ Z)(Z^+ Z) Y^T.    (48)

The projection Z^+ Z is just the orthogonal projection onto the range of Z^T. We will now consider three cases.

Linear case. For a linear model and measurement operator we can write, e.g., Y = H Z, and Eq. (48) becomes

\bar{G} \bar{C}_{zz} \bar{G}^T = H Z (Z^+ Z)(Z^+ Z) Z^T H^T = H Z (H Z)^T = Y Y^T = \bar{C}_{yy}.    (49)

Hence, Eq. (49) is consistent with the definition of the covariance matrix \bar{C}_{yy} = Y Y^T for linear models and all combinations of n and N.

Nonlinear case with n ≥ N − 1. In this case, the rank of Z is N − 1, and the projection Z^+ Z = (I − (1/N) 1 1^T), with 1 ∈ ℜ^N being a vector with all elements equal to one (see the Appendix in Sakov et al., 2012). This result is seen from the fact that Z has only one singular value equal to zero, corresponding to the right singular vector 1/√N. The projection Z^+ Z is then just the subtraction of the ensemble mean. Since we have already removed the ensemble mean from Y, we can write Eq. (48) as

\bar{G} \bar{C}_{zz} \bar{G}^T = Y (I − (1/N) 1 1^T)(I − (1/N) 1 1^T) Y^T = Y Y^T = \bar{C}_{yy},    (50)

and, as in the linear case, Eq. (48) exactly corresponds to the definition of the ensemble covariance \bar{C}_{yy}. This nonlinear case with n ≥ N − 1 thus allows for replacing \bar{G} \bar{C}_{zz} \bar{G}^T with the sample covariance \bar{C}_{yy}.

Nonlinear case with n < N − 1. In this case, which applies for the example considered in Section 6, the expression in Eq. (48) is not equal to Y Y^T, and we must include the projection and redefine the sample covariance as

\tilde{C}_{yy} ≜ Y (Z^+ Z) (Y (Z^+ Z))^T,    (51)

i.e., we compute the covariance of the predicted-measurement anomalies projected onto the range of Z^T. This case also applies for nonlinear models in the limit of infinite ensemble size. Thus, we must use the definition in Eq. (51) to evaluate \tilde{C}_{yy} to ensure consistency in the derivation of the update equation.

ES algorithm

We can replace the gradient G in the update Eq. (38), using Eqs. (40) and (47), to obtain

z_j^a = z_j^f + C_{zy} (C_{yz} C_{zz}^{-1} C_{zy} + C_{dd})^{-1} (d_j − g(z_j^f)).    (52)

The solution of this equation is identical to the solution of Eq. (36) with G defined by Eq. (39). However, if we replace C_{yz} C_{zz}^{-1} C_{zy} with C_{yy} in Eq. (52), the solutions of Eqs. (36) and (52) will differ in the nonlinear case.

By representing the covariances using a finite sample size and sample covariance matrices, we get the update equation for the finite ensemble of N realizations,

z_j^a = z_j^f + \bar{C}_{zy} (\tilde{C}_{yy} + C_{dd})^{-1} (d_j − g(z_j^f)),    (53)

where \tilde{C}_{yy} is defined in Eq. (51). If we omit the projection in Eq. (51), then the solutions computed by Eq. (52) (using sample covariances) and Eq. (53) will differ in the case of a nonlinear model and n < N − 1, also when N becomes infinitely large. Thus, our update will be biased.
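A minimal numerical sketch of the projected sample covariance in Eq. (51), built from the anomaly matrices of Eqs. (41) and (42), is given below; the synthetic data are placeholders and only serve to show that the projected and plain sample covariances differ when n < N − 1.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, N = 2, 5, 100                              # here n < N - 1, so the projection matters

Z = rng.standard_normal((n, N))                  # state realizations (toy data)
Y = 0.5 * Z[:1, :] ** 2 + rng.standard_normal((m, N))   # nonlinear "predictions" (toy data)
Z = (Z - Z.mean(1, keepdims=True)) / np.sqrt(N - 1)      # anomaly matrix, Eq. (42)
Y = (Y - Y.mean(1, keepdims=True)) / np.sqrt(N - 1)      # anomaly matrix, Eq. (41)

P = np.linalg.pinv(Z) @ Z                        # projection Z^+ Z onto the range of Z^T
C_yy_tilde = (Y @ P) @ (Y @ P).T                 # projected covariance, Eq. (51)
C_yy = Y @ Y.T                                   # plain sample covariance

print(np.allclose(C_yy, C_yy_tilde))             # generally False when n < N - 1
```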
To compute the ES update, we start by sampling the Gaussian prior variables for the parameters x_j^f, the model errors q_j^f, and the measurement perturbations e_j,

x_j^f ∼ N(x^f, C_{xx}),    (54)
q_j^f ∼ N(0, C_{qq}),    (55)
e_j ∼ N(0, C_{dd}).    (56)

The vector q contains all stochastic model errors over the time interval of the model integration, and the errors can also have correlations in time. We obtain the model prediction from the model written in the form

y_j^f = g(x_j^f, q_j^f),    (57)

where the model operator depends nonlinearly on the model-error term. Next, we can compute the sample covariances \bar{C}_{xy}^f and \bar{C}_{qy}^f from the ensembles of y_j^f, x_j^f, and q_j^f, and \tilde{C}_{yy}^f from the definition in Eq. (51). Eq. (53) defines the final update equations for x_j^a and q_j^a, which become

x_j^a = x_j^f + \bar{C}_{xy}^f (\tilde{C}_{yy}^f + C_{dd})^{-1} (d_j − y_j^f),    (58)
q_j^a = q_j^f + \bar{C}_{qy}^f (\tilde{C}_{yy}^f + C_{dd})^{-1} (d_j − y_j^f).    (59)

We then rerun the model using the updates x_j^a and q_j^a to get

y_j^a = g(x_j^a, q_j^a).    (60)

Alternatively, we can also compute the update of the predicted measurements from

y_j^a = y_j^f + \bar{C}_{yy}^f (\bar{C}_{yy}^f + C_{dd})^{-1} (d_j − y_j^f),    (61)

and in the case of a linear model the result would be identical to that obtained by integrating the model in Eq. (60).

The critical approximations used in the derivation of ES are, firstly, the linearization of the model in Eq. (34) about x_j^f, meaning that large updates will have large errors, and secondly, that an averaged statistical ensemble gradient replaces the exact analytic gradients. Only a single linear update step is computed, and with strong nonlinearities these approximations may lead to poor results, as was discussed in Evensen (2018).
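The sketch below applies the weak-constraint ES update of Eqs. (54)–(60) to a scalar toy model with an additive error term. For readability it uses the plain sample covariance of the predictions rather than the projected covariance of Eq. (51); the model and all numerical values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
N, beta = 100_000, 0.2
x_f, C_xx, C_qq = 1.0, 1.0, 0.25
d, C_dd = -1.0, 1.0

def cov(a, b):
    return np.sum((a - a.mean()) * (b - b.mean())) / (a.size - 1)

x = rng.normal(x_f, np.sqrt(C_xx), N)         # Eq. (54)
q = rng.normal(0.0, np.sqrt(C_qq), N)         # Eq. (55)
dj = d + rng.normal(0.0, np.sqrt(C_dd), N)    # Eq. (56), perturbed measurements
y = x * (1 + beta * x) + q                    # Eq. (57), prior predictions

K = 1.0 / (cov(y, y) + C_dd)
x_a = x + cov(x, y) * K * (dj - y)            # Eq. (58)
q_a = q + cov(q, y) * K * (dj - y)            # Eq. (59)
y_a = x_a * (1 + beta * x_a) + q_a            # Eq. (60), rerun the model
```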
The minimization problem in Eq. (10) can be solved using iterative methods like IES, ESMDA, and the IEnKF. The iterative ensemble smoother (IES) by Chen and Oliver (2012, 2013) minimizes the ensemble of cost functions by direct minimization using an approximate ensemble gradient. Alternatively, the Ensemble Smoother with Multiple Data Assimilations (ESMDA) by Emerick and Reynolds (2012) rewrites the ES update equation as a sequence of recursive updates by inflating the measurement errors, and thereby reduces the approximation introduced by the linearization in the ES update.

Sakov et al. (2012) and Bocquet and Sakov (2014) derived the iterative EnKF (IEnKF) and the iterative Ensemble Kalman Smoother (IEnKS) to better handle nonlinearities in the dynamical model and the observation operator. The focus was on state estimation, where the model state at time t_i is updated using measurements of the state at time t_{i+1}. The IEnKF solves the same kind of problem as given by the marginal conditional pdf in Eq. (5) or the cost function in Eq. (9). In a recent paper by Sakov et al. (2018), the IEnKF was extended to account for additive model errors, and the method should also be applicable to the history-matching problem in the presence of additive model errors.

In the following, we present variants of ESMDA and IES that take more general model errors into account, as is required when solving the weak-constraint history-matching problem.
ESMDA

As explained in Evensen (2018), ESMDA solves the standard ES update equations using a predefined number of recursive steps. In each step, the measurement error covariance and the associated measurement perturbations are inflated to reduce the impact of the measurements. With correctly chosen inflation factors and linear model and observation operators, the ESMDA update precisely replicates the ES update. When the model or observation operators are nonlinear, it turns out that the use of multiple short update steps reduces the errors and improves the solution as compared to using one long update step in ES.

From the previous discussion, it is clear that, in the presence of model errors, we need to recursively update both the parameters and the model errors. It is easiest to derive ESMDA by using a tempering of the likelihood function (Neal, 1996), which leads to a recursive minimization of a sequence of N_mda cost functions (Stordal and Elsheikh, 2015; Evensen, 2018),

J(z_{j,i+1}) = (z_{j,i+1} − z_{j,i})^T (C_{zz}^i)^{-1} (z_{j,i+1} − z_{j,i}) + (g(z_{j,i+1}) − d − √α_{i+1} e_{j,i})^T (α_{i+1} C_{dd})^{-1} (g(z_{j,i+1}) − d − √α_{i+1} e_{j,i}),    (62)

where we evaluate C_{zz}^i at the i-th iterate z_i, and we must have

∑_{i=1}^{N_mda} (1/α_i) = 1.    (63)

Similarly to the derivation of the ES in the previous section, we obtain the recursive update equations for ESMDA given by Eqs. (69) and (70) in the algorithm below. As in ES, the update direction is computed based on a linearization around the prior realizations of each update step. Thus, we can interpret the ES update as taking one long Euler step covering the whole pseudo-time interval, while in ESMDA we take a predefined number of shorter Euler steps of length ∆τ_i = 1/α_i that satisfy Eq. (63) (see, e.g., the discussion in Evensen, 2018).

To compute the ESMDA solution, we start by sampling the initial ensembles from Eqs. (54) and (55) to initialize the recursion,

x_{j,0} ∼ N(x^f, C_{xx}),    (64)
q_{j,0} ∼ N(0, C_{qq}).    (65)

Then the model is integrated according to Eq. (57) to obtain the prior ensemble prediction for the first ESMDA step,

y_{j,0} = g(x_{j,0}, q_{j,0}),    (66)

and we compute recursively the following for each iteration i = 0, ..., N_mda − 1: we evaluate the sample covariances \bar{C}_{xy}^i and \bar{C}_{qy}^i from the ensembles of y_{j,i}, x_{j,i}, and q_{j,i}, and \tilde{C}_{yy}^i from the definition in Eq. (51), and we sample the measurement perturbations

e_{j,i} ∼ N(0, α_{i+1} C_{dd}),    (67)

used to generate the perturbed measurements

d_{j,i} = d + e_{j,i}.    (68)

We then compute the updates

x_{j,i+1} = x_{j,i} + \bar{C}_{xy}^i (\tilde{C}_{yy}^i + α_{i+1} C_{dd})^{-1} (d_{j,i} − y_{j,i}),    (69)
q_{j,i+1} = q_{j,i} + \bar{C}_{qy}^i (\tilde{C}_{yy}^i + α_{i+1} C_{dd})^{-1} (d_{j,i} − y_{j,i}),    (70)

and rerun the model to obtain the updated prediction

y_{j,i+1} = g(x_{j,i+1}, q_{j,i+1}),    (71)

for step i + 1. We repeat this procedure until i = N_mda − 1, which results in the ESMDA solution for x_j, q_j, and y_j.
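A minimal sketch of the ESMDA recursion in Eqs. (64)–(71) for the same scalar toy model is given below. The inflation factors satisfy Eq. (63), and the plain sample covariance of the predictions is used instead of the projected covariance of Eq. (51) for readability; all values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
N, beta = 100_000, 0.2
x_f, C_xx, C_qq = 1.0, 1.0, 0.25
d, C_dd = -1.0, 1.0
alphas = [4.0, 4.0, 4.0, 4.0]                 # sum(1/alpha_i) = 1, Eq. (63)

def g(x, q):
    return x * (1 + beta * x) + q

def cov(a, b):
    return np.sum((a - a.mean()) * (b - b.mean())) / (a.size - 1)

x = rng.normal(x_f, np.sqrt(C_xx), N)         # Eq. (64)
q = rng.normal(0.0, np.sqrt(C_qq), N)         # Eq. (65)

for alpha in alphas:
    y = g(x, q)                               # Eqs. (66) and (71)
    dj = d + rng.normal(0.0, np.sqrt(alpha * C_dd), N)   # Eqs. (67)-(68)
    K = 1.0 / (cov(y, y) + alpha * C_dd)
    x = x + cov(x, y) * K * (dj - y)          # Eq. (69)
    q = q + cov(q, y) * K * (dj - y)          # Eq. (70)

y_final = g(x, q)                             # ESMDA solution for the prediction
```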
IES

In IES we use a gradient-based minimization method, and we need to evaluate the first- and second-order derivatives of the cost function in Eq. (32) with respect to z. The gradient of the cost function in Eq. (32) is already derived above as Eq. (33),

∇_z J(z_j) = C_{zz}^{-1} (z_j − z_j^f) + ∇_z g(z_j) C_{dd}^{-1} (g(z_j) − d_j).    (72)

An approximation to the Hessian of the cost function is obtained by operating again with ∇_z on the gradient in Eq. (72) to obtain

∇_z ∇_z J(z_j) ≈ C_{zz}^{-1} + ∇_z g(z_j) C_{dd}^{-1} (∇_z g(z_j))^T,    (73)

where we have neglected the second derivatives, or Hessian, of the vector function g(z), i.e., ∇_z ∇_z g(z). We can then write a Gauss-Newton iteration

z_{j,i+1} = z_{j,i} − γ ∆z_{j,i},    (74)

where we define ∆z as the gradient normalized by the approximate Hessian as follows:

∆z_{j,i} = (∇_z ∇_z J(z_{j,i}))^{-1} ∇_z J(z_{j,i})
         = (C_{zz}^{-1} + G_{j,i}^T C_{dd}^{-1} G_{j,i})^{-1} C_{zz}^{-1} (z_{j,i} − z_j^f)
         + (C_{zz}^{-1} + G_{j,i}^T C_{dd}^{-1} G_{j,i})^{-1} G_{j,i}^T C_{dd}^{-1} (g(z_{j,i}) − d_j),    (75)

and we define

G_{j,i}^T = ∇_z g(z_{j,i})    (76)

as the gradient of the model, evaluated at iteration i and for ensemble member j. Eqs. (74) and (75) define the Ensemble Randomized Maximum Likelihood method (Kitanidis, 1995; Oliver et al., 1996).

Since we are not computing the analytical gradient of the model, we will need to approximate the ensemble of gradients with an averaged gradient like \bar{G} from Eq. (45), or we can evaluate the gradient at the ensemble average for the local iterate, G_{\bar{z}}, as was explained by Evensen (2018). The same model gradient is then used for all realizations, and this leads to a different solution than the solution of the originally posed problem.

Eq. (75) is exactly Eq. (2) in Chen and Oliver (2013). Also, Chen and Oliver (2013) suggested using the state covariance in the Hessian evaluated at the local iterate to simplify further computations, since changing the Hessian does not change the gradient and thus the final converged solution (although it changes the step lengths in each iteration).

Thus, we can rewrite Eq. (75) with the averaged model gradient and introduce the state covariance for the local iterate in the Hessian, to obtain

∆z_{j,i} = ((C_{zz}^i)^{-1} + G_i^T C_{dd}^{-1} G_i)^{-1} C_{zz}^{-1} (z_{j,i} − z_j^f)
         + ((C_{zz}^i)^{-1} + G_i^T C_{dd}^{-1} G_i)^{-1} G_i^T C_{dd}^{-1} (g(z_{j,i}) − d_j).    (77)

Then, using the corollaries

(C^{-1} + G^T D^{-1} G)^{-1} = C − C G^T (G C G^T + D)^{-1} G C,    (78)
(G^T D^{-1} G + C^{-1})^{-1} G^T D^{-1} = C G^T (G C G^T + D)^{-1},    (79)

which are derived from the Woodbury identity, we can write Eq. (77) as

∆z_{j,i} = C_{zz}^i C_{zz}^{-1} (z_{j,i} − z_j^f) − C_{zz}^i G_i^T (G_i C_{zz}^i G_i^T + C_{dd})^{-1} (G_i C_{zz}^i C_{zz}^{-1} (z_{j,i} − z_j^f) − (g(z_{j,i}) − d_j)).    (80)

Now, from Eqs. (40) and (47) we can write Eq. (80) as

∆z_{j,i} = C_{zz}^i C_{zz}^{-1} (z_{j,i} − z_j^f) − C_{zy}^i (C_{yz}^i (C_{zz}^i)^{-1} C_{zy}^i + C_{dd})^{-1} (C_{yz}^i C_{zz}^{-1} (z_{j,i} − z_j^f) − (g(z_{j,i}) − d_j)).    (81)

In the original algorithm, the expression C_{yz}^i (C_{zz}^i)^{-1} C_{zy}^i was replaced with the covariance C_{yy}^i. But, as we have seen, this will break the consistency between Eqs. (77) and (81) in the nonlinear case with n < N − 1, and we need to use the definition in Eq. (51) to replace this expression.

The numerical solution method for this equation is discussed in more detail by Chen and Oliver (2013). It is clear that it is the introduction of low-rank ensemble representations of the covariances that makes it possible to compute the update steps ∆z_{j,i}, and the computation requires the use of singular-value decompositions and pseudo inversions.
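The following sketch performs one Gauss-Newton IES step, i.e., Eq. (81) combined with the step length γ of Eq. (74), for a two-dimensional augmented state z = (x, q). As above, the plain sample covariance of the predictions replaces the projected covariance of Eq. (51) for readability, and all quantities are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
N, beta, gamma = 10_000, 0.2, 0.6
C_zz_prior = np.diag([1.0, 0.25])                 # prior covariance of z = (x, q)
C_dd = np.array([[1.0]])
d = np.array([-1.0])

def g(Z):
    """Model prediction for every column z_j = (x_j, q_j) of Z, shape (2, N)."""
    x, q = Z
    return (x * (1 + beta * x) + q)[None, :]      # shape (1, N)

Z_f = np.vstack([rng.normal(1.0, 1.0, N),         # prior x_j^f
                 rng.normal(0.0, 0.5, N)])        # prior q_j^f
D = d[:, None] + rng.normal(0.0, 1.0, (1, N))     # perturbed measurements d_j
Z = Z_f.copy()                                    # current iterate z_{j,i}

Y = g(Z)
Az = (Z - Z.mean(1, keepdims=True)) / np.sqrt(N - 1)    # anomalies at the iterate
Ay = (Y - Y.mean(1, keepdims=True)) / np.sqrt(N - 1)
C_zz_i, C_zy_i, C_yy_i = Az @ Az.T, Az @ Ay.T, Ay @ Ay.T

P = np.linalg.inv(C_zz_prior)                     # C_zz^{-1} in the gradient term
S = np.linalg.inv(C_yy_i + C_dd)
resid = C_zy_i.T @ P @ (Z - Z_f) - (Y - D)        # bracketed term in Eq. (81)
dZ = C_zz_i @ P @ (Z - Z_f) - C_zy_i @ S @ resid  # Gauss-Newton increment, Eq. (81)
Z = Z - gamma * dZ                                # Eq. (74); iterate until convergence
```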
Figure 1: The plots show joint pdfs for the linear case. In the left column, the model error is set to zero (a small model error is retained in the final prediction for plotting purposes), while in the right column, we include a model error with standard deviation equal to 0.5. The two upper rows are the analytical prior and posterior, while the three lower rows show the results from ES, ESMDA with four steps, and IES.
Figure 2: The plots show the marginal pdfs for x in the left column and the marginal pdfs for y in the right column for the linear case corresponding to Figure 1. The upper row is the case with zero model error, while the lower row shows the result when we include a model error with standard deviation equal to 0.5. In all the plots, the legends, e.g., MDA_7_004_0.5, refer to a case with 10^7 ensemble members, 4 MDA steps, and a model error with variance equal to 0.5. Note that, in the linear case, all the methods give results that are identical to the Bayesian posterior, and the posterior pdfs are indistinguishable.

Examples

To verify the new algorithms, we will use the scalar example from Evensen (2018). The example resembles the use of conditioning methods in history matching, i.e., there is a parameter x that serves as an input to a forward model to predict y = g(x, q). We assume an initial state x and a prediction y, given by the model

y = g(x, q) = x (1 + β x) + q.    (82)

Here β is a parameter that determines the nonlinearity of the model. In the current example, we have used β = 0 in the linear case and a nonzero β in the nonlinear case. Note that, with a nonzero β and when we estimate both x and q, the problem becomes nonlinear, and we cannot test whether the different methods converge to the same solution as in the entirely linear case, where the convergence partly constitutes our proof of consistency.

The model error q is a random variable sampled from N(0, C_{qq}), with C_{qq} = 0 in the case without model errors and C_{qq} = 0.25 in the case including model errors. In all four cases, we sample the prior ensemble for x from a Gaussian distribution with mean x^f = 1 and variance C_{xx} = 1, and we sample the perturbed measurements of y from a Gaussian error distribution with mean d = −1 and variance C_{dd} = 1. Thus, in the current example, x represents the initial state or the model parameter, while y is the predicted observation. The goal is to estimate x given a measurement of y and then to recompute the correct prediction of y, subject to model errors, consistently with Bayes' theorem.

In this example, we use a sufficiently large number of samples, i.e., 10^7, to generate accurate estimates of the probability density functions, and this allows us to work directly with the pdfs and to examine the converged solutions of the methods.
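For reference, the Bayesian solution used for comparison in the figures can be computed directly for this scalar example by evaluating the joint posterior of (x, q) in Eq. (28) on a grid and marginalizing. The sketch below mirrors the setup described above; the value of β and the grid limits are illustrative assumptions.

```python
import numpy as np

beta = 0.2                         # nonlinear case (set beta = 0 for the linear case)
x_f, C_xx, C_qq = 1.0, 1.0, 0.25
d, C_dd = -1.0, 1.0

x = np.linspace(-4.0, 5.0, 600)
q = np.linspace(-4.0, 4.0, 500)
X, Q = np.meshgrid(x, q, indexing="ij")
Y = X * (1 + beta * X) + Q         # model prediction on the (x, q) grid

log_post = (-(X - x_f) ** 2 / (2 * C_xx)
            - Q ** 2 / (2 * C_qq)
            - (Y - d) ** 2 / (2 * C_dd))              # Eq. (28), up to a constant
post = np.exp(log_post - log_post.max())
post /= np.trapz(np.trapz(post, q, axis=1), x)        # normalize the joint pdf

marginal_x = np.trapz(post, q, axis=1)                # posterior marginal pdf for x
marginal_q = np.trapz(post, x, axis=0)                # posterior marginal pdf for q
```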
Figure 3: Model error distributions for the linear case. It is not possible to distinguish between the pdfs for the estimated model errors from the different methods.
In Figs. 1 and 2 we show the results from the linear cases without and with model errors. In Fig. 1 we plot the joint pdfs for the prior and the updated solutions, and in Fig. 2 we plot the corresponding marginal pdfs.

The joint pdfs illustrate the effect of including stochastic model errors. Without model errors, there is a unique mapping from x to y, and the pdf is zero except along the curve (or line in the linear case) defined by the model function y = g(x). The prior joint pdf has its maximum value located at (x, y) = (1, 1), while the posterior joint pdf has shifted the maximum value to (x, y) = (0, 0) for all the methods. When we introduce the model errors, we notice that we obtain a stronger update in y and a weaker update in x than in the case without model errors. Still, we observe that all the smoother methods give a result that is identical to the Bayesian update. We can better visualize these results when we examine the marginal pdfs plotted in Fig. 2.

In the case without model errors, we see that the prediction pdf for y and the measurement pdf have the same variance and only differ in the value of the means. The measurement is at y = −1 and the prediction at y = 1. The update from ES, ESMDA, and IES exactly matches the Bayesian update in this case and is centered between the measurement and prediction pdfs, as we would expect.

When we include model errors, the effect is that the prediction gets a higher variance, although the mean is the same (in this particular case). Due to the higher variance, we give more weight to the measurement in the update, and the update for y is stronger than in the case without model errors. On the other hand, the update for x is weaker in this case, since the addition of model errors reduces the correlation between the predicted measurement y and the prior x.

So, how can the update for y be shifted towards the observation in this case? After all, we compute y as a prediction from x. Here the inclusion of the model errors in the inversion plays a vital role. We simultaneously update the ensemble for x and the estimate of the model errors q. In Fig. 3 we see how we shift the model errors towards negative values. Thus, when we integrate the model forward from the updated x, the forcing from q compensates for the weaker update of x and provides the additional shift of y towards the measured value.

This example illustrates how model errors impact the updates of x and y, as well as how we need to include the model errors as a parameter in the estimation and then use them in the prediction to obtain the correct estimate of y. Finally, we also demonstrate that in the linear case, with and without model errors, ES, IES, and ESMDA all converge to the correct Bayesian solution.
Figure 4: The plots show joint pdfs for the nonlinear case. In the left column, the model error is set to zero, while in the right column, we include a model error with standard deviation equal to 0.5. The two upper rows are the analytical prior and posterior, while the three lower rows show results from ES, ESMDA with four steps, and IES.
Figure 5: The plots show the marginal pdfs for x in the left column and the marginal pdfs for y in the right column for the nonlinear case corresponding to Figure 4. The upper row is the case with zero model error, while the lower row shows the result when we include a model error with standard deviation equal to 0.5. In all the plots, the legends, e.g., MDA_7_004_0.5, refer to a case with 10^7 ensemble members, 4 MDA steps, and a model error with variance equal to 0.5. The legends ES_7_0.0D and ES_7_0.5D refer to an ES case where the prediction y_j^a is updated directly by Eq. (61).

In Figs. 4 and 5 we show the results from the nonlinear cases with and without model errors, where Fig. 4 plots the joint pdfs for the prior and the updated solutions, and Fig. 5 plots the corresponding marginal pdfs.

From the joint pdfs, we notice that the various smoother methods give different results both with and without the inclusion of model errors, although the general shape and locations of the pdfs are reasonably consistent with the theoretical solution as given by Bayes' theorem.

We get a clearer picture from the marginal pdfs in Fig. 5. As for the linear case, we get a weaker update of x and a stronger update of y. We also notice that the introduction of model errors is handled well by the iterative methods, and the results are somewhat better and more consistent with the theoretical solution than in the case without model errors. ES is still the poorest estimator, and the iterative smoothers provide a significant improvement in the estimate, also in the case including model errors. We show the corresponding updates of the model error q in Fig. 6, where the different smoother methods all give slightly different results.

The dashed green line in the plots for y in Fig. 5 is the direct ES update of y using the predicted ensemble for y and the measurement. It is clear that the update of x, followed by an integration of the model to obtain y, gives a better result than a direct update of y. Furthermore, the additional use of iterations improves the estimate of y even further. This result is the motivation for introducing the IEnKF in sequential data assimilation (Sakov et al., 2012, 2018) and also the iterative smoother by Bocquet and Sakov (2013, 2014). In history matching, we are primarily interested in estimating x, and the prediction y is just the result given the model parameters in x. But also in history matching, the ultimate value comes from accurate predictions of y.

The impact of using Eq. (51) for evaluating \tilde{C}_{yy} in the update schemes is illustrated in Fig. 7, where we show results including and excluding the projection. The impact is most pronounced when using ES and ESMDA with few update steps, where the use of C_{yy} instead of \tilde{C}_{yy} severely impacts the computation of the long linear update steps. In IES we must include the projection to ensure that the gradients defined in Eqs. (77) and (81) are identical, but in the current case the relative difference in the estimated mean when including or excluding the projection is only around one percent.
Figure 6: Model error distributions for the nonlinear case.
Accounting for model errors in reservoir history matching

The need for including model errors in iterative ensemble smoothers became apparent while working with the paper by Evensen and Eikrem (2018), which considered the conditioning of reservoir models on production-rate data. Typically, in history matching, we assign errors to the rate data used in the conditioning step, while we neglect these errors when the same data are used to force the reservoir simulation model during the historical simulation.

The errors in rate data are considered the dominant model errors in a reservoir simulation model when we exclude errors in the model parameters that we estimate during the history matching. Also, Evensen and Eikrem (2018) pointed out that there are strong time correlations in the errors in rate data due to the rate-allocation procedure used. When we include a stochastic model forcing using time-correlated errors, we will experience a significantly stronger impact than when the errors are white in time (see Chap. 12 in Evensen, 2009b).
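As a concrete illustration of the kind of time-correlated model-error forcing discussed here, the sketch below generates an ensemble of correlated error time series q_j using a first-order autoregressive process with a specified variance and decorrelation time. This is only one possible choice (an assumption, not the paper's procedure), and any method that honors the prescribed temporal variogram could be used instead.

```python
import numpy as np

rng = np.random.default_rng(6)
N, nt = 100, 120                 # realizations and monthly time steps (ten years)
sigma, tau = 0.5, 12.0           # error standard deviation and decorrelation time (months)
rho = np.exp(-1.0 / tau)         # one-step autocorrelation; rho -> 1 gives a constant bias

q = np.zeros((N, nt))
q[:, 0] = sigma * rng.standard_normal(N)
for k in range(1, nt):
    q[:, k] = rho * q[:, k - 1] + sigma * np.sqrt(1 - rho**2) * rng.standard_normal(N)

C_qq = np.cov(q, rowvar=False)   # ensemble model-error covariance implied by the q_j
```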
The functional form y = g(x, q) can represent the prediction of the produced rates (that we observe) from a reservoir simulation model. Note that, using ensemble methods, we do not need to explicitly construct the functional form y = g(x, q), since we represent the gradients using ensemble covariances. We only need access to a numerical reservoir simulation model. The inputs x then contain all the uncertain parameters of the model, such as, e.g., porosity and permeability fields, various transmissibilities, and even parameters defining the reservoir structure. The model errors can be the errors in the rates used to force the model in the historical period. These errors are not additive, since the model prediction depends nonlinearly on the specified rates.

If we associate the dominant model errors with the rates used to force the simulation model, then the size of the vector of model errors q is equal to the number of rate data used to force the model. A typical number of data for a well can be 12 data points per year, i.e., if we force the model using monthly reservoir-volume rates.

The prior error statistics for the rate data used to force the model should be the same as those used for the rate data in the update step. Thus, we can simulate a prior ensemble of time series with mean zero, a specified variogram in time, and a specified variance, to represent the model errors. These time series then define the vectors q_j of model errors for each realization. An ensemble model-error covariance C_{qq} is then defined by the ensemble of q_j.

The conventional procedure of deriving the production rates from rate-allocation tables, which we only update in connection with separator tests, often several years apart, means that the errors in the rate data will be nearly perfectly correlated in between each separator test. Note also that the inclusion of time correlations significantly reduces the degrees of freedom in the model errors and simplifies the estimation of the model errors in the conditioning step.

The expected impact of including model errors is, first, a larger and more realistic spread of the prior ensemble. Second, we will obtain a weaker and more correct update of the reservoir parameters in the conditioning step. Furthermore, the posterior ensemble will give a more accurate and consistent prediction at the end of the history-matching period, since the posterior realizations are forced by updated and improved estimates of the rates.
2 1 0 1 2 30.00.10.20.30.40.50.60.70.8
Bayes_0.5ES_7_0.5MDA_7_004_0.5MDA_7_064_0.5IES_7_0.5
Thus, we have an improved basis for making future predictions. We also expect that the information contained in the improved estimates of the rates may, in some cases, be used to correct for biases in the rates used in future predictions.

Figure 7: The plot compares solutions including (solid lines) and excluding (dashed lines) the projection in Eq. (51).
Summary

In this paper, we have given consistent formulations of iterative ensemble smoothers for the case when we include model errors. We demonstrate the consistency by showing that the ensemble smoothers all converge to the Bayesian posterior in the linear case. The main result is that the model errors need to be treated as another set of unknown parameters and estimated in the same way as the input parameters to the model. The proposed approach allows for the inclusion of general nonlinear error terms that can be both red and white in time. Thus, we can apply the methods for history matching of reservoir models forced with uncertain rate data having time-correlated errors, as well as for an iterative EnKF in sequential data assimilation with general model-error terms.

We demonstrate that the inclusion of model errors leads to a weaker update of the input parameters to the model but a stronger update of the measured model prediction. Vice versa, neglecting model errors that should be present will lead to a too substantial update of the model input parameters, with an associated underestimated uncertainty, and also a too weak update of the prediction.

Thus, the results open up for a more consistent solution of the history-matching problem, given that significant model errors are neglected in all previous history-matching applications with iterative ensemble smoothers.

We also briefly discuss an inconsistency of the linearizations in the analysis scheme that appears for nonlinear operators when the state dimension is less than the ensemble size minus one, and we show that we must include an additional projection of the predicted measurements for consistency in the derivation of the final update equations.
Acknowledgement
This work was supported by the Research Council of Norway and the companies AkerBP, DEA, ENI, Shell, Petrobras, Equinor, and VNG, through the Petromaks-2 project DIGIRES. Also, the work has benefited from the interaction and collaborations with the Nordforsk Nordic center of excellence in data assimilation, EMBLA. In-depth discussions with Patrick Raanes regarding the linear-regression derivation have helped to improve the manuscript, and the author is also grateful for comments by Geir Nævdal and Alberto Carrassi on early versions of the manuscript. Constructive comments by three anonymous reviewers led to the inclusion of a section on how to practically account for model errors in reservoir history matching.
References
Bennett, A. F., Inverse Methods in Physical Oceanography, 346 pp., Cambridge University Press, 1992.
Bocquet, M., and P. Sakov, Joint state and parameter estimation with an iterative ensemble Kalman smoother, Nonlinear Processes in Geophysics, (5), 803–818, 2013.
Bocquet, M., and P. Sakov, An iterative ensemble Kalman smoother, Q. J. R. Meteorol. Soc., 1521–1535, 2014.
Carrassi, A., and S. Vannitsem, Deterministic treatment of model error in geophysical data assimilation, in Mathematical Paradigms of Climate Science, Springer INdAM Series, vol. 15, edited by F. Ancona, P. Cannarsa, C. Jones, and A. Portaluri, pp. 175–213, Springer, Cham, 2016.
Chen, Y., and D. S. Oliver, Ensemble randomized maximum likelihood method as an iterative ensemble smoother, Math. Geosci., 1–26, 2012.
Chen, Y., and D. S. Oliver, Levenberg-Marquardt forms of the iterative ensemble smoother for efficient history matching and uncertainty quantification, Computat. Geosci., 689–703, 2013.
Eknes, M., and G. Evensen, Parameter estimation solving a weak constraint variational formulation for an Ekman model, J. Geophys. Res., (C6), 12,479–12,491, 1997.
Emerick, A. A., and A. C. Reynolds, History matching time-lapse seismic data using the ensemble Kalman filter with multiple data assimilations, Computat. Geosci., (3), 639–659, 2012.
Emerick, A. A., and A. C. Reynolds, Ensemble smoother with multiple data assimilation, Computers and Geosciences, 3–15, 2013.
Evensen, G., Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics, J. Geophys. Res., (C5), 10,143–10,162, 1994.
Evensen, G., The ensemble Kalman filter: Theoretical formulation and practical implementation, Ocean Dynamics, 343–367, 2003.
Evensen, G., The ensemble Kalman filter for combined state and parameter estimation, IEEE Control Systems Magazine, (3), 83–104, 2009a.
Evensen, G., Data Assimilation: The Ensemble Kalman Filter, 2nd ed., 320 pp., Springer, 2009b.
Evensen, G., Analysis of iterative ensemble smoothers for solving inverse problems, Computat. Geosci., (3), 885–908, doi:10.1007/s10596-018-9731-y, 2018.
Evensen, G., and K. S. Eikrem, Strategies for conditioning reservoir models on rate data using ensemble smoothers, Computat. Geosci., (5), 1251–1270, doi:10.1007/s10596-018-9750-8, 2018.
Evensen, G., and P. J. van Leeuwen, Assimilation of Geosat altimeter data for the Agulhas current using the ensemble Kalman filter with a quasi-geostrophic model, Mon. Weather Rev., 85–96, 1996.
Harlim, J., Model error in data assimilation, in Nonlinear and Stochastic Climate Dynamics, edited by C. Franzke and T. O'Kane, Cambridge University Press, Cambridge, UK, also available as arXiv:1311.3579, 2017.
Kitanidis, P. K., Quasi-linear geostatistical theory for inversing, Water Resources Research, (10), 2411–2419, 1995.
Neal, R. M., Sampling from multimodal distributions using tempered transitions, Statistics and Computing, (4), 353–366, 1996.
Oliver, D. S., N. He, and A. C. Reynolds, Conditioning permeability fields to pressure data, ECMOR - 5th European Conference on the Mathematics of Oil Recovery, 1996.
Reynolds, A. C., M. Zafari, and G. Li, Iterative forms of the Ensemble Kalman Filter, ECMOR - 10th European Conference on the Mathematics of Oil Recovery, 2006.
Sakov, P., D. S. Oliver, and L. Bertino, An iterative EnKF for strongly nonlinear systems, Mon. Weather Rev., 1988–2004, 2012.
Sakov, P., J.-M. Haussaire, and M. Bocquet, An iterative ensemble Kalman filter in the presence of additive model error, Q. J. R. Meteorol. Soc., doi:10.1002/qj.3213, 2018.
Stordal, A., and A. H. Elsheikh, Iterative ensemble smoothers in the annealed importance sampling framework, Advances in Water Resources, 86, 2015.