[PDF] A New Approach for 4DVar Data Assimilation

Abstract

Four-dimensional variational data assimilation (4DVar) has become an increasingly important tool in data science with wide applications in many engineering and scientific fields such as geoscience1-12, biology13 and the financial industry14. The 4DVar seeks a solution that minimizes the departure from the background field and the mismatch between the forecast trajectory and the observations within an assimilation window. The current state-of-the-art 4DVar offers only two choices by using different forms of the forecast model: the strong- and weak-constrained 4DVar approaches15-16. The former ignores the model error and only corrects the initial condition error at the expense of reduced accuracy; while the latter accounts for both the initial and model errors and corrects them separately, which increases computational costs and uncertainty. To overcome these limitations, here we develop an integral correcting 4DVar (i4DVar) approach by treating all errors as a whole and correcting them simultaneously and indiscriminately. To achieve that, a novel exponentially decaying function is proposed to characterize the error evolution and correct it at each time step in the i4DVar. As a result, the i4DVar greatly enhances the capability of the strong-constrained 4DVar for correcting the model error while also overcomes the limitation of the weak-constrained 4DVar for being prohibitively expensive with added uncertainty. Numerical experiments with the Lorenz model show that the i4DVar significantly outperforms the existing 4DVar approaches. It has the potential to be applied in many scientific and engineering fields and industrial sectors in the big data era because of its ease of implementation and superior performance.

Full PDF

1 A New Approach for 4DVar Data Assimilation

Xiangjun Tian , Aiguo Dai , Xiaobing Feng , Hongqin Zhang , Rui Han and Lu Zhang ICCES, Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing, 100029, China University of Chinese Academy of Sciences, Beijing, 100049, China Department of Atmospheric and Environmental Sciences, University at Albany, SUNY, 1400 Washington Ave. Albany, NY 12222, U.S.A. Department of Mathematics, the University of Tennessee, Knoxville, TN 37996, U.S.A. ____________________ * Corresponding author e-mail: Xiangjun Tian, [email protected] Four-dimensional variational data assimilation (4DVar) has become an increasingly important tool in data science with wide applications in many engineering and scientific fields such as geoscience , biology and the financial industry . The 4DVar seeks a solution that minimizes the departure from the background field and the mismatch between the forecast trajectory and the observations within an assimilation window. The current state-of-the-art 4DVar offers only two choices by using different forms of the forecast model: the strong- and weak-constrained 4DVar approaches . The former ignores the model error and only corrects the initial condition error at the expense of reduced accuracy; while the latter accounts for both the initial and model errors and corrects them separately, which increases computational costs and uncertainty. To overcome these limitations, here we develop an integral correcting 4DVar (i4DVar) approach by treating all errors as a whole and correcting them simultaneously and indiscriminately. To achieve that, a novel exponentially decaying function is proposed to characterize the error evolution and correct it at each time step in the i4DVar. As a result, the i4DVar greatly enhances the capability of the strong-constrained 4DVar for correcting the model error while also overcomes the limitation of the weak-constrained 4DVar for being prohibitively expensive with added uncertainty. Numerical experiments with the Lorenz model show that the i4DVar significantly outperforms the existing 4DVar approaches. It has the potential to be applied in many scientific and engineering fields and industrial sectors in the big data era because of its ease of implementation and superior performance. , has experienced explosive growth and development, especially in the context of the big data era . This is because DA can effectively incorporate the time series of observational data into model simulations and predictions to improve estimates of all the current and future states of a natural (e.g., the atmosphere) or social (e.g., the financial markets) system . For example, most numerical weather prediction (NWP) centers around the world have adopted the 4DVar approach to assimilate asynchronous observations simultaneously, which has greatly improved the accuracy of weather prediction. Currently, the 4DVar only has two approaches: the strong- and weak-constrained methods depending on whether the 4DVar solution is required to satisfy the forecast model exactly . The strong-constrained 4DVar (s4DVar) assumes the forecast model is perfect and all errors in the prediction originate from the initial conditions. This is clearly unrealistic in many cases, and recent studies show that incorporating the model error into the 4DVar improves its performance. The weak-constrained 4DVar (w4DVar) allows for both initial and model errors and it corrects them separately , which adds computational costs significantly and increases uncertainty. To reduce the computational costs, various simplifications are used to specify the model error in the w4DVar . However, these simplifications still have many issues; for example, it is very difficult or expensive to determine the parameters used to represent the model error covariance . To overcome the limitations of both the s4DVar and w4DVar, we develop a new 4DVar approach in which all errors are treated indistinctly and corrected simultaneously. Our approach is based on the observation that the influence of all errors would decay gradually in time if the correction-term is incorporated into the model integration sequentially. Here, we first show a less recognized fact that the s4DVar actually has a hidden mechanism that can correct the model error at the initial (i.e., analysis) time. The s4DVar seeks 4 an analysis increment ' x m  x of the initial background condition x , such that ' x minimizes the following cost function (so that * '0 0   x x x represents the corrected estimate of the state variable x at the initial time t , see Fig. 1) :     TT' 1 ' 10 0 0 n n

S n k obs,n n n k obs,nn              x B x h x y R h x y , (1) under the constraint ( ) k k k   x f x , where ' x is assumed to be Gaussian with the covariance matrix x x m m   B , k x is the state values at time step k , n k is the time step at the observational time level n t and k f is the forecast model from time step k  to k . Other symbols follow the general convention and are described in Extended Data (ED) Table 1. The traditional s4DVar assumes that the forecast model ( ) k k k   x f x describes the underlying system exactly , thus the model error is negligible. However, the model error is often non-negligible due to errors from discretization of continuous fields, approximation of certain physical processes, parameter uncertainties, boundary conditions, and round-off errors. Moreover, various studies show that the model error prevails over the initial error in many circumstances . The use of the 4DVar at several major NWP centers also indicated that the 4DVar needs to account for model errors because NWP models are imperfect. The s4DVar can still deliver comparable performance with or even outperforms (ED Figs.1-2) one version of the w4DVar for a very special case with perfect initial conditions but random model errors. This case clearly does not fulfill the perfect model assumption, yet the s4DVar still worked reasonably well. This is because the s4DVar is formulated as a constrained optimization problem, which itself (eq. 1) does not require the forecast model k f to be perfect without errors. From this view point, the analysis increment ' x derived from the s4DVar is actually an integral “correction term” to correct all errors (including the initial and model errors) indiscriminately at the analysis time t . Thus, the widely-accepted perfect model assumption in 5 the s4DVar is actually unnecessary for deriving the analysis increment ' x as long as one considers the increment ' x as for correcting both initial and model errors and allows k f to contain errors. This important point seems have not been recognized in the DA community. However, the s4DVar focuses solely on the correction term at the initial analysis time without explicit corrections for the model errors at other times in the assimilation window; thus it is expected to perform poorly when the model error is large. To address this deficiency, the recently-introduced w4DVar seeks the analysis increment ( ' x ) of the initial conditions and the model-error correction terms x mmk  ε , such that ' x and mk ε minimize the following cost function (see Fig. 1):         TT T' 1 ' 1 10 0 0 n n

LS m mn k obs,n n n k obs,n k k kn k                  x B x h x y R h x y ε Q ε , (2) under the state constraint ( ) mk k k k    x f x ε , where ' x and mk ε are assumed to be Gaussian with the covariance matrices x x m m   B and x x m mk   Q , respectively. Here k f may also contain a model error; but different from the s4DVar, an explicit correction term mk ε is used to adjust the state k x at all times. The size of the w4DVar optimization increases greatly because of the extra variables mk ε ; hence, its practical applications are very limited. The situation is further aggravated by our limited knowledge about the general form of the model error terms ( mk ε ) . Thus, the s4DVar handles all errors as an integral term but only at the analysis time, while the w4DVar attempts to differentiate the initial and model errors and to correct them separately, resulting a much more complicated and unpractical problem due to the high complexity of error sources. Here we propose a new approach, which takes the advantages of both the s4DVar and the w4DVar but avoiding their deficiencies: To extend the s4DVar’s strategy of correcting the initial and model errors as a whole to other times in the assimilation window, we introduce an additive correction term x k - into the state variable x k - at time step k  to obtain a corrected 6 state variable * '1 1 1 k k k      x x x , which is then used to obtain the state variable at time step k : * 1 ( ) k k k   x f x . Moreover, we propose the following exponential function for the integral correction term ' k x based on a thorough analysis of the error evolution process (see Methods):     '' ' 2 0 k k k k L                 xx x . (3) Here ' x is the maximum integral correction term and [0,0.5]  is a pre-determined parameter. Eq. (3) indicates that the integral correction term ( ' k x ) decreases exponentially with time index k (ED Fig.3) for    . Our new integral correcting 4DVar (i4DVar, Fig.1) method is defined by seeking the maximum correction error ' x that minimizes the following cost function:     TT' 1 ' 10 n n

Sb n k obs,n n n k obs,nn              x Q x h x y R h x y , (4) under the constraint (with the predetermined parameter ):     '1 ' 2 11 0 (1 ) , 11 1 2 , 21 k kk kk k + k+ k L                      f x xx f x x , (5) where ' x is assumed to be Gaussian with the covariance matrix x x m mb   Q . Intuitively, the structure of b Q should be similar to B of the s4DVar but with different standard deviations. The i4DVar degenerates to the s4DVar when   and to a special w4DVar case (i.e. with a constant bias error) when   . Thus, the i4DVar can completely recover the s4DVar and partially recover the w4DVar. Compared with the s4DVar, the size of the i4DVar optimization is not changed and its implementation could be accomplished with a minimum added computational cost because that they differ only slightly in the way to integrate the forecast model (namely, ( ) k k  f x vs. '1 1 ( ) k k k    f x x ) during the optimization process. The i4DVar outperforms both the s4DVar and w4DVar methods in the framework of the  under scenarios with model errors from different sources, as it produces considerably smaller RMS errors (Fig.2) in the assimilation mode. Comparably, the disadvantage of the s4DVar becomes increasingly apparent towards the later part of the assimilation period due to the increased influence of the model errors. These results demonstrate that the exponential function used in the i4DVar represents well the evolution of a significant part of the total error in k x under various model-error scenarios and leads to a large reduction of the RMS errors. Further, the i4DVar’s performance is insensitive to different types of the model errors, and it performs almost as good as or even slightly outperforms the s4DVar for the perfect model case (ED Figs.4-5). In contrast, the explicit model-error representation in the w4DVar is ineffective, leading to its inferior performance than the s4DVar except for the last half of the assimilation period (Fig.2), when the tangent linear model of the forecast model was capable of characterizing the error evolution generally. Thus, it could be problematic to use an inaccurate form of the model error in the w4DVar. In the forecast mode, we used   ' ' 21 LL           x x as an approximation of the model error to adjust the forecast at each time step, as the influence of the initial error is largely eliminated outside of the assimilation window. Again, the i4DVar significantly outperforms the other two approaches consistently (Fig. 3). This further demonstrates that the strategy of treating all errors as a whole is effective and the exponentially decaying function works. In summary, we showed that by introducing a decaying error correction term for all the time steps in the assimilation window, it is possible to significantly improve the performance of the 4DVar using the Lorenz model. The proposed i4DVar approach, combined with the ensemble-based nonlinear least squares fast algorithm (also see Methods) that does not require an adjoint model for solving the minimization problem, can be easily applied to other DA cases with minimum added costs, and it has the potential to significantly advance DA. 8 References Luo, Y.Q. et al . Ecological forecasting and data assimilation in a data-rich era.

Ecological Applications . , 1429-1442(2011). 2. Elbern, H., Schmidt, H., Talagrand, O. & Ebel, A. 4D-variational data assimilation with an adjoint air quality model for emission analysis.

Environ. Modell. Softw. (6-7), 539-548(2000). 3. Fisher, M. & Lary, D.J. Lagrangian four-dimensional variational data assimilation of chemical species.

Q. J. R. Meteorol. Soc. , 1681-1704(1995). 4.

Lewis, P. et al . An earth observation land data assimilation system (EO-LDAS).

Remote Sensing of Environment. , 219-235 (2012). 5.

Mitchell, K. E. et al . The multi-institution North American land data assimilation system (NLDAS): utilizing multiple GCIP products and partners in a continental distributed hydrological modeling system.

J. Geophys. Res . (D7), 585-587(2004). 6. Kaminski, T. et al . The BETHY/JSBACH carbon cycle data assimilation system: experiences and challenges.

J. Geophys. Res . (4), 1414-1426(2013). 7. McNally, A. P. The direct assimilation of cloud-affected satellite infrared radiances in the ECMWF 4D-Var.

Q. J. R. Meteorol. Soc. , 1214-1229 (2009). 8.

Kazumori, M. Satellite radiance assimilation in the JMA operational mesoscale 4DVAR system.

Mon. Weather Rev. (3), 1361-1381 (2014). 9.

Rabier, F., Jarvinen, H., Klinker, E., Mahfouf, J.-F. & Simmons, A. The ECMWF operational implementation of four-dimensional variational assimilation. I: experimental results with simplified physics.

Quart. J. Roy. Meteor. Soc . (564), 1143-1170(2000). 10. Bouittier, F. & Kelly, G. Observing-system experiments in the ECMWF 4D-Var data assimilation system.

Q. J. R. Meteorol. Soc. , 1469-1488(2001). 11.

Gauthier, P., & Thépaut J. N. Impact of the digital filter as a weak constraint in the preoperational 4DVAR assimilation system of Météo-France.

Mon. Weather Rev. (8), 9 2089-2102(2001). 12.

Clayton, A. M., Lorenc, A. C. & Barker, D. M. Operational implementation of a hybrid ensemble/4D ‐ Var global data assimilation system at the Met Office.

Q. J. R. Meteorol. Soc. (675), 1445-1461(2013). 13.

Lellouche, J. M., Ouberdous, M. & Eifler, W. 4D-Var data assimilation system for a coupled physical-biological model.

Journal of Earth System Science . (4): 491-502(2000). 14. Argaud, J. P., Bouriquet, B. & Hunt, J. Data assimilation from operational and industrial applications to complex systems.

Mathematics Today , 150-152(2009). 15.

Trémolet, Y. Accounting for an imperfect model in 4D-Var.

Q. J. R. Meteorol. Soc. (621), 2483-2504(2006). 16.

Trémolet, Y. Model-error estimation in 4-DVar.

Q. J. R. Meteorol. Soc. (626), 1267-1280(2007). 17.

Lewis, J. M. & Derber, J. C. The use of the adjoint equation to solve a variational adjustment problem with advective constraints.

Tellus A . (4), 309-322(1985). 18. Le Dimet, F. X. & Talagrand, O. Variational algorithms for analysis and assimilation of meteorological observations: theoretical aspects.

Tellus A . (2), 97-110(1986). 19. Zupanski, D. A general weak constraint applicable to operational 4DVar data assimilation systems.

Mon. Weather Rev. (9), 2274-2292(1997). 20.

Griffith, A.K. & Nichols, N.K. Adjoint methods in data assimilation for estimating model error.

Flow, Turbulence & Combustion . (3-4), 469-488(2000). 21. Shaw, J. A. & Daescu, D. N. Sensitivity of the model error parameter specification in weak-constraint four-dimensional variational data assimilation.

Journal of Computational Physics . , 115-129(2017). 22. Duan, W. S. & Zhou, F.F. Non-linear forcing singular vector of a two-dimensional quasi- 10 geostrophic model.

Tellus. , 18452, doi:10.3402/tellusa.v65i0.18452(2013). 23.

Lorenz, E. N. Predictability: a problem partly solved. In

Shinfield Park, Reading (1995): Proc. Seminar on Predictability Vol. , Tian, X. J. & Feng, X. B. A non-linear least squares enhanced POD-4DVar algorithm for data assimilation.

Tellus A . (1): 25340(2015). 11 Supplementary Information is available in the online version of the paper.

Acknowledgments.

The work of the first author was partially supported by the National Key Research and Development Program of China (2016YFA0600203), the National Natural Science Foundation of China (41575100) and the High-resolution Earth Observation System Major Special Project (CHEOS) (32-Y20A17-9001-15/17). A. Dai was supported by the U.S. National Science Foundation (Grant

Figure 1 | Schematic diagram illustrating the three 4DVar methods.

Over the assimilation window, each 4DVar is used to assimilate the observations (green dots) using a segment of the previous forecast as the background (or initial) state b x (black dashed line) to yield its corresponding optimized fit to all the observations within the window with its own way to integrate the forecast model , as shown in the lower panel. The optimized fields [i.e., the sold lines determined by the analysis increment ' x (blue) for s4DVar, the analysis increment ' x and the model-error correction terms mk ε for w4DVar (purple), and the maximum correction-term ' x for i4DVar (red), respectively] are then used to improve the subsequent forecast. 13 Figure 2 | Time series of the 4DVar-assimilated root mean square (RMS) errors using three different methods. a , The 24-step (6-window) assimilation results under the parameter error scenario. b, Same as a , but for the bias error scenario. c , Same as a , but for the random error scenario. d , Same as a , but for the parameter error + bias error + random error scenario. 14 Figure 3 | Time series of the 4DVar-driven forecast root mean square (RMS) errors using three different methods. a,

The 12-step (3-window) model forecasts after the assimilation period under the parameter error scenario. b , Same as a , but for the bias error scenario. c , Same as a , but for the random error scenario. d , Same as a , but for the parameter error + bias error + random error scenario. METHODS Error evolution and correction within the assimilation window.

Similar to that in the w4DVar method, we consider the nonlinear model system given by ( ) mk k k k    x f x ε , (1) where x mk  x is the simulated state at time step k , k f : x x m m  represents the nonlinear evolution of the state and x mmk  ε represents model error from time step k  to k . For each single-step forecast, the error evolution can be formulated by a simple identity operator . Thus, the integral (or total) error k ε in k x is the error k  ε (in k  x ) plus the model error mk ε from step k  to k mk k k    ε ε ε , k L   , (2) where ε is the initial error in x at time t . If k  x is not corrected for each step over the assimilation window (AW), as in the s4DVar, the initial error ε and the model error mk ε , will be propagated into all k x , and the model forecast error from earlier forecasts will accumulate with k , leading to increasingly large k ε as k approaches the end of the AW. To mitigate this problem, we attempt to correct k  x by replacing it with k k    x ε to remove its error k  ε before using it as the starting condition to forecast k  x . We define k  such that   m mk k k k    ε ε ε  and then   mk k k k      ε ε ε  . After this correction, only the model forecast error mk ε is passed into k x ; so its error becomes:   mk k k k     ε ε ε  . (3) Applying this eq. to

1, 2, ,0 k k   , the error k ε in k x can be expressed as   mk k k k      ε ε ε m mk k k k k k k k +                 ε ε ε m m mk k k k k                             ε ε ε ε . (4) To reduce the number of parameters in eq. (4), here we choose a special type of error correction in which the correction at each time removes the same faction  of the error k ε in each single-step forecast. Further, we will ignore the difference among the individual model error vectors (which is small for a fixed forecast interval within the AW), and replace them with a mean model error ave m ε ( L mkk = L   ε ). With this approximation, eq. (4) becomes m m mk k k k k k                             ε ε ε ε ε  

10 ave k m k k           ε ε

10 ave k+k m        ε ε . (5) Although eq. (5) reduces the number of error vectors from L  as in w4DVar to only two, it still has one more vector than that in the s4DVar. To further simplify this error-evolution model, we introduce the maximum integral error (vector) max ε ( m   ε ε ), and choose a value for  such that ave max m  ε ε  , (6a) and (1 )   ε ε  (6b) with the additional assumption that the two vectors ave m ε and ε have the same direction. This is a reasonable assumption given that ε is the error term in the assimilated state at the end of the previous AW, similar to the error term at the end of the current AW. Substituting eq. (6) into eq. (5), we obtain an evolution of k ε as follows     max 2max 0 k k k k L                εε ε . (7a) 17 Since    , eq. (7) suggests that the error ( k ε ) decreases (increases) exponentially with time index k for   (   ). For   , ave m ε is less than ε , which suggests that the mean model error ave m ε over one-single step (a very short time period) should be generally smaller than the initial error ε . Otherwise, the forecast model is meaningless. Particularly, max 0 , 00, 1 k k k L     εε for   and max k  ε ε for   . Therefore, the reasonable value range of  is [0,0.5] . Correspondingly, the correction term x k ' for the integral error k ε is thus formulated as follows     '' ' 2 0 k k k k L                 xx x , (7b) where ' max   x ε is the maximum integral correction term. We emphasize that eq. (7a) represents the evolution of only part of the total error in k x , since we ignored the differences among the individual model error vectors ( mk ε ) and set all the correction factor k  to the parameter  . In other words, the k ε from eq. (7a) represents only part (hopefully a significant part) of the total error in k x , and thus the adjustment of replacing k  x with k k    x ε also does not completely remove the error in k  x , in contrast to what we originally attempted to do in deriving eq. (3). However, these approximations do not alter the above derivations that lead to eq. (7a); they only make the k ε part of the total error in k x , instead of its total error. Thus, at least for the part of the error discussed here, the structure represented by eq. (7a) is valid. Despite the approximations, our experiments with the Lorenz model suggest that the error corrections based on this simplified error model still significantly improve the assimilation and forecast results. This implies that the error model of eq. (7a) indeed represents 18 a significant part of the total error in k x ; otherwise, our adjustment of replacing k  x with '1 1 k k    x x (i.e. k k    x ε ) would make little improvement. 19 The ensemble nonlinear least squares-based w4DVar approach (NLS-w4DVar).

We consider one special version of the w4DVar problem , in which the model errors are evolving with model evolution. This w4DVar seeks the analysis increment ' x of the initial conditions and the initial model error m ε , such that ' x and m ε minimize the following cost function :     TT' 1 ' 1 T 10 0 0 00 n n

S m mn k obs,n n n k obs,nn                x B x h x y R h x y ε Q ε (8a) subject to the forecast model ( ) mk k k k k     x f x g ε , (8b) where the evolution operator k g of model error is taken as the tangent linear model of the forecast model k f . Eq. (8) can be rewritten as follows     TT' 1 ' ' ' ' 1 ' ' ' T 10 0 0 0 0 00 n n

S m m m mw,k obs,n n w,k obs,nn

L L                x B x x ε y R x ε y ε Q ε , (9a) subject to the forecast model '0 ( ) k w, k   x f x (9b) where '' m      xx ε , (10a)     '0 ( ) ( ) m m ms, k k k      f x f f f x g ε g g ε g g g ε , (10b) ' ' ', ( ) ( ) (0) n n n w k n w k n w k L     x h f x h f , (10c) and ' , , ,0 (0) n obs n obs n n w k    y y h f . (10d) 20 Similar to NLS-s4DVar , the NLS-w4DVar approach assumes that the joint-vector '' m      xx ε is expressed by the linear combinations of the joint initial model perturbations (MPs) '' jj j       xx e (

1, , j N  ) as follows ' x  x P β (11) where   , , , x N  P x x x and  

T1 2 , , , N     β . Furthermore, we mark   ' ' '1 2 , , , x N  P x x x and   ' ' '1 2 , , , e N  P e e e . Substituting eq.(11), and the ensemble background covariances     T0 x x m mx x N    P PB and     T0 x x m me e N    P PQ into eq.(9a), and expressing the cost function in terms of the new control variable β yield TT ' ' 1 ' ', ,0 n n

S w k x obs,n n w k x obs,nn

N L L                β β P β y R P β y . (12) After a series of mathematical transformations similar to those in formulating NLS-s4DVar , eq. (12) can be transformed into a non-linear least squares formulation , which is solved by the Gauss–Newton iteration scheme as follows :    

2( 1)

Si i y n n y nn N           β β I P R P     T 1 ' 1 ' 1, , ,0 ( ) 2( 1) n S i iy n n w k x obs nn

L N            P R P β y β , (13) where   , ' ' ', ,1 ,2 , , , , y n m Ny n n n n N    P y y y and ' ',0 ,0 ( ) (0) n n n,j n w k j n w k     y h f x h f . Therefore, we can obtain the optimized simulations over the assimilation window through ( ) mk k k- k k    x f x g ε from the analysis increment ' x  x P β and the optimized initial error m e  ε P β . The ensemble nonlinear least squares-based i4DVar approach (NLS-i4DVar).

The i4DVar seeks the maximum correction-term ' x that minimizes the following cost function 21     TT' 1 ' 10 n n

Sb n k obs,n n n k obs,nn              x Q x h x y R h x y (14a) subject to the forecast model with the parameter [0,0.5]      '1 ' 2 11 0 (1 ) , 11 1 2 , 21 k kk kk k + k+ k L                      f x xx f x x . (14b) Similarly, eq. (14) can be also rewritten into the following format     TT' 1 ' ' ' 1 ' ' '0 n n

S 'b k obs,n n k obs,nn

L L                x Q x x y R x y (15a) subject to the forecast model '0 ( ) k i, k   x f x , (15b) where     ' ' '0 1 2 1 0

1( ) (1 ) 1 21                         f x f f f f x x x   ' 1             x , (16a) ' ' ',0 ,0 ( ) ( ) (0) n n n k n i k n i k L     x h f x h f    , (16b) and ' , , ,0 (0) n obs n obs n n i k    y y h f  . (16c) Similarly, the maximum integral correction-term ' x is assumed to be expressed by the linear combinations of the MPs ( ' j x ,

1, , j N  ) as follows ' x  x P β (17) 22 where   ' ' '1 2 , , , x N  P x x x and  

T1 2 , , , N     β . Substituting eq. (17) and the ensemble error covariance     T x x m mx xb N    P PQ into eq. (15a) and expressing the cost function in terms of the new control variable β yield TT ' ' 1 ' '0 n n

S k x obs,n n k x obs,nn

N L L                 β β P β y R P β y . (18) Similarly, after a series of mathematical transformations similar to those in formulating NLS-s4DVar , eq.(15a) can be also transformed into a non-linear least squares formulation , which is solved by the Gauss–Newton iteration scheme as follows :     ( 1) Si i y n n y nn N           β β I P R P     T 1 ' 1 ' 1, ,0 ( ) ( 1) n S i iy n n k x obs nn

L N             P R P β y β , (19) where   , ' ' ', ,1 ,2 , , , , y n m Ny n n n n N    P y y y and ' ',0 ,0 ( ) (0) n n n,j n i k j n i k     y h f x h f . As a result, we can obtain the optimized model simulations over the assimilation window through   ' 21 kk k              f x x from the optimized maximum correction-term ' x  x P β with the predetermined parameter  . Localization Implementation of the NLS-w/i4DVar approaches.

An important issue in the ensemble-based method is sampling errors, and a practical way to address this issue is through a localization technique, which could ameliorate the contaminations resulting from inadequate sampling or the spurious long-range correlations . Obviously, the final Gauss–Newton iteration schemes to the i4DVar and w4DVar problems share the following unified formulation     ( 1) Si i y n n y nn N            β β I P R P     T 1 '( ) 1 ' 1, ,0 ( ) ( 1) n S i iy n n k x obs nn

L N               P R P y β , (20) 23 where   , '(1) ' n n k k L L   for NLS-i4DVar, and   , '(2) ' , n n k w k L L  for NLS-w4DVar, respectively. We adapt an efficient localization scheme proposed for the NLS-s4DVar to transform eq. (20) into the following form:   T1 ( ) n Si i iy n y n k xn e L            β β P ρ P β   T* 1 ' '( ), , , ,0 ( ) n S iy n y n n obs n k xn e L           P ρ R y P β , (21a) where           ( 1) ( 1)

S Sy n y n n y n y n y n y nn n

N N                    

P P R P I P P P , (21b) and      

1T T* T 1, , , ,0 ( 1)

Sy n y n n y n y nn N          P P R P I P . (21c) Then we can obtain ', , i ix   x P β , where     ( )* *, ,1 , , , x m r Nx x x x x x x N e        P P ρ ρ P ρ P (22) ( r is the is the eigenvector number chosen ) is the localized MP matrix, and *, x j P (

1, , j N  ) is an x m r  matrix whose every column is the j th column of x P , , , y n m ry n   ρ , x m rx   ρ , and T, n x y n  ρ ρ ρ . Here, , x y n m mn   ρ is the spatial correlation matrix , and stands for the Schür product of matrices and C which is a matrix whose ( , i j ) entry is given by b i , j × c i , j . Eq. (22) also defines the matrix operator   e     for two matrices. Model and experimental Design.

We use the Lorenz-96 model as the model platform to evaluate all the three 4DVar methods. This model has been widely used to study various issues associated with data assimilation. The Lorenz-96 model is governed by the following system of nonlinear equations: 24 i i i i i i dx x x x x x Fdt          ,

1, , i n  , (23) with periodic boundary conditions. The model behaves quite differently with different values of F and produces chaotic systems with integer values of F larger than three. In this configuration, we take n  and F  . For computational stability, a time step of 0.05 units (or 6 h in equivalent ) is adopted and a fourth-order Runge–Kutta scheme is used for temporal integration in this study. An observing system simulation experiment (OSSE) is considered as one of the best benchmark tests to evaluate a data assimilation methodology since it can provide both the ‘true’ state and the corresponding ‘observation’. These artificial observations were assimilated in the OSSEs: The default number of observations is m  (equally spaced at every observation time) and observations were taken every two time steps (or 12 h), which were generated by adding random error perturbations with the standard deviation error of 0.1 to the time series of the true state obtained by running the perfect model ( F  ) from the “true” initial field. All experiments were carried out over 6 days. The default parameter setups are the ensemble size N  , r  , and the covariance localization Schür radius ( ) 16 h v r d d   (only one direction in the Lorenz-96 model). The length of assimilation window is 4 steps and the ensemble is re-generated by the LETKF format for each cycle. The background initial field is generated by integrating the model 100000 steps from an arbitrary nonzero fields and the true initial field is subsequently obtained by integrating the 40 steps from the background initial field. Therefore, this background state is significantly different from the ‘‘true’’ state. The initial sample is generated by running the Lorenz model from 90 steps (i.e., sampling once every three time steps). To verify the three 4DVar approaches comprehensively, we design one group of model error scenarios (totally four cases) including the parameter error ( F  ), constant bias 25 error s v ( || || 1.5 s  v ), random error (0, 0.1) r N  v and their combinations (parameter error bias error random error). Availability.

All other relevant data and codes are available from the corresponding author. 25.

Evensen G. The ensemble Kalman filter: Theoretical formulation and practical implementation.

Ocean dynamics . (4), 343-367(2003). 26. Dennis, J. E. Jr. & Schnabel, R. B.

Numerical Methods for Unconstrained Optimization and Nonlinear Equations (Classics in Applied Mathematics, 16).

SIAM . Philadelphia. ISBN: 0898713641(1996). 27.

Houtekamer, P. L. & Mitchell, H. L. A sequential ensemble Kalman filter for atmospheric data assimilation.

Mon.Weather Rev. , 123-137(2001). 28.

Tian X. J., Feng X. B., Zhang H. Q., Zhang B. & Han R. An enhanced ensemble-based method for computing CNOPs using an efficient localization implementation scheme and a two-step optimization strategy: formulation and preliminary tests.

Q. J. R. Meteorol. Soc. (695), 1007-1016(2016). 29.

Hunt, B. R., Kostelich, E.J. & Szunyogh, I. Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter.

Physica D . , 112-126(2007) Extended Data

Extended Data Table 1. | Summary of key notation used in this article

Notation Description Size x , ' x , ' j x State vector: time t ( 'increment, j ensemble member) x m k  x , * 1 k  x State vector: time step k  ( * corrected) x m B , B , b Q Background error covariance x x m m  , n R Observation error covariance , , y n y n m m  ( ) k f Forecast model: time step k k   x m (in, out) ( ) n h Nonlinear observation operator: time n t x m (in), , y n m (out) obs,n y Observations: time n t , y n m m ε , mk ε Model error terms: time step , k x m Q , k Q Model error covariance x x m m   Decaying parameter: [0,0.5] ( ) n w, k  f , ( ) n i, k   f Forecast model: time n t t  ( w weak, i integral) x m (in, out) ' ', ( ) n w k L x ' , ,0 ,0 ( ) ( ) (0) n n n w k n w k n w k L     h f h f : time n t t  x m (in), , y n m (out) ' ' ( ) n k L x  ' ,0 ,0 ( ) ( ) (0) n n n k n i k n i k L     h f h f    : time n t t  x m (in), , y n m (out) ' , obs n y , ,0 (0) n obs n n w k   y h f ; , ,0 (0) n obs n n i k   y h f  , y n m x P   ' ' '1 2 , , , N x x x x m N  i β Coefficient vector at i th iteration r N  , y n P   ' ',1 , , , n n N y y , y n m N  ' , n j y ',0 ,0 ( ) (0) n n n w k j n w k    h f x h f ; ',0 ,0 ( ) (0) n n n i k j n i k    h f x h f   , y n m k g The evolution operator of model error x m (in, out) ' x Joint-vector x m x P   ' ' '1 2 , , , x x x x m N  e P   ' ' '1 2 , , , N e e e x m N  n ρ Spatial correlation matrix , x y n m m  x ρ , , y n ρ T, x y n n  ρ ρ ρ x m r  , , y n m r  , x  P x x e   P ρ ( ) x m r N   x m is the size of the state space. , y n m is the size of observations at n t . S  is the total number of observation time points in the assimilation window. L is the total length of the assimilation window. r is the number of eigenvectors chosen in the SVD decomposition of correlation matrix n ρ . N is the number of ensemble members. Extended Data Figure 1 | Time series of the 4DVar-assimilated root mean square (RMS) errors.

Shown are the 24-step (6-window) assimilation results under the random error scenario with the perfect initial field. Extended Data Figure 2 |

Time series of the 4DVar-driven forecast root mean square (RMS) errors.

Shown are the 12-step (3-window) model forecasts after the assimilation period under the random error scenario with the perfect initial field. 29

Extended Data Figure 3 |

The exponential function           . Shown are its values vary with the time step ( k ) for different  values: i      , with

1, ,10 i  . Extended Data Figure 4 |

Same as Extended Data Fig. 1, but for the perfect model scenario with the background field. 31

Extended Data Figure 5 ||