[PDF] Identification of physical processes via combined data-driven and data-assimilation methods

Abstract

With the advent of modern data collection and storage technologies, data-driven approaches have been developed for discovering the governing partial differential equations (PDE) of physical problems. However, in the extant works the model parameters in the equations are either assumed to be known or have a linear dependency. Therefore, most of the realistic physical processes cannot be identified with the current data-driven PDE discovery approaches. In this study, an innovative framework is developed that combines data-driven and data-assimilation methods for simultaneously identifying physical processes and inferring model parameters. Spatiotemporal measurement data are first divided into a training data set and a testing data set. Using the training data set, a data-driven method is developed to learn the governing equation of the considered physical problem by identifying the occurred (or dominated) processes and selecting the proper empirical model. Through introducing a prediction error of the learned governing equation for the testing data set, a data-assimilation method is devised to estimate the uncertain model parameters of the selected empirical model. For the contaminant transport problem investigated, the results demonstrate that the proposed method can adequately identify the considered physical processes via concurrently discovering the corresponding governing equations and inferring uncertain parameters of nonlinear models, even in the presence of measurement errors. This work helps to broaden the applicable area of the research of data driven discovery of governing equations of physical problems.

Full PDF

11 Identification of physical processes via combined data-driven and data-assimilation methods

Haibin Chang and Dongxiao Zhang * ERE, BIC-ESAT, and SKLTCS, College of Engineering, Peking University, Beijing 100871, China. *Corresponding author. Email: [email protected]

Abstract

Introduction

Physical problems may consist of several processes. For example, for contaminant solute transport in subsurface formation, simultaneous processes may exist, such as advection, dispersion, and reaction (7); hydraulic fracturing in oil/gas reservoirs is possible because of the coupled processes of fluid flow, geomechanics, and heat transfer ( ); and the viability of geological sequestration of carbon dioxide depends on the collective effects of hydrodynamic trapping, residual trapping, dissolution trapping, gravitational instability, and mineral trapping, while the dominant processes may vary during the course of storage ( ). Modeling such processes is important for characterizing corresponding physical problems. First principle derivation by resorting to conservation law can lead to rigorous models for some physical processes. However, it may not be applicable to some complex processes, for which approximate or empirical models are usually proposed based on laboratory experiments or data analyses. Alternative models of various accuracies or complexities may be proposed for the same physical process by considering different conditions, and these models usually have several parameters. For the specific occurrence of a physical problem, which processes occurred (or dominated) may be unclear. Moreover, which is the proper empirical model among the alternatives for a specific process? What are the values of the model parameters of the empirical model? Addressing these questions can be summarized into two tasks: determining the governing equation, which includes identifying the occurred (or dominated) processes and selecting the proper empirical model(s); and estimating the uncertain model parameters in the governing equation. Developing a methodology for efficiently solving these is crucial for many physical problems. When the governing equation and its solving scheme of an investigated problem is known, estimating the uncertain model parameters can be implemented by utilizing data-assimilation (inverse modeling) methods. In data-assimilation methods, the uncertain model parameters can be automatically updated to maximize the posterior probability density function. Various data-assimilation methods exist, such as the Markov chain Monte Carlo method ( , , ), the gradient-based method ( ), and the ensemble-based method ( , , , ). The prerequisite of implementing data-assimilation methods is that the model of the investigated problem that includes the governing equation and its solving scheme is known. If some of the occurred (or dominated) physical processes are not clear, the proper empirical model for a specific process cannot be determined, or the constitutive relations of some quantities are not explicitly known, the model will be vague, which will impede the implementation of data-assimilation method. For these physical problems, determining the governing equation, which includes identifying the occurred (or dominated) processes and selecting the proper empirical model, is critical. Usually, this can be accomplished by combining theoretical derivations and laboratory experiments. However, this may be time-consuming and present some technical obstacles. When spatiotemporal measurements are available, the data-driven method may provide a viable option for discovering the governing equation of a physical problem. Several recent works exist in investigating data-driven discovery of dynamical systems ( , , , , ) and partial differential equations (PDE) ( , ). In these works, sparse regression, such as sequential threshold ridge regression and the least absolute shrinkage and selection operator (LASSO), constitutes an essential technique for identifying the terms in the governing equation from a large candidate library. The Gaussian process technique is also utilized for data-driven discovery of differential equations in some works ( , ). In the extant literature about learning PDE, the general form of the investigated PDE takes the form / ( ) u t u      where u is the response of the physical problem, ( ) u  is the candidate term library, and  is the coefficient. For the physical problem considered in this work, because the empirical model usually contains model parameters in a nonlinear manner, the PDE takes the form / ( , ) u t u m      where m is the uncertain model parameter that cannot be included in  . Here, note that the empirical model is not only used for modeling a physical process, but also for modeling the constitutive relationship between variables of many physical problems. For example, for modeling flow in unsaturated porous media, empirical models are needed to model the constitutive relationships between conductivity, water content, and hydraulic head (21). Thus, this generalized form of PDE can cover a wider range of physical problems. However, such an extension cannot be resolved with the current data-driven PDE discovery approaches. In this study, in order to identify the considered physical processes from data, an innovative framework that combines the data-driven and data-assimilation method is proposed. For illustrating the proposed method, we focus here on the particular problem of contaminant solute transport in subsurface formation, while the proposed method can be easily applied to numerous other problems in different fields. This work helps to broaden the applicable area of the research of data-driven discovery of governing equations of physical problems. Results

Identification of solute transport processes

Here, we evaluate the proposed method by considering the problem of contaminant solute transport in subsurface formation. Solute transport in subsurface formation may be subject to a number of different processes ( ). For a site with solute transport, our goal is to use spatiotemporal solute concentration measurements to identify the occurred (or dominated) processes, select the proper empirical model for describing a reaction, and estimate the model parameters. This can be generalized as identifying PDE with the form: ( , ) u u mt     (1) where u denotes the response of a physical problem; ( , ) u m  denotes the library of candidate processes and empirical models;  denotes the coefficient; and m denotes the model parameters of the empirical model that cannot be included in  . For building the candidate library of the solute transport problem, three processes, advection (ADV), dispersion (DIS), and sorption (SORP), are considered in this work. Here, it is assumed that prior knowledge exists about the modeling of each considered candidate process. For the change in solute concentration with time, / C t   , where C denotes the concentration of solute in aqueous phase, one-dimensional ADV and DIS are modeled by / x v C x    and / L D C x   , respectively, where x v denotes the average linear groundwater velocity, and L D denotes the longitudinal dispersion coefficient. The model of SORP can be expressed by * ( / ) / b C t     , where b  denotes the bulk density of the aquifer,  denotes the porosity, and * C denotes the amount of solute adsorbed per unit weight of solid. Different from the ADV and DIS, the SORP may not be easily modeled by first principle derivation. Different empirical models are usually proposed (based on laboratory experiments) for modeling SORP by considering different conditions. Two equilibrium sorption models, Freundlich sorption isotherm (F-SORP) and Langmuir sorption isotherm (L-SORP), are considered here, which are expressed as * af C K C  and * ( ) / (1 ) l l C K SC K C   , respectively, where f K denotes the Freundlich constant, a denotes the Freundlich exponent, l K denotes the Langmuir constant, and S denotes the total concentration of sorption sites available. Empirical models normally contain model parameters, which are traditionally obtained by data fitting. For the considered solute transport, the candidate library can be expressed as:

1( , ) , , , (1 ) a l

C C C CC m Cx x t K C t               (2) and, ( , ) l m a K  (3) Here, note that the four terms of ( , ) C m  shown in Eq. 2 are used for denoting ADV, DIS, F-SORP, and L-SORP, respectively, and some parameters of the process models ( x v  , L D , ( / ) b f aK   , and ( / ) b l K S   ) are included in  while the nonlinear terms a C  and l K C  cannot be included in  . For proof of the concept, the data used in this work are obtained from numerical simulation. A benchmark problem of solute transport from MT3DMS software ( ) is adopted here. We briefly describe the problem setup here, and additional details can be found in the Supplementary Materials. It is a one-dimensional problem. The flow field is steady state. A solute is released at the origin for 160 s. The solute concentration is measured from 300 s to 1100 s at 101 evenly-distributed spatial locations. The measurement interval is 0.5 s. Three solute transport scenarios are designed here: scenario 1 contains the two processes of ADV and DIS; scenario 2 contains the three processes of ADV, DIS, and F-SORP; and scenario 3 contains the three processes of ADV, DIS, and L-SORP. For the three scenarios, x v and dispersivity are set as 0.01 and 1, respectively. For scenario 2, f K and a are set as 0.05 and 0.7, respectively. For scenario 3, l K and S are set as 100 and 0.003, respectively. The three scenarios are particularly designed for testing the performance of the proposed method for identifying the occurred (or dominated) processes, selecting the proper empirical model, and estimating the model parameters. MT3DMS software is utilized for running the simulations to obtain the measurement data for the three scenarios. Because m is usually uncertain, the traditional data-driven method cannot be utilized for learning the equation shown in Eq. 1. In order to learn this type of equation, a combined data-driven and data-assimilation method is proposed. For implementing the proposed method, the data are first divided into a training data set and a testing data set. In this work, the data are divided according to time sequence. The first 60% of data are used as a training data set, and the remaining 40% of data are used as a testing data set. The prior distribution of m is needed to be prescribed according to prior knowledge. Here, a is supposed to follow a uniform distribution and take values from 0.25 to 0.75, that is, [0.25, 0.75] a U  ; and l K is supposed to follow a uniform distribution and take values from 30 to 150, that is, [30, 150] l K U  . Implementing the proposed method is started with an initial sample of m. Next, the corresponding equation is learned using the training data set, and the prediction error of the learned equation for the testing data set is calculated. Then, m is updated using the data-assimilation method. Parameter updating and equation learning are then continued until a convergence criterion is met. Additional details of the proposed method can be found in the Methods section. Starting with 100 initial samples of m , we obtain 100 groups of results independently. Fig. 1 shows the results of identification of three solute transport scenarios, which include the learned coefficients of the normalized candidate terms (Figs. 1a-1c), the estimation of a (Figs. 1d-1f), and the estimation of l K (Figs. 1g-1i). Since 100 groups of results are obtained, the coefficients of normalized candidate terms are shown in a box plot. From Fig. 1a, one can see that, for scenario 1, ADV and DIS are identified to be the occurred (or dominated) processes, while the coefficients of the normalized two empirical models of SORP are close to zero, indicating that for this scenario, SORP can be ignored. From Fig. 1b, one can see that, for scenario 2, the three candidate processes are all identified to be occurred, and it is clear that F-SORP is the proper empirical model for SORP by observing that the coefficient of normalized L-SORP is close to zero. Similarly, from Fig. 1c, one can see that, for scenario 3, the three candidate processes are all identified to be occurred, and it is clear that the L-SORP is the proper empirical model for SORP by observing that the coefficient of normalized F-SORP is close to zero. These results demonstrate that the proposed method can identify the occurred (or dominated) processes and select the proper empirical model well. In the proposed method, a gradient-based data-assimilation method is utilized for updating m . If a candidate empirical model is identified to be not occurred (or dominated), the associated model parameters will have a near zero (or small) gradient, and thus there will be no (or slight) update of these parameters. In contrast, the associated model parameters will be updated if the corresponding candidate empirical model is identified to be occurred (or dominated). This is verified by the results shown in Figs. 1d-1i. Since F-SORP is only occurred in scenario 2, there is no obvious update of its model parameter, a , for scenario 1 (as shown in Fig. 1d) and scenario 3 (as shown in Fig. 1f). While, for scenario 2, the updated a is close to the true value with a mean of 0.700 and a standard deviation of 0.0031. Similarly, there is no obvious update of l K for scenario 1, and there is slight update of l K for scenario 2. While, for scenario 3, the updated l K is close to the true value with a mean of 100.349 and a standard deviation of 0.086. These results show that the proposed method can estimate the model parameters of the selected empirical model well. For identification of physical processes, the proposed method can be implemented only one time with one initial sample of m . Implementing the proposed method multiple times with different initial samples of m can assist to avoid getting stuck at the local minimums of the objective function when updating m using the data-assimilation method. By observing the results shown in Fig. 1, the final learned coefficients of the normalized candidate terms and updated model parameters of the selected empirical model from all of the implementations have small variability, which may indicate that the objective function of the considered cases only have a global minimum and all of the results converged to it. Thus, for the case considered here, it may only be necessary to implement the proposed method one time with one initial sample of m . However, if there is no prior information about the shape of the objective function with respect to model parameters, multiple implementations are recommended. For determining the learned equation, the normalized terms that have coefficients close to zero will be discarded in order to obtain a parsimonious model. The final learned equations of the three scenarios are given in Table 1. For the coefficients of the PDE terms, we use the mean of the results of all of the implementations. By comparison with the true PDE, the learned PDE has high accuracy for both the coefficients and model parameters. Other tests with different parameter setups are implemented, and the results are also satisfactory (see Supplementary Materials). The influence of data noise

In previous cases, the data utilized are clean, i.e., there is no measurement error. However, in reality, the measurement data may be associated with some noise. Here, we test the performance of the proposed method with noisy data. We synthetically add noise to data as: ( , ) ( , ) (1 )

C x t C x t e     (4) where  denotes the noise level; and e denotes the uniform random variable taking values from -1 to 1. Three noise levels, 1%, 5%, and 10%, are added to the data of the three solute transport scenarios. For ( , ) C m  shown in Eq. 2, the spatial derivatives up to order two and temporal derivative need be calculated using spatiotemporal data. Since finite difference is utilized for calculating derivatives in this work, small data noise may incur huge error in the calculated derivatives. Thus, pretreatment of the noisy data is requisite. In this work, the polynomial technique is utilized to smooth the noisy data. Additional details about the pretreatment can be found in the Supplementary Materials. Fig. 2 presents the results of the learned coefficients of the normalized terms with three data-noise levels. From the figures, it can be seen that, for scenario 1 and scenario 3, the occurred (or dominated) processes and the proper empirical model can be satisfactorily identified by pretreating the noisy data. For 10% noise, however, there are some implementations with spurious results. Moreover, for scenario 2, the learned coefficients exhibit larger variability than that of scenario 1 and scenario 3. The coefficient of the normalized L-SORP is not as small as that with clean data. By comparing the magnitude of the absolute value, the coefficient of normalized F-SORP is larger than that of L-SORP for the three noise levels. However, as the noise level becomes larger, the superiority of F-SORP becomes smaller. Since only one model is needed for SORP, F-SORP can be identified as the better empirical model for SORP under the three noise levels. The results of the updated model parameter with three data-noise levels are given in the Supplementary Materials. Overall, as noise level becomes larger, accuracy decreases. It is also seen that data noise may incur larger variability in the results from different implementations, which will increase the difficulty for physical process identification and model parameter inference. Considering that a prediction error of the learned model for the testing data will be calculated in the proposed method, the abnormal results of some implementations may be screened out via analyzing the prediction errors of the final learned equations. By screening out the abnormal results, the variability of the learned  and updated m will decrease. On the other hand, in order to improve the accuracy of the results of  and m , the proposed method may be implemented again with a modified candidate library, which only retains the identified terms from the first implementation. The learned equations with three data-noise levels are given in Table 1. Similarly, as noise level becomes larger, the accuracy of the learned equation decreases. Additional discussion and results can be found in the Supplementary Materials. Data noise constitutes a challenge for the proposed method, and further works are requisite. Learning equations using the combined method

In previous cases, concerning prior knowledge about the considered solute transport problem, it is assumed to know that there exist ADV, DIS, and SORP. In reality, the potential physical processes or how they are modeled for a physical problem may be unclear. In order to demonstrate the applicability of the proposed method to more challenging problems, here the prior information about the considered solute transport problem is relaxed as that SORP is regarded as a potential process for solute transport with two possible empirical models, while no other information exists about the other potential processes. In order to learn the governing equation, a new candidate library is designed as:

1( , ) , , , , , , , , , .(1 ) a l

C C C C C C C CC m C C Cx x x x x x t K C t                       (5) Besides the last two terms, which are used for modelling F-SORP and L-SORP, respectively, there are eight other terms utilized for determining other physical processes. The data of the three previously-designed scenarios are also utilized here for learning equation. The prior distributions of a and l K are the same as previous cases. Fig. 3 shows the results of the learned coefficients of the normalized terms and the estimated model parameters for the three scenarios. From Figs. 3a-3c, one can see that the correct terms can be selected for the three scenarios. Regarding the estimation of model parameters, one can see that there is no update of a and l K for scenario 1. The updated a for scenario 2 has a mean of 0.701 and a standard deviation of 0.022. The updated l K for scenario 3 has a mean of 101.025 and a standard deviation of 0.388. While, for some implementations, there is large update of l K for scenario 2 and a for scenario 3. This may be caused by the error of the calculated gradient. Since the empirical model can be correctly selected, this spurious update of model parameter will not influence the final learned equation. For determining the learned equation, the normalized terms that have coefficients close to zero will be discarded. In order to further improve the accuracy of the learned equation, a new candidate library can be built by deleting the unnecessary terms. Choosing the selected terms of the three scenarios, the new candidate library will be the same as that shown in Eq. 2. Implementing the proposed method again with the new candidate library, the same learned equations as those of previous cases would be obtained. From the results here, it can be seen that the proposed method can be utilized for learning the equation with the form shown in Eq. 1. Discussion

In recent works that investigate data-driven discovery of dynamical systems and PDEs, the aim is to learn one set of parameters, i.e., the  utilized in this work. For physical problem, it may contain physical process that needs to be modeled by empirical models or augmented with constitutive relations. In addition, the empirical model usually contains model parameters, m , which cannot be included in  . Thus, data-driven discovery of this kind of physical problem comprises two tasks, which are to learn  and to estimate m . Through focusing on a particular problem of contaminant solute transport in subsurface formation, the results demonstrate that the proposed method that combines the data-driven and data-assimilation methods can solve this problem well. The proposed method provides an efficient option for identifying occurred (or dominated) physical processes, selecting proper empirical model, and estimating model parameters. Generally, the proposed method can be utilized to learn the equation with the form shown in Eq. 1. Considering that empirical model is not only used for modeling a physical process, but also for modeling constitutive relationships between variables of many physical problems, this form of PDE can cover a wider range of physical problems. This study assists to broaden the applicable area of the research of data-driven discovery of governing equations of physical problems. In order to learn the appropriate terms/models (  ) and estimate the model parameters ( m ), the proposed method can be implemented only one time with one initial sample of m . Since the gradient-based data-assimilation method is utilized, multiple implementations with different initial samples of m are recommended for avoiding getting stuck at the local minimums of the objective function. For determining the final learned  and estimated m , it is helpful to screen out spurious results of some implementations based on analyzing the prediction errors. In this work, for case studies, 100 implementations are performed to test the performance of the proposed method, while it is not necessary to perform this large number of implementations. The results demonstrate that a small number of implementations may be sufficient for determining the learned equation. Traditionally, implementing the data-assimilation method requires forward simulations for predicting model responses with respect to given model parameters. In this work, however, the prediction error introduced in the proposed data-assimilation method is calculated using a testing data set, and no numerical simulations of the forward problem are revolved. Thus, the data-assimilation method utilized in this work can also be understood as a data-driven method. Since spatial and temporal derivatives need to be calculated in the proposed method, data noise presents a challenge. The results show that, for larger noise, the accuracy of the learned  and estimated m will decrease. After obtaining the learned PDE, in order to improve the accuracy of the coefficients and parameters of the PDE terms, a numerical scheme of the learned PDE may be developed. Moreover, a data-assimilation method that uses the developed numerical scheme for performing forward simulations may be implemented to further update the coefficients and parameters of the PDE terms. Combing the data-driven and data-assimilation methods constitutes a viable strategy for solving the discussed problem. Besides the methods utilized in this work, other data-driven and data-assimilation methods may be used as alternatives. Especially, for data-driven methods, sparsity constrained methods, such as LASSO, may be employed to replace the least square regression that is utilized in this work to improve the performance of the proposed method. Some physical problems are described by coupled equations, where empirical models of constitutive relationships can be included as supplementary equations. The proposed method is amenable to these kinds of problems. Methods

Identifying the kinds of physical processes discussed in this work can be understood as learning the PDE with the form shown in Eq. 1. In order to achieve this, spatiotemporal measurement data are divided into a training data set with number train n and a testing data set with number test n . For the training data set, Eq. 1 can be rewritten as: ( , ) U U mt     (6) where [ , , ..., ] train Tn U u u u  ; and i u denotes the i th training data. For any m , the corresponding coefficient  is learned by least square regression as:   = ( ) = ( , ) ( , ) ( , ) . TT T

Um U m U m U m t       (7) For the learned equation, a prediction error  is calculated using the testing data set as: ( ) ( , ) ( ) . test n n nn um u m mt             (8) In order to update m , an objective function is defined as: obs T obs pr T prM O m m C m m m C m m              (9) where obs  denotes the observed (or target) prediction error; pr m denotes the initial sample of m ; C  denotes the covariance of the prediction error; and M C denotes the covariance of m . Supposing that no error will be incurred in calculating the partial derivatives in Eq. 8, the true PDE will have no prediction error, that is, obs   . The data-assimilation method utilized in this work for updating m takes the form ( ):        l l T T prM M l l l M l l M M llT T obsM l l l M l l m m C C G C G C G G C C m mC G C G C G m                     (10) where l denotes the iteration index;  denotes the multiplier; and G denotes the gradient. In this work, G is calculated by finite difference. In summary, starting with one initial sample of m ,  will be learned and  will be calculated. Then, the data-assimilation method can iteratively update m to minimize an objective function. After each update of m ,  will be relearned and  recalculated. This process will continue until a convergence criterion is met. Additional details of the proposed method can be found in the Supplementary Materials. References Brunton, S. L., Proctor, J. L. & Kutz, J. N. Discovering governing equations from data by sparse identification of nonlinear dynamical systems.

PNAS . , 3932–3937 (2016). 2. Chang, H., Liao, Q. & Zhang, D. Surrogate model based iterative ensemble smoother for subsurface flow data assimilation.

Adv. Water Resour. , 96-108 (2017). Chen, Y. & Oliver, D. S. Levenberg–Marquardt forms of the iterative ensemble smoother for efficient history matching and uncertainty quantification.

Comput. Geosci. , 689–703 (2013). Efendiev, Y., Datta-Gupta, A., Ginting, V., Ma, X. & Mallick, B. An efficient two-stage Markov chain Monte Carlo method for dynamic data integration.

Water Resour. Res. , W12423 (2005). Emerick, A. A. & Reynolds, A. C. Ensemble smoother with multiple data assimilation. Computers & Geosciences. , 3–15 (2013). Evensen, G. Data Assimilation: The Ensemble Kalman Filter (Springer-Verlag, Berlin Heidelberg, 2007). Fetter, C. W. Contaminant hydrogeology (Prentice Hall, ed. 2, 1999). Gamerman, D. & Lopes, H. F. Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference (Taylor and Francis, Boca Raton, ed. 2, 2006). Gu, Y. & Oliver, D. S. An iterative ensemble Kalman filter for multiphase fluid flow data assimilation.

SPE J. , 438–446 (2007). Haario, H., Laine, M., Mira, A. & Saksman, E. DRAM: Efficient adaptive MCMC.

Stat. Comput. , 339–35 (2006). Li, S., Li, X. & Zhang, D. A fully coupled thermo-hydro-mechanical, three-dimensional model for hydraulic stimulation treatments.

J. Nat. Gas Sci. Eng.

34, 64-84 (2016).

Mangan, N. M., Kutz, J. N., Brunton, S. L. & Proctor, J. L. Model selection for dynamical systems via sparse regression and information criteria.

Proc. R. Soc. A. , 20170009 (2017). 13.

Oliver, D. S., Reynolds, A. C. & Liu, N. Inverse Theory for Petroleum Reservoir Characterization and History Matching (Cambridge University Press, New York, 2008). 14.

Quade, M., Abel, M., Kutz, J. N. & Brunton, S. L. Sparse identification of nonlinear dynamics for rapid model recovery.

Chaos . , 063116 (2018). Raissi, M. & Karniadakis, G. E. Hidden physics models: Machine learning of nonlinear partial differential equations.

J. Comput. Phys. , 125–141 (2018).

Raissia, M., Perdikarisb, P. & Karniadakisa, G. E. Machine learning of linear differential equations using Gaussian processes.

J. Comput. Phys. , 683–693 (2017).

Rudy, S. H., Brunton, S. L., Proctor, J. L. & Kutz, J. N. Data-driven discovery of partial differential equations.

Sci. Adv. , e1602614 (2017). Schaeffer, H. Learning partial differential equation via data discovery and sparse optimisation.

Proc. R. Soc. A. , 20160446 (2017).

Schaeffer, H. & McCalla, S. G. Sparse model selection via integral terms.

Phys. Rev. E . , 023302 (2017). Tran, G. & Ward, R. Exact Recovery of Chaotic Systems from Highly Corrupted Data.

Multiscale Model. Simul. , 1108-1129 (2017). van Genuchten, M. T. A closed-form equation for predicting the hydraulic conductivity of unsaturated soils. Soil Sci. Soc. Am. J. , 892-898 (1980). Vishal, V. & Singh, T. N. Geologic Carbon Sequestration: Understanding Reservoir Behavior (Springer International Publishing Switzerland, 2016). 23.

Zheng, C. & Wang, P. P. MT3DMS, A modular three-dimensional multi-species transport model for simulation of advection, dispersion and chemical reactions of contaminants in groundwater systems; documentation and user’s guide. U.S. Army Engineer Research and Development Center Contract Report SERDP-99-1, Vicksburg, MS, 202 p. (1999).

Acknowledgments

This work is partially funded by the National Natural Science Foundation of China (Grant No. U1663208 and 51520105005), and the National Science and Technology Major Project of China (Grant No. 2017ZX05009-005 and 2016ZX05037-003). Figures and Tables -101 c o e ff i c i e n t ADV DIS F-SORP L-SORP (a) -101

ADV DIS F-SORP L-SORP (b) c o e ff i c i e n t -101 ADV DIS F-SORP L-SORP (c) c o e ff i c i e n t (d) Freundlich exponent, a upd a t e d initial (e) Freundlich exponent, a upd a t e d initial (f) Freundlich exponent, a upd a t e d initial

40 80 120 1604080120160 (g)

Langmuir constant, K l upd a t e d initial

40 80 120 1604080120160 (h)

Langmuir constant, K l upd a t e d initial

40 80 120 1604080120160 (i)

Langmuir constant, K l upd a t e d initial Fig. 1. The results of identification of three solute transport scenarios.

The learned coefficients of normalized terms of scenario 1 ( a ), scenario 2 ( b ), and scenario 3 ( c ). The estimation of Freundlich exponent of scenario 1 ( d ), scenario 2 ( e ), and scenario 3 ( f ). The estimation of Langmuir constant of scenario 1 ( g ), scenario 2 ( h ), and scenario 3 ( i ). -101

1% noise

ADV DIS F-SORP L-SORP (a) c o e ff i c i e n t -101

1% noise (b)

ADV DIS F-SORP L-SORP c o e ff i c i e n t -101

1% noise (c)

ADV DIS F-SORP L-SORP c o e ff i c i e n t -101

5% noise

ADV DIS F-SORP L-SORP (d) c o e ff i c i e n t -101

5% noise (e)

ADV DIS F-SORP L-SORP c o e ff i c i e n t -101

5% noise (f)

ADV DIS F-SORP L-SORP c o e ff i c i e n t -101

10% noise

ADV DIS F-SORP L-SORP (g) c o e ff i c i e n t -101

10% noise (h)

ADV DIS F-SORP L-SORP c o e ff i c i e n t -101

10% noise (i)

ADV DIS F-SORP L-SORP c o e ff i c i e n t Fig. 2. The learned coefficients of normalized terms with three data-noise levels.

1 2 3 4 5 6 7 8 9 10 Cx  Cx  -101

1 2 3 4 5 6 7 8 9 10 Cx  Cx  (b) c o e ff i c i e n t a CC t   -101

1 2 3 4 5 6 7 8 9 10 l CK C t   Cx  Cx  (c) c o e ff i c i e n t (d) Freundlich exponent, a upd a t e d initial (e) Freundlich exponent, a upd a t e d initial (f) Freundlich exponent, a upd a t e d initial

40 80 120 1604080120160 (g)

Langmuir constant, K l upd a t e d initial

40 80 120 1604080120160 (h)

Langmuir constant, K l upd a t e d initial

40 80 120 1604080120160 (i)

Langmuir constant, K l upd a t e d initial Fig. 3. The results of learning equations of three solute transport scenarios.

The learned equations of solute transport.

Physical processes Learned equation

ADV and DIS

C C Ct x x       

C C Ct x x        (clean data)

C C Ct x x        (1% noise)

C C Ct x x        (5% noise)

C C Ct x x        (10% noise)

ADV, DIS, and F-SORP

C C Ct x xCC t          C C C CCt x x t            (clean data) C C C CCt x x t            (1% noise) C C C CCt x x t            (5% noise) C C C CCt x x t            (10% noise) ADV, DIS, and L-SORP

C C Ct x xCt        

C C C Ct x x C t            (clean data)

C C C Ct x x C t            (1% noise)

C C C Ct x x C t            (5% noise)

C C C Ct x x C t            (10% noise) Supplementary Information for

Identification of physical processes via combined data-driven and data-assimilation methods

Haibin Chang and Dongxiao Zhang * *

Corresponding author. Email: [email protected]

This PDF file includes:

Section S1.

Methods

Section S2.

Cases

Fig. S1.

The results of identification of scenario 2 and scenario 3 with different values for K f and K l , respectively. Fig. S2.

The results of identification of scenario 2 and scenario 3 with different value for v x . Fig. S3.

The estimation of Freundlich exponent for the three scenarios with three data-noise levels.

Fig. S4.

The estimation of Langmuir constant for the three scenarios with three data-noise levels.

Fig. S5.

The prediction error and the learned coefficients of PDE terms for the three scenarios with clean data.

Fig. S6.

The prediction error and the learned coefficients of normalized terms for scenario 2 with three data-noise levels.

Fig. S7.

The results of identification of scenario 2 with a modified candidate library.

Table S1.

The learned equations of scenario 2 and scenario 3 with different values for K f and K l , respectively. Table S2.

The learned equations of scenario 2 and scenario 3 with different value for v x . Table S3.

The mean and standard deviation of the coefficients of the PDE terms. Section S1. Methods

Identification of the physical processes considered can be understood as identifying the partial differential equation (PDE) with the form: ( , ) u u mt     (S1) where u denotes the response of a physical problem; ( , ) u m  denotes the library of the candidate processes and empirical models;  denotes the coefficient; and m denotes the model parameters of the empirical model that cannot be included in  . In order to learn the PDE shown in Eq. S1, spatiotemporal measurement data are divided into a training data set with number train n and a testing data set with number test n . For the training data set, Eq. S1 can be rewritten as: ( , ) U U mt     (S2) where [ , , ..., ] train Tn U u u u  ; and i u denotes the i th training data. Here, U t   is a vector with a size of train n  , ( , ) U m  is a matrix with a size of train cand n n  , and  is a vector with a size of cand n  . Here, cand n is used for denoting the number of candidate terms. Because ( , ) U m  usually contains partial derivatives, calculating derivatives using spatiotemporal data is needed. In this work, finite difference is utilized for calculating derivatives. For any m , the corresponding coefficient  can be learned by least square regression as:   = ( ) = ( , ) ( , ) ( , ) . TT T

Um U m U m U m t       (S3) Here, note that for learning  , U t   and each column of ( , )

U m  are normalized to have zero mean and unit standard deviation. Speculating that if the guess of m is far away from the truth, the accuracy of the learned equation will be low. In order to evaluate the learned equation, a prediction error  is introduced, which is calculated using the testing data set as: ( ) ( , ) ( ) . test n n nn um u m mt             (S4) Here, note that for calculating  , we also need to calculate the partial derivatives contained in n u t   and ( , ) n u m  using testing data. In order to obtain an optimal m using the introduced  , an objective function is defined as: obs T obs pr T prM O m m C m m m C m m              (S5) where obs  denotes the observed (or target) prediction error; pr m denotes the initial sample of m ; C  denotes the covariance of the prediction error; and M C denotes the covariance of m . The first term on the right-hand side of Eq. S5 is for constraining  . The second term on the right-hand side of Eq. S5 is a regularization term. C  and M C  are used as weighting matrix for balancing the two terms on the right-hand side of Eq. S5. Supposing that no error will be incurred in calculating the partial derivatives in Eq. S4, the true PDE will have no prediction error, that is, obs   . Using the Gauss-Newton method for minimizing the objective function shown in Eq. S5, the iterative update formula takes the form:      

11 1 1 11 ( ) ,

T pr T obsl l M l l M l l l m m C G C G C m m G C m                 (S6) where l denotes the iteration index; and G l denotes the gradient taking value at m l . In Eq. S6, TM l l

C G C G    is the Hessian matrix without the term containing the second-order derivative of O ( m ). In order to mitigate the influence of the large data mismatch in early iterations, the Hessian can be modified by using a multiplier  to M C  in the Hessian term as:    

11 1 1 11 (1 ) ( ) .

T pr T obsl l l M l l M l l l m m C G C G C m m G C m                      (S7) Using the following equalities:     , T T TM l l M M l l M l l M

C G C G C C G C G C G G C         (S8)     . T T T TM l l l M l l M l

C G C G G C C G C G C G          (S9) Eq. S7 can be rewritten as:        l l T T prM M l l l M l l M M llT T obsM l l l M l l m m C C G C G C G G C C m mC G C G C G m                     (S10) Eq. S10 is the data-assimilation method utilized in this work. For implementing the data-assimilation method, the gradient G needs to be calculated. In this work, finite difference is utilized for calculating G as: . .. . ( ) ( ) ,2 l l i l l il i l i m m m mG m        (S11) where G l.i and m l,i denote the i th entry of G l and m l , respectively; and . l i m  denotes the perturbation of m l,i . Here, note that, for calculating G l.i , we only perturb the i th entry of m l , keeping other entries unchanged. In this work, . l i m  is set to be 1% of m l,i . After each update of m , ( ) m  will be calculated. If ( ) ( ) l l m m     , the update will be accepted and  will be reduced by a factor  . Otherwise, the update will be rejected, and the  will be increased by a factor  to repeat the current iteration. In this work,  is set to be 10. In summary, starting with one initial sample of m ,  will be learned and  will be calculated. Then, the data-assimilation method can iteratively update m to minimize an objective function. After each update of m ,  will be relearned and  will be recalculated. This process will continue until a convergence criterion is met. The convergence criterion utilized in this work is given as: (1) ( ) ( ) ( ); l l l m m m         (2) Iteration exceeds the pre-given number, MAX I . In this work, we set   and MAX I  . Derivative calculation is needed in this work both for building

U t   and ( , )

U m  (shown in Eq. S2) using the training data set and calculating ( ) m  (shown in Eq. S4) using the testing data set. In this work, the finite difference scheme for calculating derivatives takes the form: ( , ) (0.5 ( , ) 0.5 ( , )) / ,( , ) (0.5 ( , ) 0.5 ( , )) / ,( , ) ( ( , ) 2 ( , ) ( , )) / ( ) ,( , ) (0.5 ( , ) ( , ) ( , ) 0.5 ( , )) / ( ) k k ki i ii i i ii i i i i u x t u x t u x t ttu x t u x t u x t xxu x t u x t u x t u x t xxu x t u x t u x t u x t u x t xx                        . (S12) where t  and x  are the step sizes for time and space, respectively. Here, note that, to calculate the derivatives, we need the spatially or temporally nearby data. The data near the boundaries without sufficient nearby data for calculating the derivatives are not utilized for learning PDE. Since finite difference is used for calculating derivatives in this work, small data noise may incur huge error in the calculated derivatives. Thus, pretreatment of the noisy data is requisite. In this work, the polynomial technique is utilized to smooth the noisy data. The procedure is given below: 1.

For each monitoring location x , smooth the data along t according to the following procedures: a) For each , 1,..., k t t k n  , design an interval , CH CHk k t n t t n t       . Generate CH N  Chebyshev interpolation points , 1, ...,1

CH CHi t i N   . b)

For each , 1, ...,1

CH CHi t i N   , design an interval , CH LS CH LSi i t n t t n t       . Calculate the smoothed value at the Chebyshev interpolation point, ( , ) LS CHi u x t by performing a least squares regression with polynomial up to order LS N using the data inside of the designed interval. c) Calculate the smoothed value ( , ) CH k u x t by performing Chebyshev interpolation using the values ( , ), 1,...,1 LS CH CHi u x t i N   . 2.

For each monitoring step t , smooth the data along x using the procedures described in step 1. Calculate the derivatives using finite difference. 4.

Repeat the above three steps, if the calculated derivatives exhibit unreasonable fluctuation. For the investigated problem, we set CH N  and LS N  . Considering that the data density in the temporal domain and spatial domain is different, we set CH LS n n   for smoothing the data along t and set CH LS n n   for smoothing the data along x . Here, due to the fact that there are not sufficient data for smoothing near the boundaries in time and space, the derivatives near the boundary are not employed for learning PDE. If there exist lower and upper bounds for the model parameter, a transformation technique can be utilized for constraining the update of model parameter. Let m lower and m upper denote the lower and upper bounds for m , respectively. The transformed parameter, s , is defined by the following log-transformation: ln . lowerupper m ms m m      (S13) Then, the update is applied on s with gradient s G calculated as: ( )( ) , upper lowers mi i i i ii i upper loweri i i i i m m m m mG Gs m s m m            (S14) where mi G is calculated using Eq. S11. After updating s , the original model parameter m is calculated as: upper lower upper lower sm m m m m s         (S15) In this work, the lower and upper bounds for a is 0.25 and 0.75, respectively, and the lower and upper bounds for l K is 30 and 150, respectively. When implementing the proposed method, the transformation is not utilized for the first time. If the updated model parameter violates the constraint, the proposed method will be implemented again with model parameter transformation. Section S2. Cases

In this work, we evaluate the proposed method by considering the problem of contaminant solute transport in subsurface formation. A benchmark problem of solute transport from MT3DMS software ( ) is adopted here. It is one-dimensional advective-dispersive transport with nonlinear equilibrium-controlled sorption. The flow field is steady state. The initial and boundary conditions of the transport model are: ( , 0) 0 f , 00,( , ) 0, 0 x C x t tCD qC t txC t tx              (S16) The values for model parameters are set as: grid spacing x cm   ; dispersivity L cm   ; porosity   ; bulk density b g cm   ; duration of source pulse t s  ; advective mass flux f qC  and concentration of source fluid C mg l    . Three solute transport scenarios are designed here: scenario 1 contains the two processes of ADV and DIS; scenario 2 contains the three processes of ADV, DIS, and F-SORP; and scenario 3 contains the three processes of ADV, DIS, and L-SORP. MT3DMS software is utilized for running the simulations to obtain the measurement data for the three scenarios. The solute concentration is measured at 101 evenly-distributed spatial locations. The concentration that is larger than mg l   is used as the data for identifying the physical processes. In order to test the robustness of the proposed method, here we design additional cases with different parameter setups. In the first group of cases, scenario 2 and scenario 3 are reinvestigated with different values for f K and l K , respectively. For scenario 2, f K is set to be 0.1 ( / )( / ) a g g l mg  , and for scenario 3, l K is set to be 60 l/mg . The values of other parameters are unchanged as: x v cm s  ; a =0.7 for scenario 2; and S g g  for scenario 3. The solute concentration is measured from 300 s to 1100 s, and the measurement interval is 0.5 s. Fig. S1 shows the learned coefficient of normalized terms and the estimation of model parameters. Table S1 presents the learned equations. Overall, the results are satisfactory. In the second group of cases, scenario 2 and scenario 3 are reinvestigated with different values for x v . We set x v cm s  for the two scenarios. The values of other parameters are unchanged as: af K g g l mg  for scenario 2; a =0.7 for scenario 2;

100 / l K l mg  for scenario 3; and S g g  for scenario 3. Considering that the solute plume will rapidly move through the investigated domain due to the larger average linear groundwater velocity, the solute concentration should be measured at a shorter period with larger frequency. Here, the solute concentration is measured from 180 s to 300 s, and the measurement interval is 0.1 s. Fig. S2 shows the learned coefficient of normalized terms and the estimation of model parameters. Table S2 presents the learned equations. For scenario 2, one can obviously see the spurious update of Langmuir constant for some implementations. However, since the empirical model can be correctly selected, this spurious update of model parameter will not influence the final learned equation. The learned equations shown in Table S2 are satisfactory. In this work, for testing the performance of the proposed method, three noise levels, 1%, 5%, and 10%, are added to the data of the three solute transport scenarios. Fig. S3 shows the estimation of Freundlich exponent for the three scenarios with three data-noise levels. Fig. S4 presents the estimation of Langmuir constant exponent for the three scenarios with three data-noise levels. For scenario 1, as the noise level increases, spurious update occurs for a small number of implementations. For scenario 2 and scenario 3, as the noise level increases, the accuracy of the updated model parameter of the empirical model that is identified to be occurred (or dominated) decreases, and more implementations show spurious update of the model parameter of the empirical model that is identified to be not occurred (or dominated).

Since the gradient-based data-assimilation method is utilized in the proposed framework, multiple implementations with different initial samples of model parameter are recommended for avoiding getting stuck at the local minimums of the objective function. Considering that a prediction error of the learned model for the testing data will be calculated in the proposed method, we can evaluate the results from different implementations based on the calculated predictions errors. Fig. S5 shows the prediction error and the learned coefficients of PDE terms for the three scenarios with clean data. It can be seen that, with clean data, the variability of the prediction error and the learned coefficients are small. Table S3 presents the mean and the standard deviation (std) of the coefficients of the PDE terms, and one can see that the coefficient of variation (which equals the standard deviation divided by the mean) of the coefficients of the PDE terms are small. As discussed in the main text, data noise may incur larger variability in the results from different implementations, which will increase the difficulty for physical process identification and model parameter inference. This can be clearly seen from the results of identification of scenario 2 with three data-noise levels, as shown in Fig. 2. For further evaluation of the results, Figs. S6a-S6c show the prediction error for scenario 2 with three data-noise levels, and one can see that the learned models from some implementations have obviously higher prediction errors than others, which may be screened out. Then, we screen out the results from some implementations that have prediction errors larger than a certain value indicated by the red dashed lines shown in Figs. S6a-S6c. The coefficient of the normalized terms of the resting implementations are shown in Figs. S6d-S6f, and one can see that the variability of the coefficients obviously decreases, which can benefit the physical process identification and model parameter inference. However, the variability of the coefficients is still not sufficiently small, especially for the results shown in Fig. S6d. In order to further improve the accuracy of the learned model, the proposed method may be implemented again with a modified candidate library, which only retains the identified terms from the first implementation. For scenario 2 with three data-noise levels, since it can be identified that F-SORP is the better empirical model for SORP, the proposed method may be implemented again by deleting the candidate term for L-SORP in the candidate library. Fig. S7 shows the results of identification of scenario 2 with a modified candidate library, and it can be seen that the variability of the learned coefficients and the updated model parameters is small, which can benefit the determination of the learned equations. For the three scenarios with different data-noise levels, the mean and the standard deviation of the coefficients of the PDE terms are given in Table S3. The results are obtained by screening out the abnormal results and implementing the proposed method again with a modified candidate library if necessary. It can be seen that the coefficients of ADV, DIS, and L-SORP usually have a small coefficient of variation. However, the coefficient of F-SORP may have a larger coefficient of variation for large data-noise levels than that of other terms. Overall, the variability of the learned equations from different implementations is small. -2-1012 (a) ADV DIS F-SORP L-SORP c o e ff i c i e n t -2-1012 (b) ADV DIS F-SORP L-SORP c o e ff i c i e n t Freundlich exponent, a (c) upd a t e d initial (d) Freundlich exponent, a upd a t e d initial

40 80 120 1604080120160 (e)

Langmuir constant, K l upd a t e d initial

40 80 120 1604080120160 (f)

Langmuir constant, K l upd a t e d initial Fig. S1. The results of identification of scenario 2 and scenario 3 with different values for K f and K l , respectively. The learned coefficient of normalized terms of scenario 2 ( a ) and scenario 3 ( b ). The estimation of Freundlich exponent of scenario 2 ( c ) and scenario 3 ( d ). The estimation of Langmuir constant of scenario 2 ( e ) and scenario 3 ( f ). -101 (a) ADV DIS F-SORP L-SORP c o e ff i c i e n t -101 (b) ADV DIS F-SORP L-SORP c o e ff i c i e n t (c) Freundlich exponent, a upd a t e d initial (d) Freundlich exponent, a upd a t e d initial

40 80 120 1604080120160 (e)

Langmuir constant, K l upd a t e d initial

40 80 120 1604080120160 (f)

Langmuir constant, K l upd a t e d initial Fig. S2. The results of identification of scenario 2 and scenario 3 with different value for v x . The learned coefficient of normalized terms of scenario 2 ( a ) and scenario 3 ( b ). The estimation of Freundlich exponent of scenario 2 ( c ) and scenario 3 ( d ). The estimation of Langmuir constant of scenario 2 ( e ) and scenario 3 ( f ). (a) Freundlich exponent, a upd a t e d initial Freundlich exponent, a (b) upd a t e d initial (c) Freundlich exponent, a upd a t e d initial Freundlich exponent, a (d) upd a t e d initial Freundlich exponent, a (e) upd a t e d initial (f) Freundlich exponent, a upd a t e d initial (g) Freundlich exponent, a upd a t e d initial Freundlich exponent, a (h) upd a t e d initial (i) Freundlich exponent, a upd a t e d initial Fig. S3. The estimation of Freundlich exponent for the three scenarios with three data-noise levels.

40 80 120 1604080120160 (a)

Langmuir constant, K l upd a t e d initial

40 80 120 1604080120160

Langmuir constant, K l (b) upd a t e d initial

40 80 120 1604080120160 (c)

Langmuir constant, K l upd a t e d initial

40 80 120 1604080120160 (d)

Langmuir constant, K l upd a t e d initial

40 80 120 1604080120160

Langmuir constant, K l (e) upd a t e d initial

40 80 120 1604080120160 (f)

Langmuir constant, K l upd a t e d initial

40 80 120 1604080120160 (g)

Langmuir constant, K l upd a t e d initial

40 80 120 1604080120160

Langmuir constant, K l (h) upd a t e d initial

40 80 120 1604080120160 (i)

Langmuir constant, K l upd a t e d initial Fig. S4. The estimation of Langmuir constant for the three scenarios with three data-noise levels.

With 1% noise, the results of scenario 1 ( a ), scenario 2 ( b ), and scenario 3 ( c ). With 5% noise, the results of scenario 1 ( d ), scenario 2 ( e ), and scenario 3 ( f ). With 10% noise, the results of scenario 1 ( g ), scenario 2 ( h ), and scenario 3 ( i ). ADV and DIS p r e d i c ti on e rr o r realization (a) (b) ADV, DIS, and F-SORP p r e d i c ti on e rr o r realization (c) ADV, DIS, and L-SORP p r e d i c ti on e rr o r realization -0.00994-0.00992-0.00990 ADV and DIS c o e ff i c i e n t o f P D E t e r m ADV (d) -0.0102-0.0099-0.0096

ADV, DIS, and F-SORP (e) c o e ff i c i e n t o f P D E t e r m ADV -0.00993-0.00990-0.00987-0.00984

ADV, DIS, and L-SORP (f) c o e ff i c i e n t o f P D E t e r m ADV

ADV and DIS (g) c o e ff i c i e n t o f P D E t e r m DIS

ADV, DIS, and F-SORP (h) c o e ff i c i e n t o f P D E t e r m DIS

ADV, DIS, and L-SORP (i) c o e ff i c i e n t o f P D E t e r m DIS -0.16-0.14-0.12

ADV, DIS, and F-SORP (j) c o e ff i c i e n t o f P D E t e r m F-SORP -1.278-1.277-1.276-1.275

ADV, DIS, and L-SORP (k) c o e ff i c i e n t o f P D E t e r m L-SORP

Fig. S5. The prediction error and the learned coefficients of PDE terms for the three scenarios with clean data.

The prediction error of scenario 1 ( a ), scenario 2 ( b ), and scenario 3 ( c ). The learned coefficient of ADV of scenario 1 ( d ), scenario 2 ( e ), and scenario 3 ( f ). The learned coefficient of DIS of scenario 1 ( g ), scenario 2 ( h ), and scenario 3 ( i ). The learned coefficient of F-SORP of scenario 2 ( j ). The learned coefficient of L-SORP of scenario 3 ( k ). (a)

1% noise p r e d i c ti on e rr o r realization (b)

5% noise p r e d i c ti on e rr o r realization (c)

10% noise p r e d i c ti on e rr o r realization -101

1% noise (d)

ADV DIS F-SORP L-SORP c o e ff i c i e n t -101

5% noise (e)

ADV DIS F-SORP L-SORP c o e ff i c i e n t -101

10% noise (f)

ADV DIS F-SORP L-SORP c o e ff i c i e n t Fig. S6. The prediction error and the learned coefficients of normalized terms for scenario 2 with three data-noise levels.

The prediction error with 1% noise ( a ), 5% noise ( b ), and 10% noise ( c ). The learned coefficient of normalized terms with 1% noise ( d ), 5% noise ( e ), and 10% noise ( f ). -101

1% noise (a)

ADV DIS F-SORP c o e ff i c i e n t -101

5% noise (b)

ADV DIS F-SORP c o e ff i c i e n t -101

10% noise (c)

ADV DIS F-SORP c o e ff i c i e n t

1% noise

Freundlich exponent, a (d) upd a t e d initial

5% noise (e)

Freundlich exponent, a upd a t e d initial

10% noise (f)

Freundlich exponent, a upd a t e d initial Fig. S7. The results of identification of scenario 2 with a modified candidate library.

The learned coefficients of normalized terms with 1% noise ( a ), 5% noise ( b ), and 10% noise ( c ). The estimation of Freundlich exponent with 1% noise ( d ), 5% noise ( e ), and 10% noise ( f ). Table S1.

The learned equations of scenario 2 and scenario 3 with different values for K f and K l , respectively. Physical processes Learned equation

ADV, DIS, and F-SORP

C C Ct x xCC t          C C C CCt x x t            (clean data) ADV, DIS, and L-SORP

C C Ct x x CtC         

C C C Ct x x C t            (clean data)

Table S2.

The learned equations of scenario 2 and scenario 3 with different value for v x . Physical processes Learned equation

ADV, DIS, and F-SORP

C C Ct x xCC t          C C C CCt x x t            (clean data) ADV, DIS, and L-SORP

C C Ct x x CtC         

C C C Ct x x C t            (clean data) Table S3.

The mean and standard deviation of the coefficients of the PDE terms.

Physical processes

Data quality

Coefficient of ADV

Coefficient of DIS

Coefficient of F-SORP

Coefficient of L-SORP mean std mean std mean std mean std ADV and DIS clean data -9.93E-3 9.33E-6 9.97E-3 8.25E-6 1% noise -9.97E-3 1.39E-5 1.002E-2 1.34E-5 5% noise -9.97E-3 1.56E-5 1.006E-2 1.73E-5 10% noise -9.91E-3 4.81E-5 1.004E-2 5.75E-5

ADV, DIS, and F-SORP clean data -9.89E-3 9.75E-5 9.92E-3 9.79E-5 -1.49E-1 6.58E-3 1% noise -9.67E-3 4.03E-5 9.68E-3 4.24E-5 -1.38E-1 2.27E-3 5% noise -8.51E-3 6.33E-5 8.45E-3 6.82E-5 -7.91E-2 3.13E-3 10% noise -7.50E-3 4.05E-5 7.26E-3 4.49E-5 -3.68E-2 1.62E-3

ADV, DIS, and L-SORP clean data -9.89E-3 1.28E-5 9.92E-3 1.29E-5 -1.277E0 6.11E-4 1% noise -9.76E-3 6.03E-5 9.83E-3 6.11E-5 -1.203E0 4.53E-3 5% noise -9.21E-3 5.09E-5 9.29E-3 5.23E-5 -1.124E0 6.33E-3 10% noise -8.57E-3 7.88E-5 8.63E-3 8.27E-5 -1.015E0 8.61E-3clean data -9.89E-3 1.28E-5 9.92E-3 1.29E-5 -1.277E0 6.11E-4 1% noise -9.76E-3 6.03E-5 9.83E-3 6.11E-5 -1.203E0 4.53E-3 5% noise -9.21E-3 5.09E-5 9.29E-3 5.23E-5 -1.124E0 6.33E-3 10% noise -8.57E-3 7.88E-5 8.63E-3 8.27E-5 -1.015E0 8.61E-3