Efficient treatment of model discrepancy by Gaussian Processes - Importance for imbalanced multiple constraint inversions
Thomas Wutzler, Max Planck Institute for Biogeochemistry, Hans-Knöll-Straße 10, 07745 Jena, Germany. E-mail: [email protected]
December 2018
Abstract.
Mechanistic simulation models are inverted against observations in order to gain inference on modeled processes. However, with the increasing ability to collect high-resolution observations, these observations represent more patterns of detailed processes that are not part of a modeling purpose. This mismatch results in model discrepancies, i.e. systematic differences between observations and model predictions. When discrepancies are not accounted for properly, posterior uncertainty is underestimated. Furthermore, parameters are inferred such that model discrepancies appear with observation data streams with few records instead of the data streams corresponding to the weak model parts. This impedes the identification of weak process formulations that need to be improved. Therefore, we developed an efficient formulation to account for model discrepancy by the statistical model of Gaussian processes (GP). This paper presents a new Bayesian sampling scheme for model parameters and discrepancies, explains the effects of its application on inference by a basic example, and demonstrates applicability to a real-world model-data integration study. The GP approach correctly identified model discrepancy in rich data streams. Innovations in sampling allowed successful application to observation data streams of several thousand records. Moreover, the proposed new formulation could be combined with gradient-based optimization. As a consequence, model inversion studies should acknowledge model discrepancies, especially when using multiple imbalanced data streams. To this end, studies can use the proposed GP approach to improve inference on model parameters and modeled processes.
Keywords: Bayesian, multiple constraint, Gaussian process, model discrepancy, imbalanced data streams
Submitted to:
Inverse Problems
1. Introduction
The increased availability of high-resolution observation data streams in many sciences (Luo et al., 2011) allows constructing and evaluating detailed process models (Keenan et al., 2011). Often, the data supports higher detail than required by the modeling purpose. Mismatch in detail between observations and model leads to model discrepancy: systematic differences between the observed process and the prediction of a calibrated model. Model discrepancy is often ignored in model-data integration studies, or it is treated as part of a Gaussian residual term (Tarantola, 2005). Yet, model discrepancies cause correlations in model-data residuals that are not accounted for by correlations in observations. Such unaccounted correlations lead to overconfident estimates of parameters and predictions (Weston et al., 2014). They can also lead to biased parameter estimates, especially when used with imbalanced data streams, i.e. streams that strongly differ in their number of records (Wutzler and Carvalhais, 2014). A treatment of model discrepancy, hence, is important for model calibration.

Researchers in numerical weather prediction early acknowledged the importance of accounting for correlations in model-data residuals and developed methods for estimating those correlations (Bormann et al., 2003; Healy and White, 2005; Weston et al., 2014). If a background, i.e. a previous ensemble of model predictions, is available and certain conditions are fulfilled, the covariance matrix can be estimated using both the predictions based on the background and the predictions based on the posterior sample (Desroziers et al., 2005; Waller et al., 2014). However, it is hard for this approach to account for trade-offs among imbalanced data streams, because all ensemble members are based on the same estimate of the covariance matrix.
An approach that is suitable for trade-offs should allow discrepancies and correlation matrices to differ across parameter samples. A promising alternative approach is to represent discrepancy explicitly as a statistical distribution called a Gaussian process (GP) (Kennedy and O'Hagan, 2001; Brynjarsdóttir and O'Hagan, 2014), which is further described in section 2.1.

Adding additional constraints or data streams of various types holds the promise of constraining different aspects of the model (Richardson et al., 2010; Ahrens et al., 2014). The addition of a sparse data stream, i.e. a data set of relatively few records, has, however, only negligible influence on the model fit if no additional weights are applied. Model discrepancy is allocated preferentially to sparse data streams. This preferential allocation leads to complications in identifying model deficiencies in studies using imbalanced data streams, i.e. streams that strongly differ in their number of records (Wutzler and Carvalhais, 2014).

The objectives of this study center around a better allocation of model discrepancy and improved inference on modeled processes in inversion studies using imbalanced data streams. The objectives are to

• demonstrate the problem of preferential allocation of model discrepancy to sparse data streams,
• present an efficient sampling formulation where model discrepancy is represented by a GP,
• demonstrate the ability of the GP approach to better allocate discrepancy, helping to identify which modeled processes need improvement,
• demonstrate the applicability of the GP approach to real-world problems involving rich data streams.

This study focuses on deterministic models and parameter estimation by batch data assimilation (Zobitz et al., 2011). The initial state is assumed to be given or part of the vector of parameters to estimate.
For each proposed parameter vector, there is a unique model prediction for each observation. In addition, the study focuses on approaches that combine evidence from all data streams into a scalar-valued objective function. The data stream likelihoods are combined by their product, i.e. saying how likely the first constraint "AND" the other constraints are, given a set of parameters. This approach differs from multi-objective optimization, where the trade-offs are explored by using a vector-valued objective function, and the user has to specify additional information on how to select among different trade-offs (Miettinen, 1999). The Bayesian approach employed in this study corresponds to using the information content of the different streams to determine the location on the Pareto front in case of trade-offs.

The paper is structured as follows. The remainder of this section visually introduces the approach of explicitly representing model discrepancy via a GP and how this approach affects posterior predictions. Section 2 gives a mathematical introduction to the GP approach and introduces both the basic example and the real-world ecosystem case used in the remainder of the paper. Section 3.1 demonstrates the effects of using the GP approach on model inference with imbalanced data streams, by using a basic example with two scenarios: one that ignores model discrepancies and another that explicitly models discrepancy as a GP. Section 3.2 briefly demonstrates the applicability of the GP approach to a real-world inversion problem, the Dalec-Howland case. Section 4 discusses how several problems were tackled and what we learned.

The concept of representing model discrepancy as a GP is visualized in Figure 1, which displays two examples of model predictions and model discrepancies against observations. The surrogate process of the example, i.e. the process generating the synthetic data, consists of a linear part plus an oscillating part.
The model only accounts for the linear part, but is used to gain inference on the slope. Model discrepancy constitutes the oscillating part. It is treated as a smooth sequence of values at observation locations, x, and is represented as a GP. It is extremely large here for visualization purposes. Its uncertainty is reflected by several samples from the GP (several squiggly lines in Figure 1), conditioned on observed model-observation differences at some supporting locations (triangles in Figure 1). Note that differences between observations and process predictions, i.e. model plus discrepancy, are similar across predictions by different model parameters (top and bottom panels).

Figure 1. Process predictions (squiggly lines) that approximate observations (dots) are the sum of a model prediction, g (straight line), and several realizations of model discrepancy δ: o ∼ N(g(θ) + δ, σ²_ε). Process predictions (top and bottom panels) obtained with different parameters (model parameters θ = (intercept, slope)ᵀ and correlation length ψ) are more similar than the corresponding model predictions. The respective two Gaussian process (GP) models of discrepancies have been trained on a subset of data at supporting locations indicated by triangles.

The GP approach affects inference mainly by an increased estimate of prediction uncertainty (Figure 2). The inversion with the ignore scenario, which does not acknowledge correlation among model-data residuals, overestimates the information content of the observations and hence overestimates the precision of posterior estimates. In contrast, the GP approach strongly reduces correlations among residuals between observations and process predictions, i.e. model predictions plus discrepancies. At the same time, it yields a similar likelihood for a broader range of model parameters (compare rows in Figure 1), and hence increases the estimate of uncertainty.
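The squiggly lines in Figure 1 are draws from a GP conditioned on model-data residuals at the supporting locations. A minimal NumPy sketch of this idea follows; the function names, covariate values, and the sine-shaped residuals are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

def se_kernel(xa, xb, sigma_d=1.0, psi=1.0):
    """Squared-exponential covariance between 1-D location vectors xa and xb."""
    d = xa[:, None] - xb[None, :]
    return sigma_d**2 * np.exp(-(d / psi) ** 2)

def sample_discrepancy(x, xs, resid_s, sigma_d, psi, sigma_eps, n_draws=3, seed=0):
    """Draw GP discrepancy realizations at locations x, conditioned on
    model-data residuals resid_s observed at supporting locations xs."""
    rng = np.random.default_rng(seed)
    # covariance among supporting points, with observation-noise nugget
    Kss = se_kernel(xs, xs, sigma_d, psi) + sigma_eps**2 * np.eye(len(xs))
    Kxs = se_kernel(x, xs, sigma_d, psi)
    Kxx = se_kernel(x, x, sigma_d, psi)
    A = Kxs @ np.linalg.inv(Kss)
    mean = A @ resid_s                       # conditional mean of delta at x
    cov = Kxx - A @ Kxs.T                    # conditional covariance
    cov += 1e-6 * np.eye(len(x))             # jitter for numerical stability
    return rng.multivariate_normal(mean, cov, n_draws)

# toy residuals: an oscillating part the linear model cannot capture
xs = np.linspace(0.0, 10.0, 6)               # supporting locations (triangles)
x = np.linspace(0.0, 10.0, 50)               # all observation locations
draws = sample_discrepancy(x, xs, np.sin(xs), sigma_d=1.0, psi=2.0, sigma_eps=0.1)
```

Each row of `draws` corresponds to one squiggly discrepancy realization; adding it to the linear model prediction yields one process prediction.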
2. Methods
The vector of observations o of one data stream is modelled as the sum of the deterministic model prediction g, an unknown smooth correlated vector of model discrepancy δ, and observation error ε (Kennedy and O'Hagan, 2001):

o = g(θ) + δ + ε,    (1a)
ε ∼ N(0, Σ_ε),    (1b)
δ ∼ GP(0, K).    (1c)

Figure 2. Uncertainty estimates of predictions change when accounting for correlations due to model discrepancy. Sampling the linear part of the example data (dots, same as in Figure 1) underestimated prediction uncertainty when ignoring model discrepancy (left), as denoted by the narrow shaded 95% confidence band. Representing discrepancy as a Gaussian process (GP) (right) increased the estimate of uncertainty. In addition, the representation allowed inference on process predictions (g_i(θ) + δ_i), as denoted by the 95% confidence interval whiskers.

The model prediction g depends on the parameter vector θ, whose distribution is to be estimated. The model can be a simple regression function or a complex deterministic dynamical simulation model. The observation error ε is assumed to be Gaussian noise, often with the additional assumption of independent errors: Σ_ε = σ²_ε I. The unknown vector of model discrepancy is modelled as a GP. It accounts for the additional correlations in model-data residuals due to model discrepancy. Here, the GP uses a squared exponential covariance function, K(x_p, x_q) = σ²_d exp(−(x_p − x_q)²/ψ²) + σ²_ε δ_pq, where the Kronecker delta δ_pq equals one for p = q and zero otherwise. The covariance function has two hyperparameters: the correlation length ψ, which controls the smoothness of the model discrepancy across neighboring locations, and the signal variance σ²_d, which controls the amplitude of the discrepancies varying around the expected value of zero (Rasmussen and Williams, 2006). The approach is also applicable for different assumptions about the measurement error or the covariance structure of the model discrepancies.

When dealing with multiple data streams, the different units of the model discrepancy variance σ²_d of different streams hinder the comparison of model error. For example, one stream may report weights of soil carbon and the other fluxes of carbon dioxide per day. Here, we propose normalizing the discrepancy variance of stream k by the mean variance of observation uncertainty:

σ²_k = σ²_d,k / σ̄²_ε,k.    (2)

The unitless quantity σ²_k allows comparing model discrepancy across streams.
For example, one can state that the model discrepancy of one stream is double the model discrepancy of another stream, although the streams report different quantities.

The parameters to estimate comprise the vector of model parameters, θ, and, for each data stream k, the two scalar hyperparameters σ²_k and ψ_k. The variance of observation uncertainty, σ²_ε, is assumed to be known in this study. This section motivates several choices of the sampling strategy, while Appendix A reports the details of the sampling. Sampling used the Block-at-a-Time Metropolis-Hastings algorithm (Hastings, 1970; Chib and Greenberg, 1995; Gelman et al., 2003). Each signal variance, σ²_k, was sampled from an inverse Gamma distribution. All other parameters were sampled by Metropolis steps. A Metropolis step can be used to obtain a sample from a random variable whose probability density can be evaluated up to a normalizing constant (Metropolis et al., 1953; Hastings, 1970; Andrieu et al., 2003). One Metropolis step was applied for each hyperparameter ψ_k and another Metropolis step was applied for the model parameter vector θ. Proposals of new parameter vectors were generated using the differential evolution Markov chain algorithm (ter Braak and Vrugt, 2008).

We used a subset of observation locations, called the supporting locations (triangles in Figure 1), for training the GP model of discrepancies. The usage of a subset improved computational efficiency and prevented numerical singularity, because the resulting matrices were smaller and less prone to becoming numerically singular (Brynjarsdóttir and O'Hagan, 2014).

In addition, we treated model discrepancies at supporting locations, δ_s, as latent variables instead of including them in the set of estimated parameters, for two reasons. First, the spacing of the supporting points should adapt to the correlation length to avoid numerical instabilities, and hence the number and location of these supporting locations varied.
Second, this treatment diminished the problem of a high rejection rate in the Metropolis step and the slow mixing of the parameters, which occur with sampling that is conditioned on discrepancies (Brynjarsdóttir and O'Hagan, 2014). Furthermore, the resulting formulation can also be used with non-sampling-based optimization methods, e.g. gradient-based methods that require a smaller number of model evaluations.

A basic synthetic example demonstrates the problem with imbalanced multiple data streams. Consider the chemical reaction that oxidizes organic carbon in soil and evolves carbon dioxide (CO₂). The CO₂ production rate was monitored over a week, i.e. roughly 1000 times, and found to increase with temperature. For the same site, the content of organic matter was measured each September in 10 years, together with the cumulative amount of CO₂ produced over two weeks. The cumulative CO₂ production increased with organic matter content, i.e. the amount of available reactant.

We modeled the corresponding two data streams o_rich (n_rich = 1000) and o_sparse (n_sparse = 10) of such a system by a surrogate physical process (Reich and Cotter, 2015). The covariates of organic matter content (x_sparse) and temperature (x_rich) were drawn from uniform distributions.
The two data streams were modeled with parameters a, b, and c as

ô_sparse(a, b) = a x_sparse + b x̄_rich,    (3a)
ô_rich(a, b, c) = a x_1,sparse + b (x_rich − c).    (3b)

The example is set up in a way that the spread in the prediction of the sparse data stream, ô_sparse, mainly depends on the covariates x_sparse. This is because the process description (3a) uses only a single aggregated value, x̄_rich, of the covariates of the rich data stream, specifically the average September temperature. The spread in the rich data stream, ô_rich, on the other hand, mainly depends on the covariates x_rich. This is because the process description (3b) uses only a single value, x_1,sparse, of the covariates of the sparse data stream, specifically the organic matter content in the year of the measurement campaign.

The surrogate physical process, o*, was generated by running the model with parameters a* = 1 and b* = 2, and bias variable c* = 0.3. Next, observations, o, were generated by adding Gaussian noise to o* with standard deviations of 4% and 3% of the means of o*_sparse and o*_rich, respectively.

Using the generated observations and covariates, the posterior densities of parameters a and b were sampled under different scenarios of model inversion. In order to demonstrate the transfer of model uncertainty, all scenarios used a model that slightly differed from the data-generating model. Specifically, they used a value of c = 0.1 for the fixed bias parameter in the process description of the rich data stream (3b), instead of c* = 0.3. This bias parameter represents a difference between the measured air temperature and the top-soil temperature at the site of respiration.
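The surrogate data generation described above can be sketched in a few lines. This is a hedged reconstruction: the covariate ranges and the exact functional forms of (3a) and (3b) are illustrative assumptions, since only the parameter values a* = 1, b* = 2, c* = 0.3 and the noise levels are stated explicitly.

```python
import numpy as np

rng = np.random.default_rng(42)

# stream sizes as described in Section 2.4
n_rich, n_sparse = 1000, 10

# covariates: ranges are illustrative assumptions, not the paper's exact values
x_rich = rng.uniform(0.0, 1.0, n_rich)       # temperature covariate
x_sparse = rng.uniform(0.5, 1.5, n_sparse)   # organic matter covariate

# true process parameters and bias variable
a_true, b_true, c_true = 1.0, 2.0, 0.3

# process predictions, cf. equations (3a) and (3b):
# the sparse stream sees only the aggregated rich covariate (its mean),
# the rich stream sees only a single sparse covariate value
o_sparse_true = a_true * x_sparse + b_true * np.mean(x_rich)
o_rich_true = a_true * x_sparse[0] + b_true * (x_rich - c_true)

# observations: Gaussian noise with sd of 4% / 3% of the stream means
o_sparse = o_sparse_true + rng.normal(0, 0.04 * np.abs(o_sparse_true.mean()), n_sparse)
o_rich = o_rich_true + rng.normal(0, 0.03 * np.abs(o_rich_true.mean()), n_rich)
```

The inversion scenarios would then fit a and b to `o_sparse` and `o_rich` while fixing the bias parameter at the deliberately wrong value c = 0.1.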
The real-world example inversely estimated 15 parameters and initial conditions of a process-based ecosystem model of carbon dynamics, the Data Assimilation Linked Ecosystem Carbon (DALEC) model (Williams et al., 2005). Observations comprised 10 years of daily eddy-covariance-based net ecosystem exchange (NEE) (Hollinger et al., 2004; Hollinger and Richardson, 2005), soil respiration, and litterfall at the Howland forest. While the data streams of NEE and respiration had about 2000 observations, litterfall had only one record for each of the 10 years. For demonstration purposes, here, we used estimates of observation uncertainties that were reduced by 25% compared to the original estimates for each data stream. Original estimates of observation error were specified as a function that increases with the magnitude of the fluxes. We applied those functions to the model prediction instead of the observed value in order to prevent a preferential fit to low fluxes. Further details of the model inversion settings are described in Wutzler and Carvalhais (2014).

Table 1.
Scenarios of model inversions.
Sampling scenario    Description                                 Section
ignore               ignoring model discrepancy                  3.1.1
GP                   GP for each stream's model discrepancy      3.1.2
3. Results
First, the results of the basic example illustrate the consequences of applying several analyses that differ in their treatment of model discrepancy. Second, the ecosystem example demonstrates the applicability of the GP approach to a more demanding real-world inversion setting.
The inversion of the basic example model (Section 2.4) resulted in different estimates of the posterior density of model parameters and in different estimates of the predictive posterior density of process values (Figure 3) when using different sampling scenarios (Table 1).
With the ignore scenario, model discrepancy was assumed to be the zero vector, δ = 0. The likelihood of the observations conditioned on model parameters θ was the usual formulation derived from the assumption of normally distributed observation errors (Tarantola, 2005):

p_indep(θ | o) = C₁ p(o | θ) p(θ),
log p_indep(θ | o) = C₂ − 1/2 Σ_k S_k(o_k | θ) + log p(θ),    (4)
S_k(o_k | θ, δ) = d_kᵀ Σ_k⁻¹ d_k = Σ_{i_k} d²_{i_k} / σ²_{ε,i_k},    (5)
d_k = o_k − [g_k(θ) + δ_k],    (6)

where C₁ and C₂ are constants, which cancel in the Metropolis decision, i_k iterates across all record numbers in stream k ∈ {rich, sparse}, S_k is a cost, and d_k is the vector of observation-process residuals.

Figure 3. Predictive posterior of the basic example: Ignoring model discrepancy (left) leads to allocation of model discrepancy to the sparse observations, as seen by the misfit between observations (dots) and median model predictions (line inside shaded 95% confidence band). In contrast, representing discrepancies by a GP (right) correctly allocates discrepancy to a slightly worse fit of the biased rich observations. The distribution of process predictions (vertical 95% confidence whiskers) hints at the form of the deficiency in the process formulation.

The inferred uncertainty of the process prediction was very low, as indicated by the thin 95% confidence band of the model predictions (first column of Figure 3). Moreover, the introduced model deficiency appeared in the sparse data stream instead of the rich data stream. By ignoring the model discrepancies, the additional correlations among model-data residuals were also ignored. Hence, with this scenario, the uncertainty was largely underestimated. Moreover, the results incorrectly suggested that the model fails to predict the sparse data stream, and that the corresponding process description should be refined. Instead, we had introduced the model deficiency in the process description corresponding to the rich data stream (Section 2.4).
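The ignore-scenario density of equations (4)-(6) can be sketched directly. This is a minimal illustration, not the study's code; the stream values below are made up, and the observation error standard deviations are assumed known, as in the paper.

```python
import numpy as np

def cost_stream(o, g_theta, sigma_eps, delta=None):
    """Cost S_k of equation (5): squared residuals scaled by observation variance."""
    if delta is None:
        delta = np.zeros_like(o)        # ignore scenario: zero discrepancy vector
    d = o - (g_theta + delta)           # observation-process residuals, equation (6)
    return np.sum((d / sigma_eps) ** 2)

def log_posterior_indep(streams, log_prior):
    """Unnormalized log-density of equation (4), summing costs over data streams."""
    return -0.5 * sum(cost_stream(*s) for s in streams) + log_prior

# two imbalanced toy streams (o, g(theta), sigma_eps) with known observation error
o1, g1 = np.array([1.0, 2.0, 3.0]), np.array([1.1, 1.9, 3.2])
o2, g2 = np.array([10.0]), np.array([9.0])
lp = log_posterior_indep([(o1, g1, 0.1), (o2, g2, 0.5)], log_prior=0.0)
```

In a Metropolis step, only differences of `lp` between proposed and current parameter vectors matter, so the constants C₁ and C₂ can be dropped, as noted in the text.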
With the GP scenario, model discrepancies were represented by a separate GP for each stream. Note that, compared to the ignore scenario (4), there was an additional term, −1/2 δ̂ᵀ_s,k K⁻¹_ss,k δ̂_s,k, in the log-density of the parameters (A.13 - A.16), effectively penalizing large model discrepancies.

The inferred uncertainty of the process corresponding to the rich observations increased compared to the ignore scenario (Figure 3). The magnitude of model discrepancies in the sparse data stream strongly declined, while the magnitude of model discrepancies in the rich data stream only slightly increased. The GP scenario thus helped to tackle both problems: first, the underestimation of posterior variance, and second, the unbalanced allocation of model discrepancies.

Figure 4.
Predictive posterior based on a gradient-based optimization of the basic example reveals similar changes on applying the GP-based formulation as with the sampling-based inversion (Figure 3). The GP-based inversion (right) correctly allocates discrepancy to the biased rich observations, as seen by the misfit between observations (dots) and median model predictions (line inside shaded 95% confidence band).
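The GP-scenario density differs from the ignore scenario by the discrepancy vectors in the residuals and by the penalty term −1/2 δ̂ᵀ_s K⁻¹_ss δ̂_s per stream. A hedged sketch of that penalized log-density follows; all inputs are toy values, and the per-stream tuple layout is an assumption for illustration.

```python
import numpy as np

def log_posterior_gp(streams, log_prior):
    """Unnormalized log-density with GP discrepancy: the ignore-scenario cost is
    evaluated with residuals d = o - (g + delta), plus a penalty term
    -1/2 delta_s^T Kss^{-1} delta_s that discourages large discrepancies."""
    lp = log_prior
    for o, g_theta, sigma_eps, delta, delta_s, Kss in streams:
        d = o - (g_theta + delta)                        # residuals incl. discrepancy
        lp += -0.5 * np.sum((d / sigma_eps) ** 2)        # data-fit cost
        lp += -0.5 * delta_s @ np.linalg.solve(Kss, delta_s)  # GP penalty
    return lp

# one toy stream where the discrepancy perfectly explains the residuals,
# so only the penalty term contributes
o, g = np.array([1.0, 2.0]), np.array([0.5, 1.5])
delta = np.array([0.5, 0.5])
delta_s = np.array([0.5])            # discrepancy at one supporting location
Kss = np.array([[0.25]])             # GP covariance at supporting locations
lp = log_posterior_gp([(o, g, 1.0, delta, delta_s, Kss)], log_prior=0.0)
```

Because this density is smooth in θ and the discrepancies are recomputed from residuals rather than sampled, it can be handed to a gradient-based optimizer, which is the combination discussed in the surrounding text.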
The optimum parameter set can be found by a gradient-based maximization of the probability density (4). Here, we used the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method of Nash (1990), as implemented by the optim function in the R stats package (R Core Team, 2016). A first-order estimate of parameter uncertainty was obtained from the Hessian matrix at the optimum. Alternatively, the optimum parameter set can be found by maximization of the GP-based density (A.13). In order to maximize this density, we had to specify the parameters of the GP a priori, i.e. the supporting locations, the correlation length, ψ_k, and the signal variance, σ²_d,k. We specified conservative values based on the results of the optimization that ignored discrepancies (Figure 4, left), where the magnitudes of the model discrepancies exceeded the observation uncertainty, while the correlations spanned the entire range of the data. Specifically, we specified four equidistant supporting points, a correlation length of one-third of the range of the respective covariate, and a signal variance based on 1.5 times the observation uncertainty for all data streams.

The results with gradient-based optimization changed in a similar way as in the sampling-based inversions when representing model discrepancy by a GP: the estimates of the prediction uncertainty increased, while the model discrepancy in the sparse data stream declined.

Model inversion of the 15 DALEC parameters using data streams with thousands of records was computationally more expensive, both in terms of required model runs and in terms of computing the inverse of the discrepancy correlation matrix. Despite the improved GP sampling formulation, mixing was not as good as expected and we had to increase the thinning interval up to 48. The slowest mixing was observed for the correlation length parameters. Chains converged to the limiting distribution after about 2000 samples (Appendix C).

When ignoring model discrepancies, posterior distributions of parameters were estimated in such a way that predictions beyond the calibration period led to very low uncertainty of predictions of the rich data streams of NEE and soil respiration (first column in Figure 5). With the GP scenario, the predicted uncertainty increased (second column in Figure 5). This was similar to the results of an inversion based on the parameter-blocks approach (Wutzler and Carvalhais, 2014) (shown for comparison in the third column of Figure 5).

The marginal parameter distributions estimated by the GP scenario were mostly broader than the estimates of the ignore and the blocks scenarios. Moreover, the GP scenario estimated higher turnover rates (Tl, Tf, Tlab) and lower temperature sensitivity (Et) (Appendix C).

The GP inversion estimated the largest model discrepancies for soil respiration (Figure 6), at about double the discrepancy of all the other data streams. The overall magnitude was, however, small, at less than 10% of the average observation uncertainty of the corresponding data stream. The DALEC model captured the patterns in the observations well, and there was only a negligible trade-off between the several constraining data streams.
4. Discussion
The discussion highlights three main points. Section 4.1 discusses how the GP approach can solve problems associated with imbalanced data streams. Next, section 4.2 discusses how methodological innovations allowed the application to real-world problems. Finally, section 4.3 highlights that the proposed formulation allows combining the advantages of non-sampling-based optimization with a GP-based representation of model discrepancy.
Model discrepancy will become more relevant in the future. It is often caused by a mismatch in detail between model and observations. The detail in models is bounded by the modeling purpose, while the detail in observations increases with the ability to take high-resolution measurements (Luo et al., 2011).

Figure 5.
Long-term (40 years) future predictive posteriors of the DALEC ecosystem example show a more realistic estimation of prediction uncertainty (shaded 95% confidence bands) with the GP and the block approach. Ignoring discrepancy (left) leads to overconfident estimates, as seen by the overplotting of the confidence interval by the median model prediction in the rich NEE and respiration data streams (two top rows, where only the 40th year is shown). While the GP approach (middle column) leads to increased uncertainty of both rich and sparse data streams, the block approach (right) balances the allocation of uncertainty away from the sparse litterfall stream (bottom row) towards the rich data streams.

Figure 6.
The distribution of discrepancy variance in the Howland example was only a small fraction of the corresponding variance of observation uncertainty (see Equation 2). Such a finding increases confidence in the application of the DALEC model.
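The cross-stream comparison behind this finding is just the rescaling of equation (2). A minimal sketch, with hypothetical posterior samples standing in for the actual inversion output:

```python
import numpy as np

def normalized_discrepancy(sigma_d_samples, sigma_eps):
    """Equation (2): scale posterior samples of the discrepancy sd sigma_d by the
    (known) observation-uncertainty sd sigma_eps of the same stream, making
    streams with different units comparable."""
    return np.asarray(sigma_d_samples, dtype=float) / sigma_eps

# hypothetical posterior samples of sigma_d for two streams in different units
rich = normalized_discrepancy([0.9, 1.1, 1.0], sigma_eps=1.0)   # ~ observation sd
sparse = normalized_discrepancy([0.02, 0.03], sigma_eps=0.5)    # small fraction

# the stream with the larger normalized value carries the larger model misfit
assert rich.mean() > sparse.mean()
```

A normalized value near one, as for the rich stream here, would point at the corresponding process formulation as the refinement candidate, mirroring the interpretation of Figures 6 and 7.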
Therefore, the problem of preferential allocation of model discrepancy to sparse data streams (Wutzler and Carvalhais, 2014) should be acknowledged when inverting models against imbalanced multiple data streams. It impairs the usage of multiple data streams to constrain different model aspects (Richardson et al., 2010), as demonstrated here with the basic example. Most importantly, it complicates locating the source of model deficiencies, because the model discrepancy does not appear with the observations corresponding to the weak model aspects (Figure 2).

The GP approach, i.e. the explicit representation of model discrepancy by a GP (Figure 1), tackles the problem of preferential discrepancy allocation. The problem is caused in part by overestimating the information content in the rich data stream, which in turn is caused by not accounting for correlations among model-data residuals due to model discrepancy. The GP approach represents these correlations by two parameters per data stream, whose distribution is estimated during the Bayesian model inversion. Indeed, in this study it achieved a better balance of fits to the imbalanced data streams (Figure 3). In addition, it yielded a more realistic estimate of the uncertainty of the parameters and the predictions.

The estimated variance of the model discrepancies can be used to identify which processes in the model need refinement. The variance can be expressed in a dimensionless metric, normalized as a multiple of the average measurement uncertainty of the stream (section 2.2), which is not affected by different units. A larger normalized variance (relative to other streams) indicates that the corresponding process is in less accordance with the observations. In the basic example, the largest normalized misfit was correctly associated with the process that predicts the observations of the rich data stream (Figure 7), as the example was constructed.

Figure 7.
The large values in the distribution of the posterior model discrepancy, which are of the same magnitude as the variance of observation uncertainty (the units of the x-axis are the natural logarithm of their ratio, see equation 2), suggest that model developers should refine the process description predicting the rich data stream of the basic example.
Model refinement can be justified by the magnitude of the normalized discrepancy variance. Refinement is necessary up to a level of detail that captures the salient patterns in the observations (Wiegand et al., 2003). Therefore, it is justified when the data is good enough to reveal more details of the processes, and when those details help the modeling purpose. In the Howland case, refinement was not necessary, because the model discrepancy was only a small fraction of the observation uncertainty (Figure 6). In the basic example, refinement was justified, despite the cost of an additional parameter, because the model discrepancy variance of the rich data stream amounted to the same magnitude as the observation uncertainty (Figure 7). In addition, refinement can be guided by the shape of the model discrepancy. In the basic example, refinement would start by looking at the slope in the modeling of the rich data stream (Figure 3).

The GP approach propagates uncertainty in observations to the uncertainty of parameters in a well-grounded manner. It is fully based on probabilistic principles, and does not invoke any external information for combining multiple criteria. It avoids the modification of uncertainty estimates by weighting different data streams (Wutzler and Carvalhais, 2014). It retains the advantage that an improved resolution of sampling also leads to a more precise estimation of model parameters, given that the model is able to resolve the patterns emerging at the high resolution.

The acceptance of more parameter vectors with the GP approach, compared to the approach ignoring discrepancy, is a feature, not a flaw. The resulting higher estimates of parameter and prediction uncertainty are more realistic, because there is less information in model-data residuals that are correlated due to model discrepancy. But how can the approach decide in which shares the model-data residuals are explained by either the mechanistic model or the statistical GP model of discrepancies?
While the GP approach worked well for the basic example, there were practical problems with the application to larger problems, which were tackled by the proposed sampling scheme.

The first problem is that an inversion using rich data streams with several thousand records (Luo et al., 2011) must use many supporting points for the GP if the correlation length becomes small. That leads to the construction of huge matrices that need to be inverted often. We tackled this problem by constraining the choices of supporting points and correlation length (Appendix B).

The second problem is the slow mixing of model parameters and model discrepancies when parameters and discrepancies are sampled separately (Brynjarsdóttir and O'Hagan, 2014). In this study, we tackled this problem by conditioning the model discrepancies on observed model-data residuals, i.e. treating them as latent variables, and recomputing them on each draw of new model parameters or correlation length. The posterior density of model parameters then is a part of the joint likelihood of model parameters and inferred discrepancies. This derivation resulted in an additional term per data stream in the density of model parameters that penalizes large model discrepancies (Appendix A).

The third problem is to specify appropriately spaced supporting points (triangles in Figure 1) if the correlation length ψ is not well known a priori. If the points are spaced too close to each other, the matrices become numerically singular (Brynjarsdóttir and O'Hagan, 2014). If the points are spaced too widely, the underlying function may not be captured well. We tackled this problem by re-arranging the supporting points for each proposed correlation length. In addition, we introduced some randomness in selecting supporting points for a given correlation length.
This avoided being trapped in states with lucky choices of supporting locations that yield exceptionally high likelihoods of observations.

The proposed sampling scheme tackled those problems and allowed successful application to real-world data of several thousand records. The proposed formulation (A.13 - A.16) makes it possible to combine the advantages of gradient-based optimization with a GP-based representation of model discrepancy. Speed is one advantage of gradient-based optimization compared to Bayesian sampling. The proposed Bayesian sampling scheme converges faster and is numerically less demanding than a sampling scheme that needs to sample discrepancies at the supporting locations explicitly.
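The combination with gradient-based optimization can be sketched as a cost function: the Gaussian likelihood of the model-process residuals plus the per-stream penalty on the inferred discrepancies (cf. A.13 - A.16), which an off-the-shelf optimizer can minimize. This is a simplified sketch assuming every observation serves as a supporting point; all names are illustrative and not from the study.

```python
import numpy as np

def neg_log_posterior(theta, streams, model):
    """Negative log posterior of parameters `theta` with the GP model
    discrepancy set to its expected value. Each stream supplies
    observations `obs`, observation-error variance `sigma2_eps`, and the
    discrepancy covariance matrix `K` (here all points act as supporting
    points). A flat parameter prior is assumed."""
    cost = 0.0
    for s in streams:
        z = s["obs"] - model(theta, s)               # model-data residuals
        K_z = s["K"] + s["sigma2_eps"] * np.eye(len(z))
        delta = s["K"] @ np.linalg.solve(K_z, z)     # expected discrepancy
        d = z - delta                                # model-process residuals
        cost += 0.5 * (d @ d) / s["sigma2_eps"]      # likelihood term
        cost += 0.5 * delta @ np.linalg.solve(s["K"], delta)  # discrepancy penalty
    return cost
```

Because the expected discrepancy is a smooth function of the residuals, the cost is differentiable in theta and could be passed to, e.g., a quasi-Newton optimizer.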
5. Conclusions
Based on the results of the different sampling scenarios of the basic model example and of the ecosystem case, we conclude:

• Neglecting model discrepancies during the sampling leads to, first, underestimation of posterior uncertainty, and second, preferential allocation of model discrepancy to sparse data streams.
• Explicitly accounting for model discrepancy by representing it as a Gaussian process (GP) successfully tackles both problems.
• The proposed sampling scheme, which treats model discrepancies as latent variables, tackles several computational problems occurring with large data streams. It allows application of the GP approach to real-world inverse problems.
• The proposed formulation can combine the advantages of gradient-based optimization with a GP-based representation of model discrepancy.

When inverting a model using multiple data streams, it is important to explicitly account for model discrepancies. The presented GP formulation efficiently balances the allocation of model discrepancy to imbalanced data streams and allows an improved inference on modeled processes.
Appendix A. Sampling details
The following section details the sampling of parameters for one data stream. All quantities except the model parameters, θ, are stream specific. For brevity the stream index k is omitted. Table A1 summarizes the notation.
Table A1. Notation. Stream subscript k is omitted for brevity.

  o              observations; vector of size n_k
  \theta         model parameters
  g(\theta)      vector of model predictions, one per observation
  t              locations of the observations, here their times; vector of size n_k
  z              model-data residuals; z = o - g(\theta) = z_s \cup z_r
  z_s            model-data residuals at supporting locations s \subset t
  z_r            model-data residuals at remaining locations r = t \setminus s
  d              model-process residuals; d = o - [g(\theta) + \delta]
  \psi           correlation length of model discrepancies
  \sigma_\epsilon^2   variance of observation uncertainty
  \sigma_d^2     variance of model discrepancy, i.e. signal variance of the Gaussian process (GP)
  \sigma^2       normalized variance of model discrepancies; \sigma^2 = \sigma_d^2 / \sigma_\epsilon^2   (2)
  \delta         model discrepancy; \delta = \delta_s \cup \delta_r
  \Lambda_{i,j}  correlation matrix between two vectors of locations i, j \subseteq t; depends on \psi
  K_{i,j}        covariance matrix between two vectors of locations; K_{i,j} = \sigma_d^2 \Lambda_{i,j}
  K_z            covariance matrix including observation noise; K_z = K_{s,s} + \sigma_\epsilon^2 I

A complete formulation of the model is the following:

  o | \theta, \delta ~ N( g(\theta) + \delta, \sigma_\epsilon^2 I )                          (A.1)
  [\theta] \propto 1                                                                         (A.2)
  \delta | \psi, \sigma_d^2 ~ GP( 0, K(\psi, \sigma_d^2) )                                   (A.3)
  \psi ~ trGamma( a_\psi, b_\psi )                                                           (A.4)
  \sigma_d^2 = \sigma^2 \sigma_\epsilon^2,   \sigma^2 ~ IG( \alpha_\sigma, \beta_\sigma ),   (A.5)

where, in the presented examples, the prior for the model parameters \theta was uniform, and trGamma denotes the truncated gamma distribution. The covariance function of the GP here was a squared exponential, K(x_p, x_q; \psi, \sigma_d^2) = \sigma_d^2 exp( -(x_p - x_q)^2 / \psi^2 ) + \sigma_\epsilon^2 \delta_{pq}, where the Kronecker delta \delta_{pq} equals one for p = q and zero otherwise. In this study we knew that the correlation length of model discrepancies had to be of similar magnitude as the range of observation locations, and used a prior yielding a mean of one third of the range of locations, r(x_k), and a variance of r(x_k)^2 / 2, specifically (a_\psi, b_\psi) = (1. , . ). Proposals of \psi were rejected if they fell outside the truncation bounds described in Appendix B. The prior density of the normalized discrepancy variance is an inverse gamma distribution; we used \alpha_\sigma = 1.005 and \beta_\sigma = 0.

The sampling scheme comprises a Metropolis sampling block for each \psi_k, another Metropolis sampling block for \theta, and Gibbs sampling blocks for each \sigma_{d,k}^2. We briefly recall the Metropolis-Markov chain procedure to sample a random variable for which the probability density is known up to a constant. At the beginning of each Metropolis sampling block a new parameter proposal is generated. In this study we generated proposals by using DEMC, which suggests steps in parameter space based on the distribution of the parameter among several sampling chains (ter Braak and Vrugt, 2008). Next, model discrepancies and the conditional probability for the proposed block parameters are computed. These are also re-computed for the current block parameters if parameters used by the probability function have been updated by other blocks. Next, a Metropolis decision accepts the proposed state if the ratio of the probabilities (proposed to current) is larger than a random number drawn from U(0, 1).

Correlation length \psi is sampled by a Metropolis-Hastings block. The full conditional distribution of \psi is

  p(\psi | o) = p(\psi, \hat\delta | o) / p(\hat\delta | \psi, o)
              \propto p(o | \psi, \hat\delta) p(\psi, \hat\delta) / p(\hat\delta | \psi, o)
              = p(o | \psi, \hat\delta) p(\hat\delta | \psi) p(\psi) / p(\hat\delta | \psi, o)      (A.6)

  p(o | \psi, \hat\delta) \propto exp( -|| o - (g + \hat\delta) ||^2 / (2 \sigma_\epsilon^2) )      (A.7)

  p(\psi) = p_\Gamma(a_\psi, b_\psi) \propto \psi^{a_\psi - 1} e^{-b_\psi \psi}                     (A.8)

  p(\hat\delta | \psi) / p(\hat\delta | \psi, o)
      = [ (2\pi)^{-n_k/2} |K_\delta|^{-1/2} exp( -1/2 (\hat\delta - 0)^T K_\delta^{-1} (\hat\delta - 0) ) ]
        / [ (2\pi)^{-n_k/2} |K_\delta|^{-1/2} exp(0) ]
      \approx exp( -1/2 \hat\delta_s^T K_{ss}^{-1} \hat\delta_s )                                   (A.9)

  K_\delta = [ K_{ss}, K_{sr}; K_{rs}, K_{rr} ],                                                    (A.10)

where the dependencies on \theta and \sigma_d^2 and the stream index k are omitted for brevity. The first step in (A.6) derives from a factorization of the joint density of \psi and \hat\delta. The second step is the Bayes rule.
The third step is the factorization of the joint unconditional density of \hat\delta and \psi.

The likelihood (A.6) is based only on the expected value of the model discrepancy and does not require a sample of it. It actually holds for any particular sample of the model discrepancy \delta, but choosing the expected value \hat\delta greatly simplifies the calculations (A.9). The normalizing factor in (A.9) cancels. The numerator is the probability density of \hat\delta without knowing the observations, i.e. a multivariate normal density of the GP with zero mean. The denominator is the probability conditioned on the current observations and predictions, i.e. a multivariate normal density with mean \hat\delta. The inversion of the blocked matrix K_\delta could be done using blockwise inversion. However, it still requires the inversion of a matrix of the dimension of the number of non-supporting locations r, and risks becoming numerically singular. We suggest approximating the norm by using only the supporting locations. Due to the truncation bounds on \psi and the spacing of supporting points (Appendix B), K_{s,s} is usually kept from becoming numerically singular; if K_{s,s} became numerically singular, a small diagonal component could be added.

The vector of expected values of the model discrepancies, \hat\delta = \hat\delta_s \cup \hat\delta_r, is computed as a function of parameters \theta, \psi, and \sigma_d^2 based on the model-data residuals at supporting locations, z_s (Rasmussen and Williams, 2006):

  \hat\delta_s = K_{s,s} K_z^{-1} z_s                                          (A.11)
  \hat\delta_r = K_{r,s} K_{s,s}^{-1} \hat\delta_s = K_{r,s} K_z^{-1} z_s      (A.12)

Note that the covariance matrices K depend on the current discrepancy variance \sigma_d^2 and correlation length \psi, and that z_s depends on the parameters \theta and the observations o.

Model parameters \theta are sampled by a Metropolis-Hastings block. Their conditional distribution depends on all data streams k. We assume that observations and model discrepancies are independent between different streams.
  p(\theta | o) = p(\theta, \hat\delta | o) / p(\hat\delta | \theta, o)
               \propto p(o | \theta, \hat\delta) p(\hat\delta | \theta) p(\theta) / p(\hat\delta | \theta, o)     (A.13)
               = [ \prod_k p(o_k | \theta, \hat\delta_k) p(\hat\delta_k | \theta) / p(\hat\delta_k | \theta, o_k) ] p(\theta)   (A.14)

  p(\theta) \propto 1                                                                                             (A.15)

  p(\hat\delta_k | \theta) / p(\hat\delta_k | \theta, o_k)
      = exp( -1/2 \hat\delta_k^T K_{\delta,k}^{-1} \hat\delta_k )
      \approx exp( -1/2 \hat\delta_{s,k}^T K_{ss,k}^{-1} \hat\delta_{s,k} ),                                      (A.16)

where the dependence on the hyperparameters \psi_k and \sigma_{d,k}^2 has been omitted for brevity. The derivation of each stream factor is analogous to the derivation of the conditional density p(\psi | o) above, except for the simplification that the normalizing factor of p(\hat\delta_k) does not depend on \theta. Note that a penalty term (A.16) for the model discrepancy of each stream must be included in the conditional density function of the parameter vector. Again, we approximated the norm of the vector of all model discrepancies \hat\delta_k by the norm of the model discrepancies at supporting locations, \hat\delta_{s,k}.

The normalized discrepancy variance \sigma^2 was sampled from an inverse gamma distribution conditioned on the discrepancies derived for the current parameters:

  \sigma^2 | o, \theta, \psi ~ IG( \alpha_\sigma + n_s / 2,  \beta_\sigma + 1/(2 \sigma_\epsilon^2) || \hat\delta_s(o, \theta, \psi, \sigma_d^2) ||^2_{\Lambda_{ss}^{-1}} ),   (A.17)

where n_s is the number of supporting locations and \alpha_\sigma, \beta_\sigma are the parameters of the prior probability density. The factor 1/\sigma_\epsilon^2 needs to be included because we specified the prior for the discrepancy variance normalized by the variance of the observation uncertainties. We did not encounter problems when reusing the current value of \sigma_d^2 in the calculation of the discrepancy. However, one could instead use a data-based discrepancy variance to prevent this dependence and potential feedback behaviour.
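The two computational steps above can be sketched as follows: the expected discrepancies (A.11, A.12) follow from standard GP conditioning on the residuals at supporting locations, and the Gibbs draw of the normalized variance uses the inverse gamma full conditional (A.17). Function and variable names are illustrative, not the study's code; the squared exponential correlation follows the form given after (A.5).

```python
import numpy as np

def corr_sqexp(x1, x2, psi):
    """Squared exponential correlation Lambda between two location vectors."""
    d2 = (np.asarray(x1)[:, None] - np.asarray(x2)[None, :]) ** 2
    return np.exp(-d2 / psi**2)

def expected_discrepancy(t_s, t_r, z_s, sigma2_d, sigma2_eps, psi):
    """Expected discrepancy at supporting (A.11) and remaining (A.12)
    locations, conditioned on model-data residuals z_s."""
    K_ss = sigma2_d * corr_sqexp(t_s, t_s, psi)
    K_z = K_ss + sigma2_eps * np.eye(len(t_s))      # K_z = K_ss + sigma2_eps I
    alpha = np.linalg.solve(K_z, z_s)               # K_z^{-1} z_s
    delta_s = K_ss @ alpha                          # (A.11)
    delta_r = sigma2_d * corr_sqexp(t_r, t_s, psi) @ alpha  # (A.12)
    return delta_s, delta_r

def draw_sigma2(delta_s, Lambda_ss, sigma2_eps, alpha0, beta0, rng):
    """Gibbs draw of the normalized discrepancy variance (A.17); an
    IG(a, b) draw is the reciprocal of a Gamma(a, scale=1/b) draw."""
    quad = delta_s @ np.linalg.solve(Lambda_ss, delta_s)
    a = alpha0 + 0.5 * len(delta_s)
    b = beta0 + quad / (2.0 * sigma2_eps)
    return 1.0 / rng.gamma(a, 1.0 / b)
```

Because the observation noise appears on the diagonal of K_z, the expected discrepancy shrinks the residuals toward zero rather than interpolating them exactly.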
Appendix B. Limits for choosing supporting points s and truncation of the sampled correlation length ψ

For large correlation length ψ, some matrices used in the calculation of model discrepancies δ_s become numerically singular. Further, different correlation lengths larger than the range of the data, t, cannot be distinguished. Hence, an upper bound of (max(t) − min(t)) is applied to the sampling of ψ.

For very small correlation length, the model discrepancy goes to its expected value of zero between supporting points. However, we want to model a smooth discrepancy between supporting points. Hence, a lower bound of 2/3 of the mean distance between supporting points is applied to the sampling of ψ.

Supporting points are chosen among the observation points closest to a grid with distance 3ψ/2, with a minimal spacing so that there are two points between supporting points and a maximal spacing so that there are at least five supporting points.
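The grid-based choice can be sketched as follows. This is an illustrative reimplementation of the rule above (observation points closest to a grid of spacing 3ψ/2, with an optional random grid offset), not the study's actual code, and it omits the minimal- and maximal-spacing safeguards.

```python
import numpy as np

def choose_supporting_points(t, psi, rng=None):
    """Indices of the sorted observation locations t that lie closest to a
    grid of spacing 3*psi/2. A random grid offset (when rng is given)
    introduces the randomness used to avoid lucky supporting-point sets."""
    t = np.sort(np.asarray(t, dtype=float))
    spacing = 1.5 * psi                       # grid distance 3*psi/2
    offset = rng.uniform(0.0, spacing) if rng is not None else 0.0
    grid = np.arange(t[0] + offset, t[-1] + 1e-9, spacing)
    idx = np.unique([int(np.argmin(np.abs(t - g))) for g in grid])
    return idx

t = np.arange(0.0, 100.0)                     # e.g. daily observations
idx = choose_supporting_points(t, psi=10.0)   # grid spacing 15
```

Re-running this selection for each proposed ψ keeps the matrix K_{s,s} small and well conditioned over the whole truncation range.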
Appendix C. DALEC-Howland inversion additional results
Eight chains from two independent populations converged to the same limiting distribution (Figure C1). The variance of the marginals of the posterior probability density inferred by the GP approach was intermediate between the ignore and the block inversion scenarios. The location of the mode was mostly in between the two other approaches (Figure C2). Predictions during the calibration period were quite similar across inversion scenarios (Figure C3). Nevertheless, there were differences in the uncertainty of posterior predictions (Figure 5).
Acknowledgments
We thank Nuno Carvalhais and Paul Bodesheim for constructive discussions on model-data integration and on GPs, Adam Erickson for help with the English, and Trevor F. Keenan and David Hollinger for permission to use the Howland data and model setup. The work has been partly funded by the Deutsche Forschungsgemeinschaft CRC 1076 "AquaDiva".
References
Ahrens, B., Reichstein, M., Borken, W., Muhr, J., Trumbore, S. E. and Wutzler, T. (2014). Bayesian calibration of a soil organic carbon model using Δ14C measurements of soil organic carbon and heterotrophic respiration as joint constraints, Biogeosciences (8): 2147–2168.
Figure C1. Trace plots and marginal distributions of the tails of eight chains from two independent populations sampling 15 parameters of the DALEC model. They indicate good mixing and convergence to the limiting distribution.
Figure C2. Marginals of the posterior probability density of inverting the DALEC model under the ignore, GP, and block scenarios. The thick dash-dot line represents the prior parameter density.
Andrieu, C., De Freitas, N., Doucet, A. and Jordan, M. I. (2003). An introduction to MCMC for machine learning, Machine Learning (1-2): 5–43.
Bormann, N., Saarinen, S., Kelly, G. and Thépaut, J.-N. (2003). The spatial structure of observation errors in atmospheric motion vectors from geostationary satellite data, Monthly Weather Review (4): 706–718.
Brynjarsdóttir, J. and O'Hagan, A. (2014). Learning about physical parameters: the importance of model discrepancy,
Inverse Problems (11): 114007.
Chib, S. and Greenberg, E. (1995). Understanding the Metropolis-Hastings algorithm, American Statistician: 327–335.

Figure C3. Comparison of model predictions to observations of the DALEC inversion during the calibration period. Columns represent different inversion scenarios (ignore: ignoring model discrepancies, GP: Gaussian processes approach, block: parameter-block approach); rows show the different data streams (NEE, soil respiration, litterfall). Litterfall is shown for predictions of the 10 years of the calibration period; NEE and respiration are shown for the last two of the 10 years.

Desroziers, G., Berre, L., Chapnik, B. and Poli, P. (2005). Diagnosis of observation, background and analysis-error statistics in observation space,
Q. J. R. Meteorol. Soc. (613): 3385–3396.
Gelman, A., Carlin, J., Stern, H. and Rubin, D. (2003). Bayesian Data Analysis, Chapman Hall, Boca Raton (FL).
Hastings, W. (1970). Monte Carlo sampling methods using Markov chains and their applications,
Biometrika (1): 97–109.
Healy, S. and White, A. (2005). Use of discrete Fourier transforms in the 1D-Var retrieval problem, Q. J. R. Meteorol. Soc. (605): 63–72. Neglecting correlations leads to overestimation of information content.
Hollinger, D. Y., Aber, J., Dail, B., Davidson, E. A., Goltz, S. M., Hughes, H., Leclerc, M. Y., Lee, J. T., Richardson, A. D., Rodrigues, C. et al. (2004). Spatial and temporal variability in forest-atmosphere CO2 exchange,
Global Change Biology (10): 1689–1706.
Hollinger, D. Y. and Richardson, A. D. (2005). Uncertainty in eddy covariance measurements and its application to physiological models, Tree Physiology (7): 873–885.
Keenan, T. F., Carbone, M. S., Reichstein, M. and Richardson, A. D. (2011). The model-data fusion pitfall: assuming certainty in an uncertain world, Oecologia (3): 587–597.
Kennedy, M. C. and O'Hagan, A. (2001). Bayesian calibration of computer models,
Journal of the Royal Statistical Society, Series B (Statistical Methodology) pp. 425–464.
Luo, Y., Ogle, K., Tucker, C., Fei, S., Gao, C., LaDeau, S., Clark, J. S. and Schimel, D. S. (2011). Ecological forecasting and data assimilation in a data-rich era, Ecological Applications (5): 1429–1442.
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. and Teller, E. (1953). Equation of state calculations by fast computing machines, The Journal of Chemical Physics (6): 1087–1092.
Miettinen, K. (1999). Nonlinear Multiobjective Optimization, Vol. 12 of International Series in Operations Research and Management Science. URL: http://users.jyu.fi/ miettine/book/
Nash, J. C. (1990). Compact Numerical Methods for Computers: Linear Algebra and Function Minimisation, CRC Press.
R Core Team (2016). R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria.
Rasmussen, C. E. and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning, The MIT Press.
Reich, S. and Cotter, C. (2015). Probabilistic Forecasting and Bayesian Data Assimilation, Cambridge University Press.
Richardson, A., Williams, M., Hollinger, D., Moore, D., Dail, D., Davidson, E., Scott, N., Evans, R., Hughes, H., Lee, J., Rodrigues, C. and Savage, K. (2010). Estimating parameters of a forest ecosystem C model with measurements of stocks and fluxes as joint constraints, Oecologia pp. –.
Tarantola, A. (2005). Inverse Problem Theory: Methods for Data Fitting and Model Parameter Estimation, SIAM, Philadelphia.
ter Braak, C. J. F. and Vrugt, J. A. (2008). Differential Evolution Markov Chain with snooker updater and fewer chains, Statistics and Computing (4): 435–446.
Waller, J. A., Dance, S. L., Lawless, A. S. and Nichols, N. K. (2014). Estimating correlated observation error statistics using an ensemble transform Kalman filter, Tellus A (0). Estimation of the error covariance matrix by DBCP diagnostics, here slowly varying with time.
Weston, P. P., Bell, W. and Eyre, J. R. (2014). Accounting for correlated error in the assimilation of high-resolution sounder data, Q. J. R. Meteorol. Soc. (685): 2420–2429.
Wiegand, T., Jeltsch, F., Hanski, I. and Grimm, V. (2003). Using pattern-oriented modeling for revealing hidden information: a key for reconciling ecological theory and application,
Oikos (2): 209–222.
Williams, M., Schwarz, P. A., Law, B. E., Irvine, J. and Kurpius, M. R. (2005). An improved analysis of forest carbon dynamics using data assimilation, Global Change Biology (1): 89–105.
Wutzler, T. and Carvalhais, N. (2014). Balancing multiple constraints in model-data integration: Weights and the parameter-block approach, J. Geophys. Res. Biogeosci. (11): 2112–2129.
Zobitz, J., Desai, A., Moore, D. and Chadwick, M. (2011). A primer for data assimilation with ecological models using Markov chain Monte Carlo (MCMC), Oecologia 167.