[PDF] WOMBAT: A fully Bayesian global flux-inversion framework

Abstract

WOMBAT (the WOllongong Methodology for Bayesian Assimilation of Trace-gases) is a fully Bayesian hierarchical statistical framework for flux inversion of trace gases from flask, in situ, and remotely sensed data. WOMBAT extends the conventional Bayesian-synthesis framework through the consideration of a correlated error term, the capacity for online bias correction, and the provision of uncertainty quantification on all unknowns that appear in the Bayesian statistical model. We show, in an observing system simulation experiment (OSSE), that these extensions are crucial when the data are indeed biased and have errors that are correlated. Using the GEOS-Chem atmospheric transport model, we show that WOMBAT is able to obtain posterior means and uncertainties on non-fossil-fuel CO_2 fluxes from Orbiting Carbon Observatory-2 (OCO-2) data that are comparable to those from the Model Intercomparison Project (MIP) reported in Crowell et al. (2019, Atmos. Chem. Phys., vol. 19). We also find that our predictions of out-of-sample retrievals from the Total Column Carbon Observing Network are, for the most part, more accurate than those made by the MIP participants. Subsequent versions of the OCO-2 datasets will be ingested into WOMBAT as they become available.

Full PDF

WWOMBAT: A fully Bayesian global ﬂux-inversion framework

Andrew Zammit-Mangion ∗ School of Mathematics and Applied Statistics,University of Wollongong, AustraliaMichael Bertolacci ∗ School of Mathematics and Applied Statistics,University of Wollongong, AustraliaJenny FisherSchool of Earth and Life Sciences, University of Wollongong, AustraliaAnn StavertClimate Science Centre, CSIRO Oceans and Atmosphere, AustraliaMatthew L. RigbySchool of Chemistry, University of Bristol, UKYi CaoSchool of Mathematics and Applied Statistics,University of Wollongong, AustraliaNoel CressieSchool of Mathematics and Applied Statistics,University of Wollongong, Australia

Abstract

WOMBAT (the WOllongong Methodology for Bayesian Assimilation of Trace-gases) is a fullyBayesian hierarchical statistical framework for ﬂux inversion of trace gases from ﬂask, in situ , and re-motely sensed data. WOMBAT extends the conventional Bayesian-synthesis framework through theconsideration of a correlated error term, the capacity for online bias correction, and the provision ofuncertainty quantiﬁcation on all unknowns that appear in the Bayesian statistical model. We show,in an observing system simulation experiment (OSSE), that these extensions are crucial when the dataare indeed biased and have errors that are correlated. Using the GEOS-Chem atmospheric transportmodel, we show that WOMBAT is able to obtain posterior means and uncertainties on non-fossil-fuelCO ﬂuxes from Orbiting Carbon Observatory-2 (OCO-2) data that are comparable to those from theModel Intercomparison Project (MIP) reported in Crowell et al. (2019, Atmos. Chem. Phys. , vol. 19).We also ﬁnd that our predictions of out-of-sample retrievals from the Total Column Carbon ObservingNetwork are, for the most part, more accurate than those made by the MIP participants. Subsequentversions of the OCO-2 datasets will be ingested into WOMBAT as they become available.

Keywords:

Bayesian Hierarchical Model; Carbon Cycle; Markov Chain Monte Carlo; Physical-StatisticalModel; Spatio-Temporal Statistics. ∗ Contributed equally a r X i v : . [ s t a t . A P ] F e b Introduction

Carbon dioxide (CO ) is a leading driver of global warming (Peters et al., 2013). If left unchecked, the risein global temperatures will have a substantial negative impact on society and the environment (Edenhoferet al., 2014). As part of the worldwide eﬀort to limit these impacts, the 2016 Paris Agreement under theUnited Nations Framework Convention on Climate Change, COP 21, called for a global stocktake of thesources and sinks of CO and other greenhouse gases, with the ﬁrst evaluation planned for 2023 (UNFCCC,2015). The rate at which CO is emitted (from sources) or absorbed (at sinks) per unit space and time isknown as the CO ﬂux, which itself varies substantially spatio-temporally. Despite the fact that it is humanemissions that are driving the rise in atmospheric CO concentrations, these emissions are relatively easy toquantify, and the most uncertain aspects of quantifying CO ﬂuxes at Earth’s surface centre around naturalprocesses. For example, while we know that the land and ocean absorb more than half of the CO emittedby human activities (Dlugokencky and Tans, 2020), the geographical and temporal patterns of these sinksremain unclear (Crowell et al., 2019).Monitoring the progression of CO in our atmosphere is thus of utmost importance, and billions ofdollars have been spent over the last few decades on research, development, and deployment of instrumentsfor measuring CO mole fraction (deﬁned here as the proportion of CO molecules in a given parcel of dryair) on a global scale (Burrows et al., 1995; Kuze et al., 2009; Wunch et al., 2011a; Masarie et al., 2014;Eldering et al., 2017, 2019). However, CO mole fraction is only indirectly related to the key quantityof interest, namely the geographic distribution of the CO ﬂuxes, which cannot be observed directly onregional scales. Identifying these sources and sinks is an ill-posed inverse problem, often called a trace-gasﬂux-inversion problem, whose solution requires the use of both an atmospheric transport model and a spatio-temporal model for the ﬂuxes (e.g., Enting, 2002). In this paper, we present a global ﬂux-inversion systemfor the solution of this problem, which we call the WOllongong Methodology for Bayesian Assimilation ofTrace-gases (WOMBAT).A global trace-gas ﬂux-inversion system is designed to infer ﬂuxes from observational data, which aregenerally available either as point-referenced (ﬂask or in situ ) measurements or column-averaged remote-sensing retrievals. The underlying model in an inversion system is usually a state-space model, where theﬂuxes, or a reduced representation thereof, are the latent states that need to be inferred from data via theuse of an atmospheric chemical transport model (CTM). Computationally, ﬂux estimation is either donewithin a standard optimisation framework (e.g., Chevallier et al., 2005; Baker et al., 2006), via full Bayesiansynthesis (Enting 2002, Chapter 3; Mukherjee et al. 2011; Schuh et al. 2019), or via ensemble Kalmanﬁltering (e.g., Peters et al., 2005; Feng et al., 2009). Inversion systems rely, to various extents, on realisticbottom-up estimates of ﬂuxes for the elicitation of an informative prior distribution; an accurate CTM thatprovides the link between the ﬂuxes and the observed mole fraction; high-quality unbiased measurements;and reliable uncertainty measures on each model component.The complexity of all modelled processes, from ﬂuxes right up to satellite retrieval errors, inevitably leadsto model misspeciﬁcation. Further, because ﬂux inversion is an ill-posed problem, even if the misspeciﬁcationis slight, the consequences for model-based ﬂux estimates could be signiﬁcant (e.g., Enting and Newsam,1990), and they can lead to a large spread in posterior ﬂux estimates from diﬀerent ﬂux-inversion systems.Hence, uncertainty from an ensemble of inversions is usually quantiﬁed through the spread of the posteriormeans from diﬀerent inversion systems, rather than through the application of a meta-analysis that takesinto account statistical uncertainties of individual estimates (e.g., Houweling et al., 2015; Crowell et al.,2019; Rayner, 2020). The large disagreements between inversion results suggests that at least some inversionsystems are missing (or have misspeciﬁed) certain components of variability in their underlying models.The causes of model misspeciﬁcation are numerous; for a comprehensive discussion, see Engelen et al.(2002). The main ones are (i) ﬂux-process dimension-reduction error (e.g., Kaminski et al., 2001), whichis a consequence of using a spatio-temporal model for the ﬂux ﬁeld that is low-dimensional and inﬂexible;(ii) an inaccurate prior ﬂux mean, variance, and covariance (e.g., Philip et al., 2019); (iii) transport-modelerrors (e.g., Houweling et al., 2010; Basu et al., 2018; Schuh et al., 2019) arising from the underlying assumedphysics, meteorology, and discretisation schemes used (e.g., Lauvaux et al., 2019; McNorton et al., 2020);(iv) retrieval biases (e.g., O’Dell et al., 2018) and incorrect associated measurement-error statistics (e.g.,Worden et al., 2017); and (v) measurement-error spatio-temporal correlations that are not fully accountedfor (e.g., Chevallier, 2007; Ciais et al., 2010). Two other causes of model misspeciﬁcation worth noting are an2ncorrectly speciﬁed initial global mole-fraction ﬁeld, and ﬂux components assumed known in the inversion(i.e., assumed degenerate at their prior mean), such as anthropogenic emissions (e.g., Feng et al., 2019). Thelatter can be seen as a special case of (i) above, while the eﬀect of the former can generally be minimised byusing a realistic initial condition (e.g., Basu et al., 2013) and incorporating a burn-in (or spin-up) period, inwhich early ﬂux estimates are discarded.In Section 2 we present the underlying framework of WOMBAT. The statistical framework addressesthe implications of model misspeciﬁcation in several ways: ﬁrst, by using prior distributions to encodeuncertainty over prior beliefs on the ﬂuxes (e.g., Ganesan et al., 2014; Zammit-Mangion et al., 2016); second,by adding a spatio-temporally correlated component of variability to the mole-fraction data model to addresssome of this typically unmodelled, correlated model–data discrepancy (Brynjarsd´ottir and O’Hagan, 2014);third, by explicitly modelling biases in the mole-fraction data model (generalising the approach of Basu et al.,2013); and fourth, by propagating uncertainty on all unknowns within a fully Bayesian statistical frameworkwherein inference is made using Markov chain Monte Carlo (MCMC). We note that while the beneﬁts ofMCMC are becoming increasingly apparent in regional trace-gas inversions (e.g., Mukherjee et al., 2011;Ganesan et al., 2014; Miller et al., 2014; Zammit-Mangion et al., 2016), its use is still the exception ratherthan the rule in global trace-gas inversions.The fully Bayesian nature of the model, coupled with the introduction of a correlated process in modellingthe mole-fraction ﬁeld, leads to computational challenges. Section 3 details how we deal with these by deﬁninga speciﬁc type of stochastic process on the irregularly located spatio-temporal errors, one that leads to asparse precision matrix (Vecchia, 1988; Datta et al., 2016). Details on how this facilitates the MCMC schemewe implement are deferred to Appendix A. Section 4 discusses the experimental setup used for running,validating, and implementing WOMBAT on the satellite data analysed in this article. In Section 5, we ﬁrstconduct an observing system simulation experiment (OSSE) in which the true ﬂuxes are assumed known.Results from the OSSE demonstrate WOMBAT’s validity and also illustrate the importance of modellingbiases and correlated error terms when these are indeed present in the data. We then use WOMBAT toperform ﬂux inversion using the Orbiting Carbon Observatory-2 (OCO-2) Version 7 retrospective (V7r)dataset as used in the model intercomparison project (MIP) of Crowell et al. (2019). Our model ﬁttingreveals that about 80% of the total measurement-error variance associated with the data used for the MIPcan be explained with a correlated model–data discrepancy term. WOMBAT accounts for this, which resultsin posterior distributions over the ﬂuxes that, for the most part, corroborate the results from the ensembleinversions, both on a regional and on a global scale. In Section 5, we also show the utility of WOMBAT incarrying out online bias correction. Section 6 summarises the features of WOMBAT and discusses avenuesfor future work. Code reproducing the analyses is available online at https://github.com/mbertolacci/wombat-paper/ . In this section we outline the spatio-temporal Bayesian hierarchical statistical model (see, for example,Wikle et al., 2019, Section 1.3) that WOMBAT uses for global ﬂux inversion. The model consists of foursub-models: (1) a ﬂux process model, (2) a mole-fraction process model, (3) a mole-fraction data model, and(4) a parameter model.

Let Y ( s , t ) denote the prior mean of the trace-gas surface ﬂux at spatial location s ∈ S and time t ∈ T ,where S is the surface of Earth, and T ≡ [ t , t f ] is some time interval of interest. The ﬁeld Y ( · , · ) couldbe set to zero everywhere (e.g., Michalak et al., 2004) or could be constructed using bottom-up estimates ofbiospheric and/or anthropogenic ﬂuxes (e.g., Basu et al., 2013).In the vein of conventional Bayesian-synthesis frameworks (e.g., Enting, 2002), we model the true ﬂux, Y ( · , · ), as Y ( · , · ) plus a spatio-temporal ﬁeld that consists of two components. The ﬁrst component, the structured component, is the sum of r spatio-temporal basis functions that are not necessarily disjoint inspace and time. These basis functions could be space-time step functions, as typically found in variational3nversion systems (e.g., Chevallier et al., 2005), discretised ﬂux ‘patterns’ (e.g., Fan et al., 1998), or theycould be a general-purpose basis such as a Fourier basis (e.g., Crowell et al., 2019, Appendix A4). The secondcomponent of Y ( · , · ) is an error arising from the use of a low-dimensional model for the ﬁrst componentand is commonly termed aggregation error; here we call it dimension-reduction error. This error accountsfor the fact that the structured basis functions typically span a small (function) space, and therefore theycannot reproduce ﬂuxes perfectly. In practice, these errors correspond to sub-grid-cell ﬂux variation or, moregenerally, to variations at smaller scales than the basis functions can resolve.Denote the set of pre-speciﬁed ﬂux basis functions as { φ j ( · , · ) : j = 1 , . . . , r } , and the dimension-reductionerror component as v ( · , · ). Our ﬂux process model is a spatio-temporal stochastic process given by, Y ( s , t ) = Y ( s , t ) + r (cid:88) j =1 φ j ( s , t ) α j + v ( s , t ) , s ∈ S , t ∈ T , (1)where the scaling factors { α j : j = 1 , . . . , r } are unknown, are assigned a multivariate probability distribution,and need to be inferred in the inversion framework. The error process v ( · , · ) is also unknown but, as wewill discuss later, it is not possible to estimate it in isolation of other components of error that appear inother layers of the hierarchical statistical model.Since we assume that E( Y ( · , · )) = Y ( · , · ) , we let E( α j ) = 0 for j = 1 , . . . , r, and E( ν ( · , · )) = 0. Fora given set of basis functions, the prior belief on the covariance structure of Y ( · , · ) is fully determined bythat on α ≡ ( α , . . . , α r ) (cid:48) and that on v ( · , · ). Reasonable prior judgements for the moments of both ofthese will in turn largely depend on the choice of basis functions. For example, when the ﬂux is modelledon a space-time grid and space-time step functions are used as basis functions, a prior distribution on α that correlates the ﬂux a priori in space and time is natural (Michalak et al., 2004; Chevallier et al., 2007;Basu et al., 2018). On the other hand, when using large spatial ﬂux ‘patterns’ that have temporally-limitedscope, it is generally reasonable to assume that the α j ’s corresponding to basis functions in diﬀerent spatialregions are uncorrelated, but that those associated with the same spatial region are temporally correlated.Irrespective of the choice of basis functions, in WOMBAT one expresses prior judgement on α throughthe model α ∼ Gau( , Σ α ), where Gau( µ , Σ ) is the Gaussian probability density function of a random vectorwith mean µ and covariance matrix Σ . The covariance matrix Σ α is parameterised through a parametervector θ α , which typically contains variances and spatio-temporal length scales in the covariances. Expertelicitation can be used to construct prior distributions on these parameters too; we describe possible priordistributions when discussing the parameter model in Section 2.4.It is harder to formulate prior beliefs on the process v ( · , · ), but it is reasonable to claim that this processhas strictly positive variance and is spatio-temporally correlated. For this reason, it is natural to model v ( · , · ) as a spatio-temporal Gaussian process (e.g., Rasmussen and Williams, 2006) with zero expectation.However, as mentioned above, we temporarily set aside describing its covariance function. Instead we seewhat impact this term can be expected to have on the mole-fraction ﬁeld and its consequent process model,and then we model the spatio-temporal discrepancies of all possible contributors to model–data discrepancytogether (Section 2.3). Denote the true mole-fraction process at space-height-time location ( s , h, t ) as Y ( s , h, t ). We only model themole fraction within our time interval of interest, T , and therefore we express the true mole fraction ﬁeldwithin T as a function of the initial mole-fraction process at t and the exogenous ﬂux-process inputs in T .Speciﬁcally, deﬁne T t ≡ [ t , t ] , for t ≤ t ≤ t f , as the set of all time points in T up to and including t . Theﬁeld Y ( · , · , · ) is related to the ﬂux process through the relationship, Y ( s , h, t ) = H ( Y ( · , · , t ) , Y ( · , T t ); s , h, t ) , s ∈ S , h ∈ R + , t ∈ T , (2)where Y ( · , · , t ) is the mole-fraction ﬁeld at time t , Y ( · , T t ) is the ﬂux ﬁeld evolving over the wholetime period T t , and H is an operator that solves the underlying chemical transport equations (that areapproximately linear for long-lived species such as CO ; see, e.g., Enting, 2002, Chapter 2). In practice, H is not known perfectly, but we usually have at hand a reasonable approximation to it, ˆ H , which is often4eferred to as the chemical transport model (CTM) or simply as the transport model. Similarly, we willusually have a reasonable approximation to Y ( · , · , t ), which we call ˆ Y ( · , · , , t ). The use of ˆ Y ( · , · , t )instead of Y ( · , · , t ), and of ˆ H instead of H , leads to a residual term v ( · , · , · ) that will inevitably bespatio-temporally correlated (Enting, 2002, Chapter 9). In particular, Y ( s , h, t ) = ˆ H ( ˆ Y ( · , · , t ) , Y ( · , T t ); s , h, t ) + v ( s , h, t ) , s ∈ S , h ∈ R + , t ∈ T , (3)where v ( · , · , · ) is the residual mole-fraction process arising from the use of an approximate initial mole-fraction ﬁeld, imperfect meteorology inside the transport model, imperfect transport-model parameters andphysics, and potentially sub-grid-scale variation in the mole-fraction when ˆ H is a numerical model evaluatedat a coarse resolution. It is diﬃcult to place prior beliefs on the structure of v ( · , · , · ), which we modelas statistical error, but it is known that using the approximation ˆ H introduces errors that could spanhundreds of kilometres and several days (Lauvaux et al., 2019; McNorton et al., 2020). Transport-modelimplementations tend to diﬀer considerably in their vertical and inter-hemispheric mixing behaviour, andﬂux-inversion estimates are known to be particularly sensitive to transport-model choice (Gurney et al.,2002; Schuh et al., 2019). Note that v ( · , · , · ) is also likely to depend on Y ( · , · ), and that we ignore thisdependence for model simplicity in what follows.The assumed linear behaviour of the underlying dynamics for CO is important. First, it allows us tomodel the eﬀect of the approximate initial mole-fraction ﬁeld, ˆ Y ( · , · , t ), separately from that of the ﬂuxes(e.g., Enting, 2002, Chapter 10), so that (3) is of the form Y ( s , h, t ) = ˆ H ( ˆ Y ( · , · , t ) , s , h, t ) + ˆ H (0 , Y ( · , T t ); s , h, t ) + v ( s , h, t ) , s ∈ S , h ∈ R + , t ∈ T . (4)Second, it allows us to express the mole-fraction ﬁeld as a linear combination of the structured componentof Y ( · , · ) and the ﬂux dimension-reduction error component, as we now show. Substituting (1) into (4), wehave that Y ( s , h, t ) = Y ( s , h, t ) + r (cid:88) j =1 ˆ ψ j ( s , h, t ) α j + v , ( s , h, t ) + v ( s , h, t ) , s ∈ S , h ∈ R + , t ∈ T , (5)where Y ( s , h, t ) ≡ ˆ H ( ˆ Y ( · , · , t ) , s , h, t ) + ˆ H (0 , Y ( · , T t ); s , h, t ); ˆ ψ j ( s , h, t ) ≡ ˆ H (0 , φ j ( · , T t ); s , h, t ) , j =1 , . . . , r , are basis functions in mole-fraction space, often termed response functions (e.g., Saeki et al., 2013);and v , ( s , h, t ) ≡ ˆ H (0 , v ( · , T t ); s , h, t ) is the impact of ﬂux dimension-reduction error on the mole fractionat ( s , h, t ) after application of the approximate transport model ˆ H ; and v ( s , h, t ) is the residual term givenin (3). We assume that E( v ( · , · , · )) = 0; in this case Y ( s , h, t ) , can be seen as the prior expectation ofthe mole-fraction ﬁeld at ( s , h, t ) under ˆ H and ˆ Y ( · , · , t ). That is, it is the mole fraction ﬁeld generated byrunning our CTM with the input ﬂuxes set to the prior expected ﬂux, and with the mole-fraction ﬁeld at t set to ˆ Y ( · , · , t ).Without further knowledge, the two spatio-temporal error processes v , ( · , · , · ) and v ( · , · , · ) cannot bedisentangled, and hence we do not attempt to model them separately. In Section 2.3 we will introduce anothersource of correlated error, and ultimately all these components are modelled together as one ‘model–datadiscrepancy’ term. Fluxes cannot be observed directly at the spatial and temporal scales of interest. Flux inversion thereforeproceeds by ‘constraining’ the ﬂux ﬁeld using column-averaged retrievals or point-referenced measurementsof mole fraction. We use Z ,i to denote the i th mole-fraction measurement or retrieval, where i ∈ { , . . . , m } indexes the datum used in the inversion, and m is the number of data used in the inversion.Point-referenced (PR) measurements of mole fraction are generally made at or near Earth’s surface, usinginstruments on towers or in aircraft. The mole-fraction data model for these measurements is therefore givenby Z ,i = Y ( s i , h i , t i ) + (cid:15) i , if Z ,i is from a point-referenced instrument , (6)where Z ,i is the observed mole fraction at ( s i , h i , t i ), and (cid:15) i is mean-zero Gaussian measurement error with5 model for its variance parameter presented below in Section 2.4. Measurement errors associated withpoint-referenced instruments are generally small and (usually) not correlated in space and time. Despitethis, such data are not immune to the eﬀects of spatio-temporal correlations induced by the CTM in theprocess model, and they may even be more susceptible than column-averaged retrievals due to the combinedeﬀect of their usual proximity to the surface and the discretisations employed when simulating approximatetransport (Rayner and O’Brien, 2001; Basu et al., 2018).Column-averaged (CA) retrievals, such as XCO (where ‘X’ refers to the column averaged nature ofthe retrievals) from the Orbiting Carbon Observatory-2 (OCO-2) or the Total Carbon Column ObservingNetwork (TCCON), are noisier than PR measurements, although TCCON less so. In particular, since theraw spectral information collected for the retrieval is aﬀected by environmental factors such as aerosols(O’Dell et al., 2012), the errors can contain biases as well as exhibit spatio-temporal correlations. Thesebiases can also be instrument-mode dependent (e.g., land glint vs land nadir retrievals; see Section 4.4.1).The vertical averaging operation also involves an averaging kernel and an a priori bias correction, whichare both speciﬁc to the retrieval, and which arise from the algorithm used for the retrieval. In general, thisrelationship can be expressed as Z ,i = ˆ A i ( Y ( s i , · , t i )) + b i + v Z ,i + (cid:15) i , if Z ,i is a column-averaged retrieval, (7)where ( s i , t i ) is the space-time location of the retrieval, ˆ A i is the assumed (but necessarily approximate)observation operator of the i th retrieval that column-averages the mole fraction ﬁeld via an averaging kernel; b i is bias; v Z ,i is mean-zero spatio-temporally correlated random error; and (cid:15) i is mean-zero uncorrelatedrandom error. The bias and error terms arise from the use of an approximate observation operator. Surface-based or remotely sensed CA retrievals are sometimes provided as ‘bias-corrected.’ In this case, the datamodel for these retrievals is identical to (7) but with the bias component omitted.Substituting (5) into (6) and (7) we see that, in general, we have that Z ,i = ˆ Z ,i + r (cid:88) j =1 ˆ ψ j,i α j + b i + ξ i + (cid:15) i , (8)where, for a PR measurement, ˆ Z ,i ≡ Y ( s i , h i , t i ); ˆ ψ j,i ≡ ˆ ψ j ( s i , h i , t i ); b i = 0; and ξ i ≡ v , ( s i , h i , t i ) + v ( s i , h i , t i ); while for a CA retrieval, ˆ Z ,i ≡ ˆ A i ( Y ( s i , · , t i )); ˆ ψ j,i ≡ ˆ A i ( ˆ ψ j ( s i , · , t i )), j = 1 , . . . , r ; and ξ i ≡ ˆ A i ( v , ( s i , · , t i ) + v ( s i , · , t i )) + v Z ,i . Bias-corrected retrievals follow (8) but with the bias componentomitted. Note that, for identiﬁability reasons, we have modelled all the various correlated-error terms usingone component, { ξ i } .Flux inversions can make use of both PR measurements and CA retrievals simultaneously, and hence itis convenient to provide a data model that encapsulates both types of measurements. It is also often thecase that measurements from the same instrument type can be divided into groups that can be expected tohave similar characteristics, such as group-speciﬁc bias and error properties. A given group could contain,for example, PR data from the same in situ instrument, or CA retrievals from a particular remote sensinginstrument under a speciﬁc retrieval mode (e.g., land nadir). Hence, we consider the following general datamodel, where diﬀerent groups have diﬀerent terms, but the overall structure is the same: Z ,g = ˆ Z ,g + ˆ Ψ g α + b g + ξ g + (cid:15) g , g = 1 , . . . , n g , (9)where n g is the number of groups; Z ,g contains the data in group g ; ˆ Z ,g are the prior expected mole-fractions under the approximate transport model and, if the groups consist of CA retrievals, under theapproximate observation operators; ˆ Ψ g are the response functions evaluated at either the PR locations (inthe case of PR measurements) or averaged over a column via an approximate observation operator (in thecase of CA retrievals); b g ≡ A g β g are group-speciﬁc biases, with A g a group-speciﬁc design matrix and β g the corresponding weights; ξ g is the group’s vector of correlated errors; and (cid:15) g is the group’s vector ofuncorrelated errors. When the data in the group are considered to be unbiased (or already bias-corrected),the term A g β g = . The variables constituting β g and (cid:15) g , for g = 1 , . . . , n g , are mutually independentwithin and across groups. 6he most diﬃcult terms to characterise, and hence those most likely to lead to model misspeciﬁcationissues, are the correlated error terms { ξ g : g = 1 , . . . , n g } , since these are sums of multiple unidentiﬁableerror terms. Clearly, { ξ g } are correlated both within and between groups, as they have common sources ofvariation such as transport-model error. However, our Bayesian statistical framework in WOMBAT assumesthat { ξ g } are independent across groups, for two reasons. The ﬁrst reason is computational, as estimatingcross-correlations between groups adds an extra layer of computational complexity that is probably unneces-sary given the amount of data that can be expected within each group. The second reason is more subtle andrelates to covariance nonstationarity in the transport model and the ﬂux dimension-reduction error. Thiscovariance nonstationarity manifests itself as nonstationarity in the correlation of the model–data discrep-ancy, and the simplest way to model this eﬀect is to assume independence across groups of measurementsthat are proximal in space and/or time (such as measurements from the same in situ instrument) and thento ﬁt separate covariance-model parameters to data from each group.Irrespective of the model chosen for inter-group correlations, as demonstrated through its construction,the correlation between elements within each { ξ g } associated with measurements that are proximal in spaceand time are stronger than those that are farther apart. The elements will almost certainly also havesubstantial variance. However, there has been extensive discussion as to whether this correlation needsto be explicitly modelled within an inversion framework. While spatio-temporal correlation in model–datadiscrepancies is widely acknowledged (Chevallier, 2007; Ciais et al., 2010; Mukherjee et al., 2011; Miller et al.,2020), the general consensus is that using just the variance of ξ g to add to the variance of the uncorrelatedcomponent (or, similarly, to omit this term entirely and just scale the variance of the (cid:15) g instead), is suﬃcient(e.g., Michalak et al., 2005; Basu et al., 2013; Deng et al., 2016). However, as we show in simulations givenin Section 5.1, even when a measurement-error variance inﬂation factor is estimated, statistical eﬃciency islost under the assumption of uncorrelated errors if the errors truly are correlated. The main reason not toroutinely model spatio-temporal correlations in global ﬂux inversion appears to be computational; we discussa way to rectify this bottleneck in Section 3. More details on how we represent the model–data discrepancyare given in Section 2.4. The parameter model (i.e., prior distributions) is dependent on the speciﬁcation of the ﬂux process model,the mole-fraction process model, and the mole-fraction data model. Here, we describe the parameter modelwe use in the OSSE and in the MIP comparison in Section 5.

Parameters of the ﬂux process model:

In the experiments of Section 5, our ﬂux basis functions are frombottom-up inventories that are divided into r s spatial regions and r t time spans. This construction yields r = r s × r t basis functions, and naturally suggests a temporal partitioning of α into ( α , . . . , α r t ) (cid:48) , whereeach α k ∈ R r s , k = 1 , . . . , r t . This, in turn, suggests that a suitable model for α is the vector-autoregressiveprocess, similar to that used by Peters et al. (2005) and Dahl´en et al. (2020), α k +1 = M ( κ ) α k + w k ; k = 1 , . . . , r t , where, in our examples, we constrain the matrix M ( κ ) to be diagonal with non-zero elements equal to κ ≡ ( κ , . . . , κ r s ) (cid:48) ; and we let w k ∼ Gau( , Σ w ), where the precision matrix Q w ≡ Σ − w is diagonal withpositive elements τ w ≡ ( τ w,j : j = 1 , . . . , r s ) (cid:48) . The ﬂux-process parameters are therefore θ α ≡ ( κ (cid:48) , τ (cid:48) w ) (cid:48) ,which in turn govern the covariance matrix of α , Σ α ≡ var( α ). There is an ordering of α for which Σ − α isblock-tridiagonal (see Appendix A). Either sequential estimation (e.g., Kalman ﬁltering/smoothing) or batchBayesian updating can be used to make inference on α . In our case we use the latter, and take advantageof eﬃcient algorithms that are available for sparse linear algebraic computations.We expect that { α k } are positively correlated in time. Therefore, for the prior distributions for { κ j : j =1 , . . . , r s } , we use the Beta distribution: Independently, κ j ∼ Beta( a κ,j , b κ,j ) , j = 1 , . . . , r s , where { a κ,j } and { b κ,j } are ﬁxed and assumed known. For prior distributions for the precision parameters { τ w,j } we use gamma distributions with shape parameters { ν w,j } and rate parameters { ω w,j } , which are7xed and assumed known: Independently, τ w,j ∼ Ga( ν w,j , ω w,j ) , j = 1 , . . . , r s . Michalak et al. (2005) suggested that variance parameters could be estimated directly in a maximum likeli-hood framework. The use of prior distributions on τ w and κ adds an extra level of ﬂexibility and allows themodeller to express the ‘uncertainty on the uncertainties’ in an inversion framework (e.g., Ganesan et al.,2014). One can conﬁgure these prior distributions to be extremely informative and, eﬀectively, ﬁx the priormodel for the ﬂux, or else make them extremely uninformative so that the posterior maximum- a-posteriori (MAP) estimates of the parameters in the prior ﬂux-process model converge to the maximum likelihoodestimates. This choice can be made on a region-by-region basis, as is the case in Section 4.5, where landregions are given uninformative priors and and ocean regions are given informative ones. CA mole-fraction retrieval bias parameters:

In our studies of Section 5, we consider OCO-2 retrievalsand a diﬀerent set of bias parameters for each instrument mode. In this context, β g is associated with aparticular instrument mode and a single element of β g captures the bias arising from, for example, aerosolpresence. Investigations in Section 5.3 reveal that these bias parameters are quite easily constrained in aninversion framework. When constructing the model for each β g , we ﬁrst standardise each row in A g so thatthe covariates have unit marginal empirical variance. Then we model { β g } as follows: Independently, β g ∼ Gau( , σ β I ) , g = 1 , . . . , n g , with σ β = 100. This choice for σ β renders the prior distribution uninformative for the data-set sizes weconsider in our experiments. Model–data discrepancy and measurement-error parameters:

For each g = 1 , . . . , n g , the terms ξ g and (cid:15) g need to be considered together, since often they are confounded in inversion studies. In particular,there are two cases we need to cater for:Case (i): Measurements Z ,g are usually provided to the modeller with accompanying uncertainty mea-sures in the form of marginal standard errors. In this ﬁrst case, the prescribed standard error is what weattribute to the component (cid:15) g by constructing the (diagonal) covariance matrix, var( (cid:15) g ) ≡ Σ (cid:15),g , with thesquared prescribed standard errors on the diagonal. However, it is likely that the prescribed standard er-rors are either over-estimates or under-estimates of the true standard error (Worden et al., 2017), and wetherefore let diag( Σ (cid:15),g ) = γ g · v ps g , where v ps g is a vector containing the square of the prescribed standarderrors, and { γ g : g = 1 , . . . , n g } are group-speciﬁc inﬂation factors (e.g., Michalak et al., 2005). We modelthe inﬂation factors using inverse-gamma distributions: Independently, γ g ∼ IG( ν γ g , ω γ g ) , g = 1 , . . . , n g , where the shape and rate parameters, { ν γ g } and { ω γ g } , are ﬁxed and assumed known.We model the covariance matrix Σ ξ g ≡ var( ξ g ) , using a spatio-temporal covariance function C ξ g ( · , · ; θ ξ g )(e.g., Cressie and Wikle, 2011, Chapter 6), where the parameter vector θ ξ g contains unknown parametersthat need to be inferred from the data. Here we have n τ g precision parameters, τ ξ g , and n (cid:96) g length scaleparameters, (cid:96) g ; that is, for Case (i), θ ξ g = ( τ (cid:48) ξ g , (cid:96) (cid:48) g ) (cid:48) . We model each precision parameter and each length-scale parameter in each group using gamma distributions: Independently, τ ξ g ,j ∼ Ga( ν ξ g ,j , ω ξ g ,j ) , j = 1 , . . . , n τ g ; g = 1 , . . . , n g ,(cid:96) g,j ∼ Ga( ν (cid:96) g ,j , ω (cid:96) g ,j ) , j = 1 , . . . , n (cid:96) g ; g = 1 , . . . , n g , where the shape and rate parameters { ν ξ g ,j } , { ω ξ g ,j } , { ν (cid:96) g ,j } , and { ω (cid:96) g ,j } , are ﬁxed and assumed known.Case (ii): Occasionally, as in the MIP of Crowell et al. (2019), one is provided with a measure ofuncertainty that contains a component intended to account for CTM error. In this case, it is reasonable tolet the total marginal variance from ξ g and (cid:15) g equal the inﬂated prescribed variance, γ g · v ps g . We enforce thisconstraint by letting diag( Σ ξ g ) = ρ g γ g · v ps g , and diag( Σ (cid:15) g ) = (1 − ρ g ) γ g · v ps g . This formulation encapsulatesthe notion that the prescribed error includes the transport model–error component, but that the overall8ontribution of that component is uncertain. Thus, the parameter ρ g needs to be estimated in order toobtain the relative contribution of the correlated component. The inﬂation factor γ g is still needed since,ﬁrst, the prescribed variances could still be misspeciﬁed and, second, even if the prescribed variances werecorrectly speciﬁed, they do not take into account the dimension-reduction error, that is speciﬁc to the ﬂuxprocess model.We model the contribution factors { ρ g : g = 1 , . . . , n g } via a standard uniform distribution: Indepen-dently, ρ g ∼ Unif(0 , , g = 1 , . . . , n g , and, as in Case (i), we use gamma prior distributions to model the length scales { (cid:96) g } in the covariancefunction. For Case (ii), θ ξ g = ( ρ g , (cid:96) (cid:48) g ) (cid:48) . To summarise, the Bayesian model we implement in WOMBAT consists of a ﬂux process model, a mole-fraction process model, a mole-fraction data model, and a parameter model, where the parameter modeldepends on the speciﬁcations of the other three. The hierarchical Bayesian model, which we use in theexamples of Section 5, can be written succinctly as follows: Independently,Flux scaling factors autoregressive parameters: κ j ∼ Beta( a κ,j , b κ,j ) , j = 1 , . . . , r s , Flux scaling factors innovation-term precisions: τ w,j ∼ Ga( ν w,j , ω w,j ) , j = 1 , . . . , r s , Flux scaling factors: α | κ , τ w ∼ Gau( , Σ α ) , Measurement-bias coeﬃcients: β g ∼ Gau( , σ β I ) , g = 1 , . . . , n g , Measurement-error variance inﬂation factors: γ g ∼ IG( ν γ g , ω γ g ) , g = 1 , . . . , n g , Correlated-error length scales: (cid:96) g,j ∼ Ga( ν (cid:96) g ,j , ω (cid:96) g ,j ) , j = 1 , . . . , n (cid:96) g ; g = 1 , . . . , n g , Correlated-error precisions (only Case (i)): τ ξ g ,j ∼ Ga( ν ξ g ,j , ω ξ g ,j ) , j = 1 , . . . , n τ g ; g = 1 , . . . , n g , Correlated-error contribution (only Case (ii)): ρ g ∼ Unif(0 , , g = 1 , . . . , n g , Likelihood: Z | β , α , θ ξ,(cid:15) ∼ Gau( ˆ Z + ˆ Ψ α + A β , Σ ξ + Σ (cid:15) ) , where Z ≡ ( Z (cid:48) , , . . . , Z (cid:48) ,n g ) (cid:48) , ˆ Z ≡ ( ˆ Z (cid:48) , , . . . , ˆ Z (cid:48) ,n g ) (cid:48) , ˆ Ψ ≡ ( ˆ Ψ (cid:48) , . . . , ˆ Ψ (cid:48) n g ) (cid:48) , A ≡ bdiag( A , . . . , A n g ) , β ≡ ( β (cid:48) , . . . , β (cid:48) n g ) (cid:48) , Σ ξ ≡ bdiag( Σ ξ , . . . , Σ ξ ng ) , Σ (cid:15) ≡ bdiag( Σ (cid:15) , . . . , Σ (cid:15) ng ) , and θ ξ,(cid:15) ≡ (( θ (cid:48) ξ g , γ g ) : g =1 , . . . , n g ) (cid:48) . Here, bdiag( · ) constructs a block-diagonal matrix from its arguments. A graphical model sum-marising the relationships between the variables is given in Figure 1.The joint posterior distribution over all quantities can be estimated using a Gibbs sampler, which succes-sively ‘updates’ parameters using their full conditional distributions. Often, our choice of prior distributionsensure that these conditional distributions are of known form and can be sampled from directly. When this isnot the case (in particular, for the parameters κ , (cid:96) g , and ρ g , where g = 1 , . . . , n g ), we employ a slice sampler(Neal, 2003) to obtain samples from the full conditional distributions. Details are given in Appendix A. The posterior distribution over all the unknown parameters in our model is given by p ( α , β , κ , τ w , θ ξ,(cid:15) | Z ) ∝ p ( Z | β , α , θ ξ,(cid:15) ) p ( α | κ , τ w ) p ( β ) p ( θ ξ,(cid:15) ) p ( κ ) p ( τ w ) . (10)9 τ w α Z ,g β g ρ g (cid:96) g γ g g = 1 , . . . , n g Fig. 1: Graphical model summarising the relationship between the variables and parameters to be inferredand the grouped data { Z ,g : g = 1 , . . . , n g } in Case (ii). In Case (i) we replace ρ g with τ ξ g .The log of the ﬁrst two terms on the right-hand side of (10) are expressions that are commonly seen inoptimisation-based ﬂux-inversion frameworks. In particular, we have thatlog p ( Z | β , α , θ ) p ( α | κ , θ w ) = −

12 log | Σ ξ + Σ (cid:15) | −

12 ( Z − Z ,p ( β , α )) (cid:48) ( Σ ξ + Σ (cid:15) ) − ( Z − Z ,p ( β , α )) −

12 log | Σ α | −

12 ( α − α p ) (cid:48) Σ − α ( α − α p ) + const., (11)where Z ,p ( β , α ) ≡ ˆ Z + ˆ Ψ α + A β , in our case we set α p = , and “const.” denotes a constant. The primarydiﬀerences to popular optimisation-based ﬂux-inversion frameworks are the presence of the log-determinants,which penalise covariance matrices that have large determinants (large correlations and/or large variances),and the presence of oﬀ-diagonal elements in the matrix Σ ξ + Σ (cid:15) . The log-determinants can only be omittedwhen all covariance matrices are considered known a priori .The computational complexity of the Gibbs sampler is dominated by that of the log-likelihood functionlog p ( Z | β , α , θ ξ,(cid:15) ) , which is the sum of the group-wise log-likelihood functions, log p ( Z ,g | β , α , θ ξ,(cid:15) ), g = 1 , . . . , n g . For computationally eﬃcient inference, we must hence ensure that each group-wise log-likelihood function is simple to evaluate. From (9), the group-wise log-likelihood is given bylog p ( Z ,g | β , α , θ ξ,(cid:15) ) = − m g π −

12 log | Σ g |−

12 ( Z ,g − ˆ Z ,g − ˆ Ψ g α − A g β g ) (cid:48) Σ − g ( Z ,g − ˆ Z ,g − ˆ Ψ g α − A g β g ) , (12)for g = 1 , . . . , n g , where Σ g ≡ Σ ξ g + Σ (cid:15) g , and m g is the number of data in group g . From a computationalperspective there are two components in (12) that can present diﬃculties. The ﬁrst component is the matrixˆ Ψ g ; this matrix is dense, and its number of elements scales linearly with both the number of data pointsand the number of basis functions, r . However, this matrix just needs to be evaluated once using the CTMand can typically be generated eﬃciently on a large parallel computing infrastructure (see Section 4.6). Thesecond component is the matrix Σ g , which is of size m g × m g and generally dense. Recall from Section 2.4that this covariance matrix is constructed using a spatio-temporal covariance function C ξ g ( · , · ; θ ξ g ), wherethe parameters θ ξ g are sampled within the Gibbs sampler. Therefore, this covariance matrix needs to bere-constructed at each sampler iteration. For m g ≈ , Σ g is toolarge to ﬁt in the random-access memory of most computers, let alone factorised, which is infeasible.We deal with Σ g by using an approximation ﬁrst proposed by Vecchia (1988), and further formalisedunder the guise of nearest-neighbour Gaussian processes by Datta et al. (2016). Here, one ﬁrst orders the10lements of ξ g . Then, one approximates the joint distribution of ξ g as, p ( ξ g ) = p ( ξ g, ) m g (cid:89) i =2 p ( ξ g,i | ξ g, i − ) ≈ p ( ξ g, ) m g (cid:89) i =2 p ( ξ g,i | ξ g, N g,i ) , (13)where N g,i is the “past” neighbour set of the i th datum in group g , which contains a (very small) subset ofthe integers between, and including, 1 and ( i − ξ g that approximates the true joint distribution. The approximate distribution is Gaussianwith mean and a sparse precision matrix, Σ − ξ,g , with the degree of sparsity closely connected to the sizesof the sets {N g,i : i = 1 , . . . , m g } .In the version of WOMBAT presented here, we consider a special case of (13), where the observationsare ordered in time and where the covariance function C ξ g ( · , · ; θ ξ g ) is simply an exponential function oftemporal separation. In this case one only needs the consideration of one (temporal) length-scale parameterper group, (cid:96) g, , g = 1 , . . . , n g . The motivations for this simpliﬁcation are twofold. First, the remote-sensinginstrument we consider in Sections 4 and 5 ﬂies on a satellite that is in a sun-synchronous orbit, so thatcorrelation in time is a proxy for along-track correlations. This model for characterising correlation in theerrors was suggested and used by Chevallier (2007). Second, the use of an exponential covariance functionallows the approximation in (13) to become an equality, where N g,i = i −

1. This is a manifestation ofthe so-called ‘screening eﬀect,’ where the exponential covariance function induces a ﬁrst-order conditionalindependence structure. Consequently, Σ − ξ g is a tridiagonal matrix of known form; a similar structure wasproposed by Lunt et al. (2016) for in situ measurements in a regional inversion for methane ﬂuxes. Denotethe (temporally ordered) measurement times within a group as t , t , . . . , t m g , the normalised temporalseparations as δ k ≡ ( t k +1 − t k ) /(cid:96) g, , for k = 1 , . . . , m g −

1, and the marginal variances of ξ g as σ ξ g,k , for k = 1 , . . . , m g . Then, it can be shown (e.g., Finley et al., 2009) that Σ − ξ g has diagonal entries σ − ξ g ,k d ξ g ,k , for k = 1 , . . . , m g , where d ξ g ,k =  e − δ − e − δ if k = 1 , e − δ k − e − δ k + e − δ k +1 − e − δ k +1 if k = 2 , , . . . , m g − , e − δ mg − e − δ mg if k = m g , and sub-diagonal entries σ − ξ g ,k σ − ξ g ,k +1 s ξ g ,k , for k = 1 , . . . , m g −

1, where s ξ g ,k = − e − δ k − e − δ k . Now, Σ − g = ( Σ ξ g + Σ (cid:15) g ) − and | Σ g | = | Σ ξ g + Σ (cid:15) g | , where Σ − ξ g is very sparse and Σ − (cid:15) g is diagonal.Eﬃcient computations of (12) therefore follows by expressing Σ − g and | Σ g | in terms of these sparse matrices.Then Σ − g and | Σ g | can be evaluated through the use of the Sherman–Morrison–Woodbury matrix identityand a matrix-determinant lemma (e.g., Henderson and Searle, 1981, and Appendix A). This section gives the setup needed for Section 5, where we compare the inversions from WOMBAT tothose from the OCO-2 MIP (Crowell et al., 2019). In the MIP, inversions followed a predeﬁned protocol;we therefore conﬁgured WOMBAT to follow the same protocol. The MIP prescribed the data to be used,including both preprocessed point referenced and remotely sensed data from OCO-2 between 06 September2014 and 01 April 2017. Participants were tasked to provide ﬂux estimates for the years 2015 and 2016. Theprotocol also speciﬁed a fossil-fuel ﬂux ﬁeld that had to be assumed ﬁxed and known, in order to facilitate theinterpretation of the diﬀerences in ﬂux estimates obtained by the diﬀerent participants. All other modelling11hoices (e.g., transport model, prior ﬂuxes) were left to individual participants.

Our prior expectation of the spatio-temporal ﬂux process, Y ( · , · ), is constructed from inventories of diﬀerenttypes of ﬂuxes through the decomposition Y ( s , · ) = (cid:40) Y , ﬀ1 ( s , · ) + Y , bio1 ( s , · ) + Y , bf1 ( s , · ) + Y , ﬁre1 ( s , · ) , if s is over land, Y , ﬀ1 ( s , · ) + Y , ocn1 ( s , · ) , if s is over ocean , (14)where Y , ﬀ1 ( · , · ) corresponds to fossil fuel emissions, Y , bio1 ( · , · ) to terrestrial biospheric ﬂuxes, Y , bf1 ( · , · ) tobiofuel emissions, Y , ﬁre1 ( · , · ) to ﬁre emissions, and Y , ocn1 ( · , · ) to ocean-air exchange ﬂuxes. We now describethese components in more detail: • Fossil-fuel emissions: For Y , ﬀ1 ( · , · ) we use the Open-source Data Inventory for Anthropogenic CO monthly fossil-fuel emissions (ODIAC2016; Oda and Maksyutov, 2011; Oda et al., 2018) with TemporalImprovements for Modeling Emissions by Scaling (TIMES) weekly scaling factors (Nassar et al., 2013).This term also includes emissions from international aviation and shipping. The fossil-fuel ﬂuxes areprescribed by the MIP protocol. • Biospheric ﬂux: This ﬂux is a result of the interaction between the atmosphere and trees, shrubs,grasses, soils, dead wood, leaf litter, and other biota. It is deﬁned by the formula GPP − R A − R H ,where GPP stands for gross primary production and represents the uptake of carbon by plants dueto photosynthesis; R A is autotrophic respiration, the release of carbon through respiration by plants;and R H is heterotrophic respiration, the release of carbon through the metabolic action of bacteria,fungi, and animals. For Y , bio1 ( · , · ) we use one of the two priors used in CarbonTracker 2019 (Jacobsonet al., 2020), speciﬁcally that based on the Carnegie-Ames Stanford Approach (CASA) biogeochemicalmodel (Potter et al., 1993; Giglio et al., 2013). • Biofuel emissions: These emissions result from the burning of wood, charcoal, and agricultural wastefor energy, as well as the burning of agricultural ﬁelds. For Y , bf1 ( · , · ) we use the estimates of Yevichand Logan (2003) that in turn were based on data from 1985. • Fire emissions: These emissions correspond to those from vegetative ﬁres (wildﬁres), which may starteither naturally or by humans. For Y , ﬁre1 ( · , · ) we use the Quick Fire Emissions Dataset, version 2(QFED2; Darmenov and da Silva, 2015). • Ocean–air exchange: These ﬂuxes are a result of ocean–air diﬀerences in partial pressure of CO . For Y , ocn1 ( · , · ) we use the estimates of Takahashi et al. (2002), with annual scalings reﬂecting increasinguptake of CO as described by Takahashi et al. (2009).A summary of these components is provided in Table 1. We divided the globe into the r s = 22 disjoint TransCom3 regions (Gurney et al., 2002), and time into the r t = 31 months between (and including) September 2014 and March 2017, and then we constructed one ﬂuxbasis function for each region/month pair. This yielded r = r s × r t = 682 basis functions, each with non-zerosupport in a space-time volume spanning one month in time, and one TransCom3 region in space. Halfof the TransCom3 regions are land regions, and half are ocean regions, so that half of our basis functionscorrespond to land areas, and half to ocean areas. We show the TransCom3 regions in Figure C1, and theircodes and labels in Table C1, both of which can be found in Appendix C. Some areas of the globe, depictedin white in Figure C1, are assumed to have zero ﬂux; all our basis functions are zero in these regions.12able 1: Components of the ﬂux prior Y ( s , t ) used in (14). Symbol Type Inventory name Resolution Reference(s) Y , ﬀ1 ( · , · ) Fossil fuel Open-source DataInventory forAnthropogenic CO Y , bio1 ( · , · ) Biospheric CarbonTracker 2019,based on theCarnegie-Ames StanfordApproach (CASA)biogeochemical model 3 hourly Potter et al. (1993);Giglio et al. (2013);Jacobson et al. (2020) Y , bf1 ( · , · ) Biofuels Yevich & Logan Constant Yevich and Logan (2003) Y , ﬁre1 ( · , · ) Fires Quick Fire EmissionsDataset, version 2(QFED2) Hourly Darmenov and da Silva (2015) Y , ocn1 ( · , · ) Ocean Scaled Takahashi Monthly Takahashi et al. (2002);Takahashi et al. (2009)For φ j ( · , · ) a basis function corresponding to a land area, we have φ j ( s , t ) = (cid:40) Y , bio1 ( s , t ) + Y , bf1 ( s , t ) + Y , ﬁre1 ( s , t ) , ( s (cid:48) , t ) (cid:48) ∈ D φj , , (15)where D φj is the space-time volume over which the j th basis function is deﬁned to be non-zero. For φ j ( · , · )a basis function corresponding to an ocean area, we have φ j ( s , t ) = (cid:40) Y , ocn1 ( s , t ) , ( s (cid:48) , t ) (cid:48) ∈ D φj , . (16)Note that both (15) and (16) exclude Y , ﬀ1 ( · , · ). This is done to meet the MIP requirement that fossil-fuelﬂuxes are treated as ﬁxed and known (which is common practice in ﬂux inversion; e.g., see Basu et al., 2013).The inﬂuence of fossil fuels on the mole-fraction ﬁeld is therefore present only as an invariant component ofthe prior expectation of the mole-fraction ﬁeld.Since the spatio-temporal patterns of the ﬂuxes within a region–month space-time volume are ﬁxed, ourconstruction is quite restrictive. However, the space-time patterns are dictated by those in the inventoriesused to construct the basis functions, and they can be assumed to be fairly realistic. This underlyingassumption is ubiquitous in ﬂux-inversion systems; for example, Jacobson et al. (2020) scale 3-hourly ﬂuxesusing weekly scale factors over 156 regions, while Basu et al. (2013) use monthly scale factors for 3-hourlyﬂuxes over 6 ◦ × ◦ grid cells. Constraining the spatio-temporal pattern is inferentially advantageous becauseit helps address the ill-posed nature of ﬂux inversion. It is also computationally advantageous because itreduces the number of unknowns for which inference is needed. The disadvantage is that the reliance on apriori structures increases the risk of dimension-reduction error because, while our basis functions allow theposterior ﬂuxes to vary at sub-TransCom3-region scales, variations that don’t follow the prescribed patternare necessarily ignored.As described in Section 2.2, for j = 1 , . . . , φ j ( · , · ) has a corresponding mole-13raction basis function ˆ ψ j ( · , · , · ), which may be constructed by running the transport model, ˆ H , under theﬂux Y ( · , · )+ φ j ( · , · ). Then the mole-fraction basis function ˆ ψ j ( · , · , · ) is recovered through linearity by simplysubtracting Y ( · , · , · ) from the output mole-fraction ﬁeld. For illustration, Figure 2 shows examples of theﬂux basis functions (monthly averaged) for January 2016 and the regions TransCom3 02 and TransCom3 06,and snapshots of the corresponding mole-fraction basis functions (daily and atmospheric-column averaged)obtained using the transport model described next in Section 4.3. For our approximate transport model, ˆ H , we used the GEOS-Chem global 3-D chemical transport model,version 12.3.2 (Bey et al., 2001; Yantosca, 2019), driven by the GEOS-FP meteorological ﬁelds from NASA’sGoddard Earth Observing System (Rienecker et al., 2008). We use the oﬄine GEOS-Chem CO simulation(Nassar et al., 2010), with the native horizontal resolution of 0 . ◦ × . ◦ and 72 vertical levels aggregatedto 2 ◦ × . ◦ and 47 vertical levels for computational eﬃciency. We use a transport time step of 10 minutes,and a ﬂux time step of 20 minutes. All ﬂuxes described in sections 4.1 and 4.2 were implemented in GEOS-Chem using the HEMCO emissions component (Keller et al., 2014). GEOS-Chem can be conﬁgured to allowfor a 3-D chemical source of CO due to oxidation of other trace gases, but this was disabled for compatibilitywith the OCO-2 MIP.The approximate initial condition, ˆ Y ( · , · , t ), speciﬁes the mole-fraction ﬁeld at the beginning of thestudy period on 01 September 2014. For our initial mole-fraction ﬁeld, we used a modiﬁed version of thatgenerated by Bukosa et al. (2019). This mole-fraction ﬁeld was constructed using a spin-up period, startingon 01 January 2005, and ending on 01 September 2014, with transport driven by inventory ﬂuxes andmeteorology (that in some cases diﬀer from those we use here; see Bukosa et al., 2019, for details). At theend of the spin-up period, the whole mole-fraction ﬁeld on 01 September 2014 was scaled such that thevalue in the surface grid cell containing the South Pole was equal to the monthly-averaged mole-fractionmeasurement from surface ﬂask measurements at the South Pole (Thoning et al., 2020) in September 2014.The South Pole was chosen as our calibration point due its physical isolation from strong sources and sinks. This study uses a subset of the data sources in the MIP (Crowell et al., 2019). These include retrievals ofcolumn-averaged CO by NASA’s OCO-2 satellite (Eldering et al., 2017), and retrievals of column-averagedCO from TCCON (Wunch et al., 2011a). As in the MIP, we use OCO-2 data to estimate CO ﬂuxes, andTCCON data to validate the estimates. The OCO-2 satellite was launched in 2014 with the goal of retrieving atmospheric CO mole fractions. Theon-board instrument measures radiances in three near-infrared spectral bands, which in turn are used toretrieve the CO mole fraction on 20 vertical levels via a retrieval algorithm based on Bayesian optimalestimation (Rodgers, 2000; O’Dell et al., 2012). The retrieved levels are column-averaged, and then bias-corrected through comparison with TCCON retrievals (Wunch et al., 2011a). The OCO-2 team releasesregular revisions of the retrieval dataset.OCO-2 radiance measurements are taken in three distinct pointing modes: nadir mode, where the satelliteaims at the point directly beneath it; glint mode, where the satellite points at the reﬂection of the sun onthe surface; and target mode, where the satellite aims at a speciﬁc target, typically a ground station thatalso measures CO mole fractions. Target observations are generally excluded from ﬂux inversions and usedonly for instrument calibration. Over the ocean, nadir measurements have insuﬃcient signal-to-noise ratio toprovide useful retrievals, while over land, both nadir and glint retrievals are made. There are therefore threeretrieval modes to consider, land glint (LG), land nadir (LN), and ocean glint (OG). The error propertiesof retrievals over land diﬀer signiﬁcantly from those over ocean; in particular, the OG retrievals in the V7rdataset (the dataset used in the MIP) are not considered reliable, and were therefore excluded from the MIP(Crowell et al., 2019). We follow this protocol, and only do inversions using LG and LN data.The MIP protocol dictated the use of a post-processed version of the V7r retrievals; this post-processingwas done as follows. First, an additional bias-correction term related to high-albedo measurements was14 j ( s , t ) TransCom3 02 TransCom3 06 − −1.5−1.0−0.50.00.51.01.5 [ k g / m / y ea r ] y ^ j ( s , h , t ) TransCom3 02 TransCom3 06 − − − − − − −0.2−0.10.00.10.2 [ pp m ] Fig. 2: Examples of ﬂux basis functions that have support in the month of January 2016 in the regionsTransCom3 02 and TransCom3 06, and the corresponding mole-fraction basis functions. The ﬁrst row showsthe values of the ﬂux basis functions, φ j ( · , · ), averaged over the whole month (these basis functions arezero outside January 2016). The next three rows show daily averages of the column-averaged CO for thecorresponding mole-fraction basis functions on three days: the start of the month-long period where the ﬂuxis non-zero, 01 January 2016; the middle of the period, 15 January 2016; and 15 days after the end of theperiod, 15 February 2016. 15pplied to the XCO retrievals. The bias-corrected retrievals were then grouped and averaged into 1 s bins,and then they were further grouped and averaged into 10 s bins. The 10 s spans correspond to groundswathes of approximately 67 km in length. The standard deviation for each 10 s retrieval was computed as afunction of the individual retrieval standard deviations, with an additional model–data mismatch term addedto account for the expected diﬀerences arising from transport-model errors. For the MIP, the 10 s averageswere assumed to be independent but, following Section 2.4, we treat them as dependent. As the prescribedvariances already account for transport-model error, we adopt Case (ii) of Section 2.4 in our model. Formore details on how the 10 s averages were computed, including how the standard errors were derived, seeCrowell et al. (2019).The OCO-2 retrieval algorithm produces estimates of XCO . In equation (7), we encapsulate the retrievalalgorithm in the observation operator, ˆ A i . Appendix B gives more details on this observation operator inthe case of OCO-2 retrievals. TCCON is a network of ground-based sites measuring solar radiances in the near-infrared spectral band(Wunch et al., 2011a). Similar to the way OCO-2 retrievals are obtained, these measurements are convertedto retrievals of column-averaged CO (and other gases) using a retrieval algorithm. TCCON retrievals havebeen adjusted to agree with World Meteorological Organization (WMO) trace-gas measurement scales, andvalidated using aircraft data (Wunch et al., 2010). As both TCCON and remote sensing instruments retrievecolumn-averaged mole fractions, the TCCON data are an important validation resource (Wunch et al.,2011b). The MIP used TCCON column-averaged CO retrievals from the GGG2014 release as validationdata, including all retrievals available as of July 6, 2017. In the MIP, outliers and retrievals corresponding tosoundings with high solar zenith angle were removed. The remaining retrievals were then averaged over 30-minute intervals; further details are given by Crowell et al. (2019). We note that the ﬁltering procedure usedin the MIP occasionally led to long periods of time for which data were considered missing. For consistencywith the MIP, in this study we used the same retrievals and postprocessing methods; the stations used arelisted in Table 2.Like OCO-2, TCCON retrievals also have an associated observation operator ˆ A i . This has a similarform to the operator for OCO-2, which is described in Appendix B. A detailed description of the TCCONobservation operator is given by Wunch et al. (2011b). The prior distributions for the parameters governing the scaling factors, α , are speciﬁed separately for theland and ocean TransCom3 regions. The land regions, which are observed directly by OCO-2 when in LGor LN mode, are assigned a non-informative prior, while the indirectly observed ocean regions (which alsohave relatively small ﬂuxes over a given area) are assigned a relatively informative prior. Informative priorsfor ocean ﬂuxes were deemed necessary following OSSEs performed by us, which revealed that it is often notpossible to reliably constrain ocean ﬂuxes from OCO-2 land data.Speciﬁcally, for j corresponding to a land region, we assigned a prior to κ j by letting a κ,j = b κ,j = 1(resulting in a uniform prior), and for τ w,j we let ν w,j = 0 .

354 and ω w,j = (1 − κ j ) / . τ w,j implies that 1 /τ w,j , the marginal variance of the elements of the scalings in α that correspond to landregions, has 5% and 95% percentiles of 0.01 and 10, respectively, which is reasonably uninformative. For j corresponding to an ocean region, we apply an independent and identically distributed Gaussian prior withmean zero and standard deviation 0.5 to α j . This is achieved by ﬁxing κ j = 0 and τ w,j = 4.As described in Section 4.4.1, the OCO-2 MIP 10 s averages come with prescribed uncertainties thatinclude both measurement error and transport-model error. This falls under Case (ii) of Section 2.4, so theparameters governing these processes are γ g , ρ g , and (cid:96) g, . For the prior distribution of γ g , we let ν γ g = 1 . ω γ g = 2 . ρ g . For (cid:96) g, , we let ν (cid:96) g , = 1 and ω (cid:96) g , = 1 min − . When doing bias correctiononline, we used the prior on β described in Section 2.4.16able 2: Names of the TCCON stations used in this study.Name ReferenceAscension Island, Saint Helena Feist et al. (2014)Bialystok, Poland Deutscher et al. (2015)Bremen, Germany Notholt et al. (2014)Caltech, Pasadena, CA, USA Wennberg et al. (2015)Darwin, Australia Griﬃth et al. (2014a)Edwards (Armstrong/Dryden), CA, USA Iraci et al. (2016)Eureka, Canada Strong et al. (2016)Iza˜na, Spain Blumenstock et al. (2014)Karlsruhe, Germany Hase et al. (2015)Lamont, OK, USA Wennberg et al. (2016)Lauder, New Zealand Sherlock et al. (2014)Manaus, Brasil Dubey et al. (2014)Orl´eans, France Warneke et al. (2014)Park Falls, WI, USA Wennberg et al. (2014)R´eunion Island, France De Mazi`ere et al. (2014)Saga, Japan Kawakami et al. (2014)Sodankyla, Finland Kivi et al. (2014)Tsukuba, Japan Morino et al. (2016)Wollongong, Australia Griﬃth et al. (2014b) Computations were performed in two stages. In the ﬁrst stage, the 682 mole-fraction basis functions wereprecomputed in the manner described in Section 4.2. This is the most computationally demanding step, aseach basis function requires the CTM to be run for several days of clock time, on average. Fortunately, sinceevery basis function can be computed independently from all others, computing them is an embarrassinglyparallel problem. Furthermore, since the basis functions are shared between the diﬀerent inversions in thissection, they only need to be computed once. Computation of the basis functions took seven days in totalusing the Gadi supercomputer at the Australian National Computational Infrastructure.The inversions were performed in the second stage. As mentioned in Section 2.5, the posterior distributionfor each inversion was estimated using an MCMC sampling scheme, with details given in Appendix A. Thesampling schemes in all cases were run for 11,000 iterations, and the ﬁrst 1,000 iterations were discardedas burn-in. Convergence of the MCMC chain was assessed by inspection of all the trace plots. The schemewas implemented in the R programming language (R Core Team, 2020), with intensive linear algebraiccomputations oﬄoaded for performance to a graphics processing unit (GPU) using Tensorﬂow (Abadi et al.,2016). The total running time of the sampler depended on which model assumptions were used; in particular,whether uncorrelated (15 minutes) or correlated errors (two hours) were modelled. All inversions wereperformed on a machine with an 8-core Intel i9-9900K CPU running at 3.60 GHz and an NVIDIA RTX 2080GPU.

In this section we evaluate WOMBAT, ﬁrst in an OSSE, where the true ﬂuxes are assumed known and dataare simulated from these true ﬂuxes, then on actual satellite data via the MIP protocol of Crowell et al.(2019). Using an OSSE, described in Section 5.1, serves two purposes: ﬁrst, to show that WOMBAT canindeed recover the true ﬂuxes in a controlled environment where the “working model” is the “true model”;and second, to illustrate the importance of modelling measurement-error biases and correlated errors whenthese are present in the true model from which the data are simulated. Then, in Section 5.2, we show thatWOMBAT gives similar ﬂux estimates to those obtained by diﬀerent MIP participants, and that it performs17ell relative to the MIP participants in reproducing out-of-sample TCCON validation data. In Section 5.3we show that WOMBAT is able to estimate bias coeﬃcients online, if needed.

In this section we illustrate the use of WOMBAT in an OSSE, where we randomly draw ﬂux scaling factors α s from a Gaussian distribution with mean and covariance matrix 0 . I , and assume that these are thetrue ﬂux scaling factors. We further assume that the dimension-reduction error, ν ( · , · ), is zero, so that the(simulated) true ﬂux Y s ( · , · ) is given by Y s ( · , · ) = Y ( · , · ) + r (cid:88) j =1 φ j ( · , · ) α sj , (17)where α sj is the j th element of α s . The (simulated) true mole-fraction ﬁeld, Y s ( · , · , · ), is then given by Y s ( · , · , · ) = Y ( · , · , · ) + r (cid:88) j =1 ˆ ψ j ( · , · , · ) α sj . (18)Finally, we simulate measurements from the mole-fraction data model in equation (9) at the same timesand locations as the OCO-2 10 s average retrievals for the LN and LG modes, by passing Y s ( · , · , · ) throughthe corresponding OCO-2 observation operators (see Section 4.4.1).When simulating data via (9), we assume that both b g and ξ g are present. For the bias term b g , weassume that the OCO-2 retrieval biases are a linear combination of covariates that are associated with biasin the retrieval process: • “dp”: The prior–retrieval surface pressure diﬀerential; • “logDWS”: The logarithm of the total retrieved optical depth associated with the aerosol types dust,water cloud, and sea salt; and • “co2 grad del”: The diﬀerence between the retrieved CO mole fractions at the surface and retrievalvertical level 13 (corresponding to the height with air pressure equal to 63.2% of the surface pressure,around 520–650 hPa for most retrievals).The “oﬃcial” V7r bias-correction parameters (regression coeﬃcients) for the original Level 2 (L2) datarelease were obtained through oﬄine comparison of the raw L2 product with TCCON retrievals, and theyare the same for both LG and LN observations. They are equal to 0.3, 0.028, and 0.6, for the three variables,respectively. We construct our (simulated) true biases based on these coeﬃcients.To mimic the MIP setup, we assume that we are in Case (ii), wherein the prescribed variance of eachretrieval needs to be inﬂated, and the inﬂated variance is the sum of the variance from both the correlated( ξ g ) and uncorrelated ( (cid:15) g ) error components. In our OSSE, we assume that the inﬂation factor of theprescribed variances, v ps g , is γ g = 1 .

25, and that the proportion of this variance allocated to the correlatederror process is ρ g = 0 .

8. We induce the correlations using the exponential covariance function described inSection 3 with the single length scale of the correlated component set to (cid:96) g, = 1 minute for all g = 1 , . . . , n g .We specify this correlation structure to be the same for both LG and LN data.We ran four diﬀerent setups in WOMBAT, where bias is assumed or not assumed to be present, andwhere errors are assumed or not assumed to be correlated. The known true ﬂux, generated as describedabove, is the same between the cases, and we evaluate the ability of WOMBAT to recover the truth undereach of the setups. Table 3 gives the root-mean-squared error (RMSE) and continuous ranked probabilityscore (CRPS, Gneiting and Raftery, 2007) when estimating monthly- and regionally-aggregated ﬂuxes usingthese four setups. The regions on which these evaluations are based are the same TransCom3 regions thatwere used to construct the ﬂux basis functions (see Section 4.2), and the quantities in the table are averagesacross all combinations of the 31 months and 22 regions. The true data-generating process involves bothbias and correlated error. Therefore, as one would would expect, Table 3 shows that the WOMBAT setupthat takes into account both of these features performs the best in terms of both RMSE and CRPS, while18able 3: Root-mean-squared error (RMSE) and continuous ranked probability score (CRPS) when estimat-ing monthly regional ﬂuxes using LG and LN data in the OSSE of Section 5.1. Four setups in WOMBATare evaluated and the regions and time periods over which these summaries (averages) are obtained are thesame as those used for constructing ﬂux basis functions (see Section 4.2).RMSE CRPSConﬁguration LG LN LG LNBias correction/correlated errors 0.023 0.021 0.010 0.009Bias correction/uncorrelated errors 0.038 0.038 0.015 0.016No bias correction/correlated errors 0.045 0.026 0.016 0.011No bias correction/uncorrelated errors 0.092 0.063 0.034 0.026the setup that assumes that neither feature is present performs the worst. For the two partially misspeciﬁedsetups, the bias-corrected/uncorrelated setup outperforms the not-bias-corrected/correlated setup for LGdata, while the opposite is true for LN data. Notably, despite the presence of systematic biases in thesimulated data, the WOMBAT setup that assumes no bias, but which takes into account correlated errors,performs overwhelmingly better than the fully misspeciﬁed model that assumes no bias and uncorrelatederrors.Figure 3 shows the (simulated) true (black), prior mean (blue), and posterior distributions (red, purple,orange, and green), for the total ﬂux in the tropics (latitude 23 . ◦ S–23 . ◦ N) in 2015 and 2016, from bothLN and LG, split by the southern and the northern components (latitudes 23 . ◦ S–0 ◦ and 0 ◦ –23 . ◦ N, respec-tively). The four posterior distributions depicted in each panel correspond to the four diﬀerent WOMBATsetups. The interior of the ellipses represent the 95% credible regions. The grey dotted lines along thediagonal correspond to combinations of the southern and northern ﬂuxes that yield the true total ﬂux in thetropics; if the dotted line is not within the ellipse for an inversion, the total ﬂux is misestimated. All shownﬂuxes are exclusive of fossil fuels which, recall, are held ﬁxed in the inversions. Figure C2 in Appendix Cshows the results on a global scale (land vs ocean), while Figure C3 in Appendix C shows the results on aregional scale (TransCom3 region 04 [South American Temperate] vs TransCom region 06 [Southern Africa]). Collectively, the performances of the diﬀerent models, as shown in Figures 3, C2, and C3, align withthe conclusions based on the CRPS and RMSE statistics. In all the cases shown, the 95% credible regionsfor the WOMBAT conﬁguration with both bias correction and correlated error (red) contain the true value,while those for the conﬁguration with neither feature (green) do so rarely. The credible regions for thebias-corrected/uncorrelated variant (orange) are always smaller than the red credible regions, indicatingthat the bias-corrected/uncorrelated variant is overconﬁdent. In contrast, the purple credible regions, forthe not-bias-corrected/correlated variant, are always larger, which may suggest that the correlated errorsare partially compensating for the lack of bias correction in this variant.In summary, this OSSE shows that WOMBAT can recover the true ﬂux when the assumed model is thetrue model. But, more importantly, the OSSE also demonstrates the importance of modelling both biasand correlated errors in inversions. If the bias parameters are omitted, ﬂuxes can be estimated incorrectly,although this may be partially mitigated by modelling correlated errors. If uncorrelated errors are assumed,estimation performance suﬀers, and ﬂux estimates will likely be reported with too small an uncertainty,even if the prescribed variances are allowed to be inﬂated when making inference. Practically, in a real-datasetting, any factors thought to introduce systematic biases should be taken into account, but this OSSE alsosuggests that the use of correlated errors may provide some insurance against any remaining, unmitigated,spatio-temporal biases.

In this section we present results from WOMBAT applied to OCO-2 satellite data under the MIP protocol(Crowell et al., 2019). The protocol mandates the use of OCO-2 retrievals with the TCCON-based oﬄinebias correction. While WOMBAT is capable of online bias correction (see Section 5.3), in this section we19

015 2016 L G L N - ] F l u x i n no r t he r n t r op i cs [ P g C y r - ] Bias correction/correlated errorsBias correction/uncorrelated errors No bias correction/correlated errorsNo bias correction/uncorrelated errors Prior meanTruth

Fig. 3: True, prior, and posterior, estimates of total ﬂux in the northern tropics (0 ◦ –23 . ◦ N) versus thetotal ﬂux in the southern tropics (23 . ◦ S–0 ◦ ) for the OSSE in Section 5.1. The columns correspond to theyears 2015 and 2016, while the rows show which observation groups were used, either OCO-2 land glint (LG)or land nadir (LN), noting that the prior and true ﬂux are the same across rows. Points show the posteriormean ﬂuxes for each model conﬁguration, as well as the prior mean in blue and the truth in black. For theposterior quantities, the ellipses contain 95% of the posterior probability. The grey dotted lines along thediagonal correspond to combinations of southern and northern tropical ﬂux that yield the true total ﬂux inthe tropics. All ﬂuxes are exclusive of fossil fuels, which are held ﬁxed in the inversion.20ollow the protocol and set the bias parameters in WOMBAT equal to zero. Moreover, since we are currentlyunable to deconvolve dimension-reduction error from other sources of error (e.g., transport-model error) inour model, we do not include v ( · , · ) in (1), and let its impact on the mole-fraction ﬁeld, v , ( · , · , · ) in (5),be absorbed into the correlated error term, ξ g . In the OCO-2 MIP, nine participants submitted ﬂuxes based on inversions satisfying the MIP protocol.Each participant reported to the MIP four sets of ﬂuxes: their prior mean ﬂuxes, and ﬂuxes from threeinversions based respectively on point referenced data, OCO-2 LG data, and OCO-2 LN data. Crowell et al.(2019) considered the diﬀerent participants’ ﬂuxes as an ensemble, reporting the ensemble mean, median,and standard deviation across a variety of temporal and spatial scales. Under the same protocol and usingOCO-2 LG and LN data, we compare WOMBAT’s posterior distribution over the ﬂuxes to the correspondingresults from the MIP ensemble.Through its MCMC scheme, WOMBAT’s inversions generate samples from the posterior distributionof Y ( · , · ), the ﬂux ﬁeld. These samples enable estimation of functionals of the posterior distribution,including posterior means and quantiles. While some individual MIP participants are able to produceprobabilistic uncertainty estimates for ﬂuxes, these were not reported as part of the MIP; instead, theempirical distribution from the ensemble of ﬂuxes was used by the MIP for uncertainty quantiﬁcation. Sinceit is diﬃcult to make a quantitative comparison between WOMBAT’s posterior-based uncertainties and theensemble uncertainties in the MIP, we opt here for a qualitative analysis, where the posterior distributionsover the ﬂuxes are visually compared to the ensemble minimum, mean, and maximum. This comparison isdone at both annual and monthly temporal scales, and across spatial scales encompassing the whole globe,global land, global ocean, and zonal bands. Global totals:

Figure 4 presents annual and monthly non-fossil-fuel ﬂuxes for the globe, land regions, andocean regions. Fluxes are shown for the MIP, split into prior, LG inversions, and LN inversions, and forWOMBAT, split into prior, posterior using LG data, and posterior using LN data. Thick lines show theensemble means for the MIP, and the (prior or posterior) means for WOMBAT. Shaded areas and thin linesfor the MIP denote the values between the ensemble minimum and maximum, while for WOMBAT theydenote values in the 95% credible intervals.At the global scale, shown in the ﬁrst row of Figure 4, the MIP ensemble mean for prior ﬂuxes over landand WOMBAT’s prior mean over land are similar. This likely reﬂects the common use of prior inventoriesbetween groups performing ﬂux inversions, particularly biospheric ﬂuxes based on the CASA biogeochemicalmodel (see Section 4.1 and Table 1). By contrast, WOMBAT’s prior ocean ﬂuxes are deeper than any othermember of the MIP ensemble. The global annual non-fossil-fuel ﬂuxes estimated by posterior means fromWOMBAT are very similar for both LG and LN modes, with an overall posterior mean net sink in 2015of 3.40 and 3.14 PgC yr − , respectively, and in 2016 of 4.14 and 4.27 PgC yr − , respectively. For 2015,these sinks are very similar to the MIP ensemble means (3.57/3.21 PgC yr − for LG/LN, respectively) forboth modes while, for 2016, WOMBAT returns a deeper posterior-mean sink than the MIP ensemble means(3.82/3.78 PgC yr − for LG/LN, respectively), although the 95% credible intervals overlap with the ensemblespread. At a monthly scale, WOMBAT reproduces a key feature of the MIP ﬂuxes, wherein the seasonalcycle in the ﬂuxes, driven by the Northern Hemisphere growing season, begins and ends earlier than in theprior for both 2015 and 2016. In agreement with the MIP, WOMBAT results indicate that the peak sinkin the cycle is larger than that in the prior. However, the sink estimated by WOMBAT is around 0.4 PgCmo − smaller than the MIP ensemble mean for both LG and LN. Global land and ocean:

For global land ﬂuxes, shown in the second row of Figure 4, WOMBAT’s resultsagree with those from the MIP for both LG and LN in that a source larger than the prior ﬂux is estimatedfor October 2015. However, while the source persists into November in the MIP ensemble mean, it doesnot do so in the WOMBAT posterior mean. For global ocean ﬂuxes, shown in the third row of Figure 4,the MIP LG- and LN-estimated ﬂuxes diﬀer little from the prior ﬂuxes, and we observe the same for globaloceans in the WOMBAT estimates for both LG and LN modes and in most months. The exceptions areSeptember and October in both years, where WOMBAT estimates a shallower sink, and even zero ﬂux with21 lobal −5−4−3−2−10 2015 2016Year F l u x [ P g C y r - ] −2022015−01 2015−07 2016−01 2016−07Month F l u x [ P g C m o - ] Global Land −202 2015 2016Year F l u x [ P g C y r - ] −3−2−10122015−01 2015−07 2016−01 2016−07Month F l u x [ P g C m o - ] Global Oceans −4−3−2−10 2015 2016Year F l u x [ P g C y r - ] −0.4−0.20.02015−01 2015−07 2016−01 2016−07Month F l u x [ P g C m o - ] MIP Prior (min/mean/max)MIP LG (min/mean/max) MIP LN (min/mean/max)WOMBAT Prior (mean) WOMBAT LG (mean, 95% cred. int.)WOMBAT LN (mean, 95% cred. int.)

Fig. 4: Annual (left column) and monthly (right column) ﬂuxes for the globe (ﬁrst row), land (second row),and ocean (third row). Flux estimates from both the MIP and WOMBAT are shown, split into the prior,LG inversions, and LN inversions. Thick lines represent the ensemble means for the MIP and the (prioror posterior) means for WOMBAT. Shaded areas and thin lines for the MIP represent values between theensemble minimum and maximum, while for WOMBAT they represent values in the 95% credible intervals.Fossil-fuel ﬂuxes are excluded from all ﬁgures. Note that each row of plots has a diﬀerent vertical scale.22he LN data. These features are not obvious in the MIP ensemble means, but appear reasonable within theensemble spread.

Zonal bands:

Figure C4 in Appendix C shows ﬂuxes for zonal bands covering the northern extratropics(23.5 ◦ N–90 ◦ N), northern tropics (0 ◦ –23.5 ◦ N), southern tropics (23.5 ◦ S–0 ◦ ), and southern extratropics (90 ◦ S–23.5 ◦ S). For the MIP ensemble, Crowell et al. (2019) noted that inversions using LG data led to a smallernet annual sink (averaged between 2015 and 2016) in the northern extratropics than those using LN data.WOMBAT also ﬁnds this feature, with a 95% credible interval of the diﬀerence spanning 0.14–0.6 PgC yr − .This is substantially smaller than the diﬀerence between the MIP ensemble means for these modes, which is0.7 PgC yr − . Fluxes in the southern extratropics, shown in the fourth row of Figure C4, are dominated byocean ﬂuxes for which, as noted above, LG and LN data provide little information.One of the most prominent features in the MIP inversion results is a seasonal cycle in the tropics that islarger than that in both the prior mean and the in situ inversions (Crowell et al., 2019). From the secondand third rows of Figure C4, which depict tropical-zone ﬂuxes, it can be seen that WOMBAT does notreproduce this for both LG and LN inversions. In the northern tropics, the WOMBAT posterior means aresimilar to the prior means, and the credible intervals in the annual ﬂuxes reﬂect high conﬁdence. However,WOMBAT results do corroborate those of the MIP ensemble, in that non-fossil-fuel ﬂuxes in the northerntropics were a net source of CO in 2016. To evaluate the estimated ﬂuxes in the OCO-2 MIP, each participant was asked to compare the 30-minute-average TCCON retrievals of column-averaged CO (see Section 4.4.2) to the column-averaged CO predictedunder the corresponding estimated ﬂuxes and CTM used for the inversion. Recall that, when performingthe inversions, only OCO-2 data were used, and the TCCON data were treated as unobserved and usedfor validation. For WOMBAT, we repeated this validation exercise by examining the prior and posteriordistributions of Z ,g , where each g corresponds to a diﬀerent TCCON site. For each group g , we set b g = ,since we assume that the provided TCCON retrievals are free of bias. On the other hand, we assumed that ξ g + (cid:15) g has variance that is group speciﬁc, and that these errors are fully correlated. While this assumption isconservative, it is also reasonable, since it is likely that the CTM does induce errors that are highly correlatedin time at a common spatial location, as it averages all variables on a rather coarse grid when simulatingatmospheric transport. We estimate the variance of these correlated errors in a group g as the average ofthe reported variances of each TCCON retrieval within a group g .In Figure C5 in Appendix C we compare the time series of the TCCON retrievals with the predictions fromWOMBAT under the prior mean ﬂux, the posterior distribution of ﬂux from LG data, and the posteriordistribution of ﬂux from LN data. Several things are of note from this ﬁgure: First, the posterior-meanestimates are a better match to the TCCON retrievals than the prior-mean estimates, which is evidencethat OCO-2 data allow for improved ﬂux estimates. Second, discrepancies between TCCON and predictedretrievals persist for a long time, lending credence to our assumption that errors are highly temporallycorrelated. Third, the 95% prediction intervals are appropriate, and largely contain within them the TCCONretrievals.In Figure 5 we reproduce an augmented version of Figure 8 of Crowell et al. (2019), which depicts themean and standard deviation of the diﬀerences between the TCCON retrievals and the predicted retrieval byTCCON site, MIP participant, and observation mode (LG and LN) alongside the results from WOMBAT.The improvement of the WOMBAT posterior prediction over the prior prediction is evident in the meanof the diﬀerences, and the WOMBAT posterior error means and standard deviations are in line with thoseof the MIP participants. WOMBAT’s predictive distributions from LG-inferred ﬂuxes can be seen to bebetter than those of the MIP participants, even by straightforward visual inspection. In Table 4 we computemean-squared error by participant and observation mode averaged over all the TCCON stations. WOMBAToutperforms all other participants when using this metric with LG data, and is fourth-best when using thismetric with LN data. While these results are not conclusive on the validity of the WOMBAT ﬂuxes globally,they are encouraging, especially in light of the fact that our ﬂux process has a relatively low-dimensionalrepresentation. 23 G LN T M − D VA R C T − NR T O U C A M S B a k e r − m ean S CHUH U T C M S − F l u x U o E W O M BA T P o s t. W O M BA T P r i o r T M − D VA R C T − NR T O U C A M S B a k e r − m ean S CHUH U T C M S − F l u x U o E W O M BA T P o s t. W O M BA T P r i o r LauderWollongongReunionDarwinAscensionManausIzanaSagaCaltechDrydenTsukubaLamontPark FallsOrleansKarlsruheBremenBialystokSodankylaEureka −1.5−1.0−0.50.00.51.01.5 M ean e rr o r [ pp m ] LG LN T M − D VA R C T − NR T O U C A M S B a k e r − m ean S CHUH U T C M S − F l u x U o E W O M BA T P o s t. W O M BA T P r i o r T M − D VA R C T − NR T O U C A M S B a k e r − m ean S CHUH U T C M S − F l u x U o E W O M BA T P o s t. W O M BA T P r i o r LauderWollongongReunionDarwinAscensionManausIzanaSagaCaltechDrydenTsukubaLamontPark FallsOrleansKarlsruheBremenBialystokSodankylaEureka E rr o r s t. de v . [ pp m ] Fig. 5: Mean (top row) and standard deviation (bottom row) of the errors across TCCON stations for eachMIP participant (refer to Crowell et al., 2019, for details) and for WOMBAT’s prior and posterior meanpredicted values. The error statistics for inversions using LG data are shown in the left column, while thosefor LN data are shown in the right column. This ﬁgure reproduces and extends Figure 8 of Crowell et al.(2019) with similar (but not identical) colour gradients.24able 4: Mean-squared error (in ppm ) averaged across TCCON stations, for each MIP participant and forWOMBAT’s prior and posterior mean predicted values. T M - D V A R C T - N R T O U C A M S B a k e r - m e a nS C H U H U

T C M S - F l u x U o E W O M B A T P o s t . W O M B A T P r i o r LG 1.33 1.63 1.31 1.31 1.88 1.41 1.85 1.78 1.71 1.19 3.56LN 1.36 2.09 1.74 1.24 2.12 1.58 1.71 2.72 1.37 1.57 3.56Table 5: Posterior means, 2.5% quantiles, and 97.5% quantiles, for the parameters (cid:96) g, , γ g , and ρ g , for theinversions using LG and LN retrievals. Recall that the parameters associated with LG and LN are derivedfrom diﬀerent inversions, and not from using the two retrieval groups in the same inversion.LG LNVariable Mean 2.5% 97.5% Mean 2.5% 97.5% (cid:96) g, γ g ρ g One of the key features of WOMBAT is the use of a hierarchical prior distribution, which applies to both theparameters governing the ﬂux scaling factors, and to the parameters governing the model–data discrepancyand measurement-error processes. Figure 6 shows the estimated posterior means and 95% credible intervalsfor the autoregressive parameters κ (top) and the innovation precisions τ w (bottom), for the 11 land regions,and for inversions using LG and LN data. The inferred parameters are relatively consistent across the LGand LN modes, with the exception of TransCom3 region 02 (North American Temperate). Most regions havea posterior mean for κ j that is approximately between 0.25 and 0.75, reﬂective of moderate autocorrelationin the scaling factors. The exception is TransCom3 region 03, for which the scaling factors are estimated tobe highly autocorrelated a priori . The innovation precisions, τ w , have posterior means that lie approximatelybetween 1 and 10, for most regions.The parameters governing the model–data discrepancy and measurement-error processes are ρ g , (cid:96) g, ,and γ g , for g = 1 , . . . , n g . Table 5 gives the posterior means, 2.5% quantiles, and 97.5% quantiles, for theseparameters. Recall that the LG and LN parameters are derived from diﬀerent inversions; they are not twogroups in the same inversion. Nonetheless, the inferred parameters are similar between the inversions, whichis reassuring. The values for γ g are indicative of a 21% variance inﬂation needed for both instrument modes.The length scales, (cid:96) g , are 1.2 minutes for the LG data and 1.1 minutes for the LN data, which corresponds toaround 700–800 km on the ground. Finally, the estimated values of ρ g are around 0.835, indicating that themajority of the combined model–data discrepancy/measurement-error process should indeed be attributedto the correlated component, ξ g , given in (9). The OSSE-based sensitivity study in Section 5.1 demonstrated that WOMBAT is able to perform online bias correction with simulated data, where biases are estimated while doing ﬂux inversion. This is diﬀerentto the typical oﬄine approach to bias correction, where retrieval biases are determined in a separate study.To comply with the MIP protocol, the online bias-correction functionality of WOMBAT was disabled in thestudy of Section 5.2, and the TCCON-based oﬄine bias-corrected OCO-2 10 s average retrievals from theMIP were used. In order to investigate the prospect of online bias correction with real data, we repeat the25 .000.250.500.751.00 k j T T T T T T T T T T T t w , j LGLN

Fig. 6: Posterior means and 95% credible intervals for κ (top) and τ w (bottom, shown using a log scale),for the 11 land regions: TransCom3 region 01 (T01) to TransCom3 region 11 (T11). Estimates are shownfor inversions using LG data (yellow) and LN data (dark green).inversions with online bias correction enabled, using OCO-2 10 s average retrievals both with and withoutthe TCCON-based oﬄine corrections.In Figure 7 we show the posterior densities of the WOMBAT-estimated bias-correction coeﬃcients whenusing the retrievals without the oﬄine correction. The posterior densities shown there are for inversionsbased on LG and LN retrievals, while the TCCON-based oﬄine bias-correction coeﬃcients are given by theblue vertical lines. The WOMBAT-estimated coeﬃcients have the same sign and similar magnitudes to theoﬄine corrections, suggesting that WOMBAT is picking up similar bias patterns while doing ﬂux inversion.However, with the exception of the “dp” coeﬃcient under the LN inversion, the oﬄine values are outsidethe plausible ranges estimated by WOMBAT. For “co2 grad del”, WOMBAT favours a smaller correctionfor both LG and LN, while for “logDWS” a larger correction is favoured. For “dp”, WOMBAT favours asmaller correction under the LG inversion.We repeated the online bias-correction procedure using retrievals retaining the TCCON-based oﬄine biascorrection. In this setting, if the retrievals are unbiased, bias coeﬃcients equal to zero should be inferred.The posterior densities of the estimated coeﬃcients under this conﬁguration are shown in the second rowof Figure 7. As expected, the magnitudes of the online-estimated coeﬃcients are close to zero, althoughthe credible intervals do not always include zero. Na¨ıvely, one might expect that the coeﬃcients wouldbe approximately equal to the diﬀerence between the TCCON-based oﬄine coeﬃcients and the coeﬃcientsestimated by WOMBAT when using uncorrected data. For “dp” and “co2 grad del”, the estimated co-eﬃcients indeed have the expected sign, and magnitudes of the expected order. The inferred “logDWS”coeﬃcients are surprising, however, with an opposite sign to what was expected for the LG inversion, andwith smaller magnitudes for both the LG and LN inversions. This unexpected result is a reﬂection of thecomplex interplay, and nonlinear relationships, between the parameters and processes in our model.Overall, the online-estimated bias correction is practically, if not statistically, similar to the TCCON-basedoﬄine correction. One possible explanation for the diﬀerence between the WOMBAT and the TCCON-basedestimates is that diﬀerent data are used, because WOMBAT does not use TCCON data for the correction.Moreover, it is likely that the true bias coeﬃcients are spatio-temporally varying; if this is indeed the case,26 p co2_grad_del logDWS . . . . . . . . . . . . D en s i t y Uncorrected retrievals dp co2_grad_del logDWS − . . . . − . − . − . . − . . . D en s i t y TCCON−corrected retrievals

WOMBAT LG WOMBAT LN TCCON−based offline correction

Fig. 7: Posterior densities of bias-correction coeﬃcients from inversions using OCO-2 retrievals without biascorrections applied (top row), and those from inversions using retrievals corrected using the TCCON-basedbias coeﬃcients (bottom row). Densities are shown for inversions based on LG and LN retrievals in yellowand dark green, respectively. The TCCON-based oﬄine bias correction coeﬃcients are shown as verticalblue lines for the panels in the top row. The grey dashed lines in the bottom row mark zero, the value thecoeﬃcients would take if the data were unbiased.the estimated biases would be aﬀected by the spatio-temporal locations of the retrievals used to estimatethem. Another reason may be that, for simplicity, we have used only a few of the most important variablesthat are used for oﬄine bias correction; a consideration of all the variables may lead to slightly diﬀerentresults. Despite this, our results show that online bias correction is possible, and that further research intoit as an attractive by-product of ﬂux inversion is warranted.Figure 8 gives annual and monthly ﬂuxes estimated from the inversions using the online bias correctionapplied to the uncorrected retrievals. For comparison, the equivalent ﬂuxes from Section 5.2.1, for theoﬄine-corrected data and with the online bias correction disabled, are also reported. The annual andmonthly ﬂuxes are similar between the oﬄine-corrected and online-corrected inversions, with substantialoverlap in the marginal posterior distributions for all time periods. This result gives further evidence thatonline bias correction is a viable alternative to oﬄine correction (here based on TCCON data) when doingﬂux inversion. 27 lobal −4−3−2−10 2015 2016Year F l u x [ P g C y r - ] −3−2−10122015−01 2015−07 2016−01 2016−07Month F l u x [ P g C m o - ] Global Land −2−10 2015 2016Year F l u x [ P g C y r - ] −2−10122015−01 2015−07 2016−01 2016−07Month F l u x [ P g C m o - ] Global Oceans −3−2−10 2015 2016Year F l u x [ P g C y r - ] −0.4−0.20.02015−01 2015−07 2016−01 2016−07Month F l u x [ P g C m o - ] Prior (mean)LG, offline−corrected (mean, 95% cred. int.)LG, online−corrected (mean, 95% cred. int.) LN, offline−corrected (mean, 95% cred. int.)LN, online−corrected (mean, 95% cred. int.)

Fig. 8: Annual and monthly ﬂuxes for WOMBAT inversions using: TCCON-based oﬄine bias-correctedretrievals (solid lines with yellow for LG and dark green for LN; the ﬂuxes are the same as in Figure 4), andwith online bias correction applied to the uncorrected retrievals (dashed lines). The prior-mean ﬂuxes areshown in blue. Solid lines depict prior/posterior means, and shaded areas denote the 95% credible intervals.The top row depicts global ﬂuxes, the middle row global land ﬂuxes, and the bottom row global ocean ﬂuxes.28

Conclusion

The WOllongong Methodology for Bayesian Assimilation of Trace-gases (WOMBAT) extends the standardBayesian synthesis ﬂux-inversion framework and is based on a fully Bayesian hierarchical statistical model. Itincorporates physically motivated ﬂux basis functions, and follows the standard Bayesian synthesis frameworkby using a CTM to compute the corresponding mole-fraction basis functions oﬄine. The scaling factors forthe basis functions are inferred from mole-fraction satellite data. WOMBAT incorporates a correlated errorterm, estimates measurement biases and measurement-error scaling factors online, estimates prior variancesand length scales on ﬂux scaling factors, and uses a fully Bayesian MCMC scheme that allows uncertaintyquantiﬁcation on all unknown parameters in the model. We have illustrated the importance of modellingcorrelation and bias within a ﬂux-inversion system, and we have shown that WOMBAT produces globaland regional ﬂux estimates from OCO-2 data that are comparable to those from the MIP participants inCrowell et al. (2019). In particular, WOMBAT outperformed the other ﬂux models in reproducing TCCONdata when using the OCO-2 LG retrievals to obtain the ﬂuxes, and it was competitive when using ﬂuxesobtained from the OCO-2 LN retrievals. The MIP reported a global total (non-fossil-fuel) carbon sink of3 . ± . − (ensemble mean ± ensemble standard deviation) and 1 . ± . − for globalland for the years 2015 and 2016. WOMBAT’s posterior estimates are 3 . ± .

09 PgC yr − (posteriormean ± posterior standard deviation) and 1 . ± .

14 PgC yr − , respectively, using the OCO-2 LG data,and 3 . ± .

07 PgC yr − and 1 . ± .

13 PgC yr − , respectively, using the OCO-2 LN data. When thefossil-fuel ﬂuxes are included, we estimate a global carbon source of 6 . ± .

09 PgC yr − using the LGdata, and 6 . ± .

07 PgC yr − using the LN data. These estimates corroborate those of the MIP withinuncertainty. The WOMBAT framework, and code to reproduce the results of this paper, are available onlineat https://github.com/mbertolacci/wombat-paper/ .This paper presents the general, underlying, Bayesian hierarchical framework for WOMBAT, which willserve as a baseline for our ﬂux inversions based on current and future versions of OCO-2 data. Thereare several potential extensions being considered; here we discuss three of the most pertinent ones. First,WOMBAT, like most other ﬂux-inversion systems, currently operates using a single CTM. This is problem-atic from a statistical modelling point of view, as it does not allow one to attribute the correlated error tothe measurement-error process or the mole-fraction process. If more than one CTM is used, one could, inprinciple, statistically identify at least part of the error due to transport. This will not necessarily solve theproblem though, since CTMs tend to share common features that induce similar correlations (e.g., due tounresolved sub-grid variation). A possible way forward is to take results from oﬄine OSSEs to estimate andﬁx the parameters characterising the CTM error, and then attribute any residual observed correlation tothe retrieval process. Second, WOMBAT extends a traditional state-space approach to ﬂux inversion, whichcompetes with adjoint models that can allow for a much higher ﬂux dimensionality. Moving forward, WOM-BAT will seek to introduce higher dimensionality by using approximations to response functions constructedfrom ﬂux basis function that are at a much ﬁner scale than the TransCom3-by-month spatio-temporal basisfunctions that we have used here. These will help reduce any model error due to dimensionality reduction.Finally, WOMBAT currently only models along-track correlations when modelling the correlation-error term ξ g . However, the general framework we have proposed, based on the Vecchia approximation, can be extendedto model full space-time correlations. Dedicated investigations will be required to assess the feasibility ofthe implementation, and the impact it will have on ﬂux estimates. Acknowledgements

This project was largely supported by the Australian Research Council (ARC) Discovery Project (DP)DP190100180. Andrew Zammit-Mangion’s research was also supported by an ARC Discovery Early CareerResearch Award (DECRA) DE180100203. Noel Cressie’s research was also supported by ARC DP150104576.Cressie’s and Zammit-Mangion’s research was also supported by NASA ROSES grant 17-OCO2-17-0012.Jenny Fisher was supported by ARC DP160101598. Ann Stavert and Matthew Rigby were supported bythe UK Natural Environment Research Council grant NE/K002236/1. The TCCON data were obtainedfrom the TCCON Data Archive hosted by CaltechDATA at https://tccondata.org. The authors would alsolike to thank David Baker for generating the OCO-2 10 s average retrievals, and several colleagues forvaluable input/feedback. These include Beata Bukosa, Nicholas Deutscher, Anita Ganesan, Peter Rayner,29ndrew Schuh, and members of the OCO-2 Flux Team. This work was supported by resources provided bythe Pawsey Supercomputing Centre with funding from the Australian Government and the Government ofWestern Australia. The work was also supported by the NCI National Facility systems at the Australian Na-tional University through the National Computational Merit Allocation Scheme supported by the AustralianGovernment.

References

Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G. S., Davis, A., Dean,J., Devin, M., Ghemawat, S., Goodfellow, I., Harp, A., Irving, G., Isard, M., Jia, Y., Jozefowicz, R.,Kaiser, L., Kudlur, M., Levenberg, J., Mane, D., Monga, R., Moore, S., Murray, D., Olah, C., Schuster,M., Shlens, J., Steiner, B., Sutskever, I., Talwar, K., Tucker, P., Vanhoucke, V., Vasudevan, V., Viegas,F., Vinyals, O., Warden, P., Wattenberg, M., Wicke, M., Yu, Y., and Zheng, X. (2016). TensorFlow:Large-scale machine learning on heterogeneous distributed systems. arXiv:1603.04467.Baker, D. F., Doney, S. C., and Schimel, D. S. (2006). Variational data assimilation for atmospheric CO . Tellus B: Chemical and Physical Meteorology , 58:359–365.Basu, S., Baker, D. F., Chevallier, F., Patra, P. K., Liu, J., and Miller, J. B. (2018). The impact oftransport model diﬀerences on CO surface ﬂux estimates from OCO-2 retrievals of column average CO . Atmospheric Chemistry and Physics , 18:7189–7215.Basu, S., Guerlet, S., Butz, A., Houweling, S., Hasekamp, O., Aben, I., Krummel, P., Steele, P., Langenfelds,R., Torn, M., Biraud, S., Stephens, B., Andrews, A., and Worthy, D. (2013). Global CO ﬂuxes estimatedfrom GOSAT retrievals of total column CO . Atmospheric Chemistry and Physics , 13:8695–8717.Bey, I., Jacob, D. J., Yantosca, R. M., Logan, J. A., Field, B. D., Fiore, A. M., Li, Q., Liu, H. Y., Mickley,L. J., and Schultz, M. G. (2001). Global modeling of tropospheric chemistry with assimilated meteorology:Model description and evaluation.

Journal of Geophysical Research: Atmospheres , 106:23073–23095.Blumenstock, T., Hase, F., Schneider, M., Garc´ıa, O. E., and Sep´ulveda, E. (2014). TCCONdata from Iza˜na (ES), Release GGG2014.R0. TCCON data archive, hosted by CaltechDATA,https://doi.org/10.14291/tccon.ggg2014.izana01.R0/1149295.Brynjarsd´ottir, J. and O’Hagan, A. (2014). Learning about physical parameters: the importance of modeldiscrepancy.

Inverse Problems , 30:114007.Bukosa, B., Deutscher, N. M., Fisher, J. A., Kubistin, D. C., Paton-Walsh, C., and Griﬃth, D. W. (2019). Si-multaneous shipborne measurements of CO , CH and CO and their application to improving greenhouse-gas ﬂux estimates in Australia. Atmospheric Chamistry and Physics , 19:7055–7072.Burrows, J. P., H¨olzle, E., Goede, A. P. H., Visser, H., and Fricke, W. (1995). SCIAMACHY—ScanningImaging Absorption Spectrometer for Atmospheric Chartography.

Acta Astronautica , 35:445–451.Chevallier, F. (2007). Impact of correlated observation errors on inverted CO surface ﬂuxes from OCOmeasurements. Geophysical Research Letters , 34(L24804).Chevallier, F., Br´eon, F.-M., and Rayner, P. J. (2007). Contribution of the Orbiting Carbon Observatory tothe estimation of CO sources and sinks: Theoretical study in a variational data assimilation framework. Journal of Geophysical Research: Atmospheres , 112(D9).Chevallier, F., Fisher, M., Peylin, P., Serrar, S., Bousquet, P., Br´eon, F.-M., Ch´edin, A., and Ciais, P.(2005). Inferring CO sources and sinks from satellite observations: Method and application to TOVSdata. Journal of Geophysical Research: Atmospheres , 110(D24).Ciais, P., Rayner, P., Chevallier, F., Bousquet, P., Logan, M., Peylin, P., and Ramonet, M. (2010). Atmo-spheric inversions for estimating CO ﬂuxes: methods and perspectives. Climatic Change , 103:69–92.30onnor, B. J., Boesch, H., Toon, G., Sen, B., Miller, C., and Crisp, D. (2008). Orbiting Carbon Observatory:Inverse method and prospective error analysis.

Journal of Geophysical Research: Atmospheres , 113(D5).Cressie, N. and Wikle, C. (2011).

Statistics for Spatio-Temporal Data . John Wiley and Sons, Hoboken, NJ.Crowell, S., Baker, D., Schuh, A., Basu, S., Jacobson, A. R., Chevallier, F., Liu, J., Deng, F., Feng, L.,McKain, K., Chatterjee, A., Miller, J. B., Stephens, B. B., Eldering, A., Crisp, D., Schimel, D., Nassar,R., O’Dell, C. W., Oda, T., Sweeney, C., Palmer, P. I., and Jones, D. B. A. (2019). The 2015–2016carbon cycle as seen from OCO-2 and the global in situ network.

Atmospheric Chemistry and Physics ,19:9797–9831.Dahl´en, U., Lindstr¨om, J., and Scholze, M. (2020). Spatiotemporal reconstructions of global CO -ﬂuxesusing Gaussian Markov random ﬁelds. Environmetrics , 31:e2610.Darmenov, A. and da Silva, A. (2015). The Quick Fire Emissions Dataset (QFED): Documentation ofversions 2.1, 2.2 and 2.4, NASA Technical Report Series on Global Modeling and Data Assimilation.Technical Report TM-2015-104606, NASA.Datta, A., Banerjee, S., Finley, A. O., and Gelfand, A. E. (2016). Hierarchical nearest-neighbor Gaussianprocess models for large geostatistical datasets.

Journal of the American Statistical Association , 111:800–812.De Mazi`ere, M., Sha, M. K., Desmet, F., Hermans, C., Scolas, F., Kumps, N., Metzger, J.-M., Duﬂot, V.,and Cammas, J.-P. (2014). TCCON data from R´eunion Island (RE), Release GGG2014.R0. TCCON dataarchive, hosted by CaltechDATA, https://doi.org/10.14291/tccon.ggg2014.reunion01.R0/1149288.Deng, F., Jones, D. B., O’Dell, C. W., Nassar, R., and Parazoo, N. C. (2016). Combining GOSAT XCO observations over land and ocean to improve regional CO ﬂux estimates. Journal of Geophysical Research:Atmospheres

Climate Change 2014: Mitigation of ClimateChange. Contribution of Working Group III to the Fifth Assessment Report of the Intergovernmental Panelon Climate Change . Cambridge University Press, Cambridge, UK.Eldering, A., O’Dell, C. W., Wennberg, P. O., Crisp, D., Gunson, M. R., Viatte, C., Avis, C., Braverman, A.,Castano, R., Chang, A., Chapsky, L., Cheng, C., Connor, B., Dang, L., Doran, G., Fisher, B., Frankenberg,C., Fu, D., Granat, R., Hobbs, J., Lee, R. A. M., Mandrake, L., McDuﬃe, J., Miller, C. E., Myers, V.,Natraj, V., O’Brien, D., Osterman, G. B., Oyafuso, F., Payne, V. H., Pollock, H. R., Polonsky, I., Roehl,C. M., Rosenberg, R., Schwandner, F., Smyth, M., Tang, V., Taylor, T. E., To, C., Wunch, D., andYoshimizu, J. (2017). The Orbiting Carbon Observatory-2: First 18 months of science data products.

Atmospheric Measurement Techniques , 10:549–563.31ldering, A., Taylor, T. E., O’Dell, C. W., and Pavlick, R. (2019). The OCO-3 mission: Measurement objec-tives and expected performance based on 1 year of simulated data.

Atmospheric Measurement Techniques ,12:2341–2370.Engelen, R. J., Denning, A. S., and Gurney, K. R. (2002). On error estimation in atmospheric CO inversions. Journal of Geophysical Research: Atmospheres , 107(D22):ACL–10.Enting, I. and Newsam, G. (1990). Atmospheric constituent inversion problems: Implications for baselinemonitoring.

Journal of Atmospheric Chemistry , 11:69–87.Enting, I. G. (2002).

Inverse Problems in Atmospheric Constituent Transport . Cambridge University Press,Cambridge, UK.Fan, S., Gloor, M., Mahlman, J., Pacala, S., Sarmiento, J., Takahashi, T., and Tans, P. (1998). A largeterrestrial carbon sink in North America implied by atmospheric and oceanic carbon dioxide data andmodels.

Science , 282:442–446.Feist, D. G., Arnold, S. G., John, N., and Geibel, M. C. (2014). TCCON data from As-cension Island (SH), Release GGG2014.R0. TCCON data archive, hosted by CaltechDATA,https://doi.org/10.14291/tccon.ggg2014.ascension01.r0/1149285.Feng, L., Palmer, P., B¨osch, H., and Dance, S. (2009). Estimating surface CO ﬂuxes from space-borne CO dry air mole fraction observations using an ensemble Kalman ﬁlter. Atmospheric Chemistry & Physics ,9:2619–2633.Feng, S., Lauvaux, T., Keller, K., Davis, K. J., Rayner, P., Oda, T., and Gurney, K. R. (2019). A roadmap for improving the treatment of uncertainties in high-resolution regional carbon ﬂux inverse estimates.

Geophysical Research Letters , 46:13461–13469.Finley, A. O., Banerjee, S., Waldmann, P., and Ericsson, T. (2009). Hierarchical spatial modeling of additiveand dominance genetic variance for large spatial trial datasets.

Biometrics , 65:441–451.Ganesan, A. L., Rigby, M., Zammit-Mangion, A., Manning, A. J., Prinn, R. G., Fraser, P. J., Harth, C. M.,Kim, K.-R., Krummel, P. B., Li, S., M¨uhle, J., O’Doherty, S. J., Park, S., Salameh, P. K., Steele, L. P., andWeiss, R. F. (2014). Characterization of uncertainties in atmospheric trace gas inversions using hierarchicalBayesian methods.

Atmospheric Chemistry and Physics , 14:3855–3864.Giglio, L., Randerson, J. T., and van der Werf, G. R. (2013). Analysis of daily, monthly, and annual burnedarea using the fourth-generation global ﬁre emissions database (GFED4).

Journal of Geophysical Research:Biogeosciences , 118:317–328.Gneiting, T. and Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation.

Journal ofthe American Statistical Association , 102:359–378.Griﬃth, D. W., Deutscher, N. M., Velazco, V. A., Wennberg, P. O., Yavin, Y., Keppel-Aleks, G., Washen-felder, R. A., Toon, G. C., Blavier, J.-F., Paton-Walsh, C., Jones, N. B., Kettlewell, G. C., Connor,B. J., Macatangay, R. C., Roehl, C., Ryczek, M., Glowacki, J., Culgan, T., and Bryant, G. W. (2014a).TCCON data from Darwin (AU), Release GGG2014.R0. TCCON data archive, hosted by CaltechDATA,https://doi.org/10.14291/tccon.ggg2014.darwin01.R0/1149290.Griﬃth, D. W., Velazco, V. A., Deutscher, N. M., Paton-Walsh, C., Jones, N. B., Wilson, S. R.,Macatangay, R. C., Kettlewell, G. C., Buchholz, R. R., and Riggenbach, M. O. (2014b). TCCONdata from Wollongong (AU), Release GGG2014.R0. TCCON data archive, hosted by CaltechDATA,https://doi.org/10.14291/tccon.ggg2014.wollongong01.R0/1149291.Gurney, K. R., Law, R. M., Denning, A. S., Rayner, P. J., Baker, D., Bousquet, P., Bruhwiler, L., Chen, Y.-H., Ciais, P., Fan, S., Fung, I. Y., Gloor, M., Heimann, M., Higuchi, K., John, J., Maki, T., Maksyutov, S.,Masarie, K., Peylin, P., Prather, M., Pak, B. C., Randerson, J., Sarmiento, J., Taguchi, S., Takahashi, T.,and Yuen, C.-W. (2002). Towards robust regional estimates of CO sources and sinks using atmospherictransport models. Nature , 415:626–630. 32ase, F., Blumenstock, T., Dohe, S., Groß, J., and Kiel, M. (2015). TCCON datafrom Karlsruhe (DE), Release GGG2014.R1. TCCON data archive, hosted by CaltechDATA,https://doi.org/10.14291/tccon.ggg2014.karlsruhe01.R1/1182416.Henderson, H. V. and Searle, S. R. (1981). On deriving the inverse of a sum of matrices.

Siam Review ,23:53–60.Houweling, S., Aben, I., Breon, F.-M., Chevallier, F., Deutscher, N., Engelen, R., Gerbig, C., Griﬃth, D.,Hungershoefer, K., Macatangay, R., Marshall, J., Notholt, J., Peters, W., and Serrar, S. (2010). Theimportance of transport model uncertainties for the estimation of CO sources and sinks using satellitemeasurements. Atmospheric Chemistry and Physics , 10:9981–9992.Houweling, S., Baker, D., Basu, S., Boesch, H., Butz, A., Chevallier, F., Deng, F., Dlugokencky, E. J.,Feng, L., Ganshin, A., Hasekamp, O., Jones, D., Maksyutov, S., Marshall, J., Oda, T., O’Dell, C. W.,Oshchepkov, S., Palmer, P. I., Peylin, P., Poussi, Z., Reum, F., Takagi, H., Yoshida, Y., and Zhuravlev,R. (2015). An intercomparison of inverse models for estimating sources and sinks of CO using GOSATmeasurements. Journal of Geophysical Research: Atmospheres , 120:5253–5266.Iraci, L. T., Podolske, J. R., Hillyard, P. W., Roehl, C., Wennberg, P. O., Blavier, J.-F., Landeros, J., Allen,N., Wunch, D., Zavaleta, J., Quigley, E., Osterman, G. B., Albertson, R., Dunwoody, K., and Boyden,H. (2016). TCCON data from Edwards (US), Release GGG2014.R1. TCCON data archive, hosted byCaltechDATA, https://doi.org/10.14291/tccon.ggg2014.edwards01.r1/1255068.Jacobson, A. R., Schuldt, K. N., Miller, J. B., Oda, T., Tans, P., Arlyn Andrews, Mund, J., Ott, L., Collatz,G. J., Aalto, T., Afshar, S., Aikin, K., Aoki, S., Apadula, F., Baier, B., Bergamaschi, P., Beyersdorf, A.,Biraud, S. C., Bollenbacher, A., Bowling, D., Brailsford, G., Abshire, J. B., Chen, G., Chen, H., Chmura,L., Climadat, S., Colomb, A., Conil, S., Cox, A., Cristofanelli, P., Cuevas, E., Curcoll, R., Sloop, C. D.,Davis, K., Wekker, S. D., Delmotte, M., DiGangi, J. P., Dlugokencky, E., Ehleringer, J., Elkins, J. W.,Emmenegger, L., Fischer, M. L., Forster, G., Frumau, A., Galkowski, M., Gatti, L. V., Gloor, E., Griﬃs,T., Hammer, S., Haszpra, L., Hatakka, J., Heliasz, M., Hensen, A., Hermanssen, O., Hintsa, E., Holst, J.,Jaﬀe, D., Karion, A., Kawa, S. R., Keeling, R., Keronen, P., Kolari, P., Kominkova, K., Kort, E., Krummel,P., Kubistin, D., Labuschagne, C., Langenfelds, R., Laurent, O., Laurila, T., Lauvaux, T., Law, B., Lee,J., Lehner, I., Leuenberger, M., Levin, I., Levula, J., Lin, J., Lindauer, M., Loh, Z., Lopez, M., Myhre,C. L., Machida, T., Mammarella, I., Manca, G., Manning, A., Manning, A., Marek, M. V., Marklund, P.,Martin, M. Y., Matsueda, H., McKain, K., Meijer, H., Meinhardt, F., Miles, N., Miller, C. E., M¨older,M., Montzka, S., Moore, F., Morgui, J.-A., Morimoto, S., Munger, B., Necki, J., Newman, S., Nichol, S.,Niwa, Y., O’Doherty, S., Ottosson-L¨ofvenius, M., Paplawsky, B., Peischl, J., Peltola, O., Pichon, J.-M.,Piper, S., Plass-D¨olmer, C., Ramonet, M., Reyes-Sanchez, E., Richardson, S., Riris, H., Ryerson, T., Saito,K., Sargent, M., Sasakawa, M., Sawa, Y., Say, D., Scheeren, B., Schmidt, M., Schmidt, A., Schumacher,M., Shepson, P., Shook, M., Stanley, K., Steinbacher, M., Stephens, B., Sweeney, C., Thoning, K., Torn,M., Turnbull, J., Tørseth, K., Bulk, P. V. D., Laan-Luijkx, I. T. V. D., Dinther, D. V., Vermeulen, A.,Viner, B., Vitkova, G., Walker, S., Weyrauch, D., Wofsy, S., Worthy, D., Young, D., and Zimnoch, M.(2020). Carbontracker CT2019, Model published 2020 by NOAA Earth System Research Laboratory,Global Monitoring Division. .Kaminski, T., Rayner, P. J., Heimann, M., and Enting, I. G. (2001). On aggregation errors in atmospherictransport inversions.

Journal of Geophysical Research: Atmospheres , 106:4703–4715.Kawakami, S., Ohyama, H., Arai, K., Okumura, H., Taura, C., Fukamachi, T., and Sakashita, M. (2014).TCCON data from Saga (JP), Release GGG2014.R0. TCCON data archive, hosted by CaltechDATA,https://doi.org/10.14291/tccon.ggg2014.saga01.R0/1149283.Keller, C. A., Long, M. S., Yantosca, R. M., Da Silva, A. M., Pawson, S., and Jacob, D. J. (2014). HEMCOv1.0: A versatile, ESMF-compliant component for calculating emissions in atmospheric models.

Geosci-entiﬁc Model Development , 7:1409–1417. 33ivi, R., Heikkinen, P., and Kyr¨o, E. (2014). TCCON data from Sodankyl¨a(FI), Release GGG2014.R0. TCCON data archive, hosted by CaltechDATA,https://doi.org/10.14291/tccon.ggg2014.sodankyla01.R0/1149280.Kuze, A., Suto, H., Nakajima, M., and Hamazaki, T. (2009). Thermal and near infrared sensor for carbonobservation Fourier-transform spectrometer on the Greenhouse Gases Observing Satellite for greenhousegases monitoring.

Applied Optics , 48:6716–6733.Lauvaux, T., D´ıaz-Isaac, L. I., Bocquet, M., and Bousserez, N. (2019). Diagnosing spatial error structures inCO mole fractions and XCO column mole fractions from atmospheric transport. Atmospheric Chemistry& Physics , 19(18).Lunt, M. F., Rigby, M., Ganesan, A. L., and Manning, A. J. (2016). Estimation of trace gas ﬂuxes withobjectively determined basis functions using reversible-jump Markov chain Monte Carlo.

GeoscientiﬁcModel Development , 9:3213–3229.Masarie, K. A., Peters, W., Jacobson, A. R., and Tans, P. P. (2014). ObsPack: A framework for thepreparation, delivery, and attribution of atmospheric greenhouse gas measurements.

Earth System ScienceData , 6:375–384.McNorton, J. R., Bousserez, N., Agust´ı-Panareda, A., Balsamo, G., Choulga, M., Dawson, A., Engelen,R., Kipling, Z., and Lang, S. (2020). Representing model uncertainty for global atmospheric CO ﬂuxinversions using ECMWF-IFS-46R1. Geoscientiﬁc Model Development , 13:2297–2313.Michalak, A. M., Bruhwiler, L., and Tans, P. P. (2004). A geostatistical approach to surface ﬂux estimationof atmospheric trace gases.

Journal of Geophysical Research , 109:D14109.Michalak, A. M., Hirsch, A., Bruhwiler, L., Gurney, K. R., Peters, W., and Tans, P. P. (2005). Maximumlikelihood estimation of covariance parameters for Bayesian atmospheric trace gas surface ﬂux inversions.

Journal of Geophysical Research: Atmospheres , 110(D24).Miller, S. M., Michalak, A. M., and Levi, P. J. (2014). Atmospheric inverse modeling with known physicalbounds: an example from trace gas emissions.

Geoscientiﬁc Model Development , 7:303–315.Miller, S. M., Saibaba, A. K., Trudeau, M. E., Mountain, M. E., and Andrews, A. E. (2020). Geostatisticalinverse modeling with very large datasets: an example from the Orbiting Carbon Observatory 2 (OCO-2)satellite.

Geoscientiﬁc Model Development , 13:1771–1785.Morino, I., Matsuzaki, T., and Horikawa, M. (2016). TCCON data from Tsukuba(JP), 125HR, Release GGG2014.R1. TCCON data archive, hosted by CaltechDATA,https://doi.org/10.14291/tccon.ggg2014.tsukuba02.R1/1241486.Mukherjee, C., Kasibhatla, P. S., and West, M. (2011). Bayesian statistical modeling of spatially correlatederror structure in atmospheric tracer inverse analysis.

Atmospheric Chemistry and Physics , 11:5365–5382.Nassar, R., Jones, D. B. A., Suntharalingam, P., Chen, J. M., Andres, R. J., Wecht, K. J., Yantosca, R. M.,Kulawik, S. S., Bowman, K. W., Worden, J. R., Machida, T., and Matsueda, H. (2010). Modeling globalatmospheric CO with improved emission inventories and CO production from the oxidation of othercarbon species. Geoscientiﬁc Model Development , 3:689–716.Nassar, R., Napier-Linton, L., Gurney, K. R., Andres, R. J., Oda, T., Vogel, F. R., and Deng, F. (2013).Improving the temporal and spatial distribution of CO emissions from global fossil fuel emission datasets. Journal of Geophysical Research: Atmospheres , 118:917–933.Neal, R. M. (2003). Slice sampling.

Annals of Statistics , 31:705–741.Notholt, J., Petri, C., Warneke, T., Deutscher, N. M., Palm, M., Buschmann, M., Weinzierl, C., Macatangay,R. C., and Grupe, P. (2014). TCCON data from Bremen (DE), Release GGG2014.R0. TCCON dataarchive, hosted by CaltechDATA, https://doi.org/10.14291/tccon.ggg2014.bremen01.R0/1149275.34da, T. and Maksyutov, S. (2011). A very high-resolution (1 km × emissioninventory derived using a point source database and satellite observations of nighttime lights. AtmosphericChemistry and Physics , 11:543–556.Oda, T., Maksyutov, S., and Andres, R. J. (2018). The Open-source Data Inventory for Anthropogenic CO ,version 2016 (ODIAC2016): A global monthly fossil fuel CO gridded emissions data product for tracertransport simulations and surface ﬂux inversions. Earth System Science Data , 10:87–107.O’Dell, C. W., Connor, B., B¨osch, H., O’Brien, D., Frankenberg, C., Castano, R., Christi, M., Eldering,D., Fisher, B., Gunson, M., McDuﬃe, J., Miller, C. E., Natraj, V., Oyafuso, F., Polonsky, I., Smyth, M.,Taylor, T., Toon, G. C., Wennberg, P. O., and Wunch, D. (2012). The ACOS CO retrieval algorithmPart 1: Description and validation against synthetic observations. Atmospheric Measurement Techniques ,5:99–121.O’Dell, C. W., Eldering, A., Wennberg, P. O., Crisp, D., Gunson, M. R., Fisher, B., Frankenberg, C., Kiel,M., Lindqvist, H., Mandrake, L., Merrelli, A., Natraj, V., Nelson, R. R., Osterman, G. B., Payne, V. H.,Taylor, T. E., Wunch, D., Drouin, B. J., Oyafuso, F., Chang, A., McDuﬃe, J., Smyth, M., Baker, D. F.,Basu, S., Chevallier, F., Crowell, S. M. R., Feng, L., Palmer, P. I., Dubey, M., Garc´ıa, O. E., Griﬃth, D.W. T., Hase, F., Iraci, L. T., Kivi, R., Morino, I., Notholt, J., Ohyama, H., Petri, C., Roehl, C. M., Sha,M. K., Strong, K., Sussmann, R., Te, Y., Uchino, O., and Velazco, V. A. (2018). Improved retrievals ofcarbon dioxide from Orbiting Carbon Observatory-2 with the version 8 ACOS algorithm.

AtmosphericMeasurement Techniques , 11:6539–6576.Peters, G. P., Andrew, R. M., Boden, T., Canadell, J. G., Ciais, P., Le Qu´er´e, C., Marland, G., Raupach,M. R., and Wilson, C. (2013). The challenge to keep global warming below 2 ◦ C. Nature Climate Change ,3:4–6.Peters, W., Miller, J., Whitaker, J., Denning, A., Hirsch, A., Krol, M., Zupanski, D., Bruhwiler, L., andTans, P. (2005). An ensemble data assimilation system to estimate CO surface ﬂuxes from atmospherictrace gas observations. Journal of Geophysical Research , page D24304.Philip, S., Johnson, M. S., Potter, C., Genovesse, V., Baker, D. F., Haynes, K. D., Henze, D. K., Liu, J., andPoulter, B. (2019). Prior biosphere model impact on global terrestrial CO ﬂuxes estimated from OCO-2retrievals. Atmospheric Chemistry and Physics , 19:13267–13287.Potter, C. S., Randerson, J. T., Field, C. B., Matson, P. A., Vitousek, P. M., Mooney, H. A., and Klooster,S. A. (1993). Terrestrial ecosystem production: a process model based on global satellite and surface data.

Global Biogeochemical Cycles , 7:811–841.R Core Team (2020).

R: A Language and Environment for Statistical Computing . R Foundation for StatisticalComputing, Vienna, Austria. .Rasmussen, C. E. and Williams, C. K. I. (2006).

Gaussian Processes for Machine Learning . The MIT Press,Cambridge, MA.Rayner, P. (2020). Data assimilation using an ensemble of models: A hierarchical approach.

AtmosphericChemistry and Physics , 20:3725–3737.Rayner, P. and O’Brien, D. (2001). The utility of remotely sensed CO concentration data in surface sourceinversions. Geophysical Research Letters , 28:175–178.Rienecker, M. M., Suarez, M. J., Todling, R., Bacmeister, J., Takacs, L., Liu, H.-C., Gu, W., Sienkiewicz,M., Koster, R. D., Gelaro, R., Stajner, I., and Nielsen, J. (2008). The GEOS-5 Data Assimilation System-Documentation of Versions 5.0.1, 5.1.0, and 5.2.0. Technical Report TM–2008–104606, NASA.Rodgers, C. D. (2000).

Inverse Methods for Atmospheric Sounding . World Scientiﬁc, Singapore, Singapore.Rodgers, C. D. and Connor, B. J. (2003). Intercomparison of remote sounding instruments.

Journal ofGeophysical Research: Atmospheres , 108(D3). 35aeki, T., Maksyutov, S., Sasakawa, M., Machida, T., Arshinov, M., Tans, P., Conway, T. J., Saito, M.,Valsala, V., Oda, T., Andres, R. J., and Belikov, D. (2013). Carbon ﬂux estimation for Siberia byinverse modeling constrained by aircraft and tower CO measurements. Journal of Geophysical Research:Atmospheres , 118:1100–1122.Schuh, A. E., Jacobson, A. R., Basu, S., Weir, B., Baker, D., Bowman, K., Chevallier, F., Crowell, S., Davis,K. J., Deng, F., Denning, S., Feng, L., Jones, D., Liu, J., and Palmer, P. I. (2019). Quantifying theimpact of atmospheric transport uncertainty on CO surface ﬂux estimates. Global Biogeochemical Cycles ,33:484–500.Sherlock, V., Connor, B., Robinson, J., Shiona, H., Smale, D., and Pollard, D. F. (2014). TCCONdata from Lauder (NZ), 125HR, Release GGG2014.R0. TCCON data archive, hosted by CaltechDATA,https://doi.org/10.14291/tccon.ggg2014.lauder02.R0/1149298.Strong, K., Roche, S., Franklin, J. E., Mendonca, J., Lutsch, E., Weaver, D., Fogal,P. F., Drummond, J. R., Batchelor, R., and Lindenmaier, R. (2016). TCCON datafrom Eureka (CA), Release GGG2014.R1. TCCON data archive, hosted by CaltechDATA,https://doi.org/10.14291/tccon.ggg2014.eureka01.r1/1325515.Takahashi, T., Sutherland, S. C., Sweeney, C., Poisson, A., Metzl, N., Tilbrook, B., Bates, N., Wanninkhof,R., Feely, R. A., Sabine, C., Olafsson, J., and Nojiri, Y. (2002). Global sea–air CO ﬂux based onclimatological surface ocean pCO , and seasonal biological and temperature eﬀects. Deep Sea ResearchPart II: Topical Studies in Oceanography , 49:1601–1622.Takahashi, T., Sutherland, S. C., Wanninkhof, R., Sweeney, C., Feely, R. A., Chipman, D. W., Hales, B.,Friederich, G., Chavez, F., Sabine, C., Watson, A., Bakker, D. C. E., Schuster, U., Metzl, N., Yoshikawa-Inoue, H., Ishii, M., Midorikawa, T., Nojiri, Y., K¨ortzinger, A., Steinhoﬀ, T., Hoppema, M., Olafsson, J.,Arnarson, T. S., Tilbrook, B., Johannessen, T., Olsen, A., Bellerby, R., Wong, C. S., Delille, B., Bates,N. R., and de Baar, H. J. W. (2009). Climatological mean and decadal change in surface ocean pCO , andnet sea–air CO ﬂux over the global oceans. Deep Sea Research Part II: Topical Studies in Oceanography ,56:554–577.Thoning, K. W., Crotwell, A. M., and Mund, J. W. (2020). Atmospheric carbon dioxide dry air molefractions from continuous measurements at Mauna Loa, Hawaii, Barrow, Alaska, American Samoa andSouth Pole. 1973-2019, version 2020-08. https://doi.org/10.15138/yaf1-bk21.Tierney, L. (1994). Markov Chains for Exploring Posterior Distributions.

The Annals of Statistics , 22:1701–1728.UNFCCC (2015). Adoption of the Paris Agreement. Report No. FCCC/CP/2015/L.9/Rev.1.http://unfccc.int/resource/docs/2015/cop21/eng/l09r01.pdf.Vecchia, A. V. (1988). Estimation and model identiﬁcation for continuous spatial processes.

Journal of theRoyal Statistical Society: Series B , 50:297–312.Warneke, T., Messerschmidt, J., Notholt, J., Weinzierl, C., Deutscher, N. M., Petri, C., and Grupe, P. (2014).TCCON data from Orl´eans (FR), Release GGG2014.R0. TCCON data archive, hosted by CaltechDATA,https://doi.org/10.14291/tccon.ggg2014.orleans01.R0/1149276.Wennberg, P. O., Roehl, C. M., Wunch, D., Toon, G. C., Blavier, J.-F., Washenfelder, R., Keppel-Aleks, G.,Allen, N. T., and Ayers, J. (2014). TCCON data from Park Falls (US), Release GGG2014.R0. TCCONdata archive, hosted by CaltechDATA, https://doi.org/10.14291/tccon.ggg2014.parkfalls01.R0/1149161.Wennberg, P. O., Wunch, D., Roehl, C. M., Blavier, J.-F., Toon, G. C., and Allen, N. T. (2015). TC-CON data from Caltech (US), Release GGG2014.R1. TCCON data archive, hosted by CaltechDATA,https://doi.org/10.14291/tccon.ggg2014.pasadena01.r1/1182415.36ennberg, P. O., Wunch, D., Roehl, C. M., Blavier, J.-F., Toon, G. C., and Allen, N. T. (2016). TC-CON data from Lamont (US), Release GGG2014.R1. TCCON data archive, hosted by CaltechDATA,https://doi.org/10.14291/tccon.ggg2014.lamont01.r1/1255070.Wikle, C. K., Zammit-Mangion, A., and Cressie, N. (2019).

Spatio-Temporal Statistics with R . CRC Press.Worden, J. R., Doran, G., Kulawik, S., Eldering, A., Crisp, D., Frankenberg, C., O’Dell, C., and Bowman, K.(2017). Evaluation and attribution of OCO-2 XCO uncertainties. Atmospheric Measurement Techniques ,10:2759–2771.Wunch, D., Toon, G. C., Blavier, J.-F. L., Washenfelder, R. A., Notholt, J., Connor, B. J., Griﬃth, D. W.,Sherlock, V., and Wennberg, P. O. (2011a). The Total Carbon Column Observing Network.

Philosoph-ical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences ,369:2087–2112.Wunch, D., Toon, G. C., Wennberg, P. O., Wofsy, S. C., Stephens, B. B., Fischer, M. L., Uchino, O., Abshire,J. B., Bernath, P., Biraud, S. C., Blavier, J.-F. L., Boone, C., Bowman, K. P., Browell, E. V., Campos,T., Connor, B. J., Daube, B. C., Deutscher, N. M., Diao, M., Elkins, J. W., Gerbig, C., Gottlieb, E.,Griﬃth, D. W. T., Hurst, D. F., Jim´enez, R., Keppel-Aleks, G., Kort, E. A., Macatangay, R., Machida,T., Matsueda, H., Moore, F., Morino, I., Park, S., Robinson, J., Roehl, C. M., Sawa, Y., Sherlock, V.,Sweeney, C., Tanaka, T., and Zondlo, M. A. (2010). Calibration of the Total Carbon Column ObservingNetwork using aircraft proﬁle data.

Atmospheric Measurement Techniques , 3:1351–1362.Wunch, D., Wennberg, P. O., Toon, G. C., Connor, B. J., Fisher, B., Osterman, G. B., Frankenberg, C.,Mandrake, L., O’Dell, C., Ahonen, P., Biraud, S. C., Castano, R., Cressie, N., Crisp, D., Deutscher, N. M.,Eldering, A., Fisher, M. L., Griﬃth, D. W. T., Gunson, M., Heikkinen, P., Keppel-Aleks, G., Kyr¨o, E.,Lindenmaier, R., Macatangay, R., Mendonca, J., Messerschmidt, J., Miller, C. E., Morino, I., Notholt,J., Oyafuso, F. A., Rettinger, M., Robinson, J., Roehl, C. M., Salawitch, R. J., Sherlock, V., Strong,K., Sussmann, R., Tanaka, T., Thompson, D. R., Uchino, O., Warneke, T., and Wofsy, S. C. (2011b).A method for evaluating bias in global measurements of CO total columns from space. AtmosphericChemistry and Physics , 11:12317–12337.Yantosca, B. (2019). geoschem/geos-chem: GEOS-Chem 12.3.2. https://doi.org/10.5281/zenodo.2658178.Yevich, R. and Logan, J. A. (2003). An assessment of biofuel use and burning of agricultural waste in thedeveloping world.

Global Biogeochemical Cycles , 17:1095.Zammit-Mangion, A., Cressie, N., and Ganesan, A. L. (2016). Non-Gaussian bivariate modelling withapplication to atmospheric trace-gas inversion.

Spatial Statistics , 18:194–220.

A Markov chain Monte Carlo algorithm

As mentioned in Section 2.5, WOMBAT makes inference on the ﬂuxes and the other parameters in the modelusing a Gibbs sampler, wherein samples of subsets of the parameters are iteratively drawn from their fullconditional distributions (e.g., Tierney, 1994). Recall that the target distribution is p ( α , β , κ , τ w , θ ξ,(cid:15) | Z ),as shown in (10), and that a graphical model that can be used for establishing conditional independencerelationships is given in Figure 1.The Gibbs sampler in WOMBAT is as follows. Given the i th sample, { α [ i ] , β [ i ] , κ [ i ] , τ [ i ] w , θ [ i ] ξ,(cid:15) } , the( i + 1)th sample is generated sequentially in the following manner.1. Sample α [ i +1] and β [ i +1] jointly from p ( α , β | κ [ i ] , τ [ i ] w , θ [ i ] ξ,(cid:15) , Z ).2. Sample κ [ i +1] from p ( κ | τ [ i ] w , α [ i +1] ).3. Sample τ [ i +1] w from p ( τ w | κ [ i +1] , α [ i +1] ). 37. Sample θ [ i +1] ξ,(cid:15) from p ( θ ξ,(cid:15) | α [ i +1] , β [ i +1] , Z ).Below, in Sections A.1–A.4, we give the details for Steps 1–4. In deriving the conditional distributions, weoften make use of the Sherman–Morrison–Woodbury matrix identity and a matrix-determinant lemma (e.g.,Henderson and Searle, 1981). The former identity states that, for conformable matrices A , U , C , and V ,( A + UCV ) − = A − − A − U ( C − + VA − U ) − VA − , whenever the required inverses exist, while the latter lemma states that | A + UCV | = | C − + VA − U || C || A | . A.1 Sampling α and β Let B = ( ˆ Ψ , A ), x = ( α (cid:48) , β (cid:48) ) (cid:48) , and Σ x = bdiag( Σ α , σ β I ). Then p ( α , β | κ , τ w , θ ξ,(cid:15) , Z ) ∝ exp (cid:20) −

12 ( Z − ˆ Z − Bx ) (cid:48) ( Σ ξ + Σ (cid:15) ) − ( Z − ˆ Z − Bx ) − x (cid:48) Σ − x x (cid:21) . The log of this quantity is quadratic in x , and therefore the full conditional distribution of x is a multivariateGaussian distribution; speciﬁcally, x | κ , τ w , θ ξ,(cid:15) , Z ∼ Gau( µ cx , Σ cx ) , (19)where ( Σ cx ) − = B (cid:48) ( Σ ξ + Σ (cid:15) ) − B + Σ − x , and µ cx = Σ cx B (cid:48) ( Σ ξ + Σ (cid:15) ) − ( Z − ˆ Z ) . As described in Section 3, Σ (cid:15) is diagonal and, under Markov assumptions, Σ ξ has a sparse inverse. Theseproperties allow us to eﬃciently compute the required mean and covariance matrix through use of theSherman–Morrison–Woodbury matrix identity. Once these are computed, sampling from (19) is straightfor-ward. A.2 Sampling κ The full conditional distribution of κ is p ( κ | τ w , α ) ∝ | Σ α | − / exp (cid:18) − α (cid:48) Σ − α α (cid:19) p ( κ ) , (20)where, recalling Section 2.4, p ( κ ) is a product of Beta density functions. Since each κ j and τ w,j is associatedwith a spatial region, we also partition α by spatial region. That is, we deﬁne α ≡ (( α ) (cid:48) , . . . , ( α r s ) (cid:48) ) (cid:48) ,where α j ∈ R r t , j = 1 , . . . , r s . For j = 1 , . . . , r s , the r t -dimensional vector α j can therefore be associatedwith κ j , τ w,j , and a r t × r t sub-block of the matrix Σ − α , which we denote as Σ − α,j . Under the autoregressivemodel in Section 2.4, Σ − α,j = τ w,j Q α,j , where Q α,j =  − κ j − κ j κ j . . .. . . . . . . . .. . . 1 + κ j − κ j − κ j  , j = 1 , . . . , r s . a priori , (20) may be writtenas p ( κ | τ w , α ) = (cid:81) r s j =1 p ( κ j | τ w,j , α j ), where p ( κ j | τ w,j , α j ) ∝ | Q α,j | / exp (cid:16) − τ w,j α j ) (cid:48) Q α,j α j (cid:17) κ a κ,j − j (1 − κ j ) b κ,j − . (21)To generate samples from (20), we therefore successively sample κ j , j = 1 , . . . , r s , from its full conditionaldistribution (21). Equation (21) does not correspond to any standard distribution, so we use slice sampling(Neal, 2003) to sample from it. A.3 Sampling τ w Similarly to κ , the conditional distribution of τ w factorises across spatial regions, and it is therefore givenby p ( τ w | κ , α ) = (cid:81) r s j =1 p ( τ w,j | κ j , α j ), where p ( τ w,j | κ j , α j ) ∝ τ r t / w,j exp (cid:16) − τ w,j α j ) (cid:48) Q α,j α j (cid:17) τ ν w,j − w,j e − ω w,j τ w,j . (22)The density function in (22) is a Gamma density function, τ w,j | κ j , α j ∼ Ga( ν cw,j , ω cw,j ) , (23)where ν cw,j = ν w,j + r t and ω cw,j = ω w,j + ( α j ) (cid:48) Q α,j α j . Therefore, we sample from the full conditionaldistribution of τ w by successively sampling τ w,j , for j = 1 , . . . , r s , directly from (23). A.4 Sampling θ ξ,(cid:15) Since we assume that the parameters governing the correlated and uncorrelated error terms are data-groupspeciﬁc, the full conditional distribution of θ ξ,(cid:15) is p ( θ ξ,(cid:15) | α , β , Z ) ∝ n g (cid:89) g =1 p ( θ ξ g , γ g | α , β , Z ,g ) , (24)where p ( θ ξ g , γ g | α , β , Z ,g ) ∝ p ( Z ,g | α , β , θ ξ g , γ g ) p ( θ ξ g , γ g ); p ( θ ξ g , γ g ) is the joint prior over θ ξ g and γ g ;and p ( Z ,g | α , β , θ ξ g , γ g ) ∝ | Σ ξ g + Σ (cid:15) g | − / exp (cid:104) −

12 ( Z ,g − ˆ Z ,g − ˆ Ψ g α − A g β g ) (cid:48) ( Σ ξ g + Σ (cid:15) g ) − ( Z ,g − ˆ Z ,g − ˆ Ψ g α − A g β g ) (cid:105) . (25)The matrix operations in (25) may be computed eﬃciently using the matrix identities given at the start ofthis section. Since (25) does not correspond to any standard distribution, we use slice sampling to samplefrom it, for g = 1 , . . . , n g . B Observation operator for OCO-2 retrievals

The retrieval algorithm used for OCO-2 takes spectral data as input and returns CO mole fractions on 20vertical levels as output via an optimisation procedure. The CO mole fractions at these 20 vertical levelsare subsequently used to compute a column-averaged CO that we associate with a single retrieval. For the i th retrieval, denote the vector of retrieved mole fractions as Z r ,i . Following the argument given by Rodgersand Connor (2003) and Connor et al. (2008), the retrieved mole fractions are given by Z r ,i = Y ,r ,i + A i ( Y ,i − Y ,r ,i ) + ε ri , Y ,r ,i = ( Y ,r ,i, , . . . , Y ,r ,i, ) (cid:48) is the vector of prior-mean mole fractions used by the retrieval algorithm,speciﬁc to the i th retrieval (these are unrelated to the prior mean of the mole-fraction ﬁeld used in ourinversion, Y ( · , · , · )); A i is the “averaging kernel matrix”; Y ,i ≡ ( Y ( s i , h i,k , t i )) k =1 is the true molefraction at the 20 vertical levels for the retrieval at geopotential heights h i,k , k = 1 , . . . ,

20; and ε ri is a vectorof measurement errors (which may also include systematic biases and errors induced by nonlinearity in theinversion process). The column-averaged retrieval is then Z ,i = c (cid:48) i Z r ,i = Y ,rc ,i + c (cid:48) i A i ( Y ,i − Y ,r ,i ) + c (cid:48) i ε ri , (26)where c i ≡ ( c i, , . . . , c i, ) (cid:48) are quadrature weights used to estimate the column average, and Y ,rc ,i ≡ c (cid:48) i Y ,r ,i is the retrieval prior mean column-averaged CO . Deﬁne a i ≡ ( a i, , . . . , a i, ) (cid:48) , where a i,k ≡ c i,k ( c (cid:48) i A i ) k , k = 1 , . . . , , and ( c (cid:48) i A i ) k denotes the k th element of c (cid:48) i A i . The vector a i is the “averaging kernel vector” of the i thretrieval, which is available in the oﬃcial releases of the OCO-2 data. It follows that the observationoperator in (7) is deﬁned asˆ A i ( Y ( s i , · , t i )) ≡ Y ,rc ,i + (cid:88) k =1 c i,k a i,k (cid:16) Y ( s i , h i,k , t i ) − Y ,r ,i,k (cid:17) . (27)Note that we do not explicitly account for the error term c (cid:48) i ε ri in our deﬁnition for ˆ A i . This is because itwill be absorbed by the error terms we use in the data model (8).The averaging-kernel-vector elements { a i,k } are typically close in value to 1. They reﬂect the fact thatthe retrieval is not equally sensitive to the mole fractions at all the vertical levels. At levels where there isless sensitivity (i.e., values < estimate (Rodgers, 2000).40 Supplementary Figures and Tables

T01T02 T03T04 T05T06 T07T08 T09 T10T11 T12T13T14T15 T16T17 T18T19T20 T21T22

Fig. C1: Map of the 22 TransCom3 regions used in our study. The names of these regions are given inTable C1. The white regions correspond to areas assumed to have zero CO surface ﬂux.41ode Name TypeT01 North American Boreal LandT02 North American Temperate LandT03 Tropical South America LandT04 South American Temperate LandT05 Northern Africa LandT06 Southern Africa LandT07 Eurasia Boreal LandT08 Eurasia Temperate LandT09 Tropical Asia LandT10 Australia LandT11 Europe LandT12 North Paciﬁc Temperate OceanT13 West Paciﬁc Tropical OceanT14 East Paciﬁc Tropical OceanT15 South Paciﬁc Temperate OceanT16 Northern Ocean OceanT17 North Atlantic Temperate OceanT18 Atlantic Tropical OceanT19 South Atlantic Temperate OceanT20 Southern Ocean OceanT21 Indian Tropical OceanT22 South Indian Temperate OceanTable C1: The code, name, and type, of the 22 TransCom3 regions used in our study. A map showing theseregions is given in Figure C1. 42

015 2016 L G L N −1.0 −0.5 0.0 0.5 1.0 −1.0 −0.5 0.0 0.5 1.0−3.6−3.2−2.8−2.4−3.6−3.2−2.8−2.4 Global land flux [PgC yr - ] G l oba l o c ean f l u x [ P g C y r - ] Bias correction/correlated errorsBias correction/uncorrelated errors No bias correction/correlated errorsNo bias correction/uncorrelated errors Prior meanTruth

Fig. C2: As in Figure 3, but for Global ocean versus Global land.43

015 2016 L G L N - ] F l u x i n T r an s C o m : S ou t he r n A f r i c a [ P g C y r - ] Bias correction/correlated errorsBias correction/uncorrelated errors No bias correction/correlated errorsNo bias correction/uncorrelated errors Prior meanTruth

Fig. C3: As in Figure 3, but for the Southern Africa region (TransCom3 region 06) versus the SouthAmerican Temperate region (TransCom3 region 04).44 . Extratropics (23.5° N − 90° N) −4−3−2−10 2015 2016Year F l u x [ P g C y r - ] −3−2−1012015−01 2015−07 2016−01 2016−07Month F l u x [ P g C m o - ] N. Tropics (0° − 23.5° N) −10123 2015 2016Year F l u x [ P g C y r - ] −0.50.00.52015−01 2015−07 2016−01 2016−07Month F l u x [ P g C m o - ] S. Tropics (23.5° S − 0°) −101 2015 2016Year F l u x [ P g C y r - ] −1.0−0.50.00.51.02015−01 2015−07 2016−01 2016−07Month F l u x [ P g C m o - ] S. Extratropics (90° S − 23.5° S) −2−10 2015 2016Year F l u x [ P g C y r - ] −0.4−0.20.00.22015−01 2015−07 2016−01 2016−07Month F l u x [ P g C m o - ] MIP Prior (min/mean/max)MIP LG (min/mean/max) MIP LN (min/mean/max)WOMBAT Prior (mean) WOMBAT LG (mean, 95% cred. int.)WOMBAT LN (mean, 95% cred. int.)

Fig. C4: As in Figure 4, but for zonal bands covering the northern extratropics (23.5 ◦ N–90 ◦ N), the northerntropics (0 ◦ –23.5 ◦ N), the southern tropics (23.5 ◦ S–0 ◦ ), and the southern extratropics (90 ◦ S–23.5 ◦ S).45 eunion Wollongong LauderIzana Manaus Ascension DarwinTsukuba Dryden Caltech SagaKarlsruhe Orleans Park Falls LamontEureka Sodankyla Bialystok Bremen − − − − −

01 2015 − − − − −