Identifying regions of inhomogeneities in spatial processes via an M-RA and mixture priors
Marco H. Benedetti, Veronica J. Berrocal, Naveen N. Narisetty
IIdentifying regions of inhomogeneities in spatial processes via anM-RA and mixture priors
Marco H. Benedetti , Veronica J. Berrocal , and Naveen N. Narisetty Center for Injury Research and Policy, Nationwide Children’s Hospital, Columbus, OH 43215 Department of Statistics, University of California, Irvine, CA 92697 Department of Statistics, University of Illinois at Urbana-Champaign, Urbana-Champaign, IL 61820
February 5, 2021
Abstract
Soils have been heralded as a hidden resource that can be leveraged to mitigate and addresssome of the major global environmental challenges. Specifically, the organic carbon stored insoils, called Soil Organic Carbon (SOC), can, through proper soil management, help offset fuelemissions, increase food productivity, and improve water quality. As collecting data on SOCis costly and time consuming, not much data on SOC is available, although understanding thespatial variability in SOC is of fundamental importance for effective soil management.In this manuscript, we propose a modeling framework that can be used to gain a betterunderstanding of the dependence structure of a spatial process by identifying regions withina spatial domain where the process displays the same spatial correlation range. To achievethis goal, we propose a generalization of the Multi-Resolution Approximation (M-RA) modelingframework of Katzfuss (2017) originally introduced as a strategy to reduce the computationalburden encountered when analyzing massive spatial datasets.To allow for the possibility that the correlation of a spatial process might be characterizedby a different range in different subregions of a spatial domain, we provide the M-RA basisfunctions weights with a two-component mixture prior with one of the mixture components ashrinking prior. We call our approach the mixture M-RA . Application of the mixture M-RAmodel to both stationary and non-stationary data shows that the mixture M-RA model canhandle both types of data, can correctly establish the type of spatial dependence structure inthe data (e.g. stationary vs not), and can identify regions of local stationarity.
Keywords: Basis expansion; Gaussian process; inhomogeneities; local stationarity; mixtureprior; M-RA; non-stationary covariance function.
Soils contain a massive proportion of the Earth system’s carbon (Lef`evre et al., 2017): within thefirst meters below surface, there is more organic carbon than in the atmosphere and terrestrialvegetation combined. This carbon, called soil organic carbon (SOC), is stored within soil organicmatter, e.g. plant or animal residual matter, and is typically very dynamic due to its involvementin several biological, physical and chemical processes (Bliss et al., 2014). For example, through rootbiomass decomposition, humification and particle detachment, SOC gets deposited and distributedin soils and removed from the atmosphere, whereas, by acting as a membrane that filters pollutants,1 a r X i v : . [ s t a t . M E ] F e b OC plays a key role in water quality. Human activity, along with erosion and SOC depletion, havecontributed to a loss of SOC that varies not only with latitude but also locally. The adoption ofrecommended soil management practices, such as reduced tillage, manuring, etc., have the potentialto transform soils into carbon pools, sequestering carbon from the atmosphere, offsetting fossilfuel emissions, and improving agronomic yield and water quality. As such, SOC can provide asustainable partial solution to three of the major global environmental challenges identified by theUnited Nations (Lef`evre et al., 2017): climate change, desertification and food insecurity.Because of the biophysical and societal importance of SOC, since the Kyoto meeting, the In-tergovernmental Panel on Climate Change has recommended that SOC stocks, i.e. the amount ofSOC in a volume of soil, are carefully monitored (Lef`evre et al., 2017). However, since collectionand measurement of SOC is costly and time consuming (Goidts and Wesemael, 2007), informationon SOC is spatially sparse, with large disparities in terms of data availability across the planet. Inaddition, as soil restorative and management measures have to be highly localized in order to beeffective, there is a great interest in understanding the spatial variability and spatial dependence ofSOC. This increased knowledge will result extremely useful when designing future sampling cam-paigns, which should be concentrated in regions with a fast-decaying spatial correlation and largeprediction uncertainty.As already indicated by Risser et al. (2019), SOC is characterized by a spatial dependencestructure that is not homogeneous in space. The contribution of our paper is to present a novelstatistical modeling approach that does not require a practitioner to determine a priori if a processis stationary or not, and can be used to model both stationary and non-stationary spatial pro-cesses. In the latter case, our model allows one to determine regions in which the spatial process ischaracterized by a similar range in the spatial correlation. This information is not readily availableby other methods. In fact, while tests to assess whether a process is stationary (Bandyopadhyayand Subba Rao, 2017; Jun and Genton, 2012), isotropic (Guan et al., 2004) or symmetric (Li et al.,2008; Weller and Hoeting, 2016) have been proposed in the literature, they only provide a globalanswer to the question of whether a spatial process is stationary or not, and they do not identifyregions with similar strength in the spatial correlation.The decomposition of a spatial domain into areas with similar spatial dependence is one ofthe classical modeling approaches used to construct non-stationary covariance functions (Sampson,2010), see for example Fuentes (2001), Kim et al. (2005), Nott and Dunsmuir (2002), Pope et al.(2018), Gramacy and Lee (2008), Konomi et al. (2014), and, more recently, Risser et al. (2019)who leverage information in the covariates to define the domain segmentation. Our model addsto this literature on non-stationary spatial processes, and it breaks with previous approaches inthat it provides an alternative, computationally less challenging way to determine regions of localstationarity. Additionally, it does not require specifying a priori the maximum number of possiblesegments, and it keeps computation rather feasible.We build on the Multi-Resolution Approximation (M-RA) model of Katzfuss (2017) that pro-vides an approximation to the original covariance function of a Gaussian spatial process via alinear combination of basis functions obtained by recursively implementing a predictive processapproximation (Banerjee et al., 2008). Katzfuss (2017) noted that, in the M-RA modeling frame-work, the magnitude of the basis functions weights at each level is related to the strength of thespatial dependence in the data. Exploiting this intuition, we propose to modify the M-RA modelto allow basis functions weights to be shrunk towards zero. Examining the behavior of the basisfunctions weights over space will provide us with information on whether the spatial dependenceof the spatial process has the same strength across the domain. Because of the prior specificationon the basis functions weights, we call our approach the mixture M-RA . We show via simulationexperiments that our modeling framework is flexible enough to accommodate both stationary and2on-stationary data.The remainder of the paper is organized as follows: in Section 2 we introduce the motivatingdataset and provide a description of the SOC data. In Section 3 we review the M-RA modeland present our modification of the M-RA approach to allow for the identification of regions ofinhomogeneity in the range parameter. Section 4 presents results relative to the application of ourmodel to simulated data and SOC, while Section 5 offers a discussion on limitations and futureextensions of our model.
The Rapid Carbon Assessment (RaCA) project was a project initiated by the U.S. Department ofAgriculture National Resource Conservation Service (NRCS) in 2010 with the goal of obtaining anup-to-date snapshot of the amount and spatial distribution of SOC across the conterminous UnitedStates (CONUS) under different land use and agricultural management. A secondary goal of theproject was to provide useful information for subsequent modeling efforts that investigate changesin SOC as a result of climate change and conservation practices.Volumes of soils were extracted at tens of thousands of locations randomly selected withindifferent land use/land cover classes, and the amount of SOC at depths of 5, 10, 20, 30, 50 and 100centimeters (in Mg C ha − ) were obtained using a Visible Near Infrared Spectrometer (VNIRS).Here we focus on measurements of SOC at 100 cm of depth. The data are openly available in the R package soilDB . A subset of these data, relative to the Great Lakes region, was previously analyzedby Risser et al. (2019) who demonstrated that, contrarily to previous analyses of SOC that assumedsecond-order stationarity, SOC is indeed a non-stationary spatial process with varying correlationranges across the Great Lakes region.As Figure 1(a) shows, raw SOC measurements are extremely right skewed, prompting us to workon the log scale, as did Risser et al. (2019). Log-transforming the data yields a distribution thatis more symmetric, as Figure 1(a) depicts. Across the CONUS, log SOC ranges from a minimumof 0.18 to a maximum of 8.71 log(Mg C ha − ), with a mean of 5.07 and a standard deviation of1.03 log(Mg C ha − ). In the remainder of the paper, with an abuse of terminology, we will use theexpression “SOC” when referring to results that pertain to the analysis of log SOC.Building upon previous modeling efforts focused mostly on spatial prediction (Risser et al.,2019; Mishra et al., 2009), in modeling the spatial process of SOC we expresse its mean function, µ ( s ), as a linear combination of three spatially-varying covariates - land use/land cover, drainageclass, and elevation - known to influence levels of SOC. The first two, both categorical variables, arerather important from a geological point of view and refer, respectively, to: six different land useclassifications according to the definition of the National Resources Inventory (namely: wetland,crop, pastureland, rangeland, forestland and land under a Conservation Reserve Program or CRP),and eight different classes that refer to the “frequency and duration of wet periods under conditionssimilar to those under which the soil developed” (NRCS Soil Survey Manual). Over 40% of theSOC samples were taken in locations that are used primarily as pasture or for grazing by animals(26.0% range and 16.8% pastureland), while 26.4% and 17.0% of the soil samples refer to farmlandand crop, respectively. SOC sites are also characterized by good drainage: about 50% of the SOCsamples were collected in well drained land (9.2% moderately well drained, 40.9% well drainedand 2.3% excessively drained), whereas about 20% of the sites were either poorly drained (8.2%somewhat poorly drained, 8.7% poorly drained and 4.7% very poorly drained) or not rated (22.7%).Spatial plots of SOC, land use/land cover, drainage class, and elevation at the 20,087 locationsin the CONUS, where measurements of all variables are available, are displayed in Figure 1(b)3hrough (e). Inspection of these maps indicate that SOC tends to be lower at higher altitudes,and higher in poorly drained locations. However, none of these covariates fully explain the spatialvariability in SOC: as Table 1 reports, a linear regression model of SOC on the three covariatesyields a residual standard error of 0.88 log(Mg C ha − ).To examine whether SOC is a stationary spatial process, we generate empirical semi-variogramsof the residuals of the above-mentioned linear regression model for each of the 48 conterminous USstates. The variograms, presented in Figure 1(f), highlight the inhomogeneity in the strength of thespatial dependence in SOC across the CONUS, as also noted by Risser et al. (2019). As we recognizethat the 48 states determine a partition of the CONUS based on administrative boundaries, ourgoal in the rest of this paper is to develop a statistical modeling framework that allows us to identifyregions with a similar spatial dependence structure in SOC. In this section, we provide a brief review of the M-RA approach. Interested readers are referredto Katzfuss (2017) for additional details. In the following, we adopt a notation that is slightlydifferent from that used by the former, particularly with respect to the subscripts used to indexdomain partitions and levels.Let y ( s ) , s ∈ S denote a spatial process in S observed at locations s , s , . . . , s n . Using ageostatistical modeling approach (Banerjee et al., 2004), we express y ( s ) as y ( s ) = µ ( s ) + w ( s ) + (cid:15) ( s ) (cid:15) ( s ) iid ∼ N (0 , τ ) , (1)where µ ( s ) denotes the mean, or large scale spatial trend in y ( s ), w ( s ) indicates spatial randomeffects, and (cid:15) ( s ) denotes an independent error process, independent of w ( s ). We will often refer to(1) as a Kriging model, using the expression Bayesian Kriging model if (1) is fit within a Bayesianframework.In (1), the spatial process w ( s ) is taken to be a mean-zero Gaussian process with covariancefunction C w ( s , s (cid:48) ; θ ), where θ represents a vector of covariance parameters that, in the case of astationary, isotropic covariance function, includes the marginal variance ( σ ) and the range param-eter ( φ ). Our θ does not include a nugget effect ( τ ) since that part of variability in the data isalready accounted for by the term (cid:15) ( s ). When C w ( s , s (cid:48) ; θ ) is the Mat´ern covariance function, θ alsoincludes a smoothness parameter ν .Using the M-RA framework, the spatial process w ( s ) can be approximated by ˜ w M ( s ), definedas ˜ w M ( s ) = B ( s ) η with B ( s ) matrix of basis functions up to level M evaluated at s and η set ofbasis functions weights. Replacing w ( s ) with ˜ w M ( s ) into (1) leads to the M-RA model y ( s ) = µ ( s ) + ˜ w M ( s ) + (cid:15) ( s ) = µ + B ( s ) η + (cid:15) ( s ) , (cid:15) ( s ) iid ∼ N (0 , τ ) , (2)which provides great computational efficiency for large dimensional spatial data as illustrated inKatzfuss (2017).The basis functions B ( s ) are defined by recursively partitioning the spatial domain, introducinga new set of knots within each new partition and using a predictive process approximation eachtime. More precisely, at level 0, r knots are placed on the entire domain. No particular placementof the r knots is suggested, although placing them on an equidistant grid is probably the mostconvenient and easy-to-implement choice. We indicate with Q (0) the set of r knots introducedat level 0. Using this first set of knots Q (0) , the original process w ( s ) is approximated using the4redictive process τ ( s ) := E (cid:2) w ( s ) | w ( Q (0) ) (cid:3) where w ( Q (0) ) denotes the r -dimensional realizationof the spatial process w ( s ) at the knots’ locations. After this initial approximation at level 0,at level 1 the spatial domain is subdivided into J non-overlapping subregions, and r new knotsare placed in each new subregion (see Web Figure 1 in Web Appendix A for an illustration).We indicate with Q (1) the set of J · r knots introduced at level 1. As for level 0, the knots in Q (1) are in turn used to construct the predictive process approximation, τ ( s ), to the remainderprocess, δ ( s ), obtained at level 0 and defined as δ ( s ) := [ w ( s ) − τ ( s )], where [ · ] superimposesindependence across subregions. We note that J and r do not need to be equal at each level, but itis assumed for convenience of notation. In addition, the knots do not need to lay on a grid at eachlevel: Katzfuss (2017) uses the observation locations as knots in the final level of the M-RA. Thisprocedure of partitioning, introducing knots, and approximating the remainder term δ m ( s ) withits predictive process approximation τ m ( s ) is repeated M times leading to the following M -levelM-RA approximation w M ( · ) to w ( s ): w M ( s ) = τ ( s ) + τ ( s ) + . . . + τ M ( s ) + δ M +1 ( s ) ≡ ˜ w M ( s ) + δ M +1 ( s ) , s ∈ S , (3)with δ M +1 ( s ) remainder at level M .By construction, the individual terms in (3) are mutually independent processes as also are theremainder processes, δ ( s ), δ ( s ), . . . , δ M +1 ( s ), with the latter independent across subregions at thecorresponding level, e.g. δ ( s ) is independent across the J subregions introduced at level 1. Thisleads to a convenient block-diagonal covariance matrix structure for the basis functions weights.For a sufficiently large M , it is safe to assume that the remainder term δ M +1 ( s ) is not spatiallycorrelated as the spatial dependence structure in y ( s ) has been captured by ˜ w M ( s ).As at each level m , the predictive process τ m ( s ) can be rewritten as a basis function expansion(see Banerjee et al. (2008)), then it follows˜ w M ( s ) = M (cid:88) m =0 J m (cid:88) j =1 b m,j ( s ) η m,j , (4)where the sum is taken over partitions and levels (see Katzfuss (2017) for more details). In (4), b m,j ( s ) denotes the set of basis functions corresponding to the j -th partition and the m -th levelevaluated at s , while η m,j is the r -dimensional vector of basis functions weights in the j -th partitionof the m -th level.The form of the basis functions b m,j ( s ) is completely determined by the choice of the covariancefunction C w ( · , · ) for the spatial random effects w ( s ) in (1). For a given covariance function, the basisfunctions can be defined as follows (see Banerjee et al. (2008) and Katzfuss (2017) for more details):let Q ( m,j ) denote the set of r knots in the j -th partition at the m -th level, m = 0 , ..., M, j = 1 , ..., J m ,with Q ( m ) = (cid:83) J m j =1 Q ( m,j ) , then the basis functions and the prior covariance of the basis functionsweights are defined by the following recursive formulas: v ( s , s ) = C w ( s , s ; θ ) b m,j ( s ) = v m ( s , Q ( m,j ) ) K − m,j = v m ( Q ( m,j ) , Q ( m,j ) ) (5) v m +1 ( s , s ) = (cid:40) , if s and s are in different regions at resolution mv m ( s , s ) − b m,j ( s ) (cid:48) K m,j b m,j ( s ) , otherwisefor any s , s ∈ S . For every m and j , K m,j is a r × r covariance matrix, and η m,j ∼ N r ( , K m,j ) . (6)5eplacing (4) into (2) yields y ( s ) = µ + M (cid:88) m =0 J m (cid:88) j =1 b m,j ( s ) η m,j + (cid:15) ( s ) , (cid:15) ( s ) iid ∼ N (0 , τ ) . (7) To allow for the possibility that a spatial process is characterized by a spatial correlation with adifferent range parameter in different subregions, we propose to slightly change the prior distributionon the basis functions weights. Specifically, rather than specifying a multivariate normal prior, weprovide them with a prior that allows them to be shrunk to zero from a level ˜
M < M onwardin certain subregions, if needed. Regions where the basis function weights are shrunk to zerocorrespond to segments of the spatial domain where the spatial process is characterized by a slowly-decaying spatial correlation.Various alternatives are possible to shrink the basis functions weights to 0: a spike and slab prior(Mitchell and Beauchamp, 1988; Ishwaran and Rao, 2005), nonlocal priors (Johnson and Rossell,2012), stochastic search variable selection (George and McCulloch, 1993, 1997), or empirical Bayesvariable selection (George and Foster, 2000). Here, given the large number of parameters, forcomputational convenience, we elect to use the method proposed by Narisetty and He (2014) forBayesian variable selection. This will enable us to retain the hierarchical structure of the basisfunctions weights, shrinking to 0 all the basis functions weights nested within a given subregionand level, while maintaing desirable properties for shrinkage.Using the prior distribution in (6) as a starting point, we specify the following mixture prior: for m = 0 , . . . , M and j = 1 , . . . , J m η m,j ∼ p m N r (0 , K m,j ) + (1 − p m ) N r (0 , K m,j /L ) , (8)where L is a fixed large constant. The parameter 0 ≤ p m ≤ η m,j is not shrunk to 0 and thus it is active . We call this model the mixture M-RA . In fitting themixture M-RA model to data, we tune L within the burn-in period of the Markov Chain MonteCarlo (MCMC) algorithm, keeping it fixed at the selected value for the remainder of the MCMCiterations. In future implementations, L could be seen as an additional parameter in the model forwhich a prior distribution may be considered.A characteristic distinction of our prior specification on the η m,j ’s compared to the commonlyused spike and slab Gaussian priors is that the covariance matrix of both the spike and slabdistributions is proportional to K m,j instead of being equal to the identity matrix. This is to reflectthe dependence structure in the η m,j ’s.The grouping-preservation character of our model specification can be seen in the followingBayesian hierarchical formulation. Let Z m,j , for m = 0 , . . . , M , j = 1 , . . . , J m denote binary latentvariables, then our model could be re-expressed as: y ( s ) = µ ( s ) + (cid:80) Mm =0 (cid:80) J m j =1 b m,j η m,j + (cid:15) ( s ) (cid:15) ( s ) iid ∼ N (0 , τ ) η m,j | Z m,j = 1 ∼ N r (0 , K m,j ) η m,j | Z m,j = 0 ∼ N r (0 , K m,j /L ) Z m,j | ( Z m − ,j ∗ = 1 , p m ) ∼ Bernoulli( p m ); p m = ρ m ; ρ ∼ Beta( α ρ , β ρ ) (9) P ( Z m,j = 1 | Z m − ,j ∗ = 0) = 0 (10)where j ∗ is the partition in level m − j -th partition at the m -th level.6n (9), p m represents the probability that a set of basis functions weights in the m -th levelbelongs to the first component of the mixture prior and is not shrunk. Since we expect that for m = 0, there is residual spatial correlation to be captured, all sets of weights will belong to thefirst mixture component, whereas at higher levels, with almost no residual spatial correlation left,they are more likely to be drawn from the second component of the mixture prior. In light of this,for model parsimony, we set p m to be equal to ρ m , with ρ ∈ (0 , p m = ρ cm with a positive parameter c to be estimated, or it will keep all the p m ’s distinctwhile still monotone decreasing in m . To set the weights at a given level to be zero if the weightsin the previous level are zero, we add (10) to our model specification. We note that our conditionalprior specification in (9) and (10) is a special case of the priors proposed by Taylor-Rodriguez et al.(2016) for variable selection in the presence of heredity constraints, and we observe that (9) and(10) impose a strong heredity constraint.We complete the specification of our model by providing priors to all the remaining modelparameters. Choosing a stationary Mat´ern covariance function for C w ( s , s (cid:48) ; θ ), the covarianceparameter θ is given by ( σ , φ, ν ) (cid:48) . We place Inverse Gamma priors on the residual variance, ornugget effect, τ , and on the marginal variance σ of w ( s ). We choose hyperparameters α τ , β τ and α σ , β σ , corresponding to the shape and rate parameter, respectively, so that the priors on τ and σ are both weakly informative. Conversely, we specify a weakly informative Gamma(0 . , . φ , while we place a Uniform prior on the interval (0 ,
2) on thesmoothness parameter ν . Finally, assuming µ ( s ) ≡ µ , we place a vague mean-zero Normal prior on µ . In situations where µ ( s ) is modeled as a linear function of (spatial) covariates, the regressioncoefficients β are provided with flat prior distributions. We clarify that, in spite of using someimproper priors on the parameters for the mean function, because our model specifies a Gaussianlikelihood, our joint posterior distribution is proper. We fit our model within a Bayesian framework, approximating the posterior distribution using anMCMC algorithm. The algorithm includes Gibbs sampling steps to generate posterior samples forthe constant mean µ , the basis functions weights η m,j , the nugget effect τ , and the auxiliary binaryvariables Z m,j . Metropolis-Hastings steps are used to generate posterior samples of the parameter ρ that defines the probabilities p m , and of the covariance parameters σ , φ and ν . Specifically,to sample ρ we use a uniform proposal distribution bounded between 0 and 1 and centered at itscurrent value in the MCMC algorithm. Similarly, we use uniform proposals to sample σ , φ and ν . In all cases, the widths of the uniform proposals are adaptively adjusted every 100 iterationsuntil burn-in to achieve an acceptance rate of approximately 25%. Details on the adaptive MCMCalgorithm are provided in Web Appendix C.Although in the current implementation of the mixture M-RA we do not place a prior on L , wedetermine its value within the MCMC algorithm. Specifically, we start the algorithm with a largevalue for L for which we expect few to no basis functions weights to be drawn from the shrinkageprior. We typically use 1,000 as initial value for L based on the results obtained in Simulation study1, which we discuss in Section 4. We monitor the behavior of the basis functions weights duringthe burn-in period by computing the average of the sampled binary auxiliary variables Z m,j every1,000 iterations, for m = 0 , . . . , M and j = 1 , . . . , J m . Expecting shrinkage of the basis functionsweights mostly at the higher levels, we focus on monitoring the average of the Z M,j ’s, reducingthe value of L by 50% if the average of the sampled Z M,j ’s is greater than 0.95, while keeping itfixed otherwise. We continue monitoring the basis functions weights, decreasing the value of L , ifwarranted, during burn-in until we do not see further significant changes in the proportion of basis7unctions weights being shrunk to zero. We then keep L fixed for the rest of the MCMC iterations.More details on the MCMC algorithm along with the pseudocode are available in Web AppendixC. We now present results of the application of the mixture M-RA model to simulated data and toobservations of Soil Organic Carbon in the CONUS.
To gain a better understanding of the mixture M-RA model, we designed simulation studies withthe goals of:1. understanding the role of L in the estimation of the basis functions weights, and the magnitudeof L needed to allow shrinkage to zero of the basis functions weights, when a process is trulylocally stationary;2. evaluating whether the mixture M-RA model can identify regions of local stationarity if theyindeed exist, even in case of model mis-specification; and3. determining whether shrinkage of the basis functions weights is needed to identify regions ofnon-stationarity.To achieve these objectives we considered the following simulations: • Simulation study 1 : data were generated according to the M-RA model in (7) with somebasis functions weights η m,j set equal to 0; and • Simulation study 2 : data were generated on the unit square and non-stationarity was obtainedby introducing two mean-zero stationary spatial processes with different spatial correlationranges, each operating on one half of the square.For each simulation study, we generated multiple replicates: 50 for simulation study 1 and 30for simulation study 2. Posterior inference is based on samples yielded from an MCMC algorithmwhose convergence was assessed using various diagnostics, including trace plots, and Geweke’s(Geweke, 1992) and Raftery and Lewis’ diagnostics (Raftery and Lewis, 1992). Unless otherwisenoted, results are averaged across the multiple realizations. Here we report and discuss results forsimulation studies 1 and 2; results for additional simulation studies investigating other propertiesof the mixture M-RA are available in Web Appendix D.
We generated data at n = 756 random locations in S =[0,1] × [0,1] fifty times according to (7)where µ = 0 and τ = 0 .
05. Basis functions weights were drawn either from the distribution in (6)with covariance matrix implied by a stationary Mat´ern covariance function with θ = ( σ , φ, ν ) (cid:48) =(1 , . , . M = 3, J = 4 and r = 9, or were set equal to 0. Zero-valued function weights werelocated in the upper right and lower left quadrants (see Web Figure 6 in Web Appendix D).To each of the 50 datasets, we fitted our mixture M-RA model using M = 3, J = 4 and r = 16.As the goal of this simulation study is to determine whether our model is capable to identifyregions with different strengths of spatial dependence, in fitting the mixture M-RA model we did8ot estimate θ , rather we kept it fixed at its true value. However, we varied the value of L and weused six different ones: L =10, 25, 50, 100, 200, and 10,000.Table 2 presents summary statistics pertaining to the recovery of the basis functions weightsaveraged across levels, partitions, and the 50 simulations. As the table indicates, the true valuesof the basis functions weights are contained in the 95% credible intervals with a frequency thatis close to the nominal level, at times slightly over. The accuracy with which the basis functionsweights are estimated varies depending on the magnitude of the basis functions weights. While onaverage across simulations, the average relative absolute error is around 0.40, when L = 100, theaverage relative absolute error of basis functions weights whose true absolute value is less than 0.5is 1.54, while for basis functions weights with true absolute value greater or equal than 0.5, it issignificantly smaller and equal to 0.28. We also observe that, when L =10,000, the MCMC algorithmnever samples basis functions weights from the shrinkage prior resulting in a larger average MeanAbsolute Error (MAE) and Mean Square Error (MSE). On the other hand, L =100 guarantees thatthe MCMC algorithm samples the zero-valued basis functions weights from the shrinkage prior,leading to a greater accuracy in recovering the η m,j ’s. Data were generated at n = 1 ,
012 random locations in S = [0 , × [0 ,
1] according to the followingmodel: y ( s ) = µ ( s ) + I ( s x < . w ( s ) + I ( s x ≥ . w ( s ) + (cid:15) ( s ) (cid:15) ( s ) iid ∼ N (0 , τ ) (11)where s x indicates the first coordinate of the two-dimensional vector of geographical coordinatesfor point s (e.g. longitude or Easting). In (11), τ was set equal to 0 .
05, and w ( s ) and w ( s ) weretaken to be two mean-zero stationary Gaussian processes with Mat´ern covariance functions withparameters σ = 1, ν = 1, and φ equal to 1 . .
01, respectively.We selected 756 locations at random and we used the corresponding data as training data. Tothese data, we fitted: (i) the mixture M-RA model with a stationary Mat´ern covariance functionto define the basis functions, and (ii) the M-RA model of Katzfuss (2017) without shrinkage of thebasis functions weights. The comparison with the M-RA model will enable us to determine whethershrinkage is needed to determine the regions of inhomogeneities (goal (2)). In fitting the mixtureM-RA model, we used M = 3, J = 4, and r = 16, with L determined via tuning and kept equal toits optimal value ( L =100) in the post burn-in iterations.Inspecting the results for the basis functions weights for simulation study 2, we observe thatthe η m,j ’s are sampled from the two components of the mixture prior at different rates in the twohalves of the spatial domain: in the part of the domain where φ = 1 .
0, the average posterior meanof the binary latent variables Z m,j at the third level ( m = 3), averaged across the 30 simulations,is 0.296. Conversely, in the region where φ = 0 .
01, it is 0.968. Also the M-RA model withoutshrinkage captures the difference in magnitude between the basis functions weights in the tworegions. However, looking at the spatial map of the posterior means of the basis functions weightsdisplayed in Figure 2(e), it is clear that the M-RA cannot identify the two regions easily.A similar conclusion is drawn when examining the confusion matrix presented in Table 3. Thistable is obtained by classifying the knots in the highest level ( m =3) of the mixture M-RA and of theM-RA to a region according to some model-specific criteria. For the mixture M-RA, since the binarylatent variables are defined at the partition level, all knots in partition j at level m = 3 are classifiedas belonging to Region 1 ( φ = 1 .
0) or Region 2 ( φ = 0 . E [ Z m,j | y ] < . m = 3 is labeled as belonging to Region 1 or9egion 2 based on whether the absolute value of the posterior mean of the basis function weightassociated with the given knot is below a certain threshold. The results presented in Figure 2 andTable 3 underscore the importance of the shrinkage prior in detecting regions of non-stationarity(goal (3)). To the 20,087 measurements of log SOC described in Section 2 we fit our mixture M-RA modelin (9), with µ ( s ) a linear function of land use/land cover, drainage class and elevation. We used astationary Mat´ern covariance function to define the basis functions, and set M , J , and r equal to4, 4 and 16, respectively. To allow for regions of local stationarity with irregular boundaries, weplaced knots using a Voronoi tessellation approach: more discussion on this approach is providedin Web Appendix E, where we also lists all the prior specifications (see Web Table 8). We ran theMCMC algorithm for 10,000 iterations, determining the optimal value of L in the burn-in period( L = 100) and keeping it fixed afterwards. We assessed convergence using multiple diagnostics,which we present in Web Appendix E. After discarding the first 5,000 iterations for burn-in, wederived posterior inference on the latent variables Z m,j which we use to identify regions where thebasis functions weights are shrunk to zero versus not shrunk. Figure 3(b) plots the posterior meansof the latent binary variables Z m,j at their respective knots locations. The regions in which theposterior means of the Z m,j ’s are below 0.5 are the regions where the basis functions weights areshrunk to zero, and those are expected to contain observations whose residual spatial correlationdecays more slowly. To validate this, using land use/land cover, drainage class, and elevation ascovariates, we fit a likelihood-based spatial statistical model to SOC in the two regions denoted inFigure 3(c): Region 1, further West, which contains knots corresponding to basis functions weightsshrunk towards zero, and Region 2, SouthEast of Region 1, in which basis functions weights are notshrunk. Figure 3(d) shows the estimated Mat´ern correlation function in the two regions. Consistentwith our expectation, Region 1, in which basis functions weights are shrunk towards zero, exhibitsa more slowly-decaying spatial correlation than Region 2, in which the basis functions weights areactive in the model.Using our mixture M-RA model, we also generate predictions of SOC at observation sites alongwith the corresponding posterior predictive standard deviations, which we display in Figure 3(e)and 3(f). Combining the information in panels (e) and (f) of Figure 3 together allows us to identifyregions where the residual spatial correlation in SOC persists at long distances and predictionsare characterized by great uncertainty. This can in turn be used to plan future SOC data collec-tion campaigns. Specifically: more sampling efforts should be concentrated in regions with largeprediction uncertainty and where the spatial correlation has a short effective range. Based onFigure 3, this means: Texas, Montana, Western North and South Dakota, and Northern Missis-sippi/Alabama/Georgia, among others.As a secondary assessment of our model, we also examined its predictive performance at 2,000hold-out sites. The goal of this added assessment was two-fold: to obtain an indication of the ade-quacy of our model for spatial interpolation, and to evaluate its appropriateness for modeling SOC.For this latter reason, we compare its predictive accuracy against that of three other models: (i) astationary Bayesian Kriging model that we fit using a predictive process approximation (Banerjeeet al., 2008) due to the large size of the dataset; (ii) the M-RA model of Katzfuss (2017); and(iii) the convolution-based non-stationary model of Risser and Calder (2017). Among all the non-stationary models proposed in the literature, we select the latter partly because statistical softwareto implement it is readily available. Results relative to this out-of-sample predictive performanceassessment are reported in Web Appendix E. Here, we simply observe that although our model10oes not consistently yield the best predictions in terms of Mean Square Predictive Error (MSPE)or Mean Absolute Predictive Error (MAPE), it has similar predictive accuracy than the originalM-RA and the model of Risser and Calder (2017), and it yields much better predictions than thestationary Bayesian Kriging model which has the worst prediction accuracy of all. These resultsindicate that even though our model has been developed with the main purpose of identifyingregions of non-stationarity, it also has a utility beyond that: it is a viable non-stationary spatialstatistical model in its own right. Analysis of geostatistical data often involves as a first step the selection of a covariance function.In applications, stationary covariance functions are often used even though the assumption ofstationarity might not be warranted. In this paper, we have proposed a flexible modeling frameworkthat allows investigators to gain a better understanding of the spatial dependence structure of aspatial process while accomodating both stationary and non-stationary spatial data, in cases wherethe non-stationarity is due to inhomogeneities in the range of the spatial correlation. Applicationof our model to both stationary (see simulation study 5 in Web Appendix D) and non-stationarydata have shown that our model allows one to identify regions of local stationarity, if they exist.Determining whether there exist regions in the spatial domain where a spatial process displaysvarying strength of spatial dependence is extremely important for various reasons. From a sam-pling design perspective, knowing that in different regions the spatial process is characterized bya different range parameter, could lead to a differential strategy when collecting observations orwhen placing monitoring devices, as we have discussed in the analysis of SOC in Section 4.2. Froma computational point of view, the decomposition of the spatial domain in regions of local station-arity can lead to computational savings as a spatial model can be fit to data within each regionindividually.Another appealing feature of our model is that it can identify regions of local stationarity bysimply using a stationary covariance function within the M-RA without the need of using a non-stationary covariance function model, for which parametric closed form expressions are not readilyavailable.There are multiple ways in which our model could be extended and improved. First, for nowwe have only been considering Gaussian spatial processes: it would be interesting, and potentiallynot too difficult, to extend the mixture M-RA modeling framework to non-Gaussian spatial data.In its current form, our model only accommodates non-stationarity due to inhomogeneities in therange of the correlation function; extensions of this work could be geared towards accommodatingother types of non-stationarity. The use of an anisotropic spatial covariance function could also beexplored.The prior specification on the M-RA basis functions weights involves a mixture of two normalpriors, with one of the two normal distributions introducing an additional parameter, the shrinkageparameter L . In our implementation, L is tuned during the burn-in period and kept it fixedafterwards. In all our analyses the optimal value of L resulted to be 100: we suspect that themagnitude of L is influenced by the size of the marginal variance, as in all our simulations and inthe SOC data, the marginal variance was approximately equal to 1.0. A further avenue of researchcould be to investigate how to provide a prior on L , and infer upon it using the data.In specifying the mixture prior on the basis functions weights, we used shrinkage probabilitiesthat vary by level, and have expressed them as powers of a parameter ρ . A more general formulationwould allow the shrinkage probabilities to vary from level to level and not be multiple of a common11arameter ρ . Although this approach might increase the degrees of freedom of the model, it mightnot ensure that the shrinkage probabilities decrease with increasing level, as one would expect.In our simulation experiments, we have placed the knots on a regular grid or at the observationlocations in the highest level of the M-RA (see simulation studies in Web Appendix D), yieldingregions of local stationarity that have regular boundaries. We note that it is possible to identifyregions with more irregular boundaries by placing the knots at random, defining the M-RA subre-gions as disjoint unions of complete Voronoi polygons, as we show in Web Appendix A. We haveused this strategy in the analysis of SOC. Investigating the influence of the knots’ placement onthe inference yielded by the mixture M-RA is another avenue for future research.Finally, we observe that while prediction is not the main goal of our model, the predictiveperformance of the mixture M-RA in both simulations (see Web Appendix (D)) and in the SOCdata analysis, is either comparable or slightly inferior to that of the MRA model or another non-stationary model. While this is a limitation of our model to be further improved, we underline thatthe primary goal of our model is to capture the second-order structure of a spatial process. References
Bandyopadhyay, S. and Subba Rao, S. (2017). A test for stationarity for irregularly spaced spatialdata.
JRSS B , 79:95–123.Banerjee, S., Carlin, B. P., and Gelfand, A. E. (2004).
Hierarchical modeling and analysis for spatialdata . Chapman & Hall/CRC. Boca Raton, FL.Banerjee, S., Gelfand, A. E., Finley, A. O., and Sang, H. (2008). Gaussian predictive process modelsfor large spatial data sets.
JRSS B , 70:825–848.Barbieri, M. M. and Berger, J. O. (2004). Optimal predictive model selection.
The Annals ofStatistics , 32:870–897.Bliss, N. B., Waltman, S. W., West, L., Neale, A., and Mehaffey, M. (2014). Distribution of soilorganic carbon in the conterminous United States. In Hartemink, A. E. and McSweeneyi, K.,editors,
Progress in Soil Science , pages 85–93. Springer.Fuentes, M. (2001). A new high frequency kriging approach for non-stationary environmentalprocesses.
Environmetrics , 12:469–483.George, E. I. and Foster, D. P. (2000). Calibration and empirical Bayesian variable selection.
Biometrika , 87:731–747.George, E. I. and McCulloch, R. E. (1993). Variable selection via Gibbs sampling.
JASA , 88:881–889.George, E. I. and McCulloch, R. E. (1997). Approaches for Bayesian variable selection.
StatisticaSinica , 7:339–374.Geweke, J. (1992). Evaluating the accuracy of sampling-based approaches to the calculation ofposterior moments. In Bernardo, J., Berger, J., Dawid, A., and Smith, A., editors,
BayesianStatistics 4 , pages 169–193. Oxford University Press.Goidts, E. and Wesemael, B. V. (2007). Regional assessment of soil organic carbon changes underagriculture in Southern Belgium, 1955-2005.
Geoderma , 141:341–354.12ramacy, R. B. and Lee, H. K. H. (2008). Bayesian treed Gaussian process models with anapplication to computer modeling.
JASA , 103:1119–1130.Guan, Y., Sherman, M., and Calvin, J. A. (2004). A nonparametric test for spatial isotropy usingsubsampling.
JASA , 99:810–821.Ishwaran, H. and Rao, J. S. (2005). Spike and slab variable selection: Frequentist and Bayesianstrategies.
The Annals of Statistics , 33:730–773.Johnson, V. E. and Rossell, D. (2012). Bayesian model selection in high-dimensional settings.
JASA , 107:649–660.Jun, M. and Genton, M. G. (2012). A test for stationarity of spatio-temporal random fields onplanar and spherical domains.
Statistica Sinica , 22:1737–1764.Katzfuss, M. (2017). A multi-resolution approximation for massive spatial datasets.
JASA , 112:201–214.Kim, H.-M., Mallick, B. K., and Holmes, C. C. (2005). Analyzing nonstationary spatial data usingpiecewise Gaussian processes.
JASA , 100:653–668.Konomi, B. A., Sang, H., and Mallick, B. K. (2014). Adaptive Bayesian nonstationary modeling forlarge spatial datasets using covariance approximations.
Journal of Computational and GraphicalStatistics , 3:802–829.Lef`evre, C., Rekik, F., Alcantara, V., and Wiese, L. (2017). Soil organic carbon: the hiddenpotential. Technical Report of the Food and Agriculture Organization of the United Nations.Li, B., Genton, M. G., and Sherman, M. (2008). Testing the covariance structure of multivariaterandom fields.
Biometrika , 95:813–829.Mishra, U., Lal, R., Slater, B., Calhoun, F., Liu, D., and Van Meirvenne, M. (2009). Predictingsoil organic carbon stock using profile depth distribution functions and Ordinary Kriging.
SoilScience Society of America Journal , 73:614–621.Mitchell, T. J. and Beauchamp, J. J. (1988). Bayesian variable selection in linear regression.
JASA ,83:1023–1032.Narisetty, N. N. and He, X. (2014). Bayesian variable selection with shriking and diffusing priors.
The Annals of Statistics , 42:789–817.Nott, D. J. and Dunsmuir, W. T. M. (2002). Estimation of nonstationary spatial covariancestructure.
Biometrika , 89:819–829.Pope, C. A., Gosling, J. P., Barber, S., Yamaguchi, T., and Blackwell, P. G. (2018). Modellingspatial heterogeneity and discontinuities using Voronoi tessellations. arXiv:1802.05530.Raftery, A. E. and Lewis, S. M. (1992). Practical Markov Chain Monte Carlo: Comment: Onelong run with diagnostics: Implementation strategies for Markov Chain Monte Carlo.
StatisticalScience , 7:493–497.Risser, M. D. and Calder, C. A. (2017). Local likelihood estimation for covariance functions withspatially-varying parameters: The convoSPAT package for R.
Journal of Statistical Software ,81:1–32. 13isser, M. D., Calder, C. A., Berrocal, V. J., and Berrett, C. (2019). Nonstationary spatialprediction of soil organic carbon: Implications for stock assessment decision making.
Annals ofApplied Statistics , 13:165–188.Sampson, P. D. (2010). Constructions for nonstationary spatial processes. In Gelfand, A. E., Diggle,P., Fuentes, M., and Guttorp, P., editors,
Handbook of Spatial Statistics , pages 119–130. CRCPress.Taylor-Rodriguez, D., Womack, A., and Bliznyuk, N. (2016). Bayesian variable selection on modelspaces constrained by heredity conditions.
Journal of Computational and Graphical Statistics ,25:515–535.Weller, Z. D. and Hoeting, J. A. (2016). A review of nonparametric hypothesis tests of isotropyproperties of spatial data.
Statistical Science , 31:305–324.
Supporting Information
Web Appendices, Tables, and Figures referenced in Sections 3.1, 3.3, 4.1, 4.2 and 5, are availablein a supplemental document submitted as an ancillary file. Code for implementing the MixtureM-RA to simulated data are available at: https://github.com/marco-benedetti/Mixture-M-RA . Data Availability Statement
The data that support the findings of this study are openly available in the soilDB package of R at https://cran.r-project.org/web/packages/soilDB/index.html .14 OC, Mg/ha D en s i t y . . . . (a) −120 −110 −100 −90 −80 −70 −60 Longitude La t i t ude log(SOC)<4.504.50−4.754.75−5.005.00−5.255.25−5.505.50−5.755.75−6.00>6.00 (b) −120 −110 −100 −90 −80 −70 −60 Longitude La t i t ude CropFarmlandPasturelandRangeWetlandCRP (c) −120 −110 −100 −90 −80 −70 −60
Longitude La t i t ude Very poorly drainedPoorly drainedSomewhat poorly drainedModerately well drainedWell drainedSomewhat excessively drainedExcessively drainedNot Rated (d) −120 −110 −100 −90 −80 −70 −60
Longitude La t i t ude Elevation, ft−100−250251−500501−750751−10001001−1500>1500 (e) . . . . . . Distance (km) S e m i − v a r i an c e Full DataStates (f)
Figure 1: Soil Organic Carbon (SOC) analysis. (a) Histogram of raw SOC values with overlaid akernel density estimate of log SOC. (b)-(e) Spatial maps of: (b) log SOC; (c) land use/land cover;(d) drainage class; and (e) elevation. (f) Empirical semi-variogram of the residuals of log SOC onland use/land cover, drainage class and elevation when using data for each of the 48 CONUS states(thinner lines) and when using all the data (thicker line) across the CONUS.15 hi = 1.0 phi = 0.01 x−coordinate y − c oo r d i na t e −2−10123 y(s) (a)(b) (c) phi = 1.0 phi = 0.01 x−coordinate y − c oo r d i na t e E(Z|y) (d) phi = 1.0 phi = 0.01 x−coordinate y − c oo r d i na t e −3−2−10123 E( h |y) (e) Figure 2: Simulation study 2. (a) One of the 30 realizations of y ( s ) generated according to (11),with µ ( s ) ≡ ∀ s ∈ S = [0 , × [0 , τ = 0 .
05, and w ( s ) and w ( s ) mean-zero stationaryGaussian processes with Mat´ern covariance function with parameters, σ = 1 . ν = 1, φ = 0 . σ = 1 . ν = 1, and φ = 1 .
0, respectively. (b)-(c) Histograms of the posterior means ofthe basis function weights η m,j for m = 3 for the mixture M-RA and the M-RA model withoutshrinkage, respectively, grouped by values of φ , the range parameter. (d) Posterior means of the Z m,j ’s for m = 3. (e) Posterior means of the basis function weights η m,j for m = 3 as estimatedby the M-RA model without shrinkage. 16
120 −110 −100 −90 −80 −70 −60
Longitude La t i t ude Knots at Level 0Knots at Level 1Knots at Level 2Knots at Level 3Knots at Level 4 (a) −120 −110 −100 −90 −80 −70 −60
Longitude La t i t ude E(Z|Y)<0.10.1−0.20.2−0.50.5−0.80.8−0.90.9−1.0 (b) −120 −110 −100 −90 −80 −70 −60
Longitude La t i t ude Region 1: E(Z|Y)< 0.5Region 2: E(Z|Y) > 0.5 (c)
Distance (km) S pa t i a l C o rr e l a t i on Region 1: E(Z|Y) < 0.5Region 2: E(Z|Y) > 0.5 (d) −120 −110 −100 −90 −80 −70 −60
Longitude La t i t ude log(SOC)<4.504.50−4.754.75−5.005.00−5.255.25−5.505.50−5.755.75−6.00>6.00 (e) −120 −110 −100 −90 −80 −70 −60 Longitude La t i t ude SD log(SOC)<0.580.58−0.600.60−0.620.62−0.640.64−0.66>0.66E(Z_m,j < 0.5) (f)
Figure 3: SOC analysis. (a) Placement of the knots used in the mixture M-RA model. (b) Posteriormeans of Z m,j at the highest level ( m =4). (c) Two regions where the posterior means of Z m,j atthe highest level ( m =4) are, respectively, less than 0.5 versus greater or equal to 0.5. In the former,basis functions weights are more likely to be shrunk to zero while in the latter they are more likelyto not be shrunk to zero. (d) Estimated Mat´ern correlation functions in the selected subregions.(e) Predicted SOC and (f) posterior predictive standard deviation as yielded by the mixture M-RAmodel. In (e), lines delineate regions where the posterior mean of the Z m,j ’s at the highest level, m =4, is less than 0.5. We identify these as regions of local stationarity.17able 1: Exploratory data analysis for SOC. Estimated regression coefficients with correspondingstandard errors from a linear regression model that regresses log(SOC) on land use/land cover,drainage class, and elevation. ∗∗∗ denotes estimates significant at α = 0 . ∗∗ indicatessignificance at α = 0 .
05 level.Variable Level ˆ β SE (cid:16) ˆ β (cid:17) Intercept 5.017 ∗∗∗ ∗∗∗ (Reference = Crop)
Pastureland 0.141 ∗∗∗ − ∗∗∗ ∗∗ ∗∗∗ (Reference = Not rated) Poorly drained 0.766 ∗∗∗ ∗∗∗ ∗∗∗ ∗∗ ∗∗∗ ∗∗∗ − ∗∗∗ .
88 log (cid:0)
Mg C ha − (cid:1) Z m,j , average Mean Absolute Error(Avg. MAE), average Mean Squared Error (Avg. MSE), average bias, average relative MSE, andaverage empirical coverage (covg.) of the 95% credible interval (CI) for the non-zero basis functionsweights, averaged across levels, subregions, and the 50 simulated datasets. Summary statistics arepresented overall, and stratified based on whether the true basis functions weights are equal to zeroor not. Avg. Avg. Avg. Avg.Avg. Avg. Avg. Avg. Relative Relative Relative covg. of η m,j L E [ Z m,j | y ] MAE MSE bias MAE bias MSE 95% CI10 0.476 1.214 5.379 - 0.021 NA NA NA 0.91225 0.518 1.227 5.394 - 0.019 NA NA NA 0.92550 0.536 1.244 5.415 - 0.017 NA NA NA 0.926All 100 0.600 1.341 5.453 -0.023 NA NA NA 0.935200 0.828 1.607 5.952 -0.023 NA NA NA 0.9301,000 0.884 1.661 6.029 -0.022 NA NA NA 0.93010,000 1.000 1.687 6.131 -0.024 NA NA NA 0.92110 0.000 0.041 0.003 -0.000 NA NA NA –25 0.001 0.041 0.003 -0.000 NA NA NA –50 0.025 0.062 0.009 -0.000 NA NA NA –= 0 100 0.152 0.302 0.377 -0.005 NA NA NA –200 0.634 0.874 1.535 -0.008 NA NA NA –1,000 0.754 0.912 1.719 -0.009 NA NA NA –10,000 1.000 1.043 1.909 -0.010 NA NA NA –10 0.904 2.356 10.914 -0.033 0.491 − − − (cid:54) = 0 100 0.999 2.264 9.966 -0.039 0.440 − − − − h in partition j of level m ( m =3 here) is determined based on whether | E ( η m,j,h | y ) | < t or not, with t denoting athreshold. If | E ( η m,j,h | y ) | < t , then the h -th knots is classified as belonging to Region 1, otherwiseto Region 2. The classifier for the mixture M-RA model is based on the median probability model:all knots in parition j of level m ( m = 3) are classified to belong to region 1 if E ( Z m,j | y ) < . t ( φ = 1 .
00) ( φ = 0 . Mixture M-RA E ( Z m,j | y ) < . E ( Z m,j | y ) ≥ .83.3% 100.0%