Level set Cox processes
Anders Hildeman, David Bolin, Jonas Wallin, and Janine B. Illian
Department of Mathematical Sciences, Chalmers University of Technology and University of Gothenburg, Sweden; Department of Statistics, Lund University, Sweden; School of Mathematics and Statistics, University of St Andrews, Scotland
Abstract
The log-Gaussian Cox process (LGCP) is a popular point process for modeling non-interacting spatial point patterns. This paper extends the LGCP model to handle data exhibiting fundamentally different behaviors in different subregions of the spatial domain. The aim of the analyst might be either to identify and classify these regions, to perform kriging, or to derive some properties of the parameters driving the random field in one or several of the subregions. The extension is based on replacing the latent Gaussian random field in the LGCP by a latent spatial mixture model. The mixture model is specified using a latent, categorically valued, random field induced by level set operations on a Gaussian random field. Conditional on the classification, the intensity surface for each class is modeled by a set of independent Gaussian random fields. This allows standard stationary covariance structures, such as the Matérn family, to be used to model Gaussian random fields with some degree of general smoothness but also occasional and structured sharp discontinuities. A computationally efficient MCMC method is proposed for Bayesian inference and we show consistency of finite dimensional approximations of the model. Finally, the model is fitted to point pattern data derived from a tropical rainforest on Barro Colorado Island, Panama. We show that the proposed model is able to capture behavior for which inference based on the standard LGCP is biased.
Cox processes, and in particular log-Gaussian Cox processes (LGCP), have been used extensively as flexible models of spatial point pattern data [37, 36, 29, 19]. These are hierarchical point process models where the point locations are assumed to be independent given a random intensity function

λ(s) = exp{B(s)β + X(s)},  (1)

where B(s) is a, possibly multivariate, function of covariates and X(s) is a Gaussian random field, which is typically assumed to be stationary. The random field captures spatial structure in the point pattern that the given covariates cannot capture. In this paper, we relax the assumption that a single stationary Gaussian field can account for those remaining spatial structures and develop a mixture model based on level set inversion.

To motivate the relevance of the approach we consider a point pattern data set formed by the locations of the tree species Beilschmiedia pendula, one of the species in the tropical rainforest plot on Barro Colorado Island [11, 13, 9, 26]. The point pattern comprises point locations in a rectangular observation window (1000 m × 500 m).

Figure 1: Spatial point pattern formed by the locations of trees of the species
Beilschmiedia pendula in a 1000 m × 500 m observation window, plotted using the R [1] package spatstat [5].

Previous analyses have fitted a log-Gaussian Cox process [36] to this and related data sets to draw conclusions on habitat preferences based on a number of spatial covariates reflecting local soil chemistry and topography [36, 29]. We initially fitted a log-Gaussian Cox process to this pattern, with an intensity function as in Equation (1), using these covariates; see Section 4. On close inspection, the pattern in Figure 1a shows large areas of very low point intensity where hardly any trees can be found. The estimated posterior mean using the LGCP model predicts large regions of low intensity, as plotted in Figure 1c. Anecdotal knowledge reveals that these regions are covered by a swamp, where the tree species is known to be very unlikely to grow, independent of local soil covariates and topography. However, data on the exact extent of the swamp is not available. When an LGCP model that ignores the presence of the swamp is fitted to this pattern, the swamp is likely to act as a confounding factor, and this is likely to impact inference. Hence, any conclusions on habitat preferences of the species will be heavily biased. Covariates associated with the presence of the swamp may appear to have a significant correlation with the intensity of the tree growth, or important covariates might appear non-significant as they vary independently of the presence of the swamp.

The approach we take here is designed to capture sharp discontinuities in the intensity surface that result from qualitative yet unavailable covariates or environmental conditions such as the one seen in this example. These effects cannot be captured by the classical Gaussian random field approach.
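To make the hierarchy in Equation (1) concrete, the following sketch simulates counts from a Cox process on a lattice: conditional on a log-intensity surface, cell counts are independent Poisson draws. This is illustrative only; the smooth sinusoidal surface and the intercept `beta0` are stand-ins for B(s)β + X(s), not the model fitted in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Lattice over a unit-square window. A smooth deterministic surface stands in
# for the Gaussian random field X(s); beta0 stands in for B(s)beta.
n = 64
xs = np.linspace(0.0, 1.0, n)
SX, SY = np.meshgrid(xs, xs)
field = np.sin(2 * np.pi * SX) * np.cos(2 * np.pi * SY)
beta0 = 3.0

# Conditional on the field, the count in each cell is Poisson with mean
# |cell| * exp{beta0 + X(s)} evaluated at the cell centre (lattice approx.).
cell_area = (1.0 / n) ** 2
lam = cell_area * np.exp(beta0 + field)
counts = rng.poisson(lam)
```

Fitting such a model then amounts to recovering the latent surface from `counts`.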
Further examples of data where such a model could be important are ecological data with several distinct types of habitat, spatial regions with different treatment regimes in medical data, or materials exhibiting separate regions of differing properties in materials science. Specifically, we consider a Cox process model where the intensity surface is modeled using a Bayesian level set approach. The proposed model is an extension of the log-Gaussian Cox process with increased flexibility resulting from a random segmentation of the spatial region into K classes. The intensity surfaces of the regions associated with the K different classes can be modeled separately of each other by latent log-Gaussian random fields with simple covariance structures, while still maintaining flexibility. We refer to the proposed model as the level set Cox process (LSCP).

Level set inversion [45, 8] is a class of geometric inverse problems where the main objective is to find interfaces between geometrical regions based on observed data. In this approach, the interfaces are modeled as level sets of an unknown level set function. Level set inversion has been used extensively for segmentation [10, 34, 46], for multiphase flow modeling [7, 18], and for statistical modeling of porous materials [38]. Higgs and Hoeting [22] modeled spatially correlated categorical data using a Bayesian level set approach, where the level set function was modeled as a Gaussian random field. This probabilistic approach, which Iglesias et al. [27] and Dunlop et al. [20] extended to more general inverse problems, has the advantage that the level sets can be estimated through the posterior distribution of the level set function given the observed data.

The LSCP is, like the LGCP, a continuous process. In order to use the model in practical inference, some finite dimensional approximations are required.
We show that the classical lattice approximation of the LSCP converges, in total variation distance, to the continuous model as the grid gets finer. Further, we propose a computationally efficient Markov chain Monte Carlo (MCMC) algorithm for Bayesian inference on the model parameters, based on preconditioned Crank–Nicolson Langevin proposals [15].

This paper is structured as follows. A detailed model description is given in Section 2. In Section 3, we derive the MCMC algorithm for the method. Section 4 analyses the Beilschmiedia pendula point pattern of rainforest trees with the new approach. Finally, Section 5 discusses the presented material and possible future extensions of it. The theoretical results and proofs are given in two appendices.
In this section, we first introduce the level set Cox process in Subsection 2.1. Some examples of the model are presented in Subsection 2.2, and basic properties of the model are presented in Subsection 2.3. Finally, Subsection 2.4 introduces finite dimensional approximations of the model necessary for inference.
Let D ⊂ R^d be a bounded domain. The Bayesian level set inversion problem of Iglesias et al. [27] corresponds to reconstructing a latent field of the form

X(s) = Σ_{k=1}^{K} X_k I(s ∈ D_k),  (2)

given noisy data. Here D_k ⊂ D is the spatial region associated with segmentation class k, and the X_k are fixed values. If the constants {X_k}_k are known, the partition {D_k}_{k=1}^{K} characterizes X. Iglesias et al. [27] defined D_k as an excursion set of an unknown random continuous level set function, X_0, as D_k = {s : c_{k−1} < X_0(s) ≤ c_k}. Here the c_k are constants such that −∞ = c_0 < c_1 < ... < c_K = ∞, and X_0 is assumed to be a realization of a Gaussian random field. Thus, this model corresponds to the level set problem for categorical data by Higgs and Hoeting [22]. The level set model using a latent Gaussian random field is not identifiable with regard to the parameter triplet of threshold values, mean, and marginal variance of the level set field, X_0. Hence, we define X_0 to have standard normal marginal distributions in order to make the model identifiable.

We extend the level set function of (2) by replacing the fixed constants X_k by Gaussian random fields, denoted X_k(s) + µ_k(s), where µ_k is a deterministic mean function and X_k is a centered Gaussian random field:

X(s) = Σ_{k=1}^{K} (X_k(s) + µ_k(s)) I(c_{k−1} < X_0(s) + µ_0(s) ≤ c_k).  (3)

This can be regarded as a mixture model of Gaussian fields related to the non-stationary geostatistical model proposed by Fuentes [21]. We use this model to specify a statistical model for spatial point process data through a Cox process [19], modeling the number of occurrences of some event in a subregion E ⊆ D as an inhomogeneous Poisson process conditional on a realization of X, i.e.
Y(E) ∼ Pois(∫_E λ(s) ds),

where the intensity surface is λ(s) = exp{X(s)}.

A common usage of point process models is to study the effect of covariates on observed point patterns. A simple way of doing this is through a standard Poisson regression, where the log-intensity of the point process is of the form log λ(s) = B(s)β, where B(s) are the covariates of interest. This can easily be incorporated into the LSCP model by letting µ_k(s) = B(s)β_k or µ_k(s) = µ(s) = B(s)β. Poisson regression and log-Gaussian Cox processes are special cases of the LSCP model.

For an illustration of the flexibility of the model, Figure 2 shows the log intensity for four special cases simulated on the unit square. In this figure, all Gaussian random fields are assumed to have constant means µ and Matérn covariance functions [35],

C(X_k(s_1), X_k(s_2)) = C(h) = σ² (2^{1−ν} / Γ(ν)) (κh)^ν K_ν(κh),

where h = ‖s_1 − s_2‖, σ² = Var(X_k(s)), κ = √(8ν)/r, and ν is a smoothness parameter. Further, r is the correlation range, approximately corresponding to the value of h where the correlation is 0.1, K_ν is a modified Bessel function of the second kind, and Γ is the Gamma function.

The patterns were generated using the same random seed such that the level set function is the same for all cases, yielding comparable results. A realization of log λ(s) using two classes can be seen in Panel (a). The log intensity surface of the first class has µ = 2, whereas the second class has µ = 0; the two classes have different correlation ranges. Both fields have σ = ν = 1. The level set field, X_0, has a threshold value at the origin, c_1 = 0. In the figure, the regions belonging to the two classes, and the difference in spatial correlation range, are clearly visible.

A simplification of the model is obtained by assuming that the intensity for one of the two classes is constant (change X_2(s) to a constant X_2, for instance).
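The classification step in Equation (3) is easy to sketch numerically: each site is assigned the class k whose interval (c_{k−1}, c_k] contains the level set field, and the log intensity is read off the corresponding class field. All numerical values below are toy assumptions, and one spatial dimension is used for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)

# Thresholds -inf = c_0 < c_1 < ... < c_K = inf for K = 3 classes (assumed).
c = np.array([-np.inf, -0.5, 0.5, np.inf])
K = len(c) - 1

n = 50                                   # lattice sites (1-d for simplicity)
X0 = rng.standard_normal(n)              # stand-in for the level set field
Xk = 0.3 * rng.standard_normal((K, n))   # stand-in class fields X_k
mu = np.array([2.0, 0.0, -1.0])          # class means mu_k

# searchsorted with side="left" implements the half-open intervals
# (c_{k-1}, c_k]: a value equal to c_k is still assigned to class k.
klass = np.searchsorted(c, X0, side="left") - 1   # 0-indexed class labels
loglam = Xk[klass, np.arange(n)] + mu[klass]
```

Replacing `Xk` by constants recovers the original level set model (2), and replacing it by B(s)β recovers a segmented Poisson regression.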
A realization of such a log intensity surface can be seen in Panel (b). This model might be relevant in applications where some unknown factor makes it unlikely to observe points in certain subregions, and it may be regarded as spatially varying zero-inflation [31]. If a standard LGCP is fitted to data of this type, some overdispersion remains unexplained and the estimated mean field and covariance parameters will be biased; this is not the case for the LSCP model. We discuss an example of this in Section 4. The two-class model can of course be simplified further by assuming a constant intensity for both classes, and log λ(s) is then of the form (2). A realization of this simplified model is shown in Panel (c).

The last model example uses the structure of the level set formulation to capture effects on the boundary between two regions. For a model with three classes, the second class takes on the role of an interface layer between the first and third class, as can be seen in Panel (d). The log intensity is in this case log λ(s) = π_1(s) X_1 + π_2(s) X_2(s) + π_3(s) X_3, where π_k(s) = I(s ∈ D_k). This can be used to model effects present on the boundary between two regions. Examples of potential applications are activity on shore lines between water and land, or mixing regions between fluids.

Figure 2: Realization of the log intensity surface, log λ(s), for the four models presented in Section 2.2. Panel (a) corresponds to the model with two random classes, Panel (b) to one constant and one random class, Panel (c) to two constant classes, and Panel (d) to the model with two constant classes and a third random boundary class.

The intensity measure
Λ = {Λ(E) = ∫_E λ(s) ds ; E ⊆ D} for a Cox process is well-defined if λ is almost surely finite and integrable. The LSCP model with K = 1 reduces to the standard LGCP model, which has a well-defined random intensity measure if realizations of the Gaussian field are identified with its continuous modification [37]. For K > 1, a continuous modification does not need to exist, but almost sure integrability follows if X_0 is a.s. continuous, which ensures that the sets {s : c_{k−1} < X_0(s) ≤ c_k} are a.s. Lebesgue measurable for all k ∈ {1, ..., K}. Hence the LSCP model is well-defined when the realizations of all Gaussian fields are identified with their continuous modifications with respect to the Lebesgue measure. By the same argument as in Theorem 3 of Møller et al. [37], ergodicity of the LSCP model follows from ergodicity of log λ. Thus, the LSCP model is ergodic if all latent Gaussian fields are ergodic.

The following proposition gives semi-explicit formulas for the first two product densities.

Proposition 2.1.
For a level set Cox process with log intensity (3), where {X_k}_{k=1}^{K} are zero-mean stationary random fields with covariance functions r_k, the first moment of the intensity function equals

ρ(s) = E[λ(s)] = Σ_{k=1}^{K} exp(µ_k(s) + r_k(0)/2) (Φ(c_k − µ_0(s)) − Φ(c_{k−1} − µ_0(s))),

where Φ is the CDF of a standard normal distribution. Further, the second moment of λ, ρ⁽²⁾(s_1, s_2), corresponding to the second order product density, equals

ρ⁽²⁾(s_1, s_2) = Σ_{k=1}^{K} exp(µ_k(s_1) + µ_k(s_2) + r_k(0) + r_k(|s_1 − s_2|)) p_kk + Σ_{k=1}^{K} Σ_{l≠k} p_lk exp(µ_l(s_1) + µ_k(s_2) + (r_k(0) + r_l(0))/2).

Here

p_lk = ∫_{c_{k−1}}^{c_k} (Φ((c_l − µ*(u))/σ*(u)) − Φ((c_{l−1} − µ*(u))/σ*(u))) (1/√(2π)) e^{−(u − µ_0(s_2))²/2} du,

where µ*(u) = µ_0(s_1) + (r_0(|s_1 − s_2|)/r_0(0))(u − µ_0(s_2)) and σ*(u) = √(r_0(0) − r_0(|s_1 − s_2|)²/r_0(0)).

The proof is given in Appendix B. The form of the pair-correlation function for the LSCP model is given by g(s_1, s_2) = ρ⁽²⁾(s_1, s_2)/(ρ(s_1)ρ(s_2)), and can hence be expressed using the first and second product densities given in Proposition 2.1. The integral in p_lk has to be evaluated numerically.

If g is translation invariant we can compute the inhomogeneous K-function [6] of the process as K(r) = ∫_{B(0,r)} g(h) dh, where B(0, r) is a ball with radius r centered at the origin and g(s) = g(0, s). In the case of a homogeneous intensity, λK(r) gives the expected number of further points within a radius r of a typical point of the process. Finally, the inhomogeneous empty space function [6, 16], F(r), for the general model is given by Proposition 2.2.

Proposition 2.2.
For a level set Cox process with log intensity (3), where the X_k are zero-mean stationary random fields with covariance functions r_k, the inhomogeneous empty space function is given by

F(s, r) = 1 − E[ Π_{k=1}^{K} exp(−∫_{D_k ∩ B(s,r)} e^{µ_k(u)} e^{X_k(u)} du) ],

where, for a given realization of X_0, D_k is the region classified as k. The proof is given in Appendix B.
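The first-moment formula of Proposition 2.1 can be checked at a single location by Monte Carlo: with standard normal marginals for the level set field, class k occurs with probability Φ(c_k − µ_0) − Φ(c_{k−1} − µ_0) and contributes the log-normal mean exp(µ_k + r_k(0)/2). All parameter values below are arbitrary toy choices.

```python
import math
import random

random.seed(0)
Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

c = [-math.inf, -0.3, math.inf]   # thresholds for two classes (assumed)
mu0 = 0.2                         # mean of the level set field at s
mu = [1.0, -0.5]                  # class means mu_k(s)
var = [0.4, 0.9]                  # marginal variances r_k(0)

# Closed-form first moment rho(s) = E[lambda(s)].
rho = sum(math.exp(mu[k] + var[k] / 2.0)
          * (Phi(c[k + 1] - mu0) - Phi(c[k] - mu0)) for k in range(2))

# Monte Carlo: draw the level set field, classify, draw the class field.
N = 200_000
acc = 0.0
for _ in range(N):
    x0 = random.gauss(0.0, 1.0) + mu0
    k = 0 if x0 <= c[1] else 1
    acc += math.exp(random.gauss(0.0, math.sqrt(var[k])) + mu[k])
mc = acc / N
```

The two estimates agree to within Monte Carlo error, illustrating that the mixture structure simply averages log-normal means over the class probabilities.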
As for standard LGCP models, some finite dimensional approximation of the LSCP model is needed if it is to be used for inference. The discretization we will use is a classical lattice approximation. The observational domain is discretized into subregions D_j of a regular lattice over the domain, and the point locations are replaced by counts Y_j of the number of observations within each subregion D_j. This yields the discretized model Y_j ∼ Pois(λ_j), where λ_j = ∫_{D_j} λ(s) ds, and the information on the fine-scale behavior of the point pattern is lost. The stochastic integral in the definition of λ_j is not Gaussian and generally difficult to handle. Therefore, a common approximation is to use λ_j ≈ |D_j| λ(s_j), for some location (usually the center) s_j ∈ D_j [37]. In Appendix A, we show consistency of this finite dimensional approximation of the likelihood for the LSCP model. More precisely, we show that the posterior distribution for the latent fields {X_k}_k computed using the lattice approximation converges, in total variation distance, to the posterior distribution of the continuous process.

For any fixed lattice approximation there is a positive probability that the level set field takes values in several of the intervals (c_k, c_{k+1}] within any fixed lattice cell. Since the spatial information about the level set field on a finer scale than the lattice discretization is lost, we propose adding a "nugget" effect, ξ_j, for each lattice cell D_j. The "nugget" effect models the within-cell classification uncertainty. This gives the discretization λ_j ≈ |D_j| λ̃(s_j), where

log λ̃(s_j) = Σ_{k=1}^{K} I(c_{k−1} < X_0(s_j) + µ_0(s_j) + ξ_j ≤ c_k) (X_k(s_j) + µ_k(s_j)),

and ξ_j ∼ N(0, σ_ξ²). The nugget variance, σ_ξ², controls the amount of mixing between the classes for a given realization of X_0. This classification mechanism is equivalent to the ordered probit model discussed in Dunlop et al. [20].
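The within-cell classification with the nugget is exactly an ordered probit: given the (mean-adjusted) level set value at a cell centre, the class probabilities are differences of normal CDFs. A minimal sketch with assumed toy values:

```python
import math

Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def class_probs(x0, c, sigma_xi):
    """P(cell classified as k): probability that x0 + xi falls in
    (c_{k-1}, c_k] when xi ~ N(0, sigma_xi^2)."""
    return [Phi((c[k + 1] - x0) / sigma_xi) - Phi((c[k] - x0) / sigma_xi)
            for k in range(len(c) - 1)]

# x0 plays the role of X_0(s_j) + mu_0(s_j); thresholds and nugget assumed.
p = class_probs(0.1, [-math.inf, 0.0, math.inf], sigma_xi=0.5)
```

As σ_ξ → 0 the probabilities collapse to a hard 0/1 classification, recovering the noiseless level set assignment.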
In practice, it is typically difficult to objectively discern an appropriate value for σ_ξ², and we therefore let σ_ξ² be a regular parameter to be estimated for a fixed discretization.

Inference
It is common to fit LGCP models in a Bayesian setting. A popular approach is through Markov chain Monte Carlo (MCMC) methodology, for instance using the Metropolis adjusted Langevin algorithm (MALA) [42], which was suggested by Møller et al. [37]. Another approach is through integrated nested Laplace approximation (INLA) [29, 44, 48], which when applicable can have beneficial computational properties. In this work we use a Bayesian MCMC approach for estimating the model parameters of the LSCP model. Specifically, we propose a method based on the preconditioned Crank–Nicolson (pCN) MALA MCMC method of Cotter et al. [15]. An important property of pCN MALA is that the optimal step length is invariant to mesh refinement, which regular MALA does not have. However, for the LSCP model the main advantage is that it can be combined with efficient simulation methods based on the fast Fourier transform [32] to decrease the computational cost; we provide more details on this below.

Denote the parameters associated with class k as θ_k. For the level set field, X_0, we also include the nugget variance, σ_ξ², and the thresholds, {c_k}_k, in θ_0. By introducing an auxiliary field Z defined such that

P(Z(s_j) = k) = Φ((c_k − X_0(s_j) − µ_0(s_j))/σ_ξ) − Φ((c_{k−1} − X_0(s_j) − µ_0(s_j))/σ_ξ),

we have

log λ̃(s) =_d Σ_{k=1}^{K} I(Z(s) = k) (X_k(s) + µ_k(s)).

This means that the parameters and latent fields of different classes, {X_k, θ_k}, are conditionally independent given Z. We use this to construct a Metropolis-within-Gibbs algorithm [41] to sample from the joint posterior. In the i-th iteration of the algorithm, the following three steps are performed:

1. Sample from Z | {X_k, θ_k}_k, Y. The sampling can be performed exactly since Z(s_i) ⊥ Z(s_j), ∀ i ≠ j, given {X_k, θ_k}_k, Y, and P(Z(s_i) = k) is known up to a normalizing constant.

2. Sample from θ_k | Z, X_k using the MALA random walk sampler.
Since the parameters of different classes are conditionally independent, the sampling can be performed separately, and in parallel, for each θ_k.

3. Sample from X_k | Z, θ_k, Y using the pCN MALA algorithm of Cotter et al. [15]. Also in this step, the updates for different k can be done in parallel since the different Gaussian fields are conditionally independent.

The computational bottleneck of the algorithm is the third step, where the latent Gaussian fields are sampled. If the model is discretized into a lattice with N grid cells, the sampling of the Gaussian fields in the third step generally requires O(KN³) operations. An approach to remedy this would be to use a Gaussian Markov random field approximation of the problem. This idea has been studied by [33, 43, 48], revealing computationally attractive properties on arbitrary domains. An adaptation of the method by Simpson et al. [48] to the LSCP model would reduce the computational cost to O(KN^{3/2}). We can reduce this cost further by using the fact that proposals in the pCN MALA algorithm are drawn from the prior distribution of the fields.

If we restrict ourselves to square domains and assume that the fields have stationary and isotropic covariance functions with known spectral density, we can represent {X_k}_{k=0}^{K} using Fourier series expansions. By truncating these series, the fast Fourier transform can be used to sample the fields on a regular lattice over the region. This means that the proposals can be generated with O(KN log(N)) computational complexity. Working in the spectral domain also allows for efficient computation of all gradients and acceptance probabilities needed, making the spectral approach and the pCN MALA method in combination very favorable. In Appendix A we justify this truncation theoretically by showing that convergence of the lattice approximation still holds given certain bounds on the spectral densities.
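A sketch of the spectral trick behind step 3: on a regular grid, a draw from a stationary Gaussian prior can be generated by filtering white noise with the square root of a (discretized) spectral density via the FFT, and a pCN-type proposal blends the current state with such a fresh prior draw. The Gaussian-shaped spectrum, the length scale, and the step size β are illustrative assumptions, not the paper's exact choices, and the Langevin drift of pCN MALA is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 64                                    # n x n grid on the unit square (toy)

# Integer wavenumbers and an assumed Gaussian-shaped spectral density.
k = np.fft.fftfreq(n) * n
KX, KY = np.meshgrid(k, k)
rho = 0.1                                 # correlation length (assumed)
spec = np.exp(-0.5 * (2.0 * np.pi) ** 2 * (KX ** 2 + KY ** 2) * rho ** 2)
spec /= spec.sum()                        # normalize to unit marginal variance

def prior_draw():
    # Filter white noise by sqrt(spectrum) in the Fourier domain: O(N log N).
    w = rng.standard_normal((n, n))
    return np.fft.ifft2(np.sqrt(spec) * np.fft.fft2(w)).real * n

x = prior_draw()                          # current state of one latent field
beta = 0.2                                # pCN step size (assumed)
x_prop = np.sqrt(1.0 - beta ** 2) * x + beta * prior_draw()  # pCN proposal
```

Because the proposal reuses prior draws, its cost is O(N log N) per field, matching the complexity quoted above, and it leaves the Gaussian prior invariant since (1 − β²) + β² = 1.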
To further illustrate our approach, we return to the tropical rainforest data example from Section 1 to compare the effect of considering LSCP models to a simple Poisson regression model as well as to the LGCP model.
The dataset consists of locations of trees of the species
Beilschmiedia pendula in a 50 ha rectangular study plot (1000 × 500 meters) on the island of Barro Colorado in Panama; see Figure 1a. The data were acquired from the first census of a major ongoing ecological study that started in the 1980s, designed to understand the mechanisms maintaining species richness, consisting of the observed positions of a large number of tree species ([26, 25, 12]). The study deliberately considers a spatially mapped rainforest community, arguing that population and community dynamics occur in a spatial context [24]. In addition to the spatial pattern formed by the tree locations, measurements of topographical variables and soil nutrients that potentially influence the spatial distribution of the trees are available [30, 47, 17], with the aim of linking spatial patterns to spatial environmental variations, reflected by observed topography and soil nutrients. In the statistical literature some of the point patterns derived from the study have been considered, for example in [36, 48, 29, 40], and the Beilschmiedia pendula data are available in the spatstat package [5] for the R project [1].

Elevation was measured on a 5 × 5 meter grid, and based on this an approximation of the slope at each of these grid points was calculated using a Sobel filter [50]. Soil samples were taken at 300 locations, for which the amounts of 12 soil constituents (Al, B, Ca, Cu, Fe, K, Mg, Mn, N, Nmin, P, Zn) as well as the pH level were measured; these were interpolated to yield spatially continuous covariates. Since the covariates derived from the soil samples and elevation were not sampled at the resolution of the analysis, they had to be interpolated to a common lattice. In this example, the model was discretized into a regular lattice of subregions over the observational window. The number of observed points in each subregion is shown as a two dimensional histogram in Figure 1b.
The spatial interpolation of the covariates to this lattice was performed using bi-cubic splines with the function interp2 in Matlab (R2016a); Figure 3 shows the standardized covariates.

To avoid problems with multicollinearity among the covariates, we chose to discard the covariates corresponding to high variance inflation factors (VIF) [39]. The covariates were discarded iteratively by first computing the VIF for all the covariates, removing the covariate corresponding to the highest VIF value if it exceeded 5, and then starting over on the reduced set of covariates. The algorithm was stopped when none of the VIFs exceeded 5. By this procedure, the covariates B, Ca, K, and Zn were discarded, leaving 11 covariates for further analysis.

Figure 3: The standardized covariate values on the observational domain.

As discussed above, it is obvious from Figure 1a that there is a large area in the middle of the plot where hardly any trees are growing. This indicates that in some parts of the plot, spatial aggregation varies more rapidly than in other parts. It is likely that some inhibitory factor prevents the trees from growing in that region. As mentioned earlier, we have anecdotal evidence that this area is covered by a swamp and that the tree species is known to be very unlikely to grow there. We test four different models to see how this confounding factor affects inference.

The first is a simple Poisson regression model on the covariates, i.e. an inhomogeneous Poisson process with linear fixed effects defining the log intensity as B(s)β. We will refer to this model as the Fixed model. The second model includes a Gaussian field to capture the variability not explained by the covariates. More precisely, we use an LGCP model with log-intensity log λ(s) = X(s; B).
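The VIF pruning described above can be sketched as follows: each covariate is regressed on all the others, VIF_j = 1/(1 − R_j²), and the worst offender is dropped while its VIF exceeds 5. The synthetic, nearly collinear design below is purely illustrative, not the BCI covariates.

```python
import numpy as np

def vif_prune(X, names, thresh=5.0):
    """Iteratively drop the column with the largest variance inflation
    factor until all VIFs are at most `thresh`."""
    X = X.copy()
    names = list(names)
    while X.shape[1] > 1:
        vifs = []
        for j in range(X.shape[1]):
            y = X[:, j]
            # Regress column j on an intercept plus the remaining columns.
            Z = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
            coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
            r2 = 1.0 - (y - Z @ coef).var() / y.var()
            vifs.append(1.0 / max(1.0 - r2, 1e-12))
        worst = int(np.argmax(vifs))
        if vifs[worst] <= thresh:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return X, names

rng = np.random.default_rng(6)
a = rng.standard_normal(300)
b = a + 0.01 * rng.standard_normal(300)   # nearly collinear with a
z = rng.standard_normal(300)              # independent covariate
X, kept = vif_prune(np.column_stack([a, b, z]), ["a", "b", "z"])
```

One of the two collinear columns is removed while the independent one survives, mirroring how B, Ca, K, and Zn were discarded in the analysis.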
Here X(s; B) is a Gaussian field with E[X(s; B)] = B(s)β and a Matérn covariance with standard deviation σ and range r.

Looking at the data, we might expect the LGCP model to explain the variation in point intensity well, except for the complete lack of observations in the central region, coupled with the discontinuity in the observed intensity at the border between the large empty area and the other parts of the plot. If the habitat dependence of the trees is significantly different in these two separated regions, an LSCP model with a separate class for each of the two regions might provide a better fit. Therefore, the third model is a two-class LSCP model where the first class is defined as in the LGCP model and the second class has constant intensity. That is, log λ(s) = π_1(s) X(s; B) + π_2(s) C. We will refer to this as the LSCP model. We fixed the parameter C to a small value proportional to the mean intensity among the grid cells with at most one tree. Finally, we consider a simplified version of this model where log λ(s) = π_1(s) B(s)β + π_2(s) C. This will be referred to as the FixedM model.

The posterior distributions of the parameters and latent fields were estimated using the proposed MCMC method.
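The Matérn covariance used for these fields, in the κ = √(8ν)/r parameterization from Section 2.2 (correlation approximately 0.1 at distance r), can be written as a short function. This is a sketch of the standard formula using SciPy's modified Bessel function `scipy.special.kv`.

```python
import numpy as np
from scipy.special import gamma, kv

def matern(h, sigma=1.0, r=0.2, nu=1.0):
    """C(h) = sigma^2 * 2^(1-nu)/Gamma(nu) * (kappa h)^nu * K_nu(kappa h),
    with kappa = sqrt(8 nu)/r, so r is the approximate correlation range."""
    h = np.atleast_1d(np.asarray(h, dtype=float))
    kappa = np.sqrt(8.0 * nu) / r
    out = np.full(h.shape, sigma ** 2)          # C(0) = sigma^2
    nz = h > 0
    hk = kappa * h[nz]
    out[nz] = sigma ** 2 * 2.0 ** (1.0 - nu) / gamma(nu) * hk ** nu * kv(nu, hk)
    return out
```

For ν = 1, the value fixed in the analysis, the correlation at h = r evaluates to about 0.14, consistent with the "approximately 0.1" convention for the range parameter.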
In order to avoid significant wrap-around effects, the lattice was extended beyond the observational window, both for the level set field and for the latent Gaussian fields of the classes (implicitly assuming correlation ranges smaller than the corresponding extensions, for the classification as well as within classes). The smoothness parameters of the Gaussian fields were fixed at ν = 1, and the following independent prior distributions for the model parameters (when applicable) were used: i) Gaussian N(0, ·) priors for the fixed effects; ii) Gaussian N(0, ·) priors for the threshold parameters; iii) an exponential distribution with mean 2, Exp(2), for the standard deviations of the Gaussian fields, except for X_0, where σ = 1 is fixed; iv) Exp(200) distributions, truncated from below at the lattice distance and from above at the lattice extension range, for the range parameters r_k; this ensures that no wrap-around artifacts are introduced and that the correlation ranges are not smaller than the discretization distance; v) a truncated exponential distribution for the nugget standard deviation, which ensures that the nugget variance does not dominate the spatial dependency in the level set field.

The standard deviations of the Gaussian fields were given exponential priors since this corresponds to the PC prior [51, 49], which penalizes deviations from the simpler model without the Gaussian field. The range parameters were given exponential priors using similar reasoning, where no spatial dependency corresponds to the base model. However, the range priors were truncated below the lattice distance since no information exists for smaller values due to the spatial discretization. The covariates were standardized to mean 0 and variance 1. Hence, the fixed effects prior yields a penalisation towards the base model of no fixed effects.
The nugget for the level set field was considered as a deviation from the base model (without a nugget) and hence penalised by an exponential distribution.

To assess the model fit we used a common approach for point process models [36, 28, 4] that compares summary characteristics estimated from the observed point pattern with envelopes based on the summary characteristic estimated from simulated point patterns, generated from each of the four fitted models.

As a functional summary characteristic we used the centered and variance stabilized K-function, commonly referred to as the centered L-function, L̂(r) = √(K̂(r)/π) − r. The L-function stabilizes the variance such that L̂(r) is homoscedastic with respect to r. Furthermore, the −r term centers the function in the sense that the resulting function for a homogeneous Poisson process has the value zero for all distances [28]. We used isotropic edge correction and calculated the envelopes using the functions Kest and envelope from the spatstat package [5]. The L-function for the observed point pattern, as well as the pointwise sample mean and envelopes from realizations of each of the four models, can be seen in the top row of Figure 4.

Not surprisingly, the Fixed model seems clearly inappropriate, as the functional summary characteristic for the observed point pattern is far outside the envelopes for all distances. For all other models the estimated function for the observed point pattern remains inside the envelopes. The estimated function for the observed pattern and the expected values of the simulated patterns are most similar for the LSCP model. For the standard LGCP model the empirical pattern appears to show less clustering than the one expected from the model. This can be seen from the low values of the black line relative to the red line for a large range of r, and from the function being only just within the envelope at smaller distances.
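For intuition, the centered L-function can be estimated naively, without the isotropic edge correction that spatstat's `Kest` applies: count ordered pairs within distance r, normalize to K̂(r), and transform. For a roughly Poisson (here uniform) toy pattern the result should hover near zero.

```python
import numpy as np

rng = np.random.default_rng(5)

def centered_L(pts, r, area=1.0):
    """sqrt(Khat(r)/pi) - r with a naive (edge-uncorrected) Khat."""
    n = len(pts)
    lam = n / area
    d = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1))
    pair_d = d[d > 0]                          # drop self-distances
    K = np.array([(pair_d <= ri).sum() / (lam * n) for ri in r])
    return np.sqrt(K / np.pi) - r

pts = rng.uniform(0.0, 1.0, size=(400, 2))     # toy near-Poisson pattern
L = centered_L(pts, np.array([0.02, 0.05, 0.1]))
```

Positive values would indicate clustering relative to a Poisson process and negative values inhibition; the envelope comparison in Figure 4 formalizes this with simulation-based bands.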
Due to the cumulative nature of the L-function it is hard to discern at what inter-point distances this discrepancy occurs in the patterns. Interestingly, the differences between the two lines are less drastic for the FixedM model, and reversed: the simulated patterns seem to be less clustered.

Secondly, we computed the pair correlation function for the different models, as well as pointwise envelopes and expected values, in a similar fashion as for the L-function. The result is shown in the second row of Figure 4.
For the Fixed model, the function for the empirical pattern again deviates drastically from those for the simulated patterns. For the LGCP model the function for the empirical pattern is only just outside the envelopes and again below the mean function, indicating less clustering; this discrepancy vanishes at larger inter-point distances. For the FixedM model the deviations are mainly at short distances, where the empirical pattern shows a greater amount of clustering than what would be expected under the model.

Finally, we computed the empty space function and the corresponding pointwise envelopes and expected values. These are shown in the bottom row of Figure 4, and one can see that the Fixed model once again deviates drastically from the empirical pattern. Here the LSCP model shows lower values than the empirical pattern, while the FixedM model shows larger ones. The LGCP model shows an increasing overestimation of clustering, and the lower border of the envelope touches the empirical value at the far right of the range. Based on the comparison between the pointwise expected values and envelopes with the empirical values, the LSCP model seems to explain the observed point pattern better than the other models.

Figure 4: Plots of the L-function (top row), pair correlation function (middle row), and empty space function (bottom row) for the LSCP, LGCP, FixedM, and Fixed models. In each case, the black line corresponds to the value estimated from the real point pattern, the thin line corresponds to the model mean, and the envelopes are the pointwise envelopes of simulated point patterns for the corresponding model.
Figure 5: The mean (cross) and 95% credibility intervals (lines) for the field parameters.
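The crosses and interval lines in Figure 5 are standard posterior summaries computed from MCMC output. As a generic illustration (with hypothetical posterior draws, not the paper's samples):

```python
import numpy as np

# Hypothetical posterior draws for one field parameter (e.g. a correlation range r);
# in practice these come from the MCMC output.
rng = np.random.default_rng(0)
samples = rng.lognormal(mean=3.0, sigma=0.5, size=5000)

post_mean = samples.mean()                          # plotted as the cross
ci_lo, ci_hi = np.percentile(samples, [2.5, 97.5])  # equal-tailed 95% interval (the line)
print(post_mean, ci_lo, ci_hi)
```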
The models discussed here, relating a spatial pattern to spatially continuous covariates, may be of interest for a number of reasons. Commonly, one seeks to understand habitat preferences of a particular species as reflected in the relationship between the point pattern and the covariates. In addition, it might be of interest to understand the nature of the spatial structure that remains unexplained by the covariates. This might be gleaned from the parameters of the covariance function of the Gaussian random field(s).

To investigate the spatial structure, we first look at the mean values and credibility intervals for the random field parameters of the models, presented in Figure 5. The differences between the $r$ and $\sigma$ values of the LGCP and LSCP models show how the empty region affects the estimation of the spatial dependency structure. Here, the LSCP model shows a significantly lower variance and a clearly lower correlation range. This is natural, since the Gaussian field of the LSCP model does not need to explain both the natural spatial dependency between the growth of trees and the unknown inhibitory effect that causes trees to not grow at all in certain regions of the forest. Finally, we note that $\sigma_\epsilon$ has a large effect (the signal-to-noise ratio equals $\sigma_\epsilon$), indicating that the Matérn field for $X_0$ cannot explain the classification on its own. This is clearer for the FixedM model, where the classification jumps more sporadically between adjacent grid cells due to the over-simplified structure of the classes.

In Figure 6, the mean posterior log intensities, $\{\log \lambda_i\}_{i=1}^N$, are presented as kriging predictions for each of the four models. The figure also shows the posterior probabilities $P(X_0(s) > c \mid Y)$, giving an indication of the region with very few trees. The posterior log intensity surface of the LSCP shows sharp boundaries, contrary to the smoothly varying surface of the LGCP.
The classification in the FixedM model is more noisy than that of the LSCP model, and a larger proportion of the observation window is classified as belonging to the empty region. Once again, this is connected to the larger value of $\sigma_\xi$ and caused by the FixedM model having to explain the intensity with a much simpler model.

The relationship between the tree intensity and the covariates is also of interest, and in practice it is often the focus of a study and hence the most relevant inference. Recall that 12 of the original 16 covariates are considered here: 11 covariates and one intercept term. Figure 7 shows the mean and credibility intervals for each of these covariates for all models. The first question is which of the covariates have a significant impact on the spatial distribution of the trees and hence reflect a habitat preference of the species. To answer this we assess which of the regression coefficients $\beta$ are significantly different from zero. Empirical p-values are computed from the sampled posterior distributions and adjusted for the multiple testing scenario; Holm-Bonferroni correction [23] is used to acquire rejection regions for each covariate.

Figure 6: Mean posterior log intensity surface, $\log \lambda$ (left), and mean classification for the models where applicable (right). Panels: (a) LSCP, (b) LGCP, (c) FixedM, (d) Fixed, (e) LSCP, (f) FixedM.

Table 1: Significant covariates on a 5% level using Holm-Bonferroni correction to correct for multiple hypothesis tests.

Model    Significant covariates
Fixed    Int, Elev, Slope, Al, Mn, NMin, P, pH, Cu, N
FixedM   Int, NMin, Elev, Mn
LGCP     Int
LSCP     Int

Table 1 shows the covariates that were considered significant, at a significance level of 5%, for each of the four models. The FixedM model identifies a smaller number of significant covariates than the Fixed model. This is not surprising, since the covariates no longer need to explain the lack of trees in the empty domain. For the LGCP and LSCP models, only the intercept is significant.
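Holm's step-down procedure is straightforward to implement; a minimal sketch with illustrative p-values (not those computed from the posterior samples in the paper):

```python
import numpy as np

def holm_bonferroni(p_values, alpha=0.05):
    """Holm's step-down multiple-testing procedure [23].

    Reject hypotheses in order of increasing p-value while the i-th smallest
    p-value satisfies p <= alpha / (m - i), i = 0, ..., m - 1, and stop at
    the first failure. Returns a boolean rejection mask in the input order."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    reject = np.zeros(m, dtype=bool)
    for i, idx in enumerate(np.argsort(p)):
        if p[idx] <= alpha / (m - i):
            reject[idx] = True
        else:
            break
    return reject

# Illustrative p-values for four covariates
print(holm_bonferroni([0.001, 0.03, 0.01, 0.04]))
```

This controls the family-wise error rate at level alpha while being uniformly more powerful than the plain Bonferroni correction.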
This is probably an effect of the smaller number of degrees of freedom due to the increased number of parameters to estimate, i.e., the Gaussian fields.

Figure 7: The mean (cross) and 95% credibility intervals (lines) for the posterior marginal distributions of the fixed effects for the four different models.

We have considered the problem of Bayesian level set inversion for point process data. The proposed model can be seen as a generalization of the log-Gaussian Cox process model in which the latent Gaussian field is extended to a level set mixture of Gaussian fields. We derived basic model properties and, in Appendix A, showed consistency of the posterior probability measure of finite-dimensional approximations to the continuous model. A computationally efficient MCMC method for Bayesian inference, based on the pCN-MALA algorithm, was presented. A topic of further research could be to investigate other, potentially even quicker, estimation methods such as INLA or variational Bayes.

We modelled a point pattern formed by the locations of the trees of one species in a tropical rainforest. The example was of interest since the point pattern shows clear signs of being affected by some unknown confounding factor. Comparisons of functional statistics between simulations from the fitted models and the observed data indicated that allowing for a second class in the model better explains the point pattern behavior. Moreover, the LSCP model stayed close to the expected values for all three functional characteristics investigated, while the popular LGCP model did not. There are indications that the FixedM model explains the data better than the LGCP model despite the much simpler structure of the former. FixedM has far fewer degrees of freedom than the LSCP and LGCP models, and is hence less prone to overfitting; this also shows that it is not overfitting that allows models with two classes to outperform the LGCP. The analysis of the tropical rainforest showed that inference on both the Gaussian field parameters and the covariates was affected by allowing for a second class in the model. This suggests that the inference drawn based on the LGCP model was biased by the confounding factor.

Future analyses could consider using fixed effects also in the level set field, $X_0$, in order to investigate which covariates explain the classification. This is another feature of the proposed model that we have not yet investigated. Further, analyses of multivariate point patterns are possible, for instance joint analysis of several species of plants. This could be performed by introducing multivariate Gaussian random fields for the classes, i.e., for $\{X_k\}_{k=1}^K$. Another possibility is letting several species share the same level set field, $X_0$, or classification field, $Z$, but use independent class fields, $\{X_k\}_{k=1}^K$. In this way, information about $X_0$ could be drawn from several point patterns jointly.

The authors gratefully acknowledge the financial support from the Knut and Alice Wallenberg Foundation, the Swedish Research Council Grant 2016-04187, and the ÅForsk foundation.
We would like to thank the people at the Center for Tropical Forest Science, Smithsonian Tropical Research Institute, for the extensive forest census plot and for making the data publicly available. The BCI forest dynamics research project was founded by S.P. Hubbell and R.B. Foster and is now managed by R. Condit, S. Lao, and R. Perez under the Center for Tropical Forest Science and the Smithsonian Tropical Research Institute in Panama. Numerous organizations have provided funding, principally the U.S. National Science Foundation, and hundreds of field workers have contributed.

Thanks also to the Barro Colorado soil survey team (Jim Dalling, Robert John, Kyle Harms, Robert Stallard, and Joe Yavitt, and field assistants Paolo Segre and Juan Di Trani) for making the soil sample data publicly available, for answering questions, and for handing out the original soil sample locations on request. The Barro Colorado soil survey was funded by NSF DEB021104, 021115, 0212284, 0212818, and OISE 0314581, as well as the STRI Soils Initiative and CTFS.
References

[1] The R Project for Statistical Computing.
[2] R.J. Adler and J.E. Taylor. Random Fields and Geometry. Springer, 2007. ISBN 978-0-387-48112-8.
[3] J-M. Azaïs and M. Wschebor. Level Sets and Extrema of Random Processes and Fields. Wiley, 1st edition, 2009. ISBN 0470409339.
[4] A. Baddeley, E. Rubak, and R. Turner. Spatial Point Patterns: Methodology and Applications with R. CRC Press, 2015.
[5] A. Baddeley and R. Turner. spatstat: An R package for analyzing spatial point patterns. Journal of Statistical Software, 12(6):1–42, 2005.
[6] A.J. Baddeley. Non- and semi-parametric estimation of interaction in inhomogeneous point patterns. Statistica Neerlandica, 54(3):329–350, 2000.
[7] S. Barman and D. Bolin. A three-dimensional statistical model for CLSM images of porous polymer films. ArXiv e-prints, 1705.03938, May 2017.
[8] M. Burger. A level set method for inverse problems. Inverse Problems, 17(5):1327–1355, 2001.
[9] D.F.R.P. Burslem, N.C. Garwood, and S.C. Thomas. Tropical forest diversity – the plot thickens. Science, 291:606–607, 2001.
[10] E.T. Chung. Electrical impedance tomography using level set representation and total variational regularization. Journal of Computational Physics, 205:357–372, 2005.
[11] R. Condit. Tropical Forest Census Plots. Springer-Verlag and R. G. Landes Company, Berlin, Germany, and Georgetown, Texas, 1998.
[12] R. Condit. Tropical Forest Census Plots: Methods and Results from Barro Colorado Island, Panama and a Comparison with Other Plots. Springer Berlin Heidelberg, 1998. ISBN 9783662036648.
[13] R. Condit, P.S. Ashton, P. Baker, S. Bunyavejchewin, S. Gunatilleke, N. Gunatilleke, S.P. Hubbell, R.B. Foster, A. Itoh, J.V. LaFrankie, H.S. Lee, E. Losos, N. Manokaran, R. Sukumar, and T. Yamakura. Spatial patterns in the distribution of tropical tree species. Science, 288:1414–1418, 2000.
[14] S.L. Cotter, M. Dashti, and A.M. Stuart. Approximation of Bayesian inverse problems for PDEs. SIAM Journal on Numerical Analysis, 48(1):322–345, 2010.
[15] S.L. Cotter, G.O. Roberts, A.M. Stuart, and D. White. MCMC methods for functions: modifying old algorithms to make them faster. Statistical Science, 28(3):424–446, 2013.
[16] D.J. Daley and D. Vere-Jones. An Introduction to the Theory of Point Processes: Volume II: General Theory and Structure, volume 2. Springer, 2003. ISBN 0-387-95541-0.
[17] J. Dalling, R. John, K. Harms, R. Stallard, and J. Yavitt. Soil maps of Barro Colorado Island 50 ha plot. URL http://ctfs.si.edu/webatlas/datasets/bci/soilmaps/BCIsoil.html.
[18] O. Desjardins and H. Pitsch. A spectrally refined interface approach for simulating multiphase flows. Journal of Computational Physics, 228(5):1658–1677, 2009.
[19] P.J. Diggle. Statistical Analysis of Spatial and Spatio-Temporal Point Patterns. Third edition, 2014.
[20] M.M. Dunlop, M.A. Iglesias, and A.M. Stuart. Hierarchical Bayesian level set inversion. Statistics and Computing.
[22] Computational Statistics and Data Analysis, 54:1999–2011, 2010.
[23] S. Holm. A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70, 1979.
[24] S.P. Hubbell. The Unified Neutral Theory of Biodiversity and Biogeography. Monographs in Population Biology 32, Princeton University Press, 2001.
[25] S.P. Hubbell, R.B. Foster, S.T. O'Brien, K.E. Harms, R. Condit, B. Wechsler, S.J. Wright, and S. Loo de Lao. Light-gap disturbances, recruitment limitation, and tree diversity in a neotropical forest. Science, 283(5401):554–557, 1999.
[26] S.P. Hubbell, R. Condit, and R.B. Foster. Barro Colorado Forest census plot data, 2005. URL http://ctfs.si/edu/datasets/bci.
[27] M.A. Iglesias, Y. Lu, and A.M. Stuart. A Bayesian level set method for geometric inverse problems. Interfaces and Free Boundaries, 18(2):181–217, 2016.
[28] J.B. Illian, A. Penttinen, H. Stoyan, and D. Stoyan. Statistical Analysis and Modelling of Spatial Point Patterns, volume 70. John Wiley & Sons, 2008.
[29] J.B. Illian, S.H. Sørbye, and H. Rue. A toolbox for fitting complex spatial point process models using integrated nested Laplace approximation. The Annals of Applied Statistics, 6(4):1499–1530, 2012.
[30] R.C. John, J.W. Dalling, K.E. Harms, J.B. Yavitt, R.F. Stallard, M. Mirabello, S.P. Hubbell, R. Valencia, H. Navarrete, M. Vallejo, and R.B. Foster. Soil nutrients influence spatial distributions of tropical tree species. Proceedings of the National Academy of Sciences USA, 104:864–869, 2007.
[31] D. Lambert. Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics, 34(1):1–14, 1992.
[32] A. Lang and J. Potthoff. Fast simulation of Gaussian random fields. Monte Carlo Methods and Applications, 17(3):195–214, 2011.
[33] F. Lindgren, H. Rue, and J. Lindström. An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. Journal of the Royal Statistical Society, 73(4):423–498, 2011.
[34] R.J. Lorentzen, G. Naevdal, and A. Shafieirad. Estimating facies fields by use of the ensemble Kalman filter and distance functions – applied to shallow-marine environments. SPE Journal, 3(1):146–158, 2012.
[35] B. Matérn. Spatial Variation, volume 36. Springer-Verlag, 1986. ISBN 9780387963655.
[36] J. Møller and R.P. Waagepetersen. Modern statistics for spatial point processes. Scandinavian Journal of Statistics, 34(4):643–684, 2007.
[37] J. Møller, A.R. Syversveen, and R.P. Waagepetersen. Log Gaussian Cox processes. Scandinavian Journal of Statistics, 25(3):451–482, 1998.
[38] V.V. Mourzenko. Percolation in two-scale porous media. The European Physical Journal B, 19(1):75–85, 2001.
[39] J. Neter, W. Wasserman, and M.H. Kutner. Applied Linear Regression Models. Irwin, second edition, 1989. ISBN 0-256-07068-7.
[40] T. Rajala and J. Illian. A family of spatial biodiversity measures based on graphs. Environmental and Ecological Statistics, 19(4):545–572, 2012.
[41] C.P. Robert and G. Casella. Monte Carlo Statistical Methods. Springer, 2nd edition, 2004. ISBN 9781475741452.
[42] G.O. Roberts and R.L. Tweedie. Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli, 2(4):341–363, 1996.
[43] H. Rue and L. Held. Gaussian Markov Random Fields, volume 104. Chapman and Hall, 2005. ISBN 0203492021.
[44] H. Rue, S. Martino, and N. Chopin. Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society: Series B, 71(2):319–392, 2009.
[45] F. Santosa. A level-set approach for inverse problems involving obstacles. ESAIM: Control, Optimisation and Calculus of Variations, 1:17–33, 1996.
[46] B. Scheuermann and B. Rosenhahn. Analysis of numerical methods for level set based image segmentation. Lecture Notes in Computer Science, 5876(2):196–207, 2009.
[47] L.A. Schreeg, W.J. Kress, D.L. Erickson, and N.G. Swenson. Phylogenetic analysis of local-scale tree soil associations in a lowland moist tropical forest. PLoS ONE, 5:1–10, 2010.
[48] D. Simpson, J.B. Illian, F. Lindgren, S.H. Sørbye, and H. Rue. Going off grid: computationally efficient inference for log-Gaussian Cox processes. Biometrika, 103(1):49–70, 2016.
[49] D. Simpson, H. Rue, A. Riebler, T.G. Martins, and S.H. Sørbye. Penalising model component complexity: a principled, practical approach to constructing priors. Statistical Science, 32(1):1–28, 2017.
[50] M. Sonka, V. Hlavac, and R. Boyle. Image Processing, Analysis, and Machine Vision, chapter Image Pre-processing. Thomson, 2008.
[51] S.H. Sørbye, J.B. Illian, D.P. Simpson, and D. Burslem. Careful prior specification avoids incautious inference for log-Gaussian Cox processes. Unpublished manuscript, 2017.
[52] A.M. Stuart. Inverse problems: a Bayesian perspective. Acta Numerica, 19:451–559, 2010.
A Theoretical results
In this section, we theoretically justify the two approximations of the LSCP process that are needed for inference. The first is the finite dimensional approximation from Section 2.4 and the second is the truncation needed for the fast Fourier transform in Section 3.

For $k \in \{0, \ldots, K\}$, let $X_k$ be a Gaussian random field on the spatial domain $D = [0,1]^d \subset R^d$, defined on a complete probability space. We will show the results using methods similar to those in [14, 27, 48], and for this it is convenient to represent the fields as Gaussian measures $\mu_0^{(k)}$. To simplify the presentation, we will assume a specific covariance operator related to the Matérn covariance function. However, the results can be extended to more general densely defined, self-adjoint, positive definite operators and to more general bounded domains.

Let $\mu_0^{(k)} = N(0, C)$, where $C = \tau A^{-\alpha}$ with $A = \kappa^2 - \Delta$. Here $\tau$, $\kappa$, and $\alpha$ are positive parameters and $A : D(A) \subset L^2(D) \to L^2(D)$; further, we impose periodic boundary conditions. Denote the eigenvalues of $A$ by $\{\lambda_j\}_{j \in N}$, arranged in nondecreasing order, and the corresponding eigenfunctions by $\{e_j\}_{j \in N}$, which form a complete orthonormal basis for $L^2(D)$. The fractional power operator $A^\alpha : D(A^\alpha) \to L^2(D)$ is defined by

A^\alpha u = \sum_{j \in N} \lambda_j^\alpha \langle u, e_j \rangle e_j.

For any $\alpha$, the subspace $H^\alpha := D(A^{\alpha/2})$ is a Hilbert space,

H^\alpha = \{ u : \sum_{j \in N} \lambda_j^\alpha |\langle u, e_j \rangle|^2 < \infty \},

with respect to the inner product $\langle \varphi, \psi \rangle_\alpha = \langle A^{\alpha/2} \varphi, A^{\alpha/2} \psi \rangle$ and corresponding norm $\|\varphi\|_\alpha^2 = \sum_{j \in N} \lambda_j^\alpha \langle \varphi, e_j \rangle^2$.

With this choice of covariance operator, we have that if $u \sim \mu_0^{(k)}$, then $u \in H^s$ for any $s < \alpha - d/2$, $\mu_0^{(k)}$-almost surely [20, Theorem 1]. Furthermore, $u$ is almost surely $p$-times differentiable if $\alpha - d/2 > p$. We will need this differentiability, and we formulate it as an assumption.

Assumption A.1.
The classification field $X_0$ is almost surely a Morse function with strictly positive variance at all locations in the domain, and for $k > 0$ the Gaussian fields $X_k$ are almost surely differentiable.

By the above, the differentiability assumption is satisfied by choosing $\alpha$ large enough that $\alpha - d/2 > 1$. The Morse function requirement is slightly stronger than continuous differentiability, but is also implied by a sufficiently large $\alpha$ [2]. Furthermore, we can use a theorem equivalent to the Sobolev embedding theorem for our $H^s$ spaces [52, Theorem 2.10]. That is, $\|X_k\|_{L^\infty} \leq C \|X_k\|_s$ if $X_k \in H^s$ and $s > d/2$. In our case with periodic boundary conditions, the space $H^s$ is even equivalent to the Sobolev space $H^s$.

We thus have that $X_k$ is represented as a Gaussian measure, $\mu_0^{(k)}$, on $H^\alpha$, and we can choose an appropriate $\sigma$-algebra such that the probability space $(H^\alpha, \Sigma_k, \mu_0^{(k)})$ becomes complete (see [27]). Likewise, $X = \{X_k\}_{k=0}^K$ can be represented by a product measure $\mu_0$ on the complete measure space $(\Omega, \Sigma, \mu_0)$, where $\Omega$ is the product space of the $H^\alpha$ and $\Sigma$ is the corresponding product $\sigma$-algebra.

Since the LSCP model defines the point process as a non-homogeneous Poisson process conditioned on $X$, the likelihood potentials for the continuous and finite dimensional models, defined in Section 2.4, are

\Phi(X; Y) = \int_D \lambda(s; X) \, ds - \sum_{s_j \in Y} \log \lambda(s_j; X),   (4)

\Phi^N(X; Y) = \sum_{i \in I_N} \left( |D_i| \lambda(\tilde{s}_i; X) - Y_i \log \lambda(\tilde{s}_i; X) \right).   (5)

Here, $N$ is the number of discretized regions in the lattice approximation and $Y_i$ denotes the number of observations in $D_i$. Further, $\tilde{s}_i$ is the midpoint of each $D_i$, and $s_j$ is the location of the $j$th point in the point pattern $Y$. Based on these likelihoods, we can now define the corresponding posterior measures as follows.

Proposition A.2.
If Assumption A.1 holds, we can define posterior measures through their Radon-Nikodym derivatives with respect to $\mu_0$:

\frac{d\mu}{d\mu_0}(X) = \frac{1}{C_\mu(Y)} \exp(-\Phi(X; Y)),
\quad
\frac{d\mu^N}{d\mu_0}(X) = \frac{1}{C_{\mu^N}(Y)} \exp(-\Phi^N(X; Y)),   (6)

where $C_\mu(Y)$ and $C_{\mu^N}(Y)$ are normalizing constants.

The proof is given in Appendix B. Since only the discretized model can be used for inference, it is important to know that the approximation $\mu^N$ converges to the true posterior, $\mu$, as the discretization becomes finer. The following theorem shows that this is indeed the case with respect to the total variation distance, $d_{TV}(\mu, \mu^N) = 2 \sup_{E \in F_X} |\mu(E) - \mu^N(E)|$.

Theorem A.3.
Let Assumption A.1 hold and let $\mu^N$ and $\mu$ be the posterior measures defined in (6). Then $d_{TV}(\mu, \mu^N) \to 0$ as $N \to \infty$.

The proof is given in Appendix B. Also the latent fields, $X$, need to be approximated by finite dimensional representations for inference. We do this by truncating the basis expansion of each field to $p$ terms,

X \approx \tilde{X} = \sum_{j=1}^p \xi_j \lambda_j^{-\alpha/2} e_j,

where the $\xi_j$ are independent standard normal variables. We will refer to the model using a discretization of the observational domain and finite dimensional approximations of $X$ as the fully discretized model. The advantage of using this truncation is that we can use the fast Fourier transform for simulating the field. To show that we still have convergence under this approximation, note that the finite dimensional approximation of $X$ can be viewed as an orthogonal projection of $X$ onto the space spanned by the eigenfunctions $\{e_j\}_{j \leq p}$, as is done in Cotter et al. [14]. We define the projection operator $P^p$ such that $\tilde{X}(s) = P^p X(s)$. It is now possible to define a posterior probability measure $\tilde{\mu}^N$ through its Radon-Nikodym derivative

\frac{d\tilde{\mu}^N}{d\mu_0}(X) = \frac{1}{C_{\tilde{\mu}^N}(Y)} \exp(-\Phi^N(P^p X; Y)).   (7)

An important consequence of this definition is that the posterior measure is absolutely continuous with respect to $\mu_0$ and measurable with respect to $\Sigma$. The interpretation of $\tilde{\mu}^N$ is that the data will only affect the projection, $P^p X$. We can now show that also under this approximation, we get convergence to the true posterior.

Theorem A.4.
Let the measure $\tilde{\mu}^N$ be defined by (7), and let the measure $\mu$ be defined by (6). If $\mu$ satisfies Assumption A.1, then $d_{TV}(\mu, \tilde{\mu}^N) \to 0$ as $N \to \infty$ and $p \to \infty$.

The proof is given in Appendix B.
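The truncated expansion above is what makes FFT-based simulation possible: with periodic boundary conditions the eigenfunctions $e_j$ are Fourier modes, so a sample of the field can be drawn with two FFTs. A minimal sketch on the unit torus follows; the parameter values are illustrative and normalization constants are glossed over, so this is qualitative rather than the paper's implementation.

```python
import numpy as np

def sample_field_fft(n, kappa, alpha, tau=1.0, rng=None):
    """Draw one sample of a Gaussian field with covariance operator
    C = tau * (kappa^2 - Laplacian)^(-alpha) on the unit torus [0, 1)^2.

    The Fourier modes are eigenfunctions of A = kappa^2 - Laplacian, so
    applying C^(1/2) to white noise is diagonal in the spectral domain."""
    rng = np.random.default_rng(rng)
    k = np.fft.fftfreq(n, d=1.0 / n)                       # integer frequencies
    kx, ky = np.meshgrid(k, k, indexing="ij")
    lam = kappa**2 + (2.0 * np.pi) ** 2 * (kx**2 + ky**2)  # eigenvalues of A
    spec = tau * lam ** (-alpha)                           # eigenvalues of C
    w = rng.standard_normal((n, n))                        # white noise on the grid
    return np.fft.ifft2(np.sqrt(spec) * np.fft.fft2(w)).real

field = sample_field_fft(64, kappa=10.0, alpha=2.0, rng=0)
```

Larger kappa shortens the correlation range and larger alpha gives smoother sample paths, mirroring the Matérn parameterization discussed in the paper.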
B Proofs
Proof of Proposition 2.1.
For the first moment, note that

E[\lambda(s)] = E[\exp(X(s))]
= \sum_{k=1}^K E[\exp(X(s)) \mid X_0(s) \in (c_{k-1}, c_k]] \, P(X_0(s) \in (c_{k-1}, c_k])
= \sum_{k=1}^K E[\exp(X_k(s) + \mu_k(s))] \, P(X_0(s) \in (c_{k-1}, c_k])
= \sum_{k=1}^K \exp\left( \mu_k(s) + \frac{r_k(0)}{2} \right) P(X_0(s) \in (c_{k-1}, c_k]),

where the final equality follows from the explicit form of the expectation of a log-normal random variable. The second moment follows by similar calculations. □

Proof of Proposition 2.2.
The inhomogeneous empty space function, $F(s, r)$, is defined as the probability of having at least one point inside a ball of radius $r$ centered at $s$, i.e., $F(s, r) = P(N(Y; B(s, r)) > 0)$. Here, $N(Y; A)$ is the number of points inside the domain $A$ for a realization of the point process, $Y$. Hence $F(s, r) = 1 - P(N(Y; B(s, r)) = 0)$. Now,

P(N(Y; B(s, r)) = 0) = E\left[ \exp\left( -\int_{B(s,r)} e^{\sum_{k=1}^K Z_k(u)(X_k(u) + \mu_k(u))} \, du \right) \right]
= E\left[ \exp\left( -\int_{B(s,r)} \sum_{k=1}^K Z_k(u) \, e^{X_k(u) + \mu_k(u)} \, du \right) \right]
= E\left[ \prod_{k=1}^K \exp\left( -\int_{D_k \cap B(s,r)} e^{\mu_k(u)} e^{X_k(u)} \, du \right) \right]. □

Due to the product space interpretation of $X$ as the collection $\{X_k\}_k$, we define norms on $X$ as $\|X\|_{(\cdot)} = \sum_{k=0}^K \|X_k\|_{(\cdot)}$. That is, a norm on realizations of all Gaussian random fields jointly is defined as the sum of the norms of each of the $K+1$ fields.

To simplify the proofs, we note that the potential $\Phi$ can be written as a composition of two functions, $\Phi(X; Y) = \Phi_P(G(X); Y)$, where $\Phi_P : L^2(D) \times \mathcal{Y} \to R$ is the continuous Poisson log-likelihood function and $G : H^\alpha \to L^2(D)$ is

G(X) = \sum_{k=1}^K \pi_k(\cdot) X_k(\cdot) = \log(\lambda(\cdot)),

where $\pi_k$ is the classification function, $\pi_k(s) = I(c_{k-1} \leq X_0(s) < c_k)$. Similarly, $\Phi^N(X; Y) = \Phi_P^N(G(X); Y)$, where $\Phi_P^N$ is the Poisson log-likelihood function for the discretized domain.

To prove Proposition A.2, we will need two lemmas, of which the first gives bounds for the likelihood potentials.

Lemma B.1.
Let $\|Y\|_{\mathcal{Y}}$ denote the number of points in a given point pattern. For $\Phi$ in (4) and $\Phi^N$ in (5), we then have:

(i) For every $r > 0$, $\varepsilon > 0$, and $s > 0$, with $X \in H^s$ and $Y \in \mathcal{Y}$ with $\|Y\|_{\mathcal{Y}} \leq r$, there exists a constant $M(\varepsilon, r) \in R$ such that $\Phi(X; Y) \geq M(\varepsilon, r) - \varepsilon \|X\|_s^2$.

(ii) For every $r > 0$ and $s > 0$, all $X \in H^s$ and all $Y \in \mathcal{Y}$ with $\max\{\|X\|_s, \|Y\|_{\mathcal{Y}}\} < r$, we have $\Phi(X; Y) \leq |D| e^{Cr} + Cr^2$.

Proof. To show (i), note that

\Phi_P(G(X); Y) = \int_D \exp(G(X)(s)) \, ds - \sum_{s_j \in Y} G(X)(s_j) \geq -\sum_{s_j \in Y} G(X)(s_j) \geq -\|Y\|_{\mathcal{Y}} \|G(X)\|_{L^\infty(D)} \geq -r \|G(X)\|_{L^\infty(D)}.

By Assumption A.1 and the Sobolev embedding theorem, we have $\|X\|_{L^\infty(D)} \leq C \|X\|_s$. Thus $\|G(X)\|_{L^\infty(D)} \leq \|X\|_{L^\infty(D)} \leq C \|X\|_s$, and we obtain $\Phi_P(G(X); Y) \geq -rC \|X\|_s$. Now,

0 \leq \left( \frac{Cr}{2\sqrt{\varepsilon}} - \sqrt{\varepsilon} \|X\|_s \right)^2 = \frac{C^2 r^2}{4\varepsilon} + \varepsilon \|X\|_s^2 - Cr \|X\|_s.

Hence

Cr \|X\|_s \leq \varepsilon \|X\|_s^2 + \frac{C^2 r^2}{4\varepsilon} = \varepsilon \|X\|_s^2 - M(\varepsilon, r).

By the same argument,

\Phi_P^N(G(X); Y) = \sum_{i \in I_N} \left( |D_i| \exp(G(X)(s_i)) - Y_i G(X)(s_i) \right) \geq M(\varepsilon, r) - \varepsilon \|X\|_s^2.

Statement (ii) holds for $\Phi$ since

\Phi_P(G(X); Y) = \int_D \exp(G(X)(s)) \, ds - \sum_{s_j \in Y} G(X)(s_j) \leq |D| e^{\|G(X)\|_{L^\infty(D)}} + \|Y\|_{\mathcal{Y}} \|G(X)\|_{L^\infty(D)} \leq |D| e^{Cr} + Cr \cdot r \leq |D| e^{Cr} + Cr^2,

and the same holds for $\Phi^N$ since

\Phi_P^N(G(X); Y) = \sum_{i \in I_N} \left( |D_i| e^{\|G(X)\|_{L^\infty(D)}} - Y_i G(X)(s_i) \right) \leq |D| e^{Cr} + Cr^2. □

The second lemma we need concerns the regularity of the level sets of $X_0$.
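The proof of the second lemma rests on a covering bound: a planar curve segment of length $l$ intersects at most $4(l/h + 1)$ cells of a grid with cell side $h$. For a straight segment this is easy to check numerically; the sketch below counts cells by dense sampling along the segment (assumed fine enough relative to $h$ not to skip cells).

```python
import numpy as np

def cells_touched(p0, p1, h, n_samples=20_000):
    """Cells of an axis-aligned grid with side h that the segment p0-p1
    passes through, found by dense sampling along the segment."""
    t = np.linspace(0.0, 1.0, n_samples)[:, None]
    pts = (1 - t) * np.asarray(p0, float) + t * np.asarray(p1, float)
    return {tuple(c) for c in np.floor(pts / h).astype(int)}

h = 0.1
p0, p1 = (0.03, 0.52), (3.41, 1.97)
length = float(np.hypot(p1[0] - p0[0], p1[1] - p0[1]))
n_cells = len(cells_touched(p0, p1, h))
print(n_cells, 4 * (length / h + 1))  # the count stays below the bound
```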
Let $S_k(X_0) = \{s : X_0(s) = c_k\}$ be the level set of $X_0$ for the level $c_k$, and set $S(X_0) = \cup_{k=1}^K S_k(X_0)$. Further, let $J_s$ denote the set of indices of all subregions $D_j$ that do not intersect $S(X_0)$, that is, $j \in J_s$ if $D_j \cap S_k(X_0) = \emptyset$ for all $1 \leq k \leq K$, and define $\bar{S}(X) = \cup_{j \in J_s} D_j$ as the union of all subregions not intersected by the level sets. We then have the following result about $\bar{S}(X)$ and $\mathcal{L}^d(\bar{S}(X))$, where $\mathcal{L}^d$ denotes the Lebesgue measure in dimension $d$.

Lemma B.2. Let Assumption A.1 hold. Then
• $\mathcal{L}^d(S(X_0)) = 0$ a.s.
• $E[\mathcal{L}^d(\bar{S}^C(X))] \to 0$ as $N \to \infty$.
• For any finite set of points $Y$, $E[\|\bar{S}^C(X) \cap Y\|_{\mathcal{Y}}] \to 0$ as $N \to \infty$.

Proof. That $\mathcal{L}^d(S(X_0)) = 0$ a.s. follows from Proposition 2.8 in Iglesias et al. [27].

We now show that $E[\mathcal{L}^d(\bar{S}^C(X))] \to 0$. Note that a curve segment of length $l$ can cover at most $4(l/h + 1)$ subregions $D_j$, where $h$ is the side length of the subregions. Hence, the number of subregions $D_j$ that contain a level crossing, $N - |J_s|$, is bounded by $\sum_{i=1}^{N^*} 4(l_i/h + 1)$, where $N^*$ is the number of disjoint line segments in $S(X_0)$ and $l_i$ is the length of the $i$th segment. This gives

E[\mathcal{L}^d(\bar{S}^C(X))] \leq h^2 \left( \frac{4}{h} E[\mathcal{L}^1(S(X_0))] + 4 E[N^*] \right) \leq 4h \left( E[\mathcal{L}^1(S(X_0))] + h E[N^*] \right).

Thus, the result follows if we can bound $E[\mathcal{L}^1(S(X_0))]$ and $E[N^*]$. By assumption, $X_0$ satisfies the conditions of Rice's theorem [3], which gives that $E[\mathcal{L}^1(S(X_0))] < \infty$. Let $N_k$ denote the number of local maxima of $X_0$ above the level $c_k$ and let $N^\circ = \sum_k N_k$. Since $E[N^*]$ is bounded by $E[N^\circ]$, and Rice's theorem bounds $E[N^\circ]$, the result follows.

Finally, we show that $E[\|\bar{S}^C(X) \cap Y\|_{\mathcal{Y}}] \to 0$.
We only consider the case $K = 1$ and $Y = \{y\}$, as the general result follows directly once the claim holds for this special case. Let $B(y, \varepsilon_N)$ be a ball centered at $y$, where $\varepsilon_N$ is chosen such that the subregions covering $y$ are contained in the ball. To prove the result we need to show that $P(\mathcal{L}_i(B(y, \varepsilon_N) \cap X_0^{-1}(c)) > 0) \to 0$ as $N \to \infty$ for both $i = 0, 1$, where the $\mathcal{L}_i$ are the Lipschitz-Killing curvatures. Since $X_0$ is a Morse function and $\mathcal{L}_i(B(y, \varepsilon_N)) \to 0$, Theorem 15.9.4 in [2] shows that $E[\mathcal{L}_i(B(y, \varepsilon_N) \cap X_0^{-1}(c))] \to 0$ for $i = 0, 1$. Thus $P(\mathcal{L}_1(B(y, \varepsilon_N) \cap X_0^{-1}(c)) > 0) \to 0$, as $\mathcal{L}_1(B(y, \varepsilon_N) \cap X_0^{-1}(c))$ is a non-negative random variable. Since any $B(y, \varepsilon_N)$ converges to a point, it follows that $\mathcal{L}_0(B(y, \varepsilon_N) \cap X_0^{-1}(c))$ (the Euler characteristic) converges to a non-negative random variable, and thus $P(\mathcal{L}_0(B(y, \varepsilon_N) \cap X_0^{-1}(c)) > 0) \to 0$. □

Proof of Proposition A.2.
We only state the proof for $\mu$, since the proof for $\mu^N$ follows similarly. To show the result, we must show that $\Phi$ is a measurable function, and then that the measure is normalizable. To prove measurability it suffices, by Lemma 6.1 in Iglesias et al. [27], to show that $\Phi$ is continuous $\mu_0$-almost surely. Thus, for $\hat{X}, \tilde{X} \in H^s$, $s > d/2$, we must show that $|\Phi(\hat{X}) - \Phi(\tilde{X})| \to 0$ as $\|\hat{X} - \tilde{X}\|_s \to 0$. Note that

|\Phi(\hat{X}) - \Phi(\tilde{X})| \leq \int_D |e^{G(\hat{X})(s)} - e^{G(\tilde{X})(s)}| \, ds + \sum_{s_j \in Y} |G(\hat{X})(s_j) - G(\tilde{X})(s_j)|.   (8)

We show continuity of the two terms separately. For the first term in (8), it follows that

\int_D |e^{G(\hat{X})(s)} - e^{G(\tilde{X})(s)}| \, ds \leq \int_D \exp\left( |G(\hat{X})(s)| + |G(\tilde{X})(s)| \right) |G(\hat{X})(s) - G(\tilde{X})(s)| \, ds \leq C |D| e^{C\|\hat{X}\|_s + C\|\tilde{X}\|_s} \|G(\hat{X}) - G(\tilde{X})\|_s.

Here the first inequality is due to the mean value theorem, and the second comes from the Sobolev embedding theorem and Hölder's inequality. Since $\|G(\hat{X}) - G(\tilde{X})\|_s \leq \sum_{k=1}^K \|\hat{X}_k\|_s \|\pi_k(\hat{X}_0) - \pi_k(\tilde{X}_0)\|_s + \|\hat{X} - \tilde{X}\|_s$, it suffices to show that $\pi_k$ is continuous. By Lemma B.2, $\mathcal{L}^d(S(X_0)) = 0$ a.s., and since $\pi_k(\cdot)$ is constant on $S(X_0)^C$, it is a.s. continuous. By Proposition 2.6 in Iglesias et al. [27], $\pi_k(\cdot)$ is therefore continuous on $L^2(D)$ and thus also on $H^s$.

The second term in (8) can be bounded by $C \|Y\|_{\mathcal{Y}} \|\hat{X} - \tilde{X}\|_s$ a.s., since $|G(\hat{X})(s) - G(\tilde{X})(s)| \leq \|\hat{X} - \tilde{X}\|_{L^\infty(D)}$. Finally, by Lemma B.1 the function $\Phi$ is bounded from above and below, and thus the measure can be normalized. □

From here on we simplify the notation by omitting the observed point pattern from the likelihood potential and the constants, i.e.,
$\Phi(\mathrm{X}) = \Phi(\mathrm{X}; \mathcal{Y})$ and $C_\mu = C_\mu(\mathcal{Y})$.

Proof of Theorem A.3.
By Stuart [52, Lemma 6.36], the Hellinger distance bounds the total variation norm, so it suffices to show convergence in Hellinger distance. Take $\mathrm{X} \in H^s$, $s > d/2$. By the triangle inequality,
\[
d_{\mathrm{Hell}}(\mu, \mu^N)^2 = \frac{1}{2} \int \left( \sqrt{\frac{\mathrm{d}\mu}{\mathrm{d}\nu}} - \sqrt{\frac{\mathrm{d}\mu^N}{\mathrm{d}\nu}} \right)^2 \mathrm{d}\nu(\mathrm{X}) = \frac{1}{2} \int \left( \frac{e^{-\Phi(\mathrm{X})/2}}{\sqrt{C_\mu}} - \frac{e^{-\Phi^N(\mathrm{X})/2}}{\sqrt{C_{\mu^N}}} \right)^2 \mathrm{d}\nu(\mathrm{X})
\]
\[
\le \frac{1}{C_\mu} \int \left| e^{-\Phi(\mathrm{X})/2} - e^{-\Phi^N(\mathrm{X})/2} \right|^2 \mathrm{d}\nu(\mathrm{X}) + \left| \frac{1}{\sqrt{C_\mu}} - \frac{1}{\sqrt{C_{\mu^N}}} \right|^2 \int e^{-\Phi^N(\mathrm{X})} \mathrm{d}\nu(\mathrm{X}) = I_1 + I_2,
\]
where $C_\mu = \int e^{-\Phi(\mathrm{X})} \mathrm{d}\nu(\mathrm{X})$ and $C_{\mu^N} = \int e^{-\Phi^N(\mathrm{X})} \mathrm{d}\nu(\mathrm{X})$. We first show that $I_2$ can be bounded by $I_1$, and then show that $I_1 \to 0$ as $N \to \infty$. Note that
\[
I_2 \le \left( C_\mu - C_{\mu^N} \right)^2 \left( \min\{C_\mu, C_{\mu^N}\} \right)^{-3} C_{\mu^N} = \left| \int \left( e^{-\Phi(\mathrm{X})} - e^{-\Phi^N(\mathrm{X})} \right) \mathrm{d}\nu(\mathrm{X}) \right|^2 \left( \min\{C_\mu, C_{\mu^N}\} \right)^{-3} C_{\mu^N}
\]
\[
\le \frac{C_{\mu^N}}{\min\{C_\mu, C_{\mu^N}\}^3} \left( \int \left| e^{-\Phi(\mathrm{X})/2} - e^{-\Phi^N(\mathrm{X})/2} \right| \left( e^{-\Phi(\mathrm{X})/2} + e^{-\Phi^N(\mathrm{X})/2} \right) \mathrm{d}\nu(\mathrm{X}) \right)^2
\]
\[
\le \frac{C\, C_{\mu^N}}{\min\{C_\mu, C_{\mu^N}\}^3} \int \left| e^{-\Phi(\mathrm{X})/2} - e^{-\Phi^N(\mathrm{X})/2} \right|^2 \mathrm{d}\nu(\mathrm{X}) \int e^{\epsilon \|\mathrm{X}\|_s^2 - M(\epsilon, \|\mathcal{Y}\|_{\mathcal{Y}})} \mathrm{d}\nu(\mathrm{X}) \le C I_1.
\]
Here the third inequality is due to Hölder's inequality and Fernique's theorem [14, Theorem A.3]. Now, to bound $I_1$, note that
\[
I_1 \le \frac{C}{C_\mu} \int e^{\epsilon \|\mathrm{X}\|_s^2 - M(\epsilon, \|\mathcal{Y}\|_{\mathcal{Y}})} \left| \Phi(\mathrm{X}) - \Phi^N(\mathrm{X}) \right| \mathrm{d}\nu(\mathrm{X}).
\]
Since the function $G$ is Lipschitz continuous on $S(\mathrm{X})^C$ (see Lemma B.2) we get
\[
\left| \Phi(\mathrm{X}) - \Phi^N(\mathrm{X}) \right| \le C e^{C\|\mathrm{X}\|_s} |D| h + C \|\mathcal{Y}\|_{\mathcal{Y}} h + C \|\mathrm{X}\|_s \left( e^{C\|\mathrm{X}\|_s} \mathcal{L}(S^C(\mathrm{X})) + \|S^C(\mathrm{X}) \cap \mathcal{Y}\|_{\mathcal{Y}} \right),
\]
and thus
\[
I_1 \le \frac{C}{C_\mu} \int e^{\epsilon \|\mathrm{X}\|_s^2 - M(\epsilon, \|\mathcal{Y}\|_{\mathcal{Y}})} \left( |D| + \|\mathcal{Y}\|_{\mathcal{Y}} \right) h \,\mathrm{d}\nu(\mathrm{X}) + \frac{C}{C_\mu} \int e^{\epsilon \|\mathrm{X}\|_s^2 - M(\epsilon, \|\mathcal{Y}\|_{\mathcal{Y}})} \left( \mathcal{L}(S^C(\mathrm{X})) + \|S^C(\mathrm{X}) \cap \mathcal{Y}\|_{\mathcal{Y}} \right) \mathrm{d}\nu(\mathrm{X}),
\]
where factors of the form $\|\mathrm{X}\|_s e^{C\|\mathrm{X}\|_s}$ have been absorbed into the exponential weight. The first integral goes to zero as $N \to \infty$ since $h \to 0$. The second integral can be bounded by
\[
\sqrt{E\left[ e^{2\epsilon \|\mathrm{X}\|_s^2 - 2M(\epsilon, \|\mathcal{Y}\|_{\mathcal{Y}})} \right]} \sqrt{E\left[ \left( \mathcal{L}(S^C(\mathrm{X})) + \|S^C(\mathrm{X}) \cap \mathcal{Y}\|_{\mathcal{Y}} \right)^2 \right]} \le C \sqrt{E\left[ \left( \mathcal{L}(S^C(\mathrm{X})) + \|S^C(\mathrm{X}) \cap \mathcal{Y}\|_{\mathcal{Y}} \right)^2 \right]}
\]
\[
\le C \sqrt{ \left( \mathcal{L}(D) + \|\mathcal{Y}\|_{\mathcal{Y}} \right) \left( E\left[ \mathcal{L}(S^C(\mathrm{X})) \right] + E\left[ \|S^C(\mathrm{X}) \cap \mathcal{Y}\|_{\mathcal{Y}} \right] \right) },
\]
and as $N \to \infty$ this also goes to zero by Lemma B.2. $\Box$

Proof of Theorem A.4.
Denote the posterior measure for the fully discretized model by $\tilde{\mu}^N$. The total variation distance between the posterior measures can be bounded as
\[
d_{TV}(\mu, \tilde{\mu}^N) \le d_{TV}(\mu, \mu^N) + d_{TV}(\mu^N, \tilde{\mu}^N),
\]
where the first term goes to zero by Theorem A.3. Clearly, $\tilde{\mu}^N$ as given in (7) defines a posterior measure by the same arguments as in the proof of Proposition A.2, and it coincides with $\mu^N$ on the span of $\{e_j\}_{j > p+1}$. We can therefore bound $d_{TV}(\mu^N, \tilde{\mu}^N)$ using the same method as in the proof of Theorem A.3, which gives $d_{\mathrm{Hell}}(\mu^N, \tilde{\mu}^N)^2 \le I_1 + I_2$, where now
\[
I_1 = \frac{1}{C_{\mu^N}} \int \left| e^{-\Phi^N(\mathrm{X})/2} - e^{-\Phi^N(P_p \mathrm{X})/2} \right|^2 \mathrm{d}\nu(\mathrm{X}), \qquad
I_2 = \left| \frac{1}{\sqrt{C_{\mu^N}}} - \frac{1}{\sqrt{C_{\tilde{\mu}^N}}} \right|^2 \int e^{-\Phi^N(P_p \mathrm{X})} \mathrm{d}\nu(\mathrm{X}).
\]
We can again bound $I_2$ by $C I_1$, so what remains to be shown is that $I_1$ goes to zero as $p \to \infty$. Let $\mathrm{X} \in H^s$, $s > d/2$. Since $P_p$ is a projection, we clearly have $\|P_p \mathrm{X}\|_s \le \|\mathrm{X}\|_s$. By Lemma B.1(i) and Hölder's inequality,
\[
I_1 \le \frac{C}{C_{\mu^N}} \int e^{\epsilon \|\mathrm{X}\|_s^2 - M(\epsilon, \|\mathcal{Y}\|_{\mathcal{Y}})} \left| \Phi^N(\mathrm{X}) - \Phi^N(P_p \mathrm{X}) \right| \mathrm{d}\nu(\mathrm{X}) \le C \sqrt{\int e^{2\epsilon \|\mathrm{X}\|_s^2 - 2M(\epsilon, \|\mathcal{Y}\|_{\mathcal{Y}})} \mathrm{d}\nu(\mathrm{X})} \; E\left[ \left| \Phi^N(\mathrm{X}) - \Phi^N(P_p \mathrm{X}) \right|^2 \right]^{1/2},
\]
where the first factor is finite by Fernique's theorem. We will now focus on bounding the expectation above. Using the mean value theorem,
\[
\left| \Phi^N(\mathrm{X}) - \Phi^N(P_p \mathrm{X}) \right| = \left| \sum_{i=1}^N |D_i| \left( e^{G(\mathrm{X})(s_i)} - e^{G(P_p \mathrm{X})(s_i)} \right) - Y_i \left( G(P_p \mathrm{X})(s_i) - G(\mathrm{X})(s_i) \right) \right| \le e^{\epsilon \|\mathrm{X}\|_s^2 - M(\epsilon, \|\mathcal{Y}\|_{\mathcal{Y}})} \sum_{i=1}^N (|D_i| + Y_i) \left| G(P_p \mathrm{X})(s_i) - G(\mathrm{X})(s_i) \right|.
\]
Using the inequalities
\[
\left| G(P_p \mathrm{X})(s) - G(\mathrm{X})(s) \right| \le \sum_{k=1}^K \left| \pi_k(\mathrm{X})(s) X_k(s) - \pi_k(P_p \mathrm{X})(s) P_p X_k(s) \right| \le \sum_{k=1}^K \left( \left| X_k(s) - P_p X_k(s) \right| + C \|\mathrm{X}\|_s \left| \pi_k(\mathrm{X})(s) - \pi_k(P_p \mathrm{X})(s) \right| \right),
\]
and absorbing the exponential weights into the constant by Hölder's inequality and Fernique's theorem, we obtain
\[
E\left[ \left| \Phi^N(\mathrm{X}) - \Phi^N(P_p \mathrm{X}) \right|^2 \right]^{1/2} \le C\, E\left[ \sum_{i \in I_N} (|D_i| + Y_i) \sum_{k=1}^K \left| X_k(s_i) - P_p X_k(s_i) \right| \right] + C\, E\left[ \sum_{i \in I_N} (|D_i| + Y_i) \sum_{k=1}^K \left| \pi_k(\mathrm{X})(s_i) - \pi_k(P_p \mathrm{X})(s_i) \right| \right]. \tag{9}
\]
Note that $|D_i| \propto N^{-1}$ and that $\mathrm{X}(s)$ is bounded for each $s \in D$ almost surely. Let $Q_p \mathrm{X} = \mathrm{X} - P_p \mathrm{X}$ and note that $Q_p \mathrm{X}(s)$, for each $s \in D$, is a mean-zero Gaussian variable with a variance $\sigma_p^2$ that goes to zero as $p \to \infty$. Thus, the first term in (9) clearly goes to zero as $p \to \infty$. Since $|\pi_k(\mathrm{X})(s_i) - \pi_k(P_p \mathrm{X})(s_i)|$ is bounded by one, the second term in (9) can be bounded by
\[
C \sum_{i=1}^N (|D_i| + Y_i) E\left[ \sum_{k=1}^K \left| \pi_k(\mathrm{X})(s_i) - \pi_k(P_p \mathrm{X})(s_i) \right| \right].
\]
Here the expectation can be bounded as
\[
E\left[ \sum_{k=1}^K \left| \pi_k(\mathrm{X})(s_i) - \pi_k(P_p \mathrm{X})(s_i) \right| \right] \le K \max_k \left\{ P\left( \mathrm{X}(s_i) \le c_k \cap P_p \mathrm{X}(s_i) > c_k \right) + P\left( \mathrm{X}(s_i) > c_k \cap P_p \mathrm{X}(s_i) \le c_k \right) \right\}.
\]
We now show how to bound the first probability; the second probability is bounded by similar calculations. Define the events $A = \{\mathrm{X}(s_i) \le c_k \cap P_p \mathrm{X}(s_i) > c_k\}$ and $B = \{P_p \mathrm{X}(s_i) \in [c_k, c_k + \epsilon]\}$. It follows that
\[
P(A) = P(A \mid B) P(B) + P(A \mid B^C) P(B^C) \le P(B) + P(A \mid B^C) \le P\left( P_p \mathrm{X}(s_i) \in [c_k, c_k + \epsilon] \right) + P\left( Q_p \mathrm{X}(s_i) \le -\epsilon \right).
\]
Now set $\epsilon = \sqrt{\sigma_p}$ and recall that $P(Z > t) < \frac{1}{\sqrt{2\pi}\, t} e^{-t^2/2}$ if $Z \sim \mathsf{N}(0, 1)$. This gives
\[
P(A) \le P\left( c_k < P_p \mathrm{X}(s_i) \le c_k + \sqrt{\sigma_p} \right) + P\left( Q_p \mathrm{X}(s_i) \le -\sqrt{\sigma_p} \right) \le C \sqrt{\sigma_p} + \frac{\sqrt{\sigma_p}}{\sqrt{2\pi}} e^{-1/(2\sigma_p)},
\]
which goes to zero as $p \to \infty$, and thus so does the final expectation in (9). $\Box$
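Two elementary facts used in the final display can be checked numerically: the standard Gaussian tail bound $P(Z > t) < e^{-t^2/2}/(\sqrt{2\pi}\, t)$, and the convergence to zero of the resulting bound $C\sqrt{\sigma_p} + \frac{\sqrt{\sigma_p}}{\sqrt{2\pi}} e^{-1/(2\sigma_p)}$ as $\sigma_p \to 0$. A minimal sketch (the trapezoidal quadrature, the truncation point of the tail integral, and the choice $C = 1$ are illustrative assumptions, not part of the proof):

```python
import math

def std_normal_tail(t, n=100000, upper=12.0):
    # P(Z > t) for Z ~ N(0,1), via trapezoidal integration of the density
    # on [t, upper]; the mass beyond `upper` is negligible at this scale.
    h = (upper - t) / n
    phi = lambda x: math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
    return h * (0.5 * phi(t) + sum(phi(t + i * h) for i in range(1, n)) + 0.5 * phi(upper))

# The tail (Mills-ratio) bound used in the proof: P(Z > t) < exp(-t^2/2) / (sqrt(2*pi)*t).
for t in (0.5, 1.0, 2.0, 4.0):
    assert std_normal_tail(t) < math.exp(-t * t / 2.0) / (math.sqrt(2.0 * math.pi) * t)

# The resulting bound on P(A) vanishes as the truncation variance sigma_p^2 -> 0 (C = 1 here).
bound = lambda s: math.sqrt(s) + math.sqrt(s) / math.sqrt(2.0 * math.pi) * math.exp(-1.0 / (2.0 * s))
values = [bound(s) for s in (1e-1, 1e-2, 1e-3)]
assert values[0] > values[1] > values[2]
```

The tail bound is fairly tight for large $t$ (at $t = 4$ it overestimates the true probability by only a few percent), which is why the second term in the bound on $P(A)$ decays so rapidly as $\sigma_p \to 0$.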