Level set Cox processes
Anders Hildeman, David Bolin, Jonas Wallin, and Janine B. Illian
Department of Mathematical Sciences, Chalmers University of Technology and University of Gothenburg, Sweden; Department of Statistics, Lund University, Sweden; School of Mathematics and Statistics, University of St Andrews, Scotland
Abstract
The log-Gaussian Cox process (LGCP) is a popular point process for modeling non-interacting spatial point patterns. This paper extends the LGCP model to handle data exhibiting fundamentally different behaviors in different subregions of the spatial domain. The aim of the analyst might be either to identify and classify these regions, to perform kriging, or to derive some properties of the parameters driving the random field in one or several of the subregions. The extension is based on replacing the latent Gaussian random field in the LGCP by a latent spatial mixture model. The mixture model is specified using a latent, categorically valued, random field induced by level set operations on a Gaussian random field. Conditional on the classification, the intensity surface for each class is modeled by a set of independent Gaussian random fields. This allows standard stationary covariance structures, such as the Matérn family, to be used to model Gaussian random fields with some degree of general smoothness but also occasional and structured sharp discontinuities. A computationally efficient MCMC method is proposed for Bayesian inference and we show consistency of finite dimensional approximations of the model. Finally, the model is fitted to point pattern data derived from a tropical rainforest on Barro Colorado Island, Panama. We show that the proposed model is able to capture behavior for which inference based on the standard LGCP is biased.
Cox processes, and in particular log-Gaussian Cox processes (LGCP), have been used extensively as flexible models of spatial point pattern data [37, 36, 29, 19]. These are hierarchical point process models where the point locations are assumed to be independent given a random intensity function

λ(s) = exp{B(s)β + X(s)},  (1)

where B(s) is a, possibly multivariate, function of covariates and X(s) is a Gaussian random field, which is typically assumed to be stationary. The random field captures spatial structure in the point pattern that the given covariates cannot capture. In this paper, we relax the assumption that a single stationary Gaussian field can account for those remaining spatial structures and develop a mixture model based on level set inversion.

To motivate the relevance of the approach we consider a point pattern data set formed by the locations of the tree species Beilschmiedia pendula, one of the species in the tropical rainforest plot on Barro Colorado Island [11, 13, 9, 26]. The point pattern comprises point locations in a rectangular observation window (1000 m × 500 m).

Figure 1: Spatial point pattern formed by the locations of trees of the species
Beilschmiedia pendula in a 1000 m × 500 m observation window, plotted using the R [1] package spatstat [5].

Previous analyses have fitted a log-Gaussian Cox process [36] to this and related data sets to draw conclusions on habitat preferences based on a number of spatial covariates reflecting local soil chemistry and topography [36, 29]. We initially fitted a log-Gaussian Cox process to this pattern, with an intensity function as in Equation (1), using these covariates; see Section 4. On close inspection, the pattern in Figure 1a shows large areas of very low point intensity where hardly any trees can be found. The estimated posterior mean using the LGCP model predicts large regions of low intensity, as plotted in Figure 1c. Anecdotal knowledge reveals that these regions are covered by a swamp, where the tree species is known to be very unlikely to grow, independent of local soil covariates and topography. However, data on the exact extent of the swamp is not available. When an LGCP model that ignores the presence of the swamp is fitted to this pattern, the swamp is likely to act as a confounding factor, and this is likely to impact inference. Hence, any conclusions on habitat preferences of the species will be heavily biased. Covariates associated with the presence of the swamp may appear to have a significant correlation with the intensity of the tree growth, or important covariates might appear non-significant as they vary independently of the presence of the swamp.

The approach we take here is designed to capture sharp discontinuities in the intensity surface that result from qualitative yet unavailable covariates or environmental conditions such as the one seen in this example. These effects cannot be captured by the classical Gaussian random field approach.
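To make the hierarchy in Equation (1) concrete, the following sketch simulates counts from a Cox process on a lattice: conditional on a log-intensity surface, cell counts are independent Poisson draws. This is illustrative only; the smooth sinusoidal surface and the intercept `beta0` are stand-ins for B(s)β + X(s), not the model fitted in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Lattice over a unit-square window. A smooth deterministic surface stands in
# for the Gaussian random field X(s); beta0 stands in for B(s)beta.
n = 64
xs = np.linspace(0.0, 1.0, n)
SX, SY = np.meshgrid(xs, xs)
field = np.sin(2 * np.pi * SX) * np.cos(2 * np.pi * SY)
beta0 = 3.0

# Conditional on the field, the count in each cell is Poisson with mean
# |cell| * exp{beta0 + X(s)} evaluated at the cell centre (lattice approx.).
cell_area = (1.0 / n) ** 2
lam = cell_area * np.exp(beta0 + field)
counts = rng.poisson(lam)
```

Fitting such a model then amounts to recovering the latent surface from `counts`.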
Further examples of data where such a model could be important are ecological data with several distinct types of habitat, spatial regions with different treatment regimes in medical data, or materials exhibiting separate regions of differing properties in materials science. Specifically, we consider a Cox process model where the intensity surface is modeled using a Bayesian level set approach. The proposed model is an extension of the log-Gaussian Cox process with increased flexibility resulting from a random segmentation of the spatial region into K classes. The intensity surfaces of the regions associated with the K different classes can be modeled separately of each other by latent log-Gaussian random fields with simple covariance structures, while still maintaining flexibility. We refer to the proposed model as the level set Cox process (LSCP).

Level set inversion [45, 8] is a class of geometric inverse problems where the main objective is to find interfaces between geometrical regions based on observed data. In this approach, the interfaces are modeled as level sets of an unknown level set function. Level set inversion has been used extensively for segmentation [10, 34, 46], for multiphase flow modeling [7, 18], and for statistical modeling of porous materials [38]. Higgs and Hoeting [22] modeled spatially correlated categorical data using a Bayesian level set approach, where the level set function was modeled as a Gaussian random field. This probabilistic approach, which Iglesias et al. [27] and Dunlop et al. [20] extended to more general inverse problems, has the advantage that the level sets can be estimated through the posterior distribution of the level set function given the observed data.

The LSCP is, like the LGCP, a continuous process. In order to use the model in practical inference, some finite dimensional approximations are required.
We show that the classical lattice approximation of the LSCP converges, in total variation distance, to the continuous model as the grid gets finer. Further, we propose a computationally efficient Markov chain Monte Carlo (MCMC) algorithm for Bayesian inference on the model parameters, based on preconditioned Crank–Nicolson Langevin proposals [15].

This paper is structured as follows. A detailed model description is given in Section 2. In Section 3, we derive the MCMC algorithm for the method. Section 4 analyses the Beilschmiedia pendula point pattern of rainforest trees with the new approach. Finally, Section 5 discusses the presented material and possible future extensions of it. The theoretical results and proofs are given in two appendices.
In this section, we first introduce the level set Cox process in Subsection 2.1. Some examples of the model are presented in Subsection 2.2, and basic properties of the model are presented in Subsection 2.3. Finally, Subsection 2.4 introduces finite dimensional approximations of the model necessary for inference.
Let D ⊂ R^d be a bounded domain. The Bayesian level set inversion problem of Iglesias et al. [27] corresponds to reconstructing a latent field of the form

X(s) = Σ_{k=1}^{K} X_k I(s ∈ D_k),  (2)

given noisy data. Here D_k ⊂ D is the spatial region associated with segmentation class k, and the X_k are fixed values. If the constants {X_k}_k are known, the partition {D_k}_{k=1}^{K} characterizes X. Iglesias et al. [27] defined D_k as an excursion set of an unknown random continuous level set function, X_0, as D_k = {s : c_{k−1} < X_0(s) ≤ c_k}. Here the c_k are constants such that −∞ = c_0 < c_1 < ... < c_K = ∞, and X_0 is assumed to be a realization of a Gaussian random field. Thus, this model corresponds to the level set problem for categorical data by Higgs and Hoeting [22]. The level set model using a latent Gaussian random field is not identifiable with regard to the parameter triplet of threshold values, mean, and marginal variance of the level set field, X_0. Hence, we define X_0 to have standard normal marginal distributions in order to make the model identifiable.

We extend the level set function of (2) by replacing the fixed constants X_k by Gaussian random fields, denoted X_k(s) + µ_k(s), where µ_k is a deterministic mean function and X_k is a centered Gaussian random field:

X(s) = Σ_{k=1}^{K} (X_k(s) + µ_k(s)) I(c_{k−1} < X_0(s) + µ_0(s) ≤ c_k).  (3)

This can be regarded as a mixture model of Gaussian fields related to the non-stationary geostatistical model proposed by Fuentes [21]. We use this model to specify a statistical model for spatial point process data through a Cox process [19], modeling the number of occurrences of some event in a subregion E ⊆ D as an inhomogeneous Poisson process conditional on a realization of X, i.e.
Y(E) ∼ Pois(∫_E λ(s) ds),

where the intensity surface is λ(s) = exp{X(s)}.

A common usage of point process models is to study the effect of covariates on observed point patterns. A simple way of doing this is through a standard Poisson regression, where the log-intensity of the point process is of the form log λ(s) = B(s)β, where B(s) are the covariates of interest. This can easily be incorporated into the LSCP model by letting µ_k(s) = B(s)β_k or µ_k(s) = µ(s) = B(s)β. Poisson regression and log-Gaussian Cox processes are special cases of the LSCP model.

For an illustration of the flexibility of the model, Figure 2 shows the log intensity for four special cases simulated on the unit square. In this figure, all Gaussian random fields are assumed to have constant means µ and Matérn covariance functions [35],

C(X_k(s_1), X_k(s_2)) = C(h) = σ² (2^{1−ν} / Γ(ν)) (κh)^ν K_ν(κh),

where h = ‖s_1 − s_2‖, σ² = Var(X_k(s)), κ = √(8ν)/r, and ν is a smoothness parameter. Further, r is the correlation range, approximately corresponding to the value of h where the correlation is 0.1, K_ν is a modified Bessel function of the second kind, and Γ is the Gamma function.

The patterns were generated using the same random seed such that the level set function is the same for all cases, yielding comparable results. A realization of log λ(s) using two classes can be seen in Panel (a). The log intensity surface of the first class has µ = 2, whereas the second class has µ = 0; the two classes have different correlation ranges. Both fields have σ = ν = 1. The level set field, X_0, has a threshold value at the origin, c_1 = 0. In the figure, the regions belonging to the two classes, and the difference in spatial correlation range, are clearly visible.

A simplification of the model is obtained by assuming that the intensity for one of the two classes is constant (change X_2(s) to a constant X_2, for instance).
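The classification step in Equation (3) is easy to sketch numerically: each site is assigned the class k whose interval (c_{k−1}, c_k] contains the level set field, and the log intensity is read off the corresponding class field. All numerical values below are toy assumptions, and one spatial dimension is used for brevity.

```python
import numpy as np

rng = np.random.default_rng(1)

# Thresholds -inf = c_0 < c_1 < ... < c_K = inf for K = 3 classes (assumed).
c = np.array([-np.inf, -0.5, 0.5, np.inf])
K = len(c) - 1

n = 50                                   # lattice sites (1-d for simplicity)
X0 = rng.standard_normal(n)              # stand-in for the level set field
Xk = 0.3 * rng.standard_normal((K, n))   # stand-in class fields X_k
mu = np.array([2.0, 0.0, -1.0])          # class means mu_k

# searchsorted with side="left" implements the half-open intervals
# (c_{k-1}, c_k]: a value equal to c_k is still assigned to class k.
klass = np.searchsorted(c, X0, side="left") - 1   # 0-indexed class labels
loglam = Xk[klass, np.arange(n)] + mu[klass]
```

Replacing `Xk` by constants recovers the original level set model (2), and replacing it by B(s)β recovers a segmented Poisson regression.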
A realization of such a log intensity surface can be seen in Panel (b). This model might be relevant in applications where some unknown factor makes it unlikely to observe points in certain subregions, and it may be regarded as spatially varying zero-inflation [31]. If a standard LGCP is fitted to data of this type, some overdispersion remains unexplained and the estimated mean field and covariance parameters will be biased; this is not the case for the LSCP model. We discuss an example of this in Section 4. The two-class model can of course be simplified further by assuming a constant intensity for both classes, and log λ(s) is then of the form (2). A realization of this simplified model is shown in Panel (c).

The last model example uses the structure of the level set formulation to capture effects on the boundary between two regions. For a model with three classes, the second class takes on the role of an interface layer between the first and third class, as can be seen in Panel (d). The log intensity is in this case log λ(s) = π_1(s) X_1 + π_2(s) X_2(s) + π_3(s) X_3, where π_k(s) = I(s ∈ D_k). This can be used to model effects present on the boundary between two regions. Examples of potential applications are activity on shore lines between water and land, or mixing regions between fluids.

Figure 2: Realization of the log intensity surface, log λ(s), for the four models presented in Section 2.2. Panel (a) corresponds to the model with two random classes, Panel (b) to one constant and one random class, Panel (c) to two constant classes, and Panel (d) to the model with two constant classes and a third random boundary class.

The intensity measure
Λ = {Λ(E) = ∫_E λ(s) ds ; E ⊆ D} for a Cox process is well-defined if λ is almost surely finite and integrable. The LSCP model with K = 1 reduces to the standard LGCP model, which has a well-defined random intensity measure if realizations of the Gaussian field are identified with its continuous modification [37]. For K > 1, a continuous modification does not need to exist, but almost sure integrability follows if X_0 is a.s. continuous, which ensures that the sets {s : c_{k−1} < X_0(s) ≤ c_k} are a.s. Lebesgue measurable for all k ∈ {1, ..., K}. Hence the LSCP model is well-defined when the realizations of all Gaussian fields are identified with their continuous modifications with respect to the Lebesgue measure. By the same argument as in Theorem 3 of Møller et al. [37], ergodicity of the LSCP model follows from ergodicity of log λ. Thus, the LSCP model is ergodic if all latent Gaussian fields are ergodic.

The following proposition gives semi-explicit formulas for the first two product densities.

Proposition 2.1.
For a level set Cox process with log intensity (3), where {X_k}_{k=1}^{K} are zero-mean stationary random fields with covariance functions r_k, the first moment of the intensity function equals

ρ(s) = E[λ(s)] = Σ_{k=1}^{K} exp(µ_k(s) + r_k(0)/2) (Φ(c_k − µ_0(s)) − Φ(c_{k−1} − µ_0(s))),

where Φ is the CDF of a standard normal distribution. Further, the second moment of λ, ρ⁽²⁾(s_1, s_2), corresponding to the second order product density, equals

ρ⁽²⁾(s_1, s_2) = Σ_{k=1}^{K} exp(µ_k(s_1) + µ_k(s_2) + r_k(0) + r_k(|s_1 − s_2|)) p_kk + Σ_{k=1}^{K} Σ_{l≠k} p_lk exp(µ_l(s_1) + µ_k(s_2) + (r_k(0) + r_l(0))/2).

Here

p_lk = ∫_{c_{k−1}}^{c_k} (Φ((c_l − µ*(u))/σ*(u)) − Φ((c_{l−1} − µ*(u))/σ*(u))) (1/√(2π)) e^{−(u − µ_0(s_2))²/2} du,

where µ*(u) = µ_0(s_1) + (r_0(|s_1 − s_2|)/r_0(0))(u − µ_0(s_2)) and σ*(u) = √(r_0(0) − r_0(|s_1 − s_2|)²/r_0(0)).

The proof is given in Appendix B. The form of the pair-correlation function for the LSCP model is given by g(s_1, s_2) = ρ⁽²⁾(s_1, s_2)/(ρ(s_1)ρ(s_2)), and can hence be expressed using the first and second product densities given in Proposition 2.1. The integral in p_lk has to be evaluated numerically.

If g is translation invariant we can compute the inhomogeneous K-function [6] of the process as K(r) = ∫_{B(0,r)} g(h) dh, where B(0, r) is a ball with radius r centered at the origin and g(s) = g(0, s). In the case of a homogeneous intensity, λK(r) gives the expected number of further points within a radius r of a typical point of the process. Finally, the inhomogeneous empty space function [6, 16], F(r), for the general model is given by Proposition 2.2.

Proposition 2.2.
For a level set Cox process with log intensity (3), where the X_k are zero-mean stationary random fields with covariance functions r_k, the inhomogeneous empty space function is given by

F(s, r) = 1 − E[ Π_{k=1}^{K} exp(−∫_{D_k ∩ B(s,r)} e^{µ_k(u)} e^{X_k(u)} du) ],

where, for a given realization of X_0, D_k is the region classified as k. The proof is given in Appendix B.
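The first-moment formula of Proposition 2.1 can be checked at a single location by Monte Carlo: with standard normal marginals for the level set field, class k occurs with probability Φ(c_k − µ_0) − Φ(c_{k−1} − µ_0) and contributes the log-normal mean exp(µ_k + r_k(0)/2). All parameter values below are arbitrary toy choices.

```python
import math
import random

random.seed(0)
Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

c = [-math.inf, -0.3, math.inf]   # thresholds for two classes (assumed)
mu0 = 0.2                         # mean of the level set field at s
mu = [1.0, -0.5]                  # class means mu_k(s)
var = [0.4, 0.9]                  # marginal variances r_k(0)

# Closed-form first moment rho(s) = E[lambda(s)].
rho = sum(math.exp(mu[k] + var[k] / 2.0)
          * (Phi(c[k + 1] - mu0) - Phi(c[k] - mu0)) for k in range(2))

# Monte Carlo: draw the level set field, classify, draw the class field.
N = 200_000
acc = 0.0
for _ in range(N):
    x0 = random.gauss(0.0, 1.0) + mu0
    k = 0 if x0 <= c[1] else 1
    acc += math.exp(random.gauss(0.0, math.sqrt(var[k])) + mu[k])
mc = acc / N
```

The two estimates agree to within Monte Carlo error, illustrating that the mixture structure simply averages log-normal means over the class probabilities.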
As for standard LGCP models, some finite dimensional approximation of the LSCP model is needed if it is to be used for inference. The discretization we will use is a classical lattice approximation. The observational domain is discretized into subregions D_j of a regular lattice over the domain, and the point locations are replaced by counts Y_j of the number of observations within each subregion D_j. This yields the discretized model Y_j ∼ Pois(λ_j), where λ_j = ∫_{D_j} λ(s) ds, and the information on the fine-scale behavior of the point pattern is lost. The stochastic integral in the definition of λ_j is not Gaussian and generally difficult to handle. Therefore, a common approximation is to use λ_j ≈ |D_j| λ(s_j), for some location (usually the center) s_j ∈ D_j [37]. In Appendix A, we show consistency of this finite dimensional approximation of the likelihood for the LSCP model. More precisely, we show that the posterior distribution for the latent fields {X_k}_k computed using the lattice approximation converges, in total variation distance, to the posterior distribution of the continuous process.

For any fixed lattice approximation there is a positive probability that the level set field takes values in several of the intervals (c_k, c_{k+1}] within any fixed lattice cell. Since the spatial information about the level set field on a finer scale than the lattice discretization is lost, we propose adding a "nugget" effect, ξ_j, for each lattice cell D_j. The "nugget" effect models the within-cell classification uncertainty. This gives the discretization λ_j ≈ |D_j| λ̃(s_j), where

log λ̃(s_j) = Σ_{k=1}^{K} I(c_{k−1} < X_0(s_j) + µ_0(s_j) + ξ_j ≤ c_k) (X_k(s_j) + µ_k(s_j)),

and ξ_j ∼ N(0, σ_ξ²). The nugget variance, σ_ξ², controls the amount of mixing between the classes for a given realization of X_0. This classification mechanism is equivalent to the ordered probit model discussed in Dunlop et al. [20].
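The within-cell classification with the nugget is exactly an ordered probit: given the (mean-adjusted) level set value at a cell centre, the class probabilities are differences of normal CDFs. A minimal sketch with assumed toy values:

```python
import math

Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def class_probs(x0, c, sigma_xi):
    """P(cell classified as k): probability that x0 + xi falls in
    (c_{k-1}, c_k] when xi ~ N(0, sigma_xi^2)."""
    return [Phi((c[k + 1] - x0) / sigma_xi) - Phi((c[k] - x0) / sigma_xi)
            for k in range(len(c) - 1)]

# x0 plays the role of X_0(s_j) + mu_0(s_j); thresholds and nugget assumed.
p = class_probs(0.1, [-math.inf, 0.0, math.inf], sigma_xi=0.5)
```

As σ_ξ → 0 the probabilities collapse to a hard 0/1 classification, recovering the noiseless level set assignment.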
In practice, it is typically difficult to objectively discern an appropriate value for σ_ξ², and we therefore let σ_ξ² be a regular parameter to be estimated for a fixed discretization.

Inference
It is common to fit LGCP models in a Bayesian setting. A popular approach is through Markov chain Monte Carlo (MCMC) methodology, for instance using the Metropolis adjusted Langevin algorithm (MALA) [42], which was suggested by Møller et al. [37]. Another approach is through integrated nested Laplace approximation (INLA) [29, 44, 48], which when applicable can have beneficial computational properties. In this work we use a Bayesian MCMC approach for estimating the model parameters of the LSCP model. Specifically, we propose a method based on the preconditioned Crank–Nicolson (pCN) MALA MCMC method of Cotter et al. [15]. An important property of pCN MALA is that the optimal step length is invariant to mesh refinement, which regular MALA does not have. However, for the LSCP model the main advantage is that it can be combined with efficient simulation methods based on the fast Fourier transform [32] to decrease the computational cost; we provide more details on this below.

Denote the parameters associated with class k as θ_k. For the level set field, X_0, we also include the nugget variance, σ_ξ², and the thresholds, {c_k}_k, in θ_0. By introducing an auxiliary field Z defined such that

P(Z(s_j) = k) = Φ((c_k − X_0(s_j) − µ_0(s_j))/σ_ξ) − Φ((c_{k−1} − X_0(s_j) − µ_0(s_j))/σ_ξ),

we have

log λ̃(s) =_d Σ_{k=1}^{K} I(Z(s) = k) (X_k(s) + µ_k(s)).

This means that the parameters and latent fields of different classes, {X_k, θ_k}, are conditionally independent given Z. We use this to construct a Metropolis-within-Gibbs algorithm [41] to sample from the joint posterior. In the i-th iteration of the algorithm, the following three steps are performed:

1. Sample from Z | {X_k, θ_k}_k, Y. The sampling can be performed exactly since Z(s_i) ⊥ Z(s_j), ∀ i ≠ j, given {X_k, θ_k}_k, Y, and P(Z(s_i) = k) is known up to a normalizing constant.

2. Sample from θ_k | Z, X_k using the MALA random walk sampler.
Since the parameters of different classes are conditionally independent, the sampling can be performed separately, and in parallel, for each θ_k.

3. Sample from X_k | Z, θ_k, Y using the pCN MALA algorithm of Cotter et al. [15]. Also in this step, the updates for different k can be done in parallel since the different Gaussian fields are conditionally independent.

The computational bottleneck of the algorithm is the third step, where the latent Gaussian fields are sampled. If the model is discretized into a lattice with N grid cells, the sampling of the Gaussian fields in the third step generally requires O(KN³) operations. An approach to remedy this would be to use a Gaussian Markov random field approximation of the problem. This idea has been studied by [33, 43, 48], revealing computationally attractive properties on arbitrary domains. An adaptation of the method by Simpson et al. [48] to the LSCP model would reduce the computational cost to O(KN^{3/2}). We can reduce this cost further by using the fact that proposals in the pCN MALA algorithm are drawn from the prior distribution of the fields.

If we restrict ourselves to square domains and assume that the fields have stationary and isotropic covariance functions with known spectral density, we can represent {X_k}_{k=0}^{K} using Fourier series expansions. By truncating these series, the fast Fourier transform can be used to sample the fields on a regular lattice over the region. This means that the proposals can be generated with O(KN log(N)) computational complexity. Working in the spectral domain also allows for efficient computation of all gradients and acceptance probabilities needed, making the spectral approach and the pCN MALA method in combination very favorable. In Appendix A we justify this truncation theoretically by showing that convergence of the lattice approximation still holds given certain bounds on the spectral densities.
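A sketch of the spectral trick behind step 3: on a regular grid, a draw from a stationary Gaussian prior can be generated by filtering white noise with the square root of a (discretized) spectral density via the FFT, and a pCN-type proposal blends the current state with such a fresh prior draw. The Gaussian-shaped spectrum, the length scale, and the step size β are illustrative assumptions, not the paper's exact choices, and the Langevin drift of pCN MALA is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 64                                    # n x n grid on the unit square (toy)

# Integer wavenumbers and an assumed Gaussian-shaped spectral density.
k = np.fft.fftfreq(n) * n
KX, KY = np.meshgrid(k, k)
rho = 0.1                                 # correlation length (assumed)
spec = np.exp(-0.5 * (2.0 * np.pi) ** 2 * (KX ** 2 + KY ** 2) * rho ** 2)
spec /= spec.sum()                        # normalize to unit marginal variance

def prior_draw():
    # Filter white noise by sqrt(spectrum) in the Fourier domain: O(N log N).
    w = rng.standard_normal((n, n))
    return np.fft.ifft2(np.sqrt(spec) * np.fft.fft2(w)).real * n

x = prior_draw()                          # current state of one latent field
beta = 0.2                                # pCN step size (assumed)
x_prop = np.sqrt(1.0 - beta ** 2) * x + beta * prior_draw()  # pCN proposal
```

Because the proposal reuses prior draws, its cost is O(N log N) per field, matching the complexity quoted above, and it leaves the Gaussian prior invariant since (1 − β²) + β² = 1.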
To further illustrate our approach, we return to the tropical rainforest data example from Section 1 to compare the effect of considering LSCP models to a simple Poisson regression model as well as to the LGCP model.
The dataset consists of locations of trees of the species
Beilschmiedia pendula in a 50 ha rectangular study plot (1000 × 500 meters) on the island of Barro Colorado in Panama; see Figure 1a. The data were acquired from the first census of a major ongoing ecological study that started in the 1980s, designed to understand the mechanisms maintaining species richness, consisting of the observed positions of a large number of tree species ([26, 25, 12]). The study deliberately considers a spatially mapped rainforest community, arguing that population and community dynamics occur in a spatial context [24]. In addition to the spatial pattern formed by the tree locations, measurements of topographical variables and soil nutrients that potentially influence the spatial distribution of the trees are available [30, 47, 17], with the aim of linking spatial patterns to spatial environmental variations, reflected by observed topography and soil nutrients. In the statistical literature some of the point patterns derived from the study have been considered, for example in [36, 48, 29, 40], and the Beilschmiedia pendula data are available in the spatstat package [5] for the R project [1].

Elevation was measured on a 5 × 5 meter grid, and based on this an approximation of the slope at each of these grid points was calculated using a Sobel filter [50]. Soil samples were taken at 300 locations, for which the amounts of 12 soil constituents (Al, B, Ca, Cu, Fe, K, Mg, Mn, N, Nmin, P, Zn) as well as the pH level were measured; these were interpolated to yield spatially continuous covariates. Since the covariates derived from the soil samples and elevation were not sampled at the resolution of the analysis, they had to be interpolated to a common lattice. In this example, the model was discretized into a regular lattice of subregions over the observational window. The number of observed points in each subregion is shown as a two dimensional histogram in Figure 1b.
The spatial interpolation of the covariates to this lattice was performed using bi-cubic splines with the function interp2 in Matlab (R2016a); Figure 3 shows the standardized covariates.

To avoid problems with multicollinearity among the covariates, we chose to discard the covariates corresponding to high variance inflation factors (VIF) [39]. The covariates were discarded iteratively by first computing the VIF for all the covariates, removing the covariate corresponding to the highest VIF value if it exceeded 5, and then starting over on the reduced set of covariates. The algorithm was stopped when none of the VIFs exceeded 5. By this procedure, the covariates B, Ca, K, and Zn were discarded, leaving 11 covariates for further analysis.

Figure 3: The standardized covariate values on the observational domain.

As discussed above, it is obvious from Figure 1a that there is a large area in the middle of the plot where hardly any trees are growing. This indicates that in some parts of the plot, spatial aggregation varies more rapidly than in other parts. It is likely that some inhibitory factor prevents the trees from growing in that region. As mentioned earlier, we have anecdotal evidence that this area is covered by a swamp and that the tree species is known to be very unlikely to grow there. We test four different models to see how this confounding factor affects inference.

The first is a simple Poisson regression model on the covariates, i.e. an inhomogeneous Poisson process with linear fixed effects defining the log intensity as B(s)β. We will refer to this model as the Fixed model. The second model includes a Gaussian field to capture the variability not explained by the covariates. More precisely, we use an LGCP model with log-intensity log λ(s) = X(s; B).
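The VIF pruning described above can be sketched as follows: each covariate is regressed on all the others, VIF_j = 1/(1 − R_j²), and the worst offender is dropped while its VIF exceeds 5. The synthetic, nearly collinear design below is purely illustrative, not the BCI covariates.

```python
import numpy as np

def vif_prune(X, names, thresh=5.0):
    """Iteratively drop the column with the largest variance inflation
    factor until all VIFs are at most `thresh`."""
    X = X.copy()
    names = list(names)
    while X.shape[1] > 1:
        vifs = []
        for j in range(X.shape[1]):
            y = X[:, j]
            # Regress column j on an intercept plus the remaining columns.
            Z = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
            coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
            r2 = 1.0 - (y - Z @ coef).var() / y.var()
            vifs.append(1.0 / max(1.0 - r2, 1e-12))
        worst = int(np.argmax(vifs))
        if vifs[worst] <= thresh:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return X, names

rng = np.random.default_rng(6)
a = rng.standard_normal(300)
b = a + 0.01 * rng.standard_normal(300)   # nearly collinear with a
z = rng.standard_normal(300)              # independent covariate
X, kept = vif_prune(np.column_stack([a, b, z]), ["a", "b", "z"])
```

One of the two collinear columns is removed while the independent one survives, mirroring how B, Ca, K, and Zn were discarded in the analysis.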
Here X(s; B) is a Gaussian field with E[X(s; B)] = B(s)β and a Matérn covariance with standard deviation σ and range r.

Looking at the data, we might expect the LGCP model to explain the variation in point intensity well, except for the complete lack of observations in the central region, coupled with the discontinuity in the observed intensity at the border between the large empty area and the other parts of the plot. If the habitat dependence of the trees is significantly different in these two separated regions, an LSCP model with a separate class for each of the two regions might provide a better fit. Therefore, the third model is a two-class LSCP model where the first class is defined as in the LGCP model and the second class has constant intensity. That is, log λ(s) = π_1(s) X(s; B) + π_2(s) C. We will refer to this as the LSCP model. We fixed the parameter C to a small value proportional to the mean intensity among the grid cells with at most one tree. Finally, we consider a simplified version of this model where log λ(s) = π_1(s) B(s)β + π_2(s) C. This will be referred to as the FixedM model.

The posterior distributions of the parameters and latent fields were estimated using the proposed MCMC method.
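The Matérn covariance used for these fields, in the κ = √(8ν)/r parameterization from Section 2.2 (correlation approximately 0.1 at distance r), can be written as a short function. This is a sketch of the standard formula using SciPy's modified Bessel function `scipy.special.kv`.

```python
import numpy as np
from scipy.special import gamma, kv

def matern(h, sigma=1.0, r=0.2, nu=1.0):
    """C(h) = sigma^2 * 2^(1-nu)/Gamma(nu) * (kappa h)^nu * K_nu(kappa h),
    with kappa = sqrt(8 nu)/r, so r is the approximate correlation range."""
    h = np.atleast_1d(np.asarray(h, dtype=float))
    kappa = np.sqrt(8.0 * nu) / r
    out = np.full(h.shape, sigma ** 2)          # C(0) = sigma^2
    nz = h > 0
    hk = kappa * h[nz]
    out[nz] = sigma ** 2 * 2.0 ** (1.0 - nu) / gamma(nu) * hk ** nu * kv(nu, hk)
    return out
```

For ν = 1, the value fixed in the analysis, the correlation at h = r evaluates to about 0.14, consistent with the "approximately 0.1" convention for the range parameter.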
In order to avoid significant wrap-around effects, the lattice was extended beyond the observational window, both for the level set field and for the latent Gaussian fields of the classes (implicitly assuming correlation ranges smaller than the corresponding extensions, for the classification as well as within classes). The smoothness parameters of the Gaussian fields were fixed at ν = 1, and the following independent prior distributions for the model parameters (when applicable) were used: i) Gaussian N(0, ·) priors for the fixed effects; ii) Gaussian N(0, ·) priors for the threshold parameters; iii) an exponential distribution with mean 2, Exp(2), for the standard deviations of the Gaussian fields, except for X_0, where σ = 1 is fixed; iv) Exp(200) distributions, truncated from below at the lattice distance and from above at the lattice extension range, for the range parameters r_k; this ensures that no wrap-around artifacts are introduced and that the correlation ranges are not smaller than the discretization distance; v) a truncated exponential distribution for the nugget standard deviation, which ensures that the nugget variance does not dominate the spatial dependency in the level set field.

The standard deviations of the Gaussian fields were given exponential priors since this corresponds to the PC prior [51, 49], which penalizes deviations from the simpler model without the Gaussian field. The range parameters were given exponential priors using similar reasoning, where no spatial dependency corresponds to the base model. However, the range priors were truncated below the lattice distance since no information exists for smaller values due to the spatial discretization. The covariates were standardized to mean 0 and variance 1. Hence, the fixed effects prior yields a penalisation towards the base model of no fixed effects.
The nugget for the level set field was considered as a deviation from the base model (without a nugget) and hence penalised by an exponential distribution.

To assess the model fit we used a common approach for point process models [36, 28, 4] that compares summary characteristics estimated from the observed point pattern with envelopes based on the summary characteristic estimated from simulated point patterns, generated from each of the four fitted models.

As a functional summary characteristic we used the centered and variance stabilized K-function, commonly referred to as the centered L-function, L̂(r) = √(K̂(r)/π) − r. The L-function stabilizes the variance such that L̂(r) is homoscedastic with respect to r. Furthermore, the −r term centers the function in the sense that the resulting function for a homogeneous Poisson process has the value zero for all distances [28]. We used isotropic edge correction and calculated the envelopes using the functions Kest and envelope from the spatstat package [5]. The L-function for the observed point pattern, as well as the pointwise sample mean and envelopes from realizations of each of the four models, can be seen in the top row of Figure 4.

Not surprisingly, the Fixed model seems clearly inappropriate, as the functional summary characteristic for the observed point pattern is far outside the envelopes for all distances. For all other models the estimated function for the observed point pattern remains inside the envelopes. The estimated function for the observed pattern and the expected values of the simulated patterns are most similar for the LSCP model. For the standard LGCP model the empirical pattern appears to show less clustering than the one expected from the model. This can be seen from the low values of the black line relative to the red line for a large range of r, and from the function being only just within the envelope at smaller distances.
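For intuition, the centered L-function can be estimated naively, without the isotropic edge correction that spatstat's `Kest` applies: count ordered pairs within distance r, normalize to K̂(r), and transform. For a roughly Poisson (here uniform) toy pattern the result should hover near zero.

```python
import numpy as np

rng = np.random.default_rng(5)

def centered_L(pts, r, area=1.0):
    """sqrt(Khat(r)/pi) - r with a naive (edge-uncorrected) Khat."""
    n = len(pts)
    lam = n / area
    d = np.sqrt(((pts[:, None, :] - pts[None, :, :]) ** 2).sum(-1))
    pair_d = d[d > 0]                          # drop self-distances
    K = np.array([(pair_d <= ri).sum() / (lam * n) for ri in r])
    return np.sqrt(K / np.pi) - r

pts = rng.uniform(0.0, 1.0, size=(400, 2))     # toy near-Poisson pattern
L = centered_L(pts, np.array([0.02, 0.05, 0.1]))
```

Positive values would indicate clustering relative to a Poisson process and negative values inhibition; the envelope comparison in Figure 4 formalizes this with simulation-based bands.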
Due to the cumulative nature of the L-function it is hard to discern at what inter-point distances this discrepancy occurs in the patterns. Interestingly, the differences between the two lines are less drastic for the FixedM model, and reversed: the simulated patterns seem to be less clustered.

Secondly, we computed the pair correlation function for the different models, as well as pointwise envelopes and expected values, in a similar fashion as for the L-function. The result is shown in the second row of Figure 4.
For the Fixed model, the function for the empirical pattern again deviates drastically from those for the simulated patterns. For the LGCP model the function for the empirical pattern is only just outside the envelopes and again below the mean function, indicating less clustering; this discrepancy vanishes at larger inter-point distances. For the FixedM model the deviations are mainly at short distances, where the empirical pattern shows a greater amount of clustering than what would be expected under the model.

Finally, we computed the empty space function and the corresponding pointwise envelopes and expected values. These are shown in the bottom row of Figure 4, and one can see that the Fixed model once again deviates drastically from the empirical pattern. Here the LSCP model shows lower values than the empirical pattern, while the FixedM model shows larger ones. The LGCP model shows an increasing overestimation of clustering, and the lower border of the envelope touches the empirical value at the far right of the range. Based on the comparison between the pointwise expected values and envelopes with the empirical values, the LSCP model seems to explain the observed point pattern better than the other models.

Figure 4: Plots of the L-function (top row), pair correlation function (middle row), and empty space function (bottom row) for the LSCP, LGCP, FixedM, and Fixed models. In each case, the black line corresponds to the value estimated from the real point pattern, the thin line corresponds to the model mean, and the envelopes are the pointwise envelopes of simulated point patterns for the corresponding model.
Figure 5: The mean (cross) and 95% credibility intervals (lines) for the field parameters.
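The crosses and interval lines in Figure 5 are standard posterior summaries computed from MCMC output. As a generic illustration (with hypothetical posterior draws, not the paper's samples):

```python
import numpy as np

# Hypothetical posterior draws for one field parameter (e.g. a correlation range r);
# in practice these come from the MCMC output.
rng = np.random.default_rng(0)
samples = rng.lognormal(mean=3.0, sigma=0.5, size=5000)

post_mean = samples.mean()                          # plotted as the cross
ci_lo, ci_hi = np.percentile(samples, [2.5, 97.5])  # equal-tailed 95% interval (the line)
print(post_mean, ci_lo, ci_hi)
```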
The models discussed here, relating a spatial pattern to spatially continuous covariates, may be of interest for a number of reasons. Commonly, one seeks to understand habitat preferences of a particular species as reflected in the relationship between the point pattern and the covariates. In addition, it might be of interest to understand the nature of the spatial structure that remains unexplained by the covariates. This might be gleaned from the parameters of the covariance function of the Gaussian random field(s).

To investigate the spatial structure, we first look at the mean values and credibility intervals for the random field parameters of the models, presented in Figure 5. The differences between the $r$ and $\sigma$ values of the LGCP and LSCP models show how the empty region affects the estimation of the spatial dependency structure. Here, the LSCP model shows a significantly lower variance and a clearly lower correlation range. This is natural, since the Gaussian field of the LSCP model does not need to explain both the natural spatial dependency between the growth of trees and the unknown inhibitory effect that causes trees to not grow at all in certain regions of the forest. Finally, we note that $\sigma_\epsilon$ has a large effect (the signal-to-noise ratio equals $\sigma_\epsilon$), indicating that the Matérn field for $X_0$ cannot explain the classification on its own. This is clearer for the FixedM model, where the classification jumps more sporadically between adjacent grid cells due to the over-simplified structure of the classes.

In Figure 6, the mean posterior log intensities, $\{\log \lambda_i\}_{i=1}^N$, are presented as kriging predictions for each of the four models. The figure also shows the posterior probabilities $P(X_0(s) > c \mid Y)$, giving an indication of the region with very few trees. The posterior log intensity surface of the LSCP shows sharp boundaries, contrary to the smoothly varying surface of the LGCP.
The classification in the FixedM model is more noisy than that of the LSCP model, and a larger proportion of the observation window is classified as belonging to the empty region. Once again, this is connected to the larger value of $\sigma_\xi$ and caused by the FixedM model having to explain the intensity with a much simpler model.

The relationship between the tree intensity and the covariates is also of interest, and in practice it is often the focus of a study and hence the most relevant inference. Recall that 12 of the original 16 covariates are considered here: 11 covariates and one intercept term. Figure 7 shows the mean and credibility intervals for each of these covariates for all models. The first question is which of the covariates have a significant impact on the spatial distribution of the trees and hence reflect a habitat preference of the species. To answer this we assess which of the regression coefficients $\beta$ are significantly different from zero. Empirical p-values are computed from the sampled posterior distributions and adjusted for the multiple testing scenario; Holm-Bonferroni correction [23] is used to acquire rejection regions for each covariate.

Figure 6: Mean posterior log intensity surface, $\log \lambda$ (left), and mean classification for the models where applicable (right). Panels: (a) LSCP, (b) LGCP, (c) FixedM, (d) Fixed, (e) LSCP, (f) FixedM.

Table 1: Significant covariates on a 5% level using Holm-Bonferroni correction to correct for multiple hypothesis tests.

Model    Significant covariates
Fixed    Int, Elev, Slope, Al, Mn, NMin, P, pH, Cu, N
FixedM   Int, NMin, Elev, Mn
LGCP     Int
LSCP     Int

Table 1 shows the covariates that were considered significant, at a significance level of 5%, for each of the four models. The FixedM model identifies a smaller number of significant covariates than the Fixed model. This is not surprising, since the covariates no longer need to explain the lack of trees in the empty domain. For the LGCP and LSCP models, only the intercept is significant.
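Holm's step-down procedure is straightforward to implement; a minimal sketch with illustrative p-values (not those computed from the posterior samples in the paper):

```python
import numpy as np

def holm_bonferroni(p_values, alpha=0.05):
    """Holm's step-down multiple-testing procedure [23].

    Reject hypotheses in order of increasing p-value while the i-th smallest
    p-value satisfies p <= alpha / (m - i), i = 0, ..., m - 1, and stop at
    the first failure. Returns a boolean rejection mask in the input order."""
    p = np.asarray(p_values, dtype=float)
    m = len(p)
    reject = np.zeros(m, dtype=bool)
    for i, idx in enumerate(np.argsort(p)):
        if p[idx] <= alpha / (m - i):
            reject[idx] = True
        else:
            break
    return reject

# Illustrative p-values for four covariates
print(holm_bonferroni([0.001, 0.03, 0.01, 0.04]))
```

This controls the family-wise error rate at level alpha while being uniformly more powerful than the plain Bonferroni correction.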
This is probably an effect of the smaller number of degrees of freedom due to the increased number of parameters to estimate, i.e., the Gaussian fields.

Figure 7: The mean (cross) and 95% credibility intervals (lines) for the posterior marginal distributions of the fixed effects for the four different models.

We have considered the problem of Bayesian level set inversion for point process data. The proposed model can be seen as a generalization of the log-Gaussian Cox process model in which the latent Gaussian field is extended to a level set mixture of Gaussian fields. We derived basic model properties and, in Appendix A, showed consistency of the posterior probability measure of finite-dimensional approximations to the continuous model. A computationally efficient MCMC method for Bayesian inference, based on the pCN-MALA algorithm, was presented. A topic of further research could be to investigate other, potentially even quicker, estimation methods such as INLA or variational Bayes.

We modelled a point pattern formed by the locations of the trees of one species in a tropical rainforest. The example was of interest since the point pattern shows clear signs of being affected by some unknown confounding factor. Comparisons of functional statistics between simulations from the fitted models and the observed data indicated that allowing for a second class in the model better explains the point pattern behavior. Moreover, the LSCP model stayed close to the expected values for all three functional characteristics investigated, while the popular LGCP model did not. There are indications that the FixedM model explains the data better than the LGCP model despite the much simpler structure of the former. FixedM has far fewer degrees of freedom than the LSCP and LGCP models, and is hence less prone to overfitting; this also shows that it is not overfitting that allows models with two classes to outperform the LGCP. The analysis of the tropical rainforest showed that inference on both the Gaussian field parameters and the covariates was affected by allowing for a second class in the model. This suggests that the inference drawn based on the LGCP model was biased by the confounding factor.

Future analyses could consider using fixed effects also in the level set field, $X_0$, in order to investigate which covariates explain the classification. This is another feature of the proposed model that we have not yet investigated. Further, analyses of multivariate point patterns are possible, for instance joint analysis of several species of plants. This could be performed by introducing multivariate Gaussian random fields for the classes, i.e., for $\{X_k\}_{k=1}^K$. Another possibility is letting several species share the same level set field, $X_0$, or classification field, $Z$, but use independent class fields, $\{X_k\}_{k=1}^K$. In this way, information about $X_0$ could be drawn from several point patterns jointly.

The authors gratefully acknowledge the financial support from the Knut and Alice Wallenberg Foundation, the Swedish Research Council Grant 2016-04187, and the ÅForsk foundation.
We would like to thank the people at the Center for Tropical Forest Science, Smithsonian Tropical Research Institute, for the extensive forest census plot and for making the data publicly available. The BCI forest dynamics research project was founded by S.P. Hubbell and R.B. Foster and is now managed by R. Condit, S. Lao, and R. Perez under the Center for Tropical Forest Science and the Smithsonian Tropical Research Institute in Panama. Numerous organizations have provided funding, principally the U.S. National Science Foundation, and hundreds of field workers have contributed.

Thanks also to the Barro Colorado soil survey team (Jim Dalling, Robert John, Kyle Harms, Robert Stallard, and Joe Yavitt, and field assistants Paolo Segre and Juan Di Trani) for making the soil sample data publicly available, for answering questions, and for handing out the original soil sample locations on request. The Barro Colorado soil survey was funded by NSF DEB021104, 021115, 0212284, 0212818, and OISE 0314581, as well as the STRI Soils Initiative and CTFS.
References

[1] The R Project for Statistical Computing.
[2] R.J. Adler and J.E. Taylor. Random Fields and Geometry. Springer, 2007. ISBN 978-0-387-48112-8.
[3] J-M. Azaïs and M. Wschebor. Level Sets and Extrema of Random Processes and Fields. Wiley, 1st edition, 2009. ISBN 0470409339.
[4] A. Baddeley, E. Rubak, and R. Turner. Spatial Point Patterns: Methodology and Applications with R. CRC Press, 2015.
[5] A. Baddeley and R. Turner. spatstat: An R package for analyzing spatial point patterns. Journal of Statistical Software, 12(6):1–42, 2005.
[6] A.J. Baddeley. Non- and semi-parametric estimation of interaction in inhomogeneous point patterns. Statistica Neerlandica, 54(3):329–350, 2000.
[7] S. Barman and D. Bolin. A three-dimensional statistical model for CLSM images of porous polymer films. ArXiv e-prints, 1705.03938, May 2017.
[8] M. Burger. A level set method for inverse problems. Inverse Problems, 17(5):1327–1355, 2001.
[9] D.F.R.P. Burslem, N.C. Garwood, and S.C. Thomas. Tropical forest diversity – the plot thickens. Science, 291:606–607, 2001.
[10] E.T. Chung. Electrical impedance tomography using level set representation and total variational regularization. Journal of Computational Physics, 205:357–372, 2005.
[11] R. Condit. Tropical Forest Census Plots. Springer-Verlag and R. G. Landes Company, Berlin, Germany, and Georgetown, Texas, 1998.
[12] R. Condit. Tropical Forest Census Plots: Methods and Results from Barro Colorado Island, Panama and a Comparison with Other Plots. Springer Berlin Heidelberg, 1998. ISBN 9783662036648.
[13] R. Condit, P.S. Ashton, P. Baker, S. Bunyavejchewin, S. Gunatilleke, N. Gunatilleke, S.P. Hubbell, R.B. Foster, A. Itoh, J.V. LaFrankie, H.S. Lee, E. Losos, N. Manokaran, R. Sukumar, and T. Yamakura. Spatial patterns in the distribution of tropical tree species. Science, 288:1414–1418, 2000.
[14] S.L. Cotter, M. Dashti, and A.M. Stuart. Approximation of Bayesian inverse problems for PDEs. SIAM Journal on Numerical Analysis, 48(1):322–345, 2010.
[15] S.L. Cotter, G.O. Roberts, A.M. Stuart, and D. White. MCMC methods for functions: modifying old algorithms to make them faster. Statistical Science, 28(3):424–446, 2013.
[16] D.J. Daley and D. Vere-Jones. An Introduction to the Theory of Point Processes: Volume II: General Theory and Structure, volume 2. Springer, 2003. ISBN 0-387-95541-0.
[17] J. Dalling, R. John, K. Harms, R. Stallard, and J. Yavitt. Soil maps of Barro Colorado Island 50 ha plot. URL http://ctfs.si.edu/webatlas/datasets/bci/soilmaps/BCIsoil.html.
[18] O. Desjardins and H. Pitsch. A spectrally refined interface approach for simulating multiphase flows. Journal of Computational Physics, 228(5):1658–1677, 2009.
[19] P.J. Diggle. Statistical Analysis of Spatial and Spatio-Temporal Point Patterns. Third edition, 2014.
[20] M.M. Dunlop, M.A. Iglesias, and A.M. Stuart. Hierarchical Bayesian level set inversion. Statistics and Computing.
[22] Computational Statistics and Data Analysis, 54:1999–2011, 2010.
[23] S. Holm. A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6(2):65–70, 1979.
[24] S.P. Hubbell. The Unified Neutral Theory of Biodiversity and Biogeography. Monographs in Population Biology 32, Princeton University Press, 2001.
[25] S.P. Hubbell, R.B. Foster, S.T. O'Brien, K.E. Harms, R. Condit, B. Wechsler, S.J. Wright, and S. Loo de Lao. Light-gap disturbances, recruitment limitation, and tree diversity in a neotropical forest. Science, 283(5401):554–557, 1999.
[26] S.P. Hubbell, R. Condit, and R.B. Foster. Barro Colorado Forest census plot data, 2005. URL http://ctfs.si/edu/datasets/bci.
[27] M.A. Iglesias, Y. Lu, and A.M. Stuart. A Bayesian level set method for geometric inverse problems. Interfaces and Free Boundaries, 18(2):181–217, 2016.
[28] J.B. Illian, A. Penttinen, H. Stoyan, and D. Stoyan. Statistical Analysis and Modelling of Spatial Point Patterns, volume 70. John Wiley & Sons, 2008.
[29] J.B. Illian, S.H. Sørbye, and H. Rue. A toolbox for fitting complex spatial point process models using integrated nested Laplace approximation. The Annals of Applied Statistics, 6(4):1499–1530, 2012.
[30] R.C. John, J.W. Dalling, K.E. Harms, J.B. Yavitt, R.F. Stallard, M. Mirabello, S.P. Hubbell, R. Valencia, H. Navarrete, M. Vallejo, and R.B. Foster. Soil nutrients influence spatial distributions of tropical tree species. Proceedings of the National Academy of Sciences USA, 104:864–869, 2007.
[31] D. Lambert. Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics, 34(1):1–14, 1992.
[32] A. Lang and J. Potthoff. Fast simulation of Gaussian random fields. Monte Carlo Methods and Applications, 17(3):195–214, 2011.
[33] F. Lindgren, H. Rue, and J. Lindström. An explicit link between Gaussian fields and Gaussian Markov random fields: the stochastic partial differential equation approach. Journal of the Royal Statistical Society, 73(4):423–498, 2011.
[34] R.J. Lorentzen, G. Naevdal, and A. Shafieirad. Estimating facies fields by use of the ensemble Kalman filter and distance functions – applied to shallow-marine environments. SPE Journal, 3(1):146–158, 2012.
[35] B. Matérn. Spatial Variation, volume 36. Springer-Verlag, 1986. ISBN 9780387963655.
[36] J. Møller and R.P. Waagepetersen. Modern statistics for spatial point processes. Scandinavian Journal of Statistics, 34(4):643–684, 2007.
[37] J. Møller, A.R. Syversveen, and R.P. Waagepetersen. Log Gaussian Cox processes. Scandinavian Journal of Statistics, 25(3):451–482, 1998.
[38] V.V. Mourzenko. Percolation in two-scale porous media. The European Physical Journal B, 19(1):75–85, 2001.
[39] J. Neter, W. Wasserman, and M.H. Kutner. Applied Linear Regression Models. Irwin, second edition, 1989. ISBN 0-256-07068-7.
[40] T. Rajala and J. Illian. A family of spatial biodiversity measures based on graphs. Environmental and Ecological Statistics, 19(4):545–572, 2012.
[41] C.P. Robert and G. Casella. Monte Carlo Statistical Methods. Springer, 2nd edition, 2004. ISBN 9781475741452.
[42] G.O. Roberts and R.L. Tweedie. Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli, 2(4):341–363, 1996.
[43] H. Rue and L. Held. Gaussian Markov Random Fields, volume 104. Chapman and Hall, 2005. ISBN 0203492021.
[44] H. Rue, S. Martino, and N. Chopin. Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society: Series B, 71(2):319–392, 2009.
[45] F. Santosa. A level-set approach for inverse problems involving obstacles. ESAIM: Control, Optimisation and Calculus of Variations, 1:17–33, 1996.
[46] B. Scheuermann and B. Rosenhahn. Analysis of numerical methods for level set based image segmentation. Lecture Notes in Computer Science, 5876(2):196–207, 2009.
[47] L.A. Schreeg, W.J. Kress, D.L. Erickson, and N.G. Swenson. Phylogenetic analysis of local-scale tree soil associations in a lowland moist tropical forest. PLoS ONE, 5:1–10, 2010.
[48] D. Simpson, J.B. Illian, F. Lindgren, S.H. Sørbye, and H. Rue. Going off grid: computationally efficient inference for log-Gaussian Cox processes. Biometrika, 103(1):49–70, 2016.
[49] D. Simpson, H. Rue, A. Riebler, T.G. Martins, and S.H. Sørbye. Penalising model component complexity: a principled, practical approach to constructing priors. Statistical Science, 32(1):1–28, 2017.
[50] M. Sonka, V. Hlavac, and R. Boyle. Image Processing, Analysis, and Machine Vision, chapter Image Pre-processing. Thomson, 2008.
[51] S.H. Sørbye, J.B. Illian, D.P. Simpson, and D. Burslem. Careful prior specification avoids incautious inference for log-Gaussian Cox processes. Unpublished manuscript, 2017.
[52] A.M. Stuart. Inverse problems: a Bayesian perspective. Acta Numerica, 19:451–559, 2010.
A Theoretical results
In this section, we theoretically justify the two approximations of the LSCP process that are needed for inference. The first is the finite dimensional approximation from Section 2.4 and the second is the truncation needed for the fast Fourier transform in Section 3.

For $k \in \{0, \ldots, K\}$, let $X_k$ be a Gaussian random field on the spatial domain $D = [0,1]^d \subset R^d$, defined on a complete probability space. We will show the results using methods similar to those in [14, 27, 48], and for this it is convenient to represent the fields as Gaussian measures $\mu_0^{(k)}$. To simplify the presentation, we will assume a specific covariance operator related to the Matérn covariance function. However, the results can be extended to more general densely defined, self-adjoint, positive definite operators and to more general bounded domains.

Let $\mu_0^{(k)} = N(0, C)$, where $C = \tau A^{-\alpha}$ with $A = \kappa^2 - \Delta$. Here $\tau$, $\kappa$, and $\alpha$ are positive parameters and $A : D(A) \subset L^2(D) \to L^2(D)$; further, we impose periodic boundary conditions. Denote the eigenvalues of $A$ by $\{\lambda_j\}_{j \in N}$, arranged in nondecreasing order, and the corresponding eigenfunctions by $\{e_j\}_{j \in N}$, which form a complete orthonormal basis for $L^2(D)$. The fractional power operator $A^\alpha : D(A^\alpha) \to L^2(D)$ is defined by

A^\alpha u = \sum_{j \in N} \lambda_j^\alpha \langle u, e_j \rangle e_j.

For any $\alpha$, the subspace $H^\alpha := D(A^{\alpha/2})$ is a Hilbert space,

H^\alpha = \{ u : \sum_{j \in N} \lambda_j^\alpha |\langle u, e_j \rangle|^2 < \infty \},

with respect to the inner product $\langle \varphi, \psi \rangle_\alpha = \langle A^{\alpha/2} \varphi, A^{\alpha/2} \psi \rangle$ and corresponding norm $\|\varphi\|_\alpha^2 = \sum_{j \in N} \lambda_j^\alpha \langle \varphi, e_j \rangle^2$.

With this choice of covariance operator, we have that if $u \sim \mu_0^{(k)}$, then $u \in H^s$ for any $s < \alpha - d/2$, $\mu_0^{(k)}$-almost surely [20, Theorem 1]. Furthermore, $u$ is almost surely $p$-times differentiable if $\alpha - d/2 > p$. We will need this differentiability, and we formulate it as an assumption.

Assumption A.1.
The classification field $X_0$ is almost surely a Morse function with strictly positive variance at all locations in the domain, and for $k > 0$ the Gaussian fields $X_k$ are almost surely differentiable.

By the above, the differentiability assumption is satisfied by choosing $\alpha$ large enough that $\alpha - d/2 > 1$. The Morse function requirement is slightly stronger than continuous differentiability, but is also implied by a sufficiently large $\alpha$ [2]. Furthermore, we can use a theorem equivalent to the Sobolev embedding theorem for our $H^s$ spaces [52, Theorem 2.10]. That is, $\|X_k\|_{L^\infty} \leq C \|X_k\|_s$ if $X_k \in H^s$ and $s > d/2$. In our case with periodic boundary conditions, the space $H^s$ is even equivalent to the Sobolev space $H^s$.

We thus have that $X_k$ is represented as a Gaussian measure, $\mu_0^{(k)}$, on $H^\alpha$, and we can choose an appropriate $\sigma$-algebra such that the probability space $(H^\alpha, \Sigma_k, \mu_0^{(k)})$ becomes complete (see [27]). Likewise, $X = \{X_k\}_{k=0}^K$ can be represented by a product measure $\mu_0$ on the complete measure space $(\Omega, \Sigma, \mu_0)$, where $\Omega$ is the product space of the $H^\alpha$ and $\Sigma$ is the corresponding product $\sigma$-algebra.

Since the LSCP model defines the point process as a non-homogeneous Poisson process conditioned on $X$, the likelihood potentials for the continuous and finite dimensional models, defined in Section 2.4, are

\Phi(X; Y) = \int_D \lambda(s; X) \, ds - \sum_{s_j \in Y} \log \lambda(s_j; X),   (4)

\Phi^N(X; Y) = \sum_{i \in I_N} \left( |D_i| \lambda(\tilde{s}_i; X) - Y_i \log \lambda(\tilde{s}_i; X) \right).   (5)

Here, $N$ is the number of discretized regions in the lattice approximation and $Y_i$ denotes the number of observations in $D_i$. Further, $\tilde{s}_i$ is the midpoint of each $D_i$, and $s_j$ is the location of the $j$th point in the point pattern $Y$. Based on these likelihoods, we can now define the corresponding posterior measures as follows.

Proposition A.2.
If Assumption A.1 holds, we can define posterior measures through their Radon-Nikodym derivatives with respect to $\mu_0$:

\frac{d\mu}{d\mu_0}(X) = \frac{1}{C_\mu(Y)} \exp(-\Phi(X; Y)),
\quad
\frac{d\mu^N}{d\mu_0}(X) = \frac{1}{C_{\mu^N}(Y)} \exp(-\Phi^N(X; Y)),   (6)

where $C_\mu(Y)$ and $C_{\mu^N}(Y)$ are normalizing constants.

The proof is given in Appendix B. Since only the discretized model can be used for inference, it is important to know that the approximation $\mu^N$ converges to the true posterior, $\mu$, as the discretization becomes finer. The following theorem shows that this is indeed the case with respect to the total variation distance, $d_{TV}(\mu, \mu^N) = 2 \sup_{E \in F_X} |\mu(E) - \mu^N(E)|$.

Theorem A.3.
Let Assumption A.1 hold and let $\mu^N$ and $\mu$ be the posterior measures defined in (6). Then $d_{TV}(\mu, \mu^N) \to 0$ as $N \to \infty$.

The proof is given in Appendix B. Also the latent fields, $X$, need to be approximated by finite dimensional representations for inference. We do this by truncating the basis expansion of each field to $p$ terms,

X \approx \tilde{X} = \sum_{j=1}^p \xi_j \lambda_j^{-\alpha/2} e_j,

where the $\xi_j$ are independent standard normal variables. We will refer to the model using a discretization of the observational domain and finite dimensional approximations of $X$ as the fully discretized model. The advantage of using this truncation is that we can use the fast Fourier transform for simulating the field. To show that we still have convergence under this approximation, note that the finite dimensional approximation of $X$ can be viewed as an orthogonal projection of $X$ onto the space spanned by the eigenfunctions $\{e_j\}_{j \leq p}$, as is done in Cotter et al. [14]. We define the projection operator $P^p$ such that $\tilde{X}(s) = P^p X(s)$. It is now possible to define a posterior probability measure $\tilde{\mu}^N$ through its Radon-Nikodym derivative

\frac{d\tilde{\mu}^N}{d\mu_0}(X) = \frac{1}{C_{\tilde{\mu}^N}(Y)} \exp(-\Phi^N(P^p X; Y)).   (7)

An important consequence of this definition is that the posterior measure is absolutely continuous with respect to $\mu_0$ and measurable with respect to $\Sigma$. The interpretation of $\tilde{\mu}^N$ is that the data will only affect the projection, $P^p X$. We can now show that also under this approximation, we get convergence to the true posterior.

Theorem A.4.
Let the measure $\tilde{\mu}^N$ be defined by (7), and let the measure $\mu$ be defined by (6). If $\mu$ satisfies Assumption A.1, then $d_{TV}(\mu, \tilde{\mu}^N) \to 0$ as $N \to \infty$ and $p \to \infty$.

The proof is given in Appendix B.
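The truncated expansion above is what makes FFT-based simulation possible: with periodic boundary conditions the eigenfunctions $e_j$ are Fourier modes, so a sample of the field can be drawn with two FFTs. A minimal sketch on the unit torus follows; the parameter values are illustrative and normalization constants are glossed over, so this is qualitative rather than the paper's implementation.

```python
import numpy as np

def sample_field_fft(n, kappa, alpha, tau=1.0, rng=None):
    """Draw one sample of a Gaussian field with covariance operator
    C = tau * (kappa^2 - Laplacian)^(-alpha) on the unit torus [0, 1)^2.

    The Fourier modes are eigenfunctions of A = kappa^2 - Laplacian, so
    applying C^(1/2) to white noise is diagonal in the spectral domain."""
    rng = np.random.default_rng(rng)
    k = np.fft.fftfreq(n, d=1.0 / n)                       # integer frequencies
    kx, ky = np.meshgrid(k, k, indexing="ij")
    lam = kappa**2 + (2.0 * np.pi) ** 2 * (kx**2 + ky**2)  # eigenvalues of A
    spec = tau * lam ** (-alpha)                           # eigenvalues of C
    w = rng.standard_normal((n, n))                        # white noise on the grid
    return np.fft.ifft2(np.sqrt(spec) * np.fft.fft2(w)).real

field = sample_field_fft(64, kappa=10.0, alpha=2.0, rng=0)
```

Larger kappa shortens the correlation range and larger alpha gives smoother sample paths, mirroring the Matérn parameterization discussed in the paper.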
B Proofs
Proof of Proposition 2.1.
For the first moment, note that

E[\lambda(s)] = E[\exp(X(s))]
= \sum_{k=1}^K E[\exp(X(s)) \mid X_0(s) \in (c_{k-1}, c_k]] \, P(X_0(s) \in (c_{k-1}, c_k])
= \sum_{k=1}^K E[\exp(X_k(s) + \mu_k(s))] \, P(X_0(s) \in (c_{k-1}, c_k])
= \sum_{k=1}^K \exp\left( \mu_k(s) + \frac{r_k(0)}{2} \right) P(X_0(s) \in (c_{k-1}, c_k]),

where the final equality follows from the explicit form of the expectation of a log-normal random variable. The second moment follows by similar calculations. □

Proof of Proposition 2.2.
The inhomogeneous empty space function, $F(s, r)$, is defined as the probability of having at least one point inside a ball of radius $r$ centered at $s$, i.e., $F(s, r) = P(N(Y; B(s, r)) > 0)$. Here, $N(Y; A)$ is the number of points inside the domain $A$ for a realization of the point process, $Y$. Hence $F(s, r) = 1 - P(N(Y; B(s, r)) = 0)$. Now,

P(N(Y; B(s, r)) = 0) = E\left[ \exp\left( -\int_{B(s,r)} e^{\sum_{k=1}^K Z_k(u)(X_k(u) + \mu_k(u))} \, du \right) \right]
= E\left[ \exp\left( -\int_{B(s,r)} \sum_{k=1}^K Z_k(u) \, e^{X_k(u) + \mu_k(u)} \, du \right) \right]
= E\left[ \prod_{k=1}^K \exp\left( -\int_{D_k \cap B(s,r)} e^{\mu_k(u)} e^{X_k(u)} \, du \right) \right]. □

Due to the product space interpretation of $X$ as the collection $\{X_k\}_k$, we define norms on $X$ as $\|X\|_{(\cdot)} = \sum_{k=0}^K \|X_k\|_{(\cdot)}$. That is, a norm on realizations of all Gaussian random fields jointly is defined as the sum of the norms of each of the $K+1$ fields.

To simplify the proofs, we note that the potential $\Phi$ can be written as a composition of two functions, $\Phi(X; Y) = \Phi_P(G(X); Y)$, where $\Phi_P : L^2(D) \times \mathcal{Y} \to R$ is the continuous Poisson log-likelihood function and $G : H^\alpha \to L^2(D)$ is

G(X) = \sum_{k=1}^K \pi_k(\cdot) X_k(\cdot) = \log(\lambda(\cdot)),

where $\pi_k$ is the classification function, $\pi_k(s) = I(c_{k-1} \leq X_0(s) < c_k)$. Similarly, $\Phi^N(X; Y) = \Phi_P^N(G(X); Y)$, where $\Phi_P^N$ is the Poisson log-likelihood function for the discretized domain.

To prove Proposition A.2, we will need two lemmas, of which the first gives bounds for the likelihood potentials.

Lemma B.1.
Let $\|Y\|_{\mathcal{Y}}$ denote the number of points in a given point pattern. For $\Phi$ in (4) and $\Phi^N$ in (5), we then have:

(i) For every $r > 0$, $\varepsilon > 0$, and $s > 0$, with $X \in H^s$ and $Y \in \mathcal{Y}$ with $\|Y\|_{\mathcal{Y}} \leq r$, there exists a constant $M(\varepsilon, r) \in R$ such that $\Phi(X; Y) \geq M(\varepsilon, r) - \varepsilon \|X\|_s^2$.

(ii) For every $r > 0$ and $s > 0$, all $X \in H^s$ and all $Y \in \mathcal{Y}$ with $\max\{\|X\|_s, \|Y\|_{\mathcal{Y}}\} < r$, we have $\Phi(X; Y) \leq |D| e^{Cr} + Cr^2$.

Proof. To show (i), note that

\Phi_P(G(X); Y) = \int_D \exp(G(X)(s)) \, ds - \sum_{s_j \in Y} G(X)(s_j) \geq -\sum_{s_j \in Y} G(X)(s_j) \geq -\|Y\|_{\mathcal{Y}} \|G(X)\|_{L^\infty(D)} \geq -r \|G(X)\|_{L^\infty(D)}.

By Assumption A.1 and the Sobolev embedding theorem, we have $\|X\|_{L^\infty(D)} \leq C \|X\|_s$. Thus $\|G(X)\|_{L^\infty(D)} \leq \|X\|_{L^\infty(D)} \leq C \|X\|_s$, and we obtain $\Phi_P(G(X); Y) \geq -rC \|X\|_s$. Now,

0 \leq \left( \frac{Cr}{2\sqrt{\varepsilon}} - \sqrt{\varepsilon} \|X\|_s \right)^2 = \frac{C^2 r^2}{4\varepsilon} + \varepsilon \|X\|_s^2 - Cr \|X\|_s.

Hence

Cr \|X\|_s \leq \varepsilon \|X\|_s^2 + \frac{C^2 r^2}{4\varepsilon} = \varepsilon \|X\|_s^2 - M(\varepsilon, r).

By the same argument,

\Phi_P^N(G(X); Y) = \sum_{i \in I_N} \left( |D_i| \exp(G(X)(s_i)) - Y_i G(X)(s_i) \right) \geq M(\varepsilon, r) - \varepsilon \|X\|_s^2.

Statement (ii) holds for $\Phi$ since

\Phi_P(G(X); Y) = \int_D \exp(G(X)(s)) \, ds - \sum_{s_j \in Y} G(X)(s_j) \leq |D| e^{\|G(X)\|_{L^\infty(D)}} + \|Y\|_{\mathcal{Y}} \|G(X)\|_{L^\infty(D)} \leq |D| e^{Cr} + Cr \cdot r \leq |D| e^{Cr} + Cr^2,

and the same holds for $\Phi^N$ since

\Phi_P^N(G(X); Y) = \sum_{i \in I_N} \left( |D_i| e^{\|G(X)\|_{L^\infty(D)}} - Y_i G(X)(s_i) \right) \leq |D| e^{Cr} + Cr^2. □

The second lemma we need concerns the regularity of the level sets of $X_0$.
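The proof of the second lemma rests on a covering bound: a planar curve segment of length $l$ intersects at most $4(l/h + 1)$ cells of a grid with cell side $h$. For a straight segment this is easy to check numerically; the sketch below counts cells by dense sampling along the segment (assumed fine enough relative to $h$ not to skip cells).

```python
import numpy as np

def cells_touched(p0, p1, h, n_samples=20_000):
    """Cells of an axis-aligned grid with side h that the segment p0-p1
    passes through, found by dense sampling along the segment."""
    t = np.linspace(0.0, 1.0, n_samples)[:, None]
    pts = (1 - t) * np.asarray(p0, float) + t * np.asarray(p1, float)
    return {tuple(c) for c in np.floor(pts / h).astype(int)}

h = 0.1
p0, p1 = (0.03, 0.52), (3.41, 1.97)
length = float(np.hypot(p1[0] - p0[0], p1[1] - p0[1]))
n_cells = len(cells_touched(p0, p1, h))
print(n_cells, 4 * (length / h + 1))  # the count stays below the bound
```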
Let $S_k(X_0) = \{s : X_0(s) = c_k\}$ be the level set of $X_0$ for the level $c_k$, and set $S(X_0) = \cup_{k=1}^K S_k(X_0)$. Further, let $J_s$ denote the set of indices of all subregions $D_j$ that do not intersect $S(X_0)$, that is, $j \in J_s$ if $D_j \cap S_k(X_0) = \emptyset$ for all $1 \leq k \leq K$, and define $\bar{S}(X) = \cup_{j \in J_s} D_j$ as the union of all subregions not intersected by the level sets. We then have the following result about $\bar{S}(X)$ and $\mathcal{L}^d(\bar{S}(X))$, where $\mathcal{L}^d$ denotes the Lebesgue measure in dimension $d$.

Lemma B.2. Let Assumption A.1 hold. Then
• $\mathcal{L}^d(S(X_0)) = 0$ a.s.
• $E[\mathcal{L}^d(\bar{S}^C(X))] \to 0$ as $N \to \infty$.
• For any finite set of points $Y$, $E[\|\bar{S}^C(X) \cap Y\|_{\mathcal{Y}}] \to 0$ as $N \to \infty$.

Proof. That $\mathcal{L}^d(S(X_0)) = 0$ a.s. follows from Proposition 2.8 in Iglesias et al. [27].

We now show that $E[\mathcal{L}^d(\bar{S}^C(X))] \to 0$. Note that a curve segment of length $l$ can cover at most $4(l/h + 1)$ subregions $D_j$, where $h$ is the side length of the subregions. Hence, the number of subregions $D_j$ that contain a level crossing, $N - |J_s|$, is bounded by $\sum_{i=1}^{N^*} 4(l_i/h + 1)$, where $N^*$ is the number of disjoint line segments in $S(X_0)$ and $l_i$ is the length of the $i$th segment. This gives

E[\mathcal{L}^d(\bar{S}^C(X))] \leq h^2 \left( \frac{4}{h} E[\mathcal{L}^1(S(X_0))] + 4 E[N^*] \right) \leq 4h \left( E[\mathcal{L}^1(S(X_0))] + h E[N^*] \right).

Thus, the result follows if we can bound $E[\mathcal{L}^1(S(X_0))]$ and $E[N^*]$. By assumption, $X_0$ satisfies the conditions of Rice's theorem [3], which gives that $E[\mathcal{L}^1(S(X_0))] < \infty$. Let $N_k$ denote the number of local maxima of $X_0$ above the level $c_k$ and let $N^\circ = \sum_k N_k$. Since $E[N^*]$ is bounded by $E[N^\circ]$, and Rice's theorem bounds $E[N^\circ]$, the result follows.

Finally, we show that $E[\|\bar{S}^C(X) \cap Y\|_{\mathcal{Y}}] \to 0$.
We only consider the case $K = 1$ and $Y = \{y\}$, as the general result follows directly once the claim holds for this special case. Let $B(y, \varepsilon_N)$ be a ball centered at $y$, where $\varepsilon_N$ is chosen such that the subregions covering $y$ are contained in the ball. To prove the result we need to show that $P(\mathcal{L}_i(B(y, \varepsilon_N) \cap X_0^{-1}(c)) > 0) \to 0$ as $N \to \infty$ for both $i = 0, 1$, where the $\mathcal{L}_i$ are the Lipschitz-Killing curvatures. Since $X_0$ is a Morse function and $\mathcal{L}_i(B(y, \varepsilon_N)) \to 0$, Theorem 15.9.4 in [2] shows that $E[\mathcal{L}_i(B(y, \varepsilon_N) \cap X_0^{-1}(c))] \to 0$ for $i = 0, 1$. Thus $P(\mathcal{L}_1(B(y, \varepsilon_N) \cap X_0^{-1}(c)) > 0) \to 0$, as $\mathcal{L}_1(B(y, \varepsilon_N) \cap X_0^{-1}(c))$ is a non-negative random variable. Since any $B(y, \varepsilon_N)$ converges to a point, it follows that $\mathcal{L}_0(B(y, \varepsilon_N) \cap X_0^{-1}(c))$ (the Euler characteristic) converges to a non-negative random variable, and thus $P(\mathcal{L}_0(B(y, \varepsilon_N) \cap X_0^{-1}(c)) > 0) \to 0$. □

Proof of Proposition A.2.
We only state the proof for $\mu$, since the proof for $\mu^N$ follows similarly. To show the result, we must show that $\Phi$ is a measurable function, and then that the measure is normalizable. To prove measurability it suffices, by Lemma 6.1 in Iglesias et al. [27], to show that $\Phi$ is continuous $\mu_0$-almost surely. Thus, for $\hat{X}, \tilde{X} \in H^s$, $s > d/2$, we must show that $|\Phi(\hat{X}) - \Phi(\tilde{X})| \to 0$ as $\|\hat{X} - \tilde{X}\|_s \to 0$. Note that

|\Phi(\hat{X}) - \Phi(\tilde{X})| \leq \int_D |e^{G(\hat{X})(s)} - e^{G(\tilde{X})(s)}| \, ds + \sum_{s_j \in Y} |G(\hat{X})(s_j) - G(\tilde{X})(s_j)|.   (8)

We show continuity of the two terms separately. For the first term in (8), it follows that

\int_D |e^{G(\hat{X})(s)} - e^{G(\tilde{X})(s)}| \, ds \leq \int_D \exp\left( |G(\hat{X})(s)| + |G(\tilde{X})(s)| \right) |G(\hat{X})(s) - G(\tilde{X})(s)| \, ds \leq C |D| e^{C\|\hat{X}\|_s + C\|\tilde{X}\|_s} \|G(\hat{X}) - G(\tilde{X})\|_s.

Here the first inequality is due to the mean value theorem, and the second comes from the Sobolev embedding theorem and Hölder's inequality. Since $\|G(\hat{X}) - G(\tilde{X})\|_s \leq \sum_{k=1}^K \|\hat{X}_k\|_s \|\pi_k(\hat{X}_0) - \pi_k(\tilde{X}_0)\|_s + \|\hat{X} - \tilde{X}\|_s$, it suffices to show that $\pi_k$ is continuous. By Lemma B.2, $\mathcal{L}^d(S(X_0)) = 0$ a.s., and since $\pi_k(\cdot)$ is constant on $S(X_0)^C$, it is a.s. continuous. By Proposition 2.6 in Iglesias et al. [27], $\pi_k(\cdot)$ is therefore continuous on $L^2(D)$ and thus also on $H^s$.

The second term in (8) can be bounded by $C \|Y\|_{\mathcal{Y}} \|\hat{X} - \tilde{X}\|_s$ a.s., since $|G(\hat{X})(s) - G(\tilde{X})(s)| \leq \|\hat{X} - \tilde{X}\|_{L^\infty(D)}$. Finally, by Lemma B.1 the function $\Phi$ is bounded from above and below, and thus the measure can be normalized. □

From here on we simplify the notation by omitting the observed point pattern from the likelihood potential and the constants, i.e.,
$\Phi(\mathrm{X}) = \Phi(\mathrm{X}; \mathcal{Y})$ and $C_\mu = C_\mu(\mathcal{Y})$.

Proof of Theorem A.3.
By Stuart [52, Lemma 6.36], the Hellinger distance bounds the total variation norm, so it suffices to show convergence in Hellinger distance. Take $\mathrm{X} \in H^s$, $s > d/2$. By the triangle inequality,
\[
d_{\mathrm{Hell}}(\mu, \mu^N)^2 = \frac{1}{2} \int \left( \sqrt{\frac{\mathrm{d}\mu}{\mathrm{d}\nu}} - \sqrt{\frac{\mathrm{d}\mu^N}{\mathrm{d}\nu}} \right)^2 \mathrm{d}\nu(\mathrm{X}) = \frac{1}{2} \int \left( \frac{e^{-\Phi(\mathrm{X})/2}}{\sqrt{C_\mu}} - \frac{e^{-\Phi^N(\mathrm{X})/2}}{\sqrt{C_{\mu^N}}} \right)^2 \mathrm{d}\nu(\mathrm{X})
\]
\[
\le \frac{1}{C_\mu} \int \left| e^{-\Phi(\mathrm{X})/2} - e^{-\Phi^N(\mathrm{X})/2} \right|^2 \mathrm{d}\nu(\mathrm{X}) + \left| \frac{1}{\sqrt{C_\mu}} - \frac{1}{\sqrt{C_{\mu^N}}} \right|^2 \int e^{-\Phi^N(\mathrm{X})} \mathrm{d}\nu(\mathrm{X}) = I_1 + I_2,
\]
where $C_\mu = \int e^{-\Phi(\mathrm{X})} \mathrm{d}\nu(\mathrm{X})$ and $C_{\mu^N} = \int e^{-\Phi^N(\mathrm{X})} \mathrm{d}\nu(\mathrm{X})$. We first show that $I_2$ can be bounded by $I_1$, and then show that $I_1 \to 0$ as $N \to \infty$. Note that
\[
I_2 \le \left( C_\mu - C_{\mu^N} \right)^2 \left( \min\{C_\mu, C_{\mu^N}\} \right)^{-3} C_{\mu^N} = \left| \int \left( e^{-\Phi(\mathrm{X})} - e^{-\Phi^N(\mathrm{X})} \right) \mathrm{d}\nu(\mathrm{X}) \right|^2 \left( \min\{C_\mu, C_{\mu^N}\} \right)^{-3} C_{\mu^N}
\]
\[
\le \frac{C_{\mu^N}}{\min\{C_\mu, C_{\mu^N}\}^3} \left( \int \left| e^{-\Phi(\mathrm{X})/2} - e^{-\Phi^N(\mathrm{X})/2} \right| \left( e^{-\Phi(\mathrm{X})/2} + e^{-\Phi^N(\mathrm{X})/2} \right) \mathrm{d}\nu(\mathrm{X}) \right)^2
\]
\[
\le \frac{C\, C_{\mu^N}}{\min\{C_\mu, C_{\mu^N}\}^3} \int \left| e^{-\Phi(\mathrm{X})/2} - e^{-\Phi^N(\mathrm{X})/2} \right|^2 \mathrm{d}\nu(\mathrm{X}) \int e^{\epsilon \|\mathrm{X}\|_s^2 - M(\epsilon, \|\mathcal{Y}\|_{\mathcal{Y}})} \mathrm{d}\nu(\mathrm{X}) \le C I_1.
\]
Here the third inequality is due to Hölder's inequality and Fernique's theorem [14, Theorem A.3]. Now, to bound $I_1$, note that
\[
I_1 \le \frac{C}{C_\mu} \int e^{\epsilon \|\mathrm{X}\|_s^2 - M(\epsilon, \|\mathcal{Y}\|_{\mathcal{Y}})} \left| \Phi(\mathrm{X}) - \Phi^N(\mathrm{X}) \right| \mathrm{d}\nu(\mathrm{X}).
\]
Since the function $G$ is Lipschitz continuous on $S(\mathrm{X})^C$ (see Lemma B.2) we get
\[
\left| \Phi(\mathrm{X}) - \Phi^N(\mathrm{X}) \right| \le C e^{C\|\mathrm{X}\|_s} |D| h + C \|\mathcal{Y}\|_{\mathcal{Y}} h + C \|\mathrm{X}\|_s \left( e^{C\|\mathrm{X}\|_s} \mathcal{L}(S^C(\mathrm{X})) + \|S^C(\mathrm{X}) \cap \mathcal{Y}\|_{\mathcal{Y}} \right),
\]
and thus
\[
I_1 \le \frac{C}{C_\mu} \int e^{\epsilon \|\mathrm{X}\|_s^2 - M(\epsilon, \|\mathcal{Y}\|_{\mathcal{Y}})} \left( |D| + \|\mathcal{Y}\|_{\mathcal{Y}} \right) h \,\mathrm{d}\nu(\mathrm{X}) + \frac{C}{C_\mu} \int e^{\epsilon \|\mathrm{X}\|_s^2 - M(\epsilon, \|\mathcal{Y}\|_{\mathcal{Y}})} \left( \mathcal{L}(S^C(\mathrm{X})) + \|S^C(\mathrm{X}) \cap \mathcal{Y}\|_{\mathcal{Y}} \right) \mathrm{d}\nu(\mathrm{X}),
\]
where factors of the form $\|\mathrm{X}\|_s e^{C\|\mathrm{X}\|_s}$ have been absorbed into the exponential weight. The first integral goes to zero as $N \to \infty$ since $h \to 0$. The second integral can be bounded by
\[
\sqrt{E\left[ e^{2\epsilon \|\mathrm{X}\|_s^2 - 2M(\epsilon, \|\mathcal{Y}\|_{\mathcal{Y}})} \right]} \sqrt{E\left[ \left( \mathcal{L}(S^C(\mathrm{X})) + \|S^C(\mathrm{X}) \cap \mathcal{Y}\|_{\mathcal{Y}} \right)^2 \right]} \le C \sqrt{E\left[ \left( \mathcal{L}(S^C(\mathrm{X})) + \|S^C(\mathrm{X}) \cap \mathcal{Y}\|_{\mathcal{Y}} \right)^2 \right]}
\]
\[
\le C \sqrt{ \left( \mathcal{L}(D) + \|\mathcal{Y}\|_{\mathcal{Y}} \right) \left( E\left[ \mathcal{L}(S^C(\mathrm{X})) \right] + E\left[ \|S^C(\mathrm{X}) \cap \mathcal{Y}\|_{\mathcal{Y}} \right] \right) },
\]
and as $N \to \infty$ this also goes to zero by Lemma B.2. $\Box$

Proof of Theorem A.4.
Denote the posterior measure for the fully discretized model by $\tilde{\mu}^N$. The total variation distance between the posterior measures can be bounded as
\[
d_{TV}(\mu, \tilde{\mu}^N) \le d_{TV}(\mu, \mu^N) + d_{TV}(\mu^N, \tilde{\mu}^N),
\]
where the first term goes to zero by Theorem A.3. Clearly, $\tilde{\mu}^N$ as given in (7) defines a posterior measure by the same arguments as in the proof of Proposition A.2, and it coincides with $\mu^N$ on the span of $\{e_j\}_{j > p+1}$. We can therefore bound $d_{TV}(\mu^N, \tilde{\mu}^N)$ using the same method as in the proof of Theorem A.3, which gives $d_{\mathrm{Hell}}(\mu^N, \tilde{\mu}^N)^2 \le I_1 + I_2$, where now
\[
I_1 = \frac{1}{C_{\mu^N}} \int \left| e^{-\Phi^N(\mathrm{X})/2} - e^{-\Phi^N(P_p \mathrm{X})/2} \right|^2 \mathrm{d}\nu(\mathrm{X}), \qquad
I_2 = \left| \frac{1}{\sqrt{C_{\mu^N}}} - \frac{1}{\sqrt{C_{\tilde{\mu}^N}}} \right|^2 \int e^{-\Phi^N(P_p \mathrm{X})} \mathrm{d}\nu(\mathrm{X}).
\]
We can again bound $I_2$ by $C I_1$, so what remains to be shown is that $I_1$ goes to zero as $p \to \infty$. Let $\mathrm{X} \in H^s$, $s > d/2$. Since $P_p$ is a projection, we clearly have $\|P_p \mathrm{X}\|_s \le \|\mathrm{X}\|_s$. By Lemma B.1(i) and Hölder's inequality,
\[
I_1 \le \frac{C}{C_{\mu^N}} \int e^{\epsilon \|\mathrm{X}\|_s^2 - M(\epsilon, \|\mathcal{Y}\|_{\mathcal{Y}})} \left| \Phi^N(\mathrm{X}) - \Phi^N(P_p \mathrm{X}) \right| \mathrm{d}\nu(\mathrm{X}) \le C \sqrt{\int e^{2\epsilon \|\mathrm{X}\|_s^2 - 2M(\epsilon, \|\mathcal{Y}\|_{\mathcal{Y}})} \mathrm{d}\nu(\mathrm{X})} \; E\left[ \left| \Phi^N(\mathrm{X}) - \Phi^N(P_p \mathrm{X}) \right|^2 \right]^{1/2},
\]
where the first factor is finite by Fernique's theorem. We will now focus on bounding the expectation above. Using the mean value theorem,
\[
\left| \Phi^N(\mathrm{X}) - \Phi^N(P_p \mathrm{X}) \right| = \left| \sum_{i=1}^N |D_i| \left( e^{G(\mathrm{X})(s_i)} - e^{G(P_p \mathrm{X})(s_i)} \right) - Y_i \left( G(P_p \mathrm{X})(s_i) - G(\mathrm{X})(s_i) \right) \right| \le e^{\epsilon \|\mathrm{X}\|_s^2 - M(\epsilon, \|\mathcal{Y}\|_{\mathcal{Y}})} \sum_{i=1}^N (|D_i| + Y_i) \left| G(P_p \mathrm{X})(s_i) - G(\mathrm{X})(s_i) \right|.
\]
Using the inequalities
\[
\left| G(P_p \mathrm{X})(s) - G(\mathrm{X})(s) \right| \le \sum_{k=1}^K \left| \pi_k(\mathrm{X})(s) X_k(s) - \pi_k(P_p \mathrm{X})(s) P_p X_k(s) \right| \le \sum_{k=1}^K \left( \left| X_k(s) - P_p X_k(s) \right| + C \|\mathrm{X}\|_s \left| \pi_k(\mathrm{X})(s) - \pi_k(P_p \mathrm{X})(s) \right| \right),
\]
and absorbing the exponential weights into the constant by Hölder's inequality and Fernique's theorem, we obtain
\[
E\left[ \left| \Phi^N(\mathrm{X}) - \Phi^N(P_p \mathrm{X}) \right|^2 \right]^{1/2} \le C\, E\left[ \sum_{i \in I_N} (|D_i| + Y_i) \sum_{k=1}^K \left| X_k(s_i) - P_p X_k(s_i) \right| \right] + C\, E\left[ \sum_{i \in I_N} (|D_i| + Y_i) \sum_{k=1}^K \left| \pi_k(\mathrm{X})(s_i) - \pi_k(P_p \mathrm{X})(s_i) \right| \right]. \tag{9}
\]
Note that $|D_i| \propto N^{-1}$ and that $\mathrm{X}(s)$ is bounded for each $s \in D$ almost surely. Let $Q_p \mathrm{X} = \mathrm{X} - P_p \mathrm{X}$ and note that $Q_p \mathrm{X}(s)$, for each $s \in D$, is a mean-zero Gaussian variable with a variance $\sigma_p^2$ that goes to zero as $p \to \infty$. Thus, the first term in (9) clearly goes to zero as $p \to \infty$. Since $|\pi_k(\mathrm{X})(s_i) - \pi_k(P_p \mathrm{X})(s_i)|$ is bounded by one, the second term in (9) can be bounded by
\[
C \sum_{i=1}^N (|D_i| + Y_i) E\left[ \sum_{k=1}^K \left| \pi_k(\mathrm{X})(s_i) - \pi_k(P_p \mathrm{X})(s_i) \right| \right].
\]
Here the expectation can be bounded as
\[
E\left[ \sum_{k=1}^K \left| \pi_k(\mathrm{X})(s_i) - \pi_k(P_p \mathrm{X})(s_i) \right| \right] \le K \max_k \left\{ P\left( \mathrm{X}(s_i) \le c_k \cap P_p \mathrm{X}(s_i) > c_k \right) + P\left( \mathrm{X}(s_i) > c_k \cap P_p \mathrm{X}(s_i) \le c_k \right) \right\}.
\]
We now show how to bound the first probability; the second probability is bounded by similar calculations. Define the events $A = \{\mathrm{X}(s_i) \le c_k \cap P_p \mathrm{X}(s_i) > c_k\}$ and $B = \{P_p \mathrm{X}(s_i) \in [c_k, c_k + \epsilon]\}$. It follows that
\[
P(A) = P(A \mid B) P(B) + P(A \mid B^C) P(B^C) \le P(B) + P(A \mid B^C) \le P\left( P_p \mathrm{X}(s_i) \in [c_k, c_k + \epsilon] \right) + P\left( Q_p \mathrm{X}(s_i) \le -\epsilon \right).
\]
Now set $\epsilon = \sqrt{\sigma_p}$ and recall that $P(Z > t) < \frac{1}{\sqrt{2\pi}\, t} e^{-t^2/2}$ if $Z \sim \mathsf{N}(0, 1)$. This gives
\[
P(A) \le P\left( c_k < P_p \mathrm{X}(s_i) \le c_k + \sqrt{\sigma_p} \right) + P\left( Q_p \mathrm{X}(s_i) \le -\sqrt{\sigma_p} \right) \le C \sqrt{\sigma_p} + \frac{\sqrt{\sigma_p}}{\sqrt{2\pi}} e^{-1/(2\sigma_p)},
\]
which goes to zero as $p \to \infty$, and thus so does the final expectation in (9). $\Box$
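Two elementary facts used in the final display can be checked numerically: the standard Gaussian tail bound $P(Z > t) < e^{-t^2/2}/(\sqrt{2\pi}\, t)$, and the convergence to zero of the resulting bound $C\sqrt{\sigma_p} + \frac{\sqrt{\sigma_p}}{\sqrt{2\pi}} e^{-1/(2\sigma_p)}$ as $\sigma_p \to 0$. A minimal sketch (the trapezoidal quadrature, the truncation point of the tail integral, and the choice $C = 1$ are illustrative assumptions, not part of the proof):

```python
import math

def std_normal_tail(t, n=100000, upper=12.0):
    # P(Z > t) for Z ~ N(0,1), via trapezoidal integration of the density
    # on [t, upper]; the mass beyond `upper` is negligible at this scale.
    h = (upper - t) / n
    phi = lambda x: math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
    return h * (0.5 * phi(t) + sum(phi(t + i * h) for i in range(1, n)) + 0.5 * phi(upper))

# The tail (Mills-ratio) bound used in the proof: P(Z > t) < exp(-t^2/2) / (sqrt(2*pi)*t).
for t in (0.5, 1.0, 2.0, 4.0):
    assert std_normal_tail(t) < math.exp(-t * t / 2.0) / (math.sqrt(2.0 * math.pi) * t)

# The resulting bound on P(A) vanishes as the truncation variance sigma_p^2 -> 0 (C = 1 here).
bound = lambda s: math.sqrt(s) + math.sqrt(s) / math.sqrt(2.0 * math.pi) * math.exp(-1.0 / (2.0 * s))
values = [bound(s) for s in (1e-1, 1e-2, 1e-3)]
assert values[0] > values[1] > values[2]
```

The tail bound is fairly tight for large $t$ (at $t = 4$ it overestimates the true probability by only a few percent), which is why the second term in the bound on $P(A)$ decays so rapidly as $\sigma_p \to 0$.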