Nonstationary, Nonparametric, Nonseparable Bayesian Spatio-Temporal Modeling Using Kernel Convolution of Order Based Dependent Dirichlet Process
Moumita Das and Sourabh Bhattacharya∗

Abstract
Spatio-temporal processes are important modeling tools for a variety of problems in environmental science, biological science, geographical science, etc. It is generally assumed that the underlying model is parametric, typically a Gaussian process, and that the covariance function is stationary and separable. That this structure need not always be realistic has been perceived by many researchers, and attempts have been made to construct nonparametric processes with covariance functions that are neither stationary nor separable. But, as we elucidate, some desirable and important spatio-temporal properties are not guaranteed by the existing approaches, thus calling for further innovative ideas.

In this paper, using kernel convolution of the order based dependent Dirichlet process (Griffin & Steel (2006)), we construct a nonstationary, nonseparable, nonparametric space-time process which, as we show, satisfies desirable properties and includes the stationary, separable, parametric processes as special cases. We also investigate the smoothness properties of our proposed model.

Since our model entails an infinite random series, for Bayesian model fitting purposes we must either truncate the series or, more appropriately, consider a random number of summands, which renders the model dimension a random variable. We attack the variable dimensionality problem using the novel Transdimensional Transformation based Markov Chain Monte Carlo (TTMCMC) methodology introduced by Das & Bhattacharya (2014b), which can update all the variables and also change dimensions in a single block using a single random variable drawn from some arbitrary density defined on a relevant support.

∗ Moumita Das is a PhD student and Sourabh Bhattacharya is an Assistant Professor in Bayesian and Interdisciplinary Research Unit, Indian Statistical Institute, 203, B. T. Road, Kolkata 700108. Corresponding e-mail: [email protected].
For the sake of completeness we also address the problem of truncating the infinite series by providing a uniform bound on the error incurred by truncation. We illustrate our model and the methodologies on a simulated data set, and also fit a real ozone data set. The results that we obtain from both studies are quite encouraging.

Keywords: Kernel convolution; Nonstationary; Nonseparable; Order based Dependent Dirichlet Process; Spatio-temporal data; Transdimensional Transformation based Markov Chain Monte Carlo.
1. INTRODUCTION

Recent years have witnessed a considerable amount of research on spatial and spatio-temporal modeling. Though it is common practice to assume that the underlying spatial or spatio-temporal process is a stationary and isotropic Gaussian process, in reality such assumptions seldom hold. This is because there may be local influences affecting the correlation structure of the random process. For instance, orographic effects influence the atmospheric transport of pollutants, and result in a correlation structure that depends on the individual spatial locations (Guttorp & Sampson (1994)). It is also restrictive to assume that the underlying process is Gaussian, and nonparametric processes seem more reasonable in realistic situations.

Realizing the limitations of stationary parametric processes (almost invariably Gaussian processes), researchers have come up with many novel ideas for constructing nonstationary and/or nonparametric processes. The first significant work in the framework of nonstationary parametric processes is by Sampson & Guttorp (1992), who proposed an approach based on spatial deformation. This work is followed up by Damian, Sampson & Guttorp (2001) and Schmidt & O'Hagan (2003), who provide the corresponding Bayesian generalizations. Nonstationarity has been induced in parametric space-time models by Haas (1995), who proposed a moving window regression residual kriging. A similar approach has been proposed by Nott & Dunsmuir (2002). Higdon (1998) (see also Higdon, Swall & Kern (1999), Higdon (2001)) proposed a kernel convolution approach for inducing nonstationarity in Gaussian processes. Similar approaches are also proposed by Fuentes & Smith (2001) and Fuentes (2002).
Approaches that attempt to model the underlying process as nonparametric, in addition to modeling the covariance structure as nonstationary, are more recent in comparison; the approach of Gelfand, Kottas & MacEachern (2005) based on Dirichlet processes (see, for example, Ferguson (1973), Ferguson (1974)) is the first in this regard; see Duan, Guindani & Gelfand (2007) for a generalization. Duan, Gelfand & Sirmans (2009) use stochastic differential equations to construct a nonstationary, non-Gaussian process. We discuss these proposals in some detail in Section 2.

Griffin & Steel (2006) (henceforth, GS) proposed the novel order-based dependent Dirichlet process (ODDP), which satisfies most desirable properties of nonparametric processes, and can be seen as a significant improvement over the other existing ideas. Indeed, our modeling idea also hinges upon the constructive idea of GS, but we attempt to incorporate further flexibility in our spatial/temporal/spatio-temporal model, in terms of nonstationarity and nonseparability, through our proposed kernel convolution based methodology.

The rest of our paper is structured as follows. In Section 2 we provide a brief overview of the existing approaches to construction of nonstationary, nonseparable space-time processes in both parametric and nonparametric frameworks, arguing that not all desirable properties are necessarily accounted for in these approaches. Such issues necessitate the development of new approaches to construction of nonstationary, nonparametric, nonseparable space-time models. In Section 3 we introduce our proposed space-time model based on kernel convolution of the ODDP and show that it satisfies the properties that are not guaranteed by the existing models. We investigate continuity and smoothness properties of our model in Section 4.
Since our proposed model involves a random infinite series, for model fitting one needs either to truncate the series or to assume a random number of summands and adopt variable dimensional Markov Chain Monte Carlo (MCMC) approaches. Although we adopt the latter framework for our applications, and implement the recently developed Transdimensional Transformation based Markov Chain Monte Carlo (TTMCMC) (Das & Bhattacharya (2014b)) for simulating from our variable dimensional model, for the sake of completeness we also investigate the truncation approach. Indeed, in Section 5 we consider the difference between the prior predictive models with and without truncation of the random infinite series, providing a bound that depends upon the truncation parameter. Thus, the truncation parameter can be chosen so that the bound falls below any desired level. In Section 6 we discuss the choice of suitable kernels, prior distributions, and the spatio-temporal domain that is relevant for computational purposes. We describe the joint posterior distribution associated with our model, and provide a brief discussion of TTMCMC, in Section 7. We detail a simulation study illustrating the performance of our model in Section 8, and follow it up with the analysis of a real ozone data set in Section 9. Finally, we summarize our contributions and provide concluding remarks in Section 10.

Proofs of our results and requisite details of TTMCMC, particularly in the context of our spatio-temporal model, are provided in the supplement Das & Bhattacharya (2014a), whose sections and algorithms have the prefix "S-" when referred to in this paper.

2. OVERVIEW OF OTHER AVAILABLE NONSTATIONARY APPROACHES

2.1 Parametric approaches

The deformation approaches of Sampson & Guttorp (1992), Damian et al. (2001), and Schmidt & O'Hagan (2003) are based on Gaussian processes. In these approaches replications of the data are necessary, which the authors relate to temporal independence of the data.
This also means that space-time data cannot be modeled using these approaches. Moreover, in the deformation-based approaches, model based theoretical correlations between random observations separated by large enough distances need not necessarily tend to zero. Sampson & Guttorp (1992) deal with a variogram of the following form:

Var(Y(s1, t) − Y(s2, t)) = f(‖d(s1) − d(s2)‖), (2.1)

for any s1, s2, t, where f is an appropriate monotone function and d is a one-to-one nonlinear mapping. The technique of Sampson & Guttorp (1992) involves appropriately approximating f by f̂ using the multidimensional scaling method, and obtaining a configuration of points {u1, ..., u_n} in a "deformed" space where the process is assumed isotropic. Then, using thin-plate splines, a nonlinear approximation of d, which we denote by d̂, is determined such that d̂(s_i) ≈ u_i, for i = 1, ..., n. Bayesian versions of the key idea have been described in Damian et al. (2001), who use random thin-plate splines, and Schmidt & O'Hagan (2003), who use a Gaussian process to implement the nonlinear transformation d. Rather than estimating f nonparametrically, both specify a parametric functional form from a valid class of such monotone functions.

As is clear, since a large difference ‖s1 − s2‖ does not imply that ‖d(s1) − d(s2)‖ is also large, the model based correlations between two widely separated observations need not necessarily tend to zero, in either of the aforementioned deformation-based approaches.

The kernel convolution approaches of Higdon et al. (1999), Higdon (2001), and Fuentes & Smith (2001) overcome some of the difficulties of the deformation approach. In these approaches data replication is not necessary, and for appropriate choices of the kernel, stationarity, nonstationarity, separability, and nonseparability can be achieved with respect to spatio-temporal data. In the approach of Higdon et al.
(1999) and Higdon (2001),

Y(x) = ∫ K(x, u) Z(u) du, (2.2)

where K is a kernel function and Z(·) is a white noise process. Then the covariance between Y(x1) and Y(x2) is given by

C(x1, x2) = ∫ K(x1, u) K(x2, u) du. (2.3)

In general, this does not depend upon x1 and x2 only through d = ‖x1 − x2‖, thus achieving nonstationarity. However, it is clear from the covariance structure (2.3) that C(x1, x2) does not generally tend to zero as d = ‖x1 − x2‖ → ∞. But for separable space-time processes related to representation (2.2) this property holds under the additional assumption of isotropy with respect to either space or time. We elaborate on this below.

Although representation (2.2) cannot achieve separability with respect to space and time, a modified representation of the following form does:

Y(s, t) = ∫∫ K1(s, u) K2(t, v) Z1(u) Z2(v) du dv. (2.4)

In (2.4), K1, K2 are two kernel functions, and Z1(·), Z2(·) are independent white noise processes. Now the covariance is given by

C((s1, t1), (s2, t2)) = ∫∫ K1(s1, u) K1(s2, u) K2(t1, v) K2(t2, v) du dv = C1(s1, s2) × C2(t1, t2), (2.5)

where

C1(s1, s2) = ∫ K1(s1, u) K1(s2, u) du, (2.6)

C2(t1, t2) = ∫ K2(t1, v) K2(t2, v) dv, (2.7)

exhibiting separability. Further assuming that either of C1 or C2 is isotropic, it follows that if the corresponding d1 = ‖s1 − s2‖ or d2 = |t1 − t2| tends to infinity, the covariance given by (2.5) tends to zero, even though the other of C1 or C2 is nonstationary. But if both C1 and C2 are nonstationary, then this result need not hold.

The approach of Fuentes & Smith (2001) comes close to solving the problem of zero covariance in the limit with large enough separation between observations, which we now explain.
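As a concrete illustration of the separable construction (2.4)–(2.7), the following minimal Python sketch approximates the factors C1 and C2 by numerical integration and checks that the product covariance decays to zero as the spatial separation grows. The Gaussian choices for K1 and K2, and all function names, are our own assumptions for illustration, not the paper's specification.

```python
import math

def k1(s, u, ell=1.0):
    # Spatial smoothing kernel K1: an isotropic Gaussian bump (our assumption).
    return math.exp(-0.5 * ((s - u) / ell) ** 2)

def k2(t, v, ell=1.0):
    # Temporal smoothing kernel K2, of the same Gaussian form.
    return math.exp(-0.5 * ((t - v) / ell) ** 2)

def c1(s1, s2, lo=-20.0, hi=20.0, n=4000):
    # C1(s1, s2) = int K1(s1, u) K1(s2, u) du, midpoint-rule approximation.
    h = (hi - lo) / n
    return h * sum(k1(s1, lo + (i + 0.5) * h) * k1(s2, lo + (i + 0.5) * h)
                   for i in range(n))

def c2(t1, t2, lo=-20.0, hi=20.0, n=4000):
    # C2(t1, t2) = int K2(t1, v) K2(t2, v) dv.
    h = (hi - lo) / n
    return h * sum(k2(t1, lo + (i + 0.5) * h) * k2(t2, lo + (i + 0.5) * h)
                   for i in range(n))

def c_sep(s1, t1, s2, t2):
    # Separable space-time covariance (2.5): the product C1 x C2.
    return c1(s1, s2) * c2(t1, t2)

# Because K1 here is isotropic, the spatial factor, and hence the product,
# vanishes as the spatial separation grows.
print(c_sep(0.0, 0.0, 0.0, 0.0), c_sep(0.0, 0.0, 10.0, 0.0))
```

With nonstationary (non-isotropic) choices of both kernels, this decay is no longer guaranteed, which is the point made in the text.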
They model the underlying process as

Y(x) = ∫ K(x − u) Z_{θ(u)}(x) du, (2.8)

where {Z_θ(x); x ∈ D} is a family of independent, stationary Gaussian processes indexed by θ, and the covariance of Z_{θ(u)} is given by

Cov(Z_{θ(u)}(x1), Z_{θ(u)}(x2)) = C_{θ(u)}(x1 − x2). (2.9)

Then the covariance between Y(x1) and Y(x2) is given by

C(x1, x2; θ) = ∫ K(x1 − u) K(x2 − u) C_{θ(u)}(x1 − x2) du. (2.10)

For practical purposes, Fuentes & Smith (2001) approximate Y(x) with

Ŷ(x) = (1/M) Σ_{m=1}^M K(x − u_m) Z_{θ(u_m)}(x), (2.11)

and C(x1, x2; θ) by

Ĉ(x1, x2; θ) = (1/M) Σ_{m=1}^M K(x1 − u_m) K(x2 − u_m) C_{θ(u_m)}(x1 − x2), (2.12)

where {u1, ..., u_M} can be thought of as a set of locations drawn independently from the domain D. Assuming that the family of independent Gaussian processes {Z_θ(x); x ∈ D} is also isotropic, it follows, using the fact that M is finite, that Ĉ(x1, x2; θ) → 0 as ‖x1 − x2‖ → ∞, since C_{θ(u_m)}(x1 − x2) → 0 for each m = 1, ..., M. However, this of course does not guarantee that Ĉ(x1, x2; θ) → 0 as M → ∞. That is, this does not necessarily imply that C(x1, x2; θ) → 0 as ‖x1 − x2‖ → ∞. For other available parametric approaches to nonstationarity we refer to the references provided in Chang et al. (2011).

2.2 Nonparametric approaches

Gelfand et al. (2005) seem to be the first to propose a nonstationary, nonparametric Bayesian model based on Dirichlet process mixing. They represent the random field Y_D = {Y(x); x ∈ D} as Σ_{ℓ=1}^∞ w_ℓ δ_{θ_{ℓ,D}}, where θ_{ℓ,D} = {θ_ℓ(x); x ∈ D} are realizations from a specified stationary Gaussian process, which we denote by G0, and w1 = V1, w_ℓ = V_ℓ ∏_{r=1}^{ℓ−1} (1 − V_r) for ℓ ≥ 2, with V_r iid ∼ Beta(1, α); r = 1, 2, .... Thus, a random process G is induced on the space of processes of Y_D, with G0 being the "central" process.
Gelfand et al. (2005) assume the space-time data Y_t = (Y(s1, t), ..., Y(s_n, t))′ to be time-independent for t = 1, ..., T, which is the same assumption of data replication used in the deformation-based approaches. The temporal-independence assumption allows Gelfand et al. (2005) to model the data as follows: for t = 1, ..., T, Y_t iid ∼ G^(n) and G^(n) ∼ DP(G0^(n)), where G^(n) and G0^(n) denote the n-variate distributions corresponding to the processes G and G0. The development leads to the following covariance structure: for any s1, s2, t,

Cov(Y(s1, t), Y(s2, t) | G) = Σ_{ℓ=1}^∞ w_ℓ θ_ℓ(s1) θ_ℓ(s2) − {Σ_{ℓ=1}^∞ w_ℓ θ_ℓ(s1)}{Σ_{ℓ=1}^∞ w_ℓ θ_ℓ(s2)}, (2.13)

which is nonstationary. However, marginalized over G, the covariance between Y(s1, t) and Y(s2, t) turns out to be stationary. Since, in Gelfand et al. (2005), the Bayesian inference on the data Y_1, ..., Y_T proceeds by integrating out G^(n), the entire flavour of nonstationarity is lost. Also, given G, (2.13) is nonstationary but need not necessarily converge to zero if ‖s1 − s2‖ → ∞.

Duan et al. (2007) attempt to generalize the model of Gelfand et al. (2005) by specifying

Pr{Y(x1) ∈ A1, ..., Y(x_n) ∈ A_n} = Σ_{i1=1}^∞ ··· Σ_{i_n=1}^∞ p_{i1,...,i_n} δ_{θ_{i1}(x1)}(A1) ··· δ_{θ_{i_n}(x_n)}(A_n), (2.14)

where the θ_j's are iid G0 as in Gelfand et al. (2005), and the probabilities {p_{i1,...,i_n} ≥ 0 : Σ_{i1=1}^∞ ··· Σ_{i_n=1}^∞ p_{i1,...,i_n} = 1} determine the site-specific joint selection probabilities, which also must satisfy simple constraints to ensure consistency. The resulting conditional covariance (conditional on G) and the marginal covariance are somewhat modified versions of those of Gelfand et al. (2005), but now even the marginal covariance is nonstationary.
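The stick-breaking weights w1 = V1, w_ℓ = V_ℓ ∏_{r<ℓ}(1 − V_r) appearing in the representations above (and reused, in permuted form, by the ODDP later in this paper) are easy to simulate directly. The following minimal sketch (function names are ours) draws a truncated sequence of weights and checks that they nearly exhaust the unit mass, since the leftover mass after n terms has expectation {α/(α+1)}^n:

```python
import random

def stick_breaking_weights(alpha, n_terms, rng):
    # w_1 = V_1, w_l = V_l * prod_{r<l} (1 - V_r), with V_r iid Beta(1, alpha).
    weights, remaining = [], 1.0
    for _ in range(n_terms):
        v = rng.betavariate(1.0, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights

rng = random.Random(7)
w = stick_breaking_weights(alpha=2.0, n_terms=500, rng=rng)
# The truncated weights nearly sum to one; the leftover mass decays
# geometrically in the truncation level.
print(sum(w))
```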
By choosing G0 to be an isotropic Gaussian process it can be ensured that the marginal covariance tends to zero as two observations become widely separated, but the same cannot be ensured for the conditional covariance. Moreover, replication of the data is necessary even for this generalized version of Gelfand et al. (2005), and modeling temporal dependence is precluded as before.

A nonstationary, nonseparable, non-Gaussian spatio-temporal process has been constructed by Duan et al. (2009) using discretized versions of stochastic differential equations, but again, the correlations between largely separated observations do not necessarily tend to zero under their model. Also, stationarity or separability cannot be derived as special cases of this approach.

A flexible approach using kernel convolution of Lévy random measures has been detailed in Wolpert, Clyde & Tu (2011), but even this approach does not guarantee that correlations tend to zero for large separations for arbitrarily chosen kernels.

In the next section we introduce our idea based on kernel convolution of the ODDP and show that it overcomes the issues faced by the traditional approaches to construction of flexible space-time models.

3. KERNEL CONVOLUTION OF ODDP

Before introducing our proposal, it is necessary to first provide an overview of the ODDP.

3.1 Overview of ODDP

In order to induce spatial dependence between observations at different locations, GS modify the nonparametric stick-breaking construction of Sethuraman (1994) in the following way: for each point x ∈ D, where D is some specified domain, they define the distribution

G_x = Σ_{i=1}^∞ p_i(x) δ_{θ_{π_i(x)}}, (3.1)

where p_i(x) = V_{π_i(x)} ∏_{j<i} (1 − V_{π_j(x)}), with V_k iid ∼ Beta(1, α) and θ_k iid ∼ G0. If π_i(x) = i for each x and i, then the Dirichlet process (DP) results at all locations.

GS construct π(x) in a way such that it is associated with the realization of a point process.
Specifically, they consider a stationary Poisson process Φ and a sequence of sets U(x) for x ∈ D, the latter determining the relevant region for the ordering purpose. In the case of purely spatial problems, if x ∈ D ⊂ R^d for d ≥ 1, then GS suggest U(x) = D for all x ∈ D as a suitable construction of U(x). For time series problems they suggest D = R and U(x) = (−∞, x]. When x = (s′, t)′, that is, when x consists of both spatial and temporal co-ordinates, for our modeling purpose we use U(x) = D × (−∞, t].

Letting {z1, z2, ...} denote a realization of the stationary Poisson point process, the ordering π(x) is chosen to satisfy

‖x − z_{π_1(x)}‖ < ‖x − z_{π_2(x)}‖ < ‖x − z_{π_3(x)}‖ < ···,

where ‖·‖ is a distance measure and z_{π_i(x)} ∈ Φ ∩ U(x). Thus, although the set of probabilities {p_i(x); i = 1, 2, ...} remains the same for all locations, they are randomly permuted. This random permutation, in turn, induces spatial dependence. Assuming a homogeneous Poisson point process with intensity λ, the ODDP is characterized by G0, α, and λ. We express the dependence of the ODDP on these parameters by writing ODDP(αG0, λ).

Assuming that data {y1, ..., y_n} are available at sites {x1, ..., x_n}, GS embed the ODDP in a hierarchical Bayesian model:

y_i ∼ f_{θ_i}(·); θ_i ∼ G_{x_i}, independently; G_{x_i} ∼ ODDP(αG0, λ). (3.2)

Note that the same theory can be extended to space-time situations with x = (s′, t)′, where s stands for the spatial location and t stands for the time point.

Next, we introduce our proposed idea of kernel convolution of the ODDP.

3.2 Kernel convolution of ODDP

We consider the following model for the data Y = {y1, ..., y_n} at locations/times {x_i = (s′_i, t_i)′; i = 1, ..., n}:

y_i = f(x_i) + ε_i, (3.3)

where ε_i iid ∼ N(0, σ²), for unknown σ². We represent the spatio-temporal process f(x) as a convolution of the ODDP G_x with a smoothing kernel K(x, ·):

f(x) = ∫ K(x, θ) dG_x(θ) = Σ_{i=1}^∞ K(x, θ_{π_i(x)}) p_i(x), for all x ∈ D ⊆ R^d, (3.4)

d (≥ 1) being the dimension of x. The following theorem, the proof of which is presented in Section S-1 of the supplement, gives an expression for the expectation of f(x).

Theorem 1
Let ∫ |K(x, θ)| dG0(θ) < ∞. Then ∫ |K(x, θ)| dG_x(θ) < ∞ with probability one, and

E(f(x)) = E ∫ K(x, θ) dG_x(θ) = ∫ K(x, θ) d E G_x(θ) = ∫ K(x, θ) dG0(θ) = E_{G0} K(x, θ).

Before deriving the covariance structure of f(·), we define the necessary notation following GS. Let T(x1, x2) = {k : there exist i, j such that π_i(x1) = π_j(x2) = k}. For k ∈ T(x1, x2) and l = 1, 2, we further define A_{lk} = {π_j(x_l) : j < i, where π_i(x_l) = k}, S_k = A_{1k} ∩ A_{2k} and S′_k = A_{1k} ∪ A_{2k} − S_k. Then the following theorem, the proof of which is deferred to Section S-2 of the supplement, provides an expression for the covariance structure of f(·), which will be our reference point for arguments regarding nonstationarity and other desirable spatial properties in comparison with the existing methods.

Theorem 2 If ∫ |K(x1, θ)| dG0(θ) < ∞ and ∫ |K(x1, θ) K(x2, θ)| dG0(θ) < ∞, then, for a fixed ordering at x1 and x2,

Cov(f(x1), f(x2)) = Cov_{G0}(K(x1, θ), K(x2, θ)) × [2 / {(α + 1)(α + 2)}] Σ_{k ∈ T(x1, x2)} {α/(α + 2)}^{#S_k} {α/(α + 1)}^{#S′_k}, (3.5)

where

Cov_{G0}(K(x1, θ), K(x2, θ)) = ∫ K(x1, θ) K(x2, θ) dG0(θ) − E_{G0}(K(x1, θ)) E_{G0}(K(x2, θ)). (3.6)

Corollary 3
It follows from the above theorem that, for i = 1, 2, if ∫ K²(x_i, θ) dG0(θ) < ∞, then

Var(f(x_i)) = Var_{G0}(K(x_i, θ)) / (α + 1), (3.7)

and

Corr(f(x1), f(x2)) = Corr_{G0}(K(x1, θ), K(x2, θ)) × Corr(G_{x1}, G_{x2}), (3.8)

where

Corr(G_{x1}, G_{x2}) = {2/(α + 2)} Σ_{k ∈ T(x1, x2)} {α/(α + 2)}^{#S_k} {α/(α + 1)}^{#S′_k}. (3.9)

The expression for the correlation in (3.9) has been obtained by GS.

The above results provide an expression for the correlation conditional on a fixed ordering. To obtain the unconditional correlation it is necessary to marginalize the conditional correlation over the point process Φ. Following GS we also modify the notation as follows: we now let T(x1, x2) = Φ ∩ U(x1) ∩ U(x2) and A_{ℓk} = A_ℓ(z_k), where A_ℓ(z) = {w ∈ Φ ∩ U(x_ℓ) : ‖w − x_ℓ‖ < ‖z − x_ℓ‖}, for z ∈ Φ ∩ U(x_ℓ). As already mentioned in Section 3.1, when x = (s′, t)′, we define U(x) = D × (−∞, t]. Also, for z ∈ T(x1, x2), we let S(z) = A_1(z) ∩ A_2(z) and S′(z) = A_1(z) ∪ A_2(z) − S(z), which imply that S(z) = {w ∈ T(x1, x2) : ‖w − x1‖ < ‖z − x1‖ and ‖w − x2‖ < ‖z − x2‖}. We further define, as in GS, S_{−z}(z) and S′_{−z}(z) to be the translations of S(z) and S′(z), respectively, by −z. Then, in the case where Φ is a stationary Poisson process with intensity λ, the refined Campbell theorem yields

Corr(f(x1), f(x2)) = Corr_{G0}(K(x1, θ), K(x2, θ)) × {2λ/(α + 2)} ∫_{U(x1) ∩ U(x2)} ∫ {α/(α + 2)}^{φ_{−z}(S_{−z}(z))} {α/(α + 1)}^{φ_{−z}(S′_{−z}(z))} P_0(dφ) dz. (3.10)

In (3.10), P_0(dφ) is the Palm distribution of Φ at the origin, and φ_{−z} is the realization of Φ translated by −z.
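To make the ordering construction and the conditional correlation (3.9) concrete, the following minimal Python sketch (all names ours; U(x) taken as the whole domain, as in the purely spatial case) simulates a homogeneous Poisson process, computes the distance-based orderings π(x) at two sites, forms the sets S_k and S′_k, and evaluates (3.9):

```python
import math
import random

def poisson_rv(mean, rng):
    # Knuth's method for a Poisson count (adequate for moderate means).
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def sample_poisson_points(intensity, region, rng):
    # Homogeneous Poisson process on a rectangle region = ((x0, x1), (y0, y1)).
    (x0, x1), (y0, y1) = region
    n = poisson_rv(intensity * (x1 - x0) * (y1 - y0), rng)
    return [(rng.uniform(x0, x1), rng.uniform(y0, y1)) for _ in range(n)]

def ordering(x, points):
    # pi(x): point indices sorted by distance from x; with U(x) = D every
    # point of the realization participates in the ordering.
    return sorted(range(len(points)), key=lambda k: math.dist(x, points[k]))

def conditional_corr_G(x1, x2, points, alpha):
    # Conditional correlation (3.9) for a fixed point-process realization:
    # {2/(alpha+2)} sum_k {alpha/(alpha+2)}^{#S_k} {alpha/(alpha+1)}^{#S'_k}.
    pi1, pi2 = ordering(x1, points), ordering(x2, points)
    total = 0.0
    for k in range(len(points)):
        a1 = set(pi1[:pi1.index(k)])   # points preceding k in the order at x1
        a2 = set(pi2[:pi2.index(k)])   # points preceding k in the order at x2
        s_k, sp_k = a1 & a2, (a1 | a2) - (a1 & a2)
        total += ((alpha / (alpha + 2.0)) ** len(s_k)
                  * (alpha / (alpha + 1.0)) ** len(sp_k))
    return 2.0 / (alpha + 2.0) * total

rng = random.Random(42)
pts = sample_poisson_points(0.3, ((-10.0, 10.0), (-10.0, 10.0)), rng)
alpha = 1.0
c_same = conditional_corr_G((-5.0, -5.0), (-5.0, -5.0), pts, alpha)
c_near = conditional_corr_G((-5.0, -5.0), (-4.0, -5.0), pts, alpha)
c_far = conditional_corr_G((-5.0, -5.0), (5.0, 5.0), pts, alpha)
print(c_same, c_near, c_far)
```

At coincident sites the two orderings agree, every S′_k is empty, and the sum telescopes to essentially one; as the sites separate, the orderings disagree more and the correlation falls off, which is the mechanism behind Theorem 4.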
Note also that the second factor of the above correlation is the unconditional correlation between G_{x1} and G_{x2} (see GS). The following theorem, the proof of which is provided in Section S-3 of the supplement, shows that the above correlation structure of our kernel convolution based ODDP satisfies desirable properties.

Theorem 4 Corr(f(x1), f(x2)) → 1 as ‖x1 − x2‖ → 0, and Corr(f(x1), f(x2)) → 0 as ‖x1 − x2‖ → ∞.

Under a stationary Poisson process assumption for Φ, and for the particular specifications of U(x) mentioned in Section 3.1, the calculations of GS show that the second factor depends upon x1 and x2 only through ‖x1 − x2‖, leading to isotropy of the process. There does not seem to exist any result analogous to the refined Campbell theorem in the context of nonstationary Poisson processes which might allow one to construct a nonstationary correlation structure in this case. The analytic form of the ODDP correlation structure need not be available for other constructions of U(x) either. Isotropy results even in the case of the more flexible Cox processes.

On the other hand, our kernel convolution idea neatly solves this problem of attainment of nonstationarity via the first factor of our correlation structure given in (3.10). Indeed, the kernel K(x, θ) can be chosen in the spirit of Higdon et al. (1999), for instance, such that Corr_{G0}(K(x1, θ), K(x2, θ)) does not depend upon x1 − x2 alone. In other words, by simply controlling the kernel we can ensure nonstationarity of our process f(·) even if the underlying ODDP is stationary or even isotropic. Of course, our process can be made stationary as well by choosing the kernel, say, in the spirit of Higdon (1998), and setting U(x) to be of the forms specified by GS when x consists of either only spatial co-ordinates or only the temporal co-ordinate.
When x = (s′, t)′, we set U(x) = D × (−∞, t], as already mentioned before.

We further note that our general space-time correlation structure given by (3.8) is nonseparable; that is, in general, Corr(f(s1, t1), f(s2, t2)) ≠ Corr_1(s1, s2) × Corr_2(t1, t2), where Corr_1 and Corr_2 are spatial and temporal correlation structures, respectively. However, if desired, separability can easily be induced by allowing the kernel to depend only upon the spatial location and the ordering π to depend only upon time, or vice versa. Specifically, letting K(x, θ) = K(s, θ) and π(x) = π(t), we obtain

Corr(f(s1, t1), f(s2, t2)) = Corr_{G0}(K(s1, θ), K(s2, θ)) × Corr(G_{t1}, G_{t2}), (3.11)

and letting K(x, θ) = K(t, θ) and π(x) = π(s), we obtain

Corr(f(s1, t1), f(s2, t2)) = Corr_{G0}(K(t1, θ), K(t2, θ)) × Corr(G_{s1}, G_{s2}). (3.12)

In contrast, under the ODDP approach of GS, it is clear from the correlation structure that Corr(G_{x1}, G_{x2}) ≠ Corr_1(s1, s2) × Corr_2(t1, t2), showing that separability cannot be enforced if desired.

Thus, following our approach it is easy to construct nonparametric covariance structures that are either stationary or nonstationary, and, in turn, either separable or nonseparable, as desired. This illustrates the considerable flexibility inherent in our approach, while satisfying at the same time the desirable conditions that the correlation between f(x1) and f(x2) tends to one or to zero according as the distance between x1 and x2 tends to zero or to infinity.

In the next section we investigate the continuity and smoothness properties of our proposed model.

4. CONTINUITY AND SMOOTHNESS PROPERTIES OF OUR MODEL

For stationary models, properties like continuity and smoothness can be quite generally characterized by the continuity and smoothness of the correlation function.
In particular, continuity and smoothness of stationary processes typically depend upon the behaviour of the correlation function at zero; see Yaglom (1987a) and Yaglom (1987b) for details. For nonstationary processes, however, such elegant theory is not available. Indeed, the structure of the correlation function itself may be difficult to get hold of, rendering it difficult to investigate the properties of the underlying nonstationary stochastic process. For our purpose, we utilize the notions of almost sure continuity, mean square continuity and mean square differentiability of stochastic processes (see, for example, Stein (1999), Banerjee & Gelfand (2003)) to study the properties of our nonstationary spatio-temporal process.

Definition 5
A process {X(x), x ∈ R^d} is L2-continuous at x0 if lim_{x → x0} E[X(x) − X(x0)]² = 0. Continuity in the L2 sense is also referred to as mean square continuity and will be denoted by X(x) →_{L2} X(x0).

Definition 6 A process {X(x), x ∈ R^d} is almost surely continuous at x0 if X(x) → X(x0) a.s. as x → x0. If the process is almost surely continuous for every x0 ∈ R^d, then the process is said to have continuous realizations.

Theorem 7
Assume the following conditions:

(A1) For all x and θ, |K(x, θ)| < M for some M < ∞.

(A2) Given any θ, K(x, θ) is a continuous function of x.

Then f(·) is both almost surely continuous and mean square continuous in the interior of ∩_{k=1}^∞ A_{k, i_k}, where A_{k, i_k} = {x : π_k(x) = i_k}, with, for each k = 1, 2, ..., i_k ∈ {1, 2, ...} and i_k ≠ i_{k′} for any k ≠ k′. On the other hand, f(·) is almost surely discontinuous at any point x ∈ ∩_{k=1}^∞ A_{k, i_k} lying on the boundary of A_{k, i_k}, for any i_k.

See Section S-4 for a proof of this result. Now we examine the mean square differentiability of our process.
Definition 8
A process {X(x), x ∈ R^d} is said to be mean square differentiable at x0 if, for any direction u, there exists a process L_{x0}(u), linear in u, such that X(x0 + u) = X(x0) + L_{x0}(u) + R(x0, u), where R(x0, u)/‖u‖ →_{L2} 0 as ‖u‖ → 0.

Theorem 9
Assume the following conditions:

(B1) For all x and θ, |K(x, θ)| < M for some M < ∞.

(B2) Given any θ, K(x, θ) is a continuously differentiable function of x.

Then f(·) is mean square differentiable in the interior of ∩_{k=1}^∞ A_{k, i_k}.

See Section S-5 for a proof of this theorem.

5. TRUNCATION OF THE INFINITE SUMMAND

Since our proposed model f(x) = Σ_{i=1}^∞ K(x, θ_{π_i(x)}) p_i(x) is an infinite (random) series, for model-fitting purposes it is necessary either to truncate the series to f_N(x) = Σ_{i=1}^N K(x, θ_{π_i(x)}) p_i(x), where N is to be determined, or to implement variable-dimensional Markov chain methods where N is considered a random variable, so that the number of parameters associated with f(x) is also a random variable.

Although we will describe and implement the novel variable dimensional Markov chain methodology introduced by Das & Bhattacharya (2014b), which the authors refer to as Transdimensional Transformation based Markov Chain Monte Carlo (TTMCMC), we first prove a theorem with respect to truncation of the infinite random series. Note that in the context of the traditional Dirichlet process characterized by Sethuraman's stick-breaking construction (Sethuraman (1994)), which involves an infinite random series, Ishwaran & James (2001) proposed a method of truncating the infinite series.

We now state our theorem on truncation, the proof of which is provided in Section S-6 of the supplement. But before stating the theorem it is necessary to define some required notation. Let

P_N(x_j) = Σ_{i=1}^N K(x_j, θ_i) p_i, and P(x_j) = Σ_{i=1}^∞ K(x_j, θ_i) p_i,

where N needs to be determined. Also let P_N = (P_N(x1), ..., P_N(x_n))′ and P = (P(x1), ..., P(x_n))′, and denote by Θ_N and Θ the sets of random quantities (V_i, θ_i) associated with P_N and P, respectively. We define the following marginal densities of the vector of observations y = (y1, ..., y_n)′:

m_N(y) = ∫_{Θ_N} [y | P_N][P_N] dΘ_N = ∫_Θ [y | P_N][P] dΘ,

m_∞(y) = ∫_Θ [y | P][P] dΘ.

Theorem 10
Under the assumption that sup_θ K(x_i, θ) ≤ M for i = 1, ..., n, where M > 0 is a finite constant, we have

∫_{R^n} |m_N(y) − m_∞(y)| dy ≤ M n {α/(α + 2)}^N + 2 √(2/π) M n {α/(α + 1)}^N.
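Since the bound decays geometrically in N, the truncation level can be chosen as the smallest N for which the bound falls below a desired tolerance. A minimal sketch (function names ours, and the constants follow our reading of the bound in Theorem 10, which should be checked against the supplement):

```python
import math

def truncation_bound(N, M, n, alpha):
    # L1 bound between the truncated and untruncated prior predictive
    # densities; the exact constants follow our reconstruction of Theorem 10.
    return (M * n * (alpha / (alpha + 2.0)) ** N
            + 2.0 * math.sqrt(2.0 / math.pi) * M * n * (alpha / (alpha + 1.0)) ** N)

def choose_truncation(M, n, alpha, eps):
    # Smallest truncation level N whose error bound falls below eps.
    N = 1
    while truncation_bound(N, M, n, alpha) > eps:
        N += 1
    return N

# Example: kernel bound M = 1, n = 100 observations, alpha = 1,
# tolerance 1e-3 on the prior predictive discrepancy.
N = choose_truncation(M=1.0, n=100, alpha=1.0, eps=1e-3)
print(N, truncation_bound(N, 1.0, 100, 1.0))
```

The geometric decay means N grows only logarithmically in 1/ε, so modest truncation levels already make the truncated and untruncated prior predictives practically indistinguishable.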
6. CHOICE OF KERNEL, PRIOR DISTRIBUTIONS AND COMPUTATIONAL REGION

The choice of the kernel K(·, ·) plays a crucial role in nonstationary spatio-temporal data analysis. For instance, if K(x, θ) = K(x − θ), then the correlation between Y(x₁) and Y(x₂) turns out to be a function of x₁ − x₂, thus inducing stationarity. For the purpose of nonstationarity, it is necessary to make the parameters of the kernel depend upon space and time. In the spatial context such nonstationary kernels are considered in Higdon et al. (1999). In this paper, we consider a nonstationary space-time kernel; for the spatial part of the kernel we essentially adopt the dependence structure proposed by Higdon et al. (1999), and for the temporal part we allow the relevant coefficient to be time varying, modeled by a stationary Gaussian process.

In particular, we consider the following kernel for our applications:

K(s, t, θ, τ) = exp{ −(1/2)(s − θ)' Σ(s) (s − θ) − δ(t)|t − τ| },

where Σ(s) is a 2 × 2 positive definite matrix depending upon the location s, and δ(t) > 0 for every time point t. We assume that log(δ(t)) is a zero-mean Gaussian process with covariance c_δ(t₁, t₂) = σ²_δ exp{−(t₁ − t₂)²/a_δ}. Following Higdon et al. (1999) we set

Σ(s) = ϕ R(s)' D(s) R(s), with R(s) = ( cos α(s)  sin α(s) ; −sin α(s)  cos α(s) ),

where D(s) is the diagonal matrix with entries

√A + ‖ψ(s)‖² π/(π + ‖ψ(s)‖²) and √A + ‖ψ(s)‖² π/(π − ‖ψ(s)‖²),

‖ψ(s)‖² = ψ₁(s)² + ψ₂(s)², and α(s) = tan⁻¹(ψ₂(s)/ψ₁(s)). We assume, following Higdon et al. (1999), that ψ₁(·) and ψ₂(·) are independent and identical zero-mean Gaussian processes with covariance c_ψ(s₁, s₂) = σ²_ψ exp{−‖s₁ − s₂‖²/b_ψ}. Following Higdon et al. (1999) we put uniform priors, with lower bound 3, on ϕ, a_δ, and b_ψ; we set σ_δ = σ_ψ = 1. As proposed by Higdon et al. (1999), we set A = 3.5. Since in our applications we center and scale the observed time points, for τ we specify the N(0, 1) prior.

6.1 Elicitation of hyperparameters of the underlying ODDP

Choice of G₀: In our applications, we center and scale each of the two components {s₁ᵢ; i = 1, ..., n} and {s₂ᵢ; i = 1, ..., n} of the available spatial locations {sᵢ = (s₁ᵢ, s₂ᵢ); i = 1, ..., n}. Consequently, choosing G₀ to be the bivariate normal distribution with both means zero, both variances equal to one, and correlation ρ appears to be reasonable. We estimate ρ by the empirical correlation between {s₁ᵢ; i = 1, ..., n} and {s₂ᵢ; i = 1, ..., n}.

Choice of the prior on α: For the choice of prior distributions of the parameters associated with the ODDP we follow Griffin & Steel (2004) and GS. In particular, we put the inverted Beta prior on α, given by

p(α) = n₀^η Γ(2η) α^{η−1} / { Γ(η)² (α + n₀)^{2η} },

where the hyperparameter n₀ is the prior median of α. Note that the prior variance exists if η > 2. This prior implies that α/(α + n₀) follows a Beta(η, η) distribution.

Choice of the prior on λ: Note that, for small α, only the first few elements of the stick-breaking representation are important, so fewer points from the underlying Poisson process are needed to induce the second factor of the correlation structure (3.10), which roughly depends upon the ratio λ/(α + 1) for U(x) = D (spatial problem) and U(x) = (−∞, x] (temporal problem); see GS for the details. Thus a relatively small value of λ suffices in such cases. Similarly, when α is larger, a larger λ is necessary to obtain the same correlation. Keeping these points in mind, we select the log-normal prior for λ with mean log(α) and variance b_λ, say. For our applications, we choose b_λ = 20, so that we obtain a reasonably vague prior.

6.2 Computational region

Following GS, we consider a truncated region for the point process Z which includes the range of the observed x. This truncated region has been referred to as the computational region by GS.
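The space-time kernel introduced at the beginning of this section can be sketched concretely as follows. Here a user-supplied 2 × 2 positive definite matrix stands in for Σ(s) and a positive scalar stands in for δ(t); the Gaussian-process constructions of Σ(s) and δ(t) themselves are not reproduced.

```python
import numpy as np

def space_time_kernel(s, t, theta, tau, Sigma_s, delta_t):
    """K(s, t, theta, tau) = exp{-(1/2)(s - theta)' Sigma(s) (s - theta)
                                 - delta(t) |t - tau|}.

    Sigma_s: a 2x2 positive definite matrix standing in for Sigma(s);
    delta_t: a positive scalar standing in for delta(t)."""
    d = np.asarray(s, dtype=float) - np.asarray(theta, dtype=float)
    return float(np.exp(-0.5 * d @ Sigma_s @ d - delta_t * abs(t - tau)))
```

The kernel equals 1 at (s, t) = (θ, τ) and decays in both the (anisotropic) spatial distance and the temporal separation, which is what makes the induced covariance nonstationary and nonseparable when Σ(s) and δ(t) vary.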
In particular, we choose a bounding box of the form (a₁, b₁) × (a₂, b₂) × ··· × (a_d, b_d) as the computational region, where aᵢ = d_{aᵢ} − r and bᵢ = d_{bᵢ} + r. Here d_{aᵢ} and d_{bᵢ} are the minimum and the maximum of x in dimension i, and

r = 2 ( Γ(d/2 + 1)(α + 1) log(1/ε) / (π^{d/2} λ) )^{1/d},

which corresponds to ε = exp{ −(λ/(α + 1)) (π^{d/2}/Γ(d/2 + 1)) (r/2)^d }. See GS for justification of these choices.

7. JOINT POSTERIOR AND A BRIEFING OF TTMCMC FOR UPDATING PARAMETERS IN OUR VARIABLE DIMENSIONAL MODELING FRAMEWORK

Let k denote the random number of summands in

f_k(x) = Σ_{i=1}^k K(x, θ_{π_i(x)}) p_i(x), for all x ∈ D ⊆ R^d.  (7.1)

Let V = (V₁, ..., V_k), z = (z₁, ..., z_k), θ = (θ₁, θ₂), with θ₁ = (θ₁₁, ..., θ₁ₖ) and θ₂ = (θ₂₁, ..., θ₂ₖ). Let also ψ₁ = (ψ₁(s₁), ..., ψ₁(s_n)), ψ₂ = (ψ₂(s₁), ..., ψ₂(s_n)) and δ = (δ(t₁), ..., δ(t_n)). The joint posterior is of the form

π(k, V, z, θ₁, θ₂, ψ₁, ψ₂, δ, τ, σ, α, λ, b_ψ, a_δ | Y) ∝ π(k) π(V, z, θ₁, θ₂ | k) π(ψ₁, ψ₂, δ) π(τ, ϕ, b_ψ, a_δ) π(σ) π(α) π(λ | α) × L(V, z, θ₁, θ₂, σ | k, Y),  (7.2)

where L(V, z, θ₁, θ₂, σ | k, Y) is the joint normal likelihood of V, z, θ₁, θ₂, σ under the model

yᵢ = f_k(xᵢ) + εᵢ; εᵢ iid ∼ N(0, σ²); i = 1, ..., n,  (7.3)

conditional on f_k(·).

For our applications, as the prior π(k) on k we assume the discrete uniform prior on a finite range; in our applications k never even reached 30. Under π(V, z, θ₁, θ₂ | k), Vᵢ iid ∼ Beta(1, α), i = 1, ..., k; z are realizations from the Poisson process with intensity λ; and for i = 1, ..., k, (θ₁ᵢ, θ₂ᵢ) iid ∼ G₀. Under π(ψ₁, ψ₂, δ), ψ₁, ψ₂, δ are independent Gaussian processes, as detailed in Section 6. The prior distribution of τ, ϕ, b_ψ, a_δ, denoted by π(τ, ϕ, b_ψ, a_δ), is already provided in Section 6.
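The computational-region calculation of Section 6.2 can be sketched as follows, assuming the reconstructed GS relation r = 2(Γ(d/2+1)(α+1)log(1/ε)/(π^{d/2}λ))^{1/d}; the inputs below are illustrative.

```python
import math

def computational_region(data_min, data_max, alpha, lam, eps):
    """Bounding box (a_i, b_i) extending the observed range by r per dimension.

    Assumes (reconstructed from GS) the relation
    r = 2 * (Gamma(d/2 + 1) * (alpha + 1) * log(1/eps)
             / (pi**(d/2) * lam))**(1/d)."""
    d = len(data_min)
    r = 2.0 * (math.gamma(d / 2 + 1) * (alpha + 1) * math.log(1.0 / eps)
               / (math.pi ** (d / 2) * lam)) ** (1.0 / d)
    a = [lo - r for lo in data_min]
    b = [hi + r for hi in data_max]
    return a, b, r
```

Smaller ε (or smaller λ relative to α + 1) enlarges r, so the box always contains the observed range with a margin that shrinks as the Poisson intensity grows.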
For the error standard deviation σ, the prior, denoted by π(σ), is the log-normal distribution with parameters 0 and 1, so that the mean and variance of σ are about 1.6 and 5, respectively. These quantities appear to be reasonable, and yielded adequate inference.

In order to obtain samples from the joint posterior (7.2), which involves the variable-dimensional f_k(·), we implement the TTMCMC methodology. In a nutshell, TTMCMC updates all the parameters, both fixed and variable dimensional, as well as the number of parameters of the underlying posterior distribution, in a single block, using simple deterministic transformations of some low-dimensional random variable drawn from some fixed, but low-dimensional, arbitrary distribution defined on some relevant support. The idea is an extension of Transformation based Markov Chain Monte Carlo (TMCMC), introduced by Dutta & Bhattacharya (2014) for updating high-dimensional parameters with known dimensionality in a single block using simple deterministic transformations of some low-dimensional (usually one-dimensional) random variable having arbitrary distribution on some relevant support. The strategy of updating high and variable dimensional parameters using very low-dimensional random variables clearly reduces dimensionality dramatically, thus greatly improving acceptance rate, mixing properties, and computational speed. In Section S-7 of the supplement we provide a detailed overview of TTMCMC and propose a general algorithm (Algorithm S-7.1) with certain advantages, and in Section S-8 of the supplement we specialize the algorithm to our spatio-temporal modeling set-up, providing full updating details (Algorithm S-8.1).

8. SIMULATION STUDY

To illustrate the performance of our model we first create a synthetic data generating process which is nonstationary and non-Gaussian. One popular method of creating such a process is the kernel convolution approach.
However, since we have developed our spatio-temporal model itself using the kernel convolution approach, it is perhaps desirable to obtain the synthetic data from some nonstationary, non-Gaussian process created using some approach independent of the kernel convolution method. In Section 8.1 we detail such an approach. Then we fit our proposed model to the data, pretending that the data-generating process is unknown.

8.1 A nonstationary non-Gaussian data generating process

Let X(·) denote a stationary Gaussian process with mean function µ(t, s) = β₀ + β₁t + β₂s(1) + β₃s(2), with s = (s(1), s(2)), and covariance function

A(i, j) = c((tᵢ, sᵢ), (tⱼ, sⱼ)) = exp{ −c₀ √((tᵢ − tⱼ)² + (sᵢ₁ − sⱼ₁)² + (sᵢ₂ − sⱼ₂)²) },

for any tᵢ, tⱼ, sᵢ = (sᵢ₁, sᵢ₂), sⱼ = (sⱼ₁, sⱼ₂), where c₀ > 0 is a fixed decay constant.

Let X = (X(t₁, s₁), ..., X(t_n, s_n))' denote observed data points from the Gaussian process X at the design points {(tᵢ, sᵢ); i = 1, ..., n}. Let t = (t₁, ..., t_n)' and S = (s₁', ..., s_n')'. Further, let us denote by A = (A(i, j); i = 1, ..., n; j = 1, ..., n) the covariance matrix, and let µ = (µ(t₁, s₁), ..., µ(t_n, s_n))'. Then the posterior process [X(·) | X] is a nonstationary Gaussian process with mean function µ_X(t, s) = µ(t, s) + A₁₂A₂₂⁻¹(X − µ) and variance A₁₁ − A₁₂A₂₂⁻¹A₂₁, where A₁₁, A₁₂, A₂₁, A₂₂ denote the blocks of the covariance matrix of (X(t, s), X')'.

Let the posterior nonstationary Gaussian process [X(·) | X] be denoted by X*(·). Now consider another process Y(·) with mean function µ*(t, s) = X*(t, s) and covariance function c_Y((tᵢ, sᵢ), (tⱼ, sⱼ)) = exp{−c₁|X*(tᵢ, sᵢ) − X*(tⱼ, sⱼ)|}, for a fixed constant c₁ > 0. Then marginally, Y(·) is a nonstationary non-Gaussian process.

For our illustration we will simulate the synthetic dataset from the process Y(·).
The algorithm for generating this synthetic data is provided next.

8.2 Algorithm for generating the synthetic data

We performed the following steps to simulate a nonstationary 95 × 1 data vector.

1. We generate one time point tᵢ from each interval (i − 1, i], for i = 1, ..., 100, giving 100 time points.

2. We store the time points in a vector which we denote by t = (t(1), ..., t(100))'.

3. Next we generate 100 random points of the form {sᵢ = (s(1, i), s(2, i)); i = 1, ..., 100} from [0, 50] × [0, 50] as locations. We store the locations in a 100 × 2 matrix S = (s₁', ..., s₁₀₀')'.

4. Then we randomly choose 5 time points from t and 5 locations from S and omit these points from t and S. Thus we obtain a new time vector of length 95, say t₁, and a new matrix of locations of order 95 × 2, say S₁. We store the omitted time points in a separate vector, t₂, and the locations in a separate matrix, S₂, for future use.

5. Next we calculate the covariance matrix A = (A(i, j)) of order 100 × 100 based on t and S, where the (i, j)-th element of A is given by A(i, j) = 1 if i = j, and A(i, j) = exp(−c₀ √((t(i) − t(j))² + (s(1, i) − s(1, j))² + (s(2, i) − s(2, j))²)) if i ≠ j, for a fixed decay constant c₀ > 0.

6. We partition the above covariance matrix A into four component matrices A₁₁, A₁₂, A₂₁ and A₂₂, where A₁₁ is the 5 × 5 covariance matrix based on t₂ and S₂; A₂₂ is the 95 × 95 covariance matrix based on t₁ and S₁ (the form of the (i, j)-th element being the same as for the matrix A, except that t, S are replaced with t₂, S₂ for A₁₁, and with t₁, S₁ for A₂₂); and A₁₂ = A₂₁' is the 5 × 95 cross-covariance matrix, whose (i, j)-th element is given by A₁₂(i, j) = exp(−c₀ √((t₂(i) − t₁(j))² + (s₂(1, i) − s₁(1, j))² + (s₂(2, i) − s₁(2, j))²)), for i = 1, ..., 5 and j = 1, ..., 95.

7. We simulate a 5 × 1 vector x₂ from a 5-variate normal distribution with mean µ₂ = D₂β and covariance matrix A₁₁, where β is a fixed coefficient vector and D₂ = (1 : t₂ : S₂) is the corresponding design matrix. Note that D₂ is a 5 × 4 matrix.

8. Using x₂ we simulate a 95 × 1 vector x₍₉₅|₅₎ from a conditional 95-variate normal distribution with mean µ₍₉₅|₅₎ = D₁β + A₂₁A₁₁⁻¹(x₂ − µ₂) and covariance Σ₍₉₅|₅₎ = A₂₂ − A₂₁A₁₁⁻¹A₁₂, where D₁ is obtained exactly as D₂, only t₂ and S₂ are replaced with t₁ and S₁, respectively.

9. The last step is to simulate a 95 × 1 vector y₍₉₅|x₍₉₅|₅₎₎ conditionally on x₍₉₅|₅₎. We simulate it from a 95-variate normal distribution with mean µ_y proportional to x₍₉₅|₅₎ and covariance matrix Σ_y whose (i, j)-th element equals 1 if i = j and exp(−c₁ |x₍₉₅|₅₎(i) − x₍₉₅|₅₎(j)|) if i ≠ j, for i, j = 1, ..., 95.

8.3 Results of fitting our model to the simulated data

Note that for this problem the number of parameters to be updated ranges between 300 and 400. Our TTMCMC based model implementation took 35 minutes to yield 900,000 realizations following a burn-in of 100,000. Quite encouragingly, TTMCMC exhibited satisfactory acceptance rate and mixing properties.
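A condensed sketch of the synthetic data-generating algorithm above is given below. The decay/scale constants c0, c1, c2 and the first three regression coefficients are hypothetical stand-ins (their values are elided in the source), and instead of deleting 5 random design points we simply condition on the first 5.

```python
import numpy as np

rng = np.random.default_rng(7)
n, m = 100, 5               # total design points; points conditioned on
c0, c1, c2 = 0.1, 0.1, 0.1  # hypothetical stand-in constants

# Steps 1-3: one time point per unit interval; locations on [0, 50] x [0, 50].
t = rng.uniform(0, 1, n) + np.arange(n)
S = rng.uniform(0, 50, (n, 2))
D_full = np.column_stack([np.ones(n), t, S])   # design matrix (1 : t : S)

# Step 5: stationary covariance A(i, j) = exp(-c0 * space-time distance).
dist = np.sqrt((t[:, None] - t[None, :]) ** 2
               + ((S[:, None, :] - S[None, :, :]) ** 2).sum(-1))
A = np.exp(-c0 * dist)

# Step 6 (condensed): condition on the first m points instead of m random ones.
A11 = A[:m, :m]
A12 = A[:m, m:]
A22 = A[m:, m:]

# Step 7: simulate the m conditioning values.
beta = np.array([0.1, 0.1, 0.1, 0.02])  # only the last entry survives the source
mu = D_full @ beta
x2 = rng.multivariate_normal(mu[:m], A11)

# Step 8: posterior (nonstationary) Gaussian process X* at the other points.
mu_cond = mu[m:] + A12.T @ np.linalg.solve(A11, x2 - mu[:m])
Sig_cond = A22 - A12.T @ np.linalg.solve(A11, A12)
x_star = rng.multivariate_normal(mu_cond, Sig_cond)

# Step 9: non-Gaussian layer whose covariance depends on X* itself.
Sig_y = np.exp(-c1 * np.abs(x_star[:, None] - x_star[None, :]))
y = rng.multivariate_normal(c2 * x_star, Sig_y)
```

The exponential kernel exp(−c|u − v|) is positive definite on the real line, so Σ_y is a valid covariance matrix even though it is built from the realized values of X*, which is what makes the resulting y non-Gaussian marginally.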
We assess the predictive power of our model with the leave-one-out cross-validation method. All the 95 cases were included in the 95% highest posterior density regions of the corresponding leave-one-out posterior predictive densities. Figure 10.1 displays the posterior predictive densities of six randomly selected locations, along with the true values, the latter denoted by the vertical lines. Thus, satisfactory performance of our proposed model is indicated by the results, particularly given the fact that our model does not assume knowledge of the true, data-generating, parametric model.
Though our simulation mechanism is completely different from our proposed model, the simulated data do exhibit the pattern that the correlations are close to zero for two widely separated locations. Indeed, from the structure of the covariance matrix Σ₍₉₅|₅₎, it is easily seen that the (i, j)-th element (i ≠ j) of Σ₍₉₅|₅₎ is close to zero whenever the distance between t(i) and t(j) and/or sᵢ and sⱼ is large.

We calculate the posterior densities of the correlation for different pairs of locations. In forming the pairs, we select nearby locations, as well as locations which are widely separated, such that we obtain both high and low correlation values under the true, data-generating model. It is evident from Figure 10.2 that the true correlations, ranging from small to high values, lie well within their respective 95% credible intervals, vindicating reasonable performance of our model in terms of capturing the true correlation structure.

It is worth mentioning that we repeated the simulation study with 1000 simulated data points. In this case the dimension of our spatio-temporal model varies randomly between 3000 and 3100. Even in this challenging high-dimensional scenario our TTMCMC based model implementation exhibited excellent performance. We omit the details for the sake of brevity.

9. REAL DATA ANALYSIS

According to the Clean Air Act, a certain air quality is to be maintained to protect public health and to maintain a proper survival environment for animals and vegetation. As a measure of the quality of air, the Clean Air Act sets standard limits for important air pollutants such as ozone. For our real data analysis we use the ozone metric called the W126 metric, which is the annual maximum consecutive three-month running total of weighted sums of hourly concentrations observed between 8AM and 8PM on each day during the high ozone season of April through October.
Also, we have information on the Community Multiscale Air Quality (CMAQ) model outputs, which are highly correlated with the ozone level, so we use these CMAQ values as a covariate in our model.

9.1 Calculating the W126 metric

Let Q_l(s, t) denote the observed ozone concentration level in parts per million (ppm) units at location s at hour l on day t, for t = 1, ..., T and l = 1, ..., 12, where T = 214 days between April 1 and October 31 in a given year. The hours are the 12 daylight hours between 8AM and 7PM. The W126 metric for site s is calculated as follows.

The weighted hourly metric is calculated using the transformation:

U_l(s, t) = Q_l(s, t) × ( 1 / (1 + 4403 × exp(−126 × Q_l(s, t))) ).

This logistic transformation truncates values smaller than 0.05 ppm towards zero, but does not much alter the magnitude of values larger than 0.10 ppm.

The daily index is obtained from the 12 hourly weighted values in each day as

Z(s, t) = Σ_{l=1}^{12} U_l(s, t).

The monthly index is calculated from the daily indices by summing and then adjusting for the number of days in the month as follows:

M_j(s) = Σ_{t ∈ month j} Z(s, t), j = 1, ..., 7,

where the summation is over all the days t that fall within the calendar month j.

The three-month running totals are centered at the last month and are obtained as:

M̄_j(s) = Σ_{k=j−2}^{j} M_k(s), j = 3, ..., 7.

Finally, the annual W126 index value is calculated by:

Y(s) = max_{j=3,...,7} M̄_j(s).

The secondary ozone standard is met at a site s in a given year when the true value of Y(s) is less than 21 ppm-hours.

Corresponding to each observed ozone concentration Q_l(s, t) we have a CMAQ model output v_l(A, t), where the site s is contained in the unique grid cell A. Using the output v_l(A, t) and the above details, daily and annual indexes of CMAQ values, namely X(A, t) and X(A), are constructed.

We have the data on annual indexes of ozone values (W126) Y(s), and corresponding CMAQ X(A) values, for 76 locations in the US. We now fit our model to this real data set. Here we model the data on the log scale; we also use the log transformation of the CMAQ values. In other words, we consider

log(Y(sᵢ)) = α₀ + α₁ log(X(Aᵢ)) + f(sᵢ) + εᵢ, i = 1, ..., 76,

where α₀ and α₁ are regression coefficients, f(sᵢ) is an annual-level spatial random effect at location sᵢ, and εᵢ is an independent nugget effect with variance σ². Here f(sᵢ) is our proposed spatial model based on kernel convolution with ODDP. All the prior distributions are the same as mentioned before.
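The W126 construction described above can be sketched as follows; the month lengths assume a complete April–October record (214 days), and the within-month adjustment for missing days is omitted.

```python
import numpy as np

def w126(Q):
    """Annual W126 index from hourly ozone concentrations (ppm).

    Q: array of shape (214, 12) -- the 12 daylight hourly values for each
       of the 214 days of April-October (assumed complete)."""
    month_len = [30, 31, 30, 31, 31, 30, 31]           # April .. October
    U = Q / (1.0 + 4403.0 * np.exp(-126.0 * Q))        # weighted hourly values
    Z = U.sum(axis=1)                                  # daily indices Z(s, t)
    edges = np.concatenate([[0], np.cumsum(month_len)])
    M = np.array([Z[edges[j]:edges[j + 1]].sum() for j in range(7)])
    running = M[:-2] + M[1:-1] + M[2:]                 # three-month running totals
    return running.max()                               # annual W126 index
```

The sigmoidal weight leaves high concentrations nearly untouched (the weight at 0.10 ppm is about 0.985) while shrinking low concentrations essentially to zero, so a season of uniformly low ozone yields a negligible W126 value.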
For the additional parameters α₀ and α₁, we use vague zero-mean normal prior distributions, and for σ we use the log-normal prior with mean zero and variance 10.

As before, we assess the predictive power of the model using leave-one-out cross-validation. For all the locations, the true value of the ozone concentration lies within the 95% credible interval of the respective cross-validation posterior. This is summarized in the top panel of Figure 10.3, where the middle surface represents the observed data; the lower and the upper surfaces represent the lower and the upper 95% credible regions associated with the respective leave-one-out posterior predictive densities. The surface in the middle of the bottom panel shows the posterior medians, while the lower and the upper surfaces denote the 95% credible intervals as before. For the convenience of visually comparing the observed data and the posterior medians, we include Figure 10.4, which also contains the 95% credible intervals. The plots clearly show that our proposed model is quite adequate for the ozone data.

Posterior densities of correlations, for 6 pairs of sites, are shown in Figure 10.5. All of them assign high posterior probability to roughly the same range of high correlation values.

In our approach, the weights pᵢ of the ODDP and the kernels are indexed by θᵢ, where θᵢ iid ∼ G₀ for i = 1, 2, .... We have already obtained some sufficient conditions ensuring that our model converges in the L_p norm and the Besov semi-norm. These results make our proposed model a promising candidate for function estimation.

ACKNOWLEDGMENTS

We are grateful to Prof. Sujit Sahu for kindly providing us with the ozone data set and to Mr. Suman Guha for helpful discussions.

Supplementary Material

S-1.
PROOF OF THEOREM 1

Note that

∫ |K(x, θ)| dG_x(θ) = Σ_{i=1}^∞ |K(x, θ_{π_i(x)})| p_i,   (S-1.1)

so that the monotone convergence theorem yields

E( ∫ |K(x, θ)| dG_x(θ) ) = Σ_{i=1}^∞ E(|K(x, θ_{π_i(x)})|) E(p_i) = ∫ |K(x, θ)| dG₀(θ) Σ_{i=1}^∞ E(p_i) = ∫ |K(x, θ)| dG₀(θ) [since Σ_{i=1}^∞ p_i = 1].

Here we have used the fact that the θᵢ's and Vᵢ's are independent. Therefore ∫ |K(x, θ)| dG_x(θ) is finite with probability one, and hence

f(x) = ∫ K(x, θ) dG_x(θ) = Σ_{i=1}^∞ K(x, θ_{π_i(x)}) p_i(x)

is absolutely convergent with probability one. Since this series is bounded by (S-1.1), which is integrable, the dominated convergence theorem implies

E(f(x)) = Σ_{i=1}^∞ E(K(x, θ_{π_i(x)})) E(p_i) = ∫ K(x, θ) dG₀(θ). ∎

S-2. PROOF OF THEOREM 2

We have

f(x₁) f(x₂) = ∫ K(x₁, θ) dG_{x₁}(θ) ∫ K(x₂, θ) dG_{x₂}(θ) = Σᵢ Σⱼ K(x₁, θ_{π_i(x₁)}) K(x₂, θ_{π_j(x₂)}) p_i(x₁) p_j(x₂),   (S-2.1)

since both series are absolutely convergent with probability one. This is bounded in absolute value by

Σᵢ Σⱼ |K(x₁, θ_{π_i(x₁)}) K(x₂, θ_{π_j(x₂)})| p_i(x₁) p_j(x₂).
(S-2.2)

Now if this is an integrable random variable, we may take the expectation of (S-2.1) inside the summation signs and obtain

E(f(x₁) f(x₂)) = Σᵢ Σⱼ E(K(x₁, θ_{π_i(x₁)}) K(x₂, θ_{π_j(x₂)})) E(p_i(x₁) p_j(x₂)).

Splitting the double sum according to whether π_i(x₁) = π_j(x₂) or not, and noting that θ_{π_i(x₁)} and θ_{π_j(x₂)} are independent draws from G₀ when π_i(x₁) ≠ π_j(x₂) and are the same draw when π_i(x₁) = π_j(x₂), we obtain

E(f(x₁) f(x₂)) = E(K(x₁, θ)) E(K(x₂, θ)) Σ_{π_i(x₁) ≠ π_j(x₂)} E(p_i(x₁) p_j(x₂)) + E(K(x₁, θ) K(x₂, θ)) Σ_{π_i(x₁) = π_j(x₂)} E(p_i(x₁) p_j(x₂)).

Adding and subtracting E(K(x₁, θ)) E(K(x₂, θ)) Σ_{π_i(x₁) = π_j(x₂)} E(p_i(x₁) p_j(x₂)), and using Σ_{i,j} E(p_i(x₁) p_j(x₂)) = 1, this reduces to

E(f(x₁) f(x₂)) = Cov_{G₀}(K(x₁, θ), K(x₂, θ)) Σ_{π_i(x₁) = π_j(x₂)} E(p_i(x₁) p_j(x₂)) + E(K(x₁, θ)) E(K(x₂, θ)).

An analogous argument shows that (S-2.2) is integrable, since, by our assumptions,

∫ |K(x, θ)| dG₀(θ) < ∞ and ∫ |K(x₁, θ) K(x₂, θ)| dG₀(θ) < ∞.

To obtain the complete analytical expression for E(f(x₁) f(x₂)) we need to calculate Σ_{π_i(x₁) = π_j(x₂)} E(p_i(x₁) p_j(x₂)).
Define T(x₁, x₂) = {k : there exist i, j such that π_i(x₁) = π_j(x₂) = k}. For k ∈ T(x₁, x₂) we further define A_{lk} = {π_j(x_l) : j < i, where π_i(x_l) = k}, for l = 1, 2, S_k = A_{1k} ∩ A_{2k}, and S'_k = (A_{1k} ∪ A_{2k}) \ S_k. Then it can be easily shown that

Σ_{π_i(x₁) = π_j(x₂)} E(p_i(x₁) p_j(x₂)) = (2 / ((α + 1)(α + 2))) Σ_{k ∈ T(x₁,x₂)} (α/(α + 2))^{|S_k|} (α/(α + 1))^{|S'_k|}.

Hence

Cov(f(x₁), f(x₂)) = Cov_{G₀}(K(x₁, θ), K(x₂, θ)) × (2 / ((α + 1)(α + 2))) × Σ_{k ∈ T(x₁,x₂)} (α/(α + 2))^{|S_k|} (α/(α + 1))^{|S'_k|}. ∎

S-3. PROOF OF THEOREM 4

In this context, it is more convenient to deal with the notation used in the context of Theorem 2. Note that as ‖x₁ − x₂‖ → 0, |S_k| → k − 1 and |S'_k| → 0, and hence, for any realization of the point process,

Corr(G_{x₁}, G_{x₂}) = (2/(α + 2)) Σ_{k ∈ T(x₁,x₂)} (α/(α + 2))^{|S_k|} (α/(α + 1))^{|S'_k|} → (2/(α + 2)) Σ_{k=1}^∞ (α/(α + 2))^{k−1} = (2/(α + 2)) × (α + 2)/2 = 1.

Since Corr_{G₀}(K(x₁, θ), K(x₂, θ)) → 1 as ‖x₁ − x₂‖ → 0, it follows that Corr(f(x₁), f(x₂)) → 1 as ‖x₁ − x₂‖ → 0. On the other hand, as ‖x₁ − x₂‖ → ∞, T(x₁, x₂) is at most finite and |S'_k| → ∞. This implies that Corr(G_{x₁}, G_{x₂}) → 0 and Corr(f(x₁), f(x₂)) → 0. Hence, by the dominated convergence theorem it follows that the unconditional correlation between f(x₁) and f(x₂) goes to 1 and 0, respectively, as ‖x₁ − x₂‖ → 0 and ‖x₁ − x₂‖ → ∞. ∎

S-4. PROOF OF THEOREM 7

Since for each x, f(x) = Σ_{i=1}^∞ K(x, θ_{π_i(x)}) p_i(x), and each x must satisfy π_k(x) = i_k for every k = 1, 2, ..., where i_k ∈ {1, 2, ...} (i_k ≠ i_{k'} for any k ≠ k'), we must have x ∈ ∩_{k=1}^∞ A_{k i_k}. For simplicity, but without loss of generality, let i_k = k for k = 1, 2, .... Then for x ∈ ∩_{k=1}^∞ A_{k i_k} it holds that K(x, θ_{π_i(x)}) = K(x, θ_i) and p_i(x) = V_i ∏_{j<i} (1 − V_j).

Turning to mean square differentiability, under conditions (B1) and (B2) a first-order Taylor expansion gives f(x + u) − f(x) − u'∇f(x) = R(x, u), with R(x, u)/‖u‖ → 0 as ‖u‖ → 0. Hence, using the dominated convergence theorem it follows that

lim_{‖u‖→0} E[ (f(x + u) − f(x) − u'∇f(x)) / ‖u‖ ]² = lim_{‖u‖→0} E[ R(x, u)/‖u‖ ]² = 0.

Hence, f(·) is mean square differentiable in the interior of ∩_{k=1}^∞ A_{k i_k}. ∎

S-6. PROOF OF THEOREM 10

For our purpose, we first state and prove a lemma.
Lemma 11. Let p_k denote the random weights from an ODDP. For each positive integer N ≥ 1 and each positive integer r ≥ 1, let

T_N(r, α) = ( Σ_{k=N}^∞ p_k )^r, and U_N(r, α) = Σ_{k=N}^∞ p_k^r.

Then

E(T_N(r, α)) = (α/(α + r))^{N−1}, and E(U_N(r, α)) = (α/(α + r))^{N−1} Γ(r) Γ(α + 1)/Γ(α + r).

Proof.
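Lemma 11's formula for E(T_N(r, α)) can be checked by Monte Carlo, using the identity Σ_{k≥N} p_k = Π_{k<N}(1 − V_k) for stick-breaking weights with V_k ∼ Beta(1, α):

```python
import numpy as np

def exact_T(alpha, r, N):
    """Closed form from Lemma 11: E(T_N(r, alpha)) = (alpha/(alpha + r))**(N - 1)."""
    return (alpha / (alpha + r)) ** (N - 1)

def mc_T(alpha, r, N, n_sim=200_000, seed=0):
    """Monte Carlo estimate of E[(sum_{k >= N} p_k)^r] via the stick-breaking
    identity sum_{k >= N} p_k = prod_{k < N} (1 - V_k), V_k ~ Beta(1, alpha)."""
    rng = np.random.default_rng(seed)
    V = rng.beta(1.0, alpha, size=(n_sim, N - 1))
    return float(np.mean(np.prod(1.0 - V, axis=1) ** r))
```

The closed form follows from independence of the V_k's together with E(1 − V_k)^r = α/(α + r); the Monte Carlo estimate agrees with it to within simulation error.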
Let P be a specific random measure from an ODDP. Then

P(·) = V_{π₁} δ_{θ_{π₁}}(·) + (1 − V_{π₁}) ( V*_{π₁} δ_{θ*_{π₁}}(·) + (1 − V*_{π₁}) V*_{π₂} δ_{θ*_{π₂}}(·) + (1 − V*_{π₁})(1 − V*_{π₂}) V*_{π₃} δ_{θ*_{π₃}}(·) + ··· ),

where V*_{π_k} = V_{π_{k+1}} are independent Beta(1, α) random variables and θ*_{π_k} = θ_{π_{k+1}} are iid G₀. So we have

P(·) =_D V_{π₁} δ_{θ_{π₁}}(·) + (1 − V_{π₁}) P*(·),

where V_{π₁}, θ_{π₁} and P*(·) are independent and P*(·) is an ODDP. Similarly we can show that

U₁(r, α) =_D V_{π₁}^r + (1 − V_{π₁})^r U₁(r, α),

where on the right-hand side V_{π₁} and U₁(r, α) are mutually independent. Therefore, taking expectations,

E(U₁(r, α)) = Γ(r + 1)Γ(α + 1)/Γ(α + r + 1) + (α/(α + r)) E(U₁(r, α)),

so that

E(U₁(r, α)) = Γ(r)Γ(α + 1)/Γ(α + r).   (S-6.1)

Furthermore, for N ≥ 2, we have

U_N(r, α) =_D (1 − V_{π₁})^r ··· (1 − V_{π_{N−1}})^r U₁(r, α), so that E(U_N(r, α)) = ( ∏_{k=1}^{N−1} E(1 − V_{π_k})^r ) E(U₁(r, α)).

Then using (S-6.1) we have

E(U_N(r, α)) = (α/(α + r))^{N−1} Γ(r)Γ(α + 1)/Γ(α + r).

Now, similarly,

T_N(r, α) =_D (1 − V_{π₁})^r ··· (1 − V_{π_{N−1}})^r T₁(r, α) = (1 − V_{π₁})^r ··· (1 − V_{π_{N−1}})^r,

since T₁(r, α) = 1, so that E(T_N(r, α)) = (α/(α + r))^{N−1}. Hence the lemma is proved. ∎

We now proceed to the proof of Theorem 10. Note that

|m_N(y) − m_∞(y)| = | ∫_Θ [y | P_N][P] dΘ − ∫_Θ [y | P][P] dΘ | ≤ ∫_Θ |[y | P_N] − [y | P]| [P] dΘ.   (S-6.2)

Now we expand [y | P_N] around P using the multivariate Taylor series expansion up to the second order:

[y | P_N] = [y | P] + (∂[y | P_N]/∂P_N)|_{P_N = P} (P_N − P) + (1/2)(P_N − P)' (∂²[y | P_N]/∂P_N²)|_{P_N = P*} (P_N − P),

where P* lies between P and P_N, that is, ‖P − P*‖ ≤ ‖P − P_N‖. Noting that [y | P_N] and [y | P] are multivariate normal densities with means P_N and P, respectively, and variance σ²I, where I is the n × n identity matrix (for notational simplicity we take σ = 1 in what follows), we have

(∂[y | P_N]/∂P_N)|_{P_N = P} = [y | P](y − P)', and (∂²[y | P_N]/∂P_N²)|_{P_N = P*} = [y | P*](y − P*)(y − P*)' − [y | P*] I.

Substituting these into (S-6.2), integrating over y, and interchanging the order of integration using Fubini's theorem, we obtain

∫_{R^n} |m_N(y) − m_∞(y)| dy ≤ ∫ [ ∫_{R^n} [y | P] |(y − P)'(P_N − P)| dy + ∫_{R^n} (P_N − P)' { [y | P*](y − P*)(y − P*)' + [y | P*] I } (P_N − P) dy ] [P] dΘ = 2 E_Θ [ √(2/π) Σ_{i=1}^n |P_N(x_i) − P(x_i)| + Σ_{i=1}^n (P_N(x_i) − P(x_i))² ].   (S-6.3)

Now,

|P_N(x_i) − P(x_i)| ≤ M ( |p_N^N(x_i) − p_N^∞(x_i)| + Σ_{k=N+1}^∞ p_k(x_i) ),

where p_N^N(x_i) = 1 − Σ_{k=1}^{N−1} p_k(x_i) = 1 − Σ_{k=1}^{N−1} V_{π_k(x_i)} ∏_{j<k} (1 − V_{π_j(x_i)}) is the N-th weight after truncation and p_N^∞(x_i) is the corresponding untruncated weight. Applying Lemma 11 with r = 1 and r = 2 to the two terms of (S-6.3) now yields the bound of Theorem 10. ∎
10 if ζ i = 0 − ζ i = 1 . With this definition of ζ c , T ζ c can be interpreted as the conjugate of T ζ .Since 3 k values of ζ are possible, it is clear that T , via ζ , induces 3 k many types of ‘moves’of the forms { T ζ i ; i = 1 , . . . , k } on the state-space. Suppose now that there is a subset Y of D such that the sets T ζ i ( x , Y ) and T ζ j ( x , Y ) are disjoint for every ζ i (cid:54) = ζ j . In fact, Y denotes the support of the distribution g ( · ) from which (cid:15) is simulated.S-7.2 Illustration of TTMCMC with a simple exampleLet us now illustrate the main idea of TTMCMC informally using the additive transforma-tion. Although the example we illustrate TTMCMC with is borrowed from Das & Bhat-tacharya (2014 b ), the algorithm we now present is somewhat different from that of Das &Bhattacharya (2014 b ). Assume that the current state is x = ( x , x ) ∈ R . We first ran-domly select u = ( u , u , u ) ∼ M ultinomial ( w b , w d , w nc ), where w b , w d , w nc ( >
$u_3 = 0$) such that $w_b + w_d + w_{nc} = 1$ are the probabilities of the birth, death, and no-change moves, respectively. That is, if $u_1 = 1$ we increase the dimensionality from 2 to 3; if $u_2 = 1$ we decrease the dimensionality from 2 to 1; and if $u_3 = 1$ we keep the dimensionality unchanged. In the latter case, when the dimensionality is unchanged, the acceptance probability remains the same as in TMCMC, as provided in Algorithm 3.1 of Dutta & Bhattacharya (2014).

If $u_1 = 1$, we can increase the dimensionality by first selecting one of $x_1$ and $x_2$ with probability $1/2$ each. Assuming that $x_1$ has been selected, we then construct the move-type
$$T_{b,\zeta}(x,\epsilon) = (x_1 + a_1\epsilon,\; x_1 - a_1\epsilon,\; x_2 + \zeta a_2\epsilon) = \big(g_{1,\zeta=1}(x_1,\epsilon),\; g_{1,\zeta^c=-1}(x_1,\epsilon),\; g_{2,\zeta}(x_2,\epsilon)\big),$$
say. Here, as in TMCMC, we draw $\epsilon \sim g(\cdot)$, where $g(\cdot)$ is supported on the positive part of the real line, and draw $\zeta = 1$ with probability $p$ and $\zeta = -1$ with probability $q = 1 - p$. Note that the value $\zeta = 0$ is redundant for the additive transformation (see Dutta & Bhattacharya (2014) for the details) and so is omitted here. We re-label $x' = T_{b,\zeta}(x,\epsilon) = (x_1 + a_1\epsilon, x_1 - a_1\epsilon, x_2 + \zeta a_2\epsilon)$ as $(x'_1, x'_2, x'_3)$. Thus, $T_{b,\zeta}(x,\epsilon)$ increases the dimension from 2 to 3. We accept this birth move with probability
$$a_b(x,\epsilon) = \min\left\{1,\; \frac{w_d}{w_b} \times \frac{p^{I_{\{1\}}(\zeta^c)}\, q^{I_{\{-1\}}(\zeta^c)}}{p^{I_{\{1\}}(\zeta)}\, q^{I_{\{-1\}}(\zeta)}} \times \frac{\pi(x_1 + a_1\epsilon,\; x_1 - a_1\epsilon,\; x_2 + \zeta a_2\epsilon)}{\pi(x_1, x_2)} \times \left|\frac{\partial\, T_{b,\zeta}(x,\epsilon)}{\partial(x,\epsilon)}\right|\right\}. \quad \text{(S-7.1)}$$
In (S-7.1),
$$\left|\frac{\partial\, T_{b,\zeta}(x,\epsilon)}{\partial(x,\epsilon)}\right| = \left|\frac{\partial(x_1 + a_1\epsilon,\; x_1 - a_1\epsilon,\; x_2 + \zeta a_2\epsilon)}{\partial(x_1, x_2, \epsilon)}\right| = \left|\det\begin{pmatrix} 1 & 0 & a_1 \\ 1 & 0 & -a_1 \\ 0 & 1 & \zeta a_2 \end{pmatrix}\right| = 2a_1. \quad \text{(S-7.2)}$$
Now let us illustrate the problem of returning to $x = (x_1, x_2)\ (\in \mathbb{R}^2)$ from $T_{b,\zeta}(x,\epsilon) = (x_1 + a_1\epsilon, x_1 - a_1\epsilon, x_2 + \zeta a_2\epsilon)\ (\in \mathbb{R}^3)$. For our purpose, in this paper, we select one of the first two elements of $T_{b,\zeta}(x,\epsilon)$ with the same probability. Suppose that we select $x_1 + a_1\epsilon$ with probability $1/2$. We then deterministically choose its right-adjacent element $x_1 - a_1\epsilon$, and form the average $x^* = \big((x_1 + a_1\epsilon) + (x_1 - a_1\epsilon)\big)/2 = x_1$. For non-additive transformations we can consider the average of the backward moves of the selected element and its right-adjacent element. Even in this additive-transformation example, after simulating $\epsilon$ as before we can consider the respective backward moves of $x_1 + a_1\epsilon$ and $x_1 - a_1\epsilon$, both yielding $x_1$, and then take the average, denoted by $x^*$. For the remaining element $x_2 + \zeta a_2\epsilon$, we need to simulate $\zeta^c$ and then consider the move $(x_2 + \zeta a_2\epsilon) + \zeta^c a_2\epsilon = x_2$. Thus, we can return to $(x_1, x_2)$ using this strategy.

Letting $x' = (x'_1, x'_2, x'_3) = (x_1 + a_1\epsilon, x_1 - a_1\epsilon, x_2 + \zeta a_2\epsilon)$, and denoting the average involving the first two elements by $x^*$, the death move is then given by $x'' = T_{d,\zeta}(x',\epsilon) = (x^*,\; x'_3 + \zeta^c a_2\epsilon) = \big(\frac{x'_1 + x'_2}{2},\; x'_3 + \zeta^c a_2\epsilon\big)$. Now observe that for returning to $(x'_1, x'_2)$ from $x^*$, we must have $x^* + a_1\epsilon^* = x'_1$ and $x^* - a_1\epsilon^* = x'_2$, which yield $\epsilon^* = (x'_1 - x'_2)/(2a_1)$. Hence, the Jacobian associated with the death move in this case is given by
$$\left|\frac{\partial\big(T_{d,\zeta}(x',\epsilon), \epsilon^*, \epsilon\big)}{\partial(x',\epsilon)}\right| = \left|\frac{\partial\big(\frac{x'_1 + x'_2}{2},\; x'_3 + \zeta^c a_2\epsilon,\; \frac{x'_1 - x'_2}{2a_1},\; \epsilon\big)}{\partial(x'_1, x'_2, x'_3, \epsilon)}\right| = \left|\det\begin{pmatrix} \tfrac{1}{2} & \tfrac{1}{2} & 0 & 0 \\ 0 & 0 & 1 & \zeta^c a_2 \\ \tfrac{1}{2a_1} & -\tfrac{1}{2a_1} & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}\right| = \frac{1}{2a_1}. \quad \text{(S-7.3)}$$
We accept this death move with probability
$$a_d(x'', \epsilon, \epsilon^*) = \min\left\{1,\; \frac{w_b}{w_d} \times \frac{P(\zeta^c)}{P(\zeta)} \times \frac{\pi(x'')}{\pi(x')} \times \left|\frac{\partial\big(T_{d,\zeta}(x',\epsilon), \epsilon^*, \epsilon\big)}{\partial(x',\epsilon)}\right|\right\} = \min\left\{1,\; \frac{w_b}{w_d} \times \frac{p^{I_{\{1\}}(\zeta^c)}\, q^{I_{\{-1\}}(\zeta^c)}}{p^{I_{\{1\}}(\zeta)}\, q^{I_{\{-1\}}(\zeta)}} \times \frac{\pi(x'')}{\pi(x')} \times \frac{1}{2a_1}\right\}.$$

In general, $x \in \mathbb{R}^{mk}$ may be of the form $(x_1, x_2, \ldots, x_m)$, where $x_\ell = (x_{\ell,1}, x_{\ell,2}, \ldots, x_{\ell,k})$ for $\ell = 1, 2, \ldots, m$, with $m \geq 1$. If the dimension of any $x_\ell$ is changed, then the dimensions of all other $x_{\ell'}$, $\ell' \neq \ell$, must also change accordingly. For instance, in our model, where we have summands with an unknown number of components and the $i$-th component is characterized by the parameters associated with ODDP $(\theta_{1i}, \theta_{2i}, V_i, z_i)$, when the dimension of the current $k$-dimensional vector of the location parameter of the central distribution $(\theta_{11}, \ldots, \theta_{1k})$ is increased by one, then one must simultaneously increase the dimension of the other set of the current $k$-dimensional location parameter $(\theta_{21}, \ldots, \theta_{2k})$, the $k$-dimensional vector of the associated point process $(z_1, \ldots, z_k)$, as well as the $k$-dimensional mass vector $(V_1, \ldots, V_k)$, by one.
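To make the two-dimensional example above concrete, the following Python sketch (our own illustration; the scales $a_1, a_2$, the choice $p = 1/2$, and the particular draw of $\epsilon$ are assumed placeholder values, not prescribed by the paper) implements the birth move $T_{b,\zeta}$ and the death move $T_{d,\zeta}$, and verifies numerically that the death move undoes the birth move and that the Jacobians (S-7.2) and (S-7.3) are reciprocals:

```python
import random

a1, a2 = 0.5, 0.7   # illustrative scaling constants (assumed, not from the paper)
p = 0.5             # P(zeta = 1); q = 1 - p

def birth(x1, x2, eps, zeta):
    """T_{b,zeta}: split x1 into two elements and additively move x2; R^2 -> R^3."""
    return (x1 + a1 * eps, x1 - a1 * eps, x2 + zeta * a2 * eps)

def death(x1p, x2p, x3p, eps, zeta_c):
    """T_{d,zeta}: average the first two elements, reverse-move the third; R^3 -> R^2."""
    x_star = (x1p + x2p) / 2.0
    return (x_star, x3p + zeta_c * a2 * eps)

# Jacobians derived in (S-7.2) and (S-7.3): they are reciprocals
J_birth = 2.0 * a1
J_death = 1.0 / (2.0 * a1)
assert abs(J_birth * J_death - 1.0) < 1e-12

# Check that the death move exactly undoes the birth move
random.seed(0)
eps = abs(random.gauss(0.0, 1.0))        # eps ~ g(.) supported on the positive half-line
zeta = 1 if random.random() < p else -1
x = (1.3, -0.4)
xp = birth(x[0], x[1], eps, zeta)
back = death(xp[0], xp[1], xp[2], eps, -zeta)   # zeta^c = -zeta reverses the additive move
assert abs(back[0] - x[0]) < 1e-12 and abs(back[1] - x[1]) < 1e-12
```

Since $2a_1 \times \frac{1}{2a_1} = 1$, the birth and death Jacobians cancel over a birth-then-death round trip, which is what makes the pair of moves mutually reversible.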
In Section S-7.3 we present a TTMCMC algorithm (Algorithm S-7.1) for situations of this kind, and show that detailed balance holds (irreducibility and aperiodicity hold by the same arguments provided in Das & Bhattacharya (2014b)). It is worth mentioning that although Das & Bhattacharya (2014b) provide a TTMCMC algorithm for these situations (Algorithm 5.1 of their paper), their algorithm is somewhat different from ours: for the death move, we select only one element randomly, then choose its right-adjacent element, take backward transformations of both, and finally take the average. Das & Bhattacharya (2014b), on the other hand, select two elements randomly without replacement. This difference between the algorithms is reflected in the acceptance ratios – our algorithm is slightly simpler in that the random selection probabilities do not appear in our acceptance ratio, unlike that of Das & Bhattacharya (2014b).

S-7.3 General TTMCMC algorithm for jumping more than one dimension at a time when several sets of parameters are related

Algorithm S-7.1
General TTMCMC algorithm for jumping $m$ dimensions with $m$ related sets of co-ordinates.
• Let the initial value be $x^{(0)} \in \mathbb{R}^{mk}$, where $k \geq 1$.
• For $t = 0, 1, 2, \ldots$
1. Generate $u = (u_1, u_2, u_3) \sim \text{Multinomial}(1;\, w_{b,k}, w_{d,k}, w_{nc,k})$.
2. If $u_1 = 1$ (increase dimension from $mk$ to $m(k+1)$), then
(a) Randomly select one co-ordinate from $x^{(t)}_1 = (x^{(t)}_{1,1}, \ldots, x^{(t)}_{1,k})$. Let $j$ denote the chosen co-ordinate.
(b) Generate $\boldsymbol{\epsilon}_m = (\epsilon_1, \ldots, \epsilon_m) \stackrel{iid}{\sim} g(\cdot)$ and, for $i \in \{1, \ldots, k\}\setminus\{j\}$, simulate $\zeta_{\ell,i} \sim \text{Multinomial}(1;\, p_{\ell,i}, q_{\ell,i}, 1 - p_{\ell,i} - q_{\ell,i})$ independently, for every $\ell = 1, \ldots, m$.
(c) Propose the birth move as follows: for each $\ell = 1, \ldots, m$, apply the transformation $x^{(t)}_{\ell,i} \rightarrow g_{i,\zeta_{\ell,i}}(x^{(t)}_{\ell,i}, \epsilon_\ell)$ for $i \in \{1, \ldots, k\}\setminus\{j\}$, and split $x^{(t)}_{\ell,j}$ into $g_{j,\zeta_{\ell,j}=1}(x^{(t)}_{\ell,j}, \epsilon_\ell)$ and $g_{j,\zeta^c_{\ell,j}=-1}(x^{(t)}_{\ell,j}, \epsilon_\ell)$. In other words, let $x' = T_{b,\zeta}(x^{(t)}, \boldsymbol{\epsilon}_m) = (x'_1, \ldots, x'_m)$ denote the complete birth move, where, for $\ell = 1, \ldots, m$,
$$x'_\ell = \big(g_{1,\zeta_{\ell,1}}(x^{(t)}_{\ell,1}, \epsilon_\ell), \ldots, g_{j-1,\zeta_{\ell,j-1}}(x^{(t)}_{\ell,j-1}, \epsilon_\ell),\; g_{j,\zeta_{\ell,j}=1}(x^{(t)}_{\ell,j}, \epsilon_\ell),\; g_{j,\zeta^c_{\ell,j}=-1}(x^{(t)}_{\ell,j}, \epsilon_\ell),\; g_{j+1,\zeta_{\ell,j+1}}(x^{(t)}_{\ell,j+1}, \epsilon_\ell), \ldots, g_{k,\zeta_{\ell,k}}(x^{(t)}_{\ell,k}, \epsilon_\ell)\big).$$
Re-label the $k+1$ elements of $x'_\ell$ as $(x'_{\ell,1}, x'_{\ell,2}, \ldots, x'_{\ell,k+1})$.
(d) Calculate the acceptance probability of the birth move $x'$:
$$a_b(x^{(t)}, \boldsymbol{\epsilon}_m) = \min\left\{1,\; \frac{w_{d,k+1}}{w_{b,k}} \times \frac{P^{(j)}(\zeta^c)}{P^{(j)}(\zeta)} \times \frac{\pi(x')}{\pi(x^{(t)})} \times \left|\frac{\partial\, T_{b,\zeta}(x^{(t)}, \boldsymbol{\epsilon}_m)}{\partial(x^{(t)}, \boldsymbol{\epsilon}_m)}\right|\right\},$$
where
$$P^{(j)}(\zeta) = \prod_{\ell=1}^{m} \prod_{i \in \{1,\ldots,k\}\setminus\{j\}} p_{\ell,i}^{I_{\{1\}}(\zeta_{\ell,i})}\, q_{\ell,i}^{I_{\{-1\}}(\zeta_{\ell,i})}, \quad \text{and} \quad P^{(j)}(\zeta^c) = \prod_{\ell=1}^{m} \prod_{i \in \{1,\ldots,k\}\setminus\{j\}} p_{\ell,i}^{I_{\{1\}}(\zeta^c_{\ell,i})}\, q_{\ell,i}^{I_{\{-1\}}(\zeta^c_{\ell,i})}.$$
(e) Set $x^{(t+1)} = x'$ with probability $a_b(x^{(t)}, \boldsymbol{\epsilon}_m)$, and $x^{(t+1)} = x^{(t)}$ with probability $1 - a_b(x^{(t)}, \boldsymbol{\epsilon}_m)$.
3. If $u_2 = 1$ (decrease dimension from $mk$ to $m(k-1)$, for $k \geq 2$), then
(a) Generate $\boldsymbol{\epsilon}_m = (\epsilon_1, \ldots, \epsilon_m) \stackrel{iid}{\sim} g(\cdot)$.
(b) Randomly select one co-ordinate (say, the $j$-th co-ordinate) from $x^{(t)}_1 = (x^{(t)}_{1,1}, \ldots, x^{(t)}_{1,k-1})$. For $\ell = 1, \ldots, m$, let
$$x^*_{\ell,j} = \Big(g_{j,\zeta^c_{\ell,j}=-1}(x^{(t)}_{\ell,j}, \epsilon_\ell) + g_{j+1,\zeta_{\ell,j+1}=1}(x^{(t)}_{\ell,j+1}, \epsilon_\ell)\Big)\Big/2;$$
replace the co-ordinate $x_{\ell,j}$ by the average $x^*_{\ell,j}$ and delete $x_{\ell,j+1}$.
(c) Simulate $\zeta$ by generating independently, for $\ell = 1, \ldots, m$ and for $i \in \{1, \ldots, k\}\setminus\{j, j+1\}$, $\zeta_{\ell,i} \sim \text{Multinomial}(1;\, p_{\ell,i}, q_{\ell,i}, 1 - p_{\ell,i} - q_{\ell,i})$.
(d) For $\ell = 1, \ldots, m$ and for $i \in \{1, \ldots, k\}\setminus\{j, j+1\}$, apply the transformation $x'_{\ell,i} = g_{i,\zeta_{\ell,i}}(x^{(t)}_{\ell,i}, \epsilon_\ell)$.
(e) Propose the death move $x' = T_{d,\zeta}(x^{(t)}, \boldsymbol{\epsilon}_m) = (x'_1, \ldots, x'_m)$, where, for $\ell = 1, \ldots, m$,
$$x'_\ell = \big(g_{1,\zeta_{\ell,1}}(x^{(t)}_{\ell,1}, \epsilon_\ell), \ldots, g_{j-1,\zeta_{\ell,j-1}}(x^{(t)}_{\ell,j-1}, \epsilon_\ell),\; x^*_{\ell,j},\; g_{j+2,\zeta_{\ell,j+2}}(x^{(t)}_{\ell,j+2}, \epsilon_\ell), \ldots, g_{k,\zeta_{\ell,k}}(x^{(t)}_{\ell,k}, \epsilon_\ell)\big).$$
Re-label the elements of $x'_\ell$ as $(x'_{\ell,1}, x'_{\ell,2}, \ldots, x'_{\ell,k-1})$.
(f) For $\ell = 1, \ldots, m$, solve for $\epsilon^*_\ell$ from the equations $g_{j,\zeta_{\ell,j}=1}(x^*_{\ell,j}, \epsilon^*_\ell) = x_{\ell,j}$ and $g_{j,\zeta^c_{\ell,j}=-1}(x^*_{\ell,j}, \epsilon^*_\ell) = x_{\ell,j+1}$, and express $\epsilon^*_\ell$ in terms of $x_{\ell,j}$ and $x_{\ell,j+1}$. Let $\boldsymbol{\epsilon}^*_m = (\epsilon^*_1, \ldots, \epsilon^*_m)$.
(g) Calculate the acceptance probability of the death move:
$$a_d(x^{(t)}, \boldsymbol{\epsilon}_m, \boldsymbol{\epsilon}^*_m) = \min\left\{1,\; \frac{w_{b,k-1}}{w_{d,k}} \times \frac{P^{(j,j+1)}(\zeta^c)}{P^{(j,j+1)}(\zeta)} \times \frac{\pi(x')}{\pi(x^{(t)})} \times \left|\frac{\partial\big(T_{d,\zeta}(x^{(t)}, \boldsymbol{\epsilon}_m), \boldsymbol{\epsilon}^*_m, \boldsymbol{\epsilon}_m\big)}{\partial(x^{(t)}, \boldsymbol{\epsilon}_m)}\right|\right\},$$
where
$$P^{(j,j+1)}(\zeta) = \prod_{\ell=1}^{m} \prod_{i \in \{1,\ldots,k\}\setminus\{j,j+1\}} p_{\ell,i}^{I_{\{1\}}(\zeta_{\ell,i})}\, q_{\ell,i}^{I_{\{-1\}}(\zeta_{\ell,i})}, \quad \text{and} \quad P^{(j,j+1)}(\zeta^c) = \prod_{\ell=1}^{m} \prod_{i \in \{1,\ldots,k\}\setminus\{j,j+1\}} p_{\ell,i}^{I_{\{1\}}(\zeta^c_{\ell,i})}\, q_{\ell,i}^{I_{\{-1\}}(\zeta^c_{\ell,i})}.$$
(h) Set $x^{(t+1)} = x'$ with probability $a_d(x^{(t)}, \boldsymbol{\epsilon}_m, \boldsymbol{\epsilon}^*_m)$, and $x^{(t+1)} = x^{(t)}$ with probability $1 - a_d(x^{(t)}, \boldsymbol{\epsilon}_m, \boldsymbol{\epsilon}^*_m)$.
4. If $u_3 = 1$ (dimension remains unchanged), then implement steps (1), (2), (3) of Algorithm 3.1 of Dutta & Bhattacharya (2014).
• End for
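As a concrete illustration of the birth step of Algorithm S-7.1, the Python sketch below builds the proposal $x'$ from $m$ related co-ordinate sets. It assumes, for simplicity, additive transformations $g_{i,\zeta}(x,\epsilon) = x + \zeta a_\ell \epsilon$ with one scale $a_\ell$ per set and $p_{\ell,i} = q_{\ell,i} = 1/2$ (so that the factor $P^{(j)}(\zeta^c)/P^{(j)}(\zeta)$ equals 1); the function and variable names are ours, not the paper's:

```python
import random

def birth_proposal(x, a, g=lambda: abs(random.gauss(0.0, 1.0))):
    """Additive birth move of Algorithm S-7.1 (sketch under simplifying assumptions).

    x : list of m lists, each of length k (the m related co-ordinate sets)
    a : list of m scaling constants, one per set (an assumption; the paper
        allows co-ordinate-specific scales)
    Returns the proposal (m lists of length k+1), plus (j, eps, zeta)
    needed to form the acceptance ratio.
    """
    m, k = len(x), len(x[0])
    j = random.randrange(k)                      # co-ordinate selected for splitting
    eps = [g() for _ in range(m)]                # eps_1, ..., eps_m iid ~ g(.)
    # zeta_{l,i} = +/-1 with probability 1/2 each for the non-split co-ordinates
    zeta = [[random.choice([1, -1]) for _ in range(k)] for _ in range(m)]
    x_new = []
    for l in range(m):
        row = []
        for i in range(k):
            if i == j:   # split x_{l,j} into two new co-ordinates
                row += [x[l][i] + a[l] * eps[l], x[l][i] - a[l] * eps[l]]
            else:        # additive transformation of the remaining co-ordinates
                row.append(x[l][i] + zeta[l][i] * a[l] * eps[l])
        x_new.append(row)
    return x_new, j, eps, zeta

random.seed(1)
x = [[0.0, 1.0, 2.0], [5.0, 6.0, 7.0]]           # m = 2 sets, k = 3
x_new, j, eps, zeta = birth_proposal(x, a=[0.1, 0.2])
assert all(len(row) == 4 for row in x_new)       # each set grew from k to k+1
```

Under these assumptions the acceptance probability of step 2(d) reduces to $\min\{1,\; (w_{d,k+1}/w_{b,k}) \times \pi(x')/\pi(x^{(t)}) \times \prod_{\ell=1}^m 2a_\ell\}$, mirroring the $2a_1$ Jacobian of the two-dimensional example.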
S-7.4 Detailed balance

To see that detailed balance is satisfied for the birth and death moves, note that, associated with the birth move, the probability of the transition $x\ (\in \mathbb{R}^{k}) \mapsto T_{b,\zeta}(x, \boldsymbol{\epsilon}_m)\ (\in \mathbb{R}^{k+m})$, with $k \geq m$, is given by:
$$\pi(x) \times \frac{1}{k} \times w_{b,k} \times \prod_{\ell=1}^{m} g(\epsilon_\ell) \times P^{(j)}(\zeta) \times \min\left\{1,\; \frac{w_{d,k+m}}{w_{b,k}} \times \frac{P^{(j)}(\zeta^c)}{P^{(j)}(\zeta)} \times \frac{\pi(T_{b,\zeta}(x, \boldsymbol{\epsilon}_m))}{\pi(x)} \times \left|\frac{\partial\, T_{b,\zeta}(x, \boldsymbol{\epsilon}_m)}{\partial(x, \boldsymbol{\epsilon}_m)}\right|\right\}$$
$$= \frac{1}{k} \times \prod_{\ell=1}^{m} g(\epsilon_\ell) \times \min\left\{\pi(x)\, w_{b,k}\, P^{(j)}(\zeta),\;\; w_{d,k+m}\, P^{(j)}(\zeta^c)\, \pi(T_{b,\zeta}(x, \boldsymbol{\epsilon}_m)) \left|\frac{\partial\, T_{b,\zeta}(x, \boldsymbol{\epsilon}_m)}{\partial(x, \boldsymbol{\epsilon}_m)}\right|\right\}. \quad \text{(S-7.4)}$$
Writing $x' = T_{b,\zeta}(x, \boldsymbol{\epsilon}_m)$, the transition probability of the reverse death move is given by:
$$\pi(x') \times w_{d,k+m} \times \frac{1}{k} \times \prod_{\ell=1}^{m} g(\epsilon_\ell) \times P^{(j)}(\zeta^c) \times \left|\frac{\partial\big(T^{-1}_{d,\zeta}(x', \boldsymbol{\epsilon}_m), \boldsymbol{\epsilon}^*_m, \boldsymbol{\epsilon}_m\big)}{\partial(x', \boldsymbol{\epsilon}_m)}\right| \times \min\left\{1,\; \frac{w_{b,k}}{w_{d,k+m}} \times \frac{P^{(j)}(\zeta)}{P^{(j)}(\zeta^c)} \times \frac{\pi(x)}{\pi(x')} \times \left|\frac{\partial\big(T_{d,\zeta}(x', \boldsymbol{\epsilon}_m), \boldsymbol{\epsilon}^*_m, \boldsymbol{\epsilon}_m\big)}{\partial(x', \boldsymbol{\epsilon}_m)}\right|\right\}$$
$$= \frac{1}{k} \times \prod_{\ell=1}^{m} g(\epsilon_\ell) \times \min\left\{\pi(x')\, w_{d,k+m}\, P^{(j)}(\zeta^c) \left|\frac{\partial\big(T^{-1}_{d,\zeta}(x', \boldsymbol{\epsilon}_m), \boldsymbol{\epsilon}^*_m, \boldsymbol{\epsilon}_m\big)}{\partial(x', \boldsymbol{\epsilon}_m)}\right|,\;\; w_{b,k}\, P^{(j)}(\zeta)\, \pi(x)\right\}. \quad \text{(S-7.5)}$$
Noting that
$$\left|\frac{\partial\big(T^{-1}_{d,\zeta}(x', \boldsymbol{\epsilon}_m), \boldsymbol{\epsilon}^*_m, \boldsymbol{\epsilon}_m\big)}{\partial(x', \boldsymbol{\epsilon}_m)}\right| = \left|\frac{\partial\, T_{b,\zeta}(x, \boldsymbol{\epsilon}_m)}{\partial(x, \boldsymbol{\epsilon}_m)}\right|,$$
it follows that (S-7.4) = (S-7.5), showing that detailed balance holds for the birth and the death moves.

S-8. TTMCMC ALGORITHM FOR OUR SPATIO-TEMPORAL MODEL

We now specialize the general TTMCMC algorithm (Algorithm S-7.1) provided in Section S-7.3 to our spatio-temporal context. For our spatio-temporal model, we need to update the variable dimensional mass parameter $V = (V_1, \ldots, V_k)$, the point process variables $z = (z_1, \ldots, z_k)$, the location parameters $\theta_1 = (\theta_{11}, \ldots, \theta_{1k})$, the other set of location parameters $\theta_2 = (\theta_{21}, \ldots, \theta_{2k})$, and the fixed dimensional parameters $\alpha$, $\lambda$, the error variance $\sigma^2$, and the parameters related to the kernel, namely, $\varphi$, $a_\delta$, $b_\psi$, $(\psi_1(s_1), \ldots, \psi_1(s_n))$, $(\psi_2(s_1), \ldots, \psi_2(s_n))$, $(\delta(t_1), \ldots, \delta(t_n))$, and $\tau$. For updating the variable dimensional parameters we use our proposed TTMCMC algorithm, and for the fixed dimensional parameters we use the TMCMC algorithm of Dutta & Bhattacharya (2014). We denote by $\xi = (V, z, \theta_1, \theta_2)$ the collection of all variable dimensional parameters, and by $\eta = (\varphi, a_\delta, b_\psi, \psi_1(s_1), \ldots, \psi_1(s_n), \psi_2(s_1), \ldots, \psi_2(s_n), \delta(t_1), \ldots, \delta(t_n), \tau, \alpha, \lambda)$ the collection of all fixed dimensional parameters. The detailed updating procedure is provided as Algorithm S-8.1.

Algorithm S-8.1
Detailed updating procedure of our spatio-temporal model
• Initialise the number of components $k$; let $k^{(0)}$ be the chosen initial value (we chose $k^{(0)} = 15$ as the initial value for our applications).
• Given $k = k^{(0)}$, let $\xi^{(0)}$ denote the initial value of $\xi$. Also, let $\eta^{(0)}$ denote the initial value of $\eta$.
• Since $z$ and $V$ are constrained random variables, we consider updating the reparameterized versions $V^* = \log(V)$ and $z^* = \log\big(\frac{z - a}{b - a}\big)$. After every iteration we invert the transformations to store the original variables $V$ and $z$. For convenience of presentation of our algorithm we slightly abuse notation by referring to $V^*$ and $z^*$ as $V$ and $z$, respectively.
• For $t = 0, 1, 2, \ldots$
1. Generate $u = (u_1, u_2, u_3) \sim \text{Multinomial}\big(1; \tfrac{1}{3}, \tfrac{1}{3}, \tfrac{1}{3}\big)$.
2. If $u_1 = 1$ (increase dimension from $k$ to $k+1$ for each of the variables $V$, $z$, $\theta_1$, $\theta_2$), then
(a) Randomly select one co-ordinate from $\{1, \ldots, k\}$. Let $j$ denote the chosen co-ordinate.
(b) Generate $\boldsymbol{\epsilon} = (\epsilon_1, \ldots, \epsilon_4) \stackrel{iid}{\sim} N(0,1)I_{\{\epsilon > 0\}}$ ($I_{\{\epsilon > 0\}}$ denoting the indicator function). For updating the variable dimensional parameters, simulate $\zeta^{(1)}_{\ell,i} = \pm 1$ with probability $1/2$ each, for $i \in \{1, \ldots, k\}\setminus\{j\}$ and $\ell = 1, \ldots, 4$; for updating the fixed one-dimensional parameters, simulate $\zeta^{(2)}_{\ell} = \pm 1$ with probability $1/2$ each, for $\ell = 5, \ldots, 11$; and for updating the fixed multi-dimensional parameters, simulate $\zeta^{(3)}_{\ell,i} = \pm 1$ with probability $1/2$ each, for $i \in \{1, \ldots, n\}$ and $\ell = 12, 13, 14$.
(c) Propose the birth move as follows. For $i \in \{1, \ldots, k\}\setminus\{j\}$, apply the additive transformations
$$V^{(t)}_i \rightarrow V^{(t)}_i + \zeta^{(1)}_{1,i} a_1 \epsilon_1, \quad z^{(t)}_i \rightarrow z^{(t)}_i + \zeta^{(1)}_{2,i} a_2 \epsilon_2, \quad \theta^{(t)}_{1i} \rightarrow \theta^{(t)}_{1i} + \zeta^{(1)}_{3,i} a_3 \epsilon_3, \quad \theta^{(t)}_{2i} \rightarrow \theta^{(t)}_{2i} + \zeta^{(1)}_{4,i} a_4 \epsilon_4,$$
and split $V^{(t)}_j$ into $V^{(t)}_j + a_1 \epsilon_1$ and $V^{(t)}_j - a_1 \epsilon_1$; $z^{(t)}_j$ into $z^{(t)}_j + a_2 \epsilon_2$ and $z^{(t)}_j - a_2 \epsilon_2$; $\theta^{(t)}_{1j}$ into $\theta^{(t)}_{1j} + a_3 \epsilon_3$ and $\theta^{(t)}_{1j} - a_3 \epsilon_3$; and $\theta^{(t)}_{2j}$ into $\theta^{(t)}_{2j} + a_4 \epsilon_4$ and $\theta^{(t)}_{2j} - a_4 \epsilon_4$. In other words, let $\xi' = T_{b,\zeta^{(1)}}(\xi^{(t)}, \boldsymbol{\epsilon}) = (V', z', \theta'_1, \theta'_2)$ denote the complete birth move, where
$$V' = \big(V^{(t)}_1 + \zeta^{(1)}_{1,1} a_1 \epsilon_1, \ldots, V^{(t)}_{j-1} + \zeta^{(1)}_{1,j-1} a_1 \epsilon_1,\; V^{(t)}_j + a_1 \epsilon_1,\; V^{(t)}_j - a_1 \epsilon_1, \ldots, V^{(t)}_k + \zeta^{(1)}_{1,k} a_1 \epsilon_1\big), \quad \text{(S-8.1)}$$
$$z' = \big(z^{(t)}_1 + \zeta^{(1)}_{2,1} a_2 \epsilon_2, \ldots, z^{(t)}_{j-1} + \zeta^{(1)}_{2,j-1} a_2 \epsilon_2,\; z^{(t)}_j + a_2 \epsilon_2,\; z^{(t)}_j - a_2 \epsilon_2, \ldots, z^{(t)}_k + \zeta^{(1)}_{2,k} a_2 \epsilon_2\big), \quad \text{(S-8.2)}$$
$$\theta'_1 = \big(\theta^{(t)}_{1,1} + \zeta^{(1)}_{3,1} a_3 \epsilon_3, \ldots, \theta^{(t)}_{1,j-1} + \zeta^{(1)}_{3,j-1} a_3 \epsilon_3,\; \theta^{(t)}_{1,j} + a_3 \epsilon_3,\; \theta^{(t)}_{1,j} - a_3 \epsilon_3, \ldots, \theta^{(t)}_{1,k} + \zeta^{(1)}_{3,k} a_3 \epsilon_3\big), \quad \text{(S-8.3)}$$
$$\theta'_2 = \big(\theta^{(t)}_{2,1} + \zeta^{(1)}_{4,1} a_4 \epsilon_4, \ldots, \theta^{(t)}_{2,j-1} + \zeta^{(1)}_{4,j-1} a_4 \epsilon_4,\; \theta^{(t)}_{2,j} + a_4 \epsilon_4,\; \theta^{(t)}_{2,j} - a_4 \epsilon_4, \ldots, \theta^{(t)}_{2,k} + \zeta^{(1)}_{4,k} a_4 \epsilon_4\big). \quad \text{(S-8.4)}$$
Re-label the $k+1$ elements of $V'$ as $(V'_1, V'_2, \ldots, V'_{k+1})$, of $z'$ as $(z'_1, z'_2, \ldots, z'_{k+1})$, of $\theta'_1$ as $(\theta'_{1,1}, \theta'_{1,2}, \ldots, \theta'_{1,k+1})$, and of $\theta'_2$ as $(\theta'_{2,1}, \theta'_{2,2}, \ldots, \theta'_{2,k+1})$.
(d) We apply the additive transformation based on a single $\epsilon$ to update all the fixed dimensional parameters $\eta$ as follows:
$$\varphi^{(t)} \rightarrow \varphi^{(t)} + \zeta^{(2)}_5 a_5 \epsilon, \quad a^{(t)}_\delta \rightarrow a^{(t)}_\delta + \zeta^{(2)}_6 a_6 \epsilon, \quad b^{(t)}_\psi \rightarrow b^{(t)}_\psi + \zeta^{(2)}_7 a_7 \epsilon, \quad \alpha^{(t)} \rightarrow \alpha^{(t)} + \zeta^{(2)}_8 a_8 \epsilon,$$
$$\lambda^{(t)} \rightarrow \lambda^{(t)} + \zeta^{(2)}_9 a_9 \epsilon, \quad \tau^{(t)} \rightarrow \tau^{(t)} + \zeta^{(2)}_{10} a_{10} \epsilon, \quad \sigma^{(t)} \rightarrow \sigma^{(t)} + \zeta^{(2)}_{11} a_{11} \epsilon,$$
$$\psi^{(t)}_1(s_i) \rightarrow \psi^{(t)}_1(s_i) + \zeta^{(3)}_{12,i} a_{12} \epsilon, \quad \psi^{(t)}_2(s_i) \rightarrow \psi^{(t)}_2(s_i) + \zeta^{(3)}_{13,i} a_{13} \epsilon, \quad \delta^{(t)}(t_i) \rightarrow \delta^{(t)}(t_i) + \zeta^{(3)}_{14,i} a_{14} \epsilon.$$
Let $\eta' = T_{b,\zeta^{(2)}}(\eta^{(t)}, \epsilon) = (\varphi', a'_\delta, b'_\psi, \psi'_1(s_1), \ldots, \psi'_1(s_n), \psi'_2(s_1), \ldots, \psi'_2(s_n), \delta'(t_1), \ldots, \delta'(t_n), \tau', \alpha', \lambda', \sigma')$ denote the complete move-type for the fixed dimensional parameters. In the above transformations the $a_i$'s are scaling constants to be chosen appropriately; see Das & Bhattacharya (2014b) and Dey & Bhattacharya (2014) for the details. In our applications we choose the scales on the basis of pilot runs of our TTMCMC algorithm.
(e) Calculate the acceptance probability:
$$a_{b,\zeta}(\xi^{(t)}, \eta^{(t)}, \boldsymbol{\epsilon}) = \min\left\{1,\; \frac{\pi(\xi', \eta')}{\pi(\xi^{(t)}, \eta^{(t)})} \left|\frac{\partial\, T_{b,\zeta^{(1)}}(\xi^{(t)}, \boldsymbol{\epsilon})}{\partial(\xi^{(t)}, \boldsymbol{\epsilon})}\right| \left|\frac{\partial\, T_{b,\zeta^{(2)}}(\eta^{(t)}, \epsilon)}{\partial(\eta^{(t)}, \epsilon)}\right|\right\},$$
where
$$\left|\frac{\partial\, T_{b,\zeta^{(1)}}(\xi^{(t)}, \boldsymbol{\epsilon})}{\partial(\xi^{(t)}, \boldsymbol{\epsilon})}\right| = 2^4 a_1 a_2 a_3 a_4 \quad \text{and} \quad \left|\frac{\partial\, T_{b,\zeta^{(2)}}(\eta^{(t)}, \epsilon)}{\partial(\eta^{(t)}, \epsilon)}\right| = 1.$$
(f) Set $(\xi^{(t+1)}, \eta^{(t+1)}) = (\xi', \eta')$ with probability $a_{b,\zeta}(\xi^{(t)}, \eta^{(t)}, \boldsymbol{\epsilon})$, and $(\xi^{(t+1)}, \eta^{(t+1)}) = (\xi^{(t)}, \eta^{(t)})$ with probability $1 - a_{b,\zeta}(\xi^{(t)}, \eta^{(t)}, \boldsymbol{\epsilon})$.
3. If $u_2 = 1$ (decrease dimension from $k$ to $k-1$ for each of the variables $V$, $z$, $\theta_1$, $\theta_2$), then
(a) Generate $\boldsymbol{\epsilon} = (\epsilon_1, \ldots, \epsilon_4) \stackrel{iid}{\sim} N(0,1)I_{\{\epsilon > 0\}}$.
(b) Randomly select one co-ordinate from $\{1, 2, \ldots, k-1\}$. Let $j$ be the selected co-ordinate. Then let $V^*_j = \big((V_j + a_1 \epsilon_1) + (V_{j+1} - a_1 \epsilon_1)\big)/2$; replace the co-ordinate $V_j$ by the average $V^*_j$ and delete $V_{j+1}$. Similarly, let $z^*_j = \big((z_j + a_2 \epsilon_2) + (z_{j+1} - a_2 \epsilon_2)\big)/2$; replace the co-ordinate $z_j$ by the average $z^*_j$ and delete $z_{j+1}$. Form $\theta^*_{1,j} = \big((\theta_{1,j} + a_3 \epsilon_3) + (\theta_{1,j+1} - a_3 \epsilon_3)\big)/2$ and replace the co-ordinate $\theta_{1,j}$ by the average $\theta^*_{1,j}$, deleting $\theta_{1,j+1}$; create $\theta^*_{2,j} = \big((\theta_{2,j} + a_4 \epsilon_4) + (\theta_{2,j+1} - a_4 \epsilon_4)\big)/2$ and replace the co-ordinate $\theta_{2,j}$ by the average $\theta^*_{2,j}$, deleting $\theta_{2,j+1}$.
(c) Simulate $\zeta$ similarly as in the case of the birth move.
(d) For the co-ordinates other than $j$ and $j+1$, apply the additive transformations
$$V^{(t)}_i \rightarrow V^{(t)}_i + \zeta^{(1)}_{1,i} a_1 \epsilon_1, \quad z^{(t)}_i \rightarrow z^{(t)}_i + \zeta^{(1)}_{2,i} a_2 \epsilon_2, \quad \theta^{(t)}_{1i} \rightarrow \theta^{(t)}_{1i} + \zeta^{(1)}_{3,i} a_3 \epsilon_3, \quad \theta^{(t)}_{2i} \rightarrow \theta^{(t)}_{2i} + \zeta^{(1)}_{4,i} a_4 \epsilon_4,$$
for $i \in \{1, \ldots, k\}\setminus\{j, j+1\}$.
(e) In other words, let $\xi' = T_{d,\zeta^{(1)}}(\xi^{(t)}, \boldsymbol{\epsilon}) = (V', z', \theta'_1, \theta'_2)$ denote the complete death move, where
$$V' = \big(V^{(t)}_1 + \zeta^{(1)}_{1,1} a_1 \epsilon_1, \ldots, V^{(t)}_{j-1} + \zeta^{(1)}_{1,j-1} a_1 \epsilon_1,\; V^*_j,\; V^{(t)}_{j+2} + \zeta^{(1)}_{1,j+2} a_1 \epsilon_1, \ldots, V^{(t)}_k + \zeta^{(1)}_{1,k} a_1 \epsilon_1\big), \quad \text{(S-8.5)}$$
$$z' = \big(z^{(t)}_1 + \zeta^{(1)}_{2,1} a_2 \epsilon_2, \ldots, z^{(t)}_{j-1} + \zeta^{(1)}_{2,j-1} a_2 \epsilon_2,\; z^*_j,\; z^{(t)}_{j+2} + \zeta^{(1)}_{2,j+2} a_2 \epsilon_2, \ldots, z^{(t)}_k + \zeta^{(1)}_{2,k} a_2 \epsilon_2\big), \quad \text{(S-8.6)}$$
$$\theta'_1 = \big(\theta^{(t)}_{1,1} + \zeta^{(1)}_{3,1} a_3 \epsilon_3, \ldots, \theta^{(t)}_{1,j-1} + \zeta^{(1)}_{3,j-1} a_3 \epsilon_3,\; \theta^*_{1,j},\; \theta^{(t)}_{1,j+2} + \zeta^{(1)}_{3,j+2} a_3 \epsilon_3, \ldots, \theta^{(t)}_{1,k} + \zeta^{(1)}_{3,k} a_3 \epsilon_3\big), \quad \text{(S-8.7)}$$
$$\theta'_2 = \big(\theta^{(t)}_{2,1} + \zeta^{(1)}_{4,1} a_4 \epsilon_4, \ldots, \theta^{(t)}_{2,j-1} + \zeta^{(1)}_{4,j-1} a_4 \epsilon_4,\; \theta^*_{2,j},\; \theta^{(t)}_{2,j+2} + \zeta^{(1)}_{4,j+2} a_4 \epsilon_4, \ldots, \theta^{(t)}_{2,k} + \zeta^{(1)}_{4,k} a_4 \epsilon_4\big). \quad \text{(S-8.8)}$$
Re-label the $k-1$ elements of $V'$ as $(V'_1, V'_2, \ldots, V'_{k-1})$, of $z'$ as $(z'_1, z'_2, \ldots, z'_{k-1})$, of $\theta'_1$ as $(\theta'_{1,1}, \theta'_{1,2}, \ldots, \theta'_{1,k-1})$, and of $\theta'_2$ as $(\theta'_{2,1}, \theta'_{2,2}, \ldots, \theta'_{2,k-1})$.
(f) Solve for $\epsilon^*_1$ from the equations $V^*_j + a_1 \epsilon^*_1 = V_j$ and $V^*_j - a_1 \epsilon^*_1 = V_{j+1}$, which yield $\epsilon^*_1 = \frac{V_j - V_{j+1}}{2a_1}$. Similarly, we have $\epsilon^*_2 = \frac{z_j - z_{j+1}}{2a_2}$, $\epsilon^*_3 = \frac{\theta_{1,j} - \theta_{1,j+1}}{2a_3}$, and $\epsilon^*_4 = \frac{\theta_{2,j} - \theta_{2,j+1}}{2a_4}$. Let $\boldsymbol{\epsilon}^* = (\epsilon^*_1, \ldots, \epsilon^*_4)$.
(g) For updating the fixed dimensional parameters $\eta = (\varphi, a_\delta, b_\psi, \psi_1(s_1), \ldots, \psi_1(s_n), \psi_2(s_1), \ldots, \psi_2(s_n), \delta(t_1), \ldots, \delta(t_n), \tau, \alpha, \lambda, \sigma)$, implement step 2(d).
(h) Calculate the acceptance probability of the death move:
$$a_{d,\zeta}(\xi^{(t)}, \eta^{(t)}, \boldsymbol{\epsilon}, \boldsymbol{\epsilon}^*) = \min\left\{1,\; \frac{\pi(\xi', \eta')}{\pi(\xi^{(t)}, \eta^{(t)})} \left|\frac{\partial\big(T_{d,\zeta^{(1)}}(\xi^{(t)}, \boldsymbol{\epsilon}), \boldsymbol{\epsilon}^*, \boldsymbol{\epsilon}\big)}{\partial(\xi^{(t)}, \boldsymbol{\epsilon})}\right| \left|\frac{\partial\, T_{d,\zeta^{(2)}}(\eta^{(t)}, \epsilon)}{\partial(\eta^{(t)}, \epsilon)}\right|\right\},$$
where
$$\left|\frac{\partial\big(T_{d,\zeta^{(1)}}(\xi^{(t)}, \boldsymbol{\epsilon}), \boldsymbol{\epsilon}^*, \boldsymbol{\epsilon}\big)}{\partial(\xi^{(t)}, \boldsymbol{\epsilon})}\right| = \frac{1}{2^4 a_1 a_2 a_3 a_4} \quad \text{and} \quad \left|\frac{\partial\, T_{d,\zeta^{(2)}}(\eta^{(t)}, \epsilon)}{\partial(\eta^{(t)}, \epsilon)}\right| = 1.$$
(i) Set $(\xi^{(t+1)}, \eta^{(t+1)}) = (\xi', \eta')$ with probability $a_{d,\zeta}(\xi^{(t)}, \eta^{(t)}, \boldsymbol{\epsilon}, \boldsymbol{\epsilon}^*)$, and $(\xi^{(t+1)}, \eta^{(t+1)}) = (\xi^{(t)}, \eta^{(t)})$ with probability $1 - a_{d,\zeta}(\xi^{(t)}, \eta^{(t)}, \boldsymbol{\epsilon}, \boldsymbol{\epsilon}^*)$.
4. If $u_3 = 1$ (dimension remains unchanged), then update $(\xi^{(t)}, \eta^{(t)})$ by implementing steps (1), (2), (3) of Algorithm 3.1 of Dutta & Bhattacharya (2014).
• End for
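The constrained-variable reparameterization used in Algorithm S-8.1 can be sketched as follows in Python (the bounds $a$ and $b$ of $z$ come from the model; the numerical values below are placeholders of our own). We follow the transformation exactly as stated in the algorithm, $z^* = \log\big(\frac{z-a}{b-a}\big)$, whose inverse is $z = a + (b-a)e^{z^*}$:

```python
import math

def to_unconstrained(V, z, a, b):
    """Map V > 0 and z in (a, b) via the transformations stated in Algorithm S-8.1."""
    V_star = [math.log(v) for v in V]
    z_star = [math.log((zi - a) / (b - a)) for zi in z]
    return V_star, z_star

def to_constrained(V_star, z_star, a, b):
    """Invert the transformations after each iteration to store V and z."""
    V = [math.exp(vs) for vs in V_star]
    z = [a + (b - a) * math.exp(zs) for zs in z_star]
    return V, z

# Round-trip check with placeholder bounds a = 0, b = 10
V0, z0 = [0.3, 1.7], [2.5, 9.1]
Vs, zs = to_unconstrained(V0, z0, 0.0, 10.0)
V1, z1 = to_constrained(Vs, zs, 0.0, 10.0)
assert all(abs(u - v) < 1e-12 for u, v in zip(V0, V1))
assert all(abs(u - v) < 1e-12 for u, v in zip(z0, z1))
```

Note that $\log\big(\frac{z-a}{b-a}\big)$ maps $(a, b)$ onto $(-\infty, 0)$ rather than onto the whole real line; a full logit, $\log\big(\frac{z-a}{b-z}\big)$, would remove the upper bound as well, but here we keep the form given in the text.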
Journal of Multivariate Analysis, 84, 85–100.
Chang, Y.-M., Hsu, N.-J., & Huang, H.-C. (2011), "Semiparametric Estimation and Selection for Nonstationary Spatial Covariance Functions,"
Journal of Computational and Graphical Statistics, 19, 117–139.
Damian, D., Sampson, P. D., & Guttorp, P. (2001), "Bayesian Estimation of Semi-parametric Non-stationary Spatial Covariance Structures,"
Environmetrics, 12, 161–178.
Das, M., & Bhattacharya, S. (2014a), "Supplement to "Nonstationary, Nonparametric, Nonseparable Bayesian Spatio-Temporal Modeling Using Kernel Convolution of Order Based Dependent Dirichlet Process"". Submitted.
Das, M., & Bhattacharya, S. (2014b), "Transdimensional Transformation Based Markov Chain Monte Carlo". Submitted. Available at http://arxiv.org/pdf/1403.5207.
Dey, K. K., & Bhattacharya, S. (2014), "On Optimal Scaling of Additive Transformation Based Markov Chain Monte Carlo". Submitted. Available at http://arxiv.org/abs/1307.1446.
Duan, J. A., Gelfand, A. E., & Sirmans, C. F. (2009), "Modeling Space-Time Data Using Stochastic Differential Equations," Bayesian Analysis, 4, 733–758.
Duan, J. A., Guindani, M., & Gelfand, A. E. (2007), "Generalized Spatial Dirichlet Process Models,"
Biometrika, 94, 809–825.
Dutta, S., & Bhattacharya, S. (2014), "Markov Chain Monte Carlo Based on Deterministic Transformations,"
Statistical Methodology, 16, 100–116. Also available at http://arxiv.org/abs/1106.5850; supplement available at http://arxiv.org/abs/1306.6684.
Ferguson, T. S. (1973), "A Bayesian Analysis of Some Nonparametric Problems,"
The Annals of Statistics, 1, 209–230.
Ferguson, T. S. (1974), "Prior Distributions on Spaces of Probability Measures,"
The Annals of Statistics, 2, 615–629.
Fuentes, M. (2002), "Spectral Methods for Nonstationary Spatial Processes,"
Biometrika, 89, 197–210.
Fuentes, M., & Smith, R. L. (2001), "A New Class of Nonstationary Spatial Models". Technical Report, Department of Statistics, North Carolina State University.
Gelfand, A. E., Kottas, A., & MacEachern, S. N. (2005), "Bayesian Nonparametric Spatial Modeling With Dirichlet Process Mixing,"
Journal of the American Statistical Association, 100, 1021–1035.
Griffin, J. E., & Steel, M. F. J. (2004), "Semiparametric Bayesian Inference for Stochastic Frontier Models,"
Journal of Econometrics, 123, 121–152.
Griffin, J. E., & Steel, M. F. J. (2006), "Order-Based Dependent Dirichlet Processes,"
Journal of the American Statistical Association, 101, 179–194.
Guttorp, P., & Sampson, P. D. (1994), "Methods for Estimating Heterogeneous Spatial Covariance Functions with Environmental Applications," in
Handbook of Statistics XII: Environmental Statistics, eds. G. P. Patil & C. R. Rao, Elsevier/North Holland, New York, pp. 663–690.
Haas, T. C. (1995), "Local Prediction of a Spatio-Temporal Process with an Application to Wet Sulfate Deposition,"
Journal of the American Statistical Association, 90, 1189–1199.
Higdon, D. (1998), "A Process-Convolution Approach to Modeling Temperatures in the North Atlantic Ocean,"
Environmental and Ecological Statistics, 5, 173–190.
Higdon, D. (2001), "Space and Space-Time Modeling Using Process Convolutions," in
Quantitative Methods for Current Environmental Issues, eds. C. W. Anderson, V. Barnett, P. C. Chatwin, & A. H. El-Shaarawi, Springer-Verlag, London, pp. 37–56.
Higdon, D., Swall, J., & Kern, J. (1999), "Non-Stationary Spatial Modeling," in
Bayesian Statistics 6, eds. J. M. Bernardo, J. O. Berger, A. P. Dawid, & A. F. M. Smith, Oxford University Press, Oxford, pp. 761–768.
Ishwaran, H., & James, L. F. (2001), "Gibbs Sampling Methods for Stick-Breaking Priors,"
Journal of the American Statistical Association, 96, 161–173.
Nott, D. J., & Dunsmuir, W. T. M. (2002), "Estimation of Nonstationary Spatial Covariance Structure,"
Biometrika, 89, 819–829.
Sampson, P. D., & Guttorp, P. (1992), "Nonparametric Estimation of Nonstationary Spatial Covariance Structure,"
Journal of the American Statistical Association, 87, 108–119.
Schmidt, A. M., & O'Hagan, A. (2003), "Bayesian Inference for Nonstationary Spatial Covariance Structure via Spatial Deformations,"
Journal of the Royal Statistical Society, Series B, 65, 743–758.
Sethuraman, J. (1994), "A Constructive Definition of Dirichlet Priors,"
Statistica Sinica, 4, 639–650.
Stein, M. L. (1999),
Interpolation of Spatial Data: Some Theory for Kriging, New York: Springer-Verlag.
Wolpert, R. L., Clyde, M. A., & Tu, C. (2011), "Stochastic Expansions Using Continuous Dictionaries: Lévy Adaptive Regression Kernels,"
Annals of Statistics (to appear).
Yaglom, A. M. (1987a), Correlation Theory of Stationary and Related Random Functions, Volume I: Basic Results, New York: Springer-Verlag.
Yaglom, A. M. (1987b), Correlation Theory of Stationary and Related Random Functions, Volume II: Supplementary Notes and References, New York: Springer-Verlag.

[Figure 10.1 panels (density plots; axis details omitted): posterior predictive densities of Y(s1,t1), Y(s12,t12), Y(s50,t50), Y(s53,t53), Y(s67,t67), and Y(s95,t95).]
Figure 10.1: Simulation study: Posterior predictive densities of Y(s, t) for 6 different location-time pairs – the corresponding true values are denoted by the vertical lines.

[Figure 10.2 panels (density plots; axis details omitted): correlations between Y(s1,t1) & Y(s2,t2); Y(s1,t1) & Y(s7,t7); Y(s1,t1) & Y(s95,t95); Y(s5,t5) & Y(s9,t9); Y(s5,t5) & Y(s12,t12); Y(s5,t5) & Y(s91,t91); Y(s21,t21) & Y(s23,t23); Y(s21,t21) & Y(s55,t55); Y(s21,t21) & Y(s92,t92); Y(s60,t60) & Y(s10,t10); Y(s60,t60) & Y(s69,t69); Y(s60,t60) & Y(s95,t95).]
Figure 10.2: Simulation study: Posterior densities of correlations for 12 different pairs of locations; the vertical lines indicate the true correlations.
[Figure 10.3 panels (surface plots; axis details omitted): spatial surfaces interpolating the data points and posterior predictive quantiles (top); spatial surfaces interpolating the median values and posterior predictive quantiles (bottom).]
Figure 10.3: Real data analysis: The top panel shows the surface plot of ozone concentrations (middle), with the lower and the upper 95% credible limits associated with the leave-one-out posterior predictive densities denoted by the lower and the upper surfaces, respectively. The bottom panel shows the surface plot of the posterior medians (middle) along with the lower and the upper 95% credible limits associated with the leave-one-out posterior predictive densities (lower and upper surfaces, respectively). The observed data points are indicated by '*'.
Figure 10.4: Real data analysis: Posterior predictive distributions summarized by the median (middle line) and the 95% credible intervals as a function of s. The observed data points are denoted by '*'.

[Further panels (density plots; axis details omitted): correlations between Y(s1) & Y(s5); Y(s4) & Y(s21); Y(s4) & Y(s75); Y(s25) & Y(s55); Y(s35) & Y(s45).]