Heterogeneous Regression Models for Clusters of Spatial Dependent Data
HHeterogeneous Regression Models for Clusters of SpatialDependent Data
Zhihua Ma Yishu Xue ∗ Guanyu Hu
Abstract
In economic development, there are often regions that share similar economic char-acteristics, and economic models on such regions tend to have similar covariate effects.In this paper, we propose a Bayesian clustered regression for spatially dependent datain order to detect clusters in the covariate effects. Our proposed method is based on theDirichlet process which provides a probabilistic framework for simultaneous inferenceof the number of clusters and the clustering configurations. The usage of our methodis illustrated both in simulation studies and an application to a housing cost datasetof Georgia. keywords:
Clustered Coefficients Regression, Dirichlet process, MCMC, Spatial Ran-dom Effects ∗ [email protected] a r X i v : . [ ec on . E M ] A p r Introduction
Spatial regression models have been widely applied in many different fields such as environ-mental science (Hu and Bradley, 2018), biological science (Zhang and Lawson, 2011), andeconometrics (Brunsdon et al., 1996) to explore the relation between a response variable anda set of predictors over a region. One of the most important tasks for a spatial regressionmodel is to capture the spatial dependent structure between a response variable and a setof covariates. Cressie (1992) proposed a spatial regression model with Gaussian process,where the spatial random effects are accounted for only by the intercepts. Brunsdon et al.(1996) proposed a geographically weighted regression (GWR) model which assumed the ex-istence of a spatially-dependent parameter surface, and used weighted local linear regressionto estimate such parameter surface. An application of GWR in analyzing the impact ofsocio-economic factors on treated prevalence for mental disorders in Barcelona is presentedin Salinas-P´erez et al. (2015). The idea of GWR has been subsequently extended to the Coxmodel framework by Xue et al. (2019). In addition to mean-based regression, Chasco andGallo (2015) used a spatial quantile regression technique to identify heterogeneities. Fromthe Bayesian perspective, Gelfand et al. (2003) incorporated Gaussian process to regressioncoefficients to build a model with spatially varying coefficients. Autant-Bernard and LeSage(2019) used a Bayesian heterogeneous spatial autoregressive model that allows for spatialvariation variations in intercepts, covariate effects, and noise variances to study the knowl-edge production functions of different regions in order to set up their regional innovationstrategy. The aforementioned works, however, all assumed that each location has its own setof regression parameters, which sometimes leads to excessive numbers of parameters, andsubsequently overfitting. Cluster effects over the space of interest has not been taken intoaccount.Detection of heterogeneous covariate effects in many different fields, such as real estateapplications, spatial econometrics, and environmental science are becoming of increasing re-2earch interest. For example, administrative divisions in a country, such as regions, provinces,states, or territories, often have different economic statuses and development patterns. Moreadvanced divisions and less developed divisions could be put into separate clusters and an-alyzed. Such clustering information is of great interest to regional economics researchers.One of the most popular methods for spatial cluster detection is the scan statistic method(Kulldorff and Nagarwalla, 1995), where a scan statistic is constructed via a likelihood ratiostatistic to test the potential clusters. The usage of spatial scan statistics has been extendedto studies of disease mapping, crime, and public health. Similar endeavor has also been madeunder the Bayesian and nonparametric Bayesian frameworks in pursuit of spatial homogene-ity. Li et al. (2015) used nonparametric Bayesian method to detect cluster boundaries forareal data. Noticing that traditional methods may not work well with spatial missing data,Panzera et al. (2016) proposed using multiple imputation together with the Bayesian Inter-polation method to analyze spatially clustered missing data, which addresses both spatialunivariate and multivariate problems.Most of the aforementioned frequentist and Bayesian approaches mainly focus on estimat-ing cluster configurations of spatial response. Spatially varying patterns in the relationshipbetween a set of covariates and the response is also an important topic that needs to bestudied. Bill´e et al. (2017) used a two-step approach where in the first step, spatial regimesof spatially varying parameters are identified, and in the second step estimated. Recently,methods for cluster detection of spatial regression coefficients have been proposed to detectthe homogeneity of the covariates effects among subareas. Li and Sang (2019) incorpo-rated spatial neighborhood information in a penalized approach to detect spatially clusteredpatterns in the regression coefficients.Under the Bayesian framework, Lawson et al. (2014) explored the usage of multinomialpriors in modeling clustered coefficients in the accelerated failure time model for survivaldata. As discussed by Lawson et al. (2014), to infer the grouping level, complicated searchalgorithms in variable dimensional parameter space are needed, such as the reversible jump3arkov chain Monte Carlo (MCMC) algorithm of Green (1995), which assigns a prior onthe number of clusters, and this number is updated at each iteration of an MCMC chain.Such algorithms are difficult to implement and automate, and are known to suffer from lackof scalability and mixing issues. Nonparametric Bayesian approaches, such as the Dirichletprocess mixture model (DPMM; Ferguson, 1973), offer choices to allow for uncertainty inthe number of clusters, and provide an integrated probabilistic framework under which thenumber of clusters, the clustering configuration, and regression coefficients are simultaneouslyestimated.In this paper, we propose a Bayesian spatial clustered linear regression model with Dirich-let process (DP; Ferguson, 1973) prior, which considers spatially dependent structure andclusters the covariate effects simultaneously. In addition, implementation of our proposedmethods based on nimble (de Valpine et al., 2017), a relatively new and powerful R package,is discussed. The model diagnostic technique, logarithm of the pseudo-marginal likelihood(LPML; Ibrahim et al., 2013), is introduced to assess the fitness of our proposed model. Ourproposed Bayesian approach reveals interesting features of the state-level data of Georgia.The remainder of the paper is organized as follows. In Section 2, we develop a spatialclustered linear regression model with DP prior. In Section 3, a MCMC sampling algorithmbased on nimble and post MCMC inference are discussed. Extensive simulation studiesare carried out in the next section. For illustration, our proposed methodology is appliedto Georgia housing cost dataset in Section 5. Finally, we conclude this paper with a briefdiscussion. In this section, a Bayesian spatial clustered linear model using DPMM is proposed forcoefficient grouping in spatially dependent data. Based on the spatial regression model,spatially-varying coefficients are assigned with a nonparametric DP prior to achieve the goal4f grouping.
The basic geostatistical model (Gelfand and Schliep, 2016) for spatially dependent responseat locations s = ( s , . . . , s n ) is denoted by Y = Xβ + w + (cid:15) , (1)where Y = ( Y ( s ) , . . . , Y ( s n )) the n -dimensional vector of responses observed at the n different locations, X = X ( s ) (cid:62) . . .X ( s n ) (cid:62) is the n × p matrix of covariates, w = ( w ( s ) , . . . , w ( s n ))is the vector of spatial random effects, which is assumed to follow a stationary Gaussianprocess whose covariance structure often depends on the geographical locations, and (cid:15) ∼ MVN( , σ y I ) adds the nugget effect (see e.g., Chapter 6 of Carlin et al., 2014), which isusually a vector of white noise, with MVN denoting the multivariate normal distribution.Oftentimes, w ( s ) is assumed to also follow a MVN. The above spatial regression model canalso be rewritten as Y | β , w , σ y ∼ MVN( Xβ + w , σ y I ) , w ∼ MVN( , Σ W ) , where Σ W is the covariance matrix of the spatial random effect vector w ( s ), and I denotesthe identity matrix. Conditional on X and w , entries in Y are independent. Conventionally,the covariance matrix is given as Σ W = σ w H , where H is a matrix constructed using thegreat circle distance matrix, denoted as GCD, between different locations, i.e.,GCD = GCD( i, j ) i,j ∈{ ,...,n } = great circle distance between locations s i and s j , σ w is a scalar. There are three common weighting schemes for defining H , including:the unity scheme : H = I n × n the exponential scheme : H ( φ ) = H ( i, j ) i,j ∈{ ,...,n } = [exp( − GCD( i, j ) /φ )] n × n , the Gaussian scheme : H ( φ ) = H ( i, j ) i,j ∈{ ,...,n } = (cid:2) exp (cid:0) − (GCD( i, j ) /φ ) (cid:1)(cid:3) n × n , (2)where φ is a tuning parameter that controls the spatial correlation. Larger value of φ indicatesstronger correlation.Another spatial regression model is the spatially varying coefficients model (Gelfand et al.,2003): Y ( s ) = X ( s ) (cid:62) (cid:101) β ( s ) + (cid:15) ( s ) , (3)where X ( s ) is p × s , and (cid:101) β ( s ) is assumed to followa p -variate spatial process model. If we have observations ( Y ( s i ) , X ( s i )) for i = 1 , . . . , n ,they can be written into Y = X (cid:62) (cid:101) β + (cid:15) , where Y = ( Y ( s ) , . . . , Y ( s n )) (cid:62) , X (cid:62) is an n × ( np ) block diagonal matrix which has the rowvector X (cid:62) ( s i ) as its i -th diagonal entry, (cid:101) β = ( (cid:101) β ( s ) (cid:62) , . . . , (cid:101) β ( s n ) (cid:62) ) (cid:62) , and (cid:15) ∼ MVN( , σ I ).Gelfand et al. (2003) proposed the following hierarchical model: Y | (cid:101) β , σ ∼ MVN( X (cid:62) (cid:101) β , σ I ) (cid:101) β | µ β , T ∼ MVN(1 n × ⊗ µ β , H ( φ ) ⊗ T ) (4)where µ β is a p × H ( φ ) is a n × n matrix measuring spatial correlations between the n observed locations, T is a p × p covariance matrix associated with an observation vectorat any spatial location, and ⊗ denotes the Kronecker product. In model (1), β is constantover space, which means the covariate effects remain the same over all locations; model (4)6llows for different covariate effects over locations, but restrict the covariate effects to bedetermined by distances between pairs of locations as in (2).For many spatial economics data, however, some regions will share similar covariateeffects regardless of their geographical distance. Taking China as an example, Beijing tendsto have similar economic development pattern with Shanghai or Jiangsu (Ma et al., 2019b). The models in (1) and (4), however, do not take into account such inherent similarities inspatially dependent data. Within the Bayesian framework, coefficient clustering can be accomplished using a Dirichletprocess mixture model (DPMM) by nonparametrically linking the spatial response variableto covariates through cluster membership. Formally, a probability measure G following a DPwith a concentration parameter α and a base distribution G is denoted by G ∼ DP( α, G ) if( G ( A ) , · · · , G ( A r )) ∼ Dirichlet( αG ( A ) , · · · , αG ( A r )) , (5)where ( A , · · · , A r ) are finite measurable partitions of the space Ω. Several different formula-tions can be used for determining the DP. In this work, we use the stick-breaking constructionproposed by Sethuraman (1991) for DP realization, which is given as θ c ∼ G , G = ∞ (cid:88) c =1 π c δ θ c ( · ) ,π = V , π c = V c (cid:89) (cid:96) MCMC is used to draw samples from the posterior distributions of the model parameters.In this section we present the sampling scheme, the posterior inference of cluster belongings,and measurements to evaluate the estimation performance and clustering accuracy. We present the main R function written using the nimble package (de Valpine et al., 2017).The model is wrapped in a nimbleCode() function. For ease of exposition, we break it intoseparate snippets. The full code is available on GitHub with an example implementation.(link removed for blinding purposes, documentation submitted separately)Define S as the number of locations. The following code represent Equation (7). Ateach location, y[i] has a normal distribution with mu y[i] and precision tau y , whichis equivalent to 1 /σ y . A Gamma(1,1) prior is given to tau y . The coefficient vector forlocation i , b[i, 1:6] , equals the coefficient vector estimated for the cluster it belongs to,represented by latent[i] , which follows a multinomial distribution with probability vector zlatent[1:M] , where M denotes the number of potential clusters. SLMMCode <- nimbleCode( { for (i in 1:S) { y[i] ~ dnorm(mu_y[i], tau = tau_y)mu_y[i] <- b[i, 1] * x1[i] + b[i, 2] * x2[i] +b[i, 3] * x3[i] + b[i, 4] * x4[i] + b[i, 5] * x5[i] +b[i, 6] * x6[i] + W[i]b[i, 1:6] <- bm[latent[i], 1:6]latent[i] ~ dcat(zlatent[1:M]) The following code represent Equation (8). H represents the matrix Σ W , where phi is thetuning parameter φ in the exponential scheme in (2) that controls spatial correlation. Therandom effects at locations 1 to S follow a multivariate normal distribution with mu w[1:S] and precision matrix, which equals the product of σ w , tau w , and the inverse of H . H is definedas a function of a certain distance matrix Dist , which is passed into the function later as anargument of SLMMConsts . Here, the function is chosen to be the exponential scheme. Theprior distribution of the bandwidth phi is specified to be a uniform distribution from 0 to acertain upper limit, denoted by D . The prior of tau w is set to Gamma(1,1). for (j in 1:S) { for (k in 1:S) { H[j, k] <- exp(-Dist[j, k]/phi) }} W[1:S] ~ dmnorm(mu_w[1:S], prec = prec_W[1:S, 1:S])prec_W[1:S, 1:S] <- tau_w * inverse(H[1:S, 1:S])phi ~ dunif(0, D)tau_w ~ dgamma(1, 1)mu_w[1:S] <- rep(0, S) The distribution of β for each location s i is defined next. They each come from a multivariatenormal distribution with mean mu bm and covariance matrix var bm , which is a diagonalmatrix with all diagonal entries being . The inverse variance term, tau bm , is again10iven a Gamma(1,1) prior, and the entries in the mean vector are all given independentstandard normal priors. for (k in 1:M) { bm[k, 1:6] ~ dmnorm(mu_bm[1:6], cov = var_bm[1:6, 1:6]) } var_bm[1:6, 1:6] <- 1/tau_bm * diag(rep(1, 6))tau_bm ~ dgamma(1, 1)for (j in 1:6) { mu_bm[j] ~ dnorm(0, 1) } Finally for the model, the stick breaking process corresponding to Equations (10) and (11)is depicted. zlatent[1:M] <- stick_breaking(vlatent[1:(M - 1)])for (j in 1:(M - 1)) { vlatent[j] ~ dbeta(1, alpha) } alpha ~ dgamma(1, 1)tau_y ~ dgamma(1, 1) } ) With the full model defined, we next declare the data list, which is made up of the response Y , the covariates X[,1] to X[,6] , and the matrix of distances Dist . The constants in the11odel also need to be supplied, including the number of locations S , the number of startingclusters M , and the upper endpoint D for the uniform distribution of bandwidth. In addition,the initial values are specified. Code to compile the model, supply the initial values, andinvoke the MCMC process is included in the supplementary package. SLMMdata <- list(y = y, x1 = X[,1], x2 = X[,2], x3 = X[,3],x4 = X[,4], x5 = X[,5], x6 = X[,6],Dist = distmatrix)SLMMConsts <- list(S = 159, M = 50, D = 100)SLMMInits <- list(tau_y = 1,latent = rep(1, SLMMConsts$S), alpha = 2,tau_bm = 1,mu_bm = rnorm(6), phi = 1, tau_w = 1,vlatent = rbeta(SLMMConsts$M - 1, 1, 1)) The estimated parameters, together with the cluster assignments z , are determined for eachreplicate from the best post burn-in iteration selected using the Dahl’s method (Dahl, 2006).Dahl (2006) proposed a least-square model-based clustering for estimating the clustering ofobservations using draws from a posterior clustering distribution. In this method, mem-bership matrices for each iteration, B (1) , . . . , B ( M ) , where M is the number of post-burn-inMCMC itertations, are calculated. The membership matrix for the c th iteration, B ( c ) is12efined as: B ( c ) = ( B ( c ) ( i, j )) i,j ∈{ n } = 1( z ( c ) i = z ( c ) j ) n × n , (12)with 1() being the indicator function, B ( c ) ( i, j ) ∈ { , } for all i, j = 1 , ..., n and c = 1 , . . . , M .Having B ( c ) ( i, j ) = 1 means observations i and j are in the same cluster in the c th iteration.The average of B (1) , . . . , B ( M ) can be calculated as B = 1 M M (cid:88) c =1 B ( c ) , where (cid:80) here denotes element-wise summation of matrices. The ( i, j )th entry of B providesan empirical estimate of the probability for locations i and j to be in the same cluster.Next we find the iteration that has the least squared distance to B as: C LS = arg min c ∈ (1: M ) n (cid:88) i =1 n (cid:88) j =1 (cid:0) B ( c ) ( i, j ) − B ( i, j ) (cid:1) , (13)where B ( c ) ( i, j ) is the ( i, j )th entry of B ( c ) , and B ( i, j ) is the ( i, j )th entry of B . Anadvantage of the least-squares clustering is the fact that information from all clusterings areutilized via the usage of the empirical pairwise probability matrix B . It is also intuitivelyappealing, as the average clustering is selected instead of formed via an external, ad hoc clustering algorithm. In the spatial regression model, the Gaussian process spatial structure Σ W = σ w H canbe constructed via several different weighting schemes including the aforementioned unity,exponential, and Gaussian schemes in (2). In order to determine which weighting scheme isthe most suitable for the data, a commonly used model comparison criterion, the logarithm13f the pseudo-Marginal likelihood (LPML; Ibrahim et al., 2013), is applied. The LPMLcan be obtained through the conditional predictive ordinate (CPO) values. With Y ∗ ( − i ) =( Y , . . . , Y i − , Y i +1 , . . . , Y n ) denoting the observations with the i th subject response deleted,CPO can be regarded as leave-one-out-cross-validation under Bayesian framework, and itestimates the probability of observing Y i in the future if after having already observed Y ∗ ( − i ) .The CPO for the i th subject is calculated as:CPO i = (cid:90) f ( y ( s i ) | β ( s i ) , w ( s i ) , σ y ) π ( w ( s ) , β ( s ) , σ y | Y ∗ ( − i ) ) d (cid:0) w ( s ) , β ( s ) , σ y (cid:1) , (14)where π ( w ( s ) , β ( s ) , σ y | Y ∗ ( − i ) ) = (cid:81) j (cid:54) = i f ( y ( s j ) | β ( s j ) , w ( s j ) , σ y ) π ( w ( s ) , β ( s ) , σ y | Y ∗ ( − i ) ) c ( Y ∗ ( − i ) ) , and c ( Y ∗ ( − i ) ) is the normalizing constant. Within the Bayesian framework, a Monte Carloestimate of the CPO can be obtained as: (cid:91) CPO − i = 1 M M (cid:88) t =1 f ( y ( s i ) | w t ( s i ) , β t ( s i ) , σ yt ) , (15)where w t ( s i ) is calculated based on the sampled φ in the t -th iteration, and β t ( s i ) and σ yt are, respectively, the t -th iteration samples for β ( s i ) and σ y . An estimate of the LPML cansubsequently be calculated as: (cid:92) LPML = N (cid:88) i =1 log (cid:16) (cid:91) CPO i (cid:17) . (16)A model with a larger LPML value is preferred. In addition, p D , the effective number ofparameters, which can be used to measure the complexity of the model, is defined as p D = D − D ( θ ) , (17)14here D = − f ( y ( s ) | θ ) is the deviance of the model, D is the posterior mean ofdeviance, θ is the posterior mean of the parameters and D ( θ ) denotes deviance at posteriormeans. We use the Rand index (RI; Rand, 1971) to measure the accuracy of clustering. The RI isdefined as RI = ( a + b ) / ( a + b + c + d ) = ( a + b ) / (cid:18) n (cid:19) , where C = { X , . . . , X r } and C = { Y , . . . , Y s } are two partitions of { , , . . . , n } , and a, b, c and d respectively denote the number of pairs of elements of { , , . . . , n } that are (a) in asame set in C and a same set in C , (b) in different sets in C and different sets in C , (c)in a same set in C but in different sets in C , and (d) in different sets in C and a same setin C . The RI ranges from 0 to 1 with a higher value indicating better agreement betweenthe two partitions. In particular, RI = 1 indicates that C and C are identical in terms ofmodulo labeling of the nodes. In this section, we conduct simulation studies to assess the performance of the proposedmethods under scenarios where there is no clustered covariate effect, and when there isindeed clustered covariate effect. All simulations are run on an institutional high performancecomputing cluster running Red Hat Enterprise Linux Server (release 6.7).15 .1 Simulation Without Clustered Covariate Effects The spatial adjacency structure of counties in Georgia is used. As a starting point, to mimicthe real dataset we use later, one observation is generated for each of the 159 counties. Sixcovariate vectors are generated for the 159 counties with each entry i.i.d. from N (0 , × X . The spatial random effects w are simulated basedon the matrix of great circle distance (GCD) between county centroids. The great circledistances are obtained using the function distCosine() , and the centroids are calculated basedon county polygons using the function centroid() , both provided by the R package geosphere (Hijmans, 2017). The GCD matrix is subsequently normalized to have a maximum value of10 for ease in computation. and the response vector Y is generated as Y = Xβ + w + (cid:15) , where X = ( X , . . . , X ), w ∼ MVN( , exp( − GCD / (cid:15) ∼ MVN( , I ). Differentvalues of β are used: (1 , , , , . , (cid:62) , (2 , , , , , (cid:62) , and (9 , , − , , , (cid:62) , correspondingto scenarios where the signal is weak, moderate, and strong. For each of the three β ’s,the average parameter estimate denoted by (cid:98) β (cid:96),m ( (cid:96) = 1 , · · · , m = 1 , · · · , 6) in 100simulations is calculated as (cid:98) β (cid:96),m = 1100 (cid:88) r =1 (cid:98) β (cid:96),m,r , (18)where (cid:98) β (cid:96),m,r denotes the posterior estimate for the m th coefficient of county (cid:96) in the r threplicate. The performance of these posterior estimates are evaluated by the mean absolutebias (MAB), the mean standard deviation (MSD), the mean of mean squared error (MMSE)and mean coverage rate (MCR) of the 95% highest posterior density (HPD) intervals in the16ollowing ways: MAB = 1159 (cid:88) (cid:96) =1 (cid:88) r =1 (cid:12)(cid:12)(cid:12) (cid:98) β (cid:96),m,r − β (cid:96),m (cid:12)(cid:12)(cid:12) , (19)MSD = 1159 (cid:88) (cid:96) =1 (cid:118)(cid:117)(cid:117)(cid:116) (cid:88) r =1 (cid:16) (cid:98) β (cid:96),m,r − (cid:98) β (cid:96),m (cid:17) , (20)MMSE = 1159 (cid:88) (cid:96) =1 (cid:88) r =1 (cid:16) (cid:98) β (cid:96),m,r − β (cid:96),m (cid:17) , (21)MCR = 1159 (cid:88) (cid:96) =1 (cid:88) r =1 (cid:16) (cid:98) β (cid:96),m,r ∈ 95% HPD interval (cid:17) , (22)In each replicate, the MCMC chain length is set to be 50,000, with thinning 10 and thefirst 2,000 samples are discarded as burn-in, therefore we have 3,000 samples for posteriorinference. The parameter D for the uniform prior of bandwidth, i.e. φ in Equation (2),is set to 100 such that the prior for bandwidth is also noninformative. In Table 1 theaverage parameter estimates (cid:98) β (cid:96),m are reported together with the four performance measuresin Equations (19),(20), (21) and (22) are reported for the three settings. Under all threesettings, the parameter estimates are highly close to the true underlying values, and havevery small MAB, MSD and MMSE, while maintaining the MCR at close to 95% level. TheRI’s are all very close to or equal to 1, indicating that the clustering results are highlyconsistent and credible. It is worth noticing that, even when the signal is relatively weak,the clustering approach is quite precise. We consider an underlying setting where there exist clustered covariate effects. First weconsider a setting where the clustered covariate effect is independent of spatial locations, i.e.where cluster belonging are set randomly. The 159 counties are randomly assigned to threeclusters, visualized in Figure 1(a). There are, respectively, 51, 49, and 59 counties in thethree clusters. Different parameter vectors are used for data generation in different clusters17able 1: Average parameter estimates and performance of parameter estimates and cluster-ing results when without clustered covariate effect. (cid:98) β MAB MSD MMSE MCR RISetting 1 β β β β -0.003 0.090 0.071 0.005 0.970 β β β β -0.001 0.090 0.072 0.005 0.980 β β β β β β -0.001 0.088 0.069 0.005 0.970 β -3.996 0.089 0.069 0.005 0.960 β -0.002 0.091 0.072 0.005 0.960 β β Longitude La t i t ude Cluster (a) Longitude La t i t ude Cluster (b) Figure 1: Visualization of (a) random cluster assignment and (b) regional cluster assignmentfor Georgia counties used for simulation studies.18able 2: True parameter vectors used in data generation for three clusters.Cluster 1 Cluster 2 Cluster 3Setting 1 (1, 0, 1, 0, 0.5, 2) (1, 0.7, 0.3, 2, 0, 3) (2, 1, 0.8, 1, 0, 1)Setting 2 (2, 0, 1, 0, 4, 2) (1, 0, 3, 2, 0, 3) (4, 1, 0, 3, 0, 1)Setting 3 (9, 0, -4, 0, 2, 5) (1, 7, 3, 6, 0, -1) (2, 0, 6, 1, 7, 0)(see Table 2) to assess the estimation and clustering performance under different strengthsof signals. The spatial random effect W is generated using the same setting as before.The performance measures are presented in Table 3. In another scenario, a setting wherethe clustered covariate effect depends on spatial locations. Consider a partition of Georgiacounties into three large regions, visualized in Figure 1(b). The same parameter vectors inTable 2 are used for the three clusters under three settings. Corresponding performancemeasures are reported in Table 4.For each signal strength and each of the two settings, we randomly selected four replicatesfrom the total of 100 replicates and visualize the results in the Online Supplement. It is nosurprise that under both settings, the accuracy of clustering increases with the strengtheningof signals. It can be seen from Supplemental Figures 1 and 4 that with weak true signals,the proposed approach suffers from over-clustering, which is a known property of Dirichletprocess mixtures that the posterior does not concentrate at the true number of clusters (see,e.g., Miller and Harrison, 2013). This over-clustering behavior, however, diminishes as thesignals’ strength increase. When the signals are strong, the RI reaches near 0.85, indicatingthat 85% of the time, two counties that belong to the same cluster are correctly put into thesame cluster. Together with increase in RI is decrease in MCR, which is an inevitable resultof incorporating more counties in each cluster. For each county, taking other counties thatdo not belong to this county’s true cluster introduces bias in estimation.19able 3: Performance of parameter estimates and clustering results under the scenario wherecluster belongings are set randomly.MAB MSD MMSE MCR RISetting 1 β β β β β β β β β β β β β β β β β β β β β β β β β β β β β β β β β β β β Real Data Analysis We consider analyzing influential factors for monthly housing cost in Georgia using the pro-posed methods. The dataset is available at , with 159observations corresponding to the 159 counties in Georgia. For each county, the dependentvariable median monthly housing cost for all occupied housing units is observed. The inde-pendent variables considered here include: the percentage of adults aged 18 to 64 who areunemployed ( X ), the average total real and personal property taxes collected per person( X ), the median home market value ( X , in thousand dollars), the percentage of White racepopulation ( X ), the median age ( X ), and size of a county’s population ( X , in thousands).Figure 2 provides a visualization of the 6 covariates on the Georgia map. In the computation,the covariates are centered and scaled to have mean 0 and unit standard deviation. Also,following the common practice in economics to account for long-tailed distributions, we takethe logarithm of monthly housing cost before fitting the model. The response variable is alsocentered and scaled, and therefore all models to follow are fitted without the intercept term.We firstly apply the model assessment criteria, LPML, for selecting the best weightingscheme for the data. The LPML values for the unity weighting scheme, the exponentialweighting scheme and the Gaussian weighting scheme are shown in Table 5. From Table 5we can see that the model with exponential weighting scheme has the largest LPML valueamong the three candidate schemes, and is therefore preferred. To verify that there isindeed spatially varying covariate effects, we also fitted the spatially-varying coefficientsmodel without clustering (4) to the dataset. The model is also compared against a vanillaBayesian regression, where no spatial effect is considered, and observations are treated asi.i.d. samples from the population. In this model, no spatial variation is assumed in thecovariate effects β , and the model reduces to the regular Bayesian linear regression.The computation is performed on a desktop computer running Windows 10 Enterprise,with i7-8700K CPU @ 3.70GHz. The computing time as well as performance measure for22 Longitude La t i t ude Value Unemployment.Rate Longitude La t i t ude Value Property.Tax Longitude La t i t ude Value House.Market.Price Longitude La t i t ude Value White.Race Longitude La t i t ude Value Age Longitude La t i t ude Value Population Figure 2: Visualizations of covariate values in counties of Georgia.Table 5: LPML values for different weighting schemes in the proposed model.Unity Exponential GaussianLPML -165.784 -165.620 -217.878these three models are recorded and presented in Table 6. The proposed model takes around650 seconds to run, followed by the spatially varying coefficients model, and then the spatiallyconstant coefficients model. The LPML values of the first two models that allow for spatiallyvarying coefficients are larger than the vanilla regression model, and the differences are notminor. This indicates that there indeed exist spatially varying covariate effect, and moreflexible models are preferred. Comparing the LPML and p D for the first two models, it canbe seen that the proposed model reduces p D and provides better fit to the data. Combiningthe conclusions from Tables 5 and 6, the proposed model with exponential weighting schemefor W ( s ) is fitted on the dataset.A total of 3 clusters are identified. The cluster belongings of the 159 counties are visu-23able 6: LPML and p D values and computing time for different models.The proposed model Spatial-varying coefficients model Vanilla regressionLPML -165.620 -174.171 -203.013 p D In this paper, we have proposed a Bayesian clustered coefficients linear regression model withspatial random effects to capture heterogeneity of regression coefficients. Multiple weightingschemes in modeling the spatial random effects have been proposed, and the corresponding24 Longitude La t i t ude Cluster Figure 3: Clusters produced by the proposed approach.Table 7: Parameter estimates and their 95% HPD intervals for the three clusters identified.Cluster 1 2 3 (cid:98) β (cid:98) β -0.091 0.043 -0.047(-0.217, 0.040) (-0.404, 0.433) (-1.19, 1.08) (cid:98) β (cid:98) β -0.078 -0.146 0.151(-0.295, 0.102) (-0.648, 0.313) (-0.910, 1.509) (cid:98) β -0.004 -0.998 -0.994(-0.205, 0.192) (-1.478, -0.578) (-1.899, 0.186) (cid:98) β References Autant-Bernard, C. and J. P. LeSage (2019). A heterogeneous coefficient approach to theknowledge production function. Spatial Economic Analysis 14 (2), 196–218.Bill´e, A. G., R. Benedetti, and P. Postiglione (2017). A two-step approach to account forunobserved spatial heterogeneity. Spatial Economic Analysis 12 (4), 452–471.Brunsdon, C., A. S. Fotheringham, and M. E. Charlton (1996). Geographically weighted26egression: a method for exploring spatial nonstationarity. Geographical Analysis 28 (4),281–298.Carlin, B. P., A. E. Gelfand, and S. Banerjee (2014). Hierarchical Modeling and Analysis forSpatial Data . Chapman and Hall/CRC.Chasco, C. and J. L. Gallo (2015). Heterogeneity in perceptions of noise and air pollution:A spatial quantile approach on the city of Madrid. Spatial Economic Analysis 10 (3),317–343.Cressie, N. (1992). Statistics for spatial data. Terra Nova 4 (5), 613–617.Dahl, D. B. (2006). Model-based clustering for expression data via a Dirichlet process mixturemodel. Bayesian Inference for Gene Expression and Proteomics 4 , 201–218.de Valpine, P., D. Turek, C. J. Paciorek, C. Anderson-Bergman, D. T. Lang, and R. Bodik(2017). Programming with models: writing statistical algorithms for general model struc-tures with NIMBLE. Journal of Computational and Graphical Statistics 26 (2), 403–413.Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. Annals ofStatistics 1 (2), 209–230.Gelfand, A. E., H.-J. Kim, C. Sirmans, and S. Banerjee (2003). Spatial modeling withspatially varying coefficient processes. Journal of the American Statistical Associa-tion 98 (462), 387–396.Gelfand, A. E. and E. M. Schliep (2016). Spatial statistics and Gaussian processes: Abeautiful marriage. Spatial Statistics 18 , 86–104.Geng, J., A. Bhattacharya, and D. Pati (2019). Probabilistic community detection with un-known number of communities. Journal of the American Statistical Association 114 (526),893–905. 27reen, P. J. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesianmodel determination. Biometrika 82 (4), 711–732.Hijmans, R. J. (2017). geosphere: Spherical Trigonometry . R package version 1.5-7.Hu, G. and J. Bradley (2018). A Bayesian spatial-temporal model with latent multivariatelog-gamma random effects with application to earthquake magnitudes. Stat 7 (1), e179.e179 sta4.179.Hu, G., J. Geng, Y. Xue, and H. Sang (2020). Bayesian spatial homogeneity pursuit offunctional data: an application to the U.S. income distribution. Arxiv . Preprint.Ibrahim, J. G., M.-H. Chen, and D. Sinha (2013). Bayesian Survival Analysis . SpringerScience & Business Media.Kulldorff, M. and N. Nagarwalla (1995). Spatial disease clusters: detection and inference. Statistics in Medicine 14 (8), 799–810.Lawson, A. B., J. Choi, and J. Zhang (2014). Prior choice in discrete latent modeling ofspatially referenced cancer survival. Statistical Methods in Medical Research 23 (2), 183–200.Li, F. and H. Sang (2019). Spatial homogeneity pursuit of regression coefficients for largedatasets. Journal of the American Statistical Association , 1–21.Li, P., S. Banerjee, T. A. Hanson, and A. M. McBean (2015). Bayesian models for detectingdifference boundaries in areal data. Statistica Sinica 25 (1), 385–402.Ma, Z., Y. Xue, and G. Hu (2019a). Geographically weighted regression analysis for spatialeconomics data: a Bayesian recourse. Technical report, University of Connecticut.Ma, Z., Y. Xue, and G. Hu (2019b). Nonparametric analysis of income distributions amongdifferent regions based on energy distance with applications to China Health and NutritionSurvey data. Communications for Statistical Applications and Methods 26 (1), 57–67.28iller, J. W. and M. T. Harrison (2013). A simple example of Dirichlet process mixtureinconsistency for the number of components. In Advances in Neural Information ProcessingSystems , pp. 199–206.Panzera, D., R. Benedetti, and P. Postiglione (2016). A Bayesian approach to parameterestimation in the presence of spatial missing data. Spatial Economic Analysis 11 (2),201–218.Rand, W. M. (1971). Objective criteria for the evaluation of clustering methods. Journal ofthe American Statistical Association 66 (336), 846–850.Salinas-P´erez, J. A., M. L. Rodero-Cosano, C. R. Garc´ıa-Alonso, and L. Salvador-Carulla(2015). Applying an evolutionary algorithm for the analysis of mental disorders in macro-urban areas: The case of Barcelona. Spatial Economic Analysis 10 (3), 270–288.Sethuraman, J. (1991). A constructive definition of Dirichlet priors. Statistica Sinica 4 (2),639–650.Xue, Y., E. D. Schifano, and G. Hu (2019). Geographically weighted Cox regression and itsapplication to prostate cancer survival data in Louisiana. Geographical Analysis . Forth-coming.Zhang, J. and A. B. Lawson (2011). Bayesian parametric accelerated failure time spatialmodel and its application to prostate cancer. Journal of Applied Statistics 38 (3), 591–603.Zhao, P., H.-C. Yang, D. K. Dey, and G. Hu (2020). Bayesian spatial homogeneity pursuitregression for count value data.