Bayesian Hierarchical Spatial Regression Models for Spatial Data in the Presence of Missing Covariates with Applications
BBayesian Hierarchical Spatial Regression Models forSpatial Data in the Presence of Missing Covariates withApplications
Zhihua Ma Guanyu Hu Ming-Hui Chen
Abstract
In many applications, survey data are collected from different survey centers in differentregions. It happens that in some circumstances, response variables are completely observedwhile the covariates have missing values. In this paper, we propose a joint spatial regressionmodel for the response variable and missing covariates via a sequence of one-dimensionalconditional spatial regression models. We further construct a joint spatial model for missingcovariate data mechanisms. The properties of the proposed models are examined and a Markovchain Monte Carlo sampling algorithm is used to sample from the posterior distribution. Inaddition, the Bayesian model comparison criteria, the modified Deviance Information Crite-rion (mDIC) and the modified Logarithm of the Pseudo-Marginal Likelihood (mLPML), aredeveloped to assess the fit of spatial regression models for spatial data. Extensive simulationstudies are carried out to examine empirical performance of the proposed methods. We furtherapply the proposed methodology to analyze a real data set from a Chinese Health and NutritionSurvey (CHNS) conducted in 2011.
Keywords : CHNS 2011, Gaussian Spatial Process Model, Household Income, Spatial MissingCovariates
Household income is a very important measurement of the development of one region’s economy.It is of great practical interest to examine the effects of covariates on the household income. Sincehousehold income data are always collected from different survey centers in different regions, thereare two challenges when analyzing household income data. For geographically distributed data, itis not desirable to fit a traditional regression model because the traditional regression model doesnot account for the spatial dependence among different regions. As a result, the first challenge for1 a r X i v : . [ s t a t . M E ] J u l patially dependent data such as household incomes from different regions is to build a suitableregression model. From Banerjee et al. and Cressie , there are different approaches for modellingspatially dependent data, such as the conditional autoregressive model (CAR), the simultaneousautoregressive model (SAR), and the linear regression model with spatial random effects. For theareal data, CAR and SAR are two widely used models. The study region is partitioned into a finitenumber of areal units with well-defined boundaries . The spatial correlation structure depends onadjacency matrix of subareas. The CAR model is appropriate for situations with the first orderdependency or a relatively local spatial autocorrelation, which assumes that a particular area isinfluenced by its neighbors. However, the SAR model is more suitable where there is the secondorder dependency or a more global spatial autocorrelation. The locations of the point reference datavary continuously over the study region. The spatial correlation structure depends on the distancebetween the locations. The most popular model for point reference data is the regression model withGaussian spatial random effects . Another challenge for analyzing such kind of data is that thereexist some missing covariates. Household income data are collected from surveys, so it is commonfor us to get incomplete data for some covariates. There is rich literature on building regressionmodels with missing covariates. Zhao et al. used estimating equations for regression analysis in thepresence of missing observations on one covariate. Ibrahim et al. proposed methods for Bayesianinference of regression models with missing covariates. However, no existing literature dealswith spatial data and missing covariates simultaneously. Seshadri proposed a spatial averagingapproach for modelling spatial response data only. Bae et al. , Xue et al. and Collins et al. alsoproposed some approaches for dealing with spatial missing data. However, they did not considermissing data model in their approaches. Besides, spatial random effects are not commonly usedin missing covariate models to take account of spatial effects. Recently, Grantham et al. built ajoint hierarchical model for PM 2.5 and aerosol optical depth (AOD). To deal with missingness ofAOD in spatial regression model, they assume informative missingness of AOD and build spatialregression model for AOD to interpolate AOD.In this paper, we develop a Bayesian spatial regression model to deal with the spatially de-pendent data with missing covariates using the idea from Ibrahim et al. . We assume that themissing covariates are spatially dependent and build hierarchical spatial regression models for boththe response variable and missing covariates. Furthermore, we propose the modified DevianceInformation Criterion (mDIC) and the modified Logarithm of the Pseudo-Marginal Likelihood(mLPML). One of the main focus of this paper is on the examination of the impact of spatialeffects in the missing covariates models on the spatial response model. Our proposed mDIC andmLPML criteria allow us to assess the fit of the spatial response data under covariates models withor without spatial effects. We further conduct extensive simulation studies to examine the empiricalperformance of the proposed criteria. Such investigation and assessment have not been carried out2n the literature based on our best knowledge.The remainder of this paper is organized as follows. In Section 2, the data from Chinese Healthand Nutrition Survey (CHNS) 2011 are introduced as a motivating example. In Section 3, wedevelop the spatial regression model for the response variable, the model for missing covariateswith spatial random effects, and the model for the missing data mechanism. Furthermore, Bayesianmodel assessment criteria including mDIC and mLPML are used for model comparison. Anextensive simulation study is conducted in Section 4 to investigate empirical performance of themodels proposed in Section 3. In Section 5, the proposed method is employed to analyze the realdata set of CHNS 2011. Finally, we conclude the paper with a brief discussion in Section 6. Chinese Health and Nutrition Survey (CHNS), a project collaborated by the Carolina PopulationCenter at the University of North Carolina and the National Institute for Nutrition and Health at theChinese Center for Disease Control and Prevention, aims to examine the relationship between thesocial and economic transformation of Chinese society and the health and nutritional status of itspopulation. As a geographically distributed data set, CHNS 2011 collected individual-, household-and community-specific information from 12 provinces in China. In this paper, household incomefrom 12 provinces is selected as the spatial response variable, and the aim is to explore the spatialeffects and the factors that may have impacts on this variable of interest.
The data were collected from 12 provinces in China with a total sample size of 4346. Householdincome ( hincome ) is the response variable. Individual-level covariates include wage of head of thehousehold ( indwage ), age of head of the household ( age ), proportion of urban area ( urban ), numberof hours worked last year (
WThour ), family size ( hhsize ) and GDP per capita of the province (
GDP ).The units of hincome , indwage and GDP are CNY.The sample sizes in different provinces as well as the summary information of the variables areshown in Table 1.Among these covariates, indwage and
WThour have missing values. The average percentageswith only indwage or WThour missing are 22.50% and 5.34%, respectively, while the averagepercentage with both indwage and
WThour missing is 22.30%. A summary of the missing patternsof these two covariates are given in Table 2. 3able 1: Sample size and summary information of the variables in each provinceBeijing Liaoning Heilongjiang Shanghai Jiangsu ShandongSample size 415 395 396 424 412 399 hincome mean 75599.23 49426.97 46861.01 87455.34 61393.95 40999.05sd 49926.75 47862.42 44386.13 68695.15 43495.38 40926.51 indwage mean 41029.76 25021.40 29590.38 41829.25 20894.34 20769.75sd 41730.44 25584.54 40866.63 45704.46 20348.74 24941.24 age mean 49.30 56.40 51.15 56.28 59.31 56.01sd 13.13 11.88 11.42 11.68 11.83 11.38 urban proportion 0.86 0.30 0.37 0.83 0.33 0.29sd 0.34 0.46 0.48 0.38 0.47 0.46
WThour mean 44.21 38.15 27.78 41.43 35.86 43.09sd 12.22 24.77 24.50 8.77 22.49 18.10 hhsize mean 2.80 2.85 2.62 3.20 3.22 3.01sd 0.84 1.16 1.01 1.10 1.51 1.33Henan Hubei Hunan Guangxi Guizhou ChongqingSample size 298 337 244 360 339 327 hincome mean 36782.92 50417.49 48163.62 37022.83 45388.52 41770.31sd 42655.65 57341.40 43458.89 33374.35 52696.84 39894.29 indwage mean 16022.50 21769.90 27191.26 12122.19 22694.39 25977.72sd 24142.06 28241.78 29676.95 14334.89 39639.95 34609.48 age mean 53.96 54.67 53.39 55.27 56.21 52.48sd 12.18 10.38 12.44 12.43 12.58 11.52 urban proportion 0.37 0.33 0.41 0.29 0.33 0.54sd 0.48 0.47 0.49 0.45 0.47 0.50
WThour mean 37.12 38.23 39.31 36.32 32.07 38.85sd 23.11 18.53 17.01 21.36 18.95 18.97 hhsize mean 3.67 3.27 3.25 4.14 3.43 3.20sd 1.49 1.48 1.35 1.76 1.43 1.114able 2: Missing percentages in each provinceBeijing Liaoning Heilongjiang Shanghai Jiangsu Shandongmissing indwage only 1.20% 25.57% 41.92% 0.94% 18.20% 21.55%missing
WThour only 4.82% 3.04% 4.55% 3.54% 7.52% 8.77%missing indwage and
WThour indwage only 22.48% 32.94% 23.77% 28.33% 31.86% 29.05%missing
WThour only 5.37% 5.93% 4.10% 6.94% 2.95% 6.12%missing indwage and
WThour
In the CHNS 2011 data set, we do not have survey data for all the provinces in China. Also, theprovinces included in this data set are not always neighbored with each other. Thus, we treat theCHNS 2011 data as point-referenced data such that the spatial dependence can be possibly andreasonably captured by the distance between two provinces especially when they are away fromeach other. The centroid latitudes and longitudes of these 12 provinces are given in Table 3. Figure1 shows the map of mainland China. The provinces which are included in our study are marked inblue color.Using the coordinates of 12 provinces, we can easily calculate the distance between twoprovinces. These distances are useful to construct covariance matrices of the spatial random effectsin Section 5 below.
In this section, a spatial regression model with missing covariates is built hierarchically. AGaussian spatial regression model for the response variable is built, after which, missing covariate5 nhuiBeijingChongqing FujianGansu GuangdongGuangxi GuizhouHainan HebeiHebei HeilongjiangHenanHubeiHunanInner Mongolia JiangsuJiangxi JilinLiaoningNingxiaQinghai Shaanxi ShandongShanghaiShanxiSichuan TaiwanTianjin CityTibetXinjiang Yunnan Zhejiang long
Figure 1: China Map (Blue indicates the province which are included in the study)distributions are built to take account of the missing covariates and covariate-specific spatial effects.In addition, a model capturing the missing data mechanism is also built. After introducing themodel construction, posterior inference procedure and model assessment are presented.
Suppose, we consider S locations and N s observations at location s ( s = , · · · , S ) . The spatialresponse variable at location s is denoted by Y ( s ) = ( Y ( s ) , · · · , Y N s ( s )) (cid:48) . A Gaussian stationaryspatial process model is built for the spatial response variable. The general Gaussian stationaryspatial process model can be written as in, for example, Cressie : Y ( s ) = X ( s ) (cid:48) β + σ y W y ( s ) N s + (cid:15) ( s ) , (1)where X ( s ) = ( N s , X ( s ) , . . . , X p ( s )) (cid:48) is a ( p + ) × N s matrix, p is the number of covariates, N s isthe N s -dimensional vector with s, X k ( s ) = ( X k ( s ) , · · · , X kN s ( s )) (cid:48) is an N s -dimensional vector ofcovariates, and β = ( β , β , · · · , β p ) (cid:48) is a ( p + ) dimensional vector of corresponding regression6oefficients. The spatial random effect W y ( s ) is a second-order stationary mean-zero process. Tobe more specific, W y ( s ) conforms that E ( W y ( s )) = , Var ( W y ( s )) = , and Cov ( W y ( s ) , W y ( s (cid:48) )) = ρ ( s , s (cid:48) ) , where ρ (·) is a valid two-dimensional correlation function. (cid:15) ( s ) is the white noise processsuch that (cid:15) ( s ) ∼ MVN ( N s , τ − y I N s ) , where “MVN” represents the multivariate normal distribution, I N s is the N s × N s identity matrix, and Cov ( (cid:15) ( s ) , (cid:15) ( s (cid:48) )) = for s (cid:44) s (cid:48) . According to (1), the followingspatial hierarchical model is built: Y ( s )| W y ( s ) , X ( s ) , β , σ y , τ y ∼ MVN ( X ( s ) (cid:48) β + σ y W y ( s ) N s , τ − y I N s ) , s = , , . . . , S , (2) W y | λ y ∼ MVN ( , H ( λ y )) , (3)where W y = ( W y ( ) , · · · , W y ( S )) (cid:48) is the response-specific spatial random effect, H ( λ y ) is a spatialcorrelation matrix based on distance and parameter λ y . For the exponential spatial correlationkernel, the ( s , s (cid:48) ) th entry of the correlation matrix is exp (− λ y d ss (cid:48) ) , where d ss (cid:48) is the Euclidiandistance between location s and location s (cid:48) , and λ y is the range parameter for spatial correlation. Asmall value of λ y means a strong spatial correlation, and a large value of λ y means a weak spatialcorrelation. For survey data, it is common that the data for some covariates are not completely observed. Forexample, X k ( s ) is the k th covariate at location s and has N s observations. If there are any missingvalues among those N s observations, i.e. if any one of the elements of ( X k ( s ) , X k ( s ) , · · · , X kN s ( s )) is missing, X k is defined as a missing covariate at location s . For the CHNS 2011 dataset discussedin Section 2, two missing covariates exist at all locations. Therefore, in this section, we assumethat for all locations, among the p covariates, the first q ( q ≤ p ) covariates are missing covariates.In the presence of missing covariates, a joint model for the missing covariates should bespecified to take account of the uncertainty resulting from the missing values in the covariates. Tobe specific, for the i th observation at location s , the corresponding q -dimensional missing covariatevector is X misi ( s ) = ( X i ( s ) , · · · , X qi ( s )) (cid:48) , while the ( p − q ) -dimensional complete covariate vectoris denoted by X obsi ( s ) = ( X q + , i ( s ) , · · · , X pi ( s )) (cid:48) . For missing covariate data, it is crucial to specifya model for the missing covariates X misi ( s ) . Given the spatial random effects, we assume X misi ( s ) , i = , , . . . , N s , are conditionally independent. In general settings, Lipsitz and Ibrahim andIbrahim et al. specified the missing covariate distribution through a series of one-dimensionalconditional distributions. In our case, since the covariates are also spatially distributed, covariate-specific spatial effects are also taken into account in the missing covariate model. We extend theirmodel as 7 ( X i ( s ) , · · · , X qi ( s )| X obsi ( s ) , W x ( s ) , α , σ x , τ x ) = f ( X qi ( s )| X i ( s ) , · · · , X q − , i ( s ) , X obsi ( s ) , W x q ( s ) , α q , σ x q , τ x q )× f ( X q − , i ( s )| X i ( s ) , · · · , X q − , i ( s ) , X obsi ( s ) , W x q − ( s ) , α q − , σ x q − , τ x q − )× · · · × f ( X i ( s )| X obsi ( s ) , W x ( s ) , α , σ x , τ x ) , (4)where W x ( s ) = ( W x ( s ) , · · · , W x q ( s )) (cid:48) , W x (cid:96) ( s ) represents the spatial effect of covariate X (cid:96), i ( (cid:96) = , · · · , q ) at location s , σ x = ( σ x , · · · , σ x q ) (cid:48) is a vector of the standard deviations of the spatiallystructured random errors of the covariates, τ x = ( τ x , · · · , τ x q ) (cid:48) is a vector of the precisions of theindependent random errors of the covariates, and the coefficients associated to the covariates are α = ( α , · · · , α q ) (cid:48) with α (cid:96) being the indexing parameter vector for the (cid:96) th conditional distribution.For the covariate-specific spatial random effects W x (cid:96) = ( W x (cid:96) ( ) , · · · , W x (cid:96) ( S )) (cid:48) , a multivariate normaldistribution similar to (3) can be assumed. As in Grund et al. , here we assume the spatial randomeffects, W x (cid:96) ’s, of the missing covariates and W y ( s ) of the response variable are independent. Thisassumption is reasonable since W x (cid:96) captures spatial dependence of the covariate x (cid:96) ( s ) , W y ( s ) captures spatial dependence of the response variable, and the dependence between the responsevariable and the covariates is induced by the spatial regression model in (3).There are many possibilities in ( ), especially when q is large. Chen and Ibrahim gavesome guidelines for specifying the sequence of one-dimensional conditional distributions. Whenthe missing covariates are categorical, logistic regression for the conditional missing covariatedistribution can be specified. Probit or complementary log-log links are also suitable to modelcategorical covariates. Ordinal regression models can be employed to model missing ordinalcovariates. For count variables, we can model them via Poisson regression. And for continuousvariables, normal regression, log-normal regression, and exponential regression can be considered.In our extended model, covariate-specific spatial effects are considered in the model additionally.Conditional on the spatial effects, missing covariates can be modeled according to the above strategy.And for the spatial effects, the same stationary process structure in (3) can be used. In the motivatingexample, there are q = missing continuous covariates, and the spatial regression model for thesetwo missing covariates can be written, for i = , , · · · , N s and s = , , · · · , S , as X i ( s )| X i ( s ) , X obsi ( s ) , W x ( s ) , α , σ x , τ x ∼ N ( X (− ) i ( s ) (cid:48) α + σ x W x ( s ) , τ − x ) , X i ( s )| X obsi ( s ) , W x ( s ) , α , σ x , τ x ∼ N ( X obsi ( s ) (cid:48) α + σ x W x ( s ) , τ − x ) , W x | λ x ∼ MVN ( , H ( λ x )) , W x | λ x ∼ MVN ( , H ( λ x )) , where X (− ) i ( s ) = ( X i ( s ) , ( X obsi ( s )) (cid:48) ) (cid:48) denotes a vector of the other covariates except X i ( s ) , α and α are the indexing parameter vectors for the distributions of X i ( s ) and X i ( s ) , respectively, τ x τ x are the precision parameters of X i ( s ) and X i ( s ) , σ x and σ x are the standard deviationsof W x and W x , and λ x and λ x are the corresponding range parameters for spatial correlations of W x and W x , which are different than λ y defined in (3). Assuming a corresponding missing indicator for each missing covariate, for observation i we havethe q -dimensional missing indicator vector R i ( s ) = ( R i ( s ) , · · · , R qi ( s )) (cid:48) with R (cid:96) i ( s ) = if X (cid:96) i ( s ) isobserved and R (cid:96) i ( s ) = if X (cid:96) i ( s ) is missing ( (cid:96) = , · · · , q ). The joint distribution of R (cid:96) i ( s ) can alsobe written as the form of a product of one-dimensional conditional distributions, that is f ( R i ( s ) , · · · , R qi ( s )| X i ( s ) , Y i ( s ) , φ ) = f ( R qi ( s )| R i ( s ) , · · · , R q − , i ( s ) , X i ( s ) , Y i ( s ) , φ q )× · · · × f ( R i ( s )| X i ( s ) , Y i ( s ) , φ ) (5)for i = , , · · · , N s and s = , , · · · , S , where φ = ( φ , · · · , φ q ) (cid:48) parameterizes the missingnessmechanism model with φ (cid:96) as a vector of indexing parameters for the (cid:96) th conditional distribution.For each one-dimensional conditional distributions of these binary missing indicators, it is commonto build a logistic regression model for each of them.In the missing data literature, missing data mechanism can be categorized as missing completelyat random (MCAR), missing at random (MAR) or missing not at random (MNAR) . Whenmissingness does not depend on the covariates that are missing or observed, then the missing datamechanism is termed as MCAR. When missingness depends only on the observed covariates butnot on the missing ones, the missing data mechanism is MAR. When neither MCAR nor MARholds, the missing data mechanism is termed as MNAR.For simplicity, in our case we assume that the missing data mechanism is MAR, which meansthat the missing data does not depend on the missing covariates. For q = missing covariates, thejoint distribution of the missing indicators is written as f ( R i ( s ) , R i ( s )| X obsi ( s ) , Y i ( s ) , φ ) = f ( R i ( s )| R i ( s ) , X obsi ( s ) , Y i ( s ) , φ ) × f ( R i ( s )| X obsi ( s ) , Y i ( s ) , φ ) , R i ( s )| R i ( s ) , X obsi ( s ) , Y i ( s ) , φ ∼ Bernoulli ( p i ( s )) , R i ( s )| X obsi ( s ) , Y i ( s ) , φ ∼ Bernoulli ( p i ( s )) , logit ( p i ( s )) = log ( p i ( s )/( − p i ( s ))) = ( X obsi ( s ) (cid:48) , Y i ( s )) (cid:48) φ , logit ( p i ( s )) = ( X obsi ( s ) (cid:48) , Y i ( s )) (cid:48) φ . For the unknown parameters θ = { β , σ y , τ y , λ y , α , σ x , τ x , λ x , φ } , where λ x = { λ x (cid:96) } q (cid:96) = , we as-sume that they are independent a priori. For (cid:96) = , · · · , q , the following prior distributions9re assigned: β k ∼ N ( , ψ − β k ) , for k = , · · · , p ; τ − y ∼ IG ( a y , b y ) ; σ y ∼ half-Normal ( , ψ − σ y ) ; λ y ∼ log-Normal ( , ψ − λ y ) ; α (cid:96) k ∼ N ( , ψ − α (cid:96) k ) , for k = , · · · , m (cid:96) ; σ x (cid:96) ∼ half-Normal ( , ψ − σ x (cid:96) ) ; τ − x (cid:96) ∼ IG ( a x (cid:96) , b x (cid:96) ) ; λ x (cid:96) ∼ log-Normal ( , ψ − λ x (cid:96) ) ; and φ (cid:96) k ∼ N ( , ψ − φ (cid:96) k ) , for k = , · · · , m (cid:48) (cid:96) , where m (cid:96) and m (cid:48) (cid:96) are the dimensions of covariates in the missing covariate model and missing data mechanismmodel of the (cid:96) th missing covariate, respectively.Note that ψ β k , a y , b y , ψ σ y , ψ λ y , ψ α (cid:96) k , ψ σ x (cid:96) , a x (cid:96) , b x (cid:96) , ψ λ x (cid:96) , and ψ φ (cid:96) k are prespecified hyperparame-ters. In this article, we use ψ β k = ψ α (cid:96) k = ψ σ y = ψ σ x (cid:96) = ψ φ (cid:96) k = . , a y = b y = a x (cid:96) = b x (cid:96) = . , and ψ λ y = ψ λ x (cid:96) = , which lead to non-informative priors. With the above prior distributions, theposterior distribution of these unknown parameters based on the observed data D obs = { Y , X obs } with X obs = { X obs ( s )} Ss = is given by π ( θ | D obs ) ∝ L ( θ | D obs ) π ( θ )∝ (cid:20) ∫ S (cid:214) s = (cid:16) f ( Y ( s )| W y ( s ) , X ( s ) , β , σ y , τ y ) N s (cid:214) i = ∫ f ( X misi ( s )| X obsi ( s ) , W x ( s ) , α , σ x , τ x ) f ( R i ( s )| X i ( s ) , Y i ( s ) , φ ) d X misi ( s ) (cid:17) × f ( W y | λ y ) f ( W x | λ x ) d W y d W x (cid:21) π ( θ ) , (6)where f ( Y ( s )| W y ( s ) , X ( s ) , β , σ y , τ y ) refers to the spatial regression model for the response variablein (2), f ( X misi ( s )| X obsi ( s ) , W x ( s ) , α , σ x , τ x ) is defined in (4), and f ( R i ( s )| X i ( s ) , Y i ( s ) , φ ) is definedin (5) . In equation (6), f ( W y | λ y ) and f ( W x | λ x ) = (cid:206) q (cid:96) = f ( W x (cid:96) | λ x (cid:96) ) refer to the distributions of W y and W x (cid:96) ’s, respectively, d W x = (cid:206) q (cid:96) = d W x (cid:96) , and π ( θ ) denotes the joint prior distribution ofthe unknown parameters. When a MAR missing data mechanism is assumed, the model for themissing data mechanism does not need to enter the posterior distribution.The analytical form of the posterior distribution of θ is unavailable. Therefore, we carry out theposterior inference using the Markov chain Monte Carlo (MCMC) sampling algorithm to samplefrom the posterior distribution. Instead of sampling from the posterior distributions of the unknownparameters directly, MCMC samples from the full conditional distributions of the parameters withthe remaining variables fixed to their current values are obtained. In this way, we can conductinferences of the proposed model. In our case, spatial random effects are also regarded as unknownparameters, and then the algorithm samples these parameters in turn from their corresponding fullconditional distributions. 10 .5 Model Assessment Within the Bayesian framework, the Deviance Information Criterion (DIC) and the Logarithmof the Pseudo-Marginal Likelihood (LPML) are two well-known Bayesian criteria for modelcomparison.Since our main objective is to assess the fit of the spatial regression model for the response, wespecify the following deviance function:Dev ( W y , X , β , σ y , τ y ) = − S (cid:213) s = log f ( Y ( s )| W y ( s ) , X ( s ) , β , σ y , τ y ) = S (cid:213) s = (cid:110) log ( π ) + N s log ( τ y ) + ( Y ( s ) − X (cid:48) ( s ) β − σ y W y ( s ) N s ) (cid:48) | τ y I N s | − ( Y ( s ) − X ( s ) (cid:48) β − σ y W y ( s ) N s ) (cid:111) . (7)Therefore, we define a modified DIC (mDIC) for the response model as follows:mDIC = E [ Dev ( W y , X , β , σ y , τ y )] − Dev ( ˆ W y , ˆ X , ˆ β , ˆ σ y , ˆ τ y ) , (8)where ˆ W y , ˆ X , ˆ β , ˆ σ y , and ˆ τ y are the posterior means of parameters and missing covariates. A smallervalue of mDIC indicates a better model.Let D (− i ) ( s ) = { Y j ( s ) : j = , · · · , i − , i + , · · · , N s , s = , , . . . , S } denote the observationdata with the i th subject response deleted. Following Hanson et al. , we consider a modifiedConditional Predictive Ordinate (mCPO) for the i th subject asmCPO i ( s ) = ∫ f ( Y i ( s )| W y ( s ) , X ( s ) , β , σ y , τ y )× π ( W y , X , β , σ y , τ y | D (− i ) ) d ( W y , X , β , σ y , τ y ) , (9)where π ( W y , X , β , σ y , λ y , τ y | D (− i ) ( s )) = (cid:206) Ss = (cid:206) j (cid:44) i f ( Y j ( s )| W y ( s ) , X ( s ) , β ,σ y ,τ y ) π ( W y , X , β ,σ y ,τ y ) c ( D (− i ) ( s )) and c ( D (− i ) ( s )) denotes the normalizing constant. In practice, a Monte Carlo estimate of mCPO using MCMCalgorithms from the posterior distributions can be used. To be specific, letting W y t ( s ) , X t ( s ) , β t , σ y t ,and τ y t ( t = , · · · , T ) denote a MCMC sample of unknown parameters and missing covariates fromthe corresponding augmented posterior distribution, a Monte Carlo estimate of mCPO − i is givenby (cid:92) mCPO i ( s ) − = T T (cid:213) t = f ( Y i ( s )| W y t ( s ) , X t ( s ) , β t , σ y t , τ y t ) . (10)Then mLPML is given by (cid:92) mLPML = S (cid:213) s = N s (cid:213) i = log ( (cid:92) mCPO i ( s )) . (11)Similar to the conventional LPML, a larger value of mLPML indicates a more favorable model.11 A Simulation Study
In this simulation study, we randomly generated 20 locations in a space of [ , ] × [ , ] . Foreach location, we generated 50 observations based on Y i ( s ) = β + β X i ( s ) + β X i ( s ) + β X i ( s ) + σ y W y ( s ) + (cid:15) i ( s ) , where s = , . . . , , i = , · · · , , (cid:15) i ( s ) is i.i.d. generated from N ( , ) and W y ∼ MVN ( , H ( λ y )) .Covariate X i ( s ) is independently generated from N ( , ) , X i ( s ) is generated from N ( X i ( s ) + σ x W x ( s ) , ) , and X i ( s ) is generated from N ( X i ( s ) + σ x W x ( s ) , ) , where W x ∼ MVN ( , H ( λ x )) , W x ∼ MVN ( , H ( λ x )) , σ y = √ , σ x = , and σ x = √ . . For both spatial random effects, the ( s , s (cid:48) ) th entry of H (·) is exp (− d ss (cid:48) / λ ) , where d ss (cid:48) is the distance between s and s (cid:48) , λ y = , λ x = ,and λ x = .Missing data for ( X ( s ) , X ( s )) are generated with a missing data mechanism that does notdepend on ( X ( s ) , X ( s )) , leading to the missing data to be MAR. As a result, the missing datamechanism can be ignored when estimating the parameters. Specifically, let R (cid:96) i ( s ) = if X (cid:96) i ( s ) isobserved and R (cid:96) i ( s ) = if X (cid:96) i ( s ) is missing ( (cid:96) = , ) . The joint distribution of ( R ( s ) , R ( s )) isgiven by f ( R ( s ) , R ( s )| φ ) = f ( R ( s )| R ( s ) , φ ) f ( R ( s )| φ ) , (12)where φ = ( φ , φ ) , φ and φ are the vectors of parameters corresponding to the distributionsof R ( s ) and R ( s ) , respectively. We take logistic regression models for f ( R ( s )| R ( s ) , φ ) and f ( R ( s )| φ ) . Thus, f ( R i ( s ) = | R i ( s ) , φ , X i ( s ) , Y i ( s )) = exp ( φ + φ X i ( s ) + φ Y i ( s ) + φ R i ( s )) + exp ( φ + φ X i ( s ) + φ Y i ( s ) + φ R i ( s )) , (13)and f ( R i ( s ) = | φ , X i ( s ) , Y i ( s )) = exp ( φ + φ X i ( s ) + φ Y i ( s )) + exp ( φ + φ X i ( s ) + φ Y i ( s )) . (14)In (13) and (14), φ = ( φ , φ , φ ) (cid:48) and φ = ( φ , φ , φ , φ ) (cid:48) . One hunderd simulateddatasets were generated in this study. The average percentages over the 100 simulated datasets withonly X ( s ) missing or only X ( s ) missing are 32.82% and 39.27% respectively, while the averagepercentage with both X ( s ) and X ( s ) missing is 28.72%.12 .2 Simulation Results According to Section 3, we set up the following model M and fix the parameters related to W y , W x , W x to their true values. The spatial regression model for the response variable is given as Y i ( s ) ∼ N ( µ y i ( s ) , τ − y ) , µ y i ( s ) = β + β X i ( s ) + β X i ( s ) + β X i ( s ) + σ y W y ( s ) . The models for the two missing covariates are given as X i ( s ) ∼ N ( µ x i ( s ) , τ − x ) , µ x i ( s ) = α + α X i ( s ) + α X i ( s ) + σ x W x ( s ) , X i ( s ) ∼ N ( µ x i ( s ) , τ − x ) , µ x i ( s ) = α + α X i ( s ) + σ x W x ( s ) . The true values of the model parameters are shown in Table 4. In order to examine empiricalperformance of the posterior estimates, several assessment measures including average bias (Bias),average standard deviations (SD), mean square error (MSE) and coverage probability (CP) for eachparameter are computed. Taking β as an example, these measures are given asBias = T T (cid:213) t = ( ˆ β t − β ) , SD = T T (cid:213) t = sd ( β t ) , MSE = T T (cid:213) t = ( ˆ β t − β ) , CP = T T (cid:213) t = ( β ∈ HPD ( β t )) , where β is the true value of β and T is the total number of simulated datasets while ˆ β t isthe posterior mean of β . sd ( β t ) is the estimated standard deviation of β , and HPD ( β t ) is theestimated 95% highest probability density (HPD) interval of β computed from the t th simulateddataset for t = , · · · , T . Bayesian estimates are obtained via JAGS and R . With the thinninginterval to be 20, 5,000 samples are kept for calculation after a burn-in of 10,000 samples. Theresults of these measures with all records, CC analysis, and model M proposed above are shownin Table 4. The difference between “all records", “CC" and “ M ” is on the datasets used to fit theproposed model. “All records" means using the whole dataset before generating the missing ones,“CC" means using the datasets excluding the missing records, and “ M ” means using the datasetswith missing values.From Table 4, we can observe that the biases of the posterior estimates under CC are muchgreater than those under model M . The 95% HPD intervals under M are larger than those underCC. Thus, M is more preferred than the CC analysis.In order to assess the performance of the model comparison criteria proposed in Section 3.5,we set up several alternative models with the same response model as M but with different missing13able 4: Simulation results of assessment measures with all records, CC, and model M Truevalue All records CCBias SD MSE CP Bias SD MSE CP β β β β τ y α α α τ x α α τ x M value Bias SD MSE CP β β β β τ y α α α τ x α α τ x M , M , M , and M M M M M mDIC 3151.82 3205.51 3182.25 3171.66mLPML -1672.62 -1709.00 -1695.38 -1683.85covariate models as follows: M : X i ( s ) ∼ N ( µ x i ( s ) , τ − x ) , µ x i ( s ) = α + α X i ( s ) + α X i ( s ) , X i ( s ) ∼ N ( µ x i ( s ) , τ − x ) , µ x i ( s ) = α + α X i ( s ) ; M : X i ( s ) ∼ N ( µ x i ( s ) , τ − x ) , µ x i ( s ) = α + α X i ( s ) + α X i ( s ) , X i ( s ) ∼ N ( µ x i ( s ) , τ − x ) , µ x i ( s ) = α + α X i ( s ) + σ x W x ( s ) ; M : X i ( s ) ∼ N ( µ x i ( s ) , τ − x ) , µ x i ( s ) = α + α X i ( s ) + α X i ( s ) + σ x W x ( s ) , X i ( s ) ∼ N ( µ x i ( s ) , τ − x ) , µ x i ( s ) = α + α X i ( s ) . The averages of mDIC and mLPML under these models are shown in Table 5. Boxplots of thedifferences of the mDICs and mLPMLs between each of the missing covariate models M , M ,and M and model M are shown in Figure 2. The boxplots of mDIC and mLPML values for eachmodel are shown in Figure S1 in the supplementary materials.Comparing the mDICs and mLPMLs in Table 5, we can see that model M is the best modelcompared to the other models since it has the smallest mDIC and the largest mLPML, indicatingthat these model comparison criteria perform well in choosing the best model. The simulationresults of posterior estimates of parameters in the spatial response model with missing covariatemodels M , M , and M are shown in Table S1 in the supplementary materials.Furthermore, we consider four more estimation models. Let M ∗ , M ∗ , M ∗ , M ∗ denote models of M , M , M , M with unknown parameters λ y and σ y . For these two parameters, prior distributions λ y ∼ log-Normal ( , ) and σ y ∼ half-Normal ( , . − ) were specified. Model comparison resultsof these four models are presented in Table 6 and Figure 3. The boxplots of mDIC and mLPMLvalues for each model are shown in Figure S2 in the supplementary materials.Similarly, M ∗ is the best model chosen by mDIC and mLPML. The results of Bias, SD, MSEand CP for models with all records, CC analysis, and M ∗ are shown in Table 7. From Table 7,similar conclusions can be obtained as Table 4. Estimates in CC analysis are biased while CPunder model M ∗ are generally larger than that of CC. The simulation results of posterior estimatesof parameters in the spatial response model with missing covariate models M ∗ , M ∗ , and M ∗ are15igure 2: Difference of mDICs and mLPMLs compared to M Table 6: The averages of mDIC and mLPML under M ∗ , M ∗ , M ∗ , and M ∗ M ∗ M ∗ M ∗ M ∗ mDIC 3203.28 3271.84 3237.75 3231.42mLPML -1710.99 -1754.99 -1735.15 -1727.05Figure 3: Difference of mDICs and mLPMLs compared to M ∗ shown in Table S2 in the supplementary materials.16able 7: Simulation results of assessment measures with all records, CC, and model M ∗ Truevalue All records* CC*Bias SD MSE CP Bias SD MSE CP β β β β σ y ( λ y ) τ y α α α σ x ( λ x ) τ x α α σ x ( λ x ) τ x M ∗ value Bias SD MSE CP β β β β σ y ( λ y ) τ y α α α σ x ( λ x ) τ x α α σ x ( λ x ) τ x Application to Spatial Health and Nutrition Survey Data
In this section, the proposed Bayesian hierarchical spatial model and model comparison criteria areapplied to analyze the CHNS 2011 survey data described in Section 2.
For the spatial positive continuous response variable hincome , the spatial regression model pro-posed in Section 3.1 are built for its logarithm form. Covariate vector X involves five individualcovariates including log( indwage ), age , urban , log( WThour ) and hhsize and a province-level co-variate
GDP . The ( s , s (cid:48) ) th entry of H ( λ y ) is exp (− d ss (cid:48) / λ y ) , where d ss (cid:48) is the distance betweenlocation s and location s (cid:48) .For the individual-level covariates, two of them, indwage and WThour , are missing. In orderto take account of different spatial structures in the missing covariates, we consider four differentmissing covariate models in our study.Denote log ( hincome ) as Y , log ( WT hour ) as X , log ( ind w a g e ) as X , GDP as X , a g e as X , ur ban as X , and hhsize as X . We first consider the following model M real for the data. Thespatial regression model for the response variable is: Y i ( s ) = β + β X i ( s ) + β X i ( s ) + β X i ( s ) + β X i ( s ) + β X i ( s ) + β X i ( s ) + σ y W y ( s ) + (cid:15) y i ( s ) ,(cid:15) y i ( s ) ∼ N ( , τ − y ) , W y ∼ MVN ( , H ( λ y )) . The missing covariate model is: X i ( s ) = α + α X i ( s ) + α X i ( s ) + α X i ( s ) + α X i ( s ) + σ x W x ( s ) + (cid:15) x i ( s ) ,(cid:15) x i ( s ) ∼ N ( , τ − x ) , W x ∼ MVN ( , H ( λ x )) ; X i ( s ) = α + α X i ( s ) + α X i ( s ) + α X i ( s ) + α X i ( s ) + σ x W x ( s ) + (cid:15) x i ( s ) ,(cid:15) x i ( s ) ∼ N ( , τ − x ) , W x ∼ MVN ( , H ( λ x )) . We also consider another three alternative models with the same response model as M real but withdifferent missing covariate distributions as follows: M real : X i ( s ) = α + α X i ( s ) + α X i ( s ) + α X i ( s ) + α X i ( s ) + (cid:15) x i ( s ) , X i ( s ) = α + α X i ( s ) + α X i ( s ) + α X i ( s ) + α X i ( s ) + (cid:15) x i ( s ) ,(cid:15) x i ( s ) ∼ N ( , τ − x ) , (cid:15) x i ( s ) ∼ N ( , τ − x ) . M real : X i ( s ) = α + α X i ( s ) + α X i ( s ) + α X i ( s ) + α X i ( s ) + σ x W x ( s ) + (cid:15) x i ( s ) , X i ( s ) = α + α X i ( s ) + α X i ( s ) + α X i ( s ) + α X i ( s ) + (cid:15) x i ( s ) ,(cid:15) x i ( s ) ∼ N ( , τ − x ) , W x ∼ MVN ( , H ( λ x )) , (cid:15) x i ( s ) ∼ N ( , τ − x ) . real : X i ( s ) = α + α X i ( s ) + α X i ( s ) + α X i ( s ) + α X i ( s ) + (cid:15) x i ( s ) , X i ( s ) = α + α X i ( s ) + α X i ( s ) + α X i ( s ) + α X i ( s ) + σ x W x ( s ) + (cid:15) x i ( s ) ,(cid:15) x i ( s ) ∼ N ( , τ − x ) , (cid:15) x i ( s ) ∼ N ( , τ − x ) , W x ∼ MVN ( , H ( λ x )) . Assume R and R represent the missing indicators of covariates WT hour and ind w a g e respectively, where R (cid:96) i ( s ) = denotes missing records and R (cid:96) i ( s ) = denotes observed ones( (cid:96) = , ) at location s . For each of the above four models, the following MAR model is assumedfor the missing data mechanism: M m R : R i ( s ) ∼ Bernoulli ( p i ( s )) , R i ( s ) ∼ Bernoulli ( p i ( s )) , logit ( p i ( s )) = φ + φ X i ( s ) + φ X i ( s ) + φ X i ( s ) + φ X i ( s ) + φ Y i ( s ) ; logit ( p i ( s )) = φ + φ X i ( s ) + φ X i ( s ) + φ X i ( s ) + φ X i ( s ) + φ Y i ( s ) . The same prior distributions described in Section 3.4 were used in these four competitive modelsalong with model M m R for the missing data mechanism. mDIC and mLPML values under models M real to M real are calculated via JAGS and R. With the thinning interval to be 25, 8,000 samples arekept for calculation after a burn-in of 150,000 samples. The convergence of the MCMC samplingalgorithm is checked using several diagnostic procedures discussed in Cowles and Carlin andChen et al. . For example, the traceplots of the parameters under model M real shown in FigureS3 demonstrate good mixing of MCMC chains. Table 8 shows the values of mDIC and mLPML under the four models for the CHNS 2011 surveydata. From Table 8, we choose model M real since it has the smallest mDIC and the largest mLPMLamong these models. The posterior estimates of the parameters under model M real and the resultsof CC estimation are given in Table 9. From Table 9, we can observe that covariates GDP , indwage , age , urban , and hhsize , have significant positive impact on the household income. The householdincome has spatial correlation among different provinces. For missing covariate indwage , both GDP , age , WThour , and urban have significant impact on it. age , and urban can help explainthe missing covariate
WThour . These two missing covariates also have spatial correlation amongdifferent provinces like the household income. The posterior estimates of parameters under othermodels, namely, M real , M real , and M real , are shown in Table S3 in the supplementary materials.Since under model M m R (MAR), φ is independent of the other parameters a posteriori , the posteriorestimates of φ remain the same no matter which of models M real to M real is used to fit hincome , indwage and WThour . These estimates are reported in Table 10. We see from Table 10 that the19able 8: Results of model comparison for the CHNS 2011 survey dataModel M real M real M real M real mDIC 8968.30 8982.39 8971.00 8970.43mLPML -4541.60 -4558.69 -4546.51 -4543.62Table 9: Posterior estimates under CC and M real for the CHNS 2011 survey data CC M real Parameters Mean SD 95% HPD interval Mean SD 95% HPD interval β β -0.0017 0.0143 (-0.0295, 0.0256) 0.0072 0.0134 (-0.0188, 0.0336) β β β β β τ y ( λ y ) -0.1595 0.8870 (-1.9897, 1.4398) -0.0778 0.9261 (-1.9814, 1.6019) σ y α α α α -0.4240 0.0265 (-0.4758, -0.3728) -0.6901 0.0267 (-0.7436, -0.6395) α τ x ( λ x ) σ x α α α -0.1014 0.0206 (-0.1407, -0.0597) -0.1276 0.0240 (-0.1755, -0.0803) α α -0.0333 0.0165 (-0.0663, -0.0004) -0.0202 0.0145 (-0.0495, 0.0088) τ x ( λ x ) -0.1166 0.9361 (-1.9674, 1.6887) 0.7219 1.2951 (-1.7500, 3.1023) σ x M m R Parameters Mean SD 95% HPD interval Parameters Mean SD 95% HPD interval φ -4.3012 0.6841 (-5.6261,-2.9422) φ φ φ -0.0315 0.0378 (-0.1046,0.0435) φ φ φ φ -0.1247 0.0743 (-0.2702,0.0226) φ -0.1776 0.0362 (-0.2484,-0.1073) φ φ φ -0.3768 0.0458 (-0.4691,-0.2824)
95% HPD intervals for φ , φ , φ , φ , φ , φ , and φ do not contain zero, implying that themissingness mechanism is not missing completely at random (MCAR).In order to see whether the posterior estimates will differ with different spatial structures,we also consider another commonly used spatial structure, the conditional autoregressive (CAR)structure, in our analysis. With a similar form with model M real , we assume that both W y , W x and W x follow a CAR structure MVN ( , Σ w ) with Σ w = ( I − λ D ) − , where I = diag ( ) and D is the adjacent matrix of the 12 locations. The mDIC and mLPML values of this model withCAR structures are 8966.27 and -4540.83, which are quite close to those under model M real . Theposterior estimates under the model with CAR structure are also similar, which are shown in TableS4 in the supplementary materials.We also carry out a sensitivity analysis on specification of the models for missing data mech-anism. In addition to model M m R (MAR), we further consider a non-ignorable model for the twomissing covariates, given by M m R : R i ( s ) ∼ Bernoulli ( p i ( s )) , R i ( s ) ∼ Bernoulli ( p i ( s )) , logit ( p i ( s )) = φ + φ X i ( s ) + φ X i ( s ) + φ X i ( s ) + φ X i ( s ) + φ Y i ( s ) + φ W x ( s ) + φ W x ( s ) ; logit ( p i ( s )) = φ + φ X i ( s ) + φ X i ( s ) + φ X i ( s ) + φ X i ( s ) + φ Y i ( s ) + φ W x ( s ) + φ W x ( s ) . With the thinning interval to be 25, 8,000 samples are kept for calculation after a burn-in of150,000 samples using JAGS and R. With the same response model and missing covariates modelas model M real , we fit the one with a different missing mechanism model M m R . Posterior estimatesunder this model is shown in Table S5 in the supplementary materials. By comparing the estimatesunder this model and M real , we can see that the estimates of parameters in the response model andthe missing covariate distribution are quite similar, so M real with a MAR assumption is a relativelysimple model to achieve our goal of analysis. 21e calculate the mDIC for the response model as well as DIC ( R ) for the missingness mechanismmodel alone. DIC ( R ) is defined with Dev ( ¯ θ ) = − log f ( R , R | φ , X ) , where X denotes thecovariates included in the missingness mechanism models.The mDIC values under models M real to M real are 8968.30, 8982.39, 8971.00, and 8970.43,respectively, under the MAR missingness model M m R , while these mDIC values are 8966.80,8979.23, 8970.98, and 8969.02, respectively, under model M m R . For models M real to M real withthe MAR missingness model M m R , the DIC ( R ) values are 8122.12, 8121.89, 8122.19, and 8122.03.For models with the missingness model M m R , the DIC ( R ) values are 7963.12, 7965.87, 7963.98,and 7963.75, corresponding to models M real to M real , respectively. These results show that modelswith M m R as missingness model have lower mDIC values and DIC ( R ) values, therefore, we canconclude that the missingness mechanism model M m R is preferred.The posterior estimates of φ under model M m R are given in Table S6 in the supplementalmaterials. For the chosen model M real with missingness model M m R , we can see that a g e , hhsize and the spatial effects W x have a significant positive effect on the missingness of covariate ind w a g e ,while the response variable has a significant negative effect on the missingness of covariate ind w a g e .It means that older people, people with a higher household income, and people who have a largerfamily are prone to reject to report their wages. For the missingness of covariate WT hour , boththe two spatial effects have the significant positive impact. Older people, people living in the urbanarea, people who have a smaller family and people with a higher household income tend to rejectto report their working hours in this analysis. In addition, the coefficients of the spatial effects, φ , φ , and φ , are significant, meaning that the missingness of the missing covariates does dependon the spatial random effects. In this paper, a Bayesian hierarchical spatial model is constructed for spatial data with missingcovariates. In addition to a Gaussian stationary spatial process model for the continuous spatialresponse, missing covariate models with spatial random effects are built for the missing covariates.In our method, missingness mechanisms for the missing covariates are restricted to be MAR, whichmay not be suitable in practice. MNAR is more common in reality and may introduce much morecomplexity in analysis. Future study can be focus on extending the missingness mechanism to beMNAR, and missingness mechanism models should be built to test the assumptions of missingnessmechanisms. Additionally, in our method, a spatial model is built for the continuous responsevariable, which can be extended to variables of other data types, such as categorical responses. Inthe real data analysis, we also fit the models using the conditional autoregressive (CAR) spatialrandom effects in both the response model and the missing covariate models. From the results in22he supplementary materials, we find that the Gaussian random effects and the CAR random effectsyield nearly the same estimation results. In the future, we can introduce missing covariates modelto autologistic model which is universally used for spatial binary data. Furthermore, dealing withspatial effects and missing variables simultaneously complicates the implementation of MCMCsampling algorithms, so it is also necessary to develop efficient algorithms and software to speedup convergence of MCMC sampling. One limitation of the data we analyzed is the lack of detailedaddress information for households. In this study, we just emphasized on the spatial dependentstructure at the province level. Considering both between-province dependency and within-provincedependency is an area devoted for future research.
Acknowledgements
We would like to thank the Editor-in-Chief and the Referee for their very helpful commentsand suggestions, which helped us further improve the paper. Dr. Chen’s research was partiallysupported by NIH grants
Data Availability Statement
The data that support the findings of this study are available from the corresponding author uponreasonable request.
Supporting Information
Additional figures and tables for this article are available online, including boxplots of mDIC andmLPML under models in the simulation study, trace plots of parameters under M real , simulationresults for models M , M , M , M ∗ , M ∗ and M ∗ , posterior estimates under model M real , M real , M real , the model with CAR structure and models with missingness model M m R in real data analysis. References [1] Banerjee, S.; Carlin, B. P.; Gelfand, A. E.
Hierarchical modeling and analysis for spatialdata ; CRC Press, 2014. 232] Cressie, N.
Statistics for spatial data ; John Wiley & Sons, 2015.[3] Cressie, N. A.
Statistics for spatial data ; Wiley Online Library, 1993.[4] Zhao, L. P.; Lipsitz, S.; Lew, D. Regression analysis with missing covariate data usingestimating equations.
Biometrics , 1165–1182.[5] Ibrahim, J. G.; Chen, M.-H.; Lipsitz, S. R. Bayesian methods for generalized linear modelswith covariates missing at random.
Canadian Journal of Statistics , , 55–78.[6] Seshadri, A. K. Statistics of spatial averages and optimal averaging in the presence of missingdata. Spatial Statistics , , 1–21.[7] Bae, B.; Kim, H.; Lim, H.; Liu, Y.; Han, L. D.; Freeze, P. B. Missing data imputation fortraffic flow speed using spatio-temporal cokriging. Transportation Research Part C: EmergingTechnologies , , 124–139.[8] Xue, J.; Nie, B.; Smirni, E. Fill-in the gaps: Spatial-temporal models for missing data. 201713th International Conference on Network and Service Management (CNSM). 2017; pp 1–9.[9] Collins, G.; Heaton, M.; Hu, L.; Monaghan, A. Spatiotemporal multiresolution modelingto infill missing areal data and enhance the temporal frequency of infrared satellite images. Environmetrics , .[10] Grantham, N. S.; Reich, B. J.; Liu, Y.; Chang, H. H. Spatial regression with an informativelymissing covariate: Application to mapping fine particulate matter. Environmetrics , ,e2499.[11] Lipsitz, S. R.; Ibrahim, J. G. A conditional model for incomplete covariates in parametricregression models. Biometrika , , 916–922.[12] Ibrahim, J. G.; Lipsitz, S. R.; Chen, M.-H. Missing covariates in generalized linear modelswhen the missing data mechanism is non-ignorable. Journal of the Royal Statistical Society:Series B (Statistical Methodology) , , 173–190.[13] Grund, S.; Lüdtke, O.; Robitzsch, A. Multiple imputation of missing covariate values inmultilevel models with random slopes: A cautionary note. Behavior Research Methods , , 640–649.[14] Chen, M.-H.; Ibrahim, J. G. Maximum likelihood methods for cure rate models with missingcovariates. Biometrics , , 43–52. 2415] Rubin, D. B. Inference and missing data. Biometrika , , 581–592.[16] Spiegelhalter, D. J.; Best, N. G.; Carlin, B. P.; Van Der Linde, A. Bayesian measures of modelcomplexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology) , , 583–639.[17] Ibrahim, J. G.; Chen, M.-H.; Sinha, D. Bayesian survival analysis ; Springer Science &Business Media, 2013.[18] Hanson, T. E.; Branscum, A. J.; Johnson, W. O. Predictive comparison of joint longitudinal-survival modeling: a case study illustrating competing approaches.