# Hierarchical Multivariate Directed Acyclic Graph Auto-Regressive (MDAGAR) models for spatial diseases mapping

HH IERARCHICAL M ULTIVARIATE D IRECTED A CYCLIC G RAPH A UTO -R EGRESSIVE (MDAGAR)

MODELS FOR SPATIALDISEASES MAPPING

A P

REPRINT

Leiwen Gao

Department of Biostatistics, University of California, Los Angeles [email protected]

Abhirup Datta

Department of Biostatistics, Johns Hopkins University [email protected]

Sudipto Banerjee

Department of Biostatistics, University of California, Los Angeles [email protected]

February 8, 2021 A BSTRACT

Disease mapping is an important statistical tool used by epidemiologists to assess geographic varia-tion in disease rates and identify lurking environmental risk factors from spatial patterns. Such mapsrely upon spatial models for regionally aggregated data, where neighboring regions tend to exhibitsimilar outcomes than those farther apart. We contribute to the literature on multivariate diseasemapping, which deals with measurements on multiple (two or more) diseases in each region. Weaim to disentangle associations among the multiple diseases from spatial autocorrelation in each dis-ease. We develop Multivariate Directed Acyclic Graphical Autoregression (MDAGAR) models toaccommodate spatial and inter-disease dependence. The hierarchical construction imparts ﬂexibilityand richness, interpretability of spatial autocorrelation and inter-disease relationships, and compu-tational ease, but depends upon the order in which the cancers are modeled. To obviate this, wedemonstrate how Bayesian model selection and averaging across orders are easily achieved usingbridge sampling. We compare our method with a competitor using simulation studies and presentan application to multiple cancer mapping using data from the Surveillance, Epidemiology, and EndResults (SEER) Program.

Keywords:

Areal data analysis; Bayesian hierarchical models; Directed acyclic graphical autoregression; Multipledisease mapping; Multivariate areal data models.

Spatially-referenced data comprising regional aggregates of health outcomes over delineated administrative units suchas counties or zip codes are widely used by epidemiologists to map mortality or morbidity rates and better understandtheir geographic variation. Disease mapping, as this exercise is customarily called, employs statistical models topresent smoothed maps of rates or counts of a disease. Such maps can assist investigators in identifying lurking riskfactors (Koch, 2005) and in detecting “hot-spots” or spatial clusters emerging from common environmental and socio-demographic effects shared by neighboring regions. By interpolating estimates of health outcome from areal data onto a r X i v : . [ s t a t . M E ] F e b PREPRINT - F

EBRUARY

8, 2021 a continous surface, disease mapping also generates smoothed maps for the small-area scale, adjusting for the sparsityof data or low population size (Berke, 2004; Richardson et al., 2004).For a single disease, there has been a long tradition of employing Markov random ﬁelds (MRFs) (Rua and Held, 2005)to introduce conditional dependence for the outcome in a region given its neighbors. Two conspicuous examples arethe Conditional Autoregression (CAR) (Besag, 1974; Besag et al., 1991) and Simultaneous Autoregression (SAR)models (Kissling and Carl, 2008) that build dependence using undirected graphs to model geographic maps. Morerecently, Datta et al. (2018) proposed a class of Directed Acyclic Graphical Autoregressive (DAGAR) models as a pre-ferred alternative to CAR or SAR models in allowing better identiﬁability and interpretation of spatial autocorrelationparameters.Multivariate disease mapping is concerned with the analysis of multiple diseases that are associated among themselvesand across space. It is not uncommon to ﬁnd substantial associations among different diseases sharing genetic andenvironmental risk factors. Quantiﬁcation of genetic correlations among multiple cancers have revealed associationsamong several cancers including lung, breast, colorectal, ovarian and pancreatic cancers (Lindström et al., 2017).Disease mapping exercises with lung and esophageal cancers have also evinced associations among them (Jin et al.,2005). When the diseases are inherently related so that the prevalence of one in a region encourages (or inhibits)occurrence of the other on the same unit, there can be substantial inferential beneﬁts in jointly modeling the diseasesrather than ﬁtting independent univariate models for each disease (see, e.g., Knorr-Held and Best, 2001; Kim et al.,2001; Gelfand and Vounatsou, 2003; Carlin et al., 2003; Held et al., 2005; Jin et al., 2005, 2007; Zhang et al., 2009;Diva et al., 2008; Martinez-Beneito, 2013; Marí-Dell’Olmo et al., 2014).Broadly speaking, there are two approaches to multivariate areal modeling. One approach builds upon a linear trans-formations of latent effects (see, e.g., Gelfand and Vounatsou, 2003; Carlin et al., 2003; Jin et al., 2005; Zhu et al.,2005; Martinez-Beneito, 2013; Bradley et al., 2015). A different class emerges from hierarchical constructions (Jinet al., 2005; Daniels et al., 2006) where each disease enters the model in a given sequence. Here, we build a class ofmultivariate DAGAR (MDAGAR) models for multiple disease mapping by building the joint distribution hierarchi-cally using univariate DAGAR models. We build upon the idea in Jin et al. (2005) of constructing GMCAR models,but with some important modiﬁcations. As noted in Jin et al. (2005), the order in which the new diseases enter thehierarchical model speciﬁes the joint distribution. Therefore, every ordering produces a different GMCAR model,which leads to an explosion in the number of models even for a modest number of cancers (say, more than 2 or 3diseases). Alternatively, joint models that are invariant to ordering are constructed using linear transformations oflatent random variables (Jin et al., 2007). However, these models are cumbersome and computationally onerous to ﬁtand interpreting spatial autocorrelation becomes challenging.Our current methodological innovation lies in devising a hierarchical MDAGAR model in conjunction with a bridgesampling algorithm (Meng and Wong, 1996; Gronau et al., 2017) for choosing among differently ordered hierarchicalmodels. The idea is to begin with a ﬁxed ordered set of cancers, posited to be associated with each other and acrossspace, and build a hierarchical model. The DAGAR speciﬁcation produces a comprehensible association structure,while bridge sampling allows us to rank differently ordered models using their marginal posterior probabilities. Sinceeach model corresponds to an assumed conditional dependence, the marginal posterior probabilities will indicate thetenability of such assumptions given the data. Epidemiologists, then, will be able to use this information to establishrelationships among the diseases and spatial autocorrelation for each disease.The balance of this paper proceeds as follows. Section 2 develops the hierarchical MDAGAR model and introducesa bridge sampling method to select the MDAGAR with the best hierarchical order. Section 3 presents a simulationstudy to compare the MDAGAR with the GMCAR model and also illustrates the bridge sampling algorithm’s efﬁcacyin selecting the “true” model. Section 4 applies our MDAGAR to age-adjusted incidence rates of four cancers fromthe SEER database and discusses different cases with respect to predictors. Finally, in Section 5, we summarize someconcluding remarks and suggest promotion in the future research.

Let G = {V , E} be a graph corresponding to a geographic map, where V = { , , . . . , k } is a ﬁxed ordering of thevertices of the graph representing clearly delineated regions on the map, and E = { ( i, j ) : i ∼ j } is the collectionof edges between the vertices representing neighboring pairs of regions. We denote two neighboring regions by ∼ .The DAGAR model, proposed by Datta et al. (2018), builds a spatial autocorrelation model for a single outcome on G using an ordered set of vertices in V . Let N (1) be the empty set and let N ( j ) = { j (cid:48) < j : j (cid:48) ∼ j } , where j ∈ V \ { } .Thus, N ( j ) includes geographic neighbors of region j (cid:48) that precede j in the ordered set V . Let { w i : i ∈ V} be a PREPRINT - F

EBRUARY

8, 2021 collection of k random variables deﬁned over the map. DAGAR speciﬁes the following autoregression, w = (cid:15) ; w j = (cid:88) j (cid:48) ∈ N ( j ) b jj (cid:48) w j (cid:48) + (cid:15) j , j = 2 , . . . , k , (1)where (cid:15) j ind ∼ N (0 , λ j ) with the precision λ j , and b jj (cid:48) = 0 if j (cid:48) (cid:54)∈ N ( j ) . This implies that w ∼ N ( , τ Q ( ρ )) , where Q ( ρ ) is a spatial precision matrix that depends only upon a spatial autocorrelation parameter ρ and τ is a positive scaleparameter. The precision matrix Q ( ρ ) = ( I − B ) (cid:62) F ( I − B ) , B is a k × k strictly lower-triangular matrix and F isa k × k diagonal matrix. The elements of B and F are denoted by b jj (cid:48) and λ j , respectively, where b jj (cid:48) = (cid:26) if j (cid:48) / ∈ N ( j ) ; ρ n

EBRUARY

8, 2021 where x ij is a p i × vector of explanatory variables speciﬁc to disease i within region j , β i are the slopes correspond-ing to disease i , w ij is a random effect for disease i in region j , and e ij ind ∼ N (0 , ( σ i ) − ) is the random noise arisingfrom uncontrolled imperfections in the data.Part of the residual from the explanatory variables is captured by the spatial-temporal effect w ij . Let w i =( w i , w i , . . . , w ik ) (cid:62) for i = 1 , , . . . , q . We adopt a hierarchical approach (see, e.g., Jin et al., 2005), where wespecify the joint distribution of w = ( w (cid:62) , w (cid:62) , . . . , w (cid:62) q ) (cid:62) as p ( w ) = p ( w ) (cid:81) qi =2 p ( w i | w *
*

*EBRUARY*

8, 2021 respectively. For each element in η i we choose a normal prior N ( µ ij , σ η ij ) , while the prior N ( w | , Q w ) can also bewritten as p ( w | τ , η , . . . , η q , ρ ) ∝ τ k | Q ( ρ ) | exp (cid:110) − τ w (cid:62) Q ( ρ ) w (cid:111) × q (cid:89) i =2 τ k i | Q ( ρ i ) | exp (cid:40) − τ i w i − i − (cid:88) i (cid:48) =1 A ii (cid:48) w i (cid:48) ) (cid:62) Q ( ρ i )( w i − i − (cid:88) i (cid:48) =1 A ii (cid:48) w i (cid:48) ) (cid:41) , (8)where det( Q ( ρ i )) = (cid:81) kj =1 λ ij , and w Ti Q ( ρ i ) w i = λ i w i + (cid:80) kj =2 λ ij ( w ij − (cid:80) j (cid:48) ∈ N ( j ) b ijj (cid:48) w ij (cid:48) ) .We sample the parameters from the posterior distribution in (7) using Markov chain Monte Carlo (MCMC) with Gibbssampling and random walk metropolis (Gamerman and Lopes, 2006) as implemented in the rjags package within the R statistical computing environment. Section S.6 presents details on the the MCMC updating scheme. It is clear from (5) that each ordering of diseases in MDAGAR will produce a different model. For the bivariatesituation, it is convenient to compare only two models (orders) by the signiﬁcance of parameter estimates as well asmodel performance. However, when there are more than two diseases involved in the model, at least six models (forthree diseases) will be ﬁtted and comparing all models become cumbersome or even impracticable.Instead, we pursue model averaging of MDAGAR models. Given a set of T = q ! candidate models, say M , . . . , M T ,Bayesian model selection and model averaging calculates p ( M = M t | y ) = p ( y | M = M t ) p ( M = M t ) (cid:80) Tj =1 p ( y | M = M j ) p ( M = M j ) , (9)for t = 1 , . . . , T (Hoeting et al., 1999). Computing the marginal likelihood p ( y | M t ) in (9) is challenging. Methodssuch as importance sampling (Perrakis et al., 2014) and generalized harmonic mean (Gelfand and Dey, 1994) have beenproposed as stable estimators with ﬁnite variance, but ﬁnding the required importance density with strong constraintson the tail behavior relative to the posterior distribution is often challenging. Bridge sampling estimates the marginallikelihood (i.e. the normalizing constant) by combining samples from two distributions: a bridge function h ( · ) and aproposal distribution g ( · ) (Gronau et al., 2017). Let θ t = { β t , σ t , ρ t , τ t , η ,t , . . . , η q,t } be the set of parameters inmodel M t with prior p ( θ t | M t ) as deﬁned in the ﬁrst row of (7). Based on the identity, (cid:82) p ( y | θ t , M t ) p ( θ t | M t ) h ( θ t | M t ) g ( θ t | M t ) d θ t (cid:82) p ( y | θ t , M t ) p ( θ t | M t ) h ( θ t | M t ) g ( θ t | M t ) d θ t , a current version of the bridge sampling estimator is p ( y | M = M t ) = E g ( θ t | M t ) [ p ( y | θ t , M t ) p ( θ t | M t ) h ( θ t | M t )] E p ( θ t | y ,M t ) [ h ( θ t | M t ) g ( θ t | M t )] ≈ N (cid:80) N i =1 p ( y | ˜ θ t,i , M t ) p ( ˜ θ t,i | M t ) h ( ˜ θ t,i | M t ) N (cid:80) N j =1 h ( θ (cid:63)t,j | M t ) g ( θ (cid:63)t,j | M t ) (10)where θ (cid:63)t,j ∼ p ( θ t | y , M t ) , j = 1 , . . . , N , are N posterior samples and ˜ θ t,i ∼ g ( θ t | M t ) , i = 1 , . . . , N , are N samples drawn from the proposal distribution (Gronau et al., 2017). The likelihood p ( y | θ t , M = M t ) is obtained byintegrating out w from (7) as N ( y | Xβ , (cid:2) Q − w ( ρ t , τ t , η ,t , . . . , η q,t ) + diag ( σ t ) ⊗ I k (cid:3) − ) , (11)given that y = ( y (cid:62) , . . . , y (cid:62) q ) (cid:62) with y i = ( y i , y i , . . . , y ik ) (cid:62) , diag ( σ ) is a diagonal matrix with σ i , i = 1 , . . . , q , onthe diagonal, and X is the design matrix with X i as block diagonal where X i = ( x i , x i , . . . , x ik ) (cid:62) . The bridgefunction h ( θ t | M t ) is speciﬁed by the optimal choice proposed in Meng and Wong (1996), h ( θ t | M t ) = C s p ( y | θ t , M t ) p ( θ t | M t ) + s p ( y | M t ) g ( θ t | M t ) (12)where C is a constant. Inserting (12) in (10) yields the estimate of p ( y | M = M t ) after convergence of an iterativescheme (Meng and Wong, 1996) as ˆ p ( y | M t ) ( t +1) = N (cid:80) N i =1 l ,i s l ,i + s ˆ p ( y | M t ) ( t ) N (cid:80) N j =1 1 s l ,j + s ˆ p ( y | M t ) ( t ) (13) PREPRINT - F

EBRUARY

8, 2021 where l ,j = p ( y | θ (cid:63)t,j ,M t ) p ( θ (cid:63)t,j | M t ) g ( θ (cid:63)t,j | M t ) , l ,i = p ( y | ˜ θ t,i ,M t ) p ( ˜ θ t,i | M t ) g ( ˜ θ t,i | M t ) , s = N N + N and s = N N + N .Given the log marginal likelihood estimates from bridgesampling , the posterior model probability for each modelis calculated from (9) by setting prior probability of each model p ( M = M t ) . For Bayesian model averaging (BMA),the model averaged posterior distribution of a quantity of interest ∆ is obtained as p (∆ | y ) = (cid:80) Tt =1 p (∆ | M = M t , y ) p ( M = M t | y ) (Hoeting et al., 1999), and the posterior mean is E (∆ | y ) = T (cid:88) t =1 E (∆ | M = M t , y ) p ( M = M t | y ) . (14)Setting ∆ = { β, w } fetches us the model averaged posterior estimates for spatial random effects as well as calculatingthe posterior mean incidence rates as discussed in Section 4. We simulate two different experiments. The ﬁrst experiment is designed to evaluate MDAGAR’s inferential perfor-mance against GMCAR. The second experiment aims to ascertain the effectiveness of the bridge sampling algorithm(2.5) in preferring models with a correct “ordering” of the diseases in the model.

We compare MDAGAR’s inferential performance with GMCAR (Jin et al., 2005). We choose the 48 states of thecontiguous United States as our underlying map, where two states are treated as neighbors if they share a commongeographic boundary. We generated our outcomes y ij using the model in (4) with q = 2 , i.e., two outcomes, and twocovariates, x j and x j , with p = 2 and p = 3 . We ﬁxed the values of the covariates after generating them from N ( , I p i ) , i = 1 , , independent across regions. The regression slopes were set to β = (1 , (cid:62) and β = (2 , , (cid:62) .Turning to the spatial random effects, we generated values of w = (cid:0) w (cid:62) , w (cid:62) (cid:1) (cid:62) from a N ( , Q w ) distribution, wherethe precision matrix is Q w = (cid:20) τ Q ( ρ ) + τ A (cid:62) Q ( ρ ) A τ A (cid:62) Q ( ρ ) τ Q ( ρ ) A τ Q ( ρ ) (cid:21) . (15)We set τ = τ = 0 . , ρ = 0 . and ρ = 0 . in (15) and take Q ( ρ i ) = D ( ρ i ) − , where D ( ρ i ) = exp( − φ i d ( j, j (cid:48) )) , φ i = − log( ρ i ) is the spatial decay for disease i and d ( j, j (cid:48) ) refers to the distance between the embedding of the j th and j (cid:48) th vertex. The vertices are embedded on the Euclidean plane and the centroid of each state is used tocreate the distance matrix. Using this exponential covariance matrix to generate the data offers a “neutral” groundto compare the performance of MDAGAR with GMCAR. We speciﬁed A using ﬁxed values of η = { η , η } .Here, we considered three sets of values for η to correspond to low, medium and high correlation among diseases. Weﬁxed η = { . , . } to ensure an average correlation of 0.15 (range 0.072 - 0.31); η = { . , . } with an averagecorrelation of 0.55 (range 0.45 - 0.74); and η = { . , . } with a mean correlation of 0.89 (range 0.84 - 0.94). Wegenerated w ij ’s for each of the above speciﬁcations for η and, with the values of w ij generated as above, we generatedthe outcome y ij ∼ N ( x (cid:62) ij β i + w ij , /σ i ) , where σ = σ = 0 . . We repeat the above procedure to replicate datasets for each of the three speciﬁcations of η .For our second experiment we generate a data set with q = 3 cancers. We extend the above setup to include one moredisease. We generate y ij ’s from (4) with the value of x j ﬁxed after being generated from N ( , I ) , β = (5 , , (cid:62) and σ = 0 . . Let [ i, j, k ] denote the model p ( w i ) × p ( w j | w i ) × p ( w k | w j , w i ) . For three diseases the six resultingmodels are denoted as M = [1 , , , M = [1 , , , M = [2 , , , M = [2 , , , M = [3 , , and M = [3 , , .Each of the six models imply a corresponding joint distribution w ∼ N ( , Q w ) which is used to generate the w ij ’s.Let the parenthesized sufﬁx ( i ) denote the disease in the i th order. For example, in M = [1 , , , we write w in theform of (5) as w ∼ (cid:15) (1) ; w = A (21) w + (cid:15) (2) ; w = A (31) w + A (32) w + (cid:15) (3) , where (cid:15) ( i ) ∼ N ( , τ ( i ) Q ( ρ ( i ) )) with Q ( ρ ( i ) ) = D ( ρ ( i ) ) − as in the ﬁrst experiment, and A ( ii (cid:48) ) = η ii (cid:48) ) I + η ii (cid:48) ) M is the coefﬁcient matrix associating random effects for diseases in the i th and i (cid:48) th order. We set τ (1) = τ (2) = τ (3) =0 . , ρ (1) = 0 . , ρ (2) = 0 . , ρ (3) = 0 . , η = 0 . , η = 0 . , η = 1 , η = 0 . , η = 1 . ,and η = 0 . to completely specify Q w for each of the 6 models. For each M i , we generate datasets by ﬁrstgenerating w ∼ N ( , Q w ) and then generating y ij ’s from (4) using the speciﬁcations described above. PREPRINT - F

EBRUARY

8, 2021

In our ﬁrst experiment we analyzed the replicated datasets using (7) with p ( ρ ) × p ( η ) ∝ q =2 (cid:89) i =1 { U nif ( ρ i | , } × N ( η | , . I ) , (16)where η = ( η , η ) (cid:62) and U nif is the Uniform density. Prior speciﬁcations are completed by setting a τ = 2 , b τ = 8 , a σ = 2 , b σ = 0 . , µ β = , V β = 1000 I in (7). Note that the same set of priors were used for both MDAGARand GMCAR as they have the same number of parameters with similar interpretations.We compare models using the Widely Applicable Information Criterion (WAIC) (Watanabe, 2010; Gelman et al.,2014) and a model comparison score D based on a balanced loss function for replicated data (Gelfand and Ghosh,1998). Both WAIC and D reward goodness of ﬁt and penalize model complexity. Details on how these metricsare computed are provided in S.7. In addition, we also computed the average mean squared error (AMSE) of thespatial random effects estimated from each of the data sets. We found the mean (standard deviation) of the AMSEsto be 1.69 (0.034) from the 85 low-correlation datasets, 1.47 (0.030) from the 85 medium-correlation datasets, and2.35 (0.059) from the 85 high-correlation datasets. The corresponding numbers for GMCAR were 1.83 (0.033), 1.59(0.031), and 2.14 (0.050), respectively. The MDAGAR tends to have smaller AMSE for low and medium correlations,while GMCAR has lower AMSE when the correlations are high, although the differences are not signiﬁcant. Wealso compute the WAICs and D scores for each simulated data set. Figure 1 plots the values of WAICs ((a)–(c))and D scores ((d)–(f)) for the data sets corresponding to each of the three correlation settings. Here, MDAGARoutperforms GMCAR in all three correlation settings with respect to both WAICs and D scores. WAIC den s i t y Model

GMCARMDAGAR (a) WAIC: low

WAIC den s i t y Model

GMCARMDAGAR (b) WAIC: medium

WAIC den s i t y Model

GMCARMDAGAR (c) WAIC: high D den s i t y Model

GMCARMDAGAR (d) Score D : low D den s i t y Model

GMCARMDAGAR (e) Score D : medium D den s i t y Model

GMCARMDAGAR (f) Score D : high Figure 1: Density plots for WAICs and D scores over 85 datasets. Density plots of WAIC for MDAGAR (blue) andGMCAR (red) models with low, medium and high correlation are shown in (a), (b) and (c) respectively, while (d)–(f)are the corresponding density plots for D scores. The dotted vertical line shows the mean for WAIC and D in eachplot.Figure 2 presents scatter plots for the true values (x axis) of spatial random effects against their posterior estimates(y axis). To be precise, each panel plots × × true values of the elements of the × vector w for datasets against their corresponding posterior estimates. We see strong agreements between the true values andtheir estimates for both MDAGAR and GMCAR. The agreement is more pronounced for the datasets corresponding tomedium and high correlations. For the low-correlation datasets, the agreement is clearly weaker although MDAGARdoes slightly better than GMCAR. PREPRINT - F

EBRUARY

8, 2021 −404 −5 0 5 W E s t i m a t e o f W (a) MDAGAR: low −50510 −5 0 5 10 W E s t i m a t e o f W (b) MDAGAR: medium −20−1001020 −20 −10 0 10 20 30 W E s t i m a t e o f W (c) MDAGAR: high −6−3036 −5 0 5 W E s t i m a t e o f W (d) GMCAR: low −50510 −5 0 5 10 W E s t i m a t e o f W (e) GMCAR: medium −20−100102030 −20 −10 0 10 20 30 W E s t i m a t e o f W (f) GMCAR: high Figure 2: Scatter plots for estimates of spatial random effects (y axis) against the true values (x axis) with ◦ linesover 85 datasets: (a)–(c) are estimates from MDAGAR model with low, medium and high correlation, while (d)–(f)are the corresponding estimates from GMCAR.We compute D KL ( N ( , Q true ) || N ( , Q w )) = 12 (cid:20) log (cid:18) det( Q true )det( Q w ) (cid:19) + tr ( Q w Q − true ) − qk (cid:21) , which is theKullback-Leibler Divergence between the model for w with the true generative precision matrix ( Q true ) and thosewith MDAGAR and GMCAR precisions ( Q w ). Using the posterior samples in the precision matrix, we evaluatethe posterior probability that D KL ( N ( , Q true ) || N ( , Q MDAGAR )) is smaller than D KL ( N ( , Q true ) || N ( , Q w )) .Figure S.5 depicts a density plot of these probabilities over the 85 data sets. When correlations are low and medium,the MDAGAR has a mean probability of around to be closer to the true model than the GMCAR, while for highcorrelations GMCAR excels with an average probability of to be closer to the true model. These ﬁndings areconsistent with the results of AMSE, where the GMCAR tended to perform better when the correlations are high.Additional comparative diagnostics from MDAGAR and GMCAR such as coverage probabilities for parameters andcorrelation between random effects for two diseases in the same state are presented in S.7.2. We now evaluate the effectiveness of the method in Section 2.5 at selecting the model with the correct ordering ofdiseases. We used the bridgesampling package in R to compute p ( M i | y ( n ) ) = max t =1 ,..., p ( M t | y ( n ) ) for eachof n = 50 × data sets generated as described in Section 3.1. Table 1 presents the probability of each model beingselected for different true model scenarios. The probability of selecting the true model is shown in bold along thediagonal. Our experiment reveals that bridge sampling is extremely effective at choosing the correct order. It was ableto identify the correct order between to , which is substantially larger than any of the probability of choosingany of the misspeciﬁed models.Table 1: Proportion of times ( π ( M i ) ) bridge sampling chose the model with the correct order out of the 50 data setswith that order. True model π ( M ) π ( M ) π ( M ) π ( M ) π ( M ) π ( M ) M M M M M M PREPRINT - F

EBRUARY

8, 2021

We now turn to an areal dataset with 4 different cancers using the MDAGAR model. The data set is extracted from theSEER ∗ Stat database using the SEER ∗ Stat statistical software (National Cancer Institute, 2019). The dataset consists offour cancers: lung, esophagus, larynx and colorectal, where the outcome is the 5-year average age-adjusted incidencerates (age-adjusted to the 2000 U.S. Standard Population) per 100,000 population in the years from 2012 to 2016 across58 counties in California, USA, as mapped in Figure S.6a. The maps exhibit preliminary evidence of correlation acrossspace and among cancers. Cutoffs for the different levels of incidence rates are quantiles for each cancer. For all fourcancers, incidence rates are relatively higher in counties concentrated in the middle northern areas including Shasta,Tehama, Glenn, Butte and Yuba than those other areas. In general, northern areas have higher incidence rates than inthe southern part. This is especially pronounced for lung cancer and esphogus cancer. For larynx cancer, in spite ofthe highest incidence rate concentrated in the north, the incidence rates in the south are mostly at the same high level.For colorectal cancer, the edge areas at the bottom also exhibit high incidence rates.Overall, counties with similar levels of incidence rates tend to depict some spatial clustering. We analyze this data setusing (7) with the following prior speciﬁcation p ( η , ρ , τ , σ , w ) = q (cid:89) i =1 U nif ( ρ i | , × q (cid:89) i =2 i − (cid:89) j =1 N ( η ij | , . I ) × q (cid:89) i =1 N ( β i | , . I ) × q (cid:89) i =1 IG (1 /τ i | , . × q (cid:89) i =1 IG ( σ i | , × N ( w | , Q w ) . (17)We also discuss a case excluding the risk factor (see Section S.8).For covariates, we include county attributes that possibly affect the incidence rates, including percentages of residentsyounger than 18 years old (young ij ), older than 65 years old (old ij ), with education level below high school (edu ij ),percentages of unemployed residents (unemp ij ), black residents (black ij ), male residents (male ij ), uninsured residents(uninsure ij ), and percentages of families below the poverty threshold (poverty ij ). All covariates are common fordifferent cancers and extracted from the SEER ∗ Stat database (National Cancer Institute, 2019) for the same period,2012 - 2016. Since cigarette smoking is a common risk factor for cancers, adult smoking rates (smoke ij ) for 2014–2016 were obtained from the California Tobacco Facts and Figures 2018 database (California Department of PublicHealth, 2018). Spatial patterns in the map of adult cigarette smoking rates, shown in Figure S.6b, are similar to theincidence of cancers, especially lung and esophageal cancers, the highest smoking rates are concentrated in the north.While some central California counties (e.g., Stanislaus, Tuolumne, Merced, Mariposa, Fresno and Tulare) also exhibithigh rates, although there is clearly less spatial clustering of the high rates than in the north.Since the order of cancers in the DAG specify the model, we ﬁt all

4! = 24 models using (7) and compute the marginallikelihoods using bridge sampling (Section 2.5). By setting the prior model probabilities as p ( M = M t ) = for t = 1 , , . . . , , we compute the posterior model probabilities using (9). These are presented in Table S.5. We obtainBayesian model averaged (BMA) estimates using (14) with the weights in Table S.5. Among all models, model M is selected as the best model with the largest posterior probability . and the corresponding conditional structure is [ esophageal ] × [ larynx | esophageal ] × [ colorectal | esophageal , larynx ] × [ lung | esophageal , larynx , colorectal ] .Table 2 is a summary of the parameter estimates including regression coefﬁcients, spatial autocorrelation ( ρ i ), spatialprecision ( τ i ) and noise variance ( σ i ) for each cancer. From M and BMA, we ﬁnd the regression slopes for thepercentage of smokers and uninsured residents are signiﬁcantly positive and negative, respectively, for esophagealcancer. The negative association between percentage of uninsured and esophageal cancer may seem surprising, butis likely a consequence of spatial confounding with counties exhibiting low incidence rates for esophageal cancerhaving a relatively large number of uninsured residents (see top right in S.6a and the right most ﬁgure in S.6b).Since esophageal cancer has low incidence rates, this association could well be spurious due to spatial confounding.Percentage of smokers is, unsurprisingly, found to be a signiﬁcant risk factor for lung cancer, while the percentageof blacks seems to be signiﬁcantly associated with elevated incidence of larynx cancer. In addition, we tend to seethat percentage of population below the poverty level has a pronounced association with higher rates of lung andesophageal cancer.Recall from Section 2.4 that ρ is the residual spatial autocorrelation for esophageal cancer after accounting for theexplanatory variables, while ρ i for i = 2 , , are residual spatial autocorrelations after accounting for the explanatoryvariables and the preceding cancers in the model M . From Table 2 we see that esophageal cancer exihibits relativelyweaker spatial autocorrelation, while the residual spatial autocorrelations for larynx and colorectal cancers after ac-counting for preceding cancers are both at moderate levels of around 0.5. Similarly for the spatial precision τ i , larynxappears to have the smallest conditional variability while that for colorectal and lung are slightly larger. PREPRINT - F

EBRUARY

8, 2021

Table 2: Posterior means (95 % credible intervals) for parameters estimated from M and BMA estimates for regres-sion coefﬁcients only for the SEER four cancer dataset. Parameters Model Esophageal Larynx Colorectal LungIntercept M % ) M BMA

Young ( % ) M -0.12 (-0.31, 0.07) -0.07 (-0.18, 0.04) 0.27 (-0.2, 0.76) -0.08 (-0.90, 0.74)BMA -0.11 (-0.3, 0.08) -0.08 (-0.19, 0.03) 0.29 (-0.18, 0.78) -0.01 (-0.86, 0.82)Old ( % ) M -0.11 (-0.25, 0.04) -0.05 (-0.14, 0.03) 0.10 (-0.28, 0.48) -0.09 (-0.81, 0.67)BMA -0.10 (-0.25, 0.05) -0.05 (-0.14, 0.03) 0.10 (-0.29, 0.49) -0.08 (-0.79, 0.66)Edu ( % ) M % ) M -0.13 (-0.29, 0.03) 0.01 (-0.08, 0.10) -0.09 (-0.54, 0.37) 0.60 (-0.47, 1.55)BMA -0.12 (-0.28, 0.05) 0.01 (-0.08, 0.1) -0.08 (-0.54, 0.38) 0.61 (-0.43, 1.56)Black ( % ) M -0.16 (-0.73, 0.39) 0.15 (-1.06, 1.29)BMA 0.13 (-0.07, 0.33) -0.18 (-0.75, 0.39) 0.14 (-1.02, 1.25)Male ( % ) M -0.04 (-0.17, 0.09) 0.00 (-0.07, 0.08) 0.24 (-0.12, 0.60) 0.14 (-0.57, 0.79)BMA -0.04 (-0.17, 0.09) 0 (-0.07, 0.08) 0.24 (-0.12, 0.62) 0.14 (-0.55, 0.82)Uninsured ( % ) M -0.24 (-0.44, -0.04) -0.08 (-0.20, 0.04) 0.07 (-0.44, 0.58) 0.01 (-0.82, 0.86)BMA -0.23 (-0.42, -0.02) -0.08 (-0.2, 0.04) 0.09 (-0.42, 0.61) 0 (-0.81, 0.82)Poverty ( % ) M ρ i M τ i M σ i M For the posterior mean incidence rates and spatial random effects w ij , we present estimates from model M andBMA. Figure 3a and 3b are maps of posterior mean spatial random effects and model ﬁtted incidence rates for fourcancers obtained from BMA, while Figure 3c and 3d show maps of those from model M . The posterior meanincidence rates from BMA and M are in accord with each other, and both present DAGAR-smoothed versions ofthe original patterns in Figure S.6a. For posterior means of spatial random effects, in general, the estimates from M are similar to model averaged estimates, especially for lung and colorectal cancers, exhibiting relatively largepositive values in the northern counties, where the incidence rates are high. However, for esophageal and larynxcancers we see slight discrepancies between M and BMA in the north. The BMA estimates produce larger positiverandom effects, ranging between . − . , in most counties, while M produces estimates between − . foresophageal cancer. More counties with random effects larger than . are estimated from M for larynx cancer.We believe this is attributable, at least in part, to another competitive model, M = [ larynx ] × [ esophagus | larynx ] × [ lung | larynx , esophagus ] × [ colorectal | larynx , esophagus , lung ] (posterior probability . ), which contributes to theBMA. On the other hand, the effects of some important county-level covariates play an essential role in the discrepancybetween the estimates of random effects and model ﬁtted incidence rates for each cancer.Recall from Section 2 that η ii (cid:48) and η ii (cid:48) reﬂect the associations among cancers that can be attributed to spatialstructure. Speciﬁcally, larger values of η ii (cid:48) will indicate inherent associations unrelated to spatial structure, whilethe magnitude of η ii (cid:48) reﬂects associations due to spatial structure. Figure S.7 presents posterior distributions of η for all pairs of cancers. We see from the distribution of η that there is a pronounced non-spatial component in theassociation between lung and colorectal cancers. Similar, albeit somewhat less pronounced, non-spatial associationsare seen between larynx and esophageal cancers and between lung and larynx cancers. Analogously, the posteriordistributions for η and η tend to have substantial positive support suggesting substantial spatial cross-correlationsbetween lung and colorectal cancers and between colorectal and larynx cancers. Interestingly, we ﬁnd negative supportin the posterior distributions for η and η . The negative mass implies that the covariance among cancers withina region is suppressed by strong dependence with neighboring regions. This seems to be the case for associationsbetween lung and esophageal cancers and between lung and larynx cancers.We also present supplementary analysis that excludes adult smoking rates from the covariates, which we refer toas “Case 2”. Figure S.8 shows estimated correlations between pairwise cancers in each of the 58 counties. Thetop row presents the correlations including smoking rates (“Case 1”) as has been analyzed here. The bottom rowpresents the corresponding maps for “Case 2”. Interestingly, accounting for smoking rates substantially diminishesthe associations among esophageal, colorectal and lung cancers. These are signiﬁcantly associated in “Case 2” butonly lung and colorectal retain their signiﬁcance after accounting for smoking rates. PREPRINT - F

EBRUARY

8, 2021

Lung cancer−20 − −10−10 − 00 − 55 − 1010 − 32 Esophageal cancer−1.2 − −0.5−0.5 − 00 − 0.10.1 − 0.50.5 − 1Larynx cancer−1 − −0.5−0.5 − −00 − 0.10.1 − 0.50.5 − 1.1 Colorectum cancer−5 − −3−3 − 00 − 11 − 33 − 6 (a) BMA: posterior mean spatial random effects

Lung cancer22−4141−4545−5151−80 Esophageal cancer0−3.53.5−3.93.9−4.54.5−12Larynx cancer0−1.81.8−2.12.1−2.62.6−5 Colorectum cancer24−3434−3636−3838−50 (b) BMA: posterior mean incidence rates

Lung cancer−20 − −10−10 − 00 − 55 − 1010 − 32 Esophageal cancer−1.2 − −0.5−0.5 − 00 − 0.10.1 − 0.50.5 − 1Larynx cancer−1 − −0.5−0.5 − −00 − 0.10.1 − 0.50.5 − 1.1 Colorectum cancer−5 − −3−3 − 00 − 11 − 33 − 6 (c) M : posterior mean spatial random effects Lung cancer22−4141−4545−5151−80 Esophageal cancer0−3.53.5−3.93.9−4.54.5−12Larynx cancer0−1.81.8−2.12.1−2.62.6−5 Colorectum cancer24−3434−3636−3838−50 (d) M : posterior mean incidence rates Figure 3: Maps of posterior results using BMA and the highest probabilty model M for lung, esophagus, larynx andcolorectal cancer in California including posterior mean spatial random effects and posterior mean incidence rates asshown in (a) (b) for BMA and (c) (d) for M ; PREPRINT - F

EBRUARY

8, 2021

We have developed a conditional multivariate “MDAGAR” model to estimate spatial correlations for multiple cor-related diseases based on a currently proposed class of DAGAR models for univariate disease mapping, as well asproviding better interpretation of the association among diseases. We demonstrate that MDAGAR tends to outperformGMCAR when association between spatial random effects for different diseases is weak or moderate. Inference iscompetitive when associations are strong. MDAGAR retains the interpretability of spatial autocorrelations, as in uni-variate DAGAR, separating the spatial correlation for each disease from any inherent or endemic association amongdiseases. While MDAGAR, like all DAG based models, is speciﬁed according to a ﬁxed order of the diseases, weshow that a bridge sampling algorithm can effectively choose among the different orders and also provide Bayesianmodel averaged inference in a computationally efﬁcient manner.Our data analysis reveals that correlations between incidence rates for different cancers are impacted by covariates.For example, eliminating adult cigarette smoking rates produces similar spatial patterns for the incidence rates ofesophageal, lung and colorectal cancer. In addition, the signiﬁcant correlation between lung and esophageal cancer,even after accounting for smoking rates, implies other inherent or endemic association such as latent risk factorsand metabolic mechanisms. We also see that the MDAGAR based posterior estimates of the latent spatial effects inFigure 3a and 3c resemble those from an MDAGAR without covariates (Figure S.9), while the maps for the estimatedincidence rates in Figure 3b and 3d account for the spatial variability of the covariates.Future challenges will include scalability with very large number of diseases because, as we have seen, the numberof models to be ﬁtted grows exponentially with the number of diseases. One way to obviate this issue is to adopta joint modeling approach analogous to order-free MCAR models (Jin et al., 2007) that build rich spatial structuresfrom linear transformations of simpler latent variables. For instance, we can develop alternate MDAGAR modelsby specifying w = Λ f , where Λ is a suitably speciﬁed matrix and f is a latent vector whose components followindependent univariate DAGAR distributions. This will avoid the order dependence, but the issue of identifying andspecifying Λ will need to be considered as will the interpretation of disease speciﬁc spatial autocorrelations. Acknowledgements

The work of the ﬁrst and third authors have been supported in part by the Division of Mathematical Sciences (DMS)of the National Science Foundation (NSF) under grant 1916349 and by the National Institute of Environmental HealthSciences (NIEHS) under grants R01ES030210 and 5R01ES027027. The work of the second author was supported bythe Division of Mathematical Sciences (DMS) of the National Science Foundation (NSF) under grant 1915803.

Supplementary Materials

The R code implementing our models are available at https://github.com/LeiwenG/Multivariate_DAGAR . References

Banerjee, S. (2020). Modeling massive spatial datasets using a conjugate bayesian linear modeling framework.

SpatialStatistics

Interna-tional Journal of Health Geographics Journal of the Royal StatisticalSociety: Series B (Methodological)

Annals of the institute of statistical mathematics

Ann. Appl. Stat. Bayesian statistics PREPRINT - F

EBRUARY

8, 2021

Cover, T. M. and Thomas, J. A. (1991).

Elements of information theory . Wiley Series in Telecommunications andSignal Processing. Wiley Interscience.Daniels, M. J., Zhou, Z., and Zou, H. (2006). Conditionally speciﬁed space-time models for multivariate processes.

Journal of Computational and Graphical Statistics

Bayesian Analysis .Diva, U., Dey, D. K., and Banerjee, S. (2008). Parametric models for spatially correlated survival data for individualswith multiple cancers.

Statistics in medicine

Markov chain Monte Carlo: stochastic simulation for Bayesian inference .Chapman and Hall/CRC.Gelfand, A. E. and Dey, D. K. (1994). Bayesian model choice: asymptotics and exact calculations.

Journal of theRoyal Statistical Society: Series B (Methodological)

Biometrika

Biostatistics Statistics and computing

Journal of mathematical psychology arXiv preprint arXiv:1710.08162 .Held, L., Natário, I., Fenton, S. E., Rue, H., and Becker, N. (2005). Towards joint disease mapping.

Statistical methodsin medical research

Statistical science pages 382–401.Jin, X., Banerjee, S., and Carlin, B. P. (2007). Order-free co-regionalized areal data models with application tomultiple-disease mapping.

Journal of the Royal Statistical Society: Series B (Statistical Methodology)

Biomet-rics

Journal of the American Statistical association

Global Ecology and Biogeography

Journal of the Royal Statistical Society: Series A (Statistics in Society)

Cartographies of disease: maps, mapping, and medicine . Esri Press Redlands, CA.Lindström, S., Finucane, H., Bulik-Sullivan, B., Schumacher, F. R., Amos, C. I., Hung, R. J., Rand, K., Gruber, S. B.,Conti, D., Permuth, J. B., et al. (2017). Quantifying the genetic correlation between multiple cancer types.

CancerEpidemiology and Prevention Biomarkers

Stochastic environmental research and risk assessment

Biometrika

Statistica Sinica pages 831–860.National Cancer Institute (2019). Seer*stat software.Perrakis, K., Ntzoufras, I., and Tsionas, E. G. (2014). On the use of marginal posteriors in marginal likelihoodestimation via importance sampling.

Computational Statistics & Data Analysis PREPRINT - F

EBRUARY

8, 2021

Richardson, S., Thomson, A., Best, N., and Elliott, P. (2004). Interpreting posterior relative risk estimates in disease-mapping studies.

Environmental health perspectives

New York .Rua, H. and Held, L. (2005).

Gaussian Markov Random Fields : Theory and Applications . Monographs on statisticsand applied probability. Chapman and Hall/CRC Press, Boca Raton, FL.Watanabe, S. (2010). Asymptotic equivalence of bayes cross validation and widely applicable information criterion insingular learning theory.

Journal of Machine Learning Research

The annals of applied statistics Biometrics

S.6 Algorithm for Model Implementation

We outline model implementation for (7) using Markov Chain Monte Carlo (MCMC). We update { w , β , σ , τ , η , . . . , η q } using Gibbs steps, while the elements of ρ are updated from their full conditional distribu-tions using Metropolis random walk steps (Robert and Casella, 2004). A particularly appealing feature of our proposedMDAGAR model is that the spatial weight parameters η = { η , . . . , η q } render Gaussian full conditional distributionsin addition to the customary Gaussian full conditional distributions for β and w . As a matter of notational conveniencefor the derivations that follow, we use N ( µ , V ) to denote the normal distribution with variance-covariance matrix V .This difference from our notation in the main manuscript where we use the precision matrix in the argument of normaldistribution. S.6.1 Full Conditional Distributions

The full conditional distribution for each β i is β i | y i , w i , σ i ∼ N ( M i m i , M i ) (S.18)where M i = (cid:16) σ i X (cid:62) i X i + σ β I p i (cid:17) − and m i = σ i X (cid:62) i ( y i − w i ) . Similarly, the full conditional distribution ofeach σ i follows an inverse gamma distribution, σ i | y i , β i , w i ∼ IG (cid:18) a σ + k , b σ + 12 ( y i − X i β i − w i ) (cid:62) ( y i − X i β i − w i ) (cid:19) . (S.19)The full conditional distribution for w i for each i = 2 , . . . , q − is p (cid:0) w i | w , . . . , w i − , w i +1 , w i +1 , . . . , w q , y i , β i , σ i , η i , . . . , η q , ρ i , . . . , ρ q , τ i , . . . , τ q (cid:1) ∝ q (cid:89) n = i exp − τ n (cid:32) w n − n − (cid:88) i (cid:48) =1 A ni (cid:48) w i (cid:48) (cid:33) (cid:62) Q ( ρ n ) (cid:32) w n − n − (cid:88) i (cid:48) =1 A ni (cid:48) w i (cid:48) (cid:33) × exp (cid:26) − σ i ( y i − X i β i − w i ) (cid:62) ( y i − X i β i − w i ) (cid:27) (S.20)which is equal to N ( w i | G i g i , G i ) , where G i = (cid:20) τ i Q ( ρ i ) + q (cid:88) n = i +1 τ n A (cid:62) ni Q ( ρ n ) A ni + 1 σ i I k (cid:21) − and g i = τ i Q ( ρ i ) i − (cid:88) n =1 A in w n + q (cid:88) n = i +1 τ n A (cid:62) ni Q ( ρ n ) w n − n − (cid:88) i (cid:48) =1 ,i (cid:48) (cid:54) = i A ni (cid:48) w i (cid:48) + 1 σ i ( y i − X i β i ) . For i = 1 and q ,we have w | w , . . . , w q , y , β , σ , η , ρ , τ ∼ N ( G g , G ) w q | w , . . . , w q − , y q , β q , σ , η q , ρ q , τ q ∼ N ( G q g q , G q ) PREPRINT - F

EBRUARY

8, 2021 , where G = (cid:32) τ Q ( ρ ) + q (cid:88) n =2 τ n A (cid:62) n Q ( ρ n ) A n + 1 σ I k (cid:33) − , g = τ A (cid:62) Q ( ρ ) w + q (cid:88) n =3 τ n A (cid:62) n Q ( ρ n ) (cid:32) w n − n − (cid:88) i (cid:48) =2 A ni (cid:48) w i (cid:48) (cid:33) + 1 σ ( y − X β ) , G q = (cid:18) τ q Q ( ρ q ) + 1 σ q I k (cid:19) − g q = τ q Q ( ρ q ) q − (cid:88) n =1 A qn w n + 1 σ q ( y q − X q β q ) . The full conditional distribution of each τ i is τ | w , ρ ∼ G (cid:18) a τ + k , b τ + 12 w T Q ( ρ ) w (cid:19) ,τ i | w , . . . , w i , η i , ρ i ∼ G a τ i + k , b τ i + 12 (cid:32) w i − i − (cid:88) i (cid:48) =1 A i,i (cid:48) w i (cid:48) (cid:33) (cid:62) Q ( ρ i ) (cid:32) w i − i − (cid:88) i (cid:48) =1 A i,i (cid:48) w i (cid:48) (cid:33) ,i = 2 , , . . . , q We now derive the full conditional distributions for the η i s. From (5) with i = 2 , each element in w can be writtenas w j = η w j + η (cid:80) j (cid:48) ∼ j w j (cid:48) + (cid:15) j , where (cid:15) j is the j th element in (cid:15) . To extract η = ( η , η ) (cid:62) fromthe matrix A , A w is rewritten as Z η where Z = ( w , ζ ) and ζ = (cid:16)(cid:80) j (cid:48) ∼ w j (cid:48) , . . . , (cid:80) j (cid:48) ∼ k w j (cid:48) (cid:17) (cid:62) . Ingeneral, A ii (cid:48) w i (cid:48) = Z i (cid:48) η ii (cid:48) with Z i (cid:48) = ( w i (cid:48) , ζ i (cid:48) ) , where ζ i (cid:48) = (cid:16)(cid:80) j (cid:48) ∼ w i (cid:48) j (cid:48) , . . . , (cid:80) j (cid:48) ∼ k w i (cid:48) j (cid:48) (cid:17) (cid:62) . Consequently,(5) can be written as w i = δ i η i + (cid:15) i , where block matrix δ i = ( Z , . . . , Z i − ) . If η i ∼ N ( µ i , V i ) , then the fullconditional distribution of η i is p ( η i | w , . . . , w i , ρ i ) ∝ exp (cid:110) − τ i w i − δ i η i ) (cid:62) Q ( ρ i )( w i − δ i η i ) (cid:111) × exp (cid:26) −

12 ( η i − µ i ) (cid:62) V − i ( η i − µ i ) (cid:27) . (S.21)The above is equal to N ( η i | H i h i , H i ) , where H i = (cid:0) τ i δ (cid:62) i Q ( ρ i ) δ i + V − i (cid:1) − and h i = τ i δ (cid:62) i Q ( ρ i ) w i + V − i µ i .For our analysis we set µ i = and V i = 1000 I . S.6.2 Metropolis within Gibbs

Let γ i = log( ρ i − ρ i ) , γ i ∈ R and γ = ( γ , . . . , γ q ) (cid:62) . The full conditional distribution of γ is p ( γ | w , η , . . . , η q , τ ) ∝ p ( w | τ , η , . . . , η q , ρ ) × p ( ρ ) | J | , (S.22)where p ( w | τ , η , . . . , η q , ρ ) = N ( w | Gg , G ) , G = (cid:0) Q w + Σ − (cid:1) − , g = Σ − ( y − Xβ ) , Σ = diag ( σ ) ⊗ I k and J = (cid:81) qi =1 ρ i (1 − ρ i ) . Using the formula of transformation, p ( ρ ) | J | is the prior for γ and in the right-hand side, ρ canbe substituted by γ given ρ i = e γi e γi .In our analysis, for each model we ran two MCMC chains for 30,000 iterations each. Posterior inference was basedupon 15,000 samples retained after adequate convergence was diagnosed. The MDAGAR model in the simulationexamples was programmed in the S language as implemented in the R statistical computing environment. All othermodels were implemented using the rjags package available from CRAN https://cran.r-project.org/web/packages/rjags/ . PREPRINT - F

EBRUARY

8, 2021

S.7 Supplementary Details in Simulation

S.7.1 WAIC, AMSE and D score

For the simulation studies in Section 3.2, let θ = { β , σ , w } . The likelihood of each data point p ( y ij | θ ) = p (cid:0) y ij | x (cid:62) ij β i + w ij , /σ i (cid:1) is needed for calculating WAIC which is deﬁned as W AIC = − (cid:16) (cid:99) lpd − ˆ p W AIC (cid:17) , where (cid:99) lpd is computed using posterior samples as the sum of log average predictive density i.e. (cid:80) qi =1 (cid:80) kj =1 log (cid:16) L (cid:80) L(cid:96) =1 p (cid:0) y ij | θ ( (cid:96) ) (cid:1)(cid:17) , θ ( (cid:96) ) for (cid:96) = 1 , . . . , L being L posterior samples of θ , and ˆ p W AIC is theestimated effective number of parameters computed as q (cid:88) i =1 k (cid:88) j =1 V L(cid:96) =1 (cid:16) log p (cid:16) y ij | θ ( (cid:96) ) (cid:17)(cid:17) with V L(cid:96) =1 (cid:0) log p (cid:0) y ij | θ ( (cid:96) ) (cid:1)(cid:1) = L − (cid:80) L(cid:96) =1 (cid:104) log p (cid:0) y ij | θ ( (cid:96) ) (cid:1) − L (cid:80) L(cid:96) =1 log p (cid:0) y ij | θ ( (cid:96) ) (cid:1)(cid:105) .Turning to the D score, we draw replicates y ij , y ( (cid:96) ) rep ,ij ∼ N ( x (cid:62) ij β ( (cid:96) ) i + w ( (cid:96) ) ij , /σ (cid:96) ) i ) and compute D = G + P .Here G = (cid:80) qi =1 || y i − ¯ y rep ,i || is a goodness-of-ﬁt measure, where ¯ y rep ,i is the mean veactor with elements ¯ y rep ,ij = L (cid:80) L(cid:96) =1 y ( (cid:96) ) rep ,ij and P = (cid:80) qi =1 (cid:80) kj =1 σ rep ,ij is a summary of variance, where σ rep ,ij is the variance of y ( (cid:96) ) rep ,ij for (cid:96) = 1 , . . . , L .For AMSE, we use w ij as the true value of each random effect and ˆ w ( n ) ij is the posterior mean of w ij for the data set n . The estimated AMSE is calculated as (cid:92) AM SE = Nqk (cid:80) Nn =1 (cid:80) qi =1 (cid:80) kj =1 (cid:16) ˆ w ( n ) ij − w ij (cid:17) with associated MonteCarlo standard error estimate (cid:100) SE ( (cid:92) AM SE ) = (cid:118)(cid:117)(cid:117)(cid:116) N qk )( N qk − N (cid:88) n =1 q (cid:88) i =1 k (cid:88) j =1 (cid:20)(cid:16) ˆ w ( n ) ij − w ij (cid:17) − (cid:92) AM SE (cid:21) . S.7.2 Coverage Probability

For the simulation studies in Section 3.2, Table S.3 shows the coverage probabilities (CP) deﬁned as the mean coveragefor a parameter by the credible intervals over 85 datasets. The MDAGAR offers satisfactory coverages for allparameters when correlations are low and medium, outperforming GMCAR, while GMCAR presents competitiveperformance with MDAGAR for high correlations. Figure S.4 plots coverage probabilities of correlation betweentwo diseases in the same region, given by corr ( w j , w j ) = cov ( w j , w j ) / ( √ var ( w j ) √ var ( w j )) , for MDAGARand GMCAR. Let Q ( ρ i ) − = { d ijj (cid:48) } , we obtain cov ( w j , w j ) = τ − ( η d jj + η (cid:80) j (cid:48) ∼ j d jj (cid:48) ) , var ( w j ) = τ − d jj and var ( w j ) = τ − [ η ( η d jj + η (cid:88) j (cid:48) ∼ j d jj (cid:48) ) + η (cid:88) j (cid:48) ∼ j ( η d jj (cid:48) + η (cid:88) j (cid:48)(cid:48) ∼ j d j (cid:48)(cid:48) j (cid:48) )] + τ − d jj . The MDAGAR performs better in estimating disease correlations in the same region for all scenarios, especially forlow and medium correlations with CPs at around in all states. PREPRINT - F

EBRUARY

8, 2021 l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l A L A Z A RC A C O C T D E F L G A I D I L I N I AKSKY L A M E M D M A M I M N M S M O M T N E N V NHN J N M N Y NCND O H O K O R PA R I S C S D T N T X U T V T VA W A W V W I W Y States C o v e r age p r obab ili t y ( % ) model ll GMCARMDAGAR (a) Low l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l A L A Z A RC A C O C T D E F L G A I D I L I N I AKSKY L A M E M D M A M I M N M S M O M T N E N V NHN J N M N Y NCND O H O K O R PA R I S C S D T N T X U T V T VA W A W V W I W Y States C o v e r age p r obab ili t y ( % ) model ll GMCARMDAGAR (b) Medium l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l ll l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l A L A Z A RC A C O C T D E F L G A I D I L I N I AKSKY L A M E M D M A M I M N M S M O M T N E N V NHN J N M N Y NCND O H O K O R PA R I S C S D T N T X U T V T VA W A W V W I W Y States C o v e r age p r obab ili t y ( % ) model ll GMCARMDAGAR (c) High

Figure S.4: Coverage probability ( % ) of corr ( w j , w j ) , i.e. correlation between two diseases in each state, forMDAGAR (blue) and GMCAR (red).Table S.3: Coverage probability ( % ) of parameters estimated from MDAGAR and GMCARCoverage probability ( % )Correlation Model η η ρ ρ τ τ σ σ Low MDAGAR 92.9 95.3 92.9 97.6 100 98.8 100 100GMCAR 92.9 80.0 84.7 95.3 95.3 100 83.5 100Medium MDAGAR 94.1 97.6 98.8 96.5 98.8 98.8 100 100GMCAR 85.9 77.6 69.4 98.8 61.2 98.8 84.7 98.8High MDAGAR 92.9 94.1 95.3 54.1 96.5 98.8 78.8 100GMCAR 96.5 88.2 70.6 100 1.20 100 97.6 100

S.8 Multiple Cancer Analysis from SEER for Case 2: Exclude smoking rates in covariates

Excluding the risk factor adult cigarette smoking rates, we only include county attributes described in 4 as covari-ates. Among models, model M exhibits dominated best performance with a posteior probability of 0.999and the corresponding conditional structure is [ larynx ] × [ esophagus | larynx ] × [ colorectal | larynx , esophagus ] × [ lung | larynx , esophagus , colorectal ] .Table S.4 is a summary of the parameter estimates for each cancer. From M , we ﬁnd that the regression slope forthe percentage of blacks and unemployed residents are signiﬁcantly positive for larynx and lung cancer respectively.The larynx cancer exhibits weaker spatial autocorrelation while the residual spatial autocorrelation for the other threecancers after accounting for preceding cancers are at moderate levels. For spatial precision τ i , larynx random effectsstill have the smallest variability while the conditional variability for colorectal and lung cancers are slightly larger.Table S.4: Posterior means (95 % credible intervals)for parameter estimated from M for Case 2 (excluding somkingrates in covariates).Parameters Larynx Esophageal Colorectal LungIntercept 6.75 (-0.58, 14.00) 11.14 (-1.70, 24.05) 18.89 (-10.37, 48.12) 24.18 (-22.71, 68.75)Young( % ) -0.09 (-0.20, 0.02) -0.09 (-0.29, 0.11) 0.27 (-0.19, 0.74) 0.04 (-0.75, 0.86)Old( % ) -0.04 (-0.12, 0.04) 0.00 (-0.15, 0.16) 0.13 (-0.23, 0.49) 0.15 (-0.49, 0.91)Edu( % ) -0.02 (-0.08, 0.04) -0.02 (-0.13, 0.09) 0.12 (-0.13, 0.38) -0.34 (-0.82, 0.15)Unemp( % ) 0.04 (-0.03, 0.12) 0.06 (-0.08, 0.20) 0.10 (-0.26, 0.45) Black( % ) % ) -0.01 (-0.08, 0.07) -0.07 (-0.21, 0.06) 0.18 (-0.16, 0.52) 0.01 (-0.59, 0.60)Uninsured( % ) -0.07 (-0.19, 0.04) -0.13 (-0.33, 0.07) 0.10 (-0.37, 0.58) 0.11 (-0.70, 0.95)Poverty( % ) 0.21 (-0.11, 0.53) 0.40 (-0.20, 1.02) 0.03 (-1.38, 1.45) 0.84 (-2.14, 3.52) ρ i τ i σ i PREPRINT - F

EBRUARY

8, 2021

S.9 Supplementary Figures and Tables

Supplementary ﬁgures and tables referenced in Section 3.2, 4 and 5 are shown below.

Probability den s i t y Correlation

HighLowMedium

Figure S.5: Density plots for probability that the KL-divergence between the MDAGAR and the true model is smallerthan that between GMCAR and the true model with three levels of correlation for two diseases: low (purple), medium(green) and high (red) PREPRINT - F

EBRUARY

8, 2021

Lung cancer22−4141−4545−5151−80 Esophagus cancer0−3.53.5−3.93.9−4.54.5−12Larynx cancer0−1.81.8−2.12.1−2.62.6−5 Colorectum cancer24−3434−3636−3838−50 (a) 5-year average age-adjusted incidence rates per 100,000 population for lung, esophagus, larynx and colorectal cancer, 2012 -2016

Smoke (%)6.70 − 11.5011.50 − 13.8513.85 − 16.2816.28 − 25.50 Black (%)0.90 − 1.901.90 − 2.802.80 − 5.155.15 − 16.90 Uninsured (%)0.0 − 0.60.6 − 0.90.9 − 1.41.4 − 3.8 (b) adult cigarette smoking rates (left), percentage of black residents (middle) and uninsured residents (right)

Figure S.6: Maps of county-level raw data in California including (a) incidence rates for lung, esophagus, larynxand colorectal cancer and (b) important county-level covariates with signiﬁcant effects: adult cigarette smoking rates,percentage of blacks and uninsured residents. PREPRINT - F

EBRUARY

8, 2021 eta_021, larynx | esophageal−5 0 5 eta_121, larynx | esophageal−2 −1 0 1 2 3 4 eta_031, colorectum | esophageal−30 −10 0 10 20 30 eta_131, colorectum | esophageal−20 −10 0 10 eta_032, colorectum | larynx−20 −10 0 10 20 eta_132, colorectum | larynx−5 0 5 10 eta_041, lung | esophageal−30 −10 0 10 20 30 40 eta_141, lung | esophageal−30 −10 0 10 20 30 eta_042, lung | larynx−40 −20 0 20 40 eta_142, lung | larynx−15 −5 0 5 10 eta_043, lung | colorectum−5 0 5 10 eta_143, lung | colorectum−1 0 1 2 3

Figure S.7: Posterior distributions of η for all pairs of cancers. PREPRINT - F

EBRUARY

8, 2021 (a) case 1: esophageal cancer and colorectal cancer (b) case 1: esophageal cancer and lung cancer (c) case 1: colorectal cancer and lung cancer l lll ll ll l lll llll l lll ll l lll l lll ll l lll l ll llll ll ll llll lll l (d) case 2: esophageal cancer and colorectal cancer l lll ll ll l lll lllll l lll ll ll lll l lll ll l lll l ll llll ll ll llll ll ll l (e) case 2: esophageal cancer and lung cancer l lll ll ll l lll lllll l lll ll ll lll l lll ll l lll l ll llll ll ll llll ll ll l (f) case 2: colorectal cancer and lung cancer l lll ll ll l lll lllll l lll ll ll lll l lll ll l lll l ll llll ll ll llll ll ll l < 0.1 0.1 − 0.3 0.3 − 0.5 0.5 − 0.7 0.7 − 0.8

Figure S.8: Estimated correlation between the incidence of pairwise cancers in each of 58 counties of California forCase 1 vs. Case 2: (a) case 1: esophageal and colorectal cancer, (b) case 1: esophageal and lung cancer, (c) case 1:colorectal and lung cancer, (d) case 2: esophageal and colorectal cancer, (e) case 2: esophageal and lung cancer, (f)case 2: colorectal and lung cancer. Maps (a)-(c) exihibit estimated correlations for Case 1, and (d) - (f) are for Case2. Yelllow points indicate signiﬁcant correlations. Note: Maps for larynx cancer are not shown due to non-signiﬁcantcorrelation with any of the other three cancers PREPRINT - F

EBRUARY

8, 2021

Lung cancer−20 − −10−10 − 00 − 55 − 1010 − 32 Esophageal cancer−1.2 − −0.5−0.5 − 00 − 0.10.1 − 0.50.5 − 1Larynx cancer−1 − −0.5−0.5 − −00 − 0.10.1 − 0.50.5 − 1.1 Colorectum cancer−5 − −3−3 − 00 − 11 − 33 − 6

Figure S.9: Maps of posterior mean spatial random effects (with no covariates) using the same order as M Table S.5: The prosterior model probabilities for 24 models p ( M | y ) p ( M | y ) p ( M | y ) p ( M | y ) p ( M | y ) p ( M | y ) p ( M | y ) p ( M | y ) p ( M | y ) p ( M | y ) p ( M | y ) p ( M | y ) p ( M | y ) p ( M | y ) p ( M | y ) p ( M | y ) p ( M | y ) p ( M | y ) p ( M | y ) p ( M | y ) p ( M | y ) p ( M | y ) p ( M | y ) p ( M | y )0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.002

*
*