[PDF] A Joint Spatial Conditional Auto-Regressive Model for Estimating HIV Prevalence Rates Among Key Populations

Abstract

Ending the HIV/AIDS pandemic is among the Sustainable Development Goals for the next decade. In order to overcome the gap between the need for care and the available resources, better understanding of HIV epidemics is needed to guide policy decisions, especially for key populations that are at higher risk for HIV infection. Accurate HIV epidemic estimates for key populations have been difficult to obtain because their HIV surveillance data is very limited. In this paper, we propose a so-called joint spatial conditional auto-regressive model for estimating HIV prevalence rates among key populations. Our model borrows information from both neighboring locations and dependent populations. As illustrated in the real data analysis, it provides more accurate estimates than independently fitting the sub-epidemic for each key population. In addition, we provide a study to reveal the conditions that our proposal gives a better prediction. The study combines both theoretical investigation and numerical study, revealing strength and limitations of our proposal.

Full PDF

AA Joint Spatial Conditional Auto-RegressiveModel for Estimating HIV Prevalence Rates

Among Key Populations

Zhou LanSchool of Medicine, Yale UniversityandLe BaoDepartment of Statistics, Penn State UniversitySeptember 3, 2020

Abstract

Ending the HIV/AIDS pandemic is among the Sustainable Development Goalsfor the next decade. In order to overcome the gap between the need for care and theavailable resources, better understanding of HIV epidemics is needed to guide policydecisions, especially for key populations that are at higher risk for HIV infection.Accurate HIV epidemic estimates for key populations have been diﬃcult to obtainbecause their HIV surveillance data is very limited. In this paper, we propose a so-called joint spatial conditional auto-regressive model for estimating HIV prevalencerates among key populations. Our model borrows information from both neighboringlocations and dependent populations. As illustrated in the real data analysis, itprovides more accurate estimates than independently ﬁtting the sub-epidemic foreach key population. In addition, we provide a study to reveal the conditions that ourproposal gives a better prediction. The study combines both theoretical investigationand numerical study, revealing strength and limitations of our proposal.

Keywords:

Conditional Auto-Regressive Model, Cross-Population Dependence, HIV Preva-lence, Key Populations, Missing Data 1 a r X i v : . [ s t a t . A P ] S e p Introduction

Almost four decades since the HIV/AIDS pandemic began, HIV continues to be a lead-ing cause of death (Naghavi et al., 2017). Ending the HIV epidemic is among the Sus-tainable Development Goals ( https://sustainabledevelopment.un.org/ ) for the nextdecade (Alfv´en et al., 2017; Bekker et al., 2018; World Health Organization, 2019). How-ever, it is challenged by the long-standing gap between the need for care and the availableresources to provide care for the populations mostly aﬀected by the HIV epidemic. Thesepopulations are called key populations , and they are at higher risk for HIV, based on sex-ual practices, occupations, and substance use (e.g., injection drug use, female sex workers,and men who have sex with men) (Lyerla et al., 2008; Calleja et al., 2010; Baral et al.,2012). Accurate HIV epidemic estimates among key populations would help determine thegovernments’ policy and resource allocation. To monitor the HIV epidemics among keypopulations, countries rely on anonymous HIV surveillance data which include the samplesize of participants and the proportion of HIV positive cases that are collected at sexuallytransmitted disease (STD) clinics. In most countries, the HIV surveillance data for keypopulations is still very limited. In light of this, a model that more eﬃciently utilizes exist-ing data and produces more accurate estimates of HIV prevalence among key populationsis needed .The generalized linear mixed model is an appealing tool for information pooling. Onemay assume that the number of HIV positive cases follows a binomial distribution withthe unknown proportion parameter corresponding to the HIV prevalence within a keypopulation at a certain time and location. The ﬁxed eﬀects and the the random eﬀects The surveillance data for the remaining population (the population which is not a key population ) arerelatively abundant at clinics, and thus estimating the HIV prevalence among the remaining population isnot the focus of this paper.

Gaussian cosimulation and wasintroduced by Oliver (2003). It has a variety of applications, e.g., environment (Rectaet al., 2012), ecology (Fanshawe and Diggle, 2012), portfolio analysis (Weatherill et al.,2015), etc. We implement this model in a Bayesian way and use the posterior predictivedistribution to impute the missing entries of the key populations. We obtain a substantialimprovement in imputation accuracy on real HIV prevalence datasets.The availability of HIV surveillance is extremely imbalanced for key populations –some locations have data for all key populations while some locations do not have any keypopulation data. We provide an investigation of missing structures to understand when ourproposed model would be expected to yield improved results. Essentially, the complicated3issingness of surveillance data can be categorize into two types: matching and discrepancy .We study the impact of two missing structures on the parameter estimation and missingdata imputation, and ﬁnd an interesting trade-oﬀ between two missing structures.In the rest of the paper, we ﬁrst introduce our motivating data in Section 2 and ourmethod in Section 3. In light of our scientiﬁc goal, which is to impute the missing HIVprevalence among diﬀerent key populations, we use our motivating data to evaluate ourproposal via measuring the accuracy in imputation (Section 4). Given that our proposalprovides more accurate imputations, we further investigate the missing structure trade-oﬀ(Section 5). In Section 6, we conclude with a discussion.

In this paper, we use the HIV surveillance data from three representative countries todemonstrate our proposal. They are Ukraine (2004-2015), Morocco (2001-2018), and Ja-maica (2000-2014). We use subscripts, i , j , k , to denote a population, a location, and ayear, respectively; and we let Y ijk be the number of HIV positive cases and N ijk be thesample size of the population i ∈ { , , ..., I } at the location j ∈ { , , ..., J } in the year k ∈ { , , ..., K } , respectively. Take Ukraine data as an example: i = 1 : 5 represent peoplewho use drugs (IDUs), female sex workers (FSW), clients of female sex workers (Clients),men who have sex with men (MSM), and remaining people; j = 1 : 27 represent the dis-tricts (i.e., Kiev, Crimea, etc); k = 1 : 12 represent the year from 2004-2015. We can easilyprovide naive prevalence rate estimators ˆ p ijk = Y ijk N ijk for the combinations of ( i, j, k ) if thesurveillance data is available. However, except for the remaining people, the observationsof the key populations suﬀer severe data scarcity. Figure 1 gives the missing structure ofUkraine. As stated before, the missingness is due to the limitation of the HIV surveillance4ata for key populations, and thus the corrasponding Y ijk and N ijk are not observed. Ourprimary goal is to provide the HIV prevalence estimates for all key populations.Figure 1: The missing structure of Ukraine from 2004 to 2015 are visualized. The y-axis isfor the populations and the x-axis is for the locations. The red entries are missing and theblue entries are observed. 5 Method

In this section, we give our proposed model. As introduced in Section 2, we let Y ijk and N ijk be the number of HIV infected people and the sample size of participants, respectively.Since Y ijk is the number of infected people among N ijk participants of the population i atthe location j in the year k , we assume that Y ijk follows a binomial distribution with theHIV prevalence rate p ijk , denoted as Y ijk | p ijk ∼ B ( N ijk , p ijk ) . We use the logit link function as the link function, and thus we have the transformedmean as µ ijk = log p ijk − p ijk . Under the framework of linear mixed model, we assume that thevariation of the transformed mean µ ijk can be decomposed into a ﬁxed eﬀect v i ( k ) describingthe population speciﬁc time trend which is a quantity of interest, and a random eﬀect s ij describing the additional variability across populations and locations. This decompositionis expressed as µ ijk = v i ( k ) + s ij . First, we give the speciﬁcation of the ﬁxed eﬀect. In our model, the ﬁxed eﬀect is deﬁned asan eﬀect driven by the population-speciﬁc trend over the years, describing the averaged levelof the prevalence rate in a certain year ( k ). The trend is usually dynamic and population-speciﬁc. For example, Figure 2 indicates that the transformed means (ˆ µ ijk = log ˆ p ijk − ˆ p ijk )present diﬀerent trends among populations and are non-trivial to be handled. Among avariety of nonparametric regression models and given our epidemic real data, we use the6ubic-polynomial regression to model the population-speciﬁc trend, that is that v i ( k ) = β i + (cid:88) p =1 k p β ki . Figure 2: The population-speciﬁc HIV prevalence trends of Ukraine from 2004 to 2015 arevisualized by using scatter plots. The y-axis is for the transformed (ˆ µ ijk ) and the x-axis isfor the years.Next, we introduce the speciﬁcation of the random eﬀect s ij . There are several potentialoptions for this speciﬁcation. The random eﬀect s ij can be simply treated as an eﬀect causedby the variation among locations, e.g., s ij ∼ N (0 , σ i ). However, this speciﬁcation does notallow the spatial dependence among nearby locations. Given that the HIV prevalence datacan be treated as areal data (Rue and Held, 2005), another popular approach is the spatialconditional auto-regressive (CAR) model (Besag, 1974). In the CAR model, we treat thelocations as the nodes of an undirected graph and the nodes are connected if the locationsare adjacent (See Figure 3). The random eﬀects are speciﬁed as s i = [ s i , s i , ..., s iJ ] T ∼ N ( , σ i D ( I − φ i C ) − ) , D is a diagonal matrix whose diagonal entries are the degrees of each node and C is the adjacency matrix of the graph . The population-speciﬁc variance σ i > < φ i < In graph theory, an undirected graph is made up of nodes (locations) which are connected by edges.In this context, degrees of a node is the number of the other nodes which are connected to it. Adjacencymatrix is a n × n symmetric matrix. The entries of the matrix can only be 0 or 1. If node i and node j are connected, the ( i, j )-th and ( j, i )-th entry are 1; otherwise, it is 0. s i and s i (cid:48) is ρ ii (cid:48) L i L Ti (cid:48) , where ρ ii (cid:48) = ρ i (cid:48) i ∈ [ − , Σ i = σ i D ( I − φ i C ) − andits Cholesky decomposition is Σ i = L i L Ti . Thus, the dependence between the population i and the population i (cid:48) is determined by ρ ii (cid:48) : the two populations have large positive depen-dence if ρ ii (cid:48) is close to 1; the two populations have large negative dependence if ρ ii (cid:48) is closeto −

1; and the two populations have no dependence if ρ ii (cid:48) is close to 0. This approach isreferred to as Gaussian cosimulation , and the validity of the above approach for construct-ing cross-covariances has already been provided (Oliver, 2003). Recta et al. (2012) furthergives an intuitive explanation of ρ ii (cid:48) , that is, the correlation between the processes of thepopulation i and i (cid:48) at the same location. To this end, our proposed model is Y ijk | p ijk ∼ B ( N ijk , p ijk ) , µ ijk = log p ijk − p ijk = v i ( k ) + s ij v i ( k ) = β i + (cid:88) p =1 k p β ki , [ s T , s T , ..., s TI ] T ∼ N ( , S ) S =  Σ ρ L L T . . . ρ I L L TI ρ L L T Σ . . . ρ I L L TI ... ... . . . ... ρ I L I L T ρ I L I L T . . . Σ I  IJ × IJ , Σ i = L i L Ti = σ i D ( I − φ i C ) − , ρ ii (cid:48) = ρ i (cid:48) i ∈ [ − ,

1] (1)9e name this model as a joint spatial conditional auto-regressive model and use Markov-chain Monte Carlo (MCMC) to ﬁt the model. We give priors to the unknown parameters:for k ∈ { , , ..., K } , β ki follows a normal distribution with mean 0 and variance 100, de-noted as β ki ∼ N (0 , σ i follows a inverse gamma distribution with a shape parameter0 . .

1, denoted as 1 /σ i ∼ GA (0 . , . φ i follows a uniform distri-bution ranging from 0 to 1, denoted as φ i ∼ U (0 , ρ ii (cid:48) follows a a uniform distributionranging from -1 to 1, denoted as ρ ii (cid:48) ∼ U ( − , β ki and σ i are known to be conjugate priors and frequently used inBayesian analysis of the (generalized) linear model (Gelman et al., 2013), and our spec-iﬁcation brings weak prior information. The uniform prior on φ i has been implementedin several reports (e.g., Lee, 2013; Xue et al., 2018). The uniform prior of ρ ii (cid:48) follows thepractice of Recta et al. (2012). We use NIMBLE (de Valpine et al., 2017) codes to implementour proposal and the codes are attached in Section A of the supplementary materials.In the above statement of this model, we treat all combinations of ( i, j, k ) as observedones. However, note that in our motivating data, many combinations of ( i, j, k ) are notobserved. These unobserved prevalence rates (or their transformed means) can be imputedby the predictive density function f ( µ ( M ) | µ ( O ) , . ), where µ ( M ) is a vector of transformedmeans whose indices are the missing entries and µ ( O ) is a vector of transformed meanswhose indices are the observed entries. The predictive density function can be intuitivelyunderstood as an extended kriging, borrowing information not only from neighbouringlocations but also dependent populations. The NIMBLE codes adopt MCMC imputation(de Valpine et al., 2017) to impute these unobserved prevalence rates for each MCMCiteration via drawing a sample from the predictive density function.10

Model Comparison and Evaluation

Given our scientiﬁc objective, which is to impute the missing HIV prevalence amongdiﬀerent key populations, we apply our proposal and other benchmark methods to theHIV surveillance data described in Section 2 and evaluate their performances via cross-validation. The model speciﬁcations of the benchmark methods and our proposal aresummarized in Table 1. They are distinguished by their random eﬀects: the simple mixedmodel only assumes dependence within a combination of a key population i and a location j ; the CAR model further introduces the spatial eﬀects; our proposal captures both thespatial dependence and cross-population dependence.Table 1: The proposal and the benchmark methods. Method Benchmark Methods ProposalMix Method CAR Mode Joint CAR ModelData Model Y ijk | p ijk ∼ B ( N ijk , p ijk ) , µ ijk = log p ijk − p ijk = v i ( k ) + s ij Fixed Eﬀect v i ( k ) = β i + (cid:80) p =1 k p β ki Random Eﬀect s ij ∼ N (0 , σ i ) s i ∼ N ( , σ i D ( I − φ i C ) − ) [ s T , s T , ..., s TI ] T ∼ N ( , S ) In the following numerical studies, we randomly mark some observed entries Y ijk in thereal data as missing entries. Let H be a set of combinations of indexes ( i, j, k ) which are missing and | H | is the size of this set. Because the true prevalence rates are unknown,the naive prevalence rate estimators ˆ p ijk = Y ijk N ijk are used for accuracy evaluation. Theaccuracy is summarized in terms of mean square error, MSE = (cid:80) ( i,j,k ) ∈ H | H | ( E p ijk − ˆ p ijk ) ,and 99% posterior coverage on these missing entries. Given N posterior samples of p ijk for( i, j, k ) ∈ H , denoted as { p ( t ) ijk : t ∈ { , , ..., N }} , we draw samples { Y ( t ) ijk : t ∈ { , , ..., N }} via Y ( t ) ijk ∼ B ( N ijk , p ( t ) ijk ), and then compute { ˜ p ( t ) ijk = Y ( t ) ijk N ijk : t ∈ { , , ..., N }} . For ( i, j, k ) ∈ H ,we obtain E p ijk = N (cid:80) Nt =1 p ( t ) ijk for MSE calculation. We obtain the empirical 99% posterior11nterval of the density { ˜ p ( t ) ijk = Y ( t ) ijk N ijk : t ∈ { , , ..., N }} and calculate the frequency that theempirical interval covers the naive estimator ˆ p ijk .To demonstrate that our proposal produces better imputations universally, we appliedthe methods to the HIV prevalence data of the three representative countries: Ukraine,Morocco, and Jamaica . We collect 30 ,

000 MCMC samples discarding the ﬁrst 20 , leave-one-location-out cross-validation regarding our case, becauseour model treats years as replications and aims to predict unknown locations for eachreplication (year). The leave-one-location-out cross-validation is described as follows. Foreach fold, we treat all of a key population’s observations within one location over all theyears as missing. We ﬁt the model to the rest of the observed data and calculate theMSE and the posterior coverage of the missing location based on the prediction using theposterior predictive densities. We give the weighted average of these MSEs and posteriorcoverages along with their sample standard deviation (SD) in Table 2. The weights for theweighted average are the missing entries of each component. The leave-one-location-out cross-validation demonstrates that the joint CAR model (our proposal) produces the bestpoint estimates and the most appropriate uncertainties. Both the joint CAR model and theCAR model surpass the mixed model, indicating the importance of borrowing informationfrom neighboring locations. The diﬀerence between the performances of the Joint CARand CAR are due to whether the cross-population dependence is taken into consideration. Because Jamaica has few observations of IDUs, FSW, and MSM, only Clients and the remaining peopleare included in the model ﬁtting. leave-one-location-out cross-validation.

Index Key Population Mixed CAR Joint CAR CountryMean SD Mean SD Mean SDMSE FSW 6.53E-03 8.61E-03 7.47E-03 9.93E-03

We ﬁnally use the prevalence rate of the IDUs in 2009 as an illustrative example (Figure4). We use the left panel of Figure 4 to present the original prevalence map where the entriesare either unobserved or labeled with native prevalence estimator ˆ p ijk . After model ﬁtting,we use the posterior mean E p ijk to impute the observed entries, which are presented in theright panel of Figure 4. 13igure 4: The left panel presents the original prevalence map where the entries are eitherunobserved (grey) or labeled (color) with native prevalence estimator ˆ p ijk . The right paneladditionally use the posterior mean E p ijk to impute the observed entries. In this section, we further investigate when our proposal is expected to improve the accuracyof the missing data imputation. We conjecture that the strength of the cross-populationdependence varies with the structure of the missing data. Considering two populations ofa certain year, we deﬁne two extreme missing structures as follows: • Matching – at each location, the surveillance data of the two populations are eitherboth available OR both missing; • Discrepancy – at each location, the surveillance data are available for one populationAND missing for the other population.14 visual illustration is in Figure 5. The missing structure of real surveillance data formultiple populations is usually a mix of those two extremes. However, studying missingstructure via focusing on the extreme cases make it eﬃcient to evaluate their impact onmissing imputation.In the following two sections, we study the impacts of the two missing structures on twoaspects: (1) imputation robustness and (2) statistical inference of population dependenceparameter ρ ii (cid:48) . The relevant derivations and proofs are summarized in Section B of thesupplementary materials. In this subsection, we investigate the impact of two missing data structures in Figure 5on imputation robustness. For simplicity, we assume that there is only one year ( K = 1),which means that the dummy index k denoting the years is omitted in the following il-lustration. We evaluate the performance of imputation by treating crossed entries as theones to be predicted. The missing entries are predicted by using predictive posterior den-sity, and the densities under diﬀerent missing structures are diﬀerently expressed. Forthe matching structure, the predictive posterior density depends on two components: theobservations of population 1, denoted as µ obs = [ µ , µ , ..., µ , (2 G − ] T , and the obser-vations of population 2, denoted as µ obs = [ µ , µ , ..., µ , (2 G − ] T . For the discrepancystructure, the predictive posterior density depends on two components: the observations ofpopulation 1, denoted as µ obs = [ µ , µ , ..., µ , (2 G − ] T , and the observations of pop-ulation 2, denoted as µ obs = [ µ , µ , ..., µ , G ] T . For both cases, we aim to predict µ pred = [ µ , µ , ..., µ , G ] T . For a more concise illustration, we further assume ﬁxed eﬀectsare zeros. Let Σ pred = L pred L Tpred , Σ obs = L obs L Tobs , Σ obs = L obs L Tobs be the marginalcovariance matrices of µ pred , µ obs , µ obs , respectively; ρ be the population dependence pa-15igure 5: A graphical illustration of the two missing structures. We assume there are 2 G locations in total. The blue boxes indicate the ones which are observed. The red boxesindicate the ones which are missing. The boxes labeled with a crossing indicate the oneswhich are to be predicted.rameter between population 1 and population 2. We express µ pred , µ obs , µ obs as16atching:  µ pred = L pred ( RZ + ( I − RR T ) Z ) , µ obs = L obs Z , µ obs = L obs ( ρ Z + (1 − ρ ) Z ) , Discrepancy:  µ pred = L pred Z , µ obs = L obs ( R T Z + ( I − R T R ) Z ) , µ obs = L obs ( ρ Z + (1 − ρ ) Z ) , where R = L − pred Cov ( µ pred , µ obs )( L Tobs ) − , Z , Z and Z are independently distributedas a multivariate normal distribution with mean and variance I , and A returns thelower Cholesky factor of A . Therefore, the joint distribution of [ µ pred , µ obs , µ obs ] isMatching:  µ pred µ obs µ obs  | .  ∼ N  ,  Σ pred L pred RL Tobs ρ L pred RL Tobs L obs R T L Tpred Σ obs ρ L obs L Tobs ρ L obs R T L Tpred ρ L obs L Tobs Σ obs  Discrepancy:  µ pred µ obs µ obs  | .  ∼ N  ,  Σ pred L pred RL Tobs ρ L pred L Tobs L obs R T L Tpred Σ obs ρ L obs R T L Tobs ρ L obs L Tpred ρ L obs RL Tobs Σ obs  . (2)17iven the joint densities, we have the predictive posterior densities expressed as follows:Matching:[ µ pred | µ obs , µ obs , . ] ∼ N [ L pred RL Tobs Σ − obs µ obs , Σ pred − L pred RR T L Tpred ] , Discrepancy:[ µ pred | µ obs , µ obs , . ] ∼ N [ W µ obs + W µ obs , Σ pred − ( W L obs R T L Tpred + W L obs L Tpred )] , (3)where W = L pred R ( I − ρ R T R ) − L − obs − L pred ρ ( I − ρ RR T ) − RL − obs , W = − L pred ρ RR T ( I − ρ RR T ) − L − obs + L pred ρ ( I − ρ RR T ) − L − obs . (4)In Equation (3), the discrepancy structure borrows information from both populationswhereas the matching structure only borrows information from population 1. Thus theBayesian estimation under the discrepancy structure utilizes more information, which isexpected to perform better than that of the matching structure. Taking a closer look atthe predictive distributions, we ﬁnd that both predictive posterior densities enjoy unbiasedmean after integrating out the observed ones, i.e., µ obs , µ obs . However, the discrepancystructure has a smaller predictive variance than that of the matching structure for any ρ ∈ [ − ,

1] and the diﬀerence is larger if | ρ | is closer to 1. The two variances are equal ifand only if ρ = 0. Thus, ρ controls how much information is borrowed from the dependentpopulations. In summary, we can give a remark below: Remark 1.

The prediction of discrepancy structure is more robust than that of the matchingstructure. The prediction of discrepancy structure borrows information from both popula-tions but the prediction of matching structure only borrows information from neighboringlocations of its own population. G = 1, Figure 6 illustrates the relationship be-tween the predictive variance and the population dependence parameter under two missingdata structures: discrepancy in blue and matching in yellow. We assume that the spatialcovariance matrix of both population 1 and population 2 are Σ = Σ =  . .  . Thediscrepancy structure produces a smaller predictive variance when the absolute value ofpopulation dependence parameter is large. The population dependence parameter has noeﬀect on the predictive variance of matching structure.Figure 6: The relationship between the predictive variance and the population dependenceparameter under two missing data structures: discrepancy in blue and matching in yellow.We assume that G = 1 and spatial covariance matrix of both population 1 and population2 are Σ = Σ . As discussed in Section 5.1, the discrepancy structure provides a robust prediction. How-ever, we also realize that this robustness relies on a valid statistical inference of the pop-19lation dependence parameter ρ . In this section, we discuss the impact of the missingstructures on the estimation of the population dependence parameter ρ . In particular, wecompare the variances of the unbiased estimate, ˆ ρ = f ( µ obs , µ obs ), under two missingdata structures. Both µ obs and µ obs are observed ones as deﬁned in Section 5.1. A lowerbound on the variance of the unbiased estimator can be obtained by using the Cram´er–Raoinequality (Gart, 1959), such as V ar ( ˆ ρ ) ≤ I − ( ρ ). Given a missing structure, the inverseof the Fisher information, I − ( ρ ), is expressed as follows:Matching: I − Matching ( ρ ) = 1 G (1 − ρ ) ρ . Discrepancy: I − Discrepancy ( ρ ) = 1 /T r [ ρ R T ( I − ρ RR T ) − R ] . (5)The above information bounds are attained if ddρ log L ( ρ | µ obs , µ obs ) = a ( ρ )( ˆ ρ − ρ ) where a ( ρ ) is a function of ρ and log L ( ρ | µ obs , µ obs ) is the log likelihood of [ µ obs , µ obs ] with Y ijk marginalized. In addition, I − Matching ( ρ ) < I − Discrepancy ( ρ ) for any ρ and R (See the proofin Supplement B.2). When R = , I − Discrepancy ( ρ ) is inﬁnity meaning that ρ could not beestimated. In summary, we can give a remark below: Remark 2.

The estimator of the population dependence parameter under the matchingstructure is more eﬃcient than that under the discrepancy structure.

Figure 7 shows how I − Matching ( ρ ) varies by ρ and G , which provides a way to controlthe variance of ˆ ρ in practice. For instance, if | ρ | ≥ .

5, then observing data for both pop-ulations in 18 locations could ensure I − Matching ( ρ ) ≤ . K years/replicationsinto consideration, the inverted ﬁsher information is G × K (1 − ρ ) ρ . This means the neededmatching locations can be accumulative over the years. The examples we considered inthis article satisﬁed this requirement. 20igure 7: The contour plot of I − Matching ( ρ ). The x-axis is for number of pairs. The y-axisis for the population dependence parameter.To further validate Remark 2, we also numerically investigate how the spatial parame-ters aﬀect the statistical inference of ρ ii (cid:48) . The simulated data are generated based on ourproposed model (Equation 1). We use the map of Ukraine. We have I = 2, J = 27, and K = 10. For all i, j, k , we have the sample size N ijk = 100. We give the population-speciﬁctrends as v ( k ) = sin(0 . k ) and v ( k ) = cos(0 . k ). The population-speciﬁc random eﬀectsare generated as s i ∼ N ( , S ). The parameters associated with S are speciﬁed as follows.The population-speciﬁc spatial parameters are speciﬁed as σ = σ = 1. For each run, wesample a simulated data as follows, we simulate 50 replications with ρ ∈ { . , . , ..., . } and φ = φ ∈ { . , . , ..., . , . } and hide the assumed missing entries as presentedin Figure 5. From Figure 8, we can ﬁnd that the MSE of posterior mean of ρ is largeunder the structure of discrepancy. The MSE increases if the absolute value of ρ is large.21owever, the structure of matching does not impact the posterior mean of ρ . All theseare consistent with our previous claims using Cram´er–Rao inequality (Gart, 1959).Figure 8: The MSEs of the population dependence parameter varied by spatial correlationparameter φ = φ (x-axis) and population dependence parameter ρ (value on the top ofeach ﬁgure). Putting Remarks 1 and 2 together, we found a trade-oﬀ between two missing structures:

The matching structure gives an eﬃcient estimation of population dependence parameter,but its prediction does not use the cross-population information v.s.

The discrepancy struc-ture gives a robust prediction by borrowing cross-population information but leads to ineﬃ-cient estimation of the population dependence parameter . Given that the cross-populationdependence induced by the

Gaussian cosimulation (Oliver, 2003) has a variety of applica-tions environment (e.g., Recta et al., 2012; Fanshawe and Diggle, 2012), this trade-oﬀ helps22nderstand the strength and limitation of this model in terms of missing data imputation.

Understanding HIV prevalence among key populations is important for HIV prevention.However, accurate estimates have been diﬃcult to obtain because their HIV surveillancedata is very limited. In this paper, we propose a generalized liner mixed model with thespatial conditional auto-regressive feature which captures both the spatial dependence andthe cross-population dependence. The proposed model fully utilizes existing data and pro-vides a useful statistical tool for HIV epidemiologists to impute the unknown prevalencerates and reveal potential spatial/cross-population variation. A substantial improvement indata imputation is obtained in the real data application, primarily resolving our scientiﬁcgoal of estimating the HIV prevalence rates among key populations. The study also moti-vates us to explore the impact of missing data on imputation accuracy. We present bothsimulation results and theoretical results to reveal the strength and limitation of

Gaussiancosimulation when the model is applied to the missing data imputation.Two topics are worthwhile for further investigation. The ﬁrst one is the cross-populationdependence. Our proposal adopts

Gaussian cosimulation (Oliver, 2003), and the correlationparameter describes a linear relationship between the processes of any two populations.However, the actual cross-population dependence may be more complicated than the one wehave proposed. Modern statistical methods such as graphical models may be implementedfor handling the cross-population dependence, but the limited availability of our surveillancedata may be a hurdle. The second one is the data missing mechanism. The missing entriesare not missing completely at random but due to some speciﬁc reasons (e.g., resourceallocation, administrative issues). It would be useful to know why the HIV surveillance23ata were missing for some combinations of key population, year and location, so thatthe potential bias due to missing not at random could be addressed. Unfortunately, suchinformation is not readily available. 24 upplementary Materials

A Codes

We compile our codes into an R package JointSpCAR . The package provides a function

JointCAR() to implement our proposal. We also provide an R markdown script implantation.rmd introducing the function implementation by using synthetic HIV epidemic data. In ad-dition, we also give the functions which are CAR() and

Mixed() . They implement thebenchmark methods which are the CAR model and Mixed model, respectively.

B Proofs

In this section, we give essential proofs of this article.

B.1 Proofs for Section 5.1 Imputation Robustness

In this subsection, we give the proofs for Section 5.1 Imputation Robustness. We give that S = V ar ( µ pred ), S = V ar ([ µ Tobs , µ Tobs ] T ), and S = Cov ( µ pred , [ µ Tobs , µ Tobs ] T ). Thus,for both structures, the posterior mean is S S − [ µ Tobs , µ Tobs ] T and the posterior variancematrix is S − S S − S T .The bottleneck of obtaining the mean and the variance is S − . Lu and Shiou (2002)gives an explicit inverse formula for a 2 × Theorem 1.

Let R =  A BC D  be a square positive semi-deﬁnite matrix where A and D are all square matrices. Let R − =  E FG H  be inversion of R and the components in theinversion are E = A − + A − B ( D − CA − B ) − CA − • F = − A − B ( D − CA − B ) − • G = − ( D − CA − B ) CA − • H = ( D − CA − B ) − Next, we apply Theorem 1 to both structures:

Matching Structure:

Here, we give that A = Σ obs , B = ρ L obs L Tobs , C = ρ L obs L Tobs ,and D = Σ obs . Then, E , F , G , and H are expressed as follows: E = A − + A − B ( D − CA − B ) − CA − = Σ − obs + Σ − obs ρ L obs L Tobs ( Σ obs − ρ L obs L Tobs Σ − obs ρ L obs L Tobs ) − ρ L obs L Tobs Σ − obs = 11 − ρ Σ − obs F = − A − B ( D − CA − B ) − = − Σ − obs ρ L obs L Tobs ( Σ obs − ρ L obs L Tobs Σ − obs ρ L obs L Tobs ) − = − Σ − obs L obs L Tobs Σ − obs ρ − ρ G = − Σ − obs L obs L Tobs Σ − obs ρ − ρ H = ( D − CA − B ) − = ( Σ obs − ρ L obs L Tobs Σ − obs ρ L obs L Tobs ) − = 11 − ρ Σ − obs S S − [ µ Tobs , µ Tobs ] T = [ L pred RL Tobs , ρ L pred RL Tobs ]  − ρ Σ − obs − Σ − obs L obs L Tobs Σ − obs ρ − ρ − Σ − obs L obs L Tobs Σ − obs ρ − ρ − ρ Σ − obs   µ obs µ obs  = L pred RL Tobs Σ − obs µ obs , and the posterior variance is S − S S − S T = Σ pred − [ L pred RL Tobs , ρ L pred RL Tobs ]  − ρ Σ − obs − Σ − obs L obs L Tobs Σ − obs ρ − ρ − Σ − obs L obs L Tobs Σ − obs ρ − ρ − ρ Σ − obs   L obs R T L Tpred ρ L obs R T L Tpred  = Σ pred − [ L pred RL Tobs Σ − obs , ]  L obs R T L Tpred ρ L obs R T L Tpred  = Σ pred − L pred RR T L Tpred iscrepancy Structure: Here, we give that A = Σ obs , B = ρ L obs R T L Tobs , C = ρ L obs RL Tobs , and D = Σ obs . Then, E , F , G , and H are expressed as follows: E = A − + A − B ( D − CA − B ) − CA − = Σ − obs + Σ − obs ρ L obs R T L Tobs ( Σ obs − ρ L obs RL Tobs Σ − obs ρ L obs RL Tobs ) − ρ L obs RL Tobs Σ − obs = ( Σ obs − ρ L obs R T RL Tobs ) − F = − A − B ( D − CA − B ) − = − Σ − obs ρ L obs R T L Tobs ( Σ obs − ρ L obs RL Tobs Σ − obs ρ L obs R T L Tobs ) − = − Σ − obs ρ L obs R T L Tobs ( Σ obs − ρ L obs RR T L Tobs ) − = − ρ ( L Tobs ) − R T ( I − ρ RR T ) − L − obs G = − ρ ( L Tobs ) − ( I − ρ RR T ) − RL − obs H = ( D − CA − B ) − = ( Σ obs − ρ L obs RL Tobs Σ − obs ρ L obs R T L Tobs ) − = ( Σ obs − ρ L obs RR T L Tobs ) − Then the posterior mean is S S − [ µ Tobs , µ Tobs ] T = [ L pred RL Tobs , ρ L pred L Tobs ]  ( Σ obs − ρ L obs R T RL Tobs ) − − ρ ( L Tobs ) − R T ( I − ρ RR T ) − L − obs − ρ ( L Tobs ) − ( I − ρ RR T ) − RL − obs ( Σ obs − ρ L obs RR T L Tobs ) −   µ obs µ obs  = W µ obs + W µ obs , W = L pred R ( I − ρ R T R ) − L − obs − L pred ρ ( I − ρ RR T ) − RL − obs W = − L pred ρ RR T ( I − ρ RR T ) − L − obs + L pred ρ ( I − ρ RR T ) − L − obs and the posterior variance is S − S S − S T = Σ pred − [ W W ]  L obs R T L Tpred ρ L obs L Tpred  = Σ pred − [ L pred R ( I − ρ R T R ) − R T L Tpred − L pred ρ ( I − ρ RR T ) − RR T L Tpred − L pred ρ RR T ( I − ρ RR T ) − ρ L Tpred + L pred ρ ( I − ρ RR T ) − ρ L Tpred ] Next, we want to prove that the predictive variance of the matching structure is largerthan that of discrepancy structure, that is ∆ S = S Matching − S Discrepancy (cid:23) , where S Matching and S Discrepancy are the conditional covariance matrices of the matchingstructure and discrepancy structure, respectively. This is also equivalent to show that U =[ R ( I − ρ R T R ) − R T − ρ ( I − ρ RR T ) − RR T − ρ RR T ( I − ρ RR T ) − ρ + ρ ( I − ρ RR T ) − ρ ] − RR T (cid:23) for all ρ ∈ [ − ,

1] and R . Given the Neumann series, U can be expressed as29 = R ∞ (cid:88) k =0 ( ρ R T R ) k R T − ρ ∞ (cid:88) k =0 ( ρ RR T ) k RR T − ρ RR T ∞ (cid:88) k =0 ( ρ RR T ) k + ρ ∞ (cid:88) k =0 ( ρ RR T ) k − RR T = R ∞ (cid:88) k =0 ( ρ R T R ) k R T − ρ ∞ (cid:88) k =0 ( ρ ) k ( RR T ) k +1 + ρ ∞ (cid:88) k =0 ( ρ RR T ) k − RR T = ∞ (cid:88) k =1 ( ρ ) k ( RR T ) k +1 − ρ ∞ (cid:88) k =0 ( ρ ) k ( RR T ) k +1 + ρ ∞ (cid:88) k =0 ( ρ RR T ) k = ∞ (cid:88) k =1 ( ρ ) k ( RR T ) k +1 − ρ ∞ (cid:88) k =1 ( ρ ) k − ( RR T ) k + ρ ∞ (cid:88) k =1 ( ρ RR T ) k − = ∞ (cid:88) k =1 ρ k Z k − ( I − Z ) , (6)where Z = RR T . Because Z k − ( I − Z ) is positive semi-deﬁnite for all k , then U ispositive semi-deﬁnite. In summary, we proved our statement. B.2 Proofs for Section 5.2 Population Dependence Parameter

In this subsection, we give the proofs for Section 5.2 Population Dependence Parameter.The Fisher information I ( ρ ) (Malag`o and Pistone, 2015) is I ( ρ ) = 12 T r ( S − ∂ S ∂ρ S − ∂ S ∂ρ ) . (7)We have already given the expressions of S − and S in Section B.1. Thus, it is notdiﬃcult to give explicit expression of these I ( ρ ) by plugging in the expressions of S − and S under matching structure or discrepancy structure.Next, we want to prove I − Matching ( ρ ) < I − Discrepancy ( ρ ) for any ρ and R . Our goal isto ﬁnd a lower bond of I − Discrepancy ( ρ ). Given the Neumann series, I − Discrepancy ( ρ ) can beexpressed as 30 − Discrepancy ( ρ ) = 1 /T r [ ρ R T ( I − ρ RR T ) − R ] = 1 /T r [ ρ R ∞ (cid:88) k =0 ( ρ R T R ) k R T ] = 1 /T r [ ρ ∞ (cid:88) k =0 ( ρ ) k ( RR T ) k +1 ] = 1 /T r (cid:34) ρ (cid:32) ∞ (cid:88) k =0 ( ρ ) k ( RR T ) k +1) + ∞ (cid:88) i (cid:54) = j ( ρ ) i ( RR T ) i +1 ( ρ ) j ( RR T ) j +1 (cid:33)(cid:35) Because all the eigenvalues of R are less than 1, then T r ( RR T ) k < G for any k . Thus, thelower bond of I − Discrepancy ( ρ ) is G (1 − ρ ) ρ which is larger than I − Matching ( ρ ). Thus, we provedthat I − Matching ( ρ ) < I − Discrepancy ( ρ ) for any ρ and R . References

Alfv´en, T., Erkkola, T., Ghys, P., Padayachy, J., Warner-Smith, M., Rugg, D. and De Lay,P. (2017), ‘Global AIDS reporting-2001 to 2015: lessons for monitoring the sustainabledevelopment goals’,

AIDS and Behavior (1), 5–14.Bao, L., Salomon, J. A., Brown, T., Raftery, A. E. and Hogan, D. R. (2012), ‘Modelling na-tional HIV/AIDS epidemics: revised approach in the unAIDS estimation and projectionpackage 2011’, Sexually Transmitted Infections (Suppl 2), i3–i10.Baral, S., Beyrer, C., Muessig, K., Poteat, T., Wirtz, A. L., Decker, M. R., Sherman, S. G.and Kerrigan, D. (2012), ‘Burden of HIV among female sex workers in low-income andmiddle-income countries: a systematic review and meta-analysis’, The Lancet InfectiousDiseases (7), 538–549. 31ekker, L.-G., Alleyne, G., Baral, S., Cepeda, J., Daskalakis, D., Dowdy, D., Dybul, M.,Eholie, S., Esom, K., Garnett, G. et al. (2018), ‘Advancing global health and strengthen-ing the HIV response in the era of the sustainable development goals: the internationalAIDS society—lancet commission’, The Lancet (10144), 312–358.Besag, J. (1974), ‘Spatial interaction and the statistical analysis of lattice systems’,

Journalof the Royal Statistical Society: Series B (Methodological) (2), 192–225.Calleja, J. M. G., Jacobson, J., Garg, R., Thuy, N., Stengaard, A., Alonso, M., Ziady, H.,Mukenge, L., Ntabangana, S., Chamla, D. et al. (2010), ‘Has the quality of serosurveil-lance in low-and middle-income countries improved since the last HIV estimates roundin 2007? status and trends through 2009’, Sexually Transmitted Infections (Suppl2), ii35–ii42.de Valpine, P., Turek, D., Paciorek, C. J., Anderson-Bergman, C., Lang, D. T. and Bodik,R. (2017), ‘Programming with models: writing statistical algorithms for general modelstructures with nimble’, Journal of Computational and Graphical Statistics (2), 403–413.Eaton, J. W., Brown, T., Puckett, R., Glaubius, R., Mutai, K., Bao, L., Salomon, J. A.,Stover, J., Mahy, M. and Hallett, T. B. (2019), ‘The estimation and projection packageage-sex model and the r-hybrid model: new tools for estimating HIV incidence trends insub-saharan africa’.Eaton, J. W., Hallett, T. B. and Garnett, G. P. (2011), ‘Concurrent sexual partnershipsand primary HIV infection: a critical interaction’, AIDS and Behavior (4), 687–692.Fanshawe, T. R. and Diggle, P. J. (2012), ‘Bivariate geostatistical modelling: a review and32n application to spatial variation in radon concentrations’, Environmental and EcologicalStatistics (2), 139–160.Gart, J. J. (1959), ‘An extension of the cram´er-rao inequality’, The Annals of MathematicalStatistics pp. 367–380.Gasch, C. K., Hengl, T., Gr¨aler, B., Meyer, H., Magney, T. S. and Brown, D. J. (2015),‘Spatio-temporal interpolation of soil water, temperature, and electrical conductivity in3d+ t: The cook agronomy farm data set’,

Spatial Statistics , 70–90.Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A. and Rubin, D. B. (2013), Bayesian Data Analysis , CRC press.Lee, D. (2013), ‘Carbayes: an r package for bayesian spatial modeling with conditionalautoregressive priors’,

Journal of Statistical Software (13), 1–24.Lu, T.-T. and Shiou, S.-H. (2002), ‘Inverses of 2 × Computers & Mathe-matics with Applications (1-2), 119–129.Lyerla, R., Gouws, E. and Garcia-Calleja, J. (2008), ‘The quality of sero-surveillance inlow-and middle-income countries: status and trends through 2007’, Sexually TransmittedInfections (Suppl 1), i85–i91.Malag`o, L. and Pistone, G. (2015), Information geometry of the gaussian distributionin view of stochastic optimization, in ‘Proceedings of the 2015 ACM Conference onFoundations of Genetic Algorithms XIII’, pp. 150–162.Meyer, H., Katurji, M., Appelhans, T., M¨uller, M. U., Nauss, T., Roudier, P. and Zawar-Reza, P. (2016), ‘Mapping daily air temperature for antarctica based on modis lst’, Remote Sensing (9), 732. 33eyer, H., Reudenbach, C., Hengl, T., Katurji, M. and Nauss, T. (2018), ‘Improvingperformance of spatio-temporal machine learning models using forward feature selectionand target-oriented validation’, Environmental Modelling & Software , 1–9.Naghavi, M., Abajobir, A. A., Abbafati, C., Abbas, K. M., Abd-Allah, F., Abera, S. F.,Aboyans, V., Adetokunboh, O., Afshin, A., Agrawal, A. et al. (2017), ‘Global, regional,and national age-sex speciﬁc mortality for 264 causes of death, 1980–2016: a systematicanalysis for the global burden of disease study 2016’,

The Lancet (10100), 1151–1210.Niu, X., Zhang, A., Brown, T., Puckett, R., Mahy, M. and Bao, L. (2017), ‘Incorporationof hierarchical structure into estimation and projection package ﬁtting with examples ofestimating subnational HIV/AIDS dynamics’,

AIDS (1), S51–S59.Oliver, D. S. (2003), ‘Gaussian cosimulation: modelling of the cross-covariance’, Mathe-matical Geology (6), 681–698.Recta, V., Haran, M. and Rosenberger, J. L. (2012), ‘A two-stage model for incidence andprevalence in point-level spatial count data’, Environmetrics (2), 162–174.Rue, H. and Held, L. (2005), Gaussian Markov Random Fields: Theory and Applications ,CRC press.Spiegel, P. B. (2004), ‘HIV/AIDS among conﬂict-aﬀected and displaced populations: Dis-pelling myths and taking action’,

Disasters (3), 322–339.Weatherill, G., Silva, V., Crowley, H. and Bazzurro, P. (2015), ‘Exploring the impact ofspatial correlations and uncertainties for portfolio analysis in probabilistic seismic lossestimation’, Bulletin of Earthquake Engineering (4), 957–981.34orld Health Organization (2019), The global action plan for healthy lives and well-beingfor all: strengthening collaboration among multilateral organizations to accelerate coun-try progress on the health-related sustainable development goals, Technical report, WHO.Xue, W., Bowman, F. D. and Kang, J. (2018), ‘A Bayesian spatial model to predict diseasestatus using imaging data from various modalities’, Frontiers in Neuroscience12