[PDF] A Bayesian Spatial Modeling Approach to Mortality Forecasting

Abstract

This paper extends Bayesian mortality projection models for multiple populations considering the stochastic structure and the effect of spatial autocorrelation among the observations. We explain high levels of overdispersion according to adjacent locations based on the conditional autoregressive model. In an empirical study, we compare different hierarchical projection models for the analysis of geographical diversity in mortality between the Japanese counties in multiple years, according to age. By a Markov chain Monte Carlo (MCMC) computation, results have demonstrated the flexibility and predictive performance of our proposed model.

Full PDF

AA Bayesian Spatial Modeling Approach to MortalityForecasting

Zhen Liu , Xiaoqian Sun Yu-Bo Wang ∗ School of Mathematical and Statistical Sciences,Clemson University, SC, USA ∗ Corresponding author: [email protected] a r X i v : . [ s t a t . M E ] F e b bstractThis paper extends Bayesian mortality projection models for multiple populationsconsidering the stochastic structure and the eﬀect of spatial autocorrelation amongthe observations. We explain high levels of overdispersion according to adjacent loca-tions based on the conditional autoregressive model. In an empirical study, we comparediﬀerent hierarchical projection models for the analysis of geographical diversity in mor-tality between the Japanese counties in multiple years, according to age. By a Markovchain Monte Carlo (MCMC) computation, results have demonstrated the ﬂexibility andpredictive performance of our proposed model.Keywords: Mortality Projection; Overdispersion; MCMC; Conditional AutoregressiveModel Mortality rates have been varying in most countries over half a century. Manyresearchers work on mortality projection modelings applied in wide social ﬁelds, suchas demography and insurance. With the development of approaches, people focus onthe mortality information evolving from single population into multiple regions. Forsome adjacent districts, the conditions of mortality would be inﬂuenced mutually byeach other and potential missing data issues can be solved using the accurate estimatesfrom neighboring areas. Hence, spatiotemporal modeling and stochastic modeling ofmortality forecasting for age groups has drawn more attentions in decades.Besag et al. (1991) suggested a model which combined an autoregressive componentfor adjacent spatial eﬀects and independent error terms component for the observedvariance without explanations by the spatial structure for disease data. However, thetemporal eﬀect to the spatial heterogeneity is not discussed to establish the disease riskmapping model over time.Knorr-Held (2000) considered the interaction of spatial and temporal factors andintroduced a Bayesian spatio-temporal model of Ohio lung cancer accounting for thevariation mixed in the geographical diversity involved across each period. The asso-2iation of temporal varying parts overcomes the drawback of limitation of the singlespatial model for panel data.Arat´o et al. (2006) worked on the research of age-dependent deaths in 150 geographicregions of Hungary by the Bayesian approach. They assumed a binomial distributionas the underlying model for respective age, region and groups in 1997. This structurefocuses on the mortality estimation in a single year rather than the projection forfollowing periods.Greco and Scalone (2013) proposed a extended Poisson log-linear model for mortalityrisk of every provinces in Italy from 1992 to 2009, which sets a Bayesian hierarchicalspatio-temporal modeling framework based on a decomposition for mortality rate. Theauthors propose separate models for each age class without borrowing strength fromeach age groups and the stochastic structure of forecasting is also dropped in this model.For more countries, see Congdon (2007), Rau and Schmertmann (2017), Bergeron-Boucher et al. (2018), etc.Lee and Carter (1992) contributed a log-bilinear mortality projection model in-cluding separate age-speciﬁed parameters and time-series structure parameters whichrepresent the trend of mortality variation over time. The Lee Carter (LC) model impliesthe future mortality rate will follows the same tendency as past periods, but withoutconsidering the accidental eﬀects and improvements of society, since only the stochastictemporal component is responsible for variation in the estimation.Considering the limitation of variation resources in the original LC model, Brouhnset al. (2002) suggested a hierarchical Poisson log-bilinear model. The extended modelkeep the basic form of mortality risk and capture more information of variability viaincluding death and exposure to risk. It is commonly named Poisson Lee-Carter (PLC)model in following researches.Wong et al. (2018) proposed a Poisson Log-normal Lee Carter (PLNLC) model,which introduced an overdispersion term with a normal distribution to the structure ofmortality rate. Their method overcomes drawbacks of mean–variance equality assump-tion in the PLC model by adding more mortality variations across units. For otherextensions of log normal structure about mortality rate, readers are referred to Cairns3t al. (2006), Delwarde et al. (2007), Hyndman and Ullah (2007), etc.To reﬂect more possible uncertainties which aﬀect predicted values and predictionintervals, Czado et al. (2005) implemented the Poisson LC model in the Bayesian frame-work by Markov chain Monte Carlo (MCMC) sampling. Besides these works on themortality projections for single population, some researchers also discuss the relatedextension of the LC model within multiple populations (see Li and Lee (2005), Pedroza(2006), Cairns et al. (2011), Antonio et al. (2015), Liu et al. (2020), etc.).In this paper, we intend to take advantage of age-speciﬁc factors and stochasticstructures in LC model for the projections to ﬁll the gap of the integrated estimationand prediction of age-stratiﬁed mortality rates for multiple nationwide adjacent regionsunder Bayesian paradigm in a long run. Most of mortality research focused on themodeling of age-time alone or separate spatial eﬀect in the single period. An essentialfeature of our model is to link spatial eﬀects to time eﬀects for each age division inthe estimation of mortality with multiple populations. We employ spatial factors inthe model, which extends with univariate distribution to a multivariate normal struc-ture of overdispersion term involving more information among areas. The variances areassumed to be independent for each population in the frame of univariate overdisper-sion. This assumption probably leads to the lack of variation issue. Comparably, ouroverdispersion terms based on dependent covariance structure overcome the previousdrawback since considering more interactions of adjacent regions.The remainder of the paper is organized as follows. In section 2, we presents a briefdescription of trends in mortality for every county in Japan. Section 3 introduces theproposed model along with the prior settings and detailed steps of an Markov chainMonte Carlo (MCMC) sampling. The real data is applied to illustrate the advantage ofthe proposed approach in Section 4. Finally, we conclude with a discussion in Section5. 4

Geographical Mortality Data

In order to perform age-speciﬁc mortality levels at county’s level, we take intoaccount mortality rates by representative age groups and counties in three separateyears between 1975 and 2005. The adopted data are collected by the Japan MortalityDatabase (JMD) referring to 45 Japanese counties from 1975.From ﬁgures 1 to ﬁgure 3, counties are classiﬁed by direct mortality rates level inage group of 30, 50 and 70 for three given years (1975, 1990 and 2005). The generaldeclining trend which occurred in each age group over time can be demonstrated in theseﬁgures. Indeed, we can easily see that the ﬁrst maps in four ﬁgures are dominated bythe darkest highest mortality categories, since almost all the counties fell in the highestintervals. On the other hand, we can clearly see the lowest mortality categories in 2005.Spatial correlation of mortality is supported by the fact that higher death ratesare concentrated in some visible cluster of counties. In this respect, it indicates thatthe importance of spatial structure components when modeling mortality rates for sub-populations in terms of county. Fig. 1.

Maps of observed mortality rates by county at age 30 in Japan at 1975(left), 1990 (middle) and 2005 (right). ig. 2. Maps of observed mortality rates by county at age 50 in Japan at 1975(left), 1990 (middle) and 2005 (right).

Fig. 3.

Maps of observed mortality rates by county at age 70 in Japan at 1975(left), 1990 (middle) and 2005 (right).6

The Proposed Model

Besag et al. (1991) discussed a spatial modeling method to estimate relative risksof diseases under a Bayesian framework. The idea is based on the decomposition ofspatial structure component and speciﬁed regional random eﬀects for a risk parameter η i . It can be depicted as η i = µ + φ i + θ i , (1)where µ is a ﬁxed eﬀect representing the overall risk level across all areas. φ i is an condi-tional autoregressive spatial component and θ i is an ordinary random eﬀect for diversity.Knorr-Held (2000) introduced a Bayesian spatio-temporal model with a expression forlog odds as a combination of temporal and spatial main eﬀects and spatio-temporalinteractions. η it = ln { π it / (1 − π it ) } with η it = α t + θ i + φ i + δ it , (2)where α t and is a temporal eﬀects of year t . θ i and φ i represent features of area i with individual spatial structure and δ it is a spatio-temporal parameter. The objectof analysis is combined 55 to 64 years white male in this model. Further hierarchicalmodeling and extensions stratiﬁed by age are not described in detail.In the Lee-Carter (LC) model, the central death rate m x,t for age x at year t isformulated by the following equation:log m x,t = α x + β x κ t + (cid:15) x,t , (3)where α x denotes the logarithm of mortality rate averaged over time, whereas β x denotesthe age-speciﬁed sensitivity from the averaged pattern as mortality trend varies. Thetrend of mortality change in periods is represented by the time parameter κ t . Thecomponent (cid:15) x,t denotes the error term, with mean 0 and an identical variance σ (cid:15) ,7eﬂecting extra variations excluding age and time eﬀects in the model.Brouhns et al. (2002) modiﬁed the LC model into the following Poisson framework: D x,t | µ x,t ∼ Poisson( E x,t µ x,t ) with log µ x,t = α x + β x κ t , (4)where D x,t is the death toll for the group aged x at time t , E x,t is the matched exposureat risk and µ x,t is the corresponding theoretic mortality rate. Let θ i denote the i th spatial random eﬀect for i = 1 , , . . . , n . We propose theBayesian Lee-Carter model with spatial structure as follows, D ( i ) x,t | µ ( i ) x,t ∼ Poisson( E ( i ) x,t µ ( i ) x,t ) , with log µ ( i ) x,t = α x + β x κ t + γ x θ i , (5) κ t = ϕ + ϕ t + ρ [ κ t − − ϕ − ϕ ( t − (cid:15) t , where (cid:15) t i.i.d. ∼ N (0 , σ κ ). Here, the roles of α x , β x and κ t are same as ones in the LC model.Moreover, γ x denotes the overall sensitivity of the mortality aged x to all regions. Addi-tionally, we expect to explore the interaction between age and spatial eﬀects to explainthe extra spatial heterogeneity accounting for variability in diﬀerent generations.The ﬁrst two lines can be viewed as a generalization of PLC model (4) to a multi-population problem at geographical level, and the last one describes the autoregressiveof order one (AR1) with drift structure of κ t .8et U N × N =  · · · · · · − ρ − ρ . . . ...... . . . . . . . . .0 · · · − ρ  , W =  t ... ...1 t N  , ϕ = ( ϕ , ϕ ) (cid:48) , Q = U (cid:48) U , n + 1 dependence structures of time eﬀects can be written as κ ∼ N ( W ϕ , σ κ Q − ) where κ = ( κ , κ , . . . , κ N ) (cid:48) .For model identiﬁability, we also want to point out that some constraints with two sameones as Lee and Carter (1992) (cid:88) x ∈ Θ age β x = 1 and (cid:88) t ∈ Θ time κ t = 0 . (6)Here, we suppose Θ age = { x , x + 1 , . . . , x + M − } ≡ { x , x , . . . , x M } , Θ time = { t , t + 1 , . . . , t + N − } ≡ { t , t , . . . , t N } and Θ space = { θ , θ , . . . , θ P } denoted thesets of age, time and space considered in the training dataset respectively.For all sets of spatial parameters, we assume an Conditional AutoRegressive (CAR)structure, which can be formulated as a multivariate Normal distribution in terms ofmean vector and covariance matrix. This method is ﬁrst introduced by Besag (1974) toanalyze the similar properties of neighboring regions. We also refer extensions of CARmodel about Bayesian sampling contributed by Banerjee et al. (2014) and De Oliveira(2012).As for θ = ( θ , θ , . . . , θ P ) (cid:48) , assuming θ ∼ N p (cid:0) , σ θ ( M θ − λ W θ ) − (cid:1) , where λ ∈ ( λ − min , λ − max ) (7)and λ min , λ max are the smallest and largest eigenvalues in W θ ,Denote Q θ = M θ − λ W θ , then θ ∼ N p (cid:0) , σ θ Q − θ (cid:1) , (8)9here W θ is the adjacency matrix whose generic ij entry w ij = 1 if i and j areconsidered as neighbors, while w ij = 0 otherwise. Matrix M θ is the diagonal matrixwith diagonal entries equal to the number of neighbors.Assuming the conditional distribution of θ i as follows: θ i | θ − i ∼ N ( λ (cid:88) i (cid:54) = j b ij θ j , σ i ) , where i = { , , · · · , n } , (9) b ij = w ij w i + and σ i = σ θ w i + , (10) w i + = (cid:88) j w ij and w ij =  To assure the tractable full conditional distribution of α , we conduct the samevariable transformation e x = exp( α ) as Czado et al. (2005) and propose e x ∼ Gamma( a x , b x ) , (12)where a x and b x are pre-speciﬁed constants such that π ( e x ) = ( b x ) a x ( e x ) a x − exp( − e x b x ) / Γ( a x ) . As for β = ( β , β , . . . , β M ) (cid:48) , we consider the following non-informative priors β ∼ N (cid:18) M J M , σ β I M (cid:19) ,σ β ∼ InvGamma( a β , b β ) , J M is a M × I M is an identity matrix ofsize M , and a β , b β are pre-speciﬁed constants such that π ( σ β ) = b a β β ( σ β ) − a β − exp( − b β /σ β ) / Γ( a β ) . Note that the proposed priors are non-informative in the sense that they all center at1 /M , the constraint (=1) equally shared by M age groups.Similarly, by setting γ = ( γ , γ , . . . , γ M ) (cid:48) , we consider the following non-informativepriors γ ∼ N (cid:18) M J M , σ γ I M (cid:19) ,σ γ ∼ InvGamma( a γ , b γ ) . Suppose ϕ = ( ϕ , ϕ ) (cid:48) , we consider the following priors for the dependence structureof κ ϕ ∼ N ( ϕ , Σ ) ,ρ ∼ N (0 , σ ρ ) { ρ ∈ ( − , } ,σ κ ∼ InvGamma( a κ , b κ ) , where ϕ , Σ , σ ρ , a κ , and b κ are pre-speciﬁed hyperparameters, and { ρ ∈ ( − , } is anindicator function equal to 1 when ρ is between -1 and 1. Finally, setting the prior of λ and σ θ as λ ∼ Unif( λ − min , λ − max ) and σ θ ∼ InvGamma( a θ , b θ ) . .4 Posterior Computation The full conditional distributions of age parameters are given by π ( e x | · ) ∝ exp( − c x e x )( e x ) D x,. (cid:12)(cid:12)(cid:12)(cid:12) dde x g − ( α x ) (cid:12)(cid:12)(cid:12)(cid:12) π ( e x ) ∝ exp [ − ( b x + c x ) e x ] ( e x ) a x + D x,. − , (13) π ( β x | · ) ∝ p (cid:89) i =1 (cid:89) t ∈ Θ time exp (cid:104) − E ( i ) x,t exp( α x + β x κ t + γ x θ i ) (cid:105) × exp( β x κ t D ( i ) x,t ) × exp (cid:34) − ( β x − M ) σ β (cid:35) , (14) π ( σ β | · ) ∝ ( σ β ) − ˜ a β − exp( − ˜ b β /σ β ) , (15) π ( γ x | · ) ∝ p (cid:89) i =1 (cid:89) t ∈ Θ time exp (cid:104) − E ( i ) x,t exp( α x + β x κ t + γ x θ i ) (cid:105) × exp( γ x θ i D ( i ) x,t ) × exp (cid:20) − ( γ x − M ) σ γ (cid:21) , (16) π ( σ γ | · ) ∝ ( σ γ ) − ˜ a γ − exp( − ˜ b γ /σ γ ) , (17)where the notation “ | · ” represents “conditional on all other parameters and dataG”, c x = (cid:80) i ∈ Θ space c ( i ) x and c ( i ) x = (cid:80) t ∈ Θ time E ( i ) x,t exp( β x κ t + γ x θ i ), D x,. = (cid:80) i ∈ Θ space D ( i ) x,. and D ( i ) x,. = (cid:80) t ∈ Θ time D ( i ) x,t −

1, ˜ a β = a β + M , ˜ b β = b β + (cid:0) β − M J M (cid:1) (cid:48) (cid:0) β − M J M (cid:1) , ˜ a γ = a γ + M ,˜ b γ = b γ + (cid:0) γ − M J M (cid:1) (cid:48) (cid:0) γ − M J M (cid:1) .From (13), (15), and (17) we have e x | · ∼ Gamma( a x + D x,. , b x + c x ) ,σ β | · ∼ InvGamma(˜ a β , ˜ b β ) ,σ γ | · ∼ InvGamma(˜ a γ , ˜ b γ )and can sample each iteration of e ( i ) , σ β and σ γ from the known distributions.12 .4.2 Posterior distributions for Time Parameters In this section, we separately discuss the sampling algorithms for common andpopulation-speciﬁc time parameters because the dependence structure of the latter isfurther regularized by the dirac spike. Let κ − t = κ \{ κ t } = ( κ , . . . , κ t − , κ t +1 , . . . , κ t N ) (cid:48) and η t = ϕ + ϕ t , the full conditional distributions of κ , ϕ , ρ , and σ κ are proportionalto π ( κ t | . ) ∝ n (cid:89) i =1 (cid:89) x ∈ Θ age exp (cid:104) − E ( i ) x,t exp( α x + β x κ t + γ x θ i ) (cid:105) × exp( β x κ t D ( i ) x,t ) × f ( κ t | κ − t ) , (18) π ( ϕ | · ) ∝ exp (cid:20) − σ κ ( ϕ (cid:48) ( Σ ∗ ) − ϕ − κ (cid:48) QW + σ κ ϕ (cid:48) Σ − ) ϕ ) (cid:21) , (19) π ( ρ | · ) ∝ exp (cid:20) − σ κ (cid:18) a ρ ρ + σ κ σ ρ ρ − b ρ ρ (cid:19)(cid:21) { ρ ∈ ( − , } , (20) π ( σ κ | · ) ∝ ( σ κ ) − ( a κ + N/ − exp (cid:20) − σ κ (cid:18) b κ + 12 ( κ − W ϕ ) (cid:48) Q ( κ − W ϕ ) (cid:19)(cid:21) , (21)where f ( κ t | κ − t ) are the conditional distribution of κ t based on AR(1) with a driftin (5), Σ ∗ = ( W (cid:48) QW + σ κ Σ − ) − , a ρ = (cid:80) t N t = t ( κ t − − η t − ) , and b ρ = (cid:80) t N t = t ( κ t − η t )( κ t − − η t − ). Note that when t = t , f ( κ t | κ − t ) ∝ f ( κ t ) f ( κ t +1 | κ t ) ∝ exp (cid:20) − σ κ [( κ t − η t ) + ( κ t +1 − η t +1 − ρ ( κ t − η t )) ] (cid:21) ; (22)when t < t < t N , f ( κ t | κ − t ) ∝ f ( κ t +1 | κ t ) f ( κ t | κ t − ) ∝ exp (cid:20) − σ κ [( κ t − η t − ρ ( κ t − − η t − )) + ( κ t +1 − η t +1 − ρ ( κ t − η t )) ] (cid:21) ;(23)13hen t = t N , f ( κ t | κ − t ) ∝ f ( κ t | κ t − ) ∝ exp (cid:20) − σ κ ( κ t − η t − ρ ( κ t − − η t − )) (cid:21) . (24)From (19), (20), and (21), ϕ , ρ and σ κ are updated by ϕ | · ∼ N ( Σ ∗ ( W (cid:48) Qκ + σ κ Σ − ϕ ) , σ κ Σ ∗ ) ,ρ | · ∼ N  b ρ a ρ + σ κ σ ρ , σ κ a ρ + σ κ σ ρ  { ρ ∈ ( − , } ,σ κ | · ∼ InvGamma (cid:18) a κ + N , b κ + 12 ( κ − W ϕ ) (cid:48) Q ( κ − W ϕ ) (cid:19) . The full conditional distribution is f ( θ i | · ) ∝ (cid:89) x (cid:89) t exp( − E ( i ) x,t exp( α x + β x κ t + γ x θ i )) · exp( γ x θ i D ( i ) x,t ) (25) × exp( − σ i ( θ i − λ (cid:88) i (cid:54) = j b ij θ j ) ) . The posterior distributions of λ and σ θ are f ( λ | · ) ∝ det(Σ θ ) − exp( − θ (cid:48) Σ − θ θ ) × λ ( λ − min , λ − max ) , (26)where Σ θ denoted as σ θ Q − θ . σ θ ∼ InvGamma( a θ + P , b θ + 12 θQ θ θ (cid:48) ) . (27)14 Numerical Analysis

The data to illustrate our proposed method is deaths and exposures to risk in 45counties (excluding Hokkaido and Okinawa without adjacent county) of Japan from theJapanese Mortality Database (JMD). We consider each county as a single population,and calibrate the model on the data from 1975 to 2004 for ages 0-99 while the datafrom 2005 to 2016 is separated for the validation purpose.

Following the prior speciﬁcations in Sections 3.3.1-3.3.3, we consider a x = b x = 1, a β = b β = 0 .

01, and a γ = b γ = 0 .

001 as age-related hyperparameters. For thosehyperparameters related to the spacial and time factors, they are set as a θ = b θ = 0 . λ = 0 . a κ = b κ = 0 . σ ρ = 1, ϕ = (0 , (cid:48) , Σ = 

10 00 10  , respectively. It has tobe mentioned that the pre-speciﬁed values here are similar to the ones in Czado et al.(2005) and non-informative relative to the size (30 × ×

45) of our analyzing dataset.

To evaluate the performance of our spatial model, we generate an MCMC sampleof 20000 iterations with the ﬁrst 10000 as burn-in period, and summarize the posteriormedians of α ( i ) , β , β ( i ) , κ and κ ( i ) along with the 95% highest posterior density (HPD)intervals in sections 4.3.1 and 4.3.2, respectively. Figures 4(a) present the results of α x under the spatial model, where the 95% HPDintervals are obtained by the method suggested in Hoﬀ (2009). From Figure 4(a), wenotice that the posterior distributions of α x have small variances, and that there are15ome features in the estimated curves: The decline from the infant stage to teenageris likely related to the immune system strengthened with growing age. Then, whenages are around 16-21, the health condition may not be the only decisive factor for thehump. It might be blamed on unnatural deaths caused by immature behaviors. For theadult-and-elder stage, the curves consistently increase since deaths happening in thisstage are more related to aging.Similarly, Figure 4(b) and 4(c) shows the posterior median and 95% HPD intervalof factor β x and γ x , respectively. It can be seen that β x are more sensitive to all timechange at children and seniors stage and γ x is lack of sensitivity to spacial variationswith increasing ages. 16 a)(b)(c) Fig. 4.

Plots of the estimated medians of α x , β x and γ x with their 95% HDP intervals.17 .3.2 Time parameters In Figure 5, we present the posterior medians and 95% HPD intervals of κ t . Inaddition to the years 1975-2004, the follow-up 12-year ahead projections of time eﬀectsare also provided via the posterior predictive distributions.To obtain a sample from theposterior predictive distribution of κ t , we have κ [ j ] N + t (cid:48) ∼ N (cid:16) ϕ [ j ]1 + ϕ [ j ]2 ( N + t (cid:48) ) + ρ [ j ] [ κ [ j ] N + t (cid:48) − − ϕ [ j ]1 − ϕ [ j ]2 ( N + t (cid:48) − , ( σ κ ) [ j ] (cid:17) for iterations j = 1 , , . . . , t (cid:48) = 1 , , . . . ,

12. From Figure 5, we observedecreasing trends in most of time windows, which might be attributed to the advancesin medical improvement and social welfare.

Fig. 5.

Plots of the estimated medians of κ t with their 95% HDP intervals and the20-years ahead projections of these parameters. Through this part, we are working on performances of the spatial model, the PoissonLee-Carter (PLC) model and Poisson log-normal Lee–Carter (PLNLC) model. With acommon measurement, we list the square of Pearson residuals under the PLC modelfor each population given as: r ( i )2 x,t = ( D ( i ) x,t − E ( i ) x,t ˆ µ ( i ) x,t ) E ( i ) x,t ˆ µ ( i ) x,t where ˆ µ ( i ) x,t = exp( ˆ α ( i ) x + ˆ β ( i ) x ˆ κ ( i ) t )and estiamtes ˆ α ( i ) x , ˆ β ( i ) x and ˆ κ ( i ) t are posterior medians via Bayesian sampling. Here,heat maps of r ( i )2 x,t are opted to visualize the lack of ﬁt of the PLC model for mortality18ata in three Japanese counties, as depicted in Fig 6 (a) to Fig 8 (a).By a considerable amount of dark color cells dispersing around various regions in theheat maps, there is one evidence showing the lack of ﬁt in the PLC model with ignoringthe spatial eﬀect.Similarly, heat maps of the squared Pearson residuals for our models can be constructed.The expression of squared residuals for our spatial model follows r ( i )2 x,t = ( D ( i ) x,t − E ( i ) x,t ˆ µ ( i ) x,t ) E ( i ) x,t ˆ µ ( i ) x,t where ˆ µ ( i ) x,t = exp( ˆ α x + ˆ β x ˆ κ t + ˆ γ x ˆ θ ( i ) )and here the posterior medians of the parameters α x , β x , γ x , κ t and θ i are plugged intothe expression for an estimate.Fig 6 (c) to 8 (c) shows that there exist more lighter color cells in the heat maps of thespatial model than those in the other two models, which implies an overall improvementin goodness of ﬁt totally. 19 a)(b)(c) Fig. 6.

Heat map of squared Pearson residuals r ( i ) x,t under the PLC model (top), thePLNLC model (middle) and the spatial model (bottom) for Miyagi.20 a)(b)(c) Fig. 7.

Heat map of squared Pearson residuals r ( i ) x,t under under the PLC model(top), the PLNLC model (middle) and the spatial model (bottom) for Tokyo.21 a)(b)(c) Fig. 8.

Heat map of squared Pearson residuals r ( i ) x,t under under the PLC model(top), the PLNLC model (middle) and the spatial model (bottom) for Ehime.Further, we evaluate our spatial model by the comparison of variability for previousthree models. Figures 9 to 11 present the 95% HPD intervals of simulated log mortalityrates at three selected ages 20, 50, and 70 for three models, respectively. In additionto the training time window (years 1975-2004), 12-year ahead projections are provided22o assess the prediction ability of our spatial model. For convenience, model 1, model2 and model 3 denote the spatial model, the PLC model and the PLNLC model in thefollowing discussion.From Figure 9 to 11, we see that almost every observed points of 1975-2004 fallin the 95% prediction intervals (blue curves). Moreover, the spatial model providessolid 12-years ahead projections because validated log rates are down the middle ofthose blue curves. We also notice that the prediction intervals may properly presentthe variability of data by introducing additional spatial terms.23 a)(b)(c) Fig. 9.

Plots of the observed log death rates, ﬁtted log death rates, the associated12-years ahead projection of the log death rates and 95% HDP intervals aged 20under three models for Miyagi (upper), Tokyo (middle) and Ehime (lower).24 a)(b)(c)

Fig. 10.

Plots of the observed log death rates, ﬁtted log death rates, the associated12-years ahead projection of the log death rates and 95% HDP intervals aged 50under three models for Miyagi (upper), Tokyo (middle) and Ehime (lower).25 a)(b)(c)

Fig. 11.

Plots of the observed log death rates, ﬁtted log death rates, the associated12-years ahead projection of the log death rates and 95% HDP intervals aged 70under three models for Miyagi (upper), Tokyo (middle) and Ehime (lower).26

Conclusion

This paper presents a Bayesian approach to estimate and predict mortality formultiple neighbor regions under Poisson log-normal assumption. By illustrations ofnationwide cases, we can observe the evidences of spatiotemporal eﬀect at county levelin Japan. Incorporating a CAR structure in the PLC model, the new model can si-multaneously conduct the observed spatial association and temporal correlation trends,which proposed a model where regional mortality rates estimates can borrow informa-tion from each other. As a result, the issue of large uncertainty of mortality rates viadirect estimation can be mitigated at speciﬁc age groups in some areas.In our extension of projection modeling, the number of death in each populationis treated as conditional independence. Some researchers consider cohort eﬀect mod-els to solve the dependence structure of death tolls among populations (Renshaw andHaberman (2006), Booth and Tickle (2008) , etc). However, even if the dependence be-tween mortality rates by age is not involved in the model, estimated log-mortality ratesare mostly in accord with the observed results. Hence, the assumption of conditionalindependence is reasonable to our proposed approach.27 eferences

Katrien Antonio, Anastasios Bardoutsos, and Wilbert Ouburg. Bayesian Poisson log-bilinear models for mortality projections with multiple populations.

European Actu-arial Journal , 5(2):245–281, 2015.N Mikl´os Arat´o, Ian L Dryden, and Charles C Taylor. Hierarchical Bayesian modellingof spatial age-dependent mortality.

Computational Statistics and & Data Analysis ,51(2):1347–1363, 2006.Sudipto Banerjee, Bradley P Carlin, and Alan E Gelfand.

Hierarchical modeling andanalysis for spatial data . CRC press, 2014.Marie-Pier Bergeron-Boucher, Violetta Simonacci, Jim Oeppen, and Michele Gallo.Coherent modeling and forecasting of mortality patterns for subpopulations usingmultiway analysis of compositions: an application to canadian provinces and territo-ries.

North American Actuarial Journal , 22(1):92–118, 2018.Julian Besag. Spatial interaction and the statistical analysis of lattice systems.

Journalof the Royal Statistical Society: Series B (Methodological) , 36(2):192–225, 1974.Julian Besag, Jeremy York, and Annie Molli´e. Bayesian image restoration, with twoapplications in spatial statistics.

Annals of the institute of statistical mathematics ,43(1):1–20, 1991.Heather Booth and Leonie Tickle. Mortality modelling and forecasting: A review ofmethods.

Annals of actuarial science , 3(1-2):3–43, 2008.Natacha Brouhns, Michel Denuit, and Jeroen K Vermunt. A Poisson log-bilinear re-gression approach to the construction of projected lifetables.

Insurance: Mathematicsand economics , 31(3):373–393, 2002.Andrew JG Cairns, David Blake, and Kevin Dowd. A two-factor model for stochasticmortality with parameter uncertainty: theory and calibration.

Journal of Risk andInsurance , 73(4):687–718, 2006. 28ndrew JG Cairns, David Blake, Kevin Dowd, Guy D Coughlan, and Marwa Khalaf-Allah. Bayesian stochastic mortality modelling for two populations.

ASTIN Bulletin:The Journal of the IAA , 41(1):29–59, 2011.Peter Congdon. A model for spatial variations in life expectancy; mortality in chineseregions in 2000.

International journal of health geographics , 6(1):16, 2007.Claudia Czado, Antoine Delwarde, and Michel Denuit. Bayesian Poisson log-bilinearmortality projections.

Insurance: Mathematics and Economics , 36(3):260–284, 2005.Victor De Oliveira. Bayesian analysis of conditional autoregressive models.

Annals ofthe Institute of Statistical Mathematics , 64(1):107–133, 2012.Antoine Delwarde, Michel Denuit, and Christian Partrat. Negative binomial version ofthe Lee-Carter model for mortality forecasting.

Applied Stochastic Models in Businessand Industry , 23(5):385–401, 2007.Fedele Greco and Francesco Scalone. A space-time extension of the Lee-Carter modelin a hierarchical bayesian frame-work: Modelling provincial mortality in ITALY.

Measuring Uncertainty in Population Forecasts: A New Approach. , page 412, 2013.Peter D. Hoﬀ.

A First Course in Bayesian Statistical Methods . Springer PublishingCompany, Incorporated, 1st edition, 2009. ISBN 0387922997, 9780387922997.Rob J Hyndman and Md Shahid Ullah. Robust forecasting of mortality and fertilityrates: a functional data approach.

Computational Statistics & Data Analysis , 51(10):4942–4956, 2007.Leonhard Knorr-Held. Bayesian modelling of inseparable space-time variation in diseaserisk.

Statistics in medicine , 19(17-18):2555–2567, 2000.Ronald D Lee and Lawrence R Carter. Modeling and forecasting US mortality.

Journalof the American statistical association , 87(419):659–671, 1992.Nan Li and Ronald Lee. Coherent mortality forecasts for a group of populations: Anextension of the Lee-Carter method.

Demography , 42(3):575–594, 2005.29hen Liu, Xiaoqian Sun, Leping Liu, and Yu-Bo Wang. Bayesian poisson log-normalmodel with regularized time structure for mortality projection of multi-population. arXiv preprint arXiv:2010.04775 , 2020.Claudia Pedroza. A Bayesian forecasting model: predicting us male mortality.

Bio-statistics , 7(4):530–550, 2006.Roland Rau and Carl Schmertmann. Bayesian modeling of small-area mortality withrelational model schedules and spatially varying parameters. In . IUSSP, 2017.Arthur E Renshaw and Steven Haberman. A cohort-based extension to the Lee-Cartermodel for mortality reduction factors.

Insurance: Mathematics and economics , 38(3):556–570, 2006.Jackie ST Wong, Jonathan J Forster, and Peter WF Smith. Bayesian mortality fore-casting with overdispersion.