[PDF] Dynamic modeling of mortality via mixtures of skewed distribution functions

Abstract

There has been growing interest on forecasting mortality. In this article, we propose a novel dynamic Bayesian approach for modeling and forecasting the age-at-death distribution, focusing on a three-components mixture of a Dirac mass, a Gaussian distribution and a Skew-Normal distribution. According to the specified model, the age-at-death distribution is characterized via seven parameters corresponding to the main aspects of infant, adult and old-age mortality. The proposed approach focuses on coherent modeling of multiple countries, and following a Bayesian approach to inference we allow to borrow information across populations and to shrink parameters towards a common mean level, implicitly penalizing diverging scenarios. Dynamic modeling across years is induced trough an hierarchical dynamic prior distribution that allows to characterize the temporal evolution of each mortality component and to forecast the age-at-death distribution. Empirical results on multiple countries indicate that the proposed approach outperforms popular methods for forecasting mortality, providing interpretable insights on the evolution of mortality.

Full PDF

DDynamic modeling of mortality via mixtures of skewed distributionfunctions

Emanuele Aliverti , Stefano Mazzuco , and Bruno Scarpa Department of Economics, University Ca’ Foscari Venezia Department of Statistical Sciences, University of Padova Department of Mathematics “Tullio Levi-Civita”, University of Padova

Abstract

There has been growing interest on forecasting mortality. In this article, we propose a noveldynamic Bayesian approach for modeling and forecasting the age-at-death distribution, focusingon a three-components mixture of a Dirac mass, a Gaussian distribution and a Skew-Normal distri-bution. According to the speciﬁed model, the age-at-death distribution is characterized via sevenparameters corresponding to the main aspects of infant, adult and old-age mortality. The proposedapproach focuses on coherent modeling of multiple countries, and following a Bayesian approachto inference we allow to borrow information across populations and to shrink parameters towardsa common mean level, implicitly penalizing diverging scenarios. Dynamic modeling across years isinduced trough an hierarchical dynamic prior distribution that allows to characterize the temporalevolution of each mortality component and to forecast the age-at-death distribution. Empiricalresults on multiple countries indicate that the proposed approach outperforms popular methods forforecasting mortality, providing interpretable insights on the evolution of mortality.

Keywords:

Bayesian inference; Dynamic modeling; Mixture model; Mortality forecast; Skew-Normal dis-tribution.

In the last decades, changes in life expectancy and population growths have stimulated growing inter-est in sophisticated models for mortality (e.g., Booth, 2020; Zanotto et al., 2020). These approachescharacterize the observed age-speciﬁc regularities of the mortality function and their trends over time,providing a parsimonious representation of the components of mortality and allowing to provide fore-casts. Estimates of past mortality trends provide insights on the global health of a population, describe1 a r X i v : . [ s t a t . A P ] F e b taly France Sweden Norway Ages Y ea r s Figure 1:

Age-at-death distribution across Italy, France, Sweden and Norway from to . relevant aspects of its demographic structure and facilitate comparisons across countries. Similarly,probabilistic forecasts are routinely employed by researchers, practitioners and organizations to aidand support welfare planning strategies such as pension funds, insurances and health systems, amongmany others (e.g., Lutz and Samir, 2010).Human mortality has been generally characterized by several quantities, with the hazard function,the survival function and the probability density function being the most popular ones (e.g., Kleinand Moeschberger, 2006). Although each representation can be directly obtained from the others,their interpretations are complementary and highlight more clearly diﬀerent aspects of mortality (e.g.,Basellini and Camarda, 2019). In this article we will focus on modeling the distribution of the ages atdeath, generally referred to as “age-at-death distribution” and corresponding to the density functionthat characterizes mortality (Keyﬁtz, 2005). In the literature on modeling and forecasting mortality,this quantity has received signiﬁcantly less attention than other quantities such as mortality rates(e.g., Lee and Carter, 1992; Li and Lee, 2005). However, as outlined in Basellini and Camarda (2020);Pascariu et al. (2019), analysis of the age-at-death distribution can be incredibly useful, since it providesa direct evaluation of longevity and lifespan variability that cannot be directly inferred from otherquantities (see also Canudas-Romo, 2010; Cheung et al., 2005).To further see that, Figure 1 illustrates the age-speciﬁc distribution of deaths from four illustrativecountries, over a time period that ranges from to . These data highlight crucial aspects ofthe temporal evolution of mortality, such as the global shift of the curve to older ages—due to theincrease in life expectancy—and the drop of perinatal and infant mortality. Figure 1 also suggests that2uch an evolution has characterized countries diﬀerently, resulting in faster changes in some countries(e.g., Italy) and slower in others (e.g., Sweden). Furthermore, the age-speciﬁc distribution of deathsillustrates that several mortality trends are converging, resulting in similar patterns in terms of modalage at death, infant mortality and compression of old age mortality.The ﬁrst approaches for modeling the distribution of deaths date back to the 19-th century (e.g.,Gompertz, 1825; Makeham, 1860; Pearson, 1897), and have been generalized in the following years(Heligman and Pollard, 1980; Siler, 1983). The main interest of such methods has been on character-izing mortality patterns across ages, focusing on a given country during a speciﬁc year. According tothese approaches, the mortality curve can be characterized using diﬀerent parametric speciﬁcations,accounting for distinct aspects of the mortality curve; see also Dellaportas et al. (2001) and Mazzucoet al. (2018). One notable limitation of these approaches is the focus on single-country modeling;when interest is on forecasting mortality for multiple countries, independent country-speciﬁc analysisare required. This strategy ignores the fact that in most developed nations similar trends are ob-served, and this abundance of information can be appropriately included in the model to improvethe quality of the estimates. Joint modeling of multiple nations, in fact, is expected to improve theestimation of country-speciﬁc parameters and temporal trends, allowing to share information acrossdiﬀerent countries.More recently, researchers have focused on explicitly modeling the temporal evolution of the age-at-death distribution, forecasting the parameters using time-series speciﬁcations (Basellini and Camarda,2019; Pascariu et al., 2019). These methods borrow ideas from death-rates forecasting, such as themilestone approach introduced by Lee and Carter (1992) and sequent developments (e.g., Hyndmanand Ullah, 2007). Potentially, these models can be both used to describe the evolution of mortalityin the past years and to provide forecasts, although the focus of applications is more frequently onforecasting only. However, separate forecasts often lead to unreasonable results, with country-speciﬁcor sex-speciﬁc predictions that diverge substantially across groups. Such divergent forecasts acrosssub-populations are implausible, since industrialized countries have been showing global convergencetrends in terms of mortality levels, and this tendency is expected to endure (Bergeron-Boucher et al.,2017; Oeppen, 2008; Wilson, 2001).Following the breakthrough approach of Li and Lee (2005), diﬀerent authors have focused on“coherent” mortality forecasts, deﬁned as predictions that do not diverge across groups. Such a strategyhas constantly received attention in the past years, being successfully implemented in several models fordeath rates; for example, generalizing compositional-data techniques (Bergeron-Boucher et al., 2017)and functional-data models (Hyndman et al., 2013) for mortality. However, a critical issue with these3ethods concerns with the choice of a population standard to be used as a reference. This choiceis substantially arbitrary, and a common consensus on which sub-population should be used is stilllacking, since results are often sensitive to diﬀerent references (Booth, 2020; Kjærgaard et al., 2016).Motivated by the above discussion, in this article we propose a novel approach for modeling andforecasting the age-at-death distributions of multiple countries. We will focus on the age-distributionof deaths, combining the life tables quantities with the total number of deaths. This procedure allowsto naturally take into account the population size of each country and its age-structure, which areimportant determinants of the age-at-death distribution. The proposed methods can also be applied,without modiﬁcations, directly to the life table quantities, where we can assume that data are providedon artiﬁcial populations with a total number of deaths equal to . However, this would discard animportant source of uncertainty, which corresponds to the population size of a speciﬁc nation. Indeed,imposing a ﬁxed same sample size would weight each nation equally, implying that a relatively smallcountry (e.g. Sweden) would have the same inﬂuence as a much bigger country (e.g., Germany orUnited Kingdom). Therefore, we prefer to use the actual sample size of each county to have a properquantiﬁcation of uncertainty and genuinely account for diﬀerent population structures.We propose a method based on a mixture distribution function of a Dirac mass, a Gaussian dis-tribution and a Skew-Normal distribution, in order to characterize infant, adult and old-age mortalitywith an interpretable set of parameters. Following a Bayesian approach to inference, we specify hi-erarchical priors for the dynamic parameters and the country-speciﬁc coeﬃcients, relying on commonprior speciﬁcations to improve the borrowing of information across nations. Compared to the avail-able techniques, our method allows to model directly the temporal evolution and the country-speciﬁcvariations across multiple sub-populations, avoiding sequential procedures and providing estimatesshrunk towards common levels. Under the proposed methods, coherent predictions are naturally ob-tained without selecting a reference population or an external standard, but instead estimating thesequantities from the available data. We will refer to the proposed dynamic model based on a skeweddistribution function as dysm in the sequel. Our interest is on providing a ﬂexible representation of the age-at-death distribution in multiple coun-tries, dynamically across diﬀerent years or time periods. We will characterize this quantity trougha continuous probability density function ( pdf ) f ( x ; ϑ jt ) which parametrizes the probability of dy-ing at age x , in country j during year t via a set of country- and time-speciﬁc parameters ϑ jt , with4 = 0 , . . . , , j = 1 , . . . , p countries and t = 1 , . . . , T years. Although f ( x ; · ) is often speciﬁed as acontinuous function, in practice we focus on characterizing the age-at-death distribution on a discretegrid, corresponding to the probability p x jt of dying at a speciﬁc age x = 0 , . . . , . Such an aim canbe addressed discretizing f ( x ; ϑ jt ) over a set of disjoint intervals ( x − / , x + 1 / as p x jt = (cid:90) x +1 / x − / f ( z ; ϑ jt ) d z = F ( x + 1 / ϑ jt ) − F ( x − / ϑ jt ) , x = 0 , . . . , , (1)where F denotes the cumulative distribution function ( cdf ) associated with f . This characterizationof probabilities via a discretization of an underlying pdf automatically guarantees that the vector p jt = p jt ( ϑ jt ) = [ p jt , . . . , p jt ] with elements as in Equation 1 is a proper probability vector with p x jt ∈ [0 , and (cid:80) x p x jt = 1 , therefore avoiding further restrictions (e.g., Oeppen, 2008); see the leftpanel of Figure 1 for a graphical representation of the discretization process.We denote as D jt = [ D jt , . . . , D jt ] the number of deaths and as p jt = [ p jt , . . . , p jt ] thedeath probabilities, where the elements D x jt and p x jt denote, respectively, the number of deathsand the probability of dying at age x , in country j during year t . Given the total number of deaths n jt = (cid:80) x D x jt and the probabilities outlined in Equation (1), we model the vector D jt as a multinomialdistribution with ( D jt | n jt , p jt ) ∼ multinom ( n jt , p jt ) , j = 1 , . . . , p, t = 1 , . . . , T. (2)A crucial aspect of this approach involves the speciﬁcation f ( x ; · ) , which induces a model on thevector of probabilities p jt . In practice, diﬀerent parametric models for f ( x ; · ) are available in theliterature, with the Gompertz (1825) and Siler (1983) models being popular options. The age-at-deathdistribution f ( x ; · ) could be also modeled semi-parametrically, using for example splines or other basisfunctions (Basellini and Camarda, 2020; Hyndman et al., 2013). However, when interest is explicitlyon characterizing relevant aspects of mortality – such as the impact of infant mortality or the shapeof old-age mortality – parametric models provide a much clearer interpretation and deeper insights onthe structure of the populations under investigation.In this work, we draw inspiration from Mazzuco et al. (2018) and Zanotto et al. (2020) and relyon a mixture of three components to characterize mortality. Speciﬁcally, in order to facilitate model’sinterpretation, we decompose mortality as the contribution of infant mortality, adult mortality and old-age mortality. Infant mortality is accounted trough a single Dirac mass at age , while adult mortalityis modeled via a Gaussian distribution with mean µ jt ∈ R and standard deviation σ jt ∈ R + . Old-agemortality, instead, is characterized using a Skew-Normal distribution (e.g., Azzalini and Capitanio,5 p p p

56 57 58 59 60 (a) (b)

Ages

Figure 2:

Proposed mixture density function. Panel (a) provides a sketch of the discretization of the pdf into a set ofprobabilities p j , with j = 56 , . . . , . Panel (b) highlights the three components of the mixture density function. ξ jt ∈ R , scale parameter ω jt ∈ R + and shape parameter α jt ∈ R . Theimpact of each component to the overall mortality function is accounted assigning mixture weights π h jt ∈ [0 , such that π jt + π jt + π jt = 1 in order to guarantee that the resulting expression remainsa proper pdf . This choice leads to the following characterization of f ( x ; ϑ jt ) f ( x ; ϑ jt ) = π jt [ x = 0]+ π jt φ (cid:18) x − µ jt σ jt (cid:19) + (3) π j t ω jt φ (cid:18) x − ξ jt ω jt (cid:19) Φ (cid:18) α jt x − ξ jt ω jt (cid:19) , with ϑ jt = ( π jt , π jt , µ jt , σ jt , ξ jt , ω jt , α jt ) , [ · ] denoting the indicator function and φ ( · ) , Φ( · ) denotingthe pdf and the cdf of a standard Gaussian, respectively; we omit the parameter π jt from ϑ jt since π jt = 1 − π jt − π jt due to the constraint on the mixture weights; see the right panel of Figure 1 fora graphical representation of the proposed model for the age-at-death distribution.Equation (3) allows to characterize the age-at-death distribution with ﬂexibility, while also retaininginterpretability of the mortality structure. The ﬁrst component of Equation (3) accounts for infantand perinatal mortality, characterizing deaths in the ﬁrst year of life. As outlined, for example, inFigure 1, perinatal mortality is notably decreased during recent years, mainly due to technologicaladvances (e.g., Wilson, 2001; Zanotto et al., 2020); the mixture weight π jt characterizes the impact ofthis component in the overall mortality. The use of a Dirac mass is particularly adequate for developedcountries, where infant mortality occurs during the ﬁrst year of life and can be properly modeled with asingle mass, avoiding more elaborate speciﬁcations and additional parameters. However, dysm can benaturally adapted to countries with high infant-mortality levels, replacing the Dirac mass with an half-normal distribution and leveraging on the same model architecture. The Gaussian component, instead,is assigned to a relative weight π jt and characterizes adult mortality via the location parameter µ jt σ jt . In addition, these quantities correspond also to the moments of theGaussian component, and therefore can be interpreted directly as the mean and standard deviation ofthe age at adult mortality.The Skew-Normal component, lastly, accounts for the asymmetric shape of old-age mortality, whichhas been frequently observed in developed countries (e.g., Mazzuco et al., 2018; Pascariu et al., 2019).Since most deaths occur at old ages, this component is expected to be associated to the largest mixtureweight π jt . Inference on the structure of old-age mortality rely on such mixture weight and on theSkew-Normal parameters ξ jt , ω jt and α jt , providing insights on the evolution of old-age mortality interms of location, scale and shape of such a component. In practice, it can be convenient to focus onthe mean and standard deviation of old-age mortality, which are more interpretable measures of theshift and compression of the age-at-death distribution. According to the proposed parametrization,these quantities can be conveniently expressed as ˜ µ SN jt = ξ jt + ω jt δ jt (cid:112) /π, ˜ σ SN jt = ω jt (cid:113) − δ jt /π (4)with δ ij = α jt / (cid:112) α jt (e.g., Azzalini and Capitanio, 2013, sec. 2.1.4). Alternatively, it is also possibleto parameterize directly the Skew-Normal distribution in terms of its cumulants (e.g., Azzalini andCapitanio, 2013, sec. 3.1.4). However, for simplicity in implementation and exposition, we prefer tofocus on the outlined parameters, and eventually post-process the estimates to focus on the functionalsof interest. The Multinomial model outlined in Expression 2, with probabilities given by Equations (1) and (3),provides a ﬂexible speciﬁcation for the distribution of the number of deaths occurred in a single country j during a speciﬁc year t . In order to account for the temporal variation of the age-at-death distribution,we model explicitly the dynamic evolution of the parameters ϑ jt trough an hierarchical dynamic model.This strategy will allow us to highlight what aspects of mortality have changed in the past years,characterizing the rates of such changes and facilitating comparison across diﬀerent countries. Inaddition, the inclusion of a dynamic component facilitates the developments of mortality forecastsextrapolating parameters in time, obtaining predictions for the future age-at-death distributions.We specify the hierarchical model in an unconstrained domain, transforming the parameters ofEquation (3). This choice facilitates computation and implementations, allowing to rely on Gaussiandynamic models in the transformed space and avoiding restrictions on the support of the parame-ters. For this purpose, we introduce the reparametrized vector ˜ ϑ jt , with elements ( ˜ ϑ jt , . . . , ˜ ϑ jt ) (cid:101) ϑ jt = ( ˜ ϑ jt , . . . , ˜ ϑ jt ) := (cid:18) log (cid:18) π jt − π jt (cid:19) , log (cid:18) π jt − π jt − π jt (cid:19) , µ jt , log( σ jt ) , ξ jt , log( ω jt ) , α jt (cid:19) . (5)In Equation (5), the mixture weights ( π jt , π jt ) are mapped into ( ˜ ϑ jt , ˜ ϑ jt ) leveraging a cumu-lative logit transformation, while σ jt and ω jt – corresponding to adult and old-age mortality scaleparameters – are transformed into ˜ ϑ jt and ˜ ϑ jt , respectively, via a logarithmic transformations. Notethat each ˜ ϑ jt k ∈ R for k = 1 , . . . , , and that the vector (cid:101) ϑ jt eﬀectively corresponds to a one-to-onereparametrization of ϑ jt .We model the dynamic evolution of the elements ˜ ϑ jt k leveraging a random-walk model with drift.This choice leads to the following speciﬁcation. ˜ ϑ j k ∼ N ( m k , s k )( ˜ ϑ jt k | ˜ ϑ j t − k , β j k , η j k ) ∼ N ( β j k + ˜ ϑ j t − k , η j k ) , t = 1 , . . . , T, (6)In Equation (6), ˜ ϑ j k ∈ R denotes the initial condition of ˜ ϑ jt k , and is assigned to a Gaussian distri-bution with mean m k ∈ R and standard deviation s k ∈ R + . These values correspond to the initialinformation for the dynamic hierarchical model, and characterize a probabilistic representation of ourbeliefs before the data are observed (e.g., West and Harrison, 1997). The drift parameter β j k ∈ R measures the expected diﬀerence across consecutive values of ˜ ϑ jt k and accounts for possible increas-ing or decreasing linear trends in the evolution of the parameters. The inclusion of this componentis particularly relevant in our problem, where we expected to observe clear and important trends inthe evolution of mortality, accounted by the parameters charactering the age-at-death distribution.Conditionally on the value at time ( t − , each ˜ ϑ jt k is recursively distributed as a Gaussian distri-bution with mean β j k + ˜ ϑ j t − k , and standard deviation η j k ∈ R + . Overall, we can interpret dysm as a Multinomial state-space model, where Equation (6) controls the evolution of the latent states viarandom-walks with Gaussian innovations and, conditional on the states, the observed vectors of deathsare modeled as independent Multinomial variables with probabilities given by Equations (1) and (3);see Figure 3 for a graphical representation of the proposed hierarchical dynamic model.Although this dynamic model might seem oversimplistic, we found that such a speciﬁcation leadsto good results in a large variety of applications involving developed countries; see Section 3 for furtherdetails. More elaborate extensions are possible, for example including a moving-average component,increasing the lag of the autoregressive terms or joint modeling multiple components of ˜ ϑ jt k via multi-variate random-walks. However, a random walk with drift on the latent states often characterizes well8 ϑ j ˜ ϑ j · · · ˜ ϑ jt · · · ˜ ϑ j T − ˜ ϑ j T p j p j · · · p jt · · · p j T − p j T D j D j · · · D jt · · · D j T − D j T for country j = 1 , . . . , p Figure 3:

Graphical illustration of the dynamic component of dysm . Solid circles, white squares and gray squaresdenote vectors of dynamic parameters, probabilities characterizing the age-at-death distribution and observed numberof deaths, respectively. many phenomena without over-parameterizing the dynamic component, and is also general enough tobe directly used, without further modiﬁcations, in a large variety of settings; see Durbin and Koopman(2012) and West and Harrison (1997) for related arguments.Additionally, we highlight that Equation (6) is valid for any equally-spaced time grid, and thereforethe proposed approach can be also applied to diﬀerent time scales. In fact, our description focuseson yearly data for simplicity in exposition, but this requirement is not necessary for developing theproposed model in practice. Several important settings involve analysis of mortality across diﬀerenttime scales, such as monthly data or ﬁve-year periods; for example, when interest is on modelingshort-term variations due to extraordinary events (e.g., covid-19 pandemic) or large-scale trends,respectively. The purposely general speciﬁcation of dysm facilitate its use in these diﬀerent settings,without modifying the model speciﬁcation.

As discussed in the Introduction, a crucial aspect of mortality forecasting involves the penalizationof diverging scenarios via coherent modeling. This can be obtained explicitly including a commontrend across countries (e.g., Li, 2013; Li and Lee, 2005), or choosing a reference population (e.g.,Basellini and Camarda, 2019; Hyndman et al., 2013); see also Booth (2020) for a recent discussion.However, these choices are arbitrary and signiﬁcantly aﬀect the ﬁnal results (e.g., Kjærgaard et al.,2016). Therefore, we follow a diﬀerent path and induce coherent predictions via a Bayesian approachto inference, regularizing estimates toward a common mean trend. More speciﬁcally, we introduce acommon prior speciﬁcation for the country-speciﬁc dynamic hierarchical model deﬁned in Equation (6),9 k , η k , (cid:16) ˜ ϑ t k (cid:17) Tt =0 · · · β p k , η p k , (cid:16) ˜ ϑ pt k (cid:17) Tt =0 m k , s k , m β k , s β k , a k , b k k = 1 , . . . , Figure 4:

Graphical illustration of the hierarchical component of dysm , inducing borrowing of information and coherentpredictions across countries, with j = 1 , . . . , p . Top level block denotes ﬁxed hyper-parameters. letting for each coeﬃcient k = 1 , . . . , , β j k iid ∼ N ( m β k , s β k ) , η j k iid ∼ Inv-Gamma ( a k , b k ) for j = 1 , . . . , p, (7)where m β k , s β k , a k and b k denote ﬁxed hyper-parameters; see Figure 4 for an illustration of the hier-archical dependence structure induced by the common prior speciﬁcation.The speciﬁcation of a common prior distribution for the country-speciﬁc parameters allows toshare information across diﬀerent nations, improving estimates for the age-at-death distributions. Inaddition, the introduction of a common prior shrinks estimates toward a common mean, providing animplicit regularization on the parameters. Such a shrinkage is what makes dysm a “coherent” model ,using a data-driven reference mortality level without the need of a user-deﬁned benchmark. The large abundance of research on the evolution of mortality, and the clear interpretation of theparameters in Equation (3), facilitate informative elicitation for the initial conditions of the dynamichierarchical model in Equation (6) and the hyper-parameters of Equation (7). For example, Europeanadult morality is expected to lie between and years with high probability (e.g., Lewer et al.,2020); therefore, we can assume ˜ ϑ j ∼ N (50 , for the initial condition of ˜ ϑ jt , independently for j = 1 , . . . , p . Similarly, informative speciﬁcations for the prior distribution on the drift parameter β j k allow to include information on the direction and intensity of the temporal trends. For instance, wecould center the trend β j around m β = 2 , resulting in an expected increasing trend for the age atadult mortality of years between consecutive time points.When such information is not available or is not reliable, we recommend choosing weakly- or non-10nformative priors and diﬀuse initializations. This aim can be achieved, for example, setting a verylarge variance for ˜ ϑ j k (e.g., s k = 100 ), centering the drift parameters around zero with substantialvariability, and choosing non-informative priors for the state variances setting small values for a k and b k ; for example, a k = b k = 0 . .There is a large literature on simulation methods for non-Gaussian state-space models; see, forexample, Durbin and Koopman (2012, Chapter 13.3), Prado and West (2010, Chapter 6) or Chopinand Papaspiliopoulos (2020) for recent developments. A major complexity in our methods derive fromthe computation of the age-at-death distribution, which links ϑ jt to p jt via Equation (1). Therefore,we conduct posterior inference relying on the r package nimble (de Valpine et al., 2020), that allowsto specify the main structure of dysm , compile it in c++ , and implement an adaptive Metropolis-within-Gibbs mcmc algorithm to simulate from the posterior distribution of the model’s parameters.Speciﬁcally, the parameters of Equation (7) are updated using Gibbs steps, while the remaining areupdated with multivariate Metropolis steps, due to the lack of conjugacy induced by Equation (1).In order to improve the computational performance, we included further modiﬁcations to eﬃcientlycompute the cdf of the mixture density function in Expression (3) using c++ . This operation is nontrivial since it requires to compute the cdf of the Skew-Normal component, which relies on the Owen(1956) T-function; see also Azzalini and Capitanio (2013, Appendix B) for the details and Pateﬁeldand Tandy (2000) for practical considerations. Overall, posterior computations in a setting with p = 12 countries and T = 20 time points requires roughly minutes per iterations on an -core intel I - HQ , . ghz processor running Linux, which is satisfactory for our purposes. Code implementingthe method is publicly available at github.com/emanuelealiverti/dysm . The focus of this Section is to evaluate the ability of dysm to characterize the age-at-death distribu-tions of diﬀerent European countries. In Section 3.2, we are interested in comparing its performancewith some state-of-the-art methods for mortality forecast, focusing on rolling windows scenarios. InSection 3.3 we use the proposed method to estimate mortality and to forecasts for future years, illus-trating in details how dysm can be used to draw thoughtful inferential conclusions by making inferenceon diﬀerent quantities. Analysis will focus on the time range − , and on joint modeling thefollowing p = 12 countries: Austria ( aus ), Switzerland ( che ), England & Wales ( gbr ), West-Germany( deu ), Denmark ( dnk ), Finland ( fin ), France ( fra ), Italy ( ita ), Netherlands ( nld ), Norway ( nor ),Spain ( esp ) and Sweden ( swe ), separately for the male and female populations. These countries were11elected according to the European countries analyzed in other approaches, such as Li and Lee (2005).Mortality data are retrieved from the Human Mortality Database (2020), and the year- and country-speciﬁc number of deaths are obtained with the utilities of the r package demograpy (Hyndman et al.,2019). In order to quantitatively evaluate the performance of dysm in estimating and forecasting mortality,we rely on rolling window scenarios. Speciﬁcally, analysis focus on sub-periods of years of data to ﬁtthe models, and subsequent periods of years of data to provide forecasts. Performance is evaluatedboth in terms of in-sample accuracy—comparing the model ﬁt with the observed years—and interms of forecasts ability—comparing forecasts with the out-of-sample block for the subsequent years. Multiple scenarios are obtained rolling the window forward of steps of years, thereby dividingthe range − in consecutive scenarios.We compare dysm with diﬀerent popular approaches for mortality forecasting: the Lee-Cartermodel (Lee and Carter, 1992), the coherent Li-Lee model (Li and Lee, 2005), the functional mortalitymodel of Hyndman–Ullah (Hyndman and Ullah, 2007), the compositional Oeppen model (Oeppen,2008), and the mem5 model (Pascariu et al., 2019). All methods are conveniently available in acommon interface within the r package MortalityForecast (Pascariu, 2020). Since these methodsfocus on modeling single population data, we perform the analysis on each country separately. Inorder to compare the approaches on the same basis, results are evaluated in terms of the quality ofthe ﬁtted and predicted age-at-death distributions. Under dysm , such estimates are obtained post-processing the mcmc output, computing the expectation of Equation (1) with respect to the posteriordistribution via Monte Carlo integration. Oeppen (2008) and Pascariu et al. (2019) provides insteaddirect estimates of the age-at-death distribution, while for the remaining approaches we follow standardpractices and transform predictions for the death rates into predictions for the age-at-death distribution(e.g., Pascariu, 2020).Posterior inference proceeds separately for the female and male populations. We centered the initialconditions ˜ ϑ j k for the hierarchical dynamic component considering the expected level of mortality inthe temporal window under investigation, centering old-age mortality around years ( m = 70 ),adult mortality around years ( m = 50) and setting the remaining values of m k to . We also let s k = 10 for all k to induce suﬃcient variability in such initial prior guesses and cover the temporalwindow under investigation. Lastly, we speciﬁed a shared standard Gaussian for the drift parameters β j k and a non-informative Inverse-Gamma on the variances of the dynamic innovations η j k , setting12 emale malemae mse mae msedysm [1 . , . [1 . , . [1 . , . [2 . , . Lee-Carter 1.40 [1 . , . [1 . , . [1 . , . [2 . , . Li-Lee 1.23 [0 . , . [1 . , . [1 . , . [2 . , . mem5 [1 . , . [1 . , . [1 . , . [3 . , . Oeppen 1.13 [0 . , . [1 . , . [1 . , . [2 . , . Table 1:

In-sample relative performance. Median across rolling windows and countries. Second and third quartiles arereported in squared brackets. a k = 0 . , b k = 0 . for all k . In each scenario, posterior inference for dysm relies on iterationscollected after a burn-in of , thinning one iteration every . Convergence was assessed in termsof mixing, autocorrelation of the chains and Geweke diagnostics, which resulted satisfactory in all thewindows considered, with an Eﬀective Sample Size larger than out of stored iterations forall the parameters.Performance is evaluated in terms of Mean-Squared-Error ( mse ) and Mean-Absolute-Error ( mae )between the ﬁtted and predicted age-at-death distribution and the ground truth. Since the elementsof age-at-death distribution are generally very small numbers, we report results in relative terms,dividing the performance of each approach by the performance obtained by dysm in each scenario andcountry. Recalling that both mse and mae are measure of discrepancy, results larger than indicatethat predictions under dysm achieve an error which is lower than the competitor, and therefore wecan conclude that dysm performs better; the opposite holds for values lower than .Results referring to in-sample evaluation are reported in Table 1, computing the median, ﬁrst andthird quartiles of the relative performances across countries and rolling windows scenarios. Currentresults suggest that dysm outperforms the competitors both for the male and female populations, witha performance that is better than the competitors in all settings. These results conﬁrm the abilityof dysm to characterize with ﬂexibility age-at-death distributions across multiple countries and time females malesmae mse mae msedysm [1 . , . [1 . , . [1 . , . [1 . , . Lee-Carter 1.30 [1 . , . [1 . , . [1 . , . [1 . , . Li-Lee 1.28 [0 . , . [0 . , . [1 . , . [1 . , . mem5 [2 . , . [3 . , . [1 . , . [3 . , . Oeppen 1.29 [0 . , . [1 . , . [1 . , . [1 . , . Table 2:

Out-of-sample relative performance. Median across rolling windows and countries. First and third quartilesare reported in squared brackets. dysm , which obtains out-of-sample forecasts more accurate than thecompetitors in all settings. This important results suggest that, overall, dysm is the most accuratemethods for modeling and forecasting the age-at-death distribution in the European countries underinvestigation. Performances stratiﬁed by countries are reported in Tables A1-A2 and A3-A4, andprovide insights on the speciﬁc countries in which dysm might perform worse than the competitors.Considering, for example, in-sample performance reported in Table A1-A2, we observe that dysm performs slightly worse than some competitors only in the female Danish and Dutch populations. Whenout-of-sample performance is considered, only results for the male Spanish and Swedish populationsindicate that dysm is slightly less accurate than some competitors in these countries.

In Section 3.2, we have demonstrated that dysm outperforms the competitors in terms of the qualityof the ﬁtted and predicted age-at-death distributions. In this Section we argue that the proposedmethod has concrete practical beneﬁts in terms of mortality modeling also from an interpretativepoint of view. Indeed, posterior inference on the parameters characterizing dysm and their functionalsprovides insights on the structure of mortality and its future evolution, highlighting aspects that havelarger impact in such changes and explaining diﬀerences across countries.To illustrate this, we focus on the same European countries as in the previous section, modelingthe more recent period − and providing forecasts until . Posterior simulation for thisapplication relies on the same settings as in Section 3.2. Results in terms of mixing, autocorrelations,Geweke diagnostic and eﬀective sample size were satisfactory for all the parameters considered, withan eﬀective sample size larger than out of iterations for all the parameter. We conductinference on the set of parameters ( π jt , π jt , π jt , µ jt , σ jt , α jt , ˜ µ SN jt , ˜ σ SN jt ) to further facilitate interpre-tation. Results are reported in Figure 5, illustrating the posterior medians and credible intervalsfor the estimated parameters for male and female populations separately.Current empirical ﬁndings are consistent with known aspects of the temporal evolution of mortalityof the past decades. For example, the impact of infant mortality—measured via the parameters π j t —is notably decreased in all countries in the past years, dropping from values close to . to . .Although both male and female populations share such decreasing trends, we also observe that infantmortality is larger for males than females (e.g., Drevenstedt et al., 2008). Another clear aspect involvesthe evolution of the mean age of old-age mortality, captured via the Skew-Normal mean ˜ µ SN jt , which14 ALE s t MALE a t MALE m ~ tSN MALE s ~ tSN MALE p MALE p MALE p MALE m t FEMALE s t FEMALE a t FEMALE m ~ tSN FEMALE s ~ tSN FEMALE p FEMALE p FEMALE p FEMALE m t AUSCHE DEUDNK ESPFIN FRAGBR ITANLD NORSWE

Figure 5:

Results for the proposed approach (posterior medians and credible intervals). The vertical dotted linesindicate the last observed period (2017). has constantly increased in all countries. The overall trend is similar between males and females, withwoman reporting older ages on average. Similarly, the decreasing trend of ˜ σ SN jt is consistent with theempirical evidence on compression of old-age mortality.Under dysm , estimates and credible intervals for any complex functional of the model’s parametersare easily obtained. More importantly, prediction intervals are determined taking into account theuncertainty of the entire inferential process, which is not the case for other coherent models, whereall the uncertainty related to choice of reference is generally not accounted for. Some interestingfunctionals include death rates and life expectancies, which can be obtained from the age-at-deathdistribution via deterministic transformations; dysm allows to conduct coherent inference also on thisquantities, rigorously characterizing uncertainty post-processing the mcmc output. As an illustrativeexample, Figure 6 depicts the life-expectancy at birth and at , , , and years, consideringmale and female populations in Italy and Sweden. Results are consistent with the conclusion above,15 wedene Swedene Swedene Swedene Swedene Swedene Italye Italye Italye Italye Italye Italye L i f e e x pe c t an cy a t age x , e x FEMALE MALE

Figure 6:

Italy and Sweden, female and male population. Life expectancy at ages (at birth), , , , and . conﬁrming known trends of life-expectancy at all ages. In this article we have introduced dysm , a novel approach for modeling and forecasting the age-at-death distribution of multiple countries across time. The proposed method automatically regularizediverging scenarios, and allows to obtain predictions which are coherent across sub-populations. Inaddition, this strategy leads to a borrowing of information which improved the quality of the ﬁt andof the forecasts, compared with state-of-the-art methods for mortality forecasting. Another importantadvantage of the proposed method is that it provides coherent forecasts without the need of choosing areference population, which is instead naturally derived from data. The choice of reference population,in fact, is still an open issue of coherent forecast models and has received notable attention fromscholars in recent years (Booth, 2020; Kjærgaard et al., 2016).We have focused directly on the age-distribution for the number of deaths, considering the totalnumber of deaths as a given quantity and relying on a Multinomial likelihood. This approach allows tofocus directly on the distributions of deaths across diﬀerent ages, accounting for the heterogeneity inthe overall mortality trends. For future developments, it might be useful to include a further dynamiclevel in the hierarchy and model the process of death counts n j t . Such an aim can be achieved, forexample, using an hierarchical Poisson or Negative-Binomial dynamic model.16 CKNOWLEDGMENTS

This work was supported by the grant miur–prin

A ADDITIONAL RESULTS ON PERFORMANCE EVALUATION u s c h e d e u d n k e s p f i n females m a e d y s m H y nd m a n - U ll a h . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] L ee - C a rt e r . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] L i - L ee . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] m e m . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] O e pp e n . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] m s e d y s m H y nd m a n - U ll a h . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] L ee - C a rt e r . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] L i - L ee . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] m e m . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] O e pp e n . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] males m a e d y s m H y nd m a n - U ll a h . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] L ee - C a rt e r . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] L i - L ee . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] m e m . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] O e pp e n . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] m s e d y s m H y nd m a n - U ll a h . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] L ee - C a rt e r . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] L i - L ee . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] m e m . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] O e pp e n . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] T a b l e A : I n - s a m p l e r e l a t i v e p e r f o r m a n c e . M e d i a n a c r o ss r o lli n g w i nd o w s , s t r a t i ﬁ e db y c o un t r y ( au s , c h e , d e u , d n k , e s p , f i n ) . F i r s t a nd t h i r d q u a r t il e s i n s q u a r e db r a c k e t s . r a g b r i t an l d n o r s w e females m a e d y s m H y nd m a n - U ll a h . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] L ee - C a rt e r . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] L i - L ee . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] m e m . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] O e pp e n . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] m s e d y s m H y nd m a n - U ll a h . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] L ee - C a rt e r . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] L i - L ee . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] m e m . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] O e pp e n . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] males m a e d y s m H y nd m a n - U ll a h . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] L ee - C a rt e r . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] L i - L ee . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] m e m . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] O e pp e n . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] m s e d y s m H y nd m a n - U ll a h . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] L ee - C a rt e r . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] L i - L ee . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] m e m . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] O e pp e n . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] T a b l e A : I n - s a m p l e r e l a t i v e p e r f o r m a n c e . M e d i a n a c r o ss r o lli n g w i nd o w s , s t r a t i ﬁ e db y c o un t r y ( f r a , g b r , i t a , n l d , n o r , s w e ) . F i r s t a nd t h i r d q u a r t il e s i n s q u a r e db r a c k e t s . u s c h e d e u d n k e s p f i n females m a e d y s m H y nd m a n - U ll a h . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] L ee - C a rt e r . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] L i - L ee . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] m e m . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] O e pp e n . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] m s e d y s m H y nd m a n - U ll a h . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] L ee - C a rt e r . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] L i - L ee . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] m e m . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] O e pp e n . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] males m a e d y s m H y nd m a n - U ll a h . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] L ee - C a rt e r . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] L i - L ee . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] m e m . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] O e pp e n . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] m s e d y s m H y nd m a n - U ll a h . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] L ee - C a rt e r . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] L i - L ee . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] m e m . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] O e pp e n . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] T a b l e A : O u t - o f - s a m p l e r e l a t i v e p e r f o r m a n c e . M e d i a n a c r o ss r o lli n g w i nd o w s , s t r a t i ﬁ e db y c o un t r y ( au s , c h e , d e u , d n k , e s p , f i n ) . F i r s t a nd t h i r d q u a r t il e s i n s q u a r e db r a c k e t s . r a g b r i t an l d n o r s w e females m a e d y s m H y nd m a n - U ll a h . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] L ee - C a rt e r . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] L i - L ee . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] m e m . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] m s e d y s m H y nd m a n - U ll a h . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] L ee - C a rt e r . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] L i - L ee . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] m e m . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] O e pp e n . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] males m a e d y s m H y nd m a n - U ll a h . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . . [ . , . ] L ee - C a rt e r . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] L i - L ee . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] m e m . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] O e pp e n . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] m s e d y s m H y nd m a n - U ll a h . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] L ee - C a rt e r . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] L i - L ee . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] m e m . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] O e pp e n . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] . [ . , . ] T a b l e A : O u t - o f - s a m p l e r e l a t i v e p e r f o r m a n c e . M e d i a n a c r o ss r o lli n g w i nd o w s , s t r a t i ﬁ e db y c o un t r y ( f r a , g b r , i t a , n l d , n o r , s w e ) . F i r s t a nd t h i r d q u a r t il e s i n s q u a r e db r a c k e t s . EFERENCES

Azzalini, A. and Capitanio, A. (2013).

The skew-normal and related families , volume 3. CambridgeUniversity Press.Basellini, U. and Camarda, C. G. (2019). Modelling and forecasting adult age-at-death distributions.

Population Studies , 73(1):119–138.Basellini, U. and Camarda, C. G. (2020). A three-component approach to model and forecast age-at-death distributions. In Mazzuco, S. and Keilman, N., editors,

Developments in DemographicForecasting , pages 105–129. Springer International Publishing, Cham.Bergeron-Boucher, M.-P., Canudas-Romo, V., Oeppen, J., and Vaupel, J. W. (2017). Coherent forecastsof mortality with compositional data analysis.

Demographic Research , 37:527–566.Booth, H. (2020). Coherent mortality forecasting with standards: Low mortality serves as a guide. InMazzuco, S. and Keilman, N., editors,

Developments in Demographic Forecasting , pages 153–178.Springer International Publishing, Cham.Canudas-Romo, V. (2010). Three measures of longevity: Time trends and record values.

Demography ,47(2):299–312.Cheung, S. L. K., Robine, J.-M., Tu, E. J.-C., and Caselli, G. (2005). Three dimensions of the survivalcurve: Horizontalization, verticalization, and longevity extension.

Demography , 42(2):243–258.Chopin, N. and Papaspiliopoulos, O. (2020).

An introduction to sequential Monte Carlo . SpringerInternational Publishing.de Valpine, P., Paciorek, C., Turek, D., Michaud, N., Anderson-Bergman, C., Obermeyer, F.,Wehrhahn Cortes, C., Rodrìguez, A., Temple Lang, D., and Paganin, S. (2020). nimble : mcmc ,particle ﬁltering, and programmable hierarchical modeling. R package version 0.9.1.Dellaportas, P., Smith, A. F., and Stavropoulos, P. (2001). Bayesian analysis of mortality data. Journalof the Royal Statistical Society: Series A (Statistics in Society) , 164(2):275–291.Drevenstedt, G. L., Crimmins, E. M., Vasunilashorn, S., and Finch, C. E. (2008). The rise and fall ofexcess male infant mortality.

Proceedings of the National Academy of Sciences , 105(13):5016–5021.Durbin, J. and Koopman, S. J. (2012).

Time series analysis by state space methods . Oxford universitypress. 22ompertz, B. (1825). On the nature of the function expressive of the law of human mortality, and ona new mode of determining the value of life contingencies.

Philosophical Transactions of the RoyalSociety of London , (115):513–583.Heligman, L. and Pollard, J. H. (1980). The age pattern of mortality.

Journal of the Institute ofActuaries , 107(1):49–80.Human Mortality Database (2020). University of California Berkeley, USA and Max Planck Institutefor Demographic Research, Germany. [Data downloaded on October 2020].Hyndman, R. J., Booth, H., and Yasmeen, F. (2013). Coherent mortality forecasting: the product-ratiomethod with functional time series models.

Demography , 50(1):261–283.Hyndman, R. J. and Ullah, M. S. (2007). Robust forecasting of mortality and fertility rates: a functionaldata approach.

Computational Statistics & Data Analysis , 51(10):4942–4956.Hyndman, R. J. w. c. f. H. B., Tickle, L., and Maindonald., J. (2019). demography: ForecastingMortality, Fertility, Migration and Population Data . R package version 1.22.Keyﬁtz, N. (2005).

Applied mathematical demography . Springer.Kjærgaard, S., Canudas-Romo, V., and Vaupel, J. W. (2016). The importance of the reference popu-lations for coherent mortality forecasting models. In

European Population Conference .Klein, J. P. and Moeschberger, M. L. (2006).

Survival analysis: techniques for censored and truncateddata . Springer Science & Business Media.Lee, R. D. and Carter, L. R. (1992). Modeling and forecasting us mortality.

Journal of the AmericanStatistical Association , 87(419):659–671.Lewer, D., Jayatunga, W., Aldridge, R. W., Edge, C., Marmot, M., Story, A., and Hayward, A. (2020).Premature mortality attributable to socioeconomic inequality in england between 2003 and 2018:an observational study.

The Lancet Public Health , 5(1):e33–e41.Li, J. (2013). A Poisson common factor model for projecting mortality and life expectancy jointly forfemales and males.

Population Studies , 67(1):111–126.Li, N. and Lee, R. (2005). Coherent mortality forecasts for a group of populations: An extension ofthe Lee-Carter method.

Demography , 42(3):575–594.23utz, W. and Samir, K. (2010). Dimensions of global population projections: what do we knowabout future population trends and structures?

Philosophical Transactions of the Royal Society B:Biological Sciences , 365(1554):2779–2791.Makeham, W. M. (1860). On the law of mortality and the construction of annuity tables.

Journal ofthe Institute of Actuaries , 8(6):301–310.Mazzuco, S., Scarpa, B., and Zanotto, L. (2018). A mortality model based on a mixture distributionfunction.

Population Studies , 72(2):191–200.Oeppen, J. (2008). Coherent forecasting of multiple-decrement life tables: a test using Japanese causeof death data.

Compositional data analysis conference .Owen, D. B. (1956). Tables for computing bivariate normal probabilities.

The Annals of MathematicalStatistics , 27(4):1075–1090.Pascariu, M. D. (2020).

MortalityForecast: Standard Tools to Compare and Evaluate Various MortalityForecasting Methods . R package version 0.8.0.Pascariu, M. D., Lenart, A., and Canudas-Romo, V. (2019). The maximum entropy mortality model:forecasting mortality using statistical moments.

Scandinavian Actuarial Journal , 2019(8):661–685.Pateﬁeld, M. and Tandy, D. (2000). Fast and accurate calculation of Owen T-function.

Journal ofStatistical Software , 5(5):1–25.Pearson, K. (1897).

The Chances of Death, and Other Studies in Evolution , volume 2. E. Arnold.Prado, R. and West, M. (2010).

Time series: modeling, computation, and inference . CRC Press.Siler, W. (1983). Parameters of mortality in human populations with widely varying life spans.

Statisticsin Medicine , 2(3):373–380.West, M. and Harrison, J. (1997).

Bayesian forecasting and dynamic models . Springer Science &Business Media.Wilson, C. (2001). On the scale of global demographic convergence 1950–2000.

Population and Devel-opment Review , 27(1):155–171.Zanotto, L., Canudas-Romo, V., and Mazzuco, S. (2020). A mixture-function mortality model: illus-tration of the evolution of premature mortality.