[PDF] A Mortality Model for Multi-populations: A Semi-Parametric Approach

Abstract

Mortality is different across countries, states and regions. Several empirical research works however reveal that mortality trends exhibit a common pattern and show similar structures across populations. The key element in analyzing mortality rate is a time-varying indicator curve. Our main interest lies in validating the existence of the common trends among these curves, the similar gender differences and their variability in location among the curves at the national level. Motivated by the empirical findings, we make the study of estimating and forecasting mortality rates based on a semi-parametric approach, which is applied to multiple curves with the shape-related nonlinear variation. This approach allows us to capture the common features contained in the curve functions and meanwhile provides the possibility to characterize the nonlinear variation via a few deviation parameters. These parameters carry an instructive summary of the time-varying curve functions and can be further used to make a suggestive forecast analysis for countries with barren data sets. In this research the model is illustrated with mortality rates of Japan and China, and extended to incorporate more countries.

Full PDF

MMulti-Populations Mortality Model • June 2018

A Mortality Model forMulti-populations:A Semi-Parametric Approach ∗ Lei Fang , Wolfgang K. Härdle , Juhyun Park † Humboldt-Universität zu Berlin, Lancaster University

Abstract

Mortality is different across countries, states and regions. Several empirical research works however revealthat mortality trends exhibit a common pattern and show similar structures across populations. The keyelement in analyzing mortality rate is a time-varying indicator curve. Our main interest lies in validatingthe existence of the common trends among these curves, the similar gender differences and their variabilityin location among the curves at the national level. Motivated by the empirical ﬁndings, we make thestudy of estimating and forecasting mortality rates based on a semi-parametric approach, which is appliedto multiple curves with the shape-related nonlinear variation. This approach allows us to capture thecommon features contained in the curve functions and meanwhile provides the possibility to characterizethe nonlinear variation via a few deviation parameters. These parameters carry an instructive summaryof the time-varying curve functions and can be further used to make a suggestive forecast analysis forcountries with barren data sets. In this research the model is illustrated with mortality rates of Japan andChina, and extended to incorporate more countries.

JEL classiﬁcation: C14, C32, C38, J11, J13Keywords: Mortality forecasting; Common trend; Lee-Carter method; Multi-populations; Semi-parametric modeling ∗ The authors gratefully acknowledge ﬁnancial support from the Deutsche Forschungsgemeinschaft through theInternational Research Training Group IRTG 1792 "High Dimensional Non Stationary Time Series" and the CollaborativeResearch Center CRC 649 "Economic Risk". † Corresponding author: Department of Mathematics and Statistics, Lancaster University, Lancaster LA1 4YF, UK., email:[email protected] a r X i v : . [ s t a t . A P ] S e p ulti-Populations Mortality Model • June 2018

In recent years, global population trend has received widespread attention with regard to thegrowth of the aging populations on the globe, which raises demographic risk in most developedand even some developing countries. Demographic risk is understood to be an imbalance of theage distribution of a society with the obvious implications in economic growth, social stability,political decisions and resource allocation. The factor demography is particularly important inunderstanding the challenges of developing countries and their global impacts, yet, due to limitedaccess and available resources, most studies have focused on the cases from developed countries.This motivated us to develop a novel approach to forecasting a population trend with limited data,using the example from China.China, as a large developing Asian country, is experiencing the transformation to an agingsociety at an even faster rate, and is therefore a good example with which to study demographicrisk. However, due to political reasons and delays in construction of a national system forsystematically collecting statistics, the problem of insufﬁcient and unsatisfying demographic datasets remains unsolved, and this situation poses a unique challenge in statistical analysis. Japan,China’s neighbour, has undergone a dramatic demographic change during the last several decades.In addition, the Japanese government has set up a complete national statistical system in themiddle of the last century, which provides ﬁne demographic data sets in longer time horizons tohelp researchers explore Japan’s demographic transition. A typical example of demographic datain terms of mortality and fertility is shown in Figure 1 from Japan. − − − − − Age

Log dea t h r a t e ( m a l e ) − − − − − Age

Log dea t h r a t e ( f e m a l e )

10 20 30 40 50 . . . . . . . Age F e r t ili t y r a t e Figure 1:

Japan’s descriptive demography - male (left) and female (middle) mortality dynamics from 1947 to 2012,fertility (right) dynamics from 1947 to 2009, rates in different years are plotted in rainbow palette order.

MuPoMoAs we shall demonstrate later with our new formulation of the problem, those Japanese datasets could offer a good reference point to studying the pattern in China. However, due to thetime-lag in the phase of the development of each country, a direct comparison is not feasible.In this work, we propose to incorporate a nonlinear transformation across different periods2ulti-Populations Mortality Model • June 2018by formulating a semi-parametric regression model. The advantages and shortcomings of ourapproach will be discussed in detail. To overcome some of the shortcomings, our approach willbe extended by taking into account a trend in multi-populations using global Human MortalityDatabase (2013).Our approach could be understood as an extension of the coherent forecasting advocated by (Li& Lee, 2005) with the idea that a local trend estimation could be improved by taking into accountthe patterns in a larger group it belongs to. This requires a serious consideration of the notion ofsimilarity between the members. Until now, the idea of coherent forecasting was mostly discussedin the context of concurrent comparison with a small well-deﬁned neighboring countries. In thisarticle we attempt to demonstrate its generality in a wider context, with a new formulation underthe shape invariant models (Lawton et al., 1972). We ﬁrst introduce the datasets at our disposal todemonstrate our approach before presenting a brief literature review.

The demographic data sets are collected from different sources: China data sets are extracted fromthe China Statistical Year Book, the other 35 countries are obtained from the Human MortalityDatabase (HMD). For data sets on China the mortality rates are gender-age speciﬁc starting from0 to 90+ years old, while the mortality rates of other countries are gender-age speciﬁc from 0 to110+ since the Human Mortality Database makes estimates and adjustments on the raw data toextend into wider age group. As for the sample size, they are all different as well: China mortalitydata spans from 1994 to 2010 with missing data of years 1996, 1997, 2001 and 2006, while thesample size from other countries range from 14 years (Chile) to 261 years (Sweden). Referring tothe missing values, we use moving average of neighboring ﬁve years to compute these.Recall that the deﬁnition of the mortality rate is the number of deaths per 1000 living individualsper single calender year. To ﬁt the mortality trend more precisely and for visual convenience, wepresent the log mortality.Data and sources codes are available online (https://github.com/QuantLet/MuPoMo) andlinked to the ﬁgures.

Since 1980, one of the challenges in demography has been to analyze and forecast mortality in apurely statistical way without involving the subjective opinions of experts. Lee & Carter (1992)(LC) ﬁrstly proposed a stochastic method based on a Singular Value Decomposition techniqueto explore the unobserved demographic information. This proved insightful and gained a goodreputation. Since then, several methods based on stochastic population modelling and forecastinghave been developed, see e.g. Cairn et al. (2008), Booth & Tickle (2008) and Booth (2006) forreview. Among all the stochastic models, the most popular one is the LC model, which wasused to analyze the U.S. mortality rates from 1933 to 1987. Based on the idea of the LC model,3ulti-Populations Mortality Model • June 2018comparisons of different methods and some variants or extensions have been developed. Lee &Miller (2001) compared the forecasts of LC model with the U.S. social security system forecasts. Liet al. (2004) proposed another method when there are fewer observations at uneven intervals, andapplied it to China and South Korea. Hyndman & Ullah (2007) developed a more general methodby treating the underlying demographic process as functional data, employing the functionalprincipal components analysis to extract more than one explanatory components and providingrobust estimation and forecast. This idea is further extended to multilevel functional data in Shang(2016); Gao & Shang (2017). These linear extensions are successful in capturing complex variationat the aggregate level at the expanse of ease of interpretability of the LC model.In light of limited data access combined with fragile quality, less technically reﬁned methodsare available for Asian countries compared to, for example, developed western countries. Anexception is the stochastic population approach on Asian data by Li et al. (2004), who implementedthe LC model to sparse data. In their work, they generated central forecast with just the ﬁrst andlast observations along the time horizon, improved the estimates by additional observations andevaluated its performance with other existing methods. Raftery et al. (2012) proposed a Bayesianmethod for probabilistic population projections for all countries, where the Bayesian hierarchicalmodels, estimated via Markov chain Monte Carlo, are applied to the United Nations populationdata. In cases of limited data and similar demographic trends between two populations (regionalor national level), the Bayesian stochastic modelling for two populations is proposed by Cairnset al. (2011). This motivates us to analyze Chinese demography via taking Japan into reference.One might wonder why Japan, not Taiwan or any other neighboring countries. An interestingﬁnding from Fang & Härdle (2015) is that China has a demographic trend closer to Japan thanto Taiwan, particularly visible in the mortality trend. On the economic level, China and Japanhave been both important economies in last several decades and the development pattern is alsoquite similar. Hanewald (2011) found that the LC mortality index k t correlates signiﬁcantly withmacroeconomic ﬂuctuations in some periods, which provides a good reference with which toconnect the mortality trends between China and Japan. Nevertheless, due to their differences inthe developmental phase, it is much desirable to incorporate nonlinear transformation of the trendto capture the effect of time trend in a ﬂexible manner. Due to the sensitivity of fertility to social policies and induced unpredictability, our research isrestricted to mortality analysis. The purpose of our analysis is to build a modeling frameworkthat takes into account similarities in trend across different countries. We focus on the populartime-varying mortality indicator k t extracted from the standard LC method. Following the usualinterpretation of the mortality indicator as a time trend, we seek to compare the trends in terms oftheir shape variation in time or in phase to measure similarities among different countries. Mostlinear approaches such as those based on principal component analysis are inefﬁcient in capturing4ulti-Populations Mortality Model • June 2018these nonlinear variations and thus difﬁcult to interpret the results. Instead we directly model theshape variation of the curves. To increase ﬂexibility and interpretability of the shape of the trendfunction, we adopt the framework of the semiparametric comparison approach (Härdle & Marron,1990; Kneip & Engel, 1995) and accordingly demonstrate potential improvements in mortalityforecasting. We analyse similarities in mortality between China and Japan, and then extend ourapproach to a global common mortality trend and study a sub-group pattern.A typical setting for the semiparametric approach assumes that (i) measurements are availableon a common interval, often at common and dense grid points, (ii) the population has a welldeﬁned trend in terms of identiﬁable features and (iii) the errors are independent, all of whichare violated in our demographic application. We demonstrate its ramiﬁcations and additionalconsiderations throughout the analysis.The general methodology is presented in Section 2. Section 3 focuses on mortality forecastwith two country comparison, using the example of China and Japan. This serves a motivation forfurther development in Section 4 with multiple populations using the Human Mortality Database.More discussions on economic insights, global aging trend inﬂuences and suggestions are given inthe last section.

In this section, we ﬁrstly introduce the parameters of interests and then outline the LC method,semi-parametric comparison of nonlinear curves and common trend modeling in details.

The parameters of interests are age-speciﬁc mortality rates from multiple countries. We usethe symbols m to denote the mortality rate. All the parameters are indexed by a one-year agegroup, denoted by x , and in addition indexed by time, denoted by t . For instance, m ( x , t ) is themortality rate for age x in year t . Based on the LC model, m ( x , t ) is assumed to be decomposedinto the average age pattern and the time-varying index k t (for single country, state or region).Characterizing k t is essential in understanding the mortality trend, which is the main goal ofthis paper. When it comes to multiple countries, we use k i ( t ) to denote the derived time-varyingmortality indicator for country i , with i ∈ {

1, ..., N } . The benchmark LC model employs the Singular Value Decomposition (SVD) to analyse the timeseries on the log of the age-speciﬁc mortality. The method relies on the standard statistical analysisof the time series. Nonetheless, the LC model does not ﬁt well in some cases where missing datais common or the horizon of time series is not sufﬁcient, the reason being the assumption oflong-term stationarity. 5ulti-Populations Mortality Model • June 2018The basic idea for demography dynamics analysis is to regress mortality m ( x , t ) on non-observable regressors for prediction. The regressors are obtained via SVD of the demographicindicators. It separates the age pattern from the time-dependent components, takes time seriesanalysis on the time-dependent components only and hence forecasts the future trend.The mortality rate m ( x , t ) is hence calibrated via the following model:log { m ( x , t ) } = a x + b x k t + ε x , t , (1)or m ( x , t ) = exp ( a x + b x k t + ε x , t ) ,where a x is the derived age pattern averaged across years and k t represents the only time-varyingindex of mortality level. Thus b x measures the sensitivity of the mortality rates to the change of k t ,reﬂecting how fast the mortality rate changes over ages and ε x , t is the residual term at age x inyear t with E ( ε x , t ) = Var ( ε x , t ) = σ ε .Three unobserved parameters a x , b x and k t in the single equation (1) mean that the LC modelis over-parameterized and therefore two normalisation constraints are imposed: ∑ k t = ∑ b x = k t and b x . The evolution of k t can be further ﬁtted by standard time seriesmodels such as using ARIMA techniques. Lee & Carter (1992) found that a random walk withdrift describes k t quite well: k t = k t − + d + e t where d is the drift parameter reﬂecting the average annual change and e t is an uncorrelated error.Others such as Chan et al. (2008) pointed out that the mortality index k t may be better ﬁtted witha trend-stationary model for Canada, England and United States when accounting for a break inmortality rate decline during the 1970s.Given an h -step ahead forecasting k t + h from the time series models, the forecast of the mortalityrates in future period t + h can be made via the following formula: m ( x , t + h ) = exp ( a x + b x k t + h ) . When the observable curves are noisy versions of similar regression curves, comparison ofregression curves from related samples is not trivial. The problem aggravates when the data aresparse, partially available for some and the domain of the curves do not match, as demonstratedin Figure 2 where we wish to compare the short segment of black dots on the right and the longblue curve below. The black dots on the right with the red solid curve underneath represent the6ulti-Populations Mortality Model • June 2018mortality rates from China, while the elongated blue curve below represents the mortality ratesfrom Sweden. The curve above in cyan is a transformed Swedish curve that gives a best match tothe partial observations of China, which will be deﬁned precisely later. X Y − − Figure 2:

Semi-parametric Comparison of Nonlinear Curves: the dark blue curve and the red curve have similar pattern,while the light blue curve is semi-parametrically shifted to represent the red curve via the dark blue curve.

MuPoMoIt is not unreasonable to assume that developing countries will catch up with the developedcountries, and due to globalization the trend in mortality rates is expected to follow a similarpattern. Then the main differences among those curves could be explained by shifted time axisand vertical re-scaling, which could further be parametrized for parsimonious representation andease of interpretation. This is our motivation to take a view of semiparametric comparison intaking into account mortality variations of other countries.To handle the noisy data, nonparametric smoothing techniques could be incorporated toestimate the underlying curves when the solid theory is unavailable in modeling them. Hence, weseek general semi-parametric models that allow for nonlinear transformation on the domain aswell as the image of the curves.Simply denoting the underlying curves by f and f , the semi-parametric comparison of thenonlinear curves can be expressed as f ( t ) = θ f (cid:18) t − θ θ (cid:19) + θ , (2)where we assume that f has a similar pattern to f and θ = ( θ , θ , θ , θ ) (cid:62) are shape deviationparameters. Since normally only noisy measurements of the curves are available, f and f are not7ulti-Populations Mortality Model • June 2018directly available, there will be measurements models given by Y i ( t ij ) ≡ Y ij = f i ( t ij ) + ε ij i =

1, 2; j =

1, . . . , n i .More detailed discussions on this method can be referred to Härdle & Marron (1990) on semipara-metric comparison of regression curves. When more than two regression curves share similar pattern or trend, a common trend model canbe built based on the technique of semi-parametric comparison of nonlinear curves we discussedpreviously. Suppose we are given N noisy curves Y i , i =

1, . . . , N that exhibit some similar patterns.A general regression model can be expressed as,Y i = f i + ε i , (3)where f i denote unknown smoothing regression functions while ε i represent independent errorswith mean 0 and variance σ i .The relationship among these similar curves can be described as f i ( t ) = θ i g (cid:18) t − θ i θ i (cid:19) + θ i . (4)Here θ i = ( θ i , θ i , θ i , θ i ) are unknown parameters describing shape deviations, and g is aunknown function specifying the common shape of these curves, which can be interpreted as areference curve. The model in (4) is commonly known as shape invariant model (SIM), ﬁrstlyproposed by Lawton et al. (1972) and further studied by Kneip & Engel (1995) and provides anextension of the model in (2) to multiple curves. A detailed investigation of this model to estimatethe mortality trend will be given in the following section. With reasons mentioned in Section 1, we will ﬁrst focus on analysing mortality similarities betweenChina and Japan. Before presenting our formulation, we ﬁrst graphically show the empiricaltrends in the left of Figure 3, which suggests that the mortality trends from both gender groupsof China correlate with those of Japan respectively. However, due to limited sample they allseemingly reﬂect linear mortality trend over time, which would contradict to the theoreticaland conceptual views on mortality trends. To provide intuitive comparison between these twocountries, the horizontal shift of Japan’s mortality curve over time axis is plotted in the right partof Figure 3. The dotted lines from left to right are shifted Japan’s smoothed trends of 20-, 23- and25- years forward respectively. Graphically we see that Japan’s mortality trend is 23 years earlierthan China.8ulti-Populations Mortality Model • June 2018 − − Time K t − − Time K t Figure 3:

China mortality trend vs. Japan mortality trend: female, male (left) and Japan trend, Japan smoothed trend,China trend and China smoothed trends (right).The dotted lines (right) from left to right are shifted Japan’ssmoothed trends of 20-, 23- and 25- years forward respectively.

MuPoMo

To parameterize the potential relationship between China and Japan mortality trend, we specifythe model as following, using k t derived from LC model in (1):log { m ( x , t ) } = a x + b x k t + ε x , t .Then we infer China’s mortality trend via Japan’s trend through the technique of semi-parametriccomparison of regression curves deﬁned in (2) k c ( t ) = θ k j (cid:18) t − θ θ (cid:19) + θ , (5)where k c ( t ) is the time-varying indicator for China, k j ( t ) is the time-varying indicator for Japan,and θ = ( θ , θ , θ , θ ) (cid:62) are shape deviation parameters. Understanding θ It is probably easiest to interpret the parameters (5) by starting with θ =( θ , θ , θ , θ ) (cid:62) = ( θ , 1, θ ) (cid:62) . • θ is the general trend adjustment, here selected as 1. • θ is the time-delay parameter • θ is the time acceleration parameter, here selected as 1. • θ is the vertical shift parameterIn Figure 4, it demonstrates how θ inﬂuences shift of these two curves. In the left of Figure 4,China’s female k t can reach a similar behavioral area by shifting horizontally θ = −

23, while inthe right it shows that another acceptable area could be obtained via approximately vertical shiftof θ =

85. 9ulti-Populations Mortality Model • June 2018 − − Time K t − − Time K t Figure 4:

Time delay θ = − (left) and vertical shift θ = (right). MuPoMo

In order to ﬁnd the optimal solution for shape deviation parameters, we minimize the followingloss function. min θ (cid:90) t c (cid:26) ˆ k c ( u ) − θ ˆ k j (cid:18) u − θ θ (cid:19) − θ (cid:27) w ( u ) du , (6)where ˆ k c ( t ) and ˆ k j ( t ) are the nonparametric estimates of the original time-varying indicators k c ( t ) and k j ( t ) , and t c is the time interval of China mortality data. Here we have used local linearsmoother for our implementation but any other smoothing methods could be used (Simonoff,1996).The comparison region needs to satisfy the following condition, in order to make sure theparameter estimation is compared only in the common region deﬁned by w ( u ) . w ( u ) = ∏ t j [ a , b ] { ( u − θ ) / θ } ,where t j is the time interval of Japan’s mortality data, a ≥ in f ( t j ) and b ≤ sup ( t j ) . To consider theimportance of more recent data’s impact on future trend, one could impose different weights ondifferent support intervals. Algorithm

To estimate the parameters by the nonlinear least squares estimation criterion givenin (6), we ﬁrst obtain the estimates of k c and k t by nonparametric local linear smoothing, denotedby ˆ k c and ˆ k t respectively. Then we set up the initial estimates θ = ( θ , θ , θ , θ ) and solve thenonlinear least squares estimation problem by iteratively updating the estimates until convergence.10ulti-Populations Mortality Model • June 2018 −10 −8 −6 −4 −2

Figure 5:

Loss surface of θ and θ (left) and Contour of θ and θ (right) with θ = θ = . MuPoMo −50 −40 −30 −20 −10 0 theta2 Lo ss Figure 6:

Loss function of θ with ( θ , θ , θ ) (cid:62) = (

1, 1, 0 ) (cid:62) . MuPoMo

Initial choice of θ and θ From previous Figure 3 and in our initial analysis, we see that thereis a potential ambiguity between θ and θ . It is also clear that we can ﬁnd the replacementrelationship from Figure 4. To investigate further, we show the criterion function in Figure 5 asa function of ( θ , θ ) . There is a valley area in the loss surface function of θ and θ in the leftplot and also in the contour of θ and θ in the right one, which suggests that there exists anapproximate linear combination of θ and θ in searching θ for an optimal solution. It will bringabout difﬁculties in ﬁnding the optimal parameters θ in numerical optimization. This ambiguity inidentiﬁability of the parameters is one of the ramiﬁcations of the partial observation in our setting.11ulti-Populations Mortality Model • June 2018In order to ﬁnd the optimal value, we should be very careful with selecting initial valuesof θ and in the consideration of identiﬁability issues we need to decide whether the analysisconcentrates on time delay or vertical shift. In our analysis we stick with time delay inﬂuence θ since it is more valuable in prediction perspective, and thus the initial value of θ is determined as ( θ , θ ) (cid:62) = (

1, 1 ) (cid:62) and set θ =

0. Hence the optimal initial θ is obtained around -23, see Figure6. Goodness of Fit

Based on the initial θ ( ) = ( θ , θ , θ ) (cid:62) = ( −

23, 1 ) (cid:62) and algorithm, theoptimal parameter θ is reached at ˆ θ = ( − ) (cid:62) . This is used in Figure 7, showingthat after the optimal transformation of the curve of Japan (based on the optimal value of θ ) the k t of China ﬁts quite well in the k t of Japan of years around from 1970 to 1990. − − Time K t Figure 7:

Goodness of Fit and forecast of China’s mortality trend from 2011 to 2030 via Japan’s historical data: blackdots represent the original k t from Japan and China, and Japan smoothed trend, China smoothed trend aredisplayed as well; the ﬁtted trend is plotted as light blue dashed line, while the overlapping part is colored inpurple; the light blue dashed line after year 2011 is the forecast part. MuPoMo

Afterwards we can forecast k t for China via the data from Japan and the optimal estimated shapedeviation parameter ˆ θ to extend the forecasting horizon. k c ( t + h ) = ˆ θ k j (cid:40) ( t + h ) − ˆ θ ˆ θ (cid:41) , (7)where ˆ θ = ( − ) (cid:62) and t = h =

1, 2, ..., 20.12ulti-Populations Mortality Model • June 2018Compared with the traditional forecasting method with time series analysis, our proposedmethod can extend the forecasting horizon from 5 years to 23 years, which is a big advantage fromthis semi-parametric comparison technique of regression curves, see Figure 7. However, searchingfor numerical optimal solutions for seemingly linear regression curves is still a challenge in thiscontext. This has motivated us to develop an extension to the multi-populations mortality models,because the identiﬁability issues can be better solved by borrowing information across multiplecurves.

Based on the previous study of two countries under sparse data, investigation of multi-populationsbecomes promising and necessary since more information on nonlinear trend will be provided incase of multi-countries. At the same time it enables us to study global mortality trend in the pastcentury and its future.

Original Kt (36 countries)

Time k t − − Reference Curve vs. Smoothed Kt (31 countries)

Time k t − − Figure 8:

Similarities of mortality trend among countries: different colors represent different countries (left), and thered thick curve in the right plot stands for reference curve while grey ones are smoothed curves.

MuPoMoDenote by k i the time-varying mortality indicator for country i , with i ∈ {

1, ..., N } . Figure 8displays the estimates of k i for 36 countries in the database. The left panel shows the noisy curvesthat are originally estimated k t from the LC model without any further nonparametric smoothing,while on the right it shows the smoothed k t with an initial estimate of the reference curve overlayed.Later on, we will discuss why 31 countries are selected for analysis. By design, the available timemeasurements vary among countries. Nevertheless, we notice remarkable similarities in the trendacross the countries, subject to individual variability.In order to investigate this structural similarity and to borrow the information across thecountries, we consider the shape invariant model introduced in Section 2.4 of methodology part.13ulti-Populations Mortality Model • June 2018

Speciﬁcally we assume the additive noise model deﬁned in (3) for the derived k i s from LC modelto account for noise and suppose that the underlying curves share some common trend and canbe represented in the form k i ( t ) = θ i k (cid:18) t − θ i θ i (cid:19) + θ i , (8)where k ( t ) is a reference curve, understood as common trend and θ i = ( θ i , θ i , θ i , θ i ) (cid:62) areshape deviation parameters. In order to be able to interpret the reference curve k as a mean trend,we can use the normalizing constraints on the parameter θ i as N − N ∑ i = θ i = N − N ∑ i = θ i =

1, (9) N − N ∑ i = θ i = N − N ∑ i = θ i = k t ofSweden with θ = (

1, 0, 1, 0 ) and θ i will measure the deviation with respect to the reference curve.In this work we will consider the mean curve as a reference curve and use the above normalizationconstraints. Estimation of parameters

Suppose that k and k i are given. Then for each country i , theparameter θ i can be determined by minimizing the least squares criterion as (cid:90) (cid:110) k i ( t ) − θ i k (cid:16) t − θ i θ i (cid:17) − θ i (cid:111) w i ( t ) dt (11)where w i is chosen to ensure that the two functions are evaluated over the common domain as inthe case of two countries. In practice, k and k i are replaced by its nonparametric estimate. Estimation of Common Trend

For given parameters θ i , i =

1, . . . , n , the functional relationshipin (8) implies that k i ( θ i t + θ i ) = θ i k ( t ) + θ i , (12)Thanks to the normalizing conditions on θ i and θ i , this implies that k ( t ) = N − N ∑ i = k i ( θ i t + θ i ) . (13)That is, if k i is appropriately transformed with respect to the individual parameters θ i , then k issimply the average. In practice, we have noisy version of k i available at different number of timepoints. Then the functional mean can be estimated more efﬁciently with nonparametric smoothing,which essentially gives rises to a weighted average estimate.14ulti-Populations Mortality Model • June 2018

Estimation algorithm

Combining the above two steps leads to the following iterative algorithmfor estimation of the parameters.(a) Given k i , obtain an initial estimate of k based on all country-level mortality rates(b) Given k , update θ i by minimizing the nonlinear least squares criterion in (11) for each i =

1, . . . , N .(c) Normalize the parameters to satisfy the constraints.(d) Given θ i , i =

1, . . . , N , update k by (13).(e) Iterate (b)-(d) until convergence.The general algorithm was proposed and studied by Kneip & Engel (1995). We adapted accordinglyto account for incomplete observations in our sample. Initial values of k and k i To initialize k , we choose the trimmed mean of the sample estimates,based on the middle 50% of the countries in terms of the length of the recording period. Theestimation of k and k i is done with local linear kernel smoothing method to account for measure-ment error. The smoothing parameter for k i was selected to maintain comparable smoothnessacross the samples using common degrees of freedom (Bowman & Azzalini, 1997). Computational Issues with the Parameterization

The shape invariant model implicitly assumesthat there are identiﬁable features that are common across the sample. It is easy to check fordensely observed curves (with non-monotone functions) by means of the derivative estimation,but for sparsely observed curves, there could be an ambiguity in identifying the parameters. Inthe case of the mortality curves, due to the limited measurements available, the ambiguity occursin distinguishing the differential effect of vertical shift ( θ ) and horizontal shift ( θ ) in time. In thiscase, we choose to attribute the effect as horizontal shift and set θ =

0, as this is more amenableto interpretation and meanwhile promising to extend forecast horizon. Comparison of these twoparameterization cases will be illustrated in Section 4.3.2.

Bootstrap for Prediction Interval

The forecasting for Chinese mortality (8) is now updatedbased on the common reference curve k . In order to construct a prediction interval of China’smortality trend, we also need a reliable estimate of the measure of variability. Unlike the standardsetting studied by Kneip & Engel (1995) for relatively densely observed data on common intervals,it is difﬁcult to derive an asymptotic distribution of the estimators for sparse and incomplete dataas ours. Here, we opt for a bootstrap method to approximate the uncertainties in estimation andprediction.The standard bootstrap techniques relying on identically independent distributed (i.i.d.)observations are not appropriate here. Recently re-sample methods for dependent data have15ulti-Populations Mortality Model • June 2018considered several options: bootstrap with i.i.d. innovations, bootstrap with block segments andmodel-based bootstrap. Due to limited sample of China’s mortality time series, bootstrap withblock segments do not create the ideal re-sampled time series. Alternatively we bootstrap themortality data based on i.i.d. innovations obtained from ﬁtting time series model, and afterwardswe carry out estimation on the re-sampled data and generate prediction interval at different levels.Suppose that for the time series k , ..., k n and some ﬁxed p ∈ N , there exists a parametricestimator of the conditional expectation E ( k t | k t − , ..., k t − p ) denoted by (cid:98) m n ( k t − , ..., k t − p ) . Thisestimator leads to residuals (cid:98) e t : = k t − (cid:98) m n ( k t − , ..., k t − p ) , t = p +

1, ..., n . (14)Resampling from these residuals leads to a bootstrap sample of time series k ∗ t = (cid:98) m n ( k ∗ t − , ..., k ∗ t − p ) + e ∗ t , t = p +

1, ..., n . (15)The idea of parametric ﬁt to the conditional expectation can be executed by ARIMA models.Kreiss & Lahiri (2012) discuss different situations with parametric and nonparametric modeling ofpredictor k t and respective asymptotic consistence properties. In light of similarities across countries, seeking global mortality trend becomes very reasonable andnatural. Some research also ﬁnd out mortality trend has connection with economic developmentlevel, GDP for instance, see Hanewald (2011). In the remaining section, we are going through theempirical analysis of common trend in different groups via slightly different models.

Figure 9:

Five outlying countries: geographically neighbors in east Europe. Source: Google Map. • June 2018Evidence tells us that the mortality rate is decreasing as time evolves, due to medical im-provement, economic development and social stability. However, we notice that there are someremarkable outliers like Russia, Estonia, Latvia, Lithuania and Belarus in the database.As shown in Figure 9, all these ﬁve countries locate themselves in east Europe and areused to be members of the Soviet Union. They share similar geographic characteristics andmeanwhile experienced parallel economic and social progress, and within our expectation theyreﬂect comparable mortality moving path as well, as displayed in Figure 10 and Appendix 1. Notethat solid (blue) curves represent global mortality trend, dashed (yan) curves are representingestimated individual country-level mortality trends based on global trend, the short solid (red)curves are smoothed original individual country-level mortality trends and (black) circles areoriginal individual country-level mortality trends.Surprisingly, they exhibit a quite opposite tendency in contrast with other 31 countries. Themortality rates go through a short period of decrease, then stay stable or slightly increase forseveral years and afterwards go back to declining path again. One possible reason on this differentphenomenon perhaps is connected with political event of dissolution of the soviet union. lllllllllllllllllllllllllllllllllllllllllllllllllllllll

Lithuania 1959−−2013

Time k t − − lllllllllllllllllllllllllllllllllllllllllllllllllllllll Latvia 1959−−2013

Time k t − − Figure 10:

Different mortality movements of Lithuania and Latvia: blue curves represent global mortality trend, lightblue curves are representing estimated individual country-level mortality trends based on global trend, redcurves are smoothed original individual country-level mortality trends and black dots are original individualcountry-level mortality trends.

MuPoMoTo ideally demonstrate global mortality movements in majority of countries, we remove theseﬁve countries for remaining study to reduce inﬂuences from minor outlying ones. 17ulti-Populations Mortality Model • June 2018

As discussed previously, for sparsely observed curves there could be an ambiguity in identifyingall of the four parameters. Therefore, we choose to compare the original 4-parameters model witha simpliﬁed 3-parameter model of setting θ = Reference Curves

Time k t − − − Reference Curves

Time k t − − − Figure 11:

Common mortality trends estimated by 3-parameters model (left) vs. 4-parameters model (right).

MuPoMoUnder two different parameterization, we estimate the parameters and reference curve re-spectively. In Figure 11, the reference curves and common trend for these two cases are plotted:the left one is calculated from the 3-parameters model and the right one is generated from the4-parameters one. The red curve is the initial reference curve, blue ones are one-step aheadupdated reference curves. From these two plots, no clear and obvious difference can be seen. Butfrom the viewpoint of analytic thinking, we choose to attribute the effect as horizontal shift andset θ =

0, as this is more amenable to interpretation and meanwhile promising to extend forecasthorizon.Figure 12 explains the common mortality trend generated from 3-parameters model comparedwith individual nation-level mortality trend. In this graph, the black solid curve is the initialreference curve, cyan, green, blue and red ones represent the updated ones at different iterationstage while the grey ones are the non-smoothed mortality trend from each country. It is obviousthat after one step optimization, reference curves are already showing a quite similar pattern. Thecommon mortality trend is adjusted to an upper level, mostly because more developing countries(Czech Republic, Hungary and China, for example) started collecting demographic data at a latertime period in contrast with more developed countries (such as Sweden, Norway and France),and also developing countries have higher mortality rates generally. The ﬁgures on illustratingindividual case are provided in Appendix 2.18ulti-Populations Mortality Model • June 2018

Reference Curves

Time k t − − Figure 12:

Common mortality trend compared with individual nation-level mortality trends: red curve is the initialreference curve, blue and black ones are updated reference curves convergent to common trend, while thegrey lines represent individual country.

MuPoMo

Since a common mortality trend is available, it could be applied to help improve estimation andforecasting of individual case. Especially when sample size from individual country is relativelysmaller than anticipated, semi-parametric comparison of common mortality trend with eachindividual nation-level one will be a promising way with respect to forecasting. At the same time,it helps to reduce ambiguity in identifying the parameters in case of seemingly linear co-movementbetween regression curves, like the case of comparing China and Japan.Figure 13 displays newly estimated China mortality trend via semi-parametric comparisonwith common trend. The (blue) solid line is the common trend or updated reference curve andthe (cyan) dashed line is the estimated China mortality trend based on the common trend. Incomparison, the raw China mortality curve estimate is marked by black circles, with the individualsmoothing estimate overlayed in short (red) line. Thanks to parameters deviation on time axis t − θ θ ,we could extend forecasting horizon of China approximately 40 years through the informationfrom common trend, see Figure 13. Referring to estimation and forecasting of the other 30countries, they are arranged in Appendix 2. 19ulti-Populations Mortality Model • June 2018 lllllllllllllllll

China 1994 −− 2010

Time k t − − Figure 13:

Common mortality trend and estimated China mortality trend based on common trend.

MuPoMoWith model-based bootstrap approach, we simulate 500 re-sampled China’s mortality timeseries from 1994 to 2010 based on ARIMA model. From each simulation, we estimate the optimalshape deviation parameters θ and accordingly calculate the estimated China mortality with longertime horizon based on the common trend. Histogram of theta 1 theta 1 F r equen cy Histogram of theta 2 theta 2 F r equen cy −4 −2 0 2 4 Histogram of theta 3 theta 3 F r equen cy Figure 14:

Histograms of ˆ θ , ˆ θ and ˆ θ . MuPoMo20ulti-Populations Mortality Model • June 2018In Figure 14, we display the variation of ˆ θ , ˆ θ and ˆ θ across the countries. From histogram ofˆ θ , 50% of ˆ θ lies between 1.5 and 2.5, which indicates the overall accelerating declining mortalitytrend of China compared with global trend. More than 95% of ˆ θ , the parameter describing timedelay, falls into the interval of (

0, 10 ) , which further conﬁrms that there exists a time delay ofChina’s mortality trend around 10 years later than global situation. Majority of ˆ θ ranges from0.98 to 1, which reveals a little time acceleration in China’s mortality trend.In Figure 15, conﬁdence intervals at different levels are displayed. On the left part, conﬁdenceintervals at 80% and 90% are plotted in grey zone and blue zone respectively, while yellow zonehighlights the central area of possible forecast path. On the right one, only 90% conﬁdence intervalis presented. Black line stands for global mortality trend, red one is original China’s mortalitytrend and light blue curve shows the estimated China’s mortality trend based common trend andoriginal China’s data. Time K t − − Time K t − − Figure 15:

Conﬁdence intervals at different levels: 80% (left) vs. 90% conﬁdence interval (right).

MuPoMo

The global mortality, as expected, is undergoing a shift toward exhibiting declining tendency,and with a dramatic decreasing movement in the last several decades in contrast with hundredsyears ago. It also depicts that most of countries are converging with a similar mortality pattern ofdecreasing over time. The improvement possibly results from economic development and medicalimprovement, and it is also not difﬁcult to imagine that there will be gradually declining mortalityrate in the near future due to technology progress.In addition to the global mortality trend, each country still behaves differently from others tosome extent. From this perspective, it might be possible for life insurance companies to designinsurance products among different countries to hedge global longevity risk. That is, if wider21ulti-Populations Mortality Model • June 2018range of countries is covered by a particular insurance company, it is possible to redistributelongevity risk among them.Another advantage from this research is to establish a better forecasting regime to foreseemortality change in longer time horizon, particularly for countries with limited historical mortalitydata, such as China and Chile.

References

Booth, H. (2006). Demographic forecasting: 1980 to 2005 in review.

International Journal ofForecasting , , 547–581.Booth, H., & Tickle, L. (2008). Mortality modelling and forecasting: a review of methods. Annals ofActuarial Science , , 3–43.Bowman, A., & Azzalini, A. (1997). Applied Smoothing Techniques for Data Analysis: the KernelApproach with S-Plus Illustrations . Oxford University Press, Oxford.Cairn, A., Blake, D., & Dowd, K. (2008). Modelling and management of mortality risk: a review.

Scandinavian Actuarial Journal , , 79–113.Cairns, A., Blake, D., Dowd, K., Coughlan, G., & Khalaf-Allah, M. (2011). Bayesian stochasticmortality modelling for two populations. ASTIN Bulletin , , 29–55.Chan, W., Li, S., & Cheung, S. (2008). Testing deterministic versus stochastic trends in the lee-cartermortality indexes and its implications for projecting mortality improvements at advanced ages. Living to 100 , .Fang, L., & Härdle, W. (2015).

Stochastic population analysis: a functional data analysis . TechnicalReport SFB 649 Discussion Paper Series 2015.Gao, Y., & Shang, H. L. (2017). Multivariate functional time series forecasting: An application toage-speciﬁc mortality rates.

Risks , , 21.Hanewald, K. (2011). Explaining mortality dynamics: the role of macroeconomic ﬂuctuations andcause of death trends. North American Actuarial Journal , .Härdle, W., & Marron, J. (1990). Semiparametric comparison of regression curves. Annals ofStatistics , , 63–89.Human Mortality Database (2013). University of California, Berkeley (USA), and MaxPlanck Institute for Demographic Research (Germany). Available at or (data downloaded on 13 June 2013).Hyndman, R., & Ullah (2007). Robust forecasting of mortality and fertility rates: A functional dataapproach. Computational Statistics and Data Analysis , , 4942–4956.22ulti-Populations Mortality Model • June 2018Kneip, A., & Engel, J. (1995). Model estimation in nonlinear regression under shape invariance.

Annuals of Statistics , , 551–570.Kreiss, J., & Lahiri, S. (2012). Bootstrap methods for time series. Handbook of Statistics , , 3–26.Lawton, W. H., Sylvestre, E. A., & Maggio, M. S. (1972). Self modeling nonlinear regression. Technometrics , , 513–532.Lee, R. D., & Carter, L. R. (1992). Modeling and forecasting u.s. mortality. Journal of the AmericanStatistical Association , , 659–671.Lee, R. D., & Miller, T. (2001). Evaluating the performance of the lee-carter method for forecastingmortality. Demography , , 537–549.Li, N., & Lee, R. (2005). Coherent mortality forecasts for a group of populations: An extension ofthe lee-cartermethod. Demography , , 575–594.Li, N., Lee, R., & Tuljapurkar, S. (2004). Using the lee-carter method to forecast mortality forpopulations with limited data. International Statistical Review , , 19–36.Raftery, A., Li, N., Ševˇcíková, H., Gerland, P., & Heilig, G. (2012). Bayesian probabilistic populationprojections for all countries. Proceedings of the National Academy of Sciences , , 13915–13921.Shang, H. L. (2016). Mortality and life expectancy forecasting for a group of populations indeveloped countries: A multilevel functional data method. The Annals of Applied Statistics , ,1639–1672.Simonoff, J. S. (1996). Smoothing methods in statistics . Springer. 23ulti-Populations Mortality Model • June 2018 llllllllllllllllllllllllllllllllllllllllllllllllllllllll

Belarus 1959−−2014

Time k t − − lllllllllllllllllllllllllllllllllllllllllllllllllllllll Estonia 1959−−2014

Time k t − − llllllllllllllllllllllllllllllllllllllllllllllllllll Russia 1959−−2010

Time k t − − MuPoMo24ulti-Populations Mortality Model • June 2018 lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll

Australia 1921 −− 2011

Time k t − − llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll Austria 1947 −− 2014

Time k t − − llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll Bulgaria 1947 −− 2010

Time k t − − lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll Canada 1921 −− 2011

Time k t − − llllllllllllll Chile 1992 −− 2005

Time k t − − lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll CzechRepublic 1950 −− 2014

Time k t − − MuPoMo25ulti-Populations Mortality Model • June 2018 lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll

Denmark 1835 −− 2011

Time k t − − lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll Finland 1878 −− 2012

Time k t − − llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll France 1816 −− 2013

Time k t − − llllllllllllllllllllll Germany 1990 −− 2011

Time k t − − llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll Hungary 1950 −− 2009

Time k t − −1000100

Time k t − −1000100 MuPoMo26ulti-Populations Mortality Model • June 2018 llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll

Ireland 1950 −− 2009

Time k t − − llllllllllllllllllllllllllllllll Israel 1983 −− 2014

Time k t − − llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll Italy 1872 −− 2009

Time k t − − llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll Japan 1947 −− 2012

Time k t − − llllllllllllllllllllllllllllllllllllllllllllllllll Luxembourg 1960 −− 2009

Time k t − −1000100

Time k t − −1000100 MuPoMo27ulti-Populations Mortality Model • June 2018 lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll

NewZealand 1948 −− 2008

Time k t − − llllllllllllllllllllllllllllllllllllllllllllllllllll Poland 1958 −− 2009

Time k t − − lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll Portugal 1940 −− 2012

Time k t − − llllllllllllllllllllllllllllllllllllllllllllllllllllllllllll Slovakia 1950 −− 2009

Time k t − − llllllllllllllllllllllllllllllll Slovenia 1983 −− 2014

Time k t − −1000100

Time k t − −1000100 MuPoMo28ulti-Populations Mortality Model • June 2018 lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll

Spain 1908 −− 2012

Time k t − − llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll Switzerland 1876 −− 2011

Time k t − − lllllllllllllllllllllllllllllllllllllllll Taiwan 1970 −− 2010

Time k t − − llllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll UnitedKingdom 1922 −− 2013

Time k t − − lllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllllll USA 1933 −− 2013

Time k t − −1000100