Multipopulation mortality modelling and forecasting: The multivariate functional principal component with time weightings approaches
MMultipopulation mortality modelling and forecasting:The multivariate functional principal component withtime weightings approaches
Ka Kin Lam * , Bo Wang School of Mathematics and Actuarial Science, University of Leicester,Leicester, LE1 7RH, UK
Abstract
Human mortality patterns and trajectories in closely related populations are likelylinked together and share similarities. It is always desirable to model them si-multaneously while taking their heterogeneity into account. This paper introducestwo new models for joint mortality modelling and forecasting multiple subpopu-lations in adaptations of the multivariate functional principal component analysistechniques. The first model extends the independent functional data model to amulti-population modelling setting. In the second one, we propose a novel multi-variate functional principal component method for coherent modelling. Its designprimarily fulfils the idea that when several subpopulation groups have similar socio-economic conditions or common biological characteristics such close connectionsare expected to evolve in a non-diverging fashion. We demonstrate the proposedmethods by using sex-specific mortality data. Their forecast performances are fur-ther compared with several existing models, including the independent functional * Corresponding author. E-mail: [email protected] (K.K.Lam). a r X i v : . [ s t a t . M E ] F e b ata model and the Product-Ratio model, through comparisons with mortality dataof ten developed countries. Our experiment results show that the first proposedmodel maintains a comparable forecast ability with the existing methods. In con-trast, the second proposed model outperforms the first model as well as the currentmodels in terms of forecast accuracy, in addition to several desirable properties. Keywords —
Mortality modelling; Coherent forecasts; Functional principal component analy-sis; Lee-Carter model; Product-Ratio model; Multivariate functional data analysis
There have been tremendous developments in the area of mortality modelling and forecastingover the last three decades. These include the pioneering mortality model proposed by Lee &Carter (1992), well-known as the Lee-Carter model. It rapidly gained credit and popularity,given its simplicity and ability to capture most variations in mortality pattern evolved over time.Several modifications and extensions of the Lee-Carter model have been put forward, see, for in-stance, Lee & Miller (2001), Booth et al. (2002), Renshaw & Haberman (2006) and Cairns et al.(2008). It is worth noting that Hyndman & Ullah (2007) further extends the Lee-Carter model toa functional data framework, which includes non-parametric smoothing techniques, functionalprincipal component decomposition and times series analysis to achieve the task of mortalitymodelling and forecasting. Although the models as mentioned earlier posed a great successin history, the single factor designs limit their capacity of mortality modelling and forecastingon solely one population. It seems improper to prepare a mortality forecast for an individualpopulation in isolation from one another if they are closely linked together. For example, dueto biological and behavioural reasons, male mortality rates have considerably been higher thanfemale mortality rates, see Kalben (2000). However, if male mortality improvements are fasterthan female ones, but two genders are projected independently, the model may forecast male2ortality rates lower than and eventually diverged further from female mortality rates. As such,it is always a significant challenge in human mortality modelling that the model can take mul-tiple populations as well as their heterogeneity simultaneously into account. Several mortalitymodels for multiple populations have been proposed in the literature over the last decade, see,for instance, Delwarde et al. (2006) and Dowd et al. (2011). In more desirable cases, the modelcan further ensure that the forecasts for multiple related populations maintain certain structuralrelationships based on the extensive theoretical considerations and historical observations. A‘coherent’ or ‘non-divergent’ model is one of the most well-suited cases in mortality modellinggiven the fact that the mortality of populations that are geographically close or otherwise relatedis driven by a common set of factors such as socio-economic, environmental and biological con-ditions and differences are unlikely to increase in the long run. Such coherent forecast modelsare also documented in the literature, see, for example, the earliest augmented common factor(ACF) model proposed by Li & Lee (2005), which is an extension of the Lee-Carter model withan additional common factor to capture both short-term divergence and long-term coherenceamong related populations. Variants and extensions of the ACF model have been subsequentlydeveloped, such as Li (2013), Li et al. (2016) and Chen & Millossovich (2018). Some oth-ers like the Age-Period-Cohort (APC) model proposed by Cairns et al. (2011), incorporate amean-reverting stochastic process for two related populations and allow for different trends inmortality improvement rates in the short-run but parallel improvements in the long-run. TheProduct-Ratio model developed by Hyndman et al. (2013), which models the product and ra-tio functions of the age-specific mortality rates of different populations individually through afunctional principal component decomposition, achieves coherent mortality forecasts by con-straining the forecast ratio function via a stationary time series model to appropriate constants.Shang et al. (2016) and Wu & Wang (2019) use multilevel functional principal component anal-ysis of aggregated and population-specific data to extract the common trend and population-3pecific residual trend among populations. The forecast of population-specific residual trend isrestricted to be a stationary time process to achieve convergence in the long run. Some otherdevelopments in this field include Jarner & Kryger (2011), Hatzopoulos & Haberman (2013)and Wan & Bertschi (2015). Also, see Danesi et al. (2015) and Enchev et al. (2017) for reviewsand comparisons.In this paper, we propose two new models for mortality modelling and forecasting with thetheoretical framework of multivariate functional principal component analysis techniques intro-duced by Chiou et al. (2016) and Happ & Greven (2018). The main objective of the multivariatefunctional principal component analysis is to model multiple sets of functional curves that maybe correlated among others, which allows us to construct two new models on top of it. The firstone is to model groups of population mortality rates with similarities across periods and agestogether. The second model is a novel method for coherent mortality modelling and forecastingthat captures the common trend and the population-specific trend of groups of mortality pat-terns and produces forecasts of different populations that do not diverge in the long run. Themodels are estimated using the weighted functional principal component algorithm (Hyndman& Shang, 2009), which places more weights on recent mortality movements and so producesmore realistic forecasts.More will be discussed in detail in the paper, and the rest of this article is organised asfollows. In Section 2, we give a review of the theoretical background about univariate andmultivariate functional principal component analyses. In Section 3, we describe the generalframeworks of two proposed multivariate functional principal component analysis models formortality modelling and forecasting. We then illustrate the models by applying them to the sex-specific mortality rates for Japan with comparisons to two analogous functional data paradigms − the independent FPCA model and the Product-Ratio model proposed by Hyndman & Ullah(2007) and Hyndman et al. (2013), in terms of the systematic differences and forecasting per-4ormances using sex-specific mortality data of ten developed countries in Section 4. We lastlyconclude this article with discussions and remarks in Section 5. Functional principal component analysis (FPCA) is the core technique applied primarily in thispaper. It is a statistical method for analysing the variation of a bunch of functional curves ina dataset then reducing them from infinite dimensions to finite dimensions in principal com-ponent representations of variation (Ramsay & Silverman, 2007). It can also be regarded as afunctional extension of the multivariate PCA method, allowing the data dimension to increasefrom finite space to infinite space (Ramsay & Dalzell, 1991). After the Karhunen-Lo`eve theo-rem in expansions of a stochastic process proposed by Karhunen (1946) and Lo`eve (1946), thetheoretical developments of FPCA can be divided into two main streams: the linear operatorand the covariance operator perspectives, see, for example, Besse (1992), Ramsay (2004), Yaoet al. (2005), Hall et al. (2006) and Bosq (2012). To get the readers well equipped with nec-essary concepts in this paper, we firstly give a brief review of univariate FPCA then move todiscuss the theoretical framework of multivariate FPCA from the covariance operator point ofview.
Let Y ( x ) be a mean square continuous ( L -continuous) stochastic process on a domain X witha mean function µ ( x ) = E ( Y ( x )) and a covariance function K ( x, x (cid:48) ) = Cov ( Y ( x ) , Y ( x (cid:48) )) forall x ∈ X . Assuming that there exists a covariance operator Λ : L ( X ) → L ( X ) for anyfunction f ∈ L ( X ) , (cid:0) Λ f (cid:1) ( x ) = (cid:90) X K ( x, x (cid:48) ) f ( x (cid:48) ) dx (cid:48) , ∀ x ∈ X . Λ , we can perform a spectral analysis of the covariancefunction K ( x, x (cid:48) ) , such that (cid:0) Λ φ (cid:1) ( x ) = (cid:90) X K ( x, x (cid:48) ) φ ( x (cid:48) ) dx (cid:48) = λφ ( x ) , to obtain a set of orthonormal basis eigenfunctions { φ n ( x ) } ∞ n =1 and a corresponding set ofeigenvalues { λ n } ∞ n =1 , where λ ≥ λ ≥ · · · ≥ , representing the amount of variability in Y ( x ) explained by the { φ n ( x ) } ∞ n =1 . Y ( x ) can now be represented as an infinite linear combination ofthe orthonormal functions by the Karhunen-Lo`eve theorem, that is Y ( x ) = µ ( x ) + ∞ (cid:88) n =1 β n φ n ( x ) , ∀ x ∈ X , where β n is the principal component score with β n = (cid:90) X ( Y ( x ) − µ ( x )) φ n ( x ) dx. The principal component scores { β n } ∞ n =1 are uncorrelated random variables with mean zeroand variance { λ n } ∞ n =1 . β n serves as the weight and the projection of the centred stochasticprocess ( Y ( x ) − µ ( x )) in the direction of the n -th eigenfunction φ n ( x ) in the Karhunen-Lo`everepresentation of Y ( x ) . As the eigenvalues decrease quickly towards zero, only the first feweigenfunctions are needed to represent the most important features of Y ( x ) . The truncatedKarhunen-Lo`eve expansion with the optimal first N -dimensional approximations to Y ( x ) canbe written as Y ( x ) = µ ( x ) + N (cid:88) n =1 β n φ n ( x ) , ∀ x ∈ X , (2.1)and thus reduce the infinite dimension of functional data into finite dimensions in principaldirection of variation which is often used in practice for data analysis, e.g. for regression or forclustering (Ramsay & Silverman, 2007). 6 .2 Multivariate FPCA (MFPCA) We now concern with multivariate functional data and the way to establish a direct link fromthe Karhunen-Lo`eve representation of univariate functional data discussed previously to theKarhunen-Lo`eve representation of multivariate functional data. Consider a random samplewhich consists of p ≥ sets of functions Y (1) ( x ) , · · · , Y ( p ) ( x ) on a domain X for all x ∈ X .Combining all different sets of functions in a vector Y ( x ) , we have Y ( x ) = ( Y (1) ( x ) , · · · , Y ( p ) ( x )) T ∈ R p with a mean function µ ( x ) = ( E ( Y (1) ( x )) , · · · , E ( Y ( p ) ( x ))) T = ( µ (1) ( x ) , · · · , µ ( p ) ( x )) T , and a covariance function K ij ( x, x (cid:48) ) = Cov ( Y ( i ) ( x ) , Y ( j ) ( x (cid:48) )) = E [( Y ( i ) ( x ) − µ ( i ) ( x ))( Y ( j ) ( x (cid:48) ) − µ ( j ) ( x (cid:48) ))] . Let a set of functions f = ( f (1) , · · · , f ( p ) ) with an index for each i = 1 , · · · , p : f ( i ) ∈ L ( X ) .Assuming that there exists a covariance operator Γ : L ( X ) → L ( X ) for all f ∈ L ( X ) withthe i -th element of (Γ f ) , (Γ f ) ( i ) ( x ) = p (cid:88) j =1 (cid:90) X K ij ( x, x (cid:48) ) f ( j ) ( x (cid:48) ) dx (cid:48) , ∀ x ∈ X . With the defined covariance operator Γ and the similar structure in the univariate FPCA, we let (cid:0) Γ ψ (cid:1) ( x ) = νψ ( x ) by the spectral theorem on the covariance function K ij ( x, x (cid:48) ) , where ψ ( x ) is the orthonormal basis of eigenfunctions and ν is the corresponding eigenvalue. Then for all i, j = 1 , · · · , p and x ∈ X , we have (Γ ψ ) ( i ) ( x ) = p (cid:88) j =1 (cid:90) X K ij ( x, x (cid:48) ) ψ ( j ) ( x (cid:48) ) dx (cid:48) = νψ ( i ) ( x ) . Y (1) ( x ) , · · · , Y ( p ) ( x ) has its finite univariate Karhunen-Lo`eve representation up to the optimal first N -dimensionalapproximations as shown in Equation (2.1), i.e. Y ( i ) ( x ) = µ ( i ) ( x ) + (cid:80) Nn =1 β ( i ) n φ ( i ) n ( x ) , and K ij ( x, x (cid:48) ) is a separable covariance function, i.e. K ij ( x, x (cid:48) ) = K i ( x, x (cid:48) ) · K j ( x, x (cid:48) ) . It holds foreach m, l = 1 , · · · , N and i, j = 1 , · · · , p and for all x ∈ X , (Γ ψ ) ( i ) ( x ) = p (cid:88) j =1 N (cid:88) m =1 N (cid:88) l =1 (cid:90) X Cov ( β ( i ) m φ ( i ) m ( x ) , β ( j ) l φ ( j ) l ( x (cid:48) )) ψ ( j ) ( x (cid:48) ) dx (cid:48) = p (cid:88) j =1 N (cid:88) m =1 N (cid:88) l =1 Cov ( β ( i ) m , β ( j ) l ) φ ( i ) m ( x ) (cid:90) X φ ( j ) l ( x (cid:48) ) ψ ( j ) ( x (cid:48) ) dx (cid:48) = νψ ( i ) ( x ) . (2.2)For simplicity of notation, we denote Cov ( β ( i ) m , β ( j ) l ) = Z ( ij ) ml and (cid:82) X φ ( j ) l ( x (cid:48) ) ψ ( j ) ( x (cid:48) ) dx (cid:48) = c ( j ) l .Equation (2.2) can be rewritten as p (cid:88) j =1 N (cid:88) m =1 N (cid:88) l =1 Z ( ij ) ml φ ( i ) m ( x ) c ( j ) l = νψ ( i ) ( x ) . (2.3)Following a similar technique of Zemyan (2012) in solving Fredholm integral equations of thesecond kind with a separable covariance function, we can multiply and integrate an orthonormalbasis eigenfunction φ ( i ) n ( x ) , for n = 1 , · · · , N , over the domain X for both sides of Equation(2.3), (cid:90) X φ ( i ) n ( x ) · p (cid:88) j =1 N (cid:88) m =1 N (cid:88) l =1 Z ( ij ) ml φ ( i ) m ( x ) c ( j ) l dx = ν (cid:90) X φ ( i ) n ( x ) · ψ ( i ) ( x ) dx. Due to the orthonormality, (cid:82) X φ ( i ) n ( x ) · φ ( i ) m ( x ) dx = 1 if m = n , and (cid:82) X φ ( i ) n ( x ) · φ ( i ) m ( x ) dx = 0 if m (cid:54) = n otherwise. Denoting (cid:82) X φ ( i ) n ( x ) · ψ ( i ) ( x ) dx = c ( i ) n , it holds with the simplified notations p (cid:88) j =1 N (cid:88) l =1 Z ( ij ) nl c ( j ) l = νc ( i ) n . (2.4)For a given value of ν , the solvability of this linear system correlates with the solvability of theintegral equation. Since i, j and n, m, l are arbitrarily notated, Equation (2.4) can be represented8n matrix form when we consider it as a whole, which is equivalent to an eigenequation, i.e. Zc = ν c , (2.5)or Z (11) . . . Z (1 p ) ... . . . ... Z ( p . . . Z ( pp ) c (1) ... c ( p ) = ν c (1) ... c ( p ) with a positive semidefinite block matrix Z ( ij ) ∈ R N × N and an eigenvector c ( i ) ∈ R N entries.With a matrix eigenanalysis performed on Equation (2.5), we can obtain a set of orthonormaleigenvectors { c n } Nn =1 of Z , corresponding a set of eigenvalues { ν n } Nn =1 , where ν ≥ · · · ≥ ν N ≥ .Substituting the orthonormal eigenvector c n and the corresponding eigenvalue ν n into Equa-tion (2.3), the eigenfunction ψ ( i ) n ( x ) of Γ is given by their elements: ψ ( i ) n ( x ) = 1 ν n p (cid:88) j =1 N (cid:88) m =1 N (cid:88) l =1 Z ( ij ) ml [ c n ] ( j ) l φ ( i ) m ( x ) = N (cid:88) m =1 [ c n ] ( i ) m φ ( i ) m ( x ) , ∀ x ∈ X , where [ c n ] ( i ) denotes the i -th block of the orthonormal eigenvector c n of Z corresponding toits eigenvalue ν n . The truncated multivariate Karhunen-Lo`eve expansions with the first optimal N -dimensional approximations to Y ( i ) ( x ) can be written as Y ( i ) ( x ) = µ ( i ) ( x ) + N (cid:88) n =1 ρ n ψ ( i ) n ( x ) , ∀ x ∈ X , (2.6)where ρ n is the multivariate principal component score with ρ n = p (cid:88) i =1 (cid:90) X ( Y ( i ) ( x ) − µ ( i ) ( x )) ψ ( i ) n ( x ) dx = p (cid:88) i =1 (cid:90) X ( Y ( i ) ( x ) − µ ( i ) ( x )) N (cid:88) m =1 [ c n ] ( i ) m φ ( i ) m ( x )= p (cid:88) i =1 N (cid:88) m =1 [ c n ] ( i ) m (cid:90) X ( Y ( i ) ( x ) − µ ( i ) ( x )) φ ( i ) m ( x ) = p (cid:88) i =1 N (cid:88) m =1 [ c n ] ( i ) m β ( i ) m , β ( i ) m is the m -th univariate principal component score of the i -th element.The mean and the covariance of ρ n can be derived for all i, j = 1 , · · · , p and x ∈ X as E ( ρ n ) = p (cid:88) i =1 (cid:90) X E ( Y ( i ) ( x ) − µ ( i ) ( x )) ψ ( i ) n ( x ) dx = 0 , since E (( Y ( i ) ( x ) − µ ( i ) ( x )) = 0 , andCov ( ρ n , ρ m ) = E (cid:18) p (cid:88) i =1 (cid:90) X ( Y ( i ) ( x ) − µ ( i ) ( x )) ψ ( i ) n ( x ) dx · p (cid:88) j =1 (cid:90) X ( Y ( j ) ( x (cid:48) ) − µ ( j ) ( x (cid:48) )) ψ ( j ) m ( x (cid:48) ) dx (cid:48) (cid:19) = p (cid:88) i =1 (cid:90) X p (cid:88) j =1 (cid:90) X K ij ( x, x (cid:48) ) ψ ( j ) m ( x (cid:48) ) dx (cid:48) · ψ ( i ) n ( x ) dx = p (cid:88) i =1 (cid:90) X ν m ψ ( i ) m ( x ) · ψ ( i ) n ( x ) dx = ν m p (cid:88) i =1 (cid:90) X ψ ( i ) m ( x ) · ψ ( i ) n ( x ) dx. (2.7)Because of the orthonormality, (cid:82) X ψ ( i ) m ( x ) · ψ ( i ) n ( x ) dx = 1 if m = n , and (cid:82) X ψ ( i ) m ( x ) · ψ ( i ) n ( x ) dx =0 if m (cid:54) = n otherwise. In practice, the natural path from the univariate to the multivariate functional principal compo-nent analysis discussed above can be summarised in a simple algorithm introduced by Happ &Greven (2018). Given a random sample consists of p ≥ sets of functions Y (1) ( x ) , · · · , Y ( p ) ( x ) on a domain X for all x ∈ X , the algorithm comprises the following four steps:1. Perform a univariate functional principal component analysis for each element Y ( i ) ( x ) consisting of the observed curves { Y ( i ) t ( x ) } Tt =1 . This gives us a set of estimated principalcomponent scores { ˆ β ( i ) t,n } Nn =1 for each observation unit t = 1 , · · · , T and estimated eigen-functions { ˆ φ ( i ) n ( x ) } Nn =1 with the first suitably chosen N -dimensional approximations toeach Y ( i ) ( x ) . 10. Combine all the estimated principal component scores into a single large matrix Ξ where Ξ = ˆ β (1)1 , . . . ˆ β (1)1 ,N . . . ˆ β ( p )1 , . . . ˆ β ( p )1 ,N ... . . . ... . . . ... . . . ... ˆ β (1) T, . . . ˆ β (1) T,N . . . ˆ β ( p )1 ,N . . . ˆ β ( p ) T,N ∈ R T × pN and estimate the joint covariance matrix ˆ Z = N − Ξ T Ξ .3. Perform a matrix eigenanalysis for ˆ Z to obtain a set of estimated orthonormal eigenvec-tors { ˆ c n } Nn =1 and a set of corresponding eigenvalues { ˆ ν n } Nn =1 of ˆ Z .4. Calculate the estimated multivariate eigenfunctions and the estimated multivariate prin-cipal component scores according to their i -th elements: ˆ ψ ( i ) n ( x ) = N (cid:88) m =1 [ ˆ c n ] ( i ) m ˆ φ ( i ) m ( x ) , ∀ x ∈ X , and ˆ ρ t,n = p (cid:88) i =1 N (cid:88) m =1 [ ˆ c n ] ( i ) m ˆ β ( i ) t,m . The empirical truncated multivariate Karhunen-Lo`eve representation with the first optimal N -dimensional approximations to Y ( i ) t ( x ) is ˆ Y ( i ) t ( x ) = ˆ µ ( i ) ( x ) + N (cid:88) n =1 ˆ ρ t,n ˆ ψ ( i ) n ( x ) , ∀ x ∈ X , where ˆ µ ( i ) ( x ) = T (cid:80) Tt =1 Y ( i ) t ( x ) , and the estimated multivariate principal component score ˆ ρ t,n gives the individual weight of each observation unit t for its corresponding estimated multivari-ate eigenfunction ˆ ψ ( i ) n ( x ) . 11 Methodology
In this section we firstly introduce our new model, namely weighted MFPCA (wMFPCA) modelfor forecasting mortality rates of several subpopulations within a large population simultane-ously.Let Y ( i ) t ( x ) denote the log of the observed mortality rates of the i -th subpopulation for age x in year t . We assume there is an underlying L -continuous smooth function f ( i ) t ( x ) that we areobserving with error and at discrete points of x . Our discrete observations are { x j , Y ( i ) t ( x j ) } ,for i = 1 , . . . , p, t = 1 , · · · , T, j = 1 , · · · , J , such that Y ( i ) t ( x j ) = f ( i ) t ( x j ) + σ ( i ) t ( x j ) e ( i ) t,j , where { e ( i ) t,j } T,Jt,j =1 are i.i.d. standard normal random variables, and σ ( i ) t ( x j ) allows the amount ofnoise varying with age x .In demographic modelling, it is often the case that more recent experience has greater rele-vance on the future behaviour than those data from the distant past. In view of this, we comprisea weighted functional component algorithm for the MFPCA model, allowing the forecasting re-sults of the model to be based more on the recent data.Let ˆ f ( i ) t ( x ) be a smoothed function estimated from the observation Y ( i ) t ( x j ) , and w t = κ (1 − κ ) T − t be a geometrically decaying weight with < κ < . The overall mean function µ ( i ) ( x ) of Y ( i ) t ( x ) is estimated by the weighted average ˆ µ ( i ) ( x ) = T (cid:88) t =1 w t ˆ f ( i ) t ( x ) . The mean-centred functional data is denoted as ˆ f ∗ ( i ) t ( x ) = ˆ f ( i ) t ( x ) − µ ( i ) ( x ) . We discretise ˆ f ∗ ( i ) t ( x ) as a T by J matrix ˆ F ( i ) ∗ , then multiply ˆ F ( i ) ∗ by W , where W = diag ( w , · · · , w T ) ,12uch that ˆ F ( i ) = W ˆ F ( i ) ∗ . We then follow the algorithm of estimating MFPCA introduced inthe previous section to calculate the weighted principal component scores and the weighted mul-tivariate eigenfunctions using the functional form of ˆ F ( i ) to obtain ˆ F ( i ) t ( x ) = (cid:80) Nn =1 ˆ ρ t,n ˆ ψ ( i ) n ( x ) up to the first optimal N -dimensional approximations. We lastly combine the estimated weightedaverage with the estimated weighted multivariate eigenfunctions and the estimated multivari-ate weighted principal component scores to obtain the weighted MFPCA model for mortalitymodelling and forecasting of the i -th subpopulation with first optimal N -dimensional approxi-mations, i.e. ˆ Y ( i ) t ( x ) = ˆ µ ( i ) ( x ) + N (cid:88) n =1 ˆ ρ t,n ˆ ψ ( i ) n ( x ) + ˆ σ ( i ) t ( x )ˆ e ( i ) t . Forecasts can be obtained by forecasting the weighted principal component scores { ˆ ρ t,n } Nn =1 using time series models independently. There is no need to consider the vector autoregression(VAR) model for forecasting the weighted principal component scores as they are not correlated,see Equation (2.7). { ˆ ρ t,n } Nn =1 can be extrapolated using possibly non-stationary autoregressiveintegrated moving average (ARIMA) model, and we can select the order of the ARIMA modelbased on the Akaike information criterion (AIC) or the Bayesian information criterion (BIC).Let ˆ ρ t + h,n denote the h -step ahead forecast of ˆ ρ t,n , then the h -step ahead out-of-sampleforecast of ˆ Y ( i ) t ( x ) is ˆ Y ( i ) t + h ( x ) = ˆ µ ( i ) ( x ) + N (cid:88) n =1 ˆ ρ t + h,n ˆ ψ ( i ) n ( x ) . We can also obtain the forecasting variance of the model by adding up the variances of all termstogether given the fact that the components of the wMFPCA model are uncorrelated, such thatVar ( ˆ Y ( i ) t + h ( x )) = ˆ τ µ ( i ) ( x ) + N (cid:88) n =1 ˆ ν t + h,n ( ˆ ψ ( i ) n ( x )) + (ˆ σ ( i ) t + h ( x )) , ˆ τ µ ( i ) ( x ) is the variance of the smoothed estimates of the mean function derived fromthe smoothing method applied, ˆ ν t + h,n is the estimated variance of ˆ ρ t + h,n that can be obtainedfrom the time series method used, and the estimated variance of forecast error (ˆ σ ( i ) t + h ( x )) iscalculated by taking the average of observational variance from the historical data.With the normality assumption on the model error and the known Var ( ˆ Y ( i ) t + h ( x )) , a − α )% prediction interval for ˆ Y ( i ) t + h ( x ) can be calculated as ˆ Y ( i ) t + h ( x ) ± z α (cid:113) Var ( ˆ Y ( i ) t + h ( x )) , where z α is the (1 − α/ quantile of the standard normal distribution. We now introduce the idea of the coherent wMFPCA model, in the sense that the long termforecasts of several subpopulations within a large population will be non-divergent.Let Y ( i ) t ( x ) be the log of the observed mortality rates of the i -th subpopulation for age x inyear t , { e ( i ) t } Tt =1 are the i.i.d. standard normal random variables, and σ ( i ) t ( x ) allows the amountof noise varying with age x . The coherent wMFPCA model has the following form: Y ( i ) t ( x ) = f ( i ) t ( x ) + σ ( i ) t ( x ) e ( i ) t , where f ( i ) t ( x ) = µ ( x ) + η ( i ) ( x ) + G t ( x ) + Z ( i ) t ( x ) .f ( i ) t ( x ) is the smoothed mortality function of the i -th subpopulation for age x in year t , µ ( x ) is the average of total mortality function, η ( i ) ( x ) is the mean of the i -th subpopulation specificdeviation function from the averaged total mortality function, G t ( x ) is the common trend acrossall populations, and Z ( i ) t ( x ) is the i -th subpopulation specific deviation trend.In such a model, µ ( x ) and η ( i ) ( x ) are unknown fixed functions, while G t ( x ) and Z ( i ) t ( x ) areassumed to be independent zero mean stochastic processes to ensure identifiability (Shang et14l., 2016), such that G t ( x ) and Z ( i ) t ( x ) can then be decomposed by the (multivariate) Karhunen-Lo`eve representation as G t ( x ) = ∞ (cid:88) k =1 β t,k φ k ( x ) ,Z ( i ) t ( x ) = ∞ (cid:88) l =1 γ t,l ϕ ( i ) l ( x ) , where { β t,k } ∞ k =1 and { φ k ( x ) } ∞ k =1 are the corresponding principal component scores and theeigenfunctions of G t ( x ) while { γ t,l } ∞ l =1 and { ϕ ( i ) l ( x ) } ∞ l =1 are the corresponding multivariateprincipal component scores and the multivariate eigenfunctions of Z ( i ) t ( x ) . It follows that { β t,k } ∞ k =1 is uncorrelated with { γ t,l } ∞ l =1 . Following these expansions, the model can be ex-pressed as f ( i ) t ( x ) = µ ( x ) + η ( i ) ( x ) + ∞ (cid:88) k =1 β t,k φ k ( x ) + ∞ (cid:88) l =1 γ t,l ϕ ( i ) l ( x ) . We carry on the same weighted functional component algorithm applied in the wMFPCA modelfor the coherent wMFPCA model. The components of the coherent wMFPCA model can beobtained using the estimation procedures below in practice:1. Obtain the total mortality function among all subpopulations smoothed mortality func-tions, ˆ g t ( x ) = p (cid:80) pi =1 ˆ f ( i ) t ( x ) , then calculate the weighted mean function of the totalmortality function, ˆ µ ( x ) = (cid:80) Tt =1 w t ˆ g t ( x ) , where w t = κ (1 − κ ) T − t is a geometricallydecaying weight with < κ < .2. Calculate the centered functional data ˆ g ∗ t ( x ) = ˆ g t ( x ) − ˆ µ ( x ) , then discretise ˆ g t ( x ) as a T by J matrix G ∗ , then multiply G ∗ by W , where W = diag ( w , · · · , w T ) , such that ˆ G = W G ∗ . 15. Perform univariate FPCA on the functional form of ˆ G to get ˆ G t ( x ) = (cid:80) Kk =1 ˆ β t,k ˆ φ k ( x ) upto the first optimal K -dimensional approximations. Let ˜ g t ( x ) = ˆ µ ( x ) + (cid:80) Kk =1 ˆ β t,k ˆ φ k ( x ) be the estimated weighted total mortality function.4. Calculate the deviation of the i -th subpopulation specific mortality function from the es-timated weighted total mortality function, ˆ d ( i ) t ( x ) = ˆ f ( i ) t ( x ) − ˜ g t ( x ) , then calculate theweighted mean of the subpopulation specific deviation function, ˆ η ( i ) ( x ) = (cid:80) Tt =1 w t ˆ d ( i ) t ( x ) .5. Obtain the demeaned functional data ˆ z ( i ) ∗ t ( x ) = ˆ d ( i ) t ( x ) − ˆ η ( i ) ( x ) , then discretise ˆ z ( i ) ∗ t ( x ) as a T by J matrix ˆ Z ( i ) ∗ , then multiply ˆ Z ( i ) ∗ by W , where W = diag ( w , · · · , w T ) , tohave ˆ Z ( i ) = W ˆ Z ( i ) ∗ .6. Perform multivariate FPCA on the functional form of ˆ Z ( i ) to obtain ˆ Z ( i ) t ( x ) = (cid:80) Ll =1 ˆ γ t,l ˆ ϕ ( i ) l ( x ) up to the first optimal L -dimensional approximations.With all the estimated components obtained above, we can represent the coherent wMFPCAmodel for mortality modelling and forecasting of the i -th subpopulation, i.e. ˆ Y ( i ) t ( x ) = ˆ µ ( x ) + ˆ η ( i ) ( x ) + ˆ G t ( x ) + ˆ Z ( i ) t ( x ) + ˆ σ ( i ) t ( x )ˆ e ( i ) t , or the full representation of the coherent wMFPCA model with the first optimal K -dimensionalapproximations to the common trend and the first optimal L -dimensional approximations to the i -th subpopulation deviation trend, such that ˆ Y ( i ) t ( x ) = ˆ µ ( i ) ( x ) + K (cid:88) k =1 ˆ β t,k ˆ φ k ( x ) + L (cid:88) l =1 ˆ γ t,l ˆ ϕ ( i ) l ( x ) + ˆ σ ( i ) t ( x )ˆ e ( i ) t , where ˆ µ ( i ) ( x ) = ˆ µ ( x ) + ˆ η ( i ) ( x ) is the mean function of the i -th subpopulation.16 .2.2 Out-of-sample forecasts and prediction intervals of the coherent wMFPCA model The h -step ahead out-of-sample forecast of Y ( i ) t ( x ) can be represented as ˆ Y ( i ) t + h ( x ) = ˆ µ ( i ) ( x ) + K (cid:88) k =1 ˆ β t + h,k ˆ φ k ( x ) + L (cid:88) l =1 ˆ γ t + h,l ˆ ϕ ( i ) l ( x ) , where ˆ β t + h,k and ˆ γ t + h,l are the h -step ahead forecasts of the weighted principal componentscores of the common trend and the i -th subpopulation specific deviation trend. ˆ β t + h,k can beobtained using a univariate time series forecasting method, such as ARIMA model. To ensurethe predictions of the subpopulations are coherent in the long term, the forecasts of all subpopu-lation deviation trends need to be restricted to be convergent and a stationary process, such that lim h →∞ L (cid:88) l =1 (cid:0) ˆ γ t + h,l ˆ ϕ ( i ) l ( x ) − ˆ γ t + h,l ˆ ϕ ( j ) l ( x ) (cid:1) = 0 . ˆ γ t + h,k can hence be achieved using possibly anystationary autoregressive moving average (ARMA) process or autoregressive fractionally inte-grated moving average (ARFIMA) process. The order of the aforementioned time series modelscan be decided based on the Akaike information criterion (AIC) or the Bayesian informationcriterion (BIC).Given the way that the coherent wMFPCA model has been constructed, each component isindependent to the other components. Therefore, the forecast variance can be expressed by thesum of component variances, i.e.Var ( ˆ Y ( i ) t + h ( x )) = ˆ τ µ ( i ) ( x ) + K (cid:88) k =1 ˆ u t + h,k ( ˆ φ k ( x )) + L (cid:88) l =1 ˆ ω t + h,l ( ˆ ϕ ( i ) l ( x )) + (ˆ σ ( i ) t + h ( x )) , where ˆ τ µ ( i ) ( x ) is the variance of the smoothed estimates of the mean function derived fromthe smoothing method used, ˆ u t + h,k and ˆ ω t + h,l are the variances of ˆ β t + h,k and ˆ γ t + h,l that can beobtained from the time series methods applied, and the forecast error (ˆ σ ( i ) t + h ( x )) is the averageof the observational variance estimated from the historical data.With the normality assumption on the model error and the known Var ( ˆ Y ( i ) t + h ( x )) , a − α )% prediction interval for ˆ Y ( i ) t + h ( x ) can be calculated as ˆ Y ( i ) t + h ( x ) ± z α (cid:113) Var ( ˆ Y ( i ) t + h ( x )) , where17 α is the (1 − α/ quantile of the standard normal distribution.Note that the weights { w t } Tt =1 are controlled by the tuning parameter κ in the geometricallydecaying weighting approach embedded in the two proposed models. The larger κ is, the fasterthe weights for the historical observations are decaying over time geometrically. In practice,the tuning parameter κ can be determined by minimising the average root mean square error(RMSE) of all populations defined asRMSE = 1 p p (cid:88) i =1 (cid:118)(cid:117)(cid:117)(cid:116) J J (cid:88) j =1 (cid:18) Y ( i ) t + h ( x j ) − ˆ Y ( i ) t + h ( x j ) (cid:19) . (3.1)The value of the parameter κ can alternatively be specified as a prior , if there is a strong priorknowledge of how past data should be weighted (Wu & Wang, 2019).For selecting the optimal number of principal components in the two proposed models, weuse a cumulative percentage of total variation method. We denote N as a generic notation ofthe optimal number of principal components, and N is determined by N = argmin N : N ≥ (cid:18) (cid:80) Nn =1 ˆ λ n (cid:80) ∞ n =1 ˆ λ n ≥ P (cid:19) where ˆ λ n is the corresponding estimated eigenvalue of the principal components analysis, and P ≥ In this section, we illustrate the two proposed models − the wMFPCA model and the coher-ent wMFPCA model using sex-specific mortality data. We first present and plot the observedmortality dataset, then demonstrate the usefulness of these two models by forecasting of thesex-specific mortality rates of Japan. We show the forecasting results for males and femalescompared with the observed data. We further exhibit the ability of non-diverging long termforecasts of the proposed coherent wMFPCA model and finally assess the forecasting accuracy18f the two proposed models in comparison to the Product-Ratio model and the independentFPCA model using the sex-specific mortality data of ten different developed countries. The sex-specific mortality rates data of Japan are available for the year 1947 to the year 2016from the Human Mortality Database (2020). The database consists of central death rates bygender and single calendar year of age up to 110 years old. We restrict the data at the maxi-mum age of 100 to avoid problems associated with erratic rates above age 100. The observedmortality rates curves are smoothed using penalised regression splines with a partial monotonicconstraint so that each mortality curve is increasing above age 65 monotonically (Hyndman &Ullah, 2007). Figure 4.1 shows the log mortality rates for males and females for selected agegroups as univariate time series. Figure 4.2 and Figure 4.3 present the same set of data as a batchof observed and smoothed curves (functional observation) in rainbow plots with time-orderingindicated by the colours of the rainbow, from red to violet. Figure 4.2 and Figure 4.3 both showthat there are steady declines in male and female mortality rates at most ages over our examinedperiod. The mortality curve patterns for male and female are reasonably similar, while for male,the mortality rates are generally higher than the mortality rates of female, particularly at aroundage 20. Despite the higher male mortality rates in comparison with female’s, the mortality gapbetween male’s and female’s gets narrower over time at older ages.
In the demonstration of the first proposed weighted MFPCA model, we aim to make 20-years-ahead out-of-sample forecasts for male and female mortality rates of Japan. We first split thedataset with the observed mortality rates from the year 1947 to the year 1996 and a test datasetwith remaining observed mortality data from the year 1997 to the year 2016. We choose the19eight parameter κ in the wMFPCA model by minimising the average root mean square er-ror (RMSE) stated in Equation (3.1) based on rolling-window analyses. For example, in ourdemonstration, we aim to make a 20-years horizon prediction for the male and female mortalityin Japan for the year 2016. We firstly use the data from the year 1947 to the year 1967 to makea 20-years horizon prediction for the mortality rates in the year 1987 then compute the RMSEwith the observed data for the first rolling-window analysis. We subsequently use the data fromthe year 1947 to the year 1968 to make the prediction for the mortality rates in the year 1988and compute RMSE with the observed data again. We repeat this procedure until it reachesa point where it uses the data from the year 1947 to the year 1976 to make a forecast for theendpoint of the observed data in the year 1996. We decide the value of the weight parameterover the interval < κ < that can minimise the average RMSE of male and female mortalitydata among these ten rolling-window tests.The mean functions for male and female and their functional principal components are es-timated as discussed in the previous section. The analysis shows that the first three functionalprincipal components for male and female explain 97.2%, 2.3% and 0.2% respectively, andaccount for more than 99% in total of the variations in the sample data. We, therefore, se-lect the first three estimated principal components for approximations and demonstrations. Foreach score of the corresponding functional principal component shared by male and female, weforecast it independently by a univariate ARIMA time series using the R package ‘ forecast ’(Hyndman et al., 2007). The order of ARIMA models is chosen based on the Akaike informa-tion criterion (AIC).The top panels of Figure 4.4 demonstrate the estimated mean functions ˆ µ ( i ) ( x ) and thefirst three estimated functional principal components { ˆ ψ ( i ) n ( x ) } for male ( i = M) and female ( i = F). Their corresponding scores of the PCs { ˆ ρ t,n } n =1 with a 20-years-ahead out-of-sampleforecast horizon and 95% prediction intervals are displayed in the bottom panel of Figure 4.4.20he functional principal components model different movements in mortality rates. The firstfunctional principal components for male and female show a similar pattern, which both countfor very high variation for young teenagers, then become level-off in middle-age and elderly.Although the corresponding scores for the first component show an increasing trend, we caninterpret that there is a decreasing trend in mortality rates across our observed period given itscorresponding first components for male and female are both negative. The second and the thirdprincipal components for male have relatively high negative variation among younger ages, butthey get less oscillated in the ranges of middle-age and elderly. While the second and the thirdprincipal components for female show kinds of the opposite direction of each other.Figure 4.5 presents the residual plots by year and by age of the wMFPCA model fitted bygender in Japan. As we have applied the weighting approach for placing more recent data intoconsiderations, both the residual plots by year for male and female, have a funnel shape. Theseindicate that the fitted wMFPCA model captures more information from the recent data andleave the distant past data with less impact. The residual plots by age for male and female showthat the fitted values of the models are underestimated for the young age groups and slightlyoverestimated for the elderly groups. These may be due to the higher mortality improvementrates for the young group and the lower mortality improvement rates for the elderly group overour fitting period. Figure 4.6 shows the 20-years-ahead forecasting results of mortality curvesof male (with RMSE = 0.2006) and female (with RMSE = 0.2406) from age 0 to age 100 forthe year 2016 by the wMFPCA model based on the in-sample data from the year 1947 to theyear 1996 in Japan. The presentation of the coherent wMFPCA model forecasting follows the same strategies ofhow we split the dataset for in- and out-of-sample data and choosing the weight parameter for21he coherent wMFPCA model as we have done for the wMFPCA model in the previous section.In the top panels of Figure 4.7, we display the estimated overall mean function ˆ µ ( x ) and thefirst three estimated functional principal components { ˆ φ k ( x ) } k =1 . The corresponding scores ofthe first three estimated functional principal components { ˆ β t,k } k =1 are presented along with 20-years-ahead out-of-sample forecast means and 95% confidence intervals by ARIMA models inthe bottom panel of Figure 4.7. The first three common trend functional principal componentscapture 98%, 1.7% and 0.2% variations respectively and more than 99% of the variations in theage-specific total mortality overall among this dataset. Each functional principal componentmodels different movements in mortality rates. ˆ φ ( x ) primarily models the degree of variationsin mortality among different age groups as we expect that the variance of mortality rates inyoung age groups is relatively larger than elderly groups. In contrast, ˆ φ ( x ) and ˆ φ ( x ) modelthe young adult age below 40 and differences between late teen and those over 60. From the firstforecast common principal component scores, the declines of mortality rates seem to continue,and we do not spot any apparent pattern from the second and third estimated common principalcomponent scores.In the top panels of Figure 4.8, we plot the estimated male and female deviation functionsfrom the overall mean. It is obvious that the estimated male deviation function ˆ η M ( x ) is apositive function, while the female estimated deviation function ˆ η F ( x ) covers whole negativerange over all ages. These indicate that the male mortality rates are in general higher thanfemale’s, and the difference reaches its peak at around age 20. All estimated male and femaledeviation functions from the overall mean behave like an upside-down reflection of each otherand demonstrate completely reverse patterns. The bottom panels of Figure 4.8 show the firstthree estimated deviation trend functional principal components for male and female. From theforecasts of the three shared principal component scores among two genders, they seem to havea flat zero-convergent trend. It is therefore likely to achieve non-divergent forecasts between22ale and female subpopulations in the long term, in which we will further examine this in detailin the next section.Figure 4.9 shows the residual plots by year and by age of the fitted coherent wMFPCAmodel for male and female mortality of Japan. Compared to the residual patterns of the fittedwMFPCA model used in the previous section, the fitted coherent wMFPCA model seems tocapture mortality information evenly over the whole observed period, given that both residualplots demonstrate i.i.d. random variables patterns with zero mean and equal variance across allthe age-groups over our examined period. Figure 4.10 shows the 20-years-ahead forecastingresults of mortality curves of male (with RMSE = 0.1821) and female (with RMSE = 0.1460)from age 0 to age 100 for the year 2016 using the coherent wMFPCA model based on thein-sample data from the year 1947 to the year 1996 in Japan. In this section, we move to examine and compare forecast patterns with mortality sex ratiosand life expectancy by the forecasts of the two proposed models − the wMFPCA model and thecoherent wMFPCA model with two existing models − the independent FPCA model (Hyndman& Shang, 2009) and the Product-Ratio model (Hyndman et al., 2013).The independent FPCA model is a univariate FPCA method for forecasting two subpopu-lation groups independently without considerations of any potential correlation of two subpop-ulation groups. Similar to the wMFPCA model, the forecast results of the independent FPCAmodel are based on a non-stationary time series model on its estimated principal componentscores, leading to forecast results of two or more subpopulations divergent to different direc-tions in the long run. It is thus regarded as a non-coherent forecasting approach. Meanwhile,the Product-Ratio model begins with an idea of obtaining the product and ratio function of23ll subpopulations by assuming all subpopulations have equal variance. In the log scale, theproduct function can be treated as the sum of all sub populations, whereas the ratio functioncan be treated as the differences among subpopulations. The predictions can be obtained byfirstly applying the independent FPCA model to forecast the future realisations of the productand ratio functions separately, then transforming the forecasts of the product and ratio functionsback to the original subpopulations functions. The convergent forecasts are achieved by usingstationary time series methods, namely the ARMA model or the ARFIMA model, on the ratiofunction, which implicitly implies that the differences among subpopulations will be convergentto zero in the long term. It is therefore viewed as an example of coherent forecasting approach.In a similar vein, the proposed coherent wMFPCA model restricts the stationary properties onthe deviation functions of each subpopulation from the overall mean to accomplish the coherentforecasting with no need to assume all the subpopulations have the same variances.To deliver the concept of coherent forecasting more concretely, we plot the historical mor-tality sex ratios (Male/Female) and the life expectancy curves obtained from the observed maleand female mortality rates from the year 1997 to the year 2016 alongside the 20-years-aheadforecasts of the mortality sex ratios and the life expectancy curves from the year 1997 to theyear 2016 by the non-coherent forecast methods − the independent FPCA model and the wMF-PCA model and the coherent forecast methods − the Product-Ratio model and the coherentwMFPCA model using the observed mortality rates from the year 1947 to the year 1996 in Fig-ure 4.11 and Figure 4.12. We can see that all the coherent forecasting methods exhibit a fairlyone-to-one smooth pattern with the actual mortality sex ratio, and with much less fluctuationthan the non-coherent forecast methods in Figure 4.11. The convergent forecastings by the co-herent models also fit well with the actual biological characteristics trends in Figure 4.12, wherethe differences in males and females life expectancy converge to a certain level gradually andslowly, instead of diverging into different directions like the forecast results of the non-coherent24orecast methods. Our demonstrations prove the importance of coherent modelling when thereexist common biological characteristics among several subpopulations. We now evaluate and compare the forecast accuracy of our two proposed models − the wMPFCAmodel and the coherent wMFPCA model with the two existing models − the independent FPCAmodel and the Product-Ratio model demonstrated in the previous section. In order to have acomprehensive investigation of the forecast accuracy of our two proposed models, we con-sider ten other developed countries for which data are also available in the Human MortalityDatabase. We restrict data periods of all selected countries commencing in the year 1947 up tothe year 2016 (70 years in total) for a unified purpose. We examine and quantify the forecastingperformance of our models by a rolling window analysis, which is frequently used for assess-ing the consistency of a model’s forecasting ability by rolling a fixed size prediction interval(window) throughout the observed sample (Zivot & Wang, 2007). More generally speaking, wehold the sample data from the initial year up to the year t as holdout samples. We produce theforecast for the t + h year where h is the forecast horizon, then determine the forecasts errorsby comparing the forecast result with the actual out-of-sample data. We increase one rollingwindow (1 year ahead) in year t + 1 to make the same procedure again for the year t + h + 1 until the rolling window analysis covers all available data.We include four different forecast horizons ( h = 5 , , and with ten sets of rolling-window to exam the short-term, the mid-term and the long-term forecast abilities of the pro-posed two models. We use the root mean square error (RMSE) to measure the standard devia-tion of the average square prediction error regardless of sign. In our experiments, we define the25MSE as follows:RMSE ( i ) c ( h ) = (cid:118)(cid:117)(cid:117)(cid:116) × (cid:88) w =0 101 (cid:88) j =1 (cid:18) Y ( i ) t + w + h ( x j ) − ˆ Y ( i ) t + w + h ( x j ) (cid:19) , where c is the selected country, w is the rolling-window set, i is the subpopulation for male ( i = M) and for female ( i = F) and j is the age group including from age 0 to age 100 in ourexperiment.Based on the average RMSE results of ten sets of rolling window analysis across ten coun-tries in four different forecast horizons presented in Table 4.1, the forecast performances ofthe wMFPCA model and the Product-Ratio model are reasonably similar with no particularoutstanding area. The independent FPCA model performs consistently the most desirable forforecasting female mortality. Small variabilities of female mortality across all observed coun-tries over age and time may mainly contribute to the superiority of the independent FPCA modelover the others in our experiments. Meanwhile, the proposed coherent wMFPCA method per-forms the best in terms of having the least male forecast errors. The coherent wMFPCA methodseems to be capable of capturing rapid changes in male mortality across different periods andage groups in many tested countries.When we consider the forecast horizon size up to fifteen or twenty years, we can see that thecoherent models maintain relatively less forecast bias among two sexes than the non-coherentmodels. For instance, the independent FPCA model produces comparatively large forecastingerrors for male mortality and gives small forecast inaccuracies for female death rates. In con-trast, the coherent models can keep the same level of forecast errors for both. With assumedjoint biological characteristics among the two genders that we discussed in the previous sec-tion, the mortality pattern among two sexes is supposed to get similar in the long run, and theconvergent designed forecast model is therefore needed. In particular, the coherent wMFPCAmodel is proved to be more suitable and accurate than the Product-Ratio model as it produces26he smallest overall forecast errors and bias for both genders in the long-term prediction in ourstudy.The main finding in this section is that in the two-sex case, the accuracy of the male forecastis significantly improved by the coherent wMFPCA model at the small expense of accuracy infemale mortality forecasts. By adopting the coherent wMFPCA model, the forecast accuracyamong all subpopulations is homogeneous as it incorporates additional information into theforecast for a single subpopulation. The additional information acts as a frame of referencelimiting to the probability of a subpopulation forecast which may continue a diverging trendfrom other related subpopulations directions. This feature of the coherent wMFPCA model isalso useful in some other specific practical applications, such as financial planning with severalrelated stock prices, in a situation that we aim to maintain a balanced error margin amongst allsubpopulations. This speciality is unique and has not been achieved by other non-coherent orsingle population models. 27 a) Male(b) Female Figure 4.1: Log mortality rates for (a) male and (b) female from age 0 to age 100 with 20-yearage intervals from the year 1947 to the year 2016 in Japan.28 a) Observed(b) Smoothed
Figure 4.2: (a) Observed and (b) smoothed log mortality rates for male from the year 1947 tothe year 2016 in Japan, viewed as functional data curves with time-ordering indicated by thecolours of the rainbow from red to violet. 29 a) Observed(b) Smoothed
Figure 4.3: (a) Observed and (b) smoothed log mortality rates for female from the year 1947to the year 2016 in Japan, viewed as functional data curves with time-ordering indicated by thecolours of the rainbow from red to violet. 30 a) Male(b) Female
Figure 4.4: Estimated mean functions and the first three functional principal components for (a)male and (b) female mortality of Japan. 31 c) PC scores for male and female
Figure 4.4: ( cont. ) (c) Corresponding estimated PC scores with 20-years-ahead forecast meanswith 95% prediction intervals (in grey). 32 a) Male(b) Female
Figure 4.5: Residual plots by year and by age of the fitted sex-specific mortality rates of Japanfor (a) male and (b) female via the wMFPCA model.33 a) Male(b) Female
Figure 4.6: Fitted mortality curves for (a) male (with RMSE = 0.2006) and for (b) female (withRMSE = 0.2406) from age 0 to age 100 with 95% confidence intervals using the wMFPCAmodel for the year 2016 based on the observations from the year 1947 to the year 1996 inJapan. Circles are the true log mortality rates, solid lines are the predictive means, and dashedlines are the 95% confidence intervals. 34igure 4.7: Estimated overall mean function and the first three estimated common trend func-tional principal components for the total mortality rates of Japan.35igure 4.7: ( cont. ) Corresponding PC scores of the estimated common trend with 20-years-ahead forecast means with 95% confidence intervals (in grey).36 a) Male deviation from the overall mean function(b) Female deviation from the overall mean function
Figure 4.8: Estimated deviation functions from the overall mean function and the first threeestimated deviation trend functional principal components for (a) male and (b) female mortalityrates of Japan. 37igure 4.8: ( cont. ) (c) Corresponding PC scores of the estimated deviation trend with 20-years-ahead forecast means with 95% confidence intervals (in grey).38 a) Male(b) Female
Figure 4.9: Residual plots by year and by age of the fitted sex-specific mortality rates of Japanfor (a) male and (b) female via the coherent wMFPCA model.39 a) Male(b) Female
Figure 4.10: Predicted mortality curves of male (with RMSE = 0.1821) and female (with RMSE= 0.1460) from age 0 to age 100 with 95% confidence intervals using the coherent wMFPCAmodel for the year 2016 based on the observations from the year 1947 to the year 1996 in Japan.Circles are the true log mortality rates, solid lines are the predictive means and dashed lines arethe 95% confidence intervals. 40 a) Historical data(b) Independent FPCA model (c) wMFPCA model(d) Product-Ratio model (e) Coherent wMFPCA model
Figure 4.11: Historical Japanese mortality sex ratio and 20-years-ahead forecasts of mortalitysex ratios in Japan from the year 1997 to the year 2016 using the independent FPCA model, thewMFPCA model, the Product-Ratio model and the coherent wMFPCA model.41 a) Independent FPCA model (b) wMFPCA model(c) Product-Ratio model (d) Coherent wMFPCA model
Figure 4.12: 20-year life expectancy predicted curves for male and female in Japan using theindependent FPCA model, the wMFPCA model, the Product-Ratio model and the coherentwMFPCA model. Blue sold line is used for male and red sold line is used for female. Dottedlines are the observed life expectancy for males and females.42 oun t r y I nd e p e nd e n t FP C A m od e l w M FP C A m od e l P r odu c t - R a ti o m od e l C oh e r e n t w M FP C A m od e l M F M + F M F M + F M F M + F M F M + F h = A u s t r a li a . . . . . . . . . . . . B e l g i u m . . . . . . . . . . . . C a n a d a . . . . . . . . . . . . F r a n ce . . . . . . . . . . . . I t a l y0 . . . . . . . . . . . . J a p a n0 . . . . . . . . . . . . N e t h e r l a nd s . . . . . . . . . . . . S p a i n0 . . . . . . . . . . . . U . K . . . . . . . . . . . . U . S . A . . . . . . . . . . . . A v e r a g e . . . . . . . . . . . . h = A u s t r a li a . . . . . . . . . . . . B e l g i u m . . . . . . . . . . . . C a n a d a . . . . . . . . . . . . F r a n ce . . . . . . . . . . . . I t a l y0 . . . . . . . . . . . . J a p a n0 . . . . . . . . . . . . N e t h e r l a nd s . . . . . . . . . . . . S p a i n0 . . . . . . . . . . . . U . K . . . . . . . . . . . . U . S . A . . . . . . . . . . . . A v e r a g e . . . . . . . . . . . . T a b l e . : F o r eca s t acc u r ac yo f m o r t a lit y f o r m a l ea nd f e m a l e u s i ng t h e i nd e p e nd e n t FP C A m od e l , t h e w M FP C A m od e l , t h e P r odu c t - R a ti o m od e l a nd t h ec oh e r e n t w M PF C A m od e li s m ea s u r e dby t h ea v e r a g e R M S E s o f t e n s e t s o fr o lli ng - w i ndo w s a n a l y s i s . T h e m i n i m a l f o r eca s t e rr o r s a r e h i gh li gh t e d , a nd t h e l o w e s t a v e r a g e d f o r eca s t e rr o r s a m ong m od e l s i nd i ff e r e n t f o r eca s t ho r i z on s a r e h i gh li gh t e d i nbo l d . oun t r y I nd e p e nd e n t FP C A m od e l w M FP C A m od e l P r odu c t - R a ti o m od e l C oh e r e n t w M FP C A m od e l M F M + F M F M + F M F M + F M F M + F h = A u s t r a li a . . . . . . . . . . . . B e l g i u m . . . . . . . . . . . . C a n a d a . . . . . . . . . . . . F r a n ce . . . . . . . . . . . . I t a l y0 . . . . . . . . . . . . J a p a n0 . . . . . . . . . . . . N e t h e r l a nd s . . . . . . . . . . . . S p a i n0 . . . . . . . . . . . . U . K . . . . . . . . . . . . U . S . A . . . . . . . . . . . . A v e r a g e . . . . . . . . . . . . h = A u s t r a li a . . . . . . . . . . . . B e l g i u m . . . . . . . . . . . . C a n a d a . . . . . . . . . . . . F r a n ce . . . . . . . . . . . . I t a l y0 . . . . . . . . . . . . J a p a n0 . . . . . . . . . . . . N e t h e r l a nd s . . . . . . . . . . . . S p a i n0 . . . . . . . . . . . . U . K . . . . . . . . . . . . U . S . A . . . . . . . . . . . . A v e r a g e . . . . . . . . . . . . T a b l e . : ( c on t . ) F o r eca s t acc u r ac yo f m o r t a lit y f o r m a l ea nd f e m a l e u s i ng t h e i nd e p e nd e n t FP C A m od e l , t h e w M FP C A m od e l , t h e P r odu c t - R a ti o m od e l a nd t h ec oh e r e n t w M PF C A m od e li s m ea s u r e dby t h ea v e r a g e R M S E s o f t e n s e t s o fr o lli ng - w i ndo w s a n a l y s i s . T h e m i n i m a l f o r eca s t e rr o r s a r e h i gh li gh t e d , a nd t h e l o w e s t a v e r a g e d f o r eca s t e rr o r s a m ong m od e l s i n d i ff e r e n t f o r eca s t ho r i z on s a r e h i gh li gh t e d i nbo l d . Discussion and conclusion remarks
With the theoretical framework of multivariate functional principal component analysis mo-tivated by Chiou et al. (2016) and Happ & Greven (2018), in this paper, we have proposedtwo new models that aim to model and forecast for a group of mortality rates, taking advan-tages of commonalities in their historical experience and age patterns. The first one, namely aswMFPCA model, is introduced to acknowledge differences in groups, age patterns and trendsof several subpopulations to model together when subpopulations have somewhat sufficientlysimilar historical patterns. The coherent wMFPCA model is a novel extension of the wMFPCAmodel in a coherent direction. We design the coherent structure of the model to primarily ful-fil the idea that when several subpopulation groups have similar socio-economic conditions orcommon biological characteristics and such these close connections are expected to continueand evolve in a non-diverging fashion in the distant future. The time weightings approaches onthese two models lead us to expect the future patterns of mortality follow more likely recentpast observations and obliterate some parts of irrelevant distant past mortality movements infavour of forecast performances of the two proposed models.We have demonstrated the two models through forecasting for sex-specific mortality withthe observed data from Japan. The wMFPCA model consists of the mean functions and thefunctional principal components of each subpopulation with corresponding scores shared byall subpopulations. We can obtain the forecasts of the wMFPCA model by extrapolating theshared principal component scores ahead with any non-stationary time series model, such asARIMA model in our example. Whereas there are two primary components in the coherentwMFPCA model, including the average components among all subpopulations and the devi-ation components of each subpopulation deviated from the average components. We also usea non-stationary time series model for the predicted average component corresponding scores.45owever, we apply a stationary time series model for the forecast of deviation components cor-responding scores so that the differences among all subpopulations will gradually decline andbe bound to achieve coherence in the long term forecast. The forecasts of mortality sex ratiosand life expectancy cure patterns confirmed the non-divergent forecastability of the proposedcoherent wMFPCA model.The usefulness of the two proposed models is illustrated through a series of forecast accu-racy evaluations and comparisons with other existing methods. The wMFPCA model provides avery flexible framework for multi-population mortality forecasting with comparable forecast ac-curacy as the independent FPCA model and the Product-Ratio model. The coherent wMFPCAmodel outperforms the Product-Ratio model in terms of forecast accuracy with no assumptionsneeded to place on the equal variance of all subpopulations. The coherent wMFPCA modelmaintains the same level in the short term forecast ability as the independent FPCA model.In contrast, the coherent wMFPCA model produces more sensible forecast results with lessforecast error and bias in the two-sex mortality case in the long term.The main limitations of the two proposed models also attribute to the characteristics inwhich they belong to the classes of ‘non-parametric’ or ‘pure extrapolative’ methods. They cancapture trends in the historical data well. At the same time, they lack the ability to incorporatemore other related information, such as the change in medical technology, environment andsocial-economy for predictions. Another issue is the compatibility of the wMFPCA models.It requires the sufficient level of homogeneity among subpopulations for forecasting, and theforecast ability of the wMFPCA models may be affected if several completely irrelevant sub-populations are placed together in the wMFPCA models. Although we used two sex-specificmortality rates for our demonstration in this article, it is obviously straightforward to apply thesemodels to other scenarios with multiple related populations.46 eferences
Besse, P. (1992). PCA stability and choice of dimensionality.
Statistics & Probability Letters , (5), 405–410.Booth, H., Maindonald, J., & Smith, L. (2002). Applying Lee-Carter under conditions ofvariable mortality decline. Population Studies , (3), 325–336.Bosq, D. (2012). Linear Processes in Function Spaces: Theory and Applications (Vol. 149).Springer Science & Business Media.Cairns, A. J., Blake, D., & Dowd, K. (2008). Modelling and management of mortality risk: Areview.
Scandinavian Actuarial Journal , (2-3), 79–113.Cairns, A. J., Blake, D., Dowd, K., Coughlan, G. D., & Khalaf-Allah, M. (2011). Bayesianstochastic mortality modelling for two populations. ASTIN Bulletin: The Journal of the IAA , (1), 29–59.Chen, R. Y., & Millossovich, P. (2018). Sex-specific mortality forecasting for UK countries: Acoherent approach. European Actuarial Journal , (1), 69–95.Chiou, J.-M., et al. (2012). Dynamical functional prediction and classification, with applicationto traffic flow prediction. The Annals of Applied Statistics , (4), 1588–1614.Chiou, J.-M., Yang, Y.-F., & Chen, Y.-T. (2016). Multivariate functional linear regression andprediction. Journal of Multivariate Analysis , , 301–312.Danesi, I. L., Haberman, S., & Millossovich, P. (2015). Forecasting mortality in subpopulationsusing Lee-Carter type models: A comparison. Insurance: Mathematics and Economics , ,151–161.Delwarde, A., Denuit, M., Guill´en, M., & Vidiella-i Anguera, A. (2006). Application of thePoisson log-bilinear projection model to the G5 mortality experience. Belgian ActuarialBulletin , (1), 54–68. 47owd, K., Cairns, A. J., Blake, D., Coughlan, G. D., & Khalaf-Allah, M. (2011). A gravitymodel of mortality rates for two related populations. North American Actuarial Journal , (2), 334–356.Enchev, V., Kleinow, T., & Cairns, A. J. (2017). Multi-population mortality models: Fitting,forecasting and comparisons. Scandinavian Actuarial Journal , (4), 319–342.Hall, P., M¨uller, H.-G., & Wang, J.-L. (2006). Properties of principal component methods forfunctional and longitudinal data analysis. The Annals of Statistics , 1493–1517.Happ, C., & Greven, S. (2018). Multivariate functional principal component analysis for dataobserved on different (dimensional) domains.
Journal of the American Statistical Associa-tion , (522), 649–659.Hatzopoulos, P., & Haberman, S. (2013). Common mortality modeling and coherent forecasts.An empirical analysis of worldwide mortality data. Insurance: Mathematics and Economics , (2), 320–337.Human Mortality Database. (2020). University of California, Berkeley (USA), and MaxPlanck Institute for Demographic Research (Germany). Available online: . Accessed on 15 January 2020.Hyndman, R. J., Booth, H., & Yasmeen, F. (2013). Coherent mortality forecasting: TheProduct-Ratio method with functional time series models. Demography , (1), 261–283.Hyndman, R. J., Khandakar, Y., et al. (2007). Automatic Time Series for Forecasting: TheForecast Package for R (No. 6/07). Monash University, Department of Econometrics andBusiness Statistics.Hyndman, R. J., & Shang, H. L. (2009). Forecasting functional time series.
Journal of theKorean Statistical Society , (3), 199–211. 48yndman, R. J., & Ullah, M. S. (2007). Robust forecasting of mortality and fertility rates: Afunctional data approach. Computational Statistics & Data Analysis , (10), 4942–4956.Jarner, S. F., & Kryger, E. M. (2011). Modelling adult mortality in small populations: TheSAINT model. ASTIN Bulletin: The Journal of the IAA , (2), 377–418.Kalben, B. B. (2000). Why men die younger: Causes of mortality differences by sex. NorthAmerican Actuarial Journal , (4), 83–111.Karhunen, K. (1946). Zur spektraltheorie stochastischer prozesse. Annales Academie Scien-tiarum Fennice, Series Ai , .Lee, R., & Carter, L. R. (1992). Modeling and forecasting US mortality. Journal of theAmerican Statistical Association , (419), 659–671.Lee, R., & Miller, T. (2001). Evaluating the performance of the Lee-Carter method for fore-casting mortality. Demography , (4), 537–549.Li, J. (2013). A Poisson common factor model for projecting mortality and life expectancyjointly for females and males. Population Studies , (1), 111–126.Li, J., Tickle, L., & Parr, N. (2016). A multi-population evaluation of the Poisson commonfactor model for projecting mortality jointly for both sexes. Journal of Population Research , (4), 333–360.Li, N., & Lee, R. (2005). Coherent mortality forecasts for a group of populations: An extensionof the Lee-Carter method. Demography , (3), 575–594.Lo`eve, M. (1946). Fonctions al´eatoires `a d´ecomposition orthogonale exponentielle. La RevueScientifique , , 159–162.Ramsay, J. O. (2004). Functional data analysis. Encyclopedia of Statistical Sciences , .49amsay, J. O., & Dalzell, C. (1991). Some tools for functional data analysis. Journal of theRoyal Statistical Society: Series B (Methodological) , (3), 539–561.Ramsay, J. O., & Silverman, B. W. (2007). Applied Functional Data Analysis: Methods andCase Studies . Springer.Renshaw, A. E., & Haberman, S. (2006). A cohort-based extension to the Lee-Carter model formortality reduction factors.
Insurance: Mathematics and Economics , (3), 556–570.Shang, H. L., et al. (2016). Mortality and life expectancy forecasting for a group of populationsin developed countries: A multilevel functional data method. The Annals of Applied Statistics , (3), 1639–1672.Wan, C., & Bertschi, L. (2015). Swiss coherent mortality model as a basis for developinglongevity de-risking solutions for Swiss pension funds: A practical approach. Insurance:Mathematics and Economics , , 66–75.Wu, R., & Wang, B. (2019). Coherent mortality forecasting by the weighted multilevel func-tional principal component approach. Journal of Applied Statistics , (10), 1774–1791.Yao, F., M ¨uller, H.-G., & Wang, J.-L. (2005). Functional data analysis for sparse longitudinaldata. Journal of the American Statistical Association , (470), 577–590.Zemyan, S. M. (2012). The Classical Theory of Integral Equations: A Concise Treatment .Springer Science & Business Media.Zivot, E., & Wang, J. (2007).