# Mortality Forecasting using Factor Models: Time-varying or Time-invariant Factor Loadings?

MMortality Forecasting using Factor Models:Time-varying or Time-invariant Factor Loadings?

Lingyu He , Fei Huang *2 , Jianjie Shi , and Yanrong Yang Hunan University The University of New South Wales Monash University The Australian National University

Abstract

Many existing mortality models follow the framework of classical factor models, suchas the Lee-Carter model and its variants. Latent common factors in factor models are de-ﬁned as time-related mortality indices (such as κ t in the Lee-Carter model). Factor load-ings, which capture the linear relationship between age variables and latent common fac-tors (such as β x in the Lee-Carter model), are assumed to be time-invariant in the classicalframework. This assumption is usually too restrictive in reality as mortality datasets typ-ically span a long period of time. Driving forces such as medical improvement of certaindiseases, environmental changes and technological progress may signiﬁcantly inﬂuence therelationship of different variables. In this paper, we ﬁrst develop a factor model with time-varying factor loadings (time-varying factor model) as an extension of the classical factormodel for mortality modelling. Two forecasting methods to extrapolate the factor loadings,the local regression method and the naive method, are proposed for the time-varying fac-tor model. From the empirical data analysis, we ﬁnd that the new model can capture theempirical feature of time-varying factor loadings and improve mortality forecasting overdifferent horizons and countries. Further, we propose a novel approach based on changepoint analysis to estimate the optimal ‘boundary’ between short-term and long-term fore-casting, which is favoured by the local linear regression and naive method, respectively.Additionally, simulation studies are provided to show the performance of the time-varyingfactor model under various scenarios. * Correspondence to: Dr. Fei Huang, School of Risk and Actuarial Studies, UNSW Business School, UNSWSydney, NSW 2052, Australia. Email: [email protected] a r X i v : . [ s t a t . M E ] F e b EL Code

G22 - Insurance; Insurance Companies; Actuarial Studies

Keywords

Lee-Carter Model; Long-term Forecasting; Optimal ‘Boundary’ Estimation;Short-term Forecasting; Time-varying Factor Model.

Mortality forecasting is an important topic in various areas, such as demography, actuarialscience and government policymaking. Most age-speciﬁc mortality data are high-dimensionaltime series. The factor model approach is one of the most popular methods to model high-dimensional time series, representing the data matrix by a few latent common factors. Commonfactors describe common information shared by cross-sections, while factor loadings reﬂect thelinear relationship between the original variables and the common factors. There is a largeliterature discussing factor models, including but not limited to Anderson (1963), Pena and Box(1987), Stock and Watson (2002), Bai and Ng (2002), Bai (2009), Lam and Yao (2012) andChang et al. (2018).Many existing stochastic mortality models use the factor model approach. As an applica-tion of the classical factor model (with time-invariant factor loadings), Lee and Carter (1992)(Lee-Carter Model) is one of the most prominent methods for mortality forecasting, which isemployed by the US Bureau of the Census as the benchmark model to predict long-run lifeexpectancy (Hollmann et al., 1999). The common factor extracted by the Lee-Carter model isdeﬁned as Mortality Index, and the factor loadings capture the relationship between the age vari-ables and the mortality index. Since there is only one factor in the Lee-Carter model, Booth et al.(2002), Renshaw and Haberman (2003) and Yang et al. (2010) extended the Lee-Carter frame-work to incorporate more common latent factors for mortality modelling in different countries.Li and Chan (2005) proposed an outlier-adjusted model to deal with possible outliers in the mor-tality index by combining the Lee-Carter model with time series outlier analysis. Additionally,Booth et al. (2006) compared the Lee-Carter model with four other variants by applying them tomortality data of multiple populations. Tuljapurkar et al. (2000) examined mortality rates overﬁve decades for the G7 countries using the Lee-Carter model. Lundstr¨om and Qvist (2004) andBooth et al. (2004) applied the Lee-Carter model to mortality data of Sweden and Australia, re-spectively. A summary of the variants of the Lee-Carter model is discussed in Booth and Tickle(2008).In the existing literature of mortality factor models, factor loadings, which capture the re-lationship between age variables and latent common factors, are usually assumed to be invariantover time (we call factor models with time-invariant factor loadings ‘classical factor model’).For example, in Lee-Carter model, there is only one factor and the time-invariant factor load-ing represents the age-related sensitivity to the mortality improvement (we call classical factormodel with only one factor ‘Lee-Carter model’ throughout this paper). However, since mor-2ality datasets typically span a long period of time, it is restrictive to assume that the factorloadings are time-invariant. Driving forces such as medical improvement of certain diseases,environmental changes, and technological progress may inﬂuence the relationship of differentvariables signiﬁcantly. Booth et al. (2002) studied the violation of the invariance assumption inthe mortality data of Australia and suggested to ﬁnd an optimal ﬁtting period during which thefactor loadings were invariant to improve the ﬁt of the Lee-Carter model. Their approach, how-ever, needs to manually select the ﬁtting period and hence loses the information of early years.In recent years, there is a rich literature on time-varying factor models to capture the dynamicsand structural changes in factor loadings for macroeconomic variables modelling, for example,see Breitung and Eickmeier (2011) and Chen et al. (2014). However, there has been no literatureon mortality modelling which allows factor loadings to change smoothly over time, to the bestof our knowledge. Li and O’Hare (2017) and Li et al. (2015) used semi-parametric approachesto extend the CBD models (Cairns et al., 2009) by allowing for time-varying coefﬁcients, whichcan free model assumptions and show superior short-term forecasting performance. However,CBD models are only suitable for old-age mortality modelling, and the factors (regressors) areobservable. Unfortunately, for Lee-Carter model and many of its variants the factors are unob-served, which makes it difﬁcult to model and estimate. To ﬁll those gaps, we introduce a factormodel with time-varying factor loadings as an extension of the classical factor model based onSu and Wang (2017). This new model can be used for mortality modelling and forecasting bydeveloping corresponding estimation and forecasting methods.As the time-varying factor model allows for time-varying factor loadings, it provides moreﬂexibility in model ﬁtting, which, however, also poses challenges in model forecasting. Besidesforecasting the common factors, factor loadings also need to be extrapolated into the future.In this paper, we provide two forecasting methods of the factor loadings, one uses the locallinear regression to roll over the time-varying factor loadings into the future; while the otherone inherits the value of the factor loading from the last time period and remains invariantin the future. These two forecasting methods are called the local linear regression and thenaive method, respectively. Their details are described in Section 2. Empirical results usingthe mortality data from different populations show that the time-varying factor model providesmore accurate out-of-sample forecasting results than the Lee-Carter model.The existing literature suggests that different forecasting horizons may favour differentmodels. For example, Bell (1997) found that a simple random walk with drift model for age-speciﬁc mortality rates yields the most accurate 1-step-ahead forecast compared with the othersix methods on the US data. Hyndman and Ullah (2007) introduced a method which outper-formed the method proposed by Lee and Miller (2001) in the long-term forecasting. Speciﬁ-cally, we have found in the literature that semi-parametric or non-parametric methods can bemore suitable for short-term forecasting. For example, the semi-parametric model developed inLi et al. (2015) can produce superior 5-year-ahead forecasting results. CMI (2009) employed3he P-splines model (Currie et al. (2004)) for short-term forecasting to generate the initial ratesof mortality improvement. Our empirical applications in Section 5.3 also suggest that the time-varying model based on local regression (non-parametric forecasting) is better for short-termforecasting, while the time-varying model based on naive method (parametric forecasting) isbetter for long-term forecasting. Then where is the optimal ‘boundary’ between short-term(based on the local regression method) and long-term (based on the naive method) forecasting?We propose a novel approach based on change point analysis (Bai, 2010) to estimate the opti-mal ‘boundary’ and apply it to mortality data of multiple countries. Additionally, we conductsimulation studies to show the performance of the time-varying factor model under differentscenarios and investigate under which conditions it preforms better than the classical factormodel.The rest of the paper is organized as follows. Section 2 introduces the time-varying factormodel and its estimation approach. The forecasting methods based on the time-varying fac-tor model are also discussed in detail. Section 3 discusses the relative advantages of the localregression method and the naive method in the short-term and long-term forecasting, respec-tively. We then propose an approach based on change point analysis to estimate the ‘boundary’between short-term and long-term forecasting, which is favoured by the local regression methodand the naive method, respectively. Section 4 introduces the datasets and empirical evidence oftime-varying factor loadings. Section 5 applies the proposed methods to age-speciﬁc mortalitydata of multiple countries and shows the advantages of the proposed methods. Section 6 con-ducts simulation studies to investigate the performance of the time-varying factor model underdifferent scenarios. Section 7 concludes the paper. Appendix A provides the gender-speciﬁcempirical results using the time-varying factor model. Appendix B presents the estimations ofthe optimal boundaries for multiple countries with a variety of forecasting horizons. AppendixC displays estimation results of the time-varying model with multiple factors.

Let m x , t denote the central death rate for age x in year t , where x = , , . . . , N and t = , , . . . , T . Thus, { m x , t } x = , ,..., N , t = , ,..., T is an N -dimensional time series with T observations.Since mortality rates are always positive numbers, we use the log transformation to map thecentral death rates from R + space to R space for modelling purposes. Assume a x is the age-speciﬁc constant, which is the averages over time of the ln ( m x , t ) . Then ln ( m x , t ) − a x can bemodelled using the classical factor model as follows:ln ( m x , t ) = a x + bbb (cid:62) x kkk t + ε x , t , (2.1)4here kkk t is an R × bbb x is an R × x (i.e. the age-related sensitivity to the mortalityimprovement); and ε x , t is the idiosyncratic error of ln ( m x , t ) , which represents the componentunexplained by the common factor. Here, kkk t , bbb x and ε x , t are all unobservable components.Speciﬁcally, when R =

1, the classical factor model is equivalent to the Lee-Carter model. Thesingle factor k t is deﬁned as the mortality index in the Lee-Carter model, and consequently thefactor loading b x represents the impact of the mortality index on the death rate of age x .The classical factor model, however, is too restrictive when used to analyse the mortalitydata. It assumes that for each age the factor loadings are invariant over time. Statisticians andeconomists have noticed that the relationship between many economic variables and commonfactors is not time-invariant. Our empirical analysis using mortality data in Section 5 alsosuggests time-varying factor loadings. Therefore, we develop a factor model to allow for factorloadings changing smoothly overtime.We introduce the time-varying factor model based on the work of Su and Wang (2017),where factor loadings are modelled as non-random functions of time. Su and Wang (2017)provided a localized PCA method to consistently estimate the factors and time-varying factorloadings. Compared with Park et al. (2009), the time-varying factor model proposed by Suand Wang (2017) can capture more types of structural changes in factor loadings, includingboth continuous changes and abrupt structural breaks. Assume ln ( m x , t ) − a x follows the time-varying factor model with R unobservable common factors:ln ( m x , t ) = a x + bbb (cid:62) x , t kkk t + ε x , t , (2.2)where notations above are the same as the classical factor model, except for the factor loadings.Here, each component of the factor loading bbb x , t is assumed to be a deterministic function of t / T : bbb x , t = bbb x ( t / T ) , where each component of bbb x ( · ) is an unknown piece-wise smooth functionof t / T . The time-varying factor model can be seen as a generalization of the classical factormodel. If bbb x ( · ) is invariant over time, which is a special case of the piece-wise smooth function,the time-varying factor model will degenerate to the classical factor model. Generally speaking,the assumption that factor loadings are time-invariant is too restrictive to hold in most settings.However, the time-varying factor model can relax this assumption by allowing for both con-tinuous structural changes and abrupt changes in factor loadings, which can also beneﬁt themortality forecasting. Similar to the classical factor model, there exists an identiﬁcation problem in the time-varying factor model. At each time point t , and for any R × R invertible matrix HHH t , we have bbb (cid:62) x , t kkk t = (cid:0) HHH − t bbb x , t (cid:1) (cid:62) (cid:0) HHH (cid:62) t kkk t (cid:1) . Since an R × R invertible matrix has R free elements, R restric-5ions are needed in parameter estimations so that bbb x , t and kkk t can be identiﬁed separately. Deﬁne BBB t = ( bbb , t , bbb , t , . . . , bbb N , t ) (cid:62) and KKK = ( kkk , kkk , . . . , kkk T ) (cid:62) . Then the two sets of restrictions to solvethe issue of identiﬁcation are KKK (cid:62)

KKK / T = I R and BBB (cid:62) t BBB t = a diagonal matrix, where I R is an R × R identity matrix. The ﬁrst normalization condition imposes R × ( R + ) / R × ( R − ) / KKK and the factor loadings

BBB t (only up to a sign change, i.e., − KKK and − BBB t also satisfy the two sets of restrictions). When R =

1, only one restriction is needed to identify parameters. We choose to use the same nor-malization condition as Lee and Carter (1992), that is, we normalize the bbb x , t to sum to unityfor each t . In this way, we can directly compare the results of our new method with that of theLee-Carter model. The estimation method for the time-varying factor model is proposed by Su and Wang(2017). Let r ∈ { , . . . , T } be a ﬁxed year. Since we have assumed that each component of bbb x , t : [ , ] → R is a piece-wise smooth function, we have: bbb x , t = bbb x ( tT ) ≈ bbb x ( rT ) = bbb x , r , when tT ≈ rT . Thus, the mortality rate ln ( m x , t ) can be approximated by:ln ( m x , t ) ≈ a x + bbb (cid:62) x , r kkk t + ε x , t , when tT ≈ rT . In order to estimate the factors and time-varying factor loadings, we consider the followinglocal weighted least squares problem:min { bbb x , r } Nx = , { kkk t } Tt = ( NT ) − N ∑ x = T ∑ t = (cid:16) ln ( m x , t ) − a x − bbb (cid:62) x , r kkk t (cid:17) K h (cid:18) t − rT (cid:19) , (2.3)subject to the identiﬁcation constraints as discussed in Section 2.1. In the objective functionin Equation (2.3), K h ( x ) = h − K ( x / h ) , where K ( · ) is a kernel function and h is a smoothingparameter called “bandwidth”. We will show that the optimization problem of Equation (2.3)can be solved using the same estimation method for the classical factor model.We have known that the mortality rates can be approximated by ln ( m x , t ) − a x ≈ bbb (cid:62) x , r kkk t + ε x , t when tT ≈ rT . Multiplying both sides of the equation by K / h , tr : = (cid:18) K h (cid:18) t − rT (cid:19)(cid:19) / = (cid:18) h K (cid:18) t − rT h (cid:19)(cid:19) / ,

6e obtain a transformed model as: K / h , tr ( ln ( m x , t ) − a x ) ≈ K / h , tr bbb (cid:62) x , r kkk t + K / h , tr ε x , t , when tT ≈ rT . Then we can deﬁne matrices

MMM ( r ) = (cid:16) MMM ( r ) , . . . , MMM ( r ) N (cid:17) , εεε ( r ) = (cid:16) εεε ( r ) , . . . , εεε ( r ) N (cid:17) , and KKK ( r ) = (cid:16) K / h , r kkk , . . . , K / h , Tr kkk T (cid:17) (cid:62) . where MMM ( r ) x = (cid:16) K / h , r ( ln ( m x , ) − a x ) , . . . , K / h , Tr ( ln ( m x , T ) − a x ) (cid:17) (cid:62) and εεε ( r ) x = (cid:16) K / h , r ε x , , . . . , K / h , Tr ε x , T (cid:17) (cid:62) with x = , , . . . , N . Therefore, the transformed model can be written in matrix form as follows: MMM ( r ) ≈ KKK ( r ) BBB (cid:62) r + εεε ( r ) , and the optimization problem above can also be written in matrix notation as:min KKK ( r ) , BBB r Tr (cid:18)(cid:16) MMM ( r ) − KKK ( r ) BBB (cid:62) r (cid:17) (cid:16) MMM ( r ) − KKK ( r ) BBB (cid:62) r (cid:17) (cid:62) (cid:19) s . t KKK ( r ) (cid:62) KKK ( r ) / T = I R and BBB (cid:62) r BBB r = a diagonal matrix . Concentrating out

BBB r = MMM ( r ) (cid:62) KKK ( r ) (cid:16) KKK ( r ) (cid:62) KKK ( r ) (cid:17) − (which is MMM ( r ) (cid:62) KKK ( r ) / T under the normal-ization KKK ( r ) (cid:62) KKK ( r ) / T = I R ), the optimization problem is converted to minimizing the objectivefunction: Tr (cid:16) MMM ( r ) (cid:62) MMM ( r ) (cid:17) − T − Tr (cid:16) KKK ( r ) (cid:62) MMM ( r ) MMM ( r ) (cid:62) KKK ( r ) (cid:17) . Thus, the original local weighted least squares problem is equivalent to maximizingTr (cid:16)

KKK ( r ) (cid:62) MMM ( r ) MMM ( r ) (cid:62) KKK ( r ) (cid:17) , subject to the restriction KKK ( r ) (cid:62) KKK ( r ) / T = I R , which is equivalent to the optimization problem ofthe classical factor model.Our objective is to obtain estimators of the factors and factor loadings. A two-stage estima-7ion procedure is used to estimate those parameters. Let (cid:98) KKK ( r ) denote the estimated factor matrixof KKK ( r ) , and (cid:98) BBB r = (cid:16)(cid:98) bbb , r , (cid:98) bbb , r , . . . , (cid:98) bbb N , r (cid:17) (cid:62) denote the estimator of the time-varying factor loadingmatrix BBB r . Then, (cid:98) KKK ( r ) is √ T times eigenvectors corresponding to the largest R eigenvalues ofthe T × T matrix MMM ( r ) MMM ( r ) (cid:62) , and (cid:98) BBB r is MMM ( r ) (cid:62) (cid:98) KKK ( r ) (cid:16) (cid:98) KKK ( r ) (cid:62) (cid:98) KKK ( r ) (cid:17) − (it is MMM ( r ) (cid:62) (cid:98) KKK ( r ) / T under thecondition KKK ( r ) (cid:62) KKK ( r ) / T = I R ). Therefore, in the ﬁrst step, we can acquire estimators (cid:98) BBB r of thefactor loadings for r = , . . . , T .Based on the estimator (cid:98) BBB r of the factor loading matrix obtained in the ﬁrst stage, we con-sider another least squares problem in the second stage to obtain the estimator of the factor kkk t .The objective function we would like to minimize is as follows: N ∑ x = (cid:16) ln ( m x , t ) − a x − (cid:98) bbb (cid:62) x , t kkk t (cid:17) for t = , . . . , T . Since we already have (cid:98) bbb x , t in the ﬁrst stage, the answer to this minimization problem is (cid:98) kkk t = (cid:32) N ∑ x = (cid:98) bbb x , t (cid:98) bbb (cid:62) x , t (cid:33) − (cid:32) N ∑ x = (cid:98) bbb x , t ( ln ( m x , t ) − a x ) (cid:33) for t = , . . . , T . Thus, using the two-stage estimation method, we can obtain consistent estimators for both thefactors and time-varying factor loadings.Next, we discuss some issues in the kernel estimation.

Remark 1.

Boundary kernel . Usually, there exists a boundary bias issue in the kernel estima-tion. Instead of using the ordinary kernel function, it is suggested that a boundary kernel shouldbe used to help us obtain some uniform results. Let (cid:98) a (cid:99) represent the greatest integer less thanor equal to a, then the boundary kernel we choose to use is as follows:K ∗ h , tr = h − K ∗ r (cid:18) t − rT h (cid:19) = h − K ( t − rTh ) (cid:82) − rTh K ( u ) d u r ∈ [ , (cid:98) T h (cid:99) ] h − K (cid:0) t − rT h (cid:1) , r ∈ ( (cid:98) T h (cid:99) , T − (cid:98) T h (cid:99) ] h − K ( t − rTh ) (cid:82) ( − r / T ) / h − K ( u ) d u r ∈ ( T − (cid:98) T h (cid:99) , T ] . Remark 2.

The choice of bandwidth . For the nonparametric local smoothing method, it is im-portant to determine the bandwidth for the kernel estimation. There are two ways to choose thebandwidth. The ﬁrst one is to use a data-driven method, such as the cross-validation. The otherone is to use Silverman’s rule of thumb to set the bandwidth, which is much easier to compute.Su and Wang (2017) have shown that choices of the kernel function and the bandwidth havelittle impact on the performance of the information criteria. Thus, in the following empiricalanalysis, we decide to use the Epanechnikov kernel and its corresponding Silverman’s rule of humb bandwidth, which is h = ( . / √ ) T − / N − / . Remark 3.

Determination of the number of factors . There are mainly two methods to deter-mine the number of factors, R. The ﬁrst one is to use a BIC-type information criterion proposedby Su and Wang (2017). Under certain assumptions, the new information criterion can correctlychoose the true value of R. However, those assumptions may not hold in real data. Additionally,it is not easy to implement the out-of-sample forecasting if the chosen value of R is too large.The second method is based on the fact that the original local weighted least squares prob-lem can be transformed into an optimization problem of the classical factor model. Therefore,the cumulative sum of eigenvalues can help us identify the number of factors. Let c denote acut-off value between and , and λ k denote the k th largest eigenvalue of the matrix MMM ( r ) MMM ( r ) (cid:62) ,then we can choose the value of R as min { R : (cid:0) ∑ Rk = λ k (cid:1) / (cid:0) ∑ Nk = λ k (cid:1) ≥ c } . In the followinganalysis, we will set the cut-off value as c = . and empirical analysis shows that only onefactor is enough to capture most characteristics of the mortality data, which is consistent withthe Lee and Carter (1992) model. We now consider how to make out-of-sample forecasting using the time-varying factormodel. Since the factor loadings change over time, we should not only make predictions of thecommon factors, but also extrapolate the factor loadings for each age. We describe the fore-casting method for a single factor model ( R =

1) for the simplicity of notations in the followinganalysis. Assume that based on the historical data we have acquired the estimated commonfactor and factor loadings using the method mentioned in Section 2.2.In order to forecast the common factor, we ﬁrst ﬁt the common factor with an ARIMAmodel. Since Akaike Information Criterion (AIC) is asymptotically equivalent to the cross-validation when the maximum likelihood estimation is used to ﬁt the model (Stone, 1977), wechoose AIC as the model selection criteria to ﬁnd the most appropriate ARIMA model. Afterthat, we can use the chosen model to forecast and obtain prediction intervals for the latent factor(see more details in Chapter 5 & 9 of Brockwell et al. (1991)).The factor loading b x , t is assumed to be an unknown piece-wise smooth function of time t . For the purpose of extrapolating the factor loading b x , t into the future, we will adopt twodifferent methods to achieve the goal:1. The naive method . We simply assume that in the forecasting horizon, b x , t ( t > T ) is set as b x , T . This is essentially a parametric forecasting method and similar to that in the Lee-Carter model but with a different estimated value. The naive method using constant factorloading has a simple structure and could provide more stable forecasts in the long term.9. The local regression method . This method is based on a nonparametric regression method– the local linear regression, to ﬂexibly estimate the deterministic function b x , t (See moredetails in Fan and Gijbels (1996) and Friedman et al. (2001)). Similar method has alsobeen applied in Li and O’Hare (2017); Li et al. (2015). The local linear regression caneasily extend the most recent trends, which is more suitable for short-term forecasting.We brieﬂy describe the local regression method in the rest of this section. The main ideaof the local linear regression is to ﬁt the linear regression using only the observations in theneighbourhood of a target point t . This so-called localization is achieved by using a weightfunction K λ ( t , t ) = K (( t − t ) / λ ) , where K is a kernel function and the index λ indicates thewidth of the neighborhood. One of the commonly used kernel functions with compact support isEpanechnikov kernel, which is adopted in this paper. For the Epanechnikov kernel, the windowwidth parameter λ is the radius of the support region, which can be estimated using out-of-sample validation. The weight function assigns a weight to each time point t based on thecorresponding distance from t (i.e., | t − t | ). In this way, the resulting estimated function is asmooth function.Speciﬁcally for the forecasting of the time-varying factor loading of each age x , the locallinear regression solves a separate weighted least square problem at each target point T + h ( h = , , . . . ) : min α ( T + h ) , β ( T + h ) T + h − ∑ t = K λ ( t , T + h ) ( b x , t − α ( T + h ) − β ( T + h ) t ) . Note that the notations α ( T + h ) and β ( T + h ) indicate that the two parameters under studyvary with the point T + h in the local linear method.Let bbb x = (cid:0) b x , , b x , , . . . , b x , T + h − (cid:1) (cid:62) , XXX = . . .

11 2 . . . T + h − (cid:62) , and WWW ( T + h ) denotethe ( T + h − ) × ( T + h − ) diagonal matrix with the t th diagonal element K λ ( t , T + h ) . Thenby using the weighted least squares estimation, we can obtain the estimators for α ( T + h ) and β ( T + h ) as follows: (cid:16)(cid:98) α ( T + h ) , (cid:98) β ( T + h ) (cid:17) (cid:62) = (cid:16) XXX (cid:62)

WWW ( T + h ) XXX (cid:17) − XXX (cid:62)

WWW ( T + h ) bbb x . To ensure that

XXX (cid:62)

WWW ( T + h ) XXX is nonsingular, the bandwidth parameter λ in the kernel functionshould be selected properly in practice, see more details in Fan and Gijbels (1996). Therefore10he forecasted factor loading at point T + h is (cid:98) b x , T + h = (cid:16) T + h (cid:17) (cid:98) α ( T + h ) (cid:98) β ( T + h ) . Note that for h >

1, the forecasts (cid:98) b x , T + , . . . , (cid:98) b x , T + h − are evolved in bbb x when estimating thefactor loading at the time T + h . Following this method, we can estimate the factor loadings foreach age as a smooth function of time t and then extrapolate the factor loadings into the future.Combining with the predicted common factors, we can make out-of-sample predictions of thecentral death rates using the time-varying factor model. Under the framework of time-varying factor model, we assume the factor loading b x , t is afunction of time t . In Section 2.3, we introduced two different methods to extrapolate the factorloading. One is a naive method, which is more suitable for long-term forecasting; and the otheris based on local linear regression, which is more suitable for short-term forecasting. Thencan we estimate the ‘boundary’ between short-term and long-term forecasting that divides theforecasting horizon according to the predictive power of the local regression method and naivemethod?To solve this problem, we ﬁrst propose a new forecasting method, which is a hybrid of twopreviously introduced methods. Assume T is the number of years used in ﬁtting the model and k ( k = , , , · · · ) is the optimal boundary between short-term and long-term forecasting, favouredby the time-varying models based on the local regression and naive method, respectively. Wehave the point forecast estimation of mortality rate ln ( m x , t ) for any given x , t and k ( k ≥ ) using the hybrid method as ln ( (cid:98) m x , t ) = (cid:98) a x + (cid:98) b x , t · (cid:98) k t T + ≤ t ≤ T + k , (cid:98) a x + (cid:98) b x , T + k · (cid:98) k t t ≥ T + k + . If T + ≤ t ≤ T + k , the forecast of ln ( m x , t ) at time t is (cid:98) a x + (cid:98) b x , t · (cid:98) k t , where (cid:98) b x , t is the extrapolatedfactor loading at time t based on the local regression method. When t ≥ T + k +

1, the forecastat time t is (cid:98) a x + (cid:98) b x , T + k · (cid:98) k t , where (cid:98) b x , T + k is time-invariant and obtained using the extrapolatedfactor loading at time T + k based on the local regression method. For k =

0, the forecast attime t ( t > T ) is just (cid:98) a x + (cid:98) b x , T · (cid:98) k t using the estimated factor loading at time T . In this case,the hybrid method degenerates to the naive method. In view of this, T + k is the time boundarybetween short-term and long-term forecasting, and between choosing the local regression andnaive method. Given the value of k , the hybrid method applies the local linear regression for11he ﬁrst k periods in the forecasting horizon and keeps the factor loadings ( (cid:98) b x , T + k ) unchangedthereafter, which combines the local regression and naive methods. Additionally, the hybridmethod guarantees a consistent and smooth transition from short-term to long-term forecasting.As discussed in Section 1, different forecasting horizons may favour different models.Generally, long-term forecasting beneﬁts more from the historical long-term trend and short-term forecasting relies on the recent trend (Booth et al., 2002). Since the local linear regressioncan easily extend the most recent trend, it is more suitable for short-term forecasting. However,as time goes by, the recent trend becomes less and less reliable, which is not suitable for long-term forecasting. On the other hand, the naive method using constant factor loading is moresuitable for long-term forecasting, as it has a simple structure and would provide more stableforecasts in the long term. Compared to the classical factor model, the naive method providesmore accurate estimations not only for the factor loadings but also for the common factors,which helps generate more accurate long-term forecasts.Based on the hybrid forecasting method, we propose an estimation method of the optimal‘boundary’ inspired by Bai (2010). Assume the entire dataset has T years and we consider theﬁrst T years of data as the training set, and the remaining data with size T − T as the validationset. Given the value of k , we ﬁrst ﬁt the time-varying factor model using the training set, andthen apply the hybrid forecasting method to the validation set. We consider all possible lengthsof short-term (long-term) forecasting horizons (i.e. k ) and ﬁnd out an optimal one using leastsquares estimation. We describe the estimation procedure as follows.For the given x and k such that 1 ≤ k ≤ T − T −

1, deﬁne (cid:98) y x , t ( k ) = (cid:98) a x + (cid:98) b x , t · (cid:98) k t as thepredicted value of ln ( m x , t ) using the hybrid forecasting method. When T + ≤ t ≤ T + k , (cid:98) b x , t is forecasted by the local regression method; And when T + k + ≤ t ≤ T , (cid:98) b x , t = (cid:98) b x , T + k , where (cid:98) b x , T + k is the predicted factor loading at time T + k obtained via the local regression method.Then we deﬁne the sum of squared residuals for age x as S x , T ( k ) = T ∑ t = T + ( ln ( m x , t ) − (cid:98) y x , t ( k )) = T + k ∑ t = T + (cid:16) ln ( m x , t ) − (cid:98) a x − (cid:98) b x , t · (cid:98) k t (cid:17) + T ∑ t = T + k + (cid:16) ln ( m x , t ) − (cid:98) a x − (cid:98) b x , T + k · (cid:98) k t (cid:17) , where k = , , . . . , T − T −

1. Here k represents the length of the short-term forecasting horizonor the ‘boundary’ between short-term (based on the local regression method) and long-term(based on the naive method) forecasting. The local linear regression is used to make forecastsfrom T + T + k ; while the naive method (i.e. assuming (cid:98) b x , t does not change over the12emaining period) is used to make forecasts from T + k + T . We deﬁne S x , T ( ) = T ∑ t = T + (cid:16) ln ( m x , t ) − (cid:98) a x − (cid:98) b x , T · (cid:98) k t (cid:17) , for k = S x , T ( T − T ) = T ∑ t = T + (cid:16) ln ( m x , t ) − (cid:98) a x − (cid:98) b x , t · (cid:98) k t (cid:17) , for k = T − T . In this way, S x , T ( k ) is deﬁned for each k = , , . . . , T − T . Thus, the total sum of squaredresiduals (SSR) across all ages is deﬁned as SSR ( k ) = N ∑ x = S x , T ( k ) , Hence the least squares estimator of the optimal ‘boundary’ is (cid:98) k = argmin ≤ k ≤ T − T SSR ( k ) . The estimated optimal ‘boundary’ between the short-term (based on local linear regression) andlong-term (based on naive method) forecasting is the time (cid:98) k that leads to the smallest SSR. The mortality data used in this paper are extracted from the Human Mortality Database(HMD) (36). Six countries are selected for the empirical analysis in Section 4 and Section 5. Foreach country, age-sex-speciﬁc death rates are available annually for the entire population. Theselected countries are shown in Table 1 along with the corresponding available time horizons,which will be used for empirical analysis.

Table 1: Time horizon for different countries

Country start year end year lengthAUSTRALIA 1921 2018 98CANADA 1921 2016 96FRANCE 1816 2017 202ITALY 1872 2017 146JAPAN 1947 2018 72USA 1933 2017 85 + for each year. Sincemeasures of mortality at very old ages are unreliable (Lee and Carter, 1992), we decide not touse mortality data of age 91 and over in the following analysis and end up with N =

91 ages.In order to investigate whether the factor loadings are time-varying or time-invariant in theempirical data. We conduct an exploratory data analysis by applying the Lee-Carter Model onthe US mortality data with rolling-window time frames. We ﬁrst divide the entire dataset into44 subsets (each with 40 yearly observations) with the ﬁrst subset from year 1933 to year 1972,the second subset from year 1934 to year 1973, and so on. We then ﬁt the Lee-Carter modelon each of the subset and extract the factor loading b x for each time frame. We plot the factorloadings of some selected ages in Figure 1. We can see that the factor loadings possess differentdynamic patterns for different ages and they are not time-invariant. l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l . . . . . different time interval, Age = 20 b_ x l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l . . . . different time interval, Age = 40 b_ x l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l − . − . − . − . − . different time interval, Age = 60 b_ x l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l l − . − . − . − . − . − . different time interval, Age = 80 b_ x Figure 1: Factor loadings for ages 20, 40, 60, 80 over 44 rolling-window time frames

In the ﬁrst two subsections, we present the application results of the time-varying factormodel using age-speciﬁc mortality data of the US. We compare the time-varying factor modelsbased on both the naive and local regression forecasting methods with Lee-Carter model (theclassical factor model with one factor) via out-of-sample forecasting performance. Empiricalresults by gender are provided in Appendix A. Section 5.3 further compares the forecasting14erformance across multiple countries and models based on different ﬁtting and forecastinghorizons. And in the Section 5.4, we estimate the optimal ‘boundary’ between short-term andlong-term forecasting for different countries.

We ﬁt the US mortality data using the estimation method of the time-varying factor modelintroduced in Section 2. The number of factors estimated is 1 ( (cid:98) R = k t , obtained fromthe time-varying factor model, follows an ARIMA ( , , ) with drift model. This model cancapture most of the characteristics of the common factor. Our ﬁtted model of the common factor k t is as follows: (cid:53) k t = − . ( . ) + . ( . ) (cid:53) k t − + e t , where (cid:53) refers to the ﬁrst order differencing and e t represents the error term. The numbersin the parentheses are the standard errors of the corresponding parameters. With the ARIMAmodel built above, we can then forecast the common factor into the future.As a comparison, we also list the ﬁtted ARIMA model of the common factor (The numberof factors estimated is 1.) using the classical factor model below: (cid:53) k t = − . ( . ) + . ( . ) (cid:53) k t − + e t . We can see that the ARIMA models of the common factors estimated from the time-varyingfactor model and the Lee-Carter model are close. The estimated common factors, plotted inFigure 2, tend to decrease linearly and show similar dynamic patterns. The common factor isregarded as the index of the level of mortality, which captures major inﬂuence on death rates ofall ages.Figure 3 displays the comparison of the factor loadings between the time-varying factormodel and the Lee-Carter model for selected ages. Compared with time-invariant factor load-ings (the dashed lines), the time-varying factor loadings (the solid curves) change smoothlyovertime, see Figure 3. It is interesting to notice that, no matter which age it is, the corre-sponding factor loadings always reach their own minimum or maximum values during 1960sor 1970s. For older people (over age 40), the factor loadings usually arrive at their maximumvalues during 1960s or 1970s, which means the death rates of older people are more sensitive In this case, the classical factor model has the same model structure as the Lee-Carter model with R = ime F a c t o r s − − Time−Varying ModelLee−Carter Model

Figure 2: Plots of the estimated common factors for the time-varying factor model & the Lee-Carter model to the latent common factor during that period. For the younger ages (below age 40), however,the corresponding factor loadings reach their minimum values during the same period, whichmeans the death rates of younger people are less sensitive to the latent factor during that time.The only exception is the factor loadings of the infant group, whose dynamic pattern is moresimilar to that of the older group.Figure 4 shows the ﬁtted death rates of both the time-varying factor model and the Lee-Carter model with empirical observations for selected ages. Obviously, no matter which age itis, the time-varying factor model ﬁts better than the Lee-Carter model. We use the mean squarederror (MSE) to evaluate the goodness of ﬁt. As a result, the overall MSE of the time-varyingfactor model is 0 . . The MSE for the time-varying model is computed as follows:MSE = NT ∑ x ∑ t (cid:16) ln ( m x , t ) − a x − (cid:98) bbb (cid:62) x , t (cid:98) kkk t (cid:17) Computation of the MSE for the Lee-Carter model is the same as above except that (cid:98) bbb x , t is replaced by (cid:98) bbb x . ge: 0 Time f a c t o r l aod i ng . . . . Age: 10

Time f a c t o r l aod i ng . . . Age: 20

Time f a c t o r l aod i ng . . . Age: 30

Time f a c t o r l aod i ng . . . Age: 40

Time f a c t o r l aod i ng . . . . . Age: 50

Time f a c t o r l aod i ng . . . . Age: 60

Time f a c t o r l aod i ng . . . Age: 70

Time f a c t o r l aod i ng . . . . . Age: 80

Time f a c t o r l aod i ng . . . . . Age: 90

Time f a c t o r l aod i ng . . . . Figure 3: Plots of the estimated time-invariant factor loadings(dashed lines) & the time-varying factor loadings (solid lines)for age 0 , ,..., Mortality(Age: 0 )

Time dea t h r a t e s − . . . . . ActualTime−VaryingLee−Carter

Mortality(Age: 10 )

Time dea t h r a t e s − . − . . . . . ActualTime−VaryingLee−Carter

Mortality(Age: 20 )

Time dea t h r a t e s − . . . . . ActualTime−VaryingLee−Carter

Mortality(Age: 30 )

Time dea t h r a t e s − . . . . . ActualTime−VaryingLee−Carter

Mortality(Age: 40 )

Time dea t h r a t e s − . . . . ActualTime−VaryingLee−Carter

Mortality(Age: 50 )

Time dea t h r a t e s − . . . . . ActualTime−VaryingLee−Carter

Mortality(Age: 60 )

Time dea t h r a t e s − . . . . . ActualTime−VaryingLee−Carter

Mortality(Age: 70 )

Time dea t h r a t e s − . . . . . ActualTime−VaryingLee−Carter

Mortality(Age: 80 )

Time dea t h r a t e s − . − . . . . . ActualTime−VaryingLee−Carter

Mortality(Age: 90 )

Time dea t h r a t e s − . . . ActualTime−VaryingLee−Carter

Figure 4: The actual data (black solid lines) versus the ﬁttedvalues from the time-varying model (red dashed lines) and theLee-Carter model (blue dotted lines); the data have been log-transformed & demeaned. mance in the next subsection.

In this subsection, we use the original US mortality data over the ﬁrst 60 years as thetraining set (from 1933 to 1992) to ﬁt the models, and then forecast the mortality rates in thetesting set (from 1993 to 2017) using the ﬁtted models. The predicted values are compared withthe actual data in the testing set to see which model is better at the out-of-sample forecasting. Weapply the mean squared prediction error (MSPE) as the measure to evaluate the out-of-sampleforecasting performance. 17 ime−Varying Factor Model

Time f a c t o r s − − historical data80% PIpredicted value Figure 5: Out-of-sample forecast of the common factor, with the model ﬁtted on 1933 to 1999 and the forecast horizon over 1993 to 2017;predicted value (dashed line), 80% PI (dotted line)

Figure 5 plots the historical and predicted values of the common factor of the time varyingfactor model along with the associated 80% prediction intervals, which is based on the ARIMAmodel ﬁtted in Section 5.1. The dashed downward line shows that the latent factor will keepdeclining in the future, and there is an 80% chance that a future observation will be covered bythe corresponding prediction interval (represented by area between the red dashed lines).Since the time-varying factor model and the Lee-Carter model have similar common factorand the corresponding ﬁtted ARIMA model, the forecasts of the common factor are close toeach other too. Hence, the major difference of prediction accuracy between the time-varyingfactor model and the Lee-Carter model lies in the factor loadings. We extrapolate the factorloadings obtained from the time-varying factor model using both the naive method and locallinear regression introduced in Section 2.3, respectively. We then forecast the mortality ratesinto the future using both the time-varying and Lee-Carter models.Figure 6 plots the estimated and extrapolated factor loadings of the time varying factormodels. Figure 7 plots the actual data and predicted values using the three above-mentionedmethods. From Figure 6, we see that the local regression method follows the recent histori-cal trend of factor loadings, while the naive method stays at a constant level. Theoretically,if future factor loadings do not deviate signiﬁcantly from the recent historical trend, the locallinear regression may perform better than the naive method in forecasting. However, it mayonly be reasonable to assume that factor loadings will follow the local trends in the short-term.For long-term forecasting, this assumption is less reliable and the non-parametric forecastingmethod would lead to inferior results. In Section 5.3, we observe similar results in other coun-tries’ mortality forecasting. Hence, the naive method, with time-invariant forecasted factorloadings, is more suitable for long-term forecasting. And it may also be suitable for short-termforecasting if the long-term trend is consistent with the short-term trend. Since the beneﬁt of us-ing the local regression method decreases as the forecasting horizon increases, it is worthwhile18 ge: 0 year f a c t o r l oad i ng . . . . Age: 10 year f a c t o r l oad i ng . . . Age: 20 year f a c t o r l oad i ng . . . . Age: 30 year f a c t o r l oad i ng . . . Age: 40 year f a c t o r l oad i ng . . . Age: 50 year f a c t o r l oad i ng . . . . Age: 60 year f a c t o r l oad i ng . . . Age: 70 year f a c t o r l oad i ng . . . . Age: 80 year f a c t o r l oad i ng . . . . Age: 90 year f a c t o r l oad i ng . . . Figure 6: Plots of the estimated and extrapolated factor loadingsbased on naive method (dotted lines) & local regression method(dashed lines) for age 0 , ,..., − . − . − . Age: 0 time dea t h r a t e s − . − . − . − . Age: 10 time dea t h r a t e s − . − . − . − . Age: 20 time dea t h r a t e s − . − . − . − . Age: 30 time dea t h r a t e s − . − . − . − . Age: 40 time dea t h r a t e s − . − . − . − . − . Age: 50 time dea t h r a t e s − . − . − . − . − . Age: 60 time dea t h r a t e s − . − . − . − . − . Age: 70 time dea t h r a t e s − . − . − . − . Age: 80 time dea t h r a t e s − . − . − . − . − . Age: 90 time dea t h r a t e s Figure 7: The actual data (solid line) versus the predicted valuesfrom the time-varying model (naive method: dash-dotted line;local linear regression: dotted line) and the Lee-Carter model(dashed line); the data have been log-transformed. to ask whether an optimal forecasting horizon exists for using the local regression method. Thisquestion will be answered in Section 5.4.The empirical analysis suggests that, the time-varying factor model (based on the naivemethod) performs better than the Lee-Carter model over the entire forecasting horizon (1993-2017). Using the mean squared prediction error (MSPE) to evaluate the out-of-sample forecast-ing performance, we see that the overall MSPE for the Lee-Carter model is 0 . . . year M SPE name

Lee−Carterlocal_regressionnaive

Year−specific MSPE

Figure 8: US: Year-speciﬁc MSPE for the time-varying model and the Lee-Carter model over 1993 to 2017; for time-varying model, both thenaive method and the local regression method are used

Next, we investigate the forecasting performance of the time-varying factor model at dif-ferent ages. Figure 9 shows the age-speciﬁc MSPE for the time-varying factor models and theLee-Carter model. The square-dashed lines and triangle-dotted lines represent the MSPE of thetime-varying factor models with the naive and local regression methods for each age, respec-tively, and the circle-solid line represents the MSPE of the Lee-Carter model. We ﬁnd that thenaive forecasting method based on the time-varying factor model is almost always better thanthe local regression method for any age in this data. And roughly speaking, no matter which ex-20 .00.10.2 0 10 20 30 40 50 60 70 80 90 age M SPE name

Lee−Carterlocal_regressionnaive

Age−specific MSPE

Figure 9: US: Age-speciﬁc MSPE for both the time-varying model and the Lee-Carter model; for time-varying model, both naive method andlocal regression method are used trapolation method we choose to use, the time-varying models provide more accurate forecaststhan the Lee-Carter model for age groups 20 ∼

40 and 60 ∼

80. However, for the age group40 ∼

60, the forecasting performance of the time-varying factor models is worse than that ofthe Lee-Carter model. Thus, even though by using naive extrapolation method the time-varyingfactor model improves the overall performance (over 40% in terms of the MSPE) signiﬁcantly,it cannot outperform the Lee-Carter model for some ages. The main advantage of the time-varying model is to forecast mortality rates for the young adulthood (20 ∼

40) and the olderadulthood (60 ∼ We apply and compare different models using mortality data of multiple countries. Inparticular, the functional data model proposed by Hyndman and Ullah (2007) is also consideredfor comparison purposes and we call it ‘Hyndman-Ullah model’ in the rest of the paper. It is amulti-factor extension of the Lee-Carter model, allowing for multiple age-time interaction termsto capture the complex structure of the data. Similar to the Lee-Carter model (i.e. the classicalfactor model with one factor), it is also commonly considered as a benchmark for mortalityforecasting. To determine the order K of Hyndman-Ullah model (similar to the number offactors R in this paper), we choose the best value of K by minimizing Integrated Squared ForestError. Please refer to Hyndman and Ullah (2007) for more details.In Table 2, we present the results of the overall MSPE for different countries, forecasthorizons and methods. We use the longest available dataset for training purposes, with differentforecasting horizons listed in Table 2. To investigate the performance of the short-term and21ong-term forecasting, we consider multiple forecasting horizons with different lengths, includ-ing 5, 10, 15, 20 and 25 years. Note that, as the number of total historical years is ﬁxed for eachcountry, the number of training years changes with the length of the forecasting horizon. Forexample, the total historical years for Australia are 1921 ∼ ∼ ∼ ∼ ∼ (cid:98) R =

1) forboth the classical factor model the time-varying model. Hence, we use ‘Lee-Carter’ to representthe classical factor model (with R =

1) in this section. As for Hyndman-Ullah model, we recordthe estimated order K for each scenario in Table 2. Table 2: Overall MSPE by country, forecast horizon and method (

Hyndman-Ullah : functional data model in Hyndman and Ullah (2007)(estimated order K is shown in parenthesis for each scenario),

Lee-Carter : classical factor model,

TV-local regression : time-varying factormodel based on local linear regression,

TV-naive : time-varying factor model based on naive method)

Country Forecast Horizon Hyndman-Ullah Lee-Carter TV-Local Regression TV-Naive

Australia ∼ . (K = 8) 0 .

037 0 .

013 0 . ∼ .

022 (K = 5) 0 . . . ∼ .

036 (K = 10) 0 . . . ∼ .

082 (K = 3) 0 .

094 0 . . ∼ .

053 (K = 2) 0 .

112 0 . . ∼ . (K = 8) 0 .

044 0 .

015 0 . ∼ .

018 (K = 3) 0 . . . ∼ .

021 (K = 9) 0 . . . ∼ .

021 (K = 6) 0 .

051 0 . . ∼ .

032 (K = 6) 0 .

061 0 . . ∼ . (K = 5) 0 .

047 0 . . ∼ . (K = 10) 0 .

054 0 .

021 0 . ∼ . (K = 8) 0 .

081 0 .

049 0 . ∼ . (K = 3) 0 .

101 0 .

068 0 . ∼ . (K = 8) 0 .

128 0 .

096 0 . Italy ∼ . (K = 9) 0 .

044 0 .

014 0 . ∼ . (K = 4) 0 .

057 0 .

028 0 . ∼ . (K = 10) 0 .

084 0 .

064 0 . ∼ .

090 (K = 5) 0 .

098 0 . . ∼ .

099 (K = 6) 0 .

108 0 . . ∼ .

012 (K = 3) 0 . .

009 0 . ∼ . (K = 6) 0 .

034 0 .

017 0 . ∼ .

039 (K = 4) 0 .

072 0 . . Continued on next page able 2 – continued from previous pageCountry Forecast Horizon Hyndman-Ullah Lee-Carter TV-Local Regression TV-Naive1999 ∼ .

024 (K = 3) 0 .

055 0 . . ∼ . (K = 9) 0 .

072 0 .

037 0 . USA ∼ . (K = 8) 0 .

028 0 .

013 0 . ∼ . (K = 10) 0 .

024 0 .

015 0 . ∼ .

017 (K = 6) 0 .

025 0 . . ∼ .

111 (K = 10) 0 .

029 0 . . ∼ .

026 (K = 10) 0 .

031 0 . . From Table 2, we see that the time-varying models can signiﬁcantly improve the out-of-sample forecasting performance compared with the Lee-Carter model. The time-varying factormodels and Hyndman-Ullah model perform the best in most cases. The Hyndman-Ullah modelis especially suitable for mortality forecasting of the France data, as shown in Hyndman andUllah (2007). In addition, for the time-varying factor models, the local regression method tendsto perform better for short-term forecasting, while the naive method is better for long-termforecasting. Compared with all the other models, the time-varying model based on the naivemethod show superior results for long-term forecasting, while the Lee-Carter model always hasthe worst forecasting performance.In Figure 10, by ﬁxing the length of the forecast horizon to be 25 years, we plot the year-speciﬁc MSPE for all the countries using different forecasting methods. Roughly speaking, thetime-varying model based on the naive method always has the most accurate forecasts, whilethe Lee-Carter model always performs the worst. Additionally, the time-varying model basedon the local regression method usually performs well (similar to or better than the naive method)at the ﬁrst few years, while it deteriorates as time goes by. For mortality data of Australia andthe US, it has worse forecasting performances than the Lee-Carter model in the long term.

From the empirical results above, we see that under the framework of time-varying model,the local regression method (by assuming factor loading will change in the future) is betterat short-term forecasting while the naive method (by assuming factor loading is in-variant inthe future) is better at long-term forecasting. Therefore, we are interested in the boundarybetween short-term and long-term forecasting that divides the forecast horizon according to thepredictive power of the local regression method and the naive method.By applying the estimation method introduced in Section 3, we estimate the optimal ‘bound-ary’ between the short-term and long-term forecasting, which is favoured by the local regressionmethod (time-varying forecast of the factor loading) and the naive method (time-invariant fore-23 .00.10.20.3 2000 2010 year M SPE methods

Hyndman−UllahLee−Carternaivelocal_regression

AUSTRALIA: Year−specific MSPE year M SPE methods

Hyndman−UllahLee−Carternaivelocal_regression

FRANCE: Year−specific MSPE year M SPE methods

Hyndman−UllahLee−Carternaivelocal_regression

JAPAN: Year−specific MSPE year M SPE methods

Hyndman−UllahLee−Carternaivelocal_regression

CANADA: Year−specific MSPE year M SPE methods

Hyndman−UllahLee−Carternaivelocal_regression

ITALY: Year−specific MSPE year M SPE methods

Hyndman−UllahLee−Carternaivelocal_regression

USA: Year−specific MSPE

Figure 10: Year-speciﬁc MSPE by country and method (

Hyndman-Ullah : functional data model in Hyndman and Ullah (2007),

Lee-Carter :classical factor model, local regression : time-varying factor model based on local linear regression, naive : time-varying factor model basedon naive method); length of forecast horizon is 25 years cast of the factor loading) respectively. Recall that the optimal ‘boundary’ can be regarded asthe optimal value of k deﬁned in the hybrid forecasting method in Section 3. Applying the samedatasets introduced in Section 4, we use the last 25 years of the historical data as the valida-tion set, and the remaining data as the training set. In addition, to check the sensitivity of theleast squares estimator to the division of the validation and testing sets, we consider the last p years ( p = , , ,

30, respectively) of data as the validation set and the remaining data as thetraining set. The results are shown in Appendix B.As deﬁned in Section 3, for each set of data we compute the least squares estimator for the24 l l l l l l l l l l l l l l l l l l l l l l l l l k SS R length of horizon: 25 years (AUSTRALIA) l l l l l l l l l l l l l l l l l l l l l l l l l l k SS R length of horizon: 25 years (FRANCE) l l l l l l l l l l l l l l l l l l l l l l l l l l k SS R length of horizon: 25 years (JAPAN) l l l l l l l l l l l l l l l l l l l l l l l l l l k SS R length of horizon: 25 years (CANADA) l l l l l l l l l l l l l l l l l l l l l l l l l l k SS R length of horizon: 25 years (ITALY) l l l l l l l l l l l l l l l l l l l l l l l l l l k SS R length of horizon: 25 years (USA) Figure 11: Plots of the total sum of squared residuals (SSR) versus the length ( k ) of the short-term forecast horizon (based on the hybridforecasting method of time-varying factor model); length of forecast horizon: 25 optimal ‘boundary’ in the forecasting horizon as (cid:98) k = argmin ≤ k ≤ T − T SSR ( k ) , where k = , , , . . . , T − T . Here k represents the length of the short-term forecasting horizon,or the ‘boundary’ between short-term (based on the local regression method) and long-term(based on the naive method) forecasting. The local linear regression is used to make forecastsfrom T + T + k ; while the naive method (i.e. keep the factor loading the same as that intime T + k ) is used to make forecasts from T + k + T .We plot the total SSR versus k in Figure 11. As shown in the plots, for each country, there isa minimum point of k corresponding to the smallest values of SSR, which indicates the optimallength of the short-term forecasting horizon for the time-varying factor model. For example,25he plot of Italy shows that, when the value of k equals 7, SSR achieves the smallest value.Thus, based on the historical Italy mortality data, we suggest that it is better to use the localregression method for the short-term forecasting (less than 7 leads), as it puts more weights onthe recent observations. As for forecasting more than 7 years ahead, the naive method generatesmore accurate predictions. However, as for the US data, the optimal length of the short-termforecasting horizon is 0, which means there is no need to extrapolate factor loadings using locallinear regression and we should use the naive method alone. In Figure 11, we can observe jumpsat k = k =

0, the hybrid method is justthe naive method and the extrapolation of factor loading is based on the historical estimation;while when k >

1, the hybrid method is based on the local regression method before the time T + k . Therefore, a sudden change of SSR when k increases from 0 to 1 indicates that thepattern of factor loadings has a relatively large change after the time T .Now we have observed the existence of positive optimal ‘boundary’ for some countries(such as Australia, Italy and Japan). Then based on the estimation of optimal ‘boundary’ k ,we can compare the out-of-sample forecasting performance of the hybrid method with othermethods. Here, we consider the Australian mortality data as an example since the estimationof the optimal boundary for Australian data is relatively stable (see details in Remark 4 andAppendix B).Similar to previous analysis, we choose the last 25 years of the historical data as the testingset. And the remaining data is the training set. To estimate k , we use the last 25 years of thetraining data as the validation set to estimate the optimal ‘boundary’. We then apply the estima-tion procedure in Section 3 to the training data and the forecasting method in the testing datausing the estimated optimal ‘boundary’ to get the out-of-sample performance. The empiricalresults are shown in Figure 12 and Table 3.In Table 3, we compute the overall MSPE over 1994 to 2018 for each forecasting method.We ﬁnd that the time-varying model based on the hybrid forecasting method has the best out-of-sample performance among all four methods. And the naive method is a little bit worse thanthe hybrid method. Additionally, in Figure 12, we plot the year-speciﬁc MSPE for all differentmethods. For short-term forecasting, the hybrid method shows similar performance with thelocal regression method; while for the long-term forecasting, the hybrid method shows similarperformance with the naive method. Therefore, the hybrid method is superior for both short-term and long-term forecasting, as it beneﬁts from the advantages of both methods. In practice,both naive method and hybrid method are recommended. The hybrid method could producemore accurate predictions while the naive method is much easier to implement. Remark 4.

In Appendix B, we further investigate the sensitivity of the optimal ‘boundary’estimation by showing plots of the total sum of squared residuals (SSR) versus k using differentlengths of training and testing datasets. For some countries, like Japan and USA, the leastsquares estimators are relatively stable. However, for countries like Canada and Italy, the able 3: Australia: Overall MSPE over 1994 to 2018 ( Lee-Carter : classical factor model,

TV-Local Regression : time-varying factor modelbased on local regression method),

TV-Naive : time-varying factor model based on naive method),

TV-Hybrid : time-varying factor modelbased on hybrid method)

Country Lee-Carter TV-LocalRegression TV-Naive TV-HybridAustralia 0.11162 0.08932 0.04342 0.04288 year M SPE methods hybridLee−Carterlocal_regressionnaive

AUSTRALIA: Year−specific MSPE

Figure 12: Australia: Year-speciﬁc MSPE over 1994 to 2018 (

Lee-Carter : classical factor model, local regression : time-varying factor modelbased on local regression method), naive : time-varying factor model based on naive method), hybrid : time-varying factor model based onhybrid method) estimation of the optimal boundary is somewhat sensitive to the length of the dataset. Thisis reasonable as the time-varying factor model highly depends on the data and its intrinsicpatterns. Therefore, when using different training data and forecasting horizons to estimateoptimal ‘boundary’, we may obtain different values of k.

In this section, we further investigate the prediction performance of the time-varying factormodel and the classical factor model through Monte Carlo simulations. We use examples withdifferent structures of the factor loadings to illustrate that the time-varying factor model canimprove the forecasting accuracy when the ‘true’ factor loadings change over time. In addition,we explain under which conditions the naive method performs better than the local regressionmethod even in the short-term.Similar to the previous empirical analysis, we denote the classical factor model as ‘Lee-Carter’ since there is only one factor. We denote the two forecasting methods based on the time-varying factor model as ‘TV-Local Regression’ method and ‘TV-Naive’ method, respectively.27e show that when the ‘true’ factor loadings change over time, both forecasting methods basedon the time-varying factor model outperform the Lee-Carter method in forecasting. Moreover,the ‘TV-Naive’ method performs similarly to the ‘TV-Local Regression’ method for short-termforecasting; while it performs better than the ‘TV-Local Regression’ for long-term forecasting.

We generate the centered data x i , t with one common factor: x i , t = b i , t · k t + ε i , t , i = , , . . . , N , where k t = k t − + w t . w t follows independent identically distributed normal distributions, N ( , . ) . Thus the common factor k t follows a random walk. We consider the followingsettings for different factor loadings b i , t and error terms ε i , t . In each setting below, we apply thenormalization condition mentioned in Section 2, so that b i , t is normalized to sum to unity foreach t .• DGP 1 (time-invariant factor loading): b i , t = b i ∼ i . i . d uni f orm ( , ) , ε i , t ∼ i . i . d N ( , . ) . • DGP 2 (single-point structural change):For i = , , · · · , N / b i , t = b i f or t = , , ..., T / b i + f or t = T / + , ..., T ;and for i = N / + , · · · , N , b i , t = b i f or t = , , ..., T / b i − f or t = T / + , ..., T ; b i ∼ i . i . d uni f orm ( . , . ) , ε i , t ∼ i . i . d N ( , . ) . DGP 3 (continuous structural change): b i , t = + e ( iN + − tT ) , ε i , t ∼ i . i . d N ( , . ) . DGP 1 follows the Lee-Carter model with time-invariant factor loadings, and

DGP 2 and

DGP 3 exhibit different structures of the time-varying factor loadings.

DGP 2 describes asingle-point structural change in the factor loadings; while

DGP 3 considers a continuous struc-tural change in the factor loadings. For each i , the factor loadings generated in DGP 3 aremonotonic functions and would converge to some constant as time goes by. The factor loadingsin

DGP 3 may go up and down and would never diverge to extreme values, which is similar tothe estimated time-varying factor loadings in Figure 3 using the US mortality data.

To compare the forecasting performance of the time-varying factor model and the Lee-Carter model (classical time-invariant factor model), we use the out-of-sample testing approachin the following analysis. For each DGP, we simulate 100 data sets with the dimension and sam-ple sizes N = T = k years of the data as the trainingset, and the remaining T − k years of the data as the testing set ( k = , , , , ,

95, re-spectively). The model is ﬁrst ﬁtted using the training set, and then forecasted in the testing set.We employ the mean squared prediction error (MSPE) as the measure to evaluate performanceof different models.Table 4 reports the comparison results based on various lengths of the training and testingsets. An example of the estimated and forecasted factor loadings is shown in Figure 13 to betterexplain the results. As shown in Table 4, the two time-varying methods perform worse thanthe Lee-Carter model for

DGP 1 , which assumes the time-invariant factor loadings. The lessaccurate prediction results can be attributed to the inaccurate estimation from the time-varyingmodel, which is supported by the left plot of Figure 13. When the ‘true’ factor loadings aretime-invariant, the estimation from the time-varying model goes up and down randomly due tothe over-ﬁtting problem of the non-parametric estimating method. Therefore, the forecastingbased on these estimation is not satisﬁed. However, the Lee-Carter method provides a closeestimation and better forecasting in this case.

DGP 2 and

DGP 3 follow the structures of time-varying factor models with abrupt andcontinuous changes in the factor loadings, respectively. From Table 4, we see that both thetwo time-varying methods perform better than the Lee-Carter method when the ‘true’ factorloadings are time-varying. In particular, the naive method performs the best in these two cases,especially for long-term forecasting. The superior forecasts of the two time-varying methodsresult from the more accurate estimation of the time-varying factor loadings, which can be29 able 4: Comparison of forecasting performance of the time-varying factor model and the Lee-Carter model (based on the different lengths oftraining sets)

DGPs methods 70 75 80 85 90 95TV-Local Regression 0 . . . . . . . . . . . . . . . . . . TV-Local Regression 0 . . . . . . . . . . . . Lee-Carter 0 . . . . . . . . . . . . . . . . . . Lee-Carter 0 . . . . . . n: 30 Time f a c t o r l oad i ng . . . . . . . n: 30 Time f a c t o r l oad i ng . . . . . . n: 30 Time f a c t o r l oad i ng . . . . Figure 13: Comparison of the factor loadings: estimation and forecast. From left to right:

DGP 1 , DGP 2 , DGP 3 . k =

70. Solid line: truefactor loadings. Dot-dashed line: estimation from the classical factor model (‘Lee-Carter’). Dotted line: estimation from the time-varyingfactor model. Long dashed line: ‘TV-Local Regression’. Short dashed line: ‘TV-Naive’. seen from the middle and right plots in Figure 13. We ﬁnd that the Lee-Carter model cannotcapture the changed factor loadings as it assumes the factor loadings are time-invariant, whilethe time-varying model can estimate those changing factor loadings accurately and provides asolid foundation for the forecasting step.Further analysis of Table 4 suggests that the local regression method and the naive methodpreform similarly for the relatively short-term forecasting ( k = , , k = , DGP 3 in Figure 13. From the plot, we see that the forecast of the local regressionmethod follows the trend of the estimated factor loadings. Therefore, when the trend remainsthe same in a short period of time, the forecast of the local regression method is satisfying.However, as the trend of the ‘true’ factor loadings changes, the forecast of the local regressionmethod diverges away from the true values in the long term. This is the drawback of the non-30arametric forecasting method. On the other hand, the constant factor loadings used in theforecasting procedure of the naive method guarantee that the factor loadings will not divergedramatically and result in a better performance for long-term forecasting. In addition, althoughthe ‘true’ factor loading in the training set changes, if it is time-invariant in the forecastinghorizon, the naive method is also better than the local regression method even in the short term(which is supported by the plot of

DGP 2 in Figure 13). The aforementioned analysis could beused to explain why the estimated optimal ‘boundary’ of the US data is 0 in Section 5.4. FromFigure 3 we see that in the ﬁrst years of the forecasting horizon, the trends of the factor loadingsare either different from that in the training set or remain ﬂat. However, the local regressionmethod cannot capture the unknown changing trends, so the naive method can outperform iteven in the short term. In this case, the naive method not only captures the time-varying factorloadings in the estimation step, but also uses the constant factor loadings in the forecasting step.

This paper develops a time-varying factor model for mortality modelling. Two forecastingmethods, the local linear regression and naive method, are used to extrapolate the factor loadingsinto the future. To understand the optimal forecasting horizon of the two forecasting methods,we propose an approach to estimate the optimal ‘boundary’ between short-term and long-termforecasting, which is favoured by the local linear regression and the naive method, respectively.Empirical analysis on mortality forecasting of multiple countries demonstrates the advantagesof the new model.Like other existing mortality models, the time-varying factor model proposed in this papercan only extract linear features from the original data. For future research, we are interestedin developing a general non-parametric factor model, which can capture non-linear featuresin mortality data. Modern statistical techniques, such as neural networks and non-parametricestimation methods, can be used to fulﬁl this target. In addition, we are interested in extendingthe age range considered in this research to include more advanced ages (Huang et al., 2020)and exploring time-varying factor models for multi-population mortality forecasting.

References

Anderson, T. W. (1963), ‘The use of factor analysis in the statistical analysis of multiple timeseries’,

Psychometrika (1), 1–25.Bai, J. (2009), ‘Panel data models with interactive ﬁxed effects’, Econometrica (4), 1229–1279. 31ai, J. (2010), ‘Common breaks in means and variances for panel data’, Journal of Economet-rics (1), 78–92.Bai, J. and Ng, S. (2002), ‘Determining the number of factors in approximate factor models’,

Econometrica (1), 191–221.Bell, W. R. (1997), ‘Comparing and assessing time series methods for forecasting age-speciﬁcfertility and mortality rates’, Journal of Ofﬁcial Statistics (3), 279.Booth, H., Hyndman, R. J., Tickle, L. and De Jong, P. (2006), ‘Lee-Carter mortality forecasting:a multi-country comparison of variants and extensions’, Demographic Research , 289–310.Booth, H., Maindonald, J. and Smith, L. (2002), ‘Applying Lee-Carter under conditions ofvariable mortality decline’, Population Studies (3), 325–336.Booth, H. and Tickle, L. (2008), ‘Mortality modelling and forecasting: A review of methods’, Annals of Actuarial Science (1-2), 3–43.Booth, H., Tickle, L. et al. (2004), ‘Beyond three score years and ten: prospects for longevityin Australia’, People and Place (1), 15.Breitung, J. and Eickmeier, S. (2011), ‘Testing for structural breaks in dynamic factor models’, Journal of Econometrics (1), 71 – 84.Brockwell, P. J., Davis, R. A. and Fienberg, S. E. (1991),

Time Series: Theory and Methods:Theory and Methods , Springer Science & Business Media.Cairns, A. J. G., Blake, D., Dowd, K., Coughlan, G. D., Epstein, D., Ong, A. and Balevich, I.(2009), ‘A quantitative comparison of stochastic mortality models using data from Englandand Wales and the United States’,

North American Actuarial Journal (1), 1–35.Chang, J., Guo, B., Yao, Q. et al. (2018), ‘Principal component analysis for second-order sta-tionary vector time series’, The Annals of Statistics (5), 2094–2124.Chen, L., Dolado, J. J. and Gonzalo, J. (2014), ‘Detecting big structural breaks in large factormodels’, Journal of Econometrics (1), 30 – 48.CMI (2009), Continuous Mortality Investigation: A prototype mortality projections model: parttwo – detailed analysis, Working Paper 39, The Institute of Actuaries and the Faculty ofActuaries.Currie, I. D., Durban, M. and Eilers, P. H. (2004), ‘Smoothing and forecasting mortality rates’,

Statistical Modelling (4), 279–298. 32an, J. and Gijbels, I. (1996), Local polynomial modelling and its applications: monographs onstatistics and applied probability 66 , Vol. 66, CRC Press.Friedman, J., Hastie, T. and Tibshirani, R. (2001),

The Elements of Statistical Learning , Vol. 1,Springer series in statistics New York, NY, USA:.Hollmann, F. W., Mulder, T. J. and Kallan, J. E. (1999),

Methodology & Assumptions for thePopulation Projections of the United States: 1999 to 2010 , US Department of Commerce,Bureau of the Census, Population Division . . . .Huang, F., Maller, R. and Ning, X. (2020), ‘Modelling life tables with advanced ages: Anextreme value theory approach’,

Insurance: Mathematics and Economics , 95 – 115.Hyndman, R. J. and Ullah, M. S. (2007), ‘Robust forecasting of mortality and fertility rates: afunctional data approach’, Computational Statistics & Data Analysis (10), 4942–4956.Lam, C. and Yao, Q. (2012), ‘Factor modeling for high-dimensional time series: inference forthe number of factors’, The Annals of Statistics (2), 694–726.Lee, R. D. and Carter, L. R. (1992), ‘Modeling and forecasting US mortality’, Journal of theAmerican Statistical Association (419), 659–671.Lee, R. and Miller, T. (2001), ‘Evaluating the performance of the lee-carter method for fore-casting mortality’, Demography (4), 537–549.Li, H. and O’Hare, C. (2017), ‘Semi-parametric extensions of the Cairns-Blake-Dowd model:A one-dimensional kernel smoothing approach’, Insurance: Mathematics and Economics , 166 – 176.Li, H., O’Hare, C. and Zhang, X. (2015), ‘A semiparametric panel approach to mortality mod-eling’, Insurance: Mathematics and Economics , 264 – 270.Li, S.-H. and Chan, W.-S. (2005), ‘Outlier analysis and mortality forecasting: the United King-dom and Scandinavian countries’, Scandinavian Actuarial Journal (3), 187–211.Lundstr¨om, H. and Qvist, J. (2004), ‘Mortality forecasting and trend shifts: an application of theLee-Carter model to Swedish mortality data’,

International Statistical Review (1), 37–50.Park, B. U., Mammen, E., H¨ardle, W. and Borak, S. (2009), ‘Time series modelling with semi-parametric factor dynamics’, Journal of the American Statistical Association (485), 284–298.Pena, D. and Box, G. E. (1987), ‘Identifying a simplifying structure in time series’,

Journal ofthe American Statistical Association (399), 836–843.33enshaw, A. E. and Haberman, S. (2003), ‘Lee-Carter mortality forecasting with age-speciﬁcenhancement’, Insurance: Mathematics and Economics (2), 255–272.Stock, J. H. and Watson, M. W. (2002), ‘Forecasting using principal components from a largenumber of predictors’, Journal of the American Statistical Association (460), 1167–1179.Stone, M. (1977), ‘An asymptotic equivalence of choice of model by cross-validation andAkaike’s criterion’, Journal of the Royal Statistical Society. Series B (Methodological) pp. 44–47.Su, L. and Wang, X. (2017), ‘On time-varying factor models: estimation and testing’,

Journalof Econometrics (1), 84–101.Tuljapurkar, S., Li, N. and Boe, C. (2000), ‘A universal pattern of mortality decline in the G7countries’,

Nature (6788), 789.University of California, Berkeley (USA) and Max Planck Institute for Demographic Re-search (Germany) (n.d.), ‘

Human Mortality Database ’. [Accessed:2018.07.10].Yang, S. S., Yue, J. C. and Huang, H.-C. (2010), ‘Modeling longevity risks using a principalcomponent approach: A comparison with existing stochastic mortality models’, Insurance:Mathematics and Economics (1), 254–270.34 ppendix A Our analysis of the mortality forecasting in the main text focuses on age-speciﬁc US mor-tality data of the total population. This appendix provides additional gender-speciﬁc results ofthe estimation and out-of-sample forecasting for both males and females.

Male

The results for the male subpopulation are showed in Figure 14 and Figure 15.In terms of the goodness of ﬁt, the overall MSE for the Lee-Carter model is 0 . . . . . ∼ year M SPE name

Lee−Carterlocal_regressionnaive

Male:Year−specific MSPE

Figure 14: Year-speciﬁc MSPE for both the time-varying model and the Lee-Carter model; for time-varying model, both naive method andlocal regression method are used;

Male subpopulation. .00.10.20.3 0 10 20 30 40 50 60 70 80 90 age M SPE name

Lee−Carterlocal_regressionnaive

Age−specific MSPE

Figure 15: Age-speciﬁc MSPE for both the time-varying model and the Lee-Carter model; for time-varying model, both naive method andlocal regression method are used;

Male subpopulation.

Female

The results for the female subpopulation are showed in Figure 16 and Figure 17.In terms of the goodness of ﬁt, the overall MSE for the Lee-Carter model is 0 . . . . . ∼ ∼ ∼

40) andthe older adulthood (55 ∼ .0000.0250.0500.0750.1000.125 1995 2000 2005 2010 2015 year M SPE name

Lee−Carterlocal_regressionnaive

Female:Year−specific MSPE

Figure 16: Year-speciﬁc MSPE for both the time-varying model and the Lee-Carter model; for time-varying model, both the naive method andlocal regression method are used;

Female subpopulation. age M SPE name

Lee−Carterlocal_regressionnaive

Age−specific MSPE

Figure 17: Age-speciﬁc MSPE for both the time-varying model and the Lee-Carter model; for time-varying model, both naive method andlocal regression method are used;

Female subpopulation. ppendix B l l l l l l l l l l l l l l l l k SS R length of horizon: 15 years (AUSTRALIA) l l l l l l l l l l l l l l l l l l l l l k SS R length of horizon: 20 years (AUSTRALIA) l l l l l l l l l l l l l l l l l l l l l l l l l l k SS R length of horizon: 25 years (AUSTRALIA) lllllllllllllllllllllllllllllll k SS R length of horizon: 30 years (AUSTRALIA) l l l l l l l l l l l l l l l l k SS R length of horizon: 15 years (FRANCE) l l l l l l l l l l l l l l l l l l l l l k SS R length of horizon: 20 years (FRANCE) l l l l l l l l l l l l l l l l l l l l l l l l l l k SS R length of horizon: 25 years (FRANCE) lllllllllllllllllllllllllllllll k SS R length of horizon: 30 years (FRANCE) l l l l l l l l l l l l l l l l k SS R length of horizon: 15 years (JAPAN) l l l l l l l l l l l l l l l l l l l l l k SS R length of horizon: 20 years (JAPAN) l l l l l l l l l l l l l l l l l l l l l l l l l l k SS R length of horizon: 25 years (JAPAN) lllllllllllllllllllllllllllllll k SS R length of horizon: 30 years (JAPAN) l l l l l l l l l l l l l l l l k SS R length of horizon: 15 years (CANADA) l l l l l l l l l l l l l l l l l l l l l k SS R length of horizon: 20 years (CANADA) l l l l l l l l l l l l l l l l l l l l l l l l l l k SS R length of horizon: 25 years (CANADA) lllllllllllllllllllllllllllllll k SS R length of horizon: 30 years (CANADA) l l l l l l l l l l l l l l l l k SS R length of horizon: 15 years (ITALY) l l l l l l l l l l l l l l l l l l l l l k SS R length of horizon: 20 years (ITALY) l l l l l l l l l l l l l l l l l l l l l l l l l l k SS R length of horizon: 25 years (ITALY) lllllllllllllllllllllllllllllll k SS R length of horizon: 30 years (ITALY) l l l l l l l l l l l l l l l l k SS R length of horizon: 15 years (USA) l l l l l l l l l l l l l l l l l l l l l k SS R length of horizon: 20 years (USA) l l l l l l l l l l l l l l l l l l l l l l l l l l k SS R length of horizon: 25 years (USA) lllllllllllllllllllllllllllllll k SS R length of horizon: 30 years (USA) Figure 18: Plots of the total sum of squared residuals (SSR) versus the length ( k ) of the short-term forecast horizon (based on the hybridforecasting method of time-varying factor model); length of forecast horizon: 15, 20, 25 and 30 ppendix C In Section 5.1, we only present the estimation results of the time-varying model with asingle factor, as the number of factors for the US mortality data is estimated as 1. This appendixprovides some further empirical results of ﬁtting multi-factor time-varying model to the USmortality data. The number of factors is 6 (i.e. R = 6) in the following analysis. To solvethe identiﬁcation problem in the multi-factor context, we require the invertible matrix

HHH t as adiagonal matrix and normalize the bbb x , t to sum to unity for each t . For more details, please referto Section 2.1. year f a c t o r Factors (R = 6)

Figure 19: Plots of the estimated factors for the time-varying factor model ( R = All six factors are depicted in Figure 19. And in Figure 20, we plot all the correspondingtime-varying factor loadings for ages 20, 40 and 60. We can see that the ﬁrst factor and itscorresponding factor loadings remain the same as the results in Section 5.1. The explanation isthat theoretically, increasing the number of factors R does not affect the estimation of the ﬁrstcommon factor κκκ t and the corresponding factor loading bbb x , t in the localized PCA approach (seethe estimation method in Section 2.2). The ﬁrst factor captures the major trend of the mortalitydata, while the remaining factors are much less important with little information. It validatesour decision to use one factor in the empirical analysis in Section 5.39 th factor 5th factor 6th factor1st factor 2nd factor 3rd factor1950 1975 2000 1950 1975 2000 1950 1975 2000−10.0−7.5−5.0−2.50.02.5−50510−101230102030400.0000.0030.0060.0090.012−10−505 year f a c t o r l oad i ng Factor Loadings (R = 6 & age = 20) year f a c t o r l oad i ng Factor Loadings (R = 6 & age = 40) year f a c t o r l oad i ng Factor Loadings (R = 6 & age = 60)

Figure 20: Plots of the estimated time-varying factor loadings for age 20, 40 and 60 ( R =6)