Actuarial Applications and Estimation of Extended CreditRisk+
JONAS HIRZ, UWE SCHMOCK, AND PAVEL V. SHEVCHENKO
Abstract.
We introduce an additive stochastic mortality model which allows joint modelling and forecasting of underlying death causes. Parameter families for mortality trends can be chosen freely. As model settings become high dimensional, Markov chain Monte Carlo (MCMC) is used for parameter estimation. We then link our proposed model to an extended version of the credit risk model CreditRisk+. This allows exact risk aggregation via an efficient numerically stable Panjer recursion algorithm and provides numerous applications in credit, life insurance and annuity portfolios to derive P&L distributions. Furthermore, the model allows exact (without Monte Carlo simulation error) calculation of risk measures and their sensitivities with respect to model parameters for P&L distributions such as value-at-risk and expected shortfall. Numerous examples, including an application to partial internal models under Solvency II, using Austrian and Australian data are shown.
Contents

1. Introduction
2. An Alternative Stochastic Mortality Model
2.1. Basic Definitions and Notation
2.2. Some Classical Stochastic Mortality Models
2.3. An Additive Stochastic Mortality Model
3. Parameter Estimation
3.1. Estimation via Maximum Likelihood
3.2. Estimation via a Maximum a Posteriori Approach
3.3. Estimation via MCMC
3.4. Estimation via Matching of Moments
4. Applications
4.1. Prediction of Underlying Death Causes
4.2. Forecasting Death Probabilities
5. A Link to the Extended CreditRisk+ Model and Applications
5.1. The ECRP Model
Date: May 3, 2017.
Key words and phrases. stochastic mortality model; extended CreditRisk+; risk aggregation; partial internal model; mortality risk; longevity risk; Markov chain Monte Carlo.

J. Hirz gratefully acknowledges financial support from the Australian Government via the 2014 Endeavour Research Fellowship, as well as from the Oesterreichische Nationalbank (Anniversary Fund, project number: 14977) and Arithmetica. P. V. Shevchenko gratefully acknowledges financial support by the CSIRO-Monash Superannuation Research Cluster, a collaboration among CSIRO, Monash University, Griffith University, the University of Western Australia, the University of Warwick, and stakeholders of the retirement system in the interest of better outcomes for all.
1. Introduction
As the current low interest rate environment forces insurers to put more focus on biometric risks, proper stochastic modelling of mortality has become increasingly important. New regulatory requirements such as Solvency II (see https://eiopa.europa.eu/regulation-supervision/insurance/solvency-ii, accessed on March 28, 2017) allow the use of internal stochastic models which provide a more risk-sensitive evaluation of capital requirements and the ability to derive accurate profit and loss (P&L) attributions with respect to different sources of risk. Benefits for companies which make use of actuarial tools such as internal models depend crucially on the accuracy of predicted death probabilities and the ability to extract different sources of risk. So far, insurers often use deterministic modelling techniques and then add artificial risk margins to account for risks associated with longevity, size of the portfolio, selection phenomena, estimation and various other sources. Such approaches often lack a stochastic foundation and are certainly not consistently appropriate for all companies.

Deriving P&L distributions of large credit, life and pension portfolios is typically a very challenging task. In applications, Monte Carlo is the most commonly used approach as it is easy to implement for all different kinds of stochastic settings. However, it has shortcomings in finesse and speed, especially for the calculation of model sensitivities. Motivated by numerical trials, we found that the credit risk model extended CreditRisk+ (ECRP), as introduced in [Schmock(2017), section 6], is an exceptionally efficient as well as flexible alternative to Monte Carlo and, simultaneously, fits into life actuarial settings as well. Coming from credit risk, this model allows flexible handling of dependence structures within a portfolio via common stochastic risk factors. The ECRP model relies on Panjer recursion (cf. [Sundt(1999)]) which, unlike Monte Carlo, does not require simulation. It allows an efficient implementation to derive P&L distributions exactly, given input data and chosen granularity associated with discretisation. The speed-up for deriving sensitivities, i.e., derivatives, of risk measures with respect to model parameters using Panjer recursion will even be an order of magnitude larger as it is extremely difficult to calculate them via finite differencing using Monte Carlo. In addition, our proposed approach can enhance pricing of retirement income products and can be used for applications to partial internal models in the underwriting risk module.

In Section 2 we introduce an additive stochastic mortality model which is related to classical approaches such as the model introduced in [Lee and Carter(1992)] or models discussed in [Cairns et al.(2009)]. It allows joint modelling of underlying stochastic death causes based on Poisson assumptions where dependence is introduced via common stochastic risk factors. Note that forecasting of death causes in a disaggregated way can lead to problems with dominating causes in the long run, as argued in [Wilmoth(1995)] as well as [Booth and Tickle(2008)]. However, joint modelling of death causes can yield computational issues due to high dimensionality, which is why the literature is sparse in this matter.
An extensive literature review and a multinomial logistic model for joint modelling of death causes are studied in [Alai et al.(2015)].

Given suitable mortality data, in Section 3 we provide several methods to estimate model parameters including matching of moments, a maximum a posteriori approach and maximum likelihood, as well as Markov chain Monte Carlo (MCMC). Death and population data are usually freely available on governmental websites or at statistical bureaus. Due to the high dimensionality of our problem, we suggest the use of MCMC which is one of the few statistical approaches allowing joint parameter estimation in high dimensions. MCMC can become time-consuming, in particular for settings with common stochastic risk factors. However, estimation of model parameters does usually not have to be done on a frequent basis. We propose a parameter family for mortality trends which makes our model a generalisation of the Lee–Carter approach. However, our approach allows the use of any other kind of parameter family as MCMC is very flexible.

In Section 4 we estimate model parameters for Australian death data in a setting with 362 model parameters, where trends, trend acceleration/reduction and cohort effects are estimated. Further applications include forecasting of central mortality rates and expected future life time.

In Section 5 we then introduce the ECRP model, see [Schmock(2017), section 6], which is a collective risk model corresponding one-to-one to our proposed stochastic mortality model. As the name suggests, it is a credit risk model used to derive loss distributions of credit portfolios and originates from the classical CreditRisk+ model which was introduced by [Credit Suisse First Boston(1997)]. Within credit risk models it is classified as a Poisson mixture model. Identifying default with death makes the model perfectly applicable for actuarial applications. Extended CreditRisk+ provides a flexible basis for modelling multi-level dependencies and allows a fast and numerically stable algorithm for risk aggregation. In the ECRP model, deaths are driven by independent stochastic risk factors. The number of deaths of each policyholder is assumed to be Poisson distributed with stochastic intensity. Thus, serving as an approximation for the true case with single deaths, each person can die multiple times within a period. However, with proper parameter scaling, approximations are very good and final loss distributions are accurate due to Poisson approximation, as shown in [Barbour et al.(1992)] or [Vellaisamy and Chaudhuri(1996)], as well as the references therein. The close fit of the ECRP model with (mixed) Poisson distributed deaths to more realistic Bernoulli models is outlined in an introductory example. Another great advantage of the ECRP model is that it automatically incorporates many different sources of risks, such as trends, statistical volatility risk and parameter risk.

Section 6 briefly illustrates validation and model selection techniques. Model validation approaches are based on previously defined dependence and independence structures. All tests suggest that the model suitably fits Australian mortality data.
2. An Alternative Stochastic Mortality Model
2.1. Basic Definitions and Notation.
Following standard actuarial notations and definitions, [Pitacco et al.(2009)] or [Cairns et al.(2009)], let $T_{a,g}(t)$ denote the random variable of remaining life time of a person aged $a \in \{0, 1, \dots, A\}$, with maximum age $A \in \mathbb{N}$, and of gender $g \in \{\mathrm{m}, \mathrm{f}\}$ at time/year $t \in \mathbb{N}$. Survival and death probabilities over a time frame $\tau \ge 0$ are given by ${}_\tau p_{a,g}(t) = \mathbb{P}(T_{a,g}(t) > \tau)$ and ${}_\tau q_{a,g}(t) = \mathbb{P}(T_{a,g}(t) \le \tau)$, respectively. For notational purposes we write $q_{a,g}(t) := {}_1q_{a,g}(t)$.

The deterministic force of mortality (theory for the stochastic case is also available) at age $a+\tau$ with gender $g$ of a person aged $a$ at time $t$ is given by the derivative $\mu_{a+\tau,g}(t) := -\frac{\partial}{\partial \tau} \log {}_\tau p_{a,g}(t)$. Henceforth, the central death rate of a person aged $a$ at time $t$ and of gender $g$ is given by a weighted average of the force of mortality,
$$m_{a,g}(t) := \frac{\int_0^1 {}_sp_{a,g}(t+s)\,\mu_{a+s,g}(t+s)\,ds}{\int_0^1 {}_sp_{a,g}(t+s)\,ds} = \frac{q_{a,g}(t)}{\int_0^1 {}_sp_{a,g}(t+s)\,ds} \approx \frac{q_{a,g}(t)}{1 - q_{a,g}(t)/2}.$$
If $\mu_{a+s,g}(t+s) = \mu_{a,g}(t)$ for all $0 \le s < 1$ and $a, t \in \mathbb{N}$ with $a \le A$, i.e., under piecewise constant force of mortality, we have $m_{a,g}(t) = \mu_{a,g}(t)$, as well as $q_{a,g}(t) = 1 - \exp(-m_{a,g}(t))$.

Let $N_{a,g}(t)$ denote the number of recorded deaths in year $t$ of people having age $a$ and gender $g$, and define the exposure to risk $E_{a,g}(t)$ as the average number of people in year $t$ having age $a$ and gender $g$. The latter can often be retrieved from statistical bureaus or approximated by the age-dependent population in the middle of a calendar year. Estimates for these data in Australia (with several adjustment components such as census undercount and immigration taken into account) are available at the website of the Australian Bureau of Statistics. Considering underlying death causes $k = 0, \dots, K$, which are to be understood as diseases or injuries that initiated the train of morbid events leading directly to death, let $N_{a,g,k}(t)$ denote the actual number of recorded deaths due to death cause $k$ in year $t$ of people having age $a$ and gender $g$. Note that $N_{a,g}(t) = N_{a,g,0}(t) + \cdots + N_{a,g,K}(t)$. Data on ICD-classified (short for International Statistical Classification of Diseases and Related Health Problems) death counts can be found for many countries. For Australia these data can be found at the Australian Institute of Health and Welfare (AIHW), classified by ICD-9 and ICD-10.
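To illustrate these identities, consider the following minimal sketch in 'R' (the language we also use for implementations later on); the function names are ours and purely illustrative.

```r
# Conversions between one-year death probabilities q and central death rates m
# (Section 2.1): m ~ q / (1 - q/2), and q = 1 - exp(-m) under a piecewise
# constant force of mortality.
m_from_q <- function(q) q / (1 - q / 2)
q_from_m <- function(m) 1 - exp(-m)

q <- 0.01
m <- m_from_q(q)   # approximately 0.0100503
q_from_m(m)        # approximately 0.0100000, recovering q
```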
2.2. Some Classical Stochastic Mortality Models.

We start with a simple model and assume that deaths in year $t$ of people having age $a$ and gender $g$ are Poisson distributed, $N_{a,g}(t) \sim \mathrm{Poisson}(E_{a,g}(t)\,m_{a,g}(t))$. In this case the maximum likelihood estimate for the central death rate is given by $\widehat{m}_{a,g}(t) = \widehat{N}_{a,g}(t)/E_{a,g}(t)$, where $\widehat{N}_{a,g}(t)$ is the actual recorded number of deaths.

The benchmark stochastic mortality model considered in the literature is the traditional Lee–Carter model, [Lee and Carter(1992)], where the logarithmic central death rates are modelled in the form
$$\log \widehat{m}_{a,g}(t) = \alpha_{a,g} + \beta_{a,g}\,\kappa_t + \varepsilon_{a,g,t},$$
with independent normal error terms $\varepsilon_{a,g,t}$ with mean zero, a common time-specific component $\kappa_t$, as well as age and gender specific parameters $\alpha_{a,g}$ and $\beta_{a,g}$. Using suitable normalisations, estimates for these parameters and $\kappa_t$ can be derived via the method of moments and singular value decompositions, [Kainhofer et al.(2006), section 4.5.1]. Forecasts may then be obtained by applying auto-regressive models to $\kappa_t$. Note that [Fung et al.(2017)] and [Fung et al.(2015)] provide joint estimation of parameters and the latent factor $\kappa_t$ in the Lee–Carter model via a state-space framework using MCMC. Various extensions of this classical approach with multiple time factors and cohort components have been proposed in the literature; for a review, see [Cairns et al.(2009)].

Instead of modelling central death rates with normal error terms as in the Lee–Carter approach, [Brouhns et al.(2002)] propose to model death counts via Poisson regression where error terms are replaced by Poisson random variables. In this case $N_{a,g}(t) \sim \mathrm{Poisson}(E_{a,g}(t)\,m_{a,g}(t))$ where, in the simplest case, $\log m_{a,g}(t) = \alpha_{a,g} + \beta_{a,g}\kappa_t$. Correspondingly, assuming that we want to forecast central death rates for different underlying death causes $k$, it is natural to assume $N_{a,g,k}(t) \sim \mathrm{Poisson}(E_{a,g}(t)\,m_{a,g,k}(t))$ where $\log m_{a,g,k}(t) = \alpha_{a,g,k} + \beta_{a,g,k}\kappa_{k,t}$. However, in this case, it is not difficult to see that $m_{a,g,0}(t) + \cdots + m_{a,g,K}(t) \neq m_{a,g}(t)$, in general, and thus
$$\mathbb{E}[N_{a,g}(t)] \neq \sum_{k=0}^{K} \mathbb{E}[N_{a,g,k}(t)]$$
since $N_{a,g,0}(t) + \cdots + N_{a,g,K}(t) \sim \mathrm{Poisson}\bigl(E_{a,g}(t)(m_{a,g,0}(t) + \cdots + m_{a,g,K}(t))\bigr)$. Moreover, as central death rates are increasing for selected underlying death causes (e.g., central death rates for 75–79 year olds in Australia have doubled from 1987 to 2011 for certain causes), forecasts increase exponentially, exceeding one in the future.

In light of this shortcoming, we will introduce an additive stochastic mortality model which fits into the risk aggregation framework of extended CreditRisk+, see [Schmock(2017)].
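As a concrete illustration of the classical estimation procedure just described, a minimal 'R' sketch of a Lee–Carter fit for one gender via centring and a rank-one singular value decomposition could look as follows; the matrix interface and the particular normalisation are our own assumptions, not a prescription from [Lee and Carter(1992)].

```r
# Minimal Lee-Carter fit, log m_a(t) = alpha_a + beta_a * kappa_t, for one
# gender: 'logm' is an (ages x years) matrix of observed log central death rates.
lee_carter <- function(logm) {
  alpha <- rowMeans(logm)              # age-specific level
  Z     <- sweep(logm, 1, alpha)       # centred log death rates
  s     <- svd(Z)                      # rank-one approximation
  beta  <- s$u[, 1] / sum(s$u[, 1])    # normalised so that sum(beta) = 1
  kappa <- s$d[1] * s$v[, 1] * sum(s$u[, 1])
  list(alpha = alpha, beta = beta, kappa = kappa)
}
# Forecasts are then obtained by fitting, e.g., a random walk with drift to kappa.
```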
2.3. An Additive Stochastic Mortality Model.

To be able to model different underlying death causes or, more generally, different covariates which show some common death behaviour (however, we will restrict to the first case in this paper), let us assume common stochastic risk factors $\Lambda_1(t), \dots, \Lambda_K(t)$ with corresponding age-dependent weights $w_{a,g,k}(t)$ which give the age-dependent susceptibility to the different risk factors and which satisfy $w_{a,g,0}(t) + \cdots + w_{a,g,K}(t) = 1$.

Remark 2.1. Risk factors introduce dependence amongst deaths of different policyholders. If risk factor $\Lambda_k(t)$ takes large or small values, then the likelihood of death due to $k$ increases or decreases, respectively, simultaneously for all policyholders depending on the weight $w_{a,g,k}(t)$. Weights $w_{a,g,1}, \dots, w_{a,g,K}$ indicate the vulnerability of people aged $a$ with gender $g$ to risk factors $\Lambda_1(t), \dots, \Lambda_K(t)$. Risk factors are of particular importance to forecast death causes. For a practical example, assume that a new, very effective cancer treatment is available such that fewer people die from lung cancer. This situation would have a longevity effect on all policyholders. Such a scenario would then correspond to the case when the risk factor for neoplasms shows a small realisation.
Definition 2.2 (Additive stoch. mort. model). Given risk factors $\Lambda_1(t), \dots, \Lambda_K(t)$ with unit mean and variances $\sigma_1^2(t), \dots, \sigma_K^2(t)$, assume
$$N_{a,g,k}(t) \sim \mathrm{Poisson}\bigl(E_{a,g}(t)\,m_{a,g}(t)\,w_{a,g,k}(t)\,\Lambda_k(t)\bigr), \quad k = 1, \dots, K,$$
being conditionally independent of all $N_{a,g,k'}(t)$ with $k \neq k'$. Idiosyncratic deaths $N_{a,g,0}(t)$ with $k = 0$ are assumed to be mutually independent and independent of all other random variables such that
$$N_{a,g,0}(t) \sim \mathrm{Poisson}\bigl(E_{a,g}(t)\,m_{a,g}(t)\,w_{a,g,0}(t)\bigr).$$

In this case, in expectation, deaths due to different underlying death causes add up correctly, i.e.,
$$\mathbb{E}[N_{a,g}(t)] = E_{a,g}(t)\,m_{a,g}(t) = E_{a,g}(t)\,m_{a,g}(t) \sum_{k=0}^{K} w_{a,g,k}(t) = \sum_{k=0}^{K} \mathbb{E}[N_{a,g,k}(t)],$$
as $\mathbb{E}[\Lambda_k(t)] = 1$ by assumption.

Remark 2.3. In applications, if $K = 0$, it is feasible to replace the Poisson assumption by a more realistic binomial assumption $N_{a,g,0}(t) \sim \mathrm{Binomial}\bigl(E_{a,g}(t), m_{a,g}(t)\bigr)$, as done in Section 4.2 for illustration purposes.

Remark 2.4. If risk factors are independent and gamma distributed (as in the case of classical CreditRisk+), then, unconditionally, deaths $N_{a,g,k}(t)$ have a negative binomial distribution. Then, the variance of deaths is given by $\mathrm{Var}(N_{a,g,k}(t)) = E_{a,g}(t)\,m_{a,g}(t)\,w_{a,g,k}(t)\bigl(1 + E_{a,g}(t)\,m_{a,g}(t)\,w_{a,g,k}(t)\,\sigma_k^2(t)\bigr)$ with $\sigma_k^2(t)$ denoting the variance of $\Lambda_k(t)$. Analogously, for all $a \neq a'$ or $g \neq g'$,
$$\mathrm{Cov}\bigl(N_{a,g,k}(t), N_{a',g',k}(t)\bigr) = E_{a,g}(t)\,E_{a',g'}(t)\,m_{a,g}(t)\,m_{a',g'}(t)\,w_{a,g,k}(t)\,w_{a',g',k}(t)\,\sigma_k^2(t). \quad (2.5)$$
This result will be used in Section 6 for model validation. A similar result also holds for the more general model with dependent risk factors, see [Schmock(2017), section 6.5].

To account for improvement in mortality and shifts in death causes over time, we introduce the following time-dependent parameter families for trends. Similar to the Lee–Carter model, we could simply consider a linear decrease in log mortality. However, since this yields diminishing or exploding mortality over time, we choose a more sophisticated class with trend reduction features. First, in order to guarantee that values lie in the unit interval, let $F^{\mathrm{Lap}}$ denote the Laplace distribution function with mean zero and variance two, i.e.,
$$F^{\mathrm{Lap}}(x) = \tfrac{1}{2} + \tfrac{1}{2}\,\mathrm{sign}(x)\bigl(1 - \exp(-|x|)\bigr), \quad x \in \mathbb{R},$$
such that, for $x < 0$, twice the expression becomes the exponential function.
To ensure that weights and death probabilities are strictly positive for $t \to \infty$, we use the trend reduction/acceleration technique
$$T^*_{\zeta,\eta}(t) = \frac{1}{\eta} \arctan\bigl(\eta(t - \zeta)\bigr), \quad (2.6)$$
with parameters $(\zeta, \eta) \in \mathbb{R} \times (0, \infty)$ and $t \in \mathbb{R}$, motivated by [Kainhofer et al.(2006), section 4.6.2]. In particular, (2.6) roughly gives a linear function of $t$ if parameter $\eta$ is small, which illustrates the close link to the Lee–Carter model. In order to make estimation more stable, we suggest the normalisation
$$T_{\zeta,\eta}(t) = \frac{T^*_{\zeta,\eta}(t) - T^*_{\zeta,\eta}(t_0)}{T^*_{\zeta,\eta}(t_0) - T^*_{\zeta,\eta}(t_0 - 1)}, \quad t \in \mathbb{R},$$
with normalisation year $t_0$. A clear trend reduction in mortality improvements can be observed in Japan since 1970, see [Pasdika and Wolff(2005), section 4.2], and also for females in Australia.

Definition 2.7 (Trend families for central death rates and weights). Central death rates for age $a$, gender $g$ in year $t$ are given by
$$m_{a,g}(t) = F^{\mathrm{Lap}}\bigl(\alpha_{a,g} + \beta_{a,g}\,T_{\zeta_{a,g},\eta_{a,g}}(t) + \gamma_{t-a}\bigr), \quad (2.8)$$
with parameters $\alpha_{a,g}, \beta_{a,g}, \zeta_{a,g}, \gamma_{t-a} \in \mathbb{R}$ and $\eta_{a,g} \in (0, \infty)$, and where weights are given by
$$w_{a,g,k}(t) = \frac{\exp\bigl(u_{a,g,k} + v_{a,g,k}\,T_{\phi_k,\psi_k}(t)\bigr)}{\sum_{j=0}^{K} \exp\bigl(u_{a,g,j} + v_{a,g,j}\,T_{\phi_j,\psi_j}(t)\bigr)}, \quad k \in \{0, \dots, K\}, \quad (2.9)$$
with parameters $u_{a,g,0}, v_{a,g,0}, \phi_0, \dots, u_{a,g,K}, v_{a,g,K}, \phi_K \in \mathbb{R}$ and $\psi_0, \dots, \psi_K \in (0, \infty)$.

The assumptions above yield an exponential evolution of central death rates over time, modulo trend reduction $T_{\zeta,\eta}(t)$ and cohort effects $\gamma_{t-a}$ ($t - a$ refers to the birth year). Vector $\alpha$ can be interpreted as an intercept parameter for central death rates. Henceforth, $\beta$ gives the speed of mortality improvement, while $\eta$ gives the speed of trend reduction and $\zeta$ gives the shift on the S-shaped arctangent curve, i.e., the location of trend acceleration and trend reduction. Parameter $\gamma_{t-a}$ models cohort effects for groups with the same year of birth. This factor can also be understood as a categorical variate such as smoker/non-smoker, diabetic/non-diabetic or country of residence. The interpretation of model parameters for families of weights is similar. Cohort effects are not used for modelling weights $w_{a,g,k}(t)$ as sparse data do not allow proper estimation. In applications, we suggest to fix $\phi$ and $\psi$ in order to reduce dimensionality to suitable levels. Furthermore, fixing trend acceleration/reduction parameters $(\zeta, \eta, \phi, \psi)$ yields stable results over time, with similar behaviour as in the Lee–Carter model. Including trend reduction parameters can lead to less stable results over time. However, our proposed model allows free adaption of parameter families for mortality and weights.

Remark 2.10. Long-term projections of death probabilities using (2.8) give
$$\lim_{t\to\infty} m_{a,g}(t) = F^{\mathrm{Lap}}\Bigl(\alpha_{a,g} + \beta_{a,g}\,\frac{\pi}{2\eta_{a,g}}\Bigr).$$
Likewise, long-term projections for weights using (2.9) are given by
$$\lim_{t\to\infty} w_{a,g,k}(t) = \frac{\exp\bigl(u_{a,g,k} + v_{a,g,k}\,\pi/(2\psi_k)\bigr)}{\sum_{j=0}^{K} \exp\bigl(u_{a,g,j} + v_{a,g,j}\,\pi/(2\psi_j)\bigr)}.$$
Thus, given weak trend reduction, i.e., $\psi_k$ close to zero, weights with the strongest trend will tend to dominate in the long term. If we a priori fix the parameter for trend reduction $\psi_k$ at suitable values, this effect can be controlled. Alternatively, different parameter families for weights can be used, e.g., linear families. Note that our model ensures that weights across risk factors $k = 0, 1, \dots, K$ always sum up to one, which is why overall mortality $m_{a,g}(t)$ is not influenced by weights and their trends.
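For illustration, a minimal 'R' sketch of the building blocks of Definition 2.7 could look as follows; the function names and the argument t0 (the normalisation year) are our own notation.

```r
# Laplace distribution function with mean zero and variance two
F_lap <- function(x) 0.5 + 0.5 * sign(x) * (1 - exp(-abs(x)))

# Trend acceleration/reduction (2.6) and its normalised version
T_star <- function(t, zeta, eta) atan(eta * (t - zeta)) / eta
T_norm <- function(t, zeta, eta, t0) {
  (T_star(t, zeta, eta) - T_star(t0, zeta, eta)) /
    (T_star(t0, zeta, eta) - T_star(t0 - 1, zeta, eta))
}

# Central death rate (2.8) for one age/gender cell
m_rate <- function(t, alpha, beta, zeta, eta, t0, gamma = 0) {
  F_lap(alpha + beta * T_norm(t, zeta, eta, t0) + gamma)
}
```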
3. Parameter Estimation
In this section we provide several approaches for parameter estimation in our proposed model from Definitions 2.2 and 2.7. The approaches include maximum likelihood, maximum a posteriori, matching of moments and MCMC. Whilst matching of moments estimates are easy to derive but less accurate, maximum a posteriori and maximum likelihood estimates cannot be calculated by deterministic numerical optimisation, in general. Thus, we suggest MCMC as a slower but very powerful alternative. Publicly available data based on the whole population of a country are used.

[McNeil et al.(2005)] in section 8.6 consider statistical inference for Poisson mixture models and Bernoulli mixture models. They briefly introduce moment estimators and maximum likelihood estimators for homogeneous groups in Bernoulli mixture models. Alternatively, they derive statistical inference via a generalised linear mixed model representation for mixture models which is distantly related to our setting. In their 'Notes and Comments' section the reader can find a comprehensive list of interesting references. Nevertheless, most of their results and arguments are not directly applicable to our case since we use a different parametrisation and since we usually have rich data of death counts compared to the sparse data on company defaults.

In order to be able to derive statistically sound estimates, we make the following simplifying assumption of time independence:
Definition 3.1 (Time independence and risk factors). Given Definition 2.2, consider discrete-time periods $\mathcal{U} := \{1, \dots, T\}$ and assume that random variables are independent for different points in time $s \neq t$ in $\mathcal{U}$. Moreover, for each $t \in \mathcal{U}$, risk factors $\Lambda_1(t), \dots, \Lambda_K(t)$ are assumed to be independent and, for each $k \in \{1, \dots, K\}$, $\Lambda_k(1), \dots, \Lambda_k(T)$ are identically gamma distributed with mean one and variance $\sigma_k^2 \ge 0$.

For estimation and forecasting of death probabilities we may take $K = 0$, or $K = 1$ with $w_{a,g,1} = 1$ for all ages and genders. For estimation and forecasting of death causes, we identify risk factors with underlying death causes. Note that for fixed $a$ and $g$, Equation (2.9) is invariant under a constant shift of the parameters $(u_{a,g,k})_{k\in\{0,\dots,K\}}$, as well as $(v_{a,g,k})_{k\in\{0,\dots,K\}}$ if $\phi_0 = \cdots = \phi_K$ and $\psi_0 = \cdots = \psi_K$, respectively. Thus, for each $a$ and $g$, we can always choose fixed and arbitrary values for $u_{a,g,0}$ and $v_{a,g,0}$.

3.1. Estimation via Maximum Likelihood.
We start with the classical maximum likelihood approach. The likelihood function can be derived in closed form but, unfortunately, estimates have to be derived via MCMC as deterministic numerical optimisation quickly breaks down due to high dimensionality.
Lemma 3.2 (Likelihood function). Given Definitions 2.2–3.1, define
$$\widehat{N}_k(t) := \sum_{a=0}^{A} \sum_{g\in\{\mathrm{f},\mathrm{m}\}} \widehat{N}_{a,g,k}(t), \quad k \in \{0, \dots, K\} \text{ and } t \in \mathcal{U},$$
as well as $\rho_{a,g,k}(t) := E_{a,g}(t)\,m_{a,g}(t)\,w_{a,g,k}(t)$ for all age groups $a$, with maximum age group $A$, and gender $g$, and
$$\rho_k(t) := \sum_{a=0}^{A} \sum_{g\in\{\mathrm{f},\mathrm{m}\}} \rho_{a,g,k}(t).$$
Then, the likelihood function $\ell(\widehat{N} \mid \theta_m, \theta_w, \sigma)$ of parameters $\theta_m := (\alpha, \beta, \zeta, \eta) \in E$, as well as $\theta_w := (u, v, \phi, \psi) \in F$ and $\sigma := (\sigma_k^2) \in [0,\infty)^K$, given data $\widehat{N} := (\widehat{N}_{a,g,k}(t)) \in \mathbb{N}_0^{(A+1)\times 2\times(K+1)\times T}$, is given by
$$\ell(\widehat{N} \mid \theta_m, \theta_w, \sigma) = \prod_{t=1}^{T} \Biggl( \biggl( \prod_{a=0}^{A} \prod_{g\in\{\mathrm{f},\mathrm{m}\}} \frac{e^{-\rho_{a,g,0}(t)}\,\rho_{a,g,0}(t)^{\widehat{N}_{a,g,0}(t)}}{\widehat{N}_{a,g,0}(t)!} \biggr) \times \prod_{k=1}^{K} \biggl( \frac{\Gamma\bigl(\sigma_k^{-2} + \widehat{N}_k(t)\bigr)}{\Gamma(\sigma_k^{-2})\,\sigma_k^{2\sigma_k^{-2}}\,\bigl(\sigma_k^{-2} + \rho_k(t)\bigr)^{\sigma_k^{-2} + \widehat{N}_k(t)}} \prod_{a=0}^{A} \prod_{g\in\{\mathrm{f},\mathrm{m}\}} \frac{\rho_{a,g,k}(t)^{\widehat{N}_{a,g,k}(t)}}{\widehat{N}_{a,g,k}(t)!} \biggr) \Biggr). \quad (3.3)$$
Proof. Following our assumptions, by straightforward computation we get
$$\ell(\widehat{N} \mid \theta_m, \theta_w, \sigma) = \prod_{t=1}^{T} \Biggl( \biggl( \prod_{a=0}^{A} \prod_{g\in\{\mathrm{f},\mathrm{m}\}} \frac{e^{-\rho_{a,g,0}(t)}\,\rho_{a,g,0}(t)^{\widehat{N}_{a,g,0}(t)}}{\widehat{N}_{a,g,0}(t)!} \biggr) \times \prod_{k=1}^{K} \mathbb{E}\biggl[ \mathbb{P}\biggl( \bigcap_{a=0}^{A} \bigcap_{g\in\{\mathrm{f},\mathrm{m}\}} \bigl\{ N_{a,g,k}(t) = \widehat{N}_{a,g,k}(t) \bigr\} \,\Big|\, \Lambda_k(t) \biggr) \biggr] \Biggr),$$
where $\ell(\widehat{N} \mid \theta_m, \theta_w, \sigma) = \mathbb{P}(N = \widehat{N} \mid \theta_m, \theta_w, \sigma)$ denotes the probability of the event $\{N = \widehat{N}\}$ given parameters. Taking expectations in the equation above yields
$$\mathbb{E}\biggl[ \mathbb{P}\biggl( \bigcap_{a=0}^{A} \bigcap_{g\in\{\mathrm{f},\mathrm{m}\}} \bigl\{ N_{a,g,k}(t) = \widehat{N}_{a,g,k}(t) \bigr\} \,\Big|\, \Lambda_k(t) \biggr) \biggr] = \biggl( \prod_{a=0}^{A} \prod_{g\in\{\mathrm{f},\mathrm{m}\}} \frac{\rho_{a,g,k}(t)^{\widehat{N}_{a,g,k}(t)}}{\widehat{N}_{a,g,k}(t)!} \biggr) \int_0^{\infty} e^{-\rho_k(t)x_t}\,x_t^{\widehat{N}_k(t)}\, \frac{x_t^{\sigma_k^{-2}-1}\,e^{-x_t\sigma_k^{-2}}}{\Gamma(\sigma_k^{-2})\,\sigma_k^{2\sigma_k^{-2}}}\,dx_t.$$
The integrand above is a density of a gamma distribution, modulo the normalisation constant, with parameters $\sigma_k^{-2} + \widehat{N}_k(t)$ and $\sigma_k^{-2} + \rho_k(t)$. Therefore, the corresponding integral equals the multiplicative inverse of the normalisation constant, i.e.,
$$\biggl( \frac{\bigl(\sigma_k^{-2} + \rho_k(t)\bigr)^{\sigma_k^{-2} + \widehat{N}_k(t)}}{\Gamma\bigl(\sigma_k^{-2} + \widehat{N}_k(t)\bigr)} \biggr)^{-1}, \quad k \in \{1, \dots, K\} \text{ and } t \in \{1, \dots, T\}.$$
Putting all results together gives (3.3). □
Since the products in (3.3) can become small, we recommend using the log-likelihood function instead. For implementations we recommend the log-gamma function, e.g., the lgamma function in 'R', see [R Core Team(2013)].
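A minimal 'R' sketch of the log-likelihood terms in (3.3), for a single year t and risk factor k, may look as follows; the vector arguments over all (a, g) cells are our own, purely illustrative, interface, and we assume all intensities are strictly positive.

```r
# Log of the Poisson factor for idiosyncratic deaths (k = 0) in year t
loglik_idio <- function(N0, rho0) sum(dpois(N0, rho0, log = TRUE))

# Log of the gamma-mixed Poisson factor in (3.3) for risk factor k in year t;
# 'N' and 'rho' hold N_{a,g,k}(t) and rho_{a,g,k}(t) > 0 over all (a, g) cells.
loglik_k <- function(N, rho, sigma2) {
  Nk <- sum(N); rhok <- sum(rho); inv <- 1 / sigma2
  lgamma(inv + Nk) - lgamma(inv) - inv * log(sigma2) -
    (inv + Nk) * log(inv + rhok) +
    sum(N * log(rho) - lfactorial(N))
}
# The full log-likelihood is the sum of these terms over t = 1, ..., T and
# k = 1, ..., K, plus the idiosyncratic contributions.
```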
Definition 3.4 (Maximum likelihood estimates). Recalling (3.3), as well as given the assumptions of Lemma 3.2, maximum likelihood estimates for parameters $\theta_m$, $\theta_w$ and $\sigma$ are defined by
$$\bigl(\hat\theta_m^{\mathrm{MLE}}, \hat\theta_w^{\mathrm{MLE}}, \hat\sigma^{\mathrm{MLE}}\bigr) := \operatorname*{arg\,sup}_{\theta_m,\theta_w,\sigma} \ell(\widehat{N} \mid \theta_m, \theta_w, \sigma) = \operatorname*{arg\,sup}_{\theta_m,\theta_w,\sigma} \log \ell(\widehat{N} \mid \theta_m, \theta_w, \sigma).$$

Deterministic optimisation of the likelihood function may quickly lead to numerical issues due to high dimensionality. In 'R' the deterministic optimisation routine nlminb, see [R Core Team(2013)], gives stable results in simple examples. Our proposed alternative is to switch to a Bayesian setting and use MCMC as described in Section 3.3.
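In simple, low-dimensional settings, such a deterministic optimisation in 'R' might be sketched as follows; negloglik, theta0 and the box constraints are placeholders for a concrete parametrisation and not fixed by the model.

```r
# Sketch: maximum likelihood via nlminb on the negative log-likelihood,
# where loglik is built from (3.3) as sketched above.
negloglik <- function(theta) -loglik(theta)
fit <- nlminb(start = theta0, objective = negloglik,
              lower = lower_bounds, upper = upper_bounds)
fit$par   # maximum likelihood estimates, if convergence is reached
```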
3.2. Estimation via a Maximum a Posteriori Approach.
Secondly, we propose a variation of maximum a posteriori estimation based on Bayesian inference, [Shevchenko(2011), section 2.9]. If risk factors are not integrated out in the likelihood function, we may also derive the posterior density of the risk factors as follows. One main advantage of this approach is that estimates for risk factors are obtained, which is very useful for scenario analysis and model validation. Furthermore, handy approximations for estimates of risk factor realisations and variances are obtained.
Lemma 3.5 (Posterior density). Given Definitions 2.2–3.1, consider parameters $\theta_m := (\alpha, \beta, \zeta, \eta) \in E$, $\theta_w := (u, v, \phi, \psi) \in F$, realisations $\lambda := (\lambda_k(t)) \in (0,\infty)^{K\times T}$ of risk factors $\Lambda := (\Lambda_k(t)) \in (0,\infty)^{K\times T}$, as well as data $\widehat{N} := (\widehat{N}_{a,g,k}(t)) \in \mathbb{N}_0^{(A+1)\times 2\times(K+1)\times T}$. Assume that the prior distribution of $(\theta_m, \theta_w, \sigma)$ is denoted by $\pi(\theta_m, \theta_w, \sigma)$. Then, the posterior density $\pi(\theta_m, \theta_w, \lambda, \sigma \mid \widehat{N})$ of parameters given data $\widehat{N}$ is, up to a constant, given by
$$\pi(\theta_m, \theta_w, \lambda, \sigma \mid \widehat{N}) \propto \pi(\theta_m, \theta_w, \sigma)\,\pi(\lambda \mid \theta_m, \theta_w, \sigma)\,\ell(\widehat{N} \mid \theta_m, \theta_w, \lambda, \sigma) = \prod_{t=1}^{T} \Biggl( \biggl( \prod_{a=0}^{A} \prod_{g\in\{\mathrm{f},\mathrm{m}\}} \frac{e^{-\rho_{a,g,0}(t)}\,\rho_{a,g,0}(t)^{\widehat{N}_{a,g,0}(t)}}{\widehat{N}_{a,g,0}(t)!} \biggr) \prod_{k=1}^{K} \biggl( \frac{e^{-\lambda_k(t)\sigma_k^{-2}}\,\lambda_k(t)^{\sigma_k^{-2}-1}}{\Gamma(\sigma_k^{-2})\,\sigma_k^{2\sigma_k^{-2}}} \times \prod_{a=0}^{A} \prod_{g\in\{\mathrm{f},\mathrm{m}\}} \frac{e^{-\rho_{a,g,k}(t)\lambda_k(t)}\,\bigl(\rho_{a,g,k}(t)\lambda_k(t)\bigr)^{\widehat{N}_{a,g,k}(t)}}{\widehat{N}_{a,g,k}(t)!} \biggr) \Biggr)\,\pi(\theta_m, \theta_w, \sigma), \quad (3.6)$$
where $\pi(\lambda \mid \theta_m, \theta_w, \sigma)$ denotes the prior density of the risk factors at $\Lambda = \lambda$ given all other parameters, where $\ell(\widehat{N} \mid \theta_m, \theta_w, \lambda, \sigma)$ denotes the likelihood of $N = \widehat{N}$ given all parameters, and where $\rho_{a,g,k}(t) = E_{a,g}(t)\,m_{a,g}(t)\,w_{a,g,k}(t)$.

Proof. The first proportional equality follows by Bayes' theorem, which is also widely used in Bayesian inference, see, for example, [Shevchenko(2011), section 2.9]. Moreover,
$$\pi(\lambda \mid \theta_m, \theta_w, \sigma) = \prod_{k=1}^{K} \prod_{t=1}^{T} \frac{e^{-\lambda_k(t)\sigma_k^{-2}}\,\lambda_k(t)^{\sigma_k^{-2}-1}}{\Gamma(\sigma_k^{-2})\,\sigma_k^{2\sigma_k^{-2}}}.$$
If $\theta_m \in E$, $\theta_w \in F$, $\lambda \in (0,\infty)^{K\times T}$ and $\sigma \in [0,\infty)^K$, then note that
$$\ell(\widehat{N} \mid \theta_m, \theta_w, \lambda, \sigma) = \prod_{a=0}^{A} \prod_{g\in\{\mathrm{f},\mathrm{m}\}} \prod_{t=1}^{T} \biggl( \frac{e^{-\rho_{a,g,0}(t)}\,\rho_{a,g,0}(t)^{\widehat{N}_{a,g,0}(t)}}{\widehat{N}_{a,g,0}(t)!} \times \prod_{k=1}^{K} \mathbb{P}\bigl( N_{a,g,k}(t) = \widehat{N}_{a,g,k}(t) \bigm| \Lambda_k(t) = \lambda_k(t) \bigr) \biggr),$$
which then gives (3.6) by straightforward computation as in Lemma 3.2. □

The approach described above may look like a pure Bayesian inference approach, but note that risk factors $\Lambda_k(t)$ are truly stochastic and, therefore, we refer to it as a maximum a posteriori estimation approach. There are many reasonable choices for prior distributions of parameters, ranging from (improper) uniform priors $\pi(\theta_m, \theta_w, \sigma) := \mathbf{1}_E(\theta_m)\,\mathbf{1}_F(\theta_w)\,\mathbf{1}_{(0,\infty)^K}(\sigma)$ to smoothing priors as given in Section 4.2. Having derived the posterior density, we can now define corresponding maximum a posteriori estimates.

Definition 3.7 (Maximum a posteriori estimates). Recalling (3.6), as well as given the assumptions of Lemma 3.5, maximum a posteriori estimates for parameters $\theta_m$, $\theta_w$, $\lambda$ and $\sigma$, given uniqueness, are defined by
$$\bigl(\hat\theta_m^{\mathrm{MAP}}, \hat\theta_w^{\mathrm{MAP}}, \hat\lambda^{\mathrm{MAP}}, \hat\sigma^{\mathrm{MAP}}\bigr) := \operatorname*{arg\,sup}_{\theta_m,\theta_w,\lambda,\sigma} \pi(\theta_m, \theta_w, \lambda, \sigma \mid \widehat{N}) = \operatorname*{arg\,sup}_{\theta_m,\theta_w,\lambda,\sigma} \log \pi(\theta_m, \theta_w, \lambda, \sigma \mid \widehat{N}).$$

Again, deterministic optimisation of the posterior function may quickly lead to numerical issues due to the high dimensionality of the posterior, which is why we recommend MCMC. However, we can provide handy approximations for risk factor and variance estimates.
Lemma 3.8 (Conditions for maximum a posteriori estimates). Given Definition 3.7, the estimates $\hat\lambda^{\mathrm{MAP}}$ and $\hat\sigma^{\mathrm{MAP}}$ satisfy, for every $k \in \{1, \dots, K\}$ and $t \in \mathcal{U}$,
$$\hat\lambda_k^{\mathrm{MAP}}(t) = \frac{(\hat\sigma_k^{\mathrm{MAP}})^{-2} - 1 + \sum_{a=0}^{A} \sum_{g\in\{\mathrm{f},\mathrm{m}\}} \widehat{N}_{a,g,k}(t)}{(\hat\sigma_k^{\mathrm{MAP}})^{-2} + \sum_{a=0}^{A} \sum_{g\in\{\mathrm{f},\mathrm{m}\}} \rho_{a,g,k}(t)} \quad (3.9)$$
if $(\hat\sigma_k^{\mathrm{MAP}})^{-2} - 1 + \sum_{a=0}^{A} \sum_{g\in\{\mathrm{f},\mathrm{m}\}} \widehat{N}_{a,g,k}(t) > 0$, as well as
$$\log \hat\sigma_k^{2,\mathrm{MAP}} + \frac{\Gamma'\bigl((\hat\sigma_k^{\mathrm{MAP}})^{-2}\bigr)}{\Gamma\bigl((\hat\sigma_k^{\mathrm{MAP}})^{-2}\bigr)} = \frac{1}{T} \sum_{t=1}^{T} \bigl( 1 + \log \hat\lambda_k^{\mathrm{MAP}}(t) - \hat\lambda_k^{\mathrm{MAP}}(t) \bigr), \quad (3.10)$$
where, for given $\hat\lambda_k^{\mathrm{MAP}}(1), \dots, \hat\lambda_k^{\mathrm{MAP}}(T) > 0$, (3.10) has a unique solution which is strictly positive.

Proof. First, set $\pi^*(\widehat{N}) := \log \pi(\theta_m, \theta_w, \lambda, \sigma \mid \widehat{N})$. Then, for every $k \in \{1, \dots, K\}$ and $t \in \mathcal{U}$, differentiating $\pi^*(\widehat{N})$ gives
$$\frac{\partial \pi^*(\widehat{N})}{\partial \lambda_k(t)} = \frac{\sigma_k^{-2} - 1}{\lambda_k(t)} - \sigma_k^{-2} + \sum_{a=0}^{A} \sum_{g\in\{\mathrm{f},\mathrm{m}\}} \biggl( \frac{\widehat{N}_{a,g,k}(t)}{\lambda_k(t)} - \rho_{a,g,k}(t) \biggr).$$
Setting this term equal to zero and solving for $\lambda_k(t)$ gives (3.9). Similarly, for every $k \in \{1, \dots, K\}$, we obtain
$$\frac{\partial \pi^*(\widehat{N})}{\partial \sigma_k^2} = \frac{1}{\sigma_k^4} \sum_{t=1}^{T} \biggl( \log \sigma_k^2 - 1 + \frac{\Gamma'(\sigma_k^{-2})}{\Gamma(\sigma_k^{-2})} - \log \lambda_k(t) + \lambda_k(t) \biggr).$$
Again, setting this term equal to zero and rearranging the terms gives (3.10).

For existence and uniqueness of the solution in (3.10), let $\hat\lambda_k^{\mathrm{MAP}}(1), \dots, \hat\lambda_k^{\mathrm{MAP}}(T) > 0$ for $k \in \{1, \dots, K\}$ be fixed. Then, note that the right side in (3.10) is strictly negative unless $\hat\lambda_k^{\mathrm{MAP}}(1) = \cdots = \hat\lambda_k^{\mathrm{MAP}}(T) = 1$, as $\log x \le x - 1$ for all $x > 0$ with equality for $x = 1$. If $\hat\lambda_k^{\mathrm{MAP}}(1) = \cdots = \hat\lambda_k^{\mathrm{MAP}}(T) = 1$, then there is no variability in the risk factor such that $\sigma_k^2 = 0$. Henceforth, note that $f(x) := \log x - \Gamma'(x)/\Gamma(x)$, for all $x > 0$, is continuous ($\Gamma'(x)/\Gamma(x)$ is known as the digamma function or $\psi$-function) with
$$\frac{1}{2x} < f(x) < \frac{1}{2x} + \frac{1}{12x^2}, \quad x > 0, \quad (3.11)$$
which follows by [Qi et al.(2005), Corollary 1] and the recurrence $\psi(x+1) = 1/x + \psi(x)$ for all $x > 0$. To solve $-f(1/x) = -c$ for some given $c > 0$, note that $f(0+) = \infty$, as well as $\lim_{x\to\infty} f(x) = 0$. Thus, a solution has to exist as $f(1/x)$ is continuous in $x > 0$. Furthermore,
$$f'(x) = \frac{1}{x} - \sum_{i=0}^{\infty} \frac{1}{(x+i)^2} < \frac{1}{x} - \int_x^{\infty} \frac{1}{z^2}\,dz = 0, \quad x > 0,$$
where the first equality follows by [Chaudhry and Zubair(2001)]. This implies that $f(x)$ and $-f(1/x)$ are strictly decreasing. Thus, the solution in (3.10) is unique. □

Using Lemma 3.8, it is possible to derive handy approximations for risk factor and variance estimates, given estimates for weights and death probabilities, which can be derived by matching of moments as given in Section 3.4 or other models such as Lee–Carter. If $\sum_{a=0}^{A} \sum_{g\in\{\mathrm{f},\mathrm{m}\}} \widehat{N}_{a,g,k}(t)$ is large, it is reasonable to define
$$\hat\lambda_k^{\mathrm{MAPappr}}(t) := \frac{\sum_{a=0}^{A} \sum_{g\in\{\mathrm{f},\mathrm{m}\}} \widehat{N}_{a,g,k}(t)}{\sum_{a=0}^{A} \sum_{g\in\{\mathrm{f},\mathrm{m}\}} \hat\rho_{a,g,k}(t)} \quad (3.12)$$
as an approximative estimate for $\lambda_k(t)$, where $\hat\rho_{a,g,k}(t) := \hat E_{a,g}(t)\,\hat m_{a,g}(t)\,\hat w_{a,g,k}(t)$. Having derived approximations for $\lambda$, we can use (3.10) to get estimates for $\sigma$. Alternatively, note that due to (3.11), we get
$$-\log \hat\sigma_k^{2,\mathrm{MAP}} - \frac{\Gamma'\bigl((\hat\sigma_k^{\mathrm{MAP}})^{-2}\bigr)}{\Gamma\bigl((\hat\sigma_k^{\mathrm{MAP}})^{-2}\bigr)} = \frac{(\hat\sigma_k^{\mathrm{MAP}})^2}{2} + O\bigl((\hat\sigma_k^{\mathrm{MAP}})^4\bigr).$$
Furthermore, if we use a second order Taylor expansion for the logarithm, then minus the right hand side of (3.10) becomes
$$\frac{1}{T} \sum_{t=1}^{T} \bigl( \hat\lambda_k^{\mathrm{MAP}}(t) - 1 - \log \hat\lambda_k^{\mathrm{MAP}}(t) \bigr) = \frac{1}{2T} \sum_{t=1}^{T} \Bigl( \bigl( \hat\lambda_k^{\mathrm{MAP}}(t) - 1 \bigr)^2 + O\bigl( (\hat\lambda_k^{\mathrm{MAP}}(t) - 1)^3 \bigr) \Bigr).$$
This approximation is better the closer the values of $\lambda$ are to one. Thus, using these observations, an approximation for the risk factor variances $\sigma^2$ is given by
$$\bigl(\hat\sigma_k^{\mathrm{MAPappr}}\bigr)^2 := \frac{1}{T} \sum_{t=1}^{T} \bigl( \hat\lambda_k^{\mathrm{MAPappr}}(t) - 1 \bigr)^2, \quad (3.13)$$
which is the sample variance of $\hat\lambda^{\mathrm{MAPappr}}$. Note that $|\hat\lambda_k^{\mathrm{MAP}}(t) - 1| < |\hat\lambda_k^{\mathrm{MAPappr}}(t) - 1|$, implying that (3.13) will dominate solutions obtained by (3.10) in most cases.
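In 'R', Approximations (3.12) and (3.13), together with a numerical solution of (3.10) via the built-in digamma function, might be sketched as follows; N_kt and rho_kt are T-vectors of death counts and intensities summed over all ages and genders, an interface we introduce here purely for illustration.

```r
# Approximations (3.12) and (3.13)
lambda_appr <- function(N_kt, rho_kt) N_kt / rho_kt
sigma2_appr <- function(lambda) mean((lambda - 1)^2)

# Numerical solution of (3.10): log(s2) + digamma(1/s2) = mean(1 + log(l) - l).
# uniroot assumes a sign change on the interval; enlarge 'upper' if needed.
sigma2_map <- function(lambda, upper = 10) {
  c_k <- mean(lambda - 1 - log(lambda))   # nonnegative, zero iff all l equal 1
  f <- function(s2) log(s2) + digamma(1 / s2) + c_k
  uniroot(f, lower = 1e-12, upper = upper)$root
}
```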
3.3. Estimation via MCMC.

As we have already outlined in the previous sections, deriving maximum a posteriori estimates and maximum likelihood estimates via deterministic numerical optimisation is mostly impossible due to high dimensionality (several hundred parameters). Alternatively, we can use MCMC under a Bayesian setting. Introductions to this topic can be found, for example,
in [Gilks(1995)], [Gamerman and Lopes(2006)], as well as [Shevchenko(2011), section 2.11]. We suggest the use of the random walk Metropolis–Hastings within Gibbs algorithm which, given that the Markov chain is irreducible and aperiodic, generates sample chains that converge to the stationary distribution, see [Tierney(1994)] and also [Robert and Casella(2004), sections 6–10]. However, note that various other MCMC algorithms are available.

MCMC requires a Bayesian setting, which we automatically have in the maximum a posteriori approach, see Section 3.2. Similarly, we can switch to a Bayesian setting in the maximum likelihood approach, see Section 3.1, by simply multiplying the likelihood function with a prior distribution of parameters. MCMC generates Markov chains which provide samples from the posterior distribution where the mode of these samples then corresponds to an approximation of the maximum a posteriori estimate. More stable estimates in terms of mean squared error are obtained by taking the mean over all samples once the MCMC chains sample from the stationary distribution, [Shevchenko(2011), section 2.10]. Taking the mean over all samples as an estimate can, of course, lead to trouble if posterior distributions of parameters are, e.g., bimodal, such that we end up in a region which is highly unlikely. Furthermore, the sampled posterior distribution can be used to estimate parameter uncertainty. The method requires a certain burn-in period until the generated chain becomes stationary. Typically, one tries to get average acceptance probabilities close to 0.234,
which is asymptotically optimal for multivariate Gaussian proposals as shown in [Roberts et al.(1997)]. To reduce long computational times, one can run several independent MCMC chains with different starting points on different CPUs in a parallel way. To prevent overfitting, it is possible to regularise, i.e., smooth, maximum a posteriori estimates via adjusting the prior distribution. This technique is particularly used in regression, as well as in many applications such as signal processing. When forecasting death probabilities in Section 4.2, we use a Gaussian prior density with a certain correlation structure.

Also under MCMC, note that ultimately we are troubled with the curse of dimensionality as we will never be able to get an accurate approximation of the joint posterior distribution in a setting with several hundred parameters.

As MCMC samples yield confidence bands for parameter estimates, they can easily be checked for significance at every desired level, i.e., parameters are not significant if confidence bands cover the value zero. In our subsequent examples, almost all parameters are significant. Given internal mortality data, these confidence bands for parameter estimates can also be used to test whether parameters significantly deviate from officially published life tables. On the other hand, MCMC is perfectly applicable to sparse data, as life tables can be used as prior distributions with confidence bands providing an estimate for parameter uncertainty which increases with fewer data points.
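For concreteness, a bare-bones random walk Metropolis–Hastings within Gibbs sampler in 'R' might be sketched as follows, with logpost the logarithm of the posterior (3.6) and scales the component-wise proposal standard deviations (the tuning parameters mentioned above); this is a simplified illustration rather than our production implementation.

```r
mh_within_gibbs <- function(logpost, theta0, scales, n_iter = 40000) {
  theta <- theta0
  lp    <- logpost(theta)
  chain <- matrix(NA_real_, nrow = n_iter, ncol = length(theta0))
  for (i in seq_len(n_iter)) {
    for (j in seq_along(theta)) {            # update one coordinate at a time
      prop    <- theta
      prop[j] <- theta[j] + rnorm(1, 0, scales[j])
      lp_prop <- logpost(prop)
      if (is.finite(lp_prop) && log(runif(1)) < lp_prop - lp) {
        theta <- prop                        # accept with prob min(1, ratio)
        lp    <- lp_prop
      }
    }
    chain[i, ] <- theta
  }
  chain    # discard the burn-in period before computing estimates
}
```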
3.4. Estimation via Matching of Moments.
Finally, we provide a matching of moments approach which allows easier estimation of parameters but which is less accurate. Therefore, we suggest this approach solely for obtaining starting values for the other, more sophisticated estimation procedures. In addition, the matching of moments approach needs simplifying assumptions to guarantee independent and identically distributed random variables over time.
Assumption 3.14 (i.i.d. setting). Given Definitions 2.2–3.1, assume death counts $(N_{a,g,k}(t))_{t\in\mathcal{U}}$ to be i.i.d. with $E_{a,g} := E_{a,g}(1) = \cdots = E_{a,g}(T)$ and $m_{a,g} := m_{a,g}(1) = \cdots = m_{a,g}(T)$, as well as $w_{a,g,k} := w_{a,g,k}(1) = \cdots = w_{a,g,k}(T)$.

To achieve such an i.i.d. setting, transform deaths $N_{a,g,k}(t)$ such that Poisson mixture intensities are constant over time via
$$N'_{a,g,k}(t) := \biggl\lfloor \frac{E_{a,g}(T)\,m_{a,g}(T)\,w_{a,g,k}(T)}{E_{a,g}(t)\,m_{a,g}(t)\,w_{a,g,k}(t)}\,N_{a,g,k}(t) \biggr\rfloor, \quad t \in \mathcal{U},$$
and, correspondingly, define $E_{a,g} := E_{a,g}(T)$, as well as $m_{a,g} := m_{a,g}(T)$ and $w_{a,g,k} := w_{a,g,k}(T)$. Using this modification, we manage to remove long term trends and keep $E_{a,g}(t)$, $m_{a,g}(t)$ and $w_{a,g,k}(t)$ constant over time.

Estimates $\hat m^{\mathrm{MM}}_{a,g}(t)$ for central death rates $m_{a,g}(t)$ can be obtained via minimising the mean squared error to crude death rates which, if parameters $\zeta$, $\eta$ and $\gamma$ are previously fixed, can be obtained by regressing
$$\bigl(F^{\mathrm{Lap}}\bigr)^{-1}\biggl( \frac{\sum_{k=0}^{K} \widehat{N}'_{a,g,k}(t)}{E_{a,g}} \biggr) - \gamma_{t-a}$$
on $T_{\zeta_{a,g},\eta_{a,g}}(t)$. Estimates $\hat u^{\mathrm{MM}}_{a,g,k}$, $\hat v^{\mathrm{MM}}_{a,g,k}$, $\hat\phi^{\mathrm{MM}}_k$, $\hat\psi^{\mathrm{MM}}_k$ for parameters $u_{a,g,k}$, $v_{a,g,k}$, $\phi_k$, $\psi_k$ are obtained via minimising the mean squared error to crude death rates which again, if parameters $\phi$ and $\psi$ are previously fixed, can be obtained by regressing $\log(\widehat{N}'_{a,g,k}(t)) - \log(E_{a,g}\,\hat m^{\mathrm{MM}}_{a,g}(t))$ on $T_{\phi_k,\psi_k}(t)$. Estimates $\hat w^{\mathrm{MM}}_{a,g,k}(t)$ are then given by (2.9).

Then, define unbiased estimators for weights $W^*_{a,g,k}(t) := N'_{a,g,k}(t)/(E_{a,g}\,m_{a,g})$, as well as
$$\overline{W}{}^*_{a,g,k} := \frac{1}{T} \sum_{t=1}^{T} W^*_{a,g,k}(t).$$
In particular, we have $\mathbb{E}[\overline{W}{}^*_{a,g,k}] = \mathbb{E}[W^*_{a,g,k}(t)] = w_{a,g,k}$.
Lemma 3.15. Given Assumption 3.14, define
$$\widehat{\Sigma}^2_{a,g,k} := \frac{1}{T-1} \sum_{t=1}^{T} \bigl( W^*_{a,g,k}(t) - \overline{W}{}^*_{a,g,k} \bigr)^2,$$
for all $a \in \{0, \dots, A\}$, $g \in \{\mathrm{f}, \mathrm{m}\}$ and $k \in \{0, \dots, K\}$. Then,
$$\mathbb{E}[\widehat{\Sigma}^2_{a,g,k}] = \mathrm{Var}\bigl( W^*_{a,g,k}(1) \bigr) = \frac{w_{a,g,k}}{E_{a,g}\,m_{a,g}} + \sigma_k^2\,w_{a,g,k}^2. \quad (3.16)$$
Proof. Note that $(W^*_{a,g,k}(t))_{t\in\mathcal{U}}$ is assumed to be an i.i.d. sequence. Thus, since $\widehat{\Sigma}^2_{a,g,k}$ is an unbiased estimator for the variance of $W^*_{a,g,k}(1)$, see [Lehmann and Romano(2005), Example 11.2.6], we immediately get
$$\mathbb{E}[\widehat{\Sigma}^2_{a,g,k}] = \mathrm{Var}\bigl(W^*_{a,g,k}(1)\bigr) = \mathrm{Var}\biggl( \frac{N'_{a,g,k}(1)}{E_{a,g}\,m_{a,g}} \biggr).$$
Using the law of total variance as in [Schmock(2017), Lemma 3.48], as well as Definition 5.6, gives
$$E^2_{a,g}\,m^2_{a,g}\,\mathbb{E}[\widehat{\Sigma}^2_{a,g,k}] = \mathbb{E}\bigl[\mathrm{Var}\bigl(N'_{a,g,k}(1) \bigm| \Lambda_k\bigr)\bigr] + \mathrm{Var}\bigl(\mathbb{E}\bigl[N'_{a,g,k}(1) \bigm| \Lambda_k\bigr]\bigr).$$
Since $\mathrm{Var}(N'_{a,g,k}(1) \mid \Lambda_k) = \mathbb{E}[N'_{a,g,k}(1) \mid \Lambda_k] = E_{a,g}\,m_{a,g}\,w_{a,g,k}\,\Lambda_k$ a.s., the equation above gives the result. □

Having obtained Equation (3.16), we may define the following matching of moments estimates for risk factor variances.
Definition 3.17 (Matching of moments estimates for risk factor variances). Given Assumption 3.14, the matching of moments estimate of $\sigma_k^2$ for all $k \in \{1, \dots, K\}$ is defined as
$$\bigl(\hat\sigma_k^{\mathrm{MM}}\bigr)^2 := \max\Biggl\{ 0,\; \frac{\sum_{a=0}^{A} \sum_{g\in\{\mathrm{f},\mathrm{m}\}} \Bigl( \hat\sigma^2_{a,g,k} - \frac{\hat w^{\mathrm{MM}}_{a,g,k}(T)}{E_{a,g}\,\hat m^{\mathrm{MM}}_{a,g}(T)} \Bigr)}{\sum_{a=0}^{A} \sum_{g\in\{\mathrm{f},\mathrm{m}\}} \bigl( \hat w^{\mathrm{MM}}_{a,g,k}(T) \bigr)^2} \Biggr\},$$
where $\hat\sigma^2_{a,g,k}$ is the estimate corresponding to the estimator $\widehat{\Sigma}^2_{a,g,k}$.
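As an illustrative 'R' sketch, the estimate of Definition 3.17 for a single risk factor k can be computed from a T x C matrix W of the values W*_{a,g,k}(t) over years and (a, g) cells, together with C-vectors of weights, exposures and central death rates; this interface is ours, introduced only for illustration.

```r
# Matching of moments estimate (Definition 3.17) for one risk factor k.
sigma2_mm <- function(W, w_T, E, m) {
  S2 <- apply(W, 2, var)                       # unbiased sample variances, (3.16)
  max(0, sum(S2 - w_T / (E * m)) / sum(w_T^2)) # solve (3.16) for sigma_k^2
}
```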
4. Applications

4.1. Prediction of Underlying Death Causes.
As an applied example for our proposed stochastic mortality model, as well as for some further applications, we take annual death data from Australia for the period 1987 to 2011. We fit our model using the matching of moments approach, as well as the maximum-likelihood approach with Markov chain Monte Carlo (MCMC). The data source for the historical Australian population, categorised by age and gender, is the Australian Bureau of Statistics, and data for the number of deaths categorised by death cause and divided into eight age categories, i.e., 50–54 years, 55–59 years, 60–64 years, 65–69 years, 70–74 years, 75–79 years, 80–84 years and 85+ years, denoted by $a_1, \dots, a_8$, respectively, for each gender are taken from the AIHW. The provided death data are divided into 19 different death causes (based on the ICD-9 or ICD-10 classification) where we identify the following ten of them with common non-idiosyncratic risk factors: 'certain infectious and parasitic diseases', 'neoplasms', 'endocrine, nutritional and metabolic diseases', 'mental and behavioural disorders', 'diseases of the nervous system', 'circulatory diseases', 'diseases of the respiratory system', 'diseases of the digestive system', 'external causes of injury and poisoning', 'diseases of the genitourinary system'. We merge the remaining eight death causes to idiosyncratic risk as their individual contributions to overall death counts are small for all categories. Data handling needs some care as there was a change in classification of death data in 1997, as explained at the website of the Australian Bureau of Statistics. Australia introduced the tenth revision of the International Classification of Diseases (ICD-10, following ICD-9) in 1997, with a transition period from 1997 to 1998. For this transition, comparability factors are given in Table 4.1. Thus, for the period 1987 to 1996, death counts have to be multiplied by the corresponding comparability factors.

To reduce the number of parameters which have to be estimated, cohort effects are not considered, i.e., $\gamma = 0$, and trend reduction parameters are fixed with values $\zeta = \phi = 0$ and $\eta = \psi$ fixed at a small positive value. This corresponds to slow trend reduction over the data and forecasting period (no acceleration), which makes the setting similar to the Lee–Carter model. Moreover, we choose the arbitrary normalisation $t_0 = 1987$. Results for a more advanced modelling of trend reduction are shown later in Section 4.2. Thus, within the maximum-likelihood framework, we end up with 394 parameters, of which 362 are to be optimised. For matching of moments we follow
the approach given in Section 3.4. Risk factor variances are then estimated via Approximations (3.12) and (3.13) of the maximum a posteriori approach as they give more reliable results than matching of moments.

Table 4.1. Comparability factors for ICD-9 to ICD-10.

death cause            factor
infectious             1.25
neoplasms              1.00
endocrine              1.01
mental                 0.78
nervous                1.20
circulatory            1.00
respiratory            0.91
digestive              1.05
genitourinary          1.14
external               1.06
not elsewhere (idio.)  1.00

Based on 40,000 MCMC steps with a burn-in period of 10,000 we are able to derive estimates of all parameters, where starting values are taken from matching of moments, as well as (3.12) and (3.13). Tuning parameters are frequently re-evaluated in the burn-in period. The execution time of our algorithm is roughly seven hours on a standard computer in 'R'. Running several parallel MCMC chains reduces execution times to several minutes. However, note that a reduction in risk factors (e.g., one or zero risk factors for mortality modelling) makes estimation much quicker.

As an illustration, Figure 4.1 shows MCMC chains for the variance $\sigma^2$ of the risk factor for external causes of injury and poisoning, as well as for the parameter $\alpha_{a_2,\mathrm{f}}$, the death probability intercept of females aged 55 to 59 years. We observe in Figure 4.1 that stationary distributions of MCMC chains for risk factor variances are typically right skewed. This indicates risk which is associated with underestimating variances due to limited observations of tail events.

Table 4.2 shows estimates for risk factor standard deviations using matching of moments and Approximation (3.13), as well as mean estimates of MCMC with corresponding 5% and 95% quantiles and standard errors. First, Table 4.2 illustrates that (3.12) and (3.13), as well as matching of moments estimates for risk factor standard deviations $\sigma$, are close to mean MCMC estimates. Risk factor standard deviations are small but tend to be higher for death causes with just few deaths, as statistical fluctuations in the data are higher compared to more frequent death causes. Solely the estimates for the risk factor standard deviation of mental and behavioural disorders give higher values. Standard errors, as defined in [Shevchenko(2011), section 2.12.2] with block size 50, for corresponding risk factor variances are consistently less than 3%. We can use the approximation given in (3.9) to derive risk factor estimates over previous years. For example, we observe increased risk factor realisations for diseases of the respiratory system over the years 2002 to 2004. This is mainly driven by many deaths due to influenza and pneumonia during that period.
MCMC chains and corresponding density histogramsfor the variance of risk factor for deaths due to external causesof injury and poisoning σ in subfigure (a) and for the deathprobability intercept parameter of females aged 55 to 59 years α , f in subfigure (b) . Table 4.2.
Table 4.2. Estimates for risk factor standard deviations $\sigma$ using matching of moments (MM), Approximation (3.13) (appr.) and MCMC mean estimates (mean), as well as corresponding standard deviations (stdev.) and five and 95 percent quantiles (5% and 95%).

                MM      appr.   mean    5%      95%     stdev.
infectious      0.1932  0.0787  0.0812  0.0583  0.1063  0.0147
neoplasms       0.0198  0.0148  0.0173  0.0100  0.0200  0.0029
endocrine       0.0743  0.0340  0.0346  0.0245  0.0469  0.0068
mental          0.1502  0.1357  0.1591  0.1200  0.2052  0.0265
nervous         0.0756  0.0505  0.0557  0.0412  0.0728  0.0098
circulatory     0.0377  0.0243  0.0300  0.0224  0.0387  0.0053
respiratory     0.0712  0.0612  0.0670  0.0510  0.0866  0.0110
digestive       0.0921  0.0645  0.0728  0.0548  0.0943  0.0123
external        0.1044  0.0912  0.1049  0.0787  0.1353  0.0176
genitourinary   0.0535  0.0284  0.0245  0.0141  0.0346  0.0066
Equation (2.9) provides a joint forecast of all death cause intensities, i.e., weights, simultaneously, in contrast to standard procedures where projections are made for each death cause separately. Throughout the past decades we have observed drastic shifts in crude death rates due to certain death causes. This fact can be illustrated by our model as shown in Table 4.3. This table lists weights $w_{a,g,k}(t)$ for all death causes estimated for 2011, as well as forecasted for 2031 using (2.9) with MCMC mean estimates for males and females aged between 80 and 84 years. Model forecasts suggest that if these trends in weight changes persist, then the future gives a whole new picture of mortality. First, deaths due to circulatory diseases are expected to decrease whilst neoplasms will become the leading death cause over most age categories. Moreover, deaths due to mental and behavioural disorders are expected to rise considerably for older ages. High uncertainty in forecasted weights is reflected by wide confidence intervals (values in brackets) for the risk factor of mental and behavioural disorders. These confidence intervals are derived from corresponding MCMC chains and, therefore, solely reflect uncertainty associated with parameter estimation. Note that results for estimated trends depend on the length of the data period as short-term trends might not coincide with mid- to long-term trends. Further results can be found in [Shevchenko et al.(2015)].
Table 4.3. Estimated weights for all death causes in years 2011, 2021 and 2031 using (2.9) with MCMC mean estimates for ages 60 to 64 years (left) and 80 to 84 years (right) for both genders. Five and 95 percent quantiles for the year 2031 are given in brackets.
                        60 to 64 years                          80 to 84 years
                        2011   2021   2031   (quant.)           (quant.)
male
neoplasms               0.499  0.531  0.547  (…)                (…)
circulatory             0.228  0.165  0.116  (…)                (…)
external                0.056  0.060  0.062  (…)                (…)
respiratory             0.051  0.043  0.036  (…)                (…)
endocrine               0.044  0.053  0.062  (…)                (…)
digestive               0.041  0.039  0.036  (…)                (…)
nervous                 0.029  0.040  0.052  (…)                (…)
not elsewhere (idio.)   0.018  0.023  0.028  (…)                (…)
infectious              0.014  0.019  0.025  (…)                (…)
mental                  0.013  0.019  0.027  (…)                (…)
genitourinary           0.008  0.008  0.008  (…)                (…)
female
neoplasms               0.592  0.628  0.648  (…)                (…)
circulatory             0.140  0.092  0.060  (…)                (…)
respiratory             0.072  0.071  0.069  (…)                (…)
endocrine               0.038  0.038  0.037  (…)                (…)
nervous                 0.036  0.043  0.051  (…)                (…)
external                0.035  0.033  0.032  (…)                (…)
digestive               0.031  0.028  0.024  (…)                (…)
not elsewhere (idio.)   0.022  0.023  0.023  (…)                (…)
infectious              0.014  0.017  0.020  (…)                (…)
mental                  0.012  0.019  0.032  (…)                (…)
genitourinary           0.009  0.007  0.005  (…)                (…)
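A forecast of weights as in Table 4.3 boils down to evaluating (2.9) at future times t. A minimal 'R' sketch for one age/gender cell, reusing T_norm from the sketch in Section 2.3, could read as follows; the vectorised interface is our own.

```r
# Weights (2.9) at time t for one (a, g) cell: 'u', 'v', 'phi', 'psi' are
# (K+1)-vectors of parameters for causes k = 0, ..., K.
weights_t <- function(t, u, v, phi, psi, t0) {
  z <- u + v * mapply(function(p, s) T_norm(t, p, s, t0), phi, psi)
  exp(z) / sum(exp(z))   # softmax: weights are positive and sum to one
}
```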
Forecasting death probabilities and cen-tral death rates within our proposed model is straight forward using (2.8). In the
special case with just idiosyncratic risk, i.e., $K = 0$, death indicators can be assumed to be Bernoulli distributed instead of Poisson distributed, in which case we may write the likelihood function in the form
$$\ell_B(\widehat{N} \mid \alpha, \beta, \zeta, \eta, \gamma) = \prod_{t=1}^{T} \prod_{a=0}^{A} \prod_{g\in\{\mathrm{f},\mathrm{m}\}} \binom{E_{a,g}(t)}{\widehat{N}_{a,g,0}(t)}\, m_{a,g}(t)^{\widehat{N}_{a,g,0}(t)}\, \bigl(1 - m_{a,g}(t)\bigr)^{E_{a,g}(t) - \widehat{N}_{a,g,0}(t)},$$
with $0 \le \widehat{N}_{a,g,0}(t) \le E_{a,g}(t)$. Due to possible overfitting, derived estimates may not be sufficiently smooth across age categories $a \in \{0, \dots, A\}$. Therefore, if we switch to a Bayesian setting, we may use regularisation via prior distributions to obtain stabler results. To guarantee smooth results and a sufficient stochastic foundation, we suggest the usage of Gaussian priors with mean zero and a specific correlation structure, i.e., $\pi(\alpha, \beta, \zeta, \eta, \gamma) = \pi(\alpha)\,\pi(\beta)\,\pi(\zeta)\,\pi(\eta)\,\pi(\gamma)$ with
$$\log \pi(\alpha) := -c_\alpha \sum_{g\in\{\mathrm{f},\mathrm{m}\}} \biggl( \sum_{a=0}^{A-1} (\alpha_{a,g} - \alpha_{a+1,g})^2 + \varepsilon_\alpha \sum_{a=0}^{A} \alpha_{a,g}^2 \biggr) + \log(d_\alpha), \quad (4.1)$$
$c_\alpha, d_\alpha, \varepsilon_\alpha > 0$, and correspondingly for $\beta$, $\zeta$, $\eta$ and $\gamma$.
Parameter $c_\alpha$ (correspondingly for $\beta$, $\zeta$, $\eta$ and $\gamma$) is a scaling parameter and directly associated with the variance of the Gaussian prior, while the normalisation parameter $d_\alpha$ guarantees that $\pi(\alpha)$ is a proper Gaussian density. The penalty parameter $\varepsilon_\alpha$ scales the correlation amongst neighbour parameters in the sense that the lower it gets, the higher the correlation. The more we increase $c_\alpha$, the stronger the influence of, or the belief in, the prior distribution. This particular prior density penalises deviations from the ordinate, which is a mild conceptual shortcoming as this does not accurately reflect our prior beliefs. Setting $\varepsilon_\alpha = 0$ gives an improper prior with uniformly distributed (on $\mathbb{R}$) marginals such that we gain that there is no prior belief in expectations of parameters but, simultaneously, lose the presence of variance-covariance matrices and asymptotically get perfect positive correlation across parameters of different ages. Still, whilst lacking theoretical properties, better fits to data are obtained by setting $\varepsilon_\alpha = 0$. For example, setting $\varepsilon_\alpha = \varepsilon_\beta$ and $\varepsilon_\zeta = \varepsilon_\eta = \varepsilon_\gamma$ to suitable small powers of ten yields a prior correlation structure which decreases with higher age differences and which is always positive, as given in subfigure (a) of Figure 4.2. There exist many other reasonable choices for Gaussian prior densities.
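For illustration, the smoothing log-prior (4.1), with the optional higher-order difference penalties discussed below, can be sketched in 'R' as follows, for one gender and up to the additive constant log(d):

```r
# Gaussian smoothing log-prior as in (4.1), up to its normalisation constant:
# k = 1 penalises first differences (deviations from the ordinate), k = 2
# deviations from a straight line, k = 3 deviations from a parabola.
log_prior <- function(par, c_scale, eps, k = 1) {
  -c_scale * (sum(diff(par, differences = k)^2) + eps * sum(par^2))
}
```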
Correlation structure of Gaussian priors with penal-isation for deviation from ordinate with ε = 1 /
100 in subfigure (a) , straight line with ε = 1 / (b) , and parabola ε = 1 / (c) . Gaussian prior densities. For example, replacing graduation terms ( α a,g − α a +1 ,g ) in (4.1) by higher order differences of the form (cid:0) (cid:80) kν =0 ( − ν (cid:0) kν (cid:1) α a,g + ν (cid:1) yields apenalisation for deviations from a straight line with k = 2, see subfigure (b) inFigure 4.2, or from a parabola with k = 3, see subfigure (c) in Figure 4.2. Theusage of higher order differences for graduation of statistical estimates goes back tothe Whittaker–Henderson method. Taking k = 2 , ε α = ε β = ε ζ = ε η = ε γ = 0 as this yields accurateresults whilst still being reasonably smooth.An optimal choice of regularisation parameters c α , c β , c ζ , c η and c γ can be obtainedby cross-validation.Results for Australian data from 1971 to 2013 with t = 2013 are given inFigure 4.3. Using MCMC we derive estimates for logarithmic central death rateslog m a,g ( t ) with corresponding forecasts, mortality trends β a,g , as well as trendreduction parameters ζ a,g , η a,g and cohort effects γ a − t . As we do not assume commonstochastic risk factors, the MCMC algorithm we use can be implemented veryefficiently such that 40 000 samples from the posterior distribution of all parametersare derived within a minute. We observe negligible parameter uncertainty due to along period of data. Further, regularisation parameters obtained by cross-validationare given by c α = 500, c β = c η = 30 , c α , c ζ = c α /
20 and c γ = 1000 c α . Wecan draw some immediate conclusions. Firstly, we see an overall improvementin mortality over all ages where the trend is particularly strong for young agesand ages between 60 and 80 whereas the trend vanishes towards the age of 100,maybe implying a natural barrier for life expectancy. Due to sparse data the latterconclusion should be treated with the utmost caution. Furthermore, we see theclassical hump of increased mortality driven by accidents around the age of 20 whichis more developed for males.Secondly, estimates for ζ a,g suggest that trend acceleration switched to trendreduction throughout the past 10 to 30 years for males while for females this transitionalready took place 45 years ago. However, note that parameter uncertainty (underMCMC) associated with ζ a,g is high, particularly if estimates are not regularised.Estimates for η a,g show that the speed of trend reduction is much stronger for malesthan for females. Estimates for γ a − t show that the cohort effect is particularlystrong (in the sense of increased mortality) for the generation born between 1915and 1930 (probably associated with World War II) and particularly weak for thegeneration born around 1945. However, considering cohort effects makes estimationand forecasts significantly less stable for the used data, which is why we recommendto set γ a − t = 0.Based on forecasts for death probabilities, expected future life time can beestimated. To be consistent concerning longevity risk, mortality trends have to beincluded as a 60-year-old today will probably not have as good medication as a 60-year-old in several decades. However, it seems that this is not the standard approachin the literature. Based on the definitions above, expected (curtate) future life time ofa person at date T is given by e a,g ( T ) = E [ K a,g ( T )] = (cid:80) ∞ k =1 k p a,g ( T ), where survivalprobabilities over k ∈ N years are given by k p a,g ( T ) := (cid:81) k − j =0 (cid:0) − q a + j,g ( T + j ) (cid:1) and where K a,g ( T ) denotes the number of completed future years lived by a personof particular age and gender at time T . Approximating death probabilities by CTUARIAL APPLICATIONS AND ESTIMATION OF EXTENDED CREDITRISK + − − − − − (a) age l og c en t r a l dea t h r a t e s − − − − (b) age pa r a m e t e r a malefemale0 20 40 60 80 100 − . − . − . (c) age pa r a m e t e r b ( t r end ) malefemale 0 20 40 60 80 100 − − − (d) age pa r a m e t e r z ( t r end r ed . l o c a t i on ) malefemale0 20 40 60 80 100 . . . . . (e) age pa r a m e t e r h ( t r end r ed . s peed ) malefemale − . . . (f) year of birth pa r a m e t e r g ( c oho r t e ff e c t ) Figure 4.3.
Logarithm of death central death rates (a) for 2013and forecasts for 2063 in Australia as well as parameter values for α, β, ζ, η and γ in subfigures (b) , (c) , (d) , (e) and (f ) , respectively.central death rates, for newborns in Australia we get a life expectancy of roughly 83years for males and 89 . saying that ‘Aussie men now expected to live past 80’ and ‘improvementsin expected lifespan for women has since slowed down, increasing by around four yearsover the period—it’s 84.3 now’, our results show a much higher life expectancy dueto the consideration of mortality trends. Table 4.4.
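As a worked illustration of the curtate life expectancy formula above, the following R sketch computes $e_{a,g}(T)$ from a vector of one-year death probabilities taken along the diagonal $q_{a+j,g}(T+j)$ of a projected life table (so that mortality trends are included); the mortality assumptions below are purely hypothetical.

```r
# Minimal sketch: curtate future lifetime e_{a,g}(T) from one-year death
# probabilities q[j+1] = q_{a+j,g}(T+j), j = 0, 1, ..., along the diagonal.
curtate_life_expectancy <- function(q) {
  p_surv <- cumprod(1 - q)   # k-year survival probabilities _k p_{a,g}(T)
  sum(p_surv)                # e = sum over k >= 1 of _k p_{a,g}(T)
}

# Hypothetical Gompertz-type death probabilities for a newborn, ages 0..120
j <- 0:120
q <- pmin(1, 0.0005 + 0.00002 * exp(0.095 * j))
curtate_life_expectancy(q)   # roughly 80-90 years under these assumptions
```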
Table 4.4. Curtate future life time $e_{a,g}(T)$ for males and females in 2013.

age in 2013   0 (newborn)   20      40      60      80
male          83.07         63.33   43.62   24.44   8.–
female        89.45         69.05   48.20   27.76   9.–

5. A Link to the Extended CreditRisk+ Model and Applications
5.1. The ECRP Model.
In this section we establish the connection from our proposed stochastic mortality model to the risk aggregation model extended CreditRisk+ (abbreviated as ECRP), as given in [Schmock(2017), section 6].

Definition 5.1 (Policyholders and number of deaths). Let $\{1,\dots,E\}$ with $E \in \mathbb{N}$ denote the set of people (termed policyholders in light of insurance applications) in the portfolio and let random variables $N_1,\dots,N_E \colon \Omega \to \mathbb{N}_0$ indicate the number of deaths of each policyholder in the following period. The event $\{N_i = 0\}$ indicates survival of person $i$, whilst $\{N_i \ge 1\}$ indicates death.

Definition 5.2 (Portfolio quantities). Given Definition 5.1, the independent random vectors $Y_1,\dots,Y_E \colon \Omega \to \mathbb{N}_0^d$ with $d \ge 1$ denote portfolio quantities within the following period given deaths of policyholders, i.e., on $\{N_i \ge 1\}$ for all $i \in \{1,\dots,E\}$, and are independent of $N_1,\dots,N_E$.

Remark 5.3 (Portfolio quantities).
(a) For applications in the context of internal models we may set $Y_i$ as the best estimate liability, i.e., discounted future cash flows, of policyholder $i$ at the end of the period. Thus, when using stochastic discount factors or contracts with optionality, for example, portfolio quantities may be stochastic.
(b) In the context of portfolio payment analysis we may set $Y_i$ as the payments (such as annuities) to $i$ over the next period. We may include premiums in a second dimension in order to get joint distributions of premiums and payments.
(c) For applications in the context of mortality estimation and projection we set $Y_i = 1$.
(d) Using discretisation which preserves expectations (termed stochastic rounding in [Schmock(2017), section 6.2.2]), we may assume $Y_i$ to be $[0,\infty)^d$-valued.

Definition 5.4 (Aggregated portfolio quantities). Given Definitions 5.1 and 5.2, aggregated portfolio quantities due to deaths are given by
$$S := \sum_{i=1}^{E} \sum_{j=1}^{N_i} Y_{i,j},$$
where $(Y_{i,j})_{j\in\mathbb{N}}$ for every $i \in \{1,\dots,E\}$ is an i.i.d. sequence of random variables with the same distribution as $Y_i$.

Remark 5.5. In the context of term life insurance contracts, for example, $S$ is the sum of best estimates of payments and premiums which are paid and received, respectively, due to deaths of policyholders, see Section 5.2. In the context of annuities, $S$ is the sum of best estimates of payments and premiums which need not be paid and are received, respectively, due to deaths of policyholders. Then, small values of $S$, i.e., the left tail of its distribution, are the part of major interest and major risk.

It is a demanding question how to choose the modelling setup such that the distribution of $S$ can be derived efficiently and accurately. Assuming $N_i$ to be Bernoulli distributed is not suitable for our intended applications as computational complexity explodes. Therefore, to make the modelling setup applicable in practical situations and to ensure flexible handling in terms of multi-level dependence, we introduce the ECRP model, which is based on extended CreditRisk+, see [Schmock(2017), section 6].

Definition 5.6 (The ECRP model). Given Definitions 5.1 and 5.2, the ECRP model satisfies the following additional assumptions:
(a) Consider independent random common risk factors $\Lambda_1,\dots,\Lambda_K \colon \Omega \to [0,\infty)$ which have a gamma distribution with mean $e_k = 1$ and variance $\sigma_k^2 > 0$, i.e., with shape and inverse scale parameter both equal to $\sigma_k^{-2}$. Also the degenerate case with $\sigma_k^2 = 0$ for $k \in \{1,\dots,K\}$ is allowed. Consider corresponding weights $w_{i,0},\dots,w_{i,K} \in [0,1]$ for every policyholder $i \in \{1,\dots,E\}$. Risk index zero represents idiosyncratic risk and we require $w_{i,0} + \dots + w_{i,K} = 1$.
(b) Deaths $N_{1,0},\dots,N_{E,0} \colon \Omega \to \mathbb{N}_0$ are independent from one another, as well as of all other random variables, and, for all $i \in \{1,\dots,E\}$, they are Poisson distributed with intensity $m_i w_{i,0}$, i.e.,
$$\mathbb{P}\biggl(\bigcap_{i=1}^{E} \{N_{i,0} = \hat N_{i,0}\}\biggr) = \prod_{i=1}^{E} e^{-m_i w_{i,0}} \frac{(m_i w_{i,0})^{\hat N_{i,0}}}{\hat N_{i,0}!}, \qquad \hat N_{1,0},\dots,\hat N_{E,0} \in \mathbb{N}_0.$$
(c) Given risk factors, deaths $(N_{i,k})_{i\in\{1,\dots,E\},\,k\in\{1,\dots,K\}} \colon \Omega \to \mathbb{N}_0^{E\times K}$ are independent and, for every policyholder $i \in \{1,\dots,E\}$ and $k \in \{1,\dots,K\}$, they are Poisson distributed with random intensity $m_i w_{i,k}\Lambda_k$, i.e.,
$$\mathbb{P}\biggl(\bigcap_{i=1}^{E}\bigcap_{k=1}^{K} \{N_{i,k} = \hat N_{i,k}\} \biggm| \Lambda_1,\dots,\Lambda_K\biggr) = \prod_{i=1}^{E}\prod_{k=1}^{K} e^{-m_i w_{i,k}\Lambda_k} \frac{(m_i w_{i,k}\Lambda_k)^{\hat N_{i,k}}}{\hat N_{i,k}!} \quad \text{a.s.},$$
for all $\hat N_{i,k} \in \mathbb{N}_0$.
(d) For every policyholder $i \in \{1,\dots,E\}$, the total number of deaths $N_i$ is split up additively according to risk factors as $N_i = N_{i,0} + \dots + N_{i,K}$. Thus, by model construction, $\mathbb{E}[N_i] = m_i(w_{i,0} + \dots + w_{i,K}) = m_i$.

Given Definition 5.1, central death rates are given by $m_i = \mathbb{E}[N_i]$ and death probabilities, under piecewise constant death rates, are given by $q_i = 1 - \exp(-m_i)$.

Remark 5.7. Assuming that central death rates and weights are equal for all policyholders of the same age and gender, the ECRP model obviously corresponds one-to-one to our proposed stochastic mortality model, as given in Definition 2.2, if risk factors are independent and gamma distributed.

In reality, numbers of deaths are Bernoulli random variables, as each person can die just once. Unfortunately, in practice such an approach is not tractable for calculating P&L distributions of large portfolios, as execution times explode if numerical errors are to be kept small. Instead, we assume the number of deaths of each policyholder to be compound Poisson distributed; for the estimation of life tables, however, we assume the number of deaths to be Bernoulli distributed. Poisson distributed deaths give an efficient way of calculating P&L distributions using an algorithm based on Panjer's recursion, also for large portfolios, see [Schmock(2017), section 6.7]. The algorithm is basically due to [Giese(2003)], for which [Haaf et al.(2004)] proved numerical stability. The relation to Panjer's recursion was first pointed out in [Gerhold et al.(2010), section 5.5]. [Schmock(2017), section 5.1] generalised the algorithm to the multivariate case with dependent risk factors and risk groups, based on the multivariate extension of Panjer's algorithm given in [Sundt(1999)]. The algorithm is numerically stable since only positive terms are added up. To avoid long execution times for implementations of extended CreditRisk+ with large annuity portfolios, greater loss units and stochastic rounding, see [Schmock(2017), section 6.2.2], can be used.

However, the proposed model allows for multiple (Poisson) deaths of each policyholder and thus approximates the 'real world' with single (Bernoulli) deaths. From a theoretical point of view, this is justified by the Poisson approximation and generalisations of it, see for example [Vellaisamy and Chaudhuri(1996)]. Since annual death probabilities for ages up to 85 are less than 10%, multiple deaths are relatively unlikely for all major ages.
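To make the layered dependence structure of Definition 5.6 concrete, the following R sketch simulates one period of the ECRP model; portfolio size, weights and risk factor variances are illustrative assumptions only.

```r
# Minimal sketch: one simulated period of the ECRP model (Definition 5.6).
set.seed(1)
E      <- 5000                        # number of policyholders (illustrative)
K      <- 2                           # number of common risk factors
m      <- rep(0.05, E)                # central death rates m_i
w      <- matrix(c(0.2, 0.5, 0.3),    # weights (w_{i,0}, w_{i,1}, w_{i,2}),
                 nrow = E, ncol = K + 1, byrow = TRUE)  # summing to one
sigma2 <- c(0.1, 0.4)                 # risk factor variances sigma_k^2

# Gamma risk factors with mean one: shape = rate = 1 / sigma_k^2
Lambda <- rgamma(K, shape = 1 / sigma2, rate = 1 / sigma2)

N0 <- rpois(E, m * w[, 1])            # idiosyncratic deaths N_{i,0}
Nk <- sapply(1:K, function(k) rpois(E, m * w[, k + 1] * Lambda[k]))
N  <- N0 + rowSums(Nk)                # total (possibly multiple) deaths N_i
c(realised_deaths = sum(N), expected_deaths = sum(m))
```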
However, implementations of this algorithm are significantly faster than Monte Carlo approximations at comparable error (Poisson against Bernoulli) levels.

As an illustration we take a portfolio with $E = 10{,}000$ policyholders having central death rate $m := m_i = 0.05$ and payments $Y_i = 1$. We then derive the distribution of $S$ using the ECRP model for the case with just idiosyncratic risk, i.e., $w_{i,0} = 1$ and Poisson distributed deaths, and for the case with just one common stochastic risk factor $\Lambda_1$ and $w_{i,1} = 1$, i.e., with mixed Poisson distributed deaths. Then, using 50,000 simulations of the corresponding model where $N_i$ is Bernoulli distributed or mixed Bernoulli distributed given the truncated risk factor $\Lambda_1 \mid \Lambda_1 \le 1/m$, we compare the results of the ECRP model to Monte Carlo, respectively. Truncation of risk factors in the Bernoulli model is necessary as otherwise death probabilities may exceed one. We observe that the ECRP model drastically reduces execution times in 'R' at comparable error levels and leads to a speed-up by a factor of 1000. Error levels in the purely idiosyncratic case are measured in terms of total variation distance between the approximations and the binomial distribution with parameters $(10{,}000, 0.05)$, which arises as the independent sum of all Bernoulli random variables. Error levels in the purely non-idiosyncratic case are measured in terms of total variation distance between the approximations and the mixed binomial distribution, where for the ECRP model we use the Poisson approximation to get an upper bound on the total variation distance between those distributions.
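In the special case $Y_i = 1$ the aggregate $S$ has a closed form, which is what makes this comparison cheap on the ECRP side: with purely idiosyncratic Poisson deaths, $S$ is Poisson$(Em)$, and with a single gamma risk factor it is negative binomial. The following R sketch computes quantiles in the spirit of Table 5.1; the value $\sigma_1^2 = 0.1$ is an illustrative assumption.

```r
# Minimal sketch: exact distribution of S for Y_i = 1 under the ECRP model.
E <- 10000; m <- 0.05; sigma2 <- 0.1       # sigma2 is an assumed value
p <- c(0.01, 0.10, 0.50, 0.90, 0.99)

# Purely idiosyncratic case (w_{i,0} = 1): S ~ Poisson(E * m)
q_idio <- qpois(p, lambda = E * m)

# One common risk factor (w_{i,1} = 1): S | Lambda ~ Poisson(E * m * Lambda)
# with Lambda ~ Gamma(1/sigma2, 1/sigma2), hence S is negative binomial
q_mixed <- qnbinom(p, size = 1 / sigma2, mu = E * m)

rbind(idiosyncratic = q_idio, one_risk_factor = q_mixed)
```

For general $Y_i$ no such closed form is available, and the Panjer recursion of [Schmock(2017), section 6.7] is used instead.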
Table 5.1. Quantiles, execution times (speed) and total variation distance (accuracy) of Monte Carlo with Bernoulli deaths and 50,000 simulations, as well as of the extended CreditRisk+ (ECRP) model with Poisson deaths, given a simple portfolio.

                              quantiles
                              1%    10%   50%   90%   99%   speed        accuracy
Bernoulli (MC), w_{i,0} = 1   450   472   500   528   552   22.99 sec.   0.–
Poisson (ECRP), w_{i,0} = 1   449   471   500   529   553    0.01 sec.   0.–
Bernoulli (MC), w_{i,1} = 1   202   310   483   711   936   23.07 sec.   0.–
Poisson (ECRP), w_{i,1} = 1   204   309   483   712   944    0.02 sec.   ≤ 0.–

5.2. Application I: Mortality Risk, Longevity Risk and Solvency II Application.
In light of the previous section, life tables can be projected into the future and, thus, it is straightforward to derive best estimate liabilities (BEL) of annuities and life insurance contracts. The possibility that death probabilities differ from an expected curve, i.e., that estimated parameters no longer reflect the best estimate and have to be changed, contributes to mortality or longevity risk when risk is measured over a one-year time horizon, as in Solvency II, and the duration of in-force insurance contracts exceeds this time horizon. In our model, this risk can be captured by considering various MCMC samples $(\hat\theta^h)_{h=1,\dots,m}$ (indexed by superscript $h$) of the parameters $\theta = (\alpha,\beta,\zeta,\eta,\gamma)$ for death probabilities, yielding distributions of BELs. For example, taking $D(T, T+t)$ as the discount factor from time $T+t$ back to $T$ and choosing an MCMC sample $\hat\theta^h$ of parameters to calculate death probabilities $q^h_{a,g}(T)$ and survival probabilities $p^h_{a,g}(T)$ at age $a$ with gender $g$, the BEL for a term life insurance contract which pays one unit at the end of the year of death within the contract term of $d$ years is given by
$$A^T_{a,g}\bigl(\hat\theta^h\bigr) = D(T, T+1)\, q^h_{a,g}(T) + \sum_{t=1}^{d-1} D(T, T+t+1)\, {}_t p^h_{a,g}(T)\, q^h_{a+t,g}(T+t). \tag{5.8}$$

In a next step, this approach can be used as a building block for (partial) internal models to calculate basic solvency capital requirements (BSCR) for biometric underwriting risk under Solvency II, as illustrated in the following example. Consider an insurance portfolio at time 0 with $E \in \mathbb{N}$ whole life insurance policies with lump sum payments $C_i > 0$, for $i = 1,\dots,E$, upon death at the end of the year. Assume that all assets are invested in an EU government bond (risk free under the standard model of the Solvency II directive) with maturity 1, nominal $A$ and coupon rate $c > -1$. Furthermore, assume that we are only considering mortality risk and ignore profit sharing, lapse, costs, reinsurance, deferred taxes, other assets and other liabilities, as well as the risk margin. Note that in this case, basic own funds, denoted by $\mathrm{BOF}_t$, are given by the market value of assets minus BEL at time $t$, respectively. Then, the BSCR at time 0 is given by the 99.5% quantile of the change in basic own funds over the period $[0,1]$, which can be derived by, see (5.8),
$$\Delta\mathrm{BOF} = \mathrm{BOF}_0 - D(0,1)\,\mathrm{BOF}_1 = A\bigl(1 - D(0,1)(1+c)\bigr) - \sum_{i=1}^{E} C_i A^0_{a_i,g_i}\bigl(\hat\theta\bigr) + \frac{D(0,1)}{m} \sum_{h=1}^{m} \biggl( \sum_{i=1}^{E} C_i A^1_{a_i+1,g_i}\bigl(\hat\theta^h\bigr) + \sum_{i=1}^{E} \sum_{j=1}^{N^h_i} C_i \bigl(1 - A^1_{a_i+1,g_i}\bigl(\hat\theta^h\bigr)\bigr) \biggr), \tag{5.9}$$
where $\hat\theta := \frac{1}{m}\sum_{h=1}^{m} \hat\theta^h$ and where $N^h_1,\dots,N^h_E$ are independent and Poisson distributed with $\mathbb{E}[N^h_i] = q^h_{a_i,g_i}(0)$, with policyholder $i$ belonging to age group $a_i$ and of gender $g_i$. The distribution of the last sum above can be derived efficiently by Panjer recursion. This example does not require a consideration of market risk, and it nicely illustrates how mortality risk splits into a part associated with statistical fluctuation (experience variance: Panjer recursion) and a part with long-term impact (change in assumptions: MCMC). Note that by mixing $N_i$ with common stochastic risk factors, we may include other biometric risks such as morbidity.

Consider a portfolio with 100 males and 100 females at each age between 20 and 60 years, each having a 40-year term life insurance, issued in 2014, which provides a lump sum payment between 10,000 and 200,000 (randomly chosen for each policyholder) if death occurs within these 40 years. Using MCMC samples and estimates based on the Austrian data from 1965 to 2014 as given in the previous section, we may derive the change in basic own funds from 2014 to 2015 by (5.9) using the extended CreditRisk+ algorithm. The 99.5% quantile of the change in BOFs, i.e., the SCR, lies slightly above one million. If we did not consider parameter risk in the form of MCMC samples, the SCR would decrease by roughly 33%.
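A direct transcription of (5.8) might look as follows in R; the mortality and discount inputs below are placeholders, not estimates from the paper.

```r
# Minimal sketch: best estimate liability (5.8) of a d-year term life contract
# paying one unit at the end of the year of death.
# q[t+1]    = q_{a+t,g}(T+t) for t = 0, ..., d-1 (diagonal of projected table)
# disc[t+1] = D(T, T+t+1)
term_life_bel <- function(q, disc) {
  d      <- length(q)
  p_surv <- c(1, cumprod(1 - q))[1:d]   # _t p_{a,g}(T) for t = 0, ..., d-1
  sum(disc * p_surv * q)                # sum of D * _tp * q_{a+t} terms
}

# Hypothetical inputs: flat 2% interest, increasing death probabilities
d    <- 40
q    <- 0.001 * 1.09^(0:(d - 1))        # placeholder mortality along the diagonal
disc <- (1 / 1.02)^(1:d)                # D(T, T+t+1) for t = 0, ..., d-1
term_life_bel(q, disc)
```

Because the diagonal $q_{a+t,g}(T+t)$ is used, the projected mortality trend enters the BEL directly, which is precisely why parameter risk in $\hat\theta^h$ propagates into a distribution of BELs.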
5.3. Application II: Impact of Certain Health Scenarios in Portfolios.

Analysis of certain health scenarios and their impact on portfolio P&L distributions is straightforward. As an explanatory example, assume $E = 1600$ policyholders which are distributed uniformly over all age categories and genders, i.e., each age and gender category contains 100 policyholders, with corresponding death probabilities and weights as previously estimated and forecasted for 2012. Annuities $Y_i$ for all $i \in \{1,\dots,E\}$ are paid annually and take deterministic values in $\{1,\dots,10\}$ such that ten policyholders in each age and gender category share equally high payments. We now analyse the effect on the total amount of annuity payments in the next period under the scenario, indexed by 'scen', that deaths due to neoplasms are reduced by 25% in 2012 over all ages. In that case, we can estimate the realisation of the risk factor for neoplasms, see (3.9), which takes an estimated value below one. We then derive the distribution of the scenario loss $L_{\mathrm{scen}}$ where deaths due to neoplasms have decreased. Figure 5.1 then shows the probability distributions of the traditional loss $L$ without the scenario, as well as of the scenario loss $L_{\mathrm{scen}}$, with corresponding 95% and 99% quantiles. We observe that a reduction of 25% in cancer crude death rates leads to a remarkable shift in the quantiles of the loss distribution as fewer people die and, thus, more annuity payments have to be made.

Figure 5.1. Loss distributions of $L$ and $L_{\mathrm{scen}}$ with 95% and 99% quantiles.

5.4. Application III: Forecasting Central Death Rates and Comparison with the Lee–Carter Model.
We can compare out-of-sample forecasts of death rates from our proposed model to forecasts obtained by other mortality models. Here, we choose the traditional Lee–Carter model as a proxy, as our proposed model is conceptually based on a similar idea. We make the simplifying assumption of a constant population for out-of-sample time points.

Using the ECRP model it is straightforward to forecast central death rates and to give corresponding confidence intervals via setting $Y_j(t) := 1$. Then, for an estimate $\hat\theta$ of the parameter vector $\theta$, run the ECRP model with forecasted parameters, see (2.8) and (2.9). We then obtain the distribution of the total number of deaths $S_{a,g}(t)$ given $\hat\theta$ and, thus, the forecasted death rate $\hat m_{a,g}(t)$ is given by $\mathbb{P}\bigl(\hat m_{a,g}(t) = N/E_{a,g}(T)\bigr) = \mathbb{P}(S_{a,g}(t) = N)$ for all $N \in \mathbb{N}_0$. Uncertainty in the form of confidence intervals represents statistical fluctuations, as well as random changes in risk factors. Additionally, using results obtained by Markov chain Monte Carlo (MCMC), it is even possible to incorporate parameter uncertainty into predictions.

To account for an increase in uncertainty for forecasts we suggest assuming increasing risk factor variances, e.g., $\tilde\sigma_k^2(t) = \sigma_k^2\bigl(1 + d(t - T)\bigr)$ with $d \ge 0$. A motivation for this approach with $k = 1$ is the following: a major source of uncertainty for forecasts lies in an unexpected deviation from the estimated trend for death probabilities. We may therefore assume that, rather than being deterministic, forecasted values $m_{a,g}(t)$ are beta distributed (now denoted by $M_{a,g}(t)$) with $\mathbb{E}[M_{a,g}(t)] = m_{a,g}(t)$ and a variance $\sigma^2_{a,g}(t)$ which is increasing in time. Then, given independence of the risk factor $\Lambda_1$ and $M_{a,g}(t)$, we may assume that there exists a future point in time $t$ such that
$$\sigma^2_{a,g}(t) = \frac{m_{a,g}(t)\bigl(1 - m_{a,g}(t)\bigr)}{\sigma_1^{-2} + 1}.$$
In that case, $M_{a,g}(t)\Lambda_1$ is again gamma distributed, with mean $m_{a,g}(t)$ and increased variance $m_{a,g}(t)\sigma_1^2$ (instead of $m_{a,g}(t)^2\sigma_1^2$ in the deterministic case). Hence, it seems reasonable to stay within the family of gamma distributions for forecasts and just to adapt variances over time. Of course, the families for these forecast variances of gamma distributions can be changed arbitrarily and may be selected via classical information criteria.

Using in-sample data, $d$ can be estimated via (3.3) with all other parameters being fixed. Using Australian death and population data for the years 1963 to 1997, we estimate model parameters via MCMC in the ECRP model with one common stochastic risk factor having constant weight one. On average, i.e., over various forecasting periods starting at different dates, the parameter $d$ takes the value 0.22 in our example.
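The Beta–Gamma argument above can be checked numerically. The following R sketch compares simulated values of $M\Lambda_1$ with the claimed gamma distribution; all parameter values are illustrative, and the beta parametrisation below is the one implied by the stated mean and variance.

```r
# Minimal sketch: if Lambda ~ Gamma(shape = 1/s2, rate = 1/s2) (mean one) and,
# independently, M ~ Beta(m/s2, (1 - m)/s2), then M * Lambda ~ Gamma(m/s2, 1/s2)
# with mean m and variance m * s2, by the classical Beta-Gamma algebra.
set.seed(1)
m <- 0.05; s2 <- 0.2; n <- 1e5
Lambda <- rgamma(n, shape = 1 / s2, rate = 1 / s2)
M      <- rbeta(n, shape1 = m / s2, shape2 = (1 - m) / s2)
X      <- M * Lambda

c(mean = mean(X), var = var(X), target_var = m * s2)
ks.test(X, "pgamma", shape = m / s2, rate = 1 / s2)   # should not reject
```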
Using fixed trend parameters as above, and using the mean of 30,000 MCMC samples, we forecast death rates and corresponding confidence intervals out of sample for the period 1998 to 2013. We can then compare these results to crude death rates within the stated period and to forecasts obtained by the Lee–Carter model, as shown in Figure 5.2 for females aged 50 to 54 years. We observe that crude death rates mostly fall within the 90% confidence band for both procedures. Moreover, Lee–Carter forecasts lead to wider spreads of quantiles in the future, whilst the ECRP model suggests a more moderate increase in uncertainty. Taking various parameter samples from the MCMC chain and deriving quantiles for death rates, we can extract the contribution of parameter uncertainty in the ECRP model coming from the posterior distributions of parameters.

Figure 5.2. Forecasted and true death rates using the ECRP model (AM) and the Lee–Carter model (LC) for females aged 50 to 54 years.

Within our approach to forecasting death rates, it is now possible to derive contributions of various sources of risk. If we set $d = 0$, we get forecasts where uncertainty solely comes from statistical fluctuations and random changes in risk factors. Using $d = 0.22$ adds the increase in uncertainty associated with forecasts. Finally, considering several MCMC samples adds parameter risk. We observe that the contribution of statistical fluctuations and random changes in risk factors decreases from 63% in 1998 to 20% in 2013. Adding the increase in uncertainty for forecasts gives a roughly constant contribution of 72%, which implies that $d$ becomes the main driver of risk in the long term. On top of that, parameter uncertainty leaves a constant contribution of 28%.

5.5. Considered Risks.
Regulators often require security margins in life tables when modelling annuity or certain life insurance products and portfolios, to account for different sources of risk including trends, volatility risk, model risk and parameter risk, see [Kainhofer et al.(2006)] as well as [Pasdika and Wolff(2005)].

In the ECRP model, mortality trends are incorporated via parameter families for death probabilities which are motivated by the Lee–Carter model. It is straightforward to change parameter families arbitrarily so that they fit the data, as in the case when trends change fundamentally. If other families for weights are used, one always has to check that they sum up to one over all death causes. Note that for certain alternative parameter families, mean estimates obtained from Markov chain Monte Carlo do not necessarily sum up to one anymore. Changing model parameter families may also be necessary for long-term projections, since long-term trends are fundamentally different from short-term trends. Further estimation and testing procedures for trends in compound Poisson models in the context of convertible bonds can be found in [Schmock(1999)]. Trends for weights are particularly interesting insofar as the model becomes sensitive to changes in the vulnerability of policyholders to different death causes over time. Cross dependencies over different death causes and different ages can occur: such an effect can arise as a reduction in deaths of a particular cause can lead to more deaths of another cause several periods later, as people have to die at some point. Furthermore, the ECRP model captures unexpected, temporary deviations from a trend through the variability introduced by common stochastic risk factors which affect all policyholders, according to their weights, simultaneously.
Assuming that the model choice is right and that estimated values are correct, life tables still just give mean values of death probabilities over a whole population. Therefore, in the case of German data it is suggested to add a security margin (non-gender-specific due to legal reasons), set to 7.4%, to account for the risk of random fluctuations in deaths, approximately at a 95% quantile, see German Actuarial Association (DAV) [Todesfallrisiko(2009), section 4.1]. In the ECRP model this risk is captured automatically by risk aggregation. As a reference for the suggested security margin of 7.4% on death probabilities, we can use the same approach as given in Section 5.4 to estimate quantiles for death rates via setting $Y_j = 1$. These quantiles then correspond to statistical fluctuations around death probabilities. We roughly observe an average deviation from the death probability of 8.4% for the 5% quantile and of 8.7% for the 95% quantile for females aged 55 to 60 years in 2002, i.e., these values are in line with a security margin of 7.4%.

The risk of wrong parameter estimates, i.e., that realised death probabilities deviate from estimated values, can be captured using MCMC as described in Section 3.3, where we sample from the joint posterior distribution of the estimators. As our proposed extended CreditRisk+ algorithm is numerically very efficient, we can easily run the ECRP model for several thousand samples from the MCMC chain to derive sensitivities of quantiles, for example. Note that parameter risk is closely linked to longevity risk. To cover the risk of fundamental changes in future death probabilities, Section 5.4 provides an approach where future risk factor variances increase over time.

Modelling is usually a projection of a sophisticated real-world problem onto a relatively simple subspace which cannot cover all facets and observations in the data. Therefore, when applying the ECRP model to a portfolio of policyholders, we usually find structural differences to the data which is used for estimation. There may also be a difference in mortality rates between individual companies or between portfolios within a company, since different types of insurance products attract different types of policyholders with different individual risk profiles. In Germany, a minimal security margin of 10% is suggested for these risks, see [Pasdika and Wolff(2005), section 2.4.2]. Within the ECRP model, this risk can be addressed by using individual portfolio data instead of the whole population. Estimates from the whole population or from a different portfolio can be used as prior distributions under MCMC which, in the case of sparse data, makes estimation more stable. Another possibility for introducing dependence between two portfolios is the introduction of a joint stochastic risk factor for both portfolios. In that case, estimation can be performed jointly, with all remaining parameters (except risk factors and their variances) being estimated individually for both portfolios. In contrast to the whole population, observed mortality rates in insurance portfolios often show a completely different structure due to self-selection of policyholders. In particular, for ages around 60 this effect is very strong. In Germany, a security margin for death probabilities of 15% is suggested to cover selection effects, see DAV [Todesfallrisiko(2009), section 4.2]. In the literature, this effect is also referred to as basis risk, see [Li and Hardy(2011)]. As already mentioned, instead of using a fixed security margin, this issue can be tackled by using portfolio data with estimates from the whole population serving as prior distributions. Again, dependence between a portfolio and the whole population can be introduced by a joint stochastic risk factor in the ECRP model.

Alternatively, in [Kainhofer et al.(2006), section 4.7.1] it is suggested that all these risks be addressed by adding a constant security margin on the trend. This approach has the great conceptual advantage that the security margin increases over time and does not diminish as in the case of direct security margins on death probabilities.

5.6. Generalised and Alternative Models.
Up to now, we have applied a simplified version of extended CreditRisk+ to derive cumulative payments in annuity portfolios. A major shortcoming of this approach is the limited possibility of modelling dependencies amongst policyholders and death causes. In the most general form of extended CreditRisk+, as described in [Schmock(2017), section 6], it is possible to introduce risk groups which enable us to model joint deaths of several policyholders, and it is possible to model dependencies amongst death causes. Dependencies can take a linear dependence structure combined with dependence scenarios, so as to model negative correlations as well. Risk factors may then be identified with statistical variates such as average blood pressure, average physical activity or the average number of smoked cigarettes, etc., and not directly with death causes. Moreover, for each policyholder individually, the general model allows for losses which depend on the underlying cause of death. This gives scope to the possibility of modelling (possibly new) life insurance products with payoffs depending on the cause of death as, for example, in the case of accidental death benefits. Including all the extensions mentioned above, a similar algorithm may still be applied to derive loss distributions, see [Schmock(2017), section 6.7].

Instead of using extended CreditRisk+ to model annuity portfolios, i.e., an approach based on Poisson mixtures, we can assume a similar Bernoulli mixture model, in which conditionally Poisson distributed deaths are simply replaced by conditionally Bernoulli distributed deaths. In general, explicit and efficient derivation of loss distributions is then not possible anymore, and one has to rely on other methods such as Monte Carlo. Estimation of model parameters works similarly as discussed in Section 3. The Poisson approximation suggests that loss distributions derived from Bernoulli and Poisson mixture models are similar in terms of total variation distance if death probabilities are small.
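The total variation gap alluded to here is easy to quantify in the simplest case: for $Y_i = 1$, equal death probabilities and purely idiosyncratic risk, it is the distance between a binomial and a Poisson distribution. A minimal R sketch with illustrative numbers:

```r
# Minimal sketch: total variation distance between Binomial(E, m) (Bernoulli
# deaths) and Poisson(E * m) (Poisson deaths) total death counts, for Y_i = 1.
tv_binom_pois <- function(E, m) {
  k <- 0:E
  0.5 * sum(abs(dbinom(k, E, m) - dpois(k, E * m))) +
    0.5 * ppois(E, E * m, lower.tail = FALSE)   # Poisson mass above E
}
tv_binom_pois(E = 10000, m = 0.05)   # small when death probabilities are small
```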
6. Model Validation and Model Selection

In this section we propose several validation techniques to check whether the ECRP model fits the given data. The results for Australian data, see Section 4.1, strongly suggest that the proposed model is suitable. If any of the following validation approaches suggested misspecification of the model, or if parameter estimation did not seem to be accurate, one possibility to tackle these problems would be to reduce the number of risk factors.

6.1. Validation via Cross-Covariance.
For the first procedure, we transform the deaths $N_{a,g,k}(t)$ to $N'_{a,g,k}(t)$, see Section 3.4, such that this sequence has constant expectation and can thus be assumed to be i.i.d. Then, sample variances of the transformed death counts, cumulated across age and gender groups, can be compared to MCMC confidence bounds from the model. In the death-cause example, all observed sample variances of $N'_k(t)$ lie within the 5%- and 95%-quantiles.

6.2. Validation via Independence.
Numbers of deaths due to different death causes are independent within the ECRP model, as independent risk factors are assumed. Thus, for all $a, a' \in \{0,\dots,A\}$ and $g, g' \in \{f,m\}$, as well as $k, k' \in \{1,\dots,K\}$ with $k \ne k'$ and $t \in U$, we have $\operatorname{Cov}\bigl(N_{a,g,k}(t), N_{a',g',k'}(t)\bigr) = 0$. Again, transform the data as above and subsequently normalise the transformed data, given $\operatorname{Var}\bigl(N'_{a,g,k}(t) \mid \Lambda_k(t)\bigr) > 0$:
$$N^*_{a,g,k}(t) := \frac{N'_{a,g,k}(t) - \mathbb{E}\bigl[N'_{a,g,k}(t) \mid \Lambda_k(t)\bigr]}{\sqrt{\operatorname{Var}\bigl(N'_{a,g,k}(t) \mid \Lambda_k(t)\bigr)}} = \frac{N'_{a,g,k}(t) - E_{a,g}\, m_{a,g}\, w_{a,g,k}\, \Lambda_k(t)}{\sqrt{E_{a,g}\, m_{a,g}\, w_{a,g,k}\, \Lambda_k(t)}}.$$
Using the conditional central limit theorem as in [Grzenda and Zieba(2008)], we have $N^*_{a,g,k}(t) \to \mathcal{N}(0,1)$ in distribution as $E_{a,g}(t) \to \infty$, where $\mathcal{N}(0,1)$ denotes the standard normal distribution. Thus, normalised death counts $n^*_{a,g,k}(t)$ are given by
$$n^*_{a,g,k}(t) = \frac{n'_{a,g,k}(t) - E_{a,g}\, \hat m_{a,g}\, \hat w_{a,g,k}\, \hat\lambda_k(t)}{\sqrt{E_{a,g}\, \hat m_{a,g}\, \hat w_{a,g,k}\, \hat\lambda_k(t)}},$$
with $\hat\lambda_0(t) := 1$. Then, assuming that each pair $\bigl(N^*_{a,g,k}(t), N^*_{a',g',k'}(t)\bigr)$, for $a, a' \in \{0,\dots,A\}$ and $g, g' \in \{f,m\}$, as well as $k, k' \in \{1,\dots,K\}$ with $k \ne k'$ and $t \in U$, has a joint normal distribution with some correlation coefficient $\rho$ and standard normal marginals, we may derive the sample correlation coefficient
$$R_{a,g,a',g',k,k'} := \frac{\sum_{t=1}^{T} \bigl(N^*_{a,g,k}(t) - \bar N^*_{a,g,k}\bigr)\bigl(N^*_{a',g',k'}(t) - \bar N^*_{a',g',k'}\bigr)}{\sqrt{\sum_{t=1}^{T} \bigl(N^*_{a,g,k}(t) - \bar N^*_{a,g,k}\bigr)^2 \sum_{t=1}^{T} \bigl(N^*_{a',g',k'}(t) - \bar N^*_{a',g',k'}\bigr)^2}},$$
where $\bar N^*_{a,g,k} := \frac{1}{T}\sum_{s=1}^{T} N^*_{a,g,k}(s)$. Then, the test of the null hypothesis $\rho = 0$ against the alternative hypothesis $\rho \ne 0$ rejects the null hypothesis at a $\delta$-percent level, see [Lehmann and Romano(2005), section 5.13], when
$$\frac{\bigl|R_{a,g,a',g',k,k'}\bigr|}{\sqrt{\bigl(1 - R^2_{a,g,a',g',k,k'}\bigr)/(T-2)}} > K_{\delta,T}, \tag{6.1}$$
with $K_{\delta,T}$ such that $\int_{K_{\delta,T}}^{\infty} t_{T-2}(y)\, dy = \delta/2$, where $t_{T-2}$ denotes the density of a $t$-distribution with $T-2$ degrees of freedom.

Applying this validation procedure to Australian data with ten death causes shows that 88.9% of all independence tests, see (6.1), are accepted at a 5% significance level. Thus, we may assume that the ECRP model fits the data suitably with respect to independence amongst death counts due to different causes.
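The two-sided test (6.1) is the classical $t$-test for zero correlation; a compact R sketch with simulated stand-ins for the normalised death counts:

```r
# Minimal sketch: test of zero correlation (6.1) between two series of
# normalised death counts n1, n2 (here simulated under the null hypothesis).
set.seed(1)
Tn <- 40                                  # number of observation years T
n1 <- rnorm(Tn); n2 <- rnorm(Tn)          # stand-ins for n*_{a,g,k}(t), n*_{a',g',k'}(t)
R  <- cor(n1, n2)
stat <- abs(R) / sqrt((1 - R^2) / (Tn - 2))
K    <- qt(1 - 0.05 / 2, df = Tn - 2)     # critical value K_{delta,T} at delta = 5%
c(statistic = stat, critical = K, reject = stat > K)
# Equivalently, cor.test(n1, n2) yields the same t-statistic and a p-value.
```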
6.3. Validation via Serial Correlation.
Using the same data transformation and normalisation as in Section 6.2, we may assume that the random variables $(N^*_{a,g,k}(t))_{t \in U}$ are identically and standard normally distributed. Then, we can check for serial dependence and autocorrelation in the data, such as causalities between a reduction in deaths due to certain death causes and a possibly lagged increase in different ones. Note that we already remove a lot of dependence via time-dependent weights and death probabilities. Such serial effects are, for example, visible in the case of mental and behavioural disorders and circulatory diseases.

Many tests are available, most of which assume an autoregressive model with normal errors, such as the Breusch–Godfrey test, see [Godfrey(1978)]. For the Breusch–Godfrey test, a linear model is fitted to the data, where the residuals are assumed to follow an autoregressive process of length $p \in \mathbb{N}$. Then, $(T-p)R^2$ asymptotically follows a $\chi^2$ distribution with $p$ degrees of freedom under the null hypothesis that there is no autocorrelation. In 'R', an implementation of the Breusch–Godfrey test is available within the function bgtest in the 'lmtest' package, see [Zeileis and Hothorn(2002)].

Applying this validation procedure to the Australian data given in Section 4.1, the null hypothesis, i.e., that there is no serial correlation of order $1, 2, \dots, 10$, is not rejected at a 5% level in 93.8% of all cases. Again, this is an indicator that the ECRP model with trends for weights and death probabilities fits the data suitably.
6.4. Validation via Risk Factor Realisations.
In the ECRP model, the risk factors $\Lambda_k$ are assumed to be independent and gamma distributed with mean one and variance $\sigma_k^2$. Based on these assumptions, we can use estimates of the risk factor realisations $\lambda_k(t)$ to judge whether the ECRP model adequately fits the data. These estimates can either be obtained via MCMC based on the maximum a posteriori setting, or by Equations (3.9) or (3.12). For each $k \in \{1,\dots,K\}$, we may check whether the estimates $\hat\lambda_k(1),\dots,\hat\lambda_k(T)$ suggest a rejection of the null hypothesis that they are sampled from a gamma distribution with mean one and variance $\sigma_k^2$. The classical way is to use the Kolmogorov–Smirnov test, see e.g. [Lehmann and Romano(2005), section 6.13] and the references therein. In 'R', an implementation of this test is provided by the ks.test function, see [R Core Team(2013)]. The null hypothesis is rejected as soon as the test statistic $\sup_{x \in \mathbb{R}} |F_T(x) - F(x)|$ exceeds the corresponding critical value, where $F_T$ denotes the empirical distribution function of the samples $\hat\lambda_k(1),\dots,\hat\lambda_k(T)$ and where $F$ denotes the gamma distribution function with mean one and variance $\sigma_k^2$. Testing whether risk factor realisations are sampled from a gamma distribution via the Kolmogorov–Smirnov test as described above gives acceptance of the null hypothesis for all ten risk factors at all suitable levels of significance.
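A minimal sketch of this check in R, with simulated estimates standing in for $\hat\lambda_k(1),\dots,\hat\lambda_k(T)$:

```r
# Minimal sketch: Kolmogorov-Smirnov test of gamma risk factor realisations.
# A gamma distribution with mean one and variance s2 has shape = rate = 1/s2.
set.seed(1)
s2         <- 0.3                                         # illustrative variance
lambda_hat <- rgamma(40, shape = 1 / s2, rate = 1 / s2)   # stand-in estimates
ks.test(lambda_hat, "pgamma", shape = 1 / s2, rate = 1 / s2)
```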
6.5. Model Selection.

For choosing a suitable family for mortality trends, information criteria such as AIC, BIC or DIC can be applied straight away. The question of how many risk factors to use cannot be answered by traditional information criteria, since a reduction of risk factors leads to a different data structure. It also depends on the ultimate goal. For example, if the development of all death causes is of interest, then a reduction of risk factors is not wanted. On the contrary, in the context of annuity portfolios, several risk factors may be merged into one risk factor as their contributions to the risk of the total portfolio are small.

7. Conclusions
We introduce an additive stochastic mortality model which is closely related to classical approaches such as the Lee–Carter model, but which allows for joint modelling of underlying death causes and improves on models using disaggregated death cause forecasts. Model parameters can be estimated jointly using MCMC based on publicly available data. We give a link to extended CreditRisk+, which provides a useful actuarial tool with numerous portfolio applications, such as P&L derivation in annuity and life insurance portfolios or (partial) internal model applications. Moreover, there exists a fast and numerically stable algorithm to derive loss distributions exactly, instead of by Monte Carlo, even for large portfolios. Our proposed model directly incorporates various sources of risk, including trends, longevity, mortality risk, statistical volatility and estimation risk. In particular, it is possible to quantify the risk of statistical fluctuations within the next period (experience variance) and parameter uncertainty over a longer time horizon (change in assumptions). Compared to the Lee–Carter model, we have a more flexible framework and can directly extract several sources of uncertainty. Straightforward model validation techniques are available.

References
Alai et al.(2015). Alai, Daniel H., Séverine Arnold, and Michael Sherris. 2015. Modelling cause-of-death mortality and the impact of cause-elimination. Annals of Actuarial Science 9: 167–86.
Barbour et al.(1992). Barbour, A. D., Lars Holst, and Svante Janson. 1992. Poisson Approximation. Oxford Studies in Probability, vol. 2. Oxford: Oxford University Press.
Booth and Tickle(2008). Booth, Heather, and Leonie Tickle. 2008. Mortality modelling and forecasting: a review of methods. Annals of Actuarial Science 3: 3–43.
Brouhns et al.(2002). Brouhns, Natacha, Michel Denuit, and Jeroen K. Vermunt. 2002. A Poisson log-bilinear regression approach to the construction of projected lifetables. Insurance: Mathematics and Economics 31: 373–93.
Cairns et al.(2009). Cairns, Andrew J. G., David Blake, Kevin Dowd, Guy D. Coughlan, David Epstein, Alen Ong, and Igor Balevich. 2009. A quantitative comparison of stochastic mortality models using data from England and Wales and the United States. North American Actuarial Journal 13: 1–35.
Chaudhry and Zubair(2001). Chaudhry, M. Aslam, and Syed M. Zubair. 2001. On a Class of Incomplete Gamma Functions with Applications. Abingdon: Taylor & Francis.
Credit Suisse First Boston(1997). Credit Suisse First Boston. 1997. CreditRisk+: A Credit Risk Management Framework. Tech. Report. New York: CSFB.
Fung et al.(2017). Fung, Man Chung, Gareth W. Peters, and Pavel V. Shevchenko. 2017. A unified approach to mortality modelling using state-space framework: characterization, identification, estimation and forecasting. To appear in Annals of Actuarial Science.
Fung et al.(2015). Fung, Man Chung, Gareth W. Peters, and Pavel V. Shevchenko. 2015. A state-space estimation of the Lee–Carter mortality model and implications for annuity pricing. In T. Weber, M. J. McPhee, and R. S. Anderssen (Eds.), MODSIM2015, 21st International Congress on Modelling and Simulation, Modelling and Simulation Society of Australia and New Zealand, pp. 952–58.
Gamerman and Lopes(2006). Gamerman, Dani, and Hedibert F. Lopes. 2006. Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference, 2nd ed. Texts in Statistical Science Series. Boca Raton: Chapman & Hall/CRC.
Gerhold et al.(2010). Gerhold, Stefan, Uwe Schmock, and Richard Warnung. 2010. A generalization of Panjer's recursion and numerically stable risk aggregation. Finance and Stochastics 14: 81–128.
Giese(2003). Giese, Götz. 2003. Enhancing CreditRisk+. Risk 16: 73–77.
Gilks(1995). Gilks, Walter R., Sylvia Richardson, and David Spiegelhalter. 1995. Markov Chain Monte Carlo in Practice. Boca Raton: Chapman & Hall/CRC Interdisciplinary Statistics, Taylor & Francis.
Godfrey(1978). Godfrey, Leslie G. 1978. Testing against general autoregressive and moving average error models when the regressors include lagged dependent variables. Econometrica 46: 1293–1301.
Grzenda and Zieba(2008). Grzenda, Wioletta, and Wiesław Zięba. 2008. Conditional central limit theorem. International Mathematical Forum 3: 1521–28.
Haaf et al.(2004). Haaf, Hermann, Oliver Reiss, and John Schoenmakers. 2004. Numerically stable computation of CreditRisk+. In Gundlach, Matthias, and Frank Lehrbass (Eds.), CreditRisk+ in the Banking Industry. Springer Finance, Berlin, pp. 69–77.
Kainhofer et al.(2006). Kainhofer, Reinhold, Martin Predota, and Uwe Schmock. 2006. The new Austrian annuity valuation table AVÖ 2005R. Mitteilungen der Aktuarvereinigung Österreichs 13: 55–135.
Lee and Carter(1992). Lee, Ronald D., and Lawrence R. Carter. 1992. Modeling and forecasting U.S. mortality. Journal of the American Statistical Association 87: 659–71.
Lehmann and Romano(2005). Lehmann, Erich L., and Joseph P. Romano. 2005. Testing Statistical Hypotheses, 3rd ed. Springer Texts in Statistics. Berlin: Springer-Verlag.
Li and Hardy(2011). Li, Johnny Siu-Hang, and Mary R. Hardy. 2011. Measuring basis risk in longevity hedges. North American Actuarial Journal 15: 177–200.
McNeil et al.(2005). McNeil, Alexander J., Rüdiger Frey, and Paul Embrechts. 2005. Quantitative Risk Management: Concepts, Techniques and Tools. Princeton Series in Finance. Princeton: Princeton University Press.
Pasdika and Wolff(2005). Pasdika, Ulrich, and Jürgen Wolff. 2005. Coping with longevity: The new German annuity valuation table DAV 2004 R. Paper presented at the Living to 100 and Beyond Symposium, Orlando, FL, USA, 12–14 January 2005.
Pitacco et al.(2009). Pitacco, Ermanno, Michel Denuit, and Steven Haberman. 2009. Modelling Longevity Dynamics for Pensions and Annuity Business. Oxford: Oxford University Press.
Qi et al.(2005). Qi, Feng, Run-Qing Cui, Chao-Ping Chen, and Bai-Ni Guo. 2005. Some completely monotonic functions involving polygamma functions and an application. Journal of Mathematical Analysis and Applications.
R Core Team(2013). R Core Team. 2013. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing.
Robert and Casella(2004). Robert, Christian P., and George Casella. 2004. Monte Carlo Statistical Methods, 2nd ed. Springer Texts in Statistics. New York: Springer-Verlag.
Roberts et al.(1997). Roberts, Gareth O., Andrew Gelman, and Walter R. Gilks. 1997. Weak convergence and optimal scaling of random walk Metropolis algorithms. The Annals of Applied Probability 7: 110–20.
Schmock(1999). Schmock, Uwe. 1999. Estimating the value of the WinCAT coupons of the Winterthur insurance convertible bond: A study of the model risk. Astin Bulletin 29: 101–63.
Schmock(2017). Schmock, Uwe. 2017. Modelling Dependent Credit Risks with Extensions of CreditRisk+ and Application to Operational Risk. Lecture Notes, Version March 28, 2017. Available online (accessed on March 28, 2017).
Shevchenko et al.(2015). Shevchenko, Pavel V., Jonas Hirz, and Uwe Schmock. 2015. Forecasting leading death causes in Australia using extended CreditRisk+. In T. Weber, M. J. McPhee, and R. S. Anderssen (Eds.), MODSIM2015, 21st International Congress on Modelling and Simulation, Modelling and Simulation Society of Australia and New Zealand, pp. 966–72.
Shevchenko(2011). Shevchenko, Pavel V. 2011. Modelling Operational Risk Using Bayesian Inference. Berlin: Springer-Verlag.
Sundt(1999). Sundt, Bjørn. 1999. On multivariate Panjer recursions. Astin Bulletin 29: 29–45.
Tierney(1994). Tierney, Luke. 1994. Markov chains for exploring posterior distributions. The Annals of Statistics 22: 1701–62.
Todesfallrisiko(2009). Todesfallrisiko, DAV-Unterarbeitsgruppe. 2009. Herleitung der Sterbetafel DAV 2008 T für Lebensversicherungen mit Todesfallcharakter. Blätter der DGVFM.
Vellaisamy and Chaudhuri(1996). Vellaisamy, P., and B. Chaudhuri. 1996. Poisson and compound Poisson approximations for random sums of random variables. Journal of Applied Probability 33: 127–37.
Wilmoth(1995). Wilmoth, John R. 1995. Are mortality projections always more pessimistic when disaggregated by cause of death? Mathematical Population Studies 5: 293–319.
Zeileis and Hothorn(2002). Zeileis, Achim, and Torsten Hothorn. 2002. Diagnostic checking in regression relationships. R News 2: 7–10.

(J. Hirz) BELTIOS GmbH, Lehargasse 1, 1060 Vienna, Austria
E-mail address: [email protected]

(U. Schmock) Department of Financial and Actuarial Mathematics, TU Wien, Wiedner Hauptstr. 8–10, 1040 Vienna, Austria
E-mail address: [email protected]

(Pavel V. Shevchenko) Department of Applied Finance and Actuarial Studies, Macquarie University, NSW 2109, Australia
E-mail address: