Volterra mortality model: Actuarial valuation and risk management with long-range dependence

Abstract

While abundant empirical studies support the long-range dependence (LRD) of mortality rates, the corresponding impact on mortality securities is largely unknown due to the lack of appropriate tractable models for valuation and risk management purposes. We propose a novel class of Volterra mortality models that incorporate LRD into the actuarial valuation, retain tractability, and are consistent with the existing continuous-time affine mortality models. We derive the survival probability in closed form by taking into account the historical health records. The flexibility and tractability of the models make them useful in valuing mortality-related products such as death benefits, annuities, longevity bonds, and many others, as well as offering optimal mean-variance mortality hedging rules. Numerical studies are conducted to examine the effect of incorporating LRD into mortality rates on various insurance products and hedging efficiency.


arXiv [q-fin.MF]

Ling Wang ∗, Mei Choi Chiu †, Hoi Ying Wong ‡

September 22, 2020

Keywords: Stochastic mortality; Long-range dependence; Affine Volterra processes; Valuation; Mean-variance hedging.

∗ Department of Statistics, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong.
† Department of Mathematics & Information Technology, The Education University of Hong Kong, Tai Po, N.T., Hong Kong.
‡ Corresponding author. Department of Statistics, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong.

Introduction

Actuaries rely heavily on mortality modeling for mortality prediction, actuarial valuation, and risk management. Accurate estimations and predictions of human mortality are the essential building blocks of both insurance contract pricing and pension policy. The first study of this can be dated back to Gompertz (1825).

Arguably the most well-received modern mortality model is the Lee and Carter (1992) model and its extensions using time series analysis. For instance, it has been generalized to multivariate populations with a common trend (Li and Lee, 2005), mortality forecasts using singular value decomposition (Renshaw and Haberman, 2003), joint modeling of different national populations (Antonio et al., 2015) and sub-populations (Villegas and Haberman, 2014), a multi-population stochastic mortality model (Danesi et al., 2015), a Poisson regression model (Brouhns et al., 2002), and stochastic period and cohort effect (Toczydlowska et al., 2017), among others. A key advantage of the Lee-Carter model and its variants is that statistical inferences from time series analysis can be applied or generalized to estimate and test with a real mortality data set.

By incorporating fractionally integrated time series analysis into the Lee-Carter model, Yan et al. (2018) empirically show the existence of long-range dependence (LRD) (also known as long-memory pattern or fractional persistence) across age groups, gender, and countries by using a dataset of 16 countries. When they apply their long-memory mortality model to forecast life expectancies, a mortality model ignoring LRD tends to underestimate life expectancy, which has important implications for pension schemes and funding issues. Yan et al. (2020) further extend the model to incorporate multivariate cohorts and document the existence of LRD. Yaya et al. (2019) show a long-memory pattern in the infant mortality rates of G7 countries.
Delgado-Vences and Ornelas (2019) offer further empirical evidence that mortality rates exhibit LRD using a fractional Ornstein-Uhlenbeck (fOU) process with Italian population data from the 1950 to 2004 period.

Most stochastic mortality models focus on the mortality rate, or equivalently the Poisson intensity rate. We refer to the pioneering work of Milevsky and Promislow (2001), who introduced the Cox model to insurance applications. Biffis (2005) and Biffis and Millossovich (2006) further develop this idea of doubly stochastic mortality models with an affine feature for exploiting analytical tractability in actuarial valuation with both financial and mortality risks. Jevtić et al. (2013) extend it to cohort models, and Wong et al. (2017) introduce continuous-time cointegration into the multivariate mortality rates.

Blackburn and Sherris (2013) advocate the use of continuous-time affine mortality models for longevity pricing and hedging because of their tractability and consistency with the market data. Jevtić and Regis (2019) propose a calibration to the multiple populations affine mortality models and demonstrate its empirical use with product price data. However, none of the aforementioned studies provide an analytically tractable dynamic mortality model with the LRD feature.

The primary contribution of this paper is the proposal of a novel class of dynamic stochastic mortality models that simultaneously render actuarial valuation tractability and the LRD property. As the proposed models are based on Volterra processes, we call them Volterra mortality models. Inspired by the affine Volterra process (Abi Jaber et al., 2019), our model preserves the affine structure for general actuarial valuation but still captures LRD.
In terms of practical contributions, we use the model to derive closed-form solutions for the survival probability, death and survival benefits of insurance contracts, and longevity bonds, and then address the impact of LRD on these insurance products. To the best of our knowledge, the derived formulas constitute the first set of formulas for insurance products that are subject to the LRD feature of mortality rates.

This study also contributes to risk management with LRD mortality rates. We rigorously develop the mean-variance (MV) strategy for hedging longevity risk with a longevity security that is subject to LRD. This latter hedging strategy is highly non-trivial because the Volterra mortality rate is a non-Markovian and non-semimartingale process. Inspired by Han and Wong (2020), we derive the MV optimal hedging with the Volterra mortality models by means of linear-quadratic control with the backward stochastic differential equation (BSDE) framework similar to Wong et al. (2017). In contrast, Han and Wong (2020) solve the MV portfolio problem with rough volatility by constructing an auxiliary process. Our optimal hedging rule shows how to adjust the hedge for LRD of mortality rates.

The rest of this paper is organized as follows. Section 2 introduces the Volterra mortality model based on the doubly stochastic mortality models and explains how the model captures LRD. Section 3 offers some formulas for actuarial valuation. In Section 4, we formulate an optimal hedging problem under the Volterra mortality model and give an explicit solution. To compare the Volterra mortality model with LRD with the Markovian mortality model, numerical studies are conducted for both actuarial valuation and the hedging problem in Section 5. Section 6 gives our concluding remarks. Some details and additional proofs are given in the Appendix.

Consider a filtered probability space $(\Omega, \mathcal{F}, \mathbb{F}, \mathbb{P})$ where the filtration $\mathbb{F} = \{\mathcal{F}_t : 0 \le t \le T\}$ satisfies the usual conditions.
We write $\mathcal{F}_t = \mathcal{G}_t \vee \mathcal{H}_t$, where $\mathcal{H}_t$ represents the flow of information available as time goes by, including the historical processes and the current states, and $\mathcal{G}_t$ contains the information on whether an individual has died. We interpret $\mathbb{P}$ as the physical probability measure. Alternatively, our model can be developed under a pricing measure so that the model parameters are calibrated to the insurance product prices available in the market. This enables actuarial valuation consistent with market prices. However, risk management strategies should be conducted under the physical probability measure. To avoid confusion, we denote the pricing measure by $\mathbb{Q}$ and discuss the relationship between $\mathbb{P}$ and $\mathbb{Q}$ in the next section. For the time being, we focus on the model development under $\mathbb{P}$.

We begin with the classic doubly stochastic mortality models. For simplicity, we consider a group of people with homogeneous features, although individual differences certainly exist within the group. A counting process $N$ is a doubly stochastic process driven by the subfiltration $\mathbb{G} = \{\mathcal{G}_t\}_{t \ge 0}$ of $\mathbb{F}$ and with $\mathbb{G}$-intensity $\mu_t$. Let $\tau$ be the first jump time of the process $N$ with intensity $\mu_t$. In actuarial applications, the process $\{N_t\}_{t \ge 0}$ records the number of deaths at each time $t \ge 0$. For any time $t \ge 0$ and $\omega \in \Omega$ such that $\tau(\omega) > t$, we have

$$\mathbb{P}(\tau \le t + \Delta \mid \mathcal{F}_t) \cong \mu_t(\omega)\,\Delta, \tag{1}$$

for a trajectory of $\mu_t(\omega)$ and a fixed $\omega \in \Omega$. Thus, the counting process $N$ associated with $\tau$ becomes an inhomogeneous Poisson process with parameter $\int_0^{\cdot} \mu_s(\omega)\,ds$. In other words, for all $T \ge t \ge 0$ and integers $k \ge 0$,

$$\mathbb{P}(N_T - N_t = k \mid \mathcal{F}_t \vee \mathcal{G}_T) = \frac{\left(\int_t^T \mu_s(\omega)\,ds\right)^k}{k!}\, e^{-\int_t^T \mu_s(\omega)\,ds}.$$

By the law of iterated expectations, the time-$t$ survival probabilities over the time interval $(t, T]$ (for fixed $T \ge t \ge 0$) can be expressed as follows:

$$\mathbb{P}(\tau > T \mid \mathcal{F}_t) = E\left[ e^{-\int_t^T \mu_s(\omega)\,ds} \,\middle|\, \mathcal{F}_t \right]. \tag{2}$$

If the intensity $\mu_t$ is a constant, then the doubly stochastic process reduces to the homogeneous Poisson process. However, the mortality modeling literature favours a stochastic intensity. Typically, the intensity is modeled through a stochastic differential equation (SDE). For instance, Biffis (2005) and Biffis and Millossovich (2006) postulate a Markovian process such that $\mu_t = f(X_t)$, where $f$ is a continuous function on $\mathbb{R}$,

$$dX_t = b(X_t)\,dt + \sigma(X_t)\,dW_t, \tag{3}$$

and $\{W_t\}_{t \ge 0}$ is the standard Brownian motion.

To incorporate LRD into the mortality rate, one can simply replace the Brownian motion in (3) with the fractional Brownian motion. In other words,

$$dX_t = b(X_t)\,dt + \sigma(X_t)\,dW^H_t, \tag{4}$$

where $W^H_t$ is a fractional Brownian motion (fBM) with the Hurst parameter $H \in [0.5, 1)$. For example, Delgado-Vences and Ornelas (2019) use $f(X_t) = h_0 \exp(h_1 t + h_2 X_t)$, for the constants $h_0, h_1, h_2 > 0$, and a fOU process in the form of (4) such that the drift term $b(X_t)$ is a linear function of $X_t$ and $\sigma(X_t) \equiv \sigma$ is a constant. However, the fractional Brownian motion is analytically intractable for actuarial valuation.

We propose a stochastic mortality model incorporating LRD that retains the key advantages of the works of Biffis (2005), Delgado-Vences and Ornelas (2019), and Leonenko et al. (2019). More specifically, we maintain the affine nature of Biffis (2005), reflect LRD with fBM as in Delgado-Vences and Ornelas (2019), and offer explicit expressions for some important Fourier-Laplace functionals generalizing Leonenko et al. (2019) for actuarial valuation. Our model is highly inspired by the affine Volterra processes (Abi Jaber et al., 2019) and is hence called the Volterra mortality model.

In the one-dimensional case, Baudoin and Nualart (2003) show the equivalence between fBM and the Volterra process:

$$W^H_t = c_H \int_0^t (t - s)^{H - 1/2}\,dW(s),$$

where $c_H$ is a constant related to the Hurst parameter $H$, $W$ is the Wiener process, and the integral process on the right-hand side is a standard Volterra process. For simplicity and to be consistent with the literature, we postulate the mortality rate $\mu_t$ of a group:

$$\mu_t = m(t) + \eta X_t, \tag{5}$$

where $m(t)$ is a bounded continuous deterministic function and $\eta$ is a constant. In other words, we require that $f(X_t)$ is a linear function of $X_t$. In addition, $X_t$ follows a stochastic Volterra integral equation (SVIE):

$$X_t = X_0 + \int_0^t K(t - s)\,b(X_s)\,ds + \int_0^t K(t - s)\,\sigma(X_s)\,dW_s, \tag{6}$$

where $W = [W_1, \cdots, W_d]^\top$ is the standard $d$-dimensional Brownian motion under $\mathbb{P}$, and the coefficients $b$ and $\sigma$ are assumed to be continuous. The convolution kernel $K$ satisfies the following condition:

$K \in L^2_{\mathrm{loc}}(\mathbb{R}_+, \mathbb{R})$, $\int_0^h K(t)^2\,dt = O(h^\gamma)$ and $\int_0^T (K(t + h) - K(t))^2\,dt = O(h^\gamma)$ for some $\gamma \in (0, 2]$ and every $T < \infty$.

Although the process $X_t$ in (6) is generally high-dimensional, we would like to illustrate it in a one-dimensional case. Table 1 exhibits some useful kernels $K$ in the one-dimensional case. We obtain the fBM by choosing $K$ as the fractional kernel in Table 1 with a constant $\sigma(X_s)$ and $b = 0$ in (6). Therefore, the Volterra processes can be applied to a wider class of LRD noise terms. Note that the resolvent, or resolvent of the second kind, corresponding to the $K$ shown in Table 1 is defined as the kernel $R$ such that $K * R = R * K = K - R$. The convolutions $K * R$ and $R * K$, with $K$ a measurable function on $\mathbb{R}_+$ and $R$ a measure on $\mathbb{R}_+$ of locally bounded variation, are defined by

$$(K * R)(t) = \int_{[0,t]} K(t - s)\,R(ds), \qquad (R * K)(t) = \int_{[0,t]} R(ds)\,K(t - s)$$

for $t > 0$.

Remark 1. According to Biffis (2005), the deterministic function $m(t)$ in (5) may represent (i) a best-estimated assumption on $\mu$ enforcing unbiased expectations about the future based on the available information, (ii) a pricing demographic basis, or (iii) an available mortality table for a population of insureds. In Section 5, we calibrate $m(t)$ to the table SIM92, a period table usually employed to price assurances.

| Kernel | $K(t)$ | $R(t)$ |
|---|---|---|
| Constant | $c$ | $c\,e^{-ct}$ |
| Fractional | $c\,\dfrac{t^{\alpha-1}}{\Gamma(\alpha)}$ | $c\,t^{\alpha-1} E_{\alpha,\alpha}(-c\,t^{\alpha})$ |
| Exponential | $c\,e^{-\lambda t}$ | $c\,e^{-\lambda t} e^{-ct}$ |
| Gamma | $c\,e^{-\lambda t}\dfrac{t^{\alpha-1}}{\Gamma(\alpha)}$ | $c\,e^{-\lambda t} t^{\alpha-1} E_{\alpha,\alpha}(-c\,t^{\alpha})$ |

Table 1: Examples of kernel function $K$ and the corresponding resolvent $R$. Here $E_{\alpha,\beta}(z) = \sum_{n=0}^{\infty} \frac{z^n}{\Gamma(\alpha n + \beta)}$ denotes the Mittag-Leffler function.

In addition, when the convolution kernel $K$ is set to a constant $c$ in (6), $X_t$ reduces to the solution of an SDE. Furthermore, once $b(X_t)$ is linear in $X_t$ and $\sigma(X_t)$ satisfies a certain affine property, our model in (6) becomes the affine stochastic mortality model of Biffis (2005). The possibly high-dimensional $X_t$ also enables us to incorporate multi-factor mortality modeling. However, we would like to highlight that the Volterra process in (6) is generally a non-Markovian and non-semimartingale process. The non-Markovian nature is obvious because the integrals in the SVIE take the whole realized sample path into account. The non-semimartingale feature is reflected by the fact that the time variable $t$ appears in both the integral limit and the kernel function, so that the stochastic integral in (6) fails to define an Itô integral.

Fortunately, Abi Jaber et al. (2019) show that it is still possible to maintain the affine nature within (6). Let $a(x) = \sigma(x)\sigma(x)^\top$ be the covariance matrix.

Definition 1. The SVIE (6) is called an affine process (Abi Jaber et al., 2019) if

$$a(x) = A^0 + x_1 A^1 + \cdots + x_d A^d, \qquad b(x) = b^0 + x_1 b^1 + \cdots + x_d b^d,$$

for some $d$-dimensional symmetric matrices $A^i$ and vectors $b^i$. For simplicity, we set $B = (b^1, \cdots, b^d)$ and $A(u) = (u A^1 u^\top, \cdots, u A^d u^\top)$ for any row vector $u \in \mathbb{C}^d$.

To draw insights from Definition 1, consider the one-dimensional case. When $b(x) = b^0 - b^1 x$, a linear function of $x$, and $a(x)$ is a constant, (6) is known as the Volterra type of the Vasicek (VV) model, which reduces to the classic Vasicek model by taking a constant kernel or, equivalently, $H = 1/2$. When $b(x)$ is linear in $x$ and $a(x)$ is directly proportional to $x$, our model in (6) reduces to the Volterra version of the CIR (VCIR) model.
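To make the SVIE (6) concrete, the following Python sketch simulates a one-dimensional Volterra Vasicek path with the fractional kernel of Table 1 (taking $c = 1$). It is only an illustration under stated assumptions: the left-point Euler discretization, the function names, and the parameter values are hypothetical choices and are not taken from the paper.

```python
import math
import numpy as np


def fractional_kernel(t, alpha):
    """Fractional kernel K(t) = t^(alpha-1) / Gamma(alpha) (Table 1 with c = 1)."""
    return np.power(t, alpha - 1.0) / math.gamma(alpha)


def simulate_volterra_vasicek(x0, lam, theta, sigma, alpha, T, n_steps, rng):
    """Left-point Euler sketch of the one-dimensional SVIE (6) with
    b(x) = lam * (theta - x) and constant sigma (an assumed VV specification).

    Every step re-sums the whole past path, which is how the scheme
    reflects the non-Markovian nature of the process.
    """
    dt = T / n_steps
    t = np.linspace(0.0, T, n_steps + 1)
    dW = rng.normal(0.0, math.sqrt(dt), size=n_steps)  # Brownian increments
    x = np.empty(n_steps + 1)
    x[0] = x0
    for i in range(1, n_steps + 1):
        k = fractional_kernel(t[i] - t[:i], alpha)      # K(t_i - t_j), j < i
        drift = lam * (theta - x[:i]) * dt
        x[i] = x0 + np.sum(k * (drift + sigma * dW[:i]))
    return t, x


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # alpha = H + 1/2; alpha = 1 recovers the Markovian Vasicek model.
    t, x = simulate_volterra_vasicek(0.02, 0.5, 0.03, 0.005, 1.2, 10.0, 500, rng)
    mu = 0.0 + 1.0 * x  # mortality rate (5) with m(t) = 0 and eta = 1
    print(mu[-1])
```

The cost is quadratic in the number of steps because the kernel weights change with every $t_i$. With `alpha = 1.0` the weights are all one and the recursion collapses to the usual Euler scheme for the Vasicek SDE.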

Although we focus on mortality modeling, actuarial valuation requires specifying the dynamics of the risk-free interest rate. We simply adopt a Markov affine model for the interest rate. Specifically, we adopt the short rate process $r$ that satisfies $\int_0^t |r_s|\,ds < \infty$ for $t \ge 0$, and we define the return of a riskless asset as $\exp(\int_0^t r_s\,ds)$ for a unit dollar investment at time 0. In addition, the interest rate process is driven by the Markov affine process $Z$ in $\mathbb{R}^k$:

$$dZ_t = \tilde{b}(Z_t)\,dt + \tilde{\sigma}(Z_t)\,dW'_t, \tag{7}$$

where $W'$ is a $k$-dimensional standard Brownian motion. The coefficients $\tilde{b}(Z_t)$ and $\tilde{a}(Z_t) = \tilde{\sigma}(Z_t)\tilde{\sigma}^\top(Z_t)$ have affine dependence on $Z_t$ once they satisfy Definition 1 with the dimension $d$ replaced by $k$. Hence, the Markov affine feature coincides with the definition of a Markov affine process in Duffie et al. (2003). Furthermore, the short rate $r_t \doteq r(t, Z_t) = \lambda_0(t) + \lambda_1(t) \cdot Z_t$ is an affine function of $Z_t$, with coefficients $\lambda_0(t)$ and $\lambda_1(t)$ being bounded continuous functions on $[0, \infty)$. By the affine processes in Duffie et al. (2003) and Filipović (2005), at time $t$, we have

$$B(t, T) = E\left[ e^{-\int_t^T r(s, Z_s)\,ds} \,\middle|\, \mathcal{F}_t \right] = e^{\tilde{\alpha}(t,T) + \tilde{\beta}(t,T) \cdot Z_t}, \tag{8}$$

where the functions $\tilde{\alpha}(\cdot, T)$ and $\tilde{\beta}(\cdot, T)$ are uniquely solved from the ordinary differential equations (ODEs) in Appendix A with boundary conditions $\tilde{\alpha}(T, T) = 0$ and $\tilde{\beta}(T, T) = 0$. If the interest rate model in (7) is defined under the pricing measure, i.e., $\mathbb{P} = \mathbb{Q}$, then the quantity $B(t, T)$ represents the price of a unit zero-coupon bond.

We demonstrate the tractability of the proposed Volterra mortality model in actuarial valuation. Specifically, we derive closed-form solutions to the survival probability and prices of some standard life insurance products. The following theorem is the building block of the actuarial valuation.

Theorem 1. If the mortality rate $\mu_t$ follows (5) and (6) and has the affine structure specified in Definition 1, then, for any constants $c_1$ and $c_2$ and $T > t$, we have

$$E\left[ e^{-\int_t^T \mu_s\,ds} (c_1 + c_2 \mu_T) \,\middle|\, \mathcal{F}^X_t \right] = c_1\, g(t, T) - c_2\, \frac{\partial g(t, T)}{\partial T}, \tag{9}$$

where

$$g(t, T) = e^{-\int_0^T m(s)\,ds}\, e^{\int_0^t \mu_s\,ds} \exp(Y_t(T)),$$

$$Y_t(T) = Y_0(T) + \int_0^t \psi(T - s)\,\sigma(X_s)\,dW_s - \frac{1}{2}\int_0^t \psi(T - s)\,a(X_s)\,\psi(T - s)^\top\,ds, \tag{10}$$

$$Y_0(T) = \int_0^T \left( -\eta X_0 + \psi(s)\,b(X_0) + \frac{1}{2}\psi(s)\,a(X_0)\,\psi(s)^\top \right) ds,$$

and $\psi \in L^2([0, T], \mathbb{C}^d)$ solves the Riccati-Volterra equation:

$$\psi = \left( -\eta + \psi B + \frac{1}{2} A(\psi) \right) * K, \tag{11}$$

with $A(\cdot)$ appearing in Definition 1. In addition, $Y$ has an alternative expression:

$$Y_t(T) = -\eta \int_0^T E[X_s \mid \mathcal{F}_t]\,ds + \frac{1}{2}\int_t^T \psi(T - s)\,a(E[X_s \mid \mathcal{F}_t])\,\psi(T - s)^\top\,ds, \tag{12}$$

where

$$E[X_T \mid \mathcal{F}_t] = \left( \mathrm{id} - \int_0^T R_B(s)\,ds \right) X_0 + \int_0^T E_B(T - s)\,b^0(s)\,ds + \int_0^t E_B(T - s)\,\sigma(X_s)\,dW_s, \tag{13}$$

with $\mathrm{id}$ being the identity matrix, $R_B$ the resolvent of $-KB$, and $E_B = K - R_B * K$.

Proof. See Appendix A.
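As a numerical illustration of Theorem 1, the sketch below solves the Riccati-Volterra equation (11) in the one-dimensional VV case and evaluates the time-0 survival probability $g(0, T) = \exp(Y_0(T))$ for $m \equiv 0$. In this case $a(x) = \sigma^2$ is constant, so $A^1 = 0$, and with the assumed drift $b(x) = \lambda(\theta - x)$ we have $B = -\lambda$, which makes (11) linear: $\psi = (-\eta - \lambda\psi) * K$. The product-integration scheme, the fractional kernel with $c = 1$, and all names and parameter values are assumptions made for this example, not the authors' implementation.

```python
import math
import numpy as np


def solve_riccati_volterra_vv(eta, lam, alpha, T, n_steps):
    """Solve psi = (-eta - lam * psi) * K on [0, T] for the fractional
    kernel K(t) = t^(alpha-1) / Gamma(alpha).

    The integrand is frozen at the left end of each cell while the kernel
    is integrated exactly over the cell (explicit product integration).
    """
    t = np.linspace(0.0, T, n_steps + 1)
    psi = np.zeros(n_steps + 1)
    for i in range(1, n_steps + 1):
        # exact integral of K(t_i - s) over [t_j, t_{j+1}] for j < i
        w = ((t[i] - t[:i]) ** alpha
             - (t[i] - t[1:i + 1]) ** alpha) / math.gamma(alpha + 1.0)
        psi[i] = np.sum(w * (-eta - lam * psi[:i]))
    return t, psi


def survival_probability_t0(x0, eta, lam, theta, sigma, alpha, T, n_steps=4000):
    """Time-0 survival probability exp(Y_0(T)) from Theorem 1, assuming m = 0."""
    t, psi = solve_riccati_volterra_vv(eta, lam, alpha, T, n_steps)
    integrand = (-eta * x0
                 + psi * lam * (theta - x0)        # psi(s) * b(X_0)
                 + 0.5 * sigma ** 2 * psi ** 2)    # 0.5 * psi(s) a(X_0) psi(s)
    dt = T / n_steps
    y0 = dt * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))  # trapezoid
    return math.exp(y0)


if __name__ == "__main__":
    for alpha in (1.0, 1.2):  # alpha = 1 is the Markovian Vasicek case
        print(alpha, survival_probability_t0(0.02, 1.0, 0.5, 0.03, 0.005, alpha, 10.0))
```

For `alpha = 1.0` the weights reduce to the step size and the scheme becomes the explicit Euler method for the ordinary Riccati (here linear) equation, so the output can be checked against the classical Vasicek closed form.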

Remark 2. The partial derivative $\frac{\partial g(t,T)}{\partial T}$ does not admit a closed-form solution in general because the function $g(t, T)$ depends on $Y_t(T)$, which depends on $T$ through the $\psi$ solved from the Riccati-Volterra equation (11). Fortunately, the partial derivative appears in insurance products related to the death benefit through an integration. We can then avoid computing it by means of integration by parts.

We highlight that the expression in (10) implies that $Y_t(T)$ is a semimartingale, because all of the integrands in (10) are independent of $t$. This is important and interesting because it implies that insurance product prices can be expressed as SDEs even though the mortality rate with LRD cannot. This enables us to construct a hedging strategy for longevity risk using longevity securities in an LRD mortality environment, indicating the importance of longevity securitization. For the time being, we apply Theorem 1 to obtain the survival probability of the Volterra mortality model in closed form.

Corollary 1. (Survival Probability) Under the Volterra mortality model in (5), (6), and Definition 1, for any $t < T$, the survival probability reads

$$\mathbb{P}(\tau > T \mid \mathcal{F}_t) = E\left[ e^{-\int_t^T \mu_s\,ds} \,\middle|\, \mathcal{F}_t \right] = g(t, T) = e^{-\int_0^T m(s)\,ds + \int_0^t \mu_s\,ds} \exp(Y_t(T)), \tag{14}$$

where $Y_t(T)$ is defined in (10) or, equivalently, (12).

Proof. The result follows by taking $c_1 = 1$ and $c_2 = 0$ in Theorem 1.

The survival probability in Corollary 1 captures LRD because it depends on the whole historical path of the mortality rate. This is reflected in the terms $e^{-\int_0^T m(s)\,ds + \int_0^t \mu_s\,ds}$ and $Y_t(T)$. However, when comparing our survival probability with LRD with that of the corresponding Markovian mortality model, we find them consistent. Consider the case of the fractional kernel $K(t) = \frac{t^{\alpha-1}}{\Gamma(\alpha)}\,\mathrm{id}$, where $\alpha = H + 1/2$ and $H$ is the Hurst parameter. The process $X_t$ becomes

$$X_t = X_0 + \lambda \int_0^t \frac{(t - s)^{\alpha-1}}{\Gamma(\alpha)} (\theta - X_s)\,ds + \int_0^t \frac{(t - s)^{\alpha-1}}{\Gamma(\alpha)}\,\sigma(X_s)\,dW_s. \tag{15}$$

When $\alpha = 1$, $K(t) \equiv \mathrm{id}$ and

$$dX_t = \lambda(\theta - X_t)\,dt + \sigma(X_t)\,dW_t,$$

which is the Vasicek mortality rate model for a constant $\sigma(X_t)$ and the CIR model for $\sigma(X_t) = \sigma\sqrt{X_t}$. Both are investigated by Biffis (2005). In such a situation, a part of $Y_t(T)$ in (10) cancels with $\int_0^t \mu_s\,ds$, and the Riccati-Volterra equation (11) reduces to the ordinary Riccati equation. This makes our solution the same as those in Biffis (2005) for $\alpha = 1$ or $H = 1/2$. However, once $\alpha > 1$, the process $X_t$ has the LRD feature. The empirical study in Yan et al. (2018) shows that the survival probability is underestimated when LRD is not taken into account.

To streamline the presentation, we assume that mortality rates are independent of the interest rate. Although this assumption could be considered mathematically restrictive, it is a common assumption in the actuarial and insurance literature. Two basic payoffs in insurance contracts are the survival benefit and the death benefit.

Let $C_T$ be a bounded random payoff for a survivor at time $T$ independent of the mortality. The time-$t$ fair value of the survival benefit $\mathrm{SB}_t(C_T; T)$ of the terminal amount $C_T$, with $0 \le t \le T$, under the pricing measure $\mathbb{Q}$ is given by

$$\mathrm{SB}_t(C_T; T) = 1_{\{\tau > t\}}\, E^{\mathbb{Q}}\left[ e^{-\int_t^T r_s\,ds}\, C_T \,\middle|\, \mathcal{G}^Z_t \right] E^{\mathbb{Q}}\left[ e^{-\int_t^T \mu_s\,ds} \,\middle|\, \mathcal{G}^X_t \right]. \tag{16}$$

To draw some insights from (16), let us consider the situation in which the mortality model of (5) and (6) and the interest rate process of (7) are constructed under the pricing measure $\mathbb{Q}$ or, equivalently, that $\mathbb{P} = \mathbb{Q}$ in Section 2. We refer to the results obtained under such an assumption as the baseline case in this paper, for which the corresponding valuation becomes simple.

Proposition 1. (Survival Benefit: The Baseline Valuation.) If $\mathbb{P} = \mathbb{Q}$ and the mortality and interest rate are independent, then the Volterra mortality model of (5), (6), and Definition 1 and the affine interest rate model imply that

$$\mathrm{SB}_t(C_T; T) = 1_{\{\tau > t\}}\, B(t, T)\, E^{\mathbb{Q}^T}\left[ C_T \mid \mathcal{G}^Z_t \right] g(t, T),$$

where $g(t, T)$ is presented in Theorem 1, $B(t, T)$ is the zero-coupon bond price in (8), and $\mathbb{Q}^T$ is the forward pricing measure:

$$\left. \frac{d\mathbb{Q}^T}{d\mathbb{Q}} \right|_{\mathcal{F}_t} = \exp\left( -\frac{1}{2}\int_0^t \left| \tilde{\beta}(u, T)\,\tilde{\sigma}(Z_u) \right|^2 du - \int_0^t \tilde{\beta}(u, T)\,\tilde{\sigma}(Z_u)\,dW'_u \right).$$

Proof. By Corollary 1,

$$E^{\mathbb{Q}}\left[ e^{-\int_t^T \mu_s\,ds} \,\middle|\, \mathcal{G}^X_t \right] = g(t, T).$$

By the affine short-rate model (7) and Equation (8), we have

$$dB(t, T) = B(t, T)\,r_t\,dt - B(t, T)\,\tilde{\beta}(t, T)\,\tilde{\sigma}(Z_t)\,dW'_t,$$

which implies that

$$1 = B(T, T) = B(t, T)\, \exp\left( \int_t^T \left( r_u - \frac{1}{2}\left| \tilde{\beta}(u, T)\,\tilde{\sigma}(Z_u) \right|^2 \right) du - \int_t^T \tilde{\beta}(u, T)\,\tilde{\sigma}(Z_u)\,dW'_u \right).$$

Hence,

$$e^{-\int_t^T r_s\,ds} = B(t, T) \exp\left( -\frac{1}{2}\int_t^T \left| \tilde{\beta}(u, T)\,\tilde{\sigma}(Z_u) \right|^2 du - \int_t^T \tilde{\beta}(u, T)\,\tilde{\sigma}(Z_u)\,dW'_u \right).$$

An application of the Girsanov theorem shows that

$$E^{\mathbb{Q}}\left[ e^{-\int_t^T r_s\,ds}\, C_T \,\middle|\, \mathcal{G}^Z_t \right] = B(t, T)\, E^{\mathbb{Q}^T}\left[ C_T \mid \mathcal{G}^Z_t \right],$$

where the forward measure $\mathbb{Q}^T$ is presented in the Proposition.

Another important basic payoff is the death benefit. Let $C_t$ be a bounded $\mathcal{G}^Z$-predictable process, representing a cash flow stream independent of the mortality rate. Then, the time-$t$ fair value of the death benefit with a cash flow stream $C_t$, payable in case the insured dies before time $T$, with $0 \le t \le T$, is given by

$$\mathrm{DB}_t(C_\tau; T) = 1_{\{\tau > t\}} \int_t^T E^{\mathbb{Q}}\left[ e^{-\int_t^u r_s\,ds}\, C_u \,\middle|\, \mathcal{G}^Z_t \right] E^{\mathbb{Q}}\left[ e^{-\int_t^u \mu_s\,ds}\, \mu_u \,\middle|\, \mathcal{G}^X_t \right] du.$$

Then, we also have an explicit baseline valuation formula for the death benefit.
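The structure of the death benefit integral can be checked on a deliberately simple toy case. The sketch below evaluates the integral over $u$ by the trapezoid rule, given the two factors of the integrand as functions of $u$. The example inputs (a constant short rate, a constant mortality intensity, and a unit cash flow) are assumptions chosen so that a closed form is available; they are not the Volterra model, and the function names are hypothetical.

```python
import math
import numpy as np


def death_benefit(discounted_cash, death_density, t, T, n=2000):
    """Trapezoid-rule value of int_t^T discounted_cash(u) * death_density(u) du.

    discounted_cash(u) stands for E^Q[exp(-int_t^u r_s ds) C_u | G^Z_t] and
    death_density(u) for E^Q[exp(-int_t^u mu_s ds) mu_u | G^X_t]; the product
    form relies on the independence of mortality and interest rates.
    """
    u = np.linspace(t, T, n + 1)
    f = np.array([discounted_cash(s) * death_density(s) for s in u])
    du = (T - t) / n
    return du * (f.sum() - 0.5 * (f[0] + f[-1]))


if __name__ == "__main__":
    r, mu, t, T = 0.03, 0.02, 0.0, 10.0   # toy constants (assumed)
    value = death_benefit(lambda u: math.exp(-r * (u - t)),
                          lambda u: mu * math.exp(-mu * (u - t)),
                          t, T)
    closed_form = mu / (r + mu) * (1.0 - math.exp(-(r + mu) * (T - t)))
    print(value, closed_form)
```

In the Volterra setting the second factor equals $-\partial g(t, u)/\partial u$ by Theorem 1 with $c_1 = 0$ and $c_2 = 1$, which is generally not available in closed form; this is the reason an integration by parts is preferred in practice.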

Proposition 2. (Death Beneﬁt: The Baseline Valuation.) If P = Q and the mortalityand interest rate are independent, then the Volterra mortality model of (5) , (6) , andDeﬁnition 1 and the aﬃne interest rate model imply that DB t ( C T ; T ) = − { τ>t } Z Tt B ( t, u ) E Q u (cid:2) C u | G Zt (cid:3) ∂g ( t, u ) ∂u du, where Y t ( u ) is deﬁned in (10) , B ( t, T ) in (8) , g ( t, T ) in Theorem 1, and the forwardpricing measure Q u in Proposition 1.Proof. The proof is similar to that of Proposition 1 except for the second expectationappearing in the representation of DB t ( C τ ; T ). By Theorem 1, it is clear that E Q h e − R ut µ s ds µ u (cid:12)(cid:12)(cid:12) G Xt i = − ∂g ( t, u ) ∂u . Applying integration by parts to DB in Proposition 2 yields an alternative expression:DB t ( C T ; T ) = − { τ>t } (cid:26) B ( t, T ) E Q T (cid:2) C T | G Zt (cid:3) g ( t, T ) − E Q t (cid:2) C t | G Zt (cid:3) (17) − Z Tt ∂ (cid:0) B ( t, u ) E Q u (cid:2) C u | G Zt (cid:3)(cid:1) ∂u g ( t, u ) du (cid:27) . In this way, as the interest rate model follows the Markovian aﬃne model, the partialderivative term in (17) admits a closed-form solution in many cases and we get rid of theneed to compute a T -partial derivative of g ( t, T ), which is rather more complicated. These formulas for survival and death beneﬁts may still be considered abstract, so weapply them to some concrete insurance or pension products. ongevity Bond : Consider a unit zero-coupon longevity bond which pays $1 times e − R Tt µ s ds , the percentage of survivors in a population during t to T . Blake et al. (2006)show that the longevity bond takes the form B L ( t, T ) = E Q h e − R Tt r s + µ s ds (cid:12)(cid:12)(cid:12) F t i . 
Under the Volterra mortality model with LRD, Proposition 1 immediately implies that B L ( t, T ) = B ( t, T ) g ( t, T ) , by setting C T ≡ Annuity : Consider a t ′ -years deferred annuity involving a continuous payment of anindexed beneﬁt from time t onwards, conditional on survival of the policyholder at thattime. Suppose that the payoﬀ is made of a unit amount each year. Denote x ∗ as themaximum age humans can live. The fair value of such an annuity is given byAN t ( t ′ ) = x ∗ − t − X h = t ′ SB t (1; t + h ) = x ∗ − X T = t + t ′ B ( t, T ) g ( t, T )= x ∗ − X T = t + t ′ e e α ( t,T )+ e β ( t,T ) Z t e − R T m ( s ) ds + R t µ s ds exp( Y t ( T )) , (18)where Y t ( T ) is deﬁned in (10) and e α ( t, T ) and e β ( t, T ) are as in (8). Assurances:

Consider an assurance guaranteeing a unit amount beneﬁt in case ofdeath in the period ( t, T ]. By setting C ≡ t ( T ) = 1 − B ( t, T ) g ( t, T ) + Z Tt ∂ B ( t, u ) ∂u g ( t, u ) du, where B ( t, T ) is deﬁned in (8) and g ( t, T ) in Theorem 1. Endowment:

Consider an endowment given the survival on time t with maturity time T , which includes a survival beneﬁt C given the survival on time T and a death beneﬁt C in case of the death in the period ( t, T ]. C and C are constants. By Propositions 1and 2 and (17), the fair value of such an endowment is given byEN Tt ( C , C ) = SB t ( C ; T ) + DB t ( C ; T )= ( C − C ) B ( t, T ) g ( t, T ) + C Z Tt ∂ B ( t, u ) ∂u g ( t, u ) du ! , where B ( t, T ) is deﬁned in (8) and g ( t, T ) in Theorem 1. Although Propositions 1 and 2 facilitate the model development under the pricing measureand the calibration to market prices of insurance products, an insurance practice may nothave suﬃcient market prices for such calibration. In addition, risk management requiresthe connection between the physical and pricing measures as demonstrated in the next ection. Therefore, we present two possible ways to link the measures of P and Q withlimited observed prices. For the time being, we focus on the situation in which the Volterramortality model is estimated using a historical mortality table and hence built under thephysical measure P = Q .The ﬁrst approach commonly used to identify a pricing measure in the actuarial liter-ature is the Esscher transform. Chuang and Brockett (2014) apply the Esscher transformto the mortality rate to ﬁnd a related martingale measure for pricing longevity derivatives.Wang et al. (2019) also use the Esscher transform for pricing longevity derivatives basedon an improved Lee–Carter model. Although the mortality rate µ t is non-Markovianand non-semimartingale under our framework, the advantage is that we have an explicitLaplace-Fourier functional representation in Theorem 1. For a random variable γ witha well-deﬁned moment-generating function (MGF) under P , an equivalent probabilitymeasure Q ( θ ) derived from the Esscher transform with parameter θ is deﬁned as d Q ( θ ) d P = e θγ E [ e θγ ] . 
(19)By setting c = 1 and c = 0 in Theorem 1, the MGF for the random variable − R Tt µ s ds is well-deﬁned and can be obtained in an explicit form. Speciﬁcally, as weassume µ t = m ( t ) + ηX t , the MGF deﬁned as M ( θ T ) = E [ e − θ T R Tt µ s ds ] , which corresponds to the g ( t, T ) in Theorem 1 with the parameters m ( t ) and η replacedwith θ T m ( t ) and θ T η for the constant θ T and a ﬁxed T . For instance, we observe a risk-free zero coupon bond and a zero coupon longevity bond with the same maturity. Then,we can deduce the synthetic value of E Q ( θ T ) t [ e − R Tt µ s ds ] = E t [ e − ( θ T +1) R Tt µ s ds ] E t [ e − θ T R Tt µ s ds ] = M ( θ T + 1) M ( θ T ) . (20)Although the left-hand quantity is deduced from market prices, the M ( θ T ) achieves aclosed-form solution from our model through Theorem 1. Speciﬁcally, M ( θ ) is the g ( t, T )in Theorem 1 with m ( t ) and η replaced with θm ( t ) and θη , respectively. One can thencalibrate θ T to the term structure of longevity bonds, or longevity bond prices for diﬀerentmaturity T , after estimating the physical model parameters, including the LRD feature,using historical data.From (20), when θ T = 0, the longevity bond is priced under P and our previousvaluation formulas hold. For a nonzero θ T , a slight adjustment can be made through (20)as the MGF is explicitly known. Although the Esscher transform provides us with a powerful and convenient frameworkto identify a pricing measure, it does not oﬀer us an explicit stochastic process under the ricing measure. When we perform a risk management strategy, we need the stochasticprocess of the mortality rate under both P and Q . It is desirable that the Volterra mortalitymodel retains the aﬃne nature in Deﬁnition 1. Therefore, we propose the following aﬃneretaining transform based on the Girsanov theorem. Deﬁnition 2.

Given an aﬃne SIVE of (6) satisfying Deﬁnition 1, an aﬃne retainingtransform for measure change is based on shifting the Wiener process as follows: dW Q t = dW t − σ ( X t ) ⊤ ϕ ( t ) dt, for a deterministic function ϕ ( t ) ∈ R d satisfying E t h e R T | σ ( X t ) ⊤ ϕ ( t ) | dt i < ∞ . (21)Under Deﬁnition 2, we identify a pricing measure Q equivalent to P : d Q d P = e − R t | σ ( X s ) ⊤ ϕ ( s ) | ds + R t ϕ ( s ) ⊤ σ ( X s ) dW s , where ϕ ( t ) is calibrated to observed prices. In addition, the mortality process µ t = m ( t ) + ηX t in (6) under Q has the X t changed to X t = X + Z t K ( t − s )( b ( X s ) + a ( X s ) ϕ ( s )) ds + Z t K ( t − s ) σ ( X s ) dW Q s , (22)where b ( X s ) + a ( X s ) ϕ ( s ) and a ( X s ) still satisfy the aﬃne nature in Deﬁnition 1. Hence,the pricing formulas of Propositions 1 and 2 remain the same except that the b ( X s ) isreplaced with b ( X s ) + a ( X s ) ϕ ( s ) once the aﬃne retaining transform in Deﬁnition 2 isadopted. Remark 3.

Although the Esscher and affine retaining transforms presented in Sections 3.2 and 3.3 are applied to the Volterra mortality model, these techniques have been widely used in the actuarial science literature, including the measure change with the affine interest rate models. Therefore, we do not repeat the detailed case for the interest rate. We mention them to highlight the advantage of the proposed LRD mortality model in the sense of calibrating to the pricing measure.
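The calibration step around (20) can be sketched numerically as follows. This is only an illustration under stated assumptions: the Gaussian function `mgf_gaussian` is a toy stand-in for $M(\theta)$ and is not the closed-form MGF from Theorem 1, and the function names, the search bracket, and the sample values are hypothetical. The sketch finds the $\theta_T$ for which $M(\theta_T+1)/M(\theta_T)$ matches a market-implied survival value.

```python
import math


def mgf_gaussian(theta, mean, var):
    """Toy stand-in for M(theta) = E[exp(-theta * I)] with I ~ N(mean, var).

    Assumption for illustration only; the paper's M(theta) comes from Theorem 1.
    """
    return math.exp(-theta * mean + 0.5 * theta ** 2 * var)


def esscher_survival(theta, mgf):
    """Right-hand side of (20): M(theta + 1) / M(theta)."""
    return mgf(theta + 1.0) / mgf(theta)


def calibrate_theta(target, mgf, lo=-50.0, hi=50.0, tol=1e-12):
    """Bisection for theta such that M(theta + 1) / M(theta) equals target.

    Assumes the ratio is monotone in theta on [lo, hi] and the root is bracketed.
    """
    f_lo = esscher_survival(lo, mgf) - target
    f_hi = esscher_survival(hi, mgf) - target
    if f_lo * f_hi > 0:
        raise ValueError("target is not bracketed on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = esscher_survival(mid, mgf) - target
        if f_lo * f_mid <= 0:
            hi = mid            # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # root lies in [mid, hi]
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    toy_mgf = lambda th: mgf_gaussian(th, mean=0.3, var=0.01)
    market_value = esscher_survival(2.5, toy_mgf)  # pretend this is observed
    print(calibrate_theta(market_value, toy_mgf))
```

In practice the toy MGF would be replaced by the model's own $M(\theta)$, and the target by the ratio of observed longevity bond and risk-free bond prices for the chosen maturity.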

We further investigate optimal hedging with the proposed LRD mortality model, as hedging is a typical risk management task. The intent is to demonstrate the tractability of the LRD mortality model in hedging problems. As hedging should be performed under the physical probability measure $P$, whereas longevity securities such as longevity bonds and swaps are valued under the market-implied pricing measure $Q$, we adopt the affine retaining transform detailed in Section 3.3 to bridge the two probability measures in this section.

Let us sketch the conceptual framework prior to detailing the mathematics. As insurance product prices under the Volterra mortality model are semimartingales and hence can be expressed as SDEs, the insurer's wealth also satisfies an SDE with stochastic coefficients, which are possibly non-Markovian. According to stochastic control theory, the insurer's wealth plays the role of the state process. Therefore, the theory of backward SDE (BSDE) is useful for solving the stochastic optimal control problem for a state process with stochastic coefficients. Typically, the mean-variance (MV) hedging problem is closely related to the linear-quadratic (LQ) control problem under the classic formulation of the BSDE approach. In the following, we leverage this well-received theoretical result to show the application of the LRD mortality model, though the optimal hedging derived is novel and has remarkable performance in reducing risk with the LRD mortality. The performance is, however, shown in the next section numerically.

Consider an insurer offering a pension scheme who wants to hedge the longevity risk using a longevity security. Specifically, the insurer allocates her capital among a bank account, a risk-free zero-coupon bond, and a zero-coupon longevity bond. Let us concentrate on the one-dimensional case so that $d = k = 1$ from now on. To simplify the discussion, we adopt the VV mortality rate and assume $m(t) = 0$ and $\eta = 1$ in (5).
In other words, µ ( t ) = X ( t ) and µ t = X t = X + Z t K ( t − s )( b − b X s ) ds + Z t K ( t − s ) σ µ dW s , (23)where b , b , and σ µ are constants and K is the Volterra kernel. In addition, the interestrate r t = Z t follows the Vasicek model: dr ( t ) = ( e b − e b r t ) dt + σ r dW ′ t , (24)where e b , e b , and σ r are constant parameters. W t and W ′ t are independent Wiener pro-cesses under P . Let W ( t ) = ( W t , W ′ t ) ⊤ . Using the aﬃne retaining transform in Deﬁnition2, the Weiner process under the pricing measure is given by dW Q t = dW t − σ µ ϕ ( t ) σ µ dt, dW ′ t Q = dW ′ t − σ r ϑ ( t ) σ r dt, where ϑ and ϕ are deterministic functions satisfying the condition (21). Under the pricingmeasure, the mortality and interest rates are, respectively, X t = X + Z t K ( t − s )( b + ϕ ( s ) σ µ − b X s ) ds + Z t K ( t − s ) σ µ dW Q s ; dr ( t ) = ( e b + ϑ ( t ) σ r − e b r t ) dt + σ r dW ′ t Q . As the unit zero coupon bond price takes the form B ( t, T ) = E Q h e − R Tt r ( s ) ds (cid:12)(cid:12)(cid:12) F t i = e e α ( t,T )+ e β ( t,T ) r t , ith e α ( t, T ) and e β ( t, T ) as deﬁned in Appendix A, the P -dynamics of the bond reads d B ( t, T ) = B ( t, T )( r ( t ) + ν B ( t )) dt + B ( t, T ) σ b ( t ) dW ′ t , where ν B = ϑ ( t ) σ b ( t ) and σ b ( t ) = − e β ( t, T ) σ r . Similarly, using the expression for a zerocoupon longevity bond, i.e., B L ( t, T ) = E Q h e − R Tt r ( s )+ µ ( s ) ds (cid:12)(cid:12)(cid:12) F t i = B ( t, T ) e R t µ ( s ) ds exp( Y t ( T )) , where Y t ( T ) is equivalent to the Y t ( T ) in (10) with b ( x ) = b + ϕ ( s ) σ µ − b x , σ ( x ) = σ µ ,and W replaced by W Q , we obtain the P -dynamics of the longevity bond prices as follows: d B L ( t, T ) = B L ( t, T )( r ( t ) + µ ( t ) + ν L ( t )) dt + B L ( t, T ) σ l ( t ) dW t + B L ( t, T ) σ b dW ′ t , where ν L = ν B + ϕ ( t ) σ l , σ l = − ψ ( T − t ) σ µ , and ψ ∈ L ([0 , T ] , C ) is the solution ofthe Riccati equation ψ = ( − − b ψ ) ∗ K . 
As an investment amount of B L ( t, T ) in thelongevity bond at time t becomes e − R τt µ ( s ) ds B L ( τ, T ) at τ > t , the value of holding oneunit of zero coupon longevity bond B L ( t ) satisﬁes d B L ( t, T ) = B L ( t, T )( r ( t ) + ν L ( t )) dt + B L ( t, T ) σ l ( t ) dW t + B L ( t, T ) σ b dW ′ t . (25)The quantities ν L − ν B and ν B are often known as the market prices of mortality andinterest rate risks, respectively. From (25), the zero coupon longevity bond price stillsatisﬁes a SDE due to the semimartingale nature of Y t ( T ). This fact enables us to dealwith the optimal hedging problem with a LRD mortality rate. Note that the LRD featureis reﬂected by the volatility term of B L ( t ) through a Riccati-Volterra equation.Let u ( t ), u ( t ), and u ( t ) denote the investment amounts in the bank account, zero-coupon longevity bond, and zero-coupon bond respectively. Denote ˜ N ( t ) as a stochasticPoisson process with intensity k µ ( t ) and { z i } ∞ i =1 as independent identically distributed(iid) insurance claims. Consider a hedging horizon of T < T . Then, the wealth processof the insurer reads M ( t ) = u ( t ) + u ( t ) + u ( t ) − ˜ N ( t ) X i =1 z i − Π( t ) , t ∈ [0 , T ] , (26)where Π = R t π ( s ) ds , t ∈ [0 , T ], and π ( t ) is a F t -adapted, square integrable processrepresenting the pension annuity net cash outﬂow. We denote the ﬁltration generated by { M ( s ) : 0 ≤ s ≤ t } by ˜ H t ⊇ F t . The insurer’s wealth M ( t ) satisﬁes the following SDE: dM ( t ) = ( M ( t ) r ( t ) + u ( t ) ⊤ ν ( t ) − π ( t )) dt + u ( t ) ⊤ σ S ( t ) ⊤ d W ( t ) − zd ˜ N ( t ) , (27)where z has the same distribution as z , u ( t ) = ( u ( t ) , u ( t )) ⊤ , ν ( t ) = ( ν L ( t ) , ν B ( t )) ⊤ ,and σ S ( t ) ⊤ = σ l σ b σ b ! . If a hedging strategy u ( t ) is a F t -adapted process and E [ R T | u ( s ) | ds ] < ∞ , then itis said to be admissible. We denote the set of admissible controls as U . eﬁnition 3. 
The classic mean-variance (MV) hedging problem is deﬁned as V ( φ ) = min u ( · ) ∈U Var( M ( T )) − φ E [ M ( T )] , (28) where the parameter φ measures the insurer’s risk averseness. When φ = 0, problem (28) refers to the minimum-variance hedging. For any given¯ M = E [ M ( T )], E [( M ( T ) − ¯ M ) ] − φ E [ M ( T )] = E [( M ( T ) − ( ¯ M + φ ] − φ M − φ . In addition, the MV hedging problem can be embedded into a target-based objective.Speciﬁcally, the problem (28) is equivalent tomin ¯ M ∈ R min u ( · ) ∈U E [( M ( T ) − c ) ] − φ M − φ , (29)where c = ¯ M + φ . The inner minimization problem there refers to a target-based objectivethat aims to make the wealth close to the target c . Let π ( t ) = k e − R t µ ( s ) ds and Σ( t ) = σ S ( t ) ⊤ σ S ( t ). To solve the optimal hedging problem,we introduce two additional probability measures: d ˆ P d P = e − R t ξ ( s ) ⊤ d W ( s ) − | ξ ( s ) | ds , d ´ P d P = e − R t ζ ( s ) ⊤ d W ( s ) − ζ ( s ) ⊤ ζ ( s ) ds with ξ ( t ) = (2 ϕ ( t ) , ϑ ( t )) ⊤ and ζ ( t ) = ( ϕ ( t ) , ϑ ( t )) ⊤ . By the Girsanov theorem, ˆ W t , W t + R t ξ ( s ) ds and ´ W t , W t + R t ζ ( s ) ds are Wiener processes under ˆ P and ´ P , respectively.Denote ˆ E [ · ] and ´ E [ · ] as expectations under ˆ P and ´ P , respectively. By Theorem 1,´ E h e − R s µ τ dτ (cid:12)(cid:12)(cid:12) ˜ H t i = exp( Y t ( T )) , where Y t ( T ) is equivalent to the Y t ( T ) in (10) with b ( x ) = b − ϕ ( s ) σ µ − b x , σ ( x ) = σ µ ,and W replaced by ´ W ; ´ E [ µ s | ˜ H t ] = ´ E [ X s | ˜ H t ] is equivalent to E [ X s |F t ] as deﬁned in (13)with B = − b , b ( s ) replaced by b − ϕ ( s ) σ µ , and W replaced by ´ W . 
In addition, wehave the following expressions.ˆ E h e − R T t r ( s ) ds (cid:12)(cid:12)(cid:12) F t i = exp( α ( t, T ) + β ( t, T ) r ( t )) , (30)´ B ( t, s ) = ´ E h e − R st r ( u ) du (cid:12)(cid:12)(cid:12) F t i = exp( α ( t, s ) + β ( t, s ) r ( t )) , (31)where α ( t, T ), β ( t, T ), α ( t, s ), and β ( t, s ) solve the ODEs in Appendix A. Thefollowing theorem provides the optimal hedging strategy. Theorem 2.

Consider two stochastic processes P ( t ) = e − R T t ϑ ( s )+ ϕ ( s ) ds ˆ E h e − R T t r ( s ) ds (cid:12)(cid:12)(cid:12) F t i (32) nd Q ( t ) = − P ( t )[ Q ( t ) + c ´ B ( t, T )] , (33) where Q ( t ) = Z T t ´ B ( t, s )( k E [ z ]´ E [ µ s | ˜ H t ] + k ´ E [ e − R s µ τ dτ | ˜ H t ]) ds, ´ B ( t, s ) = ´ E h e − R st r ( u ) du (cid:12)(cid:12)(cid:12) F t i , ≤ t ≤ s. Once dP ( t ) = µ P ( t ) dt + η ⊤ d W ( t ) and dQ ( t ) = µ Q ( t ) dt + η ⊤ d W ( t ) (34) under P , the inner minimization problem in (29) has an optimal feedback control: u ∗ c ( t ) = − Σ( t ) − (cid:20)(cid:18) ν ( t ) + σ S ( t ) ⊤ η ( t ) P ( t ) (cid:19) M ( t ) + Q ( t ) ν ( t ) + σ S ( t ) ⊤ η ( t ) P ( t ) (cid:21) . (35) In addition, the optimal objective value is P (0)( M (0) + Q (0) P (0) ) + I (0) , where I ( t ) = E " Z T t P ( µz + (cid:18) η − Qη P (cid:19) ⊤ σ ⊥ (cid:18) η − Qη P (cid:19)) ( s ) ds (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ˜ H t (36) in which σ ⊥ = id − σ S ( t )Σ( t ) − σ S ( t ) ⊤ .Proof. See Appendix B.

Proposition 3.

Then, the diﬀusion coeﬃcients in (34) are η = (0 , η ) ⊤ , where η = − P ( t ) β ( t, T ) σ r and η = ( η , η ) ⊤ in which η = − P ( t ) Z T t ´ B ( t, s ) (cid:16) k E [ z ] E B ( s − t ) σ µ + k ´ E h e − R s µ τ dτ (cid:12)(cid:12)(cid:12) ˜ H t i ψ ( s − t ) σ µ (cid:17) ds,η = − P ( t ) (cid:26) Z T t ´ B ( t, s ) (cid:16) k E [ z ]´ E [ µ s | ˜ H t ] + k ´ E h e − R s µ τ dτ (cid:12)(cid:12)(cid:12) ˜ H t i(cid:17) β ( t, s ) σ r ds + c ´ B ( t, T ) β ( t, T ) σ r (cid:27) + P ( t )[ Q ( t ) + c ´ B ( t, T )] β ( t, T ) σ r , (37) where β ( t, T ) is deﬁned in (30) , β ( t, s ) in (31) , E B in Theorem 1 with B = − b , and ψ ∈ L ([0 , s ] , C ) solves the Riccati equation ψ = ( − − ψ b ) ∗ K . Proposition 4.

The optimal hedging strategy u ∗ ( t ) = ( u ∗ ( t ) , u ∗ ( t )) ⊤ to problem (28) isgiven by u ∗ ( t ) = − σ l ( t ) (cid:26)(cid:20) M ( t ) − Q ( t ) − (cid:18) ¯ M ∗ + φ (cid:19) ´ B ( t, T ) (cid:21) ϕ ( t ) + η ( t ) P ( t ) (cid:27) , (38) u ∗ ( t ) = − σ b ( t ) (cid:26)(cid:20) M ( t ) − Q ( t ) − (cid:18) ¯ M ∗ + φ (cid:19) ´ B ( t, T ) (cid:21) ϑ ( t ) + M ( t ) η ( t ) + η ( t ) P ( t ) (cid:27) − u ∗ ( t ) , (39) where ¯ M ∗ = φ (1 − P (0) ´ B (0 , T )) + P (0) ´ B (0 , T )( M (0) − Q (0)) P (0) ´ B (0 , T ) . he explicit optimal hedging strategy in Proposition 4 incorporates the LRD featurethrough η which depends on the mortality rate path and the kernel K as shown in Propo-sition 3. In addition, the Hurst parameter is contained in the kernel function K . In this section, we numerically examine the impact of long-range dependence on theprices of insurance products and the hedging eﬀectiveness. To do so, we contrast theLRD mortality model with its Markovian counterpart. For the latter case, the Hurstparameter H is set to 1/2. As the LRD appears when H > /

2, we examine the effect when $H$ falls into this range.

As the basic quantity, we begin with the survival probability. Under the Volterra mortality model, we assume that process $X$ satisfies Equation (15), which is a Volterra type of Vasicek model. The Vasicek model is a special case with $\alpha = 1$ or $H = 1/2$. We compare the Vasicek and VV mortality models using two different values of $H$ while the other parameters are kept constant. It is empirically estimated by Yan et al. (2018) that $H$ is around 0.83 for mortality data. Thus, we choose an $\alpha$ of 1.33 for the VV mortality model. Table 2 summarizes the remaining parameters used in this numerical study. The parameters chosen have similar magnitudes to those in Biffis (2005) for the case of the Markovian model.

Projection   α      m(t)    η     λ     θ        σ      t    X_0
A            1.33   SIM92   0.2   0.5   0.0009   0.01   40   0.001
B            1      SIM92   0.2   0.5   0.0009   0.01   40   0.001

Table 2: Parameter values for the mortality model
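The parameter choices above are consistent with the relation $\alpha = H + 1/2$ (so $H = 0.83$ gives $\alpha = 1.33$ and $H = 1/2$ gives $\alpha = 1$). A minimal sketch of this mapping, together with the fractional kernel $K(t) = t^{\alpha-1}/\Gamma(\alpha)$ used in the hedging study, is given below. The function names are hypothetical, and the relation $\alpha = H + 1/2$ is inferred from the quoted values rather than stated explicitly in this section.

```python
import math


def alpha_from_hurst(hurst):
    """Map a Hurst parameter H to the kernel exponent alpha.

    Assumes alpha = H + 1/2, as suggested by the pairs (0.83, 1.33) and (0.5, 1).
    """
    return hurst + 0.5


def fractional_kernel(t, alpha):
    """Fractional kernel K(t) = t**(alpha - 1) / Gamma(alpha) for t > 0."""
    if t <= 0:
        raise ValueError("t must be positive")
    return t ** (alpha - 1.0) / math.gamma(alpha)


if __name__ == "__main__":
    for label, hurst in (("A (LRD)", 0.83), ("B (Markovian)", 0.5)):
        alpha = alpha_from_hurst(hurst)
        values = [fractional_kernel(t, alpha) for t in (0.5, 1.0, 5.0)]
        print(label, alpha, values)
```

With $\alpha = 1$ the kernel is identically 1, which recovers the Markovian Vasicek case; with $\alpha > 1$ the kernel grows with $t$, so older shocks keep influencing the current state.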

Remark 4.

The SIM92 in Table 2 is a dataset from the Italian National Institute of Statistics (ISTAT) which reports Italian population life tables. SIM92 is usually employed to price assurance. Such a setting for $m(t)$ has been adopted in Biffis (2005). Specifically, after fixing the other parameter values, $m(t)$ is calibrated to fit the SIM92 table, so the functional form of $m(t)$ is not explicitly shown here.

Although parameter values are assigned in this numerical experiment, we stress that, in reality, the parameters can be calibrated to observed prices of actuarial products using the set of closed-form pricing formulas derived in this paper. In addition, the parameter $\theta$ in (14) or $b$ in Definition 1 can be set as a bounded measurable function of time $t$ rather than a constant as in our example.

In Table 2, the symbol $t$ stands for the age group. For example, when we set $t = 40$, it corresponds to a group of the survival population at the age of 40. In Figures 1(a) and 2(a), we simulate two different sample paths of $X$ for this group of individuals over the time interval $[0, t]$. Under the VV mortality model, the historical sample paths of $X$ affect the estimated survival probability, whereas under the Vasicek model they do not, owing to its Markovian nature. Given the parameters in Table 2 and (14), we directly calculate survival probabilities from the two models. By (14) and Theorem 1,

$$P(\tau > T \mid \mathcal{F}_t) = e^{-\int_t^T m(s)\,ds} \exp\left(-\eta \int_t^T E[X_s \mid \mathcal{F}_t]\,ds + \frac{1}{2}\int_t^T \psi(T-s)\, a\big(E[X_s \mid \mathcal{F}_t]\big)\, \psi(T-s)^\top ds\right), \qquad (40)$$

for $T > t$. Under the Vasicek mortality model, the survival probability depends only on $X_t$ ($t = 40$) as $E[\mu_s \mid \mathcal{F}_t] = E[\mu_s \mid \mu_t]$. However, under the VV mortality model, the expression of $E[\mu_s \mid \mathcal{F}_t]$ given in (13) depends on the whole historical path of $X$.
Based on the simulated sample paths, we calculate the survival probabilities for the interval $T \in [t, x^*]$, where we set the maximum age at $x^* = 109$.

Figures 1(b) and 2(b) show the survival probabilities that correspond to the historical records in Figures 1(a) and 2(a), respectively. The solid line is the survival probability curve with LRD and the dashed line is that of the Markovian model. Depending on the historical record, the LRD survival probability can be higher or lower than the Markovian survival probability. This indicates that the historical sample path has an impact on the survival probability when LRD is present. The effect is more pronounced for the middle age group. This is reasonable because the young age group has a shorter historical record and the old age group may be restricted by the human age limit. This kind of middle-age effect may have a significant effect on insurance pricing. We further examine it with a concrete insurance product.
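For the Markovian benchmark (projection B, $\alpha = 1$, kernel $K \equiv 1$), formula (40) can be checked numerically against the classical Vasicek closed form. The sketch below assumes $m \equiv 0$, a constant $a = \sigma^2$, and a mean-reverting drift $\lambda(\theta - X)$, which is our reading of the parameters in Table 2 rather than a quotation of Equation (15). Under these assumptions $\psi(\tau) = -(\eta/\lambda)(1 - e^{-\lambda\tau})$ and $E[X_s \mid \mathcal{F}_t] = \theta + (X_t - \theta)e^{-\lambda(s-t)}$. The function names are hypothetical.

```python
import math


def survival_markov_numeric(x_t, tau, eta, lam, theta, sigma, n=20000):
    """Midpoint-rule evaluation of (40) with m = 0 and kernel K = 1."""
    h = tau / n
    mean_int = 0.0  # integral of E[X_s | F_t] over [t, T]
    var_int = 0.0   # integral of psi(T - s)^2 * sigma^2 over [t, T]
    for i in range(n):
        u = (i + 0.5) * h  # u = s - t
        mean_int += (theta + (x_t - theta) * math.exp(-lam * u)) * h
        psi = -(eta / lam) * (1.0 - math.exp(-lam * (tau - u)))
        var_int += psi * psi * sigma * sigma * h
    return math.exp(-eta * mean_int + 0.5 * var_int)


def survival_markov_closed(x_t, tau, eta, lam, theta, sigma):
    """Classical Vasicek closed form for E[exp(-eta * int X ds)]."""
    b = (1.0 - math.exp(-lam * tau)) / lam
    var = (sigma / lam) ** 2 * (
        tau - 2.0 * b + (1.0 - math.exp(-2.0 * lam * tau)) / (2.0 * lam)
    )
    return math.exp(-eta * (theta * tau + (x_t - theta) * b) + 0.5 * eta ** 2 * var)


if __name__ == "__main__":
    params = dict(x_t=0.001, tau=20.0, eta=0.2, lam=0.5, theta=0.0009, sigma=0.01)
    print(survival_markov_numeric(**params), survival_markov_closed(**params))
```

The two values should agree up to quadrature error. For the LRD case the same integrals apply, but $\psi$ and $E[X_s \mid \mathcal{F}_t]$ must come from the Riccati-Volterra equation and (13), which depend on the whole path of $X$.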

Figure 1: A sample historical path of X that makes the survival probability with LRD higher than its Markovian counterpart. Panel (a) plots X against age; panel (b) plots the survival probability against age for projections A and B.

Figure 2: A sample historical path of X that makes the survival probability with LRD lower than its Markovian counterpart. Panel (a) plots X against age; panel (b) plots the survival probability against age for projections A and B.

5.2 Impact on annuity

To examine the effect of LRD on annuity prices, we compare the prices calculated by the two models. We are interested in annuities because they are popular insurance and pension products around the globe.

The numerical experiment is constructed as follows. Consider a 20-year deferred annuity whose payoff is a unit amount each year. For simplicity, we assume that $Q = P$ in this part so that no additional effort is required to identify the pricing measure. The simulation and calculation are made with the parameters in Table 2. In addition, we specify the short interest rate $r_t = Z_t$ as follows:

$$dZ_t = (\tilde{b}_0 - \tilde{b}_1 Z_t)\,dt + \sigma_r\,dW'_t,$$

where $\tilde{b}_0$ = 0. $\tilde{b}_1$ = 0. $\sigma_r$ = 0. 3, and $Z(40)$ = 0. 01. We then use (18) directly to calculate the price of the annuity with $t' = 20$.

Figure 3: (a) Examples of historical paths for X and (b) histogram of percentage difference in annuity prices between the two models.

To demonstrate the LRD effect, we generate 15,000 sample paths of $X$ over the time interval $[0, t]$. In Figure 3(a), we illustrate that the last two sample paths meet at time $t$. The classic Markovian model ignores how they come to this point and assigns the same price to the two scenarios as explained in (40). However, our LRD mortality model takes the historical record into account and assigns two different prices as shown in (18) and Theorem 1. The problem is to determine how large the difference between these two models is. Clearly, the difference is not a single number as there are uncountably many ways to reach the same point. Therefore, we examine the distribution of the price difference for different historical paths.

To do so, Figure 3(b) plots a histogram of the percentage difference of the annuity prices between the LRD and Markovian models. First, the mean of the distribution is near zero, implying that the Markovian mortality model offers an appropriate estimate of the averaged price even under the LRD feature. However, the dispersion of the histogram is still obvious. The price difference between the two models can reach 4% even for a linear annuity product, and this 4% difference is not negligible in practice.
The discrepancymay be ampliﬁed for products with leveraging eﬀects such as those with optionality.Even for this annuity product, we can see the volatility could be higher compared tothe Markovian model due to incorrect predictions of the mortality rate if the realizedmortality has the LRD feature.To illustrate the inﬂuence of LRD on products with optionality, consider a Europeancall option on a zero-coupon longevity bond B L ( t, T ) with strike D and expiration time T , where T is the ﬁxed maturity of the bond and T is the expiration date of the optionso that 0 ≤ t ≤ T < T . Speciﬁcally, the call option payoﬀ reads V ( B L ( T , T )) =max( B L ( T , T ) − D, r and m ( · ) = 0. By (15) and (25), we have d B L ( t, T ) = B L ( t, T ) h rdt + ψ ( T − t ) σdW Q t i , (41)under the pricing measure, where ψ solves ψ = ( − η − λψ ) ∗ K . As (25) is the dynamic of B L ( t, T ) under P , the corresponding Q dynamics in (41) is one in which the term ν L in(25) is absorbed into the P -Brownian motion to form a Q -Brownian motion. Hence, thecall value function V ( B L , t ) resembles the Black-Scholes formula. Speciﬁcally, V ( B L , t ) = Φ( d ) B L ( t, T ) − Φ( d ) De − r ( T − t ) ,d = 1 ψ ( T − t ) σ √ T − t (cid:20) ln (cid:18) B L ( t, T ) D (cid:19) + (cid:18) r + 12 ψ ( T − t ) σ (cid:19) ( T − t ) (cid:21) ,d = d − ψ ( T − t ) σ p T − t, where Φ( · ) is the cumulative distribution function of the standard normal distribution.Let us make a numerical comparison in terms of percentage diﬀerence in option pricebetween the VV and Markovian models. Let r = 0 . T = 5, and T = 2, and setthe other parameters as in Table 2. Assume B L ( t, T ) = 0 . , D , the benchmarkingat-the-money (ATM) strike, at the option issuance time. Note that the historical pathof the mortality rate is subsumed into the longevity bond price B L ( t, T ). 
By varying the strike D from 0.8 (ATM) to 0.832 (4% in-the-money), option prices under the two models are shown in Figure 4(a) while the percentage difference in price is shown in Figure 4(b). When the strike increases by 4%, the percentage difference in option price could reach 20%, which is quite significant. We mention the 4% increase in strike because the price of an annuity can reach a 4% difference in price in the former analysis. When the strike is set to make the option ATM, the difference in the longevity bond price results in a 4% difference in setting the ATM strike. This example shows that optionality may further amplify the pricing difference.

Figure 4: (a) Option prices and (b) difference of the prices under the two models. Panel (a) plots the price against the strike D for the Volterra and Markovian mortality models; panel (b) plots the percentage difference of option prices.

We further examine the hedging with LRD. In this part, we still consider the fractionalkernel in (23) so that K ( t ) = t α − Γ( α ) . Again, we ﬁrst simulate a pair of sample pathsof mortality and interest rates as shown in Figure 5. The model parameters used are µ (0) = 0 . b = 0 . b = 0 . σ µ = 0 . r (0) = 0 . e b = 0.6, e b = 0 . σ r = 0 . T = 5, α = 1 . k = 1, k = 10, and E [ z ] = 2. . . . . t M o r t a li t y r a t e (a) . . . t I n t e r e s t r a t e (b) Figure 5: A pair of sample paths of (a) mortality rate and (b) interest rate
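The volatility of the longevity bond depends on $\psi$, the solution of a linear Riccati-Volterra equation of the form $\psi = (-\eta - \lambda\psi) * K$. One simple way to approximate it with the fractional kernel is an explicit product-rectangle scheme, sketched below. This is an illustrative discretization chosen by us, not a scheme taken from the paper; the function name and the default values (borrowed from Table 2) are assumptions. For $\alpha = 1$ the scheme reduces to explicit Euler for $\psi' = -\eta - \lambda\psi$, whose exact solution is $-(\eta/\lambda)(1 - e^{-\lambda t})$.

```python
import math


def solve_riccati_volterra(alpha, eta, lam, horizon, n_steps):
    """Approximate psi on [0, horizon] for psi = (-eta - lam * psi) * K,
    with K(t) = t**(alpha - 1) / Gamma(alpha).

    Explicit product-rectangle rule: the integrand f = -eta - lam * psi is
    frozen at the left end of each step and the kernel is integrated exactly.
    """
    h = horizon / n_steps
    g = math.gamma(alpha + 1.0)
    # w[k] = integral of K over [k*h, (k+1)*h]
    w = [(((k + 1) * h) ** alpha - (k * h) ** alpha) / g for k in range(n_steps)]
    psi = [0.0] * (n_steps + 1)
    f = [0.0] * (n_steps + 1)
    f[0] = -eta - lam * psi[0]
    for n in range(1, n_steps + 1):
        # psi(t_n) = sum_j f(t_j) * integral of K(t_n - s) over [t_j, t_{j+1}]
        psi[n] = sum(f[j] * w[n - 1 - j] for j in range(n))
        f[n] = -eta - lam * psi[n]
    return psi


if __name__ == "__main__":
    psi_lrd = solve_riccati_volterra(alpha=1.33, eta=0.2, lam=0.5, horizon=5.0, n_steps=1000)
    psi_markov = solve_riccati_volterra(alpha=1.0, eta=0.2, lam=0.5, horizon=5.0, n_steps=1000)
    print(psi_lrd[-1], psi_markov[-1])
```

The cost is quadratic in the number of steps because every past value enters each update, which is the numerical counterpart of the path dependence introduced by the kernel.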

We hedge with the following two models.

• Model 1: Above assumption with $K(t) = t^{\alpha-1}/\Gamma(\alpha)$ (Volterra mortality model);
• Model 2: Above assumption with $K(t) = 1$ (Markovian mortality model).

Our objective is to hedge with $\phi = 3000$ over a horizon of 5 years using a zero-coupon longevity bond and a zero-coupon bond with a maturity time $T = 15$. The initial value of the wealth process is set to 2000. The optimal hedging strategies are calculated according to (38) and (39). The longevity bond price and bond price are calculated by assuming constant market prices of risk ϕ = 0. ϑ = 0.

Although we set $\alpha = 1.33$ (or $H = 0.83$) in this numerical experiment, the value of $\alpha$ can be calibrated or estimated in practice by using the pricing formulas we provide. Therefore, this study offers the option of choosing between Volterra and Markovian mortality models when dealing with longevity hedging in reality. Our proposed model offers a practical, flexible approach to the choice of $\alpha$.

Figure 6: Optimal hedging strategy (a) $u_1(t)$ and (b) $u_2(t)$ under Model 1 and Model 2.

Figure 7: Wealth processes under Model 1, Model 2, and no hedge.

In this paper, we propose a tractable continuous-time mortality rate model that incorporates the LRD feature. Using our model, we derive novel closed-form solutions to the survival probability and prices of several basic insurance products. In addition, our model enables us to investigate an optimal longevity hedging strategy via the BSDE framework. Therefore, the key advantages of our model are its tractability for pricing and risk management as well as its ability to capture the LRD feature. Our numerical experiments show that LRD has significant effects on insurance pricing and hedging. The new longevity hedging strategy improves the hedging effectiveness when the mortality rate exhibits the LRD feature.

A Transformation of Markov aﬃne processes

We now give the ODEs which the coeﬃcients e α and e β solve appearing in Section 2 and 4.A R k -valued aﬃne diﬀusion Z is a F -Markovian process speciﬁed as the strong solutionto the following SDE: dZ t = e b ( Z t ) dt + e σ ( Z t ) dW ′ t , where W ′ t is a F -standard k -dimensional Brownian motion. We require the covariancematrix e a ( Z ) = e σ ( Z ) e σ ( Z ) ⊤ and the drift e b ( Z ) to have aﬃne dependence on Z as inDeﬁnition 1. That is e a ( Z ) = e A + Z e A + · · · + Z k e A k , e b ( Z ) = e b + Z e b + · · · + Z k e b k , or some k -dimensional symmetric matrices e A i and vectors e b i . For convenience, we set e A = ( e A , · · · , e A k ) and e b = ( e b , · · · , e b k ). As shown in Duﬃe et al. (2000), for any c , c ∈ C k and c ∈ C , given T > t and aﬃne function Λ( t, x ) = λ ( t ) + λ ( t ) · Z ( λ and λ are bounded continuous functions), under technical conditions we have E [ e − R Tt Λ( s,Z s ) ds e c · Z T ( c · Z T + c ) |F t ] = e e α ( t )+ e β ( t ) · Z t [ˆ α ( t ) + ˆ β ( t ) · Z t ] , where the functions e α ( · ) . = e α ( · , T ) and e β ( · ) . = e β ( · , T ) solve the following ODEs:˙ e β ( t ) = λ ( t ) − e b ( t ) ⊤ e β ( t ) − e β ( t ) ⊤ e A ( t ) e β ( t ) , ˙ e α ( t ) = λ ( t ) − e b ( t ) · e β ( t ) − e β ( t ) ⊤ e A ( t ) e β ( t ) , with boundary conditions e α ( T ) = 0 and e β ( T ) = c ; the functions ˆ α ( · ) . = ˆ α ( · ; c , c , c , T )and ˆ β ( · ) . = ˆ β ( · ; c , c , c , T ) are the solutions to the following ODEs:˙ˆ β ( t ) = − e b ( t ) ⊤ ˆ β ( t ) − e β ( t ) ⊤ e A ( t ) ˆ β ( t ) , ˙ˆ α ( t ) = − e b ( t ) · ˆ β ( t ) − e β ( t ) ⊤ e A ( t ) ˆ β ( t ) , with boundary conditions ˆ α ( T ) = c and ˆ β ( T ) = c . B Some Proofs

Proof of Theorem 1

Under our model, from (5), E [ e − R Tt µ s ds |F t ] = E [ e − R Tt m ( s )+ ηX s ds |F t ] = e − R Tt m ( s ) ds E [ e − R Tt ηX s ds |F t ] . As X t has the aﬃne structure speciﬁed in Deﬁnition 1, by application of Lemma 4.2 andTheorem 4.3 provided in Abi Jaber et al. (2019), we have E [ e − R Tt ηX s ds |F t ] = e R t ηX s ds E [ e − R T ηX s ds |F t ] = e R t ηX s ds exp( Y t ( T )) , where Y t ( T ) is the Markovian process deﬁned in (10) or equivalently (12) in Theorem 1.Then, for T > t ≥

0, we have E [ e − R Tt µ s ds |F t ] = e − R Tt m ( s ) ds e R t ηX s ds exp( Y t ( T )) . Notice that − R Tt m ( s ) ds + R t ηX s ds = − R T m ( s ) ds + R t µ s ds . Hence, E [ e − R Tt µ s ds |F t ] = e − R T m ( s ) ds e R t µ s ds exp( Y t ( T )) = g ( t, T ) . (42)By taking the derivative of g ( t, T ) with respect to T , we get − ∂g ( t, T ) ∂T = E [ e − R Tt µ s ds µ T |F t ] , T > t. (43)Then, by combining the Equations (42) and (43), the result in (9) follows. roof of Theorem 2 and Proposition 3 For P ( t ), it is obvious that P ( t ) > P ( T ) = 1, and P − ( t ) = e R T t ϑ ( s )+ ϕ ( s ) ds ˆ E [ e − R T t r ( s ) ds |F t ]= e R T t ϑ ( s )+ ϕ ( s ) ds exp( α ( t, T ) + β ( t, T ) r ( t )) . Under our setting, ν ( t ) ⊤ Σ( t ) − ν ( t ) = ϑ ( t ) + ϕ ( t ). Then, by applying Itˆo’s formula, weget dP − ( t ) = P − ( t )(2 r ( t ) − ϑ ( t ) − ϕ ( t )) dt − P − ( t ) e η ( t ) ⊤ d ˆ W ( t )= P − ( t )(2 r ( t ) − ν ( t ) ⊤ Σ( t ) − ν ( t ) − ˜ η ( t ) ⊤ ξ ( t )) dt − P − ( t ) e η ( t ) ⊤ d W ( t ) , where e η = − β ( t, T ) σ r = η /P ( t ) and η ( t ) is deﬁned in Proposition 3. Notice that ξ ( t ) = 2 σ S Σ( t ) − ν ( t ) and σ ⊥ e η = 0. Then by Itˆo’s lemma again, P ( t ) satisﬁes dP ( t ) = (cid:26) (cid:2) − r ( t ) + ν ( t ) ⊤ Σ( t ) − ν ( t ) (cid:3) P ( t ) + 2 ν ( t ) ⊤ Σ( t ) − σ S ( t ) ⊤ η ( t )+ η ( t ) ⊤ σ S ( t )Σ( t ) − σ S ( t ) ⊤ η ( t ) 1 P ( t ) (cid:27) dt + η ( t ) ⊤ d W ( t ) . For Q ( t ), it is obvious that Q ( T ) = − c and Q ( t ) P ( t ) = − [ Q ( t ) + c ´ B ( t, T )]. By applyingItˆo’s lemma to ´ E [ µ s | ˜ H t ] on time t , we have d (cid:16) ´ E [ µ s | ˜ H t ] (cid:17) = E B ( s − t ) σ µ d ´ W t , where E B is deﬁned in Theorem 1 with B = − b . 
By applying Ito’s lemma to ´ E h e − R s µ τ dτ (cid:12)(cid:12)(cid:12) ˜ H t i =exp( Y t ( T )) on time t , we get d (cid:16) ´ E h e − R s µ τ dτ (cid:12)(cid:12)(cid:12) ˜ H t i(cid:17) = ´ E h e − R s µ τ dτ (cid:12)(cid:12)(cid:12) ˜ H t i ψ ( s − t ) σ µ d ´ W t with ψ ∈ L ([0 , s ] , C ) solving the Riccati equation ψ = ( − − ψ b ) ∗ K . From (31), d ´ B ( t, s ) = ´ B ( t, s ) r ( t ) dt − ´ B ( t, s ) β ( t, s ) σ r d ´ W ′ t . Then, by applying Itˆo’s lemma to Q ( t ) P ( t ) , wehave d (cid:20) Q ( t ) P ( t ) (cid:21) = (cid:20) Q ( t ) P ( t ) r ( t ) + k µ ( t ) z + π ( t ) (cid:21) dt + (cid:20)e η ( t ) ⊤ − Q ( t ) P ( t ) e η ( t ) ⊤ (cid:21) d ´ W ( t )= (cid:20) Q ( t ) P ( t ) r ( t ) + k µ ( t ) z + π ( t ) + e η ( t ) ⊤ ζ ( t ) − Q ( t ) P ( t ) e η ( t ) ⊤ ζ ( t ) dt (cid:21) + (cid:20)e η ( t ) ⊤ − Q ( t ) P ( t ) e η ( t ) ⊤ (cid:21) d W ( t ) , where e η = η /P ( t ) and η is shown in Proposition 3. Notice that ζ ( t ) = σ S Σ( t ) − ν ( t )and σ ⊥ e η = 0. Then, by Itˆo’s lemma again, Q ( t ) satisﬁes dQ ( t ) = (cid:26) (cid:20) − r ( t ) + ν ( t ) ⊤ Σ( t ) − (cid:18) ν ( t ) + σ S ( t ) ⊤ η ( t ) P ( t ) (cid:19)(cid:21) Q ( t ) + P ( t )( k µ ( t ) z + π ( t ))+ η ( t ) ⊤ σ S ( t )Σ( t ) − (cid:18) ν ( t ) + σ S ( t ) ⊤ η ( t ) P ( t ) (cid:19) (cid:27) dt + η ( t ) ⊤ d W ( t ) . inally, we consider the process P ( t ) (cid:16) M ( t ) + Q ( t ) P ( t ) (cid:17) + I ( t ). By Itˆo’s formula, we have d " P ( t ) (cid:18) M ( t ) + Q ( t ) P ( t ) (cid:19) + I ( t ) = d [ P ( t ) M ( t ) + 2 d [ Q ( t ) M ( t )] + d [ Q ( t ) P − ( t )] + dI ( t )= P ( t )( u ( t ) − u ∗ c ( t )) ⊤ σ S ( t ) ⊤ σ S ( t )( u ( t ) − u ∗ c ( t )) dt + {· · · } d W ( t ) + {· · · } d K ( t )= P ( t ) || σ S ( t )( u ( t ) − u ∗ c ( t )) || dt + {· · · } d W ( t ) + {· · · } d K ( t ) , where u ∗ c ( t ) is deﬁned in (35) and K ( t ) = ˜ N ( t ) − k R t µ ( s ) ds is a martingale with respectto the ﬁltration ˜ H t . 
Then, there exists an increasing sequence of stopping times { τ i } suchthat τ i ↑ T as i → ∞ and E " P ( T ∧ τ i ) (cid:18) M ( T ∧ τ i ) + Q ( T ∧ τ i ) P ( T ∧ τ i ) (cid:19) + I ( T ∧ τ i ) = P (0)( Y (0) + Q (0) P (0) ) + I (0) + E "Z T ∧ τ i P ( t ) || σ S ( t )( u ( t ) − u ∗ c ( t )) || dt . From (32) and (33), we can see P ( t ) and Q ( t ) are bounded. From (36), I ( t ) is alsobounded. As E [sup t ∈ [0 ,T ] | Y ( t ) | ] < ∞ , according to the Dominance Covergence Theo-rem and Monotone Convergence Theorem as i → ∞ , we have E " P ( T ) (cid:18) M ( T ) + Q ( T ) P ( T ) (cid:19) + I ( T ) = P (0)( Y (0) + Q (0) P (0) ) + I (0) + E "Z T P ( t ) || σ S ( t )( u ( t ) − u ∗ c ( t )) || dt . Thus, the objective function E [( M ( T ) − c ) ] = E (cid:20) P ( T ) (cid:16) M ( T ) + Q ( T ) P ( T ) (cid:17) + I ( T ) (cid:21) isminimized when u ( t ) = u ∗ t . P (0)( Y (0) + Q (0) P (0) ) + I (0) is the optimal objective value. Proof of Proposition 4

By Theorem 2, the optimal objective value is given by P (0)( M (0) + Q (0) P (0) ) + I (0) forany given c . Take c = ¯ M + φ and substitute Q (0) = − P (0)[ Q (0) + c ´ B (0 , T )], then theexternal minimization problem in (29) becomesmin ¯ M ∈ R P (0)( M (0) − ( ¯ M + φ B (0 , T ) − Q (0)) + I (0) − φ M − φ , which is a quadratic function attaining its minimum at¯ M ∗ = φ (1 − P (0) ´ B (0 , T )) + P (0) ´ B (0 , T )( M (0) − Q (0)) P (0) ´ B (0 , T ) . By substituting c = ¯ M ∗ + φ , the result follows. eferences Abi Jaber, E., Larsson, M., Pulido, S. (2019). Aﬃne Volterra processes.

The Annals ofApplied Probability

European Actuarial Journal

Stochastic Processesand Their Applications

Insur-ance: Mathematics and Economics

Scandi-navian Actuarial Journal

Insurance: Mathematics and Economics

Journal of Risk and Insurance.

Insurance: Mathematics and Eco-nomics , 31(3), 373-393.Chuang, S. L., Brockett, P. L. (2014). Modeling and pricing longevity derivatives us-ing stochastic mortality rates and the Esscher transform.

North American ActuarialJournal , 18(1), 22-37.Danesi, I. L., Haberman, S., Millossovich, P. (2015). Forecasting mortality in subpop-ulations using Lee-Carter type models: A comparison.

Insurance: Mathematics andEconomics

62, 151-161.Delgado-Vences, F., Ornelas, A. (2019). Modelling Italian mortality rateswith a geometric-type fractional Ornstein-Uhlenbeck process. arXiv preprintarXiv:1901.00795.

Duﬃe, D., Filipovi´c, D., Schachermayer, W. (2003). Aﬃne processes and applications inﬁnance.

The Annals of Applied Probability uﬃe, D., Pan, J., Singleton, K. (2000). Transform analysis and asset pricing for aﬃnejump-diﬀusions. Econometrica

Stochastic Processes and TheirApplications

PhilosophicalTransactions of the Royal Society of London (115), 513-583.Han, B., Wong, H.Y. (2020). Mean-variance portfolio selection withVolterra Heston model.

Applied Mathematics and Optimization https://doi.org/10.1007/s00245-020-09658-3.Jevti´c, P., Luciano, E., Vigna, E. (2013). Mortality surface by means of continuous timecohort models.

Insurance: Mathematics and Economics

88, 181-195.Lee, R. D., Carter, L. R. (1992). Modeling and forecasting US mortality.

Journal of theAmerican Statistical Association

Journal of Applied Probability

Demography

Insurance: Mathematics and Economics

Insurance: Mathematics and Eco-nomics

Risks

NorthAmerican Actuarial Journal ang, Y., Zhang, N., Jin, Z., Ho, T. L. (2019). Pricing longevity-linked derivatives using astochastic mortality model. Communications in Statistics-Theory and Methods , 48(24),5923-5942.Wong, T. W., Chiu, M. C., Wong, H. Y. (2017). Managing mortality risk with longevitybonds when mortality rates are cointegrated.

Journal of Risk and Insurance

Annals of ActuarialScience.

Yan, H., Peters, G., Chan, J. (2020). Multivariate long memory cohort mortality models.

ASTIN Bulletin

European Journal of Population

35, 675-694.