Least Squares Monte Carlo applied to Dynamic Monetary Utility Functions
Hampus Engsner

January 27, 2021
Abstract
In this paper we explore ways of numerically computing recursive dynamic monetary risk measures and utility functions. Computationally, this problem suffers from the curse of dimensionality, and nested simulations are infeasible if there are more than two time steps. The approach considered in this paper is to use a Least Squares Monte Carlo (LSM) algorithm to tackle this problem, a method which has primarily been considered for valuing American derivatives, or more general stopping time problems, as these also give rise to backward recursions with corresponding challenges in terms of numerical computation. We give some overarching consistency results for the LSM algorithm in a general setting, as well as explore numerically its performance for recursive Cost-of-Capital valuation, a special case of a dynamic monetary utility function.
Dynamic monetary risk measures and utility functions, as described for instance in [1] and [4], are time consistent if and only if they satisfy a recursive relationship (see for instance [5], [18]). In the case of time-consistent valuations of cash flows, often in an insurance setting (e.g. [11], [12], [18], [19], [17], [10]), analogous recursions also appear. Recursive relationships also occur as properties of solutions to optimal stopping problems, of which valuation of American derivatives is a special case. It is well known that numerical solutions to these kinds of recursions suffer from "the curse of dimensionality": as the underlying stochastic process generating the flow of information grows high dimensional, direct computation of solutions of these recursions proves infeasible.

In order to make the objective of this paper clearer, consider a probability space $(\Omega, \mathcal{F}, P)$, a $d$-dimensional Markov chain $(S_t)_{t=0}^{T}$ in $L^2(\mathcal{F})$ and its natural filtration $(\mathcal{F}_t)_{t=0}^{T}$. We are interested in computing $V$ given as the solution to the following recursion:
$$V_t = \varphi_t\big(f(S_{t+1}) + V_{t+1}\big), \qquad V_T = 0, \tag{1}$$
where, for each $t$, $\varphi_t : L^2(\mathcal{F}_{t+1}) \to L^2(\mathcal{F}_t)$ is a given law-invariant mapping (see Section 2 and Definition 3). Recursions such as (1) arise when describing time-consistent dynamic monetary risk measures/utility functions (see e.g. [4]). Alternatively, we may be interested in computing $V$ given as the solution to the following recursion:
$$V_t = f(S_t) \vee \mathbb{E}[V_{t+1} \mid \mathcal{F}_t], \qquad V_T = f(S_T), \tag{2}$$
where $f : \mathbb{R}^d \to \mathbb{R}$ is a given function. Recursions such as (2) arise when solving discrete-time optimal stopping problems or valuing American-style derivatives (see e.g. [13] and [15]). In this article we will focus on the recursive expression (1). In either case, due to the Markovian assumptions, we expect $V_t$ to be determined by some deterministic function of the state $S_t$ at time $t$.
The curse of dimensionality can now be succinctly put as the statement that as the dimension $d$ grows, direct computation of $V_t$ often becomes infeasible. Additionally, brute-force valuation via a nested Monte Carlo simulation, discussed in [3] and [2], is only a feasible option when $T = 2$, as the number of required simulations would grow exponentially with $T$. One approach to tackle this problem is the Least Squares Monte Carlo (LSM) algorithm, notably used in [15] to value American-style derivatives, which consists of approximating $V_{t+1}$ in either (1) or (2) as a linear combination of basis functions of the state $S_{t+1}$ via least-squares regression. While most often considered for optimal stopping problems ([15], [21], [20], [13], [22], [23], [24]), it has also been used recently in [17] for the purpose of actuarial valuation, with respect to a recursive relationship in line with (1).

The paper is organized as follows. In Section 2 we introduce the mathematical definitions and notation that will allow us to describe the LSM algorithm in our setting mathematically, as well as to formulate theoretical results. Section 3 contains consistency results with respect to computing (1), both in a general setting, only requiring an assumption of continuity in $L^2$ norm, and for the special case of a Cost-of-Capital valuation, studied in [11] and [12], under the assumption that capital requirements are given by the risk measure Value-at-Risk, in line with Solvency II. The lack of convenient continuity properties of Value-at-Risk poses certain challenges that are handled. Section 4 investigates the numerical performance of the LSM algorithm on valuation problems for a set of models for liability cash flows. Here some effort is also put into evaluating and validating the LSM algorithm's performance, as this is not trivial for the considered cases.

Consider a probability space $(\Omega, \mathcal{F}, P)$. On this space we consider two filtrations $(\mathcal{F}_t)_{t=0}^{T}$, with $\mathcal{F}_0 = \{\emptyset, \Omega\}$, and $(\mathcal{H}_t)_{t=0}^{T}$.
The latter filtration is an initial expansion of the former: take $\mathcal{D} \subset \mathcal{F}$ and set $\mathcal{H}_t := \mathcal{F}_t \vee \mathcal{D}$. $\mathcal{D}$ will later correspond to the $\sigma$-field generated by initially simulated data needed for numerical approximations. Define $L^2(\mathcal{H}_t)$ as the space of $\mathcal{H}_t$-measurable random variables $Z$ with $\mathbb{E}[Z^2] < \infty$. The subspace $L^2(\mathcal{F}_t) \subset L^2(\mathcal{H}_t)$ is defined analogously. All equalities and inequalities between random variables should be interpreted in the $P$-almost sure sense.

We assume that the probability space supports a Markov chain $S = (S_t)_{t=0}^{T}$ taking values in $\mathbb{R}^d$, where $S_0$ is constant, and an iid sequence $D = (S^{(i)})_{i \in \mathbb{N}}$, independent of $S$, where, for each $i$, $S^{(i)} = (S^{(i)}_t)_{t=0}^{T}$ has independent components with $\mathcal{L}(S^{(i)}_t) = \mathcal{L}(S_t)$ (equal in distribution). $D$ will represent possible initially simulated data and we set $\mathcal{D} = \sigma(D)$. The actual simulated data will be a finite sample and we write $D_n := (S^{(i)})_{i=1}^{n}$. For $Z \in L^2(\mathcal{F})$ we write $\|Z\|^2 := \mathbb{E}[|Z|^2 \mid \mathcal{H}_0]$. Notice that $\|Z\|$ is a nonrandom number if $Z$ is independent of $D$.

The mappings $\rho_t$ and $\varphi_t$ appearing in Definitions 1 and 2 below can be defined analogously as mappings from $L^p(\mathcal{H}_{t+1})$ to $L^p(\mathcal{H}_t)$ for $p \neq 2$. However, $p = 2$ will be the relevant choice for the applications treated subsequently.

Definition 1.
A dynamic monetary risk measure is a sequence $(\rho_t)_{t=0}^{T-1}$ of mappings $\rho_t : L^2(\mathcal{H}_{t+1}) \to L^2(\mathcal{H}_t)$ satisfying

if $\lambda \in L^2(\mathcal{H}_t)$ and $Y \in L^2(\mathcal{H}_{t+1})$, then $\rho_t(Y + \lambda) = \rho_t(Y) - \lambda$, (3)

if $Y, \widetilde{Y} \in L^2(\mathcal{H}_{t+1})$ and $Y \leq \widetilde{Y}$, then $\rho_t(Y) \geq \rho_t(\widetilde{Y})$, (4)

$\rho_t(0) = 0$. (5)

The elements $\rho_t$ of the dynamic monetary risk measure $(\rho_t)_{t=0}^{T-1}$ are called conditional monetary risk measures.

Definition 2.
A dynamic monetary utility function is a sequence $(\varphi_t)_{t=0}^{T-1}$ of mappings $\varphi_t : L^2(\mathcal{H}_{t+1}) \to L^2(\mathcal{H}_t)$ satisfying

if $\lambda \in L^2(\mathcal{H}_t)$ and $Y \in L^2(\mathcal{H}_{t+1})$, then $\varphi_t(Y + \lambda) = \varphi_t(Y) + \lambda$, (6)

if $Y, \widetilde{Y} \in L^2(\mathcal{H}_{t+1})$ and $Y \leq \widetilde{Y}$, then $\varphi_t(Y) \leq \varphi_t(\widetilde{Y})$, (7)

$\varphi_t(0) = 0$. (8)

Note that if $(\rho_t)_{t=0}^{T-1}$ is a dynamic monetary risk measure, then $(\rho_t(-\,\cdot\,))_{t=0}^{T-1}$ is a dynamic monetary utility function. In what follows we will focus on dynamic monetary utility functions of the form
$$\varphi_t(Y) = \rho_t(-Y) - \frac{1}{1+\eta_t}\, \mathbb{E}\big[\big(\rho_t(-Y) - Y\big)_+ \,\big|\, \mathcal{H}_t\big], \tag{9}$$
where $(\rho_t)_{t=0}^{T-1}$ is a dynamic monetary risk measure in the sense of Definition 1 and $(\eta_t)_{t=0}^{T-1}$ is a sequence of nonrandom positive numbers. It is possible to allow $(\eta_t)_{t=0}^{T-1}$ to be an $(\mathcal{F}_t)_{t=0}^{T-1}$-adapted sequence; however, we choose the simpler version here. That $(\varphi_t)_{t=0}^{T-1}$ is indeed a dynamic monetary utility function is shown in [11].

We will later consider conditional monetary risk measures that are conditionally law invariant in the sense of Definition 3 below. Conditional law invariance will then be inherited by $\varphi_t$ in (9).

Definition 3.
A mapping $\varphi_t : L^2(\mathcal{H}_{t+1}) \to L^2(\mathcal{H}_t)$ is called law invariant if $\varphi_t(X) = \varphi_t(Y)$ whenever $P(X \in \cdot \mid \mathcal{H}_t) = P(Y \in \cdot \mid \mathcal{H}_t)$.

We now define the value process corresponding to a dynamic monetary utility function $(\varphi_t)_{t=0}^{T-1}$ in the sense of Definition 2, with respect to the filtration $(\mathcal{F}_t)_{t=0}^{T-1}$ instead of $(\mathcal{H}_t)_{t=0}^{T-1}$. The use of the smaller filtration is due to the fact that a value process of the sort appearing in Definition 4 is the theoretical object that we aim to approximate well by the methods considered in this paper.

Definition 4.
Let $L := (L_t)_{t=1}^{T}$ with $L_t \in L^2(\mathcal{F}_t)$ for all $t$. Let $(\varphi_t)_{t=0}^{T-1}$ be a dynamic monetary utility function with respect to $(\mathcal{F}_t)_{t=0}^{T-1}$. Let
$$V_t(L, \varphi) := \varphi_t\big(L_{t+1} + V_{t+1}(L, \varphi)\big), \qquad V_T(L, \varphi) := 0. \tag{10}$$
We refer to $V_t(L, \varphi)$ as the time-$t$ $\varphi$-value of $L$. Whenever it causes no confusion, we will suppress the argument $\varphi$ in $V_t(L, \varphi)$ in order to make the expressions less notationally heavy.
Remark 1.
Letting $\rho$ be a dynamic monetary risk measure and letting $\varphi$ be a dynamic monetary utility function, $-\sum_{s=1}^{t} L_s - V_t(-L, -\rho)$ will be a conditional monetary risk measure of the cash flow $L$ in the sense of [4], and likewise $\sum_{s=1}^{t} L_s + V_t(L, \varphi)$ will be a conditional monetary utility function in the sense of [4], with $L$ being interpreted as a process of incremental cash flows. If $L$ is a liability cash flow, we may write the risk measure of $L$ as $\sum_{s=1}^{t} L_s + V_t(L, \rho)$. Importantly, any time-consistent dynamic monetary utility function/risk measure may be written in this way (see e.g. [5]). Often convexity or subadditivity is added to the list of desired properties in Definitions 1 and 2 (see e.g. [1], [4], [5]).

For $t = 1, \ldots, T$, consider a sequence of functions $\{1, \Phi_{t,1}, \Phi_{t,2}, \ldots\}$, where for each $i \in \mathbb{N}$, $\Phi_{t,i} : \mathbb{R}^d \to \mathbb{R}$ has the property $\Phi_{t,i}(S_t) \in L^2(\mathcal{F}_t)$, and the set $\{1, \Phi_{t,1}(S_t), \Phi_{t,2}(S_t), \ldots\}$ makes up a.s. linearly independent random variables. We define the approximation space $\mathcal{B}_{t,N}$ and its corresponding projection operator $P_{\mathcal{B}_{t,N}} : L^2(\mathcal{H}_t) \to \mathcal{B}_{t,N}$ as follows: for $N \in \mathbb{N}$ and $t \in \{1, \ldots, T\}$,
$$\mathcal{B}_{t,N} := \mathrm{span}\{1, \Phi_{t,1}(S_t), \ldots, \Phi_{t,N}(S_t)\}, \tag{11}$$
$$P_{\mathcal{B}_{t,N}} Z_t := \arg\inf_{B \in \mathcal{B}_{t,N}} \|Z_t - B\|. \tag{12}$$
Defining $\Phi_{t,N} := (1, \Phi_{t,1}(S_t), \ldots, \Phi_{t,N}(S_t))^{\mathsf{T}}$, note that the unique minimizer in (12) is given by $P_{\mathcal{B}_{t,N}} Z_t = \beta_{t,N,Z_t}^{\mathsf{T}} \Phi_{t,N}$, with
$$\beta_{t,N,Z_t} = \mathbb{E}\big[\Phi_{t,N}\Phi_{t,N}^{\mathsf{T}} \,\big|\, \mathcal{H}_0\big]^{-1} \mathbb{E}\big[\Phi_{t,N} Z_t \,\big|\, \mathcal{H}_0\big], \tag{13}$$
where the expected value of a vector or matrix is interpreted elementwise. Note that if $Z_t$ in (13) is independent of the initial data $D$, then $\beta_{t,N,Z_t}$ is a nonrandom vector. Indeed, we will only apply the operator $P_{\mathcal{B}_{t,N}}$ to random variables $Z_t$ independent of $D$.

For each $t$, consider a nonrandom function $z_t$ such that $Z_t = z_t(D_M, S_t) \in L^2(\mathcal{H}_t)$. For $M \in \mathbb{N}$, let
$$\Phi^{(M)}_{t,N} := \begin{pmatrix} 1 & \Phi_{t,1}(S^{(1)}_t) & \cdots & \Phi_{t,N}(S^{(1)}_t) \\ \vdots & \vdots & \ddots & \vdots \\ 1 & \Phi_{t,1}(S^{(M)}_t) & \cdots & \Phi_{t,N}(S^{(M)}_t) \end{pmatrix}, \qquad Z^{(M)}_t := \begin{pmatrix} z_t(D_M, S^{(1)}_t) \\ \vdots \\ z_t(D_M, S^{(M)}_t) \end{pmatrix},$$
and define
$$\widehat{\beta}^{(M)}_{t,N,Z_t} := \Big(\big(\Phi^{(M)}_{t,N}\big)^{\mathsf{T}}\Phi^{(M)}_{t,N}\Big)^{-1}\big(\Phi^{(M)}_{t,N}\big)^{\mathsf{T}} Z^{(M)}_t, \tag{14}$$
$$P^{(M)}_{\mathcal{B}_{t,N}} Z_t := \big(\widehat{\beta}^{(M)}_{t,N,Z_t}\big)^{\mathsf{T}} \Phi_{t,N}. \tag{15}$$
Notice that $\widehat{\beta}^{(M)}_{t,N,Z_t}$ is independent of $S$ and is the standard OLS estimator of $\beta_{t,N,Z_t}$ in (13). Notice also that $\Phi_{t,N}$ is independent of $D$. With the above definitions we can define the Least Squares Monte Carlo (LSM) algorithm for approximating the value $V(L, \varphi)$ given by (10).

Let $(\varphi_t)_{t=0}^{T-1}$ be a sequence of law-invariant mappings $\varphi_t : L^2(\mathcal{H}_{t+1}) \to L^2(\mathcal{H}_t)$. Consider a stochastic process $L := (L_t)_{t=1}^{T}$, where $L_t = l_t(S_t) \in L^2(\mathcal{F}_t)$ for all $t$, for some nonrandom functions $l_t : \mathbb{R}^d \to \mathbb{R}$. The goal is to estimate the values $(V_t(L))_{t=0}^{T}$ given by Definition 4. Note that the sought values $V_t(L)$ are independent of $D$ and thus, by the law-invariance property and the Markov property, $V_t(L)$ is a function of $S_t$ for each $t$. Now we may describe the LSM algorithm with respect to $N$ basis functions and simulation sample size $M$. The LSM algorithm corresponds to the following recursion:
$$\widehat{V}^{(M)}_{N,t}(L) := P^{(M)}_{\mathcal{B}_{t,N}} \varphi_t\big(L_{t+1} + \widehat{V}^{(M)}_{N,t+1}(L)\big), \qquad \widehat{V}^{(M)}_{N,T}(L) := 0. \tag{16}$$
Notice that $\widehat{V}^{(M)}_{N,t}$ is a function of the random variables $S_t$ and $S^{(i)}_u$ for $1 \leq i \leq M$ and $t+1 \leq u \leq T$. In particular, $\widehat{V}^{(M)}_{N,t} \in L^2(\mathcal{H}_t)$. In the section below, we will investigate when, and in what manner, $\widehat{V}^{(M)}_{N,t} \in L^2(\mathcal{H}_t)$ may converge to $V_t \in L^2(\mathcal{F}_t) \subset L^2(\mathcal{H}_t)$. For this purpose, we make the additional useful definition:
$$\widehat{V}_{N,t}(L) := P_{\mathcal{B}_{t,N}} \varphi_t\big(L_{t+1} + \widehat{V}_{N,t+1}(L)\big), \qquad \widehat{V}_{N,T}(L) := 0. \tag{17}$$
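As a concrete illustration of the backward recursion (16), the sketch below implements the LSM loop for a one-dimensional state. This is a minimal sketch under assumptions made only for illustration, not taken from the paper: the mapping $\varphi_t$ is replaced by a user-supplied empirical estimate `phi_hat` computed from an inner sample (the plain sample mean in the usage example, i.e. $\varphi_t$ a conditional expectation), and `basis` maps an array of states to the design matrix of basis functions. All names are placeholders.

```python
import numpy as np

def lsm_value(outer, simulate_next, cashflow, basis, phi_hat, T, n_inner=1000):
    """Backward LSM recursion (16): regress an inner-sample estimate of
    phi_t(L_{t+1} + V_{t+1}) onto basis functions of the time-t state."""
    beta = {T: None}                      # V_T = 0, so no coefficients at T
    for t in range(T - 1, -1, -1):
        targets = np.empty(len(outer[t]))
        for i, s in enumerate(outer[t]):
            s_next = simulate_next(t, s, n_inner)         # inner simulation
            v_next = (0.0 if beta[t + 1] is None
                      else basis(t + 1, s_next) @ beta[t + 1])
            targets[i] = phi_hat(cashflow(t + 1, s_next) + v_next)
        # OLS step (14): projection onto the span of the basis functions
        beta[t], *_ = np.linalg.lstsq(basis(t, outer[t]), targets, rcond=None)
    return beta

# Usage on a toy AR(1) state S_{t+1} = 0.5 S_t + eps, cash flow L_t = S_t,
# basis {1, s}, phi_t = conditional mean. Then V_t is linear in S_t, with
# slopes 0.5, 0.75, 0.875 at t = 2, 1, 0.
rng = np.random.default_rng(0)
T, M = 3, 400
outer = {t: rng.normal(size=M) for t in range(T)}
beta = lsm_value(
    outer,
    simulate_next=lambda t, s, n: 0.5 * s + rng.normal(size=n),
    cashflow=lambda t, s: s,
    basis=lambda t, s: np.column_stack([np.ones_like(s), s]),
    phi_hat=np.mean,
    T=T,
)
```

With $\varphi_t$ a conditional expectation the exact value function is available in closed form, which makes the toy example easy to check; in the paper's setting `phi_hat` would instead estimate the Cost-of-Capital mapping (9) from the inner sample.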
$\widehat{V}_{N,t}(L)$ is to be interpreted as an idealized LSM estimate, where we make a least-squares optimal estimate in each iteration. Note that this quantity is independent of $D$.

In the following section we prove what essentially are two consistency results for the LSM estimator $\widehat{V}^{(M)}_{N,t}(L)$, along with providing conditions for these to hold. The first and simplest result, Lemma 1, is that if we have a flexible enough class of basis functions, $\widehat{V}_{N,t}(L)$ will asymptotically approach the true value $V_t(L)$. The second consistency result, Theorem 1, is that when $N$ is kept fixed, $\widehat{V}^{(M)}_{N,t}(L)$ will approach the least-squares optimal $\widehat{V}_{N,t}(L)$ for each $t$ as $M$ grows to infinity. Hence, we show that the LSM estimator for a fixed number of basis functions is consistent in the sense that the simulation-based projection operator $P^{(M)}_{\mathcal{B}_{t,N}}$ will approach $P_{\mathcal{B}_{t,N}}$ even in the presence of errors in a multiperiod setting. Lemma 7 and Theorem 3 furthermore extend these results to the case of a Cost-of-Capital valuation, studied in [11] and [12], which here depends on the non-continuous risk measure Value-at-Risk. Note from Section 2 that these results presume simulated data not to be path dependent, in contrast to the results in [24].

We should note that these results do not give a rate of convergence, which is done in the optimal stopping setting in for instance [13] and [22]. In particular, these papers provide a joint convergence rate in which $M$ and $N$ simultaneously go to infinity, something which is not done here. There are three main reasons for this. First of all, in this paper the purpose is to investigate LSM methods given by standard OLS regression, i.e. we do not want to involve a truncation operator, as we believe this would not likely be implemented in practice.
The use of truncation operators is necessary for the results in [13] and [22], although one can handle the case of unbounded cash flows by letting the bound based on the truncation operator suitably go to infinity along with $N$ and $M$. Secondly, we believe that the bounds involved in the rates of convergence would be quite large in our case if we repeatedly applied the procedure in [13] or [22] (see Remark 3). Thirdly, we want to consider mappings which are $L^2$-continuous (Definition 5) but not necessarily Lipschitz. In this case it is not clear how convergence can be established other than at some unspecified rate.

We first define a useful mode of continuity that we will require to show our first results on the convergence of the LSM algorithm.
Definition 5.
The mapping $\varphi_t : L^2(\mathcal{H}_{t+1}) \to L^2(\mathcal{H}_t)$ is said to be $L^2$-continuous if $\|X - X_n\| \overset{P}{\to} 0$ implies $\|\varphi_t(X) - \varphi_t(X_n)\| \overset{P}{\to} 0$.

Notice that if $(X_n)_{n=1}^{\infty}$ and $X$ are independent of $D$, the convergence in probability may be replaced by convergence of real numbers.

We are now ready to formulate our first result on the convergence of the LSM algorithm. The first result essentially says that if we make the best possible estimation in each recursion step, using $N$ basis functions, then, for each $t$, the estimator of $V_t$ will converge in $L^2$ to $V_t$ as $N \to \infty$. This result is not affected by the initial data $D$, as it does not require any simulation-based approximation.

Lemma 1.
For $t = 0, \ldots, T-1$, let the mappings $\varphi_t : L^2(\mathcal{H}_{t+1}) \to L^2(\mathcal{H}_t)$ be $L^2$-continuous and law invariant. For $t = 1, \ldots, T$, let $\bigcup_{n \in \mathbb{N}} \mathcal{B}_{t,n}$ be dense in the set $\{h(S_t) \mid h : \mathbb{R}^d \to \mathbb{R},\ h(S_t) \in L^2(\mathcal{F}_t)\}$. Then, for $t = 0, \ldots, T-1$, $\|\widehat{V}_{N,t}(L) - V_t(L)\| \in \mathbb{R}$ and
$$\lim_{N \to \infty} \|\widehat{V}_{N,t}(L) - V_t(L)\| = 0.$$

The second result uses the independence assumptions on $D$ to prove a somewhat technical result on when $P^{(M)}_{\mathcal{B}_{t,N}}$ given by (15) asymptotically approaches the projection $P_{\mathcal{B}_{t,N}}$ given by (12).

Lemma 2.
Let $Z_t = z_t(S_t) \in L^2(\mathcal{F}_t)$. For each $M \in \mathbb{N}$, let $Z_{M,t} = z_{M,t}(D_M, S_t) \in L^2(\mathcal{H}_t)$, where $z_{M,t}(D_M, \cdot)$ only depends on $D_M$ through $\{S^{(i)}_u : 1 \leq i \leq M,\ u \geq t+1\}$, i.e.
$$\mathcal{L}\big(z_{M,t}(D_M, S^{(i)}_t)\big) = \mathcal{L}\big(z_{M,t}(D_M, S_t)\big). \tag{18}$$
Then $\|Z_t - Z_{M,t}\| \overset{P}{\to} 0$ as $M \to \infty$ implies $\|P_{\mathcal{B}_{t,N}} Z_t - P^{(M)}_{\mathcal{B}_{t,N}} Z_{M,t}\| \overset{P}{\to} 0$ and $\widehat{\beta}^{(M)}_{t,N,Z_{M,t}} \overset{P}{\to} \beta_{t,N,Z_t}$ as $M \to \infty$.

Remark 2.
Notice that $\widehat{V}^{(M)}_{N,t}(L)$ satisfies (18) due to the backwards recursive structure of the LSM algorithm (16).

Theorem 1.
For $t = 0, \ldots, T-1$, let the mappings $\varphi_t : L^2(\mathcal{H}_{t+1}) \to L^2(\mathcal{H}_t)$ be $L^2$-continuous and law invariant, let $\widehat{V}^{(M)}_{N,t}(L)$ be given by (16), and let $\widehat{V}_{N,t}(L)$ be given by (17). Then, for $t = 0, \ldots, T-1$,
$$\|\widehat{V}_{N,t}(L) - \widehat{V}^{(M)}_{N,t}(L)\| \overset{P}{\to} 0 \quad \text{as } M \to \infty.$$

To summarize, Lemma 1 says that we can theoretically/asymptotically achieve arbitrarily accurate approximations, even when applying the approximation recursively, and Theorem 1 says that we may approach this theoretical best approximation in practice, with enough simulated non-path-dependent data.
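The mechanism behind Lemma 2 (and hence Theorem 1) is easy to observe numerically: the OLS coefficient (14) approaches the projection coefficient (13) as $M$ grows. The snippet below is an illustration only; the choices ($S \sim N(0,1)$, $Z = S^2$, basis $\{1, S\}$) are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

def beta_hat(M):
    """OLS estimator (14) for Z = S^2 regressed on {1, S}, with S ~ N(0, 1)."""
    s = rng.standard_normal(M)
    X = np.column_stack([np.ones(M), s])        # simulated design matrix
    b, *_ = np.linalg.lstsq(X, s**2, rcond=None)
    return b

# Here E[Phi Phi^T] is the identity, so the projection coefficient (13) is
# beta = (E[S^2], E[S^3]) = (1, 0); the OLS error shrinks as M grows.
errors = [np.linalg.norm(beta_hat(M) - np.array([1.0, 0.0]))
          for M in (10**2, 10**4, 10**6)]
```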
Lemma 3.
If the mapping $\varphi_t : L^2(\mathcal{H}_{t+1}) \to L^2(\mathcal{H}_t)$ is Lipschitz continuous in the sense that there exists a constant $K > 0$ such that
$$|\varphi_t(X) - \varphi_t(Y)| \leq K\, \mathbb{E}[|X - Y| \mid \mathcal{H}_t] \quad \text{for all } X, Y \in L^2(\mathcal{H}_{t+1}), \tag{19}$$
then $\varphi_t$ is $L^2$-continuous in the sense of Definition 5.

Lemma 4.
If the conditional monetary risk measure $\rho_t : L^2(\mathcal{H}_{t+1}) \to L^2(\mathcal{H}_t)$ is Lipschitz continuous in the sense of (19) with Lipschitz constant $K$, then so is the mapping $\varphi_t$ given by (9), with Lipschitz constant $K$.

The large class of (conditional) spectral risk measures are in fact Lipschitz continuous. These conditional monetary risk measures can be expressed as
$$\rho_{t,m}(Y) = -\int_0^1 F^{-1}_{t,Y}(u)\, m(u)\, \mathrm{d}u, \tag{20}$$
where $m$ is a probability density function that is decreasing, bounded and right continuous, and $F^{-1}_{t,Y}(u)$ is the conditional quantile function
$$F^{-1}_{t,Y}(u) := \operatorname{ess\,inf}\{y \in L^2(\mathcal{H}_t) : P(Y \leq y \mid \mathcal{H}_t) \geq u\}.$$
It is well known that spectral risk measures are coherent and include expected shortfall as a special case.
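For reference, (20) is straightforward to estimate from an i.i.d. sample by integrating the empirical quantile function against $m$; the sketch below does this with a midpoint rule, recovering expected shortfall as the spectral special case $m(u) = \mathbf{1}\{u \leq 1-\alpha\}/(1-\alpha)$. The grid size is an illustrative choice, and this estimator is a sketch rather than one used in the paper.

```python
import numpy as np

def spectral_risk(y, m, n_grid=10_000):
    """Estimate rho_m(Y) = -integral of F_Y^{-1}(u) m(u) du over (0,1) from an
    i.i.d. sample y, using the empirical quantile function on a midpoint grid."""
    u = (np.arange(n_grid) + 0.5) / n_grid
    return -np.mean(np.quantile(y, u) * m(u))

def expected_shortfall(y, alpha):
    """Expected shortfall at level alpha: m(u) = 1{u <= 1-alpha}/(1-alpha)."""
    return spectral_risk(y, lambda u: (u <= 1.0 - alpha) / (1.0 - alpha))

y = np.random.default_rng(7).standard_normal(1_000_000)
es = expected_shortfall(y, 0.975)   # for N(0,1) the exact value is about 2.34
```

The cash-additivity property (3) of the estimator, `expected_shortfall(y + c, a) == expected_shortfall(y, a) - c`, holds exactly up to floating-point error and makes a convenient sanity check.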
Lemma 5. If $m$ is a probability density function that is decreasing, bounded and right continuous, then $(\rho_{t,m})_{t=0}^{T-1}$ is a dynamic monetary risk measure in the sense of Definition 1. Moreover, each $\rho_{t,m}$ is law invariant in the sense of Definition 3 and also Lipschitz continuous with constant $m(0)$.

Remark 3. Assume $\varphi_t$ is Lipschitz continuous with constant $K$. Then
$$\begin{aligned}
\|\widehat{V}_t(L) - V_t(L)\| &\leq \|\widehat{V}_t(L) - \varphi_t(L_{t+1} + \widehat{V}_{t+1}(L))\| + \|\varphi_t(L_{t+1} + V_{t+1}(L)) - \varphi_t(L_{t+1} + \widehat{V}_{t+1}(L))\| \\
&\leq \|\widehat{V}_t(L) - \varphi_t(L_{t+1} + \widehat{V}_{t+1}(L))\| + K\, \|V_{t+1}(L) - \widehat{V}_{t+1}(L)\|.
\end{aligned}$$
Repeating this argument gives
$$\|\widehat{V}_t(L) - V_t(L)\| \leq \sum_{s=t}^{T-1} K^{s-t}\, \|\widehat{V}_s(L) - \varphi_s(L_{s+1} + \widehat{V}_{s+1}(L))\|.$$
This bound is analogous to that in [23] (Lemma 2.3; see also Remark 3.4 for how this ties in with the main result), with the exception that the constant $K^{s-t}$ appears instead of $1$. As $K$ may be quite large, this is one of the reasons for not seeking to determine the exact rate of convergence, as is done in for instance [24] and [13]. This observation also discourages judging the accuracy of the LSM algorithm purely by estimating out-of-sample one-step estimation errors of the form $\|\widehat{V}_s(L) - \varphi_s(L_{s+1} + \widehat{V}_{s+1}(L))\|$, as these need to be quite small in order to obtain a satisfying error bound.

In this section, we will focus on mappings $\varphi_{t,\alpha} : L^2(\mathcal{H}_{t+1}) \to L^2(\mathcal{H}_t)$ given by
$$\varphi_{t,\alpha}(Y) := \mathrm{VaR}_{t,\alpha}(-Y) - \frac{1}{1+\eta_t}\, \mathbb{E}\big[(\mathrm{VaR}_{t,\alpha}(-Y) - Y)_+ \,\big|\, \mathcal{H}_t\big] \tag{21}$$
for some $\alpha \in (0, 1)$ and nonnegative constants $(\eta_t)_{t=0}^{T-1}$, and where
$$\mathrm{VaR}_{t,\alpha}(-Y) := F^{-1}_{t,Y}(1-\alpha) := \operatorname{ess\,inf}\{y \in L^2(\mathcal{H}_t) : P(Y \leq y \mid \mathcal{H}_t) \geq 1-\alpha\}$$
is the conditional version of Value-at-Risk. Note that $\varphi_{t,\alpha}$ is a special case of the mappings $\varphi$ in (9). $(\mathrm{VaR}_{t,\alpha})_{t=0}^{T-1}$ is a dynamic monetary risk measure in the sense of Definition 1, and $\mathrm{VaR}_{t,\alpha}$ is law invariant in the sense of Definition 3. Since $\mathrm{VaR}_{t,\alpha}$ is in general not Lipschitz continuous, $\varphi_{t,\alpha}$ cannot be guaranteed to be so without further regularity conditions. The aim of this section is to find results analogous to Lemma 1 and Theorem 1. We will use the following lemma, and especially its corollary, in lieu of $L^2$-continuity for Value-at-Risk:

Lemma 6.
For any $X, Z \in L^2(\mathcal{H}_{t+1})$ and any $\delta \in (0, 1-\alpha)$,
$$\mathrm{VaR}_{t,\alpha}(-(X + Z)) \leq \mathrm{VaR}_{t,\alpha+\delta}(-X) + \mathrm{VaR}_{t,1-\delta}(-Z), \tag{22}$$
$$\mathrm{VaR}_{t,\alpha}(-(X + Z)) \geq \mathrm{VaR}_{t,\alpha-\delta}(-X) - \mathrm{VaR}_{t,1-\delta}(-Z). \tag{23}$$
We get an interesting corollary from this lemma:

Corollary 1.
Let $\alpha \in (0, 1)$ and let $\delta \in (0, 1-\alpha)$ with $\delta < 1/2$. Then, for any $X, Y \in L^2(\mathcal{H}_{t+1})$,
$$\inf_{|\epsilon| < \delta} |\mathrm{VaR}_{t,\alpha+\epsilon}(X) - \mathrm{VaR}_{t,\alpha}(Y)| \leq \frac{1}{\delta}\, \mathbb{E}[|X - Y| \mid \mathcal{H}_t].$$

Using these Lipschitz-like results, we can show a Lipschitz-like result for $\varphi_{t,\alpha}(\cdot)$.

Theorem 2.
Let $\alpha \in (0, 1)$ and let $\delta \in (0, 1-\alpha)$ with $\delta < 1/2$. Then, for any $X, Y \in L^2(\mathcal{H}_{t+1})$,
$$\inf_{|\epsilon| < \delta} |\varphi_{t,\alpha+\epsilon}(X) - \varphi_{t,\alpha}(Y)| \leq \frac{2}{\delta}\, \mathbb{E}[|X - Y| \mid \mathcal{H}_t] \tag{24}$$
and
$$\varphi_{t,\alpha-\delta}(X) - \frac{2}{\delta}\, \mathbb{E}[|X - Y| \mid \mathcal{H}_t] \leq \varphi_{t,\alpha}(Y) \leq \varphi_{t,\alpha+\delta}(X) + \frac{2}{\delta}\, \mathbb{E}[|X - Y| \mid \mathcal{H}_t], \tag{25}$$
and (24) and (25) are equivalent.

Theorem 2 enables us to prove $L^2$-continuity of $\varphi_{t,\alpha}$ under a continuity assumption.

Corollary 2.
Consider $X, X_n \in L^2(\mathcal{H}_{t+1})$, $n \geq 1$, with $X_n \to X$ in $L^2$. Assume that $(0,1) \ni u \mapsto \mathrm{VaR}_{t,u}(-X)$ is a.s. continuous at $u = \alpha$. Then $\varphi_{t,\alpha}(X_n) \to \varphi_{t,\alpha}(X)$ in $L^2$.

The following remark illustrates that even the stronger requirement of a.s. continuous time-$t$ conditional distributions should not be a great hindrance in practice:

Remark 4.
If we add to our cash flow $(L_t)_{t=1}^{T}$ an adapted process $(\epsilon_t)_{t=1}^{T}$, independent of $(L_t)_{t=1}^{T}$, such that for each $t$, $\epsilon_t$ is independent of $\mathcal{F}_{t-1}$ and has a continuous distribution function, then the assumptions in Corollary 2 will be satisfied.

We are now ready to formulate a result analogous to Lemma 1.
Lemma 7.
Let $\alpha \in (0, 1)$ and let $\delta \in (0, 1-\alpha)$ with $\delta < 1/2$. Let $(\varphi_{t,\alpha})_{t=0}^{T-1}$ be defined by (21) and let
$$\widehat{V}_{N,t,\alpha}(L) := P_{\mathcal{B}_{t,N}} \varphi_{t,\alpha}\big(L_{t+1} + \widehat{V}_{N,t+1,\alpha}(L)\big), \qquad \widehat{V}_{N,T,\alpha}(L) := 0. \tag{26}$$
Let $\bigcup_{n \in \mathbb{N}} \mathcal{B}_{t,n}$ be dense in the set $\{h(S_t) \mid h : \mathbb{R}^d \to \mathbb{R},\ h(S_t) \in L^2(\mathcal{F}_t)\}$ and assume that $(0,1) \ni u \mapsto \mathrm{VaR}_{t,u}(-L_{t+1} - \widehat{V}_{N,t+1,\alpha}(L))$ is a.s. continuous at $u = \alpha$ for all $N \in \mathbb{N}$, $t = 0, \ldots, T-1$. Then, for $t = 0, \ldots, T-1$, $\|\widehat{V}_{N,t,\alpha}(L) - V_{t,\alpha}(L)\| \in \mathbb{R}$ and
$$\lim_{N \to \infty} \|\widehat{V}_{N,t,\alpha}(L) - V_{t,\alpha}(L)\| = 0.$$

Lemma 8. Let $\alpha \in (0, 1)$ and let $(0,1) \ni u \mapsto \mathrm{VaR}_{t,u}(v^{\mathsf{T}}\Phi_{t+1,N})$ be a.s. continuous at $u = \alpha$ for any $v \in \mathbb{R}^{N+1}$. Then $\beta_n \overset{P}{\to} \beta$ implies
$$\big\|\varphi_{t,\alpha}(\beta^{\mathsf{T}}\Phi_{t+1,N}) - \varphi_{t,\alpha}(\beta_n^{\mathsf{T}}\Phi_{t+1,N})\big\| \overset{P}{\to} 0.$$

Remark 5.
Lemma 8 can be extended to show the convergence
$$\big\|\varphi_{t,\alpha}(L_{t+1} + \beta^{\mathsf{T}}\Phi_{t+1,N}) - \varphi_{t,\alpha}(L_{t+1} + \beta_n^{\mathsf{T}}\Phi_{t+1,N})\big\| \overset{P}{\to} 0,$$
since the vector of basis functions $\Phi_{t+1,N}$ could contain $L_{t+1}$ as an element. The requirement for convergence is that $u \mapsto \mathrm{VaR}_{t,u}(-L_{t+1} - v^{\mathsf{T}}\Phi_{t+1,N})$ is a.s. continuous at $u = \alpha$. This requirement could be replaced by the stronger requirement that $x \mapsto P(L_{t+1} + v^{\mathsf{T}}\Phi_{t+1,N} \leq x \mid \mathcal{F}_t)$ is a.s. continuous.

We have now fitted $\varphi_{t,\alpha}$ into the setting of Theorem 1.

Theorem 3.
Let $u \mapsto \mathrm{VaR}_{t,u}(-L_{t+1} - v^{\mathsf{T}}\Phi_{t+1,N})$ be a.s. continuous at $u = \alpha$ for any $v \in \mathbb{R}^{N+1}$. For any $N \in \mathbb{N}$ and $t = 0, \ldots, T$, let $\widehat{V}_{N,t,\alpha}(L)$ be given by (26) and define
$$\widehat{V}^{(M)}_{N,t,\alpha}(L) := P^{(M)}_{\mathcal{B}_{t,N}} \varphi_{t,\alpha}\big(L_{t+1} + \widehat{V}^{(M)}_{N,t+1,\alpha}(L)\big), \qquad \widehat{V}^{(M)}_{N,T,\alpha}(L) := 0.$$
Then, for $t = 0, 1, \ldots, T-1$, $\|\widehat{V}_{N,t,\alpha}(L) - \widehat{V}^{(M)}_{N,t,\alpha}(L)\| \overset{P}{\to} 0$ as $M \to \infty$.

In this section we will test the LSM algorithm empirically for the special case of the mappings $\varphi$ being given by $(\varphi_{t,\alpha})_{t=0}^{T-1}$ in (21). The LSM algorithm described below, Algorithm 1, will differ slightly from the one described previously, in the sense that it contains the small inefficiency of having two regression steps: one for the VaR term of the mapping and one for the expected-value term. The reason for introducing this split is that it significantly simplifies the validation procedures for the algorithm: heuristically, we will be able to run a forward simulation in which we may test the accuracy of both the VaR term and the expected-value term.

Let $\mu_{t,t+1}(\cdot, \cdot)$ be the transition kernel from time $t$ to $t+1$ of the Markov process $(S_t)_{t=0}^{T}$, so that $\mu_{t,t+1}(S_t, \cdot) = P(S_{t+1} \in \cdot \mid S_t)$. In order to perform the LSM algorithm below, the only requirements are the ability to efficiently sample a variate $s$ from the unconditional law $\mathcal{L}(S_t)$ of $S_t$ and from the conditional law $\mu_{t,t+1}(s, \cdot)$. Recall that the liability cash flow $(L_t)_{t=1}^{T}$ is assumed to be given by $L_t := g_t(S_t)$ for known functions $(g_t)_{t=1}^{T}$.

We may assess the accuracy of the LSM implementation by computing root mean-squared errors (RMSE) of quantities appearing in Algorithm 1.

Algorithm 1: LSM Algorithm

Set $\widehat{\beta}^{(M)}_{T,N,V} := 0$
for $t = T-1, \ldots, 0$ do
    Draw independent variables $S^{(1)}_t, \ldots, S^{(M)}_t$ from $\mathcal{L}(S_t)$
    for $i = 1 : M$ do
        Draw independent variables $S^{(i,1)}_{t+1}, \ldots, S^{(i,n)}_{t+1}$ from $\mu_{t,t+1}(S^{(i)}_t, \cdot)$
        Set $Y^{(i,j)}_{t+1} := g_{t+1}(S^{(i,j)}_{t+1}) + (\widehat{\beta}^{(M)}_{t+1,N,V})^{\mathsf{T}}\Phi_{t+1,N}(S^{(i,j)}_{t+1})$, $j = 1, \ldots, n$
        Let $\widehat{F}^{(i)}_t(y) := \frac{1}{n}\sum_{j=1}^{n} I\{Y^{(i,j)}_{t+1} \leq y\}$ (empirical cdf)
        Set $R^{(i)}_t := \min\{y : \widehat{F}^{(i)}_t(y) \geq \alpha\}$ (empirical $\alpha$-quantile)
        Set $E^{(i)}_t := \frac{1}{n}\sum_{j=1}^{n}\big(R^{(i)}_t - Y^{(i,j)}_{t+1}\big)_+$
    end for
    Set $\widehat{\beta}^{(M)}_{t,N,R}$ as in (14) by regressing $(R^{(i)}_t)_{i=1}^{M}$ onto $(\Phi_{t,N}(S^{(i)}_t))_{i=1}^{M}$
    Set $\widehat{\beta}^{(M)}_{t,N,E}$ as in (14) by regressing $(E^{(i)}_t)_{i=1}^{M}$ onto $(\Phi_{t,N}(S^{(i)}_t))_{i=1}^{M}$
    Set $\widehat{\beta}^{(M)}_{t,N,V} := \widehat{\beta}^{(M)}_{t,N,R} - \frac{1}{1+\eta_t}\widehat{\beta}^{(M)}_{t,N,E}$
end for

For each index pair $(t, i)$ set $V^{(i)}_t := R^{(i)}_t - \frac{1}{1+\eta_t} E^{(i)}_t$. Define the RMSE and the normalized RMSE by
$$\mathrm{RMSE}_{Z,t} := \bigg(\frac{1}{M}\sum_{i=1}^{M}\Big(Z^{(i)}_t - \big(\widehat{\beta}^{(M)}_{t,N,Z}\big)^{\mathsf{T}}\Phi_{t,N}\big(S^{(i)}_t\big)\Big)^{2}\bigg)^{1/2}, \tag{27}$$
$$\mathrm{NRMSE}_{Z,t} := \mathrm{RMSE}_{Z,t} \times \bigg(\frac{1}{M}\sum_{i=1}^{M}\big(Z^{(i)}_t\big)^{2}\bigg)^{-1/2}, \tag{28}$$
where $Z$ is a placeholder for $R$, $E$ or $V$.

For each index pair $(t, i)$, consider the actual non-default probability and the actual return on capital given by
$$\mathrm{ANDP}^{(i)}_t := \widehat{F}^{(i)}_t\Big(\big(\widehat{\beta}^{(M)}_{t,N,R}\big)^{\mathsf{T}}\Phi_{t,N}\big(S^{(i)}_t\big)\Big), \tag{29}$$
$$\mathrm{AROC}^{(i)}_t := (1+\eta_t)\, E^{(i)}_t \times \Big(\big(\widehat{\beta}^{(M)}_{t,N,E}\big)^{\mathsf{T}}\Phi_{t,N}\big(S^{(i)}_t\big)\Big)^{-1}, \tag{30}$$
and note that these random variables are expected to be centered around $\alpha$ and $1+\eta_t$, respectively, if the implementation is accurate. All validation procedures in this paper are performed out-of-sample, i.e. we must perform a second validation run to get these values.

In this section we will introduce two model types in order to test the performance of the LSM algorithm. The first model type, introduced in Section 4.1.1, is not motivated by a specific application but is simply a sufficiently flexible and moderately complex time series model. The second model type, introduced in Section 4.1.2, aims to describe the cash flow of a life insurance portfolio paying both survival and death benefits.
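To make the two-regression structure of Algorithm 1 explicit, here is a compact sketch in code. The model interface (`draw_outer`, `draw_inner`, cash-flow function `g`, vectorized `basis`) is a placeholder abstraction, and the usage example runs it on a one-dimensional Gaussian AR(1) toy state; neither comes from the paper.

```python
import numpy as np

def lsm_algorithm1(draw_outer, draw_inner, g, basis, T, M, n, alpha, eta):
    """Backward loop of Algorithm 1 for the Cost-of-Capital mapping (21):
    one regression for the VaR term R, one for the expected-value term E."""
    beta_V = {T: None}
    for t in range(T - 1, -1, -1):
        S = draw_outer(t, M)                    # outer sample from L(S_t)
        R, E = np.empty(M), np.empty(M)
        for i in range(M):
            S_next = draw_inner(t, S[i], n)     # inner sample from mu_{t,t+1}
            Y = g(t + 1, S_next)
            if beta_V[t + 1] is not None:
                Y = Y + basis(t + 1, S_next) @ beta_V[t + 1]
            R[i] = np.quantile(Y, alpha)                 # empirical alpha-quantile
            E[i] = np.mean(np.maximum(R[i] - Y, 0.0))    # empirical E[(R - Y)_+]
        X = basis(t, S)
        beta_R, *_ = np.linalg.lstsq(X, R, rcond=None)
        beta_E, *_ = np.linalg.lstsq(X, E, rcond=None)
        beta_V[t] = beta_R - beta_E / (1.0 + eta)        # combine as in (21)
    return beta_V

# Toy usage: S_{t+1} = 0.5 S_t + eps, g(t, s) = s, basis {1, s}. At t = 1 the
# exact value function is linear in s with slope 0.5.
rng = np.random.default_rng(3)
beta_V = lsm_algorithm1(
    draw_outer=lambda t, M: rng.normal(size=M),
    draw_inner=lambda t, s, n: 0.5 * s + rng.normal(size=n),
    g=lambda t, s: s,
    basis=lambda t, s: np.column_stack([np.ones_like(s), s]),
    T=2, M=500, n=5000, alpha=0.995, eta=0.06,
)
```

Keeping `beta_R` and `beta_E` separate, as the algorithm does, is exactly what enables the out-of-sample ANDP and AROC checks of (29) and (30).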
The first model to be evaluated is one where the liability cash flow $(L_t)_{t=1}^{T}$ is assumed to be given by an AR(1) model with GARCH(1,1) residuals, with dynamics given by
$$L_{t+1} = \alpha_0 + \alpha_1 L_t + \sigma_{t+1}\epsilon_{t+1}, \qquad \sigma^2_{t+1} = \alpha_2 + \alpha_3 \sigma^2_t + \alpha_4 L^2_t, \qquad L_0 = 0,\ \sigma_0 = 1.$$
Here $\epsilon_1, \ldots, \epsilon_T$ are assumed to be i.i.d. standard normally distributed and $\alpha_0, \ldots, \alpha_4$ are known model parameters. If we put $S_t = (L_t, \sigma_{t+1})$ for $t = 0, \ldots, T$, we see that $S_t$ will form a time-homogeneous Markov chain.

In order to contrast this model with a more complex model, we also investigate the case where the process $(L_t)_{t=1}^{T}$ is given by a sum of independent AR(1)-GARCH(1,1) processes of the above type: $L_t = \sum_{i=1}^{10} L_{t,i}$, where
$$L_{t+1,i} = \alpha_{0,i} + \alpha_{1,i} L_{t,i} + \sigma_{t+1,i}\epsilon_{t+1,i}, \qquad \sigma^2_{t+1,i} = \alpha_{2,i} + \alpha_{3,i}\sigma^2_{t,i} + \alpha_{4,i} L^2_{t,i}.$$
The motivation for these choices of toy models is as follows. Firstly, a single AR(1)-GARCH(1,1) process is sufficiently low dimensional that we may compare a brute-force approximation with that of the LSM model, thus getting a real sense of the performance of the LSM model. Secondly, despite being low dimensional, it still seems to have a sufficiently complex dependence structure so as not to be easily valued other than by numerical means. The motivation for looking at a sum of AR(1)-GARCH(1,1) processes is simply to investigate whether model performance is severely hampered by an increase in dimensionality, provided a certain amount of independence of the sources of randomness.
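A simulator for these dynamics takes only a few lines. The parameter ordering $\alpha_0, \ldots, \alpha_4$ and the squares in the volatility recursion follow the reconstruction above and should be read as an assumption:

```python
import numpy as np

def simulate_ar_garch(params, T, n_paths, rng):
    """Simulate the AR(1)-GARCH(1,1) liability model:
    L_{t+1} = a0 + a1 L_t + sigma_{t+1} eps_{t+1},
    sigma_{t+1}^2 = a2 + a3 sigma_t^2 + a4 L_t^2,  with L_0 = 0, sigma_0 = 1."""
    a0, a1, a2, a3, a4 = params
    L = np.zeros((T + 1, n_paths))
    sig2 = np.ones((T + 1, n_paths))
    for t in range(T):
        sig2[t + 1] = a2 + a3 * sig2[t] + a4 * L[t] ** 2
        L[t + 1] = a0 + a1 * L[t] + np.sqrt(sig2[t + 1]) * rng.standard_normal(n_paths)
    return L, sig2

# The Markov state at time t is S_t = (L_t, sigma_{t+1}), i.e. (L[t], sig2[t+1]);
# the parameter values below are illustrative only.
L, sig2 = simulate_ar_garch((0.0, 0.5, 0.1, 0.8, 0.1), T=6, n_paths=10_000,
                            rng=np.random.default_rng(1))
```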
In order to investigate a set of models more closely resembling an insurance cash flow, we also consider an example closely inspired by that in [8]. Essentially, we will assume the liability cash flow to be given by life insurance policies where we take into account age cohorts and their sizes at each time, along with financial data relevant to the contract payouts.

We consider two risky assets $Y$ and $F$, given by the log-normal dynamics
$$\mathrm{d}Y_t = \mu_Y Y_t\,\mathrm{d}t + \sigma_Y Y_t\,\mathrm{d}W^Y_t, \quad 0 \leq t \leq T, \quad Y_0 = y_0,$$
$$\mathrm{d}F_t = \mu_F F_t\,\mathrm{d}t + \sigma_F F_t\,\mathrm{d}W^F_t, \quad 0 \leq t \leq T, \quad F_0 = f_0.$$
$W^Y_t$ and $W^F_t$ are two correlated Brownian motions, which we may rewrite as
$$W^Y_t = W^1_t, \qquad W^F_t = \rho W^1_t + \sqrt{1-\rho^2}\, W^2_t, \quad 0 \leq t \leq T,$$
where $W^1$ and $W^2$ are two standard, uncorrelated Brownian motions. Here, $F$ will represent the index associated with unit-linked contracts and $Y$ will represent assets owned by the insurance company. Furthermore, we assume that an individual of age $a$ has the probability $1 - p_a$ of reaching age $a+1$, where the probabilities $p_a$ for $a = 0, 1, \ldots$ are assumed to be nonrandom and known. All deaths are assumed to be independent of each other. We will consider $k$ age-homogeneous cohorts of sizes $n_1, \ldots, n_k$ and ages $a_1, \ldots, a_k$ at time $t = 0$. We assume that all insured individuals have bought identical contracts. If death occurs at time $t$, the contract pays out the death benefit $\max(D^*, F_t)$, where $D^*$ is a nonrandom guaranteed amount. If an insured person survives until time $T$, the contract pays out the survival benefit $\max(S^*, F_T)$, where again $S^*$ is a nonrandom amount. We finally assume that the insurance company holds the nominal amount $c(n_1 + \cdots + n_k)$ in the risky asset $Y$, that it sells off these assets proportionally to the number of deaths as they occur, and that it sells off the entire remaining amount at time $T$. Let $N^i_t$ denote the number of people alive in cohort $i$ at time $t$, with the following dynamics:
$$N^i_{t+1} \sim \mathrm{Bin}\big(N^i_t,\ 1 - p_{a_i + t}\big), \quad t = 0, \ldots, T-1.$$
These are the same dynamics as in the life insurance example in Section 5 of [11]. Thus, the liability cash flow we consider here is given by
$$L_t = \big(\max(D^*, F_t) - cY_t\big)\sum_{i=1}^{k}\big(N^i_{t-1} - N^i_t\big) + I\{t = T\}\big(\max(S^*, F_T) - cY_T\big)\sum_{i=1}^{k} N^i_T.$$
If we write $S_t = (Y_t, F_t, N^1_t, \ldots, N^k_t)$, then $S := (S_t)_{t=0}^{T}$ will be a Markov chain with the dynamics outlined above. Note that, depending on the number $k$ of cohorts, $S$ might be a fairly high-dimensional Markov chain. Note also that, in addition to the obvious risk factors of mortality and the contractual payout amounts, there is the risk of the insurance company's risky asset $Y$ depreciating in value, which is of course a large risk factor for insurance companies in practice. Here we will consider the case of $k = 4$ cohorts, referred to as the small life insurance model, and the case of $k = 10$ cohorts, referred to as the large life insurance model.

So far, the choice of basis functions has not been addressed. As we are trying to numerically calculate unknown functions whose form we do not know, the approach used here will be a combination of standard polynomial functions, completed with functions that in some way bear resemblance to the underlying liability cash flow. A similar approach for the valuation of American derivatives is taken in for instance [15] and [3], where in the latter it is explicitly advised (see p. 1082) to use the values of related, simpler derivatives as basis functions to price more exotic ones.

In these examples, we will not be overly concerned with model sparsity, covariate significance or efficiency, but rather take the machine-learning approach of simply evaluating models based on out-of-sample performance. This is feasible due to the availability of simulated data for both fitting and out-of-sample validation.
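A single-path simulator for the life insurance state dynamics and the liability cash flow described above can be sketched as follows; the mortality table `p_death` and all parameter values in the sanity check are hypothetical placeholders:

```python
import numpy as np

def simulate_life_cashflow(T, sizes, ages, p_death, mu_Y, s_Y, mu_F, s_F,
                           rho, c, D_star, S_star, y0, f0, rng):
    """One path of L_1, ..., L_T for the unit-linked life model above.
    sizes[i], ages[i]: cohort i at t = 0; p_death[a]: one-year death
    probability at age a."""
    N = np.asarray(sizes, dtype=int)
    ages = np.asarray(ages, dtype=int)
    Y, F = y0, f0
    L = np.zeros(T + 1)
    for t in range(1, T + 1):
        w1, w2 = rng.standard_normal(2)
        Y *= np.exp(mu_Y - 0.5 * s_Y**2 + s_Y * w1)               # exact GBM step
        F *= np.exp(mu_F - 0.5 * s_F**2
                    + s_F * (rho * w1 + np.sqrt(1.0 - rho**2) * w2))
        N_next = rng.binomial(N, 1.0 - p_death[ages + t - 1])     # survivors
        L[t] = (max(D_star, F) - c * Y) * (N - N_next).sum()      # death benefits
        if t == T:
            L[t] += (max(S_star, F) - c * Y) * N_next.sum()       # survival benefits
        N = N_next
    return L
```

With all drifts, volatilities and death probabilities set to zero, only the terminal survival benefit remains, which gives a convenient sanity check on the implementation.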
Since the AR(1)-GARCH(1,1) models can be considered toy models, generic basis functions were chosen. For a single AR(1)-GARCH(1,1) model, the choice of basis functions was all polynomials of the form L_t^i σ_{t+1}^j for 0 ≤ i + j ≤ 2. For the sum of 10 independent AR(1)-GARCH(1,1) models, we denote by L_t and σ_{t+1} the aggregated liability cash flow and standard deviation at time t and t + 1, respectively. We then consider the basis functions consisting of the state vector (L_{t,i}, σ_{t+1,i})_{i=1}^{10} along with L_t^i σ_{t+1}^j for 0 ≤ i + j ≤ 2, omitting the case i = 1, j = 0 to avoid collinearity. Note that the number of basis functions grows linearly with the dimensionality of the state space, rather than quadratically.

For the state S_t = (Y_t, F_t, N_t^1, ..., N_t^k), let p_{t+1}^i be the probability of death during (t, t+1) for an individual in cohort i, with q_{t+1}^i := 1 − p_{t+1}^i. We then introduce the state-dependent variables

μ_{t+1} := Σ_{i=1}^k N_t^i p_{t+1}^i,   σ_{t+1} := (Σ_{i=1}^k N_t^i p_{t+1}^i q_{t+1}^i)^{1/2},   N_t := Σ_{i=1}^k N_t^i.

The first two are the mean and standard deviation of the number of deaths during (t, t+1), the third simply being the total number of people alive at time t. The basis functions we choose consist of the state vector Y_t, F_t, N_t^1, ..., N_t^k together with all products of two factors, where the first factor is an element of the set {μ_{t+1}, σ_{t+1}, N_t} and the other factor is an element of the set

{Y_t, F_t, Y_t^2, F_t^2, F_t^3, Y_t F_t, Y_t F_t^2, (F_t − K_j)_+, (F_t − K_j)_+ Y_t, C(F_t, S*, T, t), C(F_t, D*, t+1, t), C(F_t, S*, T, t) Y_t, C(F_t, D*, t+1, t) Y_t}.

K_j can take values in { , , , }, depending on which covariates of the form (F_t − K_j)_+ had the highest R²-value at time t = 5. Here the R²-values were calculated based on the residuals after performing linear regression with respect to all basis functions not containing elements of the form (F_t − K_j)_+. While this is a somewhat ad hoc approach that could be refined, it is a simple and easy-to-implement example of basis functions. Again, note that the number of basis functions grows linearly with the dimensionality of the state space, rather than quadratically.

For Algorithm 1, M = 5 · and n = 10 were chosen for the life insurance models, and M = 10 and n = 10 for the AR(1)-GARCH(1,1) models. Terminal time T = 6 was used in all cases.
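The state-dependent covariates μ_{t+1}, σ_{t+1} and N_t above are cheap to evaluate from the state; a minimal sketch (function name ours):

```python
import numpy as np

def death_count_features(N_t, p_next):
    """Mean and standard deviation of the number of deaths during (t, t+1),
    and the total number alive at t, from cohort sizes N_t and the
    corresponding one-year death probabilities p_next."""
    mu = float(np.dot(N_t, p_next))
    sigma = float(np.sqrt(np.dot(N_t, p_next * (1.0 - p_next))))
    return mu, sigma, int(np.sum(N_t))
```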
For the validation run, M = 10 and n = 10 were chosen for all models. Due to the extreme quantile level involved, and also based on empirical observations, it was deemed necessary to keep n around this order of magnitude. Similarly, in part due to the number of basis functions involved, it was also observed that performance seemed to increase with M. The choice of M and n on the considered order of magnitude was thus necessary for good model performance; these were also the largest orders of magnitude that were computationally feasible given the available computing power.

For the AR(1)-GARCH(1,1) model, the chosen parameters were α = 1, α = 1, α = 0. , α = 0. , α = 0. . The same choice was used for each of the terms in the sum of 10 AR(1)-GARCH(1,1) processes, making the model a sum of i.i.d. processes.

For the life insurance models, the choice of parameters of the risky assets was μ_Y = μ_F = 0. , σ_Y = σ_F = 0. , ρ = 0. , y_0 = f_0 = 100. The benefit lower bounds were chosen as D* = 100, S* = 110. The death/survival probabilities were calculated using the Makeham formula (for males):

p_a = 1 − exp{ −∫_a^{a+1} μ_x dx },   μ_x := 0.001 + 0. exp{0. x},

where the remaining Makeham constants are those of the Swedish mortality table M90 for males, to which these numbers correspond (the formula for females is identical, but shifted backwards by 6 years to account for the greater longevity of the female population). For the case of 4 cohorts, starting ages (for males) were 50−80 in 10-year increments, and for the case of 10 cohorts the starting ages were 40−85 in 5-year increments.

The algorithms were run on a computer with 8 Intel(R) Core(TM) i7-4770S 3.10GHz processors, and parallel programming was implemented in the nested simulation steps in both Algorithm 1 and the validation algorithm.

One single AR(1)-GARCH(1,1):
  RMSE V: 0.0114, 0.0118, 0.0115, 0.0098, 0.0061
  RMSE R: 0.0533, 0.0556, 0.0553, 0.0455, 0.0285
  RMSE E: 0.0521, 0.0542, 0.0544, 0.0444, 0.0279
A sum of 10 AR(1)-GARCH(1,1):
  RMSE V: 0.0172, 0.0130, 0.0120, 0.0100, 0.0061
  RMSE R: 0.0525, 0.0552, 0.0546, 0.0467, 0.0278
  RMSE E: 0.0536, 0.0544, 0.0535, 0.0458, 0.0273
Life model with 4 cohorts:
  RMSE V: 134.4, 120.3, 134.8, 85.1, 75.5
  RMSE R: 760.3, 682.7, 901.6, 535.9, 575.1
  RMSE E: 742.4, 665.0, 856.8, 536.0, 571.3
Life model with 10 cohorts:
  RMSE V: 331.9, 307.4, 330.8, 226.2, 219.3
  RMSE R: 1730.1, 1719.4, 2148.3, 1431.0, 1928.3
  RMSE E: 1689.2, 1672.8, 2049.8, 1429.4, 1910.3

Table 1: RMSE values for the quantities V, R, E as defined in (27). The five values in each cell are for times t = 1, 2, 3, 4, 5, in that order.
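The Makeham death probabilities used above admit a closed-form one-year integral, so no numerical quadrature is needed. In the sketch below the constants beta and gamma are illustrative placeholders only, since the M90 values are not recoverable from the text above:

```python
import math

def makeham_death_prob(a, alpha=0.001, beta=0.000012, gamma=0.101314):
    """One-year death probability p_a = 1 - exp(-int_a^{a+1} mu_x dx) for a
    Makeham intensity mu_x = alpha + beta*exp(gamma*x). The default beta and
    gamma are placeholders, not the M90 parameters."""
    # int_a^{a+1} mu_x dx = alpha + (beta/gamma)*(e^{gamma(a+1)} - e^{gamma a})
    integral = alpha + (beta / gamma) * (math.exp(gamma * (a + 1)) - math.exp(gamma * a))
    return 1.0 - math.exp(-integral)
```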
The RMSEs and NRMSEs of the LSM models can be seen in Table 1 (RMSEs) and Table 2 (NRMSEs). The ANDPs and AROCs of the LSM models can be seen in Table 3. The tables display quantile ranges with respect to the 2.5% and 97.5% quantiles of the data.

Below, in Figure 4.3, we also present some histograms of the actual returns and risks of ruin, in order to get a sense of the spread of these values. From these we can observe that the quantity representing the actual returns seems to be quite sensitive to model errors, if we recall the rather small size of the RMSE values.

Table 4 displays the running times of each model. As far as is known, the main factor determining the running time of Algorithm 1 is the repeated calculation of the basis functions inside the nested simulation (required to calculate the quantities Y_{t+1}^{i,j} in the inner for-loop). As there are many of these for models with high-dimensional state spaces, running times increase accordingly. It should be noted that Algorithm 1 was not implemented to run as fast as possible for any specific model, beyond the implementation of parallel programming. Speed could potentially be gained by adapting Algorithm 1 to specific models of interest.

Some conclusions can be drawn from the numerical results. Firstly, we can see that, from a mean-squared-error point of view, the LSM model seems to work well in capturing the dynamics of the multiperiod cost-of-capital valuation.
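The error measures reported in Tables 1 and 2 can be sketched as follows. Definitions (27)-(28) are not reproduced in this excerpt, so normalising by the standard deviation of the actual values in nrmse below is an assumption on our part:

```python
import numpy as np

def rmse(estimate, actual):
    """Root-mean-square error across validation scenarios."""
    e, a = np.asarray(estimate, float), np.asarray(actual, float)
    return float(np.sqrt(np.mean((e - a) ** 2)))

def nrmse(estimate, actual):
    """RMSE normalised by the spread of the actual values (assumed form)."""
    a = np.asarray(actual, float)
    return rmse(estimate, a) / float(a.std())
```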
It should be noted that the (N)RMSE of the value V is lower than those of R and E across the board, for all models and times. Since the expression for E is heavily dependent on R, we can suspect that the errors in R and E are positively correlated, and thus that V = R − (1/(1+η))E gets lower mean squared errors as a result.

One single AR(1)-GARCH(1,1):
  NRMSE V (%): 0.0498, 0.0583, 0.0685, 0.0797, 0.0901
  NRMSE R (%): 0.1705, 0.1931, 0.2170, 0.2333, 0.2544
  NRMSE E (%): 0.5810, 0.5962, 0.5930, 0.5747, 0.5885
A sum of 10 AR(1)-GARCH(1,1):
  NRMSE V (%): 0.0758, 0.0642, 0.0715, 0.0813, 0.0912
  NRMSE R (%): 0.1693, 0.1913, 0.2158, 0.2393, 0.2493
  NRMSE E (%): 0.6049, 0.5963, 0.5880, 0.5949, 0.5779
Life model with 4 cohorts:
  NRMSE V (%): 0.2567, 0.2225, 0.2603, 0.1711, 0.1647
  NRMSE R (%): 0.5443, 0.5646, 0.8722, 0.5924, 0.7403
  NRMSE E (%): 0.7109, 0.7535, 1.1381, 0.8306, 1.0271
Life model with 10 cohorts:
  NRMSE V (%): 0.2505, 0.2247, 0.2454, 0.1720, 0.1733
  NRMSE R (%): 0.4911, 0.5592, 0.7948, 0.5952, 0.9056
  NRMSE E (%): 0.6475, 0.7452, 1.0499, 0.8351, 1.2580

Table 2: NRMSE values for the quantities V, R, E as defined in (28). The five values in each cell are for times t = 1, 2, 3, 4, 5, in that order.

Figure 1: The top two figures correspond to the AR(1)-GARCH(1,1) model. The bottom two figures correspond to the life insurance model with 10 cohorts.

Table 3: Quantile ranges of (ANDP_t^{(i)})_{i=1}^M and (AROC_t^{(i)})_{i=1}^M, as defined in (29) and (30). The quantiles considered are 2.5% and 97.5%, for times t = 1, 2, 3, 4, 5.

We can see that increasing model complexity for the AR(1)-GARCH(1,1) and life insurance models seems to have no significant effect on LSM performance. It should be noted that model complexity in both cases is increased by introducing independent stochastic factors: a sum of i.i.d. processes in the AR(1)-GARCH(1,1) case, and the addition of (independent) cohorts in the life insurance case. Thus the de facto model complexity might not have increased much, even though the state space of the Markov process is larger.

When we look at the ANDP and AROC quantities, we see that these seem to vary more than the (N)RMSEs do. Especially AROC, which is defined via a quotient, seems to be sensitive to model error.

One important thing to note with regard to the sensitivity of ANDP and AROC is the presence of errors introduced by the necessity of using Monte Carlo simulations to calculate samples of VaR_{t,1−α}(−·) and E_t[(·)_+]. This can be seen in the AR(1)-GARCH(1,1) case: if we investigate what the value should be, we see that in this case it has a closed form (using positive homogeneity and translation invariance):

ϕ_t(α_0 + α_1 L_t + σ_{t+1} ε_{t+1}) = α_0 + α_1 L_t + σ_{t+1} ϕ(ε),

where ϕ(ε) is deterministic due to law invariance. Since L_t and σ_{t+1} are included in the basis functions for the AR(1)-GARCH(1,1) model, we would expect the fit in this case to be perfect.
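Positive homogeneity and translation invariance can be checked directly on an empirical estimator of a mapping of this kind. The sketch below uses the structure R − E[(R − X)_+]/(1 + η) appearing in Lemma 9, with R an empirical upper quantile of the sample; the quantile convention and the values of alpha and eta are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def phi_alpha(x, alpha=0.005, eta=0.06):
    """Empirical mapping phi(X) = R - E[(R - X)_+]/(1 + eta), with R
    estimated by the (1 - alpha)-quantile of the sample x."""
    r = np.quantile(x, 1.0 - alpha)
    return r - np.mean(np.maximum(r - x, 0.0)) / (1.0 + eta)

rng = np.random.default_rng(1)
x = rng.standard_normal(200_000)

shift = phi_alpha(x + 5.0) - phi_alpha(x)  # translation invariance: 5.0
scale = phi_alpha(3.0 * x) / phi_alpha(x)  # positive homogeneity: 3.0
```

Both identities hold exactly for the empirical estimator (up to floating-point error), since the empirical quantile is equivariant under shifts and positive scalings of the sample.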
Since the fit is not perfect, we conclude that errors may still appear even when optimal basis functions are among our selection of basis functions.

Finally, if we recall that the main purpose of these calculations is to compute the quantity V_t(L), a good approach for validation might be to re-balance the LSM estimates of R_t^{(i)} and E_t^{(i)} so that the LSM estimate of the value V_t^{(i)} remains unchanged, but the LSM estimates are better fitted to R_t^{(i)} and E_t^{(i)}. This re-balancing would not be problematic in the economic model in which this validation scheme plays out. However, in this paper we were also interested in how well the LSM model captures both the VaR term and the expected-value term, so the quantities ANDP and AROC remain relevant to look at.

We have studied the performance of the LSM algorithm in numerically computing recursively defined objects such as (V_t(L, ϕ))_{t=0}^T given in Definition 4, where the mappings ϕ are either L¹-continuous or are given by (21). As part of this study, Lipschitz-like results and conditions for L¹-continuity were established for Value-at-Risk and the associated operator ϕ_{t,α} in Theorem 2 and Corollary 2. Important basic consistency results have been obtained, showing the convergence of the LSM estimator both as the number of basis functions goes to infinity, in Lemmas 1 and 7, and as the size of the simulated data goes to infinity for a fixed number of basis functions, in Theorems 1 and 3. Furthermore, these results are applicable to a large class of conditional monetary risk measures, utility functions and various actuarial multi-period valuations, the only requirement being L¹-continuity or a property like that established in Theorem 2. We also apply and evaluate the LSM algorithm with respect to the multi-period cost-of-capital valuation considered in [11] and [16], and in doing so also provide insight into practical considerations concerning implementation and validation of the LSM algorithm.

Proof of Lemma 1.
Note that the quantities defined in (10) and (17) are independent of D; hence all norms below are a.s. constants. Define ε_t := V̂_{N,t} − V_t. We will now show via backwards induction, starting from time t = T, that ||ε_t|| → 0. The induction base is trivial, since V̂_{N,T}(L) = V_T(L) = 0. Now assume that ||ε_{t+1}|| → 0. Then

||ε_t|| ≤ ||V̂_{N,t} − ϕ_t(L_{t+1} + V̂_{N,t+1}(L))|| + ||ϕ_t(L_{t+1} + V̂_{N,t+1}(L)) − ϕ_t(L_{t+1} + V_{t+1}(L))||.

By the induction assumption and the continuity assumption, we know that the second summand goes to 0. We now need to show that ||V̂_{N,t} − ϕ_t(L_{t+1} + V̂_{N,t+1}(L))|| → 0. Now we simply note, by the definition of the projection operator and the denseness of the approximating sets,

||V̂_{N,t} − ϕ_t(L_{t+1} + V̂_{N,t+1}(L))|| = inf_{B ∈ B_N} ||B − ϕ_t(L_{t+1} + V̂_{N,t+1}(L))||
≤ [ inf_{B ∈ B_N} ||B − ϕ_t(L_{t+1} + V_{t+1})|| ] + ||ϕ_t(L_{t+1} + V̂_{N,t+1}(L)) − ϕ_t(L_{t+1} + V_{t+1})||.

By our assumptions, both of these terms go to zero, as ϕ_t(L_{t+1} + V_{t+1}(L)) is a function of the state S_t, which lies in L¹(F_t).

Proof of Lemma 2.
We first note that if β̂_{t,N,Z_{M,t}}^{(M)} → β_{t,N,Z_t} in probability, then ||(β_{t,N,Z_t})^T Φ_{t,N} − (β̂_{t,N,Z_{M,t}}^{(M)})^T Φ_{t,N}|| → 0 in probability, since β̂_{t,N,Z_{M,t}}^{(M)} is independent of Φ_{t,N} and F-measurable, while Φ_{t,N} is independent of D. Hence it suffices to show that β̂_{t,N,Z_{M,t}}^{(M)} → β_{t,N,Z_t} in probability. Now, recalling the definition of β̂_{t,N,Z_{M,t}}^{(M)}, we re-write (14) as

β̂_{t,N,Z_t}^{(M)} = ( (1/M) (Φ_{t,N}^{(M)})^T Φ_{t,N}^{(M)} )^{−1} (1/M) (Φ_{t,N}^{(M)})^T Z_t^{(M)}.

Furthermore, recall the form of β_{t,N,Z_t} given by (13). We first note that since, by the law of large numbers, (1/M)(Φ_{t,N}^{(M)})^T Φ_{t,N}^{(M)} → E[Φ_{t,N} Φ_{t,N}^T] almost surely, and thus in probability, it suffices to show that

(1/M) (Φ_{t,j}(S_t^{(i)}))_{1≤i≤M}^T Z_t^{(M)} → E[Φ_{t,j}(S_t) Z_t]

in probability for each j = 1, ..., N. We first note that, letting ε_M^{(i)} := Z_{M,t}^{(i)} − z_t(S_t^{(i)}),

| (1/M) Σ_{i=1}^M Φ_{t,j}(S_t^{(i)}) Z_{M,t}^{(i)} − E[Φ_{t,j}(S_t) Z_t] |
≤ | (1/M) Σ_{i=1}^M Φ_{t,j}(S_t^{(i)}) z_t(S_t^{(i)}) − E[Φ_{t,j}(S_t) Z_t] | + | (1/M) Σ_{i=1}^M Φ_{t,j}(S_t^{(i)}) ε_M^{(i)} |.

The first summand goes to zero in probability by the law of large numbers. Thus, we investigate the second summand using Hölder's inequality:

| (1/M) Σ_{i=1}^M Φ_{t,j}(S_t^{(i)}) ε_M^{(i)} | ≤ ( (1/M) Σ_{i=1}^M (Φ_{t,j}(S_t^{(i)}))² )^{1/2} ( (1/M) Σ_{i=1}^M (ε_M^{(i)})² )^{1/2}.

We see that, again by the law of large numbers, the first factor converges to (E[(Φ_{t,j}(S_t))²])^{1/2} in probability. Now we look at the second factor. By our independence assumption, ε_M^{(i)} =_d Z_{t,M} − Z_t and thus

Var( ( (1/M) Σ_{i=1}^M (ε_M^{(i)})² )^{1/2} | F ) ≤ E[ (1/M) Σ_{i=1}^M (ε_M^{(i)})² | F ] = ||Z_{t,M} − Z_t||²,

which, by assumption, goes to 0 in probability; hence the expression goes to zero in probability. This concludes the proof.

Proof of Theorem 1.
We prove the statement by backwards induction, starting from time t = T. As before, the induction base follows immediately from our assumptions. Now assume ||V̂_{N,t+1}(L) − V̂_{N,t+1}^{(M)}(L)|| → 0 as M → ∞. By L¹-continuity we get that

||ϕ_t(L_{t+1} + V̂_{N,t+1}(L)) − ϕ_t(L_{t+1} + V̂_{N,t+1}^{(M)}(L))|| → 0,

and hence ||V̂_{N,t}(L) − V̂_{N,t}^{(M)}(L)|| → 0.

Proof of Lemma 3. Note that

||ϕ_t(X) − ϕ_t(Y)|| = E[|ϕ_t(X) − ϕ_t(Y)|] ≤ E[ K E_t[|X − Y|] ] ≤ K E[ E_t[|X − Y|] ] = K E[|X − Y|] = K ||X − Y||.

Here we have used Jensen's inequality and the tower property of the conditional expectation at the second inequality and the following equality, respectively. From this, L¹-continuity immediately follows.

Proof of Lemma 4.
By Lemma 9, to construct upper and lower bounds for a quantity given by ϕ_{t,α}(ξ) we may find upper and lower bounds for ρ_t(−ξ) and insert them into the expression for ϕ_{t,α}(ξ). Now take X, Y ∈ L^p(F_{t+1}) and let Z := Y − X. By monotonicity we get that

ϕ_t(X − |Z|) ≤ ϕ_t(Y) ≤ ϕ_t(X + |Z|).

We now observe that

ρ_t(−(X + |Z|)) ≤ ρ_t(−X) + K E_t[|Z|],   ρ_t(−(X − |Z|)) ≥ ρ_t(−X) − K E_t[|Z|].

We use this to also observe that, by the subadditivity of the (·)_+ operation,

−E_t[(ρ_t(−X) + K E_t[|Z|] − X − |Z|)_+] ≤ −E_t[(ρ_t(−X) − X)_+] + E_t[(ρ_t(−|Z|) − |Z|)_+]
≤ −E_t[(ρ_t(−X) − X)_+] + ρ_t(−|Z|) ≤ −E_t[(ρ_t(−X) − X)_+] + K E_t[|Z|].

Similarly, we have that

−E_t[(ρ_t(−X) − K E_t[|Z|] − X + |Z|)_+] ≥ −E_t[(ρ_t(−X) − X)_+] − E_t[(ρ_t(−|Z|) − |Z|)_+]
≥ −E_t[(ρ_t(−X) − X)_+] − ρ_t(−|Z|) ≥ −E_t[(ρ_t(−X) − X)_+] − K E_t[|Z|].

From this we get that

ϕ_t(Y) ≤ ϕ_t(X + |Z|) ≤ ϕ_t(X) + K E_t[|Z|] + (1/(1+η)) K E_t[|Z|],
ϕ_t(Y) ≥ ϕ_t(X − |Z|) ≥ ϕ_t(X) − K E_t[|Z|] − (1/(1+η)) K E_t[|Z|],

from which Lipschitz continuity with respect to the constant 2K immediately follows.

Proof of Lemma 5. By subadditivity we have that

ρ_{t,M}(Y) − ρ_{t,M}(X) ≤ ρ_{t,M}(Y − X) ≤ ρ_{t,M}(−|Y − X|),
ρ_{t,M}(X) − ρ_{t,M}(Y) ≤ ρ_{t,M}(X − Y) ≤ ρ_{t,M}(−|Y − X|).

Now we simply note that

ρ_{t,M}(−|Y − X|) = −∫_0^1 F_{t,−|Y−X|}^{−1}(u) m(u) du ≤ m(0) ∫_0^1 F_{t,|Y−X|}^{−1}(u) du = m(0) E_t[|Y − X|].

This concludes the proof.

Proof of Lemma 6. We begin by showing (22). Let E = {Z ≤ VaR_{t,1−δ}(−Z)}. Then:

P_t(X + Z ≤ y) ≥ P_t(E ∩ {X + Z ≤ y}) ≥ P_t(E ∩ {X + VaR_{t,1−δ}(−Z) ≤ y})
≥ P_t(X + VaR_{t,1−δ}(−Z) ≤ y) − P_t(E^∁) ≥ P_t(X ≤ y − VaR_{t,1−δ}(−Z)) − δ.

Putting y = VaR_{t,α+δ}(−X) + VaR_{t,1−δ}(−Z) yields

P_t(X ≤ y − VaR_{t,1−δ}(−Z)) − δ ≥ α + δ − δ = α.

Hence VaR_{t,α}(−(X + Z)) ≤ VaR_{t,α+δ}(−X) + VaR_{t,1−δ}(−Z). We now prove (23) by applying (22):

VaR_{t,α−δ}(−X) = VaR_{t,α−δ}(−(X + Z + (−Z))) ≤ VaR_{t,α}(−(X + Z)) + VaR_{t,1−δ}(Z),

from which we get (23).

Proof of Corollary 1. Let Z = Y − X. Now we simply note that, for any δ,

VaR_{t,α}(−(X + Z)) ≤ VaR_{t,α}(−(X + |Z|)) ≤ VaR_{t,α+δ}(−X) + VaR_{t,1−δ}(−|Z|).

By Markov's inequality, we may bound the latter summand:

VaR_{t,1−δ}(−|Z|) ≤ (1/δ) E_t[|Z|].

Now, for the lower bound, we similarly note

VaR_{t,α}(−(X + Z)) ≥ VaR_{t,α}(−(X − |Z|)) ≥ VaR_{t,α−δ}(−X) + VaR_{t,1−δ}(|Z|),

where again we may bound the second summand using Markov's inequality:

VaR_{t,1−δ}(|Z|) ≥ −(1/(1 − δ)) E_t[|Z|] ≥ −(1/δ) E_t[|Z|],

since we have assumed δ < 1/2. This immediately yields that, almost surely,

VaR_{t,α}(−Y) ∈ [ VaR_{t,α−δ}(−X) − (1/δ) E_t[|X − Y|], VaR_{t,α+δ}(−X) + (1/δ) E_t[|X − Y|] ],

which is the desired result.
Lemma 9. For any X ∈ L¹(F_{t+1}) and R_1, R_2 ∈ L¹(F_t) with R_1 ≤ R_2 a.s.,

R_1 − (1/(1+η)) E_t[(R_1 − X)_+] ≤ R_2 − (1/(1+η)) E_t[(R_2 − X)_+] a.s.

Proof of Lemma 9.
Let R_1 ≤ R_2 a.s. and let A_1 = {R_1 − X ≥ 0} and A_2 = {R_2 − X ≥ 0}. Note that A_1 ⊆ A_2 almost surely, i.e. P_t(A_1 \ A_2) = 0 a.s. We now note that:

R_1 − (1/(1+η)) E_t[(R_1 − X)_+] = (1 − (1/(1+η)) P_t(A_1)) R_1 + (1/(1+η)) E_t[I_{A_1} X],
R_2 − (1/(1+η)) E_t[(R_2 − X)_+] = (1 − (1/(1+η)) P_t(A_2)) R_2 + (1/(1+η)) E_t[I_{A_2} X].

We look at the expectation in the first expression:

E_t[I_{A_1} X] = E_t[I_{A_2} X] − E_t[X I_{A_2 \ A_1}] ≤ E_t[I_{A_2} X] − P_t(A_2 \ A_1) R_1.

We now see that

(1 − (1/(1+η)) P_t(A_1)) R_1 + (1/(1+η)) E_t[I_{A_1} X]
≤ (1 − (1/(1+η)) P_t(A_1)) R_1 + (1/(1+η)) E_t[I_{A_2} X] − (1/(1+η)) P_t(A_2 \ A_1) R_1
= (1 − (1/(1+η)) P_t(A_2)) R_1 + (1/(1+η)) E_t[I_{A_2} X]
≤ (1 − (1/(1+η)) P_t(A_2)) R_2 + (1/(1+η)) E_t[I_{A_2} X].

This concludes the proof.

Proof of Theorem 2. Let Z = Y − X. Note that

ϕ_{t,α}(Y) = ϕ_{t,α}(X + Z) ≤ ϕ_{t,α}(X + |Z|).

As for the VaR-part of ϕ_{t,α}, including that in the expectation, we note that by Lemma 6, VaR_{t,α}(−(X + |Z|)) ≤ VaR_{t,α+δ}(−X) + VaR_{t,1−δ}(−|Z|). We now note that, by the subadditivity of x ↦ (x)_+,

−E_t[(VaR_{t,α+δ}(−X) + VaR_{t,1−δ}(−|Z|) − X − |Z|)_+]
≤ −E_t[(VaR_{t,α+δ}(−X) − X)_+] + E_t[(VaR_{t,1−δ}(−|Z|) − |Z|)_+]
≤ −E_t[(VaR_{t,α+δ}(−X) − X)_+] + VaR_{t,1−δ}(−|Z|).

Hence

ϕ_{t,α}(X + |Z|) ≤ ϕ_{t,α+δ}(X) + ((2+η)/(1+η)) VaR_{t,1−δ}(−|Z|) ≤ ϕ_{t,α+δ}(X) + (2/δ) E_t[|Z|].

Here we have used the Markov's inequality bound from Corollary 1.

We now similarly construct a lower bound for ϕ_{t,α}(Y):

ϕ_{t,α}(Y) = ϕ_{t,α}(X + Z) ≥ ϕ_{t,α}(X − |Z|).

Again, for the VaR-part, we note that by Lemma 6, VaR_{t,α}(−(X − |Z|)) ≥ VaR_{t,α−δ}(−X) + VaR_{t,1−δ}(|Z|). We now analyze the resulting expected-value part, using the subadditivity of x ↦ (x)_+:

−E_t[(VaR_{t,α−δ}(−X) + VaR_{t,1−δ}(|Z|) − X + |Z|)_+]
≥ −E_t[(VaR_{t,α−δ}(−X) − X)_+] − E_t[(VaR_{t,1−δ}(|Z|) + |Z|)_+]
≥ −E_t[(VaR_{t,α−δ}(−X) − X)_+] − E_t[|Z|].

Hence we get the lower bound

ϕ_{t,α}(X − |Z|) ≥ ϕ_{t,α−δ}(X) + VaR_{t,1−δ}(|Z|) − (1/(1+η)) E_t[|Z|] ≥ ϕ_{t,α−δ}(X) − (2/δ) E_t[|Z|].

Here, again, we have used the Markov's inequality bound from Corollary 1. Hence we have shown that

ϕ_{t,α}(Y) ∈ [ ϕ_{t,α−δ}(X) − (2/δ) E_t[|X − Y|], ϕ_{t,α+δ}(X) + (2/δ) E_t[|X − Y|] ],

from which (24) immediately follows.

Proof of Corollary 2. Choose a sequence δ_n → 0 such that (1/δ_n) E[|X − X_n|] → 0 and α + δ_n < 1, δ_n < 1/2. We now use the following inequality, which follows from the monotonicity of ϕ_{t,α} in α:

|ϕ_{t,α}(X) − ϕ_{t,α}(X_n)| ≤ ϕ_{t,α+δ_n}(X) − ϕ_{t,α−δ_n}(X) + inf_{|ε|<δ_n} |ϕ_{t,α+ε}(X) − ϕ_{t,α}(X_n)|
≤ ϕ_{t,α+δ_n}(X) − ϕ_{t,α−δ_n}(X) + (2/δ_n) E[|X − X_n| | H_t].

By L¹-convergence and our choice of δ_n, the last term clearly goes to 0 by our assumptions. As for the first summand, we see that for any sequence δ_n → 0, ϕ_{t,α+δ_n}(X) − ϕ_{t,α−δ_n}(X) → 0 (by the assumed a.s. continuity of the t-conditional distribution at α), and furthermore it is a decreasing sequence of nonnegative random variables in L¹(H_t). Hence, by Lebesgue's monotone convergence theorem, ||ϕ_{t,α+δ_n}(X) − ϕ_{t,α−δ_n}(X)|| → 0. This concludes the proof.
Proof of Lemma 7.
By Lemma 2, ϕ_{t,α} is L¹-continuous with respect to limit objects with a.s. continuous t-conditional distributions. Hence the proof is completely analogous to that of Lemma 1.

Proof of Lemma 8. Fix ε > 0. We want to show that P(||ϕ_{t,α}(β^T Φ_{t+1,N}) − ϕ_{t,α}(β_n^T Φ_{t+1,N})|| > ε) → 0 as n → ∞. We first note that, for any δ ∈ (0, 1 − α) with δ < 1/2, we have an inequality similar to that in the proof of Corollary 2:

||ϕ_{t,α}(β^T Φ_{t+1,N}) − ϕ_{t,α}(β_n^T Φ_{t+1,N})||
≤ ||ϕ_{t,α−δ}(β^T Φ_{t+1,N}) − ϕ_{t,α+δ}(β^T Φ_{t+1,N})|| + inf_{|ξ|≤δ} ||ϕ_{t,α+ξ}(β^T Φ_{t+1,N}) − ϕ_{t,α}(β_n^T Φ_{t+1,N})||.

If we look at the first summand, we see that for any sequence δ_n → 0, ϕ_{t,α−δ_n}(β^T Φ_{t+1,N}) − ϕ_{t,α+δ_n}(β^T Φ_{t+1,N}) → 0, and hence ||ϕ_{t,α−δ_n}(β^T Φ_{t+1,N}) − ϕ_{t,α+δ_n}(β^T Φ_{t+1,N})|| → 0. We now apply Theorem 2 to the second term to see that

inf_{|ξ|≤δ} ||ϕ_{t,α+ξ}(β^T Φ_{t+1,N}) − ϕ_{t,α}(β_n^T Φ_{t+1,N})|| ≤ (2/δ) ||(β − β_n)^T Φ_{t+1,N}|| ≤ (2/δ) ||β − β_n||_∞ ||Φ_{t+1,N}||.

Note that ||Φ_{t+1,N}|| =: K_Φ is just a constant. We now note that, as ||β − β_n||_∞ → 0, for any fixed ε > 0 it is possible to choose a sequence δ_n → 0 such that P((2 K_Φ / δ_n) ||β − β_n||_∞ > ε) → 0. Hence, for any fixed ε > 0, P(||ϕ_{t,α}(β^T Φ_{t+1,N}) − ϕ_{t,α}(β_n^T Φ_{t+1,N})|| > ε) → 0 as n → ∞.

Proof of Theorem 3.
We prove the statement by backwards induction, starting from time t = T. The induction base is trivial. Now assume that the statement holds for time t + 1. But then, by Lemma 8,

||ϕ_{t,α}(L_{t+1} + V̂_{N,t+1,α}(L)) − ϕ_{t,α}(L_{t+1} + V̂_{N,t+1,α}^{(M)}(L))|| → 0,

and hence ||V̂_{N,t,α}(L) − V̂_{N,t,α}^{(M)}(L)|| → 0.

References

[1] Philippe Artzner, Freddy Delbaen, Jean-Marc Eber, David Heath and Hyejin Ku (2007), Coherent multiperiod risk adjusted values and Bellman's principle. Annals of Operations Research, 152, 5-22.

[2] D. Barrera, S. Crépey, B. Diallo, G. Fort, E. Gobet and U. Stazhynski (2018), Stochastic approximation schemes for economic capital and risk margin computations.
HAL <hal-01710394>.

[3] Mark Broadie, Yiping Du and Ciamac C. Moallemi (2015), Risk estimation via regression. Operations Research, 63 (5), 1077-1097.

[4] Patrick Cheridito, Freddy Delbaen and Michael Kupper (2006), Dynamic monetary risk measures for bounded discrete-time processes. Electronic Journal of Probability, 11, 57-106.

[5] Patrick Cheridito and Michael Kupper (2011), Composition of time-consistent dynamic monetary risk measures in discrete time. International Journal of Theoretical and Applied Finance, 14 (1), 137-162.

[6] Patrick Cheridito and Michael Kupper (2009), Recursiveness of indifference prices and translation-invariant preferences. Mathematics and Financial Economics, 2 (3), 173-188.

[7] European Commission (2015), Commission delegated regulation (EU) 2015/35 of 10 October 2014. Official Journal of the European Union.

[8] Łukasz Delong, Jan Dhaene and Karim Barigou (2019), Fair valuation of insurance liability cash-flow streams in continuous time: Applications. ASTIN Bulletin, 49 (2), 299-333.

[9] Kai Detlefsen and Giacomo Scandolo (2005), Conditional and dynamic convex risk measures. Finance and Stochastics, 9, 539-561.

[10] Jan Dhaene, Ben Stassen, Karim Barigou, Daniël Linders and Ze Chen (2017), Fair valuation of insurance liabilities: Merging actuarial judgement and market-consistency. Insurance: Mathematics and Economics, 76, 14-27.

[11] Hampus Engsner, Mathias Lindholm and Filip Lindskog (2017), Insurance valuation: A computable multi-period cost-of-capital approach. Insurance: Mathematics and Economics, 72, 250-264.

[12] Hampus Engsner and Filip Lindskog (2020), Continuous-time limits of multi-period cost-of-capital margins. Statistics and Risk Modeling, forthcoming (DOI: https://doi.org/10.1515/strm-2019-0008).

[13] Daniel Egloff (2005), Monte Carlo algorithms for optimal stopping and statistical learning. The Annals of Applied Probability, 15 (2), 1396-1432.

[14] Hans Föllmer and Alexander Schied (2016), Stochastic Finance: An Introduction in Discrete Time, 4th edition, De Gruyter Graduate.

[15] Francis A. Longstaff and Eduardo S. Schwartz (2001), Valuing American options by simulation: A simple least-squares approach. The Review of Financial Studies, 14 (1), 113-147.

[16] Christoph Möhr (2011), Market-consistent valuation of insurance liabilities by cost of capital. ASTIN Bulletin, 41, 315-341.

[17] Antoon Pelsser and Ahmad Salahnejhad Ghalehjooghi (2020), Time-consistent and market-consistent actuarial valuation of the participating pension contract. Scandinavian Actuarial Journal, forthcoming (DOI: https://doi.org/10.1080/03461238.2020.1832911).

[18] Antoon Pelsser and Ahmad Salahnejhad Ghalehjooghi (2016), Time-consistent actuarial valuations. Insurance: Mathematics and Economics, 66, 97-112.

[19] Antoon Pelsser and Mitja Stadje (2014), Time-consistent and market-consistent evaluations. Mathematical Finance, 24 (1), 25-65.

[20] Lars Stentoft (2004), Convergence of the least squares Monte Carlo approach to American option valuation. Management Science, 50 (9), 1193-1203.

[21] John N. Tsitsiklis and Benjamin Van Roy (2001), Regression methods for pricing complex American-style options. IEEE Transactions on Neural Networks, 12 (4), 694-703.

[22] Daniel Z. Zanger (2009), Convergence of a least-squares Monte Carlo algorithm for bounded approximating sets. Applied Mathematical Finance, 16 (2), 123-150.

[23] Daniel Z. Zanger (2013), Quantitative error estimates for a least-squares Monte Carlo algorithm for American option pricing. Finance and Stochastics, 17, 503-534.

[24] Daniel Z. Zanger (2018), Convergence of a least-squares Monte Carlo algorithm for American option pricing with dependent sample data.