The Longstaff--Schwartz algorithm for Lévy models: Results on fast and slow convergence
aa r X i v : . [ m a t h . P R ] A p r The Annals of Applied Probability (cid:13)
Institute of Mathematical Statistics, 2011
THE LONGSTAFF–SCHWARTZ ALGORITHM FOR L´EVY MODELS:RESULTS ON FAST AND SLOW CONVERGENCE
By Stefan Gerhold Vienna University of Technology
We investigate the Longstaff–Schwartz algorithm for Americanoption pricing assuming that both the number of regressors and thenumber of Monte Carlo paths tend to infinity. Our main resultsconcern extensions, respectively, applications of results by Glasser-man and Yu [
Ann. Appl. Probab. (2004) 2090–2119] and Stentoft[ Manag. Sci. (2004) 1193–1203] to several L´evy models, in partic-ular the geometric Meixner model. A convenient setting to analyzethis convergence problem is provided by the L´evy–Sheffer systemsintroduced by Schoutens and Teugels.
1. Introduction.
PDE or tree methods for pricing financial products be-come ineffective in the presence of many stochastic factors and path de-pendent payoff structures. When resorting to Monte Carlo, early exercisefeatures like callability or flip options pose difficulties. Typical examples arethe pricing of callable LIBOR exotics with the LIBOR market model [3] orthe valuation of life insurance contracts with early exercise features [1].The least squares Monte Carlo approach by Longstaff and Schwartz [17]has become the standard method to deal with such American/Bermudanproducts. It proceeds by backward induction and estimates value functionsby regression on a prescribed set of basis functions. The computed exercisestrategy is suboptimal, resulting in a lower bound for the option price; seeBelomestny, Bender and Schoenmakers [2] for recent work on upper bounds.Fouque and Han [12] discuss numerical aspects of American option pricing,including variance reduction.
Received March 2008; revised March 2010. Supported by the Christian Doppler Laboratory for Portfolio Risk Management(PRisMa Lab) . The Bank Austria Creditanstalt (BA-CA)and the Austrian Federal Financing Agency ( ¨OBFA) through CDG is gratefully acknowl-edged.
AMS 2000 subject classifications.
Primary 62P05; secondary 33C45.
Key words and phrases.
Option pricing, dynamic programming, Monte Carlo, regres-sion, orthogonal polynomials, L´evy–Meixner systems
This is an electronic reprint of the original article published by theInstitute of Mathematical Statistics in
The Annals of Applied Probability ,2011, Vol. 21, No. 2, 589–608. This reprint differs from the original in paginationand typographic detail. 1
S. GERHOLD
The convergence analysis of the Longstaff–Schwartz algorithm was com-menced in the original paper [17] and was carried out in detail by Cl´ement,Lamberton and Protter [4]. They show convergence of the regression ap-proximation to the true Bermudan price and convergence of the MonteCarlo procedure for a fixed number of basis functions. Glasserman andYu [14] and Stentoft [25] have analyzed settings in which the number ofbasis functions and the number of simulation paths increase together. Inparticular, Glasserman and Yu [14] have shown that the number of pathsmust grow exponentially in the number of basis functions if the underlyingprocess is Brownian motion or geometric Brownian motion. On the otherhand, Stentoft [25] appealed to results on series estimators [7, 20] to obtainpolynomial growth for rather general models, assuming that the underly-ing has a bounded state space. The latter assumption was also imposed byEglof, Kohler and Todorovic [9, 10] in the analysis of their extension of theLongstaff–Schwartz algorithm.In the present paper we discuss the applicability of Stentoft’s results toexponential L´evy models and extend Glasserman and Yu’s analysis to severalmodels, including the Meixner model [16, 22]. These latter results provide anapplication of the neat martingale properties that Schoutens and Teugels [21,23] found for certain L´evy processes and families of orthogonal polynomials.In the following section we recall the dynamic programming principle andthe Longstaff–Schwartz algorithm. We show how Stentoft’s [25] convergenceresult can be applied to L´evy models, in particular, to the Meixner model.This involves discussing the assumption of a bounded underlying and thesmoothness of the value functions occurring in the backward induction.In Section 3 we describe the problem that Glasserman and Yu [14] treated.The main difference to Stentoft’s setting is the unbounded support of theunderlying. Section 4 recalls the notions of Sheffer system and L´evy–Meixnersystem. Besides Brownian motion, this theory yields four processes that lendthemselves to the investigation: the Meixner, standard Poisson, Gamma andPascal processes [8, 13]. In Section 5 we assume that our option has onlythree exercise opportunities resulting in a single regression and show how fastthe number of simulation paths must increase in order to ensure convergenceof the Longstaff–Schwartz algorithm for a growing number of basis functions.Finally, Section 6 contains an analogous bound for the multi-period setting,which is weaker, but upon inversion still leads to the same critical asymptoticrate as the single-period case. In the course of the proofs it turns out thatthe different critical rate pertaining to Brownian motion stems from thecomparatively slow growth of the linearization coefficients of the associatedL´evy–Meixner system, namely, the Hermite polynomials.
2. Bounded state space and fast convergence.
Suppose that our assetfollows a Markov process S t . We assume throughout the paper that the ONGSTAFF–SCHWARTZ ALGORITHM FOR L´EVY MODELS interest rate is zero; extending our results to a constant interest rate r > t < · · · < t m .The payoff from exercise is h n ( S t n ) for given functions h n , 0 ≤ n ≤ m . Bythe dynamic programming principle the option value at time t = 0 equals V = max { h ( S ) , C ( S ) } , where the continuation values C n are given by C m ( x ) = 0 ,C n ( x ) = E [max { h n +1 ( S t n +1 ) , C n +1 ( S t n +1 ) } | S t n = x ] , ≤ n < m. Suppose that N sample paths of the underlying are simulated. Longstaff andSchwartz [17] propose to approximate the continuation values by a linearcombination of basis functions ψ nk , C n ( x ) ≈ K X k =0 β nk ψ nk ( x ) = β T n ψ n ( x ) , where β n = ( β n , . . . , β nK ) T is a vector of real numbers which is estimatedby regression over the simulated paths and ψ n ( x ) = [ ψ n ( x ) , . . . , ψ nK ( x )] T .To obtain a good convergence result as N and K both tend to infinity,Stentoft [25] assumes that samples above and below certain thresholds arediscarded. So let us fix finite truncation intervals I , . . . , I m ⊂ ]0 , ∞ [ anddiscard all sample paths with S t n / ∈ I n when estimating the continuationvalue C n ( x ). We are then estimating the following “truncated” continuationvalues: C tr m ( x ) = 0 ,C tr n ( x ) = I n ( x ) · E [max { h n +1 ( S t n +1 ) , C tr n +1 ( S t n +1 ) } | S t n = x ] , ≤ n < m,C tr0 ( x ) = E [max { h ( S t ) , C tr1 ( S t ) } | S = x ] . The option value at time t = 0 approximately equals V tr0 = max { h ( S ) , C tr0 ( S ) } . (2.1)Outside of the truncation intervals I n ⊂ ]0 , ∞ [ we extrapolate by zero sinceit does not matter in the theoretical analysis.Besides truncation, another possibility to make the state space boundedwould be absorption of the underlying process at some lower and upperbounds [9, 10]. This, however, causes atoms in the distribution so thatStentoft’s result is no longer applicable as it requires the existence of adensity.We assume in the present section that the underlying has the followingdynamics. (Recall that we suppose throughout that the interest rate is zero.) S. GERHOLD
Assumption A (Exponential L´evy dynamics). The risk neutral dynam-ics of the underlying are S t = S exp( X t ) , where X t is a L´evy process with X = 0. The support of X t is the whole realline for t > X t has a continuous density function. Assumption B (Value smoothness). Let the function h be of, at most,linear growth and such that h ( S T ) is integrable for each T >
0. Then E [ h ( S T ) | S = x ] is a C -smooth function of x .Without going into detail we note that Stentoft [25] imposes the followingadditional assumptions: Assumption C (Further technical assumptions). The basis functionsare shifted Legendre polynomials, the continuation values C n ( S t n ) are inthe L -span of the regressors, the simulated paths are independent and theprobability that the exercise payoff exactly equals the continuation value iszero.Now Stentoft’s main result ([25], Theorem 2), specialized to L´evy models,reads as follows. (By “truncated algorithm” we mean that we discard thesamples outside the intervals I n as explained above.) Theorem 1.
Fix arbitrary finite truncation intervals I , . . . , I m con-tained in ]0 , ∞ [ and assume that Assumptions A–C hold. Let N (the numberof paths) and K (the number of basis functions) tend to infinity such that K /N → . Then the option prices computed by the truncated Longstaff–Schwartz algorithm converge to V tr0 , defined by (2.1). If the truncation intervals are large enough, then one would hope that theapproximate price V tr0 is close to the exact price V . We will now show thatthis is indeed the case for L´evy models, assuming mild integrability and (atmost) linearly growing payoff functions. Assumption D (Integrability). For each t there are p > p ′ > S pt and S − p ′ t are integrable. Assumption E (Linear payoff growth). The payoff functions grow atmost linearly, | h n ( x ) | ≤ c (1 + x ) , x ≥ , ≤ n ≤ m , for some c > . (2.2) ONGSTAFF–SCHWARTZ ALGORITHM FOR L´EVY MODELS Theorem 2.
Assume that Assumptions A, D and E are satisfied andthat the truncation intervals satisfy I n = [ b − n , b n ] , ≤ n < m, where b n = b νn +1 , ≤ n < m − , (2.3) with ν = min (cid:26) p ′ p ′ + q , pp + q (cid:27) and p + 1 q = 1 . Then V tr0 converges to the exact option price V as b m tends to infinity. Note that ν = 1 − /p in (2.3) if p = p ′ . In particular, if all moments ofthe underlying and its reciprocal exist, like in the Black–Scholes model, thenthe exponent ν ∈ ]0 ,
1[ is arbitrarily close to 1.
Proof of Theorem 2.
A trivial induction, using the martingale prop-erty of S t , shows that the continuation values C n ( x ) and C tr n ( x ) satisfy thebound (2.2) too. We will show that for all nC tr n ( x ) = C n ( x ) + o (1) as b m → ∞ , uniformly w.r.t. x ∈ I n . For x ∈ I n , we have C n ( x ) − C tr n ( x )(2.4) = E [ { S tn +1 / ∈ I n +1 } max { h n +1 ( S t n +1 ) , C n +1 ( S t n +1 ) } | S t n = x ]+ E [ { S tn +1 ∈ I n +1 } max { h n +1 ( S t n +1 ) , C n +1 ( S t n +1 ) } | S t n = x ] − E [ { S tn +1 ∈ I n +1 } max { h n +1 ( S t n +1 ) , C tr n +1 ( S t n +1 ) } | S t n = x ](2.5) − E [ { S tn +1 / ∈ I n +1 } max { h n +1 ( S t n +1 ) , C tr n +1 ( S t n +1 ) } | S t n = x ] . It follows readily from the induction hypothesis that the difference of thesecond and the third term is uniformly o (1) on I n , as b m − → ∞ . In thefollowing, we write c for various positive constants whose precise value isirrelevant. Now let us estimate the first and the last expectation on theright-hand side of (2.5). Again, for x ∈ I n we use H¨older’s inequality andMinkowski’s inequality to see that each of them is bounded by E [ { S tn +1 / ∈ I n +1 } c (1 + S t n +1 ) | S t n = x ] ≤ c P [ S t n +1 / ∈ I n +1 | S t n = x ] /q · E [(1 + S t n +1 ) p | S t n = x ] /p S. GERHOLD = c P (cid:20) x S t n +1 S t n / ∈ I n +1 (cid:21) /q · E (cid:20)(cid:18) x S t n +1 S t n (cid:19) p (cid:21) /p (2.6) ≤ c (cid:18) − F n (cid:18) b n +1 x (cid:19) + F n (cid:18) xb n +1 (cid:19)(cid:19) /q (cid:18) x E (cid:20)(cid:18) S t n +1 S t n (cid:19) p (cid:21) /p (cid:19) ≤ cb n (cid:18) − F n (cid:18) b n +1 b n (cid:19) + F n (cid:18) b n b n +1 (cid:19)(cid:19) /q , where F n is the distribution function of S t n +1 /S t n . Now note that b qn (cid:18) − F n (cid:18) b n +1 b n (cid:19)(cid:19) = b p + qn b − pn +1 (cid:18) b n +1 b n (cid:19) p (cid:18) − F n (cid:18) b n +1 b n (cid:19)(cid:19) ≤ (cid:18) b n +1 b n (cid:19) p (cid:18) − F n (cid:18) b n +1 b n (cid:19)(cid:19) = o (1) , b m − → ∞ , where the last equality follows [11] from S t n +1 /S t n ∈ L p . Similarly, if G n denotes the distribution function of S t n /S t n +1 , we have b qn F n (cid:18) b n b n +1 (cid:19) = b qn (cid:18) − G n (cid:18) b n +1 b n (cid:19)(cid:19) = b p ′ + qn b − p ′ n +1 (cid:18) b n +1 b n (cid:19) p ′ (cid:18) − G n (cid:18) b n +1 b n (cid:19)(cid:19) = o (1) . (cid:3) Besides the bounded state space, a crucial assumption of Stentoft’s re-sult (Theorem 1) is the smoothness of the continuation value functions. Inthe Black–Scholes model, and more generally in models where the log-price X t has a diffusion component, they are always C ∞ -smooth [5]. The vari-ance Gamma model is an example of a pure jump process where the valuefunctions are not necessarily continuously differentiable [5]. In the geometricMeixner model [16, 22, 23], on the other hand, the continuation values aresmooth, as we will now show. Consequently, Theorem 1 is applicable to thegeometric Meixner model (if the mild Assumptions C and D are satisfied). Proposition 3.
Suppose that Assumptions A and D hold and that thelog-price X t is a Meixner process. Then Assumption B holds. Proof.
For fixed t > X t follows the Meixner distributionMeix( α, β, µt, δt ), where α > − π < β < π , µ > δ ∈ R . This means ONGSTAFF–SCHWARTZ ALGORITHM FOR L´EVY MODELS that the density of X t equals f t ( x ) = (2 cos( β/ δt πα Γ(2 δt ) e β/α ( x − µt ) (cid:12)(cid:12)(cid:12)(cid:12) Γ (cid:18) δt + i x − µtα (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) and the value function for the payoff h ( S T ) is E [ h ( S T ) | S t = x ] = Z ∞−∞ h ( e y x ) f T − t ( y ) dy (2.7) = Z ∞ h ( z ) f T − t (cid:18) log zx (cid:19) dz/z. By the asymptotic formulas [22] f t ( x ) ∼ c ± | x | δt − e −| x | ( π ± β ) /α , x → ±∞ , and the integrability Assumption D, we must have ( π + β ) /α >
1. We cannow differentiate the value function (2.7) under the integral sign, justifiedby the following fact: for real u and natural k the quantity ∂ k /∂v k | Γ( u + iv ) || Γ( u + iv ) | grows only polynomially in v as v → ±∞ . To see this start from Lerch’sformula [15] | Γ( u + iv ) | = Γ( u + 1) √ u + v ∞ Y n =1 (cid:18) v ( u + n ) (cid:19) − / , hence, we have ∂/∂v | Γ( u + iv ) || Γ( u + iv ) | = − vu + v − v ∞ X n =1 u + n ) + v . It suffices to note that 1 / [( u + n ) + v ] ≤ / ( u + n ) to see that this expres-sion grows only polynomially in v . The higher derivatives can be dealt withby a straightforward induction. (cid:3)
3. Unbounded state space and slow convergence.
If we drop the assump-tion that the state space of our underlying is bounded, the convergence be-havior of the Longstaff–Schwartz algorithm radically changes. (As above, wesuppose that both the number of paths and the number of basis functionstend to infinity.) This is illustrated by results of Glasserman and Yu [14]who showed, assuming that the underlying follows either Brownian motionor geometric Brownian motion, that the number of Monte Carlo paths mustgrow exponentially in the number of basis functions to retain convergence.
S. GERHOLD
The first and last lines of Table 1 reflect this result; the lines in between willbe established below.For the reader’s convenience, our notation closely follows that of [14].Recall that we assume that the interest rate is r = 0 throughout the paper.The variant of the Longstaff–Schwartz algorithm to be analyzed pro-ceeds as follows. Start with the final continuation value ˆ C m = 0 and thefinal option value ˆ V m = h m . For n = m − , . . . , N sample paths { S ( i ) t , . . . , S ( i ) t n +1 } , 1 ≤ i ≤ N , and setˆ γ n = 1 N N X i =1 ˆ V n +1 ( S ( i ) t n +1 ) ψ n ( S ( i ) t n ) , ˆ β n = Ψ − n ˆ γ n , ˆ C n = ˆ β T n ψ n , ˆ V n = max { h n , ˆ C n } . Finally, the initial continuation value is ˆ C ( S ) = N − P Ni =1 ˆ V ( S ( i ) t ) fromwhich the initial option value is estimated by ˆ V = max { h ( S ) , ˆ C ( S ) } .There are two (minor) differences to the variant of the algorithm thatwe analyzed in Section 2: first, we assume now that a fresh set of paths isgenerated for each exercise date. Second, in the present section we will useexplicit expressions for the ( K + 1) × ( K + 1) matrixΨ n = E [ ψ n ( S t n ) ψ n ( S t n ) T ] , (3.1)which has to be estimated by its sample counterpart in general.In the single-period case m = 2, the question that Glasserman and Yu [14]treated is as follows. Suppose that there is an exact representation h ( S t ) = K X k =0 β k ψ k ( S t ) , (3.2) Table 1
The highest possible number of basis functions for N paths Process Basis polynomials
Geometric Brownian motion Monomials √ log N Meixner Meixner–Pollaczek log N/ log log N Standard Poisson Charlier log N/ log log N Gamma Laguerre log N/ log log N Pascal Meixner log N/ log log N Brownian motion Hermite log N ONGSTAFF–SCHWARTZ ALGORITHM FOR L´EVY MODELS with unknown constants β k . This assumption is not too restrictive; an in-finite series representation of this kind has to be assumed anyway to getconvergence of the algorithm and since we are interested in K → ∞ , we cansuppose that (3.2) is a good approximation of the payoff at t . Furthermore,assume that the martingale property E [ ψ k ( S t ) | S t ] = ψ k ( S t )(3.3)holds. (In [14], additional deterministic factors in (3.3) are allowed; we choseto absorb these into the basis functions.) How fast may K tend to infinitycompared to N while assuring that the mean square error of β tends tozero? To this end, Glasserman and Yu [14] established the boundssup | β | =1 E [ | β − ˆ β | ] ≤ k Ψ − k N K X j =0 K X k =0 E [ ψ j ( S t ) ψ k ( S t ) ](3.4)and sup | β | =1 E [ | β − ˆ β | ] ≥ N k Ψ k K X k =0 E [ ψ K ( S t ) ψ k ( S t ) ] − N . (3.5)Here and in what follows, | · | denotes the Euclidean vector norm and k · k denotes the Euclidean (or Frobenius) matrix norm. With regard to notation,Glasserman and Yu [14] call the coefficients in (3.2) a k instead of β k ; oursimplified assumption (3.3) makes both their a and β equal to our β . Thishas to be kept in mind when comparing (3.4) and (3.5) to [14], formulas(22), respectively, (23).The proofs of the estimates (3.4) and (3.5) are short; the bulk of thework of Glasserman and Yu [14] lies in the concrete examples (Brownianmotion and geometric Brownian motion) and in the general analysis of themulti-period case on which we will build in Section 6.The martingale property (3.3) is convenient for estimating the expecta-tions in the bounds (3.4) and (3.5). Another useful property is orthogonalityof the basis functions. If S t is Brownian motion, then Glasserman and Yu[14] have shown that for N paths the highest K , for which the mean squareerror tends to zero, is roughly log N . Hermite polynomials are natural basisfunctions in this case. If the underlying process is geometric Brownian mo-tion and monomials are used as basis functions, then K may only be as highas √ log N . In the following sections we show that the analogous rate forthe Meixner, Poisson, Gamma and Pascal processes is in between, namely,log N/ log log N . S. GERHOLD
4. L´evy–Meixner systems.
A source of basis functions and processes thatsatisfy martingale equalities of the type (3.3) are the L´evy–Meixner systemsintroduced by Schoutens and Teugels [21, 23]. Recall that Meixner [18] hasdetermined all sets of orthogonal polynomials Q k ( x ) that satisfy Sheffer’scondition f ( z ) exp( xu ( z )) = ∞ X k =0 Q k ( x ) z k k !for some formal power series f and u with u (0) = 0, u ′ (0) = 0 and f (0) = 0.Schoutens and Teugels [23] introduce a time parameter t via f ( z ) t exp( xu ( z )) = ∞ X k =0 Q k ( x, t ) z k k !and show how an infinitely divisible characteristic function, and thus a L´evyprocess, can be defined by f and u under appropriate conditions. Buildingon Meixner’s characterization, five sets of orthogonal polynomials Q k ( X t , t )and associated L´evy processes X t are determined which satisfy martingaleequalities of the type E [ Q k ( X t , t ) | X s ] = Q k ( X s , s ) , ≤ s ≤ t. This furnishes the connection between Sheffer (resp., L´evy–Meixner) systemsand condition (3.3). There are five L´evy–Meixner systems constructed fromHermite polynomials, Charlier polynomials C k ( x, µ ), Laguerre polynomials L ( α ) k ( x ), Meixner polynomials M k ( x ; µ, q ) and Meixner–Pollaczek polynomi-als P k ( x ; µ, ζ ), respectively. The resulting L´evy processes X t are standardBrownian motion B t , the standard Poisson process N t , the Gamma pro-cess G t , the Pascal process P t and the Meixner process H t , respectively. SeeSchoutens and Teugels [21, 23] for details on all these processes and familiesof orthogonal polynomials.Brownian motion is not of interest to us since the corresponding lastline of Table 1 has been established by Glasserman and Yu [14]. As forthe remaining four processes, in the light of condition (3.3), the martingalerelations [21] E [ C k ( N t , t ) | N s ] = (cid:18) st (cid:19) k C k ( N s , s ) , E [ L ( t − k ( G t ) | G s ] = L ( s − k ( G s ) , (4.1) E [ M k ( P t ; t, q ) | P s ] = ( s ) k ( t ) k M k ( P s ; s, q ) , E [ P k ( H t ; t, ζ ) | H s ] = P k ( H s ; s, ζ ) , ONGSTAFF–SCHWARTZ ALGORITHM FOR L´EVY MODELS valid for 0 < s < t , prompt us to choose the basis functions in Table 2.[Note that ( t ) k = t ( t + 1) · · · ( t + k −
1) is the Pochhammer symbol.] Whenspecializing the bounds (3.4) and (3.5) to our examples, we will require theorthogonality properties E [ C k ( N t , t ) C l ( N t , t )] = t − k k ! δ kl , (4.2) E [ L ( t ) k ( G t ) L ( t ) l ( G t )] = Γ( k + t + 1) k ! δ kl , (4.3) E [ M k ( P t ; t, q ) M l ( P t ; t, q )] = k !( t ) k q k δ kl , (4.4) E [ P k ( H t ; t, ζ ) P l ( H t ; t, ζ )] = Γ( k + 2 t )(2 sin ζ ) t k ! δ kl , (4.5)as well as a way to express the squares of the basis functions as series ofbasis functions. We will denote by d ki ( t n ) the linearization coefficients inthe expansion ψ nk ( x ) = k X i =0 d ki ( t n ) ψ ni ( x ) . (4.6)Where distinction is necessary, the linearization coefficients correspondingto the four families in Table 2 will be written as d P ki ( t n ), d G ki ( t n ), d Pa ki ( t n ) and d M ki ( t n ), respectively. The same superscripts will adorn other quantities todistinguish the four cases, namely, the Meixner, Poisson, Gamma and Pascalprocess as in Table 2.Among these processes, the Meixner process has the most significancein applications. Clearly, a financial model will impose geometric Meixnerdynamics (as in Proposition 3) rather than the linear process which maybecome negative. But then a convergence analysis in the spirit of Glassermanand Yu [14] is impossible with polynomial basis functions as the geometricMeixner process does not have finite moments of all orders. Instead, wepropose to use basis functions of logarithmic growth, ψ M , log nk ( x ) = P k (log x ; t n , ζ ) . (4.7) Table 2
L´evy–Meixner systems
Process Notation Basis polynomials ψ nk ( x ) Parameters
Meixner H t ψ M nk ( x ) = P k ( x ; t n , ζ ) 0 < ζ < π Standard Poisson N t ψ P nk ( x ) = t kn C k ( x, t n )Gamma G t ψ G nk ( x ) = L ( t n − k ( x )Pascal P t ψ Pa nk ( x ) = ( t n ) k M k ( x ; t n , q ) 0 < q < S. GERHOLD
Then our convergence result for the Meixner process (Theorem 4 below) canbe applied. Similarly, models based on the geometric Poisson ([24], Section112.7.1) or geometric Pascal processes can be reduced to the linear case bymodifying their respective basis functions analogously.
5. Unbounded state space: The single-period problem.
Main result and first steps of the proof.
We now state our mainresult about the single-period problem where our option has the exercisetimes 0 = t < t < t . As noted above, the geometric Meixner model iscontained in this result by modifying the basis functions according to (4.7). Theorem 4.
Suppose m = 2 , that S t is a Meixner process and that thebasis functions are as in the first line of Table 2. Put ( u, v ) = (8 , . If thenumber N of paths and the number K of basis functions satisfy N ≥ K ( u + ε ) K for some positive ε , then lim N →∞ sup | β | =1 E [ | β − ˆ β | ] = 0 . If N ≤ K ( v − ε ) K , then lim N →∞ sup | β | =1 E [ | β − ˆ β | ] = ∞ . For the standard Poisson, Gamma and Pascal processes, with their respectivebasis functions from Table 2, the same holds if ( u, v ) is replaced by (10 , , (8 , and (11 , , respectively. The announced critical rate log N/ log log N in Table 1 then follows fromthe fact that the solution of N = K cK satisfies K ∼ c − log N/ log log N (see,e.g., de Bruijn [6]).Looking at (3.4) and (3.5) we begin the proof of Theorem 4 by bounding k Ψ k and k Ψ − k , defined by (3.1) and Table 2. As in Section 2, the letter c denotes various positive constants whose value is irrelevant. Lemma 5. As K → ∞ , the values k Ψ k and k Ψ − k grow at most expo-nentially in all four cases (Meixner, Poisson, Gamma and Pascal), exceptfor k Ψ P1 k ≤ c K K K and k Ψ Pa1 k ≤ c K K K . Proof.
The estimates for the Meixner, Poisson and Pascal cases areeasy consequences of the orthogonality relations (4.2)–(4.5) and Stirling’sformula. It remains to deal with the Gamma case. The parameter t − ONGSTAFF–SCHWARTZ ALGORITHM FOR L´EVY MODELS the martingale property (4.1) is not quite compatible with the orthogonalityrelation (4.3) of the Laguerre polynomials. But by the formula [26] L ( α − k ( x ) = L ( α ) k ( x ) − L ( α ) k − ( x )we obtain E [ ψ G1 k ( G t ) ψ G1 l ( G t )] = − (cid:18) k + t k (cid:19) , k = l − k + t k + t (cid:18) k + t k (cid:19) , k = l , − (cid:18) k + t − k − (cid:19) , k = l + 1,0 , | k − l | ≥ G1 is tridiagonal. Since (5.1) grows only polynomially in k , it is clearthat so does k Ψ G1 k . As for the inverse, note that Ψ G1 is diagonally dominantso that it suffices to bound the diagonal elements of (Ψ G1 ) − (see Nabben[19], Theorem 3.1); note that the τ k from that theorem are all equal to 1in our situation.) The diagonal elements e k of (Ψ G1 ) − can be computedrecursively by [19] e KK = KK + t (cid:18) K + t − K − (cid:19) − ≤ c K and e k − ,k − = k + t k (cid:18) k + t k + t e k,k − e k +1 ,k +1 (cid:19) , ≤ k < K. A straightforward backward induction shows that this implies | e kk | ≤ (4( t + 1)) K − k +1 e KK , ≤ k < K, hence, k (Ψ G1 ) − k grows at most exponentially too. (cid:3) We proceed to bound the fourth order moments appearing in (3.4). Using(4.6) and the martingale relation (3.3), we obtain E [ ψ j ( S t ) ψ k ( S t ) ]= E " j X i =0 d ji ( t ) ψ i ( S t ) × k X s =0 d ks ( t ) ψ s ( S t ) (5.2) = j X i =0 2 k X s =0 d ji ( t ) d ks ( t ) E [ E [ ψ i ( S t ) | S t ] ψ s ( S t )]= j X i =0 2 k X s =0 d ji ( t ) d ks ( t ) E [ ψ i ( S t ) ψ s ( S t )] . S. GERHOLD
The linearization coefficients d ki from the expansion (4.6) are well-studiedobjects for various families of orthogonal polynomials. They have combinato-rial interpretations in terms of (generalized) derangements, rook polynomialsand matching polynomials. See Zeng [27] for on overview of these properties,explicit formulas and many references. Paraphrasing some of these formulas([27], Corollary 2) we have d P ki ( t n ) = t k − in k ! i ! X s ≥ t sn ( s − k )! ( s − i )!(2 k + i − s )! , (5.3) d G ki ( t n ) = 2 k + i k ! i ! X s ≥ ( t n − s s ( s − k )! ( s − i )!(2 k + i − s )! , (5.4) d Pa ki ( t n ) = (1 + q ) k + i k ! i ! ( t n ) k ( t n ) i X s ≥ ( t n ) s (1 + q ) − s q − s ( s − k )! ( s − i )!(2 k + i − s )! , (5.5) d M ki ( t n ) = ( − ζ ) k + i k ! i ! X s ≥ ( t n ) s (1 + (cot ζ ) − ) s s ( s − k )! ( s − i )!(2 k + i − s )! . (5.6)Here it is understood that 1 /n ! = 0 for n a negative integer, as is naturalwhen extending the factorial by the Gamma function. Therefore, the sumsin (5.3)–(5.6) run from s = max { i, k } to s = k + ⌊ i/ ⌋ .5.2. Moment bounds in the Poisson case.
By (4.2), (5.2) and (5.3), thesum on the right-hand side of (3.4) can be estimated by K X j =0 K X k =0 E [ ψ P2 j ( S t ) ψ P1 k ( S t ) ](5.7) ≤ c K K X j =0 K X k =0 2 min { k,j } X i =0 i ! (cid:18)X s ≥ b P jis (cid:19)(cid:18)X s ≥ b P kis (cid:19) , where b P kis := k ! i !( s − k )! ( s − i )!(2 k + i − s )! . It is easy to see that b P k +1 ,i,k + l +1 /b P k,i,k + l > i ≥
1, 0 ≤ l ≤ i/ k ≥ i − l , hence, b P k,i,k + l increases in k under these conditions. From this wededuce that the s -sums in (5.7) increase in j , respectively, k : k + ⌊ i/ ⌋ X s =max { i,k } b P kis = ⌊ i/ ⌋ X l =max { i − k, } b P k,i,k + l ONGSTAFF–SCHWARTZ ALGORITHM FOR L´EVY MODELS ≤ ⌊ i/ ⌋ X l =max { i − k, } b P k +1 ,i,k + l +1 = k + ⌊ i/ ⌋ +1 X s =max { i,k } +1 b P k +1 ,i,s ≤ k + ⌊ i/ ⌋ +1 X s =max { i,k +1 } b P k +1 ,i,s . Using this in (5.7) yields (recall that c may change its value in each occur-rence) K X j =0 K X k =0 E [ ψ P2 j ( S t ) ψ P1 k ( S t ) ](5.8) ≤ c K K ! K X i =0 K + ⌊ i/ ⌋ X s = K i ! / ( s − K )! ( s − i )!(2 K + i − s )! ! . It is plain that the summand increases in i for K ≥
0, 0 ≤ i ≤ K and K ≤ s ≤ K + i/
2. Hence, we find that the portion P Ki =0 of the i -sum in (5.8) canbe bounded from above by( K + 1) K ! ⌊ K/ ⌋ X s = K s − K )! (3 K − s )! ! (5.9) ≤ c K K K . To see the last inequality, note that the summand in (5.9) is unimodal withmode at s = K + K / − K / + O (1). Estimating this maximal summand,by Stirling’s formula and some easy manipulations, shows that the sum in(5.9) is smaller than c K K K . The remaining part P Ki = K +1 of the i -sum in(5.8) can be estimated by K X i = K +1 i ! K + ⌊ i/ ⌋ X s = i i ! s !( s − K )! ( s − i )!(2 K + i − s )! ! ≤ c K K X i = K +1 i ! (cid:18) i !( K + ⌊ i/ ⌋ )! ⌊ i/ ⌋ ! ( K + ⌊ i/ ⌋ − i )!( i − ⌊ i/ ⌋ )! (cid:19) (5.10) ≤ c K K X i = K +1 i !( K + ⌊ i/ ⌋ )! ( K + ⌊ i/ ⌋ − i )! ≤ c K K K . S. GERHOLD
Fig. 1.
The summation range of the first sum in (5.10).
Note that in the first line we have introduced the new factor s ! in the numer-ator. This makes the summand increasing w.r.t. the substitution i → i + 1, s → s + 1. Hence, it suffices to keep only the summands of the s -sum with s = K + ⌊ i/ ⌋ (the thick dots in Figure 1) which shows the first inequality.As for the second inequality, note that the factor i ! / ⌊ i/ ⌋ ! of the summandgrows only exponentially and that the factor ( i − ⌊ i/ ⌋ )! in the denominatoris clearly negligible. Finally, the last sum in (5.10) has increasing summandswhich, together with Stirling’s formula, implies the last inequality. By (5.8),the estimates (5.9) and (5.10) show that K X j =0 K X k =0 E [ ψ P2 j ( S t ) ψ P1 k ( S t ) ] ≤ c K K K . In light of (3.4) and Lemma 5, the value u = 10 for the Poisson process inTheorem 4 is established.As for the second assertion about the Poisson process in Theorem 4, notethat, from (5.2), E [ ψ K ( S t ) ψ k ( S t ) ] = K X i =0 2 k X s =0 d Ki ( t ) d ks ( t ) E [ ψ i ( S t ) ψ s ( S t )] . The orthogonality property (4.2) and formula (5.3) yield K X k =0 E [ ψ P2 K ( S t ) ψ P1 k ( S t ) ] ≥ c K K X k =0 2 k X i =0 d P Ki ( t ) d P ki ( t ) i ! ≥ c K d P K, K ( t ) d P K, K ( t )(2 K )! ≥ c K (2 K )! ≥ c K K K . The second inequality follows from retaining only the summand k = K , i = 2 K . This makes the sum in (5.3) collapse to the summand s = 2 K , ONGSTAFF–SCHWARTZ ALGORITHM FOR L´EVY MODELS hence, the third inequality. Appealing to (3.5) and Lemma 5 completes theproof of the Poisson part of Theorem 4. Note that the preceding estimatescan presumably be improved. This seems not worthwhile though; since ourestimate of k Ψ P1 k in Lemma 5 is sharp, we will not obtain equal values u = v in Theorem 4 anyway, unless at least one of the bounds (3.4) and (3.5) wasimproved too.5.3. Moment bounds in the Meixner case.
The proofs in the remainingthree cases are very similar to the Poisson case. In the Meixner case, wehave K X j =0 K X k =0 E [ ψ M2 j ( S t ) ψ M1 k ( S t ) ](5.11) ≤ c K K X j =0 K X k =0 2 min { k,j } X i =0 i ! (cid:18)X s ≥ b M jis (cid:19)(cid:18)X s ≥ b M kis (cid:19) , where b M kis := k ! s !( s − k )! ( s − i )!(2 k + i − s )! . Again, b M k,i,k + l increases in k and the remaining steps to show the upperbound are completely analogous to the Poisson case. This time the numera-tor factor s ! in the analogue of (5.10) appears naturally and is not introducedartificially to force some monotonicity. Moreover, the lower bound uses thesame summands as in the Poisson case. Both resulting bounds are of theform c K K K , hence, u = v = 8 in Theorem 4.5.4. Moment bounds in the Pascal case.
We can reuse the values b M kis and the estimate that we just sketched: K X j =0 K X k =0 E [ ψ Pa2 j ( S t ) ψ Pa1 k ( S t ) ] ≤ c K K X j =0 K X k =0 2 min { k,j } X i =0 i ! k ! (cid:18)X s ≥ b M jis (cid:19)(cid:18)X s ≥ b M kis (cid:19) ≤ c K K !(2 K )! K X j =0 K X k =0 2 min { k,j } X i =0 i ! (cid:18)X s ≥ b M jis (cid:19)(cid:18)X s ≥ b M kis (cid:19) ≤ c K K !(2 K )! K K ≤ c K K K . The lower bound poses no new difficulties either. S. GERHOLD
Moment bounds in the Gamma case.
This part is only slightly moreinvolved. Due to (5.1), we have three i -sums instead of one in the analogueof (5.8). The right-hand side of (5.1) can be replaced by c K in each ofthese. Then one of the three i -sums equals the i -sum in (5.11) and the othertwo differ only in an index shift b M k,i ± ,s which can be easily bounded bypolynomial factors. Thus the resulting growth rate is c K K K , as for theMeixner case. The proof of Theorem 4 is complete.5.6. Side remark: The Bachelier model.
We finish this section with aremark about Brownian motion. If this is the underlying process S t , thenappropriate basis functions can be built from Hermite polynomials in sucha way that k Ψ k , k Ψ − k and the analogue of (4.2) grow only exponentially[14]. This is in line with the corresponding growth orders in the Gamma andMeixner cases (and in the Poisson and Pascal cases, if we renormalize ourbasis functions there by 1 / √ k ! and 1 /k !, resp.). What makes the Gaussiancase peculiar is that the linearization coefficients of the Hermite polynomialsinduce only exponential growth too when plugged into (5.2), whereas thelinearization coefficients in the four cases we treat in this paper grow faster.
6. Unbounded state space: The multi-period problem.
In this sectionwe extend the main result of the preceding section (Theorem 4) to themulti-period problem, that is, to m + 1 exercise dates 0 = t < · · · < t m .We know from the single-period problem that the critical rate cannot belarger than log N/ log log N , so we will be done if we can show that thereis an upper bound for the mean square error of the form K cK for somepositive c . Fortunately, this can be deduced with little effort from a result ofGlasserman and Yu [14] and the estimates from the preceding section aboutthe single-period problem. Following [14], we assume that a representationanalogous to (3.2) holds at time t m and that the payoff functions do notgrow too fast in the following sense. Theorem 6.
Suppose that the payoff functions satisfy the growth con-straint E [ h n ( S t n ) ] ≤ max ν (cid:18) t ν +1 t ν (cid:19) K max ν,k E [ ψ νk ( S t ν ) ] , ≤ n ≤ m. Then the mean square error of the estimated coefficients satisfies sup | β m − | =1 E [ | β n − ˆ β n | ] ≤ N − c K K ( m − n +1) uK , ≤ n < m, where u takes on the same values as in Theorem 4, that is, , , , for S t ,the Meixner, standard Poisson, Gamma and Pascal process, respectively. ONGSTAFF–SCHWARTZ ALGORITHM FOR L´EVY MODELS Proof.
By results of Glasserman and Yu [14], Theorem 3 and the lastformula before (18) on page 2096 and Jensen’s inequality, we havesup | β m − | =1 E [ | β n − ˆ β n | ] ≤ c K N max ≤ ν 7. Conclusion. Stentoft [25] and Glasserman and Yu [14] obtained appar-ently contradictory results about the convergence of the Longstaff–Schwartzalgorithm. The main difference between their respective assumptions is the(un-)boundedness of the support of the underlying at the exercise dates. Inthis light the pessimistic results of Glasserman and Yu (and our Theorems 4and 6) turn out to stem from the tails of the distribution of the underlying.The present paper shows that Stentoft’s result can be applied to L´evymodels under mild assumptions and extends Glasserman and Yu’s [14] re-sults to several concrete processes. Thus we provide some evidence thatGlasserman and Yu [14] were right to conjecture that their results for Brow-nian motion and geometric Brownian motion extend to other models.Concerning Stentoft [25] and our Section 2: although the boundedness ofthe underlying induces a nice (polynomial) relation between the number ofbasis functions and the necessary number of Monte Carlo paths, it seemsnot yet completely clear that it is a harmless assumption in practice. Anatural question for future research is how strongly the size of the truncationintervals influences the convergence speed of the calculated prices. S. GERHOLD Acknowledgments. I thank Lars Stentoft, Friedrich Hubalek and an anony-mous referee for helpful comments.REFERENCES [1] Bacinello, A. R. , Biffis, E. and Millossovich, P. (2009). Pricing life insur-ance contracts with early exercise features. J. Comput. Appl. Math. Belomestny, D. , Bender, C. and Schoenmakers, J. (2009). True upper boundsfor Bermudan products via non-nested Monte Carlo. Math. Finance Brigo, D. and Mercurio, F. (2006). Interest Rate Models—Theory and Practice:With Smile, Inflation and Credit , 2nd ed. Springer, Berlin. MR2255741[4] Cl´ement, E. , Lamberton, D. and Protter, P. (2002). An analysis of a leastsquares regression method for American option pricing. Finance Stoch. Cont, R. , Tankov, P. and Voltchkova, E. (2004). Option pricing models withjumps: Integro-differential equations and inverse problems. In Proceedings ofECCOMAS 2004 ( P. Neittaanm¨aki et al. , eds.). Jyv¨askyl¨a, Finland.[6] de Bruijn, N. G. (1958). Asymptotic Methods in Analysis . Bibliotheca Mathematica . North-Holland, Amsterdam. MR0099564[7] de Jong, R. M. (2002). A note on “Convergence rates and asymptotic normality forseries estimators”: Uniform convergence rates by W. K. Newey. J. Econometrics Dufresne, F. , Gerber, H. U. and Shiu, E. S. W. (1991). Risk theory with theGamma process. Astin Bull. Egloff, D. (2005). Monte Carlo algorithms for optimal stopping and statisticallearning. Ann. Appl. Probab. Egloff, D. , Kohler, M. and Todorovic, N. (2007). A dynamic look-ahead MonteCarlo algorithm for pricing Bermudan options. Ann. Appl. Probab. Feller, W. (1971). An Introduction to Probability Theory and Its Applications. Vol.II , 2nd ed. Wiley, New York. MR0270403[12] Fouque, J.-P. and Han, C.-H. (2009). Asymmetric variance reduction for pric-ing American options. In Mathematical Modelling and Numerical Methods inFinance ( A. Bensoussan, P. Ciarlet and Q. Zhang , eds.). Handbook of Nu-merical Analysis Gerber, H. U. and Shiu, E. S. W. (1994). Option pricing by Esscher transforms. Transactions of Society of Actuaries Glasserman, P. and Yu, B. (2004). Number of paths versus number of basis func-tions in American option pricing. Ann. Appl. Probab. Godefroy, M. (1901). La Fonction Gamma . Gauthier-Villars, Paris.[16] Grigelionis, B. (1999). Processes of Meixner type. Liet. Mat. Rink. Longstaff, F. A. and Schwartz, E. S. (2001). Valuing American options by sim-ulation: A simple least-squares approach. Rev. Financial Stud. Meixner, J. (1934). Orthogonale Polynomsysteme mit einer besonderen Gestalt dererzeugenden Funktion. J. London Math. Soc. Nabben, R. (1999). Two-sided bounds on the inverses of diagonally dominant tridi-agonal matrices. Linear Algebra Appl. [20] Newey, W. K. (1997). Convergence rates and asymptotic normality for series esti-mators. J. Econometrics Schoutens, W. (2000). Stochastic Processes and Orthogonal Polynomials . LectureNotes in Statistics . Springer, New York. MR1761401[22] Schoutens, W. (2002). Meixner processes: Theory and applications in finance. EU-RANDOM Report 2002-004, EURANDOM, Eindhoven.[23] Schoutens, W. and Teugels, J. L. (1998). L´evy processes, polynomials and mar-tingales. Comm. Statist. Stochastic Models Shreve, S. E. (2004). Stochastic Calculus for Finance. II. Continuous-Time Models .Springer, New York. MR2057928[25] Stentoft, L. (2004). Convergence of the least squares Monte Carlo approach toAmerican option valuation. Manag. Sci. Szeg˝o, G. (1975). Orthogonal Polynomials , 4th ed. American Mathematical Society,Colloquium Publications XXIII . Amer. Math. Soc., Providence, RI. MR0372517[27] Zeng, J. (1992). Weighted derangements and the linearization coefficients of orthog-onal Sheffer polynomials. Proc. London Math. Soc. (3) Financial and Actuarial MathematicsVienna University of TechnologyWiedner Hauptstraße 8–10A-1040, ViennaAustriaE-mail: [email protected]