[PDF] The Longstaff--Schwartz algorithm for Lévy models: Results on fast and slow convergence

Abstract

We investigate the Longstaff--Schwartz algorithm for American option pricing assuming that both the number of regressors and the number of Monte Carlo paths tend to infinity. Our main results concern extensions, respectively, applications of results by Glasserman and Yu [Ann. Appl. Probab. 14 (2004) 2090--2119] and Stentoft [Manag. Sci. 50 (2004) 1193--1203] to several Lévy models, in particular the geometric Meixner model. A convenient setting to analyze this convergence problem is provided by the Lévy--Sheffer systems introduced by Schoutens and Teugels.

Full PDF

aa r X i v : . [ m a t h . P R ] A p r The Annals of Applied Probability (cid:13)

Institute of Mathematical Statistics, 2011

THE LONGSTAFF–SCHWARTZ ALGORITHM FOR L´EVY MODELS:RESULTS ON FAST AND SLOW CONVERGENCE

By Stefan Gerhold Vienna University of Technology

We investigate the Longstaﬀ–Schwartz algorithm for Americanoption pricing assuming that both the number of regressors and thenumber of Monte Carlo paths tend to inﬁnity. Our main resultsconcern extensions, respectively, applications of results by Glasser-man and Yu [

Ann. Appl. Probab. (2004) 2090–2119] and Stentoft[ Manag. Sci. (2004) 1193–1203] to several L´evy models, in partic-ular the geometric Meixner model. A convenient setting to analyzethis convergence problem is provided by the L´evy–Sheﬀer systemsintroduced by Schoutens and Teugels.

1. Introduction.

PDE or tree methods for pricing ﬁnancial products be-come ineﬀective in the presence of many stochastic factors and path de-pendent payoﬀ structures. When resorting to Monte Carlo, early exercisefeatures like callability or ﬂip options pose diﬃculties. Typical examples arethe pricing of callable LIBOR exotics with the LIBOR market model [3] orthe valuation of life insurance contracts with early exercise features [1].The least squares Monte Carlo approach by Longstaﬀ and Schwartz [17]has become the standard method to deal with such American/Bermudanproducts. It proceeds by backward induction and estimates value functionsby regression on a prescribed set of basis functions. The computed exercisestrategy is suboptimal, resulting in a lower bound for the option price; seeBelomestny, Bender and Schoenmakers [2] for recent work on upper bounds.Fouque and Han [12] discuss numerical aspects of American option pricing,including variance reduction.

Received March 2008; revised March 2010. Supported by the Christian Doppler Laboratory for Portfolio Risk Management(PRisMa Lab) . The Bank Austria Creditanstalt (BA-CA)and the Austrian Federal Financing Agency ( ¨OBFA) through CDG is gratefully acknowl-edged.

AMS 2000 subject classiﬁcations.

Primary 62P05; secondary 33C45.

Key words and phrases.

Option pricing, dynamic programming, Monte Carlo, regres-sion, orthogonal polynomials, L´evy–Meixner systems

This is an electronic reprint of the original article published by theInstitute of Mathematical Statistics in

The Annals of Applied Probability ,2011, Vol. 21, No. 2, 589–608. This reprint diﬀers from the original in paginationand typographic detail. 1

S. GERHOLD

The convergence analysis of the Longstaﬀ–Schwartz algorithm was com-menced in the original paper [17] and was carried out in detail by Cl´ement,Lamberton and Protter [4]. They show convergence of the regression ap-proximation to the true Bermudan price and convergence of the MonteCarlo procedure for a ﬁxed number of basis functions. Glasserman andYu [14] and Stentoft [25] have analyzed settings in which the number ofbasis functions and the number of simulation paths increase together. Inparticular, Glasserman and Yu [14] have shown that the number of pathsmust grow exponentially in the number of basis functions if the underlyingprocess is Brownian motion or geometric Brownian motion. On the otherhand, Stentoft [25] appealed to results on series estimators [7, 20] to obtainpolynomial growth for rather general models, assuming that the underly-ing has a bounded state space. The latter assumption was also imposed byEglof, Kohler and Todorovic [9, 10] in the analysis of their extension of theLongstaﬀ–Schwartz algorithm.In the present paper we discuss the applicability of Stentoft’s results toexponential L´evy models and extend Glasserman and Yu’s analysis to severalmodels, including the Meixner model [16, 22]. These latter results provide anapplication of the neat martingale properties that Schoutens and Teugels [21,23] found for certain L´evy processes and families of orthogonal polynomials.In the following section we recall the dynamic programming principle andthe Longstaﬀ–Schwartz algorithm. We show how Stentoft’s [25] convergenceresult can be applied to L´evy models, in particular, to the Meixner model.This involves discussing the assumption of a bounded underlying and thesmoothness of the value functions occurring in the backward induction.In Section 3 we describe the problem that Glasserman and Yu [14] treated.The main diﬀerence to Stentoft’s setting is the unbounded support of theunderlying. Section 4 recalls the notions of Sheﬀer system and L´evy–Meixnersystem. Besides Brownian motion, this theory yields four processes that lendthemselves to the investigation: the Meixner, standard Poisson, Gamma andPascal processes [8, 13]. In Section 5 we assume that our option has onlythree exercise opportunities resulting in a single regression and show how fastthe number of simulation paths must increase in order to ensure convergenceof the Longstaﬀ–Schwartz algorithm for a growing number of basis functions.Finally, Section 6 contains an analogous bound for the multi-period setting,which is weaker, but upon inversion still leads to the same critical asymptoticrate as the single-period case. In the course of the proofs it turns out thatthe diﬀerent critical rate pertaining to Brownian motion stems from thecomparatively slow growth of the linearization coeﬃcients of the associatedL´evy–Meixner system, namely, the Hermite polynomials.

2. Bounded state space and fast convergence.

Suppose that our assetfollows a Markov process S t . We assume throughout the paper that the ONGSTAFF–SCHWARTZ ALGORITHM FOR L´EVY MODELS interest rate is zero; extending our results to a constant interest rate r > t < · · · < t m .The payoﬀ from exercise is h n ( S t n ) for given functions h n , 0 ≤ n ≤ m . Bythe dynamic programming principle the option value at time t = 0 equals V = max { h ( S ) , C ( S ) } , where the continuation values C n are given by C m ( x ) = 0 ,C n ( x ) = E [max { h n +1 ( S t n +1 ) , C n +1 ( S t n +1 ) } | S t n = x ] , ≤ n < m. Suppose that N sample paths of the underlying are simulated. Longstaﬀ andSchwartz [17] propose to approximate the continuation values by a linearcombination of basis functions ψ nk , C n ( x ) ≈ K X k =0 β nk ψ nk ( x ) = β T n ψ n ( x ) , where β n = ( β n , . . . , β nK ) T is a vector of real numbers which is estimatedby regression over the simulated paths and ψ n ( x ) = [ ψ n ( x ) , . . . , ψ nK ( x )] T .To obtain a good convergence result as N and K both tend to inﬁnity,Stentoft [25] assumes that samples above and below certain thresholds arediscarded. So let us ﬁx ﬁnite truncation intervals I , . . . , I m ⊂ ]0 , ∞ [ anddiscard all sample paths with S t n / ∈ I n when estimating the continuationvalue C n ( x ). We are then estimating the following “truncated” continuationvalues: C tr m ( x ) = 0 ,C tr n ( x ) = I n ( x ) · E [max { h n +1 ( S t n +1 ) , C tr n +1 ( S t n +1 ) } | S t n = x ] , ≤ n < m,C tr0 ( x ) = E [max { h ( S t ) , C tr1 ( S t ) } | S = x ] . The option value at time t = 0 approximately equals V tr0 = max { h ( S ) , C tr0 ( S ) } . (2.1)Outside of the truncation intervals I n ⊂ ]0 , ∞ [ we extrapolate by zero sinceit does not matter in the theoretical analysis.Besides truncation, another possibility to make the state space boundedwould be absorption of the underlying process at some lower and upperbounds [9, 10]. This, however, causes atoms in the distribution so thatStentoft’s result is no longer applicable as it requires the existence of adensity.We assume in the present section that the underlying has the followingdynamics. (Recall that we suppose throughout that the interest rate is zero.) S. GERHOLD

Assumption A (Exponential L´evy dynamics). The risk neutral dynam-ics of the underlying are S t = S exp( X t ) , where X t is a L´evy process with X = 0. The support of X t is the whole realline for t > X t has a continuous density function. Assumption B (Value smoothness). Let the function h be of, at most,linear growth and such that h ( S T ) is integrable for each T >

0. Then E [ h ( S T ) | S = x ] is a C -smooth function of x .Without going into detail we note that Stentoft [25] imposes the followingadditional assumptions: Assumption C (Further technical assumptions). The basis functionsare shifted Legendre polynomials, the continuation values C n ( S t n ) are inthe L -span of the regressors, the simulated paths are independent and theprobability that the exercise payoﬀ exactly equals the continuation value iszero.Now Stentoft’s main result ([25], Theorem 2), specialized to L´evy models,reads as follows. (By “truncated algorithm” we mean that we discard thesamples outside the intervals I n as explained above.) Theorem 1.

Fix arbitrary ﬁnite truncation intervals I , . . . , I m con-tained in ]0 , ∞ [ and assume that Assumptions A–C hold. Let N (the numberof paths) and K (the number of basis functions) tend to inﬁnity such that K /N → . Then the option prices computed by the truncated Longstaﬀ–Schwartz algorithm converge to V tr0 , deﬁned by (2.1). If the truncation intervals are large enough, then one would hope that theapproximate price V tr0 is close to the exact price V . We will now show thatthis is indeed the case for L´evy models, assuming mild integrability and (atmost) linearly growing payoﬀ functions. Assumption D (Integrability). For each t there are p > p ′ > S pt and S − p ′ t are integrable. Assumption E (Linear payoﬀ growth). The payoﬀ functions grow atmost linearly, | h n ( x ) | ≤ c (1 + x ) , x ≥ , ≤ n ≤ m , for some c > . (2.2) ONGSTAFF–SCHWARTZ ALGORITHM FOR L´EVY MODELS Theorem 2.

Assume that Assumptions A, D and E are satisﬁed andthat the truncation intervals satisfy I n = [ b − n , b n ] , ≤ n < m, where b n = b νn +1 , ≤ n < m − , (2.3) with ν = min (cid:26) p ′ p ′ + q , pp + q (cid:27) and p + 1 q = 1 . Then V tr0 converges to the exact option price V as b m tends to inﬁnity. Note that ν = 1 − /p in (2.3) if p = p ′ . In particular, if all moments ofthe underlying and its reciprocal exist, like in the Black–Scholes model, thenthe exponent ν ∈ ]0 ,

1[ is arbitrarily close to 1.

Proof of Theorem 2.

A trivial induction, using the martingale prop-erty of S t , shows that the continuation values C n ( x ) and C tr n ( x ) satisfy thebound (2.2) too. We will show that for all nC tr n ( x ) = C n ( x ) + o (1) as b m → ∞ , uniformly w.r.t. x ∈ I n . For x ∈ I n , we have C n ( x ) − C tr n ( x )(2.4) = E [ { S tn +1 / ∈ I n +1 } max { h n +1 ( S t n +1 ) , C n +1 ( S t n +1 ) } | S t n = x ]+ E [ { S tn +1 ∈ I n +1 } max { h n +1 ( S t n +1 ) , C n +1 ( S t n +1 ) } | S t n = x ] − E [ { S tn +1 ∈ I n +1 } max { h n +1 ( S t n +1 ) , C tr n +1 ( S t n +1 ) } | S t n = x ](2.5) − E [ { S tn +1 / ∈ I n +1 } max { h n +1 ( S t n +1 ) , C tr n +1 ( S t n +1 ) } | S t n = x ] . It follows readily from the induction hypothesis that the diﬀerence of thesecond and the third term is uniformly o (1) on I n , as b m − → ∞ . In thefollowing, we write c for various positive constants whose precise value isirrelevant. Now let us estimate the ﬁrst and the last expectation on theright-hand side of (2.5). Again, for x ∈ I n we use H¨older’s inequality andMinkowski’s inequality to see that each of them is bounded by E [ { S tn +1 / ∈ I n +1 } c (1 + S t n +1 ) | S t n = x ] ≤ c P [ S t n +1 / ∈ I n +1 | S t n = x ] /q · E [(1 + S t n +1 ) p | S t n = x ] /p S. GERHOLD = c P (cid:20) x S t n +1 S t n / ∈ I n +1 (cid:21) /q · E (cid:20)(cid:18) x S t n +1 S t n (cid:19) p (cid:21) /p (2.6) ≤ c (cid:18) − F n (cid:18) b n +1 x (cid:19) + F n (cid:18) xb n +1 (cid:19)(cid:19) /q (cid:18) x E (cid:20)(cid:18) S t n +1 S t n (cid:19) p (cid:21) /p (cid:19) ≤ cb n (cid:18) − F n (cid:18) b n +1 b n (cid:19) + F n (cid:18) b n b n +1 (cid:19)(cid:19) /q , where F n is the distribution function of S t n +1 /S t n . Now note that b qn (cid:18) − F n (cid:18) b n +1 b n (cid:19)(cid:19) = b p + qn b − pn +1 (cid:18) b n +1 b n (cid:19) p (cid:18) − F n (cid:18) b n +1 b n (cid:19)(cid:19) ≤ (cid:18) b n +1 b n (cid:19) p (cid:18) − F n (cid:18) b n +1 b n (cid:19)(cid:19) = o (1) , b m − → ∞ , where the last equality follows [11] from S t n +1 /S t n ∈ L p . Similarly, if G n denotes the distribution function of S t n /S t n +1 , we have b qn F n (cid:18) b n b n +1 (cid:19) = b qn (cid:18) − G n (cid:18) b n +1 b n (cid:19)(cid:19) = b p ′ + qn b − p ′ n +1 (cid:18) b n +1 b n (cid:19) p ′ (cid:18) − G n (cid:18) b n +1 b n (cid:19)(cid:19) = o (1) . (cid:3) Besides the bounded state space, a crucial assumption of Stentoft’s re-sult (Theorem 1) is the smoothness of the continuation value functions. Inthe Black–Scholes model, and more generally in models where the log-price X t has a diﬀusion component, they are always C ∞ -smooth [5]. The vari-ance Gamma model is an example of a pure jump process where the valuefunctions are not necessarily continuously diﬀerentiable [5]. In the geometricMeixner model [16, 22, 23], on the other hand, the continuation values aresmooth, as we will now show. Consequently, Theorem 1 is applicable to thegeometric Meixner model (if the mild Assumptions C and D are satisﬁed). Proposition 3.

Suppose that Assumptions A and D hold and that thelog-price X t is a Meixner process. Then Assumption B holds. Proof.

For ﬁxed t > X t follows the Meixner distributionMeix( α, β, µt, δt ), where α > − π < β < π , µ > δ ∈ R . This means ONGSTAFF–SCHWARTZ ALGORITHM FOR L´EVY MODELS that the density of X t equals f t ( x ) = (2 cos( β/ δt πα Γ(2 δt ) e β/α ( x − µt ) (cid:12)(cid:12)(cid:12)(cid:12) Γ (cid:18) δt + i x − µtα (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) and the value function for the payoﬀ h ( S T ) is E [ h ( S T ) | S t = x ] = Z ∞−∞ h ( e y x ) f T − t ( y ) dy (2.7) = Z ∞ h ( z ) f T − t (cid:18) log zx (cid:19) dz/z. By the asymptotic formulas [22] f t ( x ) ∼ c ± | x | δt − e −| x | ( π ± β ) /α , x → ±∞ , and the integrability Assumption D, we must have ( π + β ) /α >

1. We cannow diﬀerentiate the value function (2.7) under the integral sign, justiﬁedby the following fact: for real u and natural k the quantity ∂ k /∂v k | Γ( u + iv ) || Γ( u + iv ) | grows only polynomially in v as v → ±∞ . To see this start from Lerch’sformula [15] | Γ( u + iv ) | = Γ( u + 1) √ u + v ∞ Y n =1 (cid:18) v ( u + n ) (cid:19) − / , hence, we have ∂/∂v | Γ( u + iv ) || Γ( u + iv ) | = − vu + v − v ∞ X n =1 u + n ) + v . It suﬃces to note that 1 / [( u + n ) + v ] ≤ / ( u + n ) to see that this expres-sion grows only polynomially in v . The higher derivatives can be dealt withby a straightforward induction. (cid:3)

3. Unbounded state space and slow convergence.

If we drop the assump-tion that the state space of our underlying is bounded, the convergence be-havior of the Longstaﬀ–Schwartz algorithm radically changes. (As above, wesuppose that both the number of paths and the number of basis functionstend to inﬁnity.) This is illustrated by results of Glasserman and Yu [14]who showed, assuming that the underlying follows either Brownian motionor geometric Brownian motion, that the number of Monte Carlo paths mustgrow exponentially in the number of basis functions to retain convergence.

S. GERHOLD

The ﬁrst and last lines of Table 1 reﬂect this result; the lines in between willbe established below.For the reader’s convenience, our notation closely follows that of [14].Recall that we assume that the interest rate is r = 0 throughout the paper.The variant of the Longstaﬀ–Schwartz algorithm to be analyzed pro-ceeds as follows. Start with the ﬁnal continuation value ˆ C m = 0 and theﬁnal option value ˆ V m = h m . For n = m − , . . . , N sample paths { S ( i ) t , . . . , S ( i ) t n +1 } , 1 ≤ i ≤ N , and setˆ γ n = 1 N N X i =1 ˆ V n +1 ( S ( i ) t n +1 ) ψ n ( S ( i ) t n ) , ˆ β n = Ψ − n ˆ γ n , ˆ C n = ˆ β T n ψ n , ˆ V n = max { h n , ˆ C n } . Finally, the initial continuation value is ˆ C ( S ) = N − P Ni =1 ˆ V ( S ( i ) t ) fromwhich the initial option value is estimated by ˆ V = max { h ( S ) , ˆ C ( S ) } .There are two (minor) diﬀerences to the variant of the algorithm thatwe analyzed in Section 2: ﬁrst, we assume now that a fresh set of paths isgenerated for each exercise date. Second, in the present section we will useexplicit expressions for the ( K + 1) × ( K + 1) matrixΨ n = E [ ψ n ( S t n ) ψ n ( S t n ) T ] , (3.1)which has to be estimated by its sample counterpart in general.In the single-period case m = 2, the question that Glasserman and Yu [14]treated is as follows. Suppose that there is an exact representation h ( S t ) = K X k =0 β k ψ k ( S t ) , (3.2) Table 1

The highest possible number of basis functions for N paths Process Basis polynomials

Geometric Brownian motion Monomials √ log N Meixner Meixner–Pollaczek log N/ log log N Standard Poisson Charlier log N/ log log N Gamma Laguerre log N/ log log N Pascal Meixner log N/ log log N Brownian motion Hermite log N ONGSTAFF–SCHWARTZ ALGORITHM FOR L´EVY MODELS with unknown constants β k . This assumption is not too restrictive; an in-ﬁnite series representation of this kind has to be assumed anyway to getconvergence of the algorithm and since we are interested in K → ∞ , we cansuppose that (3.2) is a good approximation of the payoﬀ at t . Furthermore,assume that the martingale property E [ ψ k ( S t ) | S t ] = ψ k ( S t )(3.3)holds. (In [14], additional deterministic factors in (3.3) are allowed; we choseto absorb these into the basis functions.) How fast may K tend to inﬁnitycompared to N while assuring that the mean square error of β tends tozero? To this end, Glasserman and Yu [14] established the boundssup | β | =1 E [ | β − ˆ β | ] ≤ k Ψ − k N K X j =0 K X k =0 E [ ψ j ( S t ) ψ k ( S t ) ](3.4)and sup | β | =1 E [ | β − ˆ β | ] ≥ N k Ψ k K X k =0 E [ ψ K ( S t ) ψ k ( S t ) ] − N . (3.5)Here and in what follows, | · | denotes the Euclidean vector norm and k · k denotes the Euclidean (or Frobenius) matrix norm. With regard to notation,Glasserman and Yu [14] call the coeﬃcients in (3.2) a k instead of β k ; oursimpliﬁed assumption (3.3) makes both their a and β equal to our β . Thishas to be kept in mind when comparing (3.4) and (3.5) to [14], formulas(22), respectively, (23).The proofs of the estimates (3.4) and (3.5) are short; the bulk of thework of Glasserman and Yu [14] lies in the concrete examples (Brownianmotion and geometric Brownian motion) and in the general analysis of themulti-period case on which we will build in Section 6.The martingale property (3.3) is convenient for estimating the expecta-tions in the bounds (3.4) and (3.5). Another useful property is orthogonalityof the basis functions. If S t is Brownian motion, then Glasserman and Yu[14] have shown that for N paths the highest K , for which the mean squareerror tends to zero, is roughly log N . Hermite polynomials are natural basisfunctions in this case. If the underlying process is geometric Brownian mo-tion and monomials are used as basis functions, then K may only be as highas √ log N . In the following sections we show that the analogous rate forthe Meixner, Poisson, Gamma and Pascal processes is in between, namely,log N/ log log N . S. GERHOLD

4. L´evy–Meixner systems.

A source of basis functions and processes thatsatisfy martingale equalities of the type (3.3) are the L´evy–Meixner systemsintroduced by Schoutens and Teugels [21, 23]. Recall that Meixner [18] hasdetermined all sets of orthogonal polynomials Q k ( x ) that satisfy Sheﬀer’scondition f ( z ) exp( xu ( z )) = ∞ X k =0 Q k ( x ) z k k !for some formal power series f and u with u (0) = 0, u ′ (0) = 0 and f (0) = 0.Schoutens and Teugels [23] introduce a time parameter t via f ( z ) t exp( xu ( z )) = ∞ X k =0 Q k ( x, t ) z k k !and show how an inﬁnitely divisible characteristic function, and thus a L´evyprocess, can be deﬁned by f and u under appropriate conditions. Buildingon Meixner’s characterization, ﬁve sets of orthogonal polynomials Q k ( X t , t )and associated L´evy processes X t are determined which satisfy martingaleequalities of the type E [ Q k ( X t , t ) | X s ] = Q k ( X s , s ) , ≤ s ≤ t. This furnishes the connection between Sheﬀer (resp., L´evy–Meixner) systemsand condition (3.3). There are ﬁve L´evy–Meixner systems constructed fromHermite polynomials, Charlier polynomials C k ( x, µ ), Laguerre polynomials L ( α ) k ( x ), Meixner polynomials M k ( x ; µ, q ) and Meixner–Pollaczek polynomi-als P k ( x ; µ, ζ ), respectively. The resulting L´evy processes X t are standardBrownian motion B t , the standard Poisson process N t , the Gamma pro-cess G t , the Pascal process P t and the Meixner process H t , respectively. SeeSchoutens and Teugels [21, 23] for details on all these processes and familiesof orthogonal polynomials.Brownian motion is not of interest to us since the corresponding lastline of Table 1 has been established by Glasserman and Yu [14]. As forthe remaining four processes, in the light of condition (3.3), the martingalerelations [21] E [ C k ( N t , t ) | N s ] = (cid:18) st (cid:19) k C k ( N s , s ) , E [ L ( t − k ( G t ) | G s ] = L ( s − k ( G s ) , (4.1) E [ M k ( P t ; t, q ) | P s ] = ( s ) k ( t ) k M k ( P s ; s, q ) , E [ P k ( H t ; t, ζ ) | H s ] = P k ( H s ; s, ζ ) , ONGSTAFF–SCHWARTZ ALGORITHM FOR L´EVY MODELS valid for 0 < s < t , prompt us to choose the basis functions in Table 2.[Note that ( t ) k = t ( t + 1) · · · ( t + k −

1) is the Pochhammer symbol.] Whenspecializing the bounds (3.4) and (3.5) to our examples, we will require theorthogonality properties E [ C k ( N t , t ) C l ( N t , t )] = t − k k ! δ kl , (4.2) E [ L ( t ) k ( G t ) L ( t ) l ( G t )] = Γ( k + t + 1) k ! δ kl , (4.3) E [ M k ( P t ; t, q ) M l ( P t ; t, q )] = k !( t ) k q k δ kl , (4.4) E [ P k ( H t ; t, ζ ) P l ( H t ; t, ζ )] = Γ( k + 2 t )(2 sin ζ ) t k ! δ kl , (4.5)as well as a way to express the squares of the basis functions as series ofbasis functions. We will denote by d ki ( t n ) the linearization coeﬃcients inthe expansion ψ nk ( x ) = k X i =0 d ki ( t n ) ψ ni ( x ) . (4.6)Where distinction is necessary, the linearization coeﬃcients correspondingto the four families in Table 2 will be written as d P ki ( t n ), d G ki ( t n ), d Pa ki ( t n ) and d M ki ( t n ), respectively. The same superscripts will adorn other quantities todistinguish the four cases, namely, the Meixner, Poisson, Gamma and Pascalprocess as in Table 2.Among these processes, the Meixner process has the most signiﬁcancein applications. Clearly, a ﬁnancial model will impose geometric Meixnerdynamics (as in Proposition 3) rather than the linear process which maybecome negative. But then a convergence analysis in the spirit of Glassermanand Yu [14] is impossible with polynomial basis functions as the geometricMeixner process does not have ﬁnite moments of all orders. Instead, wepropose to use basis functions of logarithmic growth, ψ M , log nk ( x ) = P k (log x ; t n , ζ ) . (4.7) Table 2

L´evy–Meixner systems

Process Notation Basis polynomials ψ nk ( x ) Parameters

Meixner H t ψ M nk ( x ) = P k ( x ; t n , ζ ) 0 < ζ < π Standard Poisson N t ψ P nk ( x ) = t kn C k ( x, t n )Gamma G t ψ G nk ( x ) = L ( t n − k ( x )Pascal P t ψ Pa nk ( x ) = ( t n ) k M k ( x ; t n , q ) 0 < q < S. GERHOLD

Then our convergence result for the Meixner process (Theorem 4 below) canbe applied. Similarly, models based on the geometric Poisson ([24], Section112.7.1) or geometric Pascal processes can be reduced to the linear case bymodifying their respective basis functions analogously.

5. Unbounded state space: The single-period problem.

Main result and ﬁrst steps of the proof.

We now state our mainresult about the single-period problem where our option has the exercisetimes 0 = t < t < t . As noted above, the geometric Meixner model iscontained in this result by modifying the basis functions according to (4.7). Theorem 4.

Suppose m = 2 , that S t is a Meixner process and that thebasis functions are as in the ﬁrst line of Table 2. Put ( u, v ) = (8 , . If thenumber N of paths and the number K of basis functions satisfy N ≥ K ( u + ε ) K for some positive ε , then lim N →∞ sup | β | =1 E [ | β − ˆ β | ] = 0 . If N ≤ K ( v − ε ) K , then lim N →∞ sup | β | =1 E [ | β − ˆ β | ] = ∞ . For the standard Poisson, Gamma and Pascal processes, with their respectivebasis functions from Table 2, the same holds if ( u, v ) is replaced by (10 , , (8 , and (11 , , respectively. The announced critical rate log N/ log log N in Table 1 then follows fromthe fact that the solution of N = K cK satisﬁes K ∼ c − log N/ log log N (see,e.g., de Bruijn [6]).Looking at (3.4) and (3.5) we begin the proof of Theorem 4 by bounding k Ψ k and k Ψ − k , deﬁned by (3.1) and Table 2. As in Section 2, the letter c denotes various positive constants whose value is irrelevant. Lemma 5. As K → ∞ , the values k Ψ k and k Ψ − k grow at most expo-nentially in all four cases (Meixner, Poisson, Gamma and Pascal), exceptfor k Ψ P1 k ≤ c K K K and k Ψ Pa1 k ≤ c K K K . Proof.

The estimates for the Meixner, Poisson and Pascal cases areeasy consequences of the orthogonality relations (4.2)–(4.5) and Stirling’sformula. It remains to deal with the Gamma case. The parameter t − ONGSTAFF–SCHWARTZ ALGORITHM FOR L´EVY MODELS the martingale property (4.1) is not quite compatible with the orthogonalityrelation (4.3) of the Laguerre polynomials. But by the formula [26] L ( α − k ( x ) = L ( α ) k ( x ) − L ( α ) k − ( x )we obtain E [ ψ G1 k ( G t ) ψ G1 l ( G t )] =  − (cid:18) k + t k (cid:19) , k = l − k + t k + t (cid:18) k + t k (cid:19) , k = l , − (cid:18) k + t − k − (cid:19) , k = l + 1,0 , | k − l | ≥ G1 is tridiagonal. Since (5.1) grows only polynomially in k , it is clearthat so does k Ψ G1 k . As for the inverse, note that Ψ G1 is diagonally dominantso that it suﬃces to bound the diagonal elements of (Ψ G1 ) − (see Nabben[19], Theorem 3.1); note that the τ k from that theorem are all equal to 1in our situation.) The diagonal elements e k of (Ψ G1 ) − can be computedrecursively by [19] e KK = KK + t (cid:18) K + t − K − (cid:19) − ≤ c K and e k − ,k − = k + t k (cid:18) k + t k + t e k,k − e k +1 ,k +1 (cid:19) , ≤ k < K. A straightforward backward induction shows that this implies | e kk | ≤ (4( t + 1)) K − k +1 e KK , ≤ k < K, hence, k (Ψ G1 ) − k grows at most exponentially too. (cid:3) We proceed to bound the fourth order moments appearing in (3.4). Using(4.6) and the martingale relation (3.3), we obtain E [ ψ j ( S t ) ψ k ( S t ) ]= E " j X i =0 d ji ( t ) ψ i ( S t ) × k X s =0 d ks ( t ) ψ s ( S t ) (5.2) = j X i =0 2 k X s =0 d ji ( t ) d ks ( t ) E [ E [ ψ i ( S t ) | S t ] ψ s ( S t )]= j X i =0 2 k X s =0 d ji ( t ) d ks ( t ) E [ ψ i ( S t ) ψ s ( S t )] . S. GERHOLD

The linearization coeﬃcients d ki from the expansion (4.6) are well-studiedobjects for various families of orthogonal polynomials. They have combinato-rial interpretations in terms of (generalized) derangements, rook polynomialsand matching polynomials. See Zeng [27] for on overview of these properties,explicit formulas and many references. Paraphrasing some of these formulas([27], Corollary 2) we have d P ki ( t n ) = t k − in k ! i ! X s ≥ t sn ( s − k )! ( s − i )!(2 k + i − s )! , (5.3) d G ki ( t n ) = 2 k + i k ! i ! X s ≥ ( t n − s s ( s − k )! ( s − i )!(2 k + i − s )! , (5.4) d Pa ki ( t n ) = (1 + q ) k + i k ! i ! ( t n ) k ( t n ) i X s ≥ ( t n ) s (1 + q ) − s q − s ( s − k )! ( s − i )!(2 k + i − s )! , (5.5) d M ki ( t n ) = ( − ζ ) k + i k ! i ! X s ≥ ( t n ) s (1 + (cot ζ ) − ) s s ( s − k )! ( s − i )!(2 k + i − s )! . (5.6)Here it is understood that 1 /n ! = 0 for n a negative integer, as is naturalwhen extending the factorial by the Gamma function. Therefore, the sumsin (5.3)–(5.6) run from s = max { i, k } to s = k + ⌊ i/ ⌋ .5.2. Moment bounds in the Poisson case.

By (4.2), (5.2) and (5.3), thesum on the right-hand side of (3.4) can be estimated by K X j =0 K X k =0 E [ ψ P2 j ( S t ) ψ P1 k ( S t ) ](5.7) ≤ c K K X j =0 K X k =0 2 min { k,j } X i =0 i ! (cid:18)X s ≥ b P jis (cid:19)(cid:18)X s ≥ b P kis (cid:19) , where b P kis := k ! i !( s − k )! ( s − i )!(2 k + i − s )! . It is easy to see that b P k +1 ,i,k + l +1 /b P k,i,k + l > i ≥

1, 0 ≤ l ≤ i/ k ≥ i − l , hence, b P k,i,k + l increases in k under these conditions. From this wededuce that the s -sums in (5.7) increase in j , respectively, k : k + ⌊ i/ ⌋ X s =max { i,k } b P kis = ⌊ i/ ⌋ X l =max { i − k, } b P k,i,k + l ONGSTAFF–SCHWARTZ ALGORITHM FOR L´EVY MODELS ≤ ⌊ i/ ⌋ X l =max { i − k, } b P k +1 ,i,k + l +1 = k + ⌊ i/ ⌋ +1 X s =max { i,k } +1 b P k +1 ,i,s ≤ k + ⌊ i/ ⌋ +1 X s =max { i,k +1 } b P k +1 ,i,s . Using this in (5.7) yields (recall that c may change its value in each occur-rence) K X j =0 K X k =0 E [ ψ P2 j ( S t ) ψ P1 k ( S t ) ](5.8) ≤ c K K ! K X i =0 K + ⌊ i/ ⌋ X s = K i ! / ( s − K )! ( s − i )!(2 K + i − s )! ! . It is plain that the summand increases in i for K ≥

0, 0 ≤ i ≤ K and K ≤ s ≤ K + i/

2. Hence, we ﬁnd that the portion P Ki =0 of the i -sum in (5.8) canbe bounded from above by( K + 1) K ! ⌊ K/ ⌋ X s = K s − K )! (3 K − s )! ! (5.9) ≤ c K K K . To see the last inequality, note that the summand in (5.9) is unimodal withmode at s = K + K / − K / + O (1). Estimating this maximal summand,by Stirling’s formula and some easy manipulations, shows that the sum in(5.9) is smaller than c K K K . The remaining part P Ki = K +1 of the i -sum in(5.8) can be estimated by K X i = K +1 i ! K + ⌊ i/ ⌋ X s = i i ! s !( s − K )! ( s − i )!(2 K + i − s )! ! ≤ c K K X i = K +1 i ! (cid:18) i !( K + ⌊ i/ ⌋ )! ⌊ i/ ⌋ ! ( K + ⌊ i/ ⌋ − i )!( i − ⌊ i/ ⌋ )! (cid:19) (5.10) ≤ c K K X i = K +1 i !( K + ⌊ i/ ⌋ )! ( K + ⌊ i/ ⌋ − i )! ≤ c K K K . S. GERHOLD

Fig. 1.

The summation range of the ﬁrst sum in (5.10).

Note that in the ﬁrst line we have introduced the new factor s ! in the numer-ator. This makes the summand increasing w.r.t. the substitution i → i + 1, s → s + 1. Hence, it suﬃces to keep only the summands of the s -sum with s = K + ⌊ i/ ⌋ (the thick dots in Figure 1) which shows the ﬁrst inequality.As for the second inequality, note that the factor i ! / ⌊ i/ ⌋ ! of the summandgrows only exponentially and that the factor ( i − ⌊ i/ ⌋ )! in the denominatoris clearly negligible. Finally, the last sum in (5.10) has increasing summandswhich, together with Stirling’s formula, implies the last inequality. By (5.8),the estimates (5.9) and (5.10) show that K X j =0 K X k =0 E [ ψ P2 j ( S t ) ψ P1 k ( S t ) ] ≤ c K K K . In light of (3.4) and Lemma 5, the value u = 10 for the Poisson process inTheorem 4 is established.As for the second assertion about the Poisson process in Theorem 4, notethat, from (5.2), E [ ψ K ( S t ) ψ k ( S t ) ] = K X i =0 2 k X s =0 d Ki ( t ) d ks ( t ) E [ ψ i ( S t ) ψ s ( S t )] . The orthogonality property (4.2) and formula (5.3) yield K X k =0 E [ ψ P2 K ( S t ) ψ P1 k ( S t ) ] ≥ c K K X k =0 2 k X i =0 d P Ki ( t ) d P ki ( t ) i ! ≥ c K d P K, K ( t ) d P K, K ( t )(2 K )! ≥ c K (2 K )! ≥ c K K K . The second inequality follows from retaining only the summand k = K , i = 2 K . This makes the sum in (5.3) collapse to the summand s = 2 K , ONGSTAFF–SCHWARTZ ALGORITHM FOR L´EVY MODELS hence, the third inequality. Appealing to (3.5) and Lemma 5 completes theproof of the Poisson part of Theorem 4. Note that the preceding estimatescan presumably be improved. This seems not worthwhile though; since ourestimate of k Ψ P1 k in Lemma 5 is sharp, we will not obtain equal values u = v in Theorem 4 anyway, unless at least one of the bounds (3.4) and (3.5) wasimproved too.5.3. Moment bounds in the Meixner case.

The proofs in the remainingthree cases are very similar to the Poisson case. In the Meixner case, wehave K X j =0 K X k =0 E [ ψ M2 j ( S t ) ψ M1 k ( S t ) ](5.11) ≤ c K K X j =0 K X k =0 2 min { k,j } X i =0 i ! (cid:18)X s ≥ b M jis (cid:19)(cid:18)X s ≥ b M kis (cid:19) , where b M kis := k ! s !( s − k )! ( s − i )!(2 k + i − s )! . Again, b M k,i,k + l increases in k and the remaining steps to show the upperbound are completely analogous to the Poisson case. This time the numera-tor factor s ! in the analogue of (5.10) appears naturally and is not introducedartiﬁcially to force some monotonicity. Moreover, the lower bound uses thesame summands as in the Poisson case. Both resulting bounds are of theform c K K K , hence, u = v = 8 in Theorem 4.5.4. Moment bounds in the Pascal case.

We can reuse the values b M kis and the estimate that we just sketched: K X j =0 K X k =0 E [ ψ Pa2 j ( S t ) ψ Pa1 k ( S t ) ] ≤ c K K X j =0 K X k =0 2 min { k,j } X i =0 i ! k ! (cid:18)X s ≥ b M jis (cid:19)(cid:18)X s ≥ b M kis (cid:19) ≤ c K K !(2 K )! K X j =0 K X k =0 2 min { k,j } X i =0 i ! (cid:18)X s ≥ b M jis (cid:19)(cid:18)X s ≥ b M kis (cid:19) ≤ c K K !(2 K )! K K ≤ c K K K . The lower bound poses no new diﬃculties either. S. GERHOLD

Moment bounds in the Gamma case.

This part is only slightly moreinvolved. Due to (5.1), we have three i -sums instead of one in the analogueof (5.8). The right-hand side of (5.1) can be replaced by c K in each ofthese. Then one of the three i -sums equals the i -sum in (5.11) and the othertwo diﬀer only in an index shift b M k,i ± ,s which can be easily bounded bypolynomial factors. Thus the resulting growth rate is c K K K , as for theMeixner case. The proof of Theorem 4 is complete.5.6. Side remark: The Bachelier model.

We ﬁnish this section with aremark about Brownian motion. If this is the underlying process S t , thenappropriate basis functions can be built from Hermite polynomials in sucha way that k Ψ k , k Ψ − k and the analogue of (4.2) grow only exponentially[14]. This is in line with the corresponding growth orders in the Gamma andMeixner cases (and in the Poisson and Pascal cases, if we renormalize ourbasis functions there by 1 / √ k ! and 1 /k !, resp.). What makes the Gaussiancase peculiar is that the linearization coeﬃcients of the Hermite polynomialsinduce only exponential growth too when plugged into (5.2), whereas thelinearization coeﬃcients in the four cases we treat in this paper grow faster.

6. Unbounded state space: The multi-period problem.

In this sectionwe extend the main result of the preceding section (Theorem 4) to themulti-period problem, that is, to m + 1 exercise dates 0 = t < · · · < t m .We know from the single-period problem that the critical rate cannot belarger than log N/ log log N , so we will be done if we can show that thereis an upper bound for the mean square error of the form K cK for somepositive c . Fortunately, this can be deduced with little eﬀort from a result ofGlasserman and Yu [14] and the estimates from the preceding section aboutthe single-period problem. Following [14], we assume that a representationanalogous to (3.2) holds at time t m and that the payoﬀ functions do notgrow too fast in the following sense. Theorem 6.

Suppose that the payoﬀ functions satisfy the growth con-straint E [ h n ( S t n ) ] ≤ max ν (cid:18) t ν +1 t ν (cid:19) K max ν,k E [ ψ νk ( S t ν ) ] , ≤ n ≤ m. Then the mean square error of the estimated coeﬃcients satisﬁes sup | β m − | =1 E [ | β n − ˆ β n | ] ≤ N − c K K ( m − n +1) uK , ≤ n < m, where u takes on the same values as in Theorem 4, that is, , , , for S t ,the Meixner, standard Poisson, Gamma and Pascal process, respectively. ONGSTAFF–SCHWARTZ ALGORITHM FOR L´EVY MODELS Proof.

By results of Glasserman and Yu [14], Theorem 3 and the lastformula before (18) on page 2096 and Jensen’s inequality, we havesup | β m − | =1 E [ | β n − ˆ β n | ] ≤ c K N max ≤ ν

7. Conclusion.

Stentoft [25] and Glasserman and Yu [14] obtained appar-ently contradictory results about the convergence of the Longstaﬀ–Schwartzalgorithm. The main diﬀerence between their respective assumptions is the(un-)boundedness of the support of the underlying at the exercise dates. Inthis light the pessimistic results of Glasserman and Yu (and our Theorems 4and 6) turn out to stem from the tails of the distribution of the underlying.The present paper shows that Stentoft’s result can be applied to L´evymodels under mild assumptions and extends Glasserman and Yu’s [14] re-sults to several concrete processes. Thus we provide some evidence thatGlasserman and Yu [14] were right to conjecture that their results for Brow-nian motion and geometric Brownian motion extend to other models.Concerning Stentoft [25] and our Section 2: although the boundedness ofthe underlying induces a nice (polynomial) relation between the number ofbasis functions and the necessary number of Monte Carlo paths, it seemsnot yet completely clear that it is a harmless assumption in practice. Anatural question for future research is how strongly the size of the truncationintervals inﬂuences the convergence speed of the calculated prices. S. GERHOLD

Acknowledgments.

I thank Lars Stentoft, Friedrich Hubalek and an anony-mous referee for helpful comments.REFERENCES [1]

Bacinello, A. R. , Biffis, E. and

Millossovich, P. (2009). Pricing life insur-ance contracts with early exercise features.

J. Comput. Appl. Math.

Belomestny, D. , Bender, C. and

Schoenmakers, J. (2009). True upper boundsfor Bermudan products via non-nested Monte Carlo.

Math. Finance Brigo, D. and

Mercurio, F. (2006).

Interest Rate Models—Theory and Practice:With Smile, Inﬂation and Credit , 2nd ed. Springer, Berlin. MR2255741[4]

Cl´ement, E. , Lamberton, D. and

Protter, P. (2002). An analysis of a leastsquares regression method for American option pricing.

Finance Stoch. Cont, R. , Tankov, P. and

Voltchkova, E. (2004). Option pricing models withjumps: Integro-diﬀerential equations and inverse problems. In

Proceedings ofECCOMAS 2004 ( P. Neittaanm¨aki et al. , eds.). Jyv¨askyl¨a, Finland.[6] de Bruijn, N. G. (1958).

Asymptotic Methods in Analysis . Bibliotheca Mathematica . North-Holland, Amsterdam. MR0099564[7] de Jong, R. M. (2002). A note on “Convergence rates and asymptotic normality forseries estimators”: Uniform convergence rates by W. K. Newey. J. Econometrics

Dufresne, F. , Gerber, H. U. and

Shiu, E. S. W. (1991). Risk theory with theGamma process.

Astin Bull. Egloff, D. (2005). Monte Carlo algorithms for optimal stopping and statisticallearning.

Ann. Appl. Probab. Egloff, D. , Kohler, M. and

Todorovic, N. (2007). A dynamic look-ahead MonteCarlo algorithm for pricing Bermudan options.

Ann. Appl. Probab. Feller, W. (1971).

An Introduction to Probability Theory and Its Applications. Vol.II , 2nd ed. Wiley, New York. MR0270403[12]

Fouque, J.-P. and

Han, C.-H. (2009). Asymmetric variance reduction for pric-ing American options. In

Mathematical Modelling and Numerical Methods inFinance ( A. Bensoussan, P. Ciarlet and Q. Zhang , eds.).

Handbook of Nu-merical Analysis Gerber, H. U. and

Shiu, E. S. W. (1994). Option pricing by Esscher transforms.

Transactions of Society of Actuaries Glasserman, P. and

Yu, B. (2004). Number of paths versus number of basis func-tions in American option pricing.

Ann. Appl. Probab. Godefroy, M. (1901).

La Fonction Gamma . Gauthier-Villars, Paris.[16]

Grigelionis, B. (1999). Processes of Meixner type.

Liet. Mat. Rink. Longstaff, F. A. and

Schwartz, E. S. (2001). Valuing American options by sim-ulation: A simple least-squares approach.

Rev. Financial Stud. Meixner, J. (1934). Orthogonale Polynomsysteme mit einer besonderen Gestalt dererzeugenden Funktion.

J. London Math. Soc. Nabben, R. (1999). Two-sided bounds on the inverses of diagonally dominant tridi-agonal matrices.

Linear Algebra Appl. [20] Newey, W. K. (1997). Convergence rates and asymptotic normality for series esti-mators.

J. Econometrics Schoutens, W. (2000).

Stochastic Processes and Orthogonal Polynomials . LectureNotes in Statistics . Springer, New York. MR1761401[22]

Schoutens, W. (2002). Meixner processes: Theory and applications in ﬁnance. EU-RANDOM Report 2002-004, EURANDOM, Eindhoven.[23]

Schoutens, W. and

Teugels, J. L. (1998). L´evy processes, polynomials and mar-tingales.

Comm. Statist. Stochastic Models Shreve, S. E. (2004).

Stochastic Calculus for Finance. II. Continuous-Time Models .Springer, New York. MR2057928[25]

Stentoft, L. (2004). Convergence of the least squares Monte Carlo approach toAmerican option valuation.

Manag. Sci. Szeg˝o, G. (1975).

Orthogonal Polynomials , 4th ed.

American Mathematical Society,Colloquium Publications

XXIII . Amer. Math. Soc., Providence, RI. MR0372517[27]

Zeng, J. (1992). Weighted derangements and the linearization coeﬃcients of orthog-onal Sheﬀer polynomials.

Proc. London Math. Soc. (3) Financial and Actuarial MathematicsVienna University of TechnologyWiedner Hauptstraße 8–10A-1040, ViennaAustriaE-mail: [email protected]