Weighted Monte Carlo with least squares and randomized extended Kaczmarz for option pricing
Damir Filipović, Kathrin Glau, Yuji Nakatsukasa, Francesco Statti
September 30, 2019

∗ The authors would like to thank Daniel Kressner for helpful discussions on this paper.
† EPFL and Swiss Finance Institute, 1015 Lausanne, Switzerland. Email: [email protected]
‡ Queen Mary University of London, Mile End Road, E1 4NS London, United Kingdom. Email: [email protected]. This project has received funding from the European Union’s Horizon 2020 research and innovation program under grant 665667.
§ Mathematical Institute, University of Oxford, Oxford, OX2 6GG, UK, and National Institute of Informatics, Japan. Email: [email protected]
¶ EPFL, 1015 Lausanne, Switzerland. Email: [email protected]
Abstract
We propose a methodology for computing single and multi-asset European option prices, and more generally expectations of scalar functions of (multivariate) random variables. This new approach combines the ability of Monte Carlo simulation to handle high-dimensional problems with the efficiency of function approximation. Specifically, we first generalize the recently developed method for multivariate integration in [arXiv:1806.05492] to integration with respect to probability measures. The method is based on the principle "approximate and integrate" in three steps: i) sample the integrand at points in the integration domain, ii) approximate the integrand by solving a least-squares problem, iii) integrate the approximate function. In high-dimensional applications we face memory limitations due to large storage requirements in step ii). Combining weighted sampling and the randomized extended Kaczmarz algorithm we obtain a new efficient approach to solve large-scale least-squares problems. Our convergence and cost analysis along with numerical experiments show the effectiveness of the method in both low and high dimensions, and under the assumption of a limited number of available simulations.
Key words
Monte Carlo, Monte Carlo under budget constraints, variance reduction, multi-asset options, Kaczmarz algorithm, weighted sampling, large-scale least-squares problems.
Recently, a new algorithm combining function approximation and integration with Monte Carlo simulation has been developed in [20]. This algorithm, called MCLS (Monte Carlo with Least Squares), draws on Monte Carlo's ability to deal with the curse of dimensionality and reduces the variance through function approximation. (The similarity between the names MCLS and Least-Squares Monte Carlo, the method for American option pricing developed in [19], is only owed to the fact that both methods use Monte Carlo and least squares.)

In this paper we extend MCLS to efficiently price European options in low and high dimensions. That is, we approximate integrals of the form

    I_µ = ∫_E f(x) dµ(x),

where (E, A, µ) is a probability space. MCLS is based on the principle "approximate and integrate" and mainly consists of three steps:

i) generate N sample points {x_i}_{i=1}^N ∈ E from µ;
ii) approximate the integrand f by a linear combination of a priori chosen basis functions, f(x) ≈ p(x) := ∑_{j=0}^n c_j φ_j(x), where the coefficients c = (c_0, …, c_n)^T are computed by solving the least-squares problem min_{c∈R^{n+1}} ‖Vc − f‖_2 for the Vandermonde matrix V = (φ_j(x_i))_{i=1,…,N; j=0,…,n} and f = (f(x_1), …, f(x_N))^T;
iii) integrate p, so that ∑_{j=0}^n c_j ∫_E φ_j(x) µ(dx) ≈ ∫_E f(x) µ(dx).

(Usually the notion "Vandermonde matrix" is used for the special case φ_i(x) = x^i, and the matrix we are dealing with is therefore a generalized Vandermonde matrix; in statistics, such a matrix is usually referred to as a "design matrix". For the rest of the paper we call it Vandermonde matrix, as in [20].)

Key to exploiting the advantage of function approximation for integration is to choose basis functions {φ_j}_{j=0}^n that can be easily, possibly exactly, integrated with respect to µ, and such that p approximates f well. It can be shown that for n = 0 the MCLS estimator coincides with the standard MC estimator, and for a given fixed N, MCLS can outperform MC for any reasonably chosen basis {φ_j}_{j=0}^n with n > 0.

Our contribution is fourfold. First, we extend MCLS to the computation of general expectations I_µ. This extends [20], which only considered E = [0,1]^d and µ(dx) = dx. In particular, our cost analysis reveals that MCLS asymptotically becomes more accurate than MC at the same cost, see Proposition 3.4. This is of practical use whenever high accuracy is required and a large number of simulations is available, as for instance in option pricing. For a typical task in portfolio risk management, instead, only a limited budget of simulations is available. This is because evaluating f is extremely costly on the IT infrastructure of standard financial institutions, such as insurance companies. Our error analysis suggests that MCLS provides a better approximation than MC for such limited budget situations, compare Proposition 3.2 and the subsequent discussion.

Second, we note that a computational bottleneck in MCLS is the storage requirement arising when solving the least-squares problem. Indeed, when the number of simulations N and the number of basis functions n + 1 are too large, it is not feasible to explicitly store the N × (n + 1) Vandermonde matrix V. This severely limits the scalability of MCLS.
In order to overcome this limitation we enhance MCLS by the randomized extended Kaczmarz (REK) algorithm [24] for solving the least-squares problem. The benefit of REK is that no explicit storage of V is needed. However, REK only applies efficiently to well conditioned least-squares problems. Here we profit from the reduction of the condition number of V thanks to a weighted sampling scheme due to [4]. Moreover, under this weighted sampling scheme the rows of V have equal Euclidean norm, which further speeds up REK.

Third, we apply MCLS to efficiently price European options in low and high dimensions. Here, the method turns out to be especially favorable for the class of polynomial models [6, 7], which covers widely used asset models such as the Black-Scholes, Heston and Cox-Ingersoll-Ross models. In this framework conditional moments, and thus expectations of polynomials in the underlying price process, are given in closed form. This naturally suggests the choice of polynomials as basis functions {φ_j}_{j=0}^n, for which step iii) of MCLS becomes very efficient.

Fourth, we approximate the high-dimensional integral ∫_{[0,1]^d} sin(∑_{j=1}^d x_j) dx for d = 10 and d = 30, which shows that the limitations of the approach in [20] due to high dimensionality have indeed been overcome considerably thanks to REK.

The rest of the paper is organized as follows. In Section 2 we review the main ingredients of our methodology and explain how to combine them. In particular, in Section 2.1 we review MCLS as in [20] and extend it to arbitrary probability measures. Then, in Section 2.2 we present the weighted sampling strategy proposed in [4]. In Section 2.3 we review the randomized extended Kaczmarz algorithm and combine it with MCLS. We provide a convergence and a cost analysis in Section 3. In Section 4 we apply MCLS to option pricing. Here, we present numerical results for different polynomial models and payoff profiles in both low and high dimensionality. In Section 5 we analyze the performance of MCLS for a standard high-dimensional integration test problem. We conclude in Section 6.

Core methods
For the reader's convenience we present the three main ingredients of our methodology. First, we extend MCLS to arbitrary probability measures. Then, we present its combination with weighted sampling strategies, as suggested in [20]. Finally, we recap the randomized extended Kaczmarz algorithm and propose our combined strategy.
In this section we introduce a methodology to compute the definite integral

    I_µ := ∫_E f(x) dµ(x),

for some probability space (E, A, µ) and a function f: E → R, which we assume to be square-integrable, i.e. in

    L²_µ = { f: E → R measurable | ‖f‖²_µ = ∫_E f(x)² dµ(x) < ∞ },

which is a Hilbert space with the inner product ⟨f, g⟩_µ = ∫_E f(x) g(x) dµ(x). The method is an extension of the method proposed in [20] for integrals with respect to the Lebesgue measure.

To start, we choose a set of n basis functions {φ_j}_{j=1}^n, with φ_0 ≡ 1,
which will be used to approximate the integrand f. The idea is to choose basis functions φ_j that can be easily integrated; for instance, polynomials can be a good choice. Then, the steps of MCLS are as follows. First, as in standard Monte Carlo methods, one generates N sample points {x_i}_{i=1}^N ∈ E according to µ.

Second, the integrand f and the set of basis functions are evaluated at all simulated points {x_i}_{i=1}^N, leading to the following least-squares problem:

    min_{c∈R^{n+1}} ‖ V c − f ‖_2, where
    V := [ 1  φ_1(x_1)  …  φ_n(x_1)
           1  φ_1(x_2)  …  φ_n(x_2)
           ⋮     ⋮            ⋮
           1  φ_1(x_N)  …  φ_n(x_N) ],  c = (c_0, …, c_n)^T,  f := (f(x_1), f(x_2), …, f(x_N))^T,    (1)

which we denote as min_{c∈R^{n+1}} ‖Vc − f‖_2. Note that (1) can be seen as a discrete version of the projection problem min_c ‖f − ∑_{j=0}^n c_j φ_j‖_µ. Third, one solves (1), whose solution is known to be explicitly given by ĉ = (V^T V)^{−1} V^T f. At this point, the linear combination p(x) := ∑_{j=0}^n ĉ_j φ_j(x) is an approximation of f. Finally, the last step consists of computing the integral of the approximant p, and I_µ is approximated by

    I_µ ≈ Î_{µ,N} = ∫_E p(x) dµ(x) = ĉ_0 + ∑_{j=1}^n ĉ_j ∫_E φ_j(x) dµ(x).

We summarize the procedure in Algorithm 1.

Algorithm 1 Generalized MCLS
Input:
Function f, basis functions {φ_j}_{j=1}^n with φ_0 ≡ 1, integer N (> n), probability distribution µ over domain E.
Output:
Approximate integral Î_{µ,N} ≈ ∫_E f(x) dµ(x)
1: Generate sample points {x_i}_{i=1}^N ∈ E according to µ.
2: Evaluate f(x_i) and φ_j(x_i), for i = 1, …, N and j = 1, …, n.
3: Solve the least-squares problem (1) for c = (c_0, c_1, …, c_n)^T.
4: Compute Î_{µ,N} = ĉ_0 + ∑_{j=1}^n ĉ_j ∫_E φ_j(x) dµ(x).

We remark that there is an interesting connection between MCLS and the standard Monte Carlo method: if one takes n = 0, i.e. one approximates f with a constant function, the resulting approximation is the solution of the least-squares problem

    min_{c_0∈R} ‖ c_0 (1, …, 1)^T − (f(x_1), …, f(x_N))^T ‖_2,

which is exactly given by ĉ_0 = (1/N) ∑_{i=1}^N f(x_i), the standard Monte Carlo estimator. We recall that in the standard MC method, asymptotically for large N the error scales like

    (∫_E (f(x) − I_µ)² dµ(x))^{1/2} / √N = min_{c∈R} ‖f − c‖_µ / √N =: σ(f)/√N.    (2)

(See e.g. [2] for the first term, and recall that the expectation of a random variable X is the constant with minimal distance to X in the L²-norm; rescaling yields the identity (2).) The quantity σ(f)² is usually referred to as the variance of f. This connection between MC and MCLS leads to an asymptotic error analysis, which we detail in Section 3.1. It can also be exploited in order to increase the speed of convergence by combining the method with quasi-Monte Carlo. In [20] also other ways to speed up the procedure are proposed, for example an adaptive choice of the basis functions (MCLSA).

It is observed in [20] that the method performs well for dimensions d up to d = 6. For higher dimensions, solving the least-squares problem (1) becomes computationally expensive, mainly due to two effects:

(i) The size of the matrix V, being N × (n + 1), rapidly becomes very large, posing memory limitations.
(ii) The condition number of the Vandermonde matrix V typically gets large.

In the following we address these issues by combining MCLS with weighted sampling strategies and with the randomized extended Kaczmarz algorithm for solving the least-squares problem.

It is crucial that the coefficient matrix V in (1) be well conditioned, from both a computational and (more importantly) a function approximation perspective. Computationally, an ill-conditioned V means the least-squares problem is harder to solve using e.g. the conjugate gradient method, or the randomized Kaczmarz method described in Section 2.3. From an approximation viewpoint, V having a large condition number κ(V) (we denote by κ(V) the 2-norm condition number of V) implies that the function approximation error (in the continuous setting) can be large: ‖f − ∑_{j=0}^n ĉ_j φ_j‖_µ can exceed the best error ‖f − ∑_{j=0}^n c*_j φ_j‖_µ by a factor of order κ(V) (see [20]), where c* := argmin_{c∈R^{n+1}} ‖f − ∑_{j=0}^n c_j φ_j‖_µ. Hence in practice we devise the MCLS setting (choice of φ and sampling strategy) so that V is well conditioned with high probability.

A first step towards a well-conditioned Vandermonde matrix V is to choose the basis to be orthonormal with respect to the scalar product ⟨·,·⟩_µ, for instance by applying a Gram-Schmidt orthonormalization procedure. Next, we observe that the strong law of large numbers yields

    (1/N) (V^T V)_{i+1,j+1} = (1/N) ∑_{l=1}^N φ_i(x_l) φ_j(x_l) → ∫_E φ_i(x) φ_j(x) dµ(x) = δ_ij  almost surely, as N → ∞.

Therefore, for a large number of samples N we expect (1/N) V^T V to be close to the identity matrix Id_{n+1} ∈ R^{(n+1)×(n+1)}. This implies that κ(V) is close to 1.
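As an illustration of Algorithm 1, consider the following minimal MATLAB sketch for E = [0,1] with the Lebesgue measure and an orthonormal Legendre basis; the integrand f, the sizes N and n, and the helper legvander are illustrative choices of this sketch, not taken from the paper. For an orthonormal basis with φ_0 ≡ 1, only φ_0 has a nonzero integral, so the estimate is simply the first coefficient.

    % Minimal MCLS sketch (Algorithm 1) on E = [0,1] with mu = Lebesgue.
    f = @(x) exp(x).*sin(3*x);     % illustrative integrand
    N = 1e4; n = 10;
    x = rand(N,1);                 % step 1: sample from mu
    V = legvander(x, n);           % step 2: Vandermonde matrix
    c = V \ f(x);                  % step 3: least-squares problem (1)
    Ihat = c(1);                   % step 4: for this ONB, int phi_j dmu = 0
    disp([Ihat, cond(V)])          %         for j >= 1, so Ihat = c_0

    function V = legvander(x, n)
    % Orthonormal Legendre polynomials on [0,1] via the recurrence
    % (j+1) P_{j+1}(t) = (2j+1) t P_j(t) - j P_{j-1}(t), with t = 2x - 1.
    t = 2*x(:) - 1;
    V = ones(numel(t), n+1);
    if n >= 1, V(:,2) = t; end
    for j = 1:n-1
        V(:,j+2) = ((2*j+1)*t.*V(:,j+1) - j*V(:,j)) / (j+1);
    end
    V = V .* sqrt(2*(0:n) + 1);    % L^2([0,1])-normalization
    end

Printing cond(V) alongside the estimate makes the conditioning effect discussed above directly observable.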
In practice, however, the condition number often is large. This is because the number N of sample points required to obtain a well-conditioned V might be very large. For example, if we consider the one-dimensional interval E = [−1, 1] with the uniform probability measure and an orthonormal basis of Legendre polynomials, one can show that at least N = O(n² log(n)) sample points are needed to obtain a well conditioned V. This example and others are discussed in [3, 4].

To overcome this problem, Cohen and Migliorati [4] introduce a weighted sampling for least-squares fitting. Its use for MCLS was suggested in [20], which we summarize here. Define the nonnegative function w via

    1/w(x) = (∑_{j=0}^n φ_j(x)²) / (n + 1),    (3)

so that dµ/w is a probability measure, since 1/w ≥ 0 on E and ∫_E (1/w(x)) dµ(x) = 1 for an orthonormal basis. We then take samples {x̃_i}_{i=1}^N according to dµ/w. Intuitively this means that we sample more often in areas where ∑_{j=0}^n φ_j(x)² takes large values.

The least-squares problem (1) with the samples drawn from dµ/w becomes

    min_c ‖ √W (Vc − f) ‖_2,    (4)

where √W = diag(√w(x̃_1), √w(x̃_2), …, √w(x̃_N)), and V, f are as before in (1) with x_i replaced by x̃_i. This is again a least-squares problem min_c ‖Ṽc − f̃‖_2, with coefficient matrix Ṽ := √W V and right-hand side f̃ := √W f, whose solution is ĉ = (Ṽ^T Ṽ)^{−1} Ṽ^T √W f. With high probability, the matrix Ṽ is well conditioned, provided that N ≳ n log n, see Theorem 2.1 in [4].
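Continuing the sketch above, the weighted strategy might be realized as follows in MATLAB; the grid-based discrete sampler is a crude stand-in (an assumption of this sketch, for brevity) for the exact mixture strategy of [13] described after Remark 2.1.

    % Weighted sampling from dmu/w on [0,1] and the weighted problem (4),
    % reusing f and legvander from the previous sketch.
    n = 10; N = 500;
    g    = linspace(0, 1, 1e5)';                 % fine grid on E = [0,1]
    invw = sum(legvander(g, n).^2, 2) / (n+1);   % 1/w on the grid, cf. (3)
    edges = [0; cumsum(invw)] / sum(invw);       % discrete CDF of dmu/w
    [~, ~, bin] = histcounts(rand(N,1), edges);
    xt   = g(bin);                               % samples ~ dmu/w (up to grid error)
    V    = legvander(xt, n);
    sqw  = sqrt((n+1) ./ sum(V.^2, 2));          % sqrt(w(x_i))
    c    = (sqw .* V) \ (sqw .* f(xt));          % weighted problem (4)

By construction each row of the scaled matrix sqw .* V has norm √(n+1), illustrating the property stated in Remark 2.1 below.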
Note that the left-multiplication by √W forces all the rows of Ṽ to have the same norm (here √(n+1)); a property that proves useful in Section 2.3.

A simple strategy to sample from dµ/w is as follows: for each of the N samples, choose a basis function φ_j from {φ_j}_{j=0}^n uniformly at random, and sample from the probability distribution proportional to φ_j² dµ. We refer to [13] for more details.

A standard least-squares solver that uses the QR factorization [9, Ch. 5] costs O(Nn²) operations, which quickly becomes prohibitive (relative to standard MC) when n ≫ 1. As a remedy, applying the conjugate gradient (CG) method to the normal equations (V^T V)c = V^T f is suggested in [20]. For κ(V) = O(1) this reduces the computational cost to O(Nn). However, CG still requires the storage of the whole matrix V, which is O(Nn); indeed, in practice, building and storing the matrix V becomes a major bottleneck in MCLS. To overcome this issue, here we suggest a further alternative, the randomized extended Kaczmarz (REK) algorithm developed by Zouzias and Freris [24]. REK is a particular stochastic gradient method to solve least-squares problems. It builds upon Strohmer and Vershynin's pioneering work [23] and Needell's extension to inconsistent systems [21], and converges to the minimum-norm solution by simultaneously performing projection and solution refinement at each iteration. The convergence is geometric in expectation, and as already observed in [23], Kaczmarz methods can sometimes even outperform the conjugate gradient method in speed for well-conditioned systems. A block version of REK was introduced in [22], which sometimes additionally improves the performance.

Here we focus on REK and consider its application to MCLS. A pseudocode of REK is given in Algorithm 2; MATLAB notation is used, in which V(:,j) denotes the jth column of V and V(i,:) the ith row.

Algorithm 2 REK: Randomized extended Kaczmarz method
Input: V ∈ R^{N×(n+1)} and f ∈ R^N.
Output: Approximate solution c of min_c ‖Vc − f‖_2
1: Initialize c^(0) = 0 and z^(0) = f
2: for k = 0, 1, 2, …, M − 1 do
3:   Pick i = i_k ∈ {1, …, N} with probability ‖V(i,:)‖_2² / ‖V‖_F²
4:   Pick j = j_k ∈ {1, …, n+1} with probability ‖V(:,j)‖_2² / ‖V‖_F²
5:   Set z^(k+1) = z^(k) − (V(:,j_k)^T z^(k) / ‖V(:,j_k)‖_2²) V(:,j_k)
6:   Set c^(k+1) = c^(k) + ((f_{i_k} − z^(k)_{i_k} − V(i_k,:) c^(k)) / ‖V(i_k,:)‖_2²) V(i_k,:)^T
7: end for
8: c = c^(M)

The z^(k) iterates are the projection steps, which converge to f^⊥, the part of f that lies in the orthogonal complement of V's column space. REK works by simultaneously projecting out the f^⊥ component while refining the least-squares solution. Let us comment on REK (Algorithm 2) and its implementation, particularly in the MCLS context:

• Employing the weighted sampling strategy of Section 2.2 significantly simplifies Algorithm 2. Following Remark 2.1, the norms of the rows of Ṽ are constant and equal to √(n+1). This also implies that ‖Ṽ‖_F² = N(n+1). The index i_k in line 3 is therefore simulated uniformly at random. This has a practical significance in MCLS, as the probability distribution (‖V(i,:)‖_2²/‖V‖_F²)_{i=1,…,N} does not have to be computed before starting the REK iterates. This results in a (potentially enormous) computational reduction; an additional benefit of using the weighted sampling strategy, besides improving conditioning.

• The number of iterations M is usually not chosen a priori but by checking convergence of c^(k) infrequently. The suggestion in [24] is to check every 8 min(N, n) iterations for the conditions

    ‖Vc^(k) − (f − z^(k))‖_2 / (‖V‖_F ‖c^(k)‖_2) ≤ ε  and  ‖V^T z^(k)‖_2 / (‖V‖_F² ‖c^(k)‖_2) ≤ ε

for a prescribed tolerance ε > 0.

• A significant advantage of REK is that it renders unnecessary the storage of the whole matrix V: only the i_kth row and the j_kth column are needed, taking O(N) memory cost. In practice, one can even sample in an online fashion: early samples can be discarded once the REK update is completed.
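The following MATLAB sketch implements Algorithm 2 in the weighted MCLS setting. The handles row and col, which regenerate a row or a column of Ṽ from the stored sample points, are assumptions of this sketch. By Remark 2.1 all rows of Ṽ have equal norm, and each column of Ṽ has expected squared norm N, so both indices are drawn uniformly here; in general j must be drawn with probability ‖V(:,j)‖_2²/‖V‖_F².

    function c = rek_mcls(row, col, f, N, n, M)
    % REK (Algorithm 2) without storing V: row(i) returns V(i,:) as a
    % 1 x (n+1) vector, col(j) returns V(:,j) as an N x 1 vector, both
    % regenerated on the fly from the stored sample points.
    c = zeros(n+1, 1);
    z = f;                                  % z converges to the part of f
    for k = 1:M                             % orthogonal to the range of V
        j  = randi(n+1);                    % uniform: weighted setting
        vj = col(j);
        z  = z - (vj'*z) / (vj'*vj) * vj;   % project out the range-of-V part
        i  = randi(N);                      % uniform: rows have equal norm
        vi = row(i);
        c  = c + (f(i) - z(i) - vi*c) / (vi*vi') * vi';  % Kaczmarz update
    end
    end

In the MCLS context, row(i) evaluates the (weighted) basis at the ith sample point and col(j) evaluates the jth basis function at all sample points, so the memory footprint stays at O(N).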
Figure 1: Choice of algorithm to solve the least-squares problem: if V cannot be stored, use REK without storing V; if V can be stored and κ(V) ≤ 10, apply CG to the normal equations; if κ(V) > 10, use a QR-based solver.

The convergence of REK is known to be geometric in the expected mean squared sense [24, Thm 4.1]: after M iterations, we have

    Ẽ ‖c^(M) − ĉ‖_2² ≤ (1 − σ_min(V)²/‖V‖_F²)^{⌊M/2⌋} (1 + 2κ(V)²) ‖ĉ‖_2²,    (5)

where ĉ is the solution of min_c ‖Vc − f‖_2 and the expectation Ẽ is taken over the random choices of the algorithm. When V is close to having orthonormal columns up to a common scaling (as would hold with weighted sampling and/or N → ∞ with orthonormal basis functions φ_j), the convergence in (5) becomes O((1 − 1/(n+1))^{⌊M/2⌋}).

Our experiments suggest that conjugate gradients applied to the normal equations is faster than Kaczmarz, so we recommend CG whenever it is feasible. However, as mentioned above, an advantage of (extended) Kaczmarz is that there is no need to store the whole matrix to execute the iterations. For these reasons, we suggest to choose the solver for the LS problem (1) according to the scheme shown in Figure 1. Preliminary numerical experiments indicate that the threshold 10 for κ(V) is a good choice.
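As pseudocode, the selection scheme of Figure 1 might look as follows in MATLAB; the flag can_store_V, the right-hand-side vector fvals, and the handles passed to the REK sketch above are assumptions of this sketch.

    % Sketch of the solver selection in Figure 1.
    if ~can_store_V
        c = rek_mcls(row, col, fvals, N, n, M);        % REK, V never stored
    elseif cond(V) <= 10
        c = pcg(@(y) V'*(V*y), V'*fvals, 1e-10, 500);  % CG on V'V c = V'f
    else
        c = V \ fvals;                                 % QR-based solver
    end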
In this section we first present convergence results, on the basis of which we will then derive a cost analysis.

First, we obtain a convergence result and consequently asymptotic confidence intervals, applying the central limit theorem (CLT). The following statement and proof are a straightforward generalization of [20, Theorem 5.1] to an arbitrary integrating probability measure µ.

Proposition 3.1.
Fix n and the L²_µ-basis functions {φ_j}_{j=0}^n, and let either w ≡ 1 or w as in (3). Then, with the weighted sampling dµ/w and the corresponding MCLS estimator Î_{µ,N}, as N → ∞ we have

    √N (Î_{µ,N} − I_µ)  →^d  N(0, min_c ‖√w (f − ∑_{j=0}^n c_j φ_j)‖²_µ),

where →^d denotes convergence in distribution.

Proof. The proof is provided in the Appendix.

The proposition shows that the MCLS estimator yields an approximate integral Î_{µ,N} that asymptotically (for N → ∞ and {φ_j}_{j=1}^n fixed) satisfies

    E[|Î_{µ,N} − I_µ|] ≈ min_{c∈R^{n+1}} ‖√w (f − ∑_{j=0}^n c_j φ_j)‖_µ / √N,    (6)

where "≈" together with "for N → ∞" means that the relation holds up to lower-order terms, i.e. (6) means E[|Î_{µ,N} − I_µ|] = min_{c∈R^{n+1}} ‖√w(f − ∑_{j=0}^n c_j φ_j)‖_µ/√N + o(1/√N) for N → ∞. This highlights the fact that the asymptotic error is still O(1/√N) (as in standard MC), but with variance σ(f)² reduced from min_{c∈R} ‖f − c‖²_µ (standard MC, see (2)) to min_{c∈R^{n+1}} ‖√w(f − ∑_{j=0}^n c_j φ_j)‖²_µ (MCLS). In other words, the variance is reduced thanks to the approximation of the function f, and the constant in front of the O(1/√N) convergence in MCLS is equal to the function approximation error in the L²_µ norm.

After solving the least-squares problem (4), the variance min_{c∈R^{n+1}} ‖√w(f − ∑_{j=0}^n c_j φ_j)‖²_µ can be estimated via

    σ̃²_LS := (1/(N − n − 1)) ∑_{i=1}^N w(x̃_i)² (f(x̃_i) − p(x̃_i))² = (1/(N − n − 1)) ‖W(Vĉ − f)‖²_2,    (7)

where the samples x̃_i, i = 1, …, N, are taken according to dµ/w. This approximation is commonly used in linear regression, see e.g. [14]. It leads to approximate confidence intervals; for example, the 95% confidence interval is approximately given by

    [ Î_{µ,N} − 1.96 σ̃_LS/√N,  Î_{µ,N} + 1.96 σ̃_LS/√N ].    (8)

As explained in [20], the MCLS estimator is not unbiased, in the sense that E(Î_{µ,N}) ≠ I_µ. However, one can show along the same lines as in the proof of [20, Proposition 3.1] that for the MCLS estimator Î_{µ,N} with n and {φ_j}_{j=0}^n fixed,

    |I_µ − E(Î_{µ,N})| = O(1/N).

This shows that the bias is of a smaller order than the error.

In the case of weighted sampling, we moreover have a finite sample error bound, which follows directly from [4, Theorem 2.1 (iv)]. Note that as this is a non-asymptotic result, it is especially useful in practice.
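Before turning to the finite-sample bound, we note that the estimate (7) and the interval (8) are one-liners given the output of the weighted sketch from Section 2.2 (variable names as in that sketch, an assumption of this snippet):

    % Plug-in variance estimate (7) and approximate 95% interval (8).
    res  = (sqw.^2) .* (V*c - f(xt));        % w(x_i) * residuals
    sig2 = (res'*res) / (N - n - 1);         % estimate of the reduced variance
    Ihat = c(1);                             % MCLS estimate (ONB)
    ci   = Ihat + 1.96*sqrt(sig2/N)*[-1, 1]; % 95% confidence interval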
Proposition 3.2.
Assume that we adopt the weighted sampling dµ/w. For any r > 0, if n and N are such that n ≤ κ N/log(N) − 1 with κ := (1 − log 2)/(2 + 2r), then

    E[‖f − p̃‖²_µ] ≤ (1 + κ/log(N)) min_c ‖f − ∑_{j=0}^n c_j φ_j‖²_µ + 2 ‖f‖²_µ N^{−r},    (9)

where p̃ is defined as

    p̃ := p  if ‖(1/N) Ṽ^T Ṽ − I‖_2 ≤ 1/2,  and  p̃ := 0  otherwise,

with p = ∑_{j=0}^n ĉ_j φ_j, for ĉ being the solution of (4). Note that the simulation is done with respect to dµ/w.
We note the slight difference between p and p̃; this is introduced to deal with the tail case in which V becomes ill-conditioned (which happens with low probability). It serves a theoretical purpose, but in practice this modification is not necessary and we do not employ it in our experiments.

Proposition 3.2 allows us to derive a non-asymptotic bound for the expected error we commit when estimating the vector c*, the solution of the projection problem underlying (4). To see this, we first decompose the function f into a sum of orthogonal terms

    f = ∑_{j=0}^n c*_j φ_j + g =: f_n + g,    (10)

for some coefficients c*_j, j = 0, …, n, where g satisfies ∫_E g(x) φ_j(x) dµ(x) = 0 for all j = 0, …, n. Note that ‖g‖_µ = min_c ‖f − ∑_{j=0}^n c_j φ_j‖_µ. Then, by orthogonality and the orthonormality of the basis,

    E[‖f − ∑_{j=0}^n ĉ_j φ_j‖²_µ] = E[‖g + (f_n − ∑_{j=0}^n ĉ_j φ_j)‖²_µ] = ‖g‖²_µ + E[‖c* − ĉ‖²_2].

This, together with the bound (9), yields

    E[‖c* − ĉ‖²_2] ≤ (κ/log(N)) min_c ‖f − ∑_{j=0}^n c_j φ_j‖²_µ + 2 ‖f‖²_µ N^{−r}.    (11)

When we are primarily interested in integration, we aim at an upper bound for the expected error of the first component of c. The bound (11) clearly holds for the first component, and this gives us a bound for E[|Î_{µ,N} − I_µ|²]. Intuitively, we expect that the errors in the components of c are not concentrated in any single component. This suggests the heuristic bound

    E[|Î_{µ,N} − I_µ|] ≲ ( (1/n) [ (κ/log(N)) min_c ‖f − ∑_{j=0}^n c_j φ_j‖²_µ + 2 ‖f‖²_µ N^{−r} ] )^{1/2}.    (12)

This argument has already been proposed in [20]; a rigorous argument still remains an open problem. Observing that the first term on the right-hand side is the dominant one (for N → ∞) and assuming n ≈ κN/log(N), we see that the heuristic bound (12) matches the asymptotic result derived in Proposition 3.1.

The purpose of this section is to reveal the relationship between the error and the cost (in flops). The cost of MCLS is analyzed in [20], and in Table 1 we report a cost and error comparison between MC and MCLS as given in Table 3.1 in [20]. Here, we highlight some cases for which MCLS outperforms MC in terms of accuracy or cost.
Remark 3.3.
Note that the cost of MCLS in Table 1 is reported to be C_f N + O(Nn). As already mentioned at the beginning of Section 2.3, this reflects the cost of MCLS when applying the CG algorithm to solve the least-squares problem (whenever κ(V) = O(1)). In the case that we combine MCLS with the REK algorithm and κ(V) = O(1), which happens with high probability whenever the weighted sampling strategy is used (see [4, Theorem 2.1]), the cost is also C_f N + O(Nn), as shown in [24, Lemma 9] and in the subsequent discussion. The following cost analysis therefore covers both options, CG and REK.

            Cost                 Error (asymptotic)
    MC      C_f N                (1/√N) min_{c∈R} ‖f − c‖_µ
    MCLS    C_f N + O(Nn)        (1/√N) min_{c∈R^{n+1}} ‖√w (f − ∑_{j=0}^n c_j φ_j)‖_µ

Table 1: Comparison between MC and MCLS. N is the number of sample points and C_f denotes the cost of evaluating f at a single point.

First, consider the situation of a limited budget of sample points N that cannot be increased further, where the goal is to approximate the integral I_µ in the best possible way. This is a typical task in financial institutions. For instance, in portfolio risk management, simulation can be extremely expensive because a large number of risk factors and positions contribute to the company's portfolio. In this case, even if MCLS is more expensive than MC (second column of Table 1), MCLS is preferable to MC as it yields a more accurate approximation (third column of Table 1).

Second, we show under mild conditions that MCLS also asymptotically becomes more accurate than MC at the same cost. This can be of practical relevance whenever the integral I_µ needs to be computed to a very high accuracy and one is able to spend a high computational cost. Let us fix some notation:

    e_n := min_c ‖√w (f − ∑_{j=0}^n c_j φ_j)‖_µ  for n ≥ 0,
    Cost_MC(N) := C_f N,
    Cost_MCLS(N₂, n) := C_f N₂ + C_M N₂ n  for some C_M > 0,
    error_MC(N) := e_0/√N,
    error_MCLS(N₂, n) := e_n/√N₂,

where the last two definitions reflect the asymptotic error behaviour for large N and N₂ (for a fixed n), depicted in Table 1. We are now in the position to present the result.
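For intuition, consider the illustrative values C_M = C_f and n = 99 (our choice for this example, not from the paper). Matching costs, C_f N = C_f N₂ + C_M N₂ n, forces N = (1 + n) N₂ = 100 N₂, so that

    error_MC / error_MCLS = (e_0/√N) / (e_n/√N₂) = (e_0/e_n) √(N₂/N) = (e_0/e_n)/10.

Hence, at equal cost, MCLS is more accurate precisely when the basis reduces the standard deviation by more than a factor 10 in this example; the assumption e_n = o(1/√n) in the proposition below guarantees that e_n √(1 + n) eventually falls below e_0.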
Proposition 3.4. Assume that e_n = o(1/√n). Then there exists ñ ∈ N such that for any fixed n > ñ,

    error_MCLS < error_MC  as  Cost_MCLS = Cost_MC → ∞.

Proof. We first determine the value of N = N(N₂, n) such that Cost_MCLS = Cost_MC:

    Cost_MCLS = Cost_MC  ⟺  N = N₂ (1 + (C_M/C_f) n).

Consider now the error ratio under the constraint
Cost_MCLS = Cost_MC, given by

    ER := error_MC / error_MCLS = (e_0/e_n) · (1 + (C_M/C_f) n)^{−1/2},

yielding

    ER > 1  ⟺  e_n √(1 + (C_M/C_f) n) < e_0.

The assumption e_n = o(1/√n) implies that there exists some ñ such that ER > 1 for all n > ñ. Now, fixing an arbitrary n > ñ and letting N₂, and consequently N, go to infinity yields the result.

Remark 3.5. Note that the quantity ER in the proof of Proposition 3.4 only reflects the error ratio asymptotically for N, N₂ → ∞. Therefore we restrict the statement of the result to the asymptotic case where Cost_MCLS = Cost_MC → ∞.

To show the practical implication of this asymptotic analysis, in Figures 2 and 3 we examine the convergence of MC and MCLS. We consider the problem of integrating smooth and non-smooth functions for several dimensions d, on the unit cube [0,1]^d and with respect to the Lebesgue measure. Even though the result of Proposition 3.4 holds for a fixed value of n, in practice the convergence rate can be improved by varying n together with N, as illustrated in [20], where such an adaptive strategy is denoted by MCLSA. For this reason, we show numerical results where we let the cost increase (represented on the x-axis) and for different choices of n (n fixed and n varying).

As expected, the numerical results reflect the analysis presented above. For all dimensions and chosen functions, we achieve an efficiency gain by an appropriate choice of n and N, asymptotically. Note that the erratic convergence with fixed n is a consequence of ill-conditioning, an effect described also in [20]. Namely, when the number of sample points N is not large enough, V tends to be ill-conditioned and the least-squares problem min_c ‖Vc − f‖_2 requires many CG iterations, resulting in a higher cost than with a larger N. Therefore, the mapping N ↦ Cost(N) is not necessarily monotonically increasing in N. We observe that some of the curves in Figures 2 and 3, for instance for n fixed, are not functions of Cost(N), as they are not always single-valued. This shows that indeed the mapping N ↦ Cost(N) is not always monotone.

Figure 2: Cost vs convergence plots for MC and MCLS with varying n: n = 50, n = √N and n = N/log N, for d = 1. Cost is computed as 2N(n+1)k, the flop count of the CG iteration, where k is the number of CG steps required. Left: non-smooth function f(x) = |x − 1/2|. Right: analytic function f(x) = sin(30x).

Figure 3: Same as in Figure 2, but with d = 5. The fixed value n = 251 comes from n = (d+k choose k) − 1 with k = 5. Left: non-smooth function f(x) = ∑_{i=1}^d exp(−|x_i − 1/2|). Right: analytic function f(x) = sin(∑_{i=1}^d x_i).

(These figures differ from those in [20] in that the x-axis is the cost rather than the number of sample points N.)

The option pricing problem is one of the main tasks in financial mathematics and can be summarized as follows. First, we fix a filtered probability space (Ω, F, (F_t)_{0≤t≤T}, Q), where Q denotes a risk neutral pricing measure. In this framework a stochastic process (X_t)_{0≤t≤T} defined on a time horizon [0,T] for T > 0, with state space E ⊆ R^d, is used to model the price of the financial assets. Then, the price at time t = 0 of a European option with payoff function f: E → R and maturing at time T is given by

    e^{−rT} E[f(X_T)] = e^{−rT} ∫_E f(x) dµ(x),    (13)

where r is a risk-free interest rate and µ denotes the distribution of X_T, whose support is assumed to be E, and f ∈ L²(µ).
In this section we explain how to apply MCLS to compute European option prices. When applying MCLS for computing (13) we observe two potential issues. First, the distribution µ often is not known explicitly. Therefore, we cannot directly perform the sampling part, namely the first step of MCLS as described in Algorithm 1. Second, it is crucial that the basis functions {φ_j}_{j=0}^n are easily integrable with respect to µ. Therefore we need to find an appropriate selection of the basis functions.

Concerning the sampling part, if µ is explicitly known, as for example in the Black-Scholes framework (see Section 4.2.2 and Section 4.2.3 for two examples), we can just generate sample points according to µ. If µ is not explicitly known, typically the process (X_t)_{0≤t≤T} can still be expressed as the solution of a stochastic differential equation (SDE). In this case, we propose to simulate N paths of X_t by discretizing its governing SDE and to collect the realizations of X_T. More details follow below and an example can be found in Section 4.2.1.

To obtain an appropriate choice of the basis functions {φ_j}_{j=0}^n we need E[φ_j(X_T)] to be easy to evaluate. To do so we exploit the structure of the underlying asset model. If X_t belongs to the wide class of affine processes, which is true for a large set of popular models including Black-Scholes, Heston and Lévy models, then the characteristic function of X_t can be easily computed, as explained e.g. in [5]. In that case, the natural choice of basis functions is exponentials. If X_t is a polynomial diffusion [6] (as in our numerical examples in Section 4.2.1) or a polynomial jump-diffusion [7], then its conditional moments are given in closed form. Therefore, polynomials are an excellent choice of basis functions.

To summarize, the main steps of MCLS for option pricing are as follows (if µ is not known explicitly); they are stated in Algorithm 3 and spelled out after it.

Algorithm 3 Generalized MCLS for European option pricing
Input:
Payoff function f, basis functions {φ_j}_{j=0}^n with φ_0 ≡ 1, integer N (> n), governing SDE of X_T.
Output:
Approximate option price Î_{µ,N} ≈ ∫_E f(x) dµ(x)
1: Simulate N paths of the process X_t from t = 0 to t = T, and collect the realizations of X_T in x_i, i = 1, …, N.
2: Evaluate f(x_i) and φ_j(x_i), for i = 1, …, N and j = 1, …, n.
3: Solve the least-squares problem (1) for c = (c_0, c_1, …, c_n)^T.
4: Compute Î_{µ,N} = ∑_{j=0}^n ĉ_j ∫_E φ_j(x) dµ(x).

In more detail, the steps are the following:

1. Simulate N paths of the process X_t, from t = 0 to t = T (time to maturity), by discretization of the governing SDE.
2. Let x_i for i = 1, …, N be the realizations of X_T for each simulated path. Then, evaluate f(x_i) and φ_j(x_i), for i = 1, …, N and j = 1, …, n.
3. Solve the least-squares problem (1) to obtain the approximation of f. The solver can be chosen according to the scheme represented in Figure 1.
4. Finally, the option price is approximated by (we omit the discounting factor)

    E[f(X_T)] = ∫_E f(x) dµ(x) ≈ Î_{µ,N} := ∑_{j=0}^n ĉ_j ∫_E φ_j(x) dµ(x) = ∑_{j=0}^n ĉ_j E[φ_j(X_T)].

Note that we selected the basis functions in such a way that the quantities E[φ_j(X_T)] can be easily evaluated. In particular, no Monte Carlo simulation is required.

In the case that µ is explicitly known, the error resulting from MCLS is analysed in Proposition 3.1 and Proposition 3.2. In case we discretize the governing SDE of X_t, we introduce a second source of error, which we address in the following. Assume that X_t is the solution of an SDE of the form

    dX_t = b(X_t) dt + Σ(X_t) dW_t,  X_0 = x_0,    (14)

where W_t denotes a d-dimensional Brownian motion, b: R^d → R^d, Σ: R^d → R^{d×d}, and x_0 ∈ R^d. An approximation of the solution X_t of (14) can be computed via a uniform Euler-Maruyama scheme, defined in the following.
Definition 4.1. Consider an equidistant partition of [0,T] into N_s intervals, i.e. ∆t = T/N_s and t_i = i∆t for i = 0, …, N_s, together with ∆W̃_i = W_{t_{i+1}} − W_{t_i} for i = 0, …, N_s − 1. Then, the Euler-Maruyama discretization scheme of (14) is given by

    X̄_{i+1} = X̄_i + b(X̄_i)∆t + Σ(X̄_i)∆W̃_i,  for i = 0, …, N_s − 1,  X̄_0 = x_0,    (15)

and the Euler-Maruyama approximation of X_T is given by X̄_{N_s}.

We then simulate N independent copies of X̄_{N_s} (first step of Algorithm 3) and apply MCLS to approximate (13). The error then naturally splits into two components as

    |E[f(X_T)] − Ī_{µ,N}| ≤ |E[f(X_T)] − E[f(X̄_{N_s})]| + |E[f(X̄_{N_s})] − Ī_{µ,N}|.

The second summand can then be approximated as in (6). We collect the result in the following proposition. Note that for simplicity we assume a vanishing interest rate.
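A minimal MATLAB sketch of the scheme (15), as used in step 1 of Algorithm 3, might look as follows; the drift b and diffusion Σ are function handles, and all names are illustrative.

    function XT = euler_maruyama(b, Sigma, x0, T, Ns, N)
    % Euler-Maruyama scheme (15): returns N independent approximations
    % of X_T (one per column) for the SDE (14).
    d  = numel(x0);
    dt = T / Ns;
    XT = repmat(x0(:), 1, N);
    for i = 1:Ns
        dW = sqrt(dt) * randn(d, N);     % Brownian increments
        for m = 1:N                      % one Euler step per path
            XT(:,m) = XT(:,m) + b(XT(:,m))*dt + Sigma(XT(:,m))*dW(:,m);
        end
    end
    end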
Proposition 4.2.
Let Ī_{µ,N} be the MCLS estimator obtained by sampling according to the Euler-Maruyama scheme as in Definition 4.1. Then, the MCLS error asymptotically (for n fixed and N → ∞) satisfies

    |E[f(X_T)] − Ī_{µ,N}| ≲ |E[f(X_T)] − E[f(X̄_{N_s})]| + min_c ‖f − ∑_{j=0}^n c_j φ_j‖_µ̄ / √N,    (16)

where µ̄ is the distribution of X̄_{N_s}.

The first term on the right-hand side of (16) is usually referred to as the time-discretization error, while the second summand denotes the so-called statistical error. The time-discretization error and, more generally, the Euler-Maruyama scheme together with its properties are well studied in the literature, see e.g. [17]. Depending on the regularity properties of f, b and Σ, one can conclude, for example, that the time-discretization error is bounded from above by C|∆t|, for a constant C > 0.
In this case, we say that the Euler-Maruyama scheme converges weakly with order 1. (For example, if b and Σ are four times continuously differentiable and f is continuous and bounded, then the scheme converges weakly with order 1; see [17] for details.) Finally, note that the statistical error can be further approximated as in (7) using

    min_c ‖f − ∑_{j=0}^n c_j φ_j‖²_µ̄ ≈ (1/(N − n − 1)) ∑_{i=1}^N (f(x_i) − p(x_i))² = (1/(N − n − 1)) ‖Vĉ − f‖²_2,

where the x_i's are sampled according to µ̄.

Next, we apply MCLS to numerically compute European option prices (13) for several types of payoff functions f and in different models. In particular, the considered models belong to the class of polynomial diffusion models, introduced in [6]. All algorithms have been implemented in Matlab version 2017a and run on a standard laptop (Intel Core i7, 2 cores, 256kB/4MB L2/L3 cache).

In all of our numerical experiments the solver for the numerical solution of the least-squares problem (1) is chosen according to the scheme in Figure 1. The choice of the examples leads us to test all three choices in the scheme. For the univariate pricing examples in the Heston and Jacobi models in Section 4.2.1, the CG algorithm is appropriate. In Section 4.2.2, for a basket option price of medium dimensionality in the multivariate Black-Scholes model, a QR based method is employed, because the condition number of V was usually larger than O(1). In both of these cases we sample realizations of the underlying random variable X_T, where in the univariate case we solve an SDE. Finally, we consider pricing a rainbow option in a high dimensional multivariate Black-Scholes model in Section 4.2.3, where the randomized extended Kaczmarz algorithm combined with the weighted sampling strategy yields a good performance.

4.2.1 Call option in stochastic volatility models

We consider the Heston model as in [15]. The log asset price X_t (meaning that the asset price S_t is of the form S_t = e^{X_t}) and the squared volatility process V_t are defined via the SDE

    dV_t = κ(θ − V_t) dt + σ √V_t dW¹_t,
    dX_t = (r − V_t/2) dt + ρ √V_t dW¹_t + √V_t √(1 − ρ²) dW²_t,

where W¹_t and W²_t are independent standard Brownian motions and the model parameters satisfy the conditions κ ≥ 0, θ ≥ 0, σ > 0, r ≥ 0, ρ ∈ [−1, 1]. The state space is E = R₊ × R.

The log-asset process in the Heston model is a polynomial diffusion and its moments can be computed according to the moment formula introduced in [6, Theorem 3.1]. In this case the formula is given by

    E[p(X_T, V_T) | F_t] = H_n(X_t, V_t) e^{G_n (T−t)} p⃗,    (17)

where p is an arbitrary multivariate polynomial belonging to the space Pol_n(R²) of bivariate polynomials of total degree at most n, H_n is a basis vector of Pol_n(R²) and p⃗ is the coordinate vector of p with respect to H_n. Finally, G_n is the matrix representation of the action of the generator of (V_t, X_t) restricted to the space Pol_n(R²). Note that the matrix G_n can be constructed as explained in [18], with respect to the monomial basis.

In the following we apply MCLS in the Heston model in order to price single-asset European call options with payoff function given by

    f(x) = (e^x − e^k)₊,

for a log-strike value k. We compare MC and MCLS to the Fourier pricing method introduced in [15].

In this experiment we use an ONB (with respect to the corresponding L²_µ space, where µ is the distribution of X_T) of polynomials as basis functions φ_j. Conveniently, the ONB can be obtained by applying the Gram-Schmidt orthogonalization process to the monomial basis.
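Because inner products of monomials are again moments of X_T, this orthonormalization only requires the moments. A minimal MATLAB sketch (the moment vector mu, with mu(k+1) = E[X_T^k] for k = 0, …, 2n, is a placeholder input assumed to be computed via (17); the function name is illustrative):

    function phi = onb_from_moments(mu, n)
    % Orthonormalize the monomials 1, x, ..., x^n in L^2_mu given the
    % moments mu(k+1) = E[X_T^k], k = 0, ..., 2n.
    G = mu((0:n)' + (0:n) + 1);   % Gram matrix: G(i+1,j+1) = E[X_T^(i+j)]
    A = inv(chol(G)');            % lower triangular, so that A*G*A' = I
    phi = @(x) (A * (x(:)'.^((0:n)')))';  % rows: points, cols: phi_0..phi_n
    end

The Cholesky factorization here plays the role of Gram-Schmidt: the rows of A hold the coefficients of the orthonormal polynomials in the monomial basis.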
Note that, even if the distribution µ is not known explicitly, we can still apply the Gram-Schmidt orthogonalization procedure, since the corresponding scalar product and the induced norm can be computed via the moment formula (17).

Since the distribution of X_T is not known a priori, we apply the Euler-Maruyama scheme as defined in (15) and obtain

    V_0 = v_0,  X_0 = x_0,
    V_{t_i} = V_{t_{i−1}} + κ(θ − V_{t_{i−1}})∆t + σ √V_{t_{i−1}} √∆t Z¹_i,
    X_{t_i} = X_{t_{i−1}} + (r − V_{t_{i−1}}/2)∆t + ρ √V_{t_{i−1}} √∆t Z¹_i + √V_{t_{i−1}} √(1 − ρ²) √∆t Z²_i,    (18)

for all i = 1, …, N_s, where Z¹_i and Z²_i are independent standard normally distributed random variables. For the following numerical experiments we consider a fixed set of model parameters (σ, v_0, κ, θ, r positive, ρ negative, and x_0 = 0). As long as the arguments of the square roots in (18) stay nonnegative, the Euler-Maruyama scheme is well-defined; in our numerical experiments this was the case. To guarantee well-definedness, the scheme can be modified by taking the absolute value or the positive part of the arguments of the square roots. Such a modification is discussed, e.g., in [16]. The same remark holds for the forthcoming numerical examples.

First, we apply MCLS to an in-the-money example with log-strike k = −0.1, and we use N_s = 100 time steps for the discretization of the SDE. We use an ONB consisting of polynomials of maximal degree 0 (standard MC), 1, and higher degrees up to 5.

Figure 4: MCLS for ITM call option in Heston model for different polynomial degrees. Left: absolute price error. Right: width of 95% confidence interval.

Second, we apply MCLS again, this time to an at-the-money call option with log-strike k = 0 and to an out-of-the-money call option with log-strike k = 0.1, keeping the remaining parameters unchanged. The results are shown in Figure 5 and in Figure 6, respectively.

Figure 5: MCLS for ATM call option in Heston model for different polynomial degrees. Left: absolute price error. Right: width of 95% confidence interval.

Figure 6: MCLS for OTM call option in Heston model for different polynomial degrees. Left: absolute price error. Right: width of 95% confidence interval.
                k = −0.1        k = 0           k = 0.1
    N           MC     MCLS     MC     MCLS     MC     MCLS
    100         –      0.21     2.16   0.29     0.50   0.37
    215         4.35   0.09     0.47   0.15     1.56   0.49
    464         9.16   0.31     1.00   0.00     1.03   0.26
    1000        9.13   0.28     1.58   0.17     1.21   0.02
    2154        2.44   0.22     0.82   0.16     0.59   0.19
    4642        1.15   0.09     0.18   0.02     0.18   0.24
    10000       0.34   0.04     0.25   0.03     0.28   0.01

Table 2: Implied volatility errors (in %) for MC and MCLS with a basis of polynomials of maximal degree 5 in the Heston model, for different sizes N of the sample set.

In this setting, for all different choices of payoff parameters, we show in Table 2 the implied volatility absolute errors for the MC and MCLS prices computed with a basis of polynomials of maximal degree 5. The implied volatility error is measured against the implied volatility of the reference method. (For a given call option price C, the implied volatility is defined as the volatility parameter that renders the corresponding Black-Scholes price equal to C.)

Before commenting on the numerical results, we apply MCLS to a second stochastic volatility model, the Jacobi model as in [1]. Here, the log asset price X_t and the squared volatility process V_t are defined through the SDE

    dV_t = κ(θ − V_t) dt + σ √Q(V_t) dW¹_t,
    dX_t = (r − V_t/2) dt + ρ √Q(V_t) dW¹_t + √(V_t − ρ² Q(V_t)) dW²_t,

where

    Q(v) = (v − v_min)(v_max − v) / (√v_max − √v_min)²,

for some 0 ≤ v_min < v_max. Here, W¹_t and W²_t are independent standard Brownian motions and the model parameters satisfy the conditions κ ≥ 0, θ ∈ [v_min, v_max], σ > 0, r ≥ 0, ρ ∈ [−1, 1]. The state space is E = [v_min, v_max] × R. The matrix G_n in (17) can be constructed as explained in the original paper [1] (with respect to a Hermite polynomial basis). For the following numerical experiments we consider a fixed set of model parameters (σ, v_0, κ, θ, 0 < v_min < v_max, ρ negative, r, and x_0 = 0). We again consider single-asset European call options with log-strikes k ∈ {−0.1, 0, 0.1}. As reference pricing method we choose the polynomial expansion technique introduced in [1], where we truncate the polynomial expansion of the price after 50 terms.

We simulate the whole path of X_t from 0 to T in order to get the sample points x_i, i = 1, …, N. The discretization scheme of the SDE is given by

    V_0 = v_0,  X_0 = x_0,
    V_{t_i} = V_{t_{i−1}} + κ(θ − V_{t_{i−1}})∆t + σ √Q(V_{t_{i−1}}) √∆t Z¹_i,
    X_{t_i} = X_{t_{i−1}} + (r − V_{t_{i−1}}/2)∆t + ρ √Q(V_{t_{i−1}}) √∆t Z¹_i + √(V_{t_{i−1}} − ρ² Q(V_{t_{i−1}})) √∆t Z²_i,

for all i = 1, …, N_s, where Z¹_i and Z²_i are independent standard normally distributed random variables and the rest of the parameters are as specified in the example for the Heston model. We use again an ONB consisting of polynomials of maximal degree 0 (standard MC), 1, and higher degrees up to 5.

Figure 7: MCLS for ITM call option in Jacobi model for different polynomial degrees. Left: absolute price error. Right: width of 95% confidence interval.

We can observe that MCLS strongly outperforms the standard MC in terms of price errors, confidence interval width and implied volatility errors, for every type of moneyness, in both chosen stochastic volatility models.

A last remark concerns the condition number of the Vandermonde matrix V. Thanks to the choice of the ONB, in both models its condition number is at most of order 10. Therefore, the CG algorithm has been selected.
As another consequence of the low condition number, we did not implement weighted sampling here.

Figure 8: MCLS for ATM call option in Jacobi model for different polynomial degrees. Left: absolute price error. Right: width of 95% confidence interval.

Figure 9: MCLS for OTM call option in Jacobi model for different polynomial degrees. Left: absolute price error. Right: width of 95% confidence interval.
In this section we address multi-dimensional option pricing problems of medium size, meaning with number of assets d ≤ 10 and moderate sample sizes N. The asset prices S¹_t, …, S^d_t follow a d-dimensional Black-Scholes model, i.e.

    dS^i_t = r S^i_t dt + σ_i S^i_t dW^i_t,  i = 1, …, d,    (19)

for some volatility values σ_i, i = 1, …, d, a risk-free interest rate r and d correlated Brownian motions W^i_t with correlation parameters ρ_ij ∈ [−1, 1] for i ≠ j. The state space is E = R^d₊ and the explicit solution of (19) is given by

    S^i_t = S^i_0 exp((r − σ_i²/2) t + σ_i W^i_t).

The process (S¹_t, …, S^d_t) is a polynomial diffusion and the moment formula is given by

    E[p(S¹_T, …, S^d_T) | F_t] = H_n(S¹_t, …, S^d_t) e^{G_n(T−t)} p⃗,

where the involved quantities are defined along the lines following (17). The matrix G_n can be computed with respect to the monomial basis as in the following lemma, and turns out to be diagonal, making step 4 of Algorithm 3 even more efficient.
                k = −0.1        k = 0           k = 0.1
    N           MC     MCLS     MC     MCLS     MC     MCLS
    100         8.75   0.67     2.75   0.40     0.72   0.20
    215         8.67   0.46     1.92   0.33     0.96   0.14
    464         –      0.30     1.23   0.07     1.77   0.10
    1000        3.27   0.39     0.32   0.13     1.16   0.24
    2154        2.55   0.11     0.03   0.13     0.07   0.14
    4642        3.26   0.03     0.68   0.01     0.15   0.08
    10000       0.47   0.05     0.35   0.02     0.32   0.02

Table 3: Implied volatility errors (in %) for MC and MCLS with a basis of polynomials of maximal degree 5 in the Jacobi model, for different sizes N of the sample set.

Lemma 4.3.
Let H_n be the monomial basis of Pol_n(R^d₊). Let π: E → {1, …, (n+d choose n)} be an enumeration of the set of tuples E = {k ∈ N^d_0 : |k| ≤ n}. Then, the matrix representation G_n of the infinitesimal generator of the process (S¹_t, …, S^d_t) with respect to H_n and restricted to Pol_n(R^d₊) is diagonal, with diagonal entries

    G_{π(k),π(k)} = (1/2) ∑_{i=1}^d ∑_{j=1}^d σ_i σ_j ρ_ij (k_i k_j 1_{i≠j} + k_i(k_i − 1) 1_{i=j}) + r ∑_{i=1}^d k_i.

Proof.
The infinitesimal generator G of (S¹_t, …, S^d_t) is given by

    G f = (1/2) ∑_{i=1}^d ∑_{j=1}^d σ_i σ_j ρ_ij s_i s_j ∂²_{s_i s_j} f + r ∑_{i=1}^d s_i ∂_{s_i} f,

which implies that for any monomial of the form s_1^{k_1} ⋯ s_d^{k_d} one has

    G s_1^{k_1} ⋯ s_d^{k_d} = s_1^{k_1} ⋯ s_d^{k_d} ( (1/2) ∑_{i=1}^d ∑_{j=1}^d σ_i σ_j ρ_ij (k_i k_j 1_{i≠j} + k_i(k_i − 1) 1_{i=j}) + r ∑_{i=1}^d k_i ).

It follows that G_n is diagonal as stated above.

For the following numerical experiments we consider basket options with payoff function

    f(s_1, …, s_d) = (∑_{i=1}^d w_i s_i − K)₊    (20)

for different moneyness, with in-, at- and out-of-the-money strikes K (the at-the-money strike being K = 1), T = 1 and weights w_i = 1/d for all i. Model parameters are chosen to be S^i_0 = 1 for all i, volatilities σ_i drawn uniformly at random for all i, a random correlation matrix {ρ_ij}_{i,j=1}^d = R_d of size d × d, and a fixed interest rate r, where we choose d = 5 and d = 10. We compare MCLS to a reference price computed via a standard Monte Carlo algorithm with a much larger number of simulations. We plot again the absolute price errors and the width of the 95% confidence intervals (computed as in (7) and (8)) for different chosen polynomial degrees (maximally 1 and maximally 3). To be more precise, we used the monomial basis as functions {φ_j}_{j=0}^n. Note that the distribution of the price process (S¹_t, …, S^d_t) is known to be the geometric Brownian distribution, so that there is no need to simulate the whole path but only the process at final time T.

The results are shown in Figures 10, 11 and 12. In the legend the represented number indicates again the maximal total degree of the basis monomials. For instance, if d = 2 and the maximal total degree is deg = 3, this means that the basis functions φ_j are chosen to be {1, s_1, s_2, s_1², s_1 s_2, s_2², s_1³, s_1² s_2, s_1 s_2², s_2³}.

Figure 10: MCLS for ITM basket option in Black-Scholes model for different dimensions and polynomial degrees. Left: absolute price errors with respect to a reference price computed with a large number of simulations. Right: width of 95% confidence interval.

We observe that also in these multidimensional examples MCLS strongly outperforms the standard MC in terms of absolute price errors and width of the confidence intervals. Due to the use of the multivariate monomials as basis functions, the condition number of V is relatively high, reaching values well above the threshold of Figure 1. However, the QR based algorithm chosen according to the selection scheme in Figure 1 for the numerical solution of the least-squares problem (1) still yields accurate results. The Vandermonde matrix V is here still storable.

Figure 11: MCLS for ATM basket option in Black-Scholes model for different dimensions and polynomial degrees. Left: absolute price errors with respect to a reference price computed with a large number of simulations. Right: width of 95% confidence interval.

In higher dimensions, by contrast, the matrix V can no longer be stored, and neither CG nor QR based solver can be used.
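Returning to Lemma 4.3: since G_n is diagonal, the moment formula gives E[∏_i (S^i_T)^{k_i}] = (∏_i (S^i_0)^{k_i}) e^{G_{π(k),π(k)} T}, so step 4 of Algorithm 3 reduces to a one-line computation per basis monomial. A MATLAB sketch (function name illustrative):

    function m = bs_monomial_moment(k, S0, sigma, rho, r, T)
    % Closed-form moment E[prod_i (S_T^i)^k_i] in the model (19),
    % via the diagonal entry of G_n from Lemma 4.3.
    d = numel(k);
    g = r * sum(k);
    for i = 1:d
        for j = 1:d
            if i == j
                g = g + 0.5 * sigma(i)^2 * k(i)*(k(i) - 1);
            else
                g = g + 0.5 * sigma(i)*sigma(j)*rho(i,j) * k(i)*k(j);
            end
        end
    end
    m = prod(S0(:).^k(:)) * exp(g * T);
    end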
In the multivariate Black-Scholes model we now consider rainbow options with payoff function

    f(s_1, …, s_d) = (K − min(s_1, …, s_d))₊,

so that we apply MCLS in order to compute the quantity

    e^{−rT} E[(K − min(S¹_T, …, S^d_T))₊] = e^{−rT} ∫_{R^d₊} (K − min(s_1, …, s_d))₊ dµ(s_1, …, s_d),

where µ is the distribution of (S¹_T, …, S^d_T). In contrast to the payoff (20), which presents one type of irregularity deriving from taking the positive part (·)₊, this payoff function presents two types of irregularities: one again due to (·)₊, and a second one deriving from the min(·) function. This example is therefore more challenging.

As in [20], we rewrite the option price with respect to the Lebesgue measure:

    e^{−rT} ∫_{[0,1]^d} ( K − min_{i=1,…,d} ( S^i_0 exp((r − σ_i²/2) T + σ_i √T (L Φ^{−1}(x))_i) ) )₊ dx,

where L is the Cholesky factor of the correlation matrix and Φ^{−1} is the inverse map of the cumulative distribution function of the multivariate standard normal distribution, applied componentwise.

Figure 12: MCLS for OTM basket option in Black-Scholes model for different dimensions and polynomial degrees. Left: absolute price errors with respect to a reference price computed with a large number of simulations. Right: width of 95% confidence interval.

The model and payoff parameters are chosen to be S^i_0 = 1, K = 1, equal volatilities σ_i for all i, Σ = I_d, T = 1 and a fixed interest rate r, so that we consider a basket option of uncorrelated assets.

We apply MCLS for d ∈ {5, 10, 20} using different total degrees for the approximating polynomial space and compare it with a reference price computed using the standard MC algorithm with a much larger number of simulations. Also, we consider numbers of simulations that go up to 10⁷. We choose a basis of tensorized Legendre polynomials, which form an ONB with respect to the Lebesgue measure on the unit cube [0,1]^d, and we perform the sampling step of MCLS (step 1) according to the optimal distribution as introduced in [4] and reviewed in Section 2.2. The solver for the least-squares problem is chosen according to the scheme shown in Figure 1, where we assume that the Vandermonde matrix V can be stored whenever the number of entries is less than 10⁹. This implies, for example, that for d = 5 with polynomial degree 5 and 10⁷ simulations V cannot be stored: indeed, for d = 5, deg = 5 and N = 10⁷ the matrix V has 252 columns and hence 2.52 · 10⁹ entries. For all of these cases, we therefore solve the least-squares problem by applying the randomized extended Kaczmarz algorithm.

In Figure 13 we plot the obtained price absolute errors and the width of the 95% confidence intervals for all considered problems.
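For reference, the integrand on the unit cube follows directly from the Lebesgue rewrite above; a MATLAB sketch for a single point x ∈ [0,1]^d (K, S0, sigma, r, T and the Cholesky factor L are assumed workspace variables; erfinv is used so that no toolbox function is needed, since Φ^{−1}(u) = √2 erfinv(2u − 1)):

    % Rainbow payoff as a function of x in [0,1]^d (x is a d x 1 point;
    % S0 and sigma are d x 1 vectors, L is d x d).
    payoff = @(x) max(K - min(S0 .* exp((r - sigma.^2/2)*T ...
                  + sqrt(T) * (sigma .* (L * (sqrt(2)*erfinv(2*x - 1)))))), 0);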
We notice that MCLS again outperforms MC in terms of confidence interval width and price errors, as observed for medium dimensions. The choice of the weighted sampling strategy combined with the ONB allowed us to obtain a well conditioned matrix V, in accordance with the theory presented in the previous sections. These examples and the obtained numerical results therefore show that our extension of MCLS is effective and allows us to efficiently price single and multi-asset European options.

In the next section we test our extended MCLS in a slightly different setting, where the integrand is smooth.

Figure 13: MCLS for rainbow options in Black-Scholes model for different dimensions and polynomial degrees. Left: absolute price errors with respect to a reference price computed with a large number of simulations. Right: width of the 95% confidence interval.

In this section we apply the extended MCLS algorithm to compute the definite integral

    ∫_{[0,1]^d} sin(∑_{j=1}^d x_j) dx.    (21)

This is a classical integration problem [8], which was also considered in [20], where MCLS is applied to compute (21) for dimension at most d = 6 and with a bounded number of simulations. Our goal is to show that, thanks to the use of REK, we can increase the dimension d and the number of simulations N.

Here we apply MCLS for d = 10 and d = 30 using a basis of tensorized Legendre polynomials of total degree 5 and 4, respectively. We compare it to the reference result, which for d ≡ 2 (mod 4) is explicitly given by

    ∫_{[0,1]^d} sin(∑_{j=1}^d x_j) dx = ∑_{j=0}^d (−1)^{j+1} (d choose j) sin(j).

Also, we consider sample sizes that go up to 10⁷. We perform the sampling step of MCLS (step 1) again according to the optimal distribution. The choice of the solver for the least-squares problem is again made according to the scheme in Figure 1, and we assume that the Vandermonde matrix V can be stored whenever the number of entries is less than 10⁹.

The results are shown in Figure 14. Again, we have plotted the obtained absolute error computed with respect to the reference result (left) and the width of the 95% confidence interval (right). First, we note that MCLS performs much better than the standard MC, as in the previous examples. Furthermore, the results are considerably better than the ones obtained in the previous section, see Figure 13. This is due to the fact that the integrand is now smooth (it is analytic/entire), while in the multi-asset option example it was only continuous. Indeed, the function approximation error min_{c∈R^{n+1}} ‖√w(f − ∑_{j=0}^n c_j φ_j)‖_µ is expected to be much smaller in this case, since polynomials provide a more suitable approximation space for smooth functions than for irregular functions. According to Proposition 3.1, this results in a stronger variance reduction and hence a better approximation of the integral.
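The closed-form reference value is cheap to evaluate, e.g. in MATLAB (note that d = 10 and d = 30 both satisfy d ≡ 2 (mod 4)):

    % Reference value of (21) for d = 2 (mod 4), here d = 10.
    d = 10; j = 0:d;
    Iref = sum((-1).^(j+1) .* arrayfun(@(i) nchoosek(d,i), j) .* sin(j));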
We have presented a numerical technique to price single and multi-asset European options and, more generally, to compute expectations of functions of multivariate random variables. The methodology consists of extending MCLS to arbitrary probability measures and combining it with weighted sampling strategies and the randomized extended Kaczmarz algorithm. The core concepts and algorithms have been presented in Section 2. In Section 3 we have proposed a new cost analysis, showing that MCLS asymptotically outperforms the standard Monte Carlo method as the cost goes to infinity, provided that the integrand satisfies certain regularity conditions.

In Section 4 we have applied the new method to price European options. First, we have adapted the generalization to compute multi-asset option prices, where we have proposed to modify the sampling step of MCLS by discretizing the governing SDE of the underlying price process whenever needed. This modification of the first step introduces a new source of error, which has been analyzed in Proposition 4.2. In Sections 4.2.1, 4.2.2 and 4.2.3 we have applied the algorithm to price multi-asset European options in the Heston model, in the Jacobi model and in the multidimensional Black-Scholes model, where we have exploited the fact that these models belong to the class of polynomial diffusions, so that the required moments can be computed in closed form. For these examples, MCLS usually provides considerably higher accuracy than standard MC for the same sample sizes. This typically holds across different sample sizes, both when accuracy is measured in terms of implied volatility, see Tables 2 and 3, and in terms of option price errors and confidence interval widths, see for instance Figures 4-9 and Figure 13. As expected, enlarging the number of basis functions $n$ for a given sample size $N$ leads to more accurate results. Moreover, in Section 4.2.3 employing REK allowed us to solve high-dimensional problems with high accuracy.
Figure 14: MCLS for integrating the function $\sin(\sum_{j=1}^d x_j)$ for $d \in \{10, 30\}$ and different polynomial degrees. Left: absolute errors with respect to the closed-form reference value. Right: width of the 95% confidence interval.

For instance, our experiments for basket options on 20 assets also show that enlarging the number of basis functions yields higher accuracy in terms of confidence interval width. Finally, in Section 5 we considered the approximation of a multidimensional integral of a smooth function. Storage requirements limit the feasibility of basic MCLS in high dimensions: in [20], only cases with maximal dimension $d = 6$ and moderate sample sizes could be treated. Thanks to the application of REK we were able to treat dimension $d = 30$ with substantially larger sample sizes $N$. This illustrates the effectiveness of our extended approach.

To extend the approach to even higher dimensions, further computational bottlenecks have to be addressed. Solving the storage issue in the least-squares problem with REK leaves us with a high number of function calls: we no longer need to store the full Vandermonde matrix, but its rows and columns are required many times during the iteration, which leads to a high computational cost. One can reduce this cost by 1) reducing the number of function calls and 2) making the function calls more efficient. To achieve 1), one can for instance store the rows and columns of the Vandermonde matrix which are sampled with the highest probability. To achieve 2), one can exploit further structure of the integrand, for instance using a low-rank approximation [11] or functional analogues of tensor decompositions [10].
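For reference, the following is a minimal sketch of the randomized extended Kaczmarz iteration of [24] for $\min_x \|Ax - b\|_2$, written by us with $A$ stored densely; in the large-scale regime discussed above, the sampled rows and columns of $\widetilde V$ would instead be generated on the fly or cached. The function name `rek` and all parameter values are our assumptions, not the implementation used for the experiments.

```python
import numpy as np

def rek(A, b, iters=200000, seed=0):
    """Randomized extended Kaczmarz for min ||Ax - b||_2, following [24].

    z tracks the component of b outside range(A), and the Kaczmarz row
    sweep is applied to the asymptotically consistent system A x = b - z.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_sq = (A ** 2).sum(axis=1)          # squared row norms
    col_sq = (A ** 2).sum(axis=0)          # squared column norms
    frob = row_sq.sum()                    # squared Frobenius norm
    p_row, p_col = row_sq / frob, col_sq / frob
    x, z = np.zeros(n), b.copy()
    for _ in range(iters):
        j = rng.choice(n, p=p_col)         # column step: project z toward range(A)^perp
        z -= (A[:, j] @ z / col_sq[j]) * A[:, j]
        i = rng.choice(m, p=p_row)         # row step: Kaczmarz update on A x = b - z
        x += (b[i] - z[i] - A[i] @ x) / row_sq[i] * A[i]
    return x

# tiny consistency check against a dense least-squares solver
rng = np.random.default_rng(1)
A = rng.standard_normal((500, 20))
b = rng.standard_normal(500)
x_rek = rek(A, b, iters=100000)            # iterations chosen generously for this toy size
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.linalg.norm(x_rek - x_ls))
```

The column step is what allows REK to converge to the least-squares solution of an inconsistent system without ever forming the normal equations, at the price of repeatedly touching individual rows and columns, which is exactly the function-call cost discussed above.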
Appendix

Here, we present the proof of Proposition 3.1.

Proof.
Note that the approximate function $\sum_{j=0}^n \hat c_j \phi_j$, and thus $\hat I_{\mu,N}$, only depends on the span of the basis functions $\{\phi_j\}_{j=0}^n$ and not on the specific choice of the basis. Therefore, without loss of generality we can assume that the chosen basis functions $\{\phi_j\}_{j=0}^n$ form an orthonormal basis (ONB) in $L^2_\mu$, i.e. $\int_E \phi_i(x)\phi_j(x)\,d\mu(x) = \delta_{ij}$.

We decompose the function $f$ into a sum of orthogonal terms
$$ f = \sum_{j=0}^n c_j^* \phi_j + g, \qquad (22) $$
where $g$ satisfies $\int_E g(x)\phi_j(x)\,d\mu(x) = 0$ for all $j = 0, \dots, n$. Note that $\|g\|_\mu = \min_{c \in \mathbb{R}^{n+1}} \|f - \sum_{j=0}^n c_j \phi_j\|_\mu$. Assume now that we sample according to $d\mu/w$ and obtain the points $\{\tilde x_i\}_{i=1}^N$. Then the vector of sample values in the weighted least-squares problem can be decomposed as $\tilde{\mathbf f} = \widetilde V c^* + \tilde{\mathbf g}$, where $\widetilde V$ and $\tilde{\mathbf f}$ are defined as in (4) and
$$ \tilde{\mathbf g} = \big[ \sqrt{w(\tilde x_1)}\, g(\tilde x_1), \dots, \sqrt{w(\tilde x_N)}\, g(\tilde x_N) \big]^T. $$
Let $\hat c$ again be the least-squares solution to (4); then
$$ \hat c = \operatorname*{argmin}_{c \in \mathbb{R}^{n+1}} \big\| \widetilde V c - (\widetilde V c^* + \tilde{\mathbf g}) \big\|_2 = (\widetilde V^T \widetilde V)^{-1} \widetilde V^T (\widetilde V c^* + \tilde{\mathbf g}) = c^* + (\widetilde V^T \widetilde V)^{-1} \widetilde V^T \tilde{\mathbf g}, $$
where the second summand is exactly $c_g := \operatorname*{argmin}_{c \in \mathbb{R}^{n+1}} \|\widetilde V c - \tilde{\mathbf g}\|_2$. It thus follows that the integration error is
$$ \hat I_{\mu,N} - I_\mu = c_{g,1} = [1, 0, \dots, 0]\, (\widetilde V^T \widetilde V)^{-1} \widetilde V^T \tilde{\mathbf g}. $$
Now, by the strong law of large numbers we have
$$ \tfrac{1}{N} (\widetilde V^T \widetilde V)_{i+1,j+1} = \tfrac{1}{N} \sum_{l=1}^N w(\tilde x_l)\, \phi_i(\tilde x_l)\, \phi_j(\tilde x_l) \;\to\; \int_E w(\tilde x)\, \phi_i(\tilde x)\, \phi_j(\tilde x)\, \frac{d\mu(\tilde x)}{w(\tilde x)} = \int_E \phi_i(\tilde x)\, \phi_j(\tilde x)\, d\mu(\tilde x) = \delta_{ij} $$
almost surely, and hence in probability, as $N \to \infty$, by the orthonormality of $\{\phi_j\}_{j=0}^n$. Therefore $\tfrac{1}{N} \widetilde V^T \widetilde V \xrightarrow{p} \mathrm{Id}_{n+1}$ as $N \to \infty$, where $\mathrm{Id}_{n+1}$ denotes the identity matrix in $\mathbb{R}^{(n+1)\times(n+1)}$. Moreover,
$$ \sqrt{N} \Big( \tfrac{1}{N} \sum_{i=1}^N w(\tilde x_i)\, g(\tilde x_i) \Big) \xrightarrow{d} Z \sim \mathcal{N}\big(0, \|\sqrt{w}\, g\|_\mu^2\big) \quad \text{for } N \to \infty $$
by the central limit theorem, where we used the fact that $\int_E g(x)\,d\mu(x) = 0$ for the mean and $\int_E \big(w(\tilde x)\, g(\tilde x)\big)^2\, \frac{d\mu(\tilde x)}{w(\tilde x)} = \|\sqrt{w}\, g\|_\mu^2$ for the variance. Thanks to Slutsky's theorem (see e.g. [12, Chapter 5]) we finally obtain
$$ \sqrt{N}\, \big( \hat I_{\mu,N} - I_\mu \big) = [1, 0, \dots, 0]\, \Big( \tfrac{1}{N} \widetilde V^T \widetilde V \Big)^{-1} \tfrac{1}{\sqrt{N}}\, \widetilde V^T \tilde{\mathbf g} \;\xrightarrow{d}\; \mathcal{N}\big(0, \|\sqrt{w}\, g\|_\mu^2\big).
$$
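To illustrate the statement just proved, the following toy script of ours (with $w \equiv 1$, $E = [0,1]$, $\mu$ the Lebesgue measure and $f(x) = e^x$, all our choices) checks empirically that the standard deviation of $\sqrt{N}(\hat I_{\mu,N} - I_\mu)$ matches $\|g\|_\mu$, the best $L^2_\mu$ approximation error of $f$ in the polynomial space.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import eval_legendre

n, N, reps = 2, 2000, 2000
phi = lambda j, x: np.sqrt(2 * j + 1) * eval_legendre(j, 2 * x - 1)  # ONB on [0,1]
f = np.exp
I_mu = np.e - 1.0                                   # exact integral of f

# ||g||_mu: L2 error of the best approximation, via the ONB coefficients
c_star = [quad(lambda x: f(x) * phi(j, x), 0, 1)[0] for j in range(n + 1)]
f_sq = quad(lambda x: f(x) ** 2, 0, 1)[0]
g_norm = np.sqrt(f_sq - sum(c ** 2 for c in c_star))

rng = np.random.default_rng(0)
errs = []
for _ in range(reps):
    x = rng.random(N)
    V = np.column_stack([phi(j, x) for j in range(n + 1)])
    c_hat, *_ = np.linalg.lstsq(V, f(x), rcond=None)
    errs.append(c_hat[0] - I_mu)                    # MCLS estimate is c_hat[0]
print(np.std(errs) * np.sqrt(N), "vs", g_norm)      # the two values should be close
```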
References

[1] Damien Ackerer, Damir Filipović, and Sergio Pulido. The Jacobi stochastic volatility model. Finance Stoch., 22(3):667–700, 2018.
[2] Russel E. Caflisch. Monte Carlo and quasi-Monte Carlo methods. Acta Numer., 7:1–49, 1998.
[3] Abdellah Chkifa, Albert Cohen, Giovanni Migliorati, Fabio Nobile, and Raul Tempone. Discrete least squares polynomial approximation with random evaluations - application to parametric and stochastic elliptic PDEs. ESAIM Math. Model. Numer. Anal., 49(3):815–837, 2015.
[4] Albert Cohen and Giovanni Migliorati. Optimal weighted least-squares methods. SMAI J. Comput. Math., 3:181–203, 2017.
[5] D. Duffie, D. Filipović, and W. Schachermayer. Affine processes and applications in finance. Ann. Appl. Probab., 13(3):984–1053, 2003.
[6] Damir Filipović and Martin Larsson. Polynomial diffusions and applications in finance. Finance Stoch., 20(4):931–972, 2016.
[7] Damir Filipović and Martin Larsson. Polynomial jump-diffusion models. arXiv preprint arXiv:1711.08043, 2017.
[8] Alan Genz. Testing multidimensional integration routines. In Proc. of International Conference on Tools, Methods and Languages for Scientific and Engineering Computation, pages 81–94. Elsevier North-Holland, Inc., 1984.
[9] Gene H. Golub and Charles F. Van Loan. Matrix Computations. The Johns Hopkins University Press, 4th edition, 2012.
[10] Alex Gorodetsky, Sertac Karaman, and Youssef Marzouk. A continuous analogue of the tensor-train decomposition. Computer Methods in Applied Mechanics and Engineering, 347:59–84, 2019.
[11] Lars Grasedyck, Daniel Kressner, and Christine Tobler. A literature survey of low-rank tensor approximation techniques. GAMM-Mitteilungen, 36(1):53–78, 2013.
[12] Allan Gut. Probability: A Graduate Course. Springer Texts in Statistics. Springer, New York, 2nd edition, 2013.
[13] A.-L. Haji-Ali, F. Nobile, R. Tempone, and S. Wolfers. Multilevel weighted least squares polynomial approximation. arXiv preprint arXiv:1707.00026, 2017.
[14] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning. Springer Series in Statistics. Springer, New York, 2nd edition, 2009.
[15] Steven L. Heston. A closed-form solution for options with stochastic volatility with applications to bond and currency options. Review of Financial Studies, 6(2):327–343, 1993.
[16] Peter Kloeden and Andreas Neuenkirch. Convergence of numerical methods for stochastic differential equations in mathematical finance. In Recent Developments in Computational Finance, volume 14 of Interdiscip. Math. Sci., pages 49–80. World Sci. Publ., Hackensack, NJ, 2013.
[17] Peter E. Kloeden and Eckhard Platen. Numerical Solution of Stochastic Differential Equations, volume 23 of Applications of Mathematics (New York). Springer-Verlag, Berlin, 1992.
[18] Daniel Kressner, Robert Luce, and Francesco Statti. Incremental computation of block triangular matrix exponentials with application to option pricing. Electron. Trans. Numer. Anal., 47:57–72, 2017.
[19] Francis A. Longstaff and Eduardo S. Schwartz. Valuing American options by simulation: A simple least-squares approach. Review of Financial Studies, 14(1):113–147, 2001.
[20] Yuji Nakatsukasa. Approximate and integrate: Variance reduction in Monte Carlo integration via function approximation. arXiv preprint arXiv:1806.05492, 2018.
[21] Deanna Needell. Randomized Kaczmarz solver for noisy linear systems. BIT, 50(2):395–403, 2010.
[22] Deanna Needell, Ran Zhao, and Anastasios Zouzias. Randomized block Kaczmarz method with projection for solving least squares. Linear Algebra Appl., 484:322–343, 2015.
[23] Thomas Strohmer and Roman Vershynin. A randomized Kaczmarz algorithm with exponential convergence. J. Fourier Anal. Appl., 15(2):262–278, 2009.
[24] Anastasios Zouzias and Nikolaos M. Freris. Randomized extended Kaczmarz for solving least squares. SIAM J. Matrix Anal. Appl., 34(2):773–793, 2013.