Truncated control variates for weak approximation schemes
DENIS BELOMESTNY, STEFAN HÄFNER, AND MIKHAIL URUSOV
Abstract.
In this paper we present an enhancement of the regression-based variance reduction approaches recently proposed in Belomestny et al. [1] and [4]. This enhancement is based on a truncation of the control variate and allows for a significant reduction of the computing time, while the complexity stays of the same order. The performances of the proposed truncated algorithms are illustrated by a numerical example.
Keywords.
Control variates; Monte Carlo methods; regression methods; stochastic differential equations; weak schemes.
Mathematics Subject Classification (2010).
Introduction
Let
$T > 0$ be a fixed time horizon. Consider a $d$-dimensional diffusion process $(X_t)_{t\in[0,T]}$ defined by the Itô stochastic differential equation
$$dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dW_t, \quad X_0 = x_0 \in \mathbb{R}^d, \tag{0.1}$$
for Lipschitz continuous functions $\mu\colon\mathbb{R}^d\to\mathbb{R}^d$ and $\sigma\colon\mathbb{R}^d\to\mathbb{R}^{d\times m}$, where $(W_t)_{t\in[0,T]}$ is a standard $m$-dimensional Brownian motion. Our aim is to compute the expectation
$$u(t,x) := E[f(X_T^{t,x})], \tag{0.2}$$
for some $f\colon\mathbb{R}^d\to\mathbb{R}$, where $X^{t,x}$ denotes the solution to (0.1) started at time $t$ in point $x$. The standard Monte Carlo (SMC) estimate for $u(0,x_0)$ at a fixed point $x_0\in\mathbb{R}^d$ has the form
$$V_N := \frac{1}{N}\sum_{i=1}^N f\bigl(X_T^{(i)}\bigr) \tag{0.3}$$
for some $N\in\mathbb{N}$, where $X_T$ is an approximation for $X_T^{0,x_0}$ constructed via a time discretisation of (0.1) (we refer to [7] for a nice overview of various discretisation schemes). In the computation of $u(0,x_0) = E[f(X_T^{0,x_0})]$ by the SMC approach there are two types of error inherent: the (deterministic) discretisation error $E[f(X_T^{0,x_0})] - E[f(X_T)]$ and the Monte Carlo (statistical) error, which results from the substitution of $E[f(X_T)]$ with the sample average $V_N$. The aim of variance reduction methods is to reduce the latter statistical error. For example, in the so-called control variate variance reduction approach one looks for a random variable $\xi$ with $E\xi = 0$, which can be simulated, such that the variance of the difference $f(X_T) - \xi$ is minimised, that is, $\mathrm{Var}[f(X_T) - \xi] \to \min$ under $E\xi = 0$. Then one uses the sample average
$$V_N^{CV} := \frac{1}{N}\sum_{i=1}^N \bigl[f\bigl(X_T^{(i)}\bigr) - \xi^{(i)}\bigr] \tag{0.4}$$
instead of (0.3) to approximate $E[f(X_T)]$. The use of control variates for computing expectations of functionals of diffusion processes via Monte Carlo was initiated by Newton [11] and further developed in Milstein and Tretyakov [8].
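To make the control variate idea concrete, here is a minimal sketch of (0.3) versus (0.4) on a toy problem: a linear control variate $\xi = c\,W_1$ with $E\xi = 0$ is used to reduce the variance of a plain Monte Carlo estimate of $E[\exp(W_1)]$. The target function and the linear form of $\xi$ are illustrative assumptions only and are not the regression-based construction of this paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
W = rng.standard_normal(N)           # W_1 ~ N(0, 1)
f = np.exp(W)                        # f(X_T) with X = W and f = exp (toy choice)

# control variate xi = c*W with E[xi] = 0; c estimated as Cov(f, W)/Var(W)
c = np.cov(f, W)[0, 1] / np.var(W)
xi = c * W

V_plain = f.mean()                   # estimator (0.3)
V_cv = (f - xi).mean()               # estimator (0.4)
print(V_plain, V_cv)                 # both approximate E[exp(W_1)] = e^{1/2}
print(f.var(), (f - xi).var())       # the CV sample has smaller variance
```

Both estimators are unbiased for $E[\exp(W_1)]$; subtracting $\xi$ only shrinks the statistical error.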
In Belomestny et al. [1] a novel regression-based approach for the construction of control variates, which reduces the variance of the approximated functional $f(X_T)$, was proposed. As shown in [1], the "Monte Carlo approach with the Regression-based Control Variate" (abbreviated below as "RCV approach") is able to achieve a higher order convergence of the resulting variance to zero, which in turn leads to a significant complexity reduction as compared to the SMC algorithm. Other prominent examples of algorithms with this property are the multilevel Monte Carlo (MLMC) algorithm of [5] and quadrature-based algorithms of [9] and [10]. The RCV approach becomes especially simple in the case of the so-called weak approximation schemes, i.e., the schemes where simple random variables are used in place of Brownian increments, and which became quite popular in recent years. However, due to the fact that a lot of computations are required for implementing the RCV approach, its numerical efficiency is not convincing in higher-dimensional examples. The same applies also to the SRCV algorithm of [4]. In this paper we further enhance the performances of the RCV and SRCV algorithms by truncating the control variates, leading to a reduction from $2^m - 1$ to $m$ terms at each time point in case of the weak Euler scheme and a reduction from $3^m\,2^{m(m-1)/2} - 1$ to $m(m+1) = O(m^2)$ terms at each time point in case of the second order weak scheme. It turns out that, while the computing time is reduced significantly, we still have a sufficient variance reduction effect such that the complexity is of the same order as for the original RCV and SRCV approaches.

The paper is organised as follows. In Section 1 we present a smoothness theorem for a general class of discretisation schemes.

The work of Denis Belomestny is supported by the Russian Science Foundation project 14-50-00150.
Section 2 recalls the construction of control variates for weak schemes of the first and the second order. The main truncation results are derived in Section 3. In Section 4 we describe a generic regression algorithm. Section 5 deals with a complexity analysis for the algorithm that is based on the truncated control variate. Section 6 is devoted to a simulation study. Finally, all proofs are collected in Section 7.

1. Smoothness theorem for discretisation schemes
In this section we present a technical result for discretisation schemes, which will be very important in the sequel. To begin with, let $J\in\mathbb{N}$ denote the time discretisation parameter; we set $\Delta := T/J$ and consider discretisation schemes defined on the grid $\{j\Delta : j = 0,\ldots,J\}$.

Let us consider a scheme, where $d$-dimensional approximations $X_{\Delta,j\Delta}$, $j = 0,\ldots,J$, satisfy $X_{\Delta,0} = x_0$ and
$$X_{\Delta,j\Delta} = \Phi_\Delta\bigl(X_{\Delta,(j-1)\Delta}, \xi_j\bigr), \quad j = 1,\ldots,J, \tag{1.1}$$
for some Borel measurable functions $\Phi_\Delta\colon\mathbb{R}^{d+\tilde m}\to\mathbb{R}^d$, where $\tilde m \ge m$, and for $\tilde m$-dimensional i.i.d. random vectors $\xi_j = (\xi^1_j,\ldots,\xi^{\tilde m}_j)^\top$ with independent coordinates satisfying $E[\xi^i_j] = 0$ and $\mathrm{Var}[\xi^i_j] = 1$ for all $i = 1,\ldots,\tilde m$, $j = 1,\ldots,J$. Moreover, let $\mathcal{G}_0$ be the trivial $\sigma$-field and $\mathcal{G}_j = \sigma(\xi_1,\ldots,\xi_j)$, $j = 1,\ldots,J$. In the chapters below we will focus on different kinds of discretisation schemes, resulting in different convergence behaviour.

We now define the random function $G_{l,j}(x)$ for $J \ge l \ge j \ge 0$, $x\in\mathbb{R}^d$, as follows:
$$G_{l,j}(x) \equiv \Phi_{\Delta,l}\circ\Phi_{\Delta,l-1}\circ\ldots\circ\Phi_{\Delta,j+1}(x), \quad l > j, \qquad G_{l,j}(x) \equiv x, \quad l = j, \tag{1.2}$$
where $\Phi_{\Delta,l}(x) := \Phi_\Delta(x,\xi_l)$ for $l = 1,\ldots,J$. By $\Phi^k_{\Delta,l}$, $k\in\{1,\ldots,d\}$, we denote the $k$-th component of the function $\Phi_{\Delta,l}$. Note that it holds
$$q_j(x) := E[f(X_{\Delta,T}) \mid X_{\Delta,j\Delta} = x] = E[f(G_{J,j}(x))]. \tag{1.3}$$
Let us define the operator $D^\alpha$ as follows:
$$D^\alpha g(x) := \frac{\partial^{|\alpha|} g(x)}{\partial x_1^{\alpha_1}\cdots\partial x_d^{\alpha_d}}, \tag{1.4}$$
where $g$ is a real-valued function, $\alpha\in\mathbb{N}_0^d$ and $|\alpha| = \alpha_1 + \ldots + \alpha_d$ ($\mathbb{N}_0 := \mathbb{N}\cup\{0\}$). In the next theorem we present some smoothness conditions on $q_j$, which will be used several times in the chapters below.

Theorem 1.1.
Let $K\in\{1,2,3\}$. Suppose that $f$ is $K$ times continuously differentiable with bounded partial derivatives up to order $K$, $\Phi_\Delta(\cdot,\xi)$ is $K$ times continuously differentiable (for any fixed $\xi$), and that, for any $n\in\mathbb{N}$, $l \ge j$, $k\in\{1,\ldots,d\}$, $\alpha\in\mathbb{N}_0^d$ with $1\le|\alpha|\le K$, it holds
$$\Bigl| E\Bigl[\bigl(D^\alpha \Phi^k_{\Delta,l+1}(G_{l,j}(x))\bigr)^n \,\Big|\, \mathcal{G}_l\Bigr]\Bigr| \le \begin{cases} 1 + A_n\Delta, & |\alpha| = \alpha_k = 1,\\ B_n\Delta, & (|\alpha| > 1) \vee (\alpha_k \ne 1), \end{cases} \tag{1.5}$$
with probability one for some constants $A_n > 0$, $B_n > 0$. Moreover, suppose that for any $n_1, n_2\in\mathbb{N}$, $\alpha,\beta\in\mathbb{N}_0^d$, with $|\alpha| = 1$, $1\le|\beta|\le K$, $\alpha\ne\beta$, it holds
$$\Bigl| E\Bigl[\bigl(D^\alpha \Phi^k_{\Delta,l+1}(G_{l,j}(x))\bigr)^{n_1}\bigl(D^\beta \Phi^k_{\Delta,l+1}(G_{l,j}(x))\bigr)^{n_2} \,\Big|\, \mathcal{G}_l\Bigr]\Bigr| \le E_{n_1,n_2}\Delta \tag{1.6}$$
for some constants $E_{n_1,n_2} > 0$. Then we obtain for all $j\in\{0,\ldots,J\}$ that $q_j$ is $K$ times continuously differentiable with bounded partial derivatives up to order $K$.

2. Representations for weak approximation schemes
Below we focus on weak schemes of the first and the second order.

2.1. Weak Euler scheme.
In this subsection we treat weak schemes of order 1. Let us consider a scheme, where $d$-dimensional approximations $X_{\Delta,j\Delta}$, $j = 0,\ldots,J$, satisfy $X_{\Delta,0} = x_0$ and
$$X_{\Delta,j\Delta} = \Phi_\Delta(X_{\Delta,(j-1)\Delta}, \xi_j), \quad j = 1,\ldots,J, \tag{2.1}$$
for some functions $\Phi_\Delta\colon\mathbb{R}^{d+m}\to\mathbb{R}^d$, with $\xi_j = (\xi^1_j,\ldots,\xi^m_j)^\top$, $j = 1,\ldots,J$, being $m$-dimensional i.i.d. random vectors with i.i.d. coordinates such that
$$P\bigl(\xi^k_j = \pm 1\bigr) = \tfrac12, \quad k = 1,\ldots,m.$$
That is, relating to the framework in Section 1, we have $\tilde m = m$ and use the discrete increments $\xi^i_j$, $i = 1,\ldots,m$. A particular case is the weak Euler scheme (also called the simplified weak Euler scheme in [7, Section 14.1]) of order 1, which is given by
$$\Phi_\Delta(x,y) = x + \mu(x)\,\Delta + \sigma(x)\,y\sqrt{\Delta}. \tag{2.2}$$
Let us recall the functions (cf. (1.3)) $q_j(x) = E[f(X_{\Delta,T}) \mid X_{\Delta,j\Delta} = x]$. The proposition below summarises important representations for the weak Euler scheme, which were derived in [1].
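A one-step implementation of the simplified weak Euler scheme (2.2) with Rademacher increments can be sketched as follows; the Ornstein-Uhlenbeck test equation and all numerical parameters are toy assumptions used only to check the scheme against a known expectation.

```python
import numpy as np

def weak_euler_paths(mu, sigma, x0, T, J, N, rng):
    """Simulate N terminal values of the simplified weak Euler scheme (2.2)
    for a one-dimensional SDE, with discrete increments xi = ±1 (prob 1/2)."""
    dt = T / J
    X = np.full(N, float(x0))
    for _ in range(J):
        xi = rng.choice([-1.0, 1.0], size=N)           # Rademacher increment
        X = X + mu(X) * dt + sigma(X) * xi * np.sqrt(dt)
    return X

# toy example: dX = -X dt + dW (Ornstein-Uhlenbeck), so E[X_1] = x0 * e^{-1}
rng = np.random.default_rng(1)
XT = weak_euler_paths(lambda x: -x, lambda x: np.ones_like(x),
                      x0=1.0, T=1.0, J=100, N=200_000, rng=rng)
print(XT.mean())   # close to e^{-1} ≈ 0.368 (weak order 1 bias + MC error)
```

The discrete increments match the first two moments of the Brownian increments, which is all that weak order 1 convergence requires.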
Proposition 2.1.
The following representations hold:
$$f(X_{\Delta,T}) = E f(X_{\Delta,T}) + \sum_{j=1}^{J}\sum_{r=1}^{m}\sum_{1\le s_1<\ldots<s_r\le m} a_{j,r,s}(X_{\Delta,(j-1)\Delta})\prod_{i=1}^{r}\xi^{s_i}_j \tag{2.3}$$
with $s = (s_1,\ldots,s_r)$, where the coefficients $a_{j,r,s}\colon\mathbb{R}^d\to\mathbb{R}$ can be computed by the formula
$$a_{j,r,s}(x) = E\Bigl[f(X_{\Delta,T})\prod_{i=1}^{r}\xi^{s_i}_j \,\Big|\, X_{\Delta,(j-1)\Delta} = x\Bigr]. \tag{2.4}$$
Moreover, we have for each $j\in\{1,\ldots,J\}$
$$q_{j-1}(x) = E[q_j(X_{\Delta,j\Delta}) \mid X_{\Delta,(j-1)\Delta} = x] = \frac{1}{2^m}\sum_{(y^1,\ldots,y^m)\in\{-1,1\}^m} q_j(\Phi_\Delta(x,y)). \tag{2.5}$$
The perfect control variate corresponding to (2.3) is
$$M^{(1)}_{\Delta,T} := \sum_{j=1}^{J}\sum_{r=1}^{m}\sum_{1\le s_1<\ldots<s_r\le m} a_{j,r,s}(X_{\Delta,(j-1)\Delta})\prod_{i=1}^{r}\xi^{s_i}_j. \tag{2.6}$$
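The representation above can be checked directly on a tiny example: for $m = d = 1$ the coefficients (2.4) can be computed exactly by enumerating all $2^{J-j}$ future increment patterns, and the resulting control variate reproduces $f(X_{\Delta,T}) - E f(X_{\Delta,T})$ pathwise. The drift, diffusion coefficient and functional below are arbitrary toy choices.

```python
import math
from itertools import product

# weak Euler scheme with d = m = 1 (all concrete functions are assumptions):
# X_j = X_{j-1} + mu(X)*dt + sig(X)*xi_j*sqrt(dt),  xi_j = ±1
mu = lambda x: -x
sig = lambda x: 0.5
f = lambda x: x * x
x0, T, J = 1.0, 1.0, 4
dt = T / J

def step(x, xi):
    return x + mu(x) * dt + sig(x) * xi * math.sqrt(dt)

def q(j, x):
    """q_j(x) = E[f(X_T) | X_{j*dt} = x], by enumerating all 2^(J-j) futures."""
    if j == J:
        return f(x)
    return 0.5 * (q(j + 1, step(x, 1.0)) + q(j + 1, step(x, -1.0)))

def a(j, x):
    """a_j(x) = E[f(X_T) * xi_j | X_{(j-1)*dt} = x], formula (2.4) for m = 1."""
    return 0.5 * (q(j, step(x, 1.0)) - q(j, step(x, -1.0)))

# the control variate M = sum_j a_j(X_{j-1}) * xi_j is perfect:
# f(X_T) - M equals E[f(X_T)] on every one of the 2^J paths
Ef = q(0, x0)
max_dev = 0.0
for xis in product([-1.0, 1.0], repeat=J):
    X, M = x0, 0.0
    for j, xi in enumerate(xis, start=1):
        M += a(j, X) * xi
        X = step(X, xi)
    max_dev = max(max_dev, abs(f(X) - M - Ef))
print(max_dev)   # numerically zero
```

This is exactly the telescoping identity $q_j(X_{\Delta,j\Delta}) - q_{j-1}(X_{\Delta,(j-1)\Delta}) = a_j(X_{\Delta,(j-1)\Delta})\,\xi_j$ summed over $j$.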
Proposition 2.2.
Assume that $\mu$ and $\sigma$ in (0.1) are Lipschitz continuous with components $\mu^i, \sigma^{i,r}\colon\mathbb{R}^d\to\mathbb{R}$, $i = 1,\ldots,d$, $r = 1,\ldots,m$, being 4 times continuously differentiable with their partial derivatives of order up to 4 having polynomial growth. Let $f\colon\mathbb{R}^d\to\mathbb{R}$ be 4 times continuously differentiable with partial derivatives of order up to 4 having polynomial growth. Provided that (2.2) holds and that, for sufficiently large $p\in\mathbb{N}$, the expectations $E|X_{\Delta,j\Delta}|^p$ are uniformly bounded in $J$ and $j = 0,\ldots,J$, we have for this "simplified weak Euler scheme"
$$|E[f(X_T) - f(X_{\Delta,T})]| \le c\,\Delta,$$
where the constant $c$ does not depend on $\Delta$. Moreover, it holds $\mathrm{Var}\bigl[f(X_{\Delta,T}) - M^{(1)}_{\Delta,T}\bigr] = 0$.
In order to use the control variate $M^{(1)}_{\Delta,T}$ in practice, we need to estimate the unknown coefficients $a_{j,r,s}$. Thus, practically implementable control variates $\widetilde M^{(1)}_{\Delta,T}$ have the form (2.6) with some estimated functions $\tilde a_{j,r,s}\colon\mathbb{R}^d\to\mathbb{R}$. Notice that they remain valid control variates, i.e. we still have $E\bigl[\widetilde M^{(1)}_{\Delta,T}\bigr] = 0$, which is due to the martingale transform structure in (2.6).¹

2.2. Second order weak scheme.
Now we treat weak schemes of order 2. We consider a scheme, where $d$-dimensional approximations $X_{\Delta,j\Delta}$, $j = 0,\ldots,J$, satisfy $X_{\Delta,0} = x_0$ and
$$X_{\Delta,j\Delta} = \Phi_\Delta(X_{\Delta,(j-1)\Delta}, \xi_j, V_j), \quad j = 1,\ldots,J, \tag{2.7}$$
for some functions $\Phi_\Delta\colon\mathbb{R}^{d+m+m\times m}\to\mathbb{R}^d$. Here,
(S1) $\xi_j = (\xi^i_j)_{i=1}^m$ are $m$-dimensional random vectors,
(S2) $V_j = (V^{il}_j)_{i,l=1}^m$ are random $m\times m$-matrices,
(S3) the pairs $(\xi_j, V_j)$, $j = 1,\ldots,J$, are i.i.d.,
(S4) for each $j$, the random elements $\xi_j$ and $V_j$ are independent,
(S5) for each $j$, the random variables $\xi^i_j$, $i = 1,\ldots,m$, are i.i.d. with $P(\xi^i_j = \pm\sqrt3) = \frac16$, $P(\xi^i_j = 0) = \frac23$,
(S6) for each $j$, the random variables $V^{il}_j$, $1\le i<l\le m$, are i.i.d. with $P(V^{il}_j = \pm 1) = \frac12$,
(S7) $V^{li}_j = -V^{il}_j$, $1\le i<l\le m$, $j = 1,\ldots,J$,
(S8) $V^{ii}_j = -1$, $i = 1,\ldots,m$, $j = 1,\ldots,J$.
Hence, the matrices $V_j$ can be generated by means of $\frac{m(m-1)}{2}$ i.i.d. random variables. That is, relating to the framework in Section 1, we have $\tilde m$-dimensional random vectors $\tilde\xi_j := \bigl((\xi^i_j)_{i=1,\ldots,m}, (V^{il}_j)_{1\le i<l\le m}\bigr)$ with $\tilde m = \frac{m(m+1)}{2}$.

In order to obtain a second order weak scheme in the multidimensional case, we need to incorporate additional random elements $V_j$ into the structure of the scheme. This is the reason why we now consider (2.7) instead of (2.1). For instance, to get the second order weak scheme (also called the simplified order 2 weak Taylor scheme) of [7, Section 14.2] in the multidimensional case, we need to define the functions $\Phi_\Delta(x,y,z)$, $x\in\mathbb{R}^d$, $y\in\mathbb{R}^m$, $z\in\mathbb{R}^{m\times m}$, as explained below. First we define the function $\Sigma\colon\mathbb{R}^d\to\mathbb{R}^{d\times d}$ by the formula $\Sigma(x) = \sigma(x)\sigma(x)^\top$ and recall that the coordinates of vectors and matrices are denoted by superscripts, e.g. $\Sigma(x) = (\Sigma^{kl}(x))_{k,l=1}^d$, $\Phi_\Delta(x,y,z) = (\Phi^k_\Delta(x,y,z))_{k=1}^d$. Let us introduce the operators $L^r$, $r = 0,\ldots,m$, that act on sufficiently smooth functions $g\colon\mathbb{R}^d\to\mathbb{R}$ as follows:
$$L^0 g(x) := \sum_{k=1}^d \mu^k(x)\frac{\partial g}{\partial x^k}(x) + \frac12\sum_{k,l=1}^d \Sigma^{kl}(x)\frac{\partial^2 g}{\partial x^l\,\partial x^k}(x), \qquad L^r g(x) := \sum_{k=1}^d \sigma^{kr}(x)\frac{\partial g}{\partial x^k}(x), \quad r = 1,\ldots,m.$$
The $r$-th coordinate $\Phi^r_\Delta$, $r = 1,\ldots,d$, in the simplified order 2 weak Taylor scheme of [7, Section 14.2] is now given by the formula
$$\Phi^r_\Delta(x,y,z) = x^r + \sum_{k=1}^m \sigma^{rk}(x)\,y^k\sqrt{\Delta} + \Bigl[\mu^r(x) + \frac12\sum_{k,l=1}^m L^k\sigma^{rl}(x)\bigl(y^k y^l + z^{kl}\bigr)\Bigr]\Delta + \frac12\sum_{k=1}^m \bigl[L^0\sigma^{rk}(x) + L^k\mu^r(x)\bigr]\,y^k\,\Delta^{3/2} + \frac12 L^0\mu^r(x)\,\Delta^2, \tag{2.8}$$
provided the coefficients $\mu$ and $\sigma$ of (0.1) are sufficiently smooth. We will need to work explicitly with (2.8) at some point, but all results in this subsection assume structure (2.7) only.

Let us define the index sets $I_1 = \{1,\ldots,m\}$, $I_2 = \bigl\{(k,l)\in I_1^2 : k < l\bigr\}$ and the system
$$\mathcal{A} = \bigl\{(U_1, U_2)\in\mathcal{P}(I_1)\times\mathcal{P}(I_2) : U_1\cup U_2\ne\emptyset\bigr\},$$
where $\mathcal{P}(I)$ denotes the set of all subsets of a set $I$. For any $U\subseteq I_1$ and $o\in\{1,2\}^U$, we write $o$ as $o = (o_r)_{r\in U}$. Below we use the convention that a product over the empty set is always one. For $k\in\mathbb{N}_0$, $H_k\colon\mathbb{R}\to\mathbb{R}$ stands for the (normalised) $k$-th Hermite polynomial, i.e.
$$H_k(x) := \frac{(-1)^k}{\sqrt{k!}}\,e^{x^2/2}\,\frac{d^k}{dx^k}\,e^{-x^2/2}, \quad x\in\mathbb{R}.$$
We remark that, in particular, $H_0\equiv 1$, $H_1(x) = x$ and $H_2(x) = \frac{x^2-1}{\sqrt2}$. As in Subsection 2.1, we summarise important representations from [1] below.

¹ This phrase means that the discrete-time process $\tilde M = (\tilde M_l)_{l=0,\ldots,J}$, where $\tilde M_0 = 0$ and $\tilde M_l$ is defined like the right-hand side of (2.6) but with $\sum_{j=1}^J$ being replaced by $\sum_{j=1}^l$ and $a_{j,r,s}$ by $\tilde a_{j,r,s}$, is a martingale, which is a straightforward calculation.

Proposition 2.4.
It holds
$$f(X_{\Delta,T}) = E f(X_{\Delta,T}) + \sum_{j=1}^{J}\sum_{(U_1,U_2)\in\mathcal{A}}\sum_{o\in\{1,2\}^{U_1}} a_{j,o,U_1,U_2}(X_{\Delta,(j-1)\Delta})\prod_{r\in U_1} H_{o_r}(\xi^r_j)\prod_{(k,l)\in U_2} V^{kl}_j, \tag{2.9}$$
where the coefficients $a_{j,o,U_1,U_2}\colon\mathbb{R}^d\to\mathbb{R}$ can be computed by the formula
$$a_{j,o,U_1,U_2}(x) = E\Bigl[f(X_{\Delta,T})\prod_{r\in U_1} H_{o_r}(\xi^r_j)\prod_{(k,l)\in U_2} V^{kl}_j \,\Big|\, X_{\Delta,(j-1)\Delta} = x\Bigr]. \tag{2.10}$$
Moreover, we have for each $j\in\{1,\ldots,J\}$
$$q_{j-1}(x) = E[q_j(X_{\Delta,j\Delta}) \mid X_{\Delta,(j-1)\Delta} = x] = E\bigl[q_j(\Phi_\Delta(x,\xi_j,V_j))\bigr], \tag{2.11}$$
which is a finite weighted sum over the $3^m\,2^{m(m-1)/2}$ possible outcomes $(y,z)$ of $(\xi_j, V_j)$, with $(y^1,\ldots,y^m)\in\{-\sqrt3,0,\sqrt3\}^m$ and $(z^{uv})_{1\le u<v\le m}\in\{-1,1\}^{m(m-1)/2}$.

Proposition 2.5. Assume that $\mu$ and $\sigma$ in (0.1) are Lipschitz continuous with components $\mu^i, \sigma^{i,r}\colon\mathbb{R}^d\to\mathbb{R}$, $i = 1,\ldots,d$, $r = 1,\ldots,m$, being 6 times continuously differentiable with their partial derivatives of order up to 6 having polynomial growth. Let $f\colon\mathbb{R}^d\to\mathbb{R}$ be 6 times continuously differentiable with partial derivatives of order up to 6 having polynomial growth. Provided that (2.8) holds and that, for sufficiently large $p\in\mathbb{N}$, the expectations $E|X_{\Delta,j\Delta}|^p$ are uniformly bounded in $J$ and $j = 0,\ldots,J$, we have for this "simplified second order weak Taylor scheme"
$$|E[f(X_T) - f(X_{\Delta,T})]| \le c\,\Delta^2,$$
where the constant $c$ does not depend on $\Delta$. Moreover, we have $\mathrm{Var}\bigl[f(X_{\Delta,T}) - M^{(2)}_{\Delta,T}\bigr] = 0$ for the control variate
$$M^{(2)}_{\Delta,T} := \sum_{j=1}^{J}\sum_{(U_1,U_2)\in\mathcal{A}}\sum_{o\in\{1,2\}^{U_1}} a_{j,o,U_1,U_2}(X_{\Delta,(j-1)\Delta})\prod_{r\in U_1} H_{o_r}(\xi^r_j)\prod_{(k,l)\in U_2} V^{kl}_j, \tag{2.12}$$
where the coefficients $a_{j,o,U_1,U_2}(x)$ are defined in (2.10).

3. Truncated control variates for weak approximation schemes

Below we recall the assumptions from [1], suggest sufficient conditions for them in terms of the functions $f, \mu, \sigma$, and then suggest some stronger conditions that will justify the use of truncated control variates.

3.1. Weak Euler scheme.
Note that we considered only the second order weak scheme in terms of the regression and complexity analyses in [1]. However, analogous assumptions for the weak Euler scheme are as follows (cf. Proposition 2.1): fix some $j\in\{1,\ldots,J\}$, $r\in\{1,\ldots,m\}$, $s = (s_1,\ldots,s_r)$ with $1\le s_1<\ldots<s_r\le m$, set $\zeta_{j,r,s} := f(X_{\Delta,T})\prod_{i=1}^r \xi^{s_i}_j$ and remark that $a_{j,r,s}(x) = E[\zeta_{j,r,s} \mid X_{\Delta,(j-1)\Delta} = x]$. We assume that, for some positive constants $\Sigma$, $A$, it holds:
(A1) $\sup_{x\in\mathbb{R}^d}\mathrm{Var}[\zeta_{j,r,s} \mid X_{\Delta,(j-1)\Delta} = x] \le \Sigma < \infty$,
(A2) $\sup_{x\in\mathbb{R}^d}|a_{j,r,s}(x)| \le A\sqrt{\Delta} < \infty$.
In the following theorem we suggest sufficient conditions for the above assumptions.

Theorem 3.1. (i) Let $f$ be bounded. Then (A1) holds.
(ii) Let all the functions $\sigma^{ki}$, $k\in\{1,\ldots,d\}$, $i\in\{1,\ldots,m\}$, be bounded and all the functions $f, \mu^k, \sigma^{ki}$ be continuously differentiable with bounded partial derivatives. Then (A2) holds.

Next we suggest some stronger conditions that give us somewhat more than (A2).

Theorem 3.2. Let all the functions $\sigma^{ki}$, $k\in\{1,\ldots,d\}$, $i\in\{1,\ldots,m\}$, be bounded and all the functions $f, \mu^k, \sigma^{ki}$ be twice continuously differentiable with bounded partial derivatives up to order 2. Then it holds
(A3) $\sup_{x\in\mathbb{R}^d}|a_{j,r,s}(x)| \lesssim \Delta$, whenever $r > 1$.

Remark 3.3. As a generalisation of Theorem 3.2, it is natural to expect that it holds, under additional smoothness conditions on $f, \mu, \sigma$,
$$\sup_{x\in\mathbb{R}^d}|a_{j,r,s}(x)| \lesssim \Delta^{r/2}$$
for all $j\in\{1,\ldots,J\}$, $r\in\{1,\ldots,m\}$ and $1\le s_1<\ldots<s_r\le m$.

Let us define the "truncated control variate"
$$M^{(1),\mathrm{trunc}}_{\Delta,T} := \sum_{j=1}^{J}\sum_{i=1}^{m} a_{j,1,e_i}(X_{\Delta,(j-1)\Delta})\,\xi^i_j, \tag{3.1}$$
where $e_i\in\mathbb{R}^m$ denotes the $i$-th unit vector in $\mathbb{R}^m$ and $a_{j,1,e_i}$ is given by (cf. (2.4))
$$a_{j,1,e_i}(x) = E\bigl[f(X_{\Delta,T})\,\xi^i_j \,\big|\, X_{\Delta,(j-1)\Delta} = x\bigr].$$
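The price of dropping the higher-order terms can be checked numerically on a toy example (in the spirit of Theorem 3.4 below): for $m = 2$, $d = 1$ the exact coefficients are computable by enumeration, the full control variate has zero variance, and the truncated one (keeping only the $r = 1$ terms) leaves a small residual variance. All concrete functions below are made-up toy choices.

```python
import math
from itertools import product

# weak Euler with d = 1, m = 2 (toy assumptions):
# X_j = X_{j-1} + mu(X)*dt + (s1(X)*xi1 + s2(X)*xi2)*sqrt(dt), xi1, xi2 = ±1
mu = lambda x: -x
s1 = lambda x: 0.3
s2 = lambda x: 0.2 * math.cos(x)
f = lambda x: x * x
x0, T, J = 1.0, 1.0, 3
dt = T / J
XI = list(product([-1.0, 1.0], repeat=2))   # the 4 outcomes of (xi1, xi2)

def step(x, xi):
    return x + mu(x) * dt + (s1(x) * xi[0] + s2(x) * xi[1]) * math.sqrt(dt)

def q(j, x):   # E[f(X_T) | X_{j*dt} = x], enumerating all futures
    if j == J:
        return f(x)
    return sum(q(j + 1, step(x, xi)) for xi in XI) / 4.0

def coeff(j, x, w):   # a_{j,r,s}(x) = E[f(X_T) * w(xi_j) | X_{(j-1)*dt} = x]
    return sum(w(xi) * q(j, step(x, xi)) for xi in XI) / 4.0

Ef = q(0, x0)
var_full = var_trunc = 0.0
for path in product(XI, repeat=J):          # exact average over all 4^J paths
    X, M_full, M_trunc = x0, 0.0, 0.0
    for j, xi in enumerate(path, start=1):
        a1 = coeff(j, X, lambda u: u[0])            # r = 1 term with xi^1 (kept)
        a2 = coeff(j, X, lambda u: u[1])            # r = 1 term with xi^2 (kept)
        a12 = coeff(j, X, lambda u: u[0] * u[1])    # r = 2 term (truncated away)
        M_full += a1 * xi[0] + a2 * xi[1] + a12 * xi[0] * xi[1]
        M_trunc += a1 * xi[0] + a2 * xi[1]
        X = step(X, xi)
    var_full += (f(X) - M_full - Ef) ** 2 / 4 ** J
    var_trunc += (f(X) - M_trunc - Ef) ** 2 / 4 ** J
print(var_full, var_trunc)   # zero variance vs. a small truncation residual
```

The dropped coefficient $a_{j,2,(1,2)}$ is of order $\Delta$, so the residual variance is of order $J\Delta^2 = T\Delta$, consistent with the bound stated below.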
Note that the superscript "trunc" comes from "truncated". That is, we consider in $M^{(1),\mathrm{trunc}}_{\Delta,T}$ only the terms of the control variate $M^{(1)}_{\Delta,T}$ for which $r = 1$ (cf. (2.6)). Next we study the truncation error that arises from replacing $M^{(1)}_{\Delta,T}$ by $M^{(1),\mathrm{trunc}}_{\Delta,T}$.

Theorem 3.4. Suppose that all the functions $\sigma^{ki}$, $k\in\{1,\ldots,d\}$, $i\in\{1,\ldots,m\}$, are bounded and all the functions $f, \mu^k, \sigma^{ki}$ are twice continuously differentiable with bounded partial derivatives up to order 2. Then it holds (cf. Proposition 2.2)
$$\mathrm{Var}\bigl[f(X_{\Delta,T}) - M^{(1),\mathrm{trunc}}_{\Delta,T}\bigr] \lesssim \Delta. \tag{3.2}$$
Notice that under Assumption (A2) alone the variance in (3.2) would have been $O(1)$.

3.2. Second order weak scheme. First we recall some of the required assumptions in [1]: let us fix some $j\in\{1,\ldots,J\}$, $(U_1,U_2)\in\mathcal{A}$, $o\in\{1,2\}^{U_1}$, set
$$\zeta_{j,o,U_1,U_2} := f(X_{\Delta,T})\prod_{r\in U_1} H_{o_r}(\xi^r_j)\prod_{(k,l)\in U_2} V^{kl}_j$$
and remark that $a_{j,o,U_1,U_2}(x) = E[\zeta_{j,o,U_1,U_2} \mid X_{\Delta,(j-1)\Delta} = x]$. We assume that, for some positive constants $\Sigma$, $A$, it holds:
(B1) $\sup_{x\in\mathbb{R}^d}\mathrm{Var}[\zeta_{j,o,U_1,U_2} \mid X_{\Delta,(j-1)\Delta} = x] \le \Sigma < \infty$,
(B2) $\sup_{x\in\mathbb{R}^d}|a_{j,o,U_1,U_2}(x)| \le A\sqrt{\Delta} < \infty$.
Below we verify the above assumptions.

Theorem 3.5. (i) Let $f$ be bounded. Then (B1) holds.
(ii) Let all the functions $\mu^k$ and $\sigma^{ki}$, $k\in\{1,\ldots,d\}$, $i\in\{1,\ldots,m\}$, be bounded, the function $f$ be continuously differentiable with bounded partial derivatives, and all the functions $\mu^k, \sigma^{ki}$ be three times continuously differentiable with bounded partial derivatives up to order 3. Then (B2) holds.

Let us define the index sets
$$K_1 := \{r\in U_1 : o_r = 1\}, \quad K_2 := \{r\in U_1 : o_r = 2\}.$$
In the following theorem we provide some stronger conditions that give us more than (B2).

Theorem 3.6. (i) Let all the functions $\mu^k$ and $\sigma^{ki}$, $k\in\{1,\ldots,d\}$, $i\in\{1,\ldots,m\}$, be bounded, the function $f$ be twice continuously differentiable with bounded partial derivatives up to order 2, and all the functions $\mu^k, \sigma^{ki}$ be four times continuously differentiable with bounded partial derivatives up to order 4. Then it holds
(B3) $\sup_{x\in\mathbb{R}^d}|a_{j,o,U_1,U_2}(x)| \lesssim \Delta$, whenever $|U_2| + \frac{|K_1|}{2} + |K_2| \ge 1$.
(ii) Let all the functions $\mu^k$ and $\sigma^{ki}$, $k\in\{1,\ldots,d\}$, $i\in\{1,\ldots,m\}$, be bounded, the function $f$ be three times continuously differentiable with bounded partial derivatives up to order 3, and all the functions $\mu^k, \sigma^{ki}$ be five times continuously differentiable with bounded partial derivatives up to order 5. Then it holds
(B4) $\sup_{x\in\mathbb{R}^d}|a_{j,o,U_1,U_2}(x)| \lesssim \Delta^{3/2}$, whenever $|U_2| + \frac{|K_1|}{2} + |K_2| > 1$.

Remark 3.7. (i) As a generalisation of Theorem 3.6, it is natural to expect that it holds, under additional smoothness conditions on $f, \mu, \sigma$,
$$\sup_{x\in\mathbb{R}^d}|a_{j,o,U_1,U_2}(x)| \lesssim \Delta^{|U_2| + \frac{|K_1|}{2} + |K_2|}$$
for all $j\in\{1,\ldots,J\}$, $(U_1,U_2)\in\mathcal{A}$ and $o\in\{1,2\}^{U_1}$.
(ii) Define
$$\Delta_{U_1,U_2} := \begin{cases} \Delta^{|U_2| + \frac{|K_1|}{2} + |K_2|} & \text{if } |U_2| + \frac{|K_1|}{2} + |K_2| \le \frac32,\\[2pt] \Delta^{3/2} & \text{otherwise}. \end{cases} \tag{3.3}$$
An equivalent reformulation of assumptions (B2)-(B4) is as follows: there exists some positive constant $\tilde A$ such that it holds
$$\sup_{x\in\mathbb{R}^d}|a_{j,o,U_1,U_2}(x)| \le \tilde A\,\Delta_{U_1,U_2} \tag{3.4}$$
for all $j, o, U_1, U_2$.

Similar to Section 3.1, let us define a truncated control variate through
$$M^{(2),\mathrm{trunc}}_{\Delta,T} := \sum_{j=1}^{J}\sum_{\substack{(U_1,U_2)\in\mathcal{A}\\ |U_2| + |K_1|/2 + |K_2| \le 1}}\sum_{o\in\{1,2\}^{U_1}} a_{j,o,U_1,U_2}(X_{\Delta,(j-1)\Delta})\prod_{r\in U_1} H_{o_r}(\xi^r_j)\prod_{(k,l)\in U_2} V^{kl}_j. \tag{3.5}$$
Next we derive the truncation error that arises from replacing $M^{(2)}_{\Delta,T}$ by $M^{(2),\mathrm{trunc}}_{\Delta,T}$.

Theorem 3.8. Suppose that all the functions $\mu^k$ and $\sigma^{ki}$, $k\in\{1,\ldots,d\}$, $i\in\{1,\ldots,m\}$, are bounded, the function $f$ is three times continuously differentiable with bounded partial derivatives up to order 3, and all the functions $\mu^k, \sigma^{ki}$ are five times continuously differentiable with bounded partial derivatives up to order 5. Then it holds (cf. Proposition 2.5)
$$\mathrm{Var}\bigl[f(X_{\Delta,T}) - M^{(2),\mathrm{trunc}}_{\Delta,T}\bigr] \lesssim \Delta^2. \tag{3.6}$$

4. Generic regression algorithm

In the previous sections we have given several representations for control variates. Now we discuss how to compute the coefficients in these representations via regression. For the sake of clarity, we focus on second order schemes and control variate (3.5) with coefficients given by (2.10).

4.1. Monte Carlo regression. Fix a $Q$-dimensional vector of real-valued functions $\psi = (\psi_1,\ldots,\psi_Q)$ on $\mathbb{R}^d$. Simulate a big number $N_{\mathrm{tr}}$ of independent "training paths" of the discretised diffusion $X_{\Delta,j\Delta}$, $j = 0,\ldots,J$. In what follows these $N_{\mathrm{tr}}$ training paths are denoted by $D^{\mathrm{tr}}_{N_{\mathrm{tr}}}$:
$$D^{\mathrm{tr}}_{N_{\mathrm{tr}}} := \bigl\{(X^{\mathrm{tr},(i)}_{\Delta,j\Delta})_{j=0,\ldots,J} : i = 1,\ldots,N_{\mathrm{tr}}\bigr\}.$$
Let $\alpha_{j,o,U_1,U_2} = (\alpha^1_{j,o,U_1,U_2},\ldots,\alpha^Q_{j,o,U_1,U_2})$, where $j\in\{1,\ldots,J\}$, $(U_1,U_2)\in\mathcal{A}$, $|U_2| + |K_1|/2 + |K_2| \le 1$, $o\in\{1,2\}^{U_1}$, be a solution of the following least squares optimisation problem:
$$\operatorname*{argmin}_{\alpha\in\mathbb{R}^Q}\sum_{i=1}^{N_{\mathrm{tr}}}\Bigl[\zeta^{\mathrm{tr},(i)}_{j,o,U_1,U_2} - \alpha^1\psi_1\bigl(X^{\mathrm{tr},(i)}_{\Delta,(j-1)\Delta}\bigr) - \ldots - \alpha^Q\psi_Q\bigl(X^{\mathrm{tr},(i)}_{\Delta,(j-1)\Delta}\bigr)\Bigr]^2$$
with
$$\zeta^{\mathrm{tr},(i)}_{j,o,U_1,U_2} := f\bigl(X^{\mathrm{tr},(i)}_{\Delta,T}\bigr)\prod_{r\in U_1} H_{o_r}\bigl((\xi^{\mathrm{tr},(i)}_j)^r\bigr)\prod_{(k,l)\in U_2}\bigl(V^{\mathrm{tr},(i)}_j\bigr)^{kl}.$$
Define an estimate for the coefficient function $a_{j,o,U_1,U_2}$ via
$$\hat a_{j,o,U_1,U_2}(x) := \hat a_{j,o,U_1,U_2}\bigl(x, D^{\mathrm{tr}}_{N_{\mathrm{tr}}}\bigr) := \alpha^1_{j,o,U_1,U_2}\psi_1(x) + \ldots + \alpha^Q_{j,o,U_1,U_2}\psi_Q(x), \quad x\in\mathbb{R}^d.$$
The intermediate expression $\hat a_{j,o,U_1,U_2}(x, D^{\mathrm{tr}}_{N_{\mathrm{tr}}})$ in the above formula emphasises that the estimates $\hat a_{j,o,U_1,U_2}$ of the functions $a_{j,o,U_1,U_2}$ are random in that they depend on the simulated training paths.
² In the complexity analysis below we show how large $N_{\mathrm{tr}}$ is required to be in order to provide an estimate within some given tolerance.

The cost of computing $\alpha_{j,o,U_1,U_2}$ is of order $O(N_{\mathrm{tr}}Q^2)$, since each $\alpha_{j,o,U_1,U_2}$ is of the form $\alpha_{j,o,U_1,U_2} = B^{-1}b$ with
$$B^{k,l} := \frac{1}{N_{\mathrm{tr}}}\sum_{i=1}^{N_{\mathrm{tr}}}\psi_k\bigl(X^{\mathrm{tr},(i)}_{\Delta,(j-1)\Delta}\bigr)\psi_l\bigl(X^{\mathrm{tr},(i)}_{\Delta,(j-1)\Delta}\bigr) \tag{4.1}$$
and
$$b^k := \frac{1}{N_{\mathrm{tr}}}\sum_{i=1}^{N_{\mathrm{tr}}}\psi_k\bigl(X^{\mathrm{tr},(i)}_{\Delta,(j-1)\Delta}\bigr)\zeta^{\mathrm{tr},(i)}_{j,o,U_1,U_2}, \quad k,l\in\{1,\ldots,Q\}.$$
The cost of approximating the family of the coefficient functions $a_{j,o,U_1,U_2}$, $j\in\{1,\ldots,J\}$, $(U_1,U_2)\in\mathcal{A}$, $|U_2| + |K_1|/2 + |K_2| \le 1$, $o\in\{1,2\}^{U_1}$, is of order $O\bigl(J\,m(m+1)\,N_{\mathrm{tr}}\,Q^2\bigr)$.

4.2. Summary of the algorithm. The algorithm consists of two phases: training phase and testing phase. In the training phase, we simulate $N_{\mathrm{tr}}$ independent training paths $D^{\mathrm{tr}}_{N_{\mathrm{tr}}}$ and construct regression estimates $\hat a_{j,o,U_1,U_2}(\cdot, D^{\mathrm{tr}}_{N_{\mathrm{tr}}})$ for the coefficients $a_{j,o,U_1,U_2}(\cdot)$. In the testing phase, independently from $D^{\mathrm{tr}}_{N_{\mathrm{tr}}}$ we simulate $N$ independent testing paths $(X^{(i)}_{\Delta,j\Delta})_{j=0,\ldots,J}$, $i = 1,\ldots,N$, and build the Monte Carlo estimator for $E[f(X_T)]$ as
$$\mathcal{E} = \frac{1}{N}\sum_{i=1}^{N}\Bigl(f\bigl(X^{(i)}_{\Delta,T}\bigr) - \widehat M^{(2),\mathrm{trunc},(i)}_{\Delta,T}\Bigr), \tag{4.2}$$
where
$$\widehat M^{(2),\mathrm{trunc},(i)}_{\Delta,T} := \sum_{j=1}^{J}\sum_{\substack{(U_1,U_2)\in\mathcal{A}\\ |U_2| + |K_1|/2 + |K_2| \le 1}}\sum_{o\in\{1,2\}^{U_1}} \hat a_{j,o,U_1,U_2}\bigl(X^{(i)}_{\Delta,(j-1)\Delta}, D^{\mathrm{tr}}_{N_{\mathrm{tr}}}\bigr)\prod_{r\in U_1} H_{o_r}\bigl(\xi^{r,(i)}_j\bigr)\prod_{(k,l)\in U_2} V^{kl,(i)}_j \tag{4.3}$$
(cf. with (2.12)). Due to the martingale transform structure in (4.3) (recall footnote 1 on page 4), we have $E\bigl[\widehat M^{(2),\mathrm{trunc},(i)}_{\Delta,T} \mid D^{\mathrm{tr}}_{N_{\mathrm{tr}}}\bigr] = 0$, hence $E[\mathcal{E} \mid D^{\mathrm{tr}}_{N_{\mathrm{tr}}}] = E\bigl[f(X^{(i)}_{\Delta,T}) - \widehat M^{(2),\mathrm{trunc},(i)}_{\Delta,T} \mid D^{\mathrm{tr}}_{N_{\mathrm{tr}}}\bigr] = E[f(X_{\Delta,T})]$, and we obtain (cf. (3.6))
$$\mathrm{Var}[\mathcal{E}] = E\bigl[\mathrm{Var}\bigl(\mathcal{E} \mid D^{\mathrm{tr}}_{N_{\mathrm{tr}}}\bigr)\bigr] + \mathrm{Var}\bigl[E\bigl(\mathcal{E} \mid D^{\mathrm{tr}}_{N_{\mathrm{tr}}}\bigr)\bigr] = E\bigl[\mathrm{Var}\bigl(\mathcal{E} \mid D^{\mathrm{tr}}_{N_{\mathrm{tr}}}\bigr)\bigr] = \frac{1}{N}\,E\Bigl[\mathrm{Var}\Bigl(f\bigl(X^{(1)}_{\Delta,T}\bigr) - \widehat M^{(2),\mathrm{trunc},(1)}_{\Delta,T} \,\Big|\, D^{\mathrm{tr}}_{N_{\mathrm{tr}}}\Bigr)\Bigr] = \frac{1}{N}\,\mathrm{Var}\Bigl[f\bigl(X^{(1)}_{\Delta,T}\bigr) - \widehat M^{(2),\mathrm{trunc},(1)}_{\Delta,T}\Bigr].$$
Summarising, we have
$$E[\mathcal{E}] = E[f(X_{\Delta,T})], \quad \mathrm{Var}[\mathcal{E}] = \frac{1}{N}\,\mathrm{Var}\Bigl[f\bigl(X^{(1)}_{\Delta,T}\bigr) - \widehat M^{(2),\mathrm{trunc},(1)}_{\Delta,T}\Bigr]. \tag{4.4}$$
Notice that the result of (4.4) indeed requires the computations above and cannot be stated right from the outset, because the summands in (4.2) are dependent (through $D^{\mathrm{tr}}_{N_{\mathrm{tr}}}$). This concludes the description of the generic regression algorithm for constructing the control variate. Further details, such as bounds for the right-hand side of (4.4), depend on a particular implementation, i.e. on the quality of the chosen basis functions.

5. Complexity analysis

In this section we extend the complexity analysis presented in [1] to the case of the "TRCV" (truncated RCV) algorithm. Below we only sketch the main results for the second order schemes. We make the following assumption (cf. [2] and [4]):
(B5) The functions $a_{j,o,U_1,U_2}(x)$ can be well approximated by the functions from $\Psi_Q := \mathrm{span}(\{\psi_1,\ldots,\psi_Q\})$, in the sense that there are constants $\kappa > 0$ and $C_\kappa > 0$ such that
$$\inf_{g\in\Psi_Q}\int_{\mathbb{R}^d}\bigl(a_{j,o,U_1,U_2}(x) - g(x)\bigr)^2\,P_{\Delta,j-1}(dx) \le \frac{C_\kappa}{Q^\kappa},$$
where $P_{\Delta,j-1}$ denotes the distribution of $X_{\Delta,(j-1)\Delta}$.

Remark 5.1. Note that (B5) is a natural condition to be satisfied for good choices of $\Psi_Q$.
For instance, under appropriate assumptions, in the case of piecewise polynomial regression as described in [1], (B5) is satisfied with $\kappa = \frac{2\nu(p+1)}{2d(p+1)+d\nu}$, where the parameters $p$ and $\nu$ are explained in [1].

In Lemma 5.2 below we present an $L^2$-upper bound for the estimation error of the TRCV algorithm. To this end, we need to describe more precisely how exactly the regression-based approximations $\tilde a_{j,o,U_1,U_2}$ are constructed. Let the functions $\hat a_{j,o,U_1,U_2}(x)$ be obtained by regression onto the set of basis functions $\{\psi_1,\ldots,\psi_Q\}$, while the approximations $\tilde a_{j,o,U_1,U_2}(x)$ of the TRCV algorithm are the truncated estimates, which are defined as follows:
$$\tilde a_{j,o,U_1,U_2}(x) := T_{\tilde A\Delta_{U_1,U_2}}\hat a_{j,o,U_1,U_2}(x) := \begin{cases}\hat a_{j,o,U_1,U_2}(x) & \text{if } |\hat a_{j,o,U_1,U_2}(x)| \le \tilde A\Delta_{U_1,U_2},\\ \tilde A\Delta_{U_1,U_2}\operatorname{sgn}\hat a_{j,o,U_1,U_2}(x) & \text{otherwise},\end{cases} \tag{5.1}$$
where $\Delta_{U_1,U_2}$ and $\tilde A$ are given in (3.3) and (3.4).

Lemma 5.2. Under (B1)-(B5), we have
$$E\bigl\|\tilde a_{j,o,U_1,U_2} - a_{j,o,U_1,U_2}\bigr\|^2_{L^2(P_{\Delta,j-1})} \le \tilde c\,\bigl(\Sigma + \tilde A^2\Delta^2_{U_1,U_2}(\log N_{\mathrm{tr}} + 1)\bigr)\frac{Q}{N_{\mathrm{tr}}} + 8\,\frac{C_\kappa}{Q^\kappa}, \tag{5.2}$$
where $\tilde c$ is a universal constant. Notice that the expectation in the left-hand side of (5.2) means averaging over the randomness in $D^{\mathrm{tr}}_{N_{\mathrm{tr}}}$.

Let $(X_{\Delta,j\Delta})_{j=0,\ldots,J}$ be a testing path, which is independent of the training paths $D^{\mathrm{tr}}_{N_{\mathrm{tr}}}$. We define
$$\widetilde M^{(2),\mathrm{trunc}}_{\Delta,T} := \sum_{j=1}^{J}\sum_{\substack{(U_1,U_2)\in\mathcal{A}\\ |U_2| + |K_1|/2 + |K_2| \le 1}}\sum_{o\in\{1,2\}^{U_1}} \tilde a_{j,o,U_1,U_2}\bigl(X_{\Delta,(j-1)\Delta}, D^{\mathrm{tr}}_{N_{\mathrm{tr}}}\bigr)\prod_{r\in U_1} H_{o_r}(\xi^r_j)\prod_{(k,l)\in U_2} V^{kl}_j \tag{5.3}$$
(cf. (3.5)). Lemma 5.2 now allows to bound the variance $\mathrm{Var}\bigl[f(X_{\Delta,T}) - \widetilde M^{(2),\mathrm{trunc}}_{\Delta,T}\bigr]$ from above.

Theorem 5.3. Under (B1)-(B5), it holds
$$\mathrm{Var}\bigl[f(X_{\Delta,T}) - \widetilde M^{(2),\mathrm{trunc}}_{\Delta,T}\bigr] \lesssim \Delta^2 + J\,m(m+1)\Bigl(\tilde c\,\bigl(\Sigma + \tilde A^2\Delta(\log N_{\mathrm{tr}} + 1)\bigr)\frac{Q}{N_{\mathrm{tr}}} + 8\,\frac{C_\kappa}{Q^\kappa}\Bigr).$$

5.1. Complexity of the TRCV approach.
Let us study the complexity of the TRCV approach. The overall cost is of order $JQ\max\{N_{\mathrm{tr}}Q, N\}$, provided that we only track the constants which tend to $\infty$ when $\varepsilon\searrow 0$, with $\varepsilon$ being the accuracy to be achieved. That is, the constants, such as $d, m, \kappa, C_\kappa$, are ignored. We have the following constraints:
$$\max\Bigl\{J^{-4},\; \frac{1}{J^2 N},\; \frac{JQ}{N_{\mathrm{tr}}N},\; \frac{J}{Q^\kappa N}\Bigr\} \lesssim \varepsilon^2, \tag{5.4}$$
where the first term comes from the squared bias of the estimator and the remaining three ones come from the variance of the estimator (see Theorem 5.3 as well as footnote 3 on page 10). We get the following result.

Theorem 5.4. For the TRCV approach with the second order weak schemes under (B1)-(B5), it is optimal to choose the orders of parameters as follows (cf. [4]):
$$J \asymp \varepsilon^{-1/2}, \quad Q \asymp \varepsilon^{-\frac{5}{4\kappa+4}}, \quad N_{\mathrm{tr}} \asymp \varepsilon^{-5/4}, \quad N \asymp N_{\mathrm{tr}}Q \asymp \varepsilon^{-\frac{5\kappa+10}{4\kappa+4}},$$

³ Notice that the variance of the TRCV estimate $\frac{1}{N}\sum_{i=1}^{N}\bigl[f\bigl(X^{(i)}_{\Delta,T}\bigr) - \widetilde M^{(2),\mathrm{trunc},(i)}_{\Delta,T}\bigr]$ with $N$ testing paths is $\frac{1}{N}\mathrm{Var}\bigl[f(X_{\Delta,T}) - \widetilde M^{(2),\mathrm{trunc}}_{\Delta,T}\bigr]$ (cf. (4.4)).
That is why we truncated M (2) ,trunc ∆ ,T in (3.5) at the level | U | + |K | + |K | ≤ . For instance, if we had used a control variate of the form(cf. (3.1)) J (cid:88) j =1 (cid:88) ( U ,U ) ∈A| U | + |K | + |K | = (cid:88) o ∈{ , } U a j,o,U ,U ( X ∆ , ( j − ) (cid:89) r ∈ U H o r ( ξ rj ) (cid:89) ( k,l ) ∈ U V klj = J (cid:88) j =1 m (cid:88) i =1 a j, ,i, ∅ ( X ∆ , ( j − ) ξ ij with a j, ,i, ∅ ( x ) = E (cid:104) f ( X ∆ ,T ) ξ ij | X ∆ , ( j − = x (cid:105) , the bound for the variance in (3.6) would have beenof order ∆ and due to the resulting constraint JN (cid:46) ε , we would have obtained worse complexitiesthan ε − , since C T RCV (cid:38) J N . 6. Numerical results The results below are based on program codes written and vectorised in MATLAB and running ona Linux 64-bit operating system.Let us consider the following SDE for d = m = 5 (cf. [1]) dX it = − sin (cid:0) X it (cid:1) cos (cid:0) X it (cid:1) dt + cos (cid:0) X it (cid:1) dW it , X i = 0 , i ∈ { , , , } ,dX t = (cid:88) i =1 (cid:20) − 12 sin (cid:0) X it (cid:1) cos (cid:0) X it (cid:1) dt + cos (cid:0) X it (cid:1) dW it (cid:21) + dW t , X = 0 . (6.1)The solution of (6.1) is given by X it = arctan (cid:0) W it (cid:1) , i ∈ { , , , } ,X t = (cid:88) i =1 arsinh (cid:0) W it (cid:1) + W t . for t ∈ [0 , . Further, we consider the functional f ( x ) = cos (cid:32) (cid:88) i =1 x i (cid:33) − (cid:88) i =1 sin (cid:0) x i (cid:1) , Performing the full complexity analysis via Lagrange multipliers one can see that these parameter values are not optimal if κ ≤ (a Lagrange multiplier corresponding to a “ ≤ ” constraint is negative). Recall that in the case ofpiecewise polynomial regression (see [1] and recall Remark 5.1) we have κ = ν ( p +1)2 d ( p +1)+ dν . Let us note that in [1] it isrequired to choose the parameters p and ν according to p > d − and ν > d ( p +1)2( p +1) − d , which implies that κ > , for κ expressed via p and ν by the above formula. 
that is, we have
$$E[f(X_1)]=\big(E\big[\cos\big(\arctan(W^1_1)+\operatorname{arsinh}(W^1_1)\big)\big]\big)^4\,E\big[\cos(W^5_1)\big]\approx0.002$$
(the sine terms do not contribute, since they have zero expectation by symmetry). Here we consider weak schemes of the second order and compare the numerical performances of the SMC, MLMC, RCV, TRCV and TSRCV approaches. The latter one is the truncated version of the SRCV approach of [4]. Like the RCV algorithm, the SRCV one is based on (2.12); the difference is only in how the approximations of the coefficients $a_{j,o,U_1,U_2}$ are implemented in practice (while the RCV algorithm is a direct Monte Carlo regression, in the SRCV algorithm the regression is combined with a kind of "stratification"; see [4] for more detail). Therefore, the idea of the truncation (i.e. replacing (2.12) with (3.5)) applies also to the SRCV approach and gives us the TSRCV one.

For simplicity we implemented a global regression for the RCV, TRCV and TSRCV approaches (i.e. the one without considering the truncation operator in (5.1), as a part of the general description in Section 4). More precisely, we use quadratic polynomials (that is, $\prod_{i=1}^5x_i^{l_i}$, where $l_1,\dots,l_5\in\{0,1,2\}$ and $\sum_{i=1}^5l_i\le2$) as well as $f$ as basis functions; hence $\Psi_Q$ consists of $Q=\binom72+1=22$ basis functions. Note that we do not need to consider the random variables $V^{kl}_j$ in the second order weak scheme, since $L^k\sigma^{rl}(x)=0$ for $k\ne l$ (see (2.8)). This gives us fewer terms for the RCV approach, namely $3^m-1$ rather than $3^m2^{m(m-1)/2}-1$ terms in (2.12) (the factor $2^{m(m-1)/2}$ is no longer present). As for the TRCV and TSRCV approaches, this gives us only $\frac{m(m+3)}2=20$ compared to $m(m+1)=30$ terms in (3.5). We choose $\kappa=1.2$, which is related to the piecewise polynomial regression with polynomial degree $p=2$ (comparable to our setting) and the limiting case $\nu\to\infty$ (see footnote 4 on page 11).
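Both the basis count and the reference value can be reproduced with a few lines of standard-library Python; the quadrature tolerance and grid below are ad hoc choices, not taken from the paper:

```python
import math
from itertools import combinations_with_replacement

# Number of basis functions: monomials of total degree <= 2 in d = 5
# variables (including the constant monomial) plus f itself.
d = 5
monomials = [c for k in range(3)
             for c in combinations_with_replacement(range(d), k)]
Q = len(monomials) + 1          # 21 monomials + f

# Reference value E[f(X_1)] = (E[cos(arctan Z + arsinh Z)])^4 * E[cos Z],
# Z ~ N(0,1), via simple trapezoidal quadrature against the normal density.
def gauss_expect(g, lo=-8.0, hi=8.0, n=16000):
    h = (hi - lo) / n
    s = 0.0
    for i in range(n + 1):
        z = lo + i * h
        w = 0.5 if i in (0, n) else 1.0
        s += w * g(z) * math.exp(-0.5 * z * z)
    return s * h / math.sqrt(2 * math.pi)

e1 = gauss_expect(lambda z: math.cos(math.atan(z) + math.asinh(z)))
ref = e1 ** 4 * gauss_expect(math.cos)   # note E[cos Z] = exp(-1/2)
print(Q, round(ref, 5))
```

The sine terms of $f$ drop out of the expectation by symmetry, so only the product of the two one-dimensional integrals above remains.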
Moreover, for each considered accuracy level $\varepsilon=2^{-i}$ we set the parameters $J$, $N_r$ and $N$ for the RCV, TRCV and TSRCV approaches as follows (compare with the formulas in Subsection 5.1): $J=\lceil\varepsilon^{-0.5}\rceil$, while $N_r$ and $N$ are chosen proportional to the corresponding powers of $\varepsilon^{-1}$ from Subsection 5.1, with the factors 2048 (in $N_r$) and 512 (in $N$) included for stability purposes. For the TRCV and SMC algorithms we additionally consider one smaller value of $\varepsilon$, which produces a picture with approximately equal maximal computational time (that is, the time corresponding to the best accuracy) for all algorithms. Next we estimate the numerical complexity for the RCV, TRCV and TSRCV approaches by means of 100 independent simulations and compare it with the one for the SMC and MLMC approaches, for which we use the same output as in [1]. As can be seen from Figure 1, the estimated numerical complexity of each approach is of the form $\mathrm{RMSE}^{-\gamma}$, where the exponent $\gamma$ is obtained by regressing the log-time (logarithmic computing time of the whole algorithm in seconds) vs. the log-RMSE. Beyond the numerical complexities we observe that the truncation effect from the RCV algorithm to its truncated versions is huge. While we obtain poor results for the RCV approach (as in [1]), i.e. in this region of $\varepsilon$-values the RCV approach is numerically outperformed by the other ones, the TRCV and TSRCV approaches work best (even better than the SMC and MLMC approaches).

7. Proofs

Proof of Theorem 1.1. We begin with the following remark.
Assumptions (1.5) and (1.6) together with the conditional Cauchy-Schwarz inequality $|E[XY\mid\mathcal G]|\le\sqrt{E[X^2\mid\mathcal G]\,E[Y^2\mid\mathcal G]}$ imply that the following generalisation of (1.6) is satisfied: for any $n_1,n_2\in\mathbb N$ and $\alpha,\beta\in\mathbb N_0^d$ with $1\le|\alpha|\le K$, $1\le|\beta|\le K$, $\alpha\ne\beta$, it holds
$$\Big|E\Big[\big(D^\alpha\Phi^k_{\Delta,l+1}(G_{l,j}(x))\big)^{n_1}\big(D^\beta\Phi^k_{\Delta,l+1}(G_{l,j}(x))\big)^{n_2}\,\Big|\,\mathcal G_l\Big]\Big|\le C_{n_1,n_2}\Delta\tag{7.1}$$
for some appropriate constants $C_{n_1,n_2}>0$.

Let us begin with the case $K=1$. We have for some $k,r\in\{1,\dots,d\}$
$$\frac{\partial}{\partial x_r}G^k_{l+1,j}(x)=\sum_{s=1}^d\frac{\partial}{\partial x_s}\Phi^k_{\Delta,l+1}(G_{l,j}(x))\,\frac{\partial}{\partial x_r}G^s_{l,j}(x)=:\sum_{s=1}^d\gamma_s$$

[Figure 1. Numerical complexities of the RCV, TRCV, TSRCV, SMC and MLMC approaches.]

and $\frac{\partial}{\partial x_r}G^s_{j+1,j}(x)=\frac{\partial}{\partial x_r}\Phi^s_\Delta(x,\xi_{j+1})$, where $G^s_{l+1,j}$ and $\Phi^s_\Delta$, $s\in\{1,\dots,d\}$, denote the $s$-th components of the functions $G_{l+1,j}$ and $\Phi_\Delta$. Hence
$$E\bigg[\Big(\frac{\partial}{\partial x_r}G^k_{l+1,j}(x)\Big)^2\bigg]\le E\bigg[\gamma_k^2+\sum_{s\colon s\ne k}\big(2\gamma_k\gamma_s+(d-1)\gamma_s^2\big)\bigg].$$
For an arbitrary $j\in\{0,\dots,J-1\}$, denote $\rho^{r,s}_{l+1,n,1}:=E\big[\big(\frac{\partial}{\partial x_r}G^s_{l+1,j}(x)\big)^n\big]$; then, due to (1.5) and (7.1), we get for $l=j,\dots,J-1$
$$\rho^{r,k}_{l+1,2,1}\le(1+A\Delta)\rho^{r,k}_{l,2,1}+\sum_{s\colon s\ne k}\Big(C_{1,1}\Delta\big(\rho^{r,k}_{l,2,1}+\rho^{r,s}_{l,2,1}\big)+(d-1)B\Delta\,\rho^{r,s}_{l,2,1}\Big).$$
Further, denote $\rho^r_{l+1,n,1}:=\sum_{s=1}^d\rho^{r,s}_{l+1,n,1}$; then we get
$$\rho^r_{l+1,2,1}\le(1+A\Delta)\rho^r_{l,2,1}+2(d-1)C_{1,1}\Delta\,\rho^r_{l,2,1}+(d-1)^2B\Delta\,\rho^r_{l,2,1}.$$
This gives us $\rho^r_{l+1,2,1}\le(1+\kappa_1\Delta)\rho^r_{l,2,1}$ for some constant $\kappa_1>0$, leading to
$$\rho^r_{l,2,1}\le(1+\kappa_1\Delta)^{l-j-1}\rho^r_{j+1,2,1},\qquad l=j+1,\dots,J,\tag{7.2}$$
where $\rho^r_{j+1,2,1}=\sum_{s=1}^dE\big[\big(\frac{\partial}{\partial x_r}\Phi^s_\Delta(x,\xi_{j+1})\big)^2\big]$, which is bounded due to (1.5).
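The growth factor $(1+\kappa_1\Delta)^{l-j-1}$ appearing here (and repeatedly below) is bounded uniformly in $J$ because $\Delta=T/J$, so that $(1+\kappa_1T/J)^J\le e^{\kappa_1T}$. A quick numerical check of this elementary fact (the values of $T$ and the constant are arbitrary choices for illustration):

```python
import math

# The recursions rho_{l+1} <= (1 + kappa1 * Delta) * rho_l are iterated at
# most J times with Delta = T / J, so the accumulated growth factor is
# uniformly bounded: (1 + kappa1 * T / J)^J <= exp(kappa1 * T) for every J.
T, kappa1 = 1.0, 3.0
factors = [(1 + kappa1 * T / J) ** J for J in (1, 10, 100, 10000)]
bound = math.exp(kappa1 * T)
print([round(f, 4) for f in factors], round(bound, 4))
```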
Together with (7.2) we obtain the boundedness of $\{\rho^r_{J,2,1}:J\in\mathbb N\}$ and hence the boundedness of
$$\Big|\frac{\partial}{\partial x_r}q_j(x)\Big|\le\sum_{s=1}^dE\Big|\frac{\partial f}{\partial x_s}(G_{J,j}(x))\,\frac{\partial}{\partial x_r}G^s_{J,j}(x)\Big|\le\sum_{s=1}^d\sqrt{E\Big[\Big(\frac{\partial f}{\partial x_s}(G_{J,j}(x))\Big)^2\Big]\rho^{r,s}_{J,2,1}}\le\sqrt{d\sum_{s=1}^dE\Big[\Big(\frac{\partial f}{\partial x_s}(G_{J,j}(x))\Big)^2\Big]\rho^{r,s}_{J,2,1}}\le\mathrm{const}\sqrt{\rho^r_{J,2,1}}$$
for all $r\in\{1,\dots,d\}$, since $f$ is assumed to be continuously differentiable with bounded partial derivatives.

Let us proceed with the case $K=2$. We have, due to $\big(\sum_{k=1}^da_k\big)^n\le d^{n-1}\sum_{k=1}^da_k^n$,
$$E\bigg[\Big(\frac{\partial}{\partial x_r}G^k_{l+1,j}(x)\Big)^4\bigg]\le E\bigg[\gamma_k^4+\sum_{s\colon s\ne k}\big(4\gamma_k^3\gamma_s+6(d-1)\gamma_k^2\gamma_s^2+4(d-1)^2\gamma_k\gamma_s^3+(d-1)^3\gamma_s^4\big)\bigg]$$
and thus, due to $a^3b\le\frac{3a^4+b^4}4$ and $a^2b^2\le\frac{a^4+b^4}2$,
$$\rho^{r,k}_{l+1,4,1}\le(1+A\Delta)\rho^{r,k}_{l,4,1}+\sum_{s\colon s\ne k}\Big(C_{3,1}\Delta\big(3\rho^{r,k}_{l,4,1}+\rho^{r,s}_{l,4,1}\big)+3(d-1)C_{2,2}\Delta\big(\rho^{r,k}_{l,4,1}+\rho^{r,s}_{l,4,1}\big)+(d-1)^2C_{1,3}\Delta\big(\rho^{r,k}_{l,4,1}+3\rho^{r,s}_{l,4,1}\big)+(d-1)^3B\Delta\,\rho^{r,s}_{l,4,1}\Big).$$
This gives us
$$\rho^r_{l+1,4,1}\le(1+A\Delta)\rho^r_{l,4,1}+4(d-1)C_{3,1}\Delta\,\rho^r_{l,4,1}+6(d-1)^2C_{2,2}\Delta\,\rho^r_{l,4,1}+4(d-1)^3C_{1,3}\Delta\,\rho^r_{l,4,1}+(d-1)^4B\Delta\,\rho^r_{l,4,1}.$$
Hence, we obtain $\rho^r_{l+1,4,1}\le(1+\kappa_1\Delta)\rho^r_{l,4,1}$ for some constant $\kappa_1>0$, leading to
$$\rho^r_{l,4,1}\le(1+\kappa_1\Delta)^{l-j-1}\rho^r_{j+1,4,1},\qquad l=j+1,\dots,J,$$
where $\rho^r_{j+1,4,1}=\sum_{s=1}^dE\big[\big(\frac{\partial}{\partial x_r}\Phi^s_\Delta(x,\xi_{j+1})\big)^4\big]$.

Next, we have for some $k,o,r\in\{1,\dots,d\}$
$$\frac{\partial^2}{\partial x_r\partial x_o}G^k_{l+1,j}(x)=\sum_{s=1}^d\frac{\partial}{\partial x_s}\Phi^k_{\Delta,l+1}(G_{l,j}(x))\,\frac{\partial^2}{\partial x_r\partial x_o}G^s_{l,j}(x)+\sum_{s,u=1}^d\frac{\partial^2}{\partial x_s\partial x_u}\Phi^k_{\Delta,l+1}(G_{l,j}(x))\,\frac{\partial}{\partial x_r}G^s_{l,j}(x)\,\frac{\partial}{\partial x_o}G^u_{l,j}(x)=:\sum_{s=1}^d\eta_{1,s}+\sum_{s,u=1}^d\eta_{2,s,u}$$
and $\frac{\partial^2}{\partial x_r\partial x_o}G^s_{j+1,j}(x)=\frac{\partial^2}{\partial x_r\partial x_o}\Phi^s_\Delta(x,\xi_{j+1})$. Hence
$$E\bigg[\Big(\frac{\partial^2}{\partial x_r\partial x_o}G^k_{l+1,j}(x)\Big)^2\bigg]\le E\bigg[\eta_{1,k}^2+\sum_{s\colon s\ne k}\big(2\eta_{1,k}\eta_{1,s}+(d-1)\eta_{1,s}^2\big)+2\sum_{s,u,v=1}^d\eta_{1,v}\eta_{2,s,u}+d^2\sum_{s,u=1}^d\eta_{2,s,u}^2\bigg].$$
Denote $\rho^{r,o,s}_{l+1,n,2}:=E\big[\big(\frac{\partial^2}{\partial x_r\partial x_o}G^s_{l+1,j}(x)\big)^n\big]$; then we get, due to
$$E[XYZ]\le\sqrt{E[X^2]}\,\sqrt[4]{E[Y^4]}\,\sqrt[4]{E[Z^4]}\le\frac12E[X^2]+\frac12\sqrt{E[Y^4]}\sqrt{E[Z^4]}\le\frac12E[X^2]+\frac14\big(E[Y^4]+E[Z^4]\big),$$
as well as (1.5) and (7.1),
$$\rho^{r,o,k}_{l+1,2,2}\le(1+A\Delta)\rho^{r,o,k}_{l,2,2}+\sum_{s\colon s\ne k}\Big(C_{1,1}\Delta\big(\rho^{r,o,k}_{l,2,2}+\rho^{r,o,s}_{l,2,2}\big)+(d-1)B\Delta\,\rho^{r,o,s}_{l,2,2}\Big)+\sum_{s,u,v=1}^dC_{1,2}\Delta\Big(\rho^{r,o,v}_{l,2,2}+\frac12\big(\rho^{r,s}_{l,4,1}+\rho^{o,u}_{l,4,1}\big)\Big)+d^2\sum_{s,u=1}^dB\Delta\,\frac12\big(\rho^{r,s}_{l,4,1}+\rho^{o,u}_{l,4,1}\big).$$
Further, denote $\rho^{r,o}_{l+1,n,2}:=\sum_{s=1}^d\rho^{r,o,s}_{l+1,n,2}$; then we get for $l=j+1,\dots,J-1$
$$\rho^{r,o}_{l+1,2,2}\le(1+A\Delta)\rho^{r,o}_{l,2,2}+2(d-1)C_{1,1}\Delta\,\rho^{r,o}_{l,2,2}+(d-1)^2B\Delta\,\rho^{r,o}_{l,2,2}+d^3C_{1,2}\Delta\Big(\rho^{r,o}_{l,2,2}+\frac12\big(\rho^r_{l,4,1}+\rho^o_{l,4,1}\big)\Big)+d^4B\Delta\,\frac12\big(\rho^r_{l,4,1}+\rho^o_{l,4,1}\big).$$
This gives us $\rho^{r,o}_{l+1,2,2}\le(1+\kappa_2\Delta)\rho^{r,o}_{l,2,2}+\kappa_3\Delta$ for some constants $\kappa_2,\kappa_3>0$, leading to
$$\rho^{r,o}_{l,2,2}\le(1+\kappa_2\Delta)^{l-j-1}\rho^{r,o}_{j+1,2,2}+\kappa_4,\qquad l=j+1,\dots,J,$$
where $\kappa_4>0$ and $\rho^{r,o}_{j+1,2,2}=\sum_{s=1}^dE\big[\big(\frac{\partial^2}{\partial x_r\partial x_o}\Phi^s_\Delta(x,\xi_{j+1})\big)^2\big]$.
Thus, we obtain the boundedness of
$$\Big|\frac{\partial^2}{\partial x_r\partial x_o}q_j(x)\Big|\le\sum_{s=1}^dE\Big|\frac{\partial f}{\partial x_s}(G_{J,j}(x))\,\frac{\partial^2}{\partial x_r\partial x_o}G^s_{J,j}(x)\Big|+\sum_{s,u=1}^dE\Big|\frac{\partial^2f}{\partial x_s\partial x_u}(G_{J,j}(x))\,\frac{\partial}{\partial x_r}G^s_{J,j}(x)\,\frac{\partial}{\partial x_o}G^u_{J,j}(x)\Big|\le\sum_{s=1}^d\sqrt{E\Big[\Big(\frac{\partial f}{\partial x_s}(G_{J,j}(x))\Big)^2\Big]\rho^{r,o,s}_{J,2,2}}+\sum_{s,u=1}^d\sqrt{E\Big[\Big(\frac{\partial^2f}{\partial x_s\partial x_u}(G_{J,j}(x))\Big)^2\Big]}\,\big(\rho^{r,s}_{J,4,1}\,\rho^{o,u}_{J,4,1}\big)^{1/4}$$
for all $r,o\in\{1,\dots,d\}$, since $f$ is assumed to be twice continuously differentiable with bounded partial derivatives up to order 2.

Let us proceed with the final case $K=3$. We have
$$E\bigg[\Big(\frac{\partial}{\partial x_r}G^k_{l+1,j}(x)\Big)^6\bigg]\le E\bigg[\gamma_k^6+\sum_{s\colon s\ne k}\big(6\gamma_k^5\gamma_s+15(d-1)\gamma_k^4\gamma_s^2+20(d-1)^2\gamma_k^3\gamma_s^3+15(d-1)^3\gamma_k^2\gamma_s^4+6(d-1)^4\gamma_k\gamma_s^5+(d-1)^5\gamma_s^6\big)\bigg]$$
and thus, due to $a^5b\le\frac{5a^6+b^6}6$, $a^4b^2\le\frac{2a^6+b^6}3$ and $a^3b^3\le\frac{a^6+b^6}2$,
$$\rho^{r,k}_{l+1,6,1}\le(1+A\Delta)\rho^{r,k}_{l,6,1}+\sum_{s\colon s\ne k}\Big(C_{5,1}\Delta\big(5\rho^{r,k}_{l,6,1}+\rho^{r,s}_{l,6,1}\big)+5(d-1)C_{4,2}\Delta\big(2\rho^{r,k}_{l,6,1}+\rho^{r,s}_{l,6,1}\big)+10(d-1)^2C_{3,3}\Delta\big(\rho^{r,k}_{l,6,1}+\rho^{r,s}_{l,6,1}\big)+5(d-1)^3C_{2,4}\Delta\big(\rho^{r,k}_{l,6,1}+2\rho^{r,s}_{l,6,1}\big)+(d-1)^4C_{1,5}\Delta\big(\rho^{r,k}_{l,6,1}+5\rho^{r,s}_{l,6,1}\big)+(d-1)^5B\Delta\,\rho^{r,s}_{l,6,1}\Big).$$
This gives us
$$\rho^r_{l+1,6,1}\le(1+A\Delta)\rho^r_{l,6,1}+6(d-1)C_{5,1}\Delta\,\rho^r_{l,6,1}+15(d-1)^2C_{4,2}\Delta\,\rho^r_{l,6,1}+20(d-1)^3C_{3,3}\Delta\,\rho^r_{l,6,1}+15(d-1)^4C_{2,4}\Delta\,\rho^r_{l,6,1}+6(d-1)^5C_{1,5}\Delta\,\rho^r_{l,6,1}+(d-1)^6B\Delta\,\rho^r_{l,6,1}.$$
Hence, we obtain $\rho^r_{l+1,6,1}\le(1+\kappa_1\Delta)\rho^r_{l,6,1}$ for some constant $\kappa_1>0$, leading to
$$\rho^r_{l,6,1}\le(1+\kappa_1\Delta)^{l-j-1}\rho^r_{j+1,6,1},\qquad l=j+1,\dots,J,$$
where $\rho^r_{j+1,6,1}=\sum_{s=1}^dE\big[\big(\frac{\partial}{\partial x_r}\Phi^s_\Delta(x,\xi_{j+1})\big)^6\big]$.

Moreover, we have
$$E\bigg[\Big(\frac{\partial}{\partial x_r}G^k_{l+1,j}(x)\Big)^8\bigg]\le E\bigg[\gamma_k^8+\sum_{s\colon s\ne k}\big(8\gamma_k^7\gamma_s+28(d-1)\gamma_k^6\gamma_s^2+56(d-1)^2\gamma_k^5\gamma_s^3+70(d-1)^3\gamma_k^4\gamma_s^4+56(d-1)^4\gamma_k^3\gamma_s^5+28(d-1)^5\gamma_k^2\gamma_s^6+8(d-1)^6\gamma_k\gamma_s^7+(d-1)^7\gamma_s^8\big)\bigg]$$
and thus, due to $a^7b\le\frac{7a^8+b^8}8$, $a^6b^2\le\frac{3a^8+b^8}4$, $a^5b^3\le\frac{5a^8+3b^8}8$ and $a^4b^4\le\frac{a^8+b^8}2$,
$$\rho^{r,k}_{l+1,8,1}\le(1+A\Delta)\rho^{r,k}_{l,8,1}+\sum_{s\colon s\ne k}\Big(C_{7,1}\Delta\big(7\rho^{r,k}_{l,8,1}+\rho^{r,s}_{l,8,1}\big)+7(d-1)C_{6,2}\Delta\big(3\rho^{r,k}_{l,8,1}+\rho^{r,s}_{l,8,1}\big)+7(d-1)^2C_{5,3}\Delta\big(5\rho^{r,k}_{l,8,1}+3\rho^{r,s}_{l,8,1}\big)+35(d-1)^3C_{4,4}\Delta\big(\rho^{r,k}_{l,8,1}+\rho^{r,s}_{l,8,1}\big)+7(d-1)^4C_{3,5}\Delta\big(3\rho^{r,k}_{l,8,1}+5\rho^{r,s}_{l,8,1}\big)+7(d-1)^5C_{2,6}\Delta\big(\rho^{r,k}_{l,8,1}+3\rho^{r,s}_{l,8,1}\big)+(d-1)^6C_{1,7}\Delta\big(\rho^{r,k}_{l,8,1}+7\rho^{r,s}_{l,8,1}\big)+(d-1)^7B\Delta\,\rho^{r,s}_{l,8,1}\Big).$$
This gives us
$$\rho^r_{l+1,8,1}\le(1+A\Delta)\rho^r_{l,8,1}+8(d-1)C_{7,1}\Delta\,\rho^r_{l,8,1}+28(d-1)^2C_{6,2}\Delta\,\rho^r_{l,8,1}+56(d-1)^3C_{5,3}\Delta\,\rho^r_{l,8,1}+70(d-1)^4C_{4,4}\Delta\,\rho^r_{l,8,1}+56(d-1)^5C_{3,5}\Delta\,\rho^r_{l,8,1}+28(d-1)^6C_{2,6}\Delta\,\rho^r_{l,8,1}+8(d-1)^7C_{1,7}\Delta\,\rho^r_{l,8,1}+(d-1)^8B\Delta\,\rho^r_{l,8,1}.$$
Hence, we obtain $\rho^r_{l+1,8,1}\le(1+\kappa_1\Delta)\rho^r_{l,8,1}$ for some constant $\kappa_1>0$, leading to
$$\rho^r_{l,8,1}\le(1+\kappa_1\Delta)^{l-j-1}\rho^r_{j+1,8,1},\qquad l=j+1,\dots,J,$$
where $\rho^r_{j+1,8,1}=\sum_{s=1}^dE\big[\big(\frac{\partial}{\partial x_r}\Phi^s_\Delta(x,\xi_{j+1})\big)^8\big]$.
Moreover, we have
$$E\bigg[\Big(\frac{\partial^2}{\partial x_r\partial x_o}G^k_{l+1,j}(x)\Big)^4\bigg]\le E\bigg[\eta_{1,k}^4+\sum_{s\colon s\ne k}\big(4\eta_{1,k}^3\eta_{1,s}+6(d-1)\eta_{1,k}^2\eta_{1,s}^2+4(d-1)^2\eta_{1,k}\eta_{1,s}^3+(d-1)^3\eta_{1,s}^4\big)+\sum_{s,u,v=1}^d\big(4d^2\eta_{1,v}^3\eta_{2,s,u}+6d^3\eta_{1,v}^2\eta_{2,s,u}^2+4d^4\eta_{1,v}\eta_{2,s,u}^3\big)+d^6\sum_{s,u=1}^d\eta_{2,s,u}^4\bigg]$$
and thus, due to $a^3bc\le\frac34a^4+\frac18\big(b^8+c^8\big)$, $a^2b^2c^2\le\frac12a^4+\frac14\big(b^8+c^8\big)$ and $ab^3c^3\le\frac14a^4+\frac38\big(b^8+c^8\big)$,
$$\rho^{r,o,k}_{l+1,4,2}\le(1+A\Delta)\rho^{r,o,k}_{l,4,2}+\sum_{s\colon s\ne k}\Big(C_{3,1}\Delta\big(3\rho^{r,o,k}_{l,4,2}+\rho^{r,o,s}_{l,4,2}\big)+3(d-1)C_{2,2}\Delta\big(\rho^{r,o,k}_{l,4,2}+\rho^{r,o,s}_{l,4,2}\big)+(d-1)^2C_{1,3}\Delta\big(\rho^{r,o,k}_{l,4,2}+3\rho^{r,o,s}_{l,4,2}\big)+(d-1)^3B\Delta\,\rho^{r,o,s}_{l,4,2}\Big)$$
$$+\sum_{s,u,v=1}^d\bigg(d^2C_{3,1}\Delta\Big(3\rho^{r,o,v}_{l,4,2}+\frac12\big(\rho^{r,s}_{l,8,1}+\rho^{o,u}_{l,8,1}\big)\Big)+3d^3C_{2,2}\Delta\Big(\rho^{r,o,v}_{l,4,2}+\frac12\big(\rho^{r,s}_{l,8,1}+\rho^{o,u}_{l,8,1}\big)\Big)+d^4C_{1,3}\Delta\Big(\rho^{r,o,v}_{l,4,2}+\frac32\big(\rho^{r,s}_{l,8,1}+\rho^{o,u}_{l,8,1}\big)\Big)\bigg)+d^6\sum_{s,u=1}^dB\Delta\,\frac12\big(\rho^{r,s}_{l,8,1}+\rho^{o,u}_{l,8,1}\big).$$
This gives us
$$\rho^{r,o}_{l+1,4,2}\le(1+A\Delta)\rho^{r,o}_{l,4,2}+4(d-1)C_{3,1}\Delta\,\rho^{r,o}_{l,4,2}+6(d-1)^2C_{2,2}\Delta\,\rho^{r,o}_{l,4,2}+4(d-1)^3C_{1,3}\Delta\,\rho^{r,o}_{l,4,2}+(d-1)^4B\Delta\,\rho^{r,o}_{l,4,2}+d^5C_{3,1}\Delta\Big(3\rho^{r,o}_{l,4,2}+\frac12\big(\rho^r_{l,8,1}+\rho^o_{l,8,1}\big)\Big)+3d^6C_{2,2}\Delta\Big(\rho^{r,o}_{l,4,2}+\frac12\big(\rho^r_{l,8,1}+\rho^o_{l,8,1}\big)\Big)+d^7C_{1,3}\Delta\Big(\rho^{r,o}_{l,4,2}+\frac32\big(\rho^r_{l,8,1}+\rho^o_{l,8,1}\big)\Big)+d^8B\Delta\,\frac12\big(\rho^r_{l,8,1}+\rho^o_{l,8,1}\big).$$
Hence, we obtain $\rho^{r,o}_{l+1,4,2}\le(1+\kappa_2\Delta)\rho^{r,o}_{l,4,2}+\kappa_3\Delta$ for some constants $\kappa_2,\kappa_3>0$, leading to
$$\rho^{r,o}_{l,4,2}\le(1+\kappa_2\Delta)^{l-j-1}\rho^{r,o}_{j+1,4,2}+\kappa_4,\qquad l=j+1,\dots,J,$$
where $\kappa_4>0$ and $\rho^{r,o}_{j+1,4,2}=\sum_{s=1}^dE\big[\big(\frac{\partial^2}{\partial x_r\partial x_o}\Phi^s_\Delta(x,\xi_{j+1})\big)^4\big]$.

Next, we have for some $k,o,r,z\in\{1,\dots,d\}$
$$\frac{\partial^3}{\partial x_r\partial x_o\partial x_z}G^k_{l+1,j}(x)=\sum_{s=1}^d\frac{\partial}{\partial x_s}\Phi^k_{\Delta,l+1}(G_{l,j}(x))\,\frac{\partial^3}{\partial x_r\partial x_o\partial x_z}G^s_{l,j}(x)+\sum_{s,u=1}^d\frac{\partial^2}{\partial x_s\partial x_u}\Phi^k_{\Delta,l+1}(G_{l,j}(x))\Big(\frac{\partial^2G^s_{l,j}}{\partial x_r\partial x_o}\frac{\partial G^u_{l,j}}{\partial x_z}+\frac{\partial^2G^s_{l,j}}{\partial x_r\partial x_z}\frac{\partial G^u_{l,j}}{\partial x_o}+\frac{\partial G^s_{l,j}}{\partial x_r}\frac{\partial^2G^u_{l,j}}{\partial x_o\partial x_z}\Big)+\sum_{s,u,v=1}^d\frac{\partial^3}{\partial x_s\partial x_u\partial x_v}\Phi^k_{\Delta,l+1}(G_{l,j}(x))\,\frac{\partial G^s_{l,j}}{\partial x_r}\frac{\partial G^u_{l,j}}{\partial x_o}\frac{\partial G^v_{l,j}}{\partial x_z}=:\sum_{s=1}^d\psi_{1,s}+\sum_{s,u=1}^d\psi_{2,s,u}+\sum_{s,u,v=1}^d\psi_{3,s,u,v}$$
and $\frac{\partial^3}{\partial x_r\partial x_o\partial x_z}G^s_{j+1,j}(x)=\frac{\partial^3}{\partial x_r\partial x_o\partial x_z}\Phi^s_\Delta(x,\xi_{j+1})$. Hence
$$E\bigg[\Big(\frac{\partial^3}{\partial x_r\partial x_o\partial x_z}G^k_{l+1,j}(x)\Big)^2\bigg]\le E\bigg[\psi_{1,k}^2+\sum_{s\colon s\ne k}\big(2\psi_{1,k}\psi_{1,s}+(d-1)\psi_{1,s}^2\big)+2\sum_{s,u,v=1}^d\psi_{1,v}\psi_{2,s,u}+2\sum_{s,u,v,w=1}^d\psi_{1,w}\psi_{3,s,u,v}+2d^2\sum_{s,u=1}^d\psi_{2,s,u}^2+2d^3\sum_{s,u,v=1}^d\psi_{3,s,u,v}^2\bigg].$$
Denote $\rho^{r,o,z,s}_{l+1,n,3}:=E\big[\big(\frac{\partial^3}{\partial x_r\partial x_o\partial x_z}G^s_{l+1,j}(x)\big)^n\big]$; then we get, due to $a^2b^2c^2\le\frac13\big(a^6+b^6+c^6\big)$ and
$$E[XYZU]\le\sqrt{E[X^2]}\,\sqrt[6]{E[Y^6]}\,\sqrt[6]{E[Z^6]}\,\sqrt[6]{E[U^6]}\le\frac12E[X^2]+\frac12\big(E[Y^6]E[Z^6]E[U^6]\big)^{1/3}\le\frac12E[X^2]+\frac16\big(E[Y^6]+E[Z^6]+E[U^6]\big),$$
as well as (1.5) and (7.1),
$$\rho^{r,o,z,k}_{l+1,2,3}\le(1+A\Delta)\rho^{r,o,z,k}_{l,2,3}+\sum_{s\colon s\ne k}\Big(C_{1,1}\Delta\big(\rho^{r,o,z,k}_{l,2,3}+\rho^{r,o,z,s}_{l,2,3}\big)+(d-1)B\Delta\,\rho^{r,o,z,s}_{l,2,3}\Big)+\sum_{s,u,v=1}^dC_{1,2}\Delta\Big(\rho^{r,o,z,v}_{l,2,3}+\frac12\big(\rho^{r,s}_{l,4,1}+\rho^{o,u}_{l,4,1}+\rho^{z,u}_{l,4,1}+\rho^{r,o,s}_{l,4,2}+\rho^{r,z,s}_{l,4,2}+\rho^{o,z,u}_{l,4,2}\big)\Big)+\sum_{s,u,v,w=1}^dC_{1,3}\Delta\Big(\rho^{r,o,z,w}_{l,2,3}+\frac13\big(\rho^{r,s}_{l,6,1}+\rho^{o,u}_{l,6,1}+\rho^{z,v}_{l,6,1}\big)\Big)+3d^2\sum_{s,u=1}^dB\Delta\big(\rho^{r,s}_{l,4,1}+\rho^{o,u}_{l,4,1}+\rho^{z,u}_{l,4,1}+\rho^{r,o,s}_{l,4,2}+\rho^{r,z,s}_{l,4,2}+\rho^{o,z,u}_{l,4,2}\big)+2d^3\sum_{s,u,v=1}^dB\Delta\,\frac13\big(\rho^{r,s}_{l,6,1}+\rho^{o,u}_{l,6,1}+\rho^{z,v}_{l,6,1}\big).$$
Further, denote $\rho^{r,o,z}_{l+1,2,3}:=\sum_{s=1}^d\rho^{r,o,z,s}_{l+1,2,3}$; then we get
$$\rho^{r,o,z}_{l+1,2,3}\le(1+A\Delta)\rho^{r,o,z}_{l,2,3}+2(d-1)C_{1,1}\Delta\,\rho^{r,o,z}_{l,2,3}+(d-1)^2B\Delta\,\rho^{r,o,z}_{l,2,3}+d^3C_{1,2}\Delta\Big(\rho^{r,o,z}_{l,2,3}+\frac12\big(\rho^r_{l,4,1}+\rho^o_{l,4,1}+\rho^z_{l,4,1}+\rho^{r,o}_{l,4,2}+\rho^{r,z}_{l,4,2}+\rho^{o,z}_{l,4,2}\big)\Big)+d^4C_{1,3}\Delta\Big(\rho^{r,o,z}_{l,2,3}+\frac13\big(\rho^r_{l,6,1}+\rho^o_{l,6,1}+\rho^z_{l,6,1}\big)\Big)+3d^4B\Delta\big(\rho^r_{l,4,1}+\rho^o_{l,4,1}+\rho^z_{l,4,1}+\rho^{r,o}_{l,4,2}+\rho^{r,z}_{l,4,2}+\rho^{o,z}_{l,4,2}\big)+2d^6B\Delta\,\frac13\big(\rho^r_{l,6,1}+\rho^o_{l,6,1}+\rho^z_{l,6,1}\big).$$
This gives us $\rho^{r,o,z}_{l+1,2,3}\le(1+\kappa_2\Delta)\rho^{r,o,z}_{l,2,3}+\kappa_3\Delta$ for some constants $\kappa_2,\kappa_3>0$, leading to
$$\rho^{r,o,z}_{l,2,3}\le(1+\kappa_2\Delta)^{l-j-1}\rho^{r,o,z}_{j+1,2,3}+\kappa_4,\qquad l=j+1,\dots,J,$$
where $\kappa_4>0$ and $\rho^{r,o,z}_{j+1,2,3}=\sum_{s=1}^dE\big[\big(\frac{\partial^3}{\partial x_r\partial x_o\partial x_z}\Phi^s_\Delta(x,\xi_{j+1})\big)^2\big]$. Thus, we obtain the boundedness of
$$\Big|\frac{\partial^3}{\partial x_r\partial x_o\partial x_z}q_j(x)\Big|\le\sum_{s=1}^dE\Big|\frac{\partial f}{\partial x_s}(G_{J,j}(x))\,\frac{\partial^3}{\partial x_r\partial x_o\partial x_z}G^s_{J,j}(x)\Big|+\sum_{s,u=1}^dE\Big|\frac{\partial^2f}{\partial x_s\partial x_u}(G_{J,j}(x))\Big(\frac{\partial^2G^s_{J,j}}{\partial x_r\partial x_o}\frac{\partial G^u_{J,j}}{\partial x_z}+\frac{\partial^2G^s_{J,j}}{\partial x_r\partial x_z}\frac{\partial G^u_{J,j}}{\partial x_o}+\frac{\partial G^s_{J,j}}{\partial x_r}\frac{\partial^2G^u_{J,j}}{\partial x_o\partial x_z}\Big)\Big|+\sum_{s,u,v=1}^dE\Big|\frac{\partial^3f}{\partial x_s\partial x_u\partial x_v}(G_{J,j}(x))\,\frac{\partial G^s_{J,j}}{\partial x_r}\frac{\partial G^u_{J,j}}{\partial x_o}\frac{\partial G^v_{J,j}}{\partial x_z}\Big|$$
$$\le\sum_{s=1}^d\sqrt{E\Big[\Big(\frac{\partial f}{\partial x_s}(G_{J,j}(x))\Big)^2\Big]\rho^{r,o,z,s}_{J,2,3}}+\sum_{s,u=1}^d\sqrt{E\Big[\Big(\frac{\partial^2f}{\partial x_s\partial x_u}(G_{J,j}(x))\Big)^2\Big]}\Big(\big(\rho^{r,o,s}_{J,4,2}\,\rho^{z,u}_{J,4,1}\big)^{1/4}+\big(\rho^{r,z,s}_{J,4,2}\,\rho^{o,u}_{J,4,1}\big)^{1/4}+\big(\rho^{r,s}_{J,4,1}\,\rho^{o,z,u}_{J,4,2}\big)^{1/4}\Big)+\sum_{s,u,v=1}^d\sqrt{E\Big[\Big(\frac{\partial^3f}{\partial x_s\partial x_u\partial x_v}(G_{J,j}(x))\Big)^2\Big]}\,\big(\rho^{r,s}_{J,6,1}\,\rho^{o,u}_{J,6,1}\,\rho^{z,v}_{J,6,1}\big)^{1/6}$$
for all $r,o,z\in\{1,\dots,d\}$, since $f$ is assumed to be three times continuously differentiable with bounded partial derivatives up to order 3.

7.1. Proof of Theorem 3.1. (i) Straightforward.

(ii) Let us define $\mu_\Delta(x):=x+\mu(x)\Delta$. Then we obtain via Taylor's theorem (cf. (2.2))
$$q_j(\Phi_\Delta(x,y))=q_j(\mu_\Delta(x))+\sqrt\Delta\sum_{k=1}^d\sum_{i=1}^m\sigma^{ki}(x)\,y^i\int_0^1\frac{\partial q_j}{\partial x_k}\big(\mu_\Delta(x)+t\sigma(x)\sqrt\Delta\,y\big)\,dt.$$
This gives us (see (2.5))
$$a_{j,r,s}(x)=\frac1{2^m}\sum_{y\in\{-1,1\}^m}q_j(\Phi_\Delta(x,y))\prod_{o=1}^ry^{s_o}=\frac{\sqrt\Delta}{2^m}\sum_{y\in\{-1,1\}^m}\Big(\prod_{o=1}^ry^{s_o}\Big)\sum_{k=1}^d\sum_{i=1}^m\sigma^{ki}(x)\,y^i\int_0^1\frac{\partial q_j}{\partial x_k}\big(\mu_\Delta(x)+t\sigma(x)\sqrt\Delta\,y\big)\,dt,\tag{7.3}$$
since
$$\frac1{2^m}\sum_{y\in\{-1,1\}^m}\prod_{o=1}^ry^{s_o}=E\bigg[\prod_{o=1}^r\xi^{s_o}_j\bigg]=0.$$
Next we apply Theorem 1.1 for the case $K=1$ to get that all the functions $q_j$ are continuously differentiable with bounded partial derivatives. Clearly, the assumptions in this theorem hold when all the functions $f,\mu^k,\sigma^{ki}$, $k\in\{1,\dots,d\}$, $i\in\{1,\dots,m\}$, are continuously differentiable with bounded derivatives. Together with the assumption that all the functions $\sigma^{ki}$ are bounded, we get from (7.3) that $a_{j,r,s}$ is of order $\sqrt\Delta$ for all $j,r,s$.

7.2. Proof of Theorem 3.2. Let us consider a higher order Taylor expansion compared to the proof of Theorem 3.1 and recall that $\mu_\Delta(x)=x+\mu(x)\Delta$. We have for any $y\in\{-1,1\}^m$
$$q_j(\Phi_\Delta(x,y))=q_j(\mu_\Delta(x))+\sqrt\Delta\sum_{k=1}^d\frac{\partial}{\partial x_k}q_j(\mu_\Delta(x))\sum_{i=1}^m\sigma^{ki}(x)\,y^i+\Delta\sum_{1\le k\le l\le d}(2-\delta_{k,l})\int_0^1(1-t)\frac{\partial^2}{\partial x_k\partial x_l}q_j\big(\mu_\Delta(x)+t\sqrt\Delta\,\sigma(x)y\big)\,dt\,\sum_{i=1}^m\sigma^{ki}(x)\,y^i\sum_{i=1}^m\sigma^{li}(x)\,y^i,\tag{7.4}$$
where $\delta_{\cdot,\cdot}$ is the Kronecker delta. This gives us for $r\ge2$ (cf. (2.5))
$$a_{j,r,s}(x)=\frac1{2^m}\sum_{y\in\{-1,1\}^m}q_j(\Phi_\Delta(x,y))\prod_{o=1}^ry^{s_o}=\frac\Delta{2^m}\sum_{1\le k\le l\le d}(2-\delta_{k,l})\sum_{y\in\{-1,1\}^m}\bigg(\sum_{i=1}^m\sigma^{ki}(x)\,y^i\sum_{i=1}^m\sigma^{li}(x)\,y^i\prod_{o=1}^ry^{s_o}\int_0^1(1-t)\frac{\partial^2}{\partial x_k\partial x_l}q_j\big(\mu_\Delta(x)+t\sqrt\Delta\,\sigma(x)y\big)\,dt\bigg),\tag{7.5}$$
due to (cf. (7.4))
$$\frac1{2^m}\sum_{y\in\{-1,1\}^m}y^i\prod_{o=1}^ry^{s_o}=E\bigg[\xi^i_j\prod_{o=1}^r\xi^{s_o}_j\bigg]=0\tag{7.6}$$
for all $i\in\{1,\dots,m\}$. (Note that (7.6) does not hold for $r=1$.) Applying Theorem 1.1 (case $K=2$), we get that $q_j$ is twice continuously differentiable with bounded partial derivatives up to order 2, provided that all the functions $f,\mu^k,\sigma^{ki}$ are twice continuously differentiable with bounded partial derivatives up to order 2. Together with the assumption that all the functions $\sigma^{ki}$ are bounded, we get from (7.5) that $a_{j,r,s}$ is of order $\Delta$ for all $j,r,s$ with $r\ge2$.

7.3. Proof of Theorem 3.4. Here we apply Theorem 3.2, which gives us (cf. (2.6))
$$\operatorname{Var}\big[f(X_{\Delta,T})-M^{(1),\mathrm{trunc}}_{\Delta,T}\big]=\operatorname{Var}\big[M^{(1)}_{\Delta,T}-M^{(1),\mathrm{trunc}}_{\Delta,T}\big]=\operatorname{Var}\bigg[\sum_{j=1}^J\sum_{r=2}^m\sum_{1\le s_1<\dots<s_r\le m}a_{j,r,s}(X_{\Delta,(j-1)\Delta})\prod_{o=1}^r\xi^{s_o}_j\bigg]=\sum_{j=1}^J\sum_{r=2}^m\sum_{1\le s_1<\dots<s_r\le m}E\big[a^2_{j,r,s}(X_{\Delta,(j-1)\Delta})\big]\lesssim J\Delta^2\asymp\Delta,$$
since, by Theorem 3.2, all coefficients $a_{j,r,s}$ with $r\ge2$ are of order $\Delta$.

7.4. Proof of Theorem 3.5. The proof works similarly to the one of Theorem 3.1. More precisely, here we define (cf. (2.8))
$$\mu_\Delta(x):=x+\mu(x)\Delta+\frac12L\mu(x)\Delta^2.$$
Then we derive the zero-order Taylor expansion for $q_j(\Phi_\Delta(x,y,z))$ around $\mu_\Delta(x)$, use that
$$E\bigg[\prod_{r\in U_1}H_{o_r}(\xi^r_j)\prod_{(k,l)\in U_2}V^{kl}_j\bigg]=0,$$
and observe that all components $\widetilde\Phi^k_\Delta(x,y,z):=\Phi^k_\Delta(x,y,z)-\mu^k_\Delta(x)$, $k\in\{1,\dots,d\}$ (as an analogue of $\sqrt\Delta\sum_{i=1}^m\sigma^{ki}(x)y^i$ in the case of the weak Euler scheme), are of order $\sqrt\Delta$ under less strict assumptions than required in the present theorem.
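The $\sqrt\Delta$-scaling of the first-order coefficients established above can be checked numerically in a scalar toy example. All concrete choices below ($q=\sin$, $\mu(x)=-x$, $\sigma\equiv1$, the point $x_0=0.3$) are hypothetical illustrations, not objects from the paper; the coefficient is computed exactly as the average over $y\in\{-1,1\}$:

```python
import math

# One-dimensional illustration of the sqrt(Delta) order of the regression
# coefficients (hypothetical choices: q = sin, mu(x) = -x, sigma = 1).
def phi(x, dt, y):                       # weak Euler step, y in {-1, +1}
    return x + (-x) * dt + math.sqrt(dt) * y

def coeff(x, dt, q=math.sin):            # a(x) = E[q(Phi_Delta(x, xi)) * xi]
    return 0.5 * (q(phi(x, dt, 1.0)) - q(phi(x, dt, -1.0)))

x0 = 0.3
ratios = [coeff(x0, 2.0 ** -k) / math.sqrt(2.0 ** -k) for k in (4, 8, 12, 16)]
print([round(r, 6) for r in ratios])     # stabilises near sigma * q'(x0) = cos(0.3)
```

Here $a(x,\Delta)=\cos(x-x\Delta)\sin(\sqrt\Delta)$ in closed form, so $a/\sqrt\Delta\to\cos(x)$ as $\Delta\to0$, in line with the $\sqrt\Delta$ order of $a_{j,r,s}$.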
Finally we apply Theorem 1.1 (case $K=1$), which gives us that $q_j$ is continuously differentiable with bounded partial derivatives under the assumptions that all functions $\mu^k$ and $\sigma^{ki}$ are bounded and all the functions $f,\mu^k,\sigma^{ki}$ are three times continuously differentiable with bounded partial derivatives up to order 3. Consequently, all the functions $a_{j,o,U_1,U_2}$ are of order $\sqrt\Delta$.

7.5. Proof of Theorem 3.6. (i) The proof works similarly to the one of Theorem 3.2, that is, we consider a Taylor expansion for $q_j(\Phi_\Delta(x,y,z))$ of order 1, around the same point $\mu_\Delta(x)$ as in the proof of Theorem 3.5. Then we use
$$E\bigg[\widetilde\Phi^k_\Delta(x,\xi_j,V_j)\prod_{r\in U_1}H_{o_r}(\xi^r_j)\prod_{(k',l')\in U_2}V^{k'l'}_j\bigg]=0,\qquad k\in\{1,\dots,d\},$$
whenever $|U_1|+|\mathcal K_1|+|\mathcal K_2|\ge2$ (where again $\widetilde\Phi^k_\Delta(x,y,z)=\Phi^k_\Delta(x,y,z)-\mu^k_\Delta(x)$). Then we apply Theorem 1.1 (case $K=2$), which gives us that $q_j$ is twice continuously differentiable with bounded partial derivatives up to order 2 under the assumptions that all functions $\mu^k$ and $\sigma^{ki}$ are bounded and all the functions $f,\mu^k,\sigma^{ki}$ are four times continuously differentiable with bounded partial derivatives up to order 4. Finally, we get that all the functions $a_{j,o,U_1,U_2}$ are of order $\Delta$, since the product of any two functions $\widetilde\Phi^k_\Delta(x,y,z)\widetilde\Phi^l_\Delta(x,y,z)$, $k,l\in\{1,\dots,d\}$, is of order $\Delta$ under the above assumptions.

(ii) Here we consider the Taylor expansion of order 2, that is,
$$q_j(\Phi_\Delta(x,y,z))=q_j(\mu_\Delta(x))+\sum_{k=1}^d\frac{\partial}{\partial x_k}q_j(\mu_\Delta(x))\,\widetilde\Phi^k_\Delta(x,y,z)+\sum_{1\le k\le l\le d}\frac12(2-\delta_{k,l})\frac{\partial^2}{\partial x_k\partial x_l}q_j(\mu_\Delta(x))\,\widetilde\Phi^k_\Delta(x,y,z)\widetilde\Phi^l_\Delta(x,y,z)+\sum_{1\le k\le l\le n\le d}\Big(3-\frac32\big(\delta_{k,l}+\delta_{k,n}+\delta_{l,n}\big)+2\delta_{k,l}\delta_{k,n}\delta_{l,n}\Big)\widetilde\Phi^k_\Delta(x,y,z)\widetilde\Phi^l_\Delta(x,y,z)\widetilde\Phi^n_\Delta(x,y,z)\int_0^1(1-t)^2\frac{\partial^3}{\partial x_k\partial x_l\partial x_n}q_j\big(\mu_\Delta(x)+t\widetilde\Phi_\Delta(x,y,z)\big)\,dt.$$
Next we use
$$E\bigg[\widetilde\Phi^k_\Delta(x,\xi_j,V_j)\,\widetilde\Phi^l_\Delta(x,\xi_j,V_j)\prod_{r\in U_1}H_{o_r}(\xi^r_j)\prod_{(k',l')\in U_2}V^{k'l'}_j\bigg]=0,\qquad k,l\in\{1,\dots,d\},$$
whenever $|U_1|+|\mathcal K_1|+|\mathcal K_2|>2$, and thus we obtain (cf. (2.11))
$$a_{j,o,U_1,U_2}(x)=\frac1{6^m\,2^{m(m-1)/2}}\sum_{y\in\{-\sqrt3,0,\sqrt3\}^m}\ \sum_{z\in\{-1,1\}^{m(m-1)/2}}4^{\sum_{i=1}^mI(y_i=0)}\prod_{r\in U_1}H_{o_r}(y^r)\prod_{(k,l)\in U_2}z^{kl}\;q_j(\Phi_\Delta(x,y,z))$$
$$=\frac1{6^m\,2^{m(m-1)/2}}\sum_{y\in\{-\sqrt3,0,\sqrt3\}^m}\ \sum_{z\in\{-1,1\}^{m(m-1)/2}}4^{\sum_{i=1}^mI(y_i=0)}\prod_{r\in U_1}H_{o_r}(y^r)\prod_{(k,l)\in U_2}z^{kl}\sum_{1\le k'\le l'\le n'\le d}\Big(3-\frac32\big(\delta_{k',l'}+\delta_{k',n'}+\delta_{l',n'}\big)+2\delta_{k',l'}\delta_{k',n'}\delta_{l',n'}\Big)\widetilde\Phi^{k'}_\Delta(x,y,z)\widetilde\Phi^{l'}_\Delta(x,y,z)\widetilde\Phi^{n'}_\Delta(x,y,z)\int_0^1(1-t)^2\frac{\partial^3}{\partial x_{k'}\partial x_{l'}\partial x_{n'}}q_j\big(\mu_\Delta(x)+t\widetilde\Phi_\Delta(x,y,z)\big)\,dt.$$
Then we apply Theorem 1.1 (case $K=3$), which gives us that $q_j$ is three times continuously differentiable with bounded partial derivatives up to order 3 under the assumptions that all functions $\mu^k$ and $\sigma^{ki}$ are bounded and all the functions $f,\mu^k,\sigma^{ki}$ are five times continuously differentiable with bounded partial derivatives up to order 5. Finally, we get that all the functions $a_{j,o,U_1,U_2}$ are of order $\Delta^{3/2}$, since the product of any three functions $\widetilde\Phi^k_\Delta(x,y,z)\widetilde\Phi^l_\Delta(x,y,z)\widetilde\Phi^n_\Delta(x,y,z)$, $k,l,n\in\{1,\dots,d\}$, is of order $\Delta^{3/2}$ under the above assumptions.

7.6. Proof of Theorem 3.8. The proof is similar to the one of Theorem 3.4.

7.7. Proof of Lemma 5.2. We refer to Theorem 11.3 in [6]. When applying it, we obtain actually
$$E\|\tilde a_{j,o,U_1,U_2}-a_{j,o,U_1,U_2}\|^2_{L^2(P_{\Delta,(j-1)\Delta})}\le\tilde c\max\big\{\Sigma,\;\tilde A\Delta_{U_1,U_2}\big\}\frac{(\log N_r+1)\,Q}{N_r}+8C_\kappa Q^{-\kappa}.\tag{7.7}$$
However, the maximum in (7.7) is in fact a sum of the two terms $\Sigma$ and $\tilde A\Delta_{U_1,U_2}(\log N_r+1)$, so that the logarithm is only included in one term (see the proof of Theorem 11.3 in [6]).

7.8. Proof of Theorem 5.3.
Using the martingale transform structure in (2.12) and (3.5) (recall footnote 1 on page 4) together with the orthonormality of the system $\prod_{r\in U_1}H_{o_r}(\xi^r_j)\prod_{(k,l)\in U_2}V^{kl}_j$, we get by (3.6) and (5.2)
$$\operatorname{Var}\big[f(X_{\Delta,T})-\widetilde M^{(2),\mathrm{trunc}}_{\Delta,T}\big]=\operatorname{Var}\big[f(X_{\Delta,T})-M^{(2),\mathrm{trunc}}_{\Delta,T}\big]+\operatorname{Var}\big[M^{(2),\mathrm{trunc}}_{\Delta,T}-\widetilde M^{(2),\mathrm{trunc}}_{\Delta,T}\big]\lesssim\Delta^2+\sum_{j=1}^J\ \sum_{\substack{(U_1,U_2)\in\mathcal A\\|U_1|+|\mathcal K_1|+|\mathcal K_2|\le2}}\ \sum_{o\in\{1,2\}^{U_1}}E\|\tilde a_{j,o,U_1,U_2}-a_{j,o,U_1,U_2}\|^2_{L^2(P_{\Delta,(j-1)\Delta})}\le\Delta^2+Jm(m+1)\bigg(\tilde c\,\big(\Sigma+\tilde A\Delta(\log N_r+1)\big)\frac Q{N_r}+8C_\kappa Q^{-\kappa}\bigg),$$
since $\Delta_{U_1,U_2}\le\Delta$.

7.9. Proof of Theorem 5.4. The proof is similar to the complexity analysis performed in [3].

References

[1] D. Belomestny, S. Häfner, T. Nagapetyan, and M. Urusov. Variance reduction for discretised diffusions via regression. Preprint, arXiv:1510.03141v3, 2016.
[2] D. Belomestny, S. Häfner, and M. Urusov. Regression-based complexity reduction of the dual nested Monte Carlo methods. Preprint, arXiv:1611.06344, 2016.
[3] D. Belomestny, S. Häfner, and M. Urusov. Regression-based variance reduction approach for strong approximation schemes. Preprint, arXiv:1612.03407v2, 2017.
[4] D. Belomestny, S. Häfner, and M. Urusov. Stratified regression-based variance reduction approach for weak approximation schemes. Preprint, arXiv:1612.05255v2, 2017.
[5] M. B. Giles. Multilevel Monte Carlo path simulation. Operations Research, 56(3):607-617, 2008.
[6] L. Györfi, M. Kohler, A. Krzyżak, and H. Walk. A Distribution-Free Theory of Nonparametric Regression. Springer Series in Statistics. Springer-Verlag, New York, 2002.
[7] P. Kloeden and E. Platen. Numerical Solution of Stochastic Differential Equations, volume 23. Springer, 1992.
[8] G. N. Milstein and M. V. Tretyakov. Practical variance reduction via regression for simulating diffusions. SIAM Journal on Numerical Analysis, 47(2):887-910, 2009.
[9] T. Müller-Gronbach, K. Ritter, and L. Yaroslavtseva. On the complexity of computing quadrature formulas for marginal distributions of SDEs. Journal of Complexity, 31(1):110-145, 2015.
[10] T. Müller-Gronbach and L. Yaroslavtseva. Deterministic quadrature formulas for SDEs based on simplified weak Itô-Taylor steps. Foundations of Computational Mathematics, 16(5):1325-1366, 2016.
[11] N. J. Newton. Variance reduction for simulated diffusions. SIAM Journal on Applied Mathematics, 54(6):1780-1805, 1994.

University of Duisburg-Essen, Essen, Germany and IITP RAS, Moscow, Russia
E-mail address: [email protected]

PricewaterhouseCoopers GmbH, Frankfurt, Germany
E-mail address: [email protected]

University of Duisburg-Essen, Essen, Germany
E-mail address: [email protected]