Convex order, quantization and monotone approximations of ARCH models
Benjamin Jourdain∗‡  Gilles Pagès†‡

Abstract
We are interested in proposing approximations of a sequence of probability measures in the convex order by finitely supported probability measures still in the convex order. We propose to alternate transitions according to a martingale Markov kernel mapping a probability measure in the sequence to the next one and dual quantization steps. In the case of ARCH models, and in particular of the Euler scheme of a driftless Brownian diffusion, the noise has to be truncated to enable the dual quantization step. We analyze the error between the original ARCH model and its approximation with truncated noise, and exhibit conditions under which the latter is dominated by the former in the convex order at the level of sample-paths. Last, we analyze the error of the scheme combining the dual quantization steps with truncation of the noise according to primal quantization.
AMS Subject Classification (2010):

1 Introduction
For d ∈ N*, and µ, ν in the set P(R^d) of probability measures on R^d, we say that µ is smaller than ν in the convex order, and denote µ ≤cvx ν, if

∀ ϕ : R^d → R convex, ∫_{R^d} ϕ(x) µ(dx) ≤ ∫_{R^d} ϕ(y) ν(dy), (1.1)

when the integrals make sense (since any real-valued convex function is bounded from below by an affine function, ∫_{R^d} ϕ(x) µ(dx) makes sense in R ∪ {+∞} as soon as ∫_{R^d} |x| µ(dx) < +∞). We then also write X ≤cvx Y for X and Y random vectors respectively distributed according to µ and ν. For p ≥ 1, we denote by P_p(R^d) = {µ ∈ P(R^d) : ∫_{R^d} |x|^p µ(dx) < +∞} the Wasserstein space with index p over R^d. When µ, ν ∈ P_1(R^d), according to the Strassen theorem [42], µ ≤cvx ν if and only if there exists a martingale coupling between µ and ν, that is a probability measure M(dx, dy) on R^d × R^d with marginals ∫_{y∈R^d} M(dx, dy) and ∫_{x∈R^d} M(dx, dy) equal to µ(dx) and ν(dy) respectively, such that M(dx, dy) = µ(dx) m(x, dy) for some Markov kernel m with the martingale property: ∀ x ∈ R^d, ∫_{R^d} |y| m(x, dy) < +∞ and ∫_{R^d} y m(x, dy) = x. If (X, Y) is distributed according to M, then X and Y are respectively distributed according to µ and ν and E(Y | X) = X.

In this paper, we are interested in constructing approximations of a sequence (µ_k)_{k=0:n} ∈ (P(R^d))^{n+1} in increasing convex order (∀ k = 0 : n−1, µ_k ≤cvx µ_{k+1}) by a sequence (µ̂_k)_{k=0:n} of probability measures with finite supports still in the convex order. A possible motivation comes from mathematical finance, when one wants to price exotic options written on d assets with price evolution (S_t = (S_t^1, ..., S_t^d))_{t≥0}. Suppose for simplicity a zero interest rate and let (T_k)_{k=0:n} be, indexed in increasing order, the maturities of the vanilla options written on these assets and of the exotic option with payoff c((S_{T_k})_{k=0:n}). The trader typically picks her favourite model, then calibrates it to vanilla option prices and uses this calibrated model (S̃_t)_{t≥0} to compute the price E[c((S̃_{T_k})_{k=0:n})] of this exotic option. A natural way for the bank to evaluate the model risk is to compute the range of prices of this exotic option over all models compatible with the marginal distributions (µ_k)_{k=0:n} of (S̃_{T_k})_{k=0:n}, which are calibrated to the vanilla option prices. This approach is formalized by the Martingale Optimal Transport (MOT) problem introduced in [6], which has recently received great attention in the financial mathematics literature. In particular, the structure of martingale optimal transport couplings [7, 9, 11, 17, 22], continuous time formulations [12, 16, 21], links with the Skorokhod embedding problem [5], numerical methods [1, 2, 10, 19, 20] and stability properties [3, 27, 43] have been investigated.

By absence of arbitrage opportunities, the marginal distributions are in increasing convex order and the range is [C((µ_k)_{k=0:n}), C̄((µ_k)_{k=0:n})] with

C((µ_k)_{k=0:n}) = inf_{µ ∈ M((µ_k)_{k=0:n})} ∫_{(R^d)^{n+1}} c((x_k)_{k=0:n}) µ(d(x_k)_{k=0:n})

and C̄((µ_k)_{k=0:n}) = sup_{µ ∈ M((µ_k)_{k=0:n})} ∫_{(R^d)^{n+1}} c((x_k)_{k=0:n}) µ(d(x_k)_{k=0:n}),

where the set

M((µ_k)_{k=0:n}) = { µ ∈ P((R^d)^{n+1}) : ∀ k = 0 : n and B ∈ Bor(R^d), µ((R^d)^k × B × (R^d)^{n−k}) = µ_k(B); ∀ k = 0 : n−1 and ϕ : (R^d)^{k+1} → R^d measurable bounded, ∫_{(R^d)^{n+1}} ϕ((x_ℓ)_{ℓ=0:k}) · (x_{k+1} − x_k) µ(d(x_ℓ)_{ℓ=0:n}) = 0 }

of martingale couplings between the marginals is non-empty according to Strassen's theorem [42]. The dual formulation of these optimization problems and its interpretation in terms of sub- and super-hedging strategies are investigated in [6, 8]. One may approximate the above interval by [C((µ̂_k)_{k=0:n}), C̄((µ̂_k)_{k=0:n})]. This approach can be compared to [24, 25, 26], which also deal with robust pricing (and hedging) of various classes of path-dependent options.

If for k = 0 : n, µ̂_k = Σ_{i=1}^{N_k} ρ_i^k δ_{x_i^k} with distinct elements x_i^k of R^d, then C((µ̂_k)_{k=0:n}) (resp. C̄((µ̂_k)_{k=0:n})) is the value of the linear programming problem which consists in minimizing (resp. maximizing)

Σ_{i_0=1:N_0} ··· Σ_{i_n=1:N_n} p_{i_0,...,i_n} c((x_{i_k}^k)_{k=0:n})

over (p_{i_0,...,i_n})_{i_0=1:N_0,...,i_n=1:N_n} ∈ R_+^{N_0×···×N_n} such that

∀ k = 0 : n, ∀ i_k = 1 : N_k, Σ_{i_0=1:N_0} ··· Σ_{i_{k−1}=1:N_{k−1}} Σ_{i_{k+1}=1:N_{k+1}} ··· Σ_{i_n=1:N_n} p_{i_0,...,i_n} = ρ_{i_k}^k,

∀ k = 0 : n−1, ∀ i_0 = 1 : N_0, ..., ∀ i_k = 1 : N_k, Σ_{i_{k+1}=1:N_{k+1}} ··· Σ_{i_n=1:N_n} p_{i_0,...,i_n} (x_{i_{k+1}}^{k+1} − x_{i_k}^k) = 0.

These finite-dimensional linear programming problems can be solved using solvers like e.g. GLPK. Even for smooth payoff functions c, the stability of the infimum and the supremum with respect to the marginal distributions, i.e. the continuity of C and C̄, which would give a theoretical ground to this approach, is still an open question when d ≥ 2 or n ≥ 2.
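For n = 1 (two marginals), these linear programs are small enough to be written down directly. The following minimal sketch uses scipy's linprog in place of GLPK; the two- and three-point marginals are illustrative choices in convex order, and the payoff c(x_0, x_1) = (x_1 − x_0)² is chosen so that every martingale coupling gives the same value Var(µ_1) − Var(µ_0), which makes the result easy to check:

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative marginals in convex order: mu0 on {-0.5, 0.5}, mu1 on {-1, 0, 1}.
x0 = np.array([-0.5, 0.5]); w0 = np.array([0.5, 0.5])
x1 = np.array([-1.0, 0.0, 1.0]); w1 = np.array([0.25, 0.5, 0.25])

cost = (x1[None, :] - x0[:, None]) ** 2   # payoff c(x0, x1) = (x1 - x0)^2

n0, n1 = len(x0), len(x1)
A_eq, b_eq = [], []
for i in range(n0):   # first-marginal constraints
    row = np.zeros((n0, n1)); row[i, :] = 1.0
    A_eq.append(row.ravel()); b_eq.append(w0[i])
for j in range(n1):   # second-marginal constraints
    row = np.zeros((n0, n1)); row[:, j] = 1.0
    A_eq.append(row.ravel()); b_eq.append(w1[j])
for i in range(n0):   # martingale constraints: sum_j p_ij (x1_j - x0_i) = 0
    row = np.zeros((n0, n1)); row[i, :] = x1 - x0[i]
    A_eq.append(row.ravel()); b_eq.append(0.0)

lower = linprog(cost.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq))
upper = linprog(-cost.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq))
print(lower.fun, -upper.fun)  # both equal E(X1-X0)^2 = Var(mu1) - Var(mu0) = 0.25
```

For a payoff that is not an affine function of (x_1 − x_0)² given the marginals, the two values would differ and bracket the model-risk interval.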
When d = n = 1, Backhoff-Veraguas and Pammer [3] prove that C(µ^ℓ, ν^ℓ) converges to C(µ, ν) as ℓ → +∞ when (µ^ℓ)_{ℓ≥1} and (ν^ℓ)_{ℓ≥1} are two sequences in P_1(R) respectively converging to µ and ν for the Wasserstein distance with index one and such that µ^ℓ ≤cvx ν^ℓ for each ℓ ≥ 1, and c : R² → R is a continuous function such that sup_{(x,y)∈R²} |c(x,y)|/(1 + |x| + |y|) < +∞. Their result even applies to payoffs c^ℓ depending on ℓ and converging uniformly to c as above when ℓ → +∞. See also [43] for further results in that direction, but still restricted to the case d = n = 1.

To the best of our knowledge, few studies consider the problem of preserving the convex order while approximating a sequence of probability measures. We mention the thesis of Baker [4], who proposes the following construction in dimension d = 1. Let, for u ∈ (0,1), F_k^{−1}(u) = inf{x ∈ R : µ_k((−∞, x]) ≥ u} be the quantile of µ_k of order u. Let (N_k)_{k=0:n} be a sequence of elements of N* such that N_{k+1}/N_k ∈ N* for k = 0 : n−1. One has µ̂_0 ≤cvx µ̂_1 ≤cvx ··· ≤cvx µ̂_n for the choice

µ̂_k = (1/N_k) Σ_{i=1}^{N_k} δ_{N_k ∫_{(i−1)/N_k}^{i/N_k} F_k^{−1}(u) du}, k = 0, ..., n.

Dual (or Delaunay) quantization, introduced by Pagès and Wilbertz [36] and further studied in [37, 38, 39], gives another way to preserve the convex order in dimension d = 1 (see the remark after Proposition 10 in [37]) when µ_n is compactly supported.

In two recent papers [1, 2], Alfonsi, Corbetta and Jourdain propose to restore the convex ordering from any finitely supported approximation (µ̃_k)_{k=0:n} of (µ_k)_{k=0:n}. In dimension d = 1, one may define the increasing (resp. decreasing) convex order by adding the constraint that the test function ϕ is non-decreasing (resp. non-increasing) in (1.1). Moreover, according to [2], the restoration can be performed by forward (resp. backward) induction on k, by setting µ̂_0 = µ̃_0 (resp. µ̂_n = µ̃_n) and computing µ̂_k as the supremum between µ̂_{k−1} (resp. infimum between µ̂_{k+1}) and µ̃_k, for the increasing convex order when ∫_R x µ̃_k(dx) ≤ ∫_R x µ̂_{k−1}(dx) (resp. ∫_R x µ̃_k(dx) ≥ ∫_R x µ̂_{k+1}(dx)) and for the decreasing convex order when ∫_R x µ̃_k(dx) ≥ ∫_R x µ̂_{k−1}(dx) (resp. ∫_R x µ̃_k(dx) ≤ ∫_R x µ̂_{k+1}(dx)). For a general dimension d, [1] suggests to set µ̂_n = µ̃_n and to compute, by backward induction on k = 0 : n−1, µ̂_k as the projection of µ̃_k, for the quadratic Wasserstein distance, on the set of probability measures dominated by µ̂_{k+1} in the convex order, by solving a quadratic optimization problem with linear constraints.

For general dimensions d, but with only two marginals (n = 1) and µ_1 compactly supported, the convex order is preserved by defining µ̂_0 as a stationary primal (or Voronoi) quantization of µ_0 on N_0 points and µ̂_1 as a dual (or Delaunay) quantization of µ_1 on N_1 points. We will prove in Section 2.3 that when these quantizations are optimal and N_0 and N_1 go to infinity, then C(µ̂_0, µ̂_1) and C̄(µ̂_0, µ̂_1) respectively converge to C(µ_0, µ_1) and C̄(µ_0, µ_1) for continuous payoffs c : R^d × R^d → R with polynomial growth.

Dual quantization of a probability measure with bounded support yields an approximation by a probability measure which is larger for the convex order and has a finite support. In the present paper, taking advantage of both properties, we are going to propose a quantization-based spatial discretization scheme still valid for n, d ≥ 2 when (µ_k)_{k=0:n} is the sequence of marginals of an ARCH model evolving inductively according to

X_{k+1} = X_k + ϑ_k(X_k) Z_{k+1}, k = 0, ..., n−1,

with (Z_k)_{k=1:n} an R^q-valued white noise (2) independent of X_0 and, for k = 0, ..., n−1, ϑ_k going from R^d to the space M_{d,q} of real matrices with d rows and q columns.

The main problem, especially in presence of several time steps (n ≥ 2), is to control at every time k the (finite) size of the support of the approximation of X_k while preserving the martingale property, i.e. the convex order. Preserving the latter by simply spatially discretizing the white noise (Z_k)_{k=1:n} leads to an explosion of the support of the X_k: indeed, if X_0 = x_0 ∈ R^d and the Z_k are replaced by Ž_k taking e.g. N values, then X_n will take up to N^n values, which is totally unrealistic as soon as, say, N = 2 and n = 20. Alternately combining a Voronoi quantization step of the white noise and a dual quantization step of the ARCH will provide a tractable answer to this question, with an a priori control of the induced quadratic error. The aim of the next sections of this paper is to investigate this approach in a step-by-step manner.

In the second section, we first prove that an optimal quadratic primal quantization of µ ∈ P_2(R^d) on N points is a quadratic Wasserstein projection of µ on the set of probability measures with support restricted to N points and smaller than µ in the convex order. We next introduce dual (Delaunay) quantization. We then prove the above-mentioned stability property of C and C̄ when n = 1. We last introduce a theoretical approximation preserving the convex order for the marginals (µ_k)_{k=0:n} of a martingale Markov chain: it consists in alternating dual quantization steps with conditional evolution according to the current Markov transition.

The third section is dedicated to ARCH models. When the support of X_0 is bounded and the functions (ϑ_k)_{k=0:n−1} are locally bounded, the replacement of the white noise (Z_k)_{k=1:n} by a truncated bounded white noise yields an approximation (X̄_k)_{k=0:n} of (X_k)_{k=0:n} where each random vector is compactly supported, a condition necessary to undergo a dual quantization step. We analyze the resulting quadratic error.
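The support explosion caused by naively discretizing the noise can be observed directly. A quick sketch in dimension d = q = 1 (the volatility function theta and all numerical choices are hypothetical, picked only to avoid a recombining tree):

```python
# Naive discretization of the noise: the support of X_k grows like N^k.
# Hypothetical scalar volatility function, chosen for illustration only.
def theta(x):
    return 0.1 * (1.0 + x * x)

support = {0.1}            # X_0 = 0.1
for k in range(10):        # n = 10 steps, noise discretized on N = 2 values
    support = {x + theta(x) * z for x in support for z in (-1.0, 1.0)}
print(len(support))        # 2**10 = 1024 barring accidental collisions
```

With N = 2 and n = 20 the same loop would already produce about a million points, which is exactly the blow-up the alternating quantization scheme is designed to avoid.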
We then give conditions on the functions (ϑ_k)_{k=0:n−1} ensuring the convex ordering (X̄_k)_{k=0:n} ≤cvx (X_k)_{k=0:n} of the whole sample-paths, whatever the white noise is in dimension d = 1, and with the r.v. Z_k radially distributed in higher dimensions.

In Section 4, the theoretical approximation proposed at the end of Section 2 is made more practical in the case of ARCH models. In dimension one, we show that, for some distributions of the white noise, a deterministic optimization can be implemented, based on some closed-form formulas, without quantizing the noise. This includes the case of the Euler scheme of a Brownian diffusion. In higher dimension, the white noise (Z_k)_{k=1:n} in ARCH models can be replaced in the martingale Markov transitions by an approximate white noise (Z̃_k)_{k=1:n}, where, for k = 1 : n, Z̃_k only takes N_k^Z values and is a stationary primal quantization of Z_k. These multidimensional results still include the Euler scheme.

2. By white noise we mean here a sequence of independent square integrable centered random vectors with identity covariance matrix.

Definitions and notations.
• The space of real matrices with d rows and q columns is denoted by M_{d,q}.
• | · | denotes the canonical Euclidean norm on R^d.
• When R^d and R^q are endowed with the canonical Euclidean norms, the operator norm of a matrix A ∈ M_{d,q} is denoted |||A|||.
• If A : (Ω, A, P) → M_{d,q}, we denote by ‖A‖_p = [E |||A|||^p]^{1/p}.
• conv(A) denotes the (closed) convex hull of A ⊂ R^d and card(A) or |A| its cardinality (depending on the context).
• For p ∈ [1, +∞), let W_p(µ, ν) = inf{ (∫_{R^d×R^d} |x − y|^p M(dx, dy))^{1/p}, where M has marginals µ and ν } ≤ +∞ denote the Wasserstein distance with index p. This is a complete metric on the set P_p(R^d) of probability distributions on (R^d, Bor(R^d)) with finite p-th moment.
• For every integer N ≥ 1, we denote by P(R^d, N) the set of distributions on R^d whose support contains at most N points.
• The symbol ⊥⊥ denotes the independence of random variables or vectors.
• A white noise is a sequence of independent square integrable centered R^q-valued random vectors with identity covariance matrix I_q.

This section is devoted to the connections between quantization modes, convex order and projections with respect to Wasserstein distances. Let d ∈ N* and p ∈ [1, +∞) (p ∈ (0, 1) should work as well by adapting some proofs, as usual).

In this subsection, which can be read independently of what follows, we make a connection between regular quantization and various projections (in the Wasserstein sense), including, in the quadratic case, the one mentioned above in the introduction. Let us first recall the following basic facts about (primal) Voronoi quantization (see [18, 34, 31] among others):

– The L^p-quantization error modulus e_p(Γ, µ) satisfies

e_p(Γ, µ)^p = ∫_{R^d} |x − Proj_Γ(x)|^p µ(dx) (2.2)

where Proj_Γ denotes a Borel nearest neighbour projection on Γ (see (A.41) in Appendix A.1 for connections with Voronoi diagrams).

– For any level N ≥ 1, there exists an optimal grid or N-quantizer Γ^(N) such that

e_{p,N}(µ) := inf{ e_p(Γ, µ) : Γ ⊂ R^d, |Γ| ≤ N } = e_p(Γ^(N), µ).

If supp(µ) contains at least N points, then Γ^(N) has exactly N pairwise distinct elements (see Appendix A.1).

– In the quadratic case (p = 2), any optimal quantization grid Γ^(N) (possibly not unique) and its induced quantization Proj_{Γ^(N)}(X) satisfy a stationarity (or self-consistency) property (see (A.43) in Appendix A.1), that is, if X ∼ µ and X̂^N = Proj_{Γ^(N)}(X) ∼ µ̂^N, then

E(X | X̂^N) = X̂^N (2.3)

so that µ̂^N ≤cvx µ.

Proposition 2.1 (a) Let p ∈ [1, +∞), let Γ ⊂ R^d be a finite set and let P(Γ) denote the subset of Γ-supported distributions. Let µ ∈ P_p(R^d). Then

W_p(µ, P(Γ)) := inf_{ν∈P(Γ)} W_p(µ, ν) = e_p(Γ, µ) := ‖dist(·, Γ)‖_{L^p(µ)}

and µ̂^Γ = µ ∘ Proj_Γ^{−1} is a projection of µ on P(Γ).

(b) Quadratic case (p = 2). Let Γ^(N) be an optimal quadratic quantization grid at level N ≥ 1. Then

µ̂^N = µ ∘ Proj_{Γ^(N)}^{−1} (2.4)

is (the/)a projection of µ on the set P_{≤µ}(R^d, N) of distributions dominated by µ for the convex order whose support contains at most N elements.

Proof. (a) Let M be any distribution on (R^d × R^d, Bor(R^d)^{⊗2}) with marginals µ and ν ∈ P(Γ). Then

∫_{R^d×R^d} |x − y|^p M(dx, dy) ≥ ∫ dist(x, Γ)^p M(dx, dy) = ∫ dist(x, Γ)^p µ(dx) = e_p(Γ, µ)^p.

Now let µ̂^Γ = µ ∘ Proj_Γ^{−1}. It follows from (2.2) that

W_p^p(µ, P(Γ)) ≤ W_p^p(µ, µ̂^Γ) ≤ ∫_{R^d} |x − Proj_Γ(x)|^p µ(dx) = e_p(Γ, µ)^p.
(b) One has by the stationarity property (2.3) that µ̂^N ∈ P_{≤µ}(R^d, N) and, by (a),

W_2(µ, µ̂^N) = W_2(µ, P(Γ^(N))) = e_2(Γ^(N), µ) = e_{2,N}(µ) = W_2(µ, P(R^d, N)) ≤ W_2(µ, P_{≤µ}(R^d, N)),

where the last inequality is in fact an equality since µ̂^N ∈ P_{≤µ}(R^d, N). □
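In dimension d = 1, an optimal quadratic N-quantizer can be computed by the classical Lloyd fixed-point iteration, which alternates between forming the Voronoi cells of the current grid and moving each point to the conditional mean of its cell; at a fixed point, the stationarity (2.3) holds by construction. A minimal sketch for µ = U([0,1]), where the cell mean is just the cell midpoint (the grid initialization is arbitrary):

```python
import numpy as np

# Lloyd fixed-point iteration for an optimal quadratic N-quantizer of U([0,1]):
# each point moves to the mean (here the midpoint) of its Voronoi cell.
def lloyd_uniform(x, iters=500):
    x = np.sort(np.asarray(x, dtype=float))
    for _ in range(iters):
        edges = np.concatenate(([0.0], (x[:-1] + x[1:]) / 2.0, [1.0]))
        x = (edges[:-1] + edges[1:]) / 2.0   # cell centroids for U([0,1])
    return x

grid = lloyd_uniform([0.05, 0.2, 0.7])
print(grid)   # converges to [1/6, 1/2, 5/6], the optimal 3-quantizer of U([0,1])
```

For a general density, the centroid step becomes a ratio of partial moments over each cell, but the structure of the iteration is identical.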
Remark about uniqueness. As a consequence of this proposition, it turns out that the uniqueness of the W_p-projection on P(R^d, N) and that of the distribution µ̂^N of an optimal quantizer are equivalent. Thus, in dimension d = 1 with p = 2, distributions with log-concave densities have a unique optimal N-quantization grid (see Kiefer [29]), hence this projection is unique. In higher dimension, a general result seems difficult to reach: indeed, the N(0; I_d) distribution being invariant under the action of O(d, R) (orthogonal transforms), so are the (hence infinite) sets of its optimal quantizers at levels N ≥ 2.

2.2 Dual (Delaunay) quantization

We assume in this section that µ is compactly supported. Let X : (Ω, A, P) → R^d be a random vector lying in L^∞(P) with distribution µ. Assume for simplicity that the support of µ spans R^d as an affine space (3), or, equivalently, that it contains an affine basis of R^d, or that its convex hull has a non-empty interior. It means that d is the dimension of the state space of X. Otherwise, one may always consider the affine space A_µ spanned by supp(µ) and reduce the problem to the former framework by combining a translation with a change of coordinates into an orthonormal basis of the vector space associated with A_µ. Optimal dual (or Delaunay) quantization, as introduced in [37], relies on the best approximation which can be achieved by a discrete random vector X̂ that satisfies a certain stationarity assumption on an extended probability space (Ω × Ω_0, A ⊗ A_0, P ⊗ P_0), with (Ω_0, A_0, P_0) supporting a random variable uniformly distributed on [0, 1]: for p ∈ [1, +∞),

d_{p,N}(X) = inf_{X̂} { ‖X − X̂‖_p : X̂ : (Ω × Ω_0, A ⊗ A_0, P ⊗ P_0) → R^d, card(X̂(Ω × Ω_0)) ≤ N and E(X̂ | X) = X }.

One checks that d_{p,N}(X) only depends on the distribution µ of X and can subsequently also be denoted d_{p,N}(µ). Moreover, for every level N ≥ d + 1, there exists an L^p-optimal dual quantization grid Γ^(N),del (see [37]), i.e. satisfying supp(µ) ⊂ conv(Γ^(N),del) (hence with a non-empty interior) and d_{p,N}(X) = ‖X − X̂‖_p with X̂(Ω × Ω_0) = Γ^(N),del.

One may always assume that Ω_0 = [0, 1] and define X̂ as the dual projection on an appropriate Delaunay (hyper-)triangulation induced by Γ^(N),del, denoted Proj^del_{Γ^(N),del}, so that X̂ = Proj^del_{Γ^(N),del}(X, U) with U ⊥⊥ X, U ∼ U([0, 1]). Such an operator Proj^del_Γ : conv(Γ) × [0, 1] → Γ, also called a splitting operator, can be associated to any grid Γ whose convex hull has a non-empty interior, and satisfies, beyond measurability, the following stationarity property:

∀ y ∈ conv(Γ), ∫_0^1 Proj^del_Γ(y, u) du = y (2.5)

from which one derives the dual stationarity property, for any conv(Γ)-valued random vector X with X ⊥⊥ U:

E(Proj^del_Γ(X, U) | X) = X. (2.6)

This stationarity property is satisfied regardless of the optimality of the grid Γ, but of course it is in particular satisfied by the optimal dual grid Γ^(N),del, so that

E(X̂ | X) = X. (2.7)

For more details on this dual projection, see Appendix A.2; see also [37, 38], where this notion has been introduced. When µ spans a lower-dimensional affine space than R^d, simply replace R^d by this affine space in what precedes.
3. i.e. { x_0 + λ_1(x_1 − x_0) + ··· + λ_d(x_d − x_0), x_0, ..., x_d ∈ supp(µ), λ_1, ..., λ_d ∈ R } = R^d.

Let P_{≥µ}(N, R^d) denote the set of distributions dominating µ for the convex order and supported by at most N elements. By Lemma 2.22 in [28], we see that for each ν ∈ P_{≥µ}(N, R^d) and each martingale coupling M between µ and ν, there exists on (Ω × Ω_0, A ⊗ A_0, P ⊗ P_0) a random vector X̂ such that (X, X̂) is distributed according to M. Hence

d_{p,N}^p(µ) = inf_{ν ∈ P_{≥µ}(N, R^d)} inf_{M ∈ M(µ,ν)} ∫_{R^d×R^d} |y − x|^p M(dx, dy),

where M(µ, ν) denotes the set of martingale couplings between µ and ν. Note that, in the quadratic case p = 2, for each M ∈ M(µ, ν), ∫_{R^d×R^d} |y − x|² M(dx, dy) = ∫_{R^d} |y|² ν(dy) − ∫_{R^d} |x|² µ(dx), so that

d_{2,N}²(µ) = inf_{ν ∈ P_{≥µ}(N, R^d)} ∫_{R^d} |y|² ν(dy) − ∫_{R^d} |x|² µ(dx).

We consider, following [1], the W_p-projection µ̃ of µ on P_{≥µ}(N, R^d). It is clear from its very definition that

d_{p,N}(µ) ≥ inf_{ν ∈ P_{≥µ}(N, R^d)} W_p(µ, ν) = W_p(µ, µ̃).

But this time, the converse inequality is not true, as emphasized by the following counter-example. In contrast with the Voronoi setting, there is no reason why µ̃ should be the distribution of the dual quantization of X ∼ µ or, equivalently, why the transition m(x, dy) associated to the distribution M(dx, dy) = µ(dx) m(x, dy) achieving the Wasserstein distance W_p(µ, µ̃) should be a/the martingale one provided by Strassen's theorem.

Counter-Example. Let µ(dx) = 2x 1_{[0,1]}(x) dx. We look for ν ∈ P_{≥µ}(3, R) minimizing either ∫_R y² ν(dy), to compute the law of the optimal quadratic dual quantization of µ on N = 3 points, or W_2²(µ, ν), to compute µ̃. Since d = 1, W_2²(µ, ν) is equal to the integral ∫_0^1 (F_µ^{−1}(u) − F_ν^{−1}(u))² du of the squared difference between the quantile functions of µ and ν. It is not difficult to check that it is equivalent to minimize over the family

{ ν_u(dy) = (u/3) δ_0(dy) + ((1 + √u)/3) δ_{√u}(dy) + ((2 − √u − u)/3) δ_1(dy) : u ∈ [0, 1] } ⊂ P_{≥µ}(3, R).

One has ∫_R y² ν_u(dy) = (2 + u^{3/2} − √u)/3 and the infimum is attained for u = 1/3. On the other hand,

W_2²(µ, ν_u) = ∫_0^{u/3} (0 − √v)² dv + ∫_{u/3}^{(1+√u+u)/3} (√u − √v)² dv + ∫_{(1+√u+u)/3}^1 (1 − √v)² dv

= 1/6 − (1 − u)(1 + √u + u)/3 − u²/3 + (4/(9√3)) [ (1 − u^{3/2}) √(1 + √u + u) + u² ].

One easily checks that (d/du) W_2²(µ, ν_u)|_{u=1/3} > 0, so that W_2²(µ, ν_u) is minimal for some u < 1/3: the W_2-projection µ̃ does not coincide with the law of the optimal quadratic dual quantization of µ.

Let µ_0, µ_1 ∈ P_1(R^d) with µ_0 ≤cvx µ_1 and let c : R^d × R^d → R be a Borel cost function. Assume that there exist functions c_1, c_2 : R^d → R_+, both Borel, such that

∫_{R^d} c_1(x_0) µ_0(dx_0) + ∫_{R^d} c_2(x_1) µ_1(dx_1) < +∞ and ∀ (x_0, x_1) ∈ R^d × R^d, c(x_0, x_1) ≥ −c_1(x_0) − c_2(x_1). (2.8)

We recall that

C(µ_0, µ_1) = inf_{µ ∈ M(µ_0, µ_1)} ∫_{R^d×R^d} c(x_0, x_1) µ(dx_0, dx_1),

where the infimum is taken over the set M(µ_0, µ_1) of martingale couplings between µ_0 and µ_1. Setting C̄(µ_0, µ_1) = sup_{µ ∈ M(µ_0, µ_1)} ∫_{R^d×R^d} c(x_0, x_1) µ(dx_0, dx_1) when there exist functions c_1, c_2 : R^d → R_+, both Borel, such that

∫_{R^d} c_1(x_0) µ_0(dx_0) + ∫_{R^d} c_2(x_1) µ_1(dx_1) < +∞ and ∀ (x_0, x_1) ∈ R^d × R^d, c(x_0, x_1) ≤ c_1(x_0) + c_2(x_1), (2.9)

one has C̄(µ_0, µ_1, c) = −C(µ_0, µ_1, −c) when making explicit the dependence on the cost function c. Therefore, it is enough to deal with the infimum case in the proofs.

Lemma 2.1
Let µ_0, µ_1 ∈ P_1(R^d) be such that µ_0 ≤cvx µ_1. If c is lower semi-continuous and satisfies (2.8), then −∞ < C(µ_0, µ_1) and there exists µ* ∈ M(µ_0, µ_1) such that C(µ_0, µ_1) = ∫_{R^d×R^d} c(x_0, x_1) µ*(dx_0, dx_1). If c is upper semi-continuous and satisfies (2.9), then C̄(µ_0, µ_1) < +∞ and there exists µ̄ ∈ M(µ_0, µ_1) such that C̄(µ_0, µ_1) = ∫_{R^d×R^d} c(x_0, x_1) µ̄(dx_0, dx_1).

Notice that, under (2.9) (resp. (2.8)), C(µ_0, µ_1) < +∞ (resp. −∞ < C̄(µ_0, µ_1)), inequalities which are not guaranteed under the assumptions of the Lemma.

Proof. Let (µ_0^m)_{m∈N} and (µ_1^m)_{m∈N} be two sequences in P_1(R^d) respectively weakly converging to µ_0 and µ_1 as m → +∞ and such that µ_0^m ≤cvx µ_1^m for each m ∈ N. Let also µ^m ∈ M(µ_0^m, µ_1^m) for each m ∈ N. The necessary condition in Prokhorov's theorem ensures that the weakly converging sequences (µ_0^m)_{m∈N} and (µ_1^m)_{m∈N} are tight. We deduce that (µ^m)_{m∈N} is tight. By continuity of the two canonical projections from R^d × R^d onto R^d, the marginals of any weak limit of a subsequence of (µ^m)_{m∈N} are µ_0 and µ_1. Since the martingale property is preserved by weak convergence, such a limit belongs to M(µ_0, µ_1).

Let now (µ^m)_{m∈N} be a sequence in M(µ_0, µ_1) such that

lim_{m→+∞} ∫_{R^d×R^d} c(x_0, x_1) µ^m(dx_0, dx_1) = C(µ_0, µ_1). (2.10)

By the above argument, we may extract a subsequence, still denoted (µ^m)_m for notational convenience, converging weakly to some limit µ* ∈ M(µ_0, µ_1). For a > 0 and µ ∈ M(µ_0, µ_1), we have

∫_{R^d×R^d} (−a − c(x_0, x_1))_+ µ(dx_0, dx_1) ≤ ∫_{R^d×R^d} (c_1(x_0) + c_2(x_1) − a)_+ µ(dx_0, dx_1)
≤ ∫_{{c_1(x_0)+c_2(x_1)>a}} (c_1(x_0) + c_2(x_1)) µ(dx_0, dx_1)
≤ ∫_{R^d×R^d} c_1(x_0) (1_{{c_1(x_0)>a/2}} + 1_{{c_1(x_0)≤a/2, c_2(x_1)>a/2}}) µ(dx_0, dx_1) + ∫_{R^d×R^d} c_2(x_1) (1_{{c_2(x_1)>a/2}} + 1_{{c_2(x_1)≤a/2, c_1(x_0)>a/2}}) µ(dx_0, dx_1)
≤ ∫_{R^d} c_1(x_0) 1_{{c_1(x_0)>a/2}} µ_0(dx_0) + (a/2) µ_1({x_1 : c_2(x_1) > a/2}) + ∫_{R^d} c_2(x_1) 1_{{c_2(x_1)>a/2}} µ_1(dx_1) + (a/2) µ_0({x_0 : c_1(x_0) > a/2}).

The right-hand side does not depend on µ ∈ M(µ_0, µ_1) and goes to 0 as a → +∞ by Lebesgue's theorem. For a > 0, by the lower semi-continuity of c and the Portemanteau theorem,

∫ c(x_0, x_1) ∨ (−a) µ*(dx_0, dx_1) ≤ liminf_{m→+∞} ∫ c(x_0, x_1) ∨ (−a) µ^m(dx_0, dx_1),

and using that c(x_0, x_1) ∨ (−a) = c(x_0, x_1) + (−a − c(x_0, x_1))_+, we conclude that

∫ c(x_0, x_1) µ*(dx_0, dx_1) ≤ lim_{m→+∞} ∫ c(x_0, x_1) µ^m(dx_0, dx_1) = C(µ_0, µ_1). □

Remark. If c_1 and c_2 are themselves l.s.c., the second part of the proof is a straightforward application of Fatou's lemma for weak convergence, applied to the non-negative l.s.c. function c + c_1 + c_2.

Proposition 2.1
Let µ_0, µ_1 ∈ P_1(R^d) be such that µ_0 ≤cvx µ_1 with µ_1 compactly supported, and let c : R^d × R^d → R be a continuous function with polynomial growth. Let (N_m)_{m∈N} and (M_m)_{m∈N} be two sequences of positive integers converging to ∞ with m, and let µ̂_0^m (resp. µ̂_1^m) be an optimal primal (resp. dual) quantization of µ_0 (resp. µ_1) on N_m (resp. M_m) points.

(a) Then

C(µ_0, µ_1) = lim_{m→+∞} C(µ̂_0^m, µ̂_1^m), C̄(µ_0, µ_1) = lim_{m→+∞} C̄(µ̂_0^m, µ̂_1^m),

and any sequence (µ^m)_{m∈N} with µ^m ∈ M(µ̂_0^m, µ̂_1^m) is tight.

(b) The weak limits of subsequences of c-minimal (resp. maximal) martingale couplings between µ̂_0^m and µ̂_1^m, which exist for each m ∈ N, are c-minimal (resp. maximal) martingale couplings between µ_0 and µ_1.

Remark. For all the statements but the existence of c-optimal martingale couplings between µ̂_0^m and µ̂_1^m, the continuity of c may be replaced by continuity outside a set negligible for all µ ∈ M(µ_0, µ_1). The structure of these M(µ_0, µ_1)-polar sets has been studied by De March and Touzi, see [11].

The proof of Proposition 2.1 relies on the next lemma, which in turn crucially relies on the respective use of optimal primal and dual quantization to approximate the first and second marginals.

Lemma 2.2 Let µ_0, µ_1, µ̂_0^m, µ̂_1^m be as in Proposition 2.1. For each µ ∈ M(µ_0, µ_1), there exists µ̂^m ∈ M(µ̂_0^m, µ̂_1^m) such that

W_1(µ̂^m, µ) ≤ e_{2,N_m}(µ_0) + d_{2,M_m}(µ_1).

Proof. Let µ ∈ M(µ_0, µ_1) and let q denote the martingale Markov kernel associated to this coupling, in the sense that µ(dx_0, dx_1) = µ_0(dx_0) q(x_0, dx_1) (q(x_0, dx_1) is µ_0(dx_0) a.e. unique). The image of µ_0 by R^d ∋ x_0 ↦ (Proj_{Γ_{N_m}}(x_0), x_0) is a martingale coupling between µ̂_0^m and µ_0, optimal for the W_2 distance: W_2(µ̂_0^m, µ_0) = e_{2,N_m}(µ_0). Let q_0^m(x̂_0, dx_0) denote the associated martingale Markov kernel. For X_1 distributed according to µ_1 and U an independent random variable uniformly distributed on [0, 1], (X_1, Proj^del_{Γ̃_{M_m}}(X_1, U)) is a (martingale) coupling between µ_1 and µ̂_1^m, so that W_2²(µ_1, µ̂_1^m) ≤ E|X_1 − Proj^del_{Γ̃_{M_m}}(X_1, U)|² = d_{2,M_m}²(µ_1). Let q_1^m(x_1, dx̂_1) denote the associated martingale Markov kernel. Then

µ̂^m(dx̂_0, dx̂_1) = µ̂_0^m(dx̂_0) ∫_{(x_0,x_1)∈R^d×R^d} q_0^m(x̂_0, dx_0) q(x_0, dx_1) q_1^m(x_1, dx̂_1) ∈ M(µ̂_0^m, µ̂_1^m).

Since ∫_{(x̂_0,x̂_1)∈R^d×R^d} µ̂_0^m(dx̂_0) q_0^m(x̂_0, dx_0) q(x_0, dx_1) q_1^m(x_1, dx̂_1) = µ(dx_0, dx_1), one has

W_1(µ̂^m, µ) ≤ ∫_{R^d×R^d×R^d×R^d} (|x̂_0 − x_0| + |x̂_1 − x_1|) µ̂_0^m(dx̂_0) q_0^m(x̂_0, dx_0) q(x_0, dx_1) q_1^m(x_1, dx̂_1) ≤ e_{2,N_m}(µ_0) + d_{2,M_m}(µ_1),

where the last inequality follows from the Cauchy-Schwarz inequality. □
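The coupling built in this proof can be written down explicitly for finitely supported toy data. The sketch below (all distributions, kernels and grids are illustrative choices, d = 1) composes the reverse kernel of a stationary primal quantization, a martingale kernel q and the one-dimensional splitting kernel, and checks that the resulting law of (X̂_0, X̂_1) is a martingale coupling:

```python
from collections import defaultdict

# Toy version of the coupling in the proof of Lemma 2.2 (d = 1).
# Measures and kernels are dicts point -> weight; all data is illustrative.
mu0 = {-1.0: 0.25, -0.5: 0.25, 0.5: 0.25, 1.0: 0.25}

# Stationary primal quantization of mu0 on {-0.75, 0.75}: each grid point is
# the mean of its cell; q0 is the reverse kernel q0(xhat0, dx0).
mu0_hat = {-0.75: 0.5, 0.75: 0.5}
q0 = {-0.75: {-1.0: 0.5, -0.5: 0.5}, 0.75: {0.5: 0.5, 1.0: 0.5}}

def q(x):               # martingale kernel: x -> x +/- 1 with prob 1/2
    return {x - 1.0: 0.5, x + 1.0: 0.5}

def q1(x):              # dual splitting kernel on the grid {-2, 0, 2}
    lo, hi = (-2.0, 0.0) if x <= 0.0 else (0.0, 2.0)
    lam = (x - lo) / (hi - lo)
    return {hi: lam, lo: 1.0 - lam}

coupling = defaultdict(float)   # law of (Xhat_0, Xhat_1)
for xh0, w in mu0_hat.items():
    for x0v, p0 in q0[xh0].items():
        for x1v, p in q(x0v).items():
            for xh1, p1 in q1(x1v).items():
                coupling[(xh0, xh1)] += w * p0 * p * p1

# Martingale check: E[Xhat_1 | Xhat_0 = xh0] = xh0 by the chain of
# stationarity properties of the three kernels.
for xh0 in mu0_hat:
    mass = sum(v for (a, _), v in coupling.items() if a == xh0)
    mean = sum(b * v for (a, b), v in coupling.items() if a == xh0) / mass
    print(xh0, mean)
```

The conditional means reproduce the first coordinate exactly, as the proof predicts: the primal reverse kernel, q and the splitting kernel are each conditionally centred.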
Since µ is compactly supported (cid:82) R d ( | x |− K ) + µ ( dx ) = 0 for K largeenough. Then, since R d (cid:51) x (cid:55)→ ( | x |− K ) + is convex and µ ≤ cvx µ , (cid:82) R d ( | x |− K ) + µ ( dx ) = 0 and µ is also compactly supported. With Lemma 2.1 and the continuity and the polynomial growth of c , we deduce that there exists µ ∈ M ( µ , µ ) be such that C ( µ , µ ) = (cid:82) R d × R d c ( x , x ) µ ( dx , dx ).By Lemma 2.2, there exists a sequence ( (cid:98) µ m ) m ∈ N converging weakly to µ as m → + ∞ with (cid:98) µ m ∈M ( (cid:98) µ m , (cid:98) µ m ) for each m ∈ N . By continuity of c and uniform integrability deduced from thepolynomial growth of c combined with Theorems A.1 and A.2 for primal and dual quantizationsrespectively,lim m → + ∞ (cid:90) R d × R d c ( x , x ) (cid:98) µ m ( dx , dx ) = (cid:90) R d × R d c ( x , x ) µ ( dx , dx ) = C ( µ , µ ) . On the other hand, by Lemma 2.1, there exists µ m ∈ M ( (cid:98) µ m , (cid:98) µ m ) such that (cid:90) R d × R d c ( x , x ) µ m ( dx , dx ) = C ( (cid:98) µ m , (cid:98) µ m )so that (cid:82) R d × R d c ( x , x ) µ m ( dx , dx ) ≤ (cid:82) R d × R d c ( x , x ) (cid:98) µ m ( dx , dx ). By the first step in the proofof Lemma 2.1, like any sequence of elements of M ( (cid:98) µ m , (cid:98) µ m ), the sequence ( µ m ) m is tight and thelimit µ of any weakly convergent subsequence still denoted by ( µ m ) m for notational simplicitybelongs to M ( µ , µ ). Moreover, by the above arguments, (cid:90) R d × R d c ( x , x ) µ ( dx , dx ) = lim m → + ∞ (cid:90) R d × R d c ( x , x ) µ m ( dx , dx ) ≤ lim m → + ∞ (cid:90) R d × R d c ( x , x ) (cid:98) µ m ( dx , dx ) = C ( µ , µ )so that µ is a c -minimal martingale coupling between µ and ν . (cid:50) We consider a discrete time family of Markov transitions (cid:0) ( P k ( x, dy ) x ∈ R d ) k =0: n − satisfying amartingale property, namely ∀ k ∈ { , . . . 
, n − 1}, ∀ x ∈ R^d, ∫_{R^d} |y| P_k(x, dy) < +∞ and ∫_{R^d} y P_k(x, dy) = x. (2.11)

Equivalently, we may consider a Markov chain (X_k)_{k=0:n} with transitions (P_k(x, dy))_{x ∈ R^d} as above on the canonical space ((R^d)^{n+1}, Bor(R^d)^{⊗(n+1)}, (P_x)_{x ∈ R^d}) so that E(X_k | F^X_{k−1}) = X_{k−1}, where F^X_{k−1} = σ((X_ℓ)_{ℓ=0:k−1}). It is straightforward that in such a situation X_0 ≤_cvx X_1 ≤_cvx · · · ≤_cvx X_n. A natural question, closely connected with Martingale Optimal Transport (
MOT, see [6, 21]) is to produce “tractable approximations” of the chain (X_k)_{k=0:n} that still satisfy the above convex ordering. For such a discrete time family of Markov transitions (P_k(x, dy))_{x ∈ R^d} satisfying (2.11), there exist measurable functions (G_k : R^d × R^q → R^d)_{k=0:n−1} and independent R^q-valued random vectors (Z_k)_{k=1:n} independent from X_0, such that the sequence (X_k)_{k=0:n} defined inductively by

X_{k+1} = G_k(X_k, Z_{k+1}), k = 0, . . . , n − 1, (2.12)

is a Markov chain with transition kernels (P_k)_{k=0:n−1} (see e.g. Lemma 2.22 in [28] for the particular case q = 1 and (Z_k)_{k=1:n} uniformly distributed on [0,
1] for each k = 1, . . . , n). Condition (2.11) then reads

∀ k = 0, . . . , n − 1, ∀ x ∈ R^d, E|G_k(x, Z_{k+1})| < +∞ and E G_k(x, Z_{k+1}) = x. (2.13)

In what follows, this dynamical formulation as iterated random maps, which appears naturally in any application, will be the starting point of our investigations. We will make various assumptions on the functions G_k and the random vectors Z_k. In most applications that follow we will suppose that (X_k)_{k=0:n} is an (F^X_k)-martingale, which is an assumption more stringent than (2.13) since it also requires that E|X_k| < +∞ for k = 0, . . . , n.

Keep in mind that, as a consequence of the dual stationarity property (2.6), for any random vector (Y, U) ∈ L^∞_{R^{d+1}}(Ω, A, P) with Y ⊥⊥ U, U ∼ U([0,1]), and any grid Γ ⊂ R^d whose convex hull contains Y(Ω),

E(Proj^del_Γ(Y, U) | Y) = Y,

where Proj^del_Γ is defined in Appendix A.2 (see (A.47)). The main geometric properties of dual quantization in connection with the (generalized) Delaunay triangulation and its optimization, when viewed as a function of the grid Γ, are recalled in the Appendix. Hence, at this stage, we know that, in order to dually quantize the chain (X_k)_{k=0:n}, we need exogenous i.i.d. random variables U_k ∼ U([0,1]), k = 1 : n, independent of (Z_k)_{k=1:n}. In this dual quantization of the chain, note that the starting value X_0 will have a special status: it needs not be dually quantized but simply spatially discretized by any tractable means, a natural choice being then to perform on X_0 a primal (optimal) quantization. For every k = 0, . . . , n, we consider grids Γ_k ⊂ R^d satisfying the following inductive property:

(GF) ∀ x ∈ Γ_{k−1}, G_{k−1}(x, Z_k(Ω)) ⊂ conv(Γ_k), k = 1 : n.
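In dimension one, the dual stationarity property recalled above takes a transparent form: a point y is randomly sent to one of its two neighbouring grid points, the upper one with probability proportional to the distance from the lower one, so that the conditional mean given y is exactly y. A minimal sketch (the helper name `dual_project_1d` and the grid are ours; we assume a sorted grid whose convex hull contains y):

```python
def dual_project_1d(y, grid, u):
    """1D dual (Delaunay) quantization: send y to one of its two neighbouring
    grid points, choosing the upper one with probability (y - lo)/(hi - lo),
    so that E[Proj | y] = y (the dual stationarity property)."""
    i = max(j for j in range(len(grid) - 1) if grid[j] <= y)  # cell index
    lo, hi = grid[i], grid[i + 1]
    w = (y - lo) / (hi - lo)  # weight of the upper neighbour
    return hi if u < w else lo

grid = [-2.0, -0.5, 0.0, 1.0, 3.0]
y = 0.7  # lies in the cell [0.0, 1.0]
w = (y - 0.0) / (1.0 - 0.0)
cond_mean = w * 1.0 + (1.0 - w) * 0.0  # exact E[Proj | y] over U ~ U([0,1])
```

With U ∼ U([0,1]) fed into `u`, the conditional expectation given y is w·hi + (1 − w)·lo = y, which is the one-dimensional counterpart of the martingale property used throughout this section.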
It is clear that, by induction on k, such (n + 1)-tuples of grids satisfying (GF) exist as soon as the mappings G_k satisfy

∀ k = 1 : n, ∀ x ∈ R^d, G_{k−1}(x, Z_k(Ω)) is bounded in R^d. (2.14)

Then, we may define by induction

X̂_0 = g(X_0) for some Borel function g : R^d → Γ_0, (2.15)

X̃_k = G_{k−1}(X̂_{k−1}, Z_k) and X̂_k = Proj^del_{Γ_k}(X̃_k, U_k), k = 1 : n, (2.16)

where the sequence (U_k)_{k=1:n} is i.i.d., uniformly distributed on the unit interval and independent of (Z_k)_{k=1:n} and X_0. This definition is consistent since, by induction, all the random vectors X̃_k are bounded.

Proposition 2.2
Assume (GF). Let F_k = σ(X_0, (Z_ℓ, U_ℓ)_{ℓ=1:k}), k = 0, . . . , n, and G_k = σ(X_0, (Z_ℓ, U_ℓ)_{ℓ=1:k−1}, Z_k), k = 1 : n. The sequences (X̂_k)_{k=0:n} and (X̃_k)_{k=1:n} defined by (2.15)-(2.16) are respectively an (F_k)-martingale Markov chain and a (G_k)-martingale Markov chain. Moreover,

X̂_0 ≤_cvx X̃_1 ≤_cvx X̂_1 ≤_cvx · · · ≤_cvx X̃_n ≤_cvx X̂_n.

Proof.
The (F_k)-Markov property is clear since X̂_k = Proj^del_{Γ_k}(G_{k−1}(X̂_{k−1}, Z_k), U_k), k = 1 : n. Likewise, since for k = 2 : n, X̃_k = G_{k−1}(Proj^del_{Γ_{k−1}}(X̃_{k−1}, U_{k−1}), Z_k), (X̃_k)_{k=1:n} is a (G_k)-Markov chain. For k = 1 : n, as U_k ⊥⊥ G_k and X̃_k is G_k-measurable,

E(X̂_k | G_k) = E(Proj^del_{Γ_k}(X̃_k, U_k) | G_k) = [E(Proj^del_{Γ_k}(x, U))]_{|x=X̃_k} = X̃_k. (2.17)

Moreover, E(X̃_k | F_{k−1}) = E(G_{k−1}(X̂_{k−1}, Z_k) | F_{k−1}) = [E(G_{k−1}(x, Z_k))]_{|x=X̂_{k−1}} = X̂_{k−1} as ∫_{R^d} G_{k−1}(x, z) P_{Z_k}(dz) = x for every x ∈ R^d. The convex ordering of the random vectors follows by Jensen's inequality. Since F_{k−1} ⊂ G_k ⊂ F_k, with the tower property of conditional expectation, we also deduce that

E(X̂_k | F_{k−1}) = E(E(X̂_k | G_k) | F_{k−1}) = E(X̃_k | F_{k−1}) = X̂_{k−1}

and, when k ≤ n − 1,

E(X̃_{k+1} | G_k) = E(E(X̃_{k+1} | F_k) | G_k) = E(X̂_k | G_k) = X̃_k. □

Example: ARCH models with bounded innovation.
Let us consider the ARCH model

X_{k+1} = X_k + ϑ_k(X_k) Z_{k+1}, X_0 ∈ L², (2.18)

where the (Borel) functions ϑ_k : R^d → M_{d,q} are locally bounded and the r.v. (Z_k)_{k=1:n} are square integrable, centered and mutually independent (when Cov(Z_k) = I_q for every k = 1 : n, (Z_k)_{k=1:n} is a white noise). If the r.v. Z_k all lie in L^∞_{R^q}(P), then Assumption (GF) is satisfied since G_k(x, z) = x + ϑ_k(x) z. The Euler scheme with Brownian increments of a martingale Brownian diffusion with diffusion coefficient ϑ(t, x) is an ARCH model corresponding to the choice ϑ_k(x) = √(T/n) ϑ(t_k, x) with a Gaussian N(0; I_q)-distributed white noise (Z_k)_{k=1:n}, since

X̄_{k+1} = X̄_k + ϑ(t_k, X̄_k) √(T/n) Z_{k+1}, k = 0, . . . , n − 1, (2.19)

with t_0 = 0, t_k = kT/n and Z_k = √(n/T)(W_{t_k} − W_{t_{k−1}}) ∼ N(0; I_q), k = 1 : n (W is a standard q-dimensional Brownian motion). However, such a Gaussian white noise makes it impossible for assumption (2.14), and in turn (GF), to hold true. This can be fixed if the normalized Brownian increments are replaced by a q-dimensional Rademacher white noise or any other noise having a distribution with compact support in R^q, like e.g. optimal primal (Voronoi) quantizations Ẑ^vor_k of Z_k, k = 1 : n (see Section 3.1.1 further on). Then assumption (GF) is fulfilled and the quantized scheme (2.15)-(2.16) can be designed.

Convergence of (X̂_k)_{k=0:n} toward (X_k)_{k=0:n}

We make an additional assumption on the mappings G_k and the r.v. Z_k, namely a Lipschitz continuity property for the L²-norm:

∀ k = 0, . . . , n − 1, ∃ [G_k]_Lip < +∞, ∀ x, y ∈ R^d, ‖G_k(x, Z_{k+1}) − G_k(y, Z_{k+1})‖_2 ≤ [G_k]_Lip |x − y|. (2.20)

Remark.
Such an assumption is fulfilled by the above ARCH models (2.18) if the “diffusion” coefficients ϑ_k are Lipschitz continuous and the r.v. Z_k are square integrable.

Proposition 2.3 (Quadratic convergence rate)
Let (X_k)_{k=0:n} be a martingale Markov chain defined by (2.12) such that the functions G_k and the innovation sequence (Z_k)_{k=1:n} satisfy (2.13), (2.14) and (2.20), and let (X̂_k)_{k=0:n} be defined by (2.15)-(2.16).

(a) For every k ∈ {0, . . . , n},

‖X̂_k − X_k‖_2 ≤ (Σ_{ℓ=0}^{k} [G_{ℓ:k}]²_Lip ‖X̂_ℓ − X̃_ℓ‖²_2)^{1/2},

with the convention X̃_0 = X_0 and where, for 0 ≤ ℓ ≤ k, [G_{ℓ:k}]_Lip = Π_{i=ℓ+1}^{k} [G_i]_Lip (Π_∅ = 1).

(b) If X̂_0 = Proj^vor_{Γ_0}(X_0) with the grid Γ_0 L²-Voronoi optimal and the grids (Γ_k)_{k=1:n} L²-dually optimal, then, for every k ∈ {0, . . . , n},

‖X̂_k − X_k‖_2 ≤ ([G_{0:k}]²_Lip (C̃^vor_{d,η})² σ²_{2+η}(X_0) N_0^{−2/d} + (C̃^del_{d,η})² Σ_{ℓ=1}^{k} [G_{ℓ:k}]²_Lip σ²_{2+η}(X̃_ℓ) N_ℓ^{−2/d})^{1/2},

where, for any R^d-valued r.v. X, σ_η(X) = inf_{a ∈ R^d} ‖X − a‖_η is the L^η-pseudo-standard deviation of X.

Proof.
Let k ∈ {0, . . . , n − 1}. Since (X̃_{k+1}, X_{k+1}) is G_{k+1}-measurable and, by (2.17), E(X̂_{k+1} | G_{k+1}) = X̃_{k+1}, the two terms in the right-hand side of the decomposition X̂_{k+1} − X_{k+1} = X̂_{k+1} − X̃_{k+1} + X̃_{k+1} − X_{k+1} are orthogonal. This implies

‖X̂_{k+1} − X_{k+1}‖²_2 = ‖X̂_{k+1} − X̃_{k+1}‖²_2 + ‖G_k(X̂_k, Z_{k+1}) − G_k(X_k, Z_{k+1})‖²_2 ≤ ‖X̂_{k+1} − X̃_{k+1}‖²_2 + [G_k]²_Lip ‖X̂_k − X_k‖²_2.

A straightforward backward induction completes the proof of claim (a). (b) This follows from the non-asymptotic bounds for primally and dually optimized grids (Pierce's lemma) recalled respectively in Theorem A.1(b) in Appendix A.1 and Theorem A.2(b) in Appendix A.2. □

Remark.
For this kind of Markov models, a control of the pseudo-L^η-standard deviation is established in Lemma 3.2 of [33] and extended in the proofs of Propositions 2.2 and 2.4 of [32] when, in (2.16), the dual quantization step is replaced by a primal quantization step, still alternating with the transition step. In particular it holds for ARCH models.

An ARCH model evolving according to (2.18) with non-vanishing functions ϑ_k satisfies (2.14) iff the noise (Z_k)_{k=1:n} is compactly supported. To be able to apply dual quantization to ARCH models with non compactly supported noise, we are first going to approximate them by ARCH models with truncated noise. In this section, we provide several examples of such ARCH approximations, analyse the resulting error and give conditions under which the whole path of the approximation is dominated by the path of the original ARCH model for the convex order.

To deal in a tractable way with general ARCH models satisfying the dynamics (2.18) with a sequence of locally bounded coefficients (ϑ_k)_{k=0:n−1} and a general L²-noise (Z_k)_{k=1:n}, a natural idea is simply to approximate the r.v. Z_k by Z̆_k which are bounded functions of Z_k, namely Z̆_k = φ_k(Z_k), in such a way that

(i) E Z̆_k = 0, (ii) ∀ i, j ∈ {1, . . . , q}, Z^i_k − Z̆^i_k ⊥_{L²} Z̆^j_k, k = 1 : n. (3.21)

Note that, although the Z̆_k are independent by construction, the sequence (Z̆_k)_{k=1:n} is not a white noise – except if Z̆_k = Z_k a.s. – due to (ii), since ‖Z_k‖²_2 = ‖Z_k − Z̆_k‖²_2 + ‖Z̆_k‖²_2. In particular ‖Z̆_k‖_2 ≤ ‖Z_k‖_2 with equality iff Z_k = Z̆_k. A sequence satisfying (3.21) will be called a quasi-white noise in what follows and two canonical examples are given just below. Then we define the ARCH model associated to (Z̆_k)_{k=1:n}, still with the diffusion coefficients ϑ_k, by

X̆_{k+1} = X̆_k + ϑ_k(X̆_k) Z̆_{k+1}, k = 0, . . .
, n − 1, X̆_0 = g(X_0), (3.22)

where g : R^d → R^d is a bounded Borel function. It is clear, e.g. by mimicking the proof of Proposition 2.2 (with the same notations), that

(X̆_k)_{k=0:n} is again an F_k-martingale and an F_k-Markov chain. (3.23)

The aim of this section is to control the error induced by the substitution of Z̆_k for Z_k and to give conditions ensuring that if, for k = 1 : n, Z̆_k is dominated by Z_k for the convex order (Z̆_k ≤_cvx Z_k) and g(X_0) ≤_cvx X_0, then X̆_n is dominated by X_n. Below are two typical examples of bounded and dominated approximations of a white noise.

Examples of interest. ▶
Truncated white noise.
Set

Z̆_k = Z_k 1_{{Z_k ∈ A_k}}, k = 1, . . . , n, (3.24)

where A_1, . . . , A_n are compact sets such that

E[Z_k 1_{{Z_k ∈ A_k}}] = 0, k = 1, . . . , n. (3.25)

Such Borel sets A_k are easy to specify when the r.v. Z_k have symmetric (invariant by multiplication by −1) distributions, since balls centered at 0 (or any symmetric set) are admissible. Notice that (3.21)(ii) is satisfied whatever the choice of the sets A_k such that (3.25) holds.

▶ Primal/Voronoi stationary quantization of the white noise.
We replace the white noise by a quantization, usually a Voronoi (primal) one, since the original white noise has no reason to be bounded. Then we set Z̆_k = Ẑ^{vor,Γ_k}_k where Γ_k is a stationary primal/Voronoi quantization grid with size N_k ≥
1. Conditions (3.21)(i)-(ii) follow from the (primal) stationarity property (2.3). In both settings, defining (X̆_k)_{k=0:n} by (3.22), one obtains an approximation of (X_k)_{k=0:n} which is both non-decreasing for the convex order (as a martingale) and dominated by the original ARCH model, under some additional assumptions made precise in Section 3.1.2.

Now we need to estimate the error induced by replacing the original ARCH dynamics (2.18) driven by a true white noise (Z_k)_{k=1:n} by the ARCH model (3.22) driven by a quasi-white noise. To this end, we will first make precise some vector and matrix notions. We equip the space M_{d,q} with the operator norm |||B||| = sup_{|x|≤1} |Bx| where |·| denotes the canonical Euclidean norm. For an M_{d,q}-valued random variable M we denote in short ‖M‖_2 for ‖ |||M||| ‖_2. We will denote by [ϑ]_Lip the Lipschitz coefficient (if finite) of ϑ : (R^d, |·|) → (M_{d,q}, |||·|||). We will also make use of the Fröbenius norm (‖B‖_Fr = √(Tr(BB*)), B ∈ M_{d,q}), which satisfies |||B||| ≤ ‖B‖_Fr ≤ √(d ∧ q) |||B|||. For a Lipschitz continuous function ϑ : R^d → M_{d,q}, we define

c(ϑ) = sup_{x ∈ R^d} |||ϑ(x)|||/(1 + |x|²)^{1/2} ≤ |||ϑ(0)||| + [ϑ]_Lip < +∞,
c_Fr(ϑ) = sup_{x ∈ R^d} ‖ϑ(x)‖_Fr/(1 + |x|²)^{1/2} ≤ ‖ϑ(0)‖_Fr + [ϑ]_{Fr,Lip} < +∞,

where [ϑ]_Lip and [ϑ]_{Fr,Lip} denote the Lipschitz coefficients of ϑ with respect to the operator and the Fröbenius norms respectively. Thus, using that, if ζ ∈ L²_{R^q}(P) is centered with L²-orthogonal components ζ^i, then E|Aζ|² = Σ_{i=1}^{q} (A*A)_{ii} E(ζ^i)², we straightforwardly derive the following inequality:

E|X_{k+1}|² = E|X_k|² + E|ϑ_k(X_k) Z_{k+1}|² = E|X_k|² + E[Tr(ϑ_k ϑ*_k)(X_k)] ≤ (1 + (c_Fr(ϑ_k))²) E|X_k|² + (c_Fr(ϑ_k))².
A standard induction then yields

‖X_k‖²_2 = E|X_k|² ≤ [Π_{ℓ=0}^{k−1} (1 + (c_Fr(ϑ_ℓ))²)] (E|X_0|² + 1) − 1. (3.26)

Proposition 3.1
Assume that all the functions ϑ_k : R^d → M_{d,q}, k = 0 : n − 1, are Lipschitz continuous and that X_0 ∈ L²_{R^d}(P) (e.g. because X_0 = x_0 ∈ R^d). Then, for every k = 0, . . . , n,

‖X_k − X̆_k‖²_2 ≤ ‖X_0 − X̆_0‖²_2 Π_{ℓ=0}^{k−1} (1 + q[ϑ_ℓ]²_Lip) + (1 + ‖X_0‖²_2) Σ_{ℓ=0}^{k−1} [Π_{i=ℓ+1}^{k−1} (1 + q[ϑ_i]²_Lip) Π_{i=0}^{ℓ−1} (1 + (c_Fr(ϑ_i))²)] c(ϑ_ℓ)² ‖Z_{ℓ+1} − Z̆_{ℓ+1}‖²_2
≤ ‖X_0 − X̆_0‖²_2 Π_{ℓ=0}^{k−1} (1 + q[ϑ_ℓ]²_Lip) + (1 + ‖X_0‖²_2) Π_{i=0}^{k−1} (1 + C(ϑ_i)²) Σ_{ℓ=1}^{k} c(ϑ_{ℓ−1})² ‖Z_ℓ − Z̆_ℓ‖²_2, (3.27)

where C(ϑ)² = (q[ϑ]²_Lip) ∨ (c_Fr(ϑ))². Moreover,

‖max_{k=0:n} |X_k − X̆_k|‖_2 ≤ 2 ‖X_n − X̆_n‖_2.

Proof.
Using successively the martingale property of (X_k − X̆_k)_{k=0:n}, then (3.21)(ii) and last ‖Z̆_{k+1}‖²_2 ≤ ‖Z_{k+1}‖²_2 = q, one obtains that, for every k = 0, . . . , n − 1,

‖X̆_{k+1} − X_{k+1}‖²_2 = ‖X̆_k − X_k‖²_2 + ‖ϑ_k(X̆_k) Z̆_{k+1} − ϑ_k(X_k) Z_{k+1}‖²_2
= ‖X̆_k − X_k‖²_2 + ‖(ϑ_k(X̆_k) − ϑ_k(X_k)) Z̆_{k+1}‖²_2 + ‖ϑ_k(X_k)(Z_{k+1} − Z̆_{k+1})‖²_2
≤ ‖X̆_k − X_k‖²_2 + q[ϑ_k]²_Lip ‖X̆_k − X_k‖²_2 + ‖ϑ_k(X_k)(Z_{k+1} − Z̆_{k+1})‖²_2.

Then, using that E|||ϑ_k(X_k)|||² ≤ c(ϑ_k)² (1 + ‖X_k‖²_2) together with (3.26), one concludes by a discrete Gronwall lemma. The last inequality follows from Doob's inequality. □

Remark 3.1 • If, furthermore, the Z̆_k have diagonal covariance matrices, then, for A ∈ M_{d,q},

E|A Z̆_k|² = Σ_{i=1}^{q} (A*A)_{ii} E(Z̆^i_k)² ≤ ‖A‖²_Fr max_{i=1:q} E(Z̆^i_k)² ≤ ‖A‖²_Fr.

Moreover, for i ≠ j and k = 1, . . . , n,

E[(Z̆^i_k − Z^i_k)(Z̆^j_k − Z^j_k)] = E[(Z̆^i_k − Z^i_k) Z̆^j_k] + E[(Z̆^j_k − Z^j_k) Z̆^i_k] + E[Z^i_k Z^j_k − Z̆^i_k Z̆^j_k] = 0,

by (3.21) and since the covariance matrices of both Z_k and Z̆_k are diagonal. Hence, for A ∈ M_{d,q},

E|A(Z_k − Z̆_k)|² ≤ ‖A‖²_Fr max_{i=1:q} E(Z^i_k − Z̆^i_k)².

Consequently,

E|ϑ_k(X̆_k) Z̆_{k+1} − ϑ_k(X_k) Z_{k+1}|² ≤ E‖ϑ_k(X̆_k) − ϑ_k(X_k)‖²_Fr + ‖ ‖ϑ_k(X_k)‖_Fr ‖²_2 max_{i=1:q} E(Z^i_{k+1} − Z̆^i_{k+1})².
Hence (3.27) holds with C(ϑ)² = ([ϑ]²_{Fr,Lip}) ∨ (c_Fr(ϑ))², and with q[ϑ_k]²_Lip and c(ϑ_ℓ)² ‖Z_ℓ − Z̆_ℓ‖²_2 replaced by [ϑ_k]²_{Fr,Lip} and c_Fr(ϑ_ℓ)² max_{i=1:q} E(Z^i_ℓ − Z̆^i_ℓ)² respectively.

• Let us assume that the vectors Z_k have independent coordinates. Then the diagonal covariance matrix condition can be achieved by choosing Z̆^i_k = φ_{k,i}(Z^i_k) in such a way that (3.21)(i) is satisfied and (3.21)(ii) holds for i = j. Then, for i ≠ j, (3.21)(ii) follows from the independence property. As for truncation, this can be done by considering sets of the form A_k = Π_{i=1}^{q} A^i_k such that E[Z^i_k 1_{{Z^i_k ∈ A^i_k}}] = 0, i = 1 : q. As for the quantization based approach, first note that optimal Voronoi quantization usually does not satisfy this property. But this can be achieved by calling upon product quantization, by considering product grids Γ_Z = Π_{1≤i≤q} Γ^i where Γ^i is a Voronoi stationary grid of the i-th marginal Z^i, so that

E(Z^i_k | (Ẑ^{j,vor}_k)_{1≤j≤q}) = E(Z^i_k | Ẑ^{i,vor}_k) = Ẑ^{i,vor}_k where Ẑ^{i,vor}_k = Proj^vor_{Γ^i}(Z^i_k).

One easily derives from this componentwise stationarity property that (3.21) holds true for Z̆_k = (Ẑ^{i,vor}_k)_{i=1:q} = Ẑ^{Γ_Z,vor}_k. Finally note that, when the marginal quantizations Ẑ^{i,vor}_k are L²-optimal, then Ẑ^{Γ_Z,vor}_k is rate optimal and satisfies the universal non-asymptotic upper bound provided by Pierce's Lemma (see the remark following Theorem A.1 in Appendix A.1).

▶ Truncation of the Euler scheme with Gaussian increments
Let h := T/n denote the step of the Euler scheme defined by (2.19). Assume that the diffusion coefficient ϑ(t, x) is Lipschitz continuous in x with constant [ϑ]_Lip uniformly in t ∈ [0, T]. Then ϑ_k(x) = √h ϑ(kT/n, x), k = 0, . . . , n − 1, and Z_k ∼ N(0; I_q). We set A_k = B(0; a) for every k ≥ 1, where a >
0. Then c(ϑ_k)² = h c(ϑ)² and [ϑ_k]²_Lip = h [ϑ]²_Lip, k = 0, . . . , n − 1, so that C(ϑ_k)² = h C(ϑ)², from which we derive

‖X_k − X̆_k‖²_2 ≤ ‖X_0 − X̆_0‖²_2 (1 + q (T/n) [ϑ]²_Lip)^k + (1 + ‖X_0‖²_2) (1 + (T/n) C(ϑ)²)^k (T/n) c(ϑ)² Σ_{ℓ=1}^{k} ‖Z_ℓ 1_{{Z_ℓ ∈ A^c_ℓ}}‖²_2
≤ ‖X_0 − X̆_0‖²_2 e^{q[ϑ]²_Lip kT/n} + (1 + ‖X_0‖²_2) e^{C(ϑ)² kT/n} c(ϑ)² (kT/n) ‖Z 1_{{|Z|≥a}}‖²_2, k = 0, . . . , n,

so that, by Doob's inequality, with obvious notations,

‖max_{k=0:n} |X_k − X̆^a_k|‖²_2 ≤ 4 ‖X_0 − X̆_0‖²_2 e^{q[ϑ]²_Lip T} + 4T (1 + ‖X_0‖²_2) e^{C(ϑ)² T} c(ϑ)² ‖Z 1_{{|Z|≥a}}‖²_2.

Choice of a = a(n). – If q = 1, the tail expectation can be estimated by a straightforward integration by parts: for every a > 0,

E[|Z|² 1_{{|Z|≥a}}] ≤ √(2/π) (a + 1/a) e^{−a²/2}.

If we set a = a_n ≥ √(2c log n) for some c >
0, then

E[|Z|² 1_{{|Z|≥a_n}}] = O(√(log n) n^{−c}) → 0 as n → +∞.

– If q ≥
2, a simple, though sub-optimal, approach is the following: we start from the obvious

E[|Z|² 1_{{|Z|≥a}}] ≤ e^{−λa²} E[|Z|² e^{λ|Z|²}] = e^{−λa²} q E[ζ² e^{λζ²}] (E[e^{λζ²}])^{q−1} = q e^{−λa²} (1 − 2λ)^{−(q+2)/2}, 0 < λ < 1/2,

where ζ ∼ N(0; 1). As soon as a > √(q + 2), the function λ ↦ −λa² − ((q+2)/2) log(1 − 2λ) attains its minimum at λ(a) = (1 − (q+2)/a²)/2 ∈ (0, 1/2). Hence

E[|Z|² 1_{{|Z|≥a}}] ≤ q e^{−a²/2} (e a²/(q + 2))^{(q+2)/2}

and, if a = a_n ≥ √(2c log n) for some c >
0, then

E[|Z|² 1_{{|Z|≥a_n}}] = O((log n)^{(q+2)/2} n^{−c}) → 0 as n → +∞.

▶ Voronoi/primal Quantization of the increments of the Euler scheme
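As a concrete illustration of this route, the one-dimensional building block (a stationary Voronoi quantizer of N(0,1), from which product quantizations of N(0; I_q) can be assembled coordinate by coordinate) can be computed by the classical Lloyd fixed-point iteration. A minimal stdlib-only sketch, using the closed-form Gaussian cell means; the function names, iteration count and initial grid are ours:

```python
import math

def phi(z):  # standard normal density
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def Phi(z):  # standard normal cdf, via erfc for numerical stability
    return 0.5 * math.erfc(-z / math.sqrt(2))

def lloyd_1d_gaussian(n_points, n_iter=500):
    """Lloyd fixed-point iteration for a stationary (Voronoi) quantizer of
    N(0,1): replace each point by the conditional mean of its Voronoi cell,
    using the closed form  E[Z 1_{a<=Z<=b}] = phi(a) - phi(b)."""
    grid = [-2.0 + 4.0 * i / (n_points - 1) for i in range(n_points)]
    for _ in range(n_iter):
        mids = [(grid[i] + grid[i + 1]) / 2 for i in range(n_points - 1)]
        edges = [-math.inf] + mids + [math.inf]
        grid = [
            (phi(edges[i]) - phi(edges[i + 1]))
            / (Phi(edges[i + 1]) - Phi(edges[i]))
            for i in range(n_points)
        ]
    return grid
```

At the fixed point each grid point is the conditional mean of its Voronoi cell, which is exactly the primal stationarity property E(Z | Ẑ^vor) = Ẑ^vor used above to verify (3.21).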
0, it follows from Zador’s Theorem (see Theorem A.1)that, if ˘ Z k are either optimal quantizations of Z k at level N Z or (like in the above remark) a productquantization of optimal quantizations of the marginal, in both cases (cid:13)(cid:13) Z k − ˘ Z k (cid:107) = e ,N Z (cid:0) N (0; I q )) ≤ C q,η σ η ( N (0; I q )) N − /d Z , where σ η ( N (0; I q )) = (cid:16) η/ π q/ S q − Γ (cid:0) η + q + 1 (cid:1)(cid:17) η with Γ( . ) denoting the Euler Γ function and S q − the area of the unit sphere of dimension q − ( ˘ X k ) k =0: n by ( X k ) k =0: n for the convexorder When the functions ϑ k are convex in an appropriate sense and the variables Z k +1 have radialdistributions, then the ARCH model (2.18) dominates all its approximations with truncated whitenoise as established in Propositions 3.3 and 3.5. We start with two lemmas giving conditionsensuring convex ordering between two r.v.. 19 emma 3.1 (Truncation) Let Z ∈ L R q (Ω , A , P ) be a centered random vector. For any Borel set A , let Z A = Z { Z ∈ A } . If E Z A = 0 , then Z A ≤ cvx Z. Proof.
One may restrict to convex functions φ : R^q → R with linear growth (see e.g. [1]) for which φ(Z) ∈ L¹. We may assume w.l.g. that P(Z ∉ A) > 0. Then

E φ(Z) − E φ(Z_A) = E[φ(Z) 1_{{Z ∉ A}}] − φ(0) P(Z ∉ A) = P(Z ∉ A) (E(φ(Z) | Z ∉ A) − φ(0)) ≥ 0,

where the last inequality follows from Jensen's inequality since E(Z | Z ∉ A) = 0. □

Lemma 3.2
Let Z be an integrable R^q-valued r.v. For i = 1 : q, denote by Z^{−i} the subvector obtained by removing the i-th coordinate Z^i from Z.

(i) If, for i = 1 : q, E[Z^i | Z^{−i}] = 0 a.s. (vanishing conditional expectations assumption) and 0 ≤ λ_i ≤ ℓ_i, then Diag(λ_1, . . . , λ_q) Z ≤_cvx Diag(ℓ_1, . . . , ℓ_q) Z, where Diag(λ_1, . . . , λ_q) ∈ M_{q,q} denotes the diagonal matrix with diagonal elements λ_1, . . . , λ_q,

(ii) If, for each i = 1 : q, the conditional laws of Z^i and −Z^i given Z^{−i} coincide a.s. (symmetric conditional laws assumption) and |λ_i| ≤ |ℓ_i|, then Diag(λ_1, . . . , λ_q) Z ≤_cvx Diag(ℓ_1, . . . , ℓ_q) Z,

(iii) If A, B ∈ M_{d,q} and Z has a radial distribution, i.e. for each orthogonal matrix O ∈ M_{q,q}, OZ has the same distribution as Z, then AA* ≤ BB* ⇒ AZ ≤_cvx BZ. If moreover E|Z|² ∈ (0, +∞), then the converse implication holds.

Remark 3.2
The radial distribution assumption implies the symmetric conditional laws assumption (choose the orthogonal transformation which only changes the sign of the i-th coordinate), which, in turn, implies the vanishing conditional expectations assumption. On the other hand, the assumptions on the matrices multiplying Z get weaker from (i) to (iii). When Z follows the radial distribution N(0; I_q) and AA* ≤ BB*, then for ζ ∼ N(0; BB* − AA*) independent of Z, E[AZ + ζ | Z] = AZ, so that AZ ≤_cvx AZ + ζ, and AZ + ζ ∼ N(0; BB*) so that AZ + ζ has the same distribution as BZ. Hence AZ ≤_cvx BZ. This is a simple alternative argument to the one in [15] which has inspired the generalization to any radial distribution below.

Proof. (i) For ψ convex and X an integrable centered random vector, the function u ↦ E ψ(uX) is clearly convex and attains its minimum at u = 0 owing to Jensen's inequality; hence it is non-decreasing on R₊ and non-increasing on R₋. Now, for φ : R^q → R convex with linear growth, repeatedly using this monotonicity property on R₊ together with the vanishing conditional expectations assumption, one obtains

E φ(Diag(λ_1, . . . , λ_q) Z) = E E(φ(Diag(λ_1, . . . , λ_q) Z) | Z^{−1}) ≤ E E(φ(Diag(ℓ_1, λ_2, . . . , λ_q) Z) | Z^{−1})
= E E(φ(Diag(ℓ_1, λ_2, . . . , λ_q) Z) | Z^{−2}) ≤ E E(φ(Diag(ℓ_1, ℓ_2, λ_3, . . . , λ_q) Z) | Z^{−2}) ≤ · · · ≤ E φ(Diag(ℓ_1, . . . , ℓ_q) Z).

By Lemma A.1 in [1] (see also Remark 1.1 p. 2 in [23]), one concludes that Diag(λ_1, . . . , λ_q) Z ≤_cvx Diag(ℓ_1, . . . , ℓ_q) Z.

(ii) Since, under the assumption, Diag(λ_1, . . . , λ_q) Z and Diag(ℓ_1, . . . , ℓ_q) Z respectively have the same distributions as Diag(|λ_1|, . . . , |λ_q|) Z and Diag(|ℓ_1|, . . . , |ℓ_q|) Z, the conclusion follows from (i).

(iii) Step 1. For C ∈ M_{q,q}, the singular value decomposition of C writes C = ODV for matrices
O, D, V ∈ M_{q,q} with O, V orthogonal and D diagonal with nonnegative diagonal elements. One has √(CC*) = ODO* and, if Z has a radial distribution, for any measurable and bounded function φ : R^q → R,

E φ(CZ) = E φ(ODVZ) = E φ(ODO* Z) = E φ(√(CC*) Z),

so that CZ and √(CC*) Z share the same distribution.

Step 2. Let us now assume that the R^q-valued r.v. Z has a radial distribution, AA* ≤ BB* and d = q. We set B_ε = √(BB* + εI_q). We have B_ε^{−1} AA* (B_ε^{−1})* ≤ I_q for ε >
0. One deduces that √(B_ε^{−1} AA* (B_ε^{−1})*) = ODO* for matrices O, D ∈ M_{q,q} with O orthogonal and D diagonal with diagonal elements belonging to [0, 1]. For φ : R^d → R convex with linear growth, the function ψ(x) = φ(B_ε O x) is convex with linear growth and

E φ(AZ) = E ψ(O* B_ε^{−1} A Z) = E ψ(O* ODO* Z) = E ψ(DZ) ≤ E ψ(Z) = E φ(B_ε O Z) = E φ(B_ε Z),

where we used the definition of ψ for the first and fourth equalities, Step 1 for the second equality, the radial property of the distribution of Z for the third and fifth equalities and (i) for the inequality. One has lim_{ε→0} B_ε = √(BB*), so that by Lebesgue's theorem and Step 1, lim_{ε→0} E φ(B_ε Z) = E φ(√(BB*) Z) = E φ(BZ). We deduce that E φ(AZ) ≤ E φ(BZ), so that AZ ≤_cvx BZ.

Step 3. Let us now assume that Z has a radial distribution, AA* ≤ BB* and d < q. Let Ã, B̃ ∈ M_{q,q} be defined by

(Ã_{ij}, B̃_{ij}) = (A_{ij}, B_{ij}) for i = 1 : d, j = 1 : q, and (0,
0) for i = d + 1 : q, j = 1 : q. We have Ã Ã* ≤ B̃ B̃*, so that, by Step 2, ÃZ ≤_cvx B̃Z. For M ∈ M_{d,q} with only non-zero coefficients M_{ii} = 1 for i = 1 : d, we have AZ = M ÃZ and BZ = M B̃Z. Since for any convex function φ : R^d → R, R^q ∋ x ↦ φ(Mx) is convex as the composition of a convex function with a linear function, we conclude that AZ ≤_cvx BZ.

Step 4. Let us finally assume that Z has a radial distribution, AA* ≤ BB* and d > q. We have Ker B* ⊂ Ker A*, so that Im A ⊂ Im B. Let O ∈ M_{d,q} be a matrix with orthogonal columns with norm one such that the first dim Im B (we have dim Im B ≤ q) columns form an orthonormal basis of Im B. Then B = OO*B and A = OO*A, O*BB*O ≥ O*AA*O and, by Step 2, O*AZ ≤_cvx O*BZ. Since for any convex function φ : R^d → R, R^q ∋ x ↦ φ(Ox) is convex as the composition of a convex function with a linear function, we conclude that AZ = OO*AZ ≤_cvx OO*BZ = BZ.

Step 5. Let us suppose that Z is square integrable with a radial distribution. Then E(Z^i Z^j) = 1_{{i=j}} E|Z|²/q. If A, B ∈ M_{d,q} are such that AZ ≤_cvx BZ, then, for u ∈ R^d, the choice of the convex function φ : x ∈ R^d ↦ (u*x)² in the inequality defining the convex order yields u*AA*u E|Z|²/q ≤ u*BB*u E|Z|²/q. □

Proposition 3.2 (Convex order: from the noise to the ARCH)
Let (Z_k)_{k=1:n} and (Z'_k)_{k=1:n} be two sequences of R^q-valued independent, integrable and centered random vectors. Let (ϑ_k)_{k=0:n−1} and (ϑ'_k)_{k=0:n−1} be two sequences of M_{d,q}-valued functions with linear growth defined on R^d such that ∀ x ∈ R^d, ϑ_k ϑ*_k(x) ≤ ϑ'_k(ϑ'_k)*(x) for k = 0, . . . , n − 1 and, either, for every k = 0, . . . , n − 1,

∀ x, y ∈ R^d, ∀ α ∈ [0, 1], ∃ O = O_{k,x,y,α} ∈ M_{q,q} orthogonal such that
ϑ_k ϑ*_k(αx + (1 − α)y) ≤ (αϑ_k(x) + (1 − α)ϑ_k(y) O)(αϑ_k(x) + (1 − α)ϑ_k(y) O)* (3.28)

and the r.v. Z_{k+1} has a radial distribution (we say that the assumption is satisfied by (Z_{k+1}, ϑ_k, ϑ'_k)_{k=0:n−1}), or, for each k = 0, . . . , n − 1, (3.28) is satisfied with ϑ_k replaced by ϑ'_k and Z'_{k+1} has a radial distribution (we say that the assumption is satisfied by (Z'_{k+1}, ϑ'_k, ϑ_k)_{k=0:n−1}). Let X_0 and X'_0 be integrable R^d-valued r.v. independent of (Z_k)_{k=1:n} and (Z'_k)_{k=1:n} respectively. Denote by (X_k)_{k=0:n} and (X'_k)_{k=0:n} the two ARCH models respectively defined by (2.18) and by X'_{k+1} = X'_k + ϑ'_k(X'_k) Z'_{k+1} for k = 0, . . . , n − 1. If X_0 ≤_cvx X'_0 and Z_k ≤_cvx Z'_k for every k = 1, . . . , n, then (X_k)_{k=0:n} ≤_cvx (X'_k)_{k=0:n}.

Proof.
By the linear growth of the coefficients ϑ_k, the integrability of the initial conditions and the noises and the independence structure, one easily checks by forward induction that X_k and X'_k are integrable for every k = 0, . . . , n. According to Lemma A.1 in [1], it is enough to prove that E Φ_n(X_{0:n}) ≤ E Φ_n(X'_{0:n}) for Φ_n : (R^d)^{n+1} → R convex with linear growth. We proceed by successive backward inductions. We define the functions Φ_k : (R^d)^{k+1} → R, k = 0, . . . , n −
1, by backward induction as follows:

Φ_k(x_{0:k}) = Ψ_k(x_{0:k}, ϑ_k(x_k)), k = 0, . . . , n − 1, where, for (x_{0:k}, u) ∈ (R^d)^{k+1} × M_{d,q},
Ψ_k(x_{0:k}, u) = E Φ_{k+1}(x_{0:k}, x_k + u Z_{k+1}), k = 0, . . . , n − 1.

By backward induction, using the integrability of the random variables Z_{k+1} and the linear growth of the function x_k ↦ ϑ_k(x_k), one easily checks that the functions Φ_k and Ψ_k all have linear growth and in particular that the expectation in the definition of Ψ_k makes sense. Notice that, since the law of Z_{k+1} is radial,

∀ (x_{0:k}, u, O) ∈ (R^d)^{k+1} × M_{d,q} × M_{q,q} with O orthogonal, Ψ_k(x_{0:k}, u) = Ψ_k(x_{0:k}, uO). (3.29)

Starting from Φ'_n = Φ_n, we define the functions Φ'_k, Ψ'_k, k = 0, . . . , n − 1, in the same way with (Z'_k)_{k=1:n} instead of (Z_k)_{k=1:n}. The processes (X_k)_{k=0:n} and (X'_k)_{k=0:n} are (F^Z_k = σ(X_0, (Z_ℓ)_{ℓ=1:k}))_{k=0:n}- and (F^{Z'}_k = σ(X'_0, (Z'_ℓ)_{ℓ=1:k}))_{k=0:n}-Markov chains respectively. It is clear by backward induction that

Φ_k(X_{0:k}) = E(Φ_n(X_{0:n}) | F^Z_k) and Φ'_k(X'_{0:k}) = E(Φ'_n(X'_{0:n}) | F^{Z'}_k), k = 0, . . . , n.

Let us suppose that for each k = 0, . . . , n −
1, (3.28) holds and Z k +1 has a radial distribution. Wefirst check by backward induction that the functionals Φ k are convex. The function Φ n is convexby assumption. If Φ k +1 is convex, by the convexity of R d (cid:51) w (cid:55)→ Φ k +1 ( x k , x k + w ) and Lemma 3.2( iii ), ∀ x k ∈ ( R d ) k +1 , ∀ u, v ∈ M d,q s.t. uu ∗ ≤ vv ∗ , Ψ k ( x k , u ) ≤ Ψ k ( x k , v ) . (3.30)22ith (3.28) then the convexity of Ψ k consequence of the one of Φ k +1 and last (3.29), we deducethat for x k , y k ∈ ( R d ) k +1 and α ∈ [0 , k (cid:0) αx k + (1 − α ) y k (cid:1) = Ψ k (cid:0) αx k + (1 − α ) y k , ϑ k ( αx k + (1 − α ) y k ) (cid:1) ≤ Ψ k (cid:0) αx k + (1 − α ) y k , αϑ k ( x k ) + (1 − α ) ϑ k ( y k ) O k,x k ,y k ,α (cid:1) ≤ α Ψ k (cid:0) x k , ϑ k ( x k ) (cid:1) + (1 − α )Ψ k (cid:0) y k , ϑ k ( y k ) O k,x k ,y k ,α (cid:1) . = α Ψ k (cid:0) x k , ϑ k ( x k ) (cid:1) + (1 − α )Ψ k (cid:0) y k , ϑ k ( y k ) (cid:1) = α Φ k (cid:0) x k (cid:1) + (1 − α )Φ k (cid:0) y k (cid:1) . As a second step, let us prove that Φ (cid:48) k ≥ Φ k , k = 0 , . . . , n , still by backward induction. This istrue for k = n since Φ n = Φ (cid:48) n . Assume Φ (cid:48) k +1 ≥ Φ k +1 . Then,Ψ (cid:48) k ( x k , u ) ≥ E Φ k +1 (cid:0) x k , x k + uZ (cid:48) k +1 (cid:1) . Now, for every ( x ,k , u ) ∈ ( R d ) k +1 × M d,q , the function z (cid:55)→ Φ k +1 (cid:0) x k , x k + u z (cid:1) is convex as thecomposition of a convex function with an affine function. The assumption Z k +1 ≤ cvx Z (cid:48) k +1 impliesthat Ψ (cid:48) k ( x k , u ) ≥ E Φ k +1 (cid:0) x k , x k + uZ k +1 (cid:1) = Ψ k ( x k , u )which in turn ensures, once composed with ϑ (cid:48) k , that Φ (cid:48) k ( x k ) ≥ Ψ k ( x k , ϑ (cid:48) k ( x k )). With the condition ϑ k ϑ ∗ k ≤ ϑ (cid:48) k ( ϑ (cid:48) k ) ∗ and (3.30) we deduce that Φ (cid:48) k ≥ Φ k . 
Since this inequality holds for every $k$, one has in particular $\Phi'_0\ge\Phi_0$ so that
$$\mathbb E\,\Phi'_n\big(X'_{0:n}\big)=\mathbb E\,\Phi'_0(X'_0)\ge\mathbb E\,\Phi_0(X'_0)\ge\mathbb E\,\Phi_0(X_0)=\mathbb E\,\Phi_n\big(X_{0:n}\big),$$
where we used in the last inequality the assumption $X_0\le_{\mathrm{cvx}}X'_0$ and the convexity of $\Phi_0$.

When for each $k=0,\dots,n-1$, (3.28) holds with $\vartheta'_k$ replacing $\vartheta_k$ and $Z'_{k+1}$ has a radial distribution, then we check as above that, for each $k=0,\dots,n-1$, $\Phi'_k$ is convex and that
$$\forall x_{0:k}\in(\mathbb R^d)^{k+1},\ \forall u,v\in\mathcal M_{d,q}\text{ s.t. }uu^*\le vv^*,\quad \Psi'_k(x_{0:k},u)\le\Psi'_k(x_{0:k},v). \qquad (3.31)$$
To deduce by backward induction that $\Phi_k\le\Phi'_k$, $k=0,\dots,n$, we assume $\Phi_{k+1}\le\Phi'_{k+1}$. Then
$$\Psi_k(x_{0:k},u)\le\mathbb E\,\Phi'_{k+1}\big(x_{0:k},x_k+uZ_{k+1}\big)\le\mathbb E\,\Phi'_{k+1}\big(x_{0:k},x_k+uZ'_{k+1}\big)=\Psi'_k(x_{0:k},u),$$
where we used the convexity of $\Phi'_{k+1}$ and $Z_{k+1}\le_{\mathrm{cvx}}Z'_{k+1}$ for the second inequality. By composing with $\vartheta_k$, then using (3.31) with $u=\vartheta_k(x_k)$ and $v=\vartheta'_k(x_k)$ thanks to the condition $\vartheta_k\vartheta_k^*\le\vartheta'_k(\vartheta'_k)^*$, we deduce that
$$\Phi_k(x_{0:k})\le\Psi'_k\big(x_{0:k},\vartheta_k(x_k)\big)\le\Psi'_k\big(x_{0:k},\vartheta'_k(x_k)\big)=\Phi'_k(x_{0:k}).$$
One has in particular $\Phi_0\le\Phi'_0$ so that
$$\mathbb E\,\Phi_n\big(X_{0:n}\big)=\mathbb E\,\Phi_0(X_0)\le\mathbb E\,\Phi'_0(X_0)\le\mathbb E\,\Phi'_0(X'_0)=\mathbb E\,\Phi'_n\big(X'_{0:n}\big),$$
where we used in the last inequality the assumption $X_0\le_{\mathrm{cvx}}X'_0$ and the convexity of $\Phi'_0$. $\square$

This leads to the following result.

Proposition 3.3 (Domination)
Let $(X_k)_{k=0:n}$ be an $\mathbb R^d$-valued ARCH model defined by (2.18) where the $\mathbb R^q$-valued white noise $(Z_k)_{k=1:n}$ is a sequence of integrable r.v. with radial distributions, the initial random vector $X_0$ is integrable and the $\mathcal M_{d,q}$-valued functions $\vartheta_k$, $k=0,\dots,n-1$, are convex in the sense of (3.28) with linear growth. Assume that $g(X_0)\le_{\mathrm{cvx}}X_0$.

(a) Truncation. Let $(\breve Z_k=Z_k^{A_k})_{k=1:n}$ with $(A_k)_{k=1:n}$ an $n$-tuple of Borel sets satisfying $\mathbb E\,Z_k\mathbf 1_{\{Z_k\in A_k\}}=0$, $k=1,\dots,n$, and let $(\breve X^A_k)_{k=0:n}$ be the induced approximating ARCH process defined by (3.22). Then $\breve X^A_n\le_{\mathrm{cvx}}X_n$.

(b) Quantization. Let $(\breve Z_k=\widehat Z^{\,\mathrm{vor}}_k)_{k=1:n}$ be a stationary (Voronoi) quantized approximation of the white noise $(Z_k)_{k=1:n}$ and let $\breve X_n$ be the induced approximating ARCH process defined by (3.22). Then $\breve X_n\le_{\mathrm{cvx}}X_n$.

When $q=d$, and in particular in the one-dimensional case $q=d=1$, we can rely on points (i) and (ii) of Lemma 3.2 in addition to point (iii). This leads to the following relaxed assumption: either, for each $k=0,\dots,n-1$, one of the following conditions holds (we say that the assumption is satisfied by $(Z_{k+1},\vartheta_k,\vartheta'_k)_{k=0:n-1}$):

— $Z_{k+1}$ satisfies the vanishing conditional expectation assumption, $\vartheta_k$ and $\vartheta'_k$ are diagonal, both with non-negative entries, the entries of $\vartheta_k$ being moreover convex, and $\vartheta_k\vartheta_k^*\le\vartheta'_k(\vartheta'_k)^*$ (i.e. $(\vartheta_k)_{ii}\le(\vartheta'_k)_{ii}$, $i=1:d$);

— $Z_{k+1}$ satisfies the symmetric conditional distribution assumption, $\vartheta_k$ and $\vartheta'_k$ are diagonal with the entries of $\vartheta_k$ convex and $\vartheta_k\vartheta_k^*\le\vartheta'_k(\vartheta'_k)^*$ (i.e. $|(\vartheta_k)_{ii}|\le|(\vartheta'_k)_{ii}|$, $i=1:d$);

— $Z_{k+1}$ has a radial distribution, $\vartheta_k$ is convex in the matrix-convexity sense (3.28) and $\vartheta_k\vartheta_k^*\le\vartheta'_k(\vartheta'_k)^*$;

or (assumption satisfied by $(Z'_{k+1},\vartheta'_k,\vartheta_k)_{k=0:n-1}$) for each $k=0,\dots,n-1$, one of the same conditions holds after exchanging the roles of $\vartheta_k$ and $\vartheta'_k$ and replacing $Z_{k+1}$ and $\vartheta_k$ by $Z'_{k+1}$ and $\vartheta'_k$ in the other assertions. Notice that from one item to the next, the assumption on the noise $Z_{k+1}$ becomes stronger whereas the assumption on the coefficient $\vartheta_k$ becomes weaker.

Proposition 3.4 (Convex order: q = d) Let $(Z_k)_{k=1:n}$ and $(Z'_k)_{k=1:n}$ be two sequences of independent integrable and centered $\mathbb R^d$-valued random vectors. Let $(\vartheta_k)_{k=0:n-1}$ and $(\vartheta'_k)_{k=0:n-1}$ be two sequences of $\mathcal M_{d,d}$-valued functions with linear growth defined on $\mathbb R^d$. Let $X_0$ and $X'_0$ be integrable $\mathbb R^d$-valued r.v. independent of $(Z_k)_{k=1:n}$ and $(Z'_k)_{k=1:n}$ respectively. Denote by $(X_k)_{k=0:n}$ and $(X'_k)_{k=0:n}$ the two ARCH models respectively defined by (2.18) and by $X'_{k+1}=X'_k+\vartheta'_k(X'_k)Z'_{k+1}$ for $k=0,\dots,n-1$. Under the assumption stated just before the proposition, if $X_0\le_{\mathrm{cvx}}X'_0$ and $Z_k\le_{\mathrm{cvx}}Z'_k$ for every $k=1:n$, then $(X_k)_{k=0:n}\le_{\mathrm{cvx}}(X'_k)_{k=0:n}$.

Proof.
The proof is formally similar to that of Proposition 3.2 when $(Z_k)_{k=1:n}$ satisfies the radial distribution assumption and $\vartheta_k$ the matrix-convexity assumption. So we simply explain how to adapt the backward induction steps when the assumption before the proposition is satisfied by $(Z_{k+1},\vartheta_k,\vartheta'_k)_{k=0:n-1}$, $Z_{k+1}$ satisfies either the vanishing conditional expectation assumption or the symmetric conditional distribution assumption, and the matrices $\vartheta_k$ and $\vartheta'_k$ are diagonal. Let $\Phi_n:(\mathbb R^d)^{n+1}\to\mathbb R$ be a convex function with linear growth and let $\Phi'_n=\Phi_n$. We define by backward induction the sequence $(\Psi_k,\Phi_k,\Psi'_k,\Phi'_k)_{k=0:n-1}$ by using the formulas at the beginning of the proof of Proposition 3.2 when $Z_{k+1}$ has a radial distribution and otherwise by
$$\Psi_k(x_{0:k},u)=\mathbb E\,\Phi_{k+1}\big(x_{0:k},x_k+\mathrm{Diag}(u)Z_{k+1}\big)\quad\text{and}\quad \Phi_k(x_{0:k})=\Psi_k\big(x_{0:k},\vartheta_k(x_k)\mathbf 1\big),$$
$$\Psi'_k(x_{0:k},u)=\mathbb E\,\Phi'_{k+1}\big(x_{0:k},x_k+\mathrm{Diag}(u)Z'_{k+1}\big)\quad\text{and}\quad \Phi'_k(x_{0:k})=\Psi'_k\big(x_{0:k},\vartheta'_k(x_k)\mathbf 1\big),$$
where $x_{0:k}\in(\mathbb R^d)^{k+1}$, $u\in\mathbb R^d$, $\mathrm{Diag}(u)\in\mathcal M_{d,d}$ denotes the diagonal matrix with diagonal coefficients $\mathrm{Diag}(u)_{ii}=u_i$, $i=1:d$, and $\mathbf 1$ is the vector in $\mathbb R^d$ with all coefficients equal to 1. Note that when $\vartheta_k$ is diagonal, $\mathrm{Diag}(\vartheta_k(x)\mathbf 1)=\vartheta_k(x)$ for all $x\in\mathbb R^d$. If $\Phi_{k+1}$ is convex and $Z_{k+1}$ satisfies the vanishing conditional expectation assumption (or the stronger symmetric conditional distribution assumption), then by the convexity of $\mathbb R^d\ni w\mapsto\Phi_{k+1}(x_{0:k},x_k+w)$ and Lemma 3.2 (i),
$$\forall x_{0:k}\in(\mathbb R^d)^{k+1},\ \forall u,v\in\mathbb R^d_+\text{ s.t. }u_i\le v_i\text{ for }i=1:d,\quad \Psi_k(x_{0:k},u)\le\Psi_k(x_{0:k},v). \qquad (3.32)$$
For $u\in\mathbb R^d$, let us denote by $\mathrm{abs}(u)$ the vector in $\mathbb R^d$ defined by $\mathrm{abs}(u)_i=|u_i|$, $i=1:d$.
Assume moreover either that $\vartheta_k$ is diagonal with nonnegative and convex diagonal coefficients, or that $Z_{k+1}$ satisfies the symmetric conditional distribution assumption, $\vartheta_k$ is diagonal and the absolute values of its diagonal coefficients are convex functions. Then $\Psi_k(x_{0:k},\vartheta_k(x_k)\mathbf 1)=\Psi_k(x_{0:k},\mathrm{abs}(\vartheta_k(x_k)\mathbf 1))$ with the coefficients of $\mathrm{abs}(\vartheta_k(\cdot)\mathbf 1)$ nonnegative and convex. With (3.32), then the convexity of $\Psi_k$ (a consequence of that of $\Phi_{k+1}$), we deduce that for $x_{0:k},y_{0:k}\in(\mathbb R^d)^{k+1}$ and $\alpha\in[0,1]$,
$$\Phi_k\big(\alpha x_{0:k}+(1-\alpha)y_{0:k}\big)=\Psi_k\big(\alpha x_{0:k}+(1-\alpha)y_{0:k},\mathrm{abs}(\vartheta_k(\alpha x_k+(1-\alpha)y_k)\mathbf 1)\big)$$
$$\le\Psi_k\big(\alpha x_{0:k}+(1-\alpha)y_{0:k},\alpha\,\mathrm{abs}(\vartheta_k(x_k)\mathbf 1)+(1-\alpha)\,\mathrm{abs}(\vartheta_k(y_k)\mathbf 1)\big)$$
$$\le\alpha\Psi_k\big(x_{0:k},\mathrm{abs}(\vartheta_k(x_k)\mathbf 1)\big)+(1-\alpha)\Psi_k\big(y_{0:k},\mathrm{abs}(\vartheta_k(y_k)\mathbf 1)\big)=\alpha\Phi_k\big(x_{0:k}\big)+(1-\alpha)\Phi_k\big(y_{0:k}\big).$$
If $\Phi'_{k+1}\ge\Phi_{k+1}$, then $\Psi'_k(x_{0:k},u)\ge\mathbb E\,\Phi_{k+1}\big(x_{0:k},x_k+\mathrm{Diag}(u)Z'_{k+1}\big)$. Now, for every $(x_{0:k},u)\in(\mathbb R^d)^{k+1}\times\mathbb R^d$, the function $z\mapsto\Phi_{k+1}\big(x_{0:k},x_k+\mathrm{Diag}(u)z\big)$ is convex as the composition of a convex function with an affine function. The assumption $Z_{k+1}\le_{\mathrm{cvx}}Z'_{k+1}$ implies that
$$\Psi'_k(x_{0:k},u)\ge\mathbb E\,\Phi_{k+1}\big(x_{0:k},x_k+\mathrm{Diag}(u)Z_{k+1}\big)=\Psi_k(x_{0:k},u),$$
which in turn ensures, once composed with $\vartheta'_k$, that
$$\Phi'_k(x_{0:k})\ge\Psi_k\big(x_{0:k},\vartheta'_k(x_k)\mathbf 1\big)=\Psi_k\big(x_{0:k},\mathrm{abs}(\vartheta'_k(x_k)\mathbf 1)\big).$$
Since the absolute values of the diagonal coefficients of $\vartheta'_k$ are not smaller than those of $\vartheta_k$, we deduce with (3.32) that $\Phi'_k\ge\Phi_k$. $\square$

In the scalar case $q=d=1$, we deduce the following result.

Proposition 3.5 (Scalar setting: d = q = 1) Let $(X_k)_{k=0:n}$ be a scalar ARCH model defined by (2.18) where the white noise $(Z_k)_{k=1:n}$ is scalar but possibly not bounded, and $g(X_0)\le_{\mathrm{cvx}}X_0$ with $X_0$ integrable.
Assume that the functions $|\vartheta_k|$, $k=0,\dots,n-1$, are convex with linear growth and that, for each $k=0,\dots,n-1$, either $\vartheta_k$ is nonnegative or $-Z_{k+1}$ has the same distribution as $Z_{k+1}$.

(a) Truncation. Let $(\breve Z_k=Z_k^{A_k})_{k=1:n}$ with $(A_k)_{k=1:n}$ an $n$-tuple of Borel sets satisfying $\mathbb E\,Z_k\mathbf 1_{\{Z_k\in A_k\}}=0$, $k=1,\dots,n$, and let $(\breve X^A_k)_{k=0:n}$ be the induced approximating ARCH process defined by (3.22). Then $(\breve X^A_k)_{k=0:n}\le_{\mathrm{cvx}}(X_k)_{k=0:n}$.

(b) Voronoi quantization. Let $(\breve Z_k=\widehat Z^{\,\mathrm{vor}}_k)_{k=1:n}$ be a stationary (Voronoi) quantized approximation of the white noise $(Z_k)_{k=1:n}$ and $(\breve X_k)_{k=0:n}$ the induced approximating ARCH process defined by (3.22). Then $(\breve X_k)_{k=0:n}\le_{\mathrm{cvx}}(X_k)_{k=0:n}$.

When $g$ is the nearest neighbour projection on a stationary Voronoi (primal) quantization grid for $X_0$, the hypothesis $g(X_0)\le_{\mathrm{cvx}}X_0$ is satisfied.

Proof. (a) follows from the combination of Lemma 3.1 and Proposition 3.4. (b) follows from the stationarity property, which implies $\widehat Z^{\,\mathrm{vor}}_k=\mathbb E\big(Z_k\,|\,\widehat Z^{\,\mathrm{vor}}_k\big)\le_{\mathrm{cvx}}Z_k$, $k=1:n$, and from Proposition 3.4. $\square$

To approximate the sequence $(\mu_k)_{k=0:n}$ of marginal distributions of the ARCH model (2.18)
$$X_{k+1}=X_k+\vartheta_k(X_k)Z_{k+1}$$
driven by a white noise $(Z_k)_{k=1:n}$, we adopt an ARCH approximation $(\widehat X_k)_{k=0:n}$ based on the dual quantization of the ARCH model
$$\breve X_{k+1}=\breve X_k+\vartheta_k(\breve X_k)\breve Z_{k+1} \qquad (4.33)$$
where $(\breve Z_k)_{k=1:n}$ is a bounded quasi-white noise satisfying (3.21). To be more precise, we start from an approximation $\widehat X_0=g(X_0)$ of $X_0$ (usually a Voronoi quantization of $X_0$) supported by a grid $\Gamma_0$ with $N_0$ points and we assume that each $\widehat X_{k+1}$ is obtained from $\widehat X_k$ by applying a dual quantization on a grid $\Gamma_{k+1}=\{x^{k+1}_i,\ i=1:N_{k+1}\}$ with $N_{k+1}$ points after the above ARCH step with truncated noise $\breve Z_{k+1}$.
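To make the two noise approximations of Proposition 3.5 concrete, here is a small self-contained Python sketch (an illustration with a made-up truncation level and grid, not taken from the paper) for a standard Gaussian noise: a symmetric truncation $\breve Z=Z\mathbf 1_{\{Z\in[-a,a]\}}$ is automatically centered, and a three-cell stationary Voronoi quantization $\widehat Z^{\,\mathrm{vor}}=\mathbb E(Z\,|\,\text{cell})$ is centered by symmetry; both have second moment below $\mathbb E\,Z^2=1$, as the convex order requires for the convex test function $\varphi(z)=z^2$.

```python
import math

def npdf(z):
    """standard normal density (evaluates to 0.0 at +/- infinity)"""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def ncdf(z):
    """standard normal c.d.f. (math.erf handles +/- infinity)"""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# (a) centered truncation Z 1_{Z in [-a, a]}: symmetric, hence E = 0;
# E[Z^2 1_{|Z| <= a}] = P(|Z| <= a) - 2 a pdf(a) by integration by parts.
a = 2.0
m2_trunc = ncdf(a) - ncdf(-a) - 2 * a * npdf(a)

# (b) stationary Voronoi quantization on the cells (-inf,-1], (-1,1], (1,inf):
# each grid point is the conditional mean of its cell, so E[Z | Z_hat] = Z_hat.
cells = [(-math.inf, -1.0), (-1.0, 1.0), (1.0, math.inf)]
probs, centers = [], []
for lo, hi in cells:
    p = ncdf(hi) - ncdf(lo)
    partial = npdf(lo) - npdf(hi)      # E[Z 1_{lo < Z <= hi}]
    probs.append(p)
    centers.append(partial / p)        # conditional mean of the cell
mean_hat = sum(p * c for p, c in zip(probs, centers))
m2_hat = sum(p * c * c for p, c in zip(probs, centers))
```

Both `m2_trunc` and `m2_hat` come out strictly below 1, consistent with $\breve Z\le_{\mathrm{cvx}}Z$ and $\widehat Z^{\,\mathrm{vor}}\le_{\mathrm{cvx}}Z$.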
The resulting dynamics, starting from $\widehat X_0$, reads
$$\widetilde X_{k+1}=\widehat X_k+\vartheta_k(\widehat X_k)\breve Z_{k+1}\quad\text{and}\quad \widehat X_{k+1}=\mathrm{Proj}^{\mathrm{del}}_{\Gamma_{k+1}}\big(\widetilde X_{k+1},U_{k+1}\big),\quad k=0,\dots,n-1, \qquad (4.34)$$
where $(U_k)_{k=1:n}$ is a sequence of independent random variables uniformly distributed on $[0,1]$, independent of $(Z_k,\breve Z_k)_{k=1:n}$ and $X_0$. When the assumption in Proposition 3.2 (or $q=d$ and the assumption before Proposition 3.4) is satisfied by $(Z_{k+1},\vartheta_k,\vartheta_k)_{k=0:n-1}$ or $(\breve Z_{k+1},\vartheta_k,\vartheta_k)_{k=0:n-1}$ and $\breve X_0\le_{\mathrm{cvx}}X_0$, then $(\breve X_k)_{k=0:n}\le_{\mathrm{cvx}}(X_k)_{k=0:n}$. Since each dual quantization step is convex order increasing, $(\widehat X_k)_{k=0:n}$ is, in general, not comparable to the original ARCH model $(X_k)_{k=0:n}$. More precisely, when the assumption in Proposition 3.2 (or $q=d$ and the assumption before Proposition 3.4) is satisfied by $(\breve Z_{k+1},\vartheta_k,\vartheta_k)_{k=0:n-1}$ and $\breve X_0\le_{\mathrm{cvx}}\widehat X_0$, then $(\breve X_k)_{k=0:n}\le_{\mathrm{cvx}}(\widehat X_k)_{k=0:n}$. This can be checked by adapting the proof of Proposition 3.2 or Proposition 3.4: the functions $\Phi_k,\Psi_k$ associated with the ARCH model (4.33) are still convex. Let $\Phi'_k,\Psi'_k$ be defined by the backward induction $\Phi'_n=\Phi_n$ and
$$\Psi'_k(x_{0:k},u)=\mathbb E\,\Phi'_{k+1}\Big(x_{0:k},\mathrm{Proj}^{\mathrm{del}}_{\Gamma_{k+1}}\big(x_k+u\breve Z_{k+1},U_{k+1}\big)\Big),\quad \Phi'_k(x_{0:k})=\Psi'_k\big(x_{0:k},\vartheta_k(x_k)\big),\quad k=0,\dots,n-1.$$
Since, except in the scalar case $d=1$, convexity is not necessarily preserved by the dual quantization step, the convexity of the functions $\Psi'_k,\Phi'_k$ is not clear.
Nevertheless, when $\Phi'_{k+1}\ge\Phi_{k+1}$, by convexity of $\Phi_{k+1}$, independence of $U_{k+1}$ and $\breve Z_{k+1}$, and Jensen's inequality,
$$\Psi'_k(x_{0:k},u)\ge\mathbb E\,\mathbb E\Big(\Phi_{k+1}\big(x_{0:k},\mathrm{Proj}^{\mathrm{del}}_{\Gamma_{k+1}}(x_k+u\breve Z_{k+1},U_{k+1})\big)\,\Big|\,\breve Z_{k+1}\Big)\ge\mathbb E\,\Phi_{k+1}\big(x_{0:k},x_k+u\breve Z_{k+1}\big),$$
from which we can deduce, as in the proof of Proposition 3.4, that $\Phi'_k\ge\Phi_k$. Finally, $\mathbb E\,\Phi_n(\widehat X_{0:n})\ge\mathbb E\,\Phi_n(\breve X_{0:n})$.

Note that the fact that truncation and dual quantization have opposite effects in terms of convex order is not so bad for numerical purposes: the errors coming from these two approximations should, at least partially, compensate.

The monotonicity of the sequence $(\mu_k)_{k=0:n}$ for the convex order is preserved by the approximation: by Proposition 2.2, we have $\widehat X_0\le_{\mathrm{cvx}}\widehat X_1\le_{\mathrm{cvx}}\dots\le_{\mathrm{cvx}}\widehat X_n$. But this monotonicity property is guaranteed only if the laws of the r.v. $\widehat X_k$ are computed exactly.

In this perspective, it is possible to choose a quasi-white noise $(\breve Z_k)_{k=1:n}$ satisfying (3.21) with the additional condition that each $\breve Z_k$, $k=1,\dots,n$, takes finitely many values, say $N^Z_k$. Such a quasi-white noise may be obtained by primal quantization of the original white noise $(Z_k)_{k=1:n}$. Then we may calculate the $N_k\times N^Z_{k+1}$ possible values of $\widetilde X_{k+1}$ to compute its distribution and then that of $\widehat X_{k+1}$. More precisely, if $\widehat X_k$ and $\breve Z_{k+1}$ are respectively distributed according to $\sum_{i=1}^{N_k}\widehat p^{\,k}_i\delta_{x^k_i}$ and $\sum_{j=1}^{N^Z_{k+1}}\breve q^{\,k+1}_j\delta_{z^{k+1}_j}$, then $\widetilde X_{k+1}$ and $\widehat X_{k+1}$ are respectively distributed according to
$$\sum_{i=1}^{N_k}\sum_{j=1}^{N^Z_{k+1}}\widehat p^{\,k}_i\,\breve q^{\,k+1}_j\,\delta_{x^k_i+\vartheta_k(x^k_i)z^{k+1}_j}\quad\text{and}\quad \sum_{\ell=1}^{N_{k+1}}\sum_{i=1}^{N_k}\sum_{j=1}^{N^Z_{k+1}}\widehat p^{\,k}_i\,\breve q^{\,k+1}_j\,\mathbb P\Big(\mathrm{Proj}^{\mathrm{del}}_{\Gamma_{k+1}}\big(x^k_i+\vartheta_k(x^k_i)z^{k+1}_j,U_{k+1}\big)=x^{k+1}_\ell\Big)\,\delta_{x^{k+1}_\ell}.$$
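In dimension $d=1$, $\mathrm{Proj}^{\mathrm{del}}_{\Gamma}(\xi,U)$ sends a point $\xi\in[x_\ell,x_{\ell+1}]$ to $x_\ell$ with probability $(x_{\ell+1}-\xi)/(x_{\ell+1}-x_\ell)$ and to $x_{\ell+1}$ otherwise, so the transition probabilities above are explicit. The following Python sketch (with made-up toy laws, grid and coefficient, purely for illustration) computes the law of $\widehat X_{k+1}$ from the finite supports exactly as in the display above; the weights sum to 1 and the mean of $\widehat X_k$ is preserved, since the noise is centered and the dual projection is mean-preserving.

```python
import bisect

def arch_step_law(p_k, q_noise, theta, grid):
    """One scalar ARCH transition followed by dual quantization on `grid`
    (sorted, with conv(grid) assumed to contain the intermediate support)."""
    law = {}
    for x, px in p_k.items():
        for z, qz in q_noise.items():
            xi = x + theta(x) * z                 # support point of X~_{k+1}
            w = px * qz
            j = bisect.bisect_left(grid, xi)      # grid[j-1] < xi <= grid[j]
            if j == 0:                            # xi coincides with grid[0]
                law[grid[0]] = law.get(grid[0], 0.0) + w
            else:
                lo, hi = grid[j - 1], grid[j]
                lam = (hi - xi) / (hi - lo)       # P(dual projection -> lo)
                law[lo] = law.get(lo, 0.0) + w * lam
                law[hi] = law.get(hi, 0.0) + w * (1 - lam)
    return law

p_k = {0.0: 0.5, 1.0: 0.5}        # made-up law of X^_k
q_noise = {-1.0: 0.5, 1.0: 0.5}   # made-up centered truncated noise
law = arch_step_law(p_k, q_noise, theta=lambda x: 0.5, grid=[-1.0, 0.0, 1.0, 2.0])
```

With these toy inputs the returned law is $\{-1:\tfrac18,\ 0:\tfrac38,\ 1:\tfrac38,\ 2:\tfrac18\}$, whose mean equals $\mathbb E\,\widehat X_k=\tfrac12$.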
While preserving the monotonicity of the sequence $(\breve X_k)_{k=0:n}$ for the convex order, the dual quantization steps, by mapping $\widetilde X_{k+1}$ to $\widehat X_{k+1}=\mathrm{Proj}^{\mathrm{del}}_{\Gamma_{k+1}}(\widetilde X_{k+1},U_{k+1})$, make it possible to control the size (cardinality) $N_{k+1}$ of the support of the approximation of the law of $X_{k+1}$, hence avoiding its explosion with $k$: in contrast, the cardinality of the support of $\breve X_{k+1}$ can be as large as $\prod_{\ell=0}^{k+1}N_\ell$ when $\breve X_0=\widehat X_0$.

In the scalar case $q=d=1$, the finite support property of the truncated white noise is not needed to compute the laws of the r.v. $\widehat X_k$. Let us now explain this for the truncated noise $Z_k\mathbf 1_{\{Z_k\in[\alpha_k,\beta_k]\}}$ before giving general error estimates.

4.1 Scalar setting d = q = 1

Assume for simplicity that all the $\vartheta_k$ are (strictly) positive. Assume that we have closed forms for the c.d.f. and the partial first moment of the white noise:
$$F_k(z):=\mathbb P(Z_k\le z)\quad\text{and}\quad K_k(z):=\mathbb E\,Z_k\mathbf 1_{\{Z_k\le z\}},\quad k=1,\dots,n,$$
and also for the starting value $X_0$ of the chain, denoted by $F_0$ and $K_0$ respectively. Denote by $F_k(z-)$ the left-hand limit at point $z$ of the function $F_k$. We may proceed by (centered) truncation of the noise by considering, for every $k=1,\dots,n$,
$$\breve Z_k=Z_k\mathbf 1_{\{Z_k\in[\alpha_k,\beta_k]\}},\quad \alpha_k<0<\beta_k,$$
(so that $\mathbb E\,Z_k\mathbf 1_{\{Z_k\in[\alpha_k,\beta_k]\}}=0$). The transition kernels $\breve P_k(x,dy)$ of the chain $(\breve X_k)_{k=0:n}$ have c.d.f. $\breve F_k(x,u)$ and partial first moment $\breve K_k(x,u)$ functions given by
$$\breve F_k(x,u):=\mathbb P(\breve X_{k+1}\le u\,|\,\breve X_k=x)=\mathbf 1_{\{x\le u\}}\big(1-F_{k+1}(\beta_{k+1})+F_{k+1}(\alpha_{k+1}-)\big)+F_{k+1}\Big(\alpha_{k+1}\vee\tfrac{u-x}{\vartheta_k(x)}\wedge\beta_{k+1}\Big)-F_{k+1}(\alpha_{k+1}-) \qquad (4.35)$$
and
$$\breve K_k(x,u):=\mathbb E\big(\breve X_{k+1}\mathbf 1_{\{\breve X_{k+1}\le u\}}\,|\,\breve X_k=x\big)=x\,\breve F_k(x,u)+\vartheta_k(x)\Big(K_{k+1}\Big(\alpha_{k+1}\vee\tfrac{u-x}{\vartheta_k(x)}\wedge\beta_{k+1}\Big)-K_{k+1}(\alpha_{k+1}-)\Big). \qquad (4.36)$$

▶ Fixed grids.
For the Voronoi quantization at time $k=0$ associated with the grid $(x^0_i)_{i=1:N_0}$, the weights are
$$\widehat p^{\,0}_i=\mathbb P(\widehat X_0=x^0_i)=F_0(x^0_{i+1/2})-F_0(x^0_{i-1/2}),\quad i=1:N_0,$$
with $x^0_{1/2}=-\infty$, $x^0_{N_0+1/2}=+\infty$ and $x^0_{i+1/2}=\tfrac{x^0_i+x^0_{i+1}}2$, $i=1:N_0-1$. If the grids $\Gamma_k=\{x^k_1,\dots,x^k_{N_k}\}$, $k=1,\dots,n$, that dually quantize $\widetilde X_k$ (from time 1 on) are supposed to be fixed, then one may directly compute the transition weights of $\widehat X_k$, $k=0,\dots,n$, using Equation (A.49) from Appendix A.2, keeping in mind that we have the above closed-form formulas for $\breve F_k$ and $\breve K_k$:
$$\widehat\pi^k_{ij}=\mathbb P\big(\widehat X_{k+1}=x^{k+1}_j\,|\,\widehat X_k=x^k_i\big)=\frac{\breve K_k(x^k_i,x^{k+1}_j)-\breve K_k(x^k_i,x^{k+1}_{j-1})-x^{k+1}_{j-1}\big(\breve F_k(x^k_i,x^{k+1}_j)-\breve F_k(x^k_i,x^{k+1}_{j-1})\big)}{x^{k+1}_j-x^{k+1}_{j-1}}$$
$$+\frac{x^{k+1}_{j+1}\big(\breve F_k(x^k_i,x^{k+1}_{j+1})-\breve F_k(x^k_i,x^{k+1}_j)\big)-\big(\breve K_k(x^k_i,x^{k+1}_{j+1})-\breve K_k(x^k_i,x^{k+1}_j)\big)}{x^{k+1}_{j+1}-x^{k+1}_j}$$
and
$$\widehat p^{\,k+1}_j=\mathbb P\big(\widehat X_{k+1}=x^{k+1}_j\big)=\sum_{i=1}^{N_k}\widehat p^{\,k}_i\,\widehat\pi^k_{ij},\quad j=1:N_{k+1}.$$

▶ Embedded optimization of the grids. One may also perform a step-by-step embedded optimization of the grids $\Gamma_k$ by implementing the standard Lloyd I algorithm to first optimize $\Gamma_0$ with respect to Voronoi (primal) quantization and then optimizing the grids $\Gamma_k$ inductively using the dual fixed point procedure (A.50), called the dual Lloyd procedure. In particular, if we assume that the grid $\Gamma_k=\{x^k_1,\dots,x^k_{N_k}\}$ that dually quantizes $\widetilde X_k$ (into $\widehat X_k$) has been optimized, then the c.d.f. $F_{\widetilde X_{k+1}}$ and first partial moment function $K_{\widetilde X_{k+1}}$ of $\widetilde X_{k+1}$ are given by
$$F_{\widetilde X_{k+1}}(u)=\sum_{j=1}^{N_k}\widehat p^{\,k}_j\,\breve F_k(x^k_j,u)\quad\text{and}\quad K_{\widetilde X_{k+1}}(u)=\sum_{j=1}^{N_k}\widehat p^{\,k}_j\,\breve K_k(x^k_j,u),$$
respectively, where $\breve F_k$ and $\breve K_k$ are given by (4.35) and (4.36).
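Chaining (4.35)-(4.36) with the transition weight formula above gives a fully explicit transition row. The following Python sketch (an illustration with an $\mathcal N(0,1)$ noise, a made-up positive coefficient $\vartheta_k\equiv 0.2$, symmetric truncation bounds $[-2,2]$ and an arbitrary grid, none of which come from the paper) computes $\widehat\pi^k_{i\cdot}$ for one source point $x$; the row sums to 1 and satisfies $\sum_j x^{k+1}_j\widehat\pi^k_{ij}=x$, consistent with the martingale property of the scheme. For the atomless Gaussian law, $F(\alpha-)=F(\alpha)$ and $K(z)=-\varphi(z)$ with $\varphi$ the Gaussian density.

```python
import math

def F(z):   # N(0,1) c.d.f.
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def K(z):   # partial first moment E[Z 1_{Z <= z}] = -density(z) for N(0,1)
    return -math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def F_breve(x, u, theta, alpha, beta):   # eq. (4.35), atomless noise
    clipped = min(max((u - x) / theta(x), alpha), beta)
    return (1.0 if x <= u else 0.0) * (1 - F(beta) + F(alpha)) + F(clipped) - F(alpha)

def K_breve(x, u, theta, alpha, beta):   # eq. (4.36)
    clipped = min(max((u - x) / theta(x), alpha), beta)
    return x * F_breve(x, u, theta, alpha, beta) + theta(x) * (K(clipped) - K(alpha))

def dual_weights(grid, cdf, pmom):
    """Expectations of the piecewise affine dual ('hat') weights: probability
    that the dual projection lands on grid[j], for a law supported in
    [grid[0], grid[-1]] with c.d.f. `cdf` and partial moment `pmom`."""
    w = []
    for j in range(len(grid)):
        wj = 0.0
        if j > 0:                              # cell [x_{j-1}, x_j]
            lo, hi = grid[j - 1], grid[j]
            wj += (pmom(hi) - pmom(lo) - lo * (cdf(hi) - cdf(lo))) / (hi - lo)
        if j < len(grid) - 1:                  # cell [x_j, x_{j+1}]
            lo, hi = grid[j], grid[j + 1]
            wj += (hi * (cdf(hi) - cdf(lo)) - (pmom(hi) - pmom(lo))) / (hi - lo)
        w.append(wj)
    return w

theta = lambda x: 0.2                  # made-up (strictly positive) coefficient
x, alpha, beta = 1.0, -2.0, 2.0
grid = [0.0, 0.5, 1.0, 1.5, 2.0]       # conv(grid) contains x + theta(x) * [alpha, beta]
row = dual_weights(grid,
                   cdf=lambda u: F_breve(x, u, theta, alpha, beta),
                   pmom=lambda u: K_breve(x, u, theta, alpha, beta))
```

The row mean equals $x$ because the centered truncation and the dual projection both preserve expectations, provided the support stays inside the convex hull of the grid.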
One then defines the dual Lloyd mapping $T_{k+1}$ by replacing $F$ and $K$ by $F_{\widetilde X_{k+1}}$ and $K_{\widetilde X_{k+1}}$ in (A.50) and implements the iterative procedure
$$\Gamma^{[\ell+1]}_{k+1}=T_{k+1}\big(\Gamma^{[\ell]}_{k+1}\big),\quad \ell\ge 0,\qquad \mathrm{conv}\big(\Gamma^{[0]}_{k+1}\big)\supset\mathrm{supp}\big(\mathbb P_{\widetilde X_{k+1}}\big).$$
The algorithms permitting the optimization of primal and dual quantization grids in general dimension are respectively discussed in Appendix A.1 and Appendix A.2. For a convenient implementation of the dual quantization of $\widehat X_k+\vartheta_k(\widehat X_k)\breve Z_{k+1}$, one needs to know the affine space spanned by the support of the distribution of this random vector. Let $\breve C_{k+1}$ denote the covariance matrix of $\breve Z_{k+1}$. If for each $x\in\mathbb R^d$ the matrix $\vartheta_k(x)\breve C_{k+1}\vartheta_k^*(x)$ is positive definite (which implies that $q\ge d$), then this affine space is $\mathbb R^d$.

Theorem 4.1
Let $(U_k)_{k=0:n}$ be a sequence of i.i.d. random variables uniformly distributed over $[0,1]$ and let $(Z_k,\breve Z_k)_{k=1:n}$ be an independent sequence of independent square integrable $\mathbb R^{q+q}$-valued random vectors, independent of $X_0$, satisfying (3.21) and such that $(Z_k)_{k=1:n}$ is a white noise. Let $\widehat X_0=g(X_0)$ and let $(\Gamma_k)_{k=1:n}$ be grids satisfying the consistency condition $(\mathcal G_F)$.

(a) General case. Then, for every $k\in\{0,\dots,n\}$,
$$\|\widehat X_k-X_k\|_2^2\le\prod_{i=1}^k\big(1+q[\vartheta_{i-1}]^2_{\mathrm{Lip}}\big)\,\|\widehat X_0-X_0\|_2^2+\sum_{\ell=1}^k\Bigg[\prod_{i=\ell+1}^k\big(1+q[\vartheta_{i-1}]^2_{\mathrm{Lip}}\big)\Bigg]\Big(\|\vartheta_{\ell-1}(X_{\ell-1})\|_2^2\,\big\|Z_\ell-\breve Z_\ell\big\|_2^2+\big\|\widehat X_\ell-\widetilde X_\ell\big\|_2^2\Big). \qquad (4.37)$$

(b) Quantized innovation. Assume $\widehat X_0=\mathrm{Proj}^{\mathrm{vor}}_{\Gamma_0}(X_0)$ with $|\Gamma_0|=N_0$ and that the $\breve Z_k=\widehat Z^{\,\mathrm{vor}}_k=\mathrm{Proj}^{\mathrm{vor}}_{\Gamma^Z_k}(Z_k)$, $k=1,\dots,n$, are all optimal quadratic Voronoi quantizations of $Z_k$ at level $N^Z_k=\mathrm{card}(\Gamma^Z_k)$. Assume $X_0,Z_1,\dots,Z_n\in L^{2+\eta}(\mathbb P)$. Then $X_k\in L^{2+\eta}(\mathbb P)$ for every $k\in\{0,\dots,n\}$ and, for every $k=0,\dots,n$,
$$\|\widehat X_k-X_k\|_2\le\Bigg(\big(C^{\mathrm{vor}}_{d,\eta}\big)^2\prod_{i=1}^k\big(1+q[\vartheta_{i-1}]^2_{\mathrm{Lip}}\big)\frac{\sigma^2_{2+\eta}(X_0)}{N_0^{2/d}}+\sum_{\ell=1}^k\Bigg[\prod_{i=\ell+1}^k\big(1+q[\vartheta_{i-1}]^2_{\mathrm{Lip}}\big)\Bigg]\Bigg[\|\vartheta_{\ell-1}(X_{\ell-1})\|_2^2\,\big(C^{\mathrm{vor}}_{q,\eta}\big)^2\frac{\sigma^2_{2+\eta}(Z_\ell)}{(N^Z_\ell)^{2/q}}+\big(\widetilde C^{\mathrm{del}}_{d,\eta}\big)^2\frac{\sigma^2_{2+\eta}(\widetilde X_\ell)}{N_\ell^{2/d}}\Bigg]\Bigg)^{1/2}.$$

Remark 4.1 One has $\|\vartheta_{\ell-1}(X_{\ell-1})\|_2\le c(\vartheta_{\ell-1})\big(1+\|X_{\ell-1}\|_2\big)$ with $\|X_{\ell-1}\|_2$ bounded from above according to (3.26).
Moreover, if the $\breve Z_k$ have diagonal covariance matrices then, as emphasized in Remark 3.1, each factor $\big(1+q[\vartheta_{i-1}]^2_{\mathrm{Lip}}\big)$ can be replaced by the smaller factor given there.
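The $N^{-1/d}$ rates in claim (b) come from Zador's theorem. In dimension $d=1$ they can be seen by hand: for a uniform law on $[0,1]$ (a toy example, not from the paper), the quadratic Voronoi quantization error of the $N$-point midpoint grid is exactly $1/(2\sqrt3\,N)$, so it halves whenever $N$ doubles. A short Python sketch computing the distortion cell by cell:

```python
def sq_distortion_uniform(N):
    """Exact squared L2 distortion of U[0,1] quantized on the midpoint grid
    {(2i+1)/(2N), i=0..N-1}: sum over the cells [i/N, (i+1)/N] of the
    integral of (x - center)^2 dx."""
    total = 0.0
    for i in range(N):
        a, b = i / N, (i + 1) / N
        c = (a + b) / 2
        total += ((b - c) ** 3 - (a - c) ** 3) / 3   # antiderivative of (x-c)^2
    return total
```

The resulting error $\sqrt{\texttt{sq\_distortion\_uniform}(N)}=1/(2\sqrt3\,N)$ exhibits the $N^{-1/d}$ decay with $d=1$.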
Proof.
By Section 2.4.1, we know that both $(X_k)_{k=0:n}$ and $(\widehat X_k)_{k=0:n}$ are $\sigma\big(X_0,(Z_\ell,U_\ell)_{\ell=1:k}\big)$-martingales so that, on the one hand, $\widehat X_k\le_{\mathrm{cvx}}\widehat X_{k+1}$ for all $k=0,\dots,n-1$. On the other hand, their difference is also a martingale and one derives from the decomposition
$$\widehat X_{k+1}-X_{k+1}=\widehat X_k-X_k+\mathrm{Proj}^{\mathrm{del}}_{\Gamma_{k+1}}\big(\widetilde X_{k+1},U_{k+1}\big)-\widetilde X_{k+1}+\big(\vartheta_k(\widehat X_k)-\vartheta_k(X_k)\big)\breve Z_{k+1}+\vartheta_k(X_k)\big(\breve Z_{k+1}-Z_{k+1}\big)$$
that
$$\|\widehat X_{k+1}-X_{k+1}\|_2^2=\|\widehat X_k-X_k\|_2^2+\Big\|\mathrm{Proj}^{\mathrm{del}}_{\Gamma_{k+1}}\big(\widetilde X_{k+1},U_{k+1}\big)-\widetilde X_{k+1}+\big(\vartheta_k(\widehat X_k)-\vartheta_k(X_k)\big)\breve Z_{k+1}+\vartheta_k(X_k)\big(\breve Z_{k+1}-Z_{k+1}\big)\Big\|_2^2.$$
As the r.v. $U_{k+1}$ is independent of $(\widetilde X_{k+1},\widehat X_k,X_k,Z_{k+1},\breve Z_{k+1})$, it follows from the dual stationarity property that
$$\|\widehat X_{k+1}-X_{k+1}\|_2^2=\|\widehat X_k-X_k\|_2^2+\big\|\mathrm{Proj}^{\mathrm{del}}_{\Gamma_{k+1}}\big(\widetilde X_{k+1},U_{k+1}\big)-\widetilde X_{k+1}\big\|_2^2+\big\|\big(\vartheta_k(\widehat X_k)-\vartheta_k(X_k)\big)\breve Z_{k+1}+\vartheta_k(X_k)\big(\breve Z_{k+1}-Z_{k+1}\big)\big\|_2^2. \qquad (4.38)$$
Moreover, by (3.21) and the independence of $(Z_{k+1},\breve Z_{k+1})$ and $(X_k,\widehat X_k)$, one has
$$\big\|\big(\vartheta_k(\widehat X_k)-\vartheta_k(X_k)\big)\breve Z_{k+1}+\vartheta_k(X_k)\big(\breve Z_{k+1}-Z_{k+1}\big)\big\|_2^2=\big\|\big(\vartheta_k(\widehat X_k)-\vartheta_k(X_k)\big)\breve Z_{k+1}\big\|_2^2+\big\|\vartheta_k(X_k)\big(\breve Z_{k+1}-Z_{k+1}\big)\big\|_2^2$$
$$\le\big\|\vartheta_k(\widehat X_k)-\vartheta_k(X_k)\big\|_2^2\,\big\|\breve Z_{k+1}\big\|_2^2+\big\|\vartheta_k(X_k)\big\|_2^2\,\big\|Z_{k+1}-\breve Z_{k+1}\big\|_2^2. \qquad (4.39)$$
It follows from (4.38), (4.39), the Lipschitz property of the functions $\vartheta_k$ and the inequality $\|\breve Z_{k+1}\|_2^2\le\|Z_{k+1}\|_2^2=q$ deduced from Condition (3.21)(ii) that
$$\|\widehat X_{k+1}-X_{k+1}\|_2^2\le\|\widehat X_k-X_k\|_2^2\big(1+q[\vartheta_k]^2_{\mathrm{Lip}}\big)+\|\vartheta_k(X_k)\|_2^2\,\|Z_{k+1}-\breve Z_{k+1}\|_2^2+\|\widehat X_{k+1}-\widetilde X_{k+1}\|_2^2.$$
The discrete time Gronwall lemma yields, for every $k=0,\dots,n$,
$$\|\widehat X_k-X_k\|_2^2\le\prod_{i=1}^k\big(1+q[\vartheta_{i-1}]^2_{\mathrm{Lip}}\big)\,\|\widehat X_0-X_0\|_2^2+\sum_{\ell=1}^k\Bigg[\prod_{i=\ell+1}^k\big(1+q[\vartheta_{i-1}]^2_{\mathrm{Lip}}\big)\Bigg]\Big(\|\vartheta_{\ell-1}(X_{\ell-1})\|_2^2\,\big\|Z_\ell-\breve Z_\ell\big\|_2^2+\big\|\widehat X_\ell-\widetilde X_\ell\big\|_2^2\Big). \qquad (4.40)$$

(b) The optimality of the quantizations $\breve Z_k$ implies that the $\breve Z_k$ are Voronoi stationary, which makes $(Z_k,\breve Z_k)$ a martingale coupling, hence satisfying (3.21), for every $k=1,\dots,n$. As $X_0\in L^{2+\eta}_{\mathbb R^d}(\mathbb P)$ and $Z_k\in L^{2+\eta}_{\mathbb R^q}(\mathbb P)$ for $k=1,\dots,n$, the Voronoi (primal) non-asymptotic version of Zador's theorem (see Theorem A.1(b) in Appendix A.1) implies that
$$\big\|X_0-\widehat X_0\big\|_2\le C^{\mathrm{vor}}_{d,\eta}\,\sigma_{2+\eta}(X_0)\,N_0^{-1/d}\quad\text{and}\quad \big\|Z_k-\breve Z_k\big\|_2\le C^{\mathrm{vor}}_{q,\eta}\,\sigma_{2+\eta}(Z_k)\,(N^Z_k)^{-1/q},\quad k=1,\dots,n,$$
where $C^{\mathrm{vor}}_{q,\eta}$ is a positive real constant only depending on the dimension $q$ and on $\eta>0$. Moreover, for every $k=1,\dots,n$, the random variables $\widetilde X_k$ are compactly supported. Hence, owing to the dual form of Zador's theorem (see Appendix A.2, Theorem A.2(b)), there exists a real constant $\widetilde C^{\mathrm{del}}_{d,\eta}\in(0,+\infty)$ such that, for every $k=1,\dots,n$,
$$\big\|\widehat X_k-\widetilde X_k\big\|_2\le\widetilde C^{\mathrm{del}}_{d,\eta}\,\sigma_{2+\eta}(\widetilde X_k)\,N_k^{-1/d}.$$
Plugging these bounds into (4.40) completes the proof. $\square$
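The discrete-time Gronwall step used at the end of part (a), passing from the one-step recursion $a_{k+1}\le(1+c_k)a_k+b_{k+1}$ to the unrolled bound (4.40), can be sanity-checked numerically: with equality in the recursion, the closed form $a_k=\prod_{i=1}^k(1+c_{i-1})\,a_0+\sum_{\ell=1}^k\big[\prod_{i=\ell+1}^k(1+c_{i-1})\big]b_\ell$ reproduces the iterated value exactly. A Python sketch with arbitrary made-up coefficients:

```python
import math

c = [0.3, 0.1, 0.25]   # made-up per-step factors, playing the role of q [theta_k]_Lip^2
b = [0.5, 0.2, 0.4]    # made-up per-step error terms

# one-step recursion a_{k+1} = (1 + c_k) a_k + b_{k+1}, started at a_0 = 1
a = 1.0
for ck, bk in zip(c, b):
    a = (1 + ck) * a + bk

# unrolled (Gronwall) closed form, same shape as (4.40)
k = len(b)
tail_prod = lambda l: math.prod(1 + ci for ci in c[l:])
closed = tail_prod(0) * 1.0 + sum(tail_prod(l + 1) * b[l] for l in range(k))
```

Both quantities coincide (here both equal 3.125), which is exactly the identity underlying the bound.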
Application to the Euler scheme of a Brownian diffusion
The Euler scheme (2.19) is an ARCH model with $\vartheta_k(x)=\sqrt{\tfrac Tn}\,\vartheta(t_k,x)$. If we assume that the diffusion coefficient $\vartheta(t,x)$ is Lipschitz continuous in $x$ uniformly in $t\in[0,T]$ with constant $[\vartheta]_{\mathrm{Lip}}$, then $\max_{0\le k\le n-1}[\vartheta_k]_{\mathrm{Lip}}\le\sqrt{\tfrac Tn}\,[\vartheta]_{\mathrm{Lip}}$. As a consequence, under the assumptions of claim (b) above, for every $k=0,\dots,n$,
$$\big\|\widehat X_k-\bar X_k\big\|_2\le\Bigg(\big(\widetilde C^{\mathrm{vor}}_{d,\eta}\big)^2 e^{q t_k[\vartheta]^2_{\mathrm{Lip}}}\frac{\sigma^2_{2+\eta}(X_0)}{N_0^{2/d}}+\sum_{\ell=1}^k e^{q(t_k-t_\ell)[\vartheta]^2_{\mathrm{Lip}}}\Bigg[\frac Tn\big\|\vartheta(t_{\ell-1},\bar X_{\ell-1})\big\|_2^2\,\big(\widetilde C^{\mathrm{vor}}_{q,\eta}\big)^2\frac{\sigma^2_{2+\eta}(Z_\ell)}{(N^Z_\ell)^{2/q}}+\big(\widetilde C^{\mathrm{del}}_{d,\eta}\big)^2\frac{\sigma^2_{2+\eta}(\widetilde X_\ell)}{N_\ell^{2/d}}\Bigg]\Bigg)^{1/2}.$$
Note that, setting
$$c(\vartheta)=\sup_{(t,x)\in[0,T]\times\mathbb R^d}\frac{|||\vartheta(t,x)|||}{1+|x|}\quad\text{and}\quad c_{\mathrm{Fr}}(\vartheta)=\sup_{(t,x)\in[0,T]\times\mathbb R^d}\frac{\|\vartheta(t,x)\|_{\mathrm{Fr}}}{1+|x|},$$
we have $\max_{0\le k\le n-1}c_{\mathrm{Fr}}(\vartheta_k)\le\sqrt{\tfrac Tn}\,c_{\mathrm{Fr}}(\vartheta)$ so that, according to (3.26), $\|\vartheta(t_k,\bar X_k)\|_2\le c(\vartheta)\,e^{c_{\mathrm{Fr}}(\vartheta)^2 t_k}\big(1+\|X_0\|_2\big)$.

References

[1]
Alfonsi, A., Corbetta, J. and Jourdain, B. (2017). Sampling of probability measures in the convex order by Wasserstein projection, arXiv:1709.05287v2.
[2] Alfonsi, A., Corbetta, J. and Jourdain, B. (2018). Sampling of one-dimensional probability measures in the convex order and computation of robust option price bounds, accepted in International Journal of Theoretical and Applied Finance.
[3] Backhoff-Veraguas, J. and Pammer, G. (2019). Stability of martingale optimal transport and weak optimal transport, arXiv:1904.04171.
[4] Baker, D. (2012). Martingales with specified marginals, PhD thesis, Université Pierre et Marie Curie (Sorbonne Université), Paris, France.
[5] Beiglböck, M., Cox, A. and Huesmann, M. (2017). Optimal transport and Skorokhod embedding. Invent. Math., (2):327-400.
[6] Beiglböck, M., Henry-Labordère, P. and Penkner, F. (2013). Model-independent bounds for option prices - a mass transport approach. Finance Stoch., (3):477-501.
[7] Beiglböck, M. and Juillet, N. (2016). On a problem of optimal transport under marginal martingale constraints. Ann. Probab., (1):42-106.
[8] Beiglböck, M., Nutz, M. and Touzi, N. (2017). Complete duality for martingale optimal transport on the line. Ann. Probab., (1):3038-3074.
[9] Campi, L., Laachir, I. and Martini, C. (2017). Change of numeraire in the two-marginals martingale optimal transport problem. Finance Stoch., (2):471-486.
[10] De March, H. (2018). Entropic approximation for multi-dimensional martingale optimal transport, arXiv:1812.11104.
[11] De March, H. and Touzi, N. (2017). Irreducible convex paving for decomposition of multi-dimensional martingale transport plans, arXiv:1702.08298.
[12] Dolinsky, Y. and Soner, H.M. Robust hedging and martingale optimal transport in continuous time. Probab. Theory Relat. Fields, :391-427.
[13] Du, Q., Emelianenko, M. and Ju, L. (2006). Convergence of the Lloyd algorithm for computing centroidal Voronoi tessellations, SIAM Journal on Numerical Analysis, :102-119.
[14] Emelianenko, M., Ju, L. and Rand, A. (2008). Nondegeneracy and weak global convergence of the Lloyd algorithm in R^d, SIAM Journal on Numerical Analysis, (3):1423-1441.
[15] Fadili, A. and Pagès, G. (2018). Functional convex order for stochastic differential equations and their approximation schemes. Technical report.
[16] Galichon, A., Henry-Labordère, P. and Touzi, N. (2014). A stochastic control approach to no-arbitrage bounds given marginals, with an application to lookback options. Ann. Appl. Probab., (1):312-336.
[17] Ghoussoub, N., Kim, Y.-H. and Lim, T. (2019). Structure of optimal martingale transport plans in general dimensions. Ann. Probab., (1):109-164.
[18] Graf, S. and Luschgy, H. (2000). Foundations of quantization for probability distributions, LNM 1730, Springer, Berlin, 230p.
[19] Guo, G. and Obłój, J. (2017). Computational methods for martingale optimal transport problems, arXiv:1710.07911.
[20] Henry-Labordère, P. (2019). (Martingale) optimal transport and anomaly detection with neural networks: a primal-dual algorithm, arXiv:1904.04546.
[21] Henry-Labordère, P., Tan, X. and Touzi, N. (2016). An explicit martingale version of the one-dimensional Brenier's theorem with full marginals constraint. Stochastic Process. Appl., (9):2800-2834.
[22] Henry-Labordère, P. and Touzi, N. (2016). An explicit martingale version of the one-dimensional Brenier theorem. Finance Stoch., (3):635-668.
[23] Hirsch, F., Profeta, C., Roynette, B. and Yor, M. (2011). Peacocks and associated martingales, with explicit constructions. Bocconi & Springer Series, 3. Springer, Milan; Bocconi University Press, Milan. xxxii+384 pp.
[24] Hobson, D. (1998). Robust hedging of the lookback option. Finance Stoch., :329-347.
[25] Hobson, D. and Klimmek, M. (2015). Robust price bounds for the forward starting straddle. Finance Stoch., (1):189-214.
[26] Hobson, D. and Neuberger, A. (2012). Robust bounds for forward start options. Math. Finance, (1):31-56.
[27] Jourdain, B. and Margheriti, W. (2018). A new family of one-dimensional martingale couplings, arXiv:1808.01390.
[28] Kallenberg, O. (1997). Foundations of modern probability. Probability and its Applications. Springer-Verlag, New York.
[29] Kieffer, J. C. (1982). Exponential rate of convergence for Lloyd's method I. IEEE Trans. Inform. Theory, (2):205-210.
[30] Montes, T. (2019). Quantization based schemes for pricing derivatives: a comparison, PhD thesis, Sorbonne Université.
[31] Pagès, G. (2018). Numerical Probability: an introduction with applications to Finance, Springer-Verlag, xvi+579p.
[32] Pagès, G. and Sagna, A. (2018). Strong and weak error analysis of recursive quantization: a general approach with an application to jump diffusions, submitted.
[33] Pagès, G. and Sagna, A. (2015). Recursive marginal quantization of the Euler scheme of a diffusion process. Appl. Math. Finance, (5):463-498.
[34] Pagès, G. (2015). Introduction to optimal quantization for numerics, ESAIM Proc. & Surveys, :29-79.
[35] Pagès, G. (2016). Convex order for path-dependent derivatives: a dynamic programming approach. Séminaire de Probabilités XLVIII, C. Donati, A. Lejay, A. Rouault eds, LNM 2168, Springer, Cham, 33-96.
[36] Pagès, G. and Wilbertz, B. (2012). Dual quantization for random walks with application to credit derivatives, J. Comp. Finance, (2):33-60.
[37] Pagès, G. and Wilbertz, B. (2012). Intrinsic stationarity for vector quantization: foundation of dual quantization. SIAM J. Numer. Anal., (2):747-780.
[38] Pagès, G. and Wilbertz, B. (2012). Optimal Delaunay and Voronoi quantization schemes for pricing American style options. Numerical Methods in Finance, 171-213, Springer Proc. Math., Springer, Heidelberg.
[39] Pagès, G. and Wilbertz, B. (2018). Sharp rate for the dual quantization problem, Séminaire de Probabilités XLIX, C. Donati, A. Lejay, A. Rouault eds, LNM 2215, Springer, Cham, 119-164.
[40] Pagès, G. and Yu, J. (2016). Pointwise convergence of the Lloyd algorithm in higher dimension, SIAM J. Control Optim., (5):2354-2382.
[41] Rajan, V. T. (1991). Optimality of the Delaunay triangulation in R^d. In SCG'91: Proceedings of the Seventh Annual Symposium on Computational Geometry, 357-363, New York, NY, USA, ACM.
[42]
Strassen, V. (1965). The existence of probability measures with given marginals.
Ann. Math. Statist. , :423–439.[43] Wiesel J. (2019). Continuity of the martingale optimal transport problem on the real line,arXiv:1905.04574.
A Background on (optimal) primal and dual vector quantization
In what follows $\mathbb{R}^d$ is supposed to be equipped with the canonical Euclidean norm. For a more general presentation dealing with an arbitrary norm, see [18] for Voronoi quantization and [37] for Delaunay quantization.

A.1 Optimal Voronoi quantization (primal)
Let $\Gamma = \{x_1, \dots, x_N\} \subset \mathbb{R}^d$ denote a finite subset of size $N$, which we will call a grid. To such a grid we can associate Voronoi diagrams $(C_i(\Gamma))_{i=1:N}$, that is, Borel partitions of $\mathbb{R}^d$ satisfying
\[
\forall\, i \in \{1, \dots, N\}, \quad C_i(\Gamma) \subset \big\{\xi \in \mathbb{R}^d : |\xi - x_i| \le \min_{1\le j\le N} |\xi - x_j|\big\}.
\]
There is a one-to-one correspondence between Voronoi diagrams and Borel nearest neighbour projections, denoted $\mathrm{Proj}_\Gamma$, defined as Borel mappings from $\mathbb{R}^d$ to $\Gamma$ such that
\[
\forall\, \xi \in \mathbb{R}^d, \quad |\xi - \mathrm{Proj}_\Gamma(\xi)| = \mathrm{dist}(\xi, \Gamma).
\]
Indeed, if $\mathrm{Proj}_\Gamma$ is a Borel nearest neighbour projection, then $(\{\mathrm{Proj}_\Gamma = x_i\})_{i=1:N}$ is a Voronoi diagram and, conversely, for any Voronoi diagram $(C_i(\Gamma))_{i=1:N}$,
\[
\mathrm{Proj}_\Gamma = \sum_{i=1}^N x_i\, \mathbf{1}_{C_i(\Gamma)} \tag{A.41}
\]
is a Borel nearest neighbour projection. The elements $C_i(\Gamma)$ of a Voronoi diagram are called Voronoi cells.

We define a Voronoi (or primal) $\Gamma$-quantization of an $\mathbb{R}^d$-valued random vector $X : (\Omega, \mathcal{A}, \mathbb{P}) \to \mathbb{R}^d$ by
\[
\widehat X = \widehat X^\Gamma := \mathrm{Proj}_\Gamma(X) \tag{A.42}
\]
whose distribution is given by $\widehat\mu^\Gamma = \mu \circ \mathrm{Proj}_\Gamma^{-1}$ if $X$ is $\mu$-distributed. If $\mu\big(\bigcup_i \partial C_i(\Gamma)\big) = 0$, then $\widehat\mu^\Gamma$ is unique and all $\Gamma$-quantizations are $\mathbb{P}$-a.s. equal. The mean $L^p$-quantization error induced by $\Gamma$ is defined by
\[
e_p(\Gamma, \mu) = e_p(\Gamma, X) = \big\| \mathrm{dist}(X, \Gamma) \big\|_p = \big\| X - \widehat X^\Gamma \big\|_p
\]
for any Voronoi quantization of $X$ (still $\mu$-distributed). Then one defines, for $p > 0$ and $N \ge 1$, the minimal mean $L^p$-quantization error at level $N$ by
\[
e_{p,N}(\mu) = e_{p,N}(X) = \inf_{\Gamma : |\Gamma| \le N} e_p(\Gamma, X).
\]
If $\mu$ has a finite $p$-th moment, then the above infimum is in fact a minimum and any optimal grid $\Gamma^{(N)}$ solving the above minimization problem has full size $N$ provided the support of $\mu$ has at least $N$ elements (see e.g. Theorem 4.12 in [18] or Theorem 5.1 in [31] among others). The random vector $\widehat X^N = \widehat X^{\Gamma^{(N)}}$ is called an optimal $L^p$-quantization of $X$. Moreover, the optimal quantization $\widehat X^N$ is $\mathbb{P}$-a.s. uniquely defined since one always has $\mu\big(\bigcup_i \partial C_i(\Gamma^{(N)})\big) = 0$ (see Theorem 4.2 in [18]).

Finally, in the quadratic case $p = 2$, any optimal quantization grid $\Gamma^{(N)}$ at level $N$ and its quantization $\widehat X^N$ satisfy (see e.g. [18], [34] or [31], Proposition 5.1 among others) a stationarity (or self-consistency) equation reading
\[
\widehat X^N = \mathbb{E}\big( X \,\big|\, \widehat X^N \big). \tag{A.43}
\]

Quantization rates
Theorem A.1 (Zador's theorem and Pierce's lemma for primal quantization). (a) Zador's theorem for (primal) Voronoi quantization: Let $X \in L^{p+\eta}_{\mathbb{R}^d}(\Omega, \mathcal{A}, \mathbb{P})$, $p, \eta > 0$, be a random vector with distribution $\mathbb{P}_X = \varphi \cdot \lambda_d + \nu_X$, $\nu_X \perp \lambda_d$, where $\lambda_d$ denotes the Lebesgue measure and $\nu_X$ the singular part of the distribution. Then
\[
\lim_{N\to+\infty} N^{1/d}\, e_{p,N}(X) = \widetilde J^{\,\mathrm{vor}}_{d,p} \Big( \int_{\mathbb{R}^d} \varphi^{\frac{d}{d+p}}\, d\lambda_d \Big)^{\frac{d+p}{dp}}
\]
where $\widetilde J^{\,\mathrm{vor}}_{d,p} = \inf_{N\ge 1} N^{1/d}\, e_{p,N}\big(U([0,1]^d)\big)$. When $d = 1$, $\widetilde J^{\,\mathrm{vor}}_{1,p} = \frac{1}{2(p+1)^{1/p}}$.

(b) Non-asymptotic bound (Pierce's lemma): Let $p, \eta > 0$. For every dimension $d \ge 1$, there exists a real constant $\widetilde C^{\,\mathrm{vor}}_{d,\eta,p} > 0$ such that, for every random vector $X : (\Omega, \mathcal{A}, \mathbb{P}) \to \mathbb{R}^d$,
\[
e_{p,N}(X) \le \widetilde C^{\,\mathrm{vor}}_{d,\eta,p}\, N^{-1/d}\, \sigma_{p+\eta}(X)
\]
where, for every $r > 0$, $\sigma_r(X) = \inf_{a\in\mathbb{R}^d} \| X - a \|_r \le +\infty$.

Remark.
Note that if we consider quadratic optimal product quantizations at level $N \ge 1$, that is, solutions – which exist – to the minimization problems
\[
e^{\mathrm{prod}}_{2,N}(\mu) = e^{\mathrm{prod}}_{2,N}(X) = \inf\big\{ e_2(\Gamma, X),\ \Gamma = \Gamma_1 \times \cdots \times \Gamma_d,\ |\Gamma| \le N \big\}, \quad N \ge 1,
\]
then such optimal product grids are still rate optimal and satisfy a universal non-asymptotic Pierce bound, see e.g. [31].

Lloyd's algorithm ($p = 2$). Let $\mu$ be a probability distribution supported by at least $N$ points of $\mathbb{R}^d$, $N \ge 1$. The Lloyd procedure at level $N$ provides a systematic way to make the quadratic primal quantization error decrease. Let $X \in L^2_{\mathbb{R}^d}(\Omega, \mathcal{A}, \mathbb{P})$ be $\mu$-distributed. Starting from a grid $\Gamma^{[0]} \subset \mathbb{R}^d$ with size $N$, we set for every $k \ge 0$
\[
\Gamma^{[k+1]} = \mathbb{E}\big( X \,\big|\, \widehat X^{\Gamma^{[k]}} \big)(\Omega) \quad \text{where} \quad \widehat X^{\Gamma^{[k]}} = \mathrm{Proj}^{\mathrm{vor}}_{\Gamma^{[k]}}(X).
\]
One checks that $\Gamma^{[k]}$ has size $N$ for every $k \ge 0$ and that
\[
\big\| X - \widehat X^{\Gamma^{[k+1]}} \big\|_2 = \big\| \mathrm{dist}(X, \Gamma^{[k+1]}) \big\|_2 \le \big\| X - \mathbb{E}\big( X \,\big|\, \widehat X^{\Gamma^{[k]}} \big) \big\|_2 = \Big( \big\| X - \widehat X^{\Gamma^{[k]}} \big\|_2^2 - \big\| \mathbb{E}\big( X \,\big|\, \widehat X^{\Gamma^{[k]}} \big) - \widehat X^{\Gamma^{[k]}} \big\|_2^2 \Big)^{1/2} \le \big\| X - \widehat X^{\Gamma^{[k]}} \big\|_2.
\]
This does not provide a proof that $\Gamma^{[k]}$ converges to an optimal grid $\Gamma^N$ as $k \to +\infty$. Some results in that direction have been obtained when $X$ has a compact support and the initial grid $\Gamma^{[0]}$ is chosen in an appropriate way (the so-called splitting method). For recent results on this topic, we refer to [13, 14] or [40] and the references therein. Moreover, as presented, the Lloyd procedure appears as a pseudo-algorithm, since computing a conditional expectation is a non-trivial exercise, especially in higher dimension. In its original form, the field of application of Lloyd's algorithm is mainly the one-dimensional framework.

One-dimensional setting ($d = 1$). Assume that the c.d.f. $F(x) = \mu\big((-\infty, x]\big) = \mathbb{P}(X \le x)$ and the partial first moment $K(x) = \int_{(-\infty, x]} \xi\, \mu(d\xi) = \mathbb{E}\big[ X\, \mathbf{1}_{\{X \le x\}} \big]$ both have closed form expressions (such is the case for the normal or the exponential distributions for example). In a one-dimensional setting the Voronoi cells of a grid $\Gamma = \{x_1, \dots, x_N\} \subset \mathbb{R}$ with size $N \ge 1$ read
\[
C_i(\Gamma) = \big( x_{i-1/2},\, x_{i+1/2} \big], \quad i = 1:N,
\]
where $x_{1/2} = -\infty$, $x_{N+1/2} = +\infty$ and $x_{i+1/2} = \frac{x_i + x_{i+1}}{2}$, $i = 1:N-1$. Denoting by $\Gamma^{[\ell]} = \{x^{[\ell]}_1, \dots, x^{[\ell]}_N\}$ the elements of the grid $\Gamma^{[\ell]}$ labelled in increasing order (i.e. so that $x^{[\ell]}_1 < \cdots < x^{[\ell]}_N$), the procedure reads
\[
x^{[\ell+1]}_i = \frac{K\big(x^{[\ell]}_{i+1/2}\big) - K\big(x^{[\ell]}_{i-1/2}\big)}{F\big(x^{[\ell]}_{i+1/2}\big) - F\big(x^{[\ell]}_{i-1/2}\big)}, \quad i = 1:N. \tag{A.44}
\]
If the distribution $\mu$ has a non-piecewise affine log-concave density, then it is proved in [29] that $x^{[\ell]}$ converges toward $x^{[\infty]}$, the unique stationary $N$-quantizer of $\mu$, at an exponential rate. Then one computes the weights of this quantizer by
\[
p^{[\infty]}_i := \mathbb{P}\big( X \in C_i(\Gamma^{[\infty]}) \big) = F\big(x^{[\infty]}_{i+1/2}\big) - F\big(x^{[\infty]}_{i-1/2}\big), \quad i = 1:N.
\]

Higher dimensional setting: the $k$-means algorithm. In higher dimensions no closed forms are available for the Lloyd algorithm and a randomized version of the procedure is required for an easy implementation (in low dimension $d = 2$ or $3$ the algorithm can be implemented by computing all the integrals by cubature formulas using the QHull library; see also [30]). This randomized (approximate) avatar of the original procedure is also known in data science as the $k$-means algorithm. One simulates a large sample of the distribution of $X$ and replaces the distribution $\mu = \mathbb{P}_X$ of $X$ by the induced empirical measure $\widetilde\mu = \frac{1}{M} \sum_{m=1}^M \delta_{X_m}$. Then, the above recursion (A.44) reads
\[
x^{[\ell+1]}_i = \frac{\sum_{1\le m\le M} X_m\, \mathbf{1}_{\{X_m \in C_i(\Gamma^{[\ell]})\}}}{\mathrm{card}\big\{ 1\le m\le M : X_m \in C_i(\Gamma^{[\ell]}) \big\}}, \quad i = 1:N, \ \ell \ge 0, \tag{A.45}
\]
and the weights are given by $p^{[\ell]}_i = \frac{1}{M}\,\mathrm{card}\big\{ 1\le m\le M : X_m \in C_i(\Gamma^{[\ell]}) \big\}$, $i = 1:N$ (they can be computed at the end of the procedure).
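As an illustration, the one-dimensional fixed-point iteration (A.44) can be implemented in a few lines whenever $F$ and $K$ are in closed form. The sketch below (a minimal illustration of ours, not taken from any reference implementation; the function names are hypothetical) runs it for the standard normal distribution, for which $F = \Phi$ and $K(x) = -\varphi_{0,1}(x)$ with $\varphi_{0,1}$ the standard normal density:

```python
import math

def F(x):
    # c.d.f. of the standard normal distribution
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def K(x):
    # partial first moment K(x) = E[X 1_{X<=x}] = -phi(x) for X ~ N(0,1)
    return -math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def lloyd_1d(grid, n_iter=100):
    """Iteration (A.44): x_i <- (K(x_{i+1/2}) - K(x_{i-1/2})) / (F(x_{i+1/2}) - F(x_{i-1/2})).
    Conventions: F(-oo) = K(-oo) = 0, F(+oo) = 1 and K(+oo) = E[X], which is 0 for N(0,1)."""
    x = sorted(grid)
    N = len(x)
    for _ in range(n_iter):
        Fm = [0.0] + [F(0.5 * (x[i] + x[i + 1])) for i in range(N - 1)] + [1.0]
        Km = [0.0] + [K(0.5 * (x[i] + x[i + 1])) for i in range(N - 1)] + [0.0]
        x = [(Km[i + 1] - Km[i]) / (Fm[i + 1] - Fm[i]) for i in range(N)]
    # weights p_i = F(x_{i+1/2}) - F(x_{i-1/2}) of the limiting quantizer
    Fm = [0.0] + [F(0.5 * (x[i] + x[i + 1])) for i in range(N - 1)] + [1.0]
    return x, [Fm[i + 1] - Fm[i] for i in range(N)]

# Level N = 2: the optimal quadratic quantizer of N(0,1) is {-sqrt(2/pi), +sqrt(2/pi)}
x, w = lloyd_1d([-1.0, 1.0])
```

For $N = 2$ the fixed point is reached in one step and returns $\pm\sqrt{2/\pi} \approx \pm 0.7979$ with weights $1/2$ each; for larger $N$ the convergence is exponentially fast by Kieffer's theorem [29], since the Gaussian density is log-concave and not piecewise affine.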
This approach, based on Monte Carlo simulation, makes the computation of optimal quantization grids significantly more time consuming.

A.2 Optimal Delaunay (dual) quantization
Let $X : (\Omega, \mathcal{A}, \mathbb{P}) \to \mathbb{R}^d$ be a random vector lying in $L^\infty(\mathbb{P})$. We will assume for convenience in what follows that the support of its distribution $\mu = \mathbb{P}_X$ spans $\mathbb{R}^d$ as an affine space. Otherwise one may always consider the affine space $A_\mu$ spanned by $\mathrm{supp}(\mu)$ and reduce the problem to the former framework by combining a translation with a change of coordinates into an orthonormal basis of the vector space associated with $A_\mu$. Optimal dual (or Delaunay) quantization relies on the best approximation which can be achieved by a discrete random vector $\widehat X$ that satisfies a certain stationarity assumption on the extended probability space $(\Omega \times \Omega_0, \mathcal{A} \otimes \mathcal{A}_0, \mathbb{P} \otimes \mathbb{P}_0)$, with $(\Omega_0, \mathcal{A}_0, \mathbb{P}_0)$ supporting a random variable uniformly distributed on $[0,1]$: for $p \in [1, +\infty)$,
\[
\forall\, N \ge d+1, \quad d_{p,N}(X) = \inf_{\widehat X} \Big\{ \big\| X - \widehat X \big\|_p : \widehat X : (\Omega \times \Omega_0, \mathcal{A} \otimes \mathcal{A}_0, \mathbb{P} \otimes \mathbb{P}_0) \to \mathbb{R}^d,\ \mathrm{card}\big(\widehat X(\Omega \times \Omega_0)\big) \le N \ \text{and}\ \mathbb{E}(\widehat X \,|\, X) = X \Big\}.
\]
One checks that $d_{p,N}(X)$ only depends on the distribution $\mu$ of $X$ and can subsequently be denoted $d_{p,N}(\mu)$. One shows (see [37]) that, for a given distribution $\mu$ on $(\mathbb{R}^d, \mathcal{B}or(\mathbb{R}^d))$,
\[
d_{p,N}(\mu) = \inf\Big\{ \| \Xi - \xi \|_p,\ (\Xi, \xi) : (\Omega_\Xi, \mathcal{A}_\Xi, \mathbb{P}_\Xi) \to \mathbb{R}^d \times \mathbb{R}^d,\ \Xi \sim \mu,\ \mathbb{E}(\xi \,|\, \Xi) = \Xi,\ \mathrm{card}\big(\xi(\Omega_\Xi)\big) \le N \Big\}. \tag{A.46}
\]
Then (see [37]), one may show that this definition is equivalent to
\[
d_{p,N}(X) = \inf\big\{ \big\| \Delta_p(X; \Gamma) \big\|_p : \mathrm{conv}\big(\mathrm{supp}(\mu)\big) \subset \mathrm{conv}(\Gamma),\ \Gamma \subset \mathbb{R}^d,\ \mathrm{card}(\Gamma) \le N \big\}
\]
where the local dual quantization functional $\Delta_p$ reads, on a given grid $\Gamma$ which contains an affine basis of $\mathbb{R}^d$ (or, equivalently, whose convex hull has a non-empty interior),
\[
\Delta_p(\xi; \Gamma) = \inf_\lambda \Big\{ \Big( \sum_{i=1}^N \lambda_i\, |\xi - x_i|^p \Big)^{1/p} : (\lambda_i)_{i=1:N} \in [0,1]^N \ \text{and}\ \sum_{i=1}^N \lambda_i x_i = \xi,\ \sum_{i=1}^N \lambda_i = 1 \Big\}.
\]
When $p = 2$ (quadratic case), one has the following result about the zones where the infimum $\Delta_p(\xi; \Gamma)$ is attained: if the grid $\Gamma \subset \mathbb{R}^d$ contains an affine basis and its points are in general position – no subset of size $d+2$ lies on the same sphere – then it admits a unique Delaunay triangulation in the following sense (see [41] or, for our setting, Proposition 6 and Theorem 4 in [37]):

1. For every $\xi \in \mathrm{conv}(\Gamma)$, there exists a unique $I = I(\xi) \subset \{1, \dots, N\}$ of cardinality $d+1$ such that
(a) $(x_i)_{i\in I}$ is an affine basis,
(b) $\mathrm{conv}\{x_i,\, i \in I\} \cap \{x_j,\, j \in I^c\} = \emptyset$ (so-called Delaunay property),
(c) $\Delta_p(\xi; \Gamma)$ is attained as a minimum at an $N$-tuple $(\lambda_1, \dots, \lambda_N)$ satisfying the constraints with $\lambda_i = 0$ if $i \notin I$.

2. If $I = I(\xi)$ as above for some $\xi \in \mathrm{conv}(\Gamma)$, then for every $\xi' \in \mathrm{conv}\big(x_i,\, i \in I(\xi)\big)$, $I(\xi') = I(\xi)$.

A collection of simplexes $\mathrm{conv}(x_i)_{i\in I}$, where $I$ is admissible for some $\xi \in \mathrm{conv}(\Gamma)$, is called a triangulation of $\Gamma$. When the points of $\Gamma$ are not in general position, several subsets $I$ of $\{1, \dots, N\}$ can satisfy condition 1. However, if such is the case, $I$ remains admissible for all points $\xi$ in $\mathrm{conv}(x_i,\, i \in I)$. Thus, several triangulations may exist, each one giving rise to its own splitting operator (see (A.47) below). A typical example is a rectangle split by one of its two diagonals, which yields two triangulations, one for each diagonal.

It was proved in [37] that for such grids we can construct a dual quantization projection (or splitting operator) which is the counterpart of the nearest neighbour projection for Voronoi quantization.
This operator maps the random variable $X$ randomly to the vertices of the Delaunay "hyper-triangle" (in fact a $d$-simplex) in which $X$ falls (see Figure 1 further on), where the probability of mapping/projecting $X$ to a given vertex $t_i$ is determined by the $i$-th barycentric coordinate of $X$ in the (non-degenerate) "hyper-triangle" (or $d$-simplex) $\mathrm{conv}\{t_j : j = 1, \dots, d+1\}$. When $p \ne 2$, an extension of the notion of Delaunay "triangulation" can still be defined, although it is slightly more involved (similarly, the Voronoi cells are no longer convex when $p \ne 2$). We refer again to [37] for details.

Mathematically speaking, let $(D_k(\Gamma))_{1\le k\le m}$ be a Delaunay partition of the convex hull $\mathrm{conv}(\Gamma)$ of $\Gamma$. Let us denote by $\lambda^k(\xi)$ the barycentric coordinates of $\xi$ in the simplex $D_k(\Gamma)$, with the convention $\lambda^k_i(\xi) = 0$ if $x_i \notin D_k(\Gamma)$. We define the dual (or Delaunay) projection operator – also called splitting operator – by
\[
\mathrm{Proj}^{\mathrm{del}}_\Gamma(\xi, u) = \sum_{k=1}^m \Big[ \sum_{i=1}^N x_i\, \mathbf{1}_{\big\{ \sum_{j=1}^{i-1} \lambda^k_j(\xi) \,\le\, u \,<\, \sum_{j=1}^{i} \lambda^k_j(\xi) \big\}} \Big] \mathbf{1}_{D_k(\Gamma)}(\xi). \tag{A.47}
\]

Figure 1 – Voronoi (left) and Delaunay (right) projections of the realization $X(\omega)$.

Note that in [37] this projection is denoted $J^u_\Gamma$ (this change is motivated by notational consistency). It is clear that, by construction,
\[
\forall\, \xi \in \mathrm{conv}(\Gamma), \quad \int_0^1 \mathrm{Proj}^{\mathrm{del}}_\Gamma(\xi, u)\, du = \xi.
\]
Moreover, it follows from (A.47) that
\[
\Delta_p(\xi; \Gamma) = \Big( \mathbb{E}_{\mathbb{P}_0} \big| \xi - \mathrm{Proj}^{\mathrm{del}}_\Gamma(\xi, U) \big|^p \Big)^{1/p},
\]
where $U$ is defined on $(\Omega_0, \mathcal{A}_0, \mathbb{P}_0)$ and is $U([0,1])$-distributed (so that the operator $\mathrm{Proj}^{\mathrm{del}}_\Gamma(\xi, u)$ is defined on this exogenous space). Then we define (on the product probability space $(\widetilde\Omega, \widetilde{\mathcal{A}}, \widetilde{\mathbb{P}})$) the dual (or Delaunay) quantization
\[
\widehat X^{\Gamma, \mathrm{dual}} := \mathrm{Proj}^{\mathrm{del}}_\Gamma(X, U)
\]
so that
\[
\big\| \Delta_p(X; \Gamma) \big\|_p = \big\| X - \widehat X^{\Gamma, \mathrm{dual}} \big\|_p \quad \text{and} \quad \mathbb{E}\big( \widehat X^{\Gamma, \mathrm{dual}} \,\big|\, X \big) = X.
\]

Remark. $L^p$-dual quantization can be extended in a canonical way to $L^p(\mathbb{P})$-integrable random vectors by defining in a proper way the splitting operator outside the convex hull of the grid $\Gamma$. Unfortunately, as expected, the dual stationarity property is not preserved by this extension.

Optimal $L^p$-dual quantizers (existence). It is shown in [37] that, for every integer $N \ge d+1$, there exists at least one optimal dual quantizer $\Gamma^{(N),\mathrm{del}}$ at level $N$ which achieves the infimum $d_{p,N}(X)$, and any such optimal dual quantizer has cardinality $N$. Furthermore, $d_{p,N}(X) \to 0$ as $N \to +\infty$. We recall below the main result on the convergence rate of dual quantization for bounded random vectors established in [39].

Theorem A.2 (Zador's theorem and Pierce's lemma for dual quantization). (a) Zador's theorem for dual quantization: Let $X \in L^\infty_{\mathbb{R}^d}(\Omega, \mathcal{A}, \mathbb{P})$ be a bounded random vector with distribution $\mathbb{P}_X = \varphi \cdot \lambda_d + \nu_X$, $\nu_X \perp \lambda_d$, where $\lambda_d$ denotes the Lebesgue measure and $\nu_X$ its singular component.
Then, for every $p \in (0, +\infty)$,
\[
\lim_{N\to+\infty} N^{1/d}\, d_{p,N}(X) = \widetilde J^{\,\mathrm{del}}_{d,p} \Big( \int_{\mathbb{R}^d} \varphi^{\frac{d}{d+p}}\, d\lambda_d \Big)^{\frac{d+p}{dp}}
\]
where $\widetilde J^{\,\mathrm{del}}_{d,p} = \inf_{N\ge 1} N^{1/d}\, d_{p,N}\big(U([0,1]^d)\big) \ge \widetilde J^{\,\mathrm{vor}}_{d,p}$. When $d = 1$, $\widetilde J^{\,\mathrm{del}}_{1,p} = \Big( \frac{2}{(p+1)(p+2)} \Big)^{1/p}$. Hence,
\[
\frac{\widetilde J^{\,\mathrm{del}}_{1,p}}{\widetilde J^{\,\mathrm{vor}}_{1,p}} = \Big( \frac{2^{p+1}}{p+2} \Big)^{1/p} \uparrow 2 \ \text{ as } \ p \uparrow +\infty.
\]

(b) Non-asymptotic bound (Pierce's lemma): Let $p, \eta > 0$. For every dimension $d \ge 1$, there exists a real constant $\widetilde C^{\,\mathrm{del}}_{d,\eta,p} > 0$ such that, for every $L^\infty(\mathbb{P})$-bounded random vector $X : (\Omega, \mathcal{A}, \mathbb{P}) \to \mathbb{R}^d$,
\[
d_{p,N}(X) \le \widetilde C^{\,\mathrm{del}}_{d,\eta,p}\, N^{-1/d}\, \sigma_{p+\eta}(X) \tag{A.48}
\]
where, for every $r > 0$, $\sigma_r(X) = \inf_{a\in\mathbb{R}^d} \| X - a \|_r < +\infty$.

Remark. Note that claim (b) remains true if the support of $\mathbb{P}_X$ does not span $\mathbb{R}^d$ as an affine space but only an affine subspace $A_\mu$ of dimension $d' < d$. However, in that case (A.48) holds with $N^{-1/d'}$, so that the rate $N^{-1/d}$ is suboptimal.

Voronoi versus Delaunay quantization. To illustrate the difference between Voronoi and Delaunay quantization (in the case $d = p = 2$), we compare in Figure 1 the nearest neighbour projection and the dual quantization operator. For a given grid $\Gamma \subset \mathbb{R}^d$, the nearest neighbour projection $\mathrm{Proj}^{\mathrm{vor}}_\Gamma$ maps $X(\omega)$ entirely to the generator of the Voronoi cell $C_i(\Gamma)$ in which $X(\omega)$ falls. By contrast, the Delaunay random splitting operator $\mathrm{Proj}^{\mathrm{del}}_\Gamma$ splits up the "weight" $1$ of $X(\omega)$ across the vertices of the Delaunay triangle in which $X(\omega)$ falls. Since each vertex receives a proportion given by the barycentric coordinate of the point $X(\omega)$ in that specific Delaunay triangle, this splitting operator fulfills a backward interpolation property, i.e. $X(\omega)$ is given by a convex combination of the vertices of the Delaunay triangle. Finally, this property also implies the intrinsic dual stationarity condition
\[
\mathbb{E}\big( \widehat X^{\Gamma, \mathrm{dual}} \,\big|\, X \big) = X.
\]
Note that, by contrast with regular Voronoi quantization, where (A.43) holds only for optimal quadratic grids, this dual stationarity equation is satisfied by any dual quantization grid.
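In dimension one the splitting operator (A.47) reduces to tossing a biased coin between the two grid neighbours of $\xi$, and the dual stationarity can be checked by hand. Below is a minimal sketch of ours (the helper names are hypothetical, not from any reference code), which also evaluates the local quadratic error $\Delta_2(\xi; \Gamma)^2 = (x_{i+1} - \xi)(\xi - x_i)$ for $\xi \in [x_i, x_{i+1}]$:

```python
import bisect

def proj_dual_1d(xi, grid, u):
    """1d splitting operator: send xi in [x_i, x_{i+1}] to x_i with probability
    (x_{i+1} - xi)/(x_{i+1} - x_i), to x_{i+1} otherwise, using the exogenous
    uniform variable u in [0, 1). Assumes min(grid) <= xi <= max(grid)."""
    i = bisect.bisect_right(grid, xi) - 1
    if i >= len(grid) - 1:           # xi sits on the right endpoint of the grid
        return grid[-1]
    lam = (grid[i + 1] - xi) / (grid[i + 1] - grid[i])   # barycentric weight of x_i
    return grid[i] if u < lam else grid[i + 1]

def delta2_1d(xi, grid):
    # local quadratic dual error: Delta_2(xi; Gamma)^2 = (x_{i+1} - xi)(xi - x_i)
    i = bisect.bisect_right(grid, xi) - 1
    if i >= len(grid) - 1:
        return 0.0
    return (grid[i + 1] - xi) * (xi - grid[i])
```

Averaging `proj_dual_1d(xi, grid, u)` over $u$ recovers $\xi$ exactly, whatever the grid, which is precisely the intrinsic dual stationarity $\mathbb{E}(\widehat X^{\Gamma,\mathrm{dual}} \mid X) = X$.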
Remark. For a comparison in one dimension, consider optimal quantizations of $U([0,1])$. The optimal dual quantization grid of $U([0,1])$ at level $N$ is
\[
\Gamma^{(N),\mathrm{del}} = \Big\{ \tfrac{i-1}{N-1} : i = 1, \dots, N \Big\}.
\]
On the other hand, for optimal Voronoi quantization one has
\[
\Gamma^{(N),\mathrm{vor}} = \Big\{ \tfrac{2i-1}{2N} : i = 1, \dots, N \Big\},
\]
so that an optimal Voronoi quantizer of size $N$ is made up of the midpoints of an optimal Delaunay quantizer of size $N+1$. Such a property does not hold for general distributions in arbitrary dimensions.

One dimensional setting (quadratic case)
Dual weights attached to a fixed grid.
Let $\mu$ be a probability distribution such that $\mathrm{conv}\big(\mathrm{supp}(\mu)\big) = [a, b]$, $a, b \in \mathbb{R}$, $a < b$, and let $\Gamma = \{x_1, \dots, x_N\}$ be a grid of size $N$ with $x_1 = a$ and $x_N = b$. We denote by $F$ and $K$ respectively the c.d.f. and the first partial moment function of $\mu$. Starting from the fact that, for every $\xi \in [x_i, x_{i+1}]$,
\[
\xi = \frac{x_{i+1} - \xi}{x_{i+1} - x_i}\, x_i + \frac{\xi - x_i}{x_{i+1} - x_i}\, x_{i+1},
\]
we derive that, for every $i = 1:N$,
\[
p_i(\Gamma) = \int_{(x_{i-1}, x_i]} \frac{\xi - x_{i-1}}{x_i - x_{i-1}}\, \mu(d\xi) + \int_{(x_i, x_{i+1}]} \frac{x_{i+1} - \xi}{x_{i+1} - x_i}\, \mu(d\xi) = \frac{K(x_i) - K(x_{i-1}) - x_{i-1}\big( F(x_i) - F(x_{i-1}) \big)}{x_i - x_{i-1}} + \frac{x_{i+1}\big( F(x_{i+1}) - F(x_i) \big) - \big( K(x_{i+1}) - K(x_i) \big)}{x_{i+1} - x_i}. \tag{A.49}
\]
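For instance, formula (A.49) can be evaluated directly once $F$ and $K$ are known in closed form. The sketch below (our own illustration, with the convention that the boundary terms involving $x_0$ and $x_{N+1}$ vanish; the function names are hypothetical) computes the dual weights of a grid for $U([0,1])$, where $F(x) = x$ and $K(x) = x^2/2$ on $[0,1]$:

```python
def dual_weights(grid, F, K):
    """Dual weights p_i(Gamma) of (A.49); F is the c.d.f. and K the partial
    first moment of mu. Boundary terms with x_0 or x_{N+1} are dropped."""
    x, N, p = grid, len(grid), []
    for i in range(N):
        w = 0.0
        if i > 0:      # mass received from the interval (x_{i-1}, x_i]
            w += (K(x[i]) - K(x[i - 1]) - x[i - 1] * (F(x[i]) - F(x[i - 1]))) / (x[i] - x[i - 1])
        if i < N - 1:  # mass received from the interval (x_i, x_{i+1}]
            w += (x[i + 1] * (F(x[i + 1]) - F(x[i])) - (K(x[i + 1]) - K(x[i]))) / (x[i + 1] - x[i])
        p.append(w)
    return p

# U([0,1]) on the grid {0, 1/2, 1}: weights (1/4, 1/2, 1/4)
p = dual_weights([0.0, 0.5, 1.0], lambda x: x, lambda x: 0.5 * x * x)
```

The weights always sum to $1$, since the two integrals in (A.49) split the mass of each interval $(x_i, x_{i+1}]$ between its two endpoints.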
One computes likewise $d^2_{2,N}(\Gamma, \mu)$:
\[
d^2_{2,N}(\Gamma) = \int_{[a,b]} \mu(d\xi) \int_0^1 du\, \big| \xi - \mathrm{Proj}^{\mathrm{del}}_\Gamma(\xi, u) \big|^2 = \sum_{i=1}^{N-1} \int_{(x_i, x_{i+1}]} \mu(d\xi) \Big[ \frac{x_{i+1} - \xi}{x_{i+1} - x_i}\, (\xi - x_i)^2 + \frac{\xi - x_i}{x_{i+1} - x_i}\, (x_{i+1} - \xi)^2 \Big]
= \sum_{i=1}^{N-1} \int_{(x_i, x_{i+1}]} \mu(d\xi)\, (x_{i+1} - \xi)(\xi - x_i)
= \sum_{i=1}^{N-1} \Big( (x_i + x_{i+1}) \big( K(x_{i+1}) - K(x_i) \big) - x_i x_{i+1} \big( F(x_{i+1}) - F(x_i) \big) \Big) - \int_{\mathbb{R}} \xi^2\, \mu(d\xi).
\]
Then one shows that, viewed as a function of the $N$-tuple $x = (x_1, \dots, x_N)$, the mapping $x \mapsto d^2_{2,N}(x, \mu)$ is continuously differentiable when $F$ is continuous, and otherwise differentiable on the set of vectors $x$ with all coordinates outside the at most countable set of discontinuities of $F$, with
\[
\frac{\partial d^2_{2,N}(x, \mu)}{\partial x_i} = K(x_{i+1}) - K(x_{i-1}) - \Big[ x_{i+1} \big( F(x_{i+1}) - F(x_i) \big) + x_{i-1} \big( F(x_i) - F(x_{i-1}) \big) \Big]
\]
with the convention $F(x_0) = F(x_1)$ and $F(x_{N+1}) = F(x_N)$. As any optimal $N$-tuple satisfies $\nabla d^2_{2,N}(x, \mu) = 0$, elementary computations show that this equation reads $x = T(x) = \big( T_1(x), \dots, T_N(x) \big)$, where the mapping $T$, defined from the simplex $S_{a,b} = \{ a = x_1 < x_2 < \cdots < x_N = b \}$ into itself, is given by
\[
T_i(x) = \frac{K(x_{i+1}) - K(x_i) - (x_{i+1} - x_i)\big( F(x_{i+1}) - F(x_i) \big)}{F(x_{i+1}) - F(x_{i-1})} \tag{A.50}
\]
\[
\phantom{T_i(x) =} + \frac{K(x_i) - K(x_{i-1}) + (x_i - x_{i-1})\big( F(x_i) - F(x_{i-1}) \big)}{F(x_{i+1}) - F(x_{i-1})}, \quad i = 1:N, \tag{A.51}
\]
still with the above convention. From this fixed point equality, one can devise an iterative fixed point procedure which can be seen as the counterpart of the Lloyd I procedure for dual quantization:
\[
x^{[\ell+1]} = T\big( x^{[\ell]} \big), \quad \ell \ge 0, \quad x^{[0]} \in S_{a,b}. \tag{A.52}
\]
Although it turns out to be quite efficient for (truncated) usual distributions like the normal, exponential and $\gamma$ distributions, no theoretical result is available yet to prove its convergence (except for the uniform distribution on the unit interval, which is of no practical interest). In particular, we do not yet have a counterpart of Kieffer's theorem (see [29]), which proves the exponentially fast convergence of the one-dimensional regular "Voronoi" Lloyd procedure for non-piecewise affine log-concave distributions.

Algorithmic aspects in higher dimensions (quadratic setting)
For higher dimensional numerical aspects, we refer to [38], where two stochastic algorithms have been devised to compute optimal dual quantization grids, in the spirit of the randomized avatar of the Lloyd I procedure (fixed point method) and of the CLVQ algorithm (stochastic gradient descent) respectively. Figure 2 displays three examples of dual quantization of 2D random vectors.

Figure 2 – Dual quantizations ($d = 2$). Left: $U([0,1]^2)$, $N = 16$. Middle: truncated $\mathcal{N}(0; I_2)$, $N = 250$. Right: truncated law of $\big( W_1, \sup_{t\in[0,1]} W_t \big)$, $W$ standard Brownian motion, $N = 250$ (with B. Wilbertz).