Adapted Wasserstein Distances and Stability in Mathematical Finance
Julio Backhoff-Veraguas, Daniel Bartl, Mathias Beiglböck, Manu Eder
arXiv [q-fin.MF], May
Abstract.
Assume that an agent models a financial asset through a measure $\mathbb{Q}$ with the goal to price / hedge some derivative or optimize some expected utility. Even if the model $\mathbb{Q}$ is chosen in the most skilful and sophisticated way, she is left with the possibility that $\mathbb{Q}$ does not provide an exact description of reality. This leads us to the following question: will the hedge still be somewhat meaningful for models in the proximity of $\mathbb{Q}$?
If we measure proximity with the usual Wasserstein distance (say), the answer is NO. Models which are similar w.r.t. Wasserstein distance may provide dramatically different information on which to base a hedging strategy.
Remarkably, this can be overcome by considering a suitable adapted version of the Wasserstein distance which takes the temporal structure of pricing models into account. This adapted Wasserstein distance is most closely related to the nested distance as pioneered by Pflug and Pichler [52, 53, 54]. It allows us to establish Lipschitz properties of hedging strategies for semimartingale models in discrete and continuous time. Notably, these abstract results are sharp already for Brownian motion and European call options.
Keywords: Hedging, utility maximization, optimal transport, causal optimal transport, Wasserstein distance, sensitivity, stability.
AMS subject classifications (2010)
1. Introduction
1.1. Outline.
Assume that a reference measure $\mathbb{P}$ is used to model the evolution of a financial asset $X$ with the purpose to hedge a financial claim or to maximize some expected utility. We do not expect that the model $\mathbb{P}$ captures reality in an absolutely accurate way. However, supposing that $\mathbb{P}$ is close enough to reality (described by a probability $\mathbb{Q}$) we would still hope that a strategy which is developed for $\mathbb{P}$ leads to reasonable results.

A main goal of this paper is to establish this intuitive idea rigorously based on a new notion of adapted Wasserstein distance $\mathcal{AW}_p$ between semimartingale measures. To fix ideas, we provide a first example of the results we are after.

Theorem 1.1. Let $\mathbb{P}, \mathbb{Q}$ be continuous semimartingale models for the asset price process $X$, and assume that $C(X)$ denotes an $L$-Lipschitz payoff of a (path-dependent) derivative $C$. Assume that a predictable trading strategy $H = (H_t)_t$, $|H| \le k$, and an initial endowment $m \in \mathbb{R}$ constitute a $\mathbb{P}$-superhedge of $C(X)$, i.e.
$$C(X) \le m + (H \bullet X)_T, \quad \mathbb{P}\text{-almost surely}.$$
Then there is a predictable $G$ s.t. $m, G$ constitute an "almost" $\mathbb{Q}$-superhedge:
$$\mathbb{E}_{\mathbb{Q}}\big[(C(X) - m - (G \bullet X)_T)^+\big] \le 6(k + L) \cdot \mathcal{AW}_1(\mathbb{P}, \mathbb{Q}). \qquad (1.1)$$

While the adapted Wasserstein distance will be defined in abstract terms (see (1.4)), it relates directly to the model parameters for 'simple' models. In particular, if $\mathbb{P}, \mathbb{Q}$ are Brownian models with different volatilities, then the distance between these models is just the difference of these volatilities. Moreover, the bound in (1.1) (as well as further Lipschitz bounds given below) is already sharp in such a simple setting and for $C$ a European call option.

Below we will provide a number of results with a similar flavour as Theorem 1.1. E.g. we will provide versions where the hedging error is controlled in terms of risk measures and we will show that a Lipschitz bound of the type (1.1) applies (with bigger constants) if the same trading strategy $H$ is applied in the model $\mathbb{P}$ as well as in the model $\mathbb{Q}$. Importantly, we establish that comparable results of Lipschitz continuity apply to utility maximization and utility indifference pricing.

We emphasize that familiar concepts such as the Lévy-Prokhorov metric or the usual Wasserstein distance do not appear suitable to derive results comparable to Theorem 1.1. E.g. in the vicinity of financially meaningful models there are models with arbitrarily high arbitrage even for bounded strategies; similar phenomena appear w.r.t. completeness / incompleteness. Instead we introduce an adapted Wasserstein distance $\mathcal{AW}_p$ which takes the temporal structure of semimartingale models into account. These distances are conceptually closely related to the nested distance as pioneered by Pflug and Pichler [53, 54, 55]; see [1, 30, 20] for first articles which link such a type of distance to finance. We describe these contributions more closely in Section 2 below.
1.2. Notation and adapted Wasserstein distances.
Throughout we let
$$\Omega := \mathbb{R}^T \quad \text{or} \quad \Omega := C([0, T]).$$
The first setting shall be referred to as the discrete time case, and the second as the continuous time case. In the first case we denote by $I = \{1, \ldots, T\}$ the time-index set, and in the second $I = [0, T]$. Throughout the article we will provide definitions and results without specifying which of the two cases we are referring to: this means that the definitions / results apply in both cases. Only occasionally will we consider one case specifically, and in this situation we will state this explicitly.

We interpret $\Omega$ as the set of all possible evolutions (in time) of the 1-dimensional asset price. Importantly, mutatis mutandis, all our results (except Propositions 3.3, 3.6 and Example 3.4) remain true for multi-dimensional asset price processes (corresponding to $\Omega = (\mathbb{R}^d)^T$ / $\Omega = C([0, T], \mathbb{R}^d)$). We chose to go for the 1-dimensional version to simplify notation.

The mappings $X, Y : \Omega \to \Omega$ denote the canonical processes (i.e. the identity map), and we make the convention that on $\Omega \times \Omega$ the process $X$ denotes the first coordinate and $Y$ the second one. The spaces $\Omega$ and $\Omega \times \Omega$ are endowed with the maximum-norm and the corresponding Borel $\sigma$-field. In continuous time, the space $\Omega$ is endowed with the right-continuous filtration generated by $X$; in discrete time we use the plain filtration generated by $X$. In any case we denote this filtration by $\mathcal{F} = (\mathcal{F}_t)_t$ and endow $\Omega \times \Omega$ with the product filtration $(\mathcal{F}_t \otimes \mathcal{F}_t)_t$. Given a $\sigma$-algebra $\mathcal{G}$ and a probability $\mathbb{P}$ on $\mathcal{G}$ we write $\mathcal{G}^{\mathbb{P}}$ for the $\mathbb{P}$-completion of $\mathcal{G}$.

The set $\mathrm{Cpl}(\mathbb{P}, \mathbb{Q})$ of couplings between probability measures $\mathbb{P}, \mathbb{Q}$ consists of all probability measures $\pi$ on $\Omega \times \Omega$ such that $X(\pi) = \mathbb{P}$ and $Y(\pi) = \mathbb{Q}$. A Monge coupling is a coupling that is of the form $\pi = (\mathrm{Id}, T)(\mathbb{P})$ for some Borel mapping $T : \Omega \to \Omega$ that transports $\mathbb{P}$ to $\mathbb{Q}$, i.e. satisfies $T(\mathbb{P}) = \mathbb{Q}$. Given a metric $d$ on $\Omega$ and $p \ge 1$, the $p$-Wasserstein distance of $\mathbb{P}, \mathbb{Q}$ is
$$\mathcal{W}_p(\mathbb{P}, \mathbb{Q}) = \inf\big\{ \mathbb{E}_\pi[d(X, Y)^p]^{1/p} : \pi \in \mathrm{Cpl}(\mathbb{P}, \mathbb{Q}) \big\}. \qquad (1.2)$$
In many cases of practical interest the infimum in (1.2) remains unchanged if one minimizes only over Monge couplings, cf. [56].

(Footnote: Indeed the arguments in the discrete and the continuous case use the same set of ideas, but the presentation is significantly less technical in the discrete case, which was an important reason to include the discrete case in the paper.)
Before defining the adapted Wasserstein distance between measures $\mathbb{P}$ and $\mathbb{Q}$ on $\Omega$, let us hint why distances related to weak convergence are not suitable for the results we have in mind. Assume for example that we are interested in a utility maximization problem in two periods and that Figure 1 describes the laws $\mathbb{P}, \mathbb{Q}$ of two traded assets. Clearly they are very close in Wasserstein distance, as follows from considering the obvious Monge coupling induced by $T : \Omega \to \Omega$, $T(\mathbb{P}) = \mathbb{Q}$, depicted in Figure 1. At the same time, the outcome of utility maximization is certainly very different. Similarly, $\mathbb{P}$ is a martingale measure while $\mathbb{Q}$ allows for arbitrage. The clear reason for that is the different structure of information available at time 1.

Figure 1.
Map $T$ sends the blue path on the left to the blue path on the right, and similarly for the red paths. The stochastic processes depicted are close in Wasserstein sense, but very different for utility maximization.

To exhibit why the Wasserstein distance does not reflect this different structure of information, let us review the transport condition $T(\mathbb{P}) = \mathbb{Q}$. We rephrase it as
$$(T_1(X_1, X_2), T_2(X_1, X_2)) \sim (Y_1, Y_2). \qquad (1.3)$$
While this condition is of course perfectly natural in mass transport, (1.3) almost seems like cheating when viewed from a probabilistic perspective: the map $T_1$ should not be allowed to consider the future value $X_2$ in order to determine $Y_1$. To define an adapted version of the Wasserstein distance, the 'process' $(T_i)_{i=1,2}$ should be taken to be adapted in order to account for the different information structures of $\mathbb{P}$ and $\mathbb{Q}$.

Naturally our official definition of adapted Wasserstein distances will not refer to adapted Monge transports but rather to couplings which are 'adapted' in an appropriate sense. Following Lassalle [45], we call such couplings (bi-)causal. Since the definition below may appear a bit technical at first glance, the following may be reassuring: in the discrete time setting and for absolutely continuous measures $\mathbb{P}$, the weak closure of the set of adapted Monge couplings, i.e. $\pi = (\mathrm{Id}, T)(\mathbb{P})$ for $T$ adapted, is precisely the set of all causal couplings, see [42].

Definition 1.2 ((bi-)causal couplings). For a coupling $\pi$ of $\mathbb{P}, \mathbb{Q} \in \mathcal{P}(\Omega)$ denote by $\pi(d\omega, d\eta) = \mathbb{P}(d\omega)\, \pi_\omega(d\eta)$ a regular disintegration w.r.t. $\mathbb{P}$. The set $\mathrm{Cpl}_C(\mathbb{P}, \mathbb{Q})$ of causal couplings consists of all $\pi \in \mathrm{Cpl}(\mathbb{P}, \mathbb{Q})$ such that for all $t \in I$ and $A \in \mathcal{F}_t$
$$\omega \mapsto \pi_\omega(A) \text{ is } \mathcal{F}^{\mathbb{P}}_t\text{-measurable}.$$
The set of all bi-causal couplings $\mathrm{Cpl}_{BC}(\mathbb{P}, \mathbb{Q})$ consists of all $\pi \in \mathrm{Cpl}_C(\mathbb{P}, \mathbb{Q})$ such that also $S(\pi) \in \mathrm{Cpl}_C(\mathbb{Q}, \mathbb{P})$, where $S : \Omega \times \Omega \to \Omega \times \Omega$, $S(\omega, \eta) := (\eta, \omega)$.

In discrete time, a coupling $\pi$ is causal if and only if
$$\pi\big((Y_1, \ldots, Y_t) \in A \mid X\big) = \pi\big((Y_1, \ldots, Y_t) \in A \mid X_1, \ldots, X_t\big), \quad \mathbb{P}\text{-a.s.}$$
for every $t$ and Borel set $A \subseteq \mathbb{R}^t$, that is, at time $t$, given the past $(X_1, \ldots, X_t)$ of $X$, the distribution of $Y_t$ does not depend on the future $(X_{t+1}, \ldots, X_N)$ of $X$.

Replacing couplings by bi-causal couplings in (1.2) one arrives at the nested distance as introduced by Pflug and Pichler [52, 53]. Since our goal is to compare also semimartingale models in continuous time we will work with an adapted Wasserstein distance that is defined slightly differently. (Notably, it is straightforward that the two distances are equivalent for probabilities on $\mathbb{R}^N$. We will elaborate in Section 3.3 below why the definition in (1.4) is more appropriate for our purposes even in discrete time.)

In continuous time, we denote by $\mathcal{SM}(\Omega)$ the set of all probabilities $\mathbb{P}$ on (the Borel $\sigma$-field of) $\Omega$ under which the canonical process $X$ is a continuous semimartingale. In discrete time, $\mathcal{SM}(\Omega)$ denotes the set of all Borel probabilities $\mathbb{P}$ on $\Omega$ under which $X$ is integrable. In either case we can uniquely decompose $X = M + A$, with $A$ a finite variation predictable process started at zero, and $M$ a local martingale. Indeed, in the first case $X$ is a special semimartingale and in fact $M$ and $A$ are continuous too, and in the second case this is the Doob decomposition of an integrable adapted discrete-time process. For $p \in [1, \infty)$ we denote by $\mathcal{SM}_p(\Omega)$ the subset of $\mathcal{SM}(\Omega)$ for which
$$\mathbb{E}_{\mathbb{P}}\big[[M]_T^{p/2} + |A|^p\big] < \infty,$$
where $[\cdot]$ is the quadratic variation and $|\cdot|$ the first variation norm. Note also that by the BDG inequality $\mathbb{E}_{\mathbb{P}}[\sup_{s \le T} |M_s|] < \infty$ for $\mathbb{P} \in \mathcal{SM}_p(\Omega)$, hence $M$ is a true martingale.
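The discrete-time causality criterion above can be checked by hand on a small two-period example. The following is a minimal sketch, not the authors' construction: the concrete paths are an assumed stand-in for Figure 1 (whose exact values are not recoverable here), the helper names `conditional_law`, `is_causal`, `is_bicausal` and `sup_cost` are hypothetical, and the cost is the maximum-norm cost of (1.2) rather than the semimartingale cost used later in (1.4).

```python
from collections import defaultdict
from fractions import Fraction


def conditional_law(coupling, t, key_len):
    """Law of (Y_1..Y_t) given (X_1..X_key_len), as {x_prefix: {y_prefix: prob}}."""
    joint = defaultdict(lambda: defaultdict(Fraction))
    mass = defaultdict(Fraction)
    for (x, y), p in coupling.items():
        joint[x[:key_len]][y[:t]] += p
        mass[x[:key_len]] += p
    return {xk: {yk: q / mass[xk] for yk, q in ys.items()}
            for xk, ys in joint.items()}


def is_causal(coupling):
    """Check: law of (Y_1..Y_t) given all of X equals its law given (X_1..X_t)."""
    n = len(next(iter(coupling))[0])
    for t in range(1, n + 1):
        given_full = conditional_law(coupling, t, n)
        given_past = conditional_law(coupling, t, t)
        for x, law in given_full.items():
            if law != given_past[x[:t]]:
                return False
    return True


def is_bicausal(coupling):
    """Causal in both directions (swap the roles of X and Y)."""
    swapped = {(y, x): p for (x, y), p in coupling.items()}
    return is_causal(coupling) and is_causal(swapped)


def sup_cost(coupling):
    """Expected maximum-norm distance between the coupled paths."""
    return sum(p * max(abs(a - b) for a, b in zip(x, y))
               for (x, y), p in coupling.items())


# Assumed example: under P the first step is uninformative, under Q it reveals the second.
eps = Fraction(1, 100)
half, quarter = Fraction(1, 2), Fraction(1, 4)
x_up, x_down = (1, 2), (1, 0)
y_up, y_down = (1 + eps, 2), (1 - eps, 0)

# Monge coupling that matches "up" with "up" and "down" with "down".
monge = {(x_up, y_up): half, (x_down, y_down): half}
# Product (independent) coupling of the two laws.
product = {(x, y): quarter for x in (x_up, x_down) for y in (y_up, y_down)}
```

In this sketch the Monge coupling costs only `eps` but fails `is_causal`, because $Y_1$ already encodes $X_2$. The product coupling is bi-causal and has cost $1 + \varepsilon/2$, which illustrates how restricting to bi-causal couplings separates the two models.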
Definition 1.3 (Adapted Wasserstein distance). For $\mathbb{P}, \mathbb{Q} \in \mathcal{SM}_p(\Omega)$, $p \ge 1$, set
$$\mathcal{AW}_p(\mathbb{P}, \mathbb{Q}) := \inf\Big\{ \mathbb{E}_\pi\big[[M^X - M^Y]_T^{p/2} + |A^X - A^Y|^p\big]^{1/p} : \pi \in \mathrm{Cpl}_{BC}(\mathbb{P}, \mathbb{Q}) \Big\}, \qquad (1.4)$$
where $X = M^X + A^X$, $Y = M^Y + A^Y$ denote the semimartingale decompositions of $X$ and $Y$ resp.

It is shown in Lemma 3.1 that $\mathcal{AW}_p$ is well-defined (i.e. that $X - Y$ is a semimartingale under every bi-causal coupling) and in Lemma 3.2 that $\mathcal{AW}_p$ in fact defines a metric.

Remark 1.4. In the continuous time setup, the adapted Wasserstein distance can also be computed through
$$\mathcal{AW}_p(\mathbb{P}, \mathbb{Q}) = \inf\Big\{ \mathbb{E}_\pi\big[[X - Y]_T^{p/2} + \mathrm{MV}_T[X - Y]^p\big]^{1/p} : \pi \in \mathrm{Cpl}_{BC}(\mathbb{P}, \mathbb{Q}) \Big\}.$$
Here $\mathrm{MV}$ denotes the mean variation, i.e. $\mathrm{MV}_T[Z] = \sup_\Delta \sum_{t_j \in \Delta} |\mathbb{E}[Z_{t_{j+1}} - Z_{t_j} \mid \mathcal{F}_{t_j}]|$, where the supremum is taken over all finite partitions $\Delta$ of $[0, T]$.

In Section 3.2 below we will give explicit formulae for the adapted Wasserstein distance in the case of semimartingale measures described by simple SDEs.
1.3. Stability of Superhedging.
For the rest of this article, fix some $k \in \mathbb{R}_+$ and let $\mathcal{H}_k$ be the set of all predictable processes $H : \Omega \times I \to [-k, k]$. For every $p \ge 1$, write $b_p$ for the 'upper' Burkholder-Davis-Gundy (BDG) constant, cf. Remark 3.12 below. In particular it is known that $b_1 \le 6$, $b_2 = 2$.

Our first main result concerns the stability of superhedging and constitutes a stronger version of Theorem 1.1 stated above.

Theorem 1.5. Let $\mathbb{P}, \mathbb{Q} \in \mathcal{SM}_1(\Omega)$, $H \in \mathcal{H}_k$ and let $C : \Omega \to \mathbb{R}$ be Lipschitz with constant $L$. Then the hedging error under $\mathbb{Q}$ is bounded by the distance of $\mathbb{P}$ and $\mathbb{Q}$ plus the hedging error under $\mathbb{P}$ in the following sense: there exists $G \in \mathcal{H}_k$ such that
$$\mathbb{E}_{\mathbb{Q}}\big[(C - m - (G \bullet X)_T)^+\big] \le \mathbb{E}_{\mathbb{P}}\big[(C - m - (H \bullet X)_T)^+\big] + b_1 (k + L)\, \mathcal{AW}_1(\mathbb{P}, \mathbb{Q}). \qquad \text{(WHI)}$$
Assume in addition that H t : Ω → R is Lipschitz with constant ˜ L for every t ∈ I .Then we can take G = H and obtain E Q [( C − m − ( H • X ) T ) + ] ≤ E P [( C − m − ( H • X ) T ) + ]+ b ( k + L ) AW ( P , Q ) + β AW ( P , Q ) , (SHI) where β := 2 √ b ˜ L min {AW ( P , δ ) , AW ( Q , δ ) } . Importantly, it is impossible to transfer a superhedge under P into a superhedgeunder Q . This occurs already in a one-period framework and is not a by-product ofour definition of adapted Wasserstein distance; see Remark 5.2. A similar reasoningrequires to consider only trading strategies bounded by k ; see Remark 5.3.It is worthwhile to compare the inequalities (WHI) and (SHI):(S) In a certain sense the ‘strong hedging inequality’ (SHI) seems to be themore relevant assertion: after all a trader does not know that the model Q (rather than the model P ) describes reality and hence she might (somewhatstubbornly) stick to the initial plan of hedging her risk according to thestrategy H . The inequality (SHI) then allows to quantify the losses due tothis model-error.(W) However, the ‘weak hedging inequality’ (WHI) also has a particular merit:suppose that a trader W starts with the prior belief that the asset priceevolves according to a Black-Scholes model with volatility σ but soon aftertime 0 realizes that a volatility σ (where σ = σ ) yields a more adequatedescription of reality. If the witty trader W makes an accurate guess aboutthe correct model and updates her trading strategy accordingly, her lossescan be controlled through the tighter bound in (WHI).In Theorem 4.2 we provide a version of Theorem 1.5, where ( · ) + is replaced bya convex, strictly increasing loss function l : R → R + .Another way to gauge the effectiveness of an almost superhedge is by meansof risk measures. We postpone the general formulation to Theorem 4.3 and firstpresent a version that appeals to the average value of risk AVaR P α . 
Recall that for a random variable $Z : \Omega \to \mathbb{R}$
$$\mathrm{AVaR}^{\mathbb{P}}_\alpha(Z) := \inf_{m \in \mathbb{R}} \mathbb{E}_{\mathbb{P}}\big[(Z - m)^+ / \alpha + m\big]$$
is the average value at risk at level $\alpha \in (0, 1)$ under model $\mathbb{P}$. We then have

Theorem 1.6. Assume that $C : \Omega \to \mathbb{R}$ is Lipschitz with constant $L$. Then
$$\Big| \inf_{H \in \mathcal{H}_k} \mathrm{AVaR}^{\mathbb{P}}_\alpha(C - (H \bullet X)_T) - \inf_{H \in \mathcal{H}_k} \mathrm{AVaR}^{\mathbb{Q}}_\alpha(C - (H \bullet X)_T) \Big| \le r\, \mathcal{AW}_1(\mathbb{P}, \mathbb{Q}),$$
for $r := b_1 (L + k) / \alpha$. If $H \in \mathcal{H}_k$ is such that $H_t : \Omega \to [-k, k]$ is Lipschitz with constant $\tilde{L}$ for every $t \in I$ and $\beta$ is the constant defined in Theorem 1.5, then
$$\big| \mathrm{AVaR}^{\mathbb{P}}_\alpha(C - (H \bullet X)_T) - \mathrm{AVaR}^{\mathbb{Q}}_\alpha(C - (H \bullet X)_T) \big| \le r\, \mathcal{AW}_1(\mathbb{P}, \mathbb{Q}) + \frac{\beta}{\alpha}\, \mathcal{AW}_2(\mathbb{P}, \mathbb{Q}).$$

The interpretation of this result is similar to the one of Theorem 1.5: as $\mathrm{AVaR}^{\mathbb{P}}_\alpha(\cdot)$ is translation invariant, one has
$$\inf_{H \in \mathcal{H}_k} \mathrm{AVaR}^{\mathbb{P}}_\alpha(C - (H \bullet X)_T) = \inf\big\{ m \in \mathbb{R} : \text{there is } H \in \mathcal{H}_k \text{ such that } \mathrm{AVaR}^{\mathbb{P}}_\alpha(C - m - (H \bullet X)_T) \le 0 \big\},$$
and the right-hand side constitutes a relaxed version of the superhedging price.

Notably, the explicit calculations of adapted Wasserstein distance given in Section 3.2 imply that Theorem 1.6 (and similarly Theorem 1.5) are sharp.
Example 1.7 (Hedging in a Brownian framework). Consider a European call option $C(X) = (X_T - K)^+$, where for simplicity $K = 0$. Moreover, let $\mathbb{P}_\sigma$ be Wiener measure with constant volatility $\sigma \ge 0$. Then for every $\sigma, \hat{\sigma} \ge 0$, $k \ge 0$, and $\alpha \in (0, 1)$ it holds that (we defer the proof of this fact to Section 4)
$$\Big| \inf_{H \in \mathcal{H}_k} \mathrm{AVaR}^{\mathbb{P}_\sigma}_\alpha(C - (H \bullet X)_T) - \inf_{H \in \mathcal{H}_k} \mathrm{AVaR}^{\mathbb{P}_{\hat{\sigma}}}_\alpha(C - (H \bullet X)_T) \Big| = \big| \mathbb{E}_{\mathbb{P}_\sigma}[C] - \mathbb{E}_{\mathbb{P}_{\hat{\sigma}}}[C] \big| = \frac{1}{\sqrt{2\pi}} \sqrt{T}\, |\sigma - \hat{\sigma}| = \frac{1}{\sqrt{2\pi}}\, \mathcal{AW}_1(\mathbb{P}_\sigma, \mathbb{P}_{\hat{\sigma}}).$$
This shows that the estimate in Theorem 1.6 is tight (up to constants), in the sense that it is essentially impossible to improve on the probability metric $\mathcal{AW}_1$.

We make the important remark that Glanzer, Pflug, and Pichler [30] use the nested distance to control acceptability prices in discrete time models in a Lipschitz fashion through the nested distance of these models. Specifically, in a discrete time one-period framework [30, Proposition 3] and Theorem 1.6 yield almost the same assertion: in this setup, the only difference is that [30, Proposition 3] does not specify a Lipschitz constant and does not assume uniform boundedness of the admissible hedging strategy. (However, the latter seems to be in conflict with our Remark 5.3 below.)
1.4. Stability of Utility Maximization and Utility Indifference Pricing.
We move on to consider the continuity of utility maximization. Let $U : \mathbb{R} \to \mathbb{R}$ be a utility function which is concave and increasing, and denote by $U'$ the left-continuous version of the derivative. We have

Theorem 1.8. Let $C : \Omega \to \mathbb{R}$ be Lipschitz continuous and assume that there exists $c \ge 0$ such that $U'(x) \le c(1 + |x|^{p-1})$ for all $x$. Then, for every $R \ge 0$ there exists a constant $K$ such that
$$\Big| \sup_{H \in \mathcal{H}_k} \mathbb{E}_{\mathbb{P}}[U(C + (H \bullet X)_T)] - \sup_{H \in \mathcal{H}_k} \mathbb{E}_{\mathbb{Q}}[U(C + (H \bullet X)_T)] \Big| \le K \cdot \mathcal{AW}_p(\mathbb{P}, \mathbb{Q}),$$
for all $\mathbb{P}, \mathbb{Q} \in \mathcal{SM}_p(\Omega)$ with $\mathcal{AW}_p(\mathbb{P}, \delta_0), \mathcal{AW}_p(\mathbb{Q}, \delta_0) \le R$.

The failure of usual Wasserstein distances to guarantee stability of utility maximization is illustrated in Remark 5.1.

A common way of quantifying the value of a claim is via utility indifference pricing: given a claim $C$, the utility indifference (bid-) price $v$ is defined as the solution of the following equation
$$\sup_{H \in \mathcal{H}_k} \mathbb{E}_{\mathbb{P}}[U(C - v + (H \bullet X)_T)] = \sup_{H \in \mathcal{H}_k} \mathbb{E}_{\mathbb{P}}[U((H \bullet X)_T)].$$
Continuing in the spirit of the present paper, we are interested in the stability of $\mathbb{P} \mapsto v(\mathbb{P})$, where the latter denotes the utility indifference price associated to the model $\mathbb{P}$.

Theorem 1.9. Let $C : \Omega \to \mathbb{R}$ be Lipschitz continuous and assume that there exists $c \ge 0$ such that $0 < U'(x) \le c(1 + |x|^{p-1})$ for all $x$. Then, for every $R \ge 0$ there exists a constant $K$ such that
$$\big| v(\mathbb{P}) - v(\mathbb{Q}) \big| \le K \cdot \mathcal{AW}_p(\mathbb{P}, \mathbb{Q}),$$
for all $\mathbb{P}, \mathbb{Q} \in \mathcal{SM}_p(\Omega)$ with $\mathcal{AW}_p(\mathbb{P}, \delta_0), \mathcal{AW}_p(\mathbb{Q}, \delta_0) \le R$.

(Footnote: We are grateful to the anonymous referee for pointing out that we could include the stability of utility indifference pricing w.r.t. adapted Wasserstein distance.)
Structure of the paper.
In Section 2 we briefly review the literature related to this paper. In Section 3 we establish some basic properties of the adapted Wasserstein distance, discuss the choice of cost function and give some examples. Moreover we derive a contraction principle (Theorem 3.10) which relates adapted Wasserstein distance with a 'weak' (in the sense of Gozlan et al. [32]) transport distance. This result forms the basis for the proofs of the results mentioned in the introduction, as well as certain extensions of these results, see Section 4. Finally we conclude with some remarks in Section 5.
2. Literature
The articles closest in spirit to ours are [1, 20, 30]. Acciaio, Zalashko and oneof the present authors consider in [1] an object related to the adapted Wassersteindistance in continuous time in connection with utility maximization, enlargement offiltrations and optimal stopping. Glanzer, Pflug, and Pichler [30] prove a deviation-inequality for the so-called nested distance in a discrete time framework , andconsider acceptability pricing over an ambiguity set described through the nesteddistance. Bion-Nadal and Talay [20] study via PDE arguments a continuous-timeoptimization problem which is related to the adapted Wasserstein distance.The concept of causal couplings, and optimal transport over causal couplings,has been recently popularized by Lassalle [45] although precursors can be found inthe works [62, 58]. This notion is central to the recent articles [1, 10, 8, 9].The idea of strengthening weak convergence of measures in order to account forthe temporal evolution has some history. Indeed several authors have independentlyintroduced different approaches to address this challenge: The seminal unpublishedwork by Aldous [2] introduces the notion of extended weak convergence for the studyof stability of optimal stopping problems. The principal idea is not to comparethe laws of processes directly, but rather the laws of the corresponding predictionprocesses. Independently, Hellwig [33] introduces the information topology for thestability of equilibrium problems in economics. Roughly, two probability measureson a product of finitely many spaces X × . . . × X N are considered to be close if foreach t ≤ N the projections onto the first t coordinates as well as the correspondingconditional (regular) disintegrations are close. Unrelated to these developmentsPflug and Pichler [52, 53, 54] have introduced the nested distances for the stabilityof stochastic programming in discrete time. 
The nested distance is the obvious role model for the adapted Wasserstein distances considered in this article and (as mentioned above) for a fixed number of time steps and $p \ge 1$, they are obviously equivalent. Yet another idea to account for the temporal evolution of processes would be to symmetrise the causal transport costs $W_c(\mathbb{P}, \mathbb{Q})$ defined by Lassalle [45] by taking the maximum or sum of $W_c(\mathbb{P}, \mathbb{Q})$ and $W_c(\mathbb{Q}, \mathbb{P})$; this was pointed out by Soumik Pal.

(Note added in revision: improved convergence rates have been recently obtained in [7] for a related sample-based estimator. Together with the results of the present article, this gives statistical consistency for an empirical version of the financial problems considered.)

In parallel work [6], the four authors of the present article investigate the relations between these concepts in detail. Remarkably, in discrete time all of the concepts mentioned above (adapted Wasserstein distances, extended weak convergence, information topology, nested distances, symmetrised causal transport costs) define the same topology. As noted above, this 'weak adapted topology' refines the usual weak topology (properly for $T \ge 2$, see also Remark 5.2). The articles [8, 6, 27] investigate basic properties of this topology, e.g. the weak adapted topology is Polish [8, Section 5], sets are totally bounded w.r.t. adapted Wasserstein distance / nested distance if and only if they are totally bounded w.r.t. usual
Wasserstein distance [6, Lemma 1.6]. For recent applications of these concepts tooptimal transport and probabilistic variants thereof we refer to [11, 12, 61].In contrast, fundamental topological properties of the above mentioned conceptsin the continuous time case seem to be much less understood and, as far as theauthors are concerned, pose an interesting challenge for future research. Specifically,it is not clear to us whether the topology associated to the adapted Wassersteindistance is Polish in the continuous time case. In a similar vein, we expect thatresults analogous to the ones of the present article should apply in the case ofc`adl`ag paths, but this extension is beyond the scope of our current understandingof adapted Wasserstein distances.The question of stability in mathematical finance has been studied from differ-ent perspectives over the years. Notably, starting with the articles of Lyons [46]and Avellaneda, Levy, Paras [5] the area of robust finance has mainly focused onextremal models and hedging strategies which dominate the payoff for every modelin a specified class. Following the publication of Hobson’s seminal article [36] con-nections with the Skorokhod embedding problem have been a driving force of thefield, see the surveys of Hobson [37] and Ob l´oj [48]. Recently this has been com-plemented by techniques coming from (martingale) optimal transport, early paperswhich advance this viewpoint include [38, 15, 29, 16, 21, 26, 24, 18]. The litera-ture on ‘local’ misspecification of volatility in a sense more closely related to thepresent article appears more spare. El Karoui, Jeanblanc, and Shreve [28] establishin a stochastic volatility framework that if the misspecified volatility dominates thetrue volatility, then the misspecified price of call options dominates the real price;see also the elegant account of Hobson [39]. 
More recently, the question of pricingand hedging under uncertainty about the volatility of a reference local volatilitymodel is studied by Herrmann, Muhle-Karbe, and Seifried [35] (see also [34]). Lessplausible models are penalized through a mean square distance to the volatility ofthe reference model and the authors obtain explicit formulas for prices and hedgingstrategies in a limit for small uncertainty aversion. Becherer and Kentia [14] deriveworst-case good-deal bounds under model ambiguity which concerns drift as wellas volatility. Indeed, discussions with Dirk Becherer motivated us to consider alsomodels with drift in our results on stability of super hedging. The behaviour of thesuperhedging price in a ball (w.r.t. various notions of distance) around a referencemodel is studied in depth by Ob l´oj and Wiesel [49] for a d -dimensional asset andone time period.A notable implication of our work is that it yields a coherent way to measuremodel-uncertainty (in the sense of Cont’s influential article [25]): Fix a subset M of the set M of all consistent models, i.e. martingale measures which are consistentwith benchmark instruments whose price can be observed on the market. Given M , the model uncertainty associated to a derivative f can be gauged through ρ M ( f ) := sup { E Q f : Q ∈ M } − inf { E Q f : Q ∈ M } . The worst-case approach typically pursued in robust finance then yields ρ M ( f ) for M = M , but it appears equally natural to take M to be an infinitesimal ballaround a reference model. This approach is first carried out by Drapeau, Ob l´oj,Wiesel and one of the present authors [13] in a one period framework. Our resultsindicate that adapted Wasserstein distance provides a way to extend this to a multi-period setup, and we intend to pursue this further in future work.On a different note, much work has been done regarding the convergence ofdiscrete time models to their continuous time analogues. 
Due to the vastness of this literature we refer the reader to the book [57] for references. Finally, in more recent times and starting from the works of Kardaras and Žitković, the stability of utility maximization has been studied in [41, 43, 44, 47, 60] among others.
3. The adapted Wasserstein distance
3.1. Basic properties of $\mathcal{AW}_p$. The following lemma shows that $\mathcal{AW}_p$ is well-defined.
Lemma 3.1.
Let $\mathbb{P}, \mathbb{Q}$ be integrable (semi-)martingale measures for $X, Y : \Omega \to \Omega$, respectively, and let $\pi$ be a bi-causal coupling between $\mathbb{P}$ and $\mathbb{Q}$. Then $X, Y, X - Y : \Omega \times \Omega \to \Omega$ are (semi-)martingales w.r.t. $\pi$. Further, if $X = M + A$ denotes the semimartingale decomposition under $\mathbb{P}$, then up to evanescence $M + A$ is the semimartingale decomposition of $X$ under $\pi$.

Proof. Let $X = M + A$ be the semimartingale decomposition under $\mathbb{P}$ and consider $M$ and $A$ as processes on $\Omega \times \Omega$ via $M(\omega, \eta) := M(\omega)$ and $A(\omega, \eta) := A(\omega)$. Further let $\pi = \mathbb{P}(d\omega)\, \pi_\omega(d\eta)$ be a bi-causal coupling between $\mathbb{P}$ and $\mathbb{Q}$. To show that $X = M + A$ remains the semimartingale decomposition under $\pi$, it is enough to show that $M$ is a martingale under $\pi$. To that end, let $0 \le s \le t$ and let $Z : \Omega \times \Omega \to \mathbb{R}$ be $\mathcal{F}_s \otimes \mathcal{F}_s$-measurable and bounded. (Recall that $\mathcal{F} = (\mathcal{F}_t)_t$ denotes the right-continuous filtration generated by $X$ and that we endow $\Omega \times \Omega$ with the filtration $(\mathcal{F}_t \otimes \mathcal{F}_t)_t$.) Then the random variable $Z' : \Omega \to \mathbb{R}$ defined by
$$Z'(\omega) := \int Z(\omega, \eta)\, \pi_\omega(d\eta)$$
is $\mathcal{F}^{\mathbb{P}}_s$-measurable, and clearly bounded. Indeed, if $Z(\omega, \eta) = Z_1(\omega) Z_2(\eta)$ for $\mathcal{F}_s$-measurable bounded functions $Z_1$ and $Z_2$, then it follows from the definition of bi-causality that $Z'$ is $\mathcal{F}^{\mathbb{P}}_s$-measurable; the general statement then follows from a monotone class argument. Therefore
$$\mathbb{E}_\pi[(M_t - M_s) Z] = \int (M_t(\omega) - M_s(\omega)) \int Z(\omega, \eta)\, \pi_\omega(d\eta)\, \mathbb{P}(d\omega) = \mathbb{E}_{\mathbb{P}}[(M_t - M_s) Z'] = 0,$$
by the martingale property of $M$ under $\mathbb{P}$. This shows that $M$ is a martingale under $\pi$ and therefore that $X = M + A$ is the semimartingale decomposition under $\pi$. □

Lemma 3.2. $\mathcal{AW}_p$ defines a metric on the set $\mathcal{SM}_p(\Omega)$.

We note that very similar arguments could be used to show that $\mathcal{AW}_p$ defines a metric for semimartingales with infinite time horizon $\mathbb{N}$ or $[0, \infty)$.

Proof of Lemma 3.2.
It is clear that $\mathcal{AW}_p(\mathbb{P}, \mathbb{Q}) = \mathcal{AW}_p(\mathbb{Q}, \mathbb{P}) \ge 0$ for $\mathbb{P}, \mathbb{Q} \in \mathcal{SM}_p(\Omega)$. Suppose that $\mathcal{AW}_p(\mathbb{P}, \mathbb{Q}) = 0$. As $\|\cdot\|_\infty \le |\cdot|$, it is immediate that if $\pi$ participates in the infimum defining $\mathcal{AW}_p(\mathbb{P}, \mathbb{Q})$, and $X - Y = M + A$, then
$$\mathbb{E}_\pi[\|X - Y\|_\infty^p] \le 2^{p-1}\, \mathbb{E}_\pi[\|M\|_\infty^p + |A|^p] \le 2^{p-1} b_p\, \mathbb{E}_\pi\big[[M]_T^{p/2} + |A|^p\big],$$
where $b_p$ denotes the BDG constant and we used the BDG inequality for the martingale $M$. Hence the usual Wasserstein distance between $\mathbb{P}$ and $\mathbb{Q}$ (defined w.r.t. the $\|\cdot\|_\infty$-norm) is dominated from above by $\mathcal{AW}_p(\mathbb{P}, \mathbb{Q})$, and so $\mathbb{P} = \mathbb{Q}$.

We now prove the triangle inequality. Let $\mathbb{P}, \mathbb{Q}, \mathbb{R}$ be given. We fix $\varepsilon > 0$ and let $\pi$ be a bi-causal coupling that is $\varepsilon$-optimal for $\mathcal{AW}_p(\mathbb{P}, \mathbb{Q})$ and $\tilde{\pi}$ a bi-causal coupling that is $\varepsilon$-optimal for $\mathcal{AW}_p(\mathbb{Q}, \mathbb{R})$. In the next couple of lines, $\omega$ will always denote the first coordinate of a vector in $\Omega^3$, $\eta$ the second, and $\gamma$ the last. Let
$$\pi(d\omega, d\eta) = \pi_\eta(d\omega)\, \mathbb{Q}(d\eta) \quad \text{and} \quad \tilde{\pi}(d\eta, d\gamma) = \tilde{\pi}_\eta(d\gamma)\, \mathbb{Q}(d\eta)$$
be disintegrations, and define $\Pi \in \mathcal{P}(\Omega^3)$ by
$$\Pi(d\omega, d\eta, d\gamma) = \pi_\eta(d\omega)\, \tilde{\pi}_\eta(d\gamma)\, \mathbb{Q}(d\eta).$$
If $\hat{\pi}(d\omega, d\gamma) := \int_\Omega \Pi(d\omega, d\eta, d\gamma)$ is the projection of $\Pi$ onto the first and third components, then it is clear that the first and second marginals of $\hat{\pi}$ are $\mathbb{P}$ and $\mathbb{R}$ respectively. Moreover, a disintegration $\hat{\pi} = \hat{\pi}_\omega(d\gamma)\, \mathbb{P}(d\omega)$ is given by
$$\hat{\pi}_\omega(d\gamma) = \int_\Omega \tilde{\pi}_\eta(d\gamma)\, \pi_\omega(d\eta),$$
where, as indicated above, $\pi_\omega$ now denotes the disintegration of $\pi$ w.r.t. the first coordinate, that is $\pi(d\omega, d\eta) = \pi_\omega(d\eta)\, \mathbb{P}(d\omega)$. We claim that, for every $A \in \mathcal{F}_t$, the mapping $\omega \mapsto \hat{\pi}_\omega(A)$ is $\mathcal{F}^{\mathbb{P}}_t$-measurable. Indeed, by bi-causality of $\tilde{\pi}$ one has that $\eta \mapsto \tilde{\pi}_\eta(A)$ is $\mathcal{F}^{\mathbb{Q}}_t$-measurable. Thus there is an $\mathcal{F}_t$-measurable function $X$ and a $\mathbb{Q}$-almost surely zero function $N$ such that $\tilde{\pi}_\eta(A) = X(\eta) + N(\eta)$ for all $\eta \in \Omega$. Then $\hat{\pi}_\omega(A) = \int_\Omega X(\eta)\, \pi_\omega(d\eta) + \int_\Omega N(\eta)\, \pi_\omega(d\eta)$ for all $\omega \in \Omega$. The first term is $\mathcal{F}^{\mathbb{P}}_t$-measurable (by bi-causality of $\pi$), and, as $\pi$ is a coupling between $\mathbb{P}$ and $\mathbb{Q}$, one has that $\int_\Omega N(\eta)\, \pi_\omega(d\eta) = 0$ for $\mathbb{P}$-almost all $\omega \in \Omega$.

The argument for $\hat{\pi} = \hat{\pi}_\gamma(d\omega)\, \mathbb{R}(d\gamma)$ is similar and therefore $\hat{\pi}$ is a bi-causal coupling between $\mathbb{P}$ and $\mathbb{R}$. Finally, it follows as in the proof of Lemma 3.1 that, if $X = M^X + A^X$, $Y = M^Y + A^Y$, and $Z = M^Z + A^Z$ are the semimartingale decompositions under $\mathbb{P}$, $\mathbb{Q}$, and $\mathbb{R}$, then they remain the semimartingale decompositions under $\Pi$ on $\Omega^3$ endowed with the product filtration.

To finish the proof of the triangle inequality, we observe that
$$\mathcal{AW}_p(\mathbb{P}, \mathbb{R}) \le \mathbb{E}_{\hat{\pi}}\big[[M^X - M^Z]_T^{p/2} + |A^X - A^Z|^p\big]^{1/p} = \mathbb{E}_\Pi\big[[(M^X - M^Y) + (M^Y - M^Z)]_T^{p/2} + |(A^X - A^Y) + (A^Y - A^Z)|^p\big]^{1/p}.$$
The function $M \mapsto \mathbb{E}_\Pi[[M]_T^{p/2}]^{1/p}$ is known to be a norm on the space $\mathcal{M}_p(\Pi)$ of $\Pi$-martingales started at zero whose supremum is $p$-integrable. Likewise $A \mapsto \mathbb{E}_\Pi[|A|^p]^{1/p}$ is a norm on the space of finite variation processes with $p$-integrable variation. Hence
$$(M, A) \mapsto \|(M, A)\| := \mathbb{E}_\Pi\big[[M]_T^{p/2} + |A|^p\big]^{1/p}$$
is a norm on the product of these spaces. We conclude the proof of the triangle inequality with
$$\begin{aligned}
\mathcal{AW}_p(\mathbb{P}, \mathbb{R}) &\le \|(M^X - M^Y, A^X - A^Y) + (M^Y - M^Z, A^Y - A^Z)\| \\
&\le \|(M^X - M^Y, A^X - A^Y)\| + \|(M^Y - M^Z, A^Y - A^Z)\| \\
&= \mathbb{E}_\pi\big[[M^X - M^Y]_T^{p/2} + |A^X - A^Y|^p\big]^{1/p} + \mathbb{E}_{\tilde{\pi}}\big[[M^Y - M^Z]_T^{p/2} + |A^Y - A^Z|^p\big]^{1/p} \\
&\le 2\varepsilon + \mathcal{AW}_p(\mathbb{P}, \mathbb{Q}) + \mathcal{AW}_p(\mathbb{Q}, \mathbb{R}),
\end{aligned}$$
since the semimartingale decomposition of $X - Y$ under $\pi$ is $(M^X - M^Y) + (A^X - A^Y)$, with an analogous expression for $Y - Z$ under $\tilde{\pi}$. As $\varepsilon > 0$ was arbitrary, the triangle inequality follows.

To conclude the proof, it remains to show that $\mathcal{AW}_p(\mathbb{P}, \mathbb{Q}) < \infty$ for all $\mathbb{P}, \mathbb{Q} \in \mathcal{SM}_p(\Omega)$. By Lemma 3.1, we have $\mathcal{AW}_p(\mathbb{P}, \delta_0) = \mathbb{E}_{\mathbb{P}}[[M]_T^{p/2} + |A|^p]^{1/p}$ where $X = M + A$ is the semimartingale decomposition under $\mathbb{P}$. Therefore the triangle inequality implies that $\mathcal{AW}_p$ is real-valued on $\mathcal{SM}_p(\Omega)$. □
3.2. Examples and explicit calculations.
We start with a simple result which yields a closed-form expression for the adapted Wasserstein distance in certain continuous-time situations:
Proposition 3.3.
For $i \in \{1,2\}$ consider the SDEs with bounded progressive coefficients:
\[ dX_t^i = \mu_i(t, \{X_s^i\}_{s\le t})\,dt + \sigma_i(t, \{X_s^i\}_{s\le t})\,dB_t^i. \tag{3.1} \]
Assume that each SDE admits a unique strong solution and denote by $\mathbb{P}^{\mu_i,\sigma_i}$ the respective laws. Further assume that
• $\mu_1$ is a function of time only (namely $\mu_1 : [0,T] \to \mathbb{R}$)
• $\sigma_1, \sigma_2 \ge 0$ and at least one of them is a function of time only.
Then the synchronous coupling (namely $\pi^*$ = joint law of $(X^1, X^2)$, where $B^1 = B^2$ in (3.1)) is optimal in the definition of $\mathcal{AW}_p(\mathbb{P}^{\mu_1,\sigma_1}, \mathbb{P}^{\mu_2,\sigma_2})$.

The discrete time version of the aforementioned synchronous coupling is given by the Knothe-Rosenblatt rearrangement [10], and a variant of the previous result can also be obtained in the discrete time framework.
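As an illustration of the discrete-time synchronous coupling, the following minimal sketch considers two Gaussian random walks on the grid $\{0, 1/N, \dots, 1\}$ with deterministic nonnegative volatilities and drives both with the same standard normal innovations. It is a sketch under stated assumptions, not the construction of [10]: the cost is taken to be the expected sum of squared increment differences (both walks are martingales, so no finite variation term appears), and the names `exact_cost_sq`, `simulated_cost_sq`, `sigma_a`, and `sigma_b` are hypothetical.

```python
import math
import random


def exact_cost_sq(sigma, sigma_prime, n_steps):
    """Sum over n of (sigma(n/N) - sigma'(n/N))^2 / N.

    This is the expected squared cost when both walks share their innovations.
    """
    total = 0.0
    for n in range(n_steps):
        t = n / n_steps
        diff = sigma(t) - sigma_prime(t)
        total += diff * diff / n_steps
    return total


def simulated_cost_sq(sigma, sigma_prime, n_steps, n_paths, seed=0):
    """Monte Carlo estimate of the same cost under the shared-noise coupling."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n_steps)
    total = 0.0
    for _ in range(n_paths):
        for n in range(n_steps):
            t = n / n_steps
            xi = rng.gauss(0.0, 1.0)  # one innovation drives both walks
            dx = sigma(t) * xi * scale
            dy = sigma_prime(t) * xi * scale
            total += (dx - dy) ** 2
    return total / n_paths


def sigma_a(t):
    # Assumed example volatility: increasing in time.
    return 1.0 + t


def sigma_b(t):
    # Assumed example volatility: constant.
    return 1.0


if __name__ == "__main__":
    # The Riemann sums approach the integral of t^2 over [0, 1], i.e. 1/3.
    for n_steps in (10, 100, 1000):
        print(n_steps, exact_cost_sq(sigma_a, sigma_b, n_steps))
    print("simulated:", simulated_cost_sq(sigma_a, sigma_b, 50, 5000))
```

In this example the exact cost is a left Riemann sum of $\int_0^1 |\sigma_t - \sigma'_t|^2\,dt$, which makes the passage from discrete to continuous time visible.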
Proof.
Let $\pi$ be a feasible coupling for $\mathcal{AW}_p(\mathbb{P}^{\mu_1,\sigma_1}, \mathbb{P}^{\mu_2,\sigma_2})$, leading to a finite cost. Naturally for this proof we denote the coordinate process on $\Omega \times \Omega$ by $(X^1, X^2)$. As before we let $X^i = A^i + M^i$ be the unique continuous semimartingale decomposition of $X^i$ under the $\mathbb{P}^{\mu_i,\sigma_i}$-completion of its right-continuous filtration. Observe that $\frac{d}{dt}A^1$ is a.s. deterministic, by the assumption on $\mu_1$, and that the law of $\frac{d}{dt}A^2$ is independent of the coupling $\pi$. Both facts can be derived easily from the identity
\[ \frac{d}{dt}A_t^i = \lim_{\varepsilon \searrow 0} \frac{E_\pi\big[X_{t+\varepsilon}^i \,\big|\, \mathcal{F}_t^{X^i}\big] - X_t^i}{\varepsilon}, \]
which by the Lebesgue differentiation theorem holds $dt \otimes d\pi$-a.s. As a consequence, the term $E_\pi[|A^1 - A^2|_{1\text{-var}}^p]$ is independent of the coupling $\pi$ and so we may ignore it and only focus on the term $E_\pi[[M^1 - M^2]_T^{p/2}]$.

By Doob's martingale representation [40, Theorem 4.2], in a possibly enlarged filtered probability space $(\tilde\Omega, \tilde{\mathcal{F}}, \tilde\pi)$ we may represent the martingale $(M^1, M^2)$ by
\[ M_t^i = \int_0^t \sigma_{i,1}\,dW + \int_0^t \sigma_{i,2}\,d\hat W, \]
where $W, \hat W$ are independent standard one-dimensional Brownian motions and $\{\sigma_{i,k} : i,k \in \{1,2\}\}$ are real-valued processes, all of them adapted in the enlarged filtered space. In the following we will omit the argument $\{X_s^i\}_{s\le t}$ from $\sigma_i$. Necessarily
\[ \sigma_i^2 = \frac{d}{dt}[M^i]_t = \sigma_{i,1}^2 + \sigma_{i,2}^2 \qquad (dt \otimes d\tilde\pi\text{-a.s.}). \]
By the Cauchy-Schwarz inequality we deduce that almost surely
\[ [M^1, M^2]_T = \int_0^T [\sigma_{1,1}\sigma_{2,1} + \sigma_{1,2}\sigma_{2,2}]\,dt \le \int_0^T \sigma_1\sigma_2\,dt, \]
and accordingly we get the lower bound
\[ E_\pi\big[[M^1 - M^2]_T^{p/2}\big] \ge E_\pi\Big[\Big(\int_0^T (\sigma_1 - \sigma_2)^2\,dt\Big)^{p/2}\Big]. \]
As in the beginning of the proof, the right-hand side does not depend on the coupling $\pi$, since at least one of the $\sigma_i$ is a function of time only. To conclude, observe that for the synchronous coupling $\pi^*$ we have equality in the above inequality. □

As an easy consequence we have
Example 3.4.
For bounded Lipschitz functions $\mu_1, \mu_2, \sigma_1, \sigma_2$ we denote by $\mathbb{P}^{\mu_i,\sigma_i}$ the law of the diffusion $dX_t^i = \mu_i(t, X_t^i)\,dt + \sigma_i(t, X_t^i)\,dB_t$. Assume that
• $\mu_i$ is independent of the $x$-variable for some $i \in \{1,2\}$, and
• $\sigma_k$ is independent of the $x$-variable for some $k \in \{1,2\}$.
Calling $j \in \{1,2\}\setminus\{i\}$ and $\ell \in \{1,2\}\setminus\{k\}$, we have
\[ \mathcal{AW}_p(\mathbb{P}^{\mu_1,\sigma_1}, \mathbb{P}^{\mu_2,\sigma_2})^p = E\Big[\Big(\int_0^T [\sigma_\ell(t, X_t^\ell) - \sigma_k(t)]^2\,dt\Big)^{p/2}\Big] + E\Big[\Big(\int_0^T |\mu_j(t, X_t^j) - \mu_i(t)|\,dt\Big)^p\Big]. \]

We now illustrate that in general it is not true that the straightforward synchronous coupling of Proposition 3.3 is optimal. As a consequence, we do not expect a closed-form expression for the adapted Wasserstein distance in general. A discrete-time version of this observation is discussed in [8, Section 7].
Example 3.5.
Consider $d = 1$, $T = 2$, and for each $c \in \mathbb{R}$ introduce $\mu_t^c(\omega) := c\,1_{[1,2]}(t)\,\mathrm{sign}(\omega_1)$ and $\hat\mu_t^c(\omega) := -\mu_t^c(\omega)$. Assuming that $B$ is a Brownian motion, and for $\sigma \in \mathbb{R}_+$, we introduce the couplings
\[ \pi^1 := \mathrm{Law}\Big(\sigma B + \int \mu_t^c(B)\,dt,\ \sigma B + \int \hat\mu_t^c(B)\,dt\Big), \]
\[ \pi^2 := \mathrm{Law}\Big(\sigma B + \int \mu_t^c(B)\,dt,\ -\sigma B + \int \hat\mu_t^c(-B)\,dt\Big). \]
These couplings share the same marginals and each of them is bi-causal. It is easy to compute
\[ E_{\pi^1}\big[[M]_T^{p/2} + |A|_{1\text{-var}}^p\big] = (2c)^p, \qquad E_{\pi^2}\big[[M]_T^{p/2} + |A|_{1\text{-var}}^p\big] = (8\sigma^2)^{p/2}. \]
We conclude that, for each $p$, there are plenty of pairs $(c, \sigma)$ such that the "synchronous" coupling $\pi^1$ is not optimal between its marginals for the metric $\mathcal{AW}_p$.

To close this section, we estimate the distance between two geometric Brownian motions with different volatilities.
Proposition 3.6.
For $i = 1, 2$, let $\mathbb{P}^{\sigma_i}$ be the law of the solution to the SDE $dZ_t^i = \sigma_i Z_t^i\,dB_t^i$ with $Z_0^i = 1$, where $B^i$ denotes Brownian motion and $\sigma_i \in \mathbb{R}_+$. Letting $R \sim N(0, T)$, we then have
\[ \mathcal{AW}_2(\mathbb{P}^{\sigma_1}, \mathbb{P}^{\sigma_2})^2 = E\Big[\Big(e^{\sigma_1 R - \sigma_1^2 T/2} - e^{\sigma_2 R - \sigma_2^2 T/2}\Big)^2\Big] = e^{\sigma_1^2 T} - 2e^{\sigma_1\sigma_2 T} + e^{\sigma_2^2 T} \]
and for $p > 1$
\[ \mathcal{AW}_p(\mathbb{P}^{\sigma_1}, \mathbb{P}^{\sigma_2})^p \le c_p\, E\Big[\Big(e^{\sigma_1 R - \sigma_1^2 T/2} - e^{\sigma_2 R - \sigma_2^2 T/2}\Big)^p\Big], \]
where $c_p$ is the constant in the BDG-inequality which allows one to control the quadratic variation by the terminal value.
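The closed-form expression for $p = 2$ can be checked numerically. The sketch below compares the right-hand side $e^{\sigma_1^2 T} - 2e^{\sigma_1\sigma_2 T} + e^{\sigma_2^2 T}$ with a trapezoidal approximation of the Gaussian expectation. The function names, the truncation width, and the grid size are illustrative assumptions.

```python
import math


def gbm_aw2_squared_closed_form(sigma1, sigma2, horizon):
    """Right-hand side of the p = 2 identity in Proposition 3.6."""
    return (math.exp(sigma1 ** 2 * horizon)
            - 2.0 * math.exp(sigma1 * sigma2 * horizon)
            + math.exp(sigma2 ** 2 * horizon))


def gbm_aw2_squared_quadrature(sigma1, sigma2, horizon, n_grid=20001, width=12.0):
    """Trapezoidal approximation of
    E[(exp(s1*R - s1^2*T/2) - exp(s2*R - s2^2*T/2))^2] with R ~ N(0, T).

    The integral is truncated to [-width*sqrt(T), width*sqrt(T)].
    """
    sd = math.sqrt(horizon)
    lower, upper = -width * sd, width * sd
    step = (upper - lower) / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        r = lower + i * step
        density = math.exp(-r * r / (2.0 * horizon)) / math.sqrt(2.0 * math.pi * horizon)
        diff = (math.exp(sigma1 * r - 0.5 * sigma1 ** 2 * horizon)
                - math.exp(sigma2 * r - 0.5 * sigma2 ** 2 * horizon))
        weight = 0.5 if i in (0, n_grid - 1) else 1.0  # trapezoid end points
        total += weight * diff * diff * density * step
    return total


if __name__ == "__main__":
    closed = gbm_aw2_squared_closed_form(0.2, 0.4, 1.0)
    numeric = gbm_aw2_squared_quadrature(0.2, 0.4, 1.0)
    print(closed, numeric)
```

The two values agree up to quadrature error, and the closed form vanishes when $\sigma_1 = \sigma_2$, as it should for a distance.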
Proof.
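We have
\begin{align*}
\mathcal{AW}_p(\mathbb{P}^{\sigma_1}, \mathbb{P}^{\sigma_2})^p &= \inf\big\{E_\pi\big[[Z^1 - Z^2]_T^{p/2}\big] : \pi \in \mathrm{Cpl}_{BC}(\mathbb{P}^{\sigma_1}, \mathbb{P}^{\sigma_2})\big\} \\
&\le c_p \inf\big\{E_\pi[(Z_T^1 - Z_T^2)^p] : \pi \in \mathrm{Cpl}_{BC}(\mathbb{P}^{\sigma_1}, \mathbb{P}^{\sigma_2})\big\} \\
&= c_p \inf\Big\{\int \Big(e^{\sigma_1 r_1 - \sigma_1^2 T/2} - e^{\sigma_2 r_2 - \sigma_2^2 T/2}\Big)^p d\pi(r_1, r_2) : \pi \in \mathrm{Cpl}(\gamma_T, \gamma_T)\Big\} \\
&= c_p\, E\Big[\Big(e^{\sigma_1 R - \sigma_1^2 T/2} - e^{\sigma_2 R - \sigma_2^2 T/2}\Big)^p\Big],
\end{align*}
where $\gamma_T$ denotes a centered Gaussian with variance $T$. For $p = 2$ and $c_2 = 1$ we obtain equality. □

Choice of the 'cost functional'.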
Recall from Definition 1.3 that the adapted Wasserstein distance is given through
\[ \mathcal{AW}_p(\mathbb{P}, \mathbb{Q}) := \inf\{\Phi : \pi \in \mathrm{Cpl}_{BC}(\mathbb{P}, \mathbb{Q})\}, \]
where the 'cost functional'
\[ \Phi = E_\pi\big[[M^X - M^Y]_T^{p/2} + |A^X - A^Y|_{1\text{-var}}^p\big]^{1/p} \tag{3.2} \]
is defined using the semimartingale decompositions $X = M^X + A^X$, $Y = M^Y + A^Y$. The distinctive property of this "quadratic plus first variation" functional is that it exhibits the proper scaling to interpret the discrete time case as an approximation to the continuous time counterpart. To wit, consider $\Omega = C([0,1])$ and let $\mathbb{P}^\sigma$ be the law of $X$ where $X_t = \int_0^t \sigma_s\,dB_s$, $B$ is a Brownian motion and $\sigma \in C([0,1])$, $\sigma \ge 0$. For each $N$, denote by $\mathbb{P}_N^\sigma$ the law of a random walk on $\{0, 1/N, 2/N, \dots, 1\}$ with independent increments from $n/N$ to $(n+1)/N$ distributed according to $N(0, \sigma_{n/N}^2/N)$. Then one can compute that for $0 \le \sigma, \sigma' \in C([0,1])$
\[ \mathcal{AW}_2(\mathbb{P}_N^\sigma, \mathbb{P}_N^{\sigma'}) = \Big(\sum_{n=0}^{N-1} \frac{1}{N}|\sigma_{n/N} - \sigma'_{n/N}|^2\Big)^{1/2} \to \Big(\int_0^1 |\sigma_t - \sigma'_t|^2\,dt\Big)^{1/2} = \mathcal{AW}_2(\mathbb{P}^\sigma, \mathbb{P}^{\sigma'}). \]
For comparison, consider the consequences of replacing $\Phi$ in (3.2) with
\[ \tilde\Phi = E_\pi\Big[\sum_{i=0}^N (X_i - Y_i)^2\Big]^{1/2}, \]
corresponding to the quadratic nested distance (in terms of Pflug and Pichler [53]). While $\widetilde{\mathcal{AW}}_2$ and $\mathcal{AW}_2$ are equivalent metrics for each fixed $N$, $\widetilde{\mathcal{AW}}_2$ does not exhibit the appropriate scaling for large $N$. A straightforward computation shows $\widetilde{\mathcal{AW}}_2(\mathbb{P}_N^\sigma, \mathbb{P}_N^{\sigma'}) \to \infty$ as $N \to \infty$ whenever $\sigma \ne \sigma'$. In consequence, bounds on the hedging error in terms of $\widetilde{\mathcal{AW}}_2(\mathbb{P}_N^\sigma, \mathbb{P}_N^{\sigma'})$ become progressively weaker as $N \to \infty$. In particular they do not allow for a meaningful continuous time limit.

When restricting solely to martingale measures $\mathbb{P}, \mathbb{Q}$, a sensible alternative to (3.2) would be to consider the maximum norm, i.e. $\Phi' = E_\pi[\sup_t |X_t - Y_t|^p]^{1/p}$. In fact, by the BDG-inequalities this is essentially equivalent to our choice in (3.2). However, when considering semimartingales, this cost is too coarse. For example, let $(\omega_n)$ be a sequence in $\Omega$ which converges to zero in maximum norm but for which the first variation tends to infinity. Then $\mathbb{P}_n := \delta_{\omega_n}$ converges to $\mathbb{P} := \delta_0$ (when the adapted distance is defined only with the maximum norm as cost); however, none of our optimization problems converge (take a strategy $H \in \mathcal{H}^k$ for which $(H(X) \bullet X)_T \approx k|\omega_n|_{1\text{-var}}$ almost surely).

Stochastic integrals and a contraction principle.
We present here the two technical results which underlie the proofs of the main theorems in the article. The first one is
Lemma 3.7.
Let $\mathbb{P}, \mathbb{Q} \in \mathcal{SM}_1(\Omega)$, $H \in \mathcal{H}^k$, and let $\pi$ be a bi-causal coupling between $\mathbb{P}$ and $\mathbb{Q}$. Then there exists a process $G \in \mathcal{H}^k$ such that $G_t(Y) = E_\pi[H_t(X) \mid Y]$ for every $t$, $\pi$-almost surely. Moreover, we have $(G(Y) \bullet Y)_T = E_\pi[(H(X) \bullet Y)_T \mid Y]$, $\pi$-almost surely.

Proof. In discrete time, write $H = \sum_{t=1}^N H_t 1_{\{t\}}$ for Borel functions $H_t : \mathbb{R}^{t-1} \to [-k, k]$. Let $\pi = \pi_\eta(d\omega)\,\mathbb{Q}(d\eta)$ be a disintegration and define
\[ G'_t(\eta) := \int H_t(\omega)\,\pi_\eta(d\omega) \]
for every $t$ and $\eta \in \Omega$. By definition of a bi-causal coupling, $G'_t$ is $\mathcal{F}_{t-1}^{\mathbb{Q}}$-measurable. It remains to pick functions $G_t$ which are $\mathcal{F}_{t-1}$-measurable such that $G_t = G'_t$ $\mathbb{Q}$-almost surely. Since $E_\pi[H_t(X) \mid Y] = G_t(Y)$ $\pi$-almost surely, it is clear that $(G(Y) \bullet Y)_T = E_\pi[(H(X) \bullet Y)_T \mid Y]$ $\pi$-almost surely.

In continuous time we take $G$ to be the predictable projection of $H$, under the reference measure $\pi$, with respect to the $\pi$-completion of the filtration $\{\emptyset, \Omega\} \otimes \mathcal{F}^Y$. By [1, Lemma C.1] the result is $\pi$-indistinguishable from a predictable process under the $\mathbb{Q}$-completion of the filtration $\mathcal{F}^Y$. The $t$-by-$t$, $\pi$-almost sure equality $G_t(Y) = E_\pi[H_t(X) \mid Y]$ is then a consequence of the definition of predictable projection. The $\pi$-almost sure equality $(G(Y) \bullet Y)_T = E_\pi[(H \bullet Y)_T \mid Y]$ is established in Lemma 3.8 below, assuming that $E_{\mathbb{Q}}[[Y]_T] < \infty$. The general case follows by localization. □

Lemma 3.8.
In the continuous-time context of Lemma 3.7, assume further that $E_{\mathbb{Q}}[[Y]_T] < \infty$. Then we have $(G(Y) \bullet Y)_T = E_\pi[(H(X) \bullet Y)_T \mid Y]$, $\pi$-almost surely.

Proof. The statement is true if instead of the stochastic integrals we consider the integrals w.r.t. the finite variation part of $Y$ (either by properties of Riemann-Stieltjes integrals, or directly from the definition of predictable projection). For this reason we may now assume that $Y$ is itself a martingale.

We first take for granted the following result: if $h$ is bounded and predictable in the filtration of $(X, Y)$, and if $g$ denotes its predictable projection in the filtration of $Y$ under the measure $\pi$, then
\[ E_\pi\Big[\int_0^T |g_t|^2\,d[Y]_t\Big] \le E_\pi\Big[\int_0^T |h_t|^2\,d[Y]_t\Big]. \tag{3.3} \]
We know that there exists a sequence $(H^n)$ of predictable simple processes s.t.
\[ \lim_{n\to\infty} E_\pi\Big[\int_0^T |H_t - H_t^n|^2\,d[Y]_t\Big] = 0. \]
By the Itô isometry the stochastic integrals $(H^n \bullet Y)_T$ converge in $L^2(\pi)$ to $(H \bullet Y)_T$. Denoting by $G^n$ the predictable projection of $H^n$ with respect to the $Y$-filtration, we deduce from (3.3) that
\[ \lim_{n\to\infty} E_\pi\Big[\int_0^T |G_t - G_t^n|^2\,d[Y]_t\Big] = 0, \]
so again by the Itô isometry $(G^n \bullet Y)_T$ converges in $L^2(\pi)$ to $(G \bullet Y)_T$. The $\pi$-almost sure equality $(G^n \bullet Y)_T = E_\pi[(H^n \bullet Y)_T \mid Y]$ follows easily from the bi-causality of the coupling $\pi$, and by taking $L^2$ limits the desired conclusion is obtained.
To finish the proof we must establish (3.3). First we observe that
\[ E_\pi\Big[\int_0^T |g_t|^2\,d[Y]_t\Big]^{1/2} = \sup_{\substack{f \text{ is } Y\text{-predictable} \\ \|f\| \le 1}} E_\pi\Big[\int_0^T f_t g_t\,d[Y]_t\Big] = \sup_{\substack{f \text{ is } Y\text{-predictable} \\ \|f\| \le 1}} E_\pi\Big[\int_0^T f_t h_t\,d[Y]_t\Big], \]
as follows from predictable projection and upon taking $\|f\| := E_\pi[\int_0^T |f_t|^2\,d[Y]_t]^{1/2}$. The result is a consequence of the equality
\[ E_\pi\Big[\int_0^T |h_t|^2\,d[Y]_t\Big]^{1/2} = \sup_{\substack{f \text{ is } (X,Y)\text{-predictable} \\ \|f\| \le 1}} E_\pi\Big[\int_0^T f_t h_t\,d[Y]_t\Big]. \]
□

Our next crucial technical result is given in Theorem 3.10 below. But first we need some preparation.
Lemma 3.9.
Let $\mathbb{P}, \mathbb{Q} \in \mathcal{SM}_p(\Omega)$, let $\pi$ be a bi-causal coupling between $\mathbb{P}$ and $\mathbb{Q}$, let $H \in \mathcal{H}^k$, and write $X - Y = M + A$ for the semimartingale decomposition under $\pi$. Then, for every $p \ge 1$, we have
\begin{align*}
E_\pi[\|X - Y\|_\infty^p] &\le 2^{p-1} b_p \cdot E_\pi\big[[M]_T^{p/2} + |A|_{1\text{-var}}^p\big], \\
E_\pi[|(H(X) \bullet X)_T - (H(X) \bullet Y)_T|^p] &\le 2^{p-1} b_p k^p \cdot E_\pi\big[[M]_T^{p/2} + |A|_{1\text{-var}}^p\big],
\end{align*}
where $b_p$ is the upper constant in the BDG-inequality. If further $H_t : \Omega \to \mathbb{R}$ is $\tilde L$-Lipschitz continuous for every $t$, then we have
\[ E_\pi[|(H(X) \bullet X)_T - (H(Y) \bullet Y)_T|^p] \le 2^{2p-2} b_p k^p \cdot E_\pi\big[[M]_T^{p/2} + |A|_{1\text{-var}}^p\big] + \alpha \cdot E_\pi\big[[M]_T^p + |A|_{1\text{-var}}^{2p}\big]^{1/2}, \]
where $\alpha = 2^{3p-2}\tilde L^p b_p b_{2p}^{1/2} \min\{\mathcal{AW}_{2p}(\mathbb{P}, \delta_0)^p, \mathcal{AW}_{2p}(\mathbb{Q}, \delta_0)^p\}$.

Proof. The elementary inequality $(x + y)^p \le 2^{p-1}x^p + 2^{p-1}y^p$ for $x, y \ge 0$ and $\|\cdot\|_\infty \le |\cdot|_{1\text{-var}}$ imply
\[ E_\pi[\|X - Y\|_\infty^p] \le 2^{p-1}E_\pi[\|M\|_\infty^p] + 2^{p-1}E_\pi[|A|_{1\text{-var}}^p] \le 2^{p-1} b_p E_\pi\big[[M]_T^{p/2} + |A|_{1\text{-var}}^p\big]. \]
This proves the first part. The same arguments imply
\begin{align*}
E_\pi[|(H(X) \bullet X)_T - (H(X) \bullet Y)_T|^p] &\le 2^{p-1}E_\pi[|(H(X) \bullet M)_T|^p] + 2^{p-1}E_\pi[|(H(X) \bullet A)_T|^p] \\
&\le 2^{p-1}k^p b_p E_\pi\big[[M]_T^{p/2} + |A|_{1\text{-var}}^p\big],
\end{align*}
from which the second part follows. To prove the third claim, write
\[ E_\pi[|(H(X) \bullet X)_T - (H(Y) \bullet Y)_T|^p] \le 2^{p-1}E_\pi[|((H(X) - H(Y)) \bullet X)_T|^p] + 2^{p-1}E_\pi[|(H(Y) \bullet X)_T - (H(Y) \bullet Y)_T|^p]. \]
The second term is smaller than $2^{p-1}2^{p-1}k^p b_p E_\pi[[M]_T^{p/2} + |A|_{1\text{-var}}^p]$ by the second part. It remains to estimate $E_\pi[|((H(X) - H(Y)) \bullet X)_T|^p]$. Write $X = N + B$ for the semimartingale decomposition of $X$ under $\mathbb{P}$. By Lemma 3.1, the semimartingale decomposition under $\pi$ is still $X = N + B$.
Moreover, the BDG-inequality, the Lipschitz-continuity of $H$, and Hölder's inequality imply that
\begin{align*}
E_\pi[|((H(X) - H(Y)) \bullet X)_T|^p] &\le 2^{p-1}E_\pi[|((H(X) - H(Y)) \bullet N)_T|^p + |((H(X) - H(Y)) \bullet B)_T|^p] \\
&\le 2^{p-1}E_\pi\big[\|H(X) - H(Y)\|_\infty^p\,(b_p[N]_T^{p/2} + |B|_{1\text{-var}}^p)\big] \\
&\le 2^{p-1}b_p\tilde L^p\, E_\pi[\|X - Y\|_\infty^{2p}]^{1/2}\, E_\pi\big[([N]_T^{p/2} + |B|_{1\text{-var}}^p)^2\big]^{1/2}.
\end{align*}
It now follows from the first part that
\[ E_\pi[\|X - Y\|_\infty^{2p}]^{1/2} \le (2^{2p-1}b_{2p})^{1/2}\, E_\pi\big[[M]_T^p + |A|_{1\text{-var}}^{2p}\big]^{1/2} \]
and by Lemma 3.1 we have
\[ E_\pi\big[([N]_T^{p/2} + |B|_{1\text{-var}}^p)^2\big]^{1/2} \le 2^{1/2}\,\mathcal{AW}_{2p}(\mathbb{P}, \delta_0)^p. \]
Putting all estimates together and interchanging the roles of $X$ and $Y$ yields the claim. □

Denote by $\mathcal{P}_p(\mathbb{R})$ the set of all Borel probability measures $\mu$ on $\mathbb{R}$ such that $\int |x|^p\,\mu(dx) < \infty$. Moreover, let $d_p(\mu, \nu)$ be the usual $p$-Wasserstein distance, and let $d_p^w$ be the weak $p$-Wasserstein cost, that is,
\begin{align*}
d_p(\mu, \nu) &:= \inf\Big\{\Big(\int |x - y|^p\,\gamma(dx, dy)\Big)^{1/p} : \gamma \text{ is a coupling of } \mu \text{ and } \nu\Big\}, \\
d_p^w(\mu, \nu) &:= \inf\Big\{\Big(\int \Big|x - \int y\,\gamma_x(dy)\Big|^p \mu(dx)\Big)^{1/p} : \gamma \text{ is a coupling of } \mu \text{ and } \nu\Big\}.
\end{align*}
Here $\gamma = \mu(dx)\,\gamma_x(dy)$ denotes the disintegration. Note that $d_p^w$ is not symmetric and, as a consequence of Jensen's inequality, we always have $d_p^w \le d_p$. Problems akin to $d_p^w(\mu, \nu)$ go under the name of 'weak optimal transport' and have been recently introduced by Gozlan et al. in [32], but see also [3, 4, 11, 9, 31]. We have

Theorem 3.10 (Contraction). Let $\mathbb{P}, \mathbb{Q} \in \mathcal{SM}_p(\Omega)$, let $\pi$ be a bi-causal coupling between $\mathbb{P}$ and $\mathbb{Q}$, let $C : \Omega \to \mathbb{R}$ be Lipschitz with constant $L$, and let $H \in \mathcal{H}^k$. Further denote by $X - Y = M + A$ the semimartingale decomposition under $\pi$ and let $G \in \mathcal{H}^k$ be such that $(G(Y) \bullet Y)_T = E_\pi[(H(X) \bullet Y)_T \mid Y]$ $\pi$-almost surely. Then
\[ d_p^w\Big((C(Y) + (G(Y) \bullet Y)_T)(\mathbb{Q}),\ (C(X) + (H(X) \bullet X)_T)(\mathbb{P})\Big) \le (2^{p-1})^{1/p}\, b_p^{1/p}\,(k + L) \cdot E_\pi\big[[M]_T^{p/2} + |A|_{1\text{-var}}^p\big]^{1/p}. \tag{3.4} \]
Now assume in addition that $H_t : \Omega \to \mathbb{R}$ is $\tilde L$-Lipschitz continuous for every $t$. Then
\[ d_p\Big((C(Y) + (H(Y) \bullet Y)_T)(\mathbb{Q}),\ (C(X) + (H(X) \bullet X)_T)(\mathbb{P})\Big) \le (2^{3p-3})^{1/p}\, b_p^{1/p}\,(k + L)\, E_\pi\big[[M]_T^{p/2} + |A|_{1\text{-var}}^p\big]^{1/p} + \alpha^{1/p}\, E_\pi\big[[M]_T^p + |A|_{1\text{-var}}^{2p}\big]^{1/(2p)}, \]
where $\alpha$ is the constant of Lemma 3.9.

Proof. We start by proving the first claim. Let $\pi$ be as stated, and define $a(X) := C(X) + (H(X) \bullet X)_T$ as well as $b(Y) := C(Y) + (G(Y) \bullet Y)_T$. Now let $\gamma := (b(Y), a(X))(\pi)$ so that $\gamma$ is trivially a coupling between $b(Y)(\mathbb{Q})$ and $a(X)(\mathbb{P})$. Therefore
\[ d_p^w\big(b(Y)(\mathbb{Q}),\ a(X)(\mathbb{P})\big) \le E_\pi\big[|b(Y) - E_\pi[a(X) \mid b(Y)]|^p\big]^{1/p}. \]
By assumption it holds that
\[ E_\pi[(G(Y) \bullet Y)_T - (H(X) \bullet X)_T \mid Y] = E_\pi[(H(X) \bullet Y)_T - (H(X) \bullet X)_T \mid Y]. \]
Thus, using the tower property and Jensen's inequality, it follows that
\begin{align*}
E_\pi\big[|b(Y) - E_\pi[a(X) \mid b(Y)]|^p\big]^{1/p} &\le E_\pi\big[\big|E_\pi[C(Y) - C(X) \mid Y] + E_\pi[(G(Y) \bullet Y)_T - (H(X) \bullet X)_T \mid Y]\big|^p\big]^{1/p} \\
&\le E_\pi[|C(Y) - C(X)|^p]^{1/p} + E_\pi[|(H(X) \bullet Y)_T - (H(X) \bullet X)_T|^p]^{1/p}.
\end{align*}
The claim now follows from the first and second estimates in Lemma 3.9.

In the second case, where $H$ is additionally Lipschitz, let $d(X) := C(X) + (H(X) \bullet X)_T$ as well as $e(Y) := C(Y) + (H(Y) \bullet Y)_T$ and $\gamma := (e(Y), d(X))(\pi)$. Then, similarly as before,
\begin{align*}
d_p\big(e(Y)(\mathbb{Q}),\ d(X)(\mathbb{P})\big) &\le E_\pi[|e(Y) - d(X)|^p]^{1/p} \\
&\le E_\pi[|C(Y) - C(X)|^p]^{1/p} + E_\pi[|(H(Y) \bullet Y)_T - (H(X) \bullet X)_T|^p]^{1/p}
\end{align*}
and the claim follows from the first and third estimates of Lemma 3.9. □

Remark 3.11.
An evident question is whether an estimate for the usual Wasserstein distance holds true without the (Lipschitz-) continuity assumption on $H$, namely whether (3.4) holds for $d_p$ instead of $d_p^w$. The following example shows that this is not true. In a two-period discrete time model ($T = 2$), let
\[ \mathbb{P} := \delta_0 \otimes ((\delta_1 + \delta_{-1})/2) \quad\text{and}\quad \mathbb{P}^\varepsilon := ((\delta_\varepsilon + \delta_{-\varepsilon})/2) \otimes ((\delta_1 + \delta_{-1})/2), \]
so that $\mathcal{AW}_p(\mathbb{P}^\varepsilon, \mathbb{P}) \to 0$ as $\varepsilon \to 0$ for every $p$. Then, set $H_1 := 0$ and $H_2 := 1_{(0,\infty)} - 1_{(-\infty,0)}$. For the projection of $H$ onto $Y$ under any bi-causal coupling between $\mathbb{P}^\varepsilon$ and $\mathbb{P}$ one computes $G_1 = 0$ and $G_2 = 0$. In particular $(G(Y) \bullet Y)_T = 0$ $\mathbb{P}$-almost surely. However, for every $\varepsilon > 0$ one has $\mathbb{P}^\varepsilon((H(X) \bullet X)_T \ge 1 - \varepsilon) \ge 1/2$, which implies that the respective laws cannot converge.

Remark 3.12. By $b_p$ we denote the smallest real number such that
\[ E[\|M\|_\infty^p] \le b_p\, E\big[[M]_T^{p/2}\big] \tag{3.5} \]
for every martingale $M$. For $p \ge 2$ it was established by Burkholder [22] that $b_p = p^p$, but the value of $b_p$ is unknown for $p \in [1, 2)$ according to [50], [51, page 427]. By [17], $b_1 \le 6$. (The optimal constant in the reverse inequality is known for the trivial case $p = 2$ and for $p = 1$. In the latter instance one obtains $\sqrt{3}$, and $1.27267\ldots$ [59] for continuous martingales, respectively.)

Proofs of the results stated in the introduction and extensions
Thanks to the work done in the previous section, the strategy for the proofs boils down to two parts. In a first step, one forgets about the space $\Omega$ and only focuses on continuity of the problem at hand with respect to $d_p$ or $d_p^w$ when image measures on $\mathbb{R}$ are plugged in: e.g. in utility maximization this means studying continuity of $\mu \mapsto \int U(x)\,\mu(dx)$. In a second step, one uses the obtained continuity and the contraction theorem of the previous section.

4.1. Proof of Theorem 1.5.
We will need the elementary estimate
Lemma 4.1.
Let $\mu, \nu \in \mathcal{P}_1(\mathbb{R})$ and let $f : \mathbb{R} \to \mathbb{R}$ be convex and Lipschitz. Then
\[ \int f(x)\,\mu(dx) - \int f(y)\,\nu(dy) \le L\, d_1^w(\mu, \nu), \tag{4.1} \]
where $L$ is the Lipschitz constant of $f$.

Proof.
Let $\gamma$ be a coupling of $\mu$ and $\nu$. Applying Jensen's inequality we obtain
\begin{align*}
\int f(x)\,\mu(dx) - \int f(y)\,\nu(dy) &= \int [f(x) - f(y)]\,\gamma(dx, dy) \\
&= \int \Big(f(x) - \int f(y)\,\gamma_x(dy)\Big)\mu(dx) \\
&\le \int \Big(f(x) - f\Big(\int y\,\gamma_x(dy)\Big)\Big)\mu(dx) \\
&\le L \int \Big|x - \int y\,\gamma_x(dy)\Big|\,\mu(dx).
\end{align*}
As $\gamma$ was arbitrary, this implies the claim. □

In fact there is equality in the previous lemma if one takes the supremum on the l.h.s. of (4.1) over all $L$-Lipschitz convex functions, as shown in [32, Proposition 3.2].

We now turn to the proof of Theorem 1.5. For $n > 0$ let $\pi^n$ be a bi-causal coupling which attains the infimum in the definition of $\mathcal{AW}_1(\mathbb{P}, \mathbb{Q})$ modulo a $1/n$-margin. By Lemma 3.7 there is $G^n \in \mathcal{H}^k$ such that $(G^n(Y) \bullet Y)_T = E_{\pi^n}[(H(X) \bullet Y)_T \mid Y]$ $\pi^n$-almost surely. Define
\[ \mu^n := (C(Y) + (G^n(Y) \bullet Y)_T)(\mathbb{Q}) \quad\text{and}\quad \nu := (C(X) + (H(X) \bullet X)_T)(\mathbb{P}). \]
(Note that $\mu^n, \nu \in \mathcal{P}_1(\mathbb{R})$ as $\mathbb{P}, \mathbb{Q} \in \mathcal{SM}_1(\Omega)$.) By Lemma 4.1 we have
\[ E_{\mathbb{Q}}\big[(C(Y) - m - (G^n(Y) \bullet Y)_T)^+\big] - E_{\mathbb{P}}\big[(C(X) - m - (H(X) \bullet X)_T)^+\big] \le d_1^w(\mu^n, \nu). \]
From Theorem 3.10 we obtain
\[ E_{\mathbb{Q}}\big[(C(Y) - m - (G^n(Y) \bullet Y)_T)^+\big] \le E_{\mathbb{P}}\big[(C(X) - m - (H(X) \bullet X)_T)^+\big] + b_1(k + L)\big(\mathcal{AW}_1(\mathbb{P}, \mathbb{Q}) + 1/n\big). \tag{4.2} \]
Assume first that $E_{\mathbb{Q}}[[Y]_T] < \infty$ and denote by $A$ the finite variation process associated to $Y$. Then, as $(G^n)$ is uniformly bounded by $k$, there exist a predictable $G$ and a sequence of forward-convex combinations of $(G^n)$ which converge in $L^2(d\mathbb{Q} \otimes d([Y] + A))$ to $G$. This, (4.2), and the convexity of $(\cdot)^+$ lead to the desired conclusion. The general case follows by a simple but notationally heavy localization argument.

The proof in the case that $G = H$ and $H$ is Lipschitz follows analogously from the second part of Theorem 3.10.

4.2. Proof of Theorem 1.6.
In a first step notice that for all $\mathbb{P}, \mathbb{P}'$ and random variables $Z, Z'$, it follows as in Lemma 4.1 that
\[ \mathrm{AVaR}_\alpha^{\mathbb{P}}(Z) - \mathrm{AVaR}_\alpha^{\mathbb{P}'}(Z') \le d_1^w(Z(\mathbb{P}), Z'(\mathbb{P}'))/\alpha. \]
Indeed, if $\gamma$ is a coupling from $\mu := Z(\mathbb{P})$ to $\nu := Z'(\mathbb{P}')$ then
\begin{align*}
\mathrm{AVaR}_\alpha^{\mathbb{P}}(Z) - \mathrm{AVaR}_\alpha^{\mathbb{P}'}(Z') &= \inf_m \int \Big[\frac{1}{\alpha}(x - m)^+ + m\Big]\mu(dx) - \inf_m \int \Big[\frac{1}{\alpha}\int (y - m)^+\,\gamma_x(dy) + m\Big]\mu(dx) \\
&\le \sup_m \frac{1}{\alpha}\int \big[(x - m)^+ - (y - m)^+\big]\,\gamma(dx, dy) \\
&\le \sup_m \frac{1}{\alpha}\int \Big[(x - m)^+ - \Big(\int y\,\gamma_x(dy) - m\Big)^+\Big]\mu(dx) \\
&\le \frac{1}{\alpha}\int \Big|x - \int y\,\gamma_x(dy)\Big|\,\mu(dx),
\end{align*}
so minimizing over $\gamma$ yields the claim.

The rest of the proof now follows the line of argumentation of the proof of Theorem 1.5. Fix $\mathbb{P}, \mathbb{Q} \in \mathcal{SM}_1(\Omega)$. Assume only for notational simplicity that there exists a bi-causal coupling $\pi$ which attains the infimum in the definition of $\mathcal{AW}_1(\mathbb{P}, \mathbb{Q})$, and that there exists $H^* \in \mathcal{H}^k$ such that
\[ \mathrm{AVaR}_\alpha^{\mathbb{P}}(C(X) - (H^*(X) \bullet X)_T) = \inf_{H \in \mathcal{H}^k} \mathrm{AVaR}_\alpha^{\mathbb{P}}(C(X) - (H(X) \bullet X)_T). \]
By Lemma 3.7 there is $G^* \in \mathcal{H}^k$ such that $(G^*(Y) \bullet Y)_T = E_\pi[(H^*(X) \bullet Y)_T \mid Y]$ $\pi$-almost surely. Therefore
\begin{align*}
&\inf_{G \in \mathcal{H}^k} \mathrm{AVaR}_\alpha^{\mathbb{Q}}(C(Y) - (G(Y) \bullet Y)_T) - \inf_{H \in \mathcal{H}^k} \mathrm{AVaR}_\alpha^{\mathbb{P}}(C(X) - (H(X) \bullet X)_T) \\
&\qquad\le \mathrm{AVaR}_\alpha^{\mathbb{Q}}(C(Y) - (G^*(Y) \bullet Y)_T) - \mathrm{AVaR}_\alpha^{\mathbb{P}}(C(X) - (H^*(X) \bullet X)_T) \\
&\qquad\le \frac{1}{\alpha}\, d_1^w\Big((C(Y) - (G^*(Y) \bullet Y)_T)(\mathbb{Q}),\ (C(X) - (H^*(X) \bullet X)_T)(\mathbb{P})\Big) \\
&\qquad\le \frac{b_1(k + L)}{\alpha}\,\mathcal{AW}_1(\mathbb{P}, \mathbb{Q}),
\end{align*}
where the last inequality is due to Theorem 3.10. Interchanging the roles of $\mathbb{P}$ and $\mathbb{Q}$ yields the desired conclusion. The proof of the second estimate follows analogously.

4.3. Proof of Example 1.7.
First note that $\mathrm{AVaR}_\alpha^{\mathbb{P}}(Z) \ge E_{\mathbb{P}}[Z]$ for every integrable random variable $Z$. Indeed, this follows from integrating the pointwise inequality $x = (x - m) + m \le (x - m)^+/\alpha + m$. Therefore, as the Brownian stochastic integral has expectation zero, we conclude that $\inf_{H \in \mathcal{H}^k} \mathrm{AVaR}_\alpha^{\mathbb{P}}(C(X) - (H(X) \bullet X)_T) \ge E_{\mathbb{P}}[C(X)]$. On the other hand, define
\[ f(t, x) := \int c(x + y)\,N(0, \sigma^2(T - t))(dy) \quad\text{for } (t, x) \in [0, T] \times \mathbb{R}, \]
where $N(0, \sigma^2(T - t))$ stands for the normal distribution with mean $0$ and variance $\sigma^2(T - t)$. Then $C(X) = f(T, X_T)$ and $E_{\mathbb{P}}[f(t, X_t) \mid \mathcal{F}_s] = f(s, X_s)$ for every $0 \le s \le t \le T$. Thus, by Itô's formula and the fact that the martingale property implies that the finite variation part vanishes, one has $f(T, X_T) = f(0, 0) + (H^*(X) \bullet X)_T$ for the predictable trading strategy $H_t^* := \partial_x f(t, X_t)$. As further $|H_t^*| \le 1$ for every $t$ and $f(0, 0) = \sigma/\sqrt{2\pi}$, one has
\[ \inf_{H \in \mathcal{H}^1} \mathrm{AVaR}_\alpha^{\mathbb{P}}(C(X) - (H(X) \bullet X)_T) \le \mathrm{AVaR}_\alpha^{\mathbb{P}}(C(X) - (H^*(X) \bullet X)_T) = \frac{\sigma}{\sqrt{2\pi}}. \]
The proof now follows from the explicit formula for the adapted Wasserstein distance derived in Example 3.4 and the fact that $E_{\mathbb{P}}[C(X)] = \sigma/\sqrt{2\pi}$.

4.4. Proof of Theorem 1.8.
Recall that $U'(x) \le c(1 + |x|^{p-1})$ for all $x \in \mathbb{R}$ and some constant $c$. Let $\mathbb{P}, \mathbb{Q} \in \mathcal{SM}_p(\Omega)$ be arbitrary and assume only for notational simplicity that there is $H^* \in \mathcal{H}^k$ such that
\[ E_{\mathbb{P}}[U(C(X) + (H^*(X) \bullet X)_T)] = \sup_{H \in \mathcal{H}^k} E_{\mathbb{P}}[U(C(X) + (H(X) \bullet X)_T)] \]
and that there is a bi-causal coupling $\pi$ between $\mathbb{P}$ and $\mathbb{Q}$ which is optimal for $\mathcal{AW}_p(\mathbb{P}, \mathbb{Q})$. By Lemma 3.7 there is $G^* \in \mathcal{H}^k$ such that $(G^*(Y) \bullet Y)_T = E_\pi[(H^*(X) \bullet Y)_T \mid Y]$ $\pi$-almost surely. Let
\[ \mu := (C(Y) + (G^*(Y) \bullet Y)_T)(\mathbb{Q}) \quad\text{and}\quad \nu := (C(X) + (H^*(X) \bullet X)_T)(\mathbb{P}), \]
and let $\gamma$ be an (almost) optimal coupling for $d_p^w(\mu, \nu)$. As $U$ is concave and increasing, we have
\[ U(y) - U(x) \le U'(\min\{x, y\})\,|x - y|. \]
Using Jensen's inequality for the concave function $U$ we have
\begin{align*}
&\sup_{H \in \mathcal{H}^k} E_{\mathbb{P}}[U(C(X) + (H(X) \bullet X)_T)] - \sup_{G \in \mathcal{H}^k} E_{\mathbb{Q}}[U(C(Y) + (G(Y) \bullet Y)_T)] \\
&\qquad\le E_{\mathbb{P}}[U(C(X) + (H^*(X) \bullet X)_T)] - E_{\mathbb{Q}}[U(C(Y) + (G^*(Y) \bullet Y)_T)] \\
&\qquad= \int [U(y) - U(x)]\,\gamma(dx, dy) \\
&\qquad\le \int \Big[U\Big(\int y\,\gamma_x(dy)\Big) - U(x)\Big]\mu(dx) \\
&\qquad\le \Big(\int \Big|U'\Big(\min\Big\{x, \int y\,\gamma_x(dy)\Big\}\Big)\Big|^q \mu(dx)\Big)^{1/q} d_p^w(\mu, \nu),
\end{align*}
where we used Hölder's inequality in the last line and $q$ denotes the conjugate Hölder exponent of $p$ (that is, $1/p + 1/q = 1$). As $q(p - 1) = p$, the growth assumption on $U'$ implies that $|U'(\min\{x, y\})|^q \le c(1 + |x|^p + |y|^p)$ for some (new) constant $c$. Then, by Lemma 3.9, we have
\begin{align*}
\int \Big|U'\Big(\min\Big\{x, \int y\,\gamma_x(dy)\Big\}\Big)\Big|^q \mu(dx) &\le c\Big(1 + \int |x|^p\,\mu(dx) + \int \Big|\int y\,\gamma_x(dy)\Big|^p \mu(dx)\Big) \\
&\le c\Big(1 + \int |x|^p\,\mu(dx) + \int |y|^p\,\nu(dy)\Big) \\
&\le \tilde c\big(1 + \mathcal{AW}_p(\mathbb{Q}, \delta_0)^p + \mathcal{AW}_p(\mathbb{P}, \delta_0)^p\big) \le e
\end{align*}
for $e := \tilde c(1 + R^p + R^p)$. Exchanging the roles of $\mathbb{P}$ and $\mathbb{Q}$ and using Theorem 3.10 completes the proof.

4.5. The proof of Theorem 1.9.
In a first step, we claim that $v(\mathbb{P})$ is uniformly bounded over all $\mathbb{P}$ with $\mathcal{AW}_p(\mathbb{P}, \delta_0) \le R$. Indeed, using the growth assumption on $U$, the fact that $U$ is strictly increasing, and the BDG-inequality to control the $p$-th moment of $(H \bullet X)_T$, it follows that there exist $a, A \in \mathbb{R}$ such that
\[ \inf U < a \le \sup_{H \in \mathcal{H}^k} E_{\mathbb{P}}[U((H \bullet X)_T)] \le A < \sup U \tag{4.3} \]
for all $\mathbb{P}$ with $\mathcal{AW}_p(\mathbb{P}, \delta_0) \le R$. Now assume that there exists a sequence $\mathbb{P}^n$ with $\mathcal{AW}_p(\mathbb{P}^n, \delta_0) \le R$ but $v(\mathbb{P}^n) \to \infty$. Then, using the BDG-inequality once more, it follows that
\[ \sup_{H \in \mathcal{H}^k} E_{\mathbb{P}^n}[U(C - v(\mathbb{P}^n) + (H \bullet X)_T)] \to \inf U, \]
a contradiction to (4.3). The case $v(\mathbb{P}^n) \to -\infty$ is excluded analogously.

At this point, using the definition of $v(\mathbb{P})$, a twofold application of Theorem 1.8 yields
\[ \Big|\sup_{H \in \mathcal{H}^k} E_{\mathbb{Q}}[U(C - v(\mathbb{P}) + (H \bullet X)_T)] - \sup_{H \in \mathcal{H}^k} E_{\mathbb{Q}}[U((H \bullet X)_T)]\Big| \le K \cdot \mathcal{AW}_p(\mathbb{P}, \mathbb{Q}). \]
Indeed, while a direct application of the theorem would give a constant $K$ which depends on $v(\mathbb{P})$, an inspection of its proof shows that the constant $K$ depends only on the size of $v(\mathbb{P})$. By the first step this is bounded uniformly over $\mathbb{P}$ with $\mathcal{AW}_p(\mathbb{P}, \delta_0) \le R$.

Now let $\varepsilon > 0$ and $H \in \mathcal{H}^k$ be arbitrary, and set $Y := Y^H := C - v(\mathbb{P}) + (H \bullet X)_T$. Then, it follows that there is some constant $c > 0$ (depending only on $R \ge \mathcal{AW}_p(\mathbb{Q}, \delta_0)$ and $U$) such that
\[ E_{\mathbb{Q}}[U(Y + \varepsilon)] = E_{\mathbb{Q}}[U(Y)] + E_{\mathbb{Q}}\Big[\int_Y^{Y+\varepsilon} U'(z)\,dz\Big] \ge E_{\mathbb{Q}}[U(Y)] + \varepsilon c. \]
Indeed, this would follow directly if $Y$ were bounded by a fixed constant but readily extends to the present setting as $E_{\mathbb{Q}}[|Y|^p] \le C$ for some constant $C > 0$ independent of $H$ and $\mathbb{Q}$ as long as $\mathcal{AW}_p(\mathbb{Q}, \delta_0) \le R$.
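In a similar manner $E_{\mathbb{Q}}[U(Y - \varepsilon)] \le E_{\mathbb{Q}}[U(Y)] - \varepsilon c$.

Putting everything together reveals
\begin{align*}
\sup_{H \in \mathcal{H}^k} E_{\mathbb{Q}}[U(C - (v(\mathbb{P}) + \varepsilon) + (H \bullet X)_T)] &< \sup_{H \in \mathcal{H}^k} E_{\mathbb{Q}}[U((H \bullet X)_T)] \\
&< \sup_{H \in \mathcal{H}^k} E_{\mathbb{Q}}[U(C - (v(\mathbb{P}) - \varepsilon) + (H \bullet X)_T)]
\end{align*}
for some $\varepsilon < \tilde C\,\mathcal{AW}_p(\mathbb{P}, \mathbb{Q})$ (where $\tilde C$ is a new constant emerging from $K$ and $c$). Thus $|v(\mathbb{Q}) - v(\mathbb{P})| \le \varepsilon \le \tilde C\,\mathcal{AW}_p(\mathbb{P}, \mathbb{Q})$, which completes the proof.

4.6. Two generalizations.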
The following two results can be proved using almost the same arguments as used in the proofs of Theorem 1.6 and Theorem 1.8. In particular the proofs boil down to establishing convergence for image measures with respect to $d_p$ and give no new insight on adapted Wasserstein distances, so we shall skip them.

Proposition 4.2.
Let $\ell : \mathbb{R} \to \mathbb{R}_+$ be a convex and strictly increasing function and let $\delta > 0$. Assume that $p \ge 1$ is such that $\ell'(x) \le c(1 + |x|^{p-1})$ for some constant $c$. Then, for every Lipschitz continuous function $C : \Omega \to \mathbb{R}$, the function
\[ \mathbb{P} \mapsto \inf\big\{m \in \mathbb{R} : \exists H \in \mathcal{H}^k \text{ such that } E_{\mathbb{P}}[\ell(C(X) - (H(X) \bullet X)_T - m)] \le \delta\big\} \]
is continuous on $(\mathcal{SM}_p(\Omega), \mathcal{AW}_p)$.

Let $\rho$ be a law-invariant risk measure which we directly view as a functional from $\mathcal{P}_p(\mathbb{R})$ to the reals. For $\mathbb{P} \in \mathcal{SM}_p(\Omega)$ and a random variable $Z : \Omega \to \mathbb{R}$ (such that $Z(\mathbb{P}) \in \mathcal{P}_p(\mathbb{R})$) we write $\rho^{\mathbb{P}}(Z) = \rho(Z(\mathbb{P}))$. A typical example of a law-invariant risk measure which satisfies $\rho(\mu) - \rho(\nu) \le L\,d_1^w(\mu, \nu)$ for some constant $L$ depending on the $p$-th moment of $\mu$ and $\nu$ is the optimized certainty equivalent, introduced to the mathematical finance community in [19]. For a convex, increasing function $\ell : \mathbb{R} \to \mathbb{R}$ which is bounded from below and satisfies $\ell(x)/x \to \infty$ as $x \to \infty$, the optimized certainty equivalent is defined via
\[ \rho^{\mathbb{P}}(Z) := \inf_{m \in \mathbb{R}} \big(E_{\mathbb{P}}[\ell(Z - m)] + m\big) = \inf_{m \in \mathbb{R}} \Big(\int \ell(x - m)\,(Z(\mathbb{P}))(dx) + m\Big). \]
If $\ell'(x) \le c(1 + |x|^{p-1})$, then it follows that the infimum over $m$ can be taken in some compact set depending on the $p$-th moments. Due to cash additivity of $\rho$, the following proposition has the same interpretation as Theorem 1.6.

Proposition 4.3.
Assume that $\rho : \mathcal{P}_p(\mathbb{R}) \to \mathbb{R}$ satisfies $\rho(\mu) - \rho(\nu) \le L\,d_1^w(\mu, \nu)$ for some constant $L$ depending on the $p$-th moment of $\mu$ and $\nu$. Then, for every Lipschitz function $C : \Omega \to \mathbb{R}$, the mapping
\[ \mathbb{P} \mapsto \inf_{H \in \mathcal{H}^k} \rho^{\mathbb{P}}(C(X) - (H(X) \bullet X)_T) \]
is locally Lipschitz continuous on $(\mathcal{SM}_p(\Omega), \mathcal{AW}_p)$.

Finally, let us point out that (though not a convex risk measure) the Value-at-Risk (VaR) would be another natural candidate to study continuity. However, as VaR is not continuous w.r.t. weak convergence, already in a one period model continuity of
\[ \mathbb{P} \mapsto \inf\big\{m \in \mathbb{R} : \text{there is } H \in \mathcal{H}^k \text{ with } \mathrm{VaR}^{\mathbb{P}}(C(X) - m - (H(X) \bullet X)_T) \le 0\big\} \]
does not hold.

Final remarks
Remark 5.1 (Usual Wasserstein does not work I) . We note that convergence inthe usual Wasserstein distance is not sufficient to obtain continuity in any of theproblems we study in this paper. Consider a two period market with P n = 14 (cid:16) δ (1 /n, + δ (1 /n, + δ ( − /n, + δ ( − /n, − (cid:17) , P = 14 (cid:16) δ (0 , + 2 δ (0 , + δ (0 , − (cid:17) . Then P and each P n satisfy the classical no-arbitrage condition, unlike the situationdescribed in Figure 1. While P n converges to P in usual Wasserstein distance, onecan verify that convergence in nested distance does not hold. For example in utilitymaximization of the trivial claim C = 0 , we have sup H ∈H k E P [ U ( C ( X ) + ( H ( X ) • X ) T )] = U (0) by Jensen’s inequality (as X is a martingale under P ). For P n takingthe strategy H ∗ consisting of H ∗ = 0 and H ∗ ( x ) = k sign( x ) , one gets sup H ∈H k E P n [ U ( C ( X ) + ( H ( X ) • X ) T )] ≥ E P n [ U ( C ( X ) + ( H ∗ ( X ) • X ) T )] → U ( k ) , showing the lack of continuity. Remark 5.2 (Usual Wasserstein does not work II) . As explained in the introduc-tion, the objective in Theorem 1.5 can be seen as a relaxed version of the superhedg-ing problem. The reason to consider this relaxation is not a technical simplificationbut necessary to to obtain continuity without further assumptions. Indeed, the prob-lem of superhedging inf n m ∈ R : there is H ∈ H k such that m + ( H • X ) T ≥ C ( X ) , P -almost surely (cid:9) is not continuous in P w.r.t. adapted distance for any k ∈ [0 , ∞ ] . In fact, thisalready happens in one period, where adapted and the usual Wasserstein distancescoincide. Consider a sequence of measures P n with full support which convergeweakly to a measure P . Then the superhedging price w.r.t. P n equals the concaveenvelope of C , while the superhedging price w.r.t. P equals the concave envelope of C restricted to the support of P . For a recent paper on this problem in one period,see the work of Ob l´oj and Wiesel [49] . 
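The discontinuity of the superhedging price described in Remark 5.2 can be made concrete in a one-period toy model. The sketch below rests on assumptions that are not taken from the paper: the asset starts at $0$, the measures have finite support (instead of full support), the claim is $C(x) = |x|$, and the minimization over $h \in [-k, k]$ is a plain grid search. The function names are hypothetical. Since superhedging is required almost surely, the price only depends on the support: it is the smallest $m$ with $m + hx \ge C(x)$ at every atom $x$.

```python
def superhedging_price(support, claim, k, n_grid=2001):
    """Smallest m with m + h*x >= claim(x) at every atom x, for some |h| <= k.

    For fixed h the smallest admissible m is max_x (claim(x) - h*x);
    the price is the minimum of this quantity over a grid of h values.
    """
    best = float("inf")
    for i in range(n_grid):
        h = -k + 2.0 * k * i / (n_grid - 1)
        needed = max(claim(x) - h * x for x in support)
        best = min(best, needed)
    return best


def wasserstein1_to_dirac0(atoms):
    """W1 distance between a finitely supported law and the Dirac mass at 0.

    `atoms` maps each atom to its probability weight.
    """
    return sum(weight * abs(x) for x, weight in atoms.items())


if __name__ == "__main__":
    for n in (10, 100, 1000):
        # P_n puts mass 1 - 1/n at 0 and mass 1/(2n) at each of +1 and -1.
        atoms = {0.0: 1.0 - 1.0 / n, 1.0: 0.5 / n, -1.0: 0.5 / n}
        print(n, wasserstein1_to_dirac0(atoms),
              superhedging_price(list(atoms), abs, k=1.0))
    # Limit measure: Dirac mass at 0.
    print("limit", superhedging_price([0.0], abs, k=1.0))
```

In this toy example the measures converge to $\delta_0$ in Wasserstein distance, while the superhedging price stays at $1$ along the sequence and equals $0$ in the limit.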
Remark 5.3 (Uniformly bounded strategies are necessary). As in Remark 5.2, the restriction to trading strategies in $\mathcal{H}_k$ (i.e. uniformly bounded strategies) is not a mere technical simplification either. For example, in a one-period framework, the measures $P_\varepsilon := (1 - \varepsilon)\delta_{(0,\varepsilon)} + \varepsilon\delta_{(0,-\varepsilon)}$ converge to $P := \delta_{(0,0)}$ in every (adapted) Wasserstein distance. However, for small $\varepsilon > 0$ we have
$$\inf_{H \in \mathcal{H}_\infty} \mathrm{AVaR}^{P_\varepsilon}_\alpha\big((H \bullet X)_T\big) = -\infty \quad \text{while} \quad \inf_{H \in \mathcal{H}_\infty} \mathrm{AVaR}^{P}_\alpha\big((H \bullet X)_T\big) = 0,$$
where $\mathcal{H}_\infty := \bigcup_{k \in \mathbb{N}} \mathcal{H}_k$ is the set of all bounded trading strategies.

Acknowledgements. All authors are grateful to the anonymous referees whose insightful comments had a significant impact on this article. J. Backhoff gratefully acknowledges financial support by the FWF through grant P30750 and by the Vienna University of Technology. D. Bartl has been funded by the Austrian Science Fund (FWF) under Project P28661. M. Beiglböck and M. Eder gratefully acknowledge financial support by the FWF through grant Y782.
References
[1] B. Acciaio, J. Backhoff-Veraguas, and A. Zalashko. Causal optimal transport and its links to enlargement of filtrations and continuous-time stochastic optimization. Forthcoming at Stoch. Processes and their Applications, 2016.
[2] D. J. Aldous. Weak convergence and general theory of processes. Unpublished monograph; Department of Statistics, University of California, Berkeley, CA 94720, July 1981.
[3] A. Alfonsi, J. Corbetta, and B. Jourdain. Sampling of one-dimensional probability measures in the convex order and computation of robust option price bounds. International Journal of Theoretical and Applied Finance, 22(03):1950002, 2019.
[4] J.-J. Alibert, G. Bouchitte, and T. Champion. A new class of cost for optimal transport planning. hal-preprint, 2018.
[5] M. Avellaneda, A. Levy, and A. Paràs. Pricing and hedging derivative securities in markets with uncertain volatilities. Appl. Math. Finance, 2(2):73–88, 1995.
[6] J. Backhoff-Veraguas, D. Bartl, M. Beiglböck, and M. Eder. All Adapted Topologies are Equal. arXiv e-prints, page arXiv:1905.00368, May 2019.
[7] J. Backhoff-Veraguas, D. Bartl, M. Beiglböck, and J. Wiesel. Estimating processes in adapted Wasserstein distance. arXiv e-prints, 2020.
[8] J. Backhoff-Veraguas, M. Beiglböck, M. Eder, and A. Pichler. Fundamental properties of process distances. ArXiv e-prints, 2017.
[9] J. Backhoff-Veraguas, M. Beiglböck, M. Huesmann, and S. Källblad. Martingale Benamou–Brenier: a probabilistic perspective. To appear in Annals of Probability, Aug. 2018.
[10] J. Backhoff-Veraguas, M. Beiglböck, Y. Lin, and A. Zalashko. Causal transport in discrete time and applications. SIAM Journal on Optimization, 27(4):2528–2562, 2017.
[11] J. Backhoff-Veraguas, M. Beiglböck, and G. Pammer. Existence, duality, and cyclical monotonicity for weak transport costs. Calculus of Variations and Partial Differential Equations, 58(6):203, 2019.
[12] J. Backhoff-Veraguas and G. Pammer. Stability of martingale optimal transport and weak optimal transport. arXiv e-prints, page arXiv:1904.04171, Apr 2019.
[13] D. Bartl, S. Drapeau, J. Obłój, and J. Wiesel. Private communication.
[14] D. Becherer and K. Kentia. Good deal hedging and valuation under combined uncertainty about drift and volatility. Probab. Uncertain. Quant. Risk, 2:Paper No. 13, 40, 2017.
[15] M. Beiglböck, P. Henry-Labordère, and F. Penkner. Model-independent bounds for option prices: A mass transport approach. Finance Stoch., 17(3):477–501, 2013.
[16] M. Beiglböck and N. Juillet. On a problem of optimal transport under marginal martingale constraints. Ann. Probab., 44(1):42–106, 2016.
[17] M. Beiglböck and P. Siorpaes. Pathwise versions of the Burkholder-Davis-Gundy inequality. Bernoulli, 21(1):360–373, 2015.
[18] M. Beiglboeck, A. Cox, and M. Huesmann. The geometry of multi-marginal Skorokhod Embedding. PTRF, to appear, page arXiv:1705.09505, May 2019.
[19] A. Ben-Tal and M. Teboulle. An old-new concept of convex risk measures: The optimized certainty equivalent. Mathematical Finance, 17(3):449–476, 2007.
[20] J. Bion-Nadal and D. Talay. On a Wasserstein-type distance between solutions to stochastic differential equations. Ann. Appl. Probab., 29(3):1609–1639, 2019.
[21] B. Bouchard and M. Nutz. Arbitrage and duality in nondominated discrete-time models. The Annals of Applied Probability, 25(2):823–859, 2015.
[22] D. L. Burkholder. Explorations in martingale theory and its applications. In École d’Été de Probabilités de Saint-Flour XIX—1989, volume 1464 of Lecture Notes in Math., pages 1–66. Springer, Berlin, 1991.
[23] D. L. Burkholder. The best constant in the Davis inequality for the expectation of the martingale square function. Trans. Amer. Math. Soc., 354(1):91–105 (electronic), 2002.
[24] L. Campi, I. Laachir, and C. Martini. Change of numeraire in the two-marginals martingale transport problem. Finance Stoch., 21(2):471–486, June 2017.
[25] R. Cont. Model uncertainty and its impact on the pricing of derivative instruments. Mathematical Finance, 16(3):519–547, 2006.
[26] Y. Dolinsky and H. M. Soner. Martingale optimal transport and robust hedging in continuous time. Probab. Theory Relat. Fields, 160(1-2):391–427, 2014.
[27] M. Eder. Compactness in Adapted Weak Topologies. arXiv e-prints, page arXiv:1905.00856, May 2019.
[28] N. El Karoui, M. Jeanblanc, and S. Shreve. Robustness of the Black and Scholes formula. Math. Finance, 8(2):93–126, 1998.
[29] A. Galichon, P. Henry-Labordère, and N. Touzi. A stochastic control approach to no-arbitrage bounds given marginals, with an application to lookback options. Ann. Appl. Probab., 24(1):312–336, 2014.
[30] M. Glanzer, G. C. Pflug, and A. Pichler. Incorporating statistical model error into the calculation of acceptability prices of contingent claims. Mathematical Programming, 174(1-2):499–524, 2019.
[31] N. Gozlan, C. Roberto, P.-M. Samson, Y. Shu, and P. Tetali. Characterization of a class of weak transport-entropy inequalities on the line. Ann. Inst. Henri Poincaré Probab. Stat., 54(3):1667–1693, 2018.
[32] N. Gozlan, C. Roberto, P.-M. Samson, and P. Tetali. Kantorovich duality for general transport costs and applications. J. Funct. Anal., 273(11):3327–3405, 2017.
[33] M. F. Hellwig. Sequential decisions under uncertainty and the maximum theorem. J. Math. Econom., 25(4):443–464, 1996.
[34] S. Herrmann and J. Muhle-Karbe. Model uncertainty, recalibration, and the emergence of delta–vega hedging. Finance and Stochastics, 21(4):873–930, Oct 2017.
[35] S. Herrmann, J. Muhle-Karbe, and F. T. Seifried. Hedging with small uncertainty aversion. Finance and Stochastics, 21(1):1–64, Jan 2017.
[36] D. Hobson. Robust hedging of the lookback option. Finance and Stochastics, 2:329–347, 1998.
[37] D. Hobson. The Skorokhod embedding problem and model-independent bounds for option prices. In Paris-Princeton Lectures on Mathematical Finance 2010, volume 2003 of Lecture Notes in Math., pages 267–318. Springer, Berlin, 2011.
[38] D. Hobson and A. Neuberger. Robust bounds for forward start options. Math. Finance, 22(1):31–56, 2012.
[39] D. G. Hobson. Volatility misspecification, option pricing and superreplication via coupling. Ann. Appl. Probab., 8(1):193–205, 1998.
[40] I. Karatzas and S. Shreve. Brownian motion and stochastic calculus, volume 113. Springer Science & Business Media, 2012.
[41] C. Kardaras and G. Žitković. Stability of the utility maximization problem with random endowment in incomplete markets. Math. Finance, 21(2):313–333, 2011.
[42] D. Lacker. Dense sets of joint distributions appearing in filtration enlargements, stochastic control, and causal optimal transport. ArXiv e-prints, 2018.
[43] K. Larsen. Continuity of utility-maximization with respect to preferences. Math. Finance, 19(2):237–250, 2009.
[44] K. Larsen and G. Žitković. Stability of utility-maximization in incomplete markets. Stochastic Process. Appl., 117(11):1642–1662, 2007.
[45] R. Lassalle. Causal transference plans and their Monge-Kantorovich problems. Stochastic Analysis and Applications, 36(3):452–484, 2018.
[46] T. J. Lyons. Uncertain volatility and the risk-free synthesis of derivatives. Applied Mathematical Finance, 2(2):117–133, 1995.
[47] M. Mocha and N. Westray. The stability of the constrained utility maximization problem: a BSDE approach. SIAM J. Financial Math., 4(1):117–150, 2013.
[48] J. Obłój. The Skorokhod embedding problem and its offspring. Probab. Surv., 1:321–390, 2004.
[49] J. Obłój and J. Wiesel. Statistical estimation of superhedging prices. ArXiv e-prints, 2018.
[50] A. Osekowski. Sharp maximal inequalities for the martingale square bracket. Stochastics: An International Journal of Probability and Stochastics Processes, 82(06):589–605, 2010.
[51] A. Osekowski. Sharp martingale and semimartingale inequalities, volume 72 of Instytut Matematyczny Polskiej Akademii Nauk. Monografie Matematyczne (New Series) [Mathematics Institute of the Polish Academy of Sciences. Mathematical Monographs (New Series)]. Birkhäuser/Springer Basel AG, Basel, 2012.
[52] G. C. Pflug. Version-independence and nested distributions in multistage stochastic optimization. SIAM Journal on Optimization, 20(3):1406–1420, 2009.
[53] G. C. Pflug and A. Pichler. A distance for multistage stochastic optimization models. SIAM J. Optim., 22(1):1–23, 2012.
[54] G. C. Pflug and A. Pichler. Multistage stochastic optimization. Springer Series in Operations Research and Financial Engineering. Springer, Cham, 2014.
[55] G. C. Pflug and A. Pichler. From empirical observations to tree models for stochastic optimization: convergence properties. SIAM J. Optim., 26(3):1715–1740, 2016.
[56] A. Pratelli. On the equality between Monge’s infimum and Kantorovich’s minimum in optimal mass transportation. Ann. Inst. H. Poincaré Probab. Statist., 43(1):1–13, 2007.
[57] J.-L. Prigent. Weak convergence of financial markets. In Weak Convergence of Financial Markets, pages 129–265. Springer, 2003.
[58] L. Rüschendorf. The Wasserstein distance and approximation theorems. Z. Wahrsch. Verw. Gebiete, 70(1):117–129, 1985.
[59] W. Schachermayer and F. Stebegg. The Sharp Constant for the Burkholder-Davis-Gundy Inequality and Non-Smooth Pasting. Bernoulli, to appear, July 2017.
[60] K. Weston. Stability of utility maximization in nonequivalent markets. Finance and Stochastics, 20(2):511–541, 2016.
[61] J. Wiesel. Continuity of the martingale optimal transport problem on the real line. arXiv e-prints, page arXiv:1905.04574, May 2019.
[62] T. Yamada and S. Watanabe. On the uniqueness of solutions of stochastic differential equations.