Adapted Wasserstein Distances and Stability in Mathematical Finance

Abstract

Assume that an agent models a financial asset through a measure Q with the goal to price / hedge some derivative or optimize some expected utility. Even if the model Q is chosen in the most skilful and sophisticated way, she is left with the possibility that Q does not provide an "exact" description of reality. This leads us to the following question: will the hedge still be somewhat meaningful for models in the proximity of Q? If we measure proximity with the usual Wasserstein distance (say), the answer is NO. Models which are similar w.r.t. Wasserstein distance may provide dramatically different information on which to base a hedging strategy. Remarkably, this can be overcome by considering a suitable "adapted" version of the Wasserstein distance which takes the temporal structure of pricing models into account. This adapted Wasserstein distance is most closely related to the nested distance as pioneered by Pflug and Pichler [52, 53, 54]. It allows us to establish Lipschitz properties of hedging strategies for semimartingale models in discrete and continuous time. Notably, these abstract results are sharp already for Brownian motion and European call options.


J. BACKHOFF-VERAGUAS, D. BARTL, M. BEIGLBÖCK, M. EDER

Keywords: Hedging, utility maximization, optimal transport, causal optimal transport, Wasserstein distance, sensitivity, stability.

AMS subject classiﬁcations (2010)

1. Introduction

1.1. Outline.

Assume that a reference measure $\mathbb{P}$ is used to model the evolution of a financial asset $X$ with the purpose of hedging a financial claim or maximizing some expected utility. We do not expect that the model $\mathbb{P}$ captures reality in an absolutely accurate way. However, supposing that $\mathbb{P}$ is close enough to reality (described by a probability $\mathbb{Q}$), we would still hope that a strategy which is developed for $\mathbb{P}$ leads to reasonable results.

A main goal of this paper is to establish this intuitive idea rigorously, based on a new notion of adapted Wasserstein distance $\mathcal{AW}_p$ between semimartingale measures. To fix ideas, we provide a first example of the results we are after.

Theorem 1.1. Let $\mathbb{P}, \mathbb{Q}$ be continuous semimartingale models for the asset price process $X$, and assume that $C(X)$ denotes an $L$-Lipschitz payoff of a (path-dependent) derivative $C$. Assume that a predictable trading strategy $H = (H_t)_t$, $|H| \le k$, and an initial endowment $m \in \mathbb{R}$ constitute a $\mathbb{P}$-superhedge of $C(X)$, i.e.

$$C(X) \le m + (H \bullet X)_T, \quad \mathbb{P}\text{-almost surely}.$$

Then there is a predictable $G$ s.t. $m, G$ constitute an "almost" $\mathbb{Q}$-superhedge:

$$E_{\mathbb{Q}}\big[(C(X) - m - (G \bullet X)_T)^+\big] \le 6(k + L) \cdot \mathcal{AW}_1(\mathbb{P}, \mathbb{Q}). \tag{1.1}$$

While the adapted Wasserstein distance will be defined in abstract terms (see (1.4)), it relates directly to the model parameters for 'simple' models. In particular, if $\mathbb{P}, \mathbb{Q}$ are Brownian models with different volatilities, then the distance between these models is just the difference of these volatilities. Moreover, the bound in (1.1) (as well as the further Lipschitz bounds given below) is already sharp in such a simple setting and for $C$ a European call option.

Below we will provide a number of results with a similar flavour to Theorem 1.1. E.g. we will provide versions where the hedging error is controlled in terms of risk measures, and we will show that a Lipschitz bound of the type (1.1) applies (with bigger constants) if the same trading strategy $H$ is applied in the model $\mathbb{P}$ as well as in the model $\mathbb{Q}$. Importantly, we establish that comparable results of Lipschitz continuity apply to utility maximization and utility indifference pricing.

We emphasize that familiar concepts such as the Lévy-Prokhorov metric or the usual Wasserstein distance do not appear suitable to derive results comparable to Theorem 1.1. E.g. in the vicinity of financially meaningful models there are models with arbitrarily high arbitrage even for bounded strategies; similar phenomena appear w.r.t. completeness / incompleteness. Instead we introduce an adapted Wasserstein distance $\mathcal{AW}_p$ which takes the temporal structure of semimartingale models into account. These distances are conceptually closely related to the nested distance as pioneered by Pflug and Pichler [53, 54, 55]; see [1, 30, 20] for first articles which link such a type of distance to finance. We describe these contributions more closely in Section 2 below.

1.2. Notation and adapted Wasserstein distances.

Throughout we let

$$\Omega := \mathbb{R}^T \quad \text{or} \quad \Omega := C([0, T]).$$

The first setting shall be referred to as the discrete time case, and the second as the continuous time case. In the first case we denote by $I = \{1, \dots, T\}$ the time-index set, and in the second $I = [0, T]$. Throughout the article we will provide definitions and results without specifying which of the two cases we are referring to: this means that the definitions / results apply in both cases. Only occasionally will we consider one case specifically, and in this situation we will state this explicitly.

We interpret $\Omega$ as the set of all possible evolutions (in time) of the 1-dimensional asset price. Importantly, mutatis mutandis, all our results (except Propositions 3.3, 3.6 and Example 3.4) remain true for multi-dimensional asset price processes (corresponding to $\Omega = (\mathbb{R}^d)^T$ / $\Omega = C([0, T], \mathbb{R}^d)$). We chose to go for the 1-dimensional version to simplify notation.

The mappings $X, Y : \Omega \to \Omega$ denote the canonical processes (i.e. the identity map), and we make the convention that on $\Omega \times \Omega$ the process $X$ denotes the first coordinate and $Y$ the second one. The spaces $\Omega$ and $\Omega \times \Omega$ are endowed with the maximum-norm and the corresponding Borel $\sigma$-field. In continuous time, the space $\Omega$ is endowed with the right-continuous filtration generated by $X$; in discrete time we use the plain filtration generated by $X$. In any case we denote this filtration by $\mathcal{F} = (\mathcal{F}_t)_t$ and endow $\Omega \times \Omega$ with the product filtration $(\mathcal{F}_t \otimes \mathcal{F}_t)_t$. Given a $\sigma$-algebra $\mathcal{G}$ and a probability $\mathbb{P}$ on $\mathcal{G}$ we write $\mathcal{G}^{\mathbb{P}}$ for the $\mathbb{P}$-completion of $\mathcal{G}$. The set $\mathrm{Cpl}(\mathbb{P}, \mathbb{Q})$ of couplings between probability measures $\mathbb{P}, \mathbb{Q}$ consists of all probability measures $\pi$ on $\Omega \times \Omega$ such that $X(\pi) = \mathbb{P}$ and $Y(\pi) = \mathbb{Q}$. A Monge coupling is a coupling that is of the form $\pi = (\mathrm{Id}, T)(\mathbb{P})$ for some Borel mapping $T : \Omega \to \Omega$ that transports $\mathbb{P}$ to $\mathbb{Q}$, i.e. satisfies $T(\mathbb{P}) = \mathbb{Q}$. Given a metric $d$ on $\Omega$ and $p \ge 1$, the $p$-Wasserstein distance of $\mathbb{P}, \mathbb{Q}$ is

$$\mathcal{W}_p(\mathbb{P}, \mathbb{Q}) = \inf \Big\{ E_\pi\big[d(X, Y)^p\big]^{1/p} : \pi \in \mathrm{Cpl}(\mathbb{P}, \mathbb{Q}) \Big\}. \tag{1.2}$$

In many cases of practical interest the infimum in (1.2) remains unchanged if one minimizes only over Monge couplings, cf. [56].

Footnote. Indeed the arguments in the discrete and the continuous case use the same set of ideas, but the presentation is significantly less technical in the discrete case, which was an important reason to include the discrete case in the paper.
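To make definition (1.2) concrete, here is a minimal Python sketch that is not part of the paper. It assumes two uniform measures supported on the same number of discrete-time paths; in that special case the infimum in (1.2) is attained at a permutation of the atoms, i.e. at a Monge coupling, so it can be found by brute force. The function names and the sample paths are illustrative assumptions, and $d$ is taken to be the maximum-norm distance.

```python
from itertools import permutations


def sup_dist(x, y):
    """Maximum-norm distance between two discrete-time paths (tuples)."""
    return max(abs(a - b) for a, b in zip(x, y))


def wasserstein_uniform(xs, ys, p=1):
    """p-Wasserstein distance between the uniform measures on xs and on ys.

    Assumes len(xs) == len(ys); brute force over all permutations,
    so only suitable for a handful of atoms.
    """
    n = len(xs)
    assert n == len(ys)
    best = min(
        sum(sup_dist(xs[i], ys[s[i]]) ** p for i in range(n)) / n
        for s in permutations(range(n))
    )
    return best ** (1.0 / p)


# Hypothetical two-period paths (first coordinate = time 1, second = time 2).
xs = [(0.0, 1.0), (0.0, -1.0)]
ys = [(0.1, 1.0), (-0.1, -1.0)]
print(wasserstein_uniform(xs, ys, p=1))
```

The brute-force search is exponential in the number of atoms; it is meant only to illustrate the definition, not as a practical solver.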


Before defining the adapted Wasserstein distance between measures $\mathbb{P}$ and $\mathbb{Q}$ on $\Omega$, let us hint why distances related to weak convergence are not suitable for the results we have in mind. Assume for example that we are interested in a utility maximization problem in two periods and that Figure 1 describes the laws $\mathbb{P}, \mathbb{Q}$ of two traded assets. Clearly they are very close in Wasserstein distance, as follows from considering the obvious Monge coupling induced by $T : \Omega \to \Omega$, $T(\mathbb{P}) = \mathbb{Q}$, depicted in Figure 1. At the same time, the outcome of utility maximization is certainly very different. Similarly, $\mathbb{P}$ is a martingale measure while $\mathbb{Q}$ allows for arbitrage. The clear reason for that is the different structure of information available at time 1.

Figure 1. Map $T$ sends the blue path on the left to the blue path on the right, and similarly for the red paths. The stochastic processes depicted are close in Wasserstein sense, but very different for utility maximization.

To exhibit why the Wasserstein distance does not reflect this different structure of information, let us review the transport condition $T(\mathbb{P}) = \mathbb{Q}$. We rephrase it as

$$\big(T_1(X_1, X_2), T_2(X_1, X_2)\big) \sim (Y_1, Y_2). \tag{1.3}$$

While this condition is of course perfectly natural in mass transport, (1.3) almost seems like cheating when viewed from a probabilistic perspective: the map $T_1$ should not be allowed to consider the future value $X_2$ in order to determine $Y_1$. To define an adapted version of the Wasserstein distance, the 'process' $(T_i)_{i=1,2}$ should be taken to be adapted in order to account for the different information structures of $\mathbb{P}$ and $\mathbb{Q}$.

Naturally our official definition of adapted Wasserstein distances will not refer to adapted Monge transports but rather to couplings which are 'adapted' in an appropriate sense. Following Lassalle [45], we call such couplings (bi-)causal. Since the definition below may appear a bit technical at first glance, the following may be reassuring: in the discrete time setting and for absolutely continuous measures $\mathbb{P}$, the weak closure of the set of adapted Monge couplings, i.e. $\pi = (\mathrm{Id}, T)(\mathbb{P})$ for $T$ adapted, is precisely the set of all causal couplings, see [42].

Definition 1.2 ((bi-)causal couplings). For a coupling $\pi$ of $\mathbb{P}, \mathbb{Q} \in \mathcal{P}(\Omega)$ denote by $\pi(d\omega, d\eta) = \mathbb{P}(d\omega)\, \pi_\omega(d\eta)$ a regular disintegration w.r.t. $\mathbb{P}$. The set $\mathrm{Cpl}_C(\mathbb{P}, \mathbb{Q})$ of causal couplings consists of all $\pi \in \mathrm{Cpl}(\mathbb{P}, \mathbb{Q})$ such that for all $t \in I$ and $A \in \mathcal{F}_t$

$$\omega \mapsto \pi_\omega(A) \ \text{ is } \ \mathcal{F}_t^{\mathbb{P}}\text{-measurable}.$$

The set of all bi-causal couplings $\mathrm{Cpl}_{BC}(\mathbb{P}, \mathbb{Q})$ consists of all $\pi \in \mathrm{Cpl}_C(\mathbb{P}, \mathbb{Q})$ such that also $S(\pi) \in \mathrm{Cpl}_C(\mathbb{Q}, \mathbb{P})$, where $S : \Omega \times \Omega \to \Omega \times \Omega$, $S(\omega, \eta) := (\eta, \omega)$.

In discrete time, a coupling $\pi$ is causal if and only if

$$\pi\big((Y_1, \dots, Y_t) \in A \,\big|\, X\big) = \pi\big((Y_1, \dots, Y_t) \in A \,\big|\, X_1, \dots, X_t\big), \quad \mathbb{P}\text{-a.s.}$$

for every $t$ and Borel set $A \subseteq \mathbb{R}^t$; that is, at time $t$, given the past $(X_1, \dots, X_t)$ of $X$, the distribution of $Y_t$ does not depend on the future $(X_{t+1}, \dots, X_N)$ of $X$. Replacing couplings by bi-causal couplings in (1.2) one arrives at the nested distance as introduced by Pflug and Pichler [52, 53]. Since our goal is to compare also semimartingale models in continuous time, we will work with an adapted Wasserstein distance that is defined slightly differently. (Notably, it is straightforward that the two distances are equivalent for probabilities on $\mathbb{R}^N$. We will elaborate in Section 3.3 below why the definition in (1.4) is more appropriate for our purposes even in discrete time.)

In continuous time, we denote by $\mathcal{SM}(\Omega)$ the set of all probabilities $\mathbb{P}$ on (the Borel $\sigma$-field of) $\Omega$ under which the canonical process $X$ is a continuous semimartingale. In discrete time, $\mathcal{SM}(\Omega)$ denotes the set of all Borel probabilities $\mathbb{P}$ on $\Omega$ under which $X$ is integrable. In either case we can uniquely decompose $X = M + A$, with $A$ a finite variation predictable process started at zero, and $M$ a local martingale. Indeed, in the first case $X$ is a special semimartingale and in fact $M$ and $A$ are continuous too, and in the second case this is the Doob decomposition of an integrable adapted discrete-time process. For $p \in [1, \infty)$ we denote by $\mathcal{SM}_p(\Omega)$ the subset of $\mathcal{SM}(\Omega)$ for which

$$E_{\mathbb{P}}\big[[M]_T^{p/2} + |A|^p\big] < \infty,$$

where $[\cdot]$ is the quadratic variation and $|\cdot|$ the first variation norm. Note also that by the BDG inequality $E_{\mathbb{P}}[\sup_{s \le T} |M_s|] < \infty$ for $\mathbb{P} \in \mathcal{SM}_p(\Omega)$, hence $M$ is a true martingale.
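The discrete-time characterisation of causality can be checked mechanically for finitely supported couplings. The following Python sketch is an illustration and not taken from the paper: the two-period paths are hypothetical, chosen in the spirit of the two-period example discussed above (a martingale P whose time-1 value is deterministic, and a Q whose time-1 value already reveals the time-2 value), and all function names are assumptions. In this toy case the obvious Monge coupling fails the causality test, while the product coupling passes it in both directions. Since X_1 is deterministic under P, any causal coupling must make Y independent of X here, so the product coupling is the only bi-causal one and its cost is the value obtained when (1.2) is restricted to bi-causal couplings.

```python
from collections import defaultdict


def conditional_prefix_law(coupling, x_path, t):
    """Law of (Y_1, ..., Y_t) given X = x_path, for coupling = {(x, y): mass}."""
    mass, total = defaultdict(float), 0.0
    for (x, y), w in coupling.items():
        if x == x_path:
            mass[y[:t]] += w
            total += w
    return {key: v / total for key, v in mass.items()}


def is_causal(coupling, tol=1e-12):
    """Check that the law of (Y_1..Y_t) given X depends only on (X_1..X_t)."""
    x_paths = {x for (x, _), w in coupling.items() if w > 0}
    horizon = len(next(iter(x_paths)))
    for t in range(1, horizon + 1):
        laws_by_prefix = defaultdict(list)
        for x in x_paths:
            laws_by_prefix[x[:t]].append(conditional_prefix_law(coupling, x, t))
        for laws in laws_by_prefix.values():
            ref = laws[0]
            for law in laws[1:]:
                keys = set(ref) | set(law)
                if any(abs(ref.get(k, 0.0) - law.get(k, 0.0)) > tol for k in keys):
                    return False
    return True


def is_bicausal(coupling):
    """Causal in both directions: swap the roles of X and Y for the second check."""
    swapped = {(y, x): w for (x, y), w in coupling.items()}
    return is_causal(coupling) and is_causal(swapped)


def cost(coupling):
    """Expected maximum-norm distance between X and Y under the coupling."""
    return sum(w * max(abs(a - b) for a, b in zip(x, y))
               for (x, y), w in coupling.items())


eps = 0.01  # hypothetical size of the time-1 perturbation
p_paths = [(0.0, 1.0), (0.0, -1.0)]    # P: nothing is revealed at time 1
q_paths = [(eps, 1.0), (-eps, -1.0)]   # Q: time 1 reveals the final value

monge = {(p_paths[0], q_paths[0]): 0.5, (p_paths[1], q_paths[1]): 0.5}
independent = {(x, y): 0.25 for x in p_paths for y in q_paths}

print(is_causal(monge), cost(monge))
print(is_bicausal(independent), cost(independent))
```

Under these assumptions the Monge coupling has cost eps but is not causal, whereas the product coupling is bi-causal with cost 1 + eps/2, which does not vanish as eps tends to zero.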
Definition 1.3 (Adapted Wasserstein distance). For $\mathbb{P}, \mathbb{Q} \in \mathcal{SM}_p(\Omega)$, $p \ge 1$, set

$$\mathcal{AW}_p(\mathbb{P}, \mathbb{Q}) := \inf \Big\{ E_\pi\big[[M^X - M^Y]_T^{p/2} + |A^X - A^Y|^p\big]^{1/p} : \pi \in \mathrm{Cpl}_{BC}(\mathbb{P}, \mathbb{Q}) \Big\}, \tag{1.4}$$

where $X = M^X + A^X$, $Y = M^Y + A^Y$ denote the semimartingale decompositions of $X$ and $Y$, respectively.

It is shown in Lemma 3.1 that $\mathcal{AW}_p$ is well-defined (i.e. that $X - Y$ is a semimartingale under every bi-causal coupling) and in Lemma 3.2 that $\mathcal{AW}_p$ in fact defines a metric.

Remark 1.4. In the continuous time setup, the adapted Wasserstein distance can also be computed through

$$\mathcal{AW}_p(\mathbb{P}, \mathbb{Q}) = \inf \Big\{ E_\pi\big[[X - Y]_T^{p/2} + \mathrm{MV}_T[X - Y]^p\big]^{1/p} : \pi \in \mathrm{Cpl}_{BC}(\mathbb{P}, \mathbb{Q}) \Big\}.$$

Here $\mathrm{MV}$ denotes the mean variation, i.e. $\mathrm{MV}_T[Z] = \sup_\Delta \sum_{t_j \in \Delta} \big| E[Z_{t_{j+1}} - Z_{t_j} \,|\, \mathcal{F}_{t_j}] \big|$, where the supremum is taken over all finite partitions $\Delta$ of $[0, T]$.

In Section 3.2 below we will give explicit formulae for the adapted Wasserstein distance in the case of semimartingale measures described by simple SDEs.

1.3. Stability of Superhedging.

For the rest of this article, fix some $k \in \mathbb{R}_+$ and let $\mathcal{H}^k$ be the set of all predictable processes $H : \Omega \times I \to [-k, k]$. For every $p \ge 1$, write $b_p$ for the 'upper' Burkholder-Davis-Gundy (BDG) constant, cf. Remark 3.12 below. In particular it is known that $b_1 \le 6$ and $b_2 = 2$.

Our first main result concerns the stability of superhedging and constitutes a stronger version of Theorem 1.1 stated above.

Theorem 1.5. Let $\mathbb{P}, \mathbb{Q} \in \mathcal{SM}(\Omega)$, $H \in \mathcal{H}^k$ and let $C : \Omega \to \mathbb{R}$ be Lipschitz with constant $L$. Then the hedging error under $\mathbb{Q}$ is bounded by the distance of $\mathbb{P}$ and $\mathbb{Q}$ plus the hedging error under $\mathbb{P}$ in the following sense: there exists $G \in \mathcal{H}^k$ such that

$$E_{\mathbb{Q}}\big[(C - m - (G \bullet X)_T)^+\big] \le E_{\mathbb{P}}\big[(C - m - (H \bullet X)_T)^+\big] + b_1 (k + L)\, \mathcal{AW}_1(\mathbb{P}, \mathbb{Q}). \tag{WHI}$$

Assume in addition that $H_t : \Omega \to \mathbb{R}$ is Lipschitz with constant $\tilde{L}$ for every $t \in I$. Then we can take $G = H$ and obtain

$$E_{\mathbb{Q}}\big[(C - m - (H \bullet X)_T)^+\big] \le E_{\mathbb{P}}\big[(C - m - (H \bullet X)_T)^+\big] + b_1 (k + L)\, \mathcal{AW}_1(\mathbb{P}, \mathbb{Q}) + \beta\, \mathcal{AW}_2(\mathbb{P}, \mathbb{Q}), \tag{SHI}$$

where $\beta := 2 \sqrt{b_2}\, \tilde{L}\, \min\{\mathcal{AW}_2(\mathbb{P}, \delta_0), \mathcal{AW}_2(\mathbb{Q}, \delta_0)\}$.

Importantly, it is impossible to transfer a superhedge under $\mathbb{P}$ into a superhedge under $\mathbb{Q}$. This occurs already in a one-period framework and is not a by-product of our definition of adapted Wasserstein distance; see Remark 5.2. A similar reasoning requires us to consider only trading strategies bounded by $k$; see Remark 5.3.

It is worthwhile to compare the inequalities (WHI) and (SHI):

(S) In a certain sense the 'strong hedging inequality' (SHI) seems to be the more relevant assertion: after all, a trader does not know that the model $\mathbb{Q}$ (rather than the model $\mathbb{P}$) describes reality and hence she might (somewhat stubbornly) stick to the initial plan of hedging her risk according to the strategy $H$. The inequality (SHI) then allows her to quantify the losses due to this model error.

(W) However, the 'weak hedging inequality' (WHI) also has a particular merit: suppose that a trader W starts with the prior belief that the asset price evolves according to a Black-Scholes model with volatility $\sigma_1$ but soon after time 0 realizes that a volatility $\sigma_2$ (where $\sigma_1 \ne \sigma_2$) yields a more adequate description of reality. If the witty trader W makes an accurate guess about the correct model and updates her trading strategy accordingly, her losses can be controlled through the tighter bound in (WHI).

In Theorem 4.2 we provide a version of Theorem 1.5 where $(\cdot)^+$ is replaced by a convex, strictly increasing loss function $l : \mathbb{R} \to \mathbb{R}_+$.

Another way to gauge the effectiveness of an almost superhedge is by means of risk measures. We postpone the general formulation to Theorem 4.3 and first present a version that appeals to the average value at risk $\mathrm{AVaR}^{\mathbb{P}}_\alpha$. Recall that for a random variable $Z : \Omega \to \mathbb{R}$

$$\mathrm{AVaR}^{\mathbb{P}}_\alpha(Z) := \inf_{m \in \mathbb{R}} E_{\mathbb{P}}\big[(Z - m)^+ / \alpha + m\big]$$

is the average value at risk at level $\alpha \in (0, 1)$ under model $\mathbb{P}$. We then have

Theorem 1.6. Assume that $C : \Omega \to \mathbb{R}$ is Lipschitz with constant $L$. Then

$$\Big| \inf_{H \in \mathcal{H}^k} \mathrm{AVaR}^{\mathbb{P}}_\alpha\big(C - (H \bullet X)_T\big) - \inf_{H \in \mathcal{H}^k} \mathrm{AVaR}^{\mathbb{Q}}_\alpha\big(C - (H \bullet X)_T\big) \Big| \le r\, \mathcal{AW}_1(\mathbb{P}, \mathbb{Q}),$$

for $r := b_1 (L + k) / \alpha$. If $H \in \mathcal{H}^k$ is such that $H_t : \Omega \to [-k, k]$ is Lipschitz with constant $\tilde{L}$ for every $t \in I$ and $\beta$ is the constant defined in Theorem 1.5, then

$$\Big| \mathrm{AVaR}^{\mathbb{P}}_\alpha\big(C - (H \bullet X)_T\big) - \mathrm{AVaR}^{\mathbb{Q}}_\alpha\big(C - (H \bullet X)_T\big) \Big| \le r\, \mathcal{AW}_1(\mathbb{P}, \mathbb{Q}) + \frac{\beta}{\alpha}\, \mathcal{AW}_2(\mathbb{P}, \mathbb{Q}).$$

The interpretation of this result is similar to the one of Theorem 1.5: as $\mathrm{AVaR}^{\mathbb{P}}_\alpha(\cdot)$ is translation invariant, one has

$$\inf_{H \in \mathcal{H}^k} \mathrm{AVaR}^{\mathbb{P}}_\alpha\big(C - (H \bullet X)_T\big) = \inf \Big\{ m \in \mathbb{R} : \text{there is } H \in \mathcal{H}^k \text{ such that } \mathrm{AVaR}^{\mathbb{P}}_\alpha\big(C - m - (H \bullet X)_T\big) \le 0 \Big\},$$

and the right-hand side constitutes a relaxed version of the superhedging price.

Notably, the explicit calculations of adapted Wasserstein distance given in Section 3.2 imply that Theorem 1.6 (and similarly Theorem 1.5) are sharp.
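The sharpness claim can be sanity-checked numerically in the constant-volatility Brownian setting of Example 1.7, with a call struck at K = 0. The sketch below is an illustration under stated assumptions, not the authors' computation: it evaluates E[(σ√T Z)^+] for a standard normal Z by midpoint quadrature, and it takes the value √T·|σ − σ̂| for the adapted Wasserstein distance between the two Brownian models from that example rather than computing it. Function names and parameter values are hypothetical.

```python
import math


def call_value_zero_strike(sigma, horizon, n=20_000, z_max=10.0):
    """E[(sigma * sqrt(horizon) * Z)^+] for Z ~ N(0, 1), via midpoint quadrature.

    Integrates z * exp(-z^2 / 2) / sqrt(2 * pi) over [0, z_max]; the tail
    beyond z_max = 10 is negligible.
    """
    h = z_max / n
    total = 0.0
    for i in range(n):
        z = (i + 0.5) * h
        total += z * math.exp(-0.5 * z * z)
    return sigma * math.sqrt(horizon) * total * h / math.sqrt(2.0 * math.pi)


def aw_brownian(sigma, sigma_hat, horizon):
    """Distance between the two Brownian models, as stated in Example 1.7 (assumed)."""
    return math.sqrt(horizon) * abs(sigma - sigma_hat)


sigma, sigma_hat, horizon = 0.2, 0.35, 2.0  # hypothetical parameters
price_gap = abs(call_value_zero_strike(sigma, horizon)
                - call_value_zero_strike(sigma_hat, horizon))
scaled_distance = aw_brownian(sigma, sigma_hat, horizon) / math.sqrt(2.0 * math.pi)
print(price_gap, scaled_distance)
```

The two printed numbers should agree up to quadrature error, which reflects that the price difference is exactly proportional to the distance between the models in this setting.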

Example 1.7 (Hedging in a Brownian framework). Consider a European call option $C(X) = (X_T - K)^+$, where for simplicity $K = 0$. Moreover, let $\mathbb{P}_\sigma$ be Wiener measure with constant volatility $\sigma \ge 0$. Then for every $\sigma, \hat{\sigma} \ge 0$, $k \ge 0$, and $\alpha \in (0, 1)$ it holds that (we defer the proof of this fact to Section 4)

$$\Big| \inf_{H \in \mathcal{H}^k} \mathrm{AVaR}^{\mathbb{P}_\sigma}_\alpha\big(C - (H \bullet X)_T\big) - \inf_{H \in \mathcal{H}^k} \mathrm{AVaR}^{\mathbb{P}_{\hat{\sigma}}}_\alpha\big(C - (H \bullet X)_T\big) \Big| = \big| E_{\mathbb{P}_\sigma}[C] - E_{\mathbb{P}_{\hat{\sigma}}}[C] \big| = \frac{1}{\sqrt{2\pi}} \sqrt{T}\, |\sigma - \hat{\sigma}| = \frac{1}{\sqrt{2\pi}}\, \mathcal{AW}_1(\mathbb{P}_\sigma, \mathbb{P}_{\hat{\sigma}}).$$

This shows that the estimate in Theorem 1.6 is tight (up to constants), in the sense that it is essentially impossible to improve on the probability metric $\mathcal{AW}_1$.

We make the important remark that Glanzer, Pflug, and Pichler [30] control acceptability prices in discrete time models in a Lipschitz fashion through the nested distance of these models. Specifically, in a discrete time one-period framework [30, Proposition 3] and Theorem 1.6 yield almost the same assertion: in this setup, the only difference is that [30, Proposition 3] does not specify a Lipschitz constant and does not assume uniform boundedness of the admissible hedging strategy. (However, the latter seems to be in conflict with our Remark 5.3 below.)

1.4. Stability of Utility Maximization and Utility Indifference Pricing.

We move on to consider the continuity of utility maximization. Let $U : \mathbb{R} \to \mathbb{R}$ be a utility function which is concave and increasing, and denote by $U'$ the left-continuous version of the derivative. We have

Theorem 1.8. Let $C : \Omega \to \mathbb{R}$ be Lipschitz continuous and assume that there exists $c \ge 0$ such that $U'(x) \le c(1 + |x|^{p-1})$ for all $x$. Then, for every $R \ge 0$ there exists a constant $K$ such that

$$\Big| \sup_{H \in \mathcal{H}^k} E_{\mathbb{P}}\big[U(C + (H \bullet X)_T)\big] - \sup_{H \in \mathcal{H}^k} E_{\mathbb{Q}}\big[U(C + (H \bullet X)_T)\big] \Big| \le K \cdot \mathcal{AW}_p(\mathbb{P}, \mathbb{Q}),$$

for all $\mathbb{P}, \mathbb{Q} \in \mathcal{SM}_p(\Omega)$ with $\mathcal{AW}_p(\mathbb{P}, \delta_0), \mathcal{AW}_p(\mathbb{Q}, \delta_0) \le R$.

The failure of usual Wasserstein distances to guarantee stability of utility maximization is illustrated in Remark 5.1.

A common way of quantifying the value of a claim is via utility indifference pricing: given a claim $C$, the utility indifference (bid-) price $v$ is defined as the solution of the following equation

$$\sup_{H \in \mathcal{H}^k} E_{\mathbb{P}}\big[U(C - v + (H \bullet X)_T)\big] = \sup_{H \in \mathcal{H}^k} E_{\mathbb{P}}\big[U((H \bullet X)_T)\big].$$

Continuing in the spirit of the present paper, we are interested in the stability of $\mathbb{P} \mapsto v(\mathbb{P})$, where the latter denotes the utility indifference price associated to the model $\mathbb{P}$.

Theorem 1.9. Let $C : \Omega \to \mathbb{R}$ be Lipschitz continuous and assume that there exists $c \ge 0$ such that $0 < U'(x) \le c(1 + |x|^{p-1})$ for all $x$. Then, for every $R \ge 0$ there exists a constant $K$ such that

$$\big| v(\mathbb{P}) - v(\mathbb{Q}) \big| \le K \cdot \mathcal{AW}_p(\mathbb{P}, \mathbb{Q}),$$

for all $\mathbb{P}, \mathbb{Q} \in \mathcal{SM}_p(\Omega)$ with $\mathcal{AW}_p(\mathbb{P}, \delta_0), \mathcal{AW}_p(\mathbb{Q}, \delta_0) \le R$.

Footnote. We are grateful to the anonymous referee for pointing out that we could include the stability of utility indifference pricing w.r.t. adapted Wasserstein distance.
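The defining equation of the utility indifference price can be solved numerically in very small models. The following Python sketch is only an illustration under strong simplifying assumptions that are not in the paper: a one-period model with finitely many scenarios, a constant position H in [-k, k] found by grid search, and a hypothetical utility function that is concave, strictly increasing, and has bounded derivative. Because the left-hand side of the defining equation is decreasing in v when U is increasing, bisection recovers the price.

```python
import math


def utility(x):
    """Hypothetical utility: concave, strictly increasing, with 0 < U'(x) <= 1."""
    return 1.0 - math.exp(-x) if x >= 0 else x


def sup_expected_utility(payoffs, increments, probs, k, grid=201):
    """Grid search over constant positions h in [-k, k] of E[U(payoff + h * dX)]."""
    best = -math.inf
    for i in range(grid):
        h = -k + 2.0 * k * i / (grid - 1)
        value = sum(p * utility(c + h * dx)
                    for p, c, dx in zip(probs, payoffs, increments))
        best = max(best, value)
    return best


def indifference_price(claim, increments, probs, k, lo=-10.0, hi=10.0, tol=1e-10):
    """Bisection for v solving sup_h E[U(C - v + h dX)] = sup_h E[U(h dX)].

    Assumes the price lies in [lo, hi].
    """
    target = sup_expected_utility([0.0] * len(claim), increments, probs, k)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        shifted = [c - mid for c in claim]
        if sup_expected_utility(shifted, increments, probs, k) > target:
            lo = mid   # paying mid still leaves the agent better off: price is higher
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Two equally likely scenarios with price increments -1 and +1 (hypothetical).
increments, probs = [-1.0, 1.0], [0.5, 0.5]
constant_claim = [0.7, 0.7]
price = indifference_price(constant_claim, increments, probs, k=1.0)
print(price)
```

For a deterministic claim the indifference price must equal the claim itself, which gives a simple consistency check for the sketch; claims that depend on the scenario can be passed in the same way.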


1.5. Structure of the paper.

In Section 2 we briefly review the literature related to this paper. In Section 3 we establish some basic properties of the adapted Wasserstein distance, discuss the choice of cost function and give some examples. Moreover we derive a contraction principle (Theorem 3.10) which relates adapted Wasserstein distance with a 'weak' (in the sense of Gozlan et al. [32]) transport distance. This result forms the basis for the proofs of the results mentioned in the introduction, as well as certain extensions of these results, see Section 4. Finally we conclude with some remarks in Section 5.

2. Literature

The articles closest in spirit to ours are [1, 20, 30]. Acciaio, Zalashko and one of the present authors consider in [1] an object related to the adapted Wasserstein distance in continuous time in connection with utility maximization, enlargement of filtrations and optimal stopping. Glanzer, Pflug, and Pichler [30] prove a deviation inequality for the so-called nested distance in a discrete time framework, and consider acceptability pricing over an ambiguity set described through the nested distance. Bion-Nadal and Talay [20] study via PDE arguments a continuous-time optimization problem which is related to the adapted Wasserstein distance.

Footnote. Note added in revision: improved convergence rates have recently been obtained in [7] for a related sample-based estimator. Together with the results of the present article, this gives statistical consistency for an empirical version of the financial problems considered.

The concept of causal couplings, and optimal transport over causal couplings, has been recently popularized by Lassalle [45], although precursors can be found in the works [62, 58]. This notion is central to the recent articles [1, 10, 8, 9].

The idea of strengthening weak convergence of measures in order to account for the temporal evolution has some history. Indeed several authors have independently introduced different approaches to address this challenge: the seminal unpublished work by Aldous [2] introduces the notion of extended weak convergence for the study of stability of optimal stopping problems. The principal idea is not to compare the laws of processes directly, but rather the laws of the corresponding prediction processes. Independently, Hellwig [33] introduces the information topology for the stability of equilibrium problems in economics. Roughly, two probability measures on a product of finitely many spaces $X_1 \times \dots \times X_N$ are considered to be close if for each $t \le N$ the projections onto the first $t$ coordinates as well as the corresponding conditional (regular) disintegrations are close. Unrelated to these developments, Pflug and Pichler [52, 53, 54] have introduced the nested distances for the stability of stochastic programming in discrete time. The nested distance is the obvious role model for the adapted Wasserstein distances considered in this article and (as mentioned above) for a fixed number of time steps and $p \ge 1$, they are obviously equivalent. Yet another idea to account for the temporal evolution of processes would be to symmetrise the causal transport costs $\mathcal{W}_c(\mathbb{P}, \mathbb{Q})$ defined by Lassalle [45] by taking the maximum or sum of $\mathcal{W}_c(\mathbb{P}, \mathbb{Q})$ and $\mathcal{W}_c(\mathbb{Q}, \mathbb{P})$; this was pointed out by Soumik Pal.

In parallel work [6], the four authors of the present article investigate the relations between these concepts in detail. Remarkably, in discrete time all of the concepts mentioned above (adapted Wasserstein distances, extended weak convergence, information topology, nested distances, symmetrised causal transport costs) define the same topology. As noted above, this 'weak adapted topology' refines the usual weak topology (properly for $T \ge 2$, see also Remark 5.2). The articles [8, 6, 27] investigate basic properties of this topology, e.g. the weak adapted topology is Polish [8, Section 5], and sets are totally bounded w.r.t. adapted Wasserstein distance / nested distance if and only if they are totally bounded w.r.t. usual Wasserstein distance [6, Lemma 1.6]. For recent applications of these concepts to optimal transport and probabilistic variants thereof we refer to [11, 12, 61].

In contrast, fundamental topological properties of the above mentioned concepts in the continuous time case seem to be much less understood and, as far as the authors are aware, pose an interesting challenge for future research. Specifically, it is not clear to us whether the topology associated to the adapted Wasserstein distance is Polish in the continuous time case. In a similar vein, we expect that results analogous to the ones of the present article should apply in the case of càdlàg paths, but this extension is beyond the scope of our current understanding of adapted Wasserstein distances.

The question of stability in mathematical finance has been studied from different perspectives over the years. Notably, starting with the articles of Lyons [46] and Avellaneda, Levy, Paras [5], the area of robust finance has mainly focused on extremal models and hedging strategies which dominate the payoff for every model in a specified class. Following the publication of Hobson's seminal article [36], connections with the Skorokhod embedding problem have been a driving force of the field, see the surveys of Hobson [37] and Obłój [48]. Recently this has been complemented by techniques coming from (martingale) optimal transport; early papers which advance this viewpoint include [38, 15, 29, 16, 21, 26, 24, 18]. The literature on 'local' misspecification of volatility in a sense more closely related to the present article appears more sparse. El Karoui, Jeanblanc, and Shreve [28] establish in a stochastic volatility framework that if the misspecified volatility dominates the true volatility, then the misspecified price of call options dominates the real price; see also the elegant account of Hobson [39]. More recently, the question of pricing and hedging under uncertainty about the volatility of a reference local volatility model is studied by Herrmann, Muhle-Karbe, and Seifried [35] (see also [34]). Less plausible models are penalized through a mean square distance to the volatility of the reference model, and the authors obtain explicit formulas for prices and hedging strategies in a limit for small uncertainty aversion. Becherer and Kentia [14] derive worst-case good-deal bounds under model ambiguity which concerns drift as well as volatility. Indeed, discussions with Dirk Becherer motivated us to consider also models with drift in our results on stability of superhedging. The behaviour of the superhedging price in a ball (w.r.t. various notions of distance) around a reference model is studied in depth by Obłój and Wiesel [49] for a $d$-dimensional asset and one time period.

A notable implication of our work is that it yields a coherent way to measure model uncertainty (in the sense of Cont's influential article [25]): fix a subset $\mathcal{M}_0$ of the set $\mathcal{M}$ of all consistent models, i.e. martingale measures which are consistent with benchmark instruments whose price can be observed on the market. Given $\mathcal{M}_0$, the model uncertainty associated to a derivative $f$ can be gauged through

$$\rho_{\mathcal{M}_0}(f) := \sup\{E_{\mathbb{Q}} f : \mathbb{Q} \in \mathcal{M}_0\} - \inf\{E_{\mathbb{Q}} f : \mathbb{Q} \in \mathcal{M}_0\}.$$

The worst-case approach typically pursued in robust finance then yields $\rho_{\mathcal{M}_0}(f)$ for $\mathcal{M}_0 = \mathcal{M}$, but it appears equally natural to take $\mathcal{M}_0$ to be an infinitesimal ball around a reference model. This approach is first carried out by Drapeau, Obłój, Wiesel and one of the present authors [13] in a one period framework. Our results indicate that adapted Wasserstein distance provides a way to extend this to a multi-period setup, and we intend to pursue this further in future work.

On a different note, much work has been done regarding the convergence of discrete time models to their continuous time analogues. Due to the vastness of this literature we refer the reader to the book [57] for references. Finally, in more recent times and starting from the works of Kardaras and Žitković, the stability of utility maximization has been studied in [41, 43, 44, 47, 60] among others.

3. The adapted Wasserstein distance

3.1. Basic properties of $\mathcal{AW}_p$.

The following lemma shows that $\mathcal{AW}_p$ is well-defined.

Lemma 3.1. Let $\mathbb{P}, \mathbb{Q}$ be integrable (semi-)martingale measures for $X, Y : \Omega \to \Omega$, respectively, and let $\pi$ be a bi-causal coupling between $\mathbb{P}$ and $\mathbb{Q}$. Then $X, Y, X - Y : \Omega \times \Omega \to \Omega$ are (semi-)martingales w.r.t. $\pi$. Further, if $X = M + A$ denotes the semimartingale decomposition under $\mathbb{P}$, then up to evanescence $M + A$ is the semimartingale decomposition of $X$ under $\pi$.

Proof. Let $X = M + A$ be the semimartingale decomposition under $\mathbb{P}$ and consider $M$ and $A$ as processes on $\Omega \times \Omega$ via $M(\omega, \eta) := M(\omega)$ and $A(\omega, \eta) := A(\omega)$. Further let $\pi = \mathbb{P}(d\omega)\, \pi_\omega(d\eta)$ be a bi-causal coupling between $\mathbb{P}$ and $\mathbb{Q}$. To show that $X = M + A$ remains the semimartingale decomposition under $\pi$, it is enough to show that $M$ is a martingale under $\pi$. To that end, let $0 \le s \le t$ and let $Z : \Omega \times \Omega \to \mathbb{R}$ be $\mathcal{F}_s \otimes \mathcal{F}_s$-measurable and bounded. (Recall that $\mathcal{F} = (\mathcal{F}_t)_t$ denotes the right-continuous filtration generated by $X$ and that we endow $\Omega \times \Omega$ with the filtration $(\mathcal{F}_t \otimes \mathcal{F}_t)_t$.) Then the random variable $Z' : \Omega \to \mathbb{R}$ defined by

$$Z'(\omega) := \int Z(\omega, \eta)\, \pi_\omega(d\eta) \ \text{ is } \ \mathcal{F}_s^{\mathbb{P}}\text{-measurable},$$

and clearly bounded. Indeed, if $Z(\omega, \eta) = Z_1(\omega) Z_2(\eta)$ for $\mathcal{F}_s$-measurable bounded functions $Z_1$ and $Z_2$, then it follows from the definition of bi-causality that $Z'$ is $\mathcal{F}_s^{\mathbb{P}}$-measurable; the general statement then follows from a monotone class argument. Therefore

$$E_\pi[(M_t - M_s) Z] = \int \big(M_t(\omega) - M_s(\omega)\big) \int Z(\omega, \eta)\, \pi_\omega(d\eta)\, \mathbb{P}(d\omega) = E_{\mathbb{P}}[(M_t - M_s) Z'] = 0,$$

by the martingale property of $M$ under $\mathbb{P}$. This shows that $M$ is a martingale under $\pi$ and therefore that $X = M + A$ is the semimartingale decomposition under $\pi$. $\square$

Lemma 3.2. $\mathcal{AW}_p$ defines a metric on the set $\mathcal{SM}_p(\Omega)$.

We note that very similar arguments could be used to show that $\mathcal{AW}_p$ defines a metric for semimartingales with infinite time horizon $\mathbb{N}$ or $[0, \infty)$.

Proof of Lemma 3.2.

It is clear that $\mathcal{AW}_p(\mathbb{P}, \mathbb{Q}) = \mathcal{AW}_p(\mathbb{Q}, \mathbb{P}) \ge 0$ for all $\mathbb{P}, \mathbb{Q} \in \mathcal{SM}_p(\Omega)$. Suppose that $\mathcal{AW}_p(\mathbb{P}, \mathbb{Q}) = 0$. As $\|\cdot\|_\infty \le |\cdot|$ (where $|\cdot|$ denotes the first variation), it is immediate that if $\pi$ participates in the infimum defining $\mathcal{AW}_p(\mathbb{P}, \mathbb{Q})$, and $X - Y = M + A$, then
\[ E_\pi[\|X - Y\|_\infty^p] \le 2^{p-1} E_\pi[\|M\|_\infty^p + |A|^p] \le 2^{p-1} b_p\, E_\pi\big[[M]_T^{p/2} + |A|^p\big], \]
where $b_p$ denotes the BDG constant and we used the BDG inequality for the martingale $M$. Hence the usual Wasserstein distance between $\mathbb{P}$ and $\mathbb{Q}$ (defined w.r.t. the $\|\cdot\|_\infty$-norm) is dominated from above by a constant multiple of $\mathcal{AW}_p(\mathbb{P}, \mathbb{Q})$, and so $\mathbb{P} = \mathbb{Q}$.

We now prove the triangle inequality. Let $\mathbb{P}, \mathbb{Q}, \mathbb{R}$ be given. We fix $\varepsilon > 0$ and let $\pi$ be a bi-causal coupling that is $\varepsilon$-optimal for $\mathcal{AW}_p(\mathbb{P}, \mathbb{Q})$ and $\tilde\pi$ a bi-causal coupling that is $\varepsilon$-optimal for $\mathcal{AW}_p(\mathbb{Q}, \mathbb{R})$. In the next couple of lines, $\omega$ will always denote the first coordinate of a vector in $\Omega^3$, $\eta$ the second, and $\gamma$ the last. Let
\[ \pi(d\omega, d\eta) = \pi_\eta(d\omega)\, \mathbb{Q}(d\eta) \quad\text{and}\quad \tilde\pi(d\eta, d\gamma) = \tilde\pi_\eta(d\gamma)\, \mathbb{Q}(d\eta) \]
be disintegrations, and define $\Pi \in \mathcal{P}(\Omega^3)$ by
\[ \Pi(d\omega, d\eta, d\gamma) = \pi_\eta(d\omega)\, \tilde\pi_\eta(d\gamma)\, \mathbb{Q}(d\eta). \]
If $\bar\pi(d\omega, d\gamma) := \int_\Omega \Pi(d\omega, d\eta, d\gamma)$ is the projection of $\Pi$ onto the first and third components, then it is clear that the first and second marginals of $\bar\pi$ are $\mathbb{P}$ and $\mathbb{R}$ respectively. Moreover, a disintegration $\bar\pi = \bar\pi_\omega(d\gamma)\, \mathbb{P}(d\omega)$ is given by
\[ \bar\pi_\omega(d\gamma) = \int_\Omega \tilde\pi_\eta(d\gamma)\, \pi_\omega(d\eta), \]
where, as indicated above, $\pi_\omega$ now denotes the disintegration of $\pi$ w.r.t. the first coordinate, that is $\pi(d\omega, d\eta) = \pi_\omega(d\eta)\, \mathbb{P}(d\omega)$. We claim that, for every $A \in \mathcal{F}_t$, the mapping $\omega \mapsto \bar\pi_\omega(A)$ is $\mathcal{F}^{\mathbb{P}}_t$-measurable. Indeed, by bi-causality of $\tilde\pi$ one has that $\eta \mapsto \tilde\pi_\eta(A)$ is $\mathcal{F}^{\mathbb{Q}}_t$-measurable. Thus there is an $\mathcal{F}_t$-measurable function $X$ and a $\mathbb{Q}$-almost surely zero function $N$ such that $\tilde\pi_\eta(A) = X(\eta) + N(\eta)$ for all $\eta \in \Omega$. Then $\bar\pi_\omega(A) = \int_\Omega X(\eta)\, \pi_\omega(d\eta) + \int_\Omega N(\eta)\, \pi_\omega(d\eta)$ for all $\omega \in \Omega$. The first term is $\mathcal{F}^{\mathbb{P}}_t$-measurable (by bi-causality of $\pi$), and, as $\pi$ is a coupling between $\mathbb{P}$ and $\mathbb{Q}$, one has that $\int_\Omega N(\eta)\, \pi_\omega(d\eta) = 0$ for $\mathbb{P}$-almost all $\omega \in \Omega$.

The argument for $\bar\pi = \bar\pi_\gamma(d\omega)\, \mathbb{R}(d\gamma)$ is similar and therefore $\bar\pi$ is a bi-causal coupling between $\mathbb{P}$ and $\mathbb{R}$. Finally, it follows as in the proof of Lemma 3.1 that, if $X = M^X + A^X$, $Y = M^Y + A^Y$, and $Z = M^Z + A^Z$ are the semimartingale decompositions under $\mathbb{P}$, $\mathbb{Q}$, and $\mathbb{R}$, then they remain the semimartingale decompositions under $\Pi$ on $\Omega^3$ endowed with the product filtration.

To finish the proof of the triangle inequality, we observe that
\[ \mathcal{AW}_p(\mathbb{P}, \mathbb{R}) \le E_{\bar\pi}\big[[M^X - M^Z]_T^{p/2} + |A^X - A^Z|^p\big]^{1/p} = E_\Pi\big[[(M^X - M^Y) + (M^Y - M^Z)]_T^{p/2} + |(A^X - A^Y) + (A^Y - A^Z)|^p\big]^{1/p}. \]
The function $M \mapsto E_\Pi[[M]_T^{p/2}]^{1/p}$ is known to be a norm on the space $\mathcal{M}_p(\Pi)$ of $\Pi$-martingales started at zero whose supremum is $p$-integrable. Likewise $A \mapsto E_\Pi[|A|^p]^{1/p}$ is a norm on the space of finite variation processes with $p$-integrable variation. Hence
\[ (M, A) \mapsto \|(M, A)\| := E_\Pi\big[[M]_T^{p/2} + |A|^p\big]^{1/p} \]
is a norm on the product of these spaces. We conclude the proof of the triangle inequality with
\[
\begin{aligned}
\mathcal{AW}_p(\mathbb{P}, \mathbb{R}) &\le \|(M^X - M^Y, A^X - A^Y) + (M^Y - M^Z, A^Y - A^Z)\| \\
&\le \|(M^X - M^Y, A^X - A^Y)\| + \|(M^Y - M^Z, A^Y - A^Z)\| \\
&= E_\pi\big[[M^X - M^Y]_T^{p/2} + |A^X - A^Y|^p\big]^{1/p} + E_{\tilde\pi}\big[[M^Y - M^Z]_T^{p/2} + |A^Y - A^Z|^p\big]^{1/p} \\
&\le 2\varepsilon + \mathcal{AW}_p(\mathbb{P}, \mathbb{Q}) + \mathcal{AW}_p(\mathbb{Q}, \mathbb{R}),
\end{aligned}
\]
since the semimartingale decomposition of $X - Y$ under $\pi$ is $(M^X - M^Y) + (A^X - A^Y)$, with an analogous expression for $Y - Z$ under $\tilde\pi$.

To conclude the proof, it remains to show that $\mathcal{AW}_p(\mathbb{P}, \mathbb{Q}) < \infty$ for all $\mathbb{P}, \mathbb{Q} \in \mathcal{SM}_p(\Omega)$. By Lemma 3.1, we have $\mathcal{AW}_p(\mathbb{P}, \delta_0) = E_{\mathbb{P}}[[M]_T^{p/2} + |A|^p]^{1/p}$ where $X = M + A$ is the semimartingale decomposition under $\mathbb{P}$. Therefore the triangle inequality implies that $\mathcal{AW}_p$ is real-valued on $\mathcal{SM}_p(\Omega)$. □

Examples and explicit calculations.

We start with a simple result which yields a closed-form expression for the adapted Wasserstein distance in certain continuous-time situations:


Proposition 3.3.

For $i \in \{1, 2\}$ consider the SDEs with bounded progressive coefficients:
\[ dX^i_t = \mu^i(t, \{X^i_s\}_{s \le t})\, dt + \sigma^i(t, \{X^i_s\}_{s \le t})\, dB^i_t. \tag{3.1} \]
Assume that each SDE admits a unique strong solution and denote by $\mathbb{P}^{\mu^i, \sigma^i}$ the respective laws. Further assume that
• $\mu^1$ is a function of time only (namely $\mu^1 : [0, T] \to \mathbb{R}$),
• $\sigma^1, \sigma^2 \ge 0$ and at least one of them is a function of time only.
Then the synchronous coupling (namely $\pi^*$ = joint law of $(X^1, X^2)$, where $B^1 = B^2$ in (3.1)) is optimal in the definition of $\mathcal{AW}_p(\mathbb{P}^{\mu^1, \sigma^1}, \mathbb{P}^{\mu^2, \sigma^2})$.

The discrete time version of the aforementioned synchronous coupling is given by the Knothe-Rosenblatt rearrangement [10], and a variant of the previous result can also be obtained in the discrete time framework.
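As a small numerical illustration of Proposition 3.3 (a sketch, not part of the original argument), consider the special case where both drifts and both volatilities are functions of time only and the volatilities are nonnegative. Under the synchronous coupling the martingale part of $X^1 - X^2$ has quadratic variation $\int_0^T (\sigma^1_t - \sigma^2_t)^2\,dt$ and the finite variation part has first variation $\int_0^T |\mu^1_t - \mu^2_t|\,dt$, so the cost is deterministic and reduces to two one-dimensional integrals. The function name `aw_p_time_only`, the midpoint rule and the sample coefficients are illustrative assumptions.

```python
import math


def aw_p_time_only(mu1, mu2, sigma1, sigma2, T=1.0, p=2.0, steps=20_000):
    """Cost of the synchronous coupling for time-only coefficients.

    Computes ( (int_0^T (sigma1 - sigma2)^2 dt)^(p/2)
               + (int_0^T |mu1 - mu2| dt)^p )^(1/p)
    with the midpoint rule. By Proposition 3.3 this equals AW_p when
    sigma1, sigma2 >= 0 (an assumption the caller must ensure).
    """
    dt = T / steps
    quad_var = 0.0   # quadratic variation of M^1 - M^2 at time T
    first_var = 0.0  # first variation of A^1 - A^2 on [0, T]
    for n in range(steps):
        t = (n + 0.5) * dt
        quad_var += (sigma1(t) - sigma2(t)) ** 2 * dt
        first_var += abs(mu1(t) - mu2(t)) * dt
    return (quad_var ** (p / 2) + first_var ** p) ** (1 / p)


# Hypothetical example: constant coefficients on [0, 1] with p = 2.
value = aw_p_time_only(lambda t: 0.1, lambda t: 0.3,
                       lambda t: 0.2, lambda t: 0.5)
print(value)  # close to sqrt(0.3**2 + 0.2**2) = sqrt(0.13)
```

For constant coefficients the integrals are exact up to rounding, which gives a quick check of the implementation before trying genuinely time-dependent coefficients.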

Proof.

Let $\pi$ be a feasible coupling for $\mathcal{AW}_p(\mathbb{P}^{\mu^1, \sigma^1}, \mathbb{P}^{\mu^2, \sigma^2})$, leading to a finite cost. Naturally for this proof we denote the coordinate process on $\Omega \times \Omega$ by $(X^1, X^2)$. As before we let $X^i = A^i + M^i$ be the unique continuous semimartingale decomposition of $X^i$ under the $\mathbb{P}^{\mu^i, \sigma^i}$-completion of its right-continuous filtration. Observe that $\frac{d}{dt} A^1$ is a.s. deterministic, by the assumption on $\mu^1$, and that the law of $\frac{d}{dt} A^2$ is independent of the coupling $\pi$. Both facts can be derived easily from the identity
\[ \frac{d}{dt} A^i_t = \lim_{\varepsilon \searrow 0} \frac{E_\pi\big[X^i_{t+\varepsilon} \,\big|\, \mathcal{F}^{X^i}_t\big] - X^i_t}{\varepsilon}, \]
which by the Lebesgue differentiation theorem holds $dt \otimes d\pi$-a.s. As a consequence, the term $E_\pi[|A^1 - A^2|^p_{1\text{-var}}]$ is independent of the coupling $\pi$ and so we may ignore it and only focus on the term $E_\pi[[M^1 - M^2]_T^{p/2}]$.

By Doob's martingale representation [40, Theorem 4.2], in a possibly enlarged filtered probability space $(\tilde\Omega, \tilde{\mathcal{F}}, \tilde\pi)$ we may represent the martingale $(M^1, M^2)$ by
\[ M^i_t = \int_0^t \sigma^{i1}\, dW + \int_0^t \sigma^{i2}\, d\hat W, \]
where $W, \hat W$ are independent standard one-dimensional Brownian motions and $\{\sigma^{ik} : i, k \in \{1, 2\}\}$ are real-valued processes, all adapted in the enlarged filtered space. In the following we will omit the argument $\{X^i_s\}_{s \le t}$ from $\sigma^i$. Necessarily
\[ (\sigma^i)^2 = \frac{d}{dt}[M^i]_t = (\sigma^{i1})^2 + (\sigma^{i2})^2 \quad (dt \otimes d\tilde\pi\text{-a.s.}). \]
By the Cauchy-Schwarz inequality we deduce that almost surely
\[ [M^1, M^2]_T = \int_0^T \big[\sigma^{11} \sigma^{21} + \sigma^{12} \sigma^{22}\big]\, dt \le \int_0^T \sigma^1 \sigma^2\, dt, \]
and accordingly we get the lower bound
\[ E_\pi\big[[M^1 - M^2]_T^{p/2}\big] \ge E_\pi\Big[\Big(\int_0^T (\sigma^1 - \sigma^2)^2\, dt\Big)^{p/2}\Big]. \]
As in the beginning of the proof, the right-hand side does not depend on the coupling $\pi$, because at least one of the $\sigma^i$ is a function of time only. To conclude, observe that for the synchronous coupling $\pi^*$ we have equality in the above inequality. □

As an easy consequence we have

Example 3.4.

For bounded Lipschitz functions $\mu^1, \mu^2, \sigma^1, \sigma^2$ we denote by $\mathbb{P}^{\mu^i, \sigma^i}$ the law of the diffusion
\[ dX^i_t = \mu^i(t, X^i_t)\, dt + \sigma^i(t, X^i_t)\, dB_t. \]
Assume that
• $\mu^i$ is independent of the $x$-variable, for some $i \in \{1, 2\}$, and
• $\sigma^k$ is independent of the $x$-variable, for some $k \in \{1, 2\}$.
Calling $j \in \{1, 2\} \setminus \{i\}$ and $\ell \in \{1, 2\} \setminus \{k\}$, we have
\[ \mathcal{AW}_p(\mathbb{P}^{\mu^1, \sigma^1}, \mathbb{P}^{\mu^2, \sigma^2})^p = E\Big[\Big(\int_0^T [\sigma^\ell(t, X^\ell_t) - \sigma^k(t)]^2\, dt\Big)^{p/2}\Big] + E\Big[\Big(\int_0^T |\mu^j(t, X^j_t) - \mu^i(t)|\, dt\Big)^p\Big]. \]

We now illustrate that in general it is not true that the straightforward synchronous coupling of Proposition 3.3 is optimal. As a consequence, we do not expect a closed-form expression for the adapted Wasserstein distance. A discrete-time version of this observation is discussed in [8, Section 7].

Example 3.5.

Consider $d = 1$, $T = 2$, and for each $c \in \mathbb{R}$ introduce
\[ \mu^c_t(\omega) := c\, 1_{[1,2]}(t)\, \mathrm{sign}(\omega_1) \quad\text{and}\quad \hat\mu^c_t(\omega) := -\mu^c_t(\omega). \]
Assuming that $B$ is a Brownian motion, and for $\sigma \in \mathbb{R}_+$, we introduce the couplings
\[ \pi^1 := \mathrm{Law}\Big(\sigma B + \int \mu^c_t(B)\, dt,\ \sigma B + \int \hat\mu^c_t(B)\, dt\Big), \qquad \pi^2 := \mathrm{Law}\Big(\sigma B + \int \mu^c_t(B)\, dt,\ -\sigma B + \int \hat\mu^c_t(-B)\, dt\Big). \]
These couplings share the same marginals and each of them is bi-causal. It is easy to compute
\[ E_{\pi^1}\big[[M]_T^{p/2} + |A|^p\big] = (2|c|)^p, \qquad E_{\pi^2}\big[[M]_T^{p/2} + |A|^p\big] = (8\sigma^2)^{p/2}. \]
Indeed, under $\pi^1$ the martingale parts cancel and the drifts differ by $2c\, \mathrm{sign}(B_1)$ on $[1, 2]$, while under $\pi^2$ the drifts coincide and the martingale part of the difference is $2\sigma B$. We conclude that, for each $p$, there are plenty of pairs $(c, \sigma)$ such that the "synchronous" coupling $\pi^1$ is not optimal between its marginals for the metric $\mathcal{AW}_p$.

To close this section, we estimate the distance between two geometric Brownian motions with different volatilities.

Proposition 3.6.

For $i = 1, 2$, let $\mathbb{P}^{\sigma_i}$ be the law of the solution to the SDE $dZ^i_t = \sigma_i Z^i_t\, dB^i_t$ with $Z^i_0 = 1$, where $B^i$ denotes Brownian motion and $\sigma_i \in \mathbb{R}_+$. Letting $R \sim \mathcal{N}(0, T)$, we then have
\[ \mathcal{AW}_2(\mathbb{P}^{\sigma_1}, \mathbb{P}^{\sigma_2})^2 = E\Big[\Big(e^{\sigma_1 R - \sigma_1^2 T/2} - e^{\sigma_2 R - \sigma_2^2 T/2}\Big)^2\Big] = e^{\sigma_1^2 T} - 2 e^{\sigma_1 \sigma_2 T} + e^{\sigma_2^2 T}, \]
and for $p > 1$
\[ \mathcal{AW}_p(\mathbb{P}^{\sigma_1}, \mathbb{P}^{\sigma_2})^p \le c_p\, E\Big[\Big|e^{\sigma_1 R - \sigma_1^2 T/2} - e^{\sigma_2 R - \sigma_2^2 T/2}\Big|^p\Big], \]
where $c_p$ is the constant in the BDG-inequality which allows to control quadratic variation by terminal value.
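The closed form for $p = 2$ in Proposition 3.6 can be sanity-checked numerically. The sketch below compares the expression $e^{\sigma_1^2 T} - 2e^{\sigma_1\sigma_2 T} + e^{\sigma_2^2 T}$ with a direct quadrature of the Gaussian expectation; the function names, the composite Simpson rule and the truncation of the integration range are illustrative assumptions, not part of the paper.

```python
import math


def aw2_squared_closed_form(s1, s2, T):
    """Right-hand side exp(s1^2 T) - 2 exp(s1 s2 T) + exp(s2^2 T)."""
    return (math.exp(s1 ** 2 * T) - 2.0 * math.exp(s1 * s2 * T)
            + math.exp(s2 ** 2 * T))


def aw2_squared_quadrature(s1, s2, T, half_width=12.0, steps=4000):
    """E[(exp(s1 R - s1^2 T/2) - exp(s2 R - s2^2 T/2))^2] for R ~ N(0, T).

    Uses composite Simpson integration in the standardised variable
    z = R / sqrt(T) over [-half_width, half_width]; steps must be even.
    """
    def integrand(z):
        r = math.sqrt(T) * z
        diff = (math.exp(s1 * r - s1 ** 2 * T / 2)
                - math.exp(s2 * r - s2 ** 2 * T / 2))
        density = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
        return diff * diff * density

    h = 2 * half_width / steps
    total = integrand(-half_width) + integrand(half_width)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2  # Simpson weights 4, 2, 4, ..., 4
        total += weight * integrand(-half_width + i * h)
    return total * h / 3


closed = aw2_squared_closed_form(0.2, 0.4, 1.0)
numeric = aw2_squared_quadrature(0.2, 0.4, 1.0)
print(closed, numeric)  # the two values should agree to quadrature accuracy
```

For moderate volatilities the Gaussian tails beyond twelve standard deviations are negligible, which is why the truncated range suffices here; larger values of $\sigma_i^2 T$ would call for a wider range.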

Proof.

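We have
\[
\begin{aligned}
\mathcal{AW}_p(\mathbb{P}^{\sigma_1}, \mathbb{P}^{\sigma_2})^p &= \inf\Big\{ E_\pi\big[[Z^1 - Z^2]_T^{p/2}\big] : \pi \in \mathrm{Cpl}_{BC}(\mathbb{P}^{\sigma_1}, \mathbb{P}^{\sigma_2}) \Big\} \\
&\le c_p \inf\Big\{ E_\pi\big[|Z^1_T - Z^2_T|^p\big] : \pi \in \mathrm{Cpl}_{BC}(\mathbb{P}^{\sigma_1}, \mathbb{P}^{\sigma_2}) \Big\} \\
&= c_p \inf\Big\{ \int \Big|e^{\sigma_1 r_1 - \sigma_1^2 T/2} - e^{\sigma_2 r_2 - \sigma_2^2 T/2}\Big|^p\, d\pi(r_1, r_2) : \pi \in \mathrm{Cpl}(\gamma_T, \gamma_T) \Big\} \\
&= c_p\, E\Big[\Big|e^{\sigma_1 R - \sigma_1^2 T/2} - e^{\sigma_2 R - \sigma_2^2 T/2}\Big|^p\Big],
\end{aligned}
\]
where $\gamma_T$ denotes a centered Gaussian with variance $T$. For $p = 2$ and $c_2 = 1$ we obtain equality. □

Choice of the 'cost functional'.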

Recall from Definition 1.3 that the adapted Wasserstein distance is given through
\[ \mathcal{AW}_p(\mathbb{P}, \mathbb{Q}) := \inf\{ \Phi : \pi \in \mathrm{Cpl}_{BC}(\mathbb{P}, \mathbb{Q}) \}, \]
where the 'cost functional'
\[ \Phi = E_\pi\big[[M^X - M^Y]_T^{p/2} + |A^X - A^Y|^p\big]^{1/p} \tag{3.2} \]
is defined using the semimartingale decompositions $X = M^X + A^X$, $Y = M^Y + A^Y$. The distinctive property of this "quadratic plus first variation" functional is that it exhibits the proper scaling to interpret the discrete time case as an approximation to the continuous time counterpart. To wit, consider $\Omega = C([0, 1])$ and let $\mathbb{P}^\sigma$ be the law of $X$ where $X_t = \int_0^t \sigma_s\, dB_s$, $B$ is a Brownian motion and $\sigma \in C([0, 1])$, $\sigma \ge 0$. For each $N$, denote by $\mathbb{P}^\sigma_N$ the law of a random walk on $\{0, 1/N, 2/N, \dots, 1\}$ with independent increments from $n/N$ to $(n+1)/N$ distributed according to $\mathcal{N}(0, \sigma^2_{n/N}/N)$. Then one can compute that for $0 \le \sigma, \sigma' \in C([0, 1])$
\[ \mathcal{AW}_2(\mathbb{P}^\sigma_N, \mathbb{P}^{\sigma'}_N) = \Big(\sum_{n=0}^{N-1} \frac{1}{N} |\sigma_{n/N} - \sigma'_{n/N}|^2\Big)^{1/2} \to \Big(\int_0^1 |\sigma_t - \sigma'_t|^2\, dt\Big)^{1/2} = \mathcal{AW}_2(\mathbb{P}^\sigma, \mathbb{P}^{\sigma'}). \]
For comparison, consider the consequences of replacing $\Phi$ in (3.2) with
\[ \tilde\Phi = E_\pi\Big[\sum_{i=0}^N (X_i - Y_i)^2\Big]^{1/2}, \]
corresponding to the quadratic nested distance (in terms of Pflug and Pichler [53]). While $\widetilde{\mathcal{AW}}_2$ and $\mathcal{AW}_2$ are equivalent metrics for each fixed $N$, $\widetilde{\mathcal{AW}}_2$ does not exhibit the appropriate scaling for large $N$. A straightforward computation shows $\widetilde{\mathcal{AW}}_2(\mathbb{P}^\sigma_N, \mathbb{P}^{\sigma'}_N) \to \infty$ as $N \to \infty$ whenever $\sigma \ne \sigma'$. In consequence, bounds on the hedging error in terms of $\widetilde{\mathcal{AW}}_2(\mathbb{P}^\sigma_N, \mathbb{P}^{\sigma'}_N)$ become progressively weaker as $N \to \infty$. In particular they do not allow for a meaningful continuous time limit.

When restricting solely to martingale measures $\mathbb{P}, \mathbb{Q}$, a sensible alternative to (3.2) would be to consider the maximum norm, i.e. $\Phi' = E_\pi[\sup_t |X_t - Y_t|^p]^{1/p}$. In fact, by the BDG-inequalities this is essentially equivalent to our choice in (3.2). However, when considering semimartingales, this cost is too coarse. For example, let $(\omega_n)$ be a sequence in $\Omega$ which converges to zero in maximum norm but for which the first variation tends to infinity. Then $\mathbb{P}_n := \delta_{\omega_n}$ converges to $\mathbb{P} := \delta_0$ (when the adapted distance is defined only with the maximum norm as cost); however, none of our optimization problems converge (take a strategy $H \in \mathcal{H}_k$ for which $(H(X) \bullet X)_T \approx k\, |\omega_n|$ almost surely).

Stochastic integrals and a contraction principle.

We present here the two technical results which underlie the proofs of the main theorems in the article. The first one is

Lemma 3.7.

Let $\mathbb{P}, \mathbb{Q} \in \mathcal{SM}_1(\Omega)$, $H \in \mathcal{H}_k$, and let $\pi$ be a bi-causal coupling between $\mathbb{P}$ and $\mathbb{Q}$. Then there exists a process $G \in \mathcal{H}_k$ such that $G_t(Y) = E_\pi[H_t(X) \mid Y]$ for every $t$, $\pi$-almost surely. Moreover, we have $(G(Y) \bullet Y)_T = E_\pi[(H(X) \bullet Y)_T \mid Y]$, $\pi$-almost surely.

Proof. In discrete time, write $H = \sum_{t=1}^N H_t 1_{\{t\}}$ for Borel functions $H_t : \mathbb{R}^{t-1} \to [-k, k]$. Let $\pi = \pi_\eta(d\omega)\, \mathbb{Q}(d\eta)$ be a disintegration and define
\[ G'_t(\eta) := \int H_t(\omega)\, \pi_\eta(d\omega), \]
for every $t$ and $\eta \in \Omega$. By definition of bi-causal coupling, $G'_t$ is $\mathcal{F}^{\mathbb{Q}}_{t-1}$-measurable. It remains to pick functions $G_t$ which are $\mathcal{F}_{t-1}$-measurable such that $G_t = G'_t$ $\mathbb{Q}$-almost surely. Since $E_\pi[H_t(X) \mid Y] = G_t(Y)$ $\pi$-almost surely, it is clear that $(G(Y) \bullet Y)_T = E_\pi[(H(X) \bullet Y)_T \mid Y]$ $\pi$-almost surely.

In continuous time we take $G$ to be the predictable projection of $H$, under the reference measure $\pi$, with respect to the $\pi$-completion of the filtration $\{\emptyset, \Omega\} \otimes \mathcal{F}^Y$. By [1, Lemma C.1] the result is $\pi$-indistinguishable from a predictable process under the $\mathbb{Q}$-completion of the filtration $\mathcal{F}^Y$. The $t$-by-$t$, $\pi$-almost sure equality $G_t(Y) = E_\pi[H_t(X) \mid Y]$ is then a consequence of the definition of predictable projection. The $\pi$-almost sure equality $(G(Y) \bullet Y)_T = E_\pi[(H \bullet Y)_T \mid Y]$ is established in Lemma 3.8 below, assuming that $E_{\mathbb{Q}}[[Y]_T] < \infty$. The general case follows by localization. □

Lemma 3.8.

In the continuous-time context of Lemma 3.7, assume further that $E_{\mathbb{Q}}[[Y]_T] < \infty$. Then we have $(G(Y) \bullet Y)_T = E_\pi[(H(X) \bullet Y)_T \mid Y]$, $\pi$-almost surely.

Proof. The statement is true if instead of the stochastic integrals we consider the integrals w.r.t. the finite variation part of $Y$ (either by properties of Riemann-Stieltjes integrals, or directly from the definition of predictable projection). For this reason we may now assume that $Y$ is itself a martingale.

We first take for granted the following result: if $h$ is bounded and predictable in the filtration of $(X, Y)$, and if $g$ denotes its predictable projection in the filtration of $Y$ under the measure $\pi$, then
\[ E_\pi\Big[\int_0^T |g_t|^2\, d[Y]_t\Big] \le E_\pi\Big[\int_0^T |h_t|^2\, d[Y]_t\Big]. \tag{3.3} \]
We know that there exists a sequence $(H^n)$ of predictable simple processes such that
\[ \lim_{n \to \infty} E_\pi\Big[\int_0^T |H_t - H^n_t|^2\, d[Y]_t\Big] = 0. \]
By the Itô isometry the stochastic integrals $(H^n \bullet Y)_T$ converge in $L^2(\pi)$ to $(H \bullet Y)_T$. Denoting by $G^n$ the predictable projection of $H^n$ with respect to the $Y$-filtration, we deduce from (3.3) that
\[ \lim_{n \to \infty} E_\pi\Big[\int_0^T |G_t - G^n_t|^2\, d[Y]_t\Big] = 0, \]
so again by the Itô isometry $(G^n \bullet Y)_T$ converges in $L^2(\pi)$ to $(G \bullet Y)_T$. The $\pi$-almost sure equality $(G^n \bullet Y)_T = E_\pi[(H^n \bullet Y)_T \mid Y]$ follows easily from the bi-causality of the coupling $\pi$, and by taking $L^2$ limits the desired conclusion is obtained.

To finish the proof we must establish (3.3). First we observe that
\[ E_\pi\Big[\int_0^T |g_t|^2\, d[Y]_t\Big]^{1/2} = \sup_{\substack{f \text{ is } Y\text{-predictable} \\ \|f\| \le 1}} E_\pi\Big[\int_0^T f_t g_t\, d[Y]_t\Big] = \sup_{\substack{f \text{ is } Y\text{-predictable} \\ \|f\| \le 1}} E_\pi\Big[\int_0^T f_t h_t\, d[Y]_t\Big], \]
as follows from predictable projection and upon taking $\|f\|^2 := E_\pi[\int |f_t|^2\, d[Y]_t]$. The result is a consequence of the equality
\[ E_\pi\Big[\int_0^T |h_t|^2\, d[Y]_t\Big]^{1/2} = \sup_{\substack{f \text{ is } (X, Y)\text{-predictable} \\ \|f\| \le 1}} E_\pi\Big[\int_0^T f_t h_t\, d[Y]_t\Big], \]
since the supremum over $Y$-predictable $f$ is taken over a smaller class. □

Our next crucial technical result is given in Theorem 3.10 below. But first we need some preparation.

Lemma 3.9.

Let $\mathbb{P}, \mathbb{Q} \in \mathcal{SM}_p(\Omega)$, let $\pi$ be a bi-causal coupling between $\mathbb{P}$ and $\mathbb{Q}$, let $H \in \mathcal{H}_k$, and write $X - Y = M + A$ for the semimartingale decomposition under $\pi$. Then, for every $p \ge 1$, we have
\[
\begin{aligned}
E_\pi[\|X - Y\|_\infty^p] &\le 2^{p-1} b_p \cdot E_\pi\big[[M]_T^{p/2} + |A|^p\big], \\
E_\pi[|(H(X) \bullet X)_T - (H(X) \bullet Y)_T|^p] &\le 2^{p-1} b_p k^p \cdot E_\pi\big[[M]_T^{p/2} + |A|^p\big],
\end{aligned}
\]
where $b_p$ is the upper constant in the BDG-inequality. If further $H_t : \Omega \to \mathbb{R}$ is $\tilde L$-Lipschitz continuous for every $t$, then we have
\[ E_\pi[|(H(X) \bullet X)_T - (H(Y) \bullet Y)_T|^p] \le 2^{2p-2} b_p k^p \cdot E_\pi\big[[M]_T^{p/2} + |A|^p\big] + \alpha \cdot E_\pi\big[[M]_T^p + |A|^{2p}\big]^{1/2}, \]
where $\alpha = 2^{3p-2} \tilde L^p b_p b_{2p}^{1/2} \min\{\mathcal{AW}_{2p}(\mathbb{P}, \delta_0)^p, \mathcal{AW}_{2p}(\mathbb{Q}, \delta_0)^p\}$.

Proof. The elementary inequality $(x + y)^p \le 2^{p-1} x^p + 2^{p-1} y^p$ for $x, y \ge 0$ and $\|\cdot\|_\infty \le |\cdot|$ imply
\[ E_\pi[\|X - Y\|_\infty^p] \le 2^{p-1} E_\pi[\|M\|_\infty^p] + 2^{p-1} E_\pi[|A|^p] \le 2^{p-1} b_p\, E_\pi\big[[M]_T^{p/2} + |A|^p\big]. \]
This proves the first part. The same arguments imply
\[ E_\pi[|(H(X) \bullet X)_T - (H(X) \bullet Y)_T|^p] \le 2^{p-1} E_\pi[|(H(X) \bullet M)_T|^p] + 2^{p-1} E_\pi[|(H(X) \bullet A)_T|^p] \le 2^{p-1} k^p b_p\, E_\pi\big[[M]_T^{p/2} + |A|^p\big], \]
from which the second part follows. To prove the third claim, write
\[ E_\pi[|(H(X) \bullet X)_T - (H(Y) \bullet Y)_T|^p] \le 2^{p-1} E_\pi[|((H(X) - H(Y)) \bullet X)_T|^p] + 2^{p-1} E_\pi[|(H(Y) \bullet X)_T - (H(Y) \bullet Y)_T|^p]. \]
The second term is smaller than $2^{p-1}\, 2^{p-1} k^p b_p\, E_\pi[[M]_T^{p/2} + |A|^p]$ by the second part. It remains to estimate $E_\pi[|((H(X) - H(Y)) \bullet X)_T|^p]$. Write $X = N + B$ for the semimartingale decomposition of $X$ under $\mathbb{P}$. By Lemma 3.1, the semimartingale decomposition under $\pi$ is still $X = N + B$.
Moreover, the BDG-inequality, the Lipschitz-continuity of $H$, and Hölder's inequality imply that
\[
\begin{aligned}
E_\pi[|((H(X) - H(Y)) \bullet X)_T|^p] &\le 2^{p-1} E_\pi\big[|((H(X) - H(Y)) \bullet N)_T|^p + |((H(X) - H(Y)) \bullet B)_T|^p\big] \\
&\le 2^{p-1} E_\pi\big[\|H(X) - H(Y)\|_\infty^p\, (b_p [N]_T^{p/2} + |B|^p)\big] \\
&\le 2^{p-1} b_p \tilde L^p\, E_\pi[\|X - Y\|_\infty^{2p}]^{1/2}\, E_\pi\big[([N]_T^{p/2} + |B|^p)^2\big]^{1/2}.
\end{aligned}
\]
It now follows from the first part that
\[ E_\pi[\|X - Y\|_\infty^{2p}]^{1/2} \le (2^{2p-1} b_{2p})^{1/2}\, E_\pi\big[[M]_T^p + |A|^{2p}\big]^{1/2}, \]
and by Lemma 3.1 we have $E_\pi[([N]_T^{p/2} + |B|^p)^2]^{1/2} \le 2^{1/2} \mathcal{AW}_{2p}(\mathbb{P}, \delta_0)^p$. Putting all estimates together and exchanging the roles of $X$ and $Y$ yields the claim. □

Denote by $\mathcal{P}_p(\mathbb{R})$ the set of all Borel probability measures $\mu$ on $\mathbb{R}$ such that $\int |x|^p\, \mu(dx) < \infty$. Moreover, let $d_p(\mu, \nu)$ be the usual $p$-Wasserstein distance, and let $d^w_p$ be the weak $p$-Wasserstein cost, that is,
\[
\begin{aligned}
d_p(\mu, \nu) &:= \inf\Big\{\Big(\int |x - y|^p\, \gamma(dx, dy)\Big)^{1/p} : \gamma \text{ is a coupling of } \mu \text{ and } \nu\Big\}, \\
d^w_p(\mu, \nu) &:= \inf\Big\{\Big(\int \Big|x - \int y\, \gamma_x(dy)\Big|^p\, \mu(dx)\Big)^{1/p} : \gamma \text{ is a coupling of } \mu \text{ and } \nu\Big\}.
\end{aligned}
\]
Here $\gamma = \mu(dx)\, \gamma_x(dy)$ denotes the disintegration. Note that $d^w_p$ is not symmetric and, as a consequence of Jensen's inequality, we always have $d^w_p \le d_p$. Problems akin to $d^w_p(\mu, \nu)$ go under the name of 'weak optimal transport' and have been recently introduced by Gozlan et al. in [32], but see also [3, 4, 11, 9, 31]. We have

Theorem 3.10 (Contraction). Let $\mathbb{P}, \mathbb{Q} \in \mathcal{SM}_p(\Omega)$, let $\pi$ be a bi-causal coupling between $\mathbb{P}$ and $\mathbb{Q}$, let $C : \Omega \to \mathbb{R}$ be Lipschitz with constant $L$, and let $H \in \mathcal{H}_k$. Further denote by $X - Y = M + A$ the semimartingale decomposition under $\pi$ and let $G \in \mathcal{H}_k$ be such that $(G(Y) \bullet Y)_T = E_\pi[(H(X) \bullet Y)_T \mid Y]$ $\pi$-almost surely. Then
\[ d^w_p\Big((C(Y) + (G(Y) \bullet Y)_T)(\mathbb{Q}),\ (C(X) + (H(X) \bullet X)_T)(\mathbb{P})\Big) \le 2^{(p-1)/p}\, b_p^{1/p}\, (k + L) \cdot E_\pi\big[[M]_T^{p/2} + |A|^p\big]^{1/p}. \tag{3.4} \]
Now assume in addition that $H_t : \Omega \to \mathbb{R}$ is $\tilde L$-Lipschitz continuous for every $t$. Then
\[ d_p\Big((C(Y) + (H(Y) \bullet Y)_T)(\mathbb{Q}),\ (C(X) + (H(X) \bullet X)_T)(\mathbb{P})\Big) \le 2^{(3p-3)/p}\, b_p^{1/p}\, (k + L)\, E_\pi\big[[M]_T^{p/2} + |A|^p\big]^{1/p} + \alpha^{1/p}\, E_\pi\big[[M]_T^p + |A|^{2p}\big]^{1/(2p)}, \]
where $\alpha$ is the constant of Lemma 3.9.

Proof. We start by proving the first claim. Let $\pi$ be as stated, and define $a(X) := C(X) + (H(X) \bullet X)_T$ as well as $b(Y) := C(Y) + (G(Y) \bullet Y)_T$. Now let $\gamma := (b(Y), a(X))(\pi)$ so that $\gamma$ is trivially a coupling between $b(Y)(\mathbb{Q})$ and $a(X)(\mathbb{P})$. Therefore
\[ d^w_p\big(b(Y)(\mathbb{Q}),\ a(X)(\mathbb{P})\big) \le E_\pi\big[|b(Y) - E_\pi[a(X) \mid b(Y)]|^p\big]^{1/p}. \]
By assumption it holds that
\[ E_\pi[(G(Y) \bullet Y)_T - (H(X) \bullet X)_T \mid Y] = E_\pi[(H(X) \bullet Y)_T - (H(X) \bullet X)_T \mid Y]. \]

Thus, using the tower property and Jensen's inequality, it follows that
\[
\begin{aligned}
E_\pi\big[|b(Y) - E_\pi[a(X) \mid b(Y)]|^p\big]^{1/p} &\le E_\pi\Big[\big|E_\pi[C(Y) - C(X) \mid Y] + E_\pi[(G(Y) \bullet Y)_T - (H(X) \bullet X)_T \mid Y]\big|^p\Big]^{1/p} \\
&\le E_\pi[|C(Y) - C(X)|^p]^{1/p} + E_\pi[|(H(X) \bullet Y)_T - (H(X) \bullet X)_T|^p]^{1/p}.
\end{aligned}
\]
The claim now follows from the first and second estimates in Lemma 3.9.

In the second case, where $H$ is additionally Lipschitz, let $d(X) := C(X) + (H(X) \bullet X)_T$ as well as $e(Y) := C(Y) + (H(Y) \bullet Y)_T$ and $\gamma := (e(Y), d(X))(\pi)$. Then, similarly as before,
\[ d_p\big(e(Y)(\mathbb{Q}),\ d(X)(\mathbb{P})\big) \le E_\pi[|e(Y) - d(X)|^p]^{1/p} \le E_\pi[|C(Y) - C(X)|^p]^{1/p} + E_\pi[|(H(Y) \bullet Y)_T - (H(X) \bullet X)_T|^p]^{1/p}, \]
and the claim follows from the first and third estimates of Lemma 3.9. □

Remark 3.11.

An evident question is whether an estimate for the usual Wasserstein distance holds true without the (Lipschitz-)continuity assumption on $H$, namely whether (3.4) holds for $d_p$ instead of $d^w_p$. The following example shows that this is not true. In a two-period discrete time model ($T = 2$), let
\[ \mathbb{P} := \delta_0 \otimes ((\delta_1 + \delta_{-1})/2) \quad\text{and}\quad \mathbb{P}^\varepsilon := ((\delta_\varepsilon + \delta_{-\varepsilon})/2) \otimes ((\delta_1 + \delta_{-1})/2), \]
so that $\mathcal{AW}_p(\mathbb{P}^\varepsilon, \mathbb{P}) \to 0$ as $\varepsilon \to 0$ for every $p$. Then, set $H_1 := 0$ and $H_2 := 1_{(0, \infty)} - 1_{(-\infty, 0)}$. For the projection under any bi-causal coupling between $\mathbb{P}^\varepsilon$ and $\mathbb{P}$ of $H$ onto $Y$ one computes $G_1 = 0$ and $G_2 = 0$. In particular $(G(Y) \bullet Y)_T = 0$ $\mathbb{P}$-almost surely. However, for every $\varepsilon > 0$ one has $\mathbb{P}^\varepsilon((H(X) \bullet X)_T \ge 1 - \varepsilon) \ge 1/2$, which implies that the respective laws cannot converge.

Remark 3.12. By $b_p$ we denote the smallest real number such that
\[ E[\|M\|_\infty^p] \le b_p\, E[[M]^{p/2}] \tag{3.5} \]
for every martingale $M$. For $p \ge 2$ it was established by Burkholder [22] that $b_p = p$, but the value of $b_p$ is unknown for $p \in [1, 2)$ according to [50], [51, page 427]. By [17] , b ≤ . (The optimal constant in the reverse inequality is known for the trivial case $p = 2$ and for $p = 1$. In the latter instance one obtains √ and . . . . [59] for continuous martingales, resp.)

Proofs of the results stated in the introduction and extensions

Thanks to the work done in the previous section, the strategy for the proofs boils down to two parts. In a first step, one forgets about the space $\Omega$ and only focuses on continuity of the problem at hand with respect to $d_p$ or $d^w_p$ when image measures on $\mathbb{R}$ are plugged in: e.g. in utility maximization this means to study continuity of $\mu \mapsto \int_{\mathbb{R}} U(x)\, \mu(dx)$. In a second step, one uses the obtained continuity and the contraction theorem of the previous section.

4.1. Proof of Theorem 1.5.

We will need the elementary estimate

Lemma 4.1.

Let $\mu, \nu \in \mathcal{P}_1(\mathbb{R})$ and let $f : \mathbb{R} \to \mathbb{R}$ be convex and Lipschitz. Then
\[ \int f(x)\, \mu(dx) - \int f(y)\, \nu(dy) \le L\, d^w_1(\mu, \nu), \tag{4.1} \]
where $L$ is the Lipschitz constant of $f$.

Proof.

Let $\gamma$ be a coupling of $\mu$ and $\nu$. Applying Jensen's inequality we obtain
\[
\begin{aligned}
\int f(x)\, \mu(dx) - \int f(y)\, \nu(dy) &= \int f(x) - f(y)\, \gamma(dx, dy) \\
&= \int \Big(f(x) - \int f(y)\, \gamma_x(dy)\Big)\, \mu(dx) \\
&\le \int \Big(f(x) - f\Big(\int y\, \gamma_x(dy)\Big)\Big)\, \mu(dx) \\
&\le L \int \Big|x - \int y\, \gamma_x(dy)\Big|\, \mu(dx).
\end{aligned}
\]
As $\gamma$ was arbitrary, this implies the claim. □

In fact there is equality in the previous lemma if one takes the supremum on the l.h.s. of (4.1) over all $L$-Lipschitz convex functions, as shown in [32, Proposition 3.2].

We now turn to the proof of Theorem 1.5. For $n > 0$ let $\pi^n$ be a bi-causal coupling which attains the infimum in the definition of $\mathcal{AW}_1(\mathbb{P}, \mathbb{Q})$ modulo a $1/n$-margin. By Lemma 3.7 there is $G^n \in \mathcal{H}_k$ such that $(G^n(Y) \bullet Y)_T = E_{\pi^n}[(H(X) \bullet Y)_T \mid Y]$ $\pi^n$-almost surely. Define
\[ \mu^n := (C(Y) - (G^n(Y) \bullet Y)_T)(\mathbb{Q}) \quad\text{and}\quad \nu := (C(X) - (H(X) \bullet X)_T)(\mathbb{P}). \]
(Note that $\mu^n, \nu \in \mathcal{P}_1(\mathbb{R})$ as $\mathbb{P}, \mathbb{Q} \in \mathcal{SM}_1(\Omega)$.) By Lemma 4.1, applied to the convex and 1-Lipschitz function $x \mapsto (x - m)^+$, we have
\[ E_{\mathbb{Q}}\big[(C(Y) - m - (G^n(Y) \bullet Y)_T)^+\big] - E_{\mathbb{P}}\big[(C(X) - m - (H(X) \bullet X)_T)^+\big] \le d^w_1(\mu^n, \nu). \]
From Theorem 3.10 (applied with $-H$ and $-G^n$) we obtain
\[ E_{\mathbb{Q}}\big[(C(Y) - m - (G^n(Y) \bullet Y)_T)^+\big] \le E_{\mathbb{P}}\big[(C(X) - m - (H(X) \bullet X)_T)^+\big] + b_1 (k + L) \big(\mathcal{AW}_1(\mathbb{P}, \mathbb{Q}) + 1/n\big). \tag{4.2} \]
Assume first that $E_{\mathbb{Q}}[[Y]_T] < \infty$ and denote by $A$ the finite variation process associated to $Y$. Then, as $(G^n)$ is uniformly bounded by $k$, there exists a predictable $G$ and a sequence of forward-convex combinations of $(G^n)$ which converge in $L^2(d\mathbb{Q} \otimes d([Y] + A))$ to $G$. This, (4.2), and the convexity of $(\cdot)^+$ lead to the desired conclusion. The general case follows by a simple but notationally heavy localization argument.

The proof in the case that $G = H$ and $H$ is Lipschitz follows analogously from the second part of Theorem 3.10.

4.2. Proof of Theorem 1.6.

In a first step notice that for all $\mathbb{P}, \mathbb{P}'$ and random variables $Z, Z'$, it follows as in Lemma 4.1 that
\[ \mathrm{AVaR}^{\mathbb{P}}_\alpha(Z) - \mathrm{AVaR}^{\mathbb{P}'}_\alpha(Z') \le d^w_1(Z(\mathbb{P}), Z'(\mathbb{P}'))/\alpha. \]
Indeed, if $\gamma$ is a coupling from $\mu := Z(\mathbb{P})$ to $\nu := Z'(\mathbb{P}')$ then
\[
\begin{aligned}
\mathrm{AVaR}^{\mathbb{P}}_\alpha(Z) - \mathrm{AVaR}^{\mathbb{P}'}_\alpha(Z') &= \inf_m \int \Big(\frac{1}{\alpha}(x - m)^+ + m\Big)\, \mu(dx) - \inf_m \int \Big(\frac{1}{\alpha} \int (y - m)^+\, \gamma_x(dy) + m\Big)\, \mu(dx) \\
&\le \sup_m \frac{1}{\alpha} \int (x - m)^+ - (y - m)^+\, \gamma(dx, dy) \\
&\le \sup_m \frac{1}{\alpha} \int (x - m)^+ - \Big(\int y\, \gamma_x(dy) - m\Big)^+\, \mu(dx) \\
&\le \frac{1}{\alpha} \int \Big|x - \int y\, \gamma_x(dy)\Big|\, \mu(dx),
\end{aligned}
\]
so minimizing over $\gamma$ yields the claim.

The rest of the proof now follows the line of argumentation of the proof of Theorem 1.5. Fix $\mathbb{P}, \mathbb{Q} \in \mathcal{SM}_1(\Omega)$. Assume only for notational simplicity that there exists a bi-causal coupling $\pi$ which attains the infimum in the definition of $\mathcal{AW}_1(\mathbb{P}, \mathbb{Q})$, and that there exists $H^* \in \mathcal{H}_k$ such that
\[ \mathrm{AVaR}^{\mathbb{P}}_\alpha(C(X) - (H^*(X) \bullet X)_T) = \inf_{H \in \mathcal{H}_k} \mathrm{AVaR}^{\mathbb{P}}_\alpha(C(X) - (H(X) \bullet X)_T). \]
By Lemma 3.7 there is $G^* \in \mathcal{H}_k$ such that $(G^*(Y) \bullet Y)_T = E_\pi[(H^*(X) \bullet Y)_T \mid Y]$ $\pi$-almost surely. Therefore
\[
\begin{aligned}
&\inf_{G \in \mathcal{H}_k} \mathrm{AVaR}^{\mathbb{Q}}_\alpha(C(Y) - (G(Y) \bullet Y)_T) - \inf_{H \in \mathcal{H}_k} \mathrm{AVaR}^{\mathbb{P}}_\alpha(C(X) - (H(X) \bullet X)_T) \\
&\qquad \le \mathrm{AVaR}^{\mathbb{Q}}_\alpha(C(Y) - (G^*(Y) \bullet Y)_T) - \mathrm{AVaR}^{\mathbb{P}}_\alpha(C(X) - (H^*(X) \bullet X)_T) \\
&\qquad \le \frac{1}{\alpha}\, d^w_1\Big((C(Y) - (G^*(Y) \bullet Y)_T)(\mathbb{Q}),\ (C(X) - (H^*(X) \bullet X)_T)(\mathbb{P})\Big) \\
&\qquad \le \frac{b_1 (k + L)}{\alpha}\, \mathcal{AW}_1(\mathbb{P}, \mathbb{Q}),
\end{aligned}
\]
where the last inequality is due to Theorem 3.10. Interchanging the roles of $\mathbb{P}$ and $\mathbb{Q}$ yields the desired conclusion. The proof for the second estimate follows analogously.

4.3. Proof of Example 1.7.

First note that $\mathrm{AVaR}^{\mathbb{P}}_\alpha(Z) \ge E_{\mathbb{P}}[Z]$ for every integrable random variable $Z$. Indeed, this follows from integrating the pointwise inequality $x = x + m - m \le (x + m)^+/\alpha - m$. Therefore, as the Brownian stochastic integral has expectation zero, we conclude that $\inf_{H \in \mathcal{H}_k} \mathrm{AVaR}^{\mathbb{P}}_\alpha(C(X) - (H(X) \bullet X)_T) \ge E_{\mathbb{P}}[C(X)]$. On the other hand, define
\[ f(t, x) := \int c(x + y)\, \mathcal{N}(0, \sigma^2 (T - t))(dy) \quad\text{for } (t, x) \in [0, T] \times \mathbb{R}, \]
where $\mathcal{N}(0, \sigma^2 (T - t))$ stands for the normal distribution with mean $0$ and variance $\sigma^2 (T - t)$. Then $C(X) = f(T, X_T)$ and $E_{\mathbb{P}}[f(t, X_t) \mid \mathcal{F}_s] = f(s, X_s)$ for every $0 \le s \le t \le T$. Thus, by Itô's formula and the fact that the martingale property implies that the finite variation part vanishes, one has $f(T, X_T) = f(0, 0) + (H^*(X) \bullet X)_T$ for the predictable trading strategy $H^*_t := \partial_x f(t, X_t)$. As further $|H^*_t| \le 1$ for all $t$ and $f(0, 0) = \sigma/\sqrt{\pi}$, one has
\[ \inf_{H \in \mathcal{H}_1} \mathrm{AVaR}^{\mathbb{P}}_\alpha(C(X) - (H(X) \bullet X)_T) \le \mathrm{AVaR}^{\mathbb{P}}_\alpha(C(X) - (H^*(X) \bullet X)_T) = \frac{\sigma}{\sqrt{\pi}}. \]
The proof now follows from the explicit formula for the adapted Wasserstein distance derived in Example 3.4 and the fact that $E_{\mathbb{P}}[C(X)] = \sigma/\sqrt{\pi}$.

4.4. Proof of Theorem 1.8.

Recall that $U'(x) \le c(1 + |x|^{p-1})$ for all $x \in \mathbb{R}$ and some constant $c$. Let $\mathbb{P}, \mathbb{Q} \in \mathcal{SM}_p(\Omega)$ be arbitrary and assume only for notational simplicity that there is $H^* \in \mathcal{H}_k$ such that
\[ E_{\mathbb{P}}[U(C(X) + (H^*(X) \bullet X)_T)] = \sup_{H \in \mathcal{H}_k} E_{\mathbb{P}}[U(C(X) + (H(X) \bullet X)_T)] \]
and that there is a bi-causal coupling $\pi$ between $\mathbb{P}$ and $\mathbb{Q}$ which is optimal for $\mathcal{AW}_p(\mathbb{P}, \mathbb{Q})$. By Lemma 3.7 there is $G^* \in \mathcal{H}_k$ such that $(G^*(Y) \bullet Y)_T = E_\pi[(H^*(X) \bullet Y)_T \mid Y]$ $\pi$-almost surely. Let
\[ \mu := (C(Y) + (G^*(Y) \bullet Y)_T)(\mathbb{Q}) \quad\text{and}\quad \nu := (C(X) + (H^*(X) \bullet X)_T)(\mathbb{P}), \]
and let $\gamma$ be an (almost) optimal coupling for $d^w_p(\mu, \nu)$. As $U$ is concave and increasing, we have
\[ U(y) - U(x) \le U'(\min\{x, y\})\, |x - y|. \]
Using Jensen's inequality for the concave function $U$ we have
\[
\begin{aligned}
&\sup_{H \in \mathcal{H}_k} E_{\mathbb{P}}[U(C(X) + (H(X) \bullet X)_T)] - \sup_{G \in \mathcal{H}_k} E_{\mathbb{Q}}[U(C(Y) + (G(Y) \bullet Y)_T)] \\
&\qquad \le E_{\mathbb{P}}[U(C(X) + (H^*(X) \bullet X)_T)] - E_{\mathbb{Q}}[U(C(Y) + (G^*(Y) \bullet Y)_T)] \\
&\qquad = \int U(y) - U(x)\, \gamma(dx, dy) \\
&\qquad \le \int U\Big(\int y\, \gamma_x(dy)\Big) - U(x)\, \mu(dx) \\
&\qquad \le \Big(\int \Big|U'\Big(\min\Big\{x, \int y\, \gamma_x(dy)\Big\}\Big)\Big|^q\, \mu(dx)\Big)^{1/q}\, d^w_p(\mu, \nu),
\end{aligned}
\]
where we used Hölder's inequality in the last line and $q$ denotes the conjugate Hölder exponent of $p$ (that is, $1/p + 1/q = 1$). As $q(p - 1) = p$, the growth assumption on $U'$ implies that $|U'(\min\{x, y\})|^q \le c(1 + |x|^p + |y|^p)$ for some (new) constant $c$. Then, by Lemma 3.9, we have
\[
\begin{aligned}
\int \Big|U'\Big(\min\Big\{x, \int y\, \gamma_x(dy)\Big\}\Big)\Big|^q\, \mu(dx) &\le c\Big(1 + \int |x|^p\, \mu(dx) + \int \Big|\int y\, \gamma_x(dy)\Big|^p\, \mu(dx)\Big) \\
&\le c\Big(1 + \int |x|^p\, \mu(dx) + \int |y|^p\, \nu(dy)\Big) \\
&\le \tilde c\big(1 + \mathcal{AW}_p(\mathbb{Q}, \delta_0)^p + \mathcal{AW}_p(\mathbb{P}, \delta_0)^p\big) \le e
\end{aligned}
\]
for $e := \tilde c(1 + R^p + R^p)$. Exchanging the roles of $\mathbb{P}$ and $\mathbb{Q}$ and using Theorem 3.10 completes the proof.

4.5. The proof of Theorem 1.9.

In a first step, we claim that $v(\mathbb{P})$ is uniformly bounded over all $\mathbb{P}$ with $\mathcal{AW}_p(\mathbb{P}, \delta_0) \le R$. Indeed, using the growth assumption on $U$, the fact that $U$ is strictly increasing, and the BDG-inequality to control the $p$-th moment of $(H \bullet X)_T$, it follows that there exist $a, A \in \mathbb{R}$ such that
\[ \inf U < a \le \sup_{H \in \mathcal{H}_k} E_{\mathbb{P}}[U((H \bullet X)_T)] \le A < \sup U \tag{4.3} \]
for all $\mathbb{P}$ with $\mathcal{AW}_p(\mathbb{P}, \delta_0) \le R$. Now assume that there exists a sequence $\mathbb{P}_n$ with $\mathcal{AW}_p(\mathbb{P}_n, \delta_0) \le R$ but $v(\mathbb{P}_n) \to \infty$. Then, using the BDG-inequality once more, it follows that
\[ \sup_{H \in \mathcal{H}_k} E_{\mathbb{P}_n}[U(C - v(\mathbb{P}_n) + (H \bullet X)_T)] \to \inf U, \]
a contradiction to (4.3). The case $v(\mathbb{P}_n) \to -\infty$ is excluded analogously.

At this point, using the definition of $v(\mathbb{P})$, a twofold application of Theorem 1.8 yields
\[ \Big|\sup_{H \in \mathcal{H}_k} E_{\mathbb{Q}}[U(C - v(\mathbb{P}) + (H \bullet X)_T)] - \sup_{H \in \mathcal{H}_k} E_{\mathbb{Q}}[U((H \bullet X)_T)]\Big| \le K \cdot \mathcal{AW}_p(\mathbb{P}, \mathbb{Q}). \]
Indeed, while a direct application of the theorem would give a constant $K$ which depends on $v(\mathbb{P})$, an inspection of its proof shows that the constant $K$ depends only on the size of $v(\mathbb{P})$. By the first step this is bounded uniformly over $\mathbb{P}$ with $\mathcal{AW}_p(\mathbb{P}, \delta_0) \le R$.

Now let $\varepsilon > 0$ and $H \in \mathcal{H}_k$ be arbitrary, and set $Y := Y^H := C - v(\mathbb{P}) + (H \bullet X)_T$. Then, it follows that there is some constant $c > 0$ (depending only on $R \ge \mathcal{AW}_p(\mathbb{Q}, \delta_0)$ and $U$) such that
\[ E_{\mathbb{Q}}[U(Y + \varepsilon)] = E_{\mathbb{Q}}[U(Y)] + E_{\mathbb{Q}}\Big[\int_Y^{Y + \varepsilon} U'(z)\, dz\Big] \ge E_{\mathbb{Q}}[U(Y)] + \varepsilon c. \]
Indeed, this would follow directly if $Y$ were bounded by a fixed constant but readily extends to the present setting as $E_{\mathbb{Q}}[|Y|^p] \le C$ for some constant $C > 0$ independent of $H$ and $\mathbb{Q}$ as long as $\mathcal{AW}_p(\mathbb{Q}, \delta_0) \le R$. In a similar manner $E_{\mathbb{Q}}[U(Y - \varepsilon)] \le E_{\mathbb{Q}}[U(Y)] - \varepsilon c$.

Putting everything together reveals
\[ \sup_{H \in \mathcal{H}_k} E_{\mathbb{Q}}[U(C - (v(\mathbb{P}) + \varepsilon) + (H \bullet X)_T)] < \sup_{H \in \mathcal{H}_k} E_{\mathbb{Q}}[U((H \bullet X)_T)] < \sup_{H \in \mathcal{H}_k} E_{\mathbb{Q}}[U(C - (v(\mathbb{P}) - \varepsilon) + (H \bullet X)_T)] \]
for some $\varepsilon < \tilde C\, \mathcal{AW}_p(\mathbb{P}, \mathbb{Q})$ (where $\tilde C$ is a new constant emerging from $K$ and $c$). Thus $|v(\mathbb{Q}) - v(\mathbb{P})| \le \varepsilon \le \tilde C\, \mathcal{AW}_p(\mathbb{P}, \mathbb{Q})$, which completes the proof.

4.6. Two generalizations.

The following two results can be proved using almost the same arguments as used in the proofs of Theorem 1.6 and Theorem 1.8. In particular the proofs boil down to establishing convergence for image measures with respect to $d_p$ and give no new insight on adapted Wasserstein distances, so we shall skip them.

Proposition 4.2.

Let $\ell : \mathbb{R} \to \mathbb{R}_+$ be a convex and strictly increasing function and let $\delta > 0$. Assume that $p \ge 1$ is such that $\ell'(x) \le c(1 + |x|^{p-1})$ for some constant $c$. Then, for every Lipschitz continuous function $C : \Omega \to \mathbb{R}$, the function
\[ \mathbb{P} \mapsto \inf\big\{m \in \mathbb{R} : \exists H \in \mathcal{H}_k \text{ such that } E_{\mathbb{P}}[\ell(C(X) - (H(X) \bullet X)_T - m)] \le \delta\big\} \]
is continuous on $(\mathcal{SM}_p(\Omega), \mathcal{AW}_p)$.

Let $\rho$ be a law-invariant risk measure which we directly view as a functional from $\mathcal{P}_p(\mathbb{R})$ to the reals. For $\mathbb{P} \in \mathcal{SM}_p(\Omega)$ and a random variable $Z : \Omega \to \mathbb{R}$ (such that $Z(\mathbb{P}) \in \mathcal{P}_p(\mathbb{R})$) we write $\rho^{\mathbb{P}}(Z) = \rho(Z(\mathbb{P}))$. A typical example of a law-invariant risk measure which satisfies $\rho(\mu) - \rho(\nu) \le L\, d^w_1(\mu, \nu)$ for some constant $L$ depending on the $p$-th moment of $\mu$ and $\nu$ is the optimized certainty equivalent, introduced to the mathematical finance community in [19]. For a convex, increasing function $\ell : \mathbb{R} \to \mathbb{R}$ which is bounded from below and satisfies $\ell(x)/x \to \infty$ as $x \to \infty$, the optimized certainty equivalent is defined via
\[ \rho^{\mathbb{P}}(Z) := \inf_{m \in \mathbb{R}} \big(E_{\mathbb{P}}[\ell(Z - m)] + m\big) = \inf_{m \in \mathbb{R}} \Big(\int \ell(x - m)\, (Z(\mathbb{P}))(dx) + m\Big). \]
If $\ell'(x) \le c(1 + |x|^{p-1})$, then it follows that the infimum over $m$ can be taken in some compact set depending on the $p$-th moments. Due to cash additivity of $\rho$, the following proposition has the same interpretation as Theorem 1.6.

Proposition 4.3.

Assume that $\rho : \mathcal{P}_p(\mathbb{R}) \to \mathbb{R}$ satisfies $\rho(\mu) - \rho(\nu) \le L\, d^w_1(\mu, \nu)$ for some constant $L$ depending on the $p$-th moment of $\mu$ and $\nu$. Then, for every Lipschitz function $C : \Omega \to \mathbb{R}$, the mapping
\[ \mathbb{P} \mapsto \inf_{H \in \mathcal{H}_k} \rho^{\mathbb{P}}(C(X) - (H(X) \bullet X)_T) \]
is locally Lipschitz continuous on $(\mathcal{SM}_p(\Omega), \mathcal{AW}_p)$.

Finally, let us point out that (though not a convex risk measure) the Value-at-Risk (VaR) would be another natural candidate to study continuity. However, as VaR is not continuous w.r.t. weak convergence, already in a one period model continuity of
\[ \mathbb{P} \mapsto \inf\big\{m \in \mathbb{R} : \text{there is } H \in \mathcal{H}_k \text{ with } \mathrm{VaR}^{\mathbb{P}}(C(X) - m - (H(X) \bullet X)_T) \le 0\big\} \]
does not hold.

Final remarks

Remark 5.1 (Usual Wasserstein does not work I). We note that convergence in the usual Wasserstein distance is not sufficient to obtain continuity in any of the problems we study in this paper. Consider a two period market with
\[ P_n = \tfrac14 \big( \delta_{(1/n,\ )} + \delta_{(1/n,\ )} + \delta_{(-1/n,\ )} + \delta_{(-1/n,\ -\ )} \big), \qquad P = \tfrac14 \big( \delta_{(0,\ )} + 2\delta_{(0,\ )} + \delta_{(0,\ -\ )} \big). \]
Then $P$ and each $P_n$ satisfy the classical no-arbitrage condition, unlike the situation described in Figure 1. While $P_n$ converges to $P$ in the usual Wasserstein distance, one can verify that convergence in nested distance does not hold. For example, in utility maximization of the trivial claim $C = 0$, we have $\sup_{H \in \mathcal{H}^k} E_P[U(C(X) + (H(X) \bullet X)_T)] = U(0)$ by Jensen's inequality (as $X$ is a martingale under $P$). For $P_n$, taking the strategy $H^*$ consisting of $H^*_1 = 0$ and $H^*_2(x) = k\,\mathrm{sign}(x_1)$, one gets
\[ \sup_{H \in \mathcal{H}^k} E_{P_n}[U(C(X) + (H(X) \bullet X)_T)] \ge E_{P_n}[U(C(X) + (H^*(X) \bullet X)_T)] \to U(k), \]
showing the lack of continuity.

Remark 5.2 (Usual Wasserstein does not work II). As explained in the introduction, the objective in Theorem 1.5 can be seen as a relaxed version of the superhedging problem. The reason to consider this relaxation is not a technical simplification; it is necessary to obtain continuity without further assumptions. Indeed, the problem of superhedging
\[ \inf\big\{ m \in \mathbb{R} : \text{there is } H \in \mathcal{H}^k \text{ such that } m + (H \bullet X)_T \ge C(X),\ P\text{-almost surely} \big\} \]
is not continuous in $P$ w.r.t. the adapted distance for any $k \in [0, \infty]$. In fact, this already happens in one period, where the adapted and the usual Wasserstein distances coincide. Consider a sequence of measures $P_n$ with full support which converge weakly to a measure $P$. Then the superhedging price w.r.t. $P_n$ equals the concave envelope of $C$, while the superhedging price w.r.t. $P$ equals the concave envelope of $C$ restricted to the support of $P$. For a recent paper on this problem in one period, see the work of Obłój and Wiesel [49].
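The discontinuity described in Remark 5.2 can be made explicit in a small worked example. The following is only a sketch under assumptions of our own that do not appear in the text above: a one-period market with $X_0 = 0$, the hypothetical claim $C(X) = \min(|X_1|, 1)$, and hypothetical measures $P_n$, $P$, identified with the law of $X_1$. The symbol $\pi$ for the superhedging price is likewise introduced only for this sketch. The measures $P_n$ are finitely supported rather than of full support, which already suffices to exhibit the jump.

```latex
% Hypothetical one-period example: X_0 = 0, so (H \bullet X)_T = H X_1
% for a constant H with |H| \le k.
\[
  P_n := \Big(1 - \tfrac{1}{n}\Big)\delta_{0} + \tfrac{1}{2n}\big(\delta_{1} + \delta_{-1}\big),
  \qquad P := \delta_{0},
  \qquad C(X) := \min(|X_1|, 1).
\]
% Convergence in the usual (= adapted, in one period) Wasserstein distance:
% all mass of P_n is transported to the single atom 0.
\[
  \mathcal{W}_p(P_n, P)^p = \int |x|^p \, P_n(dx) = \tfrac{1}{n} \longrightarrow 0 .
\]
% Superhedging under P_n: one constraint per atom x_1 \in \{0, 1, -1\}.
\[
  m \ge 0, \qquad m + H \ge 1, \qquad m - H \ge 1
  \quad \Longrightarrow \quad m \ge 1 + |H| \ge 1 ,
\]
% so the price is 1, attained at H = 0, for every k \in [0, \infty].
% Under P only the constraint m \ge C(0) = 0 remains.
\[
  \pi(P_n) = 1 \quad \text{for all } n, \qquad \pi(P) = 0 .
\]
```

These values agree with the description in the remark: the concave envelope of $C$ restricted to $\{-1, 0, 1\}$ equals $1$ at the point $0$, whereas the concave envelope of $C$ restricted to $\{0\}$ equals $0$ there.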
Remark 5.3 (Uniformly bounded strategies are necessary). Similarly to Remark 5.2, the restriction to trading strategies in $\mathcal{H}^k$ (i.e. uniformly bounded strategies) is also no technical simplification. For example, in a one-period framework, the measures $P_\varepsilon := (1 - \varepsilon)\delta_{(0,\varepsilon)} + \varepsilon\delta_{(0,-\varepsilon)}$ converge to $P := \delta_{(0,0)}$ in every (adapted) Wasserstein distance. However, we have for small $\varepsilon > 0$
\[ \inf_{H \in \mathcal{H}^\infty} \mathrm{AVaR}^{P_\varepsilon}_\alpha\big((H \bullet X)_T\big) = -\infty \quad \text{while} \quad \inf_{H \in \mathcal{H}^\infty} \mathrm{AVaR}^{P}_\alpha\big((H \bullet X)_T\big) = 0, \]
where $\mathcal{H}^\infty := \bigcup_{k \in \mathbb{N}} \mathcal{H}^k$ is the set of all bounded trading strategies.

Acknowledgements. All authors are grateful to the anonymous referees whose insightful comments had a significant impact on this article. J. Backhoff gratefully acknowledges financial support by the FWF through grant P30750 and by the Vienna University of Technology. D. Bartl has been funded by the Austrian Science Fund (FWF) under Project P28661. M. Beiglboeck and M. Eder gratefully acknowledge financial support by the FWF through grant Y782.


References
[1] B. Acciaio, J. Backhoff-Veraguas, and A. Zalashko. Causal optimal transport and its links to enlargement of filtrations and continuous-time stochastic optimization. Forthcoming at Stoch. Processes and their Applications, 2016.
[2] D. J. Aldous. Weak convergence and general theory of processes. Unpublished monograph; Department of Statistics, University of California, Berkeley, CA 94720, July 1981.
[3] A. Alfonsi, J. Corbetta, and B. Jourdain. Sampling of one-dimensional probability measures in the convex order and computation of robust option price bounds. International Journal of Theoretical and Applied Finance, 22(03):1950002, 2019.
[4] J.-J. Alibert, G. Bouchitte, and T. Champion. A new class of cost for optimal transport planning. hal-preprint, 2018.
[5] M. Avellaneda, A. Levy, and A. Paràs. Pricing and hedging derivative securities in markets with uncertain volatilities. Appl. Math. Finance, 2(2):73–88, 1995.
[6] J. Backhoff-Veraguas, D. Bartl, M. Beiglböck, and M. Eder. All Adapted Topologies are Equal. arXiv e-prints, page arXiv:1905.00368, May 2019.
[7] J. Backhoff-Veraguas, D. Bartl, M. Beiglböck, and J. Wiesel. Estimating processes in adapted Wasserstein distance. arXiv e-prints, 2020.
[8] J. Backhoff-Veraguas, M. Beiglböck, M. Eder, and A. Pichler. Fundamental properties of process distances. ArXiv e-prints, 2017.
[9] J. Backhoff-Veraguas, M. Beiglböck, M. Huesmann, and S. Källblad. Martingale Benamou–Brenier: a probabilistic perspective. To appear in Annals of Probability, Aug. 2018.
[10] J. Backhoff-Veraguas, M. Beiglböck, Y. Lin, and A. Zalashko. Causal transport in discrete time and applications. SIAM Journal on Optimization, 27(4):2528–2562, 2017.
[11] J. Backhoff-Veraguas, M. Beiglböck, and G. Pammer. Existence, duality, and cyclical monotonicity for weak transport costs. Calculus of Variations and Partial Differential Equations, 58(6):203, 2019.
[12] J. Backhoff-Veraguas and G. Pammer. Stability of martingale optimal transport and weak optimal transport. arXiv e-prints, page arXiv:1904.04171, Apr 2019.
[13] D. Bartl, S. Drapeau, J. Obłój, and J. Wiesel. Private communication.
[14] D. Becherer and K. Kentia. Good deal hedging and valuation under combined uncertainty about drift and volatility. Probab. Uncertain. Quant. Risk, 2:Paper No. 13, 40, 2017.
[15] M. Beiglböck, P. Henry-Labordère, and F. Penkner. Model-independent bounds for option prices: A mass transport approach. Finance Stoch., 17(3):477–501, 2013.
[16] M. Beiglböck and N. Juillet. On a problem of optimal transport under marginal martingale constraints. Ann. Probab., 44(1):42–106, 2016.
[17] M. Beiglböck and P. Siorpaes. Pathwise versions of the Burkholder-Davis-Gundy inequality. Bernoulli, 21(1):360–373, 2015.
[18] M. Beiglboeck, A. Cox, and M. Huesmann. The geometry of multi-marginal Skorokhod Embedding. PTRF, to appear, page arXiv:1705.09505, May 2019.
[19] A. Ben-Tal and M. Teboulle. An old-new concept of convex risk measures: The optimized certainty equivalent. Mathematical Finance, 17(3):449–476, 2007.
[20] J. Bion-Nadal and D. Talay. On a Wasserstein-type distance between solutions to stochastic differential equations. Ann. Appl. Probab., 29(3):1609–1639, 2019.
[21] B. Bouchard and M. Nutz. Arbitrage and duality in nondominated discrete-time models. The Annals of Applied Probability, 25(2):823–859, 2015.
[22] D. L. Burkholder. Explorations in martingale theory and its applications. In École d'Été de Probabilités de Saint-Flour XIX–1989, volume 1464 of Lecture Notes in Math., pages 1–66. Springer, Berlin, 1991.
[23] D. L. Burkholder. The best constant in the Davis inequality for the expectation of the martingale square function. Trans. Amer. Math. Soc., 354(1):91–105 (electronic), 2002.
[24] L. Campi, I. Laachir, and C. Martini. Change of numeraire in the two-marginals martingale transport problem. Finance Stoch., 21(2):471–486, June 2017.
[25] R. Cont. Model uncertainty and its impact on the pricing of derivative instruments. Mathematical Finance, 16(3):519–547, 2006.
[26] Y. Dolinsky and H. M. Soner. Martingale optimal transport and robust hedging in continuous time. Probab. Theory Relat. Fields, 160(1-2):391–427, 2014.
[27] M. Eder. Compactness in Adapted Weak Topologies. arXiv e-prints, page arXiv:1905.00856, May 2019.
[28] N. El Karoui, M. Jeanblanc, and S. Shreve. Robustness of the Black and Scholes formula. Math. Finance, 8(2):93–126, 1998.
[29] A. Galichon, P. Henry-Labordère, and N. Touzi. A stochastic control approach to no-arbitrage bounds given marginals, with an application to lookback options. Ann. Appl. Probab., 24(1):312–336, 2014.
[30] M. Glanzer, G. C. Pflug, and A. Pichler. Incorporating statistical model error into the calculation of acceptability prices of contingent claims. Mathematical Programming, 174(1-2):499–524, 2019.
[31] N. Gozlan, C. Roberto, P.-M. Samson, Y. Shu, and P. Tetali. Characterization of a class of weak transport-entropy inequalities on the line. Ann. Inst. Henri Poincaré Probab. Stat., 54(3):1667–1693, 2018.
[32] N. Gozlan, C. Roberto, P.-M. Samson, and P. Tetali. Kantorovich duality for general transport costs and applications. J. Funct. Anal., 273(11):3327–3405, 2017.
[33] M. F. Hellwig. Sequential decisions under uncertainty and the maximum theorem. J. Math. Econom., 25(4):443–464, 1996.
[34] S. Herrmann and J. Muhle-Karbe. Model uncertainty, recalibration, and the emergence of delta–vega hedging. Finance and Stochastics, 21(4):873–930, Oct 2017.
[35] S. Herrmann, J. Muhle-Karbe, and F. T. Seifried. Hedging with small uncertainty aversion. Finance and Stochastics, 21(1):1–64, Jan 2017.
[36] D. Hobson. Robust hedging of the lookback option. Finance and Stochastics, 2:329–347, 1998.
[37] D. Hobson. The Skorokhod embedding problem and model-independent bounds for option prices. In Paris-Princeton Lectures on Mathematical Finance 2010, volume 2003 of Lecture Notes in Math., pages 267–318. Springer, Berlin, 2011.
[38] D. Hobson and A. Neuberger. Robust bounds for forward start options. Math. Finance, 22(1):31–56, 2012.
[39] D. G. Hobson. Volatility misspecification, option pricing and superreplication via coupling. Ann. Appl. Probab., 8(1):193–205, 1998.
[40] I. Karatzas and S. Shreve. Brownian motion and stochastic calculus, volume 113. Springer Science & Business Media, 2012.
[41] C. Kardaras and G. Žitković. Stability of the utility maximization problem with random endowment in incomplete markets. Math. Finance, 21(2):313–333, 2011.
[42] D. Lacker. Dense sets of joint distributions appearing in filtration enlargements, stochastic control, and causal optimal transport. ArXiv e-prints, 2018.
[43] K. Larsen. Continuity of utility-maximization with respect to preferences. Math. Finance, 19(2):237–250, 2009.
[44] K. Larsen and G. Žitković. Stability of utility-maximization in incomplete markets. Stochastic Process. Appl., 117(11):1642–1662, 2007.
[45] R. Lassalle. Causal transference plans and their Monge-Kantorovich problems. Stochastic Analysis and Applications, 36(3):452–484, 2018.
[46] T. J. Lyons. Uncertain volatility and the risk-free synthesis of derivatives. Applied Mathematical Finance, 2(2):117–133, 1995.
[47] M. Mocha and N. Westray. The stability of the constrained utility maximization problem: a BSDE approach. SIAM J. Financial Math., 4(1):117–150, 2013.
[48] J. Obłój. The Skorokhod embedding problem and its offspring. Probab. Surv., 1:321–390, 2004.
[49] J. Obłój and J. Wiesel. Statistical estimation of superhedging prices. ArXiv e-prints, 2018.
[50] A. Osekowski. Sharp maximal inequalities for the martingale square bracket. Stochastics: An International Journal of Probability and Stochastics Processes, 82(06):589–605, 2010.
[51] A. Osekowski. Sharp martingale and semimartingale inequalities, volume 72 of Instytut Matematyczny Polskiej Akademii Nauk. Monografie Matematyczne (New Series) [Mathematics Institute of the Polish Academy of Sciences. Mathematical Monographs (New Series)]. Birkhäuser/Springer Basel AG, Basel, 2012.
[52] G. C. Pflug. Version-independence and nested distributions in multistage stochastic optimization. SIAM Journal on Optimization, 20(3):1406–1420, 2009.
[53] G. C. Pflug and A. Pichler. A distance for multistage stochastic optimization models. SIAM J. Optim., 22(1):1–23, 2012.
[54] G. C. Pflug and A. Pichler. Multistage stochastic optimization. Springer Series in Operations Research and Financial Engineering. Springer, Cham, 2014.
[55] G. C. Pflug and A. Pichler. From empirical observations to tree models for stochastic optimization: convergence properties. SIAM J. Optim., 26(3):1715–1740, 2016.
[56] A. Pratelli. On the equality between Monge's infimum and Kantorovich's minimum in optimal mass transportation. Ann. Inst. H. Poincaré Probab. Statist., 43(1):1–13, 2007.
[57] J.-L. Prigent. Weak convergence of financial markets. In Weak Convergence of Financial Markets, pages 129–265. Springer, 2003.
[58] L. Rüschendorf. The Wasserstein distance and approximation theorems. Z. Wahrsch. Verw. Gebiete, 70(1):117–129, 1985.
[59] W. Schachermayer and F. Stebegg. The Sharp Constant for the Burkholder-Davis-Gundy Inequality and Non-Smooth Pasting. Bernoulli, to appear, July 2017.
[60] K. Weston. Stability of utility maximization in nonequivalent markets. Finance and Stochastics, 20(2):511–541, 2016.
[61] J. Wiesel. Continuity of the martingale optimal transport problem on the real line. arXiv e-prints, page arXiv:1905.04574, May 2019.
[62] T. Yamada and S. Watanabe. On the uniqueness of solutions of stochastic differential equations.