Distributionally Robust XVA via Wasserstein Distance: Wrong Way Counterparty Credit and Funding Risk

Abstract

This paper investigates calculations of robust XVA, in particular, credit valuation adjustment (CVA) and funding valuation adjustment (FVA) for over-the-counter derivatives under distributional uncertainty using Wasserstein distance as the ambiguity measure. Wrong way counterparty credit risk and funding risk can be characterized (and indeed quantified) via the robust XVA formulations. The simpler dual formulations are derived using recent infinite dimensional Lagrangian duality results. Next, some computational experiments are conducted to measure the additional XVA charges due to distributional uncertainty under a variety of portfolio and market configurations. Finally some suggestions for future work are discussed.

Derek Singh, Shuzhong Zhang

Department of Industrial and Systems Engineering, University of [email protected], [email protected]

Keywords: CCR, CVA, FVA, derivatives, distributional robust optimization, Wasserstein distance, Lagrangian duality

An X-Value adjustment (XVA) is a generic term used to refer to various valuation adjustments, typically applied to over-the-counter (OTC) derivatives held by financial institutions. The first of the XVAs, and still one of the most significant, in terms of market exposure, is CVA. One of the more recent, and perhaps equally significant, exposures is FVA. Both of these XVAs have similar structure (unilateral, bilateral) and mathematical form for computation. Other XVAs include capital valuation adjustment (KVA) and margin valuation adjustment (MVA). Wrong way risk refers to adversely correlated moves in the market exposures and the counterparty spreads (e.g. credit, funding). It can materially affect the magnitude of the XVA adjustment.

Credit valuation adjustment (CVA) represents the impact on portfolio market value due to counterparty default. Unilateral CVA can be represented mathematically as an integral of discounted expected positive exposure times (incremental) counterparty default probability. The market valuation is a function of counterparty credit risk, the underlying (market) risk factors that drive the portfolio valuation (and hence positive exposure), as well as the correlations between these market risk factors and the counterparty credit risk curves for a given portfolio. CVA is typically measured and reported at the counterparty level.

The “other side” of unilateral CVA is unilateral debit valuation adjustment (DVA). This is the benefit to the firm, of its reduced liability, as measured by discounted expected negative exposure times firm default probability. As above, the market valuation is a function of firm credit risk, underlying market risk factors that drive portfolio valuation, and the correlations. Unilateral DVA can be represented mathematically as an integral of discounted negative exposure times (incremental) firm default probability.
DVA is typically measured at the firm level.

Bilateral CVA represents the dual impact on portfolio market value due to counterparty default and firm default. Bilateral CVA can be represented mathematically as the difference between two integrals: (i) discounted expected positive exposure times (incremental) counterparty default probability prior to firm default, (ii) discounted expected negative exposure times (incremental) firm default probability prior to counterparty default. Bilateral CVA is typically measured and reported at the counterparty level, for a given firm.

Funding valuation adjustment (FVA) represents the impact on portfolio market value due to funding exposures for the hedge on uncollateralized derivatives. It represents the market value of funding exposure risk. Funding cost adjustment (FCA) can be represented mathematically as an integral of discounted expected positive exposure times funding cost (incremental) conditional on joint counterparty and firm survival. FCA arises for a positive portfolio exposure since this implies a negative hedge exposure which leads to a funding cost for collateral posted. The market valuation is a function of joint counterparty and firm credit risk, the underlying (market) risk factors that drive the portfolio valuation (and hence positive exposure) as well as funding cost, as well as the correlations between these market risk factors and the credit risk curves for a given portfolio. FCA is typically measured and reported at the funding netting set level.

The “other side” of FCA is funding benefit adjustment (FBA). This represents the funding benefit to the firm, for interest income proceeds on received collateral posted against counterparty exposure on the hedge, as measured by discounted expected negative exposure times funding benefit conditional on joint counterparty and firm survival.
As above, the market valuation is a function of counterparty and firm credit risk, underlying market risk factors that drive portfolio valuation and funding benefit, and the correlations. FBA can be represented mathematically as an integral of discounted negative exposure times funding benefit conditional on joint counterparty and firm survival. FBA is typically measured at the funding netting set level.

(Bilateral) FVA represents the dual impact on portfolio market value due to both funding cost and funding benefit exhibited over the portfolio lifetime. FVA can be represented mathematically as the difference (or sum) of two integrals: (i) discounted expected positive exposure times funding cost conditional on joint counterparty and firm survival; (ii) discounted expected negative exposure times funding benefit conditional on joint counterparty and firm survival. FVA is typically measured and reported at the netting set level for a given firm.

U.S. regulatory authorities, the Federal Reserve and Office of the Comptroller of the Currency (OCC), periodically assess national banks’ compliance with Market Risk Capital Rule (MRR). Counterparty credit risk (CCR) and funding risk (FR) metrics are key metrics used to evaluate bank risk profiles and balance sheet exposures due to over the counter (OTC) derivatives, securities financing transactions, and other transactions and exposures (Office of the Comptroller of the Currency, 2011). Basel Committee on Banking Supervision has issued supervisory guidance, in the form of its Basel III framework (and supplemental guidance), to quantify capital charges due to CCR. A new element in Basel III was a capital charge due to degradation in CCR for a given portfolio or book of business (Basel Committee on Banking Supervision, 2015).
Potential revisions to the Basel framework may include elements to quantify CCR capital charges due to deterioration in market risk exposure.

The Dodd-Frank Wall Street Reform and Consumer Protection Act (July 2010) enacted regulations for the swaps market and authorized creation of centralized exchanges for swaps (and other) derivatives trading. Derivatives that trade on an exchange reference the exchange as the transaction counterparty. Since exchanges clear multiple (typically offsetting) transactions and hedge their risk through other third parties, exchange traded derivatives have a minimal CCR risk profile. However, OTC derivatives typically have banks or other financial institutions as counterparties which do have material credit risk profiles. According to the International Swaps and Derivatives Association (ISDA) the OTC derivatives notional outstanding was 544 trillion at year end 2018. Interest rate derivatives notional outstanding was 437 trillion at year end 2018. Recent (04/20/20) Bloomberg CDX investment grade and high yield credit spreads are 93 and 643 basis points respectively. Consequently the CCR and FR exposures (due to uncollateralized or partially collateralized hedges) inherent in the OTC derivatives market represent significant market risk exposures. This motivates the concepts of worst case CVA, FVA, and wrong way risk (WWR) and the impact of uncertainty in probability distribution on these exposures and risk metrics. It is these considerations that motivate this line of research (Ramzi Ben-Abdallah and Marzouk, 2019), (El Hajjaji and Subbotin, 2015).

In our study, distributional uncertainty is characterized via the Wasserstein metric for a couple of reasons. The Wasserstein metric is a (reasonably) well understood metric and a natural, intuitive way to compare two probability distributions using ideas of transport cost. It is also a flexible approach that encompasses parametric and non-parametric distributions of either discrete or continuous form.
For example, one can explore distributions that alter the shape of the marginals as well as the correlation structure. Furthermore, recent duality results and structural results on the worst case distributions could help us understand and/or quantify the market model transitions as well as measure (in a relative sense) the degree of wrong way risk inherent to a given market model.

An outline of this paper is as follows. Section 1 gives an overview of CVA, FVA, and WWR as well as a literature review. Section 2 develops the main theoretical results of the paper and provides proof sketches. Section 3 conducts a computational study of WWR for a representative set of derivative portfolios and market environments. Section 4 discusses the conclusions and suggestions for future research. All detailed proofs of propositions, corollaries, and theorems are deferred to the Appendix.

Remark 1. The authors are not aware of any substantial research that has been done on the topic of worst case FVA. The discussion below pertains to literature regarding worst case CVA.

In the past few years some research has been done to investigate and quantify the effect of distributional uncertainty on CVA. Brigo et al. (2013) explicitly incorporate correlation into the stochastic processes driving the market risk and credit default factors. They quantify the effect of dependency structure (and hence wrong way risk) on CVA for a variety of asset classes: interest rate swaps, interest rate swaptions, commodities, equities, and foreign exchange products. Glasserman and Yang (2015) bound the effect of wrong way risk on CVA. Their approach considers a discrete setting for portfolio exposures and counterparty default times and formulates worst case CVA as the solution to a worst case linear program subject to certain constraints (such as fixed marginals for portfolio exposures and default times), where the dependency structure across the risk factors is allowed to vary. As this approach leads to large values for worst case CVA, they introduce a penalty term to modulate or temper the degree of wrong way risk and run some sensitivity analysis to study the effect of the penalty term. Kullback-Leibler (KL) divergence is used to measure the distance between the reference (empirical) and the perturbed distribution. They remark that determining a suitable value for the penalty term would be a topic for further research.

Memartoluie, in his PhD thesis, uses an ordered scenario copula methodology to quantify worst case CVA (Memartoluie, 2017). A particular method of scenario ordering correlates portfolio exposures to company default times (firm, counterparty, or both) and the resulting dependency structure introduces wrong way risk. He chooses to order exposure scenarios by increasing time averaged total exposure and then simulates company default times conditional on the exposure path using pre-specified correlation between the market risk factor(s) and credit risk factor(s).
For worst case correlations set to one, he finds results for worst case CVA that are comparable to the method of Glasserman and Yang (2015). In a recent paper, Ben-Abdallah et al. perform a computational study on the effect of wrong way risk on CVA for a portfolio of interest rate swaps, caps, and floors (Ramzi Ben-Abdallah and Marzouk, 2019). They find that the dependency structure between interest rates and default intensity produces material wrong way risk whereas the dependency structure between interest rate volatility and default intensity does not.

Recent results in Lagrangian duality were independently developed by Blanchet and Murthy (2019) and Gao and Kleywegt (2016). These results hold under mild assumptions such as upper semicontinuity in the objective function and lower semicontinuity in the distance metric. Blanchet et al. (2016a) applied this duality theory to study a number of classical regression problems in machine learning under distributional uncertainty. In that context, the authors find that distributional uncertainty can be viewed as adding a regularization term to the objective function, analogous to a penalized regression setting. Similarly, Gao et al. (2017) apply the Lagrangian duality theory to problems in statistical learning.

The main innovation in our work is to apply these recent results in Lagrangian duality to worst case CVA and FVA using Wasserstein distance as the ambiguity measure. Furthermore, analytical expressions are derived for the solutions to the inner and outer convex optimization problems that comprise worst case CVA and FVA via the Wasserstein approach. A computational study shows the material impact of distributional uncertainty on worst case CVA and FVA, illustrates the risk profiles, and computes the worst case distributions.

In Section 2 we formulate the primal optimization problems for distributionally robust CVA and FVA.
As in our earlier work this year (Singh and Zhang, 2020), a key step in the approach is to use recent Lagrangian duality results to formulate the equivalent dual problems. The dual problems are much more tractable than the primal problems since they only involve the reference probability measure as opposed to a Wasserstein ball of probability measures (of some finite radius). For real valued upper semicontinuous objective function $f \in L^1$ and non-negative lower semicontinuous cost function $c$ such that $\{(u, v) : c(u, v) < \infty\}$ is Borel measurable and non-empty, it holds that (Blanchet et al., 2016b)

$$\sup_{Q \in U_\delta(Q_N)} \mathbb{E}^Q[f(X)] = \inf_{\lambda \ge 0} \Big[ \lambda\delta + \frac{1}{N} \sum_{i=1}^{N} \Psi_\lambda(x_i) \Big]$$

where

$$\Psi_\lambda(x_i) := \sup_{u \in \mathrm{dom}(f)} \big[ f(u) - \lambda c(u, x_i) \big].$$

Further details, including proofs and concrete examples, can be found in the papers by Blanchet and Murthy (2019), Gao and Kleywegt (2016), and Esfahani and Kuhn (2018). These authors independently derived these results around the same time although Blanchet and Murthy (2019) did so in a more general setting.
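To make the duality statement concrete, the following is a minimal Python sketch for a one-dimensional toy case. It is an illustration under stated assumptions, not the method used in the paper: the objective is taken to be $f(u) = \max(u, 0)$, the cost is $c(u, x) = (u - x)^2$, and the dual is minimized with a plain golden-section search. The function names and the sample data are hypothetical. For this choice, $\Psi_\lambda(x) = \max(x + 1/(4\lambda), 0)$, and when all samples are non-negative the dual minimum equals the sample mean plus $\sqrt{\delta}$, attained at $\lambda^* = 1/(2\sqrt{\delta})$.

```python
import math


def psi(lam, x):
    """Closed form of sup_u [max(u, 0) - lam * (u - x)**2] for lam > 0."""
    return max(x + 1.0 / (4.0 * lam), 0.0)


def dual_objective(lam, xs, delta):
    """lam * delta + (1/N) * sum_i psi(lam, x_i), the dual objective."""
    return lam * delta + sum(psi(lam, x) for x in xs) / len(xs)


def golden_section_min(g, lo, hi, tol=1e-10):
    """Minimize a unimodal function g on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if g(c) < g(d):
            b = d  # minimizer lies in [a, d]
        else:
            a = c  # minimizer lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return 0.5 * (a + b)


def robust_expectation(xs, delta, lo=1e-6, hi=1e3):
    """Worst case expectation over the ball of radius delta, via the dual."""
    lam_star = golden_section_min(lambda lam: dual_objective(lam, xs, delta), lo, hi)
    return dual_objective(lam_star, xs, delta), lam_star


# Hypothetical non-negative exposure samples and radius (illustration only).
exposures = [0.5, 1.0, 2.0, 3.5]
delta = 0.04
value, lam_star = robust_expectation(exposures, delta)
```

The search interval `[lo, hi]` is an assumption of the sketch; the dual objective is convex in $\lambda$, so any bracket containing the minimizer works.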

Simply put, the set of worst case distributions (when non-empty) can be defined as

$$WC(f, \delta) := \Big\{ Q^* : \mathbb{E}^{Q^*}[f(X)] = \sup_{Q \in U_\delta(Q_N)} \mathbb{E}^Q[f(X)] \Big\}.$$

Another recent set of results from the literature describes the structure of the worst case distribution(s) when they exist [(Blanchet and Murthy, 2019), (Gao and Kleywegt, 2016), (Esfahani and Kuhn, 2018)]. The boundedness conditions for existence are tied to the growth rate

$$\kappa := \limsup_{d(X, X_0) \to \infty} \frac{f(X) - f(X_0)}{d(X, X_0)}$$

for fixed $X_0$ and the value of the dual minimizer $\lambda^*$. For empirical reference distributions, supported on $N$ points, such that $WC(f, \delta)$ is non-empty, there exists a worst case distribution that is another empirical distribution supported on at most $N + 1$ points. Up to $N$ of these points can be identified as solving

$$x_i^* \in \arg\min_{\tilde{x} \in \mathrm{dom}(f)} \big[ \lambda^* c(\tilde{x}, x_i) - f(\tilde{x}) \big].$$

At most one point has its probability mass split into two pieces (according to budget constraint $\delta$) that solve

$$x_i^*, x_i^{**} \in \arg\min_{\tilde{x} \in \mathrm{dom}(f)} \big[ \lambda^* c(\tilde{x}, x_i) - f(\tilde{x}) \big].$$

Details can be found in Gao and Kleywegt (2016).

1.3 Notation and Definitions

Notation and core definitions for bilateral CVA (BCVA) problem setup incorporate those for unilateral CVA and DVA. Bilateral CVA measures expected portfolio loss (or benefit) due to counterparty and/or firm default. Let $V^+(\tau_C)$ denote the discounted positive portfolio exposure at time $\tau_C$ and let $R_C \in [0, 1)$ denote the recovery rate the firm receives upon counterparty default. Let $V^-(\tau_F)$ denote the discounted negative portfolio exposure at time $\tau_F$ and let $R_F \in [0, 1)$ denote the recovery rate the counterparty receives upon firm default. The problem setup here assumes a fixed set of observation dates, $0 = t_0 < t_1 < \cdots < t_n = T$. Let $X^+$ denote the vector of recovery adjusted discounted positive exposures and $Y^C$ denote the vector of counterparty default indicators. Let $(x_i^+, y_i^c)$ denote realizations of $(X^+, Y^C)$ along sample paths for $i \in \{1, 2, \ldots, N\}$. Let $X^-$ denote the vector of recovery adjusted discounted firm negative exposures and $Y^F$ denote the vector of firm default indicators. Let $(x_i^-, y_i^f)$ denote realizations of $(X^-, Y^F)$ along sample paths for $i \in \{1, 2, \ldots, N\}$.

Due to the linkage, one can write $X = X^+ + X^-$ and decompose sample realizations of $X$ accordingly. Therefore, let $(x_i, y_i^c, y_i^f)$ denote realizations of $(X, Y^C, Y^F)$ along sample paths for $i \in \{1, 2, \ldots, N\}$. The relation $x_i = x_i^+ + x_i^-$ can be used to decompose $x_i$ into its positive and negative exposures respectively.

The bilateral CVA associated with discounted positive exposure $V^+(\tau_C)$, counterparty default indicator $\mathbf{1}_{\{\tau_C \le T\} \cap \{\tau_C < \tau_F\}}$, discounted negative exposure $V^-(\tau_F)$, firm default indicator $\mathbf{1}_{\{\tau_F \le T\} \cap \{\tau_F < \tau_C\}}$, is

$$\mathrm{CVA}_B = \mathbb{E}\big[(1 - R_C)\, V^+(\tau_C)\, \mathbf{1}_{\{\tau_C \le T\} \cap \{\tau_C < \tau_F\}}\big] + \mathbb{E}\big[(1 - R_F)\, V^-(\tau_F)\, \mathbf{1}_{\{\tau_F \le T\} \cap \{\tau_F < \tau_C\}}\big].$$

Equivalently, one can write

$$\mathrm{CVA}_B = (1 - R_C) \int_0^T \mathbb{E}\big[V^+(t) \mid \tau_C = t, \tau_F > t\big]\, d\Pi'_C(t) + (1 - R_F) \int_0^T \mathbb{E}\big[V^-(t) \mid \tau_F = t, \tau_C > t\big]\, d\Pi'_F(t),$$

where the joint counterparty and firm default time distributions are given by $\Pi'_C(t) = P(\tau_C \le t, \tau_F > \tau_C)$ and $\Pi'_F(t) = P(\tau_F \le t, \tau_C > \tau_F)$ [(Green, 2015), (Lichters et al., 2015), (Memartoluie, 2017)]. The pair of vectors $(X^+, Y^C) \in (\mathbb{R}^n_+ \times \mathcal{B}^n)$ is

$$X^+ = \big((1 - R_C) V^+(t_1), \ldots, (1 - R_C) V^+(t_n)\big) \quad \text{and} \quad Y^C = \big(\mathbf{1}_{\{\tau_C = t_1\} \cap \{\tau_F > \tau_C\}}, \ldots, \mathbf{1}_{\{\tau_C = t_n\} \cap \{\tau_F > \tau_C\}}\big),$$

and the pair of vectors $(X^-, Y^F) \in (\mathbb{R}^n_- \times \mathcal{B}^n)$ is

$$X^- = \big((1 - R_F) V^-(t_1), \ldots, (1 - R_F) V^-(t_n)\big) \quad \text{and} \quad Y^F = \big(\mathbf{1}_{\{\tau_F = t_1\} \cap \{\tau_C > \tau_F\}}, \ldots, \mathbf{1}_{\{\tau_F = t_n\} \cap \{\tau_C > \tau_F\}}\big).$$

Here $\mathcal{B}^n$ denotes the set of default time vectors: binary vectors of ones and zeros with $n$ components, and at most one non-zero element.
Note that counterparty or firm default occurs on at most one observation date within the fixed set of dates in the problem setup. The empirical measure, $\Phi_N$, is

$$\Phi_N(dz) = \frac{1}{N} \sum_{i=1}^{N} \delta_{(x_i, y_i^c, y_i^f)}(dz),$$

where $\delta_{(x_i, y_i^c, y_i^f)}$ denotes the point mass at the $i$-th sample. Under the empirical measure $\Phi_N$, bilateral CVA is a sum of expectations of inner products

$$\mathrm{CVA}_B = \mathbb{E}^{\Phi_N}[\langle X^+, Y^C \rangle] + \mathbb{E}^{\Phi_N}[\langle X^-, Y^F \rangle].$$

In the context of this work, the uncertainty set for probability measures is

$$U_\delta(\Phi_N) = \{\Phi : D_c(\Phi, \Phi_N) \le \delta\}$$

where $D_c$ is the optimal transport cost or Wasserstein discrepancy for cost function $c$ (Blanchet et al., 2018). For convenience the definition for $D_c$ is given as

$$D_c(\Phi, \Phi') = \inf\{\mathbb{E}^\pi[c(A, B)] : \pi \in \mathcal{P}(\mathbb{R}^d \times \mathbb{R}^d),\ \pi_A = \Phi,\ \pi_B = \Phi'\}$$

where $\mathcal{P}$ denotes the space of Borel probability measures and $\pi_A$ and $\pi_B$ denote the distributions of $A$ and $B$. Here $A$ denotes $(X_A, Y^C_A, Y^F_A) \in (\mathbb{R}^n \times \mathcal{B}^n \times \mathcal{B}^n)$ and $B$ denotes $(X_B, Y^C_B, Y^F_B) \in (\mathbb{R}^n \times \mathcal{B}^n \times \mathcal{B}^n)$ respectively. The analysis in this work uses the cost function $c_S$ where

$$c_S((u, v_1, v_2), (x, y_1, y_2)) = S \langle v_1 - y_1, v_1 - y_1 \rangle + S \langle v_2 - y_2, v_2 - y_2 \rangle + \langle u - x, u - x \rangle,$$

with scale factor $S > 0$, $(u, v_1, v_2) \in (\mathbb{R}^n \times \mathcal{B}^n \times \mathcal{B}^n)$, and $(x, y_1, y_2) \in (\mathbb{R}^n \times \mathcal{B}^n \times \mathcal{B}^n)$.

1.3.2 Unilateral CVA, DVA

Bilateral CVA can be reduced to express unilateral CVA as

$$\mathrm{CVA}_U = \mathbb{E}\big[(1 - R_C)\, V^+(\tau_C)\, \mathbf{1}_{\{\tau_C \le T\}}\big] = (1 - R_C) \int_0^T \mathbb{E}\big[V^+(t) \mid \tau_C = t\big]\, d\Pi_C(t),$$

where the counterparty default time distribution is given by $\Pi_C(t) = P(\tau_C \le t)$. Note the assumption here is that $\tau_C < \tau_F$. Similarly, it can be reduced to express unilateral DVA (note the minus sign), assuming $\tau_F < \tau_C$, as

$$\mathrm{DVA}_U = -\mathbb{E}\big[(1 - R_F)\, V^-(\tau_F)\, \mathbf{1}_{\{\tau_F \le T\}}\big] = -(1 - R_F) \int_0^T \mathbb{E}\big[V^-(t) \mid \tau_F = t\big]\, d\Pi_F(t),$$

where firm default time distribution is given by $\Pi_F(t) = P(\tau_F \le t)$ [(Green, 2015), (Lichters et al., 2015), (Memartoluie, 2017)].
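The empirical bilateral CVA and the transport cost $c_S$ described above reduce to simple array arithmetic. The sketch below is an illustration only: the function names, the sample paths, and the scale factor value are hypothetical, and each exposure vector is assumed to be already recovery adjusted and discounted. The positive and negative parts follow the decomposition $x_i = x_i^+ + x_i^-$.

```python
def inner(a, b):
    """Euclidean inner product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))


def positive_part(x):
    return [max(v, 0.0) for v in x]


def negative_part(x):
    # Negative exposures keep their sign, so that x = x_plus + x_minus.
    return [min(v, 0.0) for v in x]


def empirical_bcva(samples):
    """Average of <x+, y_c> + <x-, y_f> over samples (x, y_c, y_f)."""
    total = 0.0
    for x, y_c, y_f in samples:
        total += inner(positive_part(x), y_c) + inner(negative_part(x), y_f)
    return total / len(samples)


def cost_cs(a, b, scale):
    """c_S(a, b) = S|v1 - y1|^2 + S|v2 - y2|^2 + |u - x|^2."""
    (u, v1, v2), (x, y1, y2) = a, b

    def sq_dist(p, q):
        return sum((s - t) ** 2 for s, t in zip(p, q))

    return scale * sq_dist(v1, y1) + scale * sq_dist(v2, y2) + sq_dist(u, x)


# Hypothetical sample paths on n = 3 observation dates (illustration only).
# Default indicator vectors have at most one non-zero entry.
samples = [
    ([1.0, -0.5, 2.0], [0, 0, 1], [0, 0, 0]),   # counterparty defaults at t_3
    ([0.5, -1.5, -1.0], [0, 0, 0], [0, 1, 0]),  # firm defaults at t_2
    ([1.0, 1.0, 1.0], [0, 0, 0], [0, 0, 0]),    # no default before T
    ([-2.0, 0.3, 0.4], [1, 0, 0], [0, 0, 0]),   # counterparty defaults at t_1
]
cva_b = empirical_bcva(samples)

# Cost of moving the first sample: shift one exposure by 0.5 and move the
# counterparty default from t_3 to t_2 (two indicator entries flip).
original = samples[0]
perturbed = ([1.0, -0.5, 2.5], [0, 1, 0], [0, 0, 0])
move_cost = cost_cs(perturbed, original, scale=10.0)
```

A larger scale factor makes it more expensive for the adversary to move a default date than to move an exposure, which is the role $S$ plays in the cost function.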
Notation and core definitions for (bilateral) FVA problem setup incorporate those for FCA and FBA. FVA measures expected funding costs and benefits over portfolio lifetime. Let $V^+(t)$ denote the positive portfolio exposure at time $t$. Let $V^-(t)$ denote the negative portfolio exposure at time $t$. The problem setup here assumes a fixed set of observation dates, $0 = t_0 < t_1 < \cdots < t_n = T$. Let $X^+$ denote the vector of discounted positive exposures and $Y^C$ denote the vector of counterparty survival indicators. Let $X^-$ denote the vector of discounted negative exposures and $Y^F$ denote the vector of firm survival indicators. Further, let $Y^{CF}$ denote the Hadamard product $Y^C \odot Y^F$ which represents the vector of joint survival indicators. To incorporate funding, let $Z^+$ denote the vector of funding costs incurred on exposures $X^+$, and similarly for $Z^-$ with respect to exposures $X^-$. Due to the linkage between $Z^+$ and $Z^-$, one can write $Z = Z^+ + Z^-$ and decompose sample realizations of $Z$ into $Z^+$ and $Z^-$ accordingly. Therefore, let $(z_i, y_i^{cf})$ denote realizations of $(Z, Y^{CF})$ along sample paths for $i \in \{1, 2, \ldots, N\}$. The relation $z_i = z_i^+ + z_i^-$ can be used to decompose $z_i$ into its positive and negative exposures respectively.

The FVA associated with funding costs $Z(t)$, joint survival indicator $\mathbf{1}_{\{\tau_C > t\} \cap \{\tau_F > t\}}$ is [(Lichters et al., 2015), (Green, 2015)]:

$$\mathrm{FVA} = \mathrm{FCA} + \mathrm{FBA} = \int_0^T \mathbb{E}\big[Z^+(t)\, \mathbf{1}_{\{\tau_C > t\} \cap \{\tau_F > t\}}\big]\, dt + \int_0^T \mathbb{E}\big[Z^-(t)\, \mathbf{1}_{\{\tau_C > t\} \cap \{\tau_F > t\}}\big]\, dt = \int_0^T \mathbb{E}\big[Z(t)\, \mathbf{1}_{\{\tau_C > t\} \cap \{\tau_F > t\}}\big]\, dt.$$

The pair of vectors $(Z, Y^{CF}) \in (\mathbb{R}^n \times \mathcal{B}^n)$ is

$$Z = \big(Z^+(t_1) + Z^-(t_1), \ldots, Z^+(t_n) + Z^-(t_n)\big) \quad \text{and} \quad Y^{CF} = \big(\mathbf{1}_{\{\tau_C > t_1\} \cap \{\tau_F > t_1\}}, \ldots, \mathbf{1}_{\{\tau_C > t_n\} \cap \{\tau_F > t_n\}}\big),$$

and the pair of vectors $(Z^+, Z^-) \in (\mathbb{R}^n_+ \times \mathbb{R}^n_-)$ is

$$Z^+ = \big(f_c(t_0, t_1) X^+(t_1), \ldots, f_c(t_{n-1}, t_n) X^+(t_n)\big) \quad \text{and} \quad Z^- = \big(f_b(t_0, t_1) X^-(t_1), \ldots, f_b(t_{n-1}, t_n) X^-(t_n)\big).$$

Here $\mathcal{B}^n$ denotes the set of survival time vectors: binary vectors of ones and zeros with $n$ components, and at most one block of ones followed by a complementary block of zeros. The empirical measure, $\Phi_N$, is

$$\Phi_N(dz) = \frac{1}{N} \sum_{i=1}^{N} \delta_{(z_i, y_i^{cf})}(dz).$$

Under the empirical measure $\Phi_N$, FVA is a sum of expectations of inner products

$$\mathrm{FVA} = \mathbb{E}^{\Phi_N}[\langle Z^+, Y^{CF} \rangle] + \mathbb{E}^{\Phi_N}[\langle Z^-, Y^{CF} \rangle] = \mathbb{E}^{\Phi_N}[\langle Z, Y^{CF} \rangle].$$

In the context of this work, the uncertainty set for probability measures is

$$U_\delta(\Phi_N) = \{\Phi : D_c(\Phi, \Phi_N) \le \delta\}$$

where $D_c$ is the optimal transport cost or Wasserstein discrepancy for cost function $c$ (Blanchet et al., 2018). For convenience the definition for $D_c$ is

$$D_c(\Phi, \Phi') = \inf\{\mathbb{E}^\pi[c(A, B)] : \pi \in \mathcal{P}(\mathbb{R}^d \times \mathbb{R}^d),\ \pi_A = \Phi,\ \pi_B = \Phi'\}$$

where $\mathcal{P}$ denotes the space of Borel probability measures and $\pi_A$ and $\pi_B$ denote the distributions of $A$ and $B$. Here $A$ denotes $(Z_A, Y_A) \in (\mathbb{R}^n \times \mathcal{B}^n)$ and $B$ denotes $(Z_B, Y_B) \in (\mathbb{R}^n \times \mathcal{B}^n)$ respectively. This work uses the cost function $c_S$ where

$$c_S((u, v), (z, y)) = S \langle v - y, v - y \rangle + \langle u - z, u - z \rangle,$$

with scale factor $S > 0$, $(u, v) \in (\mathbb{R}^n \times \mathcal{B}^n)$, and $(z, y) \in (\mathbb{R}^n \times \mathcal{B}^n)$.

2 Theory: Robust XVA and Wrong Way Risk

The robust unilateral CVA can be written as

$$\sup_{P \in U_\delta(P_N)} \mathbb{E}^P[\langle X^+, Y^C \rangle]. \tag{P1}$$

Similarly, the robust unilateral DVA is

$$-\sup_{Q \in U_\delta(Q_N)} \mathbb{E}^Q[\langle X^-, Y^F \rangle]. \tag{P2}$$

As such, the dual formulations and solutions to the above primal optimization problems are special cases of the solutions to the bilateral CVA optimization problems, to be described next. The robust bilateral CVA is

$$\sup_{\Phi \in U_\delta(\Phi_N)} \mathbb{E}^\Phi[\langle X^+, Y^C \rangle + \langle X^-, Y^F \rangle]. \tag{P3}$$

Similar to before, use recent duality results, noting that the inner product $\langle \cdot, \cdot \rangle$ satisfies the upper semicontinuous condition of the Lagrangian duality theorem, and cost function $c_S$ satisfies the non-negative lower semicontinuous condition (see Blanchet and Murthy (2019) Assumptions 1 & 2, Gao and Kleywegt (2016)). Hence the dual problem can be written as

$$\inf_{\alpha \ge 0} F(\alpha) := \Big[ \alpha\delta + \frac{1}{N} \sum_{i=1}^{N} \Psi_\alpha(x_i, y_i^c, y_i^f) \Big] \tag{D3}$$

where

$$\Psi_\alpha(x_i, y_i^c, y_i^f) = \sup_{u \in \mathbb{R}^n,\, v_1 \in \mathcal{B}^n,\, v_2 \in \mathcal{B}^n} \big[ \langle u^+, \mathbf{1}_{\{v_1 < v_2\}} v_1 \rangle + \langle u^-, \mathbf{1}_{\{v_2 < v_1\}} v_2 \rangle - \alpha\, c_S((u, v_1, v_2), (x_i, y_i^c, y_i^f)) \big]$$

$$= \sup_{u \in \mathbb{R}^n,\, v_1 \in \mathcal{B}^n,\, v_2 \in \mathcal{B}^n} \big[ \langle u^+, \mathbf{1}_{\{v_1 < v_2\}} v_1 \rangle + \langle u^-, \mathbf{1}_{\{v_2 < v_1\}} v_2 \rangle - \alpha \big( \langle u - x_i, u - x_i \rangle + S \langle v_1 - y_i^c, v_1 - y_i^c \rangle + S \langle v_2 - y_i^f, v_2 - y_i^f \rangle \big) \big].$$

Note that default times $(v_1, v_2)$ are compared via the indicator function $\mathbf{1}_{\{v_1 \lessgtr v_2\}}$ by comparing indices (into the fixed dates array $0 < t_1 < \cdots < t_n = T$) of the respective default times. So if $v_1$ has a one element in index $i$ and either $\lVert v_2 \rVert = 0$ or $v_2$ has a one element in index $j$ and $i < j$, then $\mathbf{1}_{\{v_1 < v_2\}} = 1$. If $i > j$ or $\lVert v_1 \rVert = 0$, then $\mathbf{1}_{\{v_1 < v_2\}} = 0$. The probability that $i = j$ for any $i, j \in \{1, \ldots, n\}$ is zero in continuous time, hence this case is not considered here. Also $\lVert v_1 \rVert = 1$ corresponds to $v_1 \le t_n = T$, the maturity date of the CVA calculation. Similar analysis applies to $v_2$.

Now apply change of variables $w_1 = (u - x_i)$, $w_2 = (v_1 - y_i^c)$, and $w_3 = (v_2 - y_i^f)$ to get

$$\Psi_\alpha(x_i, y_i^c, y_i^f) = \sup_{w_1 \in \mathbb{R}^n,\, w_2 \in \mathcal{B}^n,\, w_3 \in \mathcal{B}^n} \big[ \langle (w_1 + x_i)^+, \mathbf{1}_{\{w_2 + y_i^c < w_3 + y_i^f\}} (w_2 + y_i^c) \rangle + \langle (w_1 + x_i)^-, \mathbf{1}_{\{w_3 + y_i^f < w_2 + y_i^c\}} (w_3 + y_i^f) \rangle - \alpha \big( \langle w_1, w_1 \rangle + S \langle w_2, w_2 \rangle + S \langle w_3, w_3 \rangle \big) \big].$$

It turns out that $\Psi_\alpha$ can be expressed as the pointwise max of four functions of more complex forms. The four functions represent the four logical cases for $w_2$ and $w_3$ each being zero or non-zero. Furthermore, we need to consider the sub-cases where the counterparty defaults before the firm, as in $\Psi^a_\alpha$, or vice-versa, as in $\Psi^b_\alpha$. Again, $\Psi_\alpha$ quantifies the adversarial moves in CVA and DVA across both time and spatial dimensions while accounting for the associated cost via the $K$ terms.

Remark 2.

Note that this result involves some lengthy and tedious derivations and requires some time to go through. However, there are some patterns across the various cases and sub-cases which do simplify the analysis to some extent.
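For a fixed default index, the exposure part of $\Psi_\alpha$ reduces to scalar quadratic maximizations such as $\sup_w [(w + x)^+ - \alpha w^2]$ and $\sup_w [(w + x)^- - \alpha w^2]$, where $(\cdot)^-$ is the sign-preserving negative part $\min(\cdot, 0)$. The Python sketch below compares elementary closed forms for these two scalar problems against a brute-force grid search. It is an illustrative check under stated assumptions, not part of the paper's derivation: the function names, the grid bounds, and the test values of $x$ and $\alpha$ are hypothetical.

```python
def pos_closed_form(x, alpha):
    """sup_w [max(w + x, 0) - alpha * w**2] = max(x + 1/(4 alpha), 0)."""
    return max(x + 1.0 / (4.0 * alpha), 0.0)


def neg_closed_form(x, alpha):
    """sup_w [min(w + x, 0) - alpha * w**2], split by the position of x."""
    if -1.0 / (2.0 * alpha) <= x <= 0.0:
        # The maximizer sits at the kink w = -x.
        return -alpha * x * x
    # Otherwise w = 1/(2 alpha) (if x is very negative) or w = 0 (if x > 0).
    return min(x + 1.0 / (4.0 * alpha), 0.0)


def pos_part(v):
    return max(v, 0.0)


def neg_part(v):
    return min(v, 0.0)


def brute_force_sup(part, x, alpha, lo=-5.0, hi=5.0, steps=40001):
    """Grid approximation of sup_w [part(w + x) - alpha * w**2] on [lo, hi]."""
    best = float("-inf")
    for k in range(steps):
        w = lo + (hi - lo) * k / (steps - 1)
        best = max(best, part(w + x) - alpha * w * w)
    return best


# Largest discrepancy between grid search and closed forms over a few cases.
max_gap = 0.0
for alpha in (0.5, 2.0):
    for x in (-2.0, -0.3, 0.0, 0.7):
        gap_pos = abs(brute_force_sup(pos_part, x, alpha) - pos_closed_form(x, alpha))
        gap_neg = abs(brute_force_sup(neg_part, x, alpha) - neg_closed_form(x, alpha))
        max_gap = max(max_gap, gap_pos, gap_neg)
```

The grid search is only accurate to roughly the grid spacing, so `max_gap` should be small but not exactly zero.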

| Optimization objective function | Solution |
|---|---|
| $\sup_{w \in \mathbb{R}^n} \langle w, y_i \rangle - \alpha \langle w, w \rangle$ | $\dfrac{\lVert y_i \rVert^2}{4\alpha}$ |
| $\sup_{w \le x_{i\tau}} \langle w, y_i \rangle - \alpha \langle w, w \rangle$ | $\Big[ x_{i\tau} \wedge \dfrac{\lVert y_i \rVert}{2\alpha} \Big] - \alpha \Big[ x_{i\tau} \wedge \dfrac{\lVert y_i \rVert}{2\alpha} \Big]^2$ |
| $\sup_{w \in \mathbb{R}^n} \langle (w + x_i)^+, y_i \rangle - \alpha \langle w, w \rangle$ | $\Big[ \dfrac{1}{4\alpha} + \langle x_i, y_i \rangle \Big]^+$ |
| $\sup_{w \in \mathbb{R}^n} \langle (w + x_i)^-, y_i \rangle - \alpha \langle w, w \rangle$ | $\mathbf{1}_{\{(x_{i\tau} < -\frac{1}{2\alpha}) \vee (x_{i\tau} > 0)\}} \Big[ \dfrac{1}{4\alpha} + \langle x_i, y_i \rangle \Big]^- - \mathbf{1}_{\{-\frac{1}{2\alpha} \le x_{i\tau} \le 0\}} \big[ \alpha \langle x_i, y_i \rangle^2 \big]$ |
| $\sup_{w \in \mathbb{R}^n} (w_{\tau^*} + x_{i\tau^*})^- - \alpha \langle w, w \rangle$ | $\mathbf{1}_{\{(x_{i\tau^*} < -\frac{1}{2\alpha}) \vee (x_{i\tau^*} > 0)\}} \Big[ \dfrac{1}{4\alpha} + \langle x_i, y_i \rangle + (x_{i\tau^*} - x_{i\tau}) \Big]^- - \mathbf{1}_{\{-\frac{1}{2\alpha} \le x_{i\tau^*} \le 0\}} \big[ \alpha (x_{i\tau^*})^2 \big]$ |

Proposition 1.

We have $\Psi_\alpha(x_i, y_i^c, y_i^f) = \bigvee_{k=1}^{4} \Psi^k_\alpha(x_i, y_i^c, y_i^f)$ where

$$\Psi^1_\alpha(x_i, y_i^c, y_i^f) = \mathbf{1}(y_i^c < y_i^f)\, \Psi^{1a}_\alpha(x_i, y_i^c, y_i^f) + \mathbf{1}(y_i^f < y_i^c)\, \Psi^{1b}_\alpha(x_i, y_i^c, y_i^f),$$

$$\Psi^2_\alpha(x_i, y_i^c, y_i^f) = \mathbf{1}(w_2 + y_i^c < y_i^f)\, \Psi^{2a}_\alpha(x_i, y_i^c, y_i^f) + \mathbf{1}(y_i^f < w_2 + y_i^c)\, \Psi^{2b}_\alpha(x_i, y_i^c, y_i^f),$$

$$\Psi^3_\alpha(x_i, y_i^c, y_i^f) = \mathbf{1}(y_i^c < w_3 + y_i^f)\, \Psi^{3a}_\alpha(x_i, y_i^c, y_i^f) + \mathbf{1}(w_3 + y_i^f < y_i^c)\, \Psi^{3b}_\alpha(x_i, y_i^c, y_i^f),$$

$$\Psi^4_\alpha(x_i, y_i^c, y_i^f) = \mathbf{1}(w_2 + y_i^c < w_3 + y_i^f)\, \Psi^{4a}_\alpha(x_i, y_i^c, y_i^f) + \mathbf{1}(w_3 + y_i^f < w_2 + y_i^c)\, \Psi^{4b}_\alpha(x_i, y_i^c, y_i^f),$$

and (suppressing arguments for brevity):

$$\Psi^{1a}_\alpha = \Big[ \frac{1}{4\alpha} + \langle x_i, y_i^c \rangle \Big]^+,$$

$$\Psi^{1b}_\alpha = \Big[ \mathbf{1}_{\{(x_{i\tau} < -\frac{1}{2\alpha}) \vee (x_{i\tau} > 0)\}} \Big[ \frac{1}{4\alpha} + \langle x_i, y_i^f \rangle \Big]^- - \mathbf{1}_{\{-\frac{1}{2\alpha} \le x_{i\tau} \le 0\}} \big[ \alpha \langle x_i, y_i^f \rangle^2 \big] \Big],$$

$$\Psi^{2a}_\alpha = \Big[ \Big[ \frac{1}{4\alpha} + \langle x_i, y_i^c \rangle + (x_{i\tau^*} - x_{i\tau}) \Big]^+ - \alpha S K^{2a} \Big],$$

$$\Psi^{2b}_\alpha = \big[ \Psi^{1b}_\alpha - \alpha S K^{2b} \big],$$

$$\Psi^{3a}_\alpha = \big[ \Psi^{1a}_\alpha - \alpha S K^{3a} \big],$$

$$\Psi^{3b}_\alpha = \Big[ \mathbf{1}_{\{(x_{i\tau^*} < -\frac{1}{2\alpha}) \vee (x_{i\tau^*} > 0)\}} \Big[ \frac{1}{4\alpha} + \langle x_i, y_i^f \rangle + (x_{i\tau^*} - x_{i\tau}) \Big]^- - \mathbf{1}_{\{-\frac{1}{2\alpha} \le x_{i\tau^*} \le 0\}} \big[ \alpha (x_{i\tau^*})^2 \big] - \alpha S K^{3b} \Big],$$

$$\Psi^{4a}_\alpha = \big[ \Psi^{2a}_\alpha - \alpha S (K^{4a} - K^{2a}) \big],$$

$$\Psi^{4b}_\alpha = \big[ \Psi^{3b}_\alpha - \alpha S (K^{4b} - K^{3b}) \big].$$

Note parameter $\tau^*$ and constant $K$ are defined within the proof by cases (see Supplementary Material), and are omitted here for brevity. Recall $\tau$ is the index such that $y^{\{c,f\}}_{i\tau} = 1$, else it is 0 if $\lVert y^{\{c,f\}}_i \rVert = 0$. The selection in $\{c, f\}$ is determined by context.

Proof sketch.

This result follows from jointly maximizing the adversarial exposure $w_1$ and the default time indices $w_2$, $w_3$. The structure of $\mathcal{B}^n$ allows us to decouple this joint maximization and find the critical point to maximize the quadratic in $w_1$ and write down the condition to select the optimal default time index $\tau^*$ for either the counterparty (in sub-case a) or the firm (in sub-case b), as determined by first to default. Finally, take the max over the four logical cases for $w_2$ and $w_3$ to arrive at the solution. The $K$ terms represent the cost associated with the worst case BCVA.

The goal now is to evaluate

$$\inf_{\alpha \ge 0} F(\alpha) := \Big[ \alpha\delta + \frac{1}{N} \sum_{i=1}^{N} \Psi_\alpha(x_i, y_i^c, y_i^f) \Big]$$

where the $\Psi_\alpha$ functions are given as the solutions to Proposition 2.1. Although the Lagrangian duality implies the convexity of $F(\alpha)$, due to its complexity, computational methods and solvers are used to evaluate this expression. Nonetheless, the solution can be expressed as below. Note that for $\delta = 0$ one recovers $\mathrm{CVA}_B$ given in Section 1.3.1.

Theorem 1.

The primal problem P3 has solution $\big[ \alpha^*\delta + \frac{1}{N} \sum_{i=1}^{N} \Psi_{\alpha^*}(x_i, y_i^c, y_i^f) \big]$ where $\alpha^* = \arg\min_{\alpha \ge 0} \big[ \alpha\delta + \frac{1}{N} \sum_{i=1}^{N} \Psi_\alpha(x_i, y_i^c, y_i^f) \big]$ and $\Psi_{\alpha^*}(x_i, y_i^c, y_i^f) = \bigvee_{k=1}^{4} \Psi^k_{\alpha^*}(x_i, y_i^c, y_i^f)$. Expressed in terms of original BCVA, this says

$$\sup_{\Phi \in U_\delta(\Phi_N)} \mathbb{E}^\Phi[\langle X^+, Y^C \rangle + \langle X^-, Y^F \rangle] = \mathbb{E}^{P_N}[\langle X^+, Y^C \rangle + \langle X^-, Y^F \rangle] + \alpha^*\delta + \mathbb{E}^{P_N}\big[ \Psi_{\alpha^*}(X, Y^C, Y^F) - [\langle X^+, Y^C \rangle + \langle X^-, Y^F \rangle] \big]^+$$

where the additional terms represent a penalty due to uncertainty in probability distribution.

Proof sketch.

This follows directly from the previous proposition. δ = The process of recovering the worst case CVA distribution involves evaluating the arg min expressions given in Section 1.2.2.The procedure is a bit tedious but one can go through the various cases and subcases discussed in Proposition 2.1, and computethe value of the dual minimizer α ∗ as given in Theorem 2.1, to recover the worst case distribution { ( x ∗ i , y c ∗ i , y f ∗ i ) : i ∈ { , ..., N + }} for a given δ . This procedure is done for a few concrete examples in Section 3. One limitation in the current approach is the omission of a risk neutral measure constraint on the underlying interest rate andcredit default distributions that generate the portfolio exposure distributions described by the Wasserstein ball U δ ( Φ N ) . It is notclear how to (either explicitly or implicitly) incorporate such a constraint in a solvable way. We highlight this as an opportunityfor improvement and a direction for further research. Empirical results for our worst case CVA studies are provided in Section3. From the authors’ perspective the computational study was illuminating to understand the magnitude and shape of worst caseCVA proﬁles as a function of uncertainty. Some recent work was done to map Wasserstein radii into lower and upper boundson the distance between the true and empirical distributions. See the discussion on this topic in Section 3.2. The robust FCA can be written as sup P ∈ U δ ( P N ) E P [ (cid:104) Z + , Y CF (cid:105) ] . (P4)Similarly, the robust FBA can be written as sup Q ∈ U δ ( Q N ) E Q [ (cid:104) Z − , Y CF (cid:105) ] . (P5)As such, the dual formulations and solutions to the above primal optimization problems are special cases of the solutions to theFVA optimization problems, to be described next. The robust FVA is sup Φ ∈ U δ ( Φ N ) E Φ [ (cid:104) Z , Y CF (cid:105) ] . 
(P6)Similar to before, we use recent duality results, noting the inner product (cid:104) ; (cid:105) satisﬁes the upper semicontinuous condition of theLagrangian duality theorem, and cost function c S satisﬁes the non-negative lower semicontinuous condition (see Blanchet andMurthy (2019) Assumptions 1 & 2, Gao and Kleywegt (2016)). Hence the dual problem (to sup above) can be written asinf α ≥ F ( α ) : = (cid:20) αδ + N N ∑ i = Ψ α ( z i , y c fi ) (cid:21) (D6)where Ψ α ( z i , y c fi ) = sup u ∈ R n , v ∈ B n [ (cid:104) u , v (cid:105) − α c S (( u , v ) , ( z i , y c fi ))] = sup u ∈ R n , v ∈ B n [ (cid:104) u , v (cid:105) − α ( (cid:104) u − z i , u − z i (cid:105) + S (cid:104) v − y c fi , v − y c fi (cid:105) )] . ow apply change of variables w = ( u − z i ) and w = ( v − y c fi ) to get Ψ α ( z i , y c fi ) = sup w ∈ R n , w ∈ B n [ (cid:104) w + z i , w + y c fi (cid:105) − α ( (cid:104) w , w (cid:105) + S (cid:104) w , w (cid:105) )] where the sets B n and B n are deﬁned as before. It turns out that Ψ α can be expressed as original FVA plus the pointwise max of ( n + ) convex functions. The degenerate case l = n cases are hyperbolas plus linesof negative slope. Ψ α quantiﬁes the adversarial move in FVA across both time and spatial dimensions while accounting for thecost via the K terms. Proposition 2.

We have
$$\Psi_\alpha(z_i, y^{cf}_i) = \langle z_i, y^{cf}_i \rangle + \Big[ \frac{l^*}{4\alpha} + \Big( \sum_{k=1}^{l^*} z_{ik} - \sum_{k=1}^{\|y^{cf}_i\|} z_{ik} \Big) - \alpha S K \Big],$$
where $l^* = \arg\max_{l \ge 0} \big[ \frac{l}{4\alpha} + \sum_{k=1}^{l} z_{ik} - \alpha S K \big]$ and $l = \|w_2 + y^{cf}_i\| \ge 0$, $l \in \mathbb{Z}_+$. Also $\|y^{cf}_i\| \in \mathbb{Z}_+$, and $K = |l - \|y^{cf}_i\|| = \|w_2\| \ge 0$, $K \in \mathbb{Z}_+$. Once $l^*$ is selected, $K := |l^* - \|y^{cf}_i\|| = \|w_2^*\|$. Alternatively,
$$\Psi_\alpha(z_i, y^{cf}_i) = \langle z_i, y^{cf}_i \rangle + \bigvee_{l=0}^{n} h_\alpha(l), \qquad h_\alpha(l) := \Big[ \frac{l}{4\alpha} + \Big( \sum_{k=1}^{l} z_{ik} - \sum_{k=1}^{\|y^{cf}_i\|} z_{ik} \Big) - \alpha S K \Big].$$

Proof sketch.

This result follows from jointly maximizing the adversarial funding exposure w and the survival time index w .The structure of B n allows us to decouple this joint maximization and ﬁnd the critical point to maximize the quadratic in w andwrite down the condition to select the optimal survival time index l ∗ . Finally, consider the two cases w = w (cid:54) = K terms represent the cost associated with the worst case. The goal now is to evaluate inf α ≥ F ( α ) : = (cid:20) αδ + N N ∑ i = Ψ α ( z i , y c fi ) (cid:21) where Ψ α ( z i , y c fi ) = (cid:104) z i , y c fi (cid:105) + n (cid:95) l = h α ( l ) for h α ( l ) : = (cid:2) l α + (cid:0) l ∑ k = z ik − (cid:107) y cfi (cid:107) ∑ k = z ik (cid:1) − α S K (cid:3) . The convexity of the objective function F ( α ) simpliﬁes the task of solving this optimization problem. The ﬁrst order optimalitycondition sufﬁces. As Ψ α and hence F ( α ) may have non-differentiable kinks due to the max functions, ∨ , we characterize theoptimality condition via subgradients. In particular, we look for α ∗ ≥ ∈ ∂ F ( α ∗ ) . Inspection of the asymptoticproperties of Ψ α and its subgradients reveals that ∂ F ( α ) will cross zero (as α sweeps from 0 to ∞ ) and hence α ∗ ≥

0. Notethat for δ = Proposition 3.

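Let $\alpha^* \in \{ \alpha \ge 0 : 0 \in \partial F(\alpha) \}$, where
$$\partial \Psi_\alpha = \mathrm{Conv} \bigcup \big\{ \partial h_\alpha(l) \;\big|\; \langle z_i, y^{cf}_i \rangle + h_\alpha(l) = \Psi_\alpha ; \; l \in \{0, \ldots, n\} \big\} \quad \text{and} \quad \partial F(\alpha) = \delta + \frac{1}{N} \sum_{i=1}^{N} \partial \Psi_\alpha .$$

Proof sketch.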

This follows from application of standard properties of subgradients as well as inspection of the asymptoticproperties of Ψ α and ∂ Ψ α . For α sufﬁciently small, Ψ α has a large positive value and ∂ Ψ α has a large negative derivative. For α sufﬁciently large, for optimal l ∗ , either l ∗ = = ⇒ ∈ ∂ Ψ α or l ∗ = (cid:107) y c fi (cid:107) > = ⇒ ∂ Ψ α approaches zero = ⇒ ∂ F ( α ) crosses zero. Theorem 2.

The primal problem (P6) has solution
$$\alpha^* \delta + \frac{1}{N} \sum_{i=1}^{N} \Psi_{\alpha^*}(z_i, y^{cf}_i),$$
where $\alpha^* \in \{ \alpha \ge 0 : 0 \in \partial F(\alpha) \}$ and $\Psi_{\alpha^*}(z_i, y^{cf}_i) = \langle z_i, y^{cf}_i \rangle + \bigvee_{l=0}^{n} h_{\alpha^*}(l)$ for
$$h_{\alpha^*}(l) := \Big[ \frac{l}{4\alpha^*} + \Big( \sum_{k=1}^{l} z_{ik} - \sum_{k=1}^{\|y^{cf}_i\|} z_{ik} \Big) - \alpha^* S K \Big].$$
Expressed in terms of the original FVA, this says
$$\sup_{\Phi \in U_\delta(\Phi_N)} E^{\Phi}\big[\langle Z, Y^{CF} \rangle\big] = E^{\Phi_N}\big[\langle Z, Y^{CF} \rangle\big] + \alpha^* \delta + E^{\Phi_N}\Big[ \bigvee_{l=0}^{n} \Big( \frac{l}{4\alpha^*} + \Big( \sum_{k=1}^{l} Z_k - \sum_{k=1}^{\|Y^{CF}\|} Z_k \Big) - \alpha^* S K \Big) \Big],$$
where the additional terms represent a penalty due to uncertainty in the probability distribution.

Proof sketch.

This follows directly from the previous two propositions.

2.4.3 Recovering the Worst Case Distribution

The process of recovering the worst case FVA distribution is similar to that for CVA. In fact, for the FVA case, the procedure is a bit simpler since there are fewer cases and subcases to consider to recover $\{ (z^*_i, y^{cf*}_i) : i \in \{1, \ldots, N+1\} \}$. The steps to recover the dual minimizer $\alpha^*$ are the same. This procedure is done for a few concrete examples in Section 3.

The comments regarding incorporation of a risk neutral measure constraint for the robust CVA problem formulations apply to the robust FVA problem formulations as well. Empirical results for the worst case FVA studies are provided in Section 3. As with CVA, the computational study helped the authors understand the magnitude and shape of worst case FVA profiles as a function of uncertainty.
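The robust FVA dual described above can be sketched numerically. The following is a minimal illustration, not the authors' Matlab implementation. It assumes that the survival vector $y^{cf}_i$ consists of $m = \|y^{cf}_i\|$ leading ones (so that $\langle z_i, y^{cf}_i \rangle$ is the sum of the first $m$ exposures), and that the coefficient on $l$ in $h_\alpha(l)$ is $1/(4\alpha)$, which is what maximizing $w - \alpha w^2$ at $w = 1/(2\alpha)$ gives for each active component. The sample paths, `delta`, and `S` are hypothetical values, and golden-section search stands in for whichever solver was actually used; it is applicable because $F(\alpha)$ is convex on $\alpha > 0$.

```python
import math


def psi(alpha, z, m, S):
    """Psi_alpha for one path: <z, y> + max_l h_alpha(l), l = 0..n.

    z : funding exposures z_i1..z_in for the path
    m : survival count ||y_i^cf|| (assumed: y has m leading ones)
    S : scale factor charged on moving the survival index
    """
    n = len(z)
    prefix = [0.0]
    for value in z:
        prefix.append(prefix[-1] + value)  # prefix[l] = sum of the first l exposures
    base = prefix[m]                       # <z, y> under the leading-ones assumption
    best = max(
        l / (4.0 * alpha) + (prefix[l] - prefix[m]) - alpha * S * abs(l - m)
        for l in range(n + 1)
    )
    return base + best


def dual_objective(alpha, paths, delta, S):
    """F(alpha) = alpha * delta + (1/N) * sum_i Psi_alpha(z_i, y_i)."""
    mean_psi = sum(psi(alpha, z, m, S) for z, m in paths) / len(paths)
    return alpha * delta + mean_psi


def minimize_dual(paths, delta, S, lo=1e-6, hi=1e6, iters=200):
    """Golden-section search over log(alpha); F is convex, hence unimodal."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = math.log(lo), math.log(hi)
    for _ in range(iters):
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        f_c = dual_objective(math.exp(c), paths, delta, S)
        f_d = dual_objective(math.exp(d), paths, delta, S)
        if f_c < f_d:
            b = d
        else:
            a = c
    alpha_star = math.exp(0.5 * (a + b))
    return alpha_star, dual_objective(alpha_star, paths, delta, S)


# Hypothetical sample: (exposure vector, survival count) per path.
paths = [
    ([0.5, 0.8, 0.3, -0.2], 2),
    ([-0.1, 0.4, 0.6, 0.2], 4),
    ([0.2, -0.3, 0.1, 0.5], 1),
]
delta, S = 0.5, 0.3

baseline = sum(sum(z[:m]) for z, m in paths) / len(paths)
alpha_star, robust_fva = minimize_dual(paths, delta, S)
print("baseline FVA:", baseline)
print("alpha*:", alpha_star, "worst case FVA:", robust_fva)
```

The gap `robust_fva - baseline` is the penalty for distributional uncertainty; for very large `alpha` each `psi` collapses to the baseline inner product, since moving any mass becomes prohibitively expensive.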

This computational study uses the Matlab Financial Instruments Toolbox and extends WWR portfolio analysis (Brigo et al., 2013, Section 5.3) to consider uncertainty in the probability distribution. Other key concepts discussed in this section include a suitable choice for the Wasserstein radius δ, calibration of the scale factor S, and the choice of units for exposures. The studies in this section investigate (and quantify) worst case bilateral CVA and FVA for different market environments and portfolios of interest rate swaps. For CVA, the current swaps market data (see below) is used in conjunction with Monte Carlo simulation of a market calibrated one factor Hull-White model for interest rates. The counterparty credit curve selection varies between investment grade and high yield. For FVA, the funding spreads and volatility data are taken from Markit. The swaps portfolios are shown as well. All calculations are done in Matlab using the Financial Instruments Toolbox (Matlab, 2019).

As of April 20, 2020, the 5y par interest rate swap rate is 0.47% (on Bloomberg). The full interest rate swaps curve is shown in Table 2. All market data displayed below is for this date.

Table 2: Swap Rates

Swap Tenor   1y      2y      3y      5y      7y      10y     30y
Swap Rate    0.515%  0.409%  0.401%  0.470%  0.569%  0.691%  0.855%

Bloomberg shows the interest rate swaption volatility matrix (with option expirations as rows and swap tenors as columns).

Table 3: Swaption Normal Volatilities

Exp / Tenor  2y      3y      5y      7y      10y
2y           0.520%  0.542%  0.601%  0.631%  0.680%
3y           0.577%  0.592%  0.622%  0.640%  0.671%
5y           0.637%  0.637%  0.637%  0.643%  0.652%
7y           0.640%  0.639%  0.636%  0.636%  0.636%
10y          0.639%  0.633%  0.624%  0.618%  0.612%

Furthermore, Markit shows U.S. CDX investment grade and high yield 5y credit default swap spreads as in Table 4. The firm and counterparty investment grade credit spreads are set to 100 and 150 basis points respectively. The high yield credit spreads are shown in Table 5. Referencing Markit funding spreads, the funding spread curves are shown in Table 6. Unavailable quotes for high yield spreads are displayed as "N/A". This term structure of funding spreads is used for the FVA analysis. Funding spread lognormal volatility is set to exponential decay. For investment grade it decays from 85% down to about 31% in 10 years. For high yield it decays from 35% down to about 13% in 10 years.

The swaps portfolios for the CVA and FVA studies are shown in Tables 7 and 8. All 10 swaps are used for the 30y Monte Carlo simulation for CVA. The last 4 are capped at 10y maturity for FVA, as we have (only) 10y of funding market data.

Table 7: CVA Swaps Portfolio

Issued    Notional  Maturity  Rec / Pay Fixed  Coupon  Freq
4/20/20   10        4/20/21   Rec              0.51%   quarterly
4/20/20   10        4/20/22   Pay              0.41%   quarterly
4/20/20   10        4/20/23   Pay              0.40%   quarterly
4/20/20   10        4/20/25   Rec              0.47%   quarterly
4/20/20   10        4/20/27   Pay              0.57%   quarterly
4/20/20   10        4/20/30   Rec              0.69%   quarterly
4/20/20   10        4/20/35   Rec              0.74%   quarterly
4/20/20   10        4/20/40   Rec              0.83%   quarterly
4/20/20   10        4/20/45   Pay              0.83%   quarterly
4/20/20   10        4/20/50   Pay              0.85%   quarterly

A natural question to ask when computing worst case XVA is how to interpret the size of the Wasserstein radius δ. A discussion of some key results is given in (Carlsson et al., 2018, Section 3). For this study, we adopt a fairly straightforward approach to compute upper and lower bounds for the expected Wasserstein distance between the empirical and true distributions. A rough procedure for selecting δ involves sampling two independent data sets $D_1$ and $D_2$, and setting $\delta = \alpha c^*$ where $\alpha \in [1/2, 1]$ and $c^*$ denotes the cost of the minimum bipartite matching between $D_1$ and $D_2$ (Carlsson et al., 2018; Canas and Rosasco, 2012). This approach relies on the following theorem referenced in Carlsson et al. (2018) and established in Canas and Rosasco (2012).

Theorem.

Let $\hat{f}_1$ and $\hat{f}_2$ denote empirical distributions associated with two sets of independent samples of n points from a distribution f. Then
$$\tfrac{1}{2} E[D_c(\hat{f}_1, \hat{f}_2)] \le E[D_c(f, \hat{f}_1)] \le E[D_c(\hat{f}_1, \hat{f}_2)].$$

As such, our approach is to sample two independent data sets $D_1$ and $D_2$ of portfolio exposures and default times and compute lower and upper bounds $\delta_l := \tfrac{1}{2} E[D_c(\hat{f}_1, \hat{f}_2)]$ and $\delta_u := E[D_c(\hat{f}_1, \hat{f}_2)]$ for the expected Wasserstein distance between the empirical and true distributions. Given these bounds, one can compute the corresponding lower and upper bounds on the worst case XVA risk metrics and on the exposure and default time distributions.

Constructing the bounds $\delta_l$ and $\delta_u$ in this way builds in a dependency on the units of portfolio exposures (e.g. millions of dollars) and units in the time dimension (e.g. years), through the computation of $D_c(\hat{f}_1, \hat{f}_2)$ and the calibration of the scale factor S (see Section 3.3 below). Such a dependency is desirable, both to assign "units" to δ and to conduct relative value analysis across portfolios. See Section 3.4 below for more commentary on the choice of units for exposures.

S for CVA

The scale factor S represents a scaling for changes to default times. A suitable choice for S is one that charges an appropriate cost for such changes. Consider what a change in default time means in the context of CVA. For a fixed path with index i and exposure vector $x^{\pm}_i$, changing the default time from $\tau_1$ to $\tau_2$ changes the value of the realized exposure upon default from $x^{\pm}_{i\tau_1}$ to $x^{\pm}_{i\tau_2}$. A reasonable value for S, call it s, for this particular path might be $\| x^{\pm}_{i\tau_1} - x^{\pm}_{i\tau_2} \|_\infty$ where $\tau_1, \tau_2 \in \{1, \ldots, n\}$. Now generalize this to average over all paths $i \in \{1, \ldots, N\}$ in our empirical distribution $\Phi_N$. Let $\bar{x}^{\pm}_{\tau}$ denote $\frac{1}{N} \sum_{i=1}^{N} x^{\pm}_{i\tau}$, the average exposure at default time τ. Substituting average exposures into the previous expression gives the relation $S := \| \bar{x}^{\pm}_{\tau_1} - \bar{x}^{\pm}_{\tau_2} \|_\infty$. We use this as our working definition of S for unilateral CVA and DVA. Calibration is straightforward given $\Phi_N$, the set of sample paths $\{ (x_i, y^c_i, y^f_i) : i \in \{1, \ldots, N\} \}$. For bilateral CVA, take the average of the unilateral CVA and DVA scale factors, namely $S := \tfrac{1}{2} \big( \| \bar{x}^{+}_{\tau_1} - \bar{x}^{+}_{\tau_2} \|_\infty + \| \bar{x}^{-}_{\tau_1} - \bar{x}^{-}_{\tau_2} \|_\infty \big)$.

S for FVA

We follow the same approach for FVA. For a fixed path with index i, the funding exposure vector is $z^{\pm}_i$ and the incremental change is $\Delta z^{\pm}_{i\tau}$. A reasonable value for S, call it s, for this particular path might be $\| z^{\pm}_{i\tau_1} - z^{\pm}_{i\tau_2} \|_\infty$. Substituting average exposures into this expression gives the relation $S := \| \bar{z}^{\pm}_{\tau_1} - \bar{z}^{\pm}_{\tau_2} \|_\infty$. We use this as our working definition of S for FCA and FBA. Calibration is straightforward given $\Phi_N$, the set of sample paths $\{ (z_i, y^{cf}_i) : i \in \{1, \ldots, N\} \}$. For FVA, take the average of the FCA and FBA scale factors, namely $S := \tfrac{1}{2} \big( \| \bar{z}^{+}_{\tau_1} - \bar{z}^{+}_{\tau_2} \|_\infty + \| \bar{z}^{-}_{\tau_1} - \bar{z}^{-}_{\tau_2} \|_\infty \big)$.

Standardizing the units across portfolios is useful for relative value analysis. The choice of units for exposures (e.g. millions of dollars) and default times (e.g. decimal years) is up to the user, although we recommend these conventions and use them in our analysis in this section. Note that different choices of units will lead to different calibrated values of S for BCVA and FVA. There is no one choice for units (as in regression analysis, for example), although consistency is recommended as good practice. The same comments apply to the choices of time frequency and time horizon for the robust XVA analysis.
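The calibration of S and the bounds $\delta_l$, $\delta_u$ described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the procedure used for the reported numbers. It reads the ∞-norm in the definition of S as the maximum of $|\bar{x}_{\tau_1} - \bar{x}_{\tau_2}|$ over all pairs of time indices, takes $x^- = \min(x, 0)$, uses the transport cost $c_S((u, v), (x, y)) = \langle u - x, u - x \rangle + S \langle v - y, v - y \rangle$ from Section 2, divides the matching cost by the sample size so that it equals the Wasserstein distance between the two empirical measures, and replaces the expectation in the theorem with a single-sample estimate. The data sets `d1` and `d2` are hypothetical, and the brute-force matching is only practical for very small samples.

```python
from itertools import permutations


def average_profile(sample, part):
    """Cross-path average of part(x_t) at each time index t."""
    n_times = len(sample[0][0])
    return [sum(part(x[t]) for x, _ in sample) / len(sample) for t in range(n_times)]


def calibrate_scale(avg_profile):
    """max over (tau1, tau2) of |xbar[tau1] - xbar[tau2]|, i.e. the profile's range."""
    return max(avg_profile) - min(avg_profile)


def transport_cost(p, q, S):
    """c_S((u, v), (x, y)) = <u - x, u - x> + S * <v - y, v - y>."""
    (u, v), (x, y) = p, q
    exposure_part = sum((a - b) ** 2 for a, b in zip(u, x))
    default_part = sum((a - b) ** 2 for a, b in zip(v, y))
    return exposure_part + S * default_part


def matching_distance(d1, d2, S):
    """Minimum-cost bipartite matching between equal-size samples, divided by n."""
    n = len(d1)
    best = min(
        sum(transport_cost(d1[i], d2[j], S) for i, j in enumerate(perm))
        for perm in permutations(range(n))
    )
    return best / n


# Hypothetical samples: (exposure vector, default-time indicator vector) per path.
d1 = [((1.0, 2.0), (1, 0)), ((0.5, -1.0), (0, 1)), ((2.0, 0.0), (0, 0))]
d2 = [((1.2, 1.8), (1, 0)), ((1.9, 0.1), (0, 1)), ((0.4, -0.8), (0, 1))]

# Bilateral scale factor: average of the positive- and negative-exposure scale factors.
s_pos = calibrate_scale(average_profile(d1, lambda v: max(v, 0.0)))
s_neg = calibrate_scale(average_profile(d1, lambda v: min(v, 0.0)))
S = 0.5 * (s_pos + s_neg)

delta_u = matching_distance(d1, d2, S)  # single-sample estimate of E[D_c(f1_hat, f2_hat)]
delta_l = 0.5 * delta_u
print("S:", S, "delta_l:", delta_l, "delta_u:", delta_u)
```

For realistic sample sizes the permutation search would be replaced by a polynomial-time assignment solver; the brute-force version is kept here only so that the sketch has no dependencies.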
3.5 Definitions for Exposure Calculations

The definitions for the various exposure calculations plotted in Section 3.6 for CVA and DVA are given in Table 9 below. For FVA calculations (FCA and FBA), plotted in Section 3.7, replace portfolio exposures $V^+$ and $V^-$ with funding exposures $Z^+$ and $Z^-$ respectively.

Table 9: CVA Exposure Calculations

Term             $CVA^U$                                                  $DVA^U$
$EE(t)$          $E[V^+(t)]$                                              $E[V^-(t)]$
$PFE_\alpha(t)$  $\inf\{x \in \mathbb{R} : \alpha \le F_{V^+(t)}(x)\}$    $\inf\{x \in \mathbb{R} : \alpha \le F_{V^-(t)}(x)\}$
$EPE(t)$         $\frac{1}{T} \int_0^T EE(t)\,dt$                         $\frac{1}{T} \int_0^T EE(t)\,dt$
$EffEE(t)$       $\max\{EE(\tau) : \tau \in [0, t]\}$                     $\max\{EE(\tau) : \tau \in [0, t]\}$
$EffEPE(t)$      $\frac{1}{T} \int_0^T EffEE(t)\,dt$                      $\frac{1}{T} \int_0^T EffEE(t)\,dt$

The swaps portfolio shown in Table 7 is used for this analysis. The portfolio consists of ten par coupon interest rate swaps, with a mix of receiving fixed and paying fixed swaps at different maturities. The investment grade firm and counterparty credit spreads are set to 100 and 150 basis points respectively. The calibrated value of S is 1.4584, which results in $\delta_l = 14.414$ and $\delta_u = 28.828$ using a second set of Bloomberg market data (for 03/20/20) along with the first set for 04/20/20. The full range of Wasserstein radii δ is given in Table 10.

Table 10: BCVA Wasserstein Radii

Percentage of $\delta_u$   50%    60%    70%    80%    90%    100%
W Radius delta             14.41  17.30  20.18  23.06  25.95  28.83

Matlab plots characterizing the BCVA positive and negative exposure profiles and trajectory of worst case BCVA as a function of Wasserstein radius are shown in Figures 1, 2, and 3.

Figure 1: Swaps Portfolio Positive Exposure Proﬁles
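The exposure definitions in Table 9 can be estimated from simulated portfolio values along the following lines. This is a minimal sketch, assuming a uniform time grid (so that the time integrals reduce to simple averages), the empirical CDF for the PFE quantile, and positive exposure $V^+ = \max(V, 0)$; the four simulated paths are hypothetical and far too few for any real study.

```python
import math


def expected_exposure(paths):
    """EE(t) = E[V^+(t)], estimated as the cross-path average of max(V(t), 0)."""
    n_paths, n_times = len(paths), len(paths[0])
    return [sum(max(p[t], 0.0) for p in paths) / n_paths for t in range(n_times)]


def potential_future_exposure(paths, level):
    """PFE_level(t) = inf{x : level <= F_{V^+(t)}(x)} using the empirical CDF."""
    n_paths = len(paths)
    rank = max(math.ceil(level * n_paths), 1)  # smallest k with k / N >= level
    profile = []
    for t in range(len(paths[0])):
        ordered = sorted(max(p[t], 0.0) for p in paths)
        profile.append(ordered[rank - 1])
    return profile


def effective_ee(ee):
    """EffEE(t) = max{EE(tau) : tau in [0, t]}, i.e. the running maximum of EE."""
    running, profile = float("-inf"), []
    for value in ee:
        running = max(running, value)
        profile.append(running)
    return profile


def time_average(profile):
    """(1/T) * integral of the profile over [0, T] on a uniform grid."""
    return sum(profile) / len(profile)


# Hypothetical simulated portfolio values: one list per path, one entry per date.
paths = [
    [1.0, 2.0, -1.0],
    [0.0, -0.5, 0.5],
    [2.0, 1.0, 0.0],
    [-1.0, 3.0, 1.5],
]

ee = expected_exposure(paths)
pfe = potential_future_exposure(paths, 0.75)
eff_ee = effective_ee(ee)
epe = time_average(ee)
eff_epe = time_average(eff_ee)
print(ee, pfe, eff_ee, epe, eff_epe)
```

The DVA column of Table 9 follows by applying the same functions to the negative exposure in place of the positive exposure.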

The baseline BCVA for this portfolio is approximately 160k USD and represents the dot product of the discounted positive portfolio exposure profile times counterparty default probability plus the dot product of the discounted negative portfolio exposure times firm default probability. The worst case BCVA curve is shown in Figure 3. The worst case BCVA curve ranges from 69% to 93% of the size of Max PFE (Potential Future Exposure), which is equal to 6.43mm USD (see Figure 1), for the Wasserstein radii δ given in Table 10. The takeaway is that worst case BCVA can be a significant percentage of PFE for swap portfolios with low risk (investment grade) counterparty default curves.

The worst case distribution for $\delta_u$ is shown in Figures 4 and 5. The first plot shows the exposures $\{x^*_i\}$ and the second plot shows the joint distribution of counterparty and firm default times $\{y^{c*}_i, y^{f*}_i\}$. Default times beyond the portfolio maturity date denote no default prior to portfolio maturity for those simulation paths. This results in higher contours in the back row.

The swaps portfolio shown in Table 7 is used for this analysis. The portfolio consists of ten par coupon interest rate swaps, with a mix of receiving fixed and paying fixed swaps at different maturities. The high yield counterparty credit spreads are set as in Table 5. The investment grade firm credit spreads are set to a constant 100 basis points. The calibrated value of S is 1.4584, which results in $\delta_l = 14.45$ and $\delta_u = 28.90$ using a second set of Bloomberg market data (for 03/20/20) along with the first set for 04/20/20. The full range of Wasserstein radii δ is given in Table 11.

Table 11: BCVA Wasserstein Radii

Percentage of $\delta_u$   50%    60%    70%    80%    90%    100%
W Radius delta             14.45  17.34  20.23  23.12  26.01  28.90

Matlab plots characterizing the BCVA positive and negative exposure profiles and trajectory of worst case BCVA as a function of Wasserstein radius are shown in Figures 6, 7, and 8.

Figure 6: Swaps Portfolio Positive Exposure Proﬁles

The baseline BCVA for this portfolio is approximately 106k USD and represents the dot product of the discounted positive portfolio exposure profile times counterparty default probability plus the dot product of the discounted negative portfolio exposure times firm default probability. The worst case BCVA curve is shown in Figure 8. Note that for this problem instance, the worst case BCVA results for high yield counterparty credit are similar to those of the previous subsection for investment grade counterparty credit. The worst case BCVA ranges from 70% to 95% of the size of Max PFE (Potential Future Exposure), which is equal to 6.43mm USD (see Figure 6), for the Wasserstein radii δ given in Table 11. The takeaway is that worst case BCVA can be a significant percentage of PFE for swap portfolios with high yield counterparty default curves as well.

Figure 9: Worst Case Exposures

The worst case distribution for delta value $\delta_u$ is shown in Figures 9 and 10. The first plot shows the exposures $\{x^*_i\}$ and the second plot shows the joint distribution of counterparty and firm default times $\{y^{c*}_i, y^{f*}_i\}$. Default times beyond the portfolio maturity date denote no default prior to portfolio maturity for those simulation paths. This results in higher contours in the joint density plot in the back row. Higher counterparty credit spreads lead to earlier counterparty default times.

The swaps portfolio shown in Table 8 is used for this analysis. The portfolio consists of ten interest rate swaps, with a mix of receiving fixed and paying fixed swaps at different maturities. Capping maturities at 10y introduces some positive NPV to this portfolio. The investment grade firm and counterparty funding spreads are set as shown in Table 6. The calibrated value of S is 0.082 which results in δ l = .

387 and δ u = .

774 using a second set of Bloomberg market data (for 03/20/20) along with the first set for 04/20/20. The full range of Wasserstein radii δ is given in Table 12.

Table 12: FVA Wasserstein Radii
Percentage of $\delta_u$

Matlab plots characterizing the FVA positive and negative exposure profiles and trajectory of worst case FVA as a function of Wasserstein radius are shown in Figures 13, 14, and 15.

Figure 11: Swaps Portfolio Positive Exposure Proﬁles

The baseline FVA for this portfolio is 240k USD and represents the dot product of the discounted portfolio FCA exposure profile times joint survival probability plus the dot product of the discounted portfolio FBA exposure times joint survival probability.
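The dot-product description of baseline FVA above can be written out as a short sketch. The joint survival curve below assumes independent defaults with flat hazard rates, which is a simplification introduced here for illustration and is not stated in the text; the exposure profiles, time grid, and hazard rates are hypothetical, and each profile is assumed to carry its own sign and to already include discounting and the funding spread.

```python
import math


def joint_survival(times, hazard_c, hazard_f):
    """P(neither party has defaulted by t), assuming independence and flat hazards."""
    return [math.exp(-(hazard_c + hazard_f) * t) for t in times]


def baseline_fva(fca_profile, fba_profile, survival):
    """<FCA exposure, survival> + <FBA exposure, survival>."""
    fca = sum(e * p for e, p in zip(fca_profile, survival))
    fba = sum(e * p for e, p in zip(fba_profile, survival))
    return fca + fba


# Hypothetical discounted funding exposure profiles on a quarterly grid.
times = [0.25, 0.50, 0.75, 1.00]
fca_profile = [0.10, 0.12, 0.09, 0.05]
fba_profile = [-0.02, -0.03, -0.01, 0.00]

survival = joint_survival(times, hazard_c=0.015, hazard_f=0.010)
fva = baseline_fva(fca_profile, fba_profile, survival)
print("baseline FVA:", fva)
```

The robust formulation perturbs both the exposure profiles and the survival times around this baseline, which is why the worst case values reported below are quoted relative to it.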

The worst case FVA curve is shown in Figure 15. For Wasserstein radius δ l = . δ u = . δ u is shown in Figures 16 and 17. The first plot shows the exposures $\{z^*_i\}$ and the second plot shows the joint distribution of counterparty and firm survival times $\{y^{cf*}_i\}$. Survival times beyond the portfolio maturity date denote no defaults prior to portfolio maturity for those simulation paths.

The swaps portfolio shown in Table 8 is used for this analysis. The portfolio consists of ten par coupon interest rate swaps, with a mix of receiving fixed and paying fixed swaps at different maturities. The high yield firm and counterparty funding spreads are set as shown in Table 6. The high yield counterparty credit spreads are set as shown in Table 5. The investment grade firm credit spreads are set to a constant 100 basis points. The calibrated value of S is 0.2898 which results in δ l = . δ u = .

87 using a second set of Bloomberg market data (for 03/20/20) along with the first set for 04/20/20. The full range of Wasserstein radii δ is given in Table 13. Matlab plots characterizing the FVA positive and negative exposure profiles and trajectory of worst case FVA as a function of Wasserstein radius are shown in Figures 20, 21, and 22.

Table 13: FVA Wasserstein Radii
Percentage of $\delta_u$

The baseline FVA for this portfolio is 1.18mm USD and represents the dot product of the discounted portfolio FCA exposure profile times joint survival probability plus the dot product of the discounted portfolio FBA exposure times joint survival probability. The worst case FVA curve is shown in Figure 22. For Wasserstein radius δ l = . δ u = .

87, the worst case FVA is approximately 12.44, or 1.75 times the size of integrated FCA PFE. In this problem instance, similar to the investment grade example, worst case FVA is a multiple of integrated FCA PFE exposure, so quite significant.

The worst case distribution for delta value $\delta_u$ is shown in Figures 23 and 24. The first plot shows the exposures $\{z^*_i\}$ and the second plot shows the joint distribution of counterparty and firm survival times $\{y^{cf*}_i\}$. Survival times beyond the portfolio maturity date denote no defaults prior to portfolio maturity for those simulation paths.

This work has developed theoretical results and investigated calculations of robust CVA, FVA, and wrong way risk for OTC derivatives under distributional uncertainty using Wasserstein distance as an ambiguity measure. The financial market overview, foundational notation, and robust XVA primal problems were introduced in Section 1. Using recent infinite dimensional Lagrangian duality results (Blanchet and Murthy, 2019), the simpler dual formulations and their analytic solutions for BCVA and FVA were derived in Section 2. After that, in Section 3, some computational experiments were conducted to measure the additional XVA charge due to distributional uncertainty for a variety of portfolio and market configurations. Worst case BCVA and FVA were found to be significant relative to their respective PFE profiles in all problem instances. Finally, we conclude with some commentary on directions for further research.

One direction for future research, as previously discussed, is to extend the problem formulations to include a risk neutral measure constraint in a solvable way. Explicitly adding the constraint would no doubt complicate the problem formulations, so perhaps there is a more tractable indirect approach. Another direction for future research would be to develop (and apply) theoretical machinery similar to that used for robust CVA and FVA towards robust KVA (Capital Valuation Adjustment) and MVA (Margin Valuation Adjustment) and wrong way risk in that context.
Intuitively, wrong way risk arises in that context when the market cost of capital and/or funding the margin position increases at the same time as the portfolio exposure increases.

Data Availability Statement

The raw and/or processed data required to reproduce the findings from this research can be obtained from the corresponding author, [D.S.], upon reasonable request.

Conﬂict of Interest Statement

The authors declare they have no conﬂict of interest.

Funding Statement

The authors received no specific funding for this work.

References

Bartl, D., Drapeau, S., and Tangpi, L. (2017). Computational aspects of robust optimized certainty equivalents and option pricing.

Basel Committee on Banking Supervision (2015). Review of the credit valuation adjustment risk framework.

Blanchet, J., Chen, L., and Zhou, X. Y. (2018). Distributionally robust mean-variance portfolio selection with Wasserstein distances.

Blanchet, J., Kang, Y., and Murthy, K. (2016a). Robust Wasserstein profile inference and applications to machine learning. arXiv preprint arXiv:1610.05627.

Blanchet, J., Kang, Y., and Murthy, K. (2016b). Robust Wasserstein profile inference and applications to machine learning. arXiv preprint arXiv:1610.05627.

Blanchet, J. and Murthy, K. (2019). Quantifying distributional model risk via optimal transport. Mathematics of Operations Research, 44(2):565–600.

Brigo, D., Morini, M., and Pallavicini, A. (2013). Counterparty credit risk, collateral and funding: with pricing cases for all asset classes, volume 478. John Wiley & Sons.

Canas, G. and Rosasco, L. (2012). Learning probability measures with respect to optimal transport metrics. In Advances in Neural Information Processing Systems, pages 2492–2500.

Carlsson, J. G., Behroozi, M., and Mihic, K. (2018). Wasserstein distance and the distributionally robust TSP. Operations Research, 66(6):1603–1624.

El Hajjaji, O. and Subbotin, A. (2015). CVA with wrong way risk: Sensitivities, volatility and hedging. International Journal of Theoretical and Applied Finance, 18(03):1550017.

Esfahani, P. M. and Kuhn, D. (2018). Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations. Mathematical Programming, 171(1-2):115–166.

Gao, R., Chen, X., and Kleywegt, A. J. (2017). Wasserstein distributional robustness and regularization in statistical learning. arXiv preprint arXiv:1712.06050.

Gao, R. and Kleywegt, A. J. (2016). Distributionally robust stochastic optimization with Wasserstein distance. arXiv preprint arXiv:1604.02199.

Glasserman, P. and Yang, L. (2015). Bounding wrong-way risk in measuring counterparty risk. Office of Financial Research Working Paper, (15-16):15–76.

Green, A. (2015). XVA: Credit, Funding and Capital Valuation Adjustments. John Wiley & Sons.

Lichters, R., Stamm, R., and Gallagher, D. (2015). Modern derivatives pricing and credit exposure analysis: theory and practice of CSA and XVA pricing, exposure simulation and backtesting. Springer.

Matlab (2019). Matlab, counterparty credit risk and CVA. Accessed: 2019-07-30.

Memartoluie, A. (2017). Computational methods in finance related to distributions with known marginals.

Office of the Comptroller of the Currency (2011). Interagency supervisory guidance on counterparty credit risk management. Bulletin 2011-30a.

Ramzi Ben-Abdallah, M. B. and Marzouk, O. (2019). Wrong-way risk of interest rate instruments. Journal of Credit Risk, pages 21–44.

Singh, D. and Zhang, S. (2020). Robust arbitrage conditions for financial markets. arXiv preprint arXiv:2004.09432.

Supplement for Theory: Robust XVA and Wrong Way Risk (Section 2)

Proposition 1.

Suppose w ∗ = , w ∗ =

0. Then Ψ α ( x i , y ci , y fi ) = sup w ∈ R n (cid:2) (cid:104) ( w + x i ) + , { y ci < y fi } y ci (cid:105) + (cid:104) ( w + x i ) − , { y fi < y ci } y fi (cid:105) − α ( (cid:104) w , w (cid:105) ) (cid:3) . a) Suppose ( y ci < y fi ) =

1. Then Ψ α ( x i , y ci , y fi ) = sup w ∈ R n (cid:2) (cid:104) ( w + x i ) + , y ci (cid:105) − α ( (cid:104) w , w (cid:105) ) (cid:3) . Therefore (cid:107) y ci (cid:107) =

1. Let τ denote default time for y ci . Simplify further to get Ψ α ( x i , y ci , y fi ) = sup w τ ∈ R (cid:2) ( w τ + x i τ ) + − α ( w τ ) (cid:3) . Now follow the approach in Bartl et al. (2017) to write down the ﬁrst order optimality condition: [ , ∞ ) ( w τ + x i τ ) − α w τ ≤ ≤ ( , ∞ ) ( w τ + x i τ ) − α w τ . i) Suppose ( w ∗ τ + x i τ ) <

0. Then w ∗ τ =

0. So x i τ < = ⇒ w ∗ i τ = ( w ∗ τ + x i τ ) >

0. Then w ∗ τ = α . So x i τ > − α = ⇒ w ∗ τ = α .iii) Note ( w ∗ τ + x i τ ) = x i τ above, there are three cases as below.i) x i τ ≥ = ⇒ w ∗ τ = α = ⇒ Ψ α = [ α + x i τ ] .ii) x i τ ≤ − α = ⇒ w ∗ τ = = ⇒ Ψ α = ( − α < x i τ < ) = ⇒ Ψ α = [ α + x i τ ] + .In summary, considering all cases above, conclude that Ψ a α ( x i , y ci , y fi ) = (cid:2) α + x i τ (cid:3) + . This can also be expressed as Ψ a α ( x i , y ci , y fi ) = (cid:2) α + (cid:104) x i , y ci (cid:105) (cid:3) + . ) Suppose ( y fi < y ci ) =

1. Then Ψ α ( x i , y ci , y fi ) = sup w ∈ R n (cid:2) (cid:104) ( w + x i ) − , y fi (cid:105) − α ( (cid:104) w , w (cid:105) ) (cid:3) . Therefore (cid:107) y fi (cid:107) =

1. Let τ denote default time for y fi . Simplify further to get Ψ α ( x i , y ci , y fi ) = sup w τ ∈ R (cid:2) ( w τ + x i τ ) − − α ( w τ ) (cid:3) . Now follow the approach in Bartl et al. (2017) to write down the ﬁrst order optimality condition: ( − ∞ , ] ( w τ + x i τ ) − α w τ ≤ ≤ ( − ∞ , ) ( w τ + x i τ ) − α w τ . i) Suppose ( w ∗ τ + x i τ ) >

0. Then w ∗ τ =

0. So x i τ > = ⇒ w ∗ τ = ( w ∗ τ + x i τ ) <

0. Then w ∗ τ = α . So x i τ < − α = ⇒ w ∗ τ = α .iii) Note ( w ∗ τ + x i τ ) = x i τ above, there are three cases as below.i) x i τ > = ⇒ w ∗ τ = = ⇒ Ψ α = x i τ < − α = ⇒ w ∗ i τ = α = ⇒ Ψ α = [ α + x i τ ] .iii) [ − α ≤ x i τ ≤ ] = ⇒ w ∗ τ = | x i τ | .Note the slope ( − α w τ ) is positive for 0 ≤ w τ < α , and equals zero at w τ = α .However, ( w τ + x i τ ) − attains its max value of zero for w i τ = | x i τ | so stop there.In summary, considering all cases above, conclude that Ψ b α ( x i , y ci , y fi ) = (cid:20) { ( x i τ < − α ) ∨ ( x i τ > ) } (cid:2) α + x i τ (cid:3) − − {− α ≤ x i τ ≤ } (cid:2) α ( x i τ ) (cid:3)(cid:21) . This can also be expressed as Ψ b α ( x i , y ci , y fi ) = (cid:20) { ( x i τ < − α ) ∨ ( x i τ > ) } (cid:2) α + (cid:104) x i , y fi (cid:105) (cid:3) − − {− α ≤ x i τ ≤ } (cid:2) α ( (cid:104) x i , y fi (cid:105) ) (cid:3)(cid:21) . c) Suppose ( (cid:107) y fi (cid:107) = (cid:107) y ci (cid:107) = ) = Ψ α =

0. Note there is no third subcase for Cases 2-4 below since that would imply w ∗ = , w ∗ = Ψ α ( x i , y ci , y fi ) = ( y ci < y fi ) Ψ a α ( x i , y ci , y fi ) + ( y fi < y ci ) Ψ b α ( x i , y ci , y fi ) . Case 2

Suppose $w_2^* \neq 0$, $w_3^* = 0$. Here $w_2^*$ has $+1$ in position $\tau_1^*$ and $-1$ in position $\tau_0$, with $\tau_1^* \neq \tau_0$ (otherwise $w_2^* = 0$). Then
$$\Psi_\alpha(x_i, y_{ci}, y_{fi}) = \sup_{w_1 \in \mathbb{R}^n,\ w_2 \in B^n} \Big[ \big\langle (w_1 + x_i)^+, \mathbb{1}\{w_2 + y_{ci} < y_{fi}\}\,(w_2 + y_{ci}) \big\rangle + \big\langle (w_1 + x_i)^-, \mathbb{1}\{y_{fi} < w_2 + y_{ci}\}\, y_{fi} \big\rangle - \alpha\big(\langle w_1, w_1 \rangle + S \langle w_2, w_2 \rangle\big) \Big].$$

a) Suppose $\mathbb{1}(w_2 + y_{ci} < y_{fi}) = 1$. Then
$$\Psi_\alpha(x_i, y_{ci}, y_{fi}) = \sup_{w_1 \in \mathbb{R}^n,\ w_2 \in B^n} \Big[ \big\langle (w_1 + x_i)^+, w_2 + y_{ci} \big\rangle - \alpha\big(\langle w_1, w_1 \rangle + S \langle w_2, w_2 \rangle\big) \Big].$$
Recall $\langle (w_1 + x_i), (w_2 + y_{ci}) \rangle = (w_{1\tau_1} + x_{i\tau_1})$. Also recall $\tau_1$ and $\tau_0$ are associated with $y_{ci}$. Let $\tau_{0,f}$ denote the default time (index) for $y_{fi}$. The default time constraint implies $\tau_1 < \tau_{0,f}$. Therefore $\tau_1 > 0$. The structure of the finite set $B^n$ implies
$$\Psi_\alpha(x_i, y_{ci}, y_{fi}) = \sup_{w_1 \in \mathbb{R}^n,\ 0 < \tau_1 < \tau_{0,f},\ \tau_1 \neq \tau_0} \Big[ (w_{1\tau_1} + x_{i\tau_1})^+ - \alpha\big(\langle w_1, w_1 \rangle + S \langle w_2, w_2 \rangle\big) \Big].$$
Observe the only positive component for $w_1 \in \mathbb{R}^n$ in the sup above is $\tau_1$:
$$\sup_{w_1 \in \mathbb{R}^n} \Big[ (w_{1\tau_1} + x_{i\tau_1})^+ - \alpha \langle w_1, w_1 \rangle \Big] = \sup_{w_{1\tau_1} \in \mathbb{R}} \Big[ (w_{1\tau_1} + x_{i\tau_1})^+ - \alpha (w_{1\tau_1})^2 \Big].$$
Evaluating at the critical point $w^*_{1\tau_1} = \frac{1}{2\alpha} \in \mathbb{R}$ for the above quadratic gives
$$\sup_{w_{1\tau_1} \in \mathbb{R}} \Big[ (w_{1\tau_1} + x_{i\tau_1})^+ - \alpha (w_{1\tau_1})^2 \Big] = \left[\tfrac{1}{4\alpha} + x_{i\tau_1}\right]^+.$$
Therefore one can write
$$\Psi_\alpha(x_i, y_{ci}, y_{fi}) = \max_{0 < \tau_1 < \tau_{0,f},\ \tau_1 \neq \tau_0} \left[\tfrac{1}{4\alpha} + x_{i\tau_1}\right]^+ - \alpha S K_a$$
where $K_a := (1 + \mathbb{1}\{\tau_0 \neq 0\})$. Furthermore, $\tau_1^*$ is determined as
$$\tau_1^* = \arg\max_{0 < \tau_1 < \tau_{0,f},\ \tau_1 \neq \tau_0} \left[x^+_{i\tau_1}\right].$$
Substituting back into the expression for $\Psi_\alpha$ gives
$$\Psi^a_\alpha(x_i, y_{ci}, y_{fi}) = \left[ \left[\tfrac{1}{4\alpha} + x_{i\tau_1^*}\right]^+ - \alpha S K_a \right].$$
This can also be expressed as
$$\Psi^a_\alpha(x_i, y_{ci}, y_{fi}) = \left[ \left[\tfrac{1}{4\alpha} + \langle x_i, y_{ci} \rangle + (x_{i\tau_1^*} - x_{i\tau_0})\right]^+ - \alpha S K_a \right].$$

b) Suppose $\mathbb{1}(y_{fi} < w_2 + y_{ci}) = 1$. Then
$$\Psi_\alpha(x_i, y_{ci}, y_{fi}) = \sup_{w_1 \in \mathbb{R}^n,\ w_2 \in B^n} \Big[ \big\langle (w_1 + x_i)^-, y_{fi} \big\rangle - \alpha\big(\langle w_1, w_1 \rangle + S \langle w_2, w_2 \rangle\big) \Big].$$
Recall $\tau_1$ and $\tau_0$ are associated with $y_{ci}$. Let $\tau_{0,f}$ denote the default time (index) for $y_{fi}$. The default time constraint implies $\tau_{0,f} < \tau_1$. Therefore $\tau_{0,f} > 0 \implies \|y_{fi}\| = 1$. Note the only non-zero component of $y_{fi}$ is $\tau_{0,f}$. Hence set $w^*_{1\tau} = 0 \ \forall\, \tau \neq \tau_{0,f}$. Simplifying further,
$$\Psi_\alpha(x_i, y_{ci}, y_{fi}) = \sup_{w_{1\tau_{0,f}} \in \mathbb{R},\ w_2 \in B^n} \Big[ (w_{1\tau_{0,f}} + x_{i\tau_{0,f}})^- - \alpha\big((w_{1\tau_{0,f}})^2 + S K_b\big) \Big]$$
where $K_b := (\mathbb{1}\{\tau_1 \neq 0\} + \mathbb{1}\{\tau_0 \neq 0\}) = 1$. For $K_b$, if $\tau_0 = 0$, then $\tau_1 \neq 0$ since $w_2^* \neq 0$. Otherwise set $\tau_1 = 0$, $\tau_0 \neq 0$ in $w_2$ above. Following the calculations in Case 1b) above, conclude that
$$\Psi^b_\alpha(x_i, y_{ci}, y_{fi}) = \left[ \mathbb{1}\left\{(x_{i\tau} < -\tfrac{1}{2\alpha}) \vee (x_{i\tau} > 0)\right\} \left[\tfrac{1}{4\alpha} + x_{i\tau}\right]^- - \mathbb{1}\left\{-\tfrac{1}{2\alpha} \le x_{i\tau} \le 0\right\} \left[\alpha (x_{i\tau})^2\right] - \alpha S K_b \right].$$
This can also be expressed as
$$\Psi^b_\alpha(x_i, y_{ci}, y_{fi}) = \left[ \mathbb{1}\left\{(x_{i\tau} < -\tfrac{1}{2\alpha}) \vee (x_{i\tau} > 0)\right\} \left[\tfrac{1}{4\alpha} + \langle x_i, y_{fi} \rangle\right]^- - \mathbb{1}\left\{-\tfrac{1}{2\alpha} \le x_{i\tau} \le 0\right\} \left[\alpha \langle x_i, y_{fi} \rangle^2\right] - \alpha S K_b \right].$$

Finally, to sum up Case 2, considering parts a) and b), let us write:
$$\Psi_\alpha(x_i, y_{ci}, y_{fi}) = \mathbb{1}(w_2 + y_{ci} < y_{fi})\, \Psi^a_\alpha(x_i, y_{ci}, y_{fi}) + \mathbb{1}(y_{fi} < w_2 + y_{ci})\, \Psi^b_\alpha(x_i, y_{ci}, y_{fi}).$$

Case 3

Suppose $w_2^* = 0$, $w_3^* \neq 0$. Here $w_3^*$ has $+1$ in position $\tau_1^*$ and $-1$ in position $\tau_0$, with $\tau_1^* \neq \tau_0$ (otherwise $w_3^* = 0$). Then
$$\Psi_\alpha(x_i, y_{ci}, y_{fi}) = \sup_{w_1 \in \mathbb{R}^n,\ w_3 \in B^n} \Big[ \big\langle (w_1 + x_i)^+, \mathbb{1}\{y_{ci} < w_3 + y_{fi}\}\, y_{ci} \big\rangle + \big\langle (w_1 + x_i)^-, \mathbb{1}\{w_3 + y_{fi} < y_{ci}\}\,(w_3 + y_{fi}) \big\rangle - \alpha\big(\langle w_1, w_1 \rangle + S \langle w_3, w_3 \rangle\big) \Big].$$

a) Suppose $\mathbb{1}(y_{ci} < w_3 + y_{fi}) = 1$. Then
$$\Psi_\alpha(x_i, y_{ci}, y_{fi}) = \sup_{w_1 \in \mathbb{R}^n,\ w_3 \in B^n} \Big[ \big\langle (w_1 + x_i)^+, y_{ci} \big\rangle - \alpha\big(\langle w_1, w_1 \rangle + S \langle w_3, w_3 \rangle\big) \Big].$$
Recall $\langle (w_1 + x_i), y_{ci} \rangle = (w_{1\tau} + x_{i\tau})$. Also recall $\tau_1$ and $\tau_0$ are associated with $y_{fi}$. Let $\tau_{0,c}$ denote the default time (index) for $y_{ci}$. The default time constraint implies $\tau_{0,c} < \tau_1$. Therefore $\tau_{0,c} > 0 \implies \|y_{ci}\| = 1$. Note the only positive component of $y_{ci}$ is $\tau_{0,c}$. Hence set $w^*_{1\tau} = 0 \ \forall\, \tau \neq \tau_{0,c}$. Simplify further to get
$$\Psi_\alpha(x_i, y_{ci}, y_{fi}) = \sup_{w_{1\tau_{0,c}} \in \mathbb{R},\ w_3 \in B^n} \Big[ (w_{1\tau_{0,c}} + x_{i\tau_{0,c}})^+ - \alpha\big((w_{1\tau_{0,c}})^2 + S K_a\big) \Big]$$
where $K_a := (\mathbb{1}\{\tau_1 \neq 0\} + \mathbb{1}\{\tau_0 \neq 0\}) = 1$, following the logic in Case 2b) above. Evaluating at the critical point $w^*_{1\tau_{0,c}} = \frac{1}{2\alpha} \in \mathbb{R}$ gives
$$\sup_{w_{1\tau_{0,c}} \in \mathbb{R}} \Big[ (w_{1\tau_{0,c}} + x_{i\tau_{0,c}})^+ - \alpha (w_{1\tau_{0,c}})^2 \Big] = \left[\tfrac{1}{4\alpha} + x_{i\tau_{0,c}}\right]^+.$$
Therefore one can write
$$\Psi_\alpha(x_i, y_{ci}, y_{fi}) = \left[\tfrac{1}{4\alpha} + x_{i\tau_{0,c}}\right]^+ - \alpha S K_a.$$
This can also be expressed as
$$\Psi^a_\alpha(x_i, y_{ci}, y_{fi}) = \left[ \left[\tfrac{1}{4\alpha} + \langle x_i, y_{ci} \rangle\right]^+ - \alpha S K_a \right].$$

b) Suppose $\mathbb{1}(w_3 + y_{fi} < y_{ci}) = 1$. Then
$$\Psi_\alpha(x_i, y_{ci}, y_{fi}) = \sup_{w_1 \in \mathbb{R}^n,\ w_3 \in B^n} \Big[ \big\langle (w_1 + x_i)^-, w_3 + y_{fi} \big\rangle - \alpha\big(\langle w_1, w_1 \rangle + S \langle w_3, w_3 \rangle\big) \Big].$$
Recall $\langle (w_1 + x_i), (w_3 + y_{fi}) \rangle = (w_{1\tau_1} + x_{i\tau_1})$. Also recall $\tau_1$ and $\tau_0$ are associated with $y_{fi}$. Let $\tau_{0,c}$ denote the default time (index) for $y_{ci}$. The default time constraint implies $\tau_1 < \tau_{0,c}$. Therefore $\tau_1 > 0$. The structure of the finite set $B^n$ implies
$$\Psi_\alpha(x_i, y_{ci}, y_{fi}) = \sup_{w_1 \in \mathbb{R}^n,\ 0 < \tau_1 < \tau_{0,c},\ \tau_1 \neq \tau_0} \Big[ (w_{1\tau_1} + x_{i\tau_1})^- - \alpha\big((w_{1\tau_1})^2 + S K_b\big) \Big]$$
where $K_b := (1 + \mathbb{1}\{\tau_0 \neq 0\})$. Following the calculations in Case 2b) above, conclude that
$$\Psi_\alpha(x_i, y_{ci}, y_{fi}) = \left[ \mathbb{1}\left\{(x_{i\tau_1^*} < -\tfrac{1}{2\alpha}) \vee (x_{i\tau_1^*} > 0)\right\} \left[\tfrac{1}{4\alpha} + x_{i\tau_1^*}\right]^- - \mathbb{1}\left\{-\tfrac{1}{2\alpha} \le x_{i\tau_1^*} \le 0\right\} \left[\alpha (x_{i\tau_1^*})^2\right] - \alpha S K_b \right].$$
Furthermore, $\tau_1^*$ is determined as
$$\tau_1^* = \arg\max_{0 < \tau_1 < \tau_{0,c},\ \tau_1 \neq \tau_0} \left[x_{i\tau_1}\right],$$
and we denote the resulting value by $\Psi^b_\alpha(x_i, y_{ci}, y_{fi})$. This can also be expressed as
$$\Psi^b_\alpha(x_i, y_{ci}, y_{fi}) = \left[ \mathbb{1}\left\{(x_{i\tau_1^*} < -\tfrac{1}{2\alpha}) \vee (x_{i\tau_1^*} > 0)\right\} \left[\tfrac{1}{4\alpha} + \langle x_i, y_{fi} \rangle + (x_{i\tau_1^*} - x_{i\tau_0})\right]^- - \mathbb{1}\left\{-\tfrac{1}{2\alpha} \le x_{i\tau_1^*} \le 0\right\} \left[\alpha (x_{i\tau_1^*})^2\right] - \alpha S K_b \right].$$

Finally, to sum up Case 3, considering parts a) and b), let us write:
$$\Psi_\alpha(x_i, y_{ci}, y_{fi}) = \mathbb{1}(y_{ci} < w_3 + y_{fi})\, \Psi^a_\alpha(x_i, y_{ci}, y_{fi}) + \mathbb{1}(w_3 + y_{fi} < y_{ci})\, \Psi^b_\alpha(x_i, y_{ci}, y_{fi}).$$
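The two scalar suprema that recur in the cases above can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the convention $(u)^+ = \max(u, 0)$ and $(u)^- = \min(u, 0)$ (inferred from the remark that the negative part attains a maximum value of zero), and the function names and the grid search are illustrative choices only.

```python
def sup_pos_closed(x, alpha):
    """Closed form of sup_w [(w + x)^+ - alpha * w^2], i.e. [1/(4*alpha) + x]^+."""
    return max(1.0 / (4.0 * alpha) + x, 0.0)


def sup_neg_closed(x, alpha):
    """Closed form of sup_w [(w + x)^- - alpha * w^2], assuming (u)^- = min(u, 0)."""
    if -1.0 / (2.0 * alpha) <= x <= 0.0:
        # Middle regime: the optimum sits at w = |x|, where (w + x)^- reaches zero.
        return -alpha * x * x
    # Outer regimes: x < -1/(2*alpha) gives 1/(4*alpha) + x, and x > 0 gives 0.
    return min(1.0 / (4.0 * alpha) + x, 0.0)


def sup_grid(x, alpha, part, half_width=10.0, num=40001):
    """Brute-force the same supremum on a uniform grid over [-half_width, half_width]."""
    step = 2.0 * half_width / (num - 1)
    best = float("-inf")
    for j in range(num):
        w = -half_width + j * step
        u = w + x
        payoff = max(u, 0.0) if part == "pos" else min(u, 0.0)
        best = max(best, payoff - alpha * w * w)
    return best


if __name__ == "__main__":
    for alpha in (0.1, 0.5, 2.0):
        for x in (-3.0, -1.0, -0.2, 0.0, 0.7):
            print(alpha, x,
                  sup_pos_closed(x, alpha), sup_grid(x, alpha, "pos"),
                  sup_neg_closed(x, alpha), sup_grid(x, alpha, "neg"))
```

The grid is only a sanity check; its accuracy depends on the grid spacing and on the maximizer ($\frac{1}{2\alpha}$ or $|x|$) lying inside the search window.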
Case 4

Suppose $w_2^* \neq 0$, $w_3^* \neq 0$. Here $w_2^*$ has $+1$ in position $\tau^*_{1,c}$ and $-1$ in position $\tau_{0,c}$, with $\tau^*_{1,c} \neq \tau_{0,c}$ (otherwise $w_2^* = 0$). Similarly, $w_3^*$ has $+1$ in position $\tau^*_{1,f}$ and $-1$ in position $\tau_{0,f}$, with $\tau^*_{1,f} \neq \tau_{0,f}$ (otherwise $w_3^* = 0$). Then
$$\Psi_\alpha(x_i, y_{ci}, y_{fi}) = \sup_{w_1 \in \mathbb{R}^n,\ w_2 \in B^n,\ w_3 \in B^n} \Big[ \big\langle (w_1 + x_i)^+, \mathbb{1}\{w_2 + y_{ci} < w_3 + y_{fi}\}\,(w_2 + y_{ci}) \big\rangle + \big\langle (w_1 + x_i)^-, \mathbb{1}\{w_3 + y_{fi} < w_2 + y_{ci}\}\,(w_3 + y_{fi}) \big\rangle - \alpha\big(\langle w_1, w_1 \rangle + S \langle w_2, w_2 \rangle + S \langle w_3, w_3 \rangle\big) \Big].$$

a) Suppose $\mathbb{1}(w_2 + y_{ci} < w_3 + y_{fi}) = 1$. Then
$$\Psi_\alpha(x_i, y_{ci}, y_{fi}) = \sup_{w_1 \in \mathbb{R}^n,\ w_2 \in B^n,\ w_3 \in B^n} \Big[ \big\langle (w_1 + x_i)^+, w_2 + y_{ci} \big\rangle - \alpha\big(\langle w_1, w_1 \rangle + S \langle w_2, w_2 \rangle + S \langle w_3, w_3 \rangle\big) \Big].$$
Recall $\langle (w_1 + x_i), (w_2 + y_{ci}) \rangle = (w_{1\tau_{1,c}} + x_{i\tau_{1,c}})$. The default time constraint implies $\tau_{1,c} < \tau_{1,f}$. Therefore $\tau_{1,c} > 0$. The structure of the finite set $B^n$ implies
$$\Psi_\alpha(x_i, y_{ci}, y_{fi}) = \sup_{w_1 \in \mathbb{R}^n,\ 0 < \tau_{1,c} < \tau_{1,f},\ \tau_{1,c} \neq \tau_{0,c}} \Big[ (w_{1\tau_{1,c}} + x_{i\tau_{1,c}})^+ - \alpha\big(\langle w_1, w_1 \rangle + S \langle w_2, w_2 \rangle + S \langle w_3, w_3 \rangle\big) \Big].$$
Observe the only positive component for $w_1 \in \mathbb{R}^n$ in the sup above is $\tau_{1,c}$:
$$\sup_{w_1 \in \mathbb{R}^n} \Big[ (w_{1\tau_{1,c}} + x_{i\tau_{1,c}})^+ - \alpha \langle w_1, w_1 \rangle \Big] = \sup_{w_{1\tau_{1,c}} \in \mathbb{R}} \Big[ (w_{1\tau_{1,c}} + x_{i\tau_{1,c}})^+ - \alpha (w_{1\tau_{1,c}})^2 \Big].$$
Evaluating at the critical point $w^*_{1\tau_{1,c}} = \frac{1}{2\alpha} \in \mathbb{R}$ for the above quadratic gives
$$\sup_{w_{1\tau_{1,c}} \in \mathbb{R}} \Big[ (w_{1\tau_{1,c}} + x_{i\tau_{1,c}})^+ - \alpha (w_{1\tau_{1,c}})^2 \Big] = \left[\tfrac{1}{4\alpha} + x_{i\tau_{1,c}}\right]^+.$$
Therefore one can write
$$\Psi_\alpha(x_i, y_{ci}, y_{fi}) = \max_{0 < \tau_{1,c} < \tau_{1,f},\ \tau_{1,c} \neq \tau_{0,c}} \left[\tfrac{1}{4\alpha} + x_{i\tau_{1,c}}\right]^+ - \alpha S K_a$$
where $K_a := (\mathbb{1}\{\tau_{1,c} \neq 0\} + \mathbb{1}\{\tau_{0,c} \neq 0\} + \mathbb{1}\{\tau_{1,f} \neq 0\} + \mathbb{1}\{\tau_{0,f} \neq 0\}) = (2 + \mathbb{1}\{\tau_{0,c} \neq 0\})$, following the logic in Case 3a) above. Furthermore, $\tau^*$ is determined as
$$\tau^* = \arg\max_{0 < \tau_{1,c} < \tau_{1,f},\ \tau_{1,c} \neq \tau_{0,c}} \left[x^+_{i\tau_{1,c}}\right].$$
Substituting back into the expression for $\Psi_\alpha$ gives
$$\Psi^a_\alpha(x_i, y_{ci}, y_{fi}) = \left[ \left[\tfrac{1}{4\alpha} + x_{i\tau^*}\right]^+ - \alpha S K_a \right].$$
Let $\tau_0 = \tau_{0,c}$. Then this can also be expressed as
$$\Psi^a_\alpha(x_i, y_{ci}, y_{fi}) = \left[ \left[\tfrac{1}{4\alpha} + \langle x_i, y_{ci} \rangle + (x_{i\tau^*} - x_{i\tau_0})\right]^+ - \alpha S K_a \right].$$

b) Suppose $\mathbb{1}(w_3 + y_{fi} < w_2 + y_{ci}) = 1$. Then
$$\Psi_\alpha(x_i, y_{ci}, y_{fi}) = \sup_{w_1 \in \mathbb{R}^n,\ w_2 \in B^n,\ w_3 \in B^n} \Big[ \big\langle (w_1 + x_i)^-, w_3 + y_{fi} \big\rangle - \alpha\big(\langle w_1, w_1 \rangle + S \langle w_2, w_2 \rangle + S \langle w_3, w_3 \rangle\big) \Big].$$
Recall $\langle (w_1 + x_i), (w_3 + y_{fi}) \rangle = (w_{1\tau_{1,f}} + x_{i\tau_{1,f}})$. The default time constraint implies $\tau_{1,f} < \tau_{1,c}$. Therefore $\tau_{1,f} > 0$. The structure of the finite set $B^n$ implies
$$\Psi_\alpha(x_i, y_{ci}, y_{fi}) = \sup_{w_1 \in \mathbb{R}^n,\ 0 < \tau_{1,f} < \tau_{1,c},\ \tau_{1,f} \neq \tau_{0,f}} \Big[ (w_{1\tau_{1,f}} + x_{i\tau_{1,f}})^- - \alpha\big((w_{1\tau_{1,f}})^2 + S K_b\big) \Big]$$
where $K_b := (\mathbb{1}\{\tau_{1,c} \neq 0\} + \mathbb{1}\{\tau_{0,c} \neq 0\} + \mathbb{1}\{\tau_{1,f} \neq 0\} + \mathbb{1}\{\tau_{0,f} \neq 0\}) = (2 + \mathbb{1}\{\tau_{0,f} \neq 0\})$, following the logic in Case 4a) above. Following the calculations in Case 3b) above, conclude that
$$\Psi_\alpha(x_i, y_{ci}, y_{fi}) = \left[ \mathbb{1}\left\{(x_{i\tau^*} < -\tfrac{1}{2\alpha}) \vee (x_{i\tau^*} > 0)\right\} \left[\tfrac{1}{4\alpha} + x_{i\tau^*}\right]^- - \mathbb{1}\left\{-\tfrac{1}{2\alpha} \le x_{i\tau^*} \le 0\right\} \left[\alpha (x_{i\tau^*})^2\right] - \alpha S K_b \right].$$
Furthermore, $\tau^*$ is determined as
$$\tau^* = \arg\max_{0 < \tau_{1,f} < \tau_{1,c},\ \tau_{1,f} \neq \tau_{0,f}} \left[x_{i\tau_{1,f}}\right],$$
and we denote the resulting value by $\Psi^b_\alpha(x_i, y_{ci}, y_{fi})$. Let $\tau_0 = \tau_{0,f}$. Then this can also be expressed as
$$\Psi^b_\alpha(x_i, y_{ci}, y_{fi}) = \left[ \mathbb{1}\left\{(x_{i\tau^*} < -\tfrac{1}{2\alpha}) \vee (x_{i\tau^*} > 0)\right\} \left[\tfrac{1}{4\alpha} + \langle x_i, y_{fi} \rangle + (x_{i\tau^*} - x_{i\tau_0})\right]^- - \mathbb{1}\left\{-\tfrac{1}{2\alpha} \le x_{i\tau^*} \le 0\right\} \left[\alpha (x_{i\tau^*})^2\right] - \alpha S K_b \right].$$

Finally, to sum up Case 4, considering parts a) and b), let us write:
$$\Psi_\alpha(x_i, y_{ci}, y_{fi}) = \mathbb{1}(w_2 + y_{ci} < w_3 + y_{fi})\, \Psi^a_\alpha(x_i, y_{ci}, y_{fi}) + \mathbb{1}(w_3 + y_{fi} < w_2 + y_{ci})\, \Psi^b_\alpha(x_i, y_{ci}, y_{fi}).$$

Theorem 1.

The primal problem P3 has solution
$$\left[ \alpha^* \delta + \frac{1}{N} \sum_{i=1}^{N} \Psi_{\alpha^*}(x_i, y_{ci}, y_{fi}) \right]$$
where
$$\alpha^* = \arg\min_{\alpha \ge 0} \left[ \alpha \delta + \frac{1}{N} \sum_{i=1}^{N} \Psi_\alpha(x_i, y_{ci}, y_{fi}) \right] \quad \text{and} \quad \Psi_{\alpha^*}(x_i, y_{ci}, y_{fi}) = \bigvee_{k=1}^{4} \Psi^k_{\alpha^*}(x_i, y_{ci}, y_{fi}).$$
Expressed in terms of the original BCVA, this says
$$\sup_{\Phi \in U_\delta(\Phi_N)} E^{\Phi}\big[\langle X^+, Y^C \rangle + \langle X^-, Y^F \rangle\big] = E^{P_N}\big[\langle X^+, Y^C \rangle + \langle X^-, Y^F \rangle\big] + \alpha^* \delta + E^{P_N}\Big[ \Psi_{\alpha^*}(X, Y^C, Y^F) - \big[\langle X^+, Y^C \rangle + \langle X^-, Y^F \rangle\big] \Big]^+$$
where the additional terms represent a penalty due to uncertainty in the probability distribution.

Proof. This follows by direct substitution of $\alpha^*$ as characterized above into the dual problem D3.

Proposition 2.

Suppose $w_2^* = 0 \implies l = \|y_{cfi}\|_1$. Then
$$\Psi_\alpha(z_i, y_{cfi}) = \langle z_i, y_{cfi} \rangle + \sup_{w_1 \in \mathbb{R}^n} \big[ \langle w_1, y_{cfi} \rangle - \alpha \langle w_1, w_1 \rangle \big].$$
Applying the Cauchy-Schwarz inequality gives
$$\Psi_\alpha(z_i, y_{cfi}) = \langle z_i, y_{cfi} \rangle + \sup_{\|w_1\|} \big[ \|w_1\| \|y_{cfi}\| - \alpha \|w_1\|^2 \big].$$
Evaluating at the critical point $\|w_1^*\| = \frac{\|y_{cfi}\|}{2\alpha} \in \mathbb{R}_+$ for the quadratic gives
$$\Psi_\alpha(z_i, y_{cfi}) = \langle z_i, y_{cfi} \rangle + \frac{\|y_{cfi}\|_2^2}{4\alpha} = \langle z_i, y_{cfi} \rangle + \frac{\|y_{cfi}\|_1}{4\alpha}.$$

Case 2

Now consider $w_2^* \neq 0 \implies l \neq \|y_{cfi}\|_1$. Observe that for $l = \|w_2 + y_{cfi}\|_1 \ge 0$,
$$\langle w_1 + z_i, w_2 + y_{cfi} \rangle = \sum_{k=1}^{l} (w_{1k} + z_{ik}).$$
The structure of the finite set $B^n$ implies
$$\Psi_\alpha(z_i, y_{cfi}) = \sup_{w_1 \in \mathbb{R}^n,\ l \in \{0, \ldots, n\},\ l \neq \|y_{cfi}\|_1} \left[ \sum_{k=1}^{l} (w_{1k} + z_{ik}) - \alpha\big(\langle w_1, w_1 \rangle + S K\big) \right].$$
Again, using that $B^n$ is a finite set, one can write
$$\Psi_\alpha(z_i, y_{cfi}) = \max_{l \in \{0, \ldots, n\},\ l \neq \|y_{cfi}\|_1} \ \sup_{w_1 \in \mathbb{R}^n} \left[ \sum_{k=1}^{l} (w_{1k} + z_{ik}) - \alpha\big(\langle w_1, w_1 \rangle + S K\big) \right].$$
Observing that only the first $l$ components of $w_1$ inside the sup are positive gives, for all $k \in \{1, \ldots, l\}$,
$$\sup_{w_1 \in \mathbb{R}^n} \left[ \sum_{k=1}^{l} w_{1k} - \alpha \langle w_1, w_1 \rangle \right] = l \times \sup_{w_{1k} \in \mathbb{R}} \big[ w_{1k} - \alpha (w_{1k})^2 \big].$$
Evaluating at the critical point $w^*_{1k} = \frac{1}{2\alpha} \in \mathbb{R}_+$ for the above quadratic gives
$$\sup_{w_{1k} \in \mathbb{R}} \big[ w_{1k} - \alpha (w_{1k})^2 \big] = \frac{1}{4\alpha}.$$
Therefore one can write
$$\Psi_\alpha(z_i, y_{cfi}) = \max_{l \in \{0, \ldots, n\},\ l \neq \|y_{cfi}\|_1} \left[ \frac{l}{4\alpha} + \sum_{k=1}^{l} z_{ik} - \alpha S K \right].$$
Furthermore, $l^*$ is determined as
$$l^* = \arg\max_{l \in \{0, \ldots, n\},\ l \neq \|y_{cfi}\|_1} \left[ \frac{l}{4\alpha} + \sum_{k=1}^{l} z_{ik} - \alpha S K \right].$$
Substituting back into the expression for $\Psi_\alpha$ gives
$$\Psi_\alpha(z_i, y_{cfi}) = \langle z_i, y_{cfi} \rangle + \left[ \frac{l^*}{4\alpha} + \left( \sum_{k=1}^{l^*} z_{ik} - \sum_{k=1}^{\|y_{cfi}\|_1} z_{ik} \right) - \alpha S K \right].$$
Finally, taking the max values for $\Psi_\alpha$ over the cases $w_2^* = 0$ and $w_2^* \neq 0$ gives
$$\Psi_\alpha(z_i, y_{cfi}) = \langle z_i, y_{cfi} \rangle + \left[ \frac{\|y_{cfi}\|_1}{4\alpha} \right] \vee \left[ \frac{l^*}{4\alpha} + \left( \sum_{k=1}^{l^*} z_{ik} - \sum_{k=1}^{\|y_{cfi}\|_1} z_{ik} \right) - \alpha S K \right].$$
Observe that for $l^* = \|y_{cfi}\|_1$, the last term in brackets $[\,\cdot\,]$ above evaluates to $\left[ \frac{\|y_{cfi}\|_1}{4\alpha} \right]$.
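The maximization over $l$ derived above is a finite search, so it can be evaluated directly. The sketch below is illustrative only and rests on two assumptions that are not confirmed by this passage: that $y_{cfi}$ has ones in its first $m = \|y_{cfi}\|_1$ positions (so that $\langle z_i, y_{cfi} \rangle = \sum_{k=1}^{m} z_{ik}$), and that the count $K$ equals $|l - m|$. The latter is a hypothetical placeholder, chosen only because it vanishes at $l = m$ as the closing observation requires; the function names and the `K` argument are likewise hypothetical, and `K` should be replaced with the paper's actual definition.

```python
from itertools import accumulate


def bracket_terms(z, m, alpha, S, K=lambda l, m: abs(l - m)):
    """Return h(l) for l = 0..n, where
    h(l) = l/(4*alpha) + (sum_{k<=l} z_k - sum_{k<=m} z_k) - alpha*S*K(l, m).
    K is a hypothetical stand-in for the paper's count K."""
    prefix = [0.0] + list(accumulate(z))  # prefix[l] = z_1 + ... + z_l
    return [
        l / (4.0 * alpha) + (prefix[l] - prefix[m]) - alpha * S * K(l, m)
        for l in range(len(z) + 1)
    ]


def psi_alpha(z, m, alpha, S, K=lambda l, m: abs(l - m)):
    """Evaluate <z, y> + max_l h(l) and return (value, maximizing l)."""
    terms = bracket_terms(z, m, alpha, S, K)
    l_star = max(range(len(terms)), key=terms.__getitem__)
    inner = sum(z[:m])  # <z, y> under the assumed structure of y
    return inner + terms[l_star], l_star


if __name__ == "__main__":
    z = [0.5, -0.2, 0.3, -0.1]
    print(psi_alpha(z, m=2, alpha=1.0, S=10.0))   # heavy penalty: l stays at m
    print(psi_alpha(z, m=2, alpha=1.0, S=0.01))   # light penalty: l moves away from m
```

With a large $S$ the penalty term dominates and the maximizer stays at $l = m$, which reproduces the $w_2^* = 0$ value $\langle z_i, y_{cfi} \rangle + \frac{m}{4\alpha}$; with a small $S$ the search moves to a different $l$.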
Let $l^*$ be determined as
$$l^* = \arg\max_{l \in \{0, \ldots, n\}} \left[ \frac{l}{4\alpha} + \sum_{k=1}^{l} z_{ik} - \alpha S K \right]$$
and write
$$\Psi_\alpha(z_i, y_{cfi}) = \langle z_i, y_{cfi} \rangle + \left[ \frac{l^*}{4\alpha} + \left( \sum_{k=1}^{l^*} z_{ik} - \sum_{k=1}^{\|y_{cfi}\|_1} z_{ik} \right) - \alpha S K \right].$$
Alternatively, one can write
$$\Psi_\alpha(z_i, y_{cfi}) = \langle z_i, y_{cfi} \rangle + \bigvee_{l=0}^{n} \left[ \frac{l}{4\alpha} + \left( \sum_{k=1}^{l} z_{ik} - \sum_{k=1}^{\|y_{cfi}\|_1} z_{ik} \right) - \alpha S K \right].$$

Proposition 3.

Let $\alpha^* \in \{\alpha \ge 0 : 0 \in \partial F(\alpha)\}$ where
$$\partial \Psi_\alpha = \mathrm{Conv} \cup \big\{ \partial h_\alpha(l) \ \big|\ \langle z_i, y_{cfi} \rangle + h_\alpha(l) = \Psi_\alpha;\ l \in \{0, \ldots, n\} \big\} \quad \text{and} \quad \partial F(\alpha) = \delta + \frac{1}{N} \sum_{i=1}^{N} \partial \Psi_\alpha.$$

Proof. This follows from a standard application of properties of convex functions and subgradients. First note that the function $h_\alpha$ is convex in $\alpha$ since (for fixed $l$) it is the sum of a hyperbola plus a constant plus a negative linear term. So then $\Psi_\alpha$ is convex since it is the pointwise max of a finite set of convex functions plus a constant. Using properties of subgradients, one can write
$$\partial \Psi_\alpha = \mathrm{Conv} \cup \big\{ \partial h_\alpha(l) \ \big|\ \langle z_i, y_{cfi} \rangle + h_\alpha(l) = \Psi_\alpha;\ l \in \{0, \ldots, n\} \big\}.$$
Furthermore, $F(\alpha)$ is convex in $\alpha$ since it is a linear term plus a sum of convex functions, so one can write $\alpha^* \in \{\alpha : 0 \in \partial F(\alpha)\}$ and it follows that $\partial F(\alpha) = \delta + \frac{1}{N} \sum_{i=1}^{N} \partial \Psi_\alpha$.

Finally, we argue that $\alpha^* \ge 0$. For $\alpha > 0$ sufficiently small, $\exists\, z < -\delta$ such that $z \in \partial \Psi_\alpha$, and for $\alpha > 0$ sufficiently large, $\exists\, z > -\delta$ such that $z \in \partial \Psi_\alpha$. To elaborate on the second claim: for $\alpha$ sufficiently large, $\|y_{cfi}\| > 0 \implies l^* = \|y_{cfi}\| \implies K = 0 \implies \exists\, z > -\delta$ such that $z \in \partial \Psi_\alpha$; likewise $\|y_{cfi}\| = 0 \implies l^* = 0 \implies K = 0,\ \Psi_\alpha = 0$, so $0 = z > -\delta$ with $z \in \partial \Psi_\alpha$. Hence we deduce that $\partial F(\alpha)$ crosses the origin (as $\alpha$ sweeps from $0$ to $\infty$).

Theorem 2.