Distributionally Robust XVA via Wasserstein Distance Part 2: Wrong Way Funding Risk

Abstract

This paper investigates the calculation of robust funding valuation adjustment (FVA) for over-the-counter (OTC) derivatives under distributional uncertainty, using Wasserstein distance as the ambiguity measure. Wrong way funding risk can be characterized via the robust FVA formulation. A simpler dual formulation of the robust FVA optimization is derived. Next, computational experiments are conducted to measure the additional FVA charge due to distributional uncertainty under a variety of portfolio and market configurations. Finally, some suggestions for future work, such as robust capital valuation adjustment (KVA) and margin valuation adjustment (MVA), are discussed.


Derek Singh, Shuzhong Zhang

Department of Industrial and Systems Engineering, University of [email protected], [email protected]


Keywords—

OTC, CCR, CVA, FVA, derivatives, distributionally robust optimization, Wasserstein distance, duality

Funding valuation adjustment (FVA) represents the impact on portfolio market value of the funding exposures that arise from hedging uncollateralized derivatives; it is the market value of funding exposure risk. Funding cost adjustment (FCA) can be represented mathematically as an integral of discounted expected positive exposure times (incremental) funding cost, conditional on joint counterparty and firm survival. FCA arises for a positive portfolio exposure, since this implies a negative hedge exposure, which leads to a funding cost for the collateral posted. The market valuation is a function of joint counterparty and firm credit risk, of the underlying (market) risk factors that drive the portfolio valuation (and hence positive exposure) and the funding cost, and of the correlations between these market risk factors and the credit risk curves for a given portfolio. FCA is typically measured and reported at the funding netting set level.

The "other side" of FCA is funding benefit adjustment (FBA). This represents the funding benefit to the firm, i.e. interest income on collateral received against counterparty exposure on the hedge, as measured by discounted expected negative exposure times funding benefit, conditional on joint counterparty and firm survival. As above, the market valuation is a function of counterparty and firm credit risk, of the underlying market risk factors that drive portfolio valuation and funding benefit, and of the correlations between them. FBA can be represented mathematically as an integral of discounted expected negative exposure times funding benefit, conditional on joint counterparty and firm survival. FBA is typically measured at the funding netting set level.

(Bilateral) FVA represents the combined impact on portfolio market value of both the funding cost and the funding benefit exhibited over the portfolio lifetime.
FVA can be represented mathematically as the difference (or sum, when the funding benefit term is signed negatively) of two integrals: (i) discounted expected positive exposure times funding cost, conditional on joint counterparty and firm survival, and (ii) discounted expected negative exposure times funding benefit, conditional on joint counterparty and firm survival. FVA is typically measured and reported at the netting set level for a given firm.

As mentioned in Part 1 (Singh and Zhang, 2019), U.S. regulatory authorities, the Federal Reserve and the Office of the Comptroller of the Currency (OCC), periodically assess national banks' compliance with the Market Risk Capital Rule (MRR). Counterparty credit risk (CCR) and funding risk (FR) metrics are key metrics used to evaluate bank risk profiles due to OTC derivatives. The Basel Committee on Banking Supervision, through Basel III, has developed criteria to quantify capital charges due to CCR. According to the International Swaps and Derivatives Association (ISDA), the notional amount of OTC derivatives currently outstanding is over 500 trillion USD. Consequently, the CCR and FR exposures (due to uncollateralized or partially collateralized hedges) inherent in the OTC derivatives market represent significant market risk exposures. This motivates the concepts of worst case FVA and wrong way risk (WWR), and the study of the impact of uncertainty in the probability distribution on FR and FVA. It is these considerations that motivate this line of research (Ramzi Ben-Abdallah and Marzouk, 2019; El Hajjaji and Subbotin, 2015).

An outline of this paper is as follows. Section 1 is an introduction and overview of FVA and wrong way funding risk. Section 2 develops the main theoretical results of the paper and provides proof sketches. Section 3 conducts a computational study of WWR for a representative set of derivative instruments, portfolios, and market environments. Section 4 discusses the conclusions and suggestions for future research.
All detailed proofs of propositions, corollaries, and theorems are deferred to the Appendix.

Remark 1.

The authors are not aware of any substantial research that has been done on the topic of worst case FVA. The discussion below pertains to literature regarding worst case CVA.

In the past few years, research has been done to investigate the effect of distributional uncertainty on CVA. Brigo et al. (2013) explicitly incorporate correlation into the stochastic processes driving the market risk and credit default factors. They measure the effect of dependency structure (and hence wrong way risk) on CVA for a variety of asset classes. Glasserman and Yang (2015) bound the effect of wrong way risk on CVA. Their approach considers a discrete setting and formulates worst case CVA as the solution to a linear program subject to certain constraints, where the dependency structure is allowed to vary. They introduce a penalty term, measured via Kullback-Leibler (KL) divergence, to control the degree of wrong way risk. Memartoluie, in his PhD thesis, uses an ordered scenario copula methodology to quantify worst case CVA (Memartoluie, 2017). For worst case correlations set to one, he finds results that are comparable to those of the method of Glasserman and Yang.

Lagrangian duality results were recently (and independently) developed by Blanchet and Murthy (2019) and Gao and Kleywegt (2016). These results hold under mild assumptions. The main innovation in our work is to apply these recent results to worst case FVA using Wasserstein distance as the ambiguity measure. Furthermore, analytical expressions are derived for the solutions to the inner and outer convex optimization problems. A computational study shows the material impact of distributional uncertainty on worst case FVA and illustrates the risk profile.

Notation and core definitions for the FCA problem setup follow conventions in Glasserman and Yang (2015) and Lichters et al. (2015). Those for the robust FCA problem formulation follow conventions in Blanchet et al. (2018). FCA measures expected funding cost over the lifetime of the portfolio. Let $V^+(t)$ denote the positive portfolio exposure at time $t$. The problem setup here assumes a fixed set of observation dates, $0 = t_0 < t_1 < \cdots < t_n = T$. Let $X^+$ denote the vector of discounted positive exposures, and let $Y^C$ and $Y^F$ denote the vectors of counterparty and firm survival indicators. Further, let $Y^{CF}$ denote the Hadamard product $Y^C \odot Y^F$, which represents the vector of joint survival indicators. To incorporate funding, let $Z^+$ denote the vector of funding costs incurred on exposures $X^+$. Let $(z_i^+, y_i^{cf})$ denote realizations of $(Z^+, Y^{CF})$ along sample paths for $i \in \{1, 2, \ldots, N\}$. The FCA associated with funding costs $Z^+$ and joint survival indicator $\mathbf{1}_{\{\tau_C > t\} \cap \{\tau_F > t\}}$ is (Lichters et al., 2015), (Green, 2015)

$$\mathrm{FCA} = \int_0^T \mathbb{E}\left[ Z^+(t)\, \mathbf{1}_{\{\tau_C > t\} \cap \{\tau_F > t\}} \right] dt.$$

The pair of vectors $(Z^+, Y^{CF}) \in (\mathbb{R}^n_+ \times \mathcal{B}^n)$ is

$$Z^+ = \big( f_c(t_0, t_1) X^+(t_1), \ldots, f_c(t_{n-1}, t_n) X^+(t_n) \big),$$
$$Y^{CF} = \big( \mathbf{1}_{\{\tau_C > t_1\} \cap \{\tau_F > t_1\}}, \ldots, \mathbf{1}_{\{\tau_C > t_n\} \cap \{\tau_F > t_n\}} \big).$$

Here $\mathcal{B}^n$ denotes the set of survival time vectors: binary vectors of ones and zeros with $n$ components, and at most one block of ones followed by a complementary block of zeros. The empirical measure $P_N$ is

$$P_N(dz) = \frac{1}{N} \sum_{i=1}^{N} \delta_{(z_i^+, y_i^{cf})}(dz).$$

Under the empirical measure $P_N$, FCA is an expectation of an inner product,

$$\mathrm{FCA} = \mathbb{E}_{P_N}\left[ \langle Z^+, Y^{CF} \rangle \right].$$

In the context of this work, the uncertainty set for probability measures is

$$U_\delta(P_N) = \{ P : D_c(P, P_N) \le \delta \},$$

where $D_c$ is the optimal transport cost or Wasserstein discrepancy for cost function $c$ (Blanchet et al., 2018).
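As a small illustration of the discrete setup, the following Python sketch computes the empirical FCA $\mathbb{E}_{P_N}[\langle Z^+, Y^{CF}\rangle]$ from sample paths and checks membership in $\mathcal{B}^n$. It is a minimal sketch under stated assumptions, not the authors' implementation: the helper names `is_survival_vector` and `empirical_fca` are hypothetical, and the inputs are assumed to be arrays of shape (N, n) that already contain the funding-cost entries $f_c(t_{k-1}, t_k) X^+(t_k)$.

```python
import numpy as np

# Hypothetical helpers for the sample-path quantities defined above; not the authors' code.

def is_survival_vector(y):
    """Membership test for B^n: entries in {0, 1}, a block of ones followed by a block of zeros."""
    y = np.asarray(y, dtype=float)
    if not np.all((y == 0) | (y == 1)):
        return False
    # Non-increasing sequence: once a zero appears, no later entry may be one.
    return bool(np.all(np.diff(y) <= 0))

def empirical_fca(z_plus, y_c, y_f):
    """FCA under the empirical measure P_N: (1/N) * sum_i <z_i^+, y_i^cf>."""
    z_plus = np.asarray(z_plus, dtype=float)   # shape (N, n): entries f_c(t_{k-1}, t_k) X^+(t_k)
    y_cf = np.asarray(y_c) * np.asarray(y_f)   # Hadamard product Y^C (.) Y^F, shape (N, n)
    return float(np.mean(np.sum(z_plus * y_cf, axis=1)))
```

For two paths with $z^+ = (1, 2, 3)$ and $(0.5, 0.5, 0.5)$ and joint survival vectors $(1, 1, 0)$ and $(1, 0, 0)$, the inner products are 3 and 0.5, so the empirical FCA is 1.75. The Hadamard product of two survival vectors is again a survival vector, since joint survival ends at the earlier of the two default times.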
For convenience, the definition of $D_c$ is

$$D_c(P, P') = \inf\left\{ \mathbb{E}_\pi[c(A, B)] : \pi \in \mathcal{P}(\mathbb{R}^d \times \mathbb{R}^d),\ \pi_A = P,\ \pi_B = P' \right\},$$

where $\mathcal{P}$ denotes the space of Borel probability measures and $\pi_A$ and $\pi_B$ denote the distributions of $A$ and $B$. Here $A$ denotes $(Z_A^+, Y_A) \in (\mathbb{R}^n_+ \times \mathcal{B}^n)$ and $B$ denotes $(Z_B^+, Y_B) \in (\mathbb{R}^n_+ \times \mathcal{B}^n)$, respectively. This work uses the cost function $c_S$, where

$$c_S((u, v), (z, y)) = S \langle v - y, v - y \rangle + \langle u - z, u - z \rangle$$

for $(u, v), (z, y) \in (\mathbb{R}^n_+ \times \mathcal{B}^n)$, with scale factor $S > 0$.

Notation and core definitions for the FBA problem setup follow conventions in Glasserman and Yang (2015) and Lichters et al. (2015). Those for the robust FBA problem formulation follow conventions in Blanchet et al. (2018). FBA measures expected funding benefit over the lifetime of the portfolio. Let $V^-(t)$ denote the negative portfolio exposure at time $t$. The problem setup here assumes a fixed set of observation dates, $0 = t_0 < t_1 < \cdots < t_n = T$. Let $X^-$ denote the vector of discounted negative exposures, and let $Y^C$ and $Y^F$ denote the vectors of counterparty and firm survival indicators. Further, let $Y^{CF}$ denote the Hadamard product $Y^C \odot Y^F$, which represents the vector of joint survival indicators. To incorporate funding, let $Z^-$ denote the vector of funding benefits incurred on exposures $X^-$. Let $(z_i^-, y_i^{cf})$ denote realizations of $(Z^-, Y^{CF})$ along sample paths for $i \in \{1, 2, \ldots, N\}$. The FBA associated with funding benefits $Z^-$ and joint survival indicator $\mathbf{1}_{\{\tau_C > t\} \cap \{\tau_F > t\}}$ is (Lichters et al., 2015), (Green, 2015)

$$\mathrm{FBA} = \int_0^T \mathbb{E}\left[ Z^-(t)\, \mathbf{1}_{\{\tau_C > t\} \cap \{\tau_F > t\}} \right] dt.$$

The pair of vectors $(Z^-, Y^{CF}) \in (\mathbb{R}^n_- \times \mathcal{B}^n)$ is

$$Z^- = \big( f_b(t_0, t_1) X^-(t_1), \ldots, f_b(t_{n-1}, t_n) X^-(t_n) \big),$$
$$Y^{CF} = \big( \mathbf{1}_{\{\tau_C > t_1\} \cap \{\tau_F > t_1\}}, \ldots, \mathbf{1}_{\{\tau_C > t_n\} \cap \{\tau_F > t_n\}} \big).$$
Here $\mathcal{B}^n$ denotes the set of survival time vectors: binary vectors of ones and zeros with $n$ components, and at most one block of ones followed by a complementary block of zeros. The empirical measure $Q_N$ can be written as

$$Q_N(dz) = \frac{1}{N} \sum_{i=1}^{N} \delta_{(z_i^-, y_i^{cf})}(dz).$$

Under the empirical measure $Q_N$, FBA is an expectation of an inner product,

$$\mathrm{FBA} = \mathbb{E}_{Q_N}\left[ \langle Z^-, Y^{CF} \rangle \right].$$

In the context of this work, the uncertainty set for probability measures is

$$U_\delta(Q_N) = \{ Q : D_c(Q, Q_N) \le \delta \},$$

where $D_c$ is the optimal transport cost or Wasserstein discrepancy for cost function $c$ (Blanchet et al., 2018). For convenience, the definition of $D_c$ is

$$D_c(Q, Q') = \inf\left\{ \mathbb{E}_\pi[c(A, B)] : \pi \in \mathcal{P}(\mathbb{R}^d \times \mathbb{R}^d),\ \pi_A = Q,\ \pi_B = Q' \right\},$$

where $\mathcal{P}$ denotes the space of Borel probability measures and $\pi_A$ and $\pi_B$ denote the distributions of $A$ and $B$. Here $A$ denotes $(Z_A^-, Y_A) \in (\mathbb{R}^n_- \times \mathcal{B}^n)$ and $B$ denotes $(Z_B^-, Y_B) \in (\mathbb{R}^n_- \times \mathcal{B}^n)$, respectively. This work uses the cost function $c_S$, where

$$c_S((u, v), (z, y)) = S \langle v - y, v - y \rangle + \langle u - z, u - z \rangle$$

for $(u, v), (z, y) \in (\mathbb{R}^n_- \times \mathcal{B}^n)$, with scale factor $S > 0$.

Notation and core definitions for the (bilateral) FVA problem setup incorporate those above for FCA and FBA. FVA measures expected funding costs and benefits over the portfolio lifetime. Let $V^+(t)$ denote the positive portfolio exposure at time $t$, and let $V^-(t)$ denote the negative portfolio exposure at time $t$. The problem setup here assumes a fixed set of observation dates, $0 = t_0 < t_1 < \cdots < t_n = T$. Let $X^+$ denote the vector of discounted positive exposures and $Y^C$ the vector of counterparty survival indicators. Let $X^-$ denote the vector of discounted negative exposures and $Y^F$ the vector of firm survival indicators. Further, let $Y^{CF}$ denote the Hadamard product $Y^C \odot Y^F$, which represents the vector of joint survival indicators.
To incorporate funding, let $Z^+$ denote the vector of funding costs incurred on exposures $X^+$, and similarly $Z^-$ with respect to exposures $X^-$. Due to the linkage between $Z^+$ and $Z^-$, one can write $Z = Z^+ + Z^-$ and decompose sample realizations of $Z$ into $Z^+$ and $Z^-$ accordingly. Therefore, let $(z_i, y_i^{cf})$ denote realizations of $(Z, Y^{CF})$ along sample paths for $i \in \{1, 2, \ldots, N\}$. The relation $z_i = z_i^+ + z_i^-$ can be used to decompose $z_i$ into its positive and negative exposure components, respectively. The FVA associated with funding costs $Z(t)$ and joint survival indicator $\mathbf{1}_{\{\tau_C > t\} \cap \{\tau_F > t\}}$ is (Lichters et al., 2015), (Green, 2015)

$$\mathrm{FVA} = \mathrm{FCA} + \mathrm{FBA} = \int_0^T \mathbb{E}\left[ Z^+(t)\, \mathbf{1}_{\{\tau_C > t\} \cap \{\tau_F > t\}} \right] dt + \int_0^T \mathbb{E}\left[ Z^-(t)\, \mathbf{1}_{\{\tau_C > t\} \cap \{\tau_F > t\}} \right] dt = \int_0^T \mathbb{E}\left[ Z(t)\, \mathbf{1}_{\{\tau_C > t\} \cap \{\tau_F > t\}} \right] dt.$$

The pair of vectors $(Z, Y^{CF}) \in (\mathbb{R}^n \times \mathcal{B}^n)$ is

$$Z = \big( Z^+(t_1) + Z^-(t_1), \ldots, Z^+(t_n) + Z^-(t_n) \big),$$
$$Y^{CF} = \big( \mathbf{1}_{\{\tau_C > t_1\} \cap \{\tau_F > t_1\}}, \ldots, \mathbf{1}_{\{\tau_C > t_n\} \cap \{\tau_F > t_n\}} \big),$$

and the pair of vectors $(Z^+, Z^-) \in (\mathbb{R}^n_+ \times \mathbb{R}^n_-)$ is

$$Z^+ = \big( f_c(t_0, t_1) X^+(t_1), \ldots, f_c(t_{n-1}, t_n) X^+(t_n) \big), \qquad Z^- = \big( f_b(t_0, t_1) X^-(t_1), \ldots, f_b(t_{n-1}, t_n) X^-(t_n) \big).$$

Here $\mathcal{B}^n$ denotes the set of survival time vectors: binary vectors of ones and zeros with $n$ components, and at most one block of ones followed by a complementary block of zeros. The empirical measure $\Phi_N$ is

$$\Phi_N(dz) = \frac{1}{N} \sum_{i=1}^{N} \delta_{(z_i, y_i^{cf})}(dz).$$

Under the empirical measure $\Phi_N$, FVA is a sum of expectations of inner products,

$$\mathrm{FVA} = \mathbb{E}_{\Phi_N}\left[ \langle Z^+, Y^{CF} \rangle \right] + \mathbb{E}_{\Phi_N}\left[ \langle Z^-, Y^{CF} \rangle \right] = \mathbb{E}_{\Phi_N}\left[ \langle Z, Y^{CF} \rangle \right].$$

In the context of this work, the uncertainty set for probability measures is

$$U_\delta(\Phi_N) = \{ \Phi : D_c(\Phi, \Phi_N) \le \delta \},$$

where $D_c$ is the optimal transport cost or Wasserstein discrepancy for cost function $c$ (Blanchet et al., 2018).
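The decomposition $\mathrm{FVA} = \mathrm{FCA} + \mathrm{FBA}$ under $\Phi_N$, together with the cost function $c_S$, can be sketched as follows. This is an illustration only, not the authors' code: it assumes positive funding spreads $f_c, f_b$, so that the sign of each entry $z_{ik}$ tells whether it belongs to $z_i^+$ or $z_i^-$. The names `cost_s` and `empirical_fva` are hypothetical.

```python
import numpy as np

# Hypothetical helpers; assumes positive funding spreads so sign(z_ik) = sign of the exposure.

def cost_s(u, v, z, y, S):
    """Transport cost c_S((u, v), (z, y)) = S <v - y, v - y> + <u - z, u - z>."""
    du = np.asarray(u, dtype=float) - np.asarray(z, dtype=float)
    dv = np.asarray(v, dtype=float) - np.asarray(y, dtype=float)
    return float(S * (dv @ dv) + du @ du)

def empirical_fva(z, y_cf):
    """Return (FCA, FBA, FVA) under the empirical measure Phi_N, splitting z_i = z_i^+ + z_i^-."""
    z = np.asarray(z, dtype=float)            # shape (N, n), signed funding amounts
    y_cf = np.asarray(y_cf, dtype=float)      # joint survival indicators, shape (N, n)
    z_plus, z_minus = np.maximum(z, 0.0), np.minimum(z, 0.0)
    fca = np.mean(np.sum(z_plus * y_cf, axis=1))
    fba = np.mean(np.sum(z_minus * y_cf, axis=1))
    fva = np.mean(np.sum(z * y_cf, axis=1))
    return float(fca), float(fba), float(fva)
```

Because the split is entrywise, the FCA and FBA parts always add up to the FVA computed directly from $z_i$, which is the linearity used in the display above.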
For convenience, the definition of $D_c$ is

$$D_c(\Phi, \Phi') = \inf\left\{ \mathbb{E}_\pi[c(A, B)] : \pi \in \mathcal{P}(\mathbb{R}^d \times \mathbb{R}^d),\ \pi_A = \Phi,\ \pi_B = \Phi' \right\},$$

where $\mathcal{P}$ denotes the space of Borel probability measures and $\pi_A$ and $\pi_B$ denote the distributions of $A$ and $B$. Here $A$ denotes $(Z_A, Y_A) \in (\mathbb{R}^n \times \mathcal{B}^n)$ and $B$ denotes $(Z_B, Y_B) \in (\mathbb{R}^n \times \mathcal{B}^n)$, respectively. This work uses the cost function $c_S$, where

$$c_S((u, v), (z, y)) = S \langle v - y, v - y \rangle + \langle u - z, u - z \rangle$$

for $(u, v), (z, y) \in (\mathbb{R}^n \times \mathcal{B}^n)$, with scale factor $S > 0$.

The robust FCA can be written as

$$\sup_{P \in U_\delta(P_N)} \mathbb{E}_P\left[ \langle Z^+, Y^{CF} \rangle \right]. \qquad \text{(P1)}$$

Now use recent duality results, noting that the inner product $\langle \cdot, \cdot \rangle$ satisfies the upper semicontinuity condition of the Lagrangian duality theorem, and the cost function $c_S$ satisfies the non-negative lower semicontinuity condition (see Blanchet and Murthy (2019), Assumptions 1 & 2, and Gao and Kleywegt (2016)). Hence the dual problem (to the sup above) can be written as

$$\inf_{\gamma \ge 0} H(\gamma) := \left[ \gamma \delta + \frac{1}{N} \sum_{i=1}^{N} \Psi_\gamma(z_i^+, y_i^{cf}) \right], \qquad \text{(D1)}$$

where

$$\Psi_\gamma(z_i^+, y_i^{cf}) = \sup_{u \in \mathbb{R}^n_+,\, v \in \mathcal{B}^n} \left[ \langle u, v \rangle - \gamma\, c_S((u, v), (z_i^+, y_i^{cf})) \right] = \sup_{u \in \mathbb{R}^n_+,\, v \in \mathcal{B}^n} \left[ \langle u, v \rangle - \gamma \left( \langle u - z_i^+, u - z_i^+ \rangle + S \langle v - y_i^{cf}, v - y_i^{cf} \rangle \right) \right].$$

Now apply the change of variables $w_1 = u - z_i^+$ and $w_2 = v - y_i^{cf}$ to get

$$\Psi_\gamma(z_i^+, y_i^{cf}) = \sup_{w_1 \ge -z_i^+,\, w_2 \in \tilde{\mathcal{B}}^n} \left[ \langle w_1 + z_i^+, w_2 + y_i^{cf} \rangle - \gamma \left( \langle w_1, w_1 \rangle + S \langle w_2, w_2 \rangle \right) \right],$$

where $\tilde{\mathcal{B}}^n$ denotes the set of ternary vectors of ones, zeros, and minus ones with $n$ components, and at most one block of ones or minus ones. Note that $\sup_{w_1}[\,\cdot\,]$ is attained for $w_1^* \in \mathbb{R}^n_+$ (as will become evident in the proof), hence it suffices to consider this space for $w_1$.
It turns out that $\Psi_\gamma$ can be expressed as the original FCA term $\langle z_i^+, y_i^{cf} \rangle$ plus the pointwise max of $(n+1)$ convex functions of $\gamma$. The degenerate case $l = 0$ is a line of non-positive slope, and the $l = 1, \ldots, n$ cases are a hyperbola plus a line of non-positive slope. $\Psi_\gamma$ quantifies the adversarial move in FCA across both the time and spatial dimensions, while accounting for the cost via the $K$ terms.

Proposition 1.

Let

$$\Psi_\gamma(z_i^+, y_i^{cf}) = \langle z_i^+, y_i^{cf} \rangle + \left[ \frac{l^*}{4\gamma} + \left( \sum_{k=1}^{l^*} z_{ik}^+ - \sum_{k=1}^{\|y_i^{cf}\|} z_{ik}^+ \right) - \gamma S K \right],$$

where $l^* = \arg\max_{l \ge 0} \left[ \frac{l}{4\gamma} + \sum_{k=1}^{l} z_{ik}^+ - \gamma S K \right]$ and $l = \|w_2 + y_i^{cf}\| \ge 0$, $l \in \mathbb{Z}_+$. Also $\|y_i^{cf}\| \in \mathbb{Z}_+$, and $K = |l - \|y_i^{cf}\|| = \|w_2\| \ge 0$, $K \in \mathbb{Z}_+$. Once $l^*$ is selected, $K := |l^* - \|y_i^{cf}\|| = \|w_2^*\|$. Alternatively,

$$\Psi_\gamma(z_i^+, y_i^{cf}) = \langle z_i^+, y_i^{cf} \rangle + \bigvee_{l=0}^{n} h_\gamma(l), \qquad h_\gamma(l) := \left[ \frac{l}{4\gamma} + \left( \sum_{k=1}^{l} z_{ik}^+ - \sum_{k=1}^{\|y_i^{cf}\|} z_{ik}^+ \right) - \gamma S K \right].$$

Finally, note that $\bigvee_{l=0}^{n} h_\gamma(l)$ denotes $\max_{l \in \{0, \ldots, n\}} h_\gamma(l)$.

Proof sketch.

This result follows from jointly maximizing over the adversarial funding exposure $w_1$ and the survival time index $w_2$. The structure of $\tilde{\mathcal{B}}^n$ allows us to decouple this joint maximization, find the critical point that maximizes the quadratic in $w_1$, and write down the condition to select the optimal survival time index $l^*$. Finally, consider the two cases $w_2 = 0$ and $w_2 \neq 0$. The $K$ terms represent the cost associated with the worst case.

The goal now is to evaluate

$$\inf_{\gamma \ge 0} H(\gamma) := \left[ \gamma \delta + \frac{1}{N} \sum_{i=1}^{N} \Psi_\gamma(z_i^+, y_i^{cf}) \right],$$

where

$$\Psi_\gamma(z_i^+, y_i^{cf}) = \langle z_i^+, y_i^{cf} \rangle + \bigvee_{l=0}^{n} h_\gamma(l), \qquad h_\gamma(l) := \left[ \frac{l}{4\gamma} + \left( \sum_{k=1}^{l} z_{ik}^+ - \sum_{k=1}^{\|y_i^{cf}\|} z_{ik}^+ \right) - \gamma S K \right].$$

Since each $\Psi_\gamma$ is a pointwise supremum of functions that are affine in $\gamma$, the objective function $H(\gamma)$ is convex, which simplifies the task of solving this optimization problem: the first order optimality condition suffices. As $\Psi_\gamma$, and hence $H(\gamma)$, may have non-differentiable kinks due to the max functions $\vee$, we characterize the optimality condition via subgradients. In particular, we look for $\gamma^* \ge 0$ such that $0 \in \partial H(\gamma^*)$. Inspection of the asymptotic properties of $\Psi_\gamma$ and its subgradients reveals that $\partial H(\gamma)$ crosses zero as $\gamma$ sweeps from $0$ to $\infty$, and hence such a $\gamma^* \ge 0$ exists.

Proposition 2.

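Let $\gamma^* \in \{ \gamma \ge 0 : 0 \in \partial H(\gamma) \}$, where

$$\partial \Psi_\gamma = \mathrm{Conv} \bigcup \left\{ \partial h_\gamma(l) \mid \langle z_i^+, y_i^{cf} \rangle + h_\gamma(l) = \Psi_\gamma,\ l \in \{0, \ldots, n\} \right\}$$

and

$$\partial H(\gamma) = \delta + \frac{1}{N} \sum_{i=1}^{N} \partial \Psi_\gamma.$$

Proof sketch.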

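This follows from the application of standard properties of subgradients, as well as inspection of the asymptotic properties of $\Psi_\gamma$ and $\partial \Psi_\gamma$. For $\gamma$ sufficiently small, $\Psi_\gamma$ has a large positive value and $\partial \Psi_\gamma$ consists of large negative values. For $\gamma$ sufficiently large, for the optimal $l^*$, either $l^* = 0 \implies 0 \in \partial \Psi_\gamma$, or $l^* = \|y_i^{cf}\| > 0 \implies \partial \Psi_\gamma$ approaches zero. Hence $\partial H(\gamma)$ crosses zero, moving from negative values for small $\gamma$ to values near $\delta > 0$ for large $\gamma$.

Putting together the results of these two propositions, we arrive at our first theorem.

Theorem 1.

The primal problem P1 has solution $\left[ \gamma^* \delta + \frac{1}{N} \sum_{i=1}^{N} \Psi_{\gamma^*}(z_i^+, y_i^{cf}) \right]$, where $\gamma^* \in \{ \gamma \ge 0 : 0 \in \partial H(\gamma) \}$ and $\Psi_{\gamma^*}(z_i^+, y_i^{cf}) = \langle z_i^+, y_i^{cf} \rangle + \bigvee_{l=0}^{n} h_{\gamma^*}(l)$. Expressed in terms of the original FCA, this says

$$\sup_{P \in U_\delta(P_N)} \mathbb{E}_P\left[ \langle Z^+, Y^{CF} \rangle \right] = \mathbb{E}_{P_N}\left[ \langle Z^+, Y^{CF} \rangle \right] + \gamma^* \delta + \mathbb{E}_{P_N}\left[ \bigvee_{l=0}^{n} \left( \frac{l}{4\gamma^*} + \left( \sum_{k=1}^{l} Z_k^+ - \sum_{k=1}^{\|Y^{CF}\|} Z_k^+ \right) - \gamma^* S K \right) \right],$$

where the additional terms represent a penalty due to uncertainty in the probability distribution.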

Proof sketch.

This follows directly from the previous two propositions.
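Propositions 1 and 2 reduce robust FCA to a one-dimensional convex problem, which can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes $\gamma > 0$, $\delta > 0$ and survival vectors $y_i^{cf} \in \mathcal{B}^n$, evaluates $\Psi_\gamma$ through the equivalent form $\max_l [\, l/(4\gamma) + \sum_{k \le l} z_{ik}^+ - \gamma S K \,]$, and replaces the subgradient condition $0 \in \partial H(\gamma^*)$ with a ternary search on the convex function $H$, a derivative-free substitute chosen here for brevity. The function names `psi_gamma`, `adversary_objective` and `robust_fca` are hypothetical.

```python
import numpy as np

# Hypothetical helpers illustrating Propositions 1 and 2; not the authors' code.

def psi_gamma(z, y, gamma, S):
    """Closed form of Psi_gamma(z, y) from Proposition 1 (gamma > 0).

    Equals <z, y> + max_l h_gamma(l), i.e. the max over l in {0, ..., n} of
    l/(4 gamma) + sum_{k<=l} z_k - gamma * S * |l - ||y|||.
    """
    z = np.asarray(z, dtype=float)
    m = int(np.sum(y))                               # ||y^cf||: number of surviving periods
    l = np.arange(len(z) + 1)                        # candidate survival indices l = 0, ..., n
    partial = np.concatenate(([0.0], np.cumsum(z)))  # sum_{k<=l} z_k
    return float(np.max(l / (4.0 * gamma) + partial - gamma * S * np.abs(l - m)))

def adversary_objective(u, v, z, y, gamma, S):
    """<u, v> - gamma * c_S((u, v), (z, y)): the quantity inside the sup defining Psi_gamma."""
    u, v, z, y = (np.asarray(a, dtype=float) for a in (u, v, z, y))
    du, dv = u - z, v - y
    return float(u @ v - gamma * (du @ du + S * (dv @ dv)))

def robust_fca(z_paths, y_paths, delta, S, iters=200):
    """Minimize the convex dual H(gamma) = gamma*delta + mean_i Psi_gamma(z_i, y_i) over gamma > 0."""
    if delta <= 0:
        raise ValueError("delta must be positive")

    def H(g):
        return g * delta + np.mean([psi_gamma(z, y, g, S) for z, y in zip(z_paths, y_paths)])

    hi = 1.0
    while H(2.0 * hi) < H(hi):   # H is convex: once it stops decreasing, a minimizer lies in (0, 2*hi]
        hi *= 2.0
    lo, hi = 0.0, 2.0 * hi
    for _ in range(iters):       # ternary search keeps a minimizer of the convex H in [lo, hi]
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if H(m1) <= H(m2):
            hi = m2
        else:
            lo = m1
    g_star = 0.5 * (lo + hi)
    return g_star, float(H(g_star))
```

The helper `adversary_objective` evaluates the expression inside the supremum defining $\Psi_\gamma$, so comparing it with `psi_gamma` at random $(u, v)$ gives a quick consistency check of the closed form; the maximizer is the $v^*$ with $l^*$ leading ones and $u^* = z_i^+ + v^*/(2\gamma)$. Since Proposition 5 has the same form with $z_i$ in place of $z_i^+$, the same functions apply to the FVA dual.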

The robust FBA can be written as

$$\sup_{Q \in U_\delta(Q_N)} \mathbb{E}_Q\left[ \langle Z^-, Y^{CF} \rangle \right]. \qquad \text{(P2)}$$

Now use recent duality results, noting that the inner product $\langle \cdot, \cdot \rangle$ satisfies the upper semicontinuity condition of the Lagrangian duality theorem, and the cost function $c_S$ satisfies the non-negative lower semicontinuity condition (see Blanchet and Murthy (2019), Assumptions 1 & 2, and Gao and Kleywegt (2016)). Hence the dual problem (to the sup above) can be written as

$$\inf_{\beta \ge 0} G(\beta) := \left[ \beta \delta + \frac{1}{N} \sum_{i=1}^{N} \Psi_\beta(z_i^-, y_i^{cf}) \right], \qquad \text{(D2)}$$

where

$$\Psi_\beta(z_i^-, y_i^{cf}) = \sup_{u \in \mathbb{R}^n_-,\, v \in \mathcal{B}^n} \left[ \langle u, v \rangle - \beta\, c_S((u, v), (z_i^-, y_i^{cf})) \right] = \sup_{u \in \mathbb{R}^n_-,\, v \in \mathcal{B}^n} \left[ \langle u, v \rangle - \beta \left( \langle u - z_i^-, u - z_i^- \rangle + S \langle v - y_i^{cf}, v - y_i^{cf} \rangle \right) \right].$$

Now apply the change of variables $w_1 = u - z_i^-$ and $w_2 = v - y_i^{cf}$ to get

$$\Psi_\beta(z_i^-, y_i^{cf}) = \sup_{w_1 \le -z_i^-,\, w_2 \in \tilde{\mathcal{B}}^n} \left[ \langle w_1 + z_i^-, w_2 + y_i^{cf} \rangle - \beta \left( \langle w_1, w_1 \rangle + S \langle w_2, w_2 \rangle \right) \right],$$

where the sets $\mathcal{B}^n$ and $\tilde{\mathcal{B}}^n$ are defined as before. Following a similar approach as for FCA, it turns out that $\Psi_\beta$ can be expressed as the original FBA term plus the pointwise max of $(n+1)$ convex functions. The degenerate case $l = 0$ is a line of non-positive slope, and the $l = 1, \ldots, n$ cases are a (convex) piecewise line-then-hyperbola function plus a line of non-positive slope. $\Psi_\beta$ quantifies the adversarial move in FBA across both the time and spatial dimensions, while accounting for the cost via the $K$ terms.

Proposition 3.

We have $\Psi_\beta(z_i^-, y_i^{cf}) = \langle z_i^-, y_i^{cf} \rangle + h_i(\beta)$, where

$$h_i(\beta) = \left[ h_i(\beta, l^*) + \left( \sum_{k=1}^{l^*} z_{ik}^- - \sum_{k=1}^{\|y_i^{cf}\|} z_{ik}^- \right) - \beta S K \right] = \bigvee_{l=0}^{n} \left[ h_i(\beta, l) + \left( \sum_{k=1}^{l} z_{ik}^- - \sum_{k=1}^{\|y_i^{cf}\|} z_{ik}^- \right) - \beta S K \right].$$

Also $\|y_i^{cf}\| \in \mathbb{Z}_+$, and $K = |l - \|y_i^{cf}\|| = \|w_2\| \ge 0$, $K \in \mathbb{Z}_+$. Once $l^*$ is selected, $K := |l^* - \|y_i^{cf}\|| = \|w_2^*\|$. Continuing with the notational setup,

$$h_i(\beta, l) = \sum_{k=1}^{l} g_{ik}(\beta), \qquad g_{ik}(\beta) = \begin{cases} -z_{ik}^- - \beta (z_{ik}^-)^2, & -z_{ik}^- \le \frac{1}{2\beta}, \\ \frac{1}{4\beta}, & -z_{ik}^- > \frac{1}{2\beta}. \end{cases}$$

Furthermore, $l^*$ is determined as

$$l^* = \arg\max_{l \in \{0, \ldots, n\}} \left[ h_i(\beta, l) + \sum_{k=1}^{l} z_{ik}^- - \beta S K \right].$$

Recall that $\bigvee_{l=0}^{n}$ denotes $\max_{l \in \{0, \ldots, n\}}$.

Proof sketch.

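This result follows from jointly maximizing over the adversarial funding exposure $w_1$ and the survival time index $w_2$. The structure of $\tilde{\mathcal{B}}^n$ allows us to decouple this joint maximization, find the critical point that maximizes the quadratic in $w_1$, and write down the condition to select the optimal survival time index $l^*$. Finally, consider the two cases $w_2 = 0$ and $w_2 \neq 0$. The constraint $w_1 \le -z_i^-$ leads to a convex but piecewise structure for $h_i(\beta, l)$. The $K$ terms represent the cost associated with the worst case.

The goal now is to evaluate

$$\inf_{\beta \ge 0} G(\beta) := \left[ \beta \delta + \frac{1}{N} \sum_{i=1}^{N} \Psi_\beta(z_i^-, y_i^{cf}) \right],$$

where $\Psi_\beta(z_i^-, y_i^{cf}) = \langle z_i^-, y_i^{cf} \rangle + h_i(\beta)$ for

$$h_i(\beta) = \left[ h_i(\beta, l^*) + \left( \sum_{k=1}^{l^*} z_{ik}^- - \sum_{k=1}^{\|y_i^{cf}\|} z_{ik}^- \right) - \beta S K \right] = \bigvee_{l=0}^{n} \left[ h_i(\beta, l) + \left( \sum_{k=1}^{l} z_{ik}^- - \sum_{k=1}^{\|y_i^{cf}\|} z_{ik}^- \right) - \beta S K \right].$$

The convexity of the objective function $G(\beta)$ simplifies the task of solving this optimization problem: the first order optimality condition suffices. As $\Psi_\beta$, and hence $G(\beta)$, may have non-differentiable kinks due to the max functions $\vee$, we characterize the optimality condition via subgradients. In particular, we look for $\beta^* \ge 0$ such that $0 \in \partial G(\beta^*)$. Inspection of the asymptotic properties of $\Psi_\beta$ and its subgradients reveals that two cases are possible. In Case 1, $\partial G(\beta)$ consists of strictly positive elements, hence $\beta^* = 0$. In Case 2, $\partial G(\beta)$ crosses zero as $\beta$ sweeps from $0$ to $\infty$, and hence a $\beta^* \ge 0$ with $0 \in \partial G(\beta^*)$ exists.

Proposition 4.

Let $\beta^* \in \{ \beta \ge 0 : 0 \in \partial G(\beta) \} \cup \{ \beta = 0 : 0 \notin \partial G(\beta) \}$, where $\partial \Psi_\beta$ is the convex hull of the subdifferentials of the terms attaining the maximum in $h_i(\beta)$, and

$$\partial G(\beta) = \delta + \frac{1}{N} \sum_{i=1}^{N} \partial \Psi_\beta.$$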

Proof sketch.

This follows from the application of standard properties of subgradients, as well as inspection of the asymptotic properties of $\Psi_\beta$ and $\partial \Psi_\beta$. For Case 1, if $\partial G(\beta)$ consists of strictly positive elements, then $G$ is increasing on $[0, \infty)$ and $\beta^* = 0$ attains the minimum. For Case 2, the asymptotic properties can be used to show that $\partial G(\beta)$ cannot consist of strictly negative elements for all $\beta$. For $\beta$ sufficiently large, for the optimal $l^*$, either $l^* = 0 \implies 0 \in \partial \Psi_\beta$, or $l^* = \|y_i^{cf}\| > 0 \implies \partial \Psi_\beta$ approaches $0$; hence $\partial G(\beta)$ crosses zero.

Theorem 2.

The primal problem P2 has solution $\left[ \beta^* \delta + \frac{1}{N} \sum_{i=1}^{N} \Psi_{\beta^*}(z_i^-, y_i^{cf}) \right]$, where $\beta^* \in \{ \beta \ge 0 : 0 \in \partial G(\beta) \} \cup \{ \beta = 0 : 0 \notin \partial G(\beta) \}$ and $\Psi_\beta(z_i^-, y_i^{cf}) = \langle z_i^-, y_i^{cf} \rangle + h_i(\beta)$, with

$$h_i(\beta) = \bigvee_{l=0}^{n} \left[ h_i(\beta, l) + \left( \sum_{k=1}^{l} z_{ik}^- - \sum_{k=1}^{\|y_i^{cf}\|} z_{ik}^- \right) - \beta S K \right].$$

Expressed in terms of the original FBA, this says

$$\sup_{Q \in U_\delta(Q_N)} \mathbb{E}_Q\left[ \langle Z^-, Y^{CF} \rangle \right] = \mathbb{E}_{Q_N}\left[ \langle Z^-, Y^{CF} \rangle \right] + \beta^* \delta + \mathbb{E}_{Q_N}\left[ \bigvee_{l=0}^{n} \left( h(\beta^*, l) + \left( \sum_{k=1}^{l} Z_k^- - \sum_{k=1}^{\|Y^{CF}\|} Z_k^- \right) - \beta^* S K \right) \right],$$

where the additional terms represent a penalty due to uncertainty in the probability distribution, and

$$h(\beta, l) = \sum_{k=1}^{l} \begin{cases} -Z_k^- - \beta (Z_k^-)^2, & -Z_k^- \le \frac{1}{2\beta}, \\ \frac{1}{4\beta}, & -Z_k^- > \frac{1}{2\beta}. \end{cases}$$

Proof sketch.

This result follows directly from the previous two propositions.
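The piecewise term in Proposition 3 is where FBA differs from FCA: the constraint $w_1 \le -z_i^-$ can bind. A sketch of $\Psi_\beta$ under stated assumptions ($\beta > 0$, entries $z_{ik}^- \le 0$, $y_i^{cf} \in \mathcal{B}^n$) follows. It is an illustration, not the authors' code; the names `g_fba` and `psi_beta` are hypothetical, and the outer minimization of the convex function $G(\beta)$ is not shown (recall that $\beta^* = 0$ is possible here, Case 1).

```python
import numpy as np

# Hypothetical helpers illustrating Proposition 3 (robust FBA); not the authors' code.

def g_fba(zk, beta):
    """Per-period term g_ik(beta) of Proposition 3 for a funding-benefit entry zk <= 0 (beta > 0)."""
    if -zk <= 1.0 / (2.0 * beta):
        return -zk - beta * zk ** 2   # constraint w1 <= -z binds: a line in beta
    return 1.0 / (4.0 * beta)         # interior optimum w1 = 1/(2 beta): a hyperbola in beta

def psi_beta(z_minus, y, beta, S):
    """Psi_beta(z, y) = <z, y> + h_i(beta), computed as the max over l in {0, ..., n} of
    h(beta, l) + sum_{k<=l} z_k - beta * S * |l - ||y|||."""
    z = np.asarray(z_minus, dtype=float)
    m = int(np.sum(y))                                   # ||y^cf||
    l = np.arange(len(z) + 1)                            # candidate survival indices l = 0, ..., n
    h = np.concatenate(([0.0], np.cumsum([g_fba(zk, beta) for zk in z])))  # h(beta, l)
    partial = np.concatenate(([0.0], np.cumsum(z)))      # sum_{k<=l} z_k
    return float(np.max(h + partial - beta * S * np.abs(l - m)))
```

At the kink $-z_{ik}^- = 1/(2\beta)$ both branches equal $1/(4\beta)$, so $g_{ik}$ is continuous, consistent with the convex line-then-hyperbola shape described in the text.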

The robust FVA can be written as

$$\sup_{\Phi \in U_\delta(\Phi_N)} \mathbb{E}_\Phi\left[ \langle Z, Y^{CF} \rangle \right]. \qquad \text{(P3)}$$

Similar to before, use recent duality results, noting that the inner product $\langle \cdot, \cdot \rangle$ satisfies the upper semicontinuity condition of the Lagrangian duality theorem, and the cost function $c_S$ satisfies the non-negative lower semicontinuity condition (see Blanchet and Murthy (2019), Assumptions 1 & 2, and Gao and Kleywegt (2016)). Hence the dual problem (to the sup above) can be written as

$$\inf_{\alpha \ge 0} F(\alpha) := \left[ \alpha \delta + \frac{1}{N} \sum_{i=1}^{N} \Psi_\alpha(z_i, y_i^{cf}) \right], \qquad \text{(D3)}$$

where

$$\Psi_\alpha(z_i, y_i^{cf}) = \sup_{u \in \mathbb{R}^n,\, v \in \mathcal{B}^n} \left[ \langle u, v \rangle - \alpha\, c_S((u, v), (z_i, y_i^{cf})) \right] = \sup_{u \in \mathbb{R}^n,\, v \in \mathcal{B}^n} \left[ \langle u, v \rangle - \alpha \left( \langle u - z_i, u - z_i \rangle + S \langle v - y_i^{cf}, v - y_i^{cf} \rangle \right) \right].$$

Now apply the change of variables $w_1 = u - z_i$ and $w_2 = v - y_i^{cf}$ to get

$$\Psi_\alpha(z_i, y_i^{cf}) = \sup_{w_1 \in \mathbb{R}^n,\, w_2 \in \tilde{\mathcal{B}}^n} \left[ \langle w_1 + z_i, w_2 + y_i^{cf} \rangle - \alpha \left( \langle w_1, w_1 \rangle + S \langle w_2, w_2 \rangle \right) \right],$$

where the sets $\mathcal{B}^n$ and $\tilde{\mathcal{B}}^n$ are defined as before. It turns out that $\Psi_\alpha$ can be expressed as the original FVA term plus the pointwise max of $(n+1)$ convex functions. The degenerate case $l = 0$ is a line of non-positive slope, and the $l = 1, \ldots, n$ cases are a hyperbola plus a line of non-positive slope. $\Psi_\alpha$ quantifies the adversarial move in FVA across both the time and spatial dimensions, while accounting for the cost via the $K$ terms.

Proposition 5.

We have

$$\Psi_\alpha(z_i, y_i^{cf}) = \langle z_i, y_i^{cf} \rangle + \left[ \frac{l^*}{4\alpha} + \left( \sum_{k=1}^{l^*} z_{ik} - \sum_{k=1}^{\|y_i^{cf}\|} z_{ik} \right) - \alpha S K \right],$$

where $l^* = \arg\max_{l \ge 0} \left[ \frac{l}{4\alpha} + \sum_{k=1}^{l} z_{ik} - \alpha S K \right]$ and $l = \|w_2 + y_i^{cf}\| \ge 0$, $l \in \mathbb{Z}_+$. Also $\|y_i^{cf}\| \in \mathbb{Z}_+$, and $K = |l - \|y_i^{cf}\|| = \|w_2\| \ge 0$, $K \in \mathbb{Z}_+$. Once $l^*$ is selected, $K := |l^* - \|y_i^{cf}\|| = \|w_2^*\|$. Alternatively,

$$\Psi_\alpha(z_i, y_i^{cf}) = \langle z_i, y_i^{cf} \rangle + \bigvee_{l=0}^{n} h_\alpha(l), \qquad h_\alpha(l) := \left[ \frac{l}{4\alpha} + \left( \sum_{k=1}^{l} z_{ik} - \sum_{k=1}^{\|y_i^{cf}\|} z_{ik} \right) - \alpha S K \right].$$

Proof sketch.

This result follows from jointly maximizing over the adversarial funding exposure $w_1$ and the survival time index $w_2$. The structure of $\tilde{\mathcal{B}}^n$ allows us to decouple this joint maximization, find the critical point that maximizes the quadratic in $w_1$, and write down the condition to select the optimal survival time index $l^*$. Finally, consider the two cases $w_2 = 0$ and $w_2 \neq 0$. The $K$ terms represent the cost associated with the worst case.

The goal now is to evaluate

$$\inf_{\alpha \ge 0} F(\alpha) := \left[ \alpha \delta + \frac{1}{N} \sum_{i=1}^{N} \Psi_\alpha(z_i, y_i^{cf}) \right],$$

where

$$\Psi_\alpha(z_i, y_i^{cf}) = \langle z_i, y_i^{cf} \rangle + \bigvee_{l=0}^{n} h_\alpha(l), \qquad h_\alpha(l) := \left[ \frac{l}{4\alpha} + \left( \sum_{k=1}^{l} z_{ik} - \sum_{k=1}^{\|y_i^{cf}\|} z_{ik} \right) - \alpha S K \right].$$

Since each $\Psi_\alpha$ is a pointwise supremum of functions that are affine in $\alpha$, the objective function $F(\alpha)$ is convex, which simplifies the task of solving this optimization problem: the first order optimality condition suffices. As $\Psi_\alpha$, and hence $F(\alpha)$, may have non-differentiable kinks due to the max functions $\vee$, we characterize the optimality condition via subgradients. In particular, we look for $\alpha^* \ge 0$ such that $0 \in \partial F(\alpha^*)$. Inspection of the asymptotic properties of $\Psi_\alpha$ and its subgradients reveals that $\partial F(\alpha)$ crosses zero as $\alpha$ sweeps from $0$ to $\infty$, and hence such an $\alpha^* \ge 0$ exists.

Proposition 6.

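Let $\alpha^* \in \{ \alpha \ge 0 : 0 \in \partial F(\alpha) \}$, where

$$\partial \Psi_\alpha = \mathrm{Conv} \bigcup \left\{ \partial h_\alpha(l) \mid \langle z_i, y_i^{cf} \rangle + h_\alpha(l) = \Psi_\alpha,\ l \in \{0, \ldots, n\} \right\}$$

and

$$\partial F(\alpha) = \delta + \frac{1}{N} \sum_{i=1}^{N} \partial \Psi_\alpha.$$

Proof sketch.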

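This follows from the application of standard properties of subgradients, as well as inspection of the asymptotic properties of $\Psi_\alpha$ and $\partial \Psi_\alpha$. For $\alpha$ sufficiently small, $\Psi_\alpha$ has a large positive value and $\partial \Psi_\alpha$ consists of large negative values. For $\alpha$ sufficiently large, for the optimal $l^*$, either $l^* = 0 \implies 0 \in \partial \Psi_\alpha$, or $l^* = \|y_i^{cf}\| > 0 \implies \partial \Psi_\alpha$ approaches zero. Hence $\partial F(\alpha)$ crosses zero, moving from negative values for small $\alpha$ to values near $\delta > 0$ for large $\alpha$.

Theorem 3.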

The primal problem P3 has solution $\left[ \alpha^* \delta + \frac{1}{N} \sum_{i=1}^{N} \Psi_{\alpha^*}(z_i, y_i^{cf}) \right]$, where $\alpha^* \in \{ \alpha \ge 0 : 0 \in \partial F(\alpha) \}$ and $\Psi_{\alpha^*}(z_i, y_i^{cf}) = \langle z_i, y_i^{cf} \rangle + \bigvee_{l=0}^{n} h_{\alpha^*}(l)$ for

$$h_{\alpha^*}(l) := \left[ \frac{l}{4\alpha^*} + \left( \sum_{k=1}^{l} z_{ik} - \sum_{k=1}^{\|y_i^{cf}\|} z_{ik} \right) - \alpha^* S K \right].$$

Expressed in terms of the original FVA, this says

$$\sup_{\Phi \in U_\delta(\Phi_N)} \mathbb{E}_\Phi\left[ \langle Z, Y^{CF} \rangle \right] = \mathbb{E}_{\Phi_N}\left[ \langle Z, Y^{CF} \rangle \right] + \alpha^* \delta + \mathbb{E}_{\Phi_N}\left[ \bigvee_{l=0}^{n} \left( \frac{l}{4\alpha^*} + \left( \sum_{k=1}^{l} Z_k - \sum_{k=1}^{\|Y^{CF}\|} Z_k \right) - \alpha^* S K \right) \right],$$

where the additional terms represent a penalty due to uncertainty in the probability distribution.

Proof sketch.

This follows directly from the previous two propositions.

Computational Study: Robust FVA and Wrong Way Funding Risk

This computational study uses the Matlab Financial Instruments Toolbox and extends the WWR portfolio analysis of Brigo et al. (2013, section 5.3) to consider uncertainty in the probability distribution. Other key concepts leveraged in this section are the measure concentration results and the association of the Wasserstein radius $\delta$ with a confidence level $1 - \beta$ for some $\beta \in (0, 1)$. As of July 10, 2019, the 5y par interest rate swap rate is 1.88% (see ). The full table is shown below. Furthermore, Bloomberg shows the U.S. CDX investment grade and high yield 5y credit default swap spreads below.

Table 1: Swap Rates
Swap Tenor    1y     2y     3y     5y     7y     10y    30y
Swap Rate     2.13%  1.95%  1.89%  1.88%  1.94%  2.05%  2.27%

Table 2: CDS Spreads
CDX Index          IG    HY
CDS Spread (bp)    53    323

Referencing current (as of August 26, 2019) Markit funding spreads, the funding spread curves are set as below. Unavailable quotes for high yield spreads are displayed as "N/A".

Table 3: Funding Spreads
Funding Tenor    1y     2y     3y     5y     7y     10y
IG Spread        0.13%  0.21%  0.31%  0.59%  0.84%  1.07%
HY Spread        N/A    N/A    3.02%  3.62%  4.01%  4.09%

The computational studies in this section investigate (and quantify) worst case FCA, FBA, and FVA for different market environments and portfolios of interest rate swaps. The current swap curve (shown above) is used in conjunction with Monte Carlo simulation of a one-factor Hull-White model for interest rates. The funding spreads are used in conjunction with a LIBOR Market Model (LMM) simulation of forward funding spreads. For this analysis, the same funding spreads are used for both the FCA and FBA calculations. The counterparty credit curve selection varies between investment grade and high yield (as shown above). The different portfolio setups are described in the following sections. All calculations are done in Matlab as an extension of the example provided in the Financial Instruments Toolbox (Matlab, 2019). The default Matlab settings for volatility and correlation parameters are used for the Hull-White and LMM term structure models. Independence between the funding cost, interest rate, and credit default factors is assumed for the joint simulation.

As mentioned in Part 1 (Singh and Zhang, 2019), a natural question to ask when computing worst case FVA is how to interpret the size of the Wasserstein radius δ. Substantial research has been done on this question, and some key results are summarized here. The following result is due to Fournier and Guillin (2015); the constants $c_1, c_2$ below can be calculated explicitly by following the proof:

$$\mathbb{P}\left[D_c(\Phi, \Phi_N) \ge \delta\right] \le \begin{cases} c_1 \exp\left(-c_2 N \delta^{\max\{n,2\}}\right) & \text{if } \delta \le 1, \\ c_1 \exp\left(-c_2 N \delta^{a}\right) & \text{if } \delta > 1, \end{cases}$$

for all $N \ge 1$, $n \ne 2$, and $\delta > 0$, where $c_1 > 0$ and $c_2 > 0$ are constants that depend only on $a$, $A$, and $n$. Esfahani and Kuhn (2018) discuss how equating the right-hand side above to β and solving for δ gives

$$\delta_N(\beta) = \begin{cases} \left(\dfrac{\log(c_1 \beta^{-1})}{c_2 N}\right)^{1/\max\{n,2\}} & \text{if } N \ge \dfrac{\log(c_1 \beta^{-1})}{c_2}, \\ \left(\dfrac{\log(c_1 \beta^{-1})}{c_2 N}\right)^{1/a} & \text{if } N < \dfrac{\log(c_1 \beta^{-1})}{c_2}. \end{cases}$$

However, these bounds are overly conservative and result in a radius $\delta^*$ much larger than necessary.

As an alternative approach, we follow a method that provides a more explicit mapping between δ and β (Carlsson et al., 2018, Section 3). Theorem 6.15 of Villani (2008) bounds the Wasserstein distance between two pdfs $\Phi, \Phi'$ as

$$D(\Phi, \Phi') \le \iint_{R} \|x - x_0\| \cdot |\Phi(x) - \Phi'(x)| \, dA.$$

Building on this bound, Carlsson et al. (2018) obtain an explicit concentration estimate of the form

$$\mathbb{P}\left(D(\Phi, \Phi_N) \ge \delta\right) \lesssim \exp\left(-N \, \rho_r(\delta)\right),$$

where $\rho_r(\delta)$ denotes the closed-form exponent in δ and r derived in Carlsson et al. (2018, Section 3), and $r = \max_{x_1 \in R,\, x_2 \in R} \|x_1 - x_2\|$ denotes the radius of the domain R. This is the characterization used in this study. Therefore, for a desired significance (confidence) level β ∈ (0, 1), we find $\delta_\beta$ such that

$$1 - \beta = \exp\left(-N \, \rho_r(\delta_\beta)\right).$$

In our problem setting we use $r_N$ (computed using the empirical domain $R_N$) as the discrete approximation to r, which is difficult to bound.

The portfolio here consists of a dozen interest rate swaps, with a mix of receiving fixed and paying fixed swaps, at different coupons, maturities, and notionals.
The fixed coupons range between 2% and 2.5%, the maturities range between 4y and 12y, and the notionals range between 400k USD and 1mm USD. The investment grade counterparty and firm credit spreads are set to 50 basis points. The table of confidence levels β and their corresponding Wasserstein radii δ follows.

Table 4: FCA Investment Grade Wasserstein Radii

| Confidence Level | 0.80 | 0.85 | 0.90 | 0.95 | 0.99 | 0.999 |
|---|---|---|---|---|---|---|
| W Radius δ | 1.2 | 1.3 | 1.5 | 1.7 | 2.1 | 2.6 |
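The Esfahani and Kuhn radius δ_N(β) discussed above is easy to evaluate once its constants are known, and the empirical domain radius r_N can be computed directly from the sample. The sketch below assumes r_N is the largest pairwise Euclidean distance between sample points, matching the max over pairs of points in the definition of r. The constants c1, c2, n, a in the example are hypothetical, since the paper notes they follow from the Fournier and Guillin proof but does not report values. In this formula β is the tail probability, following the Esfahani and Kuhn convention.

```python
import math

import numpy as np


def esfahani_kuhn_radius(beta, N, c1, c2, n, a):
    """delta_N(beta) obtained by equating the Fournier-Guillin bound to beta.

    beta is the tail (significance) probability in (0, 1); c1, c2, a are the constants of the
    light-tailed concentration bound and n is the dimension of the data.
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie in (0, 1)")
    log_term = math.log(c1 / beta)
    if log_term <= 0.0:
        return 0.0  # c1 <= beta: the bound is below beta for every delta > 0
    ratio = log_term / (c2 * N)
    if N >= log_term / c2:          # equivalently ratio <= 1, i.e. the delta <= 1 branch
        return ratio ** (1.0 / max(n, 2))
    return ratio ** (1.0 / a)       # delta > 1 branch


def empirical_domain_radius(samples):
    """r_N: largest pairwise Euclidean distance among the sample points (O(N^2) memory)."""
    x = np.asarray(samples, dtype=float)
    diffs = x[:, None, :] - x[None, :, :]
    return float(np.sqrt((diffs ** 2).sum(axis=-1)).max())


# Hypothetical constants; the paper does not report numerical values for c1, c2.
for beta in (0.20, 0.05, 0.01):
    print(beta, esfahani_kuhn_radius(beta, N=1000, c1=2.0, c2=1.0, n=3, a=2.0))
```

Smaller tail probabilities and smaller sample sizes both enlarge the radius, which is the qualitative pattern seen across the confidence levels in the tables of this section.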

The scale factor S is set (by default) to 1 and the portfolio exposures are scaled to be in units of thousands of dollars. Again, the intent of scaling is to provide an appropriate penalty on the adversarial change in the joint distribution of portfolio funding exposures and default times that promotes worst case FCA and wrong way risk. Further work may conduct a sensitivity analysis of the pairings of S and units of portfolio exposures to identify suitable (unsuitable) ranges that preserve (distort) the shape of the robust FCA profile. Matlab plots characterizing the FCA exposure profile and the trajectory of worst case FCA as a function of Wasserstein radius are shown. Again, we compare worst case FCA (which incorporates joint survival probability) with the funding PFE (potential future exposure), which shows tail percentiles of funding exposure (not scaled by joint survival probability).

The baseline FCA for this portfolio is small (about 2.2k USD) and represents the dot product of the discounted portfolio funding exposure profile and the joint survival probability. The worst case FCA curve is shown below. Note that the worst case FCA is approximately 70% of the integrated (lifetime) FCA PFE for a Wasserstein radius δ of about 1.7, which maps to a confidence level of around 95%. So the takeaway is that worst case FCA is still a significant percentage of funding PFE for swap portfolios with low counterparty default curves (investment grade).

3.2.2 Portfolio of Interest Rate Swaps, High Yield Counterparty and Firm

The reference portfolio here consists of a dozen interest rate swaps, with a mix of receiving fixed and paying fixed swaps, at different coupons, maturities, and notionals. The fixed coupons range between 2% and 2.5%, the maturities range between 4y and 12y, and the notionals range between 400k USD and 1mm USD. The high yield counterparty and firm credit spreads are set to 320 basis points.
The table of confidence levels β and their corresponding Wasserstein radii δ is shown below. For the same reference portfolio and the same set of Monte Carlo interest rate paths, the maximum interest rate exposures are the same. However, the high yield credit spreads and funding costs expand the Wasserstein radii.

Table 5: FCA High Yield Wasserstein Radii

| Confidence Level | 0.80 | 0.85 | 0.90 | 0.95 | 0.99 | 0.999 |
|---|---|---|---|---|---|---|
| W Radius δ | 3.1 | 3.4 | 3.7 | 4.2 | 5.3 | 6.5 |

Figure 4: Swaps Portfolio Positive Exposure Profiles

Figure 5: Swaps Portfolio HY FCA Exposure Profiles

Figure 6: Swaps Portfolio Worst Case HY FCA Profile
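The baseline FCA quoted in these experiments is a dot product of the discounted funding exposure profile with the joint survival probability. The sketch below assumes one common discretization in which the funding exposure at each grid date is the discount factor times the expected positive exposure times the funding spread times the accrual period (measured from the previous grid date). This decomposition, the grid convention, and the names are assumptions, not the paper's exact formula.

```python
import numpy as np


def baseline_fca(times, epe, discount, funding_spread, joint_survival):
    """Baseline FCA as <discounted funding exposure, joint survival>.

    Assumed discretisation: funding exposure at t_k = DF(t_k) * EPE(t_k) * s_F(t_k) * (t_k - t_{k-1}),
    with t_0 = 0. All profiles live on the same time grid (years); spreads are decimals.
    """
    times = np.asarray(times, dtype=float)
    accrual = np.diff(np.concatenate([[0.0], times]))
    funding_exposure = (np.asarray(discount, dtype=float) * np.asarray(epe, dtype=float)
                        * np.asarray(funding_spread, dtype=float) * accrual)
    return float(np.dot(funding_exposure, np.asarray(joint_survival, dtype=float)))


# Toy inputs (hypothetical), exposures in thousands of USD; spreads taken from the IG row of Table 3.
times = np.array([1.0, 2.0, 3.0])
print(baseline_fca(times, [50.0, 40.0, 20.0], [0.98, 0.96, 0.94], [0.0013, 0.0021, 0.0031], [0.98, 0.97, 0.95]))
```

Because the joint survival probability enters linearly, wider counterparty and firm spreads shrink each term, while wider funding spreads enlarge it; the HY configuration combines both effects.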

The baseline FCA for this portfolio is higher, around 65k USD, and represents the dot product of the discounted portfolio funding exposure profile and the joint survival probability. The worst case FCA curve is shown. Note that the worst case FCA is still approximately 33% of the integrated (lifetime) FCA PFE (potential future exposure) for a Wasserstein radius δ of about 4.2, which maps to a confidence level of around 95%. So the takeaway is that worst case FCA is a significant percentage of funding PFE for swap portfolios with moderately high counterparty and firm default curves.

The portfolio here consists of a dozen interest rate swaps, with a mix of receiving fixed and paying fixed swaps, at different coupons, maturities, and notionals. The fixed coupons range between 2% and 2.5%, the maturities range between 4y and 12y, and the notionals range between 4mm USD and 10mm USD. The investment grade counterparty and firm credit spreads are set to 50 basis points. The table of confidence levels β and their corresponding Wasserstein radii δ follows.

Table 6: FBA Investment Grade Wasserstein Radii

| Confidence Level | 0.80 | 0.85 | 0.90 | 0.95 | 0.99 | 0.999 |
|---|---|---|---|---|---|---|
| W Radius δ | 1.0 | 1.1 | 1.2 | 1.3 | 1.7 | 2.1 |

The scale factor S is set (by default) to 1 and the portfolio exposures are scaled to be in units of thousands of dollars. The same comments as above regarding scaling apply. Matlab plots characterizing the FBA exposure profile and the trajectory of worst case FBA as a function of Wasserstein radius are shown.

Figure 7: Swaps Portfolio Negative Exposure Profiles

Figure 8: Swaps Portfolio IG FBA Exposure Profiles

Figure 9: Swaps Portfolio Worst Case IG FBA Profile
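The exposure profiles plotted in these figures are summary statistics of simulated portfolio values. A small sketch of how expected positive exposure (EPE), expected negative exposure (ENE), and a PFE percentile could be computed from a matrix of simulated mark-to-market values follows. The array layout (paths by time steps), the 95% default quantile, the random-walk toy data, and the function names are assumptions for illustration.

```python
import numpy as np


def exposure_profiles(mtm):
    """EPE and ENE per time step from simulated values, array of shape (n_paths, n_times)."""
    mtm = np.asarray(mtm, dtype=float)
    epe = np.maximum(mtm, 0.0).mean(axis=0)   # drives the funding cost side (FCA)
    ene = np.minimum(mtm, 0.0).mean(axis=0)   # drives the funding benefit side (FBA)
    return epe, ene


def pfe_profile(mtm, q=0.95):
    """Tail percentile of positive exposure per time step (not weighted by survival)."""
    return np.quantile(np.maximum(np.asarray(mtm, dtype=float), 0.0), q, axis=0)


rng = np.random.default_rng(1)
mtm = rng.normal(0.0, 10.0, size=(5000, 12)).cumsum(axis=1)  # hypothetical random-walk values
epe, ene = exposure_profiles(mtm)
print(np.round(epe[:3], 2), np.round(ene[:3], 2), np.round(pfe_profile(mtm)[:3], 2))
```

Note that EPE + ENE equals the mean portfolio value at each date, so a portfolio that is balanced on average carries both a funding cost and a funding benefit component.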

The baseline FBA for this portfolio is −2k USD and represents the dot product of the discounted negative portfolio funding exposure profile and the joint survival probability. The worst case FBA plot is shown. The plot illustrates that worst case FBA quickly attains its lower bound (in magnitude) of zero, i.e., no funding benefit to the firm.
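One way to see why worst case FBA can approach but never cross zero is the per-coordinate problem solved in the supplement (Proposition 3): the adversarial perturbation w of a negative exposure z ≤ 0 is constrained by w ≤ −z, so it can shrink the funding benefit but never flip its sign. The sketch below checks the closed form of sup over w ≤ −z of [w − βw²] against a brute-force grid. Here β is the dual multiplier of Proposition 3, not the confidence level, and the function names are hypothetical.

```python
import numpy as np


def g_closed_form(z_minus, beta):
    """sup over w <= -z of (w - beta * w**2), for a negative exposure z <= 0 and beta > 0."""
    cap = -z_minus                      # perturbation may raise the exposure at most to zero
    if cap <= 1.0 / (2.0 * beta):
        return cap - beta * z_minus ** 2   # constraint binds: w* = -z
    return 1.0 / (4.0 * beta)              # interior optimum: w* = 1 / (2 beta)


def g_brute_force(z_minus, beta, n_grid=200001):
    """Grid search over w in [0, -z]; w < 0 never helps since the objective is then negative."""
    w = np.linspace(0.0, -z_minus, n_grid)
    return float(np.max(w - beta * w ** 2))


for z, b in [(-1.0, 1.0), (-0.2, 1.0), (0.0, 2.0), (-3.0, 0.1)]:
    print(z, b, g_closed_form(z, b), g_brute_force(z, b))
```

As β decreases (a larger Wasserstein budget), the binding branch −z − βz² applies and the perturbed exposure z + w* reaches zero, which matches the FBA trajectories flattening at zero.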

The reference portfolio here is the same one used in the previous subsection, albeit with notionals from 4mm to 10mm USD. The high yield counterparty and firm credit spreads are set to 320 basis points. The table of confidence levels β and their corresponding Wasserstein radii δ is shown. For the same reference portfolio and the same set of Monte Carlo interest rate paths, the maximum interest rate exposures are the same. However, the high yield credit spreads and funding costs expand the Wasserstein radii.

Table 7: FBA High Yield Wasserstein Radii

| Confidence Level | 0.80 | 0.85 | 0.90 | 0.95 | 0.99 | 0.999 |
|---|---|---|---|---|---|---|
| W Radius δ | 2.0 | 2.2 | 2.4 | 2.8 | 3.4 | 4.3 |

A series of Matlab plots characterizing the FBA exposure profile and the trajectory of worst case FBA as a function of Wasserstein radius is shown.

The baseline FBA for this portfolio is −6k USD and represents the dot product of the discounted negative portfolio funding exposure profile and the joint survival probability. The worst case FBA plot is shown. Once again, the plot illustrates that worst case FBA moves towards its lower bound (in magnitude) of zero (no funding benefit to the firm). Note that the worst case FBA is just 2.1% of the integrated (lifetime) FBA PFE (potential future exposure) for a Wasserstein radius δ of about 2.8, which maps to a confidence level of around 95%. So worst case FBA moves rapidly towards its lower bound (in magnitude) of zero.

3.4 FVA

The corresponding FCA portfolio is used for comparison. The portfolio consists of a dozen interest rate swaps, with a mix of receiving fixed and paying fixed swaps, at different coupons, maturities, and notionals. The fixed coupons range between 2% and 2.5%, the maturities range between 4y and 12y, and the notionals range between 400k USD and 1mm USD. The investment grade counterparty and firm credit spreads are set to 50 basis points. The table of confidence levels β and their corresponding Wasserstein radii δ follows.

Table 8: FVA Investment Grade Wasserstein Radii

| Confidence Level | 0.80 | 0.85 | 0.90 | 0.95 | 0.99 | 0.999 |
|---|---|---|---|---|---|---|
| W Radius δ | 1.4 | 1.5 | 1.7 | 1.9 | 2.4 | 2.9 |

The scale factor S is set (by default) to 1 and the portfolio exposures are scaled to be in units of thousands of dollars. The same comments as above, for FCA and FBA, regarding scaling apply. Matlab plots characterizing the FVA positive and negative exposure profiles and the trajectory of worst case FVA as a function of Wasserstein radius are shown.

Figure 13: Swaps Portfolio Positive Exposure Profiles

Figure 14: Swaps Portfolio Negative Exposure Profiles

FCA PFE for Wasserstein radius δ of about 1.9, which maps to a confidence level of around 95%. So the takeaway is that worst case FVA is still a significant percentage of funding PFE for swap portfolios with low counterparty default curves (investment grade). It is also interesting to compare worst case FVA with worst case FCA for a given δ. For example, for δ of 1.7, which represents the 95% confidence level for FCA, we see a worst case FVA of around 9, which is below the worst case FCA of 11.8. This agrees with the intuition that FVA should be less than FCA due to the funding benefit from FBA.

3.4.2 Portfolio of Interest Rate Swaps, High Yield Counterparty and Firm

The corresponding FCA portfolio is used for comparison. The portfolio consists of a dozen interest rate swaps, with a mix of receiving fixed and paying fixed swaps, at different coupons, maturities, and notionals. The fixed coupons range between 2% and 2.5%, the maturities range between 4y and 12y, and the notionals range between 400k USD and 1mm USD. The high yield counterparty and firm credit spreads are set to 320 basis points. The table of confidence levels β and their corresponding Wasserstein radii δ follows.

Table 9: FVA High Yield Wasserstein Radii

| Confidence Level | 0.80 | 0.85 | 0.90 | 0.95 | 0.99 | 0.999 |
|---|---|---|---|---|---|---|
| W Radius δ | 3.8 | 4.2 | 4.6 | 5.3 | 6.6 | 8.1 |

The scale factor S is set (by default) to 1 and the portfolio exposures are scaled to be in units of thousands of dollars. The same comments as above, for FCA and FBA, regarding scaling apply. Matlab plots characterizing the FVA positive and negative exposure profiles and the trajectory of worst case FVA as a function of Wasserstein radius are shown.

Figure 18: Swaps Portfolio Positive Exposure Profiles

Figure 19: Swaps Portfolio Negative Exposure Profiles
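The baseline FVA described next combines two dot products. A minimal sketch, assuming the funding exposure profiles are already discounted and signed (non-negative on the cost side, non-positive on the benefit side), makes explicit why FVA cannot exceed FCA: the FBA term is never positive. The names and toy inputs are hypothetical.

```python
import numpy as np


def baseline_fva(disc_funding_pos, disc_funding_neg, joint_survival):
    """Return (FCA, FBA, FVA) with FVA = FCA + FBA.

    disc_funding_pos: discounted funding cost exposure profile (>= 0)
    disc_funding_neg: discounted funding benefit exposure profile (<= 0)
    """
    s = np.asarray(joint_survival, dtype=float)
    fca = float(np.dot(np.asarray(disc_funding_pos, dtype=float), s))
    fba = float(np.dot(np.asarray(disc_funding_neg, dtype=float), s))
    return fca, fba, fca + fba


fca, fba, fva = baseline_fva([1.0, 2.0], [-0.5, -0.5], [1.0, 0.5])
print(fca, fba, fva)  # FVA <= FCA because FBA <= 0
```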

The baseline FVA for this portfolio is small (less than 1k USD) and represents the dot product of the discounted portfolio FCA exposure profile and the joint survival probability, plus the dot product of the discounted portfolio FBA exposure profile and the joint survival probability. The worst case FVA curve is shown below. Note that the worst case FVA is approximately 23% of the FCA PFE for a Wasserstein radius δ of about 5.3, which maps to a confidence level of around 95%. So the takeaway is that worst case FVA is still a significant percentage of funding PFE for swap portfolios with high counterparty and firm default curves. It is also interesting to compare worst case FVA with worst case FCA for a given δ. For example, for δ of 4.2, which represents the 95% confidence level for FCA, we see a worst case FVA of around 12.5, which is below the worst case FCA of 19. This agrees with the intuition that FVA should be less than FCA due to the funding benefit from FBA.

Conclusions and Further Work

This work has developed theoretical results and investigated calculations of robust FVA and wrong way risk for OTC derivatives under distributional uncertainty, using Wasserstein distance as the ambiguity measure. The financial market overview, the foundational notation, and the wrong way risk (robust FVA) primal problem definitions were introduced in Section 1. Using recent duality results (Blanchet and Murthy, 2019), the simpler dual formulation and its analytic solutions for FCA, FBA, and FVA were derived in Section 2. After that, in Section 3, some computational experiments were conducted to measure the additional FCA charge (and/or FBA impairment) due to distributional uncertainty for a variety of portfolio and market configurations for FCA, FBA, and FVA. Using some probability results on bounding the Wasserstein distance between distributions (Carlsson et al., 2018), a mapping between Wasserstein radii δ and confidence levels β was devised to study the trajectories of wrong way risk as a function of radius δ. FCA increased to a significant percentage of PFE. FBA quickly reached its lower bound of zero funding benefit. FVA was below FCA (as expected) but still showed an upward (apparently concave) trajectory as radius δ increased. Finally, we conclude with some commentary on directions for further research.

One direction for future research, as previously discussed, is a thorough study (including sensitivity analysis) of the pairings of scale factors $(S_1, S_2, S_3)$ and units of portfolio exposures, to identify suitable (unsuitable) ranges that preserve (distort) the shape of the robust FCA, FBA, and FVA profiles, respectively, as a function of Wasserstein radii (and hence distributional uncertainty). As a reminder, the intent of scaling is to provide an appropriate penalty on the adversarial change in the joint distribution of portfolio exposures and default times that promotes worst case FCA, FBA, FVA and wrong way risk.
Another direction for future research would be to develop (and apply) similar theoretical machinery as used for robust FVA and wrong way risk in this work towards robust KVA (Capital Valuation Adjustment) and MVA (Margin Valuation Adjustment), and wrong way risk in that context. Intuitively, wrong way risk arises in that context when the market cost of capital and/or of funding the margin position increases at the same time as the portfolio exposure increases.

References

Blanchet, J., Chen, L., and Zhou, X. Y. (2018). Distributionally robust mean-variance portfolio selection with Wasserstein distances.

Blanchet, J. and Murthy, K. (2019). Quantifying distributional model risk via optimal transport. Mathematics of Operations Research, 44(2):565–600.

Brigo, D., Morini, M., and Pallavicini, A. (2013). Counterparty credit risk, collateral and funding: with pricing cases for all asset classes, volume 478. John Wiley & Sons.

Carlsson, J. G., Behroozi, M., and Mihic, K. (2018). Wasserstein distance and the distributionally robust TSP. Operations Research, 66(6):1603–1624.

El Hajjaji, O. and Subbotin, A. (2015). CVA with wrong way risk: Sensitivities, volatility and hedging. International Journal of Theoretical and Applied Finance, 18(03):1550017.

Esfahani, P. M. and Kuhn, D. (2018). Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations. Mathematical Programming, 171(1-2):115–166.

Fournier, N. and Guillin, A. (2015). On the rate of convergence in Wasserstein distance of the empirical measure. Probability Theory and Related Fields, 162(3-4):707–738.

Gao, R. and Kleywegt, A. J. (2016). Distributionally robust stochastic optimization with Wasserstein distance. arXiv preprint arXiv:1604.02199.

Glasserman, P. and Yang, L. (2015). Bounding wrong-way risk in measuring counterparty risk. Office of Financial Research Working Paper, (15-16):15–76.

Green, A. (2015). XVA: Credit, Funding and Capital Valuation Adjustments. John Wiley & Sons.

Lichters, R., Stamm, R., and Gallagher, D. (2015). Modern derivatives pricing and credit exposure analysis: theory and practice of CSA and XVA pricing, exposure simulation and backtesting. Springer.

Matlab (2019). Matlab, counterparty credit risk and CVA. Accessed: 2019-07-30.

Memartoluie, A. (2017). Computational methods in finance related to distributions with known marginals.

Ramzi Ben-Abdallah, M. B. and Marzouk, O. (2019). Wrong-way risk of interest rate instruments. Journal of Credit Risk, pages 21–44.

Singh, D. and Zhang, S. (2019). Distributionally robust XVA via Wasserstein distance part 1. arXiv preprint arXiv:1910.01781.

Villani, C. (2008). Optimal transport: old and new, volume 338. Springer Science & Business Media.

Supplement for Theory: Robust FVA and Wrong Way Funding Risk (Section 2)
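To make the dual characterization in this supplement concrete, the following sketch evaluates the closed form of Ψ_γ(z_i^+, y_{cf_i}) from Proposition 1 below and minimizes the dual objective H(γ) = γδ + (1/N) Σ_i Ψ_γ(z_i^+, y_{cf_i}) over γ > 0, which by Theorem 1 gives the worst case FCA. It assumes each survival vector y_{cf_i} has the form (1, ..., 1, 0, ..., 0) and is encoded by its number of ones m = ‖y_{cf_i}‖_1, takes S = 1 as in the experiments, and locates γ* with a golden-section search on log γ, which is valid because H is convex (Proposition 2). The search bracket, the toy data, and all names are hypothetical; the paper characterizes γ* through the subgradient condition 0 ∈ ∂H(γ) and does not state how it was located numerically.

```python
import math

import numpy as np


def psi_gamma(z, m, gamma, S=1.0):
    """Closed-form Psi_gamma(z, y) for a survival vector y = (1,...,1,0,...,0) with m ones.

    Psi = <z, y> + max_{l=0..n} [ l/(4 gamma) + (sum_{k<=l} z_k - sum_{k<=m} z_k) - gamma*S*|l - m| ]
    """
    z = np.asarray(z, dtype=float)
    cum = np.concatenate([[0.0], np.cumsum(z)])  # cum[l] = z_1 + ... + z_l
    l_vals = np.arange(z.size + 1)
    h = l_vals / (4.0 * gamma) + (cum - cum[m]) - gamma * S * np.abs(l_vals - m)
    return cum[m] + h.max()


def dual_objective(gamma, Z, M, delta, S=1.0):
    """H(gamma) = gamma * delta + (1/N) * sum_i Psi_gamma(z_i, y_i)."""
    return gamma * delta + float(np.mean([psi_gamma(z, m, gamma, S) for z, m in zip(Z, M)]))


def solve_dual(Z, M, delta, S=1.0, lo=1e-6, hi=1e6, iters=200):
    """Golden-section search for gamma* on log(gamma); H is convex, hence unimodal."""
    def f(u):
        return dual_objective(math.exp(u), Z, M, delta, S)

    a, b = math.log(lo), math.log(hi)
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - phi * (b - a), a + phi * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc < fd:                       # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - phi * (b - a)
            fc = f(c)
        else:                             # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + phi * (b - a)
            fd = f(d)
    gamma_star = math.exp(0.5 * (a + b))
    return gamma_star, dual_objective(gamma_star, Z, M, delta, S)


# Toy data (hypothetical): two scenarios, discounted funding exposures on a 3-date grid.
Z = [[1.0, 2.0, 0.5], [0.2, 0.1, 0.3]]
M = [3, 1]                      # survival through all 3 dates / through the first date only
gamma_star, worst = solve_dual(Z, M, delta=0.5)
baseline = float(np.mean([sum(z[:m]) for z, m in zip(Z, M)]))
print(gamma_star, baseline, worst)
```

The gap between `worst` and `baseline` is the penalty term of Theorem 1; it grows with δ because H(γ) is pointwise increasing in δ.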

Proposition 1.

Suppose $w_2^* = 0 \implies l = \|y_{cf_i}\|_1$. Then

$$\Psi_\gamma(z_i^+, y_{cf_i}) = \langle z_i^+, y_{cf_i} \rangle + \sup_{w_1 \in \mathbb{R}^n_+} \left[ \langle w_1, y_{cf_i} \rangle - \gamma \langle w_1, w_1 \rangle \right].$$

Applying the Cauchy-Schwarz inequality gives

$$\Psi_\gamma(z_i^+, y_{cf_i}) = \langle z_i^+, y_{cf_i} \rangle + \sup_{\|w_1\|_2} \left[ \|w_1\|_2 \|y_{cf_i}\|_2 - \gamma \|w_1\|_2^2 \right].$$

Evaluating at the critical point $\|w_1^*\|_2 = \frac{\|y_{cf_i}\|_2}{2\gamma} \in \mathbb{R}_+$ of the quadratic gives

$$\Psi_\gamma(z_i^+, y_{cf_i}) = \langle z_i^+, y_{cf_i} \rangle + \frac{\|y_{cf_i}\|_2^2}{4\gamma} = \langle z_i^+, y_{cf_i} \rangle + \frac{\|y_{cf_i}\|_1}{4\gamma}.$$

Case 2

Now consider $w_2^* \neq 0 \implies l \neq \|y_{cf_i}\|_1$. Observe that for $l = \|w_2 + y_{cf_i}\|_1$,

$$\langle w_1 + z_i^+, w_2 + y_{cf_i} \rangle = \sum_{k=1}^{l} (w_{1k} + z_{ik}^+).$$

The structure of the finite set $B_2^n$ implies

$$\Psi_\gamma(z_i^+, y_{cf_i}) = \sup_{w_1 \in \mathbb{R}^n_+,\ l \in \{0,\dots,n\},\ l \neq \|y_{cf_i}\|_1} \left[ \sum_{k=1}^{l} (w_{1k} + z_{ik}^+) - \gamma \left( \langle w_1, w_1 \rangle + S K \right) \right].$$

Again, using that $B_2^n$ is a finite set, one can write

$$\Psi_\gamma(z_i^+, y_{cf_i}) = \max_{l \in \{0,\dots,n\},\ l \neq \|y_{cf_i}\|_1} \ \sup_{w_1 \in \mathbb{R}^n_+} \left[ \sum_{k=1}^{l} (w_{1k} + z_{ik}^+) - \gamma \left( \langle w_1, w_1 \rangle + S K \right) \right].$$

Observing that only the first $l$ components of $w_1$ inside the sup are positive gives, for all $k \in \{1, \dots, l\}$,

$$\sup_{w_1 \in \mathbb{R}^n_+} \left[ \sum_{k=1}^{l} w_{1k} - \gamma \langle w_1, w_1 \rangle \right] = l \times \sup_{w_{1k} \in \mathbb{R}_+} \left[ w_{1k} - \gamma w_{1k}^2 \right].$$

Evaluating at the critical point $w_{1k}^* = \frac{1}{2\gamma} \in \mathbb{R}_+$ of the above quadratic gives

$$\sup_{w_{1k} \in \mathbb{R}_+} \left[ w_{1k} - \gamma w_{1k}^2 \right] = \frac{1}{4\gamma}.$$

Therefore one can write

$$\Psi_\gamma(z_i^+, y_{cf_i}) = \max_{l \in \{0,\dots,n\},\ l \neq \|y_{cf_i}\|_1} \left[ \frac{l}{4\gamma} + \sum_{k=1}^{l} z_{ik}^+ - \gamma S K \right].$$

Furthermore, $l^*$ is determined as

$$l^* = \arg\max_{l \in \{0,\dots,n\},\ l \neq \|y_{cf_i}\|_1} \left[ \frac{l}{4\gamma} + \sum_{k=1}^{l} z_{ik}^+ - \gamma S K \right].$$

Substituting back into the expression for $\Psi_\gamma$ gives

$$\Psi_\gamma(z_i^+, y_{cf_i}) = \langle z_i^+, y_{cf_i} \rangle + \left[ \frac{l^*}{4\gamma} + \left( \sum_{k=1}^{l^*} z_{ik}^+ - \sum_{k=1}^{\|y_{cf_i}\|_1} z_{ik}^+ \right) - \gamma S K \right].$$

Finally, taking the max value of $\Psi_\gamma$ over the cases $w_2^* = 0$ and $w_2^* \neq 0$ gives

$$\Psi_\gamma(z_i^+, y_{cf_i}) = \langle z_i^+, y_{cf_i} \rangle + \left[ \frac{\|y_{cf_i}\|_1}{4\gamma} \right] \vee \left[ \frac{l^*}{4\gamma} + \left( \sum_{k=1}^{l^*} z_{ik}^+ - \sum_{k=1}^{\|y_{cf_i}\|_1} z_{ik}^+ \right) - \gamma S K \right].$$

Observe that for $l^* = \|y_{cf_i}\|_1$, the last term in brackets above evaluates to $\left[ \frac{\|y_{cf_i}\|_1}{4\gamma} \right]$.
Therefore let $l^*$ be determined as

$$l^* = \arg\max_{l \in \{0,\dots,n\}} \left[ \frac{l}{4\gamma} + \sum_{k=1}^{l} z_{ik}^+ - \gamma S K \right],$$

with $K = |l - \|y_{cf_i}\|_1|$, and write

$$\Psi_\gamma(z_i^+, y_{cf_i}) = \langle z_i^+, y_{cf_i} \rangle + \left[ \frac{l^*}{4\gamma} + \left( \sum_{k=1}^{l^*} z_{ik}^+ - \sum_{k=1}^{\|y_{cf_i}\|_1} z_{ik}^+ \right) - \gamma S K \right].$$

Alternatively, one can write

$$\Psi_\gamma(z_i^+, y_{cf_i}) = \langle z_i^+, y_{cf_i} \rangle + \bigvee_{l=0}^{n} \left[ \frac{l}{4\gamma} + \left( \sum_{k=1}^{l} z_{ik}^+ - \sum_{k=1}^{\|y_{cf_i}\|_1} z_{ik}^+ \right) - \gamma S K \right].$$

Proposition 2.

Let $\gamma^* \in \{\gamma \ge 0 : 0 \in \partial H(\gamma)\}$, where

$$\partial \Psi_\gamma = \mathrm{Conv} \bigcup \left\{ \partial h_\gamma(l) \,\middle|\, \langle z_i^+, y_{cf_i} \rangle + h_\gamma(l) = \Psi_\gamma;\ l \in \{0, \dots, n\} \right\}$$

and $\partial H(\gamma) = \delta + \frac{1}{N} \sum_{i=1}^{N} \partial \Psi_\gamma$.

Proof. This follows from standard properties of convex functions and subgradients. First note that the function $h_\gamma$ is convex in γ since (for fixed l) it is the sum of a hyperbola, a constant, and a negative linear term. Hence $\Psi_\gamma$ is convex, since it is the pointwise max of a finite set of convex functions plus a constant. Using properties of subgradients, one can write $\partial \Psi_\gamma$ as above. Furthermore, $H(\gamma)$ is convex in γ since it is a linear term plus a sum of convex functions, so one can write $\gamma^* \in \{\gamma : 0 \in \partial H(\gamma)\}$, and it follows that $\partial H(\gamma) = \delta + \frac{1}{N} \sum_{i=1}^{N} \partial \Psi_\gamma$. Finally, we argue that $\gamma^* \ge 0$. For sufficiently small $\gamma > 0$ there exists $z < -\delta$ such that $z \in \partial \Psi_\gamma$, and for sufficiently large γ there exists $z > -\delta$ such that $z \in \partial \Psi_\gamma$. To elaborate, for sufficiently large γ, $\|y_{cf_i}\|_1 > 0 \implies l^* = \|y_{cf_i}\|_1 \implies K = 0 \implies \exists z > -\delta$ such that $z \in \partial \Psi_\gamma$. Likewise, for sufficiently large γ, $\|y_{cf_i}\|_1 = 0 \implies l^* = 0 \implies K = 0$, $\Psi_\gamma = 0$, and $0 = z > -\delta$ with $z \in \partial \Psi_\gamma$. Hence we deduce that $\partial H(\gamma)$ crosses zero as γ sweeps from 0 to ∞.

Theorem 1.

The primal problem P1 has solution $\left[ \gamma^* \delta + \frac{1}{N} \sum_{i=1}^{N} \Psi_{\gamma^*}(z_i^+, y_{cf_i}) \right]$, where $\gamma^* \in \{\gamma \ge 0 : 0 \in \partial H(\gamma)\}$ and $\Psi_{\gamma^*}(z_i^+, y_{cf_i}) = \langle z_i^+, y_{cf_i} \rangle + \bigvee_{l=0}^{n} h_{\gamma^*}(l)$ for

$$h_{\gamma^*}(l) := \left[ \frac{l}{4\gamma^*} + \left( \sum_{k=1}^{l} z_{ik}^+ - \sum_{k=1}^{\|y_{cf_i}\|_1} z_{ik}^+ \right) - \gamma^* S K \right].$$

Expressed in terms of the original FCA, this says

$$\sup_{P \in U_\delta(P_N)} \mathbb{E}_P\left[ \langle Z^+, Y_{CF} \rangle \right] = \mathbb{E}_{P_N}\left[ \langle Z^+, Y_{CF} \rangle \right] + \gamma^* \delta + \mathbb{E}_{P_N}\left[ \bigvee_{l=0}^{n} \left( \frac{l}{4\gamma^*} + \left( \sum_{k=1}^{l} Z_k^+ - \sum_{k=1}^{\|Y_{CF}\|_1} Z_k^+ \right) - \gamma^* S K \right) \right],$$

where the additional terms represent a penalty due to uncertainty in the probability distribution.

Proof. This follows by direct substitution of γ* as characterized in Proposition 2.2 into Proposition 2.1 and then the dual problem D1.

Proposition 3. We have $\Psi_\beta(z_i^-, y_{cf_i}) = \langle z_i^-, y_{cf_i} \rangle + h_i(\beta)$, where

$$h_i(\beta) = \left[ h_i(\beta, l^*) + \left( \sum_{k=1}^{l^*} z_{ik}^- - \sum_{k=1}^{\|y_{cf_i}\|_1} z_{ik}^- \right) - \beta S K \right] = \bigvee_{l=0}^{n} \left[ h_i(\beta, l) + \left( \sum_{k=1}^{l} z_{ik}^- - \sum_{k=1}^{\|y_{cf_i}\|_1} z_{ik}^- \right) - \beta S K \right].$$

Also $\|y_{cf_i}\|_1 \in \mathbb{Z}_+$ and $K = |l - \|y_{cf_i}\|_1| = \|w_2\|_1 \ge 0$, $K \in \mathbb{Z}_+$. Once $l^*$ is selected, $K := |l^* - \|y_{cf_i}\|_1| = \|w_2^*\|_1$. Continuing with the notational setup,

$$h_i(\beta, l) = \sum_{k=1}^{l} g_{ik}(\beta) = \sum_{k=1}^{l} \begin{cases} -z_{ik}^- - \beta (z_{ik}^-)^2, & -z_{ik}^- \le \frac{1}{2\beta}, \\ \frac{1}{4\beta}, & -z_{ik}^- > \frac{1}{2\beta}. \end{cases}$$

Furthermore, $l^*$ is determined as

$$l^* = \arg\max_{l \in \{0,\dots,n\}} \left[ h_i(\beta, l) + \sum_{k=1}^{l} z_{ik}^- - \beta S K \right].$$

Recall that $\bigvee_{l=0}^{n}$ denotes $\max_{l \in \{0,\dots,n\}}$.

Proof. The particular structure of $B_1^n$ and $B_2^n$ will be exploited to evaluate the sup above. The analysis proceeds by considering different cases for the optimal values $(w_1^*, w_2^*)$.

Case 1

Suppose $w_2^* = 0 \implies l = \|y_{cf_i}\|_1$. Then

$$\Psi_\beta(z_i^-, y_{cf_i}) = \langle z_i^-, y_{cf_i} \rangle + \sup_{w_1 \le -z_i^-} \left[ \langle w_1, y_{cf_i} \rangle - \beta \langle w_1, w_1 \rangle \right].$$

First look at the unconstrained problem,

$$\tilde{\Psi}_\beta(z_i^-, y_{cf_i}) = \langle z_i^-, y_{cf_i} \rangle + \sup_{w_1} \left[ \langle w_1, y_{cf_i} \rangle - \beta \langle w_1, w_1 \rangle \right].$$

Applying the Cauchy-Schwarz inequality gives

$$\tilde{\Psi}_\beta(z_i^-, y_{cf_i}) = \langle z_i^-, y_{cf_i} \rangle + \sup_{\|w_1\|_2} \left[ \|w_1\|_2 \|y_{cf_i}\|_2 - \beta \|w_1\|_2^2 \right].$$

Evaluating at the critical point $\|w_1^*\|_2 = \frac{\|y_{cf_i}\|_2}{2\beta} \in \mathbb{R}_+$ of the quadratic gives

$$\tilde{\Psi}_\beta(z_i^-, y_{cf_i}) = \langle z_i^-, y_{cf_i} \rangle + \frac{\|y_{cf_i}\|_2^2}{4\beta} = \langle z_i^-, y_{cf_i} \rangle + \frac{\|y_{cf_i}\|_1}{4\beta}.$$

Now let us return to the constrained problem $\Psi_\beta$. Observing that only the first $l = \|y_{cf_i}\|_1$ components of $w_1$ inside the sup are positive gives, for all $k \in \{1, \dots, l\}$,

$$\sup_{w_1 \le -z_i^-} \left[ \sum_{k=1}^{l} w_{1k} - \beta \langle w_1, w_1 \rangle \right] = \sum_{k=1}^{l} \left[ \sup_{w_{1k} \le -z_{ik}^-} w_{1k} - \beta w_{1k}^2 \right].$$

Deduce that $w_{1k}^* = \left[ -z_{ik}^- \wedge \frac{1}{2\beta} \right]$ for all $k \in \{1, \dots, l\}$. Therefore

$$\Psi_\beta(z_i^-, y_{cf_i}) = \langle z_i^-, y_{cf_i} \rangle + \sum_{k=1}^{l} \left[ -z_{ik}^- \wedge \frac{1}{2\beta} \right] - \beta \left[ -z_{ik}^- \wedge \frac{1}{2\beta} \right]^2.$$

Next, let us simplify

$$g_i(\beta) = \sum_{k=1}^{l} \left[ -z_{ik}^- \wedge \frac{1}{2\beta} \right] - \beta \left[ -z_{ik}^- \wedge \frac{1}{2\beta} \right]^2.$$

Considering the two cases, it follows that

$$g_i(\beta) = \sum_{k=1}^{l} g_{ik}(\beta) = \sum_{k=1}^{l} \begin{cases} -z_{ik}^- - \beta (z_{ik}^-)^2, & -z_{ik}^- \le \frac{1}{2\beta}, \\ \frac{1}{4\beta}, & -z_{ik}^- > \frac{1}{2\beta}. \end{cases}$$

Note that $g_i(\beta)$ is a convex function. In the degenerate case $-z_{ik}^- = 0$, we have $g_{ik}(\beta) = 0$ for all $\beta \ge 0$, where $g_{ik}$ denotes the kth term in the sum. Otherwise, $g_{ik}(\beta)$ is piecewise (a line of negative slope for part 1, a hyperbola for part 2) but still convex:

$$g_{ik}'(\beta) = \begin{cases} -(z_{ik}^-)^2, & -z_{ik}^- \le \frac{1}{2\beta}, \\ -\frac{1}{4\beta^2}, & -z_{ik}^- > \frac{1}{2\beta}. \end{cases}$$

Remarkably, these slopes are equal when $-z_{ik}^- = \frac{1}{2\beta}$; hence the convexity of $g_{ik}$, and thus of $g_i$, holds. Proceed to rewrite $\Psi_\beta$ as

$$\Psi_\beta(z_i^-, y_{cf_i}) = \langle z_i^-, y_{cf_i} \rangle + g_i(\beta).$$

Case 2

Now consider $w_2^* \neq 0 \implies l \neq \|y_{cf_i}\|_1$. Observe that for $l = \|w_2 + y_{cf_i}\|_1$,

$$\langle w_1 + z_i^-, w_2 + y_{cf_i} \rangle = \sum_{k=1}^{l} (w_{1k} + z_{ik}^-).$$

The structure of the finite set $B_2^n$ implies

$$\Psi_\beta(z_i^-, y_{cf_i}) = \sup_{w_1 \le -z_i^-,\ l \in \{0,\dots,n\},\ l \neq \|y_{cf_i}\|_1} \left[ \sum_{k=1}^{l} (w_{1k} + z_{ik}^-) - \beta \left( \langle w_1, w_1 \rangle + S K \right) \right].$$

Again, using that $B_2^n$ is a finite set, one can write

$$\Psi_\beta(z_i^-, y_{cf_i}) = \max_{l \in \{0,\dots,n\},\ l \neq \|y_{cf_i}\|_1} \ \sup_{w_1 \le -z_i^-} \left[ \sum_{k=1}^{l} (w_{1k} + z_{ik}^-) - \beta \left( \langle w_1, w_1 \rangle + S K \right) \right].$$

Observing that only the first $l$ components of $w_1$ inside the sup are positive gives, for all $k \in \{1, \dots, l\}$,

$$\sup_{w_1 \le -z_i^-} \left[ \sum_{k=1}^{l} w_{1k} - \beta \langle w_1, w_1 \rangle \right] = \sum_{k=1}^{l} \left[ \sup_{w_{1k} \le -z_{ik}^-} w_{1k} - \beta w_{1k}^2 \right].$$

Following the approach in Case 1 above, define

$$h_i(\beta, l) = \sum_{k=1}^{l} g_{ik}(\beta) = \sum_{k=1}^{l} \begin{cases} -z_{ik}^- - \beta (z_{ik}^-)^2, & -z_{ik}^- \le \frac{1}{2\beta}, \\ \frac{1}{4\beta}, & -z_{ik}^- > \frac{1}{2\beta}. \end{cases}$$

Furthermore, $l^*$ is determined as

$$l^* = \arg\max_{l \in \{0,\dots,n\},\ l \neq \|y_{cf_i}\|_1} \left[ h_i(\beta, l) + \sum_{k=1}^{l} z_{ik}^- - \beta S K \right].$$

Proceed to write $\Psi_\beta$ as

$$\Psi_\beta(z_i^-, y_{cf_i}) = \left[ h_i(\beta, l^*) + \sum_{k=1}^{l^*} z_{ik}^- - \beta S K \right].$$

This can be rewritten as

$$\Psi_\beta(z_i^-, y_{cf_i}) = \langle z_i^-, y_{cf_i} \rangle + \left[ h_i(\beta, l^*) + \left( \sum_{k=1}^{l^*} z_{ik}^- - \sum_{k=1}^{\|y_{cf_i}\|_1} z_{ik}^- \right) - \beta S K \right].$$

Introducing $h_i(\beta) := \left[ h_i(\beta, l^*) + \left( \sum_{k=1}^{l^*} z_{ik}^- - \sum_{k=1}^{\|y_{cf_i}\|_1} z_{ik}^- \right) - \beta S K \right]$, we have $\Psi_\beta(z_i^-, y_{cf_i}) = \langle z_i^-, y_{cf_i} \rangle + h_i(\beta)$.

Taking the max value of $\Psi_\beta$ over the cases $w_2^* = 0$ and $w_2^* \neq 0$ gives

$$\Psi_\beta(z_i^-, y_{cf_i}) = \langle z_i^-, y_{cf_i} \rangle + g_i(\beta) \vee h_i(\beta).$$

Inspection suggests that $\Psi_\beta$ can be simplified further. Observe that for $l^* = \|y_{cf_i}\|_1$, $h_i(\beta)$ evaluates to $g_i(\beta)$.
Let $l^*$ be determined as

$$l^* = \arg\max_{l \in \{0,\dots,n\}} \left[ h_i(\beta, l) + \sum_{k=1}^{l} z_{ik}^- - \beta S K \right]$$

and write $\Psi_\beta(z_i^-, y_{cf_i}) = \langle z_i^-, y_{cf_i} \rangle + h_i(\beta)$. Finally, note that an alternate expression for $h_i(\beta)$ is

$$h_i(\beta) = \bigvee_{l=0}^{n} \left[ h_i(\beta, l) + \left( \sum_{k=1}^{l} z_{ik}^- - \sum_{k=1}^{\|y_{cf_i}\|_1} z_{ik}^- \right) - \beta S K \right].$$

Proposition 4.

Let $\beta^* \in \{\beta \ge 0 : 0 \in \partial G(\beta)\} \cup \{\beta = 0 : 0 \notin \partial G(\beta)\}$, where $\partial \Psi_\beta = \partial h_i(\beta)$ and $\partial G(\beta) = \delta + \frac{1}{N} \sum_{i=1}^{N} \partial \Psi_\beta$, with

$$\partial h_i(\beta) = \mathrm{Conv} \bigcup \left\{ \partial h_i(\beta, l) - S K \,\middle|\, h_i(\beta) = h_i(\beta, l) + \left( \sum_{k=1}^{l} z_{ik}^- - \sum_{k=1}^{\|y_{cf_i}\|_1} z_{ik}^- \right) - \beta S K;\ l \in \{0, \dots, n\} \right\}.$$

Proof. This follows from standard properties of convex functions and subgradients. First note that the function $h_i(\beta)$ is convex in β, since it is the pointwise max of a finite set of convex functions plus a constant plus a negative linear term. Hence $\Psi_\beta$ is also convex. Using properties of subgradients, $\partial \Psi_\beta = \partial h_i(\beta)$ with $\partial h_i(\beta)$ as above. Continuing, $G(\beta)$ is convex since it is a linear term plus a sum of convex functions. It follows that $\partial G(\beta) = \delta + \frac{1}{N} \sum_{i=1}^{N} \partial \Psi_\beta$. Finally, we argue that $\beta^* \in \{\beta \ge 0 : 0 \in \partial G(\beta)\} \cup \{\beta = 0 : 0 \notin \partial G(\beta)\}$. Observe that when $\{\beta \ge 0 : 0 \in \partial G(\beta)\}$ is non-empty, it defines β* due to the convexity of $G(\beta)$. The claim is that when this set is empty, β* = 0. In the easier case, $0 < z_\beta$ for all $z_\beta \in \partial G(\beta)$ and all $\beta \ge 0$, it is clear that β* = 0 minimizes $G(\beta)$ over $\beta \ge 0$. It remains to show that the case $0 > z_\beta$ for all $z_\beta \in \partial G(\beta)$ and all $\beta \ge 0$ does not occur. To elaborate, for sufficiently large β, $\|y_{cf_i}\|_1 > 0 \implies l^* = \|y_{cf_i}\|_1 \implies K = 0 \implies \exists z_\beta > -\delta$ such that $z_\beta \in \partial \Psi_\beta$. Likewise, for sufficiently large β, $\|y_{cf_i}\|_1 = 0 \implies l^* = 0 \implies K = 0$, $\Psi_\beta = 0$, and $0 = z_\beta > -\delta$ with $z_\beta \in \partial \Psi_\beta$. Hence we deduce that there exists $z_\beta$ such that $0 < z_\beta + \delta \in \partial G(\beta)$, and by the continuity and convexity of $G(\beta)$ the claim holds.

Theorem 2.

Proof. This follows by direct substitution of β* as characterized in Proposition 2.4 into Proposition 2.3 and then the dual problem D2.

Proposition 5.

Suppose $w_2^* = 0 \implies l = \|y_{cf_i}\|_1$. Then

$$\Psi_\alpha(z_i, y_{cf_i}) = \langle z_i, y_{cf_i} \rangle + \sup_{w_1 \in \mathbb{R}^n} \left[ \langle w_1, y_{cf_i} \rangle - \alpha \langle w_1, w_1 \rangle \right].$$

Applying the Cauchy-Schwarz inequality gives

$$\Psi_\alpha(z_i, y_{cf_i}) = \langle z_i, y_{cf_i} \rangle + \sup_{\|w_1\|_2} \left[ \|w_1\|_2 \|y_{cf_i}\|_2 - \alpha \|w_1\|_2^2 \right].$$

Evaluating at the critical point $\|w_1^*\|_2 = \frac{\|y_{cf_i}\|_2}{2\alpha} \in \mathbb{R}_+$ of the quadratic gives

$$\Psi_\alpha(z_i, y_{cf_i}) = \langle z_i, y_{cf_i} \rangle + \frac{\|y_{cf_i}\|_2^2}{4\alpha} = \langle z_i, y_{cf_i} \rangle + \frac{\|y_{cf_i}\|_1}{4\alpha}.$$

Case 2

Now consider $w_2^* \neq 0 \implies l \neq \|y_{cf_i}\|_1$. Observe that for $l = \|w_2 + y_{cf_i}\|_1$,

$$\langle w_1 + z_i, w_2 + y_{cf_i} \rangle = \sum_{k=1}^{l} (w_{1k} + z_{ik}).$$

The structure of the finite set $B_2^n$ implies

$$\Psi_\alpha(z_i, y_{cf_i}) = \sup_{w_1 \in \mathbb{R}^n,\ l \in \{0,\dots,n\},\ l \neq \|y_{cf_i}\|_1} \left[ \sum_{k=1}^{l} (w_{1k} + z_{ik}) - \alpha \left( \langle w_1, w_1 \rangle + S K \right) \right].$$

Again, using that $B_2^n$ is a finite set, one can write

$$\Psi_\alpha(z_i, y_{cf_i}) = \max_{l \in \{0,\dots,n\},\ l \neq \|y_{cf_i}\|_1} \ \sup_{w_1 \in \mathbb{R}^n} \left[ \sum_{k=1}^{l} (w_{1k} + z_{ik}) - \alpha \left( \langle w_1, w_1 \rangle + S K \right) \right].$$

Observing that only the first $l$ components of $w_1$ inside the sup are positive gives, for all $k \in \{1, \dots, l\}$,

$$\sup_{w_1 \in \mathbb{R}^n} \left[ \sum_{k=1}^{l} w_{1k} - \alpha \langle w_1, w_1 \rangle \right] = l \times \sup_{w_{1k} \in \mathbb{R}} \left[ w_{1k} - \alpha w_{1k}^2 \right].$$

Evaluating at the critical point $w_{1k}^* = \frac{1}{2\alpha} \in \mathbb{R}_+$ of the above quadratic gives

$$\sup_{w_{1k} \in \mathbb{R}} \left[ w_{1k} - \alpha w_{1k}^2 \right] = \frac{1}{4\alpha}.$$

Therefore one can write

$$\Psi_\alpha(z_i, y_{cf_i}) = \max_{l \in \{0,\dots,n\},\ l \neq \|y_{cf_i}\|_1} \left[ \frac{l}{4\alpha} + \sum_{k=1}^{l} z_{ik} - \alpha S K \right].$$

Furthermore, $l^*$ is determined as

$$l^* = \arg\max_{l \in \{0,\dots,n\},\ l \neq \|y_{cf_i}\|_1} \left[ \frac{l}{4\alpha} + \sum_{k=1}^{l} z_{ik} - \alpha S K \right].$$

Substituting back into the expression for $\Psi_\alpha$ gives

$$\Psi_\alpha(z_i, y_{cf_i}) = \langle z_i, y_{cf_i} \rangle + \left[ \frac{l^*}{4\alpha} + \left( \sum_{k=1}^{l^*} z_{ik} - \sum_{k=1}^{\|y_{cf_i}\|_1} z_{ik} \right) - \alpha S K \right].$$

Finally, taking the max value of $\Psi_\alpha$ over the cases $w_2^* = 0$ and $w_2^* \neq 0$ gives

$$\Psi_\alpha(z_i, y_{cf_i}) = \langle z_i, y_{cf_i} \rangle + \left[ \frac{\|y_{cf_i}\|_1}{4\alpha} \right] \vee \left[ \frac{l^*}{4\alpha} + \left( \sum_{k=1}^{l^*} z_{ik} - \sum_{k=1}^{\|y_{cf_i}\|_1} z_{ik} \right) - \alpha S K \right].$$

Observe that for $l^* = \|y_{cf_i}\|_1$, the last term in brackets above evaluates to $\left[ \frac{\|y_{cf_i}\|_1}{4\alpha} \right]$.
Let $l^*$ be determined as

$$l^* = \arg\max_{l \in \{0,\dots,n\}} \left[ \frac{l}{4\alpha} + \sum_{k=1}^{l} z_{ik} - \alpha S K \right]$$

and write

$$\Psi_\alpha(z_i, y_{cf_i}) = \langle z_i, y_{cf_i} \rangle + \left[ \frac{l^*}{4\alpha} + \left( \sum_{k=1}^{l^*} z_{ik} - \sum_{k=1}^{\|y_{cf_i}\|_1} z_{ik} \right) - \alpha S K \right].$$

Alternatively, one can write

$$\Psi_\alpha(z_i, y_{cf_i}) = \langle z_i, y_{cf_i} \rangle + \bigvee_{l=0}^{n} \left[ \frac{l}{4\alpha} + \left( \sum_{k=1}^{l} z_{ik} - \sum_{k=1}^{\|y_{cf_i}\|_1} z_{ik} \right) - \alpha S K \right].$$

Proposition 6.

Let $\alpha^* \in \{\alpha \ge 0 : 0 \in \partial F(\alpha)\}$, where

$$\partial \Psi_\alpha = \mathrm{Conv} \bigcup \left\{ \partial h_\alpha(l) \,\middle|\, \langle z_i, y_{cf_i} \rangle + h_\alpha(l) = \Psi_\alpha;\ l \in \{0, \dots, n\} \right\}$$

and $\partial F(\alpha) = \delta + \frac{1}{N} \sum_{i=1}^{N} \partial \Psi_\alpha$.

Proof. This follows from standard properties of convex functions and subgradients. First note that the function $h_\alpha$ is convex in α since (for fixed l) it is the sum of a hyperbola, a constant, and a negative linear term. Hence $\Psi_\alpha$ is convex, since it is the pointwise max of a finite set of convex functions plus a constant. Using properties of subgradients, one can write $\partial \Psi_\alpha$ as above. Furthermore, $F(\alpha)$ is convex in α since it is a linear term plus a sum of convex functions, so one can write $\alpha^* \in \{\alpha : 0 \in \partial F(\alpha)\}$, and it follows that $\partial F(\alpha) = \delta + \frac{1}{N} \sum_{i=1}^{N} \partial \Psi_\alpha$. Finally, we argue that $\alpha^* \ge 0$. For sufficiently small $\alpha > 0$ there exists $z < -\delta$ such that $z \in \partial \Psi_\alpha$, and for sufficiently large α there exists $z > -\delta$ such that $z \in \partial \Psi_\alpha$. To elaborate, for sufficiently large α, $\|y_{cf_i}\|_1 > 0 \implies l^* = \|y_{cf_i}\|_1 \implies K = 0 \implies \exists z > -\delta$ such that $z \in \partial \Psi_\alpha$. Likewise, for sufficiently large α, $\|y_{cf_i}\|_1 = 0 \implies l^* = 0 \implies K = 0$, $\Psi_\alpha = 0$, and $0 = z > -\delta$ with $z \in \partial \Psi_\alpha$. Hence we deduce that $\partial F(\alpha)$ crosses the origin as α sweeps from 0 to ∞.

Theorem 3.