Deep xVA solver -- A neural network based counterparty credit risk management framework

Abstract

In this paper, we present a novel computational framework for portfolio-wide risk management problems, where the presence of a potentially large number of risk factors makes traditional numerical techniques ineffective. The new method utilises a coupled system of BSDEs for the valuation adjustments (xVA) and solves these by a recursive application of a neural network based BSDE solver. This not only makes the computation of xVA for high-dimensional problems feasible, but also produces hedge ratios and dynamic risk measures for xVA, and allows simulations of the collateral account.


DEEP XVA SOLVER – A NEURAL NETWORK BASED COUNTERPARTY CREDIT RISK MANAGEMENT FRAMEWORK

ALESSANDRO GNOATTO, ATHENA PICARELLI, AND CHRISTOPH REISINGER

May 7, 2020

1. Introduction

As a consequence of the 2007–2009 financial crisis, academics and practitioners have been redefining and augmenting key concepts of risk management. This made it necessary to reconsider many widely used methodologies in quantitative and computational finance.

As an example, it is now generally accepted that a reliable valuation of a financial product should account for the possibility of default of any agent involved in the transaction. Moreover, the trading activity is nowadays funded by resorting to different sources of liquidity. This results in the interest rate multi-curve phenomenon: in such a framework, the existence of a unique funding stream with a unique risk-free interest rate no longer represents a realistic assumption. The increasingly important role of collateral agreements demands a portfolio-wide view on valuation.

The aforementioned stylized facts are incorporated at the level of valuation equations by introducing value adjustments (xVA). Value adjustments are further terms to be added to, or subtracted from, an idealized reference portfolio value, computed in the absence of frictions, in order to obtain the final value of the transaction.

The literature on counterparty credit risk and funding is large and we only attempt to provide insights on the main references as they relate to our work. Possibly the first contribution on the subject is a model for credit risk asymmetry in swap contracts in Duffie and Huang (1996). Before the 2007–2009 financial crisis, we have the works of Brigo and Masetti (2005) and Cherubini (2005), where the concept of credit valuation adjustment (CVA) is analyzed. The possibility of default of both counterparties involved in the transaction, represented by the introduction of the debt valuation adjustment (DVA), is investigated, among others, in Brigo et al. (2011) and Brigo et al. (2014).

Another important source of concern for practitioners apart from default risk is represented by funding costs.
A parallel stream of literature emerged during and after the financial crisis to generalize valuation equations to include the presence of collateralization agreements. In a Black-Scholes economy, Piterbarg (2010) provides valuation formulas both in the collateralized and uncollateralized case. Generalizations to the case of a multi-currency economy can be found in Piterbarg (2012), Fujii et al. (2010) and Fujii et al. (2011). The funding valuation adjustment (FVA) is derived under alternative assumptions on the Credit Support Annex (CSA) in Pallavicini et al., while Brigo and Pallavicini (2014) also discusses the role of central counterparties in terms of funding costs. A general approach to funding in a semimartingale setting is provided by Bielecki and Rutkowski (2015).

Funding and default risk need to be united in a single risk management framework to account for all possible frictions and their interplay. Contributions in this sense can be found in Brigo et al. (2018) by means of the so-called discounting approach. In a series of papers, Burgard and Kjaer generalize the classical Black-Scholes replication approach to include some of the aforementioned effects, see Burgard and Kjaer (2011b) and Burgard and Kjaer (2011a). A more general backward stochastic differential equation (BSDE) approach is provided by Crépey (2015a), Crépey (2015b), Bichuch et al. (2018a) and Bichuch et al. (2018b). The equivalence between the discounting approach of Brigo and co-authors and the BSDE-based replication approaches is demonstrated in Brigo et al. (2018).

The importance of the topic is reflected in the increasing number of monographs on the subject, with Brigo et al. (2013) an example of an early work. An advanced BSDE-based treatment is presented by Crépey et al. (2014), while detailed analyses of how to construct large hybrid models for counterparty risk simulations are provided in Green (2015), Lichters et al. (2015) and Sokol (2014). Finally, Gregory (2015) provides an accessible introduction to wide-ranging aspects of the topic.

A common fundamental feature of such generalized risk management frameworks is the necessity to adopt a portfolio-wide point of view in order to properly account for risk mitigation benefits arising from diversified positions.

JEL Classification: E43, G12.

Key words and phrases: CVA, DVA, FVA, ColVA, xVA, EPE, Collateral, xVA hedging, Deep BSDE Solver.

arXiv subject class: q-fin.MF.
Adopting such portfolio-wide models, as is the present market practice in financial institutions, involves high-dimensional joint simulations of all positions within a portfolio. Commonly used numerical techniques (see for instance Shöftner (2008); Karlsson et al. (2016); Broadie et al. (2015); Joshi and Kwon (2016)) make use of regression approaches, based on a modification of the Least-Squares Monte Carlo approach in Longstaff and Schwartz (2001), to alleviate the high computational cost of fully nested Monte Carlo simulations such as those initially proposed in Gordy and Juneja (2010); Broadie et al. (2011). For an application of adjoint algorithmic differentiation (AAD) to xVA simulation by regression see, for instance, Capriotti et al. (2017); Fries (2019).

An alternative, hybrid, approach to counterparty risk computations is taken in de Graaf et al. (2014), where standard pricing methods are applied to the products in the portfolio and outer Monte Carlo estimators are applied for exposures. Techniques based purely on PDEs generally suffer from the so-called curse of dimensionality, a rapid increase of computational cost in the presence of high-dimensional problems. A PDE approach with factor-based dimension reduction has been proposed in de Graaf et al. (2018). Observe that in the presence of collateral, a PDE representation for CVA and DVA is not always available.

Presently, in high dimension (i.e., on portfolio level), there are no well established computational methods that have achieved satisfactory results. However, as explained above, such situations are of great practical relevance.

Modern risk-management frameworks demand the use of advanced parallel programming techniques at the heart of software running on high-performance hardware infrastructures.
In recent years, thanks to developments such as the introduction of advanced graphics processing units (GPUs), allowing for massive parallel computations, machine learning techniques have witnessed an increasing popularity in different domains. Such methods prove themselves particularly appealing in the context of high-dimensional problems involving large amounts of data.

Of particular interest is the concept of an artificial neural network (ANN). From a mathematical perspective, ANNs are multiple nested compositions of relatively simple multivariate functions. The term deep neural networks refers to ANNs with several interconnected layers. One remarkable property of ANNs is given in the “Universal Approximation Theorem”, which has been proven in different versions, starting from the remarkable insight of Kolmogorov’s Representation Theorem, Kolmogorov (1956), and the seminal works of Cybenko (1989) and Hornik (1991). In a nutshell, this result states that any continuous function in any dimension can be represented to arbitrary accuracy by means of an ANN. In this context, and building heavily on earlier work of Jentzen et al. (2018), the recent results by Reisinger and Zhang (2019) have proved that deep ANNs can overcome the curse of dimensionality for approximating (nonsmooth) solutions of partial differential equations arising from (open-loop control of) SDEs. A result to the same effect has been shown for heat equations with a zero-order nonlinearity in Hutzenthaler et al. (2018). This is potentially useful in the context of risk management as simple models for CVA can be expressed in this form. For a recent literature survey of applications of neural networks to pricing, hedging and risk management problems more generally we refer the reader to Ruf and Wang (2019).

In this paper, we investigate the application of ANNs to solve high-dimensional BSDEs arising from risk management problems. Indeed, in the classical continuous-time mathematical finance literature the random behavior of the simple financial assets composing a portfolio is typically described by means of multi-dimensional Brownian motions and forward stochastic differential equations (SDEs). In this setting, BSDEs naturally arise as a representation of the evolution of the hedging portfolio, where the terminal condition represents the target payoff (see, e.g., El Karoui et al. (1997)). In essence, (numerically) solving a BSDE is equivalent to identifying a risk management strategy.

More precisely, we will consider a discretized version of the BSDE and parametrize the (high-dimensional) control (i.e., hedging) process at every point in time by means of a family of ANNs.
Once written in this form, BSDEs can be viewed as model-based reinforcement learning problems. The ANN parameters are then fitted so as to minimize a prescribed loss function. Mathematically, this involves an optimization step over a very large number of variables which typically requires the use of stochastic gradient descent-type algorithms.

The line of computational methods we follow has been initiated in the context of high-dimensional nonlinear PDEs, in E et al. (2017) and further investigated in Henry-Labordere (2017) and Fujii et al. (2019), and has led to the so-called Deep BSDE Solver. By way of financial applications, and xVA specifically, a primal-dual extension to the Deep BSDE Solver has been developed in Henry-Labordere (2017) and tested on stylised CVA- and IM (Initial Margin)-type PDEs; the Deep BSDE Solver has also been applied specifically to exposure computations for a Bermudan swaption and a cross-currency swap in She and Grecu (2017).

Our approach goes beyond these earlier works in the following regards: we
• formulate a rigorous, generic BSDE model for the dynamics of xVA, including CVA, DVA, FVA and ColVA (collateral valuation adjustment), for a derivative portfolio;
• provide algorithms for the computation of ‘non-recursive’ xVAs (such as CVA and DVA) and ‘recursive’ xVAs by (recursive) application of a Deep BSDE Solver;
• show how the method can be used for the simulation of xVA sensitivities and collateral.

We will refer to our method as Deep xVA Solver. More recently, an xVA strategy based on deep learning regression has been proposed in Crépey et al. (2019); Albanese et al. (2020), exploiting the numerical approach to BSDEs presented in Huré et al. (2020). Different from E et al. (2017), this solver approximates the value function, not the control, by means of an ANN and reconstructs it at each time step by dynamic programming techniques.
A comparison of the performance and robustness of the two approaches will require comprehensive testing in industry-relevant settings. We see as a structural advantage of our algorithm that it directly computes the xVA hedging strategy.

The paper is organized as follows. The financial framework is established in Section 2. In Section 3, after shortly recalling the main features of the deep BSDE solver presented in E et al. (2017), the algorithm for xVA computation is introduced. Numerical results for a selection of test cases are shown in Section 4, while Section 5 concludes.

2. The financial market

We fix a time horizon $T < \infty$ for the trading activity of two agents named the bank (B) and the counterparty (C). Unless otherwise stated, throughout the paper we assume the bank’s perspective and refer to the bank as the hedger. All underlying processes are modeled over a probability space $(\Omega, \mathcal{G}, \mathbb{G}, \mathbb{Q})$, where $\mathbb{G} = (\mathcal{G}_t)_{t \in [0,T]} \subseteq \mathcal{G}$ is a filtration satisfying the usual assumptions ($\mathcal{G}_0$ is assumed to be trivial). We denote by $\tau^B$ and $\tau^C$ the time of default of the bank and the counterparty, respectively. Specifically, we assume that $\mathbb{G} = \mathbb{F} \vee \mathbb{H}$, where $\mathbb{F} = (\mathcal{F}_t)_{t \in [0,T]}$ is a reference filtration satisfying the usual assumptions and $\mathbb{H} = \mathbb{H}^B \vee \mathbb{H}^C$, with $\mathbb{H}^j = (\mathcal{H}^j_t)_{t \in [0,T]}$ for $\mathcal{H}^j_t = \sigma(H^j_u \mid u \le t)$, and $H^j_t := 1_{\{\tau^j \le t\}}$, $j \in \{B, C\}$. We set
$$\tau = \tau^C \wedge \tau^B. \tag{2.1}$$
In the present paper we will extensively make use of the so-called Immersion Hypothesis (see, e.g., Bielecki and Rutkowski (2004)).

Hypothesis 2.1. Any local $(\mathbb{F}, \mathbb{Q})$-martingale is a local $(\mathbb{G}, \mathbb{Q})$-martingale.

We consider the following spaces:
• $L^2(\mathbb{R}^d)$ is the space of all $\mathcal{F}_T$-measurable $\mathbb{R}^d$-valued random variables $X : \Omega \mapsto \mathbb{R}^d$ such that $\|X\|^2 = \mathbb{E}\left[|X|^2\right] < \infty$.
• $\mathbb{H}^{2,q \times d}$ is the space of all predictable $\mathbb{R}^{q \times d}$-valued processes $\phi : \Omega \times [0,T] \mapsto \mathbb{R}^{q \times d}$ such that $\mathbb{E}\left[\int_0^T |\phi_t|^2 \, dt\right] < \infty$.
• $\mathbb{S}^2$ is the space of all adapted processes $\phi : \Omega \times [0,T] \mapsto \mathbb{R}^{q \times d}$ such that $\mathbb{E}\left[\sup_{0 \le t \le T} |\phi_t|^2\right] < \infty$.

2.1. Basic traded assets.

Risky assets.

For $d \ge 1$, we denote by $S^i$, $i = 1, \ldots, d$, the ex-dividend price (i.e. the price) of risky securities. All $S^i$ are assumed to be càdlàg $\mathbb{F}$-semimartingales.

Let $W^{\mathbb{Q}} = (W^{\mathbb{Q}}_t)_{t \in [0,T]}$ be a $d$-dimensional $(\mathbb{F}, \mathbb{Q})$-Brownian motion (hence a $(\mathbb{G}, \mathbb{Q})$-Brownian motion, thanks to Hypothesis 2.1). We introduce the following coefficient functions:
$$\mu : \mathbb{R}_+ \times \mathbb{R}^d \mapsto \mathbb{R}^d, \qquad \sigma : \mathbb{R}_+ \times \mathbb{R}^d \mapsto \mathbb{R}^{d \times d}, \tag{2.2}$$
which are assumed to satisfy standard conditions ensuring existence and uniqueness of strong solutions of SDEs driven by the Brownian motion $W^{\mathbb{Q}}$. The matrix process $\sigma$ is assumed to be invertible at every point in time. We assume that
$$\begin{cases} dS_t = \mu(t, S_t)\,dt + \sigma(t, S_t)\,dW^{\mathbb{Q}}_t, \\ S_0 = s_0 \in \mathbb{R}^d, \end{cases} \tag{2.3}$$
on $[0, T]$. Note that we are not postulating that the processes $S^i$ are positive.

Throughout the paper we assume that the market is complete for the sake of simplicity.
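As an illustration of the forward dynamics (2.3), the following is a minimal NumPy sketch of an Euler-Maruyama simulation. The function name `simulate_forward`, the array layout and the constant-coefficient example are assumptions made here for illustration; they are not taken from the paper.

```python
import numpy as np

def simulate_forward(mu, sigma, s0, T, n_steps, n_paths, rng):
    """Euler-Maruyama paths of dS = mu(t, S) dt + sigma(t, S) dW on [0, T].

    mu(t, S) must return an array of shape (n_paths, d),
    sigma(t, S) an array of shape (n_paths, d, d).
    """
    d = len(s0)
    dt = T / n_steps
    S = np.empty((n_steps + 1, n_paths, d))
    S[0] = s0
    # Brownian increments with mean 0 and variance dt
    dW = rng.normal(0.0, np.sqrt(dt), size=(n_steps, n_paths, d))
    for n in range(n_steps):
        t = n * dt
        # path-wise matrix-vector product sigma(t, S_n) dW_n
        diffusion = np.einsum("pij,pj->pi", sigma(t, S[n]), dW[n])
        S[n + 1] = S[n] + mu(t, S[n]) * dt + diffusion
    return S, dW

# Hypothetical example: d = 3 assets, zero drift, constant diagonal volatility
d, vol = 3, 0.2
mu = lambda t, S: np.zeros_like(S)
sigma = lambda t, S: np.broadcast_to(vol * np.eye(d), (S.shape[0], d, d))
rng = np.random.default_rng(0)
S, dW = simulate_forward(mu, sigma, np.full(d, 100.0), T=1.0,
                         n_steps=50, n_paths=1000, rng=rng)
```

With these arithmetic (Bachelier-type) coefficients the simulated prices may become negative, which is consistent with the remark that positivity of the $S^i$ is not postulated.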

Cash accounts.

Given a stochastic return process $x := (x_t)_{t \ge 0}$, which is assumed bounded from below, right-continuous and $\mathbb{F}$-adapted, we define the cash account $B^x$ with unitary value at time 0 as the strictly positive continuous process of finite variation
$$B^x_t := \exp\left\{\int_0^t x_s \, ds\right\}, \quad t \in [0, T]. \tag{2.4}$$
In particular, $B^x := (B^x_t)_{t \in [0,T]}$ is also continuous and adapted.

Defaultable bonds.

Default times are assumed to be exponentially distributed random variables with time-dependent intensity
$$\Gamma^j_t = \int_0^t \lambda^j_s \, ds, \quad j \in \{B, C\}, \quad t \in [0, T],$$
where $\lambda^j$ are non-negative measurable bounded deterministic functions such that
$$\int_0^T \lambda^j_s \, ds < \infty, \quad \forall t \ge 0, \quad j \in \{B, C\}.$$
We introduce two risky bonds with maturity $T^\star \le T$ and rate of return $r^j + \lambda^j$, issued by the bank and the counterparty, with dynamics
$$dP^j_t = \left(r^j_t + \lambda^j_t\right) P^j_t \, dt - P^j_{t-} \, dH^j_t, \qquad P^j_0 = e^{-\int_0^{T^\star \wedge \tau^j} (r^j_u + \lambda^j_u)\, du}, \quad j \in \{B, C\}. \tag{2.5}$$

2.2. xVA framework.

We consider a family of contingent claims within a portfolio with agreed dividend stream $A^m = (A^m_t)_{t \in [0,T]}$, $m = 1, \ldots, M$, and set $\bar{A}^m_t := 1_{\{t < \tau\}} A^m_t + 1_{\{t \ge \tau\}} A^m_{\tau-}$. The values of the single claims within the portfolio, ignoring any counterparty risk or funding issue, that we refer to as clean values, are denoted by $(\hat{V}^m_t)_{m = 1, \ldots, M}$ and satisfy the following FBSDEs, for $m = 1, \ldots, M$,
$$\begin{cases} -d\hat{V}^m_t = dA^m_t - r_t \hat{V}^m_t \, dt - \sum_{k=1}^d \hat{Z}^{k,m}_t \, dW^{k,\mathbb{Q}}_t, \\ \hat{V}^m_{T_m} = 0, \end{cases} \tag{2.6}$$
which reads, in integral form,
$$\hat{V}^m_t := \mathbb{E}^{\mathbb{Q}}\left[ B^r_t \int_{(t, T_m]} \frac{dA^m_u}{B^r_u} \,\middle|\, \mathcal{F}_t \right], \quad t \in [0, T_m], \tag{2.7}$$
where $r$ is a collateral rate in an idealized perfect collateral agreement. For simplicity, we restrict ourselves to European-type contracts and write $A^m_t = 1_{\{t = T_m\}} g_m(S_{T_m})$, $T_m \le T$, for a family of Lipschitz functions $g_m$, $m = 1, \ldots, M$. Then, equation (2.6) reads
$$\begin{cases} -d\hat{V}^m_t = -r_t \hat{V}^m_t \, dt - \sum_{k=1}^d \hat{Z}^{k,m}_t \, dW^{k,\mathbb{Q}}_t, \\ \hat{V}^m_{T_m} = g_m(S_{T_m}). \end{cases} \tag{2.8}$$
Observe that the system (2.3) and (2.8) is decoupled, in the sense that the forward equation (2.3) does not exhibit a dependence on the backward component.

We follow here the framework in Biagini et al. (2019), where the portfolio dynamics are stated in the form of a BSDE under the enlarged filtration $\mathbb{G}$.
We set
$$Z^k_t := \sum_{i=1}^d \xi^i_t \sigma^{i,k}(t, S_t), \quad k = 1, \ldots, d, \tag{2.9a}$$
$$U^j_t := -\xi^j_t P^j_{t-}, \quad j \in \{B, C\}, \tag{2.9b}$$
$$f(t, V, C) := -\left[ (r^{f,l}_t - r_t)(V_t - C_t)^+ - (r^{f,b}_t - r_t)(V_t - C_t)^- + (r^{c,l}_t - r_t) C^+_t - (r^{c,b}_t - r_t) C^-_t \right], \tag{2.9c}$$
where
• $\xi^i$, $i = 1, \ldots, d$, are the positions in risky assets, while $\xi^B$, $\xi^C$ are the positions in the bank and counterparty bond, respectively;
• $r^{f,l}$, $r^{f,b}$ represent unsecured funding lending and borrowing rates;
• $r^{c,l}$, $r^{c,b}$ denote the interest on posted and received variation margin (collateral);
• $C^+$ and $C^-$ represent the posted and received variation margin/collateral.

All above processes are assumed to satisfy suitable regularity conditions ensuring existence and uniqueness for a solution to BSDE (2.10) below. Both posted and received collateral are assumed to be Lipschitz functions of the clean value of the derivative portfolio.

We denote by $V$ the full contract value, i.e. the portfolio value including counterparty risk and multiple curves. The $\mathbb{G}$-BSDE for the portfolio’s dynamics then has the form, on $\{\tau > t\}$,
$$\begin{cases} -dV_t = \sum_{m=1}^M d\bar{A}^m_t + \left(f(t, V, C) - r_t V_t\right) dt - \sum_{k=1}^d Z^k_t \, dW^{k,\mathbb{Q}}_t - \sum_{j \in \{B, C\}} U^j_t \, dM^{j,\mathbb{Q}}_t, \\ V_\tau = \theta_\tau(\hat{V}, C), \end{cases} \tag{2.10}$$
where
$$\theta_\tau(\hat{V}, C) := \hat{V}_\tau + 1_{\{\tau^C < \tau^B\}} (1 - R^C)\left(\hat{V}_\tau - C_{\tau-}\right)^- - 1_{\{\tau^B < \tau^C\}} (1 - R^B)\left(\hat{V}_\tau - C_{\tau-}\right)^+,$$
$\hat{V}_t := \sum_{m=1}^M \hat{V}^m_t$, and $R^B$, $R^C$ are two positive constants representing the recovery rate of the bank and the counterparty, respectively.

In their Theorem 3.15, Biagini et al. (2019) show that there exists a unique solution $(V, Z, U)$ for the $\mathbb{G}$-BSDE (2.10), and the process $V$ assumes the following form on $\{\tau > t\}$:
$$V_t = B^r_t \, \mathbb{E}^{\mathbb{Q}}\left[ \sum_{m=1}^M \int_{(t, \tau \wedge T]} \frac{d\bar{A}^m_u}{B^r_u} + \int_t^{\tau \wedge T} \frac{f(u, V, C)}{B^r_u} \, du + 1_{\{\tau \le T\}} \frac{\theta_\tau(\hat{V}, C)}{B^r_\tau} \,\middle|\, \mathcal{G}_t \right]. \tag{2.11}$$
To prove existence and uniqueness for the $\mathbb{G}$-BSDE, Biagini et al. (2019) employ the technique introduced by Crépey (2015a) and reformulate the problem under the reduced filtration $\mathbb{F}$. Stated in such a form, the problem is also more amenable to numerical computations, especially in the case where $f$ and $\theta$ do not depend on $V$, either explicitly, or implicitly through $C$.

We consider the following $\mathbb{F}$-BSDE on $[0, T]$:
$$\begin{cases} -d\mathrm{XVA}_t = \bar{f}(\hat{V}_t, \mathrm{XVA}_t) \, dt - \sum_{k=1}^d \bar{Z}^k_t \, dW^{k,\mathbb{Q}}_t, \\ \mathrm{XVA}_T = 0, \end{cases} \tag{2.12}$$
where
$$\begin{aligned} \bar{f}(\hat{V}_t, \mathrm{XVA}_t) := {} & -(1 - R^C)\left(\hat{V}_t - C_t\right)^- \lambda^{C,\mathbb{Q}}_t + (1 - R^B)\left(\hat{V}_t - C_t\right)^+ \lambda^{B,\mathbb{Q}}_t \\ & + (r^{f,l}_t - r_t)\left(\hat{V}_t - \mathrm{XVA}_t - C_t\right)^+ - (r^{f,b}_t - r_t)\left(\hat{V}_t - \mathrm{XVA}_t - C_t\right)^- \\ & + (r^{c,l}_t - r_t) C^+_t - (r^{c,b}_t - r_t) C^-_t - (r_t + \lambda^{C,\mathbb{Q}}_t + \lambda^{B,\mathbb{Q}}_t)\,\mathrm{XVA}_t. \end{aligned} \tag{2.13}$$
By standard results on BSDEs, see e.g. Delong (2017, Theorem 4.1.3, Theorem 3.1.1), the existence and uniqueness of solutions $(\hat{V}^m, \hat{Z}^m) \in \mathbb{S}^2(\mathbb{R}) \times \mathbb{H}^{2,q \times 1}$, for $m = 1, \ldots, M$, and $(\mathrm{XVA}, \bar{Z}) \in \mathbb{S}^2(\mathbb{R}) \times \mathbb{H}^{2,q \times 1}$ to, respectively, (2.8) and (2.12), holds under the following conditions: $r^{f,l}$, $r^{f,b}$, $r^{c,l}$, $r^{c,b}$, $r$ are bounded processes;
$$|\mu(t, x) - \mu(t, x')| + |\sigma(t, x) - \sigma(t, x')| \le C |x - x'|, \qquad |\sigma(t, x)| + |\mu(t, x)| \le C (1 + |x|),$$
for any $t \in [0, T]$, $x, x' \in \mathbb{R}^d$, for some constant $C \ge 0$.

The solution XVA of (2.12) is the pre-default xVA process. Indeed, given the pre-default value process $\mathcal{V}$ such that $\mathcal{V}_t 1_{\{\tau > t\}} = V_t 1_{\{\tau > t\}}$, on $\{\tau > t\}$ the solution to (2.10) can be represented as $\mathcal{V}_t = \hat{V}_t - \mathrm{XVA}_t$.
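To make the structure of the driver (2.13) concrete, here is a small sketch that evaluates $\bar{f}$ term by term. It assumes the conventions $x^+ = \max(x, 0)$ and $x^- = \max(-x, 0)$ and treats $C^+$, $C^-$ as the positive and negative parts of a single collateral value; the function name `xva_driver` and its argument names are hypothetical.

```python
import numpy as np

def xva_driver(v_hat, xva, c, r, r_fl, r_fb, r_cl, r_cb, lam_B, lam_C, R_B, R_C):
    """Evaluate the driver (2.13) for scalar or array-valued inputs.

    Assumed conventions: x^+ = max(x, 0), x^- = max(-x, 0),
    and C^+, C^- are the positive and negative parts of c.
    """
    pos = lambda x: np.maximum(x, 0.0)
    neg = lambda x: np.maximum(-x, 0.0)
    exposure = v_hat - c          # argument of the CVA/DVA terms
    funding = v_hat - xva - c     # argument of the (recursive) FVA terms
    return (
        -(1.0 - R_C) * neg(exposure) * lam_C
        + (1.0 - R_B) * pos(exposure) * lam_B
        + (r_fl - r) * pos(funding)
        - (r_fb - r) * neg(funding)
        + (r_cl - r) * pos(c)
        - (r_cb - r) * neg(c)
        - (r + lam_C + lam_B) * xva
    )

# Hypothetical check: all rates equal to r, no collateral, XVA = 0,
# so that only the counterparty default term survives.
f_val = xva_driver(v_hat=-10.0, xva=0.0, c=0.0, r=0.01, r_fl=0.01, r_fb=0.01,
                   r_cl=0.01, r_cb=0.01, lam_B=0.0, lam_C=0.02, R_B=0.4, R_C=0.4)
```

When the funding rates coincide with $r$, the driver no longer depends on XVA through the funding terms, which is the ‘non-recursive’ situation discussed later in the text.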
Moreover, defining the process $\tilde{r} = (\tilde{r}_t)_{t \in [0,T]}$ as $\tilde{r} := r + \lambda^{C,\mathbb{Q}} + \lambda^{B,\mathbb{Q}}$, it has been shown in Biagini et al. (2019, Corollary 3.31) that the process XVA admits the representation
$$\mathrm{XVA}_t = -\mathrm{CVA}_t + \mathrm{DVA}_t + \mathrm{FVA}_t + \mathrm{ColVA}_t, \tag{2.14}$$
where
$$\mathrm{CVA}_t := B^{\tilde{r}}_t \, \mathbb{E}^{\mathbb{Q}}\left[ (1 - R^C) \int_t^T \frac{1}{B^{\tilde{r}}_u} \left(\hat{V}_u - C_u\right)^- \lambda^{C,\mathbb{Q}}_u \, du \,\middle|\, \mathcal{F}_t \right], \tag{2.15}$$
$$\mathrm{DVA}_t := B^{\tilde{r}}_t \, \mathbb{E}^{\mathbb{Q}}\left[ (1 - R^B) \int_t^T \frac{1}{B^{\tilde{r}}_u} \left(\hat{V}_u - C_u\right)^+ \lambda^{B,\mathbb{Q}}_u \, du \,\middle|\, \mathcal{F}_t \right], \tag{2.16}$$
$$\begin{aligned} \mathrm{FVA}_t := {} & B^{\tilde{r}}_t \, \mathbb{E}^{\mathbb{Q}}\left[ \int_t^T \frac{(r^{f,l}_u - r_u)\left(\hat{V}_u - \mathrm{XVA}_u - C_u\right)^+}{B^{\tilde{r}}_u} \, du \,\middle|\, \mathcal{F}_t \right] \\ & - B^{\tilde{r}}_t \, \mathbb{E}^{\mathbb{Q}}\left[ \int_t^T \frac{(r^{f,b}_u - r_u)\left(\hat{V}_u - \mathrm{XVA}_u - C_u\right)^-}{B^{\tilde{r}}_u} \, du \,\middle|\, \mathcal{F}_t \right], \end{aligned} \tag{2.17}$$
$$\mathrm{ColVA}_t := B^{\tilde{r}}_t \, \mathbb{E}^{\mathbb{Q}}\left[ \int_t^T \frac{(r^{c,l}_u - r_u) C^+_u - (r^{c,b}_u - r_u) C^-_u}{B^{\tilde{r}}_u} \, du \,\middle|\, \mathcal{F}_t \right]. \tag{2.18}$$
This representation highlights that the inclusion of different borrowing and lending rates introduces a non-zero funding adjustment which cannot be found independently of the other adjustments. An algorithm to compute all valuation adjustments systematically in the ‘non-recursive’ and ‘recursive’ setting, especially with the view of potentially large portfolios, is the focus of the next sections.

3. The algorithm

In this section, we describe the algorithm for the computation of valuation adjustments by neural network approximations to the BSDE introduced in the previous section. We start by briefly recalling the main features of the deep BSDE solver in E et al. (2017). Then, we present the application of the solver to valuation adjustments and its extensions to obtain financially important quantities. We first focus on non-recursive adjustments, namely CVA and DVA, and then extend the approach to the recursive case. In particular, we propose to use the deep BSDE solver in E et al. (2017) to approximate the dynamics of $\hat{V}^m_u$, $m = 1, \ldots, M$, $u \in [t, T]$, which constitute the portfolio $\hat{V}_u = \sum_{m=1}^M \hat{V}^m_u$. Once the portfolio value has been approximated and the resulting collateral computed, the value of the adjustment can be obtained either by inserting the values in an ‘outer’ Monte Carlo computation (observe that this strategy only works for non-recursive adjustments) or by applying the deep BSDE solver a second time to (2.12).

3.1. The deep BSDE solver of E et al. (2017).

We describe in this section the main ideas leading to the algorithm by E et al. (2017). We consider a general forward-backward stochastic differential equation (FBSDE) framework.

Let $(\Omega, \mathcal{F}, \mathbb{Q})$ be a probability space rich enough to support an $\mathbb{R}^d$-valued Brownian motion $W^{\mathbb{Q}} = (W^{\mathbb{Q}}_t)_{t \in [0,T]}$. Let $\mathbb{F} = (\mathcal{F}_t)_{t \in [0,T]}$ be the filtration generated by $W^{\mathbb{Q}}$, assumed to satisfy the standard assumptions. Let us consider an FBSDE in the following general form:
$$X_t = x + \int_0^t b(s, X_s) \, ds + \int_0^t a(s, X_s)^\top dW^{\mathbb{Q}}_s, \quad x \in \mathbb{R}^d, \tag{3.1}$$
$$Y_t = \vartheta(X_T) + \int_t^T h(s, X_s, Y_s, Z_s) \, ds - \int_t^T Z_s^\top dW^{\mathbb{Q}}_s, \quad t \in [0, T], \tag{3.2}$$
where the vector fields $b : [0, T] \times \mathbb{R}^d \mapsto \mathbb{R}^d$, $a : [0, T] \times \mathbb{R}^d \mapsto \mathbb{R}^{d \times d}$, $h : [0, T] \times \mathbb{R}^d \times \mathbb{R} \times \mathbb{R}^d \mapsto \mathbb{R}$ and $\vartheta : \mathbb{R}^d \mapsto \mathbb{R}$ satisfy suitable assumptions ensuring existence and uniqueness results. We denote by $(X^x_t)_{t \in [0,T]} \in \mathbb{S}^2(\mathbb{R}^d)$ and $(Y^y_t, Z_t)_{t \in [0,T]} \in \mathbb{S}^2(\mathbb{R}) \times \mathbb{H}^{2,q \times 1}$ the unique adapted solutions to (3.1) and (3.2), respectively. To alleviate notation, hereafter we omit the dependency on the initial condition $x$ of the process $X^x_\cdot$.

The above formulation of FBSDEs is intrinsically linked to the following stochastic optimal control problem:
$$\underset{y,\, Z = (Z_t)_{t \in [0,T]}}{\text{minimise}} \; \mathbb{E}\left[ \left| \vartheta(X_T) - Y^y_T \right|^2 \right] \quad \text{subject to (3.1)–(3.2)}. \tag{3.3}$$
In particular, a solution $(Y, Z)$ to (3.2) is a minimiser of the problem (3.3). A discretized version of the optimal control problem (3.3) is the basis of the deep BSDE solver.

Given $N \in \mathbb{N}$, consider $0 = t_0 < t_1 < \ldots < t_N = T$. For simplicity, let us take a uniform mesh with step $\Delta t$ such that $t_n = n \Delta t$, $n = 0, \ldots, N$, and denote $\Delta W_n = W^{\mathbb{Q}}_{t_{n+1}} - W^{\mathbb{Q}}_{t_n}$. By an Euler-Maruyama approximation of (3.1)–(3.2), one has
$$\widetilde{X}_{n+1} = \widetilde{X}_n + b(t_n, \widetilde{X}_n) \Delta t + a(t_n, \widetilde{X}_n)^\top \Delta W_n, \quad \widetilde{X}_0 = x, \tag{3.4}$$
$$\widetilde{Y}^y_{n+1} = \widetilde{Y}^y_n - h(t_n, \widetilde{X}_n, \widetilde{Y}^y_n, \widetilde{Z}_n) \Delta t + \widetilde{Z}_n^\top \Delta W_n, \quad \widetilde{Y}^y_0 = y. \tag{3.5}$$
The core idea of the deep BSDE solver is to approximate, at each time step $n$, the control process $\widetilde{Z}_n$ in (3.5) by using an artificial neural network (ANN). More specifically, in the Markovian setting, $Z_t$ is a measurable function of $X_t$, which we approximate by an ANN ansatz and carry out the optimisation above over this parametrised form. For this we introduce next a formalism for the description of neural networks.

ANN approximation.

We consider artificial neural networks with $\mathcal{L} + 1 \in \mathbb{N} \setminus \{1, 2\}$ layers. Each layer consists of $\nu_\ell$ nodes (also called neurons), for $\ell = 0, \ldots, \mathcal{L}$. The 0-th layer represents the input layer, whereas the $\mathcal{L}$-th layer is called the output layer. The remaining $\mathcal{L} - 1$ layers are hidden layers. For simplicity we set $\nu_\ell = \nu$, $\ell = 1, \ldots, \mathcal{L} - 1$. The input and output dimensions are both $d$ in our case.

A feedforward neural network is a function $\varphi^\varrho : \mathbb{R}^d \mapsto \mathbb{R}^d$ defined via the composition
$$x \in \mathbb{R}^d \longmapsto A_{\mathcal{L}} \circ \varrho \circ A_{\mathcal{L}-1} \circ \ldots \circ \varrho \circ A_1(x) \in \mathbb{R}^d,$$
where all $A_\ell$, $\ell = 1, \ldots, \mathcal{L}$, are affine transformations
$$A_1 : \mathbb{R}^d \mapsto \mathbb{R}^\nu, \qquad A_\ell : \mathbb{R}^\nu \mapsto \mathbb{R}^\nu, \; \ell = 2, \ldots, \mathcal{L} - 1, \qquad A_{\mathcal{L}} : \mathbb{R}^\nu \mapsto \mathbb{R}^d,$$
of the form $A_\ell(x) := W_\ell x + \beta_\ell$, $\ell = 1, \ldots, \mathcal{L}$, where $W_\ell$ and $\beta_\ell$ are matrices and vectors of suitable size called, respectively, weights and biases. The function $\varrho$, called activation function, is a univariate function $\varrho : \mathbb{R} \mapsto \mathbb{R}$ that is applied component-wise to vectors. With an abuse of notation, we denote $\varrho(x_1, \ldots, x_\nu) = (\varrho(x_1), \ldots, \varrho(x_\nu))$. The elements of the weights $W_\ell$ and of the vectors $\beta_\ell$ are the parameters of the neural network. Since $W_\ell \in \mathbb{R}^{\nu_\ell \times \nu_{\ell-1}}$ and $\beta_\ell \in \mathbb{R}^{\nu_\ell}$, we can regroup all parameters in a vector $\rho \in \mathbb{R}^R$ where $R = \sum_{\ell=1}^{\mathcal{L}} \nu_\ell (1 + \nu_{\ell-1})$.

[Figure 1. Schematic representation of a feedforward neural network with two hidden layers, i.e. $\mathcal{L} = 3$, input and output dimension $d = 4$, and $\nu = d + 2 = 6$ nodes.]

As announced, we use ANNs to approximate the control process $Z_t$. More specifically, let $R \in \mathbb{N}$ be as before and let $\xi \in \mathbb{R}$, $\rho \equiv (\rho_1, \ldots, \rho_R) \in \mathbb{R}^R$ be $R + 1$ parameters. We introduce a family of neural networks $\varphi^\rho_n : \mathbb{R}^d \to \mathbb{R}^d$, $n \in \{1, \ldots, N\}$, parametrized by $\rho$ and indexed by time. We denote $Z^\rho_n = \varphi^\rho_n(X_n)$ and consider the following parametrized version of (3.5):
$$Y^{\rho,\xi}_{n+1} = Y^{\rho,\xi}_n - h(t_n, X_n, Y^{\rho,\xi}_n, Z^\rho_n) \Delta t + (Z^\rho_n)^\top \Delta W_n, \quad Y^{\rho,\xi}_0 = \xi, \tag{3.6}$$
meaning that, at each time step, we use a distinct neural network to approximate the control process.

The deep BSDE solver by E et al. (2017) considers the following stochastic optimization problem:
$$\underset{\xi \in \mathbb{R},\, \rho \in \mathbb{R}^R}{\text{minimise}} \; \mathbb{E}\left[ \left( \vartheta(X_N) - Y^{\rho,\xi}_N \right)^2 \right] \quad \text{subject to (3.4)–(3.6)}. \tag{3.7}$$
Observe that, in practice, one simulates $L \in \mathbb{N}$ Monte Carlo paths $(X^{(\ell)}_n, Y^{\xi,\rho,(\ell)}_n)_{n = 0, \ldots, N}$ for $\ell = 1, \ldots, L$, using (3.4)–(3.6) with $N$ i.i.d. Gaussian random variables $(\Delta W_n)_{n = 0, \ldots, N-1}$ with mean 0 and variance $\Delta t$.
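The following NumPy sketch illustrates the objects just introduced: a feedforward network built from affine maps and a component-wise activation, its parameter count, and the recursion (3.6) along simulated paths followed by the mean squared terminal mismatch. It is only a sketch under stated assumptions: the names `init_network`, `forward`, `n_parameters` and `rollout_loss`, the tanh activation, the random initialisation, the zero driver and the terminal condition are all hypothetical choices, networks are indexed from 0 to N − 1 rather than 1 to N, and the stochastic gradient step (for which an automatic differentiation framework would normally be used) is omitted.

```python
import numpy as np

def init_network(d, nu, n_layers, rng):
    """Weights and biases of A_1, ..., A_L with widths d -> nu -> ... -> nu -> d."""
    widths = [d] + [nu] * (n_layers - 1) + [d]
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), size=(k, m)), np.zeros(k))
            for m, k in zip(widths[:-1], widths[1:])]

def forward(params, x, act=np.tanh):
    """Apply the network to x of shape (n_paths, d); no activation after the last layer."""
    for W, beta in params[:-1]:
        x = act(x @ W.T + beta)
    W, beta = params[-1]
    return x @ W.T + beta

def n_parameters(params):
    """Total number of weights and biases, i.e. sum over layers of nu_l * (1 + nu_{l-1})."""
    return sum(W.size + beta.size for W, beta in params)

def rollout_loss(xi, nets, X, dW, h, terminal, dt):
    """Recursion (3.6) along the paths X, then the empirical mean squared terminal error."""
    Y = np.full(X.shape[1], xi)
    for n in range(len(nets)):
        Z = forward(nets[n], X[n])                       # Z_n = phi_n(X_n)
        Y = Y - h(n * dt, X[n], Y, Z) * dt + np.sum(Z * dW[n], axis=1)
    return np.mean((terminal(X[-1]) - Y) ** 2)

# Hypothetical example: X is a d-dimensional Brownian motion (b = 0, a = identity)
rng = np.random.default_rng(0)
d, nu, n_layers, N, n_paths, T = 4, 6, 3, 10, 256, 1.0
dt = T / N
dW = rng.normal(0.0, np.sqrt(dt), size=(N, n_paths, d))
X = np.concatenate([np.zeros((1, n_paths, d)), np.cumsum(dW, axis=0)])
nets = [init_network(d, nu, n_layers, rng) for _ in range(N)]
h = lambda t, x, y, z: np.zeros_like(y)      # zero driver, for illustration only
terminal = lambda x: x.sum(axis=1)           # hypothetical terminal condition
loss = rollout_loss(0.0, nets, X, dW, h, terminal, dt)
```

For the architecture of Figure 1 (two hidden layers, $d = 4$, $\nu = 6$) each network in this sketch has $6 \cdot 5 + 6 \cdot 7 + 4 \cdot 7 = 100$ parameters.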
Replacing the expected cost functional by the empirical mean, (3.7) becomes
$$\underset{\xi \in \mathbb{R},\, \rho \in \mathbb{R}^R}{\text{minimise}} \; \frac{1}{L} \sum_{\ell=1}^L \left( \vartheta(X^{(\ell)}_N) - Y^{\rho,\xi,(\ell)}_N \right)^2 \quad \text{subject to (3.4)–(3.6)}. \tag{3.8}$$
This minimization typically involves a huge number of parameters and it is performed by a stochastic gradient descent-type algorithm (SGD), leading to random approximations. For further details on this point we refer the reader to Section 2.6 in E et al. (2017). We will denote by $I$ the maximum number of SGD iterations. To improve the performance and stability of the ANN approximation a batch normalization can also be considered, see Ioffe and Szegedy (2015). However, in our framework, this normalization does not always have a positive impact on the results and we will only apply it when it results in some numerical improvement.

A rigorous and complete theoretical convergence framework for the deep BSDE solver is not available to date. Interesting a posteriori error bounds can be found in Han and Long (2018, Theorem 1’), where the authors show that under suitable assumptions on the coefficients of the FBSDE (3.1)–(3.2) (namely the monotonicity of $b$ and $h$, the Lipschitz continuity in space, Hölder continuity in time, linear growth of $b$, $a$, $h$ and the Lipschitz continuity of $\vartheta$) one has, for $\Delta t$ sufficiently small,
$$\sup_{t \in [0,T]} \mathbb{E}\left| Y^y_t - \tilde{Y}^{\rho,\xi}_t \right|^2 + \int_0^T \mathbb{E}\left| Z_t - \tilde{Z}^\rho_t \right|^2 dt \le C \left( \Delta t + \mathbb{E}\left[ \left( \vartheta(X_N) - Y^{\rho,\xi}_N \right)^2 \right] \right), \tag{3.9}$$
where $C$ is a constant independent of $\Delta t$ and $d$, possibly depending on the starting point of the forward process and, given $(Y^{\rho,\xi}_n)_{n = 0, \ldots, N}$ from (3.6), $\tilde{Y}^{\rho,\xi}_t = Y^{\rho,\xi}_n$ and $\tilde{Z}^\rho_t = Z^\rho_n$ for $t \in [t_n, t_{n+1})$.

In Han and Long (2018, Theorem 2’), a priori estimates on the term $\mathbb{E}[(\vartheta(X_N) - Y^{\rho,\xi}_N)^2]$ are also provided. However, the obtained bounds depend on the (unknown) approximation capacity of the considered ANN.
To corroborate this idea, we mention that in our numerical tests we experienced important variability of results depending on the different structure of the ANN used (see also the results in Figure 5).

3.2. The Deep xVA Solver for non recursive valuation adjustments.

In our setting, the deep BSDE solver is first employed in the approximation of the clean values of the portfolio, i.e., the processes $\hat{V}^m_t$ for $m = 1, \ldots, M$, which are the solutions of (2.8) with underlying forward dynamics given by $S$ in (2.3). More precisely, in the notation of the previous section, we take
$$X_t = S_t \quad \text{and} \quad Y_t = \hat{V}^m_t \quad \text{for } m = 1, \ldots, M.$$
We now describe the algorithm for computing CVA and DVA given by formulas (2.15) and (2.16), respectively. A unifying formula for CVA and DVA can be written as
$$B^{\tilde{r}}_t \, \mathbb{E}^{\mathbb{Q}}\left[ \int_t^T \Phi(u, \hat{V}_u) \, du \,\middle|\, \mathcal{F}_t \right], \tag{3.10}$$
where
• $\Phi(u, v) = (1 - R^C) \frac{1}{B^{\tilde{r}}_u} (v - C_u)^- \lambda^{C,\mathbb{Q}}_u$ for CVA;
• $\Phi(u, v) = (1 - R^B) \frac{1}{B^{\tilde{r}}_u} (v - C_u)^+ \lambda^{B,\mathbb{Q}}_u$ for DVA.

Given a time discretization (uniform, for simplicity) with time step $\Delta t$, the integral in (3.10) can be approximated by a quadrature rule, i.e.
$$\int_0^T \Phi(u, \hat{V}_u) \, du \approx \sum_{n=0}^N \eta_n \Phi(t_n, \hat{V}_{t_n}).$$
For instance, taking $t = t_0 = 0$, one may consider the trapezoidal rule
$$\int_0^T \Phi(u, \hat{V}_u) \, du \approx \sum_{n=1}^N \frac{\Delta t}{2} \left( \Phi(t_n, \hat{V}_{t_n}) + \Phi(t_{n-1}, \hat{V}_{t_{n-1}}) \right).$$
Denoting for any $m = 1, \ldots, M$ by $(V^{m,\bar{\rho},\bar{\xi},(p)}_n)_{n = 0, \ldots, N,\; p = 1, \ldots, P}$ the approximation of the process $(\hat{V}^m_{t_n})_{n = 0, \ldots, N}$ obtained by means of the parameters $(\bar{\rho}, \bar{\xi})$ resulting from the deep BSDE solver optimization (3.7), the adjustment is then approximated by the following formula:
$$\frac{1}{P} \sum_{p=1}^P \left( \sum_{n=0}^N \eta_n \Phi\Big(t_n, \sum_{m=1}^M V^{m,\bar{\rho},\bar{\xi},(p)}_n\Big) \right).$$
Algorithms 1 and 2 summarize the main steps of the method.
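The quadrature and outer Monte Carlo step can be sketched as follows, assuming the simulated portfolio clean values are already available as an array of shape (N + 1, P). The helper names `trapezoid_weights`, `adjustment_estimate` and `make_cva_phi`, the constant rate and intensity, the default of zero collateral and the convention $x^- = \max(-x, 0)$ are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def trapezoid_weights(n_steps, dt):
    """Weights eta_0, ..., eta_N of the trapezoidal rule on a uniform grid."""
    eta = np.full(n_steps + 1, dt)
    eta[0] = eta[-1] = 0.5 * dt
    return eta

def adjustment_estimate(V, phi, dt):
    """Outer Monte Carlo estimate (1/P) sum_p sum_n eta_n Phi(t_n, V_n^(p)).

    V has shape (N + 1, P): portfolio clean values on the time grid for P paths.
    phi(t, v) must be vectorised over the path dimension.
    """
    n_steps = V.shape[0] - 1
    eta = trapezoid_weights(n_steps, dt)
    integrand = np.stack([phi(n * dt, V[n]) for n in range(n_steps + 1)])
    return np.mean(eta @ integrand)

def make_cva_phi(R_C, lam_C, r_tilde, collateral=lambda t, v: 0.0 * v):
    """Phi for CVA with constant r_tilde and lam_C (assumed), x^- = max(-x, 0)."""
    def phi(t, v):
        discount = np.exp(-r_tilde * t)          # 1 / B_t for a constant rate
        return (1.0 - R_C) * discount * np.maximum(-(v - collateral(t, v)), 0.0) * lam_C
    return phi

# Hypothetical check: constant exposure of -100 on [0, 1], no discounting
N, P, T = 20, 500, 1.0
V = np.full((N + 1, P), -100.0)
cva0 = adjustment_estimate(V, make_cva_phi(R_C=0.4, lam_C=0.02, r_tilde=0.0), T / N)
```

In this degenerate example the integrand is constant, so the estimate equals $(1 - R^C) \cdot 100 \cdot \lambda^C \cdot T = 1.2$ up to rounding; with simulated paths the same function returns the sample mean of the path-wise quadratures.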

SDES OF XVA 11

Algorithm 1: Deep algorithm for exposure simulation

  Set parameters: $N$, $L$.   ▷ $N$ time steps, $L$ paths for inner Monte Carlo loop
  Fix architecture of ANN.   ▷ intrinsically defines the number of parameters $R$
  Deep BSDE solver ($N$, $L$):
    Simulate $L$ paths $(S^{(\ell)}_n)_{n=0,\dots,N}$, $\ell = 1, \dots, L$, of the forward dynamics.
    Define the neural networks $(\varphi^{\rho}_n)_{n=1,\dots,N}$.
    for $m = 1, \dots, M$ do
      Minimize over $\xi$ and $\rho$
      \[
      \frac{1}{L} \sum_{\ell=1}^{L} \Big( g_m\big(S^{(\ell)}_N\big) - V^{m,\rho,\xi,(\ell)}_N \Big)^2,
      \]
      subject to
      \[
      \begin{cases}
      V^{m,\rho,\xi,(\ell)}_{n+1} = V^{m,\rho,\xi,(\ell)}_n - f\big(t_n, S^{(\ell)}_n, V^{m,\rho,\xi,(\ell)}_n, Z^{\rho,(\ell)}_n\big) \Delta t + \big(Z^{\rho,(\ell)}_n\big)^{\top} \Delta W^{(\ell)}_n, \\
      V^{m,\rho,\xi,(\ell)}_0 = \xi, \\
      Z^{\rho,(\ell)}_n = \varphi^{\rho}_n\big(S^{(\ell)}_n\big).
      \end{cases} \tag{3.11}
      \]
      Save the optimizer $(\bar\xi_m, \bar\rho_m)$.
    end
  end

Algorithm 2: Deep xVA Solver for non recursive valuation adjustments

  Apply Algorithm 1.
  Set parameters: $P$.   ▷ $P$ paths for the outer Monte Carlo loop
  Simulate, for $m = 1, \dots, M$, $(V^{m,(p)}_n)_{n=0,\dots,N,\ p=1,\dots,P}$ by means of (3.11) with $\xi = \bar\xi_m$, $\rho = \bar\rho_m$.   ▷ approximation of the clean values
  Define $V^{(p)}_n := \sum_{m=1}^{M} V^{m,(p)}_n$ for $n = 0, \dots, N$, $p = 1, \dots, P$.   ▷ approximation of the clean portfolio value
  Compute the adjustment as
  \[
  \frac{1}{P} \sum_{p=1}^{P} \left( \sum_{n=0}^{N} \eta_n \Phi\big(t_n, V^{(p)}_n\big) \right).
  \]

The estimate (3.9) can be used to obtain a posteriori bounds on the $L^2$ error for exposures in $[0, T]$ starting from the loss function; however, the Monte Carlo error should be added to obtain a computable bound.

3.3. The Deep xVA Solver for recursive valuation adjustments.

The procedure of the previous section is sufficient to estimate CVA and DVA according to (2.15) and (2.16) at time zero by means of a standard Monte Carlo estimator, given the pathwise solutions of the BSDEs for clean values. Typically, however, the bank also needs to compute risk measures on the CVA, such as Value-at-Risk. Also, if we look at the driver of the xVA BSDE (2.12), we observe that FVA terms introduce a recursive structure in the driver, so that even a time-$t$ estimate of the process XVA requires the use of a numerical solver for a BSDE. Finally, let us observe that the bank is not only interested in computing the xVA at time $t$; hedging the market risk of xVA is also important, meaning that one also needs sensitivities of valuation adjustments with respect to the driving risk factors.

All the above considerations motivate us to propose a two-step procedure, where we first employ the deep BSDE solver to estimate the clean values $\hat V^m$, $m = 1, \dots, M$, according to Algorithm 1 and then, using the simulated paths of the $M$ clean BSDEs obtained from the first step, we apply the deep BSDE solver again to numerically solve the xVA BSDE (2.12). The procedure is outlined in Algorithm 3.

Algorithm 3: Deep xVA Solver

  Apply Algorithm 1.
  Set parameters: $P$.   ▷ $P$ paths for outer Monte Carlo loop
  Fix architecture of ANN.   ▷ intrinsically defines the number of parameters $\bar R$ (in general $\bar R \neq R$)
  Deep XVA-BSDE solver ($N$, $P$):
    Simulate $P$ paths $(V^{(p)}_n)_{n=0,\dots,N}$, $p = 1, \dots, P$, of the portfolio value.
    Define the neural networks $(\psi^{\zeta}_n)_{n=1,\dots,N}$.
    Minimize over $\gamma$ and $\zeta$
    \[
    \frac{1}{P} \sum_{p=1}^{P} \Big( X^{\zeta,\gamma,(p)}_N \Big)^2,
    \]
    subject to
    \[
    \begin{cases}
    X^{\zeta,\gamma,(p)}_{n+1} = X^{\zeta,\gamma,(p)}_n - \bar f\big(V^{(p)}_n, X^{\zeta,\gamma,(p)}_n\big) \Delta t + \big(\bar Z^{\zeta,(p)}_n\big)^{\top} \Delta W^{(p)}_n, \\
    X^{\zeta,\gamma,(p)}_0 = \gamma, \\
    \bar Z^{\zeta,(p)}_n = \psi^{\zeta}_n\big(V^{(p)}_n\big).
    \end{cases} \tag{3.12}
    \]
  end

3.4. Pathwise simulation of sensitivities.

One interesting feature of our approach to xVA computations is that we can easily estimate several sensitivities (i.e., partial derivatives) of pricing functions. Let us recall that, in the present Markovian setting, the control $Z$ associated to an FBSDE of the general form (3.1)–(3.2) satisfies
\[
Z_t = \frac{\partial Y}{\partial X}(t, X_t)\, a(t, X_t), \tag{3.13}
\]
so that we can reconstruct the gradient of the pricing function with respect to all risk factors simply by multiplying each (vector-valued) neural network by the inverse (which is assumed to exist) of the matrix $a(t, X_t)$. This becomes particularly interesting in view of Algorithms 1 and 3, where we can obtain hedge ratios both for the clean value and for the valuation adjustments without further computations.

Obtaining second order sensitivities, which may also be important for hedging purposes, is also feasible in our setting, because feedforward neural networks are compositions of simple functions and the computation of gradients of neural network functions has become standard in that community. Using the notation of Section 3.1, we can write
\[
\frac{\partial Z^{\rho}_n}{\partial X_n} = \frac{\partial \varphi^{\varrho}(X_n)}{\partial X_n}, \tag{3.14}
\]
with $\varphi^{\varrho}(X_n) = A_{\mathcal{L}}(\varrho(A_{\mathcal{L}-1} \dots \varrho(A_1(X_n))))$. Since $(A_\ell)_{\ell=1,\dots,\mathcal{L}}$ are affine functions, their Jacobians are given by the weight matrices, i.e. $J_{A_\ell}(\cdot) = W_\ell$, $\ell = 1, \dots, \mathcal{L}$. Moreover, the Jacobian of the activation $\varrho$ is
\[
J_{\varrho}(\cdot) = \operatorname{diag}\big(\varrho'(\cdot)\big),
\]
where, for $x \in \mathbb{R}^{\nu}$, we denote $\varrho'(x) = (\varrho'(x_1), \dots, \varrho'(x_\nu))$. In the present paper, we choose $\varrho(x) = \operatorname{ReLU}(x) = \max\{x, 0\}$, so that the first derivative can be defined as
\[
\varrho'(x) = \operatorname{ReLU}'(x) = \begin{cases} 1, & x > 0, \\ 0, & \text{otherwise} \end{cases} \;=\; \operatorname{sgn}(\operatorname{ReLU}(x)).
\]
Finally, we deduce that the following explicit differentiation formula holds:
\[
\frac{\partial Z^{\rho}_n}{\partial X_n} = W_{\mathcal{L}} \operatorname{diag}\big(\varrho'(A_{\mathcal{L}-1}(\dots A_1(X_n)))\big) \cdots \operatorname{diag}\big(\varrho'(A_1(X_n))\big) W_1.
\]
Given the availability of the derivative of $Z^{\rho}_n$, we can then obtain the Hessian of $Y$ from (3.13).

4. Numerical results
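As a first small numerical sanity check, the explicit differentiation formula for a ReLU network, $W_{\mathcal{L}} \operatorname{diag}(\varrho'(\cdot)) \cdots \operatorname{diag}(\varrho'(\cdot)) W_1$, can be compared against central finite differences. The sketch below uses plain Python lists; the tiny two-layer network, its weights, and the function names (`network`, `jacobian`, `finite_difference_jacobian`) are hypothetical and chosen only for illustration. Each derivative $\varrho'$ is evaluated at the pre-activation of the corresponding layer, which is how we read the formula.

```python
def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def affine(W, b, x):
    return [s + bi for s, bi in zip(matvec(W, x), b)]


def relu(v):
    return [max(a, 0.0) for a in v]


def relu_prime(v):
    return [1.0 if a > 0 else 0.0 for a in v]


def network(layers, x):
    """phi(x) = A_L(relu(A_{L-1}(... relu(A_1(x))))), layers = [(W, b), ...]."""
    for W, b in layers[:-1]:
        x = relu(affine(W, b, x))
    W, b = layers[-1]
    return affine(W, b, x)


def jacobian(layers, x):
    """Chain rule: W_L diag(relu'(pre_{L-1})) W_{L-1} ... diag(relu'(pre_1)) W_1."""
    W, b = layers[0]
    J = [row[:] for row in W]
    pre = affine(W, b, x)  # pre-activation of the first layer
    for W, b in layers[1:]:
        mask = relu_prime(pre)
        J = [[mask[i] * J[i][j] for j in range(len(J[0]))] for i in range(len(J))]
        J = matmul(W, J)
        pre = affine(W, b, relu(pre))
    return J


def finite_difference_jacobian(layers, x, h=1e-6):
    """Central differences, used only as an independent check."""
    out_dim = len(network(layers, x))
    J = [[0.0] * len(x) for _ in range(out_dim)]
    for j in range(len(x)):
        up, down = x[:], x[:]
        up[j] += h
        down[j] -= h
        f_up, f_down = network(layers, up), network(layers, down)
        for i in range(out_dim):
            J[i][j] = (f_up[i] - f_down[i]) / (2.0 * h)
    return J


if __name__ == "__main__":
    toy_layers = [
        ([[1.0, -2.0], [0.5, 1.0], [-1.0, 1.0]], [0.1, -0.2, 0.3]),
        ([[1.0, 2.0, -1.0], [0.0, 1.0, 1.0]], [0.0, 0.5]),
    ]
    point = [0.7, 0.2]
    print(jacobian(toy_layers, point))
    print(finite_difference_jacobian(toy_layers, point))
```

Away from the kinks of the ReLU the network is piecewise linear, so the two Jacobians should agree up to rounding error; at a kink the formula simply uses the convention $\varrho'(0) = 0$.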

To test our algorithm, we start by studying two very simple examples with a similar computational structure as CVA and DVA, and for which we can easily provide reference solutions. We then give a higher-dimensional example and illustrate further practically relevant features of the method, such as recursive xVA computations and simulation of the collateral account.

Let $S$ be the price of a single stock described by Black-Scholes dynamics,
\[
\mathrm{d}S_t = r S_t \,\mathrm{d}t + \sigma S_t \,\mathrm{d}W^{\mathbb{Q}}_t, \qquad S_0 = s_0,
\]
and $\hat V$ a European-style contingent claim with value
\[
\hat V_t = \mathbb{E}^{\mathbb{Q}}\Big[ e^{-r(T-t)} g(S_T) \,\Big|\, \mathcal{F}_t \Big].
\]
In particular, $\hat V$ solves the following BSDE:
\[
\begin{cases}
-\mathrm{d}\hat V_t = -r \hat V_t \,\mathrm{d}t - Z_t \,\mathrm{d}W^{\mathbb{Q}}_t, \\
\hat V_T = g(S_T).
\end{cases} \tag{4.1}
\]
The discounted positive and negative expected exposure of $\hat V$ are defined, respectively, by
\[
\mathrm{DEPE}(s) = \mathbb{E}^{\mathbb{Q}}\Big[ e^{-r(s-t)} \big(\hat V_s\big)^{+} \,\Big|\, \mathcal{F}_t \Big], \tag{4.2}
\]
\[
\mathrm{DENE}(s) = -\mathbb{E}^{\mathbb{Q}}\Big[ e^{-r(s-t)} \big(\hat V_s\big)^{-} \,\Big|\, \mathcal{F}_t \Big]. \tag{4.3}
\]
In the plots below, we show some promising results obtained from particular realisations of the Deep xVA Solver. A more careful evaluation of the numerical results should also take into account the randomness of the algorithm (through the inner and outer Monte Carlo estimation and stochastic gradient descent).

4.1. A forward on $S$.

In this case we consider $g(S_T) = S_T - K$ with $K = s_0$. The pathwise exposure $\hat V$ at time $s \in [t, T]$ is given by
\[
\hat V_s = \mathbb{E}^{\mathbb{Q}}\Big[ e^{-r(T-s)} (S_T - K) \,\Big|\, \mathcal{F}_s \Big] = S_s - K e^{-r(T-s)}.
\]
Substituting in (4.2)–(4.3), one has
\[
\mathrm{DEPE}(s) = S_t \Phi(d_1) - K e^{-r(T-t)} \Phi(d_2), \tag{4.4}
\]
\[
\mathrm{DENE}(s) = S_t \Phi(-d_1) - K e^{-r(T-t)} \Phi(-d_2), \tag{4.5}
\]
where $\Phi(\cdot)$ denotes the standard normal cumulative distribution function and, since the exposure at time $s$ is that of a call with maturity $s$ and strike $K e^{-r(T-s)}$,
\[
d_1 = \frac{\ln\big( e^{r(T-s)} S_t / K \big) + \big( r + \sigma^2/2 \big)(s - t)}{\sigma \sqrt{s - t}} \qquad \text{and} \qquad d_2 = d_1 - \sigma \sqrt{s - t}.
\]

Figure 2. Forward contract: approximated exposure (left) and EPE, ENE (right). Parameters used: outer MC paths $P = 2048$, inner MC paths $L = 64$, internal layers $\mathcal{L} - 1 = $ […], $\nu = d + 20 = 21$, $I = 4000$, time steps $N = 200$, no batch normalization.

  $\sigma$    $K$    $T$
  0.25      100    1

Table 1. Parameters used in numerical experiments.

We report in Figure 2 the plot of the numerical results obtained by Algorithm 2 using the parameters in Table 1 and $r = 0$. In particular, on the left we plot the simulated pathwise exposure, i.e. the paths $t_n \mapsto V^{(p)}_n$ for $p = 1, \dots, P$, while on the right we compare the approximated EPE and ENE (solid lines) with the exact expected exposures given by (4.4)–(4.5) (dashed lines).

4.2. A European call option.
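For orientation, the Black-Scholes value of an at-the-money call under the parameters of Table 1 with $r = 0.01$ can be reproduced with the Python standard library alone. This is a minimal sketch for cross-checking a reference value, not part of the Deep xVA Solver; the helper names `norm_cdf` and `bs_call` are hypothetical, and the spot is taken as $s_0 = K = 100$.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(spot, strike, rate, sigma, maturity):
    """Black-Scholes price of a European call with time to maturity `maturity`."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * sigma ** 2) * maturity) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


if __name__ == "__main__":
    # Table 1 parameters (sigma = 0.25, K = 100, T = 1) with r = 0.01 and s_0 = K.
    price = bs_call(spot=100.0, strike=100.0, rate=0.01, sigma=0.25, maturity=1.0)
    print(round(price, 2))  # close to the reference value 10.40 quoted in this section
```

Because the discounted call value is a martingale under $\mathbb{Q}$, the same number serves as the reference level for the whole DEPE profile in this example.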

In this case we consider $g(S_T) = (S_T - K)^{+}$, where we set $K = s_0$. The pathwise exposure $\hat V$ at time $s \in [t, T]$ is given by the Black-Scholes formula
\[
\hat V_s = \mathbb{E}^{\mathbb{Q}}\Big[ e^{-r(T-s)} (S_T - K)^{+} \,\Big|\, \mathcal{F}_s \Big] = S_s \Phi(d_1) - K e^{-r(T-s)} \Phi(d_2) > 0,
\]
with $d_1$ and $d_2$ here evaluated for spot $S_s$ and time to maturity $T - s$. It follows immediately that
\[
\mathrm{DEPE}(s) = \mathbb{E}^{\mathbb{Q}}\Big[ e^{-r(s-t)} \hat V_s \,\Big|\, \mathcal{F}_t \Big] = \hat V_t \qquad \text{and} \qquad \mathrm{DENE}(s) = 0.
\]
The results obtained using Algorithm 2 with the parameters in Table 1 and $r = 0.01$ are reported in Figure 3 (left). The exact European call price is 10.40, while the approximation of the exposure obtained by the solver and reported in Figure 3 (left) takes values, for $t \in [0, T]$, within the interval [10.[…], […].33] with maximum distance 0.09 to the exact solution.

4.3. A basket call option.

Let us now consider the case of several underlying assets $(S^1, \dots, S^d)$:
\[
\mathrm{d}S^i_t = r_i S^i_t \,\mathrm{d}t + \sigma_i S^i_t \,\mathrm{d}W^{\mathbb{Q},i}_t, \qquad S_0 = (s^i_0)_{i=1,\dots,d} \in \mathbb{R}^d, \qquad i = 1, \dots, d,
\]
where $W^{\mathbb{Q}} = (W^{\mathbb{Q},1}, \dots, W^{\mathbb{Q},d})$ is a standard Brownian motion in $\mathbb{R}^d$ with correlation matrix $(\rho_{i,j})_{1 \le i,j \le d}$. We set $d = 100$. A European basket call option is associated with the payoff
\[
g(S^1_T, \dots, S^d_T) = \left( \sum_{i=1}^{d} S^i_T - K \right)^{+}.
\]
The results obtained by Algorithm 2 using the parameters in Table 1 with $\sigma_i = \sigma$ for all $i = 1, \dots, d$, zero correlation, and $r_i = r = 0.01$ are reported in Figure 3 (right).

Figure 3. DEPE and DENE for a European call option (left) and a European basket option with 100 underlyings (right). Parameters used: outer MC paths $P = 1024$, inner MC paths $L = 64$, internal layers $\mathcal{L} - 1 = $ […], $\nu = d + 20 = 21$ (left) and $\nu = d + 10 = 110$ (right), $I = 4000$, time steps $N = 100$, with batch normalization.

The distinctive feature of the present example is the high dimension of the vector of risk factors. While the two previous one-dimensional examples mainly served as a validation of the methodology, the present example highlights its ability to provide an accurate numerical approximation in a high-dimensional context. For this example, we used a feedforward neural network with two layers and $d + 10$ nodes, with a ReLU activation function. The approximation parameters used are reported in the caption of Figure 3 (right). We increase the number of nodes $\nu$ roughly linearly with the dimension $d$, which turned out to be a useful rule of thumb for consistent accuracy across dimensions in this case.

For a detailed study of learning the values of basket derivatives (on six underlying assets) from simulated values by deep learning, not based on BSDEs, see Ferguson and Green (2018).

For the case of the basket call option, we observe that the exposure profile corresponds to the present value of the contract. As a consequence, we obtain a simple method to validate the exposure profile by computing an estimate of the basket call option price by means of a standard Monte Carlo simulation with $10^{[\dots]}$ paths. We regard this as the 'exact' price. The Monte Carlo price we obtained is 398.08 with confidence interval [397.[…], […]]. The approximation obtained by the solver takes values between […].02 and 400.82, achieving the maximum distance 5.06 to the Monte Carlo price at time $t = 0$.

For this product we also perform an xVA calculation with the objective of validating Algorithm 2 and Algorithm 3 in a case where both are applicable. To perform this comparison, we need the xVA BSDE to be non-recursive: this can be achieved by assuming that there is a unique risk-free interest rate, so that FVA and ColVA are identically zero, i.e., xVA consists only of the CVA and DVA terms. The idea is then to compare a Monte Carlo estimate of xVA according to Algorithm 2 with the initial value of the BSDE as produced by a full application of Algorithm 3.

We assume that the default intensities of the counterparty and the bank are $\lambda^{C,\mathbb{Q}} = 0.10$ and $\lambda^{B,\mathbb{Q}} = 0.01$, respectively. For the recovery rates we set $R^C = 0.$[…] and $R^B = 0.4$, while the unique risk-free interest rate is $r = 0.01$. Using the same network setting (see again the caption of Figure 3, right), the Deep xVA Solver produced an xVA estimate of 208.55 by means of Algorithm 3, whereas the estimate produced by Algorithm 2 is 211.[…].

4.4. Realistic simulation of the collateral account.

A useful feature of our proposed approach is the possibility of performing realistic simulations of the collateral account without resorting to simplifying assumptions. We can in fact compute the overall outstanding exposure between the bank and the counterparty by the following steps. Algorithm 1 allows us to simulate paths for all processes $\hat V^m$, $m = 1, \dots, M$. Such paths can then be aggregated so as to produce a simulation of the portfolio process $\hat V = \sum_{m=1}^{M} \hat V^m$, which corresponds to the pre-collateral exposure. After this, we compute the value of the collateral balance $C$ corresponding to the simulated paths of $\hat V$, which in turn allows us to compute the post-collateral exposure process $\hat V - C$ that enters the xVA formulas.

For illustration, we consider $M = 1$ and the equity forward from the first example. We introduce a simple example of a collateral agreement where collateral is exchanged between the counterparties at every point in time (a margin call frequency that does not coincide with the simulation time discretization can of course be treated as well). Collateral is exchanged only in case the pre-collateral exposure is above (below) a receiving (posting) threshold, both of which are set equal to 5, i.e.
\[
C_t := C(\hat V_t) = (\hat V_t - 5)^{+} - (\hat V_t + 5)^{-}.
\]
An illustration for a single path is provided in Figure 4.
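The threshold rule above is simple enough to be written out directly. The following sketch applies it pointwise to a pre-collateral exposure path; the function names `collateral_balance` and `post_collateral_exposure` and the toy path are hypothetical, and the negative part is taken as $x^- = \max(-x, 0)$, which is an assumption about the sign convention.

```python
def collateral_balance(v, threshold=5.0):
    """C(v) = (v - threshold)^+ - (v + threshold)^-, with x^- = max(-x, 0)."""
    positive_part = max(v - threshold, 0.0)
    negative_part = max(-(v + threshold), 0.0)
    return positive_part - negative_part


def post_collateral_exposure(path, threshold=5.0):
    """Pointwise post-collateral exposure V - C(V) along a simulated path."""
    return [v - collateral_balance(v, threshold) for v in path]


if __name__ == "__main__":
    # Hypothetical pre-collateral exposure path, for illustration only.
    toy_path = [0.0, 2.0, 8.0, 12.5, -3.0, -8.0]
    print([collateral_balance(v) for v in toy_path])
    print(post_collateral_exposure(toy_path))
```

Under this convention the post-collateral exposure is the pre-collateral exposure capped to the band between minus and plus the threshold, which matches the qualitative picture of a collateralized exposure fluctuating within the two thresholds.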

Figure 4. Pathwise simulation of a collateralized exposure. Top left panel: $\hat V$ (exposure before collateral). Top right panel: $C$ (collateral balance). Bottom panel: $\hat V - C$ (exposure after collateral). Posting and receiving thresholds are 5 EUR.

4.5. Impact of number of layers and nodes.

The aim of this section is to analyse the impact of the number of nodes and layers of the neural network on the quality of the approximation in our setting. When considering the richness of the network, one can face two adverse, albeit opposite, situations, which are abundantly documented in the literature for various applications: on the one hand, choosing an overly simplistic structure implies that the model underfits, i.e., has a poor explanatory capability; on the other hand, a network architecture that is too rich might result in a limited capability of the model to generalize when simulating new paths of the risk factors. This second situation is usually termed overfitting.

To analyse the issue in the present setting, we consider the forward contract from Section 4.1 and apply Algorithm 1 to estimate the exposure of the contract. In this case, there is only one risk factor, namely the stock price, so that $d = 1$. We test neural networks with different depths, from one to three hidden layers, and different numbers of nodes, between $d$ and $d + 20$ per layer. A graphical representation of the results is provided in Figure 5. We can clearly observe that a single-layer (non-deep) neural network does not succeed in providing a satisfactory fit to the data, while including more nodes can improve the fit. A two-layer network provides the best explanatory capability, with the best results being provided by a two-layer configuration with $d + 20$ nodes. Adding a further layer, for this particular example, does not lead to an improvement of the fitting quality, as shown by the last row of Figure 5, which further shows that adding more nodes has a detrimental effect.

5. Preliminary conclusions and extensions

The proposed xVA algorithm exploits two useful complementary aspects of the Deep BSDE Solver of E et al. (2017). First, the formulation as an optimisation problem over a parametrisation of the (Markovian) control of the xVA BSDE, which is carried out by SDE discretisation and path sampling, directly gives both the hedge ratios in approximate functional form and model-based derivative prices along the sample paths. This is amenable to the simulation of exposure profiles and the computation of higher-order Greeks by pathwise differentiation, and allows for the computation of funding and margin variation adjustments as well as xVA hedging. A second aspect of the Deep BSDE Solver is the use of neural networks specifically as parametrisation for the Markovian control. A key advantage results from the approximation power of neural networks in high dimensions, which has the potential to make risk management computations on portfolio level feasible. Moreover, the simple functional form allows standard pathwise sensitivity computations.

Our numerical examples provide a proof of concept, but further systematic testing in realistic application settings is needed. While the basket option example gave good accuracy in a high-dimensional application, we encountered delicate issues with fitting the ANN even for the simple forward contract. An additional difficulty arises from the non-linear, non-convex parametric form, which, combined with the large number of parameters, leads to challenging optimisation problems. Both these aspects, the expressive power of the ANN and the practicalities of the learning process, are extremely active research areas, and further developments of the proposed Deep xVA Solver will be informed by the rapidly developing understanding of neural networks in a broader sense.

The application of our proposed scheme is not restricted to the chosen xVA framework. For example, one could in principle apply our methodology to the balance-sheet based model of Crépey et al. (2019) and Albanese et al. (2020). In this case, the xVA computation involves multiple recursive valuations (illustrated succinctly in Abbas-Turki et al. (2018, Figure 1)), which can be approached by means of multiple applications of the Deep xVA Solver.

Figure 5. Estimation of the exposure for the forward from Example 4.1 obtained with 200 time steps and different numbers of layers and nodes. From top to bottom: results obtained with 1, 2, 3 layers with $d + 8$ (left; panels (a), (c), (e)) and $d + 20$ (right; panels (b), (d), (f)) nodes.

We also emphasise that the Deep xVA Solver can be combined with an existing analytics library: the computation of the mark-to-market cube (i.e., the simulation of all possible scenarios for the clean values over different points in time) represents a classical numerical problem to be solved in order to compute traditional risk figures such as Value-at-Risk or Expected Shortfall (this is often referred to as the "Monte Carlo full revaluation approach"). Since most products individually depend on a limited number of risk factors, it may be best to use a traditional numerical scheme, such as a finite difference solver, for at least some of the more vanilla products, and then revaluate the products over different Monte Carlo paths by means of a look-up table over the pre-computed numerical solution. This provides an alternative route to our Algorithm 1 for the simulation of the clean values. However, once we aggregate all mark-to-markets, we end up with an object that depends on a high number of risk factors, so for the computation of xVA our proposed methodology provides a useful tool which allows the recursive computation of valuation adjustments, their hedging strategy, and simulation of collateral.

References

Abbas-Turki, L. A., Crépey, S., and Diallo, B. (2018). XVA principles, nested Monte Carlo strategies, and GPU optimizations. International Journal of Theoretical and Applied Finance, 21(06):1850030.
Albanese, C., Crépey, S., Hoskinson, R., and Saadeddine, B. (2020). XVA analysis from the balance sheet. Working paper; available at https://math.maths.univ-evry.fr/crepey/papers/xva-analysis-balance.pdf.
Biagini, F., Gnoatto, A., and Oliva, I. (2019). Pricing of counterparty risk and funding with CSA discounting, portfolio effects and initial margin. arXiv preprint arXiv:1905.11328.
Bichuch, M., Capponi, A., and Sturm, S. (2018a). Arbitrage-free XVA. Mathematical Finance, 28(2):582–620.
Bichuch, M., Capponi, A., and Sturm, S. (2018b). Robust XVA. arXiv preprint arXiv:1808.04908.
Bielecki, T. and Rutkowski, M. (2004). Credit Risk: Modeling, Valuation and Hedging. Springer, Berlin, New York.
Bielecki, T. and Rutkowski, M. (2015). Valuation and hedging of contracts with funding costs and collateralization. SIAM Journal on Financial Mathematics, 6(1):594–655.
Brigo, D., Buescu, C., Francischello, M., Pallavicini, A., and Rutkowski, M. (2018). Risk-neutral valuation under differential funding costs, defaults and collateralization. arXiv preprint arXiv:1802.10228.
Brigo, D., Capponi, A., and Pallavicini, A. (2014). Arbitrage-free bilateral counterparty risk valuation under collateralization and application to credit default swaps. Mathematical Finance, 24(1):125–146.
Brigo, D. and Masetti, M. (2005). Risk neutral pricing of counterparty risk. In Pykhtin, M., editor, Counterparty Credit Risk Modeling: Risk Management, Pricing and Regulation. Risk Books, London.
Brigo, D., Morini, M., and Pallavicini, A. (2013). Counterparty Credit Risk, Collateral and Funding. Wiley Finance. Wiley, Chichester.
Brigo, D. and Pallavicini, A. (2014). Nonlinear consistent valuation of CCP cleared or CSA bilateral trades with initial margins under credit, funding and wrong-way risks. Journal of Financial Engineering, 01(01):1450001.
Brigo, D., Pallavicini, A., and Papatheodorou, V. (2011). Arbitrage-free valuation of bilateral counterparty risk for interest-rate products: Impact of volatilities and correlations. International Journal of Theoretical and Applied Finance, 14(06):773–802.
Broadie, M., Du, Y., and Moallemi, C. (2011). Efficient risk estimation via nested sequential simulation. Management Science.
Broadie, M., Du, Y., and Moallemi, C. (2015). Risk estimation via regression. Operations Research.
Burgard, C. and Kjaer, M. (2011a). In the balance. Risk, November:72–75.

Burgard, C. and Kjaer, M. (2011b). Partial differential equation representations of derivatives with bilateral counterparty risk and funding costs. The Journal of Credit Risk, 7(3):1–19.
Capriotti, L., Jiang, Y., and Macrina, A. (2017). AAD and least-square Monte Carlo: Fast Bermudan-style options and XVA Greeks. Algorithmic Finance, 6(1-2):35–49.
Cherubini, U. (2005). Counterparty risk in derivatives and collateral policies: The replicating portfolio approach. In Tilman, L., editor, ALM of Financial Institutions. Institutional Investor Books.
Crépey, S. (2015a). Bilateral counterparty risk under funding constraints – Part I: Pricing. Mathematical Finance, 25(1):1–22.
Crépey, S. (2015b). Bilateral counterparty risk under funding constraints – Part II: CVA. Mathematical Finance, 25(1):23–50.
Crépey, S., Bielecki, T. S., and Brigo, D. (2014). Counterparty risk and funding: a tale of two puzzles, volume 31 of Chapman and Hall/CRC Press Series in Financial Mathematics. Chapman and Hall/CRC, Boca Raton.
Crépey, S., Hoskinson, R., and Saadeddine, B. (2019). Balance sheet XVA by deep learning and GPU. Working paper; available at https://math.maths.univ-evry.fr/crepey/papers/xva-analysis-balance.pdf.
Cybenko, G. (1989). Approximations by superpositions of sigmoidal functions. Mathematics of Control, Signals, and Systems, 2(4):303–314.
de Graaf, C., Feng, Q., Kandhai, B. D., and Oosterlee, C. (2014). Efficient computation of exposure profiles for counterparty credit risk. International Journal of Theoretical and Applied Finance, 17(4).
de Graaf, C., Kandhai, B. D., and Reisinger, C. (2018). Efficient exposure computation by risk factor decomposition. Quantitative Finance, 18(10):1657–1678.
Delong, L. (2017). Backward Stochastic Differential Equations with Jumps and Their Actuarial and Financial Applications. Springer, Berlin, New York.
Duffie, D. and Huang, M. (1996). Swap rates and credit quality. The Journal of Finance, 51(3):921–949.
E, W., Han, J., and Jentzen, A. (2017). Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Communications in Mathematics and Statistics, 5:349–380.
El Karoui, N., Peng, S., and Quenez, M. C. (1997). Backward stochastic differential equations in finance. Mathematical Finance, 7(1):1–71.
Ferguson, R. and Green, A. (2018). Deeply learning derivatives. arXiv preprint arXiv:1809.02233.
Fries, C. P. (2019). Stochastic automatic differentiation: Automatic differentiation for Monte-Carlo simulations. Quantitative Finance, 19(6):1043–1059.
Fujii, M., Shimada, A., and Takahashi, A. (2010). Note on construction of multiple swap curves with and without collateral. Available at SSRN: http://ssrn.com/abstract=1440633.
Fujii, M., Shimada, A., and Takahashi, A. (2011). A market model of interest rates with dynamic basis spreads in the presence of collateral and multiple currencies. Wilmott, 54:61–73.
Fujii, M., Takahashi, A., and Takahashi, M. (2019). Asymptotic expansion as prior knowledge in deep learning method for high dimensional BSDEs. Asia-Pacific Financial Markets, 26(3):391–408.
Gordy, M. B. and Juneja, S. (2010). Nested simulation in portfolio risk measurement. Management Science, 56(10):1833–1848.
Green, A. (2015). XVA: Credit, Funding and Capital Valuation Adjustments. Wiley Finance. Wiley, Chichester.
Gregory, J. (2015). The xVA challenge. Wiley Finance. Wiley, Chichester.


Han, J. and Long, J. (2018). Convergence of the deep BSDE method for coupled FBSDEs. arXiv preprint arXiv:1811.01165.
Henry-Labordere, P. (2017). Deep primal-dual algorithm for BSDEs: applications of machine learning to CVA and IM. Available at SSRN: https://ssrn.com/abstract=3071506.
Hornik, K. (1991). Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2):251–257.
Huré, C., Pham, H., and Warin, X. (2020). Some machine learning schemes for high-dimensional nonlinear PDEs. Mathematics of Computation, 89:1547–1579.
Hutzenthaler, M., Jentzen, A., Kruse, T., Nguyen, T. A., and von Wurstemberger, P. (2018). Overcoming the curse of dimensionality in the numerical approximation of semilinear parabolic partial differential equations. arXiv preprint arXiv:1807.01212.
Ioffe, S. and Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. Proceedings of the 32nd International Conference on Machine Learning (ICML).
Jentzen, A., Salimova, D., and Welti, T. (2018). A proof that deep artificial neural networks overcome the curse of dimensionality in the numerical approximation of Kolmogorov partial differential equations with constant diffusion and nonlinear drift coefficients. arXiv preprint arXiv:1809.07321.
Joshi, M. and Kwon, O. (2016). Least squares Monte Carlo credit value adjustment with small and unidirectional bias. International Journal of Theoretical and Applied Finance, 19(8).
Karlsson, P., Jain, S., and Oosterlee, C. (2016). Counterparty credit exposures for interest rate derivatives using the stochastic grid bundling method. Applied Mathematical Finance, 23(3):175–196.
Kolmogorov, A. N. (1956). On the representation of continuous functions of several variables by superposition of continuous functions of one variable and addition. Doklady Akademii Nauk SSSR, 108(2):679–681.
Lichters, R., Stamm, R., and Gallagher, D. (2015). Modern Derivatives Pricing and Credit Exposure Analysis: Theory and Practice of CSA and XVA Pricing, Exposure Simulation and Backtesting. Applied Quantitative Finance. Palgrave Macmillan, London.
Longstaff, F. A. and Schwartz, E. S. (2001). Valuing American options by simulation: A simple least-squares approach. Review of Financial Studies, 14(1):113–147.
Pallavicini, A., Perini, D., and Brigo, D. Funding Valuation Adjustment: a consistent framework including CVA, DVA, collateral, netting rules and re-hypothecation. arXiv preprint arXiv:1112.1521.
Piterbarg, V. (2010). Funding beyond discounting: collateral agreements and derivatives pricing. Risk Magazine, 2:97–102.
Piterbarg, V. (2012). Cooking with collateral. Risk Magazine, 2:58–63.
Reisinger, C. and Zhang, Y. (2019). Rectified deep neural networks overcome the curse of dimensionality for nonsmooth value functions in zero-sum games of nonlinear stiff systems. arXiv preprint arXiv:1903.06652.
Ruf, J. and Wang, W. (2019). Neural networks for option pricing and hedging: a literature review. Available at SSRN:3486363.
She, J.-H. and Grecu, D. (2017). Neural network for CVA: Learning future values. arXiv preprint arXiv:1811.08726.
Shöftner, R. (2008). On the estimation of credit exposures using regression-based Monte Carlo simulation. The Journal of Credit Risk, 4(4):37–62.

Sokol, A. (2014). Long-Term Portfolio Simulation: For XVA, Limits, Liquidity and Regulatory Capital. Risk Books, London.

(Alessandro Gnoatto) University of Verona, Department of Economics, via Cantarane 24, 37129 Verona, Italy

(Athena Picarelli) University of Verona, Department of Economics, via Cantarane 24, 37129 Verona, Italy

(Christoph Reisinger) Oxford University, Mathematical Institute, ROQ, Woodstock Rd, Oxford, OX2 6GG, UK