Deep xVA solver -- A neural network based counterparty credit risk management framework
ALESSANDRO GNOATTO, ATHENA PICARELLI, AND CHRISTOPH REISINGER
May 7, 2020
Abstract.
In this paper, we present a novel computational framework for portfolio-wide risk management problems where the presence of a potentially large number of risk factors makes traditional numerical techniques ineffective. The new method utilises a coupled system of BSDEs for the valuation adjustments (xVA) and solves these by a recursive application of a neural network based BSDE solver. This not only makes the computation of xVA for high-dimensional problems feasible, but also produces hedge ratios and dynamic risk measures for xVA, and allows simulations of the collateral account.

1. Introduction
As a consequence of the 2007–2009 financial crisis, academics and practitioners have been redefining and augmenting key concepts of risk management. This made it necessary to reconsider many widely used methodologies in quantitative and computational finance.

As an example, it is now generally accepted that a reliable valuation of a financial product should account for the possibility of default of any agent involved in the transaction. Moreover, the trading activity is nowadays funded by resorting to different sources of liquidity. This results in the interest rate multi-curve phenomenon: in such a framework, the existence of a unique funding stream with a unique risk-free interest rate no longer represents a realistic assumption. The increasingly important role of collateral agreements demands a portfolio-wide view on valuation.

The aforementioned stylized facts are incorporated at the level of valuation equations by introducing value adjustments (xVA). Value adjustments are further terms to be added to, or subtracted from, an idealized reference portfolio value, computed in the absence of frictions, in order to obtain the final value of the transaction.

The literature on counterparty credit risk and funding is large and we only attempt to provide insights on the main references as they relate to our work. Possibly the first contribution on the subject is a model for credit risk asymmetry in swap contracts in Duffie and Huang (1996). Before the 2007–2009 financial crisis, we have the works of Brigo and Masetti (2005) and Cherubini (2005), where the concept of credit valuation adjustment (CVA) is analyzed. The possibility of default of both counterparties involved in the transaction, represented by the introduction of the debt valuation adjustment (DVA), is investigated, among others, in Brigo et al. (2011) and Brigo et al. (2014).

Another important source of concern for practitioners apart from default risk is represented by funding costs.
A parallel stream of literature emerged during and after the financial crisis to generalize valuation equations to include the presence of collateralization agreements. In a Black-Scholes economy, Piterbarg (2010) provides valuation formulas both in the collateralized and uncollateralized case. Generalizations to the case of a multi-currency economy can be found in Piterbarg (2012), Fujii et al. (2010) and Fujii et al. (2011). The funding valuation adjustment (FVA) is derived under alternative assumptions on the Credit Support Annex (CSA) in Pallavicini et al., while Brigo and Pallavicini (2014) also discuss the role of central counterparties in terms of funding costs. A general approach to funding in a semimartingale setting is provided by Bielecki and Rutkowski (2015).

Mathematics Subject Classification.
JEL Classification. E43, G12.
Key words and phrases. CVA, DVA, FVA, ColVA, xVA, EPE, Collateral, xVA hedging, Deep BSDE Solver.

Funding and default risk need to be united in a single risk management framework to account for all possible frictions and their interplay. Contributions in this sense can be found in Brigo et al. (2018) by means of the so-called discounting approach. In a series of papers, Burgard and Kjaer generalize the classical Black-Scholes replication approach to include some of the aforementioned effects, see Burgard and Kjaer (2011b) and Burgard and Kjaer (2011a). A more general backward stochastic differential equation (BSDE) approach is provided by Crépey (2015a), Crépey (2015b), Bichuch et al. (2018a) and Bichuch et al. (2018b). The equivalence between the discounting approach of Brigo and co-authors and the BSDE-based replication approaches is demonstrated in Brigo et al. (2018).

The importance of the topic is reflected in the increasing number of monographs on the subject, with Brigo et al. (2013) an example of an early work. An advanced BSDE-based treatment is presented by Crépey et al. (2014), while detailed analyses of how to construct large hybrid models for counterparty risk simulations are provided in Green (2015), Lichters et al. (2015) and Sokol (2014). Finally, Gregory (2015) provides an accessible introduction to wide-ranging aspects of the topic.

A common fundamental feature of such generalized risk management frameworks is the necessity to adopt a portfolio-wide point of view in order to properly account for risk mitigation benefits arising from diversified positions.
Adopting such portfolio-wide models, as is the present market practice in financial institutions, involves high-dimensional joint simulations of all positions within a portfolio. Commonly used numerical techniques (see for instance Shöftner (2008); Karlsson et al. (2016); Broadie et al. (2015); Joshi and Kwon (2016)) make use of regression approaches, based on a modification of the Least-Squares Monte Carlo approach in Longstaff and Schwartz (2001), to alleviate the high computational cost of fully nested Monte Carlo simulations such as those initially proposed in Gordy and Juneja (2010); Broadie et al. (2011). For an application of adjoint algorithmic differentiation (AAD) to xVA simulation by regression see, for instance, Capriotti et al. (2017); Fries (2019).

An alternative, hybrid, approach to counterparty risk computations is taken in de Graaf et al. (2014), where standard pricing methods are applied to the products in the portfolio and outer Monte Carlo estimators are applied for exposures. Techniques based purely on PDEs generally suffer from the so-called curse of dimensionality, a rapid increase of computational cost in the presence of high-dimensional problems. A PDE approach with factor-based dimension reduction has been proposed in de Graaf et al. (2018). Observe that in the presence of collateral, a PDE representation for CVA and DVA is not always available.

Presently, in high dimension (i.e., on portfolio level), there are no well established computational methods that have achieved satisfactory results. However, as explained above, such situations are of great practical relevance.

Modern risk-management frameworks demand the use of advanced parallel programming techniques at the heart of software running on high-performance hardware infrastructures.
In recent years, thanks to developments such as the introduction of advanced graphics processing units (GPUs), allowing for massive parallel computations, machine learning techniques have witnessed an increasing popularity in different domains. Such methods prove themselves particularly appealing in the context of high-dimensional problems involving large amounts of data.

Of particular interest is the concept of an artificial neural network (ANN). From a mathematical perspective, ANNs are multiple nested compositions of relatively simple multivariate functions. The term deep neural networks refers to ANNs with several interconnected layers. One remarkable property
of ANNs is given in the “Universal Approximation Theorem”, which has been proven in different versions, starting from the remarkable insight of Kolmogorov’s Representation Theorem, Kolmogorov (1956), and the seminal works of Cybenko (1989) and Hornik (1991). In a nutshell, this result states that any continuous function on a compact domain, in any dimension, can be represented to arbitrary accuracy by means of an ANN. In this context, and building heavily on earlier work of Jentzen et al. (2018), the recent results by Reisinger and Zhang (2019) have proved that deep ANNs can overcome the curse of dimensionality for approximating (nonsmooth) solutions of partial differential equations arising from (open-loop control of) SDEs. A result to the same effect has been shown for heat equations with a zero-order nonlinearity in Hutzenthaler et al. (2018). This is potentially useful in the context of risk management as simple models for CVA can be expressed in this form. For a recent literature survey of applications of neural networks to pricing, hedging and risk management problems more generally we refer the reader to Ruf and Wang (2019).

In this paper, we investigate the application of ANNs to solve high-dimensional BSDEs arising from risk management problems. Indeed, in the classical continuous-time mathematical finance literature the random behavior of the simple financial assets composing a portfolio is typically described by means of multi-dimensional Brownian motions and forward stochastic differential equations (SDEs). In this setting, BSDEs naturally arise as a representation of the evolution of the hedging portfolio, where the terminal condition represents the target payoff (see, e.g., El Karoui et al. (1997)).
In essence, (numerically) solving a BSDE is equivalent to identifying a risk management strategy.

More precisely, we will consider a discretized version of the BSDE and parametrize the (high-dimensional) control (i.e., hedging) process at every point in time by means of a family of ANNs. Once written in this form, BSDEs can be viewed as model-based reinforcement learning problems. The ANN parameters are then fitted so as to minimize a prescribed loss function. Mathematically, this involves an optimization step over a very large number of variables which typically requires the use of stochastic gradient descent-type algorithms.

The line of computational methods we follow has been initiated in the context of high-dimensional nonlinear PDEs, in E et al. (2017) and further investigated in Henry-Labordere (2017) and Fujii et al. (2019), and has led to the so-called Deep BSDE Solver. By way of financial applications, and xVA specifically, a primal-dual extension to the Deep BSDE Solver has been developed in Henry-Labordere (2017) and tested on stylised CVA- and IM (Initial Margin)-type PDEs; the Deep BSDE Solver has also been applied specifically to exposure computations for a Bermudan swaption and a cross-currency swap in She and Grecu (2017).

Our approach goes beyond these earlier works in the following regards: we
• formulate a rigorous, generic BSDE model for the dynamics of xVA, including CVA, DVA, FVA and ColVA (collateral valuation adjustment), for a derivative portfolio;
• provide algorithms for the computation of ‘non-recursive’ xVAs (such as CVA and DVA) and ‘recursive’ xVAs by (recursive) application of a Deep BSDE Solver;
• show how the method can be used for the simulation of xVA sensitivities and collateral.

We will refer to our method as Deep xVA Solver. More recently, an xVA strategy based on deep learning regression has been proposed in Crépey et al. (2019); Albanese et al. (2020), exploiting the numerical approach to BSDEs presented in Huré et al. (2020).
Different from E et al. (2017), this solver approximates the value function, not the control, by means of an ANN and reconstructs it at each time step by dynamic programming techniques. A comparison of the performance and robustness of the two approaches will require comprehensive testing in industry-relevant settings. We see as a structural advantage of our algorithm that it directly computes the xVA hedging strategy.

The paper is organized as follows. The financial framework is established in Section 2. In Section 3, after briefly recalling the main features of the deep BSDE solver presented in E et al. (2017), the algorithm for xVA computation is introduced. Numerical results for a selection of test cases are shown in Section 4, while Section 5 concludes.

2. The financial market
We fix a time horizon $T < \infty$ for the trading activity of two agents named the bank (B) and the counterparty (C). Unless otherwise stated, throughout the paper we assume the bank’s perspective and refer to the bank as the hedger. All underlying processes are modeled over a probability space $(\Omega, \mathcal{G}, \mathbb{G}, \mathbb{Q})$, where $\mathbb{G} = (\mathcal{G}_t)_{t \in [0,T]} \subseteq \mathcal{G}$ is a filtration satisfying the usual assumptions ($\mathcal{G}_0$ is assumed to be trivial). We denote by $\tau^B$ and $\tau^C$ the time of default of the bank and the counterparty, respectively. Specifically, we assume that $\mathbb{G} = \mathbb{F} \vee \mathbb{H}$, where $\mathbb{F} = (\mathcal{F}_t)_{t \in [0,T]}$ is a reference filtration satisfying the usual assumptions and $\mathbb{H} = \mathbb{H}^B \vee \mathbb{H}^C$, with $\mathbb{H}^j = (\mathcal{H}^j_t)_{t \in [0,T]}$ for $\mathcal{H}^j_t = \sigma(H^j_u \mid u \le t)$, and $H^j_t := 1_{\{\tau^j \le t\}}$, $j \in \{B, C\}$. We set
\[ \tau = \tau^C \wedge \tau^B. \tag{2.1} \]
In the present paper we will extensively make use of the so-called Immersion Hypothesis (see, e.g., Bielecki and Rutkowski (2004)).
Hypothesis 2.1.
Any local $(\mathbb{F}, \mathbb{Q})$-martingale is a local $(\mathbb{G}, \mathbb{Q})$-martingale.

We consider the following spaces:
• $L^2(\mathbb{R}^d)$ is the space of all $\mathcal{F}_T$-measurable $\mathbb{R}^d$-valued random variables $X : \Omega \to \mathbb{R}^d$ such that $\|X\|^2 = \mathbb{E}\left[|X|^2\right] < \infty$.
• $\mathbb{H}^{2,q \times d}$ is the space of all predictable $\mathbb{R}^{q \times d}$-valued processes $\phi : \Omega \times [0,T] \to \mathbb{R}^{q \times d}$ such that $\mathbb{E}\left[\int_0^T |\phi_t|^2 \,\mathrm{d}t\right] < \infty$.
• $\mathbb{S}^2$ is the space of all adapted processes $\phi : \Omega \times [0,T] \to \mathbb{R}^{q \times d}$ such that $\mathbb{E}\left[\sup_{0 \le t \le T} |\phi_t|^2\right] < \infty$.

2.1. Basic traded assets.
Risky assets.
For $d \ge 1$, we denote by $S^i$, $i = 1, \dots, d$, the ex-dividend price (i.e. the price) of risky securities. All $S^i$ are assumed to be càdlàg $\mathbb{F}$-semimartingales.

Let $W^{\mathbb{Q}} = (W^{\mathbb{Q}}_t)_{t \in [0,T]}$ be a $d$-dimensional $(\mathbb{F}, \mathbb{Q})$-Brownian motion (hence a $(\mathbb{G}, \mathbb{Q})$-Brownian motion, thanks to Hypothesis 2.1). We introduce the following coefficient functions:
\[ \mu : \mathbb{R}_+ \times \mathbb{R}^d \to \mathbb{R}^d, \qquad \sigma : \mathbb{R}_+ \times \mathbb{R}^d \to \mathbb{R}^{d \times d}, \tag{2.2} \]
which are assumed to satisfy standard conditions ensuring existence and uniqueness of strong solutions of SDEs driven by the Brownian motion $W^{\mathbb{Q}}$. The matrix process $\sigma$ is assumed to be invertible at every point in time. We assume that
\[ \mathrm{d}S_t = \mu(t, S_t)\,\mathrm{d}t + \sigma(t, S_t)\,\mathrm{d}W^{\mathbb{Q}}_t, \qquad S_0 = s_0 \in \mathbb{R}^d, \tag{2.3} \]
on $[0,T]$. Note that we are not postulating that the processes $S^i$ are positive.
Throughout the paper we assume that the market is complete for the sake of simplicity.
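To make the forward dynamics (2.3) concrete, the following is a minimal sketch of how paths of $S$ might be simulated on a uniform time grid with an Euler–Maruyama scheme. It is an illustration only: the function name `simulate_forward`, the array layout and the example coefficients (a diagonal, lognormal-type volatility) are assumptions made here and are not taken from the paper.

```python
import numpy as np

def simulate_forward(mu, sigma, s0, T, n_steps, n_paths, seed=0):
    """Euler-Maruyama paths of dS = mu(t, S) dt + sigma(t, S) dW, S_0 = s0.

    mu(t, s)    -> array (n_paths, d)      drift, evaluated path-wise
    sigma(t, s) -> array (n_paths, d, d)   volatility matrix, evaluated path-wise
    Returns the paths S (n_paths, n_steps + 1, d) and the Brownian
    increments dW (n_paths, n_steps, d) that generated them.
    """
    rng = np.random.default_rng(seed)
    s0 = np.asarray(s0, dtype=float)
    d = s0.shape[0]
    dt = T / n_steps
    S = np.empty((n_paths, n_steps + 1, d))
    S[:, 0, :] = s0
    # Increments are i.i.d. Gaussian with mean 0 and variance dt.
    dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps, d))
    for n in range(n_steps):
        t = n * dt
        drift = mu(t, S[:, n, :])
        vol = sigma(t, S[:, n, :])
        # Matrix-vector product sigma * dW for every path at once.
        diffusion = np.einsum("pij,pj->pi", vol, dW[:, n, :])
        S[:, n + 1, :] = S[:, n, :] + drift * dt + diffusion
    return S, dW

# Hypothetical example: d = 3 assets, proportional drift and diagonal volatility.
d = 3
mu_example = lambda t, s: 0.01 * s
sigma_example = lambda t, s: 0.2 * s[:, :, None] * np.eye(d)
paths, increments = simulate_forward(
    mu_example, sigma_example, np.ones(d), T=1.0, n_steps=50, n_paths=1000
)
```

Returning the Brownian increments alongside the paths is a design choice of this sketch: the same increments drive the backward component of a BSDE written on $S$.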
Cash accounts.
Given a stochastic return process $x := (x_t)_{t \ge 0}$, which is assumed bounded from below, right-continuous and $\mathbb{F}$-adapted, we define the cash account $B^x$ with unitary value at time 0 as the strictly positive continuous process of finite variation
\[ B^x_t := \exp\left\{ \int_0^t x_s \,\mathrm{d}s \right\}, \qquad t \in [0,T]. \tag{2.4} \]
In particular, $B^x := (B^x_t)_{t \in [0,T]}$ is also continuous and adapted.

Defaultable bonds.
Default times are assumed to be exponentially distributed random variables with time-dependent intensity
\[ \Gamma^j_t = \int_0^t \lambda^j_s \,\mathrm{d}s, \qquad j \in \{B, C\}, \quad t \in [0,T], \]
where $\lambda^j$ are non-negative measurable bounded deterministic functions such that
\[ \int_0^T \lambda^j_s \,\mathrm{d}s < \infty, \qquad \forall\, t \ge 0, \quad j \in \{B, C\}. \]
We introduce two risky bonds with maturity $T^\star \le T$ and rate of return $r^j + \lambda^j$, issued by the bank and the counterparty, with dynamics
\[ \mathrm{d}P^j_t = \left( r^j_t + \lambda^j_t \right) P^j_t \,\mathrm{d}t - P^j_{t-} \,\mathrm{d}H^j_t, \qquad P^j_0 = e^{-\int_0^{T^\star \wedge \tau^j} (r^j_u + \lambda^j_u)\,\mathrm{d}u}, \qquad j \in \{B, C\}. \tag{2.5} \]

2.2. xVA framework.

We consider a family of contingent claims within a portfolio with agreed dividend stream $A^m = (A^m_t)_{t \in [0,T]}$, $m = 1, \dots, M$, and set $\bar{A}^m_t := 1_{\{t < \tau\}} A^m_t + 1_{\{t \ge \tau\}} A^m_{\tau-}$. The values of the single claims within the portfolio, ignoring any counterparty risk or funding issue, which we refer to as clean values, are denoted by $(\hat{V}^m_t)_{m=1,\dots,M}$ and satisfy the following FBSDEs, for $m = 1, \dots, M$,
\[ -\mathrm{d}\hat{V}^m_t = \mathrm{d}A^m_t - r_t \hat{V}^m_t \,\mathrm{d}t - \sum_{k=1}^d \hat{Z}^{k,m}_t \,\mathrm{d}W^{k,\mathbb{Q}}_t, \qquad \hat{V}^m_{T_m} = 0, \tag{2.6} \]
which reads, in integral form,
\[ \hat{V}^m_t := \mathbb{E}^{\mathbb{Q}}\left[ B^r_t \int_{(t, T_m]} \frac{\mathrm{d}A^m_u}{B^r_u} \,\middle|\, \mathcal{F}_t \right], \qquad t \in [0, T_m], \tag{2.7} \]
where $r$ is a collateral rate in an idealized perfect collateral agreement. For simplicity, we restrict ourselves to European-type contracts and write $A^m_t = 1_{\{t = T_m\}}\, g_m(S_{T_m})$, $T_m \le T$, for a family of Lipschitz functions $g_m$, $m = 1, \dots, M$. Then, equation (2.6) reads
\[ -\mathrm{d}\hat{V}^m_t = - r_t \hat{V}^m_t \,\mathrm{d}t - \sum_{k=1}^d \hat{Z}^{k,m}_t \,\mathrm{d}W^{k,\mathbb{Q}}_t, \qquad \hat{V}^m_{T_m} = g_m(S_{T_m}). \tag{2.8} \]
Observe that the system (2.3) and (2.8) is decoupled, in the sense that the forward equation (2.3) does not exhibit a dependence on the backward component.

We follow here the framework in Biagini et al. (2019), where the portfolio dynamics are stated in the form of a BSDE under the enlarged filtration $\mathbb{G}$.
We set
\[ Z^k_t := \sum_{i=1}^d \xi^i_t\, \sigma^{i,k}(t, S_t), \qquad k = 1, \dots, d, \tag{2.9a} \]
\[ U^j_t := -\xi^j_t P^j_{t-}, \qquad j \in \{B, C\}, \tag{2.9b} \]
\[ f(t, V, C) := -\left[ (r^{f,l}_t - r_t)(V_t - C_t)^+ - (r^{f,b}_t - r_t)(V_t - C_t)^- + (r^{c,l}_t - r_t)\, C^+_t - (r^{c,b}_t - r_t)\, C^-_t \right], \tag{2.9c} \]
where
• $\xi^i$, $i = 1, \dots, d$, are the positions in risky assets, while $\xi^B$, $\xi^C$ are the positions in the bank and counterparty bond, respectively;
• $r^{f,l}$, $r^{f,b}$ represent unsecured funding lending and borrowing rates;
• $r^{c,l}$, $r^{c,b}$ denote the interest on posted and received variation margin (collateral);
• $C^+$ and $C^-$ represent the posted and received variation margin/collateral.

All above processes are assumed to satisfy suitable regularity conditions ensuring existence and uniqueness for a solution to BSDE (2.10) below. Both posted and received collateral are assumed to be Lipschitz functions of the clean value of the derivative portfolio.

We denote by $V$ the full contract value, i.e. the portfolio value including counterparty risk and multiple curves. The $\mathbb{G}$-BSDE for the portfolio’s dynamics then has the form, on $\{\tau > t\}$,
\[
\begin{aligned}
-\mathrm{d}V_t &= \sum_{m=1}^M \mathrm{d}\bar{A}^m_t + \left( f(t, V, C) - r_t V_t \right) \mathrm{d}t - \sum_{k=1}^d Z^k_t \,\mathrm{d}W^{k,\mathbb{Q}}_t - \sum_{j \in \{B, C\}} U^j_t \,\mathrm{d}M^{j,\mathbb{Q}}_t, \\
V_\tau &= \theta_\tau(\hat{V}, C),
\end{aligned}
\tag{2.10}
\]
where
\[ \theta_\tau(\hat{V}, C) := \hat{V}_\tau + 1_{\{\tau^C < \tau^B\}} (1 - R^C) \left( \hat{V}_\tau - C_{\tau-} \right)^- - 1_{\{\tau^B < \tau^C\}} (1 - R^B) \left( \hat{V}_\tau - C_{\tau-} \right)^+, \]
$\hat{V}_t := \sum_{m=1}^M \hat{V}^m_t$, and $R^B$, $R^C$ are two positive constants representing the recovery rate of the bank and the counterparty, respectively.

In their Theorem 3.15, Biagini et al.
(2019) show that there exists a unique solution $(V, Z, U)$ for the $\mathbb{G}$-BSDE (2.10), and the process $V$ assumes the following form on $\{\tau > t\}$:
\[ V_t = B^r_t\, \mathbb{E}^{\mathbb{Q}}\left[ \sum_{m=1}^M \int_{(t, \tau \wedge T]} \frac{\mathrm{d}\bar{A}^m_u}{B^r_u} + \int_t^{\tau \wedge T} \frac{f(u, V, C)}{B^r_u} \,\mathrm{d}u + 1_{\{\tau \le T\}} \frac{\theta_\tau(\hat{V}, C)}{B^r_\tau} \,\middle|\, \mathcal{G}_t \right]. \tag{2.11} \]
To prove existence and uniqueness for the $\mathbb{G}$-BSDE, Biagini et al. (2019) employ the technique introduced by Crépey (2015a) and reformulate the problem under the reduced filtration $\mathbb{F}$. Stated in such a form, the problem is also more amenable to numerical computations, especially in the case where $f$ and $\theta$ do not depend on $V$, either explicitly, or implicitly through $C$.

We consider the following $\mathbb{F}$-BSDE on $[0,T]$:
\[ -\mathrm{d}\mathrm{XVA}_t = \bar{f}(\hat{V}_t, \mathrm{XVA}_t)\,\mathrm{d}t - \sum_{k=1}^d \bar{Z}^k_t \,\mathrm{d}W^{k,\mathbb{Q}}_t, \qquad \mathrm{XVA}_T = 0, \tag{2.12} \]
where
\[
\begin{aligned}
\bar{f}(\hat{V}_t, \mathrm{XVA}_t) := {} & -(1 - R^C) \left( \hat{V}_t - C_t \right)^- \lambda^{C,\mathbb{Q}}_t + (1 - R^B) \left( \hat{V}_t - C_t \right)^+ \lambda^{B,\mathbb{Q}}_t \\
& + (r^{f,l}_t - r_t) \left( \hat{V}_t - \mathrm{XVA}_t - C_t \right)^+ - (r^{f,b}_t - r_t) \left( \hat{V}_t - \mathrm{XVA}_t - C_t \right)^- \\
& + (r^{c,l}_t - r_t)\, C^+_t - (r^{c,b}_t - r_t)\, C^-_t - (r_t + \lambda^{C,\mathbb{Q}}_t + \lambda^{B,\mathbb{Q}}_t)\, \mathrm{XVA}_t.
\end{aligned}
\tag{2.13}
\]
By standard results on BSDEs, see e.g. Delong (2017, Theorem 4.1.3, Theorem 3.1.1), the existence and uniqueness of solutions $(\hat{V}^m, \hat{Z}^m) \in \mathbb{S}^2(\mathbb{R}) \times \mathbb{H}^{2,q \times 1}$, for $m = 1, \dots, M$, and $(\mathrm{XVA}, \bar{Z}) \in \mathbb{S}^2(\mathbb{R}) \times \mathbb{H}^{2,q \times 1}$ to, respectively, (2.8) and (2.12), holds under the following conditions: $r^{f,l}$, $r^{f,b}$, $r^{c,l}$, $r^{c,b}$, $r$ are bounded processes, and
\[ |\mu(t,x) - \mu(t,x')| + |\sigma(t,x) - \sigma(t,x')| \le C |x - x'|, \qquad |\sigma(t,x)| + |\mu(t,x)| \le C (1 + |x|), \]
for any $t \in [0,T]$, $x, x' \in \mathbb{R}^d$, for some constant $C \ge 0$.

We refer to XVA as the pre-default xVA process. Indeed, given the pre-default value process $\mathcal{V}$ such that $\mathcal{V}_t 1_{\{\tau > t\}} = V_t 1_{\{\tau > t\}}$, on $\{\tau > t\}$ the solution to (2.10) can be represented as $\mathcal{V}_t = \hat{V}_t - \mathrm{XVA}_t$.
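As an illustration of how the driver $\bar{f}$ in (2.13) is assembled term by term, a minimal sketch follows. It assumes that $x^+ = \max(x, 0)$ and $x^- = \max(-x, 0)$, and that $C^+$ and $C^-$ are the positive and negative parts of a single collateral process $C$; these conventions, the function name `xva_driver` and the argument names are assumptions of this sketch rather than definitions from the paper.

```python
import numpy as np

def xva_driver(v_hat, xva, c, r, r_fl, r_fb, r_cl, r_cb, lam_c, lam_b, rec_c, rec_b):
    """Driver f_bar(V_hat, XVA) of the xVA BSDE, evaluated element-wise.

    v_hat, xva, c : clean portfolio value, current XVA value, collateral
    r             : collateral (discount) rate
    r_fl, r_fb    : unsecured funding lending / borrowing rates
    r_cl, r_cb    : rates on posted / received collateral
    lam_c, lam_b  : default intensities of counterparty / bank
    rec_c, rec_b  : recovery rates of counterparty / bank
    """
    pos = lambda x: np.maximum(x, 0.0)    # x^+ (assumed convention)
    neg = lambda x: np.maximum(-x, 0.0)   # x^- (assumed convention)
    exposure = v_hat - c                  # enters the CVA and DVA terms
    funding = v_hat - xva - c             # enters the FVA terms (recursive in XVA)
    return (
        -(1.0 - rec_c) * neg(exposure) * lam_c    # CVA term
        + (1.0 - rec_b) * pos(exposure) * lam_b   # DVA term
        + (r_fl - r) * pos(funding)               # FVA, lending leg
        - (r_fb - r) * neg(funding)               # FVA, borrowing leg
        + (r_cl - r) * pos(c)                     # ColVA, posted collateral
        - (r_cb - r) * neg(c)                     # ColVA, received collateral
        - (r + lam_c + lam_b) * xva               # discounting at r + lam_C + lam_B
    )

# Hypothetical inputs: unit positive exposure, no collateral, no funding spread.
example_value = xva_driver(
    v_hat=1.0, xva=0.0, c=0.0, r=0.01, r_fl=0.01, r_fb=0.01,
    r_cl=0.01, r_cb=0.01, lam_c=0.01, lam_b=0.02, rec_c=0.4, rec_b=0.4,
)
```

In this example only the DVA term is active, so the driver reduces to $(1 - R^B)\,\lambda^{B}$ times the exposure. When the funding rates coincide with $r$, the driver depends on XVA only through the linear discounting term, which is the ‘non-recursive’ situation.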
Moreover, defining the process $\tilde{r} = (\tilde{r}_t)_{t \in [0,T]}$ as $\tilde{r} := r + \lambda^{C,\mathbb{Q}} + \lambda^{B,\mathbb{Q}}$, it has been shown in Biagini et al. (2019, Corollary 3.31) that the process XVA admits the representation
\[ \mathrm{XVA}_t = -\mathrm{CVA}_t + \mathrm{DVA}_t + \mathrm{FVA}_t + \mathrm{ColVA}_t, \tag{2.14} \]
where
\[ \mathrm{CVA}_t := B^{\tilde{r}}_t\, \mathbb{E}^{\mathbb{Q}}\left[ (1 - R^C) \int_t^T \frac{1}{B^{\tilde{r}}_u} \left( \hat{V}_u - C_u \right)^- \lambda^{C,\mathbb{Q}}_u \,\mathrm{d}u \,\middle|\, \mathcal{F}_t \right], \tag{2.15} \]
\[ \mathrm{DVA}_t := B^{\tilde{r}}_t\, \mathbb{E}^{\mathbb{Q}}\left[ (1 - R^B) \int_t^T \frac{1}{B^{\tilde{r}}_u} \left( \hat{V}_u - C_u \right)^+ \lambda^{B,\mathbb{Q}}_u \,\mathrm{d}u \,\middle|\, \mathcal{F}_t \right], \tag{2.16} \]
\[
\begin{aligned}
\mathrm{FVA}_t := {} & B^{\tilde{r}}_t\, \mathbb{E}^{\mathbb{Q}}\left[ \int_t^T \frac{(r^{f,l}_u - r_u) \left( \hat{V}_u - \mathrm{XVA}_u - C_u \right)^+}{B^{\tilde{r}}_u} \,\mathrm{d}u \,\middle|\, \mathcal{F}_t \right] \\
& - B^{\tilde{r}}_t\, \mathbb{E}^{\mathbb{Q}}\left[ \int_t^T \frac{(r^{f,b}_u - r_u) \left( \hat{V}_u - \mathrm{XVA}_u - C_u \right)^-}{B^{\tilde{r}}_u} \,\mathrm{d}u \,\middle|\, \mathcal{F}_t \right],
\end{aligned}
\tag{2.17}
\]
\[ \mathrm{ColVA}_t := B^{\tilde{r}}_t\, \mathbb{E}^{\mathbb{Q}}\left[ \int_t^T \frac{(r^{c,l}_u - r_u)\, C^+_u - (r^{c,b}_u - r_u)\, C^-_u}{B^{\tilde{r}}_u} \,\mathrm{d}u \,\middle|\, \mathcal{F}_t \right]. \tag{2.18} \]
This representation highlights that the inclusion of different borrowing and lending rates introduces a non-zero funding adjustment which cannot be found independently of the other adjustments. An algorithm to compute all valuation adjustments systematically in the ‘non-recursive’ and ‘recursive’ setting, especially with the view of potentially large portfolios, is the focus of the next sections.

3. The algorithm
In this section, we describe the algorithm for the computation of valuation adjustments by neural network approximations to the BSDE introduced in the previous section. We start by briefly recalling the main features of the deep BSDE solver in E et al. (2017). Then, we present the application of the solver to valuation adjustments and its extensions to obtain financially important quantities. We first focus on non-recursive adjustments, namely CVA and DVA, and then extend the approach to the recursive case. In particular, we propose to use the deep BSDE solver in E et al. (2017) to approximate the dynamics of $\hat{V}^m_u$, $m = 1, \dots, M$, $u \in [t, T]$, which constitute the portfolio $\hat{V}_u = \sum_{m=1}^M \hat{V}^m_u$. Once the portfolio value has been approximated and the resulting collateral computed, the value of the adjustment can be obtained either by inserting the values in an ‘outer’ Monte Carlo computation (observe that this strategy only works for non-recursive adjustments) or by applying the deep BSDE solver a second time to (2.12).
3.1. The deep BSDE solver of E et al. (2017).
We describe in this section the main ideas leading to the algorithm by E et al. (2017). We consider a general forward-backward stochastic differential equation (FBSDE) framework.

Let $(\Omega, \mathcal{F}, \mathbb{Q})$ be a probability space rich enough to support an $\mathbb{R}^d$-valued Brownian motion $W^{\mathbb{Q}} = (W^{\mathbb{Q}}_t)_{t \in [0,T]}$. Let $\mathbb{F} = (\mathcal{F}_t)_{t \in [0,T]}$ be the filtration generated by $W^{\mathbb{Q}}$, assumed to satisfy the standard assumptions. Let us consider an FBSDE in the following general form:
\[ X_t = x + \int_0^t b(s, X_s)\,\mathrm{d}s + \int_0^t a(s, X_s)^\top \,\mathrm{d}W^{\mathbb{Q}}_s, \qquad x \in \mathbb{R}^d, \tag{3.1} \]
\[ Y_t = \vartheta(X_T) + \int_t^T h(s, X_s, Y_s, Z_s)\,\mathrm{d}s - \int_t^T Z_s^\top \,\mathrm{d}W^{\mathbb{Q}}_s, \qquad t \in [0,T], \tag{3.2} \]
where the vector fields $b : [0,T] \times \mathbb{R}^d \to \mathbb{R}^d$, $a : [0,T] \times \mathbb{R}^d \to \mathbb{R}^{d \times d}$, $h : [0,T] \times \mathbb{R}^d \times \mathbb{R} \times \mathbb{R}^d \to \mathbb{R}$ and $\vartheta : \mathbb{R}^d \to \mathbb{R}$ satisfy suitable assumptions ensuring existence and uniqueness results. We denote by $(X^x_t)_{t \in [0,T]} \in \mathbb{S}^2(\mathbb{R}^d)$ and $(Y^y_t, Z_t)_{t \in [0,T]} \in \mathbb{S}^2(\mathbb{R}) \times \mathbb{H}^{2,q \times 1}$ the unique adapted solution to (3.1) and (3.2), respectively. To alleviate notation, hereafter we omit the dependency on the initial condition $x$ of the process $X^x_\cdot$.

The above formulation of FBSDEs is intrinsically linked to the following stochastic optimal control problem:
\[ \underset{y,\; Z = (Z_t)_{t \in [0,T]}}{\text{minimise}} \;\; \mathbb{E}\left[ \left| \vartheta(X_T) - Y^y_T \right|^2 \right] \quad \text{subject to (3.1)–(3.2)}. \tag{3.3} \]
In particular, a solution $(Y, Z)$ to (3.2) is a minimiser of the problem (3.3). A discretized version of the optimal control problem (3.3) is the basis of the deep BSDE solver.

Given $N \in \mathbb{N}$, consider $0 = t_0 < t_1 < \dots < t_N = T$. For simplicity, let us take a uniform mesh with step $\Delta t$ such that $t_n = n \Delta t$, $n = 0, \dots, N$, and denote $\Delta W_n = W^{\mathbb{Q}}_{t_{n+1}} - W^{\mathbb{Q}}_{t_n}$.
By an Euler-Maruyama approximation of (3.1)–(3.2), one has
\[ \tilde{X}_{n+1} = \tilde{X}_n + b(t_n, \tilde{X}_n)\,\Delta t + a(t_n, \tilde{X}_n)^\top \Delta W_n, \qquad \tilde{X}_0 = x, \tag{3.4} \]
\[ \tilde{Y}^y_{n+1} = \tilde{Y}_n - h(t_n, \tilde{X}_n, \tilde{Y}_n, \tilde{Z}_n)\,\Delta t + \tilde{Z}_n^\top \Delta W_n, \qquad \tilde{Y}^y_0 = y. \tag{3.5} \]
The drift term in (3.4) enters with a positive sign, in agreement with (3.1), whereas the driver in (3.5) enters with a negative sign because (3.2) is written backward in time.

The core idea of the deep BSDE solver is to approximate, at each time step $n$, the control process $\tilde{Z}_n$ in (3.5) by using an artificial neural network (ANN). More specifically, in the Markovian setting, $Z_t$ is a measurable function of $X_t$, which we approximate by an ANN ansatz and carry out the optimisation above over this parametrised form. For this we introduce next a formalism for the description of neural networks.

ANN approximation.
We consider artificial neural networks with $L + 1 \in \mathbb{N} \setminus \{1, 2\}$ layers. Each layer consists of $\nu_\ell$ nodes (also called neurons), for $\ell = 0, \dots, L$. The 0-th layer represents the input layer, whereas the $L$-th layer is called the output layer. The remaining $L - 1$ layers are called hidden layers. For simplicity we set $\nu_\ell = \nu$, $\ell = 1, \dots, L - 1$. The input and output dimensions are both $d$ in our case.

A feedforward neural network is a function $\varphi^\varrho : \mathbb{R}^d \to \mathbb{R}^d$ defined via the composition
\[ x \in \mathbb{R}^d \longmapsto A_L \circ \varrho \circ A_{L-1} \circ \dots \circ \varrho \circ A_1(x) \in \mathbb{R}^d, \]
where all $A_\ell$, $\ell = 1, \dots, L$, are affine transformations
\[ A_1 : \mathbb{R}^d \to \mathbb{R}^\nu, \qquad A_\ell : \mathbb{R}^\nu \to \mathbb{R}^\nu, \quad \ell = 2, \dots, L - 1, \qquad A_L : \mathbb{R}^\nu \to \mathbb{R}^d, \]
of the form $A_\ell(x) := W_\ell x + \beta_\ell$, $\ell = 1, \dots, L$, where $W_\ell$ and $\beta_\ell$ are matrices and vectors of suitable size called, respectively, weights and biases. The function $\varrho$, called activation function, is a univariate function $\varrho : \mathbb{R} \to \mathbb{R}$ that is applied component-wise to vectors. With an abuse of notation, we denote $\varrho(x_1, \dots, x_\nu) = (\varrho(x_1), \dots, \varrho(x_\nu))$. The elements of the weights $W_\ell$ and of the vectors $\beta_\ell$ are the parameters of the neural network. We can regroup all parameters in a vector $\rho \in \mathbb{R}^R$ where $R = \sum_{\ell=0}^{L} \nu_\ell (1 + \nu_\ell)$.

Figure 1. Schematic representation of a feedforward neural network with two hidden layers, i.e. $L = 3$, input and output dimension $d = 4$, and $\nu = d + 2 = 6$ nodes.

As announced, we use ANNs to approximate the control process $Z_t$. More specifically, let $R \in \mathbb{N}$ as before and let $\xi \in \mathbb{R}$, $\rho \equiv (\rho_1, \dots, \rho_R) \in \mathbb{R}^R$ be $R + 1$ parameters. We introduce a family of neural networks $\varphi^\rho_n : \mathbb{R}^d \to \mathbb{R}^d$, $n \in \{1, \dots, N\}$, parametrized by $\rho$ and indexed by time. We denote $Z^\rho_n = \varphi^\rho_n(X_n)$ and consider the following parametrized version of (3.5):
\[ Y^{\rho,\xi}_{n+1} = Y^{\rho,\xi}_n - h(t_n, X_n, Y^{\rho,\xi}_n, Z^\rho_n)\,\Delta t + (Z^\rho_n)^\top \Delta W_n, \qquad Y^{\rho,\xi}_0 = \xi, \tag{3.6} \]
meaning that, at each time step, we use a distinct neural network to approximate the control process. The deep BSDE solver by E et al. (2017) considers the following stochastic optimization problem:
\[ \underset{\xi \in \mathbb{R},\; \rho \in \mathbb{R}^R}{\text{minimise}} \;\; \mathbb{E}\left[ \left( \vartheta(X_N) - Y^{\rho,\xi}_N \right)^2 \right] \quad \text{subject to (3.4)–(3.6)}. \tag{3.7} \]
Observe that, in practice, one simulates $L \in \mathbb{N}$ Monte Carlo paths $(X^{(\ell)}_n, Y^{\xi,\rho,(\ell)}_n)_{n=0,\dots,N}$ for $\ell = 1, \dots, L$ (here $L$ denotes the number of paths, not the number of layers), using (3.4)–(3.6) with $N$ i.i.d. Gaussian random variables $(\Delta W_n)_{n=0,\dots,N-1}$ with mean 0 and variance $\Delta t$.
Replacing the expected cost functional by the empirical mean, (3.7) becomes
\[ \underset{\xi \in \mathbb{R},\; \rho \in \mathbb{R}^R}{\text{minimise}} \;\; \frac{1}{L} \sum_{\ell=1}^{L} \left( \vartheta(X^{(\ell)}_N) - Y^{\rho,\xi,(\ell)}_N \right)^2 \quad \text{subject to (3.4)–(3.6)}. \tag{3.8} \]
This minimization typically involves a huge number of parameters and it is performed by a stochastic gradient descent-type algorithm (SGD), leading to random approximations. For further details on this point we refer the reader to Section 2.6 in E et al. (2017). We will denote by $I$ the maximum number of SGD iterations. To improve the performance and stability of the ANN approximation, a batch normalization can also be considered, see Ioffe and Szegedy (2015). However, in our framework, this normalization does not always have a positive impact on the results and we will only apply it when it results in some numerical improvement.

A rigorous and complete theoretical convergence framework for the deep BSDE solver is not available to date. Interesting a posteriori error bounds can be found in Han and Long (2018, Theorem 1’), where the authors show that under suitable assumptions on the coefficients of the FBSDE (3.1)–(3.2) (namely the monotonicity of $b$ and $h$, the Lipschitz continuity in space, Hölder continuity in time, linear growth of $b$, $a$, $h$ and the Lipschitz continuity of $\vartheta$) one has, for $\Delta t$ sufficiently small,
\[ \sup_{t \in [0,T]} \mathbb{E}\left| Y^y_t - \tilde{Y}^{\rho,\xi}_t \right|^2 + \int_0^T \mathbb{E}\left| Z_t - \tilde{Z}^\rho_t \right|^2 \mathrm{d}t \le C \left( \Delta t + \mathbb{E}\left[ \left( \vartheta(X_N) - Y^{\rho,\xi}_N \right)^2 \right] \right), \tag{3.9} \]
where $C$ is a constant independent of $\Delta t$ and $d$, possibly depending on the starting point of the forward process, and, given $(Y^{\rho,\xi}_n)_{n=0,\dots,N}$ from (3.6), $\tilde{Y}^{\rho,\xi}_t = Y^{\rho,\xi}_n$ and $\tilde{Z}^\rho_t = Z^\rho_n$ for $t \in [t_n, t_{n+1})$.

In Han and Long (2018, Theorem 2’), a priori estimates on the term $\mathbb{E}[(\vartheta(X_N) - Y^{\rho,\xi}_N)^2]$ are also provided. However, the obtained bounds depend on the (unknown) approximation capacity of the considered ANN.
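The forward rollout (3.6) and the empirical loss (3.8) can be sketched as follows. This is only an illustration under simplifying assumptions: one small network per time step $n = 0, \dots, N-1$ with a single hidden layer and tanh activation, plain NumPy, and no training loop. The optimisation over $\xi$ and $\rho$ by SGD is deliberately omitted and would in practice be delegated to an automatic-differentiation library. All function and variable names are hypothetical.

```python
import numpy as np

def init_params(d, nu, n_steps, seed=0):
    """One network A_2 o tanh o A_1 per time step (assumed architecture)."""
    rng = np.random.default_rng(seed)
    return [
        {
            "W1": rng.normal(0.0, 1.0 / np.sqrt(d), (nu, d)), "b1": np.zeros(nu),
            "W2": rng.normal(0.0, 1.0 / np.sqrt(nu), (d, nu)), "b2": np.zeros(d),
        }
        for _ in range(n_steps)
    ]

def network(params, x):
    """Feedforward map R^d -> R^d applied to a batch x of shape (paths, d)."""
    hidden = np.tanh(x @ params["W1"].T + params["b1"])
    return hidden @ params["W2"].T + params["b2"]

def empirical_loss(xi, nets, X, dW, h, theta, dt):
    """Empirical loss (3.8) for given parameters (xi, nets).

    X  : forward paths, shape (paths, N + 1, d)
    dW : Brownian increments, shape (paths, N, d)
    """
    n_paths, n_steps, _ = dW.shape
    Y = np.full(n_paths, float(xi))                # Y_0 = xi on every path
    for n in range(n_steps):
        Z = network(nets[n], X[:, n, :])           # Z_n = phi_n(X_n)
        Y = Y - h(n * dt, X[:, n, :], Y, Z) * dt + np.sum(Z * dW[:, n, :], axis=1)
    return float(np.mean((theta(X[:, -1, :]) - Y) ** 2))

# Sanity check on a case with known solution: X is a Brownian motion started
# at x0 (b = 0, a = identity), h = 0 and theta(x) = sum(x). Then
# Y_t = sum(X_t) and Z_t = (1, ..., 1), so the exact control gives zero loss.
rng = np.random.default_rng(1)
n_paths, n_steps, d = 64, 20, 3
dt = 1.0 / n_steps
dW = rng.normal(0.0, np.sqrt(dt), (n_paths, n_steps, d))
x0 = np.array([1.0, 2.0, 3.0])
X = np.concatenate([np.tile(x0, (n_paths, 1, 1)), x0 + np.cumsum(dW, axis=1)], axis=1)

h_zero = lambda t, x, y, z: np.zeros_like(y)
theta_sum = lambda x: x.sum(axis=1)

nets = init_params(d, nu=5, n_steps=n_steps)
loss_random = empirical_loss(0.0, nets, X, dW, h_zero, theta_sum, dt)

for p in nets:            # force every network to output the exact control
    p["W2"][:] = 0.0
    p["b2"][:] = 1.0
loss_exact = empirical_loss(x0.sum(), nets, X, dW, h_zero, theta_sum, dt)
```

With the exact initial value and control, `loss_exact` vanishes up to floating-point round-off, whereas `loss_random` is what an optimiser would start from and reduce.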
To corroborate this idea, we mention that in our numerical tests we experienced important variability of results depending on the different structure of the ANN used (see also the results in Figure 5).

3.2. The Deep xVA Solver for non recursive valuation adjustments.
In our setting, the deep BSDE solver is first employed in the approximation of the clean values of the portfolio, i.e., the processes $\hat{V}^m_t$ for $m = 1, \dots, M$, which are the solutions of (2.8) with underlying forward dynamics given by $S$ in (2.3). More precisely, in the notation of the previous section, we take
\[ X_t = S_t \quad \text{and} \quad Y_t = \hat{V}^m_t \quad \text{for } m = 1, \dots, M. \]
We now describe the algorithm for computing CVA and DVA given by formulas (2.15) and (2.16), respectively. A unifying formula for CVA and DVA can be written as
\[ B^{\tilde{r}}_t\, \mathbb{E}^{\mathbb{Q}}\left[ \int_t^T \Phi(u, \hat{V}_u)\,\mathrm{d}u \,\middle|\, \mathcal{F}_t \right], \tag{3.10} \]
where
• $\Phi(u, v) = (1 - R^C)\, \dfrac{1}{B^{\tilde{r}}_u}\, (v - C_u)^- \lambda^{C,\mathbb{Q}}_u$ for CVA;
• $\Phi(u, v) = (1 - R^B)\, \dfrac{1}{B^{\tilde{r}}_u}\, (v - C_u)^+ \lambda^{B,\mathbb{Q}}_u$ for DVA.

Given a time discretization (uniform, for simplicity) with time step $\Delta t$, the integral in (3.10) can be approximated by a quadrature rule, i.e.
\[ \int_0^T \Phi(u, \hat{V}_u)\,\mathrm{d}u \approx \sum_{n=0}^{N} \eta_n\, \Phi(t_n, \hat{V}_{t_n}). \]
For instance, taking $t = t_0 = 0$, one may consider the trapezoidal rule
\[ \int_0^T \Phi(u, \hat{V}_u)\,\mathrm{d}u \approx \sum_{n=1}^{N} \frac{\Delta t}{2} \left( \Phi(t_n, \hat{V}_{t_n}) + \Phi(t_{n-1}, \hat{V}_{t_{n-1}}) \right). \]
Denoting for any $m = 1, \dots, M$ by $(V^{m,\bar{\rho},\bar{\xi},(p)}_n)_{n=0,\dots,N,\, p=1,\dots,P}$ the approximation of the process $(\hat{V}^m_{t_n})_{n=0,\dots,N}$ obtained by means of the parameters $(\bar{\rho}, \bar{\xi})$ resulting from the deep BSDE solver optimization (3.7), the adjustment is then approximated by the following formula:
\[ \frac{1}{P} \sum_{p=1}^{P} \left( \sum_{n=0}^{N} \eta_n\, \Phi\Big( t_n, \sum_{m=1}^{M} V^{m,\bar{\rho},\bar{\xi},(p)}_n \Big) \right). \]
Algorithms 1 and 2 summarize the main steps of the method.
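The outer Monte Carlo estimator with trapezoidal weights $\eta_n$ can be sketched as below. For brevity the sketch assumes a time-0 estimate (so that $B^{\tilde{r}}_0 = 1$), a constant rate $\tilde{r}$ and a constant default intensity; in the general setting these would be time-dependent. The function names and the array layout are hypothetical, and the clean portfolio paths are taken as given (for instance, produced by a trained deep BSDE solver).

```python
import numpy as np

def trapezoid_weights(n_steps, dt):
    """Quadrature weights eta_0, ..., eta_N of the trapezoidal rule on a uniform grid."""
    eta = np.full(n_steps + 1, dt)
    eta[0] = eta[-1] = dt / 2.0
    return eta

def cva_dva_estimate(v_paths, c_paths, lam, rec, r_tilde, dt, kind="cva"):
    """Time-0 Monte Carlo estimate of CVA (kind="cva") or DVA (kind="dva").

    v_paths : simulated clean portfolio values, shape (P, N + 1)
    c_paths : collateral along the same paths, shape (P, N + 1)
    lam     : default intensity (counterparty for CVA, bank for DVA), constant here
    rec     : recovery rate (R^C for CVA, R^B for DVA)
    r_tilde : constant rate r + lam_C + lam_B used for discounting
    """
    n_steps = v_paths.shape[1] - 1
    t = dt * np.arange(n_steps + 1)
    discount = np.exp(-r_tilde * t)             # 1 / B^{r~}_u for constant r~
    exposure = v_paths - c_paths
    if kind == "cva":
        part = np.maximum(-exposure, 0.0)       # negative part of the exposure
    else:
        part = np.maximum(exposure, 0.0)        # positive part of the exposure
    phi = (1.0 - rec) * discount * part * lam   # Phi(t_n, V_n) on every path
    eta = trapezoid_weights(n_steps, dt)
    return float(np.mean(phi @ eta))            # average of the pathwise quadratures

# Hypothetical check: constant unit exposure, no collateral, no discounting.
v_const = np.ones((4, 11))
c_zero = np.zeros((4, 11))
dva_example = cva_dva_estimate(v_const, c_zero, lam=0.02, rec=0.4, r_tilde=0.0, dt=0.1, kind="dva")
cva_example = cva_dva_estimate(v_const, c_zero, lam=0.02, rec=0.4, r_tilde=0.0, dt=0.1, kind="cva")
```

In this degenerate example the integrand is constant, so the DVA estimate equals $(1 - R^B)\,\lambda^B\, T$ with $T = 1$, while the CVA estimate is zero because the exposure is never negative.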
Algorithm 1: Deep algorithm for exposure simulation
  Set parameters: $N$, $L$.    ▷ $N$ time steps, $L$ paths for inner Monte Carlo loop
  Fix architecture of ANN.    ▷ intrinsically defines the number of parameters $R$
  Deep BSDE solver ($N$, $L$):
    Simulate $L$ paths $(S^{(\ell)}_n)_{n=0,\dots,N}$, $\ell = 1, \dots, L$, of the forward dynamics.
    Define the neural networks $(\varphi^{\rho}_n)_{n=1,\dots,N}$.
    for $m = 1, \dots, M$ do
      Minimize over $\xi$ and $\rho$
        $$\frac{1}{L} \sum_{\ell=1}^{L} \left( g_m\big(S^{(\ell)}_N\big) - V^{m,\rho,\xi,(\ell)}_N \right)^2,$$
      subject to
        $$V^{m,\rho,\xi,(\ell)}_{n+1} = V^{m,\rho,\xi,(\ell)}_n - f\big(t_n, S^{(\ell)}_n, V^{m,\rho,\xi,(\ell)}_n, Z^{\rho,(\ell)}_n\big)\,\Delta t + \big(Z^{\rho,(\ell)}_n\big)^{\top} \Delta W^{(\ell)}_n, \qquad V^{m,\rho,\xi,(\ell)}_0 = \xi, \qquad Z^{\rho,(\ell)}_n = \varphi^{\rho}_n\big(S^{(\ell)}_n\big). \tag{3.11}$$
      Save the optimizer $(\bar{\xi}_m, \bar{\rho}_m)$.
    end
  end

Algorithm 2: Deep xVA Solver for non-recursive valuation adjustments
  Apply Algorithm 1.
  Set parameters: $P$.    ▷ $P$ paths for the outer Monte Carlo loop
  Simulate, for $m = 1, \dots, M$, $(V^{m,(p)}_n)_{n=0,\dots,N,\ p=1,\dots,P}$ by means of (3.11) with $\xi = \bar{\xi}_m$, $\rho = \bar{\rho}_m$.    ▷ approximation of the clean values
  Define $V^{(p)}_n := \sum_{m=1}^{M} V^{m,(p)}_n$ for $n = 0, \dots, N$, $p = 1, \dots, P$.    ▷ approximation of the clean portfolio value
  Compute the adjustment as
    $$\frac{1}{P} \sum_{p=1}^{P} \left( \sum_{n=0}^{N} \eta_n \, \Phi\big(t_n, V^{(p)}_n\big) \right).$$

Estimates (3.9) can be used to obtain a posteriori bounds on the $L^2$ error for exposures in $[0, T]$ starting from the loss function; however, the Monte Carlo error should be added to obtain a computable bound.

3.3. The Deep xVA Solver for recursive valuation adjustments.
The procedure of the previous section is sufficient to perform the estimation of CVA and DVA according to (2.15) and (2.16) at time zero by means of a standard Monte Carlo estimator, given the pathwise solutions of the BSDEs for clean values. Typically, however, the bank also needs to compute risk measures on the CVA, such as Value-at-Risk. Moreover, the driver of the xVA BSDE (2.12) shows that the FVA terms introduce a recursive structure, so that even a time-$t$ estimate of the process XVA requires a numerical solver for a BSDE. Finally, the bank is interested not only in computing the xVA at time $t$ but also in hedging the market risk of xVA, which requires sensitivities of the valuation adjustments with respect to the driving risk factors.

These considerations motivate a two-step procedure: we first employ the deep BSDE solver to estimate the clean values $\hat{V}^m$, $m = 1, \dots, M$, according to Algorithm 1; then, using the simulated paths of the $M$ clean BSDEs obtained from the first step, we apply the deep BSDE solver again to solve the xVA BSDE (2.12) numerically. The procedure is outlined in Algorithm 3.

Algorithm 3: Deep xVA Solver
  Apply Algorithm 1.
  Set parameters: $P$.    ▷ $P$ paths for outer Monte Carlo loop
  Fix architecture of ANN.    ▷ intrinsically defines the number of parameters $\bar{R}$ (in general $\bar{R} \neq R$)
  Deep XVA-BSDE solver ($N$, $P$):
    Simulate $P$ paths $(V^{(p)}_n)_{n=0,\dots,N}$, $p = 1, \dots, P$, of the portfolio value.
    Define the neural networks $(\psi^{\zeta}_n)_{n=1,\dots,N}$.
    Minimize over $\gamma$ and $\zeta$
      $$\frac{1}{P} \sum_{p=1}^{P} \left( X^{\zeta,\gamma,(p)}_N \right)^2,$$
    subject to
      $$X^{\zeta,\gamma,(p)}_{n+1} = X^{\zeta,\gamma,(p)}_n - \bar{f}\big(V^{(p)}_n, X^{\zeta,\gamma,(p)}_n\big)\,\Delta t + \big(\bar{Z}^{\zeta,(p)}_n\big)^{\top} \Delta W^{(p)}_n, \qquad X^{\zeta,\gamma,(p)}_0 = \gamma, \qquad \bar{Z}^{\zeta,(p)}_n = \psi^{\zeta}_n\big(V^{(p)}_n\big). \tag{3.12}$$
  end

3.4. Pathwise simulation of sensitivities.
One interesting feature of our approach to xVA computations is that we can easily estimate several sensitivities (i.e., partial derivatives) of pricing functions. Let us recall that, in the present Markovian setting, the control $Z$ associated to a FBSDE of the general form (3.1)–(3.2) satisfies

$$Z_t = \frac{\partial Y}{\partial X}(t, X_t)\, a(t, X_t), \tag{3.13}$$

so that we can reconstruct the gradient of the pricing function with respect to all risk factors simply by multiplying each (vector-valued) neural network by the inverse (which is assumed to exist) of the matrix $a(t, X_t)$. This becomes particularly interesting in view of Algorithms 1 and 3, where we can obtain hedge ratios both for the clean value and for the valuation adjustments without further computations.

Obtaining second-order sensitivities, which may also be important for hedging purposes, is also feasible in our setting, because feedforward neural networks are compositions of simple functions and the computation of gradients of neural network functions has become standard in the machine learning community. Using the notation of Section 3.1, we can write

$$\frac{\partial Z^{\rho}_n}{\partial X_n} = \frac{\partial \varphi^{\rho}_n(X_n)}{\partial X_n}, \tag{3.14}$$

with $\varphi^{\rho}_n(X_n) = A_{\mathcal{L}}(\varrho(A_{\mathcal{L}-1}(\dots \varrho(A_1(X_n)))))$. Since $(A_\ell)_{\ell=1,\dots,\mathcal{L}}$ are affine functions, their Jacobians are given by the weight matrices, i.e. $J_{A_\ell}(\cdot) = W_\ell$, $\ell = 1, \dots, \mathcal{L}$. Moreover, the Jacobian of the activation function $\varrho$ is

$$J_{\varrho}(\cdot) = \operatorname{diag}\big(\varrho'(\cdot)\big),$$

where, for $x \in \mathbb{R}^{\nu}$, we denote $\varrho'(x) = (\varrho'(x_1), \dots, \varrho'(x_\nu))$. In the present paper, we choose $\varrho(x) = \operatorname{ReLU}(x) = \max\{x, 0\}$, so that the first derivative can be defined as

$$\varrho'(x) = \operatorname{ReLU}'(x) = \begin{cases} 1 & x > 0 \\ 0 & \text{otherwise} \end{cases} = \operatorname{sgn}(\operatorname{ReLU}(x)).$$

Finally, we deduce that the following explicit differentiation formula holds:

$$\frac{\partial Z^{\rho}_n}{\partial X_n} = W_{\mathcal{L}} \operatorname{diag}\big(\varrho'(A_{\mathcal{L}-1}(\dots A_1(X_n)))\big) \cdots \operatorname{diag}\big(\varrho'(A_1(X_n))\big)\, W_1.$$

Given the availability of the derivative of $Z^{\rho}_n$, we can then obtain the Hessian of $Y$ from (3.13).

4. Numerical results
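Before turning to the examples, the explicit differentiation formula for $\partial Z^{\rho}_n / \partial X_n$ derived above can be checked numerically on a toy network. The sketch below is an illustration only: the weights, biases and evaluation point are arbitrary (hypothetical) values, the network has a single hidden ReLU layer, and the helper names `forward`, `jacobian` and `finite_difference_jacobian` are assumptions rather than part of any library. It compares the product of weight matrices and diagonal ReLU masks with a central finite-difference approximation.

```python
def relu(x):
    return [max(v, 0.0) for v in x]

def affine(W, b, x):
    """A(x) = W x + b for a weight matrix W (list of rows) and a bias vector b."""
    return [sum(w * v for w, v in zip(row, x)) + bi for row, bi in zip(W, b)]

def forward(layers, x):
    """phi(x) = A_L(relu(A_{L-1}(... relu(A_1(x)))))."""
    for W, b in layers[:-1]:
        x = relu(affine(W, b, x))
    W, b = layers[-1]
    return affine(W, b, x)

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def jacobian(layers, x):
    """Explicit formula: W_L diag(relu'(a_{L-1})) W_{L-1} ... diag(relu'(a_1)) W_1."""
    J = [row[:] for row in layers[0][0]]                 # start from W_1
    for (W_prev, b_prev), (W_next, _) in zip(layers[:-1], layers[1:]):
        pre = affine(W_prev, b_prev, x)                  # pre-activation of this layer
        mask = [1.0 if v > 0.0 else 0.0 for v in pre]    # ReLU'(pre)
        J = [[m * v for v in row] for m, row in zip(mask, J)]  # diag(mask) J
        J = matmul(W_next, J)                            # multiply by next weights
        x = relu(pre)
    return J

def finite_difference_jacobian(layers, x, h=1e-6):
    """Central differences; entry [i][j] approximates d phi_i / d x_j."""
    cols = []
    for j in range(len(x)):
        up, dn = x[:], x[:]
        up[j] += h
        dn[j] -= h
        f_up, f_dn = forward(layers, up), forward(layers, dn)
        cols.append([(a - b) / (2 * h) for a, b in zip(f_up, f_dn)])
    return [list(row) for row in zip(*cols)]

# Hypothetical toy network: 2 inputs, 3 hidden ReLU nodes, 2 outputs.
toy_layers = [
    ([[1.0, -2.0], [0.5, 1.0], [-1.0, 0.3]], [0.1, -0.2, 0.05]),
    ([[1.0, 0.0, 2.0], [-1.0, 1.5, 0.5]], [0.0, 0.1]),
]
x0 = [0.7, 0.2]
J_exact = jacobian(toy_layers, x0)
J_fd = finite_difference_jacobian(toy_layers, x0)
```

At the chosen point the third hidden node is inactive, so the corresponding row of $W_1$ is masked out. Away from the kinks of the ReLU the two Jacobians agree up to floating-point error; in a deep learning framework the same quantity would normally be obtained by automatic differentiation.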
To test our algorithm, we start by studying two very simple examples with a computational structure similar to that of CVA and DVA, for which we can easily provide reference solutions. We then give a higher-dimensional example and illustrate further practically relevant features of the method, such as recursive xVA computations and simulation of the collateral account.

Let $S$ be the price of a single stock described by Black-Scholes dynamics,

$$\mathrm{d}S_t = r S_t \,\mathrm{d}t + \sigma S_t \,\mathrm{d}W^{\mathbb{Q}}_t, \qquad S_0 = s_0,$$

and $\hat{V}$ a European-style contingent claim with value

$$\hat{V}_t = \mathbb{E}^{\mathbb{Q}}\left[ e^{-r(T-t)} g(S_T) \,\big|\, \mathcal{F}_t \right].$$

In particular, $\hat{V}$ solves the following BSDE:

$$-\mathrm{d}\hat{V}_t = -r \hat{V}_t \,\mathrm{d}t - Z_t \,\mathrm{d}W^{\mathbb{Q}}_t, \qquad \hat{V}_T = g(S_T). \tag{4.1}$$

The discounted positive and negative expected exposure of $\hat{V}$ are defined, respectively, by

$$\mathrm{DEPE}(s) = \mathbb{E}^{\mathbb{Q}}\left[ e^{-r(s-t)} \big(\hat{V}_s\big)^{+} \,\Big|\, \mathcal{F}_t \right], \tag{4.2}$$

$$\mathrm{DENE}(s) = -\mathbb{E}^{\mathbb{Q}}\left[ e^{-r(s-t)} \big(\hat{V}_s\big)^{-} \,\Big|\, \mathcal{F}_t \right]. \tag{4.3}$$

In the plots below, we show some promising results obtained from particular realisations of the Deep xVA Solver. A more careful evaluation of the numerical results should also take into account the randomness of the algorithm (through the inner and outer Monte Carlo estimation and stochastic gradient descent).

4.1. A forward on S.

In this case we consider $g(S_T) = S_T - K$ with $K = s_0$. The pathwise exposure $\hat{V}$ at time $s \in [t, T]$ is given by

$$\hat{V}_s = \mathbb{E}^{\mathbb{Q}}\left[ e^{-r(T-s)} (S_T - K) \,\big|\, \mathcal{F}_s \right] = S_s - K e^{-r(T-s)}.$$

Substituting in (4.2) and (4.3), one has

$$\mathrm{DEPE}(s) = S_t \Phi(d_1) - K e^{-r(T-t)} \Phi(d_2), \tag{4.4}$$

$$\mathrm{DENE}(s) = S_t \Phi(-d_1) - K e^{-r(T-t)} \Phi(-d_2), \tag{4.5}$$

where $\Phi(\cdot)$ denotes the standard normal cumulative distribution function and, since the exposure at time $s$ is that of a call on $S_s$ with strike $K e^{-r(T-s)}$,

$$d_1 = \frac{\ln\big(e^{r(T-s)} S_t / K\big) + \big(r + \sigma^2/2\big)(s - t)}{\sigma \sqrt{s - t}} \qquad \text{and} \qquad d_2 = d_1 - \sigma \sqrt{s - t}.$$

Table 1. Parameters used in numerical experiments.

  σ      K     T
  0.25   100   1

Figure 2. Forward contract: approximated exposure (left) and EPE, ENE (right). Parameters used: outer MC paths $P = 2048$, inner MC paths $L = 64$, internal layers $\mathcal{L} - 1 = \dots$, $\nu = d + 20 = 21$, $I = 4000$, time steps $N = 200$, no batch normalization.

We report in Figure 2 the plot of the numerical results obtained by Algorithm 2 using the parameters in Table 1 and $r = 0$. In particular, on the left we plot the simulated pathwise exposure, i.e. the paths $t_n \mapsto V^{(p)}_n$ for $p = 1, \dots, P$, while on the right we compare the approximated EPE and ENE (solid lines) with the exact expected exposures given by (4.4)–(4.5) (dashed lines).

4.2. A European call option.
In this case we consider $g(S_T) = (S_T - K)^{+}$, where we set $K = s_0$. The pathwise exposure $\hat{V}$ at time $s \in [t, T]$ is given by the Black-Scholes formula

$$\hat{V}_s = \mathbb{E}^{\mathbb{Q}}\left[ e^{-r(T-s)} (S_T - K)^{+} \,\big|\, \mathcal{F}_s \right] = S_s \Phi(d_1) - K e^{-r(T-s)} \Phi(d_2) > 0,$$

where $d_1$ and $d_2$ are now the usual Black-Scholes quantities evaluated at $S_s$ with time to maturity $T - s$. It follows immediately that

$$\mathrm{DEPE}(s) = \mathbb{E}^{\mathbb{Q}}\left[ e^{-r(s-t)} \hat{V}_s \,\big|\, \mathcal{F}_t \right] = \hat{V}_t \qquad \text{and} \qquad \mathrm{DENE}(s) = 0.$$

The results obtained using Algorithm 2 with the parameters in Table 1 and $r = 0.01$ are reported in Figure 3 (left). The exact European call price is 10.40, while the approximation of the exposure obtained by the solver and reported in Figure 3 (left) takes values, for $t \in [0, T]$, within the interval [10.…, ….33], with maximum distance 0.09 to the exact solution.

4.3. A basket call option.
Let us now consider the case of several underlying assets $(S^1, \dots, S^d)$:

$$\mathrm{d}S^i_t = r_i S^i_t \,\mathrm{d}t + \sigma_i S^i_t \,\mathrm{d}W^{\mathbb{Q},i}_t, \qquad S_0 = (s^1_0, \dots, s^d_0) \in \mathbb{R}^d, \qquad i = 1, \dots, d,$$

where $W^{\mathbb{Q}} = (W^{\mathbb{Q},1}, \dots, W^{\mathbb{Q},d})$ is a standard Brownian motion in $\mathbb{R}^d$ with correlation matrix $(\rho_{i,j})_{1 \le i,j \le d}$. We set $d = 100$. A European basket call option is associated with the payoff

$$g(S^1_T, \dots, S^d_T) = \left( \sum_{i=1}^{d} S^i_T - K \right)^{+}.$$

The results obtained by Algorithm 2 using the parameters in Table 1 with $\sigma_i = \sigma$ for all $i = 1, \dots, d$, zero correlation, and $r_i = r = 0.01$ are reported in Figure 3 (right).

Figure 3. DEPE and DENE for a European call option (left) and a European basket option with 100 underlyings (right). Parameters used: outer MC paths $P = 1024$, inner MC paths $L = 64$, internal layers $\mathcal{L} - 1 = 2$, $\nu = d + 20 = 21$ (left) and $\nu = d + 10 = 110$ (right), $I = 4000$, time steps $N = 100$, with batch normalization.

The distinctive feature of the present example is the high dimension of the vector of risk factors. While the two previous one-dimensional examples mainly served as a validation of the methodology, the present example highlights its ability to provide an accurate numerical approximation in a high-dimensional context. For this example, we used a feedforward neural network with two layers and $d + 10$ nodes, with a ReLU activation function. The approximation parameters used are reported in the caption of Figure 3 (right). We increase the number of nodes $\nu$ roughly linearly with the dimension $d$, which turned out to be a useful rule of thumb for consistent accuracy across dimensions in this case.

For a detailed study of deep learning of the values of basket derivatives (on six underlying assets) from simulated values, not based on BSDEs, see Ferguson and Green (2018).

For the case of the basket call option, we observe that the exposure profile corresponds to the present value of the contract. As a consequence, we obtain a simple method to validate the exposure profile by computing an estimate of the basket call option price by means of a standard Monte Carlo simulation with $10^{\dots}$ paths. We regard this as the 'exact' price. The Monte Carlo price we obtained is 398.08 with confidence interval [397.…, ….…]. The approximation of the exposure obtained by the solver takes values between ….02 and 400.82, achieving the maximum distance 5.06 to the Monte Carlo price at time $t = 0$.

For this product we also perform an xVA calculation with the objective of validating Algorithm 2 and Algorithm 3 in a case where both are applicable. To perform this comparison, we need the xVA BSDE to be non-recursive: this can be achieved by assuming that there is a unique risk-free interest rate, so that FVA and ColVA are identically zero, i.e., xVA consists only of the CVA and DVA terms. The idea is then to compare a Monte Carlo estimate of xVA according to Algorithm 2 with the initial value of the BSDE as produced by a full application of Algorithm 3. We assume that the default intensities of the counterparty and the bank are $\lambda^{C,\mathbb{Q}} = 0.10$ and $\lambda^{B,\mathbb{Q}} = 0.01$, respectively. For the recovery rates we set $R^C = 0.\dots$ and $R^B = 0.4$, while the unique risk-free interest rate is $r = 0.01$. Using the same network setting (see again the caption of Figure 3, right), the Deep xVA Solver produced an xVA estimate of 208.55 by means of Algorithm 3, whereas the estimate produced by Algorithm 2 is 211.…

4.4. Realistic simulation of the collateral account.
A useful feature of our proposed approach is the possibility of performing realistic simulations of the collateral account without resorting to simplifying assumptions. We can in fact compute the overall outstanding exposure between the bank and the counterparty by the following steps. Algorithm 1 allows us to simulate paths for all processes $\hat{V}^m$, $m = 1, \dots, M$. Such paths can then be aggregated so as to produce a simulation of the portfolio process $\hat{V} = \sum_{m=1}^{M} \hat{V}^m$, which corresponds to the pre-collateral exposure. After this, we compute the value of the collateral balance $C$ corresponding to the simulated paths of $\hat{V}$, which in turn allows us to compute the post-collateral exposure process $\hat{V} - C$ that enters the xVA formulas.

For illustration, we consider $M = 1$ and the equity forward from the first example. We introduce a simple example of a collateral agreement where collateral is exchanged between the counterparties at every point in time (a margin call frequency that does not coincide with the simulation time discretization can of course be treated as well). Collateral is exchanged only if the pre-collateral exposure is above (below) a receiving (posting) threshold, both of which are set equal to 5, i.e.

$$C_t := C(\hat{V}_t) = (\hat{V}_t - 5)^{+} - (\hat{V}_t + 5)^{-}.$$

An illustration for a single path is provided in Figure 4.
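The threshold rule above is simple enough to be written out directly. The following minimal sketch assumes the convention $x^{-} = \max(-x, 0)$ for the negative part and applies the rule pointwise to a short, made-up exposure path; the function names and the sample values are hypothetical.

```python
def positive_part(x):
    return max(x, 0.0)

def negative_part(x):
    # Convention assumed here: x^- = max(-x, 0), a non-negative number.
    return max(-x, 0.0)

def collateral_balance(v, receiving_threshold=5.0, posting_threshold=5.0):
    """C(v) = (v - receiving_threshold)^+ - (v + posting_threshold)^-."""
    return positive_part(v - receiving_threshold) - negative_part(v + posting_threshold)

def post_collateral_exposure(path, receiving_threshold=5.0, posting_threshold=5.0):
    """Pathwise post-collateral exposure V - C(V)."""
    return [v - collateral_balance(v, receiving_threshold, posting_threshold) for v in path]

# Made-up pre-collateral exposure values along one path (assumption).
pre_collateral = [0.0, 3.0, 8.0, 12.5, -2.0, -9.0]
collateral = [collateral_balance(v) for v in pre_collateral]
post_collateral = post_collateral_exposure(pre_collateral)
# collateral      -> [0.0, 0.0, 3.0, 7.5, 0.0, -4.0]
# post_collateral -> [0.0, 3.0, 5.0, 5.0, -2.0, -5.0]
```

With equal thresholds of 5, the post-collateral exposure is simply the pre-collateral exposure clipped to the band $[-5, 5]$.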
Figure 4. Pathwise simulation of a collateralized exposure. Top left panel: $\hat{V}$ (exposure before collateral). Top right panel: $C$ (collateral balance). Bottom panel: $\hat{V} - C$ (exposure after collateral). Posting and receiving thresholds are 5 EUR.
4.5. Impact of number of layers and nodes.

The aim of this section is to analyse the impact of the number of nodes and layers of the neural network on the quality of the approximation in our setting. When considering the richness of the network, one can face two adverse, albeit opposite, situations, which are abundantly documented in the literature for various applications: on the one hand, choosing an overly simplistic structure implies that the model underfits, i.e., has a poor explanatory capability; on the other hand, a network architecture that is too rich might result in a limited capability of the model to generalize when simulating new paths of the risk factors. This second situation is usually termed overfitting.

To analyse the issue in the present setting, we consider the forward contract from Section 4.1 and apply Algorithm 1 to estimate the exposure of the contract. In this case, there is only one risk factor, namely the stock price, so that $d = 1$. We test neural networks with different depths, from one to three hidden layers, and different numbers of nodes, between $d$ and $d + 20$ per layer. A graphical representation of the results is provided in Figure 5. We can clearly observe that a single-layer (non-deep) neural network does not succeed in providing a satisfactory fit to the data, while including more nodes can improve the fit. A two-layer network provides the best explanatory capability, with the best results being provided by a 2-layer configuration with $d + 20$ nodes. Adding a further layer, for this particular example, does not lead to an improvement of the fitting quality, as shown by the last row of Figure 5, which further shows that adding more nodes has a detrimental effect.

5. Preliminary conclusions and extensions
The proposed xVA algorithm exploits two useful complementary aspects of the Deep BSDE Solver of E et al. (2017). First, the formulation as an optimisation problem over a parametrisation of the (Markovian) control of the xVA BSDE, which is carried out by SDE discretisation and path sampling, directly gives both the hedge ratios in approximate functional form and model-based derivative prices along the sample paths. This is amenable to the simulation of exposure profiles and the computation of higher-order Greeks by pathwise differentiation, and allows for the computation of funding and margin variation adjustments as well as xVA hedging. A second aspect of the Deep BSDE Solver is the use of neural networks specifically as parametrisation for the Markovian control. A key advantage results from the approximation power of neural networks in high dimensions, which has the potential to make risk management computations at portfolio level feasible. Moreover, the simple functional form allows standard pathwise sensitivity computations.

Our numerical examples provide a proof of concept, but further systematic testing in realistic application settings is needed. While the basket option example gave good accuracy in a high-dimensional application, we encountered delicate issues with fitting the ANN even for the simple forward contract. An additional difficulty arises from the non-linear, non-convex parametric form, which, combined with the large number of parameters, leads to challenging optimisation problems. Both these aspects, the expressive power of the ANN and the practicalities of the learning process, are extremely active research areas, and further developments of the proposed Deep xVA Solver will be informed by the rapidly developing understanding of neural networks in a broader sense.

The application of our proposed scheme is not restricted to the chosen xVA framework. For example, one could in principle apply our methodology to the balance-sheet based model considered in Crépey et al. (2019); Albanese et al. (2020). In this case, the xVA computation involves multiple recursive valuations (illustrated succinctly in Abbas-Turki et al. (2018, Figure 1)), which can be approached by means of multiple applications of the Deep xVA Solver.

Figure 5. Estimation of the exposure for the forward from Example 4.1 obtained with 200 time steps and different numbers of layers and nodes. From top to bottom: results obtained with 1, 2, 3 layers with $d + 8$ nodes (left, panels (a), (c), (e)) and $d + 20$ nodes (right, panels (b), (d), (f)).

We also emphasise that the Deep xVA Solver can be combined with an existing analytics library: the computation of the mark-to-market cube (i.e., the simulation of all possible scenarios for the clean values over different points in time) represents a classical numerical problem to be solved in order to compute traditional risk figures such as Value-at-Risk or Expected Shortfall (this is often referred to as the "Monte Carlo full revaluation approach"). Since most products individually depend on a limited number of risk factors, it may be best to use a traditional numerical scheme, such as a finite difference solver, for at least some of the more vanilla products, and then revalue the products over different Monte Carlo paths by means of a look-up table over the pre-computed numerical solution. This provides an alternative route to our Algorithm 1 for the simulation of the clean values. However, once we aggregate all mark-to-markets, we end up with an object that depends on a large number of risk factors, so for the computation of xVA our proposed methodology provides a useful tool which allows the recursive computation of valuation adjustments, their hedging strategy, and simulation of collateral.
References
Abbas-Turki, L. A., Crépey, S., and Diallo, B. (2018). XVA principles, nested Monte Carlo strategies, and GPU optimizations. International Journal of Theoretical and Applied Finance, 21(06):1850030.
Albanese, C., Crépey, S., Hoskinson, R., and Saadeddine, B. (2020). XVA analysis from the balance sheet. Working paper; available at https://math.maths.univ-evry.fr/crepey/papers/xva-analysis-balance.pdf.
Biagini, F., Gnoatto, A., and Oliva, I. (2019). Pricing of counterparty risk and funding with CSA discounting, portfolio effects and initial margin. arXiv preprint arXiv:1905.11328.
Bichuch, M., Capponi, A., and Sturm, S. (2018a). Arbitrage-free XVA. Mathematical Finance, 28(2):582–620.
Bichuch, M., Capponi, A., and Sturm, S. (2018b). Robust XVA. arXiv preprint arXiv:1808.04908.
Bielecki, T. and Rutkowski, M. (2004). Credit Risk: Modeling, Valuation and Hedging. Springer, Berlin, New York.
Bielecki, T. and Rutkowski, M. (2015). Valuation and hedging of contracts with funding costs and collateralization. SIAM Journal on Financial Mathematics, 6(1):594–655.
Brigo, D., Buescu, C., Francischello, M., Pallavicini, A., and Rutkowski, M. (2018). Risk-neutral valuation under differential funding costs, defaults and collateralization. arXiv preprint arXiv:1802.10228.
Brigo, D., Capponi, A., and Pallavicini, A. (2014). Arbitrage-free bilateral counterparty risk valuation under collateralization and application to credit default swaps. Mathematical Finance, 24(1):125–146.
Brigo, D. and Masetti, M. (2005). Risk neutral pricing of counterparty risk. In Pykhtin, M., editor, Counterparty Credit Risk Modeling: Risk Management, Pricing and Regulation. Risk Books, London.
Brigo, D., Morini, M., and Pallavicini, A. (2013). Counterparty Credit Risk, Collateral and Funding. Wiley Finance. Wiley, Chichester.
Brigo, D. and Pallavicini, A. (2014). Nonlinear consistent valuation of CCP cleared or CSA bilateral trades with initial margins under credit, funding and wrong-way risks. Journal of Financial Engineering, 01(01):1450001.
Brigo, D., Pallavicini, A., and Papatheodorou, V. (2011). Arbitrage-free valuation of bilateral counterparty risk for interest-rate products: Impact of volatilities and correlations. International Journal of Theoretical and Applied Finance, 14(06):773–802.
Broadie, M., Du, Y., and Moallemi, C. (2011). Efficient risk estimation via nested sequential simulation. Management Science.
Broadie, M., Du, Y., and Moallemi, C. (2015). Risk estimation via regression. Operations Research.
Burgard, C. and Kjaer, M. (2011a). In the balance. Risk, November:72–75.
Burgard, C. and Kjaer, M. (2011b). Partial differential equation representations of derivatives with bilateral counterparty risk and funding costs. The Journal of Credit Risk, 7(3):1–19.
Capriotti, L., Jiang, Y., and Macrina, A. (2017). AAD and least-square Monte Carlo: Fast Bermudan-style options and XVA Greeks. Algorithmic Finance, 6(1-2):35–49.
Cherubini, U. (2005). Counterparty risk in derivatives and collateral policies: The replicating portfolio approach. In Tilman, L., editor, ALM of Financial Institutions. Institutional Investor Books.
Crépey, S. (2015a). Bilateral counterparty risk under funding constraints – Part I: Pricing. Mathematical Finance, 25(1):1–22.
Crépey, S. (2015b). Bilateral counterparty risk under funding constraints – Part II: CVA. Mathematical Finance, 25(1):23–50.
Crépey, S., Bielecki, T. S., and Brigo, D. (2014). Counterparty risk and funding: a tale of two puzzles, volume 31 of Chapman and Hall/CRC Press Series in Financial Mathematics. Chapman and Hall/CRC, Boca Raton.
Crépey, S., Hoskinson, R., and Saadeddine, B. (2019). Balance sheet XVA by deep learning and GPU. Working paper; available at https://math.maths.univ-evry.fr/crepey/papers/xva-analysis-balance.pdf.
Cybenko, G. (1989). Approximations by superpositions of sigmoidal functions. Mathematics of Control, Signals, and Systems, 2(4):303–314.
de Graaf, C., Feng, Q., Kandhai, B. D., and Oosterlee, C. (2014). Efficient computation of exposure profiles for counterparty credit risk. International Journal of Theoretical and Applied Finance, 17(4).
de Graaf, C., Kandhai, B. D., and Reisinger, C. (2018). Efficient exposure computation by risk factor decomposition. Quantitative Finance, 18(10):1657–1678.
Delong, L. (2017). Backward Stochastic Differential Equations with Jumps and Their Actuarial and Financial Applications. Springer, Berlin, New York.
Duffie, D. and Huang, M. (1996). Swap rates and credit quality. The Journal of Finance, 51(3):921–949.
E, W., Han, J., and Jentzen, A. (2017). Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations. Communications in Mathematics and Statistics, 5:349–380.
El Karoui, N., Peng, S., and Quenez, M. C. (1997). Backward stochastic differential equations in finance. Mathematical Finance, 7(1):1–71.
Ferguson, R. and Green, A. (2018). Deeply learning derivatives. arXiv preprint arXiv:1809.02233.
Fries, C. P. (2019). Stochastic automatic differentiation: Automatic differentiation for Monte-Carlo simulations. Quantitative Finance, 19(6):1043–1059.
Fujii, M., Shimada, A., and Takahashi, A. (2010). Note on construction of multiple swap curves with and without collateral. Available at SSRN: http://ssrn.com/abstract=1440633.
Fujii, M., Shimada, A., and Takahashi, A. (2011). A market model of interest rates with dynamic basis spreads in the presence of collateral and multiple currencies. Wilmott, 54:61–73.
Fujii, M., Takahashi, A., and Takahashi, M. (2019). Asymptotic expansion as prior knowledge in deep learning method for high dimensional BSDEs. Asia-Pacific Financial Markets, 26(3):391–408.
Gordy, M. B. and Juneja, S. (2010). Nested simulation in portfolio risk measurement. Management Science, 56(10):1833–1848.
Green, A. (2015). XVA: Credit, Funding and Capital Valuation Adjustments. Wiley Finance. Wiley, Chichester.
Gregory, J. (2015). The xVA challenge. Wiley Finance. Wiley, Chichester.
Han, J. and Long, J. (2018). Convergence of the deep BSDE method for coupled FBSDEs. arXiv preprint arXiv:1811.01165.
Henry-Labordere, P. (2017). Deep primal-dual algorithm for BSDEs: applications of machine learning to CVA and IM. Available at SSRN: https://ssrn.com/abstract=3071506.
Hornik, K. (1991). Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2):251–257.
Huré, C., Pham, H., and Warin, X. (2020). Some machine learning schemes for high-dimensional nonlinear PDEs. Mathematics of Computation, 89:1547–1579.
Hutzenthaler, M., Jentzen, A., Kruse, T., Nguyen, T. A., and von Wurstemberger, P. (2018). Overcoming the curse of dimensionality in the numerical approximation of semilinear parabolic partial differential equations. arXiv preprint arXiv:1807.01212.
Ioffe, S. and Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift. Proceedings of the 32nd International Conference on Machine Learning (ICML).
Jentzen, A., Salimova, D., and Welti, T. (2018). A proof that deep artificial neural networks overcome the curse of dimensionality in the numerical approximation of Kolmogorov partial differential equations with constant diffusion and nonlinear drift coefficients. arXiv preprint arXiv:1809.07321.
Joshi, M. and Kwon, O. (2016). Least squares Monte Carlo credit value adjustment with small and unidirectional bias. International Journal of Theoretical and Applied Finance, 19(8).
Karlsson, P., Jain, S., and Oosterlee, C. (2016). Counterparty credit exposures for interest rate derivatives using the stochastic grid bundling method. Applied Mathematical Finance, 23(3):175–196.
Kolmogorov, A. N. (1956). On the representation of continuous functions of several variables by superposition of continuous functions of one variable and addition. Doklady Akademii Nauk SSSR, 108(2):679–681.
Lichters, R., Stamm, R., and Gallagher, D. (2015). Modern Derivatives Pricing and Credit Exposure Analysis: Theory and Practice of CSA and XVA Pricing, Exposure Simulation and Backtesting. Applied Quantitative Finance. Palgrave Macmillan, London.
Longstaff, F. A. and Schwartz, E. S. (2001). Valuing American options by simulation: A simple least-squares approach. Review of Financial Studies, 14(1):113–147.
Pallavicini, A., Perini, D., and Brigo, D. (2011). Funding Valuation Adjustment: a consistent framework including CVA, DVA, collateral, netting rules and re-hypothecation. arXiv preprint arXiv:1112.1521.
Piterbarg, V. (2010). Funding beyond discounting: collateral agreements and derivatives pricing. Risk Magazine, 2:97–102.
Piterbarg, V. (2012). Cooking with collateral. Risk Magazine, 2:58–63.
Reisinger, C. and Zhang, Y. (2019). Rectified deep neural networks overcome the curse of dimensionality for nonsmooth value functions in zero-sum games of nonlinear stiff systems. arXiv preprint arXiv:1903.06652.
Ruf, J. and Wang, W. (2019). Neural networks for option pricing and hedging: a literature review. Available at SSRN: 3486363.
She, J.-H. and Grecu, D. (2017). Neural network for CVA: Learning future values. arXiv preprint arXiv:1811.08726.
Shöftner, R. (2008). On the estimation of credit exposures using regression-based Monte Carlo simulation. The Journal of Credit Risk, 4(4):37–62.
Sokol, A. (2014). Long-Term Portfolio Simulation: For XVA, Limits, Liquidity and Regulatory Capital. Risk Books, London.

(Alessandro Gnoatto) University of Verona, Department of Economics, via Cantarane 24, 37129 Verona, Italy
E-mail address, Alessandro Gnoatto: [email protected]

(Athena Picarelli) University of Verona, Department of Economics, via Cantarane 24, 37129 Verona, Italy
E-mail address, Athena Picarelli: [email protected]

(Christoph Reisinger) Oxford University, Mathematical Institute, ROQ, Woodstock Rd, Oxford, OX2 6GG, UK
E-mail address, Christoph Reisinger: