Deep learning Profit & Loss
Pietro Rossi ∗ Flavio Cocco † Giacomo Bormetti ‡ August 28, 2020
Abstract
Building the future profit and loss (P&L) distribution of a portfolio holding, among other assets, highly non-linear and path-dependent derivatives is a challenging task. We provide a simple machinery where more and more assets can be accounted for in a simple and semi-automatic fashion. We resort to a variation of the Least Square Monte Carlo algorithm where interpolation of the continuation value of the portfolio is done with a feed-forward neural network. This approach has several appealing features, not all of which are fully discussed in the paper. Neural networks are extremely flexible regressors. We do not need to worry about the fact that, for multi-asset payoffs, the exercise surface could be non-connected. Nor do we have to search for smart regressors. The idea is to use, regardless of the complexity of the payoff, only the underlying processes. Neural networks with many outputs can interpolate every single asset in the portfolio generated by a single Monte Carlo simulation. This is an essential feature to account for the P&L distribution of the whole portfolio when the dependence structure between the different assets is very strong, as in the case where one has contingent claims written on the same underlying.
Keywords: Feed-forward neural networks; profit & loss distribution; non-linear portfolios

∗ Prometeia S.p.A. and University of Bologna, Bologna, Italy. Corresponding author: [email protected]
† Prometeia S.p.A. and University of Bologna, Bologna, Italy.
‡ University of Bologna, Italy

arXiv [q-fin.RM]

1 Introduction
The computation of the future profit & loss distribution of a portfolio is the key step to manage risk and to set aside regulatory capital. To perform it, the financial industry has devised different strategies, ranging from historical full evaluation to parametric modeling, linear and quadratic mapping on risk factors, and closed-form analytic approximations (McNeil et al., 2015). As is well known, the problem has a direct and viable solution if we consider only assets whose future price has a fast analytic solution. In this case, we just generate enough trajectories to the desired time horizon and, for each generated trajectory, we compute the present value of the portfolio. To build an accurate distribution we need several trajectories of the underlying and, if the portfolio has a large number of assets, the computation can still be formidable, but it can be easily parallelised. If we are not within this context, and the only way to compute the present value of an asset is via Monte Carlo simulation, we are in the unpleasant situation that for each scenario we must perform a new Monte Carlo simulation. The situation is even worse if we are dealing with strongly path-dependent derivatives, such as American or Bermuda options, and highly non-linear payoffs in multiple dimensions. The problem can easily become intractable.

One way out of this deadlock is to resort to techniques inspired by the Least Square Monte Carlo (LSM) (Longstaff and Schwartz, 2001) to estimate, via backward propagation, the continuation value of the portfolio produced by the optimal strategy (Karlin and Taylor, 1975). This method, which is usually performed with polynomial interpolation of the continuation value, has proven effective in many situations. The major drawback is that, for each asset in the portfolio, the ideal strategy has to be tailored to the asset, and the polynomial interpolation, when used with many variables, shows a marked tendency to overfit.

The approach we follow is along the lines of the LSM method, but with some significant variations. Interpolation is done using feed-forward neural networks (FFNN) (Bishop et al., 1995; Ferguson and Green, 2018). The impact on the entire procedure is striking. This cures the curse of dimensionality associated with the polynomial interpolators. The flexibility of FFNNs and their ability to learn (fit) the price of the portfolio are remarkable. Rather than using just one trajectory to propagate back the continuation value, from each time horizon we perform a short Monte Carlo with very few trajectories and, at each time horizon, we train the network to learn the continuation value. Once we have the coefficients of the trained network at each $t$, we launch a large simulation and compute the desired distribution using our trained networks as proxies of the continuation value. The key idea, which is the main contribution of this work, is that even with relatively few trajectories in the training phase we can still obtain an unbiased set of FFNNs capable of producing the correct P&L distribution. It is worth remarking again that the procedure we detail is capable of dealing with whole portfolios, rather than just single assets, and it seems to be the only feasible approach to the proper reconstruction of the P&L distribution in the presence of highly non-linear effects and strong cross-dependence among portfolio components. As a final comment, our work contributes to the growing stream of quantitative research approaching the hedging problem by means of machine learning techniques (Cao et al., 2019; Buehler et al., 2019).
When dealing with market risk, what we are mainly interested in is the P&L distribution. Given the value $V_t$ of our portfolio at a future time $t$, the profit or loss is defined as the difference

$$D(0,t)\,V_t - V_0,$$

with $D(0,t)$ the discount factor. This way we compare two quantities originating at the same time. The quantity $D(0,t)V_t$ is a random variable defined as

$$D(0,t)\,V_t = E\left[D(0,T)\,V_T \mid \mathcal{F}_t\right] = D(0,t)\,E\left[D(t,T)\,V_T \mid \mathcal{F}_t\right],$$

that is, the expected value conditional on the information known at $t$.

The steps needed to estimate the P&L distribution are:
1. sample the event space described by $\mathcal{F}_t$
2. for each sampled event, compute $D(0,t)V_t$
3. build the cumulative distribution function (CDF) for $D(0,t)V_t - V_0$.

Once we take control of the CDF for our portfolio, we can tackle issues like the computation of risk measures, Value-at-Risk (VaR), Expected Shortfall, expectiles, or the expected positive/negative exposure, if we are interested in CVA or DVA contributions.

Computation of $V_t$, possibly for a large set of values of $t$, can be a daunting process. If we do not have a fast way to perform it, we must resort to Monte Carlo simulation, but a Monte Carlo nested inside another Monte Carlo is most of the time just too expensive from the computational point of view. The strategy we pursue in this work is to produce a fast device to compute $V_t$ for every $t$ we are interested in and every possible event in $\mathcal{F}_t$. Then, we will use it to produce the desired CDF. The fast device we are hinting at is an FFNN taking as input, as regressors, all of the underlyings entering the problem and producing a multivariate fit, one value for each asset making up our portfolio.

In a financial world (to fix ideas, we can think about handling a portfolio) we make decisions based on what we know. These decisions will have consequences that depend on what will happen in the future, which we do not know yet. The main consequence is that our decision will generate a cash flow $i$ (positive, negative or null) and will have some impact on future cash flows.

To make the reasoning more formal, we look at a time horizon $T$, and break it up into $N$ intervals

$$\text{now} = t_0 < t_1 < \cdots < t_{N-1} < t_N = T = \text{portfolio horizon}.$$

We denote by $S_n$ the set of variables describing the situation in $t_n$ and, in loose speech, the filtration $\mathcal{F}_n$ can be seen as

$$\mathcal{F}_n = \bigcup_{i=0}^{n} S_i,$$

all of the information available up to $t_n$. The stochastic nature of the world is described by a transition probability $p(S_n, t_n \mid S_{n-1}, t_{n-1})$ to go from the state $S_{n-1}$ in $t_{n-1}$ to the state $S_n$ in $t_n$.

A strategy $\phi$ is a function adapted to the filtration.
In other words, it is a decision one takes based solely on past and present information. The decision we make in $t_n$ is based on $\mathcal{F}_n$, and has immediate consequences depending on $S_n$ and future (unknown) consequences. The immediate consequence is the generation of a cash flow $i_n$. Think of an American Put option: based on current knowledge, we decide to exercise and the immediate consequence is the payoff that we pocket. The future, unknown consequence is that we forgo the possibility to exercise at a later time, possibly in a more advantageous condition.

The following discussion, purely a review, relies heavily on material from (Karlin and Taylor, 1975; Longstaff and Schwartz, 2001) and is standard in the optimal stopping time literature. Let us call $a_n$ the decision we take in $t_n$ (in the American Put option example, $a_n$ can take one of two values, $\{h = \text{hold},\ s = \text{stop}\}$), $\phi$ the strategy emerging from the $N$ choices we will make, and $D(0,t_n)$ the discount factor to be applied to a cash flow in $t_n$. The value of the strategy $\phi$ we decide to enact will be:

$$I(\phi, S_0) = E\left[\sum_{n=1}^{N} D(0,t_n)\, i(S_n, a_n)\right]. \tag{1}$$

1. evolution occurs in the interval $[t_{n-1}, t_n)$, and we make our decision in $t_n$ having full knowledge of everything that happened up to that point;
2. after we have made our choice, $S_n$ will evolve to $S_{n+1}$ with a probability described by the matrix $p(S_{n+1}, t_{n+1} \mid S_n, t_n)$.

The next step is to build a function that is an upper bound for the value described in eq. (1). Let $V(S_N)$ be the payoff at maturity of the contract under examination. It is well known that such a bound is given by the recursion

$$f_N(S_N) = i(S_N, a_N) \overset{\text{def}}{=} V(S_N), \quad \forall S_N,$$

$$f_{n-1}(S_{n-1}) = \max_{a_{n-1}} \left( i(S_{n-1}, a_{n-1}),\ E\left[D(t_{n-1}, t_n)\, f_n(S_n) \mid \mathcal{F}_{n-1}\right] \right), \quad \text{for } n = 1, \dots, N-1.$$

The previous solution was first derived in (Bellman, 1952) as the optimality condition in dynamic programming and it is experiencing a new life as the key formula in Reinforcement Learning (Cao et al., 2019).
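To make the recursion concrete, the following is a minimal sketch of the backward induction for a single American Put on a recombining binomial tree. The tree discretisation, the function name `american_put_binomial` and all parameter values are illustrative assumptions and not part of the procedure used in this paper, which works with Monte Carlo trajectories; the only element taken from the text is the update "value = max(exercise, discounted expected value of the next step)".

```python
import math

def american_put_binomial(s0, strike, r, sigma, maturity, n_steps):
    """Backward induction f_{n-1} = max(exercise, E[D * f_n]) on a CRR tree.

    Hypothetical helper for illustration only; the paper works with
    Monte Carlo trajectories, not with a tree.
    """
    dt = maturity / n_steps
    up = math.exp(sigma * math.sqrt(dt))
    down = 1.0 / up
    disc = math.exp(-r * dt)                        # D(t_{n-1}, t_n)
    p_up = (math.exp(r * dt) - down) / (up - down)  # risk-neutral probability

    # Terminal condition f_N(S_N) = V(S_N): the payoff at maturity.
    values = [max(strike - s0 * up**j * down**(n_steps - j), 0.0)
              for j in range(n_steps + 1)]

    # Step back from t_N to t_0, choosing between "stop" and "hold".
    for n in range(n_steps, 0, -1):
        new_values = []
        for j in range(n):
            spot = s0 * up**j * down**(n - 1 - j)
            hold = disc * (p_up * values[j + 1] + (1.0 - p_up) * values[j])
            stop = max(strike - spot, 0.0)
            new_values.append(max(stop, hold))
        values = new_values
    return values[0]

price = american_put_binomial(100.0, 100.0, 0.05, 0.2, 1.0, 200)
print(f"American put value at t_0: {price:.3f}")
```

On a tree the conditional expectation is a two-point average, so no regression is needed; in the Monte Carlo setting of this paper that expectation is what has to be estimated and interpolated.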
Different assets, say $K$, in a portfolio are characterized by the fact that they have different cash flow structures $i^k_n$, and the previous result can be easily generalized as:

$$f^k_N(S_N) = i^k(S_N, a^k_N) \overset{\text{def}}{=} V^k(S_N), \quad \forall S_N,$$

$$f^k_{n-1}(S_{n-1}) = \max_{a^k_{n-1}} \left( i^k(S_{n-1}, a^k_{n-1}),\ E\left[D(t_{n-1}, t_n)\, f^k_n(S_n) \mid \mathcal{F}_{n-1}\right] \right), \quad \text{for } n = 1, \dots, N-1.$$

From the above section we conclude that what we need is a reliable estimate of the transition probability

$$p(S_{n+1}, t_{n+1} \mid S_n, t_n).$$

Even though modeling the right process is the major concern when pricing a portfolio, in this paper we are more focused on the methodological aspects. We will make our life simpler and assume that all of the assets undergo a log-normal process. More complex dynamics can be readily dealt with, provided they belong to the class of Markov processes, possibly of order higher than one.

Let $S$ denote the vector of underlying prices $S_1, \dots, S_K$. Starting from $t_{N-1}$, for each trajectory $j$ we have a value $S(j, t_{N-1})$. From each $S(j, t_{N-1})$ we launch $M$ one-step trajectories from $t_{N-1}$ to $t_N$ according to the law $p(S_N, t_N \mid S_{N-1}, t_{N-1})$. The small set of trajectories originating from $S(j, t_{N-1})$ will be denoted as $S_m(j, t_N)$ for $m = 1, \dots, M$. In $t_N$, we know the payoff for each possible triplet $S_m(j, t_N)$, so we can easily compute the corresponding payoffs $V^k(S_m(j, t_N))$ for each asset $k = 1, \dots, K$. The price in $t_{N-1}$ conditioned on $S(j, t_{N-1})$ is given by

$$V^k(t_{N-1} \mid S(j, t_{N-1})) = E\left[D(t_{N-1}, t_N)\, V^k(S(j, t_N)) \mid S(j, t_{N-1})\right] \tag{2}$$

and the right-hand side of the above equation can be estimated by the quantity

$$\frac{1}{M} \sum_{m=1}^{M} D(t_{N-1}, t_N)\, V^k(S_m(j, t_N)). \tag{3}$$

Now we have $V^k(t_{N-1} \mid S(j, t_{N-1}))$ for $k = 1, \dots, K$ at each point $S(j, t_{N-1})$ and we use a neural network to fit it, using $S(j, t_{N-1})$ as input variables. We name the interpolating function $C^{N-1}_{int}$. The symbol $C$ has been chosen as a mnemonic for "Continuation" and is basically a vector-valued function:

$$C^{N-1}_{int} : S(j, t_{N-1}) \to V(t_{N-1} \mid S(j, t_{N-1})),$$

where $V(t_{N-1} \mid S(j, t_{N-1}))$ is the vector of prices subject to the fact that we have not exercised our option till $t_{N-1}$. Components of $V$, defined in eq. (2), are the price processes of every single asset making up the portfolio. At this point we repeat the process at $t_{N-2}$. We launch again a small set of one-step trajectories from each $S(j, t_{N-2})$, producing $S_m(j, t_{N-1})$. The difference this time is that in $t_{N-1}$, instead of using the known payoff to compute prices for the one-step trajectory, we will use the interpolating function $C^{N-1}_{int}$ and compute the price of the trajectory as

$$\max\left( i^k(S_m(j, t_{N-1})),\ C^{N-1}_{int}(S_m(j, t_{N-1})) \right),$$

where $i^k$ is the cash flow obtained by exercising the option. The recursive structure of this scheme is rather evident and we can build interpolating functions all the way down to the first exercise date $t_1$. At each time slice, we have calibrated an interpolating function that we can use to compute continuation values. Now, generating P&L distributions is a straightforward matter:
1. generate $L$ trajectories from $t_0$ to $t_n$;
2. for each trajectory use the interpolator $C^{n}_{int}$ to compute $V_j(t_n)$;
3. the sample distribution of $D(0,t_n)V_j(t_n) - V(t_0)$ will provide the wanted CDF of the portfolio P&L.

As a non-negligible bonus, we obtain all the marginal distributions for each asset in the portfolio.

The continuation value, propagated backward, is what we use to build the P&L distribution and it could be used as a proxy for the price itself. Unfortunately, it is neither a reliable nor an accurate quantity. For one thing, the numbers obtained can hardly be considered independent. Therefore we cannot produce an estimate of the statistical error at all. Besides, as is well known in the literature (Longstaff and Schwartz, 2001), such a procedure produces a value biased towards higher values. For accurate pricing, it is a much better procedure to use our interpolator as our policy maker. We generate a brand new set of trajectories and for each trajectory we use our FFNN-based continuation value as the strategy maker. For each time horizon we decide to exercise if the payoff is higher than the continuation value, otherwise we check the next time slice. It is worth pointing out that this way of proceeding establishes a strategy that is not looking forward, therefore it can be, at best, as good as the optimal one. The price computed this way is always a lower bound for the real price. It is a simple consequence of the fact that any legitimate strategy is at most as good as the optimal strategy.

5 Numerical Experiments
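Before turning to the experiments, the lower-bound pricing policy described above (exercise on fresh trajectories whenever the payoff exceeds the continuation value) can be sketched for a single Bermudan Put. This is a hedged illustration: the trained FFNN is replaced by a stand-in continuation function, the Black-Scholes European put value, and the function names, the absence of dividends and all parameter values are assumptions made only for this example.

```python
import math
import numpy as np

norm_cdf = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))

def european_put(s, strike, r, sigma, tau):
    """Black-Scholes put, used here as a stand-in continuation value."""
    d1 = (np.log(s / strike) + (r + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    return strike * math.exp(-r * tau) * norm_cdf(-d2) - s * norm_cdf(-d1)

def lower_bound_price(continuation, s0=100.0, strike=100.0, r=0.05, sigma=0.2,
                      maturity=1.0, n_dates=12, n_paths=20_000, seed=3):
    """Price a Bermudan put by following the policy 'exercise when the
    payoff exceeds the continuation value' on fresh trajectories."""
    rng = np.random.default_rng(seed)
    dt = maturity / n_dates
    s = np.full(n_paths, s0)
    cash = np.zeros(n_paths)              # discounted cash flow of each path
    alive = np.ones(n_paths, dtype=bool)  # paths not yet exercised
    for n in range(1, n_dates + 1):
        z = rng.standard_normal(n_paths)
        s = s * np.exp((r - 0.5 * sigma**2) * dt + sigma * math.sqrt(dt) * z)
        payoff = np.maximum(strike - s, 0.0)
        if n == n_dates:
            exercise = alive              # at maturity simply collect the payoff
        else:
            exercise = alive & (payoff > continuation(s, maturity - n * dt))
        cash[exercise] = math.exp(-r * n * dt) * payoff[exercise]
        alive &= ~exercise
    # Paths are independent, so a standard error is available.
    return cash.mean(), 1.96 * cash.std(ddof=1) / math.sqrt(n_paths)

proxy = lambda s, tau: european_put(s, 100.0, 0.05, 0.2, tau)
price, err = lower_bound_price(proxy)
print(f"lower-bound price: {price:.3f} +/- {err:.3f}")
```

Because the policy only uses information available at each exercise date, the resulting estimate is a lower bound in expectation whatever continuation function is plugged in; a better interpolator simply tightens the bound.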
In all the numerical experiments performed, the stochastic process of the underlying is defined by:

$$S_i(t + \Delta t) = S_i(t) \exp\left( (r - \delta_i)\,\Delta t - \frac{\sigma_i^2}{2}\,\Delta t + \sigma_i \left( W_i(t + \Delta t) - W_i(t) \right) \right),$$

with $r$ the instantaneous short rate, $\delta_i$ the continuously compounded dividend yield, and $\sigma_i$ the volatility of the Wiener process $W_i(t)$. All Wiener processes are uncorrelated. Furthermore, in modeling the portfolios used as examples, we have made some technically simplifying assumptions, namely all of the assets within a given portfolio share the same exercise schedule and maturity.

The portfolio we study is made up of three assets:
1. an American Put written on $S_x$, whose payoff reads $\text{payoff}_{AM} = (\kappa_{am} - S_x(T))^+$,
2. a European Call on Min written on $S_x$ and $S_y$, where $\text{payoff}_{Cm} = (\min(S_x(T), S_y(T)) - \kappa_{cm})^+$,
3. a Bermuda Call on Max written on $S_x$, $S_y$, and $S_z$, whose payoff is given by $\text{payoff}_{bCM} = (\max(S_x(T), S_y(T), S_z(T)) - \kappa_{bCM})^+$.

In section 5.4 we look at a one-year maturity portfolio with a monthly exercise schedule both for the American Put and the Bermuda option, while in section 5.5 we look at the same portfolio on a three-year horizon with a four-month exercise schedule.

5.2 The Interpolator
The network used for the interpolation is a very simple feed-forward network with two hidden layers of 10 nodes each. Its topology is detailed in fig. (1). The activation functions are sigmoids on both layers. We have tried different activation functions without any visible benefit.

Figure 1: The topology of the FFNN used as interpolator at each exercise date. All the nodes in the hidden layers are connected even though, for graphical reasons, this is not properly represented. For the same reason, the bias nodes at the input layer and at each of the hidden layers are not represented. The output nodes will produce, as described in the text, the continuation value of each of the assets in the portfolio. The legend, from top to bottom, is as follows: American Put, European Call on Min, Bermuda Call on Max.
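A minimal sketch of one backward step of the training scheme with a network of this shape, assuming plain NumPy and full-batch gradient descent. It simulates three independent log-normal underlyings up to the last-but-one exercise date, estimates the three continuation values with a few one-step inner trajectories (the sample average of discounted payoffs), and fits them with two hidden layers of 10 sigmoid nodes and three outputs. The linear output layer, the optimiser, the learning rate, the input and output scaling, the sample sizes and every parameter value are illustrative assumptions; the text does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (assumptions, not the paper's values).
s0, strike, r, delta, sigma = 100.0, 100.0, 0.05, 0.03, 0.2
dt, n_paths, m_inner = 1.0 / 12.0, 500, 20

def gbm_step(s, dt, rng):
    """One log-normal step for independent underlyings."""
    z = rng.standard_normal(s.shape)
    return s * np.exp((r - delta - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z)

def payoffs(s):
    """Put on S_x, Call on Min(S_x, S_y), Call on Max(S_x, S_y, S_z)."""
    put = np.maximum(strike - s[..., 0], 0.0)
    call_min = np.maximum(np.minimum(s[..., 0], s[..., 1]) - strike, 0.0)
    call_max = np.maximum(s.max(axis=-1) - strike, 0.0)
    return np.stack([put, call_min, call_max], axis=-1)

# Outer states S(j, t_{N-1}); here t_{N-1} is taken 11 months after start.
s_outer = s0 * np.ones((n_paths, 3))
for _ in range(11):
    s_outer = gbm_step(s_outer, dt, rng)

# Inner one-step trajectories S_m(j, t_N) and their averaged discounted payoffs.
s_inner = gbm_step(np.repeat(s_outer[:, None, :], m_inner, axis=1), dt, rng)
targets = np.exp(-r * dt) * payoffs(s_inner).mean(axis=1)   # shape (n_paths, 3)

# Scale inputs and outputs so that the sigmoids work in a sensible range.
x = s_outer / s0 - 1.0
y_scale = targets.std(axis=0) + 1e-12
y = targets / y_scale

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

sizes = [3, 10, 10, 3]
weights = [rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    h1 = sigmoid(x @ weights[0] + biases[0])
    h2 = sigmoid(h1 @ weights[1] + biases[1])
    return h1, h2, h2 @ weights[2] + biases[2]      # linear output layer

def mse():
    return float(np.mean((forward(x)[2] - y) ** 2))

loss_before = mse()
lr = 0.1
for _ in range(3000):
    h1, h2, out = forward(x)
    g_out = 2.0 * (out - y) / y.size                  # dLoss/d(out)
    g_h2 = (g_out @ weights[2].T) * h2 * (1.0 - h2)   # back through sigmoid 2
    g_h1 = (g_h2 @ weights[1].T) * h1 * (1.0 - h1)    # back through sigmoid 1
    for k, (inp, g) in enumerate([(x, g_h1), (h1, g_h2), (h2, g_out)]):
        weights[k] -= lr * inp.T @ g
        biases[k] -= lr * g.sum(axis=0)
loss_after = mse()

def continuation(s):
    """Interpolated continuation values at state s, one per asset."""
    return forward(np.atleast_2d(s) / s0 - 1.0)[2] * y_scale

print(f"MSE before {loss_before:.4f}, after {loss_after:.4f}")
print("C_int at S = (100, 100, 100):", continuation([100.0, 100.0, 100.0]))
```

The targets are noisy because each one averages only a handful of inner trajectories; the regression is what smooths that noise into an estimate of the conditional expectation, which is why few inner paths can suffice.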
The portfolio described above has been simulated in two different contexts: the first one for a one-year maturity and monthly exercise schedule, the second for a three-year maturity and triannual exercise schedule. In sections (5.4, 5.5) we show results for the price of the portfolio in these two scenarios and the quantiles for each individual asset as well as the whole portfolio. Given our definition of P&L, negative quantiles correspond to losses, so VaR is the negative of the quantile. Checking the correctness of the results is not very simple. We elected to perform the following checks. As far as the individual assets are concerned, we compared results, both for prices and quantiles, obtained while handling the whole portfolio in one simulation, with results obtained simulating each asset individually. Results show no difference in the two cases.

For some of the assets considered there are benchmark results in the literature; namely, the Bermuda Call on Max and the American Put can be compared with high-precision values coming from PDE approaches. Results from both of these checks are presented in section 6. Figures (2) and (3) come from the three-year portfolio. They compare the P&L distribution four months after contract start and two years after contract start. The comparison is done for each single asset individually as well as the whole portfolio. The filled gray curve corresponds to the P&L CDF of the whole portfolio. The bold, dashed, and dashed-dotted lines represent the P&L CDF of the American Put, European Call on Min, and Bermuda Call on Max, respectively. It is important to stress that the CDF of each asset was computed concurrently with the P&L distribution of the entire portfolio. For ease of readability, both figures report the horizontal lines corresponding to the 1, 10, 50, 90, and 99 percent probability levels. Tables (5) and (6) detail the associated quantile values for the portfolio and for each single component.

The procedure is capable of providing accurate results, and the ability to deal with whole portfolios, rather than just single assets, allows for a rather accurate estimate of the hedging effects. In this toy example we clearly have the American Put hedging against the two calls, and the VaR results show this effect markedly. It is worth pointing out that the VaR of the whole portfolio could not have been estimated by any approach that considers each asset separately. The portfolio was built with assets hedging each other and the only way to account for this effect is to compute, for each scenario, the contribution to the P&L distribution of the whole portfolio. This hedging effect is quite noticeable both in table 2 and in table 3. The portfolio VaR is significantly different from the sum of the VaR of each asset.

5.4 One-year maturity
The maturity is one year after contract start, exercise dates are monthly, both for the American Put and the Bermuda Call on Max. Parameter values are as follows:

S x = 1 . , S y = 1 . , S z = 1 . ,
κ am = 1 . , κ cm = 0 . , κ bCM = 1 . ,
σ x = σ y = σ z = 0 . , r = 5 . , δ x = δ y = δ z = 3 . .

Asset   Price
AM      6.943 ± ± ±

Asset/Quantile     .01     .10     .50     .90     .99
portfolio       -15.36  -10.18    -.50   13.51   26.34
AM               -5.92   -5.21   -1.00    7.40   16.25
Cm               -5.47   -4.35    -.86    5.94   12.75
bCM             -13.81   -8.92    -.29   10.93   20.30

Table 3: Quantiles of the P&L distribution six months after contract start for a portfolio with one-year maturity and monthly exercise schedule. Quantiles are given for the whole portfolio as well as each single asset.

5.5 Three-year maturity

The maturity is three years after contract start, exercise dates are every four months, both for the American Put and the Bermuda Call on Max. Parameter values are as follows:

S x = 1 . , S y = 1 . , S z = 1 . ,
κ am = 1 . , κ cm = 1 . , κ bCM = 1 . ,
σ x = σ y = σ z = 0 . , r = 5 . , δ x = δ y = δ z = 10 . .

Asset   Price
AM      18.020 ± ± ±
Figure 2: Three years portfolio, four months after contract start. The filled gray curvecorresponds to the P&L CDF of the whole portfolio. The bold, dashed, and dashed-dottedlines represent the P&L CDF of the American Put, European Call on Min, and BermudanCall on Max, respectively. The CDF of each single asset was computed while processingthe whole portfolio.
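The quantile tables and CDF curves of this section are summaries of a simulated P&L sample. A minimal sketch of that last step follows, assuming NumPy. The function names are hypothetical, and the synthetic normal sample merely stands in for the discounted portfolio values produced by the interpolators; it is not data from the experiments.

```python
import numpy as np

def pnl_summary(discounted_values, v0, levels=(0.01, 0.10, 0.50, 0.90, 0.99)):
    """Empirical P&L quantiles and VaR from a sample of D(0,t) V_t."""
    pnl = np.asarray(discounted_values) - v0
    quantiles = np.quantile(pnl, levels)
    var = -quantiles                       # VaR is the negative of the quantile
    return pnl, dict(zip(levels, quantiles)), dict(zip(levels, var))

def empirical_cdf(pnl):
    """Sorted P&L values and the matching cumulative probabilities."""
    xs = np.sort(pnl)
    return xs, np.arange(1, xs.size + 1) / xs.size

rng = np.random.default_rng(1)
v0 = 30.0                                            # hypothetical price today
sample = v0 + 10.0 * rng.standard_normal(100_000)    # stand-in for D(0,t) V_t
pnl, q, var = pnl_summary(sample, v0)
xs, probs = empirical_cdf(pnl)
print({k: round(float(v), 2) for k, v in q.items()})
print("VaR at the 1% level:", round(float(var[0.01]), 2))
```

Applied to the sum of the network outputs the function gives the portfolio quantiles; applied to each output separately, on the same trajectories, it gives the marginals of the single assets.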
Figure 3: Three years portfolio, two years after contract start. Legend as in caption of figure (2).

There are no results listed in the literature for the quantiles of the future P&L distribution but, for some of the parameters used above, namely for the three-year exercise schedule, some results are provided in (Becker et al., 2019; Goudenège et al., 2020), concerning the Bermuda Call on Max.

Table (7) compares our results with those in (Becker et al., 2019) where the price and the 95% confidence interval (CI) are provided. These results have been obtained using the interpolated continuation value as a proxy for the exercise surface. FFNN-based prices are fully consistent with the benchmark 95% CI.

Furthermore, we can perform accurate checks for the American Put option. In table (8) we give the corresponding results. The column labeled ∆t is the interval between exercise dates, and the value ∆t = 0 is the result of a linear regression on the non-zero ∆t. The column labeled PDE provides the result coming from a method based on partial differential equations. The PDE result has been obtained solving the equation with a semi-implicit method, both with an SOR and an FFT-based preconditioning. Some results with dividends are shown in table (9). The quality of the agreement between the FFNN-based values and the PDE results is remarkable.

Nr Asset   S_0   price ± 95% MC error   results from (Becker et al., 2019)   95% CI
2           90    8. ± .005              8.074                               [8.060, 8.081]
2          100   13. ± .007             13.899                               [13.880, 13.910]
2          110   21. ± .006             21.349                               [21.336, 21.354]
3           90   11. ± .007             11.287                               [11.276, 11.290]
3          100   18. ± .008             18.690                               [18.673, 18.699]
3          110   27. ± .007             27.573                               [27.545, 27.591]

Table 7: Bermuda Call on Max prices and Monte Carlo errors from FFNN compared with results from (Becker et al., 2019).
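The two statistical devices used in these comparisons can be sketched as follows, assuming NumPy: the 95% Monte Carlo error of a price estimate, taken here as 1.96 standard errors, and the ∆t = 0 value obtained as the intercept of a linear regression on the non-zero ∆t. The function names are hypothetical and the numbers fed to them are synthetic placeholders chosen for illustration, not values from the tables.

```python
import numpy as np

def mc_price_with_error(discounted_payoffs):
    """Sample mean and 95% Monte Carlo half-width (1.96 standard errors)."""
    x = np.asarray(discounted_payoffs, dtype=float)
    return x.mean(), 1.96 * x.std(ddof=1) / np.sqrt(x.size)

def extrapolate_to_zero_dt(dts, prices):
    """Intercept of a straight-line fit price = a + b * dt, i.e. the dt = 0 value."""
    slope, intercept = np.polyfit(dts, prices, 1)
    return intercept

rng = np.random.default_rng(2)
payoffs = np.maximum(rng.normal(10.0, 5.0, 200_000), 0.0)   # placeholder sample
price, err = mc_price_with_error(payoffs)
print(f"price = {price:.3f} +/- {err:.3f}")

dts = np.array([2.0, 1.0, 0.5]) / 12.0      # hypothetical exercise intervals, in years
prices = 6.10 - 0.30 * dts                  # synthetic, exactly linear in dt
print(f"extrapolated dt=0 price: {extrapolate_to_zero_dt(dts, prices):.3f}")
```

The error formula presumes independent trajectories, which holds for prices obtained by following an exercise policy on fresh paths but not for the backward-propagated continuation values.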
∆t   S    Price   Err   PDE
2M   90   11.340 ± ± ± ± ± ± ± ± ± ± ± ±
σ = 20%, T = 1 year, Strike = 100, δ = 0.

∆t   S    Price   Err   PDE
2M   90   12.309 ± ± ± ±
σ = 20%, T = 1 year, Strike = 100, δ = 3%.

We presented a general and simple machinery to compute the future P&L distribution of a portfolio including non-linear, path-dependent, and strongly correlated derivatives. The idea was to leverage the flexibility of feed-forward neural networks as universal approximators and to employ them in a Least-Square Monte Carlo approach to price American and Bermuda derivatives. The advantages of the approach are manifold: i) the machinery can easily manage the inclusion of new instruments; ii) the FFNNs cure the drawbacks related to the curse of dimensionality, which are an inherent problem of polynomial approximating functions; iii) neural networks are better suited to deal with non-differentiable payoffs; iv) it is the only viable approach for concurrently pricing instruments with different exercise styles and sensitive to the same risk factors while avoiding the computationally burdensome nested Monte Carlo procedure. In this respect, it is worth stressing once more that our approach jointly recovers the portfolio P&L distribution and the single instruments' marginals.

This paper details the results from a toy experiment where we considered a portfolio composed of three strongly dependent derivatives (an American Put, a European Call on Min, and a Bermuda Call on Max) whose underlying assets follow simple dynamics. As a future perspective, we plan to investigate more realistic dynamics in a higher-dimensional setting. Specifically, we want to assess to what extent the approach can be extended to manage non-Markov dynamics. This case is of particular relevance given the flourishing of stochastic models where the volatility is driven by fractional dynamics. The interplay between the long-memory features and the high-dimensional nature of a portfolio may result in a mixture whose complexity can be cured only by resorting to a neural network approach.

References
Becker, S., P. Cheridito, and A. Jentzen (2019). Deep optimal stopping. Journal of Machine Learning Research 20 (74), 1–25.
Bellman, R. (1952). On the theory of dynamic programming. Proceedings of the National Academy of Sciences of the United States of America 38 (8), 716.
Bishop, C. M. et al. (1995). Neural networks for pattern recognition. Oxford University Press.
Buehler, H., L. Gonon, J. Teichmann, and B. Wood (2019). Deep hedging. Quantitative Finance 19 (8), 1271–1291.
Cao, J., J. Chen, J. C. Hull, and Z. Poulos (2019). Deep hedging of derivatives using reinforcement learning. Available at SSRN 3514586.
Ferguson, R. and A. D. Green (2018). Deeply learning derivatives. Available at SSRN 3244821.
Goudenège, L., A. Molent, and A. Zanette (2020). Machine learning for pricing American options in high-dimensional Markovian and non-Markovian models. Quantitative Finance 20 (4), 573–591.
Karlin, S. and H. M. Taylor (1975). A First Course in Stochastic Processes, Volume 1. Academic Press.
Longstaff, F. A. and E. S. Schwartz (2001). Valuing American options by simulation: A simple least-squares approach. The Review of Financial Studies 14 (1), 113–147.
McNeil, A. J., R. Frey, and P. Embrechts (2015).