Deep Replication of a Runoff Portfolio
THOMAS KRABICHLER AND JOSEF TEICHMANN
Abstract.
To the best of our knowledge, the application of deep learning in the field of quantitative risk management is still a relatively recent phenomenon. This article presents the key notions of Deep Asset Liability Management («Deep ALM») for a technological transformation in the management of assets and liabilities along a whole term structure. The approach has a profound impact on a wide range of applications such as optimal decision making for treasurers, optimal procurement of commodities or the optimisation of hydroelectric power plants. As a by-product, intriguing insights into goal-based investing and Asset Liability Management (ALM) in abstract terms, concerning urgent challenges of our society, are expected alongside. We illustrate the potential of the approach in a stylised case.

1. Introduction
Mismatches between opposing fixed and floating prices along a whole term structure are omnipresent in various business fields and raise substantial business risks. Already humble approaches lead to analytically intractable mathematical models entailing high-dimensional allocation problems with constraints in the presence of frictions. This impediment often leads to over-simplified modelling (which still needs sophisticated technology), uncovered risks or overlooked opportunities. Spectacular successes of deep learning techniques in language processing, image recognition, learning games from tabula rasa, or risk management, just to name a few, are stimulating our imagination that asset-liability-management (ALM) might enter a new era. This novel approach, briefly denoted «Deep ALM», raises a huge field of mathematical research questions such as feasibility, robustness, inherent model risk and the inclusion of all economic aspects. If we answer these questions, this could possibly change the ALM practice entirely.

ALM is a research topic that may be traced back to the seventies of the previous century, when Martin L. Leibowitz and others developed cash-flow matching schemes in the context of the so-called dedicated portfolio theory; see [15]. In the eighties and nineties, ALM was an intensively studied research topic in the mathematical community. This unanimously important branch of quantitative risk management has been dominated by techniques from stochastic control theory; e.g., see [8] by Giorgio Consigli and Michael A. H. Dempster or [16] by Karl Frauendorfer and Michael Schürle for articles utilising dynamic programming. Further aspects of the historical evolution are treated in [29] by Ronald J. Ryan. However, already over-simplified modelling involves a high complexity both from the analytical and the technological viewpoint. A cornerstone in this regard is [23] by Martin I. Kusy and William T. Ziemba.
They consider a rather advanced and flexible ALM model by specifying an economic objective (e.g., the maximisation of the present value of
future profits), a series of constraints (e.g., regulatory requirements and liquidity assurance) and penalty costs for constraint violations.

Prevalent approaches applied in the industry, such as solely immunising oneself against parallel shifts of the yield curve, have remained suboptimal until this day; e.g., see [32] by Martin Spillmann, Karsten Döhnert and Roger Rissi for a recent exposition. Particularly, the empirical approach with static replicating portfolios described in Section 4.2 of [32] is often applied in mid-sized retail banks. Although the impact of stress scenarios is investigated in the light of regulatory requirements, the holistic investment and risk management process of many treasury departments remains insufficient. More sophisticated approaches, such as for instance option-adjusted spread models (e.g., see [2] by Martin M. Bardenhewer), have not become established in the financial industry. Therefore, despite many achievements, it is fair to say that ALM has not entirely fulfilled its high expectations.

The fresh approach on deep hedging by Hans Bühler, Lukas Gonon, Josef Teichmann and Ben Wood (see [7]) might pave the way for a new era in quantitative risk management. It is clear that classical yet analytically intractable problems from ALM can be tackled with techniques inspired by deep reinforcement learning in the case of an underlying Markovian structure; see [17] by Ian Goodfellow, Yoshua Bengio and Aaron Courville for a comprehensive overview of deep learning and [33] by Richard S. Sutton and Andrew G. Barto for an introduction to reinforcement learning in particular. Reinforcement learning has found many applications in games (e.g., see [31] by a research team of Google's DeepMind), robotics (e.g., see [22] by Jens Kober, J.
Andrew Bagnell and Jan Peters) and autonomous vehicles (e.g., see [21], a joint publication of Valeo and academic partners). Recent monographs on machine learning for financial applications are [25] by Marcos López de Prado and [12] by Matthew F. Dixon, Igor Halperin and Paul Bilokon. Regarding applications of neural networks for hedging and pricing purposes, some research articles have accumulated over the last 30 years; [28] by Johannes Ruf and Weiguan Wang provides a comprehensive literature review. Only a handful of these articles exploit the reinforcement learning paradigm. In contrast to these obvious applications of deep reinforcement learning to ALM, deep hedging approaches to ALM are simpler but still provide the solution relevant for business decisions. Notice, however, that neither Markovian assumptions are needed, nor value functions or dynamic programming principles: Deep ALM will simply provide an artificial asset liability manager who precisely solves the business problem (and not more) in a convincing way, i.e., provides ALM strategies along pre-defined future scenarios and stress scenarios. The fundamental ideas will be further elaborated on in the sections below. To the best of our knowledge, we are the first to tackle ALM systematically by deep hedging approaches.

There are many fundamental problems arising in mathematical finance that cannot be treated analytically. Exemplarily, explicit pricing formulae for American options in continuous time are yet unknown even in the simplest settings such as the Black-Scholes framework. This is remarkable in that most exchange-traded options are of American style. Optimal stopping problems in genuine stochastic models involve so-called parabolic integro-differential inequalities, which can be numerically approximated by classical methods up to three dimensions; e.g., see Section 10.7 in [20].
The curse of dimension impedes the feasibility of classical finite element methods for applications involving more than three risk drivers. With
today's computational power of a multi-core laptop, the runtime in six dimensions would last many centuries. A popular circumvention is the regression-based least squares Monte-Carlo method proposed by Francis A. Longstaff and Eduardo S. Schwartz; see [24]. Some statisticians perceive machine learning as a generalisation of the regression technique. In this sense, a spectacular success was achieved by Sebastian Becker, Patrick Cheridito and Arnulf Jentzen when they utilised deep neural networks to price American options in high-dimensional settings within minutes; see [4] for further details. One is forced to abandon the fundamental desire of directly solving a very complex mathematical equation and must approve of the paradigm shift to the deep learning principle: «What would a clever and non-forgetful artificial financial agent with a lot of experience and a decent risk appetite do?». This change of mindset opens the room for many possibilities, not only in quantitative finance. Computationally very intensive techniques such as nested Monte-Carlo become obsolete. We aim at utilising these advances to tackle problems that were formulated back in the eighties, e.g., by Martin I. Kusy and William T. Ziemba (see [23]), and have remained unsolved for the desired degree of complexity ever since. Thus, the technological impact of Deep ALM for the financial industry might be considerable.

The Markov assumption, roughly speaking that future states only depend on the current state and not the past, usually leads to comparatively tractable financial models. Real-world financial time series often feature a leverage effect (i.e., a relationship between the spot and the volatility) and a clustering effect (i.e., a persistence of low and high volatility regimes).
Unless volatility becomes a directly traded product itself (which generally is not the case) and is perceived as an integrable part of the state, an effective hedging strategy can hardly be found analytically. This leads to the concept of platonic financial markets that was introduced in [9] by Christa Cuchiero, Irene Klein and Josef Teichmann. Generally observable events and decision processes are adapted to a strict subfiltration of its counterpart that is generated by the obscure market dynamics. This concept goes in a similar direction as hidden Markov models; e.g., see the textbook [14] by Robert J. Elliott, Lakhdar Aggoun and John B. Moore for a reference. Regarding Deep ALM, this inherent intractability of platonic financial markets is not an impediment at all. Certainly, the replication portfolios along a trained deep neural network only depend on generally observable states. However, the financial market dynamics do not have to satisfy any Markovianity restrictions with respect to the observable states and/or traded products. The deep neural network jointly learns the pricing and the adaptation of hedging strategies in the presence of market incompleteness and model uncertainty; e.g., see [13] by Nicole El Karoui and Marie-Claire Quenez and [1] by Marco Avellaneda, Arnon Levy and Antonio Parás for further details. Hence, Deep ALM unifies key elements of mathematical finance in a fundamental use case.

An essential prerequisite for the viability of Deep ALM in a treasury department is to come up with a sufficiently rich idea of the macro-economic environment and bank-specific quantities such as market risk factors, future deposit evolutions, credit rate evolutions and migrations, stress scenarios and all the parameterisations thereof. In the end, the current state of a bank must be observable.
In contrast to dynamic programming, machine learning solutions are straightforward to implement directly on a cash-flow basis without any simplifications (e.g., a divisibility into simpler subproblems becomes redundant), and their quality can be assessed instantaneously along an arbitrarily large set of validation scenarios. Moreover, the paradigm of letting a smart artificial financial agent (similar to an artificial chess player) with a lot of experience choose the best option allows to tame the curse of dimension, which is the pivotal bottleneck of classical methods. Further aspects such as intricate price dynamics, illiquidity, price impacts, storage cost, transaction cost and uncertainty can easily be incorporated without further ado; all these aspects are hardly treated in existing frameworks. Despite the higher degree of realism, Deep ALM is computationally less intensive than traditional methods. Risk aversion or risk appetite, respectively, can be controlled directly by choosing an appropriate objective associated with positive as well as negative rewards in the learning algorithm. Regulatory constraints can be enforced through adequately chosen penalties. Hence, Deep ALM offers a powerful and high-dimensional framework that supports well-balanced risk-taking and unprecedented risk-adjusted pricing. It surpasses prevalent replication strategies by far. Moreover, it allows to make the whole financial system more resilient through effective regulatory measures.

The article is structured as follows. In Section 2, we describe a prevalent replication methodology for retail banks. Furthermore, we specify similar use cases from other industries. Subsequently, we explain the key notion of Deep ALM in Section 3. A hint about the tremendous potential of Deep ALM is provided in Section 4 in the context of a prototype.
While accounting for the regulatory liquidity constraints, a deep neural network adapts a non-trivial dynamic replication strategy for a runoff portfolio that outperforms static benchmark strategies conclusively. The last section elaborates on implications of Deep ALM and future research. Common misunderstandings from the computational and regulatory viewpoint are going to be clarified.

2. A Static Replication Scheme
2.1. The Treasury Case of a Retail Bank.
The business model of a retail bank relies on making profits from maturity transformations. To this end, a substantial part of their uncertain liability term structure is re-invested. The liability side mainly consists of term and non-maturing deposits. Major positions on the asset side are corporate credits and mortgages. Residual components serve as liquidity buffers, fluctuation reserves and general risk-bearing capital. The inherent optimisation problem within this simplified business situation is already involved. Whereas most cash-flows originating from the liability side are stochastic, the bank is obliged to ensure a well-balanced liquidity management. Regarding short- and long-term liquidity, regulations require them to hold a liquidity coverage ratio (LCR) and a net stable funding ratio (NSFR) beyond 100%. In contrast, market pressure forces them to tighten the liquidity buffers in order to reach a decent return-on-equity.
Asset-liability-committees (ALCO) steer the balance sheet management through risk-adjusted pricing and hedging instruments. Due to the high complexity, many banks predominantly define and monitor static replication schemes. It is worth noticing that ALCOs nowadays often meet only once a month and supervise simplistic measures such as the modified duration and aspects of hedge accounting on loosely time-bucketed liabilities. Many opportunities are overlooked, and many risks remain unhedged; e.g., see the introductory chapter of [32].
Table 1. Static investment scheme of a fixed income portfolio (columns: term in months, volume weight in %, volume in mEUR, number of tranches, and volume in mEUR per tranche)

Crucial business decisions are inconsistent over time and rather reactionary than pro-active. Table 1 illustrates the replication scheme of a generic retail bank. It is inspired from Section 4.2 in [32]. The notional currently allocated to fixed income is rolled over monthly subject to some static weights. The first line corresponds to a liquidity buffer that is held in non-interest-bearing legal tender. Exemplarily, the portion of the currently allocated capital that is invested in fixed income instruments with a given maturity is split into equal tranches, such that roughly the same volume matures each month. In order to end up with a closed cycle, a maturity of i months is maintained with i tranches of equal volume. The redemption amount of a maturing tranche is re-invested again over the same horizon at the latest market rates. The same roll-over procedure is conducted for all maturities. Under the current premises, one tranche per maturity, with a constant total volume, matures each month and is re-invested consistently. The portfolio weights are only revised from time to time. They are typically the result of empirical and historical considerations. On the one hand, it is lucrative to increase the portion in longer terms, as long-term yields are often but not always higher than short-term yields. On the other hand, if only a small portion is released each month, this leaves the bank with a significant interest and liquidity risk. This static scheme is typically accompanied by selected hedging instruments such that the sensitivities with respect to parallel shifts of the yield curve, the so-called modified durations, of the asset and liability side roughly coincide.

Certainly, dynamic strategies (including a restructuring of all tranches) are superior to their static counterparts regarding profit potential and resilience of the enterprise.
However, dealing directly with all tranches is often not feasible computationally. If we only add a simple dynamic feature and allow for arbitrary weights in the monthly re-investment process, this already introduces a considerable additional complexity from the analytical viewpoint. Even if one reduces the dimensionality of the yield curve dynamics, one faces an intricate nested optimisation exercise over all re-investment instances. Utilising traditional methods such as dynamic programming and accounting for a reasonable set of constraints is far from being trivial. Moreover, due to the necessary model simplifications, it is uncertain whether this additional complexity really pays off. It is therefore not surprising that many retail banks have been implementing a rather static replication scheme as described above. In the near future, this is likely to change once the novel and computationally much less intensive Deep ALM with its huge potential is deployed.
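For intuition, the static roll-over mechanics described above can be simulated directly. The following sketch maintains i tranches for a maturity of i months, so that exactly one tranche per maturity matures and is rolled over each month; the weights and the total volume are arbitrary illustrative choices, not the figures of Table 1.

```python
def build_ladder(total, weights):
    """weights: {maturity_in_months: fraction of the total volume}.
    Each maturity m is split into m equal tranches, one maturing per month."""
    return {m: {"tranche_volume": total * w / m,
                "remaining": list(range(1, m + 1))}
            for m, w in weights.items()}

def step_one_month(ladder):
    """One roll-over step: maturing tranches release their volume and are
    re-invested at full maturity; returns the volume released this month."""
    released = 0.0
    for m, book in ladder.items():
        book["remaining"] = [t - 1 for t in book["remaining"]]
        for i, t in enumerate(book["remaining"]):
            if t == 0:                        # tranche matures ...
                released += book["tranche_volume"]
                book["remaining"][i] = m      # ... and is rolled over for m months
    return released

# Illustrative example: three maturities with arbitrary weights
weights = {6: 0.2, 12: 0.3, 60: 0.5}
ladder = build_ladder(100.0, weights)
monthly = [step_one_month(ladder) for _ in range(12)]
# closed cycle: the same volume matures every month, one tranche per maturity
expected = sum(100.0 * w / m for m, w in weights.items())
```

The invariant that exactly one tranche per maturity matures each month is what makes the scheme "static": the released volume is constant, and the decision content of each month is nil.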
2.2. Similar Use Cases.
The notion of replication schemes may be applied analogously to other use cases. In the following, we describe three of them.
2.2.1. Actuarial Perspective.
The situation described above can be transferred one-to-one to the situation of insurance companies. The aspects that change are the completely different regulatory circumstances, the asset illiquidity of hedging instruments (the retrocession) and the stochastic cash-flow generation as well as separate extreme risks on the liability side. Life insurance companies and pension funds must account for the exact product terms and the mortality characteristics of their policyholders. This includes, for instance, longevity and pandemic risks. Non-life insurance companies model the annual loss distribution and its unwinding over consecutive years.
2.2.2. Procurement.
The producing industry is faced with a long and complex process chain. Several raw commodities need to be bought, delivered, stored, processed and designed into a consumer product. In order to control the inherent business risks and ensure profitability, a maturity transformation in the procurement process becomes inevitable. Even though the exact amount of the required materials is yet unknown, the company's exposure towards adverse price movements needs to be hedged accordingly. Despite the seemingly large daily trading volumes, any order comes with a price impact. Controlling the inventory is also worthwhile due to funding and storage cost. Therefore, slightly more involved than the treasury case described above, we have an ALM optimisation in the presence of illiquidity, storage cost, transaction cost and uncertainty.
2.2.3. Hydroelectric Power Plants.
The technological advances regarding renewable energy and the deregulation are having a huge impact on electricity markets. Wind and solar energy make the electricity prices weather-dependent and more volatile. Still, some segments of future markets have remained comparatively illiquid. A peculiarity of the electricity market are its intricate price dynamics (including negative prices) and a wide range of more or less liquidly traded future contracts over varying delivery periods (e.g., intraday, day ahead, weekend, week, months, quarters, years). This makes the estimation of hourly price forward curves (HPFC) that account for seasonality patterns (intraday, weekday, months), trends, holiday calendars and aggregated price information inferred from various future contracts (base, peak and off-peak) a very challenging exercise. Electricity producers may incur significant losses due to adverse contracts with locked-in tariffs and mis-timed hydropower production in reservoirs. This environment forces the suppliers to conduct a rigorous ALM, which comprises, for instance, a competitive pricing of fixed-for-floating contracts and daily optimised power production plans. The situation of delivering competitive yet still profitable margins in the presence of uncertainty is a situation banks have been accustomed to for decades (with a different commodity and different rate dynamics). Deep ALM can be tailored to this very situation and provides exactly the technological solution to this delicate business problem. All
one needs to establish is an appropriate scenario generator for the future markets and the HPFC. Many business-specific peculiarities can hardly be reflected by traditional optimisation techniques. As an illustration, a turbine cannot be turned on and off on an hourly basis. This simple-looking constraint renders the dynamic programming principle impossible. However, this constraint can be accounted for in Deep ALM without further ado by incorporating suitable penalties.
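The penalty mechanism mentioned here and throughout the article can be made concrete in a few lines. The following sketch (our own illustrative example, with a minimum-cash constraint and arbitrary parameter values) turns a hard constraint into a soft one by subtracting a penalty from the reward whenever the constraint is violated:

```python
def reward_with_penalty(pnl, cash, assets, min_cash_ratio=0.1, penalty_rate=100.0):
    """Objective with a soft constraint: instead of declaring states below the
    required cash ratio infeasible, shortfalls are penalised in the reward.
    All names and values are illustrative assumptions."""
    shortfall = max(0.0, min_cash_ratio * assets - cash)
    return pnl - penalty_rate * shortfall
```

Because the penalised objective remains an ordinary function of the strategy, it can be maximised by gradient-based training even when the constraint itself (like the turbine switching restriction) would break the dynamic programming decomposition.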
3. Deep ALM
Beyond professional judgment, practical problems arising in ALM are often tackled either by linear programming or dynamic programming. The latter requires that the problem is divisible into simpler subproblems. For non-linear and high-dimensional cases, one typically exploits Monte-Carlo techniques in order to derive or validate strategies empirically along a handful of criteria. As a matter of fact, the level of sophistication of replicating strategies remains fairly limited due to the high intricacy and the lack of an adequate constructional approach. Deep ALM makes all these impediments obsolete. One directly implements the arbitrarily complex rule book of the use case and lets a very smart and non-forgetful artificial financial agent gain the experience of many thousand years in a couple of hours. By incentivising the desired behaviour and stimulating a swift learning process, one reaches superhuman level in due course.

We consider a balance sheet evolution of a retail bank over some time grid {0, 1, 2, ..., N} in hours, days, weeks or months. The roll-forward is an alternating process between interventions and stochastic updates over time; see Figure 1. At each time instance t, an artificial asset-liability-manager, subsequently denoted the «artificial financial agent» (AFA), assesses the re-investment of matured products as well as the restructuring of the current investment portfolio. This involves various asset classes, not just fixed income instruments. Correspondingly, the AFA is active in both the primary market of newly issued instruments and the secondary market of previously issued and circulating instruments. There may be further eligible transformations of this initial state with transaction cost, e.g., granting further credits or building additional liquidity buffers.
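The alternating cycle of interventions and stochastic updates just described can be captured by a minimal state container. The sketch below is our own heavily simplified illustration, not the paper's model: the account names, the transaction-cost rule and the one-month update dynamics are all arbitrary assumptions chosen only to show the restructure/roll-forward alternation.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class BalanceSheet:
    cash: float         # liquidity buffer
    bonds: float        # fixed income book, marked to market
    equities: float     # equity position
    liabilities: float  # present value of payables

    @property
    def assets(self):
        return self.cash + self.bonds + self.equities

    @property
    def equity(self):
        return self.assets - self.liabilities

def restructure(bs, d_bonds, d_equities, tc=0.001):
    """Intervention at time t: shift cash into bonds/equities, paying a
    proportional transaction cost tc on the equity trade (pre -> post)."""
    bs.cash -= d_bonds + d_equities + tc * abs(d_equities)
    bs.bonds += d_bonds
    bs.equities += d_equities
    return bs

def roll_forward(bs, rng):
    """Stochastic update from t to t+1 (purely illustrative dynamics)."""
    bs.equities *= np.exp(rng.normal(0.004, 0.05))  # one-month equity return
    bs.cash += 0.0025 * bs.bonds                    # coupons paid into cash
    due = 0.01 * bs.liabilities                     # liabilities falling due
    bs.cash -= due
    bs.liabilities -= due
    return bs

rng = np.random.default_rng(1)
bs = BalanceSheet(cash=20.0, bonds=60.0, equities=20.0, liabilities=90.0)
bs = restructure(bs, d_bonds=5.0, d_equities=-3.0)  # intervention
bs = roll_forward(bs, rng)                          # scenario generator step
```

Iterating these two calls over the whole time grid reproduces the alternating structure of Figure 1, with the AFA's decisions supplying the arguments of `restructure`.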
The balance sheet components A_t (assets), L_t (liabilities) and E_t (equity) come with the superscripts «pre» and «post», which refer to before and after the balance sheet restructuring respectively. Preceding interventions, an auxiliary step for the calculation of taxes may be necessary. Subsequently, the economic scenario generator performs a stochastic roll-forward of the balance sheet. All accounts are updated to the latest circumstances and macro-economic factors, e.g., certain products pay off cash amounts such as coupons or dividends, certain products need to be written off due to the bankruptcy of the referenced entity, clients withdraw certain portions of their deposits, etc. It needs to be noted that the described situation is kept simplistic for illustrative purposes. It is the main target of future research work to reach an acceptable level of sophistication in order to make the technology eligible for real world ALM.

The superiority of the dynamic replication scheme requires that the trained AFA can perfectly trade off the benefits of a restructured portfolio and the incurred transaction cost of the restructuring. Furthermore, the trained AFA must impeccably weigh up different objectives such as complying with regulatory constraints, earning a risk-adjusted spread, maintaining an adequate level of liquidity, keeping sufficient
Figure 1. Balance sheet roll-forward
margins for adverse scenarios, etc. The fundamental idea is to parameterise the decision process at each time instance t through a deep neural network (DNN) and to incrementally improve its performance through techniques inspired from machine learning. By concatenating these deep neural networks along the time axis, one ends up with a holistic dynamic replication strategy that easily deals with non-stationary environments. The learning process incentivises the maximisation of an adequately chosen reward function by incrementally updating the weights of the feedforward networks through backpropagation; see Figure 2. This idea allows to tame the curse of dimension. These new possibilities of Deep ALM allow for a more frequent supervision while accounting for arbitrarily many side constraints and complicated market frictions such as illiquidity and offsetting effects of transaction cost in hedges. Thus, the full power of dynamic strategies can be deployed.

Figure 2. Deep neural network architecture
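To make the weight-update loop concrete, the following deliberately tiny sketch is our own illustration, not the paper's implementation: each per-period deep network is replaced by a single trainable parameter, so that the gradient of the expected reward is available in closed form and stands in for backpropagation, and the concatenated strategy is improved by gradient ascent over batches of simulated scenarios. Every numerical value is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 12                 # number of monthly decision steps
mu, sig = 0.02, 0.1    # per-period return distribution of the risky book
lam = 0.5              # quadratic penalty, a stand-in for risk/transaction cost
theta = np.zeros(N)    # one trainable decision parameter per time step

for epoch in range(500):
    r = rng.normal(mu, sig, size=(256, N))     # a batch of simulated scenarios
    # reward per scenario: sum_t (theta_t * r_t - lam * theta_t**2)
    grad = r.mean(axis=0) - 2.0 * lam * theta  # exact gradient of the mean reward
    theta += 0.1 * grad                        # gradient ascent on the reward

# the learned concatenated strategy approaches the optimum mu / (2 * lam)
```

In the genuine setting, `theta` becomes the weights of one feedforward network per time step, the reward aggregates spreads, penalties and transaction costs along whole balance sheet paths, and the exact gradient is replaced by backpropagation through the concatenated graph.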
Traditional numerical schemes describe an algorithm either in order to solve an equation or in order to optimise a certain functional. Typically, algorithms are only accepted by the mathematical community if they come with two supplements, namely a proof that the recipe terminates with a certain accuracy at the desired level, on the one hand, and a statement about the rate of convergence, on the other hand. In the context of Deep ALM, we coin the notion of convincing strategies because we are likely not able to prove that the randomly initialised deep neural network will end up at some optimal replication strategy. In fact, we intend to derive non-trivial but still realistic dynamic replication strategies within a reasonable time that significantly outperform those strategies currently used in the financial industry. From the mathematical viewpoint, Deep ALM entails a delicate paradigm shift. The results will be derived empirically based on scenarios representing the experience of many thousand years. By repeating the learning process a few times, we will corroborate the robustness of the approach. Furthermore, we will validate and challenge the performance of the deep neural network on unseen testing data. If the gradient descent algorithm can only further reduce the training loss with some overfitting strategy at the cost of the validation loss, we have ended up at a local optimiser. Even though we will not be able to prove global optimality, the strategy might still reach superhuman level and, thus, be more than convincing.
4. Empirical Study on a Stylised Case
4.1. The Runoff Case.
We aim at optimising the replication strategy when unwinding a runoff portfolio over ten years in the context of an equity index and a large fixed income universe. At every time instance, roughly 200 bonds are active. This prototype demonstrates the feasibility of Deep ALM in a stylised case. In the end, every balance sheet roll-forward is a superposition of runoff portfolios. Furthermore, we give an indication about the potential of dynamic replication strategies.

The liabilities are backed by a static legacy investment portfolio. Roughly two thirds of the assets are invested in the fixed income market, a further share is invested in equities, and the remainder is held in cash. Every month, six new bonds are issued in the primary market, each with the same face amount; three series have maturities of a few months and three of several years. The first three series only pay a coupon at the final redemption date, the other three pay semi-annual coupons. At issuance, all bonds trade at par. The legacy investment portfolio has attributed equal portions to all of the six bond series and has conducted an equally distributed roll-over strategy based on historical data from the European Central Bank (ECB). Future yield curve scenarios are simulated consistently with a principal component analysis. For simplicity, we assume that there is no secondary market. Hence, bonds cannot be sold on in the market and must be held to maturity. Furthermore, we preclude credit defaults. Regarding equities, we generated scenarios based on a discretised geometric Brownian motion. Changing the position in equities involves proportional transaction cost. Due to legal requirements, a minimum share of the asset portfolio must be held in cash; this is just an arbitrary choice in order to incorporate a regulatory constraint. Any discrepancy is penalised monthly with a high interest rate. The initial balance sheet hosts assets with a prescribed total value in monetary units.
The term structure of liabilities with the same face amount is spread across the time grid according to some beta distribution.

4.2. The Implementation Strategy.
A well-established approach to tackle quantitative finance problems with deep reinforcement learning is to follow the standard routine inspired from games. A game is a stochastic environment in which a player can act according to a set of rules and can influence the course of events from start to finish to a certain degree. The game terminates after a prespecified number of rounds. A rule book includes the model drivers and their interdependence, the constraints and the penalties for constraint violations, the control variables and the objective function. Notably, we actually do not solve the game for all initial states, nor do we assume any Markovian structure. We implemented the «game» according to the rule book specified below in a Jupyter Notebook such that the random outcome of any strategy could be simulated arbitrarily often. To this end, we defined an economic scenario generator that hosted the yield curve dynamics and spot prices for equities. Having this environment in place, one can test the performance of any naïve or traditionally derived replication portfolio. The step of translating the rule book into model dynamics is crucial for understanding the game, for ensuring a well-posed optimisation problem and for streamlining the logic of the subsequent tensorisation. The evolution of $\mathbb{R}^N$-valued quantities can be inferred from historical time series with a principal component analysis as proposed by Andres Hernandez in [19]. Subsequently, the game is embedded in a deep neural network graph architecture using TensorFlow such that an AFA can run many games in parallel and adaptively improve its performance. The graph consists of many placeholders and formalises the logic of the game. The first decision process of the AFA is managed through the randomly initialised weights of a neural network with a handful of hidden layers.
Its input layer is a parameterisation of the initial state, concerning both the balance sheet structure and the market quotes, and the output layer characterises the eligible balance sheet transformation. The balance sheet roll-forward is accomplished by connecting the output of the neural network with a whole array of efficient tensor operations reflecting the balance sheet restructuring, new inputs for the non-anticipative scenario updates, and the input layer of the next decision process. Iterating this process yields the desired deep neural network.

4.3. Term Structures.
We are given a discrete time grid $\mathbb{T} = \{0, 1, 2, \dots, N\}$ in months, where $N = 120$ refers to a planning horizon of ten years. The initial term structure of liabilities is spread across the time grid according to a beta distribution with fixed parameters $a$ and $b$, i.e., if $F_{a,b}$ denotes the corresponding cumulative distribution function, then the cash-flow
\[
L(t) := \Big( F_{a,b}\big(t/N\big) - F_{a,b}\big((t-1)/N\big) \Big) \times \text{face amount}
\]
becomes due in month $t \in \mathbb{T}' := \mathbb{T} \setminus \{0\}$. We briefly write $L := \big(L(1), L(2), \dots, L(N)\big)$ for the collection of all payables. At some fixed time instance, any active bond is characterised through its future cash-flows $B = \big(B(1), B(2), \dots, B(N)\big) \in \mathbb{R}^N$. If $Y = \big(Y(1), Y(2), \dots, Y(N)\big) \in \mathbb{R}^N$ denotes the current yield curve with the yield $Y(T)$ for the maturity $T/12$ in years, and $D = \big(D(1), D(2), \dots, D(N)\big) \in \mathbb{R}^N$ with $D(T) = e^{-T/12 \cdot Y(T)}$ for all $T \in \mathbb{T}'$ the consistent discount factors, then the value of $B$ is simply $V(B) = \langle D, B \rangle$, where $\langle \cdot, \cdot \rangle$ stands for the standard inner product. The coupons are chosen such that each bond trades at par. The cash-flow generation at issuance is straightforward; it corresponds to solving a linear equation. Exemplarily, a bond with a maturity of 5y and semi-annual coupons carries, per unit of face amount, the annualised coupon rate
\[
c = 2 \cdot \frac{1 - D(60)}{\sum_{k=1}^{10} D(6 \cdot k)}.
\]
Therefore, it holds $B(6 \cdot k) = \tfrac{c}{2}$ for all $k \in \{1, 2, \dots, 9\}$, $B(60) = 1 + \tfrac{c}{2}$, and all other components are zero, again per unit of face amount. When going from one time instance to the next, $B(1)$ is paid off and one simply applies to the scheme $B$ the linear update operator $U \in \mathbb{R}^{N \times N}$ with ones on the first superdiagonal and zeros elsewhere, i.e., $(UB)(t) = B(t+1)$ for $t < N$ and $(UB)(N) = 0$. A $k$-fold application of the update operator is denoted by $U^k$.
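The par-coupon formula and the update operator can be verified numerically. The sketch below uses an arbitrary flat yield curve of 1% purely for illustration; by construction, the resulting cash-flow vector must value exactly to par.

```python
import numpy as np

N = 120                              # monthly grid, ten-year horizon
Y = np.full(N, 0.01)                 # illustrative flat yield curve (1%)
T = np.arange(1, N + 1)              # maturities in months
D = np.exp(-T / 12.0 * Y)            # discount factors D(T) = exp(-T/12 * Y(T))

# 5y bond (60 months) with semi-annual coupons, trading at par:
# c = 2 * (1 - D(60)) / sum_{k=1..10} D(6k)
pay_idx = 6 * np.arange(1, 11)       # coupon months 6, 12, ..., 60
c = 2.0 * (1.0 - D[59]) / D[pay_idx - 1].sum()

B = np.zeros(N)                      # cash-flow vector per unit face amount
B[pay_idx - 1] = c / 2.0             # semi-annual coupons
B[59] += 1.0                         # redemption at maturity

value = D @ B                        # V(B) = <D, B>, equals par (= 1) exactly

# the update operator U shifts the cash-flow vector one month forward
U = np.eye(N, k=1)                   # ones on the first superdiagonal
B_next = U @ B                       # B(1) has been paid off
```

The par identity holds for any yield curve, since $(c/2)\sum_k D(6k) + D(60) = (1 - D(60)) + D(60) = 1$ by construction of $c$.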
We utilise data from the European Central Bank (ECB) in order to model the historical and future yield curves. For each working day, the ECB publishes the Svensson parameters β0, β1, β2, β3, τ1 and τ2 of its yield curve. The corresponding yield curve is the vector Y = (Y(1), Y(2), ..., Y(N)) ∈ R^N with

Y(T) = β0 + β1 · (1 − e^{−T/(12τ1)}) / (T/(12τ1)) + β2 · ( (1 − e^{−T/(12τ1)}) / (T/(12τ1)) − e^{−T/(12τ1)} ) + β3 · ( (1 − e^{−T/(12τ2)}) / (T/(12τ2)) − e^{−T/(12τ2)} ).

We fix some historical time series in R^N and conduct a principal component analysis. To this end, we assume stationary daily yield curve increments ∆X with expected value E[∆X] ≡ µ ∈ R^N and covariance matrix cov(∆X) ≡ Q ∈ R^{N×N}. We spectrally decompose Q = Λ L Λ^⊤ into the normalised eigenvectors Λ ∈ R^{N×N} and the ordered eigenvalues L = diag{λ(1), λ(2), ..., λ(N)} with λ(1) ≥ λ(2) ≥ ... ≥ λ(N). The last historical yield curve corresponds to the day on which we acquire the runoff portfolio. The stochastic yield curve increments from one month to the next are consistently sampled according to

∆Y = 22 · µ + Σ_{k=1}^{n} Z_k Λ_{·k},

where we choose n = 3 and Z_k ∼ N(0, σ_k²) with σ_k = √(22 · λ(k)). The factor 22 accounts for the number of trading days per month. If i denotes the maturity in number of months, the legacy bond portfolio has been investing each month a face amount of 100/i monetary units in that particular bond series for i ∈ I, the set of six traded maturities. This is consistent with the replication scheme from Table 1.

4.4. The Equity Market.
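The equity dynamics of this subsection can be simulated in a few lines (Python/numpy; the random seed is arbitrary):

```python
import numpy as np

def simulate_equity(n_months=120, S0=100.0, m=0.05, s=0.18, rng=None):
    """Discretised geometric Brownian motion on a monthly grid:
    log(S_t / S_{t-1}) ~ N((m - s^2/2)/12, s^2/12)."""
    rng = np.random.default_rng() if rng is None else rng
    z = rng.normal((m - s**2 / 2) / 12, s / np.sqrt(12), size=n_months)
    return S0 * np.exp(np.concatenate([[0.0], np.cumsum(z)]))

path = simulate_equity(rng=np.random.default_rng(0))
print(path[0], len(path))  # 100.0 121
```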
We assume that the stochastic process S = (S_t)_{t∈T} follows a discretised geometric Brownian motion, i.e., for all t ∈ T_0 it holds that

log( S_t / S_{t−1} ) ∼ N( (m − s²/2) · 1/12, s² · 1/12 )

with initial level S_0 = 100, drift m = 5% and volatility s = 18%. Changing the position in equities involves proportional transaction costs at the rate κ.

4.5. The Balance Sheet.
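The additive balance-sheet decomposition of this subsection can be captured in a small helper class (a Python sketch; the class name and the numbers in the example are our own, not from the text):

```python
from dataclasses import dataclass

@dataclass
class BalanceSheet:
    """Pre-restructuring balance sheet at time t (notation as in the text)."""
    C: float        # held cash
    V: float        # value of the legacy bond portfolio
    delta: float    # holding in equities
    S: float        # equity index level
    L: float        # present value of the liabilities <D_t, U^t L>

    @property
    def G(self):    # equity position worth delta * S_t
        return self.delta * self.S

    @property
    def A(self):    # total present value of the asset side
        return self.C + self.V + self.G

    @property
    def E(self):    # residue A - L (the equity of the balance sheet)
        return self.A - self.L

bs = BalanceSheet(C=10.0, V=70.0, delta=0.1, S=100.0, L=86.0)
print(bs.A, bs.E)  # 90.0 4.0
```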
We proceed iteratively. Let Y_t ∈ R^N denote the current yield curve for t ∈ T \ {N} and D_t the consistent discount factors. Before restructuring, the left-hand side of the balance sheet with a total present value of A_t^pre consists of three additive components: the held cash C_t^pre, the legacy bond portfolio with the value V_t^pre, and the equity position worth G_t^pre = ∆_t^pre · S_t. The right-hand side of the balance sheet consists of the liabilities L_t^pre = ⟨D_t, U^t L⟩ and the residue E_t^pre = A_t^pre − L_t^pre.

4.6. The Deep Neural Network Architecture.
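A minimal numpy sketch of one decision network F_t; the hidden width of 32 and the random initialisation are assumptions of ours, as the text does not report them:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

class DecisionNetwork:
    """Feedforward net F_t = (relu o W2) o (relu o W1) o (relu o W0).
    Inputs: leverage ratio, liquidity ratio, risk portion, equity holding,
    and the six yields of the latest bond series -> 10 features.
    Outputs: six bond holdings plus the equity holding -> 7 numbers,
    non-negative by construction (long-only)."""

    def __init__(self, n_in=10, n_hidden=32, n_out=7, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        self.layers = []
        for a, b in [(n_in, n_hidden), (n_hidden, n_hidden), (n_hidden, n_out)]:
            W = rng.normal(0.0, np.sqrt(2.0 / a), size=(b, a))  # He init
            self.layers.append((W, np.zeros(b)))

    def __call__(self, x):
        for W, b in self.layers:
            x = relu(W @ x + b)   # ReLU after every affine map, incl. output
        return x

net = DecisionNetwork(rng=np.random.default_rng(0))
out = net(np.ones(10))
print(out.shape)  # (7,)
```

The terminal ReLU is what enforces the long-only constraint mentioned in the text.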
For any t ∈ T \ {N}, we consider a feedforward neural network

F_t = (φ ∘ W_t^(2)) ∘ (φ ∘ W_t^(1)) ∘ (φ ∘ W_t^(0))

with affine functions W_t^(0) : R^10 → R^d, W_t^(1) : R^d → R^d and W_t^(2) : R^d → R^7 for some hidden width d, and the ReLU activation function φ(x) = max{x, 0}. The input layer consists of the leverage ratio E_t^pre / A_t^pre, the liquidity ratio C_t^pre / A_t^pre, the risk portion G_t^pre / A_t^pre, the holding in equities ∆_t^pre and the yields Y_t^(i) for i ∈ I of the latest bond series. The output layer reveals the outcome of the restructuring at the time instance t. The first six components represent the holdings h_t^(i) for i ∈ I in the latest bond series; the last component represents the holding ∆_t^post in equities. Note that the activation function implies long-only investments.

4.7. The Restructuring.
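The restructuring equations of this subsection translate into code as follows (Python; the notional of 100 per unit holding matches the cash-flow scheme, while the value of κ is an illustrative assumption):

```python
import numpy as np

def restructure(C, V, delta_pre, S, h, delta_post, kappa=0.005, face=100.0):
    """Balance-sheet update: buy bonds at par, trade equities with
    proportional cost kappa on the traded volume."""
    trade = delta_post - delta_pre
    C_post = C - face * np.sum(h) - (trade + kappa * abs(trade)) * S
    V_post = V + face * np.sum(h)
    G_post = delta_post * S
    return C_post, V_post, G_post

# Example: buy 0.1 units of one bond series and 0.05 units of equities.
C1, V1, G1 = restructure(C=30.0, V=60.0, delta_pre=0.0, S=100.0,
                         h=np.array([0.1, 0.0, 0.0, 0.0, 0.0, 0.0]),
                         delta_post=0.05)
print(C1, V1, G1)  # 14.975 70.0 5.0
```

Total assets shrink exactly by the transaction costs κ·|∆^post − ∆^pre|·S_t, as the equations require.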
The restructuring implies the updated balance sheet items

C_t^post = C_t^pre − 100 · Σ_{i∈I} h_t^(i) − ( (∆_t^post − ∆_t^pre) + κ · |∆_t^post − ∆_t^pre| ) · S_t,
V_t^post = V_t^pre + 100 · Σ_{i∈I} h_t^(i),
G_t^post = ∆_t^post · S_t,
A_t^post = C_t^post + V_t^post + G_t^post,
L_t^post = L_t^pre,
E_t^post = A_t^post − L_t^post.

4.8. The Roll-Forward.
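The cash component of the roll-forward, including the liquidity penalty, might look as follows in Python; the minimum cash quota q = 10% is a placeholder assumption (the exact regulatory fraction is not reproduced here), as is the pro-rata monthly application of the annual penalty rate:

```python
def roll_forward_cash(C_post, B_next, L_due, A_post, q=0.10, p=0.24):
    """Next month's cash position before restructuring.

    C_post : cash after restructuring at time t
    B_next : first component of the portfolio cash flows, pi1(B_t^post)
    L_due  : liability due this month, pi1(U^t L)
    q      : minimum cash quota (assumption)
    p      : annual liquidity penalty rate, applied pro rata per month
    """
    penalty = p / 12 * max(q * A_post - C_post, 0.0)
    return C_post + B_next - L_due - penalty

print(roll_forward_cash(5.0, 2.0, 1.0, 100.0))  # 5.9 (penalty of 0.1 applies)
```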
Let B_t^post ∈ R^N denote the aggregated future cash flows of the whole fixed income portfolio after restructuring at time t. Furthermore, let π^(1) : R^N → R denote the projection onto the first component. If one does not adhere to the regulatory constraint of holding at least a fixed fraction q of the assets in cash, one is penalised with liquidity costs of p = 24% per annum on the discrepancy. Once the yield curve has been updated stochastically to the state Y_{t+1} with consistent discount factors D_{t+1} and the equity index has been updated stochastically to the state S_{t+1}, this leaves us with the roll-forward

C_{t+1}^pre = C_t^post + π^(1)(B_t^post) − π^(1)(U^t L) − (p/12) · max{ q · A_t^post − C_t^post, 0 },
V_{t+1}^pre = ⟨D_{t+1}, U B_t^post⟩,
G_{t+1}^pre = ∆_t^post · S_{t+1} = ∆_{t+1}^pre · S_{t+1},
A_{t+1}^pre = C_{t+1}^pre + V_{t+1}^pre + G_{t+1}^pre,
L_{t+1}^pre = ⟨D_{t+1}, U^{t+1} L⟩,
E_{t+1}^pre = A_{t+1}^pre − L_{t+1}^pre.

4.9. Objective.
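A sketch of the smoothed iso-elastic objective of this subsection in Python; the value ε = 10⁻⁶ is illustrative:

```python
import numpy as np

def iso_elastic(x, gamma):
    """CRRA utility: (x^(1-gamma) - 1)/(1 - gamma); log(x) for gamma = 1."""
    x = np.asarray(x, dtype=float)
    if gamma == 1:
        return np.log(x)
    return (x ** (1 - gamma) - 1) / (1 - gamma)

def objective(E_T, E_0, gamma=1.0, eps=1e-6):
    """Monte-Carlo objective u((eps + max(E_T, 0)) / E_0), averaged over
    scenarios; the shift by eps keeps the argument strictly positive
    even when the terminal equity E_T goes negative."""
    ratio = (eps + np.maximum(E_T, 0.0)) / E_0
    return iso_elastic(ratio, gamma).mean()
```

Maximising this objective over the network parameters by stochastic gradient ascent recovers, for γ = 1 and positive terminal equity, the growth-optimal portfolio mentioned in the text.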
For different maturities T ∈ T_0, a common approach in mathematical finance is to maximise E[ u( E_T^post / E_0^pre ) ], where

u(x) = (x^{1−γ} − 1) / (1 − γ) if γ ≠ 1, and u(x) = log x if γ = 1,

denotes the iso-elastic utility function with constant relative risk aversion γ ≥ 0. This is not directly applicable in our case since we cannot prevent E_T^post from becoming negative. Instead, we aim at maximising

E[ u( ( ε + φ(E_T^post) ) / E_0^pre ) ]

for a small constant 0 < ε ≪ 1. Provided that the final equity distribution is positive, the case γ = 1 corresponds to the quest for the growth-optimal portfolio that maximises the expected log-return; see [27]. An intriguing alternative is quadratic hedging, where one aims at minimising

E[ ( E_T^post − (1 + r)^{T/12} E_0^pre )² ]

for some annualised target return-on-equity r > 0. Either case leads to convincing strategies.

4.10. Results.
The optimisation is only non-trivial for non-negative yield curves. If the mid-term yields are consistently negative (as they have been in the EUR area for quite some years), holding cash is superior to the considered fixed income investments (within the simplified premises specified above). As a starting point, we choose the ECB yield curve as per December 31, 2007. In that case, the initial value of the liabilities, L_0^pre, amounts to approximately 86 monetary units. As outlined above, the exercise of optimising the equity distribution after the settlement of all liabilities is extremely challenging with conventional methods. The investment universe consists at each time instance of eight active assets, namely the six latest bond series, an equity index and legal tender. We propose a simple benchmark strategy that allocates the available cash beyond the required liquidity buffer of A_t^pre to bonds of a single fixed maturity, provided that the coupons are positive. Equities are divested linearly over time. Moreover, we train an AFA on
10 000 unwinding scenarios following the ideas of Deep ALM. A dynamic strategy that was inaccessible until only recently outperforms the benchmark strategy conclusively after a short learning process of a few minutes. Using another
10 000 previously unseen validation scenarios, Figure 3 illustrates the final equity distributions after ten years for the classical static benchmark strategy in blue and for the dynamic strategy in orange. The dynamic strategy does not involve extreme risk taking; it simply unveils hidden opportunities. The systematic excess return-on-equity with respect to the benchmark is substantial; see Figure 3.
Figure 3.
Line-up between the benchmark and the trained AFA

5. Outlook and Further Research
5.1. Potential of Deep ALM.
The real potential can hardly be foreseen. One achievement will lead to another and trigger further accomplishments. Despite this optimism, the technique is not yet mature and requires an adequate level of research. Market participants across all use cases with whom we spoke like the intriguing idea of Deep ALM. However, only a few dare to be first movers in this precompetitive phase. ALM is at the heart of risk management for any enterprise in the financial industry. In the light of the new opportunities, it is not acceptable how simplistic and ineffective prevalent replication schemes remain to this day. All of us are directly affected by unhedged risks of bank deposits, insurance policies and pension funds. This can be illustrated with the example of Finland, whose banking sector caused a devastating financial crisis in the early 1990s; in the aftermath, official unemployment rates escalated dramatically. Deep ALM offers a powerful framework that supports well-balanced risk-taking and unprecedented risk-adjusted pricing. Thus, its rigorous application may prevent a financial crisis sooner or later. Likewise, we are all exposed to climate change; hence, it is becoming increasingly important to spend resources wisely. Deep ALM promotes this attitude with significant efficiency gains in the procurement of commodities and in hydropower production. These are promising prospects for Deep ALM indeed.

5.2. Necessity of Further Research.
Some people argue that machine learning will facilitate enterprise risk management, make it more effective and even render some of the current tasks redundant. We agree that the new opportunities will allow for more consistent and sound risk taking. However, we are convinced that legitimate applications of Deep ALM will entail a lot of additional work, since the previously inaccessible analyses do not come for free. Moreover, they will raise a whole host of new economic challenges and obligations. Thus, it is essential that the research community prepares a solid ground for this technological transformation.

5.3. Feasibility.
Optimising a whole term structure of assets and liabilities by utilising techniques inspired by deep reinforcement learning is a novel approach, and its feasibility needs to be demonstrated. In the light of recent advances, we are optimistic that this goal can be achieved in due course. Nonetheless, this challenge opens a great field of research questions from the mathematical, computational and economic viewpoints. Regarding feasibility, it is an important concern that we do not just want to feed an utterly complex mathematical system into a powerful supercomputer and see whether we get something out. Rather, we aim at formulating and optimising dynamic replication schemes that meet the requirement of industrial applicability, on the one hand, and operability on a moderately enhanced computer, on the other hand. In particular, this involves an assessment of whether the behaviour of the trained AFA is interpretable, whether it follows sound economic reasoning and whether the non-anticipative dynamic replication strategies fulfil the basic requirement of practical viability. Solving a high-dimensional utility maximisation problem that entails unrealistic leverage or arbitrary rebalancing frequencies is futile for real-world applications. For this purpose, Deep ALM must trade off many objectives such as cost and benefit of liquidity, drawdown constraints, moderate leverage ratios, profit expectation, regulatory interventions, risk appetite, short sale restrictions, transaction costs, etc. Most aspects can be accounted for in the model by means of penalties that incentivise the AFA's behaviour accordingly. Others, such as drawdown constraints and profit expectation, typically enter the target function of the optimisation. Finding the right balance between all the different goals without adversely affecting the robustness of the learning process entails some engineering work.
However, once a convincing and stable economic model choice has been established, it will have a tremendous impact on the economic research field. Previously inaccessible price tags on crucial economic notions, such as the cost of liquidity for a retail bank, can be evaluated quantitatively. All one needs to do is to line up the performance measures of trained AFAs when short sale restrictions on the cash position are enforced and dropped, respectively.

5.4. Goal-based Investing.
A subject closely related to Deep ALM is goal-based investing; e.g., see [6] by Sid Browne for a neat mathematical treatment. One aims at maximising the probability of reaching a certain investment goal at a fixed maturity. From the theoretical viewpoint with continuous rebalancing, this is equivalent to replicating a volume-adjusted set of digital options (unless the target return is less than or equal to the risk-free rate). Despite being very intuitive, this result is only of limited use for practical applications. Discrete rebalancing and the payoff discontinuity at the strike lead, as one gets closer and closer to maturity, to an inherent shortfall risk way beyond the invested capital. It needs to be investigated whether Browne's setting can be modified accordingly for real-world applications. This involves more realistic dynamics of the economic scenario generator, exogenous income (as inspired by Section 7 of [6]), maximal drawdown constraints and a backtesting with historical time series.
5.5. Generalisation.
If we take one step further and look at ALM as an abstract framework, Deep ALM may contribute crucially to urgent challenges of our society, such as combating climate change or dealing with pandemics. It trains a player to reach a certain goal while minimising the involved cost. Regarding climate change, there is a complex trade-off in implementing protection measures, incentivising behavioural changes, transforming the energy mix, etc. Regarding COVID-19, rigorous emergency policies were enforced throughout the world in order to protect individual and public health. These measures involved extreme costs such as temporary collapses of certain industries, closed schools, restricted mobility, etc. These are again high-dimensional problems which cannot be tackled with classical methods. Deep ALM is a first step towards these more difficult challenges.

5.6. Impediments.
A challenge will be to overcome the common misconceptions that deep learning requires a lot of historical data (which banks or commodity companies usually do not have), that deep learning is an incomprehensible black box, and that regulators will never approve of Deep ALM. Firstly, the above approach does not require any historical data. All one needs are model dynamics of ordinary scenarios and stress scenarios; the training data can be built fully synthetically on these premises. Secondly, the performance of deep neural networks can be validated without further ado along an arbitrarily large set of scenarios. To our mind, this is one of the most striking features of Deep ALM (which is not shared by traditional ALM models). Traditional hybrid pricing and risk management models entail an intensive validation process; this intensive work becomes obsolete and can be invested into stronger risk management platforms instead. Thirdly and lastly, as long as one provides the regulators with convincing arguments regarding the accuracy and resilience of the approach (e.g., in terms of a strong validation and model risk management framework), they will not object to the use of deep learning.

5.7. Regulatory Perspective.
The complex regulatory system in place is supposed to make the financial system less fragile. However, designing and implementing effective policies for the financial industry, which do not promote wrong incentives and which are not (too) detrimental to economic growth, is very demanding. This is highlighted exemplarily in the IMF working paper [30]. An illustration that certain measures do not necessarily encourage the desired behaviour can be found in [18] by Martin Grossmann, Markus Lang and Helmut Dietl. Initially triggered by the bankruptcy of Herstatt Bank (1974), the Basel Committee on Banking Supervision has established a framework that specifies capital requirements for banks. It has been revised heavily twice since its publication in 1988. The current framework, also known as «Basel III» (see [3]), was devised in the aftermath of the 2007/2008 credit crunch. Amongst others, it proposed requirements for the LCR and the NSFR. The impact of these liquidity rules was empirically assessed in [10] by Andreas Dietrich, Kurt Hess and Gabrielle Wanzenried. Due to limited historical data and the intricate interplay between the plethora of influencing factors, the true impact of the new regime can hardly be isolated. Deep ALM opens an extremely appealing new way of appraising the effectiveness of regulatory measures. To this end, one investigates how a very smart and experienced AFA changes her behaviour when certain regulatory measures are enabled or disabled, respectively. This approach allows one to truly identify the impact of supervisory interventions and their mutual compatibility. These quantitative studies will be a promising supplement to Deep ALM.

5.8. Model Risk Management.
The supervisory letter SR 11-7 of the U.S. Federal Reserve System («Fed») has become a standard reference for model risk management; see [5]. Controlling the inherent model risk and limitations will be a key challenge before Deep ALM may be deployed productively. While the performance of proposed strategies can be checked instantaneously for thousands of scenarios within the model, assessing the discrepancy between the Deep ALM model and the real world is a crucial research topic. Once a robust Deep ALM learning environment has been established, these aspects can be explored by means of various experiments. For instance, one can analyse the sensitivities of the acquired replication strategies with respect to the model assumptions, or their performance in the presence of parameter uncertainties. These uncertainties may concern both the assumed states (e.g., the term structure of deposit outflows) and the concealed scenario generation for both the training and the validation (e.g., the drift assumptions of equities).

5.9. Risk Policy Perspective.
Another intriguing question is the quest for locally stable strategic asset allocations. Typically, the trained neural network will reshuffle the balance sheet structure right from the beginning. Characterising no-action regions corresponds to identifying locally optimal initial states. Concerning model risk management, revealing the link between different model assumptions and internal risk policies, on the one hand, and the no-action regions, on the other hand, will be a powerful tool for risk committees.
References
1. Avellaneda, M., Levy, A., Parás, A. (1995). Pricing and hedging derivative securities in markets with uncertain volatilities. Applied Mathematical Finance, Vol. 2, No. 2, pp. 73–88.
2. Bardenhewer, M. M. (2007). Modeling Non-maturing Products. In: Matz, L., Neu, P. (eds.), Liquidity Risk Measurement and Management. John Wiley & Sons.
3. Basel Committee on Banking Supervision (2011). Basel III: A global regulatory framework for more resilient banks and banking systems (revised version).
4. Becker, S., Cheridito, P., Jentzen, A. (2019). Deep Optimal Stopping. Journal of Machine Learning Research, Vol. 20, No. 74, pp. 1–25.
5. Board of Governors of the Federal Reserve System (2011). SR 11-7: Guidance on Model Risk Management.
6. Browne, S. (1999). Reaching goals by a deadline: digital options and continuous-time active portfolio management. Advances in Applied Probability, Vol. 31, No. 2, pp. 551–577.
7. Bühler, H., Gonon, L., Teichmann, J., Wood, B. (2019). Deep Hedging. Quantitative Finance, Vol. 19, No. 8, pp. 1271–1291.
8. Consigli, G., Dempster, M. A. H. (1998). Dynamic stochastic programming for asset-liability management. Annals of Operations Research, Vol. 81, pp. 131–162.
9. Cuchiero, C., Klein, I., Teichmann, J. (2017). A fundamental theorem of asset pricing for continuous time large financial markets in a two filtration setting. To appear in Theory of Probability and its Applications, arXiv:1705.02087.
10. Dietrich, A., Hess, K., Wanzenried, G. (2014). The good and bad news about the new liquidity rules of Basel III in Western European countries. Journal of Banking & Finance, Vol. 44, pp. 13–25.
11. Dietrich, A., Krabichler, T., Wanzenried, G. (2020). The impact of the Liquidity Coverage Ratio on Swiss retail banks. Work in progress.
12. Dixon, M. F., Halperin, I., Bilokon, P. (2020). Machine Learning in Finance. Springer Verlag Berlin Heidelberg.
13. El Karoui, N., Quenez, M.-C. (1995). Dynamic Programming and Pricing of Contingent Claims in an Incomplete Market. SIAM Journal on Control and Optimization, Vol. 33, No. 1, pp. 29–66.
14. Elliott, R. J., Aggoun, L., Moore, J. B. (1995). Hidden Markov Models. Springer Verlag Berlin Heidelberg.
15. Fabozzi, F. J., Leibowitz, M. L., Sharpe, W. F. (1992). Investing: the collected works of Martin L. Leibowitz. Probus Professional Pub.
16. Frauendorfer, K., Schürle, M. (2003). Management of non-maturing deposits by multistage stochastic programming. European Journal of Operational Research, Vol. 151, No. 3, pp. 602–616.
17. Goodfellow, I., Bengio, Y., Courville, A. (2016). Deep Learning. MIT Press.
18. Grossmann, M., Lang, M., Dietl, H. (2016). Why Taxing Executives' Bonuses Can Foster Risk-Taking Behavior. Journal of Institutional and Theoretical Economics, Vol. 172, No. 4, pp. 645–664.
19. Hernandez, A. (2016). Model Calibration with Neural Networks. https://ssrn.com/abstract=2812140.
20. Hilber, N., Reichmann, O., Schwab, C., Winter, C. (2013). Computational Methods for Quantitative Finance. Springer Verlag Berlin Heidelberg.
21. Kiran, B. R., Sobh, I., Talpaert, V., Mannion, P., Al Sallab, A. A., Yogamani, S., Pérez, P. (2020). Deep Reinforcement Learning for Autonomous Driving: A Survey. arXiv:2002.00444.
22. Kober, J., Bagnell, J. A., Peters, J. (2013). Reinforcement learning in robotics: A survey. The International Journal of Robotics Research, Vol. 32, No. 11, pp. 1238–1274.
23. Kusy, M. I., Ziemba, W. T. (1983). A Bank Asset and Liability Management Model. IIASA Collaborative Paper, http://pure.iiasa.ac.at/id/eprint/2323/.
24. Longstaff, F. A., Schwartz, E. S. (2001). Valuing American Options by Simulation: A Simple Least-Squares Approach. The Review of Financial Studies, Vol. 14, No. 1, pp. 113–147.
25. López de Prado, M. (2018). Advances in Financial Machine Learning. John Wiley & Sons.
26. Merton, R. C. (1974). On the Pricing of Corporate Debt: The Risk Structure of Interest Rates. Journal of Finance, Vol. 29, No. 2, pp. 449–470.
27. Platen, E., Heath, D. (2006). A Benchmark Approach to Quantitative Finance. Springer Verlag Berlin Heidelberg.
28. Ruf, J., Wang, W. (2019). Neural networks for option pricing and hedging: a literature review. arXiv:1911.05620.
29. Ryan, R. J. (2013). The Evolution of Asset/Liability Management. CFA Institute Research Foundation Literature Review, https://ssrn.com/abstract=2571462.
30. Sharma, S., Chami, R., Khan, M. (2003). Emerging Issues in Banking Regulation. IMF Working Papers, WP/03/101.
31. Silver, D., Hubert, T., Schrittwieser, J., Antonoglou, I., Lai, M., Guez, A., Lanctot, M., Sifre, L., Kumaran, D., Graepel, T., Lillicrap, T., Simonyan, K., Hassabis, D. (2018). A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, Vol. 362, No. 6419, pp. 1140–1144.
32. Spillmann, M., Döhnert, K., Rissi, R. (2019). Asset Liability Management (ALM) in Banken. Springer Gabler.
33. Sutton, R. S., Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press.