A neural network model for solvency calculations in life insurance
Lucio Fernandez-Arjona
University of Zurich
December 2019
Abstract
Insurance companies make extensive use of Monte Carlo simulations in their capital and solvency models. To overcome the computational problems associated with Monte Carlo simulations, most large life insurance companies use proxy models such as replicating portfolios. In this paper, we present an example based on a variable annuity guarantee, showing the main challenges faced by practitioners in the construction of replicating portfolios: the feature engineering step and the subsequent basis function selection problem.

We describe how neural networks can be used as a proxy model and how to apply risk-neutral pricing on a neural network to integrate such a model into a market risk framework. The proposed model naturally solves the feature engineering and feature selection problems of replicating portfolios.
Keywords:
Economic Capital; Swiss Solvency Test; Solvency II; Neural Networks; Nested Monte Carlo; Replicating Portfolios.
Correspondence to: L. Fernandez-Arjona, University of Zurich, [email protected]

1 Introduction
Insurance companies rely on financial models for quantitative risk management. These risk models should be accurate and fast in terms of the calculation of risk figures such that the rapid pace of market environments is matched. Life insurance companies face the challenge of having to quickly revalue their liabilities under economic stress scenarios based on market-consistent valuation principles.

Typically, insurance liabilities exhibit features, such as options and guarantees, comparable to standard financial products. Unlike for the latter, there are generally no closed-form formulas for the valuation of the former. Because of this, the use of numerical methods, such as Monte Carlo techniques, becomes inevitable. However, the choice of the specific technique to be used is the key factor in the accuracy and speed of the calculation of risk figures.

Nested Monte Carlo, a straightforward approach to this problem, is computationally burdensome to the point of being infeasible in most cases. Other standard techniques to approach this problem are the least squares Monte Carlo approach (LSMC) and the replicating portfolio approach (RP). These techniques, and their advantages and disadvantages, will be described in more detail in the following sections.

The availability of accurate and fast valuation methods is of great interest to risk management practitioners in life insurance companies. In the last decade, machine learning models based on neural networks have emerged as the most accurate in many fields, among them image recognition, natural language understanding and robotics. While not the first to apply neural networks to life insurance modelling, this paper incorporates the idea of risk-neutral valuation of neural networks to achieve higher accuracy than other existing models while remaining within a realistic computational budget.

A review of the literature shows that several papers work with machine learning techniques to address the problem of calculating the value and risk metrics of complex life insurance products. One family of methods starts with exact information about a small number of contracts, and then uses spatial interpolation to generate valuations for all contracts. For example, Gan and Lin (2015) use a clustering technique for the selection of the representative contracts and then apply a functional data analysis technique, universal kriging, for the interpolation. Hejazi and Jackson (2016) propose using a neural network model for the interpolation step instead of traditional interpolation techniques. These methods interpolate among policies for a fixed set of economic scenarios, hence reducing the computational burden of calculating the value of the insurance contract. They address the valuation problem, but on their own they are not enough to solve the computational problem of calculating risk metrics.

The effective calculation of risk metrics is addressed by a family of methods that take portfolio valuations (or cash flows) as inputs and use regression methods to construct a model capable of producing the real-world distribution of values of the insurance portfolio. These methods can be classified in two groups, 'regress-now' and 'regress-later', a classification first proposed by Glasserman and Yu (2002). Regress-later methods perform better than regress-now methods, as shown by Beutner et al. (2013). Examples of both types of methods in this family can be found in Beutner et al. (2016) and Castellani et al. (2018).
The former presents a regress-later model based on an orthogonal basis of piece-wise linear functions, and the latter a regress-now model based on neural networks. LSMC and replicating portfolios belong to this overarching family of methods, and any machine learning approach in this family can be a direct replacement for them.

In this paper we present a regress-later model based on neural networks, including a closed-form formula for its risk-neutral valuation, and compare it to a benchmark implementation of replicating portfolios. We focus on this comparison in order to answer a question of relevance to practitioners: 'Can neural networks provide better results than existing methods used in the industry?' A formal mathematical treatment of the methods involved can be found in the already mentioned Beutner et al. (2013) (regress-later vs regress-now methods), Natolski and Werner (2014) and Cambou and Filipović (2018) (RPs), and Bauer et al. (2010) (LSMC).

We have not found in the literature previous work comparing replicating portfolios (that is, a regress-later method with financial instruments as basis functions) to other approximation techniques. There are, however, several papers that compare LSMC approaches: Bauer et al. (2010) present a comparison between LSMC and nested Monte Carlo, and Pelsser and Schweizer (2016) present a comparison between LSMC regress-now and LSMC regress-later.

Against the background described above, this paper's contribution is threefold:

• it presents a reproducible replicating portfolio approach suitable for benchmarking in a research context,

• it introduces a risk-neutral valuation formula for a class of neural networks with multivariate normally distributed inputs (such as discrete-time Brownian motion processes),

• it builds a regress-later methodology based on neural networks and compares the quality of the economic capital calculations between this method and the replicating portfolio approach.

The neural network model improves on those in use in the industry and some of those presented in the literature. In comparison with the (regress-now) neural network model in Castellani et al. (2018), our model is based on a regress-later approach, which—as described in the literature—provides more accurate results than regress-now models. In comparison with the regress-later approach in Beutner et al. (2016), the neural network approach allows us to avoid having to define the arbitrary $A_T(Z)$ (dimensionality reduction function) and the hypercube grid. Under a neural network model, those parameters are data driven and determined as part of the optimization.

The rest of the paper is structured as follows: Section 2 describes the solvency capital calculation problem in detail, together with possible solutions based on nested Monte Carlo and proxy models. Section 3 summarizes the mathematical framework common to all regression models (both regress-now and regress-later). Section 4 describes the proposed neural network approach and presents the risk-neutral valuation of a class of neural networks. Section 5 discusses important qualitative aspects of the models, such as model complexity and feature engineering. Finally, Section 6 describes the numerical experiments and presents the results.
2 Solvency capital calculation

Solvency regimes—including Solvency II and the Swiss Solvency Test—require companies to hold capital in excess of a legal minimum that is based on the amount necessary to remain solvent with high confidence over a one-year period.

The above implies the determination of the distribution of the value of the asset–liability portfolio at the end of the one-year period. We call the value of the asset–liability portfolio $V_t$, and therefore $V_1$ is the value at the end of the first year. The solvency capital requirement is then determined relative to a risk metric applied to this distribution:

• Value at risk (Solvency II)
$$\mathrm{VaR}_\alpha(V_1) = -\inf\{v : F_{V_1}(v) > \alpha\} = F_{-V_1}^{-1}(1-\alpha).$$

• Expected shortfall (Swiss Solvency Test)

$$\mathrm{ES}_\alpha(V_1) = \frac{1}{\alpha}\int_0^\alpha \mathrm{VaR}_\gamma(V_1)\,d\gamma.$$

If $F_{V_1}$ is not known (and this is usually the case), then we must simulate $\{V_1^{(i)}\}_{i=1:M}$ and then calculate the risk metric on the empirical (simulated) distribution. These $M$ samples are referred to as real-world (or 'natural') simulations.

Having described how risk calculations usually depend on a Monte Carlo sampling of $V_1$, $\{V_1^{(i)}\}_{i=1:M}$, we now focus on the calculation of each individual $V_1^{(i)}$. These represent the value of the asset–liability portfolio at the end of the one-year period. The valuation must be carried out on a market-consistent basis, which means applying risk-neutral valuation:

$$V_t = \mathbb{E}^{Q}_t\Big[\sum_{\tau>t} CF_\tau(X)\Big]. \qquad (1)$$

In the formula above the value of the portfolio is the expectation over the risk-neutral measure $Q$ of future discounted cash flows, $CF_\tau(X)$. These cash flows depend on a set of economic variables $X$. In turn, $X$ depends on a smaller set of normal random drivers $\xi$—that is, $X = X(\xi)$. This implies that there is a $CF'_\tau(\xi)$ such that $CF' = CF \circ X$.

When calculating $V_1$ for solvency capital purposes, the real-world simulations determine an empirical distribution of $X_{0:1}$ ($X$ between times 0 and 1), and for each sample in that distribution there is a risk-neutral distribution of $X$ after $t = 1$—which we call $X_{1:T}$—over which $V_1$ must be calculated.

For simple products the calculation of $V_1$ is straightforward. However, for products that lack a closed-form formula, $\{V_1^{(i)}\}_{i=1:M}$ must be approximated by some $\{\widehat V_1^{(i)}\}_{i=1:M}$. Most complex products fall under this case. A simple solution for approximating $V_1$ is to apply a Monte Carlo approach:

$$\widehat V_t = V_{MC} = \frac{1}{N}\sum_{j=1}^{N}\sum_{\tau>t} CF_\tau\big(X^{(j)}_{t:T}\,\big|\,X_{0:t}\big).$$

This formula implies taking $N$ samples and calculating the average of the discounted simulated cash flows. Since this must be done for each real-world simulation $i$ to be able to construct $\{V_1^{(i)}\}_{i=1:M}$, the full simulated distribution is given by

$$\{\widehat V^{(i)}_t\}_{i=1:M} = \Big\{\frac{1}{N}\sum_{j=1}^{N}\sum_{\tau>t} CF_\tau\big(X^{(j)}_{t:T}\,\big|\,X^{(i)}_{0:t}\big)\Big\}_{i=1:M}.$$

The risk-neutral scenarios are called the inner scenarios because they are constructed for each outer scenario $i$ to estimate the value of the portfolio conditional on the information at time 1. Since a total of $M \times N$ simulations are required, a nested Monte Carlo approach is usually infeasible. Most insurers, therefore, use other approximation methods for $\widehat V_1$, the most popular being 'least squares Monte Carlo' (LSMC) and 'replicating portfolios' (RPs). Both these methods are based on a regression approach. For an analysis and comparison of nested Monte Carlo to regression-based methods, the reader is referred to Broadie et al. (2015).

Given the computational difficulties of nested Monte Carlo, many methods have been proposed in the literature, some of which are in place in the industry. With regard to those in place, it is important to note that none of these proxy models replace the original, full insurance cash flow model of the asset–liability portfolio. These methods focus on allowing the solvency capital to be calculated with fewer executions of that model.
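To make the empirical risk-metric step concrete, the sketch below estimates the 1 percent VaR and expected shortfall from a simulated sample of $V_1$. It is a minimal illustration in the paper's own toolset (NumPy); the sample here is a synthetic placeholder rather than output of an insurance model.

```python
import numpy as np

def left_tail_risk_metrics(v1_samples, alpha=0.01):
    """Empirical left-tail VaR and expected shortfall from simulated values of V_1.

    Both metrics are returned as positive capital amounts for adverse (low)
    outcomes of V_1, consistent with the sign convention in the text.
    """
    v = np.sort(np.asarray(v1_samples))        # ascending: worst outcomes first
    var_alpha = -np.quantile(v, alpha)         # VaR_alpha(V_1) = F_{-V_1}^{-1}(1 - alpha)
    k = max(1, int(np.ceil(alpha * len(v))))   # number of tail scenarios
    es_alpha = -v[:k].mean()                   # average of the worst alpha-fraction
    return var_alpha, es_alpha

# Illustrative use with placeholder numbers (not the paper's data):
rng = np.random.default_rng(0)
v1_sim = rng.normal(loc=0.0, scale=1.0, size=100_000)   # stand-in for {V_1^(i)}
var_1pct, es_1pct = left_tail_risk_metrics(v1_sim, alpha=0.01)
print(f"VaR 1% = {var_1pct:.3f}, ES 1% = {es_1pct:.3f}")
```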
LSMC, originally introduced for pricing American-style derivatives by Longstaff and Schwartz (2001), is used to reduce the number of necessary risk-neutral simulations by finding a polynomial approximation of the portfolio value as a function of the risk drivers. The coefficients of the polynomial expansion are obtained from a regression against a reduced-size nested Monte Carlo. LSMC is usually applied in the industry as, in the terminology of Glasserman and Yu (2002), a 'regress-now' approach, as described in Bauer et al. (2010). However, polynomial approximations do not need to be restricted to 'regress-now' applications.

The replicating portfolio approach is based on running a regression against the portfolio cash flows, but instead of polynomials the model uses a set of financial securities as basis functions. The problem is formulated as a linear optimization problem ($L_1$ or $L_2$) where the objective is to minimize the differences between the cash flows of the asset–liability portfolio and the replicating portfolio. The output is a portfolio of financial instruments that reproduces the payout of the life insurance portfolio as closely as possible. For reference, Natolski and Werner (2014) analyse in detail some of the popular approaches to constructing replicating portfolios. Vidal and Daul (2009) and Chen and Skoglund (2012) look at replicating portfolios from a more practical point of view, and Adelmann et al. (2019) provide a description of a real-world implementation in the insurance industry.

In this paper we present a method based on neural networks that combines the strengths of LSMC and replicating portfolios while providing higher accuracy in a setting with a realistic amount of inputs and computing power. We stress this last point since it is common for studies on neural networks to provide results based on millions of input samples, which would not be practical in the real world.

3 Regress-now and regress-later models

Both regress-now and regress-later models are estimators for the value of the asset–liability portfolio, $\widehat V_t$. Regress-now models approximate the value function by estimating the coefficients $\{w_k\}$ in

$$\widehat V^{(i)}_t(X) = \sum_k w_k\,\phi_k\big(X^{(i)}_{0:t}\big).$$

This estimation is done via regression, minimizing the squared error

$$\sum_{i=1}^{m}\Big(\sum_k w_k\,\phi_k\big(X^{(i)}_{0:t}\big) - \widetilde V^{(i)}_{MC}\Big)^2,$$

where $\widetilde V^{(i)}_{MC}$ differs from $V^{(i)}_{MC}$ in that it is calculated with a very low number $n$ of inner simulations, instead of using $N$ simulations as in nested Monte Carlo. It is also worth noting that the regression is performed over $m \ll M$ outer scenarios. LSMC uses a polynomial basis for $\{\phi_k\}$. Once the model has been calibrated, it can be used to 'predict' (in its machine learning sense) all $M$ scenarios, which were not part of the training data.
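A minimal regress-now sketch of the idea above, written with scikit-learn (which the paper uses elsewhere for its regressions): noisy inner-simulation values are regressed on a polynomial basis of the outer-scenario risk drivers, and the fitted polynomial is then evaluated on all real-world scenarios. The arrays, the toy value function and the polynomial degree are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Placeholder training data: m outer scenarios with d risk drivers X_{0:1},
# each with a noisy value estimate from a handful of inner simulations.
m, d = 1_000, 3
rng = np.random.default_rng(1)
X_outer = rng.normal(size=(m, d))                                      # stand-in for X^(i)_{0:1}
v_tilde = (X_outer ** 2).sum(axis=1) + rng.normal(scale=0.5, size=m)   # stand-in for V~_MC

# Regress-now: fit a polynomial basis {phi_k} directly to the noisy values.
lsmc = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
lsmc.fit(X_outer, v_tilde)

# Prediction step: evaluate the fitted polynomial on all M real-world scenarios.
X_real_world = rng.normal(size=(100_000, d))
v1_hat = lsmc.predict(X_real_world)
```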
Regress-later models approximate the value function by estimating the coefficients $\{w_{k,\tau}\}$ necessary to build the following estimator:

$$\widehat V^{(i)}_t = \mathbb{E}^Q_t\Big[\sum_{\tau>t}\widehat{CF}_\tau(X)\Big] = \mathbb{E}^Q_t\Big[\sum_{\tau>t}\sum_k w_{k,\tau}\,\phi_{k,\tau}(X_\tau)\Big] = \sum_{\tau>t}\sum_k w_{k,\tau}\,\mathbb{E}^Q_t\big[\phi_{k,\tau}\big(X^{(i)}_\tau\big)\big]. \qquad (2)$$

The equation above shows that, while more accurate than regress-now models, regress-later models impose an additional requirement: to be able to calculate $\mathbb{E}^Q_t[\phi_{k,\tau}]$. In the best case, there is a closed-form formula. In the worst case, it can be done via Monte Carlo, but the computation cost will partially or completely offset the computational gains of avoiding nested Monte Carlo on the full model.

This estimation is done via regression, minimizing the error

$$\sum_{i=1}^{m}\sum_{j=1}^{n}\sum_{\tau>t}\Big|\sum_k w_{k,\tau}\,\phi_{k,\tau}\big(X^{(j)}_{t:T}\,\big|\,X^{(i)}_{0:t}\big) - CF_\tau\big(X^{(j)}_{t:T}\,\big|\,X^{(i)}_{0:t}\big)\Big|^p,$$

where (as in the LSMC case) $m \ll M$ but (unlike in that case) $n$ could be as large as $N$ if required. Typically this regression is done minimizing squared errors ($p = 2$), but some companies use absolute errors ($p = 1$). The approximation is based on a linear regression at each $\tau$ of $CF_\tau(\cdot)$ against $\{\phi_{k,\tau}(\cdot)\}$, the cash flow functions of a set of financial instruments (bonds, swaps, equity options). This is the usual case in the industry, although some older implementations used grouped cash flows (in time buckets) and some papers, like Cambou and Filipović (2018), present the topic in terms of terminal values (that is, all time steps grouped).

The neural network approach proposed in this paper will take the form of a regress-later method, but using a neural network structure for the basis functions and performing a non-linear optimization to find its parameters.
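The regress-later mechanics can be sketched in a few lines for the $p = 2$ case: one linear regression per cash-flow date, followed by valuation through the time-$t$ prices of the basis functions. All inputs below are synthetic placeholders; in the paper's setting the basis cash flows would come from the instrument universe and the prices from closed-form formulas.

```python
import numpy as np

# Placeholder simulated data for one valuation date t:
#   liability_cf[tau]      liability cash flows CF_tau across inner scenarios
#   basis_cf[tau]          cash flows of each basis instrument at tau (scenarios x instruments)
#   basis_price_t[tau]     time-t prices of each instrument's tau cash flow
n_scenarios, n_instruments, horizons = 5_000, 10, 4
rng = np.random.default_rng(2)
basis_cf = {tau: rng.standard_normal((n_scenarios, n_instruments)) for tau in range(1, horizons + 1)}
liability_cf = {tau: basis_cf[tau] @ rng.standard_normal(n_instruments)
                + 0.1 * rng.standard_normal(n_scenarios) for tau in range(1, horizons + 1)}
basis_price_t = {tau: rng.standard_normal(n_instruments) for tau in range(1, horizons + 1)}

# Regress-later with p = 2: one least-squares regression per cash-flow date tau.
weights = {tau: np.linalg.lstsq(basis_cf[tau], liability_cf[tau], rcond=None)[0]
           for tau in liability_cf}

# Valuation: replace each basis function by its risk-neutral price E_t[phi_{k,tau}].
v_hat_t = sum(weights[tau] @ basis_price_t[tau] for tau in weights)
```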
Figure 1: Single-layer perceptron structure.
4 The neural network model

Artificial neural networks, more commonly referred to as neural networks, constitute a broad class of models. Among them, the simplest is the single-layer perceptron, which we use for our model. The single-layer perceptron, a type of feed-forward network, is a collection of connected units or nodes arranged in layers. Nodes of one layer connect only to nodes of the immediately preceding and immediately following layers. They are fully connected, with every node in one layer connecting to every node in the next layer. The layer that receives external data is the input layer. The layer that produces the ultimate result is the output layer. In between there is one hidden layer. Each node is a simple non-linear function $\phi(\cdot)$ applied to a linear combination of its inputs, that is, those nodes to which it is connected. An example of this architecture is shown in Figure 1, for three inputs and a hidden layer with a width of four nodes. The single-layer perceptron is a universal function approximator, as proven by the universal approximation theorem (Hornik (1991)).

The first neural network model that we present is a direct equivalent of Equation (2) for the case of a single-layer perceptron:

$$\widehat V_t = \mathbb{E}^Q_t\Big[\sum_{\tau>t}\sum_k w_{k,\tau}\,\phi_{k,\tau}(X_\tau)\Big] = \mathbb{E}^Q_t\Big[\sum_{\tau>t}\sum_k w_{k,\tau}\,\phi\big(v^{\intercal}_{k,\tau} X_\tau\big)\Big],$$

where $\phi$ is the activation function, $w_{k,\tau}$ the weights of the linear (output) layer, and $v_{k,\tau}$ the weights of the hidden layer. Since $X_0$ is a constant (it describes the initial conditions of the simulation) there is no need for an explicit bias term, since the first component of $v$ will be the constant term.

In this first neural network model, we do not know the distribution of $X$, which contains arbitrary economic variables, and therefore the risk-neutral expectation cannot be calculated in closed form. When this expectation is required, as in Section 6.5, we present results for this model based on Monte Carlo valuation. Since this model's input is $X$—the vector of economic variables—we will refer to this model as 'nn econ' when showing results.

The second neural network model is more interesting because it allows a closed-form valuation. When discussing Equation (1), we pointed out that $X$ is modelled as a function of a smaller set of normal random drivers $\xi$—that is, $X = X(\xi)$ and $\dim(\xi) < \dim(X)$. Using this fact, we express our second model as

$$\widehat V_t = \mathbb{E}^Q_t\Big[\sum_{\tau>t}\sum_k w_{k,\tau}\,\phi\big(v^{\intercal}_{k,\tau}\,\xi_\tau\big)\Big];$$

that is, we use $\xi$ as input instead of $X$ (for symmetry, we keep a $\xi_0$ component, which is not random but constant in order to provide a bias term).

Using $\xi$ as input brings two advantages and one disadvantage. On the positive side, the dimensionality of $\xi$ is smaller than that of $X$ ($\dim(\xi) < \dim(X)$) and its components are uncorrelated with each other ($\sigma(\xi_i, \xi_j) = \delta_{ij}$ but $\sigma(X_i, X_j) \neq \delta_{ij}$). This makes solving the non-linear optimization problem much easier, requiring fewer samples and shorter training time to converge. Most importantly, the normal distribution of $\xi$ allows a closed-form solution to the risk-neutral expectation, as described in Section 4.1. On the negative side, $CF'(\xi) = CF \circ X(\xi)$ is a more complex function than $CF(X)$, so—all else being equal—we would expect to need a bigger neural network, more samples, and a longer training time. Given these advantages and disadvantages, it is not possible to tell analytically which model will show higher accuracy. Both will, therefore, be tested. Since this model's input is $\xi$—the vector of normal random variables—we will refer to this model as 'nn rand' when presenting results.
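As an illustration of how the 'nn rand' regression could be set up with scikit-learn (the library the paper reports using for neural network training), the sketch below fits a single hidden layer of ReLU nodes to simulated cash flows per date. The placeholder arrays are assumptions; the layer width of 100 nodes and the quasi-Newton solver mirror choices reported later in the paper, and the shared hidden layer across all dates is a simplification of the date-specific weights $v_{k,\tau}$ in the formulas above.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Placeholder training data: for each training scenario, the realised normal
# drivers xi (flattened over time steps) and the liability cash flow at each date tau.
n_samples, n_drivers, n_dates = 10_000, 8, 40
rng = np.random.default_rng(3)
xi = rng.standard_normal((n_samples, n_drivers))          # stand-in for the driver paths
cash_flows = rng.standard_normal((n_samples, n_dates))    # stand-in for CF'_tau(xi)

# 'nn rand' sketch: one hidden ReLU layer fitted with scikit-learn's quasi-Newton solver.
nn_rand = MLPRegressor(hidden_layer_sizes=(100,), activation="relu",
                       solver="lbfgs", max_iter=5_000)
nn_rand.fit(xi, cash_flows)

# The fitted hidden weights (coefs_[0]) and output weights (coefs_[1]) correspond,
# up to the simplification noted above, to the v and w arrays that enter the
# closed-form valuation of Section 4.1.
v_weights, w_weights = nn_rand.coefs_
```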
4.1 Risk-neutral valuation

We have claimed that the second neural network model has a closed-form solution to the risk-neutral expectation. In this section we show its derivation, which, as far as we know, has not appeared before in the literature.
Theorem 1.
For a normally distributed $\xi$, the time-$t$ risk-neutral expectation

$$\mathbb{E}^Q_t\Big[w_{0,\tau} + \sum_{k=1}^{K} w_{k,\tau}\,\phi\big(v^{\intercal}_{k,\tau}\,\xi_\tau\big)\Big]$$

of a single-layer network with $K$ hidden nodes and a ReLU activation function $\phi$ that models the cash flows at time $\tau$ is given by

$$w_{0,\tau} + \sum_{k=1}^{K}\frac{w_{k,\tau}}{2}\Bigg[{}_t\mu_{k,\tau} + {}_t\sigma_{k,\tau}\sqrt{\frac{2}{\pi}}\exp\Big(-\frac{{}_t\mu^2_{k,\tau}}{2\,{}_t\sigma^2_{k,\tau}}\Big) + {}_t\mu_{k,\tau}\Big(1 - 2\Phi\Big(-\frac{{}_t\mu_{k,\tau}}{{}_t\sigma_{k,\tau}}\Big)\Big)\Bigg], \qquad (3)$$

where

$${}_t\mu_{k,\tau} = \sum_{i=0}^{\min(t,\tau)} v^{\intercal}_{i,k,\tau}\,\xi_i, \qquad\qquad {}_t\sigma^2_{k,\tau} = \sum_{i=t+1}^{\tau} \big\|v_{i,k,\tau}\big\|^2.$$

Proof.
Starting from the full network,

$$\mathbb{E}^Q_t\Big[w_{0,\tau} + \sum_{k=1}^{K} w_{k,\tau}\,\phi\big(v^{\intercal}_{k,\tau}\,\xi_\tau\big)\Big] = w_{0,\tau} + \sum_{k=1}^{K} w_{k,\tau}\,\mathbb{E}^Q_t\big[\phi\big(v^{\intercal}_{k,\tau}\,\xi_\tau\big)\big], \qquad (4)$$

and then focusing on the expectation of a single node, we obtain

$$\mathbb{E}^Q_t\big[\phi\big(v^{\intercal}_{k,\tau}\,\xi_\tau\big)\big] = \mathbb{E}^Q_t\big[\max\big(v^{\intercal}_{k,\tau}\,\xi_\tau,\,0\big)\big] = \mathbb{E}^Q_t\Big[\frac{v^{\intercal}_{k,\tau}\,\xi_\tau + \big|v^{\intercal}_{k,\tau}\,\xi_\tau\big|}{2}\Big] = \frac{1}{2}\Big[\mathbb{E}^Q_t\big[v^{\intercal}_{k,\tau}\,\xi_\tau\big] + \mathbb{E}^Q_t\big[\big|v^{\intercal}_{k,\tau}\,\xi_\tau\big|\big]\Big]. \qquad (5)$$

Since $v^{\intercal}_{k,\tau}\,\xi_\tau$ is normally distributed, it is defined by its mean and standard deviation, $\mu_{k,\tau}$ and $\sigma_{k,\tau}$. Its conditional expectation at time $t$ is

$$\mathbb{E}^Q_t\big[v^{\intercal}_{k,\tau}\,\xi_\tau\big] = {}_t\mu_{k,\tau} = \sum_{i=0}^{\min(t,\tau)} v^{\intercal}_{i,k,\tau}\,\xi_i,$$

and its conditional variance at time $t$ is

$${}_t\sigma^2_{k,\tau} = \sum_{i=t+1}^{\tau} \big\|v_{i,k,\tau}\big\|^2.$$

If $\tau \le t$, then its conditional variance is 0. Since $v^{\intercal}_{k,\tau}\,\xi_\tau$ is normally distributed, $\big|v^{\intercal}_{k,\tau}\,\xi_\tau\big|$ follows a folded normal distribution, with conditional expectation at time $t$

$$\mathbb{E}^Q_t\big[\big|v^{\intercal}_{k,\tau}\,\xi_\tau\big|\big] = {}_t\sigma_{k,\tau}\sqrt{\frac{2}{\pi}}\exp\Big(-\frac{{}_t\mu^2_{k,\tau}}{2\,{}_t\sigma^2_{k,\tau}}\Big) + {}_t\mu_{k,\tau}\Big(1 - 2\Phi\Big(-\frac{{}_t\mu_{k,\tau}}{{}_t\sigma_{k,\tau}}\Big)\Big).$$

Therefore, the expectation of each hidden node is

$$\mathbb{E}^Q_t\big[\phi\big(v^{\intercal}_{k,\tau}\,\xi_\tau\big)\big] = \frac{1}{2}\Bigg[{}_t\mu_{k,\tau} + {}_t\sigma_{k,\tau}\sqrt{\frac{2}{\pi}}\exp\Big(-\frac{{}_t\mu^2_{k,\tau}}{2\,{}_t\sigma^2_{k,\tau}}\Big) + {}_t\mu_{k,\tau}\Big(1 - 2\Phi\Big(-\frac{{}_t\mu_{k,\tau}}{{}_t\sigma_{k,\tau}}\Big)\Big)\Bigg],$$

which leads to the expectation of the full network in (3).
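A small numerical sanity check of the node expectation derived above: the function below evaluates the folded-normal expression for $\mathbb{E}_t[\phi(v^{\intercal}\xi_\tau)]$ from the conditional mean and standard deviation and compares it with a brute-force Monte Carlo estimate. The parameter values are arbitrary illustrations.

```python
import numpy as np
from scipy.stats import norm

def relu_node_expectation(mu, sigma):
    """E[max(Y, 0)] for Y ~ N(mu, sigma^2), as in the proof of Theorem 1."""
    if sigma == 0.0:                      # tau <= t: no remaining randomness
        return max(mu, 0.0)
    folded_mean = (sigma * np.sqrt(2.0 / np.pi) * np.exp(-mu**2 / (2.0 * sigma**2))
                   + mu * (1.0 - 2.0 * norm.cdf(-mu / sigma)))
    return 0.5 * (mu + folded_mean)

# Monte Carlo check with illustrative parameters.
mu, sigma = 0.3, 1.2
rng = np.random.default_rng(4)
mc = np.maximum(mu + sigma * rng.standard_normal(1_000_000), 0.0).mean()
print(relu_node_expectation(mu, sigma), mc)   # the two values should agree closely
```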
5 Qualitative comparison

Besides the quantitative experiments, whose results we show in Section 6.5, there are some qualitative differences of importance to practitioners. One of them is the complexity of the model, as measured by the number of modules and equations required to implement it. A second one is the amount of expert judgement required in the feature engineering. A third one is how to prevent overfitting of the training data.

5.1 Model complexity

Figure 2 presents a comparison of the module structure. We describe below the sequence of calculations required for the training phase, the prediction phase being very similar with the exception that cash flows are replaced by prices (closed-form or Monte Carlo, depending on the model and the instruments used).

All models require a random number generator. Replicating portfolios and the first neural network model require the generation of the economic variables $X$. In the insurance industry, these two modules are usually grouped into the so-called economic scenario generator (ESG). The second neural network model does not need $X$ for training or prediction. Finally, the replicating portfolio model requires the generation of instrument cash flows in order to produce the inputs to the optimization problem. Despite their reputation for complexity, neural networks actually lead to a simpler model, at least when measured by the number of equations and components necessary for their implementation.

5.2 Feature engineering

Any replicating portfolio model requires a substantial degree of expert judgement in deciding which instrument cash flows to use in the model. When working with simple liabilities, the decision is not hard, since the simplest instruments, such as bonds and equity forwards, will work well. As the liabilities grow in complexity, and if the most obvious derivatives (swaps, swaptions, European and Asian options) are not enough to capture the behaviour of the liabilities, the practitioner faces the extremely difficult task of figuring out which is the correct derivative to add to the existing mix.
Figure 2: Comparison of model structures. Replicating portfolio: random numbers ($\xi$) → economic variables ($X$) → instrument cash flows → portfolio values ($V_t$). Neural network on $X$ ('nn econ'): random numbers ($\xi$) → economic variables ($X$) → portfolio values ($V_t$). Neural network on $\xi$ ('nn rand'): random numbers ($\xi$) → portfolio values ($V_t$).
Since financial instruments as a whole do not form a structured basis of any meaningful space of functions, it is not possible to explore in a systematic way the set of all possible financial instruments until the best solution is found. Even worse, each attempt (the addition of a new asset class) requires a substantial amount of implementation work before the results can be seen. In contrast, neural networks provide certain guarantees of convergence by simply increasing the width of the hidden layer (adding more nodes). This guarantee is provided by the 'universal representation theorem', of which different versions exist (among them Hornik (1991) and Hanin and Sellke (2017)). At least for the classes of functions covered under these theorems, the search for better results is extremely straightforward: just one parameter that is a direct input in any neural network software library. No new equations need to be implemented; only one input change in the existing model is required.

Even when no new asset classes are required, the feature engineering problem in replicating portfolios still exists. Each asset class can have tens, hundreds, or thousands of individual instruments. For example, there is one zero coupon bond for each possible maturity. Even worse is the case of swaptions: having to choose from a combination of a set of maturities, tenors, and strikes leads to having to select thousands of instruments. All these basis elements must be completely calculated before being fed to the linear regression problem. In contrast, neural networks create their own features adaptively based on the data. This, of course, has the downside that it requires more training data than a comparable replicating portfolio model for which the expert judgement selection has been carried out correctly.

5.3 Feature selection
The reverse problem to not having enough financial instruments, or not having asset classes that are complex enough, is the problem of having too many instruments and asset classes. Feeding thousands of instruments to the regression problem runs the very real risk of overfitting the training data. This problem has not been entirely solved by practitioners in the industry, although the popularization of machine learning software libraries has allowed big improvements in recent years compared with the completely manual approach used five or ten years ago. In this paper, we solve this problem by using Lasso regression (Tibshirani (1996)) with the Bayesian information criterion to choose the regularization parameter (Zou et al. (2007)). For the neural network model, the constraints on model complexity are provided by selecting the layer width via cross-validation.
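The instrument selection step described above maps directly onto scikit-learn's LassoLarsIC class, which the paper names as its optimizer; a minimal sketch with a synthetic instrument universe follows.

```python
import numpy as np
from sklearn.linear_model import LassoLarsIC

# Placeholder design matrix: simulated cash flows of a large instrument universe
# (columns) in each training scenario (rows), and the liability cash flows to match.
rng = np.random.default_rng(5)
instrument_cf = rng.standard_normal((5_000, 500))
liability_cf = instrument_cf[:, :20] @ rng.standard_normal(20) + rng.standard_normal(5_000)

# Lasso with the Bayesian information criterion chooses both the regularization
# strength and, implicitly, which instruments enter the replicating portfolio.
rp = LassoLarsIC(criterion="bic")
rp.fit(instrument_cf, liability_cf)
selected = np.flatnonzero(rp.coef_)          # indices of instruments with non-zero weight
print(f"{selected.size} instruments selected out of {instrument_cf.shape[1]}")
```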
6 Numerical experiments

The goal of this section is to provide a quantitative comparison between a neural network model and a replicating portfolio model. It is important to clarify that there are many possible neural network models (many hyper-parameters and architectural choices) as well as many possible replicating portfolio models (many options of asset classes in the instrument universe and other architectural choices). Furthermore, the results presented here are for one particular scenario generator and one insurance product. It is not possible to generalize the conclusions drawn from these results to all models and all products. However, the choices made in this example are mainstream and robust. We would, therefore, expect the conclusions to extend to many real-world situations.
6.1 The insurance model

In order to provide a comparison between replicating portfolios and another model, it is necessary to first have access to simulated cash flows of an insurance product. In the absence of open-source libraries or data sets there has been no option but to build our own economic scenario generator and insurance model. Hoping that the data might help others in their research in the field, we have published the full data set in Mendeley Data (Fernandez-Arjona (2019)).

The high-level structure of the insurance model is described in Figure 3. The first module is the normal random variable generator, which feeds the second module, the economic scenario generator. We use a combination of a one-factor Hull–White model for the short rate and a geometric Brownian motion process for equity returns (in excess of risk-free rates). For the interest rate model, we use the formulas in Glasserman (2013).

In addition to the published data set, the full code of the scenario generator is available in open-source form at https://gitlab.com/luk-f-a/EsgLiL . The generator is entirely written in Python. It uses NumPy (Oliphant (2006–), Van Der Walt et al. (2011)) for array operations, pandas (McKinney (2010)) for data aggregation, scikit-learn (Pedregosa et al. (2011)) for linear regressions and neural network training, and joblib for parallelization.
Figure 3: Modular structure of the insurance model. (Modules: normal random variables → primary economic variables (equity index, cash account, interest rates) → secondary economic variables (bond and derivative prices) → insurance product cash flows; the first blocks form the ESG model, followed by the pricing formulas and the insurance liability model.)

This scenario generator produces the evolution of the short rate, the cash account and the equity index. From these, we derive the rest of the asset prices: bonds, swaptions, and equity options. The last module is the insurance product, a variable annuity guarantee known as 'guarantee return on death'. The simulation of the insurance product begins with a policyholder population of 1,000 customers of ages 30 to 70. The population evolves according to a Lee–Carter stochastic mortality model (we have used the same parameters as in Lee and Carter (1992)), which provides a trend and stochastic fluctuations. Each customer starts the simulation with an existing investment fund and a guaranteed level. In each time period, they pay a premium, which is used to buy assets; these assets are deposited in the fund. The fund is rebalanced at each time period to maintain a target asset allocation. The value of the fund is driven by the inflows from premiums, and the market value changes are driven by the interest rate, equity, and real estate models. At each time step, a proportion of policyholders (as determined by the stochastic life table) die and the investment fund is paid out to the beneficiaries. If the investment fund is below the guaranteed amount, the company additionally pays the difference between the fund value and the guaranteed amount. The guaranteed amount is the simple sum of all the premiums paid over the life of the policy. Over the course of the simulation the premiums paid increase the guaranteed amount for each policy. All policies have the same maturity date, of forty years. Those policyholders alive at maturity receive the investment fund or the guaranteed value, whichever is higher.
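To illustrate the product logic just described, the sketch below advances a single policy through one annual step. The function signature and the ordering of premium payment and fund return within the year are simplifying assumptions; only the payoff rules follow the description above.

```python
def step_policy(fund, guarantee, premium, fund_return, died, matured=False):
    """One annual step of the 'guarantee return on death' product for one policy.

    Returns the updated (fund, guarantee) and the cash flow paid by the company.
    """
    fund = (fund + premium) * (1.0 + fund_return)   # premium buys assets, fund earns the return
    guarantee += premium                            # guarantee is the simple sum of premiums paid
    cash_flow = 0.0
    if died:
        cash_flow = fund + max(guarantee - fund, 0.0)   # beneficiaries receive at least the guarantee
        fund = guarantee = 0.0
    elif matured:
        cash_flow = max(fund, guarantee)                # survivors at maturity get the higher of the two
        fund = guarantee = 0.0
    return fund, guarantee, cash_flow

# Example: a policy with fund 100, guarantee 80, premium 5, a -10% fund return,
# and death at the end of the year.
print(step_policy(100.0, 80.0, 5.0, -0.10, died=True))
```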
6.2 Ground truth and benchmark value

The quality comparison across methods requires establishing a 'ground truth', the values of the risk metrics calculated in an exact way. Given the lack of closed-form formulas this is not possible, so we settle for performing comparisons against a benchmark value calculated in the most reliable way possible. We do this by running an extremely large Monte Carlo simulation, with 100,000 outer simulations each with 10,000 inner simulations.

The results of this calculation are shown in Table 1. The centre column shows the mean present value, the risk-neutral value of the guarantee. To the left and right we see the expected shortfall and value at risk. Both are expressed as the change in monetary value with respect to the mean, that is, as $V_{\text{tail}} - V_{\text{mean}}$.
Table 1: Benchmark values

              Left ES    Left VaR    Mean    Right VaR    Right ES
Large nMC     −…         −…          −…      …            …

6.3 Benchmark models

We use two established methods to benchmark our proposed model: nested Monte Carlo (nMC) and replicating portfolios (RPs). In all cases we set a computational budget of 10,000 training samples. In the case of nested Monte Carlo, this budget is split into 100 outer and 100 inner simulations. This split was selected for being the combination with the largest number of inner simulations possible (lowest bias) with a minimum of 100 outer simulations (to be able to calculate the 1 percent expected shortfall). For reference, note the large difference with the benchmark value calculation made with 100,000 × 10,000 simulations.
Figure 4: Error distribution of nMC estimators (fixed total number of simulations), by number of outer simulations (100 to 10,000); the vertical axis shows the right tail ES error. Increasing outer simulation numbers leads to reduced variance, but the corresponding decrease in inner simulation numbers increases bias.
Table 2: Replicating portfolio parameters

Parameter         Choice
Loss function     Squared errors
Asset universe    Bonds, cash, swaptions, equity index, equity European options
Time steps        Full annual cash flow replication (no grouping)
Constraints       None
Optimizer         LassoLarsIC (scikit-learn)

The replicating portfolio approach allows many variations and requires the choice of a range of hyper-parameters, which explains the importance both of the description that follows and of our choices with regard to implementation. We have chosen parameters common in the industry except for one aspect. In the industry, replicating portfolio models are usually run many times with different parameters (in particular different asset classes) and one replicating portfolio is selected manually from those runs. Following that methodology would not allow us to create repeatable experiments or to form consistent distributions. We therefore use a Lasso regressor for the feature selection (instrument selection from the universe) and use the Bayesian information criterion to choose the regularization parameter. This is provided by the machine learning library scikit-learn through the LassoLarsIC class. As for the rest of the parameters, we list them in Table 2.

6.4 Quality measurement
Following the arguments in Willmott and Matsuura (2005) and Chai and Draxler (2014), we choose to focus on absolute errors rather than squared errors for model comparison, since we do not need to penalize large outliers, and the errors are biased and most likely do not follow a normal distribution. We therefore use mean absolute errors as a metric of mean model errors. Other moments are not quantitatively measured, but the histograms of the distributions are presented for qualitative assessment.

The error to be considered is that of each risk metric (ES or VaR) and each tail (left tail or right tail), measured as a percentage error against the benchmark value for that metric.

Based on the 100 macro-runs described in the previous section, we derive an empirical distribution for each estimator, $\{\hat\rho_i\}_{i=1:100}$, where each individual sample is denoted as $\hat\rho_i$. The mean absolute percentage error (MApE) is defined as

$$\frac{1}{100}\sum_{i=1}^{100}\Big|\frac{\hat\rho_i}{\rho} - 1\Big|,$$

where $\rho$ is the benchmark value of the estimator. All model results are mean-centred before risk metrics are calculated.

Figure 4 shows an example of the application of this error measurement for the various parameters in the nested Monte Carlo estimator. We can see that the estimator with 100 outer and 100 inner simulations has the lowest MApE. We therefore use this estimator in all subsequent comparisons with other methods.

It is important to note that the benchmark value calculation is completely out-of-sample with respect to the training of any of the methods. Hence, all comparisons shown below are fully out-of-sample.

6.5 Results

The results of the numerical experiments are presented in Table 3 and Figure 6. The columns show the mean absolute percentage error for each of the four risk metrics and the mean present value. The rows show the errors of each different method.

The errors of the replicating portfolio model are shown in the first row. In terms of mean absolute errors, this method yields the worst results of the group. Interestingly, the results are worse than using a nested Monte Carlo approach (second row). This is probably due to the very complex and non-linear nature of the guarantee function that is being replicated. Most likely the asset classes in the instrument universe are not complex enough, or not sufficiently path dependent, to replicate the guarantee.
Table 3: Comparison of errors measured as MApEs

                           Left ES    Left VaR    Mean    Right VaR    Right ES
Rep. Portfolio MApE        20%        23%         4%      46%          47%
Nested MC MApE             19%        14%         2%      8%           10%
Neural net (econ) MApE     4%         2%          2%      7%           4%
Neural net (rand) MApE     15%        11%         5%      4%           5%
While one might consider adding more asset classes, it becomes immediately clear that there is no obvious next step. This is a key disadvantage of replicating portfolios versus polynomial or neural network methods: the complexity of the approximation (the richness of the approximating function) is not a parameter that one can easily modify. Which asset class to add next is unknown, and there are hundreds of exotic derivatives that one might try. In order to test any of them, one must program the cash flows and valuation functions before any testing can be done. Therefore, the search for a better model takes much longer and success is not even guaranteed, since one might not find the correct derivative.

By contrast, when working with a polynomial approximation one might increase the maximum degree of the polynomial, and with a neural network one might add layers or make each layer wider (adding more nodes). It only takes a few keystrokes to make a more powerful model. Polynomial and neural network models are guaranteed to succeed under certain conditions (smooth enough functions, sufficient samples, etc.), at least asymptotically. These models are, therefore, more amenable to automated solutions than are replicating portfolio models, which require an expert setup.

The last two rows in Table 3 present the results for each type of neural network model. Each model shows better results than nested Monte Carlo or replicating portfolios. The first type (economic variable inputs) shows the best results of the group, better than the second type (random variable inputs).

Despite the advantages of the second type of neural network model, the data shows that 10,000 samples were not enough to learn the more complex function $CF \circ X$ sufficiently well to have higher accuracy than the first type of neural network model, which only had to learn the simpler function $CF$. In Figure 5 we can see that the second model (called 'NNrand:10' when trained on 10,000 samples, 'NNrand:50' when trained on 50,000 samples, etc.) does perform better than the first model ('NNecon') once it is given a larger number of samples.

Figure 6 shows the empirical distribution of each of the estimators (trained, respectively, on 10,000 samples for comparability). We can observe the relative standard deviations of the estimators and find, as previously seen in Figure 4, that nested Monte Carlo has low bias and a very high variance compared to the other methods.
Figure 5: Effect of increasing training sample size on the second neural network model (models shown: NNecon, NNrand:10, NNrand:50, NNrand:100, NNrand:500; vertical axis: ES error, left tail). The decrease in mean and standard deviation of errors can be clearly seen as samples are increased from 10,000 to 500,000.

Interestingly, the standard deviations of the neural network models are not much higher than those of the replicating portfolios, despite each macro-run using a completely different set of inputs. This indicates that the calibration of the neural networks is robust to the variance of the inputs. In this regard, neural networks have a bad reputation due to their non-convex loss function. A common problem associated with the training of neural networks is that of local minima. Together with the common use of stochastic optimization methods (such as stochastic gradient descent), this usually contributes to a high variance of predictions. In this paper, we have taken certain steps to reduce this problem: a) using a network of a small size (100 nodes), and b) using the Broyden–Fletcher–Goldfarb–Shanno (BFGS) optimization algorithm, which is non-stochastic. These decisions contribute to keeping the variance of the estimator within reasonable levels.
6.6 Runtime

As shown in Table 3 and Figure 6, the neural network models perform very well in terms of quality. Since these models require a non-linear regression to find their parameters, they can be expected to be slower than a purely linear model, such as a replicating portfolio. Table 4 shows how long it takes to run each model 'end to end', that is, including feature generation, calibration and prediction.

We can see that training a neural network model takes longer than training a replicating portfolio. Both types of model scale approximately linearly with the size of the training set. However, even at the far end (40,000 training samples) the neural network models do not take more than 30 minutes (on a single-core AMD Opteron 6380), which is perfectly acceptable from a practitioner's point of view. Generating the inputs required for economic capital calculations can normally take from several hours to several days; 30 minutes can therefore fit easily within the normal production schedule of an insurance company.
Figure 6: Empirical distribution of estimators (with 10,000 training samples), showing left tail and right tail ES errors for the NNecon, NNrand, nMC and RP models. Each neural network model delivers more accurate results than the benchmark models.
Table 4: Comparison of runtime (in seconds) for training data sets of different sizes

Samples    Neural net (econ)    Neural net (rand)    Rep. portfolio
2500       112                  115                  4
…          202                  200                  4
…          384                  406                  6
…          707                  711                  13
7 Conclusions

Based on a simulated insurance product, we have presented a comparison of nested Monte Carlo, replicating portfolios, and neural networks as methods for calculating solvency capital. The numerical experiments show that neural networks perform very well, even in a highly non-linear problem with a small number of training samples. The mean errors are the lowest of the group and the distributions of the results do not, qualitatively, show large variance. A qualitative analysis suggests that the neural network model can also have advantages in terms of model simplicity.

In the construction of this neural network model we make use of what we believe is a novel formula for calculating the risk-neutral price of a neural network. Additionally, we make two contributions for other researchers in the field: a description of an automated replicating portfolio model (necessary for reliable comparisons) and a full data set and software library for the production of economic scenarios.
Acknowledgements
This paper is based on our presentation at the 2019 Insurance Data Science Conference (ETH Zurich). We thank the scientific committee for its invitation to present, and all those that provided helpful comments during the preparation: Mariana Andres, Dr Gregor Reich, and the participants of the Quantitative Business Administration PhD Seminar. Special thanks go to Prof. Karl Schmedders and Prof. Damir Filipovic, for reviewing the presentation and for all we have learnt from them, without which this paper would not be possible.

References
Adelmann, M., L. Fernandez Arjona, J. Mayer, and K. Schmedders (2019). A large-scale optimization model for replicating portfolios in the life insurance industry. Working Paper, University of Zurich.

Bauer, D., D. Bergmann, and A. Reuss (2010). Solvency II and nested simulations – a least-squares Monte Carlo approach. In Proceedings of the 2010 ICA Congress.

Beutner, E., A. Pelsser, and J. Schweizer (2013). Fast convergence of regress-later estimates in least squares Monte Carlo. Available at SSRN 2328709.

Beutner, E., A. Pelsser, and J. Schweizer (2016). Theory and validation of replicating portfolios in insurance risk management. Available at SSRN 2557368.

Broadie, M., Y. Du, and C. C. Moallemi (2015). Risk estimation via regression. Operations Research 63(5), 1077–1097.

Cambou, M. and D. Filipović (2018). Replicating portfolio approach to capital calculation. Finance and Stochastics 22(1), 181–203.

Castellani, G., U. Fiore, Z. Marino, L. Passalacqua, F. Perla, S. Scognamiglio, and P. Zanetti (2018). An investigation of machine learning approaches in the Solvency II valuation framework. Available at SSRN 3303296.

Chai, T. and R. R. Draxler (2014). Root mean square error (RMSE) or mean absolute error (MAE)? Arguments against avoiding RMSE in the literature. Geoscientific Model Development 7(3), 1247–1250.

Chen, W. and J. Skoglund (2012). Cashflow replication with mismatch constraints. The Journal of Risk 14(4), 115.

Fernandez-Arjona, L. (2019). ESG simulations and RPDB cash flows (June 2019). http://dx.doi.org/10.17632/6vvzh2w4g4.1 . Mendeley Data, v1.

Gan, G. and X. S. Lin (2015). Valuation of large variable annuity portfolios under nested simulation: A functional data approach. Insurance: Mathematics and Economics 62, 138–150.

Glasserman, P. (2013). Monte Carlo Methods in Financial Engineering, Volume 53. Springer Science & Business Media.

Glasserman, P. and B. Yu (2002). Simulation for American options: Regression now or regression later? In Monte Carlo and Quasi-Monte Carlo Methods 2002, pp. 213–226. Springer.

Hanin, B. and M. Sellke (2017). Approximating continuous functions by ReLU nets of minimal width.

Hejazi, S. A. and K. R. Jackson (2016). A neural network approach to efficient valuation of large portfolios of variable annuities. Insurance: Mathematics and Economics 70, 169–181.

Hornik, K. (1991). Approximation capabilities of multilayer feedforward networks. Neural Networks 4(2), 251–257.

Lee, R. D. and L. R. Carter (1992). Modeling and forecasting U.S. mortality. Journal of the American Statistical Association 87(419), 659–671.

Longstaff, F. A. and E. S. Schwartz (2001). Valuing American options by simulation: A simple least-squares approach. The Review of Financial Studies 14(1), 113–147.

McKinney, W. (2010). Data structures for statistical computing in Python. In S. van der Walt and J. Millman (Eds.), Proceedings of the 9th Python in Science Conference, pp. 51–56.

Natolski, J. and R. Werner (2014). Mathematical analysis of different approaches for replicating portfolios. European Actuarial Journal 4(2), 411–435.

Oliphant, T. (2006–). NumPy: A guide to NumPy. USA: Trelgol Publishing. [Online; accessed November 2019].

Pedregosa, F., G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12(Oct), 2825–2830.

Pelsser, A. and J. Schweizer (2016). The difference between LSMC and replicating portfolio in insurance liability modeling. European Actuarial Journal 6(2), 441–494.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58(1), 267–288.

Van Der Walt, S., S. C. Colbert, and G. Varoquaux (2011). The NumPy array: A structure for efficient numerical computation. Computing in Science & Engineering 13(2), 22.

Vidal, E. G. and S. Daul (2009). Replication of insurance liabilities. RiskMetrics Journal 9(1).

Willmott, C. J. and K. Matsuura (2005). Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Climate Research 30(1), 79–82.

Zou, H., T. Hastie, and R. Tibshirani (2007). On the "degrees of freedom" of the lasso. The Annals of Statistics 35(5), 2173–2192.