Stochastic reserving with a stacked model based on a hybridized Artificial Neural Network

Abstract

Currently, legal requirements demand that insurance companies increase their emphasis on monitoring the risks linked to underwriting and asset management activities. Regarding underwriting risks, the main uncertainties that insurers must manage are related to the sufficiency of premiums to cover future claims and the adequacy of the current reserves to pay outstanding claims. Both risks are calibrated using stochastic models due to their nature. This paper introduces a reserving model based on a set of machine learning techniques such as Gradient Boosting, Random Forest and Artificial Neural Networks. These algorithms and other widely used reserving models are stacked to predict the shape of the runoff. To compute the deviation around the resulting prediction, a log-normal approach is combined with the suggested model. The empirical results demonstrate that the proposed methodology can be used to improve the performance of the traditional reserving techniques based on Bayesian statistics and the Chain Ladder method, leading to a more accurate assessment of the reserving risk.



Eduardo Ramos-Pérez (1), Pablo J. Alonso-González (2), José Javier Núñez-Velázquez (2)
(1) PhD Student (Economics and Management Program), Universidad de Alcalá. (2) Economics Department, Universidad de Alcalá. ∗†


Keywords:

Stochastic reserving, Reserving Risk, Machine Learning, General insurance, Run-off prediction

AMS Subject Classification:

As with any other company, the survival of an insurance firm depends on its ability to obtain a sustainable profit over the years. These entities have to offer their services at an adequate and competitive premium, while the ultimate cost of the claims is subject to uncertainty. Thus, reserving models were developed in order to estimate and monitor the expected ultimate cost of outstanding claims. Although life insurance contracts also carry uncertainty about the claims cost, reserving is especially relevant in general insurance, as that uncertainty tends to be higher, at least in the short term.

∗ Authors' address: (1) & (2): Economics Department, Universidad de Alcalá, Plaza de la Victoria 2, 28802 Alcalá de Henares, Spain. E-mails: P.J. Alonso-González, [email protected]; J.J. Núñez, [email protected]; E. Ramos, [email protected]
† Corresponding author: P. Alonso. Date: August 19, 2020. This manuscript version is made available under the CC-BY-NC-ND 4.0 license, http://creativecommons.org/licenses/by-nc-nd/4.0/ (arXiv, q-fin.RM).

Methods of estimating the level of reserves in non-life insurance have evolved from classical and deterministic methods toward others that take into account the loss reserve uncertainty. The aim of the first type is to estimate the expected level of reserves by taking the historical information into consideration. Chain Ladder is the most frequently used method of this family.
When historical data are not stable enough to use the Chain Ladder technique, the Bornhuetter and Ferguson (1972) model tends to be the preferred option to obtain an adequate estimate of the expected ultimate cost.

The increasing interest of investors in the risk profile of financial institutions since the Financial Crisis of 2007-2008 and the implementation of the Solvency II Directive in the European market have fostered the use of stochastic reserving models. As in the case of deterministic approaches, stochastic models based on the Chain Ladder technique are the most commonly used. One of the main techniques within this family is the Overdispersed Poisson (ODP) model developed by Renshaw and Verrall (1998), together with its bootstrap implementation suggested by England and Verrall (1999) and England (2002), which assumes that incremental claims follow an ODP distribution where the variance is proportional to the mean.

In this model, incremental claims must be positive, but this limitation can be overcome by using the quasi-likelihood approach introduced by McCullagh and Nelder (1989). For cases where the ODP assumption does not properly fit the data, Kremer (1982), Mack (1991) and Verrall (2000) developed other models assuming log-normal, gamma and negative binomial distributions, respectively. In contrast to the methods within this family, Mack (1993) developed a distribution-free model by limiting the analysis of the claims reserve distribution to its first two moments.

Thus, the bootstrap implementation of Mack's model allows the analyst to obtain a reserve distribution without having to define a theoretical distribution for the cumulative or incremental claim cost. For cases where the bootstrapping procedure is to be avoided, England and Verrall (2006) introduced a stochastic Bayesian implementation of the ODP, the Negative Binomial and this last distribution-free model.
This approach was recently expanded by Meyers (2015), who developed several Bayesian Markov Chain Monte Carlo (MCMC) models (Levelled Chain-Ladder, Correlated Chain-Ladder, Levelled Incremental Trend, Correlated Incremental Trend and Changing Settlement Rate) for incurred and paid data. Their aim is to improve the performance of the ODP and Mack models by using different approaches, such as recognizing the correlation between accident years, including a skewed distribution to model negative incremental payments, introducing a trend over the development years and allowing changes in the claim settlement rate.

Another set of models is focused on using several triangles simultaneously in order to take into consideration different characteristics of incurred and paid data. The main models within this family are the Munich Chain Ladder (MCL) method and the Double Chain Ladder (DCL) model, developed by Quarg and Mack (2004) and Martínez-Miranda et al. (2012), respectively. By modifying this last method, Margraf et al. (2018) addressed the problem of calculating general insurance reserves when the portfolio is covered by an excess-of-loss reinsurance. In addition to MCL and DCL, Merz and Wüthrich (2010) introduced a Bayesian implementation of the paid-incurred chain (PIC) reserving method (Posthuma et al. 2008) based on using both incurred and paid data. Happ et al. (2012) and Happ and Wüthrich (2013) also investigated and developed models related to the PIC method, while Halliwell (2009) and Venter (2008) introduced regression approaches based on using both data sources. Pigeon et al. (2014), Antonio and Plat (2014), and Martínez-Miranda et al. (2013b) also proposed models that take different data sources into consideration to estimate the expected ultimate claim cost.

In addition to the different approaches described above, it is possible to find models where the information is not organized in an aggregated way, as in the classical triangles, but rather in individual claims data (see Taylor et al.
2008, Jessen et al. 2011, Pigeon et al. 2013, Antonio and Plat 2014, Martínez-Miranda et al. 2015, Charpentier and Pigeon 2016, or Wüthrich 2018b).

Thanks to the increase in computational power, machine learning techniques have become an adequate tool for reserving purposes. Artificial Neural Networks (Gabrielli and Wüthrich 2018 and Wüthrich 2018b), regression trees (Wüthrich 2018a), Recurrent Neural Networks (Kuo 2018) and tree-based algorithms (Lopez et al. 2019) have been used to predict claim reserves. Gabrielli et al. (2018) embedded the ODP model into a neural network framework, and Baudry and Robert (2019) introduced a nonparametric reserving model based on extremely randomized trees (Geurts et al. 2006) and individual claims data. In addition to the aforementioned algorithms, other machine learning techniques were used by Martínez-Miranda et al. (2013a) for reserving purposes, and a support vector machine was applied to classify risks prior to the reserve calculation (Duma et al. 2011).

The research carried out in this paper develops a nonparametric reserving model based on the stacking algorithm methodology. The proposed architecture consists of two different levels. Random Forest (RF) (Breiman 2001), Gradient Boosting (GB) with regression trees (Friedman 2000), an Artificial Neural Network (ANN) (McCulloch and Pitts 1943), the Changing Settlement Rate (CSR) reserving model and the Chain Ladder assumptions are incorporated within the first level, while an ANN is included in the second level of the stacked model (Stacked-ANN) architecture in order to generate the final predictions.
Therefore, the aim of this hybrid model is to improve the performance of the individual components by creating an architecture that can learn from the different algorithms and the reserving models included within the first level.

Although the overall methodology is based on that proposed by Ramos-Pérez et al. (2019) for stock volatility forecasting purposes, the model architecture proposed in this study is different. In this research, machine learning algorithms and reserving models are present in the first level, while in the architecture developed by Ramos-Pérez et al. (2019), only machine learning algorithms were included. Therefore, the most popular models for forecasting volatility, such as GARCH or EGARCH, were not integrated within that model architecture, while in this case, Chain Ladder and CSR are incorporated. It is also worth mentioning that, in contrast to the hybrid model proposed for volatility forecasting, in this research the second level only receives information already processed by the models within the first level. In addition to the main differences explained above, it should be pointed out that the stacking algorithm methodology has not appeared previously in the actuarial literature related to the valuation of loss reserves. Apart from that, a log-normal approach is combined with the suggested reserving model based on machine learning in order to compute the reserve variability.

As all the different algorithms and reserving models of the first level are incorporated in the ANN of the second level, some of the most important research studies carried out in the context of selecting the optimal ANN architecture will be discussed.
There is a significant amount of literature supporting the use of ANNs with just one hidden layer because, under mild assumptions on the activation functions, the universal approximation theorem states that a feedforward ANN with a single hidden layer and a finite number of neurons can approximate any continuous function on compact subsets of the Euclidean space.

Based on regularization techniques and using a network with just one hidden layer, Poggio and Girosi (1990) developed a theoretical framework to approximate nonlinear mappings, named regularization networks. These authors demonstrated that their architecture can approximate any continuous function on a compact domain if the number of units is high enough. Cybenko (1989) and Hornik et al. (1989) also proved that one hidden layer networks with sigmoidal activation functions can approximate continuous functions on any compact subset of the Euclidean space. It was also shown that, under certain conditions, an arbitrarily small error between a single hidden layer ANN and any other continuous function can be obtained by increasing the number of neurons (Barron 1994, Funahashi 1989 and Hornik 1993). Nakama (2011) showed that the range of effective learning rates is wider in the case of ANNs with one hidden layer than in architectures with multiple hidden layers.

On the other hand, Hornik (1991) and Leshno et al. (1993) demonstrated that ANNs have the potential of being universal approximators not only due to the choice of a specific activation function but also because of the possibility of using several hidden layers. Limitations of the approximation capabilities of one hidden layer networks were demonstrated by Chui et al. (1994) and Chui et al. (1996). In recent years, multi-hidden layer architectures have improved the state of the art in machine learning.

For example, in the context of natural language processing, the models and architectures created by Devlin et al. (2018) (BERT), Brown et al. (2020) (GPT-3) and Vaswani et al.
(2017) (Transformer) outperform other, less complex models. In addition, it is worth mentioning that agents trained with multi-hidden layer ANNs have been able to surpass human performance in specific tasks such as playing chess (Silver et al. 2017) or Go (Silver et al. 2016). With respect to the optimal number of neurons, Celikoglu (2007) analysed this issue in the context of solving the dynamic network loading problem, while Sheela and Deepa (2013) proposed a list of principles to select this number.

Results from recently published papers in the actuarial field support the idea of applying ANNs with multiple hidden layers. Indeed, Richman and Wüthrich (2018) and Nigri et al. (2019) applied this structure to model human mortality, while Castellani et al. (2018) used it for estimating the economic capital of insurance companies under the Solvency II framework. Thus, the ANNs included within the architecture of the Stacked-ANN model have several hidden layers.

The rest of the paper proceeds as follows: Section 2 presents the set of models used for comparison purposes. Additionally, the error and risk measures taken to validate the stochastic reserves, payments and ultimate losses are discussed. In Section 3, the theoretical background and architecture of the reserving model based on stacking algorithms (Stacked-ANN) are explained. Details about the log-normal approach proposed for obtaining a stochastic distribution are also given in that section. The empirical results, error and risk measures of the different reserving models are shown in Section 4. Finally, Section 5 presents the main conclusions derived from the results and comparisons presented in Section 4.

As previously stated, this section explains the benchmark models and the different measures used to assess their performance.
Thus, the first paragraphs are dedicated to ODP, Mack's model, CSR and a nonparametric approach based on ANNs, while the end of this section presents the indicators used to compare and validate the reserve distribution functions estimated by the benchmark models with those simulated by the model presented in Section 3.

The first benchmark model is ODP (Renshaw and Verrall 1998 and England and Verrall 1999). Denoting the origin year as $i$ and the development year as $j$, this reserving model based on the Chain Ladder technique assumes that incremental payments, $C_{ij}$, follow an overdispersed Poisson distribution with a variance proportional to the mean:

$$E[C_{ij}] = \mu_{ij}, \qquad Var[C_{ij}] = \phi\,\mu_{ij} \tag{1}$$

where $\phi$ is the parameter that determines the level of overdispersion. Even though this model assumes $C_{ij}$ to be a positive integer, the quasi-likelihood approach (McCullagh and Nelder 1989) allows the model to be fitted to non-integer data, which can be either positive or negative. The bootstrapping procedure used in this study to compute a reserve distribution function with the ODP model was introduced by England and Verrall (1999) and England (2002).

The Mack (1993) model, which is also based on the Chain Ladder technique, is the second benchmark. The main characteristic of this reserving model is the lack of assumptions about the underlying distribution of the payments. This is achieved by using only the first two moments:

$$E[D_{ij}] = \lambda_j D_{i,j-1}, \qquad Var[D_{ij}] = \sigma_j^2 D_{i,j-1} \tag{2}$$

where $\lambda_j$ and $\sigma_j$ refer to the parameters to be estimated, and $D_{ij}$ is the cumulative payment. As with the ODP model, a bootstrapping procedure is used to calculate the reserve distribution function with Mack's model.

The third benchmark model is CSR, a Bayesian approach introduced by Meyers (2015). The default calibration and prior distributions suggested by this author will be used in this study:
• $\alpha_i \sim N(\ln P_i + logelr, \sqrt{10})$, where $logelr \sim U(-1, 0.5)$ and $P_i$ are the premiums by accident year.
• $\beta_j \sim U(-5, 5)$ for $j = 1, ..., J-1$. In the last development year, $\beta_J = 0$.
• $\mu_{i,j} = \alpha_i + \beta_j (1-\gamma)^{i-1}$, where $\gamma \sim N(0, 0.025)$.
• Each $\sigma_j = \sum_{i=j}^{J} a_i$, where $a_i \sim U(0, 1)$.
• $D_{i,j} \sim LN(\mu_{i,j}, \sigma_j)$, subject to the constraint $\sigma_1 > \sigma_2 > ... > \sigma_J$.

To analyse the improvement in performance due to the stacking procedure that is presented in Section 3, the last benchmark model to be introduced is an individual ANN. The inputs and characteristics (hidden layers, activation functions, etc.) of this algorithm will be the same as those used for the ANN included within the first level of the Stacked-ANN. Additionally, the log-normal procedure to obtain the reserve variability is the same as that for the Stacked-ANN model. To avoid repeating content, refer to Section 3 for further details about the characteristics of the ANN used as a benchmark.

Now that the four benchmark models have been explained, the different measures selected to compare the performance of the Stacked-ANN with the aforementioned reserving models are presented. Insurance regulations such as the Solvency II Directive and the Swiss Solvency Test require general insurance companies to evaluate their expected reserves and potential deviations from these central scenarios. Thus, the error of the estimated reserves will be computed in order to compare the performance of the different models. As several triangles with different levels of payments are used during this study, the measure for evaluating the reserves is

$$\%RMSE(R^t) = \frac{\sqrt{\sum_{k=1}^{K} (\hat{R}^t_{k,\mu} - R^t_k)^2 / K}}{\sum_{k=1}^{K} R^t_k} \times 100 = \frac{RMSE(R^t)}{\sum_{k=1}^{K} R^t_k} \times 100 \tag{3}$$

where $K$ is the total number of triangles, $t$ is the calendar year when the reserves are evaluated, $\hat{R}^t_{k,\mu}$ is the reserve predicted by the reserving model using triangle $k$, and $R^t_k$ is the reserve that was actually observed for that triangle. As can be derived from the former expression, the aim of this error measure is to evaluate the weight of the root mean squared error over the total reserves. To understand the model's performance, this error measure will also be calculated for the next year's payments ($\%RMSE(P^{t+1})$) and the ultimate loss cost ($\%RMSE(U^t)$).

In addition to the aforementioned error measures, the reserving risk ($RR$) per unit of reserve derived from the use of the different stochastic reserving models will be analysed. As previously stated, the models are going to be fitted to several triangles, so the average of the former ratio is taken as a risk measure:

$$Ratio(RR^t_{1-\alpha}) = \frac{\sum_{k=1}^{K} (\hat{R}^t_{k,1-\alpha} - \hat{R}^t_{k,\mu}) / \hat{R}^t_{k,\mu}}{K} = \frac{\sum_{k=1}^{K} RR^t_{1-\alpha} / \hat{R}^t_{k,\mu}}{K} \tag{4}$$

where $\hat{R}^t_{k,\mu}$ is the mean and $\hat{R}^t_{k,1-\alpha}$ is the percentile $1-\alpha$ of the estimated reserve distribution function of company $k$. A deeper evaluation of the variation estimated by the different stochastic models is carried out by calculating the standard deviation per unit of reserve:

$$Ratio(\sigma) = \frac{\sum_{k=1}^{K} \sigma(\hat{R}^t_k) / \hat{R}^t_{k,\mu}}{K} \tag{5}$$

Finally, in order to check the adequacy of the reserving risk calculated for the different companies, the Kupiec (1995) test is applied to verify whether the number of excesses is aligned with the selected confidence level. The empirical results of the test and measures are collected in Section 4.

This section is divided into several subsections in order to sequentially explain the proposed reserving model. In addition, Figure 1 presents the model architecture in order to support the explanation.
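As an aside on the measures of Section 2, Equations (3) to (5) and the Kupiec test can be sketched in Python. This is only an illustrative sketch: the function names, the array layout (one row of simulated reserves per triangle) and the use of the sample standard deviation are assumptions, not details taken from the paper. The Kupiec statistic is written in its usual proportion-of-failures form, with a p-value from the chi-square distribution with one degree of freedom.

```python
import math
import numpy as np

def pct_rmse(predicted, observed):
    """Equation (3): RMSE over the K triangles as a percentage of total observed reserves."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    rmse = np.sqrt(np.mean((predicted - observed) ** 2))
    return rmse / observed.sum() * 100.0

def ratio_rr(simulated, alpha):
    """Equation (4): mean over triangles of (percentile(1 - alpha) - mean) / mean.
    `simulated` has shape (K, n_sims): one row of simulated reserves per triangle."""
    simulated = np.asarray(simulated, dtype=float)
    mean = simulated.mean(axis=1)
    pct = np.quantile(simulated, 1.0 - alpha, axis=1)
    return float(np.mean((pct - mean) / mean))

def ratio_sigma(simulated):
    """Equation (5): mean over triangles of the standard deviation per unit of reserve."""
    simulated = np.asarray(simulated, dtype=float)
    return float(np.mean(simulated.std(axis=1, ddof=1) / simulated.mean(axis=1)))

def kupiec_pof(n_excesses, n_obs, p):
    """Kupiec proportion-of-failures test: returns (LR statistic, p-value)."""
    def xlogy(x, y):
        # Convention 0 * log(0) = 0 so that zero or all excesses do not fail.
        return 0.0 if x == 0 else x * math.log(y)
    x, n = n_excesses, n_obs
    log_l0 = xlogy(n - x, 1.0 - p) + xlogy(x, p)          # under the nominal rate p
    log_l1 = xlogy(n - x, 1.0 - x / n) + xlogy(x, x / n)  # under the observed rate x/n
    lr = -2.0 * (log_l0 - log_l1)
    # Survival function of a chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))
    return lr, p_value
```

For instance, `kupiec_pof(n_excesses, 50, alpha)` would compare the number of triangles in one line of business whose observed reserve exceeds the estimated percentile $1-\alpha$ with the number expected at that level.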

Before estimating the different reserving models within the first level of the Stacked-ANN model, the database used, as well as the response and explanatory variables for fitting the algorithms within this level, need to be defined.

The lower and upper triangles needed to fit and validate the models are obtained from Schedule P of the NAIC Annual Statement. This database (available on the CAS website) was collected from property and casualty insurers that underwrite business in the US, and it contains both paid and incurred losses (net of reinsurance) of the accident years from 1988 to 1997. Ten development years are available for every accident year. In addition to loss data, gross and net premiums by accident year are also reported in the database.

Figure 1: Stacked-ANN model structure

In this paper, the different reserving models will be fitted to 200 loss triangles from NAIC Schedule P, 50 from each of the following lines of business: Commercial Auto (CA), Private Passenger Auto Liability (PA), Workers' Compensation (WC) and Other Liability (OL). As pointed out by Meyers (2015), selecting triangles from insurers who made significant changes in business operations is one of the main mistakes that could be made with NAIC Schedule P data. The coefficient of variation of the net premiums and the net/gross premium ratio should be appropriate indicators of changes in business operations, so this author selected insurers that minimize the aforementioned metrics. The triangles selected by Meyers (2015) are used in this research in order to avoid the former issue and ensure comparability with other studies.

With regard to the explanatory variables, as with other nonparametric reserving models based on Generalized Additive Models (Hastie and Tibshirani 1986 and England and Verrall 2002) or RNN (Kuo 2018), accident $i$ and development $j$ years were selected to be the inputs of the first-level algorithms. Both are indexed starting from one and then scaled to the range $[0, 1]$ (hereinafter $AY^*_i$ and $DY^*_j$) in order to facilitate the fitting of the algorithms (Hastie et al. 2009).

The response variable of these algorithms is the scaled cumulative payments $D^*_{ij}$. Depending on the data availability and the characteristics of the portfolio to be modelled, different exposure measures can be selected to scale $D_{ij}$. In this paper, net premiums $P_i$ play the role of exposure measure, as this is the most relevant option among the variables available in the database.

Figure 2: Train and test sets

Loss triangles are a representation of payments over time by accident or underwriting year. Thus, the training and optimization of the deep learning algorithms within the Stacked-ANN model architecture need to take into consideration that loss triangles are composed of time series. Accordingly, the last diagonal is selected as a test set because it contains the most recent information, while the rest of the triangle is used for fitting the algorithms (Figure 2).

During the optimization process, different configurations of the algorithms are fitted with the training data. To obtain the best configuration, the test set is predicted, and the root mean squared error of every option is computed. Finally, the configuration that minimizes the former test error is selected.

The first level of the Stacked-ANN model consists of Chain Ladder, CSR, and three algorithms whose inputs were described in Section 3.1. It is worth mentioning that, as ODP and Mack's model are based on the Chain Ladder technique, the Stacked-ANN model incorporates the core rationale behind these stochastic reserving models. The machine learning algorithms (RF, GB and ANN) fitted at this step are explained in the following paragraphs and will be optimized by applying a grid search to some hyperparameters and by measuring the test error.
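The data preparation of Section 3.1 and the hold-out of the last diagonal can be illustrated with a short sketch. It assumes a square cumulative triangle stored as a NumPy array with unobserved cells set to NaN, and it assumes min-max scaling for the accident and development years; the paper only states that both are scaled to $[0, 1]$. The function name and the toy figures are hypothetical.

```python
import numpy as np

def triangle_to_samples(cumulative, premiums):
    """Flatten an n x n cumulative triangle into (X, y, is_test).

    X holds the scaled accident and development years, y the cumulative
    payments scaled by the net premium of the accident year, and is_test
    flags the cells of the last observed diagonal (the test set)."""
    cumulative = np.asarray(cumulative, dtype=float)
    premiums = np.asarray(premiums, dtype=float)
    n = cumulative.shape[0]
    # Accident year i and development year j, both indexed from one.
    ay, dy = np.meshgrid(np.arange(1, n + 1), np.arange(1, n + 1), indexing="ij")
    observed = (ay + dy) <= n + 1      # upper triangle, last diagonal included
    last_diagonal = (ay + dy) == n + 1
    # Min-max scaling to [0, 1] (an assumption of this sketch).
    ay_scaled = (ay - 1) / (n - 1)
    dy_scaled = (dy - 1) / (n - 1)
    d_star = cumulative / premiums[:, None]
    X = np.column_stack([ay_scaled[observed], dy_scaled[observed]])
    y = d_star[observed]
    is_test = last_diagonal[observed]
    return X, y, is_test

# Toy 3 x 3 triangle with made-up figures.
toy = np.array([[10.0, 15.0, 18.0],
                [12.0, 18.0, np.nan],
                [11.0, np.nan, np.nan]])
X, y, is_test = triangle_to_samples(toy, premiums=[20.0, 24.0, 22.0])
X_train, y_train = X[~is_test], y[~is_test]
X_test, y_test = X[is_test], y[is_test]
```

Each candidate configuration would then be fitted on `X_train`, `y_train` and ranked by its root mean squared error on `X_test`, `y_test`.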
Additionally, at the end of this subsection, the Chain Ladder and CSR hypotheses are integrated within the Stacked-ANN model architecture.

The Random Forest (RF) algorithm introduced by Breiman (2001) averages $B$ different regression trees. In every fitted tree, the explanatory variables and data points used during the training are randomly selected. Therefore, the formal expression to predict the scaled cumulative payments is:

$$\hat{D}^{*RF}_{ij} = \frac{\hat{D}^{RF}_{ij}}{P_i} = \frac{\sum_{b=1}^{B} T_b(X)}{B} \tag{6}$$

$T_b$ represents the $b$-th regression tree fitted and $X$ the selected subset of $AY^*_i$ and $DY^*_j$ used to fit $T_b$. During the estimation process, the hyperparameters optimized are the number of variables randomly selected, $N$, and the minimum number of observations to be kept in the terminal nodes of every fitted tree, $Obs_{RF}$.

The second algorithm within the first level is Gradient Boosting (GB) with regression trees (Friedman 2000). In this case, the loss function is minimized along its gradient by sequentially fitting $B$ regression trees. The subset of data to be used during the estimation process of every tree is also randomly selected. The expression to obtain the predicted scaled cumulative payments is

$$\hat{D}^{*GB}_{ij} = \frac{\hat{D}^{GB}_{ij}}{P_i} = \hat{f}_{B-1}(X) + \delta_{GB}\, T_B(X) \tag{7}$$

$\hat{f}_{B-1}(X)$ represents the function obtained after sequentially adding $B-1$ trees, and $\delta_{GB}$ is the learning rate. The hyperparameter selected to be optimized during the training process is the minimum number of observations to be kept in the terminal nodes of every fitted tree, $Obs_{GB}$. Regarding the hyperparameters, it is worth mentioning that the learning rate, $\delta_{GB}$, is set to 0.01.

The last algorithm of the first level is an Artificial Neural Network (ANN) (McCulloch and Pitts 1943).
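Before the ANN is detailed, the two tree-based learners of Equations (6) and (7) can be sketched with scikit-learn. This is a hedged illustration rather than the authors' implementation: the number of trees, the subsampling fraction, the grid values and the synthetic payout curve are all assumptions. Only the learning rate of 0.01, the tuned hyperparameters ($N$ and $Obs_{RF}$ for RF, $Obs_{GB}$ for GB) and the selection by test error on the last diagonal come from the text.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

def fit_best(make_model, grid, X_train, y_train, X_test, y_test):
    """Fit one model per grid value and keep the one with the lowest test RMSE."""
    best = None
    for value in grid:
        model = make_model(value).fit(X_train, y_train)
        rmse = float(np.sqrt(np.mean((model.predict(X_test) - y_test) ** 2)))
        if best is None or rmse < best[0]:
            best = (rmse, value, model)
    return best

def make_rf(params):
    n_vars, min_obs = params  # N and Obs_RF in the paper's notation
    return RandomForestRegressor(n_estimators=100, max_features=n_vars,
                                 min_samples_leaf=min_obs, random_state=0)

def make_gb(min_obs):
    # Learning rate 0.01 as stated in the text; the other settings are illustrative.
    return GradientBoostingRegressor(n_estimators=300, learning_rate=0.01,
                                     subsample=0.8, min_samples_leaf=min_obs,
                                     random_state=0)

# Synthetic 10 x 10 example: scaled years as inputs, a smooth made-up payout curve as response.
n = 10
ay, dy = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
observed = (ay + dy) <= n - 1
is_test = ((ay + dy) == n - 1)[observed]          # last diagonal held out
X = np.column_stack([ay[observed], dy[observed]]) / (n - 1)
y = 0.7 * (1.0 - np.exp(-3.0 * X[:, 1])) * (1.0 + 0.1 * X[:, 0])

rf_grid = [(1, 1), (1, 3), (2, 1), (2, 3)]
gb_grid = [1, 3, 5]
rf_rmse, rf_params, rf_model = fit_best(make_rf, rf_grid, X[~is_test], y[~is_test],
                                        X[is_test], y[is_test])
gb_rmse, gb_params, gb_model = fit_best(make_gb, gb_grid, X[~is_test], y[~is_test],
                                        X[is_test], y[is_test])
```

The fitted `rf_model` and `gb_model` would then predict every cell of the triangle, and those predictions would feed the second level.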
Following the notation provided by Bishop (2006) and taking into consideration that the feed-forward ANN used in this paper is composed of 2 hidden layers with 5 neurons each, the formal expression to obtain the predictions can be defined as follows:

$$\hat{D}^{*ANN}_{ij} = \hat{D}^{ANN}_{ij} / P_i = h^{(3)}\left( \sum_{k=1}^{5} w^{(3)}_{1,k}\, h^{(2)}\left( \sum_{j=1}^{5} w^{(2)}_{k,j}\, h^{(1)}\left( \sum_{i=1}^{2} w^{(1)}_{j,i}\, x_i + w^{(1)}_{j,0} \right) + w^{(2)}_{k,0} \right) + w^{(3)}_{1,0} \right) \tag{8}$$

where $h^{(n)}$ is the activation function associated with layer $n$, $w^{(n)}_{z,v}$ is the $v$-th weight associated with the neuron $z$ inside layer $n$, and $x_i$ refers to the $i$-th input variable of the database composed of two explanatory variables, the scaled accident ($AY^*_i$) and development year ($DY^*_j$). The percentage of dropout regularization $\theta$ is the hyperparameter to be optimized by applying a grid search and measuring the test error. As with the other algorithms, upper triangle predictions will be used as input within the second level of the architecture.

In addition to the three aforementioned algorithms, Chain Ladder assumptions are incorporated in the model architecture. To do so, the development factors of the Chain Ladder technique are used as an input in the second level of the Stacked-ANN model:

$$\widehat{\lambda}^{*CL}_j = \frac{\sum_{i=1}^{n-j-1} D^*_{ij}}{\sum_{i=1}^{n-j-1} D^*_{i,j-1}} \tag{9}$$

where $\{\widehat{\lambda}^{*CL}_j : j = 2, 3, \ldots, J\}$. Although the Chain Ladder methodology does not produce any parameters for $j = 1$, the second-level algorithm needs a value for $j = 1$. Thus, within the Stacked-ANN methodology, it is assumed that $\widehat{\lambda}^{*CL}_1 = 1$.

Finally, the CSR methodology (Meyers 2015) is integrated. To achieve this, 10,000 MCMC simulations are produced within the first level of the Stacked-ANN model. Then, the expected scaled cumulative payments of the upper triangle arising from the aforementioned simulations are used as input in the algorithm within the second level of the Stacked-ANN model:

$$\hat{D}^{*CSR}_{ij} = \frac{\sum_{k=1}^{10{,}000} \hat{D}^{CSR}_{ijk} / P_i}{10{,}000} \tag{10}$$
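The Chain Ladder input of Equation (9) can be illustrated with a few lines of NumPy. The sketch below is written under stated assumptions: the scaled triangle is an array with NaN in unobserved cells, and each factor sums over every accident year for which both development columns are observed (the usual volume-weighted estimator). The function name is hypothetical.

```python
import numpy as np

def chain_ladder_factors(d_star):
    """Volume-weighted development factors of a scaled cumulative triangle.

    d_star: (n, J) array with NaN in unobserved cells. The first entry is
    set to 1, mirroring the convention adopted for j = 1 in the text."""
    d_star = np.asarray(d_star, dtype=float)
    n_dev = d_star.shape[1]
    factors = np.ones(n_dev)
    for j in range(1, n_dev):
        # Accident years observed in both column j-1 and column j.
        both = ~np.isnan(d_star[:, j]) & ~np.isnan(d_star[:, j - 1])
        factors[j] = d_star[both, j].sum() / d_star[both, j - 1].sum()
    return factors

toy = np.array([[1.0, 2.0, 3.0],
                [2.0, 4.0, np.nan],
                [3.0, np.nan, np.nan]])
factors = chain_ladder_factors(toy)
```

In the stacked model, each cell $(i, j)$ fed to the second level would carry the factor of its development year alongside the RF, GB, ANN and CSR predictions.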

As previously stated, the inputs of this level are the scaled cumulative payments predicted by the algorithms named in Section 3.2 (RF, GB and ANN), the development factors based on the Chain Ladder technique and the expected scaled cumulative payments simulated by the CSR model. On the other hand, the output of the ANN within the second level, and therefore of the Stacked-ANN, is the scaled cumulative payments $\hat{D}^{*S-ANN}_{ij}$ by accident and development year.

Figure 3: Second-level structure

Similar to the first-level algorithms, the training and optimization processes of the ANN within this level need to recognize that loss triangles are composed of a set of time series. The most recent information of the loss triangles is the last diagonal; thus, the explanatory and response variables of this diagonal are selected as a test set, while the rest of the upper triangle data is used as a training set.

Once the test and training sets are defined, the optimum configuration of the ANN needs to be obtained. To do so, the training data are used to fit ANNs with different levels of dropout regularization $\theta$. Then, the root mean squared error is computed by taking into consideration the predictions made by every ANN configuration. The $\theta$ that minimizes the test error is selected.

Due to the Stacked-ANN architecture, two substeps need to be carried out in order to make the final predictions. First, the lower triangle of the first-level models needs to be predicted. Second, the data predicted in the previous step are used as input of the ANN within the second level to make the final predictions. Thus, the Stacked-ANN model tries to obtain more accurate predictions by combining different reserving models and algorithms.

Figure 1 shows the overall Stacked-ANN architecture, and Figure 3 provides a detailed summary of the process defined in the previous paragraphs. Technical details about the feedforward ANN fitted within this level of the Stacked-ANN model are presented below:
• It contains two hidden layers with 5 neurons each. The sigmoid activation function was selected for all neurons within the hidden layers, while the linear activation function was used in the output layer, which is composed of one neuron.
• The selected optimization algorithm is Adaptive Moment Estimation (ADAM), which was created by Kingma and Ba (2014). This method consists of a progressive adaptation of the initial learning rate, taking into consideration current and previous gradients. The default calibration proposed by the authors for the ADAM parameters is applied, that is, $\beta_1 = 0.9$ and $\beta_2 = 0.999$:

$$\omega_t = \omega_{t-1} - \delta_{ANN} \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} \tag{11}$$

$$\hat{m}_t = \frac{\beta_1 m_{t-1} + (1 - \beta_1)\, g_t}{1 - \beta_1^t} \tag{12}$$

$$\hat{v}_t = \frac{\beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2}{1 - \beta_2^t} \tag{13}$$

where $\omega$ is the parameter to be updated and $g_t$ the gradient in the epoch $t$. The initial learning rate is set to $\delta_{ANN} = 0.$[…]
• The number of epochs is 10,000, and the batch size is equal to the length of the data used for training the ANN.
• The backward pass uses the root mean squared error as the loss function.

As previously stated, the percentage of dropout regularization $\theta$ is the hyperparameter to be optimized by applying a grid search and measuring the test error.

Taking the abovementioned details into consideration, the scaled cumulative payments predicted by the Stacked-ANN model are obtained by means of the following expression:

$$\hat{D}^{*S-ANN}_{ij} = \frac{\hat{D}^{S-ANN}_{ij}}{P_i} = \hat{f}\left(\hat{D}^{*RF}_{ij}, \hat{D}^{*GB}_{ij}, \hat{D}^{*ANN}_{ij}, \hat{\lambda}^{*CL}_{j}, \hat{D}^{*CSR}_{ij}\right) = h^{(3)}\left( \sum_{k=1}^{5} w^{(3)}_{1,k}\, h^{(2)}\left( \sum_{j=1}^{5} w^{(2)}_{k,j}\, h^{(1)}\left( \sum_{i=1}^{5} w^{(1)}_{j,i} x_i + w^{(1)}_{j,0} \right) + w^{(2)}_{k,0} \right) + w^{(3)}_{1,0} \right) \quad (14)$$

To compute the Kupiec test and the measures related to reserve variability (Section 2), the deviation around the central scenario predicted by the Stacked-ANN model needs to be obtained. Due to its right skewness and long tail, the log-normal distribution is widely used within reserving models to derive the variability of the claims cost. Many papers have used the log-normal distribution to compute this variability (see, among others, Kremer (1982), Antonio et al. (2006), Rehman and Klugman (2009), Weke and Ratemo (2013), Meyers (2015) or, more recently, Omari et al. (2018)).

In this study, a log-normal distribution is used to compute the reserve variability around the central scenario predicted by the Stacked-ANN. To do so, the parameters of this distribution are obtained using the aforementioned predictions and the method of moments. Therefore, regardless of the distribution selected, the central scenario is that predicted by the Stacked-ANN, and thus, changing the distribution has no effect on the error measures described in Section 2. Nevertheless, changes to the log-normal hypothesis will modify the variability and, consequently, the risk measures ($Ratio(RR^t_{1-\alpha})$ and $Ratio(\sigma)$) and the results of the Kupiec test. Below, the steps of the procedure are described:

1. Starting with the scaled cumulative payments predicted by the Stacked-ANN ($\hat{D}^{*S-ANN}_{ij}$), the variance by development year is computed as follows:

$$Var[\hat{D}^{*S-ANN}_{j}] = \frac{\sum_{i=1}^{n} \left( \hat{D}^{*S-ANN}_{ij} - E[\hat{D}^{*S-ANN}_{j}] \right)^2}{n - 1} \quad (15)$$

where $n$ refers to the total number of accident years and $E[\hat{D}^{*S-ANN}_{j}]$ is the mean of the scaled cumulative payments by development year.

2. By using the method of moments and the values calculated in the previous step, the parameters of the log-normal distribution are obtained:

$$\hat{\mu}_{ij}[\hat{D}^{*S-ANN}] = \ln\left( \frac{E[\hat{D}^{*S-ANN}_{j}]^2}{\sqrt{Var[\hat{D}^{*S-ANN}_{j}] + E[\hat{D}^{*S-ANN}_{j}]^2}} \right) \quad (16)$$

$$\hat{\sigma}^2_{j}[\hat{D}^{*S-ANN}] = \ln\left( 1 + \frac{Var[\hat{D}^{*S-ANN}_{j}]}{E[\hat{D}^{*S-ANN}_{j}]^2} \right) \quad (17)$$

3. For $t = (1, 2, ..., T)$:

(a) A triangle is generated by sampling random values from the following distribution: $\hat{C}^{*S-ANN,k}_{ij} \sim LN(\hat{\mu}_{ij}[\hat{D}^{*S-ANN}], \hat{\sigma}_{j}[\hat{D}^{*S-ANN}])$.

(b) The final simulated values, $\hat{C}^{S-ANN,k}_{ij}$, are obtained by removing the scaling. Hence, the scaled payments obtained in the previous step are multiplied by $P_i$.

In this section, the data used, the fitting process and a final comparison between the Stacked-ANN and the benchmark models are shown.
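The simulation procedure of Section 3.4 (column variance, method-of-moments parameters, log-normal sampling and rescaling by the premium) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the nested-list layout of the triangle and the toy figures in the usage note are hypothetical, and the moment matching uses the standard log-normal formulas, with one pair of parameters per development year.

```python
import math
import random


def lognormal_params(mean, variance):
    """Method-of-moments (mu, sigma) of a log-normal with the given mean and variance."""
    sigma2 = math.log(1.0 + variance / mean ** 2)
    mu = math.log(mean ** 2 / math.sqrt(variance + mean ** 2))
    return mu, math.sqrt(sigma2)


def column_moments(scaled):
    """Mean and sample variance (n - 1 denominator) of each development-year column."""
    n = len(scaled)
    moments = []
    for j in range(len(scaled[0])):
        column = [row[j] for row in scaled]
        mean = sum(column) / n
        variance = sum((x - mean) ** 2 for x in column) / (n - 1)
        moments.append((mean, variance))
    return moments


def simulate(scaled, premiums, n_sims, seed=0):
    """Draw n_sims simulated payment tables and remove the premium scaling."""
    rng = random.Random(seed)
    params = [lognormal_params(m, v) for m, v in column_moments(scaled)]
    simulations = []
    for _ in range(n_sims):
        table = [
            [premium * rng.lognormvariate(mu, sigma) for mu, sigma in params]
            for premium in premiums
        ]
        simulations.append(table)
    return simulations
```

For example, `simulate([[0.2, 0.5], [0.3, 0.6], [0.25, 0.7]], [100.0, 200.0, 300.0], n_sims=4)` returns four simulated tables of three accident years by two development years, expressed in currency units rather than as a share of premium.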

As stated in Section 3.1, the upper and lower triangles required to fit and validate the models are obtained from Schedule P of the NAIC Annual Statement. This database contains the losses, reserves and premiums from 1988 until 1997 of different property and casualty insurers that underwrite business in the United States.

Meyers (2015) indicated that one of the main mistakes with the NAIC Schedule P data is selecting triangles from insurers that made significant changes in their businesses. Meyers used the coefficient of variation of the net premiums and the net-on-gross ratio to select 50 triangles for each of the following lines of business: Commercial Auto (CA), Private Passenger Auto Liability (PA), Workers' Compensation (WC) and Other Liability (OL). This triangle selection was also used in this paper to ensure comparability with other studies that take the selection made by Meyers (2015) as a reference. For further details about the data used to fit the Stacked-ANN, refer to Section 3.1.

Once the data have been presented, this subsection focuses on the fitting of the Stacked-ANN. The first level of the proposed model is composed of three individual algorithms (RF, GB and ANN), the CSR reserving model and the development factors derived from the use of the Chain Ladder technique. The second level is composed of an ANN. As pointed out in Sections 3.2 and 3.3, the optimum hyperparameters of the algorithms within the first and second levels are obtained for each triangle using a grid search. Table 1 lists the means of the 50 optimum hyperparameters obtained for each algorithm; the differences across the lines of business are minor.

Table 1: Mean of the optimum hyperparameters by line of business

| Line of Business | RF first level | GB first level | ANN first level | ANN second level |
|---|---|---|---|---|
| CA | $Obs_{RF}$ = 2.[…] | $N$ = 1.[…], $Obs_{GB}$ = 4.[…] | $\theta$ = 0.[…] | $\theta$ = 0.[…] |
| PA | $Obs_{RF}$ = 2.[…] | $N$ = 1.[…], $Obs_{GB}$ = 4.[…] | $\theta$ = 0.[…] | $\theta$ = 0.[…] |
| WC | $Obs_{RF}$ = 1.[…] | $N$ = 1.[…], $Obs_{GB}$ = 3.[…] | $\theta$ = 0.[…] | $\theta$ = 0.[…] |
| OL | $Obs_{RF}$ = 1.[…] | $N$ = 1.[…], $Obs_{GB}$ = 4.[…] | $\theta$ = 0.[…] | $\theta$ = 0.[…] |

Source: own elaboration

As previously stated, the development factors ($\hat{\lambda}^{*CL}_{j}$) obtained by applying the Chain Ladder technique to $D^{*}_{ij}$ are used as input for the ANN included within the second level of the Stacked-ANN model. These values are calculated for each triangle. Table 2 presents the means of the development factors by line of business.

In addition to the three algorithms of the first level and the Chain Ladder technique, the CSR model is also incorporated in the Stacked-ANN architecture by inputting $\hat{D}^{*CSR}_{ij}$ into the second-level algorithm. This Bayesian reserving model is fitted to every single triangle. Tables 3 and 4 list the means of the CSR parameters by line of business.

Table 2: Mean of the development factors by line of business (rows: development factors $\hat{\lambda}^{*CL}_{j}$; columns: CA, PA, WC, OL)

Source: own elaboration

Table 3, which is focused on the parameters needed to calculate the mean of the cumulative payments, presents positive $\gamma$ and negative $\beta_j$ for every line of business with the sole exception of CA, where $\beta_6$, $\beta_7$, $\beta_8$ and $\beta_9$ are positive. According to the model definition, the claims settlement speed increases when $\beta_j < 0$ and $\gamma > 0$.

Table 3: CSR parameters by line of business: $D_{i,j}$ mean

| CSR parameter | CA | PA | WC | OL |
|---|---|---|---|---|
| $\beta_1$ | -1.235 | -0.987 | -1.447 | -2.446 |
| $\beta_2$ | -0.514 | -0.400 | -0.626 | -1.332 |
| $\beta_3$ | -0.229 | -0.198 | -0.322 | -0.709 |
| $\beta_4$ | -0.085 | -0.097 | -0.178 | -0.363 |
| $\beta_5$ | -0.003 | -0.042 | -0.089 | -0.173 |

Source: own elaboration
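As an illustration of the Chain Ladder development factors that feed the second level, the following sketch computes volume-weighted factors from a cumulative triangle and uses them to complete the unobserved cells. The function names, the use of `None` for unobserved cells and the toy figures are assumptions made for this example; the paper applies the technique to the scaled payments $D^{*}_{ij}$ of each triangle, and the exact weighting it uses is not restated here.

```python
def chain_ladder_factors(triangle):
    """Volume-weighted development factors of a cumulative triangle.

    triangle is a list of rows (accident years); None marks unobserved cells.
    """
    n_dev = len(triangle[0])
    factors = []
    for j in range(n_dev - 1):
        numerator = 0.0
        denominator = 0.0
        for row in triangle:
            # Only accident years observed at both development years contribute.
            if row[j] is not None and row[j + 1] is not None:
                numerator += row[j + 1]
                denominator += row[j]
        factors.append(numerator / denominator)
    return factors


def complete_triangle(triangle, factors):
    """Fill unobserved cells by rolling each row forward with the factors."""
    filled = [list(row) for row in triangle]
    for row in filled:
        for j in range(1, len(row)):
            if row[j] is None:
                row[j] = row[j - 1] * factors[j - 1]
    return filled


toy = [
    [100.0, 150.0, 165.0],
    [110.0, 160.0, None],
    [120.0, None, None],
]
toy_factors = chain_ladder_factors(toy)
toy_filled = complete_triangle(toy, toy_factors)
```

In the toy triangle, the first factor is (150 + 160) / (100 + 110) and the second is 165 / 150, so each factor states how many times larger the next cumulative payment is expected to be.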

The comparison of the CSR deviation by development year presented in Table 4 reveals that OL is the most volatile portfolio, while PA has the most stable reserves. For CA and WC, the reserve variability estimated by this Bayesian reserving model is quite similar, and it is located at an intermediate point between the OL and PA lines of business.

Table 4: CSR parameters by line of business: $D_{i,j}$ STD (rows: CSR parameters $\sigma_j$; columns: CA, PA, WC, OL)

Source: own elaboration
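The per-triangle grid search used in this subsection to select hyperparameters, such as the dropout percentage $\theta$, can be sketched generically: each candidate value is fitted and the one with the lowest test error is kept. Everything in this sketch is an assumption for illustration, including the RMSE test error, the candidate grid and the toy "shrunk mean" model that stands in for a real algorithm.

```python
import math


def rmse(predictions, targets):
    """Root mean squared error between two equally long sequences."""
    squared = sum((p - y) ** 2 for p, y in zip(predictions, targets))
    return math.sqrt(squared / len(targets))


def grid_search(candidates, fit, test_inputs, test_targets):
    """Return (best_value, best_error) over the candidate hyperparameter values.

    fit(value) is a hypothetical callable that trains a model with the given
    hyperparameter and returns a prediction function.
    """
    best_value, best_error = None, float("inf")
    for value in candidates:
        predict = fit(value)
        error = rmse([predict(x) for x in test_inputs], test_targets)
        if error < best_error:
            best_value, best_error = value, error
    return best_value, best_error


def fit_shrunk_mean(theta, train_targets=(1.0, 1.2, 0.8)):
    """Toy model: predicts the training mean shrunk by the factor (1 - theta)."""
    level = (1.0 - theta) * sum(train_targets) / len(train_targets)
    return lambda x: level


best_theta, best_error = grid_search(
    [0.0, 0.1, 0.2, 0.3], fit_shrunk_mean, [None, None], [0.9, 0.95]
)
```

Repeating this search for every triangle yields one optimum per triangle and algorithm, which is what Table 1 averages by line of business.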

Once the Stacked-ANN reserving model is fitted, its performance is compared with the benchmark models explained in Section 2 (ODP, Mack, CSR and an individual ANN).

Table 5 lists the %RMSEs associated with reserves $R^t$, next year payments $P^{t+1}$ and ultimate losses $U^t$ by line of business and reserving model. For further details about the measures presented in the table, refer to Section 2.

Table 5: %RMSE by line of business and reserving model

| Error Measure | Line of Business | ODP | Mack's Model | CSR | ANN | Stacked-ANN |
|---|---|---|---|---|---|---|
| %RMSE($R^t$) | CA | 0.896% | 0.896% | 0.534% | 1.768% | 0.739% |
| %RMSE($P^{t+1}$) | CA | 0.668% | 0.669% | 0.573% | 1.775% | 0.876% |
| %RMSE($U^t$) | CA | 0.170% | 0.171% | 0.102% | 0.337% | 0.141% |
| %RMSE($R^t$) | PA | 1.012% | 1.004% | 0.823% | 5.006% | 0.254% |
| %RMSE($P^{t+1}$) | PA | 1.290% | 1.286% | 0.258% | 1.900% | 0.320% |
| %RMSE($U^t$) | PA | 0.131% | 0.131% | 0.107% | 0.651% | 0.033% |
| %RMSE($R^t$) | WC | 1.295% | 1.286% | 1.751% | 1.943% | 1.058% |
| %RMSE($P^{t+1}$) | WC | 0.887% | 0.880% | 1.531% | 1.525% | 0.676% |
| %RMSE($U^t$) | WC | 0.222% | 0.221% | 0.301% | 0.333% | 0.182% |
| %RMSE($R^t$) | OL | 5.274% | 5.086% | 3.153% | 5.725% | 0.722% |
| %RMSE($P^{t+1}$) | OL | 2.216% | 2.102% | 5.528% | 0.268% | 1.095% |
| %RMSE($U^t$) | OL | 1.760% | 1.709% | 1.056% | 1.918% | 0.242% |

Source: own elaboration

The results obtained by using the different reserving models are summarized as follows:

• The Stacked-ANN model outperforms the individual ANN. The proposed architecture is empirically more accurate because it can learn from the reserving models (Chain Ladder and CSR) and machine learning algorithms (RF, GB and ANN) included within the first level of the Stacked-ANN, while the individual ANN must base its training only on the origin data ($AY^{*}_{i}$ and $DY^{*}_{j}$) without taking advantage of other models that are able to capture different patterns and characteristics.

• As they are based on Chain Ladder assumptions, the mean of the distributions generated by ODP and Mack's model should converge to the values obtained by applying the deterministic approach of the Chain Ladder technique.
Consequently, the error measures observed in Table 5 for these two stochastic reserving models are almost the same. The table also reveals that ODP and Mack are less accurate than the Stacked-ANN model in most cases. %RMSE($P^{t+1}$) of CA is the only category in which the benchmark models based on the Chain Ladder technique are more accurate than the proposed methodology.

• Regarding the comparison between Stacked-ANN and CSR, the $R^t$ and $U^t$ of PA, WC and OL estimated by the proposed model are significantly more accurate than those obtained when using the Bayesian model. Additionally, the %RMSE($P^{t+1}$) of the Stacked-ANN model is lower in WC and OL. Thus, in the majority of cases, the CSR model is outperformed by the proposed methodology.

To enhance the analysis of the results presented in Table 5, Figure 4 shows the %RMSE($R^t$) by line of business and volume of reserves. First, the companies were classified in four different groups taking into consideration the volume of reserves and the quartiles associated with the distribution. Then, the error rate of each reserving model was computed by line of business. The same calculation was also carried out without making any distinction between lines of business.

Figure 4: %RMSE($R^t$) by line of business and volume of reserves.

The results of the aforementioned figure reveal that when no distinction between lines of business is made, the Stacked-ANN architecture outperforms the rest of the reserving models regardless of the company size. This difference is especially relevant for those companies with a higher level of reserves, whose results are collected in the graph labelled 'Percentile: 75%-100%'. As expected, some fluctuations in the performance of the models are observed when the results are analysed by line of business and volume of reserves.
Nonetheless, the error rate of the Stacked-ANN tends to be lower than that of the benchmark models.

In accordance with the reasons explained in the previous paragraphs, it can be concluded that the Stacked-ANN model takes advantage of the different characteristics of several reserving models and machine learning algorithms, leading to a more flexible and precise architecture in most cases.

In addition to the error analysis, the risk measures ($Ratio(RR^t_{1-\alpha})$ and $Ratio(\sigma)$) and the p-values of the Kupiec test obtained by using each reserving model are compared. Before examining the results, it is important to point out that Mack's model does not make any assumptions about the payment distribution, ODP assumes that incremental payments follow an overdispersed Poisson distribution, and CSR, ANN and Stacked-ANN presume that cumulative payments are log-normally distributed. The hypotheses made regarding the payments affect the distribution shape and, consequently, the risk measures. Therefore, in this case, ODP and Mack's model are not going to converge as they did in the central scenario.

Table 6: Risk measures by line of business and reserving model

| Risk measure | Line of Business | ODP | Mack's Model | CSR | ANN | Stacked-ANN |
|---|---|---|---|---|---|---|
| $Ratio(RR^t_{1-\alpha})$ | CA | 1.936 | 1.460 | 2.776 | 1.387 | 1.884 |
| $Ratio(\sigma)$ | CA | 2.561 | 0.461 | 0.681 | 0.456 | 0.642 |
| Kupiec p-value | CA | ≥ […] | ≥ […] | ≥ […] | < […] | ≥ […] |
| $Ratio(RR^t_{1-\alpha})$ | PA | 0.544 | 0.373 | 0.918 | 0.783 | 0.888 |
| $Ratio(\sigma)$ | PA | 0.277 | 0.135 | 0.270 | 0.279 | 0.332 |
| Kupiec p-value | PA | ≥ […] | ≥ […] | ≥ […] | ≥ […] | ≥ […] |
| $Ratio(RR^t_{1-\alpha})$ | WC | 2.525 | 0.691 | 1.797 | 2.194 | 2.149 |
| $Ratio(\sigma)$ | WC | 1.273 | 0.245 | 0.474 | 0.682 | 0.717 |
| Kupiec p-value | WC | < […] | < […] | < […] | < […] | ≥ […] |
| $Ratio(RR^t_{1-\alpha})$ | OL | 7.506 | 3.287 | 4.843 | 2.315 | 3.522 |
| $Ratio(\sigma)$ | OL | 6.275 | 1.217 | 1.119 | 0.690 | 1.099 |
| Kupiec p-value | OL | ≥ […] | ≥ […] | ≥ […] | ≥ […] | ≥ […] |

Source: own elaboration

The $Ratio(RR^t_{1-\alpha})$ and the p-values collected in Table 6 evaluate the 99.[…] ($\alpha = 0.$[…]).

• According to the results of the Kupiec test, the Stacked-ANN generates an adequate risk assessment for every line of business. It is worth mentioning that when compared with the individual ANN, the empirical results show that the stacking process not only improves the error rate but also allows for the generation of more appropriate distribution functions using the same simulation approach (presented in Section 3.4). With regard to the comparison between the Stacked-ANN and the rest of the benchmark models, the Kupiec test reveals that CSR, ODP and Mack's model do not produce appropriate risk measures for WC, while the proposed methodology passes the test.

• Intuitively, the duration of the liabilities should have a close relation with $Ratio(RR^t_{1-\alpha})$ and $Ratio(\sigma)$: the longer the duration, the higher the uncertainty around each economic unit of reserve. The development factors based on the Chain Ladder technique measure the claim settlement speed. Therefore, they can be considered a good indicator of the duration of liabilities. A development factor at year $t$, $\lambda_t$, means that the $t + 1$ cumulative payment is $\lambda_t$ times the cumulative claims settled at $t$. Consequently, high development factors indicate a long duration, while low values reflect a high settlement speed. According to Table 2, OL is the line of business with the highest duration, while PA has the lowest. CA and WC, whose durations are in a similar range, are located at an intermediate point between PA and OL. As can be observed in Table 6, this intuition about the relation between the duration and reserve uncertainty is followed by the Stacked-ANN and the benchmark models.

• In general, the $Ratio(RR^t_{1-\alpha})$ and $Ratio(\sigma)$ by line of business are similar across the different reserving models. The two main exceptions are the risk measures of ODP for OL and Mack for WC. The high values observed in the ODP estimations for OL are due to two companies whose $RR^t_{1-\alpha} / \hat{R}^t_{k,\mu}$ ratios are higher than 60, while in the second case, Mack's model systematically underestimates the variability of the payments, leading to lower values compared with the rest of the models and an inadequate risk assessment according to the results of the Kupiec test. The $Ratio(RR^t_{1-\alpha})$ and $Ratio(\sigma)$ of the Stacked-ANN are in line with the majority of the benchmark models, and no extremely high or low risk measures are observed in Table 6.

As explained in Section 3, the ANNs included within the proposed Stacked-ANN architecture are composed of two hidden layers, each with five neurons. To analyse the impact of the ANN complexity on the predictive power of the Stacked-ANN model, a sensitivity analysis of the number of hidden layers was carried out (Cybenko 1989, Hornik et al. 1989, Hornik 1991 and Leshno et al. 1993, among others, introduced the theoretical framework to analyse the approximation capabilities of neural networks). Thus, Table 7 compares the configuration selected for the Stacked-ANN model in this paper with two alternative configurations: ANNs composed of one and three hidden layers with five neurons each.

Two main conclusions can be drawn from the results obtained. First, the high level of error of the one hidden layer model demonstrates that more complexity is needed in order to properly predict general insurance reserves. The structure proposed in this study for the Stacked-ANN model (two hidden layers) performs significantly better than this first alternative in every single line of business.

Second, the performance of the three hidden layers alternative is similar to that of the suggested architecture.
As no significant differences are observed, the two hidden layer structure is considered more appropriate because the three hidden layer structure adds complexity to the model without a significant improvement in the error rate.

Table 7: Sensitivity analysis of the number of hidden layers

| Hidden layers | Line of Business | %RMSE($R^t$) | %RMSE($P^{t+1}$) | %RMSE($U^t$) |
|---|---|---|---|---|
| 1 | CA | 0.840% | 0.766% | 0.160% |
| 2 | CA | 0.739% | 0.876% | 0.141% |
| 3 | CA | 0.780% | 0.643% | 0.149% |
| 1 | PA | 2.512% | 3.507% | 0.326% |
| 2 | PA | 0.254% | 0.320% | 0.033% |
| 3 | PA | 0.231% | 0.115% | 0.030% |
| 1 | WC | 1.398% | 1.296% | 0.240% |
| 2 | WC | 1.058% | 0.676% | 0.182% |
| 3 | WC | 1.140% | 1.030% | 0.196% |
| 1 | OL | 1.419% | 1.145% | 0.475% |
| 2 | OL | 0.722% | 1.095% | 0.242% |
| 3 | OL | 0.613% | 0.794% | 0.205% |

Source: own elaboration
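To make the three configurations compared in Table 7 concrete, the sketch below builds a feedforward network with a configurable number of hidden layers of five sigmoid neurons and a single linear output neuron, and counts its parameters. The function names, the random initialisation and the input values are illustrative assumptions; the sketch covers only the forward pass, not the training or dropout used in the paper.

```python
import math
import random


def sigmoid(z):
    """Logistic activation used in the hidden layers."""
    return 1.0 / (1.0 + math.exp(-z))


def init_network(n_inputs, n_hidden_layers, n_neurons=5, seed=0):
    """Random weights for n_hidden_layers hidden layers and one output neuron."""
    rng = random.Random(seed)
    sizes = [n_inputs] + [n_neurons] * n_hidden_layers + [1]
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, x):
    """Forward pass: sigmoid in hidden layers, linear activation in the output."""
    activations = list(x)
    for index, (weights, biases) in enumerate(layers):
        z = [
            sum(w * a for w, a in zip(row, activations)) + b
            for row, b in zip(weights, biases)
        ]
        is_output = index == len(layers) - 1
        activations = z if is_output else [sigmoid(value) for value in z]
    return activations[0]


def n_parameters(layers):
    """Total number of weights and biases in the network."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


# Five inputs, mirroring the five first-level predictions fed to the second level.
parameter_counts = [n_parameters(init_network(5, k)) for k in (1, 2, 3)]
```

With five inputs, each additional hidden layer of five neurons adds 30 parameters, which gives a sense of the extra complexity that the three hidden layer alternative carries for a similar error rate.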

This paper introduced a stochastic reserving model based on stacking different machine learning algorithms (RF, GB and ANN) and reserving models (Chain Ladder and CSR). The predictive power and reserve volatility of the proposed approach, named Stacked-ANN, were compared with stochastic reserving models based on the Chain Ladder technique (ODP and Mack's model), an individual ANN and CSR, which is a Bayesian loss reserving model.

Three main conclusions were drawn. First, a comparison of the Stacked-ANN with the individual ANN revealed that the predictions of the reserves $R^t$, next year payments $P^{t+1}$ and ultimate losses $U^t$ made by machine learning algorithms were improved by applying the proposed stacking procedure. The hybrid architecture learns patterns and characteristics from several algorithms and reserving models, resulting in a more flexible and accurate model than an individual ANN, whose inputs for training are limited to the original data.

Second, the empirical results indicated that the Stacked-ANN model is more precise than CSR and the most widely used stochastic reserving models based on the Chain Ladder technique (ODP and Mack's model). In particular, the $R^t$ and $U^t$ predictions made by the Stacked-ANN were more precise than those of ODP and Mack's model in all the lines of business analysed, while the Bayesian model (CSR) was outperformed by the proposed architecture in three out of four lines of business. It is important to remark that in Other Liability (OL), which is a line of business with a longer duration and therefore a portfolio where an accurate reserves estimation is especially relevant, the error of the models based on Chain Ladder or Bayesian statistics was more than four times the error of the Stacked-ANN.
Therefore, it can be concluded that machine or deep learning techniques can be used to improve the performance of the traditional reserving techniques based on Bayesian statistics or the Chain Ladder.

With regard to accuracy, it is worth mentioning that the proposed structure of the ANNs (two hidden layers) within the Stacked-ANN model seems to be the optimal configuration according to the empirical results. On the one hand, the error increased significantly when the number of hidden layers was reduced to one. On the other hand, the results demonstrated that increasing the number of hidden layers does not have a significant impact on the accuracy. Thus, increasing the complexity of the ANNs to three hidden layers would extend the training phase without any significant improvement in the error.

Third, the results of the Kupiec test revealed that the risk estimation made by the Stacked-ANN can be considered appropriate in all lines of business analysed, while the rest of the benchmark models failed the test at least once. In particular, CSR, ODP and Mack's model were unable to produce an appropriate p-value for the Kupiec test in the Workers' Compensation (WC) business, while the individual ANN failed the test in Commercial Auto (CA) and, as with the previous models, in Workers' Compensation. Taking into consideration that the same log-normal approach was used to obtain the reserves variability of the individual ANN and the Stacked-ANN, it must be mentioned that the stacking procedure not only increases the accuracy but also allows for the simulation of more adequate distribution functions.

The aforementioned robustness and predictive power of the Stacked-ANN compared with other reserving models suggest that further investigation should be conducted into the possible application of this model within the actuary in the box approach. The generation of outliers is one of the main problems when using this methodology with Chain Ladder models.
Therefore, the robustness of the Stacked-ANN can be exploited in order to improve the actuary in the box methodology, which is widely used to assess the risk that reserves are insufficient to cover their runoff over a 12-month time horizon.

References

Antonio, K., J. Beirlant, T. Holdemakers, and R. Verlaak (2006). Log-normal mixed Models for Reported Claims Reserves. North American Actuarial Journal 7, 1223–1237.

Antonio, K. and R. Plat (2014). Micro-level stochastic loss reserving in general insurance. Scandinavian Actuarial Journal 2014, 649–669.

Barron, A. (1994). Approximation and estimation bounds for artificial neural networks. Machine Learning, 115–133.

Baudry, M. and C. Robert (2019). A Machine Learning approach for individual claims reserving in insurance. Applied Stochastic Models in Business and Industry, 1–29.

Bishop, C. M. (2006). Pattern Recognition and Machine Learning (Information Science and Statistics). Berlin, Heidelberg: Springer-Verlag.

Bornhuetter, R. L. and R. E. Ferguson (1972). The actuary and IBNR. Proceedings of the Casualty Actuarial Society, 181–195.

Breiman, L. (2001, Oct). Random Forests. Machine Learning 45(1), 5–32.

Brown, T. B., B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Amodei (2020). Language models are few-shot learners.

Castellani, G., U. Fiore, Z. Marino, L. Passalacqua, F. Perla, S. Scognamiglio, and P. Zanetti (2018). An Investigation of Machine Learning Approaches in the Solvency II Valuation Framework. Available at SSRN: https://ssrn.com/abstract=3303296.

Celikoglu, H. (2007, 10). A dynamic network loading process with explicit delay modelling. Transportation Research Part C: Emerging Technologies 15, 279–299.

Charpentier, A. and M. Pigeon (2016). Macro vs Micro Methods in Non-Life Claims Reserving: An Econometric Perspective. Risks 4, 12.

Chui, C., X. Li, and H. Mhaskar (1994). Neural networks for localized approximation. Mathematics of Computation.

Chui, C., X. Li, and H. Mhaskar (1996). Limitations of the approximation capabilities of neural networks with one hidden layer. Advances in Computational Mathematics.

Cybenko, G. (1989). Approximation by Superpositions of a Sigmoidal Function. Mathematics of Control, Signals and Systems 2, 303–314.

Devlin, J., M. Chang, K. Lee, and K. Toutanova (2018). BERT: pre-training of deep bidirectional transformers for language understanding. CoRR abs/1810.04805.

Duma, M., B. Twala, T. Marwala, and F. Nelwamondo (2011). Improving the Performance of the Support Vector Machine in Insurance Risk Classification: A Comparative Study. Proceedings of the International Conference on Neural Computation Theory and Applications (NCTA-2011), 340–346.

England, P. D. (2002). Addendum to analytic and bootstrap estimates of prediction errors in claims reserving. Insurance: Mathematics and Economics 31.

England, P. D. and R. J. Verrall (1999). Analytic and bootstrap estimates of prediction errors in claims reserving. Insurance: Mathematics and Economics 25(3), 281–293.

England, P. D. and R. J. Verrall (2002, 01). Stochastic Claims Reserving in General Insurance. Br. Actuar. J. 8, 443–544.

England, P. D. and R. J. Verrall (2006). Predictive Distributions of Outstanding Liabilities in General Insurance. Annals of Actuarial Science, 221–270.

Friedman, J. H. (2000). Greedy Function Approximation: A Gradient Boosting Machine. Annals of Statistics 29, 1189–1232.

Funahashi, K. (1989). On the approximate realization of continuous mappings by neural networks. Neural Networks 2.

Gabrielli, A., R. Richman, and M. Wüthrich (2018, 11). Neural Network Embedding of the Over-Dispersed Poisson Reserving Model. Available at SSRN, https://ssrn.com/abstract=3288454.

Gabrielli, A. and M. Wüthrich (2018). An Individual Claims History Simulation Machine. Risks 6, 29.

Geurts, P., D. Ernst, and L. Wehenkel (2006). Extremely Randomized Trees. Machine Learning 63, 3–42.

Halliwell, L. (2009). Modeling Paid and Incurred Losses Together. CAS E-Forum (Spring), 1–40.

Happ, S., M. Merz, and M. Wüthrich (2012). Claims development result in the paid-incurred chain reserving method. Insurance: Mathematics and Economics 51, 66–72.

Happ, S. and M. Wüthrich (2013). Paid-incurred chain reserving method with dependence modeling. ASTIN Bulletin 43, 1–20.

Hastie, T. and R. Tibshirani (1986, 08). Generalized Additive Models. Statist. Sci. 1(3), 297–310.

Hastie, T., R. Tibshirani, and J. Friedman (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition. Springer Series in Statistics. Springer New York.

Hornik, K. (1991). Approximation capabilities of multilayer feedforward networks. Neural Networks 4, 251–257.

Hornik, K. (1993). Some new results on neural network approximation. Neural Networks 6, 1069–1072.

Hornik, K., M. Stinchcombe, and H. White (1989). Multilayer feedforward networks are universal approximators. Neural Networks 2(5), 359–366.

Jessen, A., T. Mikosch, and G. Samorodnitsky (2011). Prediction of outstanding payments in a Poisson cluster model. Scandinavian Actuarial Journal 2011(3), 214–237.

Kingma, D. P. and J. Ba (2014). Adam: A Method for Stochastic Optimization. CoRR.

Kremer, E. (1982, 01). IBNR claims and the two-way model of ANOVA. Scandinavian Actuarial Journal 1982.

Kuo, K. (2018). DeepTriangle: A Deep Learning Approach to Loss Reserving. CoRR abs/1804.09253.

Kupiec, P. H. (1995, January). Techniques for verifying the accuracy of risk measurement models. The Journal of Derivatives 3(2), 73–84.

Leshno, M., V. Lin, A. Pinkus, and S. Schocken (1993). Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Networks 6, 861–867.

Lopez, O., X. Milhaud, and P. Thérond (2019). A Tree-Based algorithm adapted to microlevel reserving and long development claims. ASTIN Bulletin.

Mack, T. (1991). A Simple Parametric Model for Rating Automobile Insurance or Estimating IBNR Claims Reserves. ASTIN Bulletin 21(1), 93–109.

Mack, T. (1993). Distribution-free Calculation of the Standard Error of Chain Ladder Reserve Estimates. ASTIN Bulletin: The Journal of the International Actuarial Association 23(02), 213–225.

Margraf, C., V. Elpidorou, and R. Verrall (2018). Claims reserving in the presence of excess-of-loss reinsurance using micro models based on aggregate data. Insurance: Mathematics and Economics 80, 54–65.

Martínez-Miranda, M., B. Nielsen, and R. Verrall (2012). Double Chain Ladder. ASTIN Bulletin 42(1), 59–76.

Martínez-Miranda, M., B. Nielsen, and R. Verrall (2013a). Continuous Chain Ladder: Reformulating and generalizing a classical insurance problem. Expert Systems with Applications 40, 5588–5603.

Martínez-Miranda, M., B. Nielsen, and R. Verrall (2013b). Double Chain Ladder and Bornhuetter-Ferguson. North American Actuarial Journal 17, 101–113.

Martínez-Miranda, M., B. Nielsen, R. Verrall, and Wüthrich (2015). The Link Between Classical Reserving and Granular Reserving Through Double Chain Ladder and its Extensions. Scandinavian Actuarial Journal 2015, 383–405.

McCullagh, P. and J. Nelder (1989). Generalized Linear Models, Second Edition. Chapman and Hall/CRC Monographs on Statistics and Applied Probability Series. Chapman & Hall.

McCulloch, W. and W. Pitts (1943). A Logical Calculus of Ideas Immanent in Nervous Activity. Bulletin of Mathematical Biophysics 5, 127–147.

Merz, M. and M. V. Wüthrich (2010). Paid-incurred chain claims reserving method. Insurance: Mathematics and Economics 46, 568–579.

Meyers, G. (2015). Stochastic Loss Reserving Using Bayesian MCMC Models. CAS Monograph Series, number 1. Casualty Actuarial Society.

Nakama, T. (2011). Comparisons of Single and Multiple Hidden Layer Neural Networks. pp. 270–279.

Nigri, A., S. Levantesi, S. Marino, M. Scognamiglio, and F. Perla (2019). A deep learning integrated Lee-Carter model. Risks 7, 33.

Omari, C., S. Nyambura, and J. Wairimu (2018). Modeling the Frequency and Severity of Auto Insurance Claims Using Statistical Distributions. Journal of Mathematical Finance 8, 137–160.

Pigeon, M., K. Antonio, and M. Denuit (2013). Individual loss reserving with the Multivariate Skew Normal framework. ASTIN Bulletin 43, 399–428.

Pigeon, M., K. Antonio, and M. Denuit (2014). Individual loss reserving using paid-incurred data. Insurance: Mathematics and Economics 58, 121–131.

Poggio, T. and F. Girosi (1990). Networks for approximation and learning. Proceedings of the IEEE 78, 1481–1497.

Posthuma, B., E. Cator, W. Veerkamp, and E. Van Zwet (2008). Combined Analysis of Paid and Incurred Losses. CAS E-Forum (Fall), 272–293.

Quarg, G. and T. Mack (2004). Munich Chain Ladder. Blätter der Deutschen Gesellschaft für Versicherungs- und Finanzmathematik XXVI, 597–630.

Ramos-Pérez, E., P. Alonso-González, and J. Núñez-Velázquez (2019). Forecasting volatility with a stacked model based on a hybridized Artificial Neural Network. Expert Systems with Applications 129, 1–9.

Rehman, Z. and S. Klugman (2009). Quantifying Uncertainty in Reserve Estimates. Variance Journal 4, 30–46.

Renshaw, A. E. and R. J. Verrall (1998). A Stochastic Model Underlying the Chain-Ladder Technique. British Actuarial Journal.

Richman, R. and M. Wüthrich (2018). A Neural Network Extension of the Lee-Carter Model to Multiple Populations. SSRN. Available at SSRN, https://ssrn.com/abstract=3270877.

Sheela, D. and S. Deepa (2013). Review on Methods to Fix Number of Hidden Neurons in Neural Networks. Mathematical Problems in Engineering, 1–11.

Silver, D., A. Huang, C. Maddison, A. Guez, L. Sifre, G. Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis (2016, 01). Mastering the game of go with deep neural networks and tree search. Nature 529, 484–489.

Silver, D., T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. P. Lillicrap, K. Simonyan, and D. Hassabis (2017). Mastering chess and shogi by self-play with a general reinforcement learning algorithm. CoRR abs/1712.01815.

Taylor, G., G. McGuire, and J. Sullivan (2008). Individual Claim Loss Reserving Conditioned by Case Estimates. Annals of Actuarial Science 3(1-2), 215–256.

Vaswani, A., N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin (2017). Attention is all you need. CoRR abs/1706.03762.

Venter, G. (2008). Distribution and Value of Reserves Using Paid and Incurred Triangles. CAS E-Forum (Fall), 348–375.

Verrall, R. J. (2000). An investigation into stochastic claims reserving models and the chain-ladder technique. Insurance: Mathematics and Economics 26(1), 91–99.

Weke, P. and C. Ratemo (2013). Estimating IBNR Claims Reserves for General Insurance Using Archimedean Copulas. Applied Mathematical Sciences 7, 1223–1237.

Wüthrich, M. (2018a). Machine learning in individual claims reserving. Scandinavian Actuarial Journal 2018, 465–480.

Wüthrich, M. (2018b). Neural networks applied to chain-ladder reserving.