A random forest based approach for predicting spreads in the primary catastrophe bond market
Despoina Makariou, Pauline Barrieu, Yining Chen
London School of Economics and Political Science, Statistics Department
January 29, 2020
Abstract:
We introduce a random forest approach to enable spread prediction in the primary catastrophe bond market. We investigate whether all information provided to investors in the offering circular prior to a new issuance is equally important in predicting its spread. The whole population of non-life catastrophe bonds issued from December 2009 to May 2018 is used. The random forest shows an impressive predictive power on unseen primary catastrophe bond data, explaining around 93% of the total variability. For comparison, linear regression, our benchmark model, has inferior predictive performance, explaining a considerably smaller share of the total variability. All details provided in the offering circular are predictive of spread, although to varying degrees. The stability of the results is studied. The usage of random forest can speed up investment decisions in the catastrophe bond industry.
Key-words: machine learning in insurance, non-life catastrophe risks, catastrophe bond pricing, primary market spread prediction, random forest, minimal depth importance, permutation importance.
1. Introduction
Catastrophe bonds are Insurance-Linked Securities (ILS), first developed in the 1990s in an effort to provide additional capacity to the reinsurance industry post mega-disasters. The pricing of these instruments is particularly challenging as most of these securities are traded over the counter. Over the last years, there have been several empirical papers trying to address this difficulty by studying the price of catastrophe bonds using real-market data, see Lane (2000), Lane & Mahul (2008), Lei et al. (2008), Bodoff & Gan (2009), Gatumel & Guegan (2008), Dieckmann (2010), Jaeger et al. (2010), Papachristou (2011), Galeotti et al. (2013), Braun (2016), and Götze & Gürtler (2018). The main orientation of these works was to explain catastrophe bond price via the identification of variables having a theoretically material and statistically significant link with it. This was vastly achieved through the use of explanatory statistical models. Certainly, the aforementioned works have shed some light on the drivers of catastrophe bond prices. However, there are certain limitations, namely selection bias, interactive predictors, fragmented models, non-linearities, and a non-predictive study goal.

Starting from selection bias, the data samples used previously often excluded bonds of certain characteristics, unusual issuances were eliminated as outliers, and observations with missing entries were excluded from data sets, see Bodoff & Gan (2009), Götze & Gürtler (2018), Galeotti et al. (2013), Braun (2016) and Lane & Mahul (2008). Besides significant loss of information, such study design strategies do not capture that each issuance, in aiming to meet a special risk transfer need, is representative of the whole catastrophe bond population. In Papachristou (2011), concerns about interactive independent variables were expressed but they were not studied. Furthermore, we often see fragmented models accounting only for a given peril-territory combination or risk profile, see for instance Bodoff & Gan (2009), Papachristou (2011) and Lei et al. (2008). From a methodological perspective, this is an obstacle in studying the market as a whole, especially as new issuances with more rare perils and innovative design arise. From a business perspective, a single model accounting for all transactions is more convenient for model validation ease and flexibility. Another limitation is the extensive use of linear regression without justification of its suitability in a catastrophe bond market setting. This was recognised in some cases, see Lane & Mahul (2008) and Papachristou (2011), whilst Major (2019) presented industry-based examples of why a simple linear regression model is not appropriate for the catastrophe bond market. Finally, in terms of study goal, past works did not aim at spread prediction, although there is a business need for it, see Major (2019).

As an attempt to overcome these issues, we suggest a supervised machine learning method called random forest (Breiman 2001). Random forest provides highly accurate predictions without over-fitting, see Breiman (2001), Díaz-Uriarte & De Andres (2006), Oh et al. (2003) among others. Most importantly, it is a flexible method in the sense that it makes no assumptions about the underlying data generative process, tackling the issue of non-linearities and model fragmentation.
Moreover, because the building blocks of the method are regression trees, random forest is reasonably robust to outliers, and since variables are considered in a sequential manner it captures interactions between variables without the need to specify them (Breiman et al. 1984). Additional advantages of the method are that internal measures of variables' importance can be derived, and selection of the most important variables is feasible. Finally, the need for data pre-processing is minimal, as many steps are integrated in the method itself, ensuring time efficiency from a business perspective.

In this paper, we apply the random forest method to predict spreads in the full spectrum of the primary non-life catastrophe bond market. Our main goal is to generate accurate spread predictions for new catastrophe bond observations. The chosen benchmark model is linear regression as it is the predominant model used in the relevant literature. An additional target is to find out whether the details provided to investors in the offering stage of a catastrophe bond issuance are all equally important in predicting its spread. From an empirical viewpoint, we aim at prediction accuracy and variables' importance results being stable. Given the prediction orientation of our research, we see whether the patterns captured in our data set can provide material for new explanatory-driven studies in the future. Finally, the potential of the introduced machine learning method in facilitating investors' activity in the catastrophe bond market is also of interest.

We contribute to the literature in the following ways. First, we use a diverse catastrophe bond data set which, to the best of our knowledge, includes the largest number of data points ever collected before in the primary market setting. Secondly, our study has a purely predictive direction. Thirdly, we use a single algorithmic method to study the market as a whole, which differs from the proposition of fragmented models dominating the existing literature. Finally, we incorporate two variables, i.e. vendor and coverage, that have not been seen in earlier works but appear in practice as part of the offering circular provided to investors prior to a new catastrophe bond issuance.

The rest of the paper is organised as follows. In Section 2, we build on machine learning concepts, in Section 3 we explain our research methodology and in Section 4 we present details about our catastrophe bond data set. Then, the random forest generation based on this catastrophe bond data set is demonstrated in Section 5, whilst the performance of the random forest is evaluated in Section 6. The importance analysis of catastrophe bond spread predictors is found next in Section 7. Furthermore, in Section 8, we provide an example of how the random forest could be used in practice to assist investors' decision making when they examine a new catastrophe bond issuance. Concluding remarks follow in Section 9.
2. Machine learning preliminaries
In this section, we introduce some machine learning concepts that will be useful for the comprehension of methods used later on in our study. The explanations to be given are limited to regression because catastrophe bond spread is a quantitative response variable.

2.1. Supervised learning
Machine learning includes a set of approaches dealing with the problem of finding, or otherwise learning, a function from data (James et al. 2013). Supervised learning is a machine learning task where a function, otherwise called a hypothesis, is learned from a data set, often referred to as the training set. The latter consists of a number of input-output pairs where for every single input in the training set the correct output is known. An algorithm goes through all data points in the training set, identifying patterns and finding how to map an input to an output. Because the desired answer for the output is known, the algorithm modifies this mapping based on how the algorithm-generated outputs compare to the original ones in the training set (Friedman et al. 2001). Ultimately, the aim is that by the time the learning process finishes, this difference will be small enough for the algorithm to be able to map any set of new inputs it will come across in the future.

2.2. Ensemble learning
Sometimes, instead of learning one mapping, it is useful to have a collection of mappings which merge their predictions to create an ensemble (Russell & Norvig 2016). Individual approximation functions in the ensemble are usually called base learners, and prediction combination can happen in various ways, with the most usual ones being voting or averaging. Such techniques have been investigated quite early on, see for example Breiman (1996c), Clemen (1989), Perrone (1993) and Wolpert (1992). The main benefit of ensembles is that if each single hypothesis is characterised by a high degree of accuracy and diversity, then the ensemble is going to produce more accurate predictions than any of the individual hypotheses on its own, see Zhou (2012). Here, accuracy means that a hypothesis results in a lower error rate than one derived from random guessing on new input values, while diversity means that each hypothesis in the ensemble makes different errors on new data points (Dietterich 2000a). Ensembles are usually built by utilising methods to derive various data sets out of the original data set for each base learner. One of the most famous methods to construct an ensemble is briefly discussed below.

2.3. Bagging
Bagging, an acronym for bootstrap aggregating presented by Breiman (1996a), is a powerful ensemble learning method. As the name indicates, the ensemble uses the bootstrap, see Efron (1992), as a resampling technique to take multiple data samples from which multiple base learners will then be generated. At the same time, aggregation, which is simple averaging for regression, is the way to combine the predictions of these individual base learners. There are various merits in using bagging for building ensembles. First, using a bootstrap sample to build each base learner means that a part of the original data (normally about one third) is not used in its construction. Then, these unseen data points can constitute an unbiased test data set to quantify how well each base learner generalises (Breiman 2001). Secondly, the method is useful when data is noisy (Opitz & Maclin 1999).
Thirdly, and probably the most important advantage, by aggregating base learners which individually suffer from high variance, take decision trees for instance (Breiman et al. 1984), the ensemble as a whole achieves a variance reduction, see Breiman (1996a), Bauer & Kohavi (1999), Breiman (1996c), Breiman (1996b) and Dietterich (2000b). A pitfall of the method, though, is that whilst bagging reduces the ensemble variance, there are diminishing returns in variance reductions. This is because all bootstrap samples are drawn from the same original data set, meaning that base learners will inevitably be correlated. This latter point is where the idea of random forest is based on, and it will be further discussed in Section 3.
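To make the mechanics concrete, the following is a minimal sketch of bagging for regression in R, the statistical software used later in the paper. The data frame `dat` with numeric response `y` is hypothetical, and `rpart` stands in for the CART base learner; this is an illustration of the idea, not the paper's implementation.

```r
# Minimal sketch of bagging for regression: bootstrap K samples, grow one
# regression tree on each, and aggregate by averaging the tree predictions.
library(rpart)

bagging_fit <- function(dat, K = 100) {
  lapply(seq_len(K), function(k) {
    idx <- sample(nrow(dat), replace = TRUE)           # bootstrap sample
    rpart(y ~ ., data = dat[idx, ], method = "anova")  # one regression tree
  })
}

bagging_predict <- function(trees, newdata) {
  # aggregation for regression: simple averaging of the tree predictions
  rowMeans(sapply(trees, predict, newdata = newdata))
}
```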
3. Research methodology
Having provided necessary background information about certain machine learning concepts, the purpose of this section is twofold. We start by stating our catastrophe bond spread prediction problem, introducing notation that will be used later in our study. We then continue by presenting our research methodology.

3.1. Problem statement with notations
Broadly, we use an ensemble algorithmic method to perform a supervised learning task for the primary catastrophe bond market. For now, let $\mathbf{x}$ generally denote the input, which reflects characteristics of catastrophe bonds available in the offering circular at the time of issuance (our convention is that bold lowercase letters reflect random vectors). At the same time, let $y$ denote catastrophe bond spreads at the time of issuance. A function $f$ of the form $y = f(\mathbf{x})$ relates catastrophe bond characteristics to their spreads; however, $f$ is unknown. Based on past primary catastrophe bond data including information both for $\mathbf{x} = (x^1, x^2, \ldots, x^P)$, where $p = 1, 2, \ldots, P$, and $y$, we first want to find a function that approximates $f$ so that we can predict spreads given new catastrophe bond input.

In particular, experience about past catastrophe bond issuances is captured by collecting $n = 1, \ldots, N$ distinct input-output pairs. The input is a vector of predictors, also called features, covariates or independent variables, $\mathbf{x}_n = (x^1_n, x^2_n, \ldots, x^P_n)$, indexed by dimension $p = 1, 2, \ldots, P$, and it is a component of $\mathbb{R}^P$. The output, also called response or dependent variable, is a real-valued scalar denoted by $y_n$, indexed by example number $n = 1, \ldots, N$. By assembling these $N$ pairs, we collectively form a catastrophe bond data set $D = \{(\mathbf{x}_n, y_n),\ n = 1, 2, \ldots, N\}$, based on which the ensemble algorithmic method will search the space $H$ of all feasible functions, in a process called learning, and find a function, denoted by $h_{en}$, that is able to predict the response $y'$ given a new input $\mathbf{x}'$ as accurately as possible. Because we use an ensemble method, $h_{en}$ is in reality a collection of functions approximating $f$. We are also interested in assessing the importance of each $x^p$ in predicting the spread. Finally, all results will be evaluated on the grounds of them being stable, subject to random subsampling of the whole data set.

3.2. Random forest
The ensemble method that we use is called random forest. It was developed by Breiman (2001) and here is used to solve prediction problems. As James et al. (2013) mention, the underlying logic of random forest is "divide and conquer": split the predictor space into multiple samples, then construct a randomised tree hypothesis on each subspace and end with averaging these hypotheses together. Generally, random forest can be seen as a successor of bagging when the base learners are decision trees. This is because random forest addresses the main pitfall of bagging: the issue of diminishing variance reductions discussed earlier in Section 2.3. This is achieved by injecting an additional element of randomness during decision tree construction, so that the trees are less correlated to one another. At the same time, since the base learners are decision trees, there are not many assumptions about the form of the target function, resulting in low bias. The process of constructing a random forest involves various steps which are summarised in Figure 1 and discussed straight after.
Figure 1: Random forest construction scheme. For each regression tree, light grey circles indicate the root node, dark grey circles intermediate nodes and white circles terminal nodes.

The first step in the random forest generation process is bootstrap sampling. In particular, from a data set like $D$, we take $k = 1, \ldots, K$ samples with replacement, each of them having the same size as the original data set. The second stage is regression tree development. From the $K$ bootstrap samples, $K$ regression trees are grown using recursive partitioning as done in Classification And Regression Trees (CART) (Breiman et al. 1984), but with a smart twist which further randomises the procedure. At each level of the recursive partitioning process, the best predictor to conduct the splitting is considered based on a fresh, each time, random sub-sample of the full set of predictors, whose size is denoted as $m_{try}$. The best split is chosen by examining all possible predictors in this sub-sample and all possible cut-points as of their ability to minimise the residual sum of squares for the resulting tree. A tree stops growing when a minimum number of observations in a given node is reached, but generally speaking the trees comprising the random forest are fully grown and not pruned. By constructing these $K$ trees we effectively get $K$ estimators of the function $f$, namely $h_1, h_2, \ldots, h_K$. The average of these individual estimators, $h_{en}(\mathbf{x}) = \frac{1}{K}\sum_{k=1}^{K} h_k(\mathbf{x})$, is the random forest.

From the above description, it is evident that there are three parameters whose values need to be fixed prior to random forest development, namely the number of trees grown, the node size, and the number of variables randomly selected at each split. Each of them respectively controls the size of the forest, the individual tree size and an aspect of the within-tree randomness. There are certain default values that have been suggested following empirical experiments on various data sets, but one can use an optimising tuning strategy with respect to prediction performance to select the most suitable values specifically for the data set under study (Probst et al. 2018).

After the random forest is built, it can be used to provide predictions of the response variable. To make predictions though, it is necessary to feed the method inputs that have never been seen before during the construction process. As we have briefly mentioned in Section 2.3, due to bootstrap sampling, we can refrain from keeping aside in advance a portion of the original data set for testing purposes. This is because each tree uses more or less two thirds of the observations, from now on called in-bag observations, whilst the remaining one third of the observations are never used to build that specific tree, from now on called out-of-bag (OOB) observations. For each tree, the out-of-bag observations act as a separate test set. To predict the response variable value for the $n$th observation, one should drop its corresponding input down every single tree in which this observation was out of bag. This means that by doing so one will end up having in hand on average $K/3$ predictions for any $n = 1, \ldots, N$ observation. Then, in order to derive a single response prediction for the $n$th observation, the average of these predictions is taken. The same procedure is repeated for all other observations. Whether these predictions are good enough or not needs to be evaluated based on certain metrics, as shown next.
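As a rough illustration of the construction just described, the sketch below grows a forest with the randomForest package of Liaw & Wiener (2002), which the paper cites for its computations. The data frame `cb_data` is a placeholder, and the hyperparameter values shown are those eventually selected in Section 5; this is not the authors' exact code.

```r
# Sketch of random forest construction, assuming a hypothetical data frame
# `cb_data` whose columns are the predictors plus the response `spread`.
library(randomForest)

set.seed(1)
rf <- randomForest(spread ~ ., data = cb_data,
                   ntree      = 700,  # K, the number of trees
                   mtry       = 3,    # predictors sampled at each split
                   nodesize   = 5,    # minimum terminal node size
                   importance = TRUE)

# Calling predict() without new data returns the OOB prediction for each
# observation, i.e. the average over the trees where it was out of bag.
oob_pred <- predict(rf)
```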
3.3. Performance evaluation criteria for random forest
To assess the performance of any machine learning algorithm, one needs to set in advance the criterion upon which judgement will be made. In this paper, we employ two criteria for the performance evaluation of our random forest: prediction accuracy and stability. They are discussed in the two following subsections.

3.3.1. Prediction accuracy
Evaluating our random forest performance based on its prediction accuracy is valuable for the purposes of our study. This is because our goal is to predict catastrophe bond spreads as accurately as possible given new catastrophe bond observations. In general, prediction accuracy is one of the most used performance indicators for machine learning algorithms aiming at prediction. This is no different for the random forest algorithm as originally presented in Breiman (2001).

In this paper, prediction accuracy is measured by means of the proportion of the total variability explained by the random forest, here denoted as $R^2_{OOB}$. Following Grömping (2009), the latter metric is defined as $R^2_{OOB} = 1 - \frac{MSE_{OOB}}{TSS}$, where $MSE_{OOB}$ stands for the total out-of-bag mean squared error and $TSS$ for the total sum of squares. With respect to $MSE_{OOB}$, it shows the variability in the response variable that is not forecasted by the random forest. It is calculated as $MSE_{OOB} = \frac{1}{N}\sum_{n=1}^{N} (y_n - \hat{y}_{n,OOB})^2$, where $\hat{y}_{n,OOB}$ is the mean prediction for the $n$th observation, $n = 1, \ldots, N$, over all trees for which the $n$th data point was out of bag. In effect, $MSE_{OOB}$ is a sound approximation of the test error for the random forest because every single data point is predicted based solely on the trees that were not constructed using this observation. Actually, when the number of trees $K$ is very large, the $MSE_{OOB}$ is roughly equivalent to leave-one-out cross validation (James et al. 2013). Surpassing the need to keep a separate test set is very practical in a catastrophe bond context. This is because one can use all available catastrophe bond data, which is scarce, towards the construction of the random forest to create a stronger prediction method. With regards to $TSS$, as in linear regression, it reflects the degree to which the response variable, here the catastrophe bond spread, deviates from its mean value. It is defined as $TSS = \sum_{n=1}^{N} (y_n - \bar{y})^2$, where $y_n$ is the response variable value for the $n$th observation, $n = 1, \ldots, N$, and $\bar{y}$ the mean value of the response variable. In this study, $R^2_{OOB}$ is going to be expressed in percentage terms. The higher the $R^2_{OOB}$, the better the prediction accuracy of the random forest.

3.3.2. Stability
The term stability here refers to how repeatable random forest results are when different samples taken from the same data generative process are used for its construction, see Turney (1995) and Philipp et al. (2018) for the rationale behind this approach. The reason for investigating stability is that consistent results are deemed more reliable, see Stodden (2015), Turney (1995), Yu (2013) and Philipp et al. (2018) for a discussion. Moreover, Turney (1995) had observed that in industry applications, prediction accuracy alone is not enough to gain the trust of practitioners; inconsistent results can create confusion even if the prediction accuracy achieved is high.

Various ways to measure the stability of algorithmic results have been presented in Turney (1995), Lange et al. (2004), Ntoutsi et al. (2008), Lim & Yu (2016) and Philipp et al. (2018). In this study, we are inspired by the works of Turney (1995) and Philipp et al. (2018) with regards to stability and its empirical measurement. In particular, the idea is that, by obtaining two sets of data from the same phenomenon sampled from the same underlying distribution, the algorithm needs to produce fairly similar results from both data sets for it to be considered stable. One way to achieve this is to randomly partition the whole data set into two separate data sets multiple times. An important decision though is how to take the samples. Here, we propose taking the samples using the split-half technique as described in Philipp et al. (2018), meaning that the whole catastrophe bond data set will be split into two disjoint data sets of roughly equal size. This sampling method ensures that a similarity between the results is not attributed to the same observations being in both samples, as this could produce similar results without the algorithm actually being stable. By choosing a small learning overlap, it is possible to examine the degree of a result's generalisation for independent draws from the catastrophe bond data generative process. The procedure will be repeated 100 times.
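A minimal sketch of this split-half stability check, under the same hypothetical `cb_data` as before; `rf$rsq` in the randomForest package is the OOB-based proportion of variance explained, which stands in for $R^2_{OOB}$.

```r
# One stability iteration: split the data into two disjoint halves, grow a
# forest on each half, and record the explained variability of both.
library(randomForest)

stability_iteration <- function(dat) {
  idx <- sample(nrow(dat), floor(nrow(dat) / 2))
  sapply(list(A = dat[idx, ], B = dat[-idx, ]), function(half) {
    rf <- randomForest(spread ~ ., data = half, ntree = 700)
    100 * tail(rf$rsq, 1)  # OOB-based % of variability explained
  })
}

# Repeat 100 times and summarise the disagreement between the two halves.
r2_pairs <- t(replicate(100, stability_iteration(cb_data)))
mean(abs(r2_pairs[, "A"] - r2_pairs[, "B"]))
```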
3.4. Evaluation of predictors' importance
The random forest algorithm allows for assessing how important each predictor is with respect to its ability to predict the response, a concept that is briefly called variables importance. Its assessment is executed empirically (Grömping 2009), and in Chen & Ishwaran (2012) one can find a comprehensive review of various methods that can be used to achieve this. Here, the focus lies on two widely used approaches, namely permutation importance and minimal depth importance.

3.4.1. Permutation importance
The central idea of permutation importance, also known as "Breiman-Cutler importance" (Breiman 2001), is to measure the decrease in the prediction accuracy of the random forest resulting from randomly permuting the values of a predictor. The method provides a ranking of predictors' importance as its end result and it is tied to a prediction performance measure. In particular, the permutation importance for predictor $x^p$ is derived as follows. For each of the $K$ trees: first, record the prediction error $MSE_{OOB_k}$; secondly, noise up, i.e. permute, the predictor $x^p$ in the out-of-bag sample for the $k$th tree; thirdly, drop this permuted out-of-bag sample down the $k$th tree to get a new $MSE^{x^p_{perm}}_{OOB_k}$ after the permutation and calculate the difference between these two prediction errors (before and after the permutation). In the end, average these differences over all trees. The mathematical expression of the above description is $I_{x^p} = \frac{1}{K}\sum_{k=1}^{K} \left( MSE^{x^p_{perm}}_{OOB_k} - MSE_{OOB_k} \right)$, where $I_{x^p}$ is the importance of variable $x^p$, $K$ the number of trees in the forest, $MSE^{x^p_{perm}}_{OOB_k}$ the estimation error with predictor $x^p$ permuted for the $k$th tree, and $MSE_{OOB_k}$ the forecasting error with none of the predictors permuted for the $k$th tree. The larger the $I_{x^p}$, the stronger the ability of $x^p$ to predict the response. Generally speaking, a positive permutation importance is associated with a decrease in prediction accuracy after permutation, whilst a negative permutation importance is interpreted as no decline in accuracy.

3.4.2. Importance based on minimal depth
The other approach for measuring predictors' importance is based on a measure named minimal depth, presented in Ishwaran et al. (2010), with the latter being motivated by earlier works of Strobl et al. (2007) and Ishwaran (2007). The minimal depth shows how remote a node split with a specific predictor is with respect to the root node of a tree. Thus, here the position of a predictor in the $k$th tree determines its importance for this tree. The latter means that, unlike permutation importance, the importance of each predictor is not tied to a prediction performance measure. Also, in addition to ranking variables, the method performs variable selection, a very useful feature for the elimination of less important predictors.

Specifically, Ishwaran et al. (2010) have formulated the concept of minimal depth based on the notion of the maximal sub-tree for feature $x^p$. The latter is defined as the largest sub-tree whose root node is split using $x^p$. In particular, the minimal depth of a predictor $x^p$, a non-negative random variable, is the distance between the $k$th tree root node and the most proximate maximal sub-tree for $x^p$, i.e. the first order statistic of the maximal sub-tree. It takes values in $\{0, 1, \ldots, Q(k)\}$, where $Q(k)$ is the depth of the $k$th tree, reflecting how distant the root is from the furthermost leaf node, i.e. the maximal depth (Ishwaran et al. 2011). A small minimal depth value for predictor $x^p$ means that $x^p$ has high predictive power, whilst a large minimal depth value means the opposite. That said, the root node is assigned minimal depth 0 and the successive nodes are sequenced based on how close they are to the root. The minimal depth for each predictor is averaged over all trees in the forest. Ishwaran et al. (2010) showed that the distribution of the minimal depth can be derived in closed form and a threshold for picking meaningful variables can be computed, i.e. the mean of the minimal depth distribution. In particular, variables whose forest-aggregated minimal depth surpasses the mean minimal depth ceiling are considered irrelevant and thus could be excluded from the model.

3.4.3. Other evaluation factors
After calculating the importance of predictors using the methods described above, we consider it useful to examine the results based on two additional criteria. First, we want to ensure that the importance rankings and selected variables results are repeatable. Because both permutation and minimal depth importance are linked to the random forest constructed, the stability of predictors' importance results will be evaluated in line with the random forest stability evaluation. Secondly, we will check whether the predictors' importance results reflect investors' knowledge from an empirical perspective.
In a business context, it would be uncomfortable for an investor to see good catastrophe bond predictions but importance rankings of the predictors outside their empirical knowledge, even though from a statistical viewpoint this type of agreement is not necessary.
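Both importance measures are available off the shelf in the packages the paper cites in Section 5. A sketch under the same hypothetical `cb_data`: permutation importance comes from randomForest (Liaw & Wiener 2002), and minimal depth with its mean minimal depth selection threshold from randomForestSRC (Ishwaran & Kogalur 2019).

```r
library(randomForest)
library(randomForestSRC)

# Permutation importance: type = 1 reports the mean % increase in MSE when
# each predictor is permuted in the OOB data (scaled by its standard error).
rf <- randomForest(spread ~ ., data = cb_data, ntree = 700, importance = TRUE)
importance(rf, type = 1)

# Minimal depth: var.select() with method = "md" returns the forest-averaged
# minimal depth per predictor and applies the mean minimal depth threshold.
rf_src <- rfsrc(spread ~ ., data = cb_data, ntree = 700)
md <- var.select(object = rf_src, method = "md")
md$varselect
```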
4. Catastrophe bond data
In this section, we present how the catastrophe bond data used in this study have been collected and processed, whilst details are given with respect to the choice of variables and their role in our study.

4.1. Collection
The core of the catastrophe bond pricing cross-sectional data has been collected from a leading market participant. The websites of ARTEMIS, Lane Financial and Swiss Re Sigma Research have also been extensively used to cross-validate data entries that were unclear or non-available in the main data body. To the best of our knowledge, the data collected refer to all non-life catastrophe bonds issued in the primary market from December 2009 to May 2018, a total of 934 transactions. The information gathered was related to investors' return, the loss potential of the securitized risk, i.e. expected loss and attachment probability, and various design characteristics of the risk transfer, i.e. issuance size, coverage period, coverage type, trigger, region, peril, credit score and risk modelling company.

4.2. Preparation
Consolidating data from various sources was not a straightforward task. In particular, there were pieces of information referring to the same concept but measured in different units across different data providers. For example, some sources expressed expected loss as a percentage of issuance size whilst others in terms of basis points. Since those measured in percentage terms were the majority, the appropriate transformation was made to change the unit from basis points into percentage terms to maintain consistency within the same data column. With regards to the spread at issuance, it was derived from the coupon by subtracting the element of the money market rate.

The validation of data across various sources has been a time consuming task, but this was the only way to ensure that there would be no missing values in the study, a pitfall in many previous works. On this note, it needs to be acknowledged that an exception to the above non-missing values claim is a few private placement deals where the risk modelling firm was not publicly announced. There, a separate category level was created to capture this specific reason for missingness, i.e. private placement. Including this level is considered important in that the developed algorithmic method will be able to predict spreads for these circumstances also. Further information on this category level can be found in Appendix A.

4.3. Discussion about the choice of variables
The variables included in the data set can be seen in Table 1, presented along with their definition, type, and their role in this study. In Appendix A, one can find basic statistical information and histograms for all variables, along with a discussion to enhance the understanding of catastrophe bond data intricacies. With regards to the role of each variable in our research, the spread was chosen as the dependent variable as it is an industry-wide accepted lens through which one can see catastrophe bond pricing. The spread is of utmost interest to the investors as it indicates how much they could earn on top of the risk-free rate if they decided to employ their capital in this alternative risk transfer segment.

Variable | Description | Type | Role
spread | The amount of interest earned on top of the risk-free rate. | continuous | response
AP | The probability of incurred losses surpassing the attachment point. | continuous | predictor
EL | The annual expected loss within the layer in question divided by the layer size. | continuous | predictor
size | Catastrophe bond nominal amount. | continuous | predictor
term | Years passed from issuance to maturity date. | continuous | predictor
coverage | Contract term indicating whether protection is offered for a string of loss events or a single loss event. | categorical | predictor
diversifier | A peril-region combination. | categorical | predictor
rating status | A binary variable showing whether an impartial view of credit quality has been allocated to a catastrophe bond. | categorical | predictor
trigger | Mechanism through which a loss payment is activated. | categorical | predictor
vendor | Catastrophe risk modelling software firm. | categorical | predictor

Table 1: Catastrophe bond data set dictionary. (The equivalent of a basis point is 0.01 percent.)

The motivation for including the coverage variable came from speciality articles, such as Risk (2019) and Muir-Wood (2017). There, the need to incorporate the coverage type in catastrophe bond pricing was highlighted following the extensive capital freezes investors experienced after the California wildfires in 2018. Briefly touching upon this topic: wildfires, a not well understood peril, have been mostly transferred to investors with a provision that losses are covered on an aggregate basis. By design, aggregate deals tend to incur losses more easily, even from small events, compared to their per-occurrence counterparts, as a string of loss events triggers the bond. The incapacity of the models to account for this to date led to big losses from aggregate deals and pressure for spreads to incorporate this transaction aspect. This signifies the importance of considering this variable. A further addition to the variables kit for studying the spread is the incorporation of information regarding the modelling company employed to calculate the frequency and severity of the securitised catastrophe risks. The software used for this purpose is firm specific, thus it is interesting to explore whether, by knowing this information, part of the spread can be predicted.

A final note for the variables of this study regards credit ratings. Specifically, the information captured initially was about the actual credit rating allocated to a given catastrophe bond issuance, if at all. However, here a binary variable was created indicating whether a catastrophe bond was issued with a credit rating attached to it or not. This is because we observed that the majority of catastrophe bonds for this period were not rated. Consequently, our focus was shifted from what rating a catastrophe bond has to whether it has any rating at all. The absence of credit rating in new issuances is not solely an observation in the current data set though. In ILS professional circles, the popularity of non-rated catastrophe bonds is justified from a catastrophe bond market evolution perspective; investors feel increasingly comfortable with, and trusting of, the risk modelling companies for the calculation of loss and the analysis of the risk-return profile. As a result, credit ratings are somehow no longer seen as essential as they used to be in the past, and this is reflected in the increasing issuance pace of non-rated bonds, see ARTEMIS (2019). In the following sections, we apply the research methodology of Section 3 to the catastrophe bond data set that we have just discussed.
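As a small illustration of the unit harmonisation and spread derivation described in Section 4.2; the data frame `raw` and the columns `EL_unit`, `coupon` and `money_market_rate` are hypothetical names introduced only for this sketch.

```r
# Harmonise expected loss units and derive the spread at issuance.
cb_data <- within(raw, {
  EL     <- ifelse(EL_unit == "bps", EL / 100, EL)  # 1 basis point = 0.01%
  spread <- coupon - money_market_rate              # strip the money market element
})
```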
5. Random forest generation
In order to build the random forest using our catastrophe bond data set, we first needed to decide the hyperparameters' values that we will use, i.e. the number of trees, the number of variables randomly selected at each split, and the node size. Breiman (2001) has suggested certain default values that seem to work well after multiple empirical experiments; however, we also examined their suitability for our data set.

(Footnotes: ILS is an abbreviation for Insurance Linked Securities or Insurance Linked Securitisation, depending on the context in which it is used. The statistical software used is R, version 3.5.1. The statistical packages employed to perform computations are those of Liaw & Wiener (2002) for developing the random forest as well as calculating permutation importance values, Ishwaran & Kogalur (2019) for calculating minimal depth importance measures and the minimal depth threshold for variable selection, and Kuhn (2008) for tuning the main hyperparameter using a grid search methodology. It should be mentioned that whenever Ishwaran & Kogalur (2019) and Kuhn (2008) were used, the algorithm arguments agreed with those used in Liaw & Wiener (2002) to avoid inconsistencies.)

5.1. Number of trees
The number of trees should be sufficiently large when $m_{try}$ is small, so that each variable has enough of a chance to be included in the forest prediction process. However, beyond the computational cost associated with growing large random forests, it was found by Breiman (2001) that there are diminishing returns in prediction accuracy from adding a bigger number of trees. Taking these reflections into account, we started the random forest development process by growing a large number of trees, and in Figure 2 one can see how the $MSE_{OOB}$ converges for various values of random forest size up to this level. At first sight, it does not take a large number of trees for $MSE_{OOB}$ to stabilise; well before the default random forest size value suggested in Breiman (2001) is reached, $MSE_{OOB}$ has already dropped sharply, and by that point it has almost stabilised.
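The convergence check behind Figure 2 can be reproduced directly from a fitted forest: in the randomForest package, the `mse` component stores the cumulative out-of-bag MSE after each additional tree (a sketch, with `rf` as in the earlier snippets).

```r
# Plot the OOB mean squared error as the forest grows from 1 to ntree trees.
plot(rf$mse, type = "l",
     xlab = "number of trees", ylab = expression(MSE[OOB]))
```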
Figure 2: Out-of-bag mean squared error convergence with respect to random forest size. Mean squared error based on out-of-bag samples ($MSE_{OOB}$) versus number of trees in the random forest.

In order to empirically verify that the number of trees we use is satisfactory, we made a few extra checks for various numbers of trees below and above this default reference point, up to 700 trees. The results of these computations are shown in Table 2. We see indeed that $MSE_{OOB}$ drops further by adding extra trees, but it seems that 700 trees is adequate for our problem: the differences in $MSE_{OOB}$ among the larger forest sizes are small and result in roughly the same $R^2_{OOB}$. Thus, to avoid extra computational cost, our random forest will be comprised of 700 trees.

Number of trees | $MSE_{OOB}$ | $R^2_{OOB}$
350 | 11938.50 | 92.70%
400 | 11928.78 | 92.71%
500 | 11676.67 | 92.86%
600 | 11491.27 | 92.97%
700 | 11347.55 | 93.06%

Table 2: Trials for the empirical approximation of random forest size by looking at $MSE_{OOB}$ and $R^2_{OOB}$ for selected numbers of trees.

5.2. Node size
The hyperparameter node size controls the size of the trees in the random forest and effectively determines when the recursive partitioning should stop. A large node size results in shallower trees because the splitting process stops earlier. This has the advantage of lower computation times, but it effectively means that the tree will not learn some patterns, resulting in lower prediction accuracy. On the other hand, a small node size translates to a higher computational cost but more thorough learning of patterns and consequently a more accurate base learner. The recommended value for node size, i.e. the minimum number of data points in the terminal nodes of each tree, given by Breiman (2001), is 5 for regression problems. This default value was also suggested and used by many other authors, such as Wang et al. (2018), Grömping (2009), and Berk (2008), and therefore we also employ it as the node size value here. The random forest needs to consist of trees which are fully or almost fully grown, see Breiman (2001), thus there is not much added value in exploring this aspect further, as this choice meets the requirement and there is a general consensus for its appropriateness.

5.3. Number of variables selected at each split
The number of candidate predictors randomly considered at each split, $m_{try}$, is the most important hyperparameter. This is because it can affect the performance of the random forest and the predictors' importance measures the most, see Berk (2008). The significance of $m_{try}$ lies in the fact that it influences at the same time both the prediction accuracy of each individual tree and the diversity of the trees in the forest. To get the most out of the random forest, one wants each tree to have good prediction performance but at the same time the trees not to be correlated to one another. However, these two goals are conflicting. An individual tree will be most accurate when $m_{try}$ has a high value, but this would result in high correlation within the ensemble. In particular, the extreme case of $m_{try} = P$ would force the process to amount to simple bagging (James et al. 2013). Generally, a small $m_{try}$ is preferable as, for a sufficiently large number of trees, each predictor will have a higher chance to get selected and thus contribute to the forest construction. All in all, the trade-off between individual learner accuracy and diversity needs to be managed by finding an optimal value which secures balance for the data set we study.

In Breiman (2001), the default value of $m_{try} = P/3$ is suggested for regression problems. This means that in our problem, where $P = 9$, the algorithm would consider 3 predictors at each potential split. We have investigated the relevance of this empirical rule using a tuning strategy called grid search, followed by 5-fold cross validation. The goal was to ensure that the most appropriate $m_{try}$ is chosen. The process started by specifying the range of all possible values that $m_{try}$ can take, namely the grid. In the current study, this is between 1 and 9, i.e. as many as the number of predictors. Then, different versions of the random forest algorithm were built, one for each possible value of $m_{try}$. The prediction accuracy of each random forest version, measured by means of $R^2_{OOB}$, was evaluated through a 5-fold cross validation.
Figure 3: Tuning of the main random forest hyperparameter through grid search followed by 5-fold cross validation. Out-of-bag based $R^2$ ($R^2_{OOB}$) for the random forest versus the number of candidate predictors randomly considered at each split ($m_{try}$) during forest generation.

The results, shown in Figure 3, reveal that allowing the random forest algorithm to look randomly at $m_{try} = 3$ or $m_{try} = 4$ explanatory variables every time a split is considered at any given node was almost equally good in terms of prediction performance. However, since variable importance measures were to be calculated later on, we deemed it wise to choose the smaller value of $m_{try} = 3$, by discipline, as this would lead to less correlated trees, giving the opportunity to see the influence of weaker predictors on catastrophe bond spread prediction. Having decided on the hyperparameter values, the final random forest was generated. The next section investigates how well the random forest performed in our catastrophe bond setting.
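A sketch of this tuning step using the caret package of Kuhn (2008), which the paper cites for the grid search; `cb_data` is again a hypothetical placeholder, and the grid spans all nine possible $m_{try}$ values.

```r
library(caret)

set.seed(1)
tuned <- train(spread ~ ., data = cb_data,
               method    = "rf",               # wraps randomForest
               ntree     = 700, nodesize = 5,  # passed through to randomForest
               tuneGrid  = expand.grid(mtry = 1:9),
               trControl = trainControl(method = "cv", number = 5))
tuned$bestTune  # the m_try value with the best cross-validated accuracy
```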
6. Random forest performance evaluation
In this section, we evaluate how well our random forest performs with regards to its prediction accuracy and stability. The latter is important as industry users need to feel comfortable that the predictions acquired from the random forest are equally good no matter which catastrophe bond data set is used for its construction.

6.1. Random forest prediction accuracy
The ability of the random forest to predict catastrophe bond spreads given new input information was investigated. We start by clarifying what we regarded as new input, followed by how the catastrophe bond spread predictions were made. Then, the results of the prediction accuracy metrics for the random forest are presented and discussed.

First, as new input for a given tree, we have accounted its out-of-bag observations. Due to sampling with replacement, only the unique observations, i.e. around two thirds of the $N = 934$ data points, were used to build each of the 700 unpruned and almost fully grown (node size = 5) regression trees. The remaining observations, i.e. around one third of the $N = 934$ data points, were never used during the building process for a given tree and as a result they formed a reliable test set for it. Secondly, a prediction for the spread at issuance for the $n = 1$ observation, $\hat{y}_1$, was produced by dropping its corresponding input down every single tree in which the $n = 1$ observation was out of bag. This resulted on average in around one third of 700, i.e. roughly 233, catastrophe bond spread predictions for the $n = 1$ observation. Then, a single spread prediction for the $n = 1$ observation was made by taking the average value of these predictions. After having predicted the catastrophe bond spread value for the observation $n = 1$, the same process was repeated for the remaining 933 observations. Finally, in order to evaluate the prediction accuracy of our random forest, the metrics discussed in Section 3 were calculated. In particular, we computed the mean squared error based on the out-of-bag data as $MSE_{OOB} = \frac{1}{934}\sum_{n=1}^{934} (y_n - \hat{y}_{n,OOB})^2$, the total sum of squares as $TSS = \sum_{n=1}^{934} (y_n - \bar{y})^2$ and the variability explained by our random forest as $R^2_{OOB} = 1 - \frac{MSE_{OOB}}{TSS}$. The results are presented in Table 3 along with additional information about the random forest built.

Prediction accuracy of final random forest
sample size | 934
number of predictors | 9
random forest type | regression
number of trees | 700
no. of variables tried at each split ($m_{try}$) | 3
node size | 5
$MSE_{OOB}$ |
$R^2_{OOB}$ |

Table 3: Final random forest prediction accuracy results presented in terms of $MSE_{OOB}$ and $R^2_{OOB}$.

It stands out that our random forest explains around 93% of the total variability. At the same time, the predictive performance of our benchmark model, i.e. linear regression, is much lower (details about the prediction performance of linear regression based on the same catastrophe bond data set used for random forest generation are given in Appendix B). However, another important aspect in evaluating whether such a level of random forest prediction accuracy is high enough is to consider the nature of the problem under study. On a broader perspective, making predictions in a financial market setting is not an easy task. Inefficiencies, multiple market participants and the influence of psychology on their behaviour are only a few
of the factors making the prediction task complex. Consequently, given that we address a financial problem, we could claim that achieving an $R^2_{OOB}$ of around 93% here corresponds to a very satisfactory level of prediction accuracy. From a catastrophe bond market perspective though, this result is impressive. Our data set included the whole population of catastrophe bond deals for the period under study, with all the heterogeneity that characterises it. Effectively, this means that by using the same random forest one can predict the spread of a new catastrophe bond issuance no matter its features and risk profile. To our knowledge this is the first empirical study which manages to find a single model to account for various types of catastrophe bonds; usually we see a segmentation for US versus non-US perils or earthquake-specific models. Using only one pricing solution offers great flexibility in a business context. An instance of how our random forest can be used in the insurance-linked securities industry is discussed later on in Section 8. In the following subsection, the stability of the prediction accuracy results is assessed.

6.2. Random forest stability
Here, the random forest results' stability is used as a performance criterion and it is measured empirically from a practitioner's point of view as presented in Section 3. Following Turney (1995) and Philipp et al. (2018), we obtained two sets of data from the same phenomenon and same underlying distribution with as little learning overlap as possible, then constructed two random forests, one from each, and checked whether prediction accuracy was fairly similar. That said, we took a random 50% of the observations without replacement from the initial catastrophe bond data set, namely Sample A. The rest of the original data set observations, not included in Sample A, formed Sample B. Then, two separate random forests were grown out of Sample A and Sample B to assess the stability of random forest prediction accuracy to changes in the initial data set. We have repeated this process 100 times. Optimal values for the number of variables randomly selected to be considered at each split were sought in both cases. In Table 4, we present a typical realisation of 1 out of 100 iterations with respect to the repeatability of prediction accuracy results.

Across all 100 iterations, the recorded mean absolute difference of $R^2_{OOB}$ between Sample A and Sample B, as well as the minimum and maximum absolute differences, remained small. Given that our problem sits at the intersection of the financial and insurance market spheres, where many behavioural aspects can affect prices, we consider this difference small. In essence, it is unlikely that an ILS fund would reject the use of the method solely for such a level of dissimilarity. In fact, the repeatability of prediction results here means that our initial random forest prediction accuracy result, i.e. the $R^2_{OOB}$ of around 93% presented in Table 3, is reliable.

Random Forest Summary | Sample A | Sample B
sample size | 467 | 467
number of predictors | 9 | 9
random forest type | regression | regression
number of trees | 700 | 700
no. of variables tried at each split | 3 | 4
node size | 5 | 5
$MSE_{OOB}$ | |
$R^2_{OOB}$ | |

Table 4: A typical realisation regarding random forest prediction accuracy stability results.

This finding is beneficial for the usage of the method in the industry. With new catastrophe bonds being issued, the random forest would need to be validated at some point in time, as any other model in an insurance-related firm. Surely, in a business context, there is no point in investing time and capital to introduce a new model if the latter provides accurate predictions strictly for one particular data set. Having gone through the examination of prediction accuracy results' stability, we proceed with determining the importance of each independent variable in the study.
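For completeness, a sketch of how the accuracy quantities reported in Table 3 are obtained from the fitted forest (same hypothetical `rf` and `cb_data` as in the earlier snippets); $TSS$ is divided by $N$ in the last line so that the ratio compares mean squared quantities, matching the proportion-of-variance interpretation.

```r
# OOB predictions and the accuracy metrics of Section 3.3.1.
oob_pred <- predict(rf)                                    # OOB prediction per bond
mse_oob  <- mean((cb_data$spread - oob_pred)^2)            # MSE_OOB
tss      <- sum((cb_data$spread - mean(cb_data$spread))^2) # TSS
r2_oob   <- 100 * (1 - mse_oob / (tss / nrow(cb_data)))    # % of variability explained
```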
7. Predictors' importance analysis
The importance of predictors is assessed using the methodologies of permutation and minimal depth importance presented in Section 3. It should once again be highlighted that the goal here is to find how powerful each independent variable is in predicting catastrophe bond spreads at issuance. No kind of relationship between the spread at issuance and the predictors is to be established; the focus lies solely on their prediction ability. We then compare the stability of the predictors' importance results for both methods. Finally, for the most stable method, we discuss whether the rankings and variable selection results make empirical sense from the investors' viewpoint.

7.1. Permutation importance
The importance of each independent variable in predicting catastrophe bond spreads has been assessed here on the basis of the percentage increase in $MSE_{OOB}$ when a predictor is randomly permuted in the out-of-bag data whilst the others remain untouched. First, the $MSE_{OOB}$ for each of the 700 trees comprising the random forest was recorded. The same process was repeated after randomly shuffling the values of a particular $x^p$ across all observations. Then, the change between these two mean squared errors, before and after the $x^p$ permutation, was calculated and averaged across the 700 trees, after being normalised by the standard deviation of the differences. This way the importance score for $x^p$ was derived. Finally, based on these scores, an importance ranking was produced. The ranking of catastrophe bond predictors based on their permutation importance score is shown in Figure 4. Variables higher on the vertical axis are more important in predicting the catastrophe bond spread at issuance.

Figure 4: Permutation importance based ranking of predictors. Predictors being permuted versus percentage increase in $MSE_{OOB}$ as a result of the permutation.

One of the first observations is that all scores are positive. This indicates that each of the independent variables presented here does contribute towards the prediction of catastrophe bond spreads. The rating status and expected loss appear as the most important predictors of spread at issuance, followed by the variable term. In particular, when the variable rating status is shuffled, the out-of-bag mean squared error increases the most, whilst the respective percentage for the expected loss predictor is marginally lower, followed in turn by the predictor term. The rest of the covariates can be grouped into two batches with respect to their predictive strength. Had any of the predictors diversifier, size and AP been randomly permuted, the prediction performance of the random forest would have deteriorated moderately. Looking at the next batch in the importance ranks, the predictors trigger category, coverage and vendor reduce the random forest forecasting power over a lower percentage range compared to the other two batches of predictors. Next, we present the minimal depth importance results.

7.2. Minimal depth importance
The focus is now shifted from using a specific prediction performance measure to assess variable importance to a criterion based on the way that the forest was constructed, i.e. the minimal depth. A tour over the constructed random forest was made to find the maximal subtree within each of the $K = 700$ trees for a particular predictor $x^p$ (see Section 3.4.2 for an explanation of what constitutes a maximal subtree). From there, the minimal depth for $x^p$ within each tree was identified, following the rationale explained in Section 3. Then, the forest-level minimal depth for $x^p$ was derived by averaging the minimal depth for $x^p$ over all trees. The plot below illustrates the ranking of the covariates with respect to their average minimal depth; higher values of minimal depth correspond to less predictive variables.

Figure 5: Minimal depth importance based ranking of predictors. Predictors and their forest-averaged minimal depth. The dotted line indicates the variable selection threshold, here denoted by T*.

Expected loss and attachment probability have the smallest random forest average minimal depths and hence the largest impact in predicting catastrophe bond spreads. In particular, such small values of minimal depth demonstrate that these two variables were mostly used to split either the root node or one of its daughter nodes in most of the trees in the forest. Straight after in the rankings comes the variable diversifier, which on average is chosen to split a node for the very first time at a depth still located close enough to the root node, implying that the predictor diversifier also has considerable forecasting power. Using the same rationale, the counterparts size and term have similar ability to predict catastrophe bond spreads, with a marginal distance from their predecessor diversifier; among them, the predictor size holds the lead. Next in the importance rankings, one finds the variables trigger and rating status. They have an almost identical minimal depth measurement, which can be interpreted as having the same power to forecast catastrophe bond spreads at issuance.
Nevertheless, at the random forest level, trigger and rating status are not as powerful because they split nodes which naturally have fewer data points due to their proximity to the terminal nodes. Finally, vendor, followed by coverage type, were on average chosen to divide nodes very close to the terminal ones, or even the terminal ones themselves. This is revealed by their minimal depths, the highest among all predictors, showing that they have the most limited forecasting ability out of all predictors in the data set.

As mentioned in Section 3.4.2, the minimal depth is not used just for ranking predictors but also for selecting the most important ones. These are the predictors having forest-averaged minimal depth lower than the mean of the minimal depth distribution, namely the mean minimal depth threshold according to Ishwaran et al. (2010), here denoted by T*. As seen in Figure 5, all predictors had a forest-averaged minimal depth score lower than the one indicated by this threshold. A sensible remark is that all of the predictors in this study carry information which is important in predicting catastrophe bond spreads, even though to a varying degree. Although one could use other approaches to select variables, outside the scope of minimal depth, see Chen & Ishwaran (2012) for an overview, it is not a necessity here as our data set is not high dimensional, i.e. it is not the case that $P \gg N$. Having presented the predictors' importance results using both permutation and minimal depth based methods, the next discussion refers to the degree of their divergence.

7.3. Divergence between permutation and minimal depth importance results
The permutation and minimal depth importance procedures presented above for ranking or selecting catastrophe bond spread predictors are not directly comparable. This is because, as has been seen, each of them follows a different approach in defining and quantifying importance in prediction. However, empirically we would expect that there should be some consensus at least between the top and bottom rankings of the two methods. What we see is that whilst there is a degree of agreement for the least strong predictors, i.e. vendor and coverage, there is considerable divergence at the top and in the middle of the ranks. Given that the low ranks agree, the most worrying difference is in what each method has identified as the most important predictors. This realisation makes us consider which of the two variable importance approaches leads to the most trustworthy results for our catastrophe bond spread prediction problem. Empirically, an answer to this question would be to examine which ranking makes more sense from a practitioner's perspective. However, we believe it is preferable to first bring our attention back to the concept of results stability, but this time for the importance of the catastrophe bond features. If one of the two methods is unstable, then we can shift our focus to the one that is most robust and then discuss whether it makes sense from an investor's perspective. That said, the stability checks for the importance results derived by the permutation and minimal depth importance methods follow.
Having presented the predictors' importance results using both permutation and minimal depth based methods, we next discuss the degree of their divergence.

7.3. Divergence between permutation and minimal depth importance results

The permutation and minimal depth importance procedures presented above for ranking or selecting catastrophe bond spread predictors are not directly comparable. This is because, as has been seen, each of them follows a different approach in defining and quantifying importance in prediction. However, empirically we would expect some consensus between the two methods, at least for the top and bottom rankings. What we see is that, whilst there is a degree of agreement for the least strong predictors, i.e. vendor and coverage, there is considerable divergence at the top and in the middle of the ranks. Given that the low ranks agree, the most worrying difference concerns what each method has identified as the most important predictors. This realisation makes us ask which of the two variable importance approaches leads to the most trustworthy results for our catastrophe bond spread prediction problem. Empirically, an answer to this question would be to examine which ranking makes more sense from a practitioner's perspective. However, we believe it is preferable to first bring our attention back to the concept of results stability, this time for the importance of the catastrophe bond features. If one of the two methods is unstable, then we can shift our focus to the more robust one and then discuss whether it makes sense from an investor's perspective. That said, the stability checks for the importance results derived by the permutation and minimal depth importance methods follow.

7.4. Stability checks for predictors' importance results

In this study, a predictors' importance method will be considered reliable if its importance ranking for catastrophe bond spread predictors is fairly robust to changes in the data set. Should a change in the catastrophe bond data set from which the random forest is constructed lead to a big change at the top and at the bottom of the predictors' importance rankings, the variable importance method will be considered unstable and thus probably unreliable. Towards this direction, since both permutation importance and minimal depth importance are procedures derived internally after the construction of the random forest, their stability has been examined mainly based on the random forest pairs grown out of the Sample A and Sample B pairs previously used when the stability of the random forest itself was investigated in Section 6. In Table 5, we report, by variable importance method, the percentage of times where Sample A and Sample B agreed on the predictor chosen at the top, second, third and bottom positions of the ranking. As bottom positions of the rankings we consider the last two positions jointly, because the further we go down the ranking, the more susceptible variables may be to jumping between neighbouring positions across different iterations. It is evident that minimal depth importance provides more stable ranking results for the top positions than permutation importance, whilst both methods are equally robust for the last two ranking positions.

Ranking position     Importance method   Agreement percentage
Top                  Permutation         85%
                     Minimal depth       96%
Second from top      Permutation         27%
                     Minimal depth       84%
Third from top       Permutation         27%
                     Minimal depth       70%
Second from bottom   Permutation         55%
                     Minimal depth       38%
Bottom               Permutation         58%
                     Minimal depth       43%
Last two             Permutation         100%
                     Minimal depth       100%
Table 5:
Ranking stability by predictors' importance method.

We have also examined which of the two variable importance methods provided the more stable results regarding which variable is chosen at which position of the importance rankings. We considered the number of counts, out of the sub-samples taken in 100 iterations (or the samples taken in the iterations where the last two ranking positions are considered jointly), where a given predictor was ranked as top, second from top, third from top or in the last two positions in terms of importance, by variable importance method. The results are shown in Figure 6 and Figure 7 in terms of percentage frequency. We see that minimal depth is more stable with regard to its predictor choices for the examined ranking positions, as opposed to permutation importance, where more variation is visible. As previously highlighted in Chen & Ishwaran (2012), the complex randomisation element of the permutation importance procedure makes it difficult to dig any deeper and assess the underlying cause of its relative instability. However, it should be mentioned that this is not the first work in which this measure has shown irregular behaviour. As an example from the area of bioinformatics, Calle & Urrea (2010) showed that permutation importance rankings were unstable to small perturbations of a gene data set related to the prognosis of bladder cancer. All in all, it should be acknowledged that the appropriateness of a feature importance method is mostly data set specific, and at least for the catastrophe bond data set in hand, permutation importance does not seem as reliable. Based on the above, any discussion from now on about predictors' importance will be based on the minimal depth importance results as presented in Section 7.2.
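As an illustration of the agreement computation behind Table 5, the sketch below repeatedly splits a synthetic data set into two halves, mimicking a Sample A and Sample B pair, grows a forest on each half, and counts how often both halves crown the same top predictor under minimal depth. It reuses the minimal_depths helper and the X, y arrays of the earlier sketches; the paper's actual resampling scheme from Section 6 is not reproduced here.

```python
def rank_by_minimal_depth(X_sub, y_sub, n_trees=100, seed=0):
    """Predictor indices ordered from most to least important by
    forest-averaged minimal depth (fewer trees than the paper's
    K = 700, purely to keep the illustration fast)."""
    f = RandomForestRegressor(n_estimators=n_trees,
                              random_state=seed).fit(X_sub, y_sub)
    d = np.array([minimal_depths(est, X_sub.shape[1])
                  for est in f.estimators_]).mean(axis=0)
    return np.argsort(d)

rng = np.random.default_rng(1)
n, n_iter, agree_top = len(y), 100, 0
for _ in range(n_iter):
    # Two disjoint half-samples standing in for a Sample A / Sample B pair.
    perm = rng.permutation(n)
    rank_a = rank_by_minimal_depth(X[perm[: n // 2]], y[perm[: n // 2]])
    rank_b = rank_by_minimal_depth(X[perm[n // 2:]], y[perm[n // 2:]])
    agree_top += int(rank_a[0] == rank_b[0])
print(f"Top-rank agreement: {100 * agree_top / n_iter:.0f}%")
```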
Figure 6:
Bar plots showing the percentage frequency with which a given predictor was ranked as top, second from top and third from top in terms of importance, by variable importance method. Panels (a), (c) and (e) refer to permutation importance; panels (b), (d) and (f) to minimal depth importance.
Figure 7:
Bar plots showing the percentage frequency with which a given predictor was ranked in the last two positions in terms of importance, by variable importance method. Panel (a) refers to permutation importance and panel (b) to minimal depth importance.

7.5. Discussion of predictors' importance results from an industry perspective

Looking broadly at the minimal depth predictors' importance ranking presented in Figure 5, we consider three groups of predictive variables, i.e. those of utmost, medium and low prediction strength. We acknowledge that the bounds of where the medium and lowest importance variable groups start may be subjective. The distinction here is made looking at the ranking from the perspective of a practitioner. The reason why we want to avoid focusing on individual importance scores is that explaining results in such a detailed way would be neither appropriate nor meaningful for a prediction-oriented study. This section is not about interpreting results but about seeing whether the results somehow capture investors' perception and knowledge of the market.

Having explained our rationale, the group of top importance predictors comprises the two fundamental ingredients of any risk quantification process, namely the severity and frequency of losses, i.e. expected loss and attachment probability. This would most probably not surprise insurance professionals, risk managers or even investors if the variable importance results were presented to them. Especially with respect to investors, it is well understood that the return earned by investing in a catastrophe bond deal needs to surpass the amount of payout money should a qualifying event trigger the catastrophe bond payment. Thus, from an empirical viewpoint, investors would expect that, by knowing the expected loss and the probability of losing the first dollar, at least part of the spread value can be predicted.

The second group refers to features which could attract investors' interest in a deal. One reason for this may be the effect that these features have on investors' portfolio returns. In particular, investors would most probably agree with the predictor diversifier taking the lead within this second group, as this type of information acts as the shop window for them entering the transaction. The rarity of the peril combined with the coverage territory indirectly informs investors about the diversification effect that the particular security can bring into their portfolio; a significant incentive for them to invest in this asset class. We acknowledge that this may not be true for new or rare perils, for which the existing catastrophe models are not yet trusted; however, even in this case the peril-territory combination is informative in this sense. Another reason why the predictors of the second group could trigger investment interest is that some of these features are typical of traditional bond types traded in the financial markets, with which investors feel more comfortable. For example, the issuance size, the time between issuance and maturity date, credit rating information, and the trigger of payment are criteria that any investor would take into account no matter the financial product, as they point to elements of market demand for the product, liquidity, credit quality and riskiness. Consequently, one could say that the location of these variables in the ranking supports the way an average investor would think even for a typical non-insurance-linked investment.

Finally, the last group of predictors in the importance ranking comprises variables having strong technical weight in the securitization process and being insurance sector specific. The first predictor in this group, i.e. vendor, refers to the software company used to calculate the expected loss and various loss probabilities, whilst the second, i.e. coverage type, refers to a contract term found in insurance contracts. Whilst this information may not be immaterial, there is no direct equivalent of such features in the financial markets.
Thus, the average investor not specializing in insurance linked securities would not really dig deep into analyzing vendor model updates and historical catalogs, or the wording of the transaction, when thinking about what can predict returns. Especially for vendor, it is a matter of fact that there is a global oligopoly of firms offering catastrophe risk modelling solutions to the insurance industry. Although the software developed by each of these companies is based on different assumptions, their scientific grounds are not disputed in the marketplace. This is mostly because these companies were founded years before the birth of the first catastrophe bond and have a long track record of use in the traditional insurance and reinsurance markets. That said, there is a contract of trust between them and the market participants, as all vendors are perceived to be of equivalent reputational standing. This does not mean that investors are certain about the reliability of the expected loss computation; it is just that they probably would not believe that one vendor has a much more valid estimate of loss than another. Similarly, coverage type really matters from an investor's perspective when seen in conjunction with the trigger or the combination of peril and geography. For example, catastrophe bonds with indemnity triggers or not well understood risks, when combined with aggregate coverage terms, can be risky in trapping investors' capital, as was seen after the 2018 Californian wildfires (Risk 2019). However, since the predictors trigger and diversifier appear earlier in the ranking, the aspect of coverage could potentially be seen as less important in predicting the spread. Taking all the above into account, the minimal depth predictors' importance results seem to reflect investors' current understanding of the market. Next, we discuss the degree to which the predictors' importance results agree with past research in empirical catastrophe bond pricing.

7.6. Predictive versus explanatory importance - links with literature

It is compelling to see whether the variables found in earlier studies to be good at explaining catastrophe bond spreads are similar to the variables shown here to be good at predicting them. As mentioned in Shmueli (2010), one should not expect these two to be the same. Variables considered important in explaining the response are tied to theoretical hypotheses set at the beginning of a study and to the notion of statistical significance. These aspects are immaterial in a purely predictive modelling framework such as ours. Exploring the level of this divergence is interesting, as it can add value in understanding the full spectrum of catastrophe bond spread drivers for prediction and explanation. It should be mentioned that this is an exercise to be made with extra caution since, to the best of our knowledge, every study in the explanatory catastrophe bond pricing literature to date and our predictive study have utilized different data sets and made different assumptions. However, given that some level of agreement has been recorded in the past for certain variables in the explanatory framework, even under these constraints it is worth having a short discussion.

The starting point is the independent variables for which harmony between this and previous studies has been observed with respect to predictive and explanatory importance.
In particular, in Section 7.2 it was seen that the expected loss is the largest contributor in predicting spreads in the primary catastrophe bond market. This result agrees with almost all of the explanatory oriented literature in which expected loss was actually included as an independent variable, in particular the works of Lane (2000), Lane & Mahul (2008), Bodoff & Gan (2009), Dieckmann (2010), Braun (2016), Galeotti et al. (2013) and Jaeger et al. (2010), with Lei et al. (2008) being the only exception, where the (conditional) expected loss was not found to be statistically significant. At the same time, it was seen that the probability of losses outstripping the attachment point has almost equal forecasting power to the expected loss. This is consistent with the view of Lane (2000) that the catastrophe bond premium is derived through an interplay between the frequency and severity of catastrophe bond expected losses. On top of this, Lei et al. (2008) and Jaeger et al. (2010) also agreed that the attachment probability is highly significant in explaining catastrophe bond spreads. Furthermore, the peril-territory combination is of particular importance in forecasting spreads here, and similar results were obtained in the explanatory framework by Gatumel & Guegan (2008), Jaeger et al. (2010) and Götze & Gürtler (2018). Similarly, trigger was found to be predictive in the current research work, whilst Dieckmann (2010), Götze & Gürtler (2018) and Papachristou (2011) have also discussed the explanatory significance of this variable in their models. Finally, the predictor rating status, which was found to be predictive in our study (although not of top importance), was seen as a major determinant of spread in Lei et al. (2008) and Götze & Gürtler (2018), even though both works examined rating from a different perspective to the one we have employed.

At a second level, one can see that certain predictors which in past studies were categorised as non-significant spread determinants appear here to be relevant for prediction purposes. For example, Papachristou (2011) and Braun (2016) had excluded the variable term from their analyses, with Dieckmann (2010) being the only one highlighting its importance. At the same time, whilst the predictor size has been regarded as less influential or not significant at all by the models of Papachristou (2011), Lei et al. (2008) and Braun (2016), here it is considered sufficiently important for prediction purposes. This divergence stems from the way weak predictors are treated in a typical linear regression model (like the aforementioned ones) versus random forests. As Berk (2008) mentions, in a traditional regression framework a variable having a very small association with the response is most often excluded from the model and regarded as noise. Nevertheless, a large number of small associations, when considered not on an individual basis but at an aggregate level, can have a substantial impact on fitted values. That is not to say that linear regression is incapable of capturing interactions; however, to do so, any interactions need to be explicitly specified, a complicated task as the number of predictors in the study increases. On the contrary, the random forest, as a tree-based method, is naturally able to capture associations between predictors without the need to specify them.
Indeed, Papachristou (2011) acknowledged that, in the context of his study, the fact that term was not considered important enough to be included in the suggested model could have been due to the challenge of capturing complex effects between covariates. Two clarifications are in order regarding the rating variable: in Lei et al. (2008), it captured whether a catastrophe bond had been allocated an investment grade or not, rather than whether it had been rated at all; similarly, in Götze & Gürtler (2018), it referred not to the rating status of the bond but to that of the cedent.
8. An example of random forest application in the industry
Given the exceptional predictive performance of the random forest for catastrophe bond spreads, we provide an example of how it could be used in a real catastrophe bond market setting. The focus lies on assisting an ILS investor in making faster buying decisions when considering taking on a portion of a newly issued non-life catastrophe bond. Below, a generic business problem scenario is presented, followed by a random forest implementation which could potentially solve it.

In particular, just before a new catastrophe bond is issued, potential investors are provided with an offering circular. This document includes information about the deal to be launched and an invitation to attend a road show, after which the issuance pricing will be settled. The information disclosed in this package refers to risk details, various design characteristics of the issuance and a price guidance. Investors want to make sure that the suggested spread compensates them enough for the true element of risk they would undertake by entering the transaction. However, a detailed analysis of this aspect can be time consuming, as various departments and sometimes even external risk modelling firms get involved in the process. Whilst this process is undoubtedly important, investors would like a first flavour of a new deal's potential faster. Imagine, then, how useful a straightforward prediction tool would be, where investors could plug in the details provided in the circular of a new issuance the moment they receive it and get a quick spread prediction for the transaction they are investigating on the spot.

The random forest could do exactly this; after being trained, assessed for the accuracy and stability of its predictions, and used to identify the most important predictors, it could be saved for future use. It is very beneficial for the random forest to be built using the information included in the offering material as input. This is because, for a new issuance, investors would be provided with the same type of information, which would serve as a fresh input. The latter could simply be dropped down the random forest for it to predict the catastrophe bond spread of the new issuance, given the patterns the random forest has captured between spreads and circular information in the past. This prediction would then be compared with the spread guidance offered, giving investors an initial idea of whether the bond is overpriced, underpriced or "fairly" priced based on past catastrophe bond experience. This would direct investors to identify bargains faster and ask more relevant questions about the deal whilst on the road show. Then, if the deal is of interest, they could send all the information needed to their modelling teams to perform the usual tasks of re-modelling the underlying risk exposure and calculating the marginal impact that the new investment would bring into their portfolio. Overall, the random forest is a solution that can speed up investment decisions and help ILS investment firms avoid spending their valuable human resources on irrelevant catastrophe bond deals.
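To make the scenario concrete, the sketch below shows what such a screening tool could look like in Python: a forest is trained on features taken from past circulars, persisted, and then used to score a hypothetical new issuance against its price guidance. All names, values and the file path are invented for illustration and do not come from the paper's data set.

```python
import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical training table assembled from past offering circulars.
past_deals = pd.DataFrame({
    "EL": [1.9, 2.4, 0.8], "AP": [2.5, 3.1, 1.2],
    "term": [3.0, 4.0, 3.2], "size": [130.0, 200.0, 75.0],
    "diversifier": ["NA Wind", "Multi Peril", "Europe"],
    "trigger": ["Indemnity", "Industry loss index", "Indemnity"],
    "spread": [5.8, 7.4, 2.9],
})

model = Pipeline([
    ("encode", ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"),
          ["diversifier", "trigger"])], remainder="passthrough")),
    ("forest", RandomForestRegressor(n_estimators=700, random_state=0)),
]).fit(past_deals.drop(columns="spread"), past_deals["spread"])

joblib.dump(model, "cat_bond_forest.joblib")     # save for future use

# When a new circular arrives, score it on the spot and compare the
# prediction with the price guidance quoted in the document.
new_deal = pd.DataFrame([{"EL": 2.1, "AP": 2.8, "term": 3.0,
                          "size": 150.0, "diversifier": "NA Wind",
                          "trigger": "Indemnity"}])
predicted = float(model.predict(new_deal)[0])
guidance = 6.25                                  # hypothetical, in %
print(f"predicted {predicted:.2f}% vs guidance {guidance:.2f}%")
```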
However, one note needs to be made: when assessing the discrepancy between the predicted spread value provided by the random forest and the price guidance, one might first want to look back at what happened in the past, i.e. the historical discrepancy between predicted and actual values recorded in the prediction phase after the random forest training. This may shed some light on the extent to which a deal deemed mispriced according to the random forest is due to the portion of variability that the random forest could not explain, or merely due to the new catastrophe bond having characteristics that have never been recorded before. The latter problem could be mitigated if the random forest were re-trained at frequent intervals, as part of the model validations taking place each year in a business context, enriching the training data set with more deals.

Given the above, this is just a simple example of how the random forest could be used in a real catastrophe bond market framework. Although many other parameters could be taken into account before such a new method is incorporated into internal business processes, here we give an idea of how the prediction power of the random forest can be combined with investors' personal judgement to make faster and more informed investment decisions. It should be highlighted that recent developments in the catastrophe risk market also support the use of machine learning techniques. Prime examples are the new cyber risk model of the vendor AIR, see AIR (2018), and the structuring of the first catastrophe bond relying solely on machine learning for its pricing, see Markets (2019).
9. Concluding remarks
So far, the data-driven catastrophe bond pricing literature has focused on building statistical models with an aim to test causal theory. Due to the heterogeneity of catastrophe bond deals, previous researchers had to focus on a particular segment of the catastrophe bond market, mostly a specific peril-territory combination, to explain catastrophe bond spreads. The centre of interest lay in the identification of variables having a theoretically material and statistically significant link to the catastrophe bond price, i.e. hypotheses of a relationship between price and each independent variable were made. Then a statistical model, mostly linear regression, was applied to observed data to compute the size of this effect and the statistical significance of each independent variable in relation to the causal hypotheses set at the beginning. For model evaluation, the vast majority of works used the in-sample R² to assess how well the theoretical model fitted the catastrophe bond data at hand. Model selection happened on the basis of keeping statistically significant factors, and sometimes non-significant ones with large coefficients, to match the function connecting catastrophe bond spread and factors to the true underlying catastrophe bond data generation process. The final results were provided in causal terms; out of those relevant to our study, expected loss, diversifier, rating and trigger were found to have a measurable effect on spread, with size and term having no effect.

The approach presented in this research is fundamentally different. A machine learning method called random forest was applied to a rich primary market catastrophe bond data set with the goal of predicting the spreads of any type of catastrophe bond at issuance, given the features provided to investors in the offering circular. The word "any" here signifies that the method handles all bespoke characteristics of catastrophe bond transactions; thus there is no need to silo catastrophe bond observations based on their structure and risk profile and build separate models. Here, we did not focus on the underlying data generation process; instead we learned the association between catastrophe bond spreads and predictors directly from the data using the random forest. The performance of our method was assessed on how accurately it predicts spreads for unseen catastrophe bond observations. Variable importance measures referred to predictive ability and not to the power to explain how the spreads are generated. There was also interest in securing repeatable prediction accuracy and predictors' importance results, so the relevant checks were performed.

It was found that the random forest has very strong predictive performance and that these results are stable. Moreover, all examined predictors have a say in the prediction of spread, even if to varying degrees, and thus they all need to be taken into account. Out of the predictors in common with those studied before in the literature, predictive and explanatory power coexist for expected loss, diversifier, rating and trigger. The variables size and term were found to have considerable predictive power, although in previous studies they were not found to be explanatory. There is potential for the random forest to be used in the catastrophe bond industry to fast-track investment decisions.

Based on the above findings, there are certain aspects that would be interesting to research in the future.
Although by using the random forest as presented here an investor can see whether a new issuance of any type has a competitive price guidance, they are not informed about the suitability of a new deal given their current portfolio composition. Addressing this investor need is a significant topic for future research. There are also a few suggestions for the explanatory framework: size and term need to be further investigated, as the random forest captured interactions which none of the previous models had attempted to model before. The explanatory power of coverage type and vendor, which were found here to be predictive of spread, would also be interesting to study. All in all, prediction modelling and explanatory modelling may differ, but utilising both can only increase the understanding of the catastrophe bond market segment, increase its transparency and contribute to its development.

Appendices
A. Summary statistics for the catastrophe bond data set
Here, we provide further information about the catastrophe bond data set used in this research paper. A few summary statistics are presented for all variables, both continuous and categorical. Starting with the continuous variables, we present histograms in Figure 8 and measures of central tendency and spread of the observations in Table 6. In Figure 8, we see that all continuous variables have a right-skewed distribution, with the only exception being the variable term. In particular, the term distribution has two peaks, reflecting that most catastrophe bond issuances have a 3 to 5 year time horizon. Looking at Table 6, we notice that the range between the minimum and maximum values for all continuous variables, as well as the interquartile range, is rather broad, indicating that the data points are well spread out. Such a data structure is anticipated in a catastrophe bond market setting. In essence, each issuance is a bespoke product developed to meet a very specific risk transfer need, and consequently the population of catastrophe bond deals is heterogeneous. A point worth mentioning is that the minimum value of spread in our data set is zero. In contrast to the dominant view that a low spread is associated with low insurance risk assumed by investors, this is not the case here. In particular, the zero spread observations happen to relate to risky catastrophe bond tranches which were sold to investors at a discount, as part of the Residential Re 2017 and Laetere Re 2016-1 Series issuances, and carried a zero coupon. This example signifies once again the diversity of catastrophe bond transactions; each one is inherently different from another.
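For readers wishing to reproduce this style of summary on their own data, the short sketch below computes Table 6-style statistics with pandas; the small data frame is a hypothetical placeholder for the actual deal records.

```python
import pandas as pd

# Hypothetical frame of the continuous variables; in practice this
# would hold the full set of issuances described in this appendix.
deals = pd.DataFrame({"spread": [5.8, 7.4, 2.9, 0.0],
                      "EL": [1.9, 2.4, 0.8, 3.3],
                      "AP": [2.5, 3.1, 1.2, 4.7],
                      "size": [130.0, 200.0, 75.0, 95.0],
                      "term": [3.0, 4.0, 3.2, 5.0]})

# Five-number summary plus the mean, mirroring Table 6.
print(deals.describe().loc[["min", "25%", "50%", "mean", "75%", "max"]].T)
```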
Figure 8:
Histograms for continuous variables. Percent of total observations versus each numerical variable. Panels: (a) response spread, (b) predictor EL, (c) predictor AP, (d) predictor size, (e) predictor term.

Continuous variable             Min.    1st Qu.   Median   Mean     3rd Qu.   Max.
spread (as % of size)           0.00    3.75      5.75     6.75     8.60      22.00
EL (as % of size)               0.01    1.11      1.88     2.75     3.34      17.35
AP (%)                          0.02    1.36      2.51     3.72     4.68      25.04
size (in million US dollars)    3.00    75.00     130.00   164.70   200.00    1500.00
term (in years)                 1.00    3.02      3.18     3.49     4.02      5.12
Table 6:
Continuous variables summary statistics. The unit in which each continuous variable is measured is provided in brackets.

Moving on to the categorical variables, in Table 7 we present for each of them the number of levels and the number of observations under each level, with the latter quantity also expressed as a percentage of the total number of observations. All variable levels are those used by the industry unless otherwise stated. Some comments on each categorical variable follow. With regard to coverage type, we find that the majority of catastrophe bonds during the study period were issued to provide compensation in situations where a single large-scale loss event would activate the trigger, i.e. per occurrence coverage, as opposed to this happening due to a collection of insured loss events, i.e. aggregate coverage. In very few instances in the data set, such as tranches A and B of Riverfront Re Ltd Series 2017-1, per occurrence and annual aggregate coverage co-existed.

With respect to diversifier, we start with some explanations of the abbreviations. APAC stands for perils specific to the Asia Pacific region, NA for perils relevant to North America, SA for prominent perils in South America, Europe for perils in the aforementioned region, and Multi Peril for various peril-territory combinations. We see that more than half of the catastrophe bond deals in the data set had a mixture of perils in various geographical territories as an underlying. This is justified to an extent because the diversification effect in these instances is much higher, making this type of transaction more attractive to investors. Bonds covering wind in North America follow in terms of popularity, even if the assumption of losses in the area is more likely due to the effect of hurricane seasons. Nevertheless, the high frequency of events has allowed risk modelling companies to understand the risk better and build more trustworthy models, with investors feeling more secure buying exposures in this region. Looking into the rating status of the bonds issued, it is evident that more than half of the catastrophe bonds in the data set did not receive a rating from any independent credit quality agency.

With regard to triggers, indemnity ones were the most popular among the bonds included in the study, followed by industry indices. This clearly shows a preference from the cedents' perspective to be compensated for the exact level of losses they anticipate to experience, or at least in line with industry losses. Deals which are triggered when pre-determined event parameters are satisfied or surpassed accounted for only . of the total market in the period under study. Examples of parametric index deals in the current data set are Atlas VI Capital Ltd. Series 2010-1 and Bosphorus Ltd. Series 2015-1, whilst IBRD CAR 118-119 is an example of a pure parametric trigger deal issued by the International Bank for Reconstruction and Development for Mexico's natural disaster fund, FONDEN. The least used triggers were those combining different trigger types, such as Fortius Re II Ltd. Series 2017-1, and those based on the modelled losses of the cedent's exposure portfolio calculated from event parameters gathered from specified agencies, such as the Akibare II Ltd. single tranche.
Categorical variable   Level                 No. of observations   Percentage
coverage               aggregate             303                   32.4
                       occurrence            627                   67.1
diversifier            APAC                  73                    7.8
                       Europe                66                    7.1
                       Multi Peril           528                   56.5
                       NA Quake              80                    8.6
                       NA Wind               184                   19.7
                       SA Quake              3                     0.3
rating status          rated                 435                   46.6
                       not rated             499                   53.4
trigger                Indemnity             511                   54.7
                       Pure parametric       29                    3.1
                       Industry loss index   325                   34.8
                       Parametric index      23                    2.5
                       Model                 22                    2.4
                       Multiple              24                    2.6
vendor                 AIR                   741                   79.3
                       AON                   4                     0.4
                       EQECAT                42                    4.5
                       RMS                   141                   15.1
                       pp                    6                     0.6

Table 7:
Summary statistics for categorical variables. Levels of each categorical variable are presented by number of observations and percentage of total observations. Abbreviations are explained in the text.

With respect to the risk modelling company used to calculate the expected loss of investors' exposure to the underlying peril, we see that AIR Worldwide is the most widely used, followed by RMS. Together, they account for 94.4% of all non-life securitisations in the data sample, with EQECAT, AON and pp accounting for the remaining 5.5%. It is worth noting that pp does not denote a risk modelling firm; it stands for private placement. Examples are the single tranches of Merna Re Ltd. Series 2016-1, 2017-1 and 2018-1, which were privately purchased by specialised ILS funds. Finally, the internal model of AON was used for very few deals where the aforementioned company had acted as the structuring and placement agent, such as in the case of Windmill I Re Series 2013-1.

B. Random forest versus linear regression prediction accuracy
We believe it is important to compare the accuracy of catastrophe bond spread predictions derived using the random forest against those derived using a benchmark model. Given that most of the previous literature relied on it, the natural benchmark is the traditional linear regression model. To this end, we built a linear regression model using our catastrophe bond data set, focusing on the model's prediction performance. For consistency, the bootstrap was one of the resampling methods used to estimate the prediction accuracy of the linear regression model. This was done as described earlier in the case of the random forest; we used bootstrap samples to refit the model and, for each observation, we only considered predictions from bootstrap samples not including that observation. The prediction accuracy, as measured by means of the out-of-bag R², was then . in the linear regression case, as opposed to . in the case of the random forest. Similar prediction accuracy results for the linear regression case were derived using k-fold cross-validation, with R² = 51%, and leave-one-out cross-validation, with R² = 47%. Based on the above, we see that the random forest significantly outperforms linear regression for catastrophe bond spread prediction purposes.
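The out-of-bag bootstrap scheme described above can be sketched as follows; as before, this is an illustrative Python version with hypothetical inputs (any numeric X, y, such as the synthetic arrays from the earlier sketches, will do), not the paper's actual implementation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def oob_bootstrap_r2(X, y, n_boot=500, seed=0):
    """Out-of-bag R² for linear regression: each observation is
    predicted only by models fitted on bootstrap samples that did
    not contain it, as described above."""
    rng = np.random.default_rng(seed)
    n = len(y)
    pred_sum, pred_cnt = np.zeros(n), np.zeros(n)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)           # bootstrap sample
        oob = np.setdiff1d(np.arange(n), idx)      # left-out rows
        model = LinearRegression().fit(X[idx], y[idx])
        pred_sum[oob] += model.predict(X[oob])
        pred_cnt[oob] += 1
    seen = pred_cnt > 0
    y_hat = pred_sum[seen] / pred_cnt[seen]
    ss_res = np.sum((y[seen] - y_hat) ** 2)
    ss_tot = np.sum((y[seen] - y[seen].mean()) ** 2)
    return 1.0 - ss_res / ss_tot

print(f"OOB R² (linear benchmark): {oob_bootstrap_r2(X, y):.2f}")
```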
References

AIR (2018), 'AIR develops advanced probabilistic model for global cyber risks'. Accessed 2019-05-19.
ARTEMIS (2019), 'Decline in ILS ratings shows the asset class isn't so alternative: KBRA'. Accessed 2019-03-19.
Bauer, E. & Kohavi, R. (1999), 'An empirical comparison of voting classification algorithms: Bagging, boosting, and variants', Machine Learning (1-2), 105–139.
Berk, R. A. (2008), Statistical Learning from a Regression Perspective, 2 edn, Springer.
Bodoff, N. M. & Gan, Y. (2009), 'An analysis of the market price of cat bonds', Casualty Actuarial Society E-Forum, Spring 2009.
Braun, A. (2012), 'Determinants of the cat bond spread at issuance', Zeitschrift für die gesamte Versicherungswissenschaft (5), 721–736.
Braun, A. (2016), 'Pricing in the primary market for cat bonds: new empirical evidence', Journal of Risk and Insurance (4), 811–847.
Breiman, L. (1996a), 'Bagging predictors', Machine Learning (2), 123–140.
Breiman, L. (1996b), 'Heuristics of instability and stabilization in model selection', The Annals of Statistics (6), 2350–2383.
Breiman, L. (1996c), 'Stacked regressions', Machine Learning (1), 49–64.
Breiman, L. (2001), 'Random forests', Machine Learning (1), 5–32.
Breiman, L., Friedman, J. H., Olshen, R. A. & Stone, C. J. (1984), Classification and Regression Trees, Wadsworth & Brooks/Cole Advanced Books & Software.
Calle, M. L. & Urrea, V. (2010), 'Letter to the Editor: Stability of Random Forest importance measures', Briefings in Bioinformatics (1), 86–89.
Chen, X. & Ishwaran, H. (2012), 'Random forests for genomic data analysis', Genomics (6), 323–329.
Clemen, R. T. (1989), 'Combining forecasts: A review and annotated bibliography', International Journal of Forecasting (4), 559–583.
Díaz-Uriarte, R. & De Andres, S. A. (2006), 'Gene selection and classification of microarray data using random forest', BMC Bioinformatics (1), 3.
Dieckmann, S. (2010), 'By force of nature: explaining the yield spread on catastrophe bonds', Available at SSRN 1082879.
Dietterich, T. G. (2000a), Ensemble methods in machine learning, in 'International Workshop on Multiple Classifier Systems', Springer, pp. 1–15.
Dietterich, T. G. (2000b), 'An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization', Machine Learning (2), 139–157.
Efron, B. (1992), Bootstrap methods: another look at the jackknife, in 'Breakthroughs in Statistics', Springer, pp. 569–593.
Friedman, J., Hastie, T. & Tibshirani, R. (2001), The Elements of Statistical Learning, Springer Series in Statistics, New York, NY, USA.
Galeotti, M., Gürtler, M. & Winkelvos, C. (2013), 'Accuracy of premium calculation models for cat bonds – an empirical analysis', Journal of Risk and Insurance (2), 401–421.
Gatumel, M. & Guegan, D. (2008), Towards an understanding approach of the insurance linked securities market, Documents de travail du Centre d'Economie de la Sorbonne, Université Panthéon-Sorbonne (Paris 1), Centre d'Economie de la Sorbonne.
Götze, T. & Gürtler, M. (2018), 'Sponsor- and trigger-specific determinants of cat bond premia: a summary', Zeitschrift für die gesamte Versicherungswissenschaft, pp. 1–16.
Grömping, U. (2009), 'Variable importance assessment in regression: linear regression versus random forest', The American Statistician (4), 308–319.
Ishwaran, H. (2007), 'Variable importance in binary regression trees and forests', Electronic Journal of Statistics, 519–537.
Ishwaran, H. & Kogalur, U. (2019), Random Forests for Survival, Regression, and Classification (RF-SRC). R package version 2.8.0.
Ishwaran, H., Kogalur, U. B., Chen, X. & Minn, A. J. (2011), 'Random survival forests for high-dimensional data', Statistical Analysis and Data Mining: The ASA Data Science Journal (1), 115–132.
Ishwaran, H., Kogalur, U. B., Gorodeski, E. Z., Minn, A. J. & Lauer, M. S. (2010), 'High-dimensional variable selection for survival data', Journal of the American Statistical Association (489), 205–217.
Jaeger, L., Müller, S. & Scherling, S. (2010), 'Insurance-linked securities: What drives their returns?', The Journal of Alternative Investments (2), 9–34.
James, G., Witten, D., Hastie, T. & Tibshirani, R. (2013), An Introduction to Statistical Learning, Springer.
Kuhn, M. (2008), 'Building predictive models in R using the caret package', Journal of Statistical Software (5), 1–26.
Lane, M. & Mahul, O. (2008), Catastrophe Risk Pricing: An Empirical Analysis, The World Bank.
Lane, M. N. (2000), 'Pricing risk transfer transactions', ASTIN Bulletin: The Journal of the IAA (2), 259–293.
Lange, T., Roth, V., Braun, M. L. & Buhmann, J. M. (2004), 'Stability-based validation of clustering solutions', Neural Computation (6), 1299–1323.
Lei, D. T., Wang, J.-H. & Tzeng, L. Y. (2008), Explaining the spread premiums on catastrophe bonds, in 'NTU International Conference on Finance, Taiwan'.
Liaw, A. & Wiener, M. (2002), 'Classification and regression by randomForest', R News (3), 18–22.
Lim, C. & Yu, B. (2016), 'Estimation stability with cross-validation (ESCV)', Journal of Computational and Graphical Statistics (2), 464–492.
Major, J. A. (2019), 'Methodological considerations in the statistical modeling of catastrophe bond prices', Risk Management and Insurance Review (1), 39–56.
Markets, L. C. (2019), 'Shah: Data-driven cat bond can be replicated'.
Muir-Wood, R. (2017), 'The case of the trapped collateral'. Accessed 2019-03-19.
Ntoutsi, I., Kalousis, A. & Theodoridis, Y. (2008), A general framework for estimating similarity of datasets and decision trees: exploring semantic similarity of decision trees, in 'Proceedings of the 2008 SIAM International Conference on Data Mining', SIAM, pp. 810–821.
Oh, J., Laubach, M. & Luczak, A. (2003), Estimating neuronal variable importance with random forest, in '2003 IEEE 29th Annual Proceedings of Bioengineering Conference', IEEE, pp. 33–34.
Opitz, D. & Maclin, R. (1999), 'Popular ensemble methods: An empirical study', Journal of Artificial Intelligence Research, 169–198.
Papachristou, D. (2011), 'Statistical analysis of the spreads of catastrophe bonds at the time of issue', ASTIN Bulletin: The Journal of the IAA (1), 251–277.
Perrone, M. P. (1993), Improving regression estimation: Averaging methods for variance reduction with extensions to general convex measure optimization, PhD thesis, Physics Department, Brown University, Providence, RI.
Philipp, M., Rusch, T., Hornik, K. & Strobl, C. (2018), 'Measuring the stability of results from supervised statistical learning', Journal of Computational and Graphical Statistics (4), 685–700.
Probst, P., Bischl, B. & Boulesteix, A.-L. (2018), 'Tunability: Importance of hyperparameters of machine learning algorithms', arXiv preprint arXiv:1802.09596.
Risk, T. (2019), 'Controlling a blazing risk', Trading Risk; ILS Investor Guide, pp. 14–15.
Russell, S. J. & Norvig, P. (2016), Artificial Intelligence: A Modern Approach, Pearson Education Limited.
Shmueli, G. (2010), 'To explain or to predict?', Statistical Science (3), 289–310.
Stodden, V. (2015), 'Reproducing statistical results', Annual Review of Statistics and Its Application, 1–19.
Strobl, C., Boulesteix, A.-L., Zeileis, A. & Hothorn, T. (2007), 'Bias in random forest variable importance measures: Illustrations, sources and a solution', BMC Bioinformatics (1), 25.
Turney, P. (1995), 'Bias and the quantification of stability', Machine Learning (1-2), 23–33.
Wang, Z., Wang, Y., Zeng, R., Srinivasan, R. S. & Ahrentzen, S. (2018), 'Random forest based hourly building energy prediction', Energy and Buildings, 11–25.
Wolpert, D. H. (1992), 'Stacked generalization', Neural Networks (2), 241–259.