On the modelling of multivariate counts with Cox processes and dependent shot noise intensities
Benjamin Avanzi, Gregory Clive Taylor, Bernard Wong, Xinda Yang
A multivariate micro-level insurance counts model with a Cox process approach
Benjamin Avanzi (a), Greg Taylor (b), Bernard Wong (b), Xinda Yang (∗, b)
(a) Centre for Actuarial Studies, Department of Economics, University of Melbourne VIC 3010, Australia
(b) School of Risk and Actuarial Studies, UNSW Australia Business School, UNSW Sydney NSW 2052, Australia
Abstract
In this paper, we focus on estimating ultimate claim counts in multiple insurance processes and thus extend the associated literature of micro-level stochastic reserving models to the multivariate context. Specifically, we develop a multivariate Cox process to model the joint arrival process of insurance claims in multiple Lines of Business. The dependency structure is introduced via multivariate shot noise intensity processes which are connected with the help of Lévy copulas. Such a construction is more parsimonious and tractable in higher dimensions than plain vanilla common shock models. We also consider practical implementation and explicitly incorporate known covariates, such as seasonality patterns and trends, which may explain some of the relationship between two insurance processes (or at least help tease out those relationships).

We develop a filtering algorithm based on the reversible-jump Markov Chain Monte Carlo (RJMCMC) method to estimate the unobservable stochastic intensities. Model calibration is illustrated using real data from the AUSI data set.
Key words:
Dependency modelling, Cox process, Shot noise, Insurance claims counts, Micro-level model, Markov chain Monte Carlo
JEL codes: C51, C53, C55, G22
MSC classes: 91G70, 91G60, 62P05, 62H12
1. Introduction
This paper is concerned with the multivariate modelling of ultimate claim counts, which is vital for the valuation of outstanding claims liabilities of multiple lines of business (“LoBs”). In the multivariate context, the distribution of the aggregate losses depends not only on the marginal loss distributions, but also on their dependency structure. This is of particular importance where a risk margin is required. A risk margin is typically calculated as the difference between a quantile and the central estimate of the reserves. For example, the Australian Prudential Regulation Authority (“APRA”) requires an actuarial estimate at the greatest of two quantities: (i) the 75% quantile of the predictive distribution of the total loss (see APRA Prudential Standard GPS 320), and (ii) its central estimate, augmented by one half standard deviation. Therefore, incorporation of dependencies when they arise is required in order to determine an adequate level of capitalisation. Dependencies can manifest themselves in the severity or frequency of claims. In this paper we focus on the latter.

The traditional multivariate stochastic reserving literature and practice focus on an aggregate approach, whereby dependence structures are constructed on the basis of loss development triangles (see, for example, Merz and Wüthrich, 2007, 2008, 2009b,a; Merz, Wüthrich, and Hashorva, 2013; Shi and Frees, 2011; Shi, Basu, and Meyers, 2012).

∗ Corresponding author.
Email addresses: [email protected] (Benjamin Avanzi), [email protected] (Greg Taylor), [email protected] (Bernard Wong), [email protected] (Xinda Yang)

April 24, 2020

In such an aggregate approach (later referred to as a “macro-level” approach), raw observations are aggregated into a more synthetic dataset of claims classified by accident and development periods (typically of annual or quarterly frequency). This is the traditional approach, which actuaries are very accustomed to, and which has the advantage of being computationally straightforward, even though a fair amount of judgement is required to make reasonable assumptions in its application.

When using a macro-level approach, the small(er) size of the data set may lead to high parameter uncertainty and hence reduce the predictive power of the model. Furthermore, one can reasonably question whether there is enough data to fit a dependency structure (when required) to a reasonable level of confidence. Also, an aggregate dataset by nature may eliminate some information about useful factors (such as seasonality factors in annual triangles). Further, the aggregation does not treat different components of the claims processes separately (such as information on the claim arrival process and individual claim development process, or speed of settlement). Those components may have different effects on the quantity of interest, and may also change over time in different ways. In short, the forces that drive the quantities of interest are hidden behind a black box in a macro-level model, which is a problem when the environment changes (a well known limitation of the chain ladder method, for instance). While there may be ways to adjust macro-level methods, such as Berquist-Sherman adjustments (see, e.g. Werner and Modlin, 2010; Friedland, 2013), we see value in trying to model those forces explicitly, rather than trying to adjust a model that was built on conflicting assumptions (see Avanzi, Taylor, and Wong, 2016b, for more details).

An alternative solution, adopted in this paper, is to use a more granular approach.
The so-called “micro-level” reserving model utilises the information of individual claims (Antonio and Plat, 2014; Avanzi, Wong, and Yang, 2016c). Typical micro-level datasets include information such as the inception and expiry dates of each policy, the arrival and reporting dates of each claim, and the history of transactions and settlement (if the claim has been settled) of each claim. Such data sets are very granular and complex. They allow the observation of varying components, such as the arrival and reporting dates as well as the transaction records and settlement status of each claim. Therefore, one can try to build a dependency model with micro-level components to explain the dependencies of the reserving estimate.

There is now an increasing interest in the micro-level reserving approach from both academia and industry. One stream of the micro-level reserving literature adopts a marked point process approach, where the arrival process of claims follows a continuous time point process and the claim development is captured by a mark. Early examples include Arjas (1989); Norberg (1993, 1999). In particular, Norberg adopted a marked Poisson process framework, which was implemented in Antonio and Plat (2014) and Larsen (2007). Furthermore, Jewell (1989); Zhao, Zhou, and Wong (2009); Zhao and Zhou (2010) also studied the use of Poisson processes to model claim arrivals. Avanzi, Wong, and Yang (2016c) further extended the study to a marked Cox process framework, where a stochastic intensity introduces over-dispersion and an estimation algorithm is developed in the presence of reporting delays. Badescu, Lin, and Tang (2016) and Badescu, Lin, and Tang (2019) have also adopted a Cox process approach where the intensity follows a hidden Markov model (HMM) with Erlang state-dependent distributions. Avanzi, Taylor, Wong, and Xian (2020) model and analyse claim counts of the AUSI dataset using a univariate Markov-modulated non-homogeneous Poisson process.
Another stream of the micro-level reserving literature adopts a discrete time framework (see, for example, Pigeon, Antonio, and Denuit, 2013, 2014, where numbers of claims follow Poisson distributions). Furthermore, Matsui and Mikosch (2010) establish the framework of using a Poisson cluster model in modelling the insurance claim process, which is extended to the case of multiple clusters by Matsui (2015) and to non-homogeneous observations by Matsui (2014). Jessen, Mikosch, and Samorodnitsky (2011) adopt a Poisson cluster model for the payment counts and the total loss amounts of claims with a real data illustration. Moreover, Taylor, McGuire, and Sullivan (2008) focused on modelling individual claim losses with the help of case estimates. Last but not least, Huang, Qiu, and Wu (2015) and Huang, Wu, and Zhou (2016) study the theoretical comparison between micro- and macro-level reserving models. However, there is a lack of literature in multivariate micro-level reserving that allows for a dependency structure, whereas this is precisely an area where micro-level information may benefit the modelling considerably.

In this paper, we extend the existing study of claim counts within micro-level stochastic reserving to the multivariate context. We construct a multivariate Cox process model that allows for dependent frequencies from multiple LoBs in a parsimonious way. Furthermore, we develop a filtering algorithm that estimates the unobservable intensity of the Cox process, which further facilitates an EM algorithm for parameter estimation and prediction. Finally, we illustrate our modelling approach and prediction with a real insurance dataset.

Remark 1.1.
A complete micro-level reserving model framework would include not only the projection of ultimate claim frequencies, but also reporting delay distributions and claim severities with claim development patterns. The methodology for introducing and estimating reporting delays has been studied in Avanzi, Wong, and Yang (2016c). Under the assumption of independence between cross-LoB reporting delays (which we believe to be a good assumption for attritional claims as modelled in this paper), the introduction of reporting delay patterns in the multivariate context is essentially the same as that in the univariate context, where an independent reporting delay distribution is chosen for each process. Thus the estimation would follow the same procedures as those in Avanzi, Wong, and Yang (2016c).
2. A multivariate claim arrival process
In this section, we develop our multivariate count model. Section 2.1 introduces a multivariate Cox process model with common shock intensities. Section 2.2 provides an alternative bottom-up construction of a common shock model with a Lévy copula approach.
2.1. A multivariate Cox process model with common shock intensities

Let $N = \{(N_1(t), \ldots, N_G(t)), t \geq 0\}$ denote the multivariate claim arrival process. The $g$th marginal process, $\{N_g(t), t \geq 0\}$, is a counting process for the number of claims of the $g$th ($1 \leq g \leq G$) Line of Business (“LoB”). We use the definition of the multivariate Cox process from Møller and Waagepetersen (2004), and assume that $N$ is a multivariate Cox process driven by a multivariate intensity $\tilde{\lambda} = \{(\tilde{\lambda}_1(t), \ldots, \tilde{\lambda}_G(t)), t \geq 0\}$ (see Definition 2.1.1).

Definition 2.1.1 (A multivariate Cox process). (see, for example, Møller and Waagepetersen, 2004) Suppose that $\tilde{\lambda}_g = \{\tilde{\lambda}_g(\xi) : \xi \in S\}$, $g = 1, \ldots, G$, are non-negative random fields so that for $g = 1, \ldots, G$, $\xi \to \tilde{\lambda}_g(\xi)$ is a locally integrable function (that is, $\tilde{\lambda}_g(\xi)$ is integrable over the space of events) with probability one (where $\xi$ and $S$ refer to the event and the space of events). Conditional on $\tilde{\lambda} = \{(\tilde{\lambda}_1(t), \ldots, \tilde{\lambda}_G(t)), t \geq 0\}$, suppose that $\{N_g(t), t \geq 0\}_{g=1,\ldots,G}$ are independent Poisson processes with intensity functions $\{\tilde{\lambda}_g(t), t \geq 0\}_{g=1,\ldots,G}$, respectively. Then $N = \{(N_1(t), \ldots, N_G(t)), t \geq 0\}$ is said to be a multivariate Cox process driven by $\tilde{\lambda}$.

The dependency structure of a multivariate Cox process results from the multivariate intensity process. Here the multivariate intensity is influenced by the joint movement of the underlying risk regimes of multiple LoBs. Furthermore, the assumption of conditional independence of the marginal processes provides a simple mathematical representation and is able to account for most of the dependency across general insurance claims.

As in Avanzi, Wong, and Yang (2016c) (in a univariate framework), we model the marginal intensity processes of $\tilde{\lambda}$ as shot noise processes. This allows us to introduce a dependency structure via common shocks (common shots) across multiple stochastic intensities.
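By Definition 2.1.1, conditional on a realised intensity path, each marginal process is an inhomogeneous Poisson process, so it can be sampled by thinning whenever the path is bounded on the sampling horizon. A minimal sketch, using an assumed toy deterministic intensity (not one of the paper's fitted intensities):

```python
import math
import random

def sample_poisson_thinning(intensity, T, lam_max, rng):
    """Sample arrival times on [0, T] of an inhomogeneous Poisson process with
    rate intensity(t) <= lam_max, by Lewis-Shedler thinning: propose arrivals
    from a homogeneous Poisson(lam_max) process and keep each with
    probability intensity(t) / lam_max."""
    arrivals, t = [], 0.0
    while True:
        t += rng.expovariate(lam_max)
        if t > T:
            return arrivals
        if rng.random() < intensity(t) / lam_max:
            arrivals.append(t)

# a toy path standing in for one realised marginal intensity
lam = lambda t: 2.0 + math.sin(t)          # bounded above by 3
rng = random.Random(1)
arr = sample_poisson_thinning(lam, T=100.0, lam_max=3.0, rng=rng)
# expected number of arrivals is the integral of lam over [0, 100], roughly 200
```

The same routine applies marginal by marginal once a joint intensity path has been simulated, which is exactly the conditional independence of Definition 2.1.1.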
Definition 2.1.2 (A multivariate shot noise process). (Li, 2002) A multivariate stochastic process, $\tilde{\lambda} = \{(\tilde{\lambda}_1(t), \ldots, \tilde{\lambda}_G(t)), t \geq 0\}$, is a multivariate shot noise process if
$$\tilde{\lambda}_g(t) = \tilde{\lambda}_g(0) e^{-\kappa_g t} + \sum_{j=1}^{J(t)} X_{g,j} e^{-\kappa_g (t - \tau_j)}, \quad t \geq 0, \qquad (2.1)$$
where $\{J(t), t \geq 0\}$ is a homogeneous Poisson process of intensity $\rho$ (with $\rho > 0$); $\tau_j$ represents the arrival time of the $j$th event of $\{J(t), t \geq 0\}$, which triggers a jump of the shot noise process $\{\tilde{\lambda}_g(t), t \geq 0\}$. The size of the jump is denoted by $X_{g,j}$ (where the subscripts refer to the $g$th marginal shot noise process and $j$th shot), which follows an exponential distribution with an expected value of $1/\eta_g$ if $X_{g,j} > 0$. We assume that the sequence of $G$-dimensional random variables $\{X_j = (X_{1,j}, \ldots, X_{G,j})\}$ is independent and identically distributed over $j = 1, \ldots, J(t)$, $t \geq 0$. Furthermore, $X_{g_1,j}$ and $X_{g_2,j}$ ($g_1, g_2 = 1, \ldots, G$) can be dependent for all $j = 1, \ldots, J(t)$. Denote by $f_X$ the density function of $X_j$. The speed of decay is measured by the parameter $\kappa_g$. Note $f_X$ can consist of a mixture of continuous and discrete components.

Definition 2.1.2 introduces a common shock dependency structure across multiple shot noise processes. It is worth mentioning that one or more marginals of $X_j$ ($j = 1, \ldots, J(t)$) can be 0. In light of this, one can decompose the arrival of shots of a marginal shot noise process into different subsets, where each subset corresponds to a unique combination of non-zero shots on this marginal stochastic intensity process. In a bivariate context, for example, there can be up to three subsets, where shots can affect either exclusively one marginal intensity process or simultaneously both marginal intensity processes.
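Given the shot arrival times and sizes, Equation (2.1) can be evaluated in closed form. A minimal sketch for one marginal (the parameter values are illustrative only; a shot of size zero represents a common event that misses this marginal):

```python
import math

def shot_noise(t, lam0, kappa, shots):
    """Evaluate a marginal shot noise intensity, as in (2.1), at time t.
    shots: list of (tau_j, x_j) pairs, where tau_j is the arrival time and
    x_j >= 0 the size of the j-th shot on this marginal (x_j = 0 when the
    j-th common event does not hit this marginal)."""
    out = lam0 * math.exp(-kappa * t)          # decayed initial level
    for tau, x in shots:
        if tau <= t:                           # only past shots contribute
            out += x * math.exp(-kappa * (t - tau))
    return out

# decaying initial level plus two shots, one of size zero (a "miss")
val = shot_noise(2.0, lam0=1.0, kappa=0.5, shots=[(0.5, 2.0), (1.5, 0.0)])
```

This piecewise-exponential structure is what makes the likelihood computations of Section 3 tractable, since the intensity between shots is known analytically.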
We have provided an example (see Example 1 below) to illustrate how such a decomposition can be achieved, and presented the covariance between the marginal shot noise intensities at time $t$ ($t > 0$).

Remark 2.1. Here $J(t)$ triggers a multivariate jump, and it is possible to have only one marginal being non-zero (in which case it is a unique jump) or more than one marginal being non-zero (hence a common jump). Hence we write $J$ and not $J_g$. This is also why $f_X$ does not depend on $g$, since it is a $G$-dimensional random variable. Furthermore, $X_{g_1,j}$ and $X_{g_2,j}$ can be dependent for all $g_1$ and $g_2$.

Remark 2.2.
Interested readers can also refer to Selch and Scherer (2018) for an alternative way of defining a multivariate Cox process. In Selch and Scherer (2018), a common stochastic clock is shared by all marginal processes. This requires a different interpretation of the model as well as different estimation techniques.
Example 1 (A bivariate example). A bivariate shot noise process,
$$\tilde{\lambda}(t) = (\tilde{\lambda}_1(t), \tilde{\lambda}_2(t)), \qquad (2.2)$$
can be expressed as
$$\tilde{\lambda}_1(t) = \tilde{\lambda}_1(0) e^{-\kappa_1 t} + \sum_{j=1}^{J_1^{\perp}(t)} X_{1,j}^{\perp} e^{-\kappa_1 (t - \tau_{1,j}^{\perp})} + \sum_{j=1}^{J^{\parallel}(t)} X_{1,j}^{\parallel} e^{-\kappa_1 (t - \tau_j^{\parallel})},$$
$$\tilde{\lambda}_2(t) = \tilde{\lambda}_2(0) e^{-\kappa_2 t} + \sum_{j=1}^{J_2^{\perp}(t)} X_{2,j}^{\perp} e^{-\kappa_2 (t - \tau_{2,j}^{\perp})} + \sum_{j=1}^{J^{\parallel}(t)} X_{2,j}^{\parallel} e^{-\kappa_2 (t - \tau_j^{\parallel})}, \qquad (2.3)$$
where $J_g^{\perp}(t)$ ($g = 1, 2$) is a homogeneous Poisson process (of intensity $\rho_g^{\perp}$) that triggers shots of size $X_{g,j}^{\perp}$ (with distribution function $F_g^{\perp}$) and arrival time $\tau_{g,j}^{\perp}$, only on the $g$th marginal process; and $J^{\parallel}(t)$ is a homogeneous Poisson process (of intensity $\rho^{\parallel}$) that affects both marginal processes simultaneously with a bivariate shot of size $(X_{1,j}^{\parallel}, X_{2,j}^{\parallel})$ (with a bivariate distribution function $F^{\parallel}$ of marginals $F_1^{\parallel}$ and $F_2^{\parallel}$) and arrival time $\tau_j^{\parallel}$, for $j = 1, \ldots, J^{\parallel}(t)$. Note that we have omitted the unique shots of $\tilde{\lambda}_2(t)$ in the expression of $\tilde{\lambda}_1(t)$ for ease of notation, and also for the benefit that $F_g^{\perp}$ is not necessarily a mixed distribution. Equivalently, one can adopt the convention in Definition 2.1.2, in which case some shots can be 0 for a marginal process and hence the distribution consists of a mass at 0.

Figure 1 illustrates a realisation of the path of a bivariate shot noise process. The dotted lines indicate common shots which trigger jumps in both marginal shot noise processes. Moreover, the dashed lines indicate unique shots, where each shot only triggers jumps in one marginal process.

[Figure 1: An illustration of a bivariate shot noise process: dotted lines indicate common shots, dashed lines indicate unique shots.]

Furthermore,
$$\mathrm{Cov}\left(\tilde{\lambda}_1(t), \tilde{\lambda}_2(t)\right) = \frac{\rho^{\parallel} \, \mathrm{E}\left[X_1^{\parallel} X_2^{\parallel}\right]}{\kappa_1 + \kappa_2}, \qquad (2.4)$$
where $(X_1^{\parallel}, X_2^{\parallel})$ follows a bivariate distribution function $F^{\parallel}$; see Appendix A for a proof. Finally, the marginal process $\tilde{\lambda}_1(t)$ is a univariate shot noise process, that is,
$$\tilde{\lambda}_1(t) = \tilde{\lambda}_1(0) e^{-\kappa_1 t} + \sum_{j=1}^{J_1(t)} X_{1,j} e^{-\kappa_1 (t - \tau_{1,j})}, \qquad (2.5)$$
where $J_1$ is a homogeneous Poisson process of intensity
$$\rho_1 = \rho_1^{\perp} + \rho^{\parallel}, \qquad (2.6)$$
and where the size of shot $X_{1,j}$ follows the distribution function
$$F_1 = \frac{\rho_1^{\perp}}{\rho_1} F_1^{\perp} + \frac{\rho^{\parallel}}{\rho_1} F_1^{\parallel}. \qquad (2.7)$$

One can further generalise Example 1 to higher dimensions in a similar manner. In general, there can be up to $2^G - 1$ distinct shot arrival processes in a $G$-dimensional shot noise process.

The frequencies of insurance claims are also driven by risk exposures. The exposure an insurance process faces depends on factors such as the volumes of business, and can sometimes be measured by the number of policies in force, for instance. Furthermore, there are also trends and seasonal patterns that affect the frequencies of claims. In a tropical region, for example, one would expect a higher number of claims in summer than in winter for a housing insurance product. We incorporate risk exposure into the multivariate Cox process $N$ through $\{W_g(t)\}_{g=1,\ldots,G}$, where $W_g(t)$ is the risk exposure of the $g$th LoB at time $t$; see also Section 4.1.

Model assumption 2.1.3 (A non-stationary multivariate shot noise intensity). We assume that there exists a stationary multivariate shot noise process $\tilde{\lambda}(t)$ such that for $g = 1, \ldots, G$,
$$\lambda_g(t) = W_g(t) \tilde{\lambda}_g(t). \qquad (2.8)$$

Assumption 2.1.3 allows for a multivariate intensity process for the arrivals of claims. The common shock structure can be interpreted as common adverse events that affect multiple LoBs at different scales. Furthermore, we assume that the stochastic intensity of claims is proportional to the risk exposure of the corresponding LoB, which leads to a non-stationary shot noise process.
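As a sanity check on the covariance formula (2.4), one can simulate many independent replications of the bivariate process of Example 1 and compare the sample covariance with the theoretical value. A rough sketch with purely illustrative assumptions: common shots only ($\rho_g^{\perp} = 0$), independent exponential common shot sizes (so $\mathrm{E}[X_1^{\parallel} X_2^{\parallel}] = 1/(\eta_1 \eta_2)$), and $\tilde{\lambda}(0) = 0$ with a horizon long enough that the initial decay is negligible:

```python
import math
import random

def simulate_pair(T, rho_par, kappa1, kappa2, eta1, eta2, rng):
    """One replication of the bivariate shot noise intensity of Example 1 at
    time T, with common shots only and lambda(0) = 0."""
    lam1 = lam2 = 0.0
    t = 0.0
    while True:
        t += rng.expovariate(rho_par)              # next common shot time
        if t > T:
            return lam1, lam2
        x1 = rng.expovariate(eta1)                 # independent exponential sizes
        x2 = rng.expovariate(eta2)
        lam1 += x1 * math.exp(-kappa1 * (T - t))   # contribution decayed to time T
        lam2 += x2 * math.exp(-kappa2 * (T - t))

rng = random.Random(42)
reps = 20000
pairs = [simulate_pair(10.0, 2.0, 1.0, 1.0, 1.0, 1.0, rng) for _ in range(reps)]
m1 = sum(p[0] for p in pairs) / reps
m2 = sum(p[1] for p in pairs) / reps
cov = sum(p[0] * p[1] for p in pairs) / reps - m1 * m2
theory = 2.0 * 1.0 / (1.0 + 1.0)   # rho_par * E[X1 X2] / (kappa1 + kappa2) = 1
```

With these parameters the sample covariance should land close to the theoretical value of 1, and the sample mean of each marginal close to its stationary mean $\rho \mathrm{E}[X]/\kappa = 2$.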
Here a stationary shot noise process is a process that follows Definition 2.1.2, where the joint distribution of the process does not depend on $t$, and the only source of non-stationarity is introduced through the risk exposure.

Remark 2.3. As the exposure process is non-stationary and the underlying intensity process is unobservable, the analysis of empirical moments (including autocorrelations) may be distorted. As such, direct evaluation of whether a Cox model is a good candidate is difficult. Instead, in this paper the choice of candidate is driven by theoretical properties and goodness-of-fit analysis of the model (as illustrated later in this paper with residual analysis). In our illustration, we investigated and were able to explain the non-stationary patterns of the claim intensity process resulting from exposure, seasonal patterns and trends.

2.2. A bottom-up approach of common shot constructions with Lévy copulas
Definition 2.1.2 defines a general common shock model. This allows for dependency constructions of higher dimensions, which is an extension of Example 1. However, such a common shock representation of the dependency structure faces several challenges. Firstly, the practical application of multivariate reserving models typically adopts a bottom-up approach in model construction and calibration. Such an approach involves the modelling and estimation of the losses of marginal LoBs before a dependency structure is introduced (see page 221, McNeil, Frey, and Embrechts, 2015, for the definition of a bottom-up approach). However, this is not directly available to a common shock model, where the specification of the dependency structure is implied in a multivariate set-up. Secondly, the direct construction of a common shock model may lead to complexity of model assumptions and a high number of parameters. This is particularly relevant in high dimensions, where there will be up to $2^G - 1$ distinct shot arrival processes ($G$ unique and $2^G - G - 1$ common ones, together with the associated jump sizes) to specify.

A Lévy copula approach addresses these challenges: the marginal shot noise processes are specified first, and their jump measures (that is, the arrival of jumps and the associated jump sizes) are then coupled by a Lévy copula. Such a dependency relationship drives the dependency across the marginal Cox processes. A real world interpretation of this mechanism is that the dependency of multiple insurance count processes does not arise directly from common arrivals of claims; rather, it is the dependency in the underlying risk generating regime (e.g. the intensity processes in this paper) that creates the dependency of claim counts. Continuing on Example 1, Example 2 introduces the bivariate Clayton Lévy copula, which we will use in Section 4.

Example 2.
A bivariate Lévy copula $C$ can be used to couple the marginal processes together via
$$U(x_1, x_2) = C(U_1(x_1), U_2(x_2)), \qquad (2.9)$$
where $U_g(x_g) = \rho_g (1 - F_g(x_g))$ denotes the $g$th tail integral of the marginal process ($g = 1, 2$) and $U(x_1, x_2) = \rho^{\parallel} (1 + F^{\parallel}(x_1, x_2) - F^{\parallel}(x_1, \infty) - F^{\parallel}(\infty, x_2))$ denotes the bivariate tail integral of the joint bivariate process.

A possible choice for $C$ is the bivariate Clayton Lévy copula (Cont and Tankov, 2004), which is defined as
$$C(u_1, u_2) = (u_1^{-\delta} + u_2^{-\delta})^{-1/\delta}, \quad \delta > 0. \qquad (2.10)$$
It is worth emphasising that, while there is a relationship between the copula of the common jumps induced by a Clayton Lévy copula and the Clayton distributional copula, a Clayton Lévy copula is not a Clayton copula. A Clayton copula is a copula function for multivariate random variables, whereas a Clayton Lévy copula (which is not a copula function) aggregates multiple Lévy processes.

3. Parameter estimation and prediction

In this section, we explain how one can estimate the parameters of the multivariate shot noise model. This is particularly essential with a Cox process approach, where the joint intensity is unobservable and hence the model cannot be calibrated with a maximum likelihood approach. Furthermore, the estimation of the parameters and the underlying intensity is necessary for the projection of future claim counts, which is important in the context of valuing future insurance liabilities.

We start by introducing notation and likelihood in Section 3.1, before developing an EM algorithm with a reversible jump Markov chain Monte Carlo (“RJMCMC”) filter. We extend the estimation algorithm in Avanzi, Wong, and Yang (2016c) to allow for a multivariate shot noise process. We explain our filtering algorithm with fixed parameters in Section 3.2 and develop a three-step EM algorithm to update the parameter estimates in Section 3.3.
For simplicity, we illustrate the filtering and parameter estimation procedures in a trivariate context. These can be adapted to higher dimensions with a similar procedure.
3.1. Notation and likelihood

Suppose that the overall observation period (that is, the period for which policy and claim data are extracted) is $[0, T]$, which is discretised into $L$ sub-periods of equal length $\Delta$. We say that a claim arrives in the $i$th accident period if the arrival time of the claim falls into $((i-1)\Delta, i\Delta]$. This set of observations is denoted
$$N_{GD} = \{N_{g,i};\ g = 1, 2, 3,\ i = 1, 2, \ldots, L\}. \qquad (3.1)$$
Furthermore, we characterise the trajectory of the trivariate shot noise process by
$$\theta_G = \{(\tilde{\lambda}_1(0), \tilde{\lambda}_2(0), \tilde{\lambda}_3(0)), \tau_1, \ldots, \tau_n, X_1, \ldots, X_n\}, \qquad (3.2)$$
where $\tau_i$ and $X_i$ denote the arrival time and trivariate sizes of the $i$th shot, respectively. As mentioned in the explanation for Definition 2.1.2, each $X_i$ is a trivariate random variable where one or more marginals can be 0. Note that we are modelling the intensity of the claim count process here (not the counts themselves), so that all components of $\theta_G$ are unobservable.

In the rest of this section, we will derive the log-likelihood functions for the trajectory of a trivariate shot noise process and the conditional observations of claim counts given the shot noise trajectory.

Denote by $\delta$ the parameter vector of the Lévy copula. The log-likelihood of $\theta_G$ is
$$\log p(\theta_G; \rho_1, \rho_2, \rho_3, \eta_1, \eta_2, \eta_3, \kappa_1, \kappa_2, \kappa_3, \delta) = n \log \rho_a - \rho_a T + \sum_{j=1}^{n} \log f_a(X_j) + \sum_{g=1}^{3} \left[ \left( \frac{\rho_g}{\kappa_g} - 1 \right) \log \tilde{\lambda}_g(0) - \eta_g \tilde{\lambda}_g(0) + \frac{\rho_g}{\kappa_g} \log \eta_g - \log \Gamma\!\left( \frac{\rho_g}{\kappa_g} \right) \right], \qquad (3.3)$$
where each term in the summation refers to the log-likelihood of the corresponding initial value of a marginal shot noise process, which is chosen to be the corresponding stationary distribution of each univariate process, that is, a Gamma random variable (see Centanni and Minozzo, 2006a,b), and where $\Gamma(\cdot)$ is the Gamma function.
Conditional on the trajectory of the multivariate shot noise process, the conditional log-likelihood of our observations is
$$\log L(N_{GD} \mid \theta_G; \kappa_1, \kappa_2, \kappa_3) = \sum_{g=1}^{3} \sum_{i=1}^{L} \log A(N_{g,i}, M_{g,i}; \kappa_g) = \sum_{g=1}^{3} \sum_{i=1}^{L} \left( -M_{g,i} + N_{g,i} \log M_{g,i} \right) + \text{constant}, \qquad (3.4)$$
where
$$M_{g,i} = \int_{(i-1)\Delta}^{i\Delta} \lambda_g(t) \, \mathrm{d}t, \qquad A(N_{g,i}, M_{g,i}; \kappa_g) = \frac{e^{-M_{g,i}} M_{g,i}^{N_{g,i}}}{N_{g,i}!}. \qquad (3.5)$$
Here the function $A(m, M_{g,i}; \kappa_g)$ calculates the probability of having $m$ ultimate claims, which comes from a Poisson probability mass function of intensity $M_{g,i}$. This function depends on $\kappa_g$ through the integral of $\lambda_g(t)$. Note that the format of the conditional likelihood function (3.4) results from the discretisation scheme introduced at the beginning of this section. Such a discretisation scheme accommodates the discrete nature of real data, which is an essential step in applying a continuous time Cox process (see also Avanzi, Wong, and Yang, 2016c).

3.2. A filtering algorithm with fixed parameters

The issue we face is to derive the conditional distribution of the unobservable component, $\theta_G$, given the knowledge of $N_{GD}$ (which is observable), which is a filtering problem (Centanni and Minozzo, 2006a,b). Since we can obtain the likelihood of the shot noise process itself (Equation (3.3)) and the conditional likelihood of $N_{GD}$ given $\theta_G$ (Equation (3.4)), we can adopt an MCMC algorithm. However, since the number of shots is unknown, the dimension of $\theta_G$ is also undetermined. We thus adopt a RJMCMC algorithm (Green, 1995) to allow for ‘moves’ with dimension changing. We will briefly outline the general steps of a RJMCMC simulation algorithm.
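The likelihood (3.4) is driven by the integrals $M_{g,i}$ in (3.5), and these have a closed form because the intensity decays exponentially between shots. A minimal sketch (taking the exposure as constant over each sub-period is an assumed simplification for illustration; parameter values are illustrative):

```python
import math

def integrated_intensity(a, b, lam0, kappa, shots, w=1.0):
    """Closed-form M = w * integral of the shot noise intensity (2.1) over [a, b],
    assuming a constant exposure w on [a, b] (an illustrative simplification)."""
    m = lam0 * (math.exp(-kappa * a) - math.exp(-kappa * b)) / kappa
    for tau, x in shots:
        if tau < b:                  # shots at or after b do not contribute
            s = max(a, tau)          # each shot contributes from max(a, tau) onwards
            m += x * (math.exp(-kappa * (s - tau)) - math.exp(-kappa * (b - tau))) / kappa
    return w * m

def poisson_loglik(n, m):
    """Log of the Poisson mass A(n, M) in (3.5)."""
    return -m + n * math.log(m) - math.lgamma(n + 1)

# with no shots and lam0 = kappa = 1, M over [0, 1] is 1 - e^{-1}
M = integrated_intensity(0.0, 1.0, lam0=1.0, kappa=1.0, shots=[])
```

Summing `poisson_loglik` over lines of business and sub-periods reproduces the double sum in (3.4) up to the constant term.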
One can refer to Gelman, Jones, Brooks, and Meng (2011, Chapter 3) for more details about RJMCMC simulation, and to Centanni and Minozzo (2006a,b) and Avanzi, Wong, and Yang (2016c) regarding the use of a RJMCMC filter in the univariate case of a shot noise Cox process.

A RJMCMC simulation algorithm helps approximate the conditional distribution of the shot noise process given the observations. Firstly, one randomly chooses a move type with probability $p(r \mid n)$ (with $\sum_r p(r \mid n) = 1$), which depends on the existing number of shots. Given a chosen move type $r$ and the existing shot noise process $\theta_G$, one proposes a new shot noise trajectory by generating a random component $u$ with a proposal distribution $q(u \mid r, n, \theta_G)$. This proposal, once accepted, results in $\theta'_G$ with $n'$ shots. The probability of accepting the proposal is $\min(1, \alpha[(n, \theta_G), (n', \theta'_G)])$, where $\alpha[(n, \theta_G), (n', \theta'_G)]$ is calculated according to (3.6) and can be further decomposed into the product of the likelihood ratio, prior ratio, proposal ratio, and Jacobian:
$$\alpha[(n, \theta_G), (n', \theta'_G)] = \underbrace{\frac{L(N_{GD} \mid \theta'_G)}{L(N_{GD} \mid \theta_G)}}_{\text{likelihood ratio}} \times \underbrace{\frac{p(\theta'_G)}{p(\theta_G)}}_{\text{prior ratio}} \times \underbrace{\frac{p(r' \mid n')\, q(u' \mid r', n', \theta'_G)}{p(r \mid n)\, q(u \mid r, n, \theta_G)}}_{\text{proposal ratio}} \times \underbrace{\left| \frac{\partial f_{r,n}(\theta_G, u)}{\partial (\theta_G, u)} \right|}_{\text{Jacobian}}. \qquad (3.6)$$

We introduce five move types, namely s, p, h, b and d. For simplicity, we choose $p(r \mid n) = 0.2$ for $r = s, p, h, b, d$ and $n \geq 1$. Furthermore, we assume that $p(s \mid 0) = p(b \mid 0) = 0.5$, that is, a move can only be either of type s or type b (with equal probabilities) if there is no existing shot. The algorithm is similar to that in Centanni and Minozzo (2006a,b) and Avanzi, Wong, and Yang (2016c), except that we now look at the multivariate case. Furthermore, we adopt the dash symbol ($'$) for all variables related to the proposed shot noise trajectory (e.g. $n'$ refers to the number of shots in the proposed state).

We start by introducing three moves that do not involve dimension changing, namely moves s, p and h. In other words, we have $n = n'$.

Move s proposes to change the initial values of the multivariate shot noise process. For each $g$th marginal shot noise process ($g = 1, 2, 3$), a new initial value, $\tilde{\lambda}'_g(0)$, is drawn from the Gamma distribution with parameters $(\rho_g/\kappa_g, \eta_g)$. The prior ratio is calculated as
$$\prod_{g=1}^{3} e^{-\eta_g \left( \tilde{\lambda}'_g(0) - \tilde{\lambda}_g(0) \right)} \left( \frac{\tilde{\lambda}'_g(0)}{\tilde{\lambda}_g(0)} \right)^{\frac{\rho_g}{\kappa_g} - 1} \qquad (3.7)$$
and the proposal ratio as
$$\prod_{g=1}^{3} e^{-\eta_g \left( \tilde{\lambda}_g(0) - \tilde{\lambda}'_g(0) \right)} \left( \frac{\tilde{\lambda}_g(0)}{\tilde{\lambda}'_g(0)} \right)^{\frac{\rho_g}{\kappa_g} - 1}. \qquad (3.8)$$

Move p proposes to change the position of an existing shot. This involves choosing an existing shot by generating a value, denoted by $n^*$, from the discrete uniform distribution over $\{1, \ldots, n\}$. Then the position of the $n^*$th shot, $\tau_{n^*}$, is proposed to be changed to $\tau'_{n^*}$. The proposal follows a continuous uniform distribution over $(\tau_{n^*-1}, \tau_{n^*+1})$, where $\tau_0 = 0$ and $\tau_{n+1} = T$. The prior ratio of this move is 1, since the location of the shot is irrelevant to the prior likelihood (see Equation (3.3)). Furthermore, one can also show that the proposal ratio is 1, which is due to the use of the uniform distribution in the proposal.

Move h proposes to change the size (height) of an existing shot. Similar to move p, move h requires selecting an existing shot $n^*$ from the discrete uniform distribution over $\{1, \ldots, n\}$.
Then the size of the n ∗ th shot, X n ∗ , is proposed to be changed to X (cid:48) n ∗ . The proposal follows the mixed density function (2.7).The prior ratio is f X ( X (cid:48) n ∗ ) f X ( X n ∗ ) (3.9)and the proposal ratio is f X ( X n ∗ ) f X ( X (cid:48) n ∗ ) . (3.10)One may notice that the product of the proposal ratio and prior ratio is always 1 for moves s, p andh. This is because we choose the proposal distribution to follow the prior knowledge of the shot noisetrajectory (given a fixed number of shots). Furthermore, the Jacobian is always 1 for these three movessince the dimension of θ G does not change.Now we introduce two moves that change the dimension of θ G . Firstly, we have move b that gives birthto a new shot. This involves drawing a new position τ ∗ from a continuous uniform distribution over [0 , T ].Denote by n ∗ the integer such that τ n ∗ − < τ ∗ < τ n ∗ +1 with τ = 0 and τ n +1 = T . In this case, we have n (cid:48) = n + 1. Furthermore, the size of this shot, X (cid:48) n ∗ , is simulated from the (mixed) density function f X .Furthermore, we have a move type d that delete an existing shot. This includes drawing n ∗ from a discreteuniform distribution from { , . . . , n } and the n ∗ th shot is deleted. Then we have n (cid:48) = n − ρf X ( X (cid:48) n ∗ ) , (3.11)and the proposal ratio of move b is p ( d | n + 1) p ( b | n ) (1 + n ) − T − f X ( X (cid:48) n ∗ ) . (3.12)For move d, the prior ratio is ( ρf X ( X n ∗ )) − , (3.13)and the proposal ratio is q ( b | n − q ( d | n ) T − f X ( X n ∗ ) n − . (3.14)One can show that the Jacobian for moves b and d is still 1. The likelihood ratio involved in (3.6) can beobtained from (3.4). It is worth mentioning that this is based on the observations of N GD and hence involvesthe (non-stationary) risk exposure (as part of calculating M l for l = 1 , . . . 
, L). We have summarised the various move types in Table 1.

Table 1: Types of moves. For each move type ('r'), we give the proposal of the next state and the selection probability $p(r \mid n)$:
— s: modifying the initial values $(\tilde{\lambda}'_1(0), \tilde{\lambda}'_2(0), \tilde{\lambda}'_3(0))$ by drawing each $\tilde{\lambda}'_g(0)$ from the stationary distribution of the $g$th marginal ($g = 1, \ldots, 3$); $p(s \mid n)$: 0.5 ($n = 0$), 0.2 ($n > 0$);
— b: drawing a new position $\tau^*$ uniformly from $(0, t]$ and drawing a new jump height $X'_{n^*}$ from the shot size distribution $f_X$; $p(b \mid n)$: 0.5 ($n = 0$), 0.2 ($n > 0$);
— h: selecting a shot $j$ from the discrete uniform distribution over $\{1, \ldots, n\}$ and drawing $X'_j$ from the shot size distribution $f_X$ to replace $X_j$; $p(h \mid n)$: 0 ($n = 0$), 0.2 ($n > 0$);
— p: selecting a shot $j$ from the discrete uniform distribution over $\{1, \ldots, n\}$ and drawing a new position, $\tau'_j$, uniformly over $(\tau_{j-1}, \tau_{j+1})$ (where $\tau_0 = 0$ and $\tau_{n+1} = t$); $p(p \mid n)$: 0 ($n = 0$), 0.2 ($n > 0$);
— d: selecting $j$ from the discrete uniform distribution over $\{1, \ldots, n\}$ and deleting the $j$th shot; $p(d \mid n)$: 0 ($n = 0$), 0.2 ($n > 0$).
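The move-selection probabilities $p(r \mid n)$ of Table 1 can be encoded directly; a minimal sketch (the function name is ours):

```python
def move_probabilities(n):
    """p(r|n) from Table 1: with no existing shot, only moves s and b
    are possible (with equal probabilities); otherwise all five move
    types are equally likely."""
    if n == 0:
        return {'s': 0.5, 'b': 0.5, 'h': 0.0, 'p': 0.0, 'd': 0.0}
    return {'s': 0.2, 'b': 0.2, 'h': 0.2, 'p': 0.2, 'd': 0.2}

print(move_probabilities(0)['d'])  # 0.0 (cannot delete a non-existent shot)
```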
An Expectation Maximisation ("EM") algorithm iteratively updates the parameter estimates in the presence of incomplete observations (see Rydén, 1996, for more details). In particular, a Monte Carlo Expectation Maximisation ("MCEM") algorithm is adopted, where the conditional expectation in the E-step is approximated through simulations. Avanzi, Wong, and Yang (2016c) used an MCEM algorithm to estimate the shot noise parameters in a univariate scenario. Here, we extend the algorithm to the multivariate case. In particular, we follow the idea of Inference Functions for Margins ("IFM") in the context of copula fitting (see Chapter 10 of Joe, 1997, for more details) and separate the estimation of the parameters of the univariate components from that of the dependency structure.

Firstly, one starts by calibrating the marginal Cox processes. For each marginal process, we follow the algorithm of Avanzi, Wong, and Yang (2016c) to estimate the shot noise parameters. Secondly, we proceed to estimating the parameters of the Lévy copula with an MCEM algorithm while the parameters of the marginal processes are held fixed. We explain the full details of this MCEM algorithm in the rest of this section.

The initial estimates of the Lévy copula parameters $\gamma$, denoted by $\gamma_0$, are obtained via moment matching based on Equation (2.4) (see Example 1). Such moment matching estimation can be applied to a bivariate Lévy copula and, more generally, to a higher-order nested Archimedean Lévy copula. With a higher-order nested Archimedean Lévy copula, one can calculate the covariances between claim frequencies, which can be used to calculate the parameters given the corresponding bivariate marginal Lévy copulas (see Avanzi, Tao, Wong, and Yang, 2016a).

The initial estimate of the multivariate trajectory of the unobservable shot noise process is obtained based on the estimates of the marginal intensities.
The latest estimate of each marginal shot noise process is treated as a multivariate shot noise process in which the other marginals are 0. One can therefore combine all the marginal estimates and create a multivariate shot noise process in which each shot is initially unique. Shots are then merged whenever the arrival times of two shots are less than a threshold apart (chosen as 0.01 of a day in this project), with the size of the resulting shot being simply the sum of the sizes of the two individual shots. Such a procedure creates a reasonable initial estimate that utilises the results of the univariate estimation. An illustration of this procedure in a bivariate case is provided in Example 3.

Example 3.
In Table 2 we illustrate how initial estimates of a bivariate trajectory of the unobservable shot noise process can be obtained based on two marginal intensities. Here, the threshold used is 0.5.
Marginal 1
arrival time    size of shot
0               1
0.7             1.5
2.8             1.2

Marginal 2
arrival time    size of shot
0               2
1.5             1
3.2             1.9

Initial estimate of the bivariate trajectory
                size of shots
arrival time    Marginal 1    Marginal 2
0               1             2
0.7             1.5           0
1.5             0             1
3.2             1.2           1.9

Table 2: Example 3; illustration of the initial guess of a bivariate shot noise trajectory

Once the initial estimates are obtained, one can follow this procedure:
— in the $k$th iteration, generating a large number of RJMCMC simulations given $\gamma_k$ (where $\gamma_0$ refers to the initial estimates) based on the algorithm developed in Section 3.2;
— approximating the conditional expectation

Q(\gamma, \gamma_k) = E_{\gamma_k} [\log L(N^{GD}, \theta_G; \gamma) \mid N^D]    (3.15)

as an average of the conditional likelihoods given the RJMCMC simulations;
— deriving the parameter estimates $\gamma_{k+1}$ such that $\gamma_{k+1}$ maximises the conditional expectation $Q(\gamma, \gamma_k)$.
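The merging step used to build the initial multivariate trajectory (illustrated in Example 3) can be sketched as follows. This is our own minimal implementation, assuming shots are merged greedily in time order with the later arrival time retained; it reproduces the figures in Table 2.

```python
def merge_marginal_shots(marginals, threshold=0.5):
    """Combine per-marginal lists of (arrival time, shot size) into one
    multivariate trajectory, merging shots whose arrival times are within
    `threshold` of each other; sizes are kept per marginal."""
    G = len(marginals)
    # Flatten to (time, marginal index, size), sorted by arrival time
    shots = sorted((t, g, x) for g, m in enumerate(marginals) for t, x in m)
    merged = []
    for t, g, x in shots:
        if merged and t - merged[-1][0] <= threshold:
            merged[-1][0] = t            # keep the later arrival time
            merged[-1][1][g] += x        # add the size to that marginal
        else:
            sizes = [0.0] * G
            sizes[g] = x
            merged.append([t, sizes])
    return [(t, sizes) for t, sizes in merged]

# Reproduces the initial estimate in Table 2 (threshold 0.5)
marginal_1 = [(0.0, 1.0), (0.7, 1.5), (2.8, 1.2)]
marginal_2 = [(0.0, 2.0), (1.5, 1.0), (3.2, 1.9)]
print(merge_marginal_shots([marginal_1, marginal_2]))
# [(0.0, [1.0, 2.0]), (0.7, [1.5, 0.0]), (1.5, [0.0, 1.0]), (3.2, [1.2, 1.9])]
```

Note that the shots at 2.8 (Marginal 1) and 3.2 (Marginal 2) are 0.4 apart, below the threshold of 0.5, and are therefore merged into a single common shot.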
4. Illustrative case study - a bivariate motor insurance dataset
In this section, we illustrate our model and estimation procedure with a real insurance data set. The data set, which is part of the AUSI data set (see Avanzi, Taylor, and Wong, 2016b, for details), consists of observations from a Motor insurance portfolio of a major Australian general insurer. We segment the Motor insurance portfolio by state and illustrate how our methodology can be used to understand the dependency of the claim counts between the states of New South Wales ("NSW") and Victoria ("VIC"). Although segmentation by geographical area within the same insurance product is not common in practice, we aim to demonstrate our model and methodology by utilising the available data, and one can apply similar procedures to reserving across multiple LoBs. Furthermore, we choose NSW and VIC because they are two adjacent large states in Australia (in terms of exposure), and contribute more than half of the national population. Here we use observations of insurance claims and policy information from 01/January/2006 to 31/December/2010. We performed the following manual adjustments to the data set:
— We excluded claims resulting from catastrophe events. Although catastrophe events explicitly contribute to the dependency of insurance claims in neighbouring states, this source of dependency can be observed and hence explicitly modelled. Indeed, the modelling of catastrophe events benefits from knowledge that is beyond the actuarial field, and is typically conducted with separate catastrophe models in practice (see e.g. Grossi and Kunreuther, 2005). In the AUSI data set, catastrophe claims are identified with non-empty catastrophe flags;
— We excluded policy records with non-positive gross premium, total sum insured or total excess;
— We excluded invalid policy records (for example, where inception dates are after expiry dates and/or inception dates are no earlier than year 9999).
This only corresponds to a negligible portion of the data set.

In Section 4.1, we explain how we allow for covariates in modelling claim counts. Sections 4.2 and 4.3 illustrate the procedures and provide the results of the univariate and bivariate fitting.

4.1. Exposure and covariates adjustment
Figures 2 and 3 present, for both states, the time series of daily claim counts, the numbers of policyholders, the auto-correlation of daily claim counts (standardised by numbers of daily policyholders), and the auto-correlation of weekly claim counts (standardised by numbers of weekly policyholders). This is to account for the non-constant risk exposure of total claim counts and hence to reveal a clearer picture of potential trends, weekly patterns and seasonal patterns of claim frequencies.

There are a few key observations. Firstly, Figures 2a and 3a present the reported claim counts per accident day along with the numbers of policies. The daily auto-correlation plots (see Figures 2b and 3b) reveal strong weekly cycles for both states. Furthermore, the weekly auto-correlation plots (see Figures 2c and 3c) suggest significant annual cycles for both states.

The existence of strong empirical autocorrelations (see, e.g., Figures 2b and 3b) means one should investigate the seasonality patterns in the data. Furthermore, we have also noticed the presence of potential trends in the data. We have carried out an empirical investigation comparing the average claim count per policy across (1) different days of the week, (2) different months, and (3) major Australian holidays. The bar charts in Figures 2f and 2g show the empirical monthly and weekly patterns of (standardised) claim counts per policy for NSW, and the bar charts in Figures 3f and 3g show those for VIC.

Let us further specify the risk exposure, $W_g(t)$ (where $g$ = NSW, VIC), as

\log W_g(t) = \log(\text{number of policies in force at time } t) + f_g(t),    (4.1)

where a deterministic function of time, $f_g(t)$, is adopted to capture time covariates (e.g. seasonality, trends, etc.).
The particular form of $f_g(t)$ depends on the features of the data. In this project, we adopt an additive structure for $f_g(t)$ such that

f_g(t) = x_g(t) a_g,    (4.2)

where $a_g$ is a vector of covariate coefficients and $x_g(t)$ is a matrix of the covariates (with each row representing an accident day and each column representing a covariate). Note that we assume both the risk exposure $W_g(t)$ and the covariates $x_g(t)$ are piece-wise constant over daily intervals. This is a natural assumption given that daily observations are the most granular level of information one can possibly obtain in practice.

Joint estimation of the seasonality component of the function $f_g$ and the unobservable shot noise component of $\tilde{\lambda}_g(t)$ ($g$ = NSW, VIC) involves a large number of parameters. We propose to estimate the seasonality parameters prior to the shot noise parameters. This is achieved by fitting a Poisson GLM to the daily claim counts. This effectively approximates the Cox model by replacing the shot noise component $\tilde{\lambda}_g(t)$ with a constant $c_g$, along with the following discretisation:

\log(\tilde{\lambda}_g(\lfloor t \rfloor + 1)) = \log(W_g(\lfloor t \rfloor + 1)) + x_g(\lfloor t \rfloor + 1) a_g + c_g,    (4.3)

where $\lfloor t \rfloor$ is the integer part of $t$ (in days). Such a method effectively fits a Poisson GLM with a log-link function to the daily claim count data.

There can potentially be a large number of parameters in the specification of the seasonality components, in particular for the monthly patterns. We have attempted to reduce the number of monthly covariates in two different ways. The first method applies a linear spline model, aimed at capturing the change in monthly behaviour of claim frequencies with a simple and continuous parametric form. The second method involves grouping months (and hence requires a benchmark month). Both methods were used, and we selected the one that performed better based on the AIC.
It turns out that the method of grouping works better for both the claim processes of NSW and VIC. We have also attempted to reduce the number of weekly covariates by grouping days of the week; however, it turns out that having 6 parameters for the weekly pattern outperforms grouping (in terms of the AIC) for both states. The results of fitting the monthly and weekly patterns are presented as the dark lines in Figures 2f and 3f, and as the dark dots in Figures 2g and 3g.
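The covariate-adjustment step, a log-link Poisson GLM in which the log of the exposure enters as an offset in the spirit of (4.3), can be sketched with a small IRLS routine. This is our own minimal implementation with hypothetical data (in practice one would use a GLM library, which would also provide the AIC used above for covariate selection):

```python
import numpy as np

def fit_poisson_glm(X, y, offset, n_iter=50):
    """IRLS for a log-link Poisson GLM: E[y] = exp(offset + X @ beta).
    `offset` plays the role of the log risk exposure."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(offset + X @ beta)
        # Working response (offset removed), working weights W = mu
        z = X @ beta + (y - mu) / mu
        WX = X * mu[:, None]
        beta = np.linalg.solve(X.T @ WX, WX.T @ z)
    return beta

# Hypothetical daily data: an intercept plus one seasonal covariate
t = np.linspace(-1.0, 1.0, 200)
X = np.column_stack([np.ones_like(t), np.sin(2 * np.pi * t)])
offset = np.log(np.full(200, 100.0))        # log(policies in force)
beta_true = np.array([0.1, 0.3])
y = np.exp(offset + X @ beta_true)          # noise-free means, for checking
beta_hat = fit_poisson_glm(X, y, offset)
print(np.allclose(beta_hat, beta_true, atol=1e-8))  # True
```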
Figure 2: Summary of the claim arrival process of the Motor LoB in NSW. Panels: (a) daily claim count (dark) and policy count (grey); (b) auto-correlation function, daily, raw; (c) auto-correlation function, weekly, raw; (d) auto-correlation function, daily, after exposure adjustment; (e) auto-correlation function, weekly, after exposure adjustment; (f) average claim count per policy, monthly pattern (bar chart: empirical patterns, dark lines: fitted patterns); (g) average claim count per policy, weekly pattern (bar chart: empirical patterns, dark dots: fitted patterns).
We adjust the empirical autocorrelation plots by examining daily claim counts per exposure. The empirical autocorrelations after adjustment are plotted in Figures 2d, 2e, 3d and 3e. Compared to those before adjustment, both states display significantly reduced levels of autocorrelation. This suggests that we have modelled away most of what could be explained with our covariates, as would be expected for any appropriate model, and we are now ready to move to the next stage and fit a dependence structure (see Avanzi et al., 2016b, for a detailed discussion of the benefits and requirements of modelling trends before applying a dependence structure).

Figure 3: Summary of the claim arrival process of the Motor LoB in VIC. Panels: (a) daily claim count (dark) and policy count (grey); (b) auto-correlation function, daily, raw; (c) auto-correlation function, weekly, raw; (d) auto-correlation function, daily, after exposure adjustment; (e) auto-correlation function, weekly, after exposure adjustment; (f) average claim count per policy, monthly pattern (bar chart: empirical patterns, dark lines: fitted patterns); (g) average claim count per policy, weekly pattern (bar chart: empirical patterns, dark dots: fitted patterns).
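The empirical autocorrelations shown in these plots, before and after exposure adjustment, can be computed with a short routine. A minimal numpy sketch (the function is ours), applied here to an artificial series with a strict weekly cycle:

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation of a series at lags 1..max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    denom = np.dot(x, x)
    return np.array([np.dot(x[:-k], x[k:]) / denom
                     for k in range(1, max_lag + 1)])

# Artificial daily count series with an exact period-7 (weekly) cycle:
# the autocorrelation spikes at lag 7, as in Figures 2b and 3b
x = np.tile(np.arange(7.0), 30)
a = acf(x, max_lag=14)
print(a[6] > 0.9 and a[6] > a[0])  # True: the lag-7 spike dominates lag 1
```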
The fitting of Lévy copulas to ‘assemble’ processes is separate from the fitting of the marginal processes. Hence, we start by modelling our margins. The calibration of each univariate Cox process follows a similar procedure to that of Avanzi, Wong, and Yang (2016c), which includes the following steps.

The initial estimates for the shot noise parameters are obtained by matching of moments, followed by jointly updating both the reporting delay and shot noise parameters through MCEM algorithms. This involves 150 MCEM iterations with 20,000 RJMCMC simulations; we select 100 simulations from the second half of each iteration in evaluating the M-step of the MCEM algorithm.

The final parameter estimates are summarised in Table 3. Here the parameters $\rho$, $\eta$ and $\kappa$ fully specify the marginal shot noise processes (see Theorem 2.1.2), and the implied moments are presented in Table 4. Figure 4 presents the relative change of parameter estimates through EM iterations, which shows that the EM estimates are quite stable for all parameters.

States    ρ        η       κ
NSW       33.77    0.17    2.37
VIC       18.74    0.18    1.28
Table 3: Parameter estimates of the univariate shot noise Cox processes

Figure 4: Relative changes of parameter estimates through EM iterations. Top: NSW ((a) ρ_NSW, (b) η_NSW, (c) ρ_NSW/κ_NSW); bottom: VIC ((d) ρ_VIC, (e) η_VIC, (f) ρ_VIC/κ_VIC).
States    mean     variance    auto-correlation at 1-day lag
NSW       85.22    350.78      0.21
VIC       80.74    384.33      0.37
Table 4: Implied moments of claim count per accident day per person based on estimated parameters
Figures 5a and 5b present the assessment of goodness-of-fit via an analysis of the standardised residuals of the estimated daily claim counts (assuming that a claim count is Poisson distributed given the filtered intensity). For both states, the standard deviations of the residuals are close to 1 and the means are close to 0, which indicates a satisfactory level of goodness-of-fit. Furthermore, the autocorrelations of the residuals are also close to 0 for both states, which shows that the shot noise Cox model is able to capture the serial dependency of claim counts.

Figure 5: Residual analysis of the independent shot noise Cox process fitting. Panels: (a) histogram of residuals and fitted normal density, NSW (mean = -0.0171, sd = 0.9324); (b) histogram of residuals and fitted normal density, VIC (mean = 0.0050, sd = 0.9243); (c) auto-correlation of residuals, NSW; (d) auto-correlation of residuals, VIC.
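The standardised residuals used in Figure 5 can be computed as Pearson residuals, treating the daily count as Poisson with the filtered intensity as both mean and variance. A minimal sketch (the function is ours):

```python
import numpy as np

def standardised_residuals(counts, intensity):
    """Pearson residuals (N - lambda) / sqrt(lambda) for daily counts,
    with the filtered intensity `lambda` as the Poisson mean."""
    counts = np.asarray(counts, dtype=float)
    lam = np.asarray(intensity, dtype=float)
    return (counts - lam) / np.sqrt(lam)

# A perfectly fitted day gives residual 0; one extra claim on a day with
# intensity 4 gives residual 1/sqrt(4) = 0.5
r = standardised_residuals([4, 9, 5], [4, 9, 4])
```

Means close to 0 and standard deviations close to 1 for these residuals, as reported for both states, are consistent with the conditional Poisson assumption.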
The final step is to fit a dependence structure between the marginal processes using a Lévy copula. In this illustration, we use a Clayton Lévy copula. The Lévy copula parameter is updated via 150 MCEM iterations while the marginal parameters of both NSW and VIC are held fixed. In each iteration, there are 20,000 RJMCMC simulations, and we select 100 simulations from the second half of each iteration in evaluating the M-step of the MCEM algorithm. The final estimate is 0.4214, and the relative change of the estimate in each EM iteration is shown in Figure 6.
Figure 6: Relative changes of the estimate of the Lévy copula parameter
Similar to the case of univariate fitting, the goodness-of-fit of the bivariate Cox model is examined by studying the residuals. Here the residuals are defined as the difference between the observed claim counts and the expected claim counts, standardised by the standard deviations, for all accident days. The empirical distributions and auto-correlations of the residuals are presented in Figure 7.

Figure 7: Residual analysis of the bivariate shot noise Cox process fitting. Panels: (a) histogram of residuals and fitted normal density, NSW (mean = 0.0148, sd = 1.0439); (b) histogram of residuals and fitted normal density, VIC (mean = -0.0256, sd = 1.0672); (c) auto-correlation of residuals, NSW; (d) auto-correlation of residuals, VIC.

Figure 8 displays the empirical residual plots of claims of the two states, NSW and VIC, from the bivariate fitting. There is no visually significant dependency structure, which suggests good estimation of the underlying intensities and that the shot noise Cox assumptions are appropriate. In particular, Figure 8a shows that the residuals of the univariate fitting are rather independent. This suggests that assuming no dependency across the Poisson processes, given the intensities, is reasonable. Therefore, despite the fact that the univariate fitting ignores the dependency structure, it filters out the marginal intensities and produces i.i.d. errors. Furthermore, the bivariate filtering decomposes each marginal intensity into a sum of unique and common shot noise.
Figure 8: Empirical copulas of residuals from fitting the claim arrival components to the NSW and VIC data: (a) univariate fitting; (b) bivariate fitting. Absence of any noticeable dependence structure suggests a good fit.
Figure 9 displays the empirical copula of the integrated stochastic intensities (over accident days, free of risk exposure) of both states. It is aimed at presenting the dependency between the integrated intensities (that is, the frequencies of daily claims); the results are therefore filtered intensity processes (which are not observable from data). This is because realised intensities are not directly observable (only claim counts are), so they must be ‘inferred’ from the claim counts through the lens of a specific model, via filtering. Furthermore, a data point in Figure 9 refers to the number of daily integrated intensities that fall into each cell, where the integrated intensities are standardised to [0, 1]. In particular, Figures 9a and 9b display the copulas of integrated intensities from the unique jumps and common jumps respectively. This illustrates that the Lévy copula structure further decomposes the marginal intensities into two components, where the unique components are independent and there is a significant positive dependency structure between the common components.
Figure 9: Empirical copula of claim counts (integrated shot noise intensities over each accident day) of the NSW and VIC states, bivariate fitting: (a) unique components; (b) common components. This illustrates the dependence that is present in the data, as filtered and interpreted by our model.
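The empirical copula heat maps of Figures 8 and 9 can be produced by mapping each sample to the unit interval via ranks and counting pairs per cell. A minimal numpy sketch (the function is ours, assuming no ties):

```python
import numpy as np

def empirical_copula_grid(u_sample, v_sample, n_bins=10):
    """Rank-transform two samples to (0, 1) and count pairs per cell of
    an n_bins x n_bins grid over the unit square."""
    n = len(u_sample)
    u = (np.argsort(np.argsort(u_sample)) + 1) / (n + 1)
    v = (np.argsort(np.argsort(v_sample)) + 1) / (n + 1)
    grid, _, _ = np.histogram2d(u, v, bins=n_bins, range=[[0, 1], [0, 1]])
    return grid

# Perfectly (comonotonically) dependent samples land on the diagonal only,
# whereas independent samples spread roughly uniformly over the grid
x = np.arange(100.0)
grid = empirical_copula_grid(x, 2.0 * x + 1.0)
print(np.trace(grid) == grid.sum() == 100)  # True
```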
Remark 4.1.
As discussed earlier in the paper, it is empirically difficult to judge whether a Cox model is a good candidate directly from the data. However, the model developed in this paper has properties displayed by the data (non-stationary exposure, auto-correlation, . . . ), and it seems that, once fitted, it can explain most of those convincingly. While we cannot guarantee that our model is the best, we are convinced it is of reasonable quality for this data set.

4.4. Prediction results and discussion
Given the filtered multivariate intensity, one can investigate further how the unique and common shots contribute to the integrated stochastic intensities. The results in Table 5 indicate substantial weights from common shocks in both marginal intensities. For VIC, the common events contribute 49.30% of the stochastic intensity of the claim arrival process. Here, 49.30% does not refer to the number of events; instead, it refers to the combined effects of the frequency, severity and decay of the shot events on the stochastic intensity of claim arrivals in the state of VIC.

                                       NSW       VIC
Contribution from unique shot noise    67.54%    50.70%
Contribution from common shot noise    32.46%    49.30%
Table 5: Decomposition of the integrated intensities
We predicted the distributions of total future claim counts over the next year (assuming constant risk exposures) through 100,000 simulations. Note that the initial value of the shot noise trajectory is chosen as the filtered shot noise intensity at the end of the observation period; hence the simulations of future scenarios are implicitly based on the latest status of the shot noise development. Figure 10 visualises the bivariate histogram of the copulas between claim counts in the two cases. Here the colour intensity in each cell refers to the number of simulations where the bivariate claim count falls into a specific joint quantile range. It shows that there is significant evidence of dependency, which is consistent with the results in Table 5. This is further confirmed by the numerical dependency measures in Table 6, where all the measures are material and statistically significant (with all p-values being almost 0). Note that the values in Table 6 should not be compared directly to dependency measures of real data. This is due to the non-constant risk exposures and also the serial dependency of claim counts over time, which means the i.i.d. assumption on claim counts in constructing the dependency measures is not valid.

The dependency measures in Table 6 refer to the dependency structure of the next year, based on a given set of controlled factors. Our methodology provides a statistically sound way of making assumptions about the dependency structures. The implied future dependency measures can then be used in more traditional (and straightforward) methodologies, for example, when a correlation matrix is required to aggregate individual portfolios to a company level. This improves the existing practice of multivariate reserving, where dependency structures are usually based on expert knowledge and industry statistics.
Inparticular, one common approach to estimate risk margin at a company level is by aggregating individualrisk margins of various portfolios with the help of correlations. But correlations typically cannot be inferredfrom data (as 10 years of data would yield only 10 observations, which is insufficient to estimate yearlycorrelations), and are hence generally chosen judgmentally (see also Avanzi, Taylor, and Wong, 2016b, forfurther discussion of this). Our approach provides a promising first step towards estimating dependencymeasures through a more rigorous and objective approach. Indeed, only a few years of data are required tofit a model that can subsequently yield implied dependence measures for any time horizon.Pearson’s correlation Kendall’s tau Spearman’s rho0.3633 0.2355 0.3469 Table 6: Dependency measurement across the bivariate prediction of claims counts
Remark 4.2.
We have also attempted to provide a comparison between our model and the Chain Ladder model; however, we realise that such a comparison can be challenging and may provide misleading results. First of all, the Chain Ladder model is widely applied in a univariate context, which does not accommodate any dependency. From a practical perspective, dependency can be introduced through external knowledge (e.g. expected correlation and/or tail correlation across various types of products) and included in the estimate in a subjective way. Secondly, the Chain Ladder model includes a large number of parameters and is over-parametrised; hence a suitable comparison should consider the additional uncertainties around the central estimates that arise from parameter errors, which requires a more extensive numerical study. Last but not least, the Chain Ladder model is rarely used without subjective judgement in practice, which can reflect knowledge of the product, pricing, and claim operations that is not captured in the data. Hence any like-for-like comparison would require similar judgement processes and a broader range of numerical studies. In summary, there are fundamental differences in the theoretical properties, and in how the models are implemented, between the Chain Ladder model (and most existing reserving models) and our micro-level approach. These facts make it very difficult to conduct a meaningful comparison. On the other hand, the multivariate Cox model can be complementary to the Chain Ladder model. While the Chain Ladder model is commonly used to estimate individual risk margins, our multivariate Cox model enables an understanding of the dependence structure and hence an informed and educated choice of the correlations, as explained above. Such choices can subsequently be used to aggregate the Chain Ladder results.
5. Conclusion
In this paper, we have developed a multivariate
Cox process approach for claim counts in micro-level stochastic reserving. In particular, we developed a multivariate shot noise intensity that introduces common shocks on the intensities of claim arrivals across multiple LoBs with the help of Lévy copulas. Furthermore, we have also developed a filtering algorithm to estimate the underlying joint intensity.

Figure 10: Heat maps of the histograms of the empirical copulas between the simulated claim counts

The independent increment property of a Lévy process means the dependency structure of a multivariate compound Poisson process can only be introduced via a common shock model. However, there is a bijective relationship between a Lévy copula model and a common shock model (see Sklar's theorem for Lévy copulas in Tankov, 2003, for example). Hence, our proposed approach is not actually different from a common shock approach, but it is significantly more tractable.

The model construction and calibration procedures are illustrated with a real bivariate dataset. In particular, we account for time covariates (including weekly patterns, annual patterns, and trends). After allowing for these covariates, the results show that the dependency between the claim arrival processes of the Motor LoB in the NSW and VIC states is still significant. This is expected, due to random phenomena that may affect both states at the same time (e.g., weather events).

Our work focuses on the particular issue of dependency modelling for stochastic reserving. Modellers can apply such a framework to understand the dependency structure between the claim processes of multiple LoBs, and in particular, its impact on the quantiles of the aggregated loss. For example, one implementation of the multivariate Cox process is to derive the implied dependence structure (see e.g. Figure 10), which can be utilised in aggregating individual reserve estimates to a company level. Common practice, such as the Chain-Ladder approach and its variations, still has value in providing estimation of the first moments of the losses of individual LoBs.

One can extend the same idea of model construction and calibration to higher dimensions.
This requires the choice of a higher-dimensional Lévy copula. However, the dependency structure is unlikely to be exchangeable across multiple LoBs. In this situation, one option is to adopt non-exchangeable Lévy copulas (see, for example, Avanzi, Tao, Wong, and Yang, 2016a). The model implementation procedures developed in this paper can be easily extended from our bivariate illustration.

This paper focused on multivariate claim frequencies, including how to reflect exposures (including the number of policies in force, seasonal patterns and trends) in the estimation. Full estimation of reserves (including severities) would require the modelling of claim development patterns and possibly a dependency model for claim severities. While these are very significant and important endeavours, the additional amount of work required to achieve such a comprehensive model is clearly beyond the scope of a single paper.
Acknowledgements
The authors are very grateful for comments from two referees, which led to significant improvements of the paper. Earlier versions of this paper were presented at the 20th International Congress on Insurance: Mathematics and Economics (Atlanta, U.S.A.), at the 3rd European Actuarial Journal Conference (Lyon, France), at the 22nd International Congress on Insurance: Mathematics and Economics (Sydney, Australia), and at the Australian National University Research School of Finance, Actuarial Studies and Statistics 2018 Summer Camp. The authors are grateful for constructive comments received from colleagues who attended those events.

This research was supported under the Australian Research Council's Linkage (LP130100723, with funding partners Allianz Australia Insurance Ltd, Insurance Australia Group Ltd, and Suncorp Metway Ltd) and Discovery (DP200101859) Projects funding schemes. Furthermore, Avanzi acknowledges support from a grant of the Natural Sciences and Engineering Research Council of Canada (project number RGPIN-2015-04975). Finally, Yang acknowledges financial support from an Australian Postgraduate Award and supplementary scholarships provided by the UNSW Business School. The views expressed herein are those of the authors and are not necessarily those of the supporting organisations.
References
Antonio, K., Plat, R., 2014. Micro-level stochastic loss reserving for general insurance. Scandinavian Actuarial Journal 2014.
Arjas, E., 1989. The claims reserving problem in non-life insurance: Some structural ideas. ASTIN Bulletin 19 (2), 139–152.
Avanzi, B., Cassar, L. C., Wong, B., 2011. Modelling dependence in insurance claims processes with Lévy copulas. ASTIN Bulletin 41 (2), 575–609.
Avanzi, B., Tao, J., Wong, B., Yang, X., 2016a. Capturing non-exchangeable dependence in multivariate loss processes with nested Lévy copulas. Annals of Actuarial Science 10 (1), 87–117.
Avanzi, B., Taylor, G. C., Wong, B., 2016b. Correlations between insurance lines of business: An illusion or a real phenomenon? Some methodological considerations. ASTIN Bulletin 46 (2), 225–263.
Avanzi, B., Taylor, G. C., Wong, B., Xian, A., 2020. Modelling and understanding count processes through a Markov-modulated non-homogeneous Poisson process framework. arXiv q-fin.RM 2003.13888.
Avanzi, B., Wong, B., Yang, X., 2016c. A micro-level claim count model with overdispersion and reporting delays. Insurance: Mathematics and Economics 71, 1–14.
Badescu, A., Lin, X., Tang, D., 2016. A marked Cox model for the number of IBNR claims: Theory. Insurance: Mathematics and Economics 69, 29–37.
Badescu, A. L., Lin, X. S., Tang, D., 2019. A marked Cox model for the number of IBNR claims: Estimation and application. ASTIN Bulletin 49 (3), 709–739.
Centanni, S., Minozzo, M., 2006a. Estimation and filtering by reversible jump MCMC for a doubly stochastic Poisson model for ultra-high-frequency financial data. Statistical Modelling 6 (2), 97–118.
Centanni, S., Minozzo, M., 2006b. A Monte Carlo approach to filtering for a class of marked doubly stochastic Poisson processes. Journal of the American Statistical Association 101 (476), 1582–1597.
Cont, R., Tankov, P., 2004. Financial Modelling With Jump Processes. Chapman & Hall/CRC, London.
Friedland, J., 2013. Fundamentals of General Insurance Actuarial Analysis. Society of Actuaries.
Gelman, A., Jones, G., Brooks, S., Meng, X.-L. (Eds.), 2011. Handbook of Markov Chain Monte Carlo. Boca Raton: CRC Press.
Green, P., 1995. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82 (4), 711–732.
Grossi, P., Kunreuther, H., 2005. Catastrophe Modeling: A New Approach to Managing Risk. Huebner International Series on Risk, Insurance and Economic Security.
Huang, J., Qiu, C., Wu, X., 2015. Stochastic loss reserving in discrete time: individual vs. aggregate data models. Communications in Statistics - Theory and Methods 44, 2180–2206.
Huang, J., Wu, X., Zhou, X., 2016. Asymptotic behaviors of stochastic reserving: aggregate versus individual models. European Journal of Operational Research 249, 657–666.
Jessen, A. H., Mikosch, T., Samorodnitsky, G., 2011. Prediction of outstanding payments in a Poisson cluster model. Scandinavian Actuarial Journal 2011.
Jewell, W. S., 1989. Predicting IBNR events and delays. ASTIN Bulletin 19 (1).
Joe, H., 1997. Multivariate Models and Dependence Concepts. Chapman & Hall, London.
Larsen, C. R., 2007. An individual claims reserving model. ASTIN Bulletin 37 (1).
Li, H., 2002. On the Dependence Structure of a System of Components with a Multivariate Shot-Noise Hazard Rate Process. Advances in Mathematics Research. Nova Publishers.
Lindskog, F., McNeil, A. J., 2003. Common Poisson shock models: Applications to insurance and credit risk modelling. ASTIN Bulletin 33 (2), 209–238.
Matsui, M., 2014. Prediction in a non-homogeneous Poisson cluster model. Insurance: Mathematics and Economics 55.
Matsui, M., 2015. Prediction in a Poisson cluster model with multiple cluster processes. Scandinavian Actuarial Journal 2015.
Matsui, M., Mikosch, T., 2010. Prediction in a Poisson cluster model. Journal of Applied Probability 47.
McNeil, A. J., Frey, R., Embrechts, P., 2015. Quantitative Risk Management: Concepts, Techniques and Tools. Princeton, USA: Princeton University Press.
Merz, M., Wüthrich, M. V., 2007. Prediction Error of the Chain Ladder Reserving Method Applied to Correlated Run-off Triangles. Annals of Actuarial Science.
Merz, M., Wüthrich, M. V., 2008. Prediction Error of the Multivariate Chain Ladder Reserving Method. North American Actuarial Journal.
Merz, M., Wüthrich, M. V., 2009a. Combining Chain-Ladder and Additive Loss Reserving Methods for Dependent Lines of Business. Variance.
Merz, M., Wüthrich, M. V., 2009b. Prediction Error of the Multivariate Additive Loss Reserving Method for Dependent Lines of Business. Variance.
Merz, M., Wüthrich, M. V., Hashorva, E., 2013. Dependence modelling in multivariate claims run-off triangles. Annals of Actuarial Science 7, 3–25.
Møller, J., Waagepetersen, R. P., 2004. Statistical Inference and Simulation for Spatial Point Processes. Chapman & Hall/CRC.
Norberg, R., 1993. Prediction of Outstanding Liabilities in Non-Life Insurance. ASTIN Bulletin 23 (1), 95–115.
Norberg, R., 1999. Prediction of Outstanding Liabilities - II. Model Variations and Extensions. ASTIN Bulletin 29 (1), 5–27.
Pigeon, M., Antonio, K., Denuit, M., 2013. Individual loss reserving with the multivariate skew normal framework. ASTIN Bulletin 43.
Pigeon, M., Antonio, K., Denuit, M., 2014. Individual loss reserving using paid-incurred data. Insurance: Mathematics and Economics 58.
Ross, S. M., 2014. Introduction to Probability Models. Academic Press.
Rydén, T., 1996. An EM algorithm for estimation in Markov-modulated Poisson processes. Computational Statistics & Data Analysis 21 (4).
Selch, D. A., Scherer, M., 2018. A Multivariate Claim Count Model for Applications in Insurance. Springer.
Shi, P., Basu, S., Meyers, G. G., 2012. A Bayesian log-normal model for multivariate loss reserving. North American Actuarial Journal 16 (1), 29–51.
Shi, P., Frees, E. W., 2011. Dependent loss reserving using copulas. ASTIN Bulletin 41 (2), 449–486.
Tankov, P., 2003. Dependence structure of spectrally positive multidimensional Lévy processes.
Taylor, G., McGuire, G., Sullivan, J., 2008. Individual Claim Loss Reserving Conditioned by Case Estimates. Annals of Actuarial Science 3 (1-2), 215–256.
Werner, G., Modlin, C., 2010. Basic Ratemaking. Casualty Actuarial Society.
Zhao, X., Zhou, X., 2010. Applying copula models to individual claim loss reserving methods. Insurance: Mathematics and Economics 46 (2), 290–299.
Zhao, X. B., Zhou, X., Wong, J. L., 2009. Semiparametric model for prediction of individual claim loss reserving. Insurance: Mathematics and Economics 45 (1), 1–8.

A. Proof for Example 1
Proof.
Consider the process of the common shots. We have
\[
\tilde{\lambda}^{\parallel}_1(t) = \tilde{\lambda}^{\parallel}_1(0)\, e^{-t\kappa_1} + \sum_{i=1}^{J^{\parallel}(t)} X^{\parallel}_{i,1}\, e^{-(t-\tau^{\parallel}_i)\kappa_1},
\qquad
\tilde{\lambda}^{\parallel}_2(t) = \tilde{\lambda}^{\parallel}_2(0)\, e^{-t\kappa_2} + \sum_{i=1}^{J^{\parallel}(t)} X^{\parallel}_{i,2}\, e^{-(t-\tau^{\parallel}_i)\kappa_2}.
\tag{A.1}
\]
We characterise the trajectory of the bivariate shot noise process by a random vector \(\theta^{\parallel}(t)\), the collection of the number of jumps and the locations and joint severities of the shots up to time \(t\), that is,
\[
\theta^{\parallel}(t) = \left\{ J^{\parallel}(t),\ \tau^{\parallel}_1, \ldots, \tau^{\parallel}_{J^{\parallel}(t)},\ X^{\parallel}_1, \ldots, X^{\parallel}_{J^{\parallel}(t)} \right\}.
\tag{A.2}
\]
Based on the Conditional Covariance Formula (see, for example, Ross, 2014, Chapter 7), we have
\[
\text{Cov}\left( \tilde{\lambda}^{\parallel}_1(t), \tilde{\lambda}^{\parallel}_2(t) \right)
= E\left[ \text{Cov}\left( \tilde{\lambda}^{\parallel}_1(t), \tilde{\lambda}^{\parallel}_2(t) \,\middle|\, J^{\parallel}(t) \right) \right]
+ \text{Cov}\left( E\left[ \tilde{\lambda}^{\parallel}_1(t) \,\middle|\, J^{\parallel}(t) \right], E\left[ \tilde{\lambda}^{\parallel}_2(t) \,\middle|\, J^{\parallel}(t) \right] \right).
\tag{A.3}
\]
Furthermore, conditional on \(J^{\parallel}(t)\) (which is itself a Poisson random variable with intensity \(\rho^{\parallel}\)), the locations of the shots, \(\{\tau^{\parallel}_i \mid J^{\parallel}(t)\}_{i=1,\ldots,J^{\parallel}(t)}\), are independent uniform random variables on \([0,t]\).

Firstly, we derive \(E\big[\text{Cov}\big(\tilde{\lambda}^{\parallel}_1(t), \tilde{\lambda}^{\parallel}_2(t) \mid J^{\parallel}(t)\big)\big]\). We have
\[
\begin{aligned}
E\left[ \text{Cov}\left( \tilde{\lambda}^{\parallel}_1(t), \tilde{\lambda}^{\parallel}_2(t) \,\middle|\, J^{\parallel}(t) \right) \right]
&= E\left[ \text{Cov}\left( \tilde{\lambda}^{\parallel}_1(0) e^{-t\kappa_1} + \sum_{i=1}^{J^{\parallel}(t)} X^{\parallel}_{i,1} e^{-(t-\tau^{\parallel}_i)\kappa_1},\ \tilde{\lambda}^{\parallel}_2(0) e^{-t\kappa_2} + \sum_{i=1}^{J^{\parallel}(t)} X^{\parallel}_{i,2} e^{-(t-\tau^{\parallel}_i)\kappa_2} \,\middle|\, J^{\parallel}(t) \right) \right] \\
&= E\left[ e^{-t(\kappa_1+\kappa_2)} \text{Cov}\left( \tilde{\lambda}^{\parallel}_1(0), \tilde{\lambda}^{\parallel}_2(0) \right) + J^{\parallel}(t)\, \text{Cov}\left( X^{\parallel}_1 e^{-(t-\tau^{\parallel})\kappa_1}, X^{\parallel}_2 e^{-(t-\tau^{\parallel})\kappa_2} \,\middle|\, J^{\parallel}(t) \right) \right] \\
&= E\left[ e^{-t(\kappa_1+\kappa_2)} \text{Cov}\left( \tilde{\lambda}^{\parallel}_1(0), \tilde{\lambda}^{\parallel}_2(0) \right) + J^{\parallel}(t)\, e^{-t(\kappa_1+\kappa_2)}\, \text{Cov}\left( X^{\parallel}_1 e^{\tau^{\parallel}\kappa_1}, X^{\parallel}_2 e^{\tau^{\parallel}\kappa_2} \,\middle|\, J^{\parallel}(t) \right) \right]
\end{aligned}
\tag{A.4}
\]
and
\[
\begin{aligned}
\text{Cov}\left( X^{\parallel}_1 e^{\tau^{\parallel}\kappa_1}, X^{\parallel}_2 e^{\tau^{\parallel}\kappa_2} \,\middle|\, J^{\parallel}(t) \right)
&= E\left[ X^{\parallel}_1 e^{\tau^{\parallel}\kappa_1} X^{\parallel}_2 e^{\tau^{\parallel}\kappa_2} \,\middle|\, J^{\parallel}(t) \right] - E\left[ X^{\parallel}_1 e^{\tau^{\parallel}\kappa_1} \,\middle|\, J^{\parallel}(t) \right] E\left[ X^{\parallel}_2 e^{\tau^{\parallel}\kappa_2} \,\middle|\, J^{\parallel}(t) \right] \\
&= E\left[ X^{\parallel}_1 X^{\parallel}_2 \right] E\left[ e^{\tau^{\parallel}(\kappa_1+\kappa_2)} \,\middle|\, J^{\parallel}(t) \right] - E\left[ X^{\parallel}_1 \right] E\left[ X^{\parallel}_2 \right] E\left[ e^{\tau^{\parallel}\kappa_1} \,\middle|\, J^{\parallel}(t) \right] E\left[ e^{\tau^{\parallel}\kappa_2} \,\middle|\, J^{\parallel}(t) \right].
\end{aligned}
\tag{A.5}
\]
The location of a shot given the total number of shots is uniformly distributed over \([0,t]\); therefore, for \(a > 0\), one obtains the required moments from
\[
E\left[ e^{\tau^{\parallel} a} \,\middle|\, J^{\parallel}(t) \right] = \int_0^t \frac{e^{ax}}{t}\, \mathrm{d}x = \frac{e^{at}-1}{at},
\tag{A.6}
\]
hence
\[
\text{Cov}\left( X^{\parallel}_1 e^{\tau^{\parallel}\kappa_1}, X^{\parallel}_2 e^{\tau^{\parallel}\kappa_2} \,\middle|\, J^{\parallel}(t) \right)
= E\left[ X^{\parallel}_1 X^{\parallel}_2 \right] \frac{e^{(\kappa_1+\kappa_2)t}-1}{(\kappa_1+\kappa_2)t}
- E\left[ X^{\parallel}_1 \right] E\left[ X^{\parallel}_2 \right] \frac{e^{\kappa_1 t}-1}{\kappa_1 t} \cdot \frac{e^{\kappa_2 t}-1}{\kappa_2 t}.
\tag{A.7}
\]
Therefore, since \(E[J^{\parallel}(t)] = \rho^{\parallel} t\),
\[
E\left[ \text{Cov}\left( \tilde{\lambda}^{\parallel}_1(t), \tilde{\lambda}^{\parallel}_2(t) \,\middle|\, J^{\parallel}(t) \right) \right]
= e^{-t(\kappa_1+\kappa_2)} \text{Cov}\left( \tilde{\lambda}^{\parallel}_1(0), \tilde{\lambda}^{\parallel}_2(0) \right)
+ \rho^{\parallel} t\, e^{-t(\kappa_1+\kappa_2)} \left( E\left[ X^{\parallel}_1 X^{\parallel}_2 \right] \frac{e^{(\kappa_1+\kappa_2)t}-1}{(\kappa_1+\kappa_2)t} - E\left[ X^{\parallel}_1 \right] E\left[ X^{\parallel}_2 \right] \frac{e^{\kappa_1 t}-1}{\kappa_1 t} \cdot \frac{e^{\kappa_2 t}-1}{\kappa_2 t} \right).
\tag{A.8}
\]
Secondly, we derive \(\text{Cov}\big( E\big[\tilde{\lambda}^{\parallel}_1(t) \mid J^{\parallel}(t)\big], E\big[\tilde{\lambda}^{\parallel}_2(t) \mid J^{\parallel}(t)\big] \big)\). We start from the conditional expectation terms:
\[
E\left[ \tilde{\lambda}^{\parallel}_1(t) \,\middle|\, J^{\parallel}(t) \right]
= E\left[ \tilde{\lambda}^{\parallel}_1(0) e^{-t\kappa_1} + \sum_{i=1}^{J^{\parallel}(t)} X^{\parallel}_{i,1} e^{-(t-\tau^{\parallel}_i)\kappa_1} \,\middle|\, J^{\parallel}(t) \right]
= e^{-t\kappa_1} \left( E\left[ \tilde{\lambda}^{\parallel}_1(0) \right] + J^{\parallel}(t)\, E\left[ X^{\parallel}_1 \right] E\left[ e^{\tau^{\parallel}\kappa_1} \,\middle|\, J^{\parallel}(t) \right] \right)
= e^{-t\kappa_1} \left( E\left[ \tilde{\lambda}^{\parallel}_1(0) \right] + J^{\parallel}(t)\, E\left[ X^{\parallel}_1 \right] \frac{e^{\kappa_1 t}-1}{\kappa_1 t} \right),
\tag{A.9}
\]
and similarly
\[
E\left[ \tilde{\lambda}^{\parallel}_2(t) \,\middle|\, J^{\parallel}(t) \right]
= e^{-t\kappa_2} \left( E\left[ \tilde{\lambda}^{\parallel}_2(0) \right] + J^{\parallel}(t)\, E\left[ X^{\parallel}_2 \right] \frac{e^{\kappa_2 t}-1}{\kappa_2 t} \right).
\tag{A.10}
\]
Therefore, since \(\text{Var}(J^{\parallel}(t)) = \rho^{\parallel} t\), we arrive at
\[
\text{Cov}\left( E\left[ \tilde{\lambda}^{\parallel}_1(t) \,\middle|\, J^{\parallel}(t) \right], E\left[ \tilde{\lambda}^{\parallel}_2(t) \,\middle|\, J^{\parallel}(t) \right] \right)
= e^{-t(\kappa_1+\kappa_2)} \frac{e^{\kappa_1 t}-1}{\kappa_1 t} \cdot \frac{e^{\kappa_2 t}-1}{\kappa_2 t}\, E\left[ X^{\parallel}_1 \right] E\left[ X^{\parallel}_2 \right] \text{Var}\left( J^{\parallel}(t) \right)
= e^{-t(\kappa_1+\kappa_2)} \frac{e^{\kappa_1 t}-1}{\kappa_1 t} \cdot \frac{e^{\kappa_2 t}-1}{\kappa_2 t}\, \rho^{\parallel} t\, E\left[ X^{\parallel}_1 \right] E\left[ X^{\parallel}_2 \right].
\tag{A.11}
\]
Adding (A.8) and (A.11), the terms in \(E[X^{\parallel}_1]E[X^{\parallel}_2]\) cancel, and
\[
\text{Cov}\left( \tilde{\lambda}^{\parallel}_1(t), \tilde{\lambda}^{\parallel}_2(t) \right)
= e^{-t(\kappa_1+\kappa_2)} \text{Cov}\left( \tilde{\lambda}^{\parallel}_1(0), \tilde{\lambda}^{\parallel}_2(0) \right)
+ \rho^{\parallel} t\, e^{-t(\kappa_1+\kappa_2)}\, E\left[ X^{\parallel}_1 X^{\parallel}_2 \right] \frac{e^{(\kappa_1+\kappa_2)t}-1}{(\kappa_1+\kappa_2)t}.
\tag{A.12}
\]
Given that this is a stationary bivariate process, we have
\[
\text{Cov}\left( \tilde{\lambda}^{\parallel}_1(t), \tilde{\lambda}^{\parallel}_2(t) \right) = \text{Cov}\left( \tilde{\lambda}^{\parallel}_1(0), \tilde{\lambda}^{\parallel}_2(0) \right),
\tag{A.13}
\]
hence
\[
\left( 1 - e^{-t(\kappa_1+\kappa_2)} \right) \text{Cov}\left( \tilde{\lambda}^{\parallel}_1(t), \tilde{\lambda}^{\parallel}_2(t) \right)
= \rho^{\parallel} t\, e^{-t(\kappa_1+\kappa_2)}\, E\left[ X^{\parallel}_1 X^{\parallel}_2 \right] \frac{e^{(\kappa_1+\kappa_2)t}-1}{(\kappa_1+\kappa_2)t}
= \frac{\rho^{\parallel} E\left[ X^{\parallel}_1 X^{\parallel}_2 \right]}{\kappa_1+\kappa_2} \left( 1 - e^{-t(\kappa_1+\kappa_2)} \right).
\tag{A.14}
\]
Therefore
\[
\text{Cov}\left( \tilde{\lambda}^{\parallel}_1(t), \tilde{\lambda}^{\parallel}_2(t) \right) = \frac{\rho^{\parallel} E\left[ X^{\parallel}_1 X^{\parallel}_2 \right]}{\kappa_1+\kappa_2}.
\tag{A.15}
\]
Finally, since the two unique shot processes are independent of the bivariate common shot process,
\[
\text{Cov}\left( \tilde{\lambda}_1(t), \tilde{\lambda}_2(t) \right)
= \text{Cov}\left( \tilde{\lambda}^{\parallel}_1(t) + \tilde{\lambda}^{\perp}_1(t),\ \tilde{\lambda}^{\parallel}_2(t) + \tilde{\lambda}^{\perp}_2(t) \right)
= \text{Cov}\left( \tilde{\lambda}^{\parallel}_1(t), \tilde{\lambda}^{\parallel}_2(t) \right). \qquad \square
\]
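The stationary covariance (A.15) can also be checked numerically. The following is a minimal Monte Carlo sketch (not part of the paper): it simulates the common-shock pair (A.1) started at zero, so that by (A.12) the covariance at a large horizon \(T\) approaches \(\rho^{\parallel} E[X^{\parallel}_1 X^{\parallel}_2]/(\kappa_1+\kappa_2)\). The parameter values, and the choice of comonotone exponential severities \(X^{\parallel}_{i,1} = X^{\parallel}_{i,2} \sim \text{Exp}(1)\) (so that \(E[X^{\parallel}_1 X^{\parallel}_2] = E[X^2] = 2\)), are illustrative assumptions only.

```python
import numpy as np

def simulate_common_shot_pair(rho, k1, k2, T, n_paths, rng):
    """Simulate (lambda1(T), lambda2(T)) for the common-shock shot noise
    pair of (A.1), started at lambda(0) = 0, with comonotone Exp(1)
    severities X_{i,1} = X_{i,2} (an illustrative assumption)."""
    counts = rng.poisson(rho * T, size=n_paths)    # J(T) for each path
    idx = np.repeat(np.arange(n_paths), counts)    # path index of each shot
    taus = rng.uniform(0.0, T, size=counts.sum())  # shot locations ~ U(0, T)
    xs = rng.exponential(1.0, size=counts.sum())   # common shot severities
    # Sum the decayed shots path by path (decay rates k1 and k2).
    lam1 = np.bincount(idx, weights=xs * np.exp(-(T - taus) * k1), minlength=n_paths)
    lam2 = np.bincount(idx, weights=xs * np.exp(-(T - taus) * k2), minlength=n_paths)
    return lam1, lam2

rng = np.random.default_rng(seed=1)
rho, k1, k2, T = 2.0, 1.0, 0.5, 10.0
lam1, lam2 = simulate_common_shot_pair(rho, k1, k2, T, n_paths=200_000, rng=rng)

# Stationary covariance from (A.15): rho * E[X1 * X2] / (k1 + k2),
# with E[X1 * X2] = E[X^2] = 2 for Exp(1) severities.
cov_theory = rho * 2.0 / (k1 + k2)
cov_mc = np.cov(lam1, lam2)[0, 1]
print(cov_theory, cov_mc)
```

With \(T = 10\) the transient factor \(1 - e^{-(\kappa_1+\kappa_2)T}\) is negligible, so the sample covariance should sit close to \(2 \cdot 2 / 1.5 \approx 2.67\).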