Pricing foreseeable and unforeseeable risks in insurance portfolios

Weihong Ni, Corina Constantinescu, Alfredo D. Egídio dos Reis & Véronique Maume-Deschamps

Department of Computer Science and Mathematics, Arcadia University, PA, USA; [email protected]
Institute for Financial and Actuarial Mathematics, Department of Mathematical Sciences, University of Liverpool, UK; [email protected]
ISEG and CEMAPRE, Universidade de Lisboa, PT; [email protected]
Univ Lyon, Université Claude Bernard Lyon 1, CNRS UMR 5208, Institut Camille Jordan, F-69622 Villeurbanne, FR; [email protected]
Abstract
In this manuscript we propose a method for pricing insurance products that cover not only traditional risks, but also unforeseen ones. By considering the Poisson process parameter to be a mixed random variable, we capture the heterogeneity of foreseeable and unforeseeable risks. To illustrate, we estimate the weights of the two risk streams for a real dataset from a Portuguese insurer. To calculate the premium, we assume frequency and severity distributions that belong to the linear exponential family. Under a Bayesian set-up, we show that when working with a finite mixture of conjugate priors, the premium can be estimated by a mixture of posterior means, with updated parameters, depending on claim histories. We emphasise the riskiness of the unforeseeable trend by choosing heavy-tailed distributions. After estimating the distribution parameters involved using the Expectation-Maximization algorithm, we find that the Bayesian premiums derived are more reactive to claim trends than traditional ones.
Keywords: Mixed Poisson processes; Foreseeable risks; Unforeseeable risks; Bayesian estimation; Ratemaking; Experience rating; Bonus-malus; EM algorithm.
1 Introduction

The prime objective of this paper is to develop the parameter estimation and premium calculation for the extended model introduced by Li et al. [2015], where so-called unforeseeable risks are additionally taken into account. These risks are not really predictable due to a lack of history; a very real example arose in the Spring of 2020, a pandemic. Other examples are the emerging risks that evolve with ever smarter ways of living, whose evaluation may differ from classical methods. For instance, the introduction of autonomous cars will possibly change the magnitude of risks on roads, yet we do not know precisely in which direction, even though they were designed to reduce human error. To model such unforeseeable features, a defective random variable was incorporated in the risk model by Li et al. [2015], inspired by Dubey [1977]. There, unforeseeable risks were reflected in the claim frequencies only; we extend this to the severities.

We propose a modification of the classical risk model which accommodates the unforeseeable risks. Li et al. [2015] present such a modified model, a development of the classical Cramér-Lundberg risk model introduced by Dubey [1977], where the Poisson parameter λ is a realization of a random variable Λ. The premium income is based on an estimate of Λ, for instance the posterior mean λ̂(t) = E[Λ | N(t)], so that the risk process is driven by the equation

  U(t) = u + c ∫_0^t λ̂(s) ds − S(t),  t ≥ 0.  (1.1)

Here S(t) = Σ_{j=0}^{N(t)} Y_j is the aggregate claim amount up to time t, u = U(0) is the initial surplus, and {Y_j}_{j=1}^∞ is a sequence of independent and identically distributed (briefly, i.i.d.) random variables with common distribution H_Y(·) and existing mean E[Y]. Moreover, Y_0 ≡ 0, {N(t), t ≥ 0} is a Poisson process with intensity λ (> 0), and {Y_j}_{j=1}^∞ is independent of {N(t), t ≥ 0}. Here c is the time-invariant component of the premium income, with c = (1 + θ)E[Y], where θ (> 0) is the loading coefficient.

Dubey [1977] considered a positive probability P[Λ = 0] = p (> 0), so that {N(t)} is a mixed Poisson process conditional on Λ > 0. Li et al. [2015] generalised the process to a sum of two mixed counting processes, {N(t) = N(1)(t) + N(2)(t)}, where {N(1)(t)} represents the historical risk stream of claim counts and {N(2)(t)} the corresponding unforeseeable risk stream. For each, the randomised intensity parameters are denoted Λ(1) and Λ(2), respectively. This is the process we consider in the rest of the paper. The randomised intensity of N(1)(t) has a classical behaviour, meaning it is a positive random variable, P{Λ(1) = 0} = 0, unlike in the second process, where P{Λ(2) = 0} = p > 0. As in Li et al. [2015], we set λ̂(t) = E[Λ(1) + Λ(2) | N(t)] in (1.1).

Not much interpretation is given by Dubey [1977] and Li et al. [2015] to the situation where p > 0. We remark that the second stream can be riskier in severity, in frequency, or in both. Thinking of the development of autonomous cars, experiments suggest they are potentially safer than human driving in terms of accident rates (Waymo Team, 2016); but once an accident occurs, it may cause higher damage than one caused by a human (McCausland [2019]). On the other hand, the outbreak of the asbestos crisis in the 1980s raised no awareness until 20 years later, when claims reached more than double the original amount (White [2003]). Had a model existed at that time to capture the potential risks embedded in the gradually increasing claims, insurance companies could have avoided dramatic losses. In short, the introduction of an unforeseeable stream accommodates potential uncertainties, either riskier or less risky, whose nature, timing and occurrence one cannot foresee exactly, and which the actuary deems wise to reflect in the premium. A new virus pandemic could be an example; signs may have appeared before.

Modelling two different streams of risks for the same portfolio can be done exclusively in the claim count process, as Dubey [1977] and Li et al. [2015] did, or in the claim severity, or in both claim counts and severity. In the latter case we consider that there is some sort of dependence between {N(t), t ≥ 0} and the sequence {Y_j}_{j=1}^∞, in the sense that the different streams may bring diverse severity behaviour. This is not dealt with by any of the above cited authors.

Letting the premium adjust according to claim numbers somehow imitates the operation of a Bonus-malus system (BMS). For instance, Ni et al. [2014] worked with Negative Binomial distributed frequencies and Weibull distributed severities, as an extension of classical settings. Another recent work, Cheung et al. [2019], considered a dependence structure between the claim frequency and severity components via a conjugate bivariate prior and calculated Bayesian premiums. In this work, we also incorporate the two risk streams into the claim severity, and add appropriate premium calculation and estimation. This is important in order to reflect both risk effects in the premium, for fairness and for ruin probability compensation. For the premium calculation/estimation, credibility theory is used; the parameter estimation procedure uses the Expectation-Maximization (EM) algorithm. The EM algorithm was developed in Dempster et al. [1977]; a guided tour can be found in Couvreur [1997]. We will be using mixture distributions; the particular application of the algorithm to these can be found in Redner and Walker [1984].

The use of Bayesian and credibility methods for posterior premia appears as a natural choice (we have set above the posterior mean λ̂(t), for instance). Markovian bonus-malus methods are easy and common choices for posterior rating based on claim counts only and are used to individualise premia. That is not our setting; moreover, adding the severity component necessarily drives actuaries towards credibility estimation methods. Introductory notions on credibility can be found in Klugman et al. [2012], Chapters 17-19, and more advanced ones in Bühlmann and Gisler [2006].

The manuscript is organised as follows. In Section 2 we work only with the claim frequency component of the model, while in Section 3 we introduce the claim severity, assuming that the severity brings some information on the stream of origin. In Section 4 we deal with the premium calculation/estimation via a Bayesian approach.
Section 5 is devoted to parameter estimation via an EM algorithm on a concrete dataset; claim severity data are randomly generated from the chosen mixed distributions, considering both non-extreme and extreme cases, and Bayesian premiums are computed in the end using the estimated parameters. Section 6 presents a discussion of a global likelihood procedure to estimate the claim frequency and severity distributions together, and in the last section we write some concluding remarks. Proofs of lemmas are given in the appendix.

2 The claim frequency model

The total claim number over a time period (0, t] within the portfolio is given by

  N(t) = N(1)(t) + N(2)(t).  (2.1)

Here {N(1)(t)} is a mixed Poisson process with parameter Λ(1) and {N(2)(t)} is mixed Poisson conditional on {Λ(2) > 0}, meaning the intensity must be strictly positive. The event {Λ(2) = 0} implies that {N(2)(t) = 0} with probability one, since

  lim_{λ↓0} P{N(2)(t) = 0 | Λ(2) = λ} = lim_{λ↓0} e^{−λt} = 1.

{N(t)} is also a mixed Poisson process, with intensity Λ(1) + Λ(2), given that N(1)(t) and N(2)(t) are independent; see Li et al. [2015, Lemma 1]. Under this global process {N(t) = N(1)(t) + N(2)(t)}, when a claim arrives it comes either from the process {N(1)(t)} or from the process {N(2)(t)}. We note that in practice actuaries observe only realisations of the overall claim process, which means they have to guess which stream each claim belongs to.

Let us define and denote by Ξ the random variable representing the split rate in favour of process {N(1)(t)} within {N(t)}. It is well known that for independent Poisson processes, in our case for given positive rates Λ(1) = λ(1) and Λ(2) = λ(2) > 0, the split between processes "(1)" and "(2)" is given by the probabilities

  ξ = λ(1)/(λ(1) + λ(2))  and  1 − ξ = λ(2)/(λ(1) + λ(2)),  (2.2)

respectively. Unconditionally, considering ξ as a particular outcome of the random variable Ξ, and extending the unconditional mixed process defined in (2.1), we can write

  Ξ = Λ(1)/(Λ(1) + Λ(2)).  (2.3)

Note that

  {Λ(2) = 0} ⇔ {Ξ = 1}  ⇒  P[Λ(2) = 0] = P[Ξ = 1] = p.  (2.4)

We denote the distribution function of Ξ by A_Ξ(·).

We now set further assumptions. From now on, we assume that the Poisson intensities follow gamma distributions: Λ(1) ∼ Gamma(α1, β1) and, conditional on Λ(2) > 0, Λ(2) | Λ(2) > 0 ∼ Gamma(α2, β2). For simplicity, in the sequel we denote Λ(2)+ = Λ(2) | Λ(2) > 0. Furthermore, without loss of generality, we denote by N the number of claims observed within one unit period, N(1). Under a more restrictive assumption on the gamma distributions above, we arrive at the following lemma.

Lemma 2.1. If β1 = β2 = β, then the random variable N = N(1) is represented by a mixture of two Negative Binomial random variables, with probability function

  P(N = n) = p C(n+α1−1, n) (β/(β+1))^{α1} (1/(β+1))^n
           + (1−p) C(n+α1+α2−1, n) (β/(β+1))^{α1+α2} (1/(β+1))^n,

where C(·,·) denotes the (generalized) binomial coefficient.
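To make Lemma 2.1 concrete, here is a minimal R sketch of the resulting claim-count probability function, written as a two-component Negative Binomial mixture; the function name and the parameter values are purely illustrative, not estimates from the paper.

```r
# Probability function of N in Lemma 2.1: a mixture of
# NB(alpha1, beta/(beta+1)) and NB(alpha1+alpha2, beta/(beta+1)).
dmixnb <- function(n, p, alpha1, alpha2, beta) {
  q <- beta / (beta + 1)  # Negative Binomial "success" probability
  p * dnbinom(n, size = alpha1, prob = q) +
    (1 - p) * dnbinom(n, size = alpha1 + alpha2, prob = q)
}
dmixnb(0:5, p = 0.6, alpha1 = 2, alpha2 = 1, beta = 0.5)
```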
Remark 2.1. Note that N(2)(t) follows a 'Zero Modified' Negative Binomial distribution, briefly N(2)(t) ∼ ZM Negative Binomial(α2, β). It belongs to the (a, b, 1) recursion class of distributions, see Section 6.6 of Klugman et al. [2012], since

  M_{N(2)}(ρ) = E[e^{ρN(2)} | Λ(2) = 0] P[Λ(2) = 0] + E[e^{ρN(2)} | Λ(2) > 0] P[Λ(2) > 0]
             = p + (1 − p) (β/(β + 1 − e^ρ))^{α2},  for e^ρ < β + 1,

which is a weighted average of the mgf's of a degenerate distribution at {0} and of a Negative Binomial, a member of the (a, b, 0) class. □

In the sequel, instead of focusing on the claim arrival process {N(t)}, which consists of combining the two counting processes {N(1)(t)} and {N(2)(t)} as explained above, we could model the underlying claim counts via mixing random variables: mixing two Gamma random variables with an independent Bernoulli random variable. See Constantinescu et al. [2013].
Lemma 2.2. Assume that the randomized Poisson parameter Λ follows a prior distribution given by a mixture of two Gamma random variables, such that

  Λ = I Z1 + (1 − I) Z2,  (2.5)

where Z1 ∼ Gamma(α1, β), Z2 ∼ Gamma(α1 + α2, β) and I is a Bernoulli(p) random variable independent of Z1 and Z2. Then the distribution of N = N(1) is the mixture of two Negative Binomial random variables shown in Lemma 2.1 above.

Based on the above Bayesian set-up, with a prior distribution of Gamma mixtures, we will be able to compute easily the corresponding posterior distribution for a given distribution of an observable random variable. This is important for experience rating, where we can naturally use the theory of Bayesian premium for posterior estimation of premia, as well as credibility theory, a particular case of Bayesian premium estimation. This is done in Section 4. In fact, the equivalence of the two constructions above can be explained further; see Remark 2.2 below.
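The equivalence between the mixing construction (2.5) and Lemma 2.1 can be checked by simulation. The following sketch (with illustrative parameter values) draws Λ = I Z1 + (1 − I) Z2 and then N | Λ ∼ Poisson(Λ), and compares the empirical frequencies with the mixture probability function of Lemma 2.1.

```r
# Simulate the mixture (2.5) and check Lemma 2.1 empirically.
set.seed(1)
p <- 0.6; alpha1 <- 2; alpha2 <- 1; beta <- 0.5; m <- 1e5
I  <- rbinom(m, 1, p)                                 # I ~ Bernoulli(p)
Z1 <- rgamma(m, shape = alpha1,          rate = beta)
Z2 <- rgamma(m, shape = alpha1 + alpha2, rate = beta)
N  <- rpois(m, I * Z1 + (1 - I) * Z2)                 # N | Lambda ~ Poisson
q  <- beta / (beta + 1)
rbind(empirical = as.numeric(table(factor(N, levels = 0:5))) / m,
      lemma_2_1 = p * dnbinom(0:5, alpha1, q) +
                  (1 - p) * dnbinom(0:5, alpha1 + alpha2, q))
```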
Remark 2.2. Let Z1 = Λ(1) and Z2 = Λ(1) + Λ(2)+, where Λ(2)+ = Λ(2) | Λ(2) > 0, as denoted before. Also, let Λ(2) = (1 − I) Λ(2)+ + I Λ(2)0 = (1 − I) Λ(2)+, with Λ(2)0 ≡ 0 and I ∼ Bernoulli(p). We have

  Λ = I Z1 + (1 − I) Z2 = I Λ(1) + (1 − I)(Λ(1) + Λ(2)+) = Λ(1) + (1 − I) Λ(2)+ = Λ(1) + Λ(2).

Since N(t) | Λ ∼ Poisson(Λ(1) + Λ(2)), it is true that N(t) = N(1)(t) + N(2)(t), where N(1)(t) | Λ(1) ∼ Poisson(Λ(1)) and N(2)(t) | Λ(2)+ ∼ Poisson(Λ(2)+). □

In the model presented so far, we distinguish the unforeseeable stream of risks from the historical stream by considering two different, and independent, claim counting processes whose randomized intensities are of different natures. For now, these parameters only influence the claim numbers, not the claim sizes. However, we should also consider the possibility that the claims themselves are affected by the risk type. Randomness in the intensities can introduce dependence between claim numbers and sizes, setting us away from the classical assumption that the claim arrival process is independent of the claim severity sequence. We assume a twofold approach:

1. At a starting stage, we consider that for a given Λ(i) = λ(i), i = 1, 2, we have conditional independence; that is, for a given stream (i) and λ(i), severities and claim counts are independent;
2. At a later stage, an extended model with dependence.

Again, we remark that parameters cannot be observed, only claims can. Furthermore, in practice we only observe realizations of the total claim process {N(t) = N(1)(t) + N(2)(t)}, so that we guess/estimate which stream each claim arrival belongs to and then make the severity correspondence, if we set a different distribution for each stream. Thus, using data and the model behaving as defined in Lemma 2.1, we have four parameters to estimate: p, α1, α2 and β. Parameters of the severity distributions will be estimated in addition.

As defined in (2.3), Ξ is the random split rate between processes {N(1)(t)} and {N(2)(t)} within {N(t)}. Consider the event {Λ(1) + Λ(2) = λ, Ξ = ξ, 0 < ξ < 1} as given, so that {N(t)} is conditionally a Poisson process, and consider the following remark.
Remark 2.3. Ξ = ξ is the probability that an event arriving from the process {N(t) = N(1)(t) + N(2)(t)} is generated by {N(1)(t)}. Conditional on {Λ(1) + Λ(2) = λ, Ξ = ξ}, Λ(1) is also given: Λ(1) = λξ and Λ(2) = λ − Λ(1) = λ(1 − ξ). If ξ = 1, then Λ(2) = 0 ⇔ Λ = Λ(1) = λ, and N(2)(t) = 0 for all t ≥ 0, almost surely.

3 The claim severity model

For now, we assume that the distribution of an individual claim size depends on the stream type, either historical or unforeseeable. We keep assuming that the actuary may not be able to recognize which stream a claim comes from; at least, he cannot be certain. Let F and G denote the distributions of claim severities in the historical and unforeseeable streams, respectively, and consider that an individual claim size taken at random, say Y, follows a distribution function denoted by H(y).
Proposition 3.1. For a given claim Y, its distribution function, conditional on Ξ = ξ, can be represented by

  P{Y ≤ y | Ξ = ξ} := H_ξ(y) = ξ F(y) + (1 − ξ) G(y),  (3.1)

where Ξ = Λ(1)/(Λ(1) + Λ(2)) ∈ (0, 1], and F, G correspond to the distributions of claim severities in the historical and unforeseeable streams, respectively. The set {Ξ = 1} = {Λ(2) = 0} has probability measure p.

Proof. The proof is straightforward using the law of total probability:

  P{Y ≤ y | ξ} = P{Y ≤ y | Y = Y(1), ξ} P{Y = Y(1) | ξ} + P{Y ≤ y | Y = Y(2), ξ} P{Y = Y(2) | ξ}
             = F(y) P{Y = Y(1) | ξ} + G(y) P{Y = Y(2) | ξ}
             = F(y)·ξ + G(y)·(1 − ξ),

where Y(1) and Y(2) denote the sizes of claims stemming from Stream 1 (the historical stream) and Stream 2 (the unforeseeable stream), respectively. Note that when Ξ = 1, i.e. Λ(2) = 0, we have N(t) = N(1)(t), so the probability of a claim coming from Stream 1 is P{Y = Y(1) | Ξ = 1} = 1, whereas P{Y = Y(2) | Ξ = 1} = 0. This special situation does not affect Equality (3.1). □

It has been shown in Li et al. [2015], see their Lemmas 2 and 3, that the independence between Λ(1)/(Λ(1) + Λ(2)+) and Λ(1) + Λ(2)+, conditional on Ξ ≠ 1, results in Ξ | Ξ ≠ 1 following a Beta(α1, α2) law. It is then easy to derive the unconditional distribution of Y under these assumptions.
Proposition 3.2. Assume that Λ(1)/(Λ(1) + Λ(2)+) and Λ(1) + Λ(2)+ are independent, and that Λ(1) ∼ Gamma(α1, β) and Λ(2)+ ∼ Gamma(α2, β). A given claim size is then distributed according to a mixture of F(·) and G(·), i.e., with density

  h_Y(y) = ν·f(y) + (1 − ν)·g(y),  (3.2)

where ν = p + (1 − p) B(α1 + 1, α2)/B(α1, α2).

Proof. Under the assumptions, Ξ | Ξ ≠ 1 ∼ Beta(α1, α2) by Lukacs [1955]'s proportion-sum independence theorem. The rest of the proof is then straightforward. With A_Ξ(ξ) denoting the distribution function of Ξ (a mixture), we have

  P{Y ≤ y} = ∫_(0,1] H_ξ(y) dA_Ξ(ξ)
          = p F(y) + (1 − p) ∫_(0,1) H_ξ(y) ξ^{α1−1}(1 − ξ)^{α2−1}/B(α1, α2) dξ
          = p F(y) + (1 − p) [B(α1+1, α2)/B(α1, α2) · F(y) + B(α1, α2+1)/B(α1, α2) · G(y)]
          = [p + (1 − p) B(α1+1, α2)/B(α1, α2)]·F(y) + (1 − p) B(α1, α2+1)/B(α1, α2)·G(y)
          = ν·F(y) + (1 − ν)·G(y). □
Remark 3.1. Based on the proof of Li et al. [2015]'s Lemma 2 (see their Appendix), we can conclude that, under the conditions of Proposition 3.2, Y and N are independent. This allows us to carry out parameter estimation separately for the claim frequency and the severity components.

In the previous section we specified a distribution for the random variable N, as well as its parametrization. The unconditional distribution of N starts from assuming a Poisson distribution, commonly accepted for claim count data in motor insurance, for instance; starting from a different Poisson distribution for each stream of risks, we ended up with a particular mixture of two Negative Binomial distributions. We will see in Section 5 that this assumption is reasonable and may fit actual data.

For the claim severity we start with the prior assumption that the claim severity distribution belongs to the linear exponential family, a common fit for insurance severity data. First, we assume a particular form for the distribution of an individual claim severity as a mixture of two distributions, denoted F(·) and G(·), representing the behaviour of the claim severities of the historical and unforeseeable streams, respectively. Then, once we specify F(·) and G(·), we can move to parameter estimation.

As a simple example, from now on consider that both stream densities come from exponentially distributed random variables, Y ∼ Exp(Θ), where Θ^{−1} is the mean and, as discussed, there is a dependence structure embedded in the random parameter Θ, such that:

1. Θ = µ with probability one if we consider the historical stream;
2. Θ ∼ Gamma(δ, σ) if we consider the unforeseeable one.

This implies that claims in the historical stream follow an exponential distribution with mean 1/µ, whereas those of the unforeseeable stream follow a Pareto distribution, Pareto(δ, σ). Their respective unconditional densities are

  f(y) = µ e^{−µy}  and  g(y) = δσ^δ/(σ + y)^{δ+1},  µ, δ, σ > 0.  (3.3)

We chose to mix a light-tailed distribution with a heavy-tailed one, to represent respectively an example of historical behaviour and of unforeseeable behaviour.
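A short sketch of how claim sizes can be drawn from the unconditional mixture (3.2)-(3.3). It uses the simplification B(α1+1, α2)/B(α1, α2) = α1/(α1+α2) for the weight ν of Proposition 3.2 and inverts the Pareto distribution function for the unforeseeable stream; all numerical values are illustrative.

```r
# Simulate severities from h_Y in (3.2) with f, g as in (3.3).
rmixsev <- function(n, p, alpha1, alpha2, mu, delta, sigma) {
  nu <- p + (1 - p) * alpha1 / (alpha1 + alpha2)  # weight of stream 1
  from1 <- runif(n) < nu
  y <- numeric(n)
  y[from1]  <- rexp(sum(from1), rate = mu)        # historical: Exp(mu)
  y[!from1] <- sigma * (runif(sum(!from1))^(-1 / delta) - 1)  # Pareto
  y
}
summary(rmixsev(1e4, p = 0.6, alpha1 = 2, alpha2 = 1,
                mu = 1, delta = 2, sigma = 1))
```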
Remark 3.2. This example can be connected with Corollary 2 in Li et al. [2015]. If we let ζ = µ, treat ζ as a Gamma(δ, σ) random variable and integrate the ruin function obtained there over ζ, we could in principle represent the ruin probability of our model in closed form.

Our model will be complete, and ready for parameter estimation and distribution fitting, once the premium calculation is defined.
4 Premium calculation

Premia are set and calculated for rating periods, most commonly annual ones. For estimation, risk observations must be organized by periods: suppose that we observed the risk for m years, or m rating periods. Our observation vector for the claim counts is then n = {n1, ..., nm}. For each year, say year j, we have n_j individual claims observed, say the vector y_j = {y_{j1}, ..., y_{j n_j}}; the observations of year j are thus collected in the twofold vector (n_j, y_j). To calculate an annual premium, one would only need the aggregate annual amounts, i.e., observations of S_i(1), i = 1, ..., m, since a premium is set for an aggregate quantity. That would, however, be insufficient in our case, since we consider that claim counts and severities bring separate and different information on the unforeseeable stream. Another question then arises: could we separate the information contributed by the counts from that contributed by the severities?

In a classical model the two annual sequences of claim counts and severities are assumed independent, the annual pure premium is given by E[S(1)] = E[N] × E[Y], and the two premium factors are often estimated separately. In view of Remark 3.1 this also holds in our case, so for now we deal separately with the estimation of E[N] and E[Y].

Let us pick the estimation of E[N] = E[N_{m+1}] for the upcoming year; the problem and procedures for the severity component E[Y] are similar. If we proceeded in a classical way, we would simply estimate the parameters involved in that mean, p, α1, α2, β, and obtain an estimate, say Ê[N]. However, since we have set the model under a Bayesian framework (recall Lemma 2.2), we naturally turn to the Bayesian methodology of insurance premium calculation, of which modern credibility theory is part; see [Klugman et al., 2012, Sections 18.3-4]. The hypothesis of conditionally independent and identically distributed past annual observations, N = {N1, ..., Nm}, given Λ = Λ(1) + Λ(2) = λ, is assumed.

The Bayesian (pure) premium is defined as the mean of the predictive distribution, E[N_{m+1} | N = n]. The interpretation is simple: parameters cannot be observed, only claims can, either counts or severities, and these should bring information about the unknown parameters; thus E[N_{m+1} | N = n] is an estimate of the premium we want to calculate. On the other hand, it can easily be shown that E[N_{m+1} | N = n] = E[µ_{m+1}(Λ) | N = n], where µ_{m+1}(Λ) = µ(Λ) = Λ is the hypothetical mean of the Poisson likelihood. The former expectation is the mean of the predictive distribution; the latter is the expected value of the hypothetical mean µ(Λ), integrated over the posterior distribution. µ(Λ) is also known in insurance risk theory, and credibility theory, as the (unknown) risk premium. The credibility premium is the best linear approximation to the Bayesian premium in the sense of minimizing the expected squared error (see [Klugman et al., 2012, Section 20.3.7 and Formula (20.28)]). When we work with a linear exponential family likelihood and its conjugate prior (no truncations), we usually obtain what is called Exact Credibility in the literature, i.e., the Bayesian premium equals the credibility premium; see Bühlmann [1967] and [Klugman et al., 2012, Section 18.7].
Although our setting will not result in exact credibility, we will see that the two can be related. Consider a generic random variable, say X, following a distribution belonging to the linear exponential family and depending on some parameter, say θ. Its probability or density function is of the form (see [Klugman et al., 2012, Chapters 5 and 15])

  f_{X|Θ}(x | θ) = p(x) e^{r(θ)x} / q(θ).  (4.1)

If the parameter θ is considered to be an outcome of a random variable Θ whose prior density function, denoted π(·), is of the form

  π(θ) = [q(θ)]^{−k} e^{µ k r(θ)} r′(θ) / c(µ, k),  (4.2)

then we are in the presence of a conjugate prior. In formulae (4.1) and (4.2), p(·) is some function not depending on the parameter θ; r(·) and q(·) are functions of the parameter; c(·,·) is a normalizing function of the given parameters.

It is clear from Sections 2 and 3 that we work with situations where the observable risk random variables follow distributions that are members of the linear exponential family (4.1). Associated with these we consider priors that are mixtures of distributions of the form (4.2). In this situation we do not have exact credibility, but a mixture of exact credibility situations, so that the premium estimation is simpler than one would expect, with an easy interpretation.

For the claim counts we consider a Poisson random variable whose random parameter is distributed as the mixture (A.1) (in the appendix). For the claim severities, we consider an exponential random variable whose parameter is random, distributed as a mixture of a degenerate distribution and a Gamma (see the end of Section 3).

As a general case, consider a prior distribution that is a finite mixture of distributions of the family (4.2), such that

  π(θ) = Σ_{i=1}^{η} ω_i [q(θ)]^{−k_i} e^{µ_i k_i r(θ)} r′(θ) / c_i(µ_i, k_i),  (4.3)

where ω_i, i = 1, ..., η, are given weights.
Theorem 4.1. Suppose that, given Θ = θ, the observable vector of i.i.d. random variables X = (X1, ..., Xn) has common probability function given by (4.1), and that the prior distribution of Θ, π(θ), is of the form (4.3). Then the posterior distribution, denoted π(θ | x) with x = {x1, ..., xn}, is of the same mixture form as (4.3):

  π(θ | x) = Σ_i ω*_i [q(θ)]^{−k*_i} e^{µ*_i k*_i r(θ)} r′(θ) / c_i(µ*_i, k*_i),  (4.4)

where

  µ*_i = (µ_i k_i + Σ_j x_j)/(k_i + n),  (4.5)
  k*_i = k_i + n,  (4.6)
  ω*_i = ω_i [c_i(µ*_i, k*_i)/c_i(µ_i, k_i)] / Σ_i ω_i [c_i(µ*_i, k*_i)/c_i(µ_i, k_i)].  (4.7)

Proof. With observations X = x, the posterior distribution is

  π(θ | x) = { Π_j p(x_j) e^{r(θ)Σ_j x_j} [q(θ)]^{−n} · Σ_i ω_i [q(θ)]^{−k_i} e^{µ_i k_i r(θ)} r′(θ)/c_i(µ_i, k_i) }
             / ∫_Θ { Π_j p(x_j) e^{r(θ)Σ_j x_j} [q(θ)]^{−n} · Σ_i ω_i [q(θ)]^{−k_i} e^{µ_i k_i r(θ)} r′(θ)/c_i(µ_i, k_i) } dθ
          = Σ_i ω_i [q(θ)]^{−k_i−n} e^{(µ_i k_i + Σ_j x_j) r(θ)} r′(θ)/c_i(µ_i, k_i)
             / Σ_i ω_i ∫_Θ [q(θ)]^{−k_i−n} e^{(µ_i k_i + Σ_j x_j) r(θ)} r′(θ)/c_i(µ_i, k_i) dθ
          = Σ_i ω_i [q(θ)]^{−k_i−n} e^{(µ_i k_i + Σ_j x_j) r(θ)} r′(θ)/c_i(µ_i, k_i)
             / Σ_i ω_i c_i(µ*_i, k*_i)/c_i(µ_i, k_i)
          = Σ_i ω*_i [q(θ)]^{−k*_i} e^{µ*_i k*_i r(θ)} r′(θ)/c_i(µ*_i, k*_i). □

From the posterior distribution we calculate its mean, and thus the Bayesian premium, an estimate of the risk premium defined as E[X | θ] = µ(θ) (see Klugman et al. [2012], Chapters 17 and 18). The Bayesian premium for the next rating period, n + 1, is E[X_{n+1} | X = x] (with n, X and x as defined in Theorem 4.1) and can be calculated via the posterior distribution π(θ | x) as E[X_{n+1} | x] = E[µ(Θ) | x]. In the exact credibility case it takes the familiar form

  E[µ(Θ) | x] = w x̄ + (1 − w) E[µ(Θ)],  (4.8)

where w is a weight, a function of n, x̄ is the empirical mean of the observations, and E[µ(Θ)] is the (known) collective or manual premium.

As said in Remark 3.1, we can calculate the premium components separately for the claim counts and the severity; we consider first the claim frequency and then the severity component. If we take the prior to be a Gamma mixture of the form (A.1), we reach a posterior with a similar mixture form and updated parameters. This is stated in Lemma 4.2 below.
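Before specialising to our two models, the updating rules (4.5)-(4.7) can be sketched generically. In the R sketch below, log_c is a user-supplied (log) normalising function c_i(µ, k) of (4.2); as an assumed example, for the Poisson-gamma pair used later, a Gamma(α, β) prior corresponds to µ = α/β, k = β and log c(µ, k) = log Γ(µk) − µk log k. Names and values are illustrative.

```r
# Generic posterior update for a finite mixture of conjugate priors
# (Theorem 4.1): returns updated weights, means and "sample sizes".
posterior_mix <- function(w, mu, k, x, log_c) {
  n       <- length(x)
  mu_star <- (mu * k + sum(x)) / (k + n)                        # (4.5)
  k_star  <- k + n                                              # (4.6)
  lw      <- log(w) + log_c(mu_star, k_star) - log_c(mu, k)
  w_star  <- exp(lw - max(lw)); w_star <- w_star / sum(w_star)  # (4.7)
  list(w = w_star, mu = mu_star, k = k_star)
}
# Poisson-gamma illustration: prior p*Gamma(a1, b) + (1-p)*Gamma(a1+a2, b).
log_c <- function(mu, k) lgamma(mu * k) - mu * k * log(k)
posterior_mix(w = c(0.6, 0.4), mu = c(2, 3) / 0.5, k = c(0.5, 0.5),
              x = c(4, 6, 5), log_c = log_c)
```

For a Poisson likelihood the Bayesian premium is then sum(w_star * mu_star), which reproduces the Gamma-mixture update of Lemma 4.2 below.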
Lemma 4.2. Let the prior distribution be (A.1). With a Poisson model distribution, the posterior distribution is still a mixture of Gamma distributions, with updated parameters:

  π(λ | n) = w · (β + m)^{Σ_j n_j + α1} λ^{Σ_j n_j + α1 − 1} e^{−(m+β)λ} / Γ(Σ_j n_j + α1)
           + (1 − w) · (β + m)^{Σ_j n_j + α1 + α2} λ^{Σ_j n_j + α1 + α2 − 1} e^{−(m+β)λ} / Γ(Σ_j n_j + α1 + α2),  (4.9)

where

  w = 1/(1 + G(p, α1, α2, β, n)),  (4.10)
  G(p, α1, α2, β, n) = ((1 − p)/p) · B(α1, α2)/B(Σ_j n_j + α1, α2) · (β/(β + m))^{α2},

n = {n1, ..., nm} is the vector of claim count observations and m is the sample size (Σ_j · = Σ_{j=1}^m ·).

For a fixed individual claim size, the pure premium estimator of the claim frequency component is the posterior mean

  E[N_{m+1} | n] = E[Λ | n] = w · (Σ_j n_j + α1)/(β + m) + (1 − w) · (Σ_j n_j + α1 + α2)/(β + m),  (4.11)

where w is given by (4.10).

We note that the premium estimate (4.11) does not have the credibility form it may suggest at first sight (see [Klugman et al., 2012, Chapters 17 and 18]): the function G(·) depends on the vector n through a Beta function, so we do not have posterior linearity here. Furthermore, by Stirling's formula the ratio of Gamma functions satisfies Γ(n + a)/Γ(n + b) ∼ n^{a−b} as n → ∞, allowing us to write

  G(p, α1, α2, β, n) ∼ ((1 − p)/p) · (Γ(α1)/Γ(α1 + α2)) · (β Σ_j n_j/(β + m))^{α2}  as Σ_j n_j → ∞.

Hence G grows with the accumulated claim count, so w → 0: the more claims observed, the more posterior weight is shifted to the component that includes the unforeseeable stream.
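The weight (4.10) and the posterior mean (4.11) are easy to evaluate numerically. A minimal sketch follows (the function name is ours; lbeta keeps the Beta-function ratio stable for large claim totals):

```r
# Frequency component of the Bayesian premium, (4.10)-(4.11).
freq_premium <- function(n_obs, p, alpha1, alpha2, beta) {
  m <- length(n_obs); s <- sum(n_obs)
  logG <- log((1 - p) / p) + lbeta(alpha1, alpha2) -
          lbeta(s + alpha1, alpha2) + alpha2 * log(beta / (beta + m))
  w <- 1 / (1 + exp(logG))                                          # (4.10)
  (w * (s + alpha1) + (1 - w) * (s + alpha1 + alpha2)) / (beta + m) # (4.11)
}
freq_premium(c(4, 6, 5), p = 0.6, alpha1 = 2, alpha2 = 1, beta = 0.5)
```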
In fact, the Bayesian premium calculated is a mixture of two credibility components. For instance, the first component can be split into the form of (4.8):

  (Σ_j n_j + α1)/(β + m) = m/(β + m) · (Σ_j n_j / m) + β/(β + m) · (α1/β).  (4.12)

It can thus be viewed as the credibility premium as if there were only historical claims. Similarly, (Σ_j n_j + α1 + α2)/(m + β) gives the corresponding formula when considering the existence of an unforeseeable stream; note, however, that this component keeps information on the historical/foreseeable stream through the quantities α1 and β.

For the severity component, when claims are exponentially distributed with parameter Θ, a random variable, we can set the prior as

  π(θ) = ν Δ_θ({µ}) + (1 − ν) θ^{δ−1} σ^δ e^{−σθ} / Γ(δ),  (4.13)

where Δ_θ({µ}) = 1_{{µ}}(θ) is the Dirac measure at µ and ν is as given in (3.2). Given Θ = θ, the conditional distribution of a single severity is exponential with mean 1/θ; thus the posterior is also a mixture.

We calculate a premium estimate for the severity component only, based on the observed individual amounts. Let the sample be generically of size m*, with observation vector y = {y1, ..., y_{m*}}. If the sample size of claim counts is m, then m* = Σ_{j=1}^m n_j, n_j ≥ 1. The posterior distribution is given in the following lemma and the premium form in (4.16).
Lemma 4.3. Let the prior distribution be (4.13) and the conditional distribution of a single severity, given Θ = θ, be exponential with mean 1/θ. The posterior distribution is still a mixture of the prior form, with updated parameters:

  π(θ | y) = ω · Δ_θ({µ}) + (1 − ω) · (σ + Σ_i y_i)^{m*+δ} θ^{m*+δ−1} e^{−(σ + Σ_i y_i)θ} / Γ(m* + δ),  (4.14)

where

  ω = 1/(1 + φ(ν, µ, δ, σ, y)),  (4.15)
  φ(ν, µ, δ, σ, y) = ((1 − ν)/ν) · Γ(m* + δ) σ^δ / [Γ(δ)(σ + Σ_i y_i)^{m*+δ}] · µ^{−m*} e^{µ Σ_i y_i},

with ν as given in (3.2), y = {y1, ..., y_{m*}} and Σ_i · = Σ_{i=1}^{m*} ·.

As in (4.11), the posterior mean does not exhibit linearity in the observations. The Bayesian estimator of the severity component is the weighted average of the mean of the exponential random variable with rate µ, namely 1/µ, and the posterior expectation of 1/Θ under the Gamma(m* + δ, σ + Σ_i y_i) component, namely (σ + Σ_i y_i)/(m* + δ − 1), finite whenever m* + δ > 1. That is,

  E[Y_{m+1} | y] = E[µ(Θ) | y] = ω/µ + (1 − ω)(Σ_{i=1}^{m*} y_i + σ)/(m* + δ − 1),  (4.16)

where ω is defined in (4.15). The interpretation of Formula (4.16) is similar to that of (4.11).

Remark 4.1. If m* = 0 then, by convention, Σ_i y_i = 0 and φ(ν, µ, δ, σ, y) = (1 − ν)/ν.

The Bayesian premium P_{m+1} is an estimate for the next rating period, m + 1, and can be derived from (4.11), (4.16), (4.10) and (4.15) as

  E[P_{m+1} | n, y] = E[N_{m+1} | n] E[Y_{m+1} | y]  (4.17)
  = ( w·(Σ_{j=1}^m n_j + α1)/(β + m) + (1 − w)·(Σ_{j=1}^m n_j + α1 + α2)/(m + β) )
    × ( ω/µ + (1 − ω)·(Σ_{i=1}^{m*} y_i + σ)/(m* + δ − 1) ).

In this formula we have a set of prior parameters that are most often unknown and need to be estimated from observed data, leading to the Empirical Bayesian Premium. This is not an easy task, especially when we have to deal with mixtures of distributions, meaning a large number of parameters to estimate.
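Analogously to the frequency part, the severity weight (4.15) and component premium (4.16) can be sketched as below, with the Gamma functions handled on the log scale; multiplying the two components gives the combined premium (4.17). Function names and inputs are illustrative.

```r
# Severity component of the Bayesian premium, (4.15)-(4.16).
sev_premium <- function(y, nu, mu, delta, sigma) {
  ms <- length(y); s <- sum(y)
  logphi <- log((1 - nu) / nu) + lgamma(ms + delta) + delta * log(sigma) -
            lgamma(delta) - (ms + delta) * log(sigma + s) -
            ms * log(mu) + mu * s
  omega <- 1 / (1 + exp(logphi))                             # (4.15)
  omega / mu + (1 - omega) * (s + sigma) / (ms + delta - 1)  # (4.16)
}
# Combined premium (4.17): frequency premium times severity premium.
sev_premium(c(0.8, 1.3, 2.1), nu = 0.9, mu = 1, delta = 2, sigma = 1)
```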
5 Parameter estimation

In this section we explain how we employed the Expectation-Maximization (EM) algorithm to estimate the prior parameters of our model, based on a dataset from a Portuguese insurer. Unfortunately, we could not find a dataset containing both the claim counts and the corresponding severities; data is often recorded at an aggregate level. The implementation of the algorithm depends on the distribution choice; we assumed a mixed parametric distribution for the severity as an illustration of our proposed model.

Since we are dealing with mixed distributions, direct use of the Maximum Likelihood Estimation method is not always computationally tractable, so we employ the EM algorithm to find best-fit estimates. Generally speaking, it is an iterative method to find parameter estimates when dealing with incomplete data, missing data points, or unobserved (hidden) latent variables; see Dempster et al. [1977] and Couvreur [1997]. Estimating parameters of mixed distributions can be treated as a case with unobserved latent variables. The algorithm starts by initializing parameters, from which probabilities for each possible value of the latent variable are computed; these probabilities are then used to derive better estimates of the parameters, and the process continues until convergence.
We were provided with a quarterly claim count dataset from a Portuguese motor insurance portfolio. It records the total number of claims arising from the portfolio every quarter, over a total of approximately 180 quarters. We can therefore treat each entry as an observation of N(t) for fixed t, where t stands for a quarter; in the sequel we simply write N, since N(t) is a random variable for fixed t. We considered Third Party Liability claims only, as these are the compulsory component of a motor insurance policy. It is assumed that the portfolio is closed over the underlying period and that the claim counts of the different quarters are independent observations of N.

Table 1 gives a brief summary of the data; its last two columns show the standard deviation and the coefficient of variation.

Table 1: Summary statistics of the quarterly claim counts (Min., 1st Qu., Median, Mean, 3rd Qu., Max., S.D., C.V.).

The distribution can also be visualised in the histogram of Figure 1. Clearly, the data indicate a separation around 6000, which serves as a clue for the adoption of a mixture model, as derived theoretically above. Next, we explain the estimation procedure.

Figure 1: Histogram of claim counts.
Claim frequency component
Recall that we work with a mixture model and that we only observe a global N_i ∈ ℕ; we do not know which Negative Binomial distribution it comes from. We therefore introduce a latent variable, denoted Z, representing the missing information about the provenance of each observation. Since we have two mixed distributions, Z ∼ Bernoulli(p) is a Bernoulli random variable with P{Z = 1} = p = 1 − P{Z = 0}. The complete information is given by the vector {N, Z}, and the complete likelihood function is

  L(θ | N, Z) = Π_{i=1}^m [p C(n_i+α1−1, n_i)(β/(β+1))^{α1}(1/(β+1))^{n_i}]^{z_i}
              × [(1−p) C(n_i+α1+α2−1, n_i)(β/(β+1))^{α1+α2}(1/(β+1))^{n_i}]^{1−z_i},

and, by the law of total probability, P{N} = P{N | Z = 1}P{Z = 1} + P{N | Z = 0}P{Z = 0}.

θ = {α1, α2, β, p} collects the parameters we are interested in estimating for the claim count distribution. Correspondingly, the log-likelihood function is

  ℓ(θ | N, Z) = Σ_i z_i log[p C(n_i+α1−1, n_i)(β/(β+1))^{α1}(1/(β+1))^{n_i}]
              + Σ_i (1−z_i) log[(1−p) C(n_i+α1+α2−1, n_i)(β/(β+1))^{α1+α2}(1/(β+1))^{n_i}].  (5.1)

We summarize how the algorithm works. It is an iterative process in which each iteration consists of two steps, the E-step and the M-step, standing for Expectation and Maximisation, respectively.

1. We begin with initially determined parameter values θ(0) = {α1(0), α2(0), β(0), p(0)}.

2. E-Step. For the (l+1)-th iteration, l = 0, 1, ..., we first seek the expected value of Z_i conditional on the observations and the current parameter estimates θ(l), i.e., the estimates from the previous, l-th, iteration. This is denoted τ_i:

  τ_i = E[Z_i | N, θ(l)] = 1 × P{Z_i = 1 | N_i, θ(l)} + 0 × P{Z_i = 0 | N_i, θ(l)} = P{Z_i = 1, N_i = n_i | θ(l)} / P{N_i | θ(l)}
      = p(l) C(n_i+α1(l)−1, n_i) q_l^{α1(l)} (β(l)+1)^{−n_i}
        / [ p(l) C(n_i+α1(l)−1, n_i) q_l^{α1(l)} (β(l)+1)^{−n_i}
          + (1−p(l)) C(n_i+α1(l)+α2(l)−1, n_i) q_l^{α1(l)+α2(l)} (β(l)+1)^{−n_i} ],

where q_l = β(l)/(β(l)+1). Subsequently, based on Expression (5.1), we compute the expectation of the log-likelihood with respect to the conditional distribution of Z given N under the current estimates θ(l), denoted Q(θ | θ(l)). With m independent observations,

  Q(θ | θ(l)) = E_{Z|N,θ(l)}[ℓ(θ | N, Z)]  (5.2)
  = Σ_{i=1}^m τ_i [log p + log C(n_i+α1−1, n_i) + α1 log(β/(β+1)) + n_i log(1/(β+1))]
  + Σ_{i=1}^m (1−τ_i) [log(1−p) + log C(n_i+α1+α2−1, n_i) + (α1+α2) log(β/(β+1)) + n_i log(1/(β+1))]
  = log p Σ_i τ_i + Σ_i τ_i log C(n_i+α1−1, n_i) + α1 m log(β/(β+1)) + log(1/(β+1)) Σ_i n_i
  + log(1−p) Σ_i (1−τ_i) + Σ_i (1−τ_i) log C(n_i+α1+α2−1, n_i) + α2 log(β/(β+1)) Σ_i (1−τ_i).
3. M-Step. The maximisation step finds the parameter values that maximise the function (5.2); these become the estimates used in the next iteration:

  θ(l+1) = argmax_θ Q(θ | θ(l)).  (5.3)

For this, we take the gradient of Q(θ | θ(l)), equate it to zero and solve for {α1, α2, β, p} simultaneously:

  ∂Q/∂α1 = Σ_i τ_i [ψ(n_i+α1) − ψ(α1)] + Σ_i (1−τ_i)[ψ(n_i+α1+α2) − ψ(α1+α2)] + m log(β/(β+1)) = 0;
  ∂Q/∂α2 = Σ_i (1−τ_i)[ψ(n_i+α1+α2) − ψ(α1+α2) + log(β/(β+1))] = 0;
  ∂Q/∂β = [α1 m + α2 Σ_i (1−τ_i) − β Σ_i n_i] / (β(1+β)) = 0;
  ∂Q/∂p = Σ_i τ_i / p − Σ_i (1−τ_i)/(1−p) = 0,  (5.4)

where ψ(·) is the digamma function, the logarithmic derivative of the gamma function, ψ(x) = (d/dx) log Γ(x) = Γ′(x)/Γ(x).

The estimates for the current, (l+1)-th, iteration are the solutions of these equations. Note that the last equation in (5.4) depends only on the parameter p, so it can be solved directly, with explicit representation

  p(l+1) = Σ_i τ_i / m.

For the other three parameters we solve numerically, employing the nleqslv R package, which solves systems of non-linear equations. As a consequence, we obtain the estimated parameters of this iteration, θ(l+1) = {α1(l+1), α2(l+1), β(l+1), p(l+1)}.

4. Plug θ(l+1) into the (l+2)-th iteration and repeat the E and M steps until convergence.

Initialising with method-of-moments values (α1(0) ≈ 96 and α2(0) ≈ 31, together with the corresponding moment-based values for β(0) and p(0)), the algorithm converges to the first mixture reported in Table 2, i.e., α̂1 = 97.558 and α̂2 = 30.147, with β̂/(β̂+1) = 0.0198 and p̂ close to 0.6 (cf. the point estimate of ν in (5.5) below and the concluding discussion). Based on these parameters, we add the theoretical density to the histogram, as shown in Figure 2; visually it appears to be a good fit. Note, however, that this result stemmed from initial values derived from moments, and we found that the estimates vary considerably with differently chosen initial values. For instance, if we begin with α1(0) = 150 and α2(0) = 20, convergence only happens at the 308th iteration, at the same tolerance level as above, and the corresponding estimates lie closer to these initial values: α̂1 = 141.981 and α̂2 = 28.021, with β̂/(β̂+1) = 0.0277. As can be seen in Figure 2, we can already tell that this most likely does not provide a better fit than the previous one.

The Kolmogorov-Smirnov, Anderson-Darling and Cramér-von Mises test results are summarised in Table 2. In the former case, i.e., when initial values were obtained via the method of moments, all p-values are large: we do not have enough evidence to reject the null hypothesis that the observed data agree with the proposed model. On the contrary, none of the tests presents a p-value greater than 0.01 for the second example, i.e., the arbitrarily chosen set of starting values. We can therefore conclude that the second model does not fit the data well, even at the 1% significance level.
Figure 2: Histogram with Fitted Density
Reference Distribution                          Kolmogorov-Smirnov   Anderson-Darling   Cramér-von Mises
NB1(97.558, 0.0198) and NB2(127.705, 0.0198)    0.6849               0.6316             0.682
NB1(141.981, 0.0277) and NB2(170.002, 0.0277)   0.00162              4.347e-06          0.001474

Table 2: Summary of goodness-of-fit tests (p-values).
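For completeness, a compact sketch of the whole frequency EM iteration, combining the E-step responsibilities, the closed-form update of p, and a nleqslv solve of the score equations (5.4). Starting values, iteration cap and stopping rule are placeholders; this is a sketch of the scheme described above, not the exact code used for the reported fit.

```r
library(nleqslv)

em_freq <- function(n, theta0, max_iter = 1000, tol = 1e-8) {
  a1 <- theta0[1]; a2 <- theta0[2]; b <- theta0[3]; p <- theta0[4]
  m <- length(n)
  for (l in seq_len(max_iter)) {
    q   <- b / (b + 1)
    c1  <- p * dnbinom(n, size = a1, prob = q)        # E-step
    c2  <- (1 - p) * dnbinom(n, size = a1 + a2, prob = q)
    tau <- c1 / (c1 + c2)
    p_new <- mean(tau)                                # last equation of (5.4)
    score <- function(x) {                            # first three eqs of (5.4)
      a1 <- x[1]; a2 <- x[2]; b <- x[3]
      c(sum(tau * (digamma(n + a1) - digamma(a1))) +
          sum((1 - tau) * (digamma(n + a1 + a2) - digamma(a1 + a2))) +
          m * log(b / (b + 1)),
        sum((1 - tau) * (digamma(n + a1 + a2) - digamma(a1 + a2) +
                           log(b / (b + 1)))),
        a1 * m + a2 * sum(1 - tau) - b * sum(n))
    }
    sol  <- nleqslv(c(a1, a2, b), score)$x
    done <- max(abs(c(sol, p_new) - c(a1, a2, b, p))) < tol
    a1 <- sol[1]; a2 <- sol[2]; b <- sol[3]; p <- p_new
    if (done) break
  }
  c(alpha1 = a1, alpha2 = a2, beta = b, p = p)
}
# e.g. em_freq(counts, theta0 = c(96, 31, 0.02, 0.5)), with method-of-moments
# style starting values; 'counts' is the observed vector of quarterly counts.
```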
Claim severity component
As said before, we do not have claim severity data corresponding to the claim counts, so we first simulated severities based on the claim counts used earlier: each period's claim count is followed by a sequence of claim sizes randomly generated according to (3.2), where f(·) and g(·) take the Exponential(µ) and Pareto(δ, σ) forms, respectively.

Varying the shape parameter of the Pareto part, we considered two separate scenarios when generating the claim size data:

(1) shape parameter δ > 1, i.e., finite mean;
(2) shape parameter δ ≤ 1, i.e., infinite mean.

For illustration purposes we adopted δ = 2 and δ = 0.3, respectively, for the claim cost simulations. It is well known that the Pareto distribution has a finite mean in the first scenario and an infinite mean in the second.

Before proceeding to the EM steps, let us look at the parameter ν. In fact, ν is connected to the parameters of the claim frequency component, so once the sample of claim counts is given we get an estimated value for ν. Recall from the previous sections that

  ν = p + (1 − p) · B(α1 + 1, α2)/B(α1, α2).  (5.5)

Substituting the estimated values of p, α1, α2 into (5.5), we obtain the point estimate ν̂ = 0.9039196. ν can then be treated as known for now, and we use this value in the claim cost simulation.

Thus, we are ready to implement the EM algorithm for estimating the parameters of the claim severity model.

1. We begin with initially determined parameter values ϑ(0) = {µ(0), δ(0), σ(0)}.

2. E-Step. For the (l+1)-th iteration, l = 0, 1, ..., we first seek the expected value of the latent random variable U_i, conditional on the observations and the current parameter estimates ϑ(l) from the l-th iteration. Let τ_i denote this expectation:

  τ_i = E[U_i | Y, ϑ(l)] = 1 × P{U_i = 1 | Y_i, ϑ(l)} + 0 × P{U_i = 0 | Y_i, ϑ(l)} = P{U_i = 1, Y_i | ϑ(l)} / P{Y_i | ϑ(l)}
      = ν µ(l) e^{−µ(l) y_i} / [ ν µ(l) e^{−µ(l) y_i} + (1 − ν) δ(l) (σ(l))^{δ(l)} / (σ(l) + y_i)^{δ(l)+1} ].

Subsequently, we compute the expectation of the complete log-likelihood with respect to the conditional distribution of U given Y under the current estimates ϑ(l). With m* independent observations,

  Q(ϑ | ϑ(l)) = E_{U|Y,ϑ(l)}[ℓ(ϑ | Y, U)]  (5.6)
  = Σ_{i=1}^{m*} τ_i [log ν + log µ − µ y_i] + Σ_{i=1}^{m*} (1 − τ_i)[log(1 − ν) + log δ + δ log σ − (δ + 1) log(σ + y_i)]
  = log ν Σ_i τ_i + log µ Σ_i τ_i − µ Σ_i τ_i y_i + log(1 − ν) Σ_i (1 − τ_i)
  + log δ Σ_i (1 − τ_i) + δ log σ Σ_i (1 − τ_i) − (δ + 1) Σ_i (1 − τ_i) log(σ + y_i).

3. M-Step. The maximisation step finds the parameter values that maximise (5.6), i.e.,

  ϑ(l+1) = argmax_ϑ Q(ϑ | ϑ(l)).  (5.7)

To do this, we take the gradient of Q(ϑ | ϑ(l)), equate it to zero and solve the system for {µ, δ, σ} simultaneously:

  ∂Q/∂µ = Σ_i τ_i / µ − Σ_i τ_i y_i = 0;
  ∂Q/∂δ = (m* − Σ_i τ_i)/δ + (m* − Σ_i τ_i) log σ − Σ_i (1 − τ_i) log(σ + y_i) = 0;
  ∂Q/∂σ = (m* − Σ_i τ_i) δ/σ − (δ + 1) Σ_i (1 − τ_i)/(σ + y_i) = 0.  (5.8)

If ν is also estimated within the algorithm, the additional equation is ∂Q/∂ν = Σ_i τ_i/ν − Σ_i (1 − τ_i)/(1 − ν) = 0. Implementing the EM algorithm to find the estimate ν̂ is then very similar to finding p̂ for the claim frequency component above: the equation for ν is independent of all the other variables and does not affect the remainder of the equations, so

  ν̂(l+1) = Σ_i τ_i / m*.  (5.9)

The estimates for the (l+1)-th iteration are the solutions of the above equations. It is also clear that µ is separated from the other parameters and can be solved directly, with explicit representation

  µ(l+1) = Σ_i τ_i / Σ_i τ_i y_i.

The other two parameters can only be solved numerically; again we apply the nleqslv package in R. As a consequence, we obtain the estimated parameters of this iteration, ϑ(l+1) = {µ(l+1), δ(l+1), σ(l+1)}.

4. Plug ϑ(l+1) into the (l+2)-th iteration and repeat the above steps until convergence.

We implemented the algorithm and achieved the desired parameters within a reasonable number of iterations. In Table 3 and Table 4 we show our estimates against the predefined parameters of the simulation scenarios with δ > 1 and δ ≤ 1, respectively. This verifies the effectiveness of our algorithm.
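A corresponding sketch of the severity EM, with the closed-form updates of ν (5.9) and µ, and a nleqslv solve of (5.8) for (δ, σ). As above, names, caps and starting values are placeholders.

```r
library(nleqslv)

em_sev <- function(y, theta0, max_iter = 5000, tol = 1e-8) {
  mu <- theta0[1]; del <- theta0[2]; sig <- theta0[3]; nu <- theta0[4]
  ms <- length(y)
  for (l in seq_len(max_iter)) {
    c1  <- nu * mu * exp(-mu * y)                         # E-step
    c2  <- (1 - nu) * del * sig^del / (sig + y)^(del + 1)
    tau <- c1 / (c1 + c2)
    nu_new <- mean(tau)                                   # (5.9)
    mu_new <- sum(tau) / sum(tau * y)                     # closed form
    score <- function(x) {                                # (5.8) for (delta, sigma)
      d <- x[1]; s <- x[2]; r <- ms - sum(tau)
      c(r / d + r * log(s) - sum((1 - tau) * log(s + y)),
        r * d / s - (d + 1) * sum((1 - tau) / (s + y)))
    }
    sol  <- nleqslv(c(del, sig), score)$x
    done <- max(abs(c(mu_new, sol, nu_new) - c(mu, del, sig, nu))) < tol
    mu <- mu_new; del <- sol[1]; sig <- sol[2]; nu <- nu_new
    if (done) break
  }
  c(mu = mu, delta = del, sigma = sig, nu = nu)
}
# e.g. em_sev(claims, theta0 = c(1.5, 2.5, 0.5, 0.9)), mirroring the
# initial values used in Table 3.
```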
Parameters                               µ           δ          σ          ν
Predefined                               1           2          1          0.9039196
Initial value                            1.5         2.5        0.5        0.9
After 2467 iterations (at convergence)   0.9925845   2.219456   1.159886   0.8343595

Table 3: Performance of the EM estimation on simulated claim severities (δ > 1).

The tables demonstrate satisfactory results of the EM algorithm for the mixed distribution of the claim severity component. We would like to point out, however, that the algorithm is highly sensitive to the chosen initial values; some starting values even lead to no solution of the non-linear equations solved within the algorithm. Since we knew the true parameter values here, we were able to begin with relatively reasonable initial 'guesses' of the parameter values. When applying the method to real data, the method of moments could be adopted at the beginning, similarly to the estimation of the claim frequency component explained in the previous section.
Parameters                               µ           δ          σ          ν
Predefined                               1           0.3        0.5        0.9039196
Initial value                            1.5         0.5        0.2        0.9
After 1330 iterations (at convergence)   1.001311    0.2973774  0.5146515  0.9050024

Table 4: Performance of the EM estimation on simulated claim severities (δ ≤ 1).
Figure 3: Parameter estimates over the EM iterations, for δ > 1 and for δ ≤ 1.

Figure 3 shows how the parameter estimates evolved over the iterations of the EM algorithm. Similarly, we can visualize how the distribution fits the claims through the histogram with fitted densities. In the top plot of Figure 4 we show not only the fit of the mixed distribution, but also the fits of an Exponential and of a Pareto distribution to the data simulated without extreme claims, i.e., δ = 2.

The goodness-of-fit, however, is not what we want to address here, because the claim cost data were simulated; we are more interested in the tail behaviour of the different models. The tails are plotted on a log-log scale in both Figure 4 and Figure 5, from which we can clearly see that our mixed model provides the heaviest fitted tails, regardless of whether the claim data contain extreme values or not. This implies that fitting the claims with the mixed distribution better captures the information in the tails; in other words, the mixed distribution is more sensitive in detecting large claims, which reflects our motivation for this work.
Figure 4: Histogram of the simulated claims and tail behaviours on a log-log scale (δ > 1); fitted Exponential, Pareto and mixed models.
Figure 5: Tail behaviours on a log-log scale (δ ≤ 1); fitted Pareto and mixed models.

Premium calculation

With the estimated parameters, we can calculate the Bayesian premiums according to the theoretical results discussed in Section 4. To clarify, by Bayesian premium we refer to the mean of the posterior distribution. Under our setting, the posteriors of both the frequency and the severity are mixed distributions: choosing mixed conjugate priors within the linear exponential family, we showed that the posterior is also a mixture, with updated parameters that are dynamically adjusted by incoming claims. As a reminder, we proposed a mixture of two Negative Binomial distributions for the frequency and a mixture of an Exponential and a Pareto distribution for the severity. To benchmark our proposed model, we also calculated Bayesian premiums (posterior means) under other common models, i.e., claim counts fitted by a single Negative Binomial and claim severities fitted by an Exponential or a Pareto, as in the severity data fitting. All models under comparison are listed in Table 5.
Scenario   Frequency                  Severity                                Model
1          Negative Binomial          Exponential                             1a
1          Negative Binomial          Pareto (δ > 1)                          1b
1          Mixed Negative Binomial    Mixed Exponential and Pareto (δ > 1)    1c
2          Negative Binomial          Pareto (δ ≤ 1)                          2a
2          Mixed Negative Binomial    Mixed Exponential and Pareto (δ ≤ 1)    2b

Table 5: Parametric models used for premium calculations.
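To illustrate how the posterior means of the mixed model can be tracked over rating periods, here is a self-contained sketch that accumulates the observed counts and simulated costs and evaluates the premium (4.17) after each period; function name and all inputs are placeholders for the fitted values.

```r
# Per-period Bayesian premium path of the mixed model, via (4.10)-(4.17).
premium_path <- function(counts, claims, p, a1, a2, b, nu, mu, del, sig) {
  sapply(seq_along(counts), function(m) {
    s  <- sum(counts[1:m])                    # cumulative claim numbers
    y  <- unlist(claims[1:m]); ms <- length(y)
    lG <- log((1 - p) / p) + lbeta(a1, a2) - lbeta(s + a1, a2) +
          a2 * log(b / (b + m))
    w  <- 1 / (1 + exp(lG))                   # (4.10)
    EN <- (w * (s + a1) + (1 - w) * (s + a1 + a2)) / (b + m)    # (4.11)
    lphi <- log((1 - nu) / nu) + lgamma(ms + del) + del * log(sig) -
            lgamma(del) - (ms + del) * log(sig + sum(y)) -
            ms * log(mu) + mu * sum(y)
    om <- 1 / (1 + exp(lphi))                 # (4.15)
    EY <- om / mu + (1 - om) * (sum(y) + sig) / (ms + del - 1)  # (4.16)
    EN * EY                                   # (4.17)
  })
}
# 'counts' is a vector of n_j and 'claims' a list with the y_j of each period.
```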
We can thus compute the corresponding posterior means for each period, based on the observed claim counts and simulated costs over time. Table 6 and Table 7 display the posterior means for the first 10 periods under the two separate cases distinguished by the value of the parameter δ. Note that when the shape parameter δ ≤ 1, the Pareto severity has no finite theoretical mean.

Period   Exp (1a)   Par (1b)   Mix (1c)   Counts   Costs

Table 6: Posterior means under different models (δ > 1); first 10 periods.

For a better understanding of the differences among the models, we present graphically the changes of the premiums over time. Figure 6 demonstrates the first scenario, when there are no extreme claims.
Period   Par (2a)   Mix (2b)   Costs

Table 7: Posterior means under different models (δ ≤ 1); first 10 periods.

Figure 7 is the graphical representation of the quantities in Table 7. In Figure 6, the top graph contains claim counts and costs per period, as well as the posterior means obtained from Models 1a, 1b and 1c, respectively. In general, the differences among the three models are not obvious in this first plot, so the bottom two panels show the exact differences between the pairs we are interested in, i.e., 1a vs 1c and 1b vs 1c.

Comparing our mixed model (1c) with the Exponential-only one (1a), we see significant discrepancies over time. Matching them with the trend of the claim counts/costs, we can conclude that the mixed model provides higher premiums when there is an increasing trend in claims (e.g., from period 10 to 50), while it immediately drops below the premium suggested by the Exponential model as soon as a decreasing trend is observed (e.g., from period 100 to 150). This reflects a quick response in the premium adjustment of the mixed model.

There is no such evident difference between the mixed model (1c) and the Pareto-only model (1b) overall, as can be seen in the graph at the bottom. Nevertheless, the mixed model gives slightly higher premiums when there is a surge in the claims around period 10. Another main difference is that the mixed model charges lower premiums at the very beginning, compared to the Pareto model; this matches our original intention that there should be no punishment for no claims.

To sum up, these results support our proposal of the mixed model. On the one hand, we keep the promise of not increasing premium charges until we have evidence of risks; on the other hand, the premium adjustment to actual evidence is sensitive and spontaneous.

Figure 7 exhibits the premiums (posterior means) obtained from Models 2a and 2b, respectively. They demonstrate very similar overall trends, following the claim cost behaviour. When there is a big jump in the claim severity, e.g., around period 47, both models produce an instantaneous surge in the premiums. However, due to the large scale of the claim severities, the difference between the two models is not clearly visible in the top two graphs, so a third plot was generated at the bottom. As can be seen there, the mixed model in this case actually advocates lower premiums when the claim costs are extreme, compared to the Pareto model. This could be an effect of the mixture with the Exponential distribution; from a practical point of view, however, premiums would probably be capped at some level anyway when the scale is that large.

We have been discussing mean values only. Since we have derived the posterior distributions, it is interesting to visualize other statistics, such as inter-percentile ranges (IPR). Note that the IPR reflects the variance of the posterior distribution and is also referred to as a (1 − α)% credibility interval in the Bayesian context. We will use both terminologies interchangeably in the sequel.
[Figure 6: Bayesian premiums (δ > 1). Top panel: "Premiums vs counts and costs", posterior means of the Exp, Par and Mix models together with per-period claim counts and costs; middle panel: "Differences (Mixed − Exponential)"; bottom panel: "Differences (Mixed − Pareto)". Horizontal axis: period (0 to 150).]

[Figure 7: Bayesian premiums (δ ≤ 1). Top panels: "Premiums vs Costs (Mixed)" and "Premiums vs Costs (Pareto)"; bottom panel: "Differences (Mixed − Pareto)". Horizontal axis: period (0 to 150).]
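To make the updating mechanism behind Figures 6 and 7 concrete, the following Python sketch computes the per-period posterior mean of Λ under the mixed model. It is a minimal illustration only: the function names and parameter values are our own placeholders, and the mixing weight follows from the mixture-of-Gammas posterior of Lemma 4.2 (proved in the appendix), whose two components share the rate β + m.

import numpy as np
from scipy.special import gammaln

def posterior_mean_lambda(counts, p, a1, a2, beta):
    """Posterior mean of Lambda after observing `counts` over m periods.

    The prior is the two-point Gamma mixture (A.1); by Lemma 4.2 the
    posterior is a mixture of Gamma(sum(n)+a1, beta+m) and
    Gamma(sum(n)+a1+a2, beta+m), so the posterior mean is the
    corresponding mixture of Gamma means.
    """
    m, s = len(counts), sum(counts)
    # log of G(p, a1, a2, beta, n): posterior odds of the second component
    logG = (np.log(1 - p) - np.log(p)
            + gammaln(s + a1 + a2) + gammaln(a1)
            - gammaln(s + a1) - gammaln(a1 + a2)
            + a2 * (np.log(beta) - np.log(beta + m)))
    w2 = 1.0 / (1.0 + np.exp(-logG))   # weight of the Gamma(s+a1+a2, .) part
    return ((1 - w2) * (s + a1) + w2 * (s + a1 + a2)) / (beta + m)

# Premium path: re-evaluate after each new period of observed counts.
counts = [0, 1, 0, 2, 0, 0, 3, 1]      # illustrative claim counts
path = [posterior_mean_lambda(counts[:t + 1], p=0.6, a1=1.2, a2=0.8, beta=2.0)
        for t in range(len(counts))]
print(np.round(path, 4))

Iterating this update over the simulated portfolio is what produces premium paths of the kind traced in Figures 6 and 7.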
[Figure 8: 90% credibility interval (δ > 1); 5-95 interpercentile range of the posterior premiums under Models 1a (premExp), 1b (premPar) and 1c (premMix).]

Here we have sketched the 90% credibility interval (the 5-95 IPR) for the first scenario, i.e., for Models 1a, 1b and 1c, in Figure 8. The interval contains the central 90% of possible values of the risk. As shown in the graph, the premium values of all three models lie very close to one another, with the Exponential model (red) somewhat more distinct from the other two. Overall, the credibility intervals differ most at the very beginning, possibly owing to the scarcity of data collected by then. The Exponential model has essentially the narrowest credibility interval among the three. For a comparison of the actual widths of the 90% credibility intervals of Models 1b and 1c, see Figure 9. Except for a significantly smaller credibility interval at the very beginning, the two models exhibit no difference in variation. This, however, offers one more reason to choose the mixed model: fewer uncertainties are preferred when assessing risks.

[Figure 9: Variation differences (δ > 1); width differences of the credibility intervals (Mixed − Pareto) by period, with the claim costs plotted below.]

In the extreme case, we drew a comparison graph in a similar way (Figure 10). We observe a smaller variation at the beginning and also when the claims explode. Again, this supports the mixed model from the smaller-variance perspective, especially if premium principles other than the pure premium principle adopted in our set-up (e.g. the variance premium principle) were used.

[Figure 10: Variation differences (δ ≤ 1); width differences of the credibility intervals (Mixed − Pareto) by period, with the claim costs plotted below.]

Based on the numerical results and the comparison with other commonly used models, we can make a few summary statements. On the one hand, we proposed a kind of zero-inflated model that can readily be employed by actuaries in practice. For the dataset considered here, the estimated probability of zero claims per period is approximately 60%, which is larger than under traditional models. That implies lower initial premiums, which could facilitate the sale of policies. On the other hand, even though our model assesses the risks by allowing for a larger excess of zeros, we have set up a mechanism under the Bayesian framework such that the measurement of risk adjusts in a timely manner to observed extreme events. Premiums suggested by the model are dynamically and automatically modified to incorporate the potential risks involved. In addition, the uncertainties involved in estimating the risks are relatively small in our model.

Therefore, our model could be useful in practice for its sensitive detection of risks and efficient adjustment of the risk measurement. Early detection of risky behaviour in policyholders would be most valuable for an insurer. This particularly applies to innovative policies covering, e.g., autonomous cars.

In the previous sections we assumed independence between the claim frequency and severity components. It would be opportune to investigate the effect of dependence between the two quantities. One possible approach was taken in Cheung et al. [2019], by adopting a conjugate bivariate prior distribution.
Here, we briefly introduce the idea of implementing a global likelihood that takes both components into account simultaneously for parameter estimation.

In general, a random sample will be composed of a sequence of $m$ independent pairs of dependent observations $(N_i, \vec{Y}_i)$, where $N_i$ represents the number of claims in the $i$-th period and $\vec{Y}_i = (Y_{i1}, \ldots, Y_{in_i})$ is the corresponding sequence of claim severities. If we consider that both claim counts and severities may carry information about each stream, foreseeable or unforeseeable, then stochastic dependence between $N_i$ and the corresponding $\vec{Y}_i$ should be allowed for. However, they are conditionally independent, given $\Lambda^{(i)}$ ($i = 1, 2$);
that is, for each individual stream we can consider the classical assumption of independence between claim counts and severities.

Observed values are represented by the corresponding lower-case letters $(n_i, \vec{y}_i)$. We also note that in each pair the dimension of the vector $\vec{y}_i$ depends on the observed $n_i$. We can now write a likelihood function for the joint random vector $(N, \vec{Y})$, assuming $m$ groups of observations and conditional independence of each $Y_{ij}$, $j = 1, \ldots, n_i$, for a given $i$:
$$ L(\vartheta \mid \mathbf{n}, \vec{y}) = \prod_{i=1}^{m} f_{N,\vec{Y}}(n_i, y_{i1}, \ldots, y_{in_i}) = \prod_{i=1}^{m} f_{\vec{Y}\mid N}(\vec{y}_i \mid n_i)\, P(n_i) = \prod_{i=1}^{m} \prod_{j=1}^{n_i} f_{Y\mid N}(y_{ij} \mid n_i)\, P(n_i), $$
where $P(n_i) = P(N = n_i)$, for simplification.

For our model, we can write the corresponding global log-likelihood function, denoted $\mathcal{L}(\vartheta \mid \mathbf{n}, \vec{y})$. If we let the claim counts conform to Lemma 2.1 and the claim severities follow (3.2), i.e. $f(\cdot)$ and $g(\cdot)$ are respectively Exponential and Pareto densities, we have
$$ \mathcal{L}(\vartheta \mid \mathbf{n}, \vec{y}) = \sum_{i=1}^{m} \log P(n_i) + \sum_{i=1}^{m} \sum_{j=1}^{n_i} \log f_{Y\mid N}(y_{ij} \mid n_i) $$
$$ = \sum_{i=1}^{m} \log \left\{ p \binom{n_i+\alpha_1-1}{n_i} \left(\frac{\beta}{\beta+1}\right)^{\alpha_1} \left(\frac{1}{\beta+1}\right)^{n_i} + (1-p) \binom{n_i+\alpha_1+\alpha_2-1}{n_i} \left(\frac{\beta}{\beta+1}\right)^{\alpha_1+\alpha_2} \left(\frac{1}{\beta+1}\right)^{n_i} \right\} + \sum_{i=1}^{m} \sum_{j=1}^{n_i} \log \left\{ \nu \mu e^{-\mu y_{ij}} + (1-\nu) \frac{\delta \sigma^{\delta}}{(\sigma+y_{ij})^{\delta+1}} \right\}. $$
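This global log-likelihood can be evaluated directly. The following Python sketch is a minimal illustration under our distributional assumptions (Negative Binomial mixture for the counts, Exponential/Pareto mixture for the severities); the function names and parameter values are our own placeholders, not the fitted estimates reported earlier.

import numpy as np
from scipy.special import gammaln

def log_nb(n, a, beta):
    """log pmf of a Negative Binomial: C(n+a-1, n) (b/(b+1))^a (1/(b+1))^n."""
    return (gammaln(n + a) - gammaln(a) - gammaln(n + 1)
            + a * np.log(beta / (beta + 1)) - n * np.log(beta + 1))

def global_loglik(counts, severities, p, a1, a2, beta, nu, mu, delta, sigma):
    """Global log-likelihood: NB-mixture counts plus Exp/Pareto-mixture severities.

    `counts` is a list of n_i; `severities` a list of arrays (y_i1, ..., y_in_i).
    """
    ll = 0.0
    for n_i, y_i in zip(counts, severities):
        # count contribution: two-component Negative Binomial mixture
        ll += np.logaddexp(np.log(p) + log_nb(n_i, a1, beta),
                           np.log(1 - p) + log_nb(n_i, a1 + a2, beta))
        # severity contribution: Exponential/Pareto mixture, one term per claim
        y = np.asarray(y_i, dtype=float)
        ll += np.logaddexp(
            np.log(nu) + np.log(mu) - mu * y,
            np.log(1 - nu) + np.log(delta) + delta * np.log(sigma)
            - (delta + 1) * np.log(sigma + y)).sum()
    return ll

counts = [2, 0, 1]
sev = [[1.3, 0.4], [], [5.2]]
print(global_loglik(counts, sev, p=0.6, a1=1.2, a2=0.8,
                    beta=2.0, nu=0.7, mu=1.0, delta=1.5, sigma=2.0))

In practice, a function like this could be handed to a numerical optimiser (e.g. scipy.optimize.minimize) to estimate all parameters jointly.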
Instead of building a likelihood function based on a sample of observations of the pair $(N, \vec{Y})$, we can build it using the inter-arrival times and the corresponding severities, where each observation is a bivariate pair $(T_j, Y_j)$ of, in general, dependent random variables. Denote the density and the distribution function of $T_j$ by $\varphi(\cdot)$ and $\Phi(\cdot)$, respectively.

To find the distribution of $T_j$, recall that, conditional on $\Lambda = \lambda$, $\{N(t)\}$ is a Poisson process with intensity $\lambda$. Then, given $\Lambda = \lambda$, the conditional inter-arrival times $T_j \mid \Lambda = \lambda$ are exponentially distributed with mean $\lambda^{-1}$. When the Poisson parameter is an outcome of a random variable $\Lambda$ distributed as (A.1), the unconditional distribution of $T_j$ is a mixture of two Pareto distributions whose density is given by
$$ \varphi(t_j) = p\, \frac{\alpha_1 \beta^{\alpha_1}}{(\beta+t_j)^{\alpha_1+1}} + (1-p)\, \frac{(\alpha_1+\alpha_2)\, \beta^{\alpha_1+\alpha_2}}{(\beta+t_j)^{\alpha_1+\alpha_2+1}}. \qquad (6.1) $$
On the other hand, the density of $Y_j$, conditional on a given $\Xi = \xi$, is given by the mixture, see (3.1),
$$ h_\xi(y_j) = \xi f(y_j) + (1-\xi)\, g(y_j), \qquad (6.2) $$
and the unconditional density is given by (3.2). In particular, we assumed in our model that $f \sim \mathrm{Exp}(\mu)$ and $g \sim \mathrm{Pareto}(\delta, \sigma)$.

From Remark 3.1, we know that for fixed $\Xi = \xi$, $T_j$ and $Y_j$ are independent. A random sample then consists of independent pairs $(Y_i, T_i)$, each pair being made of dependent components. Each pair has density function
$$ \varphi(y_i, t_i) = \int_\xi h_\xi(y_i)\, \varphi(t_i)\, \mathrm{d}A(\xi) = (1-p) \int_{\xi \neq 1} h_\xi(y_i)\, \varphi(t_i)\, \mathrm{d}A(\xi) + p\, f(y_i)\, \frac{\alpha_1 \beta^{\alpha_1}}{(\beta+t_i)^{\alpha_1+1}}, \qquad (6.3) $$
where $A(\xi)$ stands for the distribution function of $\Xi$. Note that we separated the cases $\xi \neq 1$ and $\xi = 1$. Also, the events $\{\Xi = 1\}$ and $\{\Lambda^{(2)} = 0\}$ are equivalent, implying that $\Pr\{\Xi = 1\} = \Pr\{\Lambda^{(2)} = 0\} = p$. Although we consider the unconditional distribution of the arrival time (unconditional on $\Lambda$), we consider the conditional distribution of the individual claim size $Y$, given $\Xi = \xi$: $\Lambda^{(1)}$, given as a proportion of $\Lambda$, remains random, which means that the distribution of $\Lambda^{(1)}$ is a scaled version of that of $\Lambda$, or vice versa. Looking at the density $h_\xi(y_i)$, we see that the split rate $\xi$ gives the weight of the foreseeable stream in the claim amount; looking at the density $\varphi(t_j)$, one might think that the probability $p$ plays a similar role for the claim count stream. This does not seem to be the case, as it is not clear that the second part of that mixture represents the unforeseeable stream only.

Returning to the joint density (6.3), we build the likelihood function over $m^*$ pairs of observations ($m^*$ may differ from the sample size $m$ of the random vector $(N, \vec{Y})$ above):
$$ L(\vartheta) = \prod_{i=1}^{m^*} \varphi(y_i, t_i) = \prod_{i=1}^{m^*} \int_\xi h_\xi(y_i)\, \varphi(t_i)\, \mathrm{d}A(\xi) = \prod_{i=1}^{m^*} \left[ (1-p) \int_{\xi \neq 1} \big(\xi f(y_i) + (1-\xi) g(y_i)\big)\, \varphi(t_i)\, \mathrm{Beta}(\xi; \alpha_1, \alpha_2)\, \mathrm{d}\xi + p\, f(y_i)\, \frac{\alpha_1 \beta^{\alpha_1}}{(\beta+t_i)^{\alpha_1+1}} \right], $$
where $\mathrm{Beta}(\xi; \alpha_1, \alpha_2)$ denotes the Beta density of $\Xi$ given $\Xi \neq 1$, and $\alpha_1 \beta^{\alpha_1}/(\beta+t_i)^{\alpha_1+1}$ is the density $\varphi(t_i)$ when $\xi = 1$. Note that we cannot interchange the product $\left(\prod_{i=1}^{m^*}\right)$ and the integral.
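A sketch of how this pairwise likelihood could be evaluated numerically is given below. One caveat: in the $\xi \neq 1$ branch we take the arrival-time density to be the $\mathrm{Gamma}(\alpha_1+\alpha_2, \beta)$-mixed Pareto component and, since $f$, $g$ and that density do not depend on $\xi$ within the branch, the inner Beta integral collapses to the Beta mean $\alpha_1/(\alpha_1+\alpha_2)$. This reading of the (deliberately loose) notation is our own assumption, as are all parameter values.

import numpy as np

def f_exp(y, mu):
    """Exponential severity density f."""
    return mu * np.exp(-mu * y)

def g_pareto(y, delta, sigma):
    """Pareto severity density g."""
    return delta * sigma**delta / (sigma + y)**(delta + 1)

def pareto_arrival(t, a, beta):
    """Inter-arrival density when Lambda ~ Gamma(a, beta)."""
    return a * beta**a / (beta + t)**(a + 1)

def pair_density(y, t, p, a1, a2, beta, mu, delta, sigma):
    """Joint density (6.3) of one (Y_i, T_i) pair, under the collapse-to-mean
    reading of the xi != 1 branch described in the text above."""
    xi_bar = a1 / (a1 + a2)                       # E[Xi | Xi != 1]
    branch_not1 = ((xi_bar * f_exp(y, mu)
                    + (1 - xi_bar) * g_pareto(y, delta, sigma))
                   * pareto_arrival(t, a1 + a2, beta))
    branch_1 = f_exp(y, mu) * pareto_arrival(t, a1, beta)
    return (1 - p) * branch_not1 + p * branch_1

def pair_loglik(ys, ts, **params):
    """Log-likelihood over m* observed (y_i, t_i) pairs."""
    dens = pair_density(np.asarray(ys), np.asarray(ts), **params)
    return float(np.sum(np.log(dens)))

print(pair_loglik([1.3, 0.4, 5.2], [0.7, 2.1, 0.2],
                  p=0.6, a1=1.2, a2=0.8, beta=2.0, mu=1.0, delta=1.5, sigma=2.0))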
To conclude on the model and the estimation procedure: this article has carried out a development, and the subsequent parameter estimation, for the so-called unforeseeable risks discussed in Li et al. [2015] for the claim counts arrival process. Precisely, for these risks a probability p has been assigned at the mass point {0}, so that the corresponding counting process is distinguished from the classical one. Since we can only observe the total claim counts of an insurance portfolio, this missing information can be estimated using the EM algorithm. Under certain assumptions on the distribution of the heterogeneities within the portfolio (denoted Λ in this work), we derived that the total claim count over a fixed period follows a Negative Binomial mixture distribution; thus, the likelihood function can be written explicitly for the EM algorithm.

We introduced a similar modelling device to account for the unforeseeable effect in the claim severities as well, by choosing a particular mixture distribution. As a result, some parameters of the severity distribution are common with those of the claim count distribution, reflecting the joint effect of an unforeseeable risk on both the number of claims and their severity.

When implemented on a dataset of claim counts from the third-party liability insurance portfolio of a Portuguese automobile insurer, the resulting estimates show high sensitivity to the chosen initial values. Hence, we employed starting values computed via the method of moments. The Kolmogorov-Smirnov, Anderson-Darling and Cramér-von Mises tests all suggest a good fit to the observed data.

Concerning the claim severity component, due to the lack of real data we randomly generated data from the proposed mixed distribution, separated into two cases depending on whether the mean values are finite or infinite. We verified the EM algorithm by comparing its output with the true parameter values used in the simulation procedure.

When compared with other traditional models, our model stands out for offering lower starting premia, faster adjustment to changes in the claims, and smaller variation in the resulting values. These are valuable features in practice and would be especially useful for pricing innovative policies, covering, e.g., autonomous cars or pandemic risks such as COVID-19.

In our model we could compute the Bayesian and credibility premia separately for claim counts and severities, owing to the independence between these two random variables. However, it would be feasible to compute a premium where the parameters (α1, α2, β, µ, δ, σ, p) are estimated altogether, using a likelihood in which N and Y are jointly distributed. We refer to it as the global likelihood function, and it serves as a direct extension of the current work (as discussed above).

Estimating a premium that takes into account both foreseeable and unforeseeable risks not only makes the premium fairer, but also helps avoid underestimating the portfolio's ruin probabilities.

Acknowledgements
Authors thank Fundación MAPFRE, through the 2016 Research Grants Ignacio H. de Larramendi, for the financial support given to the project "TAPAS (Technology Advancement on Pricing Auto inSurance)". Main research was done while Author ( ) was staying on location at the Research Centre CEMAPRE, ISEG, Universidade de Lisboa, Portugal. Authors ( ), ( ) gratefully acknowledge support from Project CEMAPRE/REM-UIDB/05069/2020, financed by FCT/MCTES (Fundação para a Ciência e a Tecnologia, the Portuguese Foundation for Science and Technology) through national funds. Authors ( ), ( ) gratefully acknowledge support by the LABEX MILYON (ANR-10-LABX-0070) of Université de Lyon, within the program "Investissements d'Avenir" (ANR-11-IDEX-0007) operated by the French National Research Agency (ANR).

References
Hans Bühlmann. Experience rating and credibility. ASTIN Bulletin, 4(3):199-207, 1967. doi: 10.1017/S0515036100008989.

Hans Bühlmann and Alois Gisler. A course in credibility theory and its applications. Springer Science & Business Media, 2006.

E. C. Cheung, W. Ni, R. Oh, and J. K. Woo. A note on Bayesian credibility with a dependent structure on the frequency and the severity of claims. Technical report, Working Paper, 2019.

Corina Constantinescu, Dominik Kortschak, and Véronique Maume-Deschamps. Ruin probabilities in models with a Markov chain dependence structure. Scandinavian Actuarial Journal, 2013(6):453-476, 2013.

Christophe Couvreur. The EM algorithm: A guided tour. In Computer Intensive Methods in Control and Signal Processing, pages 209-222. Springer, 1997.

Arthur P. Dempster, Nan M. Laird, and Donald B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), pages 1-38, 1977.

André Dubey. Probabilité de ruine lorsque le paramètre de Poisson est ajusté a posteriori. Mitteilungen der Vereinigung Schweiz Versicherungsmathematiker, 2:130-141, 1977.

Stuart A. Klugman, Harry H. Panjer, and Gordon E. Willmot. Loss models: from data to decisions, volume 715. John Wiley & Sons, 2012.

Bo Li, Weihong Ni, and Corina Constantinescu. Risk models with premiums adjusted to claims number. Insurance: Mathematics and Economics, 65:94-102, 2015.

Eugene Lukacs. A characterization of the gamma distribution. The Annals of Mathematical Statistics, 26(2):319-324, 1955.

Phil McCausland. Self-driving Uber car that hit and killed woman did not recognize that pedestrians jaywalk. 2019.

Weihong Ni, Corina Constantinescu, and Athanasios A. Pantelous. Bonus-malus systems with Weibull distributed claim severities. Annals of Actuarial Science, 8(2):217-233, 2014.

Richard A. Redner and Homer F. Walker. Mixture densities, maximum likelihood and the EM algorithm. SIAM Review, 26(2):195-239, 1984.

Waymo. Waymo team. https://waymo.com/, 2016.

Michelle J. White. Understanding the asbestos crisis. https://law.yale.edu/sites/default/files/documents/pdf/white.pdf, 2003. Unpublished, May.
Appendix A: Proofs of Lemmas within the paper
Proof of Lemma 2.1. Taking the moment generating function of $N$, as a function of $\rho$, yields
$$ M_N(\rho) = E\!\left[e^{\rho (N^{(1)} + N^{(2)})}\right] = M_{N^{(1)}}(\rho)\, M_{N^{(2)}}(\rho) = \left(\frac{\beta}{\beta+1-e^{\rho}}\right)^{\alpha_1} \left[ p + (1-p) \left(\frac{\beta}{\beta+1-e^{\rho}}\right)^{\alpha_2} \right] = p \left(\frac{\beta}{\beta+1-e^{\rho}}\right)^{\alpha_1} + (1-p) \left(\frac{\beta}{\beta+1-e^{\rho}}\right)^{\alpha_1+\alpha_2}. $$
We know that this corresponds to a weighted average of two Negative Binomial distributions, with weights $p$ and $1-p$ respectively.
Proof of Lemma 2.2. Recall that, given $\Lambda = \lambda$, $P\{N = n \mid \Lambda = \lambda\} = e^{-\lambda}\lambda^n/n!$. We now integrate over $\Lambda$, whose pdf, denoted $\pi(\cdot)$, can be written as
$$ \pi(\lambda) = p \cdot \frac{\lambda^{\alpha_1-1} \beta^{\alpha_1} e^{-\beta\lambda}}{\Gamma(\alpha_1)} + (1-p) \cdot \frac{\lambda^{(\alpha_1+\alpha_2)-1} \beta^{\alpha_1+\alpha_2} e^{-\beta\lambda}}{\Gamma(\alpha_1+\alpha_2)}. \qquad (\mathrm{A.1}) $$
Hence,
$$ P\{N = n\} = \int_\Lambda \frac{e^{-\lambda}\lambda^n}{n!}\, \pi(\lambda)\, \mathrm{d}\lambda = p \int_\Lambda \frac{e^{-\lambda}\lambda^n}{n!} \frac{\lambda^{\alpha_1-1}\beta^{\alpha_1} e^{-\beta\lambda}}{\Gamma(\alpha_1)}\, \mathrm{d}\lambda + (1-p) \int_\Lambda \frac{e^{-\lambda}\lambda^n}{n!} \frac{\lambda^{(\alpha_1+\alpha_2)-1}\beta^{\alpha_1+\alpha_2} e^{-\beta\lambda}}{\Gamma(\alpha_1+\alpha_2)}\, \mathrm{d}\lambda $$
$$ = p \binom{n+\alpha_1-1}{n} \left(\frac{\beta}{\beta+1}\right)^{\alpha_1} \left(\frac{1}{\beta+1}\right)^{n} + (1-p) \binom{n+\alpha_1+\alpha_2-1}{n} \left(\frac{\beta}{\beta+1}\right)^{\alpha_1+\alpha_2} \left(\frac{1}{\beta+1}\right)^{n}. $$
This distribution coincides with the one shown in Lemma 2.1.
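As a quick numerical check of Lemma 2.2 (a sketch under arbitrary placeholder parameters, not code from the paper), one can compare the closed-form mixture pmf with a Monte Carlo estimate of the mixed Poisson probabilities:

import numpy as np
from scipy.special import gammaln
from scipy.stats import gamma, poisson

rng = np.random.default_rng(0)
p, a1, a2, beta = 0.6, 1.2, 0.8, 2.0

def nb_mix_pmf(n):
    """Closed-form pmf of N: mixture of NB(a1, .) and NB(a1+a2, .)."""
    def log_nb(n, a):
        return (gammaln(n + a) - gammaln(a) - gammaln(n + 1)
                + a * np.log(beta / (beta + 1)) - n * np.log(beta + 1))
    return p * np.exp(log_nb(n, a1)) + (1 - p) * np.exp(log_nb(n, a1 + a2))

# Monte Carlo: draw Lambda from the two-point Gamma mixture (A.1), then N | Lambda.
shape = np.where(rng.random(200_000) < p, a1, a1 + a2)
lam = gamma.rvs(shape, scale=1 / beta, random_state=rng)
N = poisson.rvs(lam, random_state=rng)

for n in range(5):
    print(n, nb_mix_pmf(n), np.mean(N == n))   # columns should roughly agree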
Proof of Lemma 4.2. Under the observations $\mathbf{n} = \{n_1, \ldots, n_m\}$,
$$ \pi(\lambda \mid \mathbf{n}) = \frac{\prod_j \frac{e^{-\lambda}\lambda^{n_j}}{n_j!} \left\{ p \cdot \frac{\lambda^{\alpha_1-1}\beta^{\alpha_1} e^{-\beta\lambda}}{\Gamma(\alpha_1)} + (1-p) \cdot \frac{\lambda^{(\alpha_1+\alpha_2)-1}\beta^{\alpha_1+\alpha_2} e^{-\beta\lambda}}{\Gamma(\alpha_1+\alpha_2)} \right\}}{\int_\Lambda \prod_j \frac{e^{-\lambda}\lambda^{n_j}}{n_j!} \left\{ p \cdot \frac{\lambda^{\alpha_1-1}\beta^{\alpha_1} e^{-\beta\lambda}}{\Gamma(\alpha_1)} + (1-p) \cdot \frac{\lambda^{(\alpha_1+\alpha_2)-1}\beta^{\alpha_1+\alpha_2} e^{-\beta\lambda}}{\Gamma(\alpha_1+\alpha_2)} \right\} \mathrm{d}\lambda} $$
$$ = \frac{p \cdot \frac{\lambda^{\sum_j n_j+\alpha_1-1}\beta^{\alpha_1} e^{-(m+\beta)\lambda}}{\Gamma(\alpha_1)} + (1-p) \cdot \frac{\lambda^{\sum_j n_j+(\alpha_1+\alpha_2)-1}\beta^{\alpha_1+\alpha_2} e^{-(m+\beta)\lambda}}{\Gamma(\alpha_1+\alpha_2)}}{p \cdot \frac{\Gamma(\sum_j n_j+\alpha_1)}{\Gamma(\alpha_1)} \left(\frac{\beta}{\beta+m}\right)^{\alpha_1} \left(\frac{1}{\beta+m}\right)^{\sum_j n_j} + (1-p) \cdot \frac{\Gamma(\sum_j n_j+\alpha_1+\alpha_2)}{\Gamma(\alpha_1+\alpha_2)} \left(\frac{\beta}{\beta+m}\right)^{\alpha_1+\alpha_2} \left(\frac{1}{\beta+m}\right)^{\sum_j n_j}} $$
$$ = \frac{1}{1+G(p,\alpha_1,\alpha_2,\beta,\mathbf{n})} \cdot \frac{(\beta+m)^{\sum_j n_j+\alpha_1}\, \lambda^{\sum_j n_j+\alpha_1-1}\, e^{-(m+\beta)\lambda}}{\Gamma(\sum_j n_j+\alpha_1)} + \frac{G(p,\alpha_1,\alpha_2,\beta,\mathbf{n})}{1+G(p,\alpha_1,\alpha_2,\beta,\mathbf{n})} \cdot \frac{(\beta+m)^{\sum_j n_j+\alpha_1+\alpha_2}\, \lambda^{\sum_j n_j+\alpha_1+\alpha_2-1}\, e^{-(m+\beta)\lambda}}{\Gamma(\sum_j n_j+\alpha_1+\alpha_2)}. $$
Proof of Lemma 4.3. Under the observations $\mathbf{y} = \{y_1, \ldots, y_{m^*}\}$, and letting $\Pi_\Theta(\cdot)$ be the distribution function of $\Theta$,
$$ \pi(\theta \mid \mathbf{y}) = \frac{\left(\prod_{i=1}^{m^*} \mu e^{-\mu y_i}\right) \nu\, \Delta_\theta(\{\mu\}) + \left(\prod_{i=1}^{m^*} \theta e^{-\theta y_i}\right) (1-\nu)\, \frac{\theta^{\delta-1}\sigma^{\delta} e^{-\sigma\theta}}{\Gamma(\delta)}}{\int_\Theta \left(\prod_{i=1}^{m^*} \theta e^{-\theta y_i}\right) \mathrm{d}\Pi_\Theta(\theta)} $$
$$ = \frac{\nu\, \mu^{m^*} e^{-\mu \sum_i y_i}\, \Delta_\theta(\{\mu\}) + (1-\nu)\, \frac{\theta^{m^*+\delta-1}\sigma^{\delta} e^{-(\sigma+\sum_i y_i)\theta}}{\Gamma(\delta)}}{\nu\, \mu^{m^*} e^{-\mu \sum_i y_i} + (1-\nu)\, \frac{\Gamma(m^*+\delta)\, \sigma^{\delta}}{\Gamma(\delta)\, (\sigma+\sum_i y_i)^{m^*+\delta}}} $$
$$ = \frac{1}{1+\varphi(\nu,\mu,\delta,\sigma,\mathbf{y})}\, \Delta_\theta(\{\mu\}) + \frac{\varphi(\nu,\mu,\delta,\sigma,\mathbf{y})}{1+\varphi(\nu,\mu,\delta,\sigma,\mathbf{y})} \cdot \frac{(\sigma+\sum_i y_i)^{m^*+\delta}\, \theta^{m^*+\delta-1}\, e^{-(\sigma+\sum_i y_i)\theta}}{\Gamma(m^*+\delta)}. $$
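Analogously, the Lemma 4.3 posterior can be evaluated numerically. The sketch below (our own illustration; all parameter values are placeholders) returns the weight of the continuous Gamma part and, as an example statistic, the posterior mean of $\Theta$:

import numpy as np
from scipy.special import gammaln

def severity_posterior(ys, nu, mu, delta, sigma):
    """Posterior over Theta from Lemma 4.3: point mass at mu with weight
    1/(1+phi), plus Gamma(m* + delta, sigma + sum(y)) with weight phi/(1+phi).

    Returns (weight of the Gamma part, its shape, its rate).
    """
    y = np.asarray(ys, dtype=float)
    m, s = len(y), y.sum()
    # log phi: log ratio of the continuous-part to point-mass marginal terms
    log_phi = (np.log(1 - nu) - np.log(nu)
               + delta * np.log(sigma) + gammaln(m + delta) - gammaln(delta)
               - (m + delta) * np.log(sigma + s)
               - (m * np.log(mu) - mu * s))
    w_gamma = 1.0 / (1.0 + np.exp(-log_phi))
    return w_gamma, m + delta, sigma + s

params = dict(nu=0.7, mu=1.0, delta=1.5, sigma=2.0)
w, shape, rate = severity_posterior([1.3, 0.4, 5.2, 0.9], **params)
post_mean_theta = (1 - w) * params["mu"] + w * shape / rate   # E[Theta | y]
print(w, post_mean_theta)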