Modelling and understanding count processes through a Markov-modulated non-homogeneous Poisson process framework
MModelling and understanding count processes through a Markov-modulatednon-homogeneous Poisson process framework
Benjamin Avanzi a , Greg Taylor b , Bernard Wong b , Alan Xian b, ∗ a Centre for Actuarial Studies, Department of EconomicsUniversity of Melbourne VIC 3010, Australia b School of Risk and Actuarial Studies, UNSW Business SchoolUNSW Sydney NSW 2052, Australia
Abstract
The Markov-modulated Poisson process is utilised for count modelling in a variety of areas such as queueing,reliability, network and insurance claims analysis. In this paper, we extend the Markov-modulated Poissonprocess framework through the introduction of a flexible frequency perturbation measure. This contributionenables known information of observed event arrivals to be naturally incorporated in a tractable manner,while the hidden Markov chain captures the effect of unobservable drivers of the data. In addition toincreases in accuracy and interpretability, this method supplements analysis of the latent factors. Further,this procedure naturally incorporates data features such as over-dispersion and autocorrelation. Additionalinsights can be generated to assist analysis, including a procedure for iterative model improvement.Implementation difficulties are also addressed with a focus on dealing with large data sets, where latentmodels are especially advantageous due the large number of observations facilitating identification of hiddenfactors. Namely, computational issues such as numerical underflow and high processing cost arise in thiscontext and in this paper, we produce procedures to overcome these problems.This modelling framework is demonstrated using a large insurance data set to illustrate theoretical,practical and computational contributions and an empirical comparison to other count models highlight theadvantages of the proposed approach.
Keywords:
Risk analysis, Markov processes, Count processes, Data analysis, EM algorithm
1. Introduction
Count modelling and the analysis of the occurrence of events is common to a wide variety of fields.The Markov-modulated Poisson process (MMPP), which is a continuous time model within the generalfamily of Hidden Markov models (HMMs), is used in a broad spectrum of count modelling applications. Itprovides a natural extension upon the widely-used Poisson process in situations where an observable eventprocess responds to external environmental stimuli. The MMPP model is composed of two components: (i)a stochastic process representing observable events over time, and (ii) a latent component that modulatesthe observed process. One benefit of this approach compared to the classical Poisson model stems fromthe greater modelling flexibility provided by the modulation, in particular to achieve over-dispersion. Forexample, in the context of modelling economic demand such as in Nasr and Maddah (2015) and Arts (2017),the hidden Markov process proxies the external environmental drivers of demand observations which canconsist of various financial, economic, social and political factors. Similarly, in insurance literature such as ∗ Corresponding author.
Email addresses: [email protected] (Benjamin Avanzi), [email protected] (Greg Taylor), [email protected] (Bernard Wong), [email protected] (Alan Xian)
Preprint submitted to Elsevier May 29, 2020 a r X i v : . [ q -f i n . R M ] M a y uillou et al. (2013), claim arrivals are influenced by a variety of different environmental drivers dependingon the line of business under consideration. The MMPP model provides a more faithful representation ofreal-world circumstances in this case because such drivers may be unobservable or extremely difficult tomodel, resulting in “hidden” states.The MMPP model is used in a number of other fields for comparable reasons. It is commonly found inthe natural science literature to model phenomena from birth-related movements (Leroux and Puterman,1992) and photon arrivals (Kou et al., 2005) to rainfall (Thayakaran and Ramesh, 2013b) and populations(Langrock et al., 2013). Aside from the various economic and financial areas previously mentioned, MMPPsare popular in the areas of network theory, telecommunications and data traffic modelling such as in Yoshi-hara et al. (2001), Salvador et al. (2003), Scott and Smyth (2003) and Casale et al. (2016). Finally, thisapproach is utilised in the literature on risk, inventory, reliability and queueing theory to address unrealisticPoisson process assumptions through modulation, for example, in ¨Ozekici and Parlar (1999), Ching (1997),Landon et al. (2013) and Arts et al. (2016).In the above papers, it is sometimes possible to apply techniques (usually Bayesian in nature) to gaininference on the latent Markov process within the MMPP. This is generally an endeavour that producessignificant value, as it enhances the tractability and interpretability of the model outputs. For example, inthe case of software reliability analysis as presented in Landon et al. (2013), the hidden process is inferredto represent the state of the software, with labels assigned of “excellent”, “good” or “bad”. An intuitiveextension upon this concept is the idea that there may exist seasonality in the modulating process which hasled to the development of seasonal MMPPs in the literature, such as the model presented in Guillou et al.(2015) for insurance claim counts where the latent process was assumed to represent the seasonal influenceof various geological and climatological phenomena.However, in any modelling exercise, there usually exist known and modellable causes of event arrivals,and these do not necessarily need to be cyclical or seasonal effects. This situation has garnered little attentionin the MMPP-related academic literature, although the idea itself is not new with some discussion in theinsurance context in Norberg (1993). Consider the case of insurance claims where the number of insurancepolicies under cover is clearly related to the observed claim arrivals but is unable to be taken into account asthis factor is not necessarily periodic. An argument may also be made that such an intuitive driver of claimsshould be modelled explicitly rather than delegated to approximation through hidden regimes. Alternativeexamples can be easily conjured for the other fields, such as promotions producing spikes in online trafficas well as the demand (and consequently, purchases) for goods and services. Unfortunately, it is difficult tostructure this information into the standard MMPP model, as regime selection is generally an automaticprocedure. In this paper, an extension to the standard MMPP model is introduced which includes a component toexplicitly control for the frequency perturbation caused by known factors. This measure is chosen to be veryflexible, so that both cyclical and non-recurring trends can be captured. The introduced heterogeneity hereproduces the Markov-modulated non-homogeneous
Poisson process (MMNPP).The advantages of the proposed framework are many. Firstly, the original benefits of MMPPs in analysingcount processes that are outlined above are retained. Secondly, the distinct separation of the known andhidden drivers of event arrivals enhances the insights obtained. This greatly benefits the utility of the ap-proach, as inappropriate or insufficient consideration of the known components can confound any inferencesobtained about the unobservable regimes of the hidden Markov chain. By explicitly taking these factors intoaccount, the modelling outputs may become more interpretable and this is demonstrated in an empiricalcase study in Section 5 where the MMNPP model is utilised as a diagnostic tool for generating inferenceswhich further assist the modelling procedure. Data features such as over-dispersion, auto-correlation andpersistence in regimes are also able to taken into account in an understandable manner. This includedinformation, in addition to the domain knowledge structured into the model, can ultimately lead to gainsin accuracy and precision.While non-homogeneous extensions to the MMPP model have not been widely explored in the literature(see Thayakaran and Ramesh (2013a); Guillou et al. (2015) for the examples that the authors are aware of),2omparable models have been successfully utilised for the more general family of Hidden Markov models.In these cases, the latent model is implemented to capture various unobservable factors with the non-homogeneous component supplementing this analysis. For example, Montoya et al. (2010) apply a non-homogeneous HMM to assess the effects of pharmaceutical marketing activities on physicians, taking intoaccount their heterogeneity and behaviour dynamics. Shi and Zhang (2014) apply a non-homogeneous HMMto investigate decision aids for online purchases with time-varying parameters across the latent Markov states.Further examples of applications include buyer-seller dynamics in Zhang et al. (2014) as well as the modellingof phenomena such as precipitation (Hughes et al., 1999; Greene et al., 2011) and pollution concentration(Lagona et al., 2011). This approach is also found in insurance analysis in Badescu et al. (2019), whouse a Pascal-HMM to model claim arrivals. Analogously, our proposed non-homogeneous extension willprovide a large amount of flexibility and practicability to the MMNPP model in order to improve analysisin the varied areas where it is applied. Further, in comparison to the HMM approach, our model does notrequire considerations of discretisation as both the arrival processes and the underlying Markov chain arecontinuous. It is noted in some papers such as Badescu et al. (2019) and Crevecoeur et al. (2019) that choiceof granularity and appropriate assumptions can be a non-trivial task, and the time-scale invariance of thecontinuous approach is advantageous.Another contribution in this paper is that we show that the MMNPP is a specific operational timetransform of the standard MMPP. The form of the transform depends on the frequency perturbation measurewhich provides much intuition to the modelling process which aids the interpretability of the final modeloutputs.Finally, implementations found in the literature generally utilise examples with sample sizes that arerelatively small. While this may be appropriate in certain areas, consideration of larger data sets is of merit,particularly given the current popularity of analytic models that are applied to such data sets. Further, thelarge number of observations advantages latent models such as the MMNPP due to clearer identification ofhidden regimes.Under these circumstances, calibration problems such as long computation times and numerical insta-bility are greatly exacerbated. Without addressing these problems, applicability of the proposed approachis more limited. However, it is exactly these situations that are ideal for analysis using hidden Markovmodels as the unobservable regimes are more clearly identified. In this paper, we produce an Expectation-Maximisation (EM) calibration algorithm that introduces several computational developments to reducecalibration time, allowing the model to be calibrated in a reasonable period of time when there is a largeamount of data to be processed. Further, the issue of numerical underflow is dealt with through a scalingprocedure applied to the forward-backwards recursions. These practical considerations and contributionsare outlined in the next section with context provided to demonstrate the necessity of these developments,as otherwise, the model is unable to be implemented and thus provides little utility.In the following section, we outline the insurance claim analysis context that serves as a motivatingexample for the developments within this paper. The MMNPP model specifications are then presented inSection 2. In Section 3, we develop a calibration procedure including various techniques to alleviate thecomputational problems explained previously. In Section 5, we implement the model on a real data setto highlight the inferential capabilities of our proposed approach. We also provide a comparison to othermodels to highlight the advantages provided by the MMNPP. Section 6 concludes.
The empirical case study in Section 5 examines daily claim counts for a motor insurance line of businessfor a large Australian non-life insurer. The analysis of insurance claim occurrences is a common procedureand serves to inform many insurance business operations. There is a large amount of actuarial scienceliterature on this topic but until recently, analysis has exploited data aggregated over longer discrete timeframes such as months or years. Weekly and daily granularity of claim count data has been investigatedby only a few papers such as Verrall and W¨uthrich (2016); Avanzi et al. (2016); Crevecoeur et al. (2019)which deal with the reporting delay aspect of insurance claims while other nuances such as seasonality aredemonstrated in Guillou et al. (2015); Albrecher et al. (2019). The benefits of analysis at this greater level3f detail include the generation of insights that were previously masked by aggregation and discretisation,which can lead to greater accuracy and precision.Insurance claim count modelling encounters several obstacles that are significantly alleviated by theinnovations made in this paper. Particularly for well known lines of business such as Motor, insurers possessestablished domain knowledge of important covariates that influence claim observations and the inability toincorporate this information into analysis would hamper to utility of any such modelling approach. Secondly,analysis at greater granularity results in a very large number of individual data points as opposed to theoriginal aggregated observations (significantly more than half a million in our real data case study). Thisaggravates calibration issues common to hidden Markov models such as numerical instability as well asconsiderations such as calibration time and processing power required. Our developments allow the modelto be calibrated on standard computing software in a reasonable amount of time. Finally, the insights inSection 5 highlight the diagnostic abilities of the modelling framework, including the capacity for iterativemodel improvement. This capability is particularly relevant for insurance claim modelling as analysis at thislevel of detail is not commonly undertaken and we demonstrate that additional interpretable causes of claimobservations are able to be extracted. These drivers can then be examined and incorporated as deemed fitby the modeller.
2. Model Specifications for the MMNPP
We assume that event arrivals, over the time interval [0 , T ], follow a Markov-modulated non-homogeneousPoisson process N M = { N M ( t ); t ≥ } . Here, N M ( t ) represents the total number of events arriving beforetime t , with N M (0) = 0. There is also assumed to be an unobservable underlying environmental process thatimpacts the event arrival intensity. This process is modelled with a continuous time Markov chain M ( t ) withfinite discrete state space 1 , , . . . , r . We denote this latent process M = { M ( t ); t ≥ } , and it is assumedthat the chain has constant stationary transition intensities between states i and j of q i,j . Consequently, ifthe n -th state change time is denoted by u n , then the duration of the n -th state is exponentially distributedand given by P [ u n +1 − u n > t | M ( u n ) = i ] = exp ( − q i t ) , where q i = (cid:88) j (cid:54) = i q i,j , (2.1)where P is the probability function.When the Markov process is in state i at time t , events arrive according to the process N M with intensityrate λ M ( t ) = λ M ( t ) × γ ( t ) . (2.2)Here, λ M ( t ) is the constant intensity conditional on the state of the underlying Markov chain at time t , and γ represents the known exogenous volume or exposure process. The latter is at the heart of the contributionsof this paper, as discussed in detail in the next section. This γ -process is very flexible and can take the formof any desired function of bounded variation which covers the great majority of all possible functions thatmay arise in practical modelling circumstances.The standard expressions for the conditional Poisson processes are easily obtained under this framework.The probability of n events occurring before time t is given by P [ N M ( t ) = n | M ( s ); 0 ≤ s ≤ t ] = exp (cid:18) − (cid:90) t λ M ( s ) γ ( s ) ds (cid:19) (cid:16)(cid:82) t λ M ( s ) γ ( s ) ds (cid:17) n n ! . (2.3)From this expression, properties of the inter-arrival times of events may also be obtained. Let t n be thearrival time of the n -th event. Then P [ t n +1 > t n + t | t n , M ( s ); t n ≤ s ≤ t n + t ] = exp (cid:18) − (cid:90) t n + tt n λ M ( s ) γ ( s ) ds (cid:19) . (2.4)In the paper, the overall MMNPP process to describe the hidden regimes and event arrivals is denoted as { M, N M } = { M ( t ) , N M ( t ); t ≥ } . 4 .1. The exposure measure γ In the above, γ ( t ) is a scalar function representing the known exogenous volume or exposure process. This γ -process is very flexible and theoretically only needs to be a function of bounded variation, which allowsfor both jump-type and removable discontinuities (although the latter would rarely occur in practice). Forthe purposes of calibration using the method outlined in Section 3, we require that any applicable functioncan be adequately approximated by a number of constant-valued intervals. This is a fairly weak assumptionas the simple function approximation is very general, allowing for a wide range of functions to be used suchas continuous and finitely dis-continuous functions. It would be difficult to envisage a volume process in anypractical circumstance that would be suitably irregular to violate this assumption. One example of such anextreme case would be a function consisting of purely discontinuous singletons on the real line, but we couldnot think of any practical situation where the volume function would not satisfy our requirements. Thereasoning behind this constraint is that it greatly reduces the computational burden in evaluating severalcomplex expressions involving matrix differential equations. This is detailed within the proof for Theorem3.2 in Section 3.4. The multiplicative structure of the overall Poisson intensity is also theoretically justifiedin Section 2.3 but it is intuitively explainable. For example, if the number of consumers within the marketfor a certain good doubles, then it is reasonable to expect that the observed conditional Poisson intensityof purchases of the good also doubles, ceteris paribus. The MMNPP can be interpreted as a operational time transformation of the a homogeneous MMPP,which provides a convenient, intuitive interpretation of the model. This argument follows in a similar mannerto the transformation applicable to non-homogeneous Poisson processes and is outlined below.We begin with the definition of an operational time scale that will be applied.
Definition 2.1.
Consider an operational time scaling function ρ applied to time t , where ρ ( t ) is a mono-tonically increasing function in t and ρ (0) = 0 . As a result, ρ − ( t ) is also well defined. This produces theoperational time scale ρ ( t ) . Using ρ , we can define a new time-transformed process N M ( ρ − ( t )) : t → ρ − ( t ) N M ( t ) → N M ( ρ − ( t ))We will also use the following specific operational time function ρ : Definition 2.2.
In the context of the MMNPP model, the operational time function ρ takes the form ρ ( t ) = (cid:90) t γ ( x ) dx. It is shown in Theorem 2.3 below that the original MMNPP process { M ( t ) , N M ( t ); t ≥ } under thistransformation becomes { M ( t ) , N M ( ρ − ( t )); t ≥ } , where the component N M ( ρ − ( t )) is a homogeneousPoisson process with constant conditional intensity rate λ M ( t ) at time t . Observe that in the above, the timescaling has only been applied to the conditional Poisson process and not the hidden Markovian process. Theorem 2.3.
We use the previously defined time transform from t to ρ − ( t ) : t old time → ρ − ( t ) new time , where ρ ( t ) = (cid:82) t γ ( x ) dx is a monotonically increasing function in t with ρ (0) = 0 . This transform is applied tothe conditional Poisson component of a MMNPP process { M ( t ) , N M ( t ); 0 ≤ t ≤ T } with transitions q i,j andintensities λ M ( t ) × γ ( t ) . The resultant process is the homogeneous MMPP process { M ( t ) , N M ( ρ − ( t )); 0 ≤ t ≤ T } with transitions q i,j and intensities λ M ( t ) . roof of Theorem 2.3 Although the result is very intuitive, for completeness we provide a proof here.We require the following preliminary results. For readability, the proofs of the Lemmas are provided insteadin the appendix.
Lemma 2.4.
The time transformed process { M ( t ) , N M ( ρ − ( t )) } retains the Markov property Proof of Lemma 2.4
Please refer to Appendix A.
Lemma 2.5.
The MMPP process { M ( t ) , N M ( t ) } is homogeneous over period [ t , t ] if and only if P [ N M ( t ) = n, N M ( t ) = n ] = exp (cid:34) − m (cid:88) k =1 λ s k ( u k − u k − ) (cid:35) where k is a counter for the number of regime periods that occur in [ t , t ] , s k is the ordinal number of the k -th regime and regime changes occur within this interval at times u , u , . . . , u m with u = t and u m = t . Proof of Lemma 2.5
Please refer to Appendix A.In order to satisfy the non-decreasing property of ρ , γ is restricted to being strictly positive (which is anintuitively correct restriction for an exposure measure). Using k as the number of regime changes between[ t , t ], we obtain P (cid:2) N M ( ρ − ( t )) = n, N M ( ρ − ( t )) = n (cid:3) = P [ N M ( t ) = n, N M ( t ) = n ] (2.5)= exp (cid:20) − (cid:90) t t λ M ( x ) × γ ( x ) dx (cid:21) (2.6)= exp (cid:34) − m (cid:88) k =1 (cid:90) u k u k − λ s k × γ ( x ) dx (cid:35) (2.7)= exp (cid:34) − m (cid:88) k =1 λ s i (cid:90) u k u k − γ ( x ) dx (cid:35) (2.8)= exp (cid:34) − m (cid:88) k =1 λ s k [ ρ ( u k ) − ρ ( u k − )] (cid:35) (2.9)The proof is concluded with the observation that the times ρ ( u k ) correspond to the regime change times u k in the underlying Markov chain M . Thus, using Lemmas 2.4 and 2.5, the operational time adjustmentas defined above changes the original non-homogeneous MMPP into a homogeneous MMPP when the non-homogeneous intensities are in the form λ ( t ) = λ M ( t ) × γ ( t ). We have defined the non-homogeneous Poisson intensity λ M ( t ) = λ M ( t ) × γ ( t ), where γ ( t ) was a knownexogenous process that reflected external risk exposures. In addition to the explanation in the previoussection, The form of this intensity can be theoretically and intuitively justified in the following manner. Ifwe have multiple (homogeneous) MMPP processes with the same modulating Markov process, then we canshow that the standard additivity property of Poisson processes still holds. This result follows directly fromProposition 2.6 below. Proposition 2.6.
Let there be two MMPP processes { M ( t ) , N M ( t ) } and { M ( t ) , N M ( t ) } with the sameunderlying Markov chain { M ( t ) } and claim intensities λ M ( t ) and λ M ( t ) respectively. We also have that theconditional Poisson processes N iM ( t ) are independent for i = 1 , . Then the combined process { M ( t ) , N , M ( t ) } = { M ( t ) , N M ( t ) + N M ( t ) } is also an MMPP with a conditional Poisson process N , M ( t ) that has a claim intensity of λ M ( t ) = λ M ( t ) + λ M ( t ) . roof of Proposition 2.6 Please refer to Appendix BGeneralising Proposition 2.6 to n event processes of form { M ( t ) , N iM ( t ) } i =1 ,...,n with the same intensity λ M ( t ) (for example, risk events arriving from n similar insurance policies or arrivals to n similar queues atairport check-in counters), the combined MMPP process has a claim intensity of λ M ( t ) × n . If this exposureprocess is allowed to vary over time, then the event intensity function can be thought of as λ M ( t ) × γ ( t ),where γ ( t ) is the volume or risk exposure measure. This interpretation also applies when the known function γ is a combination of multiple factors, in which case the multiplicative assumption is convenient. In thiscase, the overall event intensity would also be of the form λ M ( t ) × γ ( t ), where γ is now the multiplicativecombination of the impact of the considered factors.
3. Calibration of the MMNPP framework
The calibration of the MMNPP model is a two-step procedure and is detailed in the following section.The steps involved are1. Calibration of the exposure/volume component γ ( t ).2. Calibration of the hidden Markov-modulated Poisson component, specifically event frequency andregime transition parameters λ M and q i,j respectively. γ Firstly, modellers will combine their domain knowledge and data analysis to generate a model for theexposure/volume component of the framework, which represents the known or “explicitly modellable” infor-mation. One key advantage of our proposed approach is that this component is modular and many differentmodels can be chosen depending on the modelling context. For example, for the purpose of modellinginsurance claims, generalised linear models (GLMs) are commonly applied in practice, although there hasbeen recent exploration into the usage of more sophisticated data-driven methods such as trees and neuralnetworks such as in Gabrielli et al. (2019). All such methods are applicable here due to the flexibility of themeasure γ . Note that without loss of generality, the scalar process ρ is defined relative to some base level,such as the exposure/volume at the beginning of the period.The values obtained from the exposure component of the framework then serve as inputs in the hiddenMarkov chain calibration. Intuitively, the hidden Markov chain captures any residual temporal effects thatwere not explicitly included in the model for the exposure component. The details for this secondary stepin the calibration is provided in the following section. The calibration of the hidden Markov component of the MMNPP model utilises an adapted form of theExpectation Maximization algorithm for MMPPs originally proposed in Ryd´en (1994). Alternative methodsexist in the literature for the homogeneous MMPP models such as moment-matching (Rossiter, 1989; Gusella,1991), other likelihood-based estimation approaches (Meier-Hellstern, 1984) and Markov chain Monte Carlo(MCMC) sampling, the last of which has been popular in the literature in recent times. Ryd´en et al. (2008)examine the EM vs MCMC estimation for the general family of Hidden Markov models (HMM). The papertakes a computational perspective, and concludes that in the case where only point estimates are required ormodel comparison only requires likelihoods, the EM algorithm is preferred. In other cases, either approachhas advantages and disadvantages and the comparison is less clear.When applying the model to large data sets, as is the case in Section 5, the relative computationalsimplicity of the EM algorithm is beneficial. For MMPPs, closed form expressions for the estimators of thealgorithm can be obtained, alleviating the high computational requirements. We extend those results to theMMNPP case in this paper. Additional improvements to the algorithm have been made to further reducecomputation times and to resolve the previously mentioned issue of numerical instability.A very brief summary of one loop of the iterative algorithm is the following (note that the estimatorsand expressions for their computation will be introduced in detail later):7. Using current estimates of the regime transition and regime frequency parameters Q and Λ respectively,evaluate the E-step estimators ˆ a , ˆ n , ˆ T and ˆ T ∗ .2. Using current estimates of the E-step estimators ˆ a , ˆ n , ˆ T and ˆ T ∗ , evaluate new estimates for Q and Λ.3. Stop the algorithm based on some termination criteria (more details in Section 3.7).The following sections are organised as follows. In Section 3.3, the complete likelihood function that ismaximised through the EM algorithm is derived, taking into account the introduced non-homogeneity. InSection 3.4, the closed form expressions for the E-step estimators given with derivations provided in theappendix. In order to overcome numerical instability issues, a scaling approach is applied to the recursionequations, as outlined in Section 3.5. This approach is adapted from the algorithm for MMPPs from Robertset al. (2006). We also adapt a result from Van Loan (1978) to increase computational efficiency whenevaluating integrals of matrix exponentials, which was previously the most computationally burdensomecomponent of the algorithm. If the algorithm was instead forced to rely on quadrature methods for theseevaluations, for example, the calculations would quickly become too cumbersome for even moderate numbersof claims to be viably analysed. Finally, during the maximisation step shown in Section 3.3, the traditionaltime-based estimators presented in Ryd´en (1996) are now split into a time-based and operational time-basedestimator, which follows intuitively from the operational time transform described in the previous section.Section 3.7 concludes with a brief summary of the iterative calibration procedure. The EM algorithm is an iterative procedure that seeks to maximise the complete likelihood function(that is, the likelihood function assuming knowledge of the hidden components). In the following section,we produce the complete likelihood function as well the necessary E-step estimators for the algorithm.The complete form of likelihood function in the case of the MMNPP model, assuming knowledge of bothevent arrival times and regime change times, is what needs to be maximised in this case. We introduce thefollowing notation:1. Jumps between states/regimes occurs at time points 0 = u < u < u < . . . < u m < u m +1 = T ,where T is the end of the observation period.2. The regime time interval is I k = [ u k − , u k ) for 1 ≤ k ≤ m + 1.3. The regime of the process M ( t ) in the interval I k is given by s k ∈ { , , . . . , r } .4. The number of events occurring in regime period I k is given by z k ∈ N .5. The total number of events is denoted n .6. Event arrivals occur within I k at time points t k, , t k, , . . . , t k,z k and t k, = u k − .Further, the following result is required in order to derive the complete likelihood function: Lemma 3.1.
Consider the regime time interval I k = [ u k − , u k ) of a non-homogeneous MMPP with claimintensity λ ( t ) = λ s k × γ ( t ) where the underlying Markov chain M ( t ) does not change states, and λ s k isthe constant component of the claim intensity within I k . The joint density of claim times T i , given that z k claims occur within this interval, is given by the following expression P [ T k, = t k, , T k, = t k, , . . . , T k,z k = t k,z k | N M ( u k ) − N M ( u k − ) = z k ] = z k ! (cid:16)(cid:82) u k u k − γ ( s ) ds (cid:17) z k z k (cid:89) i =1 γ ( t k,i ) Proof of Lemma 3.1
It can be seen that P [ T k, = t k, , T k, = t k, , . . . , T k,z k = t k,z k | N M ( u k ) − N M ( u k − ) = z k ]= P [ T k, = t k, , T k, = t k, , . . . , T k,z k = t k,z k , N M ( u k ) − N M ( u k − ) = z k ] P [ N M ( u k ) − N M ( u k − ) = z k ]8ote that the numerator of the above can be rewritten in the following manner: P [ T k, = t k, , T k, = t k, , . . . , T k,z k = t k,z k , N M ( u k ) − N M ( u k − ) = z k ]= P [ z k claims between [ u k − , u k ) with claim times t k, , t k, , . . . , t k,z k ]= z k (cid:89) i =1 (cid:20) P [ no claims until time t k,i ] × P [ claim at time t k,i ] (cid:21) × P [ no claims until end of interval u k ]= z k (cid:89) i =1 (cid:34) exp (cid:32) − (cid:90) t k,i t k,i − λ s k γ ( y ) dy × λ s k (cid:33) γ ( t k,i ) (cid:35) × exp (cid:32) − (cid:90) u k t k,zk λ s k γ ( y ) dy (cid:33) Thus, we have that P [ T k, = t k, , T k, = t k, , . . . , T k,z k = t k,z k | N M ( u k ) − N M ( u k − ) = z k ]= (cid:81) z k i =1 (cid:104) exp (cid:16) − (cid:82) t k,i t k,i − λ s k γ ( y ) dy (cid:17) λ s k γ ( t k,i ) (cid:105) × exp (cid:16) − (cid:82) u k t k,zk λ s k γ ( y ) dy (cid:17) exp (cid:16) − (cid:82) u k u k − λ s k γ ( y ) dy (cid:17) ( (cid:82) ukuk − λ sk γ ( y ) dy ) zk z k ! = z k ! (cid:16)(cid:82) u k u k − γ ( s ) ds (cid:17) z k z k (cid:89) i =1 γ ( t k,i ) . The complete likelihood function of the MMNPP is then given in Equation (3.2), using Lemma 3.1 todetermine the last term: L c = π s (cid:40) m (cid:89) k =1 q s k exp ( − q s k ∆ u k ) × q s k ,s k +1 q s k (cid:41) exp (cid:0) − q s m +1 ∆ u m +1 (cid:1) × m +1 (cid:89) k =1 (cid:16)(cid:82) u k u k − λ s k γ ( t ) dt (cid:17) z k z k ! exp (cid:32) − (cid:90) u k u k − λ s k γ ( t ) dt (cid:33) × z k ! (cid:16)(cid:82) u k u k − γ ( t ) dt (cid:17) z k × z k (cid:89) i =1 γ ( t k,i ) (3.1)= π s (cid:40) m (cid:89) k =1 exp ( − q s k ∆ u k ) × q s k ,s k +1 (cid:41) exp (cid:0) − q s m +1 ∆ u m +1 (cid:1) × (cid:40) m +1 (cid:89) k =1 (cid:34) λ z k s k exp (cid:32) − (cid:90) u k u k − λ s k γ ( t ) dt (cid:33) z k (cid:89) i =1 γ ( t k,i ) (cid:35)(cid:41) (3.2)In the above, π s denotes the starting probabilities of the underlying Markov chain. If prior informationis unavailable, a discrete uniform distribution is applicable. By taking logs, we obtainlog L c = log π s − m +1 (cid:88) k =1 q s k ∆ u k + m (cid:88) k =1 log q s k ,s k +1 + m +1 (cid:88) k =1 z k log λ s k − m +1 (cid:88) k =1 λ s k (cid:90) u k u k − γ ( t ) dt + m +1 (cid:88) k =1 z k (cid:88) i =1 log( γ ( t k,i )) . (3.3)These sums are then reformulated in terms of the order of the Markov chain r instead of the number ofregime changes. Note that the last term in Equation (3.3) is the summation of all the logs of the exposuresat the claim arrival times. As these values are known, this term is rewritten as the constant C ( γ ).We also require the following E-step estimators, which are generally analogous to the estimators fromRyd´en (1994). The first two, m i,j and n i can be interpreted to the number of transitions from state i to9tate j and the number of claims arriving in state i respectively. a i,j = card { t : 0 < t ≤ T, N M ( t − ) = i, N M ( t ) = j } (3.4) n i = n (cid:88) k =1 I { M ( t k )= i } , (3.5)where card is a count/cardinality function for the number of events. The total amount of time spent in state i is T i = (cid:90) T I { M ( t )= i } dt. (3.6)Compared to the original EM algorithm, there is an extra “time in state” estimator T ∗ i , given by T ∗ i = (cid:90) T I { M ( t )= i } γ ( t ) dt, (3.7)which reduces to the definition of T i in the case of a homogeneous MMPP where γ ( t ) = 1 ∀ t . Finally, thecomplete loglikelihood expression may be written aslog L c = r (cid:88) i =1 I { M (0)= i } log π i − r (cid:88) i =1 T i q i + r (cid:88) i =1 r (cid:88) j =1 ,i (cid:54) = j a i,j log q i,j + r (cid:88) i =1 n i log λ i − r (cid:88) i =1 λ i T ∗ i + C ( γ ) (3.8)It is interesting to note that the forms of T i and T ∗ i point to the use of two different time scales to evaluatethe MMNPP parameters q and λ , where the relationship between the different time scales is defined throughthe exposure measure γ . This is intuitive given our previous operational time interpretation of the MMNPPprocess. In the next section, we will derive closed form expression for the expectation of the estimators definedabove. We require the evaluation of E ( Q , Λ ) [log L c ( Q, Λ) | N M ( ρ − ( t )) , ≤ t ≤ T ] . (3.9)In the above, Q = { q i,j } , Λ = diag( λ M , λ M , . . . , λ M r ) and the current estimates of the parameters of theMMNPP at each iteration of the EM algorithm are denoted by q i,j and λ i . Their respective matrices arecorrespondingly Q and Λ . For notational convenience, we will also rewrite the t k,i terms in chronologicalorder as t j , j = 1 , . . . , n . The following expressions for the expectations of the M-step estimators areobtained:ˆ a i,j = E [ a i,j | N M ( ρ − ( t )) , ≤ t ≤ T ] = (cid:90) T P [ M ( x − ) = i, M ( x ) = j | N M ( ρ − ( t )) , ≤ t ≤ T ] dx (3.10)ˆ n i = E [ n i | N M ( ρ − ( t )) , ≤ t ≤ T ] = n (cid:88) k =1 P [ M ( t k ) = i | N M ( ρ − ( t )) , ≤ t ≤ T ] (3.11)ˆ T i = E [ T i | N M ( ρ − ( t )) , ≤ t ≤ T ] = (cid:90) T P [ M ( x ) = i | N M ( ρ − ( t )) , ≤ t ≤ T ] dx (3.12)ˆ T ∗ i = E [ T ∗ i | N M ( ρ − ( t )) , ≤ t ≤ T ] = (cid:90) T P [ M ( x ) = i | N M ( ρ − ( t )) , ≤ t ≤ T ] γ ( x ) dx (3.13)The above expressions are decomposed into event inter-arrival intervals and written in terms of thefundamental regime transition probabilities for MMNPPs within each interval. These probabilities are bederived from the Chapman-Kolmogorov equations, taking into account the varying exposure measure atevent arrival times, and are given in Theorems 3.2 and 3.3 below.10 heorem 3.2. The probability of regime transitions without any event arrivals by time t + t ∗ , given theregime at time t , is given (in matrix form) by ¯ F ( t, t ∗ ) = exp [( Q − Λ γ t ∗ ) t ∗ ] , (3.14) where γ t ∗ is the constant exposure measure that applies to the interval [ t, t + t ∗ ) . Proof of Theorem 3.2
The transition probabilities without claim arrivals at time t ∗ + t given the stateat time t is given by¯ F i,j ( t, t ∗ ) = P [ M ( t ∗ + t ) = j, N M ( ρ − ( t ∗ + t )) = n | M ( t ∗ ) = i, N M ( ρ − ( t ∗ )) = n ]for the process { M ( t ) , N M ( ρ − ( t )) } , given that n claims have occurred before time t ∗ . For any arbitrarilysmall increment ∆ t ,¯ F i,j ( t + ∆ t, t ∗ ) = ¯ F i,j ( t, t ∗ )(1 − r (cid:88) i (cid:54) = j q j,i ∆ t − λ j ∆ ρ ( t ∗ + t )) + r (cid:88) k (cid:54) = j ¯ F i,k ( t, t ∗ ) q k,j ∆ t + o (∆ t )where ∆ ρ ( t ∗ + t ) = ρ ( t ∗ + t + ∆ t ) − ρ ( t ∗ + t ). Taking the limit as ∆ t goes to zero and using the previ-ously specified constraint that q i = (cid:80) j (cid:54) = i q i,j , the following Chapman-Kolmogorov differential equations areobtained: ¯ F (cid:48) i,j ( t, t ∗ ) = ¯ F i,j ( t, t ∗ )( − q j − λ j lim ∆ t → ∆ ρ ( t ∗ + t )∆ t ) + r (cid:88) k (cid:54) = j ¯ F i,k ( t, t ∗ ) q k,j (3.15)Note that the limit in the above equation is simply ρ (cid:48) ( t ∗ + t ), which equals γ ( t ∗ + t ) from the FundamentalTheorem of Calculus. Then Equation (3.15) may be rewritten as¯ F (cid:48) i,j ( t, t ∗ ) = ¯ F i,j ( t, t ∗ )( − q j − λ j γ ( t ∗ + t )) + r (cid:88) k (cid:54) = j ¯ F i,k ( t, t ∗ ) q k,j which in matrix form is ¯ F (cid:48) ( t, t ∗ ) = ¯ F ( t, t ∗ )( Q − Λ γ ( t ∗ + t )) . The assumption that γ ( t ) is constant between event arrivals and exposure component changes is appliedhere so that the above matrix differential equation may be simplified. In the proposed EM algorithm, eachpart of ¯ F i,j is evaluated over disjoint intervals where there are no event arrivals or changes in the exposurecomponent. In this circumstance, γ ( t ∗ + t ) = γ ( t ∗ ) = γ t ∗ , where γ t ∗ is the constant exposure applicable inthe interval [ t ∗ , t ∗ + t ). The matrix form of the differential equation is then ¯ F (cid:48) ( t, t ∗ ) = ¯ F ( t, t ∗ )( Q − Λ γ t ∗ ) , with solution ¯ F ( t, t ∗ ) = exp [( Q − Λ γ t ∗ ) t ] . This is derived using the following power series definition for exponential matrices:exp( A t ) = I + t A + t A + . . . , which holds when matrix A is commutative. In this case, this property is clearly satisifed as as ( Q − Λ γ t ∗ )is independent of t . It follows using the above definition that ddt exp( A t ) = exp( A t ) A . Substitution of A = ( Q − Λ γ t ∗ ) and ¯ F ( t, t ∗ ) = exp [( Q − Λ γ t ∗ ) t ] then gives the required result.11 emark 3.1. In the above theorem, we have set a constant value for the exposure measure in the interval [ t, t + t ∗ ) . The reason for this approach is that without a constant exposure measure, the matrix differentialequations are very difficult to solve. There are no known analytical solutions and while some approximationsmay be available through the use of Magnus expansions, these come at significant computational cost. It isfor this reason that we restricted the exposure component to a function of bounded variation in Section 2, asany such function can be adequately approximated by the piecewise constant structure implied by the abovetheorem. We note that this is not unrealistically restrictive on the exposure function because if the variationin the function is large between two event arrivals, a middle point in the interval can be chosen to producea better approximation. This approach is expanded upon later in this section. Theorem 3.3.
The probability of regime transitions with a single event arrival at time t ∗ + t , given theregime at time t , is expressed in matrix form as f ( t, t ∗ ) = exp [( Q − Λ γ t ∗ ) t ∗ ] Λ γ t + t ∗ (3.16) where γ t + t ∗ is the constant exposure measure applicable at time t + t ∗ . Proof of Theorem 3.3
The formal definition of f ( t, t ∗ ) is given by f i,j ( t, t ∗ ) = P [ M ( t ∗ + t ) = j, N M ( ρ − ( t ∗ + t − )) = n | M ( t ∗ ) = i, N M ( ρ − ( t ∗ )) = n ] × P [ M ( t ∗ + t ) = j, N M ( ρ − ( t ∗ + t )) = n + 1 | M ( t ∗ + t ) = j, N M ( ρ − ( t ∗ + t − )) = n ]where ρ ( t − ) = lim (cid:15) → ρ ( t − (cid:15) ). By following a similar set of derivations as in the proof for Theorem 3.2, itcan be seen that f ( t, t ∗ ) = exp [( Q − Λ γ t ∗ ) t ] Λ γ ( t + t ∗ )= exp [( Q − Λ γ t ∗ ) t ] Λ γ t + t ∗ as required.These theorems can also be intuitively understood as the non-homogeneous extensions of the corre-sponding MMPP expressions provided in Ryd´en (1996). They provide the fundamental building blocks withwhich the EM algorithm for the MMNPP model is constructed. However, without further adjustments,use of these theorems would require the exposure measure to be unchanged within event inter-arrival timeintervals. In situations where event occurrences are sparse, this may be a strong assumption. To remedythis issue, another event type is introduced for a change in the piecewise constant exposure function γ . Forease of notation, we rewrite the event times in chronological order. Denote all event times t k (the event ofinterest and changes in the exposure function). For each event time, there is an accompanying indicator δ k to denote the type of event: δ k = (cid:40) , if it is a change in the exposure function,1 , if it is an event of interest . Applying this to Theorem 3.2, the matrix function f becomes f δ k ( t k − , t k − t k − ) = (cid:40) f ( t k − , t k − t k − ) , δ k = 0 , ¯ F ( t k − , t k − t k − ) , δ k = 1 . = (cid:40) exp[( Q − Λ γ ∗ t k − )( t k − t k − )]Λ γ ∗ t k , δ k = 0 , exp[( Q − Λ γ ∗ t k − )( t k − t k − )] , δ k = 1 . (3.17)The analytical expressions for the expectation of the E-step estimators are now written in terms of theabove. Each estimator will be dealt with individually in the following subsections. Note that for clarity andreadability, derivation of these estimators are provided in the appendix of the paper.12 .4.1. The estimator for the number of regime changes: ˆ a i,j The expression for the estimator of m i,j is given byˆ a i,j = q i,j π ( Q , Λ ) (cid:20) (cid:81) nk =1 f δ k ( t k − , t k − t k − ) (cid:21) × (cid:34) (cid:90) T π ( Q , Λ ) (cid:20) N M ( ρ − ( s ) − ) (cid:89) k =1 f δ k ( t k − , t k − t k − ) (cid:21) ¯ F ( t N M ( ρ − ( s ) − ) , s − t N M ( ρ − ( s ) − ) ) i × (cid:124) j f δ NM ( ρ − s ) − )+1 ( s, t N M ( ρ − ( s ) − )+1 − s ) (cid:20) n (cid:89) k = N M ( ρ − ( s ) − )+2 f δ k ( t k − , t k − t k − ) (cid:21) ds (cid:35) . (3.18)where π ( Q , Λ ) is the starting distribution of state probabilities given the initial parameter estimates Q and Λ , is a r × t = 0. Also, i is a vector of zeroes except for the i -th entrywhich is one. Finally, k is now a counter for both event types and n is the total number of original eventsand exposure changes. ˆ n i Continuing from Equation (3.11), we obtain the following expression for the estimator of n i :ˆ n i = 1 π ( Q , Λ ) (cid:20) (cid:81) nk =1 f δ k ( t k − , t k − t k − ) (cid:21) × n (cid:88) k =1 π ( Q , Λ ) (cid:20) N M ( ρ − ( t k )) (cid:89) l =1 f δ l ( t l − , t l − t l − ) (cid:21) i (cid:124) i (cid:20) n (cid:89) l = N M ( ρ − ( t k )+1 f δ l ( t l − , t l − t l − ) (cid:21) . (3.19) ˆ T i Using the expression for ˆ T i from Equation (3.12) and following a similar procedure to the ˆ m i,j calcula-tions, it can be seen thatˆ T i = 1 π ( Q , Λ ) (cid:20) (cid:81) nk =1 f δ k ( t k − , t k − t k − ) (cid:21) × (cid:34) (cid:90) T π ( Q , Λ ) (cid:20) N M ( ρ − ( s ) − ) (cid:89) l =1 f δ l ( t l − , t l − t l − ) (cid:21) ¯ F ( t N M ( ρ − ( s ) − ) , s − t N M ( ρ − ( s ) − ) ) i × (cid:124) j f δ NM ( ρ − s ) − )+1 ( s, t N M ( ρ − ( s ) − )+1 − s ) (cid:20) n (cid:89) l = N M ( ρ − ( s ) − )+2 f δ l ( t l − , t l − t l − ) (cid:21) ds (cid:35) . (3.20)In the above, the event { M ( s − ) = i } has been replaced with { M ( s ) = i } which does not change the densitydue to the fact that { M ( s ) } is continuous in probability.13 .4.4. The estimator for the operational time spent in each regime: ˆ T ∗ i The results from the previous section are easily adapted to obtain the following estimator for T ∗ i :ˆ T ∗ i = (cid:90) T P [ M ( s ) = i | N M ( ρ − ( t )) , ≤ t ≤ T ] γ ( s ) ds = 1 π ( Q , Λ ) (cid:20) (cid:81) nk =1 f δ k ( t k − , t k − t k − ) (cid:21) × (cid:34) (cid:90) T π ( Q , Λ ) (cid:20) N M ( ρ − ( s ) − ) (cid:89) l =1 f δ l ( t l − , t l − t l − ) (cid:21) ¯ F ( t N M ( ρ − ( s ) − ) , s − t N M ( ρ − ( s ) − ) ) i × (cid:124) j f δ NM ( ρ − s ) − )+1 ( s, t N M ( ρ − ( s ) − )+1 − s ) (cid:20) n (cid:89) l = N M ( ρ − ( s ) − )+2 f δ l ( t l − , t l − t l − ) (cid:21) γ ( s ) ds (cid:35) (3.21) The estimators derived in the previous section involve operations on the product of matrices, resulting inlarge computational cost. In the following section, we apply results from Roberts et al. (2006) and Van Loan(1978) to overcome the issues of computational cost and numerical instability.Firstly, some further notation is established for convenience. Let c k = P [ N M ( ρ − ( t k )) = k, N M ( ρ − ( t k ) − ) = k − | N M ( ρ − ( t k − )) = k − , N M ( ρ − ( t k − ) − ) = k − ,N M ( ρ − ( t k − )) = k − , N M ( ρ − ( t k − ) − ) = k − , . . . ] , (3.22)be the conditional probability of observing the k -th event arrival at time t k given all previous event arrivaltimes. Also, define c = P [ N M ( ρ − ( t )) = 1 , N M ( ρ − ( t ) − ) = 0]. An alternative formulation for the jointlikelihood of observed event arrivals can be obtained using these c k ’s: P [ N M ( ρ − ( t )) , ≤ t ≤ T ] = n (cid:89) k =1 c k . (3.23)In a similar manner to the approach in Roberts et al. (2006), the forwards/backward recursion equationsare defined by L ( k ) = π ( Q , Λ ) k (cid:89) l =1 f δ l ( t l − , t l − t l − ) c l , R ( k ) = n (cid:89) l = k f δ l ( t l − , t l − t l − ) c l , k = 1 , . . . , n (3.24)with L (0) = π ( Q , Λ ) and R ( n + 1) = . Note that for each k , L ( k ) is a 1 × r vector while R ( k ) is a r × c k = π ( Q , Λ ) k − (cid:89) l =1 f δ l ( t l − , t l − t l − ) c l f δ l ( t l − , t l − t l − ) = L ( k − f δ l ( t l − , t l − t l − ) , (3.25)the recursive functions can be iteratively computed as L ( k ) = L ( k − f δ k ( t k − , t k − t k − ) L ( k − f δ k ( t k − , t k − t k − ) , R ( k ) = f δ k ( t k − , t k − t k − ) R ( k + 1) L ( k − f δ k ( t k − , t k − t k − ) . (3.26)14he L and R functions can also be expressed in an alternative manner that is more interpretable. The i -thelement in each vector is L ( k ) i = P [ M ( t k ) = i | N M ( ρ − ( t k − )) = k − , N M ( ρ − ( t k − ) − ) = k − , . . . , N M ( ρ − ( t )) = 1 , N M ( ρ − ( t ) − ) = 0] , (3.27) R ( k ) i = P [ N M ( ρ − ( t n )) = n, N M ( ρ − ( t n − ) − ) = n − , . . . , N M ( ρ − ( t k )) = k, N M ( ρ − ( t k ) − ) = k − | M ( t k − ) = i ] B k , (3.28)with B k = P [ N M ( ρ − ( t n )) = n, N M ( ρ − ( t n − ) − ) = n − , . . . , N M ( ρ − ( t k )) = k, N M ( ρ − ( t k ) − ) = k − | N M ( ρ − ( t k − )) = k − , N M ( ρ − ( t k − ) − ) = k − , . . . , N M ( ρ − ( t )) = 1 , N M ( ρ − ( t ) − ) = 0] . The scaling procedure above overcomes the issue of numerical underflow when implementing this method onlarge data sets. It should also be noted that the derivations utilise the disjoint intervals of event/exposurechange inter-arrivals where the exposure measure γ ( t ) is constant so the results from Theorems 3.2, 3.3 andEquation (3.17) are applicable. ˆ m i,j In the following subsections, the scaled forwards/backwards recursions above will be applied to the E-step estimators derived in Section 3.4. Beginning with the estimator for ˆ m i,j obtained in Equation (3.18),factoring out the bracketed elements from the integrand and substituting the recursive functions L and R gives ˆ a i,j = n (cid:88) k =1 (cid:34) q i,j c k × (cid:32) π ( Q , Λ ) k − (cid:89) l =1 f δ l ( t l − , t l − t l − ) c l (cid:33) × (cid:90) t − k t k − ¯ F ( t k − , s − t k − )) i (cid:124) j f δ k ( s, t k − s ) ds × (cid:32) n (cid:89) l = k +1 f δ l ( t l − , t l − t l − ) c l (cid:33)(cid:35) (3.29)= q i,j n (cid:88) k =1 c k (cid:90) t − k t k − (cid:124) j f δ k ( s, t k − s ) R ( k + 1) L ( k − ¯ F ( t k − , s − t k − )) i ds (3.30)Using matrix notation, this becomes a (cid:124) = (cid:0) Q (cid:1) (cid:124) (cid:12) n (cid:88) k =1 c k (cid:90) t − k t k − f δ k ( s, t k − s ) R ( k + 1) L ( k − ¯ F ( t k − , s − t k − ) ds (3.31)= (cid:32) Q (cid:12) n (cid:88) k =1 I (cid:124) k c k (cid:33) (cid:124) , (3.32)where (cid:12) is the Hadamard product, otherwise known as the element-wise product for matrices. Also, theintegral component above is represented by I k where I k = (cid:90) t − k t k − f δ k ( s, t k − s ) R ( k + 1) L ( k − ¯ F ( t k − , s − t k − ) ds (3.33)= (cid:40)(cid:82) t − k − t k − exp[( Q − Λ γ ∗ t k − )( t k − t k − − s )]Λ γ ∗ t k R ( k + 1) L ( k −
1) exp[( Q − Λ γ ∗ t k − )( t k − t k − − s )] ds, δ k = 0 , (cid:82) t − k − t k − exp[( Q − Λ γ ∗ t k − )( t k − t k − − s )] R ( k + 1) L ( k −
1) exp[( Q − Λ γ ∗ t k − )( t k − t k − − s )] ds, δ k = 1 . (3.34)15here again, the integration is performed over disjoint time intervals where the exposure measure γ ( t ) doesnot change. Thus, within each inter-arrival interval I k , the constant exposure measure will be denoted γ k .The integrals within the cases above can be evaluated using the algorithm presented in Van Loan (1978).Define the following 2 r × r block-triangular matrices: C δ k k = (cid:34) Q − Λ γ ∗ t k − Λ γ t k R ( k + 1) L ( k − Q − Λ γ ∗ t k − (cid:35) , δ k = 0 , (cid:34) Q − Λ γ ∗ t k − R ( k + 1) L ( k − Q − Λ γ ∗ t k − (cid:35) , δ k = 1 . (3.35) I k can then be evaluated as the upper-right block of the matrix exp( C δ k k ( t k − t k − )). This exponential can beeasily and efficiently obtained using a variety of methods such as by using the diagonal Pad´e’s approximantwith repeated squaring (Moler and Van Loan, 2003, see). Combining all of the above, the required estimatorof a is obtained. ˆ n i Continuing on from Equation 3.19,ˆ n i = n (cid:88) k =1 π ( Q , Λ ) (cid:20) k (cid:89) l =1 f δ l ( t l − , t l − t l − ) c l (cid:21) i (cid:124) i (cid:20) n (cid:89) l = k +1 f δ l ( t l − , t l − t l − ) c l (cid:21) = (cid:2) n (cid:88) k =1 L ( k ) (cid:124) (cid:12) R ( k + 1) (cid:3) i . (3.36)In matrix form, this is ˆ n = n (cid:88) k =1 L ( k ) (cid:124) (cid:12) R ( k + 1) . (3.37) ˆ T i The computation of ˆ T i is very simple due to the following relationship between ˆ m i,i and ˆ T i (Equations(3.18) and (3.20) respectively): ˆ T i = ˆ m i,i q i,i . (3.38) ˆ T ∗ i The calculations for ˆ T ∗ i are slightly more complex than for ˆ T i . Continuing from Equation 3.21,ˆ T ∗ i = n (cid:88) k =1 (cid:104) c k L ( k − (cid:90) t − k t k − ¯ F ( t k − , s − t k − )) i (cid:124) i f δ k ( s, t k − s ) γ ∗ k dsR ( k + 1) (cid:105) . (3.39)These expressions were similar to the ones obtained when evaluating ˆ a i,i in Section 3.5.1, except thatthere is no q i,j constant in the summation and i = j in this case. It is seen that T ∗ = diag (cid:34) n (cid:88) k =1 γ ∗ k I (cid:124) k c k (cid:35) (3.40)where I k has already been evaluated in the calculation of ˆ a and the function diag denotes taking the diagonalentries of the matrix (as ˆ T ∗ is a r × .6. The M-step of the EM algorithm In comparison to the E-step, the M-step of the EM algorithm is very simple. Under the constraint that q i = (cid:80) j (cid:54) = i q i,j , the unique maximum likelihood estimators of the MMNPP parameters are given in terms ofthe EM algorithm parameters by the following expressions:ˆ q i,j = a i,j T i , ˆ λ i = n i T ∗ i . (3.41)This is derived using the standard MLE procedure applied to the loglikelihood in Equation (3.8). In summary, one iteration of the EM algorithm is given by the following:1. Using current estimates of Q and Λ, evaluate ˆ a , ˆ n , ˆ T and ˆ T ∗ .2. Using current estimates of ˆ a , ˆ n , ˆ T and ˆ T ∗ , evaluate Q and Λ.The last step is the determination of when the algorithm should be terminated. Two simple approachesmay be applied. One method is to consider the change to the matrices Q and Λ and to stop the algorithmwhen the difference between successive EM iterations is smaller than some tolerance value. Another approachis to instead compare the value of the log-likelihood function, which is non-decreasing for successive iterationsof the EM algorithm. From Equation (3.23) that the log-likelihood is given bylog P [ N M ( ρ − ( t k )) = k, N M ( ρ − ( t k ) − ) = k − , N M ( ρ − ( t k − )) = k − , N M ( ρ − ( t k − ) − ) = k − ,N M ( ρ − ( t k − )) = k − , N M ( ρ − ( t k − ) − ) = k − , . . . , N M ( ρ − ( t )) = 1 , N M ( ρ − ( t ) − ) = 0]= n (cid:88) k =1 log P [ N M ( ρ − ( t k )) = k, N M ( ρ − ( t k ) − ) = k − | N M ( ρ − ( t k − )) = k − , N M ( ρ − ( t k − ) − ) = k − ,N M ( ρ − ( t k − )) = k − , N M ( ρ − ( t k − ) − ) = k − , . . . , N M ( ρ − ( t )) = 1 , N M ( ρ − ( t ) − ) = 0] , = n (cid:88) k =1 log c k . (3.42)Various simulation studies were carried out to ascertain the accuracy of the proposed algorithm. Theresults demonstrated a high level of accuracy and a number of other favorable attributes such as fastconvergence of parameter estimates and robustness to initial inputs. For conciseness, we have only presentedtwo case studies in Section 4 the full details have been omitted from this paper but are available for interestedreaders in Avanzi et al. (2018). An important aspect of the model is the selection of the order of the MMNPP. Some papers such asLandon et al. (2013) have proposed procedures for doing so although discussion of this aspect is relativelyuncommon. Generally, simpler models such as order 2 or 3 are chosen for the purposes of demonstration orto maintain parsimony and interpretability of the underlying regimes.From the previous section, the ease of obtaining the log-likelihood facilitates the use of likelihood-basedmodel selection procedures. When choosing the optimal order of the MMNPP, the standard AIC and BICcriteria, as well as their extensions, are straightforward to evaluate. However, we suggest the usage of amore data-driven approach to order selection.Our proposed methodology for order selection in the absence of (or to complement) expert opinion is asfollows. For a chosen order x (starting at order 2, provided there is evidence of regimes as discussed laterin Section 5.3.1), residuals are calculated as the difference between the observed number of events and thepredicted number of events. A white-noise test then applied, and if the evidence of the white-noise propertyis rejected, then the MMNPP model of order x + 1 is fitted instead and the process iterates until a suitable17 -value for the hypothesis of white noise residuals is obtained. In the empirical case study shown in Section5, the Bartlett B test for white noise is used (see Bartlett, 1967).The methodology outlined above for order selection is an intuitive approach based on our implementationand interpretation of the modelling framework. This framework seeks to capture as much modellable/knowninformation as possible in the exposure/volume measure γ . The hidden Markov layer then captures anyresidual data features that may be left over such as auto-correlation and over-dispersion. From this perspec-tive, a model selection criterion based on white-noise residual testing is very natural as it means that thereare a sufficient number of regimes in the hidden Markov layer to capture the leftover residual informationafter the data has been processed by the γ component. Thus, this methodology sets a minimal number ofregimes required to remove data characteristics like auto-correlation.Further questioning of whether an even higher number of states is necessary can be done through standardprocedures such as the Akaike/Bayesian information criteria or by more qualitative assessments based onthe importance of interpreting the underlying regimes. However, we note that the relationship betweenlikelihoods and the presence of data features such as auto-correlation is less clear compared to examinationsof residuals. Thus, we believe that our proposed white-noise residual testing methodology leads to a well-fitted model that balances parsimony and interpretability.
4. Numerical illustration: Simulated case study
In this section, we include a short simulation case study to demonstrate the accuracy and efficiency ofthe calibration procedure we have detailed in the previous section. It is seen that the algorithm producesestimates that are close to the true values, and this convergence occurs over a reasonably large rangeof starting parameter estimates for the algorithm. Further, the computational advantages described in theprevious sections are highlighted as calibrations over tens of thousands of events required only a few minutes,which is sufficiently fast for practical implementation.In this case study, we simulate events from a defined MMNPP in order to compare estimates with theirtrue values. For simplicity, an order 3 MMNPP was chosen. The chosen parameter values for the generatormatrix of the Markov chain and the baseline Poisson intensities are given respectively by Q = − . . . . − . . . − . , λ = (5 , , . (4.1)Finally, the exposure function γ was a piecewise function that varied periodically. For example, thesimulation below varied the exposure function every 100 time units. It is noted that the frequency of thischange was tested at a variety of different levels from 10 to 200 time units and it is shown to have negligibleimpact on the parameter estimates for Q and Λ.Events are simulated over a period of 1,000 time units which produces tens of thousands of event arrivaltimes. In all simulations, the total calibration time using the EM algorithm took less than 5 minutes.The calibrated parameter estimates for Q and Λ areˆ Q = − .
77 0 .
48 0 . . − .
97 0 . .
27 0 . − . , ˆ λ = (5 . , . , . . (4.2)It can be seen that the calibrated values are all quite close to the true values. In particular, thefrequencies associated with each regime are very precise. These results are consistent across many simulationspecifications. Also, the calibrated values demonstrate some robustness as convergence to the true valueswas maintained over a relatively large range of initial estimates.18 .1. Calibration times As previously mentioned, the calibration times in this simulation case study of approximately 20,000events took less than 10 minutes, which is very quick for such volumes of data. In addition, our empiricalcase study detailed in Section 5 contains a calibration involving almost three-quarters of a million claims,and that calibration took approximately 4 hours to complete. As a result, we believe that we have sufficientlyreduced the calibration times to practicably implementable levels.We conducted another small simulation study to examine the calibration times and memory requirementsassociated with different choices of MMNPP order as well as different data set sizes. This study wasperformed on a desktop computer with an i7-4790 CPU @ 3.60GHz and the results are shown in Table 1below.
Num Obs 1,000 2,000 5,000 10,000 100,000 500,000Order 2
Timing (sec) 21.79 42.17 107.09 210.46 1816.7 7517.73RAM Usage (MB) 0.32 0.49 1.12 2.2 21.46 96.87
Order 10
Timing (sec) 59.03 118.30 334.62 549.50 4783.77 24372.67RAM Usage (MB) 4.12 8.1 20.07 40.06 399.86 1990.98
Table 1: Timing and memory usage by number of observations and order of the MMNPP
We can see that the computation times required grow proportionately with the size of the data set thatthe parameters must be calibrated on. However, when moving from an order 2 to and order 10 model, theincrease in computation times is much smaller despite the increasing number of parameters that need to beestimated (going from order r to order r + 1 requires an extra 2 r + 1 number of parameters).In terms of the memory requirements, it is seen in the above table that for even very large data sets ofhalf a million observations and an extreme case of an order 10 MMNPP, the RAM usage of the algorithmis still within feasible ranges of most computers. However, we expect that with most implementations ofthe proposed MMNPP framework, it is preferable to maintain a lower number of regimes for reasons ofparsimony and interpretability. As a result, we believe that the proposed modelling framework is applicableto a variety of modelling procedures involving large data sets. We conclude the simulated case study with a note on the convergence of the parameters. When the iter-ative EM calibration algorithm is terminated using a likelihood condition, it is beneficial if the convergenceto the maximal likelihood value is suitably regular. This is indeed the case, as can be seen in Figure 1 below.
EM iteration number D i ff e r en c e i n l og li k e li hood
20 40 60 80 100 . . . . . EM iteration number starting at iteration 10 D i ff e r en c e i n l og li k e li hood Figure 1: Plot of the difference between successive loglikelihood values
19t can be seen that the loglikelihood converges to the maximal value in a very fast, exponential-likefashion. This demonstrates the beneficial convergence properties of the proposed calibration algorithm. Itcan be seen that this type of convergence holds not only for the loglikelihood but also for the parameterestimates as well. Figure 2 below provides an example of a similar quick convergence rate for the frequencyand transition parameters, despite initial estimates being somewhat far from their true values:
EM iteration number E s t i m a t e s f o r λ . . . . . . . EM iteration number E s t i m a t e s f o r q , Figure 2: Plot of successive parameter estimates for λ and q , From comparison of the convergence plots, we suggest a stopping criterion based on a likelihood differenceof 1 × − , which results in about 60 iterations of the EM algorithm in this case.
5. Numerical illustration: Motor insurance claims
In the following section, we apply the MMNPP model to a set of motor insurance real data to determinewhat insights can be gained in order to assist insurance operations in a real world setting; see also Section1.3 for additional background. Insurance data sets have several characteristics that call for the additionalfeatures introduced in this paper. These data sets generally consists of a large number of event observationswhich, as seen in the previous section, facilities more accurate detection of the underlying regimes. Further,the modelling of claims in motor lines of business is standard, which assists in the analysis required todetermine the appropriate exposure measure γ . How we decided which trends to include in γ is outlined inthe Section 5.3. The residual components are modelled by a MMNPP. Additional insights available throughthe MMNPP model are discussed. The Allianz, University of New South Wales, Suncorp and Insurance Australia Group (AUSI) datasetwas developed as part of an Australian Research Council Linkage Project. The data set contains non-lifeinsurance claims records in a standard format containing information on each claim transaction such asoccurrence and notification dates, claim states and finalisation dates. It also includes a policy file whichdescribes underwritten policies, containing information such as date of inception/expiry and sum insured.This case study investigates the Private Motor line of business from the 1st of January, 2005 to the 31stof December, 2010. It is noted that the data set extends to the end of 2014 and due to the short-tailednature of the reporting delay for motor claims, all accidents that occurred during this period are expectedto have been reported in the data set. As a result, considerations of censoring and reporting delay arenot material here. Further, all claims related to catastrophe events have been removed from this data set.This is due to the fact that catastrophic claims generally exhibit very different characteristics compared tostandard claims and thus it is standard practice to separate their analysis from the main body of the data.20therwise, their presence may distort the analysis (Hart et al. (1996)). In total, the utilized subset of thedata consists of significantly more than half a million claim observations.Because the level of granularity of the claims data is not finer than daily, an extra assumption is requiredto distribute the claim arrival times throughout the day, as the MMNPP is a continuous time model. In orderto not introduce any artificial variability, the claims are assumed to arrive at equal intervals throughout theday. This approach was compared using simulation studies against simulating claim arrivals using a uniformdistribution but the latter approach induced a small amount of additional variability, clouding results interms of regime identification. Further, issues arose in terms of the appropriate method to aggregate thesesimulations. Finally, the use of simulations meant that the model had to be fit several times, increasing thecomputational burden of the approach. Thus, the more convenient approach of an equidistant distributionwas taken instead.
We present some figures in the following section to provide the reader some sense of the data. Due toconfidentiality, identifying features such as y-axes of plots have been removed.Figure 3 below plots the evolution of the number of claims per day over the investigated period of 6years while the line depicts the number of insurance policies in force over the same period.
Day N u m be r o f c l a i m s pe r da y Figure 3: Number of claims and policies in force by day
We examine the daily claim frequencies are scaled by the number of insurance policies in force. Figure4 below plots the auto-correlation function up to a lag of 60 days and some data features are evident, suchas the strong weekly seasonality within the data. This is not unexpected and known data features such asthese are incorporated into the volume/exposure component and details of how this procedure is applied isgiven in the next section. 21
10 20 30 40 50 60 − . . . . . . . Lag A C F Figure 4: Plot of the auto-correlation function of daily claim events scaled by the number of policies
Within the context of motor claims analysis, several factors are typically incorporated into the frequencyperturbation measure γ . Clearly, the number of policies in force are a relevant exposure component but othertrends such as seasonality also exist. An over-dispersed Poisson (ODP) GLM was chosen as an appropriatemodel to determine the exposure process γ . Such an approach is standard in actuarial modelling. Covariatesof interest were identified using a combination of data analysis and domain knowledge. Statistical significancewas then tested and the final model consisted of the features relating to the number of policies in force(introduced as an offset), various forms of seasonality such as weekday/weekend, public holiday, month andday of month fluctuations, and a linear residual component to capture the gradual decay of motor claimfrequencies over time.Further analysis was conducted of γ over time and no additional events involving sudden changes in theexposure component were deemed necessary. This is partly due to the large number of claim events withinthe data set. A natural question that may arise is whether hidden regimes exist within the data after having accountedfor the data features above. A number of statistical results indicate that this is indeed the case. Firstly, thedispersion parameter calibrated in the ODP GLM was 3.16, indicating some level of over-dispersion withinthe data relative to the standard Poisson process.Further, a Wald-Wolfowitz runs test was also applied in order to test for persistence of the frequencyprocess within regimes. This procedure tested for randomness in the runs above and below the Poissonestimate of the intensity, having adjusted for the factors in the volume component outlined in the previoussubsection. The obtained p-value was 0, suggesting that the frequencies tended to remain at certain levelsover time.A white-noise residual testing procedure was utilised for order selection, and an order 4 MMNPP wasidentified as optimal; see Section 3.8 for methodological details.Finally, we produce another auto-correlation plot using the residuals of the ODP GLM fit to test forserial correlation features that are not yet taken into account by the GLM. Indeed, as the optimal order ofthe MMNPP model is 4 (and not 1), it is expected that there still exists some residual auto-correlation leftin the time series data and this is corroborated in Figure 5 below.22
10 20 30 40 50 60 . . . . . . Lag A C F Figure 5: Plot of the auto-correlation function of the ODP GLM residuals
After obtaining the exposure component, we now move to the calibration of the hidden Markov com-ponent. As previously mentioned, an order 4 MMNPP was chosen and the final calibrated parameterswere Q = − .
39 0 .
08 0 .
28 0 . . − .
06 0 .
05 0 . .
38 0 . − .
43 0 . .
00 0 .
00 0 . − . , λ = (135 , , , . (5.1)The distance between Poisson claim intensities suggest that the identified regimes are distinct. The generatormatrix provides information about to the transition probabilities between regimes averaged over the entireperiod and modellers can also obtain other inferences such as how many claims occurred within each regimeand how much time was spent in each regime through the estimators used in the EM algorithm. This isdisplayed in Table 2. For confidentiality reasons, the number of claims in each regime statistic has beenprovided as a proportion of total claims instead. Statistic of interest Estimator
Number of changes to each regime − . . . . . − . . . . . − . . . . . − . Proportion of claims in each regime (2.3%, 88.9%, 8.7%, 0.0%)Time spent in each regime (days) (52.4, 1870.5, 267.1, 1.0)
Table 2: E-estimator quantities of interest
Furthermore, Equation (3.42) allows for the generation of Figure 6, which depicts the most likely regimeat each claim arrival time. Several insights are able to be drawn here. Firstly, there is a single day (a publicholiday coinciding with a weekend) with an extremely high number of claims compared to expectations,sufficiently so to warrant a separate regime (regime 4). As insurance claims related to catastrophic eventshave been removed, such an explanation is unlikely. This result demonstrates some robustness of theMMNPP model to outliers, as the observation has been objectively identified and its impact on the otherobservations and regimes has been mitigated. This mitigation effect is due to the fact that the high intensityassociated with the fourth regime will not skew the parameter estimates for the other regimes as significantly.For example, if the order of the MMNPP model was instead three, then this outlying observation wouldhave increased the frequency parameter associated with the highest regime.23 laim arrival times (days) M o s t li k e l y r eg i m e Figure 6: Most likely regimes over time
The remaining sequence of most likely regimes seems readily interpretable. There is a “normal” state(State 2) within which the process spends the majority of the investigation period. There are occasionaljumps to the higher frequency regime (State 3) and somewhat periodic jumps to the lower frequency regime(State 1). This latter result produces some interesting insights which we discuss in the next section.
A secondary observation demonstrates the diagnostic property of the MMNPP model. Figure 6 suggeststhat the adjusted frequency process remains in State 2 with occasional jumps to a higher or lower state.A closer look leads to the identification of periodic transitions to State 1. This appears to be due to theeffect of workers taking leave from the 27th of December until the new year. This effect was not originallycaptured in the known component γ but is fairly intuitive in hindsight. The reduction in claims on the25th and 26th was already captured (due to these dates being public holidays), but to further include thefollowing dates until New Year’s Eve was not obvious, and could easily be challenged in absence of evidencesuch as provided by our methodology.Modellers may encounter difficulties in properly identifying all relevant factors of interest for analysis inpractice. In the example above, upon discovery of the “end-of-year” effect on claim frequencies, this covariatecan be included within the γ component of the model. This iterative exercise produces model improvementas the more interpretable volume/exposure component is expanded. Further, the impact of any additionallyidentified factors are removed from the hidden Markov chain, allowing this component to more realisticallyrepresent data features that are truly unobservable to the modeller. Repeated application of this diagnosticfeature of the MMNPP model is facilitated by the computational speed increases produced in the previoussections. The final result is ideally a model where all known features are accounted for through the frequencyperturbation measure γ while the hidden Markov regimes capture the aggregated effects of factors that areunable to be explicitly modelled. In order to demonstrate the benefits provided by the MMNPP approach as opposed to more elementaryones, we now provide a comparison between the MMNPP model and four special (simpler) cases, namely:(1) a homogeneous Poisson process (which serves as the null model), (2) a non-homogeneous Poisson process(with the non-homogeneous component incorporating the effect of covariates through the ODP GLM), (3) ahomogeneous Markov-modulated Poisson process with 3 states, and (4) another homogeneous MMPP modelwith 10 states.Model (2) extends Model (1) by allowing a non-homogeneous (but deterministic) intensity, whereasModel (3) extends Model (1) by making the constant intensity regime-dependent. However, it can be seenbelow that the predictions produced by Model (3) are suboptimal and thus Model (4) is introduced to extendModel (3) through the use of a greater number of regimes. The MMNPP considered in this paper combines24he extensions outlined above, and the comparisons below suggest that both are necessary for the data setunder consideration.
Remark 5.1.
We note here that after identification of the outlying regime and end of year effects discussedin the previous sections, these covariates were included in the ODP GLM associated with the γ frequencyperturbation component. The order selection methodology described in Section 3.8 was again applied, re-sulting in a 3 regime MMNPP being chosen as optimal. Analysis of model outputs shows that essentially,the outlying regime (regime 4) has been removed while the lower regime (regime 1) has been retained asthere are still a sufficient number of events to warrant a separate regime. Thus, we emphasise that in thesecomparisons, the chosen ODP GLM incorporates the additional features described in Section 5.5. This isgenerous, because without use of the MMNPP framework, some of these data features may not have beenidentified. Remark 5.2.
The chosen number of states/regimes in the MMPP models is also based on the residual-testing methodology outlined previously. The order 3 MMPP model was chosen to match the MMNPPmodel. Model (4) attempted to utilise a similar residual-testing method to obtain the optimal number ofstates for an MMPP model. However, even upon reaching 10 states, the residuals still did not resemblewhite noise. A higher number of states would continue to improve in-sample fit but this comes at the cost ofreduced model interpretability. Thus, our chosen number of states is 10 in this case.
For each of the models, we produce daily in-sample predictions and then examine the residuals. Thedaily granularity was chosen to match the granularity of the original data. Figure 7 plots the predictions inthe left column and the residuals for each model in the right column. Note that for confidentiality reasons,the y-axis has been removed.Table 3 provides summary statistics of the residuals for each model, as well as the p -value for the Ljung-Box tests for auto-correlation up to lags of 91, 182 and 365 days and the Wolf-Waldowitz runs test forrandomness/persistence. These lags were chosen as they are the prevalent granularities at which insurancedata is examined. Finally, the last row gives an indication of the calibration time involved for each model.HPP NHPP MMPP-3 MMPP-10 MMNPP-3Σ residuals 0.0 0.0 748.8 -1130 1418Σ absolute residuals 130,644 45,211 59,501 44,487 40,203Σ residuals squared 12,051,219 1,670,274 2,725,133 1,385,661 1,295,155Ljung-Box test p-value (lag = 91) 0 8e-08 0 0 0.08Ljung-Box test p-value (lag = 181) 0 1e-06 0 0 0.21Ljung-Box test p-value (lag = 365) 0 2e-06 0 0 0.12Runs test p-value 0 9e-08 1e-05 0.96 0.85Calibration time < < ∼ ∼ ∼ Table 3: Summary statistics for model comparison
It is clear from the results that the homogeneous Poisson prediction (HPP) produces a poor fit. Further,all residual summary statistics are significantly greater than the ones for any other considered model.Model NHPP (2) uses the same over-dispersed Poisson GLM for the non-homogeneous intensity functionas is used for the MMNPP-3 model. The aim of this approach is to incorporate domain knowledge withinthe model-building process. In addition, this method is standard in the industry both due to ease ofimplementation as well as model understandability and tractability. From Table 3, this approach provides amarked improvement upon Model (1). The residual plot also seem more reasonable in comparison. However,both statistical tests applied to Model NHPP (2) indicate that the residuals do not resemble white noiseand thus, there remain some data features that have still not been adequately accounted for.Model MMPP-3 (3) attempts to perform the same task as Model NHPP (2) through the use of themore-objective Markov-modulated Poisson process. However, it can be seen from the prediction plots ofModels NHPP (2), MMPP-10 (4) and the MMNPP-3 that there is a increasing trend in the raw claims25
500 1000 1500 2000
Days H PP P r ed i c t i on Days H PP R e s i dua l s Days NH PP P r ed i c t i on Days NH PP R e s i dua l s Days MM PP - P r ed i c t i on Days MM PP - R e s i dua l s Days MM PP - P r ed i c t i on Days MM PP - R e s i dua l s Days MM N PP - P r ed i c t i on Days MM N PP - R e s i dua l s Figure 7: Predictions and residuals for each model frequency over time. The prediction plot for Model MMPP-3 (3) illustrates that with only 3 regimes, theMMPP model provides a poor fit to the data. Evidence of inferior fit compared to other models underconsideration is also seen in Table 3.The 10-state MMPP approach (4) is a more data-driven approach compared to Model NHPP (2), and thehigh number of regimes means that the fit is able to better accommodate the trends/patterns within the data,including both the increasing frequency over time as well as the periodic jumps to lower frequency regimes.The residual plot depicts lower magnitudes as well when compared to Model MMPP-3 (3). However, Table 3provides strong evidence that there still exists auto-correlation within the residuals, indicating the presenceof data features not taken into account by the Markov modulation. Further, the high number of regimes,while attempting to compensate for the increasing trend, unfortunately detract from understanding of themodel. Ultimately, this is an issue with the structure of the MMPP model. A more natural approach maybe to account for this trend elsewhere instead of using discretised regimes. This removes the dominatingeffect of the trend and ideally, reduces the number of regimes. It may be difficult to extract explainableinformation from a model with a very high number of regimes, as evidenced by the MMPP prediction plotabove (which we note resembles the regime filtering plot for the most likely regime per day). Aside fromthe obvious upwards trend, it requires significant effort to develop additional insight into the causes of thehidden regimes.Finally, the 3-state MMNPP residuals resemble white-noise. This model serves as a useful compositemodel that combines the ‘non-parametric’ fit of the MMPP with the model interpretability provided by the26on-homogeneous Poisson model. The latter component allows for the tractable incorporation of domainknowledge whilst maintaining superior goodness-of-fit. The number of regimes in the MMNPP is only 3,which suitably caters for further diagnosis and model improvement. Also, by adding structural featuresthrough the ODP GLM, we have actually improved marginally on the fit of the MMPP, as can be seen inthe residual statistics shown in Table 3. This has also removed the auto-correlation present in the NHPP andMMPP residuals. In summary, this case study demonstrates that the MMNPP produces a significant numberof advantages through its extensions to the NHPP and MMPP models by allowing for expert experience andfeatures extracted through data analysis to be integrated into the modelling process, while still producingexcellent fits to data and accounting for characteristics such as auto-correlation.
Remark 5.3.
We remark that the MMNPP provides clear advantages for out-of-sample comparisons. TheMMNPP model overcomes a key weakness of the original MMPP model in terms of out-of-sample predictions,in that MMPP predictions are (without adjustment) unable to maintain the time-structure of regimes thatare extracted from the data set. In other words, the MMNPP allows for known effects which do not followMarkov arrivals to be modelled by the non-homogeneous component. Further, we have already commentedthat the structure of the MMPP seems unsuitable to capture the upwards trend displayed in the data and it isexpected that out-of-sample comparisons will heavily favour the MMNPP model. For example, if this trendwere continue in the future, the MMPP model would be unable to account for the higher frequency. Thus,we do not provide numerical results of the out-of-sample comparison here.
6. Conclusion
In this paper, a Markov-modulated non-homogeneous Poisson process framework was introduced to modelcounting processes. We produced an extension of the standard MMPP models through a flexible frequencyperturbation measure, which allows for the inclusion of both non-recurring and periodic data features tobe readily incorporated into analysis. The proposed extension is beneficial as it not only allows knownfactors of interest to be included in a tractable manner but also provides some control over the underlyingdrivers of the observed events that are modelled through the hidden Markov chain. This produces a morecomprehensive, accurate and realistic count model compared to the original MMPP approach. This methodis also intuitive as the MMNPP is shown to be an operational time transform of the original MMPP. Anefficient EM calibration procedure is derived, allowing the model to be applied in cases where there arelarge numbers of observations. In these situations, issues of computational cost and numerical stability areof great concern. These barriers to practical implementation are addressed through various computationaland theoretical innovations such as a scaling procedure for the backwards-forwards recursions utilised inthe calibration algorithm and methods to improve calibration speeds. Our theoretical results and proposedimplementation procedure are illustrated through a case study using real insurance data and insights thatare provided through the MMNPP model are discussed. The procedure is shown to generate interpretableregimes in the empirical study and the MMNPP analysis framework is shown to generate additional resultsthat can allow modellers to delve more deeply into the data if it is deemed necessary. Comparisons toalternative models demonstrate the advantages of the MMNPP model as an attractive composite modelthat provides both superior goodness-of-fit as well as significant model interpretability and tractability.
Acknowledgements
The authors are grateful to two anonymous referees for detailed comments, which led to substantialimprovements to the paper. Earlier versions of this paper were presented at the 8th and 9th
AustralasianActuarial Education and Research Symposium in Sydney (Australia), the in Sydney (Australia), the
Actuarial Risk Modelling and ExtremeValues Workshop in Canberra (Australia), the 4th
European Actuarial Journal Conference in Leuven (Bel-gium), the ASTIN Colloquium in Cape Town (South Africa) as well as at seminars held at the universitiesof Lyon and Copenhagen in 2019. Some results in this paper were also presented at the Australian Actu-aries Institute General Insurance Seminar in November 2018 (and awarded the Taylor Fry Silver Award);27ee Avanzi et al. (2018). The authors are grateful for constructive comments received from colleagues whoattended these events.This research was supported under Australian Research Council’s Linkage (LP130100723, with fundingpartners Allianz Australia Insurance Ltd, Insurance Australia Group Ltd, and Suncorp Metway Ltd) andDiscovery (DP200101859) Projects funding schemes. Furthermore, Alan Xian acknowledges financial supportby the Australian Government Research Training program, as well the UNSW Business School throughsupplementary scholarships. The views expressed herein are those of the authors and are not necessarilythose of the supporting organisations.
Data and Code
We are unable provide the dataset that was used in the empirical case study due to confidentiality.However, simulated data sets with similar features, as well as all relevant codes, can be found at https://github.com/agi-lab/reserving-MMNPP . ReferencesReferences
Albrecher, H., Araujo-Acuna, J., Beirlant, J., 2019. Fitting non-stationary Cox processes - an application to fire insurancedata. North American Actuarial Journal .Arts, J., 2017. A multi-item approach to repairable stocking and expediting in a fluctuating demand environment. EuropeanJournal of Operational Research 256, 102–115.Arts, J., Basten, R., Van Houtum, G.J., 2016. Repairable stocking and expediting in a fluctuating demand environment:Optimal policy and heuristics. Operations Research 64, 1285–1301.Avanzi, B., Taylor, G., Wong, B., Xian, A., 2018. How to proxy the unmodellable: Analysing granular insurance claims in thepresence of unobservable or complex drivers. 2018 General Insurance Seminar .Avanzi, B., Wong, B., Yang, X., 2016. A micro-level claim count model with overdispersion and reporting delays. Insurance:Mathematics and Economics 71, 1–14.Badescu, A.L., Chen, T., Lin, X.S., Tang, D., 2019. A markoed Cox model for the number of IBNR claims: Estimation andapplication. ASTIN Bulletin , 1–31doi: .Bartlett, M., 1967. Some remarks on the analysis of time-series. Biometrika 54, 25–38.Casale, G., Sansottera, A., Cremonesi, P., 2016. Compact Markov-modulated models for multiclass trace fitting. EuropeanJournal of Operational Research 255, 822–833.Ching, W.K., 1997. Markov-modulated Poisson processes for multi-location inventory problems. International Journal ofProduction Economics 53, 217–223.Crevecoeur, J., Antonio, K., Verbelen, R., 2019. Modeling the number of hidden events subject to observation delay. EuropeanJournal of Operational Research .Gabrielli, A., Richman, R., W¨uthrich, M.V., 2019. Neural network embedding of the over-dispersed Poisson reserving model.Scandinavian Actuarial Journal , 1–29.Greene, A.M., Robertson, A.W., Smyth, P., Triglia, S., 2011. Downscaling projections of Indian monsoon rainfall using anon-homogeneous hidden Markov model. Quarterly Journal of the Royal Meteorological Society 137, 347–359.Guillou, A., Loisel, S., Stupfler, G., 2013. Estimation of the parameters of a Markov-modulated loss process in insurance.Insurance: Mathematics and Economics 53, 388–404.Guillou, A., Loisel, S., Stupfler, G., 2015. Estimating the parameters of a seasonal Markov-modulated Poisson process.Statistical Methodology 26, 103 – 123.Gusella, R., 1991. Characterizing the variability of arrival processes with indexes of dispersion. Selected Areas in Communi-cations, IEEE Journal on 9, 203–211.Hart, D.G., Buchanan, R.A., Howe, B.A., 1996. The actuarial practice of general insurance. 517/H33a.Hughes, J.P., Guttorp, P., Charles, S.P., 1999. A non-homogeneous hidden Markov model for precipitation occurrence. Journalof the Royal Statistical Society: Series C (Applied Statistics) 48, 15–30.Kou, S., Sunney Xie, X., Liu, J.S., 2005. Bayesian analysis of single-molecule experimental data. Journal of the Royal StatisticalSociety: Series C (Applied Statistics) 54, 469–506.Lagona, F., Maruotti, A., Picone, M., 2011. A non-homogeneous hidden Markov model for the analysis of multi-pollutantexceedances data, in: Hidden Markov Models, Theory and Applications. IntechOpen.Landon, J., ¨Ozekici, S., Soyer, R., 2013. A Markov modulated Poisson model for software reliability. European Journal ofOperational Research 229, 404–410.Langrock, R., Borchers, D.L., Skaug, H.J., 2013. Markov-modulated nonhomogeneous Poisson processes for modeling detectionsin surveys of marine mammal abundance. Journal of the American Statistical Association 108, 840–851. eroux, B.G., Puterman, M.L., 1992. Maximum-penalized-likelihood estimation for independent and Markov-dependent mix-ture models. Biometrics , 545–558.Meier-Hellstern, K.S., 1984. A statistical procedure for fitting Markov-modulated Poisson processes. Technical report, Univ.of Delaware .Moler, C., Van Loan, C., 2003. Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later. SIAMreview 45, 3–49.Montoya, R., Netzer, O., Jedidi, K., 2010. Dynamic allocation of pharmaceutical detailing and sampling for long-term prof-itability. Marketing Science 29, 909–924.Nasr, W.W., Maddah, B., 2015. Continuous (s, s) policy with MMPP correlated demand. European Journal of OperationalResearch 246, 874–885.Norberg, R., 1993. Prediction of outstanding liabilities in non-life insurance. ASTIN Bulletin 23, 95–115.¨Ozekici, S., Parlar, M., 1999. Inventory models with unreliable suppliers in a random environment. Annals of OperationsResearch 91, 123–136.Roberts, W.J., Ephraim, Y., Dieguez, E., 2006. On Ryd´en’s EM algorithm for estimating MMPPs. Signal Processing Letters,IEEE 13, 373–376.Rossiter, M.H., 1989. Characterizing a random point process by a switched Poisson process. Phd Thesis, Monash University,Melbourne .Ryd´en, T., 1994. Parameter estimation for Markov modulated Poisson processes. Communication in Statistics. StochasticModels 10, 795–829.Ryd´en, T., 1996. An EM algorithm for estimation in Markov-modulated Poisson processes. Computional Statistics & DataAnalysis 21, 431–447.Ryd´en, T., et al., 2008. EM versus Markov chain Monte Carlo for estimation of hidden Markov models: A computationalperspective. Bayesian Analysis 3, 659–688.Salvador, P., Valadas, R., Pacheco, A., 2003. Multiscale fitting procedure using Markov modulated Poisson processes. Telecom-munication systems 23, 123–148.Scott, S., Smyth, P., 2003. The Markov modulated Poisson process and Markov Poisson Cascade with applications to webtraffic modelling. Bayesian Statistics 7.Shi, S.W., Zhang, J., 2014. Usage experience with decision aids and evolution of online purchase behavior. Marketing Science33, 871–882.Thayakaran, R., Ramesh, N., 2013a. Markov modulated Poisson process models incorporating covariates for rainfall intensity.Water Science and Technology 67, 1786–1792.Thayakaran, R., Ramesh, N.I., 2013b. Multivariate models for rainfall based on Markov modulated Poisson pro-cesses. Hydrology Research 44, 631–643. URL: http://hr.iwaponline.com/content/44/4/631 , doi: , arXiv:http://hr.iwaponline.com/content/44/4/631.full.pdf .Van Loan, C., 1978. Computing integrals involving the matrix exponential. IEEE transactions on automatic control 23,395–404.Verrall, R., W¨uthrich, M., 2016. Understanding reporting delay in general insurance. Risks 4, 25.Yoshihara, T., Kasahara, S., Takahashi, Y., 2001. Practical time-scale fitting of self-similar traffic with Markov-modulatedPoisson process. Telecommunication Systems 17, 185–211.Zhang, J.Z., Netzer, O., Ansari, A., 2014. Dynamic targeted pricing in B2B relationships. Marketing Science 33, 317–337. . Appendix: Proof of Lemmas 2.4 and 2.5Proof of Lemma 2.4 For some constants s, t , it is known that P (cid:2) N M ( ρ − ( t )) | N M ( ρ − ( i )); 0 ≤ i < s (cid:3) = P [ N ( t ) | N ( i ); 0 ≤ i < s ]= P [ N ( t ) | N ( s )]= P (cid:2) N M ( ρ − ( t )) | N M ( ρ − ( t )) (cid:3) Thus, the new process { M ( t ) , N M ( ρ − ( t )) } retains the Markov property. Proof of Lemma 2.5 ( = ⇒ ) Beginning from the definition of a homogeneous MMPP, P [ N M ( t ) = n, N M ( t ) = n ] = exp (cid:20) − (cid:90) t t λ M ( x ) dx (cid:21) = exp (cid:34) − m (cid:88) k =1 (cid:90) u k u k − λ s k dx (cid:35) = exp (cid:34) − m (cid:88) k =1 λ s k ( u k − u k − ) (cid:35) Here, s k represents the regime of the process M ( t ) in the interval I k .( ⇐ = ) Beginning from the expression,lim h → − P [ N M ( t ) = n, N M ( t + h ) = n ] h = lim h → − exp [ − (cid:80) mk =1 λ s k ( u k − u k − )] h for the regime intervals that fall between times t and t + h . Note that as h approaches zero, the rightexpression eventually becomes a time interval within a single regime period, so thatlim h → − exp [ − (cid:80) mk =1 λ s k ( u k − u k − )] h = lim h → − exp (cid:2) − λ M ( t ) ( t − ( t + h )) (cid:3) h = λ M ( t ) . Thus, homogeneity is achieved as λ M ( t ) is the constant claim intensity at time t for state of the process M at time t . B. Appendix: Proof of Proposition 2.6Proof of Proposition 2.6
This proposition is a very intuitive result. As the Markov process M ( t ) isthe same in both MMPP processes, it suffices to prove that the combined process N , M ( t ) is a conditionalPoisson process with intensity λ M ( t ) = λ M ( t ) + λ M ( t ) in order to obtain the result that { M ( t ) , N , M ( t ) } isthe required MMPP. This follows directly from the definition of Poisson processes. We begin by noting thateach N iM ( t ) is a conditional Poisson process, i.e., N iM ( t ) for i = 1 , N iM (0) = 0;2. N iM ( t ) has independent increments;3. P [ N iM ( t + ∆ t ) − N iM ( t ) = 1] = λ iM ( t )∆ t + o (∆ t ); and4. P [ N iM ( t + ∆ t ) − N iM ( t ) ≥
2] = o (∆ t ),where o (∆ t ) is little-o notation for lim ∆ t → o (∆ t )∆ t = 0. We wish to prove that these properties also hold for N , M ( t ).The first property is easily seen for N , M ( t ) as N , M (0) = N M (0) + N M (0) = 0 + 0 = 0. The secondproperty holds by using that the conditional Poisson processes N iM ( t ) are independent for i = 1 , N iM ( t ) must also have independent increments.30or the third property, we have that P [ N , M ( t + ∆ t ) − N , M ( t ) = 1] = P [ N M ( t + ∆ t ) − N M ( t ) = 1 , N M ( t + ∆ t ) − N M ( t ) = 0]+ P [ N M ( t + ∆ t ) − N M ( t ) = 0 , N M ( t + ∆ t ) − N M ( t ) = 1]= ( λ M ( t ) ( t )∆ t + o (∆ t ))(1 − λ M ( t ) ( t )∆ t + o (∆ t ))+ (1 − λ M ( t ) ( t )∆ t + o (∆ t ))( λ M ( t ) ( t )∆ t + o (∆ t ))due to the independence between the two processes= [ λ M ( t ) ( t ) + λ M ( t ) ( t )]∆ t + o (∆ t ) . The final property can be shown in a similar manner. Thus, the combined process N , M ( t ) is a conditionalPoisson process with a claim intensity given by λ M ( t ) ( t ) + λ M ( t ) ( t ). C. Appendix: Derivations for EM algorithm estimators
C.1. The estimator for the number of regime changes: ˆ a i,j Beginning with the estimator used for the number of regime/state changes, ˆ a i,j , the integrand in Equation(3.10) is P [ M ( s − ) = i, M ( s ) = j | N M ( ρ − ( t )) , ≤ t ≤ T ]= P [ M ( s ) = j | M ( s − ) = i ] P [ M ( s − ) = i ] P [ N M ( ρ − ( t )) , ≤ t ≤ T | M ( s − ) = i, M ( s ) = j ] P [ N M ( ρ − ( t )) , ≤ t ≤ T ] . The initial term of the numerator is simply q i,j and by using the conditional independence, the last term ofthe numerator is decomposed to P [ N M ( ρ − ( t )) , ≤ t ≤ T | M ( s − ) = i, M ( s ) = j ]= P [ N M ( ρ − ( t )) , ≤ t < s, M ( s − ) = i ] P [ M ( s − ) = i ] P [ N M ( ρ − ( t )) , s ≤ t ≤ T | M ( s ) = j ] . Thus, we obtainˆ a i,j = q i,j P [ N M ( ρ − ( t )) , ≤ t ≤ T ] × (cid:90) T P [ N M ( ρ − ( t )) , ≤ t < s, M ( s − ) = i ] P [ N M ( ρ − ( t )) , s ≤ t ≤ T | M ( s ) = j ] ds (C.1)Each of these probabilities can be rewritten in terms of the transition densities from the previous section.The probability P [ N M ( ρ − ( t )) , ≤ t ≤ T ] is simply the likelihood of the observed claim arrivals which canbe expressed in matrix form as P [ N M ( ρ − ( t )) , ≤ t ≤ T ] = π ( Q , Λ ) (cid:20) n (cid:89) k =1 f δ k ( t k − , t k − t k − ) (cid:21) (C.2)where π ( Q , Λ ) is the starting distribution of state probabilities given the initial parameter estimates Q and Λ , is a r × t = 0. Also, k is now a counter for both event types and n isthe total number of original events and exposure changes. In a similar manner, the probabilities within the31ntegrand are P [ N M ( ρ − ( t )) , ≤ t < s, M ( s − ) = i ] = π ( Q , Λ ) (cid:20) N M ( ρ − ( s ) − ) (cid:89) k =1 f δ k ( t k − t k − , t k − ) (cid:21) ¯ F ( t N M ( ρ − ( s ) − ) , s − t N M ( ρ − ( s ) − ) ) i , (C.3) P [ N M ( ρ − ( t )) , s ≤ t ≤ T | M ( s ) = j ] = (cid:124) j f δ NM ( ρ − s ) − )+1 ( s, t N M ( ρ − ( s ) − )+1 − s ) (cid:20) n (cid:89) k = N M ( ρ − ( s ) − )+2 f δ k ( t k − , t k − t k − ) (cid:21) , (C.4)where i is a vector of zeroes except for the i -th entry which is one. Thus, the final expression for theestimator of a i,j isˆ a i,j = q i,j π ( Q , Λ ) (cid:20) (cid:81) nk =1 f δ k ( t k − , t k − t k − ) (cid:21) × (cid:34) (cid:90) T π ( Q , Λ ) (cid:20) N M ( ρ − ( s ) − ) (cid:89) k =1 f δ k ( t k − , t k − t k − ) (cid:21) ¯ F ( t N M ( ρ − ( s ) − ) , s − t N M ( ρ − ( s ) − ) ) i × (cid:124) j f δ NM ( ρ − s ) − )+1 ( s, t N M ( ρ − ( s ) − )+1 − s ) (cid:20) n (cid:89) k = N M ( ρ − ( s ) − )+2 f δ k ( t k − , t k − t k − ) (cid:21) ds (cid:35) . (C.5) C.1.1. The estimator for the number of events of interest in each regime: ˆ n i Continuing from Equation (3.11), we obtainˆ n i = E [ n i | N M ( ρ − ( t )) , ≤ t ≤ T ]= n (cid:88) k =1 P [ M ( t k ) = i | N M ( ρ − ( t )) , ≤ t ≤ T ]= n (cid:88) k =1 P [ M ( t k ) = i, N M ( ρ − ( t )) , ≤ t ≤ T ] P [ N M ( ρ − ( t )) , ≤ t ≤ T ]= 1 π ( Q , Λ ) (cid:20) (cid:81) nk =1 f δ k ( t k − , t k − t k − ) (cid:21) × n (cid:88) k =1 π ( Q , Λ ) (cid:20) N M ( ρ − ( t k )) (cid:89) l =1 f δ l ( t l − , t l − t l − ) (cid:21) i (cid:124) i (cid:20) n (cid:89) l = N M ( ρ − ( t k )+1 f δ l ( t l − , t l − t l − ) (cid:21) . (C.6)32 .1.2. The estimator for the time spent in each regime: ˆ T i Using the expression for ˆ T i from Equation (3.12) and following a similar procedure to the ˆ m i,j calcula-tions, it can be seen thatˆ T i = (cid:90) T P [ M ( s ) = i | N M ( ρ − ( t )) , ≤ t ≤ T ] ds = 1 π ( Q , Λ ) (cid:20) (cid:81) nk =1 f δ k ( t k − , t k − t k − ) (cid:21) × (cid:90) T P [ M ( s ) = i, N M ( ρ − ( t )); 0 ≤ t < s ] P [ N M ( ρ − ( t )); s ≤ t ≤ T | M ( s ) = i ] ds = 1 π ( Q , Λ ) (cid:20) (cid:81) nk =1 f δ k ( t k − , t k − t k − ) (cid:21) × (cid:34) (cid:90) T π ( Q , Λ ) (cid:20) N M ( ρ − ( s ) − ) (cid:89) l =1 f δ l ( t l − , t l − t l − ) (cid:21) ¯ F ( t N M ( ρ − ( s ) − ) , s − t N M ( ρ − ( s ) − ) ) i × (cid:124) j f δ NM ( ρ − s ) − )+1 ( s, t N M ( ρ − ( s ) − )+1 − s ) (cid:20) n (cid:89) l = N M ( ρ − ( s ) − )+2 f δ l ( t l − , t l − t l − ) (cid:21) ds (cid:35) . (C.7)In the above, the event { M ( s − ) = i } has been replaced with { M ( s ) = i } which does not change the densitydue to the fact that { M ( s ) } is continuous in probability. C.1.3. The estimator for the operational time spent in each regime: ˆ T ∗ i The results from the previous section are easily adapted to obtain the following estimator for T ∗ i :ˆ T ∗ i = (cid:90) T P [ M ( s ) = i | N M ( ρ − ( t )) , ≤ t ≤ T ] γ ( s ) ds = 1 π ( Q , Λ ) (cid:20) (cid:81) nk =1 f δ k ( t k − , t k − t k − ) (cid:21) × (cid:34) (cid:90) T π ( Q , Λ ) (cid:20) N M ( ρ − ( s ) − ) (cid:89) l =1 f δ l ( t l − , t l − t l − ) (cid:21) ¯ F ( t N M ( ρ − ( s ) − ) , s − t N M ( ρ − ( s ) − ) ) i × (cid:124) j f δ NM ( ρ − s ) − )+1 ( s, t N M ( ρ − ( s ) − )+1 − s ) (cid:20) n (cid:89) l = N M ( ρ − ( s ) − )+2 f δ l ( t l − , t l − t l − ) (cid:21) γ ( s ) ds (cid:35)(cid:35)