matrixdist: An R Package for Inhomogeneous Phase-Type Distributions
January 26, 2021
MARTIN BLADT AND JORGE YSLAS
Abstract.
Inhomogeneous phase-type distributions (IPH) are a broad class of laws which arise from the absorption times of Markov jump processes. In the time-homogeneous particular case, we recover phase-type (PH) distributions. In matrix notation, various functionals corresponding to their distributional properties are explicitly available and succinctly described. As the number of parameters increases, IPH distributions may converge weakly to any probability measure on the positive real line, making them particularly attractive candidates for statistical modelling purposes. Contrary to PH distributions, the IPH class allows for a wide range of tail behaviours, which often leads to adequate estimation with a moderate number of parameters. One of the main difficulties in estimating PH and IPH distributions is their large number of matrix parameters. This drawback is best handled through the expectation-maximisation (EM) algorithm, exploiting the underlying and unobserved Markov structure. The matrixdist package presents tools for IPH distributions to efficiently evaluate functionals, simulate, and carry out maximum likelihood estimation through a three-step EM algorithm. Aggregated and right-censored data are supported by the fitting routines, and in particular, one may estimate time-to-event data, histograms, or discretised theoretical distributions.

1. Introduction
Phase-type (PH) distributions ([7]) are a natural extension of the exponential distribution and are defined as the time to reach an absorbing state in an otherwise transient time-homogeneous pure-jump Markov process. The transient state space is of integer dimension $p \geq 1$. In the particular case $p = 1$ we recover the exponential distribution, and for general $p$, the asymptotic tail behaviour remains exponential. Since we only observe the time to absorption but not the actual stochastic process trajectory, models based on phase-type distributions are hidden Markov models. They are typically estimated using the expectation-maximisation (EM) algorithm (see [6] for an MCMC approach). Closed-form formulae for functionals such as density, cumulative distribution function, Laplace transform, and moments are all explicitly available in terms of matrices and closely resemble their scalar counterparts.

Letting $p$ grow indefinitely, one may choose the initial distribution of the Markov process, the sojourn (exponential) rates, and the transition rates between states in such a way that the resulting PH distributions converge weakly to any given and fixed distribution on the positive real line. Hence, we may regard PH distributions as a generalisation of the exponential distribution, which is as flexible as desired in the overall shape of its distribution function. However, the tail behaviour always remains exponential, and in many applications, a better description is required to estimate functionals based on quantiles correctly. This drawback naturally leads to considering classes which extend PH distributions and allow different tail behaviour while still enjoying closed-form formulae, the weak convergence property, and tractable estimation via the EM algorithm. One such class is comprised by inhomogeneous phase-type (IPH) distributions.

Inhomogeneous phase-type distributions ([1]) are equivalently defined as the law of a transformed PH distributed random variable, or as the absorption time of a time-inhomogeneous pure-jump Markov process with $p$ transient states and one additional absorbing state. Depending on the chosen transformation, the tail behaviour of the resulting IPH distribution can be shown to be as heavy or light as desired. When considering parametric transformations, the estimation procedure was outlined in [2] for possibly aggregated data (see also [4] for extensions to right-censored data and survival regression models).

This paper presents the matrixdist package ([5]) to work with PH and IPH distributions, implemented in R ([13]). Various structures for the sub-intensity matrix of the underlying Markov process are considered, and several parametric transformations are provided, inspired by real-life applications. Most of the functions are coded in C++ for enhanced performance and made available in R through the package Rcpp. It is worth mentioning that the R package mapfit provides statistical routines for Markovian arrival processes (MAP), which can, loosely speaking, be seen as dependent concatenations of PH variables (arrivals), but only the time-homogeneous case is treated.

The remainder of the paper is organised as follows. Section 2 provides the mathematical formulation of PH and IPH distributions, their basic properties, and inhomogeneity parametrisations. Estimation for aggregated and right-censored data via the EM algorithm is detailed in Section 3, for various sub-intensity matrix structures. In Section 4 the main methods of the matrixdist package for functional evaluation, simulation and estimation are presented. An illustration is provided in Section 5, and a condensed summary in Section 6.
2. Definitions and properties
2.1. Phase-type distributions.
Let $(J_t)_{t \geq 0}$ be a time-homogeneous Markov jump process on a finite state space $\{1, \dots, p, p+1\}$, where states $1, \dots, p$ are transient and state $p+1$ is absorbing. Then, the intensity matrix $\mathbf{\Lambda}$ of $(J_t)_{t \geq 0}$ is of the form

$$\mathbf{\Lambda} = \begin{pmatrix} \mathbf{S} & \mathbf{s} \\ \mathbf{0} & 0 \end{pmatrix},$$

where $\mathbf{S}$ is a $p \times p$ matrix and $\mathbf{s}$ is a $p$-dimensional column vector. Since every row of $\mathbf{\Lambda}$ sums to zero, it follows that $\mathbf{s} = -\mathbf{S}\mathbf{e}$, where $\mathbf{e} = (1, \dots, 1)'$ is a $p$-dimensional column vector of ones. Assume that the process starts somewhere in the transient space with probability $\alpha_k = P(J_0 = k)$, $k = 1, \dots, p$, and let $\boldsymbol{\alpha} = (\alpha_1, \dots, \alpha_p)$. Here it is assumed that the probability of starting in the absorbing state $p+1$ is zero, i.e., $P(J_0 = p+1) = 0$. Then, we say that the time until absorption $Y$ has phase-type distribution with representation $(\boldsymbol{\alpha}, \mathbf{S})$, and we write $Y \sim \mathrm{PH}(\boldsymbol{\alpha}, \mathbf{S})$ or $Y \sim \mathrm{PH}_p(\boldsymbol{\alpha}, \mathbf{S})$. The density $f$ and distribution function $F$ of $Y \sim \mathrm{PH}(\boldsymbol{\alpha}, \mathbf{S})$ are given by

$$f(y) = \boldsymbol{\alpha} \exp(\mathbf{S} y)\, \mathbf{s}, \quad y > 0, \qquad (2.1)$$

$$F(y) = 1 - \boldsymbol{\alpha} \exp(\mathbf{S} y)\, \mathbf{e}, \quad y > 0, \qquad (2.2)$$

where the exponential of a matrix $\mathbf{A}$ is defined by

$$\exp(\mathbf{A}) = \sum_{n=0}^{\infty} \frac{\mathbf{A}^n}{n!}.$$

The efficient computation of the exponential of a matrix is not a straightforward task in high dimensions (see [9]). The moments of $Y \sim \mathrm{PH}(\boldsymbol{\alpha}, \mathbf{S})$ are given by

$$E\left(Y^{\theta}\right) = \Gamma(1 + \theta)\, \boldsymbol{\alpha} (-\mathbf{S})^{-\theta} \mathbf{e}, \quad \theta > 0. \qquad (2.3)$$

Let $Y_1 \sim \mathrm{PH}_{p_1}(\boldsymbol{\alpha}_1, \mathbf{S}_1)$ and $Y_2 \sim \mathrm{PH}_{p_2}(\boldsymbol{\alpha}_2, \mathbf{S}_2)$ be independent phase-type distributed random variables of dimensions $p_1$ and $p_2$, respectively. Then

$$Y_1 + Y_2 \sim \mathrm{PH}_{p_1 + p_2}\left( (\boldsymbol{\alpha}_1, \mathbf{0}),\; \begin{pmatrix} \mathbf{S}_1 & \mathbf{s}_1 \boldsymbol{\alpha}_2 \\ \mathbf{0} & \mathbf{S}_2 \end{pmatrix} \right), \qquad (2.4)$$

$$\min(Y_1, Y_2) \sim \mathrm{PH}\left( \boldsymbol{\alpha}_1 \otimes \boldsymbol{\alpha}_2,\; \mathbf{S}_1 \oplus \mathbf{S}_2 \right), \qquad (2.5)$$

and

$$\max(Y_1, Y_2) \sim \mathrm{PH}\left( (\boldsymbol{\alpha}_1 \otimes \boldsymbol{\alpha}_2, \mathbf{0}, \mathbf{0}),\; \begin{pmatrix} \mathbf{S}_1 \oplus \mathbf{S}_2 & \mathbf{I} \otimes \mathbf{s}_2 & \mathbf{s}_1 \otimes \mathbf{I} \\ \mathbf{0} & \mathbf{S}_1 & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{S}_2 \end{pmatrix} \right), \qquad (2.6)$$

where $\otimes$ and $\oplus$ denote the Kronecker product and sum, respectively, and $\mathbf{I}$ is the identity matrix of appropriate dimension. In other words, the PH class is closed under addition, minima and maxima. In fact, it is also closed under any order statistic, and we refer to [7] for the exact and more involved expression.

By decomposing the matrix $\mathbf{S}$ into its Jordan normal form, it follows directly from the definition of the survival function that the right tail of a PH distribution is asymptotically exponential. For a comprehensive reading on phase-type distributions we refer to [7].
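Formulae (2.1) and (2.2) are easy to check numerically. The following sketch (in Python, for illustration only; the helper names are hypothetical, and the package itself evaluates these functionals in C++) implements them with a scaling-and-squaring Taylor evaluation of the matrix exponential and recovers two known special cases: the exponential law ($p = 1$) and the Erlang(2) density $y e^{-y}$ from a generalised Erlang structure.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matexp(A, terms=30):
    """Scaling-and-squaring Taylor evaluation of exp(A); fine for small p."""
    n = len(A)
    norm = max(sum(abs(x) for x in row) for row in A)
    sq = 0
    while norm > 0.5:
        norm, sq = norm / 2.0, sq + 1
    B = [[x / 2.0 ** sq for x in row] for row in A]
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, B)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(sq):
        result = matmul(result, result)
    return result

def ph_density(alpha, S, y):
    """f(y) = alpha exp(S y) s with s = -S e, formula (2.1)."""
    p = len(S)
    s = [-sum(S[i]) for i in range(p)]
    E = matexp([[S[i][j] * y for j in range(p)] for i in range(p)])
    return sum(alpha[i] * E[i][j] * s[j] for i in range(p) for j in range(p))

def ph_cdf(alpha, S, y):
    """F(y) = 1 - alpha exp(S y) e, formula (2.2)."""
    p = len(S)
    E = matexp([[S[i][j] * y for j in range(p)] for i in range(p)])
    return 1.0 - sum(alpha[i] * E[i][j] for i in range(p) for j in range(p))

# p = 1 gives the exponential law; the bidiagonal unit-rate structure with
# alpha = (1, 0) gives the Erlang(2) density y * exp(-y).
print(abs(ph_density([1.0], [[-2.0]], 0.5) - 2 * math.exp(-1)) < 1e-9)  # True
print(abs(ph_density([1.0, 0.0], [[-1.0, 1.0], [0.0, -1.0]], 2.0) - 2 * math.exp(-2)) < 1e-9)  # True
```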
2.2. Inhomogeneous phase-type distributions.

In [1] the class of inhomogeneous phase-type distributions was introduced by allowing the Markov jump process to be time-inhomogeneous in the construction principle of phase-type distributions. In this way, $(J_t)_{t \geq 0}$ has an intensity matrix of the form

$$\mathbf{\Lambda}(t) = \begin{pmatrix} \mathbf{S}(t) & \mathbf{s}(t) \\ \mathbf{0} & 0 \end{pmatrix},$$

where $\mathbf{s}(t) = -\mathbf{S}(t)\mathbf{e}$. Here it is assumed that the vector of initial probabilities is $\boldsymbol{\alpha} = (\alpha_1, \dots, \alpha_p)$. Then, we say that the time until absorption $X$ has inhomogeneous phase-type distribution with representation $(\boldsymbol{\alpha}, \mathbf{S}(t))$, and we write $X \sim \mathrm{IPH}(\boldsymbol{\alpha}, \mathbf{S}(t))$. However, this setting is too general for applications, since its functionals are given in terms of product integrals. Thus, we consider the subclass of IPH distributions where the intensity matrix is of the form $\mathbf{S}(t) = \lambda(t)\, \mathbf{S}$, where $\lambda$ is some known non-negative real function and $\mathbf{S}$ is a sub-intensity matrix. In this case we write $X \sim \mathrm{IPH}(\boldsymbol{\alpha}, \mathbf{S}, \lambda)$. Note that with $\lambda \equiv 1$ we recover the PH class. If $X \sim \mathrm{IPH}(\boldsymbol{\alpha}, \mathbf{S}, \lambda)$, then there exists a function $g$ such that $X \sim g(Y)$, where $Y \sim \mathrm{PH}(\boldsymbol{\alpha}, \mathbf{S})$. The relationship between $g$ and $\lambda$ is given by the following expression:

$$g^{-1}(y) = \int_0^y \lambda(t)\, dt.$$

The density $f$ and distribution function $F$ of $X \sim \mathrm{IPH}(\boldsymbol{\alpha}, \mathbf{S}, \lambda)$ can again be expressed using the exponential of matrices, and are given by

$$f(x) = \lambda(x)\, \boldsymbol{\alpha} \exp\left( \int_0^x \lambda(t)\, dt\; \mathbf{S} \right) \mathbf{s}, \quad x > 0, \qquad (2.7)$$

$$F(x) = 1 - \boldsymbol{\alpha} \exp\left( \int_0^x \lambda(t)\, dt\; \mathbf{S} \right) \mathbf{e}, \quad x > 0. \qquad (2.8)$$

The class of IPH distributions is no longer confined to exponential tails, despite being defined in terms of matrix exponentials. Its exact asymptotic behaviour is determined by the function $\lambda$ (see [4] for details).

It turns out that a number of IPH distributions can be expressed as classical distributions with matrix-valued parameters, properly defined using functional calculus. Tables 2.1, 2.2 and 2.3 contain the IPH transformations, densities and cumulative distribution functions in closed-form matrix notation, as implemented in the matrixdist package (cf. [1, 2, 4] for details and motivation on the various parametrisations).

Table 2.1. Transformations ($\lambda(t)$, $g(y)$, and parameter domain)

• Matrix-Pareto: $\lambda(t) = (t + \beta)^{-1}$; $g(y) = \beta(e^{y} - 1)$; $\beta > 0$.
• Matrix-Weibull: $\lambda(t) = \beta t^{\beta - 1}$; $g(y) = y^{1/\beta}$; $\beta > 0$.
• Matrix-Lognormal: $\lambda(t) = \gamma (\log(t + 1))^{\gamma - 1}/(t + 1)$; $g(y) = \exp(y^{1/\gamma}) - 1$; $\gamma > 0$.
• Matrix-Loglogistic: $\lambda(t) = \theta t^{\theta - 1}/(t^{\theta} + \gamma^{\theta})$; $g(y) = \gamma (e^{y} - 1)^{1/\theta}$; $\gamma, \theta > 0$.
• Matrix-Gompertz: $\lambda(t) = e^{\beta t}$; $g(y) = \log(\beta y + 1)/\beta$; $\beta > 0$.
• Matrix-GEV: $g(y) = \mu + \sigma (y^{-\xi} - 1)/\xi$; $\mu \in \mathbb{R}$, $\sigma > 0$, $\xi \in \mathbb{R}$.

Table 2.2. Densities ($f(x)$ and domain)

• Matrix-Pareto: $f(x) = \boldsymbol{\alpha} \left( \frac{x}{\beta} + 1 \right)^{\mathbf{S} - \mathbf{I}} \frac{\mathbf{s}}{\beta}$, $x > 0$.
• Matrix-Weibull: $f(x) = \boldsymbol{\alpha} \exp\left(\mathbf{S} x^{\beta}\right) \mathbf{s}\, \beta x^{\beta - 1}$, $x > 0$.
• Matrix-Lognormal: $f(x) = \boldsymbol{\alpha} \exp\left(\mathbf{S} (\log(x + 1))^{\gamma}\right) \mathbf{s}\, \frac{\gamma (\log(x + 1))^{\gamma - 1}}{x + 1}$, $x > 0$.
• Matrix-Loglogistic: $f(x) = \boldsymbol{\alpha} \exp\left(\mathbf{S} \log\left((x/\gamma)^{\theta} + 1\right)\right) \mathbf{s}\, \frac{\theta x^{\theta - 1}}{x^{\theta} + \gamma^{\theta}}$, $x > 0$.
• Matrix-Gompertz: $f(x) = \boldsymbol{\alpha} \exp\left(\mathbf{S} (e^{\beta x} - 1)/\beta\right) \mathbf{s}\, e^{\beta x}$, $x > 0$.
• Matrix-GEV ($\xi \neq 0$): $f(x) = \frac{1}{\sigma}\, \boldsymbol{\alpha} \exp\left(\mathbf{S} \left(1 + \xi \frac{x - \mu}{\sigma}\right)^{-1/\xi}\right) \mathbf{s} \left(1 + \xi \frac{x - \mu}{\sigma}\right)^{-(1 + \xi)/\xi}$, $\xi(x - \mu)/\sigma > -1$.
• Matrix-GEV ($\xi = 0$): $f(x) = \frac{1}{\sigma}\, \boldsymbol{\alpha} \exp\left(\mathbf{S}\, e^{-\frac{x - \mu}{\sigma}}\right) \mathbf{s}\, e^{-\frac{x - \mu}{\sigma}}$, $x \in \mathbb{R}$.

Table 2.3. Cumulative distribution functions ($F(x)$)

• Matrix-Pareto: $F(x) = 1 - \boldsymbol{\alpha} \left( \frac{x}{\beta} + 1 \right)^{\mathbf{S}} \mathbf{e}$.
• Matrix-Weibull: $F(x) = 1 - \boldsymbol{\alpha} \exp\left(\mathbf{S} x^{\beta}\right) \mathbf{e}$.
• Matrix-Lognormal: $F(x) = 1 - \boldsymbol{\alpha} \exp\left(\mathbf{S} (\log(x + 1))^{\gamma}\right) \mathbf{e}$.
• Matrix-Loglogistic: $F(x) = 1 - \boldsymbol{\alpha} \exp\left(\mathbf{S} \log\left((x/\gamma)^{\theta} + 1\right)\right) \mathbf{e}$.
• Matrix-Gompertz: $F(x) = 1 - \boldsymbol{\alpha} \exp\left(\mathbf{S} (e^{\beta x} - 1)/\beta\right) \mathbf{e}$.
• Matrix-GEV ($\xi \neq 0$): $F(x) = \boldsymbol{\alpha} \exp\left(\mathbf{S} \left(1 + \xi \frac{x - \mu}{\sigma}\right)^{-1/\xi}\right) \mathbf{e}$.
• Matrix-GEV ($\xi = 0$): $F(x) = \boldsymbol{\alpha} \exp\left(\mathbf{S}\, e^{-\frac{x - \mu}{\sigma}}\right) \mathbf{e}$.

For a randomly generated 5-dimensional general phase-type distribution, Figure 2.1 depicts the density, distribution function, hazard rate and quantile function of the following ph and iph objects:

set.seed(1)
x  <- ph(structure = "general", dimension = 5)
x1 <- iph(x, gfun = "pareto", gfun_pars = 0.5)
x2 <- iph(x, gfun = "weibull", gfun_pars = 1.5)
x3 <- iph(x, gfun = "lognormal", gfun_pars = 2)
x4 <- iph(x, gfun = "gompertz", gfun_pars = 1)
Figure 2.1.
Functionals of 5-dimensional inhomogeneous phase-type distributions with the same underlying structure.
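As a numerical sanity check of the closed-form densities in Table 2.2, the following sketch (Python, illustrative only; `mweibull_density` is a hypothetical name, unrelated to the package's C++ routines) evaluates the Matrix-Weibull density and verifies that it integrates to one for a small Coxian-type example.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matexp(A, terms=30):
    """Scaling-and-squaring Taylor evaluation of exp(A)."""
    n = len(A)
    norm = max(sum(abs(x) for x in row) for row in A)
    sq = 0
    while norm > 0.5:
        norm, sq = norm / 2.0, sq + 1
    B = [[x / 2.0 ** sq for x in row] for row in A]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, B)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(sq):
        result = matmul(result, result)
    return result

def mweibull_density(alpha, S, beta, x):
    """Matrix-Weibull density from Table 2.2: alpha exp(S x^beta) s beta x^(beta-1)."""
    p = len(S)
    s = [-sum(S[i]) for i in range(p)]
    u = x ** beta          # g^{-1}(x) = x^beta, since lambda(t) = beta t^(beta-1)
    E = matexp([[S[i][j] * u for j in range(p)] for i in range(p)])
    core = sum(alpha[i] * E[i][j] * s[j] for i in range(p) for j in range(p))
    return core * beta * x ** (beta - 1.0)

# A 2-phase example with beta = 1.5: the closed form should be a genuine
# density, i.e. integrate to (approximately) one on the positive half-line.
alpha, S, beta = [1.0, 0.0], [[-2.0, 1.0], [0.0, -3.0]], 1.5
h = 0.005
mass = sum(mweibull_density(alpha, S, beta, (k - 0.5) * h) * h for k in range(1, 1601))
print(round(mass, 2))  # close to 1
```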
3. Inference algorithms
3.1. Estimation of PH distributions.
The expectation-maximisation (EM) algorithm is an iterative method for maximum likelihood (ML) estimation. This algorithm is particularly suitable for situations best described as incomplete-data problems, where ML estimation is difficult due to the absence of some part of the data in a more familiar and simpler data structure. Examples of incomplete-data settings include missing data, truncated distributions, and censored or grouped observations. Consider an i.i.d. sample $y_1, \dots, y_M$ from a phase-type distributed random variable. Since each $y_i$, $i = 1, \dots, M$, is a replication of the absorption time of a Markov jump process whose trajectory is unobserved, we are clearly in an incomplete-data set-up. This was first exploited in [3], where the authors proposed the use of an EM algorithm for ML estimation for PH distributions and derived explicit formulae for the E- and M-steps. We now briefly describe this method.

Let $B_k$ be the number of times the Markov process underlying the phase-type distribution initiates in state $k$, $N_{kl}$ the total number of transitions from state $k$ to $l$, $N_k$ the number of times an exit to the absorbing state was caused by a jump from state $k$, and $Z_k$ the total time the Markov process has spent in state $k$. The complete data likelihood $L_c$ is then given by

$$L_c(\boldsymbol{\alpha}, \mathbf{S}; \boldsymbol{y}) = \prod_{k=1}^{p} \alpha_k^{B_k} \prod_{k=1}^{p} \prod_{l \neq k} s_{kl}^{N_{kl}} e^{-s_{kl} Z_k} \prod_{k=1}^{p} s_k^{N_k} e^{-s_k Z_k},$$

leading to the EM algorithm 3.1, where $\mathbf{e}_k$ denotes a $p$-dimensional column vector with all entries equal to zero except the $k$-th entry, which is equal to one.

Algorithm 3.1 (EM algorithm for PH distributions).

Input: observations $y_1, \dots, y_M$, number of iterations to perform $N$, initial parameters $(\boldsymbol{\alpha}, \mathbf{S})$.

Loop over $1 \leq n \leq N$:

(1) (E-step) Calculate

$$E(B_k \mid \boldsymbol{Y} = \boldsymbol{y}) = \sum_{i=1}^{M} \alpha_k \frac{\mathbf{e}_k' \exp(\mathbf{S} y_i)\, \mathbf{s}}{\boldsymbol{\alpha} \exp(\mathbf{S} y_i)\, \mathbf{s}}$$

$$E(Z_k \mid \boldsymbol{Y} = \boldsymbol{y}) = \sum_{i=1}^{M} \frac{\int_0^{y_i} \mathbf{e}_k' \exp(\mathbf{S}(y_i - u))\, \mathbf{s}\; \boldsymbol{\alpha} \exp(\mathbf{S} u)\, \mathbf{e}_k \, du}{\boldsymbol{\alpha} \exp(\mathbf{S} y_i)\, \mathbf{s}}$$

$$E(N_{kl} \mid \boldsymbol{Y} = \boldsymbol{y}) = \sum_{i=1}^{M} s_{kl} \frac{\int_0^{y_i} \mathbf{e}_l' \exp(\mathbf{S}(y_i - u))\, \mathbf{s}\; \boldsymbol{\alpha} \exp(\mathbf{S} u)\, \mathbf{e}_k \, du}{\boldsymbol{\alpha} \exp(\mathbf{S} y_i)\, \mathbf{s}}$$

$$E(N_k \mid \boldsymbol{Y} = \boldsymbol{y}) = \sum_{i=1}^{M} s_k \frac{\boldsymbol{\alpha} \exp(\mathbf{S} y_i)\, \mathbf{e}_k}{\boldsymbol{\alpha} \exp(\mathbf{S} y_i)\, \mathbf{s}}$$

(2) (M-step) Let

$$\hat{\alpha}_k = \frac{E(B_k \mid \boldsymbol{Y} = \boldsymbol{y})}{M}, \quad \hat{s}_{kl} = \frac{E(N_{kl} \mid \boldsymbol{Y} = \boldsymbol{y})}{E(Z_k \mid \boldsymbol{Y} = \boldsymbol{y})}, \quad \hat{s}_k = \frac{E(N_k \mid \boldsymbol{Y} = \boldsymbol{y})}{E(Z_k \mid \boldsymbol{Y} = \boldsymbol{y})}, \quad \hat{s}_{kk} = -\sum_{l \neq k} \hat{s}_{kl} - \hat{s}_k.$$

Let $\hat{\boldsymbol{\alpha}} = (\hat{\alpha}_1, \dots, \hat{\alpha}_p)$, $\hat{\mathbf{S}} = \{\hat{s}_{kl}\}_{k,l = 1, \dots, p}$ and $\hat{\mathbf{s}} = (\hat{s}_1, \dots, \hat{s}_p)'$.

(3)
Assign $\boldsymbol{\alpha} := \hat{\boldsymbol{\alpha}}$, $\mathbf{S} := \hat{\mathbf{S}}$, $\mathbf{s} := \hat{\mathbf{s}}$ and GOTO 1.

Output: fitted parameters $(\boldsymbol{\alpha}, \mathbf{S})$.

Let $(\boldsymbol{\alpha}_n, \mathbf{S}_n)$ be the parameter values after the $n$-th iteration. The EM algorithm ensures that each new estimate is better than the previous one in the sense that

$$L(\boldsymbol{\alpha}_n, \mathbf{S}_n; \boldsymbol{y}) \leq L(\boldsymbol{\alpha}_{n+1}, \mathbf{S}_{n+1}; \boldsymbol{y}),$$

where $L(\boldsymbol{\alpha}, \mathbf{S}; \boldsymbol{y}) = \prod_{i=1}^{M} \boldsymbol{\alpha} \exp(\mathbf{S} y_i)\, \mathbf{s}$ is the likelihood function. In other words, the likelihood function increases at each iteration, and hence converges to some maximum. In general, there is no way of guaranteeing that such a maximum is global. In practice, a commonly used criterion for determining the number of iterations is to consider a number large enough that subsequent changes in the log-likelihood are negligible.

Note that the EM algorithm 3.1 preserves zeros in $\boldsymbol{\alpha}$ and $\mathbf{S}$. This means that when an element has been estimated to zero, it will remain zero in subsequent iterations. Thus, one can initiate the algorithm with a specific initial structure, and this structure will be preserved. The matrixdist package includes the following preset structures:

• General PH. No restrictions; all elements are allowed to be non-zero.
• Hyperexponential. Also known as a mixture of exponentials. Here, the Markov process is allowed to start in any of the transient states but gets absorbed without visiting any other state. This means that the matrix $\mathbf{S}$ is non-zero only on the diagonal.
• Generalised Erlang. Also known as a sum of exponentials. Here, the Markov process starts in state 1 with probability one, i.e., $\boldsymbol{\alpha} = (1, 0, \dots, 0)$, then visits each subsequent state $2, \dots, p$, in that order, and finally reaches the absorbing state.
• Coxian. Same structure as the generalised Erlang, but the Markov process is allowed to reach absorption from any of the transient states.
• Generalised Coxian. Same structure as the Coxian, but the Markov process is allowed to start in any state.

The main computational burden lies in the E-steps, where the computation of matrix exponentials is required. While this can be done in multiple ways (see [9]), the matrixdist package employs the approach described in [3], which consists of converting the problem into a system of ODEs. More specifically, we consider the $p$-dimensional vector functions

$$\mathbf{a}(y \mid \boldsymbol{\alpha}, \mathbf{S}) = \boldsymbol{\alpha} \exp(\mathbf{S} y),$$
$$\mathbf{b}(y \mid \boldsymbol{\alpha}, \mathbf{S}) = \exp(\mathbf{S} y)\, \mathbf{s},$$
$$\mathbf{c}(y, k \mid \boldsymbol{\alpha}, \mathbf{S}) = \int_0^y \boldsymbol{\alpha} \exp(\mathbf{S} u)\, \mathbf{e}_k \exp(\mathbf{S}(y - u))\, \mathbf{s}\, du, \quad k = 1, \dots, p.$$

Note that the E-step can be computed from knowledge of the above functions. Thus, for fixed $\boldsymbol{\alpha}$ and $\mathbf{S}$, we get the following system of ODEs

$$\mathbf{a}^{(1)}(y \mid \boldsymbol{\alpha}, \mathbf{S}) = \mathbf{a}(y \mid \boldsymbol{\alpha}, \mathbf{S})\, \mathbf{S},$$
$$\mathbf{b}^{(1)}(y \mid \boldsymbol{\alpha}, \mathbf{S}) = \mathbf{S}\, \mathbf{b}(y \mid \boldsymbol{\alpha}, \mathbf{S}),$$
$$\mathbf{c}^{(1)}(y, k \mid \boldsymbol{\alpha}, \mathbf{S}) = \mathbf{S}\, \mathbf{c}(y, k \mid \boldsymbol{\alpha}, \mathbf{S}) + a_k(y \mid \boldsymbol{\alpha}, \mathbf{S})\, \mathbf{s}, \quad k = 1, \dots, p,$$

where $a_k(y \mid \boldsymbol{\alpha}, \mathbf{S})$ is the $k$-th component of the vector $\mathbf{a}(y \mid \boldsymbol{\alpha}, \mathbf{S})$. Here, the superscript $(k)$ denotes the $k$-th order derivative. The initial conditions of the system are

$$\mathbf{a}(0 \mid \boldsymbol{\alpha}, \mathbf{S}) = \boldsymbol{\alpha}, \quad \mathbf{b}(0 \mid \boldsymbol{\alpha}, \mathbf{S}) = \mathbf{s}, \quad \mathbf{c}(0, k \mid \boldsymbol{\alpha}, \mathbf{S}) = \mathbf{0}, \quad k = 1, \dots, p.$$

The system is then solved numerically via a Runge-Kutta method of fourth order. This is also the method employed in the EMpht programme ([12]). We use the same default step-length of $0.1 / \max_{i = 1, \dots, p} |s_{ii}|$ as the EMpht programme, but the user may input a custom step-length if desired. An important consideration in the implementation of this method is that the sample must be ordered, which is done automatically by the matrixdist package.
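The ODE route can be illustrated in a few lines: a classical fourth-order Runge-Kutta scheme applied to $\mathbf{a}^{(1)}(y) = \mathbf{a}(y)\mathbf{S}$ with $\mathbf{a}(0) = \boldsymbol{\alpha}$ recovers $\boldsymbol{\alpha}\exp(\mathbf{S}y)$. This is an illustrative sketch in Python, not the package's C++ implementation.

```python
def vecmat(v, S):
    """Row vector times matrix."""
    return [sum(v[k] * S[k][j] for k in range(len(v))) for j in range(len(S[0]))]

def rk4_a(alpha, S, y, h=0.01):
    """Integrate a'(u) = a(u) S from 0 to y with classical RK4 steps."""
    a = list(alpha)
    steps = max(1, round(y / h))
    h = y / steps
    for _ in range(steps):
        k1 = vecmat(a, S)
        k2 = vecmat([a[i] + h / 2 * k1[i] for i in range(len(a))], S)
        k3 = vecmat([a[i] + h / 2 * k2[i] for i in range(len(a))], S)
        k4 = vecmat([a[i] + h * k3[i] for i in range(len(a))], S)
        a = [a[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
             for i in range(len(a))]
    return a

# For this generalised Erlang block, alpha exp(S y) is known in closed form:
# exp(-y) * (1, y), so the survival probability a(y) e equals (1 + y) exp(-y).
alpha = [1.0, 0.0]
S = [[-1.0, 1.0], [0.0, -1.0]]
a2 = rk4_a(alpha, S, 2.0)
print(round(sum(a2), 6))  # (1 + 2) * exp(-2) ≈ 0.406006
```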
See [2, 10] for other methods based on uniformisation.

3.2. Censored data.
In [11] an EM algorithm for ML estimation of PH distributions in the presence of censored data was derived. The only difference with respect to the algorithm in [3] is in the E-step. The authors derived explicit formulae for left-censoring, right-censoring and, more generally, interval censoring. The matrixdist package includes the right-censoring case, which is arguably the most commonly encountered setting in application areas such as survival analysis. Next we present the missing formulae which are needed for this specific case:

$$E(B_k \mid Y > v) = \alpha_k \frac{\mathbf{e}_k' \exp(\mathbf{S} v)\, \mathbf{e}}{\boldsymbol{\alpha} \exp(\mathbf{S} v)\, \mathbf{e}},$$

$$E(Z_k \mid Y > v) = \frac{\int_0^v \mathbf{e}_k' \exp(\mathbf{S}(v - u))\, \mathbf{e}\; \boldsymbol{\alpha} \exp(\mathbf{S} u)\, \mathbf{e}_k \, du}{\boldsymbol{\alpha} \exp(\mathbf{S} v)\, \mathbf{e}},$$

$$E(N_{kl} \mid Y > v) = s_{kl} \frac{\int_0^v \mathbf{e}_l' \exp(\mathbf{S}(v - u))\, \mathbf{e}\; \boldsymbol{\alpha} \exp(\mathbf{S} u)\, \mathbf{e}_k \, du}{\boldsymbol{\alpha} \exp(\mathbf{S} v)\, \mathbf{e}}.$$

Note that the contribution from $N_k$, given that $Y > v$, must be zero, since absorption has not yet taken place. The computation of these expressions is performed via a fourth-order Runge-Kutta method as well (we refer to [11] for further details).

3.3. Weighted data.
Suppose that for a given sample there are $y_1, \dots, y_D$ distinct values and that $y_i$ appears $M_i \geq 1$ times, $i = 1, \dots, D$. In this way, $M_1 + \cdots + M_D = M$, and the sums over all sample points in the E-step may be reduced to weighted sums of fewer terms. For instance,

$$E(B_k \mid \boldsymbol{Y} = \boldsymbol{y}) = \sum_{i=1}^{D} M_i\, \alpha_k \frac{\mathbf{e}_k' \exp(\mathbf{S} y_i)\, \mathbf{s}}{\boldsymbol{\alpha} \exp(\mathbf{S} y_i)\, \mathbf{s}}.$$

This is particularly useful when data are represented by a histogram rather than raw data. The matrixdist package allows weighted data sets as input to estimation routines. In the absence of weight input, an equal weight is assigned to each observation by default.

Another situation where weighted samples are useful is fitting PH distributions to a given theoretical distribution. In [3] it was shown that the EM algorithm can be modified to this case, and expressions in terms of integrals are given. However, in general none of the integrals have explicit solutions, and approximations have to be employed. These integrals can be approximated by weighted sums over a finite grid $y_1, \dots, y_D$, i.e., if $h$ is the given theoretical density, then an integral $\int_0^\infty \cdots\, h(y)\, dy$ is approximated by a sum $\sum_{i=1}^{D} \cdots\, M_i$ with weights $M_i$ proportional to $h(y_i)$. Thus, the problem is reduced to having a weighted sample as input.

3.4. Estimation of IPH distributions.

An EM algorithm for ML estimation of IPH distributions was recently introduced in [2]. The main idea is to allow the inhomogeneity function $\lambda(\cdot\,; \boldsymbol{\beta})$ to depend on a parameter $\boldsymbol{\beta}$ in the representation $X \sim \mathrm{IPH}(\boldsymbol{\alpha}, \mathbf{S}, \lambda(\cdot\,; \boldsymbol{\beta}))$. In particular, $g^{-1}(X; \boldsymbol{\beta}) \sim \mathrm{PH}(\boldsymbol{\alpha}, \mathbf{S})$, where

$$g^{-1}(x; \boldsymbol{\beta}) = \int_0^x \lambda(t; \boldsymbol{\beta})\, dt.$$
Notice that the parameter $\boldsymbol{\beta}$ must also be estimated, such that the approach of transforming into the PH domain taken in [1] is not applicable in this parametric setting. Instead, [2, 4] considered estimating $\boldsymbol{\beta}$ separately from the other parameters and performing a new transformation to the PH domain after each set of E- and M-steps. In this way, one obtains the EM algorithm 3.2, which is also guaranteed to increase the likelihood at each iteration, thus converging to a (possibly local) maximum.

It is worth noting that algorithm 3.2 will increase the likelihood at each iteration even if the maximum in Step 2 is not fully reached, as long as there is no decrease of the likelihood in the numerical optimisation routine involved in that step. Hence, to speed up convergence, a pre-set number of iterations, or depth, may be specified for Step 2.

Extensions to censored and weighted data can be handled in a way analogous to the PH case and are also implemented in the matrixdist package, but are presently omitted for conciseness. We refer to [4] for further details on these algorithms, as well as other covariate-dependent extensions.
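Step (1) of algorithm 3.2 only needs the inverse transformation $g^{-1}(\cdot\,; \boldsymbol{\beta})$ from Table 2.1 to map the data into the PH domain. A minimal sketch (Python, with illustrative helper names) of two such pairs, checking that $g$ and $g^{-1}$ are mutual inverses:

```python
import math

# Matrix-Pareto: lambda(t) = 1 / (t + beta), so g^{-1}(x) = log(x / beta + 1)
# and g(y) = beta * (exp(y) - 1).
def pareto_g(y, beta):
    return beta * (math.exp(y) - 1.0)

def pareto_g_inv(x, beta):
    return math.log(x / beta + 1.0)

# Matrix-Gompertz: lambda(t) = exp(beta t), so g^{-1}(x) = (exp(beta x) - 1) / beta
# and g(y) = log(beta y + 1) / beta.
def gompertz_g(y, beta):
    return math.log(beta * y + 1.0) / beta

def gompertz_g_inv(x, beta):
    return (math.exp(beta * x) - 1.0) / beta

# Round trips: transforming data into the PH domain and back is lossless.
x = 3.7
print(round(pareto_g(pareto_g_inv(x, 0.5), 0.5), 10))      # 3.7
print(round(gompertz_g(gompertz_g_inv(x, 2.0), 2.0), 10))  # 3.7
```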
Algorithm 3.2 (EM algorithm for IPH distributions).

Input: observations $x_1, \dots, x_M$, number of iterations to perform $N$, initial parameters $(\boldsymbol{\alpha}, \mathbf{S}, \boldsymbol{\beta})$.

Loop over $1 \leq n \leq N$:

(1) Transform the data into $y_i = g^{-1}(x_i; \boldsymbol{\beta})$, $i = 1, \dots, M$, and apply the E- and M-steps of EM algorithm 3.1, by which we obtain the estimators $(\hat{\boldsymbol{\alpha}}, \hat{\mathbf{S}})$.

(2) Compute

$$\hat{\boldsymbol{\beta}} = \arg\max_{\boldsymbol{\beta}} \sum_{i=1}^{M} \log\left( f_X\left(x_i; \hat{\boldsymbol{\alpha}}, \hat{\mathbf{S}}, \boldsymbol{\beta}\right)\right) = \arg\max_{\boldsymbol{\beta}} \sum_{i=1}^{M} \log\left( \lambda(x_i; \boldsymbol{\beta})\, \hat{\boldsymbol{\alpha}} \exp\left( \int_0^{x_i} \lambda(t; \boldsymbol{\beta})\, dt\; \hat{\mathbf{S}} \right) \hat{\mathbf{s}} \right).$$

(3) Assign $(\boldsymbol{\alpha}, \mathbf{S}, \boldsymbol{\beta}) = (\hat{\boldsymbol{\alpha}}, \hat{\mathbf{S}}, \hat{\boldsymbol{\beta}})$ and GOTO 1.

Output: fitted parameters $(\boldsymbol{\alpha}, \mathbf{S}, \boldsymbol{\beta})$.

4. Methods for PH and IPH distributions
The matrixdist package provides an object-oriented interface for PH and IPH distributions. The constructor function ph admits structural parameters $\boldsymbol{\alpha}$ and $\mathbf{S}$ and creates an object of class ph. Alternatively, one may input a structure ("general", "coxian", "gcoxian", "hyperexponential", or "gerlang"), and a randomly generated valid structure of the provided dimension will be created. The constructor function iph admits a ph object together with the inhomogeneity function g ("pareto", "weibull", "lognormal", "loglogistic", "gompertz", or "gev"), or alternatively admits the same input as the ph constructor function together with the inhomogeneity specifications. Once a ph or iph object has been created, a suite of methods is available for the efficient evaluation of functionals, simulation, and estimation. We describe in this section the implementation of these methods.
Density function

The dens method calculates the density function of a ph or iph object. For ph objects, an internal function phdensity is called from C++, which implements formula (2.1) and, in particular, calculates the matrix exponential using an implementation similar to MATLAB's built-in algorithm, the Padé approximation ([9]).

For iph objects, the expressions in Table 2.2 are implemented individually and called through their g function name. For instance, the C++ function mweibullden is called for Matrix-Weibull objects. This implementation is faster than inputting an inhomogeneity function into a generic function of the type (2.7).
Cumulative distribution function

The cdf method calculates the cumulative distribution function of a ph or iph object. For ph objects, an internal function phcdf is called from C++, which implements formula (2.2) and once again calculates the matrix exponential using the Padé approximation.

For iph objects, the expressions in Table 2.3 are implemented individually and called through their g function name. For instance, the C++ function mparetocdf is called for Matrix-Pareto objects. As for densities, this implementation is faster than inputting an inhomogeneity function into formula (2.8).
Quantile function

The quan method calculates the quantile function of a ph or iph object, defined as

$$Q(p) = \inf\{x \in \mathbb{R} : p \leq F(x)\}, \quad 0 < p < 1.$$

No general closed-form formula exists for PH and IPH distributions, so the R function stats::uniroot is called to numerically find this infimum. For very few special cases, such as a one-dimensional ph object, i.e. an exponential distribution, explicit formulae for quantiles are mathematically available, and thus their direct implementation will be faster.
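The root search behind quan can be sketched with plain bisection in place of stats::uniroot. For a one-dimensional ph object the answer is known in closed form, which gives an easy check (Python, illustrative only):

```python
import math

def exp_cdf(x, rate):
    """F(x) = 1 - alpha exp(S x) e with p = 1 and S = (-rate): an exponential."""
    return 1.0 - math.exp(-rate * x)

def quantile(p, cdf, tol=1e-10):
    """Q(p) = inf{x : p <= F(x)}, located by bracket growth plus bisection."""
    lo, hi = 0.0, 1.0
    while cdf(hi) < p:          # grow the bracket until it contains Q(p)
        hi *= 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

q = quantile(0.95, lambda x: exp_cdf(x, 2.0))
print(round(q, 6))  # matches -log(0.05) / 2 ≈ 1.497866
```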
Hazard rate function

The haz method calculates the hazard rate of a ph or iph object, defined as

$$h(x) = \frac{f(x)}{1 - F(x)}, \quad x > 0,$$

and is implemented precisely as such a ratio, using the methods dens and cdf.
Moments

The moment method calculates the fractional moments of a ph object, or of the underlying ph object of an iph object, as given by formula (2.3). The method is implemented for diagonalisable matrices $\mathbf{S}$, and uses the built-in R functions eigen and solve to compute the matrix function involved in (2.3).
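For a diagonalisable $\mathbf{S}$, formula (2.3) reduces to functional calculus on the eigenvalues. The following sketch (Python, illustrative; hyperexp_moment is a hypothetical name) treats the hyperexponential case, where $\mathbf{S}$ is already diagonal, and cross-checks the matrix formula against direct numerical integration:

```python
import math

def hyperexp_moment(alpha, rates, theta):
    """E[Y^theta] = Gamma(1 + theta) * alpha (-S)^(-theta) e, formula (2.3).

    For a hyperexponential, S = diag(-rates) is diagonal, so functional
    calculus gives (-S)^(-theta) = diag(rates ** -theta)."""
    return math.gamma(1.0 + theta) * sum(
        a * r ** (-theta) for a, r in zip(alpha, rates))

# Mixture of exponentials with rates 1 and 3, and a fractional order.
alpha, rates, theta = [0.4, 0.6], [1.0, 3.0], 0.5
m = hyperexp_moment(alpha, rates, theta)

# Cross-check against midpoint-rule integration of x^theta * f(x).
h, num = 0.001, 0.0
for k in range(1, 30001):
    x = (k - 0.5) * h
    f = sum(a * r * math.exp(-r * x) for a, r in zip(alpha, rates))
    num += x ** theta * f * h
print(abs(m - num) < 1e-3)  # True
```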
Random sampling

The sim method performs random sampling from a ph or iph object. For ph objects, an internal function rphasetype is called from C++, which simulates a path of a time-homogeneous Markov jump process and outputs the absorption time as the corresponding PH variate.

For iph objects, the C++ function riph uses the transformations of Table 2.1 to simulate a path of a time-inhomogeneous Markov jump process and output the absorption time as the corresponding IPH variate. A special implementation for the Matrix-GEV distribution is provided using the C++ function rmatrixgev, since this case is not obtainable through a straightforward time transformation for all $\xi$.
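The path-simulation idea behind sim can be sketched as follows (Python, illustrative; the package's rphasetype is implemented in C++): draw an initial state from $\boldsymbol{\alpha}$, accumulate exponential sojourn times, and jump according to the rates in $\mathbf{S}$ until absorption.

```python
import random

def rphasetype_once(alpha, S, rng):
    """Simulate one absorption time of the underlying Markov jump process."""
    p = len(S)
    # draw the initial state from alpha
    u, state = rng.random(), 0
    while u > alpha[state]:
        u -= alpha[state]
        state += 1
    t = 0.0
    while True:
        rate = -S[state][state]
        t += rng.expovariate(rate)       # exponential sojourn in `state`
        u = rng.random() * rate          # split the total rate into jump targets
        nxt_state = None
        for nxt in range(p):
            if nxt == state:
                continue
            if u < S[state][nxt]:
                nxt_state = nxt
                break
            u -= S[state][nxt]
        if nxt_state is None:            # remaining mass: exit to absorption
            return t
        state = nxt_state

# Generalised Erlang with two unit-rate phases: mean alpha (-S)^(-1) e = 2.
rng = random.Random(7)
S = [[-1.0, 1.0], [0.0, -1.0]]
sample = [rphasetype_once([1.0, 0.0], S, rng) for _ in range(20000)]
mean = sum(sample) / len(sample)
print(abs(mean - 2.0) < 0.1)  # True
```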
Sum, minimum and maximum

The "+", minimum and maximum methods admit two ph objects and construct another ph object according to formulae (2.4), (2.5) and (2.6), respectively. Their implementation is straightforward using the ph constructor function, the built-in R function kronecker for computing the Kronecker product of matrices, and the function kronecker_sum from the matrixdist package for computing the Kronecker sum of matrices.

For the special case when two iph objects have the same g function and parameter, the minimum and maximum are again IPH distributed with the same g function and parameter, and with underlying structures (2.5) and (2.6), respectively. Thus, the minimum and maximum methods also work for this particular case of IPH distributions.
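The Kronecker constructions are only a few lines. The sketch below (Python, illustrative; kron_sum mimics the role of the package's kronecker_sum) builds the representation (2.5) of a minimum and checks that its survival function factors into the two marginal survival functions, as independence requires.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matexp(A, terms=30):
    """Scaling-and-squaring Taylor evaluation of exp(A)."""
    n = len(A)
    norm = max(sum(abs(x) for x in row) for row in A)
    sq = 0
    while norm > 0.5:
        norm, sq = norm / 2.0, sq + 1
    B = [[x / 2.0 ** sq for x in row] for row in A]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in matmul(term, B)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(sq):
        result = matmul(result, result)
    return result

def kron(A, B):
    """Kronecker product."""
    return [[A[i][j] * B[k][l] for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]

def kron_sum(A, B):
    """A ⊕ B = A ⊗ I + I ⊗ B, as used for minima in (2.5)."""
    Ia = [[float(i == j) for j in range(len(A))] for i in range(len(A))]
    Ib = [[float(i == j) for j in range(len(B))] for i in range(len(B))]
    K1, K2 = kron(A, Ib), kron(Ia, B)
    return [[K1[i][j] + K2[i][j] for j in range(len(K1))] for i in range(len(K1))]

def survival(alpha, S, x):
    """P(Y > x) = alpha exp(S x) e."""
    p = len(S)
    E = matexp([[S[i][j] * x for j in range(p)] for i in range(p)])
    return sum(alpha[i] * E[i][j] for i in range(p) for j in range(p))

# Two independent PH laws; by (2.5) their minimum is PH with the Kronecker
# representation, so its survival must factor into the marginal survivals.
a1, S1 = [1.0], [[-2.0]]
a2, S2 = [0.5, 0.5], [[-1.0, 1.0], [0.0, -3.0]]
a_min = [u * v for u in a1 for v in a2]   # alpha1 ⊗ alpha2
S_min = kron_sum(S1, S2)
lhs = survival(a_min, S_min, 0.7)
rhs = survival(a1, S1, 0.7) * survival(a2, S2, 0.7)
print(abs(lhs - rhs) < 1e-10)  # True
```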
Estimation via the EM algorithm
The fit method estimates a ph or iph object based on given data and an initial ph or iph object, respectively. The data should be positive (except when considering the Matrix-GEV distribution, which is allowed to take negative values as well) and may be aggregated or right-censored. Algorithms 3.1 and 3.2 and their censored and aggregated data versions are implemented.

For ph objects, the C++ function EMstep_RK is called at each iteration to perform an E- and M-step. The C++ function default_step_length re-calculates the default step-size of the Runge-Kutta method at each iteration, unless it is user specified. The function logLikelihoodPH_RK, implemented in C++, calculates and prints the log-likelihood every pre-set number of steps, and re-uses some of the by-products of the EM step via the Runge-Kutta method to improve evaluation speed.

For iph objects, the C++ function EMstep_RK is called at each iteration to perform an E- and M-step, and then the optimisation in Step 2 of algorithm 3.2 is performed using the R built-in optimiser optim. The objective function in the latter optimisation depends on the g inhomogeneity function. For instance, logLikelihoodMlognormal_RK is a C++ function which is called to perform log-likelihood evaluations for the Matrix-Lognormal distribution using the by-products of the EM step via the Runge-Kutta method. As before, the step size is re-calculated at every iteration, and the log-likelihood is printed at every pre-set number of steps, this time using the output of the optim function.

The object-oriented interface of the matrixdist package is particularly convenient for estimation using the fit method, since one may re-start the EM algorithm to perform more iterations by simply using a fitted object as the initial object of a new fit to the same data. Furthermore, for large datasets, one may fit on a histogram to get a near-optimal fit, and then finish off the estimation using the entire dataset. All previous methods for ph and iph objects are again available for fitted objects, since they are once again of the same class.
Other methods

The show, logLik and coef methods admit ph objects and work in the same manner as the usual R methods with the same names.

5. Illustrations
In this section we provide two detailed illustrations of the estimation of PH and IPH distributions using the matrixdist package. These examples also provide some guidance on how to employ the fitting functionalities of the package. In what follows, the number of iterations was chosen in such a way that the changes in the successive log-likelihood values became negligible.

5.1. Truncated normal distribution.
Let $Z$ be a normally distributed random variable with mean $\mu$ and standard deviation $\sigma$. Then, we say that the conditional distribution of $Z$ given $a < Z < b$, $-\infty \leq a < b \leq \infty$, is a truncated normal distribution, with density function $h$ given by

$$h(x) = \frac{\phi((x - \mu)/\sigma)}{\sigma \left( \Phi((b - \mu)/\sigma) - \Phi((a - \mu)/\sigma) \right)}, \quad a < x < b,$$

where $\phi$ and $\Phi$ are the density and distribution functions of a standard normal distribution, respectively.

Presently, we consider a truncated normal distribution with support on the interval $(0, \infty)$ and $\mu = \sigma = 1$. Then, we employ algorithms 3.1 and 3.2 to fit PH and IPH models to this given theoretical distribution. To this end, we first discretise the theoretical model's density. While this can be done in several ways, such as using Simpson's rule, we proceeded with a simplified approximation which yielded the following data set

y <- seq(0.01, 5, by = 0.05)
weights <- 0.05 * dnorm(y, mean = 1) / pnorm(0, mean = 1, lower.tail = FALSE)

Then, we fitted a PH model with 10 phases and Coxian structure

set.seed(1)
x1 <- ph(structure = "coxian", dimension = 10)
x_fit1 <- fit(x1, y, weight = weights, stepsEM = 20000)
The run time was 131.28 seconds on a usual PC with a 2.4 GHz 8-core Intel Core i9 processor. As a second model, we considered a Matrix-Gompertz model with Coxian structure of dimension 2.

set.seed(1)
x2 <- iph(ph(structure = "coxian", dimension = 2), gfun = "gompertz")
x_fit2 <- fit(x2, y, weight = weights, stepsEM = 600)
Here the run time was 1.33 seconds. Figure 5.1 shows that the Matrix-Gompertz model provides a better fit with substantially fewer phases. Note that the AIC displayed is for comparison purposes only. In general, one should proceed with great caution when employing information criteria (especially in high dimensions) due to the well-known identifiability issues for PH and IPH distributions.

5.2. Loss insurance data.
The data consist of 1500 insurance claims from a real-life insurance company. Each claim consists of an indemnity payment (loss) and an allocated loss adjustment expense (ALAE). This data set is available in the copula package and has been studied previously in the literature (e.g. [8]). We presently consider only the loss component, for which 34 observations are right-censored, and proceed to fit Pareto and matrix-Pareto models. As a first step, we split the data into censored and non-censored observations

data(loss, package = "copula")
y <- loss$loss[loss$censored == 0]
rcens <- loss$loss[loss$censored == 1]

Subsequently, we fitted the Pareto model, which is a special case of the matrix-Pareto with p = 1; thus we may employ Algorithm 3.2 for its estimation as well.

set.seed(1)
x1 <- iph(ph(structure = "general", dimension = 1), gfun = "pareto")
x_fit1 <- fit(x1, y, rcen = rcens, stepsEM = 1000)

The running time was 5.34 seconds, with a log-likelihood of −….36. Moreover, the tail index, which determines the heaviness of the tail, is 0.88, very close to the value of 0.89 from the Pareto model in [8]. Next, we fitted a matrix-Pareto distribution with Coxian structure of dimension 4

set.seed(1)
x2 <- iph(ph(structure = "coxian", dimension = 4), gfun = "pareto")
x_fit2 <- fit(x2, y, rcen = rcens, stepsEM = 2000)
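The fitted models can also be assessed graphically against a non-parametric estimate of the cumulative hazard. The following is a hedged sketch, not the paper's code: it assumes matrixdist's `cdf` method for fitted objects, the `x_fit1` and `x_fit2` objects from above, and the survival package (whose survfit object exposes the Nelson–Aalen-based cumulative hazard as `cumhaz`).

```r
# Illustrative sketch (assumes x_fit1, x_fit2 from above, matrixdist's
# cdf() method, and the survival package): compare fitted cumulative
# hazards H(t) = -log(1 - F(t)) with the Nelson-Aalen estimate.
library(survival)

na_fit <- survfit(Surv(loss$loss, 1 - loss$censored) ~ 1)  # event = 1 means observed
t_grid <- seq(1, max(loss$loss), length.out = 200)

plot(na_fit$time, na_fit$cumhaz, type = "s",
     xlab = "loss", ylab = "cumulative hazard")
lines(t_grid, -log(1 - cdf(x_fit1, t_grid)), lty = 2)  # Pareto fit
lines(t_grid, -log(1 - cdf(x_fit2, t_grid)), lty = 3)  # matrix-Pareto fit
```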
In this case, the running time was 13.66 seconds. Note that the log-likelihood here is −…; the cumulative hazard of the matrix-Pareto model is closer to the non-parametric Nelson–Aalen estimator, the latter computed using the survival package. Note also that the tail index in the matrix-Pareto fit is 0.67, suggesting that a model with lighter tail behaviour is more adequate. This can also be seen directly from the cumulative hazard plot. The tail indices were computed as follows (see [4] for the mathematics behind these formulae and, in general, for the asymptotic behaviour of IPH distributions); in brief, the extreme value index of a matrix-Pareto distribution is ξ = −1/λ, where λ is the eigenvalue of the sub-intensity matrix S with largest real part.

xi1 <- 1 / -max(eigen(coef(x_fit1)$S)$values)
xi2 <- 1 / -max(eigen(coef(x_fit2)$S)$values)

Figure 5.1. Left panel: two approximations of a truncated normal distribution; right panel: fitted cumulative hazards for the loss data compared to the Nelson–Aalen implied hazard.

6.
Summary
We surveyed the basic properties of phase-type and inhomogeneous phase-type distributions and introduced the matrixdist package, which provides tools for their statistical analysis. The package includes methods for computing various functionals, creating more complicated objects from simpler building blocks, simulation, and estimation via the EM algorithm. Particular emphasis was given to the estimation routines, where we reviewed the most useful existing algorithms and touched upon some computational aspects.

The package's major novelty is the implementation of all methods concerning IPH distributions. However, the ability to evaluate functionals (including some new techniques) and to manipulate conventional PH distributions comprehensively in an object-oriented manner fills a gap in its own right.
References

[1] Hansjörg Albrecher and Mogens Bladt. Inhomogeneous phase-type distributions and heavy tails. Journal of Applied Probability, 56(4):1044–1064, 2019.
[2] Hansjörg Albrecher, Mogens Bladt, and Jorge Yslas. Fitting inhomogeneous phase-type distributions to data: The univariate and the multivariate case. Scandinavian Journal of Statistics, 2020.
[3] Søren Asmussen, Olle Nerman, and Marita Olsson. Fitting phase-type distributions via the EM algorithm. Scandinavian Journal of Statistics, 23(4):419–441, 1996.
[4] Martin Bladt and Jorge Yslas. Inhomogeneous survival regression models. arXiv preprint arXiv:2011.03219, 2020.
[5] Martin Bladt and Jorge Yslas. matrixdist: Statistics for Matrix Distributions, 2021. R package version 1.0.
[6] Mogens Bladt, Antonio Gonzalez, and Steffen L. Lauritzen. The estimation of phase-type related functionals using Markov chain Monte Carlo methods. Scandinavian Actuarial Journal, 2003(4):280–300, 2003.
[7] Mogens Bladt and Bo Friis Nielsen. Matrix-Exponential Distributions in Applied Probability, volume 81. Springer, 2017.
[8] Edward W. Frees and Emiliano A. Valdez. Understanding relationships using copulas. North American Actuarial Journal, 2(1):1–25, 1998.
[9] Cleve Moler and Charles Van Loan. Nineteen dubious ways to compute the exponential of a matrix. SIAM Review, 20(4):801–836, 1978.
[10] Hiroyuki Okamura, Tadashi Dohi, and Kishor S. Trivedi. A refined EM algorithm for PH distributions. Performance Evaluation, 68(10):938–954, 2011.
[11] Marita Olsson. Estimation of phase-type distributions from censored data. Scandinavian Journal of Statistics, 23(4):443–460, 1996.
[12] Marita Olsson. The EMpht programme. Manual, Chalmers University of Technology and Göteborg University, 1998.
[13] R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2016. ISBN 3-900051-07-0.
Faculty of Business and Economics, University of Lausanne, 1015 Lausanne, Switzerland
Email address: [email protected]

Munich, Germany
Email address: