Fitting inhomogeneous phase-type distributions to data: the univariate and the multivariate case
HANSJÖRG ALBRECHER, MOGENS BLADT, AND JORGE YSLAS
Abstract.
The class of inhomogeneous phase-type distributions was recently introduced in [6] as a dense extension of classical phase-type distributions that leads to more parsimonious models in the presence of heavy tails. In this paper we propose a fitting procedure for this class to given data. We furthermore consider an analogous extension of Kulkarni's multivariate phase-type class [27] to the inhomogeneous framework and study parameter estimation for the resulting new and flexible class of multivariate distributions. As a by-product, we amend a previously suggested fitting procedure for the homogeneous multivariate phase-type case and provide appropriate adaptations for censored data. The performance of the algorithms is illustrated in several numerical examples, both for simulated and real-life insurance data.

Mathematics Subject Classification. Primary: 60E05; Secondary: 60J22, 62F10, 62N01, 62P05.
Key words and phrases. Heavy tails; inhomogeneous phase-type; matrix Pareto distribution; matrix Weibull distribution; multivariate phase-type; parameter estimation.
The research of Jorge Yslas is supported by Danmarks Frie Forskningsfond Grant No 9040-00086B.

1. Introduction

The development, study and fitting of flexible distributions for random phenomena is an important branch of applied probability and statistics. Some respective approaches are based on a nice blend of theory and practice, among which the class of phase–type (PH) distributions is a prominent example. Originally initiated by Neuts [32], the realization of a (univariate) phase–type distributed random variable is interpreted as the time until absorption of a time–homogeneous, finite state–space Markov jump process with one absorbing state and the rest being transient. The explicit description through matrix exponentials makes the resulting class of distributions at the same time versatile and analytically tractable (see e.g. [12] for a recent survey). The class of phase–type distributions is known to be dense (in the sense of weak convergence) among all distributions on the positive halfline, but for distributions whose shape is very different from combinations of exponential components (which are the building blocks of the probabilistic Markov jump process construction), a suitable phase–type approximation will need a large dimension of the involved matrix (representing the number of phases of the underlying Markov process) and – in addition to computational challenges – may then be seen as unnatural. This is particularly the case for heavy–tailed distributions, where the focus in modelling often lies on the tail of the distribution, and the latter is not well captured by the combination of exponential components of the PH construction. After a first amendment involving infinite–dimensional matrices was suggested in Bladt et al. [14, 15], recently a new way to circumvent this problem was proposed in [6]. Concretely, when the Markov jump process is allowed to be time–inhomogeneous, one gains a lot of flexibility in terms of the structure of the individual components entering the matrix framework, which can reduce the complexity of appropriate fitting distributions drastically, in particular for distributions with heavy tails. The intensity matrices of the Markov jump process are then a function of time. In the general case, they may not commute at different time epochs, which complicates their statistical estimation due to a lack of appropriate sufficient statistics. However, there is an important sub–class for which the intensity matrices can be written as a constant matrix scaled by some real function. In this class all matrices commute, and it was shown in [6] that along this way one in fact obtains, for instance, Pareto, Weibull and Generalized Extreme Value (GEV) distributions with matrix-valued parameters. These distribution classes are all dense in the class of distributions on the positive halfline and inherit the computational advantages of the PH-type class, but also provide excellent fits for heavy–tailed data already for small dimensions, something that the original PH class could not achieve. In particular, if by some preliminary exploratory analysis one has a good guess for an appropriate scaling function (typically suggested by the empirical tail behavior), the resulting matrix distributions can be very parsimonious yet effective model improvements of the respective base distributions with a genuinely heavy tail. However, while parameter estimation for univariate PH distributions by a standard maximum likelihood procedure based on an EM algorithm has been studied in the seminal paper of Asmussen et al. [9] (see also the later extension of Olsson [34] dealing with censored observations and Bladt et al. [11] for an MCMC approach), parameter estimation for the time–inhomogeneous case has not yet been addressed.

Motivated by the flexibility of the approach, in this paper we will also consider an inhomogeneous extension of the multivariate version of the PH distribution. The multivariate phase–type distribution (of MPH$^*$ type) was originally introduced by Kulkarni [27] and is constructed as the joint distribution of certain state-dependent accumulated rewards earned on the same underlying Markov jump process. It has PH-distributed marginals and also enjoys a denseness property in the class of all distributions on the respective positive orthant. Multivariate phase–type distributions have found applications in diverse areas. For instance, Cai et al. [17] consider them for determining conditional tail expectations in risk management, Cai et al. [18] studied several types of ruin probabilities for a multivariate compound Poisson risk model when the claim size vector follows an MPH$^*$ distribution, and Herbertsson [25] used this class to model default contagion in credit risk. More recently, Bladt et al. [13] applied MPH$^*$ distributions for the calculation of Parisian type ruin probabilities. In terms of fitting of the (time–homogeneous) multivariate MPH$^*$ distribution, Ahlström et al. [1] introduced an algorithm for a bivariate subclass of MPH$^*$, and an EM algorithm for parameter estimation in the general case was proposed in [16]. However, the latter was not actually implemented and contains an inconsistency in the maximum likelihood estimator (which we amend in this paper).

The inhomogeneous extension of the MPH$^*$ class to be proposed in this paper will then again serve the purpose of keeping the dimension of the involved matrices low when one faces non-exponential behavior in the marginals and the joint multivariate behaviour. We would like to point out that an alternative analytically tractable deviation from exponential behavior utilizing Mittag-Leffler distributions in both the univariate and multivariate case can be found in [3, 4, 5]. A number of commonly used heavy-tailed multivariate distributions are in fact transformed multivariate exponential distributions.
For instance, Mardia [29] was the first to systematically study multivariate Pareto distributions, which he introduced by transforming a Wicksell–Kibble–type multivariate exponential distribution (see [26]). He also noticed that estimation methods for the multivariate exponential can then be translated directly towards the estimation of the multivariate Pareto distribution. Arnold [7] presents some approaches to extend Mardia's analysis to obtain other multivariate distributions with Pareto marginals. Likewise, multivariate versions of the Weibull distribution have been obtained as power transforms of multivariate exponential distributions, see e.g. Lee [28]. The inhomogeneous MPH$^*$ extension that we propose in this paper can to some extent be seen as a generalization and unification of these models.

The main purpose of this paper is to provide algorithms for the statistical fitting of all these flexible classes of distributions and to illustrate and discuss their implementation. We will present a unified maximum–likelihood based approach to fitting phase–type distributions (PH), inhomogeneous phase–type distributions (IPH), multivariate phase–type distributions (of MPH$^*$ type) and its newly introduced inhomogeneous extension. These classes contain a large number of mathematically tractable distributions that are sufficiently general to fit any non–negative data set, in the body and for both light and heavy tails. We will also consider extensions of the procedures to adapt for censored data and to the fitting of theoretically known joint distributions.

The structure of the paper is as follows. In Section 2 we provide an overview of the class of IPH distributions and present a new fitting procedure, which we then exemplify on two particular cases, one on a simulated data set and the other on actual data for lifetimes of the Danish population. In Section 3 we briefly recollect some facts about the MPH$^*$ class, review existing methods for parameter estimation and provide a substantiation and correction of an algorithm that was previously proposed in the literature. We then extend the algorithm to the case of censored observations, and give more details on an important particular bivariate subclass with explicit density. The section finishes with illustrations of the algorithms for a simulated bivariate sample as well as a phase–type approximation to a known bivariate exponential distribution. In Section 4 we introduce some multivariate extensions of distributions in the IPH class, derive basic properties, provide an EM algorithm for their parameter estimation and again illustrate its use in several examples, including multivariate matrix–Pareto models, multivariate matrix–Weibull models as well as a real data application to a bivariate Danish fire insurance data set. Section 5 concludes.
2. Inhomogeneous phase–type distributions
2.1. Preliminaries.
Let $\{J_t\}_{t\ge 0}$ denote a time–inhomogeneous Markov jump process on a state space $\{1,\dots,p,p+1\}$, where states $1,\dots,p$ are transient and state $p+1$ is absorbing. Then $\{J_t\}_{t\ge 0}$ has an intensity matrix of the form
$$\Lambda(t) = \begin{pmatrix} T(t) & \boldsymbol{t}(t) \\ \boldsymbol{0} & 0 \end{pmatrix}, \quad t \ge 0,$$
where $T(t)$ is a $p \times p$ matrix and $\boldsymbol{t}(t)$ is a $p$–dimensional column vector. Here, for any time $t \ge 0$, $\boldsymbol{t}(t) = -T(t)\boldsymbol{e}$, where $\boldsymbol{e}$ is the $p$–dimensional column vector of ones. Let $\pi_k = P(J_0 = k)$, $k = 1,\dots,p$, $\boldsymbol{\pi} = (\pi_1,\dots,\pi_p)$ and assume that $P(J_0 = p+1) = 0$. Then we say that the time until absorption
$$\tau = \inf\{t \ge 0 \mid J_t = p+1\}$$
has an inhomogeneous phase–type distribution with representation $(\boldsymbol{\pi}, T(t))$ and we write $\tau \sim \mathrm{IPH}(\boldsymbol{\pi}, T(t))$. If $T(t) = \lambda(t)\, T$, where $\lambda(t)$ is some known non–negative real function and $T$ is a sub–intensity matrix, then we write $\tau \sim \mathrm{IPH}(\boldsymbol{\pi}, T, \lambda)$. If $X \sim \mathrm{IPH}(\boldsymbol{\pi}, T, \lambda)$, then there exists a function $g$ such that
$$X \sim g(Y), \tag{2.1}$$
where $Y \sim \mathrm{PH}(\boldsymbol{\pi}, T)$. Specifically, $g$ is defined by
$$g^{-1}(x) = \int_0^x \lambda(t)\,dt \quad\text{or, equivalently,}\quad \lambda(t) = \frac{d}{dt}\, g^{-1}(t).$$
The density $f_X$ and distribution function $F_X$ for $X \sim \mathrm{IPH}(\boldsymbol{\pi}, T, \lambda)$ are given by
$$f_X(x) = \lambda(x)\, \boldsymbol{\pi} \exp\!\left(\int_0^x \lambda(t)\,dt \; T\right)\boldsymbol{t}, \qquad F_X(x) = 1 - \boldsymbol{\pi} \exp\!\left(\int_0^x \lambda(t)\,dt \; T\right)\boldsymbol{e}.$$
For further reading on inhomogeneous phase–type distributions and motivations for their use in modelling we refer to Albrecher & Bladt [6]. Note that for $\lambda(t) \equiv 1$ we recover the classical (time–homogeneous) phase–type distribution PH$(\boldsymbol{\pi}, T)$ (a comprehensive account of phase–type distributions can be found in Bladt & Nielsen [12]).
As illustrated in [6], a number of IPH distributions can be expressed as classical distributions with matrix-valued parameter. For the representation of such distributions we make use of functional calculus. If $h$ is an analytic function and $A$ is a matrix, define
$$h(A) = \frac{1}{2\pi i} \oint_\gamma h(z)\,(zI - A)^{-1}\,dz,$$
where $\gamma$ is a simple path enclosing the eigenvalues of $A$ (cf. [12, Sec. 3.4] for details). Important examples include the transformation $g(y) = \beta(e^y - 1)$
for $\beta > 0$, which yields the matrix–Pareto distribution with density and survival function
$$f_X(x) = \boldsymbol{\pi}\left(\frac{x}{\beta} + 1\right)^{T - I} \frac{\boldsymbol{t}}{\beta}, \qquad \bar F_X(x) = 1 - F_X(x) = \boldsymbol{\pi}\left(\frac{x}{\beta} + 1\right)^{T} \boldsymbol{e}, \tag{2.2}$$
respectively, as well as the matrix–Weibull distribution with density and survival function
$$f_X(x) = \boldsymbol{\pi}\, e^{T x^{\beta}}\, \boldsymbol{t}\, \beta x^{\beta - 1}, \qquad \bar F_X(x) = \boldsymbol{\pi}\, e^{T x^{\beta}}\, \boldsymbol{e},$$
obtained from $g(y) = y^{1/\beta}$ ($\beta > 0$).
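As a concrete illustration of how such matrix-valued expressions can be evaluated numerically, the following sketch (an illustration under stated assumptions, not code from the paper) computes the matrix–Pareto density and survival function (2.2), using that the matrix power $(x/\beta + 1)^{T-I}$ equals $\exp\{(T - I)\log(x/\beta + 1)\}$ by functional calculus. The parameter values in the example are arbitrary.

```python
# A minimal sketch (not the authors' implementation) of evaluating the
# matrix-Pareto density and survival function in (2.2).
import numpy as np
from scipy.linalg import expm

def matrix_pareto(x, pi, T, beta):
    """Return (density, survival) of a matrix-Pareto distribution at x."""
    p = T.shape[0]
    e = np.ones(p)
    t = -T @ e                                # exit rate vector
    lx = np.log(x / beta + 1.0)
    f = pi @ expm((T - np.eye(p)) * lx) @ t / beta   # density, (x/beta+1)^(T-I) t / beta
    S = pi @ expm(T * lx) @ e                        # survival, (x/beta+1)^T e
    return f, S

# Example with an illustrative 2-phase representation
pi = np.array([0.5, 0.5])
T = np.array([[-2.0, 1.0],
              [0.0, -1.0]])
print(matrix_pareto(1.5, pi, T, beta=1.0))
```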
2.2. Parameter estimation.
For the matrix–Pareto distribution (2.2) and $\beta = 1$, the transform is parameter-independent, so that the distribution can be fitted to i.i.d. data $x_1,\dots,x_N$ by fitting a phase–type distribution PH$(\boldsymbol{\pi}, T)$ to the transformed data $\log(1+x_1),\dots,\log(1+x_N)$ using an EM algorithm [9]. This was the procedure employed in [6] for the numerical illustration there. The general case – where the transform does depend on parameters – is more subtle and shall be dealt with here. The key will be to apply a parameter-dependent transformation in each step of the EM algorithm.
Let $x_1,\dots,x_N$ be an i.i.d. sample of an inhomogeneous phase–type distribution with representation $X \sim \mathrm{IPH}(\boldsymbol{\pi}, T, \lambda(\cdot;\boldsymbol{\beta}))$, where $\lambda(\cdot;\boldsymbol{\beta})$ is a parametric non–negative function depending on the vector $\boldsymbol{\beta}$. We then know that $X \stackrel{d}{=} g(Y;\boldsymbol{\beta})$ with $Y \sim \mathrm{PH}(\boldsymbol{\pi}, T)$, where $g$ is defined in terms of its inverse function $g^{-1}(x;\boldsymbol{\beta}) = \int_0^x \lambda(t;\boldsymbol{\beta})\,dt$. In particular $g^{-1}(X;\boldsymbol{\beta}) \stackrel{d}{=} Y \sim \mathrm{PH}(\boldsymbol{\pi}, T)$. The EM algorithm for fitting IPH$(\boldsymbol{\pi}, T, \lambda(\cdot;\boldsymbol{\beta}))$ then works as follows.

Algorithm 2.2.1 (EM algorithm for transformed phase–type distributions).
0. Initialize with some “arbitrary” $(\boldsymbol{\pi}, T, \boldsymbol{\beta})$.
1. Transform the data into $y_i = g^{-1}(x_i;\boldsymbol{\beta})$, $i = 1,\dots,N$, and apply the E– and M–steps of the conventional EM algorithm of Asmussen et al. [9], by which we obtain the estimators $(\hat{\boldsymbol{\pi}}, \hat T)$.
2. Compute
$$\hat{\boldsymbol{\beta}} = \arg\max_{\boldsymbol{\beta}} \sum_{i=1}^N \log\big(f_X(x_i; \hat{\boldsymbol{\pi}}, \hat T, \boldsymbol{\beta})\big) = \arg\max_{\boldsymbol{\beta}} \sum_{i=1}^N \log\left( \lambda(x_i;\boldsymbol{\beta})\, \hat{\boldsymbol{\pi}} \exp\!\left(\int_0^{x_i} \lambda(t;\boldsymbol{\beta})\,dt \; \hat T\right) \hat{\boldsymbol{t}} \right).$$
3. Assign $(\boldsymbol{\pi}, T, \boldsymbol{\beta}) = (\hat{\boldsymbol{\pi}}, \hat T, \hat{\boldsymbol{\beta}})$ and GOTO 1.

Then the likelihood function increases in each iteration, and hence converges to a (possibly local) maximum.
Proof.
Since the data points $x_i$ are assumed to be i.i.d. realisations from the unknown distribution IPH$(\boldsymbol{\pi}, T, \lambda)$, there exists a function $g$ such that $y_i = g^{-1}(x_i;\boldsymbol{\beta})$ are i.i.d. realisations of phase–type distributed random variables PH$(\boldsymbol{\pi}, T)$. That function $g$ is assumed to be known up to the value of $\boldsymbol{\beta}$. In turn, $x_i = g(y_i;\boldsymbol{\beta})$, so a data point $x_i$ can be interpreted as the absorption time of the Markov jump process corresponding to PH$(\boldsymbol{\pi}, T)$, which is $y_i$, but with the scale of the time axis for the $y_i$–data converted (stretched) into $g(\cdot;\boldsymbol{\beta})$–coordinates instead. The full data likelihood is then given by
$$L(\boldsymbol{\pi}, T, \boldsymbol{\beta}; \boldsymbol{y}) = \prod_{k=1}^p \pi_k^{B_k}\, \prod_{k=1}^p \prod_{l\neq k} t_{kl}^{N_{kl}}\, e^{-t_{kl} Z_k(\boldsymbol{\beta})}\, \prod_{k=1}^p t_k^{N_k}\, e^{-t_k Z_k(\boldsymbol{\beta})},$$
where $B_k$ is the number of times the Markov process underlying the phase–type distribution initiates in state $k$, $N_{kl}$ denotes the total number of transitions from state $k$ to $l$, $N_k$ denotes the number of times an exit to the absorbing state was caused by a jump from state $k$, and $Z_k(\boldsymbol{\beta})$ is the total time the Markov process has spent in state $k$. We notice that $Z_k(\boldsymbol{\beta})$ is the only sufficient statistic which depends on the transformation of the time axis for the $y$–data and hence on $\boldsymbol{\beta}$. Consequently, for any given $\boldsymbol{\beta}$, the E–step is simply the one as in [9], and so is the M–step for $(\boldsymbol{\pi}, T)$.
The $\boldsymbol{\beta}$ update in 2. requires a general, usually numerical, maximization of the incomplete data likelihood. Each iteration of the algorithm increases the likelihood. Indeed, let $L^I$ denote the incomplete data likelihood and consider parameter values $(\boldsymbol{\pi}_n, T_n, \boldsymbol{\beta}_n)$ after the $n$-th iteration. In the $(n+1)$-th iteration, we first obtain $(\boldsymbol{\pi}_{n+1}, T_{n+1})$ in 1. so that
$$L^I(\boldsymbol{\pi}_n, T_n, \boldsymbol{\beta}_n;\, g^{-1}(\boldsymbol{y};\boldsymbol{\beta}_n)) \le L^I(\boldsymbol{\pi}_{n+1}, T_{n+1}, \boldsymbol{\beta}_n;\, g^{-1}(\boldsymbol{y};\boldsymbol{\beta}_n)).$$
By monotonicity of $g$ and the transformation theorem, $L^I(\boldsymbol{\pi}_n, T_n, \boldsymbol{\beta}_n; \boldsymbol{y}) \le L^I(\boldsymbol{\pi}_{n+1}, T_{n+1}, \boldsymbol{\beta}_n; \boldsymbol{y})$ and hence
$$L^I(\boldsymbol{\pi}_n, T_n, \boldsymbol{\beta}_n; \boldsymbol{y}) \le L^I(\boldsymbol{\pi}_{n+1}, T_{n+1}, \boldsymbol{\beta}_n; \boldsymbol{y}) \le \sup_{\boldsymbol{\beta}} L^I(\boldsymbol{\pi}_{n+1}, T_{n+1}, \boldsymbol{\beta}; \boldsymbol{y}) = L^I(\boldsymbol{\pi}_{n+1}, T_{n+1}, \boldsymbol{\beta}_{n+1}; \boldsymbol{y}). \qquad \Box$$

Example 2.2.1. (Matrix–Gompertz) Let $X = \log(\beta Y + 1)/\beta$, where $Y \sim \mathrm{PH}(\boldsymbol{\pi}, T)$ and $\beta > 0$. Then
$$\bar F_X(x) = \boldsymbol{\pi}\, e^{T(e^{\beta x} - 1)/\beta}\, \boldsymbol{e} \qquad\text{and}\qquad f_X(x) = \boldsymbol{\pi}\, e^{T(e^{\beta x} - 1)/\beta}\, \boldsymbol{t}\, e^{\beta x}. \tag{2.3}$$
We refer to the distribution of $X$ as a matrix–Gompertz distribution, since the scale parameter of the usual Gompertz distribution is now replaced by a matrix. Note that the resulting distribution has a lighter tail than a conventional phase–type distribution. The Gompertz distribution is used in a number of applications; most notably it has historically been used for the modelling of human lifetimes [21]. Its matrix version (2.3) provides a natural flexible extension. As an illustration, we fitted a matrix–Gompertz distribution with 3 phases using Algorithm 2.2.1 with 2 500 iterations to the lifetimes of the Danish population that died in the year 2 000 at ages 50 to 100 (data obtained from the Human Mortality Database (HMD) and available in the R-package MortalitySmooth [19]). Here and in later examples, the number of iterations in the algorithm is chosen in such a way that the changes in the successive log–likelihoods become negligible.
Concerning running times, our implementation makes use of the gradient ascent method for the maximization part of the algorithm, in which the running times highly depend on the step–length and the actually chosen stopping criterion. In the present example we employed a step–length of $10^{-\ldots}$ and ran gradient ascent until the absolute value of the derivative is less than 0.001, leading to a running time of about 35 seconds on a usual PC. An improvement on running times can be attained by using a different maximization procedure. The obtained parameter estimates are the initial vector $\hat{\boldsymbol{\pi}}$, the $3 \times 3$ sub–intensity matrix $\hat T$ and $\hat\beta = 0.\ldots$ Figure 2.1 shows that the fitted density recovers the structure of the data quite well. Note that conventional phase–type distributions have been used to model the distribution of lifespans (see for instance Asmussen et al. [8]). However, the number of phases required to capture the tail behavior of the data with the latter is rather large, due to the lighter than exponential tail. In contrast, the matrix–Gompertz distribution provides an excellent fit with comparably few parameters (phases).
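To make the structure of Algorithm 2.2.1 concrete, the following sketch (a simplified illustration under stated assumptions, not the authors' implementation) shows one outer iteration for the matrix–Gompertz case, where $g^{-1}(x;\beta) = (e^{\beta x} - 1)/\beta$. The function `ph_em_step` stands for one pass of the E– and M–steps of the standard PH EM algorithm of [9] and is assumed to be available; moreover, the $\beta$ update here uses a generic bounded optimizer rather than the gradient ascent described above.

```python
# A minimal sketch of one iteration of Algorithm 2.2.1 for the matrix-Gompertz
# transform; `ph_em_step` (the standard univariate PH EM step) is assumed given.
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize_scalar

def iph_log_density(x, pi, T, beta):
    # matrix-Gompertz log-density: log( pi exp(T (e^{beta x}-1)/beta) t ) + beta x
    t = -T @ np.ones(T.shape[0])
    return np.log(pi @ expm(T * np.expm1(beta * x) / beta) @ t) + beta * x

def em_iteration(x, pi, T, beta, ph_em_step):
    y = np.expm1(beta * x) / beta                 # step 1: transform data, g^{-1}(x; beta)
    pi, T = ph_em_step(y, pi, T)                  # E- and M-step of the conventional PH EM
    obj = lambda b: -sum(iph_log_density(xi, pi, T, b) for xi in x)
    beta = minimize_scalar(obj, bounds=(1e-6, 10.0), method="bounded").x   # step 2
    return pi, T, beta                            # step 3: parameters for the next iteration
```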
Figure 2.1.
Histogram of lifetimes of the Danish population that died in the year 2 000 at ages 50 to 100 versus the density of the fitted matrix–Gompertz distribution.
Example 2.2.2. (Matrix–GEV) Algorithm 2.2.1 can also be applied to estimate distributions that are not IPH in a strict sense, but that are defined as a transformation of a PH distribution. This is for instance the case for $g(y) = \mu - \sigma(y^{-\xi} - 1)/\xi$ with $\mu \in \mathbb{R}$, $\sigma > 0$, and $\xi \in \mathbb{R}$. Recall from [6] that
$$F_X(x) = \begin{cases} \boldsymbol{\pi} \exp\!\left( T \left( 1 + \xi \frac{x-\mu}{\sigma} \right)^{-1/\xi} \right) \boldsymbol{e}, & \xi \neq 0, \\[1ex] \boldsymbol{\pi} \exp\!\left( T \exp\!\left( -\frac{x-\mu}{\sigma} \right) \right) \boldsymbol{e}, & \xi = 0, \end{cases}$$
$$f_X(x) = \begin{cases} \frac{1}{\sigma}\, \boldsymbol{\pi} \exp\!\left( T \left( 1 + \xi \frac{x-\mu}{\sigma} \right)^{-1/\xi} \right) \boldsymbol{t} \left( 1 + \xi \frac{x-\mu}{\sigma} \right)^{-(1+\xi)/\xi}, & \xi \neq 0, \\[1ex] \frac{1}{\sigma}\, \boldsymbol{\pi} \exp\!\left( T \exp\!\left( -\frac{x-\mu}{\sigma} \right) \right) \boldsymbol{t} \exp\!\left( -\frac{x-\mu}{\sigma} \right), & \xi = 0, \end{cases}$$
from which it becomes clear that this distribution can be interpreted as a matrix version of the generalized extreme value (GEV) distribution, see e.g. [10]. As an illustration, we generated an i.i.d. sample of size 5 000 from such a distribution with 3 phases and parameters $\boldsymbol{\pi} = (1, \ldots)$, a sub–intensity matrix $T$, $\mu = 2$, $\sigma = 0.\ldots$ and $\xi = 0.\ldots$, which has theoretical moments $E(X) = 2.\ldots$ and $SD(X) = 1.\ldots$ Fitting with Algorithm 2.2.1 yields estimates $\hat{\boldsymbol{\pi}}$, $\hat T$, $\hat\mu = 1.\ldots$, $\hat\sigma = 0.\ldots$ and $\hat\xi = 0.\ldots$ We observe that the algorithm estimates the shape parameter $\xi$, which determines the heaviness of the tail, pretty well. Moreover, the fitted distribution has moments $E(X) = 2.\ldots$ and $SD(X) = 1.\ldots$ The estimates of $\boldsymbol{\pi}$, $T$, $\mu$ and $\sigma$ do not resemble the original parameter values, but this is linked to the well-known identifiability issue for phase–type distributions (namely that other parameter combinations may lead to a very similar density shape). In fact, the algorithm finds the parameters that maximize the likelihood for the given sample, and as the concrete numbers above indicate, the present parameters even outperform the original model underlying the sample(!), see also the convincing QQ–plot in Figure 2.2. Here, the step–length is $10^{-\ldots}$ and the gradient ascent is run until the norm of the derivative is less than $0.\ldots$
Figure 2.2.
Histogram of simulated sample versus density of the fitted matrix–GEV distribution in log–scale (left) as well as QQ–plot of simulated sample versus fit (right).
3. Multivariate phase-type distributions
3.1. Preliminaries.
Let $\tau \sim \mathrm{PH}(\boldsymbol{\pi}, T)$ be a (conventional) $p$–dimensional phase–type distributed random variable with underlying time–homogeneous Markov jump process $\{J_t\}_{t\ge 0}$. Let $\boldsymbol{r}_j = (r_j(1),\dots,r_j(p))'$ be non–negative $p$–dimensional column vectors, $j = 1,\dots,d$, and let $R = (\boldsymbol{r}_1, \boldsymbol{r}_2,\dots,\boldsymbol{r}_d)$ be a $p \times d$–dimensional reward matrix. Now define
$$Y^{(j)} = \int_0^{\tau} r_j(J_t)\,dt$$
for all $j = 1,\dots,d$. If we interpret $r_j(k)$ as the rate at which a reward is obtained while $J_t$ is in state $k$, then $Y^{(j)}$ is the total reward for component $j$ obtained prior to absorption. We then say that the random vector
$\boldsymbol{Y} = (Y^{(1)},\dots,Y^{(d)})'$ has a multivariate phase–type distribution of the MPH$^*$ type (as defined in [27], see also [12]) and we write $\boldsymbol{Y} \sim \mathrm{MPH}^*(\boldsymbol{\pi}, T, R)$.
While each member of the MPH$^*$ class has an explicit expression for the (joint) Laplace transform and the joint moments of any order (see Section 8.1.1 of [12]), there are no general explicit expressions for the density and distribution functions. However, for certain structures and sub–classes explicit expressions for the latter do exist (like Example 8.1.13 of [12]).
If $\boldsymbol{Y} = (Y^{(1)},\dots,Y^{(d)})' \sim \mathrm{MPH}^*(\boldsymbol{\pi}, T, R)$, then each marginal $Y^{(j)}$ has a phase–type distribution, PH$(\boldsymbol{\pi}_j, T_j)$ say. First we decompose
$$\boldsymbol{r}_j = \begin{pmatrix} \boldsymbol{r}_j^+ \\ \boldsymbol{r}_j^0 \end{pmatrix}, \qquad \boldsymbol{\pi} = (\boldsymbol{\pi}^+ \;\; \boldsymbol{\pi}^0) \qquad\text{and}\qquad T = \begin{pmatrix} T^{++} & T^{+0} \\ T^{0+} & T^{00} \end{pmatrix},$$
where we have reordered the state space such that the $+$ terms correspond to the states $k$ for which the rewards $r_j(k)$ are strictly positive, and the $0$ terms to the states with zero rewards. E.g., $T^{+0}$ corresponds to the intensities by which the underlying Markov jump process $\{J_t\}_{t\ge 0}$ jumps from a state with positive reward to a state with zero reward. Then the phase–type distribution of $Y^{(j)}$ is given by an atom at zero of size $\boldsymbol{\pi}^0\big(I - (-T^{00})^{-1} T^{0+}\big)\boldsymbol{e}$, where $\boldsymbol{e}$ is the column vector of ones of appropriate dimension, and
$$\boldsymbol{\pi}_j = \boldsymbol{\pi}^+ + \boldsymbol{\pi}^0 (-T^{00})^{-1} T^{0+} \qquad\text{together with}\qquad T_j = \Delta(\boldsymbol{r}_j^+)^{-1}\big( T^{++} + T^{+0} (-T^{00})^{-1} T^{0+} \big), \tag{3.1}$$
where $\Delta(\boldsymbol{a})$ denotes the $d' \times d'$ diagonal matrix with entries $a(m)$, $m = 1,\dots,d'$, from a $d'$–dimensional vector $\boldsymbol{a}$. The atom appears in case there is a positive probability of starting in a non–reward–earning state ($0$) and the underlying Markov process gets absorbed before visiting a reward earning state ($+$). The Markov jump process generating $Y^{(j)}$ starts in the same state as $J_t$ if the reward is positive (hence $\boldsymbol{\pi}^+$), or it starts in the first state with positive reward that $J_t$ enters after starting in a zero reward state (hence the term $\boldsymbol{\pi}^0 (-T^{00})^{-1} T^{0+}$). Similar arguments apply to the generator $T_j$, where only reward-earning states will form part of the state space for $Y^{(j)}$. We refer to [12] for further details.
Summarizing, each marginal $Y^{(j)}$ has a phase–type distribution, which is based on the original Markov process $\{J_t\}_{t\ge 0}$, but with a possibly smaller state space and with rescaled parameters.
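For concreteness, the following sketch (not the authors' implementation) carries out the reduction (3.1) numerically: for a given column $j$ of $R$ it partitions the states into reward-earning and zero-reward ones and returns the marginal representation together with the size of the atom at zero.

```python
# A minimal sketch of computing the marginal PH representation (3.1) of an MPH* vector.
import numpy as np

def marginal_representation(pi, T, R, j):
    r = R[:, j]
    plus = r > 0                         # reward-earning states
    zero = ~plus                         # zero-reward states
    Tpp = T[np.ix_(plus, plus)]
    Tp0 = T[np.ix_(plus, zero)]
    T0p = T[np.ix_(zero, plus)]
    T00 = T[np.ix_(zero, zero)]
    # M = (-T00)^{-1} T0p, the probabilities of entering each + state from the 0 states
    M = np.linalg.solve(-T00, T0p) if zero.any() else np.zeros((0, int(plus.sum())))
    pi_j = pi[plus] + pi[zero] @ M
    T_j = np.diag(1.0 / r[plus]) @ (Tpp + Tp0 @ M)
    atom = pi[zero].sum() - (pi[zero] @ M).sum()     # mass of the atom at zero
    return pi_j, T_j, atom
```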
3.2. Parameter estimation.
We next provide an algorithm for fitting MPH$^*$ distributions to data. The data consist of a $d$–dimensional multivariate sample of $N$ i.i.d. observations $\boldsymbol{y}_i = (y_i^{(1)},\dots,y_i^{(d)})'$, $i = 1,\dots,N$. That is, we only observe the times to absorption, $y_i^{(j)}$, of each phase–type distributed marginal. Hence we are clearly in an incomplete data set–up and we shall employ the EM algorithm for fitting $(\boldsymbol{\pi}, T, R)$.
The EM algorithm works by replacing unavailable sufficient statistics by their conditional expectations given the data under the current parameters, and thereby updating the parameters by using known formulas for the maximum likelihood estimator in the complete data domain. Iteration of the procedure then produces a sequence of parameter values which increases the likelihood in each step.
For the present situation, we define the complete data as both the trajectory of the underlying Markov process which generates the phase–type distribution from which the marginals of the multivariate vector are constructed, and the Markov jump processes representing the rewards in all marginal distributions. Complete knowledge of the marginal trajectories only is not sufficient. Indeed, one can easily construct examples where the underlying processes cannot be reconstructed from the marginals only. The complete knowledge of both marginals and the underlying Markov process which generates the marginals creates another problem in relation to the incomplete data, since we do not have observations for the absorption times of the underlying Markov process. We can get around this problem by assuming that the rows of the reward matrix $R$ sum to one, i.e. $R\boldsymbol{e} = \boldsymbol{e}$. This assumption is not restrictive and can be imposed without losing generality due to the great ambiguity of (multivariate) phase–type representations. Hence our data consist of marginals $\boldsymbol{y}^{(j)} = (y_1^{(j)},\dots,y_N^{(j)})'$, $j = 1,\dots,d$, and their sums $\boldsymbol{y}^{(S)} = \sum_{l=1}^d \boldsymbol{y}^{(l)}$.
In the complete data domain, the estimation is straightforward and works as follows. The complete data MLE for $(\boldsymbol{\pi}, T)$ is given by
$$\hat\pi_k = \frac{B_k}{N}, \qquad \hat t_{kl} = \frac{N_{kl}}{Z_k}, \qquad \hat t_k = \frac{N_k}{Z_k}, \qquad \hat t_{kk} = -\sum_{l\neq k} \hat t_{kl} - \hat t_k.$$
The rewards of the marginals are then given by
$$\hat r_j(k) = \frac{Z_k^{(j)}}{Z_k} = \frac{Z_k^{(j)}}{\sum_{l=1}^d Z_k^{(l)}},$$
where $Z_k^{(j)}$ is the overall amount of time the $j$'th component has spent in state $k$.
In the EM algorithm, we now must replace all aforementioned sufficient statistics by their conditional expectations given the data. Concerning $Z_k$, $N_{kl}$, $N_k$ and $B_k$, these only depend on the underlying Markov jump process and are computed conditionally on $\boldsymbol{y}^{(S)}$ only. Their formulas are then as stated in the algorithm below (see [9]).
Concerning the conditional expectation of $Z_k^{(j)}$, we must calculate the expected reward (under $(\boldsymbol{\pi}, T)$) given all data of marginal $j$, which amounts to calculating the conditional expected time given data for the corresponding phase–type representation of the $j$-th marginal, $(\boldsymbol{\pi}_j, T_j)$. These are readily given by (again using [9])
$$E\Big(Z_k^{(j)} \,\Big|\, \boldsymbol{Y}^{(j)} = \boldsymbol{y}^{(j)}\Big) = \sum_{i=1}^N \frac{\int_0^{y_i^{(j)}} \boldsymbol{e}_k'\, e^{T_j (y_i^{(j)} - u)}\, \boldsymbol{t}_j\, \boldsymbol{\pi}_j\, e^{T_j u}\, \boldsymbol{e}_k\, du}{\boldsymbol{\pi}_j\, e^{T_j y_i^{(j)}}\, \boldsymbol{t}_j}.$$
Then
$$\hat r_j(k) = \frac{E\big(Z_k^{(j)} \mid \boldsymbol{Y}^{(j)} = \boldsymbol{y}^{(j)}\big)}{E\big(Z_k \mid \boldsymbol{Y}^{(S)} = \boldsymbol{y}^{(S)}\big)}.$$
Iterating the above finally provides a (single) full EM algorithm for the estimation of $(\boldsymbol{\pi}, T, R)$. We summarize the results in the following.

Algorithm 3.2.1 (EM algorithm for MPH* distributions).
0. Initialize with some “arbitrary” $(\boldsymbol{\pi}, T, R)$ with $R\boldsymbol{e} = \boldsymbol{e}$, and compute $\boldsymbol{\pi}_j$ and $T_j$, $j = 1,\dots,d$, using (3.1).
1. (E–step) Calculate
$$E\big(B_k \mid \boldsymbol{Y}^{(S)} = \boldsymbol{y}^{(S)}\big) = \sum_{i=1}^N \frac{\pi_k\, \boldsymbol{e}_k'\, e^{T y_i^{(S)}}\, \boldsymbol{t}}{\boldsymbol{\pi}\, e^{T y_i^{(S)}}\, \boldsymbol{t}}, \qquad E\big(N_k \mid \boldsymbol{Y}^{(S)} = \boldsymbol{y}^{(S)}\big) = \sum_{i=1}^N t_k\, \frac{\boldsymbol{\pi}\, e^{T y_i^{(S)}}\, \boldsymbol{e}_k}{\boldsymbol{\pi}\, e^{T y_i^{(S)}}\, \boldsymbol{t}},$$
$$E\big(Z_k \mid \boldsymbol{Y}^{(S)} = \boldsymbol{y}^{(S)}\big) = \sum_{i=1}^N \frac{\int_0^{y_i^{(S)}} \boldsymbol{e}_k'\, e^{T(y_i^{(S)} - u)}\, \boldsymbol{t}\, \boldsymbol{\pi}\, e^{T u}\, \boldsymbol{e}_k\, du}{\boldsymbol{\pi}\, e^{T y_i^{(S)}}\, \boldsymbol{t}}, \qquad E\big(N_{kl} \mid \boldsymbol{Y}^{(S)} = \boldsymbol{y}^{(S)}\big) = \sum_{i=1}^N t_{kl}\, \frac{\int_0^{y_i^{(S)}} \boldsymbol{e}_l'\, e^{T(y_i^{(S)} - u)}\, \boldsymbol{t}\, \boldsymbol{\pi}\, e^{T u}\, \boldsymbol{e}_k\, du}{\boldsymbol{\pi}\, e^{T y_i^{(S)}}\, \boldsymbol{t}},$$
$$E\big(Z_k^{(j)} \mid \boldsymbol{Y}^{(j)} = \boldsymbol{y}^{(j)}\big) = \sum_{i=1}^N \frac{\int_0^{y_i^{(j)}} \boldsymbol{e}_k'\, e^{T_j(y_i^{(j)} - u)}\, \boldsymbol{t}_j\, \boldsymbol{\pi}_j\, e^{T_j u}\, \boldsymbol{e}_k\, du}{\boldsymbol{\pi}_j\, e^{T_j y_i^{(j)}}\, \boldsymbol{t}_j}.$$
2. (M–step) Let
$$\hat\pi_k = \frac{1}{N}\, E\big(B_k \mid \boldsymbol{Y}^{(S)} = \boldsymbol{y}^{(S)}\big), \qquad \hat t_{kl} = \frac{E\big(N_{kl} \mid \boldsymbol{Y}^{(S)} = \boldsymbol{y}^{(S)}\big)}{E\big(Z_k \mid \boldsymbol{Y}^{(S)} = \boldsymbol{y}^{(S)}\big)}, \qquad \hat t_k = \frac{E\big(N_k \mid \boldsymbol{Y}^{(S)} = \boldsymbol{y}^{(S)}\big)}{E\big(Z_k \mid \boldsymbol{Y}^{(S)} = \boldsymbol{y}^{(S)}\big)}, \qquad \hat t_{kk} = -\sum_{l\neq k} \hat t_{kl} - \hat t_k,$$
$$\hat{\boldsymbol{\pi}} = (\hat\pi_1,\dots,\hat\pi_p), \qquad \hat T = \{\hat t_{kl}\}_{k,l=1,\dots,p} \qquad\text{and}\qquad \hat{\boldsymbol{t}} = (\hat t_1,\dots,\hat t_p)',$$
and
$$\hat r_j(k) := \frac{E\big(Z_k^{(j)} \mid \boldsymbol{Y}^{(j)} = \boldsymbol{y}^{(j)}\big)}{E\big(Z_k \mid \boldsymbol{Y}^{(S)} = \boldsymbol{y}^{(S)}\big)} = \frac{E\big(Z_k^{(j)} \mid \boldsymbol{Y}^{(j)} = \boldsymbol{y}^{(j)}\big)}{\sum_{l=1}^d E\big(Z_k^{(l)} \mid \boldsymbol{Y}^{(l)} = \boldsymbol{y}^{(l)}\big)} \qquad\text{and}\qquad \hat R = \{\hat r_j(k)\}_{k=1,\dots,p,\; j=1,\dots,d}.$$
3. Assign $\boldsymbol{\pi} := \hat{\boldsymbol{\pi}}$, $T := \hat T$, $\boldsymbol{t} := \hat{\boldsymbol{t}}$, $R := \hat R$ and compute $\boldsymbol{\pi}_j$, $T_j$, $j = 1,\dots,d$, using (3.1). GOTO 1.
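The M–step itself only involves elementary ratios of the conditional expectations delivered by the E–step. The following sketch (with assumed array layouts described in the comments; not the authors' code) illustrates these updates.

```python
# A minimal sketch of the M-step of Algorithm 3.2.1; the E-step statistics are assumed
# to be given as arrays: EB, EZ, ENk of shape (p,), EN of shape (p, p) with the
# off-diagonal conditional expectations E(N_kl | .), and EZj of shape (d, p) with
# EZj[j, k] = E(Z_k^(j) | y^(j)).
import numpy as np

def m_step(EB, EZ, EN, ENk, EZj, N):
    pi_hat = EB / N
    T_hat = EN / EZ[:, None]                      # off-diagonal intensities t_kl
    t_hat = ENk / EZ                              # exit rates t_k
    np.fill_diagonal(T_hat, 0.0)
    np.fill_diagonal(T_hat, -(T_hat.sum(axis=1) + t_hat))
    R_hat = (EZj / EZj.sum(axis=0, keepdims=True)).T   # r_j(k), shape (p, d)
    return pi_hat, T_hat, t_hat, R_hat
```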
Remark 3.2.1. Algorithm 3.2.1 was originally proposed in [16] as two consecutive EM algorithms, and its original statement contained a minor error in the M–step update for the reward matrix. To see why Algorithm 3.2.1 can be decomposed into the two consecutive EM algorithms, we argue as follows. Running the EM Algorithm 3.2.1, $(\hat{\boldsymbol{\pi}}, \hat T)$ will eventually converge (without input from the part involving the reward components). For constant $(\hat{\boldsymbol{\pi}}, \hat T)$, Algorithm 3.2.1 is indeed equivalent to the second EM algorithm in [16]. More specifically, the algorithm takes the following form.

First EM.
0. Initialize with some “arbitrary” $(\boldsymbol{\pi}, T)$.
1. (E–step) Calculate
$$E\big(B_k \mid \boldsymbol{Y}^{(S)} = \boldsymbol{y}^{(S)}\big) = \sum_{i=1}^N \frac{\pi_k\, \boldsymbol{e}_k'\, e^{T y_i^{(S)}}\, \boldsymbol{t}}{\boldsymbol{\pi}\, e^{T y_i^{(S)}}\, \boldsymbol{t}}, \qquad E\big(N_k \mid \boldsymbol{Y}^{(S)} = \boldsymbol{y}^{(S)}\big) = \sum_{i=1}^N t_k\, \frac{\boldsymbol{\pi}\, e^{T y_i^{(S)}}\, \boldsymbol{e}_k}{\boldsymbol{\pi}\, e^{T y_i^{(S)}}\, \boldsymbol{t}},$$
$$E\big(Z_k \mid \boldsymbol{Y}^{(S)} = \boldsymbol{y}^{(S)}\big) = \sum_{i=1}^N \frac{\int_0^{y_i^{(S)}} \boldsymbol{e}_k'\, e^{T(y_i^{(S)} - u)}\, \boldsymbol{t}\, \boldsymbol{\pi}\, e^{T u}\, \boldsymbol{e}_k\, du}{\boldsymbol{\pi}\, e^{T y_i^{(S)}}\, \boldsymbol{t}}, \qquad E\big(N_{kl} \mid \boldsymbol{Y}^{(S)} = \boldsymbol{y}^{(S)}\big) = \sum_{i=1}^N t_{kl}\, \frac{\int_0^{y_i^{(S)}} \boldsymbol{e}_l'\, e^{T(y_i^{(S)} - u)}\, \boldsymbol{t}\, \boldsymbol{\pi}\, e^{T u}\, \boldsymbol{e}_k\, du}{\boldsymbol{\pi}\, e^{T y_i^{(S)}}\, \boldsymbol{t}}.$$
2. (M–step) Let
$$\hat\pi_k = \frac{1}{N}\, E\big(B_k \mid \boldsymbol{Y}^{(S)} = \boldsymbol{y}^{(S)}\big), \qquad \hat t_{kl} = \frac{E\big(N_{kl} \mid \boldsymbol{Y}^{(S)} = \boldsymbol{y}^{(S)}\big)}{E\big(Z_k \mid \boldsymbol{Y}^{(S)} = \boldsymbol{y}^{(S)}\big)}, \qquad \hat t_k = \frac{E\big(N_k \mid \boldsymbol{Y}^{(S)} = \boldsymbol{y}^{(S)}\big)}{E\big(Z_k \mid \boldsymbol{Y}^{(S)} = \boldsymbol{y}^{(S)}\big)},$$
$$\hat t_{kk} = -\sum_{l\neq k} \hat t_{kl} - \hat t_k, \qquad \hat{\boldsymbol{\pi}} = (\hat\pi_1,\dots,\hat\pi_p), \qquad \hat T = \{\hat t_{kl}\}_{k,l=1,\dots,p} \qquad\text{and}\qquad \hat{\boldsymbol{t}} = (\hat t_1,\dots,\hat t_p)'.$$
3. Assign $\boldsymbol{\pi} := \hat{\boldsymbol{\pi}}$, $T := \hat T$, $\boldsymbol{t} := \hat{\boldsymbol{t}}$ and GOTO 1.

Second EM.
Use the estimated $(\boldsymbol{\pi}, T)$ of the first EM.
0. Initialize with some “arbitrary” $R$ with $R\boldsymbol{e} = \boldsymbol{e}$, and compute $\boldsymbol{\pi}_j$ and $T_j$, $j = 1,\dots,d$, using (3.1).
1. (E–step) Calculate
$$E\big(Z_k^{(j)} \mid \boldsymbol{Y}^{(j)} = \boldsymbol{y}^{(j)}\big) = \sum_{i=1}^N \frac{\int_0^{y_i^{(j)}} \boldsymbol{e}_k'\, e^{T_j(y_i^{(j)} - u)}\, \boldsymbol{t}_j\, \boldsymbol{\pi}_j\, e^{T_j u}\, \boldsymbol{e}_k\, du}{\boldsymbol{\pi}_j\, e^{T_j y_i^{(j)}}\, \boldsymbol{t}_j}.$$
2. (M–step) Let
$$\hat r_j(k) := \frac{E\big(Z_k^{(j)} \mid \boldsymbol{Y}^{(j)} = \boldsymbol{y}^{(j)}\big)}{\sum_{l=1}^d E\big(Z_k^{(l)} \mid \boldsymbol{Y}^{(l)} = \boldsymbol{y}^{(l)}\big)} \qquad\text{and}\qquad \hat R = \{\hat r_j(k)\}_{k=1,\dots,p,\; j=1,\dots,d}.$$
3. Assign $R := \hat R$ and compute $\boldsymbol{\pi}_j$ and $T_j$, $j = 1,\dots,d$, using (3.1). GOTO 1.

Remark 3.2.2.
The main computational burden lies in the E–steps, where matrix exponentials and integrals thereof must be evaluated. In Asmussen et al. [9] this is done by converting the problem into a system of ODEs, which are then solved via a Runge–Kutta method of fourth order (a C implementation, called EMpht, is available online [35]). While this approach is adequate for fitting univariate phase–type distributions, the Runge–Kutta method fails to work in some cases in the multivariate setting, in particular for the second EM, when an element in the reward matrix approaches zero. The reason is that the sub-intensity matrix of (at least) one of the marginals will adjust to this change by increasing some of the entries of the matrix in each iteration, thus requiring an increasingly smaller step–size in the Runge–Kutta method to accurately approximate the solution of the system. Our implementation includes an approach for the computation of matrix exponentials based on uniformization; it is a slight variation of the method in Neuts [31, p. 232]. We explain the method briefly. Taking $\varphi = \max_{k=1,\dots,p}(-t_{kk})$ and defining $P := \varphi^{-1}(\varphi I + T)$, which is in fact a transition matrix, we have that
$$\exp(T y) = \sum_{n=0}^{\infty} \frac{(\varphi y)^n}{n!}\, e^{-\varphi y}\, P^n.$$
Then
$$\left| e^{T y} - \sum_{n=0}^{M} \frac{(\varphi y)^n}{n!}\, e^{-\varphi y}\, P^n \right| \le \sum_{n=M+1}^{\infty} \frac{(\varphi y)^n}{n!}\, e^{-\varphi y}\, |P^n| \le \sum_{n=M+1}^{\infty} \frac{(\varphi y)^n}{n!}\, e^{-\varphi y} = P(N_{\varphi y} > M),$$
where $N_{\varphi y}$ is Poisson distributed with mean $\varphi y$. Hence, we can find $M$ such that the difference between the matrix exponential and the finite sum is less than or equal to a given error $\epsilon > 0$. Of course, larger values of $\varphi y$ give bigger values of $M$, dismissing any computational improvement for large observations. A way to circumvent this problem is to observe that $e^{Ty} = \big(e^{Ty/2^m}\big)^{2^m}$, thus we can find $m$ such that $\varphi y/2^m < 1$, compute $e^{Ty/2^m}$ by a finite sum and then retrieve $e^{Ty}$ by repeated squaring.
To compute the integrals involving matrix exponentials, we observe that by defining
$$G(y; \boldsymbol{\pi}, T) := \int_0^y e^{T(y-u)}\, \boldsymbol{t}\, \boldsymbol{\pi}\, e^{T u}\, du,$$
we have that (see Van Loan [37])
$$\exp\!\left( \begin{pmatrix} T & \boldsymbol{t}\boldsymbol{\pi} \\ \boldsymbol{0} & T \end{pmatrix} y \right) = \begin{pmatrix} e^{Ty} & G(y;\boldsymbol{\pi},T) \\ \boldsymbol{0} & e^{Ty} \end{pmatrix}.$$
Correspondingly, a simple (and efficient) way to compute $G(y;\boldsymbol{\pi},T)$ is by calculating the matrix exponential of the left hand side.
Approaches to improve the speed of the EM algorithm in the univariate case exist in the literature; for instance, Okamura et al. [33] proposed a method also based on uniformization. $\Box$
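The following sketch (a simplified illustration, not the implementation used for the examples in this paper) shows the two ingredients just described: the uniformization series with scaling and squaring for $e^{Ty}$, and Van Loan's block-matrix construction for $G(y;\boldsymbol{\pi},T)$. The truncation rule used here (stopping once the current Poisson weight drops below a tolerance) is a cruder criterion than the tail bound above.

```python
# A minimal sketch of uniformization with scaling-and-squaring and of Van Loan's trick.
import numpy as np

def expm_uniformization(T, y, eps=1e-12):
    """e^{Ty} via uniformization; y is halved until phi*y < 1, then squared back."""
    p = T.shape[0]
    phi = float(np.max(-np.diag(T)))
    m = 0
    while phi * y / (2 ** m) >= 1.0:
        m += 1
    h = y / (2 ** m)
    P = np.eye(p) + T / phi                      # (sub)stochastic matrix of the uniformized chain
    term = np.exp(-phi * h) * np.eye(p)          # n = 0 term of the Poisson-weighted series
    E = term.copy()
    n, weight = 0, np.exp(-phi * h)
    while weight > eps and n < 1000:
        n += 1
        term = term @ P * (phi * h) / n          # n-th term: e^{-phi h}(phi h)^n/n! P^n
        E += term
        weight *= (phi * h) / n
    for _ in range(m):                           # undo the scaling by repeated squaring
        E = E @ E
    return E

def van_loan_G(y, pi, T):
    """G(y; pi, T) = int_0^y e^{T(y-u)} t pi e^{Tu} du via one block matrix exponential."""
    p = T.shape[0]
    t = -T @ np.ones(p)
    A = np.zeros((2 * p, 2 * p))
    A[:p, :p] = T
    A[:p, p:] = np.outer(t, pi)
    A[p:, p:] = T
    return expm_uniformization(A, y)[:p, p:]     # upper-right block is G(y; pi, T)
```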
3.3. Parameter estimation for censored data.
In certain applications, some or all of the data may be censored. We call a data point right–censored at $v$ if it takes an unknown value above $v$, left–censored at $w$ if it takes an unknown value below $w$, and interval–censored if it is contained in the interval $(v, w]$, but its exact value is unknown. Left–censoring is a special case of interval–censoring with $v = 0$, while right–censoring can be obtained by fixing $v$ and letting $w \to \infty$.
The EM Algorithm 3.2.1 works much in the same way as for uncensored data, with the only difference that we are no longer observing exact data points $Y^{(j)} = y^{(j)}$, but only $Y^{(j)} \in (v^{(j)}, w^{(j)}]$. This only changes the E–steps, where the conditional expectations can be calculated using the formulas in Olsson [34]. We now explain in detail how to adapt Algorithm 3.2.1 to censored data.

First EM
It is possible that a data point consists of a combination of marginals with both censored (not necessarily in the same intervals) and uncensored data (this is relevant in the first EM algorithm when considering data of the sum of the marginals). Table 3.1 contains all possible combinations one might have in the data and the way of treating them. Note that for $d > 2$, one simply repeats the same rules iteratively; a schematic implementation of these rules is sketched further below.

$Y^{(1)}$ | $Y^{(2)}$ | $Y^{(S)} = Y^{(1)} + Y^{(2)}$
Uncensored with value $y^{(1)}$ | Uncensored with value $y^{(2)}$ | Uncensored with value $y^{(1)} + y^{(2)}$
Right–censored at $v^{(1)}$ | Uncensored with value $y^{(2)}$ | Right–censored at $v^{(1)} + y^{(2)}$
Right–censored at $v^{(1)}$ | Right–censored at $v^{(2)}$ | Right–censored at $v^{(1)} + v^{(2)}$
Right–censored at $v^{(1)}$ | Interval–censored $(v^{(2)}, w^{(2)}]$ | Right–censored at $v^{(1)} + v^{(2)}$
Interval–censored $(v^{(1)}, w^{(1)}]$ | Uncensored with value $y^{(2)}$ | Interval–censored $(v^{(1)} + y^{(2)}, w^{(1)} + y^{(2)}]$
Interval–censored $(v^{(1)}, w^{(1)}]$ | Interval–censored $(v^{(2)}, w^{(2)}]$ | Interval–censored $(v^{(1)} + v^{(2)}, w^{(1)} + w^{(2)}]$

Table 3.1. Rules for censored data.
For completeness, we include here the conditional expectations needed (see also [34]):
$$E\big(B_k \mid Y^{(S)} \in (v, w]\big) = \frac{\pi_k\, \boldsymbol{e}_k'\, e^{T v}\, \boldsymbol{e} - \pi_k\, \boldsymbol{e}_k'\, e^{T w}\, \boldsymbol{e}}{\boldsymbol{\pi}\, e^{T v}\, \boldsymbol{e} - \boldsymbol{\pi}\, e^{T w}\, \boldsymbol{e}},$$
$$E\big(Z_k \mid Y^{(S)} \in (v, w]\big) = \frac{\int_v^w \boldsymbol{\pi}\, e^{T u}\, \boldsymbol{e}_k\, du - \left( \int_0^w \boldsymbol{e}_k'\, e^{T(w-u)}\, \boldsymbol{e}\, \boldsymbol{\pi}\, e^{T u}\, \boldsymbol{e}_k\, du - \int_0^v \boldsymbol{e}_k'\, e^{T(v-u)}\, \boldsymbol{e}\, \boldsymbol{\pi}\, e^{T u}\, \boldsymbol{e}_k\, du \right)}{\boldsymbol{\pi}\, e^{T v}\, \boldsymbol{e} - \boldsymbol{\pi}\, e^{T w}\, \boldsymbol{e}},$$
$$E\big(N_{kl} \mid Y^{(S)} \in (v, w]\big) = t_{kl}\, \frac{\int_v^w \boldsymbol{\pi}\, e^{T u}\, \boldsymbol{e}_k\, du - \left( \int_0^w \boldsymbol{e}_l'\, e^{T(w-u)}\, \boldsymbol{e}\, \boldsymbol{\pi}\, e^{T u}\, \boldsymbol{e}_k\, du - \int_0^v \boldsymbol{e}_l'\, e^{T(v-u)}\, \boldsymbol{e}\, \boldsymbol{\pi}\, e^{T u}\, \boldsymbol{e}_k\, du \right)}{\boldsymbol{\pi}\, e^{T v}\, \boldsymbol{e} - \boldsymbol{\pi}\, e^{T w}\, \boldsymbol{e}},$$
$$E\big(N_k \mid Y^{(S)} \in (v, w]\big) = t_k\, \frac{\int_v^w \boldsymbol{\pi}\, e^{T u}\, \boldsymbol{e}_k\, du}{\boldsymbol{\pi}\, e^{T v}\, \boldsymbol{e} - \boldsymbol{\pi}\, e^{T w}\, \boldsymbol{e}}.$$
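The combination rules of Table 3.1 are mechanical and easily automated; the following small helper (a hypothetical illustration, not part of the paper's implementation) encodes them for two components and can be applied iteratively for $d > 2$.

```python
# A minimal sketch of the combination rules of Table 3.1: given the censoring status of two
# marginals, return the induced status of their sum Y^(S). Observations are encoded as
# ("exact", y), ("right", v) or ("interval", (v, w)).
def combine(obs1, obs2):
    def as_interval(kind, val):
        if kind == "exact":
            return val, val
        if kind == "right":
            return val, float("inf")
        return val                     # already a pair (v, w)
    v1, w1 = as_interval(*obs1)
    v2, w2 = as_interval(*obs2)
    v, w = v1 + v2, w1 + w2
    if v == w:
        return ("exact", v)
    if w == float("inf"):
        return ("right", v)
    return ("interval", (v, w))

# Example: a right-censored first marginal plus an exact second one gives a
# right-censored sum, as in the second row of Table 3.1.
print(combine(("right", 1.2), ("exact", 0.7)))   # ('right', 1.9)
```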
Second EM

The second EM algorithm works as above, with the only difference that for marginals with censored data the corresponding conditional expectation is calculated as
$$E\big(Z_k^{(j)} \mid Y^{(j)} \in (v^{(j)}, w^{(j)}]\big) = \frac{\int_{v^{(j)}}^{w^{(j)}} \boldsymbol{\pi}_j\, e^{T_j u}\, \boldsymbol{e}_k\, du - \left( \int_0^{w^{(j)}} \boldsymbol{e}_k'\, e^{T_j(w^{(j)}-u)}\, \boldsymbol{e}\, \boldsymbol{\pi}_j\, e^{T_j u}\, \boldsymbol{e}_k\, du - \int_0^{v^{(j)}} \boldsymbol{e}_k'\, e^{T_j(v^{(j)}-u)}\, \boldsymbol{e}\, \boldsymbol{\pi}_j\, e^{T_j u}\, \boldsymbol{e}_k\, du \right)}{\boldsymbol{\pi}_j\, e^{T_j v^{(j)}}\, \boldsymbol{e} - \boldsymbol{\pi}_j\, e^{T_j w^{(j)}}\, \boldsymbol{e}}.$$
For a general MPH ∗ dis-tribution an explicit density is not available. Kulkarni [27] characterized the density by a systemof partial differential equations, and in Breuer [16] a semi–explicit form is deduced. The followingtype of bivariate phase–type distributions does lead to an explicit density:Let YYY = ( Y (1) , Y (2) ) (cid:48) ∼ MPH ∗ ( πππ, T , R ) with T = (cid:18) T T T (cid:19) , πππ = ( ααα, R = (cid:18) eee eee (cid:19) , (3.2)where T and T are sub–intensity matrices of dimensions p and p ( p = p + p ), respectively,and T eee + T eee = 000. Then the joint density of YYY is given by f YYY (cid:16) y (1) , y (2) (cid:17) = ααα e T y (1) T e T y (2) ( − T ) eee , (3.3)with marginals Y (1) ∼ PH( ααα, T ) and Y (2) ∼ PH( ααα ( − T ) − T , T ). Note that the Baker–typebivariate distributions introduced in Bladt et al. [13] are a particular case. The latter have someremarkable properties: one can construct a distribution of this type with specific given marginalsand a given Pearson correlation coefficient; this class is also dense within the set of bivariatedistributions with support in R (this follows from the fact that the class of Bernstein copulas canbe used to approximate arbitrarily well any copula (see Sencetta and Satchell [36]) and that theclass of phase–type distributions can approximate arbitrarily well any distribution with support on R + ), making the bigger class of bivariate distributions also dense.3.4.1. Tail independence.
3.4.1. Tail independence.
The existence of an explicit form of the density allows us to compute the upper tail dependence coefficient $\lambda_U$. Recall that the latter is defined as
$$\lambda_U = \lim_{q \to 1^-} P\big(Y^{(1)} > F_{Y^{(1)}}^{-1}(q) \mid Y^{(2)} > F_{Y^{(2)}}^{-1}(q)\big).$$
It is a classical measure of dependence in the tail and of considerable interest in applications in insurance and finance, where the modelling of tail events is crucial. From (3.3) we have
$$\bar F_{\boldsymbol{Y}}\big(y^{(1)}, y^{(2)}\big) = P\big(Y^{(1)} > y^{(1)}, Y^{(2)} > y^{(2)}\big) = \boldsymbol{\alpha}\,(-T_{11})^{-1}\, e^{T_{11} y^{(1)}}\, T_{12}\, e^{T_{22} y^{(2)}}\, \boldsymbol{e}.$$
Then, if $-\lambda_j$ is the real part of the eigenvalue of $T_{jj}$ with largest real part and $k_j$ is the dimension of the Jordan block of $\lambda_j$, for $j = 1, 2$, it is easy to see that
$$\bar F_{\boldsymbol{Y}}\big(y^{(1)}, y^{(2)}\big) \sim b\, \big(y^{(1)}\big)^{k_1 - 1}\, e^{-\lambda_1 y^{(1)}}\, \big(y^{(2)}\big)^{k_2 - 1}\, e^{-\lambda_2 y^{(2)}}, \qquad \text{as } y^{(1)}, y^{(2)} \to \infty,$$
where $b$ is a positive constant. Hence
$$\lambda_U = \lim_{q \to 1^-} \frac{b\, \big(F_{Y^{(1)}}^{-1}(q)\big)^{k_1 - 1}\, e^{-\lambda_1 F_{Y^{(1)}}^{-1}(q)}\, \big(F_{Y^{(2)}}^{-1}(q)\big)^{k_2 - 1}\, e^{-\lambda_2 F_{Y^{(2)}}^{-1}(q)}}{c\, \big(F_{Y^{(2)}}^{-1}(q)\big)^{k_2 - 1}\, e^{-\lambda_2 F_{Y^{(2)}}^{-1}(q)}} = 0,$$
with $c$ a positive constant. In other words, $\boldsymbol{Y}$ is upper-tail-independent.
3.4.2. Estimation.
The density (3.3) allows for a special form of EM algorithm. Such an algorithm was introduced in Ahlström et al. [1] and we include it for completeness, subsequent use and comparison purposes.
Algorithm 3.4.1.
0. Initialize with some “arbitrary” $(\boldsymbol{\alpha}, T)$.
1. (E–step) Calculate
$$E(B_k \mid \boldsymbol{Y} = \boldsymbol{y}) = \sum_{i=1}^N \frac{\alpha_k\, \boldsymbol{e}_k'\, e^{T_{11} y_i^{(1)}}\, T_{12}\, e^{T_{22} y_i^{(2)}}\, (-T_{22})\, \boldsymbol{e}}{f_{\boldsymbol{Y}}(y_i^{(1)}, y_i^{(2)}; \boldsymbol{\alpha}, T)}, \qquad k = 1,\dots,p_1,$$
$$E(Z_k \mid \boldsymbol{Y} = \boldsymbol{y}) = \begin{cases} \displaystyle\sum_{i=1}^N \frac{\int_0^{y_i^{(1)}} \boldsymbol{\alpha}\, e^{T_{11} u}\, \boldsymbol{e}_k\, \boldsymbol{e}_k'\, e^{T_{11}(y_i^{(1)} - u)}\, T_{12}\, e^{T_{22} y_i^{(2)}}\, (-T_{22})\, \boldsymbol{e}\, du}{f_{\boldsymbol{Y}}(y_i^{(1)}, y_i^{(2)}; \boldsymbol{\alpha}, T)}, & k = 1,\dots,p_1, \\[2ex] \displaystyle\sum_{i=1}^N \frac{\int_0^{y_i^{(2)}} \boldsymbol{\alpha}\, e^{T_{11} y_i^{(1)}}\, T_{12}\, e^{T_{22} u}\, \boldsymbol{e}_{k-p_1}\, \boldsymbol{e}_{k-p_1}'\, e^{T_{22}(y_i^{(2)} - u)}\, (-T_{22})\, \boldsymbol{e}\, du}{f_{\boldsymbol{Y}}(y_i^{(1)}, y_i^{(2)}; \boldsymbol{\alpha}, T)}, & k = p_1+1,\dots,p, \end{cases}$$
$$E(N_{kl} \mid \boldsymbol{Y} = \boldsymbol{y}) = \begin{cases} \displaystyle\sum_{i=1}^N t_{kl}\, \frac{\int_0^{y_i^{(1)}} \boldsymbol{\alpha}\, e^{T_{11} u}\, \boldsymbol{e}_k\, \boldsymbol{e}_l'\, e^{T_{11}(y_i^{(1)} - u)}\, T_{12}\, e^{T_{22} y_i^{(2)}}\, (-T_{22})\, \boldsymbol{e}\, du}{f_{\boldsymbol{Y}}(y_i^{(1)}, y_i^{(2)}; \boldsymbol{\alpha}, T)}, & k, l = 1,\dots,p_1,\; k \neq l, \\[2ex] \displaystyle\sum_{i=1}^N t_{kl}\, \frac{\boldsymbol{\alpha}\, e^{T_{11} y_i^{(1)}}\, \boldsymbol{e}_k\, \boldsymbol{e}_{l-p_1}'\, e^{T_{22} y_i^{(2)}}\, (-T_{22})\, \boldsymbol{e}}{f_{\boldsymbol{Y}}(y_i^{(1)}, y_i^{(2)}; \boldsymbol{\alpha}, T)}, & k = 1,\dots,p_1,\; l = p_1+1,\dots,p, \\[2ex] \displaystyle\sum_{i=1}^N t_{kl}\, \frac{\int_0^{y_i^{(2)}} \boldsymbol{\alpha}\, e^{T_{11} y_i^{(1)}}\, T_{12}\, e^{T_{22} u}\, \boldsymbol{e}_{k-p_1}\, \boldsymbol{e}_{l-p_1}'\, e^{T_{22}(y_i^{(2)} - u)}\, (-T_{22})\, \boldsymbol{e}\, du}{f_{\boldsymbol{Y}}(y_i^{(1)}, y_i^{(2)}; \boldsymbol{\alpha}, T)}, & k, l = p_1+1,\dots,p,\; k \neq l, \end{cases}$$
$$E(N_k \mid \boldsymbol{Y} = \boldsymbol{y}) = \sum_{i=1}^N t_k\, \frac{\boldsymbol{\alpha}\, e^{T_{11} y_i^{(1)}}\, T_{12}\, e^{T_{22} y_i^{(2)}}\, \boldsymbol{e}_{k-p_1}}{f_{\boldsymbol{Y}}(y_i^{(1)}, y_i^{(2)}; \boldsymbol{\alpha}, T)}, \qquad k = p_1+1,\dots,p.$$
2. (M–step) Let
$$\hat\alpha_k = \frac{1}{N}\, E(B_k \mid \boldsymbol{Y} = \boldsymbol{y}), \qquad \hat t_{kl} = \frac{E(N_{kl} \mid \boldsymbol{Y} = \boldsymbol{y})}{E(Z_k \mid \boldsymbol{Y} = \boldsymbol{y})}, \qquad \hat t_k = \frac{E(N_k \mid \boldsymbol{Y} = \boldsymbol{y})}{E(Z_k \mid \boldsymbol{Y} = \boldsymbol{y})}, \qquad \hat t_{kk} = -\sum_{l\neq k} \hat t_{kl} - \hat t_k,$$
$$\hat{\boldsymbol{\alpha}} = (\hat\alpha_1,\dots,\hat\alpha_{p_1}), \qquad \hat T = \{\hat t_{kl}\}_{k,l=1,\dots,p} \qquad\text{and}\qquad \hat{\boldsymbol{t}} = (\hat t_{p_1+1},\dots,\hat t_p)'.$$
3. Assign $\boldsymbol{\alpha} := \hat{\boldsymbol{\alpha}}$, $T := \hat T$, $\boldsymbol{t} := \hat{\boldsymbol{t}}$ and GOTO 1.

We now provide two detailed illustrations. When Algorithm 3.2.1 is employed, given that an explicit form of the joint density is not available, we choose the number of iterations in such a way that the changes in the successive log–likelihoods in the first EM become negligible and the changes
in the successive parameter estimates become negligible in the second EM. For Algorithm 3.4.1 we used a criterion similar to the univariate case.
Example 3.4.1 (Simulation study). The objective of the present example is to compare the performance of Algorithm 3.2.1 and Algorithm 3.4.1. We will illustrate that the more general Algorithm 3.2.1 also provides reasonable results when dealing with a sample from a bivariate distribution with density (3.3), for which the more specific Algorithm 3.4.1 is particularly well-suited. We generated an i.i.d. sample of size 10 000 from an MPH$^*$ distribution with 4 phases, initial vector $\boldsymbol{\pi}$, sub–intensity matrix $T$ of the block form (3.2) and reward matrix $R$. This distribution has theoretical mean $E(\boldsymbol{Y}) = (E(Y^{(1)}), E(Y^{(2)}))' = (0.\ldots,\, 0.\ldots)'$, correlation coefficient $\rho(Y^{(1)}, Y^{(2)}) = 0.\ldots$ and $\lambda_U = 0$. The simulated sample has empirical values $\hat E(\boldsymbol{Y}) = (0.\ldots,\, 0.\ldots)'$, $\hat\rho = 0.\ldots$ and $\hat\rho_\tau = 0.\ldots$ Using Algorithm 3.2.1 with $p = 4$, random initial values and 3 500 steps in each EM algorithm, we obtain estimates $\hat{\boldsymbol{\pi}}$, $\hat T$ and $\hat R$. The fitted distribution has mean $E(\boldsymbol{Y}) = (0.\ldots,\, 0.\ldots)'$ and $\rho = 0.\ldots$; we also estimated $\lambda_U$ and $\rho_\tau$ via simulation, obtaining $\hat\lambda_U = 0.\ldots$ and $\hat\rho_\tau = 0.\ldots$
Figure 3.1.
Histograms of simulated sample versus densities of the MPH* distribution fitted using Algorithm 3.2.1.
Figure 3.2.
QQ plots of simulated sample versus fitted MPH* distribution using Algorithm 3.2.1.
Figure 3.3.
Contour plot of sample (left), contour plot of a simulated sample from the MPH* distribution fitted with Algorithm 3.2.1 (center) and contour plot of a simulated sample from the distribution fitted with Algorithm 3.4.1 (right).

Next we use Algorithm 3.4.1 with 4 phases ($p_1 = 2$ and $p_2 = 2$), random initial values and 2 500 steps in the EM algorithm (leading to a running time of 623 seconds). The estimated parameters $\hat{\boldsymbol{\pi}}$ and $\hat T$ (with $R$ given by the block structure (3.2)) have corresponding mean $E(\boldsymbol{Y}) = (0.\ldots,\, 0.\ldots)'$ and $\rho = 0.\ldots$; $\rho_\tau$ for this fit can be estimated via simulation, giving $\hat\rho_\tau = 0.\ldots$ Its log–likelihood ($-12\,329.82$) outperforms the log–likelihood under the original MPH* distribution ($-12\,330.\ldots$). Algorithm 3.2.1 thus already provides a good overall fit to the underlying distribution, whereas the more specific Algorithm 3.4.1 finds a fit that even exhibits nicely the original correlation pattern. Yet, the flexibility of Algorithm 3.2.1 is a considerable advantage when dealing with data sets without the additional knowledge about the underlying distribution.
Example 3.4.2 (Known distribution – Marshall–Olkin exponential). We now would like to illustrate that Algorithms 3.2.1 and 3.4.1 can be modified to fit an MPH$^*$ model to a theoretically given joint distribution $H$. The idea is along the lines of [9] and consists of considering sequences of empirical distributions with increasing sample size. We exemplify this by considering a bivariate Marshall–Olkin exponential distribution, whose joint survival function is of the form
$$\bar F\big(y^{(1)}, y^{(2)}\big) = \exp\!\big( -\lambda_1 y^{(1)} - \lambda_2 y^{(2)} - \lambda_3 \max(y^{(1)}, y^{(2)}) \big).$$
We take $\lambda_1 = 1$, $\lambda_2 = 3$ and $\lambda_3 = 1$; then the distribution has theoretical moments $E(\boldsymbol{Y}) = (0.5,\, 0.25)$ and $\rho = 0.2$. It is easy to see that the Marshall–Olkin bivariate exponential is upper-tail-independent, i.e., $\lambda_U = 0$. Moreover, we approximate $\rho_\tau$ via simulation, obtaining $\hat\rho_\tau = 0.\ldots$ Fitting an MPH$^*$ distribution with 3 phases via Algorithm 3.2.1 (cf. Figures 3.4–3.6) yields estimates $\hat{\boldsymbol{\pi}}$, $\hat T$ and $\hat R$, which have corresponding moments $E(\boldsymbol{Y}) = (0.\ldots,\, 0.\ldots)'$ and $\rho = 0.\ldots$; $\lambda_U$ and $\rho_\tau$ for the resulting model can be approximated by simulation to be $\hat\lambda_U = 0.\ldots$ and $\hat\rho_\tau = 0.\ldots$
Figure 3.4.
Densities of the original Marshall–Olkin distribution versus densities of the MPH* distribution fitted using Algorithm 3.2.1.
4. Multivariate inhomogeneous phase-type distributions
There are various possibilities for extending inhomogeneous phase-type distributions to more than one dimension. In the following we suggest one particular approach and provide an algorithm for the parameter estimation.
Figure 3.5.
QQ plots of original Marshall–Olkin distribution versus fitted MPH* distribution using Algorithm 3.2.1.
Figure 3.6.
Contour plot of the original Marshall–Olkin distribution (left), contour plot of a simulated sample from the Marshall–Olkin distribution (middle) and contour plot of a simulated sample from the distribution fitted with Algorithm 3.2.1 (right).
4.1. Definition and Properties.
Let $\boldsymbol{Y} \sim \mathrm{MPH}^*(\boldsymbol{\pi}, T, R)$ and define $\boldsymbol{X} := (g_1(Y^{(1)}),\dots,g_d(Y^{(d)}))'$, where $g_j: \mathbb{R}_+ \to \mathbb{R}_+$ are increasing and differentiable functions for $j = 1,\dots,d$; then we say that $\boldsymbol{X}$ has an inhomogeneous MPH* distribution. Several of its properties follow directly from the definition:
(1) The marginals $X^{(j)} = g_j(Y^{(j)})$ are IPH distributed, since each $Y^{(j)}$ is phase–type distributed, $j = 1,\dots,d$.
(2) Since $g_j$ is increasing for all $j = 1,\dots,d$, the copula of $\boldsymbol{X}$ is the same as the copula of $\boldsymbol{Y}$ (see e.g. [30, Prop. 7.7]).
(3) For fixed $g_j(\cdot)$, $j = 1,\dots,d$, this new class is dense in $\mathbb{R}_+^d$ (by the denseness of the MPH* class).
In the sequel we will provide an algorithm for parameter estimation, for which an explicit expression of the bivariate density is needed. We therefore restrict ourselves to the bivariate case.
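Since the class is defined constructively, simulation is immediate; the following sketch (an illustration under the stated assumptions, not the authors' code) draws one observation by simulating the underlying Markov jump process, accumulating the rewards and applying the transforms $g_j$. The parameter values at the end are purely illustrative.

```python
# A minimal sketch of simulating one draw of an inhomogeneous MPH* vector:
# Y^(j) = int_0^tau r_j(J_t) dt, followed by X^(j) = g_j(Y^(j)).
import numpy as np

def rinhom_mphstar(pi, T, R, transforms, rng=np.random.default_rng()):
    p = T.shape[0]
    exit_rates = -T.sum(axis=1)                  # exit vector t = -T e
    Y = np.zeros(R.shape[1])
    state = rng.choice(p, p=pi)
    while True:
        rate = -T[state, state]
        sojourn = rng.exponential(1.0 / rate)
        Y += R[state] * sojourn                  # accumulate all rewards in the current state
        # next state: transient states with prob t_kl/rate, absorption with prob t_k/rate
        probs = np.append(np.maximum(T[state], 0.0), exit_rates[state]) / rate
        nxt = rng.choice(p + 1, p=probs)
        if nxt == p:                             # absorbed
            break
        state = nxt
    return np.array([g(y) for g, y in zip(transforms, Y)])

# Example: bivariate case with transforms g_j(y) = e^y - 1 (illustrative parameters only)
pi = np.array([1.0, 0.0])
T = np.array([[-2.0, 1.0], [0.0, -3.0]])
R = np.array([[0.7, 0.3], [0.2, 0.8]])
print(rinhom_mphstar(pi, T, R, [np.expm1, np.expm1]))
```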
4.2. Parameter estimation in the bivariate case.
In the bivariate case, for any $\boldsymbol{Y} \sim \mathrm{MPH}^*(\boldsymbol{\pi}, T, R)$ with parameters (3.2), it is easy to see that the density of $\boldsymbol{X} = (g_1(Y^{(1)}), g_2(Y^{(2)}))'$ is given by
$$f_{\boldsymbol{X}}\big(x^{(1)}, x^{(2)}\big) = \frac{\boldsymbol{\alpha}\, e^{T_{11} g_1^{-1}(x^{(1)})}\, T_{12}\, e^{T_{22} g_2^{-1}(x^{(2)})}\, (-T_{22})\, \boldsymbol{e}}{g_1'\big(g_1^{-1}(x^{(1)})\big)\, g_2'\big(g_2^{-1}(x^{(2)})\big)}. \tag{4.1}$$
If we assume that $g_j(\cdot;\boldsymbol{\beta}_j)$ is a parametric non–negative function depending on the vector $\boldsymbol{\beta}_j$, $j = 1, 2$, and let $\boldsymbol{\beta} = (\boldsymbol{\beta}_1, \boldsymbol{\beta}_2)$, then we can formulate an algorithm analogous to Algorithm 2.2.1.

Algorithm 4.2.1 (EM algorithm for bivariate inhomogeneous MPH* distributions).
0. Initialize with some “arbitrary” $(\boldsymbol{\alpha}, T, \boldsymbol{\beta})$.
1. Transform the data into $y_i^{(j)} := g_j^{-1}(x_i^{(j)}; \boldsymbol{\beta}_j)$, $i = 1,\dots,N$, $j = 1, 2$, and apply the E– and M–steps of Algorithm 3.4.1, by which we obtain the estimators $(\hat{\boldsymbol{\alpha}}, \hat T)$.
2. Compute
$$\hat{\boldsymbol{\beta}} = \arg\max_{\boldsymbol{\beta}} \sum_{i=1}^N \log\big(f_{\boldsymbol{X}}(x_i^{(1)}, x_i^{(2)}; \hat{\boldsymbol{\alpha}}, \hat T, \boldsymbol{\beta})\big).$$
3. Assign $(\boldsymbol{\alpha}, T, \boldsymbol{\beta}) = (\hat{\boldsymbol{\alpha}}, \hat T, \hat{\boldsymbol{\beta}})$ and GOTO 1.

We now consider particular multivariate distributions obtained through such a transformation of an MPH* random vector.
4.3. Multivariate matrix–Pareto models.
Let $\boldsymbol{X} = (g_1(Y^{(1)}),\dots,g_d(Y^{(d)}))'$, where $\boldsymbol{Y} \sim \mathrm{MPH}^*(\boldsymbol{\pi}, T, R)$ and
$$g_j(y) = \beta_j\big(e^y - 1\big), \qquad \beta_j > 0, \qquad j = 1,\dots,d. \tag{4.2}$$
Then we say that $\boldsymbol{X}$ follows a multivariate matrix–Pareto distribution. Some special properties of this class of distributions are:
(1) Marginal distributions are matrix–Pareto distributed.
(2) Moments and cross–moments of $\boldsymbol{X}$ can be obtained from the moment generating function of $\boldsymbol{Y}$ (see [12, Theorem 8.1.2]), provided that they exist.
(3) Products of the type $\prod_{i=1}^M \big( \tfrac{X^{(j_i)}}{\beta_{j_i}} + 1 \big)^{a_i}$ are matrix–Pareto distributed, $a_i > 0$, $j_i \in \{1,\dots,d\}$, $i = 1,\dots,M$, $M \le N$, since linear combinations of $\boldsymbol{Y}$ are PH distributed.
In the bivariate case, (4.2) and (4.1) lead to
$$f_{\boldsymbol{X}}\big(x^{(1)}, x^{(2)}\big) = \boldsymbol{\alpha}\left(\frac{x^{(1)}}{\beta_1} + 1\right)^{T_{11} - I} T_{12} \left(\frac{x^{(2)}}{\beta_2} + 1\right)^{T_{22} - I} (-T_{22})\, \boldsymbol{e}\, \frac{1}{\beta_1 \beta_2}$$
and
$$\bar F_{\boldsymbol{X}}\big(x^{(1)}, x^{(2)}\big) = \boldsymbol{\alpha}\,(-T_{11})^{-1}\left(\frac{x^{(1)}}{\beta_1} + 1\right)^{T_{11}} T_{12} \left(\frac{x^{(2)}}{\beta_2} + 1\right)^{T_{22}} \boldsymbol{e}.$$
Moreover, in this bivariate case, linear combinations of the $X^{(i)}$ are regularly varying, and the respective index is the real part of the eigenvalue with largest real part of the sub–intensity matrices of the marginals, which follows from [20, Lem. 2.1] and the asymptotic independence of $\boldsymbol{Y}$. The general case is not clear, since the condition of asymptotic independence does not hold then.
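The bivariate density and survival function above can be evaluated along the same lines as in the univariate matrix–Pareto case; the following sketch (not from the paper's implementation) does so via matrix exponentials of scaled logarithms, i.e. $s^{M} = \exp(M \log s)$ for a scalar base $s$ and matrix exponent $M$.

```python
# A minimal sketch of evaluating the bivariate matrix-Pareto density and survival function.
import numpy as np
from scipy.linalg import expm

def biv_matrix_pareto(x1, x2, alpha, T11, T12, T22, beta1, beta2):
    p1, p2 = T11.shape[0], T22.shape[0]
    e2 = np.ones(p2)
    s1, s2 = np.log(x1 / beta1 + 1.0), np.log(x2 / beta2 + 1.0)
    dens = (alpha @ expm((T11 - np.eye(p1)) * s1) @ T12
            @ expm((T22 - np.eye(p2)) * s2) @ (-T22) @ e2) / (beta1 * beta2)
    surv = (alpha @ np.linalg.inv(-T11) @ expm(T11 * s1) @ T12
            @ expm(T22 * s2) @ e2)
    return dens, surv
```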
Remark 4.3.1.
The Marshall–Olkin Pareto distribution (see Hanagal [23]) is a particular case of this class of distributions.
4.3.1. Parameter estimation.
As in the univariate case, if we assume the simpler transformation $g_j(y) = e^y - 1$, $j = 1,\dots,d$, then we can use fitting methods of the MPH* class by taking the logarithm of the marginal observations. I.e., we can apply Algorithms 3.2.1 and 3.4.1 to the transformed data $y_i^{(j)} := \log(x_i^{(j)} + 1)$, $i = 1,\dots,N$, $j = 1,\dots,d$, to estimate the parameters $(\boldsymbol{\pi}, T, R)$. We exemplify the use of this method in two examples.

Example 4.3.1. (Mardia type I) We generated an i.i.d. sample of size 10 000 from a (translated) Mardia type I Pareto distribution (see [29]) with parameters $\sigma_1 = \sigma_2 = 1$ and $\alpha = 2$. This distribution has theoretical numerical values $E(\boldsymbol{X}) = (1, 1)'$, $\lambda_U = 0$ and $\rho_\tau = 0.2$. The simulated sample has empirical values $\hat E(\boldsymbol{X}) = (0.\ldots,\, 0.\ldots)'$ and $\hat\rho_\tau = 0.\ldots$ We then applied Algorithm 3.4.1 with $p_1 = p_2 = 2$ (i.e., $p = 4$) and 2 000 steps on the transformed data (with a running time of 530 seconds), obtaining parameter estimates $\hat{\boldsymbol{\alpha}}$, $\hat T$ and $\hat R$. The tails of the marginals of the fitted distribution are determined by the real part of the eigenvalues with largest real part of the sub–intensity matrices of the marginal distributions, which are $\lambda_1^{(\max)} = -\ldots$ and $\lambda_2^{(\max)} = -\ldots$, and the fitted distribution has mean $E(\boldsymbol{X}) = (1.\ldots,\, \ldots)'$. Moreover, we estimated $\rho_\tau$ via simulation, obtaining $\hat\rho_\tau = 0.\ldots$ The log–likelihood of the fit ($-15\,550.52$) exceeds the log–likelihood using the original Mardia distribution ($-16\,146.\ldots$).
Figure 4.1.
QQ plots of simulated Mardia Pareto sample versus fitted multivariate matrix–Pareto distribution using Algorithm 3.4.1.
Figure 4.2.
Contour plot of simulated Mardia Pareto sample (left) and contour plot of a simulated sample from the matrix–Pareto distribution fitted using Algorithm 3.4.1 (right).

The concrete structure of the intensity and reward matrix underlying Algorithm 3.4.1 restricts its application to tail-independent models. While in the previous example this was justified, in situations with tail-dependent data one should rather look for fits in the general MPH* class by using Algorithm 3.2.1 on the transformed data. The following example illustrates such an approach.
Figure 4.3.
Contour plot of the original Mardia Pareto density (left) and contour plot of the Matrix–Pareto distribution fitted using Algorithm 3.4.1 (right).
Example 4.3.2. (Gumbel copula with matrix–Pareto marginals) We generated an i.i.d. sample of size 5 000 from a three-dimensional random vector with the first and second marginals being matrix–Pareto distributed (each with β = 1), the third marginal being a conventional Pareto with shape parameter 2.5, and a Gumbel copula with parameter θ = 4 as dependence structure. The choice of a Gumbel copula instead of a multivariate matrix–Pareto model based on an MPH* construction is to show that the algorithm can be used to model any type of dependence structure; the Gumbel copula is known to have positive upper tail dependence. This distribution has theoretical mean E(X) = (0.…, …, …)′, upper tail dependence λ_U = 2 − 2^{1/θ} ≈ 0.81, Kendall's ρ_τ = 0.75, and the real parts of the eigenvalues that determine the heaviness of the tails are λ_1^(max) = −2, λ_2^(max) = −…, and λ_3^(max) = −2.5; the corresponding empirical quantities (mean, pairwise λ_U and pairwise ρ_τ) of the simulated sample were computed for comparison. We fitted a model with p = 6 using 5 000 steps of Algorithm 3.2.1 on the transformed data (running time 6 501 seconds), obtaining estimates α̂, T̂ and R̂, from which we computed the real parts λ_1^(max), λ_2^(max) and λ_3^(max) of the eigenvalues with largest real part of the sub-intensity matrices of the fitted marginal distributions, the fitted mean, and simulation-based estimates of the pairwise λ_U and ρ_τ.
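For reference, the theoretical dependence measures of the Gumbel copula quoted above, together with simple empirical counterparts used for such comparisons, can be computed as follows. The empirical tail-dependence estimator and its threshold level q are our own illustrative choices rather than the paper's.

```python
import numpy as np
from scipy.stats import kendalltau

theta = 4.0
lambda_U = 2 - 2 ** (1 / theta)   # upper tail dependence of the Gumbel copula, ~0.81
tau = 1 - 1 / theta               # Kendall's tau of the Gumbel copula, 0.75

def emp_upper_tail_dep(x, y, q=0.95):
    """Naive empirical estimate of lambda_U at threshold level q."""
    u, v = np.quantile(x, q), np.quantile(y, q)
    return np.mean((x > u) & (y > v)) / (1 - q)

def emp_kendall(x, y):
    """Empirical Kendall's tau of two samples."""
    return kendalltau(x, y)[0]
```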
Figure 4.4.
QQ plots of simulated sample versus fitted multivariate matrix–Pareto distribution using Algorithm 3.2.1.
Figure 4.5.
Contour plots of simulated sample from Gumbel copula with matrix-Pareto marginals (top) and contour plots of a simulated sample from the Matrix–Pareto distribution fitted using Algorithm 3.2.1 (bottom).

We would like to comment on the relative inaccuracy of the obtained estimate for λ_1^(max), especially when compared to the convincing estimates for the other marginals. The first marginal here is in fact a mixture of Pareto distributions with shape parameters 2 and 3, with mixing probabilities 0.2 and 0.8, respectively. That is, a major proportion of the data for the first marginal is to be expected to stem from the Pareto distribution with shape parameter 3. Since Algorithm 3.2.1 fits body and tail at the same time, one cannot expect the resulting estimate for the tail to be as good as techniques of classical extreme value theory, which focusses on the tail fit only. To illustrate this point, we also applied the algorithm to data simulated from changed initial probabilities in the first marginal according to π = (0.8, 0.2) (and otherwise identical parameters). Now most of the data points will stem from the heavier distribution in the mixture. Indeed, the estimate for λ_1^(max) in that case turns out to be −….

Example 4.3.3. (Danish fire insurance data) Consider the famous Danish fire insurance claim data set (see e.g. [22]). We propose here a bivariate matrix–Pareto distribution as a model for the components building and content, with observations in the set (1, ∞) × (1, ∞). To that end, we first translate the sample to the origin; the translated sample has empirical mean Ê(X) = (3.…, …)′ and ρ̂_τ = 0.…. We fitted a model with p_1 = p_2 = 2 using Algorithm 4.2.1 with 2 000 steps (with a step-length of 0.05 and gradient ascent until the norm of the derivative is less than 0.1; the running time is 438 seconds), obtaining estimates α̂, T̂, R̂, β̂_1 = 5.… and β̂_2 = 13.…. The real parts of the eigenvalues that determine the heaviness of the tails of the fitted distribution are λ_1^(max) = −… and λ_2^(max) = −…, and the fitted distribution has mean Ê(X) = (3.…, …)′. Moreover, estimating ρ_τ via simulation gives ρ̂_τ = 0.….
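The gradient-ascent component mentioned above (fixed step length, stopping once the gradient norm falls below a tolerance) can be sketched generically as follows. Here `profile_loglik` stands for the log-likelihood of the data as a function of the transformation parameters β for fixed remaining parameters; this interface, the numerical gradient and the function name are our own assumptions, not the statement of Algorithm 4.2.1 itself.

```python
import numpy as np

def ascent_on_beta(profile_loglik, beta0, step=0.05, tol=0.1, h=1e-5, max_iter=10_000):
    """Plain gradient ascent on beta with a fixed step length, stopping once the
    norm of the (numerically approximated) gradient is below tol."""
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        grad = np.array([(profile_loglik(beta + h * e) - profile_loglik(beta - h * e)) / (2 * h)
                         for e in np.eye(beta.size)])
        if np.linalg.norm(grad) < tol:
            break
        beta = beta + step * grad
    return beta
```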
Figure 4.6.
QQ plots of Danish fire insurance claim size sample versus fitted bivariate matrix–Pareto distribution using Algorithm 4.2.1.

This bivariate data set was recently also studied in Albrecher et al. [2, Sec. 4.5.2], where a splicing model with a bivariate mixed Erlang distribution for the body and a bivariate Generalized Pareto distribution (GPD) for the tail was proposed.
Figure 4.7.
Contour plot of Danish fire insurance claim size sample (left), contour plot of the Matrix–Pareto distribution fitted using Algorithm 4.2.1 (middle) and contour plot of a simulated sample from the fitted distribution (right).

That approach required a threshold selection for the fitting of the tails, and univariate extreme value analysis led there to values of regular variation of around 2 for the building component and 1.67 for the contents component. Even though our estimates are further away from these values than those of their bivariate model (1.75 and 1.54, respectively), we would like to emphasize that the fitting of a matrix–Pareto distribution does not require any threshold selection. Furthermore, if we were to use more phases and a general form of the sub-intensity and reward matrices, the fit would quickly improve and for about 6 phases reach the accuracy of the bivariate GPD model; but then the overall number of parameters compared to the size of the present data set may not be considered commensurate, which is why we stick to the above choice. Note that our proposed procedure is fully automatic, and the respective implementation can easily be applied to any other data set as well.
4.4. Multivariate Matrix–Weibull models.
Let X = (g_1(Y^(1)), …, g_d(Y^(d)))′, where Y ∼ MPH*(π, T, R) and g_j(y) = y^{1/β_j}, β_j > 0, j = 1, …, d; then we say that X has a multivariate matrix–Weibull distribution. Some special properties of this type of distribution are:
(1) Marginal distributions are matrix–Weibull distributed.
(2) For a vector a = (a^(1), …, a^(d)) with a^(j) > 0, j = 1, …, d, the vector Δ(a)X is multivariate matrix–Weibull distributed.
For the bivariate case we get
\[
f_{X}\big(x^{(1)}, x^{(2)}\big) = \boldsymbol{\alpha} \, e^{T_{11} (x^{(1)})^{\beta_1}} T_{12} \, e^{T_{22} (x^{(2)})^{\beta_2}} (-T_{22}) \, \boldsymbol{e} \; \beta_1 (x^{(1)})^{\beta_1 - 1} \beta_2 (x^{(2)})^{\beta_2 - 1}
\]
and
\[
\bar F_{X}\big(x^{(1)}, x^{(2)}\big) = \boldsymbol{\alpha} \, (-T_{11})^{-1} e^{T_{11} (x^{(1)})^{\beta_1}} T_{12} \, e^{T_{22} (x^{(2)})^{\beta_2}} \boldsymbol{e} .
\]
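A bivariate vector of this type can be simulated by tracking the occupation times of the underlying Markov jump process in the two blocks of states and then applying the transformation x = y^{1/β}. The sketch below assumes that the rewards are simply the block indicators (the feed-forward structure used in the bivariate formulas above); function names and the data layout are our own.

```python
import numpy as np

def sim_block_mph(alpha, T, p1, size, rng=None):
    """Simulate (Y1, Y2): occupation times of the first p1 and the remaining
    transient states of a Markov jump process with sub-intensity matrix T and
    initial distribution alpha (the process ends at absorption)."""
    rng = rng if rng is not None else np.random.default_rng()
    p = T.shape[0]
    exit_rates = -T @ np.ones(p)                 # rates into the absorbing state
    out = np.empty((size, 2))
    for n in range(size):
        y = np.zeros(2)
        state = rng.choice(p, p=alpha)
        while True:
            rate = -T[state, state]
            y[0 if state < p1 else 1] += rng.exponential(1.0 / rate)
            probs = np.append(T[state], exit_rates[state])
            probs[state] = 0.0                   # no self-transitions
            nxt = rng.choice(p + 1, p=probs / rate)
            if nxt == p:                         # absorption
                break
            state = nxt
        out[n] = y
    return out

def to_matrix_weibull(y, betas):
    """Apply x^(j) = (y^(j))^(1/beta_j) componentwise to an MPH* sample."""
    return np.asarray(y) ** (1.0 / np.asarray(betas))
```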
Remark 4.4.1. The Marshall–Olkin Weibull distribution (see [24]) is a particular case of this distribution.

4.4.1. Parameter estimation.
In contrast to Section 4.3.1, all transformations are parameter-dependent, and the fitting procedures of the previous subsection are not applicable. However, we can apply Algorithm 4.2.1 in the bivariate case.
Example 4.4.1. (Bivariate Matrix–Weibull) We generated an i.i.d. sample of size 5 000 from a bivariate random vector with matrix–Weibull marginals (both with shape parameters less than one), coupled by a Gaussian copula with correlation ρ = 0.5. While any copula, or also simply a bivariate matrix–Weibull based on an MPH* construction, could be used, we choose the Gaussian copula here to illustrate that the algorithm is able to work with any type of dependence structure. This distribution has theoretical mean E(X) = (18.…, …)′; the sample has empirical mean Ê(X) = (18.…, …)′ and ρ̂_τ = 0.…. We fitted a model with p_1 = p_2 = 3 using Algorithm 4.2.1 with 1 500 steps (with a running time of 3 930 seconds for a step-length of 10^{−…}), obtaining estimates α̂, T̂, R̂, β̂_1 and β̂_2. One sees that the algorithm estimates the shape parameters of the matrix–Weibull marginals reasonably well. The fitted distribution has mean Ê(X) = (18.…, …)′, and from simulated data we obtain an estimate of ρ_τ as well. The log-likelihood of the fitted distribution is −39 748, which is to be compared with the log-likelihood −39 687.19 using the original distribution.
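The marginal QQ plots in Figure 4.8 compare sample quantiles with quantiles of the fitted matrix–Weibull marginals. The latter have no closed form, but they are easy to obtain by a root search on the marginal distribution function; the following is a sketch with our own function names and an illustrative bracketing interval.

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import brentq

def matrix_weibull_cdf(x, alpha, T, beta):
    """F(x) = 1 - alpha exp(T x^beta) e for a univariate matrix-Weibull marginal."""
    return 1.0 - float(alpha @ expm(T * x ** beta) @ np.ones(T.shape[0]))

def matrix_weibull_quantile(u, alpha, T, beta, upper=1e6):
    """Quantile function via root search; usable for QQ plots against the ordered sample."""
    return brentq(lambda x: matrix_weibull_cdf(x, alpha, T, beta) - u, 1e-12, upper)
```

The bracket (1e-12, upper) must contain the quantile; for very heavy tails or extreme probability levels the upper bound may need to be enlarged.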
Figure 4.8.
QQ plots of sample versus fitted bivariate matrix–Weibull distribution using Algorithm 4.2.1.
Figure 4.9.
Contour plot of simulated sample (left) and contour plot of a simulated sample from the bivariate Matrix–Weibull distribution fitted using Algorithm 4.2.1 (right).
Figure 4.10.
Contour plot of the original bivariate matrix–Weibull density (left) and contour plot of the bivariate Matrix–Weibull distribution fitted using Algorithm 4.2.1 (right).
Remark 4.4.2.
In all examples of this section the marginals were assumed to be of the same type (both matrix–Pareto or both matrix–Weibull). We would like to mention that the generality of Algorithm 4.2.1 also allows fitting models with marginals of different types (e.g. one marginal matrix–Pareto and the other matrix–Weibull).

5. Conclusion
In this paper we provided a guide for the statistical fitting of homogeneous and inhomogeneous phase–type distributions to data, both for the univariate and the multivariate case. For that purpose, we derived a new EM algorithm for IPH distributions that are obtained through parameter-dependent transformations. In addition, we introduced new classes of multivariate distributions with IPH marginals and some attractive properties. As a by-product, we amended the estimation method proposed by Breuer [16] for the homogeneous MPH* case and illustrated its usefulness and flexibility. We furthermore discussed extensions for censored data and the fitting of the phase-type classes to given continuous joint distribution functions. The performance of the proposed algorithms was exemplified in various numerical examples, both on simulated and real data. In order to facilitate the implementation of the proposed algorithms for fitting this general class of distributions to given data, a respective R package is in preparation and will be made available on CRAN.

Acknowledgement
We are grateful to Steffen L. Lauritzen for some important clarifications concerning the EM algorithm.
References

[1] Liselotte Ahlström, Marita Olsson, and Olle Nerman. A parametric estimation procedure for relapse time distributions. Lifetime Data Analysis, 5(2):113–132, 1999.
[2] Hansjörg Albrecher, Jan Beirlant, and Jozef L. Teugels. Reinsurance: Actuarial and Statistical Aspects. John Wiley & Sons, Chichester, 2017.
[3] Hansjörg Albrecher, Martin Bladt, and Mogens Bladt. Matrix Mittag–Leffler distributions and modeling heavy-tailed risks. Extremes, 2020. To appear. doi: 10.1007/s10687-020-00377-0.
[4] Hansjörg Albrecher, Martin Bladt, and Mogens Bladt. Multivariate fractional phase-type distributions. arXiv preprint arXiv:2003.11122, 2020.
[5] Hansjörg Albrecher, Martin Bladt, and Mogens Bladt. Multivariate matrix Mittag–Leffler distributions. Ann. Inst. Statist. Math., 2020. In press. doi: 10.1007/s10463-020-00750-7.
[6] Hansjörg Albrecher and Mogens Bladt. Inhomogeneous phase-type distributions and heavy tails. Journal of Applied Probability, 56(4):1044–1064, 2019.
[7] Barry C. Arnold. Pareto Distributions. Chapman and Hall/CRC, 2015.
[8] Søren Asmussen, Patrick J. Laub, and Hailiang Yang. Phase-type models in life insurance: Fitting and valuation of equity-linked benefits. Risks, 7(1):17, 2019.
[9] Søren Asmussen, Olle Nerman, and Marita Olsson. Fitting phase-type distributions via the EM algorithm. Scandinavian Journal of Statistics, pages 419–441, 1996.
[10] Jan Beirlant, Yuri Goegebeur, Johan Segers, and Jozef L. Teugels. Statistics of Extremes: Theory and Applications. John Wiley & Sons, Chichester, 2004.
[11] Mogens Bladt, Antonio Gonzalez, and Steffen L. Lauritzen. The estimation of phase-type related functionals using Markov chain Monte Carlo methods. Scandinavian Actuarial Journal, 2003(4):280–300, 2003.
[12] Mogens Bladt and Bo Friis Nielsen. Matrix-Exponential Distributions in Applied Probability. Springer, 2017.
[13] Mogens Bladt, Bo Friis Nielsen, and Oscar Peralta. Parisian types of ruin probabilities for a class of dependent risk-reserve processes. Scandinavian Actuarial Journal, 2019(1):32–61, 2019.
[14] Mogens Bladt, Bo Friis Nielsen, and Gennady Samorodnitsky. Calculation of ruin probabilities for a dense class of heavy tailed distributions. Scandinavian Actuarial Journal, 2015(7):573–591, 2015.
[15] Mogens Bladt and Leonardo Rojas-Nandayapa. Fitting phase-type scale mixtures to heavy-tailed data and distributions. Extremes, pages 1–29, 2017.
[16] Lothar Breuer. A semi-explicit density function for Kulkarni's bivariate phase-type distribution. Stochastic Models, 32(4):632–642, 2016.
[17] Jun Cai and Haijun Li. Conditional tail expectations for multivariate phase-type distributions. Journal of Applied Probability, 42(3):810–825, 2005.
[18] Jun Cai and Haijun Li. Multivariate risk model of phase type. Insurance: Mathematics and Economics, 36(2):137–152, 2005.
[19] C. G. Camarda. MortalitySmooth: An R package for smoothing Poisson counts with P-splines. Journal of Statistical Software, 50:1–24, 2012.
[20] Richard A. Davis and Sidney I. Resnick. Limit theory for bilinear processes with heavy-tailed noise. The Annals of Applied Probability, 6(4):1191–1210, 1996.
[21] B. Gompertz. On the nature of the function expressive of the law of human mortality, and on a new mode of determining the value of life contingencies. Philosophical Transactions of the Royal Society of London, 115:513–583, 1825.
[22] Bettina Grün and Tatjana Miljkovic. Extending composite loss models using a general framework of advanced computational tools. Scandinavian Actuarial Journal, 2019(8):642–660, 2019.
[23] David D. Hanagal. A multivariate Pareto distribution. Communications in Statistics – Theory and Methods, 25(7):1471–1488, 1996.
[24] David D. Hanagal. A multivariate Weibull distribution. Economic Quality Control, 11:193–200, 1996.
[25] Alexander Herbertsson. Modelling default contagion using multivariate phase-type distributions. Review of Derivatives Research, 14(1):1–36, 2011.
[26] W. F. Kibble. A two-variate gamma type distribution. Sankhyā: The Indian Journal of Statistics, pages 137–150, 1941.
[27] Vidyadhar G. Kulkarni. A new class of multivariate phase type distributions. Operations Research, 37(1):151–158, 1989.
[28] Larry Lee. Multivariate distributions having Weibull properties. Journal of Multivariate Analysis, 9(2):267–277, 1979.
[29] Kanti V. Mardia. Multivariate Pareto distributions. The Annals of Mathematical Statistics, 33(3):1008–1015, 1962.
[30] Alexander J. McNeil, Rüdiger Frey, and Paul Embrechts. Quantitative Risk Management: Concepts, Techniques and Tools. Revised edition. Princeton University Press, 2015.
[31] Marcel F. Neuts. Algorithmic Probability: A Collection of Problems, volume 3. CRC Press, 1995.
[32] M. F. Neuts. Probability distributions of phase type. In Liber Amicorum Professor Emeritus H. Florin, pages 173–206. Department of Mathematics, University of Louvain, Belgium, 1975.
[33] Hiroyuki Okamura, Tadashi Dohi, and Kishor S. Trivedi. A refined EM algorithm for PH distributions. Performance Evaluation, 68(10):938–954, 2011.
[34] Marita Olsson. Estimation of phase-type distributions from censored data. Scandinavian Journal of Statistics, pages 443–460, 1996.
[35] Marita Olsson. The EMpht programme. Manual, Chalmers University of Technology and Göteborg University, 1998.
[36] Alessio Sancetta and Stephen Satchell. The Bernstein copula and its applications to modeling and approximations of multivariate distributions. Econometric Theory, 20(3):535–562, 2004.
[37] Charles Van Loan. Computing integrals involving the matrix exponential. IEEE Transactions on Automatic Control, 23(3):395–404, 1978.
Department of Actuarial Science, Faculty of Business and Economics, University of Lausanne and Swiss Finance Institute, UNIL-Dorigny, 1015 Lausanne
E-mail address: [email protected]

Department of Mathematics, University of Copenhagen, Universitetsparken 5, DK-2100 Copenhagen, Denmark
E-mail address: [email protected]

Department of Mathematics, University of Copenhagen, Universitetsparken 5, DK-2100 Copenhagen, Denmark
E-mail address: