Uniform error bounds for a continuous approximation of non-negative random variables
Bernoulli 16(2), 2010, 561–584. DOI: 10.3150/09-BEJ209
CARMEN SANGÜESA
Departamento de Métodos Estadísticos, Facultad de Ciencias, Universidad de Zaragoza, Pedro Cerbuna 12, 50009 Zaragoza, Spain. E-mail: [email protected]
In this work, we deal with approximations for distribution functions of non-negative random variables. More specifically, we construct continuous approximants using an acceleration technique over a well-known inversion formula for Laplace transforms. We give uniform error bounds using a representation of these approximations in terms of gamma-type operators. We apply our results to certain mixtures of Erlang distributions which contain the class of continuous phase-type distributions.
Keywords: gamma distribution; Laplace transform; phase-type distribution; uniform distance
1. Introduction
Frequent operations in probability, such as convolution or random summation of random variables, produce probability distributions which are difficult to evaluate explicitly. In these cases, one needs numerical evaluation methods. For instance, one can numerically invert the Laplace or Fourier transform of the distribution at hand (see [2] for the general use of Laplace–Stieltjes transforms in applied probability, or [9, 11] for the fast Fourier transform in the context of risk theory). Another approach is the use of recursive evaluation methods, of special interest for random sums (see [11, 18], for instance). Some of the methods mentioned above require a previous discretization step when the initial random variables are continuous. The usual way to do so is by means of rounding methods. However, it is not always possible to evaluate the distribution of the rounded random variable explicitly, and it is not always clear how the rounding error propagates when one takes successive convolutions. In these cases, it seems worthwhile to consider alternative discretization methods. For instance, for non-negative random variables, the following method ([10], page 233) has been proposed in the literature. Let $X$ be a random variable taking values in $[0,\infty)$ with distribution function $F$. Denote by $\phi_X(\cdot)$ the Laplace–Stieltjes (LS) transform of $X$, that is,
$$\phi_X(t) := E e^{-tX} = \int_{[0,\infty)} e^{-tu}\,dF(u), \qquad t > 0.$$
For each $t > 0$, we define a random variable $X^\bullet_t$ taking values on $\{k/t,\ k \in \mathbb{N}\}$ and such that
$$P(X^\bullet_t = k/t) = \frac{(-t)^k}{k!}\,\phi_X^{(k)}(t), \qquad k \in \mathbb{N}, \tag{1}$$
where $\phi_X^{(k)}$ denotes the $k$th derivative ($\phi_X^{(0)} \equiv \phi_X$). Thus, if we denote by $L^*_t F$ the distribution function of $X^\bullet_t$, we have
$$L^*_t F(x) := P(X^\bullet_t \le x) = \sum_{k=0}^{[tx]} \frac{(-t)^k}{k!}\,\phi_X^{(k)}(t), \qquad x \ge 0, \tag{2}$$
where $[x]$ indicates the largest integer less than or equal to $x$. This method yields the probability mass function explicitly in some situations in which rounding methods do not (see, for instance, [4] for gamma distributions). Moreover, it allows for an easy representation of $L^*_t F$ in terms of $F$, which makes possible the study of rates of convergence in the approximation ([4, 5]). In [4], the problem was studied in a general setting, whereas in [5], a detailed analysis was carried out for the case of gamma distributions, that is, distributions whose density function is given by
$$f_{a,p}(x) := \frac{a^p x^{p-1} e^{-ax}}{\Gamma(p)}, \qquad x > 0. \tag{3}$$
Also, in [16], error bounds for random sums of mixtures of gamma distributions were obtained, uniformly controlled on the parameters of the random summation index. In all of these papers, the measure of distance considered was the Kolmogorov (or sup-norm) distance. More specifically, for a given real-valued function $f$ defined on $[0,\infty)$, we denote by $\|f\|$ the sup-norm, that is, $\|f\| := \sup_{x \ge 0} |f(x)|$. It was shown in [5] that for gamma distributions with shape parameter $p \ge 1$, $\|L^*_t F - F\|$ is of order $1/t$, the length of the discretization interval. Note that $\|L^*_t F - F\|$ is the Kolmogorov distance between $X$ and $X^\bullet_t$, as both are non-negative random variables.

The aim of this paper is twofold. First, we will consider a continuous modification of (2) and give conditions under which this continuous modification has rate of convergence $1/t^2$ instead of $1/t$ (see Sections 2 and 3). In Section 4, we will consider the case of gamma distributions to see that the error bounds are also uniform on the shape parameter. In Section 5, we apply these results to mixtures of Erlang distributions, a class which contains the continuous phase-type distributions.
2. The approximation procedure
The representation of $L^*_t F$ in (2) in terms of a gamma process (see [4]) will play an important role in our proofs. We recall this representation. Let $(S(u),\ u \ge 0)$ be a gamma process, in which $S(0) = 0$ and such that, for $u > 0$, each $S(u)$ has a gamma density with parameters $a = 1$ and $p = u$, as given in (3). Let $g$ be a function defined on $[0,\infty)$. We consider the gamma-type operator $L_t$ given by
$$L_t g(x) := E g\!\left(\frac{S(tx)}{t}\right), \qquad x \ge 0,\ t > 0, \tag{4}$$
provided that this operator is well defined, that is, $L_t|g|(x) < \infty$, $x \ge 0$, $t > 0$. Then, for $F$ continuous on $(0,\infty)$, $L^*_t F$ in (2) can be written as (see [4], page 228)
$$L^*_t F(x) = L_t F\!\left(\frac{[tx]+1}{t}\right) = E F\!\left(\frac{S([tx]+1)}{t}\right), \qquad x \ge 0,\ t > 0. \tag{5}$$
It can be seen that the rates of convergence of $L_t g$ to $g$ are, at most, of order $1/t$ (see (40) below). Our aim now is to get faster rates of convergence. To this end, we will consider the following operator, built using a classical acceleration technique (Richardson's extrapolation; see, for instance, [9, 11]):
$$L^{[2]}_t g(x) := 2 L_{2t} g(x) - L_t g(x) = 2 E g\!\left(\frac{S(2tx)}{2t}\right) - E g\!\left(\frac{S(tx)}{t}\right), \qquad x \ge 0. \tag{6}$$
We will obtain a rate of uniform convergence of $L^{[2]}_t g$ to $g$, of order $1/t^2$, on the following class of functions:
$$D_1 := \{g \in C^4([0,\infty)) : \|x^2 g^{iv}(x)\| < \infty\}. \tag{7}$$
The problem with $L^{[2]}_t g$ is that when $tx$ is not a natural number, $L_t g(x)$ is given in terms of Weyl fractional derivatives of the Laplace transform (see [6], page 92) and, in general, we are not able to compute them explicitly. However, if we modify $L^{[2]}_t g$ using linear interpolation, that is,
$$M^{[2]}_t g(x) := (tx - [tx])\, L^{[2]}_t g\!\left(\frac{[tx]+1}{t}\right) + ([tx]+1-tx)\, L^{[2]}_t g\!\left(\frac{[tx]}{t}\right), \tag{8}$$
then the order of convergence of $M^{[2]}_t g$ to $g$ is also $1/t^2$ on the following class of functions:
$$D_2 := \{g \in C^4([0,\infty)) : \|g''(x)\| < \infty \text{ and } \|x^2 g^{iv}(x)\| < \infty\}. \tag{9}$$
Moreover, the advantage of using $M^{[2]}_t g$ instead of $L^{[2]}_t g$ to approximate $g$ is its computability. In the following result, we note that this approximation, applied to a distribution function $F$, is related to $L^*_t F$, as defined in (2). From now on, $\mathbb{N}^*$ will denote the set $\mathbb{N} \setminus \{0\}$.

Proposition 2.1.
Let $X$ be a non-negative random variable with distribution function $F$ and Laplace transform $\phi_X$. Let $L^*_t F$, $t > 0$, be as defined in (2) and let $M^{[2]}_t F$ be as defined in (8). We have
$$M^{[2]}_t F\!\left(\frac{k}{t}\right) = \begin{cases} F(0), & k = 0, \\[4pt] 2 L^*_{2t} F\!\left(\dfrac{2k-1}{2t}\right) - L^*_t F\!\left(\dfrac{k-1}{t}\right), & k \in \mathbb{N}^*, \end{cases} \tag{10}$$
and
$$M^{[2]}_t F(x) = (tx-[tx])\, M^{[2]}_t F\!\left(\frac{[tx]+1}{t}\right) + ([tx]+1-tx)\, M^{[2]}_t F\!\left(\frac{[tx]}{t}\right). \tag{11}$$

Proof.
Let $t > 0$. First, note that, by (8),
$$M^{[2]}_t F\!\left(\frac{k}{t}\right) = L^{[2]}_t F\!\left(\frac{k}{t}\right), \qquad k \in \mathbb{N}. \tag{12}$$
Now, using (6) and (4), we have $M^{[2]}_t F(0) = L^{[2]}_t F(0) = F(0)$, which shows (10) for $k = 0$. Finally, using (6), (4) and (5), we have, for $k \in \mathbb{N}^*$,
$$L^{[2]}_t F\!\left(\frac{k}{t}\right) = 2 E F\!\left(\frac{S(2k)}{2t}\right) - E F\!\left(\frac{S(k)}{t}\right) = 2 L^*_{2t} F\!\left(\frac{2k-1}{2t}\right) - L^*_t F\!\left(\frac{k-1}{t}\right). \tag{13}$$
Thus, (12) and (13) show (10) for $k \in \mathbb{N}^*$. Note that (11) is obvious by (8) and (12). This completes the proof of Proposition 2.1. □

In the following example, we illustrate the use of the previous approximant in the context of random sums, defined in the following way. Let $(X_i)_{i \in \mathbb{N}^*}$ be a sequence of independent, identically distributed non-negative random variables. Let $M$ be a random variable concentrated on the non-negative integers, independent of $(X_i)_{i \in \mathbb{N}^*}$. Consider the random variable
$$\sum_{i=1}^{M} X_i, \tag{14}$$
with the convention that the empty sum is 0.

Example 2.1.
As pointed out in the Introduction, an explicit expression for the distribution of (14) is usually not possible. Our aim is to consider an example in which this distribution can be evaluated explicitly and to compare our approximation method with some others considered in the literature. To this end, we take $M$ geometrically distributed with parameter $p$, that is, $P(M = k) = (1-p)^k p$, $k \in \mathbb{N}$, and $(X_i)_{i \in \mathbb{N}^*}$ exponentially distributed (with mean 1, for the sake of simplicity). In this case, it is well known (use LS transforms, for instance) that (14) has the same distribution as a mixture of the degenerate distribution at 0 (with probability $p$) and an exponential distribution, that is,
$$F(x) := P\!\left(\sum_{i=1}^{M} X_i \le x\right) = p + (1-p)(1 - e^{-px}) = 1 - (1-p)e^{-px}, \qquad x \ge 0. \tag{15}$$
When an explicit expression is not available, the usual approximate evaluation method is to discretize the summands in (14) and then use recursive methods found in the literature for discrete random sums. By considering (1) as a first discretization method, we have (see [5], page 391)
$$P\!\left(X^\bullet_t = \frac{k}{t}\right) = \left(\frac{t}{t+1}\right)^{k} \frac{1}{t+1}, \qquad k = 0, 1, \dots. \tag{16}$$
Thus, $t\sum_{i=1}^{M} X^\bullet_{t,i}$ is a geometric sum of geometric distributions with parameter $r = (1+t)^{-1}$. It is easy to check (use LS transforms, for instance) that the distribution of such a random variable is a mixture of the degenerate distribution at 0 (with probability $p$) and a geometric distribution with parameter $p^* = 1 - (1-r)(1-(1-p)r)^{-1} = 1 - t(t+p)^{-1}$, so that, for each $k \in \mathbb{N}$,
$$L^*_t F\!\left(\frac{k}{t}\right) = P\!\left(\sum_{i=1}^{M} X^\bullet_{t,i} \le \frac{k}{t}\right) = p + (1-p)(1 - (1-p^*)^{k+1}) = 1 - (1-p)\left(\frac{t}{t+p}\right)^{k+1}. \tag{17}$$
Note that the first equality in (17) follows by recalling (2) and noting that $(\sum_{i=1}^{M} X_i)^\bullet_t$ has the same distribution as $\sum_{i=1}^{M} X^\bullet_{t,i}$ (see [16], Proposition 2.1).
Actually, a more natural way (in this case) to compute (17) is to evaluate the LS transform of $(\sum_{i=1}^{M} X_i)^\bullet_t$ and then apply (1) and (2). However, the previous computations enable easier comparisons with the following method. In fact, one of the most obvious (and widely used) ways to discretize the summands in (14) is a rounding method. For instance, a rounding-down method (we round $X_i$ to $[tX_i]t^{-1}$) yields
$$P\!\left(\frac{[tX]}{t} = \frac{k}{t}\right) = P\!\left(\frac{k}{t} \le X < \frac{k+1}{t}\right) = e^{-k/t}(1 - e^{-1/t}), \qquad k \in \mathbb{N}. \tag{18}$$
Table 1. Comparison of different approximation methods for (15), with $t = 5$ and $p = 0.1$: columns $x = k/5$, $F(x)$, $L^*_5 F(x)$, $R_5 F(x)$ and $W_5 F(x)$. [The numerical entries are not reproduced here.]

In this case, $\sum_{i=1}^{M} [tX_i]$ is a geometric sum of geometric distributions with parameter $r' = 1 - e^{-1/t}$. We denote by $R_t F$ the distribution function of $\sum_{i=1}^{M} [tX_i]t^{-1}$. Using the same arguments as for (17), we obtain, for each $k \in \mathbb{N}$, that
$$R_t F\!\left(\frac{k}{t}\right) = P\!\left(\sum_{i=1}^{M} \frac{[tX_i]}{t} \le \frac{k}{t}\right) = 1 - (1-p)\left(\frac{e^{-1/t}}{1 - (1-p)(1 - e^{-1/t})}\right)^{k+1}. \tag{19}$$
Finally, it would be interesting to compare the previous 'discretization methods' with a 'transform method.' To this end, we consider the Laplace transform of $F$ in (15) (instead of its LS transform), that is,
$$w_F(\theta) = \int_0^\infty e^{-\theta u} F(u)\,du = \frac{1}{\theta} - \frac{1-p}{\theta + p}, \qquad \theta > 0,$$
and apply the Post–Widder inversion formula (see [10], page 233), defined for $t \in \mathbb{N}^*$ as
$$W_t F(x) = \frac{(-1)^{t-1}}{(t-1)!}\left(\frac{t}{x}\right)^{t} w_F^{(t-1)}\!\left(\frac{t}{x}\right) = 1 - (1-p)\,\frac{t^t}{(px+t)^t}, \qquad x > 0.$$
In Table 1 (computations with MATLAB), we consider a 'rough' discretization interval ($t = 5$) and a small $p$ ($p = 0.1$), and present, for different $x = k/5$, the exact values of $F$ (column 2), the $L^*_5$ approximation (column 3), the 'rounding down' discretization (column 4) and the Post–Widder inversion (column 5).

As we can see in Table 1, $L^*_5 F$ provides a better approximation than $R_5 F$. The intuitive explanation of this fact is that, when approximating $\sum_{i=1}^M X_i$ by $\sum_{i=1}^M X^\bullet_{t,i}$, the error in the approximation can be controlled 'uniformly,' regardless of the distribution of $M$ (see [16], Theorem 4.3). This effect is obvious when we choose $M$ with a large expected value (our choice of a small $p$ is for this reason; for larger values of $p$ checked, $L^*_5 F$ is also better, but the difference is less appreciable). However, if we compare the approximations $L^*_5 F$ and $W_5 F$, we see that the latter is better for small values, whereas the former is better for large values. To explain this fact, it is interesting to point out that $W_t F$, like $L^*_t F$, admits the following well-known representation. For a function $g$ defined on $[0,\infty)$, we can write, as in (5) (see [10], pages 220, 223),
$$W_t g(x) = E g\!\left(\frac{x S(t)}{t}\right), \qquad x > 0. \tag{20}$$
Note that the mean of the 'random points' defining $W_t$ in (20) is $E(x t^{-1} S(t)) = x$, whereas for $L^*_t$ in (5), we have $E(t^{-1} S([tx]+1)) = t^{-1}([tx]+1)$. This means that $W_t$ is centered at $x$, whereas $L^*_t$ is 'biased'. The benefits of this property for $W_t$ are observed at small values in Table 1. However, we have $\mathrm{Var}(x t^{-1} S(t)) = t^{-1} x^2$, whereas $\mathrm{Var}(t^{-1} S([tx]+1)) = t^{-2}([tx]+1)$, the latter being of order $t^{-1} x$, as $t \to \infty$. The greater variability of the random variables defining $W_t$ for greater values of $x$ produces an undesired effect in the approximation.

We now show the improvement in the approximation which occurs when using $M^{[2]}_t$, as defined in (10), instead of $L^*_t$. In Table 2 below ($t = 5$), we compare $M^{[2]}_5 F$ (column 5) with the Richardson extrapolation of $W_t F$ (or Stehfest enhancement of order two for the Post–Widder formula; see [1], page 40), that is,
$$G^{[2]}_t F(x) := 2 W_{2t} F(x) - W_t F(x), \qquad x > 0. \tag{21}$$

Table 2. Comparison of $M^{[2]}_5$ in (10) with $G^{[2]}_5$ in (21): columns $x = k/5$, $F(x)$, $L^*_{10} F((2k-1)/10)$, $L^*_5 F((k-1)/5)$, $M^{[2]}_5 F(x)$ and $G^{[2]}_5 F(x)$. [The numerical entries are not reproduced here.]

As we can see, $M^{[2]}_5 F$ provides us with an exact value up to four decimal places, whereas $G^{[2]}_5 F$ does not achieve this accuracy.
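Since the numerical entries of Tables 1 and 2 are lost in this reprint, the comparison can be regenerated from the closed forms (15), (17), (19), the Post–Widder expression and the identity (10). The sketch below does exactly that (the printed figures are recomputed by us, not copied from the original tables; all helper names are ours).

```python
from math import exp

# Recreating the comparisons of Example 2.1 with t = 5 and p = 0.1, using
# only the paper's closed-form expressions (15), (17), (19), the Post-Widder
# formula W_t, the Stehfest enhancement (21) and the identity (10).

p, t = 0.1, 5

F     = lambda x: 1 - (1 - p) * exp(-p * x)                               # (15)
Lstar = lambda k, tt: 1 - (1 - p) * (tt / (tt + p))**(k + 1)              # (17), x = k/tt
R     = lambda k: 1 - (1 - p) * (exp(-1/t) /
                                 (1 - (1 - p) * (1 - exp(-1/t))))**(k + 1)  # (19)
W     = lambda x, tt: 1 - (1 - p) * tt**tt / (p * x + tt)**tt             # Post-Widder

def M2(k):
    """(10): M2_t F(k/t) = 2 L*_{2t} F((2k-1)/(2t)) - L*_t F((k-1)/t)."""
    return F(0) if k == 0 else 2 * Lstar(2*k - 1, 2*t) - Lstar(k - 1, t)

for k in (1, 2, 5, 10):
    x = k / t
    print(x, F(x), Lstar(k, t), R(k), W(x, t), M2(k), 2*W(x, 2*t) - W(x, t))
```

Running this reproduces the qualitative behaviour described in the text: $L^*_5 F$ beats $R_5 F$ throughout, while $M^{[2]}_5 F$ matches $F$ to roughly four decimal places.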
3. Error bounds for the approximation
Let $g \in D_1$, as defined in (7). Our first aim is to give bounds of $\|L^{[2]}_t g - g\|$ in terms of $\|x^2 g^{iv}(x)\|$. To this end, we will use the following 'test function':
$$\phi(x) = \begin{cases} 0, & \text{if } x = 0, \\[4pt] \dfrac{x^2}{2}\left(\dfrac{3}{2} - \log(x)\right), & \text{otherwise}. \end{cases} \tag{22}$$
Observe that $\phi \in D_1$. In fact, by elementary calculus,
$$\phi'(x) = x(1 - \log x), \qquad \phi''(x) = -\log x, \qquad \phi'''(x) = -\frac{1}{x} \qquad \text{and} \qquad \phi^{iv}(x) = \frac{1}{x^2}. \tag{23}$$
In the next lemma, we make an explicit computation of $L_t \phi(x)$ in terms of the $\Psi$ (or digamma) function. This function is defined as (see [3], page 258)
$$\Psi(x) := \frac{d}{dx}\log(\Gamma(x)) = \frac{1}{\Gamma(x)}\int_0^\infty \log u\; e^{-u} u^{x-1}\,du, \qquad x > 0, \tag{24}$$
and, therefore, using the last equality, we have the following probabilistic expression of the psi function in terms of the gamma process:
$$\Psi(x) = E \log S(x), \qquad x > 0. \tag{25}$$
We will use the following property of this function (see [3], page 258):
$$\Psi(x+1) = \frac{1}{x} + \Psi(x). \tag{26}$$

Lemma 3.1.
Let $\phi$ be defined as in (22) and let $L_t$, $t > 0$, be defined as in (4). We have that
$$L_t \phi(x) = \frac{1}{2t^2}\left(\frac{3}{2}(tx)^2 - \frac{1}{2}\,tx - 1 + tx(tx+1)(-\Psi(tx) + \log(t))\right), \qquad x > 0. \tag{27}$$

Proof.
Let $t > 0$ and $x > 0$. We have
$$L_t \phi(x) = E\,\frac{S(tx)^2}{2t^2}\left(\frac{3}{2} - \log\!\left(\frac{S(tx)}{t}\right)\right) = \frac{1}{2t^2\,\Gamma(tx)}\int_0^\infty u^2\left(\frac{3}{2} - \log\!\left(\frac{u}{t}\right)\right)e^{-u}u^{tx-1}\,du \tag{28}$$
$$= \frac{(tx)(tx+1)}{2t^2\,\Gamma(tx+2)}\int_0^\infty \left(\frac{3}{2} - \log\!\left(\frac{u}{t}\right)\right)e^{-u}u^{tx+1}\,du = \frac{(tx)(tx+1)}{2t^2}\left(\frac{3}{2} - E\log\!\left(\frac{S(tx+2)}{t}\right)\right).$$
Hence, by (25),
$$L_t \phi(x) = \frac{(tx)(tx+1)}{2t^2}\left(\frac{3}{2} - \Psi(tx+2) + \log(t)\right). \tag{29}$$
Now, using (26) twice, we have
$$\Psi(tx+2) = \frac{2(tx)+1}{tx(tx+1)} + \Psi(tx). \tag{30}$$
By (29) and (30), we obtain
$$L_t \phi(x) = \frac{(tx)(tx+1)}{2t^2}\left(\frac{3}{2} - \frac{2(tx)+1}{tx(tx+1)} - \Psi(tx) + \log(t)\right).$$
The result follows using elementary algebra in the expression above. □
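As a numerical sanity check (ours, not part of the paper), the closed form (27) can be compared with a direct quadrature of $E\phi(S(tx)/t)$ against the gamma density; the digamma implementation below uses the recurrence (26) together with a standard asymptotic series.

```python
from math import log, exp, lgamma

def digamma(x):
    """Psi(x) via the recurrence (26) and a standard asymptotic series."""
    s = 0.0
    while x < 10.0:              # push the argument into the asymptotic regime
        s -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return s + log(x) - 0.5/x - inv2*(1/12 - inv2*(1/120 - inv2/252))

def phi(u):                      # test function (22)
    return 0.0 if u == 0.0 else 0.5 * u * u * (1.5 - log(u))

def Lt_phi_closed(x, t):         # formula (27)
    a = t * x
    return (1.5*a*a - 0.5*a - 1 + a*(a + 1)*(log(t) - digamma(a))) / (2*t*t)

def Lt_phi_numeric(x, t, n=200000, upper=60.0):
    """E phi(S(tx)/t) by the trapezoidal rule against the gamma(1, tx) pdf."""
    a = t * x
    h = upper / n
    total = 0.0
    for i in range(1, n):        # the integrand vanishes at both endpoints
        u = i * h
        total += phi(u / t) * exp((a - 1)*log(u) - u - lgamma(a))
    return total * h

x, t = 1.3, 4.0
print(Lt_phi_closed(x, t), Lt_phi_numeric(x, t))  # should agree closely
```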
In the next lemma, we will study the approximation properties of $L_t \phi$ to $\phi$. We will make use of the following inequalities for the psi function:
$$\frac{1}{2x} \le \log(x) - \Psi(x) \le \frac{1}{x}, \qquad x > 0; \tag{31}$$
$$\log(x) - \Psi(x) - \frac{1}{2x} \le \frac{1}{12x^2}, \qquad x > 0. \tag{32}$$
We can find (31) in [7], page 374, whereas (32) is an immediate consequence of the fact that the function
$$\Psi(x) - \log(x) + \frac{1}{2x} + \frac{1}{12x^2}$$
is completely monotonic (see [15], page 304) and thus non-negative.

Lemma 3.2.
Let $\phi$ be as defined in (22) and let $L_t$, $t > 0$, be as defined in (4). We have
$$\left\|L_t \phi(x) - \phi(x) + \frac{x\log x}{2t} + \frac{1}{3t^2}\right\| \le \frac{3}{8t^2}. \tag{33}$$

Proof.
Let $x > 0$ and $t > 0$. First, note that
$$\phi(x) = \frac{1}{2t^2}\left(\frac{3}{2}(tx)^2 - (tx)^2\log(tx) + (tx)^2\log(t)\right). \tag{34}$$
On the other hand,
$$\frac{x\log x}{2t} + \frac{1}{3t^2} = \frac{1}{2t^2}\left(tx\log(tx) - tx\log(t) + \frac{2}{3}\right). \tag{35}$$
Therefore, using Lemma 3.1, (34) and (35), we can write
$$L_t\phi(x) - \phi(x) + \frac{x\log x}{2t} + \frac{1}{3t^2} = \frac{1}{2t^2}\left((tx)^2\left(\log(tx) - \Psi(tx) - \frac{1}{2tx}\right) + tx\,(\log(tx) - \Psi(tx)) - \frac{1}{3}\right). \tag{36}$$
By (31), we have that $1/2 \le x(\log(x) - \Psi(x)) \le 1$, $x > 0$, and thus
$$\frac{1}{6} \le tx(\log(tx) - \Psi(tx)) - \frac{1}{3} \le \frac{2}{3}. \tag{37}$$
Thus, using (36), (37) and (32), we obtain (33). □

We are now in a position to state the following.
Theorem 3.1.
Let $g \in D_1$, as defined in (7), and let $L^{[2]}_t$, $t > 0$, be as defined in (6). We have
$$|L^{[2]}_t g(x) - g(x)| \le \frac{1}{6t^2}\,\|x g'''(x)\| + \frac{9}{16t^2}\,\|x^2 g^{iv}(x)\|, \qquad x \ge 0.$$

Proof.
We will first see that $g \in D_1$ implies that
$$\|x g'''(x)\| \le \|x^2 g^{iv}(x)\| < \infty. \tag{38}$$
To begin with, the fact that $\|x^2 g^{iv}(x)\| < \infty$ implies that $\lim_{x\to\infty} x^{\alpha} g^{iv}(x) = 0$ for all $0 < \alpha < 2$. By L'Hôpital's rule, we also have that $\lim_{x\to\infty} x^{\alpha} g'''(x) = 0$ for all $0 < \alpha < 1$, thus concluding that $\lim_{x\to\infty} g'''(x) = 0$. Using this fact, we can write
$$g'''(x) = -\int_x^\infty g^{iv}(u)\,du,$$
which easily implies (38), as
$$|x g'''(x)| \le x\int_x^\infty \frac{|u^2 g^{iv}(u)|}{u^2}\,du \le \|x^2 g^{iv}(x)\|.$$
Now, let $t > 0$ and let $L_t$ be as in (4). As a previous step, we will prove that
$$\left|L_t g(x) - g(x) - \frac{x g''(x)}{2t} - \frac{x g'''(x)}{3t^2}\right| \le \frac{3}{8t^2}\,\|x^2 g^{iv}(x)\|, \qquad x > 0. \tag{39}$$
To this end, let $x > 0$. Using a Taylor series expansion at the random point $u = S(tx)/t$ around $x$ and taking into account that $E(S(tx) - tx) = 0$, $E(S(tx) - tx)^2 = tx$ and $E(S(tx) - tx)^3 = 2tx$, we can write
$$L_t g(x) - g(x) = E g\!\left(\frac{S(tx)}{t}\right) - g(x) = \frac{x g''(x)}{2t} + \frac{x g'''(x)}{3t^2} + \frac{1}{6}\,E\!\int_x^{S(tx)/t} g^{iv}(\theta)\left(\frac{S(tx)}{t} - \theta\right)^3 d\theta. \tag{40}$$
Then, using (40), we get the bound
$$\left|L_t g(x) - g(x) - \frac{x g''(x)}{2t} - \frac{x g'''(x)}{3t^2}\right| = \frac{1}{6}\left|E\!\int_x^{S(tx)/t} g^{iv}(\theta)\left(\frac{S(tx)}{t} - \theta\right)^3 d\theta\right| \tag{41}$$
$$\le \frac{\|x^2 g^{iv}(x)\|}{6}\,E\!\int_{\min(x,\,S(tx)/t)}^{\max(x,\,S(tx)/t)} \left|\frac{S(tx)}{t} - \theta\right|^3 \frac{d\theta}{\theta^2} = \frac{\|x^2 g^{iv}(x)\|}{6}\,E\!\int_x^{S(tx)/t} \left(\frac{S(tx)}{t} - \theta\right)^3 \frac{d\theta}{\theta^2}.$$
Let $\phi(\cdot)$ be as in (22). Using (40) and (23), we have
$$L_t\phi(x) - \phi(x) + \frac{x\log x}{2t} + \frac{1}{3t^2} = \frac{1}{6}\,E\!\int_x^{S(tx)/t} \left(\frac{S(tx)}{t} - \theta\right)^3 \frac{d\theta}{\theta^2}. \tag{42}$$
Then, by (41) and (42), we can write
$$\left|L_t g(x) - g(x) - \frac{x g''(x)}{2t} - \frac{x g'''(x)}{3t^2}\right| \le \|x^2 g^{iv}(x)\|\left\|L_t\phi(x) - \phi(x) + \frac{x\log x}{2t} + \frac{1}{3t^2}\right\|.$$
Thus, (39) follows by applying Lemma 3.2.

Observe that in (39), the only term of order $1/t$ is the one involving the second derivative. By means of the operator $L^{[2]}_t$, as defined in (6), this term is eliminated. In fact, using (39) for $L_{2t}$ and $L_t$, we have
$$|L^{[2]}_t g(x) - g(x)| = \left|2(L_{2t} g(x) - g(x)) - (L_t g(x) - g(x))\right|$$
$$= \left|2\left(L_{2t} g(x) - g(x) - \frac{x g''(x)}{4t} - \frac{x g'''(x)}{12t^2}\right) - \left(L_t g(x) - g(x) - \frac{x g''(x)}{2t} - \frac{x g'''(x)}{3t^2}\right) - \frac{x g'''(x)}{6t^2}\right| \tag{43}$$
$$\le \frac{1}{6t^2}\,\|x g'''(x)\| + \frac{9}{16t^2}\,\|x^2 g^{iv}(x)\|.$$
This completes the proof of Theorem 3.1. □
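The acceleration step can be seen numerically (this illustration is ours). For $g(u) = e^{-u}$, the operator (4) is explicit: $L_t g(x) = E\,e^{-S(tx)/t} = (t/(t+1))^{tx}$, since integrating $e^{-s/t}$ against the gamma$(1, tx)$ density is the Laplace transform of $S(tx)$ at $1/t$. Then (6) is explicit as well, and one can watch the error drop from order $1/t$ to order $1/t^2$.

```python
from math import exp

# For g(u) = exp(-u), L_t g(x) = (t/(t+1))**(t*x) in closed form, so the
# convergence rates of L_t (order 1/t) and L^[2]_t (order 1/t^2) can be
# compared directly.

def L(x, t):       # L_t g(x) for g(u) = exp(-u), formula (4)
    return (t / (t + 1.0)) ** (t * x)

def L2(x, t):      # L^[2]_t g(x) = 2 L_{2t} g(x) - L_t g(x), formula (6)
    return 2.0 * L(x, 2 * t) - L(x, t)

x = 1.0
g = exp(-x)
for t in (10.0, 20.0, 40.0, 80.0):
    e1 = abs(L(x, t) - g)     # decays roughly like 1/t
    e2 = abs(L2(x, t) - g)    # decays roughly like 1/t^2
    print(t, e1, e2, e1 * t, e2 * t * t)
```

Doubling $t$ roughly halves the first error and roughly quarters the second, in line with Theorem 3.1.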
Finally, in the next result, we study the approximation properties of $M^{[2]}_t$.

Theorem 3.2.
Let $g \in D_2$, as defined in (9), and let $M^{[2]}_t$, $t > 0$, be as defined in (8). We have
$$\|M^{[2]}_t g - g\| \le \frac{1}{8t^2}\,\|g''(x)\| + \frac{1}{6t^2}\,\|x g'''(x)\| + \frac{9}{16t^2}\,\|x^2 g^{iv}(x)\|.$$

Proof.
First, note that $g \in D_2$ implies that $\|x g'''(x)\| < \infty$, thanks to (38). Now, let $t > 0$ and $x > 0$. By (8), we can write
$$M^{[2]}_t g(x) - g(x) = (tx-[tx])\left(L^{[2]}_t g\!\left(\frac{[tx]+1}{t}\right) - g\!\left(\frac{[tx]+1}{t}\right)\right) + ([tx]+1-tx)\left(L^{[2]}_t g\!\left(\frac{[tx]}{t}\right) - g\!\left(\frac{[tx]}{t}\right)\right)$$
$$+\, (tx-[tx])\left(g\!\left(\frac{[tx]+1}{t}\right) - g(x)\right) + ([tx]+1-tx)\left(g\!\left(\frac{[tx]}{t}\right) - g(x)\right). \tag{44}$$
Using the usual expansion
$$|g(y) - g(x) - (y-x)g'(x)| \le \frac{(y-x)^2}{2}\,\|g''\| \tag{45}$$
and taking into account that
$$(tx-[tx])\left(g\!\left(\frac{[tx]+1}{t}\right) - g(x)\right) + ([tx]+1-tx)\left(g\!\left(\frac{[tx]}{t}\right) - g(x)\right)$$
$$= (tx-[tx])\left(g\!\left(\frac{[tx]+1}{t}\right) - g(x) - \frac{[tx]+1-tx}{t}\,g'(x)\right) + ([tx]+1-tx)\left(g\!\left(\frac{[tx]}{t}\right) - g(x) - \frac{[tx]-tx}{t}\,g'(x)\right), \tag{46}$$
we obtain
$$\left|(tx-[tx])\left(g\!\left(\frac{[tx]+1}{t}\right) - g(x)\right) + ([tx]+1-tx)\left(g\!\left(\frac{[tx]}{t}\right) - g(x)\right)\right|$$
$$\le \left(\frac{(tx-[tx])([tx]+1-tx)^2}{2t^2} + \frac{([tx]+1-tx)([tx]-tx)^2}{2t^2}\right)\|g''\| = \frac{(tx-[tx])([tx]+1-tx)}{2t^2}\,\|g''\| \le \frac{1}{8t^2}\,\|g''\|, \tag{47}$$
the last inequality holding since, for each $k \in \mathbb{N}$, the supremum of $(u-k)(k+1-u)$, $k \le u \le k+1$, is attained at $u = k + 1/2$. On the other hand, taking into account Theorem 3.1, we have
$$\left|(tx-[tx])\left(L^{[2]}_t g\!\left(\frac{[tx]+1}{t}\right) - g\!\left(\frac{[tx]+1}{t}\right)\right) + ([tx]+1-tx)\left(L^{[2]}_t g\!\left(\frac{[tx]}{t}\right) - g\!\left(\frac{[tx]}{t}\right)\right)\right|$$
$$\le \|L^{[2]}_t g - g\| \le \frac{1}{6t^2}\,\|x g'''(x)\| + \frac{9}{16t^2}\,\|x^2 g^{iv}(x)\|. \tag{48}$$
The result follows by (44), (47) and (48). □
4. Application to gamma distributions
In this section, we will study the case of gamma distributions, that is, distributions with density functions as given in (3). It is not hard to see that these distributions are in the class $D_2$ for a shape parameter $p = 1$ or $p \ge 2$ and, therefore, we are in a position to apply Theorem 3.2. The aim of this section is to show that, in fact, the bounds in this theorem can be uniformly bounded on the shape parameter, which will be an advantage when dealing with mixtures of these distributions. From now on, we define
$$f_p(x) := \begin{cases} \dfrac{e^{-x} x^{p-1}}{\Gamma(p)}, & x > 0, \text{ if } p \in \mathbb{R}\setminus\{0,-1,-2,\dots\}, \\[6pt] 0, & x > 0, \text{ if } p \in \{0,-1,-2,\dots\}. \end{cases} \tag{49}$$
The 'odd' definition of $f_p$ for $p \in \{0,-1,-2,\dots\}$ is for notational convenience in (51). For $p > 0$, the function above is the density of a gamma random variable as in (3), with scale parameter $a = 1$. Results for another scale parameter will follow by a change of scale (see Proposition 5.1 below). First, we will consider the case $p = 1$, that is, an exponential random variable. As the distribution function of this random variable presents no computational problems, it makes no sense to approximate it. However, when we consider the problem of approximating a general mixture of gamma distributions, the exponential distribution could be a component.
Lemma 4.1.
Let $F(x) = 1 - e^{-x}$, $x \ge 0$. For $t > 0$, let $M^{[2]}_t F$ be as defined in (8). We have that
$$\|M^{[2]}_t F - F\| \le \left(\frac{1}{8} + \frac{1}{6e} + \frac{9}{4e^2}\right)\frac{1}{t^2}.$$
First of all, note that $|F^{(k)}(x)| = e^{-x}$ for $k \ge 1$ and that $\sup_{x\ge 0} x^k e^{-x} = k^k e^{-k}$, $k = 1, 2, \dots$. Thus, we have
$$\|F''\| = 1, \qquad \|x F'''(x)\| = e^{-1} \qquad \text{and} \qquad \|x^2 F^{iv}(x)\| = 4e^{-2}. \tag{50}$$
The conclusion follows by taking into account Theorem 3.2. □

We will now deal with the case $p \ge 2$.

Lemma 4.2.
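Lemma 4.1 can be checked numerically (a sketch of ours): for the unit exponential, summing the pmf (16) gives $L^*_t F(k/t) = 1 - (t/(t+1))^{k+1}$, and $M^{[2]}_t F$ is then fully explicit through (10) and (11).

```python
from math import exp

# Numerical check of Lemma 4.1 for the unit exponential F(x) = 1 - exp(-x).

def Lstar(k, t):              # L*_t F(k/t), from summing the geometric pmf (16)
    return 1.0 - (t / (t + 1.0)) ** (k + 1)

def M2_grid(k, t):            # formula (10)
    return 0.0 if k == 0 else 2.0 * Lstar(2*k - 1, 2*t) - Lstar(k - 1, t)

def M2(x, t):                 # formula (11): linear interpolation on the grid
    k = int(t * x)
    w = t * x - k
    return w * M2_grid(k + 1, t) + (1.0 - w) * M2_grid(k, t)

bound_const = 1.0/8 + 1.0/(6*exp(1)) + 9.0/(4*exp(2))   # ~ 0.49

for t in (2.0, 5.0, 10.0):
    # crude sup of the Kolmogorov distance over a fine grid
    err = max(abs(M2(x, t) - (1.0 - exp(-x)))
              for x in [i * 0.001 for i in range(1, 20000)])
    print(t, err, bound_const / t**2, err <= bound_const / t**2)
```

The observed sup-distance stays below the stated bound and shrinks by roughly a factor of four when $t$ doubles.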
Let $f_p(\cdot)$, $p > 0$, be as defined in (49). We have, for all $n \in \mathbb{N}$,
$$\frac{d^n}{dx^n} f_p(x) = \frac{e^{-x} x^{p-n-1}}{\Gamma(p)} \sum_{i=0}^n \binom{n}{i} (-1)^i \left(\prod_{j=1}^{n-i} (p-j)\right) x^i = \sum_{i=0}^n \binom{n}{i} (-1)^i f_{p-n+i}(x), \qquad x > 0, \tag{51}$$
in which $\prod_{j=1}^{0}(p-j) = 1$.

Next, we formulate a technical lemma in which we define certain decreasing functions which will be used to bound the weighted derivatives of $f_p$.

Lemma 4.3.
We have:

(i) the function
$$g_1(p) := \frac{1}{\Gamma(p)}\, e^{-(p-1)} (p-1)^{p-1}, \qquad p > 1 \quad (g_1(1) = 1), \tag{52}$$
is decreasing in $p$;

(ii) the function
$$g_2(p) := \frac{1}{\Gamma(p)}\, e^{-(p-1/2)-(1/2)\sqrt{4p-3}} \left(p - \frac{1}{2} + \frac{1}{2}\sqrt{4p-3}\right)^{p-1/2}, \qquad p \ge 1, \tag{53}$$
is decreasing in $p$;

(iii) the function
$$g_3(p) := \frac{1}{\Gamma(p)}\, e^{-(p-1-\sqrt{p-1})}\, \sqrt{p-1}\, \bigl(p-1-\sqrt{p-1}\bigr)^{p-2}, \qquad p > 2 \quad (g_3(2) = 1), \tag{54}$$
is decreasing in $p$;

(iv) the function
$$g_4(p) := \frac{1}{\Gamma(p)}\, e^{-(p-\sqrt{3p-2})} \bigl(\sqrt{3p-2}-1\bigr)^3 \bigl(p-\sqrt{3p-2}\bigr)^{p-2}, \qquad p > 2 \quad (g_4(2) = 1), \tag{55}$$
is decreasing in $p$.

In the following result, we obtain bounds for the quantities required in Theorem 3.2, depending on the shape parameter $p$, but also decreasing in $p$.

Lemma 4.4.
Let $f_p$ be as in (49) and let $g_i$, $i = 1, 2, 3, 4$, be as in Lemma 4.3. We have:

(i) $\sup_{x\ge 0} |f_p(x)| = g_1(p)$, $p \ge 1$;
(ii) $\sup_{x\ge 0} |x f'_p(x)| = g_2(p)$, $p \ge 1$;
(iii) $\sup_{x\ge 0} |f'_p(x)| = g_3(p)$, $p \ge 2$;
(iv) $\sup_{x\ge 0} |x f''_p(x)| \le \max\{g_1(p-1),\, g_2(p-1)\}$, $p \ge 2$;
(v) $\sup_{x\ge 0} |x^2 f'''_p(x)| \le g_4(p) + 3 g_2(p-1) + g_1(p-1)$, $p \ge 2$.

Proof.
To show part (i), it is clear that, for $p \ge 1$,
$$\sup_{x\ge 0} f_p(x) = f_p(p-1) = \frac{e^{-(p-1)}(p-1)^{p-1}}{\Gamma(p)},$$
and (i) follows by recalling (52). To show part (ii), we have (see [16], Remark 3.2 and Lemma 5.2)
$$\sup_{x\ge 0} |x f'_p(x)| = \frac{1}{\Gamma(p)}\left(p - \frac{1}{2} + \frac{1}{2}\sqrt{4p-3}\right)^{p-1/2} e^{-(p-1/2)-(1/2)\sqrt{4p-3}}, \qquad p \ge 1, \tag{56}$$
and (ii) follows by recalling (53). To show part (iii), by (51), we have, for $p \ge 2$,
$$f'_p(x) = \frac{1}{\Gamma(p)}\,e^{-x}x^{p-2}(p-1-x), \qquad x > 0, \tag{57}$$
$$f''_p(x) = \frac{1}{\Gamma(p)}\,e^{-x}x^{p-3}\bigl((p-1)(p-2) - 2(p-1)x + x^2\bigr), \qquad x > 0, \tag{58}$$
and it can be easily checked that the zeros of $f''_p(x)$ are $p_1 := p-1-\sqrt{p-1}$ and $p_2 := p-1+\sqrt{p-1}$. Therefore, $|f'_p(x)|$ must attain its maximum value at either $p_1$ or $p_2$. Actually, $p_1$ corresponds to the maximum. To show that, we will see that
$$\frac{f'_p(p_1)}{|f'_p(p_2)|} = e^{2\sqrt{p-1}}\left(\frac{\sqrt{p-1}-1}{\sqrt{p-1}+1}\right)^{p-2} \ge 1, \qquad p \ge 2. \tag{59}$$
To show the last inequality in (59), taking logarithms, we will prove that
$$r(p) := 2\sqrt{p-1} + (p-2)\bigl(\log(\sqrt{p-1}-1) - \log(\sqrt{p-1}+1)\bigr) \ge 0, \qquad p > 2. \tag{60}$$
Define
$$\rho(b) := \frac{2b}{b^2-1} + \log(b-1) - \log(b+1), \qquad b > 1.$$
Note that
$$r(p) = (p-2)\,\rho(\sqrt{p-1}), \qquad p > 2. \tag{61}$$
We will first prove that
$$\rho(b) \ge 0, \qquad b > 1. \tag{62}$$
To show (62), it is readily seen that $\rho'(b) = -4(b^2-1)^{-2}$, $b > 1$, so that $\rho$ is decreasing. As $\lim_{b\to\infty}\rho(b) = 0$, we have (62). This also implies (60), recalling (61). Therefore, we conclude that
$$\sup_{x>0}|f'_p(x)| = f'_p(p_1) = \frac{1}{\Gamma(p)}\,e^{-(p-1-\sqrt{p-1})}\,\sqrt{p-1}\,\bigl(p-1-\sqrt{p-1}\bigr)^{p-2}, \tag{63}$$
which, together with (54), shows (iii).

To show part (iv), note that, by using (51), we can write $f'_p(x) = f_{p-1}(x) - f_p(x)$ and, therefore,
$$x f''_p(x) = x f'_{p-1}(x) - x f'_p(x), \qquad x > 0,\ p \ge 2. \tag{64}$$
On the other hand, we see in (57) that $f'_{p-1}(x)$ and $f'_p(x)$ have the same sign for $0 < x < p-2$ and for $x > p-1$. Therefore, for $x$ outside $[p-2, p-1]$, (64) and part (ii) yield
$$|x f''_p(x)| \le \max\Bigl\{\sup_{x\ge 0}|x f'_{p-1}(x)|,\ \sup_{x\ge 0}|x f'_p(x)|\Bigr\} \le g_2(p-1), \tag{65}$$
the last inequality holding because $g_2$ is decreasing. For $x \in [p-2, p-1]$, we write, using (58),
$$|x f''_p(x)| = \frac{1}{\Gamma(p)}\,e^{-x}x^{p-2}\,\bigl|x^2 - 2(p-1)x + (p-1)(p-2)\bigr|. \tag{66}$$
On $[p-2, p-1]$, the function $e^{-x}x^{p-2}$ attains its maximum value at $p-2$, whereas the quadratic factor, whose roots $p-1\pm\sqrt{p-1}$ lie outside $(p-2, p-1)$, satisfies
$$\bigl|x^2 - 2(p-1)x + (p-1)(p-2)\bigr| \le p-1, \qquad x \in [p-2, p-1], \tag{67}$$
its maximum absolute value being attained at $p-1$. From (66) and (67), we conclude that
$$\sup_{x\in[p-2,\,p-1]}|x f''_p(x)| \le \frac{1}{\Gamma(p)}\,e^{-(p-2)}(p-2)^{p-2}(p-1) = g_1(p-1), \tag{68}$$
where the last equality follows by recalling (52). Thus, (65) and (68) conclude the proof of part (iv).

To show part (v), let $p \ge 2$. First, we have, by (51),
$$f'''_p(x) = f_{p-3}(x) - 3f_{p-2}(x) + 3f_{p-1}(x) - f_p(x) = \frac{e^{-x}x^{p-4}}{\Gamma(p)}\bigl((p-1)(p-2)(p-3) - 3(p-1)(p-2)x + 3(p-1)x^2 - x^3\bigr) \tag{69}$$
$$= \frac{e^{-x}x^{p-4}}{\Gamma(p)}\bigl((p-1-x)^3 + 3(p-1)x - 3(p-1)^2 + 2(p-1)\bigr), \qquad x > 0.$$
Therefore, if we call
$$h_p(x) := \frac{e^{-x}x^{p-2}}{\Gamma(p)}\,(p-1-x)^3, \qquad x > 0,$$
we have, recalling (57),
$$x^2 f'''_p(x) = h_p(x) - 3x f'_{p-1}(x) - f_{p-1}(x), \qquad x > 0. \tag{70}$$
We will first see that
$$\sup_{x\ge 0}|h_p(x)| = g_4(p), \tag{71}$$
with $g_4(\cdot)$ as defined in (55). Note that
$$h'_p(x) = \frac{e^{-x}x^{p-3}}{\Gamma(p)}\,(p-1-x)^2\bigl(x^2 - 2px + (p-1)(p-2)\bigr), \qquad x > 0.$$
The maximum value of $|h_p|$ will be attained at one of the roots of the last polynomial, namely $p_1 := p + \sqrt{3p-2}$ and $p_2 := p - \sqrt{3p-2}$. To check which value attains the maximum, define $u := \sqrt{3p-2}$. Note that $p_1 = (u+1)(u+2)/3$ and $p_2 = (u-1)(u-2)/3$. Then, with this notation, we will prove that
$$\frac{|h_p(p_2)|}{|h_p(p_1)|} = e^{2u}\left(\frac{(u-1)(u-2)}{(u+1)(u+2)}\right)^{(u^2-4)/3}\left(\frac{u-1}{u+1}\right)^3 \ge 1, \qquad u > 2. \tag{72}$$
To show the last inequality in (72), taking logarithms, we will show that
$$\rho_1(u) := 2u + \frac{u^2-4}{3}\log\left(\frac{(u-1)(u-2)}{(u+1)(u+2)}\right) + 3\log\left(\frac{u-1}{u+1}\right) \ge 0, \qquad u > 2. \tag{73}$$
Note that
$$\rho_1'(u) = 2 + \frac{2u}{3}\log\left(\frac{(u-1)(u-2)}{(u+1)(u+2)}\right) + \frac{u^2-4}{3}\left(\frac{1}{u-1} + \frac{1}{u-2} - \frac{1}{u+1} - \frac{1}{u+2}\right) + 3\left(\frac{1}{u-1} - \frac{1}{u+1}\right)$$
$$= \frac{4u^2}{u^2-1} + \frac{2u}{3}\log\left(\frac{(u-1)(u-2)}{(u+1)(u+2)}\right), \qquad u > 2.$$
We will show that $\rho_1'(u) \le 0$, $u > 2$. In fact,
$$\frac{d}{du}\left(\frac{3}{2u}\,\rho_1'(u)\right) = \frac{36}{(u+1)^2(u-1)^2(u+2)(u-2)} \ge 0, \qquad u > 2,$$
and then $3(2u)^{-1}\rho_1'(u)$ is increasing. As $\lim_{u\to\infty} 3(2u)^{-1}\rho_1'(u) = 0$, we conclude that $3(2u)^{-1}\rho_1'(u) \le 0$ and, therefore, $\rho_1'(u) \le 0$. Therefore, $\rho_1(u)$ is decreasing. This, together with the fact that $\lim_{u\to\infty}\rho_1(u) = 0$, proves (73) and therefore (72). Then, $\|h_p\| = |h_p(p_2)| = g_4(p)$, thus proving (71). The proof of part (v) now follows easily by recalling (70) and using (71), the triangle inequality and parts (i) and (ii) applied to $p-1$. □

As an immediate consequence of Theorem 3.2 and Lemma 4.4, we have the following corollary.
Corollary 4.1.
Let $F_p$ be a gamma distribution function with shape parameter $p \ge 2$, that is, whose density function is given by (49). Let $M^{[2]}_t$, $t > 0$, be defined as in (8). We have
$$\|M^{[2]}_t F_p - F_p\| \le \left(\frac{1}{8} + \frac{1}{6} + \frac{9}{16}\left(2 + \frac{3}{e}\right)\right)\frac{1}{t^2} \approx \frac{2.04}{t^2}.$$

Proof.
Let $p \ge 2$. Note that $F'_p = f_p$, as defined in (49). Therefore, by Lemma 4.4(iii) and Lemma 4.3(iii), we have that
$$\|F''_p\| = \|f'_p\| = g_3(p) \le g_3(2) = 1. \tag{74}$$
On the other hand, by Lemma 4.3(i) and (ii), we have that
$$g_1(p-1) \le g_1(1) = 1 \qquad \text{and} \qquad g_2(p-1) \le g_2(1) = e^{-1}, \qquad p \ge 2. \tag{75}$$
Thus, using the above inequalities and Lemma 4.4(iv), we have
$$\|x F'''_p(x)\| = \|x f''_p(x)\| \le 1. \tag{76}$$
Finally, by Lemma 4.4(v) and (75),
$$\|x^2 F^{iv}_p(x)\| = \|x^2 f'''_p(x)\| \le g_4(2) + 3g_2(1) + g_1(1) = 2 + 3e^{-1}. \tag{77}$$
Using (74), (76), (77) and Theorem 3.2, we obtain the result. This completes the proof of Corollary 4.1. □
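Corollary 4.1 can also be illustrated numerically (a sketch of ours). For a gamma distribution with $a = 1$ and integer shape $p$ (an Erlang distribution), $\phi_X(t) = (1+t)^{-p}$, and differentiating shows that (1) gives a negative binomial pmf on the lattice $\{k/t\}$, so $L^*_t F_p$, and hence $M^{[2]}_t F_p$ via (10)–(11), is computable exactly.

```python
from math import comb, exp, factorial

# For an Erlang distribution (a = 1, integer p), formula (1) yields
#   P(X*_t = k/t) = C(p+k-1, k) * (1/(1+t))**p * (t/(1+t))**k.

def Lstar(k, t, p):           # L*_t F_p(k/t), formula (2)
    q = t / (1.0 + t)
    return sum(comb(p + j - 1, j) * (1 - q)**p * q**j for j in range(k + 1))

def M2(x, t, p):              # formulas (10) and (11)
    def grid(k):
        return 0.0 if k == 0 else 2*Lstar(2*k - 1, 2*t, p) - Lstar(k - 1, t, p)
    k = int(t * x)
    w = t * x - k
    return w * grid(k + 1) + (1 - w) * grid(k)

def F(x, p):                  # exact Erlang cdf
    return 1.0 - exp(-x) * sum(x**m / factorial(m) for m in range(p))

t = 5.0
for p in (2, 3, 8):
    err = max(abs(M2(x, t, p) - F(x, p)) for x in [i * 0.02 for i in range(1, 750)])
    print(p, err, 2.04 / t**2, err <= 2.04 / t**2)
```

The observed errors stay below the uniform bound $\approx 2.04/t^2$ for every shape parameter tried, as the corollary predicts.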
5. Applications to mixtures of Erlang distributions and phase-type distributions
In this section, we apply the results from the previous section to mixtures of Erlang distributions and to random sums thereof. In order to undertake this study for an arbitrary scale parameter, we need the following result, which shows the behavior of $M^{[2]}_t F$ under changes of scale.

Proposition 5.1.
Let $X$ be a random variable with distribution function $F$. For a given $c > 0$, denote by $F_c$ the distribution function of $cX$. Let $M^{[2]}_t F$ and $M^{[2]}_t F_c$, $t > 0$, be the respective approximations for $F$ and $F_c$, as defined in (8). We have that
$$M^{[2]}_t F_c(x) = M^{[2]}_{ct} F(x/c), \qquad x \ge 0. \tag{78}$$
Therefore,
$$\|M^{[2]}_t F_c - F_c\| = \|M^{[2]}_{ct} F - F\|. \tag{79}$$

Proof.
Let \(t > 0\) and \(c > 0\). We first prove that
\[
M^{[2]}_t F_c\!\left(\frac{k}{t}\right) = M^{[2]}_{ct} F\!\left(\frac{k}{ct}\right), \qquad k \in \mathbb{N}, \qquad (80)
\]
that is, that (78) is satisfied for points in the set \(\{k/t,\ k \in \mathbb{N}\}\). To this end, we use (12) and (6), and take into account that
\[
F_c(x) = F(x/c), \qquad x \ge 0, \qquad (81)
\]
to write, for all \(k \in \mathbb{N}\),
\[
M^{[2]}_t F_c\!\left(\frac{k}{t}\right) = 2\,\mathbb{E}F_c\!\left(\frac{S(2k)}{2t}\right) - \mathbb{E}F_c\!\left(\frac{S(k)}{t}\right) \qquad (82)
\]
\[
= 2\,\mathbb{E}F\!\left(\frac{S(2k)}{2ct}\right) - \mathbb{E}F\!\left(\frac{S(k)}{ct}\right) = M^{[2]}_{ct} F\!\left(\frac{k}{ct}\right),
\]
thus proving (80). For a general \(x > 0\), we use (8) and (80) to see that
\[
M^{[2]}_t F_c(x) = (tx - [tx])\, M^{[2]}_t F_c\!\left(\frac{[tx]+1}{t}\right) + ([tx]+1-tx)\, M^{[2]}_t F_c\!\left(\frac{[tx]}{t}\right)
\]
\[
= (tx - [tx])\, M^{[2]}_{ct} F\!\left(\frac{[tx]+1}{ct}\right) + ([tx]+1-tx)\, M^{[2]}_{ct} F\!\left(\frac{[tx]}{ct}\right) = M^{[2]}_{ct} F\!\left(\frac{x}{c}\right),
\]
the last equality being trivial, as \(tx = (ct)(x/c)\). This concludes the proof of (78). Finally, (79) follows easily from (78) and (81), as we have
\[
\sup_{x>0} | M^{[2]}_t F_c(x) - F_c(x) | = \sup_{x>0} | M^{[2]}_{ct} F(x/c) - F(x/c) |.
\]
This concludes the proof of Proposition 5.1. □
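For a concrete check of (78), the exponential case is convenient, since \(\mathbb{E}F(S(k)/t)\) is explicit: if \(F\) is the \(\mathrm{Exp}(\lambda)\) distribution function, then \(\mathbb{E}F(S(k)/t) = 1 - (t/(t+\lambda))^k\), by the Laplace transform of the gamma law. The following sketch is ours (the function name is illustrative); it builds \(M^{[2]}_t F\) from the nodes in (82) and the interpolation in (8).

```python
import math

def M2_exp(lam, t, x):
    """M_t^{[2]} F(x) for F the Exp(rate lam) cdf.
    Uses E[F(S(k)/t)] = 1 - (t/(t+lam))**k (Laplace transform of Gamma(k,1)),
    the extrapolated nodes of (82) and the linear interpolation of (8)."""
    def node(k):
        return 2 * (1 - (2 * t / (2 * t + lam))**(2 * k)) - (1 - (t / (t + lam))**k)
    k, w = int(t * x), t * x - int(t * x)
    return w * node(k + 1) + (1 - w) * node(k)
```

Taking \(X \sim \mathrm{Exp}(1)\) and \(c = 2\) (so that \(cX \sim \mathrm{Exp}(1/2)\)), the quantities `M2_exp(0.5, t, x)` and `M2_exp(1.0, c*t, x/c)` agree to floating-point accuracy, as (78) predicts.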
As an application of the results in the previous section, we will consider the class of (possibly infinite) mixtures of Erlang distributions recently studied by Willmot and Woo (see [19]). More specifically, let \(F_{a,j}\), \(a > 0\), \(j \in \mathbb{N}^*\), be the distribution function corresponding to the density \(f_{a,j}\) given in (3) (an Erlang-\(j\) distribution with scale parameter \(a\)). We will consider a finite number of scale parameters arranged in increasing order (\(0 < a_1 < \cdots < a_n\)) and a set of non-negative numbers \(p_{ij}\), \(i = 1, \dots, n\), \(j = 1, 2, \dots\), such that \(\sum_{i=1}^{n} \sum_{j=1}^{\infty} p_{ij} = p \le 1\), and define the class of distribution functions \(\mathrm{ME}(a_1, \dots, a_n)\) given as
\[
F(x) = (1-p) + \sum_{i=1}^{n} \sum_{j=1}^{\infty} p_{ij} F_{a_i,j}(x), \qquad x \ge 0 \qquad (83)
\]
(such a distribution has a point mass at 0 of magnitude \(1-p\)). Based on [19], page 103, we can alternatively write (83) by using only the maximum of the scale parameters, that is,
\[
F(x) = (1-p) + \sum_{j=1}^{\infty} p_j F_{a_n,j}(x), \qquad x \ge 0. \qquad (84)
\]
Moreover, the class (84) is a wide class containing many of the distributions considered in applied probability, such as (obviously) finite mixtures of Erlang distributions, but also the class of phase-type distributions (see Proposition 5.3 below). Every random variable having a representation as in (83) can be approximated by means of \(M^{[2]}_t\), as shown in the following result.

Proposition 5.2.
Let \(F\) be a distribution function of the form \(\mathrm{ME}(a_1, \dots, a_n)\), \(0 < a_1 < \cdots < a_n\), as in (83). Let \(M^{[2]}_t\), \(t > 0\), be defined as in (8). We have
\[
\| M^{[2]}_t F - F \| \le C \sum_{i=1}^{n} \Bigl( \sum_{j=1}^{\infty} p_{ij} \Bigr) \Bigl( \frac{a_i}{t} \Bigr)^{2} \qquad (85)
\]
for an absolute constant \(C\).

Proof.
Let \(t > 0\) and \(0 < a_1 < \cdots < a_n\) be fixed. The linearity of \(M^{[2]}_t\) yields
\[
M^{[2]}_t F(x) = (1-p) + \sum_{i=1}^{n} \sum_{j=1}^{\infty} p_{ij}\, M^{[2]}_t F_{a_i,j}(x), \qquad x \ge 0. \qquad (86)
\]
By Corollary 4.1, we can write, for scale parameter 1,
\[
\| M^{[2]}_t F_{1,j} - F_{1,j} \| \le \frac{C}{t^{2}}, \qquad j = 2, 3, \dots. \qquad (87)
\]
Moreover, using Lemma 4.1, we have
\[
\| M^{[2]}_t F_{1,1} - F_{1,1} \| \le \frac{C}{t^{2}}, \qquad (88)
\]
enlarging the constant \(C\) if necessary. Now, let the general scale parameters be \(a_i\), \(i = 1, \dots, n\). We use the fact that, given \(X\), a gamma random variable with scale parameter 1, \(X/a_i\) is a gamma random variable with scale parameter \(a_i\) and, therefore, using Proposition 5.1, (87) and (88), we have, for each \(a_i\), \(i = 1, \dots, n\), and \(j \in \mathbb{N}^*\),
\[
\| M^{[2]}_t F_{a_i,j} - F_{a_i,j} \| = \| M^{[2]}_{t/a_i} F_{1,j} - F_{1,j} \| \le C \Bigl( \frac{a_i}{t} \Bigr)^{2}. \qquad (89)
\]
Thus, using (86) and (89), we have
\[
\| M^{[2]}_t F - F \| \le \sum_{i=1}^{n} \sum_{j=1}^{\infty} p_{ij}\, \| M^{[2]}_t F_{a_i,j} - F_{a_i,j} \| \le C \sum_{i=1}^{n} \Bigl( \sum_{j=1}^{\infty} p_{ij} \Bigr) \Bigl( \frac{a_i}{t} \Bigr)^{2}. \qquad (90)
\]
This completes the proof of Proposition 5.2. □
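Distribution functions in the class (83) are straightforward to evaluate in closed form, which makes the scale-dependent factor in the bound of Proposition 5.2 easy to monitor in practice. The following is a minimal sketch of ours (the naming is illustrative; the dictionary maps a pair \((a_i, j)\) to the weight \(p_{ij}\)):

```python
import math

def erlang_cdf(a, j, x):
    """F_{a,j}(x) = 1 - exp(-a*x) * sum_{m=0}^{j-1} (a*x)^m / m!  (Erlang-j, scale a)."""
    if x <= 0:
        return 0.0
    s, term = 1.0, 1.0
    for m in range(1, j):
        term *= a * x / m
        s += term
    return 1.0 - math.exp(-a * x) * s

def me_cdf(weights, x):
    """Cdf of an ME(a_1,...,a_n) mixture as in (83); mass 1 - p sits at 0."""
    p = sum(weights.values())
    return (1.0 - p) + sum(pij * erlang_cdf(a, j, x)
                           for (a, j), pij in weights.items())

def rate_factor(weights, t):
    """Scale-dependent factor sum_i (sum_j p_ij)(a_i/t)^2 of (85), up to the constant."""
    return sum(pij * (a / t)**2 for (a, j), pij in weights.items())
```

For instance, a mixture placing weight 0.7 on an exponential with scale 1 has total mass \(p = 0.7\) and hence a point mass 0.3 at the origin.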
As a consequence of the previous result, we can provide error bounds for compound distributions (that is, distribution functions of random sums, as in (14)) when the summands are mixtures of Erlang distributions, as stated in the following result.
Corollary 5.1.
Let \(G\) be the distribution function of a random sum, as in (14), in which the sequence \((X_i)_{i \in \mathbb{N}^*}\) has common distribution \(\mathrm{ME}(a_1, \dots, a_n)\), \(0 < a_1 < \cdots < a_n\), as defined in (83). Let \(M^{[2]}_t\) be as in (8). We have that
\[
\| M^{[2]}_t G - G \| \le C\, (1 - G(0)) \Bigl( \frac{a_n}{t} \Bigr)^{2},
\]
with \(C\) as in Proposition 5.2.
Proof.
The proof is immediate, taking into account that a mixture of Erlang distributions \(\mathrm{ME}(a_1, \dots, a_n)\), \(0 < a_1 < \cdots < a_n\), can be expressed as in (84) and that compound distributions of these random variables are also mixtures of Erlang distributions (see [19], page 106, with a slight modification in the coefficients, as we allow a point mass at 0), that is, we can write
\[
G(x) = q_0 + \sum_{j=1}^{\infty} q_j F_{a_n,j}(x), \qquad x \ge 0,
\]
in which \(\{ q_j,\ j = 0, 1, \dots \}\) form a probability mass function (obviously, \(q_0 = G(0)\)). The result follows using the above expression and Proposition 5.2. □

The class of phase-type distributions, of great importance in applied probability, can be expressed in terms of mixtures of Erlang distributions. A phase-type distribution is defined as the distribution of the time until absorption in a continuous-time Markov chain with one absorbing state (see, for instance, [12], Chapter II, or [8], Chapter VIII, and the references therein). A phase-type distribution can be expressed in terms of a matrix exponential as follows. Consider a vector \(\alpha = (\alpha_1, \dots, \alpha_n)\) of non-negative numbers such that \(\alpha_1 + \cdots + \alpha_n \le 1\). Let \(A\) be an \(n \times n\) matrix with negative diagonal entries, non-negative off-diagonal entries and non-positive row sums. A non-negative random variable \(X\) has a phase-type distribution \(\mathrm{PH}(\alpha, A)\) if its distribution function can be written as
\[
F(x) = 1 - \alpha\, \mathrm{e}^{xA} \mathbf{1}', \qquad x \ge 0,
\]
in which \(\mathbf{1}'\) represents the transpose of the \(n\)-dimensional vector \(\mathbf{1} = (1, \dots, 1)\). The distribution is absolutely continuous when \(\alpha_1 + \cdots + \alpha_n = 1\) and has positive mass at 0 (of magnitude \(1 - (\alpha_1 + \cdots + \alpha_n)\)) when \(\alpha_1 + \cdots + \alpha_n < 1\). Phase-type distributions have been extensively studied from both theoretical and practical points of view. For instance, it is well known that phase-type distributions have rational Laplace transforms, thus allowing numerical computation using our approximation procedures. Also, in the next proposition, we will give an expression of phase-type distributions in terms of mixtures of Erlang distributions. This, together with Proposition 5.2, provides our approximations with rates of convergence. The proof of the next result is based on the following property of phase-type distributions, due to Maier (see [13], page 591). Let \(f\) be the density of an absolutely continuous phase-type distribution. There exists some \(c > 0\) such that
\[
c_j := \frac{\mathrm{d}^j}{\mathrm{d}x^j} \bigl( \mathrm{e}^{cx} f(x) \bigr) \Big|_{x=0} > 0, \qquad j \in \mathbb{N}. \qquad (91)
\]
We are now in a position to state the following.

Proposition 5.3.
Let \(F\) be a phase-type distribution \(\mathrm{PH}(\alpha, A)\) with \(\alpha_1 + \cdots + \alpha_n > 0\). Let \(c > 0\) be such that the absolutely continuous part of \(F\) satisfies property (91). Then, \(F\) can be expressed as a mixture of Erlang distributions, that is,
\[
F(x) = p_0 + \sum_{j=1}^{\infty} p_j F_{c,j}(x), \qquad x \ge 0, \qquad (92)
\]
in which \(p_0 = 1 - (\alpha_1 + \cdots + \alpha_n)\).

Proof.
To prove (92), assume first that \(F\) is absolutely continuous, that is, that \(\alpha_1 + \cdots + \alpha_n = 1\). Its density is then given by \(f(x) = -\alpha\, \mathrm{e}^{xA} A \mathbf{1}'\), \(x > 0\). We choose \(c > 0\) satisfying (91) and write
\[
\mathrm{e}^{cx} f(x) = -\alpha\, \mathrm{e}^{x(cI + A)} A \mathbf{1}', \qquad x \ge 0. \qquad (93)
\]
It can be easily checked that the function \(-\alpha\, \mathrm{e}^{x(cI + A)} A \mathbf{1}'\), \(x \in \mathbb{R}\), is analytic on \(\mathbb{R}\), so if we consider the Taylor series expansion of this function around 0 and take into account (91) and (93), we have
\[
\mathrm{e}^{cx} f(x) = \sum_{j=0}^{\infty} \frac{c_j x^j}{j!}, \qquad x > 0,
\]
from which we can write (recall (3))
\[
f(x) = \sum_{j=0}^{\infty} \frac{c_j}{c^{j+1}}\, \frac{c^{j+1} x^j \mathrm{e}^{-cx}}{j!} = \sum_{j=0}^{\infty} \frac{c_j}{c^{j+1}}\, f_{c,j+1}(x), \qquad x > 0,
\]
and, in this way, we obtain the expression of \(f\) in terms of a mixture of Erlang densities with scale parameter \(c\) (by construction, the coefficients are non-negative and, integrating both sides of the above expression, we see that their sum is 1). As a consequence, we can write
\[
F(x) = \sum_{j=1}^{\infty} \frac{c_{j-1}}{c^{j}}\, F_{c,j}(x), \qquad x \ge 0, \qquad (94)
\]
thus having expressed \(F\) as a mixture of Erlang distributions, as in (92). Now, assume that \(0 < \alpha_1 + \cdots + \alpha_n < 1\). This means that \(F\) has a point mass at 0 of magnitude \(p_0 := 1 - (\alpha_1 + \cdots + \alpha_n)\). The absolutely continuous part of \(F\) (denoted \(F_{\mathrm{ac}}\)) is a phase-type distribution \(\mathrm{PH}(\bar\alpha, A)\) with \(\bar\alpha = (\alpha_1 + \cdots + \alpha_n)^{-1} \alpha\). Let \(c > 0\) be such that \(F_{\mathrm{ac}}\) verifies property (91). We can write, thanks to (94),
\[
F(x) = p_0 + (1 - p_0) F_{\mathrm{ac}}(x) = p_0 + \sum_{j=1}^{\infty} (1 - p_0)\, \frac{c_{j-1}}{c^{j}}\, F_{c,j}(x), \qquad x \ge 0.
\]
This completes the proof of Proposition 5.3. □
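The construction in Proposition 5.3 is effective: the coefficients \(c_j\) in (91) can be generated from any representation \(\mathrm{PH}(\alpha, A)\) via (93), since \(c_j = -\alpha (cI + A)^j A \mathbf{1}'\). The sketch below is ours; the function name and the two-state example are illustrative, with \(c = \|A\|\) chosen as in Remark 5.1 below.

```python
def ph_to_erlang_weights(alpha, A, c, n_terms):
    """Weights p_j, j = 1..n_terms, of the Erlang mixture (92) for PH(alpha, A):
    c_j = -alpha (cI + A)^j A 1'  (Taylor coefficients of e^{cx} f(x), eq. (93)),
    and p_j = c_{j-1} / c^j."""
    n = len(alpha)
    B = [[c * (i == k) + A[i][k] for k in range(n)] for i in range(n)]
    u = [sum(row) for row in A]                       # u = A 1' (row sums)
    weights = []
    for j in range(1, n_terms + 1):
        cj = -sum(a * x for a, x in zip(alpha, u))    # c_{j-1}
        weights.append(cj / c**j)
        u = [sum(B[i][k] * u[k] for k in range(n)) for i in range(n)]
    return weights
```

For \(\alpha = (0.6, 0.4)\) and \(A = \bigl(\begin{smallmatrix} -3 & 1 \\ 2 & -4 \end{smallmatrix}\bigr)\), both exit rates equal 2, so the absorption time is simply \(\mathrm{Exp}(2)\); with \(c = \|A\| = 4\), the computed weights are \(p_j = 2^{-j}\), the classical geometric mixture of Erlangs representing an exponential law.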
Remark 5.1.
Expansions similar to those given in Proposition 5.3 can be found in [12], page 58. These expansions are obtained using a representation \(\mathrm{PH}(\alpha, A)\) of the distribution under consideration. Note that if we denote by \(\|A\|\) the maximum absolute value of the entries of \(A\), then it is easy to check using (93) (see [14], page 751) that \(c = \|A\|\) verifies (91). However, as the representation of a phase-type distribution is not unique, this value might not be the optimal one. Also, observe that the error bound given in (85) indicates that we should take \(c\) as small as possible. This problem is therefore closely connected to Conjecture 6 in [14], concerning the minimum \(c\) satisfying (91) and its relation to a phase-type representation having \(\|A\|\) as small as possible. To the best of our knowledge, this conjecture remains unsolved.

Acknowledgments
I would like to thank José Garrido for suggesting the final applications to phase-type distributions when I was at Concordia University, and two anonymous referees for helpful comments. This research has been partially supported by research Grants 2006-CIE-05 (University of Zaragoza), MTM2007-63683 and PR 2007-0295 (Spanish Government), E64 (DGA) and by FEDER funds.
References

[1] Abate, J. and Whitt, W. (1995). Numerical inversion of Laplace transforms of probability distributions. ORSA J. Comput.
[2] Adv. in Appl. Probab.
[3] Abramowitz, M. and Stegun, I.A. (1964). Handbook of Mathematical Functions. Washington, DC: National Bureau of Standards. MR0167642
[4] Adell, J.A. and de la Cal, J. (1993). On the uniform convergence of normalized Poisson mixtures to their mixing distribution. Statist. Probab. Lett.
[5] J. Appl. Probab.
[6] J. Austral. Math. Soc. Ser. A
[7] Math. Comp.
[8] Asmussen, S. (2000). Ruin Probabilities. Singapore: World Scientific. MR1794582
[9] Embrechts, P., Grübel, R. and Pitts, S.M. (1993). Some applications of the fast Fourier transform algorithm in insurance mathematics. Statist. Neerlandica.
[10] Feller, W. (1971). An Introduction to Probability Theory and Its Applications II, 2nd ed. New York: Wiley.
[11] Grübel, R. and Hermesmeier, R. (2000). Computation of compound distributions II: Discretization errors and Richardson extrapolation. Astin Bull.
[12] Latouche, G. and Ramaswami, V. (1999). Introduction to Matrix Analytic Methods in Stochastic Modelling. Philadelphia: ASA-SIAM. MR1674122
[13] Maier, R.S. (1991). The algebraic construction of phase-type distributions. Comm. Statist. Stochastic Models.
[14] Comm. Statist. Stochastic Models.
[15] J. Math. Anal. Appl.
[16]–[17] Insurance Math. Econom.
[18] Sundt, B. (2002). Recursive evaluation of aggregate claims distributions. Insurance Math. Econom.
[19] Willmot, G.E. and Woo, J.-K. (2007). On the class of Erlang mixtures with risk theoretic applications. N. Am. Actuar. J. 99–105. MR2380721