Uniform error bounds for a continuous approximation of non-negative random variables
Bernoulli 16(2), 2010, 561–584. DOI: 10.3150/09-BEJ209
CARMEN SANGÜESA
Departamento de Métodos Estadísticos, Facultad de Ciencias, Universidad de Zaragoza, Pedro Cerbuna 12, 50009 Zaragoza, Spain. E-mail: [email protected]
In this work, we deal with approximations for distribution functions of non-negative random variables. More specifically, we construct continuous approximants using an acceleration technique over a well-known inversion formula for Laplace transforms. We give uniform error bounds using a representation of these approximations in terms of gamma-type operators. We apply our results to certain mixtures of Erlang distributions which contain the class of continuous phase-type distributions.
Keywords: gamma distribution; Laplace transform; phase-type distribution; uniform distance
1. Introduction
Frequent operations in probability, such as convolution or random summation of random variables, produce probability distributions which are difficult to evaluate explicitly. In these cases, one needs numerical evaluation methods. For instance, one can numerically invert the Laplace or Fourier transform of the distribution at hand (see [2] for the general use of Laplace–Stieltjes transforms in applied probability, or [9, 11] for the fast Fourier transform in the context of risk theory). Another approach is the use of recursive evaluation methods, of special interest for random sums (see [11, 18], for instance). Some of the methods mentioned above require a previous discretization step when the initial random variables are continuous. The usual way to do so is by means of rounding methods. However, it is not always possible to evaluate the distribution of the rounded random variable explicitly, and it is not always clear how the rounding error propagates when one takes successive convolutions. In these cases, it seems worthwhile to consider alternative discretization methods. For instance, for non-negative random variables, the following method ([10], page 233) has been proposed in the literature. Let $X$ be a random variable taking values in $[0,\infty)$ with distribution function $F$. Denote by $\phi_X(\cdot)$ the Laplace–Stieltjes (LS) transform of $X$, that is,
$$\phi_X(t) := E e^{-tX} = \int_{[0,\infty)} e^{-tu}\,dF(u), \qquad t > 0.$$
For each $t > 0$, we define a random variable $X^\bullet_t$ taking values on $\{k/t,\ k \in \mathbb{N}\}$ and such that
$$P(X^\bullet_t = k/t) = \frac{(-t)^k}{k!}\,\phi_X^{(k)}(t), \qquad k \in \mathbb{N}, \tag{1}$$
where $\phi_X^{(k)}$ denotes the $k$th derivative ($\phi_X^{(0)} \equiv \phi_X$). Thus, if we denote by $L^*_t F$ the distribution function of $X^\bullet_t$, we have
$$L^*_t F(x) := P(X^\bullet_t \le x) = \sum_{k=0}^{[tx]} \frac{(-t)^k}{k!}\,\phi_X^{(k)}(t), \qquad x \ge 0, \tag{2}$$
where $[x]$ indicates the largest integer less than or equal to $x$. This method yields the probability mass function explicitly in some situations in which rounding methods do not (see, for instance, [4] for gamma distributions). Moreover, it allows for an easy representation of $L^*_t F$ in terms of $F$, which makes possible the study of rates of convergence in the approximation ([4, 5]). In [4], the problem was studied in a general setting, whereas in [5], a detailed analysis was carried out for the case of gamma distributions, that is, distributions whose density function is given by
$$f_{a,p}(x) := \frac{a^p x^{p-1} e^{-ax}}{\Gamma(p)}, \qquad x > 0. \tag{3}$$
Also, in [16], error bounds for random sums of mixtures of gamma distributions were obtained, uniformly controlled on the parameters of the random summation index. In all of these papers, the measure of distance considered was the Kolmogorov (or sup-norm) distance. More specifically, for a given real-valued function $f$ defined on $[0,\infty)$, we denote by $\|f\|$ the sup-norm, that is, $\|f\| := \sup_{x \ge 0} |f(x)|$. It was shown in [5] that for gamma distributions with shape parameter $p \ge 1$, $\|L^*_t F - F\|$ is of order $1/t$, the length of the discretization interval. Note that $\|L^*_t F - F\|$ is the Kolmogorov distance between $X$ and $X^\bullet_t$, as both are non-negative random variables.

The aim of this paper is twofold. First, we will consider a continuous modification of (2) and give conditions under which this continuous modification has rate of convergence $1/t^2$ instead of $1/t$ (see Sections 2 and 3). In Section 4, we will consider the case of gamma distributions to see that the error bounds are also uniform on the shape parameter. In Section 5, we apply these results to mixtures of Erlang distributions, a class which contains the continuous phase-type distributions.
2. The approximation procedure
The representation of $L^*_t F$ in (2) in terms of a gamma process (see [4]) will play an important role in our proofs. We recall this representation. Let $(S(u),\ u \ge 0)$ be a gamma process, in which $S(0) = 0$ and such that, for $u > 0$, each $S(u)$ has a gamma density with parameters $a = 1$ and $p = u$, as given in (3). Let $g$ be a function defined on $[0,\infty)$. We consider the gamma-type operator $L_t$ given by
$$L_t g(x) := E g\!\left(\frac{S(tx)}{t}\right), \qquad x \ge 0,\ t > 0, \tag{4}$$
provided that this operator is well defined, that is, $L_t|g|(x) < \infty$, $x \ge 0$, $t > 0$. Then, for $F$ continuous on $(0,\infty)$, $L^*_t F$ in (2) can be written as (see [4], page 228)
$$L^*_t F(x) = L_t F\!\left(\frac{[tx]+1}{t}\right) = E F\!\left(\frac{S([tx]+1)}{t}\right), \qquad x \ge 0,\ t > 0. \tag{5}$$
It can be seen that the rates of convergence of $L_t g$ to $g$ are, at most, of order $1/t$ (see (40) below). Our aim now is to get faster rates of convergence. To this end, we will consider the following operator, built using a classical acceleration technique (Richardson's extrapolation; see, for instance, [9, 11]):
$$L^{[2]}_t g(x) := 2 L_{2t} g(x) - L_t g(x) = 2 E g\!\left(\frac{S(2tx)}{2t}\right) - E g\!\left(\frac{S(tx)}{t}\right), \qquad x \ge 0. \tag{6}$$
We will obtain a rate of uniform convergence of $L^{[2]}_t g$ to $g$, of order $1/t^2$, on the following class of functions:
$$D_1 := \{g \in C^4([0,\infty)) : \|x^2 g^{iv}(x)\| < \infty\}. \tag{7}$$
The problem with $L^{[2]}_t g$ is that when $tx$ is not a natural number, $L_t g(x)$ is given in terms of Weyl fractional derivatives of the Laplace transform (see [6], page 92) and, in general, we are not able to compute them explicitly. However, if we modify $L^{[2]}_t g$ using linear interpolation, that is,
$$M^{[2]}_t g(x) := (tx - [tx])\, L^{[2]}_t g\!\left(\frac{[tx]+1}{t}\right) + ([tx]+1-tx)\, L^{[2]}_t g\!\left(\frac{[tx]}{t}\right), \tag{8}$$
then the order of convergence of $M^{[2]}_t g$ to $g$ is also $1/t^2$ on the following class of functions:
$$D_2 := \{g \in C^4([0,\infty)) : \|g''(x)\| < \infty \text{ and } \|x^2 g^{iv}(x)\| < \infty\}. \tag{9}$$
Moreover, the advantage of using $M^{[2]}_t g$ instead of $L^{[2]}_t g$ to approximate $g$ is its computability. In the following result, we note that this approximation, applied to a distribution function $F$, is related to $L^*_t F$, as defined in (2). From now on, $\mathbb{N}^*$ will denote the set $\mathbb{N} \setminus \{0\}$.

Proposition 2.1.
Let $X$ be a non-negative random variable with distribution function $F$ and Laplace transform $\phi_X$. Let $L^*_t F$, $t > 0$, be as defined in (2) and let $M^{[2]}_t F$ be as defined in (8). We have
$$M^{[2]}_t F\!\left(\frac{k}{t}\right) = \begin{cases} F(0), & k = 0, \\[4pt] 2 L^*_{2t} F\!\left(\dfrac{2k-1}{2t}\right) - L^*_t F\!\left(\dfrac{k-1}{t}\right), & k \in \mathbb{N}^*, \end{cases} \tag{10}$$
and
$$M^{[2]}_t F(x) = (tx-[tx])\, M^{[2]}_t F\!\left(\frac{[tx]+1}{t}\right) + ([tx]+1-tx)\, M^{[2]}_t F\!\left(\frac{[tx]}{t}\right). \tag{11}$$

Proof.
Let $t > 0$. First, note that, by (8),
$$M^{[2]}_t F\!\left(\frac{k}{t}\right) = L^{[2]}_t F\!\left(\frac{k}{t}\right), \qquad k \in \mathbb{N}. \tag{12}$$
Now, using (6) and (4), we have $M^{[2]}_t F(0) = L^{[2]}_t F(0) = F(0)$, which shows (10) for $k = 0$. Finally, using (6), (4) and (5), we have, for $k \in \mathbb{N}^*$,
$$L^{[2]}_t F\!\left(\frac{k}{t}\right) = 2 E F\!\left(\frac{S(2k)}{2t}\right) - E F\!\left(\frac{S(k)}{t}\right) = 2 L^*_{2t} F\!\left(\frac{2k-1}{2t}\right) - L^*_t F\!\left(\frac{k-1}{t}\right). \tag{13}$$
Thus, (12) and (13) show (10) for $k \in \mathbb{N}^*$. Note that (11) is obvious by (8) and (12). This completes the proof of Proposition 2.1. □

In the following example, we illustrate the use of the previous approximant in the context of random sums, defined in the following way. Let $(X_i)_{i \in \mathbb{N}^*}$ be a sequence of independent, identically distributed non-negative random variables. Let $M$ be a random variable concentrated on the non-negative integers, independent of $(X_i)_{i \in \mathbb{N}^*}$. Consider the random variable
$$\sum_{i=1}^{M} X_i, \tag{14}$$
with the convention that the empty sum is 0.

Example 2.1.
As pointed out in the Introduction, an explicit expression for the distribution of (14) is usually not possible. Our aim is to consider an example in which this distribution can be evaluated explicitly and to compare our approximation method with some others considered in the literature. To this end, we take $M$ geometrically distributed with parameter $p$, that is, $P(M = k) = (1-p)^k p$, $k \in \mathbb{N}$, and $(X_i)_{i \in \mathbb{N}^*}$ exponentially distributed (with mean 1, for the sake of simplicity). In this case, it is well known (use LS transforms, for instance) that (14) has the same distribution as a mixture of the degenerate distribution at 0 (with probability $p$) and an exponential distribution, that is,
$$F(x) := P\!\left(\sum_{i=1}^{M} X_i \le x\right) = p + (1-p)(1 - e^{-px}) = 1 - (1-p)e^{-px}, \qquad x \ge 0. \tag{15}$$
When an explicit expression is not available, the usual approximate evaluation method is to discretize the summands in (14) and then use recursive methods found in the literature for discrete random sums. By considering (1) as a first discretization method, we have (see [5], page 391)
$$P\!\left(X^\bullet_t = \frac{k}{t}\right) = \left(\frac{t}{t+1}\right)^{k} \frac{1}{t+1}, \qquad k = 0, 1, \dots. \tag{16}$$
Thus, $t\sum_{i=1}^{M} X^\bullet_{t,i}$ is a geometric sum of geometric distributions with parameter $r = (1+t)^{-1}$. It is easy to check (use LS transforms, for instance) that the distribution of such a random variable is a mixture of the degenerate distribution at 0 (with probability $p$) and a geometric distribution with parameter $p^* = 1 - (1-r)(1-(1-p)r)^{-1} = 1 - t(t+p)^{-1}$, so that, for each $k \in \mathbb{N}$,
$$L^*_t F\!\left(\frac{k}{t}\right) = P\!\left(\sum_{i=1}^{M} X^\bullet_{t,i} \le \frac{k}{t}\right) = p + (1-p)(1 - (1-p^*)^{k+1}) = 1 - (1-p)\left(\frac{t}{t+p}\right)^{k+1}. \tag{17}$$
Note that the first equality in (17) follows by recalling (2) and noting that $(\sum_{i=1}^{M} X_i)^\bullet_t$ has the same distribution as $\sum_{i=1}^{M} X^\bullet_{t,i}$ (see [16], Proposition 2.1).
Actually, a more natural way (in this case) to compute (17) is to evaluate the LS transform of $(\sum_{i=1}^{M} X_i)^\bullet_t$ and then apply (1) and (2). However, the previous computations enable easier comparisons with the following method. In fact, one of the most obvious (and widely used) ways to discretize the summands in (14) is a rounding method. For instance, a rounding-down method (we round $X_i$ to $[tX_i]t^{-1}$) yields
$$P\!\left(\frac{[tX]}{t} = \frac{k}{t}\right) = P\!\left(\frac{k}{t} \le X < \frac{k+1}{t}\right) = e^{-k/t}(1 - e^{-1/t}), \qquad k \in \mathbb{N}. \tag{18}$$
Table 1. Comparison of different approximation methods for (15), with $t = 5$ and $p = 0.1$: columns $x = k/5$, $F(x)$, $L^*_5 F(x)$, $R_5 F(x)$ and $W_5 F(x)$. [The numerical entries are not reproduced here.]

In this case, $\sum_{i=1}^{M} [tX_i]$ is a geometric sum of geometric distributions with parameter $r' = 1 - e^{-1/t}$. We denote by $R_t F$ the distribution function of $\sum_{i=1}^{M} [tX_i]t^{-1}$. Using the same arguments as for (17), we obtain, for each $k \in \mathbb{N}$, that
$$R_t F\!\left(\frac{k}{t}\right) = P\!\left(\sum_{i=1}^{M} \frac{[tX_i]}{t} \le \frac{k}{t}\right) = 1 - (1-p)\left(\frac{e^{-1/t}}{1 - (1-p)(1 - e^{-1/t})}\right)^{k+1}. \tag{19}$$
Finally, it would be interesting to compare the previous 'discretization methods' with a 'transform method.' To this end, we consider the Laplace transform of $F$ in (15) (instead of its LS transform), that is,
$$w_F(\theta) = \int_0^\infty e^{-\theta u} F(u)\,du = \frac{1}{\theta} - \frac{1-p}{\theta + p}, \qquad \theta > 0,$$
and apply the Post–Widder inversion formula (see [10], page 233), defined for $t \in \mathbb{N}^*$ as
$$W_t F(x) = \frac{(-1)^{t-1}}{(t-1)!}\left(\frac{t}{x}\right)^{t} w_F^{(t-1)}\!\left(\frac{t}{x}\right) = 1 - (1-p)\,\frac{t^t}{(px+t)^t}, \qquad x > 0.$$
In Table 1 (computations with MATLAB), we consider a 'rough' discretization interval ($t = 5$) and a small $p$ ($p = 0.1$), and present, for different $x = k/5$, the exact values of $F$ (column 2), the $L^*_5$ approximation (column 3), the 'rounding down' discretization (column 4) and the Post–Widder inversion (column 5).

As we can see in Table 1, $L^*_5 F$ provides a better approximation than $R_5 F$. The intuitive explanation of this fact is that, when approximating $\sum_{i=1}^M X_i$ by $\sum_{i=1}^M X^\bullet_{t,i}$, the error in the approximation can be controlled 'uniformly,' regardless of the distribution of $M$ (see [16], Theorem 4.3). This effect is obvious when we choose $M$ with a large expected value (our choice of a small $p$ is for this reason; for larger values of $p$ checked, $L^*_5 F$ is also better, but the difference is less appreciable). However, if we compare the approximations $L^*_5 F$ and $W_5 F$, we see that the latter is better for small values, whereas the former is better for large values. To explain this fact, it is interesting to point out that $W_t F$, like $L^*_t F$, admits the following well-known representation. For a function $g$ defined on $[0,\infty)$, we can write, as in (5) (see [10], pages 220, 223),
$$W_t g(x) = E g\!\left(\frac{x S(t)}{t}\right), \qquad x > 0. \tag{20}$$
Note that the mean of the 'random points' defining $W_t$ in (20) is $E(x t^{-1} S(t)) = x$, whereas for $L^*_t$ in (5), we have $E(t^{-1} S([tx]+1)) = t^{-1}([tx]+1)$. This means that $W_t$ is centered at $x$, whereas $L^*_t$ is 'biased'. The benefits of this property for $W_t$ are observed at small values in Table 1. However, we have $\mathrm{Var}(x t^{-1} S(t)) = t^{-1} x^2$, whereas $\mathrm{Var}(t^{-1} S([tx]+1)) = t^{-2}([tx]+1)$, the latter being of order $t^{-1} x$, as $t \to \infty$. The greater variability of the random variables defining $W_t$ for greater values of $x$ produces an undesired effect in the approximation.

We now show the improvement in the approximation which occurs when using $M^{[2]}_t$, as defined in (10), instead of $L^*_t$. In Table 2 below ($t = 5$), we compare $M^{[2]}_5 F$ (column 5) with the Richardson extrapolation of $W_t F$ (or Stehfest enhancement of order two for the Post–Widder formula; see [1], page 40), that is,
$$G^{[2]}_t F(x) := 2 W_{2t} F(x) - W_t F(x), \qquad x > 0. \tag{21}$$

Table 2. Comparison of $M^{[2]}_5$ in (10) with $G^{[2]}_5$ in (21): columns $x = k/5$, $F(x)$, $L^*_{10} F((2k-1)/10)$, $L^*_5 F((k-1)/5)$, $M^{[2]}_5 F(x)$ and $G^{[2]}_5 F(x)$. [The numerical entries are not reproduced here.]

As we can see, $M^{[2]}_5 F$ provides us with an exact value up to four decimal places, whereas $G^{[2]}_5 F$ does not achieve this accuracy.
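Since the numerical entries of Tables 1 and 2 are lost in this reprint, the comparison can be regenerated from the closed forms (15), (17), (19), the Post–Widder expression and the identity (10). The sketch below does exactly that (the printed figures are recomputed by us, not copied from the original tables; all helper names are ours).

```python
from math import exp

# Recreating the comparisons of Example 2.1 with t = 5 and p = 0.1, using
# only the paper's closed-form expressions (15), (17), (19), the Post-Widder
# formula W_t, the Stehfest enhancement (21) and the identity (10).

p, t = 0.1, 5

F     = lambda x: 1 - (1 - p) * exp(-p * x)                               # (15)
Lstar = lambda k, tt: 1 - (1 - p) * (tt / (tt + p))**(k + 1)              # (17), x = k/tt
R     = lambda k: 1 - (1 - p) * (exp(-1/t) /
                                 (1 - (1 - p) * (1 - exp(-1/t))))**(k + 1)  # (19)
W     = lambda x, tt: 1 - (1 - p) * tt**tt / (p * x + tt)**tt             # Post-Widder

def M2(k):
    """(10): M2_t F(k/t) = 2 L*_{2t} F((2k-1)/(2t)) - L*_t F((k-1)/t)."""
    return F(0) if k == 0 else 2 * Lstar(2*k - 1, 2*t) - Lstar(k - 1, t)

for k in (1, 2, 5, 10):
    x = k / t
    print(x, F(x), Lstar(k, t), R(k), W(x, t), M2(k), 2*W(x, 2*t) - W(x, t))
```

Running this reproduces the qualitative behaviour described in the text: $L^*_5 F$ beats $R_5 F$ throughout, while $M^{[2]}_5 F$ matches $F$ to roughly four decimal places.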
3. Error bounds for the approximation
Let $g \in D_1$, as defined in (7). Our first aim is to give bounds of $\|L^{[2]}_t g - g\|$ in terms of $\|x^2 g^{iv}(x)\|$. To this end, we will use the following 'test function':
$$\phi(x) = \begin{cases} 0, & \text{if } x = 0, \\[4pt] \dfrac{x^2}{2}\left(\dfrac{3}{2} - \log(x)\right), & \text{otherwise}. \end{cases} \tag{22}$$
Observe that $\phi \in D_1$. In fact, by elementary calculus,
$$\phi'(x) = x(1 - \log x), \qquad \phi''(x) = -\log x, \qquad \phi'''(x) = -\frac{1}{x} \qquad \text{and} \qquad \phi^{iv}(x) = \frac{1}{x^2}. \tag{23}$$
In the next lemma, we make an explicit computation of $L_t \phi(x)$ in terms of the $\Psi$ (or digamma) function. This function is defined as (see [3], page 258)
$$\Psi(x) := \frac{d}{dx}\log(\Gamma(x)) = \frac{1}{\Gamma(x)}\int_0^\infty \log u\; e^{-u} u^{x-1}\,du, \qquad x > 0, \tag{24}$$
and, therefore, using the last equality, we have the following probabilistic expression of the psi function in terms of the gamma process:
$$\Psi(x) = E \log S(x), \qquad x > 0. \tag{25}$$
We will use the following property of this function (see [3], page 258):
$$\Psi(x+1) = \frac{1}{x} + \Psi(x). \tag{26}$$

Lemma 3.1.
Let $\phi$ be defined as in (22) and let $L_t$, $t > 0$, be defined as in (4). We have that
$$L_t \phi(x) = \frac{1}{2t^2}\left(\frac{3}{2}(tx)^2 - \frac{1}{2}\,tx - 1 + tx(tx+1)(-\Psi(tx) + \log(t))\right), \qquad x > 0. \tag{27}$$

Proof.
Let $t > 0$ and $x > 0$. We have
$$L_t \phi(x) = E\,\frac{S(tx)^2}{2t^2}\left(\frac{3}{2} - \log\!\left(\frac{S(tx)}{t}\right)\right) = \frac{1}{2t^2\,\Gamma(tx)}\int_0^\infty u^2\left(\frac{3}{2} - \log\!\left(\frac{u}{t}\right)\right)e^{-u}u^{tx-1}\,du \tag{28}$$
$$= \frac{(tx)(tx+1)}{2t^2\,\Gamma(tx+2)}\int_0^\infty \left(\frac{3}{2} - \log\!\left(\frac{u}{t}\right)\right)e^{-u}u^{tx+1}\,du = \frac{(tx)(tx+1)}{2t^2}\left(\frac{3}{2} - E\log\!\left(\frac{S(tx+2)}{t}\right)\right).$$
Hence, by (25),
$$L_t \phi(x) = \frac{(tx)(tx+1)}{2t^2}\left(\frac{3}{2} - \Psi(tx+2) + \log(t)\right). \tag{29}$$
Now, using (26) twice, we have
$$\Psi(tx+2) = \frac{2(tx)+1}{tx(tx+1)} + \Psi(tx). \tag{30}$$
By (29) and (30), we obtain
$$L_t \phi(x) = \frac{(tx)(tx+1)}{2t^2}\left(\frac{3}{2} - \frac{2(tx)+1}{tx(tx+1)} - \Psi(tx) + \log(t)\right).$$
The result follows using elementary algebra in the expression above. □
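As a numerical sanity check (ours, not part of the paper), the closed form (27) can be compared with a direct quadrature of $E\phi(S(tx)/t)$ against the gamma density; the digamma implementation below uses the recurrence (26) together with a standard asymptotic series.

```python
from math import log, exp, lgamma

def digamma(x):
    """Psi(x) via the recurrence (26) and a standard asymptotic series."""
    s = 0.0
    while x < 10.0:              # push the argument into the asymptotic regime
        s -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return s + log(x) - 0.5/x - inv2*(1/12 - inv2*(1/120 - inv2/252))

def phi(u):                      # test function (22)
    return 0.0 if u == 0.0 else 0.5 * u * u * (1.5 - log(u))

def Lt_phi_closed(x, t):         # formula (27)
    a = t * x
    return (1.5*a*a - 0.5*a - 1 + a*(a + 1)*(log(t) - digamma(a))) / (2*t*t)

def Lt_phi_numeric(x, t, n=200000, upper=60.0):
    """E phi(S(tx)/t) by the trapezoidal rule against the gamma(1, tx) pdf."""
    a = t * x
    h = upper / n
    total = 0.0
    for i in range(1, n):        # the integrand vanishes at both endpoints
        u = i * h
        total += phi(u / t) * exp((a - 1)*log(u) - u - lgamma(a))
    return total * h

x, t = 1.3, 4.0
print(Lt_phi_closed(x, t), Lt_phi_numeric(x, t))  # should agree closely
```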
In the next lemma, we will study the approximation properties of $L_t \phi$ to $\phi$. We will make use of the following inequalities for the psi function:
$$\frac{1}{2x} \le \log(x) - \Psi(x) \le \frac{1}{x}, \qquad x > 0; \tag{31}$$
$$\log(x) - \Psi(x) - \frac{1}{2x} \le \frac{1}{12x^2}, \qquad x > 0. \tag{32}$$
We can find (31) in [7], page 374, whereas (32) is an immediate consequence of the fact that the function
$$\Psi(x) - \log(x) + \frac{1}{2x} + \frac{1}{12x^2}$$
is completely monotonic (see [15], page 304) and thus non-negative.

Lemma 3.2.
Let $\phi$ be as defined in (22) and let $L_t$, $t > 0$, be as defined in (4). We have
$$\left\|L_t \phi(x) - \phi(x) + \frac{x\log x}{2t} + \frac{1}{3t^2}\right\| \le \frac{3}{8t^2}. \tag{33}$$

Proof.
Let $x > 0$ and $t > 0$. First, note that
$$\phi(x) = \frac{1}{2t^2}\left(\frac{3}{2}(tx)^2 - (tx)^2\log(tx) + (tx)^2\log(t)\right). \tag{34}$$
On the other hand,
$$\frac{x\log x}{2t} + \frac{1}{3t^2} = \frac{1}{2t^2}\left(tx\log(tx) - tx\log(t) + \frac{2}{3}\right). \tag{35}$$
Therefore, using Lemma 3.1, (34) and (35), we can write
$$L_t\phi(x) - \phi(x) + \frac{x\log x}{2t} + \frac{1}{3t^2} = \frac{1}{2t^2}\left((tx)^2\left(\log(tx) - \Psi(tx) - \frac{1}{2tx}\right) + tx\,(\log(tx) - \Psi(tx)) - \frac{1}{3}\right). \tag{36}$$
By (31), we have that $1/2 \le x(\log(x) - \Psi(x)) \le 1$, $x > 0$, and thus
$$\frac{1}{6} \le tx(\log(tx) - \Psi(tx)) - \frac{1}{3} \le \frac{2}{3}. \tag{37}$$
Thus, using (36), (37) and (32), we obtain (33). □

We are now in a position to state the following.
Theorem 3.1.
Let $g \in D_1$, as defined in (7), and let $L^{[2]}_t$, $t > 0$, be as defined in (6). We have
$$|L^{[2]}_t g(x) - g(x)| \le \frac{1}{6t^2}\,\|x g'''(x)\| + \frac{9}{16t^2}\,\|x^2 g^{iv}(x)\|, \qquad x \ge 0.$$

Proof.
We will first see that $g \in D_1$ implies that
$$\|x g'''(x)\| \le \|x^2 g^{iv}(x)\| < \infty. \tag{38}$$
To begin with, the fact that $\|x^2 g^{iv}(x)\| < \infty$ implies that $\lim_{x\to\infty} x^{\alpha} g^{iv}(x) = 0$ for all $0 < \alpha < 2$. By L'Hôpital's rule, we also have that $\lim_{x\to\infty} x^{\alpha} g'''(x) = 0$ for all $0 < \alpha < 1$, thus concluding that $\lim_{x\to\infty} g'''(x) = 0$. Using this fact, we can write
$$g'''(x) = -\int_x^\infty g^{iv}(u)\,du,$$
which easily implies (38), as
$$|x g'''(x)| \le x\int_x^\infty \frac{|u^2 g^{iv}(u)|}{u^2}\,du \le \|x^2 g^{iv}(x)\|.$$
Now, let $t > 0$ and let $L_t$ be as in (4). As a previous step, we will prove that
$$\left|L_t g(x) - g(x) - \frac{x g''(x)}{2t} - \frac{x g'''(x)}{3t^2}\right| \le \frac{3}{8t^2}\,\|x^2 g^{iv}(x)\|, \qquad x > 0. \tag{39}$$
To this end, let $x > 0$. Using a Taylor series expansion at the random point $u = S(tx)/t$ around $x$ and taking into account that $E(S(tx) - tx) = 0$, $E(S(tx) - tx)^2 = tx$ and $E(S(tx) - tx)^3 = 2tx$, we can write
$$L_t g(x) - g(x) = E g\!\left(\frac{S(tx)}{t}\right) - g(x) = \frac{x g''(x)}{2t} + \frac{x g'''(x)}{3t^2} + \frac{1}{6}\,E\!\int_x^{S(tx)/t} g^{iv}(\theta)\left(\frac{S(tx)}{t} - \theta\right)^3 d\theta. \tag{40}$$
Then, using (40), we get the bound
$$\left|L_t g(x) - g(x) - \frac{x g''(x)}{2t} - \frac{x g'''(x)}{3t^2}\right| = \frac{1}{6}\left|E\!\int_x^{S(tx)/t} g^{iv}(\theta)\left(\frac{S(tx)}{t} - \theta\right)^3 d\theta\right| \tag{41}$$
$$\le \frac{\|x^2 g^{iv}(x)\|}{6}\,E\!\int_{\min(x,\,S(tx)/t)}^{\max(x,\,S(tx)/t)} \left|\frac{S(tx)}{t} - \theta\right|^3 \frac{d\theta}{\theta^2} = \frac{\|x^2 g^{iv}(x)\|}{6}\,E\!\int_x^{S(tx)/t} \left(\frac{S(tx)}{t} - \theta\right)^3 \frac{d\theta}{\theta^2}.$$
Let $\phi(\cdot)$ be as in (22). Using (40) and (23), we have
$$L_t\phi(x) - \phi(x) + \frac{x\log x}{2t} + \frac{1}{3t^2} = \frac{1}{6}\,E\!\int_x^{S(tx)/t} \left(\frac{S(tx)}{t} - \theta\right)^3 \frac{d\theta}{\theta^2}. \tag{42}$$
Then, by (41) and (42), we can write
$$\left|L_t g(x) - g(x) - \frac{x g''(x)}{2t} - \frac{x g'''(x)}{3t^2}\right| \le \|x^2 g^{iv}(x)\|\left\|L_t\phi(x) - \phi(x) + \frac{x\log x}{2t} + \frac{1}{3t^2}\right\|.$$
Thus, (39) follows by applying Lemma 3.2.

Observe that in (39), the only term of order $1/t$ is the one involving the second derivative. By means of the operator $L^{[2]}_t$, as defined in (6), this term is eliminated. In fact, using (39) for $L_{2t}$ and $L_t$, we have
$$|L^{[2]}_t g(x) - g(x)| = \left|2(L_{2t} g(x) - g(x)) - (L_t g(x) - g(x))\right|$$
$$= \left|2\left(L_{2t} g(x) - g(x) - \frac{x g''(x)}{4t} - \frac{x g'''(x)}{12t^2}\right) - \left(L_t g(x) - g(x) - \frac{x g''(x)}{2t} - \frac{x g'''(x)}{3t^2}\right) - \frac{x g'''(x)}{6t^2}\right| \tag{43}$$
$$\le \frac{1}{6t^2}\,\|x g'''(x)\| + \frac{9}{16t^2}\,\|x^2 g^{iv}(x)\|.$$
This completes the proof of Theorem 3.1. □
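The acceleration step can be seen numerically (this illustration is ours). For $g(u) = e^{-u}$, the operator (4) is explicit: $L_t g(x) = E\,e^{-S(tx)/t} = (t/(t+1))^{tx}$, since integrating $e^{-s/t}$ against the gamma$(1, tx)$ density is the Laplace transform of $S(tx)$ at $1/t$. Then (6) is explicit as well, and one can watch the error drop from order $1/t$ to order $1/t^2$.

```python
from math import exp

# For g(u) = exp(-u), L_t g(x) = (t/(t+1))**(t*x) in closed form, so the
# convergence rates of L_t (order 1/t) and L^[2]_t (order 1/t^2) can be
# compared directly.

def L(x, t):       # L_t g(x) for g(u) = exp(-u), formula (4)
    return (t / (t + 1.0)) ** (t * x)

def L2(x, t):      # L^[2]_t g(x) = 2 L_{2t} g(x) - L_t g(x), formula (6)
    return 2.0 * L(x, 2 * t) - L(x, t)

x = 1.0
g = exp(-x)
for t in (10.0, 20.0, 40.0, 80.0):
    e1 = abs(L(x, t) - g)     # decays roughly like 1/t
    e2 = abs(L2(x, t) - g)    # decays roughly like 1/t^2
    print(t, e1, e2, e1 * t, e2 * t * t)
```

Doubling $t$ roughly halves the first error and roughly quarters the second, in line with Theorem 3.1.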
Finally, in the next result, we study the approximation properties of $M^{[2]}_t$.

Theorem 3.2.
Let $g \in D_2$, as defined in (9), and let $M^{[2]}_t$, $t > 0$, be as defined in (8). We have
$$\|M^{[2]}_t g - g\| \le \frac{1}{8t^2}\,\|g''(x)\| + \frac{1}{6t^2}\,\|x g'''(x)\| + \frac{9}{16t^2}\,\|x^2 g^{iv}(x)\|.$$

Proof.
First, note that $g \in D_2$ implies that $\|x g'''(x)\| < \infty$, thanks to (38). Now, let $t > 0$ and $x > 0$. By (8), we can write
$$M^{[2]}_t g(x) - g(x) = (tx-[tx])\left(L^{[2]}_t g\!\left(\frac{[tx]+1}{t}\right) - g\!\left(\frac{[tx]+1}{t}\right)\right) + ([tx]+1-tx)\left(L^{[2]}_t g\!\left(\frac{[tx]}{t}\right) - g\!\left(\frac{[tx]}{t}\right)\right)$$
$$+\, (tx-[tx])\left(g\!\left(\frac{[tx]+1}{t}\right) - g(x)\right) + ([tx]+1-tx)\left(g\!\left(\frac{[tx]}{t}\right) - g(x)\right). \tag{44}$$
Using the usual expansion
$$|g(y) - g(x) - (y-x)g'(x)| \le \frac{(y-x)^2}{2}\,\|g''\| \tag{45}$$
and taking into account that
$$(tx-[tx])\left(g\!\left(\frac{[tx]+1}{t}\right) - g(x)\right) + ([tx]+1-tx)\left(g\!\left(\frac{[tx]}{t}\right) - g(x)\right)$$
$$= (tx-[tx])\left(g\!\left(\frac{[tx]+1}{t}\right) - g(x) - \frac{[tx]+1-tx}{t}\,g'(x)\right) + ([tx]+1-tx)\left(g\!\left(\frac{[tx]}{t}\right) - g(x) - \frac{[tx]-tx}{t}\,g'(x)\right), \tag{46}$$
we obtain
$$\left|(tx-[tx])\left(g\!\left(\frac{[tx]+1}{t}\right) - g(x)\right) + ([tx]+1-tx)\left(g\!\left(\frac{[tx]}{t}\right) - g(x)\right)\right|$$
$$\le \left(\frac{(tx-[tx])([tx]+1-tx)^2}{2t^2} + \frac{([tx]+1-tx)([tx]-tx)^2}{2t^2}\right)\|g''\| = \frac{(tx-[tx])([tx]+1-tx)}{2t^2}\,\|g''\| \le \frac{1}{8t^2}\,\|g''\|, \tag{47}$$
the last inequality holding since, for each $k \in \mathbb{N}$, the supremum of $(u-k)(k+1-u)$, $k \le u \le k+1$, is attained at $u = k + 1/2$. On the other hand, taking into account Theorem 3.1, we have
$$\left|(tx-[tx])\left(L^{[2]}_t g\!\left(\frac{[tx]+1}{t}\right) - g\!\left(\frac{[tx]+1}{t}\right)\right) + ([tx]+1-tx)\left(L^{[2]}_t g\!\left(\frac{[tx]}{t}\right) - g\!\left(\frac{[tx]}{t}\right)\right)\right|$$
$$\le \|L^{[2]}_t g - g\| \le \frac{1}{6t^2}\,\|x g'''(x)\| + \frac{9}{16t^2}\,\|x^2 g^{iv}(x)\|. \tag{48}$$
The result follows by (44), (47) and (48). □
4. Application to gamma distributions
In this section, we will study the case of gamma distributions, that is, distributions with density functions as given in (3). It is not hard to see that these distributions are in the class $D_2$ for a shape parameter $p = 1$ or $p \ge 2$ and, therefore, we are in a position to apply Theorem 3.2. The aim of this section is to show that, in fact, the bounds in this theorem can be uniformly bounded on the shape parameter, which will be an advantage when dealing with mixtures of these distributions. From now on, we define
$$f_p(x) := \begin{cases} \dfrac{e^{-x} x^{p-1}}{\Gamma(p)}, & x > 0, \text{ if } p \in \mathbb{R}\setminus\{0,-1,-2,\dots\}, \\[6pt] 0, & x > 0, \text{ if } p \in \{0,-1,-2,\dots\}. \end{cases} \tag{49}$$
The 'odd' definition of $f_p$ for $p \in \{0,-1,-2,\dots\}$ is for notational convenience in (51). For $p > 0$, the function above is the density of a gamma random variable as in (3), with scale parameter $a = 1$. Results for another scale parameter will follow by a change of scale (see Proposition 5.1 below). First, we will consider the case $p = 1$, that is, an exponential random variable. As the distribution function of this random variable presents no computational problems, it makes no sense to approximate it. However, when we consider the problem of approximating a general mixture of gamma distributions, the exponential distribution could be a component.
Lemma 4.1.
Let $F(x) = 1 - e^{-x}$, $x \ge 0$. For $t > 0$, let $M^{[2]}_t F$ be as defined in (8). We have that
$$\|M^{[2]}_t F - F\| \le \left(\frac{1}{8} + \frac{1}{6e} + \frac{9}{4e^2}\right)\frac{1}{t^2}.$$
First of all, note that $|F^{(k)}(x)| = e^{-x}$ for $k \ge 1$ and that $\sup_{x\ge 0} x^k e^{-x} = k^k e^{-k}$, $k = 1, 2, \dots$. Thus, we have
$$\|F''\| = 1, \qquad \|x F'''(x)\| = e^{-1} \qquad \text{and} \qquad \|x^2 F^{iv}(x)\| = 4e^{-2}. \tag{50}$$
The conclusion follows by taking into account Theorem 3.2. □

We will now deal with the case $p \ge 2$.

Lemma 4.2.
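Lemma 4.1 can be checked numerically (a sketch of ours): for the unit exponential, summing the pmf (16) gives $L^*_t F(k/t) = 1 - (t/(t+1))^{k+1}$, and $M^{[2]}_t F$ is then fully explicit through (10) and (11).

```python
from math import exp

# Numerical check of Lemma 4.1 for the unit exponential F(x) = 1 - exp(-x).

def Lstar(k, t):              # L*_t F(k/t), from summing the geometric pmf (16)
    return 1.0 - (t / (t + 1.0)) ** (k + 1)

def M2_grid(k, t):            # formula (10)
    return 0.0 if k == 0 else 2.0 * Lstar(2*k - 1, 2*t) - Lstar(k - 1, t)

def M2(x, t):                 # formula (11): linear interpolation on the grid
    k = int(t * x)
    w = t * x - k
    return w * M2_grid(k + 1, t) + (1.0 - w) * M2_grid(k, t)

bound_const = 1.0/8 + 1.0/(6*exp(1)) + 9.0/(4*exp(2))   # ~ 0.49

for t in (2.0, 5.0, 10.0):
    # crude sup of the Kolmogorov distance over a fine grid
    err = max(abs(M2(x, t) - (1.0 - exp(-x)))
              for x in [i * 0.001 for i in range(1, 20000)])
    print(t, err, bound_const / t**2, err <= bound_const / t**2)
```

The observed sup-distance stays below the stated bound and shrinks by roughly a factor of four when $t$ doubles.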
Let $f_p(\cdot)$, $p > 0$, be as defined in (49). We have, for all $n \in \mathbb{N}$,
$$\frac{d^n}{dx^n} f_p(x) = \frac{e^{-x} x^{p-n-1}}{\Gamma(p)} \sum_{i=0}^n \binom{n}{i} (-1)^i \left(\prod_{j=1}^{n-i} (p-j)\right) x^i = \sum_{i=0}^n \binom{n}{i} (-1)^i f_{p-n+i}(x), \qquad x > 0, \tag{51}$$
in which $\prod_{j=1}^{0}(p-j) = 1$.

Next, we formulate a technical lemma in which we define certain decreasing functions which will be used to bound the weighted derivatives of $f_p$.

Lemma 4.3.
We have:

(i) the function
$$g_1(p) := \frac{1}{\Gamma(p)}\, e^{-(p-1)} (p-1)^{p-1}, \qquad p > 1 \quad (g_1(1) = 1), \tag{52}$$
is decreasing in $p$;

(ii) the function
$$g_2(p) := \frac{1}{\Gamma(p)}\, e^{-(p-1/2)-(1/2)\sqrt{4p-3}} \left(p - \frac{1}{2} + \frac{1}{2}\sqrt{4p-3}\right)^{p-1/2}, \qquad p \ge 1, \tag{53}$$
is decreasing in $p$;

(iii) the function
$$g_3(p) := \frac{1}{\Gamma(p)}\, e^{-(p-1-\sqrt{p-1})}\, \sqrt{p-1}\, \bigl(p-1-\sqrt{p-1}\bigr)^{p-2}, \qquad p > 2 \quad (g_3(2) = 1), \tag{54}$$
is decreasing in $p$;

(iv) the function
$$g_4(p) := \frac{1}{\Gamma(p)}\, e^{-(p-\sqrt{3p-2})} \bigl(\sqrt{3p-2}-1\bigr)^3 \bigl(p-\sqrt{3p-2}\bigr)^{p-2}, \qquad p > 2 \quad (g_4(2) = 1), \tag{55}$$
is decreasing in $p$.

In the following result, we obtain bounds for the quantities required in Theorem 3.2, depending on the shape parameter $p$, but also decreasing in $p$.

Lemma 4.4.
Let $f_p$ be as in (49) and let $g_i$, $i = 1, 2, 3, 4$, be as in Lemma 4.3. We have:

(i) $\sup_{x\ge 0} |f_p(x)| = g_1(p)$, $p \ge 1$;
(ii) $\sup_{x\ge 0} |x f'_p(x)| = g_2(p)$, $p \ge 1$;
(iii) $\sup_{x\ge 0} |f'_p(x)| = g_3(p)$, $p \ge 2$;
(iv) $\sup_{x\ge 0} |x f''_p(x)| \le \max\{g_1(p-1),\, g_2(p-1)\}$, $p \ge 2$;
(v) $\sup_{x\ge 0} |x^2 f'''_p(x)| \le g_4(p) + 3 g_2(p-1) + g_1(p-1)$, $p \ge 2$.

Proof.
To show part (i), it is clear that, for $p \ge 1$,
$$\sup_{x\ge 0} f_p(x) = f_p(p-1) = \frac{e^{-(p-1)}(p-1)^{p-1}}{\Gamma(p)},$$
and (i) follows by recalling (52). To show part (ii), we have (see [16], Remark 3.2 and Lemma 5.2)
$$\sup_{x\ge 0} |x f'_p(x)| = \frac{1}{\Gamma(p)}\left(p - \frac{1}{2} + \frac{1}{2}\sqrt{4p-3}\right)^{p-1/2} e^{-(p-1/2)-(1/2)\sqrt{4p-3}}, \qquad p \ge 1, \tag{56}$$
and (ii) follows by recalling (53). To show part (iii), by (51), we have, for $p \ge 2$,
$$f'_p(x) = \frac{1}{\Gamma(p)}\,e^{-x}x^{p-2}(p-1-x), \qquad x > 0, \tag{57}$$
$$f''_p(x) = \frac{1}{\Gamma(p)}\,e^{-x}x^{p-3}\bigl((p-1)(p-2) - 2(p-1)x + x^2\bigr), \qquad x > 0, \tag{58}$$
and it can be easily checked that the zeros of $f''_p(x)$ are $p_1 := p-1-\sqrt{p-1}$ and $p_2 := p-1+\sqrt{p-1}$. Therefore, $|f'_p(x)|$ must attain its maximum value at either $p_1$ or $p_2$. Actually, $p_1$ corresponds to the maximum. To show that, we will see that
$$\frac{f'_p(p_1)}{|f'_p(p_2)|} = e^{2\sqrt{p-1}}\left(\frac{\sqrt{p-1}-1}{\sqrt{p-1}+1}\right)^{p-2} \ge 1, \qquad p \ge 2. \tag{59}$$
To show the last inequality in (59), taking logarithms, we will prove that
$$r(p) := 2\sqrt{p-1} + (p-2)\bigl(\log(\sqrt{p-1}-1) - \log(\sqrt{p-1}+1)\bigr) \ge 0, \qquad p > 2. \tag{60}$$
Define
$$\rho(b) := \frac{2b}{b^2-1} + \log(b-1) - \log(b+1), \qquad b > 1.$$
Note that
$$r(p) = (p-2)\,\rho(\sqrt{p-1}), \qquad p > 2. \tag{61}$$
We will first prove that
$$\rho(b) \ge 0, \qquad b > 1. \tag{62}$$
To show (62), it is readily seen that $\rho'(b) = -4(b^2-1)^{-2}$, $b > 1$, so that $\rho$ is decreasing. As $\lim_{b\to\infty}\rho(b) = 0$, we have (62). This also implies (60), recalling (61). Therefore, we conclude that
$$\sup_{x>0}|f'_p(x)| = f'_p(p_1) = \frac{1}{\Gamma(p)}\,e^{-(p-1-\sqrt{p-1})}\,\sqrt{p-1}\,\bigl(p-1-\sqrt{p-1}\bigr)^{p-2}, \tag{63}$$
which, together with (54), shows (iii).

To show part (iv), note that, by using (51), we can write $f'_p(x) = f_{p-1}(x) - f_p(x)$ and, therefore,
$$x f''_p(x) = x f'_{p-1}(x) - x f'_p(x), \qquad x > 0,\ p \ge 2. \tag{64}$$
On the other hand, we see in (57) that $f'_{p-1}(x)$ and $f'_p(x)$ have the same sign for $0 < x < p-2$ and for $x > p-1$. Therefore, for $x$ outside $[p-2, p-1]$, (64) and part (ii) yield
$$|x f''_p(x)| \le \max\Bigl\{\sup_{x\ge 0}|x f'_{p-1}(x)|,\ \sup_{x\ge 0}|x f'_p(x)|\Bigr\} \le g_2(p-1), \tag{65}$$
the last inequality holding because $g_2$ is decreasing. For $x \in [p-2, p-1]$, we write, using (58),
$$|x f''_p(x)| = \frac{1}{\Gamma(p)}\,e^{-x}x^{p-2}\,\bigl|x^2 - 2(p-1)x + (p-1)(p-2)\bigr|. \tag{66}$$
On $[p-2, p-1]$, the function $e^{-x}x^{p-2}$ attains its maximum value at $p-2$, whereas the quadratic factor, whose roots $p-1\pm\sqrt{p-1}$ lie outside $(p-2, p-1)$, satisfies
$$\bigl|x^2 - 2(p-1)x + (p-1)(p-2)\bigr| \le p-1, \qquad x \in [p-2, p-1], \tag{67}$$
its maximum absolute value being attained at $p-1$. From (66) and (67), we conclude that
$$\sup_{x\in[p-2,\,p-1]}|x f''_p(x)| \le \frac{1}{\Gamma(p)}\,e^{-(p-2)}(p-2)^{p-2}(p-1) = g_1(p-1), \tag{68}$$
where the last equality follows by recalling (52). Thus, (65) and (68) conclude the proof of part (iv).

To show part (v), let $p \ge 2$. First, we have, by (51),
$$f'''_p(x) = f_{p-3}(x) - 3f_{p-2}(x) + 3f_{p-1}(x) - f_p(x) = \frac{e^{-x}x^{p-4}}{\Gamma(p)}\bigl((p-1)(p-2)(p-3) - 3(p-1)(p-2)x + 3(p-1)x^2 - x^3\bigr) \tag{69}$$
$$= \frac{e^{-x}x^{p-4}}{\Gamma(p)}\bigl((p-1-x)^3 + 3(p-1)x - 3(p-1)^2 + 2(p-1)\bigr), \qquad x > 0.$$
Therefore, if we call
$$h_p(x) := \frac{e^{-x}x^{p-2}}{\Gamma(p)}\,(p-1-x)^3, \qquad x > 0,$$
we have, recalling (57),
$$x^2 f'''_p(x) = h_p(x) - 3x f'_{p-1}(x) - f_{p-1}(x), \qquad x > 0. \tag{70}$$
We will first see that
$$\sup_{x\ge 0}|h_p(x)| = g_4(p), \tag{71}$$
with $g_4(\cdot)$ as defined in (55). Note that
$$h'_p(x) = \frac{e^{-x}x^{p-3}}{\Gamma(p)}\,(p-1-x)^2\bigl(x^2 - 2px + (p-1)(p-2)\bigr), \qquad x > 0.$$
The maximum value of $|h_p|$ will be attained at one of the roots of the last polynomial, namely $p_1 := p + \sqrt{3p-2}$ and $p_2 := p - \sqrt{3p-2}$. To check which value attains the maximum, define $u := \sqrt{3p-2}$. Note that $p_1 = (u+1)(u+2)/3$ and $p_2 = (u-1)(u-2)/3$. Then, with this notation, we will prove that
$$\frac{|h_p(p_2)|}{|h_p(p_1)|} = e^{2u}\left(\frac{(u-1)(u-2)}{(u+1)(u+2)}\right)^{(u^2-4)/3}\left(\frac{u-1}{u+1}\right)^3 \ge 1, \qquad u > 2. \tag{72}$$
To show the last inequality in (72), taking logarithms, we will show that
$$\rho_1(u) := 2u + \frac{u^2-4}{3}\log\left(\frac{(u-1)(u-2)}{(u+1)(u+2)}\right) + 3\log\left(\frac{u-1}{u+1}\right) \ge 0, \qquad u > 2. \tag{73}$$
Note that
$$\rho_1'(u) = 2 + \frac{2u}{3}\log\left(\frac{(u-1)(u-2)}{(u+1)(u+2)}\right) + \frac{u^2-4}{3}\left(\frac{1}{u-1} + \frac{1}{u-2} - \frac{1}{u+1} - \frac{1}{u+2}\right) + 3\left(\frac{1}{u-1} - \frac{1}{u+1}\right)$$
$$= \frac{4u^2}{u^2-1} + \frac{2u}{3}\log\left(\frac{(u-1)(u-2)}{(u+1)(u+2)}\right), \qquad u > 2.$$
We will show that $\rho_1'(u) \le 0$, $u > 2$. In fact,
$$\frac{d}{du}\left(\frac{3}{2u}\,\rho_1'(u)\right) = \frac{36}{(u+1)^2(u-1)^2(u+2)(u-2)} \ge 0, \qquad u > 2,$$
and then $3(2u)^{-1}\rho_1'(u)$ is increasing. As $\lim_{u\to\infty} 3(2u)^{-1}\rho_1'(u) = 0$, we conclude that $3(2u)^{-1}\rho_1'(u) \le 0$ and, therefore, $\rho_1'(u) \le 0$. Therefore, $\rho_1(u)$ is decreasing. This, together with the fact that $\lim_{u\to\infty}\rho_1(u) = 0$, proves (73) and therefore (72). Then, $\|h_p\| = |h_p(p_2)| = g_4(p)$, thus proving (71). The proof of part (v) now follows easily by recalling (70) and using (71), the triangle inequality and parts (i) and (ii) applied to $p-1$. □

As an immediate consequence of Theorem 3.2 and Lemma 4.4, we have the following corollary.
Corollary 4.1.
Let $F_p$ be a gamma distribution function with shape parameter $p \ge 2$, that is, whose density function is given by (49). Let $M^{[2]}_t$, $t > 0$, be defined as in (8). We have
$$\|M^{[2]}_t F_p - F_p\| \le \left(\frac{1}{8} + \frac{1}{6} + \frac{9}{16}\left(2 + \frac{3}{e}\right)\right)\frac{1}{t^2} \approx \frac{2.04}{t^2}.$$

Proof.
Let $p \ge 2$. Note that $F'_p = f_p$, as defined in (49). Therefore, by Lemma 4.4(iii) and Lemma 4.3(iii), we have that
$$\|F''_p\| = \|f'_p\| = g_3(p) \le g_3(2) = 1. \tag{74}$$
On the other hand, by Lemma 4.3(i) and (ii), we have that
$$g_1(p-1) \le g_1(1) = 1 \qquad \text{and} \qquad g_2(p-1) \le g_2(1) = e^{-1}, \qquad p \ge 2. \tag{75}$$
Thus, using the above inequalities and Lemma 4.4(iv), we have
$$\|x F'''_p(x)\| = \|x f''_p(x)\| \le 1. \tag{76}$$
Finally, by Lemma 4.4(v) and (75),
$$\|x^2 F^{iv}_p(x)\| = \|x^2 f'''_p(x)\| \le g_4(2) + 3g_2(1) + g_1(1) = 2 + 3e^{-1}. \tag{77}$$
Using (74), (76), (77) and Theorem 3.2, we obtain the result. This completes the proof of Corollary 4.1. □
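Corollary 4.1 can also be illustrated numerically (a sketch of ours). For a gamma distribution with $a = 1$ and integer shape $p$ (an Erlang distribution), $\phi_X(t) = (1+t)^{-p}$, and differentiating shows that (1) gives a negative binomial pmf on the lattice $\{k/t\}$, so $L^*_t F_p$, and hence $M^{[2]}_t F_p$ via (10)–(11), is computable exactly.

```python
from math import comb, exp, factorial

# For an Erlang distribution (a = 1, integer p), formula (1) yields
#   P(X*_t = k/t) = C(p+k-1, k) * (1/(1+t))**p * (t/(1+t))**k.

def Lstar(k, t, p):           # L*_t F_p(k/t), formula (2)
    q = t / (1.0 + t)
    return sum(comb(p + j - 1, j) * (1 - q)**p * q**j for j in range(k + 1))

def M2(x, t, p):              # formulas (10) and (11)
    def grid(k):
        return 0.0 if k == 0 else 2*Lstar(2*k - 1, 2*t, p) - Lstar(k - 1, t, p)
    k = int(t * x)
    w = t * x - k
    return w * grid(k + 1) + (1 - w) * grid(k)

def F(x, p):                  # exact Erlang cdf
    return 1.0 - exp(-x) * sum(x**m / factorial(m) for m in range(p))

t = 5.0
for p in (2, 3, 8):
    err = max(abs(M2(x, t, p) - F(x, p)) for x in [i * 0.02 for i in range(1, 750)])
    print(p, err, 2.04 / t**2, err <= 2.04 / t**2)
```

The observed errors stay below the uniform bound $\approx 2.04/t^2$ for every shape parameter tried, as the corollary predicts.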
5. Applications to mixtures of Erlang distributions and phase-type distributions
In this section, we apply the results from the previous section to mixtures of Erlang distributions and to random sums thereof. In order to undertake this study for an arbitrary scale parameter, we need the following result, which shows the behavior of $M^{[2]}_t F$ under changes of scale.

Proposition 5.1.
Let $X$ be a random variable with distribution function $F$. For a given $c > 0$, denote by $F_c$ the distribution function of $cX$. Let $M^{[2]}_t F$ and $M^{[2]}_t F_c$, $t > 0$, be the respective approximations for $F$ and $F_c$, as defined in (8). We have that
$$M^{[2]}_t F_c(x) = M^{[2]}_{ct} F(x/c), \qquad x \ge 0. \tag{78}$$
Therefore,
$$\|M^{[2]}_t F_c - F_c\| = \|M^{[2]}_{ct} F - F\|. \tag{79}$$

Proof.
Let \(t > 0\) and \(c > 0\). We first prove that
\[
M^{[2]}_t F_c\!\left(\frac{k}{t}\right) = M^{[2]}_{ct} F\!\left(\frac{k}{ct}\right), \qquad k \in \mathbb{N}, \qquad (80)
\]
that is, that (78) is satisfied for points in the set \(\{k/t,\ k \in \mathbb{N}\}\). To this end, we use (12) and (6), and take into account that
\[
F_c(x) = F(x/c), \qquad x \ge 0, \qquad (81)
\]
to write, for all \(k \in \mathbb{N}\),
\[
M^{[2]}_t F_c\!\left(\frac{k}{t}\right) = 2\,\mathbb{E}F_c\!\left(\frac{S(2k)}{2t}\right) - \mathbb{E}F_c\!\left(\frac{S(k)}{t}\right) \qquad (82)
\]
\[
= 2\,\mathbb{E}F\!\left(\frac{S(2k)}{2ct}\right) - \mathbb{E}F\!\left(\frac{S(k)}{ct}\right) = M^{[2]}_{ct} F\!\left(\frac{k}{ct}\right),
\]
thus proving (80). For a general \(x > 0\), we use (8) and (80) to see that
\[
M^{[2]}_t F_c(x) = (tx - [tx])\, M^{[2]}_t F_c\!\left(\frac{[tx]+1}{t}\right) + ([tx]+1-tx)\, M^{[2]}_t F_c\!\left(\frac{[tx]}{t}\right)
\]
\[
= (tx - [tx])\, M^{[2]}_{ct} F\!\left(\frac{[tx]+1}{ct}\right) + ([tx]+1-tx)\, M^{[2]}_{ct} F\!\left(\frac{[tx]}{ct}\right) = M^{[2]}_{ct} F\!\left(\frac{x}{c}\right),
\]
the last equality being trivial, as \(tx = (ct)(x/c)\). This concludes the proof of (78). Finally, (79) follows easily from (78) and (81), as we have
\[
\sup_{x>0} | M^{[2]}_t F_c(x) - F_c(x) | = \sup_{x>0} | M^{[2]}_{ct} F(x/c) - F(x/c) |.
\]
This concludes the proof of Proposition 5.1. □
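For a concrete check of (78), the exponential case is convenient, since \(\mathbb{E}F(S(k)/t)\) is explicit: if \(F\) is the \(\mathrm{Exp}(\lambda)\) distribution function, then \(\mathbb{E}F(S(k)/t) = 1 - (t/(t+\lambda))^k\), by the Laplace transform of the gamma law. The following sketch is ours (the function name is illustrative); it builds \(M^{[2]}_t F\) from the nodes in (82) and the interpolation in (8).

```python
import math

def M2_exp(lam, t, x):
    """M_t^{[2]} F(x) for F the Exp(rate lam) cdf.
    Uses E[F(S(k)/t)] = 1 - (t/(t+lam))**k (Laplace transform of Gamma(k,1)),
    the extrapolated nodes of (82) and the linear interpolation of (8)."""
    def node(k):
        return 2 * (1 - (2 * t / (2 * t + lam))**(2 * k)) - (1 - (t / (t + lam))**k)
    k, w = int(t * x), t * x - int(t * x)
    return w * node(k + 1) + (1 - w) * node(k)
```

Taking \(X \sim \mathrm{Exp}(1)\) and \(c = 2\) (so that \(cX \sim \mathrm{Exp}(1/2)\)), the quantities `M2_exp(0.5, t, x)` and `M2_exp(1.0, c*t, x/c)` agree to floating-point accuracy, as (78) predicts.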
As an application of the results in the previous section, we will consider the class of (possibly infinite) mixtures of Erlang distributions recently studied by Willmot and Woo (see [19]). More specifically, let \(F_{a,j}\), \(a > 0\), \(j \in \mathbb{N}^*\), be the distribution function corresponding to the density \(f_{a,j}\) given in (3) (an Erlang-\(j\) distribution with scale parameter \(a\)). We will consider a finite number of scale parameters arranged in increasing order (\(0 < a_1 < \cdots < a_n\)) and a set of non-negative numbers \(p_{ij}\), \(i = 1, \dots, n\), \(j = 1, 2, \dots\), such that \(\sum_{i=1}^{n} \sum_{j=1}^{\infty} p_{ij} = p \le 1\), and define the class of distribution functions \(\mathrm{ME}(a_1, \dots, a_n)\) given as
\[
F(x) = (1-p) + \sum_{i=1}^{n} \sum_{j=1}^{\infty} p_{ij} F_{a_i,j}(x), \qquad x \ge 0 \qquad (83)
\]
(such a distribution has a point mass at 0 of magnitude \(1-p\)). Based on [19], page 103, we can alternatively write (83) by using only the maximum of the scale parameters, that is,
\[
F(x) = (1-p) + \sum_{j=1}^{\infty} p_j F_{a_n,j}(x), \qquad x \ge 0. \qquad (84)
\]
Moreover, the class (84) is a wide class containing many of the distributions considered in applied probability, such as (obviously) finite mixtures of Erlang distributions, but also the class of phase-type distributions (see Proposition 5.3 below). Every random variable having a representation as in (83) can be approximated by means of \(M^{[2]}_t\), as shown in the following result.

Proposition 5.2.
Let \(F\) be a distribution function of the form \(\mathrm{ME}(a_1, \dots, a_n)\), \(0 < a_1 < \cdots < a_n\), as in (83). Let \(M^{[2]}_t\), \(t > 0\), be defined as in (8). We have
\[
\| M^{[2]}_t F - F \| \le C \sum_{i=1}^{n} \Bigl( \sum_{j=1}^{\infty} p_{ij} \Bigr) \Bigl( \frac{a_i}{t} \Bigr)^{2} \qquad (85)
\]
for an absolute constant \(C\).

Proof.
Let \(t > 0\) and \(0 < a_1 < \cdots < a_n\) be fixed. The linearity of \(M^{[2]}_t\) yields
\[
M^{[2]}_t F(x) = (1-p) + \sum_{i=1}^{n} \sum_{j=1}^{\infty} p_{ij}\, M^{[2]}_t F_{a_i,j}(x), \qquad x \ge 0. \qquad (86)
\]
By Corollary 4.1, we can write, for scale parameter 1,
\[
\| M^{[2]}_t F_{1,j} - F_{1,j} \| \le \frac{C}{t^{2}}, \qquad j = 2, 3, \dots. \qquad (87)
\]
Moreover, using Lemma 4.1, we have
\[
\| M^{[2]}_t F_{1,1} - F_{1,1} \| \le \frac{C}{t^{2}}, \qquad (88)
\]
enlarging the constant \(C\) if necessary. Now, let the general scale parameters be \(a_i\), \(i = 1, \dots, n\). We use the fact that, given \(X\), a gamma random variable with scale parameter 1, \(X/a_i\) is a gamma random variable with scale parameter \(a_i\) and, therefore, using Proposition 5.1, (87) and (88), we have, for each \(a_i\), \(i = 1, \dots, n\), and \(j \in \mathbb{N}^*\),
\[
\| M^{[2]}_t F_{a_i,j} - F_{a_i,j} \| = \| M^{[2]}_{t/a_i} F_{1,j} - F_{1,j} \| \le C \Bigl( \frac{a_i}{t} \Bigr)^{2}. \qquad (89)
\]
Thus, using (86) and (89), we have
\[
\| M^{[2]}_t F - F \| \le \sum_{i=1}^{n} \sum_{j=1}^{\infty} p_{ij}\, \| M^{[2]}_t F_{a_i,j} - F_{a_i,j} \| \le C \sum_{i=1}^{n} \Bigl( \sum_{j=1}^{\infty} p_{ij} \Bigr) \Bigl( \frac{a_i}{t} \Bigr)^{2}. \qquad (90)
\]
This completes the proof of Proposition 5.2. □
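Distribution functions in the class (83) are straightforward to evaluate in closed form, which makes the scale-dependent factor in the bound of Proposition 5.2 easy to monitor in practice. The following is a minimal sketch of ours (the naming is illustrative; the dictionary maps a pair \((a_i, j)\) to the weight \(p_{ij}\)):

```python
import math

def erlang_cdf(a, j, x):
    """F_{a,j}(x) = 1 - exp(-a*x) * sum_{m=0}^{j-1} (a*x)^m / m!  (Erlang-j, scale a)."""
    if x <= 0:
        return 0.0
    s, term = 1.0, 1.0
    for m in range(1, j):
        term *= a * x / m
        s += term
    return 1.0 - math.exp(-a * x) * s

def me_cdf(weights, x):
    """Cdf of an ME(a_1,...,a_n) mixture as in (83); mass 1 - p sits at 0."""
    p = sum(weights.values())
    return (1.0 - p) + sum(pij * erlang_cdf(a, j, x)
                           for (a, j), pij in weights.items())

def rate_factor(weights, t):
    """Scale-dependent factor sum_i (sum_j p_ij)(a_i/t)^2 of (85), up to the constant."""
    return sum(pij * (a / t)**2 for (a, j), pij in weights.items())
```

For instance, a mixture placing weight 0.7 on an exponential with scale 1 has total mass \(p = 0.7\) and hence a point mass 0.3 at the origin.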
As a consequence of the previous result, we can provide error bounds for compound distributions (that is, distribution functions of random sums, as in (14)) when the summands are mixtures of Erlang distributions, as stated in the following result.
Corollary 5.1.
Let \(G\) be the distribution function of a random sum, as in (14), in which the sequence \((X_i)_{i \in \mathbb{N}^*}\) has common distribution \(\mathrm{ME}(a_1, \dots, a_n)\), \(0 < a_1 < \cdots < a_n\), as defined in (83). Let \(M^{[2]}_t\) be as in (8). We have that
\[
\| M^{[2]}_t G - G \| \le C\, (1 - G(0)) \Bigl( \frac{a_n}{t} \Bigr)^{2},
\]
with \(C\) as in Proposition 5.2.
Proof.
The proof is immediate, taking into account that a mixture of Erlang distributions \(\mathrm{ME}(a_1, \dots, a_n)\), \(0 < a_1 < \cdots < a_n\), can be expressed as in (84) and that compound distributions of these random variables are also mixtures of Erlang distributions (see [19], page 106, with a slight modification in the coefficients, as we allow a point mass at 0), that is, we can write
\[
G(x) = q_0 + \sum_{j=1}^{\infty} q_j F_{a_n,j}(x), \qquad x \ge 0,
\]
in which \(\{ q_j,\ j = 0, 1, \dots \}\) form a probability mass function (obviously, \(q_0 = G(0)\)). The result follows using the above expression and Proposition 5.2. □

The class of phase-type distributions, of great importance in applied probability, can be expressed in terms of mixtures of Erlang distributions. A phase-type distribution is defined as the distribution of the time until absorption in a continuous-time Markov chain with one absorbing state (see, for instance, [12], Chapter II, or [8], Chapter VIII, and the references therein). A phase-type distribution can be expressed in terms of a matrix exponential as follows. Consider a vector \(\alpha = (\alpha_1, \dots, \alpha_n)\) of non-negative numbers such that \(\alpha_1 + \cdots + \alpha_n \le 1\). Let \(A\) be an \(n \times n\) matrix with negative diagonal entries, non-negative off-diagonal entries and non-positive row sums. A non-negative random variable \(X\) has a phase-type distribution \(\mathrm{PH}(\alpha, A)\) if its distribution function can be written as
\[
F(x) = 1 - \alpha\, \mathrm{e}^{xA} \mathbf{1}', \qquad x \ge 0,
\]
in which \(\mathbf{1}'\) represents the transpose of the \(n\)-dimensional vector \(\mathbf{1} = (1, \dots, 1)\). The distribution is absolutely continuous when \(\alpha_1 + \cdots + \alpha_n = 1\) and has positive mass at 0 (of magnitude \(1 - (\alpha_1 + \cdots + \alpha_n)\)) when \(\alpha_1 + \cdots + \alpha_n < 1\). Phase-type distributions have been extensively studied from both theoretical and practical points of view. For instance, it is well known that phase-type distributions have rational Laplace transforms, thus allowing numerical computation using our approximation procedures. Also, in the next proposition, we will give an expression of phase-type distributions in terms of mixtures of Erlang distributions. This, together with Proposition 5.2, provides our approximations with rates of convergence. The proof of the next result is based on the following property of phase-type distributions, due to Maier (see [13], page 591). Let \(f\) be the density of an absolutely continuous phase-type distribution. There exists some \(c > 0\) such that
\[
c_j := \frac{\mathrm{d}^j}{\mathrm{d}x^j} \bigl( \mathrm{e}^{cx} f(x) \bigr) \Big|_{x=0} > 0, \qquad j \in \mathbb{N}. \qquad (91)
\]
We are now in a position to state the following.

Proposition 5.3.
Let \(F\) be a phase-type distribution \(\mathrm{PH}(\alpha, A)\) with \(\alpha_1 + \cdots + \alpha_n > 0\). Let \(c > 0\) be such that the absolutely continuous part of \(F\) satisfies property (91). Then, \(F\) can be expressed as a mixture of Erlang distributions, that is,
\[
F(x) = p_0 + \sum_{j=1}^{\infty} p_j F_{c,j}(x), \qquad x \ge 0, \qquad (92)
\]
in which \(p_0 = 1 - (\alpha_1 + \cdots + \alpha_n)\).

Proof.
To prove (92), assume first that \(F\) is absolutely continuous, that is, that \(\alpha_1 + \cdots + \alpha_n = 1\). Its density is then given by \(f(x) = -\alpha\, \mathrm{e}^{xA} A \mathbf{1}'\), \(x > 0\). We choose \(c > 0\) satisfying (91) and write
\[
\mathrm{e}^{cx} f(x) = -\alpha\, \mathrm{e}^{x(cI + A)} A \mathbf{1}', \qquad x \ge 0. \qquad (93)
\]
It can be easily checked that the function \(-\alpha\, \mathrm{e}^{x(cI + A)} A \mathbf{1}'\), \(x \in \mathbb{R}\), is analytic on \(\mathbb{R}\), so if we consider the Taylor series expansion of this function around 0 and take into account (91) and (93), we have
\[
\mathrm{e}^{cx} f(x) = \sum_{j=0}^{\infty} \frac{c_j x^j}{j!}, \qquad x > 0,
\]
from which we can write (recall (3))
\[
f(x) = \sum_{j=0}^{\infty} \frac{c_j}{c^{j+1}}\, \frac{c^{j+1} x^j \mathrm{e}^{-cx}}{j!} = \sum_{j=0}^{\infty} \frac{c_j}{c^{j+1}}\, f_{c,j+1}(x), \qquad x > 0,
\]
and, in this way, we obtain the expression of \(f\) in terms of a mixture of Erlang densities with scale parameter \(c\) (by construction, the coefficients are non-negative and, integrating both sides of the above expression, we see that their sum is 1). As a consequence, we can write
\[
F(x) = \sum_{j=1}^{\infty} \frac{c_{j-1}}{c^{j}}\, F_{c,j}(x), \qquad x \ge 0, \qquad (94)
\]
thus having expressed \(F\) as a mixture of Erlang distributions, as in (92). Now, assume that \(0 < \alpha_1 + \cdots + \alpha_n < 1\). This means that \(F\) has a point mass at 0 of magnitude \(p_0 := 1 - (\alpha_1 + \cdots + \alpha_n)\). The absolutely continuous part of \(F\) (denoted \(F_{\mathrm{ac}}\)) is a phase-type distribution \(\mathrm{PH}(\bar\alpha, A)\) with \(\bar\alpha = (\alpha_1 + \cdots + \alpha_n)^{-1} \alpha\). Let \(c > 0\) be such that \(F_{\mathrm{ac}}\) verifies property (91). We can write, thanks to (94),
\[
F(x) = p_0 + (1 - p_0) F_{\mathrm{ac}}(x) = p_0 + \sum_{j=1}^{\infty} (1 - p_0)\, \frac{c_{j-1}}{c^{j}}\, F_{c,j}(x), \qquad x \ge 0.
\]
This completes the proof of Proposition 5.3. □
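The construction in Proposition 5.3 is effective: the coefficients \(c_j\) in (91) can be generated from any representation \(\mathrm{PH}(\alpha, A)\) via (93), since \(c_j = -\alpha (cI + A)^j A \mathbf{1}'\). The sketch below is ours; the function name and the two-state example are illustrative, with \(c = \|A\|\) chosen as in Remark 5.1 below.

```python
def ph_to_erlang_weights(alpha, A, c, n_terms):
    """Weights p_j, j = 1..n_terms, of the Erlang mixture (92) for PH(alpha, A):
    c_j = -alpha (cI + A)^j A 1'  (Taylor coefficients of e^{cx} f(x), eq. (93)),
    and p_j = c_{j-1} / c^j."""
    n = len(alpha)
    B = [[c * (i == k) + A[i][k] for k in range(n)] for i in range(n)]
    u = [sum(row) for row in A]                       # u = A 1' (row sums)
    weights = []
    for j in range(1, n_terms + 1):
        cj = -sum(a * x for a, x in zip(alpha, u))    # c_{j-1}
        weights.append(cj / c**j)
        u = [sum(B[i][k] * u[k] for k in range(n)) for i in range(n)]
    return weights
```

For \(\alpha = (0.6, 0.4)\) and \(A = \bigl(\begin{smallmatrix} -3 & 1 \\ 2 & -4 \end{smallmatrix}\bigr)\), both exit rates equal 2, so the absorption time is simply \(\mathrm{Exp}(2)\); with \(c = \|A\| = 4\), the computed weights are \(p_j = 2^{-j}\), the classical geometric mixture of Erlangs representing an exponential law.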
Remark 5.1.
Expansions similar to those given in Proposition 5.3 can be found in [12], page 58. These expansions are obtained using a representation \(\mathrm{PH}(\alpha, A)\) of the distribution under consideration. Note that if we denote by \(\|A\|\) the maximum absolute value of the entries of \(A\), then it is easy to check using (93) (see [14], page 751) that \(c = \|A\|\) verifies (91). However, as the representation of a phase-type distribution is not unique, this value might not be the optimal one. Also, observe that the error bound given in (85) indicates that we should take \(c\) as small as possible. This problem is therefore closely connected to Conjecture 6 in [14], concerning the minimum \(c\) satisfying (91) and its relation to a phase-type representation having \(\|A\|\) as small as possible. To the best of our knowledge, this conjecture remains unsolved.

Acknowledgments
I would like to thank José Garrido for suggesting the final applications to phase-type distributions when I was at Concordia University, and two anonymous referees for helpful comments. This research has been partially supported by research Grants 2006-CIE-05 (University of Zaragoza), MTM2007-63683 and PR 2007-0295 (Spanish Government), E64 (DGA) and by FEDER funds.
References

[1] Abate, J. and Whitt, W. (1995). Numerical inversion of Laplace transforms of probability distributions. ORSA J. Comput.
[2] Adv. in Appl. Probab.
[3] Abramowitz, M. and Stegun, I.A. (1964). Handbook of Mathematical Functions. Washington, DC: National Bureau of Standards. MR0167642
[4] Adell, J.A. and de la Cal, J. (1993). On the uniform convergence of normalized Poisson mixtures to their mixing distribution. Statist. Probab. Lett.
[5] J. Appl. Probab.
[6] J. Austral. Math. Soc. Ser. A
[7] Math. Comp.
[8] Asmussen, S. (2000). Ruin Probabilities. Singapore: World Scientific. MR1794582
[9] Embrechts, P., Grübel, R. and Pitts, S.M. (1993). Some applications of the fast Fourier transform algorithm in insurance mathematics. Statist. Neerlandica.
[10] Feller, W. (1971). An Introduction to Probability Theory and Its Applications II, 2nd ed. New York: Wiley.
[11] Grübel, R. and Hermesmeier, R. (2000). Computation of compound distributions II: Discretization errors and Richardson extrapolation. Astin Bull.
[12] Latouche, G. and Ramaswami, V. (1999). Introduction to Matrix Analytic Methods in Stochastic Modelling. Philadelphia: ASA-SIAM. MR1674122
[13] Maier, R.S. (1991). The algebraic construction of phase-type distributions. Comm. Statist. Stochastic Models.
[14] Comm. Statist. Stochastic Models.
[15] J. Math. Anal. Appl.
[16]–[17] Insurance Math. Econom.
[18] Sundt, B. (2002). Recursive evaluation of aggregate claims distributions. Insurance Math. Econom.
[19] Willmot, G.E. and Woo, J.-K. (2007). On the class of Erlang mixtures with risk theoretic applications. N. Am. Actuar. J. 99–105. MR2380721