[PDF] Efficient Importance Sampling for Large Sums of Independent and Identically Distributed Random Variables

Abstract

We aim to estimate the probability that the sum of nonnegative independent and identically distributed random variables falls below a given threshold, i.e., \mathbb{P}(\sum_{i=1}^{N}{X_i} \leq \gamma), via importance sampling (IS). We are particularly interested in the rare event regime when N is large and/or \gamma is small. The exponential twisting is a popular technique that, in most of the cases, compares favorably to existing estimators. However, it has several limitations: i) it assumes the knowledge of the moment generating function of X_i and ii) sampling under the new measure is not straightforward and might be expensive. The aim of this work is to propose an alternative change of measure that yields, in the rare event regime corresponding to large N and/or small \gamma, at least the same performance as the exponential twisting technique and, at the same time, does not introduce serious limitations. For distributions whose probability density functions (PDFs) are \mathcal{O}(x^{d}), as x \rightarrow 0 and d>-1, we prove that the Gamma IS PDF with appropriately chosen parameters retrieves asymptotically, in the rare event regime, the same performance of the estimator based on the use of the exponential twisting technique. Moreover, in the Log-normal setting, where the PDF at zero vanishes faster than any polynomial, we numerically show that a Gamma IS PDF with optimized parameters clearly outperforms the exponential twisting change of measure. Numerical experiments validate the efficiency of the proposed estimator in delivering a highly accurate estimate in the regime of large N and/or small \gamma.

Full PDF

aa r X i v : . [ s t a t . C O ] J a n Eﬃcient Importance Sampling for Large Sums ofIndependent and Identically Distributed RandomVariables

Nadhir Ben Rached ∗ , Abdul-Lateef Haji-Ali † , Gerardo Rubino ‡ ,and Ra´ul Tempone § . ¶ Abstract

We aim to estimate the probability that the sum of nonnegative inde-pendent and identically distributed random variables falls below a giventhreshold, i.e., P ( P Ni =1 X i ≤ γ ), via importance sampling (IS). We areparticularly interested in the rare event regime when N is large and/or γ is small. The exponential twisting is a popular technique that, in mostof the cases, compares favorably to existing estimators. However, it hasseveral limitations: i) it assumes the knowledge of the moment generatingfunction of X i and ii) sampling under the new measure is not straight-forward and might be expensive. The aim of this work is to propose analternative change of measure that yields, in the rare event regime cor-responding to large N and/or small γ , at least the same performance asthe exponential twisting technique and, at the same time, does not in-troduce serious limitations. For distributions whose probability densityfunctions (PDFs) are O ( x d ), as x → d > −

1, we prove that theGamma IS PDF with appropriately chosen parameters retrieves asymp-totically, in the rare event regime, the same performance of the estimatorbased on the use of the exponential twisting technique. Moreover, in theLog-normal setting, where the PDF at zero vanishes faster than any poly-nomial, we numerically show that a Gamma IS PDF with optimized pa-rameters clearly outperforms the exponential twisting change of measure.Numerical experiments validate the eﬃciency of the proposed estimatorin delivering a highly accurate estimate in the regime of large N and/orsmall γ . ∗ Chair of Mathematics for Uncertainty Quantiﬁcation, Department of Mathematics,RWTH Aachen University, Aachen, Germany ( [email protected] ). † School of Mathematical & Computer Sciences, Heriot-Watt University, Edinburgh, UK( [email protected] ). ‡ INRIA, Rennes - Bretagne Atlantique, France (

[email protected] ). § Computer, Electrical and Mathematical Sciences & Engineering Division (CEMSE),King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Ara-bia ( [email protected] ). Alexander von Humboldt Professor in Mathe-matics for Uncertainty Quantiﬁcation, RWTH Aachen University, Aachen, Germany( [email protected] ). ¶ This work was supported by the KAUST Oﬃce of Sponsored Research (OSR) underAward No. URF/1/2584-01-01 and the Alexander von Humboldt Foundation. A-L. Haji-Aliwas supported by a Sabbatical Grant from the Royal Society of Edinburgh. eywords:

Importance sampling, rare event, exponential twisting, GammaIS PDF.

AMS subject classiﬁcations:

Eﬃcient estimation of rare event probabilities ﬁnds various applications in theperformance evaluation/prediction of wireless communication systems operat-ing over fading channels [25]. In particular, the left-tail of the cumulative dis-tribution function (CDF) of sums of nonnegative independent and identicallydistributed (i.i.d.) random variables (RVs) is an example of a rare event proba-bility that is of practical importance. More speciﬁcally, the outage probabilityat the output of equal gain combining (EGC) and maximum ratio combining(MRC) receivers can be expressed as the CDF of the sum of fading channelenvelops (for EGC) and fading channel gains (for MRC) [7].The accurate estimation of the left-tail of the CDF of sums of RVs requiresthe use of variance reduction techniques because the naive Monte Carlo (MC)sampler is computationally expensive [4, 19, 23]. Moreover, the existing closed-form approximations [12, 13, 17, 20, 21, 27, 28] fail to be accurate when the tail ofthe CDF is considered. The literature is rich in works in which variance reduc-tion techniques were developed to eﬃciently estimate rare event probabilitiescorresponding to the left-tail of the CDF of sums of RVs, see [1, 3, 5, 7, 9, 10, 15]and the references therein. For instance, the authors in [3] used the concept ofexponential twisting, which is a popular importance sampling (IS) technique,to propose a logarithmically eﬃcient estimator of the CDF of the sum of i.i.d.Log-normal RVs. The logarithmic eﬃciency is a popular property in rare eventsimulation used to ensure estimators’ eﬃciency; see [7] for a formal deﬁnition ofthis property. In [15], the CDF of the sum of correlated Log-normal RVs wasconsidered. The authors developed an IS estimator based on shifting the mean ofthe corresponding multivariate Gaussian distribution. Under mild assumptions,they proved that their proposed estimator is logarithmically eﬃcient. Basedon [15] and under the assumption that the left-tail sum distribution is deter-mined by only one dominant component, the authors in [1] combined IS withthe control variate technique to construct an estimator with the asymptoticallyvanishing relative error property, which is the most desired property in the ﬁeldof rare event simulations [19]. In [7], two uniﬁed IS approaches were developedusing the hazard rate twisting concept [8, 18] to eﬃciently estimate the CDF ofsums of independent RVs. The ﬁrst estimator is shown to be logarithmicallyeﬃcient, whereas the second achieves the bounded relative error property (astronger criterion than the logarithmic eﬃciency) for i.i.d. sums of RVs andunder the given assumption that was shown to hold for most of the practicaldistributions used to model the amplitude/power of fading channels.The eﬃciency of the above mentioned estimators was studied when the num-ber of summand N was kept ﬁxed. More speciﬁcally, recall that the objectiveis to eﬃciently estimate the probability that the sum of nonnegative i.i.d. RVs2alls below a given threshold, i.e., P ( P Ni =1 X i ≤ γ ). A close look at the abovementioned estimators shows that the eﬃciency results were proved when the rar-ity parameter γ decreased whereas N is kept ﬁxed. However, In most cases, theeﬃciency of the existing estimators was considerably aﬀected when N increases.This represents the main motivation of the present work. We aim to introducea highly accurate estimator that eﬃciently estimate P ( P Ni =1 X i ≤ γ ) in the rareevent regime when N is large and/or γ is small. The main contributions of thepresent work are: • The exponential twisting technique is a popular IS change of measurethat, in most of the cases, compares favorably to existing estimators. Itis the optimal IS probability density function (PDF) in the sense thatit minimizes the Kullback-Leibler (KL) divergence with respect to theunderlying PDF under certain constraints [22]. However, it has somelimitations. First, it requires the knowledge of the moment generatingfunction of X i , i = 1 , , · · · , N . Second, sampling according to the newIS PDF is not straightforward and might be expensive. Moreover, thetwisting parameter is not available in a closed-form expression and needsto be estimated numerically. The contribution of this work is to proposean alternative IS estimator that asymptotically yields at least the sameeﬃciency as the one given by the estimator based on exponential twistingand at the same time does not introduce the above limitations. • We prove that for distributions whose PDF vanishes at zero polynomially,the Gamma IS PDF with appropriately chosen parameters retrieves, in theregime of rare events, the same performances as the exponential twistingPDF. • The above result does not apply to the Log-normal setting as the corre-sponding PDF approaches zero faster than any polynomial. Interestingly,we show that in this setting, the Gamma change of measure with op-timized parameters achieves a substantial amount of variance reductioncompared to the one given by exponential twisting. • In addition to yielding at least the same performance as the estimatorbased on exponential twisting, the estimator based on the use of theGamma PDF as an IS PDF is easy to implement. This is comparedto the estimator based on exponential twisting whose implementation ispossible only under restrictive assumptions. • Numerical comparisons with some of the existing estimators validate thatthe proposed estimator can deliver highly accurate estimates with lowcomputational cost in the rare event regime corresponding to large N and/or small γ. The paper is organized as follows. In section 2, we deﬁne the problem settingand motivate the work. In section 3, we introduce the exponential twistingchange of measure and present its limitations. The main contribution of this3ork is presented in section 4, where we show, under diﬀerent scenarios, thatthe Gamma IS PDF with optimized parameters retrieves at least the same per-formance as the exponential twisting technique. Finally, numerical experimentsare shown in section 5 to compare the proposed estimator with various existingestimators.

Let X , X , · · · , X N be i.i.d. nonnegative RVs with common PDF f X ( · ) andCDF F X ( · ). Let x = ( x , · · · , x n ) t and h X ( x ) = Q Ni =1 f X ( x i ) be the joint PDFof the random vector ( X , · · · , X N ) t . We consider the estimation of α ( γ, N ) = P h X N X i =1 X i ≤ γ ! , where P h X ( · ) is the probability under which the random vector X = ( X , · · · , X N ) t is distributed according to h X ( · ). We focus on the estimation of α ( γ, N ) when N is large and/or γ is small. Before delving into the core of the paper, we illus-trate via a simple example that the eﬃciency of an IS estimator, that performswell when γ decreases and N is not suﬃciently large, can deteriorate when weincrease the values of N . We ﬁrst write the quantity of interest as P h X N X i =1 X i ≤ γ ! = P h w N X i =1 w i ≤ ! ( F X ( γ )) N , (1)where w i = X i γ |{ X i γ ≤ } , i = 1 , , · · · , N . The estimator is then given byestimating the ﬁrst factor on the right-hand side of (1) by the naive MC method.Note that this estimator can be understood as applying IS with IS PDF beingthe truncation of the underlying PDF over the hypercube [0 , γ ] N . It can beeasily proved that for ﬁxed N , this estimator achieves the desired boundedrelative error property with respect to the rarity parameter γ for distributionsthat satisfy f w ( x ) ∼ bx d as x approaches zero and for d > − b >

0, see [9].This property means that coeﬃcient of variation, deﬁned as the ratio betweenthe standard deviation of an estimator and its mean, remains bounded as γ → α ( γ, N ) is. The question now is what happens when N is large.Using the Chernoﬀ bound, we obtain P h w ( N X i =1 w i ≤ ≤ min θ> exp ( θ + N log ( E f w [exp( − θw )])) , where E f w [ · ] denotes the expectation under f w ( · ). In particular, when θ = 1,the squared coeﬃcient of variation 1 / P h w ( P Ni =1 w i ≤

1) is lower bounded by4xp ( − − N log ( E f w [exp( − w )])). This shows that the squared coeﬃcient ofvariation increases at least exponentially, which proves that the eﬃciency of theestimator deteriorates when N is large. In this section, we review a popular IS approach, which is exponential twist-ing, and enumerate its limitations in estimating the quantity of interest. Whenapplicable, it is well acknowledged that the exponential twisting technique isexpected to produce a substantial amount of variance reduction and to comparefavorably, in most of the cases, to existing estimators. For distributions withlight right tails and under the i.i.d. assumption, the estimator based on expo-nential twisting can be proved, under some regularity assumptions, to be loga-rithmically eﬃcient when the probability of interest is either P ( P Ni =1 X i > γ )and γ → + ∞ or P ( P Ni =1 X i > γN ) and with N → + ∞ [4]. In the left tail set-ting, which is the region of interest in the present work, the exponential twistingwas shown in [3] to achieve the logarithmic eﬃciency property in the case ofi.i.d. Log-normal RVs when the probability of interest is P ( P Ni =1 X i < N γ ) andeither N → + ∞ or γ → { x , such that P Ni =1 x i ≤ γ } isno longer rare. The IS PDF is selected to be the solution of the followingoptimization problem, see [22]:inf h ∗ X ≥ Z h ∗ X ( x ) log (cid:18) h ∗ X ( x ) h X ( x ) (cid:19) d x s.t Z h ∗ X ( x ) d x = 1 (2) E h ∗ X " N X i =1 X i = γh ∗ X ( x ) ≥ , x i ≥ i. The solution of this problem is given as (see [22] for a more general setting) h ∗ X ( x ) = h X ( x ) exp (cid:16) θ P Ni =1 x i (cid:17) E h X h exp (cid:16) θ P Ni =1 X i (cid:17)i , (3)and θ solves E h X hP Ni =1 X i exp (cid:16) θ P Ni =1 X i (cid:17)i E h X h exp (cid:16) θ P Ni =1 X i (cid:17)i = γ. f X ( · ) f ∗ X ( x ) = f X ( x ) exp( θx ) M ( θ ) , x ≥ , with M ( θ ) = E f X [exp( θX )] and θ satisﬁes M ′ ( θ ) M ( θ ) = γ/N. Observe, however, that the exponential twisting technique has various restrictivelimitations. The main one is that sampling according to f ∗ X ( · ) is not straight-forward. One generally needs the use of an acceptance-rejection technique, thecomplexity of which can be dramatic when the probability of acceptance is rela-tively small. In such a case, the computational complexity of the algorithm canbe huge and even worse than the naive MC method. There are other less criti-cal drawbacks. First, computations are much simpler if the moment generatingfunction M ( θ ) is known in closed-form. Such a requirement does not hold ingeneral. Also, the twisting parameter θ does not have, in general, a closed-formexpression, and hence, it should be approximated numerically. The objective of this paper is to propose an alternative change of measure thatapproximately yields at least the same performance as the exponential twistingtechnique and at the same time does not introduce serious limitations. Wedistinguish three scenarios depending on how the PDF f X ( · ) approaches zero. f X ( x ) ∼ b as x goes to and b > is a constant Recall that the exponential twisting change of measure satisﬁes f ∗ X ( x ) ∝ f X ( x ) exp( θx ) , x ≥ , with θ → −∞ as γ → N → + ∞ . Therefore, as f ( x ) ∼ b and b > M ( θ ) = − θ , we instead consider the following change of measure˜ f X ( x ) = exp( θx )˜ M ( θ ) , x ≥ . The value of θ satisﬁes ˜ M ′ ( θ )˜ M ( θ ) = γN . Through simple computation, we obtain θ = − Nγ . To conclude, when f ( x ) ∼ b and b >

0, we propose a change ofmeasure given by the exponential distribution with rate Nγ . In the regime ofrare events, we expect the proposed change of measure to yield approximatelythe same performance as the one given by exponential twisting.6 able I: Some PDF asymptotics around zero a Distribution

PDF Proportional toExponential k exp( − kx ) 1 k > β k Γ( k ) x k − exp( − xβ ) x k − k, β > kλ ( xλ ) k − exp( − ( xλ ) k ) x k − k, λ > m m Γ( m )Ω m x m − exp( − m Ω x ) x m − m, Ω > p/a d Γ( d/p ) x d − exp( − ( xa ) p ) x d − a, d, p > xσ exp( − x + ν σ ) I (( xνσ )) xσ > , ν ≥ > km ) k + m Γ( k )Γ( m )Ω ( x Ω ) k + m − K k − m (cid:18) q kmx Ω (cid:19) x k − Ω > , m > k > , m − k / ∈ N κ − µ distribution µ (1+ κ ) µ +12 x µ Ω µ +12 κ µ − exp( µκ ) exp( − (1+ κ ) µx Ω ) I µ − (cid:18) µ q κ ( κ +1)Ω x (cid:19) x µ − κ, µ > a Functions I ξ ( · ), and K ξ ( · ) are respectively the modiﬁed Bessel functions of the ﬁrstkind and order ξ and the second kind and order ξ [14]. f X ( x ) = x d g ( x ) with g ( x ) ∼ b as x goes to , d > − , and b > Using the same methodology as in section 4.1, the change of measure that weconsider is ˜ f X ( x ) = x d exp( θx )˜ M ( θ ) , x ≥ . Therefore, the new measure corresponds to the Gamma distribution with shapeparameter d +1 and scale parameter − /θ . The normalizing constant is ˜ M ( θ ) = Γ( d +1)( − θ ) d +1 . Hence, the value of θ satisfying ˜ M ′ ( θ )˜ M ( θ ) = γN is given by θ = − Nγ ( d + 1) . In Table I, we provide a non-exhaustive list of distributions that belong tosection 4.2 (note that distributions in section 4.2 include those in section 4.1).7hese distributions are among the most practical distributions used to modelthe amplitudes and powers of wireless communications fading channels. Forthese distributions, the Gamma IS PDF is expected to asymptotically deliverthe same performance as the one given by the exponential twisting IS PDFwithout introducing serious limitations. polynomially Distributions in this class are much more diﬃcult to handle and need to be tack-led on a case-by-case basis. We consider the sum of i.i.d. standard Log-normalRVs. The density decreases to 0 at a faster rate than any polynomials and thusthe Gamma distribution with ﬁxed shape parameter will not recover the resultsgiven by the use of the exponential twisting technique. Note that in [3], theexponential twisting technique was applied to the sum of i.i.d. standard Log-normals by i) providing an unbiased estimator of the MGF, ii) approximatingthe value of θ , and iii) using acceptance-rejection to sample from the IS PDF.The main diﬃculty is that the PDF of the Log-normal distribution does nothave a Taylor expansion at x = 0. The ﬁrst estimator we propose is basedon truncating the support [0 , + ∞ ] and only work on [ a, + ∞ ] with a = δγ/N .This allows to use a Taylor expansion at x = a . This procedure, however,creates a bias that needs to be controlled. We show numerically that this esti-mator exhibits better performances than the one based on exponential twisting.Moreover, we observe that, in the regime of rare events, the proposed estima-tor achieves approximately the same performances as the Gamma IS PDF withshape parameter equal to 2. This is the main motivation behind introducinga second estimator whose IS PDF is a Gamma PDF with optimized parame-ters. The nnumerical results show that the second estimator achieves substantialvariance reduction with respect to the ﬁrst estimator. We rewrite the quantity of interest as P h X N X i =1 X i ≤ γ ! ≈ (cid:18) − F X ( δγN ) (cid:19) N × P h X N X i =1 X i ≤ γ (cid:12)(cid:12)(cid:12) X i > δγN ! , (4)where δ is a ﬁxed value belonging to [0 , f X ( · ) be the PDF of X i |{ X i > δγN } , i = 1 , , · · · , N , whose expression is given as follows:¯ f X ( x ) = 1 x √ π exp (cid:16) − (log( x )) (cid:17) P ( X i > δγN ) , x ≥ δγN . P h X N X i =1 X i ≤ γ (cid:12)(cid:12)(cid:12) X i > δγN ! = P ¯ h X N X i =1 X i ≤ γ ! , with ¯ h X ( x ) = Q Ni =1 ¯ f X ( x i ). The exponential twisting change of measure is thengiven by ¯ f ∗ X ( x ) ∝ ¯ f X ( x ) exp( θx ) , x ≥ δγN . Now, by using the Taylor expansion of ¯ f X ( · ) at the point x = δγ/N , we write¯ f X ( x ) = ¯ f X ( δγN ) + ( x − δγN ) ¯ f ′ X ( δγN ) + ( x − δγN ) f ′′ X ( ξ x,δ,N ) , where ξ x,δ,N is between δγN and x . Hence, the approximate exponential twistingchange of measure is given by˜ f X ( x ) = ¯ f X exp( θx ) + ( x − δγN ) ¯ f ′ X exp( θx )˜ M ( θ ) , x ≥ δγN , (5)with the notation ¯ f X = ¯ f X ( δγN ) and ¯ f ′ X = ¯ f ′ X ( δγN ). We assume that δγN is strictlyless than exp( −

1) to ensure that ¯ f ′ X >

0. This assumption is not restrictive, aswe are interested in the rare event regime corresponding to N large and/or γ small. Through a simple computation, we get˜ M ( θ ) = − exp ( θδγ/N ) θ ¯ f X + exp ( θδγ/N ) θ ¯ f ′ X . The value of θ that solves ˜ M ′ ( θ )˜ M ( θ ) = γN is given by θ = − ¯ f X − c ¯ f ′ X + q ( ¯ f X − c ¯ f ′ X ) + 8 ¯ f X ¯ f ′ X c c ¯ f X , with c = γN (1 − δ ). The remaining part is to sample from ˜ f X ( · ). To do this, wewrite ˜ f X ( x ) = − ¯ f X exp( θδγ/N )˜ M X ( θ ) θ ˜ f ( x ) + ¯ f ′ X exp( θδγ/N )˜ M X ( θ ) θ ˜ f ( x ) , where ˜ f ( x ) = − θ exp( θx )exp( θδγ/N ) and ˜ f ( x ) = θ ( x − δγ/N ) exp( θx )exp( θδγ/N ) are two valid PDFsfor x > δγ/N .The question that remains is related to controlling the bias through a properchoice of the parameter δ . Let α ( γ, N ) = (cid:16) − F X ( δγN ) (cid:17) N P h X (cid:16)P Ni =1 X i ≤ γ (cid:12)(cid:12)(cid:12) X i > δγN (cid:17) .Then, the global relative error can be upper bounded as follows: (cid:12)(cid:12)(cid:12)(cid:12) α ( γ, N ) − ˆ α IS α ( γ, N ) (cid:12)(cid:12)(cid:12)(cid:12) ≤ α ( γ, N ) − α ( γ, N ) α ( γ, N ) + (cid:12)(cid:12)(cid:12)(cid:12) α ( γ, N ) − ˆ α IS α ( γ, N ) (cid:12)(cid:12)(cid:12)(cid:12) , (6)9ihere ˆ α IS is the IS estimator of α ( γ, N ) based on M i.i.d. realizations sampledaccording to the PDF in (5). The parameter δ is then chosen to control thebias term in the above equation, that is the ﬁrst right-hand side of the aboveinequality. The second term on the right-hand side is the statistical relativeerror of estimating α ( γ, N ) by ˆ α IS . From the Central Limit Theorem (CLT),this error term is approximately proportional to the coeﬃcient of variation ofˆ α IS .To achieve a global relative error of order ǫ , it is suﬃcient to bound thetwo error terms, i.e., the statistical relative error and the relative bias, by ǫ/ δ is selected such that the following inequality holds0 ≤ α ( γ, N ) − α ( γ, N ) α ( γ, N ) ≤ ǫ/ . (7)The following lemma provides the relation between δ and ǫ such that (7) isfulﬁlled. Lemma 1.

The following expression of δ ( ǫ, N, γ ) δ ( ǫ, N, γ ) = Nγ exp (cid:18) Φ − (cid:18) ǫ N (Φ(log( γ/N ))) N (Φ(log( γ ))) N − (cid:19)(cid:19) , (8)where Φ( · ) is the CDF of the standard Normal distribution, ensures that (7)holds. Proof.

We ﬁrst write that α ( γ, N ) − α ( γ, N )= P h X { N X i =1 X i ≤ γ } ∩ ∪ Ni =1 { X i ≤ δγ/N } ! ≤ P h X ( ∪ Ni =1 { X i ≤ δγ/N } ∩ ∩ Ni =1 { X i ≤ γ } ) ≤ N X i =1 P h X ( { X i ≤ δγ/N } ∩ ∩ j = i { X i ≤ γ } )= N P h X ( X ≤ δγ/N, X ≤ γ, · · · , X N ≤ γ )= N Φ(log( δγ/N )) (Φ(log( γ ))) N − . (9)On the other hand, we have α ( γ, N ) ≥ (Φ(log( γ/N ))) N . Therefore, we get α ( γ, N ) − α ( γ, N ) α ( γ, N ) ≤ N Φ (log( δγ/N )) (Φ(log( γ ))) N − (Φ(log( γ/N ))) N .

10y equating the right-hand side of the above inequality with ǫ/

2, we obtain δ ( ǫ, N, γ ) = Nγ exp (cid:18) Φ − (cid:18) ǫ N (Φ(log( γ/N ))) N (Φ(log( γ ))) N − (cid:19)(cid:19) , and hence the proof is concluded. when we consider a suﬃciently small value of δ in the above analysis, we observefrom the expression of the IS PDF in (5) that the proposed estimator with thechange of measure in (5) achieves approximately the same performance as theGamma IS PDF with shape parameter equal to 2. This suggests investigatingwhether the Gamma family can achieve further variance reduction with respectto the approach in the previous subsection. Note that the advantage of using theGamma family as IS PDFs compared to the approach in the previous subsectionis that the estimator is unbiased. Recall that the Gamma PDF is given by˜ f X ( x ) = x k − exp( − x/θ )Γ( k ) θ k , x > , (10)wihere θ > k > θ ischosen to be equal to θ = γNk to ensure that the expected value of each of the X i ’s, i = 1 , , · · · , N , under the PDF ˜ f X ( · ) is equal to γN . The likelihood ratiois then given by L ( x , x , · · · , x N )= (Γ( k ) θ k ) N exp( P Ni =1 x i θ − P Ni =1 (log( x i )) ) Q Ni =1 x ki ( √ π ) N . The second moment of the IS estimator is bounded by E ˜ h X h L ( X , X , · · · , X N ) ( P Ni − X i ≤ γ ) i ≤ ( Γ( k ) θ k √ π ) N exp( 2 γθ ) × E ˜ h X " exp( − N X i =1 (log( x i )) − k N X i =1 log( x i )) ≤ ( Γ( k )( γNk ) k √ π ) N exp(2 kN + k N ) . The last upper bound is found by maximizing the function x → − (log( x )) − k log( x ) for x >

0. Next, using Stirling’s formula for the gamma function11( k ) = √ πk k − exp( − k )(1 + O ( k )), we get E ˜ h X h L ( X , X , · · · , X N ) ( P Ni − X i ≤ γ ) i . Ck − N (cid:16) γN (cid:17) Nk exp( k N )= C exp( N ( k − k log( N/γ ) − log( k )))where C is a constant. Next, the value of k is chosen such that it minimizes theabove right-hand side term. The solution of this minimization problem is givenas follows: k ∗ = 12 log( Nγ ) + s (log( Nγ )) + 2 ! . (11)Note that when N is large and/or γ is small, the value of k ∗ satisﬁes k ∗ ∼ log( Nγ ). In this section, we show some selected numerical results to compare the perfor-mance of the proposed estimators compared to some of the existing estimators.We consider three scenarios depending on the distribution of X i , i = 1 , , · · · , N :the Weibull, the Gamma-Gamma, and the Log-normal distributions. Note thatthe proposed approach is not restricted to these three distributions (see TableI for a non-exhaustive list of distributions that can be handled).We deﬁne the squared coeﬃcient of variation of an unbiased estimator ˆ α ( γ, N )of α ( γ, N ) as SCV (ˆ α ( γ, N )) = var [ ˆ α ( γ, N )] α ( γ, N ) . (12)Note that, from the CLT, the number of required samples to meet ǫ statis-tical relative error with 95% conﬁdence is equal to (1 . SCV (ˆ α ( γ, N )) /ǫ .Therefore, when we compare two estimators, the one with the smaller squaredcoeﬃcient of variation exhibits better performance than the other. In this section, we assume that X i , i = 1 , , · · · , N , are distributed according tothe Weibull distribution whose PDF is given in Table I. The comparison is madewith respect to the second IS approach of [7] that is based on using the hazardrate twisting (HRT). In Figure 1 and Figure 2, we plot the squared coeﬃcientof variations given by the HRT technique and the proposed approach for twodiﬀerent values of the shape parameter: k = 1 . k = 0 .

5, respectively. Thevalue of α ( γ, N ) ranges approximately from 10 − to 10 − (respectively from120 − to 10 − ) using the system’s parameters of Figure 1 (respectively of Figure2). These ﬁgures show that the proposed approach clearly outperforms the onebased on HRT. For instance, when k = 1 . λ = 1, γ = 0 .

5, and N = 12, theproposed approach is approximately 270 times more eﬃcient than the one basedon HRT. More speciﬁcally, to meet the same accuracy, the number of samplesneeded by the approach based on HRT should be approximately 270 times thenumber of samples needed by the proposed approach. N HRT ApproachProposed Approach

Figure 1: Squared coeﬃcient of variation as a function of N where X i are i.i.d.Weibull RVs with rate λ = 1, k = 1 .

5, and γ = 0 . N HRT ApproacgProposed Approach

Figure 2: Squared coeﬃcient of variation as a function of N where X i are i.i.d.Weibull RVs with rate λ = 1, k = 0 .

5, and γ = 0 . N is ﬁxed and γ decreases. In Figure 3, we compare the eﬃciency13f both approaches in terms of squared coeﬃcient of variations plotted as afunction of γ for two scenarios depending on the value of N ( N = 8 and N = 10).In this case, the value of α ( γ, N ) ranges approximately from 10 − to 10 − for N = 8 and from 10 − to 10 − for N = 10. We observe a clear outperformanceof the proposed approach compared to the one based on using HRT for bothvalues of N . While the HRT approach was proved in [7] to achieve the boundedrelative error property with respect to γ and for a ﬁxed value of N , it is clearfrom Figure 3 that the asymptotic bound increases substantially with respectto N , and hence the performance of the HRT approach is dramatically aﬀectedby increasing N . On the other hand, we observe that increasing the value of N has a minor eﬀect on the eﬃciency of the proposed approach, i.e., the squaredcoeﬃcient of variation is approximately unchanged for both values of N andfor the considered range of γ . This numerical observation suggests to conclude HRT Approach N=8HRT Approach N=10Proposed Approach N=8Proposed Approach N=10

Figure 3: Squared coeﬃcient of variation as a function of γ where X i are i.i.d.Weibull RVs with rate λ = 1, k = 1 . N . For illustration, the proposedapproach is approximately 18 (respectively 64) times more eﬃcient than theHRT one when N = 8 (respectively N = 10) and γ = 0 .

2. Note that the previousobservations are valid independently of the value of α ( γ, N ) (see Figure 3, wherethe squared coeﬃcient of variation is approximately constant for a ﬁxed value of N and for the considered range of γ ). This experiment and the numerical resultsin Figures 1 and 2 validate the ability of the proposed approach to deliver a veryaccurate and eﬃcient estimate of α ( γ, N ) when N increases and/or γ decreases.14 .2 Gamma-Gamma Case The Gamma-Gamma distribution is used for various challenging applications inwireless communications. For instance, it exhibited a good ﬁt to experimentaldata and was used to model wireless radio-frequency channels [24] and to modelatmospheric turbulences in free-space optical communication systems [16]. ThePDF of X i is given in Table I. In Figure 4, we compare the proposed approach N IS mean shiftingProposed IS

Figure 4: Squared coeﬃcient of variation as a function of N where X i are i.i.d.Gamma-Gammal RVs with m = 4, k = 1 .

7, Ω = 1, and γ = 0 . N and for a ﬁxed value of γ . Note that in [6], the proposed ISPDF is simply another Gamma-Gamma PDF with shifted mean. We call thismethod the IS-based mean-shifted approach. The range of the quantity of inter-est α ( γ, N ) is approximately from 10 − to 10 − . We observe that the proposedestimator outperforms the one in [6]. Also, we observe that the outperformanceof the proposed estimator compared to the one based on mean shifting increasesas we increase N . Moreover, we should note here that the cost per sample (interms of CPU time) of the approach in [6] is twice the cost of the proposedapproach. This is because a Gamma-Gamma RV is generated by the productof two independent Gamma RVs, see [11]. For illustration, we observe fromFigure 4 that when N = 12, the proposed approach is approximately 2 . The Log-normal distribution can be used to model several types of attenua-tion including shadowing [26], and weak-to-moderate turbulence channels infree-space optical communications [2]. The standard Log-normal PDF (the as-15ociated Gaussian RV has zero mean and unit variance) is given by f X ( x ) = 1 x √ π exp (cid:18) − (log( x )) (cid:19) , x > . N Exp. TwistingGamma DistBiased PDF

Figure 5: Squared coeﬃcient of variation as a function of N where X i are i.i.d.standard Log-normal RVs with γ = 0 .

5, and ǫ = 0 . α ( γ, N ) ranges approximately from 10 − to 10 − . For theconsidered range of N , we observe that out of these three approaches, it is theone using the Gamma distribution as an IS PDF that outperforms the others.When N = 9, it is approximately 30 times more eﬃcient than the one basedon exponential twisting. In addition to the eﬃciency in terms of number ofsamples, it is worth recalling that the exponential twisting technique developedin [3] is computationally expensive in terms of computing time compared to theproposed approaches. Moreover, Figure 5 also shows that the approach basedon the biased estimator achieves better performances than the one based onexponential twisting. It is important to mention here that, for the comparisonto be fair, the squared coeﬃcient of variation of the biased estimator should bemultiplied by 4. This follows from the error analysis in (6), in which the sta-tistical relative error should be bounded by ǫ/

2, where ǫ is the required globalrelative error.In Figure 6, we plot the squared coeﬃcient of variations given by the threeapproaches as a function of γ and for two diﬀerent values of N ( N = 8 and N = 10). The quantity of interest α ( γ, N ) ranges approximately from 10 − to 10 − for N = 8 and from 10 − to 10 − for N = 10. We observe thatthe approach based on using the Gamma distribution as an IS PDF clearlyasymptotically outperforms the two other approaches. For both values of N ,16he outperformance increases as we decrease γ . Moreover, the biased estimatorexhibits better performances than the exponential twisting one for both valuesof N and for the considered range of γ values. Furthermore, increasing N hasa considerable negative eﬀect on the performances of the exponential twistingand the biased IS-based approaches. On the other hand, Figure 6 shows thatincreasing N does not largely eﬀect the performance of the IS estimator based onthe use of the Gamma distribution as an IS PDF. For illustration, the approachbased on using the Gamma distribution as an IS PDF is approximately 15 times(respectively 35) more eﬃcient that the exponential twisting one when N = 8(respectively N = 10) and γ = 0 . Exp. Twisting N=8Exp. Twisting N=10Gamma Dist. N=8Gamma Dist. N=10Biased PDF N=8Biased PDF N=10

Figure 6: Squared coeﬃcient of variation as a function of γ where X i are i.i.d.standard Log-normal RVs with ǫ = 0 . k ∗ in (11)).All of the above comparisons have been carried out in terms of the numberof sampled needed to meet a ﬁxed accuracy requirement. In order to includethe computing time in our comparison, we deﬁne the Work Normalized RelativeVariance (WNRV) metric of an unbiased estimator ˆ α ( γ, N ) of α ( γ, N ) as follows17see [9]): WNRV(ˆ α ( γ, N ))= SCV (ˆ α ( γ, N )) M × computing time in seconds . (13)The computing time is the time in seconds needed to get an estimator of α ( γ, N )using M i.i.d. samples of ˆ α ( γ, N ). When comparing two estimators, the one thatexhibits less WNRV is more eﬃcient than the other estimator. More precisely,an estimator is eﬃcient in terms of WNRV than another estimator means thatit achieves less relative error for a given computational budget, or equivalentlyit needs less computing time to achieve a ﬁxed relative error. Using the samesetting as in Figure 6, we plot in Figure 7 the WNRV metric as a function of γ for two scenarios depending on the value of N ( N = 8 and N = 10). We -5 -4 -3 -2 Exp. Twisting N=8Exp. Twisting N=10Gamma Dist. N=8Gamma Dist. N=10Biased PDF N=8Biased PDF N=10

Figure 7: WNRV as a function of γ where X i are i.i.d. standard Log-normalRVs with ǫ = 0 . α ( γ, N ) is getting smaller, it is the approach based on usingthe Gamma PDF as an IS PDF that outperforms the two other approachesin terms of WNRV (the eﬃciency increases as the event becomes rarer). It isworth recalling that the WNRV of the approach based on biased PDF shouldbe multiplied by 4 in order for the analysis to be fair (this follows from the erroranalysis that was performed in section 4.3.1). Moreover, Figure 7 shows that,in addition to reducing the variance, as shown in Figure 6, the approach basedon using the Gamma IS PDF also reduces the computing time compared tothe one using the exponential twisting technique. To see that, for N = 10 and γ = 0 .

6, the approach based on using the Gamma IS PDF is approximately 35times (respectively 340 times) more eﬃcient than the one based on exponentialtwisting when using the squared coeﬃcient of variation metric (respectively the18NRV metric). More speciﬁcally, the Gamma based IS approach approximatelyreduces the computing time by a factor of 10 with respect to the exponentialtwisting approach.

We developed eﬃcient importance sampling estimators to estimate the rareevent probabilities corresponding to the left-tail of the cumulative distributionfunction of large sums of nonnegative independent and identically distributedrandom variables. The proposed estimators achieve at least the same perfor-mance as the exponential twisting technique. The main conclusion is that theGamma PDF with suitably chosen parameters achieves for most of the well-practical distributions substantial amount of variance reduction, and at thesame time avoids the restrictive limitations when using the exponential twist-ing technique. The numerical results validate the eﬃciency of the proposedapproach in being able to accurately and eﬃciently estimate the quantity ofinterest in the rare event regime corresponding to large N and/or small γ . References [1] M.-S. Alouini, N. Ben Rached, A. Kammoun, and R. Tempone. On the eﬃ-cient simulation of the left-tail of the sum of correlated Log-normal variates.

Monte Carlo Methods and Applications , Mar. 2018.[2] I. S. Ansari, M.-S. Alouini, and J. Cheng. On the capacity of FSO linksunder Lognormal and Rician-Lognormal turbulences. In , pages 1–6, 2014.[3] Søren Asmussen, Jens Ledet Jensen, and Leonardo Rojas-Nandayapa. Ex-ponential family techniques for the lognormal left tail.

Scandinavian Jour-nal of Statistics , 43(3):774–787, Sep. 2016.[4] Søren Asmussen and Peter W. Glynn.

Stochastic simulation : Algorithmsand Analysis . Stochastic modelling and applied probability. Springer, NewYork, 2007.[5] N. C. Beaulieu and G. Luan. Improving simulation of lognormal sum distri-butions with hyperspace replication. In , pages 1–7, 2019.[6] C. Ben Issaid, N. Ben Rached, A. Kammoun, M.-S. Alouini, and R. Tem-pone. On the eﬃcient simulation of the distribution of the sum of Gamma-Gamma variates with application to the outage probability evaluation overfading channels.

IEEE Transactions on Communications , 65(4):1839–1848,2017. 197] N. Ben Rached, A. Kammoun, M.-S. Alouini, and R. Tempone. Uniﬁedimportance sampling schemes for eﬃcient simulation of outage capacityover generalized fading channels.

IEEE Journal of Selected Topics in SignalProcessing , 10(2):376–388, Mar. 2016.[8] Nadhir Ben Rached, Fatma Benkhelifa, Abla Kammoun, Mohamed-SlimAlouini, and Raul Tempone. On the generalization of the hazard ratetwisting-based simulation approach.

Statistics and Computing , Nov. 2016.[9] Nadhir Ben Rached, Zdravko I. Botev, Abla Kammoun, Mohamed-SlimAlouini, and Raul Tempone. On the sum of order statistics and applicationsto wireless communication systems performances.

IEEE Transactions onWireless Communications , 17(11):7801–7813, November 2018.[10] Zdravko I. Botev, Robert Salomone, and Daniel Mackinlay. Fast and ac-curate computation of the distribution of sums of dependent log-normals.

Annals of Operations Research , pages 1–28.[11] N. D. Chatzidiamantis, G. K. Karagiannidis, and D. S. Michalopoulos. Onthe distribution of the sum of Gamma-Gamma variates and application inMIMO optical wireless systems. In

GLOBECOM 2009 - 2009 IEEE GlobalTelecommunications Conference , pages 1–6, 2009.[12] D. B. Da Costa and M. D. Yacoub. Accurate approximations to the sum ofgeneralized random variables and applications in the performance analysisof diversity systems.

IEEE Transactions on Communications , 57(5):1271–1274, May. 2009.[13] N. Y. Ermolova. Moment generating functions of the generalized η − µ and κ − µ distributions and their applications to performance evaluationsof communication systems. IEEE Communications Letters , 12(7):502–504,Jul. 2008.[14] I. S. Gradshteyn and I. M. Ryzhik.

Table of integrals, series, and products .Elsevier/Academic Press, Amsterdam, seventh edition, 2007.[15] Archil Gulisashvili and Peter Tankov. Tail behavior of sums and diﬀerencesof Log-normal random variables.

Bernoulli , 22(1):444–493, Feb. 2016.[16] M. Hajji and F. E. Bouanani. Performance analysis of mixed Weibulland Gamma-Gamma dual-hop RF/FSO transmission systems. In , pages 1–5, 2017.[17] J. Hu and N. C. Beaulieu. Accurate closed-form approximations to Riceansum distributions and densities.

IEEE Communications Letters , 9(2):133–135, Feb. 2005. 2018] Sandeep Juneja and Perwez Shahabuddin. Simulating heavy tailed pro-cesses using delayed hazard rate twisting.

ACM Trans. Model. Comput.Simul. , 12(2):94–118, Apr. 2002.[19] Dirk P. Kroese, Thomas Taimre, and Zdravko I. Botev.

Handbook of MonteCarlo methods . Wiley, N.J, 2011.[20] J. A. Lopez-Salcedo. Simple closed-form approximation to Ricean sumdistributions.

IEEE Signal Processing Letters , 16(3):153–155, Mar. 2009.[21] M. D. Renzo, F. Graziosi, and F. Santucci. Further results on the ap-proximation of Log-normal power sum via Pearson type IV distribution:a general formula for log-moments computation.

IEEE Transactions onCommunications , 57(4):893–898, Apr. 2009.[22] Ad Ridder and Reuven Rubinstein. Minimum cross-entropy methods forrare-event simulation.

SIMULATION , 83(11):769–784, 2007.[23] G. Rubino and B. Tuﬃn.

Rare Event Simulation using Monte Carlo Meth-ods . Wiley, 2009.[24] P. M. Shankar. Error rates in generalized shadowed fading channels.

Wire-less Personal Communications , 28:233–238, Feb. 2004.[25] Marvin Kenneth Simon and Mohamed-Slim Alouini.

Digital Communica-tion over Fading Channels . Wiley Series in Telecommunications and SignalProcessing. Wiley-Interscience, Hoboken, N.J, 2nd ed edition, 2005.[26] Tjeng Thiang Tjhung, Chin Choy Chai, and Xiaodai Dong. Outage prob-ability for Lognormal-shadowed Rician channels.

IEEE Transactions onVehicular Technology , 46(2):400–407, 1997.[27] Z. Xiao, B. Zhu, J. Cheng, and Y. Wang. Outage probability bounds ofEGC over dual-branch non-identically distributed independent Lognormalfading channels with optimized parameters.

IEEE Transactions on Vehic-ular Technology , 68(8):8232–8237, Aug. 2019.[28] B. Zhu and J. Cheng. Asymptotic outage analysis on dual-branch diversityreceptions over non-identically distributed correlated Lognormal channels.