Minimum-Variance Importance-Sampling Bernoulli Estimator for Fast Simulation of Linear Block Codes over Binary Symmetric Channels
arXiv [cs.IT], Nov
Gianmarco Romano, Member, IEEE, and Domenico Ciuonzo, Student Member, IEEE
Abstract—In this paper the choice of the Bernoulli distribution as biased distribution for importance sampling (IS) Monte-Carlo (MC) simulation of linear block codes over binary symmetric channels (BSCs) is studied. Based on the analytical derivation of the optimal IS Bernoulli distribution, with explicit calculation of the variance of the corresponding IS estimator, two novel algorithms for fast simulation of linear block codes are proposed. For sufficiently high signal-to-noise ratios (SNRs) one of the proposed algorithms is SNR-invariant, i.e. the IS estimator does not depend on the cross-over probability of the channel. Also, the proposed algorithms are shown to be suitable for the estimation of the error-correcting capability of the code and the decoder. Finally, the effectiveness of the algorithms is confirmed through simulation results in comparison to the standard Monte-Carlo method.
Index Terms—Binary symmetric channel (BSC), importance sampling (IS), linear block codes, Monte-Carlo simulation.
I. INTRODUCTION

The authors are with the Department of Industrial and Information Engineering, Second University of Naples, via Roma 29, 81031 Aversa (CE), Italy. Email: {gianmarco.romano, domenico.ciuonzo}@unina2.it

The Monte-Carlo (MC) simulation is a general method to estimate the performances of complex systems for which analytical solutions are not available or mathematically tractable, and it is extensively used in the analysis and design of communication systems [1], [2]. The MC method has also been extensively employed to evaluate the performances of forward-error-correcting (FEC) codes with different decoding algorithms, in terms of probability of bit error (BER) or word error (WER), for which, in many cases, it is not possible to obtain exact closed-form expressions [3]–[5]. In general an upper bound is available for any linear block code; however, the error-correcting capability of the code is required [4], [5]. The MC method is also used as a verification tool in the design, development and implementation of decoding algorithms.

The computational complexity of the MC method is given by the number of generated random samples that are needed to obtain a reliable estimate of the parameters of interest. In the case of FEC codes, estimation of low BER or WER requires a high number of generated codewords to obtain results of acceptable or given accuracy, thus leading to prohibitive computational complexity. Furthermore, for very long codes the computational complexity is high even for a small number of generated words, since the decoding complexity increases the simulation time considerably. A practical case is represented by low-density parity-check (LDPC) codes [6]–[8], for which it is crucial to examine the performances at very low probability of error in order to avoid error floors, i.e. regions where the rate of decrease of the probability of error is not as high as at lower SNRs (i.e. in the waterfall region) [9], [10]. One of the impediments to the adoption of LDPC codes in fiber-optics communications, where the probabilities of error of interest are extremely low, has been the inability to rule out the existence of such floors via analysis or simulations [11]. While for some LDPC codes it is possible to predict such floors, in many other cases the MC method is the only tool available. LDPC codes are also employed, for example, in nanoscale memories [12], where a majority-logic decoder is chosen instead of soft iterative decoders, as these may not be fast enough for error correction; therefore an efficient method to estimate the performances of hard-decision decoding at very low WERs is extremely desirable.

Several mathematical techniques have been proposed in the literature in order to reduce the computational complexity of the MC method and estimate low WERs with the same accuracy [13]. Importance sampling (IS) is regarded as one of the most effective variance-reduction techniques and is widely adopted to speed up simulation of rare events, i.e. events that occur with very low probability [14]. The idea is to increase the frequency of occurrence of rare events by means of a biased distribution. The optimal biased IS distribution is known, but it cannot be used in practice since it depends on the parameter to be estimated itself. Therefore, a number of sub-optimal alternatives have been developed in the literature [13], [15].
Some of them are obtained by restricting the search of the biased distribution to a parametric family of simulation distributions; the parameters are then derived as minimizers of the estimator variance or of other related metrics, such as the cross-entropy [14], [16]. The choice of the family of biased distributions is somewhat arbitrary and may depend on the specific application of the IS method [14]. In the case of FEC, the rare event corresponds to the decoding error, and the IS method, in order to be effective, needs to generate more frequently the codewords that are likely to be erroneously decoded. The mathematical structure of the code, or some performance parameter of the code, such as the minimum distance and/or the number of correctable errors or, in the case of LDPCs, the minimum size of the absorbing sets in their Tanner graphs, may be taken into account to choose a good family [17], [18]. In [19], an SNR-invariant IS method is proposed, which, though independent of the minimum distance of the code, provides better estimates when the error-correcting capability of the decoder is available. In this paper we consider generic linear block codes and we do not make any assumption on specific parameters or structure of the code.

In this paper a specific problem is considered: (i) which is the best joint independent Bernoulli distribution that can be used as biased distribution for IS estimation of linear block code performances, and (ii) what are the strengths and limitations of this solution.

(One possibility to cope with the computational complexity of the MC method is to adopt more powerful hardware in order to reduce the generation and processing time of each codeword; this might constitute a practical solution to reduce the overall simulation time. Nevertheless, the increased system complexity requires more time per sample and compensates the reduction of execution time, thus limiting the achievable gain.)
The choice of such a family of distributions is arbitrary; it is motivated by the fact that the random generator required for the IS method is of the same type as that required in the standard MC method, and hence it is made because of its simplicity rather than by taking into account the specific structure or properties of codes. On the other hand, since the study is restricted to the parametric family of joint independent Bernoulli distributions, the gain in computational complexity that is obtained is limited by this choice, as sub-optimal IS distributions that lead to smaller IS estimator variance may exist.

Another performance measure for FEC codes is the minimum distance of the code and/or the error-correcting capability of the code or decoder, i.e. the maximum number of errors that a specific pair (code, decoder) is able to correct. The minimum distance of codes can be estimated to overcome the computational complexity required by exhaustive search, which increases exponentially with the length of the information word. In [20] the error impulse method is proposed for linear codes; it is based on the properties of the error impulse response of the soft-in soft-out decoder and its error-correcting capability. Due to the sub-optimality of the iterative decoder employed with LDPC codes, the error impulse method can lead to wrong estimates of the minimum distance. For this class of codes the method has been improved in [21] and [22]. More recently, integer programming methods have been used to calculate either the true minimum distance or an upper bound [23]. Alternatively, a branch-and-cut algorithm for finding the minimum distance of linear block codes has been proposed in [24].
In this paper a novel MC method to estimate the error-correcting capability of the code/decoder is derived.

Summarizing, the main contributions are the following: (i) analytical derivation of the optimal importance sampling distribution among the family of Bernoulli distributions, with explicit calculation of the variance of the corresponding IS estimator and proof of convexity; (ii) derivation of two algorithms for fast simulation, one that estimates numerically the optimal parameter of the importance sampling distribution and one that is invariant to SNR; (iii) derivation of one algorithm to efficiently estimate the number of correctable errors. Some illustrative numerical examples of application of the proposed algorithms, for BCH and LDPC codes, are also provided.

The proposed fast-simulation algorithms achieve large gains over standard MC simulation for a vast variety of communication systems where linear block codes are employed over binary symmetric channels (BSCs). They are simple to implement because they require only small modifications to the standard MC method, as the same random sample generator can be maintained and only the parameter of the Bernoulli generator is changed. Furthermore, in most practical situations the SNR-invariant version of the algorithm allows entire performance curves, e.g. WERs corresponding to various SNRs, to be obtained efficiently by just running one IS simulation at one sufficiently high SNR. In such a case the gain with respect to (w.r.t.) the standard MC simulation is even higher, as the number of simulation runs is dramatically reduced to one.

Figure 1. Illustrative block scheme of a communication system: source → encoder (m → c) → BPSK modulator → channel and demodulator (modelled jointly as a BSC) → decoder (z → m̂) → destination.

The outline of the paper is the following: in Sec. II the system model is introduced and some preliminaries on the MC and IS methods are given; the main results of the paper are presented in Sec. III; in Sec. IV fast-simulation algorithms are formulated and some examples are shown in Sec. V; finally, in Sec. VI some concluding remarks are given; proofs are confined to the appendices.
Notation - Lower-case bold letters denote vectors; the function wt(z) returns the number of 1's in the binary vector z; E[·] and var[·] denote the expectation and variance operators, respectively; ⌈·⌉ denotes the ceiling operator; P(·) and f(·) are used to denote probabilities and probability mass functions (pmfs); B(i, p) denotes the pmf of the n-dimensional multivariate independent Bernoulli variable z with parameter p, i.e. $f(\mathbf{z}; p) = p^{i}(1-p)^{n-i}$, where i = wt(z); E_p[·] and var_p[·] denote the expectation and variance operators with respect to the joint Bernoulli distribution of parameter p, respectively; finally, the symbols ∼ and ⊕ mean “distributed as” and “modulo-2 addition”, respectively.

II. SYSTEM MODEL
A communication system where binary codewords are transmitted over a BSC with transition probability p is shown in Fig. 1. A codeword c, belonging to the block code $\mathcal{C} \subset \mathcal{X}^n = \{0,1\}^n$, is obtained by encoding a message word $\mathbf{m} \in \mathcal{X}^k$; at the output of the channel a word $\mathbf{z} \in \mathcal{X}^n$, corrupted by noise, is observed. The decoder's task is to recover m from the observed z, if possible. The BSC may represent, for example, an additive white Gaussian noise (AWGN) channel with binary phase-shift keying (BPSK) modulation and hard decisions at the receiver, as shown in Fig. 1.

Performances of linear block codes over noisy channels are measured by the probability of decoding error, i.e. the probability that a decoded word is different from the transmitted message word, because the block code was not able to correct the errors due to the channel. This probability is also called the probability of word error or WER. This event occurs when the error pattern is not a coset leader (under the assumption that syndrome decoding is employed). Calculation of the WER is often very complex and some upper bounds are available [4]. The WER, denoted as P(e) hereinafter, can be expressed in terms of an indicator function I(z) that equals 1 when the received word is erroneously decoded and 0 otherwise; its explicit form is

$$P(e) = \sum_{\mathbf{z} \in \mathcal{X}^n} I(\mathbf{z})\, f(\mathbf{z}) = \mathrm{E}_p\big[I(\mathbf{z})\big]. \qquad (1)$$

Note that the indicator function hides the specific decoding algorithm employed. The effect of the BSC is to flip some bits, which can be mathematically expressed as $\mathbf{z} = \mathbf{c} \oplus \mathbf{e}$, with $\mathbf{e} \sim B(\mathrm{wt}(\mathbf{e}), p)$. Since the code is linear and the channel symmetric, without loss of generality (w.l.o.g.) the transmission of the all-zeros codeword is assumed, i.e. $\mathbf{c} = \mathbf{0}$, and hence the output of the channel is $\mathbf{z} = \mathbf{e}$, i.e. it equals the error pattern e.

A. Monte-Carlo simulation
In the MC simulation method the WER is estimated as follows:

$$\hat{P}_{\mathrm{MC}}(e) = \frac{1}{N}\sum_{i=1}^{N} I(\mathbf{z}_i), \qquad (2)$$

where the z_i are generated according to the distribution of the random variable z. It is known that the MC estimator (2) is unbiased and that its variance

$$\mathrm{var}\big[\hat{P}_{\mathrm{MC}}(e)\big] = \frac{P(e)\,(1 - P(e))}{N} \qquad (3)$$

is inversely proportional to N (see, for example, [14]); it can therefore be made arbitrarily small as N grows, thus increasing the accuracy of the estimator. Rather than studying the variance, it is often preferable to consider as accuracy of the estimator the relative error [14], defined as

$$\kappa \triangleq \frac{\sqrt{\mathrm{var}\big[\hat{P}_{\mathrm{MC}}(e)\big]}}{P(e)}. \qquad (4)$$

In standard MC simulation κ becomes

$$\kappa = \sqrt{\frac{1 - P(e)}{P(e)\,N}}, \qquad (5)$$

and, for small probabilities of error (P(e) ≪ 1), it is well approximated as

$$\kappa \simeq \frac{1}{\sqrt{P(e)\,N}}. \qquad (6)$$

It follows that the number of generated samples needed to achieve a given κ is

$$N \simeq \frac{1}{\kappa^2\, P(e)}. \qquad (7)$$

Eq. (7) shows that the number of samples needed to obtain a given κ is inversely proportional to P(e), and soon becomes very high, and often impractical, as P(e) decreases. For example, with a relative error of 10%, at least N ≃ 100/P(e) samples are needed to obtain the desired accuracy.

Algorithm 1
Standard MC simulation algorithm

  totWords = 0
  WERre = 1
  while (WERre > re) and (totWords …)
    …
  end while
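The standard MC loop above can be sketched as follows. The snippet is an illustrative sketch, not the paper's code: the decoder is modelled as an idealized bounded-distance decoder (every error pattern of weight greater than t causes a word error, which is exact for perfect codes such as the (7,4) Hamming code), and names such as `target_re` and `max_words` are assumptions of this sketch.

```python
import random
from math import comb

def exact_wer(n, t, p):
    # Binomial tail: WER of an idealized bounded-distance decoder
    # over a BSC (assumption: every pattern of weight > t fails).
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(t + 1, n + 1))

def mc_wer(n, t, p, target_re=0.1, max_words=200_000, seed=1):
    # Standard MC: draw i.i.d. Bernoulli(p) error patterns, count
    # decoding failures, and stop once the relative error of Eq. (5)
    # drops below target_re or the word budget is exhausted.
    rng = random.Random(seed)
    fails, total = 0, 0
    while total < max_words:
        total += 1
        if sum(rng.random() < p for _ in range(n)) > t:
            fails += 1
        if fails >= 10:  # a few hits are needed before (5) is meaningful
            wer = fails / total
            if ((1 - wer) / (wer * total)) ** 0.5 <= target_re:
                break
    return fails / total, total

wer, words = mc_wer(7, 1, 0.05)  # exact_wer(7, 1, 0.05) ≈ 0.0444
```

Consistently with (7), for κ = 10% the loop needs roughly 100/P(e), i.e. on the order of 2 × 10³ words in this example.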
In IS simulation the WER is expressed by the following equivalent of (1):

$$P(e) = \sum_{\mathbf{z} \in \mathcal{X}^n} I(\mathbf{z})\,\frac{f(\mathbf{z})}{f^*(\mathbf{z})}\,f^*(\mathbf{z}), \qquad (8)$$

where f*(z) is a different pmf for which the sum in (8) exists. The corresponding estimator is

$$\hat{P}_{\mathrm{IS}}(e) = \frac{1}{N}\sum_{i=1}^{N} I(\mathbf{z}_i)\,\frac{f(\mathbf{z}_i)}{f^*(\mathbf{z}_i)} \qquad (9)$$
$$= \frac{1}{N}\sum_{i=1}^{N} I(\mathbf{z}_i)\,W(\mathbf{z}_i), \qquad (10)$$

where $\mathbf{z}_i \sim f^*(\cdot)$ and the ratio

$$W(\mathbf{z}) \triangleq \frac{f(\mathbf{z})}{f^*(\mathbf{z})} \qquad (11)$$

is referred to as the likelihood ratio or weighting function. The estimator in (10) is called the IS estimator and is a generalization of the simple MC estimator in (2), which can be obtained as a special case (i.e. f*(z) = f(z)).

The distribution f*(z) is called the IS or biased distribution, and as long as the random generation of samples is under our control, as in the case of MC simulation, it is possible to choose any distribution. However, it is crucial to choose the IS distribution such that the variance of the IS estimator is minimized. The optimal distribution is known from theory (see for example [13], [14]) and is given by

$$f^*_{\mathrm{opt}}(\mathbf{z}) = \frac{I(\mathbf{z})\,f(\mathbf{z})}{P(e)}. \qquad (12)$$

This distribution leads to zero variance: this comes as no surprise, since f*_opt(z) contains P(e) (which is the true value of the parameter being estimated). For this reason, the optimal solution cannot be used for MC simulation. Nonetheless, significant gains in simulation time can be achieved with sub-optimal biased distributions. Several methods to find sub-optimal biased distributions have been developed, and the interested reader can refer to the comprehensive tutorial in [13]. One important goal in searching for a sub-optimal IS distribution is to obtain a probability distribution from which samples can be easily generated and that, at the same time, provides a weighted estimator with as low a variance as possible.

III. SUB-OPTIMAL IMPORTANCE SAMPLING
The main problem in the design of IS simulations is to find sub-optimal distributions that lead to a low variance of the IS estimator. The problem can be simplified if the search is limited to a parametric family of distributions, since the problem can then be recast into a standard optimization w.r.t. a finite number of parameters. Also, a proper choice of the parametric family can reduce the computational complexity due to the generation of random samples. In this paper the family of Bernoulli distributions with parameter q is considered, thus maintaining the simplicity of random generation of error patterns, since no change of the random generator is required. In practice, the WER for a BSC with cross-over probability p is estimated by simulating the transmission over a different BSC with a different cross-over probability, denoted by q. Within this restriction the optimal q, denoted q̂, is the cross-over probability that minimizes the IS estimator variance over all possible BSCs. Hereinafter, a general formula for q̂ is derived for any linear block code and for any decoding algorithm.

Consider the parametric family of joint Bernoulli distributions B(wt(z), q), generated by varying q, as IS distributions. The IS estimator for the WER in (9) specializes to

$$\hat{P}_{\mathrm{IS}}(e) = \frac{1}{N}\sum_{i=1}^{N} I(\mathbf{z}_i)\,\frac{f(\mathbf{z}_i; p)}{f(\mathbf{z}_i; q)} \qquad (13)$$
$$= \frac{1}{N}\sum_{i=1}^{N} I(\mathbf{z}_i)\,\frac{p^{\mathrm{wt}(\mathbf{z}_i)}(1-p)^{n-\mathrm{wt}(\mathbf{z}_i)}}{q^{\mathrm{wt}(\mathbf{z}_i)}(1-q)^{n-\mathrm{wt}(\mathbf{z}_i)}} \qquad (14)$$
$$= \frac{1}{N}\sum_{i=1}^{N} I(\mathbf{z}_i)\,W(\mathrm{wt}(\mathbf{z}_i); p, q), \qquad (15)$$

where $\mathbf{z}_i \sim B(\mathrm{wt}(\mathbf{z}), q)$. Under the assumption $\mathbf{c} = \mathbf{0}$, the estimator can be equivalently expressed as

$$\hat{P}_{\mathrm{IS}}(e) = \frac{1}{N}\sum_{i=1}^{N} I(\mathbf{e}_i)\,W(\mathrm{wt}(\mathbf{e}_i); p, q), \qquad (16)$$

where $\mathbf{e}_i \sim B(\mathrm{wt}(\mathbf{e}), q)$. The general expression of the variance of the above estimator is

$$\mathrm{var}_q\big[\hat{P}_{\mathrm{IS}}(e; q)\big] = \frac{\mathrm{E}_q\big[I(\mathbf{e})\,W^2(\mathrm{wt}(\mathbf{e}); p, q)\big] - P(e)^2}{N} \qquad (17)$$

and clearly depends on q through the weighting function W(·) [13].
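The estimator (16) for the Bernoulli family can be sketched as follows. As before, this is an illustrative sketch under an assumed bounded-distance decoder (failure iff wt(e) > t, exact for the perfect (7,4) Hamming code); `num_words` and the function name are assumptions of the sketch.

```python
import random
from math import comb

def is_wer(n, t, p, q, num_words=20_000, seed=1):
    # Eq. (16): draw error patterns from Bernoulli(q) and weight each
    # decoding failure (wt(e) > t, bounded-distance assumption) by
    # W(i; p, q) = (p/q)**i * ((1-p)/(1-q))**(n-i), with i = wt(e).
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(num_words):
        i = sum(rng.random() < q for _ in range(n))
        if i > t:
            acc += (p / q) ** i * ((1 - p) / (1 - q)) ** (n - i)
    return acc / num_words

# Rare event (p = 0.01) sampled at the much larger q = 2/7.
estimate = is_wer(7, 1, 0.01, 2 / 7)
exact = sum(comb(7, i) * 0.01**i * 0.99**(7 - i) for i in range(2, 8))
```

With the biased cross-over probability the failures are no longer rare, so a modest number of words already gives an estimate close to the exact binomial tail.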
Therefore, the problem is to find the parameter q that minimizes (17), i.e.

$$\hat{q} = \arg\min_q \; \mathrm{var}_q\big[\hat{P}_{\mathrm{IS}}(e; q)\big]. \qquad (18)$$

The expression of the IS estimator variance in the general case of linear block codes is given by the following lemma.

Lemma 1.
The variance of $\hat{P}_{\mathrm{IS}}(e; q)$ with importance sampling distribution in the parametric family B(i, q) is given by

$$\mathrm{var}_q\big[\hat{P}_{\mathrm{IS}}(e; q)\big] = \frac{1}{N}\left(\sum_{i=t+1}^{n} W(i; p, q)\,P_p(e; i) - P(e)^2\right), \qquad (19)$$

where W(i; p, q) is the weighting function of the IS estimator; P_p(e; i) is the joint probability of decoding error with i errors over a BSC with cross-over probability p, so that $P(e) = \sum_{i=t+1}^{n} P_p(e; i)$; and t is the error-correcting capability of the decoder.

Proof: The proof is given in Appendix A.

The above lemma provides a general expression of the variance of the IS estimator that depends on the specific decoding algorithm employed only through the error-correcting capability of the decoder t. This parameter represents the maximum number of errors that the decoder is able to correct and depends on the structure of the linear block code and the decoding algorithm [4].

In order to solve the problem given by (18) we need to search for the equilibrium points of (19) w.r.t. q. The following lemma gives a closed-form expression of the derivative of the variance.

Lemma 2.
The derivative of the variance of the IS estimator (16) is given by

$$\frac{\partial}{\partial q}\,\mathrm{var}_q\big[\hat{P}_{\mathrm{IS}}(e)\big] = -\frac{1}{N}\sum_{i=t+1}^{n} \frac{i - nq}{q(1-q)}\,W(i; p, q)\,P_p(e; i). \qquad (20)$$
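Eqs. (19) and (20) can be cross-checked numerically with a finite difference. The sketch below assumes bounded-distance decoding, so that $P_p(e; i) = \binom{n}{i} p^i (1-p)^{n-i}$ for i > t (exact for the perfect (7,4) Hamming code); all function names are assumptions of this sketch.

```python
from math import comb

def weight_fn(i, n, p, q):
    # W(i; p, q) for an error pattern of weight i.
    return (p / q) ** i * ((1 - p) / (1 - q)) ** (n - i)

def var_is(n, t, p, q):
    # Eq. (19) with N = 1, under P_p(e;i) = C(n,i) p^i (1-p)^(n-i).
    Pp = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    Pe = sum(Pp[t + 1:])
    return sum(weight_fn(i, n, p, q) * Pp[i]
               for i in range(t + 1, n + 1)) - Pe**2

def dvar_dq(n, t, p, q):
    # Closed-form derivative, Eq. (20), with N = 1.
    Pp = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    return -sum((i - n * q) / (q * (1 - q)) * weight_fn(i, n, p, q) * Pp[i]
                for i in range(t + 1, n + 1))

# Central finite difference of (19) matches the closed form (20).
n, t, p, q, h = 7, 1, 0.01, 0.3, 1e-6
fd = (var_is(n, t, p, q + h) - var_is(n, t, p, q - h)) / (2 * h)
```

The agreement between `fd` and `dvar_dq` follows from $\partial W/\partial q = -W\,(i - nq)/(q(1-q))$, the step used in the derivation.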
The proof is given in Appendix B.

The solution of the minimization problem (18) can be obtained by equating $\frac{\partial}{\partial q}\,\mathrm{var}_q\big[\hat{P}_{\mathrm{IS}}(e)\big]$ to zero, provided the IS variance is convex with respect to the variable q. The following lemma states that the second derivative of the IS estimator variance is always positive, and hence that the variance of the IS estimator is convex.

Lemma 3.
The variance of the IS estimator (16) is a convex function of the variable q.

Proof: The proof is given in Appendix C.

The following theorem gives the general expression for the value of q that minimizes the variance of the IS estimator, and for which the estimation requires the minimum number of generated samples for a fixed relative error.

Theorem 4.
The parameter q that minimizes the variance of the IS estimator (16) is given by

$$\hat{q} = \frac{1}{n}\,\frac{\sum_{i=t+1}^{n} i\,W(i; p, \hat{q})\,P_p(e; i)}{\sum_{i=t+1}^{n} W(i; p, \hat{q})\,P_p(e; i)}. \qquad (21)$$

Proof:
The proof is obtained by solving $\frac{\partial}{\partial q}\,\mathrm{var}_q\big[\hat{P}_{\mathrm{IS}}(e)\big] = 0$ and exploiting Lemma 2.

The result in (21) defines q̂ only implicitly, and therefore it is not possible to obtain a closed-form solution. In some cases, however, (21) assumes a simplified expression. When np ≪ 1 the following approximation holds [4]:

$$\mathrm{var}_q\big[\hat{P}_{\mathrm{IS}}(e)\big] \simeq \frac{1}{N}\,W(t+1; p, q)\,P_p(e; t+1) - \frac{1}{N}\,P_p(e; t+1)^2, \qquad (22)$$

and q̂ can be expressed explicitly, as stated by the following theorem.

Theorem 5.
Under the approximation np ≪ 1, the parameter q that minimizes the variance of the IS estimator (16) is

$$\hat{q} \simeq \frac{t+1}{n}. \qquad (23)$$

Proof:
The proof is given in Appendix D.

A notable consequence of Theorem 5 is the independence of q̂ from the cross-over probability p (which in turn depends on the SNR), therefore leading to an SNR-invariant IS-MC simulation. In this case, estimates of the WER for a whole range of SNRs can be obtained by running one IS-MC simulation of a BSC with parameter q̂ given by (23), in place of one simulation for each SNR. Thus the whole performance curve of WER versus SNR can be obtained with a dramatic reduction of the number of samples to be generated. It is also interesting to note that for the (7,4) Hamming code Eq. (23) gives q̂ = 2/7, which confirms the value of q̂ that Sadowsky found empirically in [17]. Furthermore, Sadowsky also noted the invariance of q̂ with respect to p, without, unfortunately, giving any explanation.

Note also that for short codes the assumption np ≪ 1 holds for a large range of SNRs, and then (23) is valid for the values of p of interest, while for long codes the same assumption holds only at high SNRs and (23) may not be useful in practice.

The result of Theorem 5 can also be used conversely, to estimate t when an estimate q̂ is available. In the next section a method to estimate q̂, and from it t, is provided. This method is particularly useful for long codes, when exhaustive search for d_min becomes computationally very intensive and/or the decoder is not optimal and no explicit relationship between d_min and t is known.
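The q̂ = 2/7 value for the (7,4) Hamming code can be reproduced numerically by minimizing the variance on a grid. The sketch below again uses the bounded-distance model $P_p(e; i) = \binom{n}{i} p^i (1-p)^{n-i}$, which is exact for this perfect code; the grid resolution is an arbitrary choice of the sketch.

```python
from math import comb

def var_is(n, t, p, q):
    # IS estimator variance, Eq. (19) with N = 1, assuming
    # P_p(e;i) = C(n,i) p^i (1-p)^(n-i) (bounded-distance decoding).
    Pp = [comb(n, i) * p**i * (1 - p)**(n - i) for i in range(n + 1)]
    Pe = sum(Pp[t + 1:])
    return sum((p / q) ** i * ((1 - p) / (1 - q)) ** (n - i) * Pp[i]
               for i in range(t + 1, n + 1)) - Pe**2

# Grid search: for small p the minimizer approaches (t+1)/n = 2/7.
n, t, p = 7, 1, 1e-3
q_hat = min((k / 10_000 for k in range(1, 10_000)),
            key=lambda q: var_is(n, t, p, q))
print(q_hat, (t + 1) / n)
```

For np ≪ 1 the variance is dominated by the weight-(t+1) term, whose weighting function is minimized exactly at q = (t+1)/n, which is why the grid minimizer sits at 2/7.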
IV. ALGORITHMS

The results presented in the previous section are exploited here to formulate two different IS-MC simulation algorithms to obtain performance curves in terms of WER vs SNR, and an algorithm to estimate the error-correcting capability of the decoder. The two fast-simulation algorithms compute the WER estimate by means of the same IS estimator (16), and they differ only in the choice of the Bernoulli IS distribution parameter q. The first algorithm, called the basic fast-simulation algorithm (IS-MC basic), estimates the optimal value q̂ and then proceeds with the WER estimation. It is the most general algorithm, since no specific assumption is required. The second algorithm assumes q = (t+1)/n, a choice based on Th. 5, and since q is independent of the current SNR, the algorithm is called the SNR-invariant IS-MC algorithm. Under the assumption np ≪ 1 the SNR-invariant IS-MC algorithm is computationally more efficient than the IS-MC basic, as the same generated samples can be used to estimate WERs at different SNRs. The choice between the two algorithms depends on the code length n and cross-over probability p (or, equivalently, the range of SNRs of interest), and therefore on whether the assumption np ≪ 1 holds or not. Finally, the third algorithm is also based on the result of Th. 5 and does not estimate the WER, but rather the error-correcting capability of the code.

A. Basic fast-simulation algorithm (IS-MC basic)
The basic version of the algorithm computes an estimate of the parameter q̂ iteratively, i.e. by updating the q at iteration j from the q at iteration j − 1. In fact, from (21) the following update rule can be derived:

$$\hat{q}_j = \frac{1}{n}\,\frac{\sum_{i=t+1}^{n} i\,W(i; p, \hat{q}_{j-1})\,P_p(e; i)}{\sum_{i=t+1}^{n} W(i; p, \hat{q}_{j-1})\,P_p(e; i)}, \qquad (24)$$

which, since $P_p(e; i) = W(i; p, q)\,P_q(e; i)$ with $q = \hat{q}_{j-1}$, can also be written as

$$\hat{q}_j = \frac{1}{n}\,\frac{\sum_{i=t+1}^{n} i\,W^2(i; p, \hat{q}_{j-1})\,P_q(e; i)}{\sum_{i=t+1}^{n} W^2(i; p, \hat{q}_{j-1})\,P_q(e; i)}. \qquad (25)$$

Finally, the stochastic counterpart approximating (25) can be written in terms of the indicator function I(·):

$$\hat{q}_j = \frac{1}{n}\,\frac{\sum_{i=1}^{N_q} I(\mathbf{z}_i)\,\mathrm{wt}(\mathbf{z}_i)\,W^2(\mathbf{z}_i; p, \hat{q}_{j-1})}{\sum_{i=1}^{N_q} I(\mathbf{z}_i)\,W^2(\mathbf{z}_i; p, \hat{q}_{j-1})}, \qquad (26)$$

where $\mathbf{z}_i \sim B(\mathrm{wt}(\mathbf{z}_i), \hat{q}_{j-1})$.

In practice the IS simulation consists of two major steps. During the first step an estimate of q̂ is derived through (26) with a fixed number of iterations, and in the second step the WER estimation is performed by running the simulation with

Algorithm 2
IS simulation with embedded estimation of q̂

  totWords = 0
  WERre = 1
  q̂ = q̂₀
  while (WERre > re) and (totWords …)
    …
  end while
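The first step of the procedure, the stochastic iteration (26), can be sketched as follows. This is an illustrative sketch under the same bounded-distance assumption as before (failure iff wt(e) > t), and the sample sizes, iteration count and function name are assumptions of the sketch; note that the samples drawn from Bernoulli(q̂_{j−1}) are weighted by the square of the weighting function.

```python
import random

def estimate_q_hat(n, t, p, q0=0.5, iters=3, words_per_iter=20_000, seed=1):
    # Stochastic iteration of Eq. (26): patterns drawn from
    # Bernoulli(q_{j-1}); each failure (wt > t, bounded-distance
    # assumption) contributes with weight W(i; p, q_{j-1})**2.
    rng = random.Random(seed)
    q = q0
    for _ in range(iters):
        num = den = 0.0
        for _ in range(words_per_iter):
            i = sum(rng.random() < q for _ in range(n))
            if i > t:
                w2 = ((p / q) ** i * ((1 - p) / (1 - q)) ** (n - i)) ** 2
                num += i * w2
                den += w2
        q = num / (n * den)
    return q

# (7,4) Hamming code, p = 0.01: the iteration settles near (t+1)/n = 2/7.
print(estimate_q_hat(7, 1, 0.01))
```

The returned q̂ can then be passed to the IS estimator (16) for the WER-estimation step.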
B. SNR-invariant fast-simulation algorithm (IS-MC SNR-invariant)

The result of Theorem 5 suggests a more computationally efficient IS-MC simulation algorithm that improves on the basic algorithm derived in the previous sub-section. In fact, under the assumptions of Theorem 5, the q̂ given by (23) does not depend on the specific cross-over probability p of the channel being simulated. Then, the same set of samples generated with q̂ can be used to calculate the estimate of the WER at different SNRs. More specifically, in (15) the only term that depends on p is the weighting function W(wt(z_i); p, q), which is a deterministic function. Therefore, given one set of N realizations of $\mathbf{z}_i \sim B(\mathrm{wt}(\mathbf{z}_i), q)$, it is possible to compute the estimated WER for any p for which the approximation np ≪ 1 holds. In other words, with just one IS simulation the WERs for any SNR in the range of application of Theorem 5 can be estimated.

On the other hand, the estimated κ that controls the number of words to be generated depends on the current SNR. A conservative rule for the choice of the relative error to be used in the stop condition is to select the relative error corresponding to the highest SNR in the given range, since, due to the monotonic decrease of the WER curve, this guarantees that all the other relative errors will be smaller.
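The reuse of a single sample set across SNRs can be sketched as follows; since W depends on the pattern only through its weight, only the weights of the failing patterns need to be stored. Bounded-distance decoding and all names are, again, assumptions of this sketch.

```python
import random

def snr_invariant_wer(n, t, p_list, num_words=50_000, seed=1):
    # Draw ONE set of error patterns from Bernoulli(q), q = (t+1)/n
    # (Th. 5); store the weights wt(e) of the failing patterns
    # (bounded-distance assumption: failure iff wt(e) > t), then
    # reweight the same set for every cross-over probability p.
    q = (t + 1) / n
    rng = random.Random(seed)
    fail_wts = []
    for _ in range(num_words):
        i = sum(rng.random() < q for _ in range(n))
        if i > t:
            fail_wts.append(i)
    return [sum((p / q) ** i * ((1 - p) / (1 - q)) ** (n - i)
                for i in fail_wts) / num_words
            for p in p_list]

# A whole WER-vs-SNR curve from a single simulation run.
wers = snr_invariant_wer(7, 1, [1e-2, 1e-3])
```

The random-generation cost is paid once; each additional SNR point only costs a deterministic reweighting pass.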
C. Error-correcting capability estimation algorithm

The first step of the basic algorithm can be used to estimate the error-correcting capability of the code and/or decoder, under the assumption of relatively high SNR, as stated by Theorem 5. In fact, Eq. (23) can be inverted to derive t from q̂, which can be estimated. Note that, since the solution must be an integer, the estimate of q̂ may not need the same accuracy as that required for fast simulation. Note also that, especially for long codes, the number of generated words needed to obtain t is far less than the number of codewords.

Figure 2. Estimated WER vs signal-to-noise ratio per uncoded bit (in dB) with the IS-MC basic fast-simulation algorithm for a set of BCH codes with R ≃ 0.9 (n = 255, k = 231, t = 3; n = 511, k = 466, t = 5; n = 1023, k = 923, t = 10; n = 2047, k = 1849, t = 18; n = 4095, k = 3693, t = 34; n = 8191, k = 7372, t = 63; n = 16383, k = 14752, t = 117; n = 32767, k = 29497, t = 220; n = 65535, k = 58991, t = 414).
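The inversion of (23) described above amounts to one line; the helper name is an illustrative assumption, and the rounding reflects the fact that t must be an integer.

```python
def t_from_q_hat(n: int, q_hat: float) -> int:
    # Invert Eq. (23): q_hat ≈ (t + 1)/n  =>  t ≈ n * q_hat - 1,
    # rounded to the nearest integer.
    return round(n * q_hat - 1)

# A rough estimate q_hat ≈ 0.29 for the (7,4) Hamming code still
# yields the correct t = 1.
print(t_from_q_hat(7, 0.29))
```

Because of the rounding, even a coarse estimate of q̂ recovers the correct integer t, which is why this step needs far fewer words than WER estimation.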
V. EXAMPLES
In this section some examples of application of the proposed fast-simulation algorithms are shown. The first example considers the application of the IS-MC basic algorithm, by simulating the performances of a set of BCH codes [4] with code rate R = k/n ≃ 0.9, decoded with the Berlekamp-Massey algorithm [27], [28]. In Fig. 2 the WER vs signal-to-noise ratio per uncoded bit in dB, (E_b/N_0)_dB, is reported, along with the parameters of the codes that have been simulated. Each curve is obtained by running the basic algorithm at different SNRs with a stop condition on the relative error κ = 0.1. For reliable estimation of the parameter q̂, the simulation of at least minNumWords words has been assured and only one iteration has been performed, i.e. ℓ = 1. The results of each simulation run are plotted as points on the interpolated curves, and correspond to the performances predicted by the theoretical upper bound for linear block codes [4]. On the same set of BCH codes the error-correcting capability estimation algorithm has been applied with 100 generated words, and it returns the correct number of correctable errors.

Fig. 3 shows the number of generated words required, with κ = 0.1, for the BCH code (2047, 1849); the standard MC curve is estimated with (7). The number, which also includes the words required to estimate q̂, increases with the SNR, but at some point, in the IS case (blue curve), it reaches a steady value. This corresponds to the region where the IS distribution does not depend on the cross-over probability of the channel (cf. Th. 5).

A second example is shown in Fig. 4, where the performances in terms of WER vs SNR for the SNR-invariant IS-MC fast-simulation algorithm are plotted. In this case a different set of BCH codes is considered, with code rate R ≃ 0.5.

Figure 3. Number of generated words vs signal-to-noise ratio per uncoded bit (in dB) for IS-MC basic (κ = 10%) and standard MC (κ = 10%, estimated with (7)), BCH code (2047, 1849).
Figure 4. IS-MC SNR-invariant fast-simulation algorithm for a set of BCH codes with R ≃ 0.5 (n = 255, k = 131, t = 18; n = 511, k = 259, t = 30; n = 1023, k = 513, t = 57; n = 2047, k = 1024, t = 106; n = 4095, k = 2057, t = 198; n = 8191, k = 4096, t = 366; n = 16383, k = 8200, t = 691; n = 32767, k = 16397, t = 1316; n = 65535, k = 32771, t = 2477).

This set presents a greater number of correctable errors, and thus the decoding algorithm requires an increased computational complexity. The stop condition has been set on the relative error estimated at the highest SNR, and only points with κ < 0.1 have been plotted. The performances in terms of WER confirm the theoretical results for BCH codes. More interestingly, it is important to note that each curve has been obtained with a single simulation run, with the total number of generated words reported in Tab. I.

The IS-MC method can also be employed to estimate the performances of LDPC codes. Fig. 5 shows the results of IS simulations of a set of LDPC codes taken from [29], [30], in terms of WER vs SNR per uncoded bit, for κ = 0.1.
IS-MC SNR-
INVARIANTALGORITHM . F
OR EACH
BCH
CODES THE TOTAL NUMBER REQUIRED TODRAW AN ENTIRE PERFORMANCE CURVE IS REPORTED . − − − − − − − − ( E b /N ) dB W E R n = 273 , k = 191 n = 96 , k = 48 (96.44.443) n = 495 , k = 433 (495.62.3.2915) n = 999 , k = 888 (999.111.3.5543) n = 1908 , k = 1696 (1908.212.4.1383) n = 4376 , k = 4094 (4376.282.4.9598) Figure 5. IS-MC basic fast-simulation algorithm for a set of LDPC codes.The code (273 , is taken from [3], the others from [29]. are decoded with the bit-flip iterative algorithm described in[19], with a number of iterations equal to 20. Estimation of ˆ q has been performed with N = 10 generated words in oneiteration. The same number is the minimum enforced to obtaina reliable estimate of the relative error.The total number of generated words as function of the SNRis shown in Fig. 6. It is interesting to note that, as for BCHcodes, at some point the number of generated words requiredto achieve the prescribed relative error (i.e. κ = 0 . ) reaches asteady value. The flat region reflects the independence of theIS estimator variance on the SNR and identifies the SNR rangeover which the SNR-invariant algorithm can be effectivelyapplied. However, the range of SNRs for which the curve isflat is different for each linear block code as it depends on theIS estimator variance which in turn depends on the structure ofthe code and the decoding algorithm. Numerical results showalso that the assumption np ≪ is too strict, as it would haveas consequence a flat region starting at higher SNR that thoseshown in Fig. 6. Furthermore, the number of generated wordsin the flat region varies with the codes. Results confirm the ( E b /N ) dB o f g e n e r a t e d w o r d s n = 273 , k = 191 n = 96 , k = 48 (96.44.443) n = 495 , k = 433 (495.62.3.2915) n = 999 , k = 888 (999.111.3.5543) n = 1908 , k = 1696 (1908.212.4.1383) n = 4376 , k = 4094 (4376.282.4.9598) Figure 6. Total number of generated words to obtain results in Fig. 5. theoretical results obtained in Sec. 
III. The error-correcting capability estimation algorithm gives the number of correctable errors shown in Tab. II. Based on these estimates, the IS-MC SNR-invariant method is employed to draw the performance curves corresponding to the codes of Fig. 5. Results are reported in Fig. 7, which, as expected, shows the same performance results as Fig. 5. The algorithm sets a stop condition on the relative error corresponding to the WER estimate at the highest SNR (in this case $E_b/N_0 = 15$ dB) and only WER estimates with relative error less than the given threshold $\kappa$ are plotted. Results also show that the SNR-invariant algorithm correctly estimates the WER over a large range of SNRs. On the other hand, at very low SNRs the approximation (21) becomes significantly different from the optimal solution. In Fig. 8 the relative error $\kappa$ vs. $E_b/N_0$ is plotted; there it becomes evident that (21) is not a good choice at low SNRs, as the IS estimator variance increases up to a level that makes the computational complexity of the IS simulation even higher than that of the standard MC method or, equivalently, makes the relative error much higher than the one obtained with the same number of generated words by the standard MC method. Furthermore, it is interesting to note that the range of SNRs for which the relative error stays below the threshold $\kappa$ is larger than expected, suggesting that the assumption in Th. 4 is too strict.

VI. CONCLUSIONS
In this paper an IS estimator for fast simulation of linear block codes with hard-decision decoding was presented. The estimator is optimal, i.e. it has minimum variance, within the restriction to the chosen parametric family of IS distributions. It is possible to obtain huge gains w.r.t. the standard MC in terms of generated words. Although limited to the family of Bernoulli distributions, numerical examples have shown that in most practical cases the gains obtained are significant. However, the effective gain depends on the code and/or the decoder performance in terms of WER. The advantage of the proposed methods is their low computational complexity and simplicity, since little modification w.r.t. the standard MC simulation is required. Finally, higher gains are achievable when the IS estimator does not depend on the cross-over probability of the channel being simulated, typically at high SNR.

Table II. Estimated number of correctable errors $t$ for the LDPC codes of Fig. 5 with bit-flip decoding [19].

Figure 7. IS-MC SNR-invariant fast-simulation algorithm for a set of LDPC codes with $\hat{q} = (t + 1)/n$ and $t$ given by Table II. The code (273, 191) is taken from [3], the others from [29].

Figure 8. Relative error as a function of $(E_b/N_0)$ dB corresponding to the WER estimations obtained by application of the SNR-invariant algorithm and reported in Fig. 7.

VII. ACKNOWLEDGMENTS
The authors would like to express their sincere gratitude to the Associate Editor and the anonymous reviewers for taking the time to review this manuscript and for providing comments that helped improve its quality and readability.

APPENDIX A
PROOF OF LEMMA

The IS estimator is

$$\hat{P}_{IS}(e) = \frac{1}{N} \sum_{i=t+1}^{n} N_i W_i \quad (27)$$

where, for ease of notation, we denote $W(i; p, q)$ as $W_i$; $N_i$ is the number of words with $i$ errors; $t$ is the maximum number of errors that the decoder can correct; $N$ is the total number of generated samples. Therefore the variance can be written as

$$\mathrm{var}_q\left[\hat{P}_{IS}(e)\right] = \mathrm{var}_q\left[\frac{1}{N} \sum_{i=t+1}^{n} N_i W_i\right] \quad (28)$$
$$= \frac{1}{N^2} \sum_{i=t+1}^{n} \mathrm{var}_q\left[N_i W_i\right], \quad (29)$$

since the generated samples constitute a realization of an i.i.d. sequence of random variables. The variance under the summation can also be expressed as

$$\mathrm{var}_q\left[N_i W_i\right] = \mathrm{var}_q\left[\sum_{j=1}^{N} I_i(z_j)\, W(\mathrm{wt}(z_j); p, q)\right] \quad (30)$$
$$= \mathrm{var}_q\left[\sum_{j=1}^{N} I_i(z_j)\, W_i\right] \quad (31)$$
$$= W_i^2\, \mathrm{var}_q\left[\sum_{j=1}^{N} I_i(z_j)\right] \quad (32)$$
$$= W_i^2 \sum_{j=1}^{N} \mathrm{var}_q\left[I_i(z_j)\right], \quad (33)$$

where $I_i(\cdot)$ is the indicator function that returns 1 when the event "$z_j$ contains $i$ errors" occurs. Note that the term $W_i$ is deterministic, as it does not depend on the random variables $z_j$, $j = 1, \ldots, N$. Now define

$$P_q(e; i) \triangleq \sum_{z} I_i(z) f(z; q) \quad (34)$$

as the joint probability that a decoding error occurs with an error pattern of weight $i$, when the IS distribution is a Bernoulli with parameter $q$. The variance of the estimator can be written as

$$\mathrm{var}_q\left[N_i W_i\right] = W_i^2 \sum_{j=1}^{N} P_q(e; i) \left(1 - P_q(e; i)\right) \quad (35)$$
$$= N W_i^2\, P_q(e; i) \left(1 - P_q(e; i)\right). \quad (36)$$

The probability $P_q(e; i)$ can also be expressed in terms of $P_p(e; i)$. By definition

$$P_p(e; i) \triangleq \sum_{z} I_i(z) f(z; p) \quad (37)$$
$$= \sum_{z} I_i(z) \frac{f(z; p)}{f(z; q)} f(z; q) \quad (38)$$
$$= W_i\, P_q(e; i). \quad (39)$$

Finally, the variance of the IS estimator is

$$\mathrm{var}\left[\hat{P}_{IS}(e)\right] = \frac{1}{N^2} \sum_{i=t+1}^{n} N W_i^2\, P_q(e; i) \left(1 - P_q(e; i)\right) = \frac{1}{N} \sum_{i=t+1}^{n} W_i^2\, P_q(e; i) \left(1 - P_q(e; i)\right)$$
$$= \frac{1}{N} \left[\sum_{i=t+1}^{n} W_i^2\, P_q(e; i) - \sum_{i=t+1}^{n} W_i^2\, P_q^2(e; i)\right] = \frac{1}{N} \left[\sum_{i=t+1}^{n} W_i\, P_p(e; i) - \sum_{i=t+1}^{n} P_p^2(e; i)\right]. \quad (40)$$
APPENDIX B
PROOF OF LEMMA

The derivative of $\mathrm{var}_q\left[\hat{P}_{IS}(e)\right]$ with respect to $q$ can be written as

$$\frac{\partial}{\partial q} \mathrm{var}_q\left[\hat{P}_{IS}(e)\right] = \frac{\partial}{\partial q}\left\{\frac{1}{N} \sum_{i=t+1}^{n} \left(W(i; p, q)\, P_p(e; i) - P_p^2(e; i)\right)\right\} = \frac{1}{N} \sum_{i=t+1}^{n} \frac{\partial W(i; p, q)}{\partial q}\, P_p(e; i), \quad (41)$$

where

$$P_p(e; i) \triangleq \sum_{z} I_i(z) f(z; p) \quad (42)$$

does not depend on $q$. After some manipulations, the derivative of $W(i; p, q)$ w.r.t. $q$ can be written as

$$\frac{\partial W(i; p, q)}{\partial q} = \frac{\partial}{\partial q}\left(\frac{p^i (1-p)^{n-i}}{q^i (1-q)^{n-i}}\right) = -W(i; p, q)\left(\frac{i}{q} - \frac{n-i}{1-q}\right). \quad (43)$$

By substituting (43) into (41) we obtain

$$\frac{\partial}{\partial q} \mathrm{var}_q\left[\hat{P}_{IS}(e)\right] = -\frac{1}{N} \sum_{i=t+1}^{n} W(i; p, q)\left(\frac{i}{q} - \frac{n-i}{1-q}\right) P_p(e; i) = -\frac{1}{N} \sum_{i=t+1}^{n} \frac{i - nq}{q(1-q)}\, W(i; p, q)\, P_p(e; i). \quad (44)$$

APPENDIX C
PROOF OF LEMMA

We show that $\frac{\partial^2}{\partial q^2} \mathrm{var}_q\left[\hat{P}_{IS}(e)\right] > 0$. The second derivative of the variance is evaluated as follows (starting from Eq. (19)):

$$\frac{\partial^2}{\partial q^2} \mathrm{var}_q\left[\hat{P}_{IS}(e)\right] = \frac{\partial}{\partial q}\left\{-\frac{1}{N} \sum_{i=t+1}^{n} \frac{i - nq}{q(1-q)}\, W(i; p, q)\, P_p(e; i)\right\}$$
$$= -\frac{1}{N} \sum_{i=t+1}^{n} P_p(e; i)\left[\frac{\partial}{\partial q}\left(\frac{i - nq}{q(1-q)}\right) W(i; p, q) + \frac{i - nq}{q(1-q)}\, \frac{\partial W(i; p, q)}{\partial q}\right]. \quad (45)$$

After some manipulations, the derivatives in (45) can be written as

$$\frac{\partial}{\partial q}\left(\frac{i - nq}{q(1-q)}\right) = \frac{i(2q - 1) - nq^2}{\left[q(1-q)\right]^2}, \quad (46)$$
$$\frac{\partial W(i; p, q)}{\partial q} = -W(i; p, q)\left(\frac{i}{q} - \frac{n-i}{1-q}\right) \quad (47)$$
$$= -W(i; p, q)\, \frac{i - nq}{q(1-q)}. \quad (48)$$

After plugging (46) and (48) into (45) the following expression is obtained:

$$\frac{\partial^2}{\partial q^2} \mathrm{var}_q\left[\hat{P}_{IS}(e)\right] = -\frac{1}{N} \sum_{i=t+1}^{n} P_p(e; i)\left[\frac{i(2q-1) - nq^2}{\left[q(1-q)\right]^2}\, W(i; p, q) - \left(\frac{i - nq}{q(1-q)}\right)^2 W(i; p, q)\right]$$
$$= \frac{1}{N} \sum_{i=t+1}^{n} P_p(e; i)\, W(i; p, q)\, \frac{\xi(q, i)}{\left[q(1-q)\right]^2}, \quad (49)$$

where $\xi(q, i) \triangleq (i - nq)^2 + nq^2 - i(2q - 1)$.
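As a numerical plausibility check of the variance expression (40), the following sketch evaluates it on a grid of $q$ values. It is illustrative only: the true decoder statistics $P_p(e; i)$ are replaced by a hypothetical bounded-distance model in which every error pattern of weight $i > t$ causes a decoding error, and the parameters $n = 31$, $t = 2$, $p = 10^{-3}$ are arbitrary choices, not taken from the paper.

```python
import math

def W(i, n, p, q):
    # IS weight W(i; p, q) = p^i (1-p)^(n-i) / (q^i (1-q)^(n-i))
    return (p / q) ** i * ((1 - p) / (1 - q)) ** (n - i)

def P_p(i, n, p):
    # Stand-in for P_p(e; i): hypothetical bounded-distance model in which
    # every pattern of weight i > t defeats the decoder, so
    # P_p(e; i) = C(n, i) p^i (1-p)^(n-i).
    return math.comb(n, i) * p ** i * (1 - p) ** (n - i)

def var_IS(q, n, t, p, N=1):
    # Eq. (40): var = (1/N) [ sum_i W_i P_p(e;i) - sum_i P_p(e;i)^2 ]
    s1 = sum(W(i, n, p, q) * P_p(i, n, p) for i in range(t + 1, n + 1))
    s2 = sum(P_p(i, n, p) ** 2 for i in range(t + 1, n + 1))
    return (s1 - s2) / N

n, t, p = 31, 2, 1e-3                      # arbitrary illustrative values
grid = [k / 1000 for k in range(1, 1000)]  # q in (0, 1)
v = [var_IS(q, n, t, p) for q in grid]

q_star = grid[v.index(min(v))]
print(q_star, (t + 1) / n)                 # numerical vs. analytical minimizer

# convexity check: second finite differences of the variance over q
d2 = [v[k - 1] - 2 * v[k] + v[k + 1] for k in range(1, len(v) - 1)]
print(all(x > 0 for x in d2))
```

Under this toy model the grid minimizer falls near $q = (t+1)/n$ and the second finite differences are all positive, consistent with the convexity established in this appendix; the actual $P_p(e; i)$ of course depends on the code and the decoding algorithm, as stressed throughout the paper.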
The sign of the second derivative depends only on the term $\xi(q, i)$, which can be rewritten as

$$\xi(q, i) = i^2 - 2inq + n^2 q^2 + nq^2 - 2iq + i \quad (50)$$
$$= n(1 + n) q^2 - 2i(n + 1) q + i(1 + i). \quad (51)$$

The discriminant of the quadratic inequality $\xi(q, i) > 0$ is given by

$$\Delta = \left(-2i(n + 1)\right)^2 - 4 n (1 + n)\, i (1 + i) \quad (52)$$
$$= 4 i^2 (n + 1)^2 - 4 n i (n + 1)(i + 1) \quad (53)$$
$$= 4 i (n + 1)\left[i(n + 1) - n(i + 1)\right] \quad (54)$$
$$= 4 i (n + 1)(ni + i - ni - n) \quad (55)$$
$$= 4 i (n + 1)(i - n). \quad (56)$$

For $i < n$ we have $\Delta < 0$, therefore the corresponding terms in the sum that defines the second derivative are all positive. For $i = n$ the term $\xi(q, n)$ is given by

$$\xi(q, n) = (n - nq)^2 + nq^2 - n(2q - 1) \quad (57)$$
$$= n^2 (1 - q)^2 + n\left(q^2 - 2q + 1\right) \quad (58)$$
$$= n(n + 1)(1 - q)^2, \quad (59)$$

which implies $\xi(q, n) \geq 0$. The property $\xi(q, n) \geq 0$ readily implies convexity of $\mathrm{var}_q\left[\hat{P}_{IS}(e)\right]$.

APPENDIX D
PROOF OF THEOREM

We seek the value of $q$ that minimizes the variance of the IS estimator (16). From (22), the only term that depends on $q$ is $W(t + 1; p, q)$, denoted for convenience as $W_{t+1}$. In order to minimize the variance of the IS estimator the term $W_{t+1}$ has to be minimized, hence

$$\arg\min_q \mathrm{var}\left[\hat{P}_{IS}(e)\right] = \arg\min_q W_{t+1} \quad (60)$$

or equivalently

$$\arg\min_q \mathrm{var}\left[\hat{P}_{IS}(e)\right] = \arg\min_q \ln W_{t+1} = \arg\max_q \ln\left[q^{t+1} (1 - q)^{n-t-1}\right]. \quad (61)$$

The solution is obtained by equating the derivative of $\ln\left[q^{t+1} (1 - q)^{n-t-1}\right]$ to zero,

$$\frac{t+1}{q} - \frac{n-t-1}{1-q} = 0,$$

which, after some manipulations, yields

$$q = \frac{t + 1}{n}. \quad (62)$$

The choice of $q$ according to the above equation minimizes the variance of the IS estimator. Note that $q = 0$ and $q = 1$ cannot be solutions of the minimization problem, as $t$ is always non-negative and upper bounded by $\lceil (d_{min} - 1)/2 \rceil$, where $d_{min}$ is the minimum distance of the code, which is always less than $n$. From (20) it is immediate to see that for $q = 0$ and $q = 1$ the variance of the IS estimator presents vertical asymptotes.

REFERENCES

[1] M. C. Jeruchim, P. Balaban, and K. S. Shanmugan,
Simulation of communication systems: modeling, methodology, and techniques. Kluwer Academic, 2nd ed., 2002.
[2] W. Tranter, Principles of communication systems simulation with wireless applications. Prentice Hall, 2004.
[3] R. Morelos-Zaragoza, The art of error correcting coding. John Wiley, 2006.
[4] S. Benedetto and E. Biglieri, Principles of digital transmission: with wireless applications. Kluwer Academic, 1999.
[5] J. Proakis and M. Salehi, Digital Communications. McGraw-Hill International Edition, McGraw-Hill Higher Education, 2008.
[6] R. Gallager, "Low-Density Parity-Check Codes," IRE Transactions on Information Theory, vol. 8, pp. 21–28, Jan. 1962.
[7] D. J. MacKay, "Good Error-Correcting Codes Based on Very Sparse Matrices," IEEE Trans. Inf. Theory, vol. 45, pp. 399–431, Mar. 1999.
[8] S. Lin and D. Costello, Error control coding: fundamentals and applications. Pearson-Prentice Hall, 2004.
[9] T. Richardson, "Error floors of LDPC codes," in Proc. of the 41st Annual Allerton Conference on Communication, Control, and Computing, vol. 41, pp. 1426–1435, 2003.
[10] S. Chilappagari, S. Sankaranarayanan, and B. Vasic, "Error Floors of LDPC Codes on the Binary Symmetric Channel," in Proc. of IEEE International Conference on Communications 2006 (ICC 2006), vol. 3, pp. 1089–1094, June 11–15, 2006.
[11] B. P. Smith and F. R. Kschischang, "Future Prospects for FEC in Fiber-Optic Communications," IEEE J. Sel. Topics Quantum Electron., vol. 16, pp. 1245–1257, Sept. 2010.
[12] S. Ghosh and P. D. Lincoln, "Dynamic LDPC Codes for Nanoscale Memory with Varying Fault Arrival Rates," in Proc. of the 6th International Conference on Design & Technology of Integrated Systems in Nanoscale Era (DTIS), Apr. 2011.
[13] P. Smith, M. Shafi, and H. Gao, "Quick Simulation: A Review of Importance Sampling Techniques in Communications Systems," IEEE J. Sel. Areas Commun., vol. 15, pp. 597–613, May 1997.
[14] R. Y. Rubinstein and D. P. Kroese, Simulation and the Monte Carlo Method. Wiley, 2nd ed., 2008.
[15] R. Srinivasan, Importance Sampling: Applications in Communications and Detection. Springer-Verlag, 2002.
[16] R. Y. Rubinstein and D. P. Kroese, The Cross-Entropy Method: A Unified Approach to Monte Carlo Simulation, Randomized Optimization and Machine Learning. Springer Verlag, 2004.
[17] J. S. Sadowsky, "A New Method for Viterbi Decoder Simulation Using Importance Sampling," IEEE Trans. Commun., vol. 38, no. 9, pp. 1341–1351, 1990.
[18] B. Xia and W. E. Ryan, "On Importance Sampling for Linear Block Codes," in Proc. of IEEE International Conference on Communications 2003 (ICC 2003), pp. 2904–2908, May 11–15, 2003.
[19] A. Mahadevan and J. M. Morris, "SNR-Invariant Importance Sampling for Hard-Decision Decoding Performance of Linear Block Codes," IEEE Trans. Commun., vol. 55, pp. 100–111, Jan. 2007.
[20] C. Berrou, S. Vaton, M. Jezequel, and C. Douillard, "Computing the Minimum Distance of Linear Codes by the Error Impulse Method," in Proc. of the Global Telecommunications Conference 2002 (GLOBECOM 2002), vol. 2, pp. 1017–1020, Nov. 17–21, 2002.
[21] X.-H. Hu, M. P. Fossorier, and E. Eleftheriou, "On the Computation of the Minimum Distance of Low-Density Parity-Check Codes," in Proc. of IEEE International Conference on Communications 2004 (ICC 2004), vol. 2, pp. 767–771, June 20–24, 2004.
[22] F. Daneshgaran, M. Laddomada, and M. Mondin, "An algorithm for the computation of the minimum distance of LDPC codes," European Transactions on Telecommunications, vol. 17, no. 1, pp. 57–62, 2006.
[23] M. Punekar, F. Kienle, N. Wehn, A. Tanatmis, S. Ruzika, and H. W. Hamacher, "Calculating the minimum distance of linear block codes via Integer Programming," in Proc. of the 6th International Symposium on Turbo Codes & Iterative Information Processing, pp. 329–333, IEEE, Sept. 2010.
[24] A. Keha and T. Duman, "Minimum distance computation of LDPC codes using a branch and cut algorithm," IEEE Trans. Commun., vol. 58, pp. 1072–1079, Apr. 2010.
[25] L. Mendo and J. M. Hernando, "A simple sequential stopping rule for Monte Carlo Simulation," IEEE Trans. Commun., vol. 54, pp. 231–241, Feb. 2006.
[26] G. Romano, A. Drago, and D. Ciuonzo, "Sub-optimal importance sampling for fast simulation of linear block codes over BSC channels," in Proc. of the 8th International Symposium on Wireless Communication Systems (ISWCS 2011), pp. 141–145, Nov. 2011.
[27] S. Wicker, Error control systems for digital communication and storage. Prentice Hall, 1995.
[28] E. Berlekamp, Algebraic Coding Theory.
Gianmarco Romano (M'11) is currently Assistant Professor at the Department of Information Engineering, Second University of Naples, Aversa (CE), Italy. He received the "Laurea" degree in Electronic Engineering from the University of Naples "Federico II" and the Ph.D. degree from the Second University of Naples, in 2000 and 2004, respectively. From 2000 to 2002 he was a Researcher at the National Laboratory for Multimedia Communications (C.N.I.T.) in Naples, Italy. In 2003 he was Visiting Scholar at the Department of Electrical and Electronic Engineering, University of Connecticut, Storrs, USA. Since 2005 he has been with the Department of Information Engineering, Second University of Naples, where in 2006 he was appointed Assistant Professor. His research interests fall within the areas of communications and signal processing.
Domenico Ciuonzo (S'11) was born in Aversa (CE), Italy, on June 29th, 1985. He received the B.Sc. (summa cum laude) and M.Sc. (summa cum laude) degrees in computer engineering and the Ph.D. degree in electronic engineering, in 2007, 2009 and 2013 respectively, from the Second University of Naples, Aversa (CE), Italy. In 2011 he took part in the Visiting Researcher Programme of the former NATO Underwater Research Center (now Centre for Maritime Research and Experimentation), La Spezia, Italy, where he worked on the "Maritime Situation Awareness" project. In 2012 he was a visiting scholar at the Electrical and Computer Engineering Department of the University of Connecticut (UConn), Storrs, USA. He is currently a postdoctoral researcher at the Department of Industrial and Information Engineering of the Second University of Naples. His research interests are mainly in the areas of Data and Decision Fusion, Statistical Signal Processing, Target Tracking and Probabilistic Graphical Models. Dr. Ciuonzo is a reviewer for several IEEE, Elsevier and Wiley journals in the areas of communications, defense and signal processing.