Hypergeometric tail inequalities: ending the insanity∗

Matthew Skala
[email protected]

November 26, 2013
I recently needed to put a tail inequality on an hypergeometric distribution. This should be an easy thing to do, but I found the available online sources to be really frustrating. Everybody uses different notation, and most people seem to like giving helpful examples in which the word "success" is used to describe failure and vice versa, and the whole thing is likely to drive the reader nuts. Here, for my own future reference and for the benefit of anyone trying to do the same thing, is a summary of what I was able to glean, in what I hope will be clearly understandable terms.

In the years since 2009, when I first posted these notes on my Web site, they have attracted a fair bit of attention and even some citations in serious academic publications, not all of which spelled my name correctly. Thus it seems appropriate to post the notes on arXiv to make future citations easier, increase my own visibility in academic search engines, and so on.

I don't claim there's any original math in these notes; this is just a summary of well-known results, but it cost me a fair bit of annoyance to get issues like notation straightened out. If you use these notes, a citation to this posting on arXiv would be appreciated.

It is assumed that you know about as much as I did about this stuff before I did the research: namely, you should know enough to know that applying a tail inequality to an hypergeometric distribution is what you want to do, even if you have trouble keeping track of the parameters of the distribution or knowing exactly which tail inequality you want. You're also expected to be mentally flexible enough to translate the balls-and-urn description into whatever your real application is. I'll spare you the confusing burned-out-lightbulbs example. My own actual application had to do with counting bits in the bitwise AND and OR of random bit strings with known numbers of 1 bits.

∗ First version posted on the Web, March 2009. Revised for minor typographic errors, February 2011. Revised for posting on arXiv, November 2013.

The articles by Chvátal and Hoeffding may be hard to find online, especially if you don't have academic library privileges [1, 2]. Contact me by email if you need help locating them.

1 The problem

You've got an urn with N balls in it. Some of them, namely M of them, are white. The rest, namely N − M of the balls, are black. You're going to draw out n balls from the urn. You are drawing them uniformly, which means that every time you pull out a ball it is equally likely to be any of the balls in the urn at that moment. However, you are drawing them without replacement, which means that after you've drawn out a ball of one colour, you've reduced the number of balls of that colour remaining, and so the next one will be a little more likely to be the other colour. If instead you threw each ball back in after drawing it, then every draw would have the same chances; we'd be dealing with the binomial distribution instead of the hypergeometric distribution, and the math would be a lot easier. But this time you're drawing without replacement.

Now, how many white balls are you going to get among the n you draw? Let's call this number i; the question is what interesting things we can say about the distribution of the random variable i, which is called an hypergeometric distribution.
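If you find code easier to read than urns, here is a minimal simulation sketch of the setup in Python. This is my own illustration, not anything from the cited papers; the parameter values are arbitrary and the numpy dependency is just for convenience.

    import numpy as np

    rng = np.random.default_rng(0)
    N, M, n = 1000, 300, 50                  # urn size, white balls, draws
    urn = np.array([1] * M + [0] * (N - M))  # 1 = white ball, 0 = black ball

    # Draw n balls without replacement, many times over, and count the
    # white balls i in each sample.
    draws = [rng.choice(urn, size=n, replace=False).sum()
             for _ in range(10000)]
    print(np.mean(draws), n * M / N)  # sample mean of i vs. E[i] = nM/N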
The short answer is that you will get about the same fraction of white balls in your n-ball sample as the fraction of white balls among the N that the urn contained at the start. That's the expected value of i. Moreover, you will nearly always get very close to exactly that fraction. The distribution has light tails. It isn't a normal-distribution bell curve (which is approached by a binomial distribution, which in turn is what you'd get by sampling with replacement), but it does have the same kind of faster-than-exponential fall-off that you would get from the normal distribution. As a result you can put a limit just a little bit above the expected value of i and say "i is nearly always below this limit," or put another limit just a little below and say "i is nearly always above this limit." That is what a tail inequality does.

The usual suspects (MathWorld [3] and Wikipedia [4]) and their sources use many different notations. I am following the notation for variables used by Chvátal [1], because his paper seemed easiest to understand. If you try to read the encyclopedia entries, you can try to translate using this table:

                             Chvátal [1]       MathWorld [3]   Wikipedia [4]
                             and these notes
    balls in urn             N                 n + m           N
    balls that count         M                 n               m
    balls that don't count   N − M             m               N − m
    balls you draw           n                 N               n
    drawn balls that count   i                 i               k

2 The distribution
What's the chance of getting exactly i white balls? For that we want the probability distribution function; Chvátal doesn't give a notation for it, but I am using one based on his notation for the cumulative distribution function:

    h(M, N, n, i) = \binom{M}{i} \binom{N - M}{n - i} \Big/ \binom{N}{n}    (1)

That follows from simple counting: how many ways can we draw out n balls including exactly i of the M white balls, compared to the number of ways we can draw out n balls without caring about how many of them are white? The answer is that we must choose i of the M white balls to draw, hence the factor of \binom{M}{i}, and n − i of the N − M black balls, hence \binom{N-M}{n-i}, and then divide that by \binom{N}{n} for drawing n of the N balls without regard to colour. (All these choices are uniform.)

The expected value is just the same fraction of white balls in the sample as in the urn:

    E[i] = n \frac{M}{N}    (2)

and the variance is as follows:

    V[i] = \frac{n M (N - M)(N - n)}{N^2 (N - 1)}    (3)

Proofs for mean and variance are in MathWorld [3].
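If you'd like to sanity-check equations (1) through (3) numerically, scipy can do it. A short sketch follows, with arbitrary parameter values; note, in keeping with the theme of these notes, that scipy uses yet another parameter order, taking the urn size first, then the white-ball count, then the number of draws.

    from math import comb
    from scipy.stats import hypergeom

    N, M, n, i = 20, 7, 12, 4
    # Equation (1), directly from binomial coefficients:
    h = comb(M, i) * comb(N - M, n - i) / comb(N, n)
    # scipy's order is hypergeom(urn size, white balls, draws), which is
    # hypergeom(N, M, n) in the notation of these notes.
    dist = hypergeom(N, M, n)
    print(h, dist.pmf(i))          # the two pmf values should agree
    print(dist.mean(), n * M / N)  # equation (2)
    print(dist.var(), n * M * (N - M) * (N - n) / (N**2 * (N - 1)))  # (3)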
The Wikipedia article (as of this writing, of course; Wikipedia is a moving target) gives some useful symmetries [4]. In our notation:

    h(M, N, n, i) = h(N - M, N, n, n - i)    (4)
    h(M, N, n, i) = h(M, N, N - n, M - i)    (5)
    h(M, N, n, i) = h(n, N, M, i)            (6)

If you have M balls white, draw n, and hope for i of them to be white, you could instead flip all the colours, draw n, and hope for n − i of them to be white (4). Also, if you draw n balls and find i to be white, that's the same as finding the M − i remaining white balls among the N − n you did not draw; you can swap "drawn" and "not drawn" balls (5). Finally, you can swap the concepts of "drawn" and "coloured white" and imagine that the urn is choosing M balls to possibly be drawn by you, instead of you choosing n balls to possibly be coloured white in the urn (6).
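These symmetries are easy to verify numerically. Here is a quick sketch, again with arbitrary parameter values:

    from math import comb

    def h(M, N, n, i):
        # Equation (1): the hypergeometric probability function.
        return comb(M, i) * comb(N - M, n - i) / comb(N, n)

    N, M, n, i = 20, 7, 12, 4
    print(h(M, N, n, i) == h(N - M, N, n, n - i))  # symmetry (4)
    print(h(M, N, n, i) == h(M, N, N - n, M - i))  # symmetry (5)
    print(h(M, N, n, i) == h(n, N, M, i))          # symmetry (6)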
3 Tail inequalities

We're interested in the chance that i is at least k, for some k that will be a little bigger than the expected value E[i]. We want to say that when k is just a tiny bit bigger than E[i], then this chance is already very small. That will mean proving that this function is small:

    H(M, N, n, k) = \sum_{i=k}^{n} h(M, N, n, i)
                  = \sum_{i=k}^{n} \binom{M}{i} \binom{N - M}{n - i} \Big/ \binom{N}{n}    (7)

That's the sum for all i ≥ k of the probability distribution function h(M, N, n, i); we could equally correctly write the summation as going to infinity, because h(M, N, n, i) is zero for i > n; I wrote it up to n for consistency with Chvátal [1].
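Here is a small numerical sketch of equation (7), checking the direct sum against scipy's survival function (arbitrary parameter values again):

    from math import comb
    from scipy.stats import hypergeom

    def H(M, N, n, k):
        # Equation (7): Pr[i >= k], summed term by term.
        return sum(comb(M, i) * comb(N - M, n - i)
                   for i in range(k, n + 1)) / comb(N, n)

    N, M, n, k = 100, 30, 20, 10
    # scipy's sf(k) is Pr[i > k], so Pr[i >= k] is sf(k - 1).
    print(H(M, N, n, k), hypergeom(N, M, n).sf(k - 1))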
Chvátal gives the following bound, which he credits to Hoeffding [1, 2]. I believe this is a special case of the well-known result now known as Hoeffding's Inequality, but that's a very powerful result, and the steps required to apply it to the hypergeometric distribution in particular are a little involved. Where p = M/N and k = (p + t)n with t ≥ 0, we have this:

    H(M, N, n, k) \le \left( \left(\frac{p}{p + t}\right)^{p + t}
                             \left(\frac{1 - p}{1 - p - t}\right)^{1 - p - t} \right)^{n}    (8)

That is a bit of a mess, but we can relax it a little further to get what Chvátal describes as a "more elegant but weaker" bound [1], which is more likely what we'll want to use when applying this result:

    H(M, N, n, k) \le e^{-2 t^2 n}    (9)

That's a nice one-sided tail inequality for hypergeometric distributions. Stating it in terms that sound like what we want for using it in proving that a randomized algorithm works: if i is an hypergeometric random variable with the parameters N, M, and n as described above, then

    Pr[i \ge E[i] + tn] \le e^{-2 t^2 n}    (10)

If we want an inequality for the other tail, then we can apply the symmetry (4) as follows:

    Pr[i \le k'] = \sum_{i=0}^{k'} h(M, N, n, i)    (11)
                 = \sum_{i=0}^{k'} h(N - M, N, n, n - i)    (12)

Then we can change the index of summation to j = n − i and get:

    Pr[i \le k'] = \sum_{j=n-k'}^{n} h(N - M, N, n, j)    (13)

The other side's inequality (9) can give us a nice bound for that if we choose k, t, and p properly. We want k = n − k' = (p + t)n, where p = (N − M)/N = 1 − (M/N). Then doing the algebra, k' = n − (p + t)n = (M/N)n − tn, we get k' = E[i] − tn, nicely equal and opposite to the other side's bound:

    Pr[i \le E[i] - tn] \le e^{-2 t^2 n}    (14)
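To get a feeling for how tight these bounds are, here is a sketch comparing the exact tails with (8), (9), and (14) for one arbitrary choice of parameters:

    from math import exp
    from scipy.stats import hypergeom

    N, M, n = 1000, 300, 100
    p = M / N                 # p = 0.3
    t = 0.1
    k = round((p + t) * n)    # k = (p + t)n = 40 here

    dist = hypergeom(N, M, n)
    upper_exact = dist.sf(k - 1)   # Pr[i >= k] = Pr[i >= E[i] + tn]
    bound8 = ((p / (p + t)) ** (p + t)
              * ((1 - p) / (1 - p - t)) ** (1 - p - t)) ** n
    bound9 = exp(-2 * t**2 * n)    # e^{-2 t^2 n}, bounds (9) and (10)
    print(upper_exact, bound8, bound9)

    lower_exact = dist.cdf(n * p - t * n)  # Pr[i <= E[i] - tn], as in (14)
    print(lower_exact, bound9)

For parameters like these the exact tail probabilities come out quite a bit smaller than either bound; these inequalities are meant for proving theorems, not for numerical accuracy.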
References

[1] V. Chvátal. The tail of the hypergeometric distribution. Discrete Mathematics, 25(3):285–287, 1979.

[2] Wassily Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58(301):13–30, 1963.

[3] Eric W. Weisstein. Hypergeometric distribution. From MathWorld—A Wolfram Web Resource. Online http://mathworld.wolfram.com/HypergeometricDistribution.html.

[4] Wikipedia. Hypergeometric distribution. Revision 273333657, 26 February 2009. Online http://en.wikipedia.org/w/index.php?title=Hypergeometric_distribution&oldid=273333657.