A Short Note on the Average Maximal Number of Balls in a Bin
arXiv preprint (math.CO), May 2019.
MARCUS MICHELEN
Abstract.
We analyze the asymptotic behavior of the average maximal number of balls in a bin obtained by throwing uniformly at random $r$ balls without replacement into $n$ bins, $T$ times. Writing the expected maximum as $\frac{r}{n}T + C_{n,r}\sqrt{T} + o(\sqrt{T})$, a recent preprint of Behrouzi-Far and Zeilberger asks for an explicit expression for $C_{n,r}$ in terms of $n$, $r$ and $\pi$. In this short note, we find an expression for $C_{n,r}$ in terms of $n$, $r$ and the expected maximum of $n$ independent standard Gaussians. This provides asymptotics for large $n$ as well as closed forms for small $n$ (e.g. $C_{4,1} = \frac{3}{2}\pi^{-3/2}\arccos(-1/3)$), and shows that finding a closed form for $C_{n,r}$ is precisely as hard as the difficult question of finding the expected maximum of $n$ independent standard Gaussians.

1. Introduction
Suppose that you have $n$ bins, and in each round you throw $r$ balls such that each ball lands in a different bin, with each of the $\binom{n}{r}$ possibilities equally likely. After $T$ rounds, set $U(n,r;T)$ to be the maximum occupancy among the $n$ bins. Set $A(n,r;T) = \mathbb{E}\,U(n,r;T) - \frac{r}{n}T$, and suppose that $A(n,r;T) = C_{n,r}\sqrt{T} + o(\sqrt{T})$. A recent preprint [2] of Behrouzi-Far and Zeilberger asks for an explicit expression for $C_{n,r}$ in terms of $n$, $r$ and $\pi$; they also calculate estimates for $C_{2,1}, C_{3,1}, C_{4,1}, C_{4,2}$ using recurrence relations derived with computer aid. As motivation, Behrouzi-Far and Zeilberger [2] note that this problem arises in computer systems, since load distribution across servers can be modeled with balls and bins. Rather than utilizing exact computation in the vein of [2], we use a multivariate central limit theorem to prove the following:

Theorem 1.1. $$C_{n,r} := \lim_{T \to \infty} \frac{A(n,r;T)}{\sqrt{T}} = \sqrt{\frac{r(n-r)}{n(n-1)}}\; \mathbb{E}\left[\max_{1 \le j \le n} Z_j\right]$$ where the $Z_j$ are i.i.d. standard Gaussians.

The expected maximum of $n$ i.i.d. standard Gaussians appears to have no known closed form for general $n$, and in fact the known forms for small $n$ can be quite nasty; for instance, when $n = 5$, the expected value is $\frac{5}{2}\pi^{-3/2}\arccos(-23/27)$. Its asymptotics as $n \to \infty$ are classical, however, and yield asymptotics for $C_{n,r}$ that hold uniformly in $r$:

Corollary 1.2. As $n \to \infty$, we have $C_{n,r} \sim \frac{\sqrt{2r(n-r)\log(n)}}{n}$ uniformly in $r$.

Proof. This follows from Theorem 1.1 by utilizing $\mathbb{E}[\max_{1 \le j \le n} Z_j] \sim \sqrt{2\log(n)}$ (see, for instance, [3]). $\square$

The exact form in Theorem 1.1 also picks up a nice combinatorial property, recorded in the following corollary.
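As an illustration added alongside this note (not part of the original argument), the constant of Theorem 1.1 can be checked numerically in two independent ways: by computing $\mathbb{E}[\max_{1\le j\le n} Z_j]$ via numerical integration of the standard normal CDF, and by direct Monte Carlo simulation of the balls-in-bins process. The function names below are ad hoc.

```python
import math
import random

def expected_max_gaussian(n, lo=-12.0, hi=12.0, steps=200_000):
    """E[max of n i.i.d. standard Gaussians] by the midpoint rule, using
    E[M_n] = int_0^inf (1 - Phi(x)^n) dx - int_{-inf}^0 Phi(x)^n dx,
    where Phi is the standard normal CDF."""
    def Phi(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        Fn = Phi(x) ** n
        total += (1.0 - Fn) * dx if x >= 0 else -Fn * dx
    return total

def c_constant(n, r):
    """The closed form of Theorem 1.1: sqrt(r(n-r)/(n(n-1))) * E[M_n]."""
    return math.sqrt(r * (n - r) / (n * (n - 1))) * expected_max_gaussian(n)

def simulated_constant(n, r, T=400, rounds=500, seed=1):
    """Monte Carlo estimate of A(n,r;T)/sqrt(T): each round places r balls
    into r distinct bins, uniformly among the C(n,r) possibilities."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(rounds):
        bins = [0] * n
        for _ in range(T):
            for j in rng.sample(range(n), r):
                bins[j] += 1
        acc += (max(bins) - r * T / n) / math.sqrt(T)
    return acc / rounds

if __name__ == "__main__":
    print(c_constant(2, 1))        # ~ 1/sqrt(2*pi) ~ 0.3989
    print(simulated_constant(2, 1))
```

For $n = 2$, $r = 1$ both quantities should land near $1/\sqrt{2\pi} \approx 0.399$; the simulated value carries Monte Carlo error and the $o(\sqrt{T})$ correction, so only rough agreement is expected at finite $T$.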
Corollary 1.3.
For each $n$, the sequence $\{C_{n,r}\}_{r=1}^{n-1}$ is log-concave.

Proof. By Theorem 1.1, $C_{n,r}$ is a constant multiple of $\sqrt{r(n-r)}$, so log-concavity amounts to $r^2(n-r)^2 \ge (r-1)(n-r+1)(r+1)(n-r-1)$. This follows from the inequality
$$(r-1)(n-r+1)(r+1)(n-r-1) = \left(r^2 - 1\right)\left((n-r)^2 - 1\right) \le r^2(n-r)^2\,. \qquad \square$$

To prove Theorem 1.1, we use a multivariate central limit theorem to prove a limit theorem for $\frac{U(n,r;T) - \frac{r}{n}T}{\sqrt{T}}$ (Corollary 2.2), show that we can exchange the limit and expectation (Lemma 2.3), and then relate this expectation to the expected maximum of i.i.d. standard normals (Lemma 2.4).

2. Proving Theorem 1.1
Set $b(n,r;T)$ to be the random vector in $\{0, 1, \dots, T\}^n$ denoting the occupancies of the bins at time $T$. The following representation for $b(n,r;T)$ is immediate:

Lemma 2.1.
Fix $n, r$ and let $X$ be the random variable in $\{0,1\}^n$ chosen uniformly among vectors $v \in \{0,1\}^n$ with $\|v\|_{\ell^1} = r$. Let $X_1, X_2, \dots$ be i.i.d. copies of $X$. Then $b(n,r;T) \stackrel{d}{=} \sum_{j=1}^T X_j$. Further, the random variable $X$ has covariance matrix $\Gamma$ given by
$$\Gamma_{i,j} = \begin{cases} \frac{r(n-r)}{n^2} & \text{for } i = j \\ -\frac{r(n-r)}{n^2(n-1)} & \text{for } i \neq j\,. \end{cases}$$

Proof.
The covariance matrix $\Gamma$ can be calculated easily:
$$\Gamma_{j,j} = \frac{r}{n}\left(1 - \frac{r}{n}\right) = \frac{r(n-r)}{n^2}\,.$$
For $\Gamma_{i,j}$ with $i \neq j$, we compute
$$\Gamma_{i,j} = \frac{\binom{n-2}{r-2}}{\binom{n}{r}} - \frac{r^2}{n^2} = \frac{r(r-1)}{n(n-1)} - \frac{r^2}{n^2} = -\frac{r(n-r)}{n^2(n-1)}\,. \qquad \square$$

From here, the multivariate central limit theorem shows convergence in distribution.
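Before moving on, the covariance matrix just computed can be verified exactly for small parameters (a check added here, not in the original) by enumerating all $\binom{n}{r}$ equally likely outcomes in rational arithmetic; the helper below is purely illustrative.

```python
from fractions import Fraction
from itertools import combinations

def exact_covariance(n, r):
    """Exact covariance matrix of the uniform random 0/1 vector with exactly
    r ones, by enumerating all C(n, r) equally likely support sets."""
    supports = list(combinations(range(n), r))
    N = Fraction(len(supports))
    mean = [Fraction(sum(1 for s in supports if i in s)) / N for i in range(n)]
    cov = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            e_ij = Fraction(sum(1 for s in supports if i in s and j in s)) / N
            cov[i][j] = e_ij - mean[i] * mean[j]
    return cov

n, r = 5, 2
cov = exact_covariance(n, r)
diag = Fraction(r * (n - r), n * n)            # r(n-r)/n^2 on the diagonal
off = -Fraction(r * (n - r), n * n * (n - 1))  # -r(n-r)/(n^2(n-1)) off it
assert all(cov[i][i] == diag for i in range(n))
assert all(cov[i][j] == off for i in range(n) for j in range(n) if i != j)
print("Lemma 2.1 covariance verified for n=5, r=2")
```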
Corollary 2.2. $$\frac{U(n,r;T) - \frac{r}{n}T}{\sqrt{T}} \stackrel{d}{\longrightarrow} \max\{Y_1, \dots, Y_n\}$$ where $(Y_1, \dots, Y_n)$ is a mean-zero multivariate Gaussian with covariance matrix $\Gamma$, given in Lemma 2.1.

Proof. The multivariate central limit theorem (see [3]) implies that
$$\frac{b(n,r;T) - \mathbb{E}\,b(n,r;T)}{\sqrt{T}} \stackrel{d}{\longrightarrow} N(0, \Gamma)\,.$$
The identity $\mathbb{E}\,b(n,r;T) = \left(\frac{r}{n}T, \dots, \frac{r}{n}T\right)$ together with the continuous mapping theorem implies the Corollary. $\square$

To gain information about $A(n,r;T)$, we need to show not only that we have convergence in distribution, but that we can switch the order of taking limits and expectation.

Lemma 2.3. $$C_{n,r} := \lim_{T \to \infty} \frac{A(n,r;T)}{\sqrt{T}} = \mathbb{E}\max\{Y_1, \dots, Y_n\}$$ where $(Y_1, \dots, Y_n)$ are jointly Gaussian with mean $0$ and covariance matrix given by $\Gamma$ as defined in Lemma 2.1.

Proof. Our strategy is to show uniform integrability of $\widehat{U}(T) := \left(U(n,r;T) - \frac{r}{n}T\right)/\sqrt{T}$. For $j \in \{1, 2, \dots, n\}$, let $b(j)$ denote the number of balls in bin $j$. Then by a union bound, we have
$$\mathbb{P}\left[\left|U(n,r;T) - \tfrac{r}{n}T\right| > \lambda\sqrt{T}\right] \le n\,\mathbb{P}\left[\left|b(1) - \tfrac{r}{n}T\right| > \lambda\sqrt{T}\right]\,. \qquad (1)$$
Since $b(1)$ is a sum of $T$ i.i.d. Bernoulli$(r/n)$ variables, Hoeffding's inequality (e.g. [1, Theorem 7.2.1]) bounds
$$\mathbb{P}\left[\left|b(1) - \tfrac{r}{n}T\right| > \lambda\sqrt{T}\right] \le 2\exp\left(-2\lambda^2\right)\,.$$
Thus, for each $T$ and $K > 0$,
$$\mathbb{E}\left[|\widehat{U}(T)| \cdot \mathbf{1}\left\{|\widehat{U}(T)| > K\right\}\right] \le 2nKe^{-2K^2} + 2n\int_K^\infty e^{-2\lambda^2}\, d\lambda\,.$$
This goes to zero uniformly in $T$ as $K \to \infty$, thereby showing that the family $\{\widehat{U}(T)\}_{T > 0}$ is uniformly integrable. Since uniform integrability together with convergence in distribution implies convergence of means, Corollary 2.2 completes the proof. $\square$

All that remains now is to relate $\mathbb{E}\max\{Y_1, \dots, Y_n\}$ to the right-hand side of Theorem 1.1.

Lemma 2.4.
Let $(Y_1, \dots, Y_n)$ be jointly Gaussian with mean $0$ and covariance matrix $\Gamma$. Then
$$\mathbb{E}[\max\{Y_1, \dots, Y_n\}] = \sqrt{\frac{r(n-r)}{n(n-1)}}\; \mathbb{E}\left[\max_{1 \le j \le n} Z_j\right]$$
where the variables $Z_j$ are i.i.d. standard Gaussians.

Proof. Consider a multivariate Gaussian $(W_1, \dots, W_n)$ with mean $0$ and covariance matrix given by
$$\widetilde{\Gamma}_{i,j} = \begin{cases} \frac{n}{n-1} & \text{for } i = j \\ -\frac{n}{(n-1)^2} & \text{for } i \neq j\,. \end{cases}$$
Since $\Gamma = \frac{r(n-r)(n-1)}{n^3}\,\widetilde{\Gamma}$, we have
$$(Y_1, \dots, Y_n) \stackrel{d}{=} \sqrt{\frac{r(n-r)(n-1)}{n^3}}\,(W_1, \dots, W_n)\,. \qquad (2)$$
The vector $(W_1, \dots, W_n)$ can in fact be realized by setting $W_j = Z_j - \frac{1}{n-1}\sum_{i \neq j} Z_i$ with $Z_i$ i.i.d. standard Gaussians. This is because the two vectors are both mean-zero multivariate Gaussians and have the same covariance matrix. Setting $S_n = \sum_{i=1}^n Z_i$, we note
$$W_j = -\frac{S_n}{n-1} + \frac{n}{n-1}\,Z_j\,,$$
thereby implying
$$\max_{1 \le j \le n}\{W_j\} = -\frac{S_n}{n-1} + \left(\frac{n}{n-1}\right)\max_{1 \le j \le n}\{Z_j\}\,.$$
Taking expectations, noting $\mathbb{E}S_n = 0$ and $\sqrt{\frac{r(n-r)(n-1)}{n^3}} \cdot \frac{n}{n-1} = \sqrt{\frac{r(n-r)}{n(n-1)}}$, and utilizing (2) completes the proof. $\square$
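The two linear-algebra facts used above, that $W_j = Z_j - \frac{1}{n-1}\sum_{i\neq j} Z_i$ has covariance matrix $\widetilde{\Gamma}$ and that $\Gamma = \frac{r(n-r)(n-1)}{n^3}\widetilde{\Gamma}$, can be checked exactly in rational arithmetic. A sketch added here for illustration (the function name is ad hoc):

```python
from fractions import Fraction

def check_lemma_2_4(n, r):
    """Check Cov(W) = A A^T = Gamma~ and Gamma = (r(n-r)(n-1)/n^3) Gamma~,
    where W = A Z with A[j][j] = 1 and A[j][i] = -1/(n-1) for i != j."""
    A = [[Fraction(1) if i == j else Fraction(-1, n - 1) for i in range(n)]
         for j in range(n)]
    # Cov(W) = A A^T since Cov(Z) is the identity.
    covW = [[sum(A[i][k] * A[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]
    tilde = [[Fraction(n, n - 1) if i == j else Fraction(-n, (n - 1) ** 2)
              for j in range(n)] for i in range(n)]
    assert covW == tilde
    gamma = [[Fraction(r * (n - r), n * n) if i == j
              else Fraction(-r * (n - r), n * n * (n - 1))
              for j in range(n)] for i in range(n)]
    c2 = Fraction(r * (n - r) * (n - 1), n ** 3)  # the scaling constant squared
    assert gamma == [[c2 * tilde[i][j] for j in range(n)] for i in range(n)]
    return True

assert check_lemma_2_4(6, 2)
print("Lemma 2.4 scaling identities verified for n=6, r=2")
```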
Remark. The final piece of the proof of Lemma 2.4, relating the expected maximum of the process $\left(Z_j - \frac{1}{n-1}\sum_{i \neq j} Z_i\right)_{j=1}^n$ to that of $(Z_j)_{j=1}^n$, is due to a MathOverflow answer of Iosif Pinelis [4].

Proof of Theorem 1.1:
The theorem follows by combining Lemmas 2.3 and 2.4. $\square$

3. Comparison with Numerics
Theorem 1.1 proves an equality for $C_{n,r}$, although for large $n$, the expectation on the right-hand side of Theorem 1.1 appears to have no known closed form. Calculating these values for small $n$ is tricky and tedious; we reproduce a few values of $\mathbb{E}[\max_{1 \le j \le n} Z_j]$ which can be computed precisely, as calculated in [5]:

$n$ | $\mathbb{E}[\max_{1 \le j \le n} Z_j]$
2 | $\pi^{-1/2}$
3 | $\frac{3}{2}\pi^{-1/2}$
4 | $3\pi^{-3/2}\arccos(-1/3)$
5 | $\frac{5}{2}\pi^{-3/2}\arccos(-23/27)$

Via Theorem 1.1, these yield exact values of $C_{n,r}$ for small $n$; their numerical approximations agree closely with the values of $C_{n,r}$ predicted in [2]:

Constant | Exact Value | Numerical Approximation
$C_{2,1}$ | $\frac{1}{\sqrt{2\pi}}$ | $\approx 0.39894$
$C_{3,1}$ | $\frac{\sqrt{3}}{2\sqrt{\pi}}$ | $\approx 0.48860$
$C_{4,1}$ | $\frac{3}{2}\pi^{-3/2}\arccos(-1/3)$ | $\approx 0.51469$
$C_{4,2}$ | $\sqrt{3}\,\pi^{-3/2}\arccos(-1/3)$ | $\approx 0.59431$

References

[1] N. Alon and J. H. Spencer. The Probabilistic Method. Wiley-Interscience Series in Discrete Mathematics and Optimization. John Wiley & Sons, Inc., Hoboken, NJ, third edition, 2008. With an appendix on the life and work of Paul Erdős.
[2] A. Behrouzi-Far and D. Zeilberger. On the average maximal number of balls in a bin resulting from throwing $r$ balls into $n$ bins $t$ times. arXiv preprint arXiv:1905.07827, 2019.
[3] R. Durrett. Probability: Theory and Examples, volume 31 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge, fourth edition, 2010.
[4] I. Pinelis. Expectation of maximum of multivariate Gaussian. MathOverflow. URL: https://mathoverflow.net/q/332113 (version: 2019-05-21).
[5] A. Selby. Expected value for maximum of a normal random variable. Mathematics Stack Exchange. URL: https://math.stackexchange.com/q/510580 (version: 2013-10-01).
Dept. of Mathematics, University of Pennsylvania, 209 South 33rd Street, Philadelphia, PA 19104.
E-mail address : [email protected]@sas.upenn.edu