A Bose-Einstein Approach to the Random Partitioning of an Integer
arXiv [cond-mat.stat-mech], June
THIERRY E. HUILLET
Abstract.
Consider $N$ equally spaced points on a circle of circumference $N$. Choose at random $n$ of these $N$ points and append clockwise an arc of integral length $k$ to each of them. The resulting random set is made of a random number of connected components. Questions such as the probability of random covering and parking configurations, and the number and length of the gaps, are addressed. They are the discrete versions of similar problems raised in the continuum. For each value of $k$, asymptotic results are presented when $n$ and $N$ both go to $\infty$ according to two different regimes. This model may equivalently be viewed as a random partitioning problem of $N$ items into $n$ recipients. A grand-canonical balls-in-boxes approach is also supplied, giving some insight into the multiplicities of the box filling amounts or spacings. The latter model is a $k$-nearest neighbor random graph with $N$ vertices and $kn$ edges. We shall also briefly consider the covering problem in the context of a random graph model with $N$ vertices and $n$ (out-degree 1) edges whose endpoints are no longer required to be neighbors.

Running title: Bose-Einstein and Integer Partitioning
Keywords: Random integer partition, random allocation, discrete covering of the circle, discrete spacings, balls in boxes, Bose-Einstein, $k$-nearest neighbor random graph.

1. Introduction
Many authors have considered problems related to the coverage of the unit circle by arcs of equal sizes randomly placed on the circle, among which [20], [19], [5], [6], [18], [7], [8], [10]. In this Note, motivated by a Remark in the paper ([2], p.18) on random graphs, we are concerned with a discrete version of the above problem, following [9] and [14]. Consider $N$ equally spaced points (vertices) on the circle of circumference $N$, so with arc length 1 between consecutive points. Sample at random $n$ out of these $N$ points and consider the discrete random spacings between consecutive sampled points, turning clockwise on the circle. Let $k$ be an integer and append clockwise an arc of length $k$ to each sampled point, forming a random set of arcs on the circle. What is the probability that the circle is covered? If the circle is not covered, how many gaps are there in the random set of arcs? What is the probability that no two arcs overlap (the discrete hard rods model)? What is the probability that no two arcs overlap and that the gap lengths are smaller than $k$ itself (the discrete version of Rényi's parking model)? All these questions require some understanding of both the smallest and the largest spacings in the sample. This model can equivalently be formulated in terms of the random partitioning of $N$ items into $n$ recipients. Here also, the distributions of the smallest and largest shares attached to the recipients are of fundamental interest. We will focus on the thermodynamical limit regime $n, N \to \infty$ with $n/N \to \rho$ and also, sometimes, on the regime $n, N \to \infty$ with $n(1 - n/N)^k \to \alpha$, $0 < \alpha < \infty$. In the first regime, covering and parking configurations, for instance, are exponentially rare in the whole admissible density range of $\rho$, whereas in the second one they are macroscopically frequent. At the heart of these models is the Bose-Einstein distribution for discrete spacings.
Finally, a Bosonic grand-canonical approach to the above model will be considered, where $N$ balls are assigned at random to $N$ boxes. For this urn model, we study the number of empty boxes and the number of boxes with $i$ balls, giving some insight into the spacings multiplicities, both in the canonical and in the grand-canonical ensembles.

The model just described is a $k$-nearest neighbor random graph with $N$ vertices and $kn$ edges. In the last Section, we consider a random graph with $N$ vertices and $n$ (out-degree 1) edges whose endpoints are no longer necessarily neighbors, being now chosen at random on the whole set of vertices. In this model of a different kind, each of the $n$ sampled points is allowed to create a long-range link with any of the $N$ vertices, not necessarily with its neighbors. We estimate the covering probability for this random graph model in the spirit of Erdős-Rényi (see [1]). We show that, in sharp contrast to the $k$-nearest neighbor graph, there exists a critical density $\rho_c = 1 - e^{-1}$ above which covering occurs with probability one. The take-home message is the extent to which the chance of connectedness increases when connections are not restricted to neighbors.

2. Random partition of an integer and discrete spacings
Consider a circle of circumference $N$, with $N$ an integer, and $N$ equally spaced points on it, so with arc length 1 between consecutive points. We call this discrete set of points the $N$-circle. Draw at random $n \in \{1, .., N-1\}$ points without replacement at the integer sites of this circle (thus $M_1, .., M_n$ are identically distributed and uniform on $\{1, .., N\}$, though not independent, since the sampling is without replacement). Pick at random one of the points $M_1, .., M_n$ and call it $M_{1:n}$. Next, consider the ordered set of integer points $(M_{m:n},\ m = 1, .., n)$, turning clockwise on the circle, starting from $M_{1:n}$. Let $N_{m,n} = M_{m+1:n} - M_{m:n}$, $m = 1, .., n-1$, be the consecutive discrete spacings, with $N_{n,n} = M_{1:n} - M_{n:n}$, modulo $N$, closing the loop. Under our hypothesis, $N_{m,n} \overset{d}{=} N_n$, $m = 1, .., n$, independently of $m$, with distribution
$$\overline{F}_{N_n}(k) := P(N_n > k) = 1 - F_{N_n}(k) = \binom{N-k-1}{n-1} \Big/ \binom{N-1}{n-1}, \qquad \mathbb{E} N_n = N/n.$$
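The tail formula for a single spacing can be checked by exhaustive enumeration on small circles. The Python sketch below is only an illustration under our own conventions (sites labeled $0, .., N-1$; the helper names `spacings`, `tail_exact`, `tail_brute_force` and `mean_spacing` are hypothetical): it lists every $n$-subset of sites, computes its clockwise spacings, and compares the frequency of $\{N_n > k\}$ with $\binom{N-k-1}{n-1}/\binom{N-1}{n-1}$ in exact rational arithmetic. Since the spacings are exchangeable, averaging over all $n$ spacings of each configuration gives the law of a single spacing.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def spacings(points, N):
    """Clockwise spacings between consecutive sampled sites of the N-circle."""
    pts = sorted(points)
    n = len(pts)
    # (next - current) mod N; a single point (n = 1) has one spacing equal to N.
    return [(pts[(m + 1) % n] - pts[m]) % N or N for m in range(n)]


def tail_exact(N, n, k):
    """P(N_n > k) = C(N-k-1, n-1) / C(N-1, n-1), for 0 <= k <= N-1."""
    return Fraction(comb(N - k - 1, n - 1), comb(N - 1, n - 1))


def tail_brute_force(N, n, k):
    """Frequency of {spacing > k} over all n-subsets and all their n spacings."""
    hits = total = 0
    for pts in combinations(range(N), n):
        for s in spacings(pts, N):
            hits += s > k
            total += 1
    return Fraction(hits, total)


def mean_spacing(N, n):
    """Average spacing over all configurations; should equal N/n."""
    values = [s for pts in combinations(range(N), n) for s in spacings(pts, N)]
    return Fraction(sum(values), len(values))


if __name__ == "__main__":
    N, n = 8, 3
    for k in range(N):
        print(k, tail_exact(N, n, k), tail_brute_force(N, n, k))
    print(mean_spacing(N, n))  # prints 8/3
```

Exact fractions are used so that the comparison is an equality test rather than a floating-point tolerance.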
It is a result of considerable age (see e.g. [9]) that identically distributed (id) discrete spacings $\mathbf{N}_n := (N_{m,n};\ m = 1, .., n)$, with $|\mathbf{N}_n| := \sum_m N_{m,n} = N$, can be generated by the conditioning
$$\mathbf{N}_n = \mathbf{G}_n \mid \{|\mathbf{G}_n| = N\}, \tag{1}$$
where $|\mathbf{G}_n| := \sum_{m=1}^n G_m$ is the sum of $n$ iid geometric($\alpha$) random variables $\geq 1$, with $P(G \geq k) = \alpha^{k-1}$, $k \geq 1$, $\alpha \in (0,1)$. In particular, $N_n$ has the claimed Pólya-Eggenberger $PE(1, n-1)$ distribution:
$$P(N_n = k) = \binom{N-k-1}{n-2} \Big/ \binom{N-1}{n-1}, \qquad k = 1, .., N-n+1.$$
Note that as $n, N \to \infty$ while $n/N = \rho < 1$, $N_n \overset{d}{\to} G$, where $G \geq 1$ has the geometric($1-\rho$) distribution $P(G \geq m) = (1-\rho)^{m-1}$, $m \geq 1$. The limiting expected value of $N_n$ is $1/\rho$. With $\mathbf{k} := (k_m;\ m = 1, .., n)$, the joint law of $\mathbf{N}_n$ is
$$P(\mathbf{N}_n = \mathbf{k}) = \binom{N-1}{n-1}^{-1} \mathbf{1}(|\mathbf{k}| = N), \tag{3}$$
which is the exchangeable uniform distribution on the restricted discrete $N$-simplex $|\mathbf{k}| := \sum_{m=1}^n k_m = N$, $k_m \geq 1$, also known as the Bose-Einstein distribution. This distribution occurs in the following Pólya-Eggenberger urn model context (see [15]): an urn contains $n$ balls, all of different colors. A ball is drawn at random and replaced together with another ball of the same color. Repeating this $N-n$ times, $\mathbf{N}_n$ is the vector of the numbers of balls of the different colors in the urn. See [9]. From the random model just defined, we get
$$N = \sum_{m=1}^n N_{m,n}, \tag{4}$$
which corresponds to a random partition of $N$ into $n$ id parts or components $\geq 1$. It also models the following random allocation problem (see [16]): $N$ items are to be shared at random between $n$ recipients, and $N_{m,n}$ is the amount of the $N$ items allocated to recipient $m$. Although all shares are id, there is a great variability in the recipients' parts, as will become clear from the detailed study of the smallest and largest shares in the sample.

This model is connected to the continuous spacings between $n$ randomly placed points on the unit circle in the following way: as $N \to \infty$, $\mathbf{N}_n/N \overset{d}{\to} \mathbf{S}_n$, where $\mathbf{S}_n := (S_{1,n}, .., S_{n,n})$ has the Dirichlet uniform density on the continuous unit $n$-simplex [17]
$$f_{S_{1,n}, .., S_{n,n}}(s_1, .., s_n) = (n-1)! \cdot \delta\Big(\sum_{m=1}^n s_m - 1\Big). \tag{5}$$
Let $P_n(1) := \sum_{m=1}^n \mathbf{1}(N_{m,n} > 1)$ be the number of sampled points whose distance to their clockwise neighbor is more than one unit. There are $n - P_n(1)$ sampled points which are neighbors, therefore
$$N = 1 \cdot (n - P_n(1)) + \sum_{m=1}^n N_{m,n}\, \mathbf{1}(N_{m,n} > 1) = n + \sum_{m=1}^n (N_{m,n} - 1)_+,$$
where $i_+ = \max(i, 0)$. Appending an arc of length 1 clockwise to the $n$ sampled points and considering the induced covered subset of $\{1, .., N\}$, $\overline{L}_n(1) := \sum_{m=1}^n (N_{m,n} - 1)_+$ represents the length of the gaps (the size of the uncovered set). So, from the model, $\overline{L}_n(1) = N - n$ is a constant, and
$$N - n = \sum_{m=1}^n (N_{m,n} - 1)_+$$
corresponds to a random partition of $N - n$ into $n$ id parts or components $\geq 0$. The covered length $L_n(1) = N - \overline{L}_n(1)$ is constant, equal to $n$, which is obvious.

Of considerable interest is the sequence $(N_{m:n};\ m = 1, .., n)$ obtained by ordering the component sizes $(N_{m,n};\ m = 1, .., n)$, with $N_{1:n} \leq .. \leq N_{n:n}$. By the inclusion-exclusion principle, the cumulative distribution function $F_{N_{m:n}}(k) = P(N_{m:n} \leq k)$ is easily seen to be
$$F_{N_{m:n}}(k) = \binom{N-1}{n-1}^{-1} \sum_{q=m}^n \binom{n}{q} \sum_{p=n-q}^n (-1)^{p+q-n} \binom{q}{n-p} \binom{N-pk-1}{n-1}, \tag{6}$$
which has been known for a while in the context of spacings in the continuum (see [20]). In particular,
$$F_{N_{n:n}}(k) := P(N_{n:n} \leq k) = \binom{N-1}{n-1}^{-1} \sum_{p=0}^n (-1)^p \binom{n}{p} \binom{N-pk-1}{n-1} \tag{7}$$
and
$$\overline{F}_{N_{1:n}}(k) := P(N_{1:n} > k) = \binom{N-nk-1}{n-1} \Big/ \binom{N-1}{n-1} \tag{8}$$
are the distributions of the largest and smallest component sizes. In the formula giving $F_{N_{n:n}}(k)$, with $[x]$ standing for the integral part of $x$, the sum may as well stop at $n \wedge [(N-n)/k]$, observing that $\binom{i}{j} = 0$ if $i < j$.

Clearly, if $k = 1$, $P(N_{n:n} = 1) = 0$ whatever $n < N$ (and $P(N_{n:n} = 1) = 1$ if $n = N$). If $k = 2$ and $N > 2n$, $P(N_{n:n} \leq 2) = P(N_{n:n} = 2) = 0$. If $N = 2n$, $P(N_{n:n} = 2) = 1/\binom{2n-1}{n-1}$ is the probability of the regular configuration with all sampled points equally spaced by two arc-length units. If $n < N < 2n$, $P(N_{n:n} = 2)$ is the probability of a configuration with $2n - N$ neighbor points at distance one arc-length unit and $N - n$ points at distance two units.

As $N, k \to \infty$ while $k/N \to s$,
$$\binom{N-pk-1}{n-1} \Big/ \binom{N-1}{n-1} \to (1-ps)_+^{n-1}.$$
With $0 \leq a < b \leq N$, the joint law of $(N_{1:n}, N_{n:n})$ is given by
$$P(N_{1:n} > a,\ N_{n:n} \leq b) = \binom{N-1}{n-1}^{-1} \sum_{m=0}^n (-1)^m \binom{n}{m} \binom{N - (na + m(b-a)) - 1}{n-1}. \tag{9}$$
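Formulas (6), (7), (8) and (9) are exact finite sums, so they can be tested against a direct enumeration of the Bose-Einstein law (3), that is, of all compositions of $N$ into $n$ positive parts. The following Python sketch does this with exact rationals; it is an illustrative check written for this note, not code from the cited references, and the helper `binom` (returning 0 outside the usual range, as the sums implicitly assume) is our own convention.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def binom(a, b):
    """Binomial coefficient, taken to be 0 unless 0 <= b <= a."""
    return comb(a, b) if 0 <= b <= a else 0


def compositions(N, n):
    """All compositions of N into n parts >= 1 (each has probability 1/C(N-1, n-1))."""
    for cuts in combinations(range(1, N), n - 1):
        edges = (0, *cuts, N)
        yield [edges[j + 1] - edges[j] for j in range(n)]


def prob(N, n, event):
    """Exact probability under (3) of `event`, a predicate on the sorted parts."""
    comps = [sorted(c) for c in compositions(N, n)]
    return Fraction(sum(1 for c in comps if event(c)), len(comps))


def cdf_order(N, n, m, k):
    """Formula (6): P(N_{m:n} <= k); m = n gives (7)."""
    total = 0
    for q in range(m, n + 1):
        for p in range(n - q, n + 1):
            total += (comb(n, q) * (-1) ** (p + q - n) * comb(q, n - p)
                      * binom(N - p * k - 1, n - 1))
    return Fraction(total, comb(N - 1, n - 1))


def joint_min_max(N, n, a, b):
    """Formula (9): P(N_{1:n} > a, N_{n:n} <= b), for 0 <= a < b."""
    total = sum((-1) ** m * comb(n, m) * binom(N - (n * a + m * (b - a)) - 1, n - 1)
                for m in range(n + 1))
    return Fraction(total, comb(N - 1, n - 1))


if __name__ == "__main__":
    N, n = 10, 4
    # Largest part, formula (7), against enumeration.
    print(cdf_order(N, n, n, 3), prob(N, n, lambda c: c[-1] <= 3))
    # Smallest part, formula (8) as the case (a, b) = (k, N) of (9).
    print(joint_min_max(N, n, 1, N), prob(N, n, lambda c: c[0] > 1))
```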
In the random partitioning of $N$ items, (9) gives the probability that the shares of all $n$ recipients lie between $a$ (excluded) and $b$ (included). Putting $(a, b) = (k, N)$ and $(a, b) = (0, k)$ gives $\overline{F}_{N_{1:n}}(k)$ and $F_{N_{n:n}}(k)$, respectively. This formula was first obtained by [3] in the continuum. Putting next $a = k$, $b = 2k$, we get
$$P(N_{1:n} > k,\ N_{n:n} \leq 2k) = \binom{N-1}{n-1}^{-1} \sum_{m=0}^n (-1)^m \binom{n}{m} \binom{N-(n+m)k-1}{n-1}. \tag{10}$$
When $k = 1$, we have $P(N_{1:n} > 1,\ N_{n:n} \leq 2) = \binom{N-1}{n-1}^{-1} \mathbf{1}(N = 2n)$. If $N = 2n$, $\binom{2n-1}{n-1}^{-1}$ is the probability of the configuration where the $n$ sampled points are exactly equally spaced, each by two arc-length units.

As $n, N \to \infty$ while $n/N \to \rho < 1$, with $E$ a random variable (rv) with rate-1 exponential distribution,
$$-\log(1-\rho)\, n\, (N_{1:n} - 1) \overset{d}{\to} E; \qquad \frac{\rho}{\log n}\, N_{n:n} \overset{a.s.}{\to} 1, \tag{11}$$
suggesting that the smallest (largest) integer component in the partition of $N$ is of order $1$ (respectively $\log n$) in the considered asymptotic regime. More precisely, using the joint law of $(N_{1:n}, N_{n:n})$,
$$\Big(-\log(1-\rho)\, n\, (N_{1:n} - 1),\ N_{n:n} - \frac{\log n}{\rho}\Big) \overset{d}{\to} (E, G), \tag{12}$$
where $E$ and $G$ are independent rvs on $\mathbb{R}_+ \times \mathbb{R}$ with distributions $P(E > t) = e^{-t}$ and $P(G \leq t) = e^{-e^{-t}}$, $\mathbb{E}(G) = \gamma$ the Euler constant (exponential and Gumbel). Although in the random partitioning of $N$ all the parts attributed to the recipients are id, there is a great variability in the shares, as the smallest one is of order 1 and the largest one of order $\log n$.

3. $N$-circle covering problems

Let $\mathcal{S}_n := \{M_1, .., M_n\}$ be the discrete set of points drawn at random on the $N$-circle with circumference $N$. Fix $k \in \{1, .., N\}$. Consider the coarse-grained discrete random set of intervals
$$\mathcal{S}_n(k) := \{M_1 + l, .., M_n + l,\ 0 \leq l \leq k\}, \tag{13}$$
obtained by appending clockwise an arc of integral length $k \geq 1$ to each point of $\mathcal{S}_n$.

3.1. The number of gaps and the length of the covered set.
Let $P_n(k)$ be the number of gaps of $\mathcal{S}_n(k)$ (which is also the number of connected components), so that $P_n(k) = 0$ as soon as the $N$-circle is covered by $\mathcal{S}_n(k)$. Let also $L_n(k)$ be the total integral length of $\mathcal{S}_n(k)$. As there are $n - P_n(k)$ spacings not exceeding $k$ and $P_n(k)$ gaps, each of the latter contributing $k$ to the covered length, $L_n(k)$ can be expressed as a sum of two terms (with $i \wedge j = \min(i, j)$):
$$L_n(k) = \sum_{m=1}^{n-P_n(k)} N_{m:n} + k\, P_n(k) = \sum_{m=1}^n (N_{m,n} \wedge k). \tag{14}$$
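Identity (14), the vacancy $N - L_n(k) = \sum_m (N_{m,n} - k)_+$ and the gap count $P_n(k) = \sum_m \mathbf{1}(N_{m,n} > k)$ can be checked by simulation. In the Python sketch below, which is ours and purely illustrative, an arc of length $k$ appended at site $M$ is represented by the $k$ unit edges $[M+l, M+l+1]$, $0 \leq l < k$ (edge $j$ joins sites $j$ and $j+1$ modulo $N$); this representation of arc length is an assumption made for the sketch. The covered length is the number of covered edges, and the gaps are the maximal runs of uncovered edges around the circle.

```python
import random


def spacings(points, N):
    """Clockwise spacings between consecutive sampled sites of the N-circle."""
    pts = sorted(points)
    n = len(pts)
    return [(pts[(m + 1) % n] - pts[m]) % N or N for m in range(n)]


def covered_edges(points, N, k):
    """Edges j (joining sites j and j+1 mod N) lying under some arc [M, M+k]."""
    return {(M + l) % N for M in points for l in range(k)}


def count_gaps(covered, N):
    """Number of maximal runs of uncovered edges around the circle."""
    uncovered = [j not in covered for j in range(N)]
    # A run starts at j when edge j is uncovered and edge j-1 (cyclically) is covered.
    return sum(1 for j in range(N) if uncovered[j] and not uncovered[j - 1])


def check(N, n, k, rng):
    """Draw one configuration and compare direct counts with the spacing formulas."""
    pts = rng.sample(range(N), n)
    sp = spacings(pts, N)
    cov = covered_edges(pts, N, k)
    L = len(cov)
    assert L == sum(min(s, k) for s in sp)               # covered length, (14)
    assert N - L == sum(max(s - k, 0) for s in sp)       # vacancy
    assert count_gaps(cov, N) == sum(s > k for s in sp)  # number of gaps P_n(k)
    return L, count_gaps(cov, N)


if __name__ == "__main__":
    rng = random.Random(2024)
    for _ in range(1000):
        N = rng.randint(2, 60)
        check(N, rng.randint(1, N - 1), rng.randint(1, N), rng)
    print("identities hold on 1000 random configurations")
```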
Note also that the vacancy, which is the length of the $N$-circle not covered by any arc, is
$$\overline{L}_n(k) := N - L_n(k) = \sum_{p=1}^{P_n(k)} (N_{n-p+1:n} - k) = \sum_{m=1}^n (N_{m,n} - k)_+, \tag{15}$$
summing the gap lengths over the gaps (with $N_{n:n} - k$ the largest gap size and $N_{n-P_n(k)+1:n} - k$ the smallest gap size). We recover result (i) below, originally due to [19], and its asymptotic consequences. The following statements are mainly due to Holst, see [9].

(i) The distribution of $P_n(k)$ is
$$P(P_n(k) = p) = \frac{\binom{n}{p}}{\binom{N-1}{n-1}} \sum_{m=p}^n (-1)^{m-p} \binom{n-p}{m-p} \binom{N-mk-1}{n-1}. \tag{16}$$
(ii) As $n, N \to \infty$ while $n(1 - n/N)^k \to \alpha$, $0 < \alpha < \infty$,
$$P_n(k) \overset{d}{\to} \mathrm{Poi}(\alpha), \tag{17}$$
where $\mathrm{Poi}(\alpha)$ is a random variable with Poisson distribution of parameter $\alpha$.
(iii) As $n, N \to \infty$ while $n/N \to \rho$, with $0 < \rho < 1$ and $\overline{\rho} := 1 - \rho$:
a. Number of gaps:
$$\frac{1}{\sqrt{n}} \Big(P_n(k) - n(1 - n/N)^k\Big) \overset{d}{\to} \mathcal{N}\big(0, \sigma^2 = \overline{\rho}^{\,k}(1 - \overline{\rho}^{\,k})\big), \tag{18}$$
where $\mathcal{N}(m, \sigma^2)$ stands for the normal law with mean $m$ and variance $\sigma^2$.
b. Gap length:
$$\frac{1}{\sqrt{n}} \Big(\overline{L}_n(k) - N(1 - n/N)^k\Big) \overset{d}{\to} \mathcal{N}(0, \sigma^2), \tag{19}$$
where $\sigma^2 = \rho^{-2}\big[(1 + \overline{\rho} - \overline{\rho}^{\,k})\, \overline{\rho}^{\,k} - (\overline{\rho} + k\rho)^2\, \overline{\rho}^{\,2k-1}\big]$.

The proofs of (ii) and (iii b) are in [9]. That of (iii a) follows from similar Central Limit Theorem arguments developed there. In the first case (ii), $n \sim N\big(1 - (\alpha/N)^{1/k}\big)$, so $n$ is very close to $N$: because of that, there are finitely many gaps in the limit and the covering probability is $e^{-\alpha}$, hence macroscopic. Whereas in the second case (iii), $n \sim \rho N$ is comparatively small: the number of gaps is of order $n\overline{\rho}^{\,k}$ and the covering probability is expected to be exponentially small. Note from (iii b) that the variance of the limiting normal law is 0 when $k = 1$, in accordance with the fact that $\overline{L}_n(1) = N - n$ remains constant. ⋄

3.2. The number of arcs needed to cover the $N$-circle.
In (16), $P(P_n(k) = 0)$ is the cover probability and $P(P_n(k) = n)$ the probability that no overlap of arcs or rods takes place (the hard rods model). We have $P(P_n(k) = 0) = P(N_{n:n} \leq k)$. The cover probability $P(P_n(k) = 0)$ is also the probability that the number $\mathcal{N}(k)$ of arcs of length $k$ (the sample size) required to cover the $N$-circle is less than or equal to $n$, where $\mathcal{N}(k) = \inf(n : N_{n:n} \leq k)$. In other words, $P(\mathcal{N}(k) > n) = P(N_{n:n} > k)$ for $n \geq 1$, and so $\mathbb{E}\mathcal{N}(k) = \sum_{n \geq 0} P(\mathcal{N}(k) > n) = 1 + \sum_{n=1}^{N-1} P(N_{n:n} > k)$, with
$$P(N_{n:n} > k) = \binom{N-1}{n-1}^{-1} \sum_{m=1}^n (-1)^{m-1} \binom{n}{m} \binom{N-mk-1}{n-1}.$$
We wish to estimate $\mathbb{E}\mathcal{N}(k)$ as $N$ grows large. When $n(1 - n/N)^k \to \alpha$, that is when $n \sim N\big(1 - (\alpha/N)^{1/k}\big)$, we have $P(P_n(k) = 0) = P(\mathcal{N}(k) \leq n) \to e^{-\alpha}$. Therefore, as $N \to \infty$,
$$N^{1/k}\Big(1 - \frac{\mathcal{N}(k)}{N}\Big) \overset{d}{\to} E_k, \tag{20}$$
where $E_k$ has a Weibull($k$) distribution with $P(E_k > x) = e^{-x^k}$ and $\mathbb{E}(E_k) = \Gamma(1 + k^{-1})$. Thus
$$\mathbb{E}\mathcal{N}(k) \underset{N \to \infty}{\sim} N\Big(1 - \Gamma(1 + k^{-1})\, N^{-1/k} + o(N^{-1/k})\Big) \tag{21}$$
is the estimated expected number of length-$k$ arcs required to cover the $N$-circle.

4. Large deviation rate functions in the thermodynamical limit: Hard rods, covering and parking configurations

$k$-Hard rods configurations are those for which $N_{1:n} > k \geq 1$ (the smallest part in the decomposition of $N$ exceeds the arc-length $k$: appending an arc of length $k$ to all sampled points does not result in overlapping of the added arcs).
$k$-Covering configurations, with $k > 1$, are those for which $N_{n:n} \leq k$ (the largest part in the decomposition of $N$ does not exceed the arc-length $k$: appending an arc of length $k$ to each sampled point results in the covering of all points of the $N$-circle, a connectedness property). $k$-Parking configurations are those for which both $N_{1:n} > k$ and $N_{n:n} \leq 2k$ (the smallest part in the decomposition of $N$ exceeds the arc-length $k$ and the largest part does not exceed twice the arc-length $k$: appending an arc of length $k$ to all sampled points results in a hard rods configuration where consecutive sampled points are more than $k$ apart, but where the excess gaps do not exceed $k$, so that there is no way to add a new rod (or car) of size $k$ without provoking an overlap). All these configurations are exponentially rare in the thermodynamic limit $n, N \to \infty$ while $n/N \to \rho \in (0,1)$. We make this statement precise by computing the large deviation rate functions in each case, extending to the discrete formulation similar results obtained in the continuum, see [11].

4.1. Hard rods. $k$-hard rods configurations are those for which $N_{1:n} > k \geq 1$ [in the partitioning approach of the fortune $N$ amongst $n$ recipients, this event is realized if the share of the poorest is bounded below by $k$, a rare event]. When the number of sampled points $n$ is a fraction of $N$ (the case with a density, $n = \rho N$), there are too few sampled points for a non-overlapping configuration to occur with a reasonably large probability. Rather, one expects that the probability of non-overlapping (hard rods) configurations tends to zero exponentially fast. To see this, we need to evaluate the large $n$ expansion of $P(N_{1:n} > k)$. Note that the event $N_{1:n} > k$ has positive probability if and only if $N \geq n(k+1)$, so in the sequel we shall assume that $\rho < 1/(k+1)$, $k \geq 1$. We have
$$P(N_{1:n} > k) = \frac{Z_{n,N}}{\sum_{k_1, .., k_n \geq 1} \mathbf{1}(\sum_m k_m = N)} = \frac{Z_{n,N}}{\binom{N-1}{n-1}} \sim C \Big(\frac{n}{N}\Big)^n \Big(1 - \frac{n}{N}\Big)^{N-n} Z_{n,N},$$
where $C$ is a prefactor of at most polynomial order in $n$ and $Z_{n,N} = \sum_{k_1, .., k_n \geq 1} \prod_{m=1}^n \mathbf{1}(k_m > k)\, \mathbf{1}(\sum_m k_m = N)$. In the limit $n, N \to \infty$ with $n/N \to \rho$,
$$-\frac{1}{n} \log P(N_{1:n} > k) \to -\frac{1}{\rho}\big(\rho \log \rho + (1-\rho)\log(1-\rho)\big) + \lim_{n \to \infty} -\frac{1}{n} \log Z_{n,N}. \tag{22}$$
In this limit, the quantity $P(N_{1:n} > k)$ is easier to evaluate in an isobaric ensemble, where the pressure $p$ is held fixed instead of $\sum_m k_m$. Therefore, relaxing the constraint $\sum_m k_m = N$, we work instead with modified random variables $\widetilde{N}_{m,n}$ with exponentially tilted law
$$P\big(\widetilde{N}_{m,n} = k_m,\ m = 1, .., n\big) = \frac{\prod_{m=1}^n \mathbf{1}(k_m > k)\, e^{-p k_m}}{Z_{n,p}}.$$
Here
$$Z_{n,p} = \sum_{k_1, .., k_n \geq 1} \prod_{m=1}^n \mathbf{1}(k_m > k)\, e^{-p k_m} = \Big(\sum_{l > k} e^{-pl}\Big)^n = \Big(\frac{e^{-p(k+1)}}{1 - e^{-p}}\Big)^n$$
is the normalizing constant. Defining $G_{n,p} := -\log Z_{n,p}$, we have $\partial_p G_{n,p} = \mathbb{E}_{n,p}\big(\sum_m \widetilde{N}_{m,n} \mathbf{1}(\widetilde{N}_{m,n} > k)\big)$, and one must choose $p$ in such a way that $\partial_p G_{n,p} = N$, leading to $N = n\big(k + 1 + \frac{e^{-p}}{1 - e^{-p}}\big)$, that is $\frac{1}{\rho} = \frac{e^{-p}}{1-e^{-p}} + k + 1$, so $p = -\log\frac{1 - \rho(k+1)}{1 - \rho k}$. The latter equation relating $p$, $\rho$ and $k$ is an equation of state. Due to the equivalence of ensembles principle (see [12] for similar arguments), we have $Z_{n,N} = e^{pN} Z_{n,p}\, O(N^{-1/2})$, leading to $-\frac{1}{n}\log Z_{n,N} \sim -\frac{1}{n}\log Z_{n,p} - \frac{p}{\rho}$. Proceeding in this way, we finally get
$$-\frac{1}{n}\log P(N_{1:n} > k) \to F_{hr}(p, \rho) = -\frac{1}{\rho}\big(\rho\log\rho + (1-\rho)\log(1-\rho)\big) - \frac{p}{\rho} - \log\Big(\frac{e^{-p(k+1)}}{1 - e^{-p}}\Big), \tag{23}$$
with $\rho \in (0, 1/(k+1))$. Here, the thermodynamical "pressure" $p > 0$ and the density $\rho$ are related through the "state equation" $\partial_p F_{hr}(p, \rho) = 0$, which can consistently be checked to be
$$\frac{1}{\rho} = k + 1 + \frac{e^{-p}}{1 - e^{-p}}, \tag{24}$$
leading to $p = -\log\frac{1 - \rho(k+1)}{1 - \rho k} > 0$ (for $\rho < 1/(k+1)$).
Thus $F_{hr}$ is an explicit entropy-like positive function of $\rho$ and $k$, namely
$$F_{hr}(\rho) = -\frac{1}{\rho}\Big((1-\rho)\log(1-\rho) - (1-\rho(k+1))\log(1-\rho(k+1)) + (1-\rho k)\log(1-\rho k)\Big). \tag{25}$$
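As a numerical sanity check of (23), (24) and (25), the Python sketch below (illustrative only; the function names are ours) evaluates $F_{hr}$ in two ways: through (23) at the pressure given by the state equation (24), and through the explicit form (25). It also checks the boundary value $(k+1)\log(k+1) - k\log k$ as $\rho \uparrow 1/(k+1)$.

```python
import math


def pressure_hr(rho, k):
    """Solution of the state equation (24): p = -log((1 - rho(k+1)) / (1 - rho k))."""
    return -math.log((1 - rho * (k + 1)) / (1 - rho * k))


def entropy_term(rho):
    """-(1/rho)(rho log rho + (1 - rho) log(1 - rho)), the first term of (23)."""
    return -(rho * math.log(rho) + (1 - rho) * math.log(1 - rho)) / rho


def F_hr_pressure(p, rho, k):
    """Rate function (23) as a function of the pressure p and the density rho."""
    return entropy_term(rho) - p / rho - math.log(math.exp(-p * (k + 1)) / (1 - math.exp(-p)))


def F_hr_explicit(rho, k):
    """Explicit rate function (25)."""
    def xlogx(x):
        return x * math.log(x)
    return -(xlogx(1 - rho) - xlogx(1 - rho * (k + 1)) + xlogx(1 - rho * k)) / rho


if __name__ == "__main__":
    for k in (1, 2, 3):
        for rho in (0.05, 0.1, 0.9 / (k + 1)):
            p = pressure_hr(rho, k)
            print(k, rho, p, F_hr_pressure(p, rho, k), F_hr_explicit(rho, k))
```

The two evaluations agree to rounding error, which is a quick way to catch a sign slip when manipulating (23) into (25).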
In the thermodynamical limit, hard rods configurations are exceptional, and the hard rods large deviation rate function $F_{hr}$ is an explicit function of $\rho$ and $k$. Taking $k = 1$, we conclude that with probability tending to 1, $N_{1:n} = 1$: in the partitioning approach of the fortune $N$ amongst $n$ recipients, the share of the poorest is the smallest possible. As $\rho \uparrow 1/(k+1)$, the pressure tends to $\infty$ and $F_{hr}(\rho) \to (k+1)\log(k+1) - k\log k > 0$. As $\rho \downarrow 0$, the pressure tends to 0 and $F_{hr}(\rho) \to 0$.

4.2. Covering configurations.
Covering configurations are those for which $N_{n:n} \leq k$. In the partitioning approach of the fortune $N$ amongst $n$ recipients, this event is realized when the share of the richest is bounded above by $k$ (a rare event). Assume $n, N \to \infty$ with $n/N \to \rho \in (1/k, 1)$, where $k > 1$. One expects that the probability of $k$-covering configurations tends to zero exponentially fast. Working now with
$$Z_{n,p} = \sum_{k_1, .., k_n \geq 1} \prod_{m=1}^n \mathbf{1}(k_m \leq k)\, e^{-p k_m} = \Big(\frac{e^{-p}(1 - e^{-pk})}{1 - e^{-p}}\Big)^n$$
and proceeding as for the hard rods case, we easily get $-\frac{1}{n}\log P(N_{n:n} \leq k) \to F_c(p, \rho)$, where the covering large deviation rate function is
$$F_c(p, \rho) = -\frac{1}{\rho}\big(\rho\log\rho + (1-\rho)\log(1-\rho)\big) - \frac{p}{\rho} - \log\Big(\frac{e^{-p}(1 - e^{-pk})}{1 - e^{-p}}\Big). \tag{26}$$
Here, the thermodynamical pressure $p$ and the density $\rho \in (1/k, 1)$ are related through the covering state equation $\partial_p F_c(p, \rho) = 0$, namely
$$\frac{1}{\rho} = 1 + \frac{e^{-p}}{1 - e^{-p}} - \frac{k e^{-pk}}{1 - e^{-pk}}. \tag{27}$$
For all finite arc-lengths $k$, $k$-covering configurations are also exceptional. The $k$-covering large deviation rate function $F_c$ is in general an implicit function of $\rho$ and $k$, $\rho \in (1/k, 1)$. When $\rho \downarrow 1/k$, the pressure tends to $-\infty$ and $F_c(\rho) \to k\log k - (k-1)\log(k-1) > 0$. As $\rho \uparrow 1$, the pressure tends to $\infty$ and $F_c(\rho) \to 0$. By continuity, there is a value of $\rho$ inside the definition domain where $p = 0$, namely $\rho = 2/(k+1)$ (at $p = 0$ the tilted law is uniform on $\{1, .., k\}$, with mean $(k+1)/2$). There, $F_c(0, \rho) = -\frac{1}{\rho}\big(\rho\log\rho + (1-\rho)\log(1-\rho)\big) - \log k$. In the partitioning approach of the fortune $N$ amongst $n$ recipients, the share of the richest is bounded above by $k$ with a probability tending to 0 exponentially fast.

Remark:
When $k = 2$, the covering equation of state can be solved explicitly, because it boils down to a second-degree equation in $e^{-p}$. One finds $p = -\log\frac{1-\rho}{2\rho - 1}$. Plugging this expression of $p$ into $F_c(p, \rho)$ with $k = 2$ gives $F_c = -\frac{1}{\rho}\big(2\rho\log\rho - (2\rho - 1)\log(2\rho - 1)\big)$, an explicit function of $\rho \in (1/2, 1)$. Note that $p \uparrow \infty$ as $\rho \uparrow 1$, $p \downarrow -\infty$ as $\rho \downarrow 1/2$, and $p = 0$ when $\rho = 2/3$. We have $F_c(0, 2/3) = \frac{3}{2}\log 3 - 2\log 2$.

4.3. Parking configurations.
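Parking configurations are those for which $N_{1:n} > k$ and $N_{n:n} \leq 2k$. In the partitioning approach of the fortune $N$ amongst $n$ recipients, this event is realized if every share lies in $\{k+1, .., 2k\}$, so that in particular the share of the richest is less than twice the share of the poorest. Assume $n, N \to \infty$ with $n/N \to \rho \in (1/(2k), 1/(k+1))$. One expects that the probability of $k$-parking configurations tends to zero exponentially fast. Working now with
$$Z_{n,p} = \sum_{k_1, .., k_n \geq 1} \prod_{m=1}^n \mathbf{1}(k < k_m \leq 2k)\, e^{-p k_m} = \Big(\frac{e^{-p(k+1)}(1 - e^{-pk})}{1 - e^{-p}}\Big)^n$$
and proceeding as before, we get $-\frac{1}{n}\log P(N_{1:n} > k,\ N_{n:n} \leq 2k) \to F_\pi(p, \rho)$, where the parking large deviation rate function is
$$F_\pi(p, \rho) = -\frac{1}{\rho}\big(\rho\log\rho + (1-\rho)\log(1-\rho)\big) - \frac{p}{\rho} - \log\Big(\frac{e^{-p(k+1)}(1 - e^{-pk})}{1 - e^{-p}}\Big). \tag{28}$$
Here, the pressure $p$ and the density $\rho \in (1/(2k), 1/(k+1))$ are related through the parking equation of state $\partial_p F_\pi(p, \rho) = 0$, namely
$$\frac{1}{\rho} = k + 1 + \frac{e^{-p}}{1 - e^{-p}} - \frac{k e^{-pk}}{1 - e^{-pk}}. \tag{29}$$
The parking configurations large deviation rate function $F_\pi$ is an implicit function of $\rho$ and $k$, with $\rho \in (1/(2k), 1/(k+1))$. The latter formula can be extended to the border case $k = 1$. Indeed, when $k = 1$, $\rho = 1/2$ and $F_\pi(p, 1/2) = 2\log 2$ whatever the value of $p$. By Stirling's formula, this is in agreement with the fact that $P(N_{1:n} > 1,\ N_{n:n} \leq 2) = \binom{N-1}{n-1}^{-1}$ is nonzero only if $N = 2n$, being then the probability of the regular configuration where the $n$ sampled points are all exactly equally spaced by two arc-length units.

Remark: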
When $k = 2$, the parking equation of state can be solved explicitly to give $p = -\log\frac{1-3\rho}{4\rho - 1}$. Plugging this expression of $p$ into $F_\pi(p, \rho)$ with $k = 2$ gives $F_\pi$ as an explicit function of $\rho \in (1/4, 1/3)$. Note that $p \uparrow \infty$ as $\rho \uparrow 1/3$, $p \downarrow -\infty$ as $\rho \downarrow 1/4$, and $p = 0$ when $\rho = 2/7$.

5. The grand canonical partition of $N$

Suppose $N$ indistinguishable balls are assigned at random into $N$ (labeled) boxes. Let $N_{n,N} \geq 0$ be the number of balls in box $n$. This leads to a random partition of $N$, now into $N$ id summands which are $\geq 0$:
$$N = \sum_{n=1}^N N_{n,N}. \tag{30}$$
We have
$$P(N_{1,N} = k_1, .., N_{N,N} = k_N) = \binom{2N-1}{N}^{-1}, \tag{31}$$
which is a Bose-Einstein distribution on the full $N$-simplex $\{k_n \geq 0 : \sum_{n=1}^N k_n = N\}$. Summing over all the $k_n$ but one, the marginal distribution of $N_{1,N}$ is easily seen to be
$$P(N_{1,N} = k) = \binom{2N-k-2}{N-k} \Big/ \binom{2N-1}{N}, \qquad k = 0, .., N. \tag{32}$$
Let $P_N = \sum_{n=1}^N \mathbf{1}(N_{n,N} > 0)$ count the number of summands which are strictly positive (the number of non-empty boxes). With $k_m \geq 1$ and $\sum_{m=1}^n k_m = N$, listing the occupancies of the $n$ non-empty boxes in their order, we obtain
$$P(N_{1,N} = k_1, .., N_{n,N} = k_n;\ P_N = n) = \binom{N}{n} \Big/ \binom{2N-1}{N}, \tag{33}$$
which is independent of the filled box occupancies $(k_1, .., k_n)$ (the probability being uniform). As there are $\binom{N-1}{n-1}$ sequences $k_m \geq 1$, $m = 1, .., n$, satisfying $\sum_{m=1}^n k_m = N$, summing over them we get the hypergeometric distribution of $P_N$:
$$P(P_N = n) = \binom{N}{n}\binom{N-1}{n-1} \Big/ \binom{2N-1}{N}, \qquad n = 1, .., N. \tag{34}$$
This distribution occurs in the following urn model: draw $N$ balls without replacement from an urn containing $2N - 1$ balls, $N$ of which are white and $N - 1$ black. Then $P_N$ is the number of white balls drawn from the urn. Its mean is $N^2/(2N-1) \sim N/2$ and its variance is $N^2(N-1)/\big(2(2N-1)^2\big) \sim N/8$. As a result,
$$P(N_{1,N} = k_1, .., N_{n,N} = k_n \mid P_N = n) = \binom{N-1}{n-1}^{-1} \mathbf{1}(|\mathbf{k}| = N), \tag{35}$$
which is the conditional Bose-Einstein spacings model with $k_m \geq 1$ described in (3). The balls-in-boxes model just defined is therefore an extension of the conditional Bose-Einstein model, allowing the number of sampled points to be unknown and random.

5.1. Repetitions (grand canonical).
It is likely that some boxes contain the same number of balls. To take these multiplicities into account, let $A_{i,N}$, $i \in \{0, .., N\}$, count the number of boxes with exactly $i$ balls, that is
$$A_{i,N} = \#\{n \in \{1, .., N\} : N_{n,N} = i\} = \sum_{n=1}^N \mathbf{1}(N_{n,N} = i). \tag{36}$$
Then $\sum_{i=0}^N A_{i,N} = N$, where $\sum_{i=1}^N A_{i,N} = P_N$ is the number of filled boxes and $A_{0,N} = N - P_N$ the number of empty ones. The joint probability of the $A_{i,N}$ is given by the Ewens-type formula (see [4] and [13])
$$P(A_{0,N} = a_0, A_{1,N} = a_1, .., A_{N,N} = a_N) = \binom{2N-1}{N}^{-1} \frac{N!}{\prod_{i=0}^N a_i!}, \tag{37}$$
on the set $\sum_{i=0}^N a_i = \sum_{i=1}^N i\, a_i = N$. Let us now investigate the marginal laws of the $A_{i,N}$. Firstly, the law of $A_{0,N} = N - P_N$ clearly is
$$P(A_{0,N} = a) = \binom{N}{a}\binom{N-1}{a} \Big/ \binom{2N-1}{N}, \qquad a = 0, .., N-1, \tag{38}$$
with $\mathbb{E}(A_{0,N}) \sim N/2$. Secondly, recalling $A_{i,N} = \sum_{n=1}^N \mathbf{1}(N_{n,N} = i)$, with $(N)_l := N(N-1)..(N-l+1)$ and using the exchangeability of $(N_{1,N}, .., N_{N,N})$, the probability generating function of $A_{i,N}$ ($i \neq 0$) reads
$$\mathbb{E}\big(z^{A_{i,N}}\big) = 1 + \sum_{l \geq 1} \frac{(z-1)^l}{l!} (N)_l\, P(N_{1,N} = i, .., N_{l,N} = i).$$
Using $P(N_{1,N} = k_1, .., N_{l,N} = k_l) = \binom{2N-l-1-\sum_{m=1}^l k_m}{N-l-1} \big/ \binom{2N-1}{N-1}$, we get the falling factorial moments of $A_{i,N}$ as
$$m_{l,i}(N) := \mathbb{E}\big[(A_{i,N})_l\big] = (N)_l \binom{2N-l-1-li}{N-l-1} \Big/ \binom{2N-1}{N-1}, \tag{39}$$
where $l \in \{1, .., l(i) = (N-1) \wedge [N/i]\}$. The marginal distribution of $A_{i,N}$ is thus
$$P(A_{i,N} = a_i) = \sum_{l=a_i}^{l(i)} \frac{(-1)^{l-a_i}}{l!} \binom{l}{a_i}\, m_{l,i}(N), \qquad a_i \in \{0, .., l(i)\}. \tag{40}$$
Taking $l = 1$, $\mathbb{E}(A_{i,N}) = N\binom{2N-i-2}{N-2} \big/ \binom{2N-1}{N-1}$. The variance of $A_{i,N}$ is $\sigma^2(A_{i,N}) = m_{2,i}(N) + m_{1,i}(N) - m_{1,i}(N)^2$. In particular, we find that $\mathbb{E}(A_{1,N}) = N^2/(2(2N-1)) \sim N/4$: when $N$ is large, about one fourth of the $N$ boxes are filled by singletons (recall that about one half of them are empty). The variance of $A_{1,N}$ is $\sigma^2(A_{1,N}) \sim 3N/16$, and $A_{1,N}$, properly normalized, converges to a normal distribution. Next, we can check that $\mathbb{E}(A_{2,N}) \sim N/8$ and, more generally, $\mathbb{E}(A_{i,N}) \sim N/2^{i+1}$, showing a geometric decay in $i$ of $\mathbb{E}(A_{i,N})$. Finally, for $i \geq 2$, the probability that $A_{i,N}$ takes its maximal possible value $l(i) = [N/i]$ is $P(A_{i,N} = l(i)) = m_{l(i),i}(N)/l(i)!$, which equals $\binom{N}{l(i)} \big/ \binom{2N-1}{N-1}$ when $N = i\, l(i)$. For example, the probability that all $N$ boxes are filled by singletons is $P(A_{1,N} = N) = 1/\binom{2N-1}{N-1}$, which is exponentially small.

5.2. Multiplicities and conditioning.
Let us now investigate the same problem while conditioning on $P_N = n$.
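Before conditioning, the unconditional results of the previous subsection can be checked by brute force for small $N$. The Python sketch below (an illustration of ours; all names are hypothetical) enumerates the $\binom{2N-1}{N}$ equally likely occupancy vectors of (31), compares the falling factorial moments of $A_{i,N}$ with (39) and the law of $P_N$ with (34), and checks the closed forms $\mathbb{E}(A_{1,N}) = N^2/(2(2N-1))$ and $\sigma^2(A_{1,N}) = N^2(6N^2 - 12N + 5)/\big(4(2N-1)^2(2N-3)\big)$. The latter is our own simplification of $m_{2,1} + m_{1,1} - m_{1,1}^2$ from (39), valid for $N \geq 3$, and is the source of the asymptotics $\sigma^2(A_{1,N}) \sim 3N/16$.

```python
from fractions import Fraction
from itertools import combinations
from math import comb, perm


def binom(a, b):
    """Binomial coefficient, taken to be 0 unless 0 <= b <= a."""
    return comb(a, b) if 0 <= b <= a else 0


def occupancies(N):
    """All ways of putting N indistinguishable balls into N labeled boxes (stars and bars)."""
    slots = 2 * N - 1
    for bars in combinations(range(slots), N - 1):
        edges = (-1, *bars, slots)
        yield [edges[j + 1] - edges[j] - 1 for j in range(N)]


def falling_moment_formula(N, l, i):
    """Formula (39): E[(A_{i,N})_l], for 1 <= l <= N - 1."""
    return Fraction(perm(N, l) * binom(2 * N - l - 1 - l * i, N - l - 1),
                    comb(2 * N - 1, N - 1))


def falling_moment_brute(N, l, i):
    """E[(A_{i,N})_l] by enumeration; perm(a, l) is the falling factorial (a)_l."""
    configs = list(occupancies(N))
    total = sum(perm(sum(1 for x in c if x == i), l) for c in configs)
    return Fraction(total, len(configs))


def law_PN_brute(N):
    """Exact law of the number P_N of non-empty boxes."""
    configs = list(occupancies(N))
    counts = {}
    for c in configs:
        n = sum(1 for x in c if x > 0)
        counts[n] = counts.get(n, 0) + 1
    return {n: Fraction(v, len(configs)) for n, v in counts.items()}


def law_PN_formula(N, n):
    """Formula (34): hypergeometric law of P_N."""
    return Fraction(comb(N, n) * comb(N - 1, n - 1), comb(2 * N - 1, N))


def mean_A1(N):
    return Fraction(N * N, 2 * (2 * N - 1))


def var_A1(N):
    """Closed form of m_2 + m_1 - m_1^2 for i = 1 (valid for N >= 3)."""
    return Fraction(N * N * (6 * N * N - 12 * N + 5), 4 * (2 * N - 1) ** 2 * (2 * N - 3))


if __name__ == "__main__":
    for N in (3, 5, 7):
        m1, m2 = falling_moment_brute(N, 1, 1), falling_moment_brute(N, 2, 1)
        print(N, mean_A1(N) == m1, var_A1(N) == m2 + m1 - m1 * m1)
    for N in (10, 1000, 10**6):
        # These ratios approach 1/4 and 3/16 as N grows.
        print(N, float(mean_A1(N)) / N, float(var_A1(N)) / N)
```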
Firstly, note that $\sum_{i=1}^N i\, A_{i,N} = N$ is the total number of balls. Using the multinomial formula, with $\sum_{i=1}^N i\, a_i = N$ and $\sum_{i=1}^N a_i = n$, we get
$$P(A_{1,N} = a_1, .., A_{N,N} = a_N,\ P_N = n) = \frac{\binom{N}{n}}{\binom{2N-1}{N}} \frac{n!}{\prod_{i=1}^N a_i!} \tag{41}$$
and
$$P(A_{1,N} = a_1, .., A_{N,N} = a_N \mid P_N = n) = \frac{n!}{\binom{N-1}{n-1} \prod_{i=1}^N a_i!}. \tag{42}$$
The latter formulae give the joint (Ewens-like) distributions of the repetition count vector.

Let us investigate the marginal distribution of the $A_{i,N}$ conditional on $P_N = n$. Firstly, the law of $A_{0,N} = N - P_N$ is $P(A_{0,N} = a \mid P_N = n) = \delta_{a, N-n}$. Secondly, recalling $A_{i,N} = \sum_{n'=1}^N \mathbf{1}(N_{n',N} = i)$, with $(n)_l = n(n-1)..(n-l+1)$ (and $(n)_0 := 1$), and using the exchangeability of the occupancies of the $n$ filled boxes, the conditional probability generating function of $A_{i,N}$ reads
$$\mathbb{E}\big(z^{A_{i,N}} \mid P_N = n\big) = 1 + \sum_{l \geq 1} \frac{(z-1)^l}{l!} (n)_l\, P(N_{1,N} = i, .., N_{l,N} = i \mid P_N = n).$$
Using $P(N_{1,N} = k_1, .., N_{l,N} = k_l \mid P_N = n) = \binom{N - \sum_{m=1}^l k_m - 1}{n-l-1} \big/ \binom{N-1}{n-1}$, we get the conditional falling factorial moments of $A_{i,N}$ as
$$m_{l,i}(n, N) := \mathbb{E}\big[(A_{i,N})_l \mid P_N = n\big] = (n)_l \binom{N-li-1}{n-l-1} \Big/ \binom{N-1}{n-1}, \tag{43}$$
where $l \in \{1, .., l(i) = (n-1) \wedge [(N-1)/i]\}$. The conditional marginal distribution of $A_{i,N}$ is thus
$$P(A_{i,N} = a_i \mid P_N = n) = \sum_{l=a_i}^{l(i)} \frac{(-1)^{l-a_i}}{l!} \binom{l}{a_i}\, m_{l,i}(n, N). \tag{44}$$
Taking $l = 1$, $\mathbb{E}(A_{i,N} \mid P_N = n) = n\binom{N-i-1}{n-2} \big/ \binom{N-1}{n-1}$. In particular, $\mathbb{E}(A_{1,N} \mid P_N = n) = n(n-1)/(N-1)$ is the mean number of singleton boxes. In the thermodynamical limit $n, N \to \infty$, $n/N \to \rho$, $\mathbb{E}(A_{1,N} \mid P_N = n) \sim \rho n$: a fraction $\rho$ of the $n$ filled boxes is filled with singletons. For the variance, we have $\sigma^2(A_{1,N} \mid P_N = n) \sim \rho(1-\rho)^2 n$. We can also check that a fraction $\rho(1-\rho)$ of the $n$ filled boxes is filled with doubletons, $\mathbb{E}(A_{2,N} \mid P_N = n) \sim \rho(1-\rho)\, n$, and more generally that $\mathbb{E}(A_{i,N} \mid P_N = n) \sim \rho(1-\rho)^{i-1} n$.

Finally, note that when $l(i) = n - 1$ and $N \neq ni$, the probability that $A_{i,N}$ reaches its maximal possible value $l(i)$ is
$$P(A_{i,N} = l(i) \mid P_N = n) = m_{l(i),i}(n, N)/l(i)! = \binom{n}{l(i)} \Big/ \binom{N-1}{n-1}.$$
For example, for $N > n$, $P(A_{1,N} = n-1 \mid P_N = n) = n/\binom{N-1}{n-1}$ is the probability that $n - 1$ boxes are filled by singletons and one by $N - n + 1$ balls, which is obvious.

6. Random Graph Connectivity
The model of the previous sections may be viewed as a clockwise $k$-nearest neighbor graph with $N$ vertices and $kn$ edges. Consider as before $N$ equally spaced points (vertices) on the $N$-circle, so with arc length 1 between consecutive points. Draw at random $n \in \{1, .., N-1\}$ points without replacement at the integer vertices of this circle. Assume $N \leq 2n$ and draw an edge at random from each of the $n$ sampled points, removing each sampled point once it has been paired. At the end of this process, we get a random graph with $N$ vertices and $n$ (out-degree 1) edges whose endpoints are no longer neighbors, being now chosen at random on $\{1, .., N\}$. We wish to estimate the covering probability for this new model in the spirit of Erdős-Rényi random graphs.

Let $B_m$, $m = 1, .., n$, be a sequence of independent (but not id) Bernoulli rvs with success probabilities $p_m = \frac{N-n}{N-(m-1)}$, $m = 1, .., n$. With $[z^k]\,\phi(z)$ the $z^k$-coefficient of $\phi(z)$, the $N$-covering probability is
$$P_{n,N}(\mathrm{cover}) = P\Big(N - n \leq \sum_{m=1}^n B_m \leq n\Big) = \sum_{k=N-n}^n [z^k]\, \mathbb{E}\big(z^{\sum_{m=1}^n B_m}\big), \tag{45}$$
which is just the probability to hit all points of the un-sampled set $\{n+1, .., N\}$ at least once in a uniform pairing without replacement of the $n$-sample. This covering probability is the probability of connectedness of the random graph with $N$ vertices and $n$ out-degree 1 edges. It is of course zero if $N > 2n$. Let $\overline{p}_n = \frac{1}{n}\sum_{m=1}^n p_m$ be the sample mean of the success probabilities. The covering probability can be bounded by
$$P_{n,N}(\mathrm{cover}) \leq \sum_{k=N-n}^n \binom{n}{k} (1 - \overline{p}_n)^k\, \overline{p}_n^{\,n-k}. \tag{46}$$
Assume $n, N \to \infty$ while $n/N \to \rho$, with $\rho \in (1/2, 1)$. Then $\overline{p}_n \to -\frac{1-\rho}{\rho}\log(1-\rho) =: \mu(\rho)$. Clearly $\sigma^2(B_m) < \infty$ and $\sum_{m=1}^n m^{-2}\sigma^2(B_m)$ has a finite limit. By Kolmogorov's strong law of large numbers, $\frac{1}{n}\sum_{m=1}^n B_m \overset{a.s.}{\to} \mu(\rho)$, and so $P_N(\mathrm{cover}) \to 1$ when $\rho > \rho_c := 1 - e^{-1}$, because in this case the probability to estimate is $P\big(\frac{1}{n}\sum_{m=1}^n B_m \geq \frac{1-\rho}{\rho}\big)$.
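The law-of-large-numbers argument above can be illustrated numerically. The Python sketch below (ours; the function names are hypothetical) computes $\mu(\rho)$, checks that $\overline{p}_n$ is close to $\mu(\rho)$ for large $N$, checks that $\mu(\rho_c) = (1-\rho_c)/\rho_c$ at $\rho_c = 1 - e^{-1}$, and estimates (45) by Monte Carlo, sampling the independent Bernoulli variables $B_m$ directly. It only illustrates the threshold behavior on either side of $\rho_c$; it does not evaluate the bound (46).

```python
import math
import random

RHO_C = 1 - math.exp(-1)  # critical density 1 - e^{-1}


def mu(rho):
    """Limit of the mean success probability: -((1 - rho)/rho) log(1 - rho)."""
    return -(1 - rho) / rho * math.log(1 - rho)


def mean_success_prob(N, n):
    """(1/n) sum_{m=1}^n p_m with p_m = (N - n)/(N - (m - 1))."""
    return math.fsum((N - n) / (N - m + 1) for m in range(1, n + 1)) / n


def cover_prob_mc(N, n, trials, rng):
    """Monte Carlo estimate of (45): P(N - n <= B_1 + .. + B_n <= n)."""
    hits = 0
    for _ in range(trials):
        s = sum(rng.random() < (N - n) / (N - m + 1) for m in range(1, n + 1))
        hits += N - n <= s <= n
    return hits / trials


if __name__ == "__main__":
    rng = random.Random(1)
    N = 2000
    for rho in (0.55, 0.6, 0.7, 0.8):
        n = int(rho * N)
        # density, mu(rho), threshold (1 - rho)/rho, estimated covering probability
        print(rho, mu(rho), (1 - rho) / rho, cover_prob_mc(N, n, 100, rng))
```

Covering requires $\frac{1}{n}\sum_m B_m \geq (1-\rho)/\rho$, so the estimate should be close to 1 when $\mu(\rho)$ exceeds this threshold and close to 0 when it does not.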
, with µ ( ρ ) ∈ h − ρρ , i .Whereas, when ρ < − e − , the bound for the covering probability can be estimatedby P n,N (cover) ≤ N Z ρ − ρ (cid:18) N ρN x (cid:19) (1 − µ ( ρ )) Nx µ ( ρ ) N ( ρ − x ) dx ∼ CN Z ρ − ρ e NH ρ ( x ) dx, BOSE-EINSTEIN APPROACH TO THE RANDOM PARTITIONING OF AN INTEGER 15 where H ρ ( x ) = ρ log ρ − x log x − ( ρ − x ) log ( ρ − x )+ x log (1 − µ ( ρ ))+( ρ − x ) log µ ( ρ ) . The function x → H ρ ( x ) is concave and attains its maximum at x = ρ (1 − µ ( ρ )) < − ρ, which is outside the integration interval [1 − ρ, ρ ]. By the saddle point method(47) lim inf n,N →∞ , n/N → ρ − n log P n,N (cover) = F G ( ρ ) := − ρ H ρ (1 − ρ ) > . So only in the low-density range < ρ < − e − is the graph’s connectedness prob-ability exponentially small. Note that the graph large deviation rate function F G ismaximal (minimal) at ρ = 1 / ρ c = 1 − e − ), with F G ( ρ c ) = − − ee − log ( e − > . We conclude that in the random graph approach to the covering problem, in sharpcontrast to the k − nearest neighbor graph, there exists a critical density ρ c = 1 − e − above which covering occurs with probability one. These results illustrateto what extent, when connections are not restricted to neighbors, the chance ofconnectedness is increased. This question was also raised in ([2], p.18) in relationwith Small-World graphs. References [1] Bollob´as B.
Random graphs.
Second edition. Cambridge Studies in Advanced Mathematics,73. Cambridge University Press, Cambridge, 2001.[2] Cannings C. (2006). Modelling protein-protein interactions networks from yeast-2-hybridscreens with random graphs. In Statistics in Genomics and Proteomics, Ed. Urfer A, TurkmanMA. Centro Internacional de Matematica, Coimbra.[3] Darling D.A. (1953) On a class of problems related to the random division of an interval.Ann. Math. Statistics, 24, 239–253.[4] Ewens W. J. (1972) The sampling theory of selectively neutral alleles. Theoret. Pop. Biol., 3,87–112, 1972; erratum, 3, 240, 1972, erratum, 3 , An introduction to probability theory and its applications. . John Wiley and Sons,Second Edition, New York, 1971.[6] Flatto L., Konheim A.G. (1962) The random division of an interval and the random coveringof a circle. SIAM Review, 4, 211-222.[7] Holst L. (1983) A note on random arcs on the circle. Probability and Mathematical Statistics.Essays in honour of Carl–Gustav Esseen. Ed. by Allan Gut and Lars Holst, Uppsala, 40–45.[8] Holst L., H¨usler J. (1984) On the random coverage of the circle. J. Appl. Prob., 21, 558–566.[9] Holst L. (1985) On discrete spacings and the Bose-Einstein distribution. Contributions toProbability and Statistics. Essays in honour of Gunnar Blom. Ed. by Jan Lanke and GeorgLindgren, Lund, 169–177.[10] Huillet T. (2003) Random covering of the circle: the size of the connected components. Adv.in Appl. Probab., 35, no. 3, 563-582.[11] Huillet T. (2003) Random covering of the circle: the configuration-space of the free depositionprocess. J. Phys. A 36, no. 49, 12143-12155.[12] Dunlop F., Huillet, T. (2003) Hard rods: statistics of parking configurations. Phys. A 324,no. 3-4, 698-706[13] Huillet T. (2005) Sampling formulae arising from random Dirichlet populations. Communi-cations in Statistics - Theory and Methods, 34, No 5, 1019-1040.[14] Ivchenko G. I. (1994) On the random covering of a circle: a discrete model. Diskret. Mat. 6,no. 
3, 94-109.[15] Johnson N. L., Kotz S. Urn models and their application. An approach to modern discreteprobability theory.
Wiley Series in Probability and Mathematical Statistics. John Wiley &Sons, New York-London-Sydney, xiii+402 pp., 1977[16] Kolchin V. F., Sevastyanov B. A., Chistyakov V. P.
Random allocations.
Translated fromthe Russian. Translation edited by A. V. Balakrishnan. Scripta Series in Mathematics. V. H.Winston & Sons, Washington, D.C.; distributed by Halsted Press [John Wiley & Sons], NewYork-Toronto, Ont.-London, 1978. [17] Pyke R. (1965) Spacings (With discussion). J. Roy. Statist. Soc. Ser. B, 27, 395–449.[18] Siegel A.F. (1978) Random arcs on the circle. J. Appl. Prob., 15, 774-789.[19] Stevens W.L. (1939) Solution to a geometrical problem in probability. Ann. Eugenics, 9,315-320.[20] Withworth W.A. (1897)
Excercises on choice and chance . Deighton Bell and Co., Cambridge.Republished by Hafner, New York, (1959).. Deighton Bell and Co., Cambridge.Republished by Hafner, New York, (1959).