THE POISSON BINOMIAL DISTRIBUTION – OLD & NEW
WENPIN TANG AND FENGMIN TANG
Abstract.
This is an expository article on the Poisson binomial distribution. We review lesser known results and recent progress on this topic, including geometry of polynomials and distribution learning. We also provide examples to illustrate the use of the Poisson binomial machinery. Some open questions on approximating rational fractions of the Poisson binomial are presented.
Key words: Distribution learning, geometry of polynomials, Poisson binomial distribution, Poisson/normal approximation, optimal transport, stochastic ordering, strongly Rayleigh property.

1. Introduction
The binomial distribution is one of the earliest examples a college student encounters in his/her first course in probability. It is a discrete probability distribution of a sum of independent and identically distributed (i.i.d.) Bernoulli random variables, modeling the number of occurrences of some event in repeated trials. An integer-valued random variable $X$ is called binomial with parameters $(n, p)$, denoted as $X \sim \mathrm{Bin}(n, p)$, if $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$, $0 \le k \le n$. It is well known that if $n$ is large, the $\mathrm{Bin}(n, p)$ distribution is approximated by the Poisson distribution for small $p$'s, and is approximated by the normal distribution for larger values of $p$. See e.g. [77] for an educational tour.

Poisson [81] considered a more general model of independent trials, which allows heterogeneity among these trials. Precisely, an integer-valued random variable $X$ is called Poisson binomial, and denoted as $X \sim \mathrm{PB}(p_1, \ldots, p_n)$, if
\[
X \stackrel{(d)}{=} \xi_1 + \cdots + \xi_n,
\]
where $\xi_1, \ldots, \xi_n$ are independent Bernoulli random variables with parameters $p_1, \ldots, p_n$. It is easily seen that the probability distribution of $X$ is
\[
P(X = k) = \sum_{A \subseteq [n],\, |A| = k} \Big( \prod_{i \in A} p_i \prod_{i \notin A} (1 - p_i) \Big), \tag{1.1}
\]
where the sum ranges over all subsets of $[n] := \{1, \ldots, n\}$ of size $k$.

The Poisson binomial distribution has a variety of applications such as reliability analysis [16, 57], survey sampling [29, 104], finance [40, 92], and engineering [44, 100]. Though this topic has been studied for a long time, the literature is scattered. For instance, the Poisson binomial distribution has different names in various contexts: Pólya frequency (PF) distribution, strongly Rayleigh distribution, convolutions of heterogeneous Bernoulli, etc. Researchers often work on some aspects of this subject, and ignore its connections to other fields.

Date: August 28, 2019.
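For small $n$, the sum (1.1) can be evaluated without enumerating subsets by convolving the Bernoulli factors one at a time. The following sketch illustrates this; the function name is ours.

```python
def pb_pmf(probs):
    """PMF of PB(p_1, ..., p_n), computed by convolving one Bernoulli
    factor at a time; equivalent to (1.1) but in O(n^2) time."""
    pmf = [1.0]                            # distribution of the empty sum
    for p in probs:
        new = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            new[k] += mass * (1 - p)       # current trial fails
            new[k + 1] += mass * p         # current trial succeeds
        pmf = new
    return pmf

# Equal p_i's recover the binomial distribution Bin(n, p).
from math import comb
n, p = 5, 0.3
assert all(abs(m - comb(n, k) * p**k * (1 - p)**(n - k)) < 1e-12
           for k, m in enumerate(pb_pmf([p] * n)))
```

With heterogeneous $p_i$'s the same routine gives the exact distribution (1.1).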
2. Distributional properties of Poisson binomial variables
In this section, we review a few distributional properties of the Poisson binomial distribution. For $X \sim \mathrm{PB}(p_1, \ldots, p_n)$, we have
\[
\mu := E X = n\bar{p} \quad \text{and} \quad \sigma^2 := \operatorname{Var} X = n\bar{p}(1 - \bar{p}) - \sum_{i=1}^n (p_i - \bar{p})^2, \tag{2.1}
\]
where $\bar{p} := \sum_{i=1}^n p_i / n$. It is easily seen that, keeping $EX$ (or $\bar{p}$) fixed, the variance of $X$ increases as the set of probabilities $\{p_1, \ldots, p_n\}$ gets more homogeneous, and is maximized at $p_1 = \cdots = p_n$. There is a simple interpretation in survey sampling: taking samples from different communities (stratified sampling) is better than taking from the same group (simple random sampling).

The above observation motivates the study of stochastic orderings for the Poisson binomial distribution. The first result of this kind is due to Hoeffding [53], claiming that among all Poisson binomial distributions with a given mean, the binomial distribution is the most spread-out.

Theorem 2.1. [53] (Hoeffding's inequalities) Let $X \sim \mathrm{PB}(p_1, \ldots, p_n)$, and $\bar{X} \sim \mathrm{Bin}(n, \bar{p})$.
(1) There are inequalities
\[
P(X \le k) \le P(\bar{X} \le k) \ \text{ for } 0 \le k \le n\bar{p} - 1, \quad \text{and} \quad P(X \le k) \ge P(\bar{X} \le k) \ \text{ for } n\bar{p} \le k \le n.
\]
(2) For any convex function $g : [n] \to \mathbb{R}$, in the sense that $g(k+2) - 2g(k+1) + g(k) > 0$ for $0 \le k \le n-2$, we have $E g(X) \le E g(\bar{X})$, where the equality holds if and only if $p_1 = \cdots = p_n = \bar{p}$.

Part (2) of Theorem 2.1 indicates that among all Poisson binomial distributions with a given mean, the binomial is the largest one in convex order. This result was extended to the multidimensional
setting [9], and to non-negative random variables [8, Proposition 3.2]. See also [68] for interpretations. Next we give several applications of Hoeffding's inequalities.
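Before turning to the examples, Theorem 2.1(1) is easy to check numerically. The sketch below (helper names are ours) compares the Poisson binomial CDF with the CDF of $\mathrm{Bin}(n, \bar{p})$ for a heterogeneous choice of the $p_i$'s.

```python
from itertools import accumulate
from math import ceil, comb, floor

def pb_pmf(probs):
    """Exact PMF of PB(p_1, ..., p_n) via sequential convolution."""
    pmf = [1.0]
    for p in probs:
        new = [0.0] * (len(pmf) + 1)
        for k, m in enumerate(pmf):
            new[k] += m * (1 - p)
            new[k + 1] += m * p
        pmf = new
    return pmf

probs = [0.1, 0.3, 0.5, 0.7, 0.9]                  # heterogeneous trials
n, pbar = len(probs), sum(probs) / len(probs)      # here n * pbar = 2.5
cdf_X = list(accumulate(pb_pmf(probs)))
cdf_B = list(accumulate(comb(n, k) * pbar**k * (1 - pbar)**(n - k)
                        for k in range(n + 1)))

# Hoeffding: the PB CDF lies below the binomial CDF for k <= n*pbar - 1,
# and above it for k >= n*pbar (small tolerance for floating point).
assert all(cdf_X[k] <= cdf_B[k] + 1e-12 for k in range(floor(n * pbar - 1) + 1))
assert all(cdf_X[k] >= cdf_B[k] - 1e-12 for k in range(ceil(n * pbar), n + 1))
```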
Examples 2.2. (1) Monotonicity of binomials. Fix $\lambda > 0$. By taking $(p_1, \ldots, p_n) = (0, \frac{\lambda}{n-1}, \ldots, \frac{\lambda}{n-1})$, we get for $X \sim \mathrm{Bin}(n-1, \frac{\lambda}{n-1})$ and $X' \sim \mathrm{Bin}(n, \frac{\lambda}{n})$,
\[
P(X \le k) < P(X' \le k) \ \text{ for } k \le \lambda - 1, \qquad P(X \le k) > P(X' \le k) \ \text{ for } k \ge \lambda.
\]
Similarly, by taking $(p_1, \ldots, p_n) = (1, \frac{\lambda-1}{n-1}, \ldots, \frac{\lambda-1}{n-1})$, we get for $X \sim \mathrm{Bin}(n-1, \frac{\lambda-1}{n-1})$ and $X' \sim \mathrm{Bin}(n, \frac{\lambda}{n})$,
\[
P(X \le k-1) < P(X' \le k) \ \text{ for } k \le \lambda - 1, \qquad P(X \le k-1) > P(X' \le k) \ \text{ for } k \ge \lambda.
\]
These inequalities were used in [3] to derive the monotonicity of error in approximating the binomial distribution by a Poisson distribution. By letting $X \sim \mathrm{Bin}(n, p)$ and $Y \sim \mathrm{Poi}(np)$, they proved that $P(Y \le k) - P(X \le k)$ is positive if $k \le n^2 p/(n+1)$ and is negative if $k \ge np$. The result quantifies the error of confidence levels in hypothesis testing when approximating the binomial distribution by a Poisson distribution.

(2) Darroch's rule. It is well known that a Poisson binomial variable has either one, or two consecutive, modes. By an argument in the proof of Hoeffding's inequalities, Darroch [32, Theorem 4] showed that the mode $m$ of the Poisson binomial distribution differs from its mean $\mu$ by at most 1. Precisely, he proved that
\[
m = \begin{cases} k & \text{if } k \le \mu < k + \frac{1}{k+2}, \\ k \text{ or } k+1 & \text{if } k + \frac{1}{k+2} \le \mu \le k+1 - \frac{1}{n-k+1}, \\ k+1 & \text{if } k+1 - \frac{1}{n-k+1} < \mu \le k+1. \end{cases} \tag{2.2}
\]
This result was reproved in [91]. See also [60] for a similar result concerning the median.

(3) Azuma–Hoeffding inequality. By the Azuma–Hoeffding inequality [5, 54], for $\xi_1, \ldots, \xi_n$ independent random variables such that $0 \le \xi_i \le 1$,
\[
P\Big( \sum_{i=1}^n \xi_i \ge t \Big) \le \Big(\frac{\mu}{t}\Big)^t \Big(\frac{n-\mu}{n-t}\Big)^{n-t} \quad \text{for } t > \mu, \tag{2.3}
\]
where $\mu := \sum_{i=1}^n E\xi_i$. Now we show how to derive a version of (2.3) via a Poisson binomial trick. Given $\xi_1, \ldots, \xi_n$, let $b_i$ be independent Bernoulli with parameters $\xi_i$, and $X \sim \mathrm{Bin}\big(n, \frac{1}{n}\sum_{i=1}^n \xi_i\big)$.
We have
\[
P\Big( \sum_{i=1}^n b_i \ge t \,\Big|\, \sum_{i=1}^n \xi_i \ge t \Big) \le \frac{P(\sum_{i=1}^n b_i \ge t)}{P(\sum_{i=1}^n \xi_i \ge t)}. \tag{2.4}
\]
Given $\sum_{i=1}^n \xi_i \ge t$, $\sum_{i=1}^n b_i$ is Poisson binomial with mean greater than $t$. According to Hoeffding's inequality,
\[
P\Big( \sum_{i=1}^n b_i \ge t \,\Big|\, \sum_{i=1}^n \xi_i \ge t \Big) \ge P\Big( X \ge t \,\Big|\, \sum_{i=1}^n \xi_i \ge t \Big) \ge c, \tag{2.5}
\]
for some universal constant $c > 0$. Combining (2.4) and (2.5) yields $P(\sum_{i=1}^n \xi_i \ge t) \le c^{-1} P(\sum_{i=1}^n b_i \ge t)$. Note that $\sum_{i=1}^n b_i$ is Poisson binomial with mean $\mu$. Applying
Hoeffding's inequality to $\sum_{i=1}^n b_i$, with bounds for binomial tails [71], we get
\[
P\Big( \sum_{i=1}^n b_i \ge t \Big) \le \Big(\frac{\mu}{t}\Big)^t \Big(\frac{n-\mu}{n-t}\Big)^{n-t} \quad \text{for } t \ge \mu + 1. \tag{2.6}
\]
As a consequence, $P(\sum_{i=1}^n \xi_i \ge t) \le c^{-1} (\frac{\mu}{t})^t (\frac{n-\mu}{n-t})^{n-t}$, which achieves the same rate as in (2.3) up to a constant factor.

The original proof of Theorem 2.1 was brute-force, and it was soon generalized by using the ideas of majorization and Schur convexity. To proceed further, we need some vocabulary. Let $\{x_{(1)}, \ldots, x_{(n)}\}$ be the order statistics of $\{x_1, \ldots, x_n\}$.

Definition 2.3. The vector $\boldsymbol{x}$ is said to majorize the vector $\boldsymbol{y}$, denoted as $\boldsymbol{x} \succeq \boldsymbol{y}$, if
\[
\sum_{i=1}^k x_{(i)} \le \sum_{i=1}^k y_{(i)} \ \text{ for } k \le n-1, \quad \text{and} \quad \sum_{i=1}^n x_{(i)} = \sum_{i=1}^n y_{(i)}.
\]
See [66] for background and development on the theory of majorization and its applications. The following theorem gives a few lesser known variants of Hoeffding's inequalities.
Theorem 2.4. Let $X \sim \mathrm{PB}(p_1, \ldots, p_n)$, $X' \sim \mathrm{PB}(p'_1, \ldots, p'_n)$ and $Y \sim \mathrm{Bin}(n, p)$.
(1) [48, 104] If $(p_1, \ldots, p_n) \succeq (p'_1, \ldots, p'_n)$, then
\[
P(X \le k) \le P(X' \le k) \ \text{ for } 0 \le k \le n\bar{p} - 2, \quad \text{and} \quad P(X \le k) \ge P(X' \le k) \ \text{ for } n\bar{p} + 2 \le k \le n.
\]
Moreover, $\operatorname{Var}(X) \le \operatorname{Var}(X')$.
(2) [80] If $(-\log p_1, \ldots, -\log p_n) \succeq (-\log p'_1, \ldots, -\log p'_n)$, then $X$ is stochastically larger than $X'$, i.e. $P(X \ge k) \ge P(X' \ge k)$ for all $k$.
(3) [17] $X$ is stochastically larger than $Y$ if and only if $p \le (\prod_{i=1}^n p_i)^{1/n}$, and $X$ is stochastically smaller than $Y$ if and only if $p \ge 1 - (\prod_{i=1}^n (1-p_i))^{1/n}$. Consequently, if $(\prod_{i=1}^n p_i)^{1/n} \ge 1 - (\prod_{i=1}^n (1-p'_i))^{1/n}$, then $X$ is stochastically larger than $X'$.

The proof of Theorem 2.4 relies on the fact that $\boldsymbol{x} \succeq \boldsymbol{y}$ implies the components of $\boldsymbol{x}$ are more spread-out than those of $\boldsymbol{y}$. For example, part (1) boils down to proving that if $k \le n\bar{p} - 2$, then $P(X \le k)$ is a Schur concave function in $\boldsymbol{p}$, meaning its value increases as the components of $\boldsymbol{p}$ become less dispersed. Part (3) gives a sufficient condition for stochastic orderings of the Poisson binomial distribution. A simple necessary and sufficient condition remains open. See also [15, 16, 18, 52, 93, 107] for further results.

3. Approximation of Poisson binomial distributions
In this section, we discuss various approximations of the Poisson binomial distribution. Pitman [78, Section 2] gave an excellent survey on this topic in the mid-90's. We complement the discussion with recent developments. In the sequel, $\mathcal{L}(X)$ denotes the distribution of a random variable $X$.
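The approximation errors discussed in this section can be examined numerically once the exact PMF is available (e.g. by sequentially convolving the Bernoulli factors). The sketch below, with function names of our own, computes the total variation distance between $\mathcal{L}(X)$ and the Poisson law with the same mean.

```python
from math import exp

def pb_pmf(probs):
    """Exact PMF of PB(p_1, ..., p_n) via sequential convolution."""
    pmf = [1.0]
    for p in probs:
        new = [0.0] * (len(pmf) + 1)
        for k, m in enumerate(pmf):
            new[k] += m * (1 - p)
            new[k + 1] += m * p
        pmf = new
    return pmf

def tv_to_poisson(probs, tail=50):
    """d_TV(L(X), Poi(mu)) = (1/2) sum_k |P(X=k) - e^{-mu} mu^k / k!|,
    with the Poisson tail truncated (negligible for the ranges used here)."""
    mu = sum(probs)
    pb = pb_pmf(probs)
    m = len(pb) + tail
    poi = [exp(-mu)]
    for k in range(1, m):
        poi.append(poi[-1] * mu / k)       # stable Poisson recursion
    pb = pb + [0.0] * (m - len(pb))
    return 0.5 * sum(abs(a - b) for a, b in zip(pb, poi))

print(tv_to_poisson([0.01, 0.02, 0.03]))   # small p_i: Poisson is accurate
print(tv_to_poisson([0.5] * 20))           # p_i bounded away from 0: poor
```

The first distance is at most of order $\sum_i p_i^2$, while the second does not vanish as $n$ grows, matching the dichotomy discussed below.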
Poisson approximation. Le Cam [64] gave the first error bound for the Poisson approximation of the Poisson binomial distribution. The following theorem is an improvement of Le Cam's bound.
Theorem 3.1. [7]
Let $X \sim \mathrm{PB}(p_1, \ldots, p_n)$ and $\mu := \sum_{i=1}^n p_i$. Then
\[
\frac{1}{32} \min\Big(1, \frac{1}{\mu}\Big) \sum_{i=1}^n p_i^2 \;\le\; d_{TV}(\mathcal{L}(X), \mathrm{Poi}(\mu)) \;\le\; \frac{1 - e^{-\mu}}{\mu} \sum_{i=1}^n p_i^2, \tag{3.1}
\]
where $d_{TV}(\cdot,\cdot)$ is the total variation distance.

It is easily seen from (3.1) that the Poisson approximation of the Poisson binomial is good if $\sum_{i=1}^n p_i^2 \ll \sum_{i=1}^n p_i$, or equivalently $\mu - \sigma^2 \ll \mu$. There are two cases:
• For small $\mu$, the upper bound in (3.1) is sharp.
• For large $\mu$, the approximation error is of order $\sum_{i=1}^n p_i^2 / \sum_{i=1}^n p_i$.
As pointed out in [59], the constant $1/32$ in the lower bound can be improved to $1/14$. See [6] for a book-length treatment, and [86] for sharp bounds. A powerful tool to study the approximation of the sum of (possibly dependent) random variables is Stein's method of exchangeable pairs, see [26]. For instance, a simple proof of the upper bound in (3.1) was given in [26, Section 3] via the Stein machinery.

The Poisson approximation can be viewed as a mean-matching procedure. The failure of the Poisson approximation is due to a lack of control of the variance. A typical example is where all $p_i$'s are bounded away from 0, so that $\mu$ is large and $\sum_{i=1}^n p_i^2 / \sum_{i=1}^n p_i$ is of constant order. To deal with these cases, Röllin [85] proposed a mean/variance-matching procedure. To present further results, we need the following definition.

Definition 3.2.
An integer-valued random variable $X$ is said to be translated Poisson distributed with parameters $(\mu, \sigma^2)$, denoted as $\mathrm{TP}(\mu, \sigma^2)$, if
\[
X - \lfloor \mu - \sigma^2 \rfloor \sim \mathrm{Poi}(\sigma^2 + \{\mu - \sigma^2\}),
\]
where $\{\cdot\}$ is the fractional part of a positive number.

It is easy to see that a $\mathrm{TP}(\mu, \sigma^2)$ random variable has mean $\mu$, and variance $\sigma^2 + \{\mu - \sigma^2\}$, which is between $\sigma^2$ and $\sigma^2 + 1$. The following theorem gives an upper bound in total variation between a Poisson binomial variable and its translated Poisson approximation.

Theorem 3.3. [85]
Let $X \sim \mathrm{PB}(p_1, \ldots, p_n)$, and $\mu := \sum_{i=1}^n p_i$ and $\sigma^2 := \sum_{i=1}^n p_i(1-p_i)$. Then
\[
d_{TV}(\mathcal{L}(X), \mathrm{TP}(\mu, \sigma^2)) \le \frac{\sqrt{\sum_{i=1}^n p_i^3(1-p_i)} + 2}{\sigma^2}, \tag{3.2}
\]
where $d_{TV}(\cdot,\cdot)$ is the total variation distance.

Note that if all $p_i$'s are bounded away from 0 and 1, the approximation error is of order $1/\sqrt{n}$, which is optimal. See [70] for the most up-to-date results on the Poisson approximation. Now we give an application of the translated Poisson approximation in observational studies.

Example 3.4.
Sensitivity analysis.
In matched-pair observational studies, a sensitivity analysis assesses the sensitivity of results to hidden bias. Here we follow a modern approach of Rosenbaum [88, Chapter 4]. Precisely, the sample consists of $n$ matched pairs, and units in each pair are indexed by $i = 1, 2$. Each pair $k = 1, \ldots, n$ is matched on a set of observed covariates $\boldsymbol{x}_{k1} = \boldsymbol{x}_{k2}$, and only one unit in each pair receives the treatment. Let $Z_{ki}$ be the treatment assignment, so $Z_{k1} + Z_{k2} = 1$. Common test statistics for matched pairs are sign-score statistics of the form $T = \sum_{k=1}^n d_k (c_{k1} Z_{k1} + c_{k2} Z_{k2})$, where $d_k \ge 0$ and $c_{ki} \in \{0, 1\}$. For simplicity, we take $d_k = 1$, and the statistics of interest are
\[
T = \sum_{k=1}^n (c_{k1} Z_{k1} + c_{k2} Z_{k2}), \tag{3.3}
\]
where $c_{k1} Z_{k1} + c_{k2} Z_{k2}$ is Bernoulli distributed with parameter $p_k := c_{k1}\pi_k + c_{k2}(1 - \pi_k)$, with $\pi_k := P(Z_{k1} = 1 \mid Z_{k1} + Z_{k2} = 1)$. So $T \sim \mathrm{PB}(p_1, \ldots, p_n)$. For $1 \le k \le n$, let $\Gamma_k := \pi_k/(1 - \pi_k)$, which equals 1 if there is no hidden bias.

The goal is to make inference on $T$ with different choices of $(\pi_1, \ldots, \pi_n)$, and to understand which choices explain away the conclusion we draw from the null hypothesis (i.e. there is no hidden bias). Thus, we are interested in the set
\[
R(t, \alpha) := \{(\pi_1, \ldots, \pi_n) : P(T \ge t) \le \alpha\},
\]
on the boundary of which the conclusion assuming no hidden bias is overturned. However, direct computation of $R(t, \alpha)$ seems hard. A routine way to solve this problem is to approximate $R(t, \alpha)$ by a regular shape. To this end, we consider the following optimization problem:
\[
\max \Gamma, \quad \text{s.t.} \ \max_{\boldsymbol{\pi} \in C_\Gamma} P(T(\pi_1, \ldots, \pi_n) \ge t) \le \alpha, \tag{3.4}
\]
where $C_\Gamma$ is a constraint region. For instance, $C_\Gamma := \{\boldsymbol{\pi} : \frac{1}{1+\Gamma} \le \pi_k \le \frac{\Gamma}{1+\Gamma} \text{ for all } k\}$ corresponds to the worst-case sensitivity analysis. By the translated Poisson approximation, the quantity $\max_{\boldsymbol{\pi} \in C_\Gamma} P(T(\pi_1, \ldots, \pi_n) \ge t)$ can be evaluated via the following problem, which is easy to solve:
\[
\min_{A \in \{0, \ldots, K\}} \min_{\boldsymbol{\pi} \in C_\Gamma} \sum_{k=0}^{K} \frac{\lambda^k e^{-\lambda}}{k!} \quad \text{s.t.} \ K = t - A, \ \lambda = \sum_{k=1}^n p_k - A, \ A \le \sum_{k=1}^n p_k < A + 1. \tag{3.5}
\]

Normal approximation. The normal approximation of the Poisson binomial distribution follows from the Lyapunov or Lindeberg central limit theorem, see e.g.
[11, Section 27]. Berry and Esseen independently discovered an error bound, in terms of the cumulative distribution function, for the normal approximation of the sum of independent random variables. Subsequent improvements were obtained by [72, 75, 94, 102] via Fourier analysis, and by [27, 28, 67, 101] via Stein's method.

Let $\varphi(x) := \frac{1}{\sqrt{2\pi}} \exp(-x^2/2)$ be the probability density function of the standard normal, and $\Phi(x) := \int_{-\infty}^x \varphi(y)\,dy$ be its cumulative distribution function. The following theorem provides uniform bounds for the normal approximation of Poisson binomial variables.

Theorem 3.5. Let $X \sim \mathrm{PB}(p_1, \ldots, p_n)$, and $\mu := \sum_{i=1}^n p_i$ and $\sigma^2 := \sum_{i=1}^n p_i(1-p_i)$.
(1) [79, Theorem 11.2] There is a universal constant $C > 0$ such that
\[
\max_{0 \le k \le n} \Big| P(X = k) - \frac{1}{\sigma}\varphi\Big(\frac{k-\mu}{\sigma}\Big) \Big| \le \frac{C}{\sigma^2}. \tag{3.6}
\]
(2) [94] We have
\[
\max_{0 \le k \le n} \Big| P(X \le k) - \Phi\Big(\frac{k-\mu}{\sigma}\Big) \Big| \le \frac{C}{\sigma}, \tag{3.7}
\]
for an explicit numerical constant $C$.

Other than the uniform bounds (3.6)–(3.7), several authors [14, 49, 84] studied error bounds for the normal approximation in other metrics. For $\mu$, $\nu$ two probability measures, consider
• the $L^p$ metric
\[
d_p(\mu, \nu) := \Big( \int_{-\infty}^{\infty} \big| \mu(-\infty, x] - \nu(-\infty, x] \big|^p \, dx \Big)^{1/p},
\]
• the Wasserstein-$p$ metric
\[
W_p(\mu, \nu) := \inf_{\pi} \Big( \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} |x - y|^p \, \pi(dx\,dy) \Big)^{1/p},
\]
where the infimum runs over all probability measures $\pi$ on $\mathbb{R} \times \mathbb{R}$ with marginals $\mu$ and $\nu$. Specializing these bounds to the Poisson binomial distribution, we get the following result.

Theorem 3.6.
Let $X \sim \mathrm{PB}(p_1, \ldots, p_n)$, and $\mu := \sum_{i=1}^n p_i$ and $\sigma^2 := \sum_{i=1}^n p_i(1-p_i)$.
(1) [76, Chapter V] There exists a universal constant $C > 0$ such that
\[
d_p(\mathcal{L}(X), \mathcal{N}(\mu, \sigma^2)) \le \frac{C}{\sigma} \quad \text{for all } p \ge 1. \tag{3.8}
\]
(2) [14, 84] For each $p \ge 1$, there exists a constant $C_p > 0$ such that
\[
W_p(\mathcal{L}(X), \mathcal{N}(\mu, \sigma^2)) \le \frac{C_p}{\sigma}. \tag{3.9}
\]

Goldstein [49] proved the $L^p$ bound (3.8) for $p = 1$ with $C = 1$. The general case follows from the inequality $d_p(\mu, \nu)^p \le d_\infty(\mu, \nu)^{p-1} d_1(\mu, \nu)$, together with Goldstein's $L^1$ bound and the uniform bound (3.7). By the Kantorovich–Rubinstein duality, $d_1(\mu, \nu) = W_1(\mu, \nu)$. So the bound (3.9) holds for $p = 1$ with $C_1 = 1$. For general $p$, the bound (3.9) is a consequence of the fact that for $Z = \sum_{i=1}^n \xi_i$ with $\xi_i$'s independent, $E\xi_i = 0$ and $\sum_{i=1}^n \operatorname{Var}(\xi_i) = 1$,
\[
W_p(\mathcal{L}(Z), \mathcal{N}(0,1)) \le C_p \Big( \sum_{i=1}^n E|\xi_i|^{p+2} \Big)^{1/p}.
\]
This result was proved in [84] for $1 \le p \le 2$, and generalized to all $p \ge 1$ in [14].

Binomial approximation. The binomial approximation of the Poisson binomial is lesser known. The first result of this kind is due to Ehm [41], who proved that for $X \sim \mathrm{PB}(p_1, \ldots, p_n)$,
\[
d_{TV}(\mathcal{L}(X), \mathrm{Bin}(n, \mu/n)) \le \frac{1 - (\mu/n)^{n+1} - (1 - \mu/n)^{n+1}}{(n+1)(1 - \mu/n)\,\mu/n} \sum_{i=1}^n (p_i - \mu/n)^2. \tag{3.10}
\]
Ehm's approach was extended to a Krawtchouk expansion in [87]. The advantage of the binomial approximation over the Poisson approximation is justified by the following result due to Choi and Xia [31].
Theorem 3.7.
Let $X \sim \mathrm{PB}(p_1, \ldots, p_n)$, and $\mu := \sum_{i=1}^n p_i$. For $m \ge n$, let $d_m := d_{TV}(\mathcal{L}(X), \mathrm{Bin}(m, \mu/m))$. Then for $m$ sufficiently large,
\[
d_m < d_{m+1} < \cdots < d_{TV}(\mathcal{L}(X), \mathrm{Poi}(\mu)). \tag{3.11}
\]
See also [6, 73] for multi-parameter binomial approximations, and [95] for the Pólya approximation of the Poisson binomial distribution.

4. Poisson binomial distributions, polynomials with nonnegative coefficients and optimal transport
In this section, we discuss aspects of the Poisson binomial distribution related to polynomials with nonnegative coefficients. For $X \sim \mathrm{PB}(p_1, \ldots, p_n)$, the probability generating function (PGF) of $X$ is
\[
f(u) := E u^X = \prod_{i=1}^n (p_i u + 1 - p_i). \tag{4.1}
\]
It is easy to see that $f$ is a polynomial with all nonnegative coefficients, and all of its roots are real and nonpositive. The story starts with the following remarkable theorem, due to Aissen, Edrei, Schoenberg and Whitney [1, 2].

Theorem 4.1. [1, 2] Let $(a_0, \ldots, a_n)$ be a sequence of nonnegative real numbers, with associated generating polynomial $f(z) := \sum_{i=0}^n a_i z^i$. The following conditions are equivalent:
(1) The polynomial $f(z)$ has only real roots.
(2) The sequence $(a_0/f(1), \ldots, a_n/f(1))$ is the probability distribution of a $\mathrm{PB}(p_1, \ldots, p_n)$ distribution for some $p_i$. The real roots of $f(z)$ are $-(1-p_i)/p_i$ for $i$ with $p_i > 0$.
(3) The sequence $(a_0, \ldots, a_n)$ is a Pólya frequency (PF) sequence, i.e. the Toeplitz matrix $(a_{j-i})_{i,j}$ is totally nonnegative.

See [4] for background on total positivity. From a computational aspect, condition (3) amounts to checking a system of polynomial inequalities in the $a_i$'s.

Corollary 4.2.
A random variable $X \sim \mathrm{PB}(p_1, \ldots, p_n)$ for some $p_i$ if and only if $X$ is strongly Rayleigh on $\{0, \ldots, n\}$.

In the sequel, we use the terminologies `Poisson binomial' and `strongly Rayleigh' interchangeably. Call a polynomial $f(z) = \sum_{i=0}^n a_i z^i$ with $a_i \ge 0$ strongly Rayleigh if it has only real roots.
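The correspondence in Theorem 4.1(2) can be tested numerically: from the real roots $r_i$ of $f$ one recovers the Bernoulli parameters via $p_i = 1/(1 - r_i)$. A small sketch, assuming numpy for root finding (function name is ours):

```python
import numpy as np

def pb_params_from_poly(a):
    """Given nonnegative coefficients a_0, ..., a_n of a real-rooted f(z),
    return the p_i such that a_k / f(1) = P(PB(p_1, ..., p_n) = k)."""
    roots = np.roots(a[::-1])                 # np.roots expects highest degree first
    assert np.max(np.abs(roots.imag)) < 1e-8, "f is not real-rooted"
    return sorted(1.0 / (1.0 - roots.real))   # root -(1-p)/p  <->  p = 1/(1-root)

# (0.8 + 0.2z)(0.5 + 0.5z)(0.2 + 0.8z) encodes PB(0.2, 0.5, 0.8):
print(pb_params_from_poly([0.08, 0.42, 0.42, 0.08]))  # approximately [0.2, 0.5, 0.8]
```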
For $n \ge 5$, it is hopeless to get any `simple' necessary and sufficient condition for $f$ to be strongly Rayleigh, due to Abel's impossibility theorem. A necessary condition for $f$ to be strongly Rayleigh is Newton's inequality:
\[
a_i^2 \ge a_{i-1} a_{i+1} \Big(1 + \frac{1}{i}\Big)\Big(1 + \frac{1}{n-i}\Big), \quad 1 \le i \le n-1. \tag{4.2}
\]
A sequence $(a_i;\, 0 \le i \le n)$ satisfying (4.2) is also said to be ultra-logconcave [74]. Consequently, $(a_i;\, 0 \le i \le n)$ is logconcave and unimodal. A lesser known sufficient condition is given in [58, 63]:
\[
a_i^2 > 4\, a_{i-1} a_{i+1}, \quad 1 \le i \le n-1. \tag{4.3}
\]
See also [50, 61] for various generalizations. As observed in [62], the constant 4 in (4.3) cannot be improved uniformly: the sequence $(m_i;\, i \ge 1)$ defined by $m_i := \inf\big\{ \frac{a_i^2}{a_{i-1}a_{i+1}};\ f \text{ is strongly Rayleigh} \big\}$ decreases from $m_1 = 4$ to a limit approximately $3.23$.

If $X$ is a strongly Rayleigh, or Poisson binomial, random variable, how well can one approximate $jX/k$ by a strongly Rayleigh variable, for each $j, k \ge 1$? The case $j = 1$ was solved in [47].

Theorem 4.3. [47]
Let $X$ be a strongly Rayleigh random variable. Then $\lfloor X/k \rfloor$ is strongly Rayleigh for each $k \ge 1$, where $\lfloor x \rfloor$ is the integer part of $x$.

The key to the proof of Theorem 4.3 is [47, Theorem 4.3]: for $f$ a polynomial of degree $n$ and $k \ge 1$, write $f(z) = \sum_{j=0}^{k-1} z^j g_j(z^k)$, with $g_j$ a polynomial of degree $\lfloor \frac{n-j}{k} \rfloor$. The theorem asserts that if $f$ is strongly Rayleigh, then so are the $g_j$'s, with interlacing roots. In fact, the real-rootedness follows from the fact that
\[
(a_n;\, n \ge 0) \text{ is a Pólya frequency sequence} \implies (a_{kn+j};\, n \ge 0) \text{ is a Pólya frequency sequence},
\]
for each $k \ge 1$ and $0 \le j < k$. This result is well known, see [1, Theorem 7] or [22, Theorem 3.5.4]. But the root interlacing seems less obvious from Pólya frequency sequences alone.

A natural question is whether $\lfloor jX/k \rfloor$ is strongly Rayleigh for each $j, k \ge$
1. It turns out that $\lfloor 2X/3 \rfloor$ can be far away from being strongly Rayleigh. In fact, one can prove the following theorem.

Theorem 4.4. Let $X \sim \mathrm{Bin}(3n, 1/2)$, and let $z_i$ be the roots of the probability generating function of $\lfloor 2X/3 \rfloor$. Then
\[
\max_i \{\Im(z_i)\} \ge \sqrt{\frac{n-1}{n-2}}, \tag{4.4}
\]
where $\Im(z)$ is the imaginary part of $z$.

The reason why some roots of the PGF of $\lfloor 2X/3 \rfloor$ have large positive imaginary parts is the unbalanced allocation of probability weights to even and odd numbers: $P(\lfloor 2X/3 \rfloor = 2k) = \binom{3n+1}{3k+1} 2^{-3n}$, while $P(\lfloor 2X/3 \rfloor = 2k+1) = \binom{3n}{3k+2} 2^{-3n}$. So Newton's inequality (4.2) is not satisfied.

Optimal transport. For simplicity, we consider $X \sim \mathrm{Bin}(3n, 1/2)$. The goal is to find $Y$ which is strongly Rayleigh on $\{0, 1, \ldots, 2n\}$ such that $\sup |Y - 2X/3|$ is as small as possible. Now we provide a formulation of this problem via optimal transport. For $\mu$, $\nu$ two probability measures, define
\[
W_\infty(\mu, \nu) := \inf_{\gamma \in \pi(\mu, \nu)} \{ \gamma\text{-}\operatorname{ess\,sup} |x - y| \}, \tag{4.5}
\]
where $\pi(\mu, \nu)$ is the set of couplings of $\mu$ and $\nu$. The metric $W_\infty(\cdot,\cdot)$ is known as the $\infty$-Wasserstein distance, see [103]. A coupling $\gamma$ which achieves the infimum in (4.5) is called an optimal transference plan. By abuse of notation, write $W_\infty(X, Y)$ for $X \sim \mu$, $Y \sim \nu$. We want to solve the following optimization problem:
\[
\mathrm{Acc}\Big(\frac{2X}{3}\Big) := \inf\Big\{ W_\infty\Big(\frac{2X}{3}, Y\Big);\ Y \text{ is strongly Rayleigh on } \{0, 1, \ldots, 2n\} \Big\}. \tag{4.6}
\]
Here $\mathrm{Acc}(2X/3)$ stands for the accuracy of strongly Rayleigh approximations to $2X/3$: the smaller the value of $\mathrm{Acc}(2X/3)$ is, the better the approximation. In [47], it was conjectured that $\mathrm{Acc}(2X/3) = O(1)$. The problem (4.6) can be divided into two stages:
(1) Given the distribution of $Y$, find an optimal transference plan $Y = \phi(2X/3)$ with possibly random $\phi$. This is the Monge(–Kantorovich) problem.
(2) Find $Y$ among all strongly Rayleigh distributions on $\{0, 1, \ldots, 2n\}$ which achieves the infimum of $W_\infty(2X/3, Y)$.
It might be difficult to solve the problem (4.6) explicitly, but one can obtain a good upper bound by constructing a suitable transference plan. For example, the transference plan below shows that for $X \sim \mathrm{Bin}(9, 1/2)$, there is $Y \sim \mathrm{Bin}(6, 1/2)$ with $W_\infty(2X/3, Y) \le 1$. This implies that $\mathrm{Acc}(2X/3) \le 1$ for $X \sim \mathrm{Bin}(9, 1/2)$. Similar transference plans can be constructed for small $n$'s.

Figure 1. A transference plan from $\mathrm{Bin}(9, 1/2)$ to $\mathrm{Bin}(6, 1/2)$.

One may then wonder whether a $\mathrm{Bin}(2n, p)$ random variable for some $p$ can approximate $2X/3$ well. Unfortunately, the approximation is not so good, as proved in the following proposition.
Proposition 4.5.
Let $X \sim \mathrm{Bin}(3n, 1/2)$, and $Y \sim \mathrm{Bin}(2n, p)$ for $0 \le p \le 1$. Then there exists $C_p > 0$ such that
\[
W_\infty\Big(\frac{2X}{3}, Y\Big) \ge C_p\, n \quad \text{for large } n. \tag{4.7}
\]

Proof.
The extreme cases $p = 0, 1$ are obvious, so we assume $0 < p < 1$. Consider the transfer from $2X/3$ to $\{Y = 0\}$, which has probability mass $(1-p)^{2n}$. By the definition of $W_\infty$,
\[
W_\infty\Big(\frac{2X}{3}, Y\Big) \ge \frac{2}{3} \inf\Big\{ k;\ (1-p)^{2n} \le 2^{-3n} \sum_{i=0}^{k} \binom{3n}{i} \Big\}.
\]
It is well known that for any $\lambda < 1/2$, $\sum_{i=0}^{\lambda n} \binom{n}{i} = 2^{nH(\lambda)+o(n)}$, where $H(\lambda) := -\lambda\log_2(\lambda) - (1-\lambda)\log_2(1-\lambda)$. It follows from standard analysis that for $p < 1 - 1/\sqrt{8}$, $W_\infty(\frac{2X}{3}, Y) \ge 2\lambda_p n$, where $\lambda_p$ is the unique solution on $[0, 1/2)$ to the equation $H(\lambda) = \frac{2}{3}\log_2(1-p) + 1$.

Similarly, by considering the transfer from $2X/3$ to $\{Y = 2n\}$, which has probability mass $p^{2n}$, we get for $p > 1/\sqrt{8}$, $W_\infty(\frac{2X}{3}, Y) \ge 2\tilde{\lambda}_p n$, where $\tilde{\lambda}_p$ is the unique solution on $[0, 1/2)$ to the equation $H(\lambda) = \frac{2}{3}\log_2(p) + 1$. We take $C_p$ to be $2\tilde{\lambda}_p$ for $p \ge 1/2$, and $2\lambda_p$ for $p < 1/2$. □

The problem thus requires finding $(p_1, \ldots, p_{2n}) \in [0,1]^{2n}$ such that $W_\infty(2X/3, \mathrm{PB}(p_1, \ldots, p_{2n}))$ is small. By Proposition 4.5, the values of $p_1, \ldots, p_{2n}$ cannot all be too small or too large. Precisely, there exist $i \in [2n]$ such that $p_i > 1/\sqrt{8}$, and $j \in [2n]$ such that $p_j < 1 - 1/\sqrt{8}$. A tentative choice is the equally spaced $p_i = \frac{i}{2n+1}$ for $i \in [2n]$. By letting $Y \sim \mathrm{PB}(\frac{1}{2n+1}, \ldots, \frac{2n}{2n+1})$, we get $E(2X/3) = EY = n$ and $\operatorname{Var}(2X/3) \sim \operatorname{Var} Y \sim n/3$. However,
\[
W_\infty\Big(\frac{2X}{3}, Y\Big) \ge \frac{2}{3}\inf\Big\{ k;\ \prod_{i=1}^{2n}\frac{i}{2n+1} \le 2^{-3n}\sum_{i=0}^{k}\binom{3n}{i} \Big\} \ge \frac{2}{3}\inf\Big\{ k;\ e^{-2n} \le 2^{-3n}\sum_{i=0}^{k}\binom{3n}{i} \Big\} = 2\lambda_{\mathrm{eq}}\, n,
\]
where $\lambda_{\mathrm{eq}} \approx 0.004$ is the unique solution on $[0, 1/2)$ to the equation $H(\lambda) = 1 - \frac{2}{3}\log_2 e$. Still the approximation is not good, but it is much better than the $\mathrm{Bin}(2n, p)$ approximation.

Open problem 4.6.
Is there a random variable $Y \sim \mathrm{PB}(p_1, \ldots, p_{2n})$ such that $W_\infty(2X/3, Y)$ is of order $o(n)$? What is the lower bound of $\mathrm{Acc}(2X/3)$?

Coefficients of Poisson binomial PGF. For simplicity, we take $X \sim \mathrm{Bin}(3n-1, 1/2)$, so that $\lfloor 2X/3 \rfloor$ does not satisfy Newton's inequality. It is interesting to ask the following: can we find $(a_0, \ldots, a_{2n-1}) \in \mathbb{R}_+^{2n}$ such that
\[
a_{2k} + a_{2k+1} = \binom{3n-1}{3k} + \binom{3n-1}{3k+1} + \binom{3n-1}{3k+2} \quad \text{for } 0 \le k \le n-1, \tag{4.8}
\]
and the polynomial $P(x) := \sum_{k=0}^{2n-1} a_k x^k$ has all real roots? If we are able to find such $(a_0, \ldots, a_{2n-1})$, then $\mathrm{Acc}(2X/3) \le 2/3$. The sequence $(a_0, \ldots, a_{2n-1})$ must satisfy Newton's inequality and thus is unimodal. See also [90] for higher order Newton's inequalities.

According to (4.8), $a_0 + a_1 = \Theta(n^2)$, meaning that $a_0 + a_1 \sim Cn^2$ for some $C > 0$. If $a_0 = \Theta(n^2)$, then the condition $a_1^2 \ge a_0 a_2$ implies that $a_2 = O(n^2)$. Further, the condition $a_2^2 \ge a_1 a_3$ gives that $a_3 = O(n^2)$. Consequently, $a_2 + a_3 = O(n^2)$, which contradicts the fact that $a_2 + a_3 = \Theta(n^5)$. So we have $a_0 = o(n^2)$ and $a_1 = \Theta(n^2)$. A similar argument shows that for any fixed $k$, $a_{2k} = o(n^{3k+2})$ and $a_{2k+1} = \Theta(n^{3k+2})$. In fact, it can be shown that $a_{2k} = \Theta(n^{3k+1/2})$ for any fixed $k$. But the choice of the bulk terms such as $a_{n-1}$, $a_n$ is a more subtle issue, since the terms $\binom{3n-1}{\lfloor 3n/2 \rfloor - 1}$, $\binom{3n-1}{\lfloor 3n/2 \rfloor}$ and $\binom{3n-1}{\lfloor 3n/2 \rfloor + 1}$ are comparable.

In Appendix A, we see that $\mathrm{Acc}(2X/3) = 1/3$ for $n = 1$, and $\mathrm{Acc}(2X/3) = 2/3$ for $n = 2$. Further we get:
• $n = 3$: $\mathrm{Acc}(2X/3) = 2/3$, achieved by a strongly Rayleigh variable with PGF
\[
\tfrac{1}{2^{8}}\,(3 + 34x + 91x^2 + 91x^3 + 34x^4 + 3x^5).
\]
• $n = 4$: $\mathrm{Acc}(2X/3) = 2/3$, achieved by a strongly Rayleigh variable with PGF
\[
\tfrac{1}{2^{11}}\,(4 + 63x + 310x^2 + 647x^3 + 647x^4 + 310x^5 + 63x^6 + 4x^7).
\]
• $n = 5$: $\mathrm{Acc}(2X/3) = 2/3$, achieved by a strongly Rayleigh variable with PGF
\[
\tfrac{1}{2^{14}}\,(4 + 102x + 760.5x^2 + 2606.5x^3 + 4719x^4 + 4719x^5 + 2606.5x^6 + 760.5x^7 + 102x^8 + 4x^9).
\]

From the small $n$ cases, we speculate that there is a strongly Rayleigh polynomial $P(x)$ whose coefficients satisfy (4.8) and the symmetric/self-reciprocal condition:
\[
a_k = a_{2n-1-k} \quad \text{for } 0 \le k \le 2n-1. \tag{4.9}
\]
Such polynomials are instances of $\Lambda$-polynomials [23], whose coefficients are symmetric and unimodal. In general, for each $n \ge 1$, there are polynomials $Q_0, \ldots, Q_{2n-1} \in \mathbb{Z}[a_0, \cdots, a_{2n-1}]$ such that the polynomial with real coefficients $P(x)$ has only real roots if and only if $Q_k \ge 0$ for all $k$. These $Q_k$'s can be constructed as the leading coefficients of the Sturm sequence of $P$, see e.g. [99, Section 1.3]. They are also the subresultants of the Sylvester matrix of $P$ and $P'$, up to sign changes. In other words, we try to find whether the set
\[
S := \big\{ (a_0, \ldots, a_{2n-1}) \in \mathbb{R}_+^{2n} : \text{(4.8), (4.9) hold and } Q_k \ge 0 \text{ for all } k \big\}
\]
is empty or not. The set $S$ is semi-algebraic. According to Stengle's Positivstellensatz [98], the non-emptiness of $S$ is equivalent to
\[
-1 \notin C(Q_0, \ldots, Q_{2n-1}) + I\Big( a_{2k} + a_{2k+1} - \binom{3n-1}{3k} - \binom{3n-1}{3k+1} - \binom{3n-1}{3k+2},\ a_k - a_{2n-1-k} \Big),
\]
where $C$ is the cone and $I$ is the ideal generated by the corresponding polynomials. However, the size of the polynomials $Q_k$ grows very fast, and hence exact computations become impossible. See also [69, 83] for related discussions.

Hurwitz stability. Recently, Liggett [65] proved an interesting result on $\lfloor 2X/3 \rfloor$ for $X$ a strongly Rayleigh variable.

Theorem 4.7. [65]
Let $X$ be a strongly Rayleigh random variable. Then the PGF of $\lfloor 2X/3 \rfloor$ is Hurwitz stable. That is, all its roots have negative real parts.

The idea is to write the PGF of $\lfloor 2X/3 \rfloor$ as $g_0(x^2) + x g_1(x^2)$, where $g_0$ and $g_1$ have interlacing roots. By the Hermite–Biehler theorem [10, 51], such polynomials are Hurwitz stable. This means that the PGF of $\lfloor 2X/3 \rfloor$ can be factorized into polynomials with positive coefficients of degrees no greater than 2. Thus, $\lfloor 2X/3 \rfloor$ is a Poisson multinomial variable, that is, the sum of independent random variables with values in $\{0, 1, 2\}$. In general, it can be shown that the PGF of $\lfloor jX/k \rfloor$ is expressed as
\[
g_0(x^j) + x g_1(x^j) + \cdots + x^{j-1} g_{j-1}(x^j), \tag{4.10}
\]
where $g_0, \ldots, g_{j-1}$ have simple interlacing roots. We conjecture the following.
Let $X$ be a strongly Rayleigh random variable. Then $\lfloor jX/k \rfloor$ is the sum of independent random variables with values in $\{0, 1, \ldots, j\}$. Equivalently, the PGF of $\lfloor jX/k \rfloor$ can be factorized into polynomials with positive coefficients of degrees no greater than $j$.

Let $\mathcal{P}_j$ be the set of polynomials with positive coefficients which can be factorized into polynomials with positive coefficients of degrees no greater than $j$, and let $\mathcal{Q}_j$ be the set of polynomials which satisfy (4.10). From the above discussion, $\mathcal{P}_1 = \mathcal{Q}_1$ and $\mathcal{P}_2 = \mathcal{Q}_2$. But neither implication between $\mathcal{P}_3$ and $\mathcal{Q}_3$ is true, as the following examples in [65] show:
• The first example is a quintic $f$ whose roots consist of two conjugate pairs $z_1, \bar{z}_1, z_2, \bar{z}_2$ and one real root $w$; the cubic factor $(z - z_2)(z - \bar{z}_2)(z - w)$ has a negative coefficient, so $f \notin \mathcal{P}_3$. But the roots of $h_0, h_1, h_2$ in the decomposition (4.10) are real and negative, so $f \in \mathcal{Q}_3$.
• The second example is a product of two polynomials with positive coefficients of degrees at most 3, which is therefore in $\mathcal{P}_3$. However, $f \notin \mathcal{Q}_3$, since the roots of $h_0, h_1, h_2$ fail the required interlacing.
See also [24, 106, 108] for discussions of positive factorizations of small degree polynomials.

5. Computations of Poisson binomial distributions
In this section we discuss a few computational issues of learning and computing the Poisson binomial distribution.
Learning the Poisson binomial distribution. Distribution learning is an active domain in both statistics and computer science. Following [36], given access to independent samples from an unknown distribution P, an error tolerance ε > 0 and a failure probability δ > 0, a learning algorithm outputs an estimate P̂ such that P(d_TV(P̂, P) ≤ ε) ≥ 1 − δ. The performance of a learning algorithm is measured by its sample complexity and its computational complexity.

For X ∼ PB(p_1, . . . , p_n), this amounts to finding a vector (p̂_1, . . . , p̂_n) defining X̂ ∼ PB(p̂_1, . . . , p̂_n) such that d_TV(X̂, X) is small with high probability. This is often called proper learning of Poisson binomial distributions. Building upon previous work [12, 35, 85], Daskalakis, Diakonikolas and Servedio [34] established the following result for proper learning of Poisson binomial distributions.
Theorem 5.1. [34]
Let X ∼ PB(p_1, . . . , p_n) with unknown p_i's. There is an algorithm such that given ε, δ > 0, it requires
• (sample complexity) Õ(1/ε²) · log(1/δ) independent samples from X,
• (computational complexity) (1/ε)^{O(log²(1/ε))} · O(log n · log(1/δ)) operations,
to construct a vector (p̂_1, . . . , p̂_n) satisfying P(d_TV(X̂, X) ≤ ε) ≥ 1 − δ for X̂ ∼ PB(p̂_1, . . . , p̂_n).

The key to the algorithm is to find subsets covering all Poisson binomial distributions, each of which is either 'sparse' or 'heavy'. Applying Birgé's algorithm [12] to the sparse subsets, and the translated Poisson approximation (Theorem 3.3) to the heavy subsets, gives the desired algorithm. Note that the sample complexity in Theorem 5.1 is nearly optimal, since Θ(1/ε²) samples are required to distinguish Bin(n, 1/2) from Bin(n, 1/2 + ε/√n), which differ by Θ(ε) in total variation. See also [39] for further results on learning the Poisson binomial distribution, and [33, 37, 38] for integer-valued distributions.

Computing the Poisson binomial distribution. Recall the probability distribution of X ∼ PB(p_1, . . . , p_n) from (1.1). A brute-force computation of this distribution is expensive for large n. The approximations in Section 3 are often used to estimate the probability distribution/CDF of the Poisson binomial distribution. Here we focus on efficient algorithms to compute these distribution functions exactly. There are two general approaches: recursive formulas and discrete Fourier analysis.

In [29], the authors presented several recursive algorithms to compute (1.1). For B ⊂ [n], define

R(k, B) := Σ_{A ⊂ B, |A| = k} Π_{i ∈ A} p_i/(1 − p_i).

So P(X = k) = R(k, [n]) · Π_{i=1}^n (1 − p_i). Now the problem is to find efficient ways to compute R(k, B). Two recursive algorithms are proposed:
• [30, 97] For B ⊂ [n], letting T(i, B) := Σ_{j ∈ B} (p_j/(1 − p_j))^i,

R(k, B) = (1/k) Σ_{i=1}^k (−1)^{i+1} T(i, B) R(k − i, B),   (5.1)

• [45] For B ⊂ [n] and i ∈ B,

R(k, B) = R(k, B ∖ {i}) + (p_i/(1 − p_i)) R(k − 1, B ∖ {i}).   (5.2)

In another direction, [43, 56] used a Fourier approach to evaluate the probability distribution/CDF of Poisson binomial distributions. They provided the following explicit formulas:

P(X = k) = (1/(n + 1)) Σ_{j=0}^n exp(−iωkj) x_j,   (5.3)

and

P(X ≤ k) = (1/(n + 1)) Σ_{j=0}^n [(1 − exp(−iω(k + 1)j))/(1 − exp(−iωj))] x_j,   (5.4)

where ω := 2π/(n + 1) and x_j := Π_{m=1}^n (1 − p_m + p_m exp(iωj)). In particular, the r.h.s. of (5.3) is the discrete Fourier transform of {x_0, . . . , x_n}, which can be easily computed by the Fast Fourier Transform. See also [13] for a related approach.
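Both approaches are straightforward to implement. Below is a Python sketch (function names are ours): the first routine builds the pmf by convolving in one Bernoulli factor at a time, in the spirit of the recursion (5.2); the second implements the Fourier formula (5.3).

```python
import numpy as np

def pb_pmf_dp(p):
    """pmf of X ~ PB(p_1, ..., p_n): convolve in the Bernoulli pmfs one at a
    time, the same one-element-at-a-time bookkeeping as the recursion (5.2)."""
    pmf = np.array([1.0])
    for pi in p:
        pmf = np.convolve(pmf, [1 - pi, pi])
    return pmf

def pb_pmf_dft(p):
    """pmf via the Fourier formula (5.3):
    P(X = k) = (1/(n+1)) * sum_j exp(-i*omega*k*j) * x_j,
    with omega = 2*pi/(n+1) and x_j = prod_m (1 - p_m + p_m*exp(i*omega*j))."""
    p = np.asarray(p, dtype=float)
    n = len(p)
    omega = 2 * np.pi / (n + 1)
    j = np.arange(n + 1)
    x = np.prod(1 - p[:, None] + p[:, None] * np.exp(1j * omega * j), axis=0)
    k = np.arange(n + 1)
    # the sum over j is a discrete Fourier transform of {x_0, ..., x_n}
    return (np.exp(-1j * omega * np.outer(k, j)) @ x).real / (n + 1)

p = [0.1, 0.5, 0.9, 0.3]
assert np.allclose(pb_pmf_dp(p), pb_pmf_dft(p))
```

Replacing the explicit transform matrix with numpy.fft.fft applied to the vector (x_0, . . . , x_n) gives the same result in O(n log n) time, which is how (5.3) is used in practice.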
Appendix A. Accuracy of 2X/3 for small n

Recall the definition of Acc(·) from (4.6). We compute the values of Acc(2X/3) with X ∼ Bin(n, 1/2) for 1 ≤ n ≤ 6.

• n = 1: Let Y ∼ Ber(1/2), where Ber(p) is a Bernoulli variable with parameter p. It is easy to see that Acc(2X/3) = W∞(2X/3, Y) = 1/3. That is, the weight P(2X/3 = 0) = 1/2 is transferred to {Y = 0}, and the weight P(2X/3 = 2/3) = 1/2 to {Y = 1}.

• n = 2: Let Y ∼ Ber(3/4). Then Acc(2X/3) = W∞(2X/3, Y) = 1/3. So the weight P(2X/3 = 0) = 1/4 is transferred to {Y = 0}, and the weight P(2X/3 ∈ {2/3, 4/3}) = 3/4 to {Y = 1}.

• n = 3: Suppose that W∞(2X/3, Y) = 1/3 for some strongly Rayleigh Y. Then the weight P(2X/3 = 0) = 1/8 is transferred to {Y = 0}, the weight P(2X/3 ∈ {2/3, 4/3}) = 3/4 to {Y = 1}, and the weight P(2X/3 = 2) = 1/8 to {Y = 2}. The PGF of Y is 1/8 + 3x/4 + x²/8, which has two distinct real roots −3 ± 2√2. Thus,

Acc(2X/3) = W∞(2X/3, PB((2 + √2)/4, (2 − √2)/4)) = 1/3.

• n = 4: If W∞(2X/3, Y) = 1/3 for some strongly Rayleigh Y, then the PGF of Y is 1/16 + 10x/16 + 4x²/16 + x³/16. This PGF has one real root and two imaginary roots, so Y cannot be strongly Rayleigh. There are many ways to construct a strongly Rayleigh variable Y such that W∞(2X/3, Y) = 2/3. For instance, the weight P(2X/3 = 0) = 1/16 is transferred to {Y = 0}, the weight P(2X/3 ∈ {2/3, 4/3}) = 10/16 is transferred to {Y = 1}, and the weight P(2X/3 ∈ {2, 8/3}) = 5/16 is transferred to {Y = 2}. So

Acc(2X/3) = W∞(2X/3, PB((5 + √5)/8, (5 − √5)/8)) = 2/3.

In fact, we can find all strongly Rayleigh Y such that W∞(2X/3, Y) = 2/3.
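The two root-location claims for n = 4 are easy to verify numerically; a small sketch (we simply inspect the roots with numpy rather than evaluate the discriminant by hand):

```python
import numpy as np

# PGF coefficients (ascending), scaled by 16
candidate = [1, 10, 4, 1]   # would-be Y with W_inf = 1/3: 1/16 + 10x/16 + 4x^2/16 + x^3/16
chosen    = [1, 10, 5]      # the transference plan above: 1/16 + 10x/16 + 5x^2/16

roots_candidate = np.roots(candidate[::-1])   # np.roots expects descending order
roots_chosen = np.roots(chosen[::-1])

# the cubic has a conjugate pair of non-real roots, so it is not real-rooted
assert np.any(np.abs(roots_candidate.imag) > 1e-9)
# the quadratic has two negative real roots, so Y is strongly Rayleigh
assert np.all(np.abs(roots_chosen.imag) < 1e-9) and np.all(roots_chosen.real < 0)
```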
There are two cases:

(1) The range of Y is {0, 1, 2}. Suppose a weight θ_1/16 with θ_1/16 ≤ P(2X/3 = 2/3) is transferred to {Y = 1} (the rest of the weight at 2/3 going to {Y = 0}), and a weight θ_2/16 with θ_2/16 ≤ P(2X/3 = 4/3) is transferred to {Y = 1} (the rest going to {Y = 2}). Then the PGF of Y is

(5 − θ_1)/16 + ((θ_1 + θ_2)/16) x + ((11 − θ_2)/16) x².

So Y is strongly Rayleigh if and only if (θ_1 + θ_2)² ≥ 4(5 − θ_1)(11 − θ_2). Figure 2 (Left) shows the valid region of (θ_1, θ_2).

(2) The range of Y is {0, 1, 2, 3}. Assume the same as in (1), and in addition a weight θ_3/16 with θ_3/16 ≤ P(2X/3 = 8/3) is transferred to {Y = 3}. Then the PGF of Y is

(5 − θ_1)/16 + ((θ_1 + θ_2)/16) x + ((11 − θ_2 − θ_3)/16) x² + (θ_3/16) x³.

The discriminant of the cubic equation ax³ + bx² + cx + d = 0 is ∆ := 18abcd − 4b³d + b²c² − 4ac³ − 27a²d². According to a well-known result of Cardano, the cubic equation has three real roots if and only if ∆ ≥ 0, which here reads

18(5 − θ_1)(θ_1 + θ_2)(11 − θ_2 − θ_3)θ_3 − 4(11 − θ_2 − θ_3)³(5 − θ_1) + (11 − θ_2 − θ_3)²(θ_1 + θ_2)² − 4(θ_1 + θ_2)³θ_3 − 27(5 − θ_1)²θ_3² ≥ 0.

Figure 2 (Right) shows the valid region of (θ_1, θ_2, θ_3).

Figure 2. Left: valid region of (θ_1, θ_2). Right: valid region of (θ_1, θ_2, θ_3).

• n = 5: A similar argument as in the case n = 4 shows that W∞(2X/3, Y) ≠ 1/3 for any strongly Rayleigh variable Y. Again there are many ways to construct a strongly Rayleigh variable Y such that W∞(2X/3, Y) = 2/3. For instance, the weight P(2X/3 = 0) = 1/32 is transferred to {Y = 0}, the weight P(2X/3 ∈ {2/3, 4/3}) = 15/32 is transferred to {Y = 1}, the weight P(2X/3 ∈ {2, 8/3}) = 15/32 is transferred to {Y = 2}, and the weight P(2X/3 = 10/3) = 1/32 is transferred to {Y = 3}. The PGF of Y is then 1/32 + 15x/32 + 15x²/32 + x³/32. It is easily seen that the coefficients of this PGF satisfy the Hutchinson–Kurtz condition (4.3). So Acc(2X/3) = 2/3. It is more difficult to find all strongly Rayleigh variables Y such that W∞(2X/3, Y) = 2/3, since the conditions for a quartic function to have all real roots are more complicated [82].

• n = 6: Consider the transference plan in Figure 3. It is easy to see that the PGF of Y is (1/16)(1 + x)⁴, so Y ∼ Bin(4, 1/2) and Acc(2X/3) = 2/3.

Acknowledgment: We thank Tom Liggett, Jim Pitman and Terry Tao for helpful discussions. We thank Yuting Ye for providing Example 3.4, and Tom Liggett for showing us the manuscript [65].
References

[1] M. Aissen, A. Edrei, I. J. Schoenberg, and A. Whitney. On the generating functions of totally positive sequences.
Proc. Nat. Acad. Sci. U. S. A. , 37:303–307, 1951.[2] M. Aissen, I. J. Schoenberg, and A. Whitney. On the generating functions of totally positive sequences.I.
J. Analyse Math. , 2:93–103, 1952.
Figure 3. A transference plan from Bin(6, 1/2) to Bin(4, 1/2).
[3] T. W. Anderson and S. M. Samuels. Some inequalities among binomial and Poisson probabilities. In Proc. Fifth Berkeley Sympos. Math. Statist. and Probability (Berkeley, Calif., 1965/66), Vol. I: Statistics, pages 1–12. Univ. California Press, Berkeley, Calif., 1967.
[4] T. Ando. Totally positive matrices.
Linear Algebra Appl. , 90:165–219, 1987.[5] K. Azuma. Weighted sums of certain dependent random variables.
Tohoku Math. J. (2) , 19:357–367,1967.[6] A. Barbour, L. Holst, and S. Janson. Poisson approximation. 1992.[7] A. D. Barbour and P. Hall. On the rate of Poisson convergence.
Math. Proc. Cambridge Philos. Soc. ,95(3):473–480, 1984.[8] D. Berend and T. Tassa. Improved bounds on Bell numbers and on moments of sums of random variables.
Probab. Math. Statist. , 30(2):185–205, 2010.[9] P. J. Bickel and W. R. van Zwet. On a theorem of Hoeffding. In
Asymptotic theory of statistical testsand estimation (Proc. Adv. Internat. Sympos., Univ. North Carolina, Chapel Hill, N.C., 1979) , pages307–324. Academic Press, New York-London-Toronto, Ont., 1980.[10] R. Biehler. Sur une classe d’´equations alg´ebriques dont toutes les racines sont r´eelles.
J. Reine Angew.Math. , 87:350–352, 1879.[11] P. Billingsley.
Probability and measure . Wiley Series in Probability and Mathematical Statistics. JohnWiley & Sons, Inc., New York, third edition, 1995.[12] L. Birg´e. Estimation of unimodal densities without smoothness assumptions.
Ann. Statist. , 25(3):970–981, 1997.[13] W. Biscarri, S. D. Zhao, and R. J. Brunner. A simple and fast method for computing the Poissonbinomial distribution function.
Comput. Statist. Data Anal. , 122:92–100, 2018.[14] S. G. Bobkov. Berry-Esseen bounds and Edgeworth expansions in the central limit theorem for transportdistances.
Probab. Theory Related Fields , 170(1-2):229–262, 2018.[15] P. J. Boland. The probability distribution for the number of successes in independent trials.
Comm.Statist. Theory Methods , 36(5-8):1327–1331, 2007.[16] P. J. Boland and F. Proschan. The reliability of k out of n systems. Ann. Probab. , 11(3):760–764, 1983.[17] P. J. Boland, H. Singh, and B. Cukic. Stochastic orders in partition and random testing of software.
J.Appl. Probab. , 39(3):555–565, 2002.[18] P. J. Boland, H. Singh, and B. Cukic. The stochastic precedence ordering with applications in samplingand testing.
J. Appl. Probab. , 41(1):73–82, 2004.[19] J. Borcea and P. Br¨and´en. Applications of stable polynomials to mixed determinants: Johnson’s con-jectures, unimodality, and symmetrized Fischer products.
Duke Math. J. , 143(2):205–223, 2008. [20] J. Borcea and P. Br¨and´en. P´olya-Schur master theorems for circular domains and their boundaries.
Ann.of Math. (2) , 170(1):465–492, 2009.[21] J. Borcea, P.r Br¨and´en, and T. M. Liggett. Negative dependence and the geometry of polynomials.
J.Amer. Math. Soc. , 22(2):521–567, 2009.[22] F. Brenti. Unimodal, log-concave and P´olya frequency sequences in combinatorics.
Mem. Amer. Math.Soc. , 81(413):viii+106, 1989.[23] F. Brenti. Unimodal polynomials arising from symmetric functions.
Proc. Amer. Math. Soc. ,108(4):1133–1141, 1990.[24] W. E. Briggs. Zeros and factors of polynomials with positive coefficients and protein-ligand binding.
Rocky Mountain J. Math. , 15(1):75–89, 1985.[25] T. Broderick, J. Pitman, and M. I. Jordan. Feature allocations, probability functions, and paintboxes.
Bayesian Anal. , 8(4):801–836, 2013.[26] S. Chatterjee, P. Diaconis, and E. Meckes. Exchangeable pairs and Poisson approximation.
Probab.Surv. , 2:64–106, 2005.[27] L. H. Y. Chen and Q-M Shao. A non-uniform Berry-Esseen bound via Stein’s method.
Probab. TheoryRelated Fields , 120(2):236–254, 2001.[28] L. H. Y. Chen and Q-M Shao. Stein’s method for normal approximation. In
An introduction to Stein’smethod , volume 4 of
Lect. Notes Ser. Inst. Math. Sci. Natl. Univ. Singap. , pages 1–59. Singapore Univ.Press, Singapore, 2005.[29] S. X. Chen and J. S. Liu. Statistical applications of the Poisson-binomial and conditional Bernoullidistributions.
Statist. Sinica , 7(4):875–892, 1997.[30] X-H Chen, A. P. Dempster, and J. S. Liu. Weighted finite population sampling to maximize entropy.
Biometrika , 81(3):457–469, 1994.[31] K. P. Choi and A-H Xia. Approximating the number of successes in independent trials: binomial versusPoisson.
Ann. Appl. Probab. , 12(4):1139–1148, 2002.[32] J. N. Darroch. On the distribution of the number of successes in independent trials.
Ann. Math. Statist. ,35, 1964.[33] C. Daskalakis, I. Diakonikolas, R. O’Donnell, R. A. Servedio, and L-Y Tan. Learning sums of indepen-dent integer random variables. In
Foundations of Computer Science (FOCS), 2013 IEEE 54th AnnualSymposium , pages 217–226, 2013.[34] C. Daskalakis, I. Diakonikolas, and R. A. Servedio. Learning Poisson binomial distributions.
Algorith-mica , 72(1):316–357, 2015.[35] C. Daskalakis and C. Papadimitriou. Sparse covers for sums of indicators.
Probab. Theory Related Fields ,162(3-4):679–705, 2015.[36] L. Devroye and G. Lugosi.
Combinatorial methods in density estimation . Springer Series in Statistics.Springer-Verlag, New York, 2001.[37] I. Diakonikolas, D. M. Kane, and A. Stewart. The Fourier transform of Poisson multinomial distribu-tions and its algorithmic applications. In
STOC’16—Proceedings of the 48th Annual ACM SIGACTSymposium on Theory of Computing , pages 1060–1073. ACM, New York, 2016.[38] I. Diakonikolas, D. M. Kane, and A. Stewart. Optimal learning via the Fourier transform for sums ofindependent integer random variables. In
Conference on Learning Theory , pages 831–849, 2016.[39] I. Diakonikolas, D. M. Kane, and A. Stewart. Properly learning Poisson binomial distributions in almostpolynomial time. In
Conference on Learning Theory , pages 850–878, 2016.[40] D. Duffie, L. Saita, and K. Wang. Multi-period corporate default prediction with stochastic covariates.
Journal of Financial Economics , 83(3):635–665, 2007.[41] W. Ehm. Binomial approximation to the Poisson binomial distribution.
Statist. Probab. Lett. , 11(1):7–16,1991.[42] S. Fallat, C. R. Johnson, and A. D. Sokal. Total positivity of sums, Hadamard products and Hadamardpowers: results and counterexamples.
Linear Algebra Appl. , 520:242–259, 2017.[43] M. Fern´andez and S. Williams. Closed-form expression for the poisson-binomial probability densityfunction.
IEEE Transactions on Aerospace and Electronic Systems , 46(2):803–817, 2010.[44] M. F. Fernandez and T. Aridgides. Measures for evaluating sea mine identification processing perfor-mance and the enhancements provided by fusing multisensor/multiprocess data via an M-out-of-N voting
scheme. In
Detection and Remediation Technologies for Mines and Minelike Targets VIII , volume 5089,pages 425–436, 2003.[45] M. H. Gail, J. H. Lubin, and L. V. Rubinstein. Likelihood calculations for matched case-control studiesand survival studies with tied death times.
Biometrika , 68(3):703–707, 1981.[46] M. Gasca and J. M. Pe˜na. Total positivity and Neville elimination.
Linear Algebra Appl. , 165:25–44,1992.[47] S. Ghosh, T. M. Liggett, and R. Pemantle. Multivariate CLT follows from strong Rayleigh property. In ,pages 139–147. SIAM, Philadelphia, PA, 2017.[48] L. J. Gleser. On the distribution of the number of successes in independent trials.
Ann. Proba. , 3:182–188,1975.[49] L. Goldstein. Bounds on the constant in the mean central limit theorem.
Ann. Probab. , 38(4):1672–1689,2010.[50] D. Handelman. Arguments of zeros of highly log concave polynomials.
Rocky Mountain J. Math. ,43(1):149–177, 2013.[51] C. Hermite. Sur le nombre des racines d’une ´equation alg´ebrique comprises entre des limites donn´ees.
J. Reine Angew. Math. , 52:39–51, 1856.[52] E. Hillion and O. Johnson. A proof of the Shepp-Olkin entropy concavity conjecture.
Bernoulli ,23(4B):3638–3649, 2017.[53] W. Hoeffding. On the distribution of the number of successes in independent trials.
The Annals ofMathematical Statistics , 27(3):713–721, 1956.[54] W. Hoeffding. Probability inequalities for sums of bounded random variables.
J. Amer. Statist. Assoc. ,58:13–30, 1963.[55] O. Holtz and M. Tyaglov. Structured matrices, continued fractions, and root localization of polynomials.
SIAM Rev. , 54(3):421–509, 2012.[56] Y-L Hong. On computing the distribution function for the Poisson binomial distribution.
Comput.Statist. Data Anal. , 59:41–51, 2013.[57] Y-L Hong, W. Q. Meeker, and J. D. McCalley. Prediction of remaining life of power transformers basedon left truncated and right censored lifetime data.
Ann. Appl. Stat. , 3(2):857–879, 2009.[58] J. I. Hutchinson. On a remarkable class of entire functions.
Trans. Amer. Math. Soc. , 25(3):325–332,1923.[59] S. Janson. Coupling and Poisson approximation.
Acta Appl. Math. , 34(1-2):7–15, 1994.[60] K. Jogdeo and S. M. Samuels. Monotone convergence of binomial probabilities and a generalization ofRamanujan’s equation.
Ann. Math. Statist. , 39(4):1191–1195, 1968.[61] O. M. Katkova and A. M. Vishnyakova. A sufficient condition for a polynomial to be stable.
J. Math.Anal. Appl. , 347(1):81–89, 2008.[62] V. P. Kostov and B. Shapiro. Hardy-Petrovitch-Hutchinson’s problem and partial theta function.
DukeMath. J. , 162(5):825–861, 2013.[63] D. C. Kurtz. A sufficient condition for all the roots of a polynomial to be real.
Amer. Math. Monthly ,99(3):259–263, 1992.[64] L. Le Cam. An approximation theorem for the Poisson binomial distribution.
Pacific J. Math. , 10:1181–1197, 1960.[65] T. M. Liggett. Approximating multiples of strong Rayleigh random variables. 2018. Unpublished man-uscript.[66] A. W. Marshall, I. Olkin, and B. C. Arnold.
Inequalities: theory of majorization and its applications .Springer Series in Statistics. Springer, New York, second edition, 2011.[67] K. Neammanee. On the constant in the nonuniform version of the Berry-Esseen theorem.
Int. J. Math.Math. Sci. , (12):1951–1967, 2005.[68] J. Nedelman and T. Wallenius. Bernoulli trials, Poisson trials, surprising variances, and Jensen’s in-equality.
Amer. Statist. , 40(4):286–289, 1986.[69] C. P. Niculescu. Interpolating Newton’s inequalities.
Bull. Math. Soc. Sci. Math. Roumanie (N.S.) ,47(95)(1-2):67–83, 2004.[70] S. Y. Novak. Poisson approximation.
Probab. Surv. , 16:228–276, 2019. [71] M. Okamoto. Some inequalities relating to the partial sum of binomial probabilities.
Ann. Inst. Statist.Math. , 10:29–35, 1958.[72] L. Paditz. On the analytical structure of the constant in the nonuniform version of the Esseen inequality.
Statistics , 20(3):453–464, 1989.[73] E. A. Pek¨oz, A. R¨ollin, V. ˇCekanaviˇcius, and M. Shwartz. A three-parameter binomial approximation.
J. Appl. Probab. , 46(4):1073–1085, 2009.[74] R. Pemantle. Towards a theory of negative dependence. volume 41, pages 1371–1390. 2000. Probabilistictechniques in equilibrium and nonequilibrium statistical physics.[75] V. V. Petrov. A bound for the deviation of the distribution of a sum of independent random variablesfrom the normal law.
Dokl. Akad. Nauk SSSR , 160:1013–1015, 1965.[76] V. V. Petrov.
Sums of independent random variables . Springer-Verlag, New York-Heidelberg, 1975.Translated from the Russian by A. A. Brown, Ergebnisse der Mathematik und ihrer Grenzgebiete, Band82.[77] J. Pitman.
Probability . Springer Texts in Statistics. Springer-Verlag New York, 1993.[78] J. Pitman. Probabilistic bounds on the coefficients of polynomials with only real zeros.
J. Combin.Theory Ser. A , 77(2):279–303, 1997.[79] M. L. Platonov.
Combinatorial numbers of a class of mappings and their applications . “Nauka”, Moscow,1979.[80] G. Pledger and F. Proschan. Comparisons of order statistics and of spacings from heterogeneous distri-butions. In
Optimizing methods in statistics (Proc. Sympos., Ohio State Univ., Columbus, Ohio, 1971) ,pages 89–113, 1971.[81] S. D. Poisson.
Recherches sur la probabilit´e des jugements en mati`ere criminelle et en mati`ere civile .Bachelier, 1837.[82] E. L. Rees. Graphical Discussion of the Roots of a Quartic Equation.
Amer. Math. Monthly , 29(2):51–55,1922.[83] K. Rietsch. Totally positive Toeplitz matrices and quantum cohomology of partial flag varieties.
J.Amer. Math. Soc. , 16(2):363–392, 2003.[84] E. Rio. Upper bounds for minimal distances in the central limit theorem.
Ann. Inst. Henri Poincar´eProbab. Stat. , 45(3):802–817, 2009.[85] A. R¨ollin. Translated Poisson approximation using exchangeable pair couplings.
Ann. Appl. Probab. ,17(5-6):1596–1614, 2007.[86] B. Roos. Asymptotic and sharp bounds in the Poisson approximation to the Poisson-binomial distribu-tion.
Bernoulli , 5(6):1021–1034, 1999.[87] B. Roos. Binomial approximation to the Poisson binomial distribution: the Krawtchouk expansion.
Teor. Veroyatnost. i Primenen. , 45(2):328–344, 2000.[88] P. R. Rosenbaum.
Observational studies . Springer Series in Statistics. Springer-Verlag, New York, secondedition, 2002.[89] E. Rosenman. Some new results for Poisson binomial models. 2018. arXiv:1907.09053.[90] S. Rosset. Normalized symmetric functions, Newton’s inequalities and a new set of stronger inequalities.
Amer. Math. Monthly , 96(9):815–819, 1989.[91] S. M. Samuels. On the number of successes in independent trials.
Ann. Math. Statist. , 36:1272–1278,1965.[92] N. Schumacher. Binomial option pricing with nonidentically distributed returns and its implications.
Mathematical and computer modelling , 29(10-12):121–143, 1999.[93] L. A. Shepp and I. Olkin. Entropy of the sum of independent Bernoulli random variables and of themultinomial distribution. In
Contributions to probability , pages 201–206. Academic Press, New York-London, 1981.[94] I. S. Shiganov. Refinement of the upper bound of a constant in the remainder term of the centrallimit theorem. In
Stability problems for stochastic models (Moscow, 1982) , pages 109–115. Vsesoyuz.Nauchno-Issled. Inst. Sistem. Issled., Moscow, 1982.[95] Max Skipper. A P´olya approximation to the Poisson-binomial law.
J. Appl. Probab. , 49(3):745–757,2012.
[96] R. P. Stanley. Log-concave and unimodal sequences in algebra, combinatorics, and geometry. In
Graphtheory and its applications: East and West (Jinan, 1986) , volume 576 of
Ann. New York Acad. Sci. ,pages 500–535. New York Acad. Sci., New York, 1989.[97] C. Stein. Application of Newton’s identities to a generalized birthday problem and to the Poissonbinomial distribution, 1990. Technical Report 354, Department of Statistics, Stanford University.[98] G. Stengle. A nullstellensatz and a positivstellensatz in semialgebraic geometry.
Math. Ann. , 207:87–97,1974.[99] B. Sturmfels.
Solving systems of polynomial equations , volume 97 of
CBMS Regional Conference Seriesin Mathematics . Published for the Conference Board of the Mathematical Sciences, Washington, DC;by the American Mathematical Society, Providence, RI, 2002.[100] A. Tejada and J. Arnold. The role of Poisson’s binomial distribution in the analysis of TEM images.
Ultramicroscopy , 111(11):1553–1556, 2011.[101] P. Thongtha and K. Neammanee. Refinement on the constants in the non-uniform version of the Berry-Esseen theorem.
Thai J. Math. , 5(1):1–13, 2007.[102] P. van Beek. An application of Fourier methods to the problem of sharpening the Berry-Esseen inequality.
Z. Wahrscheinlichkeitstheorie und Verw. Gebiete , 23:187–196, 1972.[103] C. Villani.
Optimal transport: Old and new , volume 338 of
Grundlehren der Mathematischen Wis-senschaften [Fundamental Principles of Mathematical Sciences] . Springer-Verlag, Berlin, 2009.[104] Y. H. Wang. On the number of successes in independent trials.
Statist. Sinica, 3(2):295–312, 1993.
[105] Wikipedia. Cubic function. https://en.wikipedia.org/wiki/Cubic_function.
[106] B. Xia and L. Yang. A new result on the p-irreducibility of binding polynomials. Comput. Math. Appl., 48(12):1811–1817, 2004.
[107] M-C Xu and N. Balakrishnan. On the convolution of heterogeneous Bernoulli random variables.
J. Appl.Probab. , 48(3):877–884, 2011.[108] L-H Zhi and Z-J Liu. p -irreducibility of binding polynomials. Comput. Math. Appl. , 38(2):1–10, 1999.
Department of Industrial Engineering and Operations Research, UC Berkeley.
E-mail address: [email protected]

UCLA.
E-mail address: