Bounds on the Poincaré constant under negative dependence
Fraser Daly^a, Oliver Johnson^b

^a Heilbronn Institute for Mathematical Research, Department of Mathematics, University of Bristol, University Walk, Bristol BS8 1TW, UK.
^b Department of Mathematics, University of Bristol, University Walk, Bristol BS8 1TW, UK.
Abstract
We give bounds on the Poincaré (inverse spectral gap) constant of a non-negative, integer-valued random variable W, under negative dependence assumptions such as ultra log-concavity and total negative dependence. We show that the bounds obtained compare well to others in the literature. Examples treated include some occupancy and urn models, a random graph model and small spacings on the circumference of a circle. Applications to Poisson convergence theorems are considered.

Keywords: Poincaré constant, Poisson distribution, total negative dependence, ultra log-concavity, size-bias transform, stochastic ordering
Primary 60E15; Secondary 62E10
1. Introduction
Email addresses: [email protected] (Fraser Daly), [email protected] (Oliver Johnson). Corresponding author: tel. +44 (0)117 954 5667, fax +44 (0)117 928 7999.

Preprint submitted to Statistics and Probability Letters, November 10, 2018.

Throughout this note we let W be a random variable supported on (a subset of) Z_+ = {0, 1, 2, ...} and let ∆ be the forward difference operator, so that for any function g : Z_+ → R, ∆g(k) = g(k + 1) − g(k). The main object we wish to consider here is the (discrete) Poincaré constant, given by

    R_W = sup_{g ∈ G(W)} { E[g(W)^2] / E[(∆g(W))^2] },

where the supremum is taken over the set G(W) = { g : Z_+ → R with E[g(W)^2] < ∞ and E[g(W)] = 0 }.

In this note we give an explicit upper bound on R_W when W satisfies a negative dependence assumption. Such a bound can be used, for example, to establish Poisson convergence results, as we shall see below. In the examples we consider in Section 3 we shall see that our upper bound is easily calculated, and often of the same order as the trivial lower bound we state in Lemma 3.1.

Our work can be understood in the context of size-biasing (see, for example, Barbour et al., 1992) and stochastic ordering (see, for example, Shaked and Shanthikumar, 2007). For any non-negative, integer-valued random variable W with mean E W = λ > 0, we let W* have the W-size-biased distribution, given by

    P(W* = j) = j P(W = j) / λ,  for j = 1, 2, ... .    (1)

Equivalently, W* can be defined by requiring that

    E[W g(W)] = λ E g(W*),    (2)

for all functions g for which the expectation on the left-hand side exists. Further we let ≤_st denote the usual stochastic ordering, so that X ≤_st Y if E f(X) ≤ E f(Y) for all increasing functions f. In this paper we shall consider random variables W under the assumption that W* ≤_st W + Z for some Z, the sharpness of which can be judged from the fact that W ≤_st W* always. For example, we obtain the following bound on the Poincaré constant:

Theorem 1.1.
Let W be a non-negative, integer-valued random variable with mean λ. Suppose that W* ≤_st W + 1. Then

    R_W ≤ λ.    (3)

This theorem is implied by a stronger result, Theorem 5.1, which we prove in Section 5. In Section 2 we state several negative dependence concepts under which we have a finite Poincaré constant, and state the bounds on R_W we obtain under these assumptions. In particular, in Corollary 2.2 we show that (3) holds for W the sum of totally negatively dependent (TND) Bernoulli random variables, and in Corollary 2.4 we show that (3) holds for W ultra log-concave of degree ∞.

The idea of proving such discrete Poincaré inequalities is not a new one, with previous authors to consider the problem including Bobkov and Götze (1999), Gao and Quastel (2003), Klaassen (1985), Miclo (1999) and Prakasa Rao and Sreehari (1987). Some comparison with the results of other authors is given in Section 3. In that section, we also treat some examples for which bounds on the Poincaré constant based on other authors' work are not straightforward to calculate.

The main aim of this work is to prove the new bound (Theorem 1.1) on the Poincaré constant for discrete random variables. Our approach enables us to make new connections between the Poincaré constant and topics such as stochastic ordering and various forms of negative dependence of random variables. In addition, we gain a new perspective on Poisson approximation by using our approach to give new bounds which are close to the optimal ones.

Section 4 shows how the bounds we derive may be used in proving Poisson convergence results. We will assess closeness of non-negative, integer-valued random variables X and Y using the total variation distance

    d_TV(L(X), L(Y)) = sup_{A ⊆ Z_+} |P(X ∈ A) − P(Y ∈ A)|.

We prove the following result in Section 4:
Theorem 1.2.
For any non-negative, integer-valued random variable W with mean E W = λ and Var(W) < ∞, the total variation distance between W and a Poisson random variable with the same mean is bounded by

    d_TV(L(W), Po(λ)) ≤ ((1 − e^{−λ})/λ) { |λ − R_W| + √(R_W) √(R_W − Var(W)) }.    (4)

(Note that Lemma 3.1 below states that R_W ≥ Var(W), so the resulting bound gives a real number.)

Our results may also be used with several other probability metrics. For example, using Theorem 1.1 of Barbour and Xia (2006) we can bound the Wasserstein distance (replacing the Stein factor (1 − e^{−λ})/λ of Theorem 1.2 by a constant multiple of 1/√λ).

To illustrate Theorem 1.2, we combine it with Theorem 1.1 to give

    d_TV(L(W), Po(λ)) ≤ (1 − e^{−λ}) √(1 − Var(W)/λ),    (5)

for W such that W* ≤_st W + 1. Note that such W have Var(W) ≤ λ. The best known general bound under this assumption is

    d_TV(L(W), Po(λ)) ≤ (1 − e^{−λ}) (1 − Var(W)/λ);    (6)

see Daly et al. (2012). So, for example, if W ∼ Bin(n, p) has a binomial distribution (which does indeed satisfy the assumptions of Theorem 1.1, as we shall see later) then the upper bound in (5) is (1 − e^{−np}) √p, while (6) gives the upper bound (1 − e^{−np}) p.

We note also that the bound (4) is increasing as a function of R_W for R_W lying in (Var(W), λ). Hence, it will typically be sharpest when we have an upper bound for R_W that is close to Var(W), and in the limit we recover the bound (6) (under the assumptions of Theorem 1.1).
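To make the comparison concrete, here is a small numerical sketch (ours, not part of the paper; the function names are our own) that computes the exact total variation distance between Bin(n, p) and Po(np) and checks it against the bounds (5) and (6):

```python
import math

def binom_pmf(n, p, k):
    # P(Bin(n, p) = k)
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(lam, k):
    # P(Po(lam) = k)
    return math.exp(-lam) * lam ** k / math.factorial(k)

def tv_binom_poisson(n, p, tail=60):
    # d_TV(Bin(n, p), Po(np)) = (1/2) * sum_k |P(W = k) - P(Po = k)|
    lam = n * p
    total = 0.0
    for k in range(n + tail):
        b = binom_pmf(n, p, k) if k <= n else 0.0
        total += abs(b - poisson_pmf(lam, k))
    return 0.5 * total

n, p = 30, 0.1
lam = n * p
bound5 = (1 - math.exp(-lam)) * math.sqrt(p)  # bound (5), using R_W <= lam
bound6 = (1 - math.exp(-lam)) * p             # bound (6)
tv = tv_binom_poisson(n, p)
assert tv <= bound6 <= bound5
```

For n = 30 and p = 0.1 the exact distance sits below both bounds, with (6) the sharper of the two, as the discussion above predicts.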
2. Negative dependence and the Poincaré constant
In this section we show that our methods can bound the Poincaré constant of random variables under several well-known definitions of negative dependence.
2.1. Total negative dependence

Definition 2.1.
Consider X_1, ..., X_n to be Bernoulli random variables and write W = X_1 + ··· + X_n. If, for each i = 1, ..., n and all increasing functions f, g : Z_+ → R,

    Cov(f(X_i), g(W − X_i)) ≤ 0,

then X_1, ..., X_n are said to be totally negatively dependent (TND). If W = X_1 + ··· + X_n, where X_1, ..., X_n are TND, then W* ≤_st W + 1 (see Papadatos and Papathanasiou, 2002). Using Theorem 1.1 we can deduce:

Corollary 2.2. If W has mean λ and may be written as a sum of TND Bernoulli random variables then R_W ≤ λ.

2.2. Ultra log-concavity

Definition 2.3.
A random variable W with mass function P_W is ultra log-concave of degree ∞, or ULC(∞), if the function P_W(x)/x! is log-concave, or equivalently if

    ρ_W(x) = (x + 1) P_W(x + 1) / P_W(x)

is non-increasing in x.

This class was introduced by Pemantle (2000) and Liggett (1997) in order to capture properties of negative dependence. It is well known that Poisson random variables and sums of independent Bernoulli random variables both have the ULC(∞) property.

In the language of stochastic ordering, the ULC(∞) property may be written W* ≤_lr W + 1, where ≤_lr represents the likelihood ratio ordering. Since the likelihood ratio ordering is stronger than the usual stochastic ordering (Shaked and Shanthikumar, 2007, Theorem 1.C.1), it follows that the assumption that W is ULC(∞) is stronger than the assumption made by Theorem 1.1. Hence we deduce the following.

Corollary 2.4. If W is ULC(∞) with mean λ, then R_W ≤ λ.

Note that the ULC condition can be viewed in the context of the Bakry–Émery condition (Bakry and Émery, 1985) for continuous measures, under which logarithmic Sobolev inequalities are known to hold, and hence the continuous version of the Poincaré constant is finite (see Ané et al. (2000) for more details). The Bakry–Émery condition is known to hold if the relative density f/φ_{1/c} is log-concave (in the continuous sense), and ULC requires that P_W/Po(λ) is log-concave (in the discrete sense). (Here φ_t is a normal density with mean 0 and variance t, and Po(λ) is a Poisson mass function with parameter λ.)

As well as total negative dependence and ultra log-concavity, there are other well-known negative dependence assumptions which fit into the framework of Theorem 5.1. For example, we recall the definition of negative association from Joag-Dev and Proschan (1983).
Definition 2.5.
Random variables X_1, ..., X_n are negatively associated if

    Cov(f(X_i, i ∈ Γ_1), g(X_j, j ∈ Γ_2)) ≤ 0,

for all increasing functions f, g, and all Γ_1, Γ_2 ⊆ {1, ..., n} with Γ_1 ∩ Γ_2 = ∅.

If W = X_1 + ··· + X_n, where X_1, ..., X_n are negatively associated, non-negative, integer-valued random variables, then Lemma 3 of Daly (2010) shows how to construct a random variable Z such that the assumptions of Theorem 5.1 are satisfied. We refer the reader to that work for further details. Another form of negative dependence is the following:

Definition 2.6.
Indicators X_1, ..., X_n are negatively related (NR) if

    E[g(X_1, ..., X_{i−1}, X_{i+1}, ..., X_n) | X_i = 1] ≤ E[g(X_1, ..., X_{i−1}, X_{i+1}, ..., X_n)],

for all i = 1, ..., n and all increasing functions g.

This definition arises naturally in the context of certain urn and graph models; see, for example, the Pólya sampling example and random graph example (Arratia et al., 1989, Example 1) treated in Section 3.2. NR is more restrictive than NA in the sense that in the definition of negative association the X_i need not be indicators. However, the following results are standard: (a) negatively associated indicators are negatively related (see p. 30 of Barbour et al., 1992); (b) negatively related indicators are totally negatively dependent (see Theorem 3.1 of Papadatos and Papathanasiou, 2002).
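In small cases the TND condition of Definition 2.1 can be checked by exhaustive enumeration. Covariance is bilinear, and every increasing function on a finite chain is a constant plus a nonnegative combination of threshold indicators, so it suffices to test f and g of threshold form. The sketch below (ours; the setting is the hypergeometric occupancy example treated in Section 3) does this for N = 4 urns, m = 2 balls and n = 3 indicators:

```python
from itertools import combinations

# Sampling without replacement: m = 2 of N = 4 urns are occupied uniformly
# at random; X_i indicates that urn i is occupied (hypergeometric indicators).
N, m, n = 4, 2, 3
outcomes = list(combinations(range(N), m))  # 6 equally likely outcomes

def cov(u, v):
    # Covariance of two real functions of the outcome, uniform measure.
    eu = sum(map(u, outcomes)) / len(outcomes)
    ev = sum(map(v, outcomes)) / len(outcomes)
    euv = sum(u(o) * v(o) for o in outcomes) / len(outcomes)
    return euv - eu * ev

# By bilinearity and the threshold decomposition of increasing functions,
# it suffices to test f = 1{X_i = 1} and g = 1{W - X_i >= k}.
for i in range(n):
    for k in range(n):
        f = lambda o, i=i: 1 if i in o else 0
        g = lambda o, i=i, k=k: 1 if sum(1 for j in range(n) if j != i and j in o) >= k else 0
        assert cov(f, g) <= 1e-12, (i, k)
```

Every threshold covariance comes out nonpositive, as the TND property requires.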
3. Examples and comparison with other results
We now discuss several straightforward examples, and consider how our results compare with known bounds. First we mention a trivial lower bound, the direct equivalent of Theorem 2(vi) of Borovkov and Utev (1984):
Lemma 3.1.
For any non-negative, integer-valued random variable W with finite variance: R_W ≥ Var(W).

Proof

Writing λ = E W, consider the function g(x) = x − λ, so that ∆g(x) = 1. We know that E[g(W)^2] = Var(W) and E[(∆g(W))^2] = 1. It follows that R_W = sup_g { E[g(W)^2] / E[(∆g(W))^2] } ≥ Var(W).

Remark 3.2.
Note that Theorem 1.1 and Lemma 3.1 together imply that for W such that W* ≤_st W + 1 we have Var(W) ≤ R_W ≤ λ. These bounds are compatible, in the sense that for W satisfying this condition, taking g(x) = x − 1 in Equation (2) gives

    E[W(W − 1)] = E[W g(W)] = λ E g(W*) ≤ λ E g(W + 1) = λ E W = λ^2.

(The inequality follows by the definition of ≤_st.) Since Var(W) = E[W(W − 1)] + λ − λ^2, this is equivalent to writing Var(W) ≤ λ.

3.1. Comparison with other results

Note that our bounds can be contrasted with results such as Propositions 1 and 2 of Miclo (1999), which give upper and lower bounds on the Poincaré constant that differ by a constant multiplicative factor, as opposed to the additive gap found here.

In comparing our results with those of other authors who have considered the discrete Poincaré constant, we shall see that other methods can significantly overestimate the exact value of the constant. Using a generalization of Cheeger's inequality, we have the following bound.
Theorem 3.3.
Let W be a non-negative, integer-valued random variable with log-concave mass function P_W. Then

    R_W ≤ (1 − P_W(0)) / P_W(0).

Proof

Theorem 2.1 of Lawler and Sokal (1988) (a generalization of Cheeger's inequality) states that if there exists c such that the ratio

    r(u) := ( Σ_{y > u} P_W(y) ) / P_W(u) ≤ c  for all u ≥ 0,

then R_W ≤ c. In particular, if P_W is log-concave (a weaker restriction than ultra log-concave) then rearranging shows that r(u) is decreasing in u, so taking c = r(0) = (1 − P_W(0))/P_W(0), the result holds.

Note also that other authors, such as Bobkov and Götze (1999), give necessary and sufficient conditions for the discrete Poincaré constant to be finite without giving explicit upper bounds on the implied Poincaré constant.

We first consider the examples of Poisson random variables and sums of independent Bernoulli random variables. In these examples it is straightforward to compare the bounds given by our results and the work of other authors. Subsequently, we consider examples for which the bound given by Theorem 3.3 is less straightforward to evaluate.

Example
Let W ∼ Po(λ) have a Poisson distribution. It is straightforward to see that W* and W + 1 are equal in distribution, and so combining Theorem 1.1 and Lemma 3.1 we immediately return the well-known result (see Klaassen, 1985) that R_W = λ. (Alternatively, we can deduce that R_W ≤ λ using Corollary 2.4, since W is ULC(∞).)

Cheeger's inequality, Theorem 3.3, implies that R_W ≤ e^λ − 1, which is clearly far from optimal in the Poisson case.

Example
Let W = X_1 + ··· + X_n, where X_1, ..., X_n are independent Bernoulli random variables with P(X_j = 1) = p_j. Clearly X_1, ..., X_n are TND, and so combining Theorem 1.1 and Lemma 3.1 we have that

    Σ_{j=1}^n p_j (1 − p_j) ≤ R_W ≤ Σ_{j=1}^n p_j.

In the case p_i ≡ p, Theorem 3.3 gives R_W ≤ (1 − p)^{−n} − 1.

We refer the reader also to Section 1.4 of Ané et al. (2000), where another upper bound is derived, but using a different definition of the Poincaré constant. In that work, the endpoints of the support of W are identified, so their differencing operator is not equal to our operator ∆.

3.2. Further examples

We turn our attention now to some further examples in which we may apply Theorem 1.1.
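As a numerical aside (our sketch, not part of the paper), both examples above are easy to verify by machine: the Poisson size-bias identity holds term by term, and for a binomial the three bounds discussed can be compared directly:

```python
import math

def po_pmf(lam, k):
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Size-bias identity for the Poisson law:
# P(W* = j) = j P(W = j)/lam = P(W + 1 = j).
lam = 2.5
for j in range(1, 40):
    assert abs(j * po_pmf(lam, j) / lam - po_pmf(lam, j - 1)) < 1e-12

# Binomial example: Var(W) <= R_W <= E W, versus the Theorem 3.3 bound.
n, p = 20, 0.2
mean = n * p                    # upper bound on R_W (Theorem 1.1)
var = n * p * (1 - p)           # lower bound on R_W (Lemma 3.1)
cheeger = (1 - p) ** (-n) - 1   # (1 - P_W(0))/P_W(0) (Theorem 3.3)
assert var <= mean <= cheeger
```

For n = 20 and p = 0.2 this gives 3.2 ≤ R_W ≤ 4, while the Theorem 3.3 bound is roughly 85.7, illustrating how far the Cheeger-type estimate can be from the truth.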
Example
Let W have a hypergeometric distribution, so that if we distribute m balls into N urns (each with capacity for up to one ball), W counts the number of the first n of these urns which are occupied. We may write W = X_1 + ··· + X_n, where X_j is an indicator that the jth urn is occupied. The random variables X_1, ..., X_n are TND: see Barbour et al. (1992, Section 6.1). Hence, by Theorem 1.1 and Lemma 3.1,

    ( mn(N − n) / (N(N − 1)) ) (1 − m/N) ≤ R_W ≤ mn/N.

In particular, if m = O(N) and n = O(N) then R_W = O(N). See also Gao and Quastel (2003).

Example
Suppose we have n urns into which we distribute m_n = ⌊t n^{1−1/c}⌋ balls, for some constants c ∈ {1, 2, ...} and t > 0. Let μ = t^c/c! and let W count the number of urns with at least c balls. Papadatos and Papathanasiou (2002) show that W may be written as a sum of TND Bernoulli random variables and, furthermore, that Var(W) ≥ μ + O(n^{−1/c}) and E W = μ + O(n^{−1/c}); see (4.9) and Remark 4.1(a) of Papadatos and Papathanasiou (2002). Combining these results with our Theorem 1.1 and Lemma 3.1 we have that R_W = μ + O(n^{−1/c}).

Example
Consider Pólya sampling. We have an urn initially containing N balls of n different colours, with m_i balls of colour i. At each step, we draw a ball, note its colour and return it to the urn together with an additional ball of the same colour. We repeat for a total of r draws and let W count the number of colours not drawn during this process. We write W = X_1 + ··· + X_n, where X_j is an indicator that no ball of colour j is seen during the r draws. Since X_1, ..., X_n are negatively related (see Definition 2.6), they are also TND: see Section 6.3 of Barbour et al. (1992). Writing C(a, b) for the binomial coefficient, it is straightforward to see that

    p_j = E X_j = C(N − m_j + r − 1, r) / C(N + r − 1, r)

and, for j ≠ k,

    p_jk = E[X_j X_k] = C(N − m_j − m_k + r − 1, r) / C(N + r − 1, r).

Hence, by Theorem 1.1 and Lemma 3.1,

    Σ_{j=1}^n p_j (1 − p_j) + Σ_{j=1}^n Σ_{k ≠ j} (p_jk − p_j p_k) ≤ R_W ≤ Σ_{j=1}^n p_j.

Example
We treat Example 1 from Arratia et al. (1989). Consider the n-dimensional cube {0, 1}^n with each of the n 2^{n−1} edges independently assigned one of two directions with equal probability. Let W count the number of vertices at which all n incident edges are directed inward. Then W = X_1 + ··· + X_{2^n}, where X_j is an indicator that vertex j has all its incident edges directed inward. The random variables X_1, ..., X_{2^n} are negatively related (see Definition 2.6), and thus also TND. Clearly E X_j = 2^{−n} and, for j ≠ k,

    E[X_j X_k] = 0 if j and k share a common edge, and E[X_j X_k] = E X_j E X_k otherwise.

Hence E W = 1, Var(W) = 1 − (n + 1) 2^{−n} and, by Theorem 1.1 and Lemma 3.1,

    1 − (n + 1) 2^{−n} ≤ R_W ≤ 1.

Example
Suppose we distribute n points uniformly on the circumference of a circle of radius (2π)^{−1}. Let S_1, ..., S_n be the arc-length distances between successive points and define X_j = I(S_j < a) for some a > 0, the indicator that S_j falls below the threshold a, and write W = X_1 + ··· + X_n. Then X_1, ..., X_n are negatively related and thus TND (see Section 7.1 of Barbour et al., 1992). From calculations by Barbour et al. (1992, Section 7.2) we have that E W = n(1 − (1 − a)^{n−1}) and Var(W) ≥ (1 − na) E W. Hence, by Theorem 1.1 and Lemma 3.1,

    (1 − na) E W ≤ R_W ≤ E W.
Corollary 7.B.1(a) of Barbour et al. (1992) shows that if lim inf E W > 0, then the distribution of W converges to that of a Poisson random variable if and only if na → 0. This is closely related to the bounds we have obtained above on R_W. This connection will be further explored in Section 4.
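The moment computations in the random graph example above can be confirmed exhaustively for small n. The following sketch (ours, not part of the paper) enumerates all 2^12 orientations of the edges of the 3-cube:

```python
from itertools import product

n = 3
vertices = list(product((0, 1), repeat=n))
# Edges of the n-cube: pairs of vertices differing in exactly one coordinate.
edges = [(u, v) for i, u in enumerate(vertices) for v in vertices[i + 1:]
         if sum(a != b for a, b in zip(u, v)) == 1]
assert len(edges) == n * 2 ** (n - 1)  # 12 edges for n = 3

total = total_sq = 0
for bits in product((0, 1), repeat=len(edges)):
    # bits[e] == 0: edge e points toward its first endpoint, else the second.
    inward = {v: 0 for v in vertices}
    for (u, v), b in zip(edges, bits):
        inward[v if b else u] += 1
    w = sum(1 for v in vertices if inward[v] == n)  # all n edges inward
    total += w
    total_sq += w * w

configs = 2 ** len(edges)
ew = total / configs
var = total_sq / configs - ew ** 2
assert ew == 1.0
assert abs(var - (1 - (n + 1) * 2 ** (-n))) < 1e-12  # Var(W) = 1/2 for n = 3
```

The enumeration recovers E W = 1 and Var(W) = 1 − (n + 1)2^{−n} exactly for n = 3.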
4. Poisson convergence and the Poincaré constant
Prakasa Rao and Sreehari (1987) show that the property R_W = Var(W) characterizes the Poisson distribution (up to integer shifts). Indeed, if P(W = 0) > 0, then Theorem 1 of Prakasa Rao and Sreehari (1987) shows that R_W = Var(W) implies that W is Poisson distributed, and hence we may write R_W = Var(W) = E W. We will show here how closeness of R_W to Var(W) and E W implies closeness of W to a Poisson random variable, noting that we do indeed need both of these conditions to guarantee that we are close to a Poisson distribution (rather than a shifted Poisson distribution). Our work is motivated by that of Utev (1989) relating analogous characterisations for the normal and Poisson distributions to results on convergence to those distributions. The results of this section serve to illustrate the benefit of sharp bounds on the Poincaré constant.

We begin by proving Theorem 1.2. Many of the definitions and equalities used in the proof are drawn from Stein's method for Poisson approximation. See Barbour et al. (1992) for an introduction to these ideas.

Proof of Theorem 1.2
For any non-negative, integer-valued random variable W and g ∈ G(W) we have, by the definition of R_W and since E g(W) = 0,

    Var(g(W)) ≤ R_W E[(∆g(W))^2].    (7)

Writing λ for E W, we apply this with the choice

    g(x) = x + y f_A(x) − λ − y E f_A(W),    (8)

where y ∈ R will be determined later, A ⊆ Z_+ and f_A : Z_+ → R solves the Chen–Stein equation

    I(x ∈ A) − P(Po(λ) ∈ A) = λ f_A(x + 1) − x f_A(x),

so that

    d_TV(L(W), Po(λ)) = sup_{A ⊆ Z_+} | λ E f_A(W + 1) − E[W f_A(W)] |.    (9)

We will need the property that

    sup_{A ⊆ Z_+} sup_{x ∈ Z_+} |∆f_A(x)| ≤ (1 − e^{−λ})/λ;    (10)

see Barbour et al. (1992, Lemma 1.1.1) and Papadatos and Papathanasiou (2002, Equation (1.4)).

Now, applying (7) with the choice (8) gives us that α y^2 + 2βy + γ ≥ 0, where

    α = R_W E[(∆f_A(W))^2] − Var(f_A(W)),
    β = R_W E[∆f_A(W)] − E[W f_A(W)] + λ E f_A(W),
    γ = R_W − Var(W).

Since this quadratic function can have at most one real root, |β| ≤ √(αγ). That is,

    | (E W) E f_A(W + 1) − E[W f_A(W)] + (R_W − E W) E[∆f_A(W)] |
        ≤ √(R_W − Var(W)) √( R_W E[(∆f_A(W))^2] − Var(f_A(W)) )
        ≤ √(R_W − Var(W)) √(R_W) (1 − e^{−λ})/λ,

where the last inequality follows from (10). Combining this with the triangle inequality, (9) and (10), we obtain Theorem 1.2.

Note that we can treat the RHS of Equation (4) as a function of R_W. When Var(W) ≤ R_W ≤ λ (for example when W* ≤_st W + 1, as discussed in Remark 3.2), this function is increasing in R_W over this range (since it is concave and increasing at R_W = λ). Hence, providing tighter bounds from above on R_W would give tighter bounds on the rate of Poisson convergence.

In view of Corollary 2.2 and Corollary 2.4 we obtain the following:

Corollary 4.1.
Let {W_n : n ≥ 1} be non-negative, integer-valued random variables such that lim_{n→∞} |E W_n − Var(W_n)| = 0. Suppose that any of the following three conditions holds:

(i) lim sup_n R_{W_n} < ∞ and lim_{n→∞} |R_{W_n} − Var(W_n)| = 0;
(ii) lim sup_n E W_n < ∞ and, for each n, W_n may be written as a sum of TND Bernoulli random variables; or
(iii) lim sup_n E W_n < ∞ and, for each n, W_n is ULC(∞).

Then d_TV(L(W_n), Po(E W_n)) → 0 as n → ∞.

Proof
Note that since f(t) = (1 − e^{−t})/t is decreasing in t, we can bound it by f(t) ≤ lim_{t→0} f(t) = 1. Hence in Theorem 1.2 we need only show that the term in braces converges to zero. Under Condition (i) this is automatic. Further, note that Conditions (ii) and (iii) each imply that Var(W_n) ≤ R_{W_n} ≤ E W_n; this follows from Lemma 3.1 together with Corollary 2.2 and Corollary 2.4 respectively. Thus, by a sandwich argument, we know that Conditions (ii) and (iii) each imply Condition (i).
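The Stein factor bound (10) underpinning the proof of Theorem 1.2 can be checked numerically. The Chen–Stein equation determines f_A by a forward recursion, since the x = 0 case fixes f_A(1) and f_A(0) plays no role; the sketch below (ours, not part of the paper) does this for a few sample sets A, with the truncation level K an assumption chosen to keep floating-point amplification negligible:

```python
import math

lam = 5.0
K = 20  # truncation level (assumption); keeps rounding amplification small

def po_pmf(k):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def stein_solution(A):
    # Solve lam*f(x+1) - x*f(x) = 1{x in A} - P(Po(lam) in A) by forward
    # recursion: the x = 0 case fixes f(1), and f(0) plays no role.
    pA = sum(po_pmf(k) for k in A)
    f = [0.0] * (K + 2)
    for x in range(K + 1):
        f[x + 1] = (x * f[x] + (1.0 if x in A else 0.0) - pA) / lam
    return f

stein_factor = (1 - math.exp(-lam)) / lam  # the bound in (10)
for A in [{0}, {3}, set(range(0, K + 1, 2)), set(range(8, K + 1))]:
    f = stein_solution(A)
    sup_delta = max(abs(f[x + 1] - f[x]) for x in range(1, K + 1))
    assert sup_delta <= stein_factor + 1e-6, (A, sup_delta)
```

For each sample set the supremum of |∆f_A| stays below (1 − e^{−λ})/λ, as (10) asserts.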
5. Proof of Theorem 1.1
In this section we prove the following result, which gives Theorem 1.1 as an immediate corollary.
Theorem 5.1.
Let W be a non-negative, integer-valued random variable with mean λ, and let Z ≥ 0 be a random variable, defined on the same space as W, such that W* ≤_st W + Z. Then for any g ∈ G(W),

    Var g(W) = E[g(W)^2] ≤ λ Σ_{j=0}^∞ (∆g(j))^2 P(j − Z < W ≤ j).

Our proof will make use of Klaassen's kernel function, given by equation (2.17) of Klaassen (1985):

    χ(i, j) = I(⌊x⌋ ≤ j < i) − I(i ≤ j < ⌊x⌋) − (x − ⌊x⌋) I(j = ⌊x⌋),    (11)

for some x ∈ R. We begin with the following lemma.

Lemma 5.2.
Let W be a non-negative, integer-valued random variable. Then for any g ∈ G(W) and any x ∈ R,

    E[g(W)^2] ≤ Σ_{j=0}^∞ (∆g(j))^2 Σ_{i=0}^∞ (i − x) P(W = i) χ(i, j).    (12)

Proof

For any given integer i, by considering the cases {i < x}, {i > x} and {i = x} separately, we deduce that for any function h:

    Σ_{j=0}^∞ χ(i, j) h(j) = −Σ_{j=i}^{⌊x⌋−1} h(j) + (⌊x⌋ − x) h(⌊x⌋)   for i < ⌊x⌋,
    Σ_{j=0}^∞ χ(i, j) h(j) = (⌊x⌋ − x) h(⌊x⌋)                          for i = ⌊x⌋,
    Σ_{j=0}^∞ χ(i, j) h(j) = Σ_{j=⌊x⌋}^{i−1} h(j) + (⌊x⌋ − x) h(⌊x⌋)   for i > ⌊x⌋.

Taking h ≡ ∆g we deduce that Σ_{j=0}^∞ χ(i, j) ∆g(j) = g(i) − g*, where g* = g(⌊x⌋) + ∆g(⌊x⌋)(x − ⌊x⌋). In particular, taking h(j) ≡ 1 gives Σ_{j=0}^∞ χ(i, j) = i − x. Observe that by the Cauchy–Schwarz inequality this means that

    (g(i) − g*)^2 = ( Σ_{j=0}^∞ χ(i, j) ∆g(j) )^2 ≤ ( Σ_{j=0}^∞ χ(i, j) ) ( Σ_{j=0}^∞ χ(i, j) (∆g(j))^2 ) = (i − x) Σ_{j=0}^∞ χ(i, j) (∆g(j))^2.    (13)

Although χ(i, j) is a signed measure in j, the use of the Cauchy–Schwarz inequality is justified since χ(i, j) has constant sign for any given i: if i > ⌊x⌋ then χ(i, j) ≥ 0 for all j, and otherwise χ(i, j) ≤ 0 for all j.

The lemma follows on combining (13) with the observation that, for all g ∈ G(W),

    E[g(W)^2] ≤ Σ_{i=0}^∞ P(W = i) (g(i) − g*)^2,

and reversing the order of summation in the resulting expression.

Theorem 5.1 follows immediately from Lemma 5.2. To see this, choose x = E W = λ. Then, using the definition of the size-bias transform (2), for a fixed j ∈ Z_+ the inner sum in Equation (12) can be expressed as

    E[W χ(W, j)] − λ E χ(W, j) = λ ( E χ(W*, j) − E χ(W, j) ) ≤ λ E[χ(W + Z, j) − χ(W, j)],    (14)

using the stochastic ordering assumption of Theorem 5.1 and the fact that χ(i, j) is increasing in i for fixed j. Using (11), and assuming first that j ≥ ⌊x⌋, we observe that for any w, z ∈ Z_+,

    χ(w + z, j) − χ(w, j) = I(j < w + z) − I(j < w) = I(w ≤ j < w + z).    (15)

Similarly, Equation (15) also holds in the case j < ⌊x⌋. Substituting this in (14) we obtain λ P(W ≤ j < W + Z) = λ P(j − Z < W ≤ j), as required to complete the proof.

Acknowledgement

We wish to thank anonymous referees for pointing us towards useful references, and for comments that helped improve the presentation of this work.
References
C. Ané, S. Blachère, D. Chafaï, P. Fougères, I. Gentil, F. Malrieu, C. Roberto and G. Scheffer. Sur les Inégalités de Sobolev Logarithmiques. Panoramas et Synthèses, 10. Société Mathématique de France, Paris, 2000.
R. Arratia, L. Goldstein and L. Gordon. Two moments suffice for Poisson approximations: the Chen–Stein method. Ann. Probab., 17(1):9–25, 1989.
D. Bakry and M. Émery. Diffusions hypercontractives. In Séminaire de probabilités, XIX, 1983/84, volume 1123 of Lecture Notes in Math., pages 177–206. Springer, Berlin, 1985.
A. D. Barbour, L. Holst and S. Janson. Poisson Approximation. Oxford University Press, Oxford, 1992.
A. D. Barbour and A. Xia. On Stein's factors for Poisson approximation in Wasserstein distance. Bernoulli, 12:943–954, 2006.
S. Bobkov and F. Götze. Discrete isoperimetric and Poincaré-type inequalities. Probab. Theory Related Fields, 114:245–277, 1999.
A. Borovkov and S. Utev. On an inequality and a related characterisation of the normal distribution. Theory Probab. Appl., 28(2):219–228, 1984.
F. A. Daly. Compound Poisson approximation with association or negative association via Stein's method. Preprint, 2010.
F. Daly, C. Lefèvre and S. Utev. Stein's method and stochastic orderings. Adv. Appl. Probab., 44(2):343–372, 2012.
F. Gao and J. Quastel. Exponential decay of entropy in the random transposition and Bernoulli–Laplace models. Ann. Appl. Probab., 13(4):1591–1600, 2003.
K. Joag-Dev and F. Proschan. Negative association of random variables, with applications. Ann. Statist., 11(1):286–295, 1983.
C. Klaassen. On an inequality of Chernoff. Ann. Probab., 13(3):966–974, 1985.
G. F. Lawler and A. D. Sokal. Bounds on the L^2 spectrum for Markov chains and Markov processes: a generalization of Cheeger's inequality. Trans. Amer. Math. Soc., 309(2):557–580, 1988.
T. M. Liggett. Ultra logconcave sequences and negative dependence. J. Combin. Theory Ser. A, 79(2):315–325, 1997.
L. Miclo. An example of application of discrete Hardy's inequalities. Markov Process. Related Fields, 5(3):319–330, 1999.
N. Papadatos and V. Papathanasiou. Poisson approximation for a sum of dependent indicators: an alternative approach. Adv. Appl. Probab., 34:609–625, 2002.
R. Pemantle. Towards a theory of negative dependence. J. Math. Phys., 41(3):1371–1390, 2000.
B. L. S. Prakasa Rao and M. Sreehari. On a characterization of Poisson distribution through inequalities of Chernoff-type. Austral. J. Statist., 29(1):38–41, 1987.
M. Shaked and J. G. Shanthikumar. Stochastic Orders. Springer, New York, 2007.
S. A. Utev. Probability problems connected with a certain integrodifferential inequality. 1989.