A Constructive Proof of a Concentration Bound for Real-Valued Random Variables∗

Wolfgang Mulzer and Natalia Shenkman∗∗

Freie Universität Berlin
Abstract.
Almost 10 years ago, Impagliazzo and Kabanets (2010) gave a new combinatorial proof of Chernoff's bound for sums of bounded independent random variables. Unlike previous methods, their proof is constructive. This means that it provides an efficient randomized algorithm for the following task: given a set of Boolean random variables whose sum is not concentrated around its expectation, find a subset of statistically dependent variables. However, the algorithm of Impagliazzo and Kabanets (2010) is given only for the Boolean case. On the other hand, the general proof technique works also for real-valued random variables, even though for this case, Impagliazzo and Kabanets (2010) obtain a concentration bound that is slightly suboptimal.

Herein, we revisit both these issues and show that it is relatively easy to extend the Impagliazzo-Kabanets algorithm to real-valued random variables and to improve the corresponding concentration bound by a constant factor.

Keywords: generalized Chernoff-Hoeffding bound, concentration bound, randomized algorithm.
1. Introduction

The weak law of large numbers is a central pillar of modern probability theory: any sample average of independent random variables converges in probability to its expected value. This qualitative statement is made more precise by concentration bounds, which quantify the speed of convergence for certain prominent special cases. Due to their wide applicability in mathematics, statistics, and computer science, a whole industry of concentration bounds has developed over the last decades. By now, the literature contains myriads of different bounds, satisfying various needs and proved in numerous ways; see, e.g., Chernoff (1952), Hoeffding (1963), Schmidt et al. (1995), Panconesi and Srinivasan (1997), or the interesting and extensive textbooks and surveys by McDiarmid (1998), Chung and Lu (2006), Alon and Spencer (2008), and Mulzer (2018).

In theoretical computer science, a central application area of concentration bounds lies in the design and analysis of randomized algorithms. About 10 years ago, Impagliazzo and Kabanets (2010) went in the other direction, showing that methods from theoretical computer science can be useful in obtaining new proofs for concentration bounds. This led to a new, algorithmic proof of a generalized Chernoff bound for Boolean random variables, which the authors “consider more revealing and intuitive than the standard Bernstein-style proofs, and hope that its constructiveness will have other applications in computer science”.

Impagliazzo and Kabanets were able to extend their combinatorial approach to real-valued bounded random variables. However, in order to do this, they had to use a slightly different argument. This came at the cost of a sub-optimal multiplicative constant in the bound. Furthermore, with the new argument, it was not clear how to generalize the main randomized algorithm to the real-valued case. Here, we present a constructive proof of Chernoff's bound that remedies both these issues, giving the same bound and the same algorithmic result as are known for the Boolean case.

∗ This research was supported in part by ERC StG 757609. It is based on the Master's thesis of the second author, which was defended on 10 January 2019 at Freie Universität Berlin.
∗∗ Corresponding author.
2. The Concentration Bound

The main result of Impagliazzo and Kabanets for real-valued bounded random variables is stated as Theorem 2.1 below. Essentially, this theorem can be seen as a very simple adaptation of the famous Chernoff-Hoeffding bound (Shenkman, 2018, Computation (4.6)). In their paper, Impagliazzo and Kabanets (2010) proved the bound with a sub-optimal multiplicative constant. Very recently, Pelekis and Ramon (2017) claimed the bound with an optimal constant, but, unfortunately, their argument is flawed (Shenkman, 2018, Remark 5.10). We present a new proof of Theorem 2.1 that leads to an optimal multiplicative constant and a randomized algorithm for real-valued bounded random variables. Remarkably, our result is obtained by following the original approach of Impagliazzo and Kabanets for the Boolean case, and pushing the calculation through to the end thanks to Lemma 2.6.

In what follows, we set $[n] := \{1, \dots, n\}$, for $n \in \mathbb{N}$, and use $\mathbb{E}[\cdot]$ for the expectation operator. Moreover, for $p, q \in [0, 1]$, we denote by $D(p \,\|\, q) := p \ln(p/q) + (1 - p) \ln((1 - p)/(1 - q))$ the binary relative entropy, with the conventions $0 \ln 0 = 0$ and $\ln(x/0) = \infty$, for all $x \in (0, 1]$.
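For concreteness, the binary relative entropy and its two conventions can be encoded in a few lines. The following helper is illustrative only (it is not part of the paper's formal development); the later numerical sanity checks reuse it.

```python
import math

def binary_kl(p: float, q: float) -> float:
    """Binary relative entropy D(p || q) with the conventions
    0 * ln(0) = 0 and ln(x/0) = +infinity for x in (0, 1]."""
    def term(a: float, b: float) -> float:
        if a == 0.0:
            return 0.0          # convention: 0 * ln(0/b) = 0
        if b == 0.0:
            return math.inf     # convention: ln(a/0) = +infinity
        return a * math.log(a / b)
    return term(p, q) + term(1.0 - p, 1.0 - q)
```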
Theorem 2.1. Suppose we are given a sequence $X_1, \dots, X_n$ of $n$ random variables, and $2n + 1$ real constants $a_1, \dots, a_n, b, c_1, \dots, c_n$ such that $a_1, \dots, a_n \le 0$, $b > 0$, and $a_i \le X_i \le a_i + b$ almost surely, for all $i \in [n]$, and
\[
  \mathbb{E}\Big[\prod_{i \in S} X_i\Big] \le \prod_{i \in S} c_i, \tag{2.1}
\]
for all $S \subseteq [n]$. Set $X := \sum_{i=1}^n X_i$, $a := (1/n) \sum_{i=1}^n a_i$, and $c := (1/n) \sum_{i=1}^n c_i$. Then, for any $t \in [0, b + a - c]$, we have
\[
  \mathbb{P}\big(X \ge (c + t) n\big) \le e^{-D\left(\frac{c - a + t}{b} \,\middle\|\, \frac{c - a}{b}\right) n}. \tag{2.2}
\]
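The bound can be sanity-checked numerically. The following sketch, which reuses binary_kl and the math import from above, simulates one simple setting covered by the theorem: independent $X_i$, here uniform on $[a_i, a_i + b]$, with $c_i := \mathbb{E}[X_i]$, so that (2.1) holds with equality by independence. The concrete parameter values are our own illustrative assumptions, not taken from the paper.

```python
import random

def tail_vs_bound(n=50, b=1.0, t=0.1, trials=20_000, seed=1):
    rng = random.Random(seed)
    a_i = [-0.2] * n                          # a_i <= 0, as the theorem requires
    c_i = [ai + b / 2 for ai in a_i]          # c_i = E[X_i] for X_i ~ U[a_i, a_i + b]
    a, c = sum(a_i) / n, sum(c_i) / n
    hits = sum(sum(rng.uniform(ai, ai + b) for ai in a_i) >= (c + t) * n
               for _ in range(trials))
    bound = math.exp(-binary_kl((c - a + t) / b, (c - a) / b) * n)
    return hits / trials, bound               # empirical tail vs. right-hand side of (2.2)

print(tail_vs_bound())   # the empirical tail should not exceed the bound
```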
Remark 2.2. The condition $a_i \le 0$, for $i \in [n]$, in Theorem 2.1 can be overcome by imposing a stronger condition of dependence than (2.1); see (Shenkman, 2018, Theorem 3.11).

The remainder of this section is dedicated to the proof of Theorem 2.1, which follows the ideas found in Impagliazzo and Kabanets (2010). We first deal with the case where $c \in (a, b + a)$ and $t < b + a - c$. We fix a parameter $\lambda \in [0, 1)$, and we consider the following random process: for $i = 1, \dots, n$, we sample the random variable $X_i$. Then, we normalize the resulting values to obtain a sequence $\widetilde{X}_1, \dots, \widetilde{X}_n$ of probabilities. We use these probabilities to sample $n$ conditionally independent Boolean random variables $Y_1, \dots, Y_n$. Finally, we go through the $Y_i$, and for each $i$, we set $Y_i$ to $1$ with probability $1 - \lambda$ and we keep it unchanged with probability $\lambda$. Now, the aim is to bound the expected value $\mathbb{E}\big[\prod_{i=1}^n Y_i\big]$ in two different ways. On the one hand, it will turn out that (2.1) implies that $\mathbb{E}\big[\prod_{i=1}^n Y_i\big]$ can be upper-bounded by $(\lambda \tilde{c} + 1 - \lambda)^n$, the expectation of the product of $n$ independent Boolean random variables that are set to $1$ with probability $\lambda \tilde{c} + 1 - \lambda$, where $\tilde{c}$ is the normalized average of the $c_i$. On the other hand, the expectation $\mathbb{E}\big[\prod_{i=1}^n Y_i\big]$ can be lower-bounded by $\mathbb{P}\big(X \ge (c + t) n\big)$ times the conditional expectation given the event $X \ge (c + t) n$, which turns out to be at least $(1 - \lambda)^{n - (\tilde{c} + \tilde{t}) n}$, where $\tilde{t}$ is the normalized deviation parameter $t$. Combining the two bounds and optimizing for $\lambda$ will then lead to (2.2).

We now proceed with the details. For $i \in [n]$, we define the normalized variables $\widetilde{X}_i := (X_i - a_i)/b$ and the normalized constants $\tilde{c}_i := (c_i - a_i)/b$, as well as $\tilde{c} := (c - a)/b$ and $\tilde{t} := t/b$. We define $n$ Boolean random variables $Y_i \sim \mathrm{Bernoulli}(\widetilde{X}_i)$, for $i \in [n]$, that are conditionally independent given $\widetilde{X}_1, \dots, \widetilde{X}_n$. In other words, $Y_1, \dots, Y_n$ are independent conditionally on the $\sigma$-algebra generated by the set $\{\widetilde{X}_i \mid i \in [n]\}$. Furthermore, let $\lambda \in [0, 1)$ be fixed, and let $\mathcal{I}$ be a random variable, independent of $\widetilde{X}_1, \dots, \widetilde{X}_n$ and of $Y_1, \dots, Y_n$, taking values in $\{S \mid S \subseteq [n]\}$ with the probability mass function
\[
  \mathbb{P}(\mathcal{I} = S) = \lambda^{|S|} (1 - \lambda)^{n - |S|},
\]
for all $S \subseteq [n]$. If $\mathbb{P}(X \ge (c + t) n) = 0$, then (2.2) holds trivially, so from now on we assume that $\mathbb{P}(X \ge (c + t) n) > 0$. As mentioned, our goal is to bound the expectation $\mathbb{E}\big[\prod_{i \in \mathcal{I}} Y_i\big]$ in two different ways, and we start with the upper bound.
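Before turning to the lemmas, it may help to see the whole coupling in code. The following is a minimal, illustrative sketch of one round of the process just described; the function sample_X, standing in for oracle access to the joint distribution of the $X_i$, is an assumption of ours, not part of the paper.

```python
import random

def one_round(sample_X, a_i, b, lam, rng):
    """One round of the proof's random process.
    sample_X() returns one realization of (X_1, ..., X_n)."""
    xs = sample_X()
    x_tilde = [(x - a) / b for x, a in zip(xs, a_i)]       # normalize to [0, 1]
    ys = [1 if rng.random() < p else 0 for p in x_tilde]   # Y_i ~ Bernoulli(X~_i), cond. indep.
    I = [i for i in range(len(xs)) if rng.random() < lam]  # P(I = S) = lam^|S| (1-lam)^(n-|S|)
    return all(ys[i] == 1 for i in I)                      # the Boolean product over i in I
```

For instance, sample_X could draw $n$ independent uniforms on $[a_i, a_i + b]$, as in the sanity check above.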
The first lemma shows that the condition (2.1) on the moments of the $X_i$ carries over to the normalized variables $\widetilde{X}_i$.

Lemma 2.3. For any $S \subseteq [n]$, we have
\[
  \mathbb{E}\Big[\prod_{i \in S} \widetilde{X}_i\Big] \le \prod_{i \in S} \tilde{c}_i.
\]
Proof. The lemma follows quickly by plugging in the definitions. More precisely, we have
\begin{align*}
  \mathbb{E}\Big[\prod_{i \in S} \widetilde{X}_i\Big]
  &= \mathbb{E}\Big[\prod_{i \in S} \frac{X_i - a_i}{b}\Big] && \text{(definition of $\widetilde{X}_i$)} \\
  &= \frac{1}{b^{|S|}} \sum_{I \subseteq S} \Big(\prod_{j \in S \setminus I} (-a_j)\Big)\, \mathbb{E}\Big[\prod_{i \in I} X_i\Big] && \text{(distributive law, linearity of expectation)} \\
  &\le \frac{1}{b^{|S|}} \sum_{I \subseteq S} \Big(\prod_{j \in S \setminus I} (-a_j)\Big) \Big(\prod_{i \in I} c_i\Big) && \text{((2.1) and $a_i \le 0$, for $i \in [n]$)} \\
  &= \frac{1}{b^{|S|}} \prod_{i \in S} (c_i - a_i) = \prod_{i \in S} \tilde{c}_i, && \text{(distributive law, definition of $\tilde{c}_i$)}
\end{align*}
as claimed.
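The two uses of the distributive law rest on the identity $\sum_{I \subseteq S} \prod_{j \in S \setminus I} (-a_j) \prod_{i \in I} c_i = \prod_{i \in S} (c_i - a_i)$, which a tiny brute-force check illustrates (the test values below are arbitrary assumptions of ours):

```python
from itertools import chain, combinations
import math

def subsets(S):
    """All subsets of the index collection S."""
    return chain.from_iterable(combinations(S, r) for r in range(len(S) + 1))

a = [-0.3, -0.1, -0.7]   # arbitrary test values with a_j <= 0
c = [0.4, 0.2, 0.5]
S = range(len(a))
lhs = sum(math.prod(-a[j] for j in S if j not in I) * math.prod(c[i] for i in I)
          for I in map(set, subsets(S)))
rhs = math.prod(c[i] - a[i] for i in S)
assert abs(lhs - rhs) < 1e-12
```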
The normalized moment condition from Lemma 2.3 now shows that the expectation of $\prod_{i \in S} Y_i$ can be upper-bounded by the expectation of a product of independent Boolean random variables with success probabilities $\tilde{c}_i$.

Lemma 2.4. For any $S \subseteq [n]$, we have
\[
  \mathbb{E}\Big[\prod_{i \in S} Y_i\Big] \le \prod_{i \in S} \tilde{c}_i.
\]
Proof.
The lemma follows from the conditional independence of the $Y_i$. We have
\begin{align*}
  \mathbb{E}\Big[\prod_{i \in S} Y_i\Big]
  &= \mathbb{E}\bigg[\mathbb{E}\Big[\prod_{i \in S} Y_i \,\Big|\, \widetilde{X}_1, \dots, \widetilde{X}_n\Big]\bigg] && \text{(law of total expectation)} \\
  &= \mathbb{E}\bigg[\prod_{i \in S} \mathbb{E}\big[Y_i \,\big|\, \widetilde{X}_1, \dots, \widetilde{X}_n\big]\bigg] && \text{(conditional independence)} \\
  &= \mathbb{E}\Big[\prod_{i \in S} \widetilde{X}_i\Big] \le \prod_{i \in S} \tilde{c}_i, && \text{(definition of $Y_i$, Lemma 2.3)}
\end{align*}
as desired.

To achieve an upper bound for $\mathbb{E}\big[\prod_{i \in \mathcal{I}} Y_i\big]$, we must still account for the random subset $\mathcal{I}$. The next lemma does this; essentially, it says that we can think of $\mathbb{E}\big[\prod_{i \in \mathcal{I}} Y_i\big]$ as the expected product of $n$ independent Boolean random variables with success probability $\lambda \tilde{c} + 1 - \lambda$.
Lemma 2.5. We have
\[
  \mathbb{E}\Big[\prod_{i \in \mathcal{I}} Y_i\Big] \le \big(\lambda \tilde{c} + 1 - \lambda\big)^n.
\]
Proof. We proceed as follows:
\begin{align*}
  \mathbb{E}\Big[\prod_{i \in \mathcal{I}} Y_i\Big]
  &= \sum_{S \subseteq [n]} \mathbb{P}\big(\mathcal{I} = S\big)\, \mathbb{E}\Big[\prod_{i \in S} Y_i\Big] && \text{(law of total expectation)} \\
  &\le \sum_{S \subseteq [n]} \lambda^{|S|} (1 - \lambda)^{n - |S|} \prod_{i \in S} \tilde{c}_i && \text{(definition of $\mathcal{I}$ and Lemma 2.4)} \\
  &= \sum_{S \subseteq [n]} \Big(\prod_{i \in S} \lambda \tilde{c}_i\Big) \Big(\prod_{i \in [n] \setminus S} (1 - \lambda)\Big) && \text{(regrouping)} \\
  &= \prod_{i=1}^n \big(\lambda \tilde{c}_i + 1 - \lambda\big) && \text{(distributive law)} \\
  &\le \bigg(\frac{1}{n} \sum_{i=1}^n \big(\lambda \tilde{c}_i + 1 - \lambda\big)\bigg)^n && \text{(AM--GM inequality)} \\
  &= \big(\lambda \tilde{c} + 1 - \lambda\big)^n, && \text{(definition of $\tilde{c}$)}
\end{align*}
as stated.
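A quick Monte Carlo illustration of Lemma 2.5 is possible under the simplifying assumption (ours, for the sake of the example) that the $\widetilde{X}_i$ are themselves independent and uniform on $[0, 1]$, so that Lemma 2.3 holds with $\tilde{c}_i = 1/2$:

```python
import random

def check_lemma_2_5(n=8, lam=0.3, trials=200_000, seed=2):
    rng = random.Random(seed)
    c_tilde = 0.5                                          # c~_i = E[X~_i] for uniform X~_i
    total = 0
    for _ in range(trials):
        x = [rng.random() for _ in range(n)]               # X~_i ~ U[0, 1], independent
        y = [1 if rng.random() < xi else 0 for xi in x]    # Y_i | X~ ~ Bernoulli(X~_i)
        I = [i for i in range(n) if rng.random() < lam]    # P(I = S) = lam^|S| (1-lam)^(n-|S|)
        total += all(y[i] == 1 for i in I)                 # product of Y_i over I
    bound = (lam * c_tilde + 1 - lam) ** n
    return total / trials, bound   # empirical value <= bound (equality here, by independence)

print(check_lemma_2_5())
```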
We turn to the lower bound for $\mathbb{E}\big[\prod_{i \in \mathcal{I}} Y_i\big]$. For this, we first bound the conditional expectation given the event $X \ge (c + t) n$, assuming that $\mathbb{P}\big(X \ge (c + t) n\big) > 0$.

Lemma 2.6. We have
\[
  \mathbb{E}\Big[\prod_{i \in \mathcal{I}} Y_i \,\Big|\, X \ge (c + t) n\Big] \ge (1 - \lambda)^{n - (\tilde{c} + \tilde{t}) n}.
\]
Proof. First, we note that for $\lambda \in [0, 1)$ and $x \in [0, 1]$, the binomial series expansion gives
\[
  (1 - \lambda)^{1 - x} = 1 - (1 - x) \lambda + \sum_{i=2}^{\infty} \binom{1 - x}{i} (-\lambda)^i \le 1 - (1 - x) \lambda, \tag{2.3}
\]
since $\binom{1 - x}{i} = \prod_{j=0}^{i-1} (1 - x - j) / i!$, and thus $\sum_{i=2}^{\infty} \binom{1 - x}{i} (-\lambda)^i \le 0$. The derivation proceeds as follows:
\begin{align*}
  \mathbb{E}\Big[\prod_{i \in \mathcal{I}} Y_i \,\Big|\, X \ge (c + t) n\Big]
  &= \mathbb{E}\bigg[\mathbb{E}\Big[\prod_{i \in \mathcal{I}} Y_i \,\Big|\, X_1, \dots, X_n\Big] \,\bigg|\, X \ge (c + t) n\bigg] && \text{(law of total expectation)} \\
  &= \mathbb{E}\Big[\prod_{i \in \mathcal{I}} \widetilde{X}_i \,\Big|\, X \ge (c + t) n\Big] && \text{(def.\ and cond.\ independence of the $Y_i$)} \\
  &= \sum_{S \subseteq [n]} \mathbb{P}\big(\mathcal{I} = S\big)\, \mathbb{E}\Big[\prod_{i \in S} \widetilde{X}_i \,\Big|\, X \ge (c + t) n\Big] && \text{(law of total expectation)} \\
  &= \mathbb{E}\bigg[\sum_{S \subseteq [n]} \mathbb{P}\big(\mathcal{I} = S\big) \prod_{i \in S} \widetilde{X}_i \,\bigg|\, X \ge (c + t) n\bigg] && \text{(linearity of expectation)} \\
  &= \mathbb{E}\bigg[\sum_{S \subseteq [n]} \Big(\prod_{i \in S} \lambda \widetilde{X}_i\Big) \Big(\prod_{i \in [n] \setminus S} (1 - \lambda)\Big) \,\bigg|\, X \ge (c + t) n\bigg] && \text{(definition of $\mathcal{I}$, regrouping)} \\
  &= \mathbb{E}\bigg[\prod_{i=1}^n \big(\lambda \widetilde{X}_i + 1 - \lambda\big) \,\bigg|\, X \ge (c + t) n\bigg] && \text{(distributive law)} \\
  &\ge \mathbb{E}\bigg[\prod_{i=1}^n (1 - \lambda)^{1 - \widetilde{X}_i} \,\bigg|\, X \ge (c + t) n\bigg] && \text{(by (2.3))} \\
  &= \mathbb{E}\Big[(1 - \lambda)^{n - \frac{X - na}{b}} \,\Big|\, X \ge (c + t) n\Big] && \text{(definition of $\widetilde{X}_i$, $a$)} \\
  &\ge (1 - \lambda)^{n - (\tilde{c} + \tilde{t}) n}, && \text{(definition of $\tilde{c}$, $\tilde{t}$)}
\end{align*}
as desired.

Now, combining Lemmas 2.5 and 2.6 with the law of total expectation, we obtain that for any $\lambda \in [0, 1)$,
\begin{align*}
  (\lambda \tilde{c} + 1 - \lambda)^n
  &\ge \mathbb{E}\Big[\prod_{i \in \mathcal{I}} Y_i\Big] && \text{(Lemma 2.5)} \\
  &\ge \mathbb{E}\Big[\prod_{i \in \mathcal{I}} Y_i \,\Big|\, X \ge (c + t) n\Big]\, \mathbb{P}\big(X \ge (c + t) n\big) && \text{(law of total expectation)} \\
  &\ge (1 - \lambda)^{n - (\tilde{c} + \tilde{t}) n}\, \mathbb{P}\big(X \ge (c + t) n\big), && \text{(Lemma 2.6)}
\end{align*}
and hence, for any $\lambda \in [0, 1)$,
\[
  \mathbb{P}\big(X \ge (c + t) n\big) \le \left(\frac{\lambda \tilde{c} + 1 - \lambda}{(1 - \lambda)^{1 - \tilde{c} - \tilde{t}}}\right)^n. \tag{2.4}
\]
A straightforward calculation shows that $g(\lambda) := (\lambda \tilde{c} + 1 - \lambda) / (1 - \lambda)^{1 - \tilde{c} - \tilde{t}}$ is minimized at $\lambda^* := \tilde{t} / \big((1 - \tilde{c})(\tilde{c} + \tilde{t})\big) \in [0, 1)$ and that
\[
  g(\lambda^*) = \left(\frac{\tilde{c}}{\tilde{c} + \tilde{t}}\right)^{\tilde{c} + \tilde{t}} \left(\frac{1 - \tilde{c}}{1 - \tilde{c} - \tilde{t}}\right)^{1 - \tilde{c} - \tilde{t}} = e^{-D(\tilde{c} + \tilde{t} \,\|\, \tilde{c})}.
\]
To complete the proof, it remains to consider the cases where $c = a$ or $t = b + a - c$. First, observe that $c = a$ implies $c_i = a_i$, for all $i \in [n]$, which, in turn, gives $X_i = a_i$ almost surely, for all $i \in [n]$. Consequently, we have $\mathbb{P}\big(X \ge (a + t) n\big) = 1 = e^{-D(0 \| 0) n}$, if $t = 0$, and $\mathbb{P}\big(X \ge (a + t) n\big) = 0 = e^{-D(\tilde{t} \| 0) n}$, if $t > 0$. Second, if $c > a$ and $t = b + a - c$, we have that
\begin{align*}
  \mathbb{P}\big(X \ge (c + t) n\big)
  &= \mathbb{P}\big(\forall i \in [n] \colon X_i = b + a_i\big) && \text{(since $t = b + a - c$)} \\
  &\le \mathbb{E}\Big[\prod_{i=1}^n \widetilde{X}_i\Big] \le \prod_{i=1}^n \tilde{c}_i && \text{(definition of $\widetilde{X}_i$, Lemma 2.3)} \\
  &\le \Big(\frac{1}{n} \sum_{i=1}^n \tilde{c}_i\Big)^n = \tilde{c}^n = e^{-D(1 \| \tilde{c}) n}. && \text{(AM--GM inequality, definition of $\tilde{c}$)}
\end{align*}
This concludes the proof of Theorem 2.1.
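The “straightforward calculation” can be double-checked numerically. The following sketch, reusing binary_kl and the math import from the first code block, confirms for one illustrative choice of $\tilde{c}$ and $\tilde{t}$ (our assumption) that $g$ attains its minimum near $\lambda^*$ and that $g(\lambda^*)$ matches $e^{-D(\tilde{c} + \tilde{t} \| \tilde{c})}$:

```python
def g(lam, c_t, t_t):
    """The function g(lambda) from (2.4)."""
    return (lam * c_t + 1 - lam) / (1 - lam) ** (1 - c_t - t_t)

c_t, t_t = 0.4, 0.2                                 # illustrative values of c~ and t~
lam_star = t_t / ((1 - c_t) * (c_t + t_t))          # the claimed minimizer
grid_min = min(g(l / 1000, c_t, t_t) for l in range(999))
assert abs(g(lam_star, c_t, t_t) - grid_min) < 1e-4
assert abs(g(lam_star, c_t, t_t) - math.exp(-binary_kl(c_t + t_t, c_t))) < 1e-12
```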
3. The Algorithm

We provide a generalization of Theorem 4.1 in Impagliazzo and Kabanets (2010).

Theorem 3.1. There is a randomized algorithm $A$ such that the following holds. Let $X_1, \dots, X_n$ be $[0, 1]$-valued random variables, and set $X := \sum_{i=1}^n X_i$. Let $0 < c < 1$ and $0 < t \le 1 - c$ be such that $\mathbb{P}\big(X \ge (c + t) n\big) = p > \alpha$, for some $\alpha \ge e^{-D(c + t \| c) n}$. Then, on inputs $n, c, t, \alpha$, the algorithm $A$, using oracle access to the distribution of $(X_1, \dots, X_n)$, runs in time $\mathrm{poly}\big(\alpha^{-1/(ct)}, n\big)$ and outputs a set $S \subseteq [n]$ such that, with probability at least $1 - o(1)$, one has
\[
  \mathbb{E}\Big[\prod_{i \in S} X_i\Big] > c^{|S|} + \Omega\big(\alpha^{1/(ct)}\big).
\]
Proof. We follow the argument of Impagliazzo and Kabanets: for $i \in [n]$, let the random variables $Y_i \sim \mathrm{Bernoulli}(X_i)$ be conditionally independent given the $X_i$. Furthermore, for $\lambda \in (0, 1)$, let $\mathcal{I}$ be a random subset of $[n]$ that contains each index independently with probability $\lambda$ (so that $|\mathcal{I}| \sim \mathrm{Binomial}(n, \lambda)$), independent of the $X_i$ and of the $Y_i$. Using the law of total expectation and Lemma 2.6, we infer that
\[
  \mathbb{E}\Big[\prod_{i \in \mathcal{I}} Y_i\Big] \ge \mathbb{P}\big(X \ge (c + t) n\big)\, \mathbb{E}\Big[\prod_{i \in \mathcal{I}} Y_i \,\Big|\, X \ge (c + t) n\Big] \ge p (1 - \lambda)^{n (1 - c - t)}.
\]
Moreover, by proceeding as in the proof of Lemma 2.5, we can show that $\mathbb{E}\big[c^{|\mathcal{I}|}\big] \le (\lambda c + 1 - \lambda)^n$. Hence, we obtain
\[
  \mathbb{E}\Big[\prod_{i \in \mathcal{I}} Y_i - c^{|\mathcal{I}|}\Big] = \mathbb{E}\Big[\prod_{i \in \mathcal{I}} Y_i\Big] - \mathbb{E}\big[c^{|\mathcal{I}|}\big] \ge (1 - \lambda)^{n (1 - c - t)} \left(p - \left(\frac{\lambda c + 1 - \lambda}{(1 - \lambda)^{1 - c - t}}\right)^n\right).
\]
The rest of the proof is completely analogous to (Impagliazzo and Kabanets, 2010, Theorem 4.1).
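To make the constructive aspect concrete, here is one possible reading of the search procedure behind Theorem 3.1, written as a hedged sketch: the function names, repetition counts, and thresholds are our own illustrative assumptions (the precise parameters are those of Impagliazzo and Kabanets (2010, Theorem 4.1)), and sample_X stands in for the oracle access to the distribution of $(X_1, \dots, X_n)$.

```python
import math
import random

def find_dependent_subset(sample_X, n, c, t, alpha, seed=0):
    """Illustrative sketch: search for a set S in [n] whose moment
    E[prod_{i in S} X_i] noticeably exceeds c^|S|."""
    rng = random.Random(seed)
    lam = t / ((1 - c) * (c + t))                 # lambda* from Section 2 (with a = 0, b = 1)
    gap = alpha ** (1 / (c * t))                  # target advantage, up to constants
    candidates = max(1, math.ceil(4 / gap))       # how many sets S to try (assumed)
    estimates = max(1, math.ceil(16 / gap ** 2))  # samples per Monte Carlo estimate (assumed)
    for _ in range(candidates):
        S = [i for i in range(n) if rng.random() < lam]   # S drawn like the random set I
        total = 0.0
        for _ in range(estimates):
            x = sample_X()                        # one joint sample from the oracle
            total += math.prod(x[i] for i in S)
        if total / estimates > c ** len(S) + gap / 2:     # estimate of E[prod_{i in S} X_i]
            return S
    return None   # under the theorem's hypotheses, this outcome should be unlikely
```

Roughly, the lower bound on $\mathbb{E}\big[\prod_{i \in \mathcal{I}} Y_i - c^{|\mathcal{I}|}\big]$ guarantees that a random candidate $S$ has a noticeable advantage with noticeable probability, which the repetition loop then boosts.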
References
Alon, N. and Spencer, J. H. (2008). The Probabilistic Method, 3rd ed. Wiley-Interscience, New York.

Chernoff, H. (1952). A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Annals of Mathematical Statistics, pp. 493–507.

Chung, F. and Lu, L. (2006). Complex Graphs and Networks. CBMS Regional Conference Series in Mathematics. American Mathematical Society.

Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, pp. 13–30.

Impagliazzo, R. and Kabanets, V. (2010). Constructive proofs of concentration bounds. In: Serna, M., Shaltiel, R., Jansen, K., Rolim, J. (eds) Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques. RANDOM 2010, APPROX 2010. Lecture Notes in Computer Science, pp. 617–631. Springer, Berlin, Heidelberg.

McDiarmid, C. (1998). Concentration. In: Habib, M., McDiarmid, C., Ramirez-Alfonsin, J., Reed, B. (eds) Probabilistic Methods for Algorithmic Discrete Mathematics. Algorithms and Combinatorics, pp. 195–248. Springer, Berlin, Heidelberg.

Mulzer, W. (2018). Five proofs of Chernoff's bound with applications. Bulletin of the EATCS (BEATCS), February 2018.

Panconesi, A. and Srinivasan, A. (1997). Randomized distributed edge coloring via an extension of the Chernoff–Hoeffding bounds. SIAM Journal on Computing, pp. 350–368.

Pelekis, C. and Ramon, J. (2017). Hoeffding's inequality for sums of dependent random variables. Mediterranean Journal of Mathematics.

Schmidt, J. P., Siegel, A., and Srinivasan, A. (1995). Chernoff–Hoeffding bounds for applications with limited independence. SIAM Journal on Discrete Mathematics, pp. 223–250.

Shenkman, N. (2018). A comparative study of some proofs of Chernoff's bound with regard to their applicability for deriving concentration inequalities. Master's Thesis, Freie Universität Berlin.
Wolfgang Mulzer and Natalia Shenkman
Department of Computer Science, Freie Universität Berlin
Takustraße 9, 14195 Berlin, Germany
E-mails: [email protected], [email protected]