[PDF] One Hundred Probability and Statistics Inequalities

Abstract

Herein we present one hundred inequalities culled from various corners of the probability, statistics, and combinatorics literature. We welcome new suggestions.

Full PDF

aa r X i v : . [ m a t h . S T ] F e b One Hundred Probability and StatisticsInequalities

CNP SlagleFebruary 16, 2021

In 2012, the author compiled a subset of the following inequalities for a researcherin randomized algorithms. One might think of said inequalities as a very quick refer-ence, with access to primary and secondary resources listed either within the sectionor alongside the inequality of interest. In the intervening years, some of the originalsources and their respective links have vanished, leading the author to consider a com-panion document with proofs for select inequalities within this list. Though the authorwould refute the completeness of this collection for more advanced researchers, henonetheless believes it may serve some interest. It is important

The relations to follow include axioms within the probabilistic framework, along witha few of the basic inferences derived therefrom. See [1] for a wonderful introduction.Given events (sets) A , B , and countable { A n } ∞ n =1 ,1. P [ A ] ≥ P [ A ] ≤

3. If A ⊂ B , then P [ A ] ≤ P [ B ]

4. If A ⊂ B , then P [ B ] ≤ P [ A c ] (Boole) P [ ∪ ∞ n =1 A n ] ≤ P ∞ n =1 P [ A n ] P [ ∪ ∞ n =1 A n ] ≥ sup { P [ A n ] | n = 1 , , . . . } P [ ∩ ∞ n =1 A n ] ≤ inf { P [ A n ] | n = 1 , , . . . } P [ A ∩ B ] ≤ min { P [ A ] , P [ B ] } (Bonferroni) P [ A ∩ B ] ≥ P [ A ] + P [ B ] − (Bonferroni General) P [ ∩ ni =1 ] ≥ P ni =1 P [ A i ] − ( n − P [ A | B ] ≥ P [ A ∩ B ] (Karlin Ost) Deﬁne P = P ni =1 P [ A i ] , P = P ≤ i , E | X | ≤ p p E | X | p (Liapounov) For s > r > , r p E | X | r ≤ s p E | X | s (Minkowski) For p ≥ , p p E | X + Y | p ≤ p p E | X | p + p p E | Y | p (Triangle) As a special case of Minkowski’s inequality, E | X + Y | ≤ E | X | + E | Y | . 22. If g is nondecreasing and h is nonincreasing, then E ( g ( X ) h ( X )) ≤ ( Eg ( X ))( Eh ( X )) .13. If g and h are both nondecreasing or both nonincreasing, then E ( g ( X ) h ( X )) ≥ ( Eg ( X ))( Eh ( X )) .14. (Cram´er-Rao) Suppose X , . . . , X n is a sample with joint pdf f ( x | θ ) and W ( X ) is any estimator of θ such that ddθ E θ W ( X ) = R χ ∂∂θ [ W ( x )] f ( x | θ ) d x and V ar θ ( W ( X )) < ∞ . Then V ar θ ( W ( X )) ≥ ( ddθ E θ W ( X )) E θ h(cid:0) ∂∂θ log( f ( x | θ )) (cid:1) i . (3)15. (Cram´er-Rao IID) Suppose X , . . . , X n is a sample iid with marginal pdf f ( x | θ ) and W ( X ) is any estimator of θ such that ddθ E θ W ( X ) = R χ ∂∂θ [ W ( x )] f ( x | θ ) d x and V ar θ ( W ( X )) < ∞ . Then V ar θ ( W ( X )) ≥ ( ddθ E θ W ( X )) nE θ h(cid:0) ∂∂θ log( f ( x | θ )) (cid:1) i . (4)16. (Rao-Blackwell) Let U be an unbiased estimator of τ ( θ ) , and let T be a sufﬁ-cient statistic for θ . Deﬁne φ ( T ) = E ( U | T ) . Then E φ ( T ) = τ ( θ ) , and V ar θ φ ( T ) ≤ V ar θ W for all θ. (5) (Han [2]) Let X , . . . , X n be independent discrete random variables. Let H ( X π , . . . , X π k ) be the joint entropy of a subset of the { X i } . Then H ( X , . . . , X n ) ≤ n X i =1 H ( X , . . . , X i − , X i +1 , . . . , X n ) . (6)2. [2] Let X , . . . , X n be independent random variables. Let g : Domain ( X , . . . , X n ) → R be Lesbegue measurable, and Z = g ( X , . . . , X n ) . Then V ar ( Z ) ≤ n X i =1 E [( Z − E ( Z | X , . . . , X i − , X i +1 , . . . , X n ) ] . (7)3. (Efron-Stein [2]) Let X , . . . , X n be independent random variables. Let g : Domain ( X , . . . , X n ) → R be Lesbegue measurable, and Z = g ( X , . . . , X n ) .Let Y , . . . , Y n be an independent copy of X , . . . , X n , and let Z i = g ( X , . . . , Y i , . . . , X n ) .Then V ar ( Z ) ≤ n X i =1 E [( Z − Z i ) ] . (8)3. (Logarithmic Sobolev [2]) Let X , . . . , X n be independent random variables.Let g i : Domain ( X , . . . , X i − , X i +1 , . . . , X n ) → R be Lesbegue measurable, Z i = g i ( X , . . . , X i − , X i +1 , . . . , X n ) , g : Domain ( X , . . . , X n ) → R beLesbegue measurable, and Z = g ( X , . . . , X n ) . Let ψ ( t ) = e t − t − and s > . Then sE ( Ze sZ ) − E ( e sZ ) log[ E ( e sZ )] ≤ n X i =1 E [ e sZ ψ ( − s ( Z − Z i ))] . (9)5. (Symmetrized Logarithmic Sobolev [2]) Let X , . . . , X n be independent ran-dom variables. Let g : Domain ( X , . . . , X n ) → R be Lesbegue measurable, and Z = g ( X , . . . , X n ) .Let Y , . . . , Y n be an independent copy of X , . . . , X n , and let Z i = g ( X , . . . , Y i , . . . , X n ) .Let ψ ( t ) = e t − t − and s > . Then sE ( Ze sZ ) − E ( e sZ ) log[ E ( e sZ )]] ≤ n X i =1 E [ e sZ ψ ( − s ( Z − Z i ))] . (10)6. [3] Suppose { X n } is a sequence of random variables such that for all n , X n ≥ ,and for all ǫ > , there exist c > and c > e − such that P [ X n > ǫ ] ≤ c e − c nǫ . Then E X n ≤ s c ) nc . (11)7. (Ledoux-Talagrand Contraction [4]) Suppose X i , . . . , X n are iid Rademacher variables ( P [ X i = 1] = P [ X i = −

1] = ). Suppose f : R + → R + be convexand increasing, and φ i : R → R be Lipschitz with constant L for i = 1 , . . . , n .Then for T ⊂ R n , Ef

12 sup t ∈ T (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n X i =1 X i φ i ( t i ) (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)! ≤ Ef L sup t ∈ T (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n X i =1 X i t i (cid:12)(cid:12)(cid:12)(cid:12)(cid:12)! (12)8. (Bhatia-Davis [5]) If a univariate probability distribution F has minimum m ,maximum M , and mean µ , then for any X following F , V ar ( X ) ≤ ( M − µ )( µ − m ) .9. (Popoviciu [6]) If a univariate probability distribution F has minimum m andmaximum M , then for any X following F , V ar ( X ) ≤ ( M − m ) .10. (Chapman-Robbins [7]) Suppose X is a random variable in R k with an un-known parameter θ . If δ ( X ) is an unbiased estimator for τ ( θ ) , then V ar ( δ ( X )) ≥ sup ∆ [ τ ( θ + ∆) − τ ( θ )] E θ h p ( X .θ +∆) p ( X ,θ ) − i . (13)41. (Entropy Power [8]) Deﬁne the entropy of X to be H (() X ) = − E log f X ( X ) ,where f X ( x ) is the pdf or pmf of X . Deﬁne the entropy power of X to be N ( X ) = πe e n h ( X ) . Then for random variables X and Y , we have N ( X + Y ) ≥ N ( X ) + N ( Y ) .12. (Marcinkiewicz Zygmund [9]) Let X , . . . , X n be independent random vari-ables with common support such that E X i = 0 and E X pi < ∞ for all p ≥ .Then there exist constants A ( p ) and B ( p ) , dependent only on p , such that A ( p ) E n X i =1 | X i | ! p ≤ E n X i =1 | X i | ! p ≤ B ( p ) E n X i =1 | X i | ! p . (14)13. (Khintchine [10]) Let X , . . . , X n be iid Rademacher random variables. Thenfor any λ , . . . , λ n ∈ C and p > , there exist constants A ( p ) and B ( p ) , depen-dent only on p , such that A ( p ) n X i =1 | λ i | ! ≤ E n X i =1 | λ i X i | ! p ! p ≤ B ( p ) n X i =1 | λ i | ! . (15)14. (Rosenthal I [11]) Let X , . . . , X n be independent nonnegative random vari-ables such that E X pi < ∞ for a ﬁxed p ≥ , i = 1 , . . . , n . Then there existconstants A ( p ) and B ( p ) dependent only on p such that A ( p ) max (cid:8)P ni =1 EX pi , ( P ni =1 EX i ) p (cid:9) ≤ ( P ni =1 EX i ) p ≤ B ( p ) max (cid:8)P ni =1 EX pi , ( P ni =1 EX i ) p (cid:9) . (16)15. (Rosenthal II) Let X , . . . , X n be independent random variables such that E X i =0 and E X pi < ∞ for a ﬁxed p ≥ , i = 1 , . . . , n . Then there exist constants A ( p ) and B ( p ) dependent only on p such that A ( p ) max nP ni =1 E | X i | p , (cid:0)P ni =1 EX i (cid:1) p o ≤ | P ni =1 EX i | p ≤ B ( p ) max nP ni =1 E | X i | p , (cid:0)P ni =1 EX i (cid:1) p o . (17)16. (Papadatos [12]) Let X (1) , . . . , X ( n ) be the order statistics of iid random vari-ables X , . . . , X n with variance σ . Deﬁne G ( x ) = I x ( k, n + 1 − k ) and σ n ( k ) = sup

1. (Markov[1]) Suppose X ≥ , and E [ X ] > . Then P ( X ≥ t ) ≤ E [ X ] t for all t > .2. (Chebychev [1]) For t > , P [ | X − EX | ≥ t ] ≤ V ar ( X ) t .3. ( g -Markov) Let X ≥ , E [ X ] > . Then for increasing g : R +0 → R + , P ( X ≥ t ) ≤ E [ g ( X )] g ( t ) . (22)4. (Normal I, Mill [1][14]) For Z a standard normal, P [ | Z | ≥ t ] ≤ q π e − t / t .5. (Normal II [1]) For Z a standard normal, P [ | Z | ≥ t ] ≥ q π e − t / t t .6. (Chernoff I [1][15]) Let M X ( t ) , − h ≤ t ≤ h be the moment-generating func-tion of X . Then P [ X > a ] ≤ e − at M X ( t ) , − h ≤ t ≤ h .7. (Chernoff II [1][15]) Let M X ( t ) , − h ≤ t ≤ h be the moment-generating func-tion of X . Then P [ X ≤ a ] ≤ e − at M X ( t ) , − h ≤ t ≤ .8. (Chernoff Sum I [1]) Let X , . . . , X n be iid, X = P ni =1 X i , and M X ( t ) , − h ≤ t ≤ h be the moment-generating function of X . . Then P [ S > a ] ≤ e − at [ M X ( t )] n for ≤ t ≤ h .9. (Chernoff Sum II [1]) Let X , . . . , X n be iid, X = P ni =1 X i , and M X ( t ) , − h ≤ t ≤ h be the moment-generating function of X . Then P [ S ≤ a ] ≤ e − at [ M X ( t )] n for − h ≤ t ≤ .10. (Chernoff Mean [1]) Let X , . . . , X n be iid, ǫ > , ¯ X n = P ni =1 X i , M U ( t ) , − h U ≤ t ≤ h U be the moment-generating function of U = X − EX − ǫ ,and M V ( t ) , − h V ≤ t ≤ h V be the moment-generating function of V = − X + EX − ǫ . Then there exist for some < t U ≤ h U and − h V ≤ t V < suchthat P [ | ¯ X n − EX | > ǫ ] ≤ c n , where c = max { M U ( t U ) , M V ( t V ) } ∈ (0 , . (23) See [15] for an introduction into randomized algorithms, whence we infer the followinginequalities.1. (Chernoff Poisson Trials I)

Let X i be n independent Poisson trials . Let X = P ni =1 X i . Then for δ > , P [ X ≥ (1 + δ ) EX ] < (cid:18) e δ (1 + δ ) δ (cid:19) EX . (24)2. (Chernoff Poisson Trials II) Let X i be n independent Poisson trials. Let X = P ni =1 X i . Then for < δ ≤ , P [ X ≥ (1 + δ ) EX ] < e − ( EX ) δ / . (25)3. (Chernoff Poisson Trials III) Let X i be n independent Poisson trials. Let X = P ni =1 X i . Then for R ≥ gEX , P [ X ≥ R ] < − R . (26) Such a t U and t V exist since E U < and E V < , guaranteeing that M U and M V are decreasing ina neighborhood of zero. Each X i is a Bernoulli ( p i ) . (Chernoff Poisson Trials IV) Let X i be n independent Poisson trials. Let X = P ni =1 X i . Then for < δ < , P [ X ≤ (1 − δ ) EX ] < (cid:18) e − δ (1 − δ ) − δ (cid:19) EX . (27)5. (Chernoff Poisson Trials V) Let X i be n independent Poisson trials. Let X = P ni =1 X i . Then for < δ < , P [ X ≤ (1 − δ ) EX ] < e − δ EX/ . (28)6. (Chernoff Rademacher I) Suppose X , . . . , X n be iid such that P [ X i = 1] = P [ X i = −

1] = . If X = P ni =1 X i and a > , then P [ X ≥ a ] ≤ e − a n .7. (Chernoff Rademacher II) Suppose X , . . . , X n be iid such that P [ X i = 1] = P [ X i = −

1] = . If X = P ni =1 X i and a > , then P [ | X | ≥ a ] ≤ e − a n .8. (Chernoff Bernoulli I) Suppose X , . . . , X n be iid Bernoulli (cid:0) (cid:1) . If X = P ni =1 X i and < a < n , then P (cid:2) X ≤ n − a (cid:3) ≤ e − a n .9. (Chernoff Bernoulli II) Suppose X , . . . , X n be iid Bernoulli (cid:0) (cid:1) . If X = P ni =1 X i and < δ < , then P (cid:2) X ≤ n (1 − δ ) (cid:3) ≤ e − nδ .10. (Chernoff Bernoulli III) Suppose X , . . . , X n be iid Bernoulli (cid:0) (cid:1) . If X = P ni =1 X i and a > , then P (cid:2) X ≥ n + a (cid:3) ≤ e − a n .11. (Chernoff Bernoulli IV) Suppose X , . . . , X n be iid Bernoulli (cid:0) (cid:1) . If X = P ni =1 X i and δ > , then P (cid:2) X ≤ n (1 + δ ) (cid:3) ≤ e − nδ . (Gauss [1]) Suppose X follows a unimodal distribution with mode ν , and deﬁne τ = E ( X − ν ) . Then P [ | X − ν | > ǫ ] ≤  τ ǫ , ǫ ≥ q τ − ǫτ √ , ǫ ≤ q τ (29)2. (Vysochanski˘i-Petunin [1]) Suppose X follows a unimodal distribution, anddeﬁne ξ = E ( X − α ) for arbitrary α . Then P [ | X − α | > ǫ ] ≤  ξ ǫ , ǫ ≥ q ξ ξ ǫ − , ǫ ≤ q ξ (30)8. (Hoeffding I [14]) Let Y , . . . , Y n be independent observations such that E Y i =0 and a i ≤ Y i ≤ b i for all i . If ǫ > and t > , then P " n X i =1 Y i ≥ ǫ ≤ e − tǫ n Y i =1 e t ( b i − a i ) / (31)4. (Hoeffding II [14]) Let X , . . . , X n be independent Bernoulli( p ). If ǫ > , then P "(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) n X i =1 X i − np (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≥ ǫ ≤ e − nǫ (32)5. (Saw) Suppose X , . . . , X n are iid with ﬁnite ﬁrst and second order moments.Let ¯ X = n P ni =1 X i and S = n − P ni =1 ( X i − ¯ X ) . Let k > , ν ( t ) =max (cid:8) m ∈ N | m < n +1 t (cid:9) , α ( t ) = ( n +1)( n +1 − ν ( t ))1+ ν ( t )( n +1 − ν ( t )) , and β = n ( n +1) k n − n +1) k .Then P [ | X − ¯ X | ≥ kS ] ≤ ( n +1 ( ν ( β ) − if ν is odd and β > α ( β ) n +1 ν ( β ) otherwise. (33) Kannan [16] furnishes an array of inequalities helpful in analyzing graphs and otherobjects of combinatoric import.1. (Chromatic Number)

Let G ( { , . . . , n } , P ) be a random graph with edge prob-abilities P = ( p ij ) . The chromatic number χ = χ ( G ) is the least number ofcolors necessary to color G such that no two vertices sharing an edge receive thesame color. Let p = P i,j p ij ( n ) . Then there exists a constant c > such that for t ∈ (0 , n √ p ) , P [ | χ ( G ) − Eχ ( G ) | ≥ t ] ≤ e − ct n √ p log n . (34)2. (Johnson-Lindenstrauss Random Projection) Suppose k ≤ n , and we pick V , . . . , V k uniformly randomly from the surface of the unit ball in R n . Then for ǫ ∈ (0 , , there exist constants c , c > such that P "(cid:12)(cid:12)(cid:12)(cid:12)(cid:12) k X i =1 v i − kn (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≥ ǫkn ≤ c e − c ǫ k (35)3. (Random Projection) Suppose m is an even positive integer and X , . . . , X n are real-valued random observations satisfying the strong negative correlationprinciple . That is, for all i , E X i ( X + · · · + X i − ) l < when l < m is oddand E ( X li | X + · · · + X i − ) ≤ (cid:0) nm (cid:1) l − l ! for l ≤ m even. Deﬁne constants9 M i,l } , { K i,l } , and { L i,l } such that E ( X li | X + · · · + X i − ) ≤ M i,l , each K i,l is an indicator variable on the typical case of the conditional expectation where P [ K i,l ] = 1 − δ i,l , and E ( X li | X + · · · + X i − , K i,l ) ≤ L i,l for l = 2 , , . . . , m and i = 1 , . . . , n . Finally, let X = P ni =1 X i . Then E X m ≤ ( cm ) m +1 (cid:18)P m/ l =1 m − l l ( P ni =1 L i, l ) l (cid:19) m + ( cm ) m +2 P m/ l =1 1 nl P ni =1 (cid:16) nM i, l δ / ( m − l +2) i, l (cid:17) ml . (36)4. (Bin Packing) Suppose Y , . . . , Y n are iid from a discrete distribution of r atomseach with probability at least n and E Y ≤ r log n . Let f ( Y , . . . , Y n ) be theminimum number of unit capacity bins necessary to pack the Y , . . . , Y n items.Then there exist constants c , c > such that if t ∈ (0 , n [( EY i ) + V ar ( Y i )]) ,then P [ | f − Ef | ≥ t + r ] ≤ c e − c t n [( EYi )3+

V ar ( Yi )] . (37)5. (Strong Negative Correlation) Suppose m is an even positive integer, and X , . . . , X n are real-valued random observations satisfying the strong negative correlationprinciple . That is, for all i , E X i ( X + · · · + X i − ) l < when l < m is odd and E ( X li | X + · · · + X i − ) ≤ (cid:0) nm (cid:1) l − l ! for l ≤ m even. Then E n X i =1 X i ! m ≤ (24 mn ) m . (38)6. (Hamiltonian Tour) Suppose Y , . . . , Y n are sets of points generated indepen-dently and respectively from n subsquares of size √ n × √ n of the unit square,and there exists a constant c ∈ (0 , such that P [ | Y i | ] ≤ c for all i . Suppose fur-ther that for ǫ > , and l ∈ { , . . . , m } , E | Y i | l ≤ [ O ( l )] (2 − ǫ ) l . Finally, suppose f ( Y , . . . , Y n ) is the length of the shortest Hamiltonian tour through Y ∪· · ·∪ Y n .Then E [ f ( Y , . . . , Y n ) − Ef ( Y . . . , Y n )] m ≤ ( cm ) m . (39)7. (MST) Suppose Y , . . . , Y n are sets of points generated independently and re-spectively from n subsquares of size √ n × √ n of the unit square, and thereexists a constant c ∈ (0 , such that P [ | Y i | ] ≤ c for all i . Suppose furtherthat for ǫ > , and l ∈ { , . . . , m } , E | Y i | l ≤ [ O ( l )] (2 − ǫ ) l . Finally, suppose f ( Y , . . . , Y n ) is the length of a minimum spanning tree of Y ∪ · · · ∪ Y n . Then E [ f ( Y , . . . , Y n ) − Ef ( Y . . . , Y n )] m ≤ ( cm ) m . (40)8. (Random Vector) Suppose Y = ( Y , . . . , Y n ) is a random vector such thatfor a ﬁxed k ≤ n , E ( Y i | Y , . . . , Y i ) is a nondecreasing function of Y + · · + Y i − for i = 1 , . . . , k and for even l ≤ k , there exists a c > such that E ( Y li | Y , . . . , Y i − ) ≤ (cid:0) cln (cid:1) l . Then for any even m ≤ k , E k X i =1 Y i − EY i ! m ≤ √ cmkn ! m . (41) The inequalities to follow furnish mechanisms for the analysis of interdependence,Markov chains, vectors, and graphs, among others.1. (Talagrand [17])

Let X be chosen randomly uniformly from {− , } n , let A bea convex subset of R n , A t = { p ∈ R n | dist ( p , A ) ≤ t } . Then there exists c > such that P [ X ∈ A ] P [ X / ∈ A t ] ≤ e − ct for all t > .2. (Talagrand Large Deviation [17]) Let X be chosen randomly uniformly from {− , } n , V be a d -dimensional subspace of R n . Then there exist constants c, C > such that P [ | dist ( X, V ) − √ n − d | ≥ t ] ≤ Ce − ct for all t > .3. (Gaussian for Lipschitz [17]) Let X be an n -dimensional random vector suchthat each X i is an independent n (0 , variable. If f : R d → R is a Lipschitzfunction with scale constant 1 , then there exists a constant c > such that P [ | f ( X ) − Ef ( X ) | ≥ t ] ≤ e − ct for all t > .4. (Azuma [18]) Suppose X , . . . , X n is a martingale ( E ( X i | X , . . . , X i − ) = X i − for i = 1 , . . . , n ); suppose further that X is c -Lipschitz ( | X i − X i − | ≤ c i for i = 1 , . . . , n , c ∈ R n positive); then P [ X n − X ≥ λ ] ≤ e − λ P ni =1 c i . (42)5. (Bennett [2]) Let X , . . . , X n be independent random variables of zero meansuch that P [ X i ≤

1] = 1 . Let h ( u ) = (1 + u ) log(1 + u ) − u for u ≥ and σ = n P ni =1 V ar ( X i ) . Then for t > , P " n X i =1 X i > t ≤ e − nσ h ( tnσ ) . (43)6. (Bernstein [2]) Let X , . . . , X n be independent random variables of zero meansuch that P [ X i ≤

1] = 1 . Let σ = n P ni =1 V ar ( X i ) . Then for ǫ > , P " n n X i =1 X i > ǫ ≤ e − nǫ σ ǫ/ . (44) A Lipschitz function f satisﬁes | f ( x ) − f ( y ) | ≤ M || x − y || for all x, y ∈ domain ( f ) . (McDiarmid Bounded Differences I [19]) Let X , . . . , X n be independent ran-dom variables each whose domain is χ . If f : χ n → R n is a function such thatfor all x ∈ χ n , y ∈ χ , and i ∈ { , . . . , n } , there exists a constant c i > suchthat | f ( x ) − f ( x , . . . , x i − , y, x i +1 , . . . , x n ) | ≤ c i , then P [ f ( X ) − Ef ( X ) ≥ t ] ≥ e − t P ni =1 c i for all t > . (45)8. (McDiarmid Bounded Differences II [19]) Let X , . . . , X n be independentrandom variables each whose domain is χ . If f : χ n → R n is a function suchthat for all x ∈ χ n , y ∈ χ , and i ∈ { , . . . , n } , there exists a constant c i > such that | f ( x ) − f ( x , . . . , x i − , y, x i +1 , . . . , x n ) | ≤ c i , then P [ f ( X ) − Ef ( X ) ≤ − t ] ≥ e − t P ni =1 c i for all t > . (46)9. (Dvoretzky Kiefer Wolfowitz I [20]) Suppose X , . . . , X n are iid univariaterandom variables following cdf F . Let F n ( x ) = n P ni =1 X i ≤ x be the empiricaldistribution. Then for ǫ > q n log 2 , P (cid:20) sup x ∈ R ( F n ( x ) − F ( x )) > ǫ (cid:21) ≤ e − nǫ . (47)10. (Dvoretzky Kiefer Wolfowitz II [20]) Suppose X , . . . , X n are iid univariaterandom variables following cdf F . Let F n ( x ) = n P ni =1 X i ≤ x be the empiricaldistribution. Then for ǫ > , P (cid:20) sup x ∈ R | F n ( x ) − F ( x ) | > ǫ (cid:21) ≤ e − nǫ . (48)11. (Etemadi Differing Means [21]) Let X , . . . , X n be random variables withcommon support. Let S k = P ki =1 X k be the k th partial sum. Then for ǫ > , P (cid:20) max ≤ k ≤ n | S k | ≥ ǫ (cid:21) ≤ ≤ k ≤ n P [ | S k | ≥ ǫ ] . (49)12. (Etemadi Shared Means [21]) Let X , . . . , X n be random variables with com-mon support and equal means. Let S k = P ki =1 X k be the k th partial sum. Thenfor ǫ > , P (cid:20) max ≤ k ≤ n | S k | ≥ ǫ (cid:21) ≤ ǫ V ar ( S n ) . (50)13. (Kolmogorov [22]) Let X , . . . , X n be independent random variables with com-mon support such that E X i = 0 and V ar ( X i ) < ∞ for i = 1 , . . . , n . Let S k = P ki =1 X k be the k th partial sum. Then for ǫ > , P (cid:20) max ≤ k ≤ n | S k | ≥ ǫ (cid:21) ≤ ǫ n X i =1 V ar ( X i ) . (51)124. (Chebychev Multidimensional [23]) Let X ∈ R n be a random vector withcovariance matrix V = E (cid:2) ( X − E X )( X − E X ) T (cid:3) . Then for t > , P (cid:20)q ( X − E X ) T V − ( X − E X ) (cid:21) ≤ nt . (52)15. (Leguerre Samuelson [24]) Let X , . . . , X n be random variables with commonsupport, and deﬁne ¯ X = n P ni =1 X i and S = n − P ni =1 ( X i − ¯ X ) . Then for i = 1 , . . . , n with probability one, ¯ X − S √ n − ≤ X i ≤ ¯ X + S √ n − . (53)16. (LeCam [25]) Suppose X , . . . , X n are independent binomial random variableswith respective success parameters p , . . . , p n . Letting λ n = P ni =1 p i , we have ∞ X k =0 (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) P " n X i =1 X i = k − λ kn e − λ n k ! (cid:12)(cid:12)(cid:12)(cid:12)(cid:12) ≤ n X i =1 p i . (54)17. (Doob Martingale [26]) Let X ∈ R n be a martingale ( E ( X i | X , . . . , X i − ) = X i − for i = 2 , . . . , n ). Then for C > , p ≥ , P (cid:20) sup ≤ i ≤ n X i ≥ C (cid:21) ≤ EX pn C p . (55) References [1] G. Casella and R. Berger,

Statistical Inference . Duxbury, 2002.[2] S. Boucheron, O. Bousquet, and G. Lugosi,

Concentration inequalities, AdvancedLectures in Machine Learning . Springer, 2004.[3] L. Wasserman, “Lecture on probability inequalities.” , 2008.[4] J. Duchi, “Probability bounds.” .[5] C. D. C Vasile, “A better bound on the variance,”

American MathematicalMonthly (Mathematical Association of America) , 2000.[6] C. Vasile, “Two generalizations of popovicius inequality,”

Crux Mathematicorum ,2001.[7] D. Chapman and H. Robbins, “Minimum variance estimation without regularityassumptions,”

Annals of Mathematical Statistics , 1951.[8] J. C. A.Dembo, T.M. Cover, “Information-theoretic inequalities,”

IEEE Trans.Inform. Theory , 1991. 139] J. Marcinkiewicz and A. Zygmund, “Sur les foncions independantes,”

Fund.Math , 1937.[10] T. Wolff, “Lectures on harmonic analysis,”

AMS , 2003.[11] H. Rosenthal, “On the subspaces of l p Israel J. Math , 1970.[12] N. Papadatos, “Maximum variance of order statistics.” , 1994.[13] W. H´urlimann, “Generalized algebraic bounds on order statis-tics functions, with application to reinsurance and catastrophe.” , 1970.[14] L. Wasserman,

All of Statistics . Springer, 2004.[15] M. Mitzenmacher and E. Upfal,

Probability and Computing . Cambridge, 2005.[16] R. Kannan, “A new probability inequality using typical moments and concentra-tion results.” , 2009.[17] T. Tao, “Talagrand’s concentration inequality.” http://terrytao.wordpress.com/2009/06/09/talagrands-concentrationinequality , 2009.[18] L. L. F. Chung, “Complex graphs and networks,”

AMS , 2006.[19] P. Bartlett, “Lecture on concentration inequalities.” , 2008.[20] A. Dvoretzky, J. Kiefer, and J. Wolfowitz, “Asymptotic minimax character of thesample distribution function and of the classical multinomial estimator,”

Annalsof Mathematical Statistics , 1956.[21] N. Etemadi, “On some classical results in probability theory,”

Sankhy¯a Ser , 1985.[22] P. Billingsley,

Probability and Measure . John Wiley, 1995.[23] L. Wasserman, “High-dimensional probability an introduction with applicationsin data science.” , 2020.[24] P. Samuelson, “How deviant can you be?,”

Journal of the American StatisticalAssociation , 1968.[25] L. LeCam, “An approximation theorem for the poisson binomial distribution,”

Paciﬁc Journal of Mathematics , 1960.[26] D. Revuz and M. Yor,