One Hundred Probability and Statistics Inequalities

CNP Slagle

February 16, 2021
In 2012, the author compiled a subset of the following inequalities for a researcher in randomized algorithms. One might think of said inequalities as a very quick reference, with access to primary and secondary resources listed either within the section or alongside the inequality of interest. In the intervening years, some of the original sources and their respective links have vanished, leading the author to consider a companion document with proofs for select inequalities within this list. Though the author would not claim completeness of this collection for more advanced researchers, he nonetheless believes it may serve some interest.
The relations to follow include axioms within the probabilistic framework, along with a few of the basic inferences derived therefrom. See [1] for a wonderful introduction. Given events (sets) $A$, $B$, and a countable collection $\{A_n\}_{n=1}^\infty$,

1. $0 \le P[A] \le 1$

2. If $A \subset B$, then $P[A] \le P[B]$

3. If $A \subset B$, then $P[B^c] \le P[A^c]$

4. (Boole) $P\left[\cup_{n=1}^\infty A_n\right] \le \sum_{n=1}^\infty P[A_n]$

5. $P\left[\cup_{n=1}^\infty A_n\right] \ge \sup\{P[A_n] \mid n = 1, 2, \dots\}$

6. $P\left[\cap_{n=1}^\infty A_n\right] \le \inf\{P[A_n] \mid n = 1, 2, \dots\}$

7. $P[A \cap B] \le \min\{P[A], P[B]\}$

8. (Bonferroni) $P[A \cap B] \ge P[A] + P[B] - 1$

9. (Bonferroni, General) $P\left[\cap_{i=1}^n A_i\right] \ge \sum_{i=1}^n P[A_i] - (n-1)$

10. (Karlin-Ost) $P[A \mid B] \ge P[A \cap B]$

11. (Dawson-Sankoff) Define $S_1 = \sum_{i=1}^n P[A_i]$ and $S_2 = \sum_{1 \le i < j \le n} P[A_i \cap A_j]$. Then
$$P\left[\bigcup_{i=1}^n A_i\right] \ge \frac{2}{k+1}\,S_1 - \frac{2}{k(k+1)}\,S_2, \quad \text{where } k = 1 + \left\lfloor \frac{2S_2}{S_1} \right\rfloor.$$
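As a quick numerical check of the Boole and Bonferroni bounds above, the following Python sketch estimates both sides by Monte Carlo. It assumes only numpy; the three events and their probabilities 0.3, 0.4, and 0.5 are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Indicators of three independent events A_1, A_2, A_3 with the stated probabilities.
n_samples = 200_000
events = rng.uniform(size=(n_samples, 3)) < np.array([0.3, 0.4, 0.5])

p = events.mean(axis=0)                # estimates of P[A_i]
p_union = events.any(axis=1).mean()    # estimate of P[A_1 u A_2 u A_3]
p_inter = events.all(axis=1).mean()    # estimate of P[A_1 n A_2 n A_3]

# Boole (union bound): P[union of A_i] <= sum_i P[A_i]
print(f"P[union] = {p_union:.3f} <= {p.sum():.3f} = sum of P[A_i]")
# General Bonferroni: P[intersection of A_i] >= sum_i P[A_i] - (n - 1)
print(f"P[inter] = {p_inter:.3f} >= {p.sum() - 2:.3f} = sum of P[A_i] - 2")
```

Note that the general Bonferroni lower bound is vacuous here (negative); it only bites when the $P[A_i]$ are all close to one.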
Next come bounds on moments and variances.

7. (Ledoux-Talagrand Contraction) Let $X_1, \dots, X_n$ be iid Rademacher random variables ($P[X_i = 1] = P[X_i = -1] = \frac12$). Suppose $f : \mathbb{R}^+ \to \mathbb{R}^+$ is convex and increasing, and $\varphi_i : \mathbb{R} \to \mathbb{R}$ is Lipschitz with constant $L$ for $i = 1, \dots, n$. Then for $T \subset \mathbb{R}^n$,
$$E f\left(\frac12 \sup_{t \in T}\left|\sum_{i=1}^n X_i \varphi_i(t_i)\right|\right) \le E f\left(L \sup_{t \in T}\left|\sum_{i=1}^n X_i t_i\right|\right). \tag{12}$$

8. (Bhatia-Davis [5]) If a univariate probability distribution $F$ has minimum $m$, maximum $M$, and mean $\mu$, then for any $X$ following $F$, $\mathrm{Var}(X) \le (M - \mu)(\mu - m)$.

9. (Popoviciu [6]) If a univariate probability distribution $F$ has minimum $m$ and maximum $M$, then for any $X$ following $F$, $\mathrm{Var}(X) \le \frac{(M - m)^2}{4}$.

10. (Chapman-Robbins [7]) Suppose $X$ is a random variable in $\mathbb{R}^k$ with an unknown parameter $\theta$. If $\delta(X)$ is an unbiased estimator for $\tau(\theta)$, then
$$\mathrm{Var}(\delta(X)) \ge \sup_{\Delta} \frac{\left[\tau(\theta + \Delta) - \tau(\theta)\right]^2}{E_\theta\left[\left(\frac{p(X, \theta + \Delta)}{p(X, \theta)} - 1\right)^2\right]}. \tag{13}$$

11. (Entropy Power [8]) Define the entropy of $X \in \mathbb{R}^n$ to be $h(X) = -E \log f_X(X)$, where $f_X(x)$ is the pdf or pmf of $X$, and define the entropy power of $X$ to be $N(X) = \frac{1}{2\pi e}\, e^{\frac{2}{n} h(X)}$. Then for independent random variables $X$ and $Y$, we have $N(X + Y) \ge N(X) + N(Y)$.

12. (Marcinkiewicz-Zygmund [9]) Let $X_1, \dots, X_n$ be independent random variables with common support such that $EX_i = 0$ and $E|X_i|^p < \infty$ for all $p \ge 1$. Then there exist constants $A(p)$ and $B(p)$, dependent only on $p$, such that
$$A(p)\, E\left(\sum_{i=1}^n X_i^2\right)^{p/2} \le E\left|\sum_{i=1}^n X_i\right|^p \le B(p)\, E\left(\sum_{i=1}^n X_i^2\right)^{p/2}. \tag{14}$$

13. (Khintchine [10]) Let $X_1, \dots, X_n$ be iid Rademacher random variables. Then for any $\lambda_1, \dots, \lambda_n \in \mathbb{C}$ and $p > 0$, there exist constants $A(p)$ and $B(p)$, dependent only on $p$, such that
$$A(p)\left(\sum_{i=1}^n |\lambda_i|^2\right)^{1/2} \le \left(E\left|\sum_{i=1}^n \lambda_i X_i\right|^p\right)^{1/p} \le B(p)\left(\sum_{i=1}^n |\lambda_i|^2\right)^{1/2}. \tag{15}$$

14. (Rosenthal I [11]) Let $X_1, \dots, X_n$ be independent nonnegative random variables such that $EX_i^p < \infty$ for a fixed $p \ge 1$, $i = 1, \dots, n$. Then there exist constants $A(p)$ and $B(p)$ dependent only on $p$ such that
$$A(p) \max\left\{\sum_{i=1}^n EX_i^p,\ \left(\sum_{i=1}^n EX_i\right)^p\right\} \le E\left(\sum_{i=1}^n X_i\right)^p \le B(p) \max\left\{\sum_{i=1}^n EX_i^p,\ \left(\sum_{i=1}^n EX_i\right)^p\right\}. \tag{16}$$

15. (Rosenthal II) Let $X_1, \dots, X_n$ be independent random variables such that $EX_i = 0$ and $E|X_i|^p < \infty$ for a fixed $p \ge 2$, $i = 1, \dots, n$. Then there exist constants $A(p)$ and $B(p)$ dependent only on $p$ such that
$$A(p) \max\left\{\sum_{i=1}^n E|X_i|^p,\ \left(\sum_{i=1}^n EX_i^2\right)^{p/2}\right\} \le E\left|\sum_{i=1}^n X_i\right|^p \le B(p) \max\left\{\sum_{i=1}^n E|X_i|^p,\ \left(\sum_{i=1}^n EX_i^2\right)^{p/2}\right\}. \tag{17}$$

16. (Papadatos [12]) Let $X_{(1)}, \dots, X_{(n)}$ be the order statistics of iid random variables $X_1, \dots, X_n$ with variance $\sigma^2$. Define $G(x) = I_x(k,\, n+1-k)$ and $\sigma_n^2(k) = \sup \cdots$
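The two variance bounds above are cheap to compare numerically. A minimal sketch, assuming only numpy, with a Beta(2, 5) sample rescaled to $[m, M] = [0, 10]$ as an arbitrary test distribution:

```python
import numpy as np

rng = np.random.default_rng(1)

m, M = 0.0, 10.0
x = m + (M - m) * rng.beta(2.0, 5.0, size=100_000)  # bounded sample in [m, M]

mu, var = x.mean(), x.var()
bhatia_davis = (M - mu) * (mu - m)    # Var(X) <= (M - mu)(mu - m)
popoviciu = (M - m) ** 2 / 4.0        # Var(X) <= (M - m)^2 / 4

print(f"Var(X)       = {var:.3f}")
print(f"Bhatia-Davis = {bhatia_davis:.3f}")  # tighter: exploits the mean
print(f"Popoviciu    = {popoviciu:.3f}")     # looser: mean-free
```

The Bhatia-Davis bound never exceeds Popoviciu's, since $(M - \mu)(\mu - m) \le \left(\frac{M - m}{2}\right)^2$ by the AM-GM inequality.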
The classical tail bounds below appear in [1] and [14].

1. (Markov [1]) Suppose $X \ge 0$ and $E[X] > 0$. Then $P(X \ge t) \le \frac{E[X]}{t}$ for all $t > 0$.

2. (Chebychev [1]) For $t > 0$, $P[|X - EX| \ge t] \le \frac{\mathrm{Var}(X)}{t^2}$.

3. ($g$-Markov) Let $X \ge 0$, $E[X] > 0$. Then for increasing $g : \mathbb{R}_0^+ \to \mathbb{R}^+$,
$$P(X \ge t) \le \frac{E[g(X)]}{g(t)}. \tag{22}$$

4. (Normal I, Mill [1][14]) For $Z$ a standard normal, $P[|Z| \ge t] \le \sqrt{\frac{2}{\pi}}\, \frac{e^{-t^2/2}}{t}$.

5. (Normal II [1]) For $Z$ a standard normal, $P[|Z| \ge t] \ge \sqrt{\frac{2}{\pi}}\, \frac{t\, e^{-t^2/2}}{1 + t^2}$.

6. (Chernoff I [1][15]) Let $M_X(t)$, $-h \le t \le h$, be the moment-generating function of $X$. Then $P[X \ge a] \le e^{-at} M_X(t)$ for $0 < t \le h$.

7. (Chernoff II [1][15]) Let $M_X(t)$, $-h \le t \le h$, be the moment-generating function of $X$. Then $P[X \le a] \le e^{-at} M_X(t)$ for $-h \le t < 0$.

8. (Chernoff Sum I [1]) Let $X_1, \dots, X_n$ be iid, $S = \sum_{i=1}^n X_i$, and $M_X(t)$, $-h \le t \le h$, the moment-generating function of $X_1$. Then $P[S > a] \le e^{-at} [M_X(t)]^n$ for $0 < t \le h$.

9. (Chernoff Sum II [1]) Let $X_1, \dots, X_n$ be iid, $S = \sum_{i=1}^n X_i$, and $M_X(t)$, $-h \le t \le h$, the moment-generating function of $X_1$. Then $P[S \le a] \le e^{-at} [M_X(t)]^n$ for $-h \le t < 0$.

10. (Chernoff Mean [1]) Let $X_1, \dots, X_n$ be iid, $\epsilon > 0$, $\bar{X}_n = \frac1n \sum_{i=1}^n X_i$, $M_U(t)$, $-h_U \le t \le h_U$, the moment-generating function of $U = X_1 - EX_1 - \epsilon$, and $M_V(t)$, $-h_V \le t \le h_V$, the moment-generating function of $V = -X_1 + EX_1 - \epsilon$. Then there exist some $0 < t_U \le h_U$ and $-h_V \le t_V < 0$ such that
$$P[|\bar{X}_n - EX_1| > \epsilon] \le 2c^n, \quad \text{where } c = \max\{M_U(t_U), M_V(t_V)\} \in (0, 1). \tag{23}$$
(Such a $t_U$ and $t_V$ exist since $EU < 0$ and $EV < 0$, guaranteeing that $M_U$ and $M_V$ are decreasing in a neighborhood of zero.)

See [15] for an introduction to randomized algorithms, whence we infer the following inequalities. Throughout, Poisson trials $X_1, \dots, X_n$ are independent with each $X_i$ a Bernoulli($p_i$).

1. (Chernoff Poisson Trials I) Let $X_i$ be $n$ independent Poisson trials and $X = \sum_{i=1}^n X_i$. Then for $\delta > 0$,
$$P[X \ge (1+\delta)EX] < \left(\frac{e^\delta}{(1+\delta)^{1+\delta}}\right)^{EX}. \tag{24}$$

2. (Chernoff Poisson Trials II) Let $X_i$ be $n$ independent Poisson trials and $X = \sum_{i=1}^n X_i$. Then for $0 < \delta \le 1$,
$$P[X \ge (1+\delta)EX] < e^{-(EX)\delta^2/3}. \tag{25}$$

3. (Chernoff Poisson Trials III) Let $X_i$ be $n$ independent Poisson trials and $X = \sum_{i=1}^n X_i$. Then for $R \ge 6\,EX$,
$$P[X \ge R] < 2^{-R}. \tag{26}$$

4. (Chernoff Poisson Trials IV) Let $X_i$ be $n$ independent Poisson trials and $X = \sum_{i=1}^n X_i$. Then for $0 < \delta < 1$,
$$P[X \le (1-\delta)EX] < \left(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^{EX}. \tag{27}$$

5. (Chernoff Poisson Trials V) Let $X_i$ be $n$ independent Poisson trials and $X = \sum_{i=1}^n X_i$. Then for $0 < \delta < 1$,
$$P[X \le (1-\delta)EX] < e^{-(EX)\delta^2/2}. \tag{28}$$

6. (Chernoff Rademacher I) Suppose $X_1, \dots, X_n$ are iid such that $P[X_i = 1] = P[X_i = -1] = \frac12$. If $X = \sum_{i=1}^n X_i$ and $a > 0$, then $P[X \ge a] \le e^{-a^2/2n}$.

7. (Chernoff Rademacher II) Suppose $X_1, \dots, X_n$ are iid such that $P[X_i = 1] = P[X_i = -1] = \frac12$. If $X = \sum_{i=1}^n X_i$ and $a > 0$, then $P[|X| \ge a] \le 2e^{-a^2/2n}$.

8. (Chernoff Bernoulli I) Suppose $X_1, \dots, X_n$ are iid Bernoulli$\left(\frac12\right)$. If $X = \sum_{i=1}^n X_i$ and $0 < a < \frac n2$, then $P\left[X \le \frac n2 - a\right] \le e^{-2a^2/n}$.

9. (Chernoff Bernoulli II) Suppose $X_1, \dots, X_n$ are iid Bernoulli$\left(\frac12\right)$. If $X = \sum_{i=1}^n X_i$ and $0 < \delta < 1$, then $P\left[X \le \frac n2 (1-\delta)\right] \le e^{-n\delta^2/4}$.

10. (Chernoff Bernoulli III) Suppose $X_1, \dots, X_n$ are iid Bernoulli$\left(\frac12\right)$. If $X = \sum_{i=1}^n X_i$ and $a > 0$, then $P\left[X \ge \frac n2 + a\right] \le e^{-2a^2/n}$.

11. (Chernoff Bernoulli IV) Suppose $X_1, \dots, X_n$ are iid Bernoulli$\left(\frac12\right)$. If $X = \sum_{i=1}^n X_i$ and $0 < \delta \le 1$, then $P\left[X \ge \frac n2 (1+\delta)\right] \le e^{-n\delta^2/6}$.

The next inequalities treat unimodal distributions and bounded observations.

1. (Gauss [1]) Suppose $X$ follows a unimodal distribution with mode $\nu$, and define $\tau^2 = E(X - \nu)^2$. Then
$$P[|X - \nu| > \epsilon] \le \begin{cases} \dfrac{4\tau^2}{9\epsilon^2}, & \epsilon \ge \dfrac{2\tau}{\sqrt 3}, \\[6pt] 1 - \dfrac{\epsilon}{\tau \sqrt 3}, & \epsilon \le \dfrac{2\tau}{\sqrt 3}. \end{cases} \tag{29}$$

2. (Vysochanskii-Petunin [1]) Suppose $X$ follows a unimodal distribution, and define $\xi^2 = E(X - \alpha)^2$ for arbitrary $\alpha$. Then
$$P[|X - \alpha| > \epsilon] \le \begin{cases} \dfrac{4\xi^2}{9\epsilon^2}, & \epsilon \ge \sqrt{\dfrac83}\,\xi, \\[6pt] \dfrac{4\xi^2}{3\epsilon^2} - \dfrac13, & \epsilon \le \sqrt{\dfrac83}\,\xi. \end{cases} \tag{30}$$

3. (Hoeffding I [14]) Let $Y_1, \dots, Y_n$ be independent observations such that $EY_i = 0$ and $a_i \le Y_i \le b_i$ for all $i$. If $\epsilon > 0$ and $t > 0$, then
$$P\left[\sum_{i=1}^n Y_i \ge \epsilon\right] \le e^{-t\epsilon} \prod_{i=1}^n e^{t^2 (b_i - a_i)^2 / 8}. \tag{31}$$

4. (Hoeffding II [14]) Let $X_1, \dots, X_n$ be independent Bernoulli($p$). If $\epsilon > 0$, then
$$P\left[\left|\sum_{i=1}^n X_i - np\right| \ge \epsilon\right] \le 2e^{-2\epsilon^2/n}. \tag{32}$$

5. (Saw) Suppose $X, X_1, \dots, X_n$ are iid with finite first and second order moments. Let $\bar X = \frac1n \sum_{i=1}^n X_i$ and $S^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar X)^2$. Let $k > 0$, $\nu(t) = \max\left\{m \in \mathbb{N} \mid m < \frac{n+1}{t}\right\}$, $\alpha(t) = \frac{(n+1)(n+1-\nu(t))}{1 + \nu(t)(n+1-\nu(t))}$, and $\beta = \frac{n(n+1)k^2}{n - 1 + (n+1)k^2}$. Then
$$P[|X - \bar X| \ge kS] \le \begin{cases} \dfrac{\nu(\beta) - 1}{n+1}, & \text{if } \nu(\beta) \text{ is odd and } \beta > \alpha(\beta), \\[6pt] \dfrac{\nu(\beta)}{n+1}, & \text{otherwise}. \end{cases} \tag{33}$$
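The Rademacher bound $P[X \ge a] \le e^{-a^2/2n}$ above is easy to probe by simulation. A sketch assuming only numpy; $n = 100$, the thresholds, and the trial count are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(2)

n, trials = 100, 100_000
# Sums of n iid Rademacher (+1/-1) variables, one sum per trial.
x = (2 * rng.integers(0, 2, size=(trials, n)) - 1).sum(axis=1)

for a in (10, 20, 30):
    empirical = (x >= a).mean()
    bound = np.exp(-a**2 / (2 * n))  # Chernoff Rademacher I
    print(f"a = {a:2d}: empirical tail {empirical:.5f} <= bound {bound:.5f}")
```

The slack is roughly the Gaussian-versus-exponential gap: the true tail behaves like the normal tail at $a/\sqrt{n}$, which carries an extra $1/t$ factor (compare Normal I above).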
Kannan [16] furnishes an array of inequalities helpful in analyzing graphs and other objects of combinatorial import.

1. (Chromatic Number) Let $G(\{1, \dots, n\}, P)$ be a random graph with edge probabilities $P = (p_{ij})$. The chromatic number $\chi = \chi(G)$ is the least number of colors necessary to color $G$ such that no two vertices sharing an edge receive the same color. Let $p = \binom{n}{2}^{-1} \sum_{i<j} p_{ij}$. Then there exists a constant $c > 0$ such that for $t \in (0, n\sqrt p)$,
$$P[|\chi(G) - E\chi(G)| \ge t] \le e^{-\frac{ct^2}{n \sqrt p\, \log n}}. \tag{34}$$

2. (Johnson-Lindenstrauss Random Projection) Suppose $k \le n$, and we pick $v = (v_1, \dots, v_n)$ uniformly at random from the surface of the unit ball in $\mathbb{R}^n$. Then for $\epsilon \in (0, 1)$, there exist constants $c_1, c_2 > 0$ such that
$$P\left[\left|\sum_{i=1}^k v_i^2 - \frac kn\right| \ge \frac{\epsilon k}{n}\right] \le c_1 e^{-c_2 \epsilon^2 k}. \tag{35}$$

3. (Random Projection) Suppose $m$ is an even positive integer and $X_1, \dots, X_n$ are real-valued random observations satisfying the strong negative correlation principle; that is, for all $i$, $E\left[X_i (X_1 + \cdots + X_{i-1})^l\right] < 0$ when $l < m$ is odd, and $E(X_i^l \mid X_1 + \cdots + X_{i-1}) \le \left(\frac nm\right)^{l-2} l!$ for $l \le m$ even. Define constants $\{M_{i,l}\}$, $\{K_{i,l}\}$, and $\{L_{i,l}\}$ such that $E(X_i^l \mid X_1 + \cdots + X_{i-1}) \le M_{i,l}$, each $K_{i,l}$ is an indicator variable on the typical case of the conditional expectation with $P[K_{i,l}] = 1 - \delta_{i,l}$, and $E(X_i^l \mid X_1 + \cdots + X_{i-1}, K_{i,l}) \le L_{i,l}$ for $l = 2, 4, \dots, m$ and $i = 1, \dots, n$. Finally, let $X = \sum_{i=1}^n X_i$. Then
$$EX^m \le (cm)^{m+1} \left(\sum_{l=1}^{m/2} m^{-l}\, l \left(\sum_{i=1}^n L_{i,2l}\right)^l\right)^m + (cm)^{m+2} \sum_{l=1}^{m/2} \frac{1}{nl} \sum_{i=1}^n \left(n M_{i,2l}\, \delta_{i,2l}^{2/(m-2l+2)}\right)^{m/2l}. \tag{36}$$

4. (Bin Packing) Suppose $Y_1, \dots, Y_n$ are iid from a discrete distribution of $r$ atoms, each with probability at least $\frac1n$, and $EY_1 \le r \log n$. Let $f(Y_1, \dots, Y_n)$ be the minimum number of unit-capacity bins necessary to pack the items $Y_1, \dots, Y_n$. Then there exist constants $c_1, c_2 > 0$ such that if $t \in \left(0,\ n\left[(EY_i)^3 + \mathrm{Var}(Y_i)\right]\right)$, then
$$P[|f - Ef| \ge t + r] \le c_1 e^{-\frac{c_2 t^2}{n\left[(EY_i)^3 + \mathrm{Var}(Y_i)\right]}}. \tag{37}$$

5. (Strong Negative Correlation) Suppose $m$ is an even positive integer, and $X_1, \dots, X_n$ are real-valued random observations satisfying the strong negative correlation principle of the previous item. Then
$$E\left(\sum_{i=1}^n X_i\right)^m \le (24mn)^m. \tag{38}$$

6. (Hamiltonian Tour) Suppose $Y_1, \dots, Y_n$ are sets of points generated independently and respectively from the $n$ subsquares of size $\frac{1}{\sqrt n} \times \frac{1}{\sqrt n}$ of the unit square, and there exists a constant $c_0 \in (0, 1)$ such that $P[|Y_i| = 0] \le c_0$ for all $i$. Suppose further that for $\epsilon > 0$ and $l \in \{1, \dots, m\}$, $E|Y_i|^l \le [O(l)]^{(2-\epsilon)l}$. Finally, suppose $f(Y_1, \dots, Y_n)$ is the length of the shortest Hamiltonian tour through $Y_1 \cup \cdots \cup Y_n$. Then
$$E\left[f(Y_1, \dots, Y_n) - Ef(Y_1, \dots, Y_n)\right]^m \le (cm)^m. \tag{39}$$

7. (MST) Suppose $Y_1, \dots, Y_n$ are sets of points generated independently and respectively from the $n$ subsquares of size $\frac{1}{\sqrt n} \times \frac{1}{\sqrt n}$ of the unit square, and there exists a constant $c_0 \in (0, 1)$ such that $P[|Y_i| = 0] \le c_0$ for all $i$. Suppose further that for $\epsilon > 0$ and $l \in \{1, \dots, m\}$, $E|Y_i|^l \le [O(l)]^{(2-\epsilon)l}$. Finally, suppose $f(Y_1, \dots, Y_n)$ is the length of a minimum spanning tree of $Y_1 \cup \cdots \cup Y_n$. Then
$$E\left[f(Y_1, \dots, Y_n) - Ef(Y_1, \dots, Y_n)\right]^m \le (cm)^m. \tag{40}$$

8. (Random Vector) Suppose $Y = (Y_1, \dots, Y_n)$ is a random vector such that for a fixed $k \le n$, $E(Y_i \mid Y_1, \dots, Y_{i-1})$ is a nondecreasing function of $Y_1 + \cdots + Y_{i-1}$ for $i = 1, \dots, k$, and for even $l \le k$ there exists a $c > 0$ such that $E(Y_i^l \mid Y_1, \dots, Y_{i-1}) \le \left(\frac{cl}{n}\right)^l$. Then for any even $m \le k$,
$$E\left(\sum_{i=1}^k (Y_i - EY_i)\right)^m \le \left(\sqrt{\frac{cmk}{n}}\right)^m. \tag{41}$$
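The concentration in (35) can be simulated directly: a uniform point on the unit sphere is a normalized Gaussian vector, a standard trick. The sketch below assumes only numpy; $n$, $k$, $\epsilon$, and the trial count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)

n, k, trials, eps = 1_000, 100, 20_000, 0.5
g = rng.standard_normal((trials, n))
v = g / np.linalg.norm(g, axis=1, keepdims=True)  # uniform on the unit sphere in R^n

sq_proj = (v[:, :k] ** 2).sum(axis=1)             # squared length of the k-coordinate projection
deviation = np.abs(sq_proj - k / n) >= eps * k / n

print(f"mean of |proj|^2 = {sq_proj.mean():.4f} (target k/n = {k / n})")
print(f"P[|proj^2 - k/n| >= eps * k/n] ~= {deviation.mean():.5f}")
```

This is the heart of the Johnson-Lindenstrauss argument: the squared projection length concentrates sharply at $k/n$, so a random $k$-dimensional projection nearly preserves pairwise distances after rescaling.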
The inequalities to follow furnish mechanisms for the analysis of interdependence, Markov chains, vectors, and graphs, among others. (Recall that a Lipschitz function $f$ satisfies $|f(x) - f(y)| \le M \|x - y\|$ for all $x, y$ in the domain of $f$.)

1. (Talagrand [17]) Let $X$ be chosen uniformly at random from $\{-1, 1\}^n$, let $A$ be a convex subset of $\mathbb{R}^n$, and let $A_t = \{p \in \mathbb{R}^n \mid \mathrm{dist}(p, A) \le t\}$. Then there exists $c > 0$ such that $P[X \in A]\, P[X \notin A_t] \le e^{-ct^2}$ for all $t > 0$.

2. (Talagrand Large Deviation [17]) Let $X$ be chosen uniformly at random from $\{-1, 1\}^n$, and let $V$ be a $d$-dimensional subspace of $\mathbb{R}^n$. Then there exist constants $c, C > 0$ such that $P\left[\left|\mathrm{dist}(X, V) - \sqrt{n - d}\right| \ge t\right] \le C e^{-ct^2}$ for all $t > 0$.

3. (Gaussian for Lipschitz [17]) Let $X$ be an $n$-dimensional random vector such that each $X_i$ is an independent $N(0, 1)$ variable. If $f : \mathbb{R}^n \to \mathbb{R}$ is a Lipschitz function with constant 1, then there exists a constant $c > 0$ such that $P[|f(X) - Ef(X)| \ge t] \le 2e^{-ct^2}$ for all $t > 0$.

4. (Azuma [18]) Suppose $X_0, X_1, \dots, X_n$ is a martingale ($E(X_i \mid X_0, \dots, X_{i-1}) = X_{i-1}$ for $i = 1, \dots, n$); suppose further that $X$ is $c$-Lipschitz ($|X_i - X_{i-1}| \le c_i$ for $i = 1, \dots, n$, $c \in \mathbb{R}^n$ positive). Then
$$P[X_n - X_0 \ge \lambda] \le e^{-\frac{\lambda^2}{2 \sum_{i=1}^n c_i^2}}. \tag{42}$$

5. (Bennett [2]) Let $X_1, \dots, X_n$ be independent random variables of zero mean such that $P[X_i \le 1] = 1$. Let $h(u) = (1+u)\log(1+u) - u$ for $u \ge 0$ and $\sigma^2 = \frac1n \sum_{i=1}^n \mathrm{Var}(X_i)$. Then for $t > 0$,
$$P\left[\sum_{i=1}^n X_i > t\right] \le e^{-n\sigma^2 h\left(\frac{t}{n\sigma^2}\right)}. \tag{43}$$

6. (Bernstein [2]) Let $X_1, \dots, X_n$ be independent random variables of zero mean such that $P[X_i \le 1] = 1$. Let $\sigma^2 = \frac1n \sum_{i=1}^n \mathrm{Var}(X_i)$. Then for $\epsilon > 0$,
$$P\left[\frac1n \sum_{i=1}^n X_i > \epsilon\right] \le e^{-\frac{n\epsilon^2}{2(\sigma^2 + \epsilon/3)}}. \tag{44}$$

7. (McDiarmid Bounded Differences I [19]) Let $X_1, \dots, X_n$ be independent random variables, each with domain $\mathcal{X}$. If $f : \mathcal{X}^n \to \mathbb{R}$ is a function such that for all $x \in \mathcal{X}^n$, $y \in \mathcal{X}$, and $i \in \{1, \dots, n\}$ there exists a constant $c_i > 0$ such that $|f(x) - f(x_1, \dots, x_{i-1}, y, x_{i+1}, \dots, x_n)| \le c_i$, then
$$P[f(X) - Ef(X) \ge t] \le e^{-\frac{2t^2}{\sum_{i=1}^n c_i^2}} \quad \text{for all } t > 0. \tag{45}$$

8. (McDiarmid Bounded Differences II [19]) Under the same hypotheses,
$$P[f(X) - Ef(X) \le -t] \le e^{-\frac{2t^2}{\sum_{i=1}^n c_i^2}} \quad \text{for all } t > 0. \tag{46}$$

9. (Dvoretzky-Kiefer-Wolfowitz I [20]) Suppose $X_1, \dots, X_n$ are iid univariate random variables following cdf $F$. Let $F_n(x) = \frac1n \sum_{i=1}^n 1\{X_i \le x\}$ be the empirical distribution. Then for $\epsilon \ge \sqrt{\frac{1}{2n} \log 2}$,
$$P\left[\sup_{x \in \mathbb{R}} \left(F_n(x) - F(x)\right) > \epsilon\right] \le e^{-2n\epsilon^2}. \tag{47}$$

10. (Dvoretzky-Kiefer-Wolfowitz II [20]) Suppose $X_1, \dots, X_n$ are iid univariate random variables following cdf $F$. Let $F_n(x) = \frac1n \sum_{i=1}^n 1\{X_i \le x\}$ be the empirical distribution. Then for $\epsilon > 0$,
$$P\left[\sup_{x \in \mathbb{R}} |F_n(x) - F(x)| > \epsilon\right] \le 2e^{-2n\epsilon^2}. \tag{48}$$

11. (Etemadi Differing Means [21]) Let $X_1, \dots, X_n$ be independent random variables with common support. Let $S_k = \sum_{i=1}^k X_i$ be the $k$th partial sum. Then for $\epsilon > 0$,
$$P\left[\max_{1 \le k \le n} |S_k| \ge 3\epsilon\right] \le 3 \max_{1 \le k \le n} P[|S_k| \ge \epsilon]. \tag{49}$$

12. (Etemadi Shared Means [21]) Let $X_1, \dots, X_n$ be independent random variables with common support and equal means. Let $S_k = \sum_{i=1}^k X_i$ be the $k$th partial sum. Then for $\epsilon > 0$,
$$P\left[\max_{1 \le k \le n} |S_k| \ge \epsilon\right] \le \frac{\mathrm{Var}(S_n)}{\epsilon^2}. \tag{50}$$

13. (Kolmogorov [22]) Let $X_1, \dots, X_n$ be independent random variables with common support such that $EX_i = 0$ and $\mathrm{Var}(X_i) < \infty$ for $i = 1, \dots, n$. Let $S_k = \sum_{i=1}^k X_i$ be the $k$th partial sum. Then for $\epsilon > 0$,
$$P\left[\max_{1 \le k \le n} |S_k| \ge \epsilon\right] \le \frac{1}{\epsilon^2} \sum_{i=1}^n \mathrm{Var}(X_i). \tag{51}$$

14. (Chebychev Multidimensional [23]) Let $X \in \mathbb{R}^n$ be a random vector with covariance matrix $V = E\left[(X - EX)(X - EX)^T\right]$. Then for $t > 0$,
$$P\left[\sqrt{(X - EX)^T V^{-1} (X - EX)} \ge t\right] \le \frac{n}{t^2}. \tag{52}$$

15. (Laguerre-Samuelson [24]) Let $X_1, \dots, X_n$ be random variables with common support, and define $\bar X = \frac1n \sum_{i=1}^n X_i$ and $S^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar X)^2$. Then for $i = 1, \dots, n$, with probability one,
$$\bar X - S\sqrt{n-1} \le X_i \le \bar X + S\sqrt{n-1}. \tag{53}$$

16. (LeCam [25]) Suppose $X_1, \dots, X_n$ are independent Bernoulli random variables with respective success parameters $p_1, \dots, p_n$. Letting $\lambda_n = \sum_{i=1}^n p_i$, we have
$$\sum_{k=0}^\infty \left|P\left[\sum_{i=1}^n X_i = k\right] - \frac{\lambda_n^k e^{-\lambda_n}}{k!}\right| \le 2 \sum_{i=1}^n p_i^2. \tag{54}$$

17. (Doob Martingale [26]) Let $X_1, \dots, X_n$ be a martingale ($E(X_i \mid X_1, \dots, X_{i-1}) = X_{i-1}$ for $i = 2, \dots, n$). Then for $C > 0$ and $p \ge 1$,
$$P\left[\sup_{1 \le i \le n} X_i \ge C\right] \le \frac{E|X_n|^p}{C^p}. \tag{55}$$
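Among these, the Dvoretzky-Kiefer-Wolfowitz bound (48) is pleasantly tight for moderate $\epsilon$. A sketch assuming only numpy; the exponential distribution, $n = 500$, and $\epsilon = 0.05$ are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(4)

n, trials, eps = 500, 5_000, 0.05
x = np.sort(rng.exponential(size=(trials, n)), axis=1)

# sup_x |F_n - F| is attained just before or after a jump of F_n, so it
# suffices to compare i/n and (i-1)/n against F at the order statistics.
grid = np.arange(1, n + 1) / n
F = 1.0 - np.exp(-x)  # true exponential cdf evaluated at the order statistics
ks = np.maximum(np.abs(grid - F), np.abs(grid - 1.0 / n - F)).max(axis=1)

empirical = (ks > eps).mean()
bound = 2 * np.exp(-2 * n * eps**2)
print(f"P[sup|F_n - F| > eps] ~= {empirical:.4f} <= DKW bound {bound:.4f}")
```

With these parameters the bound is roughly 0.164, and the empirical frequency typically lands just below it; the constant 2 in (48) is sharp.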
References

[1] G. Casella and R. Berger, Statistical Inference. Duxbury, 2002.

[2] S. Boucheron, O. Bousquet, and G. Lugosi, "Concentration inequalities," Advanced Lectures on Machine Learning. Springer, 2004.

[3] L. Wasserman, "Lecture on probability inequalities," 2008.

[4] J. Duchi, "Probability bounds."

[5] R. Bhatia and C. Davis, "A better bound on the variance," American Mathematical Monthly (Mathematical Association of America), 2000.

[6] V. Cirtoaje, "Two generalizations of Popoviciu's inequality," Crux Mathematicorum, 2001.

[7] D. Chapman and H. Robbins, "Minimum variance estimation without regularity assumptions," Annals of Mathematical Statistics, 1951.

[8] A. Dembo, T. M. Cover, and J. A. Thomas, "Information-theoretic inequalities," IEEE Transactions on Information Theory, 1991.

[9] J. Marcinkiewicz and A. Zygmund, "Sur les fonctions indépendantes," Fundamenta Mathematicae, 1937.

[10] T. Wolff, Lectures on Harmonic Analysis. AMS, 2003.

[11] H. Rosenthal, "On the subspaces of $L^p$ ($p > 2$) spanned by sequences of independent random variables," Israel Journal of Mathematics, 1970.

[12] N. Papadatos, "Maximum variance of order statistics," 1994.

[13] W. Hürlimann, "Generalized algebraic bounds on order statistics functions, with application to reinsurance and catastrophe."

[14] L. Wasserman, All of Statistics. Springer, 2004.

[15] M. Mitzenmacher and E. Upfal, Probability and Computing. Cambridge, 2005.

[16] R. Kannan, "A new probability inequality using typical moments and concentration results," 2009.

[17] T. Tao, "Talagrand's concentration inequality," http://terrytao.wordpress.com/2009/06/09/talagrands-concentration-inequality, 2009.

[18] F. Chung and L. Lu, Complex Graphs and Networks. AMS, 2006.

[19] P. Bartlett, "Lecture on concentration inequalities," 2008.

[20] A. Dvoretzky, J. Kiefer, and J. Wolfowitz, "Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator," Annals of Mathematical Statistics, 1956.

[21] N. Etemadi, "On some classical results in probability theory," Sankhyā, Series A, 1985.

[22] P. Billingsley, Probability and Measure. John Wiley, 1995.

[23] R. Vershynin, High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge, 2018.

[24] P. Samuelson, "How deviant can you be?," Journal of the American Statistical Association, 1968.

[25] L. LeCam, "An approximation theorem for the Poisson binomial distribution," Pacific Journal of Mathematics, 1960.

[26] D. Revuz and M. Yor, Continuous Martingales and Brownian Motion. Springer, 1999.