Concentration of norms of random vectors with independent p-sub-exponential coordinates
Krzysztof Zajkowski
Institute of Mathematics, University of Bialystok, Ciolkowskiego 1M, 15-245 Bialystok, Poland
[email protected]
Abstract
We present examples of $p$-sub-exponential random variables for any positive $p$. We prove two types of concentration of the standard $p$-norms (the 2-norm is the Euclidean norm) of random vectors with independent $p$-sub-exponential coordinates around the Lebesgue $L^p$-norms of these $p$-norms of random vectors. In the first case, $p \ge 1$, our estimates depend on the dimension $n$ of the random vectors. In the second one, for $p \ge 2$ and under an additional assumption, we get an estimate that does not depend on $n$. In other words, we generalize some known concentration results in the Euclidean case to the $p$-norms of random vectors with independent $p$-sub-exponential coordinates.

Key words: concentration inequalities, sub-gaussian and sub-exponential random variables, Orlicz (Luxemburg) norms, convex conjugates, Bernstein inequalities.
1 Introduction

Let $p$ be a positive number. We consider random variables with $p$-sub-exponential tail decay, i.e., random variables $X$ for which there exist two positive constants $c, C$ such that
\[ P(|X| \ge t) \le c \exp\big(-(t/C)^p\big) \quad \text{for all } t \ge 0. \]
Such random variables will be called $p$-sub-exponential.

Example 1.1. The exponentially distributed random variable $X \sim Exp(1)$ has exponential tail decay, that is, $P(X \ge t) = \exp(-t)$. It is an example of a random variable with 1-sub-exponential tail decay; $c = C = 1$. Consider the random variable $Y_p = \theta X^{1/p}$ for some $\theta > 0$. Observe that for $t \ge 0$,
\[ P(Y_p \ge t) = P\big(\theta X^{1/p} \ge t\big) = P\big(X \ge (t/\theta)^p\big) = \exp\big(-(t/\theta)^p\big). \]
The random variable $Y_p$ has $p$-sub-exponential tail decay; $c = 1$ and $C = \theta$. Let us note that $Y_p$ has the Weibull distribution with shape parameter $p$ and scale parameter $\theta$. One can say that random variables with Weibull distributions form model examples of r.v.s with $p$-sub-exponential tail decay.

Because it is known that random variables with the Poisson and the geometric distributions have 1-sub-exponential tail decay, in a similar way as above we can form other families of $p$-sub-exponential random variables for any $p > 0$.
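The tail identity of Example 1.1 is easy to confirm by simulation. The sketch below is an illustration only; the parameter values are arbitrary choices, and the empirical tail of $Y_p = \theta X^{1/p}$ is compared with $\exp(-(t/\theta)^p)$.

```python
import math
import random

# Monte Carlo sanity check (illustration only) of Example 1.1: for X ~ Exp(1)
# and Y_p = theta * X**(1/p), we should have P(Y_p >= t) = exp(-(t/theta)**p).
# The values p = 3 and theta = 2 below are arbitrary choices.

random.seed(1)
p, theta = 3.0, 2.0
samples = [theta * random.expovariate(1.0) ** (1.0 / p) for _ in range(200000)]

for t in (1.0, 2.0, 3.0):
    empirical = sum(s >= t for s in samples) / len(samples)
    exact = math.exp(-((t / theta) ** p))
    assert abs(empirical - exact) < 0.01   # within Monte Carlo accuracy
```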
Example 1.2. Let $g$ denote a random variable with the standard normal distribution. It is known that the tails of such variables satisfy the estimate $P(|g| \ge t) \le \exp(-t^2/2)$ for $t \ge 0$. Taking $Y_p = \theta|g|^{2/p}$, by the above estimate we get
\[ P(Y_p \ge t) = P\big(\theta|g|^{2/p} \ge t\big) = P\big(|g| \ge (t/\theta)^{p/2}\big) \le \exp\Big(-\big[t/(2^{1/p}\theta)\big]^p\Big). \]
In other words, we obtain another family of r.v.s with $p$-sub-exponential tail decay; $c = 1$ and $C = 2^{1/p}\theta$.

Define now a symmetric random variable $g_p$ such that $|g_p| = |g|^{2/p}$. One can calculate that its density function has the form
\[ f_p(x) = \frac{p}{2\sqrt{2\pi}}\,|x|^{p/2-1}e^{-|x|^p/2}. \]
Let us emphasize that for $p = 2$ we get the density of the standard normal distribution. Observe that $E g_p = 0$ and $E|g_p|^p = E g^2 = 1$. For any $p > 0$ we will call $g_p$ the standard $p$-normal ($p$-gaussian) random variable and write $g_p \sim N_p(0,1)$, where the second parameter denotes the $p$-th moment of $g_p$.

The $p$-sub-exponential random variables can be characterized by finiteness of the $\psi_p$-norms defined as follows:
\[ \|X\|_{\psi_p} := \inf\big\{K > 0 :\ E\exp(|X/K|^p) \le 2\big\}; \]
according to the standard convention $\inf \emptyset = \infty$. We will call the above functional the $\psi_p$-norm, but let us emphasize that it is a norm only for $p \ge 1$; for $0 < p < 1$ it is a quasi-norm. The $p$-sub-exponential random variables $X$ satisfy the following $p$-sub-exponential tail decay:
\[ P(|X| \ge t) \le 2\exp\big(-(t/\|X\|_{\psi_p})^p\big); \]
see for instance [6, Lem. 2.1].
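The density formula for $g_p$ in Example 1.2 can be sanity-checked numerically. The following sketch is an illustration only, not part of the argument: it integrates $f_p$ for $p = 4$ with a composite Simpson rule and confirms total mass 1 and $p$-th absolute moment 1.

```python
import math

# Numerical check (illustration only): for p = 4, the density
# f_p(x) = p/(2*sqrt(2*pi)) * |x|**(p/2 - 1) * exp(-|x|**p / 2)
# of g_p ~ N_p(0, 1) should integrate to 1 and should satisfy E|g_p|**p = 1.
# (p = 4 avoids the integrable singularity at 0 that appears for p < 2.)

def f_p(x: float, p: float) -> float:
    """Density of the standard p-normal random variable g_p."""
    if x == 0.0:
        return 0.0
    return (p / (2.0 * math.sqrt(2.0 * math.pi))
            * abs(x) ** (p / 2.0 - 1.0) * math.exp(-abs(x) ** p / 2.0))

def simpson(fn, a: float, b: float, n: int = 20000) -> float:
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    s = fn(a) + fn(b)
    for k in range(1, n):
        s += fn(a + k * h) * (4 if k % 2 else 2)
    return s * h / 3.0

p = 4.0
mass = simpson(lambda x: f_p(x, p), -6.0, 6.0)    # tail beyond |x| = 6 is negligible
moment = simpson(lambda x: abs(x) ** p * f_p(x, p), -6.0, 6.0)
assert abs(mass - 1.0) < 1e-3
assert abs(moment - 1.0) < 1e-3
```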
For $x = (x_i)_{i=1}^n \in \mathbb{R}^n$ and $p \ge 1$, let $|x|_p$ denote the $p$-norm of $x$, i.e., $|x|_p = (\sum_{i=1}^n |x_i|^p)^{1/p}$. For a random variable $X$, by $\|X\|_{L^p}$ we will denote the Lebesgue norm of $X$, i.e., $\|X\|_{L^p} = (E|X|^p)^{1/p}$.

From now on let $X = (X_i)_{i=1}^n$ denote a random vector with real coordinates. We will be interested in the concentration of the norm $|X|_p$ around $\||X|_p\|_{L^p} = (\sum_{i=1}^n E|X_i|^p)^{1/p}$ in spaces of $p$-sub-exponential random variables. In other words, we will be interested in an estimate of the norm $\big\||X|_p - \||X|_p\|_{L^p}\big\|_{\psi_p}$.

I owe the first result of this type to the anonymous reviewer of the previous version of this paper, to whom I hereby express my thanks.

Proposition 1.3.
Let $p \ge 1$ and $X = (X_1, \ldots, X_n) \in \mathbb{R}^n$ be a random vector with independent $p$-sub-exponential coordinates. Then
\[ \big\||X|_p - \||X|_p\|_{L^p}\big\|_{\psi_p} \le n^{1/(2p)}\,C^{1/p} K_p, \]
where $K_p = \max_{1\le i\le n}\|X_i\|_{\psi_p}$ and $C$ is some universal constant.

Let us emphasize that, for $p \ge 2$, we can remove the factor $n^{1/(2p)}$ on the right hand side, but under the additional assumption that the $p$-th moments of the coordinates are the same, i.e., $E|X_1|^p = E|X_i|^p$, $i = 2, 3, \ldots, n$. Let us note that then $\||X|_p\|_{L^p} = n^{1/p}\|X_1\|_{L^p}$. The main theorem of this paper is the following.

Theorem 1.4.
Let $p \ge 2$ and $X = (X_1, \ldots, X_n) \in \mathbb{R}^n$ be a random vector with independent $p$-sub-exponential coordinates $X_i$ that satisfy $E|X_1|^p = E|X_i|^p$, $i = 2, 3, \ldots, n$. Then
\[ \big\||X|_p - n^{1/p}\|X_1\|_{L^p}\big\|_{\psi_p} \le 2^{1/p} C \Big(\frac{K_p}{\|X_1\|_{L^p}}\Big)^{p-1} K_p, \]
where $K_p := \max_{1\le i\le n}\|X_i\|_{\psi_p}$ and $C$ is a universal constant.

Remark 1.5.
The above theorem is a generalization of the concentration of the $\psi_2$-norm of random vectors with independent sub-gaussian coordinates (see Vershynin [5, Th. 3.1.1]) to the case of the $\psi_p$-norm of vectors with $p$-sub-exponential coordinates, for $p \ge 2$. We begin with some properties of $p$-sub-exponential random variables (Section 2).

2 p-sub-exponential random variables

The $p$-sub-exponential random variables are characterized by the following lemma, whose proof, for $p \ge 1$, one can find in [6, Lem. 2.1]. Let us emphasize that this proof is valid for any positive $p$.

Lemma 2.1. Let $X$ be a random variable and $p > 0$. There exist positive constants $K, L, M$ such that the following conditions are equivalent:
1. $E\exp(|X/K|^p) \le 2$ ($K \ge \|X\|_{\psi_p}$);
2. $P(|X| \ge t) \le 2\exp(-(t/L)^p)$ for all $t \ge 0$;
3. $E|X|^\alpha \le M^\alpha\,\Gamma\big(\frac{\alpha}{p} + 1\big)$ for all $\alpha > 0$.
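For the Weibull variable $Y = X^{1/p}$ of Example 1.1 (with $\theta = 1$), the moments can be computed exactly: $E|Y|^\alpha = E X^{\alpha/p} = \Gamma(\alpha/p + 1)$, so condition 3 above holds with $M = 1$. A short numerical confirmation (illustration only, not part of the argument):

```python
import math

# Illustration of condition 3 in Lemma 2.1 for Y = X**(1/p) with X ~ Exp(1):
# E|Y|**alpha = int_0^inf x**(alpha/p) * exp(-x) dx = Gamma(alpha/p + 1),
# i.e. the moment condition holds with M = 1. We compare a midpoint-rule
# approximation of the integral with math.gamma.

def moment_integral(s: float, upper: float = 50.0, n: int = 200000) -> float:
    """Midpoint-rule approximation of int_0^upper x**s * exp(-x) dx."""
    h = upper / n
    total = 0.0
    for k in range(n):
        x = (k + 0.5) * h
        total += x ** s * math.exp(-x)
    return total * h

p = 3.0
for alpha in (0.5, 1.0, 2.0, 4.5):
    lhs = moment_integral(alpha / p)          # E|Y|**alpha
    rhs = math.gamma(alpha / p + 1.0)         # M**alpha * Gamma(alpha/p + 1), M = 1
    assert abs(lhs - rhs) < 1e-4
```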
Remark 2.2. The definition of the $\psi_p$-norm is based on condition 1. Let us notice that if condition 2 is satisfied with some constant $L$ then $\|X\|_{\psi_p} \le 3^{1/p}L$ (compare [6, Rem. 2.2]).

Let $L^0$ denote the space of all random variables defined on a given probability space. By $L_{\psi_p}$ we will denote the space of random variables with finite $\psi_p$-norm:
\[ L_{\psi_p} := \{X \in L^0 : \|X\|_{\psi_p} < \infty\}. \]
For $\psi_p$-norms one can formulate the following

Lemma 2.3.
Let $p, r > 0$ and $X \in L_{\psi_{pr}}$. Then $|X|^p \in L_{\psi_r}$ and $\||X|^p\|_{\psi_r} = \|X\|^p_{\psi_{pr}}$.

Proof. Let $K = \|X\|_{\psi_{pr}} > 0$. Then
\[ 2 = E\exp\big(|X/K|^{pr}\big) = E\exp\big(\big||X|^p/K^p\big|^r\big), \]
which is equivalent to the conclusion of the lemma.

Let us emphasize that if we know the moment generating function of a given random variable $|X|$ then we can calculate the $\psi_p$-norm of $|X|^{1/p}$.

Example 2.4.
Let $X \sim Exp(1)$. The moment generating function of $X$ equals $E\exp(tX) = 1/(1-t)$ for $t < 1$. Let us observe that
\[ E\exp(X/K) = \frac{1}{1-1/K} \le 2 \quad \text{if and only if} \quad K \ge 2. \]
It means that $\|X\|_{\psi_1} = 2$. In consequence, Weibull distributed random variables, with shape parameter $p$ and scale parameter $\theta$, have the $\psi_p$-norms
\[ \big\|\theta X^{1/p}\big\|_{\psi_p} = \theta\,\|X\|^{1/p}_{\psi_1} = 2^{1/p}\theta. \]
Let us note that, starting with the moment generating function of $g^2$, which has the form $(1-2t)^{-1/2}$ ($t < 1/2$), one can analogously check that $\|g^2\|_{\psi_1} = \|g\|^2_{\psi_2} = 8/3$ and, in consequence, $\|g_p\|_{\psi_p} = (8/3)^{1/p}$.

Let us notice that by Jensen's inequality we get, for $p \ge 1$, that the $\psi_p$-norm of the expected value of a $p$-sub-exponential random variable is not greater than the $\psi_p$-norm of this random variable itself, since
\[ 2 = E\exp\Big(\big|X/\|X\|_{\psi_p}\big|^p\Big) \ge \exp\Big(E\big|X/\|X\|_{\psi_p}\big|^p\Big) \ge \exp\Big(\big|E X/\|X\|_{\psi_p}\big|^p\Big) = E\exp\Big(\big|E X/\|X\|_{\psi_p}\big|^p\Big), \]
which means that $\|E X\|_{\psi_p} \le \|X\|_{\psi_p}$. In consequence, by the triangle inequality, for a $p$-sub-exponential random variable we have
\[ \|X - E X\|_{\psi_p} \le 2\|X\|_{\psi_p} \quad (p \ge 1). \tag{1} \]

1-sub-exponential (simply sub-exponential) random variables will play a special role in our considerations. A sub-exponential random variable $X$ with mean zero can be defined by finiteness of the $\tau_\varphi$-norm, i.e.,
\[ \tau_\varphi(X) = \inf\big\{K > 0 :\ \ln E\exp(tX) \le \varphi_\infty(Kt) \ \text{for all } t \in \mathbb{R}\big\} < \infty, \]
where $\varphi_\infty(x) = x^2/2$ if $|x| \le 1$ and $\varphi_\infty(x) = \infty$ otherwise; see the definition of the $\tau_{\varphi_p}$-norm in [6], compare Vershynin [5, Prop. 2.7.1]. Let us emphasize that the norms $\|\cdot\|_{\psi_1}$ and $\tau_\varphi(\cdot)$ are equivalent on the space of centered sub-exponential random variables (compare [6, Th. 2.7]).

Example 2.5. If $X$ is an exponentially distributed random variable with parameter 1 then $E X = 1$. Let us note that the cumulant generating function of $X - E X = X - 1$ equals
\[ C_{X-1}(t) := \ln E\exp\big(t(X-1)\big) = -t - \ln(1-t) \quad (t < 1). \]
Since $C_{X-1}(0) = 0$ and $C'_{X-1}(0) = 0$, by the Taylor formula we get
\[ C_{X-1}(t) = \frac{1}{2}C''_{X-1}(\theta_t t)\,t^2 \quad (|t| < 1) \tag{2} \]
for some $\theta_t \in (0,1)$. Moreover $C''_{X-1}(t) = 1/(1-t)^2$, which is an increasing function for $|t| < 1$. Let us observe now that $\varphi_\infty(Kt) = K^2t^2/2$ for $|t| \le 1/K$ and $\infty$ otherwise. By (2), the infimum of those $K$ for which $C_{X-1}(t) \le \varphi_\infty(Kt)$ satisfies the equation $C''_{X-1}(1/K) = K^2$. This means that the solution $K$ of the equation
\[ \frac{1}{(1-1/K)^2} = K^2 \]
is the $\tau_\varphi$-norm of $X - 1$: $\tau_\varphi(X-1) = 2$.

In the following lemma it is shown that sub-exponential random variables possess the approximate rotation invariance property.
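The norm values obtained in Examples 2.4 and 2.5 can be recovered numerically by bisection on their defining equations. A minimal sketch (illustration only):

```python
# Bisection checks (illustration only) of the values derived above:
#   psi_1-norm of X ~ Exp(1):  E exp(X/K) = 1/(1 - 1/K)        = 2  ->  K = 2
#   psi_1-norm of g**2:        E exp(g**2/K) = (1 - 2/K)**-0.5 = 2  ->  K = 8/3
#   tau_phi-norm of X - 1:     1/(1 - 1/K)**2 = K**2                ->  K = 2

def bisect(fn, lo: float, hi: float, tol: float = 1e-12) -> float:
    """Root of fn on [lo, hi]; fn(lo) and fn(hi) must differ in sign."""
    flo = fn(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (fn(mid) > 0) == (flo > 0):
            lo, flo = mid, fn(mid)
        else:
            hi = mid
    return 0.5 * (lo + hi)

psi1_exp = bisect(lambda K: 1.0 / (1.0 - 1.0 / K) - 2.0, 1.5, 10.0)
psi1_gsq = bisect(lambda K: (1.0 - 2.0 / K) ** -0.5 - 2.0, 2.5, 10.0)
tau_exp = bisect(lambda K: 1.0 / (1.0 - 1.0 / K) ** 2 - K ** 2, 1.5, 10.0)

assert abs(psi1_exp - 2.0) < 1e-6       # ||X||_{psi_1} = 2
assert abs(psi1_gsq - 8.0 / 3.0) < 1e-6  # ||g^2||_{psi_1} = 8/3
assert abs(tau_exp - 2.0) < 1e-6        # tau_phi(X - 1) = 2
```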
Lemma 2.6. Let $X_1, \ldots, X_n$ be independent sub-exponential random variables. Then
\[ \tau_\varphi\Big(\sum_{i=1}^n (X_i - E X_i)\Big) \le \Big(\sum_{i=1}^n \tau^2_\varphi(X_i - E X_i)\Big)^{1/2}. \]

Proof. Denote $\tau_\varphi(X_i - E X_i)$ by $K_i$, $i = 1, \ldots, n$. For independent centered sub-exponential r.v.s we have
\[ E\exp\Big(t\sum_{i=1}^n (X_i - E X_i)\Big) = \prod_{i=1}^n E\exp\big(t(X_i - E X_i)\big) \le \prod_{i=1}^n \exp\varphi_\infty(K_i t) = \exp\Big(\sum_{i=1}^n \varphi_\infty(K_i t)\Big). \tag{3} \]
Observe that
\[ \sum_{i=1}^n \varphi_\infty(K_i t) = \begin{cases} \big(\sum_{i=1}^n K_i^2\big)t^2/2 & \text{if } |t| \le 1/\max_i K_i, \\ \infty & \text{otherwise}. \end{cases} \]
Since $\max_i K_i \le \big(\sum_{i=1}^n K_i^2\big)^{1/2}$, we get
\[ \sum_{i=1}^n \varphi_\infty(K_i t) \le \varphi_\infty\Big(\Big(\sum_{i=1}^n K_i^2\Big)^{1/2} t\Big). \]
By the above, the estimate (3) and the definition of the $\tau_\varphi$-norm we obtain
\[ \tau_\varphi\Big(\sum_{i=1}^n (X_i - E X_i)\Big) \le \Big(\sum_{i=1}^n \tau^2_\varphi(X_i - E X_i)\Big)^{1/2}. \]

Remark 2.7.
Let us note that if the $X_i$ are sub-exponential then the $|X_i|$ are sub-exponential too. The above lemma implies that
\[ \tau_\varphi\Big(\sum_{i=1}^n |X_i| - \sum_{i=1}^n E|X_i|\Big) = \tau_\varphi\big(|X|_1 - \||X|_1\|_{L^1}\big) \le \sqrt{n}\,\max_{1\le i\le n}\tau_\varphi\big(|X_i| - E|X_i|\big). \tag{4} \]
In the following example it is shown that the factor $\sqrt{n}$ on the right hand side is necessary.

Example 2.8.
Let $X_i \sim Exp(1)$, $i = 1, \ldots, n$, be independent random variables. Note that the cumulant generating function of their centered sum equals $nC_{X-1}$ ($X \sim Exp(1)$), i.e.,
\[ \ln E\exp\Big[t\Big(\sum_{i=1}^n X_i - n\Big)\Big] = nC_{X-1}(t) = -nt - n\ln(1-t). \]
As in Example 2.5 we get $nC_{X-1}(t) = \frac{n}{2}C''_{X-1}(\theta_t t)t^2$ ($|t| < 1$) and that the $\tau_\varphi$-norm of the centered sum of the $X_i$ equals $\sqrt{n} + 1 \sim \sqrt{n}$.

Because the norms $\tau_\varphi(\cdot)$ and $\|\cdot\|_{\psi_1}$ are equivalent on the space of centered sub-exponential random variables and $\||X_i| - E|X_i|\|_{\psi_1} \le 2\|X_i\|_{\psi_1}$, $i = 1, \ldots, n$, we can rewrite inequality (4) in the form
\[ \big\||X|_1 - \||X|_1\|_{L^1}\big\|_{\psi_1} \le C\sqrt{n}K, \tag{5} \]
where $K := \max_{1\le i\le n}\|X_i\|_{\psi_1}$ and $C$ is a universal constant.

The proof of the following proposition is similar to the proof of the upper bound in large deviation theory (see for instance [4, 5.11(4) Theorem. Large deviation]) but with one difference. Instead of the cumulant generating function of a given random variable we use its upper estimate by the function $\varphi_\infty$ and, in consequence, the convex conjugate $\varphi^*_\infty = \varphi$ in its tail estimate (see [6, Lem. 2.6]), where
\[ \varphi(x) = \begin{cases} x^2/2 & \text{if } |x| \le 1, \\ |x| - 1/2 & \text{if } |x| > 1. \end{cases} \]

Proposition 2.9.
Let $X_i$, $i = 1, \ldots, n$, be independent sub-exponential random variables. Then, for every $t \ge 0$,
\[ P\Big(\Big|\frac{1}{n}\sum_{i=1}^n (X_i - E X_i)\Big| \ge t\Big) \le 2\exp\Big(-n\varphi\Big(\frac{t}{2CK}\Big)\Big), \]
where $K := \max_{1\le i\le n}\|X_i\|_{\psi_1}$ and $C$ is the universal constant such that $\tau_\varphi(\cdot) \le C\|\cdot\|_{\psi_1}$.

Proof. The moment generating function of $\frac{1}{n}\sum_{i=1}^n(X_i - E X_i)$ can be estimated as follows:
\[ E\exp\Big(\frac{u}{n}\sum_{i=1}^n (X_i - E X_i)\Big) = \prod_{i=1}^n E\exp\Big(\frac{u}{n}(X_i - E X_i)\Big) \le \prod_{i=1}^n \exp\Big(\varphi_\infty\Big(\frac{\tau_\varphi(X_i - E X_i)}{n}u\Big)\Big) \]
\[ \le \prod_{i=1}^n \exp\Big(\varphi_\infty\Big(\frac{C\|X_i - E X_i\|_{\psi_1}}{n}u\Big)\Big) \le \exp\Big(n\varphi_\infty\Big(\frac{2CK}{n}u\Big)\Big), \]
since, by (1), $\|X_i - E X_i\|_{\psi_1} \le 2\|X_i\|_{\psi_1} \le 2K$. The convex conjugate of the function $f(u) := n\varphi_\infty(\frac{2CK}{n}u)$ equals
\[ f^*(t) = \sup_{u\in\mathbb{R}}\Big\{tu - n\varphi_\infty\Big(\frac{2CK}{n}u\Big)\Big\} = \sup_{u>0}\Big\{tu - n\varphi_\infty\Big(\frac{2CK}{n}u\Big)\Big\} \]
\[ = n\sup_{u>0}\Big\{\frac{t}{2CK}\cdot\frac{2CKu}{n} - \varphi_\infty\Big(\frac{2CK}{n}u\Big)\Big\} = n\sup_{v>0}\Big\{\frac{t}{2CK}\,v - \varphi_\infty(v)\Big\} = n\varphi\Big(\frac{t}{2CK}\Big); \]
the second equality holds since $\varphi_\infty$ is an even function, the fourth one by substituting $v = \frac{2CK}{n}u$ and the last one by the definition of the convex conjugate for even functions and the equality $\varphi^*_\infty = \varphi$. Thus we get $f^*(t) = n\varphi(t/(2CK))$. Similarly as in [1, Lem. 2.4.3] (formally $f$ and $f^*$ are not $N$-functions, but the proof is the same for these functions), we get
\[ P\Big(\Big|\frac{1}{n}\sum_{i=1}^n (X_i - E X_i)\Big| \ge t\Big) \le 2\exp\Big(-n\varphi\Big(\frac{t}{2CK}\Big)\Big). \]

Remark 2.10.
Let us emphasize that because
\[ \varphi\Big(\frac{t}{2CK}\Big) \ge \frac{1}{2}\min\Big\{\frac{t^2}{4C^2K^2},\ \frac{t}{2CK}\Big\}, \]
the above estimate implies a form of Bernstein's inequality for averages:
\[ P\Big(\Big|\frac{1}{n}\sum_{i=1}^n (X_i - E X_i)\Big| \ge t\Big) \le 2\exp\Big(-\frac{n}{2}\min\Big\{\frac{t^2}{4C^2K^2},\ \frac{t}{2CK}\Big\}\Big); \]
compare Vershynin [5, Cor. 2.8.3].
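Two of the computations above can be confirmed numerically: the equality $\varphi^*_\infty = \varphi$ (via a discrete Legendre transform) together with the lower bound $\varphi(x) \ge \frac{1}{2}\min\{x^2, x\}$ used in Remark 2.10, and the root $\sqrt{n}+1$ from Example 2.8. A minimal sketch (illustration only):

```python
import math

# Illustration only: check phi_inf^* = phi numerically, the bound
# phi(x) >= (1/2) * min(x**2, x) behind Remark 2.10, and the root
# sqrt(n) + 1 of the equation n/(1 - 1/K)**2 = K**2 from Example 2.8.

def phi_inf(v: float) -> float:
    return v * v / 2.0 if abs(v) <= 1.0 else float("inf")

def phi(x: float) -> float:
    return x * x / 2.0 if abs(x) <= 1.0 else abs(x) - 0.5

def conjugate(x: float, steps: int = 100000) -> float:
    """Discrete Legendre transform sup_v { x*v - phi_inf(v) }, v on [-1, 1]."""
    return max(x * (-1.0 + 2.0 * k / steps) - phi_inf(-1.0 + 2.0 * k / steps)
               for k in range(steps + 1))

for x in (0.0, 0.3, 0.9, 1.0, 1.7, 4.0):
    assert abs(conjugate(x) - phi(x)) < 1e-4          # phi_inf^* = phi
    assert phi(x) >= 0.5 * min(x * x, x) - 1e-12      # Remark 2.10 bound

def tau_root(n: int) -> float:
    """Solve n/(1 - 1/K)**2 = K**2 for K > 1 by bisection."""
    lo, hi = 1.000001, 10.0 * n + 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if n / (1.0 - 1.0 / mid) ** 2 > mid ** 2:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for n in (1, 4, 25, 100):
    assert abs(tau_root(n) - (math.sqrt(n) + 1.0)) < 1e-6   # K = sqrt(n) + 1
```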
3 Proofs of Proposition 1.3 and Theorem 1.4

Proof of Proposition 1.3. Because the function $a \mapsto a^{1/p}$ is concave on the nonnegative half-line of real numbers, the inequality
\[ |a - b| \ge \big|a^{1/p} - b^{1/p}\big|^p \tag{6} \]
holds for any $a, b \ge 0$. Moreover, if $X_i$, $i = 1, \ldots, n$, are $p$-sub-exponential random variables then the $|X_i|^p$ are sub-exponential ones.

Let $Y_i$ denote $|X_i|^p$ and let $Y$ be the vector $(Y_i)_{i=1}^n$. By Lemma 2.3 we have $\|Y_i\|_{\psi_1} = \|X_i\|^p_{\psi_p}$. Moreover $|Y|_1 = |X|^p_p$ and $\||Y|_1\|_{L^1} = \||X|_p\|^p_{L^p}$. Substituting in (5) $Y$ instead of $X$ we get
\[ \big\||Y|_1 - \||Y|_1\|_{L^1}\big\|_{\psi_1} = \big\||X|^p_p - \||X|_p\|^p_{L^p}\big\|_{\psi_1} \le C\sqrt{n}K^p_p, \]
where $K_p := \max_{1\le i\le n}\|X_i\|_{\psi_p}$ and $C$ is the universal constant from (5).

By the definition of the $\psi_1$-norm and inequality (6) with $a = |X|^p_p$ and $b = \||X|_p\|^p_{L^p}$ we obtain
\[ 2 \ge E\exp\bigg(\frac{\big||X|^p_p - \||X|_p\|^p_{L^p}\big|}{C\sqrt{n}K^p_p}\bigg) \ge E\exp\bigg(\frac{\big||X|_p - \||X|_p\|_{L^p}\big|^p}{\big[(C\sqrt{n})^{1/p}K_p\big]^p}\bigg), \]
which means that
\[ \big\||X|_p - \||X|_p\|_{L^p}\big\|_{\psi_p} \le (C\sqrt{n})^{1/p}K_p. \]
This finishes the proof of Proposition 1.3.

The structure of the proof of Theorem 1.4 is similar to the proof in Vershynin [5, Th. 3.1.1] but, apart from Proposition 2.9 and Lemma 2.1, we also use the following two technical lemmas.
Lemma 3.1.
Let $x, \delta \ge 0$ and $p \ge 1$. If $|x - 1| \ge \delta$ then $|x^p - 1| \ge \max\{\delta, \delta^p\}$.

Proof. Under the above assumptions on $x$ and $p$ we have $|x^p - 1| \ge |x - 1|$. It means that if $|x - 1| \ge \delta$ then $|x^p - 1| \ge \delta$. For $0 \le \delta \le 1$ we have $\delta^p \le \delta$. In consequence $|x^p - 1| \ge \max\{\delta, \delta^p\}$ for $0 \le \delta \le 1$.

Let now $\delta > 1$. The condition $|x - 1| \ge \delta$ is equivalent to $x \ge \delta + 1$ if $x \ge 1$, or $x \le 1 - \delta$ if $0 \le x \le 1$. Let us observe that the second possibility cannot occur for $\delta > 1$ and $x \ge 0$. The first one gives $x^p \ge (\delta + 1)^p \ge \delta^p + 1$ ($p \ge 1$), which is equivalent to $x^p - 1 \ge \delta^p$ for $x \ge 1$. Summing up, we get $|x^p - 1| \ge \max\{\delta, \delta^p\}$ for $x, \delta \ge 0$ and $p \ge 1$.

Lemma 3.2. If $p \ge 2$ then $\varphi(\max\{\gamma, \gamma^p\}) \ge \gamma^p/2$ for $\gamma \ge 0$.

Proof. By the definition of $\varphi$ we have
\[ \varphi(\max\{\gamma, \gamma^p\}) = \begin{cases} \gamma^2/2 & \text{if } 0 \le \gamma \le 1, \\ \gamma^p - 1/2 & \text{if } 1 < \gamma. \end{cases} \]
If $0 \le \gamma \le 1$ then $\varphi(\max\{\gamma, \gamma^p\}) = \gamma^2/2 \ge \gamma^p/2$ for $p \ge 2$. If $1 < \gamma$ then the inequality $\varphi(\max\{\gamma, \gamma^p\}) = \gamma^p - 1/2 > \gamma^p/2$ also holds.

Proof of Theorem 1.4. Let us observe that the expression
\[ \frac{1}{n\|X_1\|^p_{L^p}}|X|^p_p - 1 = \frac{1}{n}\sum_{i=1}^n \Big(\frac{|X_i|^p}{\|X_1\|^p_{L^p}} - 1\Big) \]
is the normalized sum of independent and centered 1-sub-exponential random variables (recall that $E|X_i|^p = \|X_1\|^p_{L^p}$). Moreover, by condition (1) and Lemma 2.3, we have
\[ \big\||X_i|^p - E|X_i|^p\big\|_{\psi_1} \le 2\big\||X_i|^p\big\|_{\psi_1} = 2\|X_i\|^p_{\psi_p} \le 2K^p_p. \]
Now, by virtue of Lemma 3.1 and Proposition 2.9, we get
\[ P\Big(\Big|\frac{|X|_p}{n^{1/p}\|X_1\|_{L^p}} - 1\Big| \ge \delta\Big) \le P\Big(\Big|\frac{|X|^p_p}{n\|X_1\|^p_{L^p}} - 1\Big| \ge \max\{\delta, \delta^p\}\Big) \le 2\exp\Big(-n\varphi\Big(\frac{\|X_1\|^p_{L^p}\max\{\delta, \delta^p\}}{2CK^p_p}\Big)\Big), \tag{7} \]
for any $C \ge C_1$, where $C_1$ is the constant from Proposition 2.9.

The inequality
\[ 2 = E\exp\Big(\Big(\frac{|X_i|}{\|X_i\|_{\psi_p}}\Big)^p\Big) \ge E\Big(\frac{|X_i|}{\|X_i\|_{\psi_p}}\Big)^p \]
implies that $2\|X_i\|^p_{\psi_p} \ge E|X_i|^p = \|X_1\|^p_{L^p}$, $i = 1, \ldots, n$, and, in consequence, $2K^p_p \ge \|X_1\|^p_{L^p}$. Since $C > 1$, it follows that $\|X_1\|^p_{L^p}/(2CK^p_p)$ is less than 1. Under this condition we have
\[ \frac{\|X_1\|^p_{L^p}\max\{\delta, \delta^p\}}{2CK^p_p} \ge \max\Big\{\frac{\|X_1\|^p_{L^p}\delta}{2CK^p_p},\ \Big(\frac{\|X_1\|^p_{L^p}\delta}{2CK^p_p}\Big)^p\Big\}. \]
By the definition of $\varphi$ and Lemma 3.2 with $\gamma = \|X_1\|^p_{L^p}\delta/(2CK^p_p)$ we get
\[ \varphi\Big(\frac{\|X_1\|^p_{L^p}\max\{\delta, \delta^p\}}{2CK^p_p}\Big) \ge \varphi\Big(\max\Big\{\frac{\|X_1\|^p_{L^p}\delta}{2CK^p_p},\ \Big(\frac{\|X_1\|^p_{L^p}\delta}{2CK^p_p}\Big)^p\Big\}\Big) \ge \frac{1}{2}\Big(\frac{\|X_1\|^p_{L^p}\delta}{2CK^p_p}\Big)^p. \]
Rearranging (7) and applying the above estimate we obtain
\[ P\Big(\big||X|_p - n^{1/p}\|X_1\|_{L^p}\big| \ge n^{1/p}\|X_1\|_{L^p}\delta\Big) = P\Big(\Big|\frac{|X|_p}{n^{1/p}\|X_1\|_{L^p}} - 1\Big| \ge \delta\Big) \]
\[ \le 2\exp\Big(-\frac{n}{2}\Big(\frac{\|X_1\|^p_{L^p}\delta}{2CK^p_p}\Big)^p\Big) = 2\exp\Big(-\Big(\frac{n^{1/p}\|X_1\|^p_{L^p}\delta}{2^{1/p}\cdot 2CK^p_p}\Big)^p\Big). \]
Changing variables to $t = n^{1/p}\|X_1\|_{L^p}\delta$, we get the following $p$-sub-exponential tail decay:
\[ P\Big(\big||X|_p - n^{1/p}\|X_1\|_{L^p}\big| \ge t\Big) \le 2\exp\Big(-\Big(\frac{\|X_1\|^{p-1}_{L^p}t}{2^{1/p}\cdot 2CK^p_p}\Big)^p\Big). \]
By Lemma 2.1 and Remark 2.2, after absorbing the numerical factors into the universal constant $C$, we obtain
\[ \big\||X|_p - n^{1/p}\|X_1\|_{L^p}\big\|_{\psi_p} \le 2^{1/p}C\Big(\frac{K_p}{\|X_1\|_{L^p}}\Big)^{p-1}K_p \quad \text{for } p \ge 2, \]
which finishes the proof of Theorem 1.4.
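The elementary inequalities used in the proofs above can be checked on a grid. A minimal sketch (illustration only):

```python
# Grid checks (illustration only) of the elementary inequalities used above:
#   inequality (6):  |a - b| >= |a**(1/p) - b**(1/p)|**p       (a, b >= 0, p >= 1)
#   Lemma 3.1:       |x - 1| >= delta  =>  |x**p - 1| >= max(delta, delta**p)
#   Lemma 3.2:       phi(max(g, g**p)) >= g**p / 2             (p >= 2)

def phi(x: float) -> float:
    return x * x / 2.0 if abs(x) <= 1.0 else abs(x) - 0.5

grid = [k / 10.0 for k in range(0, 51)]   # values in [0, 5]

for p in (1.0, 1.5, 2.0, 3.0):
    for a in grid:
        for b in grid:
            assert abs(a - b) >= abs(a ** (1.0 / p) - b ** (1.0 / p)) ** p - 1e-9
    for x in grid:
        delta = abs(x - 1.0)              # extremal admissible delta in Lemma 3.1
        assert abs(x ** p - 1.0) >= max(delta, delta ** p) - 1e-9
    if p >= 2.0:
        for g in grid:
            assert phi(max(g, g ** p)) >= g ** p / 2.0 - 1e-9
```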
Example 3.3. Let $g_p = (g_{p,1}, \ldots, g_{p,n})$ be a random vector with independent standard $p$-normal coordinates ($g_{p,i} \sim N_p(0,1)$). Then $\|g_{p,i}\|_{L^p} = 1$ and $\|g_{p,i}\|_{\psi_p} = (8/3)^{1/p}$, for $i = 1, \ldots, n$. Thus $K^p_p = 8/3$. By Theorem 1.4 we get
\[ \big\||g_p|_p - n^{1/p}\big\|_{\psi_p} \le \frac{8}{3}\cdot 2^{1/p}C \quad \text{for } p \ge 2, \]
with a bound that does not depend on the dimension $n$.
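The dimension-free character of this bound can be observed in simulation for $p = 2$, where $g_2$ is simply a standard gaussian variable. The sketch below is a Monte Carlo illustration only (the sample sizes are arbitrary choices): the largest observed deviation of $|g|_2$ from $\sqrt{n}$ stays bounded as $n$ grows.

```python
import math
import random

# Monte Carlo illustration (not a proof) of Example 3.3 for p = 2: the
# fluctuations of the Euclidean norm |g|_2 of a standard gaussian vector
# around sqrt(n) remain of order 1 as the dimension n grows.

random.seed(0)

def max_abs_deviation(n: int, trials: int = 500) -> float:
    """Largest observed | |g|_2 - sqrt(n) | over the given number of trials."""
    worst = 0.0
    for _ in range(trials):
        norm = math.sqrt(sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n)))
        worst = max(worst, abs(norm - math.sqrt(n)))
    return worst

for n in (10, 100, 1000):
    assert max_abs_deviation(n) < 5.0   # stays O(1), independently of n
```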
Remark 3.4. Many problems dealing with sub-gaussian and sub-exponential random variables may be considered in the spaces of $p$-sub-exponential random variables for any positive $p$. In the paper of Götze et al. [3] one can find generalizations and applications of some concentration inequalities for polynomials in such variables in the case $0 < p \le 1$. It seems possible to investigate many other questions of this type, for instance further concentration properties of random vectors with $p$-sub-exponential coordinates.

References

[1] V. Buldygin, Yu. Kozachenko, Metric Characterization of Random Variables and Random Processes, Amer. Math. Soc., Providence, 2000.

[2] R.M. Dudley, Uniform Central Limit Theorems, Cambridge University Press, 1999.

[3] F. Götze, H. Sambale, A. Sinulis, Concentration inequalities for polynomials in α-sub-exponential random variables, preprint (2019), arXiv:1903.05964v1.

[4] G.R. Grimmett, D.R. Stirzaker, Probability and Random Processes, Oxford University Press, Third edition, 2001.

[5] R. Vershynin, High-Dimensional Probability, Cambridge University Press, 2018.

[6] K. Zajkowski,