Large deviations for infinite weighted sums of stretched exponential random variables
Frank Aurzada

January 1, 2020
Abstract
We study the large deviation probabilities of infinite weighted sums of independent random variables that have stretched exponential tails. This generalizes Kiesel and Stadtmüller [12], who study the same objects under the assumption of finite exponential moments, and Gantert et al. [8], who study finite weighted sums with stretched exponential tails.
Keywords: independent, identically distributed random variables; large deviations; stretched exponential random variables; weighted sums
1 Introduction

A classical result in probability theory is Cramér's theorem for the large deviations of sums of independent, identically distributed random variables: If $(X_i)$ is an i.i.d. sequence with zero mean such that $\phi(t) := \mathbb{E} e^{tX}$ is finite for some $t > 0$, then
\[
\lim_{n\to\infty} \frac{1}{n} \log \mathbb{P}\Big( \frac{1}{n} \sum_{i=1}^{n} X_i > x \Big) = -\sup_{t\in\mathbb{R}} \big( tx - \log \phi(t) \big), \qquad x > 0.
\]
It is also classical that Cramér's theorem can be extended to a full large deviation principle; it can be seen as the starting point of large deviation theory, see e.g. [6, 7].
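For instance, for standard Gaussian summands one has $\phi(t) = e^{t^2/2}$, so that
\[
\sup_{t\in\mathbb{R}} \big( tx - \log \phi(t) \big) = \sup_{t\in\mathbb{R}} \Big( tx - \frac{t^2}{2} \Big) = \frac{x^2}{2},
\]
the supremum being attained at $t = x$; the large deviation probability then decays like $e^{-n x^2/2}$ to first order in the exponent.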
Whenever the random variables $(X_i)$ do not have any finite exponential moment, the behaviour of the large deviations is different. This is due to the fact that then the large deviation event is produced by only one variable being unusually large. The classical result here (cf. [14]) is as follows: if $(X_i)$ is an i.i.d. sequence with stretched exponential tail, $\log \mathbb{P}(X > t) \sim -\kappa t^r$ as $t\to\infty$, for some $0 < r < 1$, and finite expectation, then
\[
\lim_{n\to\infty} n^{-r} \log \mathbb{P}\Big( \frac{1}{n} \sum_{i=1}^{n} X_i > x \Big) = -\kappa \big( x - \mathbb{E}[X] \big)^r, \qquad x > \mathbb{E}[X]. \tag{1}
\]

In this paper, we study weighted sums of i.i.d. random variables. There is quite some literature on large deviations of weighted sums and their applications. The most recent general reference is Kiesel and Stadtmüller [12] (also see [1, 2, 3, 4, 5, 9, 10, 15] for further references). However, these papers deal with random variables that do have some finite exponential moment. The only source, to the knowledge of the author, that deals with weighted sums of random variables that do not have any finite exponential moment is Gantert et al. [8]. There, finite sums of the type $\sum_{i=1}^{n} a_i(n) X_i$ are considered when the random variables have stretched exponential tails.

In this note, we treat the case of infinite weighted sums $\sum_{i=1}^{\infty} a_i(n) X_i$ with $(X_i)$ i.i.d. random variables having stretched exponential tails. Besides filling this gap in the literature, the motivation comes from Bayesian statistics: there, one is interested in proving contraction rates for the posterior distribution in nonparametric inverse problems, and estimates of the type studied here are important; see e.g. Lemma 5.2 in [13], as well as [16] or [11] for results with Gaussian priors, which require large deviation estimates of squared Gaussians, i.e. with exponential moments. We mention that the present results are directly motivated by a forthcoming work of S. Agapiou and P. Mathé in that area for non-Gaussian priors.

The paper is structured as follows. In Section 2, we define the concrete setup for this paper and state our main result. The proofs are given in Section 3.

2 Setup and main result

Let $(a_i(n))_{i \geq 1,\, n = 1,2,\ldots}$ be an array of non-negative numbers (with $\sup_i a_i(n) > 0$ for every $n$, to avoid trivialities). Let $(X_i)$ be a sequence of non-negative i.i.d. random variables, copies of the random variable $X$ with tail behaviour
\[
\log \mathbb{P}(X > t) \sim -\kappa t^r, \qquad \text{as } t \to \infty, \tag{2}
\]
for some $0 < r < 1$ and $\kappa > 0$. We study the asymptotic behaviour of the probability
\[
\mathbb{P}\Big( \sum_{i=1}^{\infty} a_i(n) X_i > x \Big), \qquad \text{where } x > 0 \text{ is fixed and } n \to \infty. \tag{3}
\]
The large deviation regime is characterized by the condition that the typical values of $\sum_{i=1}^{\infty} a_i(n) X_i$ lie below $x$, i.e.
\[
\limsup_{n\to\infty} \mathbb{E}\Big[ \sum_{i=1}^{\infty} a_i(n) X_i \Big] < x,
\]
which we shall encode using assumption (4) below.
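To spell out how (4) will encode this: since the $X_i$ are non-negative copies of $X$, Tonelli's theorem allows the expectation to be taken termwise,
\[
\mathbb{E}\Big[ \sum_{i=1}^{\infty} a_i(n) X_i \Big] = \mathbb{E}[X] \sum_{i=1}^{\infty} a_i(n) \longrightarrow D \cdot \mathbb{E}[X] \qquad (n \to \infty)
\]
under (4), so the condition $x > D \cdot \mathbb{E}[X]$ in Theorem 2.1 below is precisely the requirement that $x$ lies strictly above the limiting typical value.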
We can now formulate our main result, which is a “largest jump principle” for the large deviations of weighted sums of stretched exponential random variables. This means that the large deviation event is triggered by one of the terms in the sum being large, namely the one corresponding to the largest weight.

Theorem 2.1. Let $(X_i)$ be a sequence of non-negative i.i.d. random variables, copies of $X$ with tail behaviour (2). Further, let $(a_i(n))_{i\geq 1,\, n=1,2,\ldots}$ be non-negative numbers with
\[
\lim_{n\to\infty} \sum_{i=1}^{\infty} a_i(n) = D \in [0, \infty) \tag{4}
\]
and such that $a_{\max}(n) := \max_{i\geq 1} a_i(n) > 0$ and $a_{\max}(n) \to 0$. Then for any $x > D \cdot \mathbb{E}[X]$,
\[
\lim_{n\to\infty} a_{\max}(n)^r \log \mathbb{P}\Big( \sum_{i=1}^{\infty} a_i(n) X_i > x \Big) = -\kappa \big( x - D \cdot \mathbb{E}[X] \big)^r.
\]

We stress that we do not need any regularity assumption on the sequence $(a_i(n))$. Note that $\max_{i\geq 1} a_i(n)$ exists (for any $n$), because (4) implies that $a_i(n) \to 0$ as $i \to \infty$.
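For readers who want to see the theorem at work numerically, the following Monte Carlo sketch is a minimal illustration; it is not part of the paper's argument, and all numerical choices (weights, distribution, sample sizes) are ours. It uses the classical weights of Example 2.2 below, $a_i(n) = n^{-1}\mathbf{1}_{i\leq n}$, and Weibull variables with $\mathbb{P}(X>t) = e^{-t^r}$, so that $\kappa = 1$, $D = 1$ and $\mathbb{E}[X] = \Gamma(1+1/r)$.

```python
import numpy as np
from math import gamma

# Minimal sketch: compare the empirical rate a_max(n)^r log P with the
# predicted limit -kappa (x - D E[X])^r from Theorem 2.1. Illustrative only.
rng = np.random.default_rng(0)
r = 0.5
EX = gamma(1 + 1 / r)   # E[X] = Gamma(1 + 1/r) = 2 for r = 1/2
x = EX + 1.0            # level above the typical value D * E[X] = E[X]

for n in (50, 100, 200):
    hits, total, chunk = 0, 0, 50_000
    for _ in range(4):  # 200,000 samples per n, in chunks to limit memory
        u = rng.random((chunk, n))
        # inverse-CDF sampling of the Weibull law: X = (-log U)^(1/r)
        means = ((-np.log(u)) ** (1 / r)).mean(axis=1)
        hits += int((means > x).sum())
        total += chunk
    if hits > 0:
        # predicted limit: -(x - EX)^r = -1; convergence in n is slow,
        # so only rough qualitative agreement should be expected
        print(n, round(n ** (-r) * float(np.log(hits / total)), 3))
```

The convergence of $n^{-r}\log\mathbb{P}$ to its limit is slow (the polynomial prefactor of the one-big-jump event contributes a term of order $n^{-r}\log n$), so the printed values only drift towards $-(x-\mathbb{E}[X])^r$ rather than matching it.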
Example 2.2. The classical result (1) is retrieved for $a_i(n) = n^{-1} \mathbf{1}_{i \leq n}$.
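Indeed, for this choice $\sum_{i=1}^{\infty} a_i(n) = 1$ for every $n$, so $D = 1$, while $a_{\max}(n) = n^{-1} \to 0$. Theorem 2.1 therefore yields
\[
\lim_{n\to\infty} n^{-r} \log \mathbb{P}\Big( \frac{1}{n} \sum_{i=1}^{n} X_i > x \Big) = -\kappa \big( x - \mathbb{E}[X] \big)^r, \qquad x > \mathbb{E}[X],
\]
which is exactly (1).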
Example 2.3. In a motivating example from Bayesian statistics, $a_i(n) = \mathbf{1}_{i \geq n}\, \sigma_i / \rho_n$, which gives the large deviation probability of “remainder” sums: $\mathbb{P}\big( \sum_{i=n}^{\infty} \sigma_i X_i > x \rho_n \big)$. Here $(\sigma_i)$ is a positive, summable sequence and $(\rho_n)$ is a positive sequence.
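Concretely, in this setting $a_{\max}(n) = \max_{i\geq n} \sigma_i / \rho_n$ and $\sum_{i=1}^{\infty} a_i(n) = \rho_n^{-1} \sum_{i\geq n} \sigma_i$; so whenever $(\rho_n)$ is such that $a_{\max}(n) \to 0$ and $\rho_n^{-1} \sum_{i\geq n} \sigma_i \to D \in [0,\infty)$, Theorem 2.1 gives, for $x > D \cdot \mathbb{E}[X]$,
\[
\lim_{n\to\infty} \Big( \max_{i\geq n} \frac{\sigma_i}{\rho_n} \Big)^{\!r} \log \mathbb{P}\Big( \sum_{i=n}^{\infty} \sigma_i X_i > x \rho_n \Big) = -\kappa \big( x - D \cdot \mathbb{E}[X] \big)^r.
\]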
Example 2.4. The work of Gantert et al. [8] in the case of non-negative random variables can be recovered as follows. They consider arrays with $a_i(n) = 0$ for $i > n$. Their condition (B) implies that (4) holds with $D > 0$ and that $n \cdot a_{\max}(n) \to s > 0$. Note that we do not require $a_{\max}(n)$ to be of order $n^{-1}$ in this work.
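In particular, if $n \cdot a_{\max}(n) \to s > 0$, then $a_{\max}(n)^r \sim s^r n^{-r}$, so the conclusion of Theorem 2.1 can be rewritten on the polynomial scale familiar from (1):
\[
\lim_{n\to\infty} n^{-r} \log \mathbb{P}\Big( \sum_{i=1}^{n} a_i(n) X_i > x \Big) = -\kappa \Big( \frac{x - D \cdot \mathbb{E}[X]}{s} \Big)^{\!r}.
\]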
Example 2.5. Examples where $a_i(n)$ depends on $n$ in a different way are given for instance by moving averages, where $a_i(n) := \sigma_i\, \varphi_n^{-1}\, \mathbf{1}_{m_n \leq i \leq m_n + \varphi_n - 1}$, for positive sequences $\sigma$, $\varphi$, $m$. Such objects were studied by [12] under the assumption of finite exponential moments (cf. the remark on p. 938 in [12]).

Possible extensions of the present results include the case that $X$ has a polynomial tail (rather than stretched exponential) or the precise behaviour in the case of a supremum rather than a sum (see Lemma 3.2 below for a partial result). In the spirit of Example 2.3, one could also consider $\sum_{i=N}^{\infty}$, where $N$ is random (cf. e.g. [2] for the case of finite sums). Further, one might want to add a slowly varying factor in (2).

3 Proofs
We start with two results for the rate of the probability
\[
\mathbb{P}\Big( \sup_{i\geq 1} a_i(n) X_i > x \Big), \qquad n \to \infty, \tag{5}
\]
which is the obvious analog of (3). We begin with a lower bound.

Lemma 3.1. If $a_{\max}(n) \to 0$, then for any $x > 0$,
\[
\liminf_{n\to\infty} a_{\max}(n)^r \log \mathbb{P}\Big( \sup_{i\geq 1} a_i(n) X_i > x \Big) \geq -\kappa x^r.
\]
If $\limsup_{n\to\infty} a_{\max}(n) > 0$, then $\liminf_{n\to\infty} \mathbb{P}\big( \sup_{i\geq 1} a_i(n) X_i > x \big) > 0$.

Proof:
The claims follow immediately from the trivial estimate
\[
\mathbb{P}\Big( \sup_{i\geq 1} a_i(n) X_i > x \Big) \geq \mathbb{P}\big( a_{\max}(n) X_{m(n)} > x \big) = \mathbb{P}\big( X > x / a_{\max}(n) \big),
\]
where $m(n) := \min\{ i \geq 1 : a_i(n) = a_{\max}(n) \}$. □

We now turn to the corresponding upper bound. We shall prove it under more restrictive assumptions in order to avoid lengthy discussions (note that (4) is not necessary for the sup-problem). The stated lemma will be one ingredient in the proof of the main result.
Lemma 3.2. Assume that (4) holds and that $a_{\max}(n) \to 0$. Then we have for any $x > 0$,
\[
\lim_{n\to\infty} a_{\max}(n)^r \log \mathbb{P}\Big( \sup_{i\geq 1} a_i(n) X_i > x \Big) = -\kappa x^r.
\]

Proof:
First note that we can assume w.l.o.g. that $x = 1$, as otherwise it can be absorbed as a constant factor into the sequence $(a_i(n))$. The lower bound already follows from Lemma 3.1. For the upper bound, observe that
\[
\mathbb{P}\Big( \sup_{i\geq 1} a_i(n) X_i > 1 \Big) = \mathbb{P}\Big( \bigcup_{i=1}^{\infty} \{ a_i(n) X_i > 1 \} \Big) \leq \sum_{i=1}^{\infty} \mathbb{P}\big( a_i(n) X_i > 1 \big). \tag{6}
\]
Fix $0 < \varepsilon < \kappa$. It remains to use the tail bound for $X$, which shows that the last term is bounded from above as follows: for large enough $n$,
\[
\sum_{i=1}^{\infty} \mathbb{P}\big( a_i(n) X_i > 1 \big) = \sum_{i=1}^{\infty} \mathbb{P}\big( X > 1/a_i(n) \big) \leq \sum_{i=1}^{\infty} C e^{-(\kappa-\varepsilon) a_i(n)^{-r}}, \tag{7}
\]
with some constant $C > 0$.
The remainder of the proof consists in a treatment of this sum:
\begin{align*}
\sum_{i=1}^{\infty} e^{-(\kappa-\varepsilon) a_i(n)^{-r}}
&= \sum_{i=1}^{\infty} e^{-(1-\varepsilon)(\kappa-\varepsilon) a_i(n)^{-r}} \cdot e^{-\varepsilon(\kappa-\varepsilon) a_i(n)^{-r}} \\
&\leq e^{-(1-\varepsilon)(\kappa-\varepsilon) a_{\max}(n)^{-r}} \cdot \sum_{i=1}^{\infty} \big( \varepsilon(\kappa-\varepsilon)\, a_i(n)^{-r} \big)^{-1/r} \\
&= e^{-(1-\varepsilon)(\kappa-\varepsilon) a_{\max}(n)^{-r}} \cdot \big( \varepsilon(\kappa-\varepsilon) \big)^{-1/r} \sum_{i=1}^{\infty} a_i(n) \\
&\leq e^{-(1-\varepsilon)(\kappa-\varepsilon) a_{\max}(n)^{-r}} \cdot \big( \varepsilon(\kappa-\varepsilon) \big)^{-1/r} (D + \varepsilon),
\end{align*}
where we used in the second step that $e^{-x} \leq x^{-1/r}$ for large $x$ and, in the last step, the assumptions that $\sum_{i=1}^{\infty} a_i(n) \to D$ and $a_{\max}(n) \to 0$.
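(The elementary inequality $e^{-x} \leq x^{-1/r}$ holds for all sufficiently large $x$, since $x^{1/r} e^{-x} \to 0$ as $x \to \infty$; it applies here uniformly in $i$ because $\varepsilon(\kappa-\varepsilon)\, a_i(n)^{-r} \geq \varepsilon(\kappa-\varepsilon)\, a_{\max}(n)^{-r} \to \infty$ as $n \to \infty$.)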
Combining this with (6) and (7) shows
\[
\log \mathbb{P}\Big( \sup_{i\geq 1} a_i(n) X_i > 1 \Big) \leq -(1-\varepsilon)(\kappa-\varepsilon)\, a_{\max}(n)^{-r} + \log\!\big[ C \big( \varepsilon(\kappa-\varepsilon) \big)^{-1/r} (D+\varepsilon) \big].
\]
Multiplying by $a_{\max}(n)^r$, taking first $n \to \infty$ and then $\varepsilon \to 0$ finishes the proof. □

We now give the proofs of the lower and the upper bound in Theorem 2.1, respectively.
Proof of the lower bound:
Throughout, we use the notation $m(n) := \min\{ i \geq 1 : a_i(n) = a_{\max}(n) \}$.

Let us first treat the case that $D = 0$. Then the lower bound already follows from assumption (2) together with
\[
\mathbb{P}\Big( \sum_{i=1}^{\infty} a_i(n) X_i > x \Big) \geq \mathbb{P}\big( a_{\max}(n) X_{m(n)} > x \big) = \mathbb{P}\big( X > x / a_{\max}(n) \big).
\]

Assume now $D > 0$. Then we can fix an $\varepsilon > 0$ with $\varepsilon < D$. We begin by noting that, by the independence of $X_{m(n)}$ from the remaining variables,
\begin{align}
\mathbb{P}\Big( \sum_{i=1}^{\infty} a_i(n) X_i > x \Big)
\geq\ & \mathbb{P}\Big( a_{\max}(n) X_{m(n)} > x - \sum_{i=1,\, i\neq m(n)}^{\infty} a_i(n)\, \mathbb{E}[X] (1-\varepsilon) \Big) \nonumber\\
& \cdot\ \mathbb{P}\Big( \sum_{i=1,\, i\neq m(n)}^{\infty} a_i(n) X_i > \sum_{i=1,\, i\neq m(n)}^{\infty} a_i(n)\, \mathbb{E}[X] (1-\varepsilon) \Big). \tag{8}
\end{align}
Since $a_{\max}(n) \to 0$, (4) implies $\sum_{i=1,\, i\neq m(n)}^{\infty} a_i(n) = \sum_{i=1}^{\infty} a_i(n) - a_{\max}(n) \to D$; in particular, $\sum_{i=1,\, i\neq m(n)}^{\infty} a_i(n) \geq D(1-\varepsilon)$ for large enough $n$. Therefore, the first term on the right-hand side of (8), by (2), satisfies
\begin{align*}
& \liminf_{n\to\infty} a_{\max}(n)^r \log \mathbb{P}\Big( a_{\max}(n) X_{m(n)} > x - \sum_{i=1,\, i\neq m(n)}^{\infty} a_i(n)\, \mathbb{E}[X] (1-\varepsilon) \Big) \\
&\qquad \geq \liminf_{n\to\infty} a_{\max}(n)^r \log \mathbb{P}\big( X > \big( x - D\, \mathbb{E}[X] (1-\varepsilon)^2 \big) / a_{\max}(n) \big) \geq -\kappa \big( x - D\, \mathbb{E}[X] (1-\varepsilon)^2 \big)^r.
\end{align*}
We will show that the second term on the right-hand side of (8) tends to one for fixed $\varepsilon$ and $n \to \infty$. Combining this with the last formula and letting $\varepsilon \to 0$ will finish the proof of the lower bound in the theorem.

Note that, for large enough $n$,
\begin{align*}
\mathbb{P}\Big( \sum_{i=1,\, i\neq m(n)}^{\infty} a_i(n) X_i > \sum_{i=1,\, i\neq m(n)}^{\infty} a_i(n)\, \mathbb{E}[X] (1-\varepsilon) \Big)
&= \mathbb{P}\Big( \sum_{i=1,\, i\neq m(n)}^{\infty} a_i(n) \big( X_i - \mathbb{E}[X_i] \big) > -\varepsilon\, \mathbb{E}[X] \sum_{i=1,\, i\neq m(n)}^{\infty} a_i(n) \Big) \\
&\geq \mathbb{P}\Big( \sum_{i=1,\, i\neq m(n)}^{\infty} a_i(n) \big( X_i - \mathbb{E}[X_i] \big) > -\varepsilon\, \mathbb{E}[X] (D - \varepsilon) \Big).
\end{align*}
The last term tends to one, since by Chebyshev's inequality
\begin{align*}
\mathbb{P}\Big( \sum_{i=1,\, i\neq m(n)}^{\infty} a_i(n) \big( X_i - \mathbb{E}[X_i] \big) \leq -\varepsilon\, \mathbb{E}[X] (D-\varepsilon) \Big)
&\leq \mathbb{P}\Big( \Big| \sum_{i=1,\, i\neq m(n)}^{\infty} a_i(n) \big( X_i - \mathbb{E}[X_i] \big) \Big| \geq \varepsilon\, \mathbb{E}[X] (D-\varepsilon) \Big) \\
&\leq \big( \varepsilon\, \mathbb{E}[X] (D-\varepsilon) \big)^{-2} \cdot \mathbb{V}\Big[ \sum_{i=1,\, i\neq m(n)}^{\infty} a_i(n) \big( X_i - \mathbb{E}[X_i] \big) \Big] \\
&= \big( \varepsilon\, \mathbb{E}[X] (D-\varepsilon) \big)^{-2} \cdot \mathbb{V}[X] \cdot \sum_{i=1,\, i\neq m(n)}^{\infty} a_i(n)^2 \\
&\leq \big( \varepsilon\, \mathbb{E}[X] (D-\varepsilon) \big)^{-2} \cdot \mathbb{V}[X] \cdot a_{\max}(n) \cdot \sum_{i=1}^{\infty} a_i(n),
\end{align*}
which tends to zero (because the sum is bounded, by (4), and $a_{\max}(n) \to 0$). □

Proof of the upper bound:
The first observation is that we can assume w.l.o.g. that $x = 1$, as $x$ can be absorbed as a constant factor into the sequence $(a_i(n))$.

Step 1: Reduction step, main argument, overview. Set $A := x - D\, \mathbb{E}[X] = 1 - D\, \mathbb{E}[X]$ and note that $A > 0$, by assumption. Further, fix $0 < \varepsilon < \kappa/2$ such that $1 - (1+\varepsilon) D\, \mathbb{E}[X] > 0$. First note that
\[
\mathbb{P}\Big( \sum_{i=1}^{\infty} a_i(n) X_i > 1 \Big) \leq \mathbb{P}\Big( \sum_{i=1}^{\infty} a_i(n) X_i > 1,\ \sup_{i\geq 1} a_i(n) X_i \leq A \Big) + \mathbb{P}\Big( \sup_{i\geq 1} a_i(n) X_i > A \Big),
\]
and the second term can be treated with Lemma 3.2, which shows that it is of asymptotic order $\exp\big( -\kappa\, a_{\max}(n)^{-r} A^r (1 + o(1)) \big)$, as required by the assertion. If we can show that the first term is of the same or smaller order, we obtain the statement.

Step 2: Exponential Chebyshev inequality for the truncated random variables.
Let us consider the first term: For any $\lambda > 0$, by the Markov inequality,
\begin{align}
\mathbb{P}\Big( \sum_{i=1}^{\infty} a_i(n) X_i > 1,\ \sup_{i\geq 1} a_i(n) X_i \leq A \Big)
&= \mathbb{P}\Big( e^{\lambda \sum_{i=1}^{\infty} a_i(n) X_i} > e^{\lambda},\ \sup_{i\geq 1} a_i(n) X_i \leq A \Big) \nonumber\\
&\leq e^{-\lambda}\, \mathbb{E}\Big[ e^{\lambda \sum_{i=1}^{\infty} a_i(n) X_i};\ \sup_{i\geq 1} a_i(n) X_i \leq A \Big] \nonumber\\
&= e^{-\lambda} \prod_{i=1}^{\infty} \mathbb{E}\big[ e^{\lambda a_i(n) X}\, \mathbf{1}_{a_i(n) X \leq A} \big] \nonumber\\
&= \exp\Big( -\lambda + \sum_{i=1}^{\infty} \log \mathbb{E}\big[ e^{\lambda a_i(n) X}\, \mathbf{1}_{a_i(n) X \leq A} \big] \Big) \nonumber\\
&\leq \exp\Big( -\lambda + \sum_{i=1}^{\infty} \big( \mathbb{E}\big[ e^{\lambda a_i(n) X}\, \mathbf{1}_{a_i(n) X \leq A} \big] - 1 \big) \Big) \nonumber\\
&\leq \exp\Big( -\lambda + \sum_{i=1}^{\infty} \mathbb{E}\big[ \big( e^{\lambda a_i(n) X} - 1 \big) \mathbf{1}_{a_i(n) X \leq A} \big] \Big). \tag{9}
\end{align}
Let us deal with the sum. Note that for $0 \leq y \leq \varepsilon$ we have $e^y - 1 \leq \frac{e^{\varepsilon}-1}{\varepsilon}\, y \leq (1+\varepsilon)\, y$ (for $\varepsilon$ small enough). Thus
\begin{align}
\sum_{i=1}^{\infty} \mathbb{E}\big[ \big( e^{\lambda a_i(n) X} - 1 \big) \mathbf{1}_{a_i(n) X \leq A} \big]
&= \sum_{i=1}^{\infty} \mathbb{E}\big[ \big( e^{\lambda a_i(n) X} - 1 \big) \mathbf{1}_{a_i(n) X \leq A,\, \lambda a_i(n) X < \varepsilon} \big] + \sum_{i=1}^{\infty} \mathbb{E}\big[ \big( e^{\lambda a_i(n) X} - 1 \big) \mathbf{1}_{a_i(n) X \leq A,\, \lambda a_i(n) X \geq \varepsilon} \big] \nonumber\\
&\leq \sum_{i=1}^{\infty} \mathbb{E}\big[ (1+\varepsilon)\, \lambda a_i(n) X\, \mathbf{1}_{a_i(n) X \leq A,\, \lambda a_i(n) X < \varepsilon} \big] + \sum_{i=1}^{\infty} \mathbb{E}\big[ \big( e^{\lambda a_i(n) X} - 1 \big) \mathbf{1}_{a_i(n) X \leq A,\, \lambda a_i(n) X \geq \varepsilon} \big] \nonumber\\
&\leq (1+\varepsilon)\, \lambda\, \mathbb{E}[X] \sum_{i=1}^{\infty} a_i(n) + \sum_{i=1}^{\infty} \mathbb{E}\big[ \big( e^{\lambda a_i(n) X} - 1 \big) \mathbf{1}_{a_i(n) X \leq A,\, \lambda a_i(n) X \geq \varepsilon} \big]. \tag{10}
\end{align}
Setting $B := \kappa - 2\varepsilon$, we shall use the last estimate with $\lambda := B A^{r-1} a_{\max}(n)^{-r}$.
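This choice of $\lambda$ can be motivated heuristically: if the second sum in (10) is negligible (as Step 3 will show), the exponent in (9)–(10) is approximately $-\lambda \big( 1 - (1+\varepsilon)\, \mathbb{E}[X] D \big) \approx -\lambda A$, and with $\lambda = B A^{r-1} a_{\max}(n)^{-r}$ this equals $-B A^r a_{\max}(n)^{-r}$, which matches, up to the $\varepsilon$-corrections, the decay rate $-\kappa A^r a_{\max}(n)^{-r}$ of the supremum term from Step 1.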
Step 3: We show that the second sum in (10) tends to zero for fixed $\varepsilon > 0$ and $n \to \infty$. First note that if $a_i(n) X \leq A$, then (using $r < 1$)
\[
\lambda a_i(n) X = B A^{r-1} \frac{a_i(n)}{a_{\max}(n)^r} \cdot X^{1-r} \cdot X^r \leq B A^{r-1} \frac{a_i(n)}{a_{\max}(n)^r} \cdot A^{1-r} a_i(n)^{-(1-r)} \cdot X^r = B\, \frac{a_i(n)^r}{a_{\max}(n)^r} \cdot X^r.
\]
Therefore,
\[
\mathbb{E}\big[ \big( e^{\lambda a_i(n) X} - 1 \big) \mathbf{1}_{a_i(n) X \leq A,\, \lambda a_i(n) X \geq \varepsilon} \big] \leq \mathbb{E}\Big[ \Big( e^{B \frac{a_i(n)^r}{a_{\max}(n)^r} X^r} - 1 \Big) \mathbf{1}_{\lambda a_i(n) X \geq \varepsilon} \Big]. \tag{11}
\]
Further, it is elementary to show (see Lemma 3.3 below) that due to the tail estimate (2), which we use in the form $\mathbb{P}(X > t) \leq k \exp(-B' t^r)$ for all $t > 0$ with some $k > 0$, where $B' := \kappa - \varepsilon$, we have
\[
\mathbb{E}\big[ \big( e^{b X^r} - 1 \big) \mathbf{1}_{X > a} \big] \leq \frac{k}{1 - b/B'}\, e^{-(B'-b)\, a^r},
\]
for any $a > 0$ and $0 < b < B'$. In our case, $b := B\, a_i(n)^r / a_{\max}(n)^r \leq B < B'$ and $a := \varepsilon\, a_i(n)^{-1} \lambda^{-1}$. Therefore, we see that the term on the right-hand side of (11) is bounded from above by
\[
\frac{k}{1 - \frac{(\kappa-2\varepsilon)\, a_i(n)^r}{(\kappa-\varepsilon)\, a_{\max}(n)^r}}\, \exp\Big( -\Big( \kappa - \varepsilon - (\kappa-2\varepsilon) \frac{a_i(n)^r}{a_{\max}(n)^r} \Big) \big[ \varepsilon\, a_i(n)^{-1} \lambda^{-1} \big]^r \Big) \leq k\, \frac{\kappa-\varepsilon}{\varepsilon}\, \exp\Big( -\varepsilon \cdot B^{-r} \varepsilon^r A^{r(1-r)} \big[ a_i(n)^{-1} a_{\max}(n)^{r} \big]^r \Big).
\]
The second sum in (10) is therefore bounded from above by
\[
c_{\varepsilon} \sum_{i=1}^{\infty} e^{-2K \left[ a_i(n)^{-1} a_{\max}(n)^{r} \right]^r},
\]
where $2K = 2K(\varepsilon) := \varepsilon^{1+r} B^{-r} A^{r(1-r)}$ and $c_{\varepsilon} := k (\kappa-\varepsilon)/\varepsilon$. This can be treated as follows: Since $e^{-x} \leq x^{-1/r}$ for large enough $x$, we have
\begin{align*}
\sum_{i=1}^{\infty} e^{-2K \left[ a_i(n)^{-1} a_{\max}(n)^{r} \right]^r}
&= \sum_{i=1}^{\infty} e^{-K \left[ a_i(n)^{-1} a_{\max}(n)^{r} \right]^r} \cdot e^{-K \left[ a_i(n)^{-1} a_{\max}(n)^{r} \right]^r} \\
&\leq \sum_{i=1}^{\infty} \Big( K \big[ a_i(n)^{-1} a_{\max}(n)^{r} \big]^r \Big)^{-1/r} \cdot e^{-K a_{\max}(n)^{-(1-r) r}} \\
&= K^{-1/r}\, e^{-K a_{\max}(n)^{-(1-r) r}}\, a_{\max}(n)^{-r} \sum_{i=1}^{\infty} a_i(n).
\end{align*}
Now, $\sum_{i=1}^{\infty} a_i(n)$ is bounded, by assumption (4). Further, since $a_{\max}(n) \to 0$, the factor $e^{-K a_{\max}(n)^{-(1-r) r}}\, a_{\max}(n)^{-r}$ tends to zero for fixed $\varepsilon$ and $n \to \infty$. This finishes the proof of the fact that the second sum in (10) tends to zero.
Step 4: Final computations. Putting Step 3 together with (9) and (10), we have seen that, for fixed $\varepsilon$ and $n \to \infty$,
\begin{align*}
\log \mathbb{P}\Big( \sum_{i=1}^{\infty} a_i(n) X_i > 1,\ \sup_{i\geq 1} a_i(n) X_i \leq A \Big)
&\leq -\lambda + (1+\varepsilon)\, \lambda\, \mathbb{E}[X] \sum_{i=1}^{\infty} a_i(n) + o(1) \\
&= -B A^{r-1} a_{\max}(n)^{-r} \Big[ 1 - (1+\varepsilon)\, \mathbb{E}[X] \sum_{i=1}^{\infty} a_i(n) \Big] + o(1).
\end{align*}
Multiplying by $a_{\max}(n)^r$ and using (4), we obtain
\[
\limsup_{n\to\infty} a_{\max}(n)^r \log \mathbb{P}\Big( \sum_{i=1}^{\infty} a_i(n) X_i > 1,\ \sup_{i\geq 1} a_i(n) X_i \leq A \Big) \leq -B A^{r-1} \big( 1 - (1+\varepsilon)\, \mathbb{E}[X] D \big) = -(\kappa - 2\varepsilon)\, A^{r-1} \big( 1 - (1+\varepsilon)\, \mathbb{E}[X] D \big).
\]
Letting $\varepsilon \to 0$, the right-hand side tends to $-\kappa A^{r-1} ( 1 - \mathbb{E}[X] D ) = -\kappa A^{r-1} \cdot A = -\kappa A^r$ (recall $A = 1 - D\,\mathbb{E}[X]$), which finishes the proof. □

During the course of the last proof, we used the following completely elementary lemma.
Lemma 3.3. Let $X$ be a non-negative random variable with $\mathbb{P}(X > t) \leq k e^{-B' t^r}$ for all $t > 0$ and some $k, B', r > 0$. Then, for any $a > 0$ and any $0 < b < B'$,
\[
\mathbb{E}\big[ \big( e^{b X^r} - 1 \big) \mathbf{1}_{X > a} \big] \leq \frac{k}{1 - b/B'}\, e^{-(B' - b)\, a^r}.
\]

Proof: Note that
\[
\mathbb{E}\big[ \big( e^{b X^r} - 1 \big) \mathbf{1}_{X > a} \big] = \mathbb{E}\Big[ \int_1^{e^{b X^r}} \mathrm{d}s\ \mathbf{1}_{X > a} \Big] = \int_1^{\infty} \mathbb{P}\big( X > ( b^{-1} \log s )^{1/r},\ X > a \big)\, \mathrm{d}s.
\]
We split the integral at $s = e^{b a^r}$. For $1 \leq s \leq e^{b a^r}$, the integrand equals $\mathbb{P}(X > a) \leq k e^{-B' a^r}$, so this part of the integral is at most $\big( e^{b a^r} - 1 \big)\, k e^{-B' a^r} \leq k e^{-(B'-b)\, a^r}$. For $s > e^{b a^r}$, the integrand equals $\mathbb{P}\big( X > ( b^{-1} \log s )^{1/r} \big) \leq k e^{-(B'/b) \log s} = k s^{-B'/b}$, so this part of the integral is at most
\[
k \int_{e^{b a^r}}^{\infty} s^{-B'/b}\, \mathrm{d}s = \frac{k}{B'/b - 1}\, e^{b a^r (1 - B'/b)} = \frac{k\, b}{B' - b}\, e^{-(B'-b)\, a^r}.
\]
Adding the two parts gives the bound $k \big( 1 + \frac{b}{B'-b} \big) e^{-(B'-b)\, a^r} = \frac{k}{1 - b/B'}\, e^{-(B'-b)\, a^r}$, as claimed. □
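As a quick numerical sanity check of Lemma 3.3 (an illustrative sketch, not part of the paper; all parameter values below are arbitrary choices), one can compare a Monte Carlo estimate of the left-hand side with the bound, e.g. for $X$ with exact Weibull tail $\mathbb{P}(X > t) = e^{-B' t^r}$, so that $k = 1$:

```python
import numpy as np

# Illustrative check of Lemma 3.3 with P(X > t) = exp(-Bp * t^r), i.e. k = 1.
# Parameter values are arbitrary; the lemma requires 0 < b < Bp.
rng = np.random.default_rng(1)
r, Bp, b, a = 0.5, 1.0, 0.4, 2.0

# Inverse-CDF sampling: X = ((-log U) / Bp)^(1/r) has the stated tail
u = rng.random(2_000_000)
X = (-np.log(u) / Bp) ** (1.0 / r)

lhs = float(np.mean((np.exp(b * X**r) - 1.0) * (X > a)))
bound = 1.0 / (1.0 - b / Bp) * np.exp(-(Bp - b) * a**r)
print(lhs, bound)   # the Monte Carlo estimate should lie below the bound
```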
Acknowledgements. The author is indebted to Sergios Agapiou (University of Cyprus) and Peter Mathé (WIAS Berlin) for bringing this problem to his attention and to Marvin Kettner (Darmstadt) for valuable suggestions.
References

[1] O. Bonin. Large deviation theorems for weighted sums applied to a geographical problem. J. Appl. Probab., 39(2):251–260, 2002.

[2] O. Bonin. Large deviation theorems for weighted compound Poisson sums. Probab. Math. Statist., 23(2, Acta Univ. Wratislav. No. 2593):357–368, 2003.

[3] S. A. Book. Large deviation probabilities for weighted sums. Ann. Math. Statist., 43:1221–1234, 1972.

[4] S. A. Book. A large deviation theorem for weighted sums. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete, 26:43–49, 1973.

[5] D. Deltuvienė and L. Saulis. Asymptotic expansion of the distribution density function for the sum of random variables in the series scheme in large deviation zones. In Proceedings of the Eighth Vilnius Conference on Probability Theory and Mathematical Statistics, Part I (2002), volume 78, pages 87–97, 2003.

[6] A. Dembo and O. Zeitouni. Large deviations techniques and applications, volume 38 of Applications of Mathematics (New York). Springer-Verlag, New York, second edition, 1998.

[7] J.-D. Deuschel and D. W. Stroock. Large deviations, volume 137 of Pure and Applied Mathematics. Academic Press, Inc., Boston, MA, 1989.

[8] N. Gantert, K. Ramanan, and F. Rembart. Large deviations for weighted sums of stretched exponential random variables. Electron. Commun. Probab., 19:no. 41, 2014.

[9] R. Giuliano and C. Macci. Large deviation principles for sequences of logarithmically weighted means. J. Math. Anal. Appl., 378(2):555–570, 2011.

[10] R. Giuliano and C. Macci. Large deviations for some normalized sums of exponentially distributed random variables. Ann. Math. Inform., 39:109–123, 2012.

[11] S. Gugushvili, A. W. van der Vaart, and D. Yan. Bayesian inverse problems with partial observations. Trans. A. Razmadze Math. Inst., 172(3, part A):388–403, 2018.

[12] R. Kiesel and U. Stadtmüller. A large deviation principle for weighted sums of independent identically distributed random variables. J. Math. Anal. Appl., 251(2):929–939, 2000.

[13] B. Knapik and J.-B. Salomond. A general approach to posterior contraction in nonparametric inverse problems. Bernoulli, 24(3):2091–2121, 2018.

[14] A. V. Nagaev. Integral limit theorems with regard to large deviations when Cramér's condition is not satisfied. I. Teor. Verojatnost. i Primenen., 14:51–63, 1969.
[15] S. V. Nagaev. Large deviations of sums of independent random variables. Ann. Probab., 7(5):745–789, 1979.

[16] K. Ray. Bayesian inverse problems with non-conjugate priors. Electron. J. Stat., 7:2516–2549, 2013.