arXiv [math.PR], Dec 2009

NEW ESTIMATES OF THE CONVERGENCE RATE IN THE LYAPUNOV THEOREM
Ilya Tyurin ∗ October 24, 2018
Abstract
We investigate the convergence rate in the Lyapunov theorem when the third absolute moments exist. By means of convex analysis we obtain a sharp estimate for the distance in the mean metric between a probability distribution and its zero bias transformation. This bound allows us to derive new estimates of the convergence rate in terms of Kolmogorov's metric as well as the metrics ζ_r (r = 1, 2, 3) introduced by Zolotarev. The estimate for ζ₃ is optimal. Moreover, we show that the constant in the classical Berry-Esseen theorem can be taken as 0.4785.

Our results [1] concerning the convergence rate in the Lyapunov central limit theorem were published in "Doklady Akademii Nauk" (the article was presented by Professor Yu. V. Prokhorov on June 10, 2009). The complete proofs [2] were submitted to the "Theory of Probability and its Applications" on June 8, 2009. As it turned out later, Professor Goldstein independently obtained some results that coincide with ours: namely, an estimate for the proximity in the mean metric between a probability distribution and its zero bias transformation, and an upper bound on the constant in the mean central limit theorem. His article [3] appeared on arXiv more than two weeks later, i.e. on June 28, 2009.

The present paper includes not only the results of [1, 2] but also their improvements. We show that the constant C in the Berry-Esseen inequality does not exceed 0.4785.

Consider centered independent (real-valued) random variables (r.v.) X₁, …, X_n with variances σ₁², …, σ_n² and finite third absolute moments β₁, …, β_n. We denote

σ² = σ²(n) := Σ_{j=1}^n σ_j²,  ε_n := σ⁻³ Σ_{j=1}^n β_j.

∗ Moscow State University, Department of Probability Theory, Moscow. E-mail: [email protected]

By the Lyapunov theorem, S_n := (X₁ + … + X_n)/σ(n) converges in distribution to the standard normal r.v. when ε_n → 0. From both theoretical and practical points of view it is very important to estimate the convergence rate in this theorem. It is known [5, 6] that there exists a minimal numerical constant C such that for the Kolmogorov distance between S_n and the standard normal variable N the inequality

ρ(S_n, N) := sup_{x∈R} |P(S_n ≤ x) − P(N ≤ x)| ≤ C ε_n,  n ∈ N,  (1)

holds. There are plenty of works devoted to the estimation of this constant. Esseen [6] showed that C ≤ 7.5. Bergström [7] obtained the bound C ≤ 4.8. Takano [8] established that in the case of independent identically distributed (i.i.d.) summands C ≤ 2.031. The bound was lowered further in a long series of works [9]-[18]; the estimates of C obtained there are not as sharp as ours, although interesting estimates of other kinds were obtained.

It is worth mentioning the related problem of determining the asymptotically best constants in Lyapunov's theorem. As was shown by Esseen [19], if all the r.v. X_j, j = 1, 2, …, have the same distribution, then

lim sup_{n→∞} ρ(S_n, N)/ε_n ≤ C₀ := (√10 + 3)/(6√(2π)) = 0.4097…,  (2)

and the constant on the right-hand side of this inequality cannot be lowered (hence the lower bound C ≥ C₀). This result was elaborated by Rogozin [20], who established that under the same assumptions

lim sup_{n→∞} ρ₀(S_n, N)/ε_n ≤ C₁,  (3)

where ρ₀(S_n, N) := inf_{G∈N} ρ(S_n, G) and N is the set of all normal r.v. Chistyakov [21, 22, 23] generalized (2) and (3) to the case of nonidentically distributed summands. He proved that

ρ(S_n, N) ≤ C₀ε_n + r(ε_n),  ρ₀(S_n, N) ≤ C₁ε_n + r₀(ε_n),

where r(ε_n), r₀(ε_n) are o(ε_n) when ε_n → 0; related asymptotic results exist as well (see [24, 25]).

Analogues of (1) are known for other probability metrics, for example, for the metrics ζ_r (r = 1, 2, 3) introduced by Zolotarev [26]. For ζ_r (r = 1, 2, 3) the following estimates (see [27]) are known:

ζ₁(S_n, N) ≤ 3ε_n,  ζ₂(S_n, N) ≤ (3√(2π)/8) ε_n,  ζ₃(S_n, N) ≤ ε_n/2.  (4)

Hoeffding [28] considered the problem of finding the least upper bound of E f(X₁, …, X_n) over the set of all collections of independent simple r.v. satisfying m restrictions of the form E g_{ij}(X_j) = c_{ij}, j = 1, …, n. More precisely, it was established that in this case one has to consider only r.v. taking at most m + 1 values. In the present work the results of [28] are generalized to the case of an arbitrary quasiconvex functional defined on the set of all probability distributions.

The results obtained allowed us to derive an unimprovable estimate for the proximity in the mean metric between a probability distribution and its zero bias transformation. The latter was used to estimate the accuracy of the Gaussian approximation for sums of independent variates. It was established that the values of the constants in (4) can be taken 3 times lower. In addition, our estimate for the metric ζ₃ is optimal. Furthermore, new estimates for the difference between the characteristic functions of the normalized sum and the standard normal r.v. were derived, which allowed us to prove that C ≤ 0.4785, and C ≤ 0.4784 for identically distributed summands.
Let (S, d) be a metric space and denote by Q the set of all finite signed measures on the Borel σ-algebra B(S) with the operations of multiplication by a scalar and addition defined as follows: for µ, µ₁, µ₂ ∈ Q, c ∈ R and each A ∈ B(S),

(cµ)(A) := c·µ(A),  (µ₁ + µ₂)(A) := µ₁(A) + µ₂(A).

It is easy to see that Q forms a linear space, and the set D of discrete probability distributions that are concentrated on finite sets of points is a convex subset of Q. The latter means that αµ₁ + (1 − α)µ₂ ∈ D for arbitrary µ₁, µ₂ ∈ D and α ∈ (0, 1).

Consider n independent r.v. X₁, …, X_n. Then

E f(X₁, …, X_n) = ∫_{R^n} f dP_{X₁} … dP_{X_n},

where f: R^n → R and P_{X₁}, …, P_{X_n} are the distributions of X₁, …, X_n. Thus E f(X₁, …, X_n) can be regarded as a function on the set of measures which is linear with respect to each of its n arguments.

A function g: G → R, where G is a convex set, is said to be quasiconvex if for any x, y ∈ G and α ∈ (0, 1),

g(αx + (1 − α)y) ≤ max{g(x), g(y)}.

We assume that on S some real-valued functions h₁, …, h_m are defined. Consider the set

K := {µ ∈ D : ⟨h_i, µ⟩ = 0, i = 1, …, m},  where ⟨f, µ⟩ := ∫_S f dµ.

It is easy to see that K is convex. Let K_j be the set of measures µ ∈ K that are concentrated on at most j points (j ∈ N).

Theorem 1.
For any quasiconvex function g: K → R, we have

sup_{µ∈K} g(µ) = sup_{µ∈K_{m+1}} g(µ).

In this expression we assume that the supremum over the empty set is zero.
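Theorem 1 can be illustrated numerically. The sketch below is ours, not from the paper, and uses an ad-hoc example: S = [−1, 1], one constraint h₁(x) = x (so m = 1), and the linear, hence quasiconvex, functional g(µ) = ⟨|x|, µ⟩. A random search over mean-zero laws with four support points never exceeds the supremum already attained by a two-point law, in line with m + 1 = 2.

```python
import random

# Illustration of Theorem 1 (ours, with an ad-hoc example).
# S = [-1, 1], one constraint h1(x) = x (m = 1), and the linear -- hence
# quasiconvex -- functional g(mu) = <|x|, mu>.  The supremum over all
# mean-zero discrete laws is attained on a law with at most m + 1 = 2 points.
random.seed(7)

def two_point(u, v):
    # mean-zero law on {-u, v}: probabilities v/(u+v) and u/(u+v)
    return [(-u, v / (u + v)), (v, u / (u + v))]

def g(law):
    return sum(p * abs(x) for x, p in law)

best_two = g(two_point(1.0, 1.0))   # = 1, attained at the extreme points

best_many = 0.0
for _ in range(5000):
    # a random mean-zero law on up to 4 points: a mixture of two 2-point laws
    lam = random.random()
    law1 = two_point(random.uniform(0.01, 1), random.uniform(0.01, 1))
    law2 = two_point(random.uniform(0.01, 1), random.uniform(0.01, 1))
    law = ([(x, lam * p) for x, p in law1]
           + [(x, (1 - lam) * p) for x, p in law2])
    best_many = max(best_many, g(law))

print(best_two, best_many)   # best_many never exceeds best_two
```

Since every point of the support lies in [−1, 1] and the weights sum to one, g never exceeds 1, which the two-point law already achieves.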
Theorem 2.
Let f be a nonnegative function on S, V a linear space with norm ‖·‖, and A: K → V a mapping such that

A(αµ + (1 − α)ν) = αAµ + (1 − α)Aν  (5)

for arbitrary µ, ν ∈ K, α ∈ (0, 1). Then the least value of γ such that the inequality

‖Aµ‖ ≤ γ⟨f, µ⟩  (6)

holds for every measure µ ∈ K coincides with the least value of γ such that (6) is true for every measure µ ∈ K_{m+1}.

Let W be a zero-mean r.v. with variance σ² > 0. A r.v. W* is said to have the W-zero biased distribution if

E W f(W) = σ² E f′(W*)  (7)

for every differentiable function f: R → R such that the left-hand side of (7) is defined. It is known (see [26]) that W* exists for every W as described above and has the density

p(w) = σ⁻² E(W·1{W > w}), if w ≥ 0;  σ⁻² E(−W·1{W < w}), if w < 0.  (8)

For every function f ∈ C^{(r−1)}(R), where r ∈ N, define

M_r(f) := sup_{x≠y} |f^{(r−1)}(x) − f^{(r−1)}(y)| / |x − y|.

As usual, C^{(0)}(R) := C(R). If f ∉ C^{(r−1)}(R), we set M_r(f) = ∞. Denote

ζ_r(X, Y) := sup{|E f(X) − E f(Y)| : f ∈ F_r},  r = 1, 2, …,

where F_r is the set of all real bounded functions with M_r(f) ≤ 1. The metric ζ₁ has alternative representations. These are the so-called mean metric

κ(X, Y) := ∫_{−∞}^{∞} |P(X ≤ x) − P(Y ≤ x)| dx,

and the minimal L₁-metric

l₁(X, Y) := inf{E|X̃ − Ỹ| : Law(X̃) = Law(X), Law(Ỹ) = Law(Y)}.

For details see [29, p. 21].
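For simple distributions both the zero bias transformation and the mean metric are easy to handle explicitly. The following sketch (ours, not from the paper; it anticipates the two-point computation carried out below) builds a mean-zero, unit-variance two-point r.v. W, for which the density (8) is constant on [−x, y], and evaluates κ(W, W*) numerically.

```python
import math

# A sketch (ours): for a mean-zero, unit-variance two-point W the zero-bias
# density (8) is constant on [-x, y], i.e. W* is uniform there, and
# kappa(W, W*) can be evaluated numerically.
p = 0.3
q = 1.0 - p
x, y = math.sqrt(q / p), math.sqrt(p / q)   # values -x, y give E W = 0, E W^2 = 1

def F_W(t):
    # staircase c.d.f. of W
    return 0.0 if t < -x else (p if t < y else 1.0)

def F_star(t):
    # c.d.f. of W*: linear on [-x, y] with slope p*x = q*y = sqrt(p*q)
    return min(1.0, max(0.0, (t + x) * math.sqrt(p * q)))

# mean metric kappa = integral of |F_W - F_star| (midpoint rule)
N = 200000
lo, hi = -x - 1.0, y + 1.0
h = (hi - lo) / N
kappa = sum(abs(F_W(lo + (i + 0.5) * h) - F_star(lo + (i + 0.5) * h))
            for i in range(N)) * h

third = q * math.sqrt(q / p) + p * math.sqrt(p / q)   # E|W|^3
print(kappa, third / 2)   # the two quantities coincide
```

The printed values agree, which is exactly the equality case of Theorem 3 below.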
Theorem 3. If W is a centered r.v. with unit variance and finite third absolute moment, then

ζ₁(W, W*) ≤ E|W|³ / 2,  (9)

with equality when W has a two-point distribution.

Corollary 1.
Consider a r.v. S*_n having the S_n-zero biased distribution. Then ζ₁(S_n, S*_n) ≤ ε_n/2.

Theorem 4.
The following inequalities are true:

ζ₁(S_n, N) ≤ 2 ζ₁(S_n, S*_n) ≤ ε_n,

ζ₂(S_n, N) ≤ (√(2π)/4) ζ₁(S_n, S*_n) ≤ (√(2π)/8) ε_n,  (10)

ζ₃(S_n, N) ≤ (1/3) ζ₁(S_n, S*_n) ≤ ε_n/6.  (11)

The latter double inequality is optimal; namely, for every δ > 0 there exists a sequence of i.i.d. r.v. X₁, X₂, … such that

ζ₃(S_n, N)/ε_n > 1/6 − δ,  n = 1, 2, …

For γ > 0 and t ∈ R we set

b(t, γ) := −t²/2 + aγ|t|³, if γ|t| < M;  −(1 − cos(γt))/γ², if M ≤ γ|t| ≤ 2π;  0, if γ|t| > 2π.

Here a := max_{x>0} {(cos(x) − 1 + x²/2)/x³} ≈ 0.0992, and M ≈ 3.99 is the point where this maximum is attained. Denote

f_{S_n}(t) := E e^{itS_n},  ϕ(t) := exp(−t²/2),  δ_n(t) := |f_{S_n}(t) − ϕ(t)|,  t ∈ R.

Theorem 5.
For every t ∈ R we have

|f_{S_n}(t)| ≤ b_f¹(ε_n, t) := exp(b(t, 2ε_n)),  (12)

δ_n(t) ≤ b_δ¹(ε_n, t) := (ε_n/2) ϕ(t) ∫₀^{|t|} s² exp(s²/2) ds.  (13)

Define A := ε_n^{−1/3}/(6a). For all t ∈ R the following estimate is true:

δ_n(t) ≤ b_δ²(ε_n, t) := (ε_n/2) ϕ(t) ∫₀^{|t|} s² exp(s²ε_n^{2/3}/2) ds, if |t| ≤ A;
(ε_n/2) ϕ(t) (∫₀^{A} s² exp(s²ε_n^{2/3}/2) ds + (1/l) ∫_A^{|t|} s² exp(2aε_n s³) ds), if |t| > A,  (14)

where l := inf_{t>0} {exp(−t²/2 + 2at³)} ≈ 0.62.

Remark. b_δ¹(ε, t) and b_δ²(ε, t) can be expressed in terms of the so-called Dawson integral

Daw(t) := exp(−t²) ∫₀^t exp(s²) ds,

which can be computed by means of several efficient numerical procedures. For example, such a function is available in the GNU Scientific Library (GSL). It is easy to check that

b_δ¹(ε, t) = (ε/2)(t − √2 Daw(t/√2)).

Moreover,

b_δ²(ε, t) = (1/2) exp(t²(ε^{2/3} − 1)/2) (tε^{1/3} − √2 Daw(tε^{1/3}/√2)), if |t| ≤ A;
(1/2) exp(1/(72a²) − t²/2) (1/(6a) − √2 Daw(1/(6a√2))) + (ϕ(t)/(12al)) exp(2aεu³)|_{u=A}^{u=|t|}, if |t| > A.

These representations are of great importance, since they allow one to reduce significantly the amount of numerical calculation required for the proof of Theorem 7.

In the case of i.i.d. variables the estimates can be slightly improved. Denote τ_n := σ⁻³ Σ_{j=1}^n σ_j³, and let X₁, X₂, … be centered i.i.d. r.v. with unit variances and finite third absolute moment β. Then ε_n = β/√n and τ_n = 1/√n.

Theorem 6.
For the sequence of r.v. defined above and every t ∈ R,

|f_{S_n}(t)| ≤ b_f²(ε_n, n, t) := (1 + (2/n) b(t, ε_n + 1/√n))^{n/2},  (15)

δ_n(t) ≤ b_δ³(ε_n, n, t) := (ε_n/2) ϕ(t) ∫₀^{|t|} (1 + (2/n) b(s, ε_n + 1/√n))^{(n−1)/2} s² exp(s²/2) ds.  (16)

Let m ∈ N and n > m. Then

|f_{S_n}(t)| ≤ b_f³(ε_n, m, t) := exp(b(t, ε_n + 1/√m)),  (17)

δ_n(t) ≤ b_δ⁴(ε_n, m, t) := (ε_n/2) ϕ(t) ∫₀^{|t|} exp(((m−1)/m) b(s, ε_n + 1/√m) + s²/2) s² ds.  (18)

Estimates (12)-(18) allowed us to establish the following result.

Theorem 7.
The constant C in inequality (1) does not exceed 0.4785, and in the case of identically distributed summands C ≤ 0.4784.

Proof of Theorem 1. If K = ∅, then K_{m+1} = ∅, and the statement of our theorem is true. Further we suppose that the set K is nonempty.

The sequence of sets K₁, K₂, … increases to the set K. Therefore,

sup_{µ∈K} g(µ) = sup_{j≥1} sup_{µ∈K_j} g(µ) = sup_{j≥m+1} sup_{µ∈K_j} g(µ),

so it suffices to show that

sup_{µ∈K_{m+1}} g(µ) ≥ sup_{µ∈K_{m+2}} g(µ) ≥ sup_{µ∈K_{m+3}} g(µ) ≥ …

Let us take an arbitrary measure µ ∈ K_j, where j > m + 1, and show that there exists µ′ ∈ K_{j−1} such that g(µ′) ≥ g(µ).

Let µ be concentrated in points s₁, …, s_j ∈ S and

µ({s_i}) = µ_i > 0,  i = 1, …, j.  (19)

The vector µ̄ = (µ₁, …, µ_j) defines a probability distribution, so

µ₁ + … + µ_j = 1.  (20)

Moreover, the conditions ⟨h_i, µ⟩ = 0, i = 1, …, m, hold, and therefore

µ₁·h_i(s₁) + … + µ_j·h_i(s_j) = 0,  i = 1, …, m.  (21)

Vice versa, an arbitrary vector with nonnegative coordinates satisfying the system of linear equations (20) and (21) defines according to (19) an element of the set K_j, and if one of its coordinates equals zero, an element of K_{j−1}. We have m + 1 equations and at least m + 2 unknowns, so there exists a nonzero solution ν̄ = (ν₁, …, ν_j) of the corresponding homogeneous system. Since the sum of the coordinates of this vector is equal to zero, but the vector itself is nonzero, it follows that ν̄ has both positive and negative coordinates. Therefore, there exist the least α ≥ 0 and β ≥ 0 such that some coordinate of µ̄₁* := µ̄ − αν̄ equals zero and some coordinate of µ̄₂* := µ̄ + βν̄ equals zero. If α = 0, then µ ∈ K_{j−1}. Otherwise,

µ̄ = (β/(α + β)) µ̄₁* + (α/(α + β)) µ̄₂*,

and because of the quasiconvexity g(µ) ≤ max{g(µ₁*), g(µ₂*)}, where µ₁*, µ₂* are the distributions defined by µ̄₁* and µ̄₂*. Thus, g(µ) ≤ g(µ₁*) or g(µ) ≤ g(µ₂*). But µ₁* and µ₂* ∈ K_{j−1}. □

Proof of Theorem 2.
According to Theorem 1, it is sufficient to prove that for every fixed value of γ the function g(µ) := ‖Aµ‖ − γ⟨f, µ⟩ is quasiconvex; indeed, (6) holds for every µ ∈ K if and only if sup_{µ∈K} g(µ) ≤ 0. Let α + β = 1, α, β > 0. By the properties of the norm,

‖A(αµ + βν)‖ − γ⟨f, αµ + βν⟩ = ‖αAµ + βAν‖ − γ⟨f, αµ⟩ − γ⟨f, βν⟩ ≤ ‖αAµ‖ + ‖βAν‖ − γ⟨f, αµ⟩ − γ⟨f, βν⟩ = αg(µ) + βg(ν) ≤ max{g(µ), g(ν)}. □

Proof of Theorem 3.
We begin by showing that without loss of generality we can consider simple r.v. W. It is sufficient to establish that for every r.v. W satisfying the conditions of the theorem there exists a sequence (W_n)_{n≥1} of simple r.v. with zero means and unit variances such that

ζ₁(W_n, W_n*) → ζ₁(W, W*) and E|W_n|³ → E|W|³, n → ∞.  (22)

We suppose that the r.v. W is defined on the probability space (R, B(R), P = P_W) and construct a sequence of simple r.v. (W′_n)_{n≥1} that converges to W in the L₃ norm. We set

W_n := (W′_n − E W′_n) / √(Var W′_n).

It is easy to see that W_n converges to W in L₃ as well. Therefore, the second condition in (22) is obviously satisfied. It remains to show that the first one also holds.

From the triangle inequality for the metric ζ₁ one can easily derive that

|ζ₁(W, W*) − ζ₁(W_n, W_n*)| ≤ ζ₁(W, W_n) + ζ₁(W*, W_n*).  (23)

The first summand on the right-hand side of (23) tends to zero, since

ζ₁(W, W_n) = l₁(W, W_n) ≤ E|W − W_n| ≤ (E|W − W_n|³)^{1/3}.

Let us evaluate the second summand. For a function f ∈ F₁ we set F(x) := ∫₀^x f(u) du. Then

E f(W*) − E f(W_n*) = E W F(W) − E W_n F(W_n).  (24)

The difference of expectations on the left-hand side of (24) does not change if we replace the function f(x) by f(x) − f(0). Therefore, we can assume without loss of generality that f(0) = 0. Then |f(x) − f(y)| ≤ |x − y| yields |f(x)| ≤ |x|, and so |F(x)| ≤ x²/2, |xf(x)| ≤ x². According to the finite-increment theorem,

W F(W) − W_n F(W_n) = (W − W_n)·{xF(x)}′|_{x=ξ} = (W − W_n){F(ξ) + ξf(ξ)},

where ξ is a number between W and W_n. Moreover, |F(ξ) + ξf(ξ)| ≤ (3/2)ξ² ≤ (3/2)(|W| + |W_n|)². This gives the estimate

|E{W F(W) − W_n F(W_n)}| ≤ (3/2) E|W − W_n|(|W| + |W_n|)².

And finally, Hölder's inequality yields

E|W − W_n|(|W| + |W_n|)² ≤ (E|W − W_n|³)^{1/3} (E(|W| + |W_n|)³)^{2/3}.

Obviously, (E|W − W_n|³)^{1/3} → 0, since W_n converges to W in L₃. Thus, the second summand in (23) tends to zero and (22) is fulfilled.

So, it is sufficient to consider simple r.v. Let A* be the mapping that takes the distribution P_X of a r.v. X to its zero-biased distribution P_X*. Moreover, consider the linear operator A₁ that maps a signed measure ν to its cumulative distribution function (c.d.f.) G_ν(x) := ν((−∞, x]). It is easy to see that

(αP_{W₁} + (1 − α)P_{W₂})* = αP_{W₁}* + (1 − α)P_{W₂}*,

hence the mapping A := A₁ − A₁A* satisfies (5). If we set h₁(x) = x, h₂(x) = x² − 1, f(x) = |x|³, A = A₁ − A₁A*, and V the normed space of integrable functions on the real line with the norm

‖G‖ = ∫_{−∞}^{∞} |G(x)| dx,

then the problem reduces to the case of simple r.v. taking at most 3 values. The c.d.f. of a simple r.v. W is a staircase function. Using formula (8) one can easily obtain the c.d.f. of W*. Therefore, it is not difficult to find the explicit expression for κ(W, W*).

Let W take exactly two values −x and y with probabilities p and q. Then its c.d.f. is piecewise constant and has two steps at the points −x and y that are equal to p and q, respectively. Since W is centered, we have px = qy, which together with (8) yields that W* is uniformly distributed on [−x, y]. Therefore, on [−x, y] its c.d.f. is linear, and its graph is a segment that connects (−x, 0) and (y, 1). The distance κ(W, W*) equals the area of the figure bounded by the distribution functions of these r.v. (in the case considered it is a union of two triangles, see pic. 1, left).

It follows from the conditions E W = 0 and E W² = 1 that x = √(q/p), y = √(p/q). Hence

E|W|³ = px³ + qy³ = q√(q/p) + p√(p/q).  (25)

Let us find the area of the figure bounded by the c.d.f. of the r.v. W and W*. The density of W* equals px = qy = √(pq). Thus, the slope of the c.d.f. of this r.v. on [−x, y] is √(pq). The length of the vertical leg of the first triangle is p, and that of the second one is q. Hence, the total area of both triangles is

p²/(2√(pq)) + q²/(2√(pq)) = (p/2)√(p/q) + (q/2)√(q/p) = E|W|³/2.

Therefore, if W takes exactly two values, there is equality in (9).

Consider the case when W takes three values. We assume without loss of generality that two of them (−a < −b ≤ 0) do not exceed zero and one (c > 0) is positive. As before, the c.d.f. of W is piecewise constant and the c.d.f. of W* is piecewise linear. However, the form of the figure bounded by them is more complicated (see pic. 1, right). Denote by R the value of the c.d.f. of W* at −b and by S its value at 0. Let W take the values −a, −b, c with probabilities p, q, r, respectively. Then, because of the moment-type restrictions,

p + q + r = 1,  −pa − qb + rc = 0,  pa² + qb² + rc² = 1.

It is a system of linear equations with respect to p, q, r. Using Cramer's rule, we obtain

p = (1 − bc)(b + c)/Δ,  q = (ac − 1)(a + c)/Δ,  r = (ab + 1)(a − b)/Δ,

where Δ = (a + c)(b + c)(a − b). Thus, every r.v. with zero mean and variance 1 that takes three values is uniquely determined by these three values. It is easy to see that p, q, r are nonnegative iff

ac ≥ 1,  bc ≤ 1.  (26)

In other words, a r.v. W taking the values −a, −b, c exists iff (26) is satisfied. Our aim is to prove that the function

g(a, b, c) := κ(W, W*) − E|W|³/2  (27)

does not exceed zero. Its explicit form in terms of the variables a, b, c depends on how the c.d.f. of the r.v. W and W* are located with respect to each other. There are 5 cases:

I. R ≤ p, S ≤ p + q, or, equivalently, a(a − b) ≤ 1, c ≥ 1.

II. R ≤ p, S > p + q ⟺ a(a − b) ≤ 1, c < 1.

III. p < R ≤ p + q, S ≤ p + q ⟺ a(a − b) > 1, c ≥ 1.

IV. p < R ≤ p + q, S > p + q ⟺ a(a − b) > 1, c(b + c) ≥ 1, c < 1.

V. R > p + q ⟺ c(b + c) < 1.

Note that in each of these cases g is the same function defined by (27). As a result, if the values a, b, c satisfy the restrictions of two cases simultaneously, then for the function g we can use the expression corresponding to any of them. The explicit expressions corresponding to the cases I and II coincide, as do those corresponding to the cases III and IV. Therefore, further we distinguish three possibilities:

A. a(a − b) ≤ 1;
B. a(a − b) > 1, c(b + c) ≥ 1;
C. c(b + c) ≤ 1.

We show that in each of the cases A, B and C the function g does not exceed zero.

Case A. Direct computation shows that

g(a, b, c) = −(a − b)(ac − 1)(1 − bc)(1 + ab + ac + bc)/(cΔ),

and the nonpositivity of g follows from the nonnegativity of p, q, r (i.e. from (26)) and the fact that a > b.

Case C. Here direct computation yields

g(a, b, c) = (2c(ac − 1)/(aΔ)) {−(ab + 1)(a − b)c² − [1 + b(a − b)]c − b}.

Since ac ≥ 1, it suffices to prove that the expression enclosed in braces does not exceed zero. Consider this expression as a function of the variable c while holding the others fixed. When c = 0, this function equals −b ≤ 0. Moreover, it decreases with respect to c, since the coefficients of the terms c² and c are negative. Consequently, for all positive values of c it does not exceed zero.

Case B. Assume that g(a, b, c) > 0. In the explicit expression for g, the dependence on c enters, up to a positive factor, through a quadratic polynomial k₂c² + k₁c + k₀, where the coefficients k₂ and k₁ are proportional to 1 − a(a − b)(1 + ab). Due to the condition a(a − b) > 1 we have

1 − a(a − b)(1 + ab) ≤ 1 − (1 + ab) = −ab ≤ 0,

so k₂ and k₁ do not exceed zero, and consequently k₂c² + k₁c + k₀ decreases with respect to c. Consequently, if one reduces the value of the variable c while holding a and b fixed, g will remain positive. The variable c is bounded from below by the two conditions ac ≥ 1 and c(b + c) ≥ 1. The first of these conditions can be omitted, since it follows from the second together with a(a − b) > 1. Indeed, let ac < 1. Then

1 ≤ c(b + c) < (1/a)(b + 1/a) ⟹ a² < ab + 1 ⟹ a(a − b) < 1.

Therefore, we can reduce c to the value c* such that c*(b + c*) = 1, and g will remain positive. But the situation, when c(b + c) = 1, satisfies the restrictions of the case C, for which we established that g ≤ 0. This contradiction completes the proof. □

Proof of Corollary 1.
Without loss of generality assume σ = 1. Let I be a random index taking the values 1, …, n with probabilities σ₁², …, σ_n², independent of X₁, …, X_n. Construct on an extended probability space the r.v.

S′_i := Σ_{j≠i} X_j + X_i*,  i = 1, …, n,

where X_i* has the X_i-zero biased distribution and is independent of I, X₁, …, X_n, i = 1, …, n. Then S_n* = S′_I has the S_n-zero biased distribution (see [26]). Therefore, for an arbitrary function f ∈ F₁ one has

E f(S_n*) = E f(S′_I) = Σ_{k=1}^n E f(S′_I)1{I = k} = Σ_{k=1}^n E f(S′_k)1{I = k} = Σ_{k=1}^n σ_k² E f(S′_k).  (28)

Consequently,

|E f(S_n) − E f(S_n*)| = |Σ_{k=1}^n σ_k² E f(S_n) − Σ_{k=1}^n σ_k² E f(S′_k)| ≤ Σ_{k=1}^n σ_k² |E f(S_n) − E f(S′_k)| ≤ Σ_{k=1}^n σ_k² ζ₁(X_k, X_k*) = Σ_{k=1}^n σ_k³ ζ₁(X_k/σ_k, X_k*/σ_k) ≤ Σ_{k=1}^n σ_k³ · (1/2) E|X_k/σ_k|³ = (1/2) Σ_{k=1}^n β_k = ε_n/2.

Here we used the statement of Theorem 3 for the r.v. X_k/σ_k as well as the homogeneity of the metric ζ₁ (i.e. ζ₁(cX, cY) = cζ₁(X, Y)) and the fact that (αX)* =_D αX*. □

Proof of Theorem 4.
It is easy to see that for continuous f: R → R the function

h(w) := e^{w²/2} ∫_{−∞}^w (f(x) − E f(N)) e^{−x²/2} dx

satisfies the Stein equation

h′(w) − w h(w) = f(w) − E f(N).

Hence

|E f(S_n) − E f(N)| = |E h′(S_n) − E S_n h(S_n)| = |E h′(S_n) − E h′(S_n*)| ≤ M₂(h) ζ₁(S_n, S_n*).

As it was shown in [27],

M₂(h) ≤ min{2 M₁(f), (√(2π)/4) M₂(f), (1/3) M₃(f)}.

Taking into account that M_r(f) ≤ 1 for f ∈ F_r, one has

ζ₁(S_n, N) ≤ 2 ζ₁(S_n, S_n*),  ζ₂(S_n, N) ≤ (√(2π)/4) ζ₁(S_n, S_n*),  ζ₃(S_n, N) ≤ (1/3) ζ₁(S_n, S_n*).  (29)

Estimates in terms of ε_n are obtained by applying Corollary 1 to (29).

Let us prove the optimality of (11). We set f(x) := x³/6. Then M₃(f) = 1 and E f(N) = 0, since the r.v. N is symmetric and the function f is odd. Consider a sequence X₁, X₂, … of i.i.d. variables with zero means and unit variances. Then

E S_n³ = E X₁³/√n and ε_n = E|X₁|³/√n.

As a result, we have

ζ₃(S_n, N)/ε_n ≥ |E f(S_n) − E f(N)|/ε_n = (1/6) |E X₁³| / E|X₁|³.
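The ratio on the right-hand side can indeed be made arbitrarily close to 1 with two-point laws; a quick numerical check (ours, not from the paper):

```python
import math

# The ratio |E X^3| / E|X|^3 for the mean-zero, unit-variance two-point laws
# of (25) approaches 1 as p -> 0 (a quick check, ours).
def ratio(p):
    q = 1.0 - p
    pos = p * math.sqrt(p / q)    # contribution of the value  sqrt(p/q)
    neg = q * math.sqrt(q / p)    # contribution of the value -sqrt(q/p)
    return abs(pos - neg) / (pos + neg)

for p in (0.1, 0.01, 0.0001):
    print(p, ratio(p))   # increases towards 1
```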
It only remains to prove that |E X₁³|/E|X₁|³ can be arbitrarily close to unity. According to (25), the third absolute moment of a centered r.v. X with variance 1 taking the two values −x = −√(q/p), y = √(p/q) with probabilities p and q, respectively, is equal to

E|X|³ = q√(q/p) + p√(p/q).

It is easy to see that the third moment of this r.v. equals

E X³ = −q√(q/p) + p√(p/q).

Obviously, |E X³|/E|X|³ → 1 as p → 0. □

Lemma 1 ([30]). Let W be a centered r.v. with variance 1 and finite third absolute moment β. Denote f(t) := E e^{itW}, t ∈ R. Then for all t ∈ R,

|f(t)|² ≤ 1 + 2 b(t, β + 1);  (30)

moreover,

|f_{S_n}(t)| ≤ exp(b(t, ε_n + τ_n)).  (31)

Lemma 2.
For every t ∈ R the function b(t, γ) is nondecreasing with respect to γ.

Proof.
This can be checked directly by calculating the derivative. □
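The monotonicity stated in Lemma 2 can also be observed numerically. The sketch below (ours) uses the piecewise form of b(t, γ) as reconstructed above, which is an assumption since the original display is damaged; the constants a and M are recomputed by grid search.

```python
import math

# b(t, gamma) in the reconstructed piecewise form (an assumption; the source
# display is damaged), with the constants a and M found numerically, and a
# numerical check of Lemma 2.
xs = [i / 10000.0 for i in range(1, 80001)]                  # grid on (0, 8]
vals = [(math.cos(x) - 1 + x * x / 2) / x ** 3 for x in xs]
a = max(vals)
M = xs[vals.index(a)]
print(round(a, 4), round(M, 2))                              # about 0.0992, 3.99

def b(t, gamma):
    u = gamma * abs(t)
    if u < M:
        return -t * t / 2 + a * gamma * abs(t) ** 3
    if u <= 2 * math.pi:
        return -(1 - math.cos(u)) / gamma ** 2
    return 0.0

# Lemma 2: for fixed t, b(t, gamma) is nondecreasing in gamma
t = 1.7
bs = [b(t, 0.05 + 0.01 * k) for k in range(500)]
print(all(b2 >= b1 - 1e-12 for b1, b2 in zip(bs, bs[1:])))   # True
```

Note that the two branch boundaries are continuous by the very definition of a and M, which is what makes the monotonicity in γ possible.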
Lemma 3. If W is a centered r.v. with variance 1 and f(t) = E e^{itW}, then

|f(t) − ϕ(t)| ≤ ϕ(t) ∫₀^{|t|} |f(s) − f*(s)| s exp(s²/2) ds,  t ∈ R,  (32)

where f*(t) is the characteristic function of a r.v. W* having the W-zero biased distribution.

Proof.
According to the definition of the W-zero biased distribution,

f′(t) = i E W cos(tW) − E W sin(tW) = −it E sin(tW*) − t E cos(tW*) = −t E e^{itW*}.  (33)

Consider the function ψ(t) := f(t)/ϕ(t). Note that ψ(0) = 1. Taking into account (33), we have

ψ′(t) = (d/dt)(f(t) e^{t²/2}) = f′(t) e^{t²/2} + t f(t) e^{t²/2} = {f(t) − f*(t)} t e^{t²/2}.

Then

|f(t)/ϕ(t) − 1| = |ψ(t) − ψ(0)| ≤ ∫₀^{|t|} |ψ′(s)| ds = ∫₀^{|t|} |f(s) − f*(s)| s e^{s²/2} ds. □

Lemma 4.
For arbitrary r.v. X and Y we have

|E e^{itX} − E e^{itY}| ≤ |t| ζ₁(X, Y).

Proof.
It is well known that for all t, x, y ∈ R the inequality

|e^{itx} − e^{ity}| ≤ |t||x − y|

holds. Hence, for arbitrary X̃, Ỹ defined on one probability space such that Law(X̃) = Law(X) and Law(Ỹ) = Law(Y), we have

|E e^{itX} − E e^{itY}| = |E e^{itX̃} − E e^{itỸ}| ≤ E|e^{itX̃} − e^{itỸ}| ≤ |t| E|X̃ − Ỹ|.  (34)

Passing in (34) to the greatest lower bound over all possible X̃, Ỹ, we obtain

|E e^{itX} − E e^{itY}| ≤ |t| l₁(X, Y) = |t| ζ₁(X, Y). □

Proof of Theorem 5.
The inequality (12) is a consequence of Lemma 1. Indeed, according to the Lyapunov inequality, we have σ_j³ ≤ β_j, j = 1, …, n. Hence τ_n ≤ ε_n. Now (12) follows from (31) and Lemma 2.

Further we assume without loss of generality that σ = 1. Denote

f_j(t) := E e^{itX_j},  f_j*(t) := E e^{itX_j*},  j = 1, …, n,

and set W := S_n in (32). Using Lemma 4 and Corollary 1 we get

|f(s) − f*(s)| ≤ s ζ₁(S_n, S_n*) ≤ ε_n s/2.

Substituting the latter into (32), we arrive at (13).

According to (28),

f*(s) = Σ_{m=1}^n σ_m² E e^{isS′_m} = Σ_{m=1}^n σ_m² f_m*(s) Π_{j≠m} f_j(s).

Therefore,

|f(s) − f*(s)| = |Σ_{i=1}^n σ_i² Π_j f_j(s) − Σ_{i=1}^n σ_i² f_i*(s) Π_{j≠i} f_j(s)| = |Σ_{i=1}^n σ_i² {f_i(s) − f_i*(s)} Π_{j≠i} f_j(s)|.  (35)

From Lemma 4 and Theorem 3 we have

|f_j(s) − f_j*(s)| ≤ s ζ₁(X_j, X_j*) = s σ_j ζ₁(X_j/σ_j, X_j*/σ_j) ≤ β_j s/(2σ_j²).  (36)

It follows from (30) that |f_j(s)| ≤ exp(−σ_j²s²/2 + 2aβ_j|s|³) for all real s. As a result,

|f(s) − f*(s)| ≤ (s/2) Σ_{i=1}^n β_i Π_{j≠i} exp(−σ_j²s²/2 + 2aβ_j s³) = (s/2) Σ_{j=1}^n β_j exp(σ_j²s²/2 − 2aβ_j s³) · exp(−s²/2 + 2aε_n s³).  (37)

Since σ_j³ ≤ β_j ≤ ε_n for j = 1, …, n, we have for such j

exp(σ_j²t²/2 − 2aβ_j t³) ≤ exp(σ_j²t²/2 − 2aσ_j³ t³) = exp(s²/2 − 2as³)|_{s=tσ_j} ≤ sup{exp(s²/2 − 2as³) : s ∈ [0, tε_n^{1/3}]}.  (38)

The function exp(s²/2 − 2as³) increases on the segment [0, 1/(6a)] and at the point 1/(6a) it attains its global maximum equal to 1/l. Therefore,

sup{exp(s²/2 − 2as³) : s ∈ [0, tε_n^{1/3}]} = exp(t²ε_n^{2/3}/2 − 2aε_n t³), if tε_n^{1/3} ≤ 1/(6a); 1/l, otherwise.  (39)

Combining (37), (38) and (39) gives for sε_n^{1/3} ≤ 1/(6a)

|f(s) − f*(s)| ≤ (ε_n s/2) exp(−s²(1 − ε_n^{2/3})/2),

and for sε_n^{1/3} > 1/(6a)

|f(s) − f*(s)| ≤ (ε_n s/(2l)) exp(−s²/2 + 2aε_n s³).

Substituting the expressions obtained into (32), we get the required estimates. □
Proof of Theorem 6.
At first we prove (15). Denote f ( t ) := E e itX . According toLemma 1, (cid:12)(cid:12)(cid:12)(cid:12) f (cid:18) t √ n (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) s b (cid:18) t √ n , β + 1 (cid:19) = s n b (cid:18) t, β + 1 √ n (cid:19) = r n b ( t, ε n + τ n ) . (40)Now (15) follows from the fact that f S n ( t ) = f n ( t/ √ n ).To establish (17) we note that 1 + x e x for all real x . Applying this inequality to (15)gives | f S n ( t ) | exp (cid:18) b ( t, ε n + τ n ) (cid:19) . (41)It remains to note that the sequence ( τ m ) m > decreases, which leads to (17).We set W := S n in Lemma 3. Applying (35) to the r.v. √ n X , . . . , √ n X n yields | f ( s ) − f ∗ ( s ) | = (cid:12)(cid:12)(cid:12)(cid:12) f (cid:18) s √ n (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) n − · (cid:12)(cid:12)(cid:12)(cid:12) f (cid:18) s √ n (cid:19) − f ∗ (cid:18) s √ n (cid:19)(cid:12)(cid:12)(cid:12)(cid:12) . The first factor can be estimated with the help of (40) and the second – by means of (36).We have | f ( s ) − f ∗ ( s ) | (cid:18) b ( s, ε n + τ n ) n (cid:19) n − ε n s . (42)Substituting the expression obtained into (32), we get (16). To establish (18) we apply theinequality 1 + x e x to the first factor on the right-hand side of (42) and note that ( τ m ) m > is decreasing. (cid:3) Proof of Theorem 7.
Let D ( ε, n ) denote the least quantity such that for every collectionconsisting of n r.v. X , . . . , X n with ε n = ε holds the inequality ρ ( S n , N ) D ( ε, n ) ε n .
15e set D ( ε ) := sup n > D ( ε, n ) . Then the constant C can be determined as C = sup ε> D ( ε ) . Hence, it suffices to show that for all possible values of ε and n the quantity D ( ε, n ) . D ( ε, n ) . ε > / . ε > / . ρ ( S n , N ) λ n := σ ( n ) / ( σ ( n ) − max k =1 ,...,n σ k ) and set b ε n := λ / n ε n , ε ′ n := λ / n τ n , ε ′′ n := λ n n X k =1 σ k /σ ( n ) . Then, according to the inequality (I.52) from [31], for b ε n + ε ′ n . ρ ( S n , N ) . b ε n + 0 . ε ′ n + 0 . ε ′′ n + 0 . b ε n + ε ′ n ) . (43)Assume without loss of generality that σ = 1. Then the Lyapunov inequality yields σ j β k P nk =1 β k = ε n , j = 1 , . . . , n . Hence λ n (1 − ε / n ) − . In addition, ε ′ n b ε n and,as it was shown in [31], ε ′′ n ( ε ′ n ) / . From these inequalities and (43) it follows easily that D ( ε ) . ε . ε n = β / ( σ √ n ) > / √ n . Thus, n > ⌈ /ε n ⌉ and λ n = nn − n ( ε n ) n ( ε n ) − , (44)where n ( ε ) := ⌈ /ε ⌉ . Moreover, ε ′ n = λ / n √ n , and ε ′′ n = λ n . (45)Combining (43), (44) and (45) yields D ( ε ) . ε . ε from the segment I = [0 .
02; 1 / . I = [0 . / . ε is based on aninequality due to Prawitz [32] ρ ( S n , N ) ε n ε n U Z − U U (cid:12)(cid:12)(cid:12) K (cid:16) uU (cid:17)(cid:12)(cid:12)(cid:12) · | δ n ( u ) | du + Z U < | u | U U (cid:12)(cid:12)(cid:12) K (cid:16) uU (cid:17)(cid:12)(cid:12)(cid:12) · | f n ( u ) | du ++ U Z − U (cid:12)(cid:12)(cid:12)(cid:12) U K (cid:16) uU (cid:17) − i πu (cid:12)(cid:12)(cid:12)(cid:12) · | ϕ ( u ) | du + Z | u | >U (cid:12)(cid:12)(cid:12)(cid:12) ϕ ( u )2 πu (cid:12)(cid:12)(cid:12)(cid:12) du , (46)where K ( u ) := (1 − | u | ) + i (cid:16) (1 − | u | ) cot( πu ) + sgn( u ) π (cid:17) , < U U .16t follows from (46) that D ( ε ) does not exceed the quantity D ∗ ( ε, U , U ), which arises onthe right-hand side of (46) when we substitute δ n ( t ) with its estimate min { b δ ( ε, t ) , b δ ( ε, t ) } , | f n ( t ) | – with the estimate b f ( ε, t ) and select such parameters U , U that the resulting ex-pression was as little as possible. This procedure was carried out with the aid of computerfor several hundreds values of ε dispersed on the segment I . To obtain the estimates forthe intermediate points we used the following property of the quantities D ∗ , which holdsdue to the monotonicity of the functions b f , . . . , b f and b δ , . . . , b δ with respect to their firstarguments. D ∗ ( ε (1) , U , U ) ε (2) ε (1) D ∗ ( ε (2) , U , U ) , ε (1) < ε (2) . (47)The extremal value of the quantity D ∗ ( ε, U , U ) = 0 . ε = 0 . , U = 2 . , U = 5 . . In the case of i.i.d. r.v. the estimates were constructed in a different way.For the fixed value of ε we estimated the quantities D ( ε, n ) , n > , separately . For n < m , where m is some natural number, the individual estimates of D ( ε, n ) were given.On the right-hand side of (46) we substituted δ n ( t ) and | f n ( t ) | with their upper estimates b δ ( ε, n, t ) and b f ( ε, n, t ). 
After that the computational procedure as described above wascarried out to select the optimal parameters U and U . For n > m the quantities D ( ε, n ) wereestimated uniformly. On the right-hand side of (46) the estimates b δ ( ε, m, t ) and b f ( ε, m, t )were used. As before, it was sufficient to carry out the calculations only for the finite numberof points, since a property similar to (47) holds in this case as well. For the i.i.d. r.v. theextremal value 0 . ε = 0 . , n = 8, U = 2 . , U = 8 . . Thus, the constant C does not exceed 0 . C . Acknowledgement
The author would like to thank Professor A. V. Bulinski for useful discussions and valuableadvice.