Rate of convergence and Edgeworth-type expansion in the entropic central limit theorem
By Sergey G. Bobkov, Gennadiy P. Chistyakov and Friedrich Götze
University of Minnesota, University of Bielefeld and University of Bielefeld
An Edgeworth-type expansion is established for the entropy distance to the class of normal distributions of sums of i.i.d. random variables or vectors, satisfying minimal moment conditions.
1. Introduction.
Let $(X_n)_{n\ge 1}$ be independent, identically distributed random variables with mean $E X = 0$ and variance $\mathrm{Var}(X) = 1$. According to the central limit theorem, the normalized sums
\[
Z_n = \frac{X_1 + \cdots + X_n}{\sqrt{n}}
\]
are weakly convergent in distribution to the standard normal law, $Z_n \Rightarrow Z$, where $Z \sim N(0,1)$ with density $\varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$. A much stronger statement (when applicable), the entropic central limit theorem, states that, if for some $n_0$, or equivalently, for all $n \ge n_0$, the random variables $Z_n$ have absolutely continuous distributions with finite entropies $h(Z_n)$, then these entropies converge,
\[
h(Z_n) \to h(Z) \qquad \text{as } n \to \infty. \tag{1.1}
\]
This theorem is due to Barron [3]. Some weaker variants of the theorem in case of regularized distributions were known before; they go back to the work of Linnik [16], initiating an information-theoretic approach to the central limit theorem.

Received April 2011; revised May 2012. Supported in part by NSF Grant DMS-11-06530 and SFB 701.
AMS 2000 subject classifications.
Key words and phrases.
Entropy, entropic distance, central limit theorem, Edgeworth-type expansions.
This is an electronic reprint of the original article published by the Institute of Mathematical Statistics in The Annals of Probability, 2013, Vol. 41, No. 4, 2479–2512. This reprint differs from the original in pagination and typographic detail.
To clarify in which sense (1.1) is strong, recall that, if a random variable $X$ with finite second moment has a density $p(x)$, its entropy
\[
h(X) = -\int_{-\infty}^{+\infty} p(x) \log p(x)\,dx
\]
is well defined and is bounded from above by the entropy of the normal random variable $Z$ having the same mean $a$ and the same variance $\sigma^2$ as $X$. Note that the value $h(X) = -\infty$ is possible. The relative entropy
\[
D(X) = D(X\,\|\,Z) = h(Z) - h(X) = \int_{-\infty}^{+\infty} p(x) \log \frac{p(x)}{\varphi_{a,\sigma}(x)}\,dx,
\]
where $\varphi_{a,\sigma}$ stands for the density of $Z$, is nonnegative and serves as kind of a distance to the class of normal laws, or to Gaussianity. This quantity does not depend on the mean or the variance of $X$, and can be related to the total variation distance between the distributions of $X$ and $Z$ by virtue of the Pinsker-type inequality $D(X) \ge \frac{1}{2}\,\|F_X - F_Z\|_{\mathrm{TV}}^2$. This already shows that the entropic convergence (1.1) is stronger than convergence in the total variation norm.

Thus, the entropic central limit theorem may be reformulated as $D(Z_n) \to 0$, as long as $D(Z_n) < +\infty$ for some $n$. This property itself gives rise to a number of intriguing questions, such as to the type and the rate of convergence. In particular, it has been proved only recently that the sequence $h(Z_n)$ is nondecreasing, so that $D(Z_n) \downarrow 0$; cf. [1, 17]. This leads to the question as to the precise rate of $D(Z_n)$ tending to zero; however, not much seems to be known about this problem. The best results in this direction are due to Artstein et al. [2] and to Barron and Johnson [15]. In the i.i.d. case as above, these authors have obtained an expected asymptotic bound $D(Z_n) = O(1/n)$ under the hypothesis that the distribution of $X$ admits an analytic inequality of Poincaré-type (in [15], a restricted Poincaré inequality is used). These inequalities involve a large variety of "nice" probability distributions which necessarily have a finite exponential moment.

The aim of this paper is to study the rate of $D(Z_n)$, using moment conditions $E|X|^s < +\infty$ with fixed values $s \ge 2$, which are comparable to those required for classical Edgeworth-type approximations in the Kolmogorov distance. The cumulants
\[
\gamma_r = i^{-r}\, \frac{d^r}{dt^r} \log E\, e^{itX} \Big|_{t=0}
\]
are then well defined for all $r \le [s]$ (the integer part of $s$), and one may introduce the functions
\[
q_k(x) = \varphi(x) \sum H_{k+2j}(x)\, \frac{1}{r_1!\cdots r_k!} \Big(\frac{\gamma_3}{3!}\Big)^{r_1} \cdots \Big(\frac{\gamma_{k+2}}{(k+2)!}\Big)^{r_k} \tag{1.2}
\]
involving the Chebyshev–Hermite polynomials $H_k$. The summation in (1.2) runs over all nonnegative integer solutions $(r_1, \ldots, r_k)$ to the equation $r_1 + 2r_2 + \cdots + k r_k = k$, and one uses the notation $j = r_1 + \cdots + r_k$. The functions $q_k$ are defined for $k = 1, \ldots, [s] - 2$. They appear in Edgeworth-type expansions including the local limit theorem, where the $q_k$ are used to construct the approximation of the densities of $Z_n$. These results can be applied to obtain an expansion in powers of $1/n$ for the distance $D(Z_n)$. For a multidimensional version of the following Theorem 1.1 for moments of integer order $s \ge 2$, see Theorem 6.1 below.
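For orientation, the first two functions in (1.2) can be written out explicitly (a routine computation from the definition, recorded here only for illustration):
\[
q_1(x) = \frac{\gamma_3}{3!}\, H_3(x)\, \varphi(x), \qquad
q_2(x) = \Big( \frac{\gamma_4}{4!}\, H_4(x) + \frac{\gamma_3^2}{2\,(3!)^2}\, H_6(x) \Big) \varphi(x),
\]
where $H_3(x) = x^3 - 3x$ and $H_4(x) = x^4 - 6x^2 + 3$. In particular, each $q_k$ is a polynomial multiple of $\varphi$ of degree at most $3k$.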
Theorem 1.1.
Let $E|X|^s < +\infty$ ($s \ge 2$), and assume $D(Z_n) < +\infty$ for some $n$. Then
\[
D(Z_n) = \frac{c_1}{n} + \frac{c_2}{n^2} + \cdots + \frac{c_{[(s-2)/2]}}{n^{[(s-2)/2]}} + o\big((n\log n)^{-(s-2)/2}\big). \tag{1.3}
\]
Here
\[
c_j = \sum_{k=2}^{2j} \frac{(-1)^k}{k(k-1)} \sum \int_{-\infty}^{+\infty} \frac{q_{r_1}(x) \cdots q_{r_k}(x)}{\varphi(x)^{k-1}}\,dx, \tag{1.4}
\]
where the summation runs over all positive integers $(r_1, \ldots, r_k)$ such that $r_1 + \cdots + r_k = 2j$.

Each coefficient $c_j$ in (1.3) represents a certain polynomial in the cumulants $\gamma_3, \ldots, \gamma_{2j+1}$. For example, $c_1 = \frac{\gamma_3^2}{12}$, and in the case $s = 4$, (1.3) gives
\[
D(Z_n) = \frac{(E X^3)^2}{12\, n} + o\Big(\frac{1}{n \log n}\Big) \qquad (E X^4 < +\infty). \tag{1.5}
\]
Thus, under the 4th moment condition, we have $D(Z_n) \le \frac{C}{n}$, where the constant depends on the underlying distribution. This has been conjectured by Johnson [14], page 49. Actually, the constant $C$ may be expressed in terms of $E X^4$ and $D(X)$, only.

When $s$ varies in the range $4 \le s < 6$, the leading linear term in (1.5) will be unchanged, while the remainder term improves and becomes $O(n^{-2})$ in case $E X^6 < +\infty$. But for $s = 6$, the result involves the subsequent coefficient $c_2$, which depends on $\gamma_3$, $\gamma_4$ and $\gamma_5$. In particular, if $\gamma_3 = 0$, we have $c_2 = \frac{\gamma_4^2}{48}$, thus
\[
D(Z_n) = \frac{(E X^4 - 3)^2}{48\, n^2} + o\Big(\frac{1}{(n \log n)^2}\Big) \qquad (E X^3 = 0,\ E X^6 < +\infty).
\]
More generally, representation (1.3) simplifies if the first $k-1$ moments of $X$ coincide with the corresponding moments of $Z \sim N(0,1)$.
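To illustrate how the coefficients arise from (1.4), consider $j = 1$ (a routine verification, added here only for the reader's convenience). The only admissible term has $k = 2$ and $r_1 = r_2 = 1$, so that, using $q_1 = \frac{\gamma_3}{6} H_3 \varphi$ and the orthogonality relation $\int_{-\infty}^{+\infty} H_3(x)^2 \varphi(x)\,dx = 3! = 6$,
\[
c_1 = \frac{1}{2} \int_{-\infty}^{+\infty} \frac{q_1(x)^2}{\varphi(x)}\,dx
    = \frac{1}{2} \Big(\frac{\gamma_3}{6}\Big)^2 \int_{-\infty}^{+\infty} H_3(x)^2 \varphi(x)\,dx
    = \frac{\gamma_3^2}{12},
\]
in accordance with the value quoted above (recall that $\gamma_3 = E X^3$ when $E X = 0$ and $\mathrm{Var}(X) = 1$).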
Corollary 1.2.
Let $E|X|^s < +\infty$ ($s \ge 2$), and assume that $D(Z_n) < +\infty$ for some $n$. Given $k = 3, 4, \ldots, [s]$, assume that $\gamma_j = 0$ for all $3 \le j < k$. Then
\[
D(Z_n) = \frac{\gamma_k^2}{2\,k!}\cdot\frac{1}{n^{k-2}} + O\Big(\frac{1}{n^{k-1}}\Big) + o\Big(\frac{1}{(n\log n)^{(s-2)/2}}\Big). \tag{1.6}
\]

Johnson had noticed (though in terms of the standardized Fisher information, see [14], Lemma 2.12) that if $\gamma_k \ne 0$, $D(Z_n)$ cannot be of smaller order than $n^{-(k-2)}$.

Note that when $E X^{2k} < +\infty$, the $o$-term may be removed in the representation (1.6). On the other hand, when $k > \frac{s+2}{2}$, the $o$-term will dominate the $n^{-(k-2)}$-term, and we can only conclude that $D(Z_n) = o((n\log n)^{-(s-2)/2})$.

As for the missing range $2 \le s < 4$, here there are no coefficients $c_j$ appearing in the sum (1.3), and Theorem 1.1 just tells us that
\[
D(Z_n) = o\Big(\frac{1}{(n\log n)^{(s-2)/2}}\Big). \tag{1.7}
\]
This bound is worse than the rate $1/n$. In particular, it only gives $D(Z_n) = o(1)$ for $s = 2$, which is the statement of Barron's theorem. In fact, in this case the entropic distance to normality may decay to zero at an arbitrarily slow rate. In case of a finite 3rd absolute moment, $D(Z_n) = o(1/\sqrt{n\log n})$. To see that this, and more generally relation (1.7), cannot be improved with respect to the powers of $1/n$, we prove:

Theorem 1.3. Let $\eta > 0$. Given $2 < s < 4$, there exists a sequence of independent, identically distributed random variables $(X_n)_{n\ge1}$ with $E|X|^s < +\infty$, such that $D(X) < +\infty$ and
\[
D(Z_n) \ge \frac{c}{(n\log n)^{(s-2)/2} (\log n)^{\eta}}, \qquad n \ge n_1(X),
\]
with a constant $c = c(\eta, s) > 0$, depending on $\eta$ and $s$ only.

Known bounds on the entropy are commonly based on de Bruijn's identity, which may be used to represent the entropic distance to normality as an integral of the Fisher information for regularized distributions; cf. [3]. However, it is not clear how to reach exact asymptotics with this approach. The proofs of Theorems 1.1 and 1.3 stated above rely upon classical tools and results in the theory of sums of independent summands, including Edgeworth-type expansions for convolutions of densities formulated as local limit theorems with nonuniform remainder bounds. For noninteger values of $s$, the authors had to complete the otherwise extensive literature by recent, technically rather involved results based on fractional differential calculus; see [6, 7]. Our approach applies to random variables in higher dimension as well, and to nonidentically distributed summands with uniformly bounded $s$th moments.

We start with the description of a truncation-of-density argument, which allows us to reduce many questions about bounding the entropic distance to the case of bounded densities (Section 2). In Section 3 we discuss known results about Edgeworth-type expansions that will be used in the proof of Theorem 1.1. The main steps of the proofs based on these results are carried out in Sections 4 and 5. All auxiliary results cover the scheme of i.i.d. random vectors in $\mathbb{R}^d$ as well (however, with integer values of $s$) and are finalized in Section 6 to obtain multidimensional variants of Theorem 1.1 and Corollary 1.2. Sections 7 and 8 are devoted to lower bounds on the entropic distance to normality for a special class of probability distributions on the real line that are used in the proof of Theorem 1.3.
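As an aside (this observation is not used in the sequel), the entropic bounds above immediately translate into total variation bounds via the Pinsker-type inequality quoted earlier. For example, under the assumptions of (1.5),
\[
\|F_n - \Phi\|_{\mathrm{TV}} \le \sqrt{2\,D(Z_n)} = O\big(n^{-1/2}\big),
\]
where $F_n$ and $\Phi$ denote the distribution functions of $Z_n$ and $Z$, respectively.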
2. Binomial decomposition of convolutions.
First let us comment on theassumptions in Theorem 1.1. It may happen that X has a singular distribu-tion, but the distribution of X + X and of all next sums S n = X + · · · + X n ( n ≥
2) are absolutely continuous; cf. [25].If it exists, the density p of X may or may not be bounded. In the firstcase, all the entropies h ( S n ) are finite. If p is unbounded, it may happenthat all h ( S n ) are infinite, even if p is compactly supported. But if h ( S n )is finite for some n = n then, for all n ≥ n , entropies are finite; see [3] forspecific examples.Denote by p n ( x ) the density of Z n = S n / √ n (when it exists). Since it isdesirable to work with bounded densities, we will slightly modify p n at theexpense of a small change in the entropy. Variants of the next constructionare well known; see, for example, [13, 23], where the central limit theoremwas studied with respect to the total variation distance. Without any extraefforts, we may assume that X n take values in R d which we equip with theusual inner product h· , ·i and the Euclidean norm | · | . For simplicity, wedescribe the construction in the situation, where X has a density p ( x ); cf.Remark 2.5 on appropriate modifications in the general case.Let m ≥ m = [ s ] + 1.)If p is bounded, we put e p n ( x ) = p n ( x ) for all n ≥
1. Otherwise, the integral b = Z p ( x ) >M p ( x ) dx (2.1)is positive for all M >
0. Choose M to be sufficiently large to satisfy, forexample, 0 < b < ; cf. Remark 2.4. In this case (when p is unbounded),consider the decomposition p ( x ) = (1 − b ) ρ ( x ) + bρ ( x ) , (2.2) S. G. BOBKOV, G. P. CHISTYAKOV AND F. G ¨OTZE where ρ , ρ are the normalized restrictions of p to the sets { p ( x ) ≤ M } and { p ( x ) > M } , respectively. Hence, for the convolutions we have a binomialdecomposition p ∗ n = n X k =0 C kn (1 − b ) k b n − k ρ ∗ k ∗ ρ ∗ ( n − k )2 . For n ≥ m + 1, we split the above sum into the two parts, so that p ∗ n = ρ n + ρ n with ρ n = n X k = m +1 C kn (1 − b ) k b n − k ρ ∗ k ∗ ρ ∗ ( n − k )2 ,ρ n = m X k =0 C kn (1 − b ) k b n − k ρ ∗ k ∗ ρ ∗ ( n − k )2 . Note that, whenever b < b < , ε n ≡ Z ρ n ( x ) dx = m X k =0 C kn (1 − b ) k b n − k (2.3) ≤ n m b n − m = o ( b n ) as n → ∞ . Finally define e p n ( x ) = p n ( x ) = 11 − ε n n d/ ρ n ( x √ n )(2.4)and similarly p n ( x ) = ε n n d/ ρ n ( x √ n ). Thus, we have the desired decom-position p n ( x ) = (1 − ε n ) p n ( x ) + ε n p n ( x ) . (2.5)The probability densities p n ( x ) are bounded and provide an approxima-tion for p n ( x ) = n d/ p ∗ n ( x √ n ) in total variation. In particular, from (2.3)–(2.5) it follows that Z | p n ( x ) − p n ( x ) | dx < − n for all n large enough. One of the immediate consequences of this estimateis the bound | v n ( t ) − v n ( t ) | < − n ( t ∈ R d )(2.6)for the characteristic functions v n ( t ) = R e i h t,x i p n ( x ) dx and v n ( t ) = R e i h t,x i × p n ( x ) dx , corresponding to the densities p n and p n .This property may be sharpened in case of finite moments. NTROPIC CENTRAL LIMIT THEOREM Lemma 2.1. If E | X | s < + ∞ ( s ≥ , then for all n large enough, Z (1 + | x | s ) | e p n ( x ) − p n ( x ) | dx < − n . In particular, (2.6) also holds for all partial derivatives of v n and v n up toorder m = [ s ] . Proof.
By definition (2.5), | p n ( x ) − p n ( x ) | ≤ ε n ( p n ( x )+ p n ( x )), hence Z | x | s | p n ( x ) − p n ( x ) | dx ≤ ε n − ε n n − s/ Z | x | s ρ n ( x ) dx + n − s/ Z | x | s ρ n ( x ) dx. Let U , U , . . . be independent copies of U and V , V , . . . be independentcopies of V (that are also independent of U n ’s), where U and V are randomvectors with densities ρ and ρ , respectively. From (2.2) β s ≡ E | X | s = (1 − b ) E | U | s + b E | V | s , so E | U | s ≤ β s /b and E | V | s ≤ β s /b (using b < ). Therefore, for the normal-ized sums R k,n = 1 √ n ( U + · · · + U k + V + · · · + V n − k ) , ≤ k ≤ n, we have E | R k,n | s ≤ β s b n s/ , if s ≥
1, and E | R k,n | s ≤ β s b n − ( s/ , if 0 ≤ s ≤ ρ n and ρ n , Z | x | s ρ n ( x ) dx = n s/ n X k = m +1 C kn (1 − b ) k b n − k E | R k,n | s ≤ β s b n s +1 , Z | x | s ρ n ( x ) dx = n s/ m X k =0 C kn (1 − b ) k b n − k E | R k,n | s ≤ β s b n s +1 ε n . It remains to apply estimate (2.3) on ε n , and Lemma 2.1 follows. (cid:3) We need to extend the assertion of Lemma 2.1 to the relative entropieswith respect to the standard normal distribution on R d with density ϕ ( x ) =(2 π ) − d/ e −| x | / . Thus put D n = Z p n ( x ) log p n ( x ) ϕ ( x ) dx, e D n = Z e p n ( x ) log e p n ( x ) ϕ ( x ) dx. Lemma 2.2. If X has a finite second moment and finite entropy, then | e D n − D n | < − n , for all n large enough. S. G. BOBKOV, G. P. CHISTYAKOV AND F. G ¨OTZE
First, we collect a few elementary properties of the convex function L ( u ) = u log u ( u ≥ Lemma 2.3.
For all u, v ≥ and ≤ ε ≤ : (a) L ((1 − ε ) u + εv ) ≤ (1 − ε ) L ( u ) + εL ( v ) ; (b) L ((1 − ε ) u + εv ) ≥ (1 − ε ) L ( u ) + εL ( v ) + uL (1 − ε ) + vL ( ε ) ; (c) L ((1 − ε ) u + εv ) ≥ (1 − ε ) L ( u ) − e u − e . The first assertion is just Jensen’s inequality applied to L . By the convex-ity of L , for each y ≥
0, the function L ( x + y ) − L ( x ) is increasing in x ≥ L ( x + y ) − L ( x ) ≥ L ( y ), which is (b) for x = (1 − ε ) u and y = εv .Similarly, using L ≥ − e , we obtain (c). Proof of Lemma 2.2.
Assuming that p is (essentially) unbounded,define D nj = Z p nj ( x ) log p nj ( x ) ϕ ( x ) dx ( j = 1 , , so that e D n = D n, . By Lemma 2.3(a), D n ≤ (1 − ε n ) D n + ε n D n . On theother hand, by (b), D n ≥ ((1 − ε n ) D n + ε n D n ) + ε n log ε n + (1 − ε n ) log(1 − ε n ) . In view of (2.3), the two estimates give | D n − D n | < C ( n + D n + D n ) b n , (2.7)which holds for all n ≥ C . In addition, by the inequalityin (c) with ε = b , from (2.2) it follows that D ( X k Z ) = Z L (cid:18) p ( x ) ϕ ( x ) (cid:19) ϕ ( x ) dx ≥ (1 − b ) Z ρ ( x ) log ρ ( x ) ϕ ( x ) dx − e , (2.8)where Z denotes a standard normal random vector in R d . By the samereasoning, D ( X k Z ) ≥ b Z ρ ( x ) log ρ ( x ) ϕ ( x ) dx − e . (2.9)Now, by the convexity of the function L ( u ) = u log u , D n ≤ − ε n n X k = m +1 C kn (1 − b ) k b n − k Z r k,n ( x ) log r k,n ( x ) ϕ ( x ) dx,D n ≤ ε n m X k =0 C kn (1 − b ) k b n − k Z r k,n ( x ) log r k,n ( x ) ϕ ( x ) dx, NTROPIC CENTRAL LIMIT THEOREM where r k,n are densities of the normalized sums R k,n from the proof of Lem-ma 2.1. Here each integral may also be written as Z r k,n ( x ) log r k,n ( x ) ϕ ( x ) dx = Z L ( r k,n ( x )) dx + d π ) + 12 E | R k,n | . (2.10)We have E | R k,n | ≤ β b n , as noticed in the proof of Lemma 2.1. In addition,by the convexity of L , there is a general inequality Z L (( f ∗ g )( x )) dx ≤ Z L ( f ( x )) dx valid for the convolution of any two probability densities f and g on R d (ifthe integrals exist). In particular, Z L ( r k,n ( x )) dx ≤ d n + max (cid:26)Z L ( ρ ( x )) dx, Z L ( ρ ( x )) dx (cid:27) , which may actually be sharpened in case 1 < k < n by replacing max withmin. By (2.8) and (2.9), the integrals on the right-hand side are finite, thusthe integrals on the left-hand side of (2.10) are bounded by Cn with someconstant C . Hence, a similar bound also holds for D nj , and it remains toapply (2.7). Lemma 2.2 is proved. (cid:3) Remark 2.4. If X has a finite second moment and D ( X ) < + ∞ , thetruncation level M in (2.1) can be chosen explicitly in terms of b using theentropic distance D ( X ) and σ = det(Σ), where Σ is the covariance matrixof X .Indeed, putting a = E X and using an elementary inequality t log(1 + t ) ≤ t log t + 1 ( t ≥ Z p log (cid:18) pϕ a, Σ (cid:19) dx = Z pϕ a, Σ log (cid:18) pϕ a, Σ (cid:19) ϕ a, Σ dx ≤ Z p log pϕ a, Σ dx + 1 = D ( X ) + 1 . On the other hand, the original expression majorizes Z { p ( x ) >M } p ( x ) log Mϕ a, Σ ( x ) dx ≥ b log( M σ (2 π ) d/ ) , hence M ≤ σ (2 π ) d/ e ( D ( X )+1) /b . Remark 2.5. If Z n have absolutely continuous distributions with finiteentropies for n ≥ n >
1, the above construction should be properly modified.
Namely, one may put e p n = p n , if p n are bounded, and otherwise applythe same decomposition (2.2) to p n in place of p . As a result, for any n = An + B ( A ≥
1, 0 ≤ B ≤ n − S n will have thedensity r n ( x ) = A X k =0 C kA (1 − b ) k b A − k Z ( ρ ∗ k ∗ ρ ∗ ( A − k )2 )( x − y ) dF B ( y ) , where F B is the distribution of S B . For A ≥ m + 1, split the above suminto the two parts with summation over m + 1 ≤ k ≤ A and 0 ≤ k ≤ m ,respectively, so that r n = ρ n + ρ n . Then, like in (2.4) and for the samesequence ε n described in (2.3), define e p n ( x ) = 11 − ε n n d/ ρ n ( x √ n ) . Clearly, these densities are bounded and approximate p n ( x ) in total varia-tion. In particular, for all sufficiently large n , they satisfy the estimates thatare similar to the estimates in Lemmas 2.1 and 2.2.
3. Edgeworth-type expansions.
Let $(X_n)_{n\ge1}$ be independent, identically distributed random variables with mean $E X = 0$ and variance $\mathrm{Var}(X) = 1$. In this section we collect some auxiliary results about Edgeworth-type expansions both for the distribution functions $F_n(x) = P\{Z_n \le x\}$ and the densities $p_n(x)$ of the normalized sums $Z_n = S_n/\sqrt{n}$, where $S_n = X_1 + \cdots + X_n$. If the absolute moment $E|X|^s$ is finite for a given $s \ge 2$, put $m = [s]$ and define
\[
\varphi_m(x) = \varphi(x) + \sum_{k=1}^{m-2} q_k(x)\, n^{-k/2} \tag{3.1}
\]
with the functions $q_k$ described in (1.2). Introduce as well
\[
\Phi_m(x) = \int_{-\infty}^{x} \varphi_m(y)\,dy = \Phi(x) + \sum_{k=1}^{m-2} Q_k(x)\, n^{-k/2}. \tag{3.2}
\]
Similar to (1.2), the functions $Q_k$ have an explicit description involving the cumulants $\gamma_3, \ldots, \gamma_{k+2}$ of $X$. Namely,
\[
Q_k(x) = -\varphi(x) \sum H_{k+2j-1}(x)\, \frac{1}{r_1!\cdots r_k!} \Big(\frac{\gamma_3}{3!}\Big)^{r_1} \cdots \Big(\frac{\gamma_{k+2}}{(k+2)!}\Big)^{r_k},
\]
where the summation is carried out over all nonnegative integer solutions $(r_1, \ldots, r_k)$ to the equation $r_1 + 2r_2 + \cdots + k r_k = k$ with $j = r_1 + \cdots + r_k$; cf., for example, [4] or [21] for details.

Theorem 3.1.
Assume that $\limsup_{|t|\to+\infty} |E\, e^{itX}| < 1$. If $E|X|^s < +\infty$ ($s \ge 2$), then as $n \to \infty$, uniformly for all $x$,
\[
(1 + |x|^s)(F_n(x) - \Phi_m(x)) = o\big(n^{-(s-2)/2}\big). \tag{3.3}
\]

For $2 \le s < 3$ we have $m = 2$, there are no expansion terms in the sum (3.2), and hence $\Phi_2(x) = \Phi(x)$ is the distribution function of the standard normal law. In this case, (3.3) becomes
\[
(1 + |x|^s)(F_n(x) - \Phi(x)) = o\big(n^{-(s-2)/2}\big). \tag{3.4}
\]
In fact, in this case Cramér's condition on the characteristic function of $X$ is not used. The result was obtained by Osipov and Petrov [19]; cf. also [5], where (3.4) is established with $O$ in place of $o$.

In the case $s \ge 3$, when $s = m$ is integer, relation (3.3) without the factor $1 + |x|^m$ represents the classical Edgeworth expansion. It is essentially due to Cramér and is described in many papers and textbooks; cf. [9, 10]. However, the case of fractional values of $s$ is more delicate, especially in the following local limit theorem.

Theorem 3.2.
Let E | X | s < + ∞ ( s ≥ . Suppose Z n has a boundeddensity for some n . Then for all sufficiently large n , the random variables Z n have continuous bounded densities p n satisfying, as n → ∞ , (1 + | x | m )( p n ( x ) − ϕ m ( x )) = o ( n − ( s − / )(3.5) uniformly for all x . Moreover, (1 + | x | s )( p n ( x ) − ϕ m ( x ))(3.6) = o ( n − ( s − / ) + (1 + | x | s − m )( O ( n − ( m − / ) + o ( n − ( s − )) . If s = m is integer and m ≥
3, Theorem 3.2 is well known; then (3.5) and (3.6) simplify to
\[
(1 + |x|^m)(p_n(x) - \varphi_m(x)) = o\big(n^{-(m-2)/2}\big). \tag{3.7}
\]
In this formulation the result is due to Petrov [20]; cf. [21], page 211, or [4], page 192. Without the term $1 + |x|^m$, relation (3.7) goes back to the results of Cramér and Gnedenko (cf. [11]).

In the general (fractional) case, Theorem 3.2 has recently been obtained in [6, 7] by using the technique of Liouville fractional integrals and derivatives. Assertion (3.6) gives an improvement over (3.5) on relatively large intervals of the real axis, and this is essential in the case of noninteger $s$.

An obvious weak point in Theorem 3.2 is that it requires the boundedness of the densities $p_n$, which is, however, necessary for conclusions such as (3.5) or (3.7). Nevertheless, this condition may be removed, if we replace $p_n$ by slightly modified densities $\tilde p_n$.
Theorem 3.3.
Let E | X | s < + ∞ ( s ≥ . Suppose that, for all for allsufficiently large n , Z n have absolutely continuous distributions with densi-ties p n . Then there exist some bounded continuous densities e p n such that: (a) the relations (3.5) and (3.6) hold true for e p n instead of p n ; (b) R + ∞−∞ (1 + | x | s ) | e p n ( x ) − p n ( x ) | dx < − n , for all sufficiently large n ; (c) e p n ( x ) = p n ( x ) almost everywhere, if p n is bounded ( a.e. ) . Here, property (c) is added to include Theorem 3.2 in Theorem 3.3 as aparticular case. Moreover, one can use the densities e p n constructed in theprevious section with m = [ s ] + 1. We refer to [6, 7] for detailed proofs.This extended result allows us to immediately recover, for example, thecentral limit theorem with respect to the total variation distance (withoutthe assumption of boundedness of p n ). Namely, we have k F n − Φ m k TV = Z + ∞−∞ | p n ( x ) − ϕ m ( x ) | dx = o ( n − ( s − / ) . (3.8)For s = 2 and ϕ ( x ) = ϕ ( x ), this statement corresponds to a theorem ofProkhorov [22], while for s = 3 and ϕ ( x ) = ϕ ( x )(1 + γ x − x √ n )—to the resultof Sirazhdinov and Mamatov [23]. The multidimensional case.
Similar results are also available in the mul-tidimensional case for integer values s = m . In the remaining part of this sec-tion, let ( X n ) n ≥ denote independent identically distributed random vectorsin the Euclidean space R d with mean zero and identity covariance matrix.Assuming E | X | m < + ∞ for some integer m ≥ | · | denotesthe Euclidean norm), introduce the cumulants γ ν of X and the associatedcumulant polynomials γ k ( it ) up to order m by using the equality1 k ! d k du k log E e iu h t,X i (cid:12)(cid:12)(cid:12)(cid:12) u =0 = 1 k ! γ k ( it ) = X | ν | = k γ ν ( it ) ν ν ! ( k = 1 , . . . , m, t ∈ R d ) . Here the summation runs over all d -tuples ν = ( ν , . . . , ν d ) with integer com-ponents ν j ≥ | ν | = ν + · · · + ν d = k . We also write ν ! = ν ! · · · ν d !and use a standard notation for the generalized powers z ν = z ν · · · z ν d d ofreal or complex vectors z = ( z , . . . , z d ), which are treated as polynomials in z of degree | ν | .For 1 ≤ k ≤ m −
2, define the polynomials P k ( it ) = X r +2 r + ··· + kr k = k r ! · · · r k ! (cid:18) γ ( it )3! (cid:19) r · · · (cid:18) γ k +2 ( it )( k + 2)! (cid:19) r k , (3.9)where the summation is performed over all nonnegative integer solutions( r , . . . , r k ) to the equation r + 2 r + · · · + kr k = k . NTROPIC CENTRAL LIMIT THEOREM Furthermore, like in dimension one, define the approximating functions ϕ m ( x ) on R d by virtue of the equality (3.1), where every q k is determinedby its Fourier transform Z e i h t,x i q k ( x ) dx = P k ( it ) e −| t | / . (3.10)If Z n has a bounded density for some n , then for all sufficiently large n , Z n have continuous bounded densities p n satisfying (3.7); see [4], Theo-rem 19.2. We need an extension of this theorem to the case of unboundeddensities, as well as integral variants such as (3.8). The first assertion (3.11)in the next theorem is similar to the one-dimensional Theorem 3.3 in thecase where s = m is integer; cf. (3.5). For the proof (which we omit), one mayapply Lemma 2.1 and follow the standard arguments from [4], Chapter 4. Theorem 3.4.
Suppose that E | X | m < + ∞ with some integer m ≥ .If, for all sufficiently large n , Z n have densities p n , then the densities e p n introduced in Section 2 with m = m + 1 satisfy (1 + | x | m )( e p n ( x ) − ϕ m ( x )) = o ( n − ( m − / )(3.11) uniformly for all x . In addition, Z (1 + | x | m ) | e p n ( x ) − ϕ m ( x ) | dx = o ( n − ( m − / ) . (3.12)The second assertion is Theorem 19.5 in [4], where it is stated for m ≥ X has a nonzero absolutely contin-uous component. Note that, by Lemma 2.1, it does not matter whether e p n or p n are used in (3.12).
4. Entropic distance to normality and moderate deviations.
Let X ,X , . . . be independent, identically distributed random vectors in R d withmean zero, identity covariance matrix and such that D ( Z n ) < + ∞ , for all n large enough.According to Lemma 2.2 and Remark 2.5, up to an error at most 2 − n forsufficiently large n , the entropic distance to normality, D n = D ( Z n ), is equalto the relative entropy e D n = Z e p n ( x ) log e p n ( x ) ϕ ( x ) dx, where ϕ is the density of a standard normal random vector Z in R d .Given T ≥
1, split the integral into two parts by writing e D n = Z | x |≤ T e p n ( x ) log e p n ( x ) ϕ ( x ) dx + Z | x | >T e p n ( x ) log e p n ( x ) ϕ ( x ) dx. (4.1)By Theorems 3.3 and 3.4, e p n are uniformly bounded, that is, e p n ( x ) ≤ M ,for all x ∈ R d and n ≥ M . Hence, the second integral S. G. BOBKOV, G. P. CHISTYAKOV AND F. G ¨OTZE in (4.1) may be treated by virtue of moderate deviations results (when T isnot too large). Indeed, since T ≥ Z | x | >T e p n ( x ) log e p n ( x ) ϕ ( x ) dx ≤ Z | x | >T e p n ( x ) log Mϕ ( x ) dx ≤ C Z | x | >T | x | e p n ( x ) dx, where C = + log(1 + M (2 π ) d/ ). One the other hand, using u log u ≥ u − Z | x | >T e p n ( x ) log e p n ( x ) ϕ ( x ) dx ≥ Z | x | >T ( e p n ( x ) − ϕ ( x )) dx ≥ − P {| Z | > T } . The two estimates give (cid:12)(cid:12)(cid:12)(cid:12)Z | x | >T e p n ( x ) log e p n ( x ) ϕ ( x ) dx (cid:12)(cid:12)(cid:12)(cid:12) ≤ P {| Z | > T } + C Z | x | >T | x | e p n ( x ) dx. (4.2)This is a very general upper bound, valid for any probability density e p n on R d , bounded by a constant M (with C as above).Following (4.1), we are faced with two analytic problems. The first one isto give a sharp estimate of e p n ( x ) − ϕ ( x ) on a relatively large Euclidean ball | x | ≤ T . Clearly, T has to be small enough, so that results like local limittheorems, such as Theorems 3.2–3.4 may be applied. The second problemis to give a sharp upper bound of the last integral in (4.2). To this aim,we need moderate deviations inequalities, so that Theorems 3.1 and 3.4are applicable. Anyway, in order to use both types of results we are forcedto choose T from a very narrow window only. This value turns out to beapproximately T n = p ( s −
2) log n + s log log n + ρ n ( s > , (4.3)where ρ n → + ∞ is a sufficiently slowly growing sequence (whose growth willbe restricted by the decay of the n -dependent constants in o -expressions ofTheorems 3.2–3.4). In the case s = 2, one may put T n = √ ρ n such that T n → + ∞ is a sufficiently slowly growing sequence. Lemma 4.1 (The case d = 1 and s real). If E X = 0 , E X = 1 , E | X | s < + ∞ ( s ≥ , then Z | x | >T n x e p n ( x ) dx = o (( n log n ) − ( s − / ) . (4.4) Lemma 4.2 (The case d ≥ s integer). If X has mean zero andidentity covariance matrix, and E | X | m < + ∞ , then Z | x | >T n x e p n ( x ) dx = o ( n − ( m − / (log n ) − ( m − d ) / ) ( m ≥ and R | x | >T n x e p n ( x ) dx = o (1) in the case m = 2 . NTROPIC CENTRAL LIMIT THEOREM Note that plenty of results and techniques concerning moderate deviationshave been developed by now. Useful estimates can be found, for example,in [12]. Restricting ourselves to integer values of s = m , one may argue asfollows. Proof of Lemma 4.2.
Given T ≥
1, write Z | x | >T | x | e p n ( x ) dx ≤ T m − Z | x | m e p n ( x ) dx ≤ T m − Z | x | m | e p n ( x ) − ϕ m ( x ) | dx (4.6) + 1 T m − Z | x | >T | x | m ϕ m ( x ) dx. By Theorem 3.4 [cf. (3.12)] the first integral in (4.6) is bounded by o ( n − ( m − / ).From the definition of q k it follows that q k ( x ) = N ( x ) ϕ ( x ) with somepolynomial N of degree at most 3( m − ϕ m ( x ) ≤ ϕ ( x ) on the balls of large radii | x | < n δ with sufficientlylarge n (where 0 < δ < ). On the other hand, with some constants C d , C ′ d depending on the dimension only, Z | x | >T | x | m ϕ ( x ) dx = C d Z + ∞ T r m + d − e − r / dr ≤ C ′ d T m + d − e − T / . (4.7)But for T = T n and s = m ≥
3, we have e − T / = T − m o ( n − ( m − / ), so by(4.6) and (4.7), Z | x | >T n | x | e p n ( x ) dx ≤ C (cid:18) T m − + 1 T m − d (cid:19) o ( n − ( m − / ) . Since T n is of order √ log n , (4.5) follows. Furthermore, in the case m = 2,(4.6) gives the desired relation Z | x | >T n | x | e p n ( x ) dx ≤ o (1) + Z | x | >T n | x | ϕ ( x ) dx → n → ∞ ) . (cid:3) Proof of Lemma 4.1.
The above argument also works for d = 1, butit can be refined applying Theorem 3.1 for real s . The case s = 2 is alreadycovered, so let s > T ≥ − ε n ) Z | x | >T x e p n ( x ) dx (4.8) S. G. BOBKOV, G. P. CHISTYAKOV AND F. G ¨OTZE ≤ Z | x | >T x p n ( x ) dx = Z | x | >T x dF n ( x )= T (1 − F n ( T ) + F n ( − T )) + 2 Z + ∞ T x (1 − F n ( x ) + F n ( − x )) dx, (4.9)where F n denotes the distribution function of Z n . [Note that the first in-equality in (4.8) should be just ignored in the case, where p is bounded.]By (3.3), F n ( x ) = Φ m ( x ) + r n ( x ) n ( s − /
11 + | x | s , r n = sup x | r n ( x ) | → n → ∞ ) . Hence, the first term in (4.9) can be replaced with T (1 − Φ m ( T ) + Φ m ( − T ))(4.10)at the expense of an error not exceeding (for the values T ∼ √ log n )2 r n n ( s − / T T s = o (( n log n ) − ( s − / ) . (4.11)Similarly, the integral in (4.9) can be replaced with Z + ∞ T x (1 − Φ m ( x ) + Φ m ( − x )) dx (4.12)at the expense of an error not exceeding2 r n n ( s − / Z + ∞ T x dx x s = o (( n log n ) − ( s − / ) . (4.13)To explore the behavior of expressions (4.10) and (4.12) for T = T n usingprecise asymptotics as in (4.3), recall that, by (3.2),1 − Φ m ( x ) = 1 − Φ( x ) − m − X k =1 Q k ( x ) n − k/ . Moreover, we note that Q k ( x ) = N k − ( x ) ϕ ( x ), where N k − is a poly-nomial of degree at most 3 k −
1. Thus these functions admit a bound | Q k ( x ) | ≤ C m (1 + | x | m ) ϕ ( x ) with some constants C m (depending on m andthe cumulants γ , . . . , γ m of X ), which implies with some other constants | − Φ m ( x ) | ≤ (1 − Φ( x )) + C m (1 + | x | m ) √ n ϕ ( x ) . (4.14)Hence, using 1 − Φ( x ) < ϕ ( x ) x ( x > T n | − Φ m ( T n ) | ≤ CT n (1 − Φ( T n )) ≤ CT n e − T n / (4.15) = o (( n log n ) − ( s − / ) . A similar bound also holds for T n | Φ m ( − T n ) | . NTROPIC CENTRAL LIMIT THEOREM Now, we use (4.14) to estimate (4.12) with T = T n up to a constant by Z ∞ T x (1 − Φ( x )) dx < − Φ( T ) = o (( n log n ) − ( s − / ) . It remains to combine the last relation with (4.11), (4.13) and (4.15).Since ε n → (cid:3) Remark 4.3.
Note that the probabilities P {| Z | > T } appearing in (4.2)yield a smaller contribution for T = T n in comparison with the right-handsides of (4.4) and (4.5). Indeed, we have P {| Z | > T } ≤ C d T d − e − T / ( T ≥ Z | x | >T n e p n ( x ) log e p n ( x ) ϕ ( x ) dx.
5. Taylor-type expansion for the entropic distance.
In this section weprovide the last auxiliary step toward the proof of Theorem 1.1. In orderto describe the multidimensional case, let X , X , . . . be independent identi-cally distributed random vectors in R d with mean zero, identity covariancematrix, and such that D ( Z n ) < + ∞ for some n .If p n is bounded, then the densities p n of Z n ( n ≥ n ) are uniformlybounded, and we put e p n = p n . Otherwise, we use the modified densities e p n according to the construction of Section 2. In particular, if e Z n has density e p n ,then | D ( e Z n k Z ) − D ( Z n ) | < − n for all n large enough (where Z is a stan-dard normal random vector; cf. Lemma 2.2 and Remark 2.5). Moreover, byLemmas 4.1, 4.2 and Remark 4.3, (cid:12)(cid:12)(cid:12)(cid:12) D ( Z n ) − Z | x |≤ T n e p n ( x ) log e p n ( x ) ϕ ( x ) dx (cid:12)(cid:12)(cid:12)(cid:12) = o (∆ n ) , (5.1)where T n are defined in (4.3) and∆ n = n − ( s − / (log n ) − ( s − max( d, / (5.2)(with the convention that ∆ n = 1 for the critical case s = 2).Thus, all information about the asymptotics of D ( Z n ) is contained in theintegral in (5.1). More precisely, writing a Taylor expansion for e p n using theapproximating functions ϕ m in Theorems 3.2–3.4 leads to the following rep-resentation (which is more convenient in applications such as Corollary 1.2). Theorem 5.1.
Let E | X | s < + ∞ ( s ≥ , assuming that s is integer incase d ≥ . Then D ( Z n ) = m − X k =2 ( − k k ( k − Z ( ϕ m ( x ) − ϕ ( x )) k dxϕ ( x ) k − (5.3) + o (∆ n ) ( m = [ s ]) . S. G. BOBKOV, G. P. CHISTYAKOV AND F. G ¨OTZE
Note that in the case 2 ≤ s < D ( Z n ) = o (∆ n ). Proof of Theorem 5.1.
In terms of L ( u ) = u log u , rewrite the inte-gral in (5.1) as e D n, = Z | x |≤ T n L (cid:18) e p n ( x ) ϕ ( x ) (cid:19) ϕ ( x ) dx (5.4) = Z | x |≤ T n L (1 + u m ( x ) + v n ( x )) ϕ ( x ) dx, where u m ( x ) = ϕ m ( x ) − ϕ ( x ) ϕ ( x ) , v n ( x ) = e p n ( x ) − ϕ m ( x ) ϕ ( x ) . By Theorems 3.3 and 3.4, more precisely, by (3.6) for d = 1, and by (3.11)for d ≥ s = m integer, in the region | x | = O ( n δ ) with an appropriate δ >
0, we have | e p n ( x ) − ϕ m ( x ) | ≤ r n n ( s − /
11 + | x | s , r n → . (5.5)Since ϕ ( x )(1 + | x | s ) is decreasing as a function of | x | for large | x | , we obtain,for all | x | ≤ T n , | v n ( x ) | ≤ C r n n ( s − / e T n / T sn ≤ C ′ r n e ρ n / . The last expression tends to zero by a suitable choice of ρ n → ∞ which wewill assume from now on. In particular, for n large enough, | v n ( x ) | < in | x | ≤ T n .From the definitions of q k and ϕ m [cf. (1.2), (3.1) and (3.10)], it followsthat | u m ( x ) | ≤ C m | x | m − √ n (5.6)with some constants depending on m and the cumulants, only. Thus, we alsohave | u m ( x ) | < for | x | ≤ T n with sufficiently large n .Now, by Taylor’s formula, for | u | ≤ , | v | ≤ , L (1 + u + v ) = L (1 + u ) + v + 2 θ uv + θ v with some | θ j | ≤ u, v ). Applying this approximation with u = u m ( x ) and v = v n ( x ), we see that v n ( x ) can be removed from the right-hand side of (5.4) at the expense of an error not exceeding | J | + J + J ,where J = Z | x |≤ T n ( e p n ( x ) − ϕ m ( x )) dx, J = Z | x |≤ T n | u m ( x ) || e p n ( x ) − ϕ m ( x ) | dx NTROPIC CENTRAL LIMIT THEOREM and J = Z | x |≤ T n ( e p n ( x ) − ϕ m ( x )) ϕ ( x ) dx. But | J | = (cid:12)(cid:12)(cid:12)(cid:12)Z | x | >T n ( e p n ( x ) − ϕ m ( x )) dx (cid:12)(cid:12)(cid:12)(cid:12) (5.7) ≤ Z | x | >T n e p n ( x ) dx + Z | x | >T n | ϕ m ( x ) | dx. By Lemmas 4.1 and 4.2, the first integral on the right-hand side is T n -timessmaller than o (∆ n ). Also, by (5.6), the last integral in (5.7) is bounded by Z | x | >T n | ϕ m ( x ) − ϕ ( x ) | dx + Z | x | >T n ϕ ( x ) dx ≤ C m √ n Z | x | >T n (1 + | x | m − ) ϕ ( x ) dx + P {| Z | > T n } = o (∆ n ) . As a result, J = o (∆ n ).Applying (5.6) once more and then relation (3.12), we may also concludethat J ≤ C m T m − n √ n Z | x |≤ T n | e p n ( x ) − ϕ m ( x ) | dx = o (∆ n ) . Finally, using (5.5) with s >
2, we get, up to some constants, J ≤ C r n n s − Z | x |≤ T n e | x | / | x | s dx ≤ C d r n n s − Z T n r d − s − e r / dr ≤ C ′ d r n n s − T s − d +2 n e T n / = o (cid:18) n ( s − / (log n ) ( s − d +2) / (cid:19) = o (∆ n ) . If s = 2, all these steps are valid as well and give J ≤ C ′ d r n n s − T s − d +2 n e T n / → T n → + ∞ .Thus, at the expense of an error not exceeding o (∆ n ) one may remove v n ( x ) from (5.4), and we obtain the relation e D n, = Z | x |≤ T n L (1 + u m ( x )) ϕ ( x ) dx + o (∆ n ) , (5.8)which contains specified expansion terms, only. S. G. BOBKOV, G. P. CHISTYAKOV AND F. G ¨OTZE
Moreover, u m ( x ) = u ( x ) = 0 for 2 ≤ s <
3, and then the theorem is proved.Next, we consider the case s ≥
3. By Taylor’s expansion around zero, weget, whenever | u | < , for some positive constants θ m , L (1 + u ) = u + m − X k =2 ( − k k ( k − u k + θu m − , | θ | ≤ θ m , assuming that the sum has no terms in the case m = 3. Hence, with some | θ | ≤ θ m , Z | x |≤ T n L (1 + u m ( x )) ϕ ( x ) dx (5.9) = Z | x |≤ T n ( ϕ m ( x ) − ϕ ( x )) dx + m − X k =2 ( − k k ( k − Z | x |≤ T n u m ( x ) k ϕ ( x ) dx (5.10) + θ Z R d | u m ( x ) | m − ϕ ( x ) dx. For n large enough, by (5.6), the second integral in (5.9) has an absolutevalue (cid:12)(cid:12)(cid:12)(cid:12)Z | x | >T n ( ϕ m ( x ) − ϕ ( x )) dx (cid:12)(cid:12)(cid:12)(cid:12) ≤ C √ n Z | x | >T n (1 + | x | m − ) ϕ ( x ) dx = o (∆ n ) . This proves the theorem in the case 3 ≤ s < m = 3).Now, let s ≥
4. The last integral in (5.10) can be estimated again by virtueof (5.6) by Cn ( m − / Z R d (1 + | x | m − m − ) ϕ ( x ) dx = o (∆ n ) . In addition, the first integral in (5.10) can be extended to the whole spaceat the expense of an error not exceeding (for all n large enough) Z | x | >T n | u m ( x ) | k ϕ ( x ) dx ≤ Cn k/ Z | x | >T n (1 + | x | k ( m − ) ϕ ( x ) dx ≤ C ′ T k ( m − n √ n e − T n / = o (∆ n ) . Collecting these estimates in (5.9) and (5.10) and applying them in (5.8),we arrive at e D n, = m − X k =2 ( − k k ( k − Z u m ( x ) k ϕ ( x ) dx + o (∆ n ) . It remains to apply (5.1). Thus, Theorem 5.1 is proved. (cid:3)
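Before passing to the multidimensional statements, let us illustrate how (5.3) reproduces the expansion of Theorem 1.1 in the simplest nontrivial case (this computation is added only for orientation and uses nothing beyond the definitions (1.2) and (3.1)). Let $d = 1$ and $s = m = 4$, so that $\varphi_4 - \varphi = q_1 n^{-1/2} + q_2 n^{-1}$ with $q_1 = \frac{\gamma_3}{6} H_3 \varphi$. Then (5.3) contains only the quadratic term, and by the orthogonality of the Chebyshev–Hermite polynomials the cross term $\int q_1 q_2/\varphi\,dx$ vanishes, so that
\[
D(Z_n) = \frac{1}{2}\int \frac{(\varphi_4(x) - \varphi(x))^2}{\varphi(x)}\,dx + o(\Delta_n)
        = \frac{1}{2n}\int \frac{q_1(x)^2}{\varphi(x)}\,dx + O\Big(\frac{1}{n^2}\Big) + o(\Delta_n)
        = \frac{\gamma_3^2}{12\,n} + O\Big(\frac{1}{n^2}\Big) + o(\Delta_n),
\]
in accordance with (1.5).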
6. Theorem 1.1 and its multidimensional extension.
The desired repre-sentation (1.3) of Theorem 1.1 can be deduced from Theorem 5.1. Note thatthe latter covers the multidimensional case as well, although under some-what stronger moment assumptions.Thus, let ( X n ) n ≥ be independent identically distributed random vectorsin R d with finite second moment. If the normalized sum Z n = ( X + · · · + X n ) / √ n has density p n ( x ), the entropic distance to Gaussianity is definedas in dimension one to be the relative entropy D ( Z n ) = Z p n ( x ) log p ( x ) ϕ a, Σ ( x ) dx with respect to the normal law on R d with the same mean a = E X andcovariance matrix Σ = Var( X ). This quantity is affine invariant, and in thissense it does not depend on ( a, Σ).
Theorem 6.1. If $D(Z_n) < +\infty$ for some $n$, then $D(Z_n) \to 0$ as $n \to \infty$. Moreover, given that $E|X|^s < +\infty$ ($s \ge 2$), and that $X$ has mean zero and identity covariance matrix, we have
\[
D(Z_n) = \frac{c_1}{n} + \frac{c_2}{n^2} + \cdots + \frac{c_{[(m-2)/2]}}{n^{[(m-2)/2]}} + o(\Delta_n) \qquad (m = [s]), \tag{6.1}
\]
where $\Delta_n$ is defined in (5.2), and where we assume that $s$ is integer in case $d \ge 2$.

Here, as in Theorem 1.1, each coefficient $c_j$ is defined according to (1.4) again. It may be represented as a certain polynomial in the cumulants $\gamma_\nu$, $3 \le |\nu| \le 2j+1$.

Proof of Theorem 6.1.
We shall start from the representation (5.3)of Theorem 5.1, so let us return to definition (3.1), ϕ m ( x ) − ϕ ( x ) = m − X r =1 q r ( x ) n − r/ . In the case 2 ≤ s < m = 2), the right-hand side contains no termsand is therefore vanishing. Anyhow, raising this sum to the power k ≥ ϕ m ( x ) − ϕ ( x )) k = X j n − j/ X q r ( x ) · · · q r k ( x ) , where the inner sum is carried out over all positive integers r , . . . , r k ≤ m − r + · · · + r k = j . Respectively, the k th integral in (5.3) is equal to X j n − j/ X Z q r ( x ) · · · q r k ( x ) dxϕ ( x ) k − . (6.2) S. G. BOBKOV, G. P. CHISTYAKOV AND F. G ¨OTZE
Here the integrals are vanishing for odd j . In dimension one, this fol-lows directly from definition (1.2) of q r and the following property of theChebyshev–Hermite polynomials [24] Z + ∞−∞ H r ( x ) · · · H r k ( x ) ϕ ( x ) dx = 0 ( r + · · · + r k is odd) . (6.3)As for the general case, let us look at the structure of the functions q r . Givena multi-index ν = ( ν , . . . , ν d ) with integers ν , . . . , ν d ≥
1, define H ν ( x , . . . ,x d ) = H ν ( x ) · · · H ν d ( x d ), so that Z e i h t,x i H ν ( x ) ϕ ( x ) dx = ( it ) ν e −| t | / , t ∈ R d . Hence, by definition (3.10), q r ( x ) = ϕ ( x ) X ν a ν H ν ( x ) , (6.4)where the coefficients a ν emerge from the expansion P r ( it ) = P ν a ν ( it ) ν .Using (3.9), write these polynomials as P r ( it ) = X l ! · · · l r ! (cid:18) X | ν | =3 γ ν ( it ) ν ν ! (cid:19) l · · · (cid:18) X | ν | = r +2 γ ν ( it ) ν ν ! (cid:19) l r , (6.5)where the outer summation is performed over all nonnegative integer solu-tions ( l , . . . , l r ) to the equation l + 2 l + · · · + rl r = r . Removing the bracketsof the inner sums, we obtain a linear combination of the power polynomials( it ) ν with exponents of order | ν | = 3 l + · · · + ( r + 2) l r = r + 2 b l , b l = l + · · · + l r . (6.6)In particular, r + 2 ≤ | ν | ≤ r , so that P r ( it ) is a polynomial of degree atmost 3 r , and thus ϕ m ( x ) = N ( x ) ϕ ( x ), where N ( x ) is a polynomial of degreeat most 3( m − q r ( x ) · · · q r k ( x ) ϕ ( x ) k − = ϕ ( x ) X a ν (1) · · · a ν ( k ) H ν (1) ( x ) · · · H ν ( k ) ( x ) , (6.7)where | ν (1) | + · · · + | ν ( k ) | = r + · · · + r k (mod 2). Hence, if r + · · · + r k is odd,the sum | ν (1) | + · · · + | ν ( k ) | = d X i =1 ( | ν (1) i | + · · · + | ν ( k ) i | )is odd as well. But then at least one of the inner sums, say with coordinate i ,must be odd as well. Hence in this case, the integral of (6.7) over x i willvanish by property (6.3).Thus, in expression (6.2), only even values of j should be taken intoaccount. NTROPIC CENTRAL LIMIT THEOREM Moreover, since the terms containing n − j/ with j > s − n in relation (6.1), we get from (5.3) and (6.2), D ( Z n ) = m − X k =2 ( − k k ( k − m − X even j =2 n − j/ X Z q r ( x ) · · · q r k ( x ) dxϕ ( x ) k − + o (∆ n ) . Replace now j with 2 j and rearrange the summation. Then D ( Z n ) = X j ≤ m − c j n j + o (∆ n )with c j = m − X k =2 ( − k k ( k − X Z q r ( x ) · · · q r k ( x ) dxϕ ( x ) k − . Here the inner summation is carried out over all positive integers r , . . . , r k ≤ m − r + · · · + r k = 2 j . This implies k ≤ j . Furthermore, 2 j ≤ m − j ≤ [ s − ]. As a result, we arrive at the required relation(6.1) with c j = j X k =2 ( − k k ( k − X r + ··· + r k =2 j Z q r ( x ) · · · q r k ( x ) dxϕ ( x ) k − . (6.8)Thus, Theorem 6.1 and therefore Theorem 1.1 are proved. (cid:3) Remark.
In order to show that c j is a polynomial in the cumulants γ ν , 3 ≤ | ν | ≤ j + 1, first note that r + · · · + r k = 2 j , r , . . . , r k ≥ j ≥ max i r i + ( k − i r i ≤ j −
1. Thus, the maximal index for thefunctions q r i in (6.8) does not exceed 2 j −
1. On the other hand, it followsfrom (6.4) and (6.5) that P r and q r are polynomials in the same set of thecumulants; more precisely, P r is a polynomial in γ ν with 3 ≤ | ν | ≤ r + 2. Proof of Corollary 1.2.
By Theorem 5.1 [cf. (5.3)], D ( Z n ) = m − X k =2 ( − k k ( k − Z ( ϕ m ( x ) − ϕ ( x )) k dxϕ ( x ) k − + o (∆ n ) . (6.9)Assume that m ≥ γ = · · · = γ k − = 0 for a given integer 3 ≤ k ≤ m .(This is no restriction, when k = 3.) Then, by (1.2), q = · · · = q k − = 0,while q k − ( x ) = γ k k ! H k ( x ) ϕ ( x ). Hence, according to definition (3.1), ϕ m ( x ) − ϕ ( x ) = γ k k ! H k ( x ) ϕ ( x ) 1 n ( k − / + m − X j = k − q j ( x ) n j/ , where the sum is empty in the case m = 3. Therefore, the sum in (1.3) willcontain powers of 1 /n starting from 1 /n k − , and the leading coefficient isdue to the quadratic term in (6.9) when k = 2. More precisely, if k − ≤ m − , S. G. BOBKOV, G. P. CHISTYAKOV AND F. G ¨OTZE we get that c = · · · = c k − = 0, and c k − = γ k k ! Z + ∞−∞ H k ( x ) ϕ ( x ) dx = γ k k ! . (6.10)Hence, if k ≤ m , (6.9) yields D ( Z n ) = γ k k ! 1 n k − + O ( n − ( k − ). Otherwise, the O -term should be replaced by o (( n log n ) − ( s − / ). Thus Corollary 1.2 isproved. (cid:3) By a similar argument, the conclusion may be extended to the mul-tidimensional case. Indeed, if γ ν = 0, for all 3 ≤ | ν | < k , then by (6.5), P = · · · = P k − = 0, while P k − ( it ) = X | ν | = k γ ν ( it ) ν ν ! . Correspondingly, in (6.4) we have q = · · · = q k − = 0 and q k − ( x ) = ϕ ( x ) × P | ν | = k γ ν ν ! H ν ( x ). Therefore, ϕ m ( x ) − ϕ ( x ) = ϕ ( x ) X | ν | = k γ ν ν ! H ν ( x ) 1 n ( k − / + m − X j = k − q j ( x ) n j/ . Applying this relation in (6.9), we arrive at (6.1) with c = · · · = c k − = 0and, by orthogonality of the polynomials H ν , c k − = 12 Z (cid:18) X | ν | = k γ ν ν ! H ν ( x ) (cid:19) ϕ ( x ) dx = 12 X | ν | = k γ ν ν ! . We may summarize our findings as follows.
Corollary 6.2.
Let ( X n ) n ≥ be i.i.d. random vectors in R d ( d ≥ with mean zero and identity covariance matrix. Suppose that E | X | m < + ∞ , for some integer m ≥ , and D ( Z n ) < + ∞ , for some n . Given k =3 , , . . . , m , if γ ν = 0 for all ≤ | ν | < k , we have D ( Z n ) = 12 n k − X | ν | = k γ ν ν ! + O (cid:18) n k − (cid:19) + o (cid:18) n ( m − / (log n ) ( m − d ) / (cid:19) . (6.11)The conclusion corresponds to Corollary 1.2, if we replace d with 2 in theremainder on the right-hand side.As in dimension one, when E X k < + ∞ , the o -term may be removedfrom this representation, while for k > m , the o -term dominates. Moreover,if m +22 < k ≤ m , we are left with this term, only, that is, D ( Z n ) = o (cid:18) n ( m − / (log n ) ( m − d ) / (cid:19) . NTROPIC CENTRAL LIMIT THEOREM When k = 3, there is no restriction on the cumulants in Corollary 6.2, and(6.11) becomes D ( Z n ) = 12 n X | ν | =3 γ ν ν ! + O (cid:18) n (cid:19) + o (cid:18) n ( m − / (log n ) ( m − d ) / (cid:19) . If E | X | < + ∞ , we get D ( Z n ) = O (1 /n ) for d ≤
4, and the weaker bound D ( Z n ) = o ((log n ) ( d − / /n ) for d ≥
5. However, if E | X | < + ∞ , we alwayshave D ( Z n ) = O (1 /n ) regardless of the dimension d .Technically, this slight difference between conclusions for different di-mensions is due to the dimension-dependent asymptotic R | x | >T | x | ϕ ( x ) dx ∼ C d T d e − T / . Remark.
In case of discrete distributions when X takes integer val-ues, asymptotics for D ( S n ) were studied by Vilenkin and D’yachkov [26],who used an Edgeworth-type expansion for probabilities P { S n = k } in thecorresponding local limit theorem.
7. Convolutions of mixtures of normal laws.
Is the asymptotic descrip-tion of D ( Z n ) in Theorem 1.1 still optimal, if no expansion terms of order n − j are present? This is exactly the case for 2 ≤ s < p ( x ) = Z + ∞ ϕ σ ( x ) dP ( σ ) ( x ∈ R ) , (7.1)where P is a (mixing) probability measure on the positive half-axis (0 , + ∞ ),and where ϕ σ ( x ) = 1 σ √ π e − x / (2 σ ) is the density of the normal law with mean zero and variance σ [as usual,we write ϕ ( x ) in the standard normal case with σ = 1].Equivalently, let p ( x ) denote the density of the random variable X = ρZ , where the factors Z ∼ N (0 ,
1) and ρ > P ) areindependent. Such distributions appear naturally, for example, as limit lawsof sums with randomized length; cf., for example, [8].For densities such as (7.1), we need a refinement of the local limit theoremfor convolutions, described in the expansions (3.5) and (3.6). More precisely,our aim is to find a representation with an essentially smaller remainder termcompared to o ( n − ( s − / ). S. G. BOBKOV, G. P. CHISTYAKOV AND F. G ¨OTZE
Thus, let X , X , . . . be independent random variables, having a commondensity p ( x ) as in (7.1), and let p n ( x ) denote the density of the normalizedsum Z n = ( X + · · · + X n ) / √ n . If X = ρZ , where Z ∼ N (0 ,
1) and ρ > E X = E ρ and more generally, E | X | s = β s E ρ s = β s Z + ∞ σ s dP ( σ ) , where β s denotes the s th absolute moment of Z .Note that p ( x ) is unimodal with mode at the origin, and p (0) = E ρ √ π .If ρ ≥ σ >
0, the density is bounded, and therefore the entropy h ( X ) isfinite. Proposition 7.1.
Assume that E ρ = 1 , E ρ s < + ∞ (2 < s ≤ . If P { ρ ≥ σ } = 1 with some constant σ > , then uniformly over all x , p n ( x ) = ϕ ( x ) + n Z + ∞ ( ϕ σ n ( x ) − ϕ ( x )) dP ( σ ) + O (cid:18) n s − (cid:19) , (7.2) where σ n = q σ − n . Of course, when E ρ s < + ∞ for s >
4, the proposition may be still applied,but with s = 4. In this case (7.2) has a remainder term of order O ( n ). Notethat necessarily σ ≤ E ρ = 1.The function p n may also be described as the density of Z n = q ρ + ··· + ρ n n Z ,where ρ k are independent copies of ρ (independent of Z as well). This rep-resention already indicates the closeness of p n and ϕ and suggests to appealto the law of large numbers. However, we shall choose a different approachbased on the characteristic functions of Z n .Obviously, the characteristic function of X is given by v ( t ) = E e itX = E e − ρ t / ( t ∈ R ) . Using Jensen’s inequality and the assumption ρ ≥ σ >
0, we get a two-sidedestimate e − t / ≤ v ( t ) ≤ e − σ t / . (7.3)In particular, the function ψ ( t ) = e t / v ( t ) − t real. Lemma 7.2. If E ρ = 1 , M s = E ρ s < + ∞ (2 ≤ s ≤ , then for all | t | ≤ , ≤ ψ ( t ) ≤ M s | t | s . Proof.
We may assume 0 < t ≤
1. Write ψ ( t ) = E ( e − ( ρ − t / − ρt >
1, hence ψ ( t ) ≤ E ( e − ( ρ − t / − { ρ ≤ /t } . NTROPIC CENTRAL LIMIT THEOREM Let x = − ( ρ − t . Clearly, | x | ≤ ρ ≤ /t . Using e x ≤ x + x ( | x | ≤
1) and E ρ = 1, we get ψ ( t ) ≤ − t E ( ρ − { ρ ≤ /t } + t E ( ρ − { ρ ≤ /t } (7.4) = t E ( ρ − { ρ> /t } + t E ( ρ − { ρ ≤ /t } . The last expectation is equal to E ρ { ρ ≤ /t } + 2 E ( ρ − { ρ> /t } − P { ρ ≤ /t }≤ E ρ { ρ ≤ /t } + 2 E ρ { ρ> /t } − ≤ E ρ { ρ ≤ /t } + E ρ { ρ> /t } . Together with (7.4), this gives ψ ( t ) ≤ t E ρ { ρ> /t } + t E ρ { ρ ≤ /t } . (7.5)Finally, E ρ { ρ> /t } ≤ E ρ s t s − { ρ> /t } ≤ M s t s − and E ρ { ρ ≤ /t } ≤ E ρ s t s − × { ρ ≤ /t } ≤ M s t s − . It remains to use these estimates in (7.5), and Lemma 7.2is proved. (cid:3) Proof of Proposition 7.1.
The characteristic functions v n ( t ) = v ( t √ n ) n of Z n are real-valued and admit, by (7.3), similar bounds e − t / ≤ v n ( t ) ≤ e − σ t / . (7.6)In particular, one may apply the inverse Fourier transform to represent thedensity of Z n as p n ( x ) = 12 π Z + ∞−∞ e − itx v n ( t ) dt = 12 π Z + ∞−∞ e − itx − t / (1 + ψ ( t/ √ n )) n dt. Letting T n = σ log n , we split the integral into the two regions, defined by I = Z | t |≤ T n e − itx v n ( t ) dt, I = Z | t | >T n e − itx v n ( t ) dt. By the upper bound in (7.6), | I | ≤ Z | t | >T n e − σ t / dt ≤ √ πσ e − σ T n / = √ πσ n . (7.7)In the interval | t | ≤ T n , by Lemma 7.2, ψ ( t √ n ) ≤ M s | t | s n s/ ≤ n , for all n ≥ n .But for 0 ≤ ε ≤ n , there is the simple estimate 0 ≤ (1 + ε ) n − − nε ≤ nε ) . S. G. BOBKOV, G. P. CHISTYAKOV AND F. G ¨OTZE
Hence, once more by Lemma 7.2,0 ≤ (1 + ψ ( t/ √ n )) n − − nψ ( t/ √ n ) ≤ nψ ( t/ √ n )) ≤ M s | t | s n s − ( n ≥ n ) . This gives (cid:12)(cid:12)(cid:12)(cid:12) I − Z | t |≤ T n e − itx − t / (1 + nψ ( t/ √ n )) dt (cid:12)(cid:12)(cid:12)(cid:12) ≤ M s n s − Z + ∞−∞ | t | s e − t / dt. (7.8)In addition, (cid:12)(cid:12)(cid:12)(cid:12)Z | t | >T n e − itx − t / (1 + nψ ( t/ √ n )) dt (cid:12)(cid:12)(cid:12)(cid:12) ≤ Z | t | >T n e − t / dt + n Z | t | >T n e − t / ψ ( t/ √ n ) dt. Here, the first integral on the right-hand side is of order O ( n − ). To estimatethe second one, recall that, by (7.3), ψ ( t ) = e t / v ( t ) − ≤ e (1 − σ ) t / . Hence, ψ ( t/ √ n ) ≤ e (1 − σ ) t / and Z | t | >T n e − t / ψ ( t/ √ n ) dt ≤ Z | t | >T n e − σ t / dt ≤ √ πσ n . Together with (7.7) and (7.8) these bounds imply that p n ( x ) = 12 π Z + ∞−∞ e − itx − t / (1 + nψ ( t/ √ n )) dt + O (cid:18) n s − (cid:19) uniformly over all x . It remains to note that12 π Z + ∞−∞ e − itx − t / ψ ( t/ √ n ) dt = 12 π Z + ∞−∞ e − itx − t / ( e t / n v ( t/ √ n ) − dt = Z + ∞ ( ϕ σ n ( x ) − ϕ ( x )) dP ( σ ) . Proposition 7.1 is proved. (cid:3)
Remark 7.3.
An inspection of (7.5) shows that, in the case 2 < s < ψ ( t ) = o ( | t | s ). Correspondingly,the O -relation in Proposition 7.1 can be replaced with an o -relation. Thisimprovement is convenient, but not crucial for the proof of Theorem 1.3.
8. Lower bounds. Proof of Theorem 1.3.
Let X , X , . . . be independentrandom variables with a common density of the form p ( x ) = Z + ∞ ϕ σ ( x ) dP ( σ ) , x ∈ R . NTROPIC CENTRAL LIMIT THEOREM Equivalently, let X = ρZ with independent random variables Z ∼ N (0 , ρ > P .A basic tool for proving Theorem 1.3 will be the following lower bound onthe entropic distance to Gaussianity for the partial sums S n = X + · · · + X n . Proposition 8.1.
Let E ρ = 1 , E ρ s < + ∞ (2 < s < and P { ρ ≥ σ } = 1 with σ > . Assume that, for some γ > , lim inf n →∞ n s − / Z + ∞ n / γ σ dP ( σ ) > . (8.1) Then with some absolute constant c > and some constant δ > , D ( S n ) ≥ cn log n P { ρ ≥ p n log n } + o (cid:18) n ( s − / δ (cid:19) . (8.2)In fact, in (8.2) one may take any positive number δ < min { γs, s − } . Proof of Proposition 8.1.
By Proposition 7.1 and Remark 7.3, uni-formly over all x , p n ( x ) = ϕ ( x ) + n Z + ∞ ( ϕ σ n ( x ) − ϕ ( x )) dP ( σ ) + o (cid:18) n s − (cid:19) , (8.3)where p n is the density of S n / √ n and σ n = q σ − n .Define the sequence N n = n / γ √ log n for n large enough (so that N n ≥ P { ρ ≥ N n } ≤ s M s log nn (1 / γ ) s = o (cid:18) n s/ δ (cid:19) , < δ < γs. (8.4)Using u log u ≥ u − u ≥
0) and applying (8.3), we may write I n ≡ Z | x |≤ √ log n p n ( x ) log p n ( x ) ϕ ( x ) dx ≥ Z | x |≤ √ log n ( p n ( x ) − ϕ ( x )) dx (8.5) ≥ n Z + ∞ Z | x |≤ √ log n ( ϕ σ n ( x ) − ϕ ( x )) dx dP ( σ ) − C √ log nn s − with some constant C . S. G. BOBKOV, G. P. CHISTYAKOV AND F. G ¨OTZE
Note that σ_n < 1 whenever σ < 1, and thus, for any T > 0,

∫_{|x|≤T} (ϕ_{σ_n}(x) − ϕ(x)) dx = 2(Φ(T/σ_n) − Φ(T)) > 0,

where Φ denotes the distribution function of the standard normal law. Hence, the outer integral in (8.5) may be restricted to the range σ ≥ 1. In addition, the contribution of the values σ ≥ N_n is negligible. More precisely, (8.4) gives

n | ∫_{N_n}^{+∞} ∫_{|x|≤4√log n} (ϕ_{σ_n}(x) − ϕ(x)) dx dP(σ) | ≤ 2n P{ρ ≥ N_n} = o(n^{−(s−2)/2−δ}).

Comparing this relation with (8.5) and imposing the additional requirement δ < (s−2)/2, we get

I_n ≥ n ∫_1^{N_n} ∫_{|x|≤4√log n} (ϕ_{σ_n}(x) − ϕ(x)) dx dP(σ) + o(n^{−(s−2)/2−δ})   (8.6)
    = −2n ∫_1^{N_n} ∫_{4√(log n)/σ_n}^{4√log n} ϕ(x) dx dP(σ) + o(n^{−(s−2)/2−δ}).

Now, let us estimate p_n(x) from below in the region 4√log n ≤ |x| ≤ n^γ. If |x| ≥ 4√log n, it follows from (8.3) that

p_n(x) = n ∫_0^{+∞} ϕ_{σ_n}(x) dP(σ) + o(n^{−(s−2)}).   (8.7)

Consider the function

g_n(x) = ∫_0^{+∞} (ϕ_{σ_n}(x)/ϕ(x)) dP(σ).

Note that 1 ≤ σ_n ≤ σ for σ ≥ 1. In this case, the ratio ϕ_{σ_n}(x)/ϕ(x) is nondecreasing in x ≥ 0. Moreover, for σ ≥ √(3n+1), we have σ_n² = 1 + (σ²−1)/n ≥ 4, so 1 − 1/σ_n² ≥ 3/4. Hence, for |x| ≥ 4√log n,

ϕ_{σ_n}(x)/ϕ(x) = (1/σ_n) e^{x²(1 − 1/σ_n²)/2} ≥ n^6/σ.

Therefore,

g_n(x) ≥ n^6 ∫_{√(3n+1)}^{+∞} (1/σ) dP(σ).

But by assumption (8.1), the last expression tends to infinity with n, so for all n large enough, g_n(x) ≥ 1 for all |x| ≥ 4√log n.

Furthermore, if σ ≥ |x|√n, then σ_n² = 1 + (σ²−1)/n ≥ x², so x²/σ_n² ≤ 1. On the other hand, nσ_n² = n + σ² − 1 ≤ σ²/x² + σ² ≤ 2σ², since |x| ≥ 4√log n > 1 (n ≥ 2). The two estimates give

ϕ_{σ_n}(x) = (1/(σ_n√(2π))) e^{−x²/(2σ_n²)} ≥ √n/(6σ).

Therefore, whenever 4√log n ≤ |x| ≤ n^γ,

n ∫_0^{+∞} ϕ_{σ_n}(x) dP(σ) ≥ (n^{3/2}/6) ∫_{|x|√n}^{+∞} (1/σ) dP(σ) ≥ (n^{3/2}/6) ∫_{n^{1/2+γ}}^{+∞} (1/σ) dP(σ).

By assumption (8.1), the last expression, and therefore the left integral, is larger than c n^{−(s−2)} with some constant c > 0. Consequently, the remainder term in (8.7) is indeed smaller, so that for all n large enough, we may write, for example,

p_n(x) ≥ 0.5 n ∫_0^{+∞} ϕ_{σ_n}(x) dP(σ) = 0.5 n g_n(x) ϕ(x)   (4√log n ≤ |x| ≤ n^γ).

Since g_n(x) ≥ 1 for |x| ≥ 4√log n with large n, we have in this region p_n(x)/ϕ(x) ≥ 0.5 n > √n, thus

p_n(x) log(p_n(x)/ϕ(x)) ≥ (1/2) p_n(x) log n ≥ 0.25 n log n ∫_0^{+∞} ϕ_{σ_n}(x) dP(σ).

Hence,

∫_{4√log n ≤ |x| ≤ n^γ} p_n(x) log(p_n(x)/ϕ(x)) dx ≥ 0.25 n log n ∫_0^{+∞} ∫_{4√log n ≤ |x| ≤ n^γ} ϕ_{σ_n}(x) dx dP(σ)   (8.8)
    = 0.5 n log n ∫_0^{+∞} ∫_{4√(log n)/σ_n}^{n^γ/σ_n} ϕ(x) dx dP(σ).

At this point, it is useful to note that n^γ/σ_n ≥ 4√log n, as long as σ ≤ N_n with n large enough. Indeed, in this case

σ_n² ≤ (1 − 1/n) + N_n²/n ≤ 1 + n^{2γ}/(25 log n),

so

(4 σ_n √log n)² ≤ 16 log n + (16/25) n^{2γ} < n^{2γ}

for all n large enough. Hence, from (8.8),

∫_{4√log n ≤ |x| ≤ n^γ} p_n(x) log(p_n(x)/ϕ(x)) dx ≥ 0.5 n log n ∫_1^{N_n} ∫_{4√(log n)/σ_n}^{4√log n} ϕ(x) dx dP(σ).
But the last expression dominates the double integral in (8.6), which enters there with a factor of 2n only. Therefore, combining the above estimate with (8.6), we get

∫_{|x|≤n^γ} p_n(x) log(p_n(x)/ϕ(x)) dx ≥ 0.25 n log n ∫_1^{N_n} ∫_{4√(log n)/σ_n}^{4√log n} ϕ(x) dx dP(σ) + o(n^{−(s−2)/2−δ}).

Finally, we may extend the outer integral on the right-hand side to all values σ ≥ 1, since by (8.4),

n log n ∫_{N_n}^{+∞} ∫_{4√(log n)/σ_n}^{4√log n} ϕ(x) dx dP(σ) ≤ n log n P{ρ > N_n} = o(n^{−(s−2)/2−δ}).

Hence,

∫_{|x|≤n^γ} p_n(x) log(p_n(x)/ϕ(x)) dx ≥ 0.25 n log n ∫_1^{+∞} ∫_{4√(log n)/σ_n}^{4√log n} ϕ(x) dx dP(σ) + o(n^{−(s−2)/2−δ}).   (8.9)

For the remaining values |x| ≥ n^γ, one can just use the property u log u ≥ −1/e to get a simple lower bound

∫_{|x|>n^γ} p_n(x) log(p_n(x)/ϕ(x)) dx ≥ ∫_{|x|>n^γ, p_n(x)≤ϕ(x)} p_n(x) log(p_n(x)/ϕ(x)) dx
    ≥ −(1/e) ∫_{|x|>n^γ, p_n(x)≤ϕ(x)} ϕ(x) dx ≥ −e^{−n^{2γ}/2}.

Together with (8.9) this yields

∫_{−∞}^{+∞} p_n(x) log(p_n(x)/ϕ(x)) dx ≥ 0.25 n log n ∫_1^{+∞} ∫_{4√(log n)/σ_n}^{4√log n} ϕ(x) dx dP(σ) + o(n^{−(s−2)/2−δ}).

To simplify, finally note that σ_n ≥ √log n, and hence 4√(log n)/σ_n ≤ 4, as soon as σ ≥ √(n log n). In this case the last inner integral is separated from zero (for large n), hence with some absolute constant c > 0,

∫_{−∞}^{+∞} p_n(x) log(p_n(x)/ϕ(x)) dx ≥ c n log n P{ρ ≥ √(n log n)} + o(n^{−(s−2)/2−δ}).

This is exactly the required inequality (8.2), and Proposition 8.1 is proved. □
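As a numerical aside (again not part of the paper), the entropic distance D(S_n/√n) can be evaluated exactly for a two-point mixing measure: conditionally on the number of factors ρ_i taking the larger value, S_n/√n is a centred normal, so p_n is a binomial mixture of normal densities and D can be computed by quadrature. The particular values of q and b below are hypothetical.

```python
# Numerical aside (not from the paper): exact D(S_n / sqrt(n)) for a two-point mixing measure.
# rho equals b with probability q and a with probability 1 - q, normalized so E rho^2 = 1.
import numpy as np
from scipy.stats import norm, binom
from scipy.integrate import quad

q, b = 0.05, 2.5
a = np.sqrt((1 - q * b**2) / (1 - q))    # enforces (1-q) a^2 + q b^2 = 1 (needs q b^2 < 1)

def entropic_distance(n):
    # If k of the n factors equal b, then S_n/sqrt(n) ~ N(0, v_k) with v_k = (k b^2 + (n-k) a^2)/n,
    # so the density p_n is a binomial mixture of centred normal densities.
    k = np.arange(n + 1)
    weights = binom.pmf(k, n, q)
    scales = np.sqrt((k * b**2 + (n - k) * a**2) / n)
    p_n = lambda x: np.sum(weights * norm.pdf(x, scale=scales))
    # D = int p_n log(p_n / phi) dx; by symmetry, twice the integral over [0, infinity).
    integrand = lambda x: p_n(x) * (np.log(p_n(x)) - norm.logpdf(x))
    return 2 * quad(integrand, 0, 30, limit=200)[0]

for n in [1, 2, 4, 8, 16, 32]:
    print(n, entropic_distance(n))
```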
Proof of Theorem 1.3. Given η > 0, one may apply Proposition 8.1 to the probability measure P with density

dP(σ)/dσ = c / (σ^{s+1} (log σ)^η),   σ > 2,

extending it to an interval [σ_0, 2] so as to meet the requirement ∫_{σ_0}^{+∞} σ² dP(σ) = 1 (with some 0 < σ_0 < 1 and a suitable constant c = c_{η,s}). It is easy to see that in this case condition (8.1) is fulfilled for 0 < γ < (s−2)/(2(s+1)). In addition, if ρ has the distribution P, we have

P{ρ ≥ σ} ≥ const · 1/(σ^s (log σ)^η)

for all σ large enough. Hence, by taking σ = √(n log n), (8.2) provides the desired lower bound. □
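The constant c and the extension to [σ_0, 2] only serve to make P a probability measure with ∫σ²dP(σ) = 1. One concrete way to realize this numerically is sketched below; the particular extension (the same power law σ^{−(s+1)}, matched continuously at σ = 2) and the parameter values are my own assumptions, not the paper's.

```python
# Sketch (my own concretization, not from the paper): normalize the mixing density
#   dP/dsigma = c * sigma^-(s+1) * (log sigma)^-eta   for sigma > 2,
# extended to [sigma0, 2] by c * kappa * sigma^-(s+1), so that P has total mass 1 and E rho^2 = 1.
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

s, eta = 3.0, 2.0   # hypothetical parameter values (2 < s < 4, eta > 0)

# mass and second moment of the tail part on (2, infinity), computed for c = 1
tail_mass = quad(lambda t: t ** (-(s + 1)) * np.log(t) ** (-eta), 2, np.inf)[0]
tail_m2   = quad(lambda t: t ** (1 - s)   * np.log(t) ** (-eta), 2, np.inf)[0]
kappa = np.log(2.0) ** (-eta)            # makes the two pieces agree at sigma = 2

def bulk_mass(sigma0):   # integral of kappa * t^-(s+1) over [sigma0, 2]
    return kappa * (sigma0 ** (-s) - 2.0 ** (-s)) / s

def bulk_m2(sigma0):     # integral of kappa * t^2 * t^-(s+1) over [sigma0, 2]
    return kappa * (sigma0 ** (2 - s) - 2.0 ** (2 - s)) / (s - 2)

# choose sigma0 in (0, 1) so that total mass equals total second moment;
# the common value is then rescaled to 1 by the constant c.
gap = lambda sigma0: (bulk_mass(sigma0) + tail_mass) - (bulk_m2(sigma0) + tail_m2)
sigma0 = brentq(gap, 1e-6, 1.0)
c = 1.0 / (bulk_mass(sigma0) + tail_mass)
print("sigma0 =", sigma0, "  c =", c)
print("check: mass =", c * (bulk_mass(sigma0) + tail_mass),
      "  E rho^2 =", c * (bulk_m2(sigma0) + tail_m2))
```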
Remark. In the case s = 2 (i.e., with minimal moment assumptions), mixtures of normal laws with discrete mixing measures P were used by Matskyavichyus [18] in the central limit theorem with respect to the Kolmogorov distance. Namely, it is shown there that, for any prescribed sequence ε_n → 0, one may choose P such that Δ_n = sup_x |F_n(x) − Φ(x)| ≥ ε_n for all n large enough (where F_n is the distribution function of Z_n). In view of the Pinsker-type inequality, one may conclude that D(Z_n) ≥ 2Δ_n² ≥ 2ε_n². Therefore, D(Z_n) may decay at an arbitrarily slow rate.
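For completeness, here is a tiny numerical check (not from the paper) of the Pinsker-type bound invoked above, D(P‖Q) ≥ 2·(sup_x |F_P(x) − F_Q(x)|)², in the simplest case where both laws are centred normals and D is available in closed form.

```python
# Sketch (not from the paper): Pinsker-type bound D(P||Q) >= 2 * (sup_x |F_P(x) - F_Q(x)|)^2,
# checked for P = N(0, v) and Q = N(0, 1), where D(P||Q) = (v - 1 - log v)/2 in closed form.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

for v in [1.2, 1.7, 3.0]:                       # hypothetical variances of P
    D = 0.5 * (v - 1.0 - np.log(v))             # relative entropy of N(0, v) w.r.t. N(0, 1)
    neg_gap = lambda x: -abs(norm.cdf(x / np.sqrt(v)) - norm.cdf(x))
    Delta = -minimize_scalar(neg_gap, bounds=(0.0, 10.0), method="bounded").fun
    print(f"v = {v}:  D = {D:.4f}   2*Delta^2 = {2 * Delta**2:.4f}   holds: {D >= 2 * Delta**2}")
```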
REFERENCES

[1] Artstein, S., Ball, K. M., Barthe, F. and Naor, A. (2004). Solution of Shannon's problem on the monotonicity of entropy. J. Amer. Math. Soc.
[2] Artstein, S., Ball, K. M., Barthe, F. and Naor, A. (2004). On the rate of convergence in the entropic central limit theorem. Probab. Theory Related Fields.
[3] Barron, A. R. (1986). Entropy and the central limit theorem. Ann. Probab.
[4] Bhattacharya, R. N. and Ranga Rao, R. (1976). Normal Approximation and Asymptotic Expansions. Wiley, New York. MR0436272
[5] Bikjalis, A. (1964). An estimate for the remainder term in the central limit theorem. Litovsk. Mat. Sb.
[6] Bobkov, S. G., Chistyakov, G. P. and Götze, F. (2011). Non-uniform bounds in local limit theorems in case of fractional moments. I. Math. Methods Statist.
[7] Bobkov, S. G., Chistyakov, G. P. and Götze, F. (2011). Non-uniform bounds in local limit theorems in case of fractional moments. II. Math. Methods Statist.
[8] Bobkov, S. G. and Götze, F. (2007). Concentration inequalities and limit theorems for randomized sums. Probab. Theory Related Fields.
[9] Esseen, C.-G. (1945). Fourier analysis of distribution functions. A mathematical study of the Laplace–Gaussian law. Acta Math.
[10] Feller, W. (1971). An Introduction to Probability Theory and Its Applications, Vol. II, 2nd ed. Wiley, New York. MR0270403
[11] Gnedenko, B. V. and Kolmogorov, A. N. (1954). Limit Distributions for Sums of Independent Random Variables. Addison-Wesley, Reading, MA. Translated and annotated by K. L. Chung, with an appendix by J. L. Doob. MR0062975
[12] Götze, F. and Hipp, C. (1978). Asymptotic expansions in the central limit theorem under moment conditions. Z. Wahrsch. Verw. Gebiete.
[13] Ibragimov, I. A. and Linnik, J. V. (1965). Independent and Stationarily Connected Variables. Nauka, Moscow.
[14] Johnson, O. (2004). Information Theory and the Central Limit Theorem. Imperial College Press, London. MR2109042
[15] Johnson, O. and Barron, A. (2004). Fisher information inequalities and the central limit theorem. Probab. Theory Related Fields.
[16] Linnik, J. V. (1959). An information-theoretic proof of the central limit theorem with Lindeberg conditions. Theory Probab. Appl.
[17] Madiman, M. and Barron, A. (2007). Generalized entropy power inequalities and monotonicity properties of information. IEEE Trans. Inform. Theory.
[18] Matskyavichyus, V. K. (1983). A lower bound for the rate of convergence in the central limit theorem. Teor. Veroyatn. Primen.
[19] Osipov, L. V. and Petrov, V. V. (1967). On the estimation of the remainder term in the central limit theorem. Teor. Veroyatn. Primen.
[20] Petrov, V. V. (1964). On local limit theorems for sums of independent random variables. Theory Probab. Appl.
[21] Petrov, V. V. (1975). Sums of Independent Random Variables. Springer, New York. MR0388499
[22] Prohorov, Y. V. (1952). A local theorem for densities. Doklady Akad. Nauk SSSR (N.S.)
[23] Siraždinov, S. H. and Mamatov, M. (1962). On mean convergence for densities. Teor. Veroyatn. Primen.
[24] Szegő, G. (1967). Orthogonal Polynomials, 3rd ed. American Mathematical Society Colloquium Publications. Amer. Math. Soc., Providence, RI. MR0310533
[25] Tucker, H. G. (1965). On a necessary and sufficient condition that an infinitely divisible distribution be absolutely continuous. Trans. Amer. Math. Soc.
[26] Vilenkin, P. A. and D'yachkov, A. G. (1998). Asymptotics of Shannon and Rényi entropies for sums of independent random variables. Problemy Peredachi Informatsii; English translation in Probl. Inf. Transm. (1999) 219–232. MR1663910

S. G. Bobkov
School of Mathematics
University of Minnesota
127 Vincent Hall, 206 Church St. S.E.
Minneapolis, Minnesota 55455
USA
E-mail: [email protected]