The Method of Cumulants for the Normal Approximation
arXiv preprint [math.PR], March 2021.
HANNA DÖRING, SABINE JANSEN, AND KRISTINA SCHUBERT
Abstract.
The survey is dedicated to a celebrated series of quantitative results, developed by the Lithuanian school of probability, on the normal approximation for a real-valued random variable. The key ingredient is a bound on cumulants of the type |κ_j(X)| ≤ (j!)^{1+γ}/Δ^{j−2}, which is weaker than Cramér's condition of finite exponential moments. We give a self-contained proof of some of the “main lemmas” in a book by Saulis and Statulevičius (1989), and an accessible introduction to the Cramér-Petrov series. In addition, we explain relations with heavy-tailed Weibull variables, moderate deviations, and mod-phi convergence. We discuss some methods for bounding cumulants, such as summability of mixed cumulants and dependency graphs, and briefly review a few recent applications of the method of cumulants for the normal approximation.

Mathematics Subject Classification 2020:
Keywords: cumulants; central limit theorems and Berry-Esseen theorems; large and moderate deviations; heavy-tailed variables.
Contents
1. Introduction
1.1. Aims and scope of the article
1.2. Cumulants
1.3. Short description of the “main lemmas”
1.4. How to bound cumulants
1.5. Some recent applications
2. The main lemmas
2.1. Normal approximation with Cramér corrections
2.2. Berry-Esseen bound and concentration inequality
2.3. Two examples
2.4. On the Cramér, Linnik, and Statulevičius conditions
3. Related techniques and applications
3.1. Moderate deviations vs. heavy-tailed behavior
3.2. Mod-phi convergence
3.3. Analytic combinatorics. Singularity analysis
3.4. Dependency graphs
4. Toolbox
4.1. Characteristic functions, Kolmogorov distance, and smoothing inequality
4.2. Tails and exponential moments of the standard Gaussian
4.3. Integrals of monotone functions and Kolmogorov distance
4.4. Positivity of truncated exponentials
5. Concentration inequality. Proof of Theorem 2.5
6. Bounds under Cramér's condition. Proof of Theorem 2.1
7. Bounds with finitely many moments. Proof of Theorem 2.2
7.1. Introducing a modified tilted measure
7.2. Moment estimates
7.3. Bounds on tilt parameters
A.2. X has moments up to order s ≥ 3
A.3. Bounds under the Statulevičius condition
References

Date: 4 March 2021.
1. Introduction
1.1. Aims and scope of the article.
The method of cumulants is a central tool in comparing the distribution of a random variable with the normal law. It enters the proof of central limit theorems, moderate and large deviation principles, Berry-Esseen bounds, and concentration inequalities in various fields of probability theory: stochastic geometry, random matrices, random graphs, random combinatorial structures, functionals of stochastic processes, mathematical biology; the list is not exhaustive.

The present survey shines a spotlight on a celebrated series of bounds developed by the Lithuanian school, notably Rudzkis, Saulis, and Statulevičius [109] and Bentkus and Rudzkis [11]. The bounds work under a condition on cumulants that allows for heavy-tailed behavior and is considerably weaker than the Cramér condition of finite exponential moments frequently invoked in the theory of large deviations. The conditions on cumulants can be verified in many situations of interest, beyond sums of independent random variables. In their monograph [112], Saulis and Statulevičius study applications, for example, to random processes with mixing, multiple stochastic integrals, and U-statistics.

Our primary aim is to give a self-contained and accessible presentation, including proofs, of the “main lemmas” from Chapter 2 in the book [112] by Saulis and Statulevičius; along the way, we correct a few minor errors in the proofs. We have not aimed at a reconstruction of all numerical constants. The presentation should be accessible to a reader with little exposure to complex-analytic or Fourier methods in probability or slightly arcane concepts such as the Cramér-Petrov series.
The reader familiar with the classical books by Ibragimov and Linnik [70], Petrov [102], Gnedenko and Kolmogorov [49], or even Feller [39] will easily recognize a classical set of ideas. However, an in-depth study of characteristic functions, inversion formulas, and asymptotic expansions is nowadays frequently eschewed in graduate probability courses, and these methods are no longer part of every probabilist's toolbox. Moreover, the presentation in [112] is extremely technical, making it very hard to extract the proof philosophy from the long series of technical estimates. To remedy this situation, we have strived to make explicit the logical structure and key ideas, notably the truncation procedures needed to deal with Taylor expansions with zero radius of convergence.

In addition, we mention a few exemplary applications and discuss the relation of the aforementioned results with other techniques and fields, in particular, large deviations for sums of heavy-tailed variables [37, 93], analytic combinatorics [47], and mod-phi convergence [43].

In the remaining part of the introduction, we define the cumulants, summarize the main bounds studied in the present survey, address methods for bounding cumulants, and list a few recent applications.
1.2. Cumulants.
The cumulants of a real-valued random variable X are the numbers κ_j ≡ κ_j(X), j ∈ ℕ, given by

    κ_j(X) := (−i)^j (d^j/dt^j) log E[e^{itX}] |_{t=0},

provided the derivative exists. Equivalently, assuming r-fold differentiability of the characteristic function at the origin, the cumulants of order j = 1, …, r are related to the Taylor expansion by

    log E[e^{itX}] = Σ_{j=1}^{r} (κ_j/j!) (it)^j + o(t^r)   (t → 0).

The cumulant of order 1 is the expected value κ_1 = E[X], the cumulant of order 2 is the variance κ_2 = V(X). More generally, there exists a recurrence formula between the centered moments E[(X − E[X])^j] and the cumulants, which is how Thiele [122] originally defined them. Cumulants are often called semi-invariants because of the homogeneity κ_j(λX) = λ^j κ_j(X) and the shift-invariance κ_j(X + c) = κ_j(X) for j ≥ 2. The name cumulants was proposed by Fisher and Wishart [46]; see Hald [57] for a historical account and a translation of Thiele's article from Danish to English.

There are many moments-to-cumulants formulas. We list a few to give a first impression of cumulants but emphasize that none of them, except perhaps the first, is used in the sequel. The most common relation, obtained from Faà di Bruno's formula, is

    κ_j(X) = Σ_{m=1}^{j} ((−1)^{m−1}/m) Σ_{k_1+⋯+k_m=j, k_1,…,k_m ≥ 1} (j!/(k_1! ⋯ k_m!)) Π_{ℓ=1}^{m} E[X^{k_ℓ}].

Equivalently, the j-th cumulant is a sum over set partitions {B_1, …, B_m} of {1, …, j},

    κ_j(X) = Σ_{m=1}^{j} Σ_{{B_1,…,B_m}} (−1)^{m−1} (m−1)! Π_{ℓ=1}^{m} E[X^{|B_ℓ|}].

It can be obtained by a Möbius inversion on the lattice of set partitions, the function (−1)^{m−1}(m−1)! being the Möbius function of the partition lattice. The identity

    (d/dt) E[e^{itX}] = E[e^{itX}] (d/dt) log E[e^{itX}]

implies the recurrence relations

    κ_j(X) = E[X^j] − Σ_{r=1}^{j−1} C(j−1, r−1) E[X^{j−r}] κ_r(X).

Cramér's rule for solving linear equations yields an expression of the j-th cumulant as (−1)^{j−1}(j−1)! times the determinant of an (almost) Toeplitz j × j matrix whose entries are built from the normalized moments E[X^k]/k! [107, Cor. 3.1].

Cumulants offer some advantages over moments. Crucially, the j-th order cumulant of a sum of independent random variables is simply the sum of the j-th order cumulants.
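The recurrence relations above are easy to put to work. The following minimal Python sketch (the function name is ours, purely illustrative) converts a list of raw moments E[X], E[X²], … into the cumulants κ_1, κ_2, …:

```python
from math import comb

def cumulants_from_moments(m):
    """m[j-1] = E[X^j]; returns [kappa_1, ..., kappa_n] via the recurrence
    kappa_j = E[X^j] - sum_{r=1}^{j-1} C(j-1, r-1) E[X^{j-r}] kappa_r."""
    kappa = []
    for j in range(1, len(m) + 1):
        k = m[j - 1] - sum(comb(j - 1, r - 1) * m[j - r - 1] * kappa[r - 1]
                           for r in range(1, j))
        kappa.append(k)
    return kappa
```

For the standard normal moments (0, 1, 0, 3, 0, 15) this returns (0, 1, 0, 0, 0, 0), and for the Exp(1) moments E[Y^j] = j! it returns κ_j = (j−1)!, in line with the formulas of this subsection.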
For a standard normal random variable, all cumulants of order j ≥ 3 vanish; for a Poisson random variable with parameter λ, the cumulants are all given by κ_j ≡ λ, while formulas for moments are more involved.

Cumulants help prove central limit theorems: A sequence (X_n)_{n∈ℕ} of random variables converges in distribution to a standard normal variable if and only if the expectation and the variance go to zero and one, respectively, and in addition the cumulants of each fixed order j ≥ 3 go to zero.
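The mechanism behind this criterion is the combination of additivity over independent summands and the homogeneity κ_j(λX) = λ^j κ_j(X). A small sketch for standardized sums of centered Exp(1) summands (our toy choice, with κ_j = (j−1)! for j ≥ 2):

```python
from math import factorial

def cumulants_standardized_sum(n, jmax=8):
    # X_n = (Y_1 + ... + Y_n - n)/sqrt(n) with Y_i ~ Exp(1) i.i.d.
    # Summand cumulants (after centering): kappa_1 = 0, kappa_j = (j-1)! for j >= 2.
    # Additivity and j-homogeneity give kappa_j(X_n) = n * kappa_j(Y-1) / n^(j/2).
    kappa_summand = [0] + [factorial(j - 1) for j in range(2, jmax + 1)]
    return [n * kappa_summand[j - 1] / n ** (j / 2) for j in range(1, jmax + 1)]
```

For every n the variance stays equal to 1, while κ_3(X_n) = 2/√n and all higher cumulants decay polynomially in n, which is exactly the cumulant criterion at work.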
1.3. Short description of the “main lemmas”. The main theorems discussed in the present survey apply to real-valued random variables X for which all moments, hence also all cumulants, exist and for which the cumulants can be bounded as

    |κ_j(X)| ≤ (j!)^{1+γ} / Δ^{j−2}   (j ≥ 3)

with constants γ ≥ 0, Δ > 0. The variable X is assumed to be centered and normalized, E[X] = 0 and V(X) = 1. Following [2] we shall refer to this condition as the Statulevičius condition. The main results, roughly, are the following:

(1) Normal approximation with Cramér corrections.
Let Z ∼ N(0, 1) be a standard normal variable. Then for x ∈ (0, cΔ^{1/(1+2γ)}) with a suitable constant c > 0,

    P(X ≥ x) = e^{L̃(x)} P(Z ≥ x) (1 + O((x+1)/Δ^{1/(1+2γ)})),

where L̃(x) is related to the so-called Cramér-Petrov series (reviewed in Appendix A) and satisfies |L̃(x)| = O(x³/Δ^{1/(1+2γ)}). See Rudzkis, Saulis, and Statulevičius [109], Lemma 2.3 in [112], and Theorem 2.3 below.

(2) Bound on the Kolmogorov distance. The following bound of Berry-Esseen type holds true:

    sup_{x∈ℝ} |P(X ≤ x) − P(Z ≤ x)| ≤ C / Δ^{1/(1+2γ)}

for some constant C > 0. See Rudzkis, Saulis, and Statulevičius [109], Corollary 2.1 in [112], and Theorem 2.4 below.
(3) Concentration inequality. Assuming |κ_j| ≤ (j!)^{1+γ} H/Δ^{j−2} for some H, Δ > 0 and all j (a minor variant of the Statulevičius condition), one has

    P(X ≥ x) ≤ exp(− x²/(2(H + x^{2−α}/Δ^α)))   with α := 1/(1+γ)
            ≤ exp(− (1/4) min(x²/H, (xΔ)^α))

for all x ≥ 0. See Bentkus and Rudzkis [11], the corollary to Lemma 2.4 in [112], and Theorem 2.5 below.

Let us briefly put these results in perspective and discuss the nature of the Statulevičius condition. When γ = 0, the condition implies that the cumulant generating function φ(t) = log E[exp(tX)] = Σ_j κ_j t^j/j! is analytic in a neighborhood of the origin and finite for |t| < Δ. This is precisely Cramér's condition of finite exponential moments. The normal approximation with Cramér corrections is proven with standard techniques also employed for sums of independent identically distributed random variables [70, Chapter 8]. The concentration inequality is similar to the Bernstein inequality [123].
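As a quick sanity check of the concentration inequality (with the constants as reconstructed here, so the check is purely illustrative), take the centered exponential X = Y − 1, Y ∼ Exp(1): its cumulants κ_j = (j−1)! satisfy |κ_j| ≤ (j!)^{1+γ} H/Δ^{j−2} with H = Δ = 1 and γ = 0, and the exact tail e^{−(1+x)} indeed stays below the bound:

```python
import math

def concentration_bound(x, H=1.0, delta=1.0, gamma=0.0):
    # exp(-(1/4) min(x^2/H, (x*Delta)^alpha)) with alpha = 1/(1+gamma)
    alpha = 1.0 / (1.0 + gamma)
    return math.exp(-0.25 * min(x * x / H, (x * delta) ** alpha))

def exact_tail(x):
    # P(X >= x) for X = Y - 1 with Y ~ Exp(1), x >= 0
    return math.exp(-(1.0 + x))

# the exact tail is dominated by the bound on a grid of x values
ok = all(exact_tail(x) <= concentration_bound(x)
         for x in [0.1 * k for k in range(300)])
```

The comparison is crude (for this light example min(x², x)/4 ≤ x/4 ≤ 1 + x for all x ≥ 0), but it illustrates how the bound interpolates between Gaussian and stretched-exponential regimes.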
The bounds are more intriguing for γ > 0. In that case the Taylor expansion Σ_j κ_j t^j/j! of the cumulant generating function can have zero radius of convergence and the random variable can be heavy-tailed, meaning that it has infinite exponential moments E[exp(tX)] = ∞ for arbitrarily small t ≠ 0. The concentration inequality shows that the tails of X have at least stretched exponential decay O(exp(−const x^α)) with α = 1/(1+γ), i.e., X has (in the worst case) Weibull-like tails. In fact there is equivalence: It is known that the Statulevičius condition holds true if and only if Linnik's condition E[exp(δ|X|^α)] < ∞, for some δ > 0, is satisfied, see Section 2.4. Results on large deviations under conditions more general than Linnik's condition are available as well, see Section 2.4. Hence, the concentration inequality morally says that if a random variable has cumulants similar to those of a Weibull-like variable, then it also has Weibull-like tails.

This result is rather amazing. True, it is well-known that statements on the tails of a random variable can be inferred from information on the characteristic function χ(t) = E[exp(itX)] near t = 0, or, if the random variable is non-negative or with values in ℕ, from the behavior of the Laplace transform E[exp(−λX)] as λ ↓ 0 or of the generating function G(z) = E[z^X] as z → 1. Such relations are at the heart of analytic proofs of limit theorems in probability and combinatorics with Fourier analysis, Tauberian theorems, or complex analysis [70, 47]. However, it is not clear at all that the coefficients of a divergent Taylor expansion carry enough information to allow for rigorous statements. Proofs for γ > 0 are accordingly more delicate and rely on truncation arguments. The critical scale Δ^{1/(1+2γ)} is comparable to the monomial zones of attraction to the normal law and “Cramér's system of limiting tails” for sums of i.i.d. random variables discussed by Ibragimov and Linnik [70, Chapters 9 and 10]. Let us stress that the critical scale is not just some technical limitation. For sums of i.i.d. Weibull-like variables, it corresponds exactly to the scale at which the normal approximation ceases to be good and heavy-tailed effects kick in [95], see also Section 2.3.2 for an elementary example.

Note that the same cumulant bound also implies Rosenthal-type bounds, i.e., estimates for the difference of the k-th moment to the corresponding moment of the normal distribution, see [34]. Connections with mod-phi convergence and moderate deviation principles are discussed in Sections 3.1 and 3.2.

1.4. How to bound cumulants.
There is no one-size-fits-all way to bound cumulants. Nevertheless, there are some recurring features, depending on the type of bound one seeks to establish and the type of random variable under investigation. For the bound, it matters whether one is after an analytic bound (γ = 0) or after the weaker case γ > 0. Models fall broadly into two classes: first, random variables built out of fields or processes with underlying independence or good control on dependencies and decay of correlations, for example, the empirical mean of a stationary ergodic process or functionals of Poisson point processes; second, models directly defined on more complex structures, including random matrices, models from analytic and probabilistic number theory, or random combinatorial structures.

The Statulevičius bound for γ = 0 implies that the cumulant generating function is analytic. Conversely, when the cumulant generating function is analytic in a neighborhood of the origin, then the cumulants can be bounded by applying Cauchy's formula. Accordingly one may shift perspective away from the cumulants and focus directly on generating functions. This is especially helpful for random combinatorial structures, where often the recursive structure of combinatorial objects translates into functional equations for generating functions and information on the analytic behavior [47]. Working directly with generating functions is also of advantage for random matrices and probabilistic number theory [43].

Analyticity of the cumulant generating function is equivalent to zero-freeness of the moment generating function. The role of analyticity and zero-freeness was already emphasized in Bryc's central limit theorem [18]. In statistical mechanics, controlling zeros is related to Lee-Yang theory, and relations with central limit theorems were explored, for example, by Iagolnitzer and Souillard [69] or Ruelle, Pittel, Lebowitz and Speer [82]; see also [43, Chapter 8.1]. For new results and an account of modern developments, the reader is referred to Michelen and Sahasrabudhe [92, 91] (Section 7 in [91] has the telling title “Taming the cumulant sequence”).

When a direct control of generating functions is not possible, cumulants are often treated with combinatorial bounds and correlation estimates.
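Before turning to combinatorial bounds, the Cauchy-formula route just mentioned can be made concrete: if φ is analytic on a disc of radius r, then κ_j = j!(2πi)^{−1} ∮_{|z|=r} φ(z) z^{−j−1} dz, and a trapezoidal rule on the circle recovers the cumulants. A minimal sketch (function names ours), using the cumulant generating function φ(z) = −z − log(1−z) of a centered Exp(1) variable, for which κ_j = (j−1)!:

```python
import cmath, math

def cumulant_via_cauchy(phi, j, r=0.5, n=4096):
    # kappa_j = j!/(2 pi i) * contour integral of phi(z)/z^(j+1) over |z| = r,
    # discretized by the trapezoidal rule (spectrally accurate for analytic phi)
    s = 0.0 + 0.0j
    for k in range(n):
        theta = 2.0 * math.pi * k / n
        z = r * cmath.exp(1j * theta)
        s += phi(z) * cmath.exp(-1j * j * theta)
    return math.factorial(j) * (s / n).real / r ** j

# cgf of X = Y - 1 with Y ~ Exp(1); analytic for |z| < 1, kappa_j = (j-1)!
phi = lambda z: -z - cmath.log(1 - z)
```

The same contour also yields the analytic bound |κ_j| ≤ j! max_{|z|=r} |φ(z)| / r^j, i.e., a Statulevičius-type bound with γ = 0, which is the quantitative content of the remark above.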
Two prototypes are sums of dependent random variables and U-statistics of sequences of independent random variables [112, Chapters 4 and 5.1] or m-dependent vectors [58, 60]. Consider for example a sequence of independent random variables (X_n)_{n∈ℕ} and the random variable Y = Σ_{i=1}^{n−1} X_i X_{i+1}. The cumulant of Y is not the sum of the cumulants κ_j(X_i X_{i+1}) because X_i X_{i+1} and X_k X_{k+1} are not independent for {i, i+1} ∩ {k, k+1} ≠ ∅. But clearly, for a given nearest-neighbor pair α = {i, i+1}, the number of pairs β = {j, j+1} with β ≠ α and α ∩ β ≠ ∅ is bounded by 2. This can be exploited with dependency graphs, see Section 3.4.

Another example is when the input variables X_n, n ∈ ℕ, are not independent. The cumulants of the partial sum are given by the sum of mixed cumulants κ(X_{α_1}, …, X_{α_j}) [83] (see Eq. (3.4) below) as

    κ_j(X_1 + ⋯ + X_n) = Σ_{1 ≤ α_1,…,α_j ≤ n} κ(X_{α_1}, …, X_{α_j}),

hence

    |κ_j(X_1 + ⋯ + X_n)| ≤ n sup_{α_1 ∈ ℕ} Σ_{α_2,…,α_j ∈ ℕ} |κ(X_{α_1}, …, X_{α_j})|.

Thus bounds on the cumulants of the sum are intimately tied to summability properties of mixed cumulants. Analogous relations apply in the context of point processes; here summability of mixed cumulants is replaced with bounds on the total variation of reduced (factorial) cumulant measures, which leads to the notions of weak and strong
Brillinger mixing [17, 71]. In the language of statistical mechanics, Brillinger mixing corresponds to integrability of truncated correlation functions, a condition typically satisfied by Gibbs point processes at low density [110].

Brillinger mixing can be rather difficult to establish directly. An alternative approach, still focused on the decay of correlations, was devised by Baryshnikov and Yukich [10] and then further developed, see Błaszczyszyn, Yogeshwaran and Yukich [13] and the references therein. Eichelsbacher, Raič and Schreiber [36] pushed the method all the way up to the Statulevičius condition. The method works with approximate factorization properties of moment measures; cumulants are represented as combinations of semi-cluster measures. Combinatorics enter when bounding the number of summands in the latter representation [36, Lemma 3.2].

For functionals of Markov chains or stochastic processes with mixing, it is convenient to work with another set of quantities, called centered moments [112, Chapter 4], higher-order covariances [62] or
Boolean cumulants [41, Section 10]. Mixing properties of the underlying stochastic process lead to good bounds on the centered moments, and then bounds on cumulants are deduced from a Boolean-to-classical cumulants formula.

For Poisson or Gaussian input data, another class of methods builds on diagrammatic formulas for cumulants and chaos decompositions (related to Feynman diagrams and Fock spaces in mathematical physics). This applies to multiple stochastic integrals of Brownian motion or of Poisson point processes [112, Chapter 5.3]. A modern account of diagrammatic formulas is given by Peccati and Taqqu [99]. Useful formulas for cumulants can also be derived using Malliavin calculus and the infinite-dimensional Ornstein-Uhlenbeck operator, see Nourdin and Peccati [97].
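To close this subsection, a toy computation in the spirit of the nearest-neighbor example above. Take i.i.d. Rademacher signs (our simplifying choice) and Y = Σ_{i=1}^{n−1} X_i X_{i+1}; exact enumeration confirms κ_2(Y) = n − 1 and a fourth cumulant growing only linearly in n, consistent with the summability of mixed cumulants:

```python
from itertools import product
from math import comb

def cumulants_of_pair_sum(n, jmax=4):
    # Y = sum_{i=1}^{n-1} X_i X_{i+1}, X_i i.i.d. uniform on {-1, +1};
    # exact moments by enumerating all 2^n sign configurations,
    # then moments -> cumulants via the standard recurrence.
    total = 2 ** n
    moments = [0.0] * jmax
    for signs in product((-1, 1), repeat=n):
        y = sum(signs[i] * signs[i + 1] for i in range(n - 1))
        p = 1
        for j in range(jmax):
            p *= y
            moments[j] += p / total
    kappa = []
    for j in range(1, jmax + 1):
        k = moments[j - 1] - sum(comb(j - 1, r - 1) * moments[j - r - 1] * kappa[r - 1]
                                 for r in range(1, j))
        kappa.append(k)
    return kappa
```

For n = 10 this yields κ = [0, 9, 0, −18]: the variance equals n − 1, the third cumulant vanishes, and |κ_4(Y)| = 2(n − 1) is of order n, as the dependency-graph counting predicts for all orders.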
1.5. Some recent applications.
In recent years the method of cumulants has attracted interest in various areas of application. We list a few; the examples below involve bounds on cumulants, though not necessarily of the form |κ_j| ≤ (j!)^{1+γ}/Δ^{j−2}.

In stochastic geometry, the method of cumulants is used in numerous ways. In [53] and [54] it is used for the volume and the number of faces of a Gaussian polytope to show concentration inequalities and a Marcinkiewicz-Zygmund-type strong law of large numbers as well as a central limit theorem including error bounds and moderate deviations. Furthermore, for the volumes of random simplices a central limit theorem, mod-phi convergence as well as moderate and large deviations are proved in [52], where the dimension and the number of points grow to infinity. Poisson cylinder processes are studied in [65, 66] and volumes of simplices in high-dimensional Poisson-Delaunay tessellations in [56]. The method is also applicable to the covered volume in the Boolean model [51, 61, 62]. General functionals of random m-dependent fields are studied in [51, 58, 60, 64]. A survey covering m-dependent variables, Markov chains, and mixing random variables is given by Heinrich [59]. Moderate deviation results for classical stabilizing functionals in stochastic geometry can be found in [36].

The method of cumulants also plays an important role in the theory of random matrices and determinantal point processes. For the latter, cumulants for the sine kernel were studied by Costin and Lebowitz [20]. In [117] and [118] Soshnikov studied the Gaussian limit for linear statistics of Gaussian fields and determinantal random point fields. The methods of [20] were also applied to the spacing distribution in the circular unitary ensemble [116]. The method of [117] was further extended, e.g., to study general one-cut unitary-invariant matrix models [79] and to prove mesoscopic fluctuations for unitary invariant ensembles [80].
Further, the methods were applied to (generalized) Ginibre ensembles, where linear eigenvalue statistics and characteristic polynomials were studied [105] and moderate deviations for counting statistics were shown [40]. The method of cumulants is also applied for eigenvalue counting statistics and determinants of Wigner matrices [29, 28]. This was generalized to the linear spectrum statistics of orthogonal polynomial ensembles in [98]. Studying the cumulants also works for characteristic polynomials of random matrices from the circular unitary ensemble [19] as well as the determinants of random block Hankel matrices [26]. Cumulants also enter the analysis of matrix models in which the eigenvalues do not form a determinantal point process, see Borot and Guionnet [16].

The method of cumulants, combined with the concept of dependency graphs, yields useful results in random graphs, in particular for subgraph count statistics in Erdős-Rényi random graphs [30, 43], or more generally for graphons in [45], for the profile of a branching random walk [55] as well as for the winding number of Brownian motion in the complex plane [23]. Féray gives criteria for normality convergence of dependency graphs by using the method of cumulants [41]. This is applied to vincular permutation patterns in [67] (yielding the same order of convergence in the Kolmogorov distance as achieved by applying Stein's method).

The generalization to weighted dependency graphs allows for applications to the Ising model, as [32] shows.
Already in [90], models of statistical mechanics such as the Curie-Weiss and the Ising model are studied.

For an application to multiple Wiener-Itô integrals in stochastic analysis see [97], the survey of a series of articles in the book [99] as well as [113].

Another area of application is given by random logarithmic structures in combinatorics: random permutations and random integer partitions as well as so-called character values, see [6, 43, 45]. Random arithmetic functions, the Riemann ζ function and L-functions over finite fields are considered in the series of publications [23, 43, 72, 77, 78]. The latest publication [42] shows a central limit theorem for weighted dependency graphs and generalizes results for the number of occurrences of any fixed pattern in multiset permutations and in set partitions.

In mathematical biology, Möhle and Pitters derived the absorption time and tree length of the Kingman coalescent by bounding the cumulants, see [94]. Restricted to the infinitely many sites model of Kimura, in [103] Pitters derives a formula for the cumulants of the number of segregating sites of a Kingman coalescent, implying a central limit theorem and a law of large numbers. Studying the multivariate cumulants, the number of cycles in a random permutation and the number of segregating sites jointly converge to the Brownian sheet [104].

To conclude this non-exhaustive list, we stress that cumulants are highly relevant in areas somewhat outside the scope of this survey. Perhaps closest to traditional probability are asymptotic techniques in statistics, see Barndorff-Nielsen and Cox [9]. A tensorial view of cumulants, with applications in statistics, is given by McCullagh [89]. Cumulants are also useful in algebraic statistics, see the book by Zwiernik [127]. For example, Sturmfels and Zwiernik [121] discuss cumulants for algebraic varieties and binary random variables on hidden subset models.
A completely different line of inquiry is free probability and non-commutative probability, in which different notions of independence come with different notions of cumulants. The relation between free, monotone and Boolean cumulants is studied by Arizmendi, Hasebe, Lehner and Vargas [5]. Finally, cumulants also feature prominently in kinetic theory and the analysis of time-dependent models in mathematical physics [87, 15].

2. The main lemmas
Here we state four theorems, roughly the “main lemmas” in Saulis and Statulevičius [112, Chapter 2]. Two theorems are about the normal approximation, with Cramér corrections, to a random variable under conditions on finitely many cumulants (Theorem 2.2) or under the growth condition (S_γ), which allows for heavy-tailed random variables (Theorem 2.3). An inequality of the Berry-Esseen type and a concentration inequality are given in Theorems 2.4 and 2.5, again under the condition (S_γ).

The main theorems are illustrated with two elementary examples (Gamma and Weibull distributions) in Section 2.3. The meaning of the Statulevičius condition (S_γ) is further clarified in Section 2.4.
2.1. Normal approximation with Cramér corrections.
In the following, X is a real-valued random variable, defined on some probability space (Ω, F, P), with cumulants κ_j(X) = κ_j. We assume throughout the text that the variable X is normalized, i.e., E[X] = 0 and V(X) = 1. Two important quantities are the cumulant generating function

    φ(t) := log E[exp(tX)] ∈ ℝ ∪ {∞}   (t ∈ ℝ)   (2.1)

and its Legendre transform

    I(x) := sup_{t∈ℝ} (tx − φ(t))   (x ∈ ℝ).   (2.2)

The first theorem works under the condition that there exists some Δ > 0 with

    ∀ j ≥ 3: |κ_j(X)| ≤ (j−1)! / Δ^{j−2}.   (S)

Under this condition the cumulant generating function is finite on (−Δ, Δ) with absolutely convergent Taylor expansion

    φ(t) = t²/2 + Σ_{j=3}^∞ (κ_j/j!) t^j   (|t| < Δ),   (2.3)

where we have used κ_1 = E[X] = 0 and κ_2 = V(X) = 1. The Cramér rate function admits a Taylor expansion as well (Proposition A.1), with radius of convergence at least (3/10)Δ:

    I(x) = x²/2 − Σ_{j=3}^∞ λ_j x^j   (|x| < (3/10)Δ).   (2.4)

The series Σ_{j=3}^∞ λ_j x^{j−3} is called Cramér series or Cramér-Petrov series after [21, 101]. Cramér's original article [21] was recently made accessible in electronic form, together with an English translation, by Touchette [22]. Appendix A collects some relevant background on the series. It is convenient to set

    L(x) := Σ_{j=3}^∞ λ_j x^j   (2.5)

so that I(x) = x²/2 − L(x). From now on, Z is always a standard normal variable, Z ∼ N(0, 1).
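These objects are easy to visualize in a concrete case. For the centered exponential X = Y − 1 with Y ∼ Exp(1) (our running toy example), the cumulants are κ_j = (j−1)!, so cumulant bounds of the above type hold with Δ = 1; moreover φ(t) = −t − log(1−t) and I(x) = x − log(1+x), whence L(x) = x²/2 − I(x) = Σ_{j≥3} (−1)^{j−1} x^j/j. A grid-based Legendre transform reproduces this (a numerical sketch, not from the survey):

```python
import math

def phi(t):
    # cumulant generating function of X = Y - 1, Y ~ Exp(1); finite for t < 1
    return -t - math.log(1.0 - t)

def I_num(x, lo=-2.0, hi=0.999, n=40000):
    # Legendre transform I(x) = sup_t (t*x - phi(t)), maximized over a fine grid
    best = float("-inf")
    for k in range(n + 1):
        t = lo + (hi - lo) * k / n
        best = max(best, t * x - phi(t))
    return best

def L_series(x, jmax=60):
    # truncated Cramer-type series for this example: sum_{j>=3} (-1)^(j-1) x^j / j
    return sum((-1) ** (j - 1) * x ** j / j for j in range(3, jmax + 1))
```

At x = 0.5, both I_num(0.5) and x²/2 − L_series(0.5) agree with x − log(1+x) ≈ 0.0945 to high accuracy, illustrating how the Cramér series encodes the deviation of I from its Gaussian approximation x²/2.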
Theorem 2.1. Under condition (S) there exist universal constants c, C, C′ > 0 such that for all x ∈ [0, cΔ] and some θ = θ(x) ∈ [−1, 1],

    P(X ≥ x) = e^{L(x)} P(Z ≥ x) (1 + Cθ (x+1)/Δ)   and   |L(x)| ≤ C′ x³/Δ.

The theorem is proven in Section 6. It is a special case of Theorem 2.3 below; we have chosen to provide a separate statement as it is easier to grasp, and its proof is a helpful warm-up for the proof of Theorem 2.3.

The next theorem asks what subsists when the cumulants satisfy the bound (S) only up to some order, i.e.,

    ∀ j ∈ {3, …, s+2}: |κ_j(X)| ≤ (j−1)! / Δ^{j−2}   (S*)

for some s ∈ ℕ. We say that X satisfies condition (S*) if all moments E[X^j], j ≤ s+2, exist—hence also all cumulants κ_j with j ≤ s+2—and the cumulants satisfy the required inequality. Under condition (S*) the random variable X need not have exponential moments and the cumulant generating function may be infinite, therefore the definitions of φ(t) and I(x) are modified as follows. We set

    φ̃(t) = t²/2 + Σ_{j=3}^{s} (κ_j/j!) t^j.   (2.6)
For small x and t, the equation φ̃′(t) = x reads t + O(t²) = x and it has a solution t(x) = x + Σ_{j=2}^∞ b̃_j x^j with suitably defined coefficients b̃_j. We define

    Ĩ(x) := t(x) x − φ̃(t(x)),   L̃(x) := x²/2 − Ĩ(x)   (2.7)

and note that L̃(x) has a Taylor expansion L̃(x) = Σ_{j=3}^∞ λ̃_j x^j with radius of convergence at least (3/10)Δ, and λ̃_j = λ_j for j ≤ s (Eq. (A.8)).

Theorem 2.2.
Let X be a real-valued random variable with E[X] = 0, V(X) = 1. Assume that X satisfies condition (S*) for some even s ≥ 2 and Δ > 0 with s ≤ Δ². Then for all x ∈ [0, √s/(3√e)) and some θ = θ(x) ∈ [−1, 1],

    P(X ≥ x) = e^{L̃(x)} P(Z ≥ x) (1 + θ f(δ, s) (x+1)/√s)

with δ = 3√e · x/√s ∈ [0, 1),

    f(δ, s) = (1/(1−δ)) (1/27 + (1/13) s e^{−(1−δ)² s/2}),

and |L̃(x)| ≤ C′ x³/Δ for a universal constant C′ > 0.

The theorem is proven in Section 7. It corresponds to Lemma 2.2 in [112] and is due to Rudzkis, Saulis, and Statulevičius [109]. The constants are slightly worse than the constants given in [112] but of a similar order of magnitude. We are not aware of any application of the concrete formula for f(δ, s). Instead, what matters is that f(δ, s) is bounded on [0, δ₀] × ℕ, for all δ₀ < 1.
0, consider the
Statulevičius condition

∀ j ≥ 3: |κ_j(X)| ≤ (j!)^{1+γ}/∆^{j−2}.  (S_γ)

The relation of this condition with Weibull tails and Linnik's condition E[exp(δ|X|^{1/(1+γ)})] < ∞ is clarified by Lemma 2.8 below, see [2]. Define

∆_γ := (1/6) (∆/√2)^{1/(1+2γ)},  s_γ := 2⌊(∆/√2)^{2/(1+2γ)}⌋ − 2,  m_γ := min( ⌈1/γ⌉ + 1, s_γ ).  (2.8)

Theorem 2.3.
Let X be a real-valued random variable with E[X] = 0, V(X) = 1. Suppose that X satisfies condition (S_γ). Then there exists a universal constant C_γ > 0 such that for all x ∈ [0, ∆_γ) and some θ = θ(x) ∈ [−1, 1],

P(X ≥ x) = e^{L̃_γ(x)} P(Z ≥ x) (1 + θ g(δ, ∆_γ) (x+1)/∆_γ)

with |L̃_γ(x)| ≤ x³/(1.54 ∆_γ) and

L̃_γ(x) = θ (x/∆_γ)³ for γ ≥ 1,  L̃_γ(x) = Σ_{j=3}^{m_γ} λ_j x^j + θ C_γ (x/∆_γ)^{m_γ+1} for γ < 1.

Here we set

g(δ, ∆_γ) := (1/(1−δ)) ( 24 + 749 ∆_γ exp(−(1−δ)√∆_γ) ),  δ = x/∆_γ.

The theorem is proven in Section 8. It corresponds to Lemma 2.3 in [112] and is due to Rudzkis, Saulis, and Statulevičius [109]. The constants given in [112] are 60 and 600 instead of 24 and 749. Our second constant 749 is worse but our first constant 24 is better. We shall see that under the conditions of the theorem, s_γ is larger or equal to 4, so that m_γ = 2 for γ ≥ 1 and m_γ ≥ 3 for γ <
1. Let us briefly comment on the two bounds for L̃_γ(x) in Theorem 2.3. The global bound |L̃_γ(x)| ≤ x³/(1.54 ∆_γ) is similar to the bounds for L(x) and L̃(x) in Theorems 2.1 and 2.2. It gives the leading order of L̃_γ(x). Note, however, that it can be quite large, since x³/∆_γ can be of order up to ∆_γ². The case distinction in Theorem 2.3 provides a representation of L̃_γ(x) that is precise in the sense that the remainder is small when x is small compared to ∆_γ.

Remark.
Correction terms from the Cramér series need only be taken into account when γ < 1, i.e., for tails of the type exp(−c x^α) with α = 1/(1+γ) > 1/2, see Section 2.3.2 and Lemma 2.8 below. It should be pointed out that the error term O((x+1)/∆_γ) after the exponential is known to be not optimal for sums of i.i.d. random variables. If X_n = S_n/√n is a normalized sum of i.i.d. random variables that satisfy the Statulevičius condition for some fixed ∆, then X_n satisfies the Statulevičius condition with an n-dependent ∆(n) proportional to √n (see Section 3.1), and (x+1)/∆_γ(n) is of the order of (x+1)/(√n)^β for some β < 1, which is larger than the error term O(x/√n) proven e.g. in Ibragimov and Linnik [70, Eq. (13.4.4)].

The principal idea in the proof of Theorem 2.3 is to apply Theorem 2.2 for suitably chosen s and ∆_s such that condition (S*) is satisfied if condition (S_γ) holds true. Thus, we seek s and ∆_s such that

(j!)^{1+γ}/∆^{j−2} ≤ (j−2)!/∆_s^{j−2}  (j = 3, . . . , s+2).

The inequality is equivalent to

(j(j−1))^{1+γ} ((j−2)!)^γ ≤ (∆/∆_s)^{j−2}  (j = 3, . . . , s+2)

and

max_{k=1,...,s} ( (γ/k) log k! + ((1+γ)/k)(log(k+2) + log(k+1)) ) ≤ log(∆/∆_s).  (2.9)

By Stirling's formula, the term to be maximized behaves like

γ log(k/e) + O((log k)/k) = γ (1 + o(1)) log k  (k → ∞).

For a heuristic evaluation of (2.9), let us keep the leading order term only; then (2.9) becomes γ log s ≤ log(∆/∆_s), hence ∆_s = ∆/s^γ. Then s ≤ ∆_s² if and only if s ≤ ∆^{2/(1+2γ)}. This suggests to pick √s = const · ∆^{1/(1+2γ)}, which is precisely the power of ∆ appearing in Theorem 2.3, via ∆_γ.

2.2. Berry-Esseen bound and concentration inequality.
Theorem 2.3 is complemented by a Berry-Esseen bound and a concentration inequality that provide statements for all x ≥ 0 instead of only for x ∈ (0, ∆_γ).

Theorem 2.4.
Under the Statulevičius condition (S_γ), we have

sup_{x∈R} |P(X ≥ x) − P(Z ≥ x)| ≤ C_γ/∆^{1/(1+2γ)}

for some constant C_γ that does not depend on the random variable X or on ∆.

Theorem 2.4 is proven in Section 9; it corresponds to Corollary 2.1 in [112]. The precise bound given by [112] is 18/∆_γ with ∆_γ defined in (2.8); we have not checked the numerical constants.

Theorem 2.5.
Suppose E[X] = 0 and

|κ_j(X)| ≤ (j!)^{1+γ} H/∆̄^{j−2}  (j ≥ 2)  (2.10)

for some γ ≥ 0 and H, ∆̄ > 0. Set α := 1/(1+γ). Then there exists C > 0 such that for all x ≥ 0,

P(X ≥ x) ≤ C exp( −x²/(2(H + x^{2−α}/∆̄^α)) ).  (2.11)

The constant does not depend on X, H, ∆̄, or γ.
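The exponent in (2.11) is an algebraic interpolation between a Gaussian and a stretched-exponential regime; this can be sanity-checked numerically. A minimal sketch (the values of H, ∆̄, γ, x below are arbitrary illustrative choices, not taken from the survey):

```python
import math

def exponent(x, H, dbar, alpha):
    # exponent of the bound (2.11): x^2 / (2 (H + x^(2 - alpha) / dbar^alpha))
    return x * x / (2.0 * (H + x ** (2.0 - alpha) / dbar ** alpha))

def harm(a, b):
    # harmonic mean Harm(a, b) = 2 (a^-1 + b^-1)^-1
    return 2.0 / (1.0 / a + 1.0 / b)

H, dbar, gamma = 2.0, 5.0, 1.0   # illustrative values only
alpha = 1.0 / (1.0 + gamma)

for x in (0.1, 1.0, 10.0, 1000.0):
    e = exponent(x, H, dbar, alpha)
    # identity: exponent = (1/2) Harm(x^2/(2H), (x*dbar)^alpha / 2)
    assert abs(e - 0.5 * harm(x * x / (2 * H), (x * dbar) ** alpha / 2)) <= 1e-9 * max(1.0, e)

# small x: Gaussian regime x^2/(2H); large x: stretched-exponential regime (x*dbar)^alpha/2
assert abs(exponent(0.01, H, dbar, alpha) / (0.01 ** 2 / (2 * H)) - 1) < 0.05
assert abs(exponent(1e8, H, dbar, alpha) / ((1e8 * dbar) ** alpha / 2) - 1) < 0.05
```

The two trailing assertions make the interpolation quantitative at the extreme ends of the range.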
Theorem 2.5 is proven in Section 5; it corresponds to Lemma 2.4 in [112] and is due to Bentkus and Rudzkis [11]. As noted by Kallabis and Neumann [75], the statement of Lemma 2.4 in [112] contains a typo. Bentkus and Rudzkis give the constant C = 1; we give a shorter proof but provide no concrete numerical bound on C.

The exponent in (2.11) can be expressed in terms of the harmonic mean Harm(a, b) = 2(a^{−1} + b^{−1})^{−1} as

x²/(2(H + x^{2−α}/∆̄^α)) = ( 2H/x² + 2/(x∆̄)^α )^{−1} = (1/2) Harm( x²/(2H), (x∆̄)^α/2 ).

Thus Theorem 2.5 provides an upper bound that smoothly interpolates between Gaussian tails exp(−x²/(2H)) and stretched exponential tails exp(−(x∆̄)^α/2).

2.3. Two examples.
Before we turn to the proofs, we provide two examples that illustrate the theorems and explain the role of ∆ and ∆^{1/(1+2γ)}.

2.3.1. A Gamma-distributed random variable.
Pick ∆ > 0 and let Y be a Gamma random variable with parameters β = ∆ and α = ∆². Thus the random variable Y has probability density function Γ(α)^{−1} 1l_{(0,∞)}(x) β^α x^{α−1} e^{−βx}, moment generating function (1 − t/β)^{−α}, variance α/β² = 1 and expected value α/β = ∆. Set X := Y − ∆. Then X has pdf

ρ_∆(x) = (∆^{∆²}/Γ(∆²)) (x + ∆)^{∆²−1} e^{−∆(x+∆)} 1l_{[−∆,∞)}(x)  (2.12)

and cumulant generating function, for |t| < ∆, given by

φ(t) = log E[e^{tX}] = −∆² log(1 − t/∆) − t∆ = Σ_{j=2}^∞ (t^j/j) ∆^{2−j},

from which we read off the cumulants

κ_j = (j−1)!/∆^{j−2}  (j ≥ 2).

The explicit formula for the probability density function allows us to check that the normal approximation for X is good when ∆ is large and x is small compared to ∆.

Proposition 2.6. As ∆ → ∞ and x/∆ → 0, the probability density function (2.12) satisfies

ρ_∆(x) = ( exp(−(x²/2)[1 + O(x/∆)]) / √(2π) ) (1 + O((x+1)/∆)).

Proof.
We rewrite

ρ_∆(x) = (∆^{2∆²−1}/Γ(∆²)) (1 + x/∆)^{∆²−1} e^{−∆²(1+x/∆)} 1l_{[−1,∞)}(x/∆)
= (∆^{2∆²−1} e^{−∆²}/Γ(∆²)) (1/(1 + x/∆)) e^{∆²[log(1+x/∆) − x/∆]} 1l_{[−1,∞)}(x/∆).  (2.13)

Using Γ(x + 1) = xΓ(x) and Stirling's approximation, we have

Γ(∆²) = ∆^{−2} Γ(∆² + 1) = (1 + O(∆^{−2})) ∆^{−2} √(2π∆²) (∆²/e)^{∆²} = (1 + O(∆^{−2})) √(2π) ∆^{2∆²−1} e^{−∆²}.  (2.14)

Combining this with the Taylor expansion log(1 + u) = u − u²/2 + o(u²) of the logarithm, we deduce

ρ_∆(x) = (1 + O(∆^{−2}))(1 + O(x/∆)) (1/√(2π)) exp( −(x²/2)(1 + O(x/∆)) ). □

The tilted normal approximation (compare Bahadur-Rao [7], [24, Theorem 3.7.4] or the proof of the lower bound in the Cramér large deviation principle [24, Chapter 2.2]) consists in the following.
Let I(x) := sup_{t∈R}(tx − φ(t)) be the rate function in the Cramér large deviation principle. An explicit computation yields

I(x) = ∆x − ∆² log(1 + x/∆) = x²/2 + Σ_{j=3}^∞ ((−1)^j/(j ∆^{j−2})) x^j,

from which we read off the Cramér-Petrov series

L(x) = − Σ_{j=3}^∞ ((−1)^j/(j ∆^{j−2})) x^j  (|x| < ∆).

For later purposes we extend the definition of L(x) to all of R by putting L(x) = x²/2 − I(x). As x/∆ → 0, we have L(x) = x² O(x/∆). Given x ≥
0, let h ≥ 0 solve φ′(h) = x and let X̂_h be a random variable with distribution P(X̂_h ≤ y) = e^{−φ(h)} E[e^{hX} 1l{X ≤ y}]. The variable X̂_h has expected value φ′(h) = x, variance φ″(h), and probability density function e^{−φ(h)} e^{hy} ρ_∆(y). The approximation L(X̂_h) ≈ N(x, φ″(h)) suggests

e^{−φ(h)} e^{hy} ρ_∆(y) = ρ_{X̂_h}(y) ≈ (1/√(2πφ″(h))) e^{−(y−x)²/[2φ″(h)]}.

Remember I(x) = hx − φ(h) and I″(x) = 1/φ″(h), so

ρ_∆(y) ≈ √(I″(x)) e^{−I(x)} × (1/√(2π)) e^{−(y−x)²/[2φ″(h)] − h(y−x)}.

Proposition 2.7. As ∆ → ∞, we have for all x ≥ −∆

ρ_∆(x) = (1 + O(1/∆)) √(I″(x)/(2π)) e^{−I(x)} = (1 + O(1/∆)) (exp(L(x))/(1 + x/∆)) exp(−x²/2)/√(2π)

with an error term O(1/∆) uniform in x. Notice that this approximation is much more precise than the direct normal approximation.
Proof.
An explicit computation yields

I′(x) = x/(1 + x/∆),  I″(x) = 1/(1 + x/∆)².

Hence for all x ≥ −∆,

(1/√(2π)) √(I″(x)) e^{−I(x)} = (1/(√(2π)(1 + x/∆))) e^{−∆x} (1 + x/∆)^{∆²} = (1/√(2π)) (1 + x/∆)^{∆²−1} e^{−∆x}.

A quick look at (2.13) reveals that this expression differs from the density ρ_∆(x) only through the prefactor, 1/√(2π) vs. ∆^{2∆²−1} e^{−∆²}/Γ(∆²). The ratio between these two prefactors is independent of x and behaves like 1 + O(1/∆) by Stirling's approximation (2.14). □

2.3.2. Weibull tails.
Fix γ >
0. Set α := 1/(1+γ) and consider a non-negative random variable Y with survival function

P(Y ≥ y) = exp(−y^α)  (y ≥ 0).

The moments of Y are given by E[Y^m] = Γ(m/α + 1). Notice that as m → ∞,

E[Y^m] = (1 + o(1)) √(2πm/α) (m/(αe))^{m/α} = (m!)^{1+γ} α^{−m/α} e^{o(m)},  (2.15)

where we have used Stirling's approximation and 1/α = 1 + γ. Let Z ∼ N(0, 1) be independent of Y. For small ε >
0, set X_ε := Z + εY.
Then the expected value μ_ε = E[X_ε] and the variance σ_ε² = V(X_ε) satisfy μ_ε = O(ε) and V[X_ε] = 1 + O(ε). The centered variable X̂_ε = (X_ε − μ_ε)/σ_ε has cumulants

κ_j(X̂_ε) = (ε/σ_ε)^j κ_j(Y)  (j ≥ 3).

In view of (2.15) and Lemma 2.8, it seems plausible that the centered variable X̂_ε satisfies condition (S_γ) with ε-dependent ∆ = ∆(ε) of the order of

∆(ε) ≈ σ_ε/ε = (1/ε)(1 + O(ε)).

We would like to understand the behavior of the tails P(X_ε ≥ x_ε) = P(X̂_ε ≥ (x_ε − μ_ε)/σ_ε) as x_ε → ∞ and ε →
0. Theorem 2.3 suggests that the normal approximation P(X_ε ≥ x_ε) ≈ P(Z ≥ x_ε) should be good as long as x_ε is small compared to

∆(ε)^{1/(1+2γ)} ≈ ε^{−1/(1+2γ)}.  (2.16)

We are not going to provide a precise statement on the normal approximation. Instead we would like to emphasize two key facts. First, the critical scale (2.16) is explained by a very simple heuristics. Second, the normal approximation cannot apply beyond that scale. As a consequence, the scale ∆^{1/(1+2γ)} in Theorem 2.3 is not due to technical restrictions but is in fact sharp.

For the heuristic derivation of the critical scale (2.16), notice

P(X_ε ≥ x_ε) ≥ P(Z ≥ 0, εY ≥ x_ε) = (1/2) exp(−(x_ε/ε)^α)  (2.17)

but also, since Y is non-negative,

P(X_ε ≥ x_ε) ≥ P(Z ≥ x_ε) = (1 + o(1)) exp(−x_ε²/2)/(x_ε √(2π)),

where we have used the well-known asymptotic behavior of Gaussian tails, see Eq. (4.16) below. The two lower bounds correspond to two different ways of realizing the unlikely event that X_ε = Z + εY ≥ x_ε—either Z stays small but εY is very large, or εY stays small but Z is large. Which of the two effects dominates the other? The answer depends on how large x_ε is. In view of the equivalence

exp(−x_ε²/2) ≥ exp(−(x_ε/ε)^α)  ⇔  x_ε ≤ (2^{1/α}/ε)^{α/(2−α)},

we should expect the probability P(X_ε ≥ x_ε) to be similar to exp(−x_ε²/2) when x_ε ≪ ε^{−α/(2−α)} and similar to exp(−(x_ε/ε)^α) when x_ε ≫ ε^{−α/(2−α)}. Because of α = 1/(1+γ), we have

ε^{−α/(2−α)} = ε^{−1/(1+2γ)},

which is exactly the right-hand side of (2.16). Thus we have recovered, heuristically, the critical scale from Theorem 2.3. In addition, for x_ε ≫ ε^{−α/(2−α)}, the lower bound (2.17) yields the rigorous asymptotic lower bound

P(X_ε ≥ x_ε) ≥ (1/2) exp(−(x_ε/ε)^α) ≫ P(Z ≥ x_ε).

Hence, the normal approximation cannot be good beyond the scale (2.16).
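The crossover between the two regimes can also be located numerically. A minimal sketch (γ and ε are arbitrary illustrative choices) compares the two exponents and checks that they exchange dominance precisely at x = (2^{1/α}/ε)^{α/(2−α)}:

```python
import math

gamma = 1.0                       # illustrative choice
alpha = 1.0 / (1.0 + gamma)       # = 1/2
eps = 1e-4

# exponents of the two lower bounds for P(X_eps >= x)
gauss = lambda x: x * x / 2.0           # Gaussian contribution, from P(Z >= x)
weib = lambda x: (x / eps) ** alpha     # Weibull contribution, from P(eps * Y >= x)

x_star = (2.0 ** (1.0 / alpha) / eps) ** (alpha / (2.0 - alpha))  # predicted crossover

# below the crossover the Gaussian exponent is the smaller one, above it the Weibull one
assert gauss(0.5 * x_star) < weib(0.5 * x_star)
assert gauss(2.0 * x_star) > weib(2.0 * x_star)
# at the crossover the two exponents agree
assert abs(gauss(x_star) - weib(x_star)) < 1e-6 * gauss(x_star)
# and the crossover matches the critical scale eps^(-1/(1+2*gamma)) up to a constant factor
assert abs(math.log(x_star) / math.log(eps ** (-1.0 / (1.0 + 2.0 * gamma))) - 1.0) < 0.25
```

The smaller exponent corresponds to the larger lower bound, i.e. to the dominant way of realizing the event {X_ε ≥ x}.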
2.4. On the Cramér, Linnik, and Statulevičius conditions.
For γ >
0, the Statulevičius condition (S_γ) seems technical and not immediately accessible to probabilistic intuition. For γ = 0, the situation is simpler: If |κ_j| ≤ j!/∆^{j−2} for some ∆ > 0 and all j ≥ 3, then Σ_{j≥3} κ_j t^j/j! is absolutely convergent on (−∆, ∆) and E[exp(t|X|)] < ∞ for all t ∈ (−∆, ∆). Thus Cramér's condition is satisfied and the distribution of X has exponentially decaying tails. The question arises if there is a similar intuition for the condition (S_γ) when γ >
0. The answeris yes, if we replace Cram´er’s condition by
Linnik's condition, which reads E[exp(δ|X|^α)] < ∞ for some α ∈ (0,
1) and δ >
0. The correct choice turns out to be α = 1/(1+γ), which should not surprise us after Section 2.3.2. In addition, conditions on cumulants may be replaced by conditions on moments.

Lemma 2.8.
Let X be a random variable with E[X] = 0, V(X) = 1, and E[|X|^j] < ∞ for all j ≥ 3. Fix γ ≥ 0. Then, the following three statements are equivalent:

(i) X satisfies condition (S_γ).
(ii) There exists H ≥ 1 such that the moments of X satisfy |E[X^j]| ≤ (j!)^{1+γ} H^{j−2} (j ≥ 3).  (M_γ)
(iii) There exists δ > 0 such that E[exp(δ|X|^{1/(1+γ)})] < ∞.

Similar relations, with explicit control on constants, are proven in [112, Chapter 3.1]. Condition (ii), for γ = 0, is a variant of the condition |E[X^j]| ≤ C j! H^{j−2}, sometimes called Bernstein condition [112] because it allows for a Bernstein inequality with unbounded random variables [70, Chapter 7.5]. Linnik's condition is discussed in depth in the context of "monomial zones of local normal attraction" in [70, Chapter 9]. The name Linnik condition is not used in the book [70]; it is used for example by Saulis and Statulevičius [112] or Amosova [3].

Large deviation theorems for sums of i.i.d. variables under conditions of the type E exp[h(X)] < ∞ are available as well, see Chapter 11 in Ibragimov and Linnik [70] and Nagaev [96] for sums of i.i.d. variables and Heinrich [59, Section 4] for sums of Markov chains. However, to the best of our knowledge there is no analogue of Lemma 2.8 for such more general conditions.

Proof.
We start with the equivalence of (ii) and (iii).

"(ii) ⇒ (iii)": First we note that condition (M_γ) implies a similar condition for the moments of |X|. For even powers this is immediate. For odd powers, we use 2j + 1 = j + (j + 1), the Cauchy-Schwarz inequality, and condition (M_γ). This gives

E[|X|^{2j+1}] ≤ ( ((2j)!)^{1+γ} H^{2j−2} ((2j+2)!)^{1+γ} H^{2j} )^{1/2} = ( (2j+2)/(2j+1) )^{(1+γ)/2} ((2j+1)!)^{1+γ} H^{2j−1}.

The ratio (2j+2)/(2j+1) is not smaller than 1 but is bounded by 4/3 for j ≥ 1; hence, choosing H′ ≥ H large enough, we get

E[|X|^{2j+1}] ≤ ((2j+1)!)^{1+γ} (H′)^{2j+1−2}.

We conclude with an argument by Mason and Zhou [88, Appendix B]. Set α := 1/(1+γ). The function x ↦ x^α is concave on R₊, therefore E[Y^α] ≤ E[Y]^α for every non-negative random variable Y. In particular, for all j ≥ 3,

E[|X|^{jα}] ≤ ( E[|X|^j] )^α ≤ (j!)^{(1+γ)α} (H′)^{(j−2)α} = j! (H′)^{(j−2)α},

which gives

E[exp(δ|X|^α)] = 1 + Σ_{j=1}^∞ (δ^j/j!) E[|X|^{jα}] < ∞

for δ < 1/H′.

The implication "(iii) ⇒ (ii)" is proven by Amosova [3, Lemma 3], see also Mason and Zhou [88, Appendix B]. We sketch the argument for the reader's convenience, following [88]. Because of exp(x) ≥ x^k/k! for all x ≥
0, we have

E[|X|^{kα}] ≤ (k!/δ^k) E[exp(δ|X|^α)] =: (k!/δ^k) C(δ).

Given m ∈ N, define k := ⌈m/α⌉ = ⌈m(1+γ)⌉. Thus β := m/(kα) ≤ 1 and y ↦ y^β is concave. Therefore

E[|X|^m] ≤ ( E[|X|^{kα}] )^{m/(kα)}.

Because of y^β ≤ max(1, y) for all y ≥ 0 and β ∈ (0, 1],

E[|X|^m] ≤ max( 1, (k!/δ^k) C(δ) ).
The proof is completed by comparing k! with (m!)^{1+γ}, aided by Stirling's formula, see [88, Appendix B] for details.

Next we address the equivalence of moment and cumulant conditions.

"(ii) ⇒ (i)": We follow Rudzkis, Saulis, and Statulevičius [109, Lemma 2]. Set m_j := E[X^j]. The moment-cumulant relation yields

κ_j/j! = [z^j] log( 1 + Σ_{k=1}^j (m_k/k!) z^k ),

meaning that κ_j/j! is equal to the coefficient of z^j in the power series obtained by expanding the logarithm on the right-hand side. Let r > 0 be such that

Σ_{k=1}^j (|m_k|/k!) r^k < 1.

Then z ↦ log(1 + Σ_{k=1}^j (1/k!) m_k z^k) is analytic in |z| < r and continuous in |z| ≤ r. By Cauchy's integral formula, the coefficient can be represented by a contour integral over the contour |z| = r and we find

|κ_j/j!| = | (1/(2πi)) ∮ log( 1 + Σ_{k=1}^j (m_k/k!) z^k ) dz/z^{j+1} | ≤ r^{−j} | log( 1 − Σ_{k=1}^j (|m_k|/k!) r^k ) |,

where we have used |log(1 + z)| ≤ −log(1 − |z|) for z ∈ C with |z| <
1. For δ ∈ (0 ,
1) small enough we check that the choice

r := δ (j!)^{−γ/j} H^{−(j−2)/j}

is admissible. Notice r^{−j} = δ^{−j} (j!)^γ H^{j−2}, resp. r^k = δ^k H^{−k(j−2)/j} (j!)^{−γk/j} for k ≤ j. We have

Σ_{k=1}^j (|m_k|/k!) r^k ≤ r²/2 + Σ_{k=3}^j (k!)^γ H^{k−2} r^k = r²/2 + Σ_{k=3}^j ((k!)^γ/(j!)^{γk/j}) (H^{k−2}/H^{k(j−2)/j}) δ^k.

For k ≤ j we have (k!)^j ≤ (j!)^k (this can be proven by induction over j ≥ k at fixed k). In addition j(k−2) ≤ k(j−2), hence H^{k−2} ≤ H^{k(j−2)/j}, and

Σ_{k=1}^j (|m_k|/k!) r^k ≤ δ²/2 + Σ_{k=3}^j δ^k ≤ δ²/2 + δ³/(1−δ) =: C_δ.

Clearly C_δ < 1 for small δ. Setting C′_δ := −log(1 − C_δ), we get

|κ_j| ≤ C′_δ (j!)^{1+γ} (H/δ)^{j−2}.

Set ∆ := (δ/H) min(1, 1/C′_δ); then |κ_j| ≤ (j!)^{1+γ}/∆^{j−2} for all j ≥ 3.

"(i) ⇒ (ii)": Suppose that X satisfies (S_γ). From the moment-cumulant relations, we get

m_j/j! = [z^j] exp( Σ_{ℓ=2}^j (κ_ℓ/ℓ!) z^ℓ ),

meaning that m_j/j! is equal to the coefficient of z^j on the right-hand side. The Cauchy inequality yields

|m_j|/j! ≤ inf_{r>0} r^{−j} exp( Σ_{ℓ=2}^j (|κ_ℓ|/ℓ!) r^ℓ ).

From here on the proof is similar to the proof of the implication (ii) ⇒ (i) and therefore omitted. □

3. Related techniques and applications

3.1.
Moderate deviations vs. heavy-tailed behavior.
Let (X_n)_{n∈N} be a sequence of normalized real-valued random variables such that each X_n satisfies condition (S_γ) for some n-dependent ∆_n >
0. Further assume that ∆_n → ∞ as n → ∞. Let us evaluate P(X_n ≥ x_n) for sequences x_n with x_n → ∞ and x_n = o(∆_n^{1/(1+2γ)}). Theorem 2.3, combined with the Gaussian tail estimate (4.16), yields

P(X_n ≥ x_n) = ((1 + o(1))/(√(2π) x_n)) exp( −x_n²/2 + O(x_n³/∆_n) ).

In general the correction term O(x_n³/∆_n) from the Cramér-Petrov series does not go to zero; however, for x_n = o(∆_n) it is negligible compared to x_n²/2. Hence, if we are only interested in a rough asymptotics on the exponential scale, we may drop it and write

P(X_n ≥ x_n) = exp( −(x_n²/2)(1 + o(1)) ).

This asymptotic statement can be lifted to a full moderate deviation principle [24, Chapter 3.7], where probabilities of more general sets are examined.
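For a hands-on impression of the normal approximation underlying these asymptotics, one can compare an exactly computable tail with the Gaussian tail. A minimal sketch with Rademacher summands (n and x are arbitrary illustrative choices; the example is ours, not from the survey):

```python
import math

n, x = 400, 2.5                      # illustrative choices
threshold = x * math.sqrt(n)         # event {S_n >= 50}, i.e. at least 225 steps equal to +1
k_min = math.ceil((n + threshold) / 2)

# exact tail P(S_n / sqrt(n) >= x) for S_n a sum of n independent fair signs
p_exact = sum(math.comb(n, k) for k in range(k_min, n + 1)) / 2 ** n

# Gaussian tail P(Z >= x) via the complementary error function
p_gauss = 0.5 * math.erfc(x / math.sqrt(2.0))

# at this moderate x the exact tail and the Gaussian tail agree up to a modest factor
assert 0.5 < p_exact / p_gauss < 2.0
```

Pushing x towards the scale of √n would leave the moderate-deviation regime, and the agreement would deteriorate, in line with the discussion below.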
Theorem 3.1. [30, Theorem 1.1]
Let (X_n)_{n∈N} be a sequence of random variables with E[X_n] = 0, V(X_n) = 1 that satisfy condition (S_γ) with n-dependent ∆_n but fixed γ ≥ 0. Suppose that ∆_n → ∞. Then, for every sequence a_n → ∞ with a_n = o(∆_n^{1/(1+2γ)}), the sequence (X_n/a_n)_{n∈N} satisfies the moderate deviation principle with speed a_n² and rate function I(x) = x²/2. Thus for every Borel set A ⊂ R, the lower and upper bounds

liminf_{n→∞} (1/a_n²) log P(X_n/a_n ∈ A) ≥ − inf_{x∈int(A)} x²/2,
limsup_{n→∞} (1/a_n²) log P(X_n/a_n ∈ A) ≤ − inf_{x∈cl(A)} x²/2

hold, where int(A) denotes the interior and cl(A) the closure of A.

The scale ∆_n^{1/(1+2γ)} is not merely technical. For sums of i.i.d. random variables that satisfy Cramér's condition or have Weibull tails, the critical scale ∆_n^{1/(1+2γ)} corresponds to the scale at which the tail behavior switches to Cramér large deviations or heavy-tailed behavior. Precisely, let Y_i, i ∈ N, be i.i.d. random variables with E[Y_i] = 0, V(Y_i) = 1. Set S_n := Y₁ + ··· + Y_n and X_n := S_n/√n. If E[exp(tY_i)] < ∞ for |t| < δ, then X_n satisfies condition (S_γ) with γ = 0 and ∆_n = cδ√n for some suitable constant c >
0. Furthermore, by the Cramér LDP, for x ≥ 0,

liminf_{n→∞} (1/∆_n²) log P(X_n/∆_n ≥ x) = liminf_{n→∞} (1/((cδ)²n)) log P(S_n/n ≥ xcδ) ≥ −I(cδx)/(cδ)²,  I(x) = sup_{t∈R}( tx − log E[e^{tY₁}] ),

similarly for the upper bound. Thus (X_n/∆_n)_{n∈N} satisfies a large deviation principle with speed ∆_n², however the rate function is in general different from x²/2.

If instead Y_i is integer-valued and P(Y_i = k) = (1 + o(1)) c exp(−k^α) with α = 1/(1+γ) ∈ (0, 1), then on the one hand, by the moment growth (2.15) of the Y_i's and κ_j(X_n) = n^{−j/2} Σ_{i=1}^n κ_j(Y_i), the variable X_n satisfies condition (S_γ) with ∆_n of the order of √n, and on the other hand, for s_n ≫ n^{1/(2−α)},

P(S_n = s_n) = n c exp( −s_n^α + o(s_n^α) )

as n → ∞, see Nagaev [95]. In the previous asymptotics, the unlikely event that S_n = s_n is best realized by making one out of the n summands very large—this is the typical heavy-tailed behavior [37], different from the collective behavior underpinning large and moderate deviations. The scale s*_n = n^{1/(2−α)} naturally appears when solving s_n^α = s_n²/n. As a consequence, for

a_n ≫ n^{1/(2−α)}/√n = n^{1/[2(1+2γ)]} = const · ∆_n^{1/(1+2γ)}

(meaning a_n/∆_n^{1/(1+2γ)} → ∞) and x > 0, we have

(1/a_n²) log P(X_n/a_n ≥ x) = (1/a_n²) log P(S_n ≥ √n a_n x) = −(1 + o(1)) (1/a_n²) (√n a_n x)^α → 0,

in contrast with the Gaussian rate −x²/2 from Theorem 3.1. The critical scale is related to the small steps sequence for subexponential random variables studied by Denisov, Dieker and Shneer [25], in turn related to the sequence Λ(n) in Ibragimov and Linnik [70, Chapter 11] and in Nagaev [96], and to the sequence (N**_n) in [38, Lemma 2.5]. The small steps sequence in general is smaller than the boundary of the big-jump domain. The latter corresponds to sequences b_n for which P(S_n = b_n) ∼ n P(Y₁ = b_n), i.e. the dominant effect is having one large summand.

The connection with heavy-tailed variables suggests that different bounds on cumulants—reflecting the behavior of other heavy-tailed laws, e.g. log-normal—might lead to generalizations of Theorem 3.1. This corresponds to a generalization of Linnik's condition (see Section 2.4) of the form E[exp(h(X))] < ∞ with functions h(x) different from cx^α. Such generalized Linnik conditions are treated by Ibragimov and Linnik in a chapter on "narrow zones of normal attraction" [70, Chapter 11]; the class of functions h and the domains of attraction were further improved by Nagaev [96]. However, we are not aware of a corresponding generalized Statulevičius condition.

3.2. Mod-phi convergence.
Let (X_n)_{n∈N} be a sequence of normalized real-valued random variables such that each X_n satisfies condition (S_γ) for some n-dependent ∆_n > 0. Define R_n on the axis of purely imaginary numbers by

E[e^{itX_n}] = E[e^{itZ}] R_n(it).  (3.1)

If γ = 0, then for |t| < ∆_n and some θ = θ_n(t) ∈ [−1, 1],

R_n(it) = exp( (κ₃(X_n)∆_n/3!) (it)³/∆_n + θ∆_n (t/∆_n)⁴/(1 − |t|/∆_n) ).

If κ₃(X_n)∆_n converges to some constant c ∈ R, then it is natural to rescale variables as t = ∆_n^{1/3} s. In view of ∆_n (∆_n^{1/3}/∆_n)⁴ = ∆_n^{−5/3} → 0, we get

lim_{n→∞} R_n(i∆_n^{1/3} s) = exp( (c/6)(is)³ )

uniformly on compact sets. Let us define η(z) := z²/2, ψ(z) := exp(cz³/6), and Y_n := ∆_n^{1/3} X_n; then

lim_{n→∞} exp(−∆_n^{2/3} η(is)) E[e^{isY_n}] = ψ(is)  (3.2)

uniformly on compact sets. The convergence (3.2) is a key ingredient to the notion of mod-phi convergence, here mod-Gaussian convergence of the sequence (Y_n)_{n∈N} with speed ∆_n^{2/3} and limiting function ψ(z). Different full definitions of mod-phi convergence, given functions η, ψ, and a speed sequence, impose slightly different additional conditions [72, 23, 43]. For example, one may or may not impose that the convergence (3.2) extends from the purely imaginary axis to a complex strip (which would require γ = 0 in the Statulevičius condition), that η(z) is the Lévy exponent of some infinitely divisible law, or that ψ(z) is non-zero on a complex strip.

Mod-Gaussian convergence was introduced by Jacod, Kowalski, and Nikeghbali [72]. The original motivation was in random matrix theory and analytic number theory. Concretely, a result by Keating and Snaith [76] as summarized in [72] says that the determinant Z_N of a random matrix in U(N), distributed according to the uniform measure (Haar measure) on U(N), satisfies for all λ ∈ C with Re λ > −
1,

lim_{N→∞} N^{−λ²} E[|Z_N|^{2λ}] = G(1 + λ)²/G(1 + 2λ),  (3.3)

with G some special function. Eq. (3.3) is clearly in the spirit of (3.2)—set λ = is, Y_N = log|Z_N|, and make adequate choices of the speed and limiting function. Subsequent developments include mod-Poisson convergence for random combinatorial structures [8], extensions to random vectors, proofs of mod-Gaussian convergence via dependency graphs, and a systematic study of asymptotic statements on probabilities when mod-phi convergence holds true, see the monograph [43]. The asymptotic bounds and their proofs share some similarities with the bounds on which we focus in this survey, see the discussion in [43, Section 5.3].

3.3. Analytic combinatorics. Singularity analysis.
Discrete probability and analytic combinatorics [47] share a common complex-analytic toolbox. It is often of advantage to work with generating functions G(z) = Σ_{n=0}^∞ g_n z^n. In discrete probability, the coefficients g_n represent a probability measure on N. In analytic combinatorics, the coefficients are instead related to counting problems, e.g. counting the number of trees on n vertices. When the generating function is well-understood, probabilities or cardinalities can be recovered by complex contour integrals, using Cauchy's formula. These formulas are similar to inversion formulas that express a probability density function or cumulative distribution function in terms of the characteristic function (Fourier transform).

Understanding probabilities then boils down to understanding parameter-dependent contour integrals, for which a plethora of methods are available, for example saddle-point and steepest descent methods, and singularity analysis [47]. Singularity refers to the singularities of the function G(z) in the complex plane, among which the dominant singularity z = R, the radius of convergence of G(z). Transfer theorems go from asymptotic expansions of G(z) near its dominant singularity to asymptotic behavior of the coefficients as n → ∞, a process related to Tauberian theorems for inverse Laplace transforms [47, Chapter VI].

Cumulants fit in very naturally: for an integer-valued, heavy-tailed random variable X, the dominant singularity of the probability generating function G(z) = E[z^X] is at z = 1, and

G(e^t) = exp( Σ_{j≥1} (κ_j/j!) t^j )  (z = e^t → 1, t → 0).
However, even though singularity analysis deals with Taylor expansions with zero radius of convergence, Weibull-like variables do not belong to the class amenable to singularity analysis [47, Chapter VI.6] and therefore the methods described in the book by Flajolet and Sedgewick [47] are not directly applicable (see nevertheless [38] and the references therein).

The methods extend to multivariate generating functions [100] and to sequences of generating functions. The latter enter the stage naturally when working with sequences of random variables, and allow for a derivation of limit laws in random combinatorial structures. The simplest setting is when generating functions can be approximated by powers of simpler generating functions (think of independent random variables!), leading to the framework of quasi-powers, see Hwang [68] and [47, Chapter IX.5]. The relation between Hwang's quasi-powers and mod-phi convergence is commented upon in [43, Remark 1.2].

3.4.
Dependency graphs.
Let Y_α, α ∈ I, be real-valued random variables indexed by some set I of cardinality N ∈ N. The mixed cumulants of the Y_α's are given by an inverse Möbius transform of mixed moments as

κ(Y_{α₁}, . . . , Y_{α_r}) = Σ_{{B₁,...,B_m}} (−1)^{m−1} (m−1)! Π_{ℓ=1}^m E[ Π_{i∈B_ℓ} Y_{α_i} ]  (3.4)

with summation over set partitions {B₁, . . . , B_m} of {1, . . . , r} with a variable number of blocks m ∈ {1, . . . , r}. A classical reference for (3.4) is Leonov and Shiryaev [83]; an early mention of Möbius inversion is found in Schützenberger [114]; a detailed historical discussion is given by Speicher. Mixed cumulants of independent variables vanish. The cumulants of X := Σ_{α∈I} Y_α are

κ_r(X) = Σ_{(α₁,...,α_r)∈I^r} κ(Y_{α₁}, . . . , Y_{α_r}).

Now assume that the Y_α's have a dependency structure encoded by a dependency graph G. The latter is a graph G = (I, E(G)) with vertex set I with the following property: if I₁ and I₂ are disjoint subsets of I not linked by an edge {α, β} in G, then (Y_α)_{α∈I₁} and (Y_β)_{β∈I₂} are independent. Féray, Méliot, and Nikeghbali prove a beautiful tree bound for cumulants.

Lemma 3.2. [43, Section 9.3]
Suppose that G = (I, E(G)) is a dependency graph for (Y_α)_{α∈I}. Assume in addition that |Y_α| ≤ A almost surely. For r ∈ N, let T_r be the set of tree graphs with vertex set {1, . . . , r}. Then for all α₁, . . . , α_r ∈ I,

|κ(Y_{α₁}, . . . , Y_{α_r})| ≤ 2^{r−1} A^r Σ_{T∈T_r} Π_{{i,j}∈E(T)} ( 1l{α_i = α_j} + 1l{α_i ≠ α_j, {α_i, α_j} ∈ E(G)} ).  (3.5)

The sum over trees is equal to the number of spanning trees of the graph H with vertex set {1, . . . , r} for which {i, j} is an edge if and only if either α_i = α_j, or α_i ≠ α_j and {α_i, α_j} is an edge of the dependency graph.

When the dependency graph has maximum degree D, it is easily checked that for each fixed tree T ∈ T_r and all α₁ ∈ I,

Σ_{α₂,...,α_r∈I} Π_{{i,j}∈E(T)} ( 1l{α_i = α_j} + 1l{α_i ≠ α_j, {α_i, α_j} ∈ E(G)} ) ≤ (D + 1)^{r−1}.

Summing over α₁ gives an additional factor N = |I|. Combining with Cayley's formula |T_r| = r^{r−2}, one finds

|κ_r(X)| ≤ r^{r−2} N A ( 2A(D + 1) )^{r−1},  (3.6)

see [43, Theorem 9.8]. Stirling's formula implies that the sum X = Σ_α Y_α, suitably normalized, satisfies the Statulevičius condition (S_γ) with γ = 0.

Applications of the bound (3.6) are found in [43] and [30]. We make two additional remarks. The first remark concerns the appearance of trees. Tree bounds as in Lemma 3.2—with functions u(α, β) that dominate the indicator that {α, β} is an edge in some dependency graph—come up naturally in random fields and statistical mechanics, see for instance Duneau, Iagolnitzer, and Souillard [33]. The existence of tree bounds is sometimes called strong mixing. Tree bounds feature prominently in the framework of complete analyticity for Gibbs measures, see condition IIb in Dobrushin and Shlosman [27]. The bound (3.6) extends to such soft tree bounds because

Σ_{α₁,...,α_r∈I} Σ_{T∈T_r} Π_{{i,j}∈E(T)} u(α_i, α_j) ≤ N r^{r−2} ( max_{α∈I} Σ_{β∈I} u(α, β) )^{r−1}.
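The Möbius formula (3.4) and the vanishing of mixed cumulants for independent variables can be verified mechanically on small finite distributions. A self-contained sketch (the toy variables below are hypothetical, chosen only for illustration):

```python
import math

def partitions(elements):
    # enumerate all set partitions of a list, as lists of blocks
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for part in partitions(rest):
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        yield [[first]] + part

def mixed_cumulant(variables, outcomes):
    # formula (3.4): sum over set partitions {B_1,...,B_m} of {1,...,r} of
    # (-1)^(m-1) (m-1)! * prod_l E[ prod_{i in B_l} Y_i ]
    def moment(block):
        total = 0.0
        for prob, ys in outcomes:
            v = 1.0
            for i in block:
                v *= variables[i](ys)
            total += prob * v
        return total
    result = 0.0
    for part in partitions(list(range(len(variables)))):
        m = len(part)
        term = (-1.0) ** (m - 1) * math.factorial(m - 1)
        for block in part:
            term *= moment(block)
        result += term
    return result

# toy joint distribution: two independent fair signs Y0, Y1 and the product Y2 = Y0*Y1
outcomes = [(0.25, (a, b)) for a in (-1, 1) for b in (-1, 1)]
Y0 = lambda ys: ys[0]
Y1 = lambda ys: ys[1]
Y2 = lambda ys: ys[0] * ys[1]

assert abs(mixed_cumulant([Y0, Y1], outcomes)) < 1e-12          # independence => 0
assert abs(mixed_cumulant([Y0, Y0], outcomes) - 1.0) < 1e-12    # kappa(Y0, Y0) = Var(Y0)
assert abs(mixed_cumulant([Y0, Y1, Y2], outcomes) - 1.0) < 1e-12
```

The last assertion shows that pairwise independence alone does not make a mixed cumulant vanish, which is why dependency graphs require independence between whole blocks of variables.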
Similar considerations appear with weighted dependency graphs introduced by Féray [41] and in particular uniform weighted dependency graphs [44, Definition 43]. Weighted dependency graphs have been applied, for example, to the Ising model [32], and to many other examples (not covered by statistical mechanics) [41].

Our second remark is of a speculative nature: Lemma 3.2 and its proof in [43] are intriguing because of many similarities with the theory of cluster expansions. In fact the proof of Lemma 3.2 brings up, for a connected graph H, the quantity Σ_{G⊂H} (−1)^{|E(G)|} (summation over connected spanning subgraphs) [43, Lemma 9.12]. This object is center stage in the theory of cluster expansions and bounding it by trees is fairly standard, see Scott and Sokal [115] and the references therein. Connections between cluster expansions, dependency graphs, and combinatorics have been studied intensely in the context of the Lovász local lemma [115]. The combinatorial proofs by Féray, Méliot, and Nikeghbali open up the intriguing perspective of yet another fruitful connection between cluster expansions and (weighted) dependency graphs.

Blog entry from 2 July 2020: https://rolandspeicher.com/tag/moment-cumulant-formula/. Last consulted on 1 February 2021.

4. Toolbox
Here we collect a few general lemmas that are of independent interest.

4.1. Characteristic functions, Kolmogorov distance, and smoothing inequality.
One key ingredient in the proof is a bound on the Kolmogorov distance of two measures in terms of an integral involving the characteristic functions. Estimates of this type are fairly classical and enter proofs of the Berry-Esseen inequality following Berry's strategy [12], see for example [39, Chapter XVI], [102, Chapter 5.1] and the survey on smoothing inequalities by Bobkov [14]. For asymptotic expansions, e.g. Edgeworth expansions that capture correction terms in normal approximations, it is customary to deal not only with probability measures but also with signed measures whose density is a Gaussian multiplied by a polynomial [39, Chapter XVI.4]. If $\mu$ is a finite signed measure on $\mathbb R$, write $\mu = \mu_+ - \mu_-$ for the Jordan decomposition of $\mu$. The cumulative distribution function is $F_\mu(x) = \mu((-\infty, x])$ and the characteristic function is $\chi_\mu(t) = \int_{\mathbb R} \exp(\mathrm i t x)\, \mu(\mathrm dx)$.

Lemma 4.1.
Let $\mu$ and $\nu$ be two finite signed measures on $\mathbb R$ with total mass $1$. Let $Y$ be an auxiliary continuous random variable whose probability density function $\rho_Y$ is even, i.e., $\rho_Y(y) = \rho_Y(-y)$ for all $y \in \mathbb R$. Assume that $\mu$ has a Radon-Nikodym derivative with respect to Lebesgue measure bounded in absolute value by $q > 0$, and that the negative part of $\nu$, if non-zero, satisfies $\nu_-(\mathbb R) \le \eta$. Then
$$\sup_{x \in \mathbb R} |F_\mu(x) - F_\nu(x)| \le \frac{1}{1 - 2P(|Y| \ge y_0)} \Bigl(\bigl[2\varepsilon q y_0 + \eta\bigr] P(|Y| \le y_0) + \frac{1}{2\pi} \int_{-\infty}^{\infty} \bigl|\chi_{\varepsilon Y}(t)\bigr|\, \bigl|\chi_\mu(t) - \chi_\nu(t)\bigr|\, \frac{\mathrm dt}{|t|}\Bigr),$$
for all $\varepsilon \ge 0$ and every $y_0 > 0$ with $P(|Y| \ge y_0) < 1/2$.

Remark. If $\nu$ is a probability measure, i.e., $\nu_- = 0$ and $\eta = 0$, then Lemma 4.1 is due to Zolotarev [125] as cited in [112, Lemma 2.5]. Our extension to signed measures $\nu$ is needed to fix an erroneous application of Zolotarev's lemma to the normal law µ ∼ N(0,
1) and a signed measure $\nu$ that is not necessarily absolutely continuous. Another extension to signed measures is found in [126, Theorem 2], under the condition that the atoms of the signed measures form a discrete subset of $\mathbb R$. Lemma 4.1 is applied to a normal law $\mu = N(0,1)$ (with $q = 1/\sqrt{2\pi}$) and the random variable $Y$ with probability density function and characteristic function given by
$$\rho_Y(y) = \frac{1 - \cos y}{\pi y^2}, \qquad \chi_Y(t) = (1 - |t|)\, \mathbb 1_{[-1,1]}(t). \tag{4.1}$$
The choice $\rho_Y$ of smoothing density was already made by Berry [12]. Write $\varepsilon = 1/T$; then the integral error term becomes
$$\frac{1}{2\pi} \int_{-\infty}^{\infty} |\chi_{\varepsilon Y}(t)|\, |\chi_\mu(t) - \chi_\nu(t)|\, \frac{\mathrm dt}{|t|} = \frac{1}{2\pi} \int_{-T}^{T} \Bigl(1 - \frac{|t|}{T}\Bigr) |\chi_X(t) - \chi_Z(t)|\, \frac{\mathrm dt}{|t|}. \tag{4.2}$$
In the proof of Theorem 2.2 we choose $T = 1/\varepsilon$ of the order of $\sqrt s$, see Section 7.6. We follow [112] and choose y₀ = 3.
55. A numerical evaluation yields P ( | Y | ≤ y ) ≈ . C := y P ( | Y | ≤ y )1 − P ( | Y | ≥ y ) ≃ . , C := 11 − P ( | Y | ≥ y ) ≃ . . (4.3)The numerical values are better than the values appearing in the smoothing inequality in [39,Chapter XVI.3, Lemma 1]. Assume that ν is a probability measure and write J ε for the integralterm. Then sup x ∈ R | F µ ( x ) − F ν ( x ) | ≤ π qε + 2 J ε . (4.4)We note 24 /π ≃ . > . > . y = 3 .
55 yields better constants than Lemma 1 in [39, Chapter XVI.3].
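The smoothing pair (4.1) can be sanity-checked numerically. The sketch below (standard library only, crude midpoint quadrature; the truncation range and tolerances are ad hoc) verifies that $\rho_Y$ has total mass 1 and that its Fourier transform agrees with the triangular characteristic function $\chi_Y$:

```python
import math

def rho_Y(y):
    """Berry's smoothing density (4.1); the value at 0 is 1/(2*pi) by continuity."""
    if abs(y) < 1e-8:
        return 1.0 / (2.0 * math.pi)
    return (1.0 - math.cos(y)) / (math.pi * y * y)

def midpoint(f, a, b, n=400_000):
    """Crude composite midpoint rule on [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# Total mass: the tails beyond [-L, L] carry O(1/L) mass, so the value is close to 1.
mass = midpoint(rho_Y, -400.0, 400.0)
assert abs(mass - 1.0) < 5e-3

# Characteristic function: chi_Y(t) = 1 - |t| on [-1, 1], checked here at t = 0.5.
chi_half = midpoint(lambda y: rho_Y(y) * math.cos(0.5 * y), -400.0, 400.0)
assert abs(chi_half - 0.5) < 1e-2
```

The slow $1/y^2$ tail of $\rho_Y$ is why the truncation window must be taken large; this heavy tail is also why the smoothing argument in the proof has to treat the event $\{|Y| \ge y_0\}$ separately.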
The proof of Lemma 4.1 is based on classical inversion formulas. Recall that if the characteristicfunction χ of a random variable X is integrable, then the variable has a probability density functiongiven by ρ ( x ) = 12 π Z ∞−∞ e − i tx χ ( t )d t and the cumulative distribution function F is given by F ( x ) = lim a →−∞ Z xa ρ ( y )d y = lim a →−∞ π Z ∞−∞ (cid:16)Z xa e − i ty d y (cid:17) χ ( t )d t = lim a →−∞ π Z ∞−∞ e − i tx − e i ta − i t χ ( t )d t. For general random variables, the previous formula for F ( x ) holds true in every point x of conti-nuity of F .The proof of Lemma 4.1 is adapted from the proof of Theorem 2 in [126], see also Lemma 1in [39, Chapter XVI.3]. To help the reader grasp the probabilistic content, we first prove thelemma when µ and ν are probability measures on R . Proof of Lemma 4.1 when µ and ν are probability measures. Let X and Z be two random vari-ables with respective distributions µ and ν . We may assume without loss of generality that X, Y, Z are defined on a common probability space (Ω , F , P ) and that Y is independent from X and Z . We have, in every point of continuity x of F X + εY and F Z + εY , F X + εY ( x ) − F Z + εY ( x ) = lim a →−∞ π Z ∞−∞ e − i tx − e i ta − i t χ Y ( εt ) (cid:0) χ X ( t ) − χ Z ( t ) (cid:1) d t. If R R | t χ Y ( εt )( χ X ( t ) − χ Z ( t )) | d t = ∞ , the lemma is trivial, so we only need to treat the casewhere t t χ Y ( εt )( χ X ( t ) − χ Z ( t )) is integrable. The Riemann-Lebesgue lemma then yields thesimplified expression F X + εY ( x ) − F Z + εY ( x ) = 12 π Z ∞−∞ e − i tx χ Y ( εt ) (cid:0) χ X ( t ) − χ Z ( t ) (cid:1) d t − i t , from which we deduce the boundsup x ∈ R (cid:12)(cid:12) F X + εY ( x ) − F Z + εY ( x ) (cid:12)(cid:12) ≤ π Z ∞−∞ (cid:12)(cid:12) χ Y ( εt ) (cid:12)(cid:12) (cid:12)(cid:12) χ X ( t ) − χ Z ( t ) (cid:12)(cid:12) d t | t | =: J ε . (4.5)This proves the lemma in the case ε = 0. For ε >
0, we need to bound the Kolmogorov distance of X and Z by that of X + εY and Z + εY . The relevant inequalities are called smoothing inequalities [39, 14]. We condition on values of Y and distinguish cases according to | Y | ≥ y or | Y | ≤ y .We start with | Y | ≤ y . We wish to exploit sup R | F ′ X ( x ) | ≤ q and the monotonicity of F Z . Noticethat, for every y ′ ∈ [ − y , y ], F Z (cid:0) x + ε ( y − y ′ ) (cid:1) − F X (cid:0) x + ε ( y − y ′ ) (cid:1) ≥ F Z ( x ) − F X ( x ) − qε ( y − y ′ ) ,F Z (cid:0) x − ε ( y + y ′ ) (cid:1) − F X (cid:0) x − ε ( y + y ′ ) (cid:1) ≤ F Z ( x ) − F X ( x ) + qε ( y + y ′ ) . (4.6)Because of the independence of Y from X and from Z , we can reinterpret the inequalities asalmost sure inequalities conditioned on Y = y ′ . The second inequality yields1l {| Y |≤ y } E h(cid:0) { Z + εY ≤ x − εy } − { X + εY ≤ x − εy } (cid:1) (cid:12)(cid:12)(cid:12) Y i ≤ {| Y |≤ y } (cid:16) F Z ( x ) − F X ( x ) + qεy + qεY (cid:17) a.s.We take expectations on both sides, use E [ Y {| Y |≤ y } ] = 0 from the parity of Y , and deduce P ( Z + εY ≤ x − εy , | Y | ≤ y ) − P ( X + εY ≤ x − εy , | Y | ≤ y ) ≤ (cid:16) F Z ( x ) − F X ( x ) + qεy (cid:17) P ( | Y | ≤ y ) . This implies F Z ( x ) − F X ( x ) ≥ P ( Z + εY ≤ x − εy , | Y | ≤ y ) − P ( X + εY ≤ x − εy , | Y | ≤ y ) P ( | Y | ≤ y ) − qεy ≥ − sup x ∈ R (cid:12)(cid:12) F X + εY ( x ) − F Z + εY ( x ) (cid:12)(cid:12) P ( | Y | ≤ y ) − qεy . (4.7)By the same arguments the first inequality in (4.6) yields P ( Z + εY ≤ x + εy , | Y | ≤ y ) − P ( X + εY ≤ x + εy , | Y | ≤ y ) ≥ (cid:16) F Z ( x ) − F X ( x ) − qεy (cid:17) P ( | Y | ≤ y )and F Z ( x ) − F X ( x ) ≤ P ( Z + εY ≤ x + εy , | Y | ≤ y ) − P ( X + εY ≤ x + εy , | Y | ≤ y ) P ( | Y | ≤ y ) + qεy . (4.8)With (4.7) and (4.8) we get (cid:12)(cid:12) F Z ( x ) − F X ( x ) | ≤ sup x ∈ R (cid:12)(cid:12) F X + εY ( x ) − F Z + εY ( x ) (cid:12)(cid:12) P ( | Y | ≤ y ) + qεy . 
(4.9)If | Y | ≤ y almost surely, the lemma follows by first using (4.5) and then taking the sup over x .If | Y | is larger than y with positive probability, we need an additional estimate. From thetower property of conditional expectations and the independence of Y from X and from Z we get (cid:12)(cid:12) P ( Z + εY ≤ x , | Y | ≥ y ) − P ( X + εY ≤ x , | Y | ≥ y ) (cid:12)(cid:12) = (cid:12)(cid:12)(cid:12) E h {| Y |≥ y } E (cid:2) { Z ≤ x − εY } − { X ≤ x − εY } | Y (cid:3)i(cid:12)(cid:12)(cid:12) = (cid:12)(cid:12)(cid:12) E h {| Y |≥ y } ( F Z ( x − εY ) − F X ( x − εY ) i(cid:12)(cid:12)(cid:12) ≤ D P ( | Y | ≥ y ) , where we used D := sup x ∈ R | F Z ( x ) − F X ( x ) | . The triangle inequality thus shows (cid:12)(cid:12) P ( Z + εY ≤ x , | Y | ≤ y ) − P ( X + εY ≤ x , | Y | ≤ y ) (cid:12)(cid:12) ≤ (cid:12)(cid:12) P ( Z + εY ≤ x ) − P ( X + εY ≤ x ) (cid:12)(cid:12) + D P ( | Y | ≥ y ) . In combination with (4.9) and (4.5), this yields (cid:12)(cid:12) F Z ( x ) − F X ( x ) | P ( | Y | ≤ y ) ≤ J ε + D P ( | Y | ≥ y ) + qεy P ( | Y | ≤ y ) . We take the sup over x ∈ R , use P ( | Y | ≤ y ) − P ( | Y | ≥ y ) = 1 − P ( | Y | ≥ y ) > y , and obtain D ≤ qεy P ( | Y | ≤ y ) + J ε − P ( | Y | > y ) . This concludes the proof of the lemma (remember (4.5)). (cid:3)
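The inversion formula recalled before the proof can be illustrated on the standard normal law, whose characteristic function $\mathrm e^{-t^2/2}$ is integrable; numerically inverting it recovers the Gaussian density. A sketch (illustrative quadrature parameters, not part of the proof):

```python
import math

def invert_density(chi, x, T=30.0, n=100_000):
    """Evaluate (1/(2*pi)) * integral over [-T, T] of e^{-itx} chi(t) dt.
    For a real, even chi this reduces to a cosine integral."""
    h = 2 * T / n
    total = 0.0
    for k in range(n):
        t = -T + (k + 0.5) * h
        total += math.cos(t * x) * chi(t)
    return total * h / (2.0 * math.pi)

chi_gauss = lambda t: math.exp(-t * t / 2.0)

# Inversion recovers the standard normal density phi(x) = e^{-x^2/2}/sqrt(2*pi).
for x in (0.0, 1.0, 2.5):
    phi_x = math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
    assert abs(invert_density(chi_gauss, x) - phi_x) < 1e-4
```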
Proof of Lemma 4.1 for signed measures µ and ν . Let D := sup x ∈ R (cid:12)(cid:12) F µ ( x ) − F ν ( x ) (cid:12)(cid:12) (4.10)and D ε := sup x ∈ R (cid:12)(cid:12)(cid:12) ν ∗ P εY (cid:0) ( −∞ , x ] (cid:1) − µ ∗ P εY (cid:0) ( −∞ , x ] (cid:1)(cid:12)(cid:12)(cid:12) , (4.11)where P εY is the law of εY and µ ∗ P εY the convolution of µ and P εY . Arguments similar to theproof of Eq. (4.5) yield D ε ≤ π Z ∞−∞ (cid:12)(cid:12) χ Y ( εt ) (cid:12)(cid:12) (cid:12)(cid:12) χ µ ( t ) − χ ν ( t ) (cid:12)(cid:12) d t | t | . (4.12) HE METHOD OF CUMULANTS FOR THE NORMAL APPROXIMATION 23
It remains to bound D in terms of D ε . For signed measures the cumulative distribution functionis no longer monotone increasing however F ν ( x + u ) − F ν ( x ) = ν (( x , x + u ]) ≥ − ν − ( R ) = − η for all x ∈ R and u ≥
0. Therefore (4.6) becomes F ν (cid:0) x + ε ( y − y ′ ) (cid:1) − F µ (cid:0) x + ε ( y − y ′ ) (cid:1) ≥ F ν ( x ) − F µ ( x ) − qε ( y − y ′ ) − η,F ν (cid:0) x − ε ( y + y ′ ) (cid:1) − F µ (cid:0) x − ε ( y + y ′ ) (cid:1) ≤ F ν ( x ) − F µ ( x ) + qε ( y + y ′ ) + η (4.13)for all y ′ ∈ [ − y , y ]. We integrate over y ′ with respect to the law P Y of Y , use R y − y y ′ P Y (d y ′ ) = 0because of the parity of Y , and obtain Z y − y (cid:16) F ν (cid:0) x + ε ( y − y ′ ) (cid:1) − F µ (cid:0) x + ε ( y − y ′ ) (cid:1)(cid:17) P Y (d y ′ ) ≥ (cid:16) F ν ( x ) − F µ ( x ) − [ qεy + η ] (cid:17) P ( | Y | ≤ y ) , Z y − y (cid:16) F ν (cid:0) x − ε ( y + y ′ ) (cid:1) − F µ (cid:0) x − ε ( y + y ′ ) (cid:1)(cid:17) P Y (d y ′ ) ≤ (cid:16) F ν ( x ) − F µ ( x ) + qεy + η (cid:17) P ( | Y | ≤ y ) , (4.14)which replaces (4.7) and (4.8). The left-hand sides can be rewritten with the help of convolutions.Let P εY be the distribution of εY , then Z ∞−∞ F µ (cid:0) x + ε ( y − y ′ ) (cid:1) P Y (d y ′ ) = Z R Z R ( −∞ ,x + εy ] ( x + εy ′ ) µ (d x ) ! P Y (d y ′ )= (cid:0) µ ∗ P εY (cid:1)(cid:0) ( −∞ , x + εy ] (cid:1) . A similar identity holds true with F ν and ν instead of F µ and µ . For the integral over R \ [ − y , y ],we note (cid:12)(cid:12)(cid:12)Z R \ [ − y ,y ] (cid:16) F µ (cid:0) x + ε ( y − y ′ ) (cid:1) − F ν (cid:0) x + ε ( y − y ′ ) (cid:1)(cid:17) P Y (d y ′ ) (cid:12)(cid:12)(cid:12) ≤ D P ( | Y | > y ) . We write the integral over [ − y , y ] as the difference of the integral over R and R \ [ − y , y ], applythe triangle inequality, and deduce Z y − y (cid:16) F ν (cid:0) x + ε ( y − y ′ ) (cid:1) − F µ (cid:0) x + ε ( y − y ′ ) (cid:1)(cid:17) P Y (d y ′ ) ≤ D ε + D P ( | Y | > y )with D ε given by (4.11). Similarly, Z y − y (cid:16) F ν (cid:0) x − ε ( y + y ′ ) (cid:1) − F µ (cid:0) x + ε ( y + y ′ ) (cid:1)(cid:17) P Y (d y ′ ) ≥ − D ε − D P ( | Y | > y ) . 
Combining this with (4.14), we find D ε + D P ( | Y | > y ) ≥ (cid:16) F ν ( x ) − F µ ( x ) − (cid:2) qεy + η ] (cid:17) P ( | Y | ≤ y ) , − D ε − D P ( | Y | > y ) ≤ (cid:16) F ν ( x ) − F µ ( x ) + qεy + η (cid:17) P ( | Y | ≤ y ) (4.15)hence P ( | Y | ≤ y ) | F ν ( x ) − F µ ( x ) | ≤ D P ( | Y | > y ) + D ε + ( qεy + η ) P ( | Y | ≤ y ) . We take the supremum over x ∈ R , obtain an inequality with D on both sides from which wededuce D ≤ P ( | Y | ≤ y ) − P ( | Y | > y ) (cid:16) D ε + ( qεy + η ) P ( | Y | ≤ y ) (cid:17) . To conclude, we note that the denominator on the right-hand side is equal to 1 − P ( | Y | > y )and bound D ε by (4.12). (cid:3) Tails and exponential moments of the standard Gaussian.
The Mills ratio of a continuous random variable is the ratio of its survival function and its probability density function. For standard normal laws, the asymptotic behavior of the Mills ratio [50] is
$$\frac{P(Z \ge x)}{\frac{1}{\sqrt{2\pi}} \exp(-x^2/2)} = (1 + o(1))\, \frac{1}{x} \qquad (x \to \infty). \tag{4.16}$$
Eq. (4.16) is complemented by the following lemma.

Lemma 4.2.
Let $Z$ be a standard normal variable. Then we have, for all $\beta \ge 0$,
$$E\bigl[\mathrm e^{-\beta Z} \mathbb 1\{Z \ge 0\}\bigr] = \mathrm e^{\beta^2/2}\, P(Z \ge \beta) \ge \frac{1}{\sqrt{2\pi}\,(\beta+1)}.$$
Moreover for all $\beta > 0$ and $\eta \in (-\beta, \beta)$,
$$E\bigl[\mathrm e^{-(\beta+\eta) Z} \mathbb 1\{Z \ge 0\}\bigr] = \frac{\beta}{\beta + \theta\eta}\, E\bigl[\mathrm e^{-\beta Z} \mathbb 1\{Z \ge 0\}\bigr]$$
for some $\theta \in [-1, 1]$.

Proof. We compute
$$E\bigl[\mathrm e^{-\beta Z} \mathbb 1\{Z \ge 0\}\bigr] = \frac{1}{\sqrt{2\pi}} \int_0^\infty \mathrm e^{-\beta u - u^2/2}\, \mathrm du = \frac{1}{\sqrt{2\pi}}\, \mathrm e^{\beta^2/2} \int_0^\infty \mathrm e^{-(u+\beta)^2/2}\, \mathrm du = \frac{1}{\sqrt{2\pi}}\, \mathrm e^{\beta^2/2} \int_\beta^\infty \mathrm e^{-y^2/2}\, \mathrm dy = \mathrm e^{\beta^2/2}\, P(Z \ge \beta).$$
Next, we note
$$P(Z \ge \beta) = \frac{1}{\sqrt{2\pi}} \int_\beta^\infty \mathrm e^{-y^2/2}\, \mathrm dy \ge \frac{1}{\sqrt{2\pi}} \int_\beta^\infty \mathrm e^{-y^2/2} \Bigl(1 - \frac{y}{(1+y)^2}\Bigr) \mathrm dy = \frac{1}{\sqrt{2\pi}} \Bigl[-\frac{\exp(-y^2/2)}{1+y}\Bigr]_{y=\beta}^{y=\infty} = \frac{1}{\sqrt{2\pi}}\, \frac{\exp(-\beta^2/2)}{\beta+1}.$$
We set $\psi(\beta) := \exp(\beta^2/2)\, P(Z \ge \beta) = E[\exp(-\beta Z) \mathbb 1\{Z \ge 0\}]$ and $q(\beta) := \beta \psi(\beta)$. Clearly $\psi$ is monotone decreasing. We check that $q(\beta)$ is monotone increasing. Indeed,
$$q'(\beta) = (1 + \beta^2)\, \mathrm e^{\beta^2/2}\, P(Z \ge \beta) - \frac{\beta}{\sqrt{2\pi}}.$$
We compute
$$\frac{1}{\sqrt{2\pi}}\, \frac{\beta}{1+\beta^2}\, \mathrm e^{-\beta^2/2} = \frac{1}{\sqrt{2\pi}} \int_\beta^\infty \Bigl(-\frac{\mathrm d}{\mathrm dy}\, \frac{y}{1+y^2}\, \mathrm e^{-y^2/2}\Bigr) \mathrm dy = \frac{1}{\sqrt{2\pi}} \int_\beta^\infty \mathrm e^{-y^2/2}\, \frac{y^4 + 2y^2 - 1}{(y^2+1)^2}\, \mathrm dy \le \frac{1}{\sqrt{2\pi}} \int_\beta^\infty \mathrm e^{-y^2/2}\, \mathrm dy = P(Z \ge \beta)$$
and deduce $q'(\beta) \ge 0$ for all $\beta \ge 0$, so $q(\beta)$ is indeed monotone increasing. By the monotonicity of $\psi$ and $q$, if $\eta \in (-\beta, 0]$, then
$$\psi(\beta) \le \psi(\beta+\eta) \le \frac{\beta}{\beta+\eta}\, \psi(\beta).$$
Similarly, if $\eta \in (0, \beta)$, then
$$\frac{\beta}{\beta+\eta}\, \psi(\beta) \le \psi(\beta+\eta) \le \psi(\beta).$$
In both cases $\psi(\beta+\eta)/\psi(\beta) = \beta/(\beta + \theta\eta)$ for some $\theta \in [0,1]$. $\square$
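Both the Mills-ratio asymptotics (4.16) and the identity and lower bound of Lemma 4.2 are easy to check numerically through the complementary error function; a small sketch (standard library only, quadrature parameters ad hoc):

```python
import math

def Phi_bar(x):
    """Standard normal survival function P(Z >= x), via the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

# (4.16): P(Z >= x) = (1 + o(1)) * e^{-x^2/2} / (x * sqrt(2*pi)) as x -> infinity.
for x in (5.0, 10.0):
    leading = math.exp(-x * x / 2.0) / (x * math.sqrt(2.0 * math.pi))
    assert abs(Phi_bar(x) / leading - 1.0) < 1.0 / x**2

def truncated_laplace(beta, n=200_000, L=40.0):
    """Midpoint quadrature for E[e^{-beta Z} 1{Z >= 0}], i.e.
    (1/sqrt(2*pi)) * integral_0^L e^{-beta*u - u^2/2} du."""
    h = L / n
    s = 0.0
    for k in range(n):
        u = (k + 0.5) * h
        s += math.exp(-beta * u - u * u / 2.0)
    return s * h / math.sqrt(2.0 * math.pi)

# Lemma 4.2: E[e^{-beta Z} 1{Z>=0}] = e^{beta^2/2} P(Z >= beta) >= 1/(sqrt(2*pi)(beta+1)).
for beta in (0.0, 0.7, 2.0):
    lhs = truncated_laplace(beta)
    rhs = math.exp(beta * beta / 2.0) * Phi_bar(beta)
    assert abs(lhs - rhs) < 1e-6
    assert rhs >= 1.0 / (math.sqrt(2.0 * math.pi) * (beta + 1.0))
```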
4.3. Integrals of monotone functions and Kolmogorov distance.

Lemma 4.3. Let $F$ and $G$ be two cumulative distribution functions of some probability measures and $f : [0, \infty) \to \mathbb R_+$ a monotone decreasing, continuous function with $\lim_{y \to \infty} f(y) = 0$. Then
$$\Bigl|\int_{-\infty}^{\infty} f\, \mathbb 1_{[0,\infty)}\, \mathrm dF - \int_{-\infty}^{\infty} f\, \mathbb 1_{[0,\infty)}\, \mathrm dG\Bigr| \le 2 f(0) \sup_{y \in \mathbb R} |F(y) - G(y)|.$$
The lemma extends to cumulative distribution functions of finite signed measures with total mass $1$. If $F$ and $G$ are continuous at 0 (i.e., the associated measures have no atom at 0), we may write $\int_0^\infty f\, \mathrm dF$ and $\int_0^\infty f\, \mathrm dG$ without creating ambiguities.

Proof.
The lemma is a straightforward consequence of an integration by parts for Riemann-Stieltjesintegrals (roughly, R ∞ f d F = − R ∞ F d f + boundary term). For n ∈ N , set f n := f (0)1l { } + ∞ X k =0 f ( k +1 n )1l ( k/n, ( k +1) /n ] , further define a nk +1 := f ( kn ) − f ( k +1 n ) ≥
0. Let µ be the measure on R with µ (( −∞ , x ]) = F ( x ).Summing by parts, we get Z ∞−∞ f n [0 , ∞ ) d F = f (0) µ ( { } ) + ∞ X k =0 (cid:0) f (0) − a n − · · · − a nk +1 (cid:1)(cid:16) F (cid:0) k +1 n ) − F (cid:0) kn (cid:1)(cid:17) = f (0) (cid:0) µ ( { } ) + 1 − F (0) (cid:1) − ∞ X ℓ =0 a nℓ +1 ∞ X k = ℓ (cid:16) F (cid:0) k +1 n ) − F (cid:0) kn (cid:1)(cid:17) = f (0) lim ε ց (cid:0) − F ( − ε ) (cid:1) − ∞ X ℓ =0 a nℓ +1 (cid:0) − F (cid:0) ℓn ) (cid:1) . A similar representation holds true for the integral against G . Since 0 ≤ a nℓ +1 ≤ f (0) for all ℓ , wededuce (cid:12)(cid:12)(cid:12)Z ∞−∞ f n [0 , ∞ ) d F − Z ∞−∞ f n [0 , ∞ ) d G (cid:12)(cid:12)(cid:12) ≤ f (0) sup y | F ( y ) − G ( y ) | . We pass to the limit n → ∞ , note f n ր f because of the continuity and monotonicity of f , andobtain the lemma. (cid:3) Positivity of truncated exponentials.
Let
$$\exp_m(u) := \sum_{k=0}^m \frac{u^k}{k!}$$
denote the truncated exponential function.

Lemma 4.4. We have $\exp_{2n}(u) > 0$ for all $u \in \mathbb R$ and $n \in \mathbb N$.

The analogous statement for the exponential series truncated after odd integers $2n+1$ is false, since $\exp_{2n+1}(u)$ is a polynomial that goes to $-\infty$ as $u \to -\infty$.

Proof.
We follow [112, pp. 37-38]. For $u \ge 0$ the claim is clear, so let $u < 0$. Set
$$a_k = \frac{u^{2k-1}}{(2k-1)!} + \frac{u^{2k}}{(2k)!} = \frac{u^{2k-1}}{(2k-1)!} \Bigl(1 + \frac{u}{2k}\Bigr) \qquad (k \ge 1).$$
If $u \le -2k$, then $a_k \ge 0$; so if $u \le -2n$, then $a_k \ge 0$ for $k = 1, \dots, n$ and $\exp_{2n}(u) = 1 + a_1 + \dots + a_n \ge 1 > 0$. If $-2n < u < 0$, then for all $k \ge n+1$, we have $u > -2k$ hence $a_k < 0$. It follows that
$$\exp_{2n}(u) = \exp(u) - \sum_{k=n+1}^\infty a_k > \exp(u) > 0. \qquad \square$$
An alternative proof is based on Taylor's theorem with integral remainder: for $u < 0$,
$$\exp(u) - \exp_{2n}(u) = \frac{1}{(2n)!} \int_0^u (u-t)^{2n}\, \mathrm e^t\, \mathrm dt = -\frac{1}{(2n)!} \int_u^0 s^{2n}\, \mathrm e^{u-s}\, \mathrm ds < 0,$$
hence $\exp_{2n}(u) > \exp(u) > 0$.

5. Concentration inequality. Proof of Theorem 2.5
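The proof below combines the truncated exponential of Lemma 4.4 with a Poisson comparison in which $P(N_{2n+2} \le 2n) \to 1/2$ by the central limit theorem. Both ingredients can be sanity-checked numerically (a sketch, standard library only):

```python
import math

def exp_trunc(m, u):
    """Truncated exponential exp_m(u) = sum_{k=0}^m u^k / k!."""
    term, s = 1.0, 1.0
    for k in range(1, m + 1):
        term *= u / k
        s += term
    return s

# Lemma 4.4: truncations after an even power are strictly positive on all of R ...
for n in (1, 2, 5, 10):
    assert all(exp_trunc(2 * n, float(u)) > 0.0 for u in range(-50, 51))
# ... while truncations after an odd power go to -infinity as u -> -infinity.
assert exp_trunc(3, -10.0) < 0.0

def poisson_cdf(lam, n):
    """P(N <= n) for N ~ Poisson(lam), with terms computed in log space."""
    return sum(math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1.0))
               for k in range(n + 1))

# The Poisson step of the proof: P(N_{2n+2} <= 2n) tends to 1/2 by the CLT,
# since the mean 2n + 2 sits just above the threshold 2n.
p = poisson_cdf(2 * 100 + 2, 2 * 100)
assert 0.35 < p < 0.55
```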
For the proof of Theorem 2.5 we follow the proof presented in [112] for the first part (upuntil (5.5) below), but then take a short-cut in order to avoid cumbersome numerical evaluations;instead we exploit a relation between truncated exponentials and Poisson distributions, which welearnt about from Rudzkis and Bakshaev [108].The first step of the proof consists in applying Markov’s inequality to the truncated exponentialexp n ( x ), which is monotone increasing on R + . This yields, for every h > n ∈ N , and x ≥ P ( X ≥ x ) ≤ E [exp n ( hX )]exp n ( hx ) = 1exp n ( hx ) n X k =0 h k k ! m k . (5.1)We check that n X k =0 h k k ! m k ≤ exp n (cid:16) n X j =2 h j j ! | κ j | (cid:17) . (5.2)It is enough to show that for each k = 2 , . . . , n , the moment m k is smaller than the coefficientof h k of the series obtained by expanding the right-hand side of the inequality. By definitionof the cumulants, the moment m k is equal to the coefficient of h k in the formal power seriesexp( P ∞ j =2 h j κ j /j !), which gives m k = ∞ X r =1 r ! X j ,...,j r ≥ j + ··· + j r = k κ j j ! · · · κ j r j r ! . For k ≤ n , the only non-zero contributions are from r ≤ n and 2 ≤ j ℓ ≤ n , therefore | m k | ≤ n X r =1 r ! X ≤ j ,...,j r ≤ n : j + ··· + j r = k | κ j | j ! · · · | κ j r | j r ! . The right-hand side is equal to the coefficient of h k in exp n (cid:0)P nj =2 | κ j | h j /j !). This completes theproof of (5.2).Next we show that for a specific x -dependent choice of h and n , we may bound n X j =2 | κ j | j ! h j ≤ hx . (5.3)Consider first the case H = 1. Choose h := x ( x ∆) / (1+ γ ) / ( x + ( x ∆) / (1+ γ ) ) so that1 hx = 1 x + 1( x ∆) / (1+ γ ) . (5.4)By the assumptions on the cumulants, | κ j | /j ! ≤ j ! γ / n X j =2 | κ j | j ! j ≤ h n X j =2 (cid:0) j !2 (cid:1) γ (cid:0) h ∆ (cid:1) j − . A straightforward induction over n shows that for j ≤ n , we have j ! / ≤ (2 n ) j − , thus if we pick n ≤ hx/ j ! / ≤ ( hx ) j − for j ≤ n and n X j =2 | κ j | j ! 
h j ≤ h n X j =2 (cid:16) ( hx ) γ h ∆ (cid:17) j − = h n X j =2 q j − , q := ( hx ) γ h ∆ . HE METHOD OF CUMULANTS FOR THE NORMAL APPROXIMATION 27
Noticing q = ( hx ) γ / (∆ x ) <
1, we deduce n X j =2 | κ j | j ! h j ≤ h − q . Set ˜ q = q / (1+ γ ) , then q < ˜ q < hx + ˜ q hence x = h/ (1 − ˜ q ) and n X j =2 | κ j | j ! h j ≤ h − ˜ q = hx . This completes the proof of (5.3).The bounds (5.1), (5.2) and (5.3) yield P ( X ≥ x ) ≤ exp n ( hx/ n ( hx ) (5.5)for 2 n ≤ hx . If we had exponentials instead of truncated exponentials, then the right-hand sideof the previous inequality would be exp( − hx/
2) and we would be done, in the case H = 1.Let us choose n := ⌊ ( hx ) / ⌋ , then 2 n ≤ hx < n + 2. Let N n +2 be a Poisson random variablewith parameter 2 n + 2. Thene − hx exp n ( hx ) ≤ e − n exp n (2 n + 2) = e P ( N n +2 ≤ n ) . The variable N n +2 is equal in distribution to the sum of 2 n + 2 i.i.d. Poisson variables withparameter 1. Applying the central limit theorem, one finds thatlim n →∞ P ( N n +2 ≤ n ) = 12 . As a consequence, there exist c, m > hx ≥ m (hence large n ),exp n ( hx ) ≥ c e hx ≥ c e hx/ exp n ( hx/ . Together with (5.5) this gives P ( X ≥ x ) ≤ c e − hx/ for all hx ≥ m . Replacing 1 /c by C := max(1 /c, exp( m/ hx . This completes the proof of the theorem when H = 1.For general H >
0, define ˜ X := X/ √ H . Then ˜ X satisfies the assumption of the theorem with H = 1 and ∆ replaced with ˜∆ = ∆ √ H and the proof is easily concluded. (cid:3) Bounds under Cram´er’s condition. Proof of Theorem 2.1
Here we prove Theorem 2.1 on random variables that have exponential moments ( S ), whichcorresponds to s = ∞ in condition ( S ∗ ). This helps explain the strategy for finite s , and some of theestimates are reused for finite s . The Cram´er-Petrov series P ∞ j =3 λ j x j is defined in Appendix A.Proposition A.2 shows that under the conditions of Theorem 2.1, we have | λ j | ≤ ∆ / / j . Proof of Theorem 2.1.
The proof of Theorem 2.1 comes in several steps:(1) Introduce a tilted variable X h , as often done in the proof of the Cram´er’s large deviationprinciple, and a centered normalized version b X h with E [ b X h ] = 0 and V [ b X h ] = 1.(2) Estimate the Kolmogorov distance between the law of b X h and the standard normal dis-tribution by comparing characteristic functions.(3) Undo the tilt: express P ( X ≥ x ) in terms of b X h , conclude with the help of step (2). The exponential moments E [exp( tX )] are finite for all t ∈ ( − ∆ , ∆) (or t ∈ C with | Re t | ≤ ∆),and the Taylor series of the cumulant generating function ϕ ( t ) = log E [exp( tX )] has radius ofconvergence at least ∆, ϕ ( t ) = log E [e tX ] = t ∞ X j =3 κ j j ! t j ( | t | < ∆) . Let I ( x ) = sup t ∈ R ( tx − ϕ ( t )) be the Cram´er rate function. In order to estimate P ( X ≥ x ) with x > X h , given by P ( X h ∈ B ) = e − ϕ ( h ) E [exp( hX )1l B ( X )] , assuming the equation x = ϕ ′ ( h )admits a solution h ∈ (0 , ∆). The use of tilted variables is fairly standard in the proof of Cram´erlarge deviation principle. Now, the key idea is to approximate the distribution of X h by that of anormal random variable with mean x and variance σ ( h ) := ϕ ′′ ( h ). Equivalently, defining b X h := X h − xσ ( h ) , we approximate L ( b X h ) ≈ N (0 , χ h ( t ) := E [exp(i t b X h )] = E h exp (cid:16) i t X h − xσ ( h ) (cid:17)i = exp (cid:16) − ϕ ( h ) − i t xσ ( h ) (cid:17) E h exp (cid:16) hX + i t Xσ ( h ) i = exp (cid:16) ϕ (cid:0) h + i tσ ( h ) (cid:1) − ϕ ( h ) − i t xσ ( h ) (cid:17) , (6.1)for all t ∈ R that are small enough so that | h + i t/σ ( h ) | ≤ T < ∆. A Taylor approximation for ϕ at h showsfor some θ ∈ [ − , ϕ (cid:0) h + i tσ ( h ) (cid:1) − ϕ ( h ) − i t xσ ( h ) = ϕ ′′ ( h ) (cid:0) i tσ ( h ) (cid:1) + 16 ϕ ′′′ ( t + i θ t ˜ σ ( h ) ) t = − t + O ( t ) . 
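Step (1) can be illustrated on a toy example. For a centered three-point distribution (an illustrative choice, not taken from the text), the sketch below solves $\varphi'(h) = x$ by bisection and confirms that the tilted law has mean $x$ and variance $\varphi''(h) = \sigma^2(h)$:

```python
import math

# A centered three-point distribution: X in {-1, 0, 1} with equal weights.
values = [-1.0, 0.0, 1.0]
probs = [1 / 3, 1 / 3, 1 / 3]

def phi(t):
    """Cumulant generating function phi(t) = log E[exp(tX)]."""
    return math.log(sum(p * math.exp(t * v) for p, v in zip(probs, values)))

def phi_prime(t):
    """phi'(t) equals the mean of the tilted law X_h with h = t."""
    w = [p * math.exp(t * v) for p, v in zip(probs, values)]
    return sum(wi * v for wi, v in zip(w, values)) / sum(w)

def solve_tilt(x, lo=0.0, hi=50.0):
    """Bisection for phi'(h) = x; phi' is increasing since phi'' = Var > 0."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if phi_prime(mid) < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

x = 0.4
h = solve_tilt(x)
# The tilted mean e^{-phi(h)} E[X e^{hX}] equals x by construction.
tilted_mean = math.exp(-phi(h)) * sum(p * v * math.exp(h * v)
                                      for p, v in zip(probs, values))
assert abs(tilted_mean - x) < 1e-9
# phi''(h) = sigma^2(h) is the tilted variance, checked by finite differences.
eps = 1e-5
var_fd = (phi_prime(h + eps) - phi_prime(h - eps)) / (2 * eps)
tilted_var = math.exp(-phi(h)) * sum(p * (v - x) ** 2 * math.exp(h * v)
                                     for p, v in zip(probs, values))
assert abs(var_fd - tilted_var) < 1e-5
```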
Our cumulant bounds allow for an easy bound on the third derivative: if z ∈ C satisfies | z | ≤ T < ∆, then | ϕ ′′′ ( z ) | ≤ ∞ X j =3 | κ j | | z | j − ( j − ≤ ∞ X n =1 n (cid:0) | z | ∆ (cid:1) n − ≤ T / ∆(1 − T / ∆) . (6.2)For T bounded away from ∆, we deduce that χ h ( t ) = exp (cid:16) − t (cid:0) O ( t ∆ ) (cid:1)(cid:17) which, for large ∆, is close to the characteristic function exp( − t ) of the standard normal variable.Careful estimates based on the smoothing inequality [39, Lemma 2, Chapter XVI.3]sup y ∈ R (cid:12)(cid:12) P ( b X h ≤ y ) − P ( Z ≤ y ) (cid:12)(cid:12) ≤ π Z T − T (cid:12)(cid:12)(cid:12) χ h ( t ) − exp( − t / t (cid:12)(cid:12)(cid:12) d t + 24 πT √ π show that the Kolmogorov distance between L ( b X h ) and the normal distribution is of order O ( ):for all δ ∈ (0 , C δ > h = h ( x )exists and is in [0 , δ ∆], then D ( b X h , Z ) := sup y ∈ R (cid:12)(cid:12) P ( b X h ≤ y ) − P ( Z ≤ y ) (cid:12)(cid:12) ≤ C δ ∆ . The next step consists in undoing the tilt: we express the probability we are after in terms of thetilted recentered variable b X h by P ( X ≥ x ) = e ϕ ( h ) E h e − hX h { X h ≥ x } i = e ϕ ( h ) − hx E h e − hσ ( h ) b X h { b X h ≥ } i . HE METHOD OF CUMULANTS FOR THE NORMAL APPROXIMATION 29
An easy lemma on Kolmogorov distances (Lemma 4.3) shows (cid:12)(cid:12)(cid:12) E h e − hσ ( h ) b X h { b X h ≥ } i − E h e − hσ ( h ) Z { Z ≥ } i(cid:12)(cid:12)(cid:12) ≤ D ( b X h , Z ) ≤ C δ ∆ . Further noting ϕ ( h ) − hx = − I ( x ), we get P ( X ≥ x ) = e − I ( x ) (cid:16) E h e − hσ ( h ) Z { Z ≥ } i + θ C δ ∆ (cid:17) (6.3)for some θ ∈ [ − , σ ( h ) is close to 1 because | σ ( h ) − | = | ϕ ′′ ( h ) − ϕ ′′ (0) | ≤ h ∆by the bound (6.2) on the third derivative. The tilt parameter h is close to x because | x − h | = | ϕ ′ ( h ) − h | ≤ ∞ X j =3 | h | j − ( j − | κ j | ≤ ∆ ( h/ ∆) − h/ ∆ , (6.4)which yields x − h = h (1 + O ( h/ ∆)). Altogether we get hσ ( h ) = x (cid:16) O (cid:16) h ∆ (cid:17)(cid:17) . Bounds on the standard normal distribution and a completion of squares given in Lemma 4.2 yield E h e − hσ ( h ) Z { Z ≥ } i = 11 + O ( h/ ∆) E h e − xZ { Z ≥ } i = (cid:0) O ( h/ ∆) (cid:1) e x / P ( Z ≥ x ) . Inserting this expression into (6.3), together with O ( h/ ∆) = 1 + O ( h/ ∆), we get P ( X ≥ x ) = e − I ( x )+ x / (cid:16)(cid:0) O ( h ∆ ) (cid:1) P ( Z ≥ x ) + θ C δ ∆ e − x / (cid:17) . We factor out P ( Z ≥ x ), use the lower bound for P ( Z ≥ x ) from Lemma 4.2, remember (6.4), andobtain P ( X ≥ x ) = e − I ( x )+ x / P ( Z ≥ x ) (cid:16) O (cid:0) x +1∆ (cid:1)(cid:17) . By Appendix A, given δ ∈ (0 , c δ > x ∈ [0 , c δ ∆], theCram´er-Petrov series converges, and the equation ϕ ′ ( h ) = x has a unique solution h = h ( x ) andthis solution is in [0 , δ ∆]. Going through the previous estimates carefully, we see that there is aconstant C δ > x ∈ [0 , c δ ∆], P ( X ≥ x ) = P ( Z ≥ x ) (cid:16) θC δ x +1∆ (cid:17) exp (cid:16) ∞ X j =3 λ j x j (cid:17) . (cid:3) Bounds with finitely many moments. Proof of Theorem 2.2
Now we turn to condition $(S^*)$: $|\kappa_j| \le (j-2)!/\Delta^{j-2}$ for $j = 3, \dots, s+2$, with $3 \le s \le 2\Delta^2$. Remember that the random variables are centered and normalized as $E[X] = 0$, $V(X) = 1$.

7.1. Introducing a modified tilted measure.
The Taylor series of ϕ ( t ) may have radius ofconvergence zero, so we work instead with the truncated functions˜ ϕ ( t ) = s X j =2 κ j j ! t j , e ˜ ϕ ( t ) =: ∞ X j =0 ˜ m j j ! t j . (7.1)Notice ˜ m j = E [ X j ] =: m j for j ≤ s + 2. The random variable X may have infinite exponentialmoments, so the exponential tilt is no longer possible; we replace the exponential exp( tx ) in thetilt by g t ( x ) := exp s ( tx ) + x ˜ r ( t ) (7.2)with exp s ( tx ) := s X j =0 ( tx ) j j ! , ˜ r ( t ) := ∞ X j = s +1 j ! ˜ m j t j . (7.3) The truncated exponential is fairly natural, the additional term x ˜ r ( t ) in g t ( x ) ensures that thetilted measure µ h defined below has total mass 1. For small h = h ( x ) such that x = ˜ m ( h ) = ˜ ϕ ′ ( h ) and ˜ σ ( h ) := ˜ ϕ ′′ ( h ) , we introduce a signed measure µ h on R by µ h ( B ) = e − ˜ ϕ ( h ) E h g h ( X )1l { ( X − x ) / ˜ σ h ∈ B } i . (7.4)Because of E [ g h ( X )] = s X j =0 m j j ! h j + E [ X ] ∞ X j = s +1 ˜ m j j ! h j = e ˜ ϕ ( h ) , (7.5)the signed measure is normalized, i.e., µ h ( R ) = 1, however the function g h may take negativevalues so µ h is not necessarily a probability measure. Nevertheless, we will see that µ h is close toa normal distribution. For later purposes we note the inverse relation to (7.4): for all Borel sets B ⊂ R , P ( X ∈ B ) = e ˜ ϕ ( h ) Z R g h (˜ σ ( h ) y + x ) 1l B (cid:0) ˜ σ ( h ) y + x (cid:1) d µ h ( y ) . (7.6)Later we evaluate the function ˜ ϕ ( z ) for complex parameters z . A visual summary of variousquantities that are introduced for proofs is given in Figure 1 below.7.2. Moment estimates.
The key step of the proof will be, just as in the case s = ∞ , to showthat the Fourier transform χ h ( t ) of µ h is close to exp( − t / t , so that we can applyZolotarev’s lemma. A new feature compared to s = ∞ is that we have to deal with truncationerrors. In order to estimate them, it is helpful to have bounds on the quantities ˜ m j and ontruncated moments. Lemma 7.1.
Let ˜ m k be given by (7.1) . We have for all k ∈ N , k ! | ˜ m k | ≤ max (cid:16) p k/ (2e) k , / √ e) k (cid:17) = ( / p k/ (2e) k , k ≤ , / (∆ / √ e) k , k > . Moreover the moments m k := E [ X k ] satisfy m k = ˜ m k for k = 0 , , . . . , s + 2 .Remark. For small k , the ˜ m k ’s satisfy a bound inherited from the moments of Gaussian variables.For large k , the ˜ m k ’s have a behavior closer to the moments of a random variable that hasexponential moments E [exp( tY )] up to order | t | < ∆, e.g., an exponential variable Y ∼ Exp(∆).Indeed, as k → ∞ ,1 k ! E (cid:2) | Y | k (cid:3) = 1 k ! 2 k/ Γ( k +12 ) √ π ∼ √ k ! (cid:16) k e (cid:17) k/ ∼ √ πk ( k/ e) k/ ≤ p k/ (2e) k , where we have used Stirling’s formula Γ( x + 1) ∼ √ πx ( x/ e) x for the Gamma function and forthe factorial k ! = Γ( k + 1). On the other hand the moments of an exponential random variable Y ∼ Exp(∆) are given by 1 k ! E (cid:2) Y k (cid:3) = 1∆ k ≤ / √ e) k . Proof of Lemma 7.1 .
We have1 k ! ˜ m k = ⌊ k/ ⌋ X r =1 r ! X j + ··· + j r = kj ℓ =2 ,...,s κ j · · · κ j r j ! · · · j r ! . (7.7)Hence (cid:12)(cid:12)(cid:12) k ! ˜ m k (cid:12)(cid:12)(cid:12) ≤ ⌊ k/ ⌋ X r =1 r ! X j + ··· + j r = kj ℓ =2 ,...,s r Y ℓ =1 j ℓ ( j ℓ − ! k − r ≤ ⌊ k/ ⌋ X r =1 r ! s X j =2 j ( j − ! r k − r . HE METHOD OF CUMULANTS FOR THE NORMAL APPROXIMATION 31
Using P ∞ j =2 1 j ( j − = R (cid:0) − log(1 − t ) (cid:1) d t = 1 , we obtain for ∆ ≤ k/ (cid:12)(cid:12)(cid:12) k ! ˜ m k (cid:12)(cid:12)(cid:12) ≤ k ⌊ k/ ⌋ X r =1 r ! ∆ r ≤ k e ∆ ≤ k e k/ . For ∆ ≥ k/
2, we estimate instead (cid:12)(cid:12)(cid:12) k ! ˜ m k (cid:12)(cid:12)(cid:12) ≤ ⌊ k/ ⌋ X r =1 r ! 1 p k/ k − r ≤ p k/ k e k/ . The moments E [ X k ], k ≤ s + 2, are given by a formula similar to (7.7), but in theory the rangeof summation indices is now j ℓ = 2 , . . . , s + 2. However, from j + · · · + j r = k and j ℓ ≥ k − ≤ s , so we are back to Eq. (7.7). (cid:3) Let us quickly explain how Lemma 7.1 affects the parameter choices and the bounds on trun-cation errors. For z ∈ C bounded away from ∆ / √ e and p s/ (2e), the remainder term ˜ r ( z ) from(7.3) is bounded by | ˜ r ( z ) | ≤ ∞ X j = s +1 | z | j (∆ / √ e) j + ∞ X j = s +1 | z | j p s/ (2e) j = O (cid:16)(cid:16) | z | ∆ / √ e (cid:17) s +1 (cid:17) + O (cid:16)(cid:16) | z | p s/ (2e) (cid:17) s +1 (cid:17) , where we have estimated max( x, y ) ≤ x + y for x, y ≥
0. In the bound there is nothing to begained from having s larger than 2∆ . Indeed, if s > , then it is the first term that dominates,and it corresponds to the bound obtained for s = 2∆ . This is why, in assumption ( S ∗ ), we donot bother with s > . It will be convenient to work with s ≤ , | z | < a := r s . Then for all k ≥ s , we have q k ≥ p s = √ a and ∆ √ ≥ √ e p s = √ a by the assumption s ≤ , hence min (cid:16)r k , ∆ √ (cid:17) ≥ √ a for all k ≥ s and 1 k ! | ˜ m k | ≤ √ a ) k for all k ≥ s. (7.8)It is in this form that Lemma 7.1 is used later on.In Lemma 7.8 below we need to estimate sums of truncated moments, which are of the type P sk =0 z k k ! E [ | X | k {| X |≥ b } ], for some additional truncation parameter b ≥
1. Because of E [ X k ] = ˜ m k for k = 2 , . . . , s + 2, the bound from Lemma 7.1 extends to the moments E [ X k ], k ≤ s , and weget for k ≤ s/ E (cid:2) | X | k {| X |≥ b } (cid:3) ≤ b k E (cid:2) X k (cid:3) ≤ b k (2 k )!min (cid:0) k/ e , ∆ / e (cid:1) k = 1 b k (2 k )!( k/ e) k , (7.9)where we have used 2 k ≤ s ≤ . By Stirling’s formula,(2 k )! k !( k/ e) k = (1 + o (1))4 k √ k → ∞ ) , so we deduce 1 k ! E (cid:2) | X | k {| X |≥ b } (cid:3) ≤ C k b k (7.10)for k ≤ s/
2, with C ≥ √ s . This bound has the drawback ofnot being small for k = 0, moreover we need to proceed differently for k > s/
2. Therefore thebound (7.10) is complemented by the following lemma.
Lemma 7.2.
For ≤ s ≤ , a := p s/ (4e) and b := 4 a , we have k ! E (cid:2) | X | k {| X |≥ b } (cid:3) ≤ ( √ /a ⌊ b ⌋ , ≤ k ≤ b, √ /a k , b < k ≤ s. Note that b = p s/ e ≥ b = √ s √ e < s for s ≥
30. Set m k ( b ) := E (cid:2) | X | k {| X |≥ b } (cid:3) . Proof.
For the proof, we distinguish the cases $0 \le k \le s/2$ and $k > s/2$. For $1 \le k \le s/2$, we proceed as in (7.9). With Stirling's formula [106]
$$\mathrm e^{1/(12n+1)}\, \sqrt{2\pi n}\, (n/\mathrm e)^n \le n! \le \mathrm e^{1/(12n)}\, \sqrt{2\pi n}\, (n/\mathrm e)^n,$$
we obtain
$$\frac{1}{b^k}\, \frac{(2k)!}{k!\, (k/\mathrm e)^k} \le \frac{\sqrt2}{a^k}\, \mathrm e^{1/(24k) - 1/(12k+1)} \le \frac{\sqrt2}{a^k},$$
since for $k \ge 1$ we have $\frac{1}{24k} - \frac{1}{12k+1} \le 0$. It follows that (7.10) holds true with $C = \sqrt2$ for $k = 1, \ldots, s/2$. This proves in particular the assertion of Lemma 7.2 for $b < k \le s/2$. For $0 \le k \le \lfloor b\rfloor$, using $b \ge 1$, we have
$$\frac{1}{k!}\, m_k(b) \le \frac{1}{k!}\, E\Bigl[|X|^k\, \frac{|X|^{\lfloor b\rfloor - k}}{b^{\lfloor b\rfloor - k}}\, \mathbb 1_{\{|X| \ge b\}}\Bigr] \le \frac{1}{\lfloor b\rfloor !}\, E\bigl[|X|^{\lfloor b\rfloor}\, \mathbb 1_{\{|X| \ge b\}}\bigr] = \frac{1}{\lfloor b\rfloor !}\, m_{\lfloor b\rfloor}(b).$$
We combine this with the bound (7.10), applied with $C = \sqrt2$ at $k = \lfloor b\rfloor \le s/2$. For $s/2 \le k \le s$ and $k$ even, we estimate $m_k(b) \le E[|X|^k] = E[X^k] = \tilde m_k$ and by Lemma 7.1,
$$\frac{1}{k!}\, m_k(b) \le \frac{1}{\min\bigl(\sqrt{k/(2\mathrm e)},\, \Delta/\sqrt{\mathrm e}\bigr)^{k}} = \frac{1}{\sqrt{k/(2\mathrm e)}^{\,k}} \le \frac{1}{\sqrt{s/(4\mathrm e)}^{\,k}} = \frac{1}{a^k}.$$
If $k$ is odd, we handle $E[|X|^k \mathbb 1_{\{|X| \ge b\}}]$ by bounding the expectation by an even moment:
$$m_k(b) = E\bigl[|X|^k \mathbb 1_{\{|X| \ge b\}}\bigr] \le \frac1b\, E\bigl[|X|^{k+1}\bigr] = \frac1b\, E\bigl[X^{k+1}\bigr] = \frac1b\, \tilde m_{k+1},$$
hence
$$\frac{1}{k!}\, m_k(b) \le \frac{k+1}{b}\, \frac{\tilde m_{k+1}}{(k+1)!} \le \frac{k+1}{b}\, \frac{1}{\sqrt{(k+1)/(2\mathrm e)}^{\,k+1}}.$$
Now $(k+1)\, \sqrt{k+1}^{\,-(k+1)} = (k+1)^{-(k-1)/2} \le \bigl(\tfrac s2\bigr)^{-(k-1)/2}$ because $k + 1 \ge s/2$, hence
$$\frac{1}{k!}\, m_k(b) \le \frac{(2\mathrm e)^{(k+1)/2}}{b}\, \Bigl(\frac s2\Bigr)^{-(k-1)/2} = \frac{2\mathrm e}{b}\, \frac{1}{a^{k-1}} = \frac{\mathrm e}{2}\, \frac{1}{a^{k}} \le \frac{\sqrt2}{a^k},$$
where we have used $b = 4a$, $a^2 = s/(4\mathrm e)$, and $\mathrm e/2 \le \sqrt2$. □

Bounds on tilt parameters.
Remember the choice $s \le 2\Delta^2$, $a = \sqrt{s/(4\mathrm e)} \le \Delta/\sqrt{2\mathrm e}$. In the following, $\theta$ designates a generic constant in $[-1,1]$: an equation $a = b + \theta c$ means "there exists $\theta = \theta(a,b,c) \in [-1,1]$ such that $a = b + \theta c$". When the quantities involved are complex, $\theta$ is a complex number of modulus smaller than or equal to 1.

Lemma 7.3.
Assume that $s \le 2\Delta^2$. Then for all $x \in [0, \frac23 a]$, the equation $\tilde\varphi'(h) = x$ has a unique solution $h \in [0, a]$. In addition, if $x = \frac23 a\delta$ with $\delta \in [0,1]$, then the solution $h$ is in $[0, \delta a]$ and
$$x = h\bigl(1 + \theta\, \tfrac\delta3\bigr), \qquad \tilde\sigma(h)^2 = \tilde\varphi''(h) = 1 + \theta\, 0.751\, \delta.$$

Note that the polynomial equation $\tilde\varphi'(h) = x$ may have additional solutions $h > a$. The constant $0.751$ replaces the constant $0.75$ from [112, Eq. (2.21)], which we were not able to reproduce.
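Before the proof, the content of Lemma 7.3 (existence, uniqueness, and location of the tilt parameter $h$) can be explored numerically. A sketch in which the cumulants are chosen to saturate the size allowed by the cumulant condition used in this section (our reading: $|\kappa_j| \le (j-2)!/\Delta^{j-2}$); the values of $\Delta$, $s$, and $\delta$ are illustrative assumptions, not from the text:

```python
import math

# Toy cumulant sequence saturating |kappa_j| <= (j-2)!/Delta^(j-2)
DELTA = 10.0
S = 30


def phi_prime(h):
    """Derivative of the truncated cumulant generating function phi~."""
    return h + sum(math.factorial(j - 2) / DELTA ** (j - 2)
                   * h ** (j - 1) / math.factorial(j - 1)
                   for j in range(3, S + 1))


def solve_tilt(x):
    """Bisection solve of phi~'(h) = x on [0, a]; phi~' is increasing there."""
    lo, hi = 0.0, math.sqrt(S / (4 * math.e))
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if phi_prime(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


a = math.sqrt(S / (4 * math.e))
x = (2.0 / 3.0) * 0.5 * a          # x = (2/3) * a * delta with delta = 0.5
h = solve_tilt(x)
assert abs(phi_prime(h) - x) < 1e-9
assert 0.0 <= h <= 0.5 * a * 1.01  # h stays in [0, delta*a], up to rounding
```

Bisection suffices because positivity of $\tilde\varphi''$ on the relevant interval makes $\tilde\varphi'$ strictly increasing there, which is exactly the mechanism of the proof below.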
Proof.
Set $q := 1/\sqrt{2\mathrm e}$, so that $a \le q\Delta < \Delta/2$. Fix $\delta \in [0,1]$. For $h \in [0, \delta a]$, we have $h/\Delta \le q\delta$ and
$$\bigl|\tilde\varphi''(h) - 1\bigr| = \Bigl|\sum_{j=3}^{s} \frac{\kappa_j}{(j-2)!}\, h^{j-2}\Bigr| \le \sum_{j=3}^{\infty} \Bigl(\frac h\Delta\Bigr)^{j-2} = \frac{h/\Delta}{1 - h/\Delta} \le \frac{q}{1-q}\, \delta \le 0.751\, \delta. \tag{7.11}$$
It follows that $\tilde\varphi''(h) > 0$ on $[0, \delta a]$ and $\tilde\varphi'(\cdot)$ is a monotone increasing bijection from $[0, \delta a]$ onto $[0, \tilde\varphi'(\delta a)]$. Next we bound
$$\bigl|\tilde\varphi'(h) - h\bigr| \le h \sum_{k=3}^{s} \frac{1}{k-1}\, \Bigl(\frac h\Delta\Bigr)^{k-2} \le h\, \frac h\Delta\, \Bigl(\frac12 + \frac13\, \frac{h/\Delta}{1 - h/\Delta}\Bigr) \le h\delta q\, \Bigl(\frac12 + \frac13\, \frac{q}{1-q}\Bigr) \le \frac{\delta h}{3}. \tag{7.12}$$
It follows in particular that $\tilde\varphi'(\delta a) \ge \delta a\, (1 - \delta/3) \ge \frac23\, \delta a$. Therefore, for $x$ to be in the image $[0, \tilde\varphi'(\delta a)]$ it is sufficient that $0 \le x \le \frac23\, \delta a$. Thus $\tilde\varphi'$ is a monotone increasing bijection from $[0, \delta a]$ onto an interval containing $[0, \frac23\, \delta a]$. Specializing to $\delta = 1$ we see that for every $x \in [0, \frac23 a]$ there is a unique $h \in [0, a]$ such that $\tilde\varphi'(h) = x$. If in addition $x = \frac23\, \delta a$ with $\delta \in [0,1]$, the solution $h$ must be in $[0, \delta a]$ and the proof is concluded with (7.11) and (7.12). □

Lemma 7.4. We have $\tilde\varphi(h) \ge 0$ for all $h \in [0, a]$.

Proof. We have
$$h \le a = \sqrt{\frac{s}{4\mathrm e}} \le \frac{\Delta}{\sqrt{2\mathrm e}} < \frac\Delta2$$
and
$$\tilde\varphi(h) = h^2\, \Bigl(\frac12 + \sum_{j=3}^{s} \frac{\kappa_j}{j!}\, h^{j-2}\Bigr) \ge h^2\, \Bigl(\frac12 - \sum_{j=3}^{\infty} \frac{1}{j(j-1)}\, \Bigl(\frac h\Delta\Bigr)^{j-2}\Bigr) \ge h^2\, \Bigl(\frac12 - \frac{h/\Delta}{6\, (1 - h/\Delta)}\Bigr) \ge 0. \qquad\square$$

Lemma 7.5.
For $0 \le h \le \delta a$ with $\delta \le 1$, we have
$$|\tilde r(h)| \le \frac{(\delta/\sqrt2)^{s+1}}{1 - \delta/\sqrt2}.$$

Proof. By the definition of $\tilde r(h)$ in (7.3), the bound (7.8) yields
$$|\tilde r(h)| \le \sum_{k=s+1}^{\infty} \frac{h^k}{(\sqrt2\, a)^k} \le \sum_{k=s+1}^{\infty} \Bigl(\frac{\delta}{\sqrt2}\Bigr)^{k} = \frac{(\delta/\sqrt2)^{s+1}}{1 - \delta/\sqrt2}. \qquad\square$$

Lemma 7.6.
For $s \ge 30$ and $h \in [0, \delta a]$, the function $g_h$ given in (7.2) is monotone increasing on $[0, \infty)$ and satisfies $g_h(u) \ge 1$ for all $u \in \mathbb R_+$.

Proof. Since the assertion of the lemma is trivially true for $h = 0$, we assume $h > 0$. We start from
$$g_h'(u) = h \sum_{k=0}^{s-1} \frac{(hu)^k}{k!} + 2u\, \tilde r(h) = h\, \Bigl(\exp_{s-1}(hu) + 2hu\, \frac{\tilde r(h)}{h^2}\Bigr).$$
By the definition of $\tilde r(h)$ in (7.3) together with (7.8), i.e. $\frac{1}{k!}|\tilde m_k| \le (\sqrt2\, a)^{-k}$ for $k \ge s$, and $h \in (0, \delta a]$,
$$\frac{|\tilde r(h)|}{h^2} \le \sum_{k=s+1}^{\infty} \frac{h^{k-2}}{(\sqrt2\, a)^k} \le \frac{1}{2a^2} \sum_{k=s-1}^{\infty} \Bigl(\frac{\delta}{\sqrt2}\Bigr)^{k} = \frac{1}{2a^2\, (1 - \delta/\sqrt2)}\, \Bigl(\frac{\delta}{\sqrt2}\Bigr)^{s-1} \le \Bigl(\frac{1}{\sqrt2}\Bigr)^{s-1}. \tag{7.13}$$
For the last inequality we have used that for $s \ge 30$ we have
$$2a^2\, \Bigl(1 - \frac{\delta}{\sqrt2}\Bigr) \ge \frac{s}{2\mathrm e}\, \Bigl(1 - \frac{1}{\sqrt2}\Bigr) \ge 1.$$
Therefore $|\tilde r(h)|/h^2 \le (1/\sqrt2)^{s-1} \le \frac12$. We conclude
$$g_h'(u) \ge h\, \bigl(\exp_{s-1}(hu) - hu\bigr) = h\, \Bigl(1 + \sum_{k=2}^{s-1} \frac{(hu)^k}{k!}\Bigr) \ge 0,$$
so $g_h$ is monotone increasing. In particular, $g_h(u) \ge g_h(0) = 1$ for all $u \in \mathbb R_+$. □

Figure 1. A summary of parameter choices in Lemma 7.7. The function $\tilde\varphi(z)$ is evaluated at $z = h + \mathrm i t/\tilde\sigma(h)$ with $0 \le h \le \delta_1 a$ and $|t| \le (\delta_2 - \delta_1)\, a\, \tilde\sigma(h) = T$ (shaded rectangle), which is contained in the disk $|z| \le \delta_2 a$. The parameter $a = \sqrt{s/(4\mathrm e)}$ enters the moment bound (7.8).

Fourier transform of the tilted measure: a first bound.
The Fourier transform of the signed measure $\mu_h$ is
$$\chi_h(t) = \mathrm e^{-\tilde\varphi(h)}\, E\Bigl[g_h(X)\, \mathrm e^{\mathrm i t (X - x)/\tilde\sigma_h}\Bigr].$$
Motivated by Eq. (6.1), we set
$$\tilde\chi_h(t) := \exp\Bigl(\tilde\varphi\bigl(h + \mathrm i t/\tilde\sigma_h\bigr) - \tilde\varphi(h) - \mathrm i t x/\tilde\sigma_h\Bigr).$$
With the analogue of (7.5) for $z = h + \mathrm i t/\tilde\sigma_h$ instead of $h$ we see
$$\chi_h(t) = \tilde\chi_h(t) + \mathrm e^{-\tilde\varphi(h) - \mathrm i t x/\tilde\sigma(h)}\, E\Bigl[g_h(X)\, \mathrm e^{\mathrm i t X/\tilde\sigma(h)} - g_{h + \mathrm i t/\tilde\sigma_h}(X)\Bigr]. \tag{7.14}$$
Eq. (7.14) is a substitute for Eq. (6.1). The term $\tilde\chi_h(t)$ is easily treated with the Taylor series of $\tilde\varphi$. Remember the choice $s \le 2\Delta^2$, $a = \sqrt{s/(4\mathrm e)} \le \Delta/\sqrt{2\mathrm e}$.

Lemma 7.7.
Assume that $s \le 2\Delta^2$. Fix $0 < \delta_1 < \delta_2 < 1$, set $a := \sqrt{s/(4\mathrm e)}$ and $T := (\delta_2 - \delta_1)\, a\, \tilde\sigma(h)$, see Figure 1. Then for all $h \in [0, \delta_1 a]$ and $t \in [-T, T]$ we have
$$\bigl|\tilde\chi_h(t) - \exp(-t^2/2)\bigr| \le \frac{|t|}{T}\, \bigl(\mathrm e^{-t^2/4} - \mathrm e^{-t^2/2}\bigr).$$
Proof.
Let $z := h + \mathrm i t/\tilde\sigma(h)$. Then $|z| \le \delta_2 a$. A third order Taylor expansion of $\tilde\varphi$ yields, for some $\theta_1 \in [0,1]$,
$$\tilde\varphi\Bigl(h + \frac{\mathrm i t}{\tilde\sigma(h)}\Bigr) - \tilde\varphi(h) - \frac{\mathrm i t x}{\tilde\sigma(h)} = -\frac{t^2}{2} + \frac{1}{3!}\, \tilde\varphi'''\Bigl(h + \mathrm i\theta_1\, \frac{t}{\tilde\sigma(h)}\Bigr)\, \Bigl(\frac{\mathrm i t}{\tilde\sigma(h)}\Bigr)^3.$$
In $|z| < \Delta$, the third derivative is bounded as
$$\bigl|\tilde\varphi'''(z)\bigr| \le \sum_{j=3}^{s} \frac{|\kappa_j|}{(j-3)!}\, |z|^{j-3} \le \sum_{j=3}^{\infty} \frac{j-2}{\Delta^{j-2}}\, |z|^{j-3} = \frac{\mathrm d}{\mathrm d u}\, \Bigl(\sum_{k=0}^{\infty} \bigl(\tfrac u\Delta\bigr)^k\Bigr)\Bigr|_{u = |z|} = \frac1\Delta\, \frac{1}{(1 - |z|/\Delta)^2}.$$
Set $L := 3\Delta\, (1 - |z|/\Delta)^2\, \tilde\sigma(h)^3$; then
$$\tilde\chi_h(t) = \exp\Bigl(-\frac{t^2}{2}\, \Bigl(1 + \theta_2\, \frac{t}{L}\Bigr)\Bigr)$$
for some $\theta_2 \in \mathbb C$ with $|\theta_2| \le 1$. Assuming that $|t| \le T \le L/2$, we have
$$\Bigl|\mathrm e^{-\frac{t^2}{2}(1 + \theta_2 t/L)} - \mathrm e^{-t^2/2}\Bigr| \le \mathrm e^{-t^2/2}\, \bigl(\mathrm e^{t^2 |t|/(2L)} - 1\bigr) \le \frac{2|t|}{L}\, \mathrm e^{-t^2/2}\, \bigl(\mathrm e^{t^2/4} - 1\bigr) \le \frac{|t|}{T}\, \bigl(\mathrm e^{-t^2/4} - \mathrm e^{-t^2/2}\bigr),$$
where the middle step uses $\mathrm e^{uv} - 1 \le v\, (\mathrm e^u - 1)$ for $u \ge 0$ and $v = 2|t|/L \in [0,1]$. It remains to check $T \le L/2$. To that aim we use the lower bound for $\tilde\sigma(h)^2$ from Lemma 7.3 together with $|z| \le \delta_2 a$ and $a/\Delta \le 1/\sqrt{2\mathrm e}$, and we evaluate
$$\frac{T}{L/2} = \frac{2\, (\delta_2 - \delta_1)\, a\, \tilde\sigma(h)}{3\Delta\, (1 - |z|/\Delta)^2\, \tilde\sigma(h)^3} \le \frac{\delta_2 - \delta_1}{1 - 0.751\, \delta_1}\, \cdot \frac{2\, a/\Delta}{3\, (1 - \delta_2\, a/\Delta)^2} \le \frac{\delta_2 - \delta_1}{1 - 0.751\, \delta_1} < 1.$$
The last inequality is equivalent to $\delta_2 < 1 + 0.249\, \delta_1$; it holds true because $\delta_2 < 1$. □

Lemma 7.8.
Assume that $30 \le s \le 2\Delta^2$. Fix $\delta_2 \in (0,1)$, set $a := \sqrt{s/(4\mathrm e)}$ and further assume that $a^{-1} \le \delta_2^2$. Then for all $h \ge 0$ and $t \in \mathbb R$ with $|h + \mathrm i t/\tilde\sigma(h)| \le \delta_2 a$, we have
$$|\chi_h(t) - \tilde\chi_h(t)| \le \frac{4\sqrt2\, \delta_2^{\lfloor 4a\rfloor}}{1 - \delta_2}.$$
We will use (7.14). By the definition of $g_h$, we have
$$g_h(X)\, \mathrm e^{\mathrm i t X/\tilde\sigma_h} - g_{h + \mathrm i t/\tilde\sigma_h}(X) = \Bigl(\exp_s(hX)\, \mathrm e^{\mathrm i t X/\tilde\sigma_h} - \exp_s\bigl((h + \tfrac{\mathrm i t}{\tilde\sigma_h}) X\bigr)\Bigr) + X^2\, \Bigl(\tilde r(h)\, \mathrm e^{\mathrm i t X/\tilde\sigma_h} - \tilde r\bigl(h + \tfrac{\mathrm i t}{\tilde\sigma_h}\bigr)\Bigr). \tag{7.15}$$
By (7.14) we have to evaluate the expected value of (7.15). Set $z := h + \mathrm i t/\tilde\sigma(h)$; notice $|h| \le |z|$. With Lemma 7.1, the expectation of the second term in (7.15) is bounded in absolute value by
$$I_2 := E\Bigl[X^2\, \bigl|\tilde r(h)\, \mathrm e^{\mathrm i t X/\tilde\sigma_h} - \tilde r(h + \mathrm i t/\tilde\sigma_h)\bigr|\Bigr] \le E[X^2]\, \bigl(|\tilde r(h)| + |\tilde r(h + \mathrm i t/\tilde\sigma_h)|\bigr) \le 2 \sum_{k=s+1}^{\infty} \frac{|\tilde m_k|}{k!}\, |z|^k \le 2 \sum_{k=s+1}^{\infty} \frac{(\delta_2 a)^k}{(\sqrt2\, a)^k}, \tag{7.16}$$
where we have used $E[X^2] = 1$, $|z| \le \delta_2 a$ and $\frac{1}{k!}|\tilde m_k| \le (\sqrt2\, a)^{-k}$ for $k \ge s$ as derived in (7.8). For the first term in (7.15), we remember $\exp_s(y) = \exp(y) - \sum_{k=s+1}^{\infty} \frac{1}{k!} y^k$ and deduce
$$\exp_s(hX)\, \mathrm e^{\mathrm i t X/\tilde\sigma_h} - \exp_s\bigl((h + \tfrac{\mathrm i t}{\tilde\sigma_h}) X\bigr) = -\sum_{k=s+1}^{\infty} \frac{X^k}{k!}\, \Bigl(\mathrm e^{\mathrm i t X/\tilde\sigma_h}\, h^k - \bigl(h + \tfrac{\mathrm i t}{\tilde\sigma_h}\bigr)^k\Bigr).$$
The intuition is of course that this term should be small; however, taking expected values we obtain moments $E[X^k]$ with $k \ge s+1$, over which we have little control. Therefore we introduce an additional truncation parameter $b := 4a > 0$ and expand on $|X| \le b$ only: we estimate, with the help of $k! \ge (k/\mathrm e)^k \ge (s/\mathrm e)^k$ for $k \ge s$ as well as $b/(s/\mathrm e) = 1/a$ and $|z|/a \le \delta_2$,
$$I_1^{(1)} := \Bigl|E\Bigl[\mathbb 1_{\{|X| \le b\}}\, \Bigl(\exp_s(hX)\, \mathrm e^{\mathrm i t X/\tilde\sigma_h} - \exp_s\bigl((h + \tfrac{\mathrm i t}{\tilde\sigma_h}) X\bigr)\Bigr)\Bigr]\Bigr| \le 2 \sum_{k=s+1}^{\infty} \frac{(b|z|)^k}{k!} \le 2 \sum_{k=s+1}^{\infty} \frac{(b|z|)^k}{(s/\mathrm e)^k} \le 2 \sum_{k=s+1}^{\infty} \delta_2^k. \tag{7.17}$$
For $|X| > b$, we use instead $|\exp_s(y)| \le \sum_{k=0}^{s} |y|^k/k!$, which yields
$$I_1^{(2)} := \Bigl|E\Bigl[\mathbb 1_{\{|X| > b\}}\, \Bigl(\exp_s(hX)\, \mathrm e^{\mathrm i t X/\tilde\sigma_h} - \exp_s\bigl((h + \tfrac{\mathrm i t}{\tilde\sigma_h}) X\bigr)\Bigr)\Bigr]\Bigr| \le 2 \sum_{k=0}^{s} \frac{|z|^k}{k!}\, E\bigl[|X|^k\, \mathbb 1_{\{|X| \ge b\}}\bigr].$$
Using the bound on the truncated moments provided by Lemma 7.2 and $|z|/a \le \delta_2$, we can further estimate
$$\frac12\, I_1^{(2)} \le \sum_{k=0}^{\lfloor b\rfloor} \frac{\sqrt2\, |z|^k}{a^{\lfloor b\rfloor}} + \sum_{k=\lfloor b\rfloor + 1}^{s} \frac{\sqrt2\, |z|^k}{a^k} \le \sum_{k=0}^{\lfloor b\rfloor} \frac{\sqrt2\, |z|^k}{a^{\lfloor b\rfloor}} + \sum_{k=\lfloor b\rfloor + 1}^{s} \sqrt2\, \delta_2^k. \tag{7.18}$$
The condition $a^{-1} \le \delta_2^2$ allows us to treat the sum over $0 \le k \le \lfloor b\rfloor$ using
$$\frac{|z|^k}{a^{\lfloor b\rfloor}} \le \frac{\delta_2^k}{a^{\lfloor b\rfloor - k}} \le \delta_2^k\, \delta_2^{2(\lfloor b\rfloor - k)} = \delta_2^{\, 2\lfloor b\rfloor - k}.$$
We get
$$\frac12\, I_1^{(2)} \le \sqrt2\, \Bigl(\sum_{k=0}^{\lfloor b\rfloor} \delta_2^{\, 2\lfloor b\rfloor - k} + \sum_{k=\lfloor b\rfloor + 1}^{s} \delta_2^k\Bigr) = \sqrt2\, \Bigl(\sum_{\ell=\lfloor b\rfloor}^{2\lfloor b\rfloor} \delta_2^{\ell} + \sum_{k=\lfloor b\rfloor + 1}^{s} \delta_2^k\Bigr) \le 2\sqrt2 \sum_{k=\lfloor b\rfloor}^{s} \delta_2^k,$$
where we have used again $2b \le s$. Altogether
$$I_1^{(2)} + I_1^{(1)} + I_2 \le 4\sqrt2 \sum_{k=\lfloor b\rfloor}^{s} \delta_2^k + 2 \sum_{k=s+1}^{\infty} \delta_2^k + 2 \sum_{k=s+1}^{\infty} \Bigl(\frac{\delta_2}{\sqrt2}\Bigr)^k \le \frac{4\sqrt2\, \delta_2^{\lfloor b\rfloor}}{1 - \delta_2}.$$
The proof is concluded with $|\chi_h(t) - \tilde\chi_h(t)| \le \mathrm e^{-\tilde\varphi(h)}\, \bigl(I_1^{(1)} + I_1^{(2)} + I_2\bigr)$ and the lower bound $\tilde\varphi(h) \ge 0$. □

Lemma 7.9.
Assume that $30 \le s \le 2\Delta^2$. Fix $0 < \delta_1 < \delta_2 < 1$, set $a := \sqrt{s/(4\mathrm e)}$ and $T := (\delta_2 - \delta_1)\, a\, \tilde\sigma(h)$. Further assume that $a^{-1} \le \delta_2^2$. Then for all $h \in [0, \delta_1 a]$ and $t \in \mathbb R$ with $|t| \le T$, we have
$$\bigl|\chi_h(t) - \exp(-t^2/2)\bigr| \le \frac{|t|}{T}\, \bigl(\mathrm e^{-t^2/4} - \mathrm e^{-t^2/2}\bigr) + \frac{4\sqrt2\, \delta_2^{\lfloor 4a\rfloor}}{1 - \delta_2}.$$

Proof. The lemma is an immediate consequence of Lemmas 7.7 and 7.8. □
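The content of Lemmas 7.7 to 7.9 is that, on a window $|t| \le T$ of width of order $\sqrt s$, the Fourier transform stays close to the Gaussian $\mathrm e^{-t^2/2}$. This can be illustrated in the simplest possible case, a standardized sum of $n$ Rademacher signs, whose characteristic function is exactly $\cos(t/\sqrt n)^n$; the example and the crude $O(t^4/n)$ tolerance are ours, not from the text:

```python
import math


def chi_rademacher_sum(t, n):
    """Characteristic function of (X_1 + ... + X_n)/sqrt(n), X_i = +/-1 fair."""
    return math.cos(t / math.sqrt(n)) ** n


n = 400
for t in [0.1 * k for k in range(1, 31)]:   # 0 < t <= 3
    gap = abs(chi_rademacher_sum(t, n) - math.exp(-t * t / 2))
    assert gap <= t ** 4 / (4 * n) + 1e-12  # O(t^4 / n) closeness near the origin
```

The gap grows like $t^4/n$ near the origin, which is why bounds of this type are only useful together with a separate estimate for small $|t|$ and a cutoff at $|t| = T$.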
We want to plug this bound into Zolotarev's lemma. The first part $\tilde\chi_h(t)$ is treated exactly as it was for $s = \infty$. The second term is problematic: because of $|z| \ge |h|$, it does not go to zero as $t \to 0$, so the integral against $\mathrm d t/t$ diverges. So another bound is needed for very small $t$.

7.5. Fourier transform of the tilted measure: a second bound.
An alternative bound for very small $t$ is derived with a Taylor expansion of the Fourier transform,
$$\chi_h(t) = 1 + \mathrm i t \int_{\mathbb R} y\, \mathrm d\mu_h(y) + \theta\, \frac{t^2}{2} \int_{\mathbb R} y^2\, \mathrm d|\mu_h|(y), \tag{7.19}$$
see Eq. (4.14) in [39, Chapter VI.4]. When $s = \infty$, the mean and second moment of the tilted normalized measure are 0 and 1, respectively. For finite $s$, we have instead the following lemma.

Lemma 7.10.
Assume that $s$ is even and satisfies $30 \le s \le 2\Delta^2$. Further assume that $x \in [0, \frac23\, \delta a]$ with $\delta \in (0,1)$ and $a = \sqrt{s/(4\mathrm e)}$. Then
$$\int_{\mathbb R} y\, \mathrm d\mu_h(y) = \theta_1\, c_1\, \delta^{s+1}, \qquad \int_{\mathbb R} y^2\, \mathrm d|\mu_h|(y) = 1 + \theta_2\, c_2\, \delta^{s+1},$$
with explicit numerical constants $c_1$, $c_2$ derived in the proof.

Proof.
First we check that the order of magnitude O ( δ s +1 ) is correct. We evaluate Z R y d µ h ( y ) = e − ˜ ϕ ( h ) E h g h ( X ) X − x ˜ σ ( h ) i = ˜ σ ( h ) − (cid:16) e − ˜ ϕ ( h ) E (cid:2) Xg h ( X ) (cid:3) − x (cid:17) . (7.20)By the definition of g h and in view of E [ X k ] = m k = ˜ m k for k ≤ s + 2 (see Lemma 7.1), we have E (cid:2) Xg h ( X ) (cid:3) = s X j =0 h j j ! m j +1 + m ˜ r ( h ) = dd h e ˜ ϕ ( h ) − ∞ X j = s +1 h j j ! ˜ m j +1 + m ˜ r ( h ) . In view of ˜ ϕ ′ ( h ) = x , the first term is cancelled by − x in (7.20), we find Z R y d µ h ( y ) = ˜ σ ( h ) − e − ˜ ϕ ( h ) (cid:16) − ∞ X j = s +1 h j j ! ˜ m j +1 + m ˜ r ( h ) (cid:17) = O ( δ s +1 ) , (7.21)where we have used (7.8) and h ≤ δa . For the second moment, we need to evaluate Z R y d | µ h | ( y ) = e − ˜ ϕ ( h ) E h | g h ( X ) | (cid:16) X − x ˜ σ ( h ) (cid:17) i . Since s is even, the truncated exponential is positive by Lemma 4.4 and the triangle inequalityyields exp s ( u ) − u | ˜ r ( h ) | ≤ | g h ( u ) | ≤ exp s ( u ) + u | ˜ r ( h ) | ( u ∈ R )hence J − J ≤ Z R y d | µ h | ( y ) ≤ J + J (7.22)with J := exp( − ˜ ϕ ( h ))˜ σ ( h ) E h exp s ( hX )( X − x ) i , J := exp( − ˜ ϕ ( h ))˜ σ ( h ) E h X | ˜ r ( h ) | ( X − x ) i . We evaluate J = ˜ σ ( h ) − e − ˜ ϕ ( h ) E h exp s ( hX )( X − xX + x ) i = ˜ σ ( h ) − e − ˜ ϕ ( h ) s X k =0 h k k ! ( m k +2 − xm k +1 + x m k )= ˜ σ ( h ) − e − ˜ ϕ ( h ) n(cid:16) (e ˜ ϕ ) ′′ ( h ) − x (e ˜ ϕ ) ′ ( h ) + x e ˜ ϕ ( h ) (cid:17) − ∞ X k = s +1 h k k ! (cid:0) ˜ m k +2 − x ˜ m k +1 + x ˜ m k (cid:1)o . The terms involving ˜ ϕ ( h ) and its derivatives combine to˜ σ ( h ) − e − ˜ ϕ ( h ) (cid:16) (e ˜ ϕ ) ′′ ( h ) − x (e ˜ ϕ ) ′ ( h ) + x e ˜ ϕ ( h ) (cid:17) = 1˜ σ ( h ) (cid:0) ˜ ϕ ′′ ( h ) + ˜ ϕ ′ ( h ) − x ˜ ϕ ′ ( h ) + x (cid:1) = 1(remember x = ˜ ϕ ′ ( h ) and ˜ σ ( h ) = ˜ ϕ ′′ ( h )), consequently J = 1 − ˜ σ ( h ) − e − ˜ ϕ ( h ) ∞ X k = s +1 h k k ! (cid:0) ˜ m k +2 − x ˜ m k +1 + x ˜ m k (cid:1) = 1 + O ( δ s +1 ) . 
(7.23) For $J_2$, we evaluate by Lemma 7.5
$$J_2 = \tilde\sigma(h)^{-2}\, \mathrm e^{-\tilde\varphi(h)}\, |\tilde r(h)|\, \bigl(m_4 - 2x m_3 + x^2 m_2\bigr) = O(\delta^{s+1}). \tag{7.24}$$
It follows that $\int_{\mathbb R} y^2\, \mathrm d|\mu_h|(y) = 1 + O(\delta^{s+1})$.

Next we bound the constants in front of $\delta^{s+1}$. By Lemmas 7.3 and 7.4, we have $\tilde\sigma(h)^2 \ge 1 - 0.751 = 0.249$ and $\tilde\varphi(h) \ge 0$. Consequently
$$0 < \tilde\sigma(h)^{-1}\, \exp(-\tilde\varphi(h)) \le 2.01, \qquad 0 < \tilde\sigma(h)^{-2}\, \exp(-\tilde\varphi(h)) \le 4.02. \tag{7.25}$$
Moreover, we can estimate the third and fourth moments by $|m_3| = |\kappa_3| \le 1/\Delta \le 1/(\sqrt{2\mathrm e}\, a)$ by the cumulant bound and $\Delta^2 \ge s/2 = 2\mathrm e\, a^2$, and $|m_4| = |\kappa_4 + 3\kappa_2^2| \le |\kappa_4| + 3 \le 2/\Delta^2 + 3$ with $\Delta^2 \ge s/2 \ge$
15. For ˜ r ( h ) and the power series in h , we use (7.8) and h ≤ δa , set u = h/ ( √ a )and obtain | ˜ r ( h ) | ≤ u s +1 − u , ∞ X k = s +1 h k k ! | ˜ m k +1 | ≤ √ a dd u ∞ X k = s +2 u k = u s +1 √ a (cid:16) ( s + 2)1 − u + u (1 − u ) (cid:17) , ∞ X k = s +1 h k k ! | ˜ m k +2 | ≤ a d d u ∞ X k = s +3 u k = u s +1 a (cid:16) ( s + 3)( s + 2)1 − u + 2 ( s + 3) u (1 − u ) + 2 u (1 − u ) (cid:17) . We use the notation U j := sup u ∈ [0 , √ ] u j − (1 − u ) j = √ , j = 14 + 3 √ , j = 210 + 7 √ , j = 3 , where the latter representation follows from monotonicity. With this notation we have, in view of(7.21), (7.24), (7.23) together with u s +1 ≤ ( δ/ √ s +1 , (7.25) and x ≤ a = p s e ≤ √ √ e ∆ resp. x ≤ s e (cid:12)(cid:12)(cid:12)Z R y d µ h ( y ) (cid:12)(cid:12)(cid:12) ≤ . δ/ √ s +1 (cid:18) √ a (( s + 2) U + U ) + 1 a √ e U (cid:19) ,J ≤ . δ/ √ s +1 U (3 + 215 + 2 x
1∆ + x ) , ≤ . δ/ √ s +1 U (3 + 215 + 2 √ √ e + s e ) | J − | ≤ . δ/ √ s +1 (cid:16) a ( U ( s + 3)( s + 2) + 2( s + 3) U + 2 U )+ 43 √ s + 2) U + U ) + s e U (cid:17) . From a = p s e ≥ q e together with the representations of U j , j = 1 , , s +2 √ s +1 ≤ √ resp. ( s +3)( s +2) √ s +1 ≤ · √ we immediately get (cid:12)(cid:12)(cid:12)Z R y d µ h ( y ) (cid:12)(cid:12)(cid:12) ≤ δ s +1 . J ≤ δ s +1 . | J − | ≤ δ s +1 . (cid:12)(cid:12)(cid:12)Z R x d | µ h | ( x ) − (cid:12)(cid:12)(cid:12) ≤ | J − | + | J | ≤ . δ s +1 . (cid:3) Lemma 7.11.
Under the assumptions of Lemma 7.10, we have
$$\bigl|\chi_h(t) - \exp(-t^2/2)\bigr| \le c_1\, \delta^{s+1}\, |t| + \bigl(1 + c_2\, \delta^{s+1}\bigr)\, t^2 \qquad \text{for all } t \in \mathbb R,$$
with $c_1$, $c_2$ the numerical constants from Lemma 7.10.

Proof. We combine Eq. (7.19), Lemma 7.10 and $|\exp(-t^2/2) - 1| \le t^2/2$:
$$\bigl|\chi_h(t) - \exp(-t^2/2)\bigr| \le \Bigl|t \int_{\mathbb R} y\, \mathrm d\mu_h(y)\Bigr| + \frac{t^2}{2} \int_{\mathbb R} y^2\, \mathrm d|\mu_h|(y) + \bigl|1 - \mathrm e^{-t^2/2}\bigr| \le c_1\, \delta^{s+1}\, |t| + \frac{t^2}{2}\, \bigl(1 + c_2\, \delta^{s+1}\bigr) + \frac{t^2}{2},$$
which is bounded by the right-hand side of the assertion. □
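The Taylor bound (7.19) underlying Lemma 7.11 is the elementary estimate $|\mathrm e^{\mathrm i t y} - 1 - \mathrm i t y| \le t^2 y^2/2$, integrated over the measure. For a probability measure it can be verified directly; a sketch with a small discrete toy distribution (our choice, not from the text):

```python
import cmath


def char_fn(pmf, t):
    """Characteristic function of a discrete distribution {value: prob}."""
    return sum(p * cmath.exp(1j * t * y) for y, p in pmf.items())


pmf = {-1.0: 0.3, 0.5: 0.5, 2.0: 0.2}
mean = sum(p * y for y, p in pmf.items())
second = sum(p * y * y for y, p in pmf.items())

for t in [0.05 * k for k in range(-40, 41)]:
    # |chi(t) - 1 - i t * mean| <= (t^2 / 2) * E[Y^2], as in (7.19)
    lhs = abs(char_fn(pmf, t) - 1 - 1j * t * mean)
    assert lhs <= t * t * second / 2 + 1e-12
```

Since the pointwise inequality holds for every atom, the bound survives integration against any (signed) measure with the second moment taken under the total variation measure, which is exactly the form used in (7.19).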
Combining Lemmas 7.9 and 7.11, we get
$$\bigl|\chi_h(t) - \exp(-t^2/2)\bigr| \le \min\Bigl(\frac{|t|}{T}\, \bigl(\mathrm e^{-t^2/4} - \mathrm e^{-t^2/2}\bigr) + \frac{4\sqrt2\, \delta_2^{\lfloor 4a\rfloor}}{1 - \delta_2},\ c_1\, \delta^{s+1}\, |t| + \bigl(1 + c_2\, \delta^{s+1}\bigr)\, t^2\Bigr). \tag{7.26}$$
Notice that this upper bound is integrable against $\mathrm d t/t$. The bound is valid if the assumptions of both lemmas are met. For future reference we summarize these assumptions, see Figure 1.

Assumption 7.12. The quantities $s \in \mathbb N$, $x \ge 0$, and $\delta, \delta_1, \delta_2 \in [0,1]$ satisfy the following:
• $30 \le s \le 2\Delta^2$ and $s$ is even. We set $a := \sqrt{s/(4\mathrm e)}$.
• $x \in [0, \frac23\, \delta a]$.
• $0 < \delta < \delta_1 < \delta_2 < 1$ and $a^{-1} \le \delta_2^2$. We set $T := (\delta_2 - \delta_1)\, a\, \tilde\sigma(h)$, with $h \in [0, \delta a]$ the solution of $\tilde\varphi'(h) = x$.
• Fourier transforms are evaluated at $t \in [-T, T]$.

7.6.
Normal approximation for the tilted measure.
Armed with the bound (7.26) and Lemma 4.1 we can bound the Kolmogorov distance between the normal law and the tilted signed measure $\mu_h$. Set
$$D_h := \sup_{y \in \mathbb R}\, \bigl|\mu_h\bigl((-\infty, y]\bigr) - P(Z \le y)\bigr|.$$

Lemma 7.13.
Fix $\delta \in (0,1)$. Then there exist $C(\delta) > 0$ and $s_\delta \in \mathbb N$ such that for all $h \in (0, \delta a)$ and all even $s \ge s_\delta$ with $s \le 2\Delta^2$, we have
$$D_h \le \frac{C(\delta)}{\sqrt s}. \tag{7.27}$$

Proof.
Fix $\delta_1, \delta_2 \in (\delta, 1)$ with $\delta_1 < \delta_2$. For large $s$, the condition $\delta_2^2 \ge a^{-1}$ in Assumption 7.12 is satisfied. We apply Lemma 4.1 to $\nu = \mu_h$ (the signed tilted measure), $\mu = \mathcal N(0,1)$ the standard normal distribution, $q = 1/\sqrt{2\pi}$, and $\varepsilon = 1/T$ with $T$ of the order of $\sqrt s$, which gives
$$D_h \le \frac{C_1}{T}\, \sqrt{\frac2\pi} + \frac{C_2}{\pi} \int_0^T \Bigl(1 - \frac tT\Bigr)\, \bigl|\chi_h(t) - \mathrm e^{-t^2/2}\bigr|\, \frac{\mathrm d t}{t} + C_3\, \mu_h^-(\mathbb R) \tag{7.28}$$
with $C_1$ and $C_2$ the numerical constants defined in Eq. (4.3) and $C_3$ a further numerical constant determined by them. The third term in (7.28) is missing in [112] because of an erroneous application of Zolotarev's lemma, see the remark after Lemma 4.1.

For $T$ of the order of $\sqrt s$, clearly the first term on the right-hand side of (7.28) is of the order of $1/\sqrt s$. The measure $\mu_h$ was defined in (7.4); its negative part is given by
$$\mu_h^-(\mathbb R) = \mathrm e^{-\tilde\varphi(h)}\, E\bigl[(g_h)^-(X)\bigr] \le \mathrm e^{-\tilde\varphi(h)}\, E\bigl[X^2\, |\tilde r(h)|\bigr] = \mathrm e^{-\tilde\varphi(h)}\, |\tilde r(h)|,$$
where we used the positivity of the truncated exponential shown in Lemma 4.4. Lemmas 7.4 and 7.5 give
$$\mu_h^-(\mathbb R) \le \frac{(\delta/\sqrt2)^{s+1}}{1 - \delta/\sqrt2} = O\Bigl(\Bigl(\frac{1}{\sqrt2}\Bigr)^{s}\Bigr), \tag{7.29}$$
which decays exponentially fast as $s \to \infty$ and is negligible compared to $1/\sqrt s$.

In the integrand in the second term in (7.28) we bound $(1 - |t|/T) \le 1$, split the domain of integration into two subintervals $[0, t_1]$ and $[t_1, T]$ for some $t_1 > 0$, and apply (7.26). This yields two contributions. The first one is bounded by
$$\int_0^{t_1} \bigl|\chi_h(t) - \mathrm e^{-t^2/2}\bigr|\, \frac{\mathrm d t}{t} \le \int_0^{t_1} \bigl(O(\delta^{s+1}) + (1 + O(\delta^{s+1}))\, t\bigr)\, \mathrm d t = t_1\, O(\delta^{s+1}) + \frac{t_1^2}{2}\, \bigl(1 + O(\delta^{s+1})\bigr).$$
Using Lemma 7.9 we find that the second contribution is bounded by
$$\int_{t_1}^{T} \bigl|\chi_h(t) - \mathrm e^{-t^2/2}\bigr|\, \frac{\mathrm d t}{t} \le \frac1T \int_{t_1}^{\infty} \mathrm e^{-t^2/4}\, \mathrm d t + \int_{t_1}^{T} O\bigl(\delta_2^{\lfloor 4a\rfloor}\bigr)\, \frac{\mathrm d t}{t} \le \frac CT + O\bigl(\delta_2^{\lfloor 4a\rfloor}\bigr)\, \log\frac{T}{t_1}.$$
Choose $t_1 = \delta_2^{\lfloor 4a\rfloor/2}$; then we get, in the limit $s \to \infty$ at fixed $\delta$,
$$\int_0^{T} \bigl|\chi_h(t) - \mathrm e^{-t^2/2}\bigr|\, \frac{\mathrm d t}{t} = O\Bigl(\frac{1}{\sqrt s}\Bigr) + O\Bigl(\delta_2^{\lfloor 4a\rfloor/2}\, \bigl(\sqrt s + \log s\bigr)\Bigr) = O\Bigl(\frac{1}{\sqrt s}\Bigr). \tag{7.30}$$
We combine Eqs. (7.28), (7.29), (7.30) and arrive at $D_h = O(1/\sqrt s)$. Precise bounds depend on $\delta_1$ and $\delta_2$ but not on the exact value of $h \in (0, \delta a)$ or $x = \tilde\varphi'(h)$, and the proof is complete. □

Lemma 7.13 is all that is needed for the proof of Theorem 2.2. For the reader interested in numerical values, we add two further bounds, following [112, Eqs. (2.48) and (2.55)]. Set
$$t_1 := \frac{4\sqrt2\, \delta_2^{\lfloor 4a\rfloor}}{1 - \delta_2}. \tag{7.31}$$
The main idea for the following lemma is then to split the domain of integration $[0, T]$ into the intervals $[0, t_1]$ and $[t_1, T]$.

Lemma 7.14.
Under Assumption 7.12, we have
$$\int_0^T \bigl|\chi_h(t) - \mathrm e^{-t^2/2}\bigr|\, \frac{\mathrm d t}{t} \le \frac{\sqrt\pi}{T}\, \frac{\sqrt2 - 1}{\sqrt2} + t_1\, \Bigl(c_0 + \max\Bigl(\log\frac{T}{t_1},\, 0\Bigr)\Bigr), \tag{7.32}$$
where $c_0$ is an explicit numerical constant.

Proof.
We refine the bounds from the proof of Lemma 7.13 as in [112, Eq. (2.46)]. The fraction with $\delta_2$, which we denote by $t_1$ according to (7.31), appeared in the error bound on Fourier transforms in Lemma 7.8. The bound (7.26) yields
$$\int_0^{t_1} \bigl|\chi_h(t) - \mathrm e^{-t^2/2}\bigr|\, \frac{\mathrm d t}{t} \le c_1\, \delta^{s+1}\, t_1 + \bigl(1 + c_2\, \delta^{s+1}\bigr)\, \frac{t_1^2}{2} \tag{7.33}$$
and, if $T \ge t_1$,
$$\int_{t_1}^{T} \bigl|\chi_h(t) - \mathrm e^{-t^2/2}\bigr|\, \frac{\mathrm d t}{t} \le \frac1T \int_{t_1}^{T} \bigl(\mathrm e^{-t^2/4} - \mathrm e^{-t^2/2}\bigr)\, \mathrm d t + t_1 \int_{t_1}^{T} \frac{\mathrm d t}{t} \le \frac{\sqrt\pi}{T}\, \frac{\sqrt2 - 1}{\sqrt2} + t_1\, \log\frac{T}{t_1}. \tag{7.34}$$
To bound (7.33), we note first that, because of $\delta < \delta_2 < 1$ and $\lfloor 4a\rfloor \le \sqrt s \le s$, the term $\delta^{s+1}\, t_1$ is exponentially small compared with $t_1$; therefore (7.33) is bounded by
$$c_0\, t_1 \tag{7.35}$$
for an explicit numerical constant $c_0$. If $T \ge t_1$, the inequality (7.32) follows by adding up (7.34) and (7.35). If $T \le t_1$, then the integral from zero to $T$ is bounded by the integral from zero to $t_1$, and (7.32) holds true because the right-hand side of (7.32) is larger than the right-hand side of (7.33). □

Lemma 7.15.
Let $s$ be an even integer with $30 \le s \le 2\Delta^2$. Fix $\delta \in [0, 1)$ and $x \in [0, \frac23\, \delta a]$, and define $h \in [0, \delta a]$ by $\tilde\varphi'(h) = x$. Define $\delta_2 := 1 - \frac{1-\delta}{2 s^{1/4}}$. Then Assumption 7.12 is satisfied and
$$D_h \le \frac{C}{(1 - \delta)\sqrt s}\, \Bigl(1 + s\, \exp\Bigl(-\frac12\, (1-\delta)\, s^{1/4}\Bigr)\Bigr)$$
for an explicit numerical constant $C$.
Proof.
The only conditions to be checked in Assumption 7.12 are $\delta_2^2 \ge a^{-1}$ and $0 < \delta < \delta_1 < \delta_2 < 1$. For the latter, note $0 < 1 - \delta_2 = \frac{1-\delta}{2 s^{1/4}} < 1 - \delta$, so $\delta < \delta_2$ and $\delta_1$ can be chosen arbitrarily in $(\delta, \delta_2)$. The first inequality holds true because
$$\sqrt a\, \delta_2 \ge \Bigl(\frac{s}{4\mathrm e}\Bigr)^{1/4}\, \Bigl(1 - \frac{1}{2 s^{1/4}}\Bigr) = (4\mathrm e)^{-1/4}\, \bigl(s^{1/4} - 0.5\bigr).$$
For $s = 30$, the last expression is approximately equal to 1.0135, hence in particular larger than 1, which by monotonicity extends to all of $s \ge 30$. Thus Assumption 7.12 holds true and we may apply the bounds from Lemma 7.13 and Lemma 7.14.

We want to extract from (7.28) and (7.32) a bound expressed directly in terms of $s$, $\delta$, and numerical constants. In order to bound $1/T$ with $T = (\delta_2 - \delta_1)\, a\, \tilde\sigma(h)$, we remember the bound $\tilde\sigma(h)^2 = \tilde\varphi''(h) \ge 1 - 0.751$ from Lemma 7.3 and bound
$$T \ge \sqrt{0.249}\, \sqrt{\frac{s}{4\mathrm e}}\, \Bigl(1 - \frac{1}{2 s^{1/4}}\Bigr)\, (1 - \delta) \ge 0.118\, (1 - \delta)\, \sqrt s \tag{7.36}$$
for $s \ge 30$. In order to bound $t_1$ defined in (7.31), we note, first, $4a = 2\sqrt{s/\mathrm e} \ge \sqrt s$, hence $\lfloor 4a\rfloor \ge \sqrt s - 1$ and
$$\delta_2^{\sqrt s} \le \exp\Bigl(-\frac{(1-\delta)\, s^{1/4}}{2}\Bigr),$$
where we have used the standard inequality $1 - x \le \exp(-x)$. As a consequence,
$$t_1 \le \frac{4\sqrt2\, \delta_2^{\sqrt s - 1}}{1 - \delta_2} \le \frac{4\sqrt2}{1 - 1/(2 s^{1/4})}\, \frac{2\, s^{1/4}}{1 - \delta}\, \exp\Bigl(-\frac{(1-\delta)\, s^{1/4}}{2}\Bigr).$$
It remains to take care of $t_1 \max(\log(T/t_1), 0)$ in (7.32). We bound $\log(T/t_1)$ by a constant times $s^{1/4}$. In view of Lemma 7.3 and $\delta_2 - \delta_1 \le 1$,
$$\log T \le \log\bigl(a\, \tilde\sigma(h)\bigr) \le \frac12 \log 1.751 + \frac12 \log\frac{s}{4\mathrm e} = \frac12 \log(0.161\, s).$$
It is easily checked that $x^{-1} \log x$ is maximal at $x = \mathrm e$, hence $\log x \le x/\mathrm e$ and $\log x = 4 \log x^{1/4} \le 4 x^{1/4}/\mathrm e$, therefore
$$\log T \le \frac12\, \frac{4\, (0.161\, s)^{1/4}}{\mathrm e} \le 0.47\, s^{1/4}.$$
Furthermore, by the definition of $t_1$,
$$-\log t_1 \le -\lfloor 4a\rfloor \log\delta_2 - \log(4\sqrt2) + \log(1 - \delta_2) \le -4a \log\delta_2 = -2\sqrt{\frac s{\mathrm e}}\, \log\delta_2.$$
It is easily checked that $x(\log x - 1)$ is minimal at $x = 1$, from which one deduces the inequality $\log x \ge 1 - 1/x$. We apply the inequality to $\delta_2$, use $(1 - 1/(2 s^{1/4}))^{-1} \le 1.272$ for $s \ge 30$, and find
$$-\log\delta_2 \le \frac{1 - \delta_2}{\delta_2} \le \frac{1-\delta}{2 s^{1/4}} \cdot 1.272 \le 0.636\, s^{-1/4}.$$
It follows that
$$\log T - \log t_1 \le \Bigl(0.47 + \frac{1.272}{\sqrt{\mathrm e}}\Bigr)\, s^{1/4} \le 1.25\, s^{1/4}$$
and, for $s \ge 30$, $c_0 + \max(\log(T/t_1), 0) \le c_0 + 1.25\, s^{1/4}$. We insert this bound into (7.32), combine with (7.36) and (7.28) for $C_1$ and $C_2$ defined by (4.3), and find that $D_h \le A + B + C$ with
$$A = \frac1T\, \Bigl(C_1 \sqrt{\frac2\pi} + \frac{C_2}{\pi}\, \sqrt\pi\, \frac{\sqrt2 - 1}{\sqrt2}\Bigr) \le \frac{c_A}{(1 - \delta)\sqrt s}$$
and
$$B = \frac{C_2}{\pi}\, t_1\, \bigl(c_0 + 1.25\, s^{1/4}\bigr) \le \frac{c_B\, \sqrt s}{1 - \delta}\, \exp\Bigl(-\frac12\, (1-\delta)\, s^{1/4}\Bigr)$$
and finally, for $s \ge 30$,
$$C = C_3\, \mu_h^-(\mathbb R) \le c_C\, \Bigl(\frac{\delta_2}{\sqrt2}\Bigr)^{s+1}.$$
Notice
$$\delta_2^{s+1} = \delta_2\, \Bigl(1 - \frac{1-\delta}{2 s^{1/4}}\Bigr)^{s} \le \mathrm e^{-(1-\delta)\, s^{3/4}/2} \le \sqrt s\; \mathrm e^{-(1-\delta)\, s^{1/4}/2}.$$
The lemma follows by adding up the bounds for $A$, $B$, and $C$, with explicit values for the constants $c_A$, $c_B$, $c_C$. □

Undoing the tilt.
The relation (7.6) yields
$$P(X \ge x) = \mathrm e^{\tilde\varphi(h)} \int_0^{\infty} g_h\bigl(\tilde\sigma(h)\, y + x\bigr)^{-1}\, \mathrm d\mu_h(y),$$
which we want to bring into the form
$$P(X \ge x) = \mathrm e^{\tilde\varphi(h) - hx}\, \Bigl(E\bigl[\mathrm e^{-h\tilde\sigma(h) Z}\, \mathbb 1_{\{Z \ge 0\}}\bigr] + \text{error term}\Bigr).$$
Let
$$I_1(h) := \int_0^{\infty} g_h\bigl(\tilde\sigma(h)\, y + x\bigr)^{-1}\, \mathrm d\mu_h(y) - E\Bigl[g_h\bigl(\tilde\sigma(h)\, Z + x\bigr)^{-1}\, \mathbb 1_{\{Z \ge 0\}}\Bigr],$$
$$I_2(h) := E\Bigl[g_h\bigl(x + \tilde\sigma(h)\, Z\bigr)^{-1}\, \mathbb 1_{\{Z \ge 0\}}\Bigr],$$
so that
$$P(X \ge x) = \mathrm e^{\tilde\varphi(h)}\, \bigl(I_1(h) + I_2(h)\bigr). \tag{7.37}$$
Lemma 7.16 below says that $I_1(h)$ is small; Lemma 7.17 says that $I_2(h)$ is approximately equal to $E[\exp(-hx - h\tilde\sigma(h) Z)\, \mathbb 1_{\{Z \ge 0\}}]$.

Lemma 7.16.
For $s \ge 30$, $h \in [0, \delta a]$ and $x < \sqrt s/(3\sqrt{\mathrm e})$, let $D_h$ be the Kolmogorov distance between $\mu_h$ and the normal distribution $\mathcal N(0,1)$. Then $|I_1(h)| \le 2.1\, D_h \exp(-hx)$.

Proof.
We apply Lemma 4.3 to $f(y) = 1/g_h(\tilde\sigma(h)\, y + x)$, which is monotone decreasing by Lemma 7.6, and get
$$|I_1(h)| \le 2\, g_h(x)^{-1}\, \sup_{y \in \mathbb R}\, \bigl|\mu_h\bigl((-\infty, y]\bigr) - P(Z \le y)\bigr| = 2\, g_h(x)^{-1}\, D_h.$$
We have, for suitable $\theta = \theta_{h,x} \in [-1,1]$,
$$g_h(x) = \exp_s(hx) + (hx)^2\, \frac{\tilde r(h)}{h^2} = \Bigl(1 + 2\theta\, \frac{|\tilde r(h)|}{h^2}\Bigr)\, \exp_s(hx),$$
using $(hx)^2/2 \le \exp_s(hx)$. A Taylor expansion for the exponential shows $\exp(hx) = \exp_s(hx) + \theta\, \frac{(hx)^{s+1}}{(s+1)!}\, \mathrm e^{hx}$, hence
$$\exp_s(hx) = \mathrm e^{hx}\, \Bigl(1 - \theta\, \frac{(hx)^{s+1}}{(s+1)!}\Bigr).$$
Under our assumptions on $h$ and $x$ we have
$$0 \le hx \le \delta a \cdot \frac23\, \delta a = \frac{s}{6\mathrm e}\, \delta^2.$$
We estimate with the aid of Stirling, using $(s+1)! \ge (s/\mathrm e)^{s+1}$,
$$\frac{(hx)^{s+1}}{(s+1)!} \le \Bigl(\frac{s}{6\mathrm e}\Bigr)^{s+1}\, \delta^{2(s+1)}\, \Bigl(\frac{\mathrm e}{s}\Bigr)^{s+1} = \Bigl(\frac{\delta^2}{6}\Bigr)^{s+1}.$$
Altogether, combining also with (7.13) and $s \ge 30$,
$$g_h(x)^{-1} \le \frac{\mathrm e^{-hx}}{\bigl(1 - 2\, (\delta/\sqrt2)^{s-1}\bigr)\, \bigl(1 - (\delta^2/6)^{s+1}\bigr)} \le 1.05\, \mathrm e^{-hx} \tag{7.38}$$
and the lemma follows. □

Lemma 7.17.
Under Assumption 7.12, we have
$$I_2(h) = \mathrm e^{-hx}\, E\bigl[\exp(-h\tilde\sigma(h) Z)\, \mathbb 1_{\{Z \ge 0\}}\bigr]\, \bigl(1 + \theta_1\, (\delta/\sqrt2)^{s-1}\bigr)\, \bigl(1 + \theta_2\, \delta^{s+1}\bigr) + \theta_3\, \frac{c'}{\sqrt{2\pi}}\, \mathrm e^{-s/2},$$
with $c'$ an explicit numerical constant.

Proof.
We split the expectations in two contributions, one belonging to Z ≤ b for some well-chosen truncation parameter b and another contribution belonging to Z > b . On the event Z ≤ b , we are going to estimate g h ( x + ˜ σ ( h ) Z ) − in a way similar to Lemma 7.16 (where we hadlooked at g h ( x ) − only). The event Z > b is shown to give a negligible contribution. Set b := b − x ˜ σ ( h ) ≥ a. The bound uses ˜ σ ( h ) ≤ b = 4 a and x ≤ a . For Z = y ≥ b , we note0 ≤ g h ( x + ˜ σ ( h ) Z ) − ≤ g h ( x ) − and bound g h ( x ) − as in (7.38). This yields0 ≤ E (cid:2) g h ( x + ˜ σ ( h ) Z ) − { Z ≥ b } (cid:3) ≤ . − hx ) P ( Z ≥ b ) . Next consider Z = y ∈ [0 , b ]. Then u := x + ˜ σ ( h ) y ≤ b , moreover hu ≤ hb ≤ ( δa )(4 a ) = δ s eand ( hu ) s +1 ( s + 1)! ≤ ( s/ e) s +1 ( s + 1)! δ s +1 ≤ δ s +1 . Proceeding as in the proof of Lemma 7.16, we see thatexp( − hu )(1 + ( δ/ √ s − )(1 + δ s +1 ) ≤ g h (cid:0) u ) − ≤ exp( − hu )(1 − ( δ/ √ s − )(1 − δ s +1 ) . We substitute u = x + ˜ σ ( h ) Z , take expectations, and find E (cid:2) g h ( x + ˜ σ ( h ) Z ) − { Z ∈ [0 ,b ] } (cid:3) = E (cid:2) exp( − hx − h ˜ σ ( h ) Z )1l { Z ∈ [0 ,b ] } (cid:3) (1 + θ ( δ/ √ s − )(1 + θδ s +1 ) . On the right-hand side, replacing the constraint Z ∈ [0 , b ] by Z ≥ E (cid:2) exp( − h ˜ σ ( h ) Z )1l { Z ∈ [0 ,b ] } (cid:3) ≥ E (cid:2) exp( − h ˜ σ ( h ) Z )1l { Z ≥ } (cid:3) − P ( Z ≥ b ) . Combining everything, we get I ( h ) ≤ E (cid:2) exp( − hx − h ˜ σ ( h ) Z )1l { Z ≥ } (cid:3) (1 + θ ( δ/ √ s − )(1 + θδ s +1 ) + 1 . − hx ) P ( Z ≥ b )and I ( h ) ≥ E (cid:2) exp( − hx − h ˜ σ ( h ) Z )1l { Z ≥ } (cid:3) (1 + θ ( δ/ √ s − )(1 + θδ s +1 ) − . − hx ) P ( Z ≥ b ) . Finally we estimate P ( Z ≥ b ) ≤ √ π Z ∞ b yb e − y / d y ≤ b √ π e − b / ≤ √ π e − s/ . In the last inequality we have used b ≥ b ≥ a = s ≃ . s ≥ s . (cid:3) Cram´er-Petrov series.
The function $\tilde L(x)$ defined in (2.7) is, for $h = h(x)$ the solution of the tilt equation, equal to
$$\tilde L(x) = \tilde\varphi(h) - hx + \frac12\, x^2.$$
We can now estimate the absolute value of $\tilde L$.

Lemma 7.18.
Suppose $0 \le x < \sqrt s/(3\sqrt{\mathrm e})$ with $s \le 2\Delta^2$. Then $|\tilde L(x)| \le 1.2\, x^3/\Delta$.

We emphasize that the lemma does not need the assumption $s \ge 30$ or $s$ even.

Proof.
To conclude, we follow [112, p. 31] to bound $\tilde L(x)$, starting from
$$\tilde L(x) = \frac12\, h^2 + \sum_{j=3}^{s} \frac{\kappa_j}{j!}\, h^j - hx + \frac12\, x^2 = \frac12\, (h - x)^2 + \sum_{j=3}^{s} \frac{\kappa_j}{j!}\, h^j,$$
which gives
$$|\tilde L(x)| \le x^2\, \Bigl(\frac12\, \Bigl(\frac hx - 1\Bigr)^2 + \frac{h^2}{x^2} \sum_{j=3}^{s} \frac{h^{j-2}}{j(j-1)\, \Delta^{j-2}}\Bigr).$$
Set $s' := 2\Delta^2$, $a' := \sqrt{s'/(4\mathrm e)} = \Delta/\sqrt{2\mathrm e}$, and $\delta' := 3x/(2a')$. Because of $a' \ge a$ and $\frac23 a = \sqrt s/(3\sqrt{\mathrm e})$, we have $\delta' < 1$. It is easily checked that Lemma 7.3 also holds true for primed quantities. In particular, the solution $h$ of $\tilde\varphi'(h) = x$ satisfies $h \le \delta' a' = \delta' \Delta/\sqrt{2\mathrm e}$ and
$$\sum_{j=3}^{s} \frac{h^{j-2}}{j(j-1)\, \Delta^{j-2}} \le \delta' \sum_{j=3}^{\infty} \frac{(1/\sqrt{2\mathrm e})^{j-2}}{j(j-1)} \le 0.093\, \delta',$$
where we used a numerical evaluation of the series. By Lemma 7.3 applied to primed variables, $x = h(1 + \theta\delta'/3)$, hence $h/x \le (1 - \delta'/3)^{-1} \le 3/2$ and
$$\frac12\, \Bigl(\frac hx - 1\Bigr)^2 \le \frac12\, \Bigl(\frac{\delta'/3}{1 - \delta'/3}\Bigr)^2 \le \frac{\delta'^{\,2}}{8} \le \frac{\delta'}{8}.$$
Altogether
$$|\tilde L(x)| \le x^2\, \delta'\, \Bigl(\frac18 + \frac94 \cdot 0.093\Bigr) \le 0.335\, x^2\, \delta' = 0.335\, \frac{3\sqrt{2\mathrm e}}{2}\, \frac{x^3}{\Delta} \le 1.2\, \frac{x^3}{\Delta}. \qquad\square$$

Lemma 7.19.
Suppose $0 \le x < \sqrt2\, \Delta/(3\sqrt{\mathrm e})$. Then $\tilde L(x) = \sum_{j=3}^{\infty} \tilde\lambda_j\, x^j$ with $\tilde\lambda_j$ the coefficients given after Definition A.3, and the series is absolutely convergent in this range of $x$.

Proof. Let $\tilde b_k$ be the coefficients from Eq. (A.7). By Proposition A.5, the series $H(x) = \sum_{k=1}^{\infty} \tilde b_k\, x^k$ has a radius of convergence at least as large as the range of $x$ considered here; since $\frac23 a = \sqrt s/(3\sqrt{\mathrm e}) \le \sqrt2\, \Delta/(3\sqrt{\mathrm e})$, it follows that $H(x)$ is absolutely convergent for all $x \in [0, \frac23 a]$. The definition of the coefficients $\tilde b_k$ ensures that $\tilde\varphi'(H(x)) = x$ whenever the series is convergent. Since $H(x) \to 0$ as $x \to 0$, we have $H(x) = h(x)$ for all sufficiently small $x$. Because of $h = h(x) \in [0, \delta a]$ and $\tilde\varphi''(h) \ne 0$ on $[0, a]$ (again by Lemma 7.3), the holomorphic inverse function theorem shows that $h(x)$ is analytic in $[0, \frac23 a)$. Therefore the equality $h(x) = H(x)$ extends to all of $[0, \frac23 a)$. The series representation for $\tilde L(x)$ then follows from the considerations in Appendix A. □
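The first coefficient of the Cramér-Petrov series can be made concrete: with $\tilde\varphi(h) = h^2/2 + \kappa_3 h^3/6$ one finds $\tilde L(x) = (\kappa_3/6)\, x^3 + O(x^4)$, and the identity $\tilde L'(x) = x - h(x)$ makes a numerical check cheap. A sketch with an illustrative third cumulant (the value $\kappa_3 = 0.4$ and the truncation to a single higher cumulant are our assumptions, not from the text):

```python
KAPPA3 = 0.4  # illustrative third cumulant


def phi(h):
    """phi~(h) = h^2/2 + kappa_3 h^3/6."""
    return h * h / 2 + KAPPA3 * h ** 3 / 6


def phi_prime(h):
    return h + KAPPA3 * h * h / 2


def tilt(x):
    """Solve phi~'(h) = x by bisection on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if phi_prime(mid) < x else (lo, mid)
    return 0.5 * (lo + hi)


def L(x):
    """Cramer-Petrov correction L~(x) = phi~(h) - h x + x^2/2."""
    h = tilt(x)
    return phi(h) - h * x + x * x / 2


# Leading behaviour: L~(x) = (kappa3/6) x^3 + O(x^4)
for x in [0.1, 0.05, 0.01]:
    assert abs(L(x) / x ** 3 - KAPPA3 / 6) < KAPPA3 * x
```

The next coefficient in this toy model is $-\kappa_3^2/8$, which is what the remaining $O(x)$ deviation in the assertion measures.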
Theorem 2.2—Conclusion of the proof.
In the evaluation of $E[\exp(-h\tilde\sigma(h) Z)\, \mathbb 1_{\{Z \ge 0\}}]$ appearing in the approximation for $I_2(h)$ from Lemma 7.17 we would like to replace $h\tilde\sigma(h)$ by $x$, and control the error with Lemma 4.2. To that aim we first compare $h\tilde\sigma(h)$ with $x$.

Lemma 7.20.
Let $\delta \in [0,1)$, $x = \frac23\, \delta a$, and $h \in [0, \delta a]$ the solution of $\tilde\varphi'(h) = x$. Then
$$E\bigl[\mathrm e^{-h\tilde\sigma(h) Z}\, \mathbb 1_{\{Z \ge 0\}}\bigr] = \frac{1}{1 + \theta\, 0.86\, \delta}\, \mathrm e^{x^2/2}\, P(Z \ge x).$$

Proof.
We show first
$$\Bigl|\frac{h\tilde\sigma(h)}{x} - 1\Bigr| \le 0.86\, \delta. \tag{7.39}$$
This bound replaces [112, Eq. (2.53)], which looks incorrect. We start as in [112] and note
$$h\tilde\sigma(h)^2 - x = h\tilde\varphi''(h) - \tilde\varphi'(h) = \sum_{j=2}^{s} \Bigl(\frac{1}{(j-2)!} - \frac{1}{(j-1)!}\Bigr)\, \kappa_j\, h^{j-1} = \theta \sum_{j=3}^{s} \frac{j-2}{j-1}\, h\, \Bigl(\frac h\Delta\Bigr)^{j-2}.$$
The sum is non-negative and can be bounded from above, with $q = 1/\sqrt{2\mathrm e}$ and $h \le \delta a \le \delta q\Delta$, by $h$ multiplied with
$$\sum_{j=3}^{s} \frac{j-2}{j-1}\, (q\delta)^{j-2} \le q\delta\, \Bigl(\frac12 + \sum_{k=1}^{\infty} (q\delta)^k\Bigr) \le q\delta\, \Bigl(\frac12 + \frac{q}{1-q}\Bigr) \le 0.54\, \delta.$$
Therefore we have $h\tilde\sigma(h)^2 - x = \theta\, 0.54\, \delta h$. Combining with the bound $x = h(1 + \theta\delta/3)$ from Lemma 7.3, we get
$$\frac{h\tilde\sigma(h)^2}{x} = 1 + \frac{\theta\, 0.54\, \delta h}{h\, (1 + \theta'\delta/3)} = 1 + \theta''\, \frac{0.54\, \delta}{1 - 1/3} = 1 + \theta''\, 0.81\, \delta.$$
Eq. (2.53) in [112] is the same but with $\tilde\sigma(h)$ instead of $\tilde\sigma(h)^2$, which looks like a typo. We have to work a bit more to get rid of the square of the tilted variance. We deduce, with the help of the inequality $\sqrt{1+u} \le 1 + u/2$, that
$$h\tilde\sigma(h) = \sqrt{h \cdot h\tilde\sigma(h)^2} = \sqrt{hx\, (1 + \theta\, 0.81\, \delta)} = x\, \sqrt{\frac{1 + \theta\, 0.81\, \delta}{1 + \theta'\, \delta/3}}.$$
For the upper bound,
$$\frac{1 + 0.81\, \delta}{1 - \delta/3} = 1 + \frac{(0.81 + 1/3)\, \delta}{1 - \delta/3} \le 1 + 1.72\, \delta,$$
hence $h\tilde\sigma(h)/x \le \sqrt{1 + 1.72\, \delta} \le 1 + 0.86\, \delta$. For the lower bound, we exploit the concavity of $u \mapsto \sqrt{1-u}$ on $[0,1]$: the function $u \mapsto \frac1u\, (\sqrt{1-u} - 1)$ is monotone decreasing. Write
$$\frac{1 - 0.81\, \delta}{1 + \delta/3} = 1 - u, \qquad u = \frac{(0.81 + 1/3)\, \delta}{1 + \delta/3} \le \min\bigl(1.15\, \delta,\ 0.86\bigr), \qquad u^* := 0.86.$$
Then $u \le u^*$ and
$$\frac{h\tilde\sigma(h)}{x} - 1 \ge \sqrt{1 - u} - 1 \ge -\frac{1 - \sqrt{1 - u^*}}{u^*}\, u \ge -0.73 \cdot 1.15\, \delta \ge -0.84\, \delta.$$
The bound (7.39) for $h\tilde\sigma(h)/x$ follows. In other words, for $\eta := h\tilde\sigma(h) - x$ we showed $|\eta/x| \le 0.86\, \delta$. Therefore Lemma 4.2 yields
$$E\bigl[\mathrm e^{-h\tilde\sigma(h) Z}\, \mathbb 1_{\{Z \ge 0\}}\bigr] = \frac{x}{x + \theta\eta}\, E\bigl[\mathrm e^{-xZ}\, \mathbb 1_{\{Z \ge 0\}}\bigr] = \frac{1}{1 + \theta\eta/x}\, \mathrm e^{x^2/2}\, P(Z \ge x). \qquad\square$$
30 and s ≤ L ( x ) has already been proven in Lemma 7.18, it holds true for all s ≤ . Proof of Theorem 2.2 when s ≥ . We start from P ( X ≥ x ) = e ˜ ϕ ( h ) ( I ( h ) + I ( h )), see (7.37).Lemma 7.16 on I ( h ), Lemma 7.15 on the Kolmogorov distance D h , and the lower bound for P ( Z ≥ x ) from Lemma 4.2 yield | I ( h ) | ≤ e − hx . √ s (cid:16) . s exp (cid:16) −
12 (1 − δ ) s / (cid:17)(cid:17) ≤ e − hx + x / P ( Z ≥ x ) 113 ( x + 1) √ s (cid:16) . s exp (cid:16) −
12 (1 − δ ) s / (cid:17)(cid:17) . (7.40)Lemmas 7.17 and 7.20 yield I ( h ) = C e − hx + x / P ( Z ≥ x ) + θ exp( − hx ) √ π . − s/ (7.41)with C := (cid:16) (1 + θ ( δ/ √ s − )(1 + θδ s +1 )(1 + θ . δ ) (cid:17) − . In C , we bound 1 + θδ s +1 ≥ − δ and, using s ≥ θ ( δ/ √ s − )(1 + θ . δ ) ≥ − δ (cid:16) .
86 + 2 − / (cid:17) =: 1 − cδ. with c ≤ . δ (1 − cδ ) − is convex on [0 , δ (cid:16) − cδ − (cid:17) ≤ δ (cid:16) − cδ − (cid:17)(cid:12)(cid:12)(cid:12)(cid:12) δ =1 = c − c ≤ . . We deduce C ≤ . δ − δ = 1 + 7 . δ − δ . For a lower bound for C , we use 1 / (1 + x ) ≥ − x which gives C ≥ (1 − δ s +1 ) (cid:0) − ( δ/ √ s +1 (cid:1) (1 − δ/ ≥ − δ s +1 − ( δ/ √ − δ/ − δ √ ≥ − δ (1 + 2 − / + 1 / √
16 ) , which is certainly larger than 1 − . δ/ (1 − δ ). Thus we have checked C = 1 + θ . δ − δ . We insert this bound into (7.41), combine with the lower bound P ( Z ≥ x ) ≥ exp( − x ) / ( √ π ( x +1)) from Lemma 4.2, and obtain I ( h ) = e − hx + x / P ( Z ≥ x ) (cid:16) θ . δ − δ + θ . x + 1)e − s/ (cid:17) . (7.42)Because of x = δa = δ √ s/ (3 √ e ) we have7 . δ = 7 . x √ s/ (3 √ e) ≤ . x √ s . Moreover, 1 . √ s e − s/ ≤ s ≥
30. It follows that I ( h ) = e − hx + x / P ( Z ≥ x ) (cid:16) θ . x + 1)(1 − δ ) √ s (cid:17) . (7.43)Finally we add up the estimates (7.40) and (7.43) for I ( h ) and I ( h ), remember Eq. (7.37), andobtain P ( X ≥ x ) = e ˜ L ( x ) P ( Z ≥ x ) (cid:16) θ x + 1(1 − δ ) √ s (cid:16)
$127+113\,s\,\mathrm e^{-\frac12(1-\delta)^2 s^{1/3}}\bigr)\bigr)$. Because of $\delta=x/(\sqrt s/(3\sqrt{\mathrm e}))$, this completes the proof of the theorem when $s$ is even and $s\ge 30$. $\square$
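The proof of Lemma 7.20 rests on the Gaussian identity $E[\mathrm e^{-xZ}\,1\{Z\ge 0\}]=\mathrm e^{x^2/2}\,P(Z\ge x)$ for a standard normal $Z$ (the indicator set is reconstructed here as $\{Z\ge 0\}$). The following Python sketch, with helper names of our own choosing, checks the identity numerically:

```python
import math

def gaussian_tail(x):
    # P(Z >= x) for a standard normal Z, via the complementary error function
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def tilted_truncated_mean(x, n=100000, upper=40.0):
    # E[exp(-x Z) 1{Z >= 0}] evaluated by a composite midpoint rule on [0, upper]
    h = upper / n
    total = 0.0
    for i in range(n):
        z = (i + 0.5) * h
        total += math.exp(-x * z) * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return total * h
```

For each $x$, `tilted_truncated_mean(x)` agrees with `math.exp(x * x / 2) * gaussian_tail(x)` to within the quadrature error of the midpoint rule.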
Proof of Theorem 2.2 when $s<30$. If $s\le 30$, then $x\le\sqrt s/(3\sqrt{\mathrm e})\le 1.2$. By Lemma 7.18 and $s\le 30$, the correction $|\tilde L(x)|$ is bounded by an explicit small constant, and $P(Z\ge x)\ge P(Z\ge 1.2)$ is bounded away from zero. Since in addition $f(\delta,s)\,(x+1)/\sqrt s$ is large, we obtain
$$\mathrm e^{\tilde L(x)}\,P(Z\ge x)\Bigl(1+f(\delta,s)\,\frac{x+1}{\sqrt s}\Bigr)\ge 1\ge P(X\ge x).$$
Furthermore
$$\frac{P(X\ge x)}{\exp(\tilde L(x))\,P(Z\ge x)}-1\ge-1\ge-\,f(\delta,s)\,\frac{x+1}{\sqrt s}.$$
This completes the proof of the inequality. $\square$

8. Bounds for Weibull tails. Proof of Theorem 2.3
Recall that in Theorem 2.3 we assume condition $(S_\gamma)$, i.e. $|\kappa_j(X)|\le (j!)^{1+\gamma}/\Delta^{j-2}$ for $j\ge 3$. The main idea is to use Theorem 2.2, which uses condition $(S^*)$, i.e. $|\kappa_j(X)|\le (j-2)!/\Delta^{j-2}$ for $j\in\{3,\dots,s+2\}$. However, in general $(S_\gamma)$ does not imply $(S^*)$, but it does if $\Delta$ in $(S^*)$ is replaced by some smaller scale $\Delta_s$. For this purpose we set, for $s\in\mathbb N$,
$$\Delta_s:=\frac{\Delta}{6\,(s+2)^{\gamma}},\qquad(8.1)$$
where we choose $s$ depending on $\Delta$ and $\gamma$ later.

Lemma 8.1. Let $s\ge 4$. If $|\kappa_j|\le (j!)^{1+\gamma}/\Delta^{j-2}$ for all $j\ge 3$, then $|\kappa_j|\le (j-2)!/\Delta_s^{j-2}$ for all $j=3,\dots,s+2$.

Proof. We show $(j!)^{1+\gamma}\le (j-2)!\,\bigl(6(s+2)^{\gamma}\bigr)^{j-2}$ for $j=3,\dots,s+2$, or equivalently,
$$j(j-1)\,(j!)^{\gamma}\le\bigl(6(s+2)^{\gamma}\bigr)^{j-2}\qquad(j=3,\dots,s+2).$$
The proof is by a finite induction over $j$. For $j=3$, the inequality reads $6\cdot 6^{\gamma}\le 6(s+2)^{\gamma}$ and it holds true because of $s\ge 4$. For the induction step, we note that if the inequality holds true for $j-1$ with $3\le j-1<s+2$, then
$$j(j-1)(j!)^{\gamma}=\frac{j}{j-2}\,j^{\gamma}\,\bigl((j-1)(j-2)\,((j-1)!)^{\gamma}\bigr)\le 3\,(s+2)^{\gamma}\,\bigl(6(s+2)^{\gamma}\bigr)^{j-3}\le\bigl(6(s+2)^{\gamma}\bigr)^{j-2}.\qquad\square$$

Set
$$s=s_\gamma:=2\Bigl\lfloor\frac12\Bigl(\frac{\Delta^2}{6}\Bigr)^{1/(1+2\gamma)}\Bigr\rfloor-2.\qquad(8.2)$$
Before we can apply Theorem 2.2 with this $s$ and $\Delta_s$, we note some relations between $s$ and $\Delta_s$ resp. $\Delta$, and (below Lemma 8.2) a relation between $\Delta_s$ and $\Delta_\gamma$, where $\Delta_\gamma$ was given in (2.8) and appears in the formulation of Theorem 2.3. Since we will see that the statement of Theorem 2.3 is trivially true for $\Delta$ sufficiently small, some of the following estimates are stated for large values of $\Delta$ only.

Lemma 8.2.
The even integer $s$ defined in (8.2) satisfies $s\le 6\,\Delta_s^2$. If $\Delta^2\ge 6\cdot 20^{1+2\gamma}$, then in addition $s\ge 4$ and
$$\sqrt s\ge\sqrt{0.8}\,\Bigl(\frac{\Delta}{\sqrt 6}\Bigr)^{1/(1+2\gamma)}.\qquad(8.3)$$
In [112] it is claimed that the bound holds with the constant $0.95$ instead of $\sqrt{0.8}\approx 0.89$ (see [112, Eq. (2.64)]) for the same range of $\Delta$, which looks incorrect.

Proof. We note $s\cdot 6(s+2)^{2\gamma}\le 6(s+2)^{1+2\gamma}\le\Delta^2$, which gives $s\le 6\Delta_s^2$. For the lower bound, if $\Delta^2\ge 6\cdot 20^{1+2\gamma}$, then $(\Delta^2/6)^{1/(1+2\gamma)}\ge 20$, hence
$$s\ge\Bigl(\frac{\Delta^2}{6}\Bigr)^{1/(1+2\gamma)}-4\ge\Bigl(\frac{\Delta^2}{6}\Bigr)^{1/(1+2\gamma)}\Bigl(1-\frac{4}{20}\Bigr)=0.8\,\Bigl(\frac{\Delta}{\sqrt 6}\Bigr)^{2/(1+2\gamma)}.\qquad\square$$

We note that $\Delta_s$, which we introduced in order to use Theorem 2.2, and $\Delta_\gamma$, which was given in (2.8) and appears in Theorem 2.3, satisfy the following inequalities: $\Delta_s\ge$
$\frac16\,\Delta\,\bigl(\frac{\Delta^2}{6}\bigr)^{-\gamma/(1+2\gamma)}=\frac{1}{\sqrt6}\bigl(\frac{\Delta}{\sqrt6}\bigr)^{1/(1+2\gamma)}=\sqrt6\,\Delta_\gamma\ge 2\,\Delta_\gamma$. (8.4)
We also note that in (8.4) we have $\Delta_s=\sqrt6\,\Delta_\gamma$ if in the definition of $s$ in (8.2) we neglect the floor function. Further, an analogous computation, based on the lower bound $s+2\ge(\Delta^2/6)^{1/(1+2\gamma)}-2$, gives a matching upper bound
$$\frac{\Delta_s}{\Delta_\gamma}\le C(\gamma)\qquad(8.5)$$
with a constant $C(\gamma)$ that does not depend on $\Delta$. Together with Lemma 8.2, which reads $\sqrt s\ge 6\sqrt{0.8}\,\Delta_\gamma$, (8.5) implies that for $\Delta$ large enough the quantity $\Delta_s^2/s$ is bounded by some constant that does not depend on $\Delta$. We further note that from the definition of $s$ we have
$$s\le\Bigl(\frac{\Delta^2}{6}\Bigr)^{1/(1+2\gamma)}=36\,\Delta_\gamma^2.\qquad(8.6)$$
Next, let $\tilde\lambda_k$, $k\ge$
$3$, be the coefficients defined with the help of the truncated functions below Definition A.3, with $s$ given by (8.2). Set
$$\tilde L(x):=\sum_{j=3}^{\infty}\tilde\lambda_j\,x^j.\qquad(8.7)$$
The function appears naturally when applying Theorem 2.2 in the proof of Theorem 2.3. A first rough estimate for $\tilde L(x)$ is obtained as follows. Because of Lemma 8.1 we can apply the bound for $\tilde\lambda_j$ from Proposition A.5 (remember $\tilde\lambda_k=-\tilde b_{k-1}/k$) with $\Delta_s$ instead of $\Delta$, which gives
$$|\tilde\lambda_j|\le\frac{1}{0.6\,j(j-1)}\,\frac{1}{(0.3\,\Delta_s)^{j-2}}\quad\text{for all }j\ge 3.\qquad(8.8)$$
Consequently for $|x|\le 0.3\,\Delta_s$ we have
$$|\tilde L(x)|\le\frac{x^3}{0.3\,\Delta_s}\sum_{j=3}^{\infty}\frac{1}{0.6\,j(j-1)}=\frac{x^3}{0.36\,\Delta_s}\le\frac{x^3}{0.54\,\Delta_\gamma}.\qquad(8.9)$$
This bound should be compared to $|L(x)|\le C'x^3/\Delta$ in Theorem 2.1 and the corresponding bound $|\tilde L(x)|\le C\,x^3/\Delta$ in Theorem 2.2. Notice, however, that $x^3/\Delta_s$ can be fairly large. Therefore in some circumstances it can be useful to write $\tilde L(x)$ as a polynomial in $x$, obtained by truncating the Cramér-Petrov series, plus a correction term that is small when $x$ is small compared to $\Delta_s$.
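The rescaling in Lemma 8.1, which converts condition $(S_\gamma)$ into condition $(S^*)$ at the reduced scale $\Delta_s$, boils down to a purely combinatorial inequality between factorials. It can be checked numerically; in the Python sketch below, the normalization $\Delta_s=\Delta/(6(s+2)^{\gamma})$ is the one reconstructed in (8.1) and should be treated as an assumption of the sketch:

```python
import math

def lemma81_inequality(gamma, s):
    """Check (j!)^(1+gamma) <= (j-2)! * (6*(s+2)**gamma)**(j-2) for j = 3, ..., s+2.

    This is the inequality behind Lemma 8.1: it turns the Statulevicius
    condition (S_gamma) with scale Delta into (S*) with the reduced scale
    Delta_s = Delta / (6 * (s+2)**gamma)  (normalization assumed from (8.1))."""
    c = 6.0 * (s + 2) ** gamma
    for j in range(3, s + 3):
        lhs = math.factorial(j) ** (1.0 + gamma)
        rhs = math.factorial(j - 2) * c ** (j - 2)
        if lhs > rhs * (1.0 + 1e-9):   # tolerance: equality holds at j = 3, s = 4
            return False
    return True
```

For `s = 3` the check fails at `j = 3`, which illustrates why the lemma requires $s\ge 4$.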
Lemma 8.3.
Suppose that condition $(S_\gamma)$ holds true with $\Delta$ sufficiently large. Let $s$ be the integer from (8.2) and
$$m:=\min\Bigl(\Bigl\lceil\frac1\gamma\Bigr\rceil+1,\ s\Bigr).\qquad(8.10)$$
Then
$$\tilde L(x)=\sum_{j=3}^{m}\lambda_j x^j+\theta\,C_\gamma\Bigl(\frac{x}{\Delta_\gamma}\Bigr)^{m+1}\qquad(8.11)$$
for some constant $C_\gamma>0$ and all $x\in(0,\Delta_\gamma)$.

Proof. We use $\tilde\lambda_k=\lambda_k$ for $k\le s$ (see (A.8)) and split
$$\sum_{j=m+1}^{\infty}\tilde\lambda_j x^j=\sum_{j=m+1}^{s}\lambda_j x^j+\sum_{j=s+1}^{\infty}\tilde\lambda_j x^j.\qquad(8.12)$$
The first sum on the right-hand side is set to zero if $m=s$. For the second sum we use (8.8) and obtain
$$\sum_{j=s+1}^{\infty}|\tilde\lambda_j|\,x^j\le\sum_{j=s+1}^{\infty}\frac{1}{0.6\,j(j-1)}\,\frac{x^j}{(0.3\,\Delta_s)^{j-2}}\le\frac{0.15\,\Delta_s^2}{s}\,\frac{\bigl(x/(0.3\,\Delta_s)\bigr)^{s+1}}{1-x/(0.3\,\Delta_s)}.$$
The factor $\bigl(1-x/(0.3\,\Delta_s)\bigr)^{-1}$ is bounded for $x\in(0,\Delta_\gamma)$ by (8.4), and $\Delta_s^2/s$ is bounded by some constant that does not depend on $\Delta$ (see (8.5)). Hence, we get
$$\sum_{j=s+1}^{\infty}|\tilde\lambda_j|\,x^j\le C'_\gamma\Bigl(\frac{x}{\Delta_\gamma}\Bigr)^{s+1}\qquad(8.13)$$
for some constant $C'_\gamma>0$ and all $x\in(0,\Delta_\gamma)$. If $m=s$, then (8.11) follows, with $C_\gamma=C'_\gamma$.

If $m<s$, we use the bound from Proposition A.6, which yields
$$\sum_{j=m+1}^{s}|\lambda_j|\,x^j\le\sum_{j=m+1}^{s}\bigl((j+2)!\bigr)^{\gamma}\Bigl(\frac{15\,x}{\Delta}\Bigr)^{j-2}x^2\le\bigl((m+3)!\bigr)^{\gamma}\,(s+2)^{-(m-1)\gamma}\,x^2\,\frac{(2.5\,x/\Delta_s)^{m-1}}{1-2.5\,x/\Delta_s}\le\bigl((m+3)!\bigr)^{\gamma}\,\frac{\Delta_s^2}{6.25}\,(s+2)^{-(m-1)\gamma}\,\frac{(2.5\,x/\Delta_s)^{m+1}}{1-2.5\,x/\Delta_s}.\qquad(8.14)$$
Here we used $(j+2)!\le(m+3)!\,(s+2)^{j-m-1}$ and $15/\Delta=2.5/((s+2)^{\gamma}\Delta_s)$. Notice that, by (8.4) and the choice of constants, $2.5\,x/\Delta_s$ stays bounded away from $1$ for $x\in(0,\Delta_\gamma)$, hence the geometric series in the previous inequalities are indeed absolutely convergent. By Lemma 8.2, for large $s$ (i.e. large $\Delta$, see (8.2)), $s$ is of the order of $\Delta_s^2$, and if in addition $x$ is of the order of $\Delta_s$, then (8.14) is of the order of $\Delta_s^2\,(s+2)^{-(m-1)\gamma}$, which stays bounded because $(m-1)\gamma\ge 1$. It follows that for some constant $C''_\gamma>0$ and all $x\in(0,\Delta_\gamma)$,
$$\sum_{j=m+1}^{s}|\lambda_j|\,x^j\le C''_\gamma\Bigl(\frac{x}{\Delta_\gamma}\Bigr)^{m+1}.$$
We combine this bound with (8.13) and obtain (8.11) for $m<s$. The case $m=s$ was already addressed above; the proof is complete. $\square$

Proof of Theorem 2.3.
We start with the proof for large $\Delta$, say $\Delta^2\ge 6\cdot 20^{1+2\gamma}$. Let $\Delta_\gamma$ be as in (2.8), and let $\Delta_s$ and $s$ be as in Eqs. (8.1) and (8.2). By Lemma 8.2, we then have
$$\Delta_\gamma\le\frac{\sqrt s}{6\sqrt{0.8}}\le\frac{\sqrt s}{3\sqrt{\mathrm e}}.$$
For $\delta:=x/\Delta_\gamma$ we have $x=\delta\Delta_\gamma\le\delta\sqrt s/(3\sqrt{\mathrm e})$. Theorem 2.2 is stated for the equality $x=\delta\sqrt s/(3\sqrt{\mathrm e})$, but it extends to $x\le\delta\sqrt s/(3\sqrt{\mathrm e})$ because $f(\delta,s)$ is monotone increasing in $\delta$. Therefore we obtain
$$P(X\ge x)=\mathrm e^{\tilde L(x)}\,P(Z\ge x)\Bigl(1+\theta f(\delta,s)\,\frac{x+1}{\sqrt s}\Bigr)$$
with $\tilde L$ given in (8.7). Then
$$f(\delta,s)=\frac{1}{1-\delta}\Bigl(127+113\,s\,\mathrm e^{-\frac12(1-\delta)^2 s^{1/3}}\Bigr)\le\frac{1}{1-\delta}\Bigl(127+113\cdot 36\,\Delta_\gamma^2\,\exp\bigl(-(1-\delta)^2\sqrt{2\Delta_\gamma}\bigr)\Bigr),$$
where we used (8.6) for the first inequality. Hence we have
$$\frac{1}{\sqrt s}\,f(\delta,s)\le\frac{1}{\Delta_\gamma}\,\frac{1}{1-\delta}\Bigl(24+749\,\Delta_\gamma\exp\bigl(-(1-\delta)^2\sqrt{2\Delta_\gamma}\bigr)\Bigr)=:\frac{1}{\Delta_\gamma}\,g(\delta,\Delta_\gamma).$$
Then we obtain, for $\Delta$ large,
$$P(X\ge x)=\mathrm e^{\tilde L(x)}\,P(Z\ge x)\Bigl(1+\theta\,g(\delta,\Delta_\gamma)\,\frac{x+1}{\Delta_\gamma}\Bigr),\qquad x\in[0,\Delta_\gamma).\qquad(8.15)$$
We check that (8.15) is trivial for small $\Delta$ as follows. For $\Delta$ below the threshold, $0\le x<\Delta_\gamma$ is bounded by a small explicit constant, and by (8.9) the correction $|L_\gamma(x)|\le x^3/(0.54\,\Delta_\gamma)$ is bounded by a small constant as well, while
$$g(\delta,\Delta_\gamma)\,\frac{x+1}{\Delta_\gamma}\ge\frac{24}{\Delta_\gamma}\ge 1.$$
This implies
$$\bigl(1-\Phi(x)\bigr)\,\mathrm e^{L_\gamma(x)}\Bigl(1+g(\delta,\Delta_\gamma)\,\frac{x+1}{\Delta_\gamma}\Bigr)\ge 1\ge 1-F_X(x)$$
as well as
$$\frac{1-F_X(x)}{\mathrm e^{L_\gamma(x)}\bigl(1-\Phi(x)\bigr)}-1\ge-1\ge-\,g(\delta,\Delta_\gamma)\,\frac{x+1}{\Delta_\gamma}.$$
This proves that (8.15) is also true for small $\Delta$. The statement on $\tilde L_\gamma=\tilde L$ in the theorem follows from Eq. (8.9) and Lemma 8.3. $\square$

9. Berry-Esseen bound. Proof of Theorem 2.4
The proof of Theorem 2.4 is similar to the proof of the normal approximation of the tilted measure in the proof of Theorem 2.2, see Section 7.6. The primary ingredients are a smoothing inequality, e.g. Lemma 4.1, and bounds on the characteristic function of $X$. The bounds on characteristic functions are similar to those from Sections 7.4 and 7.5; the estimates are slightly easier because we do not need to take tilt parameters into account.

Proof of Theorem 2.4.
Replacing the exponentials by their second-order Taylor approximations, we immediately get $\bigl|E[\mathrm e^{\mathrm itX}]-\mathrm e^{-t^2/2}\bigr|\le t^2$ for all $t\in\mathbb R$, compare Section 7.5. Now suppose first that $\Delta$ is large. Define $\Delta_\gamma$, $s:=s_\gamma$, and $m_\gamma$ as in (2.8), and $\Delta_s=\Delta/(6(s+2)^{\gamma})$ as in (8.1). In Section 8 we showed that $s\le 6\Delta_s^2$ and $s\ge 4$; moreover $X$ satisfies condition $(S^*)$ with $\Delta_s$ instead of $\Delta$, i.e., $|\kappa_j|\le (j-2)!/\Delta_s^{j-2}$ for $j=3,\dots,s+2$. Let $\tilde\varphi(t)=\sum_{j=1}^{s}\kappa_j t^j/j!$. We split
$$\bigl|E[\mathrm e^{\mathrm itX}]-\mathrm e^{-t^2/2}\bigr|\le\bigl|E[\mathrm e^{\mathrm itX}]-\mathrm e^{\tilde\varphi(\mathrm it)}\bigr|+\bigl|\mathrm e^{\tilde\varphi(\mathrm it)}-\mathrm e^{-t^2/2}\bigr|.\qquad(9.1)$$
For $|t|\le\Delta_s$, we have
$$\Bigl|\tilde\varphi(\mathrm it)+\frac{t^2}{2}\Bigr|\le t^2\sum_{j=3}^{s}\frac{(|t|/\Delta_s)^{j-2}}{j(j-1)}\le\frac{|t|^3}{2\,\Delta_s}.$$
In the last inequality we have used $\sum_{j=3}^{\infty}1/[j(j-1)]=\sum_{j=2}^{\infty}(1/j-1/(j+1))=1/2$. For the first term on the right-hand side of (9.1), we use a couple of relations from Section 7.1, notably (7.5), involving the truncated moments $\tilde m_j$ and the functions $g_t(x)=\exp_s(tx)+x\,\tilde r(t)$, which give
$$\bigl|E[\mathrm e^{\mathrm itX}]-\mathrm e^{\tilde\varphi(\mathrm it)}\bigr|=\bigl|E[\mathrm e^{\mathrm itX}-g_{\mathrm it}(X)]\bigr|\le E\bigl|\mathrm e^{\mathrm itX}-\exp_s(\mathrm itX)\bigr|+|\tilde r(\mathrm it)|,$$
which is similar to (7.15) with $h=0$ (note $g_h(x)=1$ when $h=0$) and added expected values. Let us assume that $s\ge 30$, which is the case when $\Delta$ is large enough, so that the bounds from Sections 7.4 and 7.5 are applicable. Let $a:=\sqrt{s/(4\mathrm e)}$ and $\delta\in(0,1)$. Then, proceeding as in Lemma 7.8, we obtain the upper bound
$$\bigl|E[\mathrm e^{\mathrm itX}]-\mathrm e^{\tilde\varphi(\mathrm it)}\bigr|\le\frac{\sqrt2\,\delta^{\lfloor a\rfloor}}{1-\delta}\qquad(|t|\le\delta a).$$
We deduce
$$\bigl|E[\mathrm e^{\mathrm itX}]-\mathrm e^{-t^2/2}\bigr|\le\frac{\sqrt2\,\delta^{\lfloor a\rfloor}}{1-\delta}+\mathrm e^{-t^2/2}\bigl|\mathrm e^{|t|^3/(2\Delta_s)}-1\bigr|\qquad(|t|<\delta a).$$
A reasoning similar to Lemma 7.13 shows that the Kolmogorov distance between the normal law and the law of $X$ is bounded by some constant times $1/\sqrt s$, hence also by some constant times $1/\Delta_\gamma$:
$$\sup_{x\in\mathbb R}\bigl|P(X\ge x)-P(Z\ge x)\bigr|\le\frac{C}{\Delta_\gamma}.$$
This holds true if $\Delta$ is large enough so that $s=s_\gamma$ is larger than $30$, say $\Delta\ge\Delta_*$. For smaller $\Delta$, the bound is trivially true if we choose $C\ge\sup_{\Delta\le\Delta_*}\Delta_\gamma$. $\square$

Appendix A. Cramér-Petrov series
Here we recall some facts on the Cram´er-Petrov series, also called Cram´er series, for the reader’sconvenience. The series was introduced by Cram´er [21] and appeared in a limit theorem subse-quently improved by Petrov [101], see [102, Theorem 5.23] or [70, Chapter 8]. The recurrencerelation (A.2) below can be found in [112].
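Before turning to the series itself, it is useful to recall how cumulants are obtained from raw moments: inverting the standard recursion $m_n=\sum_{k=1}^{n}\binom{n-1}{k-1}\kappa_k\,m_{n-k}$ gives the cumulants one order at a time. The following Python sketch (a helper of ours, not from the sources cited here) checks the inversion on the Poisson distribution with unit mean, whose raw moments are the Bell numbers and whose cumulants all equal $1$:

```python
from math import comb

def cumulants_from_moments(moments):
    """Invert the moment-cumulant recursion
        m_n = sum_{k=1}^{n} C(n-1, k-1) * kappa_k * m_{n-k}
    to obtain cumulants (kappa_1, ..., kappa_n) from raw moments (m_1, ..., m_n)."""
    m = [1] + list(moments)  # m[0] = 1 by convention
    kappa = []
    for n in range(1, len(m)):
        acc = sum(comb(n - 1, k - 1) * kappa[k - 1] * m[n - k] for k in range(1, n))
        kappa.append(m[n] - acc)
    return kappa
```

For Poisson(1) the input moments are the Bell numbers $1,2,5,15,52,\dots$; for the standard normal, the moments $0,1,0,3,0,15$ return the cumulant sequence $0,1,0,0,0,0$.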
A.1.
When Cram´er’s condition is satisfied.
Even though we are primarily interested in heavy-tailed variables, the definition of the Cramér-Petrov series is best understood by looking first at a random variable that satisfies Cramér's condition. Thus let $X$ be a real-valued random variable such that $E[\exp(tX)]<\infty$ for all $t\in(-\Delta,\Delta)$, for some $\Delta>0$. Further assume that $X$ is not almost surely constant, so that the variance $\sigma^2$ is non-zero. Then the cumulant generating function $\varphi(t)=\log E[\exp(tX)]$ is analytic in some neighborhood of the origin and the Taylor expansion
$$\varphi(t)=\mu t+\frac12\,\sigma^2 t^2+\sum_{j=3}^{\infty}\frac{\kappa_j}{j!}\,t^j$$
has a strictly positive radius of convergence. Recall $I(x)=\sup_{t\in\mathbb R}\bigl(tx-\varphi(t)\bigr)$.

Proposition A.1.
Let $X$ be a real-valued random variable that is not almost surely constant. Assume $E[\exp(tX)]<\infty$ for all $t\in(-\Delta,\Delta)$ for some $\Delta>0$. Let $\mu=E[X]$ and $\sigma^2=V(X)$. Then the Taylor expansion of $I$ at $\mu=E[X]$ is of the form
$$I(\mu+\tau)=\frac{\tau^2}{2\sigma^2}-\sum_{j=3}^{\infty}\lambda_j\tau^j\qquad(A.1)$$
and has non-zero radius of convergence. The coefficients $(\lambda_j)_{j\ge 3}$ are given by $\lambda_k=-b_{k-1}/k$ with coefficients $(b_k)_{k\ge 1}$ computed recursively as follows: $b_1=1/\sigma^2$ and for all $k\ge 2$,
$$b_k=-\frac{1}{\sigma^2}\sum_{r=2}^{k}\frac{\kappa_{r+1}}{r!}\sum_{\substack{1\le j_1,\dots,j_r\le k-1\\ j_1+\cdots+j_r=k}}b_{j_1}\cdots b_{j_r}.\qquad(A.2)$$
The coefficients $b_k$ have a significance of their own: for small $t$,
$$\varphi'(t)=\mu+\tau\iff t=\frac{\tau}{\sigma^2}+\sum_{k=2}^{\infty}b_k\tau^k.$$
Proof.
The restriction of the cumulant generating function to $(-\Delta,\Delta)$ is strictly convex and in $C^\infty((-\Delta,\Delta))$; its derivative $\varphi'$ is a strictly increasing smooth bijection from $(-\Delta,\Delta)$ onto some open interval $(a,b)\subset\mathbb R$, and $\varphi''(t)>0$ on $(-\Delta,\Delta)$. Therefore the inverse map $(\varphi')^{-1}\colon(a,b)\to(-\Delta,\Delta)$ is smooth as well, i.e., $(\varphi')^{-1}\in C^\infty((a,b))$. By standard facts on Legendre transforms, setting $t_x:=(\varphi')^{-1}(x)$, we have for all $x\in(a,b)$
$$I(x)=t_x x-\varphi(t_x),\qquad\varphi'(t_x)=x,\qquad I'(x)=t_x,\qquad I''(x)=\frac{1}{\varphi''(t_x)}.$$
In view of $\varphi'(0)=\mu$ and $\varphi''(0)=\sigma^2$, we have $t_\mu=0$ and
$$I(\mu)=0,\qquad I'(\mu)=0,\qquad I''(\mu)=\frac{1}{\sigma^2},$$
a well-known property of the Cramér rate function $I(x)$. Consequently the Taylor series of $I(x)$ at $x=\mu$ is of the form (A.1) with
$$\lambda_j=-\frac{1}{j!}\,\frac{\mathrm d^j}{\mathrm dx^j}\,I(x)\Bigr|_{x=\mu}.$$
From the analyticity of the cumulant generating function $\varphi(t)$ at $t=0$, the fact $\varphi''(0)=\sigma^2\neq 0$, and the holomorphic inverse function theorem, we know that there exist complex open neighborhoods $U,V\subset\mathbb C$ of $t=0$ and $\mu$ respectively such that $\varphi'$ is a bijection from $U$ onto $V$ with holomorphic inverse $(\varphi')^{-1}\colon V\to U$. In particular, its Taylor series around $\mu$ has non-zero radius of convergence. Thus
$$t(\tau):=(\varphi')^{-1}(\mu+\tau)=\sum_{k=0}^{\infty}b_k\tau^k$$
for suitable coefficients $b_k\in\mathbb R$, and the series is absolutely convergent for sufficiently small $\tau$. The first two terms are easily seen to be $b_0=0$ and $b_1=1/\sigma^2$, thus
$$t(\tau)=\frac{\tau}{\sigma^2}+\sum_{k=2}^{\infty}b_k\tau^k.\qquad(A.3)$$
Therefore
$$I(\mu+\tau)=\int_0^\tau I'(\mu+u)\,\mathrm du=\int_0^\tau t(u)\,\mathrm du=\frac{\tau^2}{2\sigma^2}+\sum_{k=2}^{\infty}\frac{b_k\,\tau^{k+1}}{k+1},$$
hence the Taylor series of $I$ and $(\varphi')^{-1}$ around $\mu$ have the same radius of convergence; moreover
$$\lambda_k=-\frac{b_{k-1}}{k}\qquad(k\ge 3).$$
The coefficients $b_k$ from (A.3) are computed as follows: we must have
$$\mu+\tau=\varphi'\bigl(t(\tau)\bigr)=\mu+\sigma^2\Bigl(\sum_{k\ge 1}b_k\tau^k\Bigr)+\sum_{r=2}^{\infty}\frac{\kappa_{r+1}}{r!}\Bigl(\sum_{j=1}^{\infty}b_j\tau^j\Bigr)^{r}$$
for all sufficiently small $\tau$. The left- and right-hand sides are power series in $\tau$ and therefore must have all their coefficients equal. The coefficients of order zero and one are equal because $\mu=\mu$ and $1=b_1\sigma^2$. For orders $k\ge 2$, we obtain the equation
$$0=\sigma^2 b_k+\sum_{r=2}^{\infty}\frac{\kappa_{r+1}}{r!}\sum_{\substack{j_1,\dots,j_r\ge 1\\ j_1+\cdots+j_r=k}}b_{j_1}\cdots b_{j_r}.$$
In the second summand, because of $j_\ell\ge 1$, the only relevant contributions come from $r\le k$ and $1\le j_1,\dots,j_r\le k-1$, so the recurrence relation for the $b_k$'s follows. $\square$

An immediate consequence is the following: if $\sigma^2=1$, then each coefficient $\lambda_j$ is a polynomial in the cumulants $\kappa_3,\dots,\kappa_j$. Explicit formulas for the first few coefficients $b_k$ are given in [112, p. 19], see also [70, Eq. (7.2.20)]. It is instructive to work out an explicit bound on the radius of convergence.

Proposition A.2.
Assume $E[X]=0$, $V[X]=1$ and $|\kappa_j|\le (j-2)!/\Delta^{j-2}$ for all $j\ge 3$. Then the radius of convergence of the Cramér-Petrov series is at least $0.3\,\Delta$.

Notice $1/(3\sqrt{\mathrm e})\approx 0.20<0.3$; compare with the lower bound for the radius of convergence of the Cramér-Petrov series proven in [112].

Proof. We bound the radius of convergence of the series $\sum_k b_k\tau^k$ with the help of Lagrange inversion, a trick used for the virial expansion in classical statistical mechanics [81]. The radius of convergence of $\sum_j\kappa_j t^j$ is at least $\Delta$ (obvious), and for $|t|<\Delta$ we have
$$|\varphi''(t)|\le\frac{1}{1-|t|/\Delta}$$
and
$$\Bigl|\frac{\varphi'(t)}{t}-1\Bigr|\le\sum_{j=3}^{\infty}\frac{|\kappa_j|}{(j-1)!}\,|t|^{j-2}\le\sum_{j=3}^{\infty}\frac{|t|^{j-2}}{(j-1)\,\Delta^{j-2}}=\frac{1}{|t|/\Delta}\Bigl[-\log\Bigl(1-\frac{|t|}{\Delta}\Bigr)-\frac{|t|}{\Delta}\Bigr]=:\varepsilon\Bigl(\frac{|t|}{\Delta}\Bigr).\qquad(A.4)$$
By the Lagrange inversion formula [47, Appendix A.6], the coefficient of $\tau^k$ in the expansion of $t(\tau)$ is equal to $1/k$ times the coefficient of $t^{k-1}$ in the expansion of $(\varphi'(t)/t)^{-k}$, which we write as
$$b_k=[\tau^k]\,t(\tau)=\frac1k\,[t^{k-1}]\Bigl(\frac{\varphi'(t)}{t}\Bigr)^{-k}.\qquad(A.5)$$
Since coefficients of convergent series can be extracted by complex contour integrals, it follows that
$$b_k=\frac1k\,\frac{1}{2\pi\mathrm i}\oint\Bigl(\frac{\varphi'(t)}{t}\Bigr)^{-k}\frac{\mathrm dt}{t^{k}}$$
with the contour of integration a circle $|t|=r$ with $r\in(0,\Delta)$. It follows that
$$|b_k|\le\frac{1}{k\,r^{k-1}}\,\sup_{|t|=r}\Bigl|\frac{\varphi'(t)}{t}\Bigr|^{-k}\qquad(r\in(0,\Delta)),$$
which yields, together with (A.4),
$$|b_k|\le\frac{1}{k\,r^{k-1}\,\bigl(1-\varepsilon(r/\Delta)\bigr)^{k}}\qquad(r\in(0,\Delta)).$$
Let us choose $r=\Delta/2$. Then $\varepsilon(r/\Delta)=\varepsilon(\tfrac12)=2\bigl(\log 2-\tfrac12\bigr)=\log 4-1\approx 0.386\le 0.4$, hence
$$r\,\bigl(1-\varepsilon(r/\Delta)\bigr)\ge\frac{\Delta}{2}\,(1-0.4)=0.3\,\Delta$$
and
$$|b_k|\le\frac{r}{k\,\bigl(r\,[1-\varepsilon(\tfrac12)]\bigr)^{k}}\le\frac{\Delta}{2k\,(0.3\,\Delta)^{k}}.\qquad(A.6)$$
Therefore the radius of convergence of $\sum_k b_k t^k$ and $\sum_j\lambda_j t^j$ is at least $0.3\,\Delta$. $\square$

A.2.
When $X$ has moments up to order $s\ge 3$. More generally, we adopt the recurrence relation from Proposition A.1 as a definition of coefficients $b_k$ and $\lambda_k$.

Definition A.3. Fix $s\ge 3$. Let $X$ be a real-valued random variable with mean $E[X]=\mu$ and variance $\sigma^2=V(X)>0$. Assume $E[|X|^s]<\infty$. Then we define coefficients $b_1,\dots,b_{s-1}$ and $\lambda_3,\dots,\lambda_s$ as follows:
• $b_1:=1/\sigma^2$.
• $b_2,\dots,b_{s-1}$ are defined recursively by (A.2).
• $\lambda_k:=-b_{k-1}/k$ for $k=3,\dots,s$.
If $s=\infty$, the series $\sum_{j=0}^{\infty}\lambda_{j+3}\tau^j$ is called Cramér-Petrov series.

A substitute for Proposition A.1 is the following. Define
$$\tilde\varphi(t):=\sum_{j=1}^{s}\frac{\kappa_j}{j!}\,t^j,\qquad\tilde\kappa_j:=\begin{cases}\kappa_j,& j\le s,\\ 0,& j>s.\end{cases}$$
Let $(\tilde b_k)_{k\ge 1}$ and $(\tilde\lambda_k)_{k\ge 3}$ be the coefficients defined by $\tilde b_1=1/\sigma^2$, $\tilde\lambda_k=-\tilde b_{k-1}/k$, and the recurrence relation (A.2) with $\tilde\kappa_{r+1}$ instead of $\kappa_{r+1}$, or equivalently,
$$\tilde b_k=-\frac{1}{\sigma^2}\sum_{r=2}^{s-1}\frac{\kappa_{r+1}}{r!}\sum_{\substack{1\le j_1,\dots,j_r\le k-1\\ j_1+\cdots+j_r=k}}\tilde b_{j_1}\cdots\tilde b_{j_r}.\qquad(A.7)$$
A finite induction over $k$ shows that
$$\forall k\in\{1,\dots,s-1\}\colon\ \tilde b_k=b_k,\qquad\forall k\in\{3,\dots,s\}\colon\ \tilde\lambda_k=\lambda_k.\qquad(A.8)$$

Proposition A.4.
Under the assumptions of Definition A.3, there exist open intervals $(-\varepsilon,\varepsilon')\ni 0$ and $(\mu-\delta,\mu+\delta)$ such that the following holds true:
(a) $\tilde\varphi'$ is a bijection from $(-\varepsilon,\varepsilon')$ onto $(\mu-\delta,\mu+\delta)$.
(b) The series $\sum_{k=1}^{\infty}\tilde b_k\tau^k$ and $\sum_{k=3}^{\infty}\tilde\lambda_k\tau^k$ have radius of convergence larger than or equal to $\delta$.
(c) If $t\in(-\varepsilon,\varepsilon')$ and $\tau\in(-\delta,\delta)$, then
$$\tilde\varphi'(t)=\mu+\tau\iff t=\sum_{k=1}^{\infty}\tilde b_k\tau^k.$$
(d) For $x\in(\mu-\delta,\mu+\delta)$ and $t\in(-\varepsilon,\varepsilon')$ the solution of $\tilde\varphi'(t)=x$, we have
$$\tilde I(x):=tx-\tilde\varphi(t)=\frac{(x-\mu)^2}{2\sigma^2}-\sum_{j=3}^{\infty}\tilde\lambda_j\,(x-\mu)^j.$$

Proof. Parts (a) and (b) follow from the holomorphic inverse function theorem. The function $\tilde\varphi'\colon\mathbb C\to\mathbb C$ is a polynomial, so in particular holomorphic. Its derivative at $0$ is $\tilde\varphi''(0)=\sigma^2\neq 0$, and at $0$ it takes the value $\tilde\varphi'(0)=\mu$. Therefore there exist open neighborhoods $U,V\subset\mathbb C$ of $0$ and $\mu$, respectively, such that $\tilde\varphi'$, restricted to $U$, is a bijection from $U$ onto $V$ with holomorphic inverse $(\tilde\varphi')^{-1}$. Let $\delta>0$ be such that $B(\mu,\delta):=\{z\in\mathbb C\mid|z-\mu|<\delta\}$ is contained in $V$, and set $(-\varepsilon,\varepsilon'):=(\tilde\varphi')^{-1}((\mu-\delta,\mu+\delta))$. Then part (a) of the lemma is clearly satisfied. The Taylor series of $(\tilde\varphi')^{-1}$ at $\mu$ has radius of convergence at least $\delta$. A reasoning completely analogous to the proof of Proposition A.1 shows that the coefficients of the Taylor series are equal to $\tilde b_k$. Parts (b) and (c) of the lemma follow.
For (d), we note
$$\frac{\mathrm d}{\mathrm dx}\tilde I(x)=(\tilde\varphi')^{-1}(x)+\frac{x-\tilde\varphi'\bigl((\tilde\varphi')^{-1}(x)\bigr)}{\tilde\varphi''\bigl((\tilde\varphi')^{-1}(x)\bigr)}=(\tilde\varphi')^{-1}(x)$$
and conclude by an argument similar to the proof of Proposition A.1. $\square$

Eq. (A.6) in the proof of Proposition A.2 has a counterpart as well.
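The recurrence (A.2), its truncated variant (A.7), and the identity (A.8) are easy to check numerically. The following Python sketch (helper names are ours) uses the exponential distribution with unit mean, for which $\kappa_j=(j-1)!$, $t(\tau)=\tau/(1+\tau)$, hence $b_k=(-1)^{k-1}$ and $\lambda_k=(-1)^{k+1}/k$, consistent with $I(1+\tau)=\tau-\log(1+\tau)$:

```python
from fractions import Fraction
from itertools import product
from math import factorial

def petrov_b(kappa, kmax, s=None):
    """Coefficients b_1, ..., b_kmax from the recurrence (A.2).
    `kappa[j]` is the j-th cumulant (kappa[2] = sigma^2 > 0).  If `s` is given,
    cumulants of order > s are replaced by zero, yielding the truncated
    coefficients of Eq. (A.7)."""
    sigma2 = Fraction(kappa[2])
    b = {1: 1 / sigma2}
    for k in range(2, kmax + 1):
        total = Fraction(0)
        for r in range(2, k + 1):
            kap = kappa[r + 1] if (s is None or r + 1 <= s) else 0
            if kap == 0:
                continue
            # sum over compositions j_1 + ... + j_r = k with 1 <= j_i <= k-1
            for js in product(range(1, k), repeat=r):
                if sum(js) == k:
                    term = Fraction(kap, factorial(r))
                    for j in js:
                        term *= b[j]
                    total += term
        b[k] = -total / sigma2
    return b

# Exponential distribution with unit mean: kappa_j = (j-1)!.
kappa_exp = {j: factorial(j - 1) for j in range(2, 8)}
b_full = petrov_b(kappa_exp, 5)
b_trunc = petrov_b(kappa_exp, 5, s=5)   # keep only kappa_2, ..., kappa_5
```

Exact rational arithmetic via `fractions.Fraction` avoids floating-point drift in the nested products; the brute-force enumeration of compositions is exponential in $k$ and only meant for small orders. In accordance with (A.8), the truncated and full coefficients agree up to order $s-1$ and differ at order $s$.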
Proposition A.5.
Assume $E[X]=0$, $V(X)=1$, and $|\kappa_j|\le (j-2)!/\Delta^{j-2}$ for all $j=3,\dots,s$. Then
$$|\tilde b_k|\le\frac{\Delta}{2k\,(0.3\,\Delta)^{k}}\quad\text{for all }k\ge 1.$$

The proof is similar to the proof of Eq. (A.6) and is omitted. We conclude with a representation of the coefficients $\lambda_k$ needed in the proof of Proposition A.6 below. We assume $E[X]=0$ and $V(X)=1$. Eq. (A.8) and the analogue of (A.5) for $\tilde b_k$ instead of $b_k$ yield
$$\lambda_k=-\frac{\tilde b_{k-1}}{k}=-\frac{1}{k(k-1)}\,[t^{k-2}]\Bigl(\frac{\tilde\varphi'(t)}{t}\Bigr)^{-(k-1)}=-\frac{1}{k(k-1)}\,[t^{k-2}]\Bigl(1+\sum_{j=3}^{s}\frac{\kappa_j}{(j-1)!}\,t^{j-2}\Bigr)^{-(k-1)}\qquad(A.9)$$
for all $k\le s$.

A.3. Bounds under the Statulevičius condition.

Proposition A.6.
Under condition $(S_\gamma)$, the coefficients $\lambda_k$ of the Cramér-Petrov series $\sum_{k\ge 3}\lambda_k x^k$ satisfy
$$|\lambda_k|\le\frac{\bigl((k+2)!\bigr)^{\gamma}}{(\Delta/15)^{k-2}}\qquad(k\ge 3).$$
Proof.
By Eq. (A.9) (applied to s = k ), we have λ k = − k ( k −
1) [ t k − ] (cid:16) k X j =3 κ j ( j − t j − (cid:17) − ( k − . Let g k > | κ j | ≤ j ! /g j − k for all j = 3 , . . . , k ; a bound for g k is given shortly. ThenCauchy’s inequality yields | λ k | ≤ k ( k −
1) inf r r − ( k − (cid:16) − k X j =3 j (cid:0) rg k (cid:1) j − (cid:17) − ( k − . Strictly speaking, we should write ( ˜ ϕ ′ | U ) − , since ˜ ϕ ′ with domain C is not injective. The infimum is over intervals r ∈ [0 , α ] on which the denominator is non-zero. We write r = ρ g k ,bound the sum by a series, and obtain | λ k | ≤ g − ( k − k k ( k −
1) inf ρ ρ − ( k − (cid:16) − ∞ X j =3 jρ j − (cid:17) − ( k − . A numerical evaluation yieldsinf ρ ρ − (cid:16) − ∞ X j =3 jρ j − (cid:17) − ≃ . ≤ ρ ≃ . ≤ .
13, therefore | λ k | ≤ g − ( k − k k ( k −
1) 0 . · k − ≤ /g k ) ( k − k ( k − . (A.10)In order to get a bound on g k , we check that j ! / ( j − ≤ ( k + 2)! / ( k − ( j = 3 , . . . , k ) (A.11)or equivalently, j ! k − ≤ ( k + 2)! j − . The proof is by induction over k ≥ j at fixed j ≥
3. For k = j , the claim is obviously true. For the induction step, we note that for all k ≥ j ≥
3, we have j ! ≤ j j − hence j !( k + 3) j − ≤ j j − ( j + 3) j − ≤ j + 3 ≤ . Therefore, under the induction hypothesis ( k + 2)! j − ≥ j ! k − , we have( k + 3)! j − = ( k + 3) j − ( k + 2)! j − ≥ ( k + 3) j − j ! k − ≥ j ! k − . This completes the induction. Condition ( S γ ) and Eq. (A.11) yield | κ j | ≤ j ! /g j − k for all j ≤ k bychoosing 1 g k = 1∆ ( k + 2)! γ/ ( k − . We deduce from (A.10) that | λ k | ≤ k ( k −
1) (15 / ∆) k − ( k + 2)! γ ≤ (15 / ∆) k − ( k + 2)! γ . (cid:3) Acknowledgments.
We thank Zakhar Kabluchko, Christoph Thäle, and all members of the DFG scientific network Cumulants, concentration and superconcentration, as well as the participants of the workshops, for helpful discussions. This work is funded by the DFG (German Research Foundation) through the scientific network Cumulants, concentration and superconcentration – project number 318196255.
References [1] Y. Ameur, H. Hedenmalm, and N. Makarov. Fluctuations of eigenvalues of random normal matrices.
DukeMath. J. , 159(1):31–81, 07 2011.[2] N. N. Amosova. Necessity of the Cram´er, Linnik and Statuleviˇcius conditions for the probabilities of largedeviations.
Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI) , 260(Veroyatn. i Stat. 3):9–16, 317, 1999. Translated in J. Math. Sci. (New York) 109 (2002), no. 6, 2031–2036.[3] N. N. Amosova. On the necessity of the Statuleviˇcius condition in limit theorems for probabilities of largedeviations.
Liet. Mat. Rink. , 39(3):293–303, 1999. Translation in Lithuanian Math. J. 39 (1999), no. 3,231–239.[4] M. A. Arcones. Limit theorems for nonlinear functionals of a stationary Gaussian sequence of vectors.
Ann.Probab. , 22(4):2242–2274, 1994.[5] O. Arizmendi, T. Hasebe, F. Lehner, and C. Vargas. Relations between cumulants in noncommutative prob-ability.
Adv. Math. , 282:56–92, 2015.[6] R. Arratia, A. D. Barbour, and S. Tavar´e.
Logarithmic combinatorial structures: a probabilistic approach .EMS Monographs in Mathematics. European Mathematical Society (EMS), Z¨urich, 2003.[7] R. R. Bahadur and R. Ranga Rao. On deviations of the sample mean.
Ann. Math. Statist. , 31:1015–1027,1960.
[8] A. D. Barbour, E. Kowalski, and A. Nikeghbali. Mod-discrete expansions.
Probab. Theory Related Fields ,158(3-4):859–893, 2014.[9] O. E. Barndorff-Nielsen and D. R. Cox.
Asymptotic techniques for use in statistics . Monographs on Statisticsand Applied Probability. Chapman & Hall, London, 1989.[10] Yu. Baryshnikov and J. E. Yukich. Gaussian limits for random measures in geometric probability.
Ann. Appl.Probab. , 15(1A):213–253, 2005.[11] R. Bentkus and R. Rudzkis. On exponential estimates of the distribution of random variables.
LithuanianMath. J. , 20:15–30, 01 1980.[12] A. C. Berry. The accuracy of the Gaussian approximation to the sum of independent variates.
Trans. Amer. Math. Soc., 49:122–136, 1941. [13] B. Błaszczyszyn, D. Yogeshwaran, and J. E. Yukich. Limit theory for geometric statistics of point processes having fast decay of correlations.
Ann. Probab. , 47(2):835–895, 2019.[14] S. G. Bobkov. Closeness of probability distributions in terms of Fourier-Stieltjes transforms.
Uspekhi Mat.Nauk , 71(6(432)):37–98, 2016. translation in Russian Math. Surveys 71 (2016), no. 6, 1021–1079.[15] T. Bodineau, I. Gallagher, L. Saint-Raymond, and S. Simonella. Statistical dynamics of a hard sphere gas:fluctuating Boltzmann equation and large deviations. Online preprint arXiv:2008.10403, 2020.[16] G. Borot and A. Guionnet. Asymptotic expansion of β matrix models in the one-cut regime. Comm. Math.Phys. , 317(2):447–483, 2013.[17] D. R. Brillinger. Statistical inference for stationary point processes. In
Stochastic processes and related topics(Proc. Summer Res. Inst. Statist. Inference for Stochastic Processes, Indiana Univ., Bloomington, Ind.,1974, Vol. 1; dedicated to Jerzy Neyman) , pages 55–99, 1975.[18] W. Bryc. A remark on the connection between the large deviation principle and the central limit theorem.
Statist. Probab. Lett. , 18(4):253–256, 1993.[19] R. Chhaibi, J. Najnudel, and A. Nikeghbali. The circular unitary ensemble and the Riemann zeta function:the microscopic landscape and a new approach to ratios.
Invent. Math. , 207(1):23–113, 2017.[20] O. Costin and J. L. Lebowitz. Gaussian fluctuation in random matrices.
Phys. Rev. Lett. , 75:69–72, Jul 1995.[21] H. Cram´er. Sur un nouveau th´eor`eme-limite de la th´eorie des probabilit´es.
Actual. Sci. Industr. , 736:5–23,1938.[22] H. Cram´er and H. Touchette (translator). On a new limit theorem in probability theory. (translation of: Sur unnouveau th´eor`eme-limite de la th´eorie des probabilit´es). Electronic preprint arXiv:1802.05988v3 [math.HO],2018.[23] F. Delbaen, E. Kowalski, and A. Nikeghbali. Mod- ϕ convergence. Int. Math. Res. Not. IMRN , 2015(11):3445–3485, 2014.[24] A. Dembo and O. Zeitouni.
Large deviations techniques and applications , volume 38 of
Applications of Math-ematics (New York) . Springer-Verlag, New York, second edition, 1998.[25] D. Denisov, A. B. Dieker, and V. Shneer. Large deviations for random walks under subexponentiality: thebig-jump domain.
Ann. Probab. , 36(5):1946–1991, 2008.[26] H. Dette and D. Tomecki. Determinants of block Hankel matrices for random matrix-valued measures.
Sto-chastic Process. Appl. , 129(12):5200–5235, 2019.[27] R. L. Dobrushin and S. B. Shlosman. Completely analytical interactions: constructive description.
J. Statist.Phys. , 46(5-6):983–1014, 1987.[28] H. D¨oring and P. Eichelsbacher. Edge fluctuations of eigenvalues of Wigner matrices. In
High dimensionalprobability VI , volume 66 of
Progr. Probab. , pages 261–275. Birkh¨auser/Springer, Basel, 2013.[29] H. D¨oring and P. Eichelsbacher. Moderate deviations for the eigenvalue counting function of Wigner matrices.
ALEA Lat. Am. J. Probab. Math. Stat. , 10(1):27–44, 2013.[30] H. D¨oring and P. Eichelsbacher. Moderate deviations via cumulants.
J. Theoret. Probab. , 26(2):360–385, 2013.[31] P. Doukhan and M. H. Neumann. Probability and moment inequalities for sums of weakly dependent randomvariables, with applications.
Stochastic Processes and their Applications , 117(7):878–903, 2007.[32] J. Dousse and V. F´eray. Weighted dependency graphs and the Ising model.
Ann. Inst. Henri Poincar´e D ,6(4):533–571, 2019.[33] M. Duneau, D. Iagolnitzer, and B. Souillard. Decrease properties of truncated correlation functions andanalyticity properties for classical lattices and continuous systems.
Comm. Math. Phys., 31:191–208, 1973. [34] P. Eichelsbacher and L. Knichel. Moment estimates of Rosenthal type via cumulants, 2019. arXiv:1901.04865. [35] P. Eichelsbacher and L. Knichel. Fine asymptotics for models with Gamma type moments.
Random MatricesTheory Appl. , 10(1):2150007, 51, 2021.[36] P. Eichelsbacher, M. Raiˇc, and T. Schreiber. Moderate deviations for stabilizing functionals in geometricprobability.
Ann. Inst. Henri Poincar´e Probab. Stat. , 51(1):89–128, 2015.[37] P. Embrechts, C. Kl¨uppelberg, and T. Mikosch.
Modelling extremal events , volume 33 of
Applications ofMathematics (New York) . Springer-Verlag, Berlin, 1997. For insurance and finance.[38] N. M. Ercolani, S. Jansen, and D. Ueltschi. Singularity analysis for heavy-tailed random variables.
J. Theoret.Probab. , 32(1):1–46, 2019.[39] W. Feller.
An introduction to probability theory and its applications. Vol. II . Second edition. John Wiley &Sons, Inc., New York-London-Sydney, 1971. [40] M. Fenzl and G. Lambert. Precise deviations for disk counting statistics of invariant determinantal processes,2020.[41] V. F´eray. Weighted dependency graphs.
Electron. J. Probab. , 23:Paper No. 93, 65, 2018.[42] V. F´eray. Central limit theorems for patterns in multiset permutations and set partitions.
Ann. Appl. Probab. ,30(1):287–323, 2020.[43] V. F´eray, P.-L. M´eliot, and A. Nikeghbali.
Mod- φ convergence . SpringerBriefs in Probability and MathematicalStatistics. Springer, Cham, 2016. Normality zones and precise deviations.[44] V. F´eray, P.-L. M´eliot, and A. Nikeghbali. Mod- φ convergence, II: Estimates on the speed of convergence.In C. Donati-Martin, A. Lejay, and A. Rouault, editors, S´eminaire de Probabilit´es L , pages 405–477, Cham,2019. Springer International Publishing.[45] V. F´eray, P.-L. M´eliot, and A. Nikeghbali. Graphons, permutons and the thoma simplex: three mod-gaussianmoduli spaces.
Proceedings of the London Mathematical Society , 121(4):876–926, 2020.[46] R. A. Fisher and J. Wishart. The derivation of the pattern formulae of two-way partitions from those ofsimpler patterns.
Proc. London Math. Soc. (2) , 33(3):195–208, 1931.[47] P. Flajolet and R. Sedgewick.
Analytic combinatorics . Cambridge University Press, Cambridge, 2009.[48] S. Friedli and Y. Velenik.
Statistical mechanics of lattice systems: a concrete mathematical introduction .Cambridge University Press, 2017.[49] B. V. Gnedenko and A. N. Kolmogorov.
Limit distributions for sums of independent random variables .Translated from the Russian, annotated, and revised by K. L. Chung. With appendices by J. L. Doob and P.L. Hsu. Revised edition. Addison-Wesley Publishing Co., Reading, Mass.-London-Don Mills., Ont., 1968.[50] R. D. Gordon. Values of Mills’ ratio of area to bounding ordinate and of the normal probability integral forlarge values of the argument.
Ann. Math. Statist. , 12(3):364–366, 09 1941.[51] F. G¨otze, L. Heinrich, and C. Hipp. m -dependent random fields with analytic cumulant generating function. Scand. J. Statist. , 22(2):183–195, 1995.[52] J. Grote, Z. Kabluchko, and Ch. Th¨ale. Limit theorems for random simplices in high dimensions.
ALEA Lat.Am. J. Probab. Math. Stat. , 16(1):141–177, 2019.[53] J. Grote and Ch. Th¨ale. Concentration and moderate deviations for Poisson polytopes and polyhedra.
Bernoulli , 24(4A):2811–2841, 2018.[54] J. Grote and Ch. Th¨ale. Gaussian polytopes: a cumulant-based approach.
J. Complexity , 47:1–41, 2018.[55] R. Gr¨ubel and Z. Kabluchko. Edgeworth expansions for profiles of lattice branching random walks.
Ann. Inst.Henri Poincar´e Probab. Stat. , 53(4):2103–2134, 2017.[56] A. Gusakova and Ch. Th¨ale. The volume of simplices in high-dimensional poisson–delaunay tessellations.
Annales Henri Lebesgue , 4:121–153, 2021.[57] A. Hald. The early history of the cumulants and the gram-charlier series.
International Statistical Review ,68(2):137–153, 2000.[58] L. Heinrich. Some estimates of the cumulant-generating function of a sum of m -dependent random vectorsand their application to large deviations. Math. Nachr. , 120:91–101, 1985.[59] L. Heinrich. A method for the derivation of limit theorems for sums of weakly dependent random variables:a survey.
Optimization , 18(5):715–735, 1987.[60] L. Heinrich. Some bounds of cumulants of m -dependent random fields. Math. Nachr. , 149:303–317, 1990.[61] L. Heinrich. Large deviations of the empirical volume fraction for stationary Poisson grain models.
Ann. Appl.Probab. , 15(1A):392–420, 2005.[62] L. Heinrich. An almost-Markov-type mixing condition and large deviations for Boolean models in the line.
Acta Appl. Math. , 96(1-3):247–262, 2007.[63] L. Heinrich. On the strong Brillinger-mixing property of α -determinantal point processes and some applica-tions. Appl. Math. , 61(4):443–461, 2016.[64] L. Heinrich and W.-D. Richter. On moderate deviations of sums of m -dependent random vectors. Math.Nachr. , 118:253–263, 1984.[65] L. Heinrich and M. Spiess. Berry-Esseen bounds and Cram´er-type large deviations for the volume distributionof Poisson cylinder processes.
Lith. Math. J. , 49(4):381–398, 2009.[66] L. Heinrich and M. Spiess. Central limit theorems for volume and surface content of stationary Poissoncylinder processes in expanding domains.
Adv. in Appl. Probab. , 45(2):312–331, 2013.[67] L. Hofer. A central limit theorem for vincular permutation patterns.
Discrete Math. Theor. Comput. Sci. ,19(2):Paper No. 9, 26, 2017.[68] H.-K. Hwang. On convergence rates in the central limit theorems for combinatorial structures.
European J.Combin. , 19(3):329–343, 1998.[69] D. Iagolnitzer and B. Souillard. Lee-Yang theory and normal fluctuations.
Phys. Rev. B (3) , 19(3):1515–1518,1979.[70] I. A. Ibragimov and Yu. V. Linnik.
Independent and stationary sequences of random variables . Wolters-Noordhoff Publishing, Groningen, 1971. With a supplementary chapter by I. A. Ibragimov and V. V. Petrov,Translation from the Russian edited by J. F. C. Kingman.[71] G. Ivanoff. Central limit theorems for point processes.
Stochastic Process. Appl. , 12(2):171–186, 1982.[72] J. Jacod, E. Kowalski, and A. Nikeghbali. Mod-Gaussian convergence: new limit theorems in probability andnumber theory.
Forum Math. , 23(4):835–873, 2011.
[73] S. Janson. Normal convergence by higher semi-invariants with applications to sums of dependent random variables and random graphs. Ann. Probab., 16(1):305–312, 1988.
[74] Z. Kabluchko, V. Vysotsky, and D. Zaporozhets. Convex hulls of random walks: expected number of faces and face probabilities. Adv. Math., 320:595–629, 2017.
[75] R. S. Kallabis and M. H. Neumann. An exponential inequality under weak dependence. Bernoulli, 12(2):333–350, 2006.
[76] J. P. Keating and N. C. Snaith. Random matrix theory and ζ(1/2 + it). Comm. Math. Phys., 214(1):57–89, 2000.
[77] E. Kowalski and A. Nikeghbali. Mod-Poisson convergence in probability and number theory. Int. Math. Res. Not. IMRN, 2010(18):3549–3587, 2010.
[78] E. Kowalski and A. Nikeghbali. Mod-Gaussian convergence and the value distribution of ζ(1/2 + it) and related quantities. J. Lond. Math. Soc. (2), 86(1):291–319, 2012.
[79] G. Lambert. Limit theorems for biorthogonal ensembles and related combinatorial identities. Adv. Math., 329:590–648, 2018.
[80] G. Lambert. Mesoscopic fluctuations for unitary invariant ensembles. Electron. J. Probab., 23:33 pp., 2018.
[81] J. L. Lebowitz and O. Penrose. Convergence of virial expansions. J. Mathematical Phys., 5:841–847, 1964.
[82] J. L. Lebowitz, B. Pittel, D. Ruelle, and E. R. Speer. Central limit theorems, Lee-Yang zeros, and graph-counting polynomials. J. Combin. Theory Ser. A, 141:147–183, 2016.
[83] V. P. Leonov and A. N. Širjaev. On a method of semi-invariants. Theor. Probability Appl., 4:319–329, 1959. Translated from Teor. Verojatnost. i Primenen. (1959), 342–355.
[84] Yu. V. Linnik. Limit theorems for sums of independent variables, taking large deviations into account. II. Teor. Verojatnost. i Primenen., 6, 1961.
[85] Yu. V. Linnik. Limit theorems for the sums of independent variables taking into account the large deviations. I. Teor. Verojatnost. i Primenen., 6:145–163, 1961.
[86] Yu. V. Linnik. Limit theorems for sums of independent quantities, taking large deviations into account. III. Teor. Verojatnost. i Primenen., 7:121–134, 1962.
[87] J. Lukkarinen, M. Marcozzi, and A. Nota. Summability of connected correlation functions of coupled lattice fields. J. Stat. Phys., 171(2):189–206, 2018.
[88] D. M. Mason and H. H. Zhou. Quantile coupling inequalities and their applications. Probab. Surv., 9:439–479, 2012.
[89] P. McCullagh. Tensor methods in statistics. Monographs on Statistics and Applied Probability. Chapman & Hall, London, 1987.
[90] P.-L. Méliot and A. Nikeghbali. Mod-Gaussian convergence and its applications for models of statistical mechanics. In In memoriam Marc Yor—Séminaire de Probabilités XLVII, volume 2137 of Lecture Notes in Math., pages 369–425. Springer, Cham, 2015.
[91] M. Michelen and J. Sahasrabudhe. Central limit theorems and the geometry of polynomials. Online preprint arXiv:1908.09020 [math.PR], 2019.
[92] M. Michelen and J. Sahasrabudhe. Central limit theorems from the roots of probability generating functions. Adv. Math., 358:106840, 27, 2019.
[93] T. Mikosch and A. V. Nagaev. Large deviations of heavy-tailed sums with applications in insurance. Extremes, 1(1):81–110, 1998.
[94] M. Möhle and H. Pitters. Absorption time and tree length of the Kingman coalescent and the Gumbel distribution. Markov Process. Related Fields, 21(2):317–338, 2015.
[95] A. V. Nagaev. Local limit theorems with regard to large deviations when Cramér's condition is not satisfied. Litovsk. Mat. Sb., 8:553–579, 1968. Selected Transl. in Math. Stat. Probab. 11, 249–278 (1973).
[96] S. V. Nagaev. Some limit theorems for large deviations. Teor. Verojatnost. i Primenen., 10:231–254, 1965. English translation in Theor. Probability Appl. 10 (1965), 214–235.
[97] I. Nourdin and G. Peccati. Cumulants on the Wiener space. J. Funct. Anal., 258(11):3775–3791, 2010.
[98] G. Pan, S. Wang, and W. Zhou. Limit theorems for linear spectrum statistics of orthogonal polynomial ensembles and their applications in random matrix theory. J. Math. Phys., 58(10):103301, 2017.
[99] G. Peccati and M. S. Taqqu. Wiener chaos: moments, cumulants and diagrams, volume 1 of Bocconi & Springer Series. Springer, Milan; Bocconi University Press, Milan, 2011. A survey with computer implementation, supplementary material available online.
[100] R. Pemantle and M. C. Wilson. Analytic combinatorics in several variables, volume 140 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, Cambridge, 2013.
[101] V. V. Petrov. Generalization of Cramér's limit theorem. Uspehi Matem. Nauk (N.S.), 9(4(62)):195–202, 1954.
[102] V. V. Petrov. Limit theorems of probability theory, volume 4 of Oxford Studies in Probability. The Clarendon Press, Oxford University Press, New York, 1995. Sequences of independent random variables, Oxford Science Publications.
[103] H. Pitters. On the number of segregating sites, 2017. arXiv:1708.05634.
[104] H. Pitters. The number of cycles in a random permutation and the number of segregating sites jointly converge to the Brownian sheet, 2019. arXiv:1903.04906.
[105] B. Rider and B. Virág. The noise in the circular law and the Gaussian free field. Int. Math. Res. Not. IMRN, 2007:rnm006, 2007.
[106] H. Robbins. A remark on Stirling's formula. American Mathematical Monthly, 62:402–405, 1955.
[107] G.-C. Rota and J. Shen. On the combinatorics of cumulants. J. Combin. Theory Ser. A, 91(1-2):283–304, 2000. In memory of Gian-Carlo Rota.
[108] R. Rudzkis and A. Bakshaev. General theorems on large deviations for random vectors. Lith. Math. J., 57(3):367–390, 2017.
[109] R. Rudzkis, L. Saulis, and V. Statuljavičus. A general lemma on probabilities of large deviations. Litovsk. Mat. Sb., 18(2):99–116, 217, 1978. Translated in Lithuanian Math. J. 18 (1978), no. 2, 226–238 (1979).
[110] D. Ruelle. Statistical mechanics: Rigorous results. W. A. Benjamin, Inc., New York-Amsterdam, 1969.
[111] L. Saulis. Limit theorems that take into account large deviations in the case when Ju. V. Linnik's condition is satisfied. Litovsk. Mat. Sb., 13(4), 1973.
[112] L. Saulis and V. A. Statulevičius. Limit theorems for large deviations, volume 73 of Mathematics and its Applications (Soviet Series). Kluwer Academic Publishers Group, Dordrecht, 1991. Translated and revised from the 1989 Russian original.
[113] M. Schulte and Ch. Thäle. Cumulants on Wiener chaos: moderate deviations and the fourth moment theorem. J. Funct. Anal., 270(6):2223–2248, 2016.
[114] P. M. Schützenberger. Contribution aux applications statistiques de la théorie de l'information. Publ. Inst. Statist. Univ. Paris, 1954. Thèse d'État.
[115] A. D. Scott and A. D. Sokal. The repulsive lattice gas, the independent-set polynomial, and the Lovász local lemma. J. Stat. Phys., 118(5-6):1151–1261, 2005.
[116] A. Soshnikov. Level spacings distribution for large random matrices: Gaussian fluctuations. Annals of Mathematics, 148(2):573–617, 1998.
[117] A. Soshnikov. The central limit theorem for local linear statistics in classical compact groups and related combinatorial identities. Ann. Probab., 28(3):1353–1370, 2000.
[118] A. Soshnikov. Gaussian limit for determinantal random point fields. Ann. Probab., 30(1):171–187, 2002.
[119] T. P. Speed. Cumulants and partition lattices. Australian Journal of Statistics, 25(2):378–388, 1983.
[120] V. A. Statulevičius. On large deviations. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete, 6:133–144, 1966.
[121] B. Sturmfels and P. Zwiernik. Binary cumulant varieties. Annals of Combinatorics, 17(1):229–250, 2013.
[122] T. N. Thiele. Forelæsninger over almindelig iagttagelseslære: Sandsynlighedsregning og mindste kvadraters methode. Reitzel, Copenhagen, 1889.
[123] B. Tsirelson. Bernstein inequality. Encyclopedia of Mathematics, http://encyclopediaofmath.org/index.php?title=Bernstein_inequality&oldid=15217, 2012. Adapted from an original article by A. V. Prokhorov, N. P. Korneichuk, V. P. Motornyi (originator), which appeared in Encyclopedia of Mathematics, ISBN 1402006098.
[124] W. Wolf. Asymptotische Entwicklungen für Wahrscheinlichkeiten grosser Abweichungen. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete, 40(3):239–256, 1977.
[125] V. M. Zolotarev. On the closeness of the distributions of two sums of independent random variables. Teor. Verojatnost. i Primenen., 10:519–526, 1965.
[126] V. M. Zolotarev. A sharpening of the inequality of Berry-Esseen. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete, 8:332–342, 1967.
[127] P. Zwiernik. Semialgebraic statistics and latent tree models, volume 146 of Monographs on Statistics and Applied Probability. Chapman & Hall/CRC, Boca Raton, FL, 2016.

(Hanna Döring)
Institut für Mathematik, Universität Osnabrück, Albrechtstr. 28a, 49076 Osnabrück, Germany
Email address: [email protected]

(Sabine Jansen) Mathematisches Institut, Ludwig-Maximilians-Universität München, Theresienstr. 39, 80333 Munich, Germany; Munich Center for Quantum Science and Technology (MCQST), Schellingstr. 4, 80799 Munich, Germany
Email address: [email protected]

(Kristina Schubert) Institut für Mathematik, Universität Osnabrück, Albrechtstr. 28a, 49076 Osnabrück, Germany
Email address: