New Edgeworth-type expansions with finite sample guarantees
arXiv preprint [math.ST]
Mayya Zhilova∗
School of Mathematics, Georgia Institute of Technology, Atlanta, GA 30332-0160, USA
e-mail: [email protected]
Abstract:
We establish Edgeworth-type expansions for the difference between probability distributions of sums of independent random vectors in a Euclidean space. The derived bounds are uniform over two classes of sets: the set of all Euclidean balls and the set of all half-spaces. These results allow us to account for the impact of higher-order moments or cumulants of the considered distributions; the derived error terms depend explicitly on the sample size and the dimension. We compare these results with known Berry–Esseen inequalities and show how the new bounds can improve on the accuracy of the normal approximation. We apply the new bounds to the linear regression model and the smooth function model, and examine how the accuracy of the normal approximation in these models depends on higher-order moments, the dimension, and the sample size. This is a short version of the original paper.
MSC 2010 subject classifications:
Primary 62E17; secondary 62E99.
Keywords and phrases:
Edgeworth series expansion, dependence on dimension, higher-order accuracy, linear regression model, smooth function model.
1. Introduction
The Edgeworth series was introduced by Edgeworth [11, 12] and Chebyshev [6], and developed by Cramér [10]. Since that time, the Edgeworth expansion has become one of the major asymptotic techniques for approximating a c.d.f. or a p.d.f. In particular, the Edgeworth expansion is a powerful instrument for establishing rates of convergence in the CLT and for studying the accuracy of the bootstrap (see, for example, Hall [14], Mammen [17], Lahiri [16]).

In this paragraph we recall some basic facts about the Edgeworth series that are useful for comparison with the results in this paper. The properties listed here can be found in Chapter 5 of the monograph by Hall [14] (see also Bhattacharya and Rao [5], Kolassa [15], and Skovgaard [19]). Let S_n := n^{-1/2} \sum_{i=1}^n X_i for i.i.d. R^d-valued random vectors {X_i}_{i=1}^n with E X_i = 0, Σ := Var X_i, and E|X_i^{⊗(k+2)}| < ∞. Let A denote a class of sets A ⊆ R^d satisfying

    sup_{A ∈ A} ∫_{(∂A)^ε} φ(x) dx = O(ε), ε ↓ 0,   (1.1)

where φ(x) is the p.d.f. of N(0, I_d), and (∂A)^ε denotes the set of points distant no more than ε from the boundary ∂A of A. This condition holds for any measurable convex set in R^d. Let also ψ(t) := E e^{i t^T X}. If the Cramér condition

    lim sup_{||t|| → ∞} |ψ(t)| < 1   (1.2)

is fulfilled, then

    P(S_n ∈ A) = ∫_A { φ_Σ(x) + \sum_{j=1}^k n^{-j/2} P_j(-φ_Σ : {κ_j})(x) } dx + o(n^{-k/2})   (1.3)

for n → ∞. The remainder term equals o(n^{-k/2}) uniformly in A ∈ A; φ_Σ(x) denotes the p.d.f. of N(0, Σ), κ_j are cumulants of X, and P_j(-φ_Σ : {κ_j})(x) is a density of a signed measure, recovered from the series expansion of the characteristic function of X using the inverse Fourier transform.

∗ Support by the National Science Foundation grant DMS-1712990 is gratefully acknowledged.
In the multivariate case, the calculation of an expression for P_j(x) for large j is rather involved, since the number of terms included in it grows with j, and it requires generalized multivariate cumulants (see McCullagh [18]). Expansion (1.3) does not hold for arbitrary random variables; in particular, Cramér's condition (1.2) holds if the probability distribution of X has a nondegenerate absolutely continuous component. Condition (1.1) does not take into account dependence on the dimension d. Indeed, if d is not reduced to a generic constant, then the right-hand side of (1.1) depends on d in different ways for major classes of sets. Let us refer to the series of recent works by Chernozhukov et al. [7, 8, 9] and Belloni et al. [1], where the authors established normal approximation results and bootstrap accuracy in a nonasymptotic and very high-dimensional setting. Their results include anti-concentration properties of important classes of sets. Returning to the Edgeworth series (1.3): due to its asymptotic nature for probability distribution functions, this kind of expansion is typically used in the asymptotic framework (i.e., for n → ∞) without taking into account the dependence of the remainder term o(n^{-k/2}) on the dimension. To the best of our knowledge, there have been no studies so far on the accuracy of Edgeworth expansions in the finite sample multivariate setting. In the present paper, we consider this framework and establish approximation bounds of type (1.3) with explicit dependence on the dimension d and the sample size n; this is useful for numerous modern applications, where it is important to track the dependence of error terms on d and n. Furthermore, these results allow us to account for the impact of higher-order moments of the considered distributions, which is important for deriving approximation bounds with higher-order accuracy.
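As a small numerical illustration of expansion (1.3) in dimension d = 1 (not part of the paper's results), the sketch below compares the plain normal approximation with the first-order Edgeworth correction Φ(x) - n^{-1/2} (κ_3/6)(x^2 - 1) φ(x). The choice X_i = Exp(1) - 1 (so κ_3 = 2), the evaluation point x = 0, and the Simpson-rule Gamma c.d.f. are our own assumptions: the sum of n Exp(1) variables is Gamma(n, 1), so P(S_n ≤ x) is computable exactly without simulation.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def gamma_cdf(upper, shape, steps=20_000):
    """Gamma(shape, rate=1) c.d.f. on [0, upper] via Simpson's rule."""
    if upper <= 0.0:
        return 0.0
    h = upper / steps
    def pdf(t):
        if t <= 0.0:
            return 0.0
        return math.exp((shape - 1.0) * math.log(t) - t - math.lgamma(shape))
    s = pdf(0.0) + pdf(upper)
    for k in range(1, steps):
        s += (4.0 if k % 2 else 2.0) * pdf(k * h)
    return s * h / 3.0

# S_n = n^{-1/2} * sum of n centered Exp(1) variables; third cumulant kappa_3 = 2.
n, x = 100, 0.0
exact = gamma_cdf(n + math.sqrt(n) * x, n)   # P(S_n <= x), exact up to quadrature error
normal = norm_cdf(x)                         # plain CLT approximation
edgeworth = normal - 2.0 / (6.0 * math.sqrt(n)) * (x * x - 1.0) * norm_pdf(x)

print(abs(exact - normal), abs(exact - edgeworth))
```

At x = 0 the Edgeworth correction removes the O(n^{-1/2}) skewness error, leaving a much smaller residual than the plain normal approximation.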
We establish expansions for the difference between the probability distributions of S_n := n^{-1/2} \sum_{i=1}^n X_i for independent random vectors {X_i} and N(0, Var S_n). The bounds are uniform over two classes of subsets of R^d: the set B of all Euclidean balls and the set H of all half-spaces. These classes of sets are useful when one works with linear or quadratic approximations of a smooth function of S_n; they are also useful for the construction of confidence sets based on linear contrasts, and for elliptical confidence regions. We consider these examples in Section 4. The established bounds are also helpful for approximating the distributions of χ^2- and F-test statistics.

In Section 2, we consider second-order expansions for i.i.d. {X_i}. For the class B, the approximation error is ≤ C n^{-1/2} R_3 + C (d^2/n)^{1/2} + C d^2/n, where R_3 is a sublinear function of the third moment E X^{⊗3}, and R_3 ≤ C · #{nonzero elements of E X^{⊗3}}. In Lemma 2.1 it is shown that the relation d = o(n^{1/2}) as n → ∞ is necessary for consistency of the expansion when E X^{⊗3} = 0. See also Remark 2.1, where we compare this bound with the Berry–Esseen inequality by Bentkus [2]. The analogous second-order expansion for the class H is constructed in Theorem 2.2; in Remark 2.2 we explain how this bound can improve on the classical Berry–Esseen inequality. In Proposition 2.1 we establish similar bounds between S_n and S_{T,n} := n^{-1/2} \sum_{i=1}^n T_i for i.i.d. random vectors {T_i} with the same mean and covariance matrix as the X_i's; here the error term includes a sublinear function of the difference E X^{⊗3} − E T^{⊗3}. In Section 3 we consider expansions of an arbitrary order K, where the expansions' terms depend sublinearly on differences between higher-order moments of the compared distributions. Therefore, the accuracy of these approximation bounds incorporates information about the closeness of the higher-order moments or cumulants.
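To make the rate comparison concrete, the following sketch (our own arithmetic, with all generic constants set to 1 purely for illustration) evaluates the d-dependent terms of the new second-order bound, (d^2/n)^{1/2} + d^2/n, against the Berry–Esseen-type term (d^3/n)^{1/2}, on a hypothetical (d, n) grid:

```python
import math

def new_terms(d, n):
    # d-dependent part of the second-order bound when E X^{x3} = 0
    # (generic constants dropped)
    return math.sqrt(d ** 2 / n) + d ** 2 / n

def berry_esseen(d, n):
    # d-dependent part of a Berry-Esseen-type bound, (d^3/n)^{1/2}
    # (generic constants dropped)
    return math.sqrt(d ** 3 / n)

for d, n in [(1, 10_000), (10, 10_000), (50, 10_000), (100, 100_000)]:
    print(d, n, new_terms(d, n), berry_esseen(d, n))
```

The ratio new/BE equals d^{-1/2} + (d/n)^{1/2}, so the new terms win precisely when d is large while d^2/n remains small, which matches the discussion in Remark 2.1.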
In Section 3 we also study the case when the summands are independent but not necessarily identically distributed. In Section 4 we apply the established inequalities to two popular models. The first example is concerned with the distribution of the least squares estimator in the linear regression model. In the second example, we consider the smooth function model and the accuracy of the normal approximation in it. Proofs of the main results are given in a supplementary material.

Let us emphasize that the derived expansions impose considerably weaker conditions on the probability distributions of X_i than the Edgeworth expansions (1.3), since our results do not require Cramér's condition (1.2) to be fulfilled, and they assume a smaller number of finite moments. Furthermore, the generic constants in our results do not depend on d and n, which allows us to track the dependence of the error terms on them. To the best of our knowledge, there have been no such results obtained so far.

For X = (x_1, ..., x_d)^T ∈ R^d, ||X|| denotes the Euclidean norm; E|X^{⊗k}| < ∞ denotes that E|x_{i_1} ··· x_{i_k}| < ∞ for all i_1, ..., i_k ∈ {1, ..., d}. For d × d matrices, we denote their spectral norm by ||·||. For A ∈ R^{d^{⊗k}}, the operator norm (for k ≥ 3) is denoted by ||A||_op := sup{⟨A, γ_1 ⊗ ··· ⊗ γ_k⟩ : ||γ_j|| = 1, γ_j ∈ R^d, j = 1, ..., k}, and ||A||_max := max{|a_{i_1,...,i_k}| : i_1, ..., i_k ∈ {1, ..., d}}. C, c denote positive generic constants. The abbreviation PD denotes "positive definite".
2. Second order expansions
Denote, for random vectors X, Y in R^d,

    ∆_B(X, Y) := sup_{r ≥ 0, t ∈ R^d} |P(||X − t|| ≤ r) − P(||Y − t|| ≤ r)|.   (2.1)

Theorem 2.1.
Let {X_i}_{i=1}^n be i.i.d. R^d-valued random vectors with E|X_i^{⊗4}| < ∞. Let Σ := Var X_i be PD; without loss of generality, assume that E X_i = 0. Let also Z_Σ ∼ N(0, Σ) in R^d, and Y := Σ^{-1/2} X; then it holds that

    ∆_B(S_n, Z_Σ) ≤ c_1 n^{-1/2} R_3 + C ||Σ^{-1}|| ||Σ|| { c_2 E||Y||^4 + d + 2d^2 }^{1/2} n^{-1/2} + { c_3 E||Y||^4 + c_4 d + 2d^2 } n^{-1},   (2.2)

where R_3 is a sublinear function of E(Σ^{-1/2}X)^{⊗3} such that in general |R_3| ≤ E||Σ^{-1/2}X||^3; if the number of nonzero elements of E(Σ^{-1/2}X)^{⊗3} is ≤ d and ||E(Σ^{-1/2}X)^{⊗3}||_max ≤ m_3, which takes place, for example, if X is symmetrically distributed or if its coordinates are mutually independent, then

    |R_3| ≤ m_3 d.   (2.3)

The generic constant C ≥ 1 and the numeric constants c_1, ..., c_4 are independent of d, n, and the probability distribution of X_i (the detailed definitions of R_3 and of the constants are given in the proof).

Corollary 1.
Denote the j-th coordinate of Y = Σ^{-1/2}X by Y_j. Suppose additionally that for some m_4 ≥ 1, E Y_j^4 ≤ m_4 for all j = 1, ..., d; then

    ∆_B(S_n, Z_Σ) ≤ c_1 n^{-1/2} R_3 + C ||Σ^{-1}|| ||Σ|| { (c_2 m_4 + 1) d + 2d^2 }^{1/2} n^{-1/2} + { (c_3 m_4 + c_4) d + c_5 d^2 } n^{-1}.   (2.4)

Lemma 2.1.
There exists a probability distribution of X_i satisfying the conditions of Theorem 2.1 with E X^{⊗3} = 0 such that the relation d = o(n^{1/2}) as n → ∞ is necessary for lim_{n→∞} ∆_B(S_n, Z_Σ) = 0.

Remark 2.1. The Berry–Esseen inequality by Bentkus [2] shows that for Σ = I_d and E||X||^3 < ∞, ∆_B(S_n, Z_Σ) ≤ c E||X||^3 n^{-1/2}. The term C n^{-1/2} R_3 in Theorem 2.1 is upper-bounded by C E||Σ^{-1/2}X||^3 n^{-1/2}, which is analogous to the error term in the inequality by Bentkus [2] (but has an explicit constant). However, R_3 is a sublinear function of the third moment of Σ^{-1/2}X and, therefore, it can be considerably smaller than the third moment of the Euclidean norm ||Σ^{-1/2}X||. Corollary 1 shows that the error term in Theorem 2.1 depends on d and n as n^{-1/2} C R_3 + C_{m_4,Σ} ((d^2/n)^{1/2} + d^2/n), which can improve on the Berry–Esseen approximation error C_{m,Σ} (d^3/n)^{1/2} in terms of the ratio between d and n if, for example, R_3 ≤ cd (see inequality (2.3) and Lemma 2.1). Theorem 2.1 imposes a stronger moment assumption than the Berry–Esseen bound by Bentkus [2], since the latter inequality assumes only 3 finite moments of ||X_i||. However, the theorems considered in this paper require much weaker conditions than the Edgeworth expansions (1.3), which would assume in general at least 5 finite moments of ||X_i|| and the Cramér condition (1.2).

Below we consider the uniform distance between the probability distributions of S_n and Z_Σ over the set of all half-spaces in R^d:

    ∆_H(S_n, Z_Σ) := sup_{x ∈ R, γ ∈ R^d} |P(γ^T S_n ≤ x) − P(γ^T Z_Σ ≤ x)|.   (2.5)

Let A_3 : (R^d)^3 → R and A_4 : (R^d)^4 → R denote the multilinear forms such that A_3(x_1, x_2, x_3) = E \prod_{j=1}^3 x_j^T Y and A_4(x_1, ..., x_4) = E \prod_{j=1}^4 x_j^T Y for all x_1, ..., x_4 ∈ R^d.

Theorem 2.2.
Given the conditions of Theorem 2.1, it holds that

    ∆_H(S_n, Z_Σ) ≤ c_1 ||A_3||_op n^{-1/2} + C { ||A_4||_op + 6 }^{1/2} n^{-1/2} + { c_2 ||A_4||_op + c_3 } n^{-1},

where C ≥ 1 is a generic constant independent of d, n, and the probability distribution of X_i.

Remark 2.2. Recalling the arguments in Remark 2.1, the term c_1 ||A_3||_op n^{-1/2} in the latter statement depends on the third moment of Y sublinearly, and it equals zero when E X^{⊗3} = 0. Furthermore, the classical Berry–Esseen theorem by Berry [3] and Esseen [13] (which requires E||X_i||^3 < ∞) gives an error term ≤ c ||A_4||_op^{3/4} n^{-1/2} that is ≥ (||A_4||_op / n)^{1/2} because ||A_4||_op ≥ 1. This shows that Theorem 2.2 can have better accuracy than the result for ∆_H implied by the classical Berry–Esseen inequality when, for example, E X^{⊗3} = 0 and ||A_4||_op is rather big (e.g., for the logistic or von Mises distributions).

The following statement is an extension of Theorems 2.1 and 2.2 to a general approximating distribution with the same first two moments as the X_i's. Here the new terms C n^{-1/2} R_{3,T} and C ||A_{3,T}||_op n^{-1/2} in the approximation errors depend on the difference E(Σ^{-1/2}X)^{⊗3} − E(Σ^{-1/2}T)^{⊗3} sublinearly; in general, n^{-1/2} R_{3,T} ≤ C (d^3/n)^{1/2}.

Proposition 2.1.
Let {X_i}_{i=1}^n satisfy the conditions of Theorem 2.1; let also {T_i}_{i=1}^n be i.i.d. random vectors in R^d with E T_i = 0, Var T_i = Σ, and E|T_i^{⊗4}| < ∞. Denote L_4 := E||Σ^{-1/2}X||^4 + E||Σ^{-1/2}T||^4 and S_{T,n} := n^{-1/2} \sum_{i=1}^n T_i. It holds that

    ∆_B(S_n, S_{T,n}) ≤ c_1 n^{-1/2} R_{3,T} + C ||Σ^{-1}|| ||Σ|| { c_2 L_4 + 2d + 4d^2 }^{1/2} n^{-1/2} + c_3 L_4 n^{-1}.

Let also A_{3,T} : (R^d)^3 → R and A_{4,T} : (R^d)^4 → R denote the multilinear forms such that A_{3,T}(x_1, x_2, x_3) = E \prod_{j=1}^3 x_j^T Σ^{-1/2}X − E \prod_{j=1}^3 x_j^T Σ^{-1/2}T and A_{4,T}(x_1, ..., x_4) = E \prod_{j=1}^4 x_j^T Σ^{-1/2}T for all x_1, ..., x_4 ∈ R^d; then

    ∆_H(S_n, S_{T,n}) ≤ c_1 ||A_{3,T}||_op n^{-1/2} + C { (||A_4||_op + ||A_{4,T}||_op) + 12 }^{1/2} n^{-1/2} + c_2 (||A_4||_op + ||A_{4,T}||_op) n^{-1}.

Let ∆_K(X, Y) denote the uniform or Kolmogorov metric for random variables on the real line: ∆_K(X, Y) := sup_{x ∈ R} |P(X ≤ x) − P(Y ≤ x)|. The following statement shows how Proposition 2.1 reads in the one-dimensional case. It is derived similarly to Theorem 2.2 and Proposition 2.1, because one can take d = γ = 1.

Lemma 2.2.
Let {X_i}_{i=1}^n satisfy the conditions of Theorem 2.1 for d = 1. Let also σ^2 := Var X_i > 0. Suppose that {T_i}_{i=1}^n are i.i.d. real-valued random variables such that the conditions of Proposition 2.1 are fulfilled. Let also L_4 := σ^{-4}(E X^4 + E T^4); then it holds that

    ∆_K(S_n, S_{T,n}) ≤ c_1 σ^{-3} |E X^3 − E T^3| n^{-1/2} + C { L_4 + 12 }^{1/2} n^{-1/2} + c_2 L_4 n^{-1}.

Remark 2.3. For any m_2 > 0 and m_3 ∈ R there exists an explicit random variable T_i such that E T_i = 0, E T_i^2 = m_2, E T_i^3 = m_3, and E T_i^4 < ∞. Indeed, if m_3 = 0, one can take T_i symmetrically distributed around 0 with a finite 4-th moment. If m_3 > 0, the centered Gamma(α, β) distribution with shape α = 4 m_2^3 m_3^{-2} and rate β = 2 m_2 m_3^{-1} leads to the required distribution; or, for instance, a centered Lognormal with appropriate parameters. In case m_3 < 0, one can take T_i := −T̃_i, where T̃_i is a solution for E T̃_i = 0, E T̃_i^2 = m_2, E T̃_i^3 = −m_3 > 0. Hence, one can construct explicit distributions of T_i with given moments m_2, m_3 in order to approximate the distribution of S_n in the distance ∆_K, using Lemma 2.2.
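The Gamma recipe in Remark 2.3 can be checked directly from the closed-form central moments of the Gamma(α, β) distribution (variance α/β^2, third central moment 2α/β^3). The snippet below, with illustrative values of m_2, m_3 assumed by us, verifies that the stated α and β recover the target moments:

```python
def matched_gamma_params(m2, m3):
    """Shape/rate from Remark 2.3 for target variance m2 and third moment m3 > 0."""
    assert m2 > 0 and m3 > 0
    alpha = 4.0 * m2 ** 3 / m3 ** 2
    beta = 2.0 * m2 / m3
    return alpha, beta

def gamma_central_moments(alpha, beta):
    # Closed-form central moments of Gamma(alpha, rate beta);
    # centering does not change them.
    var = alpha / beta ** 2
    third = 2.0 * alpha / beta ** 3
    return var, third

m2, m3 = 1.0, 0.7          # illustrative targets, not from the paper
alpha, beta = matched_gamma_params(m2, m3)
var, third = gamma_central_moments(alpha, beta)
print(var, third)          # recovers (m2, m3)
```

The algebra is immediate: β = 2m_2/m_3 makes 2α/β^3 = m_3 exactly when α/β^2 = m_2.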
3. Expansions of an arbitrary order
The construction that we introduce in the proofs of Theorems 2.1, 2.2, and Proposition 2.1 allows us to obtain explicit error terms up to the order n^{-1} without imposing special structural assumptions on the distribution of X_i. In this section, we modify this method and establish an approximation of an arbitrary order under similarly mild conditions on X_i, and also in the case when the random summands {X_i}_{i=1}^n are mutually independent but not necessarily identically distributed.

Theorem 3.1.
Let {X_i}_{i=1}^n be i.i.d. R^d-valued random vectors with E||X_i||^{K+δ} < ∞ for an integer K ≥ 4 and some δ > 0. Let Σ := Var X_i be PD; without loss of generality, assume that E X_i = 0. Let also Z_Σ ∼ N(0, Σ) in R^d. Then it holds that

    ∆_B(S_n, Z_Σ) ≤ \sum_{j=3}^{K-1} n^{-(j-2)/2} R_{j,K} + C_{Σ,δ} { d^{K/2} n^{-(K-2)/2} + (d^{K/2} n^{-(K-2)/2})^{δ_K} },   (3.1)

where the R_{j,K} are sublinear functions of the differences between the moments E Z^{⊗j}, Z ∼ N(0, I_d), and E{Σ^{-1/2}X}^{⊗j} (hence these terms can be considerably smaller than the absolute j-th moments of Σ^{-1/2}X_i and Z); C_{Σ,δ} > 0 does not depend on d and n, and δ_K > 0 is such that the last term in (3.1) approaches C (d/n)^{1/2} as K increases. Explicit expressions for the approximation error terms are given in the proof.

The following statement extends the K-th order expansions to the case of a general approximating distribution with the same first two moments as the X_i's (similarly to Proposition 2.1). Here we include the results for the i.i.d. summands; the n-i.i.d. case is derived in the same way.

Theorem 3.2.
Let {X_i}_{i=1}^n be i.i.d. R^d-valued random vectors with E||X_i||^{K+δ} < ∞ for an integer K ≥ 4 and some δ > 0. Let Σ := Var X_i be PD; without loss of generality, assume that E X_i = 0. Let also {T_i}_{i=1}^n be i.i.d. random vectors with the same first two moments as the X_i's and E||T_i||^{K+δ} < ∞; denote S_{T,n} := n^{-1/2} \sum_{i=1}^n T_i. It holds that

    ∆_B(S_n, S_{T,n}) ≤ \sum_{j=3}^{K-1} n^{-(j-2)/2} R_{j,T,K} + C_{Σ,δ,T} { d^{K/2} n^{-(K-2)/2} + (d^{K/2} n^{-(K-2)/2})^{δ'_K} },   (3.2)

where the R_{j,T,K} are sublinear functions of the differences between the moments E{Σ^{-1/2}X}^{⊗j} and E{Σ^{-1/2}T}^{⊗j}; C_{Σ,δ,T} > 0 does not depend on d and n, and δ'_K > 0 is such that the last term in (3.2) approaches C (d/n)^{1/2} as K increases. Explicit expressions for the approximation error terms are given in the proof.
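To see why raising the expansion order K pays off, note that the generic remainder d^{K/2} n^{-(K-2)/2} can be rewritten as n (d/n)^{K/2}, which decreases geometrically in K whenever d < n. A quick check of this arithmetic (our own illustration, with constants dropped and an arbitrary choice of d and n):

```python
def remainder(d, n, K):
    # d^{K/2} * n^{-(K-2)/2}, i.e. n * (d/n)^{K/2}
    return d ** (K / 2) * n ** (-(K - 2) / 2)

d, n = 20, 10_000           # illustrative values only
vals = [remainder(d, n, K) for K in range(4, 12)]
print(vals)                 # shrinks by a factor (d/n)^{1/2} at each step
```

Each increment of K multiplies the remainder by (d/n)^{1/2}, so with more finite moments available the non-sublinear part of the error quickly becomes negligible relative to the leading moment-difference terms.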
4. Examples
Consider Y_i = X_i^T θ + ε_i for i = 1, ..., n, with an unknown parameter θ ∈ R^d and deterministic regressors X_i ∈ R^d such that the matrix XX^T := \sum_{i=1}^n X_i X_i^T is PD; the ε_i are i.i.d. real-valued random variables with E ε_i = 0, Var ε_i = σ^2 > 0, and E|ε_i|^{K+δ} < ∞ for an integer K ≥ 4 and some δ > 0.

Lemma 4.1.
Consider the least squares estimator θ̂ := (XX^T)^{-1} \sum_{i=1}^n X_i Y_i of the parameter θ. Let Z_Σ ∼ N(0, σ^2 I_d), and suppose β_n := max_{1 ≤ i ≤ n} ||(XX^T)^{-1/2} X_i|| ≤ c (d/n)^{1/2}; then it holds that

    sup_{x ≥ 0} |P(||(XX^T)^{1/2}(θ̂ − θ)|| ≤ x) − P(||Z_Σ|| ≤ x)| ≤ ∆_B({XX^T}^{1/2}(θ̂ − θ), Z_Σ)
    ≤ |E(ε_i/σ)^3| d β_n C_3 + |E(ε_i/σ)^4 − 3| d β_n^2 C_4 + d \sum_{j=5}^{K-1} β_n^{j-2} |m_j| C_j + C d β_n^{K-2} + C (d^{K/2} n^{-(K-2)/2})^{δ'_K}
    ≤ \sum_{j=3}^{K-1} C |m_j| (d^j/n^{j-2})^{1/2} + C (d^K/n^{K-2})^{1/2} + C (d^{K/2} n^{-(K-2)/2})^{δ'_K},   (4.1)

where the m_j are linear functions of the differences between the moments E(ε_i/σ)^j and E Z^j for Z ∼ N(0, 1).

This inequality follows from Theorem 3.1 (the result for the n-i.i.d. case), using that \sum_{i=1}^n ||(XX^T)^{-1/2} X_i||^2 = d. Inequality (4.1) shows how the accuracy of the normal approximation depends on the differences between the moments of ε_i/σ and N(0, 1). If E ε_i^3 = 0, then C|m_3| (d^3/n)^{1/2} = 0, and the remaining terms include C|m_4| d^2/n, C|m_5| (d^5/n^3)^{1/2}, etc. If there exist sufficiently many finite moments of |ε_i|, then the last term in (4.1) approaches the ratio C (d/n)^{1/2} as K increases.

Here we consider the smooth function model introduced by Bhattacharya and Ghosh [4] and studied by Hall [14]. We apply Theorem 2.2 and derive a higher-order expansion for the normal approximation in this model. Let f : R^d → R be s ≥ 3 times differentiable with sup_{x ∈ R^d} ||f^{(s)}(x)||_op ≤ c_{s,f} for a constant c_{s,f} > 0. Let {X_i}_{i=1}^n be i.i.d. random vectors in R^d such that Σ := Var X_i is PD and, for some σ > 0,

    E{exp(γ^T(X_i − µ))} ≤ exp(||γ||^2 σ^2 / 2) for all γ ∈ R^d,

where µ := E X_i. The following lemma provides uniform second-order and K-th order Edgeworth-type expansions for the c.d.f. of f(X̄), X̄ := n^{-1} \sum_{i=1}^n X_i.

Lemma 4.2.
Let Z_{f,Σ} ∼ N(0, σ_f^2) and Y := f′(µ)^T(X − µ)/σ_f for σ_f^2 := f′(µ)^T Σ f′(µ) > 0. Then the following uniform second-order approximation holds:

    sup_{x ∈ R} |P(√n {f(X̄) − f(µ)} ≤ x) − P(Z_{f,Σ} ≤ x)|
    ≤ c_1 ||A_3||_op n^{-1/2} + C { ||A_4||_op + 6 }^{1/2} n^{-1/2} + { c_2 ||A_4||_op + c_3 } n^{-1} + (d/n)^{1/2} ∆(t) + e^{-t}

for any t > 0, where ∆(t) := c_{f,Σ} { \sum_{j=2}^{s-1} c_{j,f,µ} σ^j (d/n)^{(j-1)/2} C_j(t)/j! + c_{s,f} σ^s (d/n)^{(s-1)/2} C_s(t)/s! } for c_{j,f,µ} := ||f^{(j)}(µ)||_op and C_j(t) := ((t/d)^{1/2} + 2t/d)^{j/2} (if d ≤ n and t ≤ d, then ∆(t) ≤ C), and A_3 : (R^d)^3 → R, A_4 : (R^d)^4 → R are such that A_3(x_1, x_2, x_3) = E \prod_{j=1}^3 x_j^T Σ^{-1/2}(X − µ) and A_4(x_1, ..., x_4) = E \prod_{j=1}^4 x_j^T Σ^{-1/2}(X − µ) for all x_1, ..., x_4 ∈ R^d.

For any integer K ≥ 4 it holds that

    sup_{x ∈ R} |P(√n {f(X̄) − f(µ)} ≤ x) − P(Z_{f,Σ} ≤ x)|
    ≤ c_1 |E Y^3| n^{-1/2} + c_2 |E Y^4 − 3| n^{-1} + \sum_{j=5}^{K-1} c_j n^{-(j-2)/2} |m_j| + c_{f,K,µ} { n^{-(K-2)/2} + n^{-(K-2)δ_K/2} } + (d/n)^{1/2} ∆(t) + e^{-t}

for any t > 0, where the m_j are linear functions of the differences between the moments of Y and those of N(0, 1). The constants c_{f,Σ}, c_j, c_{f,K,µ} do not depend on d and n; their explicit expressions are provided in the proof.

Supplementary Material
Proofs of the presented results are available in the supplementary material.

References

[1] Belloni, A., Bugni, F. A., and Chernozhukov, V. (2018). Subvector inference in PI models with many moment inequalities. arXiv preprint arXiv:1806.11466.
[2] Bentkus, V. (2003). On the dependence of the Berry–Esseen bound on dimension.
Journal of Statistical Planning and Inference, 113(2):385–402.
[3] Berry, A. C. (1941). The accuracy of the Gaussian approximation to the sum of independent variates. Transactions of the American Mathematical Society, 49(1):122–136.
[4] Bhattacharya, R. N. and Ghosh, J. K. (1978). On the validity of the formal Edgeworth expansion. Ann. Statist., 6(2):434–451.
[5] Bhattacharya, R. N. and Rao, R. R. (1986). Normal approximation and asymptotic expansions, volume 64. SIAM.
[6] Chebyshev, P. (1890). Sur deux théorèmes relatifs aux probabilités. Acta Math., 14:305–315.
[7] Chernozhukov, V., Chetverikov, D., and Kato, K. (2013). Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. The Annals of Statistics, 41(6):2786–2819.
[8] Chernozhukov, V., Chetverikov, D., and Kato, K. (2014). Comparison and anti-concentration bounds for maxima of Gaussian random vectors. Probability Theory and Related Fields, 162:47–70.
[9] Chernozhukov, V., Chetverikov, D., and Kato, K. (2017). Central limit theorems and bootstrap in high dimensions. The Annals of Probability, 45(4):2309–2352.
[10] Cramér, H. (1928). On the composition of elementary errors. Scandinavian Actuarial Journal, 1928(1):13–74.
[11] Edgeworth, F. (1896). The asymmetrical probability-curve. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 41(249):90–99.
[12] Edgeworth, F. (1905). The law of error. In Proc. Camb. Philos. Soc., volume 20, pages 16–65.
[13] Esseen, C.-G. (1942). On the Liapounoff limit of error in the theory of probability. Ark. Mat. Astron. Fys., A28(9):1–19.
[14] Hall, P. (1992). The bootstrap and Edgeworth expansion. Springer.
[15] Kolassa, J. E. (2006). Series approximation methods in statistics, volume 88. Springer Science & Business Media.
[16] Lahiri, S. N. (2013). Resampling methods for dependent data. Springer Science & Business Media.
[17] Mammen, E. (1993). Bootstrap and wild bootstrap for high dimensional linear models. The Annals of Statistics, 21(1):255–285.
[18] McCullagh, P. (1987). Tensor Methods in Statistics: Monographs on Statistics and Applied Probability. Chapman and Hall/CRC.
[19] Skovgaard, I. M. (1986). On multivariate Edgeworth expansions.