Edgeworth approximations for distributions of symmetric statistics
FRIEDRICH GÖTZE AND MINDAUGAS BLOZNELIS

In memoriam Willem Rutger van Zwet (* March 31, 1934, † July 2, 2020)
Abstract.
We study the distribution of a general class of asymptotically linear statistics which are symmetric functions of $N$ independent observations. The distribution functions of these statistics are approximated by an Edgeworth expansion with a remainder of order $o(N^{-1})$. The Edgeworth expansion is based on Hoeffding's decomposition, which provides a stochastic expansion into a linear part, a quadratic part, as well as smaller higher order parts. The validity of this Edgeworth expansion is proved under Cramér's condition on the linear part, moment assumptions for all parts of the statistic, and an optimal dimensionality requirement for the non linear part.

Mathematics Subject Classification.
Primary 62E20; Secondary 60F05
Key words and phrases.
Edgeworth expansion, Littlewood–Offord problem, concentration in Banach spaces, symmetric statistic, $U$-statistic, Hoeffding decomposition

1. Introduction and Results
By the classical central limit theorem the distributions of sums $X_1 + \cdots + X_N$ of independent and identically distributed random variables can be approximated by the normal distribution. The accuracy of the normal approximation is of order $O(N^{-1/2})$ by the well-known Berry–Esseen theorem. A function of observations $X_1, \ldots, X_N$ is called a linear statistic if it can be represented by a sum of functions depending on a single observation only. Many important statistics are non linear, but can be approximated by a linear statistic. We call these statistics asymptotically linear. The central limit theorem and the normal approximation with rate $O(N^{-1/2})$ extend to the class of asymptotically linear statistics as well. For comparisons of the performance of statistical procedures beyond first order efficiency, that is $o(N^{-1/2})$, at the level of Hodges and Lehmann deficiency, that is $o(N^{-1})$, more precise approximations beyond the normal approximation are required. Such a refinement is provided by Edgeworth expansions of the distribution function. The one-term Edgeworth expansion adds a correction term of order $O(N^{-1/2})$ to the standard normal distribution function $\Phi(x) = \int_{-\infty}^{x} (2\pi)^{-1/2} e^{-u^2/2}\,du$ and provides an approximation with the error $o(N^{-1/2})$. Similarly, the two-term Edgeworth expansion includes correction terms of orders $O(N^{-1/2})$ and $O(N^{-1})$ as well, with an approximation error of order $o(N^{-1})$. Normal approximation theory including Edgeworth expansions is well studied for the distribution function $F_N(x) = P\{X_1 + \cdots + X_N \le N E X_1 + N^{1/2}\sigma x\}$, $\sigma^2 = \mathrm{Var}\,X_1$, of a sum of independent random variables, see Petrov (1975) [22]. Sums of independent random vectors are considered in Bhattacharya and Rao (1986) [4]. One distinguishes two cases. For summands taking values in an arithmetic progression (lattice case) the distribution function of the sum has jumps of order $O(N^{-1/2})$.
Corresponding asymptotic expansions are discontinuous functions designed to capture these jumps; see the seminal results by Esseen (1945) [15]. (Date: September 1, 2005. Research supported by CRC 701.) For sums of non-lattice random variables (non-lattice case) the one-term Edgeworth expansion
$$\Phi(x) - \frac{\kappa_3}{6\sqrt{N}\,\sigma^3}\,(x^2 - 1)\,\Phi'(x)$$
is a differentiable function. The correcting term reflects the skewness of the distribution of a summand, $\kappa_3 = E(X - EX)^3$. More generally, a $k$-term Edgeworth expansion is a differentiable function with all derivatives bounded (it is a sum of Hermite polynomials of increasing order with scalar weights involving cumulants of order at least three of $X$, which vanish for $X$ being Gaussian). Therefore, in order to establish the validity of such an expansion, that is, to prove the bound $o(N^{-k/2})$ for the remainder, one should assume that the underlying distribution is sufficiently smooth (see Bickel and Robinson 1982 [9]). A convenient condition to ensure that is Cramér's condition (C):
$$(C)\qquad \limsup_{|t| \to \infty} |E \exp\{itX\}| < 1.$$
In this paper we establish the two-term Edgeworth expansion for a general asymptotically linear statistic $T = T(X_1, \ldots, X_N)$ with non-lattice distribution. There is a rich literature devoted to normal approximation and Edgeworth expansions for various classes of asymptotically linear statistics, see e.g.
Babu and Bai (1993) [1], Bai and Rao (1991) [2], Bentkus, Götze and van Zwet (1997) [3], Bhattacharya and Ghosh (1978, 1980) [5, 6], Bhattacharya and Rao (1986) [4], Bickel (1974) [7], Bickel, Götze and van Zwet (1986) [8], Callaert, Janssen and Veraverbeke (1980) [11], Chibisov (1980) [12], Hall (1987) [18], Helmers (1982) [19], Petrov (1975) [22], Pfanzagl (1985) [23], Serfling (1980) [24], etc. A wide class of statistics can be represented as functions of sample means of vector variables. Edgeworth expansions of such statistics can be obtained by applying the multivariate expansion to corresponding functions, see Bhattacharya and Ghosh (1978, 1980) [5, 6]. In their work the crucial Cramér condition (C) is assumed on the joint distribution of all the components of a vector, which may be too restrictive in cases where some components have a negligible influence on the statistic. More often only one or a few of the components satisfy a conditional version of condition (C). Bai and Rao (1991) [2] and Babu and Bai (1993) [1] established Edgeworth expansions for functions of sample means under such a conditional Cramér condition. This approach exploits the smoothness of the distribution of the vector as well as the smoothness of the function defining the statistic. In particular this approach needs a class of statistics which are smooth functions of observations or can be approximated by such functions via Taylor's expansion, see also Chibisov (1980) [12]. Let us note that the smoothness of the distribution function of a statistic, say $T = \phi(X_1, \ldots, X_N)$, may have little to do with the smoothness of the kernel $\phi$. Just take Gini's mean difference $\sum_{i<j} |X_i - X_j|$, whose kernel is not differentiable. Determining in this setup the actual influence of the non linear terms of $T$ on the approximation error represents a considerable challenge.
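To illustrate the accuracy gain that an Edgeworth correction provides, the following minimal Python sketch (our own illustration, not taken from the paper; the choice of Exp(1) summands and all function names are ours) compares the normal approximation with the one-term Edgeworth expansion for standardized sums, using the closed-form Gamma distribution function as ground truth.

```python
import math

def phi(x):
    """Standard normal density Phi'(x)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def Phi(x):
    """Standard normal distribution function."""
    return (1 + math.erf(x / math.sqrt(2))) / 2

def edgeworth_one_term(x, kappa3, N):
    """One-term Edgeworth expansion: Phi(x) - kappa3/(6 sqrt(N)) (x^2 - 1) Phi'(x)."""
    return Phi(x) - kappa3 / (6 * math.sqrt(N)) * (x * x - 1) * phi(x)

def gamma_sum_cdf(x, N):
    """Exact CDF of the standardized sum of N iid Exp(1) variables:
    P{S_N <= N + sqrt(N) x}, with S_N ~ Gamma(N, 1), via the finite
    Poisson-tail identity P{Gamma(N,1) <= y} = 1 - e^{-y} sum_{k<N} y^k/k!."""
    y = N + math.sqrt(N) * x
    if y <= 0:
        return 0.0
    term, s = 1.0, 1.0
    for k in range(1, N):
        term *= y / k
        s += term
    return 1 - math.exp(-y) * s

N = 20
kappa3 = 2.0  # standardized third cumulant of Exp(1)
grid = [i / 10 for i in range(-25, 26)]
err_normal = max(abs(Phi(x) - gamma_sum_cdf(x, N)) for x in grid)
err_edge = max(abs(edgeworth_one_term(x, kappa3, N) - gamma_sum_cdf(x, N)) for x in grid)
```

On this grid the one-term expansion reduces the worst-case error of the normal approximation (which is of order $N^{-1/2}$) by roughly an order of magnitude, in line with the $o(N^{-1/2})$ error claim above.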
The crucial problem, which required new techniques, was to control the quasi periodic behavior of upper bounds for the characteristic function of $T$ at frequencies $t \sim N$. These upper bounds involve bounds on conditional characteristic functions of the non linear part of $T$, say $g(t, Y)^m$, given a subset $Y := X_I$ of size $N - m$ of the observations. Understanding the separation of large maxima of $(t, Y) \to g(t, Y)^m$ in terms of both arguments in a function space setup was finally achieved by a combinatorial argument of Kleitman on symmetric partitions (see Section 4) controlling the concentration of sums as in the Littlewood–Offord problem in Banach spaces. The separation of large maxima of $g(t, Y)^m$ then allowed us to prove the desired bounds when averaging over $t$ and $Y$. Note, however, that more standard analytical arguments for concentration bounds would not work in the required generality here.

Let $X, X_1, X_2, \ldots, X_N$ be independent and identically distributed random variables taking values in a measurable space $(\mathcal{X}, \mathcal{B})$. Let $P_X$ denote the distribution of $X$ on $(\mathcal{X}, \mathcal{B})$. We assume that $T(X_1, \ldots, X_N)$ is a symmetric function of its arguments (symmetric statistic, for short). Furthermore, we assume that the moments $ET$ and $\sigma_T^2 := \mathrm{Var}\,T$ are finite. Our approach is based on Hoeffding's decomposition of $T$, see Hoeffding (1948) [20], Efron and Stein (1981) [14] and van Zwet (1984) [26]. Hoeffding's decomposition expands $T$ into a series of centered and mutually uncorrelated $U$-statistics of increasing order
$$T = ET + \frac{1}{N^{1/2}} \sum_{1 \le i \le N} g(X_i) + \frac{1}{N} \sum_{1 \le i < j \le N} \psi(X_i, X_j) + \frac{1}{N^{3/2}} \sum_{1 \le i < j < k \le N} \chi(X_i, X_j, X_k) + \cdots.$$
Let $L$, $Q$ and $K$ denote the first, the second and the third sum. We call $L$ the linear part, $Q$ the quadratic part and $K$ the cubic part of the decomposition. We shall assume that the linear part does not vanish, that is, $\mathrm{Var}\,L > 0$. If, for large $N$, the linear part dominates the statistic we call $T$ asymptotically linear.
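The orthogonality structure behind Hoeffding's decomposition can be verified exactly on a small discrete example. The Python sketch below (our own illustration; the kernel $h(x,y) = xy$ and the three-point distribution are arbitrary choices, and the normalizing factors $N^{-1/2}$, $N^{-1}$ are omitted for simplicity) computes the first two Hoeffding kernels of a degree-two $U$-statistic and checks, by exhaustive enumeration, that the linear and quadratic parts are uncorrelated and reproduce $T$ exactly.

```python
import itertools

# Finite sample space: X takes values vals[i] with probability probs[i].
vals = [0.0, 1.0, 2.0]
probs = [0.5, 0.3, 0.2]

def h(x, y):  # a symmetric kernel (illustrative choice)
    return x * y

Eh = sum(probs[a] * probs[b] * h(vals[a], vals[b])
         for a in range(3) for b in range(3))

def h1(x):
    """First Hoeffding projection: E h(x, X) - E h (mean zero)."""
    return sum(probs[b] * h(x, vals[b]) for b in range(3)) - Eh

def h2(x, y):
    """Degenerate second-order kernel: h - Eh - h1(x) - h1(y)."""
    return h(x, y) - Eh - h1(x) - h1(y)

N = 3
# Check the exact identity  T = E T + L + Q  on every outcome, where
#   T = sum_{i<j} h(X_i, X_j),  L = (N-1) sum_i h1(X_i),  Q = sum_{i<j} h2(X_i, X_j),
# and accumulate E[L Q] to confirm the parts are uncorrelated.
ET = N * (N - 1) / 2 * Eh
max_gap, ELQ = 0.0, 0.0
for idx in itertools.product(range(3), repeat=N):
    xs = [vals[i] for i in idx]
    p = 1.0
    for i in idx:
        p *= probs[i]
    T = sum(h(xs[i], xs[j]) for i in range(N) for j in range(i + 1, N))
    L = (N - 1) * sum(h1(x) for x in xs)
    Q = sum(h2(xs[i], xs[j]) for i in range(N) for j in range(i + 1, N))
    max_gap = max(max_gap, abs(T - (ET + L + Q)))
    ELQ += p * L * Q
```

The degeneracy of $h_2$ ($E\,h_2(x, X) = 0$ for every $x$) is exactly what makes the parts of the decomposition mutually uncorrelated.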
The distribution of an asymptotically linear statistic can be approximated by the normal distribution, via the central limit theorem. An improvement over the normal approximation is obtained by using Edgeworth expansions for the distribution function $F(x) = P\{T - ET \le \sigma_T x\}$. For this purpose we write Hoeffding's decomposition in the form
$$(1)\qquad T - ET = L + Q + K + R,$$
where $R$ denotes the remainder. For a number of important examples of asymptotically linear statistics we have $R/\sigma_T = o_P(N^{-1})$ (in probability) as $N \to \infty$. Therefore, the $U$-statistic $\sigma_T^{-1}(L + Q + K)$ can be viewed as a stochastic expansion of $(T - ET)/\sigma_T$ up to the order $o_P(N^{-1})$. Furthermore, an Edgeworth expansion of $\sigma_T^{-1}(L + Q + K)$ can be used to approximate $F(x)$. Introduce the following two term Edgeworth expansion of the distribution function of $\sigma_T^{-1}(L + Q + K)$:
$$(2)\qquad G(x) = \Phi(x) - \frac{\kappa_3}{6\sqrt{N}}(x^2 - 1)\Phi'(x) - \frac{1}{N}\Big( \frac{\kappa_3^2}{72}(x^5 - 10x^3 + 15x)\Phi'(x) + \frac{\kappa_4}{24}(x^3 - 3x)\Phi'(x) \Big).$$
Here $\Phi$ respectively $\Phi'$ denote the standard normal distribution function and its derivative. Furthermore, we introduce $\sigma^2 = E g^2(X_1)$ and
$$\kappa_3 = \sigma^{-3}\big( E g^3(X_1) + 3\, E g(X_1) g(X_2) \psi(X_1, X_2) \big),$$
$$\kappa_4 = \sigma^{-4}\big( E g^4(X_1) - 3\sigma^4 + 12\, E g^2(X_1) g(X_2) \psi(X_1, X_2) + 12\, E g(X_1) g(X_2) \psi(X_1, X_3) \psi(X_2, X_3) + 4\, E g(X_1) g(X_2) g(X_3) \chi(X_1, X_2, X_3) \big).$$
Our main result, Theorem 1 below, establishes a bound $o(N^{-1})$ for the Kolmogorov distance
$$\Delta = \sup_{x \in \mathbb{R}} |F(x) - G(x)|.$$
We shall consider a general situation where the kernel $T = T^{(N)}$, the space $(\mathcal{X}, \mathcal{B}) = (\mathcal{X}^{(N)}, \mathcal{B}^{(N)})$ and the distribution $P_X = P_X^{(N)}$ all depend on $N$ as $N \to \infty$. In order to keep the notation simple we drop the subscript $N$ in what follows. Let us introduce the conditions we need in order to prove the bound $\Delta = o(N^{-1})$.

(i) Moment conditions.
Assume that, for some absolute constants $0 < A_* < 1 < M_*$ and numbers $r > 4 > s > 2$, we have
$$(3)\qquad E g^2(X_1) > A_* \sigma_T^2, \quad E|g(X_1)|^r < M_* \sigma_T^r, \quad E|\psi(X_1, X_2)|^r < M_* \sigma_T^r, \quad E|\chi(X_1, X_2, X_3)|^s < M_* \sigma_T^s.$$
These moment conditions refer to the linear, the quadratic and the cubic part of $T$. In order to control the remainder $R$ of the approximation (1) we use moments of differences introduced in Bentkus, Götze and van Zwet (1997) [3], see also van Zwet (1984) [26]. Define, for $1 \le i \le N$,
$$D_i T = T - E_i T, \qquad E_i T := E(T \,|\, X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_N).$$
A subsequent application of difference operations $D_i, D_j, \ldots$ (the indices $i, j, \ldots$ are all distinct) produces higher order differences, like $D_i D_j T := D_i(D_j T) = T - E_i T - E_j T + E_i E_j T$. For $m = 1, 2, 3, 4$ put
$$\Delta_m^2 = E\big| N^{m - 1/2} D_1 D_2 \cdots D_m T \big|^2.$$
We shall assume that for some absolute constant $D_* > 0$ and some $\nu_2 \in (0, 1/2)$ we have
$$(4)\qquad \Delta_4/\sigma_T \le N^{-\nu_2} D_*.$$
For a number of important examples of asymptotically linear statistics the moments $\Delta_m$ are evaluated or estimated in [3]. Typically we have $\Delta_m/\sigma_T = O(1)$ for some $m$. Therefore, assuming that (4) holds uniformly in $N$ as $N \to \infty$, we obtain from the inequality $E R^2 \le N^{-3}\Delta_4^2$, see (162) in the Appendix, that $R/\sigma_T = O_P(N^{-3/2-\nu_2})$. Furthermore, assuming that (3), (4) hold uniformly in $N$ as $N \to \infty$, we obtain from (162), (161), see Appendix, that $\sigma^2/\sigma_T^2 = 1 - O(N^{-1})$.

(ii) Cramér type smoothness condition. Introduce the function
$$\rho(a, b) = 1 - \sup\{ |E \exp\{itg(X)/\sigma\}| : a \le |t| \le b \}.$$
We shall assume that, for some $\delta_1 > 0$ and $\nu_1 > 0$, we have
$$(5)\qquad \rho(\beta_3^{-1}, N^{\nu_1 + 1/2}) \ge \delta_1.$$
Here $\beta_3 = \sigma^{-3} E|g(X)|^3$. It was shown in Götze and van Zwet (1992) [17], see as well Theorem 1.4 of [3], that moment conditions (like (3) and (4)) together with Cramér's condition (on the summand $g(X_i)$ of the linear part) do not suffice to establish the bound $\Delta = o(N^{-1})$. For convenience we state this result in Example 1 below.

Example 1. Let $X_1, X_2, \ldots$
be independent random variables uniformly distributed on the interval $(-1/2, 1/2)$. Consider the statistics
$$T_N = (W_N + N^{-1/2} V_N)(1 - N^{-1/2} V_N),$$
where $V_N = N^{-1/2} \sum_j \{N^{1/2} X_j\}$ and $W_N = N^{-1} \sum_j [N^{1/2} X_j]$. Here $[x]$ denotes the nearest integer to $x$ and $\{x\} = x - [x]$. Assume that $N = m^2$, where $m$ is odd. We have, by the local limit theorem, $P\{W_N = 1\} \ge c N^{-1/2}$ and $P\{|V_N| < \delta\} > c\delta$, $0 < \delta < 1$, where $c > 0$. Using the independence of $W_N$ and $V_N$ one obtains that
$$P\{ 1 - \delta^2 N^{-1} \le T_N \le 1 \} \ge c\, \delta N^{-1/2}.$$
The example defines a sequence of $U$-statistics $T_N$ whose distribution functions $F_N$ have increments of size of order $\delta N^{-1/2}$ on a particular interval of length $o(N^{-1})$. These fluctuations, of magnitude much larger than $N^{-1}$, appear as a result of a nearly lattice structure induced by the interplay between the (smooth) linear part and the quadratic part. In order to avoid examples with such a (conditional) lattice structure a simple moment condition was introduced in Götze and van Zwet (1992) [17] which separates (in $L_2$ distance) the random variable $\psi(X_1, X_2)$ from any random variable of the form $\psi_h(X_1, X_2) = h(X_1) g(X_2) + g(X_1) h(X_2)$, with $h$ measurable. Note that the $L_2$ distance $E(\psi(X_1, X_2) - \psi_h(X_1, X_2))^2$ is minimized by $h(x) = b(x)$, where
$$b(x) = \sigma^{-2} E\big( \psi(X_1, X_2) g(X_2) \,\big|\, X_1 = x \big) - \frac{\kappa}{2\sigma^4}\, g(x).$$
Here $\kappa = E \psi(X_1, X_2) g(X_1) g(X_2)$. Therefore, we assume that, for some absolute constant $\delta_* > 0$, we have
$$(6)\qquad E\Big( \psi(X_1, X_2) - \big( b(X_1) g(X_2) + b(X_2) g(X_1) \big) \Big)^2 \ge \delta_* \sigma_T^2.$$
Define $\nu = 600^{-1} \min\{\nu_1, \nu_2, s - 2, r - 4, 1\}$.

Theorem 1. Let $N \ge 2$. Assume that for some absolute constants $A_*, M_*, D_* > 0$ and numbers $r > 4$, $s > 2$, $\nu_1, \nu_2 > 0$ and $\delta_1, \delta_* > 0$ the conditions (3), (4), (5), (6) hold. Then there exists a constant $C_* > 0$ depending only on $A_*, M_*, D_*, r, s, \nu_1, \nu_2, \delta_1, \delta_*$ such that
$$\Delta \le C_* N^{-1-\nu} \big( 1 + \delta_*^{-1} N^{-\nu} \big).$$
In the particular case of $U$-statistics of degree three (the case where $R \equiv 0$) the condition (4) holds trivially.

Remark 1. Condition (6) can be relaxed.
Assume that for some absolute constant $G_*$ and some $\nu_3 > 0$ we have
$$(7)\qquad E\Big( \psi(X_1, X_2) - \big( b(X_1) g(X_2) + b(X_2) g(X_1) \big) \Big)^2 \ge N^{-\nu_3} G_* \sigma_T^2.$$
The bound of Theorem 1 holds if we replace (6) by this weaker condition. In this case we have $\Delta \le C_* N^{-1-\nu}$, where the constant $C_*$ depends on $A_*, D_*, G_*, M_*, r, s, \nu_1, \nu_2, \nu_3, \delta_1$.

Remark 2. Consider a sequence of statistics $T^{(N)} = T^{(N)}(X_{N1}, \ldots, X_{NN})$ based on independent observations $X_{N1}, \ldots, X_{NN}$ taking values in $(\mathcal{X}^{(N)}, \mathcal{B}^{(N)})$ and with the common distribution $P_X^{(N)}$. Assume that conditions (3), (4), (5) and (6) (or (7)) hold uniformly in $N = N_0, N_0 + 1, \ldots$, for some $N_0$. Theorem 1 implies the bound $\Delta = o(N^{-1})$ as $N \to \infty$.

Remark 3. The value of $\nu = 600^{-1}\min\{\nu_1, \nu_2, s - 2, r - 4, 1\}$ is far from being optimal. Furthermore, the moment conditions (3) and (4) are not the weakest possible that would ensure the approximation of order $o(N^{-1})$. The condition (3) can likely be reduced to the moment conditions that are necessary to define the Edgeworth expansion terms $\kappa_3$ and $\kappa_4$; similarly, (4) can be reduced to $\Delta_4/\sigma_T = o(1)$. No effort was made to obtain the result under the optimal conditions. This would increase the complexity of the proof, which is already rather involved.

In order to compare Theorem 1 with earlier results of similar nature let us consider the case of $U$-statistics of degree two,
$$(8)\qquad U = \sqrt{N} \binom{N}{2}^{-1} \sum_{1 \le i < j \le N} h(X_i, X_j).$$
Assume that $E h(X_1, X_2) = 0$ and $\sigma_h^2 > 0$, where $\sigma_h^2 = E h_1^2(X_1)$, $h_1(x) := E h(x, X_2)$. In this case Hoeffding's decomposition (1) reduces to $U = L + Q$, where, by the assumption $\sigma_h^2 > 0$, we have $\mathrm{Var}\,L > 0$. Since the cubic part vanishes we may remove the moment $E g(X_1) g(X_2) g(X_3) \chi(X_1, X_2, X_3)$ from the expression for $\kappa_4$. In this way we
obtain the two term Edgeworth expansion (2) for the distribution function $F_U(x) = P\{U \le \sigma_U x\}$ with $\sigma_U^2 := \mathrm{Var}\,U$. We call a kernel $h$ reducible if for some measurable functions $u, v : \mathcal{X} \to \mathbb{R}$ we have $h(x, y) - E h(X_1, X_2) = v(x) u(y) + v(y) u(x)$ for $P_X \times P_X$ almost all $(x, y) \in \mathcal{X} \times \mathcal{X}$. A simple calculation shows that for a sequence of $U$-statistics (8) with a fixed non-reducible kernel the condition (6) is satisfied, for some $\delta_* > 0$, uniformly in $N$. A straightforward consequence of Theorem 1 is the following corollary. Write $\tilde{\nu} = 600^{-1}\min\{\nu_1, r - 4, 1\}$.

Corollary 1. Let $N \ge 2$. Assume that for some $r > 4$
$$(9)\qquad E|h(X_1, X_2)|^r < \infty.$$
Assume that $\sigma_h^2 > 0$, that the kernel $h$ is non-reducible, and that for some $\delta_1 > 0$
$$(10)\qquad \sup\big\{ |E e^{it\sigma_h^{-1} h_1(X_1)}| : |t| \ge \beta_3^{-1} \big\} \le 1 - \delta_1.$$
Then there exists a constant $C_* > 0$ such that $\sup_{x \in \mathbb{R}} |F_U(x) - G(x)| \le C_* N^{-1-\tilde{\nu}}$.

For $U$-statistics with fixed kernel $h$ the validity of the Edgeworth expansion (2) up to the order $o(N^{-1})$ was established by Callaert, Janssen and Veraverbeke (1980) [11] and Bickel, Götze and van Zwet (1986) [8]. In addition to the moment conditions (like (9)) and Cramér's condition (like (10)) they imposed the following rather implicit conditions which ensure the regularity of $F_U(x)$. Callaert, Janssen and Veraverbeke (1980) [11] assumed that for some $0 < c < 1$ and $0 < \alpha < 1/2$ the event
$$(11)\qquad \Big| E\Big( \exp\Big\{ it\sigma_U^{-1} \sum_{j=m+1}^{N} h(X_1, X_j) \Big\} \,\Big|\, X_{m+1}, \ldots, X_N \Big) \Big| \le c$$
has probability $1 - o(1/N\log N)$, uniformly for all $t \in [N^{1/2}/\log N,\ N\log N]$. Here $m \approx N^{\alpha}$. Bickel, Götze and van Zwet (1986) [8] more explicitly required that the linear operator $f(\cdot) \to E\psi(X, \cdot) f(X)$ defined by $\psi$ has a sufficiently large number of non-zero eigenvalues (depending on the existing moments). Both of these conditions correspond to techniques used in parts of our proof.
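The role of reducibility can be checked numerically. The sketch below (our own illustration on a three-point distribution; the functions $u$, $v$ are arbitrary choices with $Eu = 0$ and $Ev \ne 0$) computes the minimizer $b(x)$ from the formula preceding condition (6) and verifies that for a reducible kernel the left-hand side of (6) vanishes, as asserted in Remark 4 below.

```python
# Discrete check of b(x) = sigma^{-2} E( psi(X1,X2) g(X2) | X1 = x ) - kappa/(2 sigma^4) g(x)
# for a reducible kernel h(x,y) = v(x)u(y) + v(y)u(x) with E u = 0.
vals = [-1.0, 0.0, 2.0]
probs = [0.3, 0.4, 0.3]

EX = sum(p * x for p, x in zip(probs, vals))

def u(x):  # centered, so that E u(X) = 0
    return x - EX

def v(x):  # E v(X) != 0
    return x * x

Ev = sum(p * v(x) for p, x in zip(probs, vals))

def g(x):  # linear Hoeffding kernel of the U-statistic with kernel v(x)u(y)+v(y)u(x)
    return Ev * u(x)

def psi(x, y):  # degenerate quadratic kernel: E psi(x, X) = 0 for all x
    return (v(x) - Ev) * u(y) + u(x) * (v(y) - Ev)

sigma2 = sum(p * g(x) ** 2 for p, x in zip(probs, vals))
kappa = sum(pa * pb * psi(xa, xb) * g(xa) * g(xb)
            for pa, xa in zip(probs, vals) for pb, xb in zip(probs, vals))

def b(x):
    cond = sum(p * psi(x, y) * g(y) for p, y in zip(probs, vals))
    return cond / sigma2 - kappa / (2 * sigma2 ** 2) * g(x)

# Left-hand side of (6): zero for this reducible kernel.
residual = sum(pa * pb * (psi(xa, xb) - b(xa) * g(xb) - b(xb) * g(xa)) ** 2
               for pa, xa in zip(probs, vals) for pb, xb in zip(probs, vals))
degeneracy = max(abs(sum(p * psi(x, y) for p, y in zip(probs, vals))) for x in vals)
```

A direct calculation (for this reducible kernel) gives $b(x) = (v(x) - Ev)/Ev$, so that $b(X_1)g(X_2) + b(X_2)g(X_1)$ reproduces $\psi$ exactly and the residual vanishes.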
Recall that in the first step of the proof one reduces the problem of bounding $|F_U(x) - G(x)|$, by means of Berry–Esseen's smoothing inequality, to that of bounding the difference between the corresponding Fourier transforms $|\hat{F}_U(t) - \hat{G}(t)|$ in the region $|t| < N^{1+\varepsilon_N}$, for some $\varepsilon_N \downarrow 0$. For $|t| \approx N$ one writes $|\hat{F}_U(t) - \hat{G}(t)| \le |\hat{F}_U(t)| + |\hat{G}(t)|$ and bounds every summand separately. The condition (11) applies more or less directly and shows exponential decay of $|\hat{F}_U(t)|$ as $N \to \infty$, for $|t| \approx N$. The eigenvalue condition achieves the same goal, but refers to a more sophisticated approach based on a symmetrization technique introduced by Götze (1979) [16]. The condition (6) provides an alternative to (11) and the eigenvalue condition. It is aimed to exclude the situations where the interplay between the linear and the quadratic part produces a nearly lattice structure of $U$, which in turn results in fluctuations of $F_U(x)$ that are too large for an approximation with error $o(N^{-1})$. The proof of Theorem 1 uses a result of Kleitman which provides a solution to the multidimensional Littlewood–Offord problem. This result establishes bounds for probabilities of the concentration of sums of random variables with values in multidimensional spaces.

Remark 4. The $U$-statistic (8) with a kernel $h(x, y) = v(x) u(y) + v(y) u(x)$, where $E u(X) = 0$, violates (6). Let us note that in this case one can establish the validity of an Edgeworth expansion with the remainder $o(N^{-1})$ under the following bivariate Cramér condition, which is essentially more restrictive than condition (ii):
$$1 - \sup\big\{ |E \exp\{ it\sigma^{-1} u(X) + is\sigma^{-1} v(X) \}| : \beta_3^{-1} < |t| \le N^{\nu_1 + 1/2},\ |s| \le N^{\nu_1} \big\} > \delta_1,$$
for some $\delta_1, \nu_1 > 0$. Note that from this condition we immediately obtain the desired exponential decay of $\hat{F}(t)$ as $N \to \infty$ for $|t| \approx N$. The remaining parts of the paper (Sections 2–5) contain the proof of Theorem 1.
Auxiliary results are placed in the Appendix.

2. Proof of Theorem 1

The proof combines various techniques developed in earlier papers by Callaert, Janssen and Veraverbeke (1980) [11] and Bickel, Götze and van Zwet (1986) [8]. It is based on the manuscript of Götze and van Zwet (1992) [17]. The latter paper introduces the condition (6), provides the crucial counterexample, see Example 1 above, and contains an outline of the proof in the particular case of $U$-statistics of degree three ($T - ET = L + Q + K$). In order to extend these arguments to general symmetric statistics we combine stochastic expansions by means of Hoeffding's decomposition and bounds for various parts of the decomposition. This approach was introduced in van Zwet (1984) [26] and further developed in [3].

Let us start with an outline of the proof. Firstly, using the linear structure induced by Hoeffding's decomposition we replace $T/\sigma_T$ by the statistic $\tilde{T}$ which is conditionally linear given $X_{m+1}, \ldots, X_N$. Secondly, invoking a smoothing inequality we pass from distribution functions to Fourier transforms. In the remaining steps we bound the difference $\delta(t) = E e^{it\tilde{T}} - \hat{G}(t)$, for $|t| \le N^{1+\nu_1}$. For "small frequencies" $|t| \le C N^{1/2}$, we expand the characteristic function $E e^{it\tilde{T}}$ in order to show that $\delta(t) = o(N^{-1})$. For the remaining range of frequencies, that is $C N^{1/2} \le |t| \le N^{1+\nu_1}$, we bound the summands $E e^{it\tilde{T}}$ and $\hat{G}(t)$ separately. The cases of "large frequencies", that is $N^{1-\nu} \le |t| \le N^{1+\nu_1}$, and "medium frequencies", that is $C\sqrt{N} \le |t| \le N^{1-\nu}$, are treated in a different manner. For medium frequencies the Cramér type condition (5) ensures an exponential decay of $|E e^{it\tilde{T}}|$ as $N \to \infty$. For large frequencies we combine conditions (5) and (6). Here we apply a combinatorial concentration bound due to Kleitman as described in the introduction.

Before starting the proof we introduce some notation.
By $c_*$ we shall denote a positive constant which may depend only on $A_*, D_*, M_*, r, s, \nu_1, \nu_2, \delta_1$, but does not depend on $N$. In different places the values of $c_*$ may be different. It is convenient to write the decomposition in the form
$$(12)\qquad T = ET + \sum_{1 \le k \le N} U_k, \qquad U_k = \sum_{1 \le i_1 < \cdots < i_k \le N} g_k(X_{i_1}, \ldots, X_{i_k}).$$
Write $\Omega_N = \{1, \ldots, N\}$. Given a subset $A = \{i_1, \ldots, i_k\} \subset \Omega_N$ we write, for short, $T_A := g_k(X_{i_1}, \ldots, X_{i_k})$. Put $T_\emptyset := ET$. Now the decomposition (12) can be written as follows:
$$(13)\qquad T = ET + \sum_{1 \le k \le N} U_k = \sum_{A \subset \Omega_N} T_A, \qquad U_k = \sum_{|A| = k,\ A \subset \Omega_N} T_A.$$

Proof of Theorem 1. Throughout the proof we assume without loss of generality that
$$(14)\qquad 4 < r \le 5, \qquad 2 < s \le 3, \qquad ET = 0, \qquad \sigma_T = 1.$$
Denote, for $t > 0$,
$$\beta_t = \sigma^{-t} E|g(X)|^t, \qquad \gamma_t = E|\psi(X_1, X_2)|^t, \qquad \zeta_t = E|\chi(X_1, X_2, X_3)|^t.$$

The linearization step. Choose the number $\nu$ and the integer $m$ such that
$$(15)\qquad \nu = 600^{-1}\min\{\nu_1, \nu_2, s - 2, r - 4, 1\}, \qquad m \approx N^{\nu}.$$
Split
$$T = T[m] + W, \qquad T[m] = \sum_{A :\, A \cap \Omega_m \ne \emptyset} T_A, \qquad W = \sum_{A :\, A \cap \Omega_m = \emptyset} T_A.$$
Furthermore, write
$$T[m] = U_1^* + U_2^* + \Lambda, \qquad \Lambda = \Lambda_1 + \Lambda_2 + \Lambda_3 + \Lambda_4 + \Lambda_5,$$
$$U_1^* = \sum_{i=1}^{m} T_{\{i\}}, \qquad U_2^* = \sum_{i=1}^{m} \sum_{j=m+1}^{N} T_{\{i,j\}}, \qquad \Lambda_1 = \sum_{1 \le i < j \le m} T_{\{i,j\}},$$
where $\Lambda_2, \ldots, \Lambda_5$ collect the remaining terms $T_A$ with $A \cap \Omega_m \ne \emptyset$ and $|A| \ge 3$, grouped according to $|A|$ and $|A \cap \Omega_m|$. Put $\tilde{T} = U_1^* + U_2^* + W$ and let $\tilde{F}(x) = P\{\tilde{T} \le x\}$ denote its distribution function. For every $\varepsilon > 0$, we have
$$(16)\qquad \Delta \le \sup_{x \in \mathbb{R}} |\tilde{F}(x) - G(x)| + \varepsilon \sup_{x \in \mathbb{R}} |G'(x)| + P\{|\Lambda| > \varepsilon\}.$$
From Lemma 5 we obtain via Chebyshev's inequality, for $\varepsilon = N^{-1-\nu}$,
$$P\{|\Lambda| > \varepsilon\} \le \sum_{i=1}^{5} P\{|\Lambda_i| > \varepsilon/5\} \le \big(5\varepsilon^{-1}\big)^4 E|\Lambda_1|^4 + \big(5\varepsilon^{-1}\big)^2 \big( E\Lambda_2^2 + E\Lambda_3^2 + E\Lambda_4^2 \big) + \big(5\varepsilon^{-1}\big)^s E|\Lambda_5|^s \le c_* N^{-1-\nu}.$$
In the last step we used conditions (3), (4) and the inequality (163). Furthermore, using (3) and (4) one can show that
$$(17)\qquad \sup_{x \in \mathbb{R}} |G'(x)| \le c_*.$$
Therefore, (16) implies
$$(18)\qquad \Delta \le \tilde{\Delta} + c_* N^{-1-\nu}, \qquad \tilde{\Delta} := \sup_{x \in \mathbb{R}} |\tilde{F}(x) - G(x)|.$$
It remains to show that $\tilde{\Delta} \le c_* N^{-1-\nu}$.

A smoothing inequality. Given $a > 0$ and an even integer $k \ge 2$, introduce the kernel
$$x \to g_{a,k}(x) = a\, c(k)\, (ax)^{-k} \sin^k(ax),$$
Its characteristic functionˆ g a,k ( t ) = Z + ∞−∞ e itx g a,k ( x ) dx = 2 π a c ( k ) u ∗ k [ − a,a ] ( t )vanishes outside the interval | t | ≤ ka . Here u ∗ k [ − a,a ] ( t ) denotes the probability density function ofthe sum of k independent random variables each uniformly distributed in [ − a, a ]. It is easy toshow that the function t → ˆ g a,k ( t ) is unimodal and symmetric around t = 0.Let µ be the probability distribution with the density g a, , where a is chosen to satisfy µ ([ − , / 4. Given T > µ T ( A ) = µ ( T A ), for A ⊂ R − measurable. Let ˆ µ T denote the charac-teristic function corresponding to µ T .We apply Lemma 12.1 of Bhattacharya and Rao (1986) [4]. It follows from (17) and the identity µ T ([ − T − , T − ]) = 3 / ≤ x ∈ R (cid:12)(cid:12) ( ˜ F − G ) ∗ µ T ( −∞ , x ] (cid:12)(cid:12) + c ∗ T − . Here ˜ F respectively G denote the probability distribution of ˜ T respectively the signed measurewith density function G ′ ( x ). Furthermore, ∗ denotes the convolution operation. Proceeding asin the proof of Lemma 12.2 ibidem we obtain(20) ( ˜ F − G ) ∗ µ T ( −∞ , x ] = 12 π Z + ∞−∞ e − itx (cid:16) E e it ˜ T − ˆ G ( t ) (cid:17) ˆ µ T ( t ) − it dt, where ˆ G denotes the Fourier transform of G ( x ). Note that ˆ µ T ( t ) vanishes outside the interval | t | ≤ aT . Finally, we obtain from (19) and (20) that(21) ˜∆ ≤ π sup x ∈ R | I ( x ) | + c ∗ aT , I ( x ) := Z T − T e − itx (cid:0) E e it ˜ T − ˆ G ( t ) (cid:1) ˆ µ T ′ ( t ) − it dt, where T ′ = T / a . Here we use the fact that ˆ µ T ′ ( t ) = 0 for | t | > T .Choose T = N ν and denote K N ( t ) = ˆ µ T ′ ( t ). Note that | K N ( t ) | ≤ µ T ′ is a probabilitymeasure). Write | I ( x ) | ≤ c I + c I + | I | + | I | ,I = Z | t |≤ t (cid:12)(cid:12) E e it ˜ T − ˆ G ( t ) (cid:12)(cid:12) dt | t | , I = Z t < | t | Here we prove the bounds (22) for I and I . 
The proof of the bound | I | ≤ c ∗ N − − ν is relativelysimple and it is deferred to the end of the section.Let us show that(23) (cid:12)(cid:12)(cid:12)Z N − ν < | t | 5] and ν , defined by (15), we have2 /r < α < / ν < min { rα − , − α } . Given N introduce the integers(24) n ≈ N ν , M = ⌊ ( N − m ) /n ⌋ . We have N − m = M n + s , where the integer 0 ≤ s < n . Observe, that the inequalities ν < − and m < N / , see (15), imply M > n . Therefore, s < M . Split the index set { m + 1 , . . . , N } = O ∪ O ∪ · · · ∪ O n , (25) O i = { j : m + ( i − M < j ≤ m + iM } , ≤ i ≤ n − ,O n = { j : m + ( n − M < j ≤ N } Clearly, O , . . . , O n − are of equal size (= M ) and | O n | = M + s < M .We shall assume that the random variable X : Ω → X is defined on the probability space (Ω , P )and P X is the probability distribution on X induced by X . Given p ≥ L p = L p ( X , P X )denote the space of real functions f : X → R with E | f ( X ) | p < ∞ . Denote k f k p = ( E | f ( X ) | p ) /p .With a random variable g ( X ) we associate an element g = g ( · ) of L p , p ≤ r . Let p g : L → L denotes the projection onto the subspace orthogonal to the vector g ( · ) in L . Given h ∈ L ,decompose(26) h = a h g + h ∗ , a h = h h, g i k g k − , h ∗ := p g ( h ) , where h h, g i = R h ( x ) g ( x ) P X ( dx ). For h ∈ L r we have(27) k h k r ≥ k h k ≥ k h ∗ k . Furthermore, for r − + v − = 1 (here r ≥ ≥ v > 1) we have | h h, g i | ≤ k h k r k g k v ≤ k h k r k g k . In particular,(28) | a h | ≤ k h k r / k g k . It follows from the decomposition (26) and (28) that k h ∗ k r ≤ k h k r + | a h | k g k r ≤ k h k r (1 + k g k r / k g k )(29) = c g k h k r , c g := 1 + k g k r / k g k . (30)Note that c g ≤ c ∗ g := 1 + M /r ∗ A − / ∗ . Introduce the numbers(31) a = 14 min (cid:8) c ∗ g , ( c r A ∗ / r M ∗ ) / ( r − A − / ∗ (cid:9) , c r = 724 12 r − . 
We shall show that there exist $\delta', \delta'' > 0$ depending on $A_*, M_*, \delta_1$ only such that (uniformly in $N$) Cramér's characteristic $\rho$, see (5), satisfies
$$(32)\qquad \rho(a_0, N^{\nu_1 + 1/2}) \ge \delta', \qquad \rho((2\beta_3)^{-1}, N^{\nu_1 + 1/2}) \ge \delta''.$$
We shall prove the first inequality only. In view of (5) it suffices to show that $\rho(a_0, \beta_3^{-1}) \ge \delta'$. Invoking the simple inequality (see, e.g., the proof of (187) below)
$$|E e^{it\sigma^{-1} g(X)}| \le 1 - \frac{t^2}{2}\Big( 1 - \frac{2}{3}|t|\beta_3 \Big),$$
we obtain, for $|t| \le \beta_3^{-1}$, $|E e^{it\sigma^{-1} g(X)}| \le 1 - t^2/6$. Therefore, $\rho(a_0, \beta_3^{-1}) \ge a_0^2/6$, and we can take $\delta' = \min\{\delta_1, a_0^2/6\}$ in (32). Introduce the constant (depending only on $A_*, M_*, \delta_1$)
$$(33)\qquad \delta_2 = \delta'/(10\, c_g^*).$$
Note that $0 < \delta_2 < 1/10$. Given $f \in L_r$ and $T_0 \in \mathbb{R}$ such that
$$(34)\qquad N^{1/2 - \nu} \le |T_0| \le N^{\nu_1 + 1/2},$$
denote $I(T_0) = [T_0, T_0 + \delta_2 N^{1/2 - \nu}]$ and
$$(35)\qquad \tau(f) = 1 - v(f), \qquad v(f) = \sup_{t \in I(T_0)} |u_t(f)|, \qquad u_t(f) = \int \exp\big\{ it\big( g(x) + N^{-1/2} f(x) \big) \big\}\, P_X(dx).$$
Given a random variable $\eta$ with values in $L_r$ and a number $0 < s < 1$, define
$$(36)\qquad d_s(\eta, I(T_0)) = \mathbb{I}\{ v(\eta) > 1 - s \}\, \mathbb{I}\{ \|\eta\|_r \le N^{\nu} \}, \qquad \delta_s(\eta, I(T_0)) = E\, d_s(\eta, I(T_0)).$$
Introduce the function
$$(37)\qquad \psi^{**}(x, y) = \psi(x, y) - b(x) g(y) - b(y) g(x)$$
and the number $\delta_3^2 = E|\psi^{**}(X_1, X_2)|^2$. It follows from (6) and our assumption that $\sigma_T = 1$, see (14), that $\delta_3^2 \ge \delta_*$. The proof of (23) is rather technical and therefore will be illustrated by an outline. In the first step we truncate the random variables $X_{m+1}, \ldots, X_N$ in a special way using conditioning and replace them by corresponding "truncated" random variables $Y_{m+1}, \ldots, Y_N$. Correspondingly the statistic $\tilde{T}$ is then replaced by $T'$, see (43). In the second step we split the interval of frequencies $N^{1-\nu} \le |t| \le N^{1+\nu_1}$ into non overlapping intervals $\cup_p J_p$ of lengths $\approx \delta_2 N^{1-\nu}$, so that the integral (45) splits into the sum (46). Conditionally, given $Y_{m+1}, \ldots, Y_N$, the statistic $T'$ is linear in the observations $X_1, \ldots$
$\ldots, X_m$, since in $\tilde{T}$ we have removed the higher order terms (in $X_1, \ldots, X_m$) from $T$. Let $E_Y$ denote the conditional expectation given $Y_{m+1}, \ldots, Y_N$. The conditional characteristic function $E_Y \exp\{itT'\} = \alpha_t^m \exp\{itW'\}$ contains the multiplicative component $\alpha_t^m$, where
$$\alpha_t = E_Y \exp\Big\{ itN^{-1/2} g(X_1) + itN^{-1} \sum_{j=m+1}^{N} \psi(X_1, Y_j) \Big\}$$
and where the real valued statistic $W'$ is obtained from $W$ by replacing $X_j$ by $Y_j$, for $m+1 \le j \le N$. In order to bound $|E_Y e^{itT'}|$ one would like to show exponential decay (in $m$) of the product $|\alpha_t^m|$ using a Cramér type condition like (5) above. For $|t| = o(N)$ (the case of medium frequencies) the size of the quadratic part $N^{-1}\sum_{j=m+1}^{N} \psi(X_1, Y_j)$ can be neglected and Cramér's condition implies $|\alpha_t| \le 1 - v'$ for some $v' > 0$. Thus we obtain $|\alpha_t^m| \le e^{-mv'}$. For large frequencies $|t| \approx N$, the contribution of the quadratic part becomes significant and we invoke the extra moment condition (6). Using (6), we show that, for a large set of values $t \in J_p$, Cramér's condition (5) yields the desired decay of $|\alpha_t^m|$. Furthermore, the measure of the remaining $t$ is small with high probability.

Step 1. Truncation. Recall that the random variable $X : \Omega \to \mathcal{X}$ is defined on the probability space $(\Omega, P)$. Let $X'$ be an independent copy, so that $(X, X')$ is defined on $(\Omega \times \Omega', P \times P)$, where $\Omega' = \Omega$. It follows from $E|\psi(X, X')|^r < \infty$, by Fubini, that for $P$ almost all $\omega' \in \Omega'$ the function $\psi(\cdot, X'(\omega')) = \{x \to \psi(x, X'(\omega')),\ x \in \mathcal{X}\}$ is an element of $L_r$. Furthermore, one can define an $L_r$-valued random variable $Z' : \Omega' \to L_r$ such that $Z'(\omega') = \psi(\cdot, X'(\omega'))$, for $P$ almost all $\omega'$. Consider the event $\tilde{\Omega} = \{\|Z'\|_r \le N^{\alpha}\} \subset \Omega'$ and denote $q_N = P(\tilde{\Omega})$. Here $\|Z'\|_r = (\int |\psi(x, X'(\omega'))|^r\, P_X(dx))^{1/r}$ denotes the $L_r$ norm of the random vector $Z'$. Let $Y : \tilde{\Omega} \to \mathcal{X}$ denote the random variable $X'$ conditioned on the event $\tilde{\Omega}$.
Therefore $Y$ is defined on the probability space $(\tilde{\Omega}, \tilde{P})$, where $\tilde{P}$ denotes the restriction of $q_N^{-1} P$ to the set $\tilde{\Omega}$ and, for every $\omega' \in \tilde{\Omega}$, we have $Y(\omega') = X'(\omega')$. Let $Z$ denote the $L_r$-valued random element $\{x \to \psi(x, Y(\omega'))\}$ defined on the probability space $(\tilde{\Omega}, \tilde{P})$. We can assume that $\mathbb{X} := (X_1, \ldots, X_N)$ is a sequence of independent copies of $X$ defined on the probability space $(\Omega^N, P^N)$. Let $\omega = (\omega_1, \ldots, \omega_N)$ denote an element of $\Omega^N$. Every $X_j$ defines a random vector $Z'_j = \psi(\cdot, X_j)$ taking values in $L_r$. Introduce the events $A_j := \{\|Z'_j\|_r \le N^{\alpha}\} \subset \Omega^N$ and let $\mathbb{X}' = (X_1, \ldots, X_m, Y_{m+1}, \ldots, Y_N)$ denote the sequence $\mathbb{X}$ conditioned on the event $\Omega_* = \cap_{j=m+1}^{N} A_j = \Omega^m \times \tilde{\Omega}^{N-m}$. Clearly, $\mathbb{X}'(\omega) = \mathbb{X}(\omega)$ for every $\omega \in \Omega_*$, and $\mathbb{X}'$ is defined on the space $\Omega^m \times \tilde{\Omega}^{N-m}$ equipped with the probability measure $P^m \times \tilde{P}^{N-m}$. In particular, the random variables $X_1, \ldots, X_m, Y_{m+1}, \ldots, Y_N$ are independent and $Y_j$, for $m+1 \le j \le N$, has the same distribution as $Y$. Let $Z_j$ denote the $L_r$-valued random element $\{x \to \psi(x, Y_j)\}$, for $m+1 \le j \le N$. We are going to replace $E e^{it\tilde{T}}$ by $E e^{itT'}$. For $s > 0$ we have
$$(38)\qquad 1 - \mathbb{I}_{A_j} \le N^{-\alpha s} \|Z'_j\|_r^s, \qquad \|Z'_j\|_r^r = E\big( |\psi(X, X_j)|^r \,\big|\, X_j \big).$$
Therefore, by Chebyshev's inequality, for $s = r$,
$$(39)\qquad 0 \le 1 - q_N \le N^{-r\alpha} E|\psi(X, X_j)|^r \le N^{-r\alpha} M_* \le c_* N^{-2-\nu}.$$
We have, for $k \le N$,
$$(40)\qquad q_N^{-k} \le (1 - N^{-r\alpha} M_*)^{-k} \le (1 - N^{-2} M_*)^{-N} \le c_*, \qquad q_N^{-k} - 1 \le k\, q_N^{-k} (1 - q_N) \le c_* k N^{-2-\nu} \le c_* N^{-1-\nu}.$$
For a measurable function $f : \mathcal{X}^N \to \mathbb{R}$ we have
$$(41)\qquad E f(X_1, \ldots, X_m, Y_{m+1}, \ldots, Y_N) = E f(X_1, \ldots, X_N)\, \mathbb{I}_{A_{m+1}} \cdots \mathbb{I}_{A_N}\, q_N^{-(N-m)}.$$
Therefore, for $f \ge 0$, (40) and (41) imply
$$(42)\qquad E f(X_1, \ldots, X_m, Y_{m+1}, \ldots, Y_N) \le c_* E f(X_1, \ldots, X_N).$$
Furthermore, for
$$(43)\qquad T' := \tilde{T}(X_1, \ldots, X_m, Y_{m+1}, \ldots$$
$\dots,Y_N)$ we have, by (40) and (41),
(44) $|\mathbf Ee^{it(T'-x)}-\mathbf Ee^{it(\tilde T-x)}|\le\bigl(q_N^{-(N-m)}-1\bigr)+\bigl(1-P\{A_{m+1}\cap\dots\cap A_N\}\bigr)=(q_N^{-(N-m)}-1)+(1-q_N^{N-m})\le c_*N^{-1-\nu}$.
We replace $\tilde T$ by $T'$ in the exponent in (23). The error of this replacement is $c_*N^{-1-\nu}$, by (44) and the simple inequality $|K_N(t)|\le1$, for every $t$. In order to prove (23) we shall show that
(45) $I:=\int_{N^{1-\nu}\le|t|\le N^{1+\nu}}\mathbf Ee^{it\hat T}v_N(t)\,dt\le c_*\delta^{-2}N^{-\nu}$, where $v_N(t)=t^{-1}K_N(t)$ and $\hat T=T'-x$.

Step 2. Here we prove (45). Split the integral
(46) $I=\sum_pI_p$, $I_p=\mathbf E\int_{t\in J_p}e^{it\hat T}v_N(t)\,dt$,
where $\{J_p,\ p=1,2,\dots\}$ denotes a sequence of consecutive intervals of length $\approx\delta N^{1-\nu}$ each. Here $\delta$ is a constant defined by (33). In order to prove (45) we show that for every $p$,
(47) $|I_p|\le c_*N^{-2}+c_*\delta^{-2}N^{-1}$.
Given $p$ let us prove (47). Firstly, we replace $I_p$ by $\mathbf EJ_*$, where
$$J_*=\int\mathbb I\{t\in I_*\}v_N(t)\mathbf E_Ye^{it\hat T}\,dt.$$
Here $I_*=I_*(Y_{m+1},\dots,Y_N)\subset J_p$ is a random subset defined by
(48) $I_*=\{t\in J_p:\ |\alpha_t|>1-\varepsilon_m\}$, $\varepsilon_m=m^{-1}\ln^2N$.
Since, for $t\notin I_*$, we have $|\mathbf E_Ye^{itT'}|\le|\alpha_t|^m\le(1-\varepsilon_m)^m\le c_*N^{-2}$, the error of this replacement is given by
(49) $|I_p-\mathbf EJ_*|\le c_*N^{-2}$.
Secondly, we shall show that with high probability the set $I_*\subset J_p$ is a (random) interval. This fact and the fact that $v_N(t)$ is monotone will be used later to bound the integral $J_*$. Introduce the $L_r$-valued random element
(50) $S=N^{-1/2}(Z_{m+1}+\dots+Z_N)=N^{-1/2}\sum_{j=m+1}^N\psi(\cdot,Y_j)$.
We apply Lemma 12 to the set $N^{-1/2}I_*$ conditionally on the event $SS=\{\|S\|_r<N^{\nu/2}\}$. This lemma shows that $N^{-1/2}I_*$ is an interval of size at most $c_*\varepsilon_m$. That is, we can write $I_*=(a_N,a_N+b_N^{-1})$ and
(51) $\mathbb I_{SS}J_*=\mathbb I_{SS}\mathbf E_Y\tilde J_*$, $\tilde J_*=\int_{a_N}^{a_N+b_N^{-1}}v_N(t)e^{it\hat T}\,dt$,
where the random variables (functions of $Y_{m+1},\dots$
$\dots,Y_N$) satisfy $a_N\in J_p$ and $b_N^{-1}\le c_*\varepsilon_m\sqrt N=c_*\sqrt N\,m^{-1}\ln^2N$. By Lemma 13, $P\{SS\}\ge1-c_*N^{-2}$. Therefore,
(52) $|\mathbf EJ_*-\mathbf E\,\mathbb I_{SS}J_*|\le c_*N^{-2}$.
Clearly, $I_*\ne\emptyset$ if and only if $\alpha_0>1-\varepsilon_m$, where $\alpha_0=\sup\{|\alpha_t|:\ t\in J_p\}$. Therefore, we can write, see also (51),
$$\mathbb I_{SS}J_*=\mathbb I_BJ_*=\mathbb I_B\mathbf E_Y\tilde J_*,\qquad B=\{\alpha_0>1-\varepsilon_m\}\cap SS.$$
This identity together with (49) and (52) shows
(53) $|I_p|\le|\mathbf E\,\mathbb I_B\mathbf E_Y\tilde J_*|+c_*N^{-2}$.
Using the integration by parts formula we shall show that
(54) $|\mathbf E\,\mathbb I_B\mathbf E_Y\tilde J_*|\le cN^{\nu-1}\Bigl(P\{B\}+\int_{b_N}^1P\{B_\varepsilon\}\varepsilon^{-2}\,d\varepsilon\Bigr)$, $B_\varepsilon=B\cap\{|\hat T|\le\varepsilon\}$.
This inequality in combination with (53) and (55), see below, shows (47),
(55) $\int_{b_N}^1P\{B_\varepsilon\}\varepsilon^{-2}\,d\varepsilon\le c_*\delta^{-2}N^{-\nu}$, $P\{B\}\le c_*\delta^{-2}N^{-\nu}$.
The proof of (55) is rather technical. It is given in a separate subsection below. Let us prove (54). Firstly, we show that
(56) $|\tilde J_*|\le c(|\hat T|+b_N)^{-1}a_N^{-1}$.
The integration by parts formula shows
(57) $i\hat T\tilde J_*=v_N(t)e^{it\hat T}\Big|_{a_N}^{a_N+b_N^{-1}}-\int_{a_N}^{a_N+b_N^{-1}}v_N'(t)e^{it\hat T}\,dt=:a'+a''$.
By our choice of the smoothing kernel, $v_N(t)$ is monotone on $J_p$. Therefore,
$$|a''|\le\int_{a_N}^{a_N+b_N^{-1}}|v_N'(t)|\,dt=\Bigl|\int_{a_N}^{a_N+b_N^{-1}}v_N'(t)\,dt\Bigr|=|v_N(a_N)-v_N(a_N+b_N^{-1})|.$$
Invoking the simple inequality $|a'|\le|v_N(a_N)|+|v_N(a_N+b_N^{-1})|$ and using $|v_N(t)|\le|t|^{-1}$ we obtain from (57)
$$|\hat T\tilde J_*|\le c\bigl(a_N^{-1}+(a_N+b_N^{-1})^{-1}\bigr)\le c\,a_N^{-1}.$$
For $|\hat T|>b_N$, this inequality implies (56). For $|\hat T|\le b_N$ the inequality (56) follows from the inequalities
$$|\tilde J_*|\le\int_{a_N}^{a_N+b_N^{-1}}|v_N(t)|\,dt\le\int_{a_N}^{a_N+b_N^{-1}}c|t|^{-1}\,dt\le c\,a_N^{-1}b_N^{-1}.$$
Furthermore, it follows from (56) and the inequality $a_N\ge N^{1-\nu}$ that $|\tilde J_*|\le c(|\hat T|+b_N)^{-1}N^{\nu-1}$.
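The mechanism behind (56)–(57) is the usual oscillatory-integral gain: integrating $e^{it\hat T}$ against a monotone kernel of size $|t|^{-1}$ over an interval of length $b_N^{-1}$ starting at $a_N$ produces a factor of order $(|\hat T|+b_N)^{-1}a_N^{-1}$. A quick numerical sanity check with the model kernel $v_N(t)=1/t$ (the interval $(a,a+b^{-1})$, the test frequencies $v$ and the constant 4 below are illustrative, not the constants of the proof):

```python
import cmath

def osc_integral(v, a, length, steps=40_001):
    """Trapezoidal approximation of the integral of exp(i*v*t)/t over [a, a+length]."""
    h = length / (steps - 1)
    total = 0j
    for k in range(steps):
        t = a + k * h
        w = 0.5 if k in (0, steps - 1) else 1.0
        total += w * cmath.exp(1j * v * t) / t
    return total * h

a, b = 10.0, 0.5          # interval start and inverse length, playing (a_N, 1/b_N)
checks = []
for v in [0.0, 0.3, 1.0, 5.0, 20.0, 50.0]:   # v plays the role of T-hat
    lhs = abs(osc_integral(v, a, 1.0 / b))
    rhs = 4.0 / ((v + b) * a)                 # the shape of the bound (56)
    checks.append(lhs <= rhs)
```

For small $v$ the kernel size $a^{-1}b^{-1}$ dominates; for large $v$ integration by parts yields the extra $v^{-1}$, exactly the two cases distinguished after (57).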
Finally, we apply the inequality (which holds for an arbitrary real number $v$)
$$\frac{1}{|v|+b_N}\le\int_{b_N}^{\infty}\frac{d\varepsilon}{\varepsilon^2}\,\mathbb I\{|v|\le\varepsilon\}$$
to derive
$$|\tilde J_*|\le c_*N^{\nu-1}\int_{b_N}^{\infty}\frac{d\varepsilon}{\varepsilon^2}\,\mathbb I\{|\hat T|\le\varepsilon\}.$$
Taking the expectation over $B$ and splitting the integral at $\varepsilon=1$ (the tail $\int_1^\infty$ contributes at most $P\{B\}$) gives (54).

Here we prove (55). The first (respectively second) inequality is proved in Step A (respectively Step B).

Step A. Here we prove the first inequality of (55). Split
$$W=W_1+W_2+W_3,\qquad W_1=\frac{1}{N^{1/2}}\sum_{j=m+1}^Ng(X_j),\qquad W_2=\frac{1}{N^{3/2}}\sum_{m<i<j\le N}\psi(X_i,X_j),$$
where $W_3$ collects the summands $T_A$ with $|A|\ge3$. Let $W_1',W_2',W_3'$ denote the corresponding sums in the variables $Y_{m+1},\dots,Y_N$, and write
(58) $\hat T=L+\Delta+W_3'$, where $L=N^{-1/2}\sum_{j=1}^mg(X_j)+W_1'-x$ denotes the linear part and $\Delta$ collects the quadratic terms. On the event $B_\varepsilon$ we have $|\hat T|\le\varepsilon$, and either $|\Delta+W_3'|<\varepsilon$, in which case $|L|\le2\varepsilon$, or $|\Delta+W_3'|\ge\varepsilon$. Therefore,
$$P\{B_\varepsilon\}\le P\{B\cap\{|L|\le2\varepsilon\}\}+P\{|\hat T|\le\varepsilon,\ |\Delta+W_3'|\ge\varepsilon\}=:I_1(\varepsilon)+I_2(\varepsilon).$$
In order to prove (55) we show that
(59) $\int_{b_N}^1\varepsilon^{-2}I_1(\varepsilon)\,d\varepsilon\le c_*N^{-\nu}(1+\delta^{-2})$, $\int_{b_N}^1\varepsilon^{-2}I_2(\varepsilon)\,d\varepsilon\le c_*N^{-\nu}$.

Step A.1. Here we prove (59) for the integral over $I_2(\varepsilon)$. We have
(60) $I_2(\varepsilon)\le P\{|W_3'|>\varepsilon/2\}+I_{21}(\varepsilon)$, $I_{21}(\varepsilon):=P\{|L+\Delta|<3\varepsilon/2,\ |\Delta|>\varepsilon/2\}$.
It follows from (42), by Chebyshev's inequality, that $P\{|W_3'|>\varepsilon/2\}\le c_*\varepsilon^{-2}\mathbf EW_3^2$. Furthermore, invoking the inequalities, see (162), (163),
$$\mathbf EW_3^2=\sum_{|A|\ge3,\ A\cap\Omega_m=\emptyset}\mathbf ET_A^2\le\sum_{|A|\ge3}\mathbf ET_A^2\le N^{-2}\bar\Delta^2\le c_*N^{-2},$$
we obtain from (60) that $I_2(\varepsilon)\le I_{21}(\varepsilon)+c_*\varepsilon^{-2}N^{-2}$. Since
$$\int_{b_N}^1\frac{d\varepsilon}{\varepsilon^2}\,\frac{c_*}{\varepsilon^2N^2}\le c_*b_N^{-3}N^{-2}\le c_*N^{-\nu},$$
it suffices to show (59) for $I_{21}(\varepsilon)$. This is achieved by a similar argument, based on Chebyshev type inequalities applied to the quadratic part $\Lambda=N^{-3/2}\sum_{1\le i<j\le m}\psi(X_i,X_j)$ and to the remaining summands of $\Delta$.

Step A.2. Here we prove (59) for $I_1(\varepsilon)$. Write $I_1(\varepsilon)$ in the form
$$I_1(\varepsilon)=\mathbf E\,\mathbb I_A\mathbb I_{SS}\mathbb I_W\le\mathbf E\,\mathbb I_A\mathbb I_V\mathbb I_W,\quad A=\{\alpha_0>1-\varepsilon_m\},\ V=\{\|S\|_r\le N^{\nu}\},\ W=\{|L|<2\varepsilon\},$$
where $\varepsilon_m$ is defined in (48). Note that, by the Berry–Esseen inequality,
(65) $P\{W\}\le c_*(\varepsilon+N^{-1/2})$.
Furthermore, one can show that the probability of the event $A$ is small, namely $P\{A\}=O(N^{-\nu})$. We are going to make use of both of these bounds while constructing an upper bound for $I_1(\varepsilon)$. Since the events $A$ and $W$ refer to the same set of random variables $Y_{m+1},\dots,Y_N$, we cannot argue directly that $\mathbf E\,\mathbb I_A\mathbb I_W\approx P\{A\}P\{W\}$.
Nevertheless, invoking a more involved conditioning argument we are able to show that
(66) $I_1(\varepsilon)\le c_*R(\varepsilon+N^{-1/2})+c_*N^{-1}$, $R:=N^{-\nu}(1+\delta^{-2})$.
Since $\varepsilon\ge b_N>N^{-1/2}$, the inequality (66) implies (59).

Let us prove (66). Since the proof is rather involved we start by providing an outline. Let the integers $n$ and $M$ be defined by (24). Split $\{1,\dots,N\}=O_0\cup O_1\cup\dots\cup O_n$, where $O_0=\{1,\dots,m\}$ and where the sets $O_i$, for $1\le i\le n$, are defined in (25). Split $L$, see (58),
(67) $L=\sum_{k=0}^nL_k-x$, $L_k=N^{-1/2}\sum_{j\in O_k}g(Y_j)$, for $k=1,\dots,n$, and $L_0=N^{-1/2}\sum_{j\in O_0}g(X_j)$.
Observe that $\mathbb I_W$ is a function of $L_0,L_1,\dots,L_n$. The random variables $\mathbb I_A$ and $\mathbb I_V$ are functions of $Y_{m+1},\dots,Y_N$ and do not depend on $X_1,\dots,X_m$. Therefore, denoting
$$m(l_1,\dots,l_n)=\mathbf E(\mathbb I_A\mathbb I_V\,|\,L_1=l_1,\dots,L_n=l_n),$$
we obtain from (65)
(68) $\mathbf E\,\mathbb I_A\mathbb I_V\mathbb I_W=\mathbf E\,\mathbb I_W\,m(L_1,\dots,L_n)\le c_*(\varepsilon+N^{-1/2})M_0$, where $M_0=\operatorname{ess\,sup}m(l_1,\dots,l_n)$.
Clearly, the bound $M_0\le c_*R$ would imply (66). Unfortunately, we are not able to establish such a bound directly. In what follows we prove (66) using the argument outlined above, but with a more delicate conditioning which allows us to estimate quantities like $M_0$.

Step A.2.1. Firstly we replace $L_k$, $1\le k\le n$, by the smoothed random variables
(69) $g_k=N^{-1}n^{-1/2}\xi_k+L_k$,
where $\xi_1,\dots,\xi_n$ are symmetric i.i.d. random variables with the density function defined by (18) with $k=6$ and $a=1/2$, whose characteristic function $t\to\mathbf E\exp\{it\xi_1\}$ vanishes outside the unit interval $\{t:|t|<1\}$. Note that $\mathbf E\xi_1^2<\infty$. We assume that the sequences $\xi_1,\xi_2,\dots$ and $X_1,\dots,X_m,Y_{m+1},\dots,Y_N$ are independent. In particular, $\xi_k$ and $L_k$ are independent. Introduce the event
$$\tilde W=\Bigl\{\Bigl|L_0+\sum_{k=1}^ng_k-x\Bigr|<3\varepsilon\Bigr\}.$$
Note that $\mathbb I_W\le\mathbb I_{\tilde W}+\mathbb I\{|\bar\xi|\ge\varepsilon N\}$, where $\bar\xi=n^{-1/2}\sum_{k=1}^n\xi_k$. By Chebyshev's inequality and the inequality $\mathbf E\bar\xi^2\le c$,
$$P\{|\bar\xi|\ge\varepsilon N\}\le\frac{\mathbf E\bar\xi^2}{\varepsilon^2N^2}\le\frac{c}{\varepsilon^2N^2}\le c_*N^{-1}.$$
Here we used the inequality $\varepsilon^2N^2\ge b_N^2N^2\ge c_*'N$. Therefore, we obtain
(70) $\mathbf E\,\mathbb I_A\mathbb I_V\mathbb I_W\le\mathbf E\,\mathbb I_A\mathbb I_V\mathbb I_{\tilde W}+c_*N^{-1}$.
In subsequent steps of the proof we replace the conditioning on $L_1,\dots,L_n$ (in (68)) by conditioning on the random variables $g_1,\dots,g_n$. Since the latter random variables have densities (as is shown in Lemma 7 below), the corresponding conditional distributions are much easier to handle. Moreover, we restrict the conditioning to the event where these densities are positive.

Step A.2.2. Given $w>0$, consider the events $\{|g_k|\le n^{-1/2}w\}$ and their indicator functions $\mathbb I_k=\mathbb I\{|g_k|\le n^{-1/2}w\}$. Using the simple inequality $n\mathbf Eg_k^2\le c_*$ (where $c_*$ depends on $M_*$ and $r$) we obtain from Chebyshev's inequality that
(71) $P\{\mathbb I_k=1\}=1-P\{|g_k|>n^{-1/2}w\}\ge1-w^{-2}n\mathbf E|g_k|^2>1/2$,
where the last inequality holds for a sufficiently large constant $w$ (depending on $M_*$, $r$). Fix a number $w$ such that (71) holds and introduce the event $B_*=\{\sum_{k=1}^n\mathbb I_k>n/4\}$. Hoeffding's inequality shows $P\{B_*\}\ge1-\exp\{-n/8\}$. Therefore,
(72) $\mathbf E\,\mathbb I_A\mathbb I_V\mathbb I_{\tilde W}\le\mathbf E\,\mathbb I_A\mathbb I_V\mathbb I_{\tilde W}\mathbb I_{B_*}+c_*N^{-1}$.
Given a binary vector $\theta=(\theta_1,\dots,\theta_n)$ (with $\theta_k\in\{0,1\}$) write $|\theta|=\sum_k\theta_k$. Introduce the event $B_\theta=\{\mathbb I_k=\theta_k,\ 1\le k\le n\}$ and the conditional expectation
$$m_\theta(z_1,\dots,z_n)=\mathbf E(\mathbb I_A\mathbb I_V\mathbb I_{B_\theta}\,|\,g_1=z_1,\dots,g_n=z_n).$$
Note that $\mathbb I_{B_\theta}$, the indicator of the event $B_\theta$, is a function of $g_1,\dots,g_n$. It follows from the identities $B_*=\cup_{|\theta|>n/4}B_\theta$ and $\mathbb I_{B_*}=\sum_{|\theta|>n/4}\mathbb I_{B_\theta}$ (here $B_\theta\cap B_{\theta'}=\emptyset$, for $\theta\ne\theta'$) that
$$\mathbf E\,\mathbb I_A\mathbb I_V\mathbb I_{\tilde W}\mathbb I_{B_*}=\sum_{|\theta|>n/4}\mathbf E\,\mathbb I_A\mathbb I_V\mathbb I_{\tilde W}\mathbb I_{B_\theta}=\sum_{|\theta|>n/4}\mathbf E\,\mathbb I_{B_\theta}\mathbb I_{\tilde W}\,m_\theta(g_1,\dots,g_n).$$
Assume that we have already shown that, uniformly in $\theta$ satisfying $|\theta|>n/4$, we have
(73) $M_\theta\le c_*R$, where $M_\theta:=\operatorname{ess\,sup}m_\theta(z_1,\dots,z_n)$.
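The Hoeffding step above can be checked exactly in the simplest case: if each indicator succeeds with probability exactly $1/2$ (the worst case allowed by (71)), the count $\sum_k\mathbb I_k$ is Binomial$(n,1/2)$ and its probability of falling to $n/4$ or below is dominated by $\exp\{-n/8\}$. A deterministic check with the illustrative value $n=64$:

```python
import math

n = 64
# Exact lower-tail probability P{ Binomial(n, 1/2) <= n/4 }
tail = sum(math.comb(n, k) for k in range(n // 4 + 1)) / 2 ** n
# Hoeffding bound for a deviation of n/4 below the mean n/2:
# P{ X <= n/2 - t } <= exp(-2 t^2 / n)  with  t = n/4.
hoeffding = math.exp(-n / 8)
```

The exact tail is in fact considerably smaller than the Hoeffding bound, which is all that the crude step (72) requires.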
This bound in combination with (65), which extends to $\tilde W$ as well, implies
$$\mathbf E\,\mathbb I_A\mathbb I_V\mathbb I_{\tilde W}\mathbb I_{B_*}\le c_*R\sum_{|\theta|>n/4}\mathbf E\,\mathbb I_{B_\theta}\mathbb I_{\tilde W}=c_*R\,\mathbf E\,\mathbb I_{B_*}\mathbb I_{\tilde W}\le c_*R\,P\{\tilde W\}\le c_*R(\varepsilon+N^{-1/2}).$$
Combining this inequality, (70) and (72) we obtain (66).

Step A.2.3. Here we show (73). Fix $\theta=(\theta_1,\dots,\theta_n)$ satisfying $|\theta|>n/4$. Denote, for brevity, $h=|\theta|$ and assume without loss of generality that $\theta_i=1$, for $1\le i\le h$, and $\theta_j=0$, for $h+1\le j\le n$. Consider the $h$-dimensional random vector $g^{[\theta]}=(g_1,\dots,g_h)$. The random vector $g^{[\theta]}$ and the sequences of random variables
$$Y_\theta=\{Y_j:\ m+hM<j\le N\},\qquad\xi_\theta=\{\xi_j:\ h<j\le n\}$$
are independent. Furthermore, the summands $S_\theta$ and $S_\theta'$ of the decomposition $S=S_\theta+S_\theta'$,
$$S_\theta(\cdot)=\frac{1}{\sqrt N}\sum_{1\le k\le h}\sum_{j\in O_k}\psi(\cdot,Y_j),$$
are independent, see (50). Moreover, we have $m_\theta(z_1,\dots,z_n)\le\tilde m_\theta(z^{[\theta]})$, where
$$\tilde m_\theta(z^{[\theta]})=\operatorname{ess\,sup}_\theta\mathbf E\bigl(\mathbb I_A\mathbb I_V\mathbb I_{B_\theta}\,\big|\,g^{[\theta]}=z^{[\theta]},Y_\theta,\xi_\theta\bigr)$$
denotes the "ess sup" taken with respect to almost all values of $Y_\theta$ and $\xi_\theta$. Here $z^{[\theta]}=(z_1,\dots,z_h)\in\mathbb R^h$. In order to prove (73) we show that
(74) $\tilde m_\theta(z^{[\theta]})\le c_*R$.
Let us prove (74). Given $Y_\theta$, denote $f_\theta=S_\theta'$ (note that $S_\theta'$ is a function of $Y_\theta$). Using the notation (36), we have for the interval $J_p'=N^{-1/2}J_p$,
(75) $\mathbf E\bigl(\mathbb I_A\mathbb I_V\mathbb I_{B_\theta}\,\big|\,g^{[\theta]}=z^{[\theta]},Y_\theta,\xi_\theta\bigr)=\mathbb I_{B_\theta}\mathbf E\bigl(d_{\varepsilon_m}(f_\theta+S_\theta,J_p')\,\big|\,g^{[\theta]}=z^{[\theta]},Y_\theta,\xi_\theta\bigr)$.
Note that the factor $\mathbb I_{B_\theta}$ on the right hand side is nonzero only in the case where $z^{[\theta]}=(z_1,\dots,z_h)$ satisfies $|z_i|\le w/\sqrt n$, for $i=1,\dots,h$. Introduce the $L_r$-valued random variables
$$U_i=N^{-1/2}\sum_{j\in O_i}\psi(\cdot,Y_j),\qquad i=1,\dots,h,$$
and the regular conditional probability $P(z^{[\theta]};\mathcal A)=\mathbf E\bigl(\mathbb I\{(U_1,\dots,U_h)\in\mathcal A\}\,\big|\,g^{[\theta]}=z^{[\theta]}\bigr)$.
Here $\mathcal A$ denotes a Borel subset of $L_r\times\dots\times L_r$ ($h$ times). By independence, there exist regular conditional probabilities
(76) $P_i(z_i;\mathcal A_i)=\mathbf E(\mathbb I\{U_i\in\mathcal A_i\}\,|\,g_i=z_i)$, $i=1,\dots,h$,
such that for Borel subsets $\mathcal A_i$ of $L_r$ we have
$$P(z^{[\theta]};\mathcal A_1\times\dots\times\mathcal A_h)=\prod_{1\le i\le h}P_i(z_i;\mathcal A_i).$$
In particular, for every $z^{[\theta]}$, the regular conditional probability $P(z^{[\theta]};\cdot)$ is the (measure theoretical) extension of the product of the regular conditional probabilities (76). Therefore, denoting by $\psi_i$ a random variable with values in $L_r$ and with the distribution
(77) $P\{\psi_i\in\mathcal B\}=P_i(z_i;\mathcal B)$, $\mathcal B\subset L_r$ a Borel set,
we obtain that the distribution of the sum
(78) $\zeta=\psi_1+\dots+\psi_h$
of independent random variables $\psi_1,\dots,\psi_h$ is the regular conditional distribution of $S_\theta$, given $g^{[\theta]}=z^{[\theta]}$. In particular, the expectation on the right hand side of (75) equals $\delta_{\varepsilon_m}(f_\theta+\zeta)$, where
(79) $\delta_s(f_\theta+\zeta):=\mathbf E_\zeta d_s(f_\theta+\zeta,J_p')$, $s>0$,
and where $\mathbf E_\zeta$ denotes the conditional expectation given all the random variables but $\zeta$. Note that the inequality
(80) $\varepsilon_m\le\varepsilon_*$
implies
(81) $\delta_{\varepsilon_m}(f_\theta+\zeta)\le\delta_{\varepsilon_*}(f_\theta+\zeta)$.
We shall apply Lemma 1 to construct an upper bound for $\delta_{\varepsilon_*}(f_\theta+\zeta)$, where $\varepsilon_*=\mu_*|T|N^{-1/2}/c_0$ and $\mu_*$ is defined in (92) and satisfies $c_*\delta^2/\sqrt n\le\mu_*\le c_*'\delta^2/\sqrt n$, by the inequality (212). Note that for $T$ satisfying (34), for some integers $m$ and $n$ as in (15) and (24), and for the quantity $\delta$ (see (37)) which satisfies
(82) $\delta\ge N^{-\nu}$,
the inequality (80) holds with $\varepsilon_m$ defined by (48), provided that $N$ is sufficiently large ($N>C_*$). Moreover, we have
(83) $\varepsilon_*\le c_*\delta^2N^{-2r\nu/(r-2)}$.
In order to apply Lemma 1 we invoke the moment inequalities of Lemma 10. Now Lemma 1 shows that
(84) $\delta_{\varepsilon_*}(f_\theta+\zeta)\le c_*\kappa_*^{1/2}\varepsilon_*^{(r-2)/(2r)}+c_*N^{-2}$,
where the number $\kappa_*$, defined in (92), satisfies $\kappa_*\le c_*\delta^{-2r/(r-2)}$, by (213). Denote $\tilde r=2(r-1)/(r(r-2))$.
It follows from (84), (83) and (81), for $r>4$, that
(85) $\delta_{\varepsilon_m}(f_\theta+\zeta)\le c_*\delta^{-2\tilde r}N^{-\nu}+c_*N^{-2}\le c_*(1+\delta^{-2\tilde r})N^{-\nu}\le c_*R$.
In the last step we used the simple bound $\delta\le c_*$, see (195), and the inequality $1+\delta^{-2\tilde r}\le2\delta^{-2}$, which follows from $\tilde r<1$. Note that (85) and (75), (79) imply (74), thus completing the proof of the first inequality of (55).

Step B. Here we prove the second bound of (55). It is convenient to write the $L_r$-valued random variable (50) in the form
(86) $S=U_1+\dots+U_{n-1}+U_n=:S'+U_n$, $U_i=N^{-1/2}\sum_{j\in O_i}\psi(\cdot,Y_j)$.
Observe that $U_1,\dots,U_{n-1}$ are independent and identically distributed $L_r$-valued random variables. We are going to apply Lemma 1 conditionally, given $U_n$, to the probability
$$P\{B\}=\mathbf E\tilde p(U_n),\qquad\tilde p(f)=\mathbf E\bigl(d_{\varepsilon_m}(S'+f,N^{-1/2}J_p)\,\big|\,U_n=f\bigr).$$
Lemma 9 shows that $U_1,\dots,U_{n-1}$ satisfy the moment conditions of Lemma 1, but now the corresponding quantity $\mu_*$, see (92), satisfies $c_*\delta^2/\sqrt n\le\mu_*\le c_*'/\sqrt n$, by (196). This implies the bound $\varepsilon_*\le c_*N^{-2r\nu/(r-2)}$ instead of (83). As a result we obtain a different power of $\delta$ in the upper bound below. Proceeding as in the proof of (85), see (81), (83), (84), we obtain
$$\tilde p(f)\le c_*(1+\delta^{-r/(r-2)})N^{-\nu}\le c_*R.$$
In the last step we used the inequality $1+\delta^{-r/(r-2)}\le2\delta^{-2}$, which follows from $r/(r-2)<2$, for $r>4$. Therefore, we have $P\{B\}\le\mathbf E\tilde p(U_n)\le c_*R$, where $R$ is defined in (66). This completes the proof of the second inequality in (55).

Here we prove the bound $|I_1|\le c_*N^{-1-\nu}$, see (22). It follows from (44) that
(87) $|I_1|\le\int_{t\in J}\mathbf E|\alpha_t^m|\,|t|^{-1}\,dt+c_*N^{-1-\nu}$,
where $J=\{t:\ \sqrt N/\beta\le|t|\le N^{1-\nu}\}$. For the $L_r$-valued random element $S$ defined in (50) and the event $\mathcal S=\{\|S\|_r<N^{\nu/2}\}$ write
(88) $\mathbf E|\alpha_t^m|\le\mathbf E\,\mathbb I_{\mathcal S}|\alpha_t^m|+\mathbf E(1-\mathbb I_{\mathcal S})$.
Estimate the second summand as $P\{\|S\|_r\ge N^{\nu/2}\}\le c_*N^{-2}$, by Lemma 13.
Furthermore, expanding the exponent in $\alpha_t$ we obtain
$$\mathbb I_{\mathcal S}|\alpha_t|\le|\mathbf E\exp\{itN^{-1/2}g(X_1)\}|+\mathbb I_{\mathcal S}|t|N^{-1}\|S\|_r.$$
It follows from (5) that the first summand is bounded from above by $1-v$, for some $v>0$ depending on $A_*$, $M_*$, $D_*$, $\delta$ only, see the proof of (32). Furthermore, the second summand is bounded from above by $N^{-\nu/2}$ almost surely. Therefore, for sufficiently large $N>C_*$ we have $\mathbb I_{\mathcal S}|\alpha_t|\le1-v/2$. Invoking this bound in (88) we obtain
$$\mathbf E|\alpha_t^m|\le(1-v/2)^m+c_*N^{-2}\le c_*N^{-2},$$
for $m$ satisfying (15). Finally, we obtain that the integral in (87) is bounded from above by $c_*N^{-2}\ln N\le c_*N^{-1-\nu}$, thus completing the proof.

4. Combinatorial concentration bound

We start the section by introducing some notation and collecting auxiliary inequalities. Then we formulate and prove Lemmas 1 and 2. Introduce the number
(89) $\delta=\|g\|_2^{-1}\min\bigl\{c_g^{-1},\,(c_r\|g\|_2^r/\|g\|_r^r)^{1/(r-2)}\bigr\}$, where $c_g=1+\|g\|_r/\|g\|_2$ and $c_r=(7/8)\,2^{-(r-2)}$.
Denote
$$\rho_*=1-\sup\{|\mathbf Ee^{itg(X)}|:\ 2^{-1}\delta\le|t|\le N^{1/2-\nu}\}.$$
It follows from the identity $\rho_*=\rho(2^{-1}\sigma\delta,\sigma N^{1/2-\nu})$ and the simple inequality $a_0\le\delta/2$ that $\rho_*\ge\rho(2^{-1}a_0,\sigma N^{1/2-\nu})$. Furthermore, it follows from (164) and the assumption $\sigma_T=1$ that $1/2<\sigma<2$ ($N>C_*$). Therefore, $\rho_*\ge\rho(a_0,N^{1/2-\nu})\ge\delta'$, where the last inequality follows from (32). We obtain, for $N>C_*$,
(90) $1-\sup\{|\mathbf Ee^{itg(X)}|:\ 2^{-1}\delta\le|t|\le N^{1/2-\nu}\}\ge\delta'$,
where the number $\delta'$ depends on $A_*$, $D_*$, $M_*$, $\nu$, $r$, $s$, $\delta$ only. In what follows we use the notation $c_0=10$. Let $L_r^0=\{y\in L_r:\ \int_{\mathcal X}y(x)P_X(dx)=0\}$ denote a subspace of $L_r$. Observe that $\mathbf Eg(X)=0$ implies $y_*$ ($=p_g(y)$) $\in L_r^0$, for every $y\in L_r$. Let $\psi_1,\dots,\psi_n$ denote independent random vectors with values in $L_r^0$. For $k=1,\dots,n$, write $\zeta_k=\psi_1+\dots+\psi_k$ and $\zeta=\zeta_n$. Let $\bar\psi_i$ denote an independent copy of $\psi_i$. Write $\psi_i^*=p_g(\psi_i)$ and $\bar\psi_i^*=p_g(\bar\psi_i)$, see (26).
Introduce the random vectors
$$\tilde\psi_i=2^{-1}(\psi_i-\bar\psi_i),\qquad\tilde\psi_i^*=2^{-1}(\psi_i^*-\bar\psi_i^*),\qquad\hat\psi_i=2^{-1}(\psi_i+\bar\psi_i).$$
We shall assume that, for some $c_A\ge c_D\ge c_B>0$,
(91) $n^{r/2}\mathbf E\|\tilde\psi_i\|_r^r\le c_A^r$, $c_B^2\le n\mathbf E\|\tilde\psi_i^*\|_2^2\le c_D^2$, for every $1\le i\le n$.
Furthermore, denote $\mu_i=\mathbf E\|\tilde\psi_i^*\|_2$ and $\tilde\kappa_i^{\,r-2}=(8/3)\,\mathbf E\|\tilde\psi_i\|_r^r/\mu_i^r$,
(92) $\mu_*=\min_{1\le i\le n}\mu_i$, $\kappa_*=\max_{1\le i\le n}\tilde\kappa_i$.
Observe that, by H\"older's inequality and (27), we have $\tilde\kappa_i>1$, for $i=1,\dots,n$.

Lemma 1. Let $4<r\le6$ and $0<\nu<(r-2)/(2r)$. Assume that $n\ge N^{\nu}$. Suppose that
(93) $\kappa_*^2\le n\ln^{-2}N$.
Assume that (90), (91) as well as (101), (107) (below) hold. There exists a constant $c_*>0$ which depends on $r$, $s$, $\nu$, $A_*$, $D_*$, $M_*$, $\delta$ only such that for every $T$ satisfying (34) we have
(94) $\delta_{\varepsilon_*}(f+\zeta,I(T))\le c_*(c_D/c_B)^{1/2}\kappa_*^{1/2}\varepsilon_*^{(r-2)/(2r)}+c_*N^{-2}$,
for an arbitrary non-random element $f\in L_r$. Here $\varepsilon_*=\mu_*|T|/(c_0\sqrt N)$. The function $\delta_s(\cdot,I(T))$ is defined in (36).

In Step A.2.3 of Section 3 we apply this lemma to the random vector $\zeta=\psi_1+\dots+\psi_h$, see (78). In Step B of Section 3 we apply this lemma to the random vector $S'$, see (86).

Proof of Lemma 1. We shall consider the case where $T>0$; the case $T<0$ is treated in the same way. Denote $X=\|\tilde\psi_i^*\|_2$ and $Y=\|\tilde\psi_i\|_r$ and $\mu=\mu_i$, $\kappa=\tilde\kappa_i$. By (27), we have $Y\ge X$.

Step 1. Here we construct the bound (95), see below, for the probability $P\{B_i\}$, where $B_i=\{X\ge\mu/2,\ Y<\kappa\mu\}$. Write
$$\mu=\mathbf EX=\mathbf EX\mathbb I_A+\mathbf EX\mathbb I_{B_i}+\mathbf EX\mathbb I_D,\qquad A=\{X<\mu/2\},\quad D=\{X\ge\mu/2,\ Y\ge\kappa\mu\}.$$
Substitution of the bounds
$$\mathbf EX\mathbb I_A\le\mu/2,\qquad\mathbf EX\mathbb I_{B_i}\le\mathbf EY\mathbb I_{B_i}\le\kappa\mu P\{B_i\},\qquad\mathbf EX\mathbb I_D\le\mathbf EY\mathbb I\{Y\ge\kappa\mu\}\le(\kappa\mu)^{1-r}\mathbf EY^r$$
gives $\mu\le2^{-1}\mu+\kappa\mu P\{B_i\}+(\kappa\mu)^{1-r}\mathbf EY^r$. Finally, invoking the identity $\kappa^{r-2}=(8/3)\,\mathbf EY^r/\mu^r$ we obtain
(95) $P\{B_i\}\ge\frac{1}{2\kappa}-\frac{3}{8\kappa^2}\ge\frac{1}{8\kappa}\ge\frac{1}{8\kappa_*}=:p_0$.
Introduce the (random) set $J=\{i:\ B_i\ \text{occurs}\}\subset\{1,\dots,n\}$.
Hoeffding's inequality applied to the random variable $|J|=\mathbb I_{B_1}+\dots+\mathbb I_{B_n}$ shows
(96) $P\{|J|\le\rho n\}\le\exp\{-np_0^2/2\}\le N^{-2}$, $\rho:=p_0/2=(16\kappa_*)^{-1}$.
In the last step we invoked (93) and used (95).

Step 2. Here we introduce randomization. Note that for any $\alpha_i\in\{-1,+1\}$, $i=1,\dots,n$, the distributions of the random vectors
$$(\psi_1,\dots,\psi_n)\quad\text{and}\quad\bigl(\alpha_1\tilde\psi_1+\hat\psi_1,\dots,\alpha_n\tilde\psi_n+\hat\psi_n\bigr)$$
coincide. Therefore, denoting $\tilde\zeta_n=\alpha_1\tilde\psi_1+\dots+\alpha_n\tilde\psi_n$, $\hat\zeta_n=\hat\psi_1+\dots+\hat\psi_n$, we have for $s>0$
$$\delta_s(f+\zeta,I(T))=\delta_s(f+\tilde\zeta_n+\hat\zeta_n,I(T)),$$
for every choice of $\alpha_1,\dots,\alpha_n$. From now on let $\alpha_1,\dots,\alpha_n$ denote a sequence of independent identically distributed Bernoulli random variables independent of $\tilde\psi_i$, $\hat\psi_i$, $1\le i\le n$, and with probabilities $P\{\alpha_1=1\}=P\{\alpha_1=-1\}=1/2$. Denoting by $\mathbf E_\alpha$ the expectation with respect to the sequence $\alpha_1,\dots,\alpha_n$ we obtain
(97) $\delta_s(f+\zeta,I(T))=\mathbf E_\alpha\delta_s(f+\tilde\zeta_n+\hat\zeta_n,I(T))$.
We are going to condition on $\tilde\psi_i$ and $\hat\psi_i$, $1\le i\le n$, while taking expectations with respect to $\alpha_1,\dots,\alpha_n$. It follows from (96), (97) and the fact that the random variable $|J|$ does not depend on $\alpha_1,\dots,\alpha_n$ that
(98) $\delta_s(f+\zeta,I(T))\le\mathbf E\,\mathbb I\{|J|\ge\rho n\}\gamma_s(\tilde\psi_i,\hat\psi_i,1\le i\le n)+N^{-2}$,
where
$$\gamma_s(\tilde\psi_i,\hat\psi_i,1\le i\le n)=\mathbf E_\alpha\,\mathbb I\{|J|\ge\rho n\}\,\mathbb I\{v(f+\tilde\zeta_n+\hat\zeta_n)>1-s\}\,\mathbb I\{\|f+\tilde\zeta_n+\hat\zeta_n\|_r\le N^{\nu}\}$$
denotes the conditional expectation given $\tilde\psi_i$, $\hat\psi_i$, $1\le i\le n$. Note that (94) is a consequence of (98) and of the bound
(99) $\gamma_{\varepsilon_*}(\tilde\psi_i,\hat\psi_i,1\le i\le n)\le c_*(c_D/c_B)^{1/2}\kappa_*^{1/2}\varepsilon_*^{(r-2)/(2r)}$.
Let us prove this bound. Introduce the integers
$$n_1=2l_0,\qquad l_0=\lfloor\delta^2\kappa_1^{-1}\varepsilon_*^{-(r-2)/r}\rfloor,\qquad\kappa_1=2c_0^2(c_D/c_B)\kappa_*.$$
Let us show that
(100) $n_1\le\rho n$.
It follows from the inequalities $\varepsilon_*^{-1}\le c_0c_B^{-1}N^{\nu}n^{1/2}$ and $N^{\nu}\le n$, together with the definition of $l_0$, that
$$l_0\le\tfrac{1}{32}\,c_B^{2/r}c_D^{-1}\kappa_*^{-1}n^{1/2}.$$
Note that (93) implies $\kappa_*\le n^{1/2}/\ln N$. Therefore, the inequality
(101) $c_B^{2/r}c_D^{-1}\le n^{1/2}$
implies $l_0\le\tfrac{1}{32}\kappa_*^{-1}n$, so that $n_1=2l_0\le(16\kappa_*)^{-1}n=\rho n$. We obtain (100). Given $\tilde\psi_i$, $\hat\psi_i$, $1\le i\le n$, consider the corresponding set $J$, say $J=\{i_1,\dots,i_k\}$. Assume that $k\ge\rho n$. From the inequality $\rho n\ge n_1$, see (100), it follows that we can choose a subset $J'\subset J$ of size $|J'|=n_1$. Split
$$\tilde\zeta_n=\sum_{i\in J'}\alpha_i\tilde\psi_i+\sum_{i\in\{1,\dots,n\}\setminus J'}\alpha_i\tilde\psi_i=:\zeta^*+\zeta'$$
and denote $f+\zeta'+\hat\zeta_n=f^*$. Note that $f^*\in L_r$ almost surely. The bound (99) would follow if we show that
(102) $\tilde\delta\le c_*(c_D/c_B)^{1/2}\kappa_*^{1/2}\varepsilon_*^{(r-2)/(2r)}$, $\tilde\delta:=\mathbf E'\,\mathbb I\{v(f^*+\zeta^*)>1-\varepsilon_*\}\,\mathbb I\{\|f^*+\zeta^*\|_r\le N^{\nu}\}$.
Here $\mathbf E'$ denotes the conditional expectation given all the random variables but $\{\alpha_i,\ i\in J'\}$.

Step 3. Here we prove (102). Note that for $j\in J'$ the vectors $x_j=TN^{-1/2}\tilde\psi_j$ and $x_j^*=p_g(x_j)=TN^{-1/2}\tilde\psi_j^*$ satisfy
(103) $\|x_j^*\|_2\ge2^{-1}c_0\varepsilon_*$, $\|x_j\|_r\le\kappa_1\varepsilon_*$, $\kappa_1=2c_0^2(c_D/c_B)\kappa_*$.
Given $A\subset J'$ denote
$$x_A=\sum_{i\in A}x_i-\sum_{i\in J'\setminus A}x_i,\qquad x_A^*=p_g(x_A).$$
We are going to apply Kleitman's theorem on symmetric partitions (see, e.g., the proof of Theorem 4.2, Bollob\'as (1986)) to the sequence $\{x_j^*,\ j\in J'\}$ in $L_2$. Since for $j\in J'$ we have $\|x_j^*\|_2\ge2^{-1}c_0\varepsilon_*$, it follows from Kleitman's theorem that the collection $\mathcal P(J')$ of all subsets of $J'$ splits into non-intersecting non-empty classes $\mathcal P(J')=\mathcal D_1\cup\dots\cup\mathcal D_s$, such that the corresponding sets of linear combinations $V_t=\{x_A^*,\ A\in\mathcal D_t\}$, $t=1,2,\dots,s$, are sparse, i.e., given $t$, for $A,A'\in\mathcal D_t$ with $A\ne A'$ we have
(104) $\|x_A^*-x_{A'}^*\|_2\ge c_0\varepsilon_*$.
Furthermore, the number of classes $s$ is bounded from above by $\binom{n_1}{\lfloor n_1/2\rfloor}$. Next, using Lemma 2 we shall show that, given $f^*$, each class $\mathcal D_t$ may contain at most one element $A\in\mathcal D_t$ such that
(105) $v(f^*+\tilde x_A)>1-\varepsilon_*$, $\|f^*+\tilde x_A\|_r\le N^{\nu}$, $\tilde x_A:=N^{1/2}T^{-1}x_A$.
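Kleitman's bound can be probed by brute force in the simplest one-dimensional setting (Erdős's form of the Littlewood–Offord problem): for real $x_1,\dots,x_{n_1}$ with $|x_j|\ge1$, no interval of length less than 2 contains more than $\binom{n_1}{\lfloor n_1/2\rfloor}$ of the $2^{n_1}$ signed sums $\sum\pm x_j$. A small exhaustive check (the size $n_1=12$ and the random values are illustrative):

```python
import math
import random
from bisect import bisect_left, bisect_right
from itertools import product

random.seed(7)
n1 = 12
xs = [1.0 + random.random() for _ in range(n1)]  # each |x_j| >= 1

# All 2^n1 signed sums, sorted.
sums = sorted(sum(s * x for s, x in zip(signs, xs))
              for signs in product((-1, 1), repeat=n1))

# Largest number of signed sums inside any interval of length < 2.
width = 1.999
max_hits = max(bisect_right(sums, s + width) - bisect_left(sums, s)
               for s in sums)

kleitman = math.comb(n1, n1 // 2)  # the middle binomial coefficient
```

Scanning windows anchored at each sum covers every interval of that length, so `max_hits` is the exact maximum; the theorem guarantees it never exceeds the middle binomial coefficient, which is the counting step behind (104)–(105).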
This means that there are at most $\binom{n_1}{\lfloor n_1/2\rfloor}$ different subsets $A\subset J'$ for which (105) holds. This implies (102):
$$\tilde\delta\le2^{-n_1}\binom{n_1}{\lfloor n_1/2\rfloor}\le cn_1^{-1/2}\le c\,\delta^{-1}\kappa_1^{1/2}\varepsilon_*^{(r-2)/(2r)}.$$
Finally, (94) follows from (98), (99), (102).

Given $f^*\in L_r$ let us show that there is no pair $A,A'$ in $\mathcal D_t$ which both satisfy (105). Fix $A,A'\in\mathcal D_t$. We have, by (103) and the choice of $n_1$,
$$\|x_A-x_{A'}\|_r\le\sum_{i\in J'}2\|x_i\|_r\le2n_1\kappa_1\varepsilon_*\le4\delta^2\varepsilon_*^{2/r}.$$
Denoting $S_A=f^*+\tilde x_A$ and $S_{A'}=f^*+\tilde x_{A'}$ we obtain
(106) $\|S_A-S_{A'}\|_r=N^{1/2}T^{-1}\|x_A-x_{A'}\|_r\le4\delta^2\varepsilon_*^{2/r}N^{1/2}T^{-1}$.
Assume that $S_A$ and $S_{A'}$ satisfy the second inequality of (105), i.e., $\|S_A\|_r\le N^{\nu}$ and $\|S_{A'}\|_r\le N^{\nu}$. We are going to apply Lemma 2 to the vectors $S_A$ and $S_{A'}$. In order to check the conditions of Lemma 2 note that (109) and (110) are verified by (103), (104) and (106). Furthermore, the inequalities $c_0<N^{\nu}$ and
(107) $c_B\ge N^{\nu}(n/N)^{1/2}$
imply $N^{\nu-1/2}\le\varepsilon_*$. Finally, we can assume without loss of generality that $\varepsilon_*\le c_*':=\min\{(\delta'/2)^{r/2},(A_*/2)^{r/2}\}$. Otherwise (94) follows from the trivial inequalities $\delta_{\varepsilon_*}\le1\le(\varepsilon_*/c_*')^{(r-2)/(2r)}\le c_*\varepsilon_*^{(r-2)/(2r)}$ and the inequality $\kappa_*>1$. Now Lemma 2 gives $\min\{v(S_A),v(S_{A'})\}\le1-\varepsilon_*$, thus completing the proof of Lemma 1.

Here we formulate and prove Lemma 2. Let us introduce first some notation. Given $y\in L_r$ ($=L_r(\mathcal X,P_X)$) define the symmetrization $y^s\in L_r(\mathcal X\times\mathcal X,P_X\times P_X)$ by $y^s(x,x')=y(x)-y(x')$, for $x,x'\in\mathcal X$. In what follows $X_1$, $X_2$ denote independent random variables with values in $\mathcal X$ and with the common distribution $P_X$. By $\mathbf E$ we denote the expectation taken with respect to $P_X$. For $h\in L_r$ we write
$$\mathbf Eh=\mathbf Eh(X_1)=\int_{\mathcal X}h(x)P_X(dx),\qquad\mathbf Ee^{ith}=\mathbf Ee^{ith(X_1)}=\int_{\mathcal X}e^{ith(x)}P_X(dx).$$
Furthermore, for $2\le p\le r$, denote $\|y^s\|_p^p=\mathbf E|y(X_1)-y(X_2)|^p$, $\|y\|_p^p=\mathbf E|y(X_1)|^p$.
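For a centered $y$ the symmetrization just defined doubles the second moment, $\mathbf E|y(X_1)-y(X_2)|^2=2\mathbf E|y(X_1)|^2$, which is the identity (108) used below. An exact check on a small discrete distribution (the distribution is illustrative):

```python
# P_X: a small discrete distribution; y is centered under P_X.
support = [-2.0, 0.5, 1.0, 3.0]
probs = [0.1, 0.4, 0.3, 0.2]

mean = sum(p * x for p, x in zip(probs, support))
y = [x - mean for x in support]          # centered: E y(X) = 0

second = sum(p * v * v for p, v in zip(probs, y))   # E |y(X1)|^2
sym = sum(pa * pb * (va - vb) ** 2                  # E |y(X1) - y(X2)|^2
          for pa, va in zip(probs, y)
          for pb, vb in zip(probs, y))
```

Expanding the square gives $\mathbf E(y_1-y_2)^2=2\mathbf Ey^2-2(\mathbf Ey)^2$, so the cross term vanishes exactly when $y$ is centered.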
Note that for $y\in L_r$ we have $y^*$ ($=p_g(y)$) $\in L_r^0$ and, therefore,
(108) $\mathbf E|y^*(X_1)-y^*(X_2)|^2=2\mathbf E|y^*(X_1)|^2$.
Let $y_1,\dots,y_k$, $f$ be non-random vectors in $L_r$. We shall assume that these vectors belong to the linear subspace $L_r^0$. Given non-random vectors $\alpha=\{\alpha_i\}_{i=1}^k$ and $\alpha'=\{\alpha_i'\}_{i=1}^k$, with $\alpha_i,\alpha_i'\in\{-1,+1\}$, denote
$$S_\alpha=f+\sum_{i=1}^k\alpha_iy_i,\qquad S_{\alpha'}=f+\sum_{i=1}^k\alpha_i'y_i.$$

Lemma 2. Let $\kappa>1$. Assume that (90) holds and suppose that
$$N^{\nu-1/2}\le\varepsilon\le\min\{(\delta'/2)^{r/2},(\|g\|_2/2)^{r/2}\}.$$
Given $T$, satisfying (34), write $T_*=N^{1/2}T^{-1}$ and assume that
(109) $\|y_j^*\|_2>2^{-1}c_0T_*\varepsilon$, $\|y_j\|_r\le\kappa T_*\varepsilon$, $j=1,\dots,k$.
Suppose that $\|S_\alpha\|_r\le N^{\nu}$ and $\|S_{\alpha'}\|_r\le N^{\nu}$ and
(110) $\|S_\alpha^*-S_{\alpha'}^*\|_2\ge c_0T_*\varepsilon$, $\|S_\alpha-S_{\alpha'}\|_r\le4\delta^2T_*\varepsilon^{2/r}$.
Then $\min\{v(S_\alpha),v(S_{\alpha'})\}\le1-\varepsilon$.

Recall that the functionals $\tau(\cdot)$, $|u_t(\cdot)|$ and the interval $I=I(T)$ used in the proof below are defined in (35).

Proof. Note that $\delta<1/10$ and $\delta'<1/12$. In particular, we have
(111) $9/10\le1-\delta\le|s/T|\le1+\delta\le11/10$, for $|s-T|<\delta N^{1/2-\nu}$.

Step 1. Assume that the inequality $\min\{v(S_\alpha),v(S_{\alpha'})\}\le1-\varepsilon$ fails. Then for some $s,t\in I$ we have
(112) $1-|u_t(S_\alpha)|<\varepsilon$, $1-|u_s(S_{\alpha'})|<\varepsilon$,
see also (35). Fix these $s,t$ and denote
$$\tilde X=s(g+N^{-1/2}S_{\alpha'})-t(g+N^{-1/2}S_\alpha).$$
We are going to apply the inequality (251),
$$1-|\mathbf Ee^{i(Y+Z)}|\ge2^{-1}(1-|\mathbf Ee^{iZ}|)-(1-|\mathbf Ee^{iY}|),$$
to $Z=-\tilde X$ and $Y=s(g+N^{-1/2}S_{\alpha'})$. It follows from this inequality and (112) that
$$\varepsilon>1-|u_t(S_\alpha)|=1-|\mathbf Ee^{i(Y+Z)}|\ge2^{-1}(1-|\mathbf Ee^{-i\tilde X}|)-\varepsilon.$$
In view of the identity $|\mathbf Ee^{-i\tilde X}|=|\mathbf Ee^{i\tilde X}|$ we have
(113) $1-|\mathbf Ee^{i\tilde X}|<4\varepsilon$.

Step 2. Here we shall show that (113) contradicts the second inequality of (110). Firstly, we collect some auxiliary inequalities. Write the decomposition (26) for $S_\alpha$ and $S_{\alpha'}$,
(114) $S_\alpha=ag+S_\alpha^*$, $S_{\alpha'}=a'g+S_{\alpha'}^*$.
Decompose
$$\tilde X=vg+h,\qquad v=(s-t)(1+aN^{-1/2})+(a'-a)sN^{-1/2},\qquad h=(s-t)N^{-1/2}S_\alpha^*+sN^{-1/2}(S_{\alpha'}^*-S_\alpha^*),$$
where $v\in\mathbb R$ and where $h\in L_r$ is $L_2$-orthogonal to $g$. An application of (30) to $S_\alpha^*$ and $S_{\alpha'}^*-S_\alpha^*$ gives
(115) $\|h\|_r\le c_gN^{-1/2}\bigl(|s|\,\|S_{\alpha'}-S_\alpha\|_r+|s-t|\,\|S_\alpha\|_r\bigr)$.
Furthermore, it follows from the simple inequality $\|x+y\|_2^2\ge2^{-1}\|x\|_2^2-\|y\|_2^2$ that
(116) $\|h\|_2^2\ge2^{-1}s^2N^{-1}\|S_{\alpha'}^*-S_\alpha^*\|_2^2-(s-t)^2N^{-1}\|S_\alpha^*\|_2^2$.
Note that for $a$ and $a'$ defined in (114) we obtain from (28) and (110) that
(117) $|a|\le\|S_\alpha\|_r\|g\|_2^{-1}\le N^{\nu}\|g\|_2^{-1}$,
(118) $|a'-a|\le\|S_{\alpha'}-S_\alpha\|_r\|g\|_2^{-1}\le4\delta^2\varepsilon^{2/r}N^{1/2}T^{-1}\|g\|_2^{-1}$.

Step 2.1. Consider the case where $|s-t|<\delta$. Invoking the inequalities $\|S_\alpha\|_r\le N^{\nu}$ and (110) we obtain from (115) that
$$\|h\|_r^r\le(5c_g)^r\delta^r\bigl(N^{\nu r-r/2}+\varepsilon^2|s/T|^r\bigr).$$
Furthermore, using (111), (89), and $N^{\nu-1/2}\le\varepsilon$, we obtain for $4\le r\le6$
(119) $\|h\|_r^r\le2^{-r}\bigl(\varepsilon^r+\varepsilon^2(11/10)^r\bigr)\le2^{-r+1}\varepsilon^2$.
Note that (27) implies $\|S_\alpha^*\|_2\le\|S_\alpha\|_r\le N^{\nu}$. This inequality in combination with (110) and (116) gives
$$\|h\|_2^2\ge2^{-1}(s/T)^2c_0^2\varepsilon^2-\delta^2N^{2\nu-1}.$$
Invoking (111) and using $c_0=10$, $\delta<10^{-1}$, and $N^{\nu-1/2}\le\varepsilon$ we obtain
(120) $\|h\|_2^2\ge(2/5)c_0^2\varepsilon^2$.
Now we are going to apply Lemma 12, statement a), to $\tilde X=vg+h$. For this purpose we verify the conditions of the lemma. Firstly, note that (120), (108) imply $\|h^s\|_2^2\ge(4/5)c_0^2\varepsilon^2$. Furthermore, it follows from the simple inequality $\mathbf E|h(X_1)-h(X_2)|^r\le2^r\mathbf E|h(X_1)|^r$ and (119) that $\|h^s\|_r^r\le2\varepsilon^2$. Therefore, we obtain, for $4\le r\le6$,
$$\|h^s\|_r^r\le2\varepsilon^2\le(5/2)c_0^{-2}\|h^s\|_2^2\le c_r\|h^s\|_2^2,\qquad c_r=(7/8)\,2^{-(r-2)}.$$
Furthermore, the inequalities (117), (118) and (111) imply
$$|v|\le\delta+\delta\|g\|_2^{-1}N^{\nu-1/2}+(11/10)\cdot4\delta^2\varepsilon^{2/r}\|g\|_2^{-1}\le\delta(1+5\|g\|_2^{-1}),$$
for $N^{\nu-1/2}\le\varepsilon\le1$.
Invoking (89) and using the inequality $\|g^s\|_r^r\le2^r\|g\|_r^r$ and the identity $\|g^s\|_2^2=2\|g\|_2^2$ we obtain
$$|v|^{r-2}\le c_r\,2^{-r/2}\,\|g\|_2^r\|g\|_r^{-r}\le c_r\,\|g^s\|_2^r\|g^s\|_r^{-r},$$
as required by Lemma 12 a). This lemma implies
$$1-|\mathbf Ee^{i\tilde X}|\ge2^{-4}\|h^s\|_2^2=2^{-3}\|h\|_2^2.$$
In the last step we used (108). Now (120), for $c_0\ge10$, contradicts (113).

Step 2.2. Consider the case where $\delta<|s-t|\le2\delta N^{1/2-\nu}$. It follows from (115), (110) and (111) that
(121) $\mathbf E|h|\le\|h\|_r\le c_g\bigl(4\delta^2\varepsilon^{2/r}|s/T|+2\delta\bigr)\le c_g(2\delta+5\delta^2\varepsilon^{2/r})\le3c_g\delta$.
From (117), (118) and (111), we obtain for $\delta\le|s-t|$ and $N^{\nu-1/2}\le\varepsilon$,
$$|v|\ge\delta(1-N^{\nu-1/2}\|g\|_2^{-1})-4\delta^2\varepsilon^{2/r}|s/T|\,\|g\|_2^{-1}\ge\delta\bigl(1-\|g\|_2^{-1}(\varepsilon+5\delta\varepsilon^{2/r})\bigr)\ge\delta/2,$$
provided that $\varepsilon^{2/r}\le\|g\|_2/6$. Similarly, using in addition $\delta,\delta'<1/3$ and $\varepsilon<\|g\|_2$, we obtain, for $|s-t|\le2\delta N^{1/2-\nu}$,
$$|v|\le|s-t|(1+\varepsilon\|g\|_2^{-1})+5\delta^2\varepsilon^{2/r}\|g\|_2^{-1}\le|s-t|+1\le N^{1/2-\nu}.$$
It follows from these inequalities, see (90), that
$$1-|\mathbf Ee^{i\tilde X}|\ge1-|\mathbf Ee^{ivg}|-\mathbf E|h|\ge\delta'-\mathbf E|h|.$$
Finally, invoking (121) and (33), we get
$$1-|\mathbf Ee^{i\tilde X}|\ge\delta'-3c_g\delta\ge\delta'/2>4\varepsilon.$$
Once again we obtain a contradiction to (113), thus completing the proof.

5. Expansions

Here we prove the bound
(122) $\int_{|t|\le t_0}\bigl|\mathbf Ee^{it\tilde T}-\hat G(t)\bigr|\,\frac{dt}{|t|}\le c_*N^{-1-\nu}$,
where $t_0=\sqrt N/\beta$. For the definition of $\tilde T$ and $\hat G$ see Section 2. Here and below $c_*$ denotes a constant depending on $A_*$, $M_*$, $D_*$, $r$, $s$, $\nu$ only. We prove (122) for sufficiently large $N$; that is, we shall assume that $N>C_*$, where $C_*$ is a number depending on $A_*$, $M_*$, $D_*$, $r$, $s$, $\nu$ only. Note that for $N<C_*$ the bound (122) becomes trivial, since in this case the integral is bounded by a constant. Let us first introduce some notation. Denote $\Omega_m=\{1,\dots$
$\dots,m\}$. For $A\subset\Omega_N$ write $U(A)=\sum_{j\in A}g(X_j)$. Given complex valued functions $f,h$ we write $f\prec\mathcal R$ if
$$\int_{|t|\le t_0}|t^{-1}f(t)|\,dt\le c_*N^{-1-\nu}$$
and write $f\sim h$ if $f-h\prec\mathcal R$. In particular, (122) can be written in short as $\mathbf Ee^{it\tilde T}\sim\hat G(t)$. In order to prove (122) we show that
(123) $\mathbf Ee^{it\tilde T}\sim\mathbf Ee^{itT}$ and $\mathbf Ee^{itT}\sim\hat G(t)$.
In what follows we use the notation of Section 2 and assume that (14) holds. Let us prove the first part of (123). Write
$$T=\tilde T+\tilde\Lambda_1+\tilde\Lambda_2,\qquad\tilde\Lambda_1=\Lambda_1+\Lambda_2,\qquad\tilde\Lambda_2=\Lambda_3+\Lambda_4+\Lambda_5,$$
where the random variables $\Lambda_j$ are introduced in Section 2. We shall show that
(124) $\mathbf Ee^{it\tilde T}\sim\mathbf Ee^{it(\tilde T+\tilde\Lambda_1)}$ and $\mathbf Ee^{it(\tilde T+\tilde\Lambda_1)}\sim\mathbf Ee^{itT}$.
The second relation follows from the moment bounds of Lemma 5 via a Taylor expansion. We have
$$\mathbf Ee^{itT}=\mathbf Ee^{it(\tilde T+\tilde\Lambda_1)}+R_1,\qquad|R_1|\le|t|\,\mathbf E|\tilde\Lambda_2|.$$
By Lyapunov's inequality, $\mathbf E|\tilde\Lambda_2|\le(\mathbf E\Lambda_3^2)^{1/2}+(\mathbf E\Lambda_4^2)^{1/2}+(\mathbf E\Lambda_5^2)^{1/2}$. Invoking the moment bounds of Lemma 5 we obtain $|t|\,\mathbf E|\tilde\Lambda_2|\prec\mathcal R$, thus proving the second part of (124). In order to prove the first part we combine Taylor's expansion with bounds for characteristic functions. Expanding the exponent we obtain
$$\mathbf Ee^{it(\tilde T+\tilde\Lambda_1)}=\mathbf Ee^{it\tilde T}+it\,\mathbf Ee^{it\tilde T}\tilde\Lambda_1+R_2,\qquad|R_2|\le t^2\mathbf E\tilde\Lambda_1^2.$$
Invoking the identities
(125) $\mathbf E\Lambda_1^2=\binom{m}{2}\gamma^2N^{-3}$, $\mathbf E\Lambda_2^2=m\binom{N-m}{2}\zeta^2N^{-5}$,
we obtain, for $\gamma^2<c_*$ and $\zeta^2<c_*$, see (3), and $m\le N^{1/2}$, that $R_2\prec\mathcal R$. We complete the proof of (124) by showing that
(126) $t\,\mathbf Ee^{it\tilde T}\tilde\Lambda_1\prec\mathcal R$.
Let us prove (126). Split
$$W=W_1+W_2+W_3+R_W,\qquad W_k=\sum_{A\subset\Omega',\,|A|=k}T_A,\qquad R_W=\sum_{A\subset\Omega',\,|A|\ge4}T_A.$$
Here $\Omega'=\{m+1,\dots,N\}$. Denote $R_3=U^*+W_3+R_W$ and $U=N^{-1/2}\sum_{j=1}^Ng(X_j)$. We have $\tilde T=U+W_2+R_3$. Expanding the exponent in powers of $itR_3$ we obtain
(127) $t\,\mathbf Ee^{it\tilde T}\tilde\Lambda_1=t\,\mathbf Ee^{it(U+W_2)}\tilde\Lambda_1+t^2R_4$,
where
$$|R_4|\le\mathbf E|\tilde\Lambda_1R_3|\le(r_1+r_2)(r_3+r_4+r_5),\qquad r_1^2=\mathbf E\Lambda_1^2,\ r_2^2=\mathbf E\Lambda_2^2,\ r_3^2=\mathbf E(U^*)^2,\ r_4^2=\mathbf ER_W^2,\ r_5^2=\mathbf EW_3^2.$$
In the last step we applied the Cauchy–Schwarz inequality.
Combining (125) with the identities E ( U ∗ ) = m ( N − m ) N γ , E W = (cid:0) N − m (cid:1) N ζ and invoking the simple bound E R W ≤ ∆ N ≤ D ∗ N ν , we obtain t ( r + r )( r + r + r ) ≺ R . Therefore, (127) implies t E e it ˜ T ˜Λ ∼ t E e it ( U + W ) ˜Λ . Let us show that t E e it ( U + W ) ˜Λ ∼ 0. Expanding the exponent in powers of it W we get t E e it ( U + W ) ˜Λ = f ( t ) + f ( t ) + f ( t ) + f ( t ) ,f ( t ) = t E e it U ˜Λ , f ( t ) = it E e it U Λ W ,f ( t ) = t E e it U Λ W θ , f ( t ) = t E e it U Λ W θ / , where θ , θ are functions of W satisfying | θ i | ≤ f i ≺ R , for i = 1 , , , 4. Split the set Ω m = { , . . . , m } in three (non-intersecting) parts A ∪ A ∪ A = Ω m of (almost) equal size | A i | ≈ m/ 3. The set of pairs (cid:8) { i, j } ⊂ Ω m (cid:9) splits into six (non-intersecting) parts B kr , 1 ≤ k ≤ r ≤ { i, j } belongsto B kr if i ∈ A k and j ∈ A r ). WriteΛ = X ≤ k ≤ r ≤ Λ ( k, r ) , Λ ( k, r ) = X { i,j }∈ B kr g ( X k , X l ) , Λ = X ≤ k ≤ Λ ( k ) , Λ ( k ) = X i ∈ A k X m +1 ≤ j 4. Since the random variable U ( A i ) := P j ∈ A i g ( X j ) and therandom variables Λ ( k, r ), W are independent, we have E e it U Λ ( k, r ) W θ ≤ E e it U ( A i ) E Λ ( k, r ) W θ . Therefore,(129) | E e it U Λ ( k, r ) W θ | ≤ | E e it U ( A i ) | E | Λ ( k, r ) W | . The first factor on the right is bounded from above by exp {− mt / N } , for k i ≥ m/ 4, see (160)below. The second factor is bounded from above by r , where r = E Λ ( k, r ) E W ≤ c ∗ m N − . Here we combined the Cauchy-Schwartz inequality and the bounds E Λ ( k, r ) ≤ c ∗ m N − , E W ≤ c ∗ N − . Finally, (128) follows from (129) (cid:12)(cid:12) t E e it U Λ ( k, r ) W θ (cid:12)(cid:12) ≤ c ∗ | t | e − mt / N mN − / ≺ R . The proof of f ≺ R is almost the same as that of f ≺ R .Let us prove f ≺ R . Split the set Ω ′ = { m + 1 , . . . , N } into three (non-intersecting) parts B ∪ B ∪ B = Ω ′ of (almost) equal sizes | B i | ≈ ( N − m ) / 3. 
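The combinatorial splitting used above, of $\Omega_m$ into three nearly equal parts $A_1, A_2, A_3$ and of the pairs $\{i, j\} \subset \Omega_m$ into six groups $B_{kr}$, can be made concrete. A short sketch (the function name and the choice $m = 12$ are ours):

```python
from itertools import combinations

def split_pairs(m: int):
    """Split Omega_m = {1,...,m} into three near-equal parts parts[1..3], and
    classify each pair {i,j} into the group keyed (k,r), k <= r, with i in
    parts[k] and j in parts[r], mirroring the groups B_kr of the text."""
    omega = list(range(1, m + 1))
    parts = {k: omega[(k - 1) * m // 3 : k * m // 3] for k in (1, 2, 3)}
    which = {x: k for k, part in parts.items() for x in part}
    groups = {}
    for i, j in combinations(omega, 2):
        k, r = sorted((which[i], which[j]))
        groups.setdefault((k, r), []).append((i, j))
    return parts, groups

parts, groups = split_pairs(12)
# The six groups B_kr form a partition of all pairs of Omega_m.
assert sorted(groups) == [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)]
assert sum(len(g) for g in groups.values()) == 12 * 11 // 2
```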
Split the set of pairs (cid:8) { i, j } : m + 1 ≤ i < j ≤ N (cid:9) into (non-intersecting) groups D ( k, r ), for 1 ≤ k ≤ r ≤ 3. The pair { i, j } ∈ D ( k, r ) if i ∈ B k and j ∈ B r . Write W = X ≤ k ≤ r ≤ W ( k, r ) , W ( k, r ) = X { i,j }∈ D ( k,r ) g ( X i , X j ) . Λ = X ≤ k ≤ r ≤ Λ ( k, r ) , Λ ( k, r ) = X ≤ s ≤ m X { i,j }∈ D ( k,r ) g ( X s , X i , X j ) , In order to prove f ≺ R we shall show that(130) t E e it U Λ W ( k, r ) ≺ R . Write B i = Ω ′ \ ( B k ∪ B r ) and denote m i = | B i | . We shall assume that m i ≥ N/ 4. Sincethe random variable U ( B i ) = P j ∈ B i g ( X j ) and the random variables Λ and W ( k, r ) areindependent, we have, cf. (129),(131) | E e it U Λ W ( k, r ) | ≤ | E e it U ( B i ) | E | Λ W ( k, r ) | . The first factor in the right is the product | α m i ( t ) | ≤ e − m i t / N , see the argument used in theproof of (128) above. The second factor is bounded from above by r , where r = E Λ E W ( k, r ) ≤ c ∗ m N − . Finally, we obtain, using the inequality m i ≥ N/ | E e it U | E | Λ W ( k, r ) | ≤ c ∗ mN exp {− t m i N } ≤ c ∗ mN exp {− t } . This in combination with (131) shows (130). We obtain f ≺ R .Let us prove f ≺ R . We shall show that f ∗ ≺ R and f ⋆ ≺ R , where f ⋆ = t E e it U Λ and f ∗ = t E e it U Λ satisfy f ∗ + f ⋆ = f .Let us show f ⋆ ≺ R . Denote U ⋆ = P Nj = m +1 g ( X j ). We obtain, by the independence of U ⋆ andΛ that | E e it U Λ | ≤ | E e it U ⋆ | E | Λ | . Invoking, for N − m > N/ 2, the bound | E e it U ⋆ | ≤ e − t / , see (160) below, and the bound E | Λ | ≤ ( E Λ ) / ≤ c ∗ mN − / we obtain | f ⋆ ( t ) | ≤ c ∗ | t | e − t / N − / ≺ R . Let us prove f ∗ ≺ R . We shall show that, for 1 ≤ k ≤ r ≤ t E e it U Λ ( k, r ) ≺ R . Proceeding as in the proof of (130) we obtain the chain of inequalities(133) | E e it U Λ ( k, r ) | ≤ e − t / E | Λ ( k, r ) | ≤ c ∗ e − t / m / N − / . In the last step we applied Cauchy-Schwartz and the simple bound E Λ ( k, r ) ≤ c ∗ mN − .Clearly, (133) implies (132). 
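The decoupling mechanism behind estimates such as (129) and (131) is: factor out an independent block $U(B)$, bound its characteristic function by $|\alpha(t)|^{|B|}$, and use Gaussian-type decay of $\alpha$ (cf. (160) below). A toy numerical illustration with Rademacher summands, where $\alpha(t) = \cos(t/\sqrt{n})$; the constant $1/4$ in the decay rate is specific to this sketch, not taken from the text:

```python
from math import cos, exp, sqrt

def char_fn_block(t: float, size: int, n: int) -> float:
    """|E exp{ i t sum_{j in B} X_j / sqrt(n) }| for i.i.d. Rademacher signs X_j,
    where B is a block of the given size: the product structure gives |cos(t/sqrt(n))|^size."""
    return abs(cos(t / sqrt(n))) ** size

n = 400
for t in [x / 2 for x in range(1, 40)]:  # 0.5 <= t <= 19.5, i.e. t/sqrt(n) < 1
    for size in (n // 4, n // 2, n):
        # exponential decay in the block size, uniform over this t-range
        assert char_fn_block(t, size, n) <= exp(-size * t * t / (4.0 * n)) + 1e-12
```

The larger the independent block that can be split off, the stronger the damping factor, which is exactly how the bounds above trade block size against the moment of the remaining factor.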
YMMETRIC STATISTICS 31 Here we prove the second relation of (123). Firstly, we shall show that(134) E e it T ∼ E exp { it ( U + U + U ) } , (135) E exp { it ( U + U + U ) } ∼ E exp { it ( U + U ) } + (cid:18) N (cid:19) e − t / ( it ) w, where w = E g ( X , X , X ) g ( X ) g ( X ) g ( X ).Let m ( t ) be an integer valued function such that(136) m ( t ) ≈ C N t − ln( t + 1) , C ≤ | t | ≤ t , and put m ( t ) ≡ 10, for | t | ≤ C . Here C denotes a large absolute constant (one can take, e.g., C = 200). Assume, in addition, that the numbers m = m ( t ) are even. Here we prove (5B.1). Given m write T = U + U + U + H , where H = H + H , H = X | A |≥ , A ∩ Ω m = ∅ T A , H = X | A |≥ , A ∩ Ω m = ∅ T A . In order to show (134) we expand the exponent in powers of it H and it U , E exp { it T } = E exp { it ( U + U + U ) } + E exp { it ( U + U ) } it H + R, where | R | ≤ t ( E H + E | U H | ). Invoking the bounds, see (161), (162), (3), (4),(137) E H ≤ N − ∆ ≤ c ∗ N − − ν , E U ≤ N − ζ ≤ c ∗ N − we obtain, by Cauchy-Schwartz, | R | ≤ c ∗ t N − − ν ≺ R . We complete the proof of (134) byshowing that(138) E exp { it ( U + U ) } it H ≺ R . Before proving (138) we collect some auxiliary inequalities. For m = 2 k write(139) Ω m = A ∪ A , where A = { , . . . , k } , A = { k + 1 , . . . , k } . Furthermore, split the sum U = Z + Z + Z + Z , (140) Z = X ≤ i 3. The random variable U ( A ) does not depend on theobservations X j , j ∈ Ω \ A . Therefore, we can write h . = E exp { it U ( A ) } E exp { it ( U (Ω \ A ) + Z ) } ( it ) H Z . Furthermore, using (160) we obtain, for | A | = m/ | h . | ≤ t | α m/ ( t ) | E | H Z | ≤ c ∗ t exp {− t m N } m / N ν . In the last step we combined the bound E H ≤ c ∗ N − − ν and (141) to get E | H Z | ≤ ( E H ) / ( E Z ) / ≤ c ∗ m / N − − ν . Note that choosing of C in (136) sufficiently large implies, for | t | ≥ C , t m/ N ≈ ( C / 12) ln( t + 1) ≥ 10 ln( t + 1) . An application of this bound to the argument of the exponent in (143) shows h . ≺ R . 
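The role of the truncation level $m(t)$ in (136) is precisely to make the factor $\exp\{-t^2 m/(12N)\}$ decay polynomially in $t$, as noted at the end of the step above. A sketch of that arithmetic with the concrete constant $C_0 = 200$ mentioned in the text; the ceiling, the even-rounding, and the floor at 10 are our illustrative choices:

```python
from math import ceil, exp, log

def m_of_t(t: float, n: int, c0: float = 200.0) -> int:
    """Even integer m(t) of order c0 * n * log(t^2 + 1) / t^2, floored at 10 (cf. (136))."""
    m = max(10, ceil(c0 * n * log(t * t + 1.0) / (t * t)))
    return m + (m % 2)  # force m even, as required in the text

n = 10 ** 6
for t in [x / 2 for x in range(2, 201)]:  # 1 <= t <= 100
    m = m_of_t(t, n)
    # (c0/12) * log(t^2+1) >= 10 * log(t+1) for t >= 1, hence polynomial decay:
    assert exp(-t * t * m / (12.0 * n)) <= (t + 1.0) ** -10 + 1e-15
```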
Theproof of h .i ≺ R , for i = 1 , 2, is almost the same. Therefore, we obtain h ≺ R .Let us prove h ≺ R . Firstly we collect some auxiliary inequalities. Write m = 2 k (recall thatthe number m is even) and split Ω m = B ∪ D , where B denotes the set of odd numbers and D denotes the set of even numbers. Split H = H B + H D + H C . Here, for A ⊂ Ω N and | A | ≥ H B the sum of T A such that A ∩ B = ∅ and A ∩ D = ∅ ; H D denotes the sum of T A such that A ∩ B = ∅ and A ∩ D = ∅ ; H C denotes the sum of T A such that A ∩ B = ∅ and A ∩ D = ∅ . It follows from the inequalities (172) and (4) that(144) E H C ≤ c ∗ m N − − ν , E H B = E H D ≤ c ∗ mN − − ν . Using the notation z = it exp { it ( U + Z + Z + Z ) } write h = E z H = h . + h . + h . ,h . = E z H B , h . = E z H D , h . = E z H C . We shall show that h .i ≺ R , for i = 1 , , 3. The relation h . ≺ R follows from (144) and (141),and by Cauchy-Schwartz, | h . | ≤ c ∗ | t | mN − − ν ≺ R .Let us show that h . ≺ R . Expanding the exponent in powers of it ( Z + Z ) we obtain h . = h ∗ . + R, h ∗ . := E exp { it ( U + Z ) } it H D , YMMETRIC STATISTICS 33 where | R | ≤ t E | H D ( Z + Z ) | . Combining the bounds (141) and (144) we obtain, by Cauchy-Schwartz, | R | ≤ c ∗ t mN − (5+2 ν ) / ≺ R . Next we show that h ∗ . ≺ R . The random variable U ( D ) = P j ∈ D g ( X j ) and the random variable H D are independent. Therefore, we can write | h ∗ . | ≤ | t | | E exp { it U ( D ) }| E | H D | . Combining (160) and (144) we obtain using Cauchy-Schwartz, | h ∗ . | ≤ c ∗ | t | e − mt / N m / N − (3+2 ν ) / ≺ R . The proof of h . ≺ R is similar. Therefore, we obtain h ≺ R . This together with the relation h ≺ R , proved above, implies h ≺ R . Thus we arrive at (138) completing the proof of (134). Here we prove (135). We start with some auxiliary moment inequalities. Split U = W + Z, W = X | A | =3 , A ∩ Ω m = ∅ T A , Z = X | A | =3 , A ∩ Ω m = ∅ T A . 
Using the orthogonality and moment bounds for U -statistics, see, e.g., Dharmadhikari et al(1968), one can show that E W ≤ mN E g ( X , X , X ) , E Z ≤ N E g ( X , X , X ) , and E | Z | s ≤ cN s/ E | g ( X , X , X ) | s . Invoking (3) we obtain(145) E W ≤ c ∗ mN − , E Z ≤ c ∗ N − , E | Z | s ≤ c ∗ N − s . For the sets A , A ⊂ Ω m defined in (139) write D = { A ⊂ Ω N : | A | = 3 , A ∩ Ω m = ∅} , D = { A ∈ D : A ∩ A = ∅} , D = { A ∈ D : A ∩ A = ∅} , D = { A ∈ D : A ∩ A = ∅ , A ∩ A = ∅} . We have D = D ∪ D ∪ D and W = P A ∈D T A . Therefore, we can write W = W + W + W ,where W j = P A ∈D j T A .A calculation shows that E W = E W ≤ kN E g ( X , X , X ) , E W ≤ k N E g ( X , X , X ) . Therefore, we obtain form (3) that(146) E W = E W ≤ c ∗ mN − , E W ≤ c ∗ m N − . Let us prove (135). Write U = W + Z . Expanding the exponent in powers of itW we obtain E exp { it ( U + U + U ) } = h + h + R,h = E exp { it ( U + U + Z ) } ,h = E exp { it ( U + U + Z ) } itW, where, by (145), | R | ≤ t E W ≤ c ∗ t mN − ≺ R . This implies E exp { it ( U + U + U ) } ∼ h + h . In order to prove (135) we shall show that h ∼ E exp { it U } itW, (147) h ∼ E exp { it ( U + U ) } + E exp { it U } itZ, (148) E exp { it U } it U ∼ (cid:18) N (cid:19) e − t / ( it ) w. (149) Let us prove (147). Expanding the exponent (in h ) in powers of itZ we obtain h = h + R, h = E exp { it ( U + U ) } itW, where, by (145) and Cauchy-Schwartz, | R | ≤ t E | W Z | ≤ c ∗ t m / N − / ≺ R . We have, h ∼ h .It remains to show that h ∼ E exp { it U } itW . Split(150) U = U ∗ + U ⋆ , U ∗ = X | A | =2 , A ∩ Ω m = ∅ T A , U ⋆ = X | A | =2 , A ∩ Ω m = ∅ T A . We have, see (141),(151) E ( U ∗ ) ≤ c ∗ mN − , E ( U ⋆ ) ≤ c ∗ N − . Expanding the exponent (in h ) in powers of it U ∗ we obtain h = h + R, where h = E exp { it ( U + U ⋆ ) } itW, and where, by (145), (151) and Cauchy-Schwartz, | R | ≤ t E | W U ∗ | ≤ c ∗ t mN − / ≺ R . Therefore, we obtain h ∼ h .We complete the proof of (147) by showing that h ∼ E exp { it U } itW . 
Use the decomposition W = W + W + W and write h = h . + h . + h . , h .j = E exp { it ( U + U ⋆ ) } itW j . We shall show that(152) h .j ∼ E exp { it U } itW j , j = 1 , , . Expanding in powers of it U ⋆ we obtain h .j = E exp { it U } itW j + R j , where R j = ( it ) E exp { it U } W j U ⋆ θ and where θ is a function of U ⋆ satisfying | θ | ≤ 1. In orderto prove (152) we show that R j ≺ R , for j = 1 , , | R | ≤ c ∗ t mN − / ≺ R . Furthermore, using the fact that the random variable U ( A ) and the random variables U ⋆ and W are independent, we can write | R | ≤ t | E exp { it U ( A ) }| E | W U ⋆ | ≤ c ∗ t e − mt / N m / N − ≺ R . Here we used (160) and the moment inequalities (146) and (151). The proof of R ≺ R issimilar. We arrive at (152) and, thus, complete the proof of (147).Let us prove (148). We proceed in two steps. Firstly we show h ∼ h + h , (153) h = E exp { it ( U + U ) } , h = E exp { it ( U + U ) } itZ. Secondly, we show(154) h ∼ E exp { it U } itZ. In order to prove (153) we write h = h + h + R, R = E exp { it ( U + U ) } r, r = exp { itZ } − − itZ, YMMETRIC STATISTICS 35 and show that R ≺ R . In order to bound the remainder R we write U = U ∗ + U ⋆ , see (150),and expand the exponent in powers of it U ∗ . We obtain R = R + R , where R = E exp { it ( U + U ⋆ ) } r and | R | ≤ E | it U ∗ r | . Note that, for 2 < s ≤ 3, we have | r | ≤ c | tZ | s/ . Combining (145) and (151) we obtain viaCauchy-Schwartz, | R | ≤ | t | s/ E | Z | s/ | U ∗ | ≤ c ∗ | t | s/ m / N − − s/ ≺ R . In order to prove R ≺ R we use the fact that the random variable U (Ω m ) and the randomvariables U ⋆ and r are independent. Invoking the inequality | r | ≤ t Z we obtain from (160)and (145) | R | ≤ t | α m ( t ) | E Z ≤ c ∗ t e − mt / N N − ≺ R . We thus arrive at (153).Let us prove (154). Use the decomposition (140) and expand the exponent (in h ) in powers of it Z to get h = h + R , where h = E exp { it ( U + Z + Z + Z ) } itZ, | R | ≤ t E | Z Z | . 
Combining (141) and (145) we obtain via Cauchy-Schwartz | R | ≤ c ∗ t mN − / ≺ R . Therefore, we have h ∼ h . Now we expand the exponent in h in powers of it ( Z + Z ) and obtain h = h + h + R ,where h = E exp { it ( U + Z ) } itZ, h = E exp { it ( U + Z ) } ( it ) Z ( Z + Z ) , and where | R | ≤ | t | E | Z | | Z + Z | . Combining (141) and (145) we obtain via Cauchy-Schwartz | R | ≤ | t | mN − ≺ R . Therefore, we have h ∼ h + h . We complete the proof of (154), by showing that(155) h ∼ E exp { it U } itZ and h ≺ R . In order to prove the second bound write h = R + R , where R j = E exp { it ( U + Z ) } ( it ) Z Z j . We shall show that R ≺ R . Using the fact that the random variable U ( A ) and the randomvariables Z , Z and Z are independent we obtain from (160) | R | ≤ t | α m/ ( t ) | E | Z Z | ≤ t e − mt / m / N − ≺ R . In the last step we combined (141), (145) and Cauchy-Schwartz. The proof of R ≺ R is similar.In order to prove the first relation of (155) we expand the exponent in powers of it Z and obtain h = E exp { it U } itZ + R . Furthermore, combining (160), (141) and (145) we obtain | R | ≤ t | α m ( t ) | E | Z Z | ≤ c ∗ t e − mt / N N − / ≺ R . Thus the proof of (148) is complete.Let us prove (149). By symmetry and the independence,(156) E e it U it U = (cid:18) N (cid:19) h E e it U ∗ , h = E e itx e itx e itx itz. Here we denote z = g ( X , X , X ) and write, U = x + x + x + U ∗ , U ∗ = X ≤ j ≤ N g ( X j ) , x j = g ( X j ) . Furthermore, write r j = e itx j − − itx j , v j = e itx j − . In what follows we expand the exponents in powers of itX j , j = 1 , , E (cid:0) g ( X , X , X ) (cid:12)(cid:12) X , X (cid:1) = 0 as well as the obvious symmetry. Thus, we have h = h + R , h = E e itx e itx ( it ) zx , R = E e itx e itx itzr ,h = h + R , h = E e itx ( it ) zx x , R = E e itx ( it ) zx r h = h + R , h = E ( it ) zx x x , R = E ( it ) zx x r . Furthermore, we have R = E itz r v v , R = E ( it ) zx r v . 
Invoking the bounds | r j | ≤ | tx j | and | v j | ≤ | tx j | we obtain(157) h = h + R, where | R | ≤ c | t | E | zx x | x . The bound, | R | ≤ c ∗ | t | N − / (which follows, by Cauchy-Schwartz) in combination with (156) and (157) implies(158) E e it U it U ∼ (cid:18) N (cid:19) E e it U ∗ ( it ) w. Note that (cid:0) N (cid:1) | w | ≤ c ∗ N − . In order to show (149) we replace E e it U ∗ by e − t / . Therefore, (149)follows from (158) and the inequalities( it ) N ( E e it U ∗ − e − t σ ( N − / N ) ≺ R , ( it ) N ( e − t σ ( N − / N − e − t / ) ≺ R . The second inequality is a direct consequence of (164). The proof of the first inequality is routineand here omitted. Thus the proof of (135) is complete. Here we show that(159) E exp { it U + U ) } + (cid:18) N (cid:19) e − t / ( it ) w ∼ ˆ G ( t ) . This relation in combination with (134) and (135) implies E e it T ∼ ˆ G ( t ).Let G U ( t ) denote the two term Edgeworth expansion of the U - statistic U + U . That is, G U ( t )is defined by (2), but with κ replaced by κ ∗ , where κ ∗ is obtained from κ after removingthe summand 4 E g ( X ) g ( X ) g ( X ) χ ( X , X , X ). Furthermore, let ˆ G U ( t ) denote the Fouriertransform of G U ( t ). It easy to show thatˆ G ( t ) = ˆ G U ( t ) + (cid:18) N (cid:19) e − t / ( it ) w. Therefore, in order to prove (159) it suffices to show that ˆ G U ( t ) ∼ E exp { it ( U + U ) } . Thebound Z | t |≤ t | ˆ G U ( t ) − E exp { it ( U + U ) }| dt | t | ≤ ε N N − where ε N ↓ 0, was shown by Callaert, Janssen and Veraverbeke (1980) [11] and Bickel, G¨otzeand van Zwet (1986) [8]. An inspection of their proofs shows that under the moment conditions(3) one can replace ε n by c ∗ N − ν . This completes the proof of (122). YMMETRIC STATISTICS 37 For the reader convenience we formulate in Lemma 3 a known result on upper bounds forcharacteristic functions. Lemma 3. Assume that (14) holds. 
There exists a constant $c_*$ depending on $D_*, M_*, r, s, \nu$ only such that, for $N > c_*$, $|t| \le N^{1/2}/\beta_3$ and $B \subset \Omega_N$, we have

(160) $\quad |\alpha(t)| \le 1 - t^2/(4N), \qquad |E \exp\{it\, U(B)\}| \le |\alpha(t)|^{|B|} \le e^{-|B| t^2/(4N)}.$

Here $\alpha(t) = E \exp\{it\, g_1(X_1)\}$ and $U(B) = \sum_{j \in B} g_1(X_j)$.

Proof. Let us prove the first inequality of (160). Expanding the exponent, see (183), we obtain
$$|\alpha(t)| \le \bigl|1 - 2^{-1} t^2 E g_1^2(X_1)\bigr| + 6^{-1} |t|^3 E |g_1(X_1)|^3 = \bigl|1 - \sigma^2 t^2/(2N)\bigr| + \beta_3 \sigma^3 |t|^3/(6 N^{3/2}).$$
Invoking the inequality $1 - c_* N^{-1} \le \sigma^2 \le 1$, valid for $N > c_*$ with $c_*$ sufficiently large, we obtain $|\alpha(t)| \le 1 - t^2/(4N)$, for $|t| \le N^{1/2}/\beta_3$. The second inequality of (160) follows from the first one via the inequality $1 + x \le e^x$, $x \in \mathbb{R}$. $\Box$

Appendix 1

Here we compare the moments $\Delta_m$ and $E R_m^2$, where $R_m$ denotes the remainder of the expansion (12), $T = E T + U_1 + \cdots + U_{m-1} + R_m$, $R_m := U_m + \cdots + U_N$. For $k = 1, \ldots, N$, write $\Omega_k = \{1, 2, \ldots, k\}$ and denote $\sigma_k^2 := E g_k^2(X_1, \ldots, X_k) = E T_{\Omega_k}^2$. It follows from (12), by the orthogonality property (13), that

(161) $\quad \sigma_T^2 = \sum_{k=1}^{N} E U_k^2, \qquad E R_m^2 = \sum_{k=m}^{N} E U_k^2, \qquad E U_k^2 = \binom{N}{k} \sigma_k^2.$

Lemma 4. Assume that $E T^2 < \infty$. Then

(162) $\quad E R_m^2 \le N^{-(m-1)} \Delta_m,$

(163) $\quad \Delta_m \le N^{2m-1} \sigma_m^2 + N^{-1} \Delta_{m+1}.$

Assume that (3) and (4) hold. Then there exists a constant $c_* < \infty$ depending on $D_*, M_*, r, s, \nu$ such that

(164) $\quad 0 \le 1 - \sigma^2 \sigma_T^{-2} \le c_* N^{-1}.$

Remark. For $m = 3$, the inequality (163) yields $\Delta_3 \le \zeta_3^2 + N^{-1} \Delta_4$.

Proof. Let us prove (162). The identity
$$D_1 \cdots D_m T = \sum_{A:\, \Omega_m \subset A \subset \Omega_N} T_A = \sum_{m \le k \le N} U_{k|m}, \qquad U_{k|m} = \sum_{|A| = k,\; A \supset \Omega_m} T_A,$$
implies

(165) $\quad E (D_1 \cdots D_m T)^2 = \sum_{m \le k \le N} E U_{k|m}^2, \qquad E U_{k|m}^2 = \sigma_k^2 \binom{N-m}{k-m}.$

Write

(166) $\quad E (D_1 D_2 \cdots D_m T)^2 = \sum_{m \le k \le N} \sigma_k^2 \binom{N-m}{k-m} = \sum_{m \le k \le N} \sigma_k^2 \binom{N}{k} b_k,$

where $b_k = [k]_m/[N]_m$ satisfies $b_k \ge b_m \ge m!\, N^{-m}$.
Here we denote [ x ] m = x ( x − · · · ( x − m +1).A comparison of (161) and (166) shows (162) E R m ≤ N m E ( D · · · D m T ) = N − ( m − ∆ m . Let us prove (163). Write E ( D · · · D m T ) = σ m + X m Lemma 5. Assume that σ T = 1 . For ≤ m ≤ N and s > , we have E Λ ≤ m N ∆ , E Λ ≤ m N ∆ , E | Λ | ≤ c m N / γ , (167) E | Λ | s ≤ c ( s ) m s/ N − s/ ζ s , E η i ≤ N − ∆ , E Λ ≤ mN − ∆ . (168) Here c denotes an absolute constant and c ( s ) denotes a constant which depends only on s .Proof. The inequalities (167) are proved in [3].Let us prove (168). Split Λ = z + · · · + z m , where z i = X | A | =3 , A ∩ Ω m = i T A . Let E ′ denote the conditional expectation given X m +1 , . . . , X N . It follows from Rosenthal’sinequality that almost surely E ′ | Λ | s ≤ c ( s ) m X i =1 E ′ | z i | s + c ( s ) (cid:0) m X i =1 E ′ z i (cid:1) s/ . Invoking H¨older’s inequality we obtain, by symmetry,(169) E | Λ | s = EE ′ | Λ | s ≤ c ( s ) m s/ E | z | s . Using well known martingale moment inequalities (and their applications to U statistics), seeDharmadhikari, Fabian and Jogdeo (1968) [13], one can show the bound E | z | s ≤ c ( s ) N − s/ ζ s .Invoking this bound in (169) we obtain the first bound of (168). YMMETRIC STATISTICS 39 In order to prove the second bound of (168) write η i = N − m +1 X k =4 U ∗ k , U ∗ k = X | A | = k, A ∩ Ω m = { i } T A . A simple calculation shows E ( U ∗ k ) = (cid:0) N − mk − (cid:1) σ k . Therefore, by orthogonality, E η i = N − m +1 X k =4 (cid:18) N − mk − (cid:19) σ k = N − m +1 X k =4 (cid:18) N − k − (cid:19) b k σ k ≤ N E ( D · · · D T ) . (170)In the last step we invoke (165) and use the bound b k ≤ N , where b k = (cid:0) N − mk − (cid:1)(cid:0) N − k − (cid:1) − . Clearly,(170) implies E η i ≤ N − ∆ . Finally, using the fact that η , . . . , η m are uncorrelated we obtainΛ = E η + · · · + E η m ≤ m N − ∆ . thus completing the proof. (cid:3) Before formulating next result we introduce some notation. 
Given m let D denote the class ofsubsets A ⊂ Ω N satisfying | A | ≥ m ∩ A = ∅ . Introduce the random variable H ( m ) = P A ∈D T A . Denote x i = 2 i − y i = 2 i . For even integer m = 2 k ≤ N writeΩ m = A k ∪ B k , A k = { x , . . . , x k } , B k = { y , . . . , y k } and put A = B = ∅ . Let A ( k ) (respectively B ( k )) denote the collection of those A ∈ D whichsatisfy A ∩ A k = ∅ (respectively A ∩ B k = ∅ ). Furthermore, let C ( k ) denote the collection of A ∈ D such that A ∩ A k = ∅ and A ∩ B k = ∅ . Write H A ( k ) = X A ∈A ( k ) T A , H B ( k ) = X A ∈B ( k ) T A , H C ( k ) = X A ∈C ( k ) T A . Lemma 6. There exists an absolute constant c such that, (171) E H ( m ) ≤ c mN ∆ , for m = 4 , , . . . , N. For even integer m = 2 k < N we have (172) E H A ( k ) = E H B ( k ) ≤ c kN ∆ , E H C ( k ) ≤ c k N ∆ . Proof. Let us prove the first bound of (171). For m = 4 we have H (4) = H + H + H + H , H k = X | A |≥ , | A ∩ Ω | = k T A . A calculation shows that, for k = 1 , , , E H k = (cid:18) k (cid:19) N X j =4 σ j (cid:18) N − j − k (cid:19) = (cid:18) k (cid:19) N X j =4 σ j (cid:18) N − j − (cid:19) a k ( j ) , where the numbers a k ( j ) = (cid:0) N − j − k (cid:1)(cid:0) N − j − (cid:1) ≤ N − k . Invoking (166) we obtain(173) E H k ≤ c N − k E ( D · · · D T ) = cN − − k ∆ . Finally, we obtain (171) for m = 4 E H (4) = E H + · · · + E H ≤ cN − ∆ . In order to prove (171) for m = 5 , , . . . we apply a recursive argument. Write(174) E H ( m + 1) = E H ( m ) + E d m , where d m = H ( m + 1) − H ( m ) is the sum of those T A with | A | ≥ A ∩ Ω m = ∅ and A ∩ Ω m +1 = ∅ . In particular, we have d m = X | A |≥ , A ∩ Ω m +1 = ∅ T A ∪{ m +1 } . Therefore, E d m = N X j =4 σ j (cid:18) N − m − j − (cid:19) = N X j =4 σ j (cid:18) N − j − (cid:19) c j , where the numbers c j = (cid:0) N − m − j − (cid:1)(cid:0) N − j − (cid:1) ≤ N . Invoking (166) we obtain E d m ≤ N − ∆ . This bound together with (174) implies (171).Let us prove (172). 
Note that for m = 2 k we have H ( m ) = H A ( k ) + H B ( k ) + H C ( k ) and thesummands are uncorrelated. Therefore, the first bound of (172) follows from (171).Let us show the second inequality of (172). For k = 2 we have C (2) ⊂ C , where C denotes theclass of subsets A ⊂ Ω N such that | A | ≥ | A ∩ Ω | ≥ 2. Write H C = P A ∈C T A . We have E H C (2) ≤ E H C = E H + E H + E H ≤ cN − ∆ . In the last step we applied (173). We obtain (172), for k = 2.In order to prove the bound (172), for k = 3 , , . . . , we apply a recursive argument similar tothat used in the proof of (171). Denote d [ k ] = H C ( k + 1) − H C ( k ) = X A ∈C ( k +1) \C ( k ) T A . We shall show that(175) E d k ] ≤ ckN − ∆ . This bound in combination with the identity E H C ( k + 1) = E H C ( k ) + E d k ] shows (172) forarbitrary k .In order to show (175) split the set C ( k + 1) \ C ( k ) into 2 k + 1 non-intersecting parts C ( k + 1) \ C ( k ) = (cid:0) ∪ ki =1 C x.i (cid:1) ∪ (cid:0) ∪ ki =1 C y.i (cid:1) ∪ C x.y , where we denote C x.y = (cid:8) A = ˜ A ∪ { x k +1 , y k +1 } : ˜ A ∩ ( B k ∪ A k ) = ∅ , | ˜ A | ≥ (cid:9) , C x.i = (cid:8) A = ˜ A ∪ { y k +1 , x i } : ˜ A ∩ ( B k ∪ A i − ) = ∅ , | ˜ A | ≥ (cid:9) , C y.i = (cid:8) A = ˜ A ∪ { x k +1 , y i } : ˜ A ∩ ( B i − ∪ A k ) = ∅ , | ˜ A | ≥ (cid:9) . By the orthogonality property ( E T A T V = 0 for A = V ), the random variables d x.i = X A ∈C x.i T A , d y.i = X A ∈C y.i T A , d x.y = X A ∈C x.y T A are uncorrelated. Therefore, we have(176) E d k ] = E d x.y + k X i =1 ( E d x.i + E d y.i ) . YMMETRIC STATISTICS 41 A calculation shows that E d x.y = N X j =4 σ j (cid:18) N − k − j − (cid:19) = N X j =4 σ j (cid:18) N − j − (cid:19) v j , where the coefficients v j = (cid:0) N − k − j − (cid:1)(cid:0) N − j − (cid:1) ≤ N . Invoking (166) we obtain E d x.y ≤ N − ∆ . The same argument shows E d x.i = E d y.i ≤ N − ∆ .The latter bound in combination with (176) shows (175). The lemma is proved. 
(cid:3) Appendix 2 Here we construct bounds for the probability density function (and its derivatives) of randomvariables g ∗ k = ( N/M ) / g k , for 1 ≤ k ≤ n − 1, where g k are defined in (69). Since these randomvariables are identically distributed it suffices to consider g ∗ = (cid:0) NM (cid:1) / g = 1 √ M m + M X j = m +1 g ( Y j ) + ξ R . Here R = √ n M N . Introduce the random variables g ∗ = g ∗ − M − / g ( Y m +1 ) , g ∗ = g ∗ − M − / (cid:0) g ( Y m +1 ) + g ( Y m +2 ) (cid:1) . Let p i ( · ) denote the probability density function of g ∗ i , for i = 1 , , 3. Recall that the integers n ≈ N ν ≤ N ν / and M ≈ N/n ≥ N / are introduced in (24) and the number ν > Lemma 7. Assume that conditions of Theorem 1 are satisfied. There exist positive constants C ∗ , c ∗ , c ′∗ depending only on M ∗ , D ∗ , δ, r and ν , ν such that, for i = 1 , , , we have uniformlyin u ∈ R and N > C ∗ (177) | p i ( u ) | ≤ c ∗ , | p ′ i ( u ) | ≤ c ∗ , | p ′′ i ( u ) | ≤ c ∗ , | p ′′′ i ( u ) | ≤ c ∗ . Furthermore, given w > there exists a constant C ∗ ( w ) depending on M ∗ , D ∗ , δ, r , ν , ν and w such that uniformly in z ∗ ∈ [ − w, w ] and N > C ∗ ( w ) we have (178) p i ( z ∗ ) ≥ c ′∗ , i = 1 , , . Proof. We shall prove (177) and (178) for i = 1. For i = 2 , 3, the proof is almost the same.Before starting the proof we introduce some notation and collect auxiliary results.Denote θ = E g ∗ = M / θ , θ = E g ( Y m +1 ) ,s = E ( g ( Y m +1 ) − θ ) , ˜ β = s − E | g ( Y m +1 ) − θ | . It follows from E g ( X m +1 ) = 0 that θ = q − N E g ( X m +1 )II A m +1 = − q − N E g ( X m +1 )(1 − II A m +1 ) . Therefore, by Chebyshev’s inequality, for α = 3 / ( r + 2) we have(179) | θ | ≤ q − N N − α ( r − E | g ( X m +1 ) | k Z ′ m +1 k r − r ≤ c ∗ N − / . In the last step we invoke the inequalities α ( r − ≥ r − / ( r + 2) ≥ / q − N ≤ c ∗ , see(40), and E | g ( X m +1 ) | k Z ′ m +1 k r − r ≤ M ∗ , where the latter inequality follows from (3) by H¨olderinequality. 
Similarly, the identities s = q − N E g ( X m +1 )II A m +1 − θ = q − N σ − q − N E g ( X m +1 )(1 − II A m +1 ) − θ in combination with (39) and the inequalities E g ( X m +1 )(1 − II A m +1 ) ≤ N − α ( r − E g ( X m +1 ) k Z ′ m +1 k r − r ≤ N − α ( r − M ∗ and α ( r − 2) = 1 + 2( r − / ( r − ≥ | s − σ | ≤ c ∗ N − . Introduce the random variables g ∗ = S + ξ s R , S = w + · · · + w M , w j = g ( Y m + j ) − θ M / s . We have g ∗ = s − ( g ∗ − θ ). Let p ( · ) denote the density function of g ∗ . Note that p ( u ) = s − p (cid:0) s − ( u − θ ) (cid:1) . Furthermore, we have, by (179), | θ | ≤ c ∗ N − and, by (180), (164), | s − | ≤ c ∗ N − . Therefore, it suffices to prove (177) and (178) for p ( · ) (the latter inequality we verifyfor every z ∗ ∈ [ − w, w ]).In order to prove (177) and (178) we approximate the characteristic function ˆ p ( t ) = E e itg ∗ by e − t / and then apply a Fourier inversion formula. Writeˆ p ( t ) = E e itg ∗ = γ M ( t ) τ (cid:16) tsR (cid:17) , γ ( t ) := E e itw , τ ( t ) := E e itξ . The fact that τ ( t ) = 0, for | t | ≥ 1, implies ˆ p ( t ) = 0, for | t | > s R . Therefore, we obtain from theFourier inversion formula, p ( x ) = 12 π Z + ∞−∞ e − itx ˆ p ( t ) dt = 12 π Z s R − s R e − itx ˆ p ( t ) dt. Write ˆ p ( t ) − e − t / = r ( t ) + r ( t ), where r ( t ) = ( γ M ( t ) − e − t / ) τ ( t/sR ) , r ( t ) = e − t / (cid:0) τ ( t/sR ) − (cid:1) . We shall show below that(181) Z | t |≤ sR | r i ( t ) | dt ≤ c ∗ M − / , i = 1 , . These bounds in combination with the simple inequality Z | t |≥ sR e − t / dt ≤ c ∗ M − / show that(182) | p ( x ) − ϕ ( x ) | ≤ c ∗ M − / , x ∈ R . Here ϕ denotes the standard normal density function ϕ ( x ) = 1 √ π e − x / = 12 π Z + ∞−∞ e − itx e − t / dt It follows from (182) that | p ( x ) | ≤ c ∗ , x ∈ R . Furthermore, given w we have uniformly in | z ∗ | ≤ w | p ( z ∗ ) | ≥ ϕ (3 w ) − c ∗ M − / ≥ c ′∗ > , for sufficiently large M (for N > C ∗ ( w )). 
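The density bounds of Lemma 7 are produced by approximating the characteristic function $\hat{p}(t)$ by $e^{-t^2/2}$ and applying the Fourier inversion formula. The inversion step itself can be illustrated numerically: integrating $e^{-itx} e^{-t^2/2}$ recovers the standard normal density; the grid and truncation below are our choices:

```python
from cmath import exp as cexp
from math import exp, pi, sqrt

def density_by_inversion(x: float, t_max: float = 20.0, steps: int = 4000) -> float:
    """(1/2pi) * integral over [-t_max, t_max] of e^{-itx} e^{-t^2/2} dt, trapezoid rule."""
    dt = 2.0 * t_max / steps
    total = 0.0
    for k in range(steps + 1):
        t = -t_max + k * dt
        w = 0.5 if k in (0, steps) else 1.0  # trapezoid endpoint weights
        total += w * (cexp(-1j * t * x) * exp(-t * t / 2.0)).real
    return total * dt / (2.0 * pi)

# Inverting the Gaussian characteristic function recovers the normal density phi.
for x in (-2.0, -0.5, 0.0, 1.0, 3.0):
    phi = exp(-x * x / 2.0) / sqrt(2.0 * pi)
    assert abs(density_by_inversion(x) - phi) < 1e-8
```

The same picture explains (182): replacing $\hat{p}(t)$ by $e^{-t^2/2}$ inside the integral perturbs the density by at most the $L^1$ distance of the two characteristic functions.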
YMMETRIC STATISTICS 43 In order to prove an upper bounds for the k − th derivative, | p ( k ) ( x ) | ≤ c ∗ , write p ( k ) ( t ) = 12 π Z + ∞−∞ ( − it ) k exp {− itx } ˆ p ( t ) dt, k = 1 , , , and replace ˆ p ( t ) by e − t / as in the proof of (182). We obtain p ( k ) ( x ) = 12 π Z + ∞−∞ ( − it ) k exp {− itx } e − t / ( t ) dt + r, | r | ≤ c ∗ M − / . This implies | p ( k ) ( x ) − ϕ ( k ) ( x ) | ≤ c ∗ M − / . We arrive at the desired bound | p ( k ) ( x ) | ≤ c ∗ , for k = 1 , , i = 2 this bound follows from | τ ( t/sR ) − | ≤ ct / ( sR ) . The latter inequality is a consequence of the short expansion (cid:12)(cid:12) E exp { itξ /sR } − − E itξ /sR (cid:12)(cid:12) ≤ E ( tξ ) / sR ) and E ξ = 0 and E ξ ≤ c , for some absolute constant c .Let us prove (181) for i = 1. Introduce the sequence of i.i.d. centered Gaussian random variables η , η , . . . with variances E η i = M − . Denote f ( t ) = E e itη = e − t / (2 M ) and δ ( t ) = γ ( t ) − f ( t ) . We are going to apply the well known inequality(183) (cid:12)(cid:12) e iv − (cid:0) iv 1! + ( iv ) 2! + · · · + ( iv ) k − ( k − (cid:1)(cid:12)(cid:12) ≤ | v | k k ! . It follows from (183) and identities E η i = E w i , i = 1 , 2, that(184) | δ ( t ) | ≤ | t | (cid:0) E | w | + E | η | (cid:1) ≤ c | t | E | w | . Here we use the inequality E | η | ≤ c E | w | , which follows from E η = E w .Combining (184) and the simple identity γ M ( t ) − f M ( t ) = δ ( t ) M X k =1 γ M − k ( t ) f k − ( t )we obtain(185) | γ M ( t ) − f M ( t ) | ≤ c | t | Z ( t ) M − / ˜ β . Here we denote Z ( t ) = max r + v = M − | f r ( t ) γ v ( t ) | . We shall show below that(186) Z ( t ) ≤ exp n − t M − M o + exp {− δ ′′ ( M − / } , ≤ | t | ≤ sR, where δ ′′ > δ, A ∗ , D ∗ , M ∗ , ν and it is given in (32). This inequality in combinationwith (185) proves (181).Let us prove (186). Clearly, Z ≤ | f M − ( t ) | + | γ M − ( t ) | . Furthermore, f M ( t ) = e − t / . 
In orderto prove (186) we shall show | γ M ( t ) | ≤ e − t / , ≤ | t | ≤ M / / ˜ β , (187) | γ ( t ) | ≤ e − δ ′′ / , M / / ˜ β ≤ | t | ≤ sR. (188) To show (187) we expand e itw using (183), | γ ( t ) | = (cid:12)(cid:12) E e itw (cid:12)(cid:12) ≤ (cid:12)(cid:12) − t E w (cid:12)(cid:12) + | t | E | w | = (cid:12)(cid:12) − t M (cid:12)(cid:12) + | t | 3! ˜ β M / = 1 − t M (cid:0) − | t | β √ M (cid:1) ≤ − t M . Here we used the identity | − t / M | = 1 − t / M , which holds for | t | < M / / ˜ β , since ˜ β ≥ − x ≤ e − x to x = t / M > δ ′′ defined by (32) we shall show δ ′′ ≤ δ , where˜ δ = 1 − sup {| γ ( t ) | : M / ˜ β − ≤ | t | ≤ sR } = 1 − sup {| E exp { iuσ − g ( Y m +1 ) | : σ/s ˜ β ≤ | u | ≤ σ √ n N } . We are going to replace g ( Y m +1 ) , ˜ β , s by X m +1 , β , σ respectively. Write E e ivg ( Y m +1 ) = q − N E e ivg ( X m +1 ) II A m +1 = E e ivg ( X m +1 ) + r + r ,r = q − N E e ivg ( X m +1 ) (cid:0) II A m +1 − (cid:1) , r = ( q − N − E e ivg ( X m +1 ) . It follows from (5.A5) that, for every v ∈ R , | r | ≤ q − N E | II A m +1 − | = q − N − ≤ c ∗ N − , | r | ≤ q − N − ≤ c ∗ N − . These bounds imply(189) | E e ivg ( Y m +1 ) − E e ivg ( X m +1 ) | ≤ c ∗ N − , for every v ∈ R . One can show that, for sufficiently large N (i.e., for N > C ∗ ), we have(190) | ˜ β /β − | < / , | s /σ − | < / , | s − | ≤ / . Using (189), (190) we get, for N > C ∗ ,˜ δ ≥ − sup (cid:8) | E e iuσ − g ( Y m +1 ) | : (2 β ) − ≤ | u | ≤ N (1+50 ν ) / (cid:9) ≥ − sup (cid:8) | E e iuσ − g ( X m +1 ) | : (2 β ) − ≤ | u | ≤ N ( ν +1) / (cid:9) − c ∗ N − ≥ δ ′′ / . We obtain | γ ( t ) | ≤ − ˜ δ ≤ − δ ′′ / | γ ( t ) | ≤ e − δ ′′ / .8. Appendix 3 The main results of this section are moment inequalities of Lemma 9 and corresponding inequal-ities for conditional moments of Lemma 10. Lemma 8 provides an auxiliary inequality.We start with some notation. 
We call v = v ( · ) , u = u ( · ) ∈ L r orthogonal if h u, v i = 0, where h u, v i = Z X u ( x ) v ( x ) P X ( dx ) = E u ( X ) v ( X ) . Given f ∈ L ( P X ) we have for the kernel ψ ∗∗ defined in (37) E ψ ∗∗ ( X , X ) (cid:0) f ( X ) g ( X ) + f ( X ) g ( X ) (cid:1) = 0 YMMETRIC STATISTICS 45 and almost surely E ( ψ ∗∗ ( X , X ) | X ) = 0 , (191) E (cid:0) ψ ∗∗ ( X , X ) g ( X ) | X (cid:1) = 0 . (192)The latter identity says that almost all values of the L r -valued random variable ψ ∗∗ ( · , X ) areorthogonal to the vector g ( · ) ∈ L r .Let p g : L r → L r denote the projection on the subspace of elements u ∈ L r which are orthogonalto g = g ( · ). For v ∈ L r , write v ∗ = p g ( v ). It follows from (192) that(193) ψ ∗ ( · , Y j ) (cid:16) = p g (cid:0) ψ ( · , Y j ) (cid:1)(cid:17) = ψ ∗∗ ( · , Y j ) + g ( Y j ) b ∗ ( · ) , where b ∗ ( · ) = p g ( b ( · )) = σ − p g (cid:0) E ( ψ ( · , X ) g ( X ) (cid:1) . Denote U ∗ k (cid:0) = p g ( U k ) (cid:1) = 1 √ N X j ∈ O k ψ ∗ ( · , Y j ) , U ∗∗ k = N − / X j ∈ O k ψ ∗∗ ( · , Y j ) , where the L r -valued random variables U k are introduced in (86). For the random variables g k and L k introduced in (67) and (69), we have(194) U ∗ k = U ∗∗ k + L k b ∗ ( · ) = U ∗∗ k + ( g k − √ n ξ k N ) b ∗ ( · ) . Denote K = E | ψ ( X , X ) | r and K s = E | ψ ∗∗ ( X , X ) | s , s ≤ r . Lemma 8. Let < r ≤ . For s ≤ r , we have (195) K r/ss ≤ K r ≤ c K (cid:16) E | g ( X ) | r σ r (cid:17) . Proof. The first inequality of (195) is a consequence of Lyapunov’s inequality. Let us prove thesecond inequality. The inequality | a + b + c | r ≤ r ( | a | r + | b | r + | c | r ) implies K r = E | ψ ∗∗ ( X , X ) | r ≤ r (cid:0) K + 2 E | b ( X ) | r E | g ( X ) | r (cid:1) . Therefore, (195) is a consequence of the inequalities E | b ( X ) | r ≤ r σ r K + | κ | r σ r E | g ( X ) | r ,κ ≤ σ E ψ ( X , X ) ≤ σ K /r . Here κ = E ψ ( X , X ) g ( X ) g ( X ). 
To prove the first inequality use $|a + b|^r \le 2^r(|a|^r + |b|^r)$ to get
$$E|b(X)|^r \le \frac{2^r}{\sigma^{2r}}\,E\bigl|E\bigl(\psi(X_1, X_2)g(X_2)\mid X_1\bigr)\bigr|^r + \frac{2^r|\kappa|^r}{\sigma^{2r}\sigma^r}\,E|g(X)|^r.$$
Furthermore, by the Cauchy–Schwarz inequality,
$$\bigl|E\bigl(\psi(X_1, X_2)g(X_2)\mid X_1\bigr)\bigr| \le \bigl(E(\psi^2(X_1, X_2)\mid X_1)\bigr)^{1/2}\sigma.$$
Finally, Lyapunov's inequality implies
$$\bigl(E(\psi^2(X_1, X_2)\mid X_1)\bigr)^{r/2} \le E\bigl(|\psi(X_1, X_2)|^r\mid X_1\bigr).$$
We obtain $E\bigl|E\bigl(\psi(X_1, X_2)g(X_2)\mid X_1\bigr)\bigr|^r \le K\sigma^r$, thus completing the proof. □

Lemma 9. Let $1 \le k \le n - 1$. For $\bar U_k^*$, an independent copy of $U_k^*$, we have

(196) $\delta_2 - c_* N^{-\alpha(r-2)} \le \dfrac{N}{2M}\,E\|U_k^* - \bar U_k^*\|^2 \le \delta_2 + c_*$,

(197) $E\|U_k - \bar U_k\|_r^r \le c_*\bigl(\frac{M}{N}\bigr)^{r/2}$.

Recall that $\delta_2 = E|\psi^{**}(X_1, X_2)|^2$.

Proof. Let us prove (196). By symmetry, we have, for $i, j \in O_1$,
$$E\|U_1^* - \bar U_1^*\|^2 = 2\frac{M}{N}H_1 - 2\frac{M}{N}H_2, \qquad H_1 := E\|\psi^*(\cdot, Y_j)\|^2, \quad H_2 := E\langle\psi^*(\cdot, Y_j), \psi^*(\cdot, Y_i)\rangle, \ i \ne j.$$
The inequality (196) follows from the inequalities

(198) $\delta_2 - c_* N^{-\alpha(r-2)} \le H_1 \le \delta_2 + c_*$,

(199) $|H_2| \le c_* N^{-\alpha(r-2)}$.

Let us prove (198). From (193) we have $H_1 = V_1 + V_2 + 2V_3$, where
$$V_1 = E\|\psi^{**}(\cdot, Y_j)\|^2, \qquad V_2 = \|b^*(\cdot)\|^2\,E g^2(Y_j), \qquad V_3 = E\,g(Y_j)\langle\psi^{**}(\cdot, Y_j), b^*(\cdot)\rangle.$$
Let us show that

(200) $\delta_2 - c_* N^{-\alpha(r-2)} \le V_1 \le \delta_2 + c_* N^{-\alpha r}$.

This inequality follows from (39), (40) and the identity
$$V_1 = q_N^{-1}E|\psi^{**}(X_1, X_j)|^2\mathbb I_{A_j} = q_N^{-1}E|\psi^{**}(X_1, X_j)|^2 - q_N^{-1}V',$$
where $V' = E|\psi^{**}(X_1, X_j)|^2(1 - \mathbb I_{A_j})$ satisfies, by (38),

(201) $0 \le V' \le N^{-\alpha(r-2)}E|\psi^{**}(X_1, X_j)|^2\|Z'_j\|_r^{r-2} \le c_* N^{-\alpha(r-2)}$.

In the last step we applied Hölder's inequality and Lemma 8 to get
$$E|\psi^{**}(X_1, X_j)|^2\|Z'_j\|_r^{r-2} \le K_r^{2/r}\bigl(E\|Z'_j\|_r^r\bigr)^{(r-2)/r} \le K_r^{2/r}K^{(r-2)/r} \le c_*.$$
Let us show that

(202) $0 \le V_2 \le c_*$.
For $\tilde b(\cdot) := E\,\psi(\cdot, X)g(X)$ we have, by the Cauchy–Schwarz inequality,
$$\|\tilde b(\cdot)\|^2 = E\bigl(E(\psi(X_1, X_2)g(X_2)\mid X_1)\bigr)^2 \le E\,\psi^2(X_1, X_2)\,\sigma^2 \le c_*\sigma^2.$$
Now the identity $b^* = \sigma^{-2}p_g(\tilde b)$ implies

(203) $\|b^*\| \le \sigma^{-2}\|\tilde b\| \le \sigma^{-1}c_*$.

Invoking the bound $E g^2(Y_j) \le c_*\sigma^2$, see (42), we obtain (202). Finally, write
$$V_3 = q_N^{-1}E\,\tilde V\,\mathbb I_{A_j}, \qquad \tilde V = g(X_j)\,\psi^{**}(X_1, X_j)\,b^*(X_1).$$
The identity (192) implies $E\tilde V = 0$. Therefore $V_3 = q_N^{-1}E\,\tilde V(\mathbb I_{A_j} - 1)$. Invoking $q_N^{-1} \le c_*$, see (40), we obtain

(204) $|V_3| \le c_* N^{-\alpha(r-2)}E|\tilde V|\,\|Z'_j\|_r^{r-2} \le c_* N^{-\alpha(r-2)}$.

In the last step we used the bound $E|\tilde V|\,\|Z'_j\|_r^{r-2} \le c_*$. In order to prove this bound we invoke the inequalities $|abc| \le \frac12\bigl((ab)^2 + c^2\bigr) \le \frac12\bigl(a^4 + b^4 + c^2\bigr)$ to show that
$$|\tilde V| \le \tfrac12\bigl(|g(X_j)|^4 + |\psi^{**}(X_1, X_j)|^4 + |b^*(X_1)|^2\bigr).$$
Furthermore, by Hölder's inequality and (195),
$$E|g(X_j)|^4\|Z'_j\|_r^{r-2} \le c_*, \qquad E|\psi^{**}(X_1, X_j)|^4\|Z'_j\|_r^{r-2} \le c_*.$$
By the independence and (203),
$$E|b^*(X_1)|^2\|Z'_j\|_r^{r-2} = \|b^*\|^2\,E\|Z'_j\|_r^{r-2} \le c_*.$$
Thus we arrive at (204). Combining (200), (202) and (204) we obtain (198).

Let us prove (199). Using (193) write $H_2 = Q_1 + Q_2 + 2Q_3$, where
$$Q_1 = E\,\psi^{**}(X_1, Y_j)\psi^{**}(X_1, Y_i), \qquad Q_2 = \|b^*\|^2\,E\,g(Y_j)g(Y_i), \qquad Q_3 = E\,\psi^{**}(X_1, Y_j)g(Y_i)b^*(X_1).$$
It follows from the identity (191) that
$$Q_1 = q_N^{-2}E\,\psi^{**}(X_1, X_j)\psi^{**}(X_1, X_i)(\mathbb I_{A_j} - 1)(\mathbb I_{A_i} - 1).$$
The simple inequality $|\psi^{**}(X_1, X_j)\psi^{**}(X_1, X_i)| \le |\psi^{**}(X_1, X_j)|^2 + |\psi^{**}(X_1, X_i)|^2$ yields, by symmetry,

(205) $|Q_1| \le 2q_N^{-2}E|\psi^{**}(X_1, X_j)|^2(1 - \mathbb I_{A_j}) \le c_* N^{-\alpha(r-2)}$.

In the last step we applied (201) and $q_N^{-1} \le c_*$, see (39). Furthermore, using the identity $E g(X_i) = 0$ we obtain from (38)

(206) $|E g(Y_i)| = q_N^{-1}|E g(X_i)(\mathbb I_{A_i} - 1)| \le q_N^{-1}N^{-\alpha(r-2)}E|g(X_i)|\,\|Z_i\|_r^{r-2} \le c_* N^{-\alpha(r-2)}$.
In the last step we applied Hölder's inequality to show $E|g(X_i)|\,\|Z_i\|_r^{r-2} \le c_*$. The bounds (206), (39) and (203) together imply

(207) $|Q_k| \le c_* N^{-\alpha(r-2)}$, $k = 2, 3$.

The bound (199) follows from (205) and (207).

Let us prove (197). For this purpose we shall show that

(208) $E\bigl\|\sum_{j \in O_k}\frac{V_j}{\sqrt M}\bigr\|_r^r \le c_*$,

where $V_j = \psi(\cdot, Y_j) - \psi(\cdot, \bar Y_j)$, and where $\bar Y_j$ denote independent copies of $Y_j$, $j \in O_k$. Using $E\|\psi(\cdot, X_j)\|_r^r = E|\psi(X_1, X_j)|^r \le c_*$ we obtain, by symmetry and (42),
$$E\|V_j\|_r^r \le 2^r E\|\psi(\cdot, Y_j)\|_r^r \le c_* E\|\psi(\cdot, X_j)\|_r^r \le c_*.$$
Now (208) follows from the well known inequality

(209) $E\|\xi_1 + \cdots + \xi_k\|_r^r \le c(r)\sum_{i=1}^k E\|\xi_i\|_r^r + c(r)\bigl(\sum_{i=1}^k E\|\xi_i\|_r^2\bigr)^{r/2}$, $\quad k = 1, 2, \ldots,$

which is valid for independent centered random elements $\xi_i$ with values in $L_r$. One can derive this inequality from the Hoffmann–Jørgensen inequality (see e.g., Proposition 6.8 in Ledoux and Talagrand (1991) [21]) using the type 2 property of the Banach space $L_r$ and the symmetrization lemma (see formula (9.8) and Lemma 6.3 ibidem). The proof of the lemma is complete. □

Before formulating and proving Lemma 10 we introduce some more notation. Let $\mathcal B(L_r)$ denote the class of Borel sets of $L_r$. Consider the regular conditional probability $P_k : \mathbb R \times \mathcal B(L_r) \to [0, 1]$: for $z_k \in \mathbb R$ and $B \in \mathcal B(L_r)$,
$$P_k(z_k; B) := P\bigl(U_k \in B \mid g_k = z_k\bigr) = E(\mathbb I_{U_k \in B}\mid g_k = z_k).$$
Recall, see (77), that $\psi_k$ denotes an $L_r$-valued random variable with the distribution $P\{\psi_k \in B\} = P_k(z_k; B)$. Note that the $L_r$-valued random variable $\psi_k^* = p_g(\psi_k)$ has distribution
$$P\{\psi_k^* \in B\} = P\{p_g(\psi_k) \in B\} = P\{\psi_k \in p_g^{-1}(B)\} = P\bigl(U_k \in p_g^{-1}(B)\mid g_k = z_k\bigr) = P\bigl(U_k^* \in B\mid g_k = z_k\bigr).$$
(210)

Furthermore, using (194) we write (210) in the form
$$P\{\psi_k^* \in B\} = P\Bigl(U_k^{**} + \bigl(z_k - \sqrt N\,\xi_k/\sqrt n\bigr)b^* \in B \,\Big|\, g_k = z_k\Bigr).$$
Let $\bar\psi_k$, respectively $\bar\psi_k^*$, denote an independent copy of $\psi_k$, respectively $\psi_k^*$. Denote
$$\tau_N = M^{-(r-2)/2} + N^{-\alpha(r-2)}M.$$

Lemma 10. Let $k = 1, \ldots, n - 1$. Let $|z_k| \le w\,n^{-1/2}$. There exist positive constants $c^{(i)}_*$, $i = 0, 1, 2, 3$, which depend on $w, r, \nu_1, \nu_2, \delta, A_*, D_*, M_*$ only, such that for

(211) $\tau_N \le c^{(0)}_*\,\delta_2$,

we have

(212) $c^{(1)}_*\,\delta_2 \le n\,E\|\psi_k^* - \bar\psi_k^*\|^2 \le c^{(2)}_*\,\delta_2$,

(213) $E\|\psi_k - \bar\psi_k\|_r^r \le c^{(3)}_*\,n^{-r/2}$.

The condition (211) requires $N$ to be large enough. A simple calculation shows $\tau_N \le N^{-\nu_2}$, for $\nu_2$ satisfying (15). Therefore, (82) implies $\tau_N \le N^{-\nu_2}\delta_2$. In particular, under (82) the inequality (211) is satisfied provided that $N > c_*$, where $c_*$ does not depend on $\delta_2$.

Proof. By $\tilde c_*$, $\tilde c'_*$ we denote positive constants which depend only on $w, r, \nu_1, \nu_2, \delta, A_*, D_*, M_*$. These constants can be different in different places of the text. Given $i, j \in O_k$, $i \ne j$, introduce the random variables
$$g^* = \eta + \zeta, \qquad \eta = \xi_k/R, \qquad \zeta = \frac{1}{\sqrt M}\sum_{j \in O_k}g(Y_j),$$
$$\zeta_i = \zeta - \frac{g(Y_i)}{\sqrt M}, \qquad \zeta_{ij} = \zeta - \frac{g(Y_i)}{\sqrt M} - \frac{g(Y_j)}{\sqrt M}.$$
Here $R = \sqrt{nMN}$ satisfies $N/\sqrt 2 \le R \le N$, by the choice of $n$ and $M$. Let $p_0$, $p_1$, $p_2$, and $p_3$ denote the densities of the random variables $\eta$, $\zeta + \eta$, $\zeta_i + \eta$, and $\zeta_{ij} + \eta$ respectively. Note that $g^* = \sqrt{N/M}\,g_k$. Therefore, the condition $g_k = z_k$ is equivalent to $g^* = z^*$, where $z^* = \sqrt{N/M}\,z_k$. Furthermore, $|z_k| \le w\,n^{-1/2} \Leftrightarrow |z^*| \le w^*$, where $w^* = w\sqrt{N/(Mn)} \le \sqrt 2\,w$. Given a random variable $Y$, we denote the conditional expectation $E(Y\mid g^* = z^*) = E(Y\mid g_k = z_k)$ by $E^*Y$. Similarly, for an event $A$, we have $P(A\mid g_k = z_k) = P(A\mid g^* = z^*)$.

Proof of (212). For the $L_r$-valued random variable $\hat\psi^* = \psi_k^* - z_k b^*$ we have

(214) $P\{\hat\psi^* \in B\} = P\bigl(U_k^{**} - \sqrt N\,\xi_k\,n^{-1/2}\,b^* \in B \mid g^* = z^*\bigr)$.
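Conditional expectations of the form $E^*(\,\cdot\,) = E(\,\cdot \mid g^* = z^*)$ are evaluated below via the density-weighting identity of Lemma 11 (formulas (246), (247)): $E(w(\eta)\mid \eta + \zeta = x) = E\,w(x-\zeta)p(x-\zeta)/E\,p(x-\zeta)$. A Monte Carlo sanity check in a fully Gaussian toy case, where the answer is explicit ($E(\eta\mid\eta+\zeta=x) = x/2$ for independent standard normal $\eta, \zeta$); the setup is illustrative only:

```python
import math
import random

random.seed(2)

def p(u):
    """Standard normal density of eta."""
    return math.exp(-u * u / 2) / math.sqrt(2 * math.pi)

x = 1.2                                        # condition on eta + zeta = x
zeta = [random.gauss(0, 1) for _ in range(200_000)]

# Lemma 11-style ratio: E[w(x - zeta) p(x - zeta)] / E[p(x - zeta)], w(u) = u.
num = sum((x - z) * p(x - z) for z in zeta)
den = sum(p(x - z) for z in zeta)
estimate = num / den

# For independent N(0,1) variables, E(eta | eta + zeta = x) = x / 2.
assert abs(estimate - x / 2) < 0.02, estimate
```

The same weighting device, with $\eta$ the smooth auxiliary variable and $\zeta$ the sum of the $g(Y_j)$, is what turns the conditional moments in the proof into unconditional ones.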
Note that for an independent copy $\bar\psi_k^*$ of $\psi_k^*$ the distributions of $\psi_k^* - \bar\psi_k^*$ and $\hat\psi^* - \hat\psi^*_c$ are the same. Here $\hat\psi^*_c$ denotes an independent copy of $\hat\psi^*$. Therefore,

(215) $E\|\psi_k^* - \bar\psi_k^*\|^2 = E\|\hat\psi^* - \hat\psi^*_c\|^2 = 2E\|\hat\psi^*\|^2 - 2\|E\hat\psi^*\|^2$.

In order to prove (212) we show that

(216) $\|E\hat\psi^*\|^2 \le \tilde c_* N^{-1}$

and, for $\tau_N \le c^{(0)}_*\delta_2$ (i.e., for sufficiently large $N$),

(217) $\tilde c_*\,\delta_2 \le n\,E\|\hat\psi^*\|^2 \le \tilde c'_*\,\delta_2$.

Since $nN^{-1} < \tau_N$, we can choose $c^{(0)}_*$ small enough such that the inequalities (215), (216) and (217) together imply (212).

Proof of (216). Recall that an element $m = m(\cdot) \in L_2(P_X)$ is called the mean of an $L_2(P_X)$-valued random variable $\hat\psi^* = \hat\psi^*(\cdot)$ if for every $f = f(\cdot) \in L_2(P_X)$ we have $\langle f, m\rangle = E\langle f, \hat\psi^*\rangle$. We shall show below that $E\|\hat\psi^*\|^2 < \infty$. Then, by Fubini,
$$E\langle f, \hat\psi^*\rangle = \int f(x)\,E\hat\psi^*(x)\,P_X(dx).$$
Therefore, $m(x) = E\hat\psi^*(x)$, for $P_X$-almost all $x$. For $f \in L_2(P_X)$ it follows from (214) that

(218) $E\langle f, \hat\psi^*\rangle = E^*\bigl\langle f, U_k^{**} - \sqrt N\,\xi_k\,n^{-1/2}\,b^*\bigr\rangle = E^*\langle f, U_k^{**}\rangle - \frac{\sqrt M}{\sqrt N}\langle f, b^*\rangle\,E^*\eta$.

Fix $i \in O_k$. By symmetry,

(219) $E^*\langle f, U_k^{**}\rangle = \frac{M}{\sqrt N}\,E^*\langle f, \psi^{**}(\cdot, Y_i)\rangle$.

An application of (247) yields

(220) $E^*\langle f, \psi^{**}(\cdot, Y_i)\rangle = \frac{1}{p_1(z^*)}\,E\,\langle f, \psi^{**}(\cdot, Y_i)\rangle\,p_2\bigl(z^* - \tfrac{g(Y_i)}{\sqrt M}\bigr) = \langle f, a_{z^*}\rangle$,

where
$$a_{z^*}(\cdot) = \frac{b_{z^*}(\cdot)}{p_1(z^*)}, \qquad b_{z^*}(\cdot) = E\,\psi^{**}(\cdot, Y_i)\,p_2\bigl(z^* - \tfrac{g(Y_i)}{\sqrt M}\bigr)$$
are non-random elements of $L_r$. It follows from (218), (219), (220) that
$$m(\cdot) = \frac{M}{\sqrt N}\,a_{z^*}(\cdot) - \frac{\sqrt M}{\sqrt N}\,b^*(\cdot)\,E^*\eta.$$
In order to prove (216) we show that, for $|z^*| \le w^*$,

(221) $\|b_{z^*}\| \le c_* M^{-1}$,

(222) $|E^*\eta| \le \tilde c_* M^{-1/2} + \tilde c_* R^{-1/2}$,

(223) $p_i(z^*) \ge \tilde c_*$, $i = 0, 1, 2$,

and apply (203). Note that, by Lemma 7, there exist positive constants $\tilde c_*, \tilde c'_*$ such that, for $M, N > \tilde c'_*$, the inequality (223) holds. Let us prove (221).
In Lemma 7 we show, for $i = 1, 2$, that $p_i$ and its derivatives are bounded functions. That is,

(224) $|p_i| \le c_*$, $|p_i'| \le c_*$, $|p_i''| \le c_*$, $|p_i'''| \le c_*$, $\quad i = 1, 2$.

Expanding in powers of $M^{-1/2}g(Y_i)$ we obtain

(225) $p_2\bigl(z^* - \tfrac{g(Y_i)}{\sqrt M}\bigr) = p_2(z^*) - \frac{g(Y_i)}{\sqrt M}p_2'(z^*) + \frac{g^2(Y_i)}{M}\frac{p_2''(\theta)}{2}$.

It follows from the identities (191) and (192) that, for $P_X$-almost all $x$,
$$E\,\psi^{**}(x, Y_i) = q_N^{-1}E\,\psi^{**}(x, X_i)\mathbb I_{A_i} = q_N^{-1}E\,\psi^{**}(x, X_i)(\mathbb I_{A_i} - 1) =: q_N^{-1}a_1(x),$$
$$E\,\psi^{**}(x, Y_i)g(Y_i) = q_N^{-1}E\,\psi^{**}(x, X_i)g(X_i)\mathbb I_{A_i} = q_N^{-1}E\,\psi^{**}(x, X_i)g(X_i)(\mathbb I_{A_i} - 1) =: q_N^{-1}a_2(x).$$
Using (224) and the inequality $q_N^{-1} \le c_*$, see (39), we obtain from (225)
$$\|b_{z^*}(\cdot)\| \le c_*\|a_1(\cdot)\| + \frac{c_*}{\sqrt M}\|a_2(\cdot)\| + \frac{c_*}{M}\|a_3(\cdot)\|,$$
where we denote $a_3(\cdot) = E\,\psi^{**}(\cdot, Y_i)g^2(Y_i)$. In order to prove (221) we show that

(226) $\|a_1(\cdot)\| \le c_* N^{-\alpha(r-2)}$, $\quad \|a_2(\cdot)\| \le c_* N^{-\alpha(r-2)}$, $\quad \|a_3(\cdot)\| \le c_*$.

Let us prove (226). Invoking (38) we obtain, by Hölder's inequality,

(227) $|a_1(x)| \le E|\psi^{**}(x, X_i)|\,\|Z_i'\|_r^{r-2}\,N^{-\alpha(r-2)} \le w^{1/r}(x)\,K^{(r-1)/r}\,N^{-\alpha(r-2)}$,

where we denote $w(x) = E|\psi^{**}(x, X_i)|^r$. Furthermore, by Lyapunov's inequality,

(228) $\|w^{1/r}(\cdot)\|^2 = E\,w^{2/r}(X_1) \le \bigl(E\,w(X_1)\bigr)^{2/r} = K_r^{2/r}$.

Clearly, the first bound of (226) follows from (227), (228) and (195). A similar argument shows the second bound of (226). We have

(229) $|a_2(x)| \le E|\psi^{**}(x, X_i)g(X_i)|\,\|Z_i'\|_r^{r-2}\,N^{-\alpha(r-2)} \le w^{1/r}(x)\,V^{(r-1)/r}\,N^{-\alpha(r-2)}$,

where we denote $V = E\bigl(\|Z_i'\|_r^{r-2}|g(X_i)|\bigr)^{r/(r-1)}$. By Hölder's inequality,

(230) $V \le \bigl(E|g(X_i)|^r\bigr)^{1/(r-1)}\bigl(E\|Z_i'\|_r^r\bigr)^{(r-2)/(r-1)} \le c_*$.

Clearly, (228), (229) and (230) imply the second bound of (226). The last bound of (226) follows from (42), by the Cauchy–Schwarz inequality.
Indeed, we have
$$|a_3(x)| \le c_*\,E|\psi^{**}(x, X_i)|\,g^2(X_i) \le c_*\bigl(E|\psi^{**}(x, X_i)|^2\,E\,g^4(X_i)\bigr)^{1/2}.$$
Therefore, $\|a_3(\cdot)\|^2 \le c_*\,K_2\,E g^4(X_i) \le c_*$, by (195).

Let us prove (222). We have, by (246),
$$E^*\eta = p_1^{-1}(z^*)\,E\,(z^* - \zeta)\,p_0(z^* - \zeta).$$
In order to prove (222) it suffices to show, in view of (223), that

(231) $\bigl|E(z^* - \zeta)p_0(z^* - \zeta)\bigr| \le c_* R^{-1/2} + c_* M^{-1/2}$.

Let $\tilde p$ denote the density function of $\xi_k$. Then $p_0(u) = R\,\tilde p(Ru)$. We have
$$E(z^* - \zeta)p_0(z^* - \zeta) = 6\,c_\xi\,E\,\frac{\sin^4\bigl(R(z^* - \zeta)/4\bigr)}{\bigl(R(z^* - \zeta)/4\bigr)^3}.$$
Therefore, denoting $H(z^*) = 1 + |R(z^* - \zeta)|$, we obtain

(232) $E\bigl|(z^* - \zeta)p_0(z^* - \zeta)\bigr| \le c\,E\,H^{-3}(z^*)$.

On the event $|\zeta - z^*| \ge R^{-1/2}$ we have $H^{-3}(z^*) \le R^{-3/2}$. Furthermore, a bound for the probability of the complementary event,
$$P\{|\zeta - z^*| \le R^{-1/2}\} \le c_* R^{-1/2} + c_* M^{-1/2},$$
follows by the Berry–Esseen bound applied to the sum $\zeta$. Therefore, $E H^{-3}(z^*)$ is bounded by the right hand side of (231). Now (231) follows from (232).

Proof of (217). Write
$$U_k^{**} - \frac{\sqrt N\,\xi_k}{\sqrt n}\,b^* = \frac{\sqrt M}{\sqrt N}(T_1 - T_2), \qquad T_1 := \frac{1}{\sqrt M}\sum_{j \in O_k}\psi^{**}(\cdot, Y_j), \quad T_2 := \eta\,b^*.$$
It follows from (214), by the inequality $\|u + v\|^2 \ge \|u\|^2/2 - \|v\|^2$, for $u, v \in L_2(P_X)$, that
$$E\|\hat\psi^*\|^2 = \frac{M}{N}E^*\|T_1 - T_2\|^2 \ge \frac{M}{2N}E^*\|T_1\|^2 - \frac{M}{N}E^*\|T_2\|^2.$$
We shall show that

(233) $E^*\|T_2\|^2 \le p_1^{-1}(z^*)\bigl(c_* R^{-1}M^{-1/2} + c_* R^{-3/2}\bigr)$,

(234) $E^*\|T_1\|^2 \ge p_1^{-1}(z^*)\bigl(p_1(z^*)\delta_2 - c_*\tau_N\bigr)$,

(235) $E^*\|T_1\|^2 \le p_1^{-1}(z^*)\bigl(p_1(z^*)\delta_2 + c_*\tau_N\bigr)$.

The inequalities (233) and (234) imply the lower bound in (217). Indeed, by (223), we have, for small $c^{(0)}_*$,
$$c_*\frac{M^{1/2}}{R} + \frac{c_*}{R^{1/2}} \le c_*\tau_N \le c_* c^{(0)}_*\delta_2 \le p_1(z^*)\delta_2/2.$$
Similarly, the inequalities (233) and (235) imply the upper bound in (217).

Proof of (233). We have, by (246),
$$E^*\eta^2 = p_1^{-1}(z^*)\,W, \qquad W := E(z^* - \zeta)^2 p_0(z^* - \zeta).$$
Proceeding as in the proof of (231), we obtain
$$W = \frac{36\,c_\xi}{R}\,E\,\frac{\sin^4\bigl(R(z^* - \zeta)/4\bigr)}{\bigl(R(z^* - \zeta)/4\bigr)^2} \le \frac{c}{R}\,E\,\tilde H^{-2}(z^*),$$
where $\tilde H(z^*) = 1 + |R(z^* - \zeta)|$ satisfies $E\tilde H^{-2}(z^*) \le c_* R^{-1/2} + c_* M^{-1/2}$. Therefore, $W \le c_* R^{-3/2} + c_* R^{-1}M^{-1/2}$. This inequality in combination with (203) implies (233).

Proof of (234). Fix $i, j \in O_k$, $i \ne j$. By symmetry,

(236) $E^*\|T_1\|^2 = E^*T_3 + (M - 1)\,E^*T_4$,

$$T_3 = \|\psi^{**}(\cdot, Y_i)\|^2, \qquad T_4 = \langle\psi^{**}(\cdot, Y_i), \psi^{**}(\cdot, Y_j)\rangle.$$
We have, by (247),
$$E^*T_3 = p_1^{-1}(z^*)\,H_3, \qquad E^*T_4 = p_1^{-1}(z^*)\,H_4,$$
$$H_3 = E\,T_3\,p_2\bigl(z^* - \tfrac{g(Y_i)}{\sqrt M}\bigr), \qquad H_4 = E\,T_4\,p_3\bigl(z^* - \tfrac{g(Y_i) + g(Y_j)}{\sqrt M}\bigr).$$
The inequality (234) follows from (236) and the bounds

(237) $H_3 \ge p_1(z^*)\,\delta_2 - c_* M^{-1/2}$,

(238) $|H_4| \le c_* N^{-\alpha(r-2)} + c_* M^{-(r-2)/2}$.

Let us prove (237). It follows from (224), by the mean value theorem, that

(239) $H_3 = p_2(z^*)\,E T_3 + Q$, $\qquad |Q| \le c_*\,E\,T_3\frac{|g(Y_i)|}{\sqrt M}$,

where $|Q| \le c_* M^{-1/2}$. Indeed, by (42) and the Cauchy–Schwarz inequality,
$$E\,T_3|g(Y_i)| \le c_*\,E\|\psi^{**}(\cdot, X_i)\|^2|g(X_i)| \le c_*\,K_4^{1/2}\sigma \le c_*.$$
In the last step we applied (195). Furthermore, the identity
$$E T_3 = q_N^{-1}E|\psi^{**}(X_1, X_i)|^2\mathbb I_{A_i} = E|\psi^{**}(X_1, X_i)|^2 - b_1 - b_2,$$
$$b_1 = (1 - q_N^{-1})E|\psi^{**}(X_1, X_i)|^2, \qquad b_2 = q_N^{-1}E|\psi^{**}(X_1, X_i)|^2(1 - \mathbb I_{A_i}),$$
combined with (38), (39) and (40) yields $E T_3 \ge \delta_2 - c_* M^{-1/2}$. This bound together with (239) shows (237).

Let us prove (238). Write $y_i = g(Y_i)$ and expand
$$p_3\bigl(z^* - \tfrac{y_i + y_j}{\sqrt M}\bigr) = p_3(z^*) - p_3'(z^*)\frac{y_i + y_j}{\sqrt M} + \frac{p_3''(z^*)}{2}\frac{(y_i + y_j)^2}{M} + \tilde Q.$$
From (224) it follows, for $2 < r - 2 \le 3$, the bound $|\tilde Q| \le c_*|y_i + y_j|^{r-2}/M^{(r-2)/2}$. Furthermore, denote
$$h_1 = E\,T_4, \quad h_2 = E\,T_4\,g(Y_i), \quad h_3 = E\,T_4\,g^2(Y_i), \quad h_4 = E\,T_4\,g(Y_i)g(Y_j).$$
We obtain, by symmetry,

(240) $H_4 = p_3(z^*)h_1 - \frac{2p_3'(z^*)}{\sqrt M}h_2 + \frac{p_3''(z^*)}{M}(h_3 + h_4) + E\,T_4\tilde Q$, $\qquad E|T_4\tilde Q| \le c_* M^{-(r-2)/2}\,E|g(Y_i)|^{r-2}|T_4|$.

Denote $\tilde T = q_N^{-2}\,\psi^{**}(X_1, X_i)\,\psi^{**}(X_1, X_j)$.
It follows from (40), by Hölder's inequality and (195), that
$$E|g(Y_i)|^{r-2}|T_4| \le c_*\,E|g(X_i)|^{r-2}|\tilde T| \le c_*.$$
Therefore,

(241) $E|T_4\tilde Q| \le c_* M^{-(r-2)/2}$.

Furthermore, (191) and (192) imply
$$h_1 = E\tilde T(\mathbb I_{A_i} - 1)(\mathbb I_{A_j} - 1), \qquad h_2 = E\tilde T g(X_i)(\mathbb I_{A_i} - 1)(\mathbb I_{A_j} - 1),$$
$$h_3 = E\tilde T g^2(X_i)\mathbb I_{A_i}(\mathbb I_{A_j} - 1), \qquad h_4 = E\tilde T g(X_i)g(X_j)(\mathbb I_{A_i} - 1)(\mathbb I_{A_j} - 1).$$
Invoking the inequalities $q_N^{-1} \le c_*$, see (39), and $1 - \mathbb I_{A_i} \le V_i^s$, $s > 0$, where $V_i := \|Z_i'\|_r/N^\alpha$, see (38), we obtain, by Hölder's inequality,

(242)
$$|h_1| \le E|\tilde T|(V_iV_j)^{(r-2)/2} \le c_* N^{-\alpha(r-2)}, \qquad |h_2| \le E|\tilde T g(X_i)|\,V_i^{(r-2)/2}V_j^{(r-2)/2} \le c_* N^{-\alpha(r-2)},$$
$$|h_3| \le E|\tilde T|\,g^2(X_i)\,V_j^{r-2} \le c_* N^{-\alpha(r-2)}, \qquad |h_4| \le E|\tilde T g(X_i)g(X_j)|(V_iV_j)^{(r-2)/2} \le c_* N^{-\alpha(r-2)}.$$
Combining (240), (242), (241) and using the simple inequalities
$$\frac{1}{N^{\alpha(r-2)}M^{1/2}} \le \frac{1}{\tilde N}, \qquad \frac{1}{N^{\alpha(r-2)}M} \le \frac{1}{\tilde N}, \qquad \tilde N = \min\{N^{\alpha(r-2)}, M^{(r-2)/2}\},$$
and the inequalities (224), we obtain (238).

Proof of (235). The inequality follows from (236), (238) and the inequality
$$H_3 \le p_1(z^*)E T_3 + c_* M^{-1/2} \le p_1(z^*)\delta_2 + c_* N^{-\alpha r} + c_* M^{-1/2} \le p_1(z^*)\delta_2 + c_* M^{-1/2},$$
which is obtained in the same way as (237) above.

Proof of (213). In order to prove (213) we shall show that

(243) $E\|\psi_k\|_r^r \le \tilde c_*\,n^{-r/2}$.

Split $O_k = B \cup D$, where $B \cap D = \emptyset$ and $|B| = [M/2]$, and write
$$U_k = \frac{\sqrt M}{\sqrt N}(U_B + U_D), \qquad U_B = \frac{1}{\sqrt M}\sum_{j \in B}\psi(\cdot, Y_j), \qquad \zeta = \zeta_B + \zeta_D, \qquad \zeta_B = \frac{1}{\sqrt M}\sum_{j \in B}g(Y_j).$$
In particular, we have $g^* = \eta + \zeta_B + \zeta_D$. The inequality
$$E\|\psi_k\|_r^r = E^*\|U_k\|_r^r \le 2^r\Bigl(\frac{M}{N}\Bigr)^{r/2}\bigl(E^*\|U_B\|_r^r + E^*\|U_D\|_r^r\bigr)$$
combined with the bounds

(244) $E^*\|U_B\|_r^r \le c_*$, $\quad E^*\|U_D\|_r^r \le c_*$

implies (243). Let us prove the first bound of (244). By (247), we have
$$E^*\|U_B\|_r^r = p_1^{-1}(z^*)\,E\,\|U_B\|_r^r\,\bar p(z^* - \zeta_B),$$
where $\bar p$ denotes the density of $\eta + \zeta_D$.
Furthermore, invoking the bound $\sup_{x \in \mathbb R}|\bar p(x)| \le c_*$ (which is obtained using the same argument as in the proof of Lemma 7) and the inequality (223), we obtain $E^*\|U_B\|_r^r \le \tilde c_*\,E\|U_B\|_r^r$. Finally, invoking the bound

(245) $E\|U_B\|_r^r \le c_*\,E\bigl\|\frac{1}{\sqrt M}\sum_{j \in B}\psi(\cdot, X_j)\bigr\|_r^r \le c_*$,

see (42) and (209), we obtain the first bound of (244). The second bound is obtained in the same way. This completes the proof of the lemma. □

We collect some facts about conditional moments in a separate lemma.

Lemma 11. Let $\eta$ and $\zeta$ be independent random variables. Assume that $\eta$ is real valued and has a density, say $x \to p(x)$.
(i) Assume that $\zeta$ is real valued. Then the function $x \to E\,p(x - \zeta)$, $x \in \mathbb R$, is a density of the distribution $P_{\eta+\zeta}$ of $\eta + \zeta$. Let $w : \mathbb R \to \mathbb R$ be a measurable function such that $E|w(\eta)| < \infty$. For $P_{\eta+\zeta}$-almost all $x \in \mathbb R$, we have

(246) $E\bigl(w(\eta)\mid \eta + \zeta = x\bigr) = \dfrac{E\,w(x - \zeta)\,p(x - \zeta)}{E\,p(x - \zeta)}$.

(ii) Assume that $\zeta$ takes values in a measurable space, say $\mathcal Y$. Assume that $u, v : \mathcal Y \to \mathbb R$ are measurable functions and denote $P_{\eta+u(\zeta)}$ the distribution of $\eta + u(\zeta)$. If $E|v(\zeta)| < \infty$, then for $P_{\eta+u(\zeta)}$-almost all $x \in \mathbb R$,

(247) $E\bigl(v(\zeta)\mid \eta + u(\zeta) = x\bigr) = \dfrac{E\,v(\zeta)\,p\bigl(x - u(\zeta)\bigr)}{E\,p\bigl(x - u(\zeta)\bigr)}$.

Appendix 4

In the next lemma we consider independent and identically distributed random vectors $(\xi, \eta)$ and $(\xi', \eta')$ with values in $\mathbb R^2$ and the symmetrization $(\xi^s, \eta^s)$, where $\xi^s = \xi - \xi'$ and $\eta^s = \eta - \eta'$. Note that in the main text we apply this lemma to $\xi = g(X_1)$ and $\eta = N^{-1/2}\sum_{j=m+1}^N\psi(X_1, Y_j)$.

Lemma 12. Let $0 < \nu < 1/2$ and $r > 2$. Assume that $E|\xi|^r + E|\eta|^r < \infty$. The following statements hold.
a) For $c_r = (7/6)^{-r}$ the conditions
$$|t|^{r-2}E|\xi^s|^r \le c_r\,E(\xi^s)^2, \qquad E\,\xi^s\eta^s = 0, \qquad E|\eta^s|^r \le c_r\,E(\eta^s)^2$$
imply
$$1 - |E\exp\{i(t\xi + \eta)\}| \ge \tfrac14\bigl(t^2E(\xi^s)^2 + E(\eta^s)^2\bigr).$$
b) Assume that for some $\tilde c_1, \tilde c_2 > 0$ we have

(248) $E(\xi^s)^2/2 - N^{-1}E(\eta^s)^2 > \tilde c_1$ and $c_r\,E(\xi^s)^2/E|\xi^s|^r \ge \tilde c_2^{\,r-2}$.

Let $\varepsilon > 0$ be such that

(249) $\varepsilon^{1/2} < 1/(4\tilde c_3)$, $\quad \varepsilon^{(r-2)/4} < \sigma_z^2/(4A)$, $\quad \varepsilon^{(r-2)/2} < \sigma_z^2/(4B)$,

where $\tilde c_3 = 2 + (5/\tilde c_2)^2\sigma_z^2$ and where the numbers $A, B$ are defined in (260). Here $\sigma_z^2 = E(\xi^s + N^{-1/2}\eta^s)^2$. Assume that for some $0 < \delta_1 < \tilde c_2$ and $\delta' > 4\varepsilon$,

(250) $\sup\bigl\{|E\exp\{it\xi^s\}| : \delta_1 < |t| \le N^{1/2-\nu}\bigr\} \le 1 - \delta'$.

Then the set $I_* := \{t : |t| \le N^{1/2-\nu},\ \tau^*_t \le \varepsilon\}$ is an interval of length at most $10\,\tilde c_2^{-1}\sqrt\varepsilon$.

Proof of b). Introduce the function $t \to \tau^*_t = 1 - |E e^{it(\xi + N^{-1/2}\eta)}|$. Assume that the set $I_*$ is non-empty and choose $s, t \in I_*$, i.e., we have $\tau^*_t, \tau^*_s \le \varepsilon$. Firstly we show that $|s - t| \le 5\,\tilde c_2^{-1}\sqrt\varepsilon$, thus proving the bound for the size of the set $I_*$. The inequality $1 - \cos(x + y) \ge \frac12(1 - \cos x) - (1 - \cos y)$ implies

(251) $1 - |E e^{i(X+Y)}| \ge \frac12(1 - |E e^{iX}|) - (1 - |E e^{iY}|)$,

for arbitrary random variables $X, Y$. Choosing $\tilde Y = t(\xi + N^{-1/2}\eta)$ and $\tilde X = (s - t)(\xi + N^{-1/2}\eta)$ shows

(252) $\tau^*_s \ge \frac12(1 - |E e^{i\tilde X}|) - \tau^*_t$.

Now we show that the inequality $|t - s| > 5\,\tilde c_2^{-1}\sqrt\varepsilon$ implies $1 - |E e^{i\tilde X}| > 4\varepsilon$, thus contradicting our choice $\tau^*_s, \tau^*_t \le \varepsilon$ and (252). In what follows the cases of "large" and "small" values of $|t - s|$ are treated separately.

For $5\tilde c_2^{-1}\sqrt\varepsilon < |t - s| \le \delta_1$ we shall apply (251) to $\tilde X = X + Y$, where $X = (s - t)\xi$ and $Y = (s - t)N^{-1/2}\eta$. Note that statement a) implies

(253) $1 - |E e^{iX}| \ge (t - s)^2E(\xi^s)^2/4$.

Indeed, in view of the second inequality of (248), the conditions of a) are satisfied for $|t - s| \le \delta_1 \le \tilde c_2$. Furthermore, we have

(254) $0 \le 1 - |E e^{iY}| \le 1 - E\cos\bigl(N^{-1/2}(s - t)\eta^s\bigr) \le \frac{(s - t)^2}{2N}E(\eta^s)^2$.

Invoking the bounds (253) and (254) in (251) we obtain
$$1 - |E e^{i\tilde X}| \ge \frac{(s - t)^2}{8}E(\xi^s)^2 - \frac{(s - t)^2}{2N}E(\eta^s)^2 \ge \tilde c_1(s - t)^2 \ge 4\varepsilon.$$
In the last step we used (248). For $\delta_1 < |t - s| \le N^{1/2-\nu}$ we expand in powers of $a = i(s - t)N^{-1/2}\eta^s$ to get
$$1 - |E e^{i\tilde X}|^2 = 1 - E\exp\{i(s - t)\xi^s + a\} \ge 1 - E\exp\{i(s - t)\xi^s\} - E|a| \ge \delta' - |t - s|N^{-1/2}E|\eta^s| \ge \delta' - N^{-\nu}E|\eta^s| \ge \delta'/2 \ge 2\varepsilon.$$
In the last step we applied (250).

Let us prove that $I_*$ is indeed an interval. Assume the contrary, i.e., there exist $s < u < t$ such that $s, t \in I_*$ and $u \notin I_*$. In particular, $\tau^*_t \le \varepsilon < \tau^*_u$. Clearly, we can choose $u$ to be a local maximum (stationary) point of the function $t \to \tau^*_t$. Denote $z = \xi^s + N^{-1/2}\eta^s$, $\sigma_z^2 = E z^2$. An application of (251) to $Y' = (t - u)(\xi + N^{-1/2}\eta)$ and $X' = u(\xi + N^{-1/2}\eta)$ gives
$$\tau^*_t \ge \tau^*_u/2 - \bigl(1 - E e^{i(t-u)z}\bigr) = \tau^*_u/2 - \bigl(1 - E\cos(t - u)z\bigr).$$
Invoking the inequalities $\tau^*_t \le \varepsilon$ and $1 - \cos(t - u)z \le (t - u)^2z^2/2$ we obtain

(255) $\tau^*_u \le 2\varepsilon + (t - u)^2\sigma_z^2 \le \varepsilon\,\tilde c_3$, $\qquad \tilde c_3 = 2 + (5/\tilde c_2)^2\sigma_z^2$.

Here we used the bound $|t - u| \le |t - s| \le 5\tilde c_2^{-1}\sqrt\varepsilon$ proved above. Denoting $y = (t - u)z$ we have $\tau^*_t = 1 - E e^{iuz}e^{iy}$. Invoking the expansion $e^{iy} = 1 + iy + (iy)^2/2 + R'$, where $|R'| \le |y|^3/6 \wedge |y|^r$, we obtain

(256) $\tau^*_t = \tau^*_u - i\,E y e^{iuz} + 2^{-1}E y^2 e^{iuz} + R$, $\qquad |R| \le E|y|^3/6 \wedge E|y|^r =: R_1$.

For a stationary point $u$ we have $0 = \frac{\partial}{\partial t}\tau^*_t\big|_{t=u} = -i\,E z e^{iuz}$. Therefore, $E y e^{iuz} = 0$ and (256) implies
$$\tau^*_t \ge \tau^*_u + 2^{-1}(t - u)^2 E z^2 e^{iuz} - R_1.$$
Write the right hand side in the form $\tau^*_u + 2^{-1}(t - u)^2 R_2$, where
$$R_2 = E z^2 e^{iuz} - 2R_1(t - u)^{-2}, \qquad 2R_1(t - u)^{-2} \le 2E|z|^r|t - u|^{r-2}.$$
Note that the inequality $R_2 > 0$ contradicts $\tau^*_t \le \varepsilon < \tau^*_u$. We complete the proof by showing that $R_2 > 0$. Since $z$ is symmetric we have $E z^2\sin uz = 0$. Therefore,

(257) $E z^2 e^{iuz} = E z^2\cos uz = \sigma_z^2 - E z^2(1 - \cos uz)$.

Given $\lambda > 0$,

(258) $E z^2(1 - \cos uz) = E z^2(1 - \cos uz)\bigl(\mathbb I_{\{z^2 < \lambda\}} + \mathbb I_{\{z^2 \ge \lambda\}}\bigr) \le \lambda\,E(1 - \cos uz) + 2E|z|^r\lambda^{-(r-2)/2}$.

In the last step we used Chebyshev's inequality.
Furthermore, invoking the inequality $E(1 - \cos uz) = \tau^*_u \le \tilde c_3\varepsilon$, see (255), we obtain from (257) and (258), for $\lambda = \varepsilon^{-1/2}\sigma_z^2$,

(259) $E z^2 e^{iuz} \ge \sigma_z^2 - \varepsilon^{1/2}\tilde c_3\sigma_z^2 - \varepsilon^{(r-2)/4}\,2E|z|^r\sigma_z^{2-r}$.

Finally, invoking the inequality $|t - u| \le |t - s| \le 5\tilde c_2^{-1}\sqrt\varepsilon$ we obtain from (259)
$$R_2 \ge \sigma_z^2\bigl(1 - \varepsilon^{1/2}\tilde c_3\bigr) - \varepsilon^{(r-2)/4}A - \varepsilon^{(r-2)/2}B,$$
where for the random variable $z = \xi^s + N^{-1/2}\eta^s$ we write

(260) $A = 2E|z|^r\sigma_z^{2-r}$ and $B = 2E|z|^r(5/\tilde c_2)^{r-2}$.

Thus, for $\varepsilon$ satisfying (249) we have $R_2 > 0$. □

Appendix 5

Let $Z_1, \ldots, Z_N$ be independent copies of the $L_r$-valued random element $Z = \{x \to \psi(x, Y_1)\}$. Recall that almost surely $\|Z\| \le N^\alpha$. Here $\|\cdot\|$ denotes the norm of the Banach space $L_r$, where $r > 2$ and $1/2 > \alpha > 0$. Write $M_p = E|\psi(X_1, X_2)|^p$.

Lemma 13. (i) Assume that $\|E Z\|^2 \le E\|Z\|^2/N$. Then there exists a constant $c(r) > 0$ such that for $k \le N$ and $x > c(r)$ we have

(261) $P\{\|Z_1 + \cdots + Z_k\| > k^{1/2}u\,x\} \le \exp\bigl\{-32^{-1}x^2\bigl(1 + xN^\alpha/(k^{1/2}u)\bigr)^{-1}\bigr\}$.

Here $u^2 = E\|Z\|^2$.
(ii) The following inequalities hold:

(262) $\|E Z\| \le M_r/\bigl(q_N N^{\alpha(r-1)}\bigr)$,

(263) $q_N^{-1}\bigl(M_2 - M_rN^{-(r-2)\alpha}\bigr) \le E\|Z\|^2 \le q_N^{-1}\bigl(M_r^{2/r} + M_rN^{-(r-2)\alpha}\bigr)$.

Remark. Assume that $M_2 \ge 2M_rN^{-(r-2)\alpha}$ and $M_r^2 \le (q_N/2)M_2N^\kappa$, where $\kappa = 2(r-1)\alpha - 1$. Then (262) and (263) imply the inequality $\|E Z\|^2 \le E\|Z\|^2/N$. Note that $r\alpha > \kappa$ and $q_N > 1 - M_rN^{-r\alpha}$.

Proof. We derive (i) from Yurinskii's (1976) inequality. Denote $\zeta_k = Z_1 + \cdots + Z_k$. Using the type 2 property of $L_r$ we have, for the $L_r$-valued random variable $\zeta_k - E\zeta_k$,
$$E\|\zeta_k - E\zeta_k\|^2 \le k\,\tilde c(r)\,E\|Z - EZ\|^2,$$
and the inequality $E\|Z - EZ\|^2 \le 2E\|Z\|^2 + 2\|EZ\|^2$ yields
$$E\|\zeta_k - E\zeta_k\| \le \bigl(E\|\zeta_k - E\zeta_k\|^2\bigr)^{1/2} \le k^{1/2}c'(r)\bigl(u + \|E Z\|\bigr).$$
We have
$$E\|\zeta_k\| \le E\|\zeta_k - E\zeta_k\| + k\|E Z\| \le c'(r)k^{1/2}u + k\bigl(1 + c'(r)k^{-1/2}\bigr)\|E Z\| =: \beta_k.$$
It follows from the inequality $\|Z\| \le N^\alpha$ that
$$E\|Z\|^L \le 2^{-1}L!\,u^2N^{\alpha(L-2)}, \qquad L = 2, 3, \ldots$$
Write $B_k^2 = ku^2$.
Theorem 2.1 of Yurinskii (1976) shows

(264) $P\{\|\zeta_k\| \ge xB_k\} \le \exp\{-B\}$, $\qquad B = 8^{-1}x_1^2\bigl(1 + x_1N^\alpha/B_k\bigr)^{-1}$,

provided that $x_1 = x - \beta_k/B_k > 0$. Since $\beta_k/B_k \le c'(r)(1 + k^{-1/2})$ we have, for $x > c(r) := 4c'(r) + 2$, $x > \beta_k/B_k$ and $x > x_1 > x/2$. The latter inequality implies $B \ge B' := 8^{-1}(x/2)^2\bigl(1 + xN^\alpha/B_k\bigr)^{-1}$. Finally, replacing $B$ by $B'$ in (264) we obtain (261).

Let us prove (ii). The mean value $E Z = \{x \to E\psi(x, Y_1)\}$ is an element of $L_r$. For $P_X$-almost all $x \in \mathcal X$ we have $E\psi(x, X_2) = 0$. Therefore,
$$E Z = q_N^{-1}E\,\psi(x, X_2)\mathbb I_A = q_N^{-1}E\,\psi(x, X_2)(\mathbb I_A - 1).$$
Invoking (38) and using the Chebyshev and Hölder inequalities, we obtain, for $P_X$-almost all $x$,
$$|E Z(x)| \le \frac{1}{q_NN^{\alpha(r-1)}}E\|Z'\|_r^{r-1}|\psi(x, X_2)| \le \frac{1}{q_NN^{\alpha(r-1)}}\bigl(E\|Z'\|_r^r\bigr)^{(r-1)/r}a(x),$$
where $a(x) = (E|\psi(x, X_2)|^r)^{1/r}$. Note that $E\|Z'\|_r^r = M_r$ and $\|a\|_r^r = M_r$. Finally,
$$\|E Z\| \le \|a\|\,M_r^{(r-1)/r}/\bigl(q_NN^{\alpha(r-1)}\bigr) = M_r/\bigl(q_NN^{\alpha(r-1)}\bigr).$$
Let us prove (263). Denote $b_p(x) = (E_X|\psi(X_1, x)|^p)^{1/p}$. Here $E_X$ denotes the conditional expectation given all the random variables but $X_1$. We have

(265) $E\|Z\|^2 = q_N^{-1}E\,\mathbb I_A\,b_r^2(X_2) = q_N^{-1}E\,b_r^2(X_2) + q_N^{-1}R$, $\qquad R = E(\mathbb I_A - 1)b_r^2(X_2)$.

By Hölder's inequality $b_r(x) \ge b_2(x)$, for $P_X$-almost all $x$. Therefore,

(266) $M_2 = E\,b_2^2(X_2) \le E\,b_r^2(X_2) \le M_r^{2/r}$.

Combining (266) and (265) and the bound $|R| \le M_rN^{-(r-2)\alpha}$ we obtain (263). In order to bound $|R|$ we use (38), $|R| \le N^{-(r-2)\alpha}E\|Z'\|_r^{r-2}b_r^2(X_2)$, and apply Hölder's inequality,
$$E\|Z'\|_r^{r-2}b_r^2(X_2) \le \bigl(E\|Z'\|_r^r\bigr)^{(r-2)/r}\bigl(E\,b_r^r(X_2)\bigr)^{2/r} = M_r.$$

References

[1] Babu, Gutti Jogesh and Bai, Z. D.: Edgeworth expansions of a function of sample means under minimal moment conditions and partial Cramér's condition. Sankhyā Ser. A 55 (1993), 244–258.
[2] Bai, Z. D. and Rao, C. Radhakrishna: Edgeworth expansion of a function of sample means. Ann. Statist. 19 (1991), 1295–1315.
[3] Bentkus, V., Götze, F. and van Zwet, W. R.: An Edgeworth expansion for symmetric statistics. Ann. Statist. 25 (1997), 851–896.
[4] Bhattacharya, R.N.
and Rao, Ranga R.: Normal approximation and asymptotic expansions. Robert E. Krieger Publishing Company, 1986.
[5] Bhattacharya, Rabi N. and Ghosh, J. K.: On the validity of the formal Edgeworth expansion. Ann. Statist. 6 (1978), 434–451.
[6] Bhattacharya, Rabi N. and Ghosh, J. K.: Correction to: On the validity of the formal Edgeworth expansion. Ann. Statist. 8 (1980), 1399.
[7] Bickel, P. J.: Edgeworth expansions in nonparametric statistics. Ann. Statist. 2 (1974), 1–20.
[8] Bickel, P. J., Götze, F. and van Zwet, W. R.: The Edgeworth expansion for U-statistics of degree two. Ann. Statist. 14 (1986), 1463–1484.
[9] Bickel, P. J. and Robinson, J.: Edgeworth expansions and smoothness. Ann. Probab. 10 (1982), 500–503.
[10] Bollobás, Béla: Combinatorics. Set systems, hypergraphs, families of vectors and combinatorial probability. Cambridge Univ. Press, 1986.
[11] Callaert, H., Janssen, P. and Veraverbeke, N.: An Edgeworth expansion for U-statistics. Ann. Statist. 8 (1980), 299–312.
[12] Teor. Veroyatn. Primen. 25 (1980), 745–757.
[13] Dharmadhikari, S. W., Fabian, V. and Jogdeo, K.: Bounds on the moments of martingales. Ann. Math. Statist. 39 (1968), 1719–1723.
[14] Efron, B. and Stein, C.: The jackknife estimate of variance. Ann. Statist. 9 (1981), 586–596.
[15] Esseen, C.-G.: Fourier analysis of distribution functions. A mathematical study of the Laplace-Gaussian law. Acta Math. 77 (1945), 1–125.
[16] Götze, F.: Asymptotic expansions for bivariate von Mises functionals. Z. Wahrsch. Verw. Gebiete 50 (1979), 333–355.
[17] Götze, F. and van Zwet, W. R.: Edgeworth expansions for asymptotically linear statistics. Manuscript, 1992, 1–45.
[18] Hall, P.: Edgeworth expansion for Student's t statistic under minimal moment conditions. Ann. Probab. 15 (1987), 920–931.
[19] Helmers, R.: Edgeworth expansions for linear combinations of order statistics. Mathematical Centre Tracts 105. Amsterdam, CWI, 1982.
[20] Hoeffding, W.: A class of statistics with asymptotically normal distribution. Ann. Math. Statist. 19 (1948), 293–325.
[21] Ledoux, M.
and Talagrand, M.: Probability in Banach spaces. Isoperimetry and processes. Springer-Verlag, Berlin, Heidelberg, 1991.
[22] Petrov, V. V.: Sums of independent random variables. Springer-Verlag, New York-Heidelberg, 1975.
[23] Pfanzagl, J.: Asymptotic expansions for general statistical models. With the assistance of W. Wefelmeyer. Lecture Notes in Statistics 31. Springer-Verlag, Berlin, 1985.
[24] Serfling, R. J.: Approximation theorems of mathematical statistics. Wiley, 1980.
[25] Yurinskii, V. V.: Exponential inequalities for sums of random vectors. J. Multivar. Analysis 6 (1976), 473–499.
[26] van Zwet, W. R.: A Berry–Esseen bound for symmetric statistics. Z. Wahrsch. Verw. Gebiete 66 (1984), 425–440.

Faculty of Mathematics, University of Bielefeld, Germany
Email address: [email protected]

Department of Mathematics, Vilnius University, Lithuania
Email address: