Testing multivariate normality by zeros of the harmonic oscillator in characteristic function spaces
TT ESTING MULTIVARIATE NORMALITY BY ZEROS OF THEHARMONIC OSCILLATOR IN CHARACTERISTIC FUNCTIONSPACES
A P
REPRINT
Philip Dörr
Institute of Stochastics,Karlsruhe Institute of Technology (KIT),Englerstr. 2, D-76133 Karlsruhe.
Bruno Ebner
Institute of Stochastics,Karlsruhe Institute of Technology (KIT),Englerstr. 2, D-76133 Karlsruhe.
Norbert Henze
Institute of Stochastics,Karlsruhe Institute of Technology (KIT),Englerstr. 2, D-76133 Karlsruhe.
September 30, 2019 A BSTRACT
We study a novel class of affine invariant and consistent tests for normality in any dimension. The testsare based on a characterization of the standard d -variate normal distribution as the unique solutionof an initial value problem of a partial differential equation motivated by the harmonic oscillator,which is a special case of a Schrödinger operator. We derive the asymptotic distribution of the teststatistics under the hypothesis of normality as well as under fixed and contiguous alternatives. Thetests are consistent against general alternatives, exhibit strong power performance for finite samples,and they are applied to a classical data set due to R.A. Fisher. The results can also be used for aneighborhood-of-model validation procedure. The multivariate normal distribution plays a key role in classical and hence widely used procedures. Serious statisticalinference that involves the assumption of multivariate normality should therefore start with a test of fit to this model.There is a continuing interest in this testing problem, as evidenced by a multitude of papers. The proposed tests maybe roughly classified as follows: [4, 7, 36, 37, 47, 55] consider tests based on the empirical characteristic function,while [31, 32, 35] employ the empirical moment generating function. A classical (and still popular) approach is toconsider measures of multivariate skewness and kurtosis (see, e.g., [16, 38, 41, 42, 43, 45]), as supposedly diagnostictools with regard to the kind of deviation from normality when this hypothesis has been rejected, but the deficiencies ofthose measures in this regard have been clearly demonstrated (see, e.g., [8, 9, 26, 27, 29]). Other approaches involvegeneralizations of tests for univariate normality [2, 39, 53], the examination of nonlinearity of dependence [13, 19],canonical correlations [57] and the notion of energy, see [54]. For a survey of affine invariant tests for multivariatenormality, see [30]. Monte Carlo studies can be found in [20, 44, 58].To be specific, let
X, X , . . . , X n , . . . be a sequence of independent identically distributed (i.i.d.) d -dimensional random(column) vectors, which are defined on some common probability space (Ω , A , P ) . Here, d ≥ is a fixed integer, which MSC 2010 subject classifications.
Primary 62E10 Secondary 60E10, 62G10
Key words and phrases test for multivariate normality; affine invariance; consistency; empirical characteristic function; harmonicoscillator; neighborhood-of-model validation a r X i v : . [ s t a t . M E ] S e p esting multivariate normality by zeros of the harmonic oscillator in characteristic function spacesmeans that the univariate case is deliberately not excluded. We write P X for the distribution of X . The d -variate normaldistribution with expectation µ and nonsingular covariance matrix Σ will be denoted by N d ( µ, Σ) . Furthermore, N d = { N d ( µ, Σ) : µ ∈ R d , Σ ∈ R d × d positive definite } stands for the family of nondegenerate d -variate normal distributions. A check of the assumption of multivariatenormality means to test the hypothesis H : P X ∈ N d , (1)against general alternatives.Writing I d for the unit matrix of order d , our novel idea for testing H is to use a characterization of the Fouriertransform of N d (0 , I d ) as the unique solution of an initial value problem of a partial differential equation motivated bythe harmonic oscillator, which is a special case of a Schrödinger operator. The proposed test statistic is based on thesquared norm of a functional of the empirical characteristic function in a suitably weighted L -space. This statistic isclose to zero under the hypothesis (1), and rejection will be for large values of the test statistic.Let L ( R d ) be the space of square integrable functions, equipped with the usual norm and scalar product (cid:104)· , ·(cid:105) . Considerfor sufficiently regular f ∈ L ( R d ) the partial differential equation (cid:26) ∆ f ( x ) = ( (cid:107) x (cid:107) − d ) f ( x ) , x ∈ R d ,f (0) = 1 . (2)Here, ∆ stands for the Laplace operator, and (cid:107) · (cid:107) denotes the Euclidean norm. Notice that we can rewrite (2) as ( − ∆ + (cid:107) x (cid:107) − d ) f ( x ) = 0 or, equivalently, as d (cid:88) j =1 (cid:32) − ∂ ∂x j + x j − (cid:33) f ( x ) = 0 , x = ( x , . . . , x d ) ∈ R d . (3)The operator − ∆ + (cid:107) x (cid:107) − d is known as the harmonic oscillator, see [24]. In the univariate case, (2) reduces to afixed point problem (or, equivalently, to the problem of finding the eigenfunction that corresponds to the eigenvalue 1)of the Hermite operator, see equation (1.1.9) in [56]. The solution of this problem is the th Hermite function, whichcoincides with the solution given in the following theorem.
Theorem 1.1.
The characteristic function ψ ( t ) = exp (cid:18) − (cid:107) t (cid:107) (cid:19) , t ∈ R d , (4) of the d -variate standard normal distribution N d (0 , I d ) is the unique solution of (2).Proof. Let f be an arbitrary solution of (2). Writing i for the imaginary unit, we introduce the creation and annihilationoperators a j = x j + i p j and a (cid:63)j = x j − i p j , j = 1 , . . . , d , where p j = − i ∂∂x j , j = 1 , . . . , d . For each j ∈ { , . . . , d } we have a (cid:63)j a j = (cid:18) x j − ∂∂x j (cid:19) (cid:18) x j + ∂∂x j (cid:19) = x j + x j ∂∂x j − ∂∂x j x j − ∂ ∂x j = − ∂ ∂x j + x j − . So we can rewrite (3) as (cid:80) dj =1 a (cid:63)j a j f = 0 , which implies (cid:28) f, d (cid:88) j =1 a (cid:63)j a j f (cid:29) = d (cid:88) j =1 (cid:104) f, a (cid:63)j a j f (cid:105) = d (cid:88) j =1 (cid:104) a j f, a j f (cid:105) = d (cid:88) j =1 (cid:107) a j f (cid:107) = 0 and thus a j f = 0 for each j ∈ { , . . . , d } . We therefore have ( x + ∇ ) f = 0 , where ∇ denotes the gradient operator.By the last statement and the product rule, it follows that ∇ (cid:18) exp (cid:18) (cid:107) x (cid:107) (cid:19) f (cid:19) = x exp (cid:18) (cid:107) x (cid:107) (cid:19) f − x exp (cid:18) (cid:107) x (cid:107) (cid:19) f = 0 which, in view of the condition f (0) = 1 , completes the proof.2esting multivariate normality by zeros of the harmonic oscillator in characteristic function spaces Remark 1.2.
The operator H = − ∆ + (cid:107) x (cid:107) is the Hermite operator in R d , and ψ is the product of the one-dimensional th Hermite functions. Therefore, since Hψ = dψ (as we have shown in Theorem 1.1), ψ is the eigenfunctionassociated with the eigenvalue d . For details on the Hermite operator in R d and corresponding eigenfunctions, see p. 5of [56].In this paper, we study a family of affine invariant test statistics for H that is based on the characterization givenin Theorem 1.1. Since the class N d is closed under full rank affine transformations, Theorem 1.1 does not restrictthe scope of the testing problem. We make the tacit standing assumption that P X is absolutely continuous withrespect to Lebesgue measure, and that n ≥ d + 1 . Let X n = n − (cid:80) nj =1 X j denote the sample mean and S n = n − (cid:80) nj =1 ( X j − X n )( X j − X n ) (cid:62) the sample covariance matrix of X , . . . , X n , where x (cid:62) means transposition of acolumn vector x . The assumptions made above guarantee that S n is invertible almost surely, see [18]. The test statisticwill be based on the so-called scaled residuals Y n,j = S − / n ( X j − X n ) , j = 1 , . . . , n, (5)which represent an empirical standardization of the data. Here, S − / n is the unique symmetric positive definite squareroot of S − n . The test statistic will be based on the empirical characteristic function ψ n ( t ) = 1 n n (cid:88) j =1 exp(i t (cid:62) Y n,j ) , t ∈ R d , of Y n, , . . . , Y n,n . Notice that an application of the Laplace operator ∆ to ψ n yields ∆ ψ n ( t ) = − n n (cid:88) j =1 (cid:107) Y n,j (cid:107) exp(i t (cid:62) Y n,j ) , t ∈ R d . Motivated by (2) and Theorem 1.1, we propose the weighted L -statistic T n,a = n (cid:90) | ∆ ψ n ( t ) − ∆ ψ ( t ) | w a ( t ) d t = n (cid:90) (cid:12)(cid:12)(cid:12) n n (cid:88) j =1 (cid:107) Y n,j (cid:107) exp(i t (cid:62) Y n,j ) + ( (cid:107) t (cid:107) − d ) exp (cid:18) − (cid:107) t (cid:107) (cid:19) (cid:12)(cid:12)(cid:12) w a ( t ) d t, where w a ( t ) = exp( − a (cid:107) t (cid:107) ) , t ∈ R d , (6)and a > is a fixed constant. Moreover, | z | is the modulus of a complex number z , and integration is, unless otherwisespecified, over R d . In principle, other weight functions than w a are conceivable in the definition of T n,a . Since, for c ∈ R d (see [37], p. 3601), (cid:90) cos( t (cid:62) c ) exp( − a (cid:107) t (cid:107) ) d t = (cid:16) πa (cid:17) d exp (cid:18) − (cid:107) c (cid:107) a (cid:19) , (7)as well as (cid:90) ( (cid:107) t (cid:107) − d ) cos( t (cid:62) c ) exp (cid:18) − (cid:18) a + 12 (cid:19) (cid:107) t (cid:107) (cid:19) d t = − (2 π ) d (cid:0) (cid:107) c (cid:107) + 2 da (2 a + 1) (cid:1) (2 a + 1) d exp (cid:18) − (cid:107) c (cid:107) a +1) (cid:19) , (8) (cid:90) ( (cid:107) t (cid:107) − d ) exp (cid:0) − ( a + 1) (cid:107) t (cid:107) (cid:1) d t = π d ( a + 1) d (cid:18) a ( a + 1) d + d ( d + 2)4 (cid:19) , (9)an attractive feature of the choice of w a is that the test statistic takes the simple form T n,a = (cid:16) πa (cid:17) d n n (cid:88) j,k =1 (cid:107) Y n,j (cid:107) (cid:107) Y n,k (cid:107) exp (cid:18) − a (cid:107) Y n,j − Y n,k (cid:107) (cid:19) (10) − π ) d (2 a + 1) d n (cid:88) j =1 (cid:107) Y n,j (cid:107) (cid:0) (cid:107) Y n,j (cid:107) + 2 da (2 a + 1) (cid:1) exp (cid:18) − (cid:107) Y n,j (cid:107) a + 1 (cid:19) (11) + n π d ( a + 1) d (cid:18) a ( a + 1) d + d ( d + 2)4 (cid:19) , (12)3esting multivariate normality by zeros of the harmonic oscillator in characteristic function spaceswhich is amenable to computational purposes. Moreover, T n,a depends only on the scalar products Y (cid:62) n,i Y n,j =( X i − X n ) (cid:62) S − n ( X j − X n ) , where i, j ∈ { , . . . , n } . This shows that T n,a is invariant with respect to full rank affinetransformations of X , . . . , X n . Moreover, not even the square root S − / n of S − n is needed.The rest of the paper is organized as follows: In Section 2, we show that, as the tuning parameter a tends to infinity, thetest statistic T n,a , after a suitable scaling, converges to a certain measure of multivariate skewness. On the other hand, atime-honored measure of multivariate kurtosis emerges as a → . Section 3 presents a basic Hilbert space central limittheorem, which proves beneficial for obtaining the limit distribution of T n,a both under H and under fixed alternativesto normality. In Section 4, we derive the limit null distribution of T n,a as n → ∞ . Section 5 considers the behaviorof T n,a with respect to contiguous alternatives to H . In Section 6, we show that the test for multivariate normalitythat rejects H for large values of T n,a is consistent against general alternatives. Moreover, the limit distribution of T n,a under a fixed alternative distribution is seen to be normal. Since the variance of this normal distribution can beestimated from the data, there is an asymptotic confidence interval for the measure of distance from normality underalternative distributions that is inherent in the procedure. Furthermore, there is the option for a neighborhood-of-modelvalidation procedure. The results of a simulation study, presented in Section 7, show that the novel test is strong withrespect to prominent competitors. In Section 8, the new tests are applied to the Iris flower data set due to R.A. Fisher.The paper concludes with some remarks. For the sake of readability, most of the proofs and some auxiliary results aredeferred to Section A. Finally, the following abbreviations, valid for t, x ∈ R d , will be used in several sections: CS + ( t, x ) := cos( t (cid:62) x ) + sin( t (cid:62) x ) , CS − ( t, x ) = cos( t (cid:62) x ) − sin( t (cid:62) x ) . (13) a → ∞ and a → The results of this section show that the class of tests for multivariate normality based on T n,a is "closed at theboundaries" a → ∞ and a → and thus shed some light on the tuning parameter a , which figures in the weightfunction w a given in (6). We first consider the case a → ∞ . Theorem 2.1.
Elementwise on the underlying probability space, we have lim a →∞ a d/ nπ d T n,a = 1 n n (cid:88) j,k =1 (cid:107) Y n,j (cid:107) (cid:107) Y n,k (cid:107) Y (cid:62) n,j Y n,k . (14) Proof.
For short, put Y j := Y n,j . From the representation of T n,a we have a d/ T n,a nπ d/ = 2 an n (cid:88) j,k =1 (cid:107) Y j (cid:107) (cid:107) Y k (cid:107) exp (cid:18) − a (cid:107) Y n,j − Y k (cid:107) (cid:19) − (cid:18) a a +1 (cid:19) d/ n (2 a +1) n (cid:88) j =1 (cid:107) Y j (cid:107) (cid:0) (cid:107) Y j (cid:107) + 2 da (2 a +1) (cid:1) exp (cid:18) − (cid:107) Y j (cid:107) a +1 (cid:19) + (cid:18) aa + 1 (cid:19) d/ a + 1 (cid:18) a ( a + 1) d + d ( d + 2)4 (cid:19) =: U n, − U n, + U n, (say). Since (cid:80) nj =1 (cid:107) Y j (cid:107) = nd , an expansion of the exponential function yields U n, = 2 ad − dn n (cid:88) j =1 (cid:107) Y j (cid:107) + 1 n n (cid:88) j,k =1 (cid:107) Y j (cid:107) (cid:107) Y k (cid:107) Y (cid:62) j Y k + o (1) as a → ∞ . To tackle U n, , we use (cid:18) a a +1 (cid:19) d/ = (cid:18) a (cid:19) − ( d/ = 1 − (cid:18) d (cid:19) a + O ( a − ) and, after some algebra, obtain U n, = 4 ad − (cid:18) d (cid:19) d − dn n (cid:88) j =1 (cid:107) Y j (cid:107) + o (1) . ( a/ ( a + 1)) d/ yields U n, = 2 ad − (cid:18) d (cid:19) d + o (1) . Summing up, the assertion follows.
Remark 2.2.
The limit (cid:101) b ,d := 1 n n (cid:88) j,k =1 (cid:107) Y n,j (cid:107) (cid:107) Y n,k (cid:107) Y (cid:62) n,j Y n,k (say), which figures on the right-hand side of (14), is a measure of multivariate (sample) skewness, introduced by Móri,Rohatgi and Székely, see [45]. A much older time-honored measure of multivariate (sample) skewness is skewness inthe sense of Mardia (see [42]), which is given by b ,d := 1 n n (cid:88) j,k =1 (cid:0) Y (cid:62) n,j Y n,k (cid:1) . It is interesting to compare Theorem 2.1 with similar results found in connection with other weighted L -statisticsthat have been studied for testing H . Thus, by Theorem 2.1 of [28], the time-honored class of BHEP-statistics fortesting for multivariate normality (see [36]), after suitable rescaling, approaches the linear combination b ,d + 3 (cid:101) b ,d ,as a smoothing parameter (called β in that paper) tends to . Since β and a are related by β = a − / , this correspondsto letting a tend to infinity. The same linear combination b ,d + 3 (cid:101) b ,d also showed up as a limit statistic in [31] and[32]. Notice that, in the univariate case, the limit statistic (cid:101) b ,d figuring in Theorem 2.1 is nothing but three timessquared sample skewness. We stress that tests for multivariate normality based on b ,d or (cid:101) b ,d or on related measures ofmultivariate skewness and kurtosis lack consistency against general alternatives, see, e.g., [8, 9, 26, 27, 29].We now consider the case a → . Since, elementwise on the underlying probability space, the expressions in (11) and(12) have finite limits as a → , and since the double sum figuring in (10) converges to (cid:80) nj =1 (cid:107) Y n,j (cid:107) as a → , wehave the following result. Theorem 2.3.
Elementwise on the underlying probability space, we have lim a → (cid:16) aπ (cid:17) d/ T n,a = 1 n n (cid:88) j =1 (cid:107) Y n,j (cid:107) . (15) Remark 2.4.
The limit statistic on the right-hand side of (15) is Mardia’s celebrated measure b ,d of multivariate samplekurtosis, see [42]. Together with Theorem 2.1, this result shows that, just like the class of BHEP tests for multivariatenormality (see [28]), also the class of tests based on T n,a is "closed at the boundaries" a → ∞ and a → . Notably,Mardia’s measure of kurtosis shows up for the first time in connection with limits of weighted L -statistics for testingfor multivariate normality. The corresponding limit statistic for the class of BHEP tests is, up to a linear transformation, n − (cid:80) nj =1 exp( −(cid:107) Y n,j (cid:107) / , see Theorem 3.1 of [28]. In this chapter, we present a basic Hilbert space central limit theorem. This theorem implies the limit distributionof T n,a under the null hypothesis (1), but it is also beneficial for proving a limit normal distribution of T n,a underfixed alternatives to H . Throughout this section, we assume that the underlying distribution satisfies E (cid:107) X (cid:107) < ∞ .Moreover, we let E ( X ) = 0 and E ( XX (cid:62) ) = I d in view of affine invariance. To motivate the benefit of a Hilbert spacesetting and for later purposes, it will be convenient to represent T n,a in a different way. Proposition 3.1.
Recall ψ ( t ) from (4), and let m ( t ) := (cid:0) d − (cid:107) t (cid:107) (cid:1) ψ ( t ) , t ∈ R d , (16) Z n ( t ) := 1 √ n n (cid:88) j =1 (cid:110) (cid:107) Y n,j (cid:107) (cid:0) cos( t (cid:62) Y n,j ) + sin( t (cid:62) Y n,j ) (cid:1) − m ( t ) (cid:111) , t ∈ R d . (17)5esting multivariate normality by zeros of the harmonic oscillator in characteristic function spaces We then have T n,a = (cid:90) Z n ( t ) w a ( t ) d t. (18) Proof.
The proof follows by straightforward algebra using the addition theorems for the sine function and the cosinefunction and the fact that (cid:82) sin( t (cid:62) y ) m ( t ) w a ( t ) d t = 0 , y ∈ R d .A convenient setting for asymptotics will be the separable Hilbert space H of (equivalence classes of) measurablefunctions f : R d → R satisfying (cid:82) f ( t ) w a ( t ) d t < ∞ . The scalar product and the norm in H will be denoted by (cid:104) f, g (cid:105) H = (cid:90) f ( t ) g ( t ) w a ( t ) d t, (cid:107) f (cid:107) H = (cid:104) f, f (cid:105) / H , f, g ∈ H , respectively. Notice that T n,a = (cid:107) Z n (cid:107) H . Recalling CS ± ( t, x ) from (13), and putting µ ( t ) := E (cid:2) (cid:107) X (cid:107) CS + ( t, X ) (cid:3) , t ∈ R d , (19)the main object of this section is the random element V n of H , defined by V n ( t ) := 1 √ n n (cid:88) j =1 (cid:0) (cid:107) Y n,j (cid:107) CS + ( t, Y n,j ) − µ ( t ) (cid:1) , t ∈ R d . (20)Observe that V n = Z n if the distribution of X is N d (0 , I d ) , since then the functions µ and m coincide. We will showthat, as n → ∞ , V n converges in distribution to a centred Gaussian random element V of H . The only technical problemin proving such a result is the fact that V n is based on the scaled residuals Y n, , . . . , Y n,n and not on X , . . . , X n . If V n ( t ) denotes the modification of V n ( t ) that results from replacing Y n,j with X j , a Hilbert space central limit theoremholds for V n , since the summands comprising V n ( t ) are i.i.d. square-integrable centred random elements of H . Writing D −→ for convergence in distribution of random elements of H and random variables, the basic idea to prove V n D −→ V is to find a random element (cid:101) V n of H , such that (cid:101) V n D −→ V and (cid:101) V n − V n = o P (1) as n → ∞ . In what follows, thestochastic Landau symbol o P (1) refers to convergence to zero in probability in H , i.e., we have to show (cid:107) (cid:101) V n − V n (cid:107) H = (cid:90) (cid:16) (cid:101) V n ( t ) − V n ( t ) (cid:17) w a ( t ) d t = o P (1) as n → ∞ . (21)To state the main result of this section, let ψ X ( t ) = E [exp(i t (cid:62) X )] , t ∈ R d , denote the characteristic function of X ,and put ψ + X ( t ) := Re ψ X ( t ) + Im ψ X ( t ) , ψ − X ( t ) := Re ψ X ( t ) − Im ψ X ( t ) , where Re w and Im w stand for the real and the imaginary part of a complex number w , respectively. For a twicecontinuously differentiable function f : R d → R , let H f ( t ) denote the Hessian matrix of f , evaluated at t . Furthermore,recall the gradient operator ∇ and the Laplace operator ∆ from Section 1. Proposition 3.2.
Let (cid:101) V n ( t ) := 1 √ n n (cid:88) j =1 v ( t, X j ) , t ∈ R d , (22) where v ( t, x ) = v ( t, x ) + v ( t, x ) + v ( t, x ) + v ( t, x ) , (23) v ( t, x ) = (cid:107) x (cid:107) CS + ( t, x ) , v ( t, x ) = 12 t (cid:62) (cid:0) xx (cid:62) − I d (cid:1) ∇ ∆ ψ + X ( t ) , (24) v ( t, x ) = (cid:0) ∇ ψ − X ( t ) + ∆ ψ − X ( t ) t (cid:1) (cid:62) x, v ( t, x ) = x (cid:62) H ψ + X ( t ) x. (25) We then have (21).
The proof of Proposition 3.2 is given in Section A.Since E ( X ) = 0 , E ( XX (cid:62) ) = I d and E [ (cid:107) X (cid:107) CS + ( t, X )] = − ∆ ψ + X ( t ) , we have (writing tr for trace) E v ( t, X ) = − ∆ ψ + X ( t ) + tr( Hψ + X ( t )) = 0 . Thus, v ( · , X ) , . . . , v ( · , X n ) are i.i.d. centred square-integrable random elements of H ,and the central limit theorem in Hilbert spaces gives (cid:101) V n D −→ V for some centred Gaussian element V of H . In view of(21) and Slutsky’s lemma, we therefore can state the main result of this section.6esting multivariate normality by zeros of the harmonic oscillator in characteristic function spaces Theorem 3.3.
Let
X, X , X , . . . be i.i.d. random vectors satisfying E (cid:107) X (cid:107) < ∞ , E ( X ) = 0 and E ( XX (cid:62) ) = I d . Forthe sequence of random elements V n defined in (20) we have V n D −→ V as n → ∞ , where V is a centred Gaussian element of H having covariance kernel L ( s, t ) = E [ v ( s, X ) v ( t, X )] , s, t ∈ R d , (26) where v ( t, x ) is given in (23). T n,a In this section we derive the limit distribution of T n,a under the null hypothesis (1). In view of affine invariance, weassume that X has a d -variate standard normal distribution. Since the process Z n ( t ) given in (17) is nothing but V n , asdefined in (20), in this special case, we have the following result. Theorem 4.1.
Suppose that X has some non-degenerate normal distribution. Putting d := d + 2 , d := d + 4 , we have the following:a) There is a centred Gaussian random element Z of H with covariance kernel K ( s, t ) = ψ ( s − t ) (cid:16) (cid:0) (cid:107) s − t (cid:107) − d (cid:1) − d (cid:17) + ψ ( s ) ψ ( t ) (cid:110) − ( s (cid:62) t ) (cid:107) s (cid:107) − d )( (cid:107) t (cid:107) − d )+2 d (cid:0) (cid:107) s (cid:107) + (cid:107) t (cid:107) (cid:1) −(cid:107) s (cid:107) −(cid:107) t (cid:107) −(cid:107) s (cid:107) (cid:107) t (cid:107) − s (cid:62) t (cid:0) (cid:107) s (cid:107) − d (cid:1) (cid:0) (cid:107) t (cid:107) − d (cid:1) − dd (cid:111) ,s, t ∈ R d , such that, with Z n defined in (17), we have Z n D −→ Z in H as n → ∞ .b) We have T n,a D −→ (cid:90) Z ( t ) w a ( t ) d t. Notice that b) follows from a) and the continuous mapping theorem. Part a) follows from Theorem 3.3. For thespecial case X ∼ N d (0 , I d ) , we have ψ + X ( t ) = ψ − X ( t ) = exp( −(cid:107) t (cid:107) /
2) = ψ ( t ) , which entails ∇ ψ ( t ) = − tψ ( t ) , ∆ ψ ( t ) = (cid:0) (cid:107) t (cid:107) − d (cid:1) ψ ( t ) , H ψ ( t ) = (cid:0) tt (cid:62) − I d (cid:1) ψ ( t ) , and ∇ ∆ ψ ( t ) = tψ ( t ) (cid:0) d − (cid:107) t (cid:107) (cid:1) . Thus, the function v ( t, x ) figuring in the statement of Proposition 3.2 takes the special form h ( x, t ) = (cid:107) x (cid:107) CS + ( t, x ) − (2 ψ ( t ) + m ( t )) t (cid:62) x − ψ ( t ) (cid:107) x (cid:107) (27) + (cid:18) ψ ( t ) + m ( t )2 (cid:19) ( t (cid:62) x ) − (cid:18) ψ ( t ) + m ( t )2 (cid:19) (cid:107) t (cid:107) . Long but straightforward computations, using symmetry arguments and the identities E (cid:2) (cid:107) X (cid:107) ( s (cid:62) X ) (cid:3) = ( d + 2) (cid:107) s (cid:107) , E (cid:2) (cid:107) X (cid:107) cos( t (cid:62) X ) (cid:3) = (cid:0)(cid:0) d + 2 − (cid:107) t (cid:107) (cid:1) (cid:0) d − (cid:107) t (cid:107) (cid:1) − (cid:107) t (cid:107) (cid:1) ψ ( t ) , E (cid:2) (cid:107) X (cid:107) s (cid:62) X sin( t (cid:62) X ) (cid:3) = s (cid:62) t (cid:0) d + 2 − (cid:107) t (cid:107) (cid:1) ψ ( t ) , E (cid:2) (cid:107) X (cid:107) ( s (cid:62) X ) cos( t (cid:62) X ) (cid:3) = (cid:0) d + 2 − (cid:107) t (cid:107) (cid:1) (cid:0) (cid:107) s (cid:107) − ( s (cid:62) t ) (cid:1) ψ ( t ) − s (cid:62) t ) ψ ( t ) ,s, t ∈ R d , show that the covariance kernel K ( s, t ) = E [ h ( s, X ) h ( t, X )] takes the form given above.Let T ∞ ,a be a random variable with the limit null distribution of T n,a , i.e., with the distribution of (cid:82) Z ( t ) w a ( t ) d t .Since E ( T ∞ ,a ) = (cid:82) K ( t, t ) w a ( t ) d t , the following result may be obtained by straightforward but tedious manipulationsof integrals. Theorem 4.2.
Putting c j ( a, d ) := π d/ d ( a + 1) d/ j , j = 1 , , , , we have E ( T ∞ ,a ) = d ( d + 2) (cid:32)(cid:16) πa (cid:17) d/ − (cid:18) πa + 1 (cid:19) d/ (cid:33) − c ( a, d ) ( d + 2)( d + 4)( d + 6)32 + c ( a, d ) ( d + 2)( d + 3)( d + 4)8 − c ( a, d ) ( d + 2)( d + 4 d + 14)8 + c ( a, d ) ( d − d + 2)2 . The quantiles of the distribution of T ∞ ,a can be approximated by a Monte Carlo method, see Section 7. In this section, we consider a triangular array ( X n , . . . , X nn ) , n ≥ d + 1 , of rowwise i.i.d. random vectors withLebesgue density f n ( x ) = ϕ ( x ) (cid:18) g ( x ) √ n (cid:19) , x ∈ R d , where ϕ ( x ) = (2 π ) − d/ exp (cid:0) −(cid:107) x (cid:107) / (cid:1) , x ∈ R d , is the density of the standard normal distribution N d (0 , I d ) , and g issome bounded measurable function satisfying (cid:82) g ( x ) ϕ ( x ) d x = 0 . We assume that n is sufficiently large to render g nonnegative. Recall Z n ( t ) from (17). Theorem 5.1.
Under the triangular X n, , . . . , X n,n given above, we have Z n D −→ Z + c as n → ∞ , where Z is the centred random element of H figuring in Theorem 4.1, and c ( t ) = (cid:90) h ( x, t ) g ( x ) ϕ ( x ) d x, with h ( x, t ) given in (27).Proof. Since the reasoning uses standard LeCam theory on contiguous probability measures and closely parallels thatgiven in Section 3 of [36], it will be omitted.
Corollary 5.2.
Under the conditions of Theorem 5.1, we have T n,a D −→ (cid:90) ( Z ( t ) + c ( t )) w a ( t ) d t. From Theorem 5.1 and the above corollary, we conclude that the test for multivariate normality based on T n,a is able todetect alternatives which converge to the normal distribution at the rate n − / , irrespective of the underlying dimension d . The test of Bowman and Foster (see [12]) is a prominent example of an affine invariant tests for multivariatenormality that is consistent against each fixed non-normal alternative distribution but nevertheless lacks this property of n − / -consistency, see Section 7 of [30]. In this section we assume that
X, X , X , . . . are i.i.d. with a distribution that is absolutely continuous with respect toLebesgue measure, and that E (cid:107) X (cid:107) < ∞ . In view of affine invariance, we make the additional assumptions E ( X ) = 0 and E ( XX (cid:62) ) = I d . Recall m ( t ) from (16) and µ ( t ) from (19). Writing a . s . −→ for P -almost sure convergence, our firstresult is a strong limit for T n,a /n . Theorem 6.1. If E (cid:107) X (cid:107) < ∞ , we have T n,a n a . s . −→ ∆ a = (cid:90) (cid:0) µ ( t ) − m ( t ) (cid:1) w a ( t ) d t as n → ∞ . (28)8esting multivariate normality by zeros of the harmonic oscillator in characteristic function spacesThe proof of Theorem 6.1 is given in Section A. Remark 6.2.
Let ψ X ( t ) = E [exp(i t (cid:62) X )] , t ∈ R d , denote the characteristic function of X . We have ∆ ψ X ( t ) = − E [ (cid:107) X (cid:107) exp(i t (cid:62) X )] . Since ∆ ψ ( t ) = ( (cid:107) t (cid:107) − d ) ψ ( t ) , some algebra gives ∆ a = (cid:90) (cid:12)(cid:12)(cid:12) ∆ ψ X ( t ) − ∆ ψ ( t ) (cid:12)(cid:12)(cid:12) w a ( t ) d t. By Theorem 1.1, we have ∆ a = 0 if and only if X has the normal distribution N d (0 , I d ) . In view of Theorem 6.1, T n,a → ∞ P -almost surely under any alternative distribution satisfying E (cid:107) X (cid:107) < ∞ . Since, according to Theorem4.1, the sequence of critical values of a level- α -test of H that rejects H for large values of T n,a stays bounded, wehave the following result. Corollary 6.3.
The test for multivariate normality based on T n,a is consistent against any fixed alternative distributionsatisfying E (cid:107) X (cid:107) < ∞ . By analogy with Theorem 2.1, the next result shows that the (population) measure of multivariate skewness in the senseof Móri, Rohatgi and Székely (see [45]) emerges as the limit of ∆ a , after a suitable normalization, as a → ∞ . Theorem 6.4. If E (cid:107) X (cid:107) < ∞ , then lim a →∞ a d/ π d/ ∆ a = (cid:13)(cid:13) E ( (cid:107) X (cid:107) X ) (cid:13)(cid:13) . The proof of Theorem 6.4 is given in Section A.We now show that the limit distribution of √ n ( T n,a /n − ∆ a ) as n → ∞ is centred normal. This fact is essentially aconsequence of Theorem 1 in [6]. The reasoning is as follows: By (18), we have T n,a = (cid:107) Z n (cid:107) H , where Z n is given in(17). Putting z ( t ) := µ ( t ) − m ( t ) , t ∈ R d , display (28) shows that ∆ a = (cid:107) z (cid:107) H . Now, defining Z ∗ n ( t ) := n − / Z n ( t ) ,it follows that √ n (cid:18) T n,a n − ∆ a (cid:19) = √ n (cid:0) (cid:107) Z ∗ n (cid:107) H − (cid:107) z (cid:107) H (cid:1) = √ n (cid:104) Z ∗ n − z, Z ∗ n + z (cid:105) H = √ n (cid:104) Z ∗ n − z, z + Z ∗ n − z (cid:105) H = 2 (cid:104)√ n ( Z ∗ n − z ) , z (cid:105) H + 1 √ n (cid:107)√ n ( Z ∗ n − z ) (cid:107) H . (29)A little thought shows that √ n ( Z ∗ n ( t ) − z ( t )) = V n ( t ) , where V n is given in (20). By Theorem 3.3, V n D −→ V for acentred Gaussian element V of H . As a consequence, the second summand in (29) is o P (1) as n → ∞ , and, by thecontinuous mapping theorem, the first summand converges in distribution to (cid:104) V, z (cid:105) H . The latter random variable has acentred normal distribution with variance σ a := 4 E [ (cid:104) V, z (cid:105) H ] . The following theorem summarizes our findings. Theorem 6.5.
For a fixed alternative distribution satisfying E (cid:107) X (cid:107) < ∞ , E ( X ) = 0 and E ( XX (cid:62) ) = I d , we have √ n (cid:18) T n,a n − ∆ a (cid:19) D −→ N(0 , σ a ) as n → ∞ , where σ a = 4 (cid:90) (cid:90) L ( s, t ) z ( s ) z ( t ) w a ( s ) w a ( t ) d s d t (30) and L ( s, t ) is given in (26).Proof. To complete the proof, notice that, by Fubini’s theorem, σ a = 4 E [ (cid:104) V, z (cid:105) H ] = 4 E (cid:34) (cid:18)(cid:90) V ( s ) z ( s ) w a ( s ) d s (cid:19) (cid:18)(cid:90) V ( t ) z ( t ) w a ( t ) d t (cid:19) (cid:35) = 4 (cid:90) (cid:90) E [ V ( s ) V ( t )] z ( s ) z ( t ) w a ( s ) w a ( t ) d s d t. P X , there is a consistent estimator of σ a . To obtain such an estimator, we replace L ( s, t ) as well as z ( s ) and z ( t ) figuring in (30) with suitable empirical counterparts. To this end, notice that, by (26),we have L ( s, t ) = (cid:88) i =1 4 (cid:88) j =1 L i,j ( s, t ) , (31)where L i,j ( s, t ) = E (cid:2) v i ( s, X ) v j ( t, X ) (cid:3) (32)and v j ( t, x ) , j ∈ { , , , } , are given in (24), (25). Since ∇ ∆ ψ ± X ( t ) = ∓ E [CS ∓ ( t, X ) (cid:107) X (cid:107) X ] , ∇ ψ ± X ( t ) = ± E [CS ∓ ( t, X ) X ] , ∆ ψ ± X ( t ) = − E [CS ± ( t, X ) (cid:107) X (cid:107) ] and H ψ + X ( t ) = − E [CS + ( t )( t, X ) XX (cid:62) ] , parts a) – d) of thefollowing lemma show that the unknown quantities ∇ ∆ ψ + X ( t ) , ∇ ψ − X ( t ) , ∆ ψ − X ( t ) and H ψ + X ( t ) that figure in theexpressions of v ( t, x ) , v ( t, x ) and v ( t, x ) can be replaced with consistent estimators that are based on the scaledresiduals Y n, , . . . , Y n,n defined in (5). Lemma 6.6. If E (cid:107) X (cid:107) < ∞ , we have a) Ψ ,n ( t ) := n − (cid:80) nj =1 CS + ( t, Y n,j ) Y n,j a.s. −→ −∇ ψ − X ( t ) , b) Ψ ,n ( t ) := n − (cid:80) nj =1 CS + ( t, Y n,j ) Y n,j Y (cid:62) n,j a.s. −→ − H ψ + X ( t ) , c) Ψ ± ,n ( t ) := n − (cid:80) nj =1 CS ± ( t, Y n,j ) (cid:107) Y n,j (cid:107) a.s. −→ − ∆ ψ ± X ( t ) , d) Ψ ± ,n ( t ) := n − (cid:80) nj =1 CS ± ( t, Y n,j ) (cid:107) Y n,j (cid:107) Y n,j a.s. −→ ±∇ ∆ ψ ∓ X ( t ) , e) Ψ ,n ( t ) := n − (cid:80) nj =1 CS + ( t, Y n,j ) (cid:107) Y n,j (cid:107) Y n,j Y (cid:62) n,j a.s. −→ H ∆ ψ + X ( t ) . The proof of Lemma 6.6 is given in Section 9.In view of Lemma 6.6, a suitable estimator of L ( s, t ) defined in (31) is L n ( s, t ) = (cid:88) i =1 4 (cid:88) j =1 L i,jn ( s, t ) , (33)where L i,jn ( s, t ) = 1 n n (cid:88) k =1 v n,i ( s, Y n,k ) v n,j ( t, Y n,k ) , (34)and v n, ( s, Y n,k ) = (cid:107) Y n,k (cid:107) CS + ( s, Y n,k ) , v n, ( s, Y n,k ) = − s (cid:62) ( Y n,k Y (cid:62) n,k − I d )Ψ − ,n ( s ) ,v n, ( s, Y n,k ) = − (cid:0) ,n ( s ) + Ψ − ,n ( s ) s (cid:1) (cid:62) Y n,k , v n, ( s, Y n,k ) = − Y (cid:62) n,k Ψ ,n ( s ) Y n,k . By straightforward algebra we have L , n ( s, t ) = n − (cid:80) nj =1 (cid:107) Y n,j (cid:107) cos (cid:0) ( t − s ) (cid:62) Y n,j (cid:1) + n − (cid:80) nj =1 (cid:107) Y n,j (cid:107) sin (cid:0) ( t + s ) (cid:62) Y n,j (cid:1) ,L , n ( s, t ) = − Ψ − ,n ( s ) (cid:62) Ψ ,n ( t ) s + Ψ +3 ,n ( t )Ψ − ,n ( s ) (cid:62) s, (35) L , n ( s, t ) = (cid:0) − ,n ( s ) − Ψ − ,n ( s ) s (cid:1) (cid:62) Ψ +4 ,n ( t ) L , n ( s, t ) = − n − (cid:80) nj =1 (cid:107) Y n,j (cid:107) CS + ( t, Y n,j ) Y (cid:62) n,j Ψ ,n ( s ) Y n,j L , n ( s, t ) = Ψ − ,n ( t ) (cid:62) n − (cid:80) nj =1 Y n,j Y (cid:62) n,j t Ψ − ,n ( s ) (cid:62) (cid:0) Y n,j Y (cid:62) n,j − I d (cid:1) s,L , n ( s, t ) = Ψ − ,n ( t ) (cid:62) n − (cid:80) nj =1 Y n,j Y (cid:62) n,j t (cid:0) Ψ ,n ( s ) + Ψ − ,n ( s ) s (cid:1) (cid:62) Y n,j ,L , n ( s, t ) = Ψ − ,n ( t ) (cid:62) n − (cid:80) nj =1 (cid:0) Y n,j Y (cid:62) n,j − I d (cid:1) tY (cid:62) n,j Ψ ,n ( s ) Y n,j ,L , n ( s, t ) = (cid:0) ,n ( t ) + Ψ − ,n ( t ) t (cid:1) (cid:62) (cid:0) ,n ( s ) + Ψ − ,n ( s ) s (cid:1) ,L , n ( s, t ) = n − (cid:80) nj =1 (cid:0) ,n ( s ) + Ψ − ,n ( s ) s (cid:1) (cid:62) Y n,j Y (cid:62) n,j Ψ ,n ( t ) Y n,j ,L , n ( s, t ) = n − (cid:80) nj =1 Y (cid:62) n,j Ψ ,n ( t ) Y n,j Y (cid:62) n,j Ψ ,n ( s ) Y n,j . n U (cid:0) −√ , √ (cid:1) L (cid:0) , / √ (cid:1) Lo (cid:0) , √ /π (cid:1) ∆ . by I n, . ,α for different distributions (5000 replications)Notice that, by symmetry, L i,jn = L j,in if i (cid:54) = j . Since z ( s ) = µ ( s ) − m ( s ) = E [ (cid:107) X (cid:107) CS + ( s, X )] − m ( s ) , a naturalestimator of z ( s ) is z n ( s ) = 1 n n (cid:88) k =1 CS + ( s, Y n,k ) (cid:107) Y n,k (cid:107) − m ( s ) . (36)Writing P −→ for convergence in probability, we then have the following result. Theorem 6.7.
Suppose E (cid:107) X (cid:107) < ∞ , E ( X ) = 0 and E ( XX (cid:62) ) = I d . Let (cid:98) σ n,a = 4 (cid:90) (cid:90) L n ( s, t ) z n ( s ) z n ( t ) w a ( s ) w a ( t ) d s d t, (37) where L n ( s, t ) and z n ( s ) are defined in (33) and (36), respectively. We then have (cid:98) σ n,a P −→ σ a . The proof of Theorem 6.7 is given in Section A.Under the conditions of Theorem 6.7, Theorem 6.5 and Sluzki’s lemma yield √ n (cid:98) σ n,a (cid:18) T n,a n − ∆ a (cid:19) D −→ N(0 , as n → ∞ , (38)provided that σ a > . We thus obtain the following asymptotic confidence interval for ∆ a . Corollary 6.8.
Let α ∈ (0 , , and write Φ − α/ for the (1 − α/ -quantile of the standard normal law. Then I n,a,α := (cid:20) T n,a n − (cid:98) σ n,a √ n Φ − α/ , T n,a n + (cid:98) σ n,a √ n Φ − α/ (cid:21) is an asymptotic confidence interval for ∆ a . Example 6.9.
We consider the case d = 1 , a = 0 . and the following standardized symmetric alternative distributions:the uniform distribution U (cid:0) −√ , √ (cid:1) , the Laplace distribution L (cid:0) , / √ (cid:1) , and the logistic distribution Lo (cid:0) , √ /π (cid:1) .The corresponding characteristic functions and their second derivatives are given by ϕ U ( t ) = sin( √ t ) √ t , ϕ (cid:48)(cid:48) U ( t ) = √ − t ) sin( √ t ) − t cos( √ t )3 t ,ϕ L ( t ) = 22 + t , ϕ (cid:48)(cid:48) L ( t ) = 12 t − t ) ,ϕ Lo ( t ) = √ t sinh( √ t ) , ϕ (cid:48)(cid:48) Lo ( t ) = 32 3 √ t − √ t ) + √ t cosh(2 √ t )sinh( √ t ) . The pertaining values of ∆ . are ∆ . , U ≈ ∆ . , L ≈ ∆ . , Lo ≈ I n, . ,α for α = n ≥ . Remark 6.10.
A further application of (38) addresses a fundamental drawback inherent in any goodness of fit test, see[6]. If a level- α -test of H does not reject H , the hypothesis H is by no means validated or confirmed , since each11esting multivariate normality by zeros of the harmonic oscillator in characteristic function spacesmodel is wrong, and there is probably only not enough evidence to reject H . However, if we define a certain "essentialdistance" δ > , we can consider the inverse testing problem H δ : ∆ a ( F ) ≥ δ against K δ : ∆ a ( F ) < δ . Here, the dependence of ∆ a on the underlying distribution has been made explicit.From (38), we obtain an asymptotic level- α - neighborhood-of-model-validation test of H δ against K δ , which rejects H δ if and only if T n.a n ≤ δ − (cid:98) σ n,a √ n Φ − (1 − α ) . Indeed, from (38) we have for each F ∈ H δ lim sup n →∞ P F (cid:18) T n.a n ≤ δ − (cid:98) σ n,a √ n Φ − (1 − α ) (cid:19) = lim sup n →∞ P F (cid:18) √ n (cid:98) σ n,a (cid:18) T n,a n − δ (cid:19) ≤ − Φ − (1 − α ) (cid:19) ≤ α. Moreover, it follows that lim n →∞ P F (cid:18) T n,a n ≤ δ − (cid:98) σ n,a √ n Φ − (1 − α ) (cid:19) = α for each F such that ∆ a ( F ) = δ . Finally, we have lim n →∞ P F (cid:18) T n,a n ≤ δ − (cid:98) σ n,a √ n Φ − (1 − α ) (cid:19) = 1 if ∆ a ( F ) < δ . Thus, the test is consistent against each alternative. Remark 6.11.
The double integral figuring (37) may be calculated explicitly. To this end, notice that (cid:98) σ n,a = (cid:80) i,j =1 (cid:98) σ i,jn,a , where (cid:98) σ i,jn,a = 4 (cid:90) (cid:90) L i,jn ( s, t ) z n ( s ) z n ( t ) w a ( s ) w a ( t ) d s d t (39)and (cid:98) σ i,jn,a = (cid:98) σ j,in,a , i, j ∈ { , . . . , } , by symmetry. We put q ,a ( y ) := (cid:90) m ( t )CS + ( t, y ) w a ( t )d t = (2 π ) d/ (2 a + 1) d/ (cid:0) (cid:107) y (cid:107) + 2 da (2 a + 1) (cid:1) exp (cid:18) − (cid:107) y (cid:107) a + 1 (cid:19) ,p ,a ( y, z ) := (cid:90) CS + ( t, y )CS + ( t, z ) w a ( t )d t = (cid:16) πa (cid:17) d/ exp (cid:18) − (cid:107) y − z (cid:107) a (cid:19) ,p ,a ( y, z ) := (cid:90) CS + ( t, y )CS − ( t, z ) tw a ( t )d t = (cid:16) πa (cid:17) d/ a exp (cid:18) − (cid:107) y − z (cid:107) a (cid:19) ( y − z ) ,q ,a ( y ) := (cid:90) m ( t )CS − ( t, y ) tw a ( t )d t = (2 π ) d/ (2 a + 1) d/ (cid:0) a + 1)(1 − ad ) − (cid:107) y (cid:107) (cid:1) exp (cid:18) − (cid:107) y (cid:107) a + 1 (cid:19) y y, z ∈ R d and P ,a, n := 1 n n (cid:88) j,k =1 (cid:107) Y j (cid:107) (cid:107) Y k (cid:107) p ,a ( Y j , Y k ) − n n (cid:88) j =1 (cid:107) Y j (cid:107) q ,a ( Y j ) , (cid:101) P ,a, n := 1 n n (cid:88) j,k =1 (cid:107) Y j (cid:107) (cid:107) Y k (cid:107) p ,a ( Y j , Y k ) Y k − n n (cid:88) j =1 (cid:107) Y j (cid:107) q ,a ( Y j ) Y j , (cid:101) P ,a, n := 1 n n (cid:88) j,k =1 (cid:107) Y j (cid:107) p ,a ( Y j , Y k ) Y k − n n (cid:88) j =1 q ,a ( Y j ) Y j ,P ,an := 1 n n (cid:88) j,k =1 (cid:107) Y j (cid:107) (cid:107) Y k (cid:107) p ,a ( Y j , Y k ) Y k Y (cid:62) k − n n (cid:88) j =1 (cid:107) Y j (cid:107) Y j Y (cid:62) j q ,a ( Y j ) ,P ,a, n ( Y j ) := 1 n n (cid:88) k,(cid:96) =1 (cid:107) Y k (cid:107) (cid:0) Y (cid:62) j Y (cid:96) (cid:1) p ,a ( Y k , Y l ) − n n (cid:88) k =1 (cid:0) Y (cid:62) j Y k (cid:1) q ,a ( Y k ) ,P ,a, n ( Y j ) := 1 n n (cid:88) k =1 (cid:107) Y k (cid:107) p ,a ( Y j , Y k ) − q ,a ( Y j ) ,P ,a, n := 1 n n (cid:88) j,k =1 (cid:107) Y j (cid:107) (cid:107) Y k (cid:107) Y (cid:62) k p ,a ( Y j , Y k ) − n n (cid:88) j =1 (cid:107) Y j (cid:107) Y (cid:62) j q ,a ( Y j ) ,P ,a, n := 1 n n (cid:88) j,k =1 (cid:107) Y j (cid:107) (cid:107) Y k (cid:107) Y (cid:62) k P ,a p ,a ( Y j , Y k ) − n n (cid:88) j =1 (cid:107) Y j (cid:107) Y (cid:62) j P ,a q ,a ( Y j ) , (cid:101) P ,an := 1 n n (cid:88) j,k =1 (cid:107) Y j (cid:107) (cid:107) Y k (cid:107) p ,a ( Y j , Y k ) − n n (cid:88) j =1 (cid:107) Y j (cid:107) q ,a ( Y j ) ,P ,a, n ( Y j ) := 1 n n (cid:88) k,(cid:96) =1 (cid:107) Y k (cid:107) (cid:107) Y (cid:96) (cid:107) Y (cid:62) k Y j Y (cid:62) j p ,a ( Y k , Y (cid:96) ) − n n (cid:88) k =1 (cid:107) Y k (cid:107) Y (cid:62) k Y j Y (cid:62) j q ,a ( Y k ) , where Y j is shorthand for Y n,j . Notice that (cid:101) P ,a, n , (cid:101) P ,a, n and (cid:101) P ,an are vectors and P ,an is a matrix. By straightforwardbut tedious manipulations of the integrals in (39), each (cid:98) σ i,jn is seen to be an arithmetic mean of functions of the scaledresiduals, namely: (cid:98) σ , n,a = 4 n n (cid:88) j =1 (cid:107) Y j (cid:107) P ,a, n ( Y j ) , (cid:98) σ , n,a = 2 P ,a, n P ,a, n − P ,a, n , (cid:98) σ , n,a = − (cid:16) (cid:101) P ,a, n + (cid:101) P ,an (cid:17) (cid:62) (cid:101) P ,a, n , (cid:98) σ , n,a = − n n (cid:88) j =1 (cid:107) Y j (cid:107) P ,a, n ( Y j ) P ,a, n ( Y j ) , (cid:98) σ , n,a = 1 n n (cid:88) j =1 P ,a, ( Y j ) − ( P ,a, n ) , (cid:98) σ , n,a = 4 n n (cid:88) j =1 P ,a, ( Y j ) (cid:18) (cid:101) P ,a, n + 12 (cid:101) P ,an (cid:19) (cid:62) Y j , (cid:98) σ , n,a = 2 n n (cid:88) j =1 (cid:0) P ,a, n ( Y j ) − P ,a, n (cid:1) P ,a, n ( Y j ) , (cid:98) σ , n,a = 4 · (cid:13)(cid:13)(cid:13) (cid:101) P ,a, n + (cid:101) P ,an (cid:13)(cid:13)(cid:13) , (cid:98) σ , n,a = 4 n n (cid:88) j =1 P ,a, n ( Y j ) (cid:16) (cid:101) P ,a, n + (cid:101) P ,an (cid:17) (cid:62) Y j , (cid:98) σ , n,a = 4 n n (cid:88) j =1 P ,a, n ( Y j ) . In this section, we present the results of a Monte Carlo simulation study on the finite-sample power performanceof the test based on T n,a with that of several competitors. Since different procedures have been used for univariate13esting multivariate normality by zeros of the harmonic oscillator in characteristic function spaces d n \ a ∞ ∞ ∞ ∞ ∞ d − (cid:0) aπ (cid:1) d/ T n,a and α = 0 . (100 000 replications)and multivariate data, we distinguish the cases d = 1 and d ≥ . In the univariate case, the sample sizes are n ∈ { , , } , and in the multivariate case we restrict the simulations to n ∈ { , } . The nominal level ofsignificance is fixed throughout all simulations to . . Critical values for T n,a (in fact, for a scaled version of T n,a inorder to obtain values of a similar magnitude across the range of values for d and a considered) have been simulatedunder H with 100 000 replications, see Table 2. The critical values in the rows in Table 2 denoted by " ∞ " representapproximations of quantiles of the distribution of the limit random element T ∞ ,a = (cid:82) Z ( t ) w a ( t ) d t in Theorem 4.1 b).To obtain these values we simulate (say) m i.i.d. random supporting points U , . . . , U m where U ∼ N d (0 , (2 a ) − I d ) ,which are adapted to the integration over R d with respect to the weight function w a ( t ) = exp( − a (cid:107) t (cid:107) ) and a largenumber (say) (cid:96) of random variables Z j = d m (cid:107) X j (cid:107) , j = 1 , . . . , (cid:96) , with i.i.d. X j ∼ N m (0 , Σ K ) , where the covariancematrix Σ k is given by Σ K = ( K ( U k , U k )) k ,k ∈{ ,...,m } and K is the covariance kernel in Theorem 4.1 a). Next,we calculate the empirical quantile of Z , . . . , Z (cid:96) . Each approximation was simulated with (cid:96) = 100 , and m = 1 , for d ∈ { , , , } and each entry in Tables 4 - 7, which exhibit percentages of rejection of H of the testsunder consideration against various alternative distributions, is based on 10 000 replications. Entries that are markedwith ∗ denote . The simulations have been performed using the statistical computing environment R , see [48]. For testing the hypothesis H that the distribution of X is univariate normal (i.e., P X ∈ N ), we considered thefollowing competitors to the new test statistic:1. the Shapiro–Wilk test (SW), see [52],2. the Shapiro–Francia test (SF), see [51],3. the Anderson–Darling test (AD), see [3],4. the Baringhaus–Henze–Epps–Pulley test (BHEP), see [36],5. the del Barrio–Cuesta-Albertos–Mátran–Rodríguez-Rodríguez test (BCMR), see [15],6. the Betsch–Ebner test (BE), see [11].For the implementation of the first three tests in R we refer to the package nortest by [23]. Tests based on the empiricalcharacteristic function are represented by the BHEP-test with tuning parameter a > . The statistic is given in (40), a = 1 is fixed, and the critical values are taken from [25].14esting multivariate normality by zeros of the harmonic oscillator in characteristic function spacesThe BCMR-test is based on the L -Wasserstein distance, see section 3.3 in [14], and is given by BCMR = n − S n (cid:32) n (cid:88) k =1 X ( k ) (cid:90) knk − n Φ − ( t ) d t (cid:33) − (cid:90) nn +11 n +1 t (1 − t )( ϕ (Φ − ( t ))) d t. Here, X ( k ) is the k -th order statistic of X , . . . , X n , S n is the sample variance, and Φ − is the quantile function of thestandard normal law. Simulated critical values can be found in [40], or in Table 4 of [11].The BE-test is based on a L -distance between the empirical zero-bias transformation and the empirical distribution.Since the zero-bias transformation has a unique fixed point under normality, this distance is minimal under H . Thestatistic is given by BE a = 2 n (cid:88) ≤ j 1) + aY ( j ) Y ( k ) (cid:17) + a √ πa exp (cid:18) − Y k ) a (cid:19) (cid:16) − Y j ) Y ( k ) + Y ( k ) + Y ( j ) (cid:17)(cid:27) + 1 n n (cid:88) j =1 (cid:26) (cid:16) − Φ (cid:16) Y j √ a (cid:17)(cid:17) (cid:0) Y j + ( a − Y j + 1 (cid:1) + a √ πa exp (cid:16) − Y j a (cid:17) (cid:0) Y j − Y j (cid:1)(cid:27) , where Y , . . . , Y n is shorthand for the scaled residuals Y n, , . . . , Y n,n , Y (1) ≤ . . . ≤ Y ( n ) are the order statistics of Y , . . . , Y n , and Φ stands for the distribution function of the standard normal law. The implementation employs abootstrap procedure to find a data driven version of the tuning parameter a , see [1]. We used B = 400 bootstrapreplications and the same grid of tuning parameters as in [11], p. 19.The alternative distributions are chosen to fit the extensive power study of univariate normality tests by [49], in orderto ease the comparison with a wide variety of other existing tests. As representatives of symmetric distributionswe simulate the Student t ν -distribution with ν ∈ { , , } degrees of freedom, as well as the uniform distributionU ( −√ , √ . The asymmetric distributions are represented by the χ ν -distribution with ν ∈ { , } degrees of freedom,the Beta distributions B (1 , and B (2 , , the Gamma distributions Γ(1 , and Γ(5 , , parametrized by their shapeand rate parameter, the Gumbel distribution Gum (1 , with location parameter 1 and scale parameter 2, the lognormaldistribution LN (0 , as well as the Weibull distribution W (1 , . with scale parameter 1 and shape parameter 0.5. Asrepresentatives of bimodal distributions we take the mixture of normal distributions NMix ( p, µ, σ ) , where the randomvariables are generated by (1 − p ) N(0 , 1) + p N( µ, σ ) , p ∈ (0 , , µ ∈ R , σ > . Table 4 shows that the empirical power estimates of the new test T n,a outperform the other strong procedures for thesymmetric t -distribution, and they can compete for most of the other alternatives. Interestingly, the power does notdiffer too much when varying the tuning parameter a , although an effect is clearly visible, especially for the uniformdistribution. A data driven choice as in [1] might be of benefit also in connection with the new testing procedure. In this subsection we consider testing the hypothesis H that the distribution of X is multivariate normal (i.e., belongsto N d ), for the dimensions d ∈ { , , } . As competitors to the new test statistic we chose1. the Henze–Visagie test (HV), see [35],2. the Henze–Jiménez-Gamero test (HJG), see [31],3. the Baringhaus–Henze–Epps–Pulley test (BHEP), see [36].The HV-test uses a weighted L -type statistic based on a characterization of the moment generating function thatemploys a system of first-order partial differential equations. The statistic is defined by HV γ = 1 n (cid:18) πγ (cid:19) d n (cid:88) j,k =1 exp (cid:18) (cid:107) Y n,j + Y n,k (cid:107) γ (cid:19) (cid:18) Y (cid:62) n,j Y n,k + (cid:107) Y n,j + Y n,k (cid:107) (cid:18) γ − γ (cid:19) + d γ (cid:19) , γ > . We followed the recommendation of the authors in [35] and fixed γ = 5 . Since the limiting statistic HV ∞ for γ → ∞ is a linear combination of sample skewness in the sense of Mardia and that of Móri, Rohatgi and Székely,we also included HV ∞ .The HJG-test uses a weighted L -distance between the empirical moment generating function of the standardizedsample and the moment generating function of the standard normal distribution. The test statistic is given by HJG β = 1 nβ d n (cid:88) j,k =1 exp (cid:18) (cid:107) Y n,j + Y n,k (cid:107) β (cid:19) − (cid:112) β − / n (cid:88) j =1 exp (cid:18) (cid:107) Y n,j (cid:107) β − (cid:19) + n ( β − d , with β > . In our simulation we fix β = 1 . . For each of the tests based on HV , HV ∞ and HJG . , critical valueswere simulated with 100 000 replications.Finally, the now classical BHEP-test examines the weighted L -distance between the empirical characteristic functionof the standardized data and the characteristic function of the d -variate standard normal distribution. The statistic hasthe simple form BHEP a = 1 n n (cid:88) j,k =1 exp (cid:18) − a (cid:107) Y n,j − Y n,k (cid:107) (cid:19) (40) − (cid:0) a (cid:1) − d n n (cid:88) j =1 exp (cid:18) − a (cid:107) Y n,j (cid:107) a ) (cid:19) + (1 + 2 a )) − d , with a tuning parameter a > . A variety of values of a , i.e., a ∈ { . , . , . , . , , , , , } , has beenconsidered. Critical values can be found in Tables I - III of [36], whereas missing critical values have been simulatedseparately with 100 000 replications.The alternative distributions are chosen to fit the simulation study in [35] and are defined as follows. We denote byNMix ( p, µ, Σ) the normal mixture distributions generated by (1 − p ) N d (0 , I d ) + p N d ( µ, Σ) , p ∈ (0 , , µ ∈ R d , Σ > , where Σ > stands for a positive definite matrix. We write in the notation of above µ = 3 for a d -variate vector of3’s and Σ = B d for a ( d × d ) -matrix containing 1’s on the main diagonal and 0.9’s on each off-diagonal entry. Wedenote by t ν (0 , I d ) the multivariate t -distribution with ν degrees of freedom, see [22]. By DIST d ( ϑ ) we denote the d -variate random vector generated by independently simulated components of the distribution DIST with parametervector ( ϑ ) , where DIST is taken to be the Cauchy distribution C, the logistic distribution L, the Gamma distribution Γ as well as the Pearson Type VII distribution P V II , with ϑ denoting in this specific case the degrees of freedom. Thespherical symmetric distributions where simulated using the R package distrEllipse , see [50], and are denoted by S d ( DIST ) , where DIST stands for the distribution of the radii and was chosen to be the exponential, the beta and the χ -distribution.From Tables 5 to 7, it is obvious that T n,a outperforms the competing tests for most of the alternatives considered, againshowing that the tuning parameter has –compared to the BHEP test– little effect on the power. As in the univariate case T n,a has very strong power against the multivariate t -distribution. If the radial distribution of the spherical symmetricalternatives has compact support, the BHEP test exhibits a better performance than T n,a . The HV - and the HJG . -testhave a good power, but they are mostly dominated by the BHEP-test and the T n,a -test. Again, a data driven choice ofthe tuning parameter a would be beneficial for the test, but to the best of our knowledge no reliable multivariate methodis yet available. We suggest to choose a small tuning parameter like a = 0 . . In 1936 R.A. Fisher presented the now classical data set called Iris Flower , see Table I in [21]. The data consist ofthe four variables sepal, length, sepal width, petal length, and petal width, measured on fifty specimens of each of threetypes or iris, namely Iris setosa , Iris versicolor , and Iris virginica . This data set is included in the statistical language R , and it can be downloaded from the UCI Machine Learning Repository, see [17]. That reference provides a list ofarticles that use this specific data set to validate clustering methods, neural networks or learning algorithms, and itpresents a typical test case for statistical classification techniques in machine learning, such as support vector machines.A visualization of two-dimensional projections of the data set is given in Figure 1.In Table 3 we present empirical p -values simulated with 10 000 replications. As can be seen, the test does not reject thehypothesis of normality on a small significance level (like α = 0 . ) for the different species for each of the tuning16esting multivariate normality by zeros of the harmonic oscillator in characteristic function spaces . . . . . . . . Sepal length . . . . . . . . . . Sepal width Petal length . . . . . Petal width setosaversicolorvirginicasetosaversicolorvirginicasetosaversicolorvirginica Figure 1: 2D projections of the iris data set with coloured species.parameters considered. For the Iris setosa data, however, an increase of the significance level to 0.05 results in arejection of the hypothesis for a = 2 and a = 3 . For the whole data set, we observe a small p -value due to the mixtureof the three populations and consequently reject the hypothesis H . We proved consistency of the test for multivariate normality based on T n,a against each alternative distribution thatsatisfies the moment condition E (cid:107) X (cid:107) < ∞ . Intuitively, the test should be "all the more consistent" if E (cid:107) X (cid:107) = ∞ .In fact, we conjecture consistency of the new test against any non-normal alternative distribution.The limiting random element T ∞ ,a = (cid:107) Z (cid:107) H from Theorem 4.1 b) has the same distribution as (cid:80) ∞ j =1 λ j N j , where the N j are i.i.d. standard normal random variables, and the λ j are the positive eigenvalues corresponding to eigenfunctionsof the linear integral operator Kf ( s ) = (cid:82) R d K ( s, t ) f ( t ) w a ( t ) d t associated with the covariance kernel K fromTheorem 4.1 a), i.e., we have λf ( s ) = (cid:82) R d K ( s, t ) f ( t ) w a ( t ) d t. These positive eigenvalues clearly depend on K andthe weight function w a . It is hardly possible to obtain a closed form expression for λ , λ , . . . . It would be interestingto approximate the eigenvalues either numerically or by some Monte Carlo method, since the largest eigenvalue has acrucial influence on the approximate Bahadur efficiency, see [5] and the monograph [46], as well as [34] for results ondistribution-free L p -type statistics.Species a p -values for T n,a .17esting multivariate normality by zeros of the harmonic oscillator in characteristic function spaces Acknowledgement The authors thank Ioannis Anapolitanos for his help in proving Theorem 1.1 and for sharing his knowledge ofSchrödinger operators. A Proofs and Auxiliary results This section contains the proofs of some of the theorems as well as some auxiliary results that are used in the main text.Recall our standing assumption that the distribution of X is absolutely continuous. In what follows, let ∆ n,j = ( S − / n − I d ) X j − S − / n X n , j = 1 , . . . , n. Notice that ∆ n,j = Y n,j − X j , where the scaled residuals Y n,j are given in (5). Proposition A.1. Let X, X , X , . . . be i.i.d. random vectors satisfying E (cid:107) X (cid:107) < ∞ , E ( X ) = 0 and E ( XX (cid:62) ) = I d .We then have:a) n (cid:88) j =1 (cid:107) ∆ n,j (cid:107) = O P (1) . b) n n (cid:88) j =1 (cid:107) ∆ n,j (cid:107) . s . −→ .c) max j =1 ,...,n (cid:107) ∆ n,j (cid:107) = o P (cid:16) n − / (cid:17) .Proof. a) Notice that (cid:107) ∆ n,j (cid:107) = X (cid:62) j ( S − / n − I d ) X j − X (cid:62) n S − / n ( S − / n − I d ) X j + X n S − n X n and thus, using tr ( AB ) = tr( BA ) , where tr ( C ) denotes the trace of a square matrix C , n (cid:88) j =1 (cid:107) ∆ n,j (cid:107) = tr (cid:16) ( S − / n − I d ) n (cid:88) j =1 X j X (cid:62) j (cid:17) − nX (cid:62) n S − / n ( S − / n − I d ) X n + nX n S − n X n . (41)Due to E (cid:107) X (cid:107) < ∞ , the central limit theorem implies √ n (cid:0) S − n − I d (cid:1) = − S − n √ n ( S n − I d ) = O P (1) . Since √ n ( S − / n − I d )( S − / n + I d ) = √ n ( S − n − I d ) , it follows that √ n ( S − / n − I d ) = O P (1) . In view of (cid:80) nj =1 X j X (cid:62) j = O P ( n ) and √ nX n = O P (1) , we are done.b) After dividing both sides of (41) by n , the first summand on the right hand side converges to 0 P -almost surelybecause n − (cid:80) nj =1 X j X (cid:62) j a . s . −→ I d , and S − / n a . s . −→ I d . The same holds for the other two terms, since X n a . s . −→ .c) Let (cid:107) A (cid:107) sp be the spectral norm of a matrix A . Then (cid:107) ∆ n,j (cid:107) ≤ (cid:107) S − / n − I d (cid:107) sp (cid:107) X j (cid:107) + (cid:107) S − / n (cid:107) sp (cid:107) X n (cid:107) (42)and thus max j =1 ,...,n (cid:107) ∆ n,j (cid:107) ≤ (cid:107) S − / n − I d (cid:107) sp max j =1 ,...,n (cid:107) X j (cid:107) + (cid:107) S − / n (cid:107) sp (cid:107) X n (cid:107) . From Theorem 5.2 of [10] we have max j =1 ,...,n (cid:107) X j (cid:107) /n / . s . −→ . Since √ n (cid:107) S − / n − I d (cid:107) sp = O P (1) , (cid:107) S − / n (cid:107) sp = O P (1) and √ n (cid:107) X n (cid:107) = O P (1) , we are done. Proposition A.2. Let X, X , X , . . . be i.i.d. random vectors satisfying E (cid:107) X (cid:107) < ∞ , E ( X ) = 0 and E ( XX (cid:62) ) = I d .We then have n n (cid:88) j =1 (cid:107) ∆ n,j (cid:107) k (cid:107) X j (cid:107) (cid:96) a . s . −→ as n → ∞ for each k ∈ N and (cid:96) ∈ N such that k + (cid:96) ≤ . Proof. From (42) we have (cid:107) ∆ n,j (cid:107) k (cid:107) X j (cid:107) (cid:96) ≤ k (cid:107) S − / n − I d (cid:107) k sp (cid:107) X j (cid:107) k + (cid:96) + 2 k (cid:107) S − / n (cid:107) k sp (cid:107) X n (cid:107) k (cid:107) X j (cid:107) (cid:96) . Since (cid:107) S − / n − I d (cid:107) sp a . s . −→ , S − / n a . s . −→ I d and (cid:107) X n (cid:107) a . s . −→ , the assertion follows from the strong law of largenumbers. Proof of Proposition 3.2 In what follows, we put Y j = Y n,j and ∆ j = ∆ n,j for the sake of brevity. Notice that cos( t (cid:62) Y j ) = cos( t (cid:62) X j ) − sin(Θ j ) t (cid:62) ∆ j , sin( t (cid:62) Y j ) = sin( t (cid:62) X j ) + cos(Γ j ) t (cid:62) ∆ j , where Θ j , Γ j depend on X , . . . , X n and t and satisfy | Θ j − t (cid:62) X j | ≤ | t (cid:62) ∆ j | , | Γ j − t (cid:62) X j | ≤ | t (cid:62) ∆ j | . (43)Since (cid:107) Y j (cid:107) = (cid:107) X j (cid:107) + (cid:107) ∆ j (cid:107) + 2 X (cid:62) j ∆ j , it follows that V n ( t ) = (cid:80) k =1 V n,k ( t ) , where V n, ( t ) = 1 √ n n (cid:88) j =1 (cid:110) (cid:107) X j (cid:107) CS + ( t, X j ) − µ ( t ) (cid:111) , (44) V n, ( t ) = 1 √ n n (cid:88) j =1 (cid:107) X j (cid:107) t (cid:62) ∆ j (cid:0) cos(Γ j ) − sin(Θ j ) (cid:1) ,V n, ( t ) = 2 √ n n (cid:88) j =1 X (cid:62) j ∆ j CS + ( t, X j ) ,V n, ( t ) = 2 √ n n (cid:88) j =1 X (cid:62) j ∆ j t (cid:62) ∆ j (cid:0) cos(Γ j ) − sin(Θ j ) (cid:1) ,V n, ( t ) = 1 √ n n (cid:88) j =1 (cid:107) ∆ j (cid:107) CS + ( t, X j ) ,V n, ( t ) = 1 √ n n (cid:88) j =1 (cid:107) ∆ j (cid:107) t (cid:62) ∆ j (cid:0) cos(Γ j ) − sin(Θ j ) (cid:1) . We first show that V n,(cid:96) = o P (1) if (cid:96) ∈ { , , } . As for V n, , notice that, by the Cauchy–Schwarz inequality, | V n, ( t ) | ≤ (cid:107) t (cid:107) n − / max i =1 ,...,n (cid:107) X i (cid:107) (cid:80) nj =1 (cid:107) ∆ j (cid:107) . Since E (cid:107) X (cid:107) < ∞ , Theorem 5.2 of [10] yields max i =1 ,...,n (cid:107) X i (cid:107) = o P ( n / ) . In view of Proposition A.1 a), we have V n, = o P (1) . The same proposition immediately also gives V n, = o P (1) . Since | V n, ( t ) | ≤ (cid:107) t (cid:107) n − / max i =1 ,...,n (cid:107) ∆ i (cid:107) (cid:80) nj =1 (cid:107) ∆ j (cid:107) , we have V n, = o P (1) in view ofProposition 41 a) and Proposition A.1 c).We now consider V n, ( t ) . Since | V n, ( t ) | ≤ (cid:107) t (cid:107) n − (cid:80) nj =1 (cid:107) X j (cid:107) n / max i =1 ,...,n (cid:107) ∆ i (cid:107) n / , the law of largenumbers and Proposition 41 c) show that V n, = o P ( n / ) . In view of (43) and Proposition 41 c), the error is thus o P (1) if we replace both Γ j and Θ j with t (cid:62) X j . Moreover, plugging ∆ j = ( S − / n − I d ) X j − S − / n X n into the definitionof V n, ( t ) , the error is o P (1) if we replace S − / n X n with X n . Recalling (13), we thus obtain V n, ( t ) = 1 √ n n (cid:88) j =1 (cid:107) X j (cid:107) t (cid:62) (cid:110) (cid:16) S − / n − I d (cid:17) X j − X n (cid:111) CS − ( t, X j ) + o P (1) . (45)We now use display (2.13) of [36], according to which √ n ( S − / n − I d ) = − √ n n (cid:88) j =1 (cid:0) X j X (cid:62) j − I d (cid:1) + O P (cid:16) n − / (cid:17) . Plugging this expression into (45) we obtain V n, ( t ) = − · n n (cid:88) k =1 (cid:107) X k (cid:107) CS − ( t, X k ) t (cid:62) √ n n (cid:88) j =1 (cid:0) X j X (cid:62) j − I d (cid:1) X k (46) − n n (cid:88) k =1 (cid:107) X k (cid:107) CS − ( t, X k ) 1 √ n n (cid:88) j =1 t (cid:62) X j + o P (1) . ( AB ) = tr( BA ) , where tr denotes trace and AB is a square matrix. Furthermore, theerror is o P (1) if we replace n − (cid:80) nj =1 (cid:107) X j (cid:107) CS − ( t, X j ) and n − (cid:80) nj =1 (cid:107) X j (cid:107) X j CS − ( t, X j ) with their almost surelimits E [ (cid:107) X (cid:107) CS − ( t, X )] = − ∆ ψ − X ( t ) and E [ (cid:107) X (cid:107) X CS − ( t, X )] = −∇ ∆ ψ + X ( t ) , respectively. We therefore obtain V n, ( t ) = 1 √ n n (cid:88) j =1 (cid:110) ∇ ∆ ψ + X ( t ) (cid:0) X j X (cid:62) j − I d (cid:1) t + ∆ ψ − X ( t ) t (cid:62) X j (cid:111) + o P (1) . (47)In the same way, we proceed with V n, ( t ) and, using E [ X CS + ( t, X ) X (cid:62) ] = − Hψ + X ( t ) as well as E [ X CS + ( t, X )] = −∇ ψ − X ( t ) , finally arrive at V n, ( t ) = 1 √ n n (cid:88) j =1 (cid:110) X (cid:62) j ∇ ψ − X ( t ) + X (cid:62) j Hψ − X ( t ) X j + µ ( t ) (cid:111) + o P (1) . (48)By adding (44), (47) and (48), we have V n = (cid:101) V n + o P (1) , where (cid:101) V n is given in (22). (cid:3) Proof of Theorem 6.1 By analogy with Z n ( t ) , as defined in (17), let Z n ( t ) := 1 √ n n (cid:88) j =1 (cid:8) CS + ( t, X j ) − m ( t ) (cid:9) , t ∈ R d . A straightforward calculation gives n − / ( Z n ( t ) − Z n ( t )) = (cid:80) j =1 U n,j ( t ) , where U n, ( t ) = 1 n n (cid:88) j =1 (cid:107) X j (cid:107) (cid:0) CS + ( t, Y n,j ) − CS + ( t, X j ) (cid:1) ,U n, ( t ) = 2 n n (cid:88) j =1 X (cid:62) j ∆ n,j CS + ( t, Y n,j ) , U n, ( t ) = 1 n n (cid:88) j =1 (cid:107) ∆ n,j (cid:107) CS + ( t, Y n,j ) . Since | cos( t (cid:62) Y n,j ) − cos( t (cid:62) X j ) | ≤ (cid:107) t (cid:107) (cid:107) ∆ n,j (cid:107) , | sin( t (cid:62) Y n,j ) − sin( t (cid:62) X j ) | ≤ (cid:107) t (cid:107) (cid:107) ∆ n,j (cid:107) , the Cauchy–Schwarzinequality gives | U n, ( t ) | ≤ (cid:107) t (cid:107) (cid:0) n − (cid:80) nj =1 (cid:107) X j (cid:107) (cid:1) / (cid:0) n − (cid:80) nj =1 (cid:107) ∆ n,j (cid:107) (cid:1) / . By the strong law of large numbers,we have n − (cid:80) nj =1 (cid:107) X j (cid:107) . s . −→ E (cid:107) X (cid:107) . In view of Proposition A.1 b), we thus obtain (cid:107) U n, (cid:107) H a . s . −→ . Next, noticethat, again by the Cauchy–Schwarz inequality, | U n, ( t ) | ≤ (cid:0) n − (cid:80) nj =1 (cid:107) X j (cid:107) (cid:1) / (cid:0) n − (cid:80) nj =1 (cid:107) ∆ n,j (cid:107) (cid:1) / . Hence,we have (cid:107) U n, (cid:107) H a . s . −→ . Finally, | U n, ( t ) | ≤ n − (cid:80) nj =1 (cid:107) ∆ n,j (cid:107) which, in view of Proposition A.1 b), shows that (cid:107) U n, (cid:107) H a . s . −→ . Summarizing, we have (cid:107) n − / ( Z n ( · ) − Z n ( · )) (cid:107) H a . s . −→ . (49)By the strong law of large numbers in Banach spaces, it follows that n − / Z n ( · ) a . s . −→ µ ( · ) − m ( · ) as n → ∞ in H . Inview of (49), we thus obtain T n,a n = (cid:107) n − / Z n ( · ) (cid:107) H a . s . −→ (cid:107) µ ( · ) − m ( · ) (cid:107) H = (cid:90) (cid:0) µ ( t ) − m ( t ) (cid:1) w a ( t ) d t. (cid:3) Proof of Theorem 6.4 Starting with (28), we have ∆ a = (cid:90) (cid:0) E ( (cid:107) X (cid:107) CS + ( t, X )) (cid:1) exp( − a (cid:107) t (cid:107) ) d t − (cid:90) ( d − (cid:107) t (cid:107) ) E ( (cid:107) X (cid:107) CS + ( t, X )) exp (cid:18) − a + 12 (cid:107) t (cid:107) (cid:19) d t + (cid:90) ( d − (cid:107) t (cid:107) ) exp( − ( a + 1) (cid:107) t (cid:107) ) d t =: I ,a − I ,a + I ,a , X , X be independent copies of X , Fubini’s theorem, the addition theorems for the sine function and thecosine function, considerations of symmetry and (7) yield I ,a = (cid:16) πa (cid:17) d/ E (cid:104) (cid:107) X (cid:107) (cid:107) X (cid:107) exp (cid:18) − (cid:107) X − X (cid:107) a (cid:19) (cid:105) . From (8) and (9), we have I ,a = 2(2 π ) d/ (2 a + 1) d/ E (cid:104) (cid:107) X (cid:107) (cid:0) (cid:107) X (cid:107) + 2 da (2 a + 1) (cid:1) exp (cid:18) − (cid:107) X (cid:107) a + 1) (cid:19) (cid:105) ,I ,a = π d/ ( a + 1) d/ (cid:18) a ( a + 1) d + d ( d + 2)4 (cid:19) and thus a (cid:16) aπ (cid:17) d/ ∆ a = 2 a E (cid:104) (cid:107) X (cid:107) (cid:107) X (cid:107) exp (cid:18) − (cid:107) X − X (cid:107) a (cid:19) (cid:105) − a ) d/ (2 a + 1) d/ E (cid:104) (cid:107) X (cid:107) (cid:0) (cid:107) X (cid:107) + 2 da (2 a + 1) (cid:1) exp (cid:18) − (cid:107) X (cid:107) a + 1) (cid:19) (cid:105) + 2 a d/ ( a + 1) d/ (cid:18) a ( a + 1) d + d ( d + 2)4 (cid:19) =: J ,a − J ,a + J ,a , say. An expansion of the exponential terms, dominated convergence (notice that exp( − u ) ≤ − u + u if u ≥ ) anda binomial expansion gives J ,a = 2 ad − d E (cid:107) X (cid:107) + E (cid:0) (cid:107) X (cid:107) (cid:107) X (cid:107) X (cid:62) X (cid:1) + O ( a − ) ,J ,a = 4 ad − d − d − d E (cid:107) X (cid:107) + O ( a − ) ,J ,a = 2 ad − d − d + O ( a − ) and thus lim a →∞ a (cid:16) aπ (cid:17) d/ ∆ a = E (cid:0) (cid:107) X (cid:107) (cid:107) X (cid:107) X (cid:62) X (cid:1) = E (cid:0) (cid:107) X (cid:107) X (cid:1) (cid:62) E (cid:0) (cid:107) X (cid:107) X (cid:1) = (cid:13)(cid:13) E ( (cid:107) X (cid:107) X ) (cid:13)(cid:13) . Notice that the condition E (cid:107) X (cid:107) < ∞ is not only needed for the existence of the final limit, but it also occurs whendealing with J ,a . (cid:3) Proof of Lemma 6.6 Putting Y j = Y n,j and ∆ j = Y j − X j , we have cos( t (cid:62) Y j ) = cos( t (cid:62) X j ) − t (cid:62) ∆ j sin( t (cid:62) X j ) + ε j ( t ) , sin( t (cid:62) Y j ) = sin( t (cid:62) X j ) + t (cid:62) ∆ j cos( t (cid:62) X j ) + η j ( t ) , where | ε j ( t ) | , | η j ( t ) | ≤ (cid:107) t (cid:107) (cid:107) ∆ j (cid:107) (see [36], p. 8) and thusCS ± ( t, Y j ) = CS ± ( t, X j ) ± t (cid:62) ∆ j CS ∓ ( t, X j ) + ε j ( t ) + η j ( t ) . (50)To prove a), notice that Ψ ,n ( t ) = n − (cid:80) nj =1 CS + ( t, X j ) X j + R ,n ( t ) , where R ,n ( t ) = 1 n n (cid:88) j =1 (cid:0) t (cid:62) ∆ j CS − ( t, X j ) + ε j ( t ) + η j ( t ) (cid:1) X j + 1 n n (cid:88) j =1 (cid:0) CS + ( t, X j ) + t (cid:62) ∆ j CS − ( t, X j ) + ε j ( t ) + η j ( t ) (cid:1) ∆ j . | CS ± ( t, X j ) | ≤ and | ε j ( t ) + η j ( t ) | ≤ (cid:107) t (cid:107) (cid:107) ∆ j (cid:107) , the Cauchy–Schwarz inequality yields | R ,n ( t ) | ≤ (cid:107) t (cid:107) n n (cid:88) j =1 (cid:107) ∆ j (cid:107)(cid:107) X j (cid:107) + 2 (cid:107) t (cid:107) n n (cid:88) j =1 (cid:107) ∆ j (cid:107) (cid:107) X j (cid:107) + 2 n n (cid:88) j =1 (cid:107) ∆ j (cid:107) + 2 (cid:107) t (cid:107) n n (cid:88) j =1 (cid:107) ∆ j (cid:107) + 2 (cid:107) t (cid:107) n n (cid:88) j =1 (cid:107) ∆ j (cid:107) . In view of Proposition A.2, each summand converges to zero almost surely, which proves a). The remaining assertionsb), . . . , e) are treated similarly. To tackle b) and e), one can show negligibility of terms by using the fact that the spectralnorm (cid:107) · (cid:107) sp of a matrix satisfies (cid:107) xy (cid:62) (cid:107) sp = (cid:107) x (cid:107) (cid:107) y (cid:107) , if x, y are (column) vectors in R d . The condition E (cid:107) X (cid:107) < ∞ is needed for part e), since (cid:107) Y j (cid:107) (cid:107) Y j Y (cid:62) j (cid:107) sp ≤ (cid:0) (cid:107) X j (cid:107) + 2 (cid:107) X j (cid:107)(cid:107) ∆ j (cid:107) + (cid:107) ∆ j (cid:107) (cid:1) = (cid:107) X j (cid:107) + 4 (cid:107) X j (cid:107) (cid:107) ∆ j (cid:107) + 6 (cid:107) X j (cid:107) (cid:107) ∆ j (cid:107) + 4 (cid:107) X j (cid:107)(cid:107) ∆ j (cid:107) + (cid:107) ∆ j (cid:107) , and multiplication with (cid:107) ∆ j (cid:107) (which arises from an expansion of CS + ( t, Y n,j ) ) gives monomials of order 6. (cid:3) Proof of Theorem 6.7 The proof is similar to the proof of Theorem 5 of [33] and will therefore only be sketched. From (26), (23) and (30), wehave σ a = (cid:80) i,j =1 σ i,ja , where σ i,ja = 4 (cid:90) (cid:90) L i,j ( s, t ) z ( s ) z ( t ) w a ( s ) w a ( t ) d s d t, and L i,j ( s, t ) is given in (32). It thus suffices to prove (cid:98) σ i,jn,a P −→ σ i,ja for each pair ( i, j ) , where (cid:98) σ i,jn,a is given in (39) .The first step of the proof is to replace L i,jn ( s, t ) with L i,jn, ( s, t ) , which arises from L i,jn ( s, t ) given in (34) by throughoutreplacing Y n,k with X k in the functions v n,j ( s, Y n,k ) , j ∈ { , . . . , } . Notice that this replacement also refers to thequantities Ψ − ,n ( s ) , Ψ ,n ( s ) , Ψ − ,n ( s ) and Ψ ,n ( s ) that figure in the definition of v n, , v n, and v n, . Moreover, wereplace z n ( s ) with z n, ( s ) = n − (cid:80) nj =1 CS + ( s, X j ) (cid:107) X j (cid:107) − m ( s ) . Putting (cid:98) σ i,jn, ,a = 4 (cid:90) (cid:90) L i,jn, ( s, t ) z n, ( s ) z n, ( t ) w a ( s ) w a ( t ) d s d t, Fubini’s theorem shows that (cid:98) σ i,jn, ,a P −→ σ i,ja . It thus remains to prove (cid:98) σ i,jn,a − (cid:98) σ i,jn, ,a = o P (1) as n → ∞ . To tackle (cid:98) σ i,jn,a − (cid:98) σ i,jn, ,a , we put Λ n ( s, t ) = L i,jn ( s, t ) z n ( s ) z n ( t ) − L i,jn, ( s, t ) z n, ( s ) z n, ( t ) and notice that Λ n ( s, t ) = (cid:16) L i,jn ( s, t ) − L i,jn, ( s, t ) (cid:17) z n ( s ) z n ( t ) (51) + L i,jn, ( s, t ) ( z n ( s ) z n ( t ) − z n, ( s ) z n, ( t )) , (52) z n ( s ) z n ( t ) − z n, ( s ) z n, ( t ) = ( z n ( s ) − z n, ( s )) ( z n ( t ) − z n, ( t ))+ z n, ( s ) ( z n ( t ) − z n, ( t )) + z n, ( t ) ( z n ( s ) − z n, ( s )) . From (36), we have | z n ( s ) | ≤ d + m ( s ) . Moreover, | z n, ( s ) | ≤ n − (cid:80) nj =1 (cid:107) X j (cid:107) + m ( s ) . A Taylor expansionshows that | z n ( s ) − z n, ( s ) | is bounded from above by a finite sum of terms of the type (cid:107) s (cid:107) (cid:96) n − (cid:80) nj =1 (cid:107) X j (cid:107) β (cid:107) ∆ n,j (cid:107) γ ,where s ∈ { , } , γ ≥ and β + γ ≤ . Next, by straightforward calculations we obtain that | L i,jn, ( s, t ) | isbounded from above by (cid:107) s (cid:107) (cid:96) (cid:107) t (cid:107) m times finitely many products of the type n − (cid:80) nj =1 (cid:107) X j (cid:107) β , where (cid:96), m ∈ { , } and β ∈ { , , , } . It now follows from Proposition A.2 that the term figuring in (52), multiplied with w a ( s ) w a ( t ) and integrated over R d × R d , converges to zero in probability.22esting multivariate normality by zeros of the harmonic oscillator in characteristic function spacesAs for the term figuring in (51), we have | z n ( s ) z n ( t ) | ≤ (2 d + m ( s ))(2 d + m ( t )) . To tackle L i,jn ( s, t ) − L i,jn, ( s, t ) ,we confine ourselves to the case i = 1 , j = 2 , since the other cases can be treated in a similar way. From (35), we have L , n ( s, t ) = − (cid:40) n n (cid:88) j =1 CS − ( s, Y j ) (cid:107) Y j (cid:107) Y j (cid:41) (cid:62) (cid:40) n n (cid:88) j =1 CS + ( t, Y j ) (cid:107) Y j (cid:107) Y j Y (cid:62) j s (cid:41) (53) + 12 (cid:40) n n (cid:88) j =1 CS + ( t, Y j ) (cid:107) Y j (cid:107) (cid:41)(cid:40) n n (cid:88) j =1 CS − ( s, Y j ) (cid:107) Y j (cid:107) Y (cid:62) j s (cid:41) . (54)Moreover, L , n, ( s, t ) = − (cid:40) n n (cid:88) j =1 CS − ( s, X j ) (cid:107) X j (cid:107) X j (cid:41) (cid:62) (cid:40) n n (cid:88) j =1 CS + ( t, X j ) (cid:107) X j (cid:107) X j X (cid:62) j s (cid:41) (55) + 12 (cid:40) n n (cid:88) j =1 CS + ( t, X j ) (cid:107) X j (cid:107) (cid:41)(cid:40) n n (cid:88) j =1 CS − ( s, X j ) (cid:107) X j (cid:107) X (cid:62) j s (cid:41) . (56)Using (50), some calculations show that, for each i ∈ { , } , the norm of the difference of the i th curly bracket in (53)and the i th curly bracket in (55) is bounded from above by finite sum of terms of the type (cid:107) s (cid:107) (cid:96) (cid:107) t (cid:107) m n n (cid:88) j =1 (cid:107) X j (cid:107) β (cid:107) ∆ n,j (cid:107) γ , (57)where (cid:96), m ∈ { , , } , γ ≥ , and β + γ ≤ . Since the same holds for the norm of the difference of the i th curlybracket in (54) and the i th curly bracket in (56), i ∈ { , } , it follows that | L , n ( s, t ) − L , n, ( s, t ) | is bounded fromabove by a finite sum of terms, which are either products of two terms of type (57), or a product of a term of type (57)and one of the terms inside one of the curly brackets in (55) or (56). From Proposition A.2, it follows that the termfiguring in (51), multiplied with w a ( s ) w a ( t ) and integrated over R d × R d , converges to zero in probability. (cid:3) References [1] J. S. Allison and L. Santana. On a data-dependent choice of the tuning parameter appearing in certain goodness-of-fit tests. Journal of Statistical Computation and Simulation , 85(16):3276–3288, 2015.[2] J. A. V. Alva and E. G. Estrada. A generalization of Shapiro–Wilk’s test for multivariate normality. Communica-tions in Statistics - Theory and Methods , 38(11):1870–1883, 2009.[3] T. W. Anderson and D. A. Darling. Asymptotic theory of certain "goodness of fit" criteria based on stochasticprocesses. Annals of Mathematical Statistics , 23:193–212, 1952.[4] M. A. Arcones. Two tests for multivariate normality based on the characteristic function. Mathematical Methodsof Statistics , 16(3):177–201, 2007.[5] R. R. Bahadur. Stochastic comparison of tests. The Annals of Mathematical Statistics , 31(2):276–295, 06 1960.[6] L. Baringhaus, B. Ebner, and N. Henze. The limit distribution of weighted L -Goodness-of-Fit statistics underfixed alternatives, with applications. The Annals of the Institute of Statistical Mathematics , 69(5):969–995, 2017.[7] L. Baringhaus and N. Henze. A consistent test for multivariate normality based on the empirical characteristicfunction. Metrika , 35(1):339–348, 1988.[8] L. Baringhaus and N. Henze. Limit distributions for measures of multivariate skewness and kurtosis based onprojections. Journal of Multivariate Analysis , 38(1):51–69, 1991.[9] L. Baringhaus and N. Henze. Limit distributions for Mardia’s measure of multivariate skewness. The Annals ofStatistics , 20(4):1889–1902, 1992.[10] O. Barndorff-Nielsen. On the limit behavior of extreme order statistics. The Annals of Mathematical Statistics ,34(3):992–1002, 1963.[11] S. Betsch and B. Ebner. Testing normality via a distributional fixed point property in the stein characterization. TEST , Feb 2019.[12] A. Bowman and P. Foster. Adaptive smoothing and density-based tests of multivariate normality. Journal of theAmerican Statistical Association , 88:529–537, 1993.23esting multivariate normality by zeros of the harmonic oscillator in characteristic function spaces[13] D. Cox and N. Small. Testing multivariate normality. Biometrika , 65:263–272, 1978.[14] E. del Barrio, J. A. Cuesta-Albertos, C. Matrán, S. Csörgö, C. M. Cuadras, T. de Wet, E. Giné, R. Lockhart,A. Munk, and W. Stute. Contributions of empirical and quantile processes to the asymptotic theory of goodness-of-fit tests. TEST , 9(1):1–96, 2000.[15] E. del Barrio, J. A. Cuesta-Albertos, C. Matran, and J. M. Rodriguez-Rodriguez. Tests of goodness of fit based onthe L -Wasserstein distance. The Annals of Statistics , 27(4):1230–1239, 1999.[16] J. A. Doornik and H. Hansen. An omnibus test for univariate and multivariate normality. Oxford Bulletin ofEconomics and Statistics , 70:927–939, 2008.[17] D. Dua and C. Graff. UCI machine learning repository, 2017.[18] M. L. Eaton and M. D. Perlman. The non-singularity of generalized sample covariance matrices. The Annals ofStatistics , 1(4):710–717, 1973.[19] B. Ebner. Asymptotic theory for the test for multivariate normality by Cox and Small. Journal of MultivariateAnalysis , 111:368–379, 2012.[20] P. J. Farrell, M. Salibian-Barrera, and K. Naczk. On tests for multivariate normality and associated simulationstudies. Journal of Statistical Computation and Simulation , 77(12):1065–1080, 2007.[21] R. A. Fisher. The use of multiple measurements in taxonomic problems. Annals of Eugenics , 7(2):179–188, 1936.[22] A. Genz and F. Bretz. Computation of Multivariate Normal and t Probabilities . Lecture Notes in Statistics, 195.Springer, Dordrecht, 2009.[23] J. Gross and U. Ligges. nortest: Tests for Normality , 2015. R package version 1.0-4.[24] S. J. Gustafson. Mathematical Concepts of Quantum Mechanics . Springer, Heidelberg, 2nd edition, 2011.[25] N. Henze. An approximation to the limit distribution of the Epps–Pulley test statistic for normality. Metrika ,37:7–18, 1990.[26] N. Henze. The asymptotic behaviour of a variant of multivariate kurtosis. Communications in Statistics - Theoryand Methods , 23(4):1047–1061, 1994.[27] N. Henze. On Mardia’s kurtosis test for multivariate normality. Communications in Statistics - Theory andMethods , 23(4):1031–1045, 1994.[28] N. Henze. Extreme smoothing and testing for multivariate normality. Statistics & Probability Letters , 35(3):203–213, 1997.[29] N. Henze. Limit laws for multivariate skewness in the sense of Móri, Rohatgi and Székely. Statistics & ProbabilityLetters , 33(3):299–307, 1997.[30] N. Henze. Invariant tests for multivariate normality: A critical review. Statistical Papers , 43:467–506, 2002.[31] N. Henze and M. D. Jiménez-Gamero. A class of tests for multinormality with i.i.d. and GARCH data based onthe empirical moment generating function. TEST , 28(2):499—-521, 2019.[32] N. Henze, M. D. Jiménez-Gamero, and S. G. Meintanis. Characterizations of multinormality and correspondingtests of fit, including for GARCH models. Econometric Theory , 35(3):510—-546, 2019.[33] N. Henze and C. Mayer. More good news on the hkm test for multivariate reflected symmetry about an unknowncentre. Annals of the Institute of Statistical Mathematics , pages 1–30, 2019.[34] N. Henze, Y. Nikitin, and B. Ebner. Integral distribution-free statistics of lp-type and their asymptotic comparison. Computational Statistics & Data Analysis , 53(9):3426 – 3438, 2009.[35] N. Henze and J. Visagie. Testing for normality in any dimension based on a partial differential equa-tion involving the moment generating function. The Annals of the Institute of Statistical Mathematics ,https://doi.org/10.1007/s10463-019-00720-8:1—-29.[36] N. Henze and T. Wagner. A new approach to the BHEP tests for multivariate normality. Journal of MultivariateAnalysis , 61(1):1–23, 1997.[37] N. Henze and B. Zirkler. A class of invariant and consistent tests for multivariate normality. Communications inStatistics - Theory and Methods , 19(10):3595–3617, 1990.[38] K. Kankainen, S. Taskinen, and H. Oja. Tests of multinormality based on location vectors and scatter matrices. Statistical Methods & Applications , 16:357–359, 2007.[39] I. Kim and S. Park. Likelihood ratio tests for multivariate normality. Communications in Statistics - Theory andMethods , 47(8):1923–1934, 2018. 24esting multivariate normality by zeros of the harmonic oscillator in characteristic function spaces[40] É. Krauczi. A study of the quantile correlation test for normality. TEST , 18(1):156–165, 2009.[41] J. Malkovich and A. Afifi. On tests of multivariate normality. Journal of the American Statistical Association ,68(341):176–179, 1973.[42] K. V. Mardia. Measures of multivariate skewness and kurtosis with applications. Biometrika , 57(3):519–530,1970.[43] K. V. Mardia. Applications of some measures of multivariate skewness and kurtosis in testing normality androbustness studies. Sankhya Series B , 36(2):115–128, 1974.[44] C. J. Mecklin and D. J. Mundfrom. A Monte Carlo comparison of the type I and type II error rates of tests ofmultivariate normality. Journal of Statistical Computation and Simulation , 75(2):93–107, 2005.[45] T. F. Móri, V. K. Rohatgi, and G. J. Székely. On multivariate skewness and kurtosis. Teor. Veroyatn. Primen. ,38(3):675–679, 1993.[46] Y. Nikitin. Asymptotic Efficiency of Nonparametric Tests . Cambridge University Press, 1995.[47] J. Pudelko. On a new affine invariant and consistent test for multivariate normality. Probability and MathematicalStatistics , 25(1):43–54, 2005.[48] R Core Team. R: A Language and Environment for Statistical Computing . R Foundation for Statistical Computing,Vienna, Austria, 2019.[49] X. Romão, R. Delgado, and A. Costa. An empirical power comparison of univariate goodness-of-fit tests fornormality. Journal of Statistical Computation and Simulation , 80(5):545–591, 2010.[50] P. Ruckdeschel, M. Kohl, T. Stabla, and F. Camphausen. S4 classes for distributions. R News , 6(2):2–6, 2006.[51] S. S. Shapiro and R. S. Francia. An approximate analysis of variance test for normality. Journal of the AmericanStatistical Association , 67(337):215–216, 1972.[52] S. S. Shapiro and M. B. Wilk. An analysis of variance test for normality (complete samples). Biometrika ,52(3/4):591–611, 1965.[53] B. Sürücü. Goodness-of-fit tests for multivariate distributions. Communications in Statistics - Theory and Methods ,35(7):1319–1331, 2006.[54] G. J. Székely and M. L. Rizzo. A new test for multivariate normality. Journal of Multivariate Analysis , 93:58–80,2005.[55] C. Tenreiro. On the choice of the smoothing parameter for the BHEP goodness-of-fit test. Computational Statistics& Data Analysis , 53:1038–1053, 2009.[56] S. Thangavelu. Lectures on Hermite and Laguerre expansions . Mathematical Notes ; 42. Princeton Univ. Press,Princeton, NJ, 1993.[57] M. Thulin. Tests for multivariate normality based on canonical correlations. Statistical Methods & Applications ,23(2):189–208, 2014.[58] V. Voinov, N. Pya, R. Makarov, and Y. Voinov. New invariant and consistent chi-squared type goodness-of-fit testsfor multivariate normality and a related comparative simulation study. Communications in Statistics - Theory andMethods , 45(11):3249–3263, 2016. 25esting multivariate normality by zeros of the harmonic oscillator in characteristic function spaces T n,a Alt. n \ a (0 , 20 5 5 5 5 5 5 5 5 5 5 5 550 5 5 5 5 5 5 5 5 5 5 5 5100 5 5 5 5 5 5 5 5 5 5 5 5NMix (0 . , , . 20 13 15 18 19 19 19 23 28 28 27 (0 . , , 20 38 39 40 39 38 36 36 40 43 42 46 50 67 69 70 68 65 61 64 78 80 80 20 39 39 39 38 38 37 34 35 37 34 33 69 69 68 67 66 64 57 64 65 61 60 100 89 89 89 88 87 86 81 88 89 86 85 t 23 23 22 22 21 21 20 19 20 18 17 2250 40 39 38 37 36 31 35 37 32 31 100 61 61 61 59 58 55 46 56 58 50 48 t 12 12 12 12 12 11 11 10 11 9 9 50 19 19 19 18 17 17 14 16 17 13 12 28 28 28 27 26 25 19 22 24 16 15 U ( −√ , √ 20 1 1 1 1 1 1 3 17 13 17 850 26 15 13 9 5 2 8 70 55 58 47100 96 89 81 73 59 24 45 ∗ 99 95 95 97 χ 20 35 37 39 41 41 42 44 44 44 42 38 4250 79 81 83 84 85 85 87 88 84 80 85100 99 99 99 99 99 99 ∗ ∗ ∗ 99 99 ∗ χ 20 16 17 17 18 18 18 18 18 17 16 1850 36 39 42 43 44 44 42 42 39 33 40100 67 71 73 75 75 76 75 74 68 61 71B (1 , 20 34 38 42 44 45 45 50 58 52 51 5350 90 91 92 93 93 92 94 98 98 94 95 97100 ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ B (2 , 20 11 12 13 14 15 15 15 16 16 16 14 1450 31 33 37 39 40 40 43 47 45 39 40100 78 79 79 80 80 79 81 89 80 76 82 Γ(1 , 20 64 68 72 73 74 74 77 83 83 77 77 8050 99 99 99 ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ Γ(5 , 20 21 22 23 24 25 25 25 24 24 23 20 2450 50 54 57 59 59 60 59 59 55 49 56100 84 87 88 89 89 90 90 90 85 81 88W (1 , . 20 65 68 72 74 75 74 ∗ 84 83 78 77 8050 99 99 99 ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ Gum (1 , 20 28 30 32 32 33 33 31 32 31 28 3250 63 67 69 70 71 72 72 69 69 66 60 67100 92 94 94 95 95 95 94 94 91 89 93LN (0 , 20 84 86 88 89 89 89 91 93 93 91 90 9150 ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ Table 4: Empirical power of T n,a against competitors ( d = 1 , α = 0 . , 10 000 replications)26esting multivariate normality by zeros of the harmonic oscillator in characteristic function spaces BHEP a T n,a Alt. n HV HV ∞ HJG . (0 . , , I ) 20 32 37 33 38 39 41 38 23 17 11 8 31 33 34 35 35 38 39 38 3750 69 85 73 86 87 86 83 62 47 29 16 73 80 82 83 84 86 86 85 83NMix (0 . , , B ) 25 25 24 23 20 13 10 8 7 27 27 27 27 27 26 25 24 2450 53 43 52 43 45 47 43 37 23 17 12 9 54 55 55 55 55 53 50 48 45NMix (0 . , , B ) 16 16 17 16 16 17 16 16 12 11 9 8 19 19 19 18 18 17 16 1550 25 21 24 21 21 26 30 32 28 23 16 11 40 40 37 35 34 30 27 23 21 t (0 , I ) 20 54 53 54 52 52 52 51 48 37 31 24 15 57 57 57 57 55 53 5150 85 78 85 78 80 84 85 83 74 67 54 37 88 88 89 89 89 88 85 81 t (0 , I ) 20 33 31 33 31 31 30 28 25 16 13 10 8 34 34 34 34 34 32 31 3050 60 53 60 52 53 55 53 49 34 27 19 12 63 63 64 64 64 61 57 53 t (0 , I ) 20 16 16 16 15 15 14 13 11 8 7 6 6 17 17 17 16 16 15 1550 29 25 28 25 24 23 21 17 11 10 8 7 29 29 30 30 30 28 26 24C (0 , 20 95 93 95 93 94 95 96 96 94 91 82 97 97 97 97 97 97 96 95 9450 ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ L (0 , 20 17 17 17 16 16 15 14 12 8 7 6 6 17 17 17 18 18 17 16 15 1550 29 24 29 24 24 24 21 18 12 10 8 7 31 31 31 32 32 30 28 26 24 Γ (0 . , 20 90 96 92 96 97 98 99 99 98 97 95 88 92 93 93 94 95 97 97 96 9650 ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ Γ (5 , 20 22 25 23 26 27 27 25 16 12 9 7 20 21 22 22 23 25 26 26 2650 53 68 56 69 70 70 64 40 28 18 10 47 53 56 58 60 67 68 68 67P V II (10) 20 27 26 27 26 26 25 23 20 12 10 8 7 28 28 28 28 28 28 27 25 2550 50 44 50 43 44 44 42 38 25 20 15 10 55 55 55 54 51 47 44P V II (20) 13 13 12 11 10 7 6 6 5 14 14 14 14 14 14 13 13 1350 23 20 23 20 20 18 16 13 9 7 7 6 24 24 24 24 24 24 22 21 20 S ( Exp (1)) 20 67 66 67 65 66 70 74 77 78 72 59 77 75 75 76 77 76 74 69 6450 92 83 92 83 89 96 99 99 ∗ ∗ 99 97 98 98 98 99 99 99 98 96 91 S (B(1 , 20 14 15 14 14 15 17 20 24 33 35 28 24 22 21 21 22 21 18 16 1450 10 12 10 11 13 23 38 52 75 76 66 34 32 32 33 34 32 26 20 14 S ( χ ) 20 22 21 22 21 21 20 18 16 10 9 8 7 23 23 23 23 23 23 22 21 2050 38 32 38 32 32 33 32 29 20 15 12 9 43 42 42 43 43 40 36 32 Table 5: Empirical power of T n,a against competitors ( d = 2 , α = 0 . , 10 000 replications)27esting multivariate normality by zeros of the harmonic oscillator in characteristic function spaces BHEP a T n,a Alt. n HV HV ∞ HJG . (0 . , , I ) 20 33 39 34 39 41 44 44 38 20 14 9 7 31 34 35 35 35 37 39 39 3850 67 92 72 91 93 93 89 64 42 21 10 71 85 87 86 86 90 91 90 88NMix (0 . , , B ) 41 41 39 35 28 16 12 9 7 41 42 42 42 42 42 40 3850 80 72 79 72 73 74 68 60 36 25 15 9 80 80 81 81 81 80 79 76 73NMix (0 . , , B ) 20 28 29 28 28 28 29 30 28 22 17 12 9 37 37 35 34 33 32 30 27 2550 45 40 44 38 42 53 63 67 62 49 28 13 75 72 68 67 62 58 49 40 t (0 , I ) 20 65 66 65 64 64 64 61 56 41 32 21 13 70 69 69 69 69 67 64 6150 94 91 94 90 92 94 94 93 85 75 56 28 97 97 97 97 97 96 95 92 t (0 , I ) 20 40 40 40 39 39 38 34 29 17 13 10 8 44 43 43 43 43 42 38 3650 72 68 72 67 68 69 66 61 42 32 20 11 78 78 78 78 79 77 73 67 t (0 , I ) 20 20 20 20 19 19 18 15 12 8 7 7 6 21 21 21 21 21 21 20 18 1850 39 35 39 34 33 32 27 22 13 10 8 6 42 42 42 43 43 41 36 33C (0 , 20 98 98 98 97 98 98 99 99 98 96 91 77 99 99 99 99 99 99 99 98 9750 ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ L (0 , 20 16 16 16 16 16 15 13 11 7 6 6 6 18 18 18 18 18 17 16 16 1550 31 29 31 27 27 26 22 18 11 10 8 6 37 36 36 36 36 34 29 26 Γ (0 . , 20 92 98 93 98 98 99 ∗ ∗ 98 96 89 68 94 96 95 95 95 97 98 97 9750 ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ Γ (5 , 20 22 26 23 27 28 28 24 13 10 8 6 18 20 21 22 22 24 26 27 2650 54 74 58 75 77 74 65 36 23 13 8 47 56 60 62 63 69 72 72 71P V II (10) 20 31 30 30 29 29 28 24 19 11 9 7 7 32 32 32 32 32 31 28 2750 59 54 59 52 53 53 48 41 26 18 12 8 66 65 65 65 64 62 57 52P V II (20) 20 14 14 14 13 13 12 11 9 6 6 6 6 15 15 15 15 15 14 14 13 1250 26 24 26 23 22 21 17 13 8 8 7 6 29 27 28 28 28 28 26 24 22 S ( Exp (1)) 20 89 89 88 87 89 91 94 95 95 94 90 76 95 94 94 94 95 94 91 8650 99 98 99 98 99 ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ S (B(1 , 20 45 49 44 47 48 53 60 65 73 73 65 49 68 64 61 61 62 62 58 51 4450 56 54 56 50 59 82 93 97 99 99 98 92 94 91 90 91 92 93 90 81 63 S ( χ ) 20 43 44 42 43 43 43 41 38 29 23 16 11 52 50 50 50 50 48 43 4050 73 68 73 66 69 76 79 79 73 62 44 20 86 85 85 86 87 85 79 69 Table 6: Empirical power of T n,a against competitors ( d = 3 , α = 0 . , 10 000 replications)28esting multivariate normality by zeros of the harmonic oscillator in characteristic function spaces BHEP a T n,a Alt. n HV HV ∞ HJG . (0 . , , I ) 20 31 34 32 35 35 35 31 24 12 9 8 3 26 27 29 30 30 31 33 34 3350 51 86 58 87 90 93 85 38 17 10 8 48 65 77 78 76 75 81 85 83NMix (0 . , , B ) 59 58 54 43 31 14 10 9 3 56 57 58 59 59 60 60 58 5550 94 94 92 86 76 43 24 13 10 95 95 95 95 96 96 95 94NMix (0 . , , B ) 20 55 59 54 57 57 59 59 55 37 24 18 4 71 70 68 66 65 63 58 5150 78 78 78 74 81 93 97 99 95 77 35 20 98 98 98 97 96 95 92 81 t (0 , I ) 20 80 81 79 79 79 77 70 62 39 26 20 5 84 83 83 83 83 82 79 7450 99 99 99 98 99 99 99 98 92 76 42 24 ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ t (0 , I ) 20 54 55 54 53 53 49 40 30 15 11 10 3 59 59 58 58 58 57 57 53 4850 89 89 89 87 87 88 84 78 52 31 15 11 96 96 95 94 94 95 94 92 87 t (0 , I ) 20 25 25 24 24 24 22 17 12 8 7 7 2 28 28 28 28 27 27 26 24 2250 55 54 55 51 51 47 38 29 14 10 8 7 65 63 62 63 63 62 58 49C (0 , ∗ ∗ ∗ ∗ ∗ ∗ 99 98 94 88 61 ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ L (0 , 20 17 18 17 17 17 15 12 9 6 6 6 3 19 19 19 19 19 19 18 17 1550 36 35 35 32 32 29 22 16 9 8 7 6 45 43 43 43 43 42 37 30 Γ (0 . , 20 93 98 94 98 99 99 ∗ 99 93 81 69 34 93 96 97 96 96 96 97 98 9750 ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ 99 91 ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ Γ (5 , 20 19 23 21 24 24 23 18 9 7 7 4 17 18 19 19 19 20 23 24 2450 50 75 55 76 78 80 73 59 23 12 7 7 44 51 59 61 61 63 68 V II (10) 20 32 32 32 31 30 28 21 15 9 8 7 2 34 34 34 34 34 33 30 2750 67 65 67 63 62 59 50 40 20 13 9 8 76 75 75 75 74 73 69 60P V II (20) 20 13 13 13 12 11 10 7 6 6 6 3 14 14 14 14 14 14 14 13 1150 28 27 28 26 25 22 16 12 7 6 6 6 34 33 31 32 32 32 31 28 24 S ( Exp (1)) 20 99 99 99 99 99 99 99 99 99 98 96 83 ∗ ∗ ∗ ∗ ∗ ∗ ∗ 99 9850 ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗S (B(1 , 20 86 91 86 88 89 91 93 94 94 91 86 66 97 97 96 96 95 95 94 91 8450 98 98 99 97 99 ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ S ( χ ) 20 77 80 76 78 78 77 74 71 59 46 36 12 89 89 87 86 86 85 84 79 7150 98 98 98 96 98 99 ∗ ∗ 99 95 74 50 ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ Table 7: Empirical power of T n,a against competitors ( d = 5 , α = 0 .05